A microlearning curriculum for running generative AI workloads on Kubernetes.
All paths
Path 01 · Foundation
Why Kubernetes for generative AI
You will be able to defend the choice of Kubernetes for an LLM workload — and explain what changes when the workload is a 30 GB model.
Path 02 · Practical
Model serving on Kubernetes
You will know how to pick a model server, declare it with KServe, and deliver weights without baking them into your image.
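The "declare it with KServe" outcome above can be previewed with a minimal sketch. This is an illustrative example, not part of the curriculum: the name, model format, and storage bucket are placeholders. The point is that `storageUri` lets a storage initializer fetch weights at startup instead of baking a 30 GB model into the container image.

```yaml
# Hypothetical sketch of a KServe InferenceService.
# Names and the s3 path are placeholders.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: llama-demo
spec:
  predictor:
    model:
      modelFormat:
        name: huggingface
      # Weights pulled from object storage at pod startup,
      # keeping the serving image small and model-agnostic.
      storageUri: s3://models/llama-demo
      resources:
        limits:
          nvidia.com/gpu: "1"
```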
Path 03 · Practical
GPU scheduling and resource management
You will know how Kubernetes discovers GPUs, when to share them, and how to plan tensor and pipeline parallelism.
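As a taste of the GPU discovery topic: once a device plugin (here, NVIDIA's) advertises GPUs on a node, a pod asks for them as an extended resource. A minimal sketch, with image and names as assumptions; note that `nvidia.com/gpu` is requested in `limits` and cannot be fractional without node-level sharing such as MIG or time-slicing.

```yaml
# Minimal sketch: requesting one GPU exposed by the NVIDIA device plugin.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvidia/cuda:12.4.1-base-ubuntu22.04   # placeholder image
    command: ["nvidia-smi"]                       # prints the visible GPU
    resources:
      limits:
        nvidia.com/gpu: "1"   # whole GPUs only, unless MIG/time-slicing is set up
```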
Path 04 · Advanced
Scaling, routing, and disaggregated serving
You will be able to design an autoscaling, cache-aware, cost-aware inference plane that survives bursty traffic.
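One building block of that inference plane can be sketched with a standard `autoscaling/v2` HorizontalPodAutoscaler. This assumes a custom metrics adapter exposing a per-pod `inflight_requests` metric; the metric name, target value, and Deployment name are placeholders, not a prescribed setup.

```yaml
# Hedged sketch: scale a model-server Deployment on in-flight requests
# per pod rather than CPU, which is a poor proxy for GPU-bound serving.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: llm-server
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: llm-server
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Pods
    pods:
      metric:
        name: inflight_requests      # assumed custom metric
      target:
        type: AverageValue
        averageValue: "8"            # placeholder target per pod
```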
Path 05 · Practical
Observability for LLM systems
You will know which metrics actually matter (TTFT, TPOT, goodput) and how to wire logs, metrics, and traces for streaming workloads.
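To make the two headline metrics concrete: TTFT (time to first token) is the wait before streaming starts, and TPOT (time per output token) is the average gap between subsequent tokens. An illustrative sketch, not from the curriculum, computing both from per-token arrival timestamps:

```python
def ttft_and_tpot(request_start: float, token_times: list[float]) -> tuple[float, float]:
    """Return (TTFT, TPOT) for one streamed response.

    TTFT: first token arrival minus request start.
    TPOT: mean inter-token gap after the first token.
    """
    if not token_times:
        raise ValueError("no tokens received")
    ttft = token_times[0] - request_start
    if len(token_times) == 1:
        return ttft, 0.0
    gaps = [later - earlier for earlier, later in zip(token_times, token_times[1:])]
    return ttft, sum(gaps) / len(gaps)
```

In practice these timestamps come from the serving layer (per-chunk hooks on the streaming response), and the values feed histograms rather than single requests.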
Path 06 · Advanced
Tuning at scale: LoRA and HPC scheduling
You will know when to fine-tune, how LoRA changes the serving story, and what gang and topology-aware scheduling buy you.
Path 07 · Advanced
AI-driven apps: RAG and agents
You will be able to architect a RAG pipeline and a safe agentic system on Kubernetes, with state, identity, and failure domains in mind.
22 ideas, one diagram each. The fastest way to look something up.
Spaced flashcards built from definitions, decisions, and failure modes.
Compose a real platform. See where it leaks before users do.