A microlearning curriculum for running generative AI workloads on Kubernetes.
All paths
Path 01 · Foundation
Why Kubernetes for generative AI
You will be able to defend the choice of Kubernetes for an LLM workload — and explain what changes when the workload is a 30 GB model.
Path 02 · Practical
Model serving on Kubernetes
You will know how to pick a model server, declare it with KServe, and deliver weights without baking them into your image.
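The "declare it with KServe" outcome above can be previewed with a minimal sketch. This is an illustrative example, not part of the curriculum: the name, model format, and storage bucket are placeholders. The point is that `storageUri` lets a storage initializer fetch weights at startup instead of baking a 30 GB model into the container image.

```yaml
# Hypothetical sketch of a KServe InferenceService.
# Names and the s3 path are placeholders.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: llama-demo
spec:
  predictor:
    model:
      modelFormat:
        name: huggingface
      # Weights pulled from object storage at pod startup,
      # keeping the serving image small and model-agnostic.
      storageUri: s3://models/llama-demo
      resources:
        limits:
          nvidia.com/gpu: "1"
```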
Path 03 · Practical
GPU scheduling and resource management
You will know how Kubernetes discovers GPUs, when to share them, and how to plan tensor and pipeline parallelism.
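As a taste of the GPU discovery topic: once a device plugin (here, NVIDIA's) advertises GPUs on a node, a pod asks for them as an extended resource. A minimal sketch, with image and names as assumptions; note that `nvidia.com/gpu` is requested in `limits` and cannot be fractional without node-level sharing such as MIG or time-slicing.

```yaml
# Minimal sketch: requesting one GPU exposed by the NVIDIA device plugin.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvidia/cuda:12.4.1-base-ubuntu22.04   # placeholder image
    command: ["nvidia-smi"]                       # prints the visible GPU
    resources:
      limits:
        nvidia.com/gpu: "1"   # whole GPUs only, unless MIG/time-slicing is set up
```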
Path 04 · Advanced
Scaling, routing, and disaggregated serving
You will be able to design an autoscaling, cache-aware, cost-aware inference plane that survives bursty traffic.
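One building block of that inference plane can be sketched with a standard `autoscaling/v2` HorizontalPodAutoscaler. This assumes a custom metrics adapter exposing a per-pod `inflight_requests` metric; the metric name, target value, and Deployment name are placeholders, not a prescribed setup.

```yaml
# Hedged sketch: scale a model-server Deployment on in-flight requests
# per pod rather than CPU, which is a poor proxy for GPU-bound serving.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: llm-server
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: llm-server
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Pods
    pods:
      metric:
        name: inflight_requests      # assumed custom metric
      target:
        type: AverageValue
        averageValue: "8"            # placeholder target per pod
```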
Path 05 · Practical
Observability for LLM systems
You will know which metrics actually matter (TTFT, TPOT, goodput) and how to wire logs, metrics, and traces for streaming workloads.
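To make the two headline metrics concrete: TTFT (time to first token) is the wait before streaming starts, and TPOT (time per output token) is the average gap between subsequent tokens. An illustrative sketch, not from the curriculum, computing both from per-token arrival timestamps:

```python
def ttft_and_tpot(request_start: float, token_times: list[float]) -> tuple[float, float]:
    """Return (TTFT, TPOT) for one streamed response.

    TTFT: first token arrival minus request start.
    TPOT: mean inter-token gap after the first token.
    """
    if not token_times:
        raise ValueError("no tokens received")
    ttft = token_times[0] - request_start
    if len(token_times) == 1:
        return ttft, 0.0
    gaps = [later - earlier for earlier, later in zip(token_times, token_times[1:])]
    return ttft, sum(gaps) / len(gaps)
```

In practice these timestamps come from the serving layer (per-chunk hooks on the streaming response), and the values feed histograms rather than single requests.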
Path 06 · Advanced
Tuning at scale: LoRA and HPC scheduling
You will know when to fine-tune, how LoRA changes the serving story, and what gang and topology-aware scheduling buy you.
Path 07 · Advanced
AI-driven apps: RAG and agents
You will be able to architect a RAG pipeline and a safe agentic system on Kubernetes, with state, identity, and failure domains in mind.
22 ideas, one diagram each. The fastest way to look something up.
Spaced flashcards built from definitions, decisions, and failure modes.
Compose a real platform. See where it leaks before users do.