Practical · GPU

GPU scheduling and resource management

You will learn how Kubernetes discovers GPUs, when and how to share them safely, and how to plan tensor and pipeline parallelism for multi-GPU inference.

  1. GPU discovery on Kubernetes
     How the cluster learns what hardware it actually has
  2. Scheduling, MIG, and Dynamic Resource Allocation
     How to share scarce GPUs without ruining anyone's latency
  3. Multi-GPU inference: tensor and pipeline parallelism
     When one GPU is not enough, how do many cooperate?
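As a taste of the first lesson: once the NVIDIA device plugin has advertised GPUs to the kubelet, a pod requests one through the `nvidia.com/gpu` extended resource. A minimal sketch (the pod name and container image are illustrative, not from the lessons):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test              # illustrative name
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04   # example image
      command: ["nvidia-smi"]       # prints the GPU the scheduler assigned
      resources:
        limits:
          nvidia.com/gpu: 1         # extended resource advertised by the device plugin
```

Because extended resources are integers, this request gets a whole GPU; sharing a GPU between pods needs MIG, time-slicing, or Dynamic Resource Allocation, which lesson 02 covers.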
Lock it in
GPU sharing for a mixed workload

A shared GPU cluster supports both production inference and best-effort tuning experiments.
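One way to serve that mix, explored in lesson 02, is to partition each GPU with MIG and give every workload class its own slice, so the tuning jobs cannot disturb inference latency. A sketch of the per-container resource requests (the MIG profile names follow the NVIDIA device plugin's mixed strategy and depend on the GPU model and its configured partitioning):

```yaml
# Production inference: a larger dedicated MIG slice for predictable latency
resources:
  limits:
    nvidia.com/mig-3g.20gb: 1   # profile name assumes an A100-40GB-style layout
---
# Best-effort tuning experiment: a small slice, typically paired with a
# low PriorityClass so it yields cluster capacity to production pods
resources:
  limits:
    nvidia.com/mig-1g.5gb: 1
```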
