GPU scheduling and resource management
Scheduling, MIG, and Dynamic Resource Allocation
How to share scarce GPUs without ruining anyone's latency
The tension

GPUs are scarce, so sharing matters

An 80 GB H100 is too big for a small model and too small for a 70B model, whose weights alone need about 140 GB at 16-bit precision. The platform has to decide: hand out whole GPUs, slice them, or coordinate several. Each choice trades off fairness, isolation, and performance differently.
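The sizing argument can be made concrete with a rough back-of-the-envelope sketch. It assumes an 80 GB H100, 2 bytes per parameter (16-bit weights), and counts weights only, ignoring KV cache and activations. The function names and the half-GPU slicing threshold are hypothetical, chosen purely for illustration and not taken from any scheduler.

```python
import math

H100_MEMORY_GB = 80  # assumption: the 80 GB H100 variant


def weight_memory_gb(params_billions, bytes_per_param=2):
    """Approximate weight footprint in GB (weights only, no KV cache)."""
    # 1e9 parameters * bytes per parameter / 1e9 bytes per GB
    return params_billions * bytes_per_param


def allocation_shape(params_billions, bytes_per_param=2,
                     gpu_memory_gb=H100_MEMORY_GB, slice_fraction=0.5):
    """Return (strategy, gpu_count) for a model of the given size.

    slice_fraction is an illustrative threshold: models needing at most
    that share of one GPU are treated as candidates for slicing.
    """
    need = weight_memory_gb(params_billions, bytes_per_param)
    if need > gpu_memory_gb:
        # Too large for one GPU: several must be coordinated.
        return ("multi-gpu", math.ceil(need / gpu_memory_gb))
    if need <= gpu_memory_gb * slice_fraction:
        # Small enough that a whole GPU would be mostly idle.
        return ("slice", 1)
    return ("whole-gpu", 1)


if __name__ == "__main__":
    for size in (7, 30, 70):
        print(size, "B params ->", allocation_shape(size))
```

Under these assumptions a 7B model fits comfortably in a slice, a 30B model takes a whole GPU, and a 70B model needs at least two GPUs before any runtime memory is counted.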