Framing
All or nothing
Distributed training is all-or-nothing: if you ask for 64 GPUs and the scheduler grants 60 immediately and 4 later, your job sits idle while hoarding 60 GPUs that nothing else can use. Gang scheduling enforces "all together, or not at all": a job's workers are placed as a unit, or the job waits in the queue.
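The admission rule can be sketched in a few lines. This is a minimal illustration, not any real scheduler's API; the `Job` and `GangScheduler` names are hypothetical. The key invariant is that a job is admitted only when its *full* GPU request fits, so capacity is never handed out partially.

```python
from dataclasses import dataclass, field

@dataclass
class Job:
    name: str
    gpus_needed: int

@dataclass
class GangScheduler:
    free_gpus: int
    queue: list = field(default_factory=list)

    def submit(self, job: Job) -> None:
        self.queue.append(job)
        self._try_admit()

    def release(self, gpus: int) -> None:
        # GPUs freed by a finished job may unblock queued jobs.
        self.free_gpus += gpus
        self._try_admit()

    def _try_admit(self) -> None:
        for job in list(self.queue):
            # All-or-nothing: admit only if the FULL request fits right now.
            # A partial allocation would sit idle, so we never make one.
            if job.gpus_needed <= self.free_gpus:
                self.free_gpus -= job.gpus_needed
                self.queue.remove(job)
```

With 60 free GPUs, a 64-GPU job stays queued (and holds nothing); once 4 more GPUs are released, it is admitted in one step.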