Framing
All or nothing
Distributed training is all-or-nothing: if you ask for 64 GPUs and the scheduler grants 60 immediately and 4 later, your job sits idle while hoarding 60 GPUs that nothing else can use. Gang scheduling enforces "all together, or not at all": a job's workers are placed as a unit, or the job waits in the queue.
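The admission rule can be sketched in a few lines. This is a minimal illustration, not any real scheduler's API; the `Job` and `GangScheduler` names are hypothetical. The key invariant is that a job is admitted only when its *full* GPU request fits, so capacity is never handed out partially.

```python
from dataclasses import dataclass, field

@dataclass
class Job:
    name: str
    gpus_needed: int

@dataclass
class GangScheduler:
    free_gpus: int
    queue: list = field(default_factory=list)

    def submit(self, job: Job) -> None:
        self.queue.append(job)
        self._try_admit()

    def release(self, gpus: int) -> None:
        # GPUs freed by a finished job may unblock queued jobs.
        self.free_gpus += gpus
        self._try_admit()

    def _try_admit(self) -> None:
        for job in list(self.queue):
            # All-or-nothing: admit only if the FULL request fits right now.
            # A partial allocation would sit idle, so we never make one.
            if job.gpus_needed <= self.free_gpus:
                self.free_gpus -= job.gpus_needed
                self.queue.remove(job)
```

With 60 free GPUs, a 64-GPU job stays queued (and holds nothing); once 4 more GPUs are released, it is admitted in one step.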