The asymmetry
Prefill is compute-bound, decode is memory-bound
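Prefill processes the whole prompt in one shot, so it runs large, dense matrix multiplies that keep the GPU's compute units saturated. Decode emits one token at a time, and every step has to re-read the model weights and the growing KV cache, so it is bottlenecked by memory bandwidth rather than FLOPs. Mixing them on one GPU means each phase fights the other: a long prefill stalls in-flight decode steps, while decode on its own leaves most of the compute idle.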
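
A rough roofline-style estimate makes the compute-bound versus memory-bound split concrete. The sketch below is illustrative only: it models a single weight matrix multiply and ignores attention over the KV cache, and the function names, the 4096-wide layer, and the hardware figures (300 TFLOP/s peak, 2 TB/s memory bandwidth) are assumptions chosen for the example, not measurements of any particular GPU.

```python
def arithmetic_intensity(tokens, d_in, d_out, bytes_per_elem=2):
    """FLOPs per byte moved for one (tokens x d_in) @ (d_in x d_out) matmul.

    Hypothetical helper: counts only weight and activation traffic and
    assumes 2-byte (fp16/bf16) elements by default.
    """
    flops = 2 * tokens * d_in * d_out                      # one multiply + one add per MAC
    weight_bytes = d_in * d_out * bytes_per_elem           # weights are read once per pass
    act_bytes = tokens * (d_in + d_out) * bytes_per_elem   # inputs read, outputs written
    return flops / (weight_bytes + act_bytes)


def is_compute_bound(tokens, d_in, d_out, peak_flops, mem_bandwidth):
    """True if the matmul sits at or above the hardware's roofline ridge point."""
    ridge = peak_flops / mem_bandwidth  # FLOPs per byte needed to saturate compute
    return arithmetic_intensity(tokens, d_in, d_out) >= ridge


# Assumed hardware numbers, for illustration only.
PEAK_FLOPS = 300e12     # 300 TFLOP/s
MEM_BANDWIDTH = 2e12    # 2 TB/s

prefill = arithmetic_intensity(tokens=2048, d_in=4096, d_out=4096)
decode = arithmetic_intensity(tokens=1, d_in=4096, d_out=4096)

print(f"prefill intensity: {prefill:.1f} FLOPs/byte")
print(f"decode intensity:  {decode:.3f} FLOPs/byte")
print("prefill compute-bound:", is_compute_bound(2048, 4096, 4096, PEAK_FLOPS, MEM_BANDWIDTH))
print("decode compute-bound: ", is_compute_bound(1, 4096, 4096, PEAK_FLOPS, MEM_BANDWIDTH))
```

Under these assumptions, a 2048-token prefill does roughly a thousand FLOPs for every byte it moves, while a single-token decode step does about one. Batching more decode requests together raises the decode figure, which is part of why the two phases want different scheduling.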