Why parallelism
Big models do not fit, period
A 70B-parameter model in BF16 needs roughly 140 GB just for weights — well beyond the 80 GB of memory on a single H100, before counting activations, optimizer state, or KV cache. You must split the model across GPUs, and how you split it determines whether the result is actually fast or merely possible.
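The 140 GB figure is simple arithmetic: parameter count times bytes per parameter. A minimal sketch (the function name and the 1 GB = 10^9 bytes convention are illustrative choices, not from any library):

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory needed to hold model weights, in GB (1 GB = 1e9 bytes).

    bytes_per_param defaults to 2 for BF16/FP16; use 4 for FP32,
    1 for FP8 or INT8.
    """
    return num_params * bytes_per_param / 1e9

# 70B parameters in BF16 -> 140.0 GB, just for the weights
print(weight_memory_gb(70e9))
```

The same formula shows why quantization alone is not a full escape: even at 1 byte per parameter, a 70B model still needs 70 GB before any activation or KV-cache memory.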