GPU scheduling and resource management
Multi-GPU inference: tensor and pipeline parallelism
When one GPU is not enough, how do many cooperate?
Why parallelism

Big models do not fit, period

A 70B-parameter model in BF16 needs ~140 GB just for weights. Even the largest single accelerators barely hold that, and none leave room for activations and KV cache. You must split the model across GPUs — and how you split it determines whether the result is fast or merely possible.
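The arithmetic above can be sketched in a few lines: weights alone are parameter count times bytes per parameter, and splitting evenly across N GPUs divides that footprint. The function names here are illustrative, not from any library.

```python
def weight_gb(n_params: float, bytes_per_param: int = 2) -> float:
    """GB needed just for weights (BF16 = 2 bytes/param)."""
    return n_params * bytes_per_param / 1e9

def per_gpu_weight_gb(n_params: float, n_gpus: int,
                      bytes_per_param: int = 2) -> float:
    """Even weight split across n_gpus, as in tensor or pipeline parallelism."""
    return weight_gb(n_params, bytes_per_param) / n_gpus

print(weight_gb(70e9))            # 140.0 GB total, too big for one GPU
print(per_gpu_weight_gb(70e9, 8)) # 17.5 GB per GPU across 8 devices
```

Note this counts only weights; activations, KV cache, and communication buffers add to the per-GPU total in practice.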