Practical · model serving
Model serving on Kubernetes
You will know how to pick a model server, declare it with KServe, and deliver weights without baking them into your image.
- 01 · Anatomy of a model server: Why you almost never wrap PyTorch in Flask in production (4 min)
- 02 · KServe and model server controllers: Make "deploy a model" a one-line declarative resource (4 min)
- 03 · Model data: weights, formats, and Modelcars: Where do 30 GB of weights actually live, and how do they get to the GPU? (4 min)
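The "one-line declarative resource" in lesson 02 refers to KServe's `InferenceService`. A minimal sketch of what that looks like (the resource name and `storageUri` here are illustrative, not from the course):

```yaml
# Minimal KServe InferenceService: the controller picks and configures a
# model server for the declared format, then pulls weights from storageUri.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: iris-classifier            # illustrative name
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn              # tells the controller which runtime to use
      # Lesson 03's "Modelcar" alternative would swap this bucket URI for an
      # OCI image reference, e.g. storageUri: oci://registry.example.com/models/iris:v1
      storageUri: gs://kfserving-examples/models/sklearn/1.0/model
```

The point of the pattern: the weights live outside the serving image, so you can update the model without rebuilding or redeploying the container.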
Lock it in
Multi-tenant fine-tuned serving
20 tenants, each with their own LoRA adapter on the same 7B base.
Try the scenario
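One common way to approach this scenario is vLLM's multi-LoRA support, where a single server process holds the shared base model and loads one adapter per tenant. A sketch of the relevant pod-spec fragment; the image tag, model name, tenant names, and adapter paths are all assumptions for illustration:

```yaml
# Fragment of a pod template: one vLLM container serves the shared 7B base,
# with per-tenant LoRA adapters registered via --lora-modules.
# All names, paths, and the image tag below are illustrative assumptions.
containers:
  - name: vllm
    image: vllm/vllm-openai:latest
    args:
      - --model=meta-llama/Llama-2-7b-hf   # shared base model
      - --enable-lora
      - --lora-modules
      - tenant-a=/adapters/tenant-a        # one name=path entry per tenant
      - tenant-b=/adapters/tenant-b
    volumeMounts:
      - name: adapters                     # adapters delivered separately
        mountPath: /adapters               # from the base weights
```

At the OpenAI-compatible endpoint, a request then selects its tenant's adapter by passing the adapter name as the `model` field, so twenty tenants share one copy of the base weights on the GPU.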