Playground
Compose a real LLM platform.
Each scenario gives you a brief, a goal, and a rack of building blocks. Pick what you'd put in production and see where the design holds up.
- Scenario 01
Bursty LLM inference endpoint
Serve a 13B chat model behind a public API. Traffic doubles within 30 seconds twice a day. A cold start takes 4 minutes.
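A minimal sketch of one way a design might absorb that mismatch: since a 4-minute cold start can never react to a 30-second doubling, the scaler provisions for the burst-adjusted load instead of the current load. The per-replica throughput and the 2x headroom factor are illustrative assumptions, not measured values.

```python
import math

REPLICA_RPS = 8.0    # assumed sustainable requests/sec per warm replica
BURST_FACTOR = 2.0   # traffic can double before a cold replica is ready

def desired_replicas(current_rps: float, min_replicas: int = 2) -> int:
    # Provision for the burst-adjusted load, not the current load,
    # so a doubling lands on already-warm capacity.
    needed = math.ceil((current_rps * BURST_FACTOR) / REPLICA_RPS)
    return max(needed, min_replicas)

if __name__ == "__main__":
    for rps in (10, 40, 80):
        print(f"{rps} rps -> {desired_replicas(rps)} replicas")
```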
- Scenario 02
Multi-tenant fine-tuned serving
20 tenants, each with their own LoRA adapter on the same 7B base.
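One way such a design might route requests: keep the 7B base resident and page tenant adapters in and out under an LRU policy. The adapter budget is an assumed GPU-memory limit, and `load_adapter` is a hypothetical stand-in for whatever your serving stack (e.g. a multi-LoRA-capable server) actually exposes.

```python
from collections import OrderedDict

MAX_RESIDENT_ADAPTERS = 8  # assumed GPU-memory budget, in adapters

class AdapterRouter:
    def __init__(self) -> None:
        self._resident: OrderedDict[str, object] = OrderedDict()

    def get(self, tenant_id: str):
        if tenant_id in self._resident:
            self._resident.move_to_end(tenant_id)  # mark most-recently-used
            return self._resident[tenant_id]
        if len(self._resident) >= MAX_RESIDENT_ADAPTERS:
            self._resident.popitem(last=False)     # evict the LRU adapter
        adapter = self.load_adapter(tenant_id)
        self._resident[tenant_id] = adapter
        return adapter

    def load_adapter(self, tenant_id: str):
        # Placeholder: a real implementation would pull LoRA weights
        # from storage and attach them to the shared base model.
        return f"lora-weights-for-{tenant_id}"
```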
- Scenario 03
GPU sharing for a mixed workload
A shared GPU cluster supports both production inference and best-effort tuning experiments.
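A sketch of the admission rule this kind of cluster usually needs: production inference always gets GPUs, best-effort tuning runs on leftovers and is preempted first. The pool size and job shapes are illustrative.

```python
from dataclasses import dataclass

TOTAL_GPUS = 8  # assumed shared pool size

@dataclass
class Job:
    name: str
    gpus: int
    best_effort: bool

running: list[Job] = []

def used() -> int:
    return sum(j.gpus for j in running)

def admit(job: Job) -> bool:
    if not job.best_effort:
        # Preempt best-effort jobs (newest first) to make room for production.
        while used() + job.gpus > TOTAL_GPUS:
            victims = [j for j in running if j.best_effort]
            if not victims:
                return False            # pool genuinely full of production
            running.remove(victims[-1]) # preempted; would be requeued
    elif used() + job.gpus > TOTAL_GPUS:
        return False                    # best-effort never preempts anyone
    running.append(job)
    return True
```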
- Scenario 04
Detecting a quality regression
You are about to roll out a new fine-tune. Infra metrics are green, but they say nothing about output quality.
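A sketch of the gate this scenario points toward: block the rollout on a fixed eval set, not on infra dashboards. Exact match is a deliberately crude stand-in for a real scorer, and the 2-point regression budget is an assumed threshold, not a recommendation.

```python
def exact_match_score(outputs: list[str], references: list[str]) -> float:
    """Percentage of outputs that exactly match their reference answer."""
    hits = sum(o.strip() == r.strip() for o, r in zip(outputs, references))
    return 100.0 * hits / max(len(references), 1)

def quality_gate(candidate_outputs: list[str],
                 baseline_outputs: list[str],
                 references: list[str],
                 max_regression: float = 2.0) -> bool:
    """Ship only if the candidate scores within budget of the baseline."""
    drop = (exact_match_score(baseline_outputs, references)
            - exact_match_score(candidate_outputs, references))
    return drop <= max_regression
```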
- Scenario 05
Shared tuning cluster for several teams
Three product teams need to run nightly LoRA tunes on a shared GPU pool.
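A sketch of the fairness mechanism a shared pool like this usually needs: per-team queues drained round-robin, one job per team per pass, so no team can starve the others by flooding the queue. Team names and the pool size are made up.

```python
from collections import deque

POOL_GPUS = 12  # assumed size of the shared pool
queues: dict[str, deque] = {t: deque() for t in ("team-a", "team-b", "team-c")}

def submit(team: str, job: str, gpus: int) -> None:
    queues[team].append((job, gpus))

def dispatch() -> list[tuple[str, str]]:
    """Drain queues round-robin until nothing else fits in the pool."""
    scheduled: list[tuple[str, str]] = []
    free = POOL_GPUS
    progress = True
    while progress:
        progress = False
        for team, q in queues.items():  # one job per team per pass
            if q and q[0][1] <= free:
                job, gpus = q.popleft()
                scheduled.append((team, job))
                free -= gpus
                progress = True
    return scheduled
```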
- Scenario 06
Platform-grade RAG service
Stand up a RAG platform that multiple product teams will share.
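A sketch of the isolation property a shared RAG platform has to enforce: retrieval is scoped to a per-team namespace so one team's documents never leak into another team's context. The in-memory index and lexical-overlap ranking are toy stand-ins for a real vector store with embeddings.

```python
from collections import defaultdict

_index: dict[str, list[str]] = defaultdict(list)  # namespace -> documents

def ingest(team: str, doc: str) -> None:
    _index[team].append(doc)

def retrieve(team: str, query: str, k: int = 3) -> list[str]:
    # Toy lexical-overlap ranking; the namespace filter is the point here.
    terms = set(query.lower().split())
    ranked = sorted(_index[team],
                    key=lambda d: len(terms & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]
```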