Generative AI on Kubernetes

Framing

Old dashboards do not tell the new story

Standard infrastructure dashboards show that the GPU is busy. They do not show whether users are waiting for the first token, whether tokens stream at a steady pace, or whether requests are piling up in the server's queue. LLM-specific serving metrics close that gap.
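To make the gap concrete, the sketch below derives three user-facing numbers from a single request's timeline: time spent queued, time to first token (TTFT), and the gaps between consecutive tokens (inter-token latency). A GPU can report high utilization while any of these degrades. The RequestTrace type, its field names, and the latency_metrics function are hypothetical illustrations, not the API of any particular inference server; it assumes all timestamps are in seconds on the same clock.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RequestTrace:
    """Hypothetical per-request timeline; all times in seconds on one clock."""
    queued_at: float          # request arrived at the server
    started_at: float         # scheduler admitted the request into a batch
    token_times: list[float]  # time each output token was emitted, in order


def latency_metrics(trace: RequestTrace) -> dict[str, float]:
    """Compute queue wait, TTFT, and inter-token latency for one request."""
    if not trace.token_times:
        raise ValueError("request produced no tokens")
    # Gaps between consecutive tokens: large gaps are visible stalls in the stream.
    gaps = [later - earlier
            for earlier, later in zip(trace.token_times, trace.token_times[1:])]
    return {
        # How long the request sat in the queue before any work started.
        "queue_wait_s": trace.started_at - trace.queued_at,
        # What the user experiences before anything appears on screen.
        "ttft_s": trace.token_times[0] - trace.queued_at,
        # Average and worst-case smoothness of the token stream.
        "mean_itl_s": mean(gaps) if gaps else 0.0,
        "max_itl_s": max(gaps) if gaps else 0.0,
    }


if __name__ == "__main__":
    trace = RequestTrace(queued_at=0.0, started_at=0.25,
                         token_times=[0.5, 0.75, 1.0, 2.0])
    print(latency_metrics(trace))
```

In the example trace, the last token arrives a full second after the previous one. A utilization graph would not reveal that stall, but max_itl_s does, and a real deployment would aggregate these per-request values into percentiles across many requests.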