Framing
Old dashboards do not tell the new story
Standard infrastructure dashboards show that the GPU is busy. They do not show whether users are waiting, whether tokens stream smoothly, or whether requests are stuck in a queue. LLM-specific metrics close that gap.