Scaling, routing, and disaggregated serving
LLM-aware routing and the AI gateway
Round-robin is malpractice when KV cache is involved
Why this differs

Identical replicas have non-identical state

Two LLM replicas may run the same container image, but each carries its own KV cache populated by the prompts it has recently served. Routing the next message in a conversation to a random replica throws that cached state away and forces prefill for the entire conversation history to be recomputed from scratch.
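One common way a gateway keeps a conversation on the replica whose KV cache is already warm is session affinity via a stable hash. The sketch below is a minimal illustration, not any particular gateway's implementation; the class name, replica labels, and `pick` method are all hypothetical.

```python
import hashlib


class PrefixAffinityRouter:
    """Hypothetical sketch: pin each conversation to one replica so its
    KV cache (built during earlier turns) can be reused on later turns."""

    def __init__(self, replicas):
        self.replicas = list(replicas)

    def pick(self, conversation_id: str) -> str:
        # A stable hash of the conversation ID maps every turn of that
        # conversation to the same replica, instead of a random one.
        digest = hashlib.sha256(conversation_id.encode()).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.replicas)
        return self.replicas[index]


router = PrefixAffinityRouter(["replica-a", "replica-b", "replica-c"])
first = router.pick("conv-123")
# Every later turn of conv-123 lands on the same replica, so prefill
# for the shared prefix does not have to be recomputed elsewhere.
assert all(router.pick("conv-123") == first for _ in range(10))
```

A real gateway would layer health checks, load-aware fallback, and cache-hit telemetry on top of this; the point here is only that the routing key must be the conversation (or prompt prefix), not a round-robin counter.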