Framing
GPU pods do not scale like web pods
Spinning up a new replica can take five minutes (image pull, weight load, warm-up), so the autoscaler must react before users feel pain, using signals that actually correlate with saturation.