Architecture playground
GPU sharing for a mixed workload
A shared GPU cluster supports both production inference and best-effort tuning experiments.
Goal
Maximize GPU utilization without letting experiments degrade production inference latency.
Constraints
- Shared cluster
- Multiple teams
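One way to reason about the goal above is priority-based preemption: best-effort jobs may fill idle GPUs, but a production job can reclaim them when it needs capacity. The sketch below is a toy model of that idea, not a description of any particular scheduler. The names `Job`, `Cluster`, and `submit`, and the policy of evicting the most recently started best-effort job first, are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Job:
    name: str
    gpus: int
    prod: bool  # True for production inference, False for best-effort tuning


@dataclass
class Cluster:
    total_gpus: int
    running: List[Job] = field(default_factory=list)

    def free_gpus(self) -> int:
        return self.total_gpus - sum(j.gpus for j in self.running)

    def submit(self, job: Job) -> Tuple[bool, List[Job]]:
        """Try to start `job`. Returns (started, preempted best-effort jobs)."""
        best_effort = [j for j in self.running if not j.prod]
        # Only production jobs may reclaim GPUs held by best-effort jobs.
        reclaimable = sum(j.gpus for j in best_effort) if job.prod else 0
        if job.gpus > self.free_gpus() + reclaimable:
            return False, []  # cannot fit; evict nothing

        preempted: List[Job] = []
        # Assumed policy: evict the most recently started best-effort job first.
        for victim in reversed(best_effort):
            if self.free_gpus() >= job.gpus:
                break
            self.running.remove(victim)
            preempted.append(victim)

        self.running.append(job)
        return True, preempted
```

In this model a best-effort job never displaces anything, so experiments raise utilization only by using GPUs that would otherwise sit idle. A real design would also need requeueing of preempted jobs and per-team quotas, which this sketch leaves out.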
Compose your reference architecture
Compute
Scaling