Scale-test resource requirements
An honest, provider-agnostic estimate of the compute needed to run each BigFleet scale-test rung under realistic conditions, and where the practical ceilings are. Units are abstract — a “host” is one ~96 vCPU / ~384 GiB cloud VM — with no provider- or region-specific naming, so the arithmetic holds independent of where you run it.
The lower-rung numbers are measured (see scale-test results); the higher rungs are extrapolated from those plus one dedicated per-host density experiment. The caveats section is explicit about which is which.
The rungs
Each rung simulates ~10× the Pods of the previous one (a rung’s name is its machine count; Pod demand is roughly machine-count × per-machine density).
| rung | machines | total simulated Pods |
|---|---|---|
| 5k | ~5,000 | 500,000 |
| 50k | ~50,000 | 5,000,000 |
| 500k | ~500,000 | 50,000,000 |
| 5m | ~5,000,000 | 500,000,000 |
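The Pod totals in the table can be reproduced from the machine counts. The sketch below assumes a per-machine density of 100 Pods, which is inferred from the table itself (500,000 / 5,000) rather than stated as a harness parameter; the constant and function names are illustrative only.

```python
# Inferred from the table: 500,000 Pods / 5,000 machines = 100 Pods per machine.
# This is an assumption for illustration, not a documented harness setting.
PODS_PER_MACHINE = 100

# Approximate machine counts per rung, as listed in the table.
RUNG_MACHINES = {"5k": 5_000, "50k": 50_000, "500k": 500_000, "5m": 5_000_000}


def total_pods(machines: int, pods_per_machine: int = PODS_PER_MACHINE) -> int:
    """Pod demand is roughly machine count times per-machine density."""
    return machines * pods_per_machine


RUNG_PODS = {name: total_pods(m) for name, m in RUNG_MACHINES.items()}

for name, pods in RUNG_PODS.items():
    print(f"{name:>4}: {pods:,} Pods")
```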
How the harness uses hosts
The fleet is simulated with kwok (Kubernetes WithOut Kubelet — simulated nodes/Pods, no real containers):
- 1 engine/hub host runs the BigFleet shard (decision engine) + coordinator + metrics. It hosts no simulated Pods and stays light at small scale.
- N satellite hosts, each running M kwok clusters; each kwok cluster simulates ~25,000 Pods and carries its own kube-apiserver + kube-scheduler + the BigFleet operator. The satellite host’s CPU is the binding resource — it pays for apiserver / scheduler / etcd / operator control-plane work, not container runtime (kwok has none).
- Each satellite’s operators hold a control session back to the hub.
So the host count for a rung is driven by how many Pods one satellite host can carry cleanly, plus one hub.
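As a rough sketch of that layout, the number of kwok clusters per host (M) and the number of satellite hosts (N) follow from the per-cluster Pod count and a chosen per-host density. The function names are hypothetical, and the per-host densities passed in are the figures measured in the next section.

```python
import math

PODS_PER_KWOK_CLUSTER = 25_000  # from the text: ~25,000 Pods per kwok cluster


def clusters_per_host(pods_per_host: int) -> int:
    """M: how many kwok clusters one satellite host runs at a given density."""
    return pods_per_host // PODS_PER_KWOK_CLUSTER


def satellite_layout(total_pods: int, pods_per_host: int) -> tuple[int, int]:
    """Return (N satellite hosts, total kwok clusters) for a Pod target."""
    clusters = math.ceil(total_pods / PODS_PER_KWOK_CLUSTER)
    hosts = math.ceil(clusters / clusters_per_host(pods_per_host))
    return hosts, clusters


# 5k rung at 125K Pods/host: 20 clusters, 5 per host, 4 satellite hosts.
print(satellite_layout(500_000, 125_000))
```

The total host count is then N plus the single hub.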
Per-host clean density (measured)
“Clean” means no sustained host oversubscription (load below core count), all capacity-delivery SLOs green (see SLOs), and no scheduler runaway.
| scheduler config | clean Pods/host | evidence |
|---|---|---|
| default (uncapped) | ~125K (proven), ~150–160K ceiling | the 5k baseline ran 5 clusters × 25K = 125K/host at ~50–65 % CPU on a 96-vCPU host |
| pod-backoff capped | ~225K | density probe: 200K clean (~45–60 % CPU); 250K saturates (~95 % CPU); uncapped 250K runs away (load compounds past the core count) |
The capped figure is the important one. At high density the kube-scheduler’s retry/backoff churn dominates host CPU and, uncapped, compounds into a load runaway the scheduler never recovers from. Capping the pod backoff removes that amplification — the same host then carries ~1.8× the Pods at lower CPU.
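One way to read the ~225K and ~1.8× figures is sketched below. Treating the capped ceiling as the midpoint between the last clean probe (200K) and the first saturated one (250K) is an interpretation of the probe results, not a documented formula; the variable names are illustrative.

```python
# Probe results quoted in the table above.
CLEAN_CAPPED = 200_000      # capped, clean (~45-60 % CPU)
SATURATED_CAPPED = 250_000  # capped, saturated (~95 % CPU)
PROVEN_UNCAPPED = 125_000   # uncapped, proven by the 5k baseline

# Assumption: take the midpoint of the clean and saturated probes as the ceiling.
capped_estimate = (CLEAN_CAPPED + SATURATED_CAPPED) // 2

# Density gain of the capped configuration over the proven uncapped one.
gain = capped_estimate / PROVEN_UNCAPPED

print(f"capped estimate: {capped_estimate:,} Pods/host, gain: {gain:.1f}x")
```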
The cap is a harness scheduler setting, not a BigFleet setting, and it’s a density/cost lever, not an SLO lever: BigFleet’s capacity-delivery gates pass with the scheduler uncapped (the published 5k baseline runs uncapped — see ADR-0054), and end-to-end Pod-bind latency is informational under the release gate precisely because it’s scheduler-bound. The cap removes a scheduler CPU artifact to fit more simulated Pods per host; it does not change what BigFleet is graded on.
Density scales roughly linearly with host size; with a fixed VM size, host count, not host size, is the lever.
Resource estimates per rung
sat hosts = ceil(Pods / density), plus 1 hub. The VM count ≈ distinct-host count when VMs are spread across regions (a cross-region VM lands ~1:1 on its own physical host). Within a single busy region the platform tends to pack ~1.25–2 VMs per physical host, so reaching a target distinct-host count there needs proportionally more VMs (or churn) — spreading across regions is the efficient way to add distinct hosts.
| rung | Pods | sat hosts @125K (uncapped) | sat hosts @225K (capped) | total hosts (capped, +hub) | feasibility on a cloud-VM fleet |
|---|---|---|---|---|---|
| 5k | 500K | 4 | 3 | ~4 | Trivial — a handful of VMs in one region. (Published baseline: 1 hub + 4 sats.) |
| 50k | 5M | 40 | 23 | ~24 | Reachable. A single region packs well below 24 distinct hosts, so spread across 2–3 regions (cross-region VMs = 1:1 distinct hosts) and use the scheduler cap. Lands within a modest VM budget (a few dozen). |
| 500k | 50M | 400 | 223 | ~224 | Beyond a modest VM budget — hundreds of VMs. Two harness limits also bind: the hub control plane must shard (multiple shard replicas, unvalidated at this inventory), and the single-relay control-session transport does not scale to hundreds of cross-region sessions (needs an in-cloud or in-cluster relay). Requires a dedicated cluster/cloud allocation. |
| 5m | 500M | 4000 | 2223 | ~2,200 | Not a cloud-VM-fleet task. Thousands of VMs, a multi-shard engine control plane, and a non-relay transport — effectively a purpose-built test cluster and harness redesign. |
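The host columns in the table follow directly from the formula above. A minimal sketch, using the two measured densities and the Pod totals from the rung table (the function and constant names are illustrative):

```python
DENSITY_UNCAPPED = 125_000  # Pods per satellite host, default scheduler
DENSITY_CAPPED = 225_000    # Pods per satellite host, pod backoff capped
HUB_HOSTS = 1               # one engine/hub host per run

RUNG_PODS = {
    "5k": 500_000,
    "50k": 5_000_000,
    "500k": 50_000_000,
    "5m": 500_000_000,
}


def sat_hosts(pods: int, density: int) -> int:
    """ceil(pods / density) using integer arithmetic."""
    return -(-pods // density)


def total_hosts(pods: int, density: int) -> int:
    """Satellite hosts plus the hub."""
    return sat_hosts(pods, density) + HUB_HOSTS


for rung, pods in RUNG_PODS.items():
    print(
        f"{rung:>4}: uncapped sats={sat_hosts(pods, DENSITY_UNCAPPED):>5}, "
        f"capped sats={sat_hosts(pods, DENSITY_CAPPED):>5}, "
        f"capped total={total_hosts(pods, DENSITY_CAPPED):>5}"
    )
```

For the 5m rung the exact capped total is 2,224, which the table rounds to ~2,200.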
Binding constraint per rung (what stops you, in order)
- 5k: nothing; fits comfortably in one region.
- 50k: distinct-host count vs. VM budget. Solved by (i) capped density (roughly halves the satellite hosts, 40 → 23, or ~24 hosts with the hub) and (ii) cross-region VMs (1:1 distinct hosts, sidestepping single-region packing). ~24 hosts / ~24–30 VMs.
- 500k: VM count (hundreds) + control-session transport (single relay → in-cloud / in-cluster) + hub sharding. Needs dedicated infrastructure.
- 5m: all of the above at 10×, meaning a dedicated large test cluster, a sharded engine, and an in-cluster transport. A distinct engineering effort, not a fleet-assembly task.
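The VM budget for a target distinct-host count can be sketched by treating the packing ratio as a simple multiplier. That is a simplification: real placement is not uniform, and the helper name below is hypothetical. The ratios are the ones quoted earlier (about 1 VM per physical host across regions, ~1.25–2 within a busy region).

```python
import math


def vms_needed(distinct_hosts: int, vms_per_physical_host: float) -> int:
    """VMs to request so they land on the target number of distinct hosts."""
    return math.ceil(distinct_hosts * vms_per_physical_host)


TARGET_HOSTS_50K = 24  # capped satellite hosts plus hub for the 50k rung

for label, packing in [
    ("cross-region (1:1)", 1.0),
    ("single region, light packing", 1.25),
    ("single region, heavy packing", 2.0),
]:
    print(f"{label}: {vms_needed(TARGET_HOSTS_50K, packing)} VMs")
```

Under these assumptions, the ~24–30 VM range quoted for the 50k rung corresponds to packing ratios between 1.0 and 1.25.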
Caveats & confidence
- Density is empirical but single-host. One satellite host was pushed to saturation (uncapped 250K = runaway; capped 200K clean / 250K saturated → ceiling ~225K capped, ~150K uncapped). Multi-host fleets add cross-host variance and shared-host (noisy-neighbor) contention — budget headroom; don’t provision at the saturation edge.
- Hub validated only to ~500K-Pod inventory. The single hub host sat at ~4 % CPU at the 5k rung. At 50M–500M the hub must shard; that scaling is unvalidated and is a separate axis from satellite host count.
- Lever assumptions: ~25K Pods per kwok cluster and ~96-vCPU hosts. Both change the arithmetic if changed.
- Cross-region latency is mostly fine. The latency-sensitive operator operations are host-local; only the hub control session crosses regions, and it has ample latency headroom. The scaling limit at hundreds+ of sessions is the single relay, not the latency.
Summary
| rung | clean hosts (capped) | verdict |
|---|---|---|
| 5k | ~4 | routine — published baseline |
| 50k | ~24 | reachable: capped density + cross-region spread, ~24–30 VMs |
| 500k | ~224 | needs dedicated cluster allocation + transport / hub-sharding work |
| 5m | ~2,200 | a purpose-built effort (cluster + sharded engine + in-cluster transport) |
The headline: scheduler-capping roughly halves the host count, and cross-region VMs remove the single-region packing wall — together they bring 50k within a modest VM budget. 500k and 5m are gated not by BigFleet’s engine (which stays light per-decision) but by raw VM count and two harness components — control-session transport and hub sharding — that would need dedicated work.