
Architecture Decision Records

ADR index

| # | Status | Title |
|---|--------|-------|
| 1 | Accepted | Record architecture decisions |
| 2 | Accepted | Coordinator topology: single region |
| 3 | Superseded | Shard snapshot: eventual consistency on the cycle hot path |
| 4 | Accepted | Incremental reconcile via since-revision |
| 5 | Accepted | Provider boundary is the validation point |
| 6 | Accepted | Shard self-registers via heartbeat |
| 7 | Accepted | Cluster-to-shard binding is operator-chosen at deploy time |
| 8 | Amended by ADR-0048 | Coordinator admin RPCs are leader-only and unauthenticated in v1 |
| 9 | Accepted | Reclaim uses policy/v1 eviction and async drain |
| 10 | Accepted | Minimum Kubernetes version 1.31 |
| 11 | Accepted | Bootstrap template is Helm values text template |
| 12 | Accepted | Helm charts published to GHCR as OCI artefacts |
| 13 | Accepted | Demand-to-inventory regimes and SLOs |
| 14 | Accepted | SLO posture: binding latency, not cycle wall-clock |
| 15 | Accepted | Realistic archetype improvements |
| 16 | Accepted | NodeStateUpdate carries node identity |
| 17 | Accepted | Per-CR binding latency vs fingerprint fanout |
| 18 | Accepted | Internal vs user-facing binding latency |
| 19 | Accepted | Phase 1 cloud vs bench discrepancy |
| 20 | Accepted | Internal binding latency SLO respects rollup interval |
| 21 | Accepted | Persistent execute pool |
| 22 | Accepted | Need.Count semantics — Pod count vs machine count, and where packing lives |
| 23 | Accepted | Real kube-scheduler in the scaletest harness, retire pod-shim’s binding role |
| 24 | Accepted | Co-location via podAffinity — the CoLocation CR field, roll-up aggregates |
| 25 | Accepted | The load-driver anchors sameRack groups — a gang-scheduler stand-in |
| 26 | Accepted | The scaletest harness must model the Speculative tier |
| 27 | Accepted | Roll-up demand is a constrained aggregate resource request, not (per-pod-shape, count) |
| 28 | Accepted | Cycle-p99 SLO is regime-parametric; the realistic catalog scales with Need cardinality |
| 29 | Accepted | Phase 1 Omega-style OCC — shared-state, commit-broker priority, dual-mode commits |
| 30 | Proposed | Incremental Phase 1 — delta-only processing as a layered optimization |
| 31 | Proposed | ParSync-style partitioned synchronization — conditional follow-on for raised per-shard ceilings |
| 32 | Accepted | Realistic catalog production-calibrated workload distribution |
| 33 | Rejected | Phase 1 supply-credit must respect bind readiness, not just provider state — superseded by ADR-0035 |
| 34 | Accepted | Scaletest is bring-your-own-substrate |
| 35 | Accepted | Scaletest SLOs are measured at steady state under churn, not at ramp |
| 36 | Accepted | Phase 3 reclaim must not fire before a cluster’s first rollup has arrived |
| 37 | Accepted | Scaletest catalog node-affinity dimensions must be realistic — drop synthetic team/app label axes |
| 38 | Accepted | Scaletest workloads are controller-managed objects (Deployment / StatefulSet), not bare Pods |
| 39 | Accepted | One CapacityRequest per Pod — not per unschedulable Pod; the demand signal must be total, not unmet |
| 40 | Accepted | Same-domain attribution is unified — every supply-crediting site is domain-aware |
| 41 | Accepted | Sub-machine Same-Needs fold into atomic aggregates — Same is for cross-machine topology |
| 42 | Accepted | Unsatisfiable-regime Same-domain choice is sticky at equal coverage — switch only for strictly greater |
| 42a | Accepted | ADR-0042 Addendum: aged acquisition parking — group identity on the wire, park after 8 unsatisfiable cycles, re-probe every 32 |
| 43 | Accepted | Harness-observed triggers get a demand-realism check before mechanism ships |
| 44 | Accepted | Seed machine pools are sized by machine demand (pod share ÷ packing density, gang-aware per-zone floors), not workload weight |
| 45 | Accepted | Capacity counts for a cluster iff bound — Phase 3 reclaims on demand shrinkage only; BigFleet never models packing (author decision; supersedes its own first draft) |
| 46 | Accepted | Actuation safety rails — per-cluster reclaim blast-radius cap, empty-roll-up quarantine, global kill switch |
| 47 | Accepted | Coordinator quorum formation by ordinal join; offline snapshot restore as single-voter recovery |
| 48 | Accepted | Opt-in file-based mTLS with bigfleet:// URI SAN identity binding — supersedes ADR-0008’s transport posture |
| 49 | Accepted | Idle→Speculative release (paper §8’s other half) — per-CapacityType idle holds inside Phase 3; the hold window is the rail, not a cap |
| 50 | Accepted | Realism catalog (realistic.yaml) calibrated to a realistic MACHINE fleet via per-archetype node-packing density; GPU inference densified (8/node), training whole-machine (1); amends M66.2 + ADR-0044 (author decision) |
| 51 | Accepted | Same-domain choice follows THIS gang’s bindings (gang-granular attribution) — record Need.Group on the binding, break capped-coverage ties on gang-own coverage; refines ADR-0045, fixes M77g (author decision) |
| 52 | Accepted | The shard counts its own in-flight provision commitment against the deficit — credit attributed Creating machines in the coverage walk; amends ADR-0045’s “no in-flight discounting” one state earlier, fixes the #66/#74 pre-Configuring runway over-acquire (author decision) |
| 53 | Deferred | Two-axis machine-state model (provisioned × bound + op annotation) — scouted as an alternative to ADR-0052 and judged worse for the over-acquire (doesn’t fix it; 149-ref blast; raises correctness surface); deferred as a standalone future ergonomics initiative, wire-frozen, post-ladder (author decision) |
| 54 | Accepted | Steady pod-bind SLO reframe under an uncapped real scheduler — release gate moves off the end-to-end pod-bind p99 (uncapped-scheduler / reprovision-bound, not BigFleet’s deliverable) onto BigFleet’s capacity-delivery hops (configure-phase p99, Bootstrap success ratio, node-state-update p99, shortfalls==0) plus a loose end-to-end p50 liveness floor; the end-to-end p99 becomes informational (author decision) |
| 55 | Proposed | Coordinator-driven cross-shard rebalancing (realises bigfleet.md §9: transfer idle → reassign quota → cross-shard preempt) — a leader-only tiered rebalancer + the three stub handlers made real, reusing the M20/M69 drain path; anti-oscillation via cooldown + demand-pull invariant; machine-ids donor-resolved, ownership via shard-local persisted owned-set (author decided to BUILD not remove, 2026-06-19; Proposed pending staged-build greenlight) |
| 56 | Accepted | Coverage credit gated on observed node readiness — Option A (provider-contract obligation): Configure must not report Configured until the node is observed Ready, enforced by a new conformance cluster-join scenario (no shard change); closes the S1 silent false-Configured → phantom-capacity hole that bootstrapSuccessRatio (reported failures) and ADR-0033 (ramp throughput) do not cover (author decision) |
| 57 | Accepted | P0: shard emits NodeStateUpdate on reconcile-observed transitions + resyncs node state on operator (re)connect — notifyNodeState fired only from the worker/applyTransition path, so async (providerkit) providers, which reach terminal Configured via reconcile, were invisible to the operator (workload never schedules); the in-process fake masked it and the assumed reconnect resync was never built. Shard→operator only, static stability preserved (author decision) |
| 58 | Accepted | Shard→provider fencing high-water mark is per (shard_id, machine_id), not per shard_id — a single live shard’s concurrent execute pool draws monotonic sequence numbers but races the sends, so a per-shard mark fenced the shard against its own out-of-order arrivals on different machines (false zombie → ~30/120 machines bricked at execute-concurrency 32). Per-machine keying stays monotonic (shard serializes per machine) while letting concurrent cross-machine ops proceed; a true zombie is still caught on epoch. Dir 3 (serialize stamp+send) refuted (server-side goroutine race). Contract + conformance (B302 broadened) + snapshot-format change; surfaced by bigfleet-demo (author decision) |
| 59 | Accepted | P0: async-provider drain finalizes via reconcile — executeDrain applied the terminal binding-clear (Cluster/Assigned* = "") onto the transitional Draining ack an async (providerkit) provider returns, setting Draining-without-a-cluster and tripping the invariant → every Reclaim/Preempt drain failed, capacity never released. Fix: clear only on terminal Idle (mirroring executeDelete); the async Draining ack is left Draining-with-cluster and finalized via the ADR-0057 reconcile path, which also clears Assigned* on a transition to an unbound state. Fake gains DrainStaged to model it. Shard-local, sync path byte-identical; third bigfleet-demo async gap (author decision) |
| 60 | Accepted (ListQuotas/ListProviders later removed as dormant scaffolding) | A read-only coordinator SAN role (bigfleet://readonly) + general-purpose read RPCs — splits the coordinator’s authenticated surface so read RPCs (ListShards/ListDomainAssignments/ListQuotas/ListProviders/ListShardReports) accept bigfleet://readonly OR admin while mutating RPCs stay admin-only; a read-only dashboard/CLI cert then can’t change the fleet (closes the K8s-Dashboard over-privileged-read footgun). Adds ListShardReports (leader-local soft-state snapshot per shard: ShardSummary + top-N Shortfall, carries received_at) and ListProviders. General-purpose, no hot-path dependency; amends ADR-0048; motivated by bigfleet-web-dashboard (author decision) |
| 61 | Accepted (amended 2026-06-28: matching-supply cardinality + preemption-summary + same-candidate decision-context fields) | A shard-side read-only needs-inspection RPC — the only surface that can answer “which of a cluster’s needs are satisfied vs unmet, and why”, because the live NeedsTable lives in the shard and the coordinator only holds an aggregated/anonymous/requirements-stripped top-100 shortfall ledger. New readonly-gated (bigfleet://readonly, mirrors ADR-0060) streaming, per-cluster-filtered RPC on a dedicated read-only service on the shard’s gRPC server, returning per-Need last-cycle verdicts (satisfied / residual-deficit vector / claimed counts / Same domain + satisfiability / acquisition-parked / unmet_reason); retained as a trimmed projection behind a build-then-swap RWMutex at the existing recordShortfalls capture point. Static-stability-safe (read of retained shard-local state, no coordinator import). Reason taxonomy is two-tier: SATISFIED + TOPOLOGY_UNSATISFIABLE(Same) are pure retain (no engine change); PRIORITY_STARVED/NO_MATCHING_SUPPLY/PREEMPTION_EXHAUSTED need cheap behaviour-preserving OCC/Phase-2 instrumentation — author built both tiers now. General-purpose (CLI + dashboard consumers); motivated by the bigfleet-web-dashboard needs explorer (author decision) |
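
ADR-0058's choice of fencing key is easiest to see in a toy model. The following is a minimal Python sketch, not BigFleet's provider contract or snapshot format: the `Op` wire shape (`shard_id`, `epoch`, `machine_id`, `seq`) and the `PerShardFence` / `PerMachineFence` classes are hypothetical. It contrasts a per-`shard_id` high-water mark, which fences a live shard against its own racing sends to different machines, with a per-`(shard_id, machine_id)` mark, which admits those sends while a stale-epoch zombie is still rejected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """One shard->provider operation (hypothetical wire shape for this sketch)."""
    shard_id: str
    epoch: int       # assumed to be bumped on shard restart; an older epoch marks a zombie
    machine_id: str
    seq: int         # drawn monotonically per shard, but concurrent sends can race


class PerShardFence:
    """The rejected keying: one high-water mark per shard_id."""

    def __init__(self) -> None:
        self.hwm: dict[str, tuple[int, int]] = {}

    def admit(self, op: Op) -> bool:
        mark = (op.epoch, op.seq)
        if op.shard_id in self.hwm and mark <= self.hwm[op.shard_id]:
            return False  # any older arrival is fenced, even one aimed at another machine
        self.hwm[op.shard_id] = mark
        return True


class PerMachineFence:
    """ADR-0058 keying (sketch): high-water mark per (shard_id, machine_id), epoch per shard."""

    def __init__(self) -> None:
        self.epoch: dict[str, int] = {}
        self.hwm: dict[tuple[str, str], tuple[int, int]] = {}

    def admit(self, op: Op) -> bool:
        # A true zombie (stale epoch) is rejected whichever machine it targets.
        if op.epoch < self.epoch.get(op.shard_id, op.epoch):
            return False
        key = (op.shard_id, op.machine_id)
        mark = (op.epoch, op.seq)
        # The shard serializes ops per machine, so (epoch, seq) must strictly increase here.
        if key in self.hwm and mark <= self.hwm[key]:
            return False
        self.epoch[op.shard_id] = op.epoch
        self.hwm[key] = mark
        return True


# Two concurrent execute-pool sends from one live shard arrive out of order.
racing = [Op("s1", 1, "m-b", 2), Op("s1", 1, "m-a", 1)]
shard_fence, machine_fence = PerShardFence(), PerMachineFence()
print([shard_fence.admit(op) for op in racing])    # [True, False]: a false zombie
print([machine_fence.admit(op) for op in racing])  # [True, True]

# The shard restarts into epoch 2; a delayed epoch-1 send is a true zombie.
print(machine_fence.admit(Op("s1", 2, "m-c", 1)))  # True
print(machine_fence.admit(Op("s1", 1, "m-d", 3)))  # False
```

Rejecting an exact replay of the same `(epoch, seq)` on a machine is an assumption of this sketch; a real provider might instead treat it as an idempotent retry.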
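
The aged acquisition parking in ADR-0042a (park after 8 unsatisfiable cycles, re-probe every 32) can be pictured as a small per-group state machine. This is a minimal Python sketch, not the shard's implementation: `AcquisitionParker` and `GroupState` are hypothetical names, and it assumes that the 8 unsatisfiable cycles must be consecutive, that a satisfiable probe unparks the group, and that re-probes land on every 32nd cycle counted from the parking cycle.

```python
from dataclasses import dataclass
from typing import Optional

PARK_AFTER = 8      # unsatisfiable cycles before a group is parked (ADR-0042a)
REPROBE_EVERY = 32  # cycles between re-probes while parked (ADR-0042a)


@dataclass
class GroupState:
    unsatisfiable_streak: int = 0
    parked_since: Optional[int] = None


class AcquisitionParker:
    """Hypothetical per-group parking state, keyed by the group identity carried on the wire."""

    def __init__(self) -> None:
        self.groups: dict[str, GroupState] = {}

    def should_attempt(self, group: str, cycle: int) -> bool:
        st = self.groups.setdefault(group, GroupState())
        if st.parked_since is None:
            return True
        elapsed = cycle - st.parked_since
        # Assumption: re-probe on every 32nd cycle after the cycle the group was parked.
        return elapsed > 0 and elapsed % REPROBE_EVERY == 0

    def record(self, group: str, cycle: int, satisfiable: bool) -> None:
        st = self.groups.setdefault(group, GroupState())
        if satisfiable:
            st.unsatisfiable_streak = 0
            st.parked_since = None  # assumption: a successful probe unparks the group
            return
        st.unsatisfiable_streak += 1
        if st.parked_since is None and st.unsatisfiable_streak >= PARK_AFTER:
            st.parked_since = cycle


# A gang that stays unsatisfiable for 100 cycles.
parker = AcquisitionParker()
attempts = []
for cycle in range(1, 101):
    if parker.should_attempt("gang-a", cycle):
        attempts.append(cycle)
        parker.record("gang-a", cycle, satisfiable=False)
print(attempts)  # [1, 2, 3, 4, 5, 6, 7, 8, 40, 72]
```

Under these assumptions an unsatisfiable gang costs acquisition work on 10 of 100 cycles instead of all 100, while still being noticed within 32 cycles once supply appears.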
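
The seed-pool sizing rule in ADR-0044, combined with ADR-0050's per-archetype packing densities (GPU inference 8 pods per node, training whole-machine at 1), reduces to a small calculation. The helper below, `seed_machines`, is hypothetical, and its reading of "gang-aware per-zone floors" is an assumption made only for this sketch: every zone must be able to host one whole gang. Exact `Fraction` arithmetic avoids float rounding ahead of the ceiling.

```python
import math
from fractions import Fraction
from typing import Optional


def seed_machines(total_pods: int, pod_share: Fraction, density: int,
                  zones: int = 1, gang_size: Optional[int] = None) -> int:
    """Machines to seed for one archetype: pod share divided by packing density.

    Hypothetical helper. The gang floor models the assumption that each of
    `zones` zones can place one whole gang of `gang_size` pods.
    """
    pods = total_pods * pod_share          # this archetype's pods, kept exact
    machines = math.ceil(pods / density)   # pods per node -> machines
    if gang_size is not None:
        per_zone_floor = math.ceil(Fraction(gang_size, density))
        machines = max(machines, per_zone_floor * zones)
    return machines


# GPU inference: 40% of 1000 pods at 8 per node -> 400 / 8 = 50 machines.
print(seed_machines(1000, Fraction(40, 100), density=8))
# Training: 2% of 1000 pods is 20 machines at 1 per node, but a 16-pod gang
# in each of 3 zones raises the floor to 48.
print(seed_machines(1000, Fraction(2, 100), density=1, zones=3, gang_size=16))
```

Sizing by workload weight alone would give the training archetype 2% of the pool; dividing by density and applying the gang floor is what separates machine demand from pod share.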