Decision map — every ADR, where it lives in code, and what guards it

This is the implementation map for BigFleet’s architecture decisions, a single navigable page that takes each ADR and answers two questions: where does the decision actually live in the tree, and which test keeps it honest? It is the maintainer’s companion to the ADRs themselves. Open it when you are about to change code that an ADR constrains, and it will point you at the files and the *_test.go files that fail if you get it wrong.

It is not the canonical status table. That is ../adr/index.md — the single source of truth for each ADR’s status line (Accepted / Proposed / Rejected / Superseded / Amended). This page copies status lines faithfully but defers to the index; if they ever disagree, the index wins and this page is stale. Likewise, the prose deep-dives under README.md explain the why function-by-function; this page is the index into them, organised by decision rather than by subsystem.

When a code path here disagrees with a higher authority, the higher authority wins and the divergence is documented (not papered over). The source-of-truth ordering (see ../index.md’s “When the docs disagree”) is:

1. The two papers: ../papers/bigfleet.md, ../papers/fleet-scale-kubernetes.md.
2. Author decisions in ../adr/.
3. ../plan.md.
4. The code.

A handful of decisions are spec-only: the ADR was Accepted purely as a framing, was Proposed or Rejected and never shipped a mechanism, or had its mechanism removed by a superseding ADR. Those rows say spec-only / superseded explicitly rather than inventing a code path. Trust that label: it means the code genuinely is not there.

Decision lineage

Most decisions stand alone. The ones that don’t form chains; reading the chain is the only way to understand why the current code looks the way it does (a row’s “implemented in” often reflects the last link, not the ADR you started from).

ADR-0003 (shard snapshot eventual consistency) — superseded, mechanism removed. The original background-fold goroutine + CycleSnapshot() was reversed at M44.4 Drop A (synchronous Snapshot()) and the fold loop / live triple-indexes were deleted at M66.1. The ADR’s mechanism no longer exists in the tree; pkg/inventory/inventory.go and pkg/shard/shard.go carry supersession comments.
ADR-0008 transport posture — superseded by ADR-0048. ADR-0008’s leader-only RPC contract stands. Its “v1 ships unauthenticated, wrap it in a sidecar” posture is replaced by ADR-0048’s opt-in file-based mTLS with bigfleet:// URI-SAN identity binding.
ADR-0013 → ADR-0014 → ADR-0017 → ADR-0018 (the SLO arc). ADR-0013’s three-regime / cycle-p99 release gate was reframed by ADR-0014 (binding latency becomes the gate, cycle wall-clock a tracked envelope); ADR-0014 was built on by ADR-0017 (per-Pod histogram as the gate source) and amended by ADR-0018 (the harness metric is internal-only; renamed internalBindingLatencyP99Seconds). ADR-0013’s named three-regime/convergence-rate scheme was never built — spec-only.
ADR-0033 — Rejected, superseded by ADR-0035. The bind plateau ADR-0033 targeted was a kube-scheduler ramp property, not a BigFleet steady-state bug. No code shipped; ADR-0035 moved the fix to the harness (“measure SLOs at steady state under churn, not at ramp”) and was itself amended 2026-06-14 (reclaim settle-window + bounded floor, accepting the ADR-0021 async-actuation floor).
ADR-0045 supersedes its own first draft. The withdrawn first draft proposed operator-reported per-machine consumption (rejected as scheduler-shadowing). The accepted rule is the single attribution contract: capacity counts for a cluster iff it is bound to it. M68 (“single attribution”) dissolves into it.
The domain-attribution arc: ADR-0040 → 0041 → 0042 → 0042-addendum → 0045 → 0051. This is the longest chain and the one most worth reading end-to-end before touching pkg/decision/occ/samebucket.go.
- 0040 makes every supply-crediting site Same-domain-aware; its Addendum chooses the Same domain once per Need per cycle jointly over creditable + acquirable supply.
- 0041 folds sub-machine Same-Needs into atomic aggregates (Same is for cross-machine topology only).
- 0042 makes unsatisfiable-regime domain choice sticky at equal coverage.
- 0042-addendum engages the named escalation path: aged acquisition parking (group ID on the wire, parkAfterCycles=8, reprobeEveryCycles=32).
- 0045 then changes the accounting rule underneath all of it (bound-vs-demand), removing the Bootstrap≈Reclaim oscillation class by construction.
- 0051 refines 0045 to gang granularity (bigfleet.lucy.sh/assigned-group), pinning the domain choice and the claimed set to a fixed point through the bootstrap dwell.
The deep-dive prose for this arc is domain-attribution.md (companion page; see also phase1-occ.md and needs-table.md).
ADR-0042-addendum is the parking cautionary tale. It is the single largest pool of unforced engine complexity — built rigorously against a demand shape one catalog archetype fabricated. That is exactly what ADR-0043 (“harness-observed triggers get a demand-realism check before mechanism ships”) exists to catch; ADR-0043 is codified as a working-discipline rule, and ADR-0050 / ADR-0044 are downstream “fix the harness and re-measure” applications of it.
The Phase-1 engine arc: ADR-0022 → 0027 → 0028 → 0029 (→ 0030, 0031 proposed). Need.Count is pod count (0022) → roll-up demand is a constrained aggregate resource request (0027) → cycle-p99 is regime-parametric (0028) → Phase 1 becomes Omega-style OCC (0029, which supersedes 0028’s OCC-deferral). ADR-0030 (incremental delta-only) and ADR-0031 (ParSync partitioning) are Proposed conditional follow-ons to 0029 — spec-only until measurement promotes them.
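
The sticky rule that ADR-0042 adds to the domain-attribution arc can be illustrated in a few lines. This is a minimal sketch, not the code in pkg/decision/occ/samebucket.go: the `domain` type, its fields, and `chooseSticky` are hypothetical names, and the count tie-break that the ADR places between incumbency and lexicographic order is left out for brevity.

```go
package main

import "fmt"

// domain is a hypothetical summary of one candidate Same-domain, e.g. a rack.
type domain struct {
	Name       string
	Coverage   int  // units of the Need this domain could cover
	Creditable bool // the cluster already holds bound supply here (incumbent signal)
}

// chooseSticky returns the preferred domain. A rival wins only on strictly
// greater coverage; at equal coverage the incumbent wins, and any remaining
// tie falls back to lexicographic name order. No state survives the call.
func chooseSticky(cands []domain) (domain, bool) {
	if len(cands) == 0 {
		return domain{}, false
	}
	best := cands[0]
	for _, c := range cands[1:] {
		switch {
		case c.Coverage != best.Coverage:
			if c.Coverage > best.Coverage {
				best = c
			}
		case c.Creditable != best.Creditable:
			if c.Creditable {
				best = c
			}
		case c.Name < best.Name:
			best = c
		}
	}
	return best, true
}

func main() {
	cands := []domain{
		{Name: "rack-a", Coverage: 4},
		{Name: "rack-b", Coverage: 4, Creditable: true},
	}
	got, _ := chooseSticky(cands)
	fmt.Println("exact tie:", got.Name)

	cands[0].Coverage = 5 // rack-a now covers strictly more
	got, _ = chooseSticky(cands)
	fmt.Println("strictly greater coverage:", got.Name)
}
```

Because the comparison is a total order, the result does not depend on candidate order, which is the property that stops claim-walk perturbations from flipping an exact tie. It does nothing for near-ties, which is the gap the 0042-addendum goes on to address.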

The map

Code and test paths below are reproduced from the grounded digest. Where a decision shipped nothing, the cell reads spec-only (see the lineage above for why). ADR numbers link to the record; implementation paths are exact.

Architecture & topology

ADR	Title	Status	Decision	Why	Implemented in	Guarded by
0002	Coordinator topology — single Raft group, single region (v1)	Accepted	v1 coordinator is one 3-replica Raft group in one region (3 AZs), `hashicorp/raft` + BoltDB on local disk; its region is a documented SPOF for cross-shard rebalancing only.	Simplest well-trodden shape; static stability makes a regional coordinator outage degraded, not lost, so the SPOF is defensible.	`pkg/coordinator/coordinator.go`, `pkg/coordinator/fsm.go`, `pkg/coordinator/join.go`, `go.mod`	`pkg/shard/no_coordinator_dep_test.go`, `pkg/coordinator/coordinator_test.go`, `pkg/coordinator/join_test.go`
0006	Shards self-register via the `ReportShard` heartbeat	Accepted	`ShardReport` gains optional `shard_address` (field 8); first report from an unknown shard Raft-Applies `AddShard{ID,Address}` synchronously, then the cheap `MarkHeartbeat` path; `ErrShardExists` swallowed.	Folding registration into the heartbeat avoids a second RPC, auth surface, and startup retry path.	`pkg/coordinator/grpc_server.go`, `pkg/coordinator/fsm.go`, `pkg/coordinator/state.go`, `api/proto/bigfleet/v1alpha1/coordinator.proto`	`pkg/coordinator/grpc_server_test.go`, `pkg/coordinator/coordinator_test.go`, `pkg/coordinator/grpc_server_identity_test.go`
0007	Cluster-to-shard binding is operator-chosen at deploy time	Accepted	The operator’s `--shard-addr` flag (from the chart’s `shardAddress`) is the canonical binding; first-contact-wins, re-bind needs a chart upgrade + restart; reconnects dial the same static StatefulSet DNS.	Static addressing keeps the coordinator out of the data-plane dial/reconnect path, preserving static stability.	`cmd/operator/main.go`, `deploy/helm/bigfleet-operator/values.yaml`, `deploy/helm/bigfleet-operator/templates/deployment.yaml`, `pkg/operator/operator.go`, `pkg/operator/stream.go`, `pkg/shard/session.go`	`pkg/operator/operator_test.go`
0047	Coordinator quorum by ordinal join; offline snapshot restore	Accepted (M75)	StatefulSet pattern: ordinal 0 honours `--bootstrap`, ordinals >0 join the leader via leader-only `JoinRaftCluster`; idempotent re-join; `bigfleetctl snapshot save/restore` rebuilds a stopped node as a single-voter, others re-form quorum via join.	The “3-replica HA” install actually bootstrapped three independent single-node clusters (`AddVoter` had zero callers) and DR had no restore tool.	`pkg/coordinator/join.go`, `pkg/coordinator/coordinator.go`, `pkg/coordinator/grpc_server.go`, `pkg/coordinator/snapshot_restore.go`, `pkg/coordinator/snapshot_export.go`, `cmd/bigfleet/coordinator.go`, `cmd/bigfleetctl/main.go`, `deploy/helm/bigfleet/templates/coordinator-statefulset.yaml`, `deploy/helm/bigfleet/values.yaml`	`pkg/coordinator/join_test.go`, `test/integration/raft_quorum_test.go`

Decision engine & cost

ADR	Title	Status	Decision	Why	Implemented in	Guarded by
0003	Shard inventory snapshots eventually consistent on the cycle hot path	Superseded (M44.4 Drop A; fold goroutine + live triple-indexes removed at M66.1)	Originally: cycle read an O(1) `CycleSnapshot()` from a background debounced fold goroutine. Reversed: cycle now uses synchronous `Snapshot()`; the fold loop and `CycleSnapshot` were removed.	Eventual consistency was safe (idempotent actions, stale-reject `Apply`), but synchronous `Snapshot()` was simpler and made the fold goroutine redundant.	spec-only — mechanism removed; supersession comments at `pkg/inventory/inventory.go`, `pkg/shard/shard.go`; surviving `Inventory.Snapshot()` is what the cycle now uses	—
0019	Phase 1 cloud-vs-bench discrepancy — instrument before optimising	Accepted	Add per-sub-path Phase 1 instrumentation before touching `pkg/decision/`; rewrite the M38 failure injector to `ConfiguredCount()×ratePerSec` Poisson mean (default `1.16e-7`).	Cloud and bench disagreed 6000×; optimising on bench would optimise the wrong code.	`pkg/metrics/metrics.go`, `cmd/bigfleet/shard.go`, `pkg/provider/fake/fake.go`, `test/scaletest/chart/values.yaml`, `test/scaletest/chart/templates/shard.yaml`, `pkg/decision/phase1_realistic_bench_test.go` (note: the Phase 1 sub-path histograms are defined but no longer observed — the allocator they targeted became the OCC broker, ADR-0027/0029)	`pkg/decision/phase1_realistic_bench_test.go`, `pkg/decision/phase1_takecolocated_bench_test.go`, `pkg/needs/snapshot_bench_test.go`
0021	Persistent execute pool — decouple action execution from the cycle barrier	Accepted	Replace per-cycle dispatch + `wg.Wait` with a shard-scoped persistent worker pool draining a bounded `actionQueue`, each action capped by `ExecuteTimeout` (30 s); cycle enqueues and returns, dropping (and counting) on full queue.	`wg.Wait` made wall-clock = max(action latency) and cascaded cycle-ctx cancellation into machine state, capping throughput.	`pkg/shard/shard.go`, `pkg/metrics/metrics.go`	`pkg/shard/shard_test.go`, `pkg/shard/execute_drain_test.go`, `pkg/shard/reconcile_test.go`
0022	`Need.Count` is Pod count, not machine count	Accepted (predecessor of ADR-0027)	Treat `Need.Count` as Pod count; Phase 1/3 compute machines by diffing aggregate demand (`Profile.Resources×Count`) against `Σ Machine.Allocatable` in resource-vector space, taking the bottleneck dimension.	The impl had drifted to one Bootstrap per Pod, over-provisioning by the density factor when Pods shared a Profile.	`pkg/decision/phase1_assign.go`, `pkg/needs/needs.go`, `pkg/decision/phase3_reclaim.go` (final code is the ADR-0027 form — `PodsPerMachine`/`densityFor` removed)	`pkg/decision/phase1_same_test.go`
0027	Roll-up demand is a constrained aggregate resource request	Accepted	`CapacityNeed` redefined: `aggregate_resources` replaces per-pod resources, `min_unit` is the atomic schedulable unit, `count` removed; Phase 1 supply is `Σ Machine.Allocatable` counted once per machine (no density projection).	Per-fingerprint dedicated-density accounting over-credited phantom capacity when fingerprints shared physical eligibility, masking real deficits (shortfalls=0 while pods stuck).	`api/proto/bigfleet/v1alpha1/capacity.proto`, `pkg/proto/bigfleet/v1alpha1/capacity.pb.go`, `pkg/decision/phase1_assign.go`, `pkg/decision/occ/cycle.go`, `pkg/decision/occ/seed.go`, `pkg/decision/occ/state.go`, `pkg/decision/match.go`	`pkg/decision/phase1_test.go`, `pkg/decision/phase1_realistic_test.go`, `pkg/decision/phase1_same_test.go`, `pkg/decision/phase1_spread_test.go`, `pkg/decision/phase3_test.go`, `pkg/decision/integration_test.go`, `pkg/decision/occ/cycle_test.go`
0028	Cycle-p99 SLO is regime-parametric; realistic catalog scales with Need cardinality	Accepted (OCC-deferral superseded by ADR-0029)	The 100 ms cycle-p99 bar applies only to the aggregated regime; the realistic regime is graded on a per-Need Phase 1 p99 bar (≤200 µs, later demoted to aspirational) plus relaxed envelopes scaling with Need cardinality.	Each `sameRack` group becomes its own Need, so Phase 1 wall-clock scales with Need cardinality; the absolute cycle bar grades the workload, not BigFleet.	spec-only (SLO framing; no constants in tree)	`pkg/decision/phase1_uber5k_bench_test.go`, `pkg/decision/phase1_realistic_bench_test.go`, `pkg/decision/phase1_takecolocated_bench_test.go`
0029	Phase 1 Omega-style OCC — shared-state, commit-broker priority, dual-mode commits	Accepted	Phase 1 redesigned as Omega-style OCC: shared immutable snapshot, single unordered Need queue served by a worker pool, single mutex-guarded commit broker doing per-bucket seqno CAS; priority enforced reactively at the broker (displacement + re-queue); `ModeIncremental`/`ModeAllOrNothing`; bounded retries → shortfall.	Constant-factor optimisation of the single-threaded sorted loop can’t reach ADR-0028’s envelope; the only lever is iteration-count reduction via intra-shard concurrency.	`pkg/decision/occ/cycle.go`, `pkg/decision/occ/broker.go`, `pkg/decision/occ/state.go`, `pkg/decision/occ/types.go`, `pkg/decision/occ/seed.go`, `pkg/decision/occ/candidates.go`, `pkg/decision/occ/samebucket.go`, `pkg/decision/occ/samesupply.go`, `pkg/decision/occ/poolcache.go`, `pkg/decision/occ/match.go`, `pkg/decision/phase1_assign.go`	`pkg/decision/occ/broker_test.go`, `pkg/decision/occ/cycle_test.go`, `pkg/decision/occ/displacement_test.go`, `pkg/decision/occ/state_test.go`, `pkg/decision/occ/candidates_test.go`, `pkg/decision/occ/samebucket_test.go`, `pkg/decision/occ/incumbency_repro_test.go`, `pkg/decision/occ/samesupply_bench_test.go`
0030	Incremental Phase 1 — delta-only processing	Proposed (conditional follow-on to 0029)	Layer a delta-only fast path over OCC: per-Need digest + inventory transition events detect changed Needs/machines; only the delta is OCC-processed, drift caught by digest check + periodic full re-sync.	Single-pass OCC is still O(NeedsTable); in steady state most Needs don’t change, so re-processing them is wasted work.	spec-only (the unrelated `--incremental-reconcile` flag is ADR-0004’s `since_revision` cursor, not this)	—
0031	ParSync-style partitioned synchronization	Proposed (conditional follow-on to 0029)	Record (don’t build) a ParSync design partitioning the OCC claimed-set into P partitions, refreshing one per worker per cycle; promotion gated on measured conflict-rate ≥0.3 or a >500K ceiling raise.	Above 500K BigFleet scales by adding shards, not bigger ones, so the contention win is zero until a ceiling raise exists — YAGNI.	spec-only	—
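
The core of ADR-0029's commit step is a compare-and-swap on a per-bucket sequence number, with bounded retries that end in a shortfall. The sketch below shows only that piece. `broker`, `read`, `commit`, and `claim` are hypothetical names, and priority displacement, re-queueing, and the `ModeIncremental`/`ModeAllOrNothing` split are all omitted.

```go
package main

import (
	"fmt"
	"sync"
)

// broker is a hypothetical commit broker: one mutex, one seqno per bucket.
type broker struct {
	mu  sync.Mutex
	seq map[string]uint64
}

func newBroker() *broker { return &broker{seq: map[string]uint64{}} }

// read returns the bucket's current seqno for a worker to plan against.
func (b *broker) read(bucket string) uint64 {
	b.mu.Lock()
	defer b.mu.Unlock()
	return b.seq[bucket]
}

// commit succeeds only if the bucket is unchanged since the worker read it,
// then bumps the seqno so that concurrent planners see a conflict.
func (b *broker) commit(bucket string, seen uint64) bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.seq[bucket] != seen {
		return false
	}
	b.seq[bucket]++
	return true
}

// claim runs read, plan, commit up to maxRetries times.
// A false result models a Need falling through to shortfall.
func claim(b *broker, bucket string, maxRetries int, plan func(seen uint64)) bool {
	for attempt := 0; attempt < maxRetries; attempt++ {
		seen := b.read(bucket)
		plan(seen) // planning happens outside the broker lock
		if b.commit(bucket, seen) {
			return true
		}
	}
	return false
}

func main() {
	b := newBroker()
	results := make([]bool, 4)
	var wg sync.WaitGroup
	for i := range results {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i] = claim(b, "rack-a", 8, func(uint64) {})
		}(i)
	}
	wg.Wait()
	fmt.Println(results, b.read("rack-a"))
}
```

Planning outside the lock is the point of the design: workers compute against a snapshot in parallel and only serialise for the brief seqno check.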

Capacity model & attribution

ADR	Title	Status	Decision	Why	Implemented in	Guarded by
0036	Phase 3 reclaim must not fire before a cluster’s first rollup	Accepted	Shard tracks `firstRollupReceived[ClusterID]` (set true on first `RollupReport`, even an empty one); Phase 3 early-returns for any cluster whose flag is false.	An empty NeedsTable at startup is indistinguishable from “no demand”; without the gate Phase 3 would reclaim the whole fleet before operators reconnect (static-stability violation).	`pkg/shard/shard.go`, `pkg/decision/phase3_reclaim.go`	`pkg/decision/phase3_test.go`, `pkg/shard/safety_test.go`
0040	`Same`-domain attribution is unified — every supply-crediting site is domain-aware	Accepted	Every supply-crediting site becomes domain-aware for `Same`-Profiles, mirroring `FindSame`’s single-best-bucket rule; Addendum: the `Same` domain is chosen once per Need per cycle, jointly over creditable + acquirable supply, with Phase 3 mirroring identical scoring.	Crediting was vacuous (across domains) while acquisition was strict — Phase 1 chased un-finishable gangs, Phase 3 reclaimed the over-provision: a self-sustaining Bootstrap≈Reclaim equilibrium.	`pkg/decision/occ/seed.go`, `pkg/decision/occ/samebucket.go`, `pkg/decision/occ/cycle.go`, `pkg/decision/phase3_reclaim.go`, `pkg/decision/phase1_assign.go`, `pkg/decision/occ/types.go`	`pkg/decision/samebucket_test.go`, `pkg/decision/occ/samebucket_test.go`, `pkg/decision/occ/candidates_test.go`, `pkg/decision/integration_test.go`
0041	Sub-machine `Same`-Needs fold into atomic aggregates	Accepted	`NormalizeDemand`: a `Same`-Need that fits one matching machine folds into one plain Need (`min_unit` = one gang’s aggregate); Needs that fit no machine keep their per-gang `Same` Need. Riders: Phase 3 acquirable fold; `ChooseSameBucket` prefers creditable in the satisfiable regime.	ADR-0024+0039 reshaped demand into ~2,400 sub-machine gang Needs each up-rounding to a whole machine; a gang that fits one machine needs no `Same` machinery.	`pkg/decision/normalize.go`, `pkg/decision/occ/seed.go`, `pkg/decision/occ/samebucket.go`, `pkg/decision/phase3_reclaim.go`	`pkg/decision/normalize_test.go`, `sim/closedloop_test.go`
0042	Unsatisfiable-regime domain choice is sticky at equal coverage	Accepted	In `ChooseSameBucket`’s unsatisfiable regime, switch domains only for strictly greater coverage; at equal coverage the incumbent domain (creditable supply present) wins before count/lexicographic tie-breaks. Stateless.	Multi-machine GPU gangs no rack can host re-derived the joint domain from scratch each cycle; identical-total racks tied constantly, so claim-walk perturbations flipped the tie and drove ~27/sec Bootstrap↔Reclaim churn.	`pkg/decision/occ/samebucket.go`	`sim/m61_repro_test.go`
0042-addendum	Aged acquisition parking — the escalation path, engaged	Accepted (extends 0042 after PARTIAL cloud validation)	(1) group ID on the wire (`CapacityNeed.group`, field 9); (2) aged acquisition parking — at `parkAfterCycles=8` a persistently-unsatisfied class goes creditable-only; (3) re-probe every `reprobeEveryCycles=32`. Per-class age ledger on the shard only, no coordinator.	0042’s exact-tie pinning was too narrow: per-domain acquirable totals shift slightly each cycle so coverage is rarely exactly equal and the strictly-greater branch keeps firing on marginal deltas. This is the parking cautionary tale flagged by ADR-0043.	`api/proto/bigfleet/v1alpha1/capacity.proto`, `pkg/proto/bigfleet/v1alpha1/capacity.pb.go`, `pkg/needs/needs.go`, `pkg/shard/shard.go`, `pkg/decision/occ/cycle.go`, `pkg/decision/occ/seed.go`, `pkg/decision/occ/types.go`, `pkg/decision/phase1_assign.go`, `pkg/decision/phase2_inversions.go`, `pkg/decision/phase3_reclaim.go`	`pkg/shard/parking_test.go`, `pkg/decision/phase2_test.go`, `sim/m61_repro_test.go`
0045	Capacity counts for a cluster iff it is bound — BigFleet never models packing	Accepted (supersedes its own first draft; M68 dissolves in)	One rule: capacity counts iff bound (`Configure` is atomic fulfillment; the machine state machine is the only ledger). Phase 1 fulfills demand−bound; Phase 3 reclaim is triggered by demand shrinkage only; satisfied-but-stuck is the cluster’s problem. Per-machine consumed vectors / residual-fit / bound-open splits rejected by name.	Any arithmetic anticipating whether the cluster’s scheduler can use bound capacity shadows the scheduler (“not a scheduler” hard rule); the bound-vs-demand contract removes the Bootstrap≈Reclaim class by construction.	`pkg/decision/phase3_reclaim.go`, `pkg/decision/phase1_assign.go`	`sim/m67_repro_test.go`, `pkg/decision/phase3_test.go`, `pkg/decision/integration_test.go`, `sim/m73_release_test.go`
0051	`Same`-domain choice follows this gang’s bindings (gang-granular attribution) + M77h machine-selection	Accepted (refines 0045, does not reverse it; M77g + M77h)	Record the serving gang on each binding via additive `bigfleet.lucy.sh/assigned-group` (from `Need.Group` at Configure-time; machine gains `AssignedGroup`). `ChooseSameBucket` breaks capped-coverage ties on the gang’s OWN creditable coverage; M77h: `incumbentFirst` stably partitions a gang’s incumbents ahead of non-incumbents under stop-when-covered.	Cluster-granular coverage cannot tell “this domain holds my gang’s machines” from “an equal number of unrelated machines”; under ADR-0050’s bootstrap dwell the tie fell through to moving acquirable slack, causing a sustained domain-flap lockstep.	`pkg/machine/shardmetadata.go`, `pkg/machine/machine.go`, `pkg/decision/occ/samebucket.go`, `pkg/decision/occ/seed.go`, `pkg/decision/action.go`, `pkg/provider/fake/fake.go`	`pkg/decision/occ/samebucket_test.go`, `pkg/decision/occ/incumbency_repro_test.go`, `pkg/machine/shardmetadata_test.go`, `sim/incumbency_repro_test.go`, `sim/gang_dwell_test.go`, `test/conformance/metadata_test.go`

The full prose walkthrough of this group is domain-attribution.md (companion deep-dive), with supporting detail in phase1-occ.md and needs-table.md.
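
The aged-parking thresholds in the 0042-addendum row can be read as a small per-class state machine. The sketch below is one plausible interpretation, not the shard's implementation: `ageLedger` and `mayAcquire` are hypothetical names, and it assumes that age counts consecutive unsatisfied cycles, that satisfaction resets it, and that re-probes are spaced from the cycle at which the class parked.

```go
package main

import "fmt"

const (
	parkAfterCycles    = 8  // value named in ADR-0042-addendum
	reprobeEveryCycles = 32 // value named in ADR-0042-addendum
)

// ageLedger is a hypothetical per-class count of consecutive unsatisfied cycles.
type ageLedger map[string]int

// mayAcquire records one cycle's outcome for a class and reports whether the
// class may acquire new capacity this cycle. False means the class is parked,
// i.e. limited to creditable supply.
func (l ageLedger) mayAcquire(class string, satisfied bool) bool {
	if satisfied {
		delete(l, class) // assumption: satisfaction resets the age
		return true
	}
	l[class]++
	age := l[class]
	if age < parkAfterCycles {
		return true // still young: keep acquiring
	}
	// Parked. Assumption: allow one re-probe every reprobeEveryCycles cycles,
	// counted from the cycle at which the class parked.
	parkedFor := age - parkAfterCycles
	return parkedFor > 0 && parkedFor%reprobeEveryCycles == 0
}

func main() {
	l := ageLedger{}
	for cycle := 1; cycle <= 72; cycle++ {
		if l.mayAcquire("gpu-gang", false) && cycle >= parkAfterCycles {
			fmt.Println("re-probe at cycle", cycle)
		}
	}
}
```

The ledger is a plain map held by one shard, which matches the row's note that the age ledger lives on the shard only and involves no coordinator.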

Provider boundary

ADR	Title	Status	Decision	Why	Implemented in	Guarded by
0004	Incremental reconcile via `since_revision` — opt-in, deltas only	Accepted	`Config.IncrementalReconcile` (default false). Off = always-correct full `List()` + removal walk. On = pass `reconcileCursor` as `ListFilter.SinceRevision`, apply deltas, advance cursor, skip removal walk. Cursor is process-state; tombstones deferred.	Unfiltered `List` dominated the cycle (~87% at 500K); cursor deltas cut shard cycle p99 ~81% while the opt-in flag keeps the safe full-list default.	`pkg/shard/reconcile.go`, `pkg/shard/shard.go`, `pkg/provider/fake/fake.go`, `api/proto/bigfleet/v1alpha1/provider.proto`, `pkg/provider/provider.go`	`test/conformance/conformance_test.go`, `pkg/shard/cycle_phasedump_test.go`
0005	The provider boundary is the validation point; reconcile trusts domain types	Accepted (amended by ADR-0046 Addendum / M70)	`reconcile` applies provider machines directly to inventory without the `MachineToProto`+`MachineFromProto` round-trip; validation sits at each provider boundary (`pkg/conv`) and `inventory.Apply` is the apply-path net. (M70 re-added cost-field validation via `validateProviderMachine` on the slow path.)	The per-reconcile round-trip re-validated the same enum twice and dominated post-burst cycles; moving validation to the boundary dropped cycle mean ~24% at 500K.	`pkg/shard/reconcile.go`, `pkg/conv/conv.go`, `pkg/provider/grpcadapter/grpcadapter.go`, `pkg/provider/fake/fake.go`, `pkg/machine/machine.go`	`pkg/shard/reconcile_test.go`, `pkg/machine/machine_test.go`, `pkg/conv/conv_test.go`
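
The two reconcile modes in the ADR-0004 row differ in what they ask the provider for and in whether they can notice removals. The sketch below contrasts them with a fake provider. All types and names here are hypothetical, and treating `SinceRevision` as "revision strictly greater than the cursor" is an assumption.

```go
package main

import "fmt"

type machine struct {
	ID       string
	Revision uint64
}

// fakeProvider is a hypothetical provider whose list call honours a
// since-revision filter (0 means "everything").
type fakeProvider struct{ machines []machine }

func (p *fakeProvider) list(since uint64) []machine {
	var out []machine
	for _, m := range p.machines {
		if m.Revision > since {
			out = append(out, m)
		}
	}
	return out
}

type reconciler struct {
	incremental bool   // models Config.IncrementalReconcile (default false)
	cursor      uint64 // process-local; lost on restart
	inventory   map[string]machine
}

func (r *reconciler) reconcile(p *fakeProvider) {
	var since uint64
	if r.incremental {
		since = r.cursor
	}
	seen := map[string]bool{}
	for _, m := range p.list(since) {
		r.inventory[m.ID] = m
		seen[m.ID] = true
		if m.Revision > r.cursor {
			r.cursor = m.Revision
		}
	}
	if r.incremental {
		return // deltas only: no removal walk, so deletions are not observed
	}
	// Full mode: anything the provider no longer lists is removed.
	for id := range r.inventory {
		if !seen[id] {
			delete(r.inventory, id)
		}
	}
}

func main() {
	p := &fakeProvider{machines: []machine{{ID: "m1", Revision: 1}, {ID: "m2", Revision: 2}}}
	r := &reconciler{incremental: true, inventory: map[string]machine{}}
	r.reconcile(p)
	p.machines = []machine{{ID: "m1", Revision: 1}, {ID: "m3", Revision: 3}}
	r.reconcile(p)
	fmt.Println(len(r.inventory), r.cursor)
}
```

In the incremental run above, the deleted machine stays in the inventory because nothing reports its absence. That is the practical meaning of "tombstones deferred", and the reason the full-list path remains the default.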

Operator & CRDs

ADR	Title	Status	Decision	Why	Implemented in	Guarded by
0009	`ReclaimInstruction` uses policy/v1 Eviction and acks before drain completes	Accepted	Operator cordons each node synchronously, patches `UpcomingNode` to `Draining`, sends `ReclaimAck` (started semantics), then drains async: skip DaemonSet pods, post policy/v1 Eviction, retry 429/PDB with 2 s backoff bounded by `grace_period_seconds`, walk to `Drained`/`Failed`.	policy/v1 Eviction makes the apiserver enforce PDBs; ack-on-cordon is the honest static-stability post-condition — a multi-minute drain must not hold the session recv-loop hostage.	`pkg/operator/reclaim.go`	`pkg/operator/reclaim_internal_test.go`
0010	Minimum Kubernetes version 1.31	Accepted	All three charts declare `kubeVersion: ">= 1.31.0-0"`; no back-compat shim for rendering the CRD without `selectableFields`.	The `CapacityRequest` CRD’s `selectableFields` (powering `kubectl --field-selector=status.phase`) only went GA in 1.31.	`deploy/helm/bigfleet/Chart.yaml`, `deploy/helm/bigfleet-operator/Chart.yaml`, `deploy/helm/bigfleet-unschedulable-pod-controller/Chart.yaml`, `api/crd/bigfleet.lucy.sh_capacityrequests.yaml`	(enforced at helm-install / `helm template --kube-version`; no Go test)
0011	`BootstrapTemplate` is a helm-values text/template, not a CRD or webhook	Accepted	Configured via a `bootstrapTemplate` values block rendered into a ConfigMap, mounted at `/etc/bigfleet/bootstrap.tmpl`, parsed at startup; Go callback retained for embedders (callback wins). No CRD, webhook, or Sprig — stdlib `text/template` only.	File-mounted parse-once template keeps the `BootstrapRequest` hot path free of any runtime apiserver/webhook coupling.	`pkg/operator/bootstrap_template.go`, `pkg/operator/bootstrap.go`, `pkg/operator/operator.go`, `cmd/operator/main.go`, `deploy/helm/bigfleet-operator/values.yaml`, `deploy/helm/bigfleet-operator/templates/deployment.yaml`	`pkg/operator/bootstrap_template_test.go`
0012	Helm charts published to GHCR as OCI artefacts on every push to main	Accepted	New `charts.yml` mirrors `images.yml`: on push, `helm package` + `helm push` to `oci://ghcr.io/<owner>/charts/<chart>` tagged with the Chart version (immutable, no floating `latest`); on PR, `helm lint`/`package`/`template --kube-version=1.31.0`.	OCI-via-GHCR piggy-backs on the existing image-publishing auth/flow, letting users install without cloning.	`.github/workflows/charts.yml`	(CI workflow; no Go test)
0016	`NodeStateUpdate` carries node identity (labels, resources, taints)	Accepted	`NodeStateUpdate` gains `labels` (9), `resources` (10), `taints` (11, new `Taint` message); shard populates from the machine `Profile` on every emit, operator copies into `UpcomingNode.Spec.{Labels,Resources,Taints}`.	Any controller pre-allocating against an upcoming node needs its shape before kubelet joins; the shard already holds it. (Taints plumbed but not exercised by a synthetic emitter — labels+resources only.)	`api/proto/bigfleet/v1alpha1/shard.proto`, `pkg/shard/shard.go`, `pkg/operator/upcoming.go`	`pkg/apis/bigfleet/v1alpha1/roundtrip_test.go`
0024	Co-location via podAffinity — the `CoLocation` CR field, roll-up aggregates	Accepted (builds on ADR-0022)	Derive co-location from required podAffinity, carried as a structured `CoLocationTerm {LabelSelector, TopologyKey}`; UPC translates podAffinity→CoLocation, operator derives aggregation group + `Same` key at roll-up, retiring `CoLocationKey`. Companion: 256 MiB gRPC ceiling in `pkg/grpcutil`.	The old owner-UID key put every pod in its own group so the roll-up never aggregated (O(unschedulable-pods)); podAffinity is the native, zero-user-change signal.	`pkg/apis/bigfleet/v1alpha1/capacityrequest_types.go`, `pkg/apis/bigfleet/v1alpha1/zz_generated.deepcopy.go`, `pkg/controller/cr/controller.go`, `pkg/operator/rollup.go`, `pkg/grpcutil/grpcutil.go`	`pkg/controller/cr/controller_test.go`, `pkg/operator/rollup_topology_test.go`, `pkg/apis/bigfleet/v1alpha1/roundtrip_test.go`, `pkg/operator/rollup_colocated_bench_test.go`
0039	One `CapacityRequest` per Pod — not per unschedulable Pod	Accepted	The reference UPC creates a CR for every Pod, not only `reason=Unschedulable`, honouring Fleet-Scale Kubernetes §6.1; CR stays owner-referenced and GC’s on deletion.	~84% of bound Pods carried no CR (pre-bind fast-path + ADR-0038 recreated Pods bypass Unschedulable), undercounting demand ~6× and giving Phase 3 a phantom surplus.	`pkg/controller/cr/controller.go`	`pkg/controller/cr/controller_test.go`
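
ADR-0011's parse-once template with an overriding Go callback maps directly onto the standard library. The sketch below uses only `text/template`; the `bootstrapData` fields and the `renderer` type are hypothetical, and the template text is passed as a string rather than read from `/etc/bigfleet/bootstrap.tmpl` so the example stays self-contained.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// bootstrapData is a hypothetical set of values exposed to the template.
type bootstrapData struct {
	ClusterID string
	NodeName  string
}

// renderer holds a template parsed once at startup and an optional
// embedder callback, which takes precedence when set.
type renderer struct {
	tmpl     *template.Template
	callback func(bootstrapData) (string, error)
}

// newRenderer parses the template text once; a malformed template fails here,
// at startup, rather than on the request path.
func newRenderer(text string, callback func(bootstrapData) (string, error)) (*renderer, error) {
	tmpl, err := template.New("bootstrap").Parse(text)
	if err != nil {
		return nil, fmt.Errorf("parse bootstrap template: %w", err)
	}
	return &renderer{tmpl: tmpl, callback: callback}, nil
}

// render produces the bootstrap payload without touching any external service.
func (r *renderer) render(d bootstrapData) (string, error) {
	if r.callback != nil {
		return r.callback(d)
	}
	var buf bytes.Buffer
	if err := r.tmpl.Execute(&buf, d); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	r, err := newRenderer("join --cluster={{.ClusterID}} --node={{.NodeName}}\n", nil)
	if err != nil {
		fmt.Println(err)
		return
	}
	out, err := r.render(bootstrapData{ClusterID: "c1", NodeName: "n1"})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Print(out)
}
```

Parsing at construction time is what keeps the hot path free of apiserver or webhook calls: by the time a request arrives, rendering is a pure in-memory operation.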

Security & fencing

ADR	Title	Status	Decision	Why	Implemented in	Guarded by
0008	Coordinator admin RPCs — leader-only, unauthenticated in v1, sidecar for external	Accepted; transport/authn posture superseded by ADR-0048 (leader-only contract stands)	All admin RPCs are leader-only (followers reject `FailedPrecondition`); reads go through the leader’s `State` RLock. v1 ships unauthenticated (NetworkPolicy / external sidecar); `bigfleetctl` is the canonical insecure-by-default client; `SetQuota` deferred.	Leader-only avoids stale-read footguns + client-side leader-cache logic; shipping no in-tree authn avoids picking an identity winner.	`pkg/coordinator/grpc_server.go`, `cmd/bigfleetctl/main.go`	`pkg/coordinator/grpc_server_test.go`
0048	Opt-in file-based mTLS with `bigfleet://` URI SAN identity binding	Accepted (M74) (supersedes ADR-0008 transport posture)	Symmetric `--tls-cert/--tls-key/--tls-ca` on every server+client (once in `pkg/grpcutil`): all three = mTLS (TLS 1.3, mutual verify), none = plaintext, partial = startup error; hot-reload certs. Identity is exactly one `bigfleet://` URI SAN per cert; shard Session binds SAN to `Hello.cluster_id`, admin surface requires `bigfleet://admin`, mismatch → `PermissionDenied`.	Every surface was plaintext and the shard trusted the client-asserted `Hello.cluster_id`, so any reachable client could impersonate any cluster; identity binding demotes ADR-0046’s roll-up guard to defence-in-depth.	`pkg/grpcutil/tls.go`, `pkg/grpcutil/grpcutil.go`, `pkg/shard/session.go`, `pkg/coordinator/grpc_server.go`, `pkg/coordinator/join.go`, `pkg/shard/coordclient/coordclient.go`, `pkg/provider/grpcclient/grpcclient.go`, `pkg/operator/operator.go`, `cmd/bigfleet/shard.go`, `cmd/bigfleet/coordinator.go`, `cmd/operator/main.go`, `cmd/bigfleetctl/main.go`	`pkg/grpcutil/tls_test.go`, `pkg/shard/session_identity_test.go`, `pkg/coordinator/grpc_server_identity_test.go`, `pkg/grpcutil/tlstest/tlstest.go`
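
Two rules in the ADR-0048 row are easy to get subtly wrong: the all-or-nothing flag check, and the single-identity binding. The sketch below illustrates both without any TLS machinery. The function names are hypothetical, and it assumes the identity is the host part of the `bigfleet://` URI and that "exactly one" means exactly one URI SAN with that scheme. In real code the URI list would come from the verified peer certificate's `URIs` field.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

type tlsMode int

const (
	plaintext tlsMode = iota
	mutualTLS
)

// resolveMode applies the all-three / none / partial rule to the flag values.
func resolveMode(cert, key, ca string) (tlsMode, error) {
	set := 0
	for _, v := range []string{cert, key, ca} {
		if v != "" {
			set++
		}
	}
	switch set {
	case 0:
		return plaintext, nil
	case 3:
		return mutualTLS, nil
	default:
		return plaintext, errors.New("--tls-cert, --tls-key and --tls-ca must be set together")
	}
}

// identity extracts the single bigfleet:// identity from a certificate's URI SANs.
func identity(uris []*url.URL) (string, error) {
	var found []string
	for _, u := range uris {
		if u.Scheme == "bigfleet" {
			found = append(found, u.Host) // assumption: identity is the host part
		}
	}
	if len(found) != 1 {
		return "", fmt.Errorf("want exactly one bigfleet:// URI SAN, got %d", len(found))
	}
	return found[0], nil
}

// authorize binds the certificate identity to the identity the peer claims.
// A real server would map the error to PermissionDenied.
func authorize(uris []*url.URL, claimed string) error {
	id, err := identity(uris)
	if err != nil {
		return err
	}
	if id != claimed {
		return fmt.Errorf("certificate identity %q does not match claimed %q", id, claimed)
	}
	return nil
}

// mustURIs is a small helper for the example; it panics on a malformed URI.
func mustURIs(raw ...string) []*url.URL {
	out := make([]*url.URL, 0, len(raw))
	for _, s := range raw {
		u, err := url.Parse(s)
		if err != nil {
			panic(err)
		}
		out = append(out, u)
	}
	return out
}

func main() {
	_, err := resolveMode("server.crt", "", "ca.crt")
	fmt.Println("partial flags:", err)
	fmt.Println("match:", authorize(mustURIs("bigfleet://cluster-7"), "cluster-7"))
	fmt.Println("mismatch:", authorize(mustURIs("bigfleet://cluster-7"), "cluster-9"))
}
```

Rejecting partial configuration at startup matters because the alternative, silently falling back to plaintext, would turn a typo in one flag into an unauthenticated listener.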

Scale-test methodology & SLOs

ADR	Title	Status	Decision	Why	Implemented in	Guarded by
0013	Demand-to-inventory regimes and SLOs	Accepted (cycle-p99 gate superseded by ADR-0014; three-regime scheme not built)	Promised three regimes with distinct SLOs: steady-state (≤2%, p99 ≤50 ms), burst (≤10%, p99 ≤100 ms, the gate), reprovisioning (≤100%, convergence ≥5,000 bindings/cycle).	Real fleets live in the burst regime; full-fleet reprovisioning is a backlog-drain deserving a throughput contract, not a per-cycle SLO.	spec-only (the named three-regime/convergence-rate scheme has no code; ADR-0014 reframed it)	—
0014	SLO posture — binding latency is the gate, cycle wall-clock is a tracked metric	Accepted (amended by ADR-0018; built on by ADR-0017)	`bindingLatencyP99` (CR creation → Configured) becomes the user-facing gate with per-tier targets; `shardCycleDurationP99 ≤ rollupInterval/2` becomes a tracked envelope, not a gate.	No comparable system gates a release on a sub-100 ms rebalance loop; users feel binding latency.	`test/scaletest/cmd/scaletest-runner/main.go`, `test/scaletest/cmd/pod-shim/main.go`	(harness wiring; exercised end-to-end by scaletest runs)
0015	Realistic archetype improvements (multiplicity, bimodal lifetimes, bursts, Same-rack, size skew)	Accepted	Five harness extensions: fingerprint multiplicity (`sizeBuckets`), bimodal CR lifetimes (`meanLifetimeSeconds`), concentrated burst actions, `Same`-rack co-location (`sameRack`/`groupSizeRange`), heavy-tailed `clusterSizeDistribution`.	The M31 single-shape catalog was “less honest than it claims”; conclusions drawn against it may not generalise to production demand.	`pkg/scaletest/archetype/archetype.go`, `pkg/scaletest/archetype/sizing.go`, `test/scaletest/cmd/load-driver/main.go`, `cmd/bigfleet/shard.go`, `test/scaletest/cmd/scaletest-runner/main.go`, `test/scaletest/profiles/archetypes/realistic.yaml`	`pkg/scaletest/archetype/archetype_test.go`, `pkg/scaletest/archetype/sizing_test.go`, `pkg/scaletest/archetype/realistic_mix_test.go`, `test/scaletest/cmd/load-driver/main_test.go`
0017	Per-CR binding latency is the user-facing metric; fingerprint fan-out is its own thing	Accepted (builds on 0014; renamed/scoped by 0018)	Add a per-Pod `bigfleet_scaletest_pod_bind_latency_seconds` histogram (in pod-shim) as the gate source; recast the legacy histogram as a fan-out diagnostic. Addenda: stop falling back to the legacy histogram; make Pod-mode the default.	The legacy histogram measured per-(cluster,fingerprint) fan-out, ramped to the top bucket on a 50-cluster run, and was a gameable gate.	`test/scaletest/cmd/pod-shim/main.go`, `test/scaletest/cmd/scaletest-runner/main.go`, `test/scaletest/cmd/load-driver/main.go`	(exercised through scaletest harness runs)
0018	“binding latency” in the harness is internal-only; the user-facing number lives elsewhere	Accepted (amends 0014; preserves 0017’s gate)	Rename `bindingLatencyP99Seconds` → `internalBindingLatencyP99Seconds`; reframe 0014’s tier targets as internal-only floors (fake provider returns instantly); real-provider validation moves to conformance / out-of-tree scaletests / production canaries.	The in-process fake contributes zero latency, so the harness metric measures only BigFleet’s internal contribution — calling it “what users feel” overstated it.	`test/scaletest/cmd/scaletest-runner/main.go`, `test/scaletest/cmd/pod-shim/main.go`, `test/scaletest/profiles/uber-5k.yaml`, `test/scaletest/profiles/500k.yaml`, `test/scaletest/profiles/dev-50.yaml`	(harness-config change; not unit-tested)
0020	Internal binding-latency SLO must respect the rollup interval	Accepted	Set the harness internal-binding-latency SLO to 15 s (~10 s rollup + ~5 s headroom) rather than lowering the operator’s 10 s `rollupInterval`; cloud profiles carry an explicit override.	The 10 s `rollupInterval` is a hard p99 floor a 5 s SLO can never clear; 10 s rollup is the right production posture.	`test/scaletest/cmd/scaletest-runner/main.go`, `pkg/operator/operator.go`, `test/scaletest/profiles/5k.yaml`, `test/scaletest/profiles/uber-50k.yaml`, `test/scaletest/profiles/uber-1m.yaml`, `test/scaletest/profiles/uber-5m.yaml`	`pkg/operator/operator_test.go`
0023	Real kube-scheduler in the harness, retire pod-shim’s binding role	Accepted	Replace pod-shim’s binder with a real kube-scheduler (`MostAllocated` to preserve ADR-0022 density) per kwok apiserver; keep only `UpcomingNode`→fake-Node as a node-creator binary; gate behind `harness.scheduler`. Harness-only.	Pod-shim’s custom binder (102 s p99) had become the dominant variable in the published numbers — measuring the harness, not BigFleet.	`test/scaletest/cmd/node-creator/main.go`, `test/scaletest/image/entrypoint-apiserver.sh`, `test/scaletest/image/entrypoint-workload.sh`, `test/scaletest/chart/values.yaml`, `test/scaletest/chart/templates/kwok-clusters.yaml`, `test/scaletest/profiles/dev-50.yaml`, `test/scaletest/profiles/dev-500.yaml`, `test/scaletest/profiles/uber-500k.yaml`	(harness infra; validated via kind/cloud runs)
0025	The load-driver anchors `sameRack` groups — a gang-scheduler stand-in	Accepted	The load-driver force-binds one anchor pod per `sameRack` group to break the self-referential podAffinity bootstrap deadlock; kube-scheduler places the rest.	Lets `sameRack` profiles clear the ramp gate while keeping ADR-0024’s real-podAffinity path — gang bootstrapping is genuinely above the autoscaler.	`test/scaletest/cmd/load-driver/main.go`	`test/scaletest/cmd/load-driver/main_test.go`
0026	The scaletest harness must model the Speculative tier	Accepted	`seedFakeInventory` seeds a Speculative quota pool (`--seed-speculative N`, default non-zero) alongside Idle/Configured; slots minted as `OnDemand` with non-zero price + small interruption probability so `effective_cost` is meaningful and Phase 1 prefers Idle then Speculative.	The harness only ever had a fixed Idle pool, so unmet demand became permanent shortfall — leaving BigFleet’s entire elastic-procurement half as dead code, mis-measuring ceilings.	`cmd/bigfleet/shard.go`, `pkg/provider/fake/fake.go`, `test/scaletest/cmd/scaletest-runner/main.go`, `test/scaletest/cmd/scaletest-runner/preflight.go`, `pkg/scaletest/preflight/preflight.go`	`test/scaletest/cmd/scaletest-runner/render_test.go`, `test/scaletest/cmd/scaletest-runner/preflight_test.go`, `test/conformance/selftest_test.go`, `pkg/provider/fake/fake_test.go`
0028	Cycle-p99 SLO is regime-parametric	Accepted	(See Decision engine & cost above — graded on per-Need Phase 1 p99 + cardinality-scaled envelopes; OCC-deferral superseded by 0029.)	Phase 1 wall-clock scales with Need cardinality; the absolute bar grades the workload, not BigFleet.	spec-only	`pkg/decision/phase1_uber5k_bench_test.go`, `pkg/decision/phase1_realistic_bench_test.go`, `pkg/decision/phase1_takecolocated_bench_test.go`
0032	Realistic archetype catalog — production-calibrated distribution	Accepted	Replace the six-archetype catalog with ten Pod-count-weighted archetypes (70% tiny-stateless long tail … 1% gpu/critical), fold sidecar overhead into per-Pod shape, add `allowPartial` + `spreadConstraintProb`/`spreadConstraint` (~42% of Needs).	The prior catalog was miscalibrated (missing modal small Pod, no spread, single-priority, oversized gangs), so every uber-* number benchmarked a non-representative workload.	`pkg/scaletest/archetype/archetype.go`, `test/scaletest/profiles/archetypes/realistic.yaml`, `test/scaletest/cmd/load-driver/main.go`	`pkg/scaletest/archetype/realistic_mix_test.go`, `pkg/scaletest/archetype/archetype_test.go`, `pkg/scaletest/archetype/sizing_test.go`, `test/scaletest/cmd/load-driver/main_test.go`
0033	Phase 1 supply-credit must respect bind readiness	Rejected (superseded by ADR-0035)	Proposed OC1: a Configured machine credits supply only after `UpcomingNode` reaches `Ready`, via `Machine.BindReady` + a `NodeBindReady` stream message. Rejected — the bind plateau was a kube-scheduler ramp property; the fix moved to the harness.	The triggering plateau was a kube-scheduler property under high label-cardinality that only manifests at ramp, and ramp is not an SLO.	spec-only (no code shipped)	—
0034	Scaletest is bring-your-own-substrate	Accepted	Split each `*-Nk.yaml` into a substrate-agnostic test definition (scale/catalog/seed/loadProfile) + a separately-named example substrate (`example-fat-host`, `example-mid-host`, `example-kind-laptop`); runner derives geometry/cost/feasibility from profile × substrate; drop provider-named profiles.	Profiles conflated “what test” with “where to run”, leaking substrate names into filenames and forcing N×M file growth.	`test/scaletest/cmd/scaletest-runner/main.go`, `test/scaletest/substrates/example-fat-host.yaml`, `test/scaletest/substrates/example-mid-host.yaml`, `test/scaletest/substrates/example-kind-laptop.yaml`, `test/scaletest/profiles/5k.yaml`	`test/scaletest/cmd/scaletest-runner/merge_test.go`, `test/scaletest/cmd/scaletest-runner/substrate_test.go`, `test/scaletest/cmd/scaletest-runner/byo_integration_test.go`
0035	Scaletest SLOs are measured at steady state under churn, not at ramp	Accepted (supersedes ADR-0033 + M22 ramp-gating; amended 2026-06-14)	Gate pass/fail on steady-state per-CR binding-latency / cycle / rollup SLOs during a churn soak, inventory pre-seeded + Pods pre-bound at install; ramp becomes observational. 2026-06-14 amendment: reclaim baseline at `soakStart+settleSeconds`, bounded `maxReclaimActionsDuringSoak` gate (accepting the ADR-0021 async floor). Harness-only.	Ramp throughput is dominated by downstream kube-scheduler behaviour and is not the SLO; conflating ramp with the SLO produced a multi-week rabbit hole.	`test/scaletest/cmd/scaletest-runner/main.go`, `test/scaletest/cmd/load-driver/main.go`, `test/scaletest/profiles/dev-50.yaml`, `test/scaletest/profiles/5k.yaml`	(runner gate verified by inspection; no dedicated `_test.go`)
0037	Drop synthetic team/app label axes from the catalog	Accepted	The catalog’s node-affinity dimensions must mirror real production (instance-type, zone, hardware only); synthetic ownership axes (`team`, `app`) removed from `realistic.yaml`. The `labelAxes` mechanism is retained for a future real axis.	Routing synthetic fingerprint cardinality through Pod nodeAffinity made kube-scheduler reject 98.6% of placements (bind plateaued at 9.5%); team/app are ownership labels, not node-affinity dimensions.	`test/scaletest/profiles/archetypes/realistic.yaml`, `pkg/scaletest/archetype/archetype.go`	`pkg/scaletest/archetype/archetype_test.go`
0038	Scaletest workloads are controller-managed objects, not bare Pods	Accepted	Load-driver creates Deployments (stateless) / StatefulSets (stateful), one per archetype fingerprint per cluster; the kwok apiserver runs the deployment/replicaset/statefulset controllers so evicted Pods recreate. No BigFleet change.	Bare Pods don’t survive eviction, so every Phase 3 reclaim permanently destroyed demand (CR cascade-GC’d) → a self-sustaining Bootstrap+Reclaim cascade.	`test/scaletest/cmd/load-driver/main.go`, `test/scaletest/image/entrypoint-apiserver.sh`, `test/scaletest/image/Dockerfile`	`test/scaletest/cmd/load-driver/main_test.go`
0043	Harness-observed triggers get a demand-realism check before mechanism ships	Accepted	Any ADR motivated by harness-observed evidence must contain a “Demand realism” section (what demand triggers it, would production emit it, if not fix the harness and re-measure first) before designing mechanism. A gate, not a formality; incident/paper-triggered ADRs exempt.	The single largest pool of unforced engine complexity (the ADR-0042 parking layer) was built against a demand shape one catalog archetype fabricated.	spec-only — codified as a working-discipline rule; first applied in `docs/adr/0044-machine-count-aware-seed-sizing.md`	—
0044	Seed machine pools are sized by machine demand, not workload weight	Accepted (follows from ADR-0043; harness-scope)	Seed machine shares derive from pod demand: `machineShare ∝ podShare/podsPerMachine`, `podsPerMachine` = density for core-resource archetypes / 1 when any bucket requests an extended resource; gang archetypes get a per-zone floor of `max(GroupSizeRange)`.	Weight-proportional pools underweight whole-machine archetypes’ supply by ~density×, so GPU gangs were short 120–238 machines/zone every cycle on a fleet with ample aggregate capacity.	`pkg/scaletest/archetype/sizing.go`, `pkg/scaletest/archetype/archetype.go`, `test/scaletest/cmd/load-driver/main.go`, `test/scaletest/profiles/archetypes/realistic.yaml`	`pkg/scaletest/archetype/sizing_test.go`, `pkg/scaletest/archetype/archetype_test.go`, `pkg/scaletest/archetype/realistic_mix_test.go`, `test/scaletest/cmd/load-driver/main_test.go`
0050	The realism catalog is calibrated to a realistic machine fleet, via per-archetype packing density	Accepted (M78 first step; harness-scope)	Calibrate `realistic.yaml` to a realistic machine fleet (~15% GPU), back-solving weights as `machineShare × podsPerNode / E[replicas]`; replace M66.2’s “GPU density = 1” with a per-archetype `PodsPerNode` (cpu/mem = 100, GPU inference = 8, GPU training = 1).	For a whole-machine GPU workload pod-share IS machine-share, so a realistic ~7% GPU pod mix implies an unrealistic ~90% GPU machine fleet — failing ADR-0043’s realism test.	`pkg/scaletest/archetype/archetype.go`, `pkg/scaletest/archetype/sizing.go`, `test/scaletest/profiles/archetypes/realistic.yaml`, `test/scaletest/cmd/scaletest-runner/main.go`, `pkg/scaletest/preflight/preflight.go`	`pkg/scaletest/archetype/sizing_test.go`, `pkg/scaletest/archetype/realistic_mix_test.go`
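
The ADR-0044 sizing rule in the table above (`machineShare ∝ podShare/podsPerMachine`, with a per-zone floor of `max(GroupSizeRange)` for gang archetypes) can be sketched as follows. This is a minimal illustration, not the code in `pkg/scaletest/archetype/sizing.go`: the `Archetype` struct, its field names, `SeedPerZone`, and the rounding behaviour are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"math"
)

// Archetype is a hypothetical, pared-down stand-in for a catalog entry;
// the real type in pkg/scaletest/archetype may look quite different.
type Archetype struct {
	Name             string
	PodShare         float64 // fraction of total pod demand
	Density          int     // pods per machine for core-resource archetypes
	ExtendedResource bool    // true if any size bucket requests e.g. a GPU
	MaxGroupSize     int     // max(GroupSizeRange); 0 for non-gang archetypes
}

// podsPerMachine follows the rule quoted in the ADR-0044 row: density for
// core-resource archetypes, 1 once any bucket requests an extended resource.
func podsPerMachine(a Archetype) int {
	if a.ExtendedResource || a.Density < 1 {
		return 1
	}
	return a.Density
}

// SeedPerZone splits machinesPerZone across archetypes in proportion to
// podShare/podsPerMachine, then raises gang archetypes to their floor.
// Assumption: the sketch rounds to the nearest machine and does not
// rebalance, so the floor can push the total above machinesPerZone.
func SeedPerZone(catalog []Archetype, machinesPerZone int) map[string]int {
	weights := make([]float64, len(catalog))
	var total float64
	for i, a := range catalog {
		weights[i] = a.PodShare / float64(podsPerMachine(a))
		total += weights[i]
	}
	out := make(map[string]int, len(catalog))
	if total == 0 {
		return out
	}
	for i, a := range catalog {
		n := int(math.Round(weights[i] / total * float64(machinesPerZone)))
		if n < a.MaxGroupSize {
			n = a.MaxGroupSize
		}
		out[a.Name] = n
	}
	return out
}

func main() {
	// Made-up catalog: 90% of pods are dense web pods, 10% are GPU gang pods.
	catalog := []Archetype{
		{Name: "web", PodShare: 0.9, Density: 100},
		{Name: "gpu-gang", PodShare: 0.1, ExtendedResource: true, MaxGroupSize: 8},
	}
	fmt.Println(SeedPerZone(catalog, 109))
}
```

With these invented numbers the GPU gang archetype receives roughly 100 of the 109 machines, whereas a weight-proportional split would give it about 11. That gap is the density-fold shortfall the ADR describes.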

Actuation safety

ADR	Title	Status	Decision	Why	Implemented in	Guarded by
0046	Actuation safety rails — reclaim blast-radius cap, empty-roll-up quarantine, kill switch (+ Addendum: shadow mode, ingest validation, audit log)	Accepted (M70)	Three rails at the actuation/ingest boundary (`pkg/decision` untouched): (1) per-cycle per-cluster reclaim cap `max(1, ⌊fraction×C⌋)`, default 0.05, only Reclaim capped (Phase 2 exempt); (2) empty-roll-up quarantine (<10% retained, held until 3 consistent drops); (3) `--actuation-paused` kill switch. Addendum: `--dry-run`, `machine.Invariant` cost-bounds at ingest (reject-loudly), `--audit-log` JSONL.	Nothing bounded the damage of a wrong decision — a zero-demand roll-up could drain a fleet in one cycle; the rails bound actuation volume (not allocation, so §16’s priority-only throttle is intact).	`pkg/shard/safety.go`, `pkg/shard/shard.go`, `pkg/shard/session.go`, `pkg/shard/execute.go`, `pkg/shard/reconcile.go`, `pkg/machine/machine.go`, `pkg/metrics/metrics.go`, `cmd/bigfleet/shard.go`, `cmd/bigfleet/all_in_one.go`, `deploy/helm/bigfleet/values.yaml`, `deploy/helm/bigfleet/templates/shard-statefulset.yaml`	`pkg/shard/safety_test.go`, `pkg/machine/machine_test.go`, `pkg/shard/restart_test.go`
0049	Idle→Speculative release — per-CapacityType idle holds inside Phase 3	Accepted (M73)	Implements paper §8’s release half: Phase 3 emits `Delete` for an Idle machine iff not in the claimed-set AND its per-CapacityType hold (`DefaultReleasePolicy`: bare-metal/reserved = forever, on-demand = 10m, spot = 1m) has expired, from an in-memory idle-since stamp; `executeDelete` walks Idle→Deleting→Speculative. No per-cycle release cap — the hold window is the only rail.	Releasing an Idle machine has zero blast radius (Idle ⇒ unbound, counts for nothing under ADR-0045) and the re-buy loop can’t close (worst case one Create per machine per hold), so the hold window alone bounds churn.	`pkg/decision/release.go`, `pkg/decision/phase3_reclaim.go`, `pkg/decision/action.go`, `pkg/inventory/inventory.go`, `pkg/shard/execute.go`, `pkg/shard/shard.go`, `pkg/metrics/metrics.go`	`pkg/decision/phase3_test.go`, `pkg/shard/execute_delete_test.go`, `pkg/inventory/inventory_test.go`, `sim/m73_release_test.go`, `test/conformance/conformance_test.go`
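
To make the first ADR-0046 rail concrete, here is a small sketch of the per-cycle, per-cluster reclaim cap `max(1, ⌊fraction×C⌋)` with the default fraction of 0.05. It is an illustration under assumptions, not the contents of `pkg/shard/safety.go`: the `Reclaim` type, `ReclaimCap`, `CapReclaims`, and the reading of `C` as the number of machines currently bound to the cluster are all hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// DefaultReclaimFraction is the default quoted in the ADR-0046 row.
const DefaultReclaimFraction = 0.05

// Reclaim is a hypothetical stand-in for one Phase 3 reclaim action.
type Reclaim struct {
	Cluster string
	Machine string
}

// ReclaimCap returns max(1, floor(fraction*c)).
// Assumption: c is the number of machines currently bound to the cluster.
func ReclaimCap(fraction float64, c int) int {
	n := int(math.Floor(fraction * float64(c)))
	if n < 1 {
		return 1
	}
	return n
}

// CapReclaims keeps, per cluster and in input order, at most ReclaimCap
// actions for one cycle, and reports how many actions were held back.
func CapReclaims(actions []Reclaim, fraction float64, size map[string]int) (kept []Reclaim, held int) {
	used := make(map[string]int)
	for _, a := range actions {
		if used[a.Cluster] >= ReclaimCap(fraction, size[a.Cluster]) {
			held++
			continue
		}
		used[a.Cluster]++
		kept = append(kept, a)
	}
	return kept, held
}

func main() {
	actions := []Reclaim{{"a", "m1"}, {"a", "m2"}, {"a", "m3"}, {"b", "m4"}}
	sizes := map[string]int{"a": 40, "b": 10}
	kept, held := CapReclaims(actions, DefaultReclaimFraction, sizes)
	fmt.Println(len(kept), held)
}
```

The `max(1, …)` floor matters for small clusters: a 10-machine cluster would otherwise compute a cap of zero and never reclaim anything. What happens to held-back actions (retried next cycle, dropped, counted in a metric) is not specified by the row, so the sketch only counts them.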

Process / meta

ADR	Title	Status	Decision	Why	Implemented in	Guarded by
0001	Record architecture decisions	Accepted	Record significant (hard-to-reverse) decisions as sequentially-numbered immutable Markdown ADRs in `docs/adr/`; changing direction means a new superseding ADR, not editing an accepted one.	A discoverable, reviewable audit trail of why each path was chosen, recoverable without spelunking commit history.	`docs/adr/`, `docs/adr/index.md` (process ADR; upheld by the project convention that every ADR adds a row to `docs/adr/index.md`)	—

README.md — the internals index; prose deep-dives per subsystem (this page is their ADR→code companion).
domain-attribution.md — the full walkthrough of the ADR-0040→0051 attribution arc (companion deep-dive for the Capacity model & attribution group).
../adr/ and ../adr/index.md — the ADRs themselves and the canonical status table (this page defers to the index on status).