BigFleet internals

Code-level deep-dives for contributors and maintainers. These pages bridge the gap between ../architecture.md, which sketches the two tiers and the three phases at tour altitude, and the source itself. Each one picks a single subsystem, opens the files, and explains the why: the constraint, the paper section or ADR that fixed it, and the failure it prevents. Read ../architecture.md first for the shape, then come here for the function-by-function detail and the *_test.go that keeps each invariant honest. If instead you are deploying or extending BigFleet, you want the guides under ../index.md (operator-guide, provider-author-guide, api-reference); these internals pages assume you have already read them and deliberately go deeper rather than repeating them.

These pages also assume you have skimmed the BigFleet paper (vendored at ../papers/bigfleet.md) and the operating-model paper (vendored at ../papers/fleet-scale-kubernetes.md). They link to the papers rather than re-deriving them.

Source-of-truth ordering

Every page applies the same authority order (from ../index.md). When code and a higher source disagree, the higher source wins and the page documents the divergence explicitly rather than papering over it:

1. The two papers: ../papers/bigfleet.md and ../papers/fleet-scale-kubernetes.md.
2. Author decisions in ../adr/.
3. ../plan.md.
4. The code.

For the canonical ADR status table (Accepted / Proposed / Rejected / Superseded / Amended) see ../adr/index.md. For the ADR→code cross-reference — where each decision is realised and which test guards it — see decision-map.md; it is the maintainer’s companion to these pages.

Reading order

New to the internals? Read in this order:

1. data-flow.md: the end-to-end picture, an unschedulable pod becoming a bound node.
2. decision-engine.md: the heart, covering the worker loop and the three phases.
3. shard-hot-path.md: the loop that runs the engine, and the concurrency model around it.

Then drill into whichever subsystem you are changing, using the grouped tables below.

Decision & capacity

The engine that turns demand into provisioning. Start here if you are touching pkg/decision, pkg/needs, or pkg/machine.

Doc	Covers	Read when
`decision-engine.md`	The decision engine: the per-cycle worker loop and the three fixed phases — Phase 1 (assign), Phase 2 (preempt inversions), Phase 3 (reclaim excess) — plus the fixed `effective_cost` and victim-score arithmetic.	Changing any provisioning behaviour, or reasoning about why a machine was acquired, preempted, or released.
`phase1-occ.md`	Phase 1 internals: the Omega-style optimistic-concurrency assignment — how candidate idle inventory is selected by effective cost, claimed, and how conflicting claims resolve within a cycle.	Working on Phase 1 assignment, idle-tiebreak ordering, or concurrency in the assign step.
`machine-lifecycle.md`	The machine state machine: three stable + four transitional + Failed states, the legal transitions, and which provider RPC drives each edge.	Touching `pkg/machine`, adding a transition, or debugging a stuck machine.
`needs-table.md`	NeedsTable, profiles, powers-of-2 penalty bucketing, and `Same`-folding — how full-replacement roll-ups become the priority-sorted demand the engine walks.	Changing demand ingestion, penalty buckets, or `Same`-operator handling.
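
As a rough illustration of the powers-of-2 penalty bucketing named in the `needs-table.md` row, the sketch below rounds a penalty up to the next power of two so that nearby penalties land in the same bucket. The rounding direction, the `uint64` type, and the function name `penaltyBucket` are all assumptions made for this example; `needs-table.md` and `pkg/needs` are the authority on the real rule.

```go
package main

import (
	"fmt"
	"math/bits"
)

// penaltyBucket is a hypothetical helper, not BigFleet's implementation.
// It rounds p up to the nearest power of two (assumed direction), so that
// 5, 6, 7 and 8 all share bucket 8. Values 0 and 1 map to themselves.
func penaltyBucket(p uint64) uint64 {
	if p <= 1 {
		return p
	}
	const top = uint64(1) << 63
	if p > top {
		// No larger power of two fits in uint64; clamp instead of overflowing.
		return top
	}
	// bits.Len64(p-1) is the number of bits needed to represent p-1,
	// which is exactly the exponent of the next power of two >= p.
	return uint64(1) << bits.Len64(p-1)
}

func main() {
	for _, p := range []uint64{0, 1, 2, 3, 5, 8, 9} {
		fmt.Printf("%d -> %d\n", p, penaltyBucket(p))
	}
}
```

Bucketing of this kind trades penalty precision for fewer distinct priority levels, which is presumably why the demand can then be walked as a compact priority-sorted list.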

Shard & coordinator

The two tiers. Start here for pkg/shard (the hot path, autonomous) or pkg/coordinator (the Raft tier).

Doc	Covers	Read when
`shard-hot-path.md`	The shard controller hot path: the cycle loop, inventory snapshotting, session multiplexing on the one bidi stream per cluster, and the lock-light concurrency model. Includes the no-coordinator-dependency guard (`pkg/shard/no_coordinator_dep_test.go`).	Changing `pkg/shard`. Mandatory read before any commit that touches the hot path.
`coordinator-raft.md`	The coordinator: `hashicorp/raft` over BoltDB, the FSM, cluster→shard and topology-domain→shard assignment, quota allocation, and ordinal join / offline-restore (ADR-0047).	Changing `pkg/coordinator`, replication, or assignment.
`static-stability.md`	Static stability: how clusters keep running with BigFleet entirely down, why `pkg/shard` must not import `pkg/coordinator`, and the class of designs this rules out.	Before any change that could put a coordinator dependency on the hot path, or weaken autonomous operation.
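
The rule that `pkg/shard` must not import `pkg/coordinator` is the kind of invariant an import-scanning test can enforce. The sketch below shows one way such a check could work, using only the standard `go/parser` package on a source string. It is not the contents of `pkg/shard/no_coordinator_dep_test.go`; the function name `forbiddenImports` and the module path `example.com/bigfleet` are hypothetical.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"strconv"
	"strings"
)

// forbiddenImports parses one Go source file and returns every import path
// that equals, or ends with "/" plus, the forbidden suffix. Hypothetical
// helper for illustration; it does not catch subpackages of the suffix.
func forbiddenImports(filename, src, forbiddenSuffix string) ([]string, error) {
	fset := token.NewFileSet()
	// ImportsOnly stops parsing after the import declarations.
	f, err := parser.ParseFile(fset, filename, src, parser.ImportsOnly)
	if err != nil {
		return nil, err
	}
	var hits []string
	for _, imp := range f.Imports {
		// Import paths are stored as quoted string literals.
		path, err := strconv.Unquote(imp.Path.Value)
		if err != nil {
			return nil, err
		}
		if path == forbiddenSuffix || strings.HasSuffix(path, "/"+forbiddenSuffix) {
			hits = append(hits, path)
		}
	}
	return hits, nil
}

func main() {
	// A deliberately offending file; the module path is made up.
	const bad = `package shard

import (
	"context"
	"example.com/bigfleet/pkg/coordinator"
)
`
	hits, err := forbiddenImports("shard.go", bad, "pkg/coordinator")
	if err != nil {
		panic(err)
	}
	fmt.Println(hits)
}
```

A real guard would walk every file in the package (and likely its transitive imports) and fail the test when the returned slice is non-empty.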

Protocols & identity

The wire. Start here for anything in api/proto, api/crd, pkg/provider, or pkg/fencing.

Doc	Covers	Read when
`wire-protocols.md`	Wire protocols and CRDs in depth: `capacity.proto`, `shard.proto` (the operator-initiated bidi `Session`), `coordinator.proto`, `provider.proto`, the CRDs, full-replacement roll-up semantics, and `supersedes_key` stream coalescing.	Changing any proto or CRD, or reasoning about stream/reconnect ordering.
`provider-protocol.md`	The `CapacityProvider` protocol and client: the six RPCs (Create / Configure / Drain / Delete / Get / List — no Watch), `List + Get` reconciliation, the dial-out client and plugin registry, and the test-only fake (`pkg/provider/fake`, never deployed).	Implementing a provider (pair with `../provider-author-guide.md`) or changing `pkg/provider`.
`fencing-and-identity.md`	Fencing and mTLS identity: the term / epoch / sequence helpers in `pkg/fencing`, and the `bigfleet://` URI-SAN identity binding (ADR-0048, superseding ADR-0008’s transport posture).	Changing fencing, stale-write protection, or transport identity.
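
To give a feel for the `supersedes_key` stream coalescing mentioned in the `wire-protocols.md` row, here is a minimal sketch of a send queue in which a newer message replaces an unsent older one that carries the same key. The semantics are assumed for illustration: the `msg` and `coalescingQueue` types, the in-place replacement, and the "empty key means never coalesce" rule are hypothetical, and `wire-protocols.md` describes the actual behaviour.

```go
package main

import "fmt"

// msg is a hypothetical stand-in for a stream message; only the idea of a
// supersedes key is borrowed from the text above.
type msg struct {
	SupersedesKey string // empty means "never coalesce" (assumption)
	Payload       string
}

// coalescingQueue holds unsent messages. Enqueueing a message whose key
// matches a queued one overwrites that entry in place, keeping its position.
type coalescingQueue struct {
	pending []msg
	index   map[string]int // key -> position in pending
}

func newCoalescingQueue() *coalescingQueue {
	return &coalescingQueue{index: make(map[string]int)}
}

// Enqueue appends m, or replaces the queued message with the same key.
func (q *coalescingQueue) Enqueue(m msg) {
	if m.SupersedesKey != "" {
		if i, ok := q.index[m.SupersedesKey]; ok {
			q.pending[i] = m
			return
		}
		q.index[m.SupersedesKey] = len(q.pending)
	}
	q.pending = append(q.pending, m)
}

// Drain returns everything queued, in order, and resets the queue.
func (q *coalescingQueue) Drain() []msg {
	out := q.pending
	q.pending = nil
	q.index = make(map[string]int)
	return out
}

func main() {
	q := newCoalescingQueue()
	q.Enqueue(msg{SupersedesKey: "rollup/cluster-a", Payload: "v1"})
	q.Enqueue(msg{Payload: "heartbeat"})
	q.Enqueue(msg{SupersedesKey: "rollup/cluster-a", Payload: "v2"})
	for _, m := range q.Drain() {
		fmt.Println(m.Payload)
	}
}
```

Coalescing pairs naturally with full-replacement roll-ups: if each roll-up replaces the previous one wholesale, a slow or reconnecting stream only ever needs the latest.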

Operator & lifecycle

The cluster-side agent and the optional pod controller.

Doc	Covers	Read when
`operator-and-controllers.md`	The operator and the unschedulable-pod controller: outbound-only dial, the multiplexed `Shard.Session` stream, `CapacityRequest` CR → NeedsTable aggregation, write-back of `AvailableCapacity` / `UpcomingNode`, and the optional `bigfleet-unschedulable-pod-controller`.	Changing `pkg/operator` or `pkg/controller/cr`.

Scale & testing

How we prove it works and how it survives load.

Doc	Covers	Read when
`scaletest-harness.md`	The scale-test harness architecture: the synthetic Go simulator (`make scale`), the kind rung, the `sim/` workload generators and scenarios, profiles, and how demand is generated and measured.	Working on the harness, a scale profile, or interpreting a scale run.
`testing-and-validation.md`	Testing taxonomy and the validation ladder: unit / property / integration / conformance / e2e, and the `prevalidate → kind → cloud` ladder from `../scaletest.md`.	Deciding where a test belongs, or before filing a cloud brief.

Cross-cutting

The threads that run through every subsystem.

Doc	Covers	Read when
`data-flow.md`	End-to-end data flow: an unschedulable pod becoming a bound node, traced through CR → operator → roll-up → NeedsTable → decision engine → provider → machine state → CR write-back.	Onboarding, or tracing a request across component boundaries.
`domain-attribution.md`	The domain-attribution saga: how `Same`-domain supply crediting evolved across ADR-0040 → ADR-0051 — unified attribution, sub-machine folding, sticky choice, aged-acquisition parking, consumed-capacity model, and gang-granular attribution.	Touching `Same`-domain supply crediting, or debugging a domain-choice flap.
`observability.md`	Metrics and observability catalog: the emitted metrics, what each measures, and which are load-bearing SLO signals (cycle p99, phase trends).	Adding a metric, wiring a dashboard, or reading the Grafana dashboards from a scale-test run.

If you find a divergence between any of these pages and a higher source of truth, fix the code or the page and record the divergence. The ordering above is the project’s source-of-truth policy. Back to the documentation landing page: ../index.md.