ADR-0042 Addendum: aged acquisition parking — the escalation path, engaged
Status
Accepted, 2026-06-11. Extends ADR-0042 after its cloud validation (bigfleet-uber #57) returned PARTIAL.
Context
The equal-coverage incumbent tie-break cut the unsatisfiable-gang
churn ~3× (≈27 → ≈9/sec at uber-5k) — the acquisition half largely
stopped (acquired=0 in 23/25 probe samples) — but domains kept
re-selecting: under 20-cluster contention for the shard-wide pool,
per-domain acquirable totals shift slightly every cycle, so coverage
between candidate domains is rarely exactly equal and the
strictly-greater branch keeps firing on marginal deltas. Exact-tie
pinning is too narrow to hold under perturbation; the residual
~9/sec is Phase 3 reclaiming the partial assemblies that migration
abandons. This is precisely the contingency ADR-0042’s escalation
path named.
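The fragility of exact-tie pinning can be sketched in a few lines. This is an illustration only: `pickDomain`, the domain names, and the coverage numbers are hypothetical and are not taken from the shard's code. It assumes the rule is "a challenger wins only on strictly greater coverage, a tie keeps the incumbent".

```go
package main

import "fmt"

// pickDomain is a hypothetical stand-in for the tie-break described above:
// a challenger replaces the current choice only on strictly greater coverage,
// so an exact tie leaves the incumbent in place.
func pickDomain(incumbent string, coverage map[string]int, order []string) string {
	best := incumbent
	for _, d := range order {
		if coverage[d] > coverage[best] {
			best = d
		}
	}
	return best
}

func main() {
	order := []string{"rack-a", "rack-b"}

	// Exact tie: the incumbent holds.
	fmt.Println(pickDomain("rack-a", map[string]int{"rack-a": 6, "rack-b": 6}, order))

	// Contention shifts one acquirable machine: the strictly-greater branch fires.
	fmt.Println(pickDomain("rack-a", map[string]int{"rack-a": 6, "rack-b": 7}, order))

	// Next cycle the delta reverses and the choice flips back.
	fmt.Println(pickDomain("rack-b", map[string]int{"rack-a": 7, "rack-b": 6}, order))
}
```

A one-machine delta is enough to defeat the pin, which is why the tie-break alone could not hold the selection still under shard-wide contention.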
Decision
Three pieces, shipped together:
- Group identity crosses the wire. CapacityNeed.group (field 9) carries the operator’s opaque co-location group ID: one value per gang, empty for plain needs. The shard’s per-gang bookkeeping and the gang-attribution probe previously fell back to fingerprints because Need.Group was operator-side only; the #57 probe data was class-level as a result. The autoscaler derives nothing from the value beyond equality.
- Aged acquisition parking. The shard counts consecutive cycles each Same-Need class (cluster + group + fingerprint) is Phase 1-unsatisfied with zero acquisition AND no structurally satisfiable bucket (the pre-pass’s joint verdict, surfaced as SameSatisfiable), so a gang that is still concentrating, or that merely lost claim races while a satisfiable bucket existed, never ages: parking is concentrate-THEN-park, faithfully. At parkAfterCycles (8, a debounce against snapshot flicker) the class’s Needs are stamped AcquisitionParked before the phases run. Every supply site honours the stamp identically: the Phase 1 seed pass and Phase 3’s joint fold go creditable-only (no acquirable bucket exists, so the incumbent wins trivially and there is nothing to flip to or migrate toward), and acquisition candidates are empty. A parked gang keeps its concentrated partial assembly, drives zero Bootstrap/Reclaim, and ages in the shortfall buffer, which is the paper’s escalation surface for unsatisfiable demand. Priority remains the sole throttle: parking never reorders anything, it only stops futile re-attempts.
- Re-probe, so parking is never forever. Every reprobeEveryCycles (32) a parked class un-parks for exactly one cycle. If supply has appeared it acquires, leaves the unsatisfied set, and the age ledger forgets it (un-parked permanently until a future relapse restarts the debounce); if not, one cycle of attempts (gangs are all-or-nothing proposals, so a futile re-probe acquires nothing) and it re-parks.
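The debounce, parking, and re-probe rules above can be sketched as a small age ledger. This is a minimal sketch under stated assumptions, not the shard's implementation: `classKey`, `observation`, `ledger`, and its methods are hypothetical names. Only the two constants (8 and 32) and the futility condition (zero acquisition and no satisfiable bucket) come from this ADR. The sketch also assumes the age keeps counting while a class is parked and that a non-futile cycle resets it; the ADR does not spell out either detail.

```go
package main

import "fmt"

const (
	parkAfterCycles    = 8  // debounce against snapshot flicker (value from the ADR)
	reprobeEveryCycles = 32 // re-probe cadence (value from the ADR)
)

// classKey identifies a Same-Need class: cluster + group + fingerprint.
type classKey struct {
	Cluster, Group, Fingerprint string
}

// observation is what one cycle learned about a still-unsatisfied class.
// Hypothetical shape, for illustration only.
type observation struct {
	Acquired        int  // machines acquired for the class this cycle
	SameSatisfiable bool // pre-pass verdict: a structurally satisfiable bucket exists
}

// ledger holds the consecutive-futile-cycle count per class.
type ledger struct {
	age map[classKey]int
}

func newLedger() *ledger { return &ledger{age: map[classKey]int{}} }

// parked reports whether the class should be stamped for the coming cycle.
// Once past the debounce, every reprobeEveryCycles-th cycle is left un-parked.
func (l *ledger) parked(k classKey) bool {
	since := l.age[k] - parkAfterCycles
	if since < 0 {
		return false // still inside the debounce window
	}
	return since == 0 || since%reprobeEveryCycles != 0
}

// observe folds one cycle's results into the ledger and prunes it to the
// live unsatisfied set, so a class that acquired and left is forgotten.
func (l *ledger) observe(unsatisfied map[classKey]observation) {
	for k, o := range unsatisfied {
		if o.Acquired == 0 && !o.SameSatisfiable {
			l.age[k]++ // futile cycle: keep ageing
		} else {
			l.age[k] = 0 // still concentrating, or supply exists: never ages
		}
	}
	for k := range l.age {
		if _, live := unsatisfied[k]; !live {
			delete(l.age, k)
		}
	}
}

func main() {
	l := newLedger()
	gang := classKey{Cluster: "cluster-7", Group: "gang-42", Fingerprint: "fp-abc"}
	futile := map[classKey]observation{gang: {Acquired: 0, SameSatisfiable: false}}

	prev := false
	for cycle := 1; cycle <= 42; cycle++ {
		if p := l.parked(gang); p != prev {
			fmt.Printf("cycle %d: parked=%v\n", cycle, p)
			prev = p
		}
		l.observe(futile)
	}
}
```

In this sketch a gang that never finds supply is first stamped after eight futile cycles, is released for a single re-probe cycle 32 cycles later, and re-parks immediately afterwards. Pruning in `observe` is what makes a successful re-probe "forget" the class.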
Consequences
- The age ledger is the only new state: a per-class int map on the shard, touched solely by the cycle goroutine, pruned to the live unsatisfied set each cycle. No coordinator involvement; static stability untouched.
- Two tunables exist (8 / 32), as the escalation path predicted. Both are constants, not configuration — YAGNI until evidence demands otherwise.
- The gang-attribution probe now emits real group ids plus a parked bool. Expected cloud signature: parked=true lines with stable chosen_domain and acquired=0, Bootstrap/Reclaim ≈ 0 post-fill, p1_unsatisfied legitimately holding at the parked-gang count.
- Fill-mode dependence (#58): the gang cascade parking targets is a product of live-fill fragmentation (the scheduler’s incremental placement leaving gangs unable to find contiguous rack space). Under a pre-packed install (preBind + full Configured seed) the same gangs concentrate cleanly and parking correctly stays idle. Two consequences: production-shaped fills (always incremental) are exactly where parking matters, and preBind validation runs systematically under-test this engine path; the live-fill rows in docs/scaletest.md’s decision table now note it.
- Probe v3 (#58 follow-up): the attribution probe also samples Phase 3’s reclaim set per cycle (reclaim attribution: machine, cluster, instance type, matches/fits an unsatisfied Need); the unsatisfied-only gang probe is blind by construction to the excess-inventory oscillation #58 surfaced (sustained Bootstrap≈Reclaim where reclaims match no unsatisfied Need).
- The deterministic sim still cannot discriminate the perturbation (ADR-0042’s honesty note stands); the parking bookkeeping is pinned by unit tests (TestParkingBookkeeping) and the contention canary now asserts the contract order-independently (zero reclaims/preempts/evictions across the whole run, every acquired machine still serving at the end) because the OCC worker pool makes per-cycle claim winners timing-dependent and a fixed quiet-window assertion flakes. The cloud re-run decides, probe in hand.
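The order-independent canary contract in the last bullet can be illustrated as a whole-run check that never asks which cluster won a given claim. This is a hedged sketch only: `runSummary` and `violations` are hypothetical names and the machine IDs are made up; the real canary's types and counters are not shown in this ADR.

```go
package main

import (
	"fmt"
	"sort"
)

// runSummary is a hypothetical end-of-run tally for the contention canary.
type runSummary struct {
	Reclaims, Preempts, Evictions int
	Acquired                      []string        // every machine acquired at any point in the run
	Serving                       map[string]bool // machines still serving when the run ends
}

// violations checks the contract over the whole run rather than over a fixed
// quiet window, so the result does not depend on which claims won in which cycle.
func violations(s runSummary) []string {
	var out []string
	if s.Reclaims != 0 {
		out = append(out, fmt.Sprintf("reclaims=%d, want 0", s.Reclaims))
	}
	if s.Preempts != 0 {
		out = append(out, fmt.Sprintf("preempts=%d, want 0", s.Preempts))
	}
	if s.Evictions != 0 {
		out = append(out, fmt.Sprintf("evictions=%d, want 0", s.Evictions))
	}
	for _, id := range s.Acquired {
		if !s.Serving[id] {
			out = append(out, "acquired machine no longer serving: "+id)
		}
	}
	sort.Strings(out) // stable report order
	return out
}

func main() {
	// Two runs where different claims won; both satisfy the contract.
	runA := runSummary{Acquired: []string{"m1", "m2"}, Serving: map[string]bool{"m1": true, "m2": true}}
	runB := runSummary{Acquired: []string{"m3"}, Serving: map[string]bool{"m3": true}}
	// A run with churn: one reclaim, and the acquired machine is gone at the end.
	churn := runSummary{Reclaims: 1, Acquired: []string{"m1"}, Serving: map[string]bool{}}

	fmt.Println(len(violations(runA)), len(violations(runB)))
	fmt.Println(violations(churn))
}
```

Because the check looks only at run-wide totals and the final serving set, two runs with different claim winners pass or fail on the same terms, which is the property a fixed quiet-window assertion lacked.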