Milestone 12 — Gateway overflow sidecar (dynamic weight) ✅

Goal: make the gateway spill to Modal only when in-house capacity is full, with the overflow sized to Modal's real live capacity as reported by the Modal API, so that the gateway's weighted scheduler is driven dynamically. Code: ~/modal_examples/milestones/m12_gateway_sidecar.py.

What was built

A sidecar that, every ~3 s:

Polls Modal — Function.get_current_stats() → num_total_runners, num_running_inputs, backlog (verified real fields in modal 1.5).
Computes live free capacity: runners × max_inputs − running (slots open right now), plus a small warm-up allowance so that a cold (0-runner) tier can still be woken. It does not advertise full scale-up capacity that Modal cannot serve instantly.
Sets the gateway weight + max_conns for the Modal endpoint via the LB admin API (HAProxy Runtime API / nginx Plus / Envoy EDS):
- 0 when in-house has room (hysteresis, start 0.85 / stop 0.60) → $0, no overflow
- small warm-up weight to wake a cold tier
- ramps with real capacity as Modal scales
- 0 + shed when Modal is full (runners ≥ max_containers & no slots, or backlog rising)
- max_conns = max_inputs × max_containers (hard cap)
Fail-safe: after N consecutive poll errors, force weight → 0 (don't route to a tier you can't observe) + exponential backoff.
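
The capacity and weight rules above can be sketched roughly as follows. This is a minimal illustration, not the code in m12_gateway_sidecar.py: the `Stats` class is a stand-in for the polled Modal stats (only the three field names come from this report), and the function names, the `warmup` parameter, and the choice to add the warm-up allowance whenever the tier is below `max_containers` are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    """Hypothetical stand-in for the polled stats; field names follow the report."""
    num_total_runners: int
    num_running_inputs: int
    backlog: int


def free_capacity(s, max_inputs, max_containers, warmup):
    """Slots open right now, plus a warm-up allowance while the tier can still grow."""
    free_now = max(0, s.num_total_runners * max_inputs - s.num_running_inputs)
    can_grow = s.num_total_runners < max_containers  # assumption: allowance stops at the ceiling
    return free_now + (warmup if can_grow else 0)


def is_full(s, prev_backlog, max_inputs, max_containers):
    """Full when at the container ceiling with no open slots, or when backlog is rising."""
    no_slots = s.num_running_inputs >= s.num_total_runners * max_inputs
    at_ceiling = s.num_total_runners >= max_containers
    return (at_ceiling and no_slots) or s.backlog > prev_backlog


def overflow_active(was_active, inhouse_util, start=0.85, stop=0.60):
    """Hysteresis: start overflowing at `start` utilization, stop only at or below `stop`."""
    if was_active:
        return inhouse_util > stop
    return inhouse_util >= start


def target_weight(s, prev_backlog, active, max_inputs, max_containers, warmup):
    """Weight 0 when in-house has room or Modal is full; otherwise track free capacity."""
    if not active or is_full(s, prev_backlog, max_inputs, max_containers):
        return 0.0
    return float(free_capacity(s, max_inputs, max_containers, warmup))
```

With `max_inputs=4`, `max_containers=3` and a warm-up of one container (4 slots), a cold tier advertises only the 4 warm-up slots, and a tier at the ceiling with every slot busy advertises 0. The hard cap from the list above would be `max_conns = 4 × 3 = 12` in this example.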

How it was validated (`python3 m12_gateway_sidecar.py`)

ok: capacity cold(warmup)/partial/ceiling/backlog correct
ok: hysteresis + capacity-ramped weight; cold tier woken without flooding
ok: weight snaps to 0 on drain (no stuck 0.1 sliver)
ok: Modal full -> weight 0 + shed
ok: run_loop [0.0, 9.0, 0.0]; fail-safe drops to 0 on API outage
VALIDATION PASSED
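
The fail-safe behaviour checked in the last test line can be pictured with a small loop like the one below. It is a sketch under stated assumptions rather than the actual run_loop: `run_polls`, `max_errors`, `base_delay` and `max_delay` are hypothetical names, holding the last weight during the first few errors is an assumed policy, and the function returns the delays instead of sleeping so that it stays testable.

```python
from typing import Callable, List, Tuple


def run_polls(poll: Callable[[], float], n_polls: int, max_errors: int = 3,
              base_delay: float = 3.0, max_delay: float = 60.0) -> Tuple[List[float], List[float]]:
    """Return the weight applied and the delay chosen after each of n_polls polls."""
    weights: List[float] = []
    delays: List[float] = []
    errors = 0          # consecutive poll failures
    weight = 0.0        # last weight pushed to the gateway
    for _ in range(n_polls):
        try:
            weight = poll()
            errors = 0
        except Exception:  # any poll failure counts; a real sidecar might narrow this
            errors += 1
            if errors >= max_errors:
                weight = 0.0  # do not route to a tier that cannot be observed
        weights.append(weight)
        # Exponential backoff: the delay doubles per consecutive error, capped at max_delay.
        delays.append(min(max_delay, base_delay * 2 ** errors))
    return weights, delays
```

Returning the schedule instead of calling `time.sleep` is only a convenience for the example; a long-running sidecar would sleep for each delay between polls.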

Code review (separate subagent) — CHANGES NEEDED → fixed

Finding (severity)	Fix applied
Relied on `input_headroom` (reviewer: nonexistent)	Re-verified that it does exist in modal 1.5, but switched to the unambiguous `num_running_inputs` (`free = runners×max_inputs − running`) to sidestep undocumented semantics
Cold-start over-promise (HIGH) — advertised full scale-up capacity into a cold tier	Advertise free-now + small warm-up only; weight ramps as real runners materialize
Stuck residual weight 0.1 (EWMA never hit 0)	Snap to 0 when target is 0 — no lingering overflow/cost
`backlog` ignored for saturation	Folded into the full check
No fail-safe on API outage	Force weight 0 + shed after N failures + backoff
Tests asserted impl back to itself	Real-shape fake, drain-to-zero test, fail-safe test added
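
The stuck-residual finding in the table is easy to reproduce with a plain EWMA, which only approaches zero asymptotically. The sketch below contrasts it with a snap-to-0 variant; the function names and the smoothing factor `alpha=0.5` are hypothetical and chosen only for illustration.

```python
def plain_ewma(prev: float, target: float, alpha: float = 0.5) -> float:
    """Standard exponential smoothing; never reaches exactly 0 from a positive value."""
    return alpha * target + (1 - alpha) * prev


def smooth_weight(prev: float, target: float, alpha: float = 0.5) -> float:
    """Same smoothing on the way up, but snap straight to 0 when the target is 0."""
    if target <= 0.0:
        return 0.0
    return plain_ewma(prev, target, alpha)


# Draining from weight 9.0 with a target of 0:
w_plain = 9.0
w_snap = 9.0
for _ in range(5):
    w_plain = plain_ewma(w_plain, 0.0)
    w_snap = smooth_weight(w_snap, 0.0)
# w_plain is still positive after five steps; w_snap is exactly 0.0.
```

Snapping only on a zero target keeps the ramp-up smooth while ensuring that a drained tier receives no lingering traffic.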

Note: the reviewer's one Critical (a "nonexistent field") was itself wrong — I verified against the installed SDK. The other findings were valid and are fixed.

Defaults (0.85/0.60 hysteresis, ~3 s poll, warm-up = one container) are illustrative: they were chosen, not measured, and should be tuned with real traffic. The test outputs ([0.0, 9.0, 0.0], etc.) are actual output from running python3 m12_gateway_sidecar.py. FunctionStats fields were confirmed live via the SDK.

Status: ✅ validated — dynamic, capacity-aware overflow weighting with cold-start wake, snap-to-0, and fail-safe. See §16.1 for how it fits the weighted gateway.
