Milestone 12 — Gateway overflow sidecar (dynamic weight) ✅
Goal: make the gateway spill to Modal only when in-house capacity is full, with the overflow sized to Modal's live capacity as reported by the Modal API, so the gateway's weighted scheduler is driven dynamically. Code: ~/modal_examples/milestones/m12_gateway_sidecar.py.
What was built
A sidecar that, every ~3 s:
- Polls Modal: `Function.get_current_stats()` → `num_total_runners`, `num_running_inputs`, `backlog` (verified real fields in modal 1.5).
- Computes live free capacity: `runners × max_inputs − running` (open slots now) + a small warm-up allowance so a cold (0-runner) tier can still be woken, without advertising full scale-up capacity it can't instantly serve.
- Sets the gateway weight + `max_conns` for the Modal endpoint via the LB admin API (HAProxy Runtime API / nginx Plus / Envoy EDS):
  - 0 when in-house has room (hysteresis, start 0.85 / stop 0.60) → $0, no overflow
  - small warm-up weight to wake a cold tier
  - ramps with real capacity as Modal scales
  - 0 + shed when Modal is full (`runners ≥ max_containers` & no slots, or `backlog` rising)
  - `max_conns = max_inputs × max_containers` (hard cap)
- Fail-safe: after N consecutive poll errors, force weight → 0 (don't route to a tier you can't observe) + exponential backoff.
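The capacity and weight steps above can be sketched roughly as follows. This is a minimal illustration, not the code in m12_gateway_sidecar.py: `Stats`, `free_capacity`, `max_conns`, and `OverflowPolicy` are hypothetical names, the stats object is a plain stand-in rather than a live Modal call, and using the free-slot count directly as the weight is an assumption about how the weight "ramps with real capacity".

```python
from dataclasses import dataclass


@dataclass
class Stats:
    """Hypothetical stand-in for the polled stats (field names from the list above)."""
    num_total_runners: int
    num_running_inputs: int
    backlog: int


def free_capacity(stats, max_inputs, max_containers, prev_backlog=0):
    """Open slots right now, or a one-container warm-up allowance when cold."""
    open_slots = max(0, stats.num_total_runners * max_inputs - stats.num_running_inputs)
    at_ceiling = stats.num_total_runners >= max_containers
    # "Full": at the container ceiling with no slots, or the backlog is rising.
    if (at_ceiling and open_slots == 0) or stats.backlog > prev_backlog:
        return 0
    if stats.num_total_runners == 0:
        return max_inputs  # assumed warm-up allowance: one container's worth
    return open_slots


def max_conns(max_inputs, max_containers):
    """Hard connection cap for the Modal endpoint."""
    return max_inputs * max_containers


class OverflowPolicy:
    """Hysteresis on in-house utilization: start at 0.85, stop at 0.60."""

    def __init__(self, start=0.85, stop=0.60):
        self.start = start
        self.stop = stop
        self.overflowing = False

    def weight(self, inhouse_util, free):
        if self.overflowing:
            if inhouse_util <= self.stop:
                self.overflowing = False
        elif inhouse_util >= self.start:
            self.overflowing = True
        # Assumption: weight equals free slots while overflowing, else 0.
        return float(free) if self.overflowing else 0.0


policy = OverflowPolicy()
cold = Stats(num_total_runners=0, num_running_inputs=0, backlog=0)
busy = Stats(num_total_runners=2, num_running_inputs=5, backlog=0)
print(policy.weight(0.50, free_capacity(cold, 4, 3)))  # 0.0: in-house has room
print(policy.weight(0.90, free_capacity(cold, 4, 3)))  # 4.0: warm-up weight only
print(policy.weight(0.70, free_capacity(busy, 4, 3)))  # 3.0: still above stop threshold
print(policy.weight(0.55, free_capacity(busy, 4, 3)))  # 0.0: dropped below stop
```

The two thresholds keep the weight from flapping when in-house utilization hovers near a single cutoff; the values here are the illustrative defaults, not measured ones.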
How it was validated (python3 m12_gateway_sidecar.py)
ok: capacity cold(warmup)/partial/ceiling/backlog correct
ok: hysteresis + capacity-ramped weight; cold tier woken without flooding
ok: weight snaps to 0 on drain (no stuck 0.1 sliver)
ok: Modal full -> weight 0 + shed
ok: run_loop [0.0, 9.0, 0.0]; fail-safe drops to 0 on API outage
VALIDATION PASSED
Code review (separate subagent) — CHANGES NEEDED → fixed
| Finding (severity) | Fix applied |
|---|---|
| Relied on `input_headroom` (reviewer: nonexistent) | Re-verified it does exist in modal 1.5, but switched to unambiguous `num_running_inputs` (free = runners × max_inputs − running) to sidestep undocumented semantics |
| Cold-start over-promise (HIGH) — advertised full scale-up capacity into a cold tier | Advertise free-now + small warm-up only; weight ramps as real runners materialize |
| Stuck residual weight 0.1 (EWMA never hit 0) | Snap to 0 when target is 0 — no lingering overflow/cost |
| `backlog` ignored for saturation | Folded into the full check |
| No fail-safe on API outage | Force weight 0 + shed after N failures + backoff |
| Tests asserted impl back to itself | Real-shape fake, drain-to-zero test, fail-safe test added |
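Two of the fixes in the table, snap-to-0 and the poll fail-safe, are easy to get subtly wrong, so a rough sketch may help. It is not taken from the sidecar: `smooth_weight`, `FailSafe`, the smoothing factor `alpha`, the failure threshold, and the delay values are all hypothetical choices for illustration.

```python
def smooth_weight(current, target, alpha=0.5):
    """EWMA toward the target, but snap straight to 0 when the target is 0.

    A plain EWMA only approaches 0 asymptotically, which is how a residual
    sliver of weight (and cost) can linger after overflow should have stopped.
    """
    if target <= 0.0:
        return 0.0
    return current + alpha * (target - current)


class FailSafe:
    """Tracks consecutive poll errors and suggests the next poll delay."""

    def __init__(self, max_failures=3, base_delay=3.0, max_delay=60.0):
        self.max_failures = max_failures
        self.base_delay = base_delay
        self.max_delay = max_delay
        self.failures = 0

    def record(self, ok):
        # Any successful poll resets the streak.
        self.failures = 0 if ok else self.failures + 1

    @property
    def tripped(self):
        """True once the tier is unobservable: the caller should force weight 0."""
        return self.failures >= self.max_failures

    def next_delay(self):
        """Base interval when healthy, doubling per consecutive failure, capped."""
        return min(self.max_delay, self.base_delay * 2 ** self.failures)
```

In a poll loop, the caller would apply `smooth_weight` only while `tripped` is false and otherwise set the weight to 0 directly, so a stale capacity reading is never smoothed back into the scheduler.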
Note: the reviewer's one Critical (a "nonexistent field") was itself wrong — I verified against the installed SDK. The other findings were valid and are fixed.
Defaults (`0.85/0.60` hysteresis, `~3 s` poll, warm-up = one container) are illustrative: chosen, not measured; tune with real traffic. The test outputs (`[0.0, 9.0, 0.0]`, etc.) are real (`python3 m12_gateway_sidecar.py`). `FunctionStats` fields were confirmed live via the SDK.
Status: ✅ validated — dynamic, capacity-aware overflow weighting with cold-start wake, snap-to-0, and fail-safe. See §16.1 for how it fits the weighted gateway.