Milestone 7 — Cold-start / per-GPU concurrency measured ✅ (stand-in numbers)

Goal: a re-runnable harness that measures the renderer's cold start, warm latency, and burst throughput. Code: ~/modal_examples/milestones/m7_measure.py.

What was built

A harness that, against the deployed M6 renderer:

forces scale-to-zero (sleeps past scaledown_window), then times a cold call,
times a warm call,
fires an 8-spawn burst (Cls cap max_inputs=4) for a throughput smoke test.

The same script runs verbatim against the real renderer once deployed — the method is final, only the numbers change.

How it was validated (ran against deployed M6)

COLD  call: 10.71s
WARM  call: 0.35s
BURST(8, cap=4): 3.92s, all_ok=True
RESULT: {standin: True, model: SmolLM2-135M, cold_s: 10.71, warm_s: 0.35, speedup: 30.6x}
VALIDATION PASSED: real cold start confirmed

⚠️ Stand-in numbers (SmolLM2-135M, not the real 75 GB FP8 model). Across two runs cold start varied 20.35s → 10.71s — exactly why real-model numbers must be measured on the real image, not extrapolated.

Code review (separate subagent) — CHANGES NEEDED → fixed

Finding	Fix applied
Assertion passed on any positive timing — a surviving-warm container would be silently recorded as "cold"	Added a cold-start sanity floor: `cold_s > 2 and cold_s > 3×warm_s` (fails if the container didn't actually scale to zero)
`RESULT` dict had no stand-in marker → numbers could be lifted as real	Added `standin: True` + `model` to the machine-readable result
8-spawn "concurrency" conflates in-container concurrency (cap 4) with a cold scale-up	Re-framed as a burst throughput smoke test (labeled `BURST(8, cap=4)`), not pure 8-wide concurrency
`cold/warm` division could divide by zero	Guarded (`None` if `warm_s==0`)

Status: ✅ method validated; stand-in numbers. Real cold-start / per-GPU concurrency = the pending gate.

Build & validation report

Milestone 7 — Cold-start / per-GPU concurrency measured ✅ (stand-in numbers)

What was built

How it was validated (ran against deployed M6)

Code review (separate subagent) — CHANGES NEEDED → fixed