Milestone 6 — Serving + snapshot lifecycle ✅ (stand-in)
Goal: validate the memory-snapshot serving lifecycle the real renderer will use. Code: ~/modal_examples/milestones/m6_renderer_snapshot.py.
What was built
A Modal @app.cls with enable_memory_snapshot=True:
- @modal.enter(snap=True): heavy model init on CPU (captured in the snapshot), no GPU access.
- @modal.enter(snap=False): the single .to("cuda") call, run on each restore.
- @modal.method() render(): invoked via spawn(), returns bytes (stand-in for MP4).
Weights come from the M5 Volume (/weights/model); this file deliberately doesn't download them (separation of concerns), so M5 must have populated the Volume first.
How it was validated (deployed + ran)
[snap=True] model loaded to CPU in 2.98s
[snap=False] moved to GPU in 0.59s
RENDER: b'MP4-STANDIN:the capital of France is Paris...'
VALIDATION PASSED: snapshotted Cls renders via spawn()->get().
The snap=True/snap=False split is GPU-safe (no CUDA access during snapshot) — the key correctness property.
Code review (separate subagent) — CHANGES NEEDED → addressed
| Finding | Resolution |
|---|---|
| "weights never downloaded" (HIGH) | Not a bug — weights are provided by M5's Volume. Documented the precondition explicitly so the file isn't read as standalone. |
| Smoke test proves the method runs, not a snapshot restore | Documented scope: validates the lifecycle mechanism only; proving an actual restore needs a 2nd cold boot + log check. |
| Stand-in understates real-scale gaps | Documented: does NOT validate snapshot size, GPU-transfer time, or max_inputs concurrency at the real 75 GB scale. |
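The log check mentioned in the second finding could look something like the sketch below. It assumes that a container restored from a snapshot re-runs only the snap=False hook, so its logs contain a [snap=False] line but no [snap=True] line. It also assumes the caller has already isolated the log lines of a single container boot. The function name classify_boot and the second-boot log lines are hypothetical; only the [snap=True] / [snap=False] prefixes come from the run above.

```python
from typing import Iterable


def classify_boot(log_lines: Iterable[str]) -> str:
    """Classify one container boot from its log lines.

    Returns "restored" if only the snap=False hook logged,
    "snapshot-init" if both hooks logged (full init path),
    and "unknown" otherwise.
    """
    lines = [line.strip() for line in log_lines]
    saw_snap_true = any(line.startswith("[snap=True]") for line in lines)
    saw_snap_false = any(line.startswith("[snap=False]") for line in lines)

    if saw_snap_false and not saw_snap_true:
        return "restored"
    if saw_snap_true and saw_snap_false:
        return "snapshot-init"
    return "unknown"


# First boot, as logged in the validation run.
first_boot = [
    "[snap=True] model loaded to CPU in 2.98s",
    "[snap=False] moved to GPU in 0.59s",
]

# Hypothetical second cold boot: what a successful restore would be expected to log.
second_boot = [
    "[snap=False] moved to GPU in 0.61s",
]

print(classify_boot(first_boot))   # snapshot-init
print(classify_boot(second_boot))  # restored
```

A "restored" result on the second cold boot would be the evidence the smoke test currently lacks; a second "snapshot-init" would suggest the snapshot was not used.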
Honest scope
This validates that the snapshot lifecycle is wired correctly. It does not prove snapshot feasibility at 75 GB (size limits, restore time); that needs the real image. The snap=True/snap=False split itself is verified GPU-safe.
Status: ✅ mechanism validated (stand-in). Real image deploy = the pending gate.