Milestone · Boson AI · Modal overflow

Build & validation report

Milestone 6 — Serving + snapshot lifecycle ✅ (stand-in)

Goal: validate the memory-snapshot serving lifecycle the real renderer will use. Code: ~/modal_examples/milestones/m6_renderer_snapshot.py.

What was built

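A Modal @app.cls with enable_memory_snapshot=True, split into two startup phases: a snap=True phase that loads the model to CPU (the state captured in the snapshot) and a snap=False phase that moves it to GPU.

Weights come from the M5 Volume (/weights/model); this file deliberately doesn't download them (separation of concerns), so the M5 Volume must already be populated before this class runs.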

How it was validated (deployed + ran)

[snap=True]  model loaded to CPU in 2.98s
[snap=False] moved to GPU in 0.59s
RENDER: b'MP4-STANDIN:the capital of France is Paris...'
VALIDATION PASSED: snapshotted Cls renders via spawn()->get().

The snap=True/snap=False split is GPU-safe: the snap=True phase only loads to CPU, so nothing touches CUDA while the snapshot is taken. This is the key correctness property.

Code review (separate subagent) — CHANGES NEEDED → addressed

Finding | Resolution
"weights never downloaded" (HIGH) | Not a bug: weights are provided by M5's Volume. Documented the precondition explicitly so the file isn't read as standalone.
Smoke test proves the method runs, not a snapshot restore | Documented scope: validates the lifecycle mechanism only; proving an actual restore needs a 2nd cold boot + log check.
Stand-in understates real-scale gaps | Documented: does NOT validate snapshot size, GPU-transfer time, or max_inputs concurrency at the real 75 GB scale.
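
The second finding says that proving a real restore needs a second cold boot plus a log check. One way that check could look is sketched below. It assumes the log lines keep the [snap=True] / [snap=False] prefixes seen in the validation run, and it relies on the expectation that a container restored from a snapshot runs only the snap=False phase. The function name and the sample logs are hypothetical, and this is a heuristic, not a definitive restore proof.

```python
import re

# Assumed log format, modeled on the prefixes printed in the validation run.
SNAP_PHASE = re.compile(r"^\[snap=(True|False)\]")


def looks_like_restore(cold_boot_log: str) -> bool:
    """Heuristic: a restored container logs the snap=False phase only.

    If the snap=True phase shows up, the container built its state from
    scratch (or was creating the snapshot) instead of restoring it.
    """
    phases = set()
    for line in cold_boot_log.splitlines():
        match = SNAP_PHASE.match(line.strip())
        if match:
            phases.add(match.group(1))
    return phases == {"False"}


# First boot: both phases run, so this is not a restore.
first_boot = (
    "[snap=True]  model loaded to CPU in 2.98s\n"
    "[snap=False] moved to GPU in 0.59s\n"
)
# Hypothetical second cold boot: only the snap=False phase appears.
second_boot = "[snap=False] moved to GPU in 0.59s\n"

print(looks_like_restore(first_boot))
print(looks_like_restore(second_boot))
```

An empty log returns False, so a boot that produced no phase lines is never counted as a successful restore.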

Honest scope

This validates that the snapshot lifecycle is wired correctly. It does not prove snapshot feasibility at 75 GB (size limits, restore time) — that needs the real image. The snap split itself is verified GPU-safe.

Status: ✅ mechanism validated (stand-in). Real image deploy = the pending gate.