feat(tools): Modal GPU sandbox tools — on-demand GPU compute via x402 by KillerQueen-Z · Pull Request #41 · BlockRunAI/Franklin

KillerQueen-Z · 2026-05-03T09:57:51Z

Summary

Adds four on-demand GPU sandbox tools backed by the BlockRun gateway's Modal endpoints, letting Franklin spin up CPU/T4/L4/A10G/A100/H100 compute, run shell commands, and tear it down — all paid per-call via x402.

| Tool | Purpose | Price |
| --- | --- | --- |
| `ModalCreate` | Spin up sandbox at a chosen GPU tier | $0.01 (CPU) → $0.40 (H100) |
| `ModalExec` | Run a shell command in a live sandbox | $0.001 |
| `ModalStatus` | Inspect sandbox state | $0.001 |
| `ModalTerminate` | Stop sandbox & free resources | $0.001 |
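
To make the per-call pricing concrete, here is a minimal sketch of a session cost estimate. It uses only the prices stated in the table (CPU and H100 create, plus the flat $0.001 calls); the names `estimateSessionMicroUsd`, `SessionPlan`, and `formatUsd` are hypothetical and are not taken from `modal.ts`. Amounts are held as integer micro-USD so the sums stay exact.

```typescript
// Prices in micro-USD (1 USD = 1_000_000) so sums stay exact integers.
// Only the two create prices stated in the table are included; prices for
// the tiers in between are not given in the PR description.
const CREATE_PRICE_MICRO_USD = { CPU: 10_000, H100: 400_000 } as const;
const CALL_PRICE_MICRO_USD = 1_000; // ModalExec, ModalStatus, ModalTerminate

type PricedTier = keyof typeof CREATE_PRICE_MICRO_USD;

// Hypothetical shape describing one create/exec/terminate session.
interface SessionPlan {
  tier: PricedTier;
  execCalls: number;
  statusCalls: number;
}

// Total for one create, the given exec/status calls, and one terminate.
function estimateSessionMicroUsd(plan: SessionPlan): number {
  const flatCalls = plan.execCalls + plan.statusCalls + 1; // +1 for ModalTerminate
  return CREATE_PRICE_MICRO_USD[plan.tier] + flatCalls * CALL_PRICE_MICRO_USD;
}

function formatUsd(microUsd: number): string {
  return `$${(microUsd / 1_000_000).toFixed(3)}`;
}

// CPU sandbox, one exec, then terminate.
console.log(formatUsd(estimateSessionMicroUsd({ tier: "CPU", execCalls: 1, statusCalls: 0 })));
// prints "$0.012"
```

The result matches the ~$0.012 figure quoted for the manual CPU round trip in the test plan.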

All four are registered but hidden by default (not in CORE_TOOL_NAMES). The agent must ActivateTool({names:["ModalCreate", ...]}) first — high-cost ops shouldn't be in the default surface.
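
As a rough illustration of "registered but hidden by default", the sketch below models a tool surface where only core tools and explicitly activated tools are visible. Only the names `CORE_TOOL_NAMES` and `ActivateTool` come from this PR; the `ToolSurface` class, the `SomeCoreTool` placeholder, and the return values are assumptions, not Franklin's actual registry.

```typescript
// Placeholder: "SomeCoreTool" stands in for whatever is really listed in
// CORE_TOOL_NAMES, which the PR description does not enumerate.
const CORE_TOOL_NAMES: readonly string[] = ["SomeCoreTool"];
const REGISTERED_TOOL_NAMES: readonly string[] = [
  "SomeCoreTool",
  "ModalCreate",
  "ModalExec",
  "ModalStatus",
  "ModalTerminate",
];

// Hypothetical model of the agent's visible tool surface.
class ToolSurface {
  private readonly activated: string[] = [];

  // Mirrors the ActivateTool({ names }) call shape; returns unknown names.
  activate(args: { names: string[] }): string[] {
    const unknown: string[] = [];
    for (const name of args.names) {
      if (REGISTERED_TOOL_NAMES.indexOf(name) === -1) unknown.push(name);
      else if (this.activated.indexOf(name) === -1) this.activated.push(name);
    }
    return unknown;
  }

  // A tool is visible if it is core or has been activated.
  visibleTools(): string[] {
    return REGISTERED_TOOL_NAMES.filter(
      (name) =>
        CORE_TOOL_NAMES.indexOf(name) !== -1 || this.activated.indexOf(name) !== -1,
    );
  }
}

const surface = new ToolSurface();
surface.activate({ names: ["ModalCreate", "ModalTerminate"] });
console.log(surface.visibleTools());
```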

Why now

There was a real gap: the agent could call image/video/LLM gateway endpoints, but couldn't run its own code on a GPU. Anything from a quick CUDA sanity check to running an open-source LoRA needed the user to leave Franklin and SSH somewhere. Now the agent can: ask user → ModalCreate(A100) → ModalExec("python train.py") → ModalTerminate.

What's in the PR

New files

src/tools/modal.ts (766 lines) — the four capabilities, x402 sign+retry helper, in-memory SessionSandboxTracker for cleanup, client-side coercion (GPU tier case-normalization, command: "str" → ["sh","-c","str"]), cost preview AskUser card before each create, verbose error diagnostics.
src/wallet/reservation.ts (118 lines) — local accounting layer so N concurrent paid calls don't all see the same wallet balance and over-commit. Reusable; not Modal-specific.
scripts/test-modal-tools.mjs (143 lines) — 30 offline checks: registration, schema, command normalization, tracker semantics, live gateway contract probe (uses 400/402 responses to verify pricing & constraints without spending).
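
The client-side coercion described for `src/tools/modal.ts` could look roughly like the following. This is a sketch under assumptions: the function names `normalizeGpuTier` and `normalizeCommand` are hypothetical, and rejecting unknown tiers with an error is a guess at the behavior rather than something the PR states.

```typescript
// Tier names as listed in the PR summary.
const GPU_TIERS = ["CPU", "T4", "L4", "A10G", "A100", "H100"] as const;
type GpuTier = (typeof GPU_TIERS)[number];

// Case-normalize a user- or model-supplied tier ("a10g" -> "A10G").
// Assumption: unknown tiers are rejected rather than silently defaulted.
function normalizeGpuTier(input: string): GpuTier {
  const upper = input.trim().toUpperCase();
  const match = GPU_TIERS.find((tier) => tier === upper);
  if (match === undefined) {
    throw new Error(`Unknown GPU tier "${input}"; expected one of ${GPU_TIERS.join(", ")}`);
  }
  return match;
}

// The gateway expects the exec command as an array, so a bare string is
// wrapped as ["sh", "-c", "<string>"]; arrays pass through unchanged.
function normalizeCommand(command: string | string[]): string[] {
  return Array.isArray(command) ? command : ["sh", "-c", command];
}

console.log(normalizeGpuTier("a10g")); // prints "A10G"
console.log(normalizeCommand("python train.py"));
```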

Modified

src/tools/index.ts (+6) — register modalCapabilities.
src/agent/tool-guard.ts (+18 −2) — FAILURE_EXEMPT set: Modal lifecycle tools (and ImageGen/VideoGen) bypass the 3-fail auto-disable. ModalTerminate especially must always be callable — orphan sandboxes keep billing GPU time, and this tool is the only recovery path agent-side.
src/stats/insights.ts (+41) — byCategory: { chatCostUsd, mediaCostUsd, sandboxCostUsd, sandboxRequests } so Usage Insights UI can show a clean "where did your USDC go" split.
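
A minimal sketch of how a failure-exempt set can interact with a 3-fail auto-disable follows. The name `FAILURE_EXEMPT` comes from this PR; the `ToolGuard` class, its method names, the reset-on-success behavior, and the assumption that all four Modal tools are in the exempt set are illustrative guesses, not the contents of `src/agent/tool-guard.ts`.

```typescript
const MAX_FAILURES = 3;

// Assumption: "Modal lifecycle tools" means all four Modal tools.
const FAILURE_EXEMPT: ReadonlySet<string> = new Set([
  "ModalCreate",
  "ModalExec",
  "ModalStatus",
  "ModalTerminate",
  "ImageGen",
  "VideoGen",
]);

// Hypothetical guard: disables a tool after MAX_FAILURES failures,
// unless the tool is in FAILURE_EXEMPT.
class ToolGuard {
  private readonly failures = new Map<string, number>();
  private readonly disabled = new Set<string>();

  recordFailure(tool: string): void {
    if (FAILURE_EXEMPT.has(tool)) return; // exempt tools are never counted
    const count = (this.failures.get(tool) ?? 0) + 1;
    this.failures.set(tool, count);
    if (count >= MAX_FAILURES) this.disabled.add(tool);
  }

  // Assumption: a success resets the failure counter.
  recordSuccess(tool: string): void {
    this.failures.delete(tool);
  }

  isCallable(tool: string): boolean {
    return !this.disabled.has(tool);
  }
}

const guard = new ToolGuard();
for (let i = 0; i < 5; i++) {
  guard.recordFailure("ModalTerminate");
  guard.recordFailure("SomeOtherTool"); // placeholder for a non-exempt tool
}
console.log(guard.isCallable("ModalTerminate")); // prints true
console.log(guard.isCallable("SomeOtherTool")); // prints false
```

Keeping the exemption check at the top of `recordFailure` means an exempt tool never accumulates state, so `ModalTerminate` stays callable no matter how many times it has failed.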

Gateway constraints (BlockRun-side, not Modal Labs)

Documented in modal.ts so the agent doesn't hit them blind:

Sandbox lifetime hard-capped at 300s
Per-exec timeout hard-capped at 60s
Image locked to python:3.11
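
One way to avoid hitting these caps blind is to clamp requested timeouts on the client before sending them. The sketch below encodes the three limits listed above; the constant and function names are hypothetical, and whether `modal.ts` clamps, rejects, or merely documents out-of-range values is not stated in this PR.

```typescript
// Limits as listed in the PR description (BlockRun gateway side).
const GATEWAY_LIMITS = {
  maxSandboxLifetimeSec: 300,
  maxExecTimeoutSec: 60,
  image: "python:3.11",
} as const;

// Clamp a requested timeout to a cap. Assumptions: a missing value
// defaults to the cap, and fractional seconds are rounded down (min 1s).
function clampSeconds(requested: number | undefined, capSec: number): number {
  if (requested === undefined) return capSec;
  if (!Number.isFinite(requested) || requested <= 0) {
    throw new RangeError(`timeout must be a positive number, got ${requested}`);
  }
  return Math.min(Math.max(1, Math.floor(requested)), capSec);
}

console.log(clampSeconds(3600, GATEWAY_LIMITS.maxSandboxLifetimeSec)); // prints 300
console.log(clampSeconds(15, GATEWAY_LIMITS.maxExecTimeoutSec)); // prints 15
```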

For long-running jobs (training, large downloads), the file documents the fire-and-poll pattern: nohup <cmd> > /workspace/log 2>&1 &, then poll log via ModalExec.
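
The fire-and-poll pattern can be sketched as two command builders whose output would be passed to `ModalExec`. The helper names are hypothetical, using `tail` for polling is an assumption (the PR only says to poll the log), and the job command is interpolated unquoted on purpose, so it must come from a trusted source. The 300s sandbox lifetime cap presumably still bounds the background job as a whole.

```typescript
const DEFAULT_LOG_PATH = "/workspace/log";

// Single-quote a value for POSIX sh, escaping embedded single quotes.
function shellQuote(value: string): string {
  return `'${value.replace(/'/g, `'\\''`)}'`;
}

// Start the job in the background so the exec call returns well inside
// the 60s per-exec cap; stdout and stderr both go to the log file.
function buildStartCommand(command: string, logPath: string = DEFAULT_LOG_PATH): string[] {
  return ["sh", "-c", `nohup ${command} > ${shellQuote(logPath)} 2>&1 &`];
}

// Poll by reading the last few lines of the log. No shell is needed,
// so the path is passed as a plain argument.
function buildPollCommand(logPath: string = DEFAULT_LOG_PATH, lines: number = 20): string[] {
  return ["tail", "-n", String(lines), logPath];
}

console.log(buildStartCommand("python train.py"));
console.log(buildPollCommand());
```

Each returned array would be sent as the command of one `ModalExec` call: one call to start the job, then repeated poll calls until the log shows completion.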

Cost protection

AskUser preview — every ModalCreate shows the user the GPU tier, price, and timeout before charging.
Wallet reservation — concurrent paid calls reserve their slice of balance locally so a $0.20 wallet can't dispatch 6 × $0.04 calls simultaneously and have 1 fail mid-flight.
Failure-exempt ModalTerminate — agent can always recover an orphan sandbox.
Session-end cleanup hook — terminateAllSessionSandboxes() exposed for callers to wire into session lifecycle.
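
The wallet reservation idea can be sketched as a small in-memory ledger: each paid call reserves its price before dispatch and is refused up front if the unreserved balance is too small. This is an illustration under assumptions, not the contents of `src/wallet/reservation.ts`; the class name, the handle shape, and the use of integer micro-USD are all hypothetical.

```typescript
// Handle returned for a successful reservation.
interface Reservation {
  commit(): void; // payment went through: deduct from balance
  release(): void; // call failed or was cancelled: free the hold
}

// Hypothetical local ledger; amounts are integer micro-USD.
class WalletReservations {
  private balanceMicroUsd: number;
  private reservedMicroUsd = 0;

  constructor(balanceMicroUsd: number) {
    this.balanceMicroUsd = balanceMicroUsd;
  }

  availableMicroUsd(): number {
    return this.balanceMicroUsd - this.reservedMicroUsd;
  }

  // Returns null when the unreserved balance cannot cover the amount.
  tryReserve(amountMicroUsd: number): Reservation | null {
    if (!Number.isInteger(amountMicroUsd) || amountMicroUsd <= 0) {
      throw new RangeError("amount must be a positive integer");
    }
    if (amountMicroUsd > this.availableMicroUsd()) return null;
    this.reservedMicroUsd += amountMicroUsd;
    let settled = false; // guards against double commit/release
    return {
      commit: () => {
        if (settled) return;
        settled = true;
        this.reservedMicroUsd -= amountMicroUsd;
        this.balanceMicroUsd -= amountMicroUsd;
      },
      release: () => {
        if (settled) return;
        settled = true;
        this.reservedMicroUsd -= amountMicroUsd;
      },
    };
  }
}

// The $0.20 wallet and 6 x $0.04 calls from the example above.
const wallet = new WalletReservations(200_000);
const holds = Array.from({ length: 6 }, () => wallet.tryReserve(40_000));
console.log(holds.filter((hold) => hold !== null).length); // prints 5
```

In this sketch the sixth call is refused before any request is sent, instead of failing mid-flight after five others have already drained the balance.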

Test plan

npm run build — green
node scripts/test-modal-tools.mjs — 30/30 passing, including live gateway probe (CPU $0.01, H100 $0.40, image=python:3.11, exec command-as-array, all confirmed against current gateway behavior)
Manual: franklin → activate ModalCreate → spin up CPU sandbox → exec nvidia-smi (will fail on CPU, but proves round-trip) → terminate (~$0.012)
Manual: GPU smoke test with A10G (~$0.10)

Out of scope (deliberately deferred)

Wiring terminateAllSessionSandboxes() into the agent loop's session-end finally block — that lives next to src/api/vscode-session.ts which is mid-refactor on the extension branch. Will follow in a separate PR so this one stays surgical.
Long-job fire-and-poll helper as its own tool.

🤖 Generated with Claude Code

Adds four capabilities backed by the BlockRun gateway's Modal sandbox endpoints, plus session-level cleanup and a wallet-reservation utility that prevents over-spend across concurrent paid calls.

Tools (registered but hidden by default; agent must ActivateTool to use):
- ModalCreate: create CPU/T4/L4/A10G/A100/H100 sandbox ($0.01–0.40)
- ModalExec: run a shell command in a live sandbox ($0.001)
- ModalStatus: inspect sandbox state ($0.001)
- ModalTerminate: stop sandbox & free resources ($0.001)

Other changes:
- src/wallet/reservation.ts: local accounting layer so concurrent paid tools don't all see the same balance and over-commit.
- src/agent/tool-guard.ts: FAILURE_EXEMPT set so Modal lifecycle tools (and ImageGen/VideoGen) aren't auto-disabled after 3 errors. ModalTerminate especially must always be callable, because orphan sandboxes keep billing GPU time.
- src/stats/insights.ts: byCategory breakdown (chat / media / sandbox) for Usage Insights UI.

Gateway hard limits (BlockRun-side, not Modal Labs):
- sandbox lifetime ≤ 300s
- per-exec timeout ≤ 60s
- image locked to python:3.11

For long jobs use the fire-and-poll pattern documented in modal.ts.

Test coverage: scripts/test-modal-tools.mjs runs 30 offline checks including a live gateway contract probe (uses 400/402 responses, no spend). All passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

1bcMax merged commit 78ba6d9 into main May 3, 2026
0 of 2 checks passed

1bcMax merged 1 commit into main from feat/modal-sandbox-tools
