feat(tools): Modal GPU sandbox tools — on-demand GPU compute via x402 #41
Merged
Adds four capabilities backed by the BlockRun gateway's Modal sandbox
endpoints, plus session-level cleanup and a wallet-reservation utility
that prevents over-spend across concurrent paid calls.
Tools (registered but hidden by default — agent must ActivateTool to use):
- ModalCreate: create a CPU/T4/L4/A10G/A100/H100 sandbox ($0.01–0.40)
- ModalExec: run a shell command in a live sandbox ($0.001)
- ModalStatus: inspect sandbox state ($0.001)
- ModalTerminate: stop a sandbox and free its resources ($0.001)
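
The PR description mentions client-side coercion of tool inputs (GPU tier case-normalization and wrapping a string command as ["sh","-c","str"]). A minimal sketch of what that could look like follows. The function names, the tier spellings, and the error message are assumptions made for illustration, not the contents of src/tools/modal.ts.

```typescript
// Hypothetical helpers; names and tier spellings are assumptions.
const GPU_TIERS = ["CPU", "T4", "L4", "A10G", "A100", "H100"] as const;
type GpuTier = (typeof GPU_TIERS)[number];

/** Accepts "a100", " A100 ", etc. and returns the canonical upper-case tier. */
function normalizeGpuTier(input: string): GpuTier {
  const upper = input.trim().toUpperCase();
  const match = GPU_TIERS.find((tier) => tier === upper);
  if (match === undefined) {
    throw new Error(
      `Unknown GPU tier "${input}"; expected one of ${GPU_TIERS.join(", ")}`,
    );
  }
  return match;
}

/** A bare string becomes ["sh", "-c", str]; an array is passed through unchanged. */
function normalizeCommand(command: string | string[]): string[] {
  return Array.isArray(command) ? command : ["sh", "-c", command];
}
```

Normalizing before the paid request means a casing slip or a string-vs-array mismatch is corrected locally instead of being rejected by the gateway.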
Other changes:
- src/wallet/reservation.ts — local accounting layer so concurrent
paid tools don't all see the same balance and over-commit.
- src/agent/tool-guard.ts — FAILURE_EXEMPT set so Modal lifecycle
tools (and ImageGen/VideoGen) aren't auto-disabled after 3 errors.
ModalTerminate especially must always be callable — orphan sandboxes
keep billing GPU time.
- src/stats/insights.ts — byCategory breakdown
(chat / media / sandbox) for Usage Insights UI.
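
To illustrate the reservation idea above: each paid call places a hold on the balance before it is sent, so a second concurrent call sees the balance minus outstanding holds. The sketch below is one way to do that, under stated assumptions. The class name, method names, and the use of integer micro-USD amounts are all hypothetical and are not taken from src/wallet/reservation.ts.

```typescript
// Hypothetical sketch of a local reservation ledger. Amounts are integer
// micro-USD (1_000_000 = $1) to avoid floating-point drift; that unit is
// an assumption, not something the PR states.
class WalletReservations {
  private holds = new Map<number, number>(); // reservation id -> held amount
  private nextId = 1;

  /** Sum of all outstanding holds. */
  get totalReserved(): number {
    let sum = 0;
    for (const amount of this.holds.values()) sum += amount;
    return sum;
  }

  /**
   * Try to hold `amount` against `balance`. Returns a reservation id,
   * or null when balance minus existing holds cannot cover it.
   */
  reserve(balance: number, amount: number): number | null {
    if (!Number.isInteger(amount) || amount <= 0) {
      throw new Error("amount must be a positive integer");
    }
    if (balance - this.totalReserved < amount) return null;
    const id = this.nextId++;
    this.holds.set(id, amount);
    return id;
  }

  /** Drop a hold once the call has settled or failed. */
  release(id: number): void {
    this.holds.delete(id);
  }
}
```

Because reserve() contains no await between the check and the write, two calls interleaved on the same event loop cannot both pass the check against the same funds. A caller would release the hold in a finally block once the paid request settles or fails.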
Gateway hard limits (BlockRun-side, not Modal Labs):
- sandbox lifetime ≤ 300s
- per-exec timeout ≤ 60s
- image locked to python:3.11
For long jobs use the fire-and-poll pattern documented in modal.ts.
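
A rough sketch of the fire-and-poll pattern, for jobs that outlast the 60s per-exec limit but still fit inside the 300s sandbox lifetime. The PR description gives the launch form as nohup <cmd> > /workspace/log 2>&1 &. Everything else here is an assumption for illustration: the __EXIT__ sentinel used to detect completion, the Exec callback standing in for a ModalExec call, and the shape of its result.

```typescript
// Hypothetical stand-in for a ModalExec call; the real result shape may differ.
type Exec = (command: string[]) => Promise<{ stdout: string }>;

const SENTINEL = "__EXIT__"; // assumed completion marker, not part of the PR

/** Quote a string for safe use as a single sh word. */
function shellQuote(s: string): string {
  return `'${s.replace(/'/g, "'\\''")}'`;
}

/** Build the launch command: run in the background, log output, append exit code. */
function backgroundCommand(cmd: string, logPath = "/workspace/log"): string[] {
  const wrapped = `${cmd}; echo ${SENTINEL}=$?`;
  return ["sh", "-c", `nohup sh -c ${shellQuote(wrapped)} > ${logPath} 2>&1 &`];
}

/** Return the job's exit code if the sentinel is the last thing in the log tail. */
function parseExitCode(logTail: string): number | null {
  const m = logTail.match(/__EXIT__=(\d+)\s*$/);
  return m ? Number(m[1]) : null;
}

/** Poll the log with short exec calls until the job finishes or polls run out. */
async function pollUntilDone(
  exec: Exec,
  logPath: string,
  intervalMs: number,
  maxPolls: number,
): Promise<number | null> {
  for (let i = 0; i < maxPolls; i++) {
    const { stdout } = await exec(["tail", "-n", "5", logPath]);
    const code = parseExitCode(stdout);
    if (code !== null) return code;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  return null; // still running, or the sandbox expired
}
```

Each poll is itself a paid ModalExec call ($0.001 per the list above), so the interval trades responsiveness against cost.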
Test coverage: scripts/test-modal-tools.mjs runs 30 checks that spend
nothing, including a live gateway contract probe that relies on 400/402
responses. All passing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Adds four on-demand GPU sandbox tools backed by the BlockRun gateway's Modal endpoints, letting Franklin spin up CPU/T4/L4/A10G/A100/H100 compute, run shell commands, and tear it down — all paid per-call via x402.
- ModalCreate
- ModalExec
- ModalStatus
- ModalTerminate

All four are registered but hidden by default (not in CORE_TOOL_NAMES). The agent must call ActivateTool({names:["ModalCreate", ...]}) first, because high-cost ops shouldn't be in the default surface.

Why now
There was a real gap: the agent could call image/video/LLM gateway endpoints, but couldn't run its own code on a GPU. Anything from a quick CUDA sanity check to running an open-source LoRA needed the user to leave Franklin and SSH somewhere. Now the agent can: ask user → ModalCreate(A100) → ModalExec("python train.py") → ModalTerminate.

What's in the PR
New files
- src/tools/modal.ts (766 lines): the four capabilities, x402 sign+retry helper, in-memory SessionSandboxTracker for cleanup, client-side coercion (GPU tier case-normalization, command: "str" → ["sh","-c","str"]), cost preview AskUser card before each create, verbose error diagnostics.
- src/wallet/reservation.ts (118 lines): local accounting layer so N concurrent paid calls don't all see the same wallet balance and over-commit. Reusable; not Modal-specific.
- scripts/test-modal-tools.mjs (143 lines): 30 offline checks covering registration, schema, command normalization, tracker semantics, and a live gateway contract probe (uses 400/402 responses to verify pricing and constraints without spending).

Modified
- src/tools/index.ts (+6): register modalCapabilities.
- src/agent/tool-guard.ts (+18 −2): FAILURE_EXEMPT set. Modal lifecycle tools (and ImageGen/VideoGen) bypass the 3-fail auto-disable. ModalTerminate especially must always be callable: orphan sandboxes keep billing GPU time, and this tool is the only recovery path agent-side.
- src/stats/insights.ts (+41): byCategory: { chatCostUsd, mediaCostUsd, sandboxCostUsd, sandboxRequests } so the Usage Insights UI can show a clean "where did your USDC go" split.

Gateway constraints (BlockRun-side, not Modal Labs)
Documented in modal.ts so the agent doesn't hit them blind:
- sandbox lifetime ≤ 300s
- per-exec timeout ≤ 60s
- image locked to python:3.11

For long-running jobs (training, large downloads), the file documents the fire-and-poll pattern: nohup <cmd> > /workspace/log 2>&1 &, then poll the log via ModalExec.

Cost protection
- ModalCreate shows the user the GPU tier, price, and timeout before charging.
- ModalTerminate is exempt from the 3-fail auto-disable, so the agent can always recover an orphan sandbox.
- terminateAllSessionSandboxes() is exposed for callers to wire into the session lifecycle.

Test plan
- npm run build: green
- node scripts/test-modal-tools.mjs: 30/30 passing, including live gateway probe (CPU $0.01, H100 $0.40, image=python:3.11, exec command-as-array, all confirmed against current gateway behavior)
- franklin → activate ModalCreate → spin up CPU sandbox → exec nvidia-smi (will fail on CPU, but proves round-trip) → terminate (~$0.012)

Out of scope (deliberately deferred)
- Wiring terminateAllSessionSandboxes() into the agent loop's session-end finally block: that lives next to src/api/vscode-session.ts, which is mid-refactor on the extension branch. Will follow in a separate PR so this one stays surgical.

🤖 Generated with Claude Code