feat(tools): Modal GPU sandbox tools — on-demand GPU compute via x402 #41

Merged
1bcMax merged 1 commit into main from feat/modal-sandbox-tools on May 3, 2026

@KillerQueen-Z
Collaborator

Summary

Adds four on-demand GPU sandbox tools backed by the BlockRun gateway's Modal endpoints, letting Franklin spin up CPU/T4/L4/A10G/A100/H100 compute, run shell commands, and tear it down — all paid per-call via x402.

| Tool | Purpose | Price |
| --- | --- | --- |
| ModalCreate | Spin up sandbox at a chosen GPU tier | $0.01 (CPU) → $0.40 (H100) |
| ModalExec | Run a shell command in a live sandbox | $0.001 |
| ModalStatus | Inspect sandbox state | $0.001 |
| ModalTerminate | Stop sandbox & free resources | $0.001 |

All four are registered but hidden by default (not in CORE_TOOL_NAMES). The agent must first call ActivateTool({names:["ModalCreate", ...]}), because high-cost ops shouldn't be in the default surface.

Why now

There was a real gap: the agent could call image/video/LLM gateway endpoints, but couldn't run its own code on a GPU. Anything from a quick CUDA sanity check to running an open-source LoRA needed the user to leave Franklin and SSH somewhere. Now the agent can: ask user → ModalCreate(A100) → ModalExec("python train.py") → ModalTerminate.
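The create, exec, terminate flow can be sketched roughly as follows. The `SandboxClient` interface, its method names, and the fake client are hypothetical stand-ins for the tool calls, not the API in modal.ts; the point of the sketch is the `try`/`finally` that terminates the sandbox even when the exec step fails.

```typescript
// Hypothetical client interface; in Franklin the agent invokes these as tools.
interface SandboxClient {
  create(gpu: string): Promise<string>; // resolves to a sandbox id
  exec(id: string, command: string[]): Promise<string>;
  terminate(id: string): Promise<void>;
}

// Run one command in a fresh sandbox and always terminate it afterwards,
// so a failed exec never leaves a sandbox billing GPU time.
async function runOnce(
  client: SandboxClient,
  gpu: string,
  command: string[],
): Promise<string> {
  const id = await client.create(gpu);
  try {
    return await client.exec(id, command);
  } finally {
    await client.terminate(id);
  }
}

// In-memory fake, used only to exercise the sketch without a gateway.
function makeFakeClient(log: string[]): SandboxClient {
  return {
    async create(gpu) {
      log.push(`create:${gpu}`);
      return "sb-1";
    },
    async exec(id, command) {
      log.push(`exec:${id}`);
      if (command[0] === "fail") throw new Error("exec failed");
      return "ok";
    },
    async terminate(id) {
      log.push(`terminate:${id}`);
    },
  };
}
```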

What's in the PR

New files

  • src/tools/modal.ts (766 lines) — the four capabilities, x402 sign+retry helper, in-memory SessionSandboxTracker for cleanup, client-side coercion (GPU tier case-normalization, command: "str" → ["sh","-c","str"]), cost preview AskUser card before each create, verbose error diagnostics.
  • src/wallet/reservation.ts (118 lines) — local accounting layer so N concurrent paid calls don't all see the same wallet balance and over-commit. Reusable; not Modal-specific.
  • scripts/test-modal-tools.mjs (143 lines) — 30 offline checks: registration, schema, command normalization, tracker semantics, live gateway contract probe (uses 400/402 responses to verify pricing & constraints without spending).
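The client-side coercion described for modal.ts could look something like the sketch below. The function names and the exact tier strings are assumptions for illustration; only the tier list and the `"str"` to `["sh","-c","str"]` rule come from the description above.

```typescript
// Tier names as listed in this PR; the exact strings the gateway accepts are assumed.
const GPU_TIERS = ["CPU", "T4", "L4", "A10G", "A100", "H100"] as const;
type GpuTier = (typeof GPU_TIERS)[number];

// Accept "a100", " A100 ", etc. and return the canonical tier; throw on unknown input.
function normalizeGpuTier(input: string): GpuTier {
  const upper = input.trim().toUpperCase();
  const match = GPU_TIERS.find((tier) => tier === upper);
  if (!match) throw new Error(`Unknown GPU tier: ${input}`);
  return match;
}

// A bare string becomes ["sh", "-c", str]; an array is passed through unchanged.
function normalizeCommand(command: string | string[]): string[] {
  return typeof command === "string" ? ["sh", "-c", command] : command;
}
```

Wrapping a bare string in `sh -c` keeps pipes and redirects working, while callers that already pass an argv array avoid a second layer of shell quoting.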

Modified

  • src/tools/index.ts (+6) — register modalCapabilities.
  • src/agent/tool-guard.ts (+18 −2) — FAILURE_EXEMPT set: Modal lifecycle tools (and ImageGen/VideoGen) bypass the 3-fail auto-disable. ModalTerminate especially must always be callable — orphan sandboxes keep billing GPU time, and this tool is the only recovery path agent-side.
  • src/stats/insights.ts (+41) — byCategory: { chatCostUsd, mediaCostUsd, sandboxCostUsd, sandboxRequests } so Usage Insights UI can show a clean "where did your USDC go" split.
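A minimal sketch of the failure-exempt idea, assuming a simple per-tool failure counter. The `ToolGuard` class is hypothetical, and which Modal tools count as "lifecycle tools" is an assumption here; the actual logic in src/agent/tool-guard.ts may differ.

```typescript
// Assumed membership: all four Modal tools plus ImageGen/VideoGen.
const FAILURE_EXEMPT = new Set<string>([
  "ModalCreate",
  "ModalExec",
  "ModalStatus",
  "ModalTerminate",
  "ImageGen",
  "VideoGen",
]);

// Threshold taken from the "3-fail auto-disable" wording above.
const MAX_FAILURES = 3;

class ToolGuard {
  private failures = new Map<string, number>();

  // Count one more failure for the given tool.
  recordFailure(tool: string): void {
    this.failures.set(tool, (this.failures.get(tool) ?? 0) + 1);
  }

  // Exempt tools are never disabled, however often they fail.
  isDisabled(tool: string): boolean {
    if (FAILURE_EXEMPT.has(tool)) return false;
    return (this.failures.get(tool) ?? 0) >= MAX_FAILURES;
  }
}
```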

Gateway constraints (BlockRun-side, not Modal Labs)

Documented in modal.ts so the agent doesn't hit them blind:

  • Sandbox lifetime hard-capped at 300s
  • Per-exec timeout hard-capped at 60s
  • Image locked to python:3.11
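One way a client could respect these caps is to clamp requested timeouts before sending a request. The constant and function names below are illustrative, and whether modal.ts clamps or rejects out-of-range values is not stated in this PR.

```typescript
// Caps and image as documented above.
const MAX_SANDBOX_LIFETIME_S = 300;
const MAX_EXEC_TIMEOUT_S = 60;
const SANDBOX_IMAGE = "python:3.11";

// Clamp a requested timeout to [1, max] whole seconds; invalid input falls back to max.
function clampSeconds(requested: number, max: number): number {
  if (!Number.isFinite(requested) || requested <= 0) return max;
  return Math.min(Math.max(1, Math.floor(requested)), max);
}

// Example request shape (hypothetical field names).
const createRequest = {
  image: SANDBOX_IMAGE,
  timeout: clampSeconds(900, MAX_SANDBOX_LIFETIME_S), // clamped to 300
};
```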

For long-running jobs (training, large downloads), the file documents the fire-and-poll pattern: nohup <cmd> > /workspace/log 2>&1 &, then poll log via ModalExec.
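The fire-and-poll pattern amounts to two kinds of ModalExec commands, sketched below. The helper names are hypothetical; the log path and the `nohup` form come from the description above. The backgrounded job is presumably still bound by the 300s sandbox lifetime.

```typescript
const LOG_PATH = "/workspace/log";

// Start command: returns immediately while the job keeps running in the background.
function startCommand(cmd: string): string[] {
  return ["sh", "-c", `nohup ${cmd} > ${LOG_PATH} 2>&1 &`];
}

// Poll command: read the last N lines of the log on each later ModalExec call.
function pollCommand(lines: number = 20): string[] {
  return ["tail", "-n", String(lines), LOG_PATH];
}
```

Each poll is a separate short exec, so no single call needs to stay open past the 60s per-exec cap.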

Cost protection

  1. AskUser preview — every ModalCreate shows the user the GPU tier, price, and timeout before charging.
  2. Wallet reservation — concurrent paid calls reserve their slice of balance locally so a $0.20 wallet can't dispatch 6 × $0.04 calls simultaneously and have 1 fail mid-flight.
  3. Failure-exempt ModalTerminate — agent can always recover an orphan sandbox.
  4. Session-end cleanup hook: terminateAllSessionSandboxes() is exposed for callers to wire into the session lifecycle.
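The wallet-reservation idea from item 2 can be illustrated with a small local ledger. This is a sketch under assumptions, not the contents of src/wallet/reservation.ts: the class name, the `tryReserve` method, and the use of integer micro-USD (to avoid floating-point drift when summing prices like $0.04) are all illustrative.

```typescript
class WalletReservations {
  private reserved = 0; // micro-USD currently held by in-flight calls
  private getBalance: () => number; // returns the wallet balance in micro-USD

  constructor(getBalance: () => number) {
    this.getBalance = getBalance;
  }

  // Try to hold `amount` micro-USD. Returns a release function on success,
  // or null if the hold would exceed the balance not already reserved.
  tryReserve(amount: number): (() => void) | null {
    if (this.reserved + amount > this.getBalance()) return null;
    this.reserved += amount;
    let released = false;
    return () => {
      if (released) return; // releasing twice is a no-op
      released = true;
      this.reserved -= amount;
    };
  }
}

// The $0.20 wallet from the example above: five $0.04 holds fit, the sixth does not.
const wallet = new WalletReservations(() => 200000);
const holds = [1, 2, 3, 4, 5, 6].map(() => wallet.tryReserve(40000));
const granted = holds.filter((h) => h !== null).length; // 5
```

The sixth call is refused up front, before anything is dispatched, instead of failing mid-flight.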

Test plan

  • npm run build — green
  • node scripts/test-modal-tools.mjs: 30/30 passing, including live gateway probe (CPU $0.01, H100 $0.40, image=python:3.11, exec command-as-array, all confirmed against current gateway behavior)
  • Manual: franklin → activate ModalCreate → spin up CPU sandbox → exec nvidia-smi (will fail on CPU, but proves round-trip) → terminate (~$0.012)
  • Manual: GPU smoke test with A10G (~$0.10)

Out of scope (deliberately deferred)

  • Wiring terminateAllSessionSandboxes() into the agent loop's session-end finally block — that lives next to src/api/vscode-session.ts which is mid-refactor on the extension branch. Will follow in a separate PR so this one stays surgical.
  • Long-job fire-and-poll helper as its own tool.
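For context on the deferred wiring, a session-end cleanup could be sketched as below. `SandboxTracker` and `terminateAllTracked` are hypothetical stand-ins; the real SessionSandboxTracker and terminateAllSessionSandboxes() in modal.ts may have different signatures.

```typescript
// Hypothetical in-memory tracker of sandboxes created during one session.
class SandboxTracker {
  private ids = new Set<string>();

  track(id: string): void {
    this.ids.add(id);
  }

  untrack(id: string): void {
    this.ids.delete(id);
  }

  list(): string[] {
    return Array.from(this.ids);
  }
}

// Terminate every tracked sandbox; one failure does not stop the rest.
// Resolves to the ids that could not be terminated, which stay tracked.
async function terminateAllTracked(
  tracker: SandboxTracker,
  terminate: (id: string) => Promise<void>,
): Promise<string[]> {
  const failed: string[] = [];
  for (const id of tracker.list()) {
    try {
      await terminate(id);
      tracker.untrack(id);
    } catch {
      failed.push(id);
    }
  }
  return failed;
}
```

A caller would typically invoke this from the `finally` block of the session loop, so cleanup runs whether the session ends normally or with an error.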

🤖 Generated with Claude Code

Adds four capabilities backed by the BlockRun gateway's Modal sandbox
endpoints, plus session-level cleanup and a wallet-reservation utility
that prevents over-spend across concurrent paid calls.

Tools (registered but hidden by default — agent must ActivateTool to use):
  - ModalCreate    create CPU/T4/L4/A10G/A100/H100 sandbox  ($0.01–0.40)
  - ModalExec      run a shell command in a live sandbox    ($0.001)
  - ModalStatus    inspect sandbox state                    ($0.001)
  - ModalTerminate stop sandbox & free resources            ($0.001)

Other changes:
  - src/wallet/reservation.ts — local accounting layer so concurrent
    paid tools don't all see the same balance and over-commit.
  - src/agent/tool-guard.ts — FAILURE_EXEMPT set so Modal lifecycle
    tools (and ImageGen/VideoGen) aren't auto-disabled after 3 errors.
    ModalTerminate especially must always be callable — orphan sandboxes
    keep billing GPU time.
  - src/stats/insights.ts — byCategory breakdown
    (chat / media / sandbox) for Usage Insights UI.

Gateway hard limits (BlockRun-side, not Modal Labs):
  - sandbox lifetime ≤ 300s
  - per-exec timeout ≤ 60s
  - image locked to python:3.11
For long jobs use the fire-and-poll pattern documented in modal.ts.

Test coverage: scripts/test-modal-tools.mjs runs 30 offline checks
including a live gateway contract probe (uses 400/402 responses, no
spend). All passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@1bcMax merged commit 78ba6d9 into main on May 3, 2026
0 of 2 checks passed