NoblePort / Stephanie.ai LangGraph supervisor stack.
This repo runs a production supervisor with strict audit-first governance:
- Audit-first: every state-changing agent action writes a row to `audit_log` before the agent runs. If the audit insert fails, the request fails (HTTP 503) and the graph never executes.
- Postgres truth layer for `audit_log`, `agent_metrics`, and `langgraph_checkpoints`. SQLite is supported for tests and local dev.
- Async Redis event bus publishes lifecycle and step events on `agent.events`. Disabled when `REDIS_URL` is unset.
- Tool sandbox: only allowlisted executables (default: `pytest`, `flake8`) may run; everything else raises `SandboxRejected`. No shell passthrough.
- Gate latency metrics: invoke latency is stored per-thread / per-event in `agent_metrics`; `/metrics/gates` returns p50/p95/count.
- Cooperative kill: `/agent/kill/{thread_id}` flips a flag that the supervisor checks between graph steps. True hard cancellation isn't available at the LangGraph version we target; see "Caveats".

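
The audit-first rule above can be sketched as follows. This is a minimal illustration, not the repo's implementation: it uses the standard-library `sqlite3` module, and the `audit_log` columns, the `AuditUnavailable` exception, and the `invoke_audit_first` helper are all hypothetical names chosen for the example.

```python
import sqlite3


class AuditUnavailable(Exception):
    """Hypothetical error; an API layer would map it to HTTP 503."""


def invoke_audit_first(conn, thread_id, action, run_graph):
    # 1. Write the audit row before any agent code runs.
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO audit_log (thread_id, action) VALUES (?, ?)",
                (thread_id, action),
            )
    except sqlite3.Error as exc:
        # 2. Audit insert failed: refuse the request and never call the graph.
        raise AuditUnavailable(str(exc)) from exc
    # 3. Only now execute the graph.
    return run_graph(thread_id)


# Assumed two-column schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audit_log (thread_id TEXT NOT NULL, action TEXT NOT NULL)")
result = invoke_audit_first(conn, "t-1", "invoke", lambda tid: f"ran {tid}")
```

The point of the ordering is that a missing audit row can never coexist with an executed action: the graph callable is unreachable unless the insert has committed.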
| Method | Path | Purpose |
|---|---|---|
| GET | `/health` | Liveness + checkpoint-backend identifier |
| GET | `/ready` | Verifies DB and (when configured) Redis |
| POST | `/agent/invoke` | Audit-first invocation |
| GET | `/agent/status/{thread_id}` | Last-known status / kill flag |
| POST | `/agent/kill/{thread_id}` | Cooperative cancellation request |
| GET | `/metrics/gates` | Latency summary (p50/p95/count) |

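
For readers unfamiliar with the p50/p95/count summary returned by `/metrics/gates`, here is a small sketch of one way to compute it. It assumes the nearest-rank percentile method and a hypothetical `summarize_latencies` helper; the repo may compute percentiles differently (for example in SQL).

```python
import math


def summarize_latencies(samples_ms):
    """Return p50/p95/count using the nearest-rank method."""
    ordered = sorted(samples_ms)
    count = len(ordered)
    if count == 0:
        return {"p50": None, "p95": None, "count": 0}

    def nearest_rank(pct):
        # Smallest value with at least pct% of the samples at or below it.
        rank = max(1, math.ceil(pct / 100 * count))
        return ordered[rank - 1]

    return {"p50": nearest_rank(50), "p95": nearest_rank(95), "count": count}


summary = summarize_latencies([12.0, 15.0, 11.0, 40.0, 13.0])
# summary == {"p50": 13.0, "p95": 40.0, "count": 5}
```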
See `.env.example`. Key variables:

- `DATABASE_URL`: `postgresql://...` in production; falls back to `sqlite+aiosqlite:///./nobleport.db`.
- `REDIS_URL`: leave unset to disable Redis (`NullEventBus`).
- `AGENT_EVENT_CHANNEL`: defaults to `agent.events`.
- `SANDBOX_ALLOWED_COMMANDS`: comma-separated executable basenames.

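
As an illustration of how `SANDBOX_ALLOWED_COMMANDS` might feed the tool sandbox, the sketch below parses the variable and rejects anything not on the allowlist. Only `SandboxRejected` and the default `pytest,flake8` come from this README; `allowed_commands`, `check_command`, and `run_sandboxed` are hypothetical helpers, and rejecting any `argv[0]` that contains a path component is an assumption about the policy.

```python
import os
import subprocess


class SandboxRejected(Exception):
    """Raised when a command is not on the allowlist."""


def allowed_commands(env=os.environ):
    # Comma-separated executable basenames; default taken from the README.
    raw = env.get("SANDBOX_ALLOWED_COMMANDS", "pytest,flake8")
    return {name.strip() for name in raw.split(",") if name.strip()}


def check_command(argv, allowed):
    # Assumption: only bare basenames are accepted, so "/tmp/pytest" is rejected.
    if not argv or os.path.basename(argv[0]) != argv[0] or argv[0] not in allowed:
        raise SandboxRejected(f"command not allowed: {argv[:1]}")
    return list(argv)


def run_sandboxed(argv, allowed):
    # argv is passed as a list with shell=False, so there is no shell passthrough.
    return subprocess.run(
        check_command(argv, allowed),
        shell=False,
        capture_output=True,
        text=True,
        check=False,
    )
```

A call such as `run_sandboxed(["pytest", "-q"], allowed_commands())` would run, while `run_sandboxed(["rm", "-rf", "/"], allowed_commands())` raises `SandboxRejected` before any process is started.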
`docker compose up --build` brings up `postgres`, `redis`, and `api` with sane defaults. The API listens on `:8000`.
Without Docker:

    pip install -r requirements.txt
    pytest -q
    uvicorn app.api.app:app --reload

Caveats:

- LangGraph PostgresSaver: the supervisor uses `langgraph.checkpoint.postgres.PostgresSaver` if it's importable AND `DATABASE_URL` points at Postgres; otherwise it uses the project-local `PostgresCheckpointer` (same `langgraph_checkpoints` table). The active backend is reported by `/health`.
- Cooperative kill: cancellation is checked between graph steps. A step that runs for minutes won't be interrupted mid-call; keep step functions short or wrap long work with `asyncio.wait_for`.
- Permit / digest scripts in `scripts/` are unrelated to the supervisor and continue to run as before.
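
The cooperative-kill caveat can be made concrete with a small sketch. It is not the supervisor's actual loop: `run_steps`, the in-memory `kill_flags` dict, and the `step_timeout` parameter are hypothetical stand-ins. It shows that the kill flag is only consulted between steps, and that `asyncio.wait_for` bounds how long a single step can run.

```python
import asyncio


async def run_steps(steps, kill_flags, thread_id, step_timeout=1.0):
    completed = []
    for name, step in steps:
        # The kill flag is only checked here, between steps.
        if kill_flags.get(thread_id):
            return "killed", completed
        try:
            # Bound each step so a long call cannot delay cancellation forever.
            await asyncio.wait_for(step(), timeout=step_timeout)
        except asyncio.TimeoutError:
            return "timeout", completed
        completed.append(name)
    return "done", completed


async def demo():
    flags = {}

    async def first():
        flags["t-1"] = True  # simulate a kill request arriving mid-step

    async def second():
        pass

    return await run_steps([("first", first), ("second", second)], flags, "t-1")


outcome = asyncio.run(demo())
# outcome == ("killed", ["first"])
```

The first step still runs to completion even though the flag is set while it executes; the request only takes effect before the second step starts.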