gw: add admin API token authentication#674
Merged
Merged
Conversation
f43dc99 to
54640c8
Compare
Contributor
There was a problem hiding this comment.
Pull request overview
Adds optional shared-secret authentication to the gateway admin Rocket instance (admin RPC + dashboard) to harden previously unauthenticated admin endpoints, while keeping legacy behavior when no token is configured.
Changes:
- Introduces
core.admin.admin_token(empty = auth disabled) and wires it into the admin Rocket instance with a request fairing. - Adds a new
AdminAuthFairingthat validates tokens fromX-Admin-Token,Authorization: Bearer, or?token=..., and rewrites unauthorized requests to a 401 sentinel route. - Plumbs
ADMIN_TOKENthroughdstack-app(compose + entrypoint TOML generation) and updatesbootstrap-cluster.shto send the header when set.
Reviewed changes
Copilot reviewed 8 out of 9 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
gateway/src/main.rs |
Attaches the admin auth fairing and mounts the 401 sentinel route; adds startup logging about auth enablement. |
gateway/src/config.rs |
Extends AdminConfig with admin_token (default empty). |
gateway/src/admin_auth.rs |
Implements token extraction + constant-time comparison and URI-rewrite-based rejection to a sentinel 401 route. |
gateway/gateway.toml |
Documents and adds core.admin.admin_token default. |
gateway/dstack-app/docker-compose.yaml |
Adds ADMIN_TOKEN env passthrough. |
gateway/dstack-app/builder/entrypoint.sh |
Writes admin_token into generated TOML from ADMIN_TOKEN. |
gateway/dstack-app/bootstrap-cluster.sh |
Adds X-Admin-Token header to curl calls when ADMIN_TOKEN is set. |
gateway/Cargo.toml |
Adds subtle dependency for constant-time compare. |
Cargo.lock |
Locks subtle dependency. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+8
to
+12
| //! present the configured shared secret. The token is accepted via, in order: | ||
| //! 1. `X-Admin-Token` header | ||
| //! 2. `Authorization: Bearer <token>` header | ||
| //! 3. `?token=<token>` query parameter (for dashboard links / browser use) | ||
| //! |
Comment on lines
+32
to
+55
| impl AdminAuthFairing { | ||
| pub fn new(token: String) -> Self { | ||
| Self { | ||
| token: (!token.is_empty()).then_some(token), | ||
| } | ||
| } | ||
|
|
||
| fn extract_token(req: &Request<'_>) -> Option<String> { | ||
| if let Some(t) = req.headers().get_one(HEADER_NAME) { | ||
| return Some(t.to_string()); | ||
| } | ||
| if let Some(auth) = req.headers().get_one("Authorization") { | ||
| if let Some(t) = auth.strip_prefix("Bearer ") { | ||
| return Some(t.to_string()); | ||
| } | ||
| } | ||
| for field in req.query_fields() { | ||
| if field.name.key_lossy().as_str() == "token" { | ||
| return Some(field.value.to_string()); | ||
| } | ||
| } | ||
| None | ||
| } | ||
| } |
The admin server has historically relied on network isolation alone: `construct()` does no caller checks, so anyone reachable on the admin port (default `0.0.0.0:8001` inside the deployment container) can call any admin RPC or load the dashboard. Combined with the dashboard's visibility into cluster state, this is an obvious hardening target. Add an opt-in shared-secret check, modelled on KMS `ensure_admin` but using the plain token in config (no hash) so the value can be injected via dstack encrypted env without round-tripping a SHA-256 step. - `AdminConfig.admin_token` (empty = no auth, with a startup WARN so existing deployments keep their current behaviour). - New `admin_auth::AdminAuthFairing` attached to the admin Rocket instance. Accepts the token via `X-Admin-Token`, `Authorization: Bearer <token>`, or `?token=...` (for the browser dashboard). Constant-time compare via `subtle::ConstantTimeEq`. Rejected requests are rewritten to a sentinel URI that returns HTTP 401, so all currently-mounted routes (prpc + dashboard) are covered without per-route guards. - Thread `ADMIN_TOKEN` through `dstack-app` entrypoint + compose so encrypted-env values land in `gateway.toml`. - Update `bootstrap-cluster.sh` to send `X-Admin-Token` when `ADMIN_TOKEN` is set.
54640c8 to
87573b1
Compare
Address review-style improvements lifted from PR #675: - Fail-by-default. When `admin.enabled = true`, gateway now refuses to start unless either an `admin_token` is configured (via `core.admin.admin_token`, `DSTACK_GATEWAY_ADMIN_TOKEN`, or `ADMIN_API_TOKEN`) or `insecure_no_auth = true` is set explicitly. Replaces the previous "empty token = open access + WARN" policy. e2e configs opt into `insecure_no_auth = true`. - Env-var fallback in Rust. `AdminAuthFairing::from_config` resolves the token from config first, then `DSTACK_GATEWAY_ADMIN_TOKEN`, then `ADMIN_API_TOKEN`. Operators running the binary directly no longer have to template the TOML. - SHA-256 in-memory storage. The plaintext token is hashed at startup; only the 32-byte digest is retained. Verification SHA-256s the request token and constant-time compares the digests via `subtle::ConstantTimeEq`. - dstack-app deployment chain renamed `ADMIN_TOKEN` -> `ADMIN_API_TOKEN` end-to-end (deploy-to-vmm.sh auto-generates it into .env, docker-compose forwards it, entrypoint.sh injects it into `gateway.toml`, bootstrap-cluster.sh requires it and sends `Authorization: Bearer`). - Docs in `gateway/docs/cluster-deployment.md` now show the bearer-auth pattern in every admin-curl example. Tests: 4 new cases in `admin_auth::tests` cover the from_config policy (insecure flag, config path, env fallbacks, error message). All 13 `admin_auth` tests pass; clippy with `-D clippy::expect_used -D clippy::unwrap_used` is clean.
Replace `insecure_no_auth = true` with a real `admin_token` in all three e2e gateway configs, and update test.sh to send `Authorization: Bearer` on every admin-RPC call. Adds a Phase-3 "Admin token auth" check that exercises the auth fairing end-to-end: missing token -> 401, wrong token -> 401, correct token -> 200. This makes the e2e suite a regression test for the auth path itself (not just for the certbot/cert-sync flow that follows), so future changes to admin auth can't silently break ingress without a test failure. Verified locally against the gateway binary (out of band of the docker compose harness, which depends on an external TDX endpoint): the 8 exhaustive curl cases (no token / wrong / Bearer / X-Admin-Token / ?token= GET / ?token= POST / dashboard root unauthed+authed) all return the expected status codes. Env-var fallback (DSTACK_GATEWAY_ADMIN_TOKEN) and fail-by-default policy also confirmed working.
prek's shellcheck-py started scanning e2e/test.sh once the previous commit modified it, and surfaced two pre-existing classes of warnings: - SC3043: `local` is undefined in POSIX sh. The script uses bash-only `local` throughout, so the `#!/bin/sh` shebang was already wrong. Switch to `#!/bin/bash`. - SC2155: `local foo=$(cmd)` masks the inner command's exit status. Split into `local foo; foo=$(cmd)` at every site (one inside `test_certificate_from_pebble`, the rest inside `main`). Remaining shellcheck findings are info-level only (SC2086, SC2317); the prek CI run only failed on warning-level findings.
shellcheck-py in prek treats info-level findings as failures, and the previous lint pass left two info-level classes (SC2317 unreachable code, SC2086 unquoted in `[ ]`) — both located inside the `wait_for_service` function. The function is defined but never called (the actual readiness wait lives in run-e2e.sh's docker-compose-driven healthchecks), so deleting it clears the findings without altering test behaviour.
Plain browsers can't set custom headers when typing a URL, so the existing `X-Admin-Token` / `Bearer` / `?token=` paths leave dashboard operators with only the query-param option — which leaks the token via URL bar, history, and Referer. Add `Authorization: Basic <base64(user:pass)>` as a fourth accepted transport. The token may sit in either the username or password field (operators often paste into either side of the native browser prompt), and the 401 sentinel now emits `WWW-Authenticate: Basic realm="..."` so the browser actually shows the prompt instead of a plain error page. RPC clients are unaffected: the new header just goes alongside the existing transports and they ignore the WWW-Authenticate header. Adds 5 tests: password field, username field, wrong password, malformed base64, and the WWW-Authenticate challenge header on 401.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
The gateway admin server has historically had no application-layer auth —
AdminRpcHandler::constructdoesn't inspect the caller at all, so anything reachable on the admin port (default0.0.0.0:8001insidedstack-app/builder/entrypoint.sh) can hit any admin RPC or load the dashboard. Combined with the dashboard's view of cluster state and the fact that DNS-credential redaction was the only recent protection layer, this is an obvious next hardening step.This PR adds an opt-in shared-secret check, modelled on KMS
ensure_adminbut storing the plain token in config (no SHA-256 hash) so it can be injected directly via dstack encrypted env without an extra hashing step on the operator side.Changes
AdminConfig.admin_token(empty = no auth, with a startupWARNso existing deployments keep their current behaviour).New
gateway/src/admin_auth.rs— a RocketFairingattached to the admin instance. Accepts the token via:X-Admin-Token: <token>header (any HTTP method)Authorization: Bearer <token>header (any HTTP method)?token=<token>query parameter GET only (intended for dashboard browser links). After a successful auth thetokenquery param is stripped from the request URI so it does not propagate to access logs, downstream handlers, or the Referer header.Constant-time compare via
subtle::ConstantTimeEq. Rejected requests are rewritten to/__admin_unauthorized, a sentinel route with handlers for GET / POST / PUT / PATCH / DELETE / OPTIONS / HEAD so every method returns 401 (not 404/405).Tests:
gateway/src/admin_auth.rsincludes amod testscovering token-extraction precedence (header > Bearer > query), the GET-only restriction for query token, query-token stripping, wrong-token rejection, empty-token (auth disabled), and 401 across all HTTP methods. Uses Rocket'slocal::asynchronous::Client— no port binding needed.dstack-appplumbing:ADMIN_TOKENenv indocker-compose.yaml, written into[core.admin] admin_tokenbybuilder/entrypoint.sh(with the existingvalidate_envtoml-injection guard).bootstrap-cluster.shnow sendsX-Admin-TokenwhenADMIN_TOKENis set, so cluster bootstrap continues to work after enabling auth.subtledeclared in[workspace.dependencies]for consistency with the rest of the workspace.Why a fairing + URI rewrite (not per-route guards)
The admin server mounts
prpc!(...)macro-generated routes plus the dashboard route. Adding request guards to those would require modifying the macro. AKind::Requestfairing that rewrites the URI to a 401 sentinel covers every current and future route on the admin instance with one attachment, and is the canonical Rocket pattern for early rejection in a fairing.Backward compatibility
admin_token = ""keeps existing deployments unchanged; the startup log emitsWARN admin server enabled without admin_token; admin API is exposed without authentication.ADMIN_TOKEN, all admin endpoints require it (including the dashboard — visithttps://gateway/?token=<token>once, or send the header).Test plan
cargo check -p dstack-gatewaycleancargo clippy -p dstack-gateway -- -D warnings -D clippy::expect_used -D clippy::unwrap_used --allow unused_variablesclean (CI flags)cargo fmt --check --allcleancargo test -p dstack-gateway admin_auth— 10 tests passadmin_token = "s3cr3t"and verify:curl http://127.0.0.1:8011/prpc/Status→ 401curl -H "X-Admin-Token: s3cr3t" http://127.0.0.1:8011/prpc/Status→ 200curl -H "X-Admin-Token: wrong" http://127.0.0.1:8011/prpc/Status→ 401curl -H "Authorization: Bearer s3cr3t" http://127.0.0.1:8011/prpc/Status→ 200curl "http://127.0.0.1:8011/?token=s3cr3t"→ 200 dashboard (URI stripped)curl -X POST "http://127.0.0.1:8011/prpc/Status?token=s3cr3t"→ 401 (query token rejected on POST)curl http://127.0.0.1:8011/→ 401curl -X OPTIONS http://127.0.0.1:8011/prpc/Status→ 401 (not 404/405)admin_token = ""and confirm WARN log + open access (legacy behaviour)bootstrap-cluster.shwith and withoutADMIN_TOKENAddressed review feedback (Copilot)
?token=query param risks leaking via access logs / history / Referertokenparam after successful authmod testswith Rocket local client — 10 cases covering all extraction paths + all-method 401X-Admin-Tokensubtlepinned locally instead of via workspace[workspace.dependencies]andsubtle.workspace = true