Skip to content

gw: add admin API token authentication#674

Merged
kvinwang merged 6 commits into
masterfrom
feat/gateway-admin-token
May 12, 2026
Merged

gw: add admin API token authentication#674
kvinwang merged 6 commits into
masterfrom
feat/gateway-admin-token

Conversation

@kvinwang
Copy link
Copy Markdown
Collaborator

@kvinwang kvinwang commented May 11, 2026

Summary

The gateway admin server has historically had no application-layer authAdminRpcHandler::construct doesn't inspect the caller at all, so anything reachable on the admin port (default 0.0.0.0:8001 inside dstack-app/builder/entrypoint.sh) can hit any admin RPC or load the dashboard. Combined with the dashboard's view of cluster state and the fact that DNS-credential redaction was the only recent protection layer, this is an obvious next hardening step.

This PR adds an opt-in shared-secret check, modelled on KMS ensure_admin but storing the plain token in config (no SHA-256 hash) so it can be injected directly via dstack encrypted env without an extra hashing step on the operator side.

Changes

  • AdminConfig.admin_token (empty = no auth, with a startup WARN so existing deployments keep their current behaviour).

  • New gateway/src/admin_auth.rs — a Rocket Fairing attached to the admin instance. Accepts the token via:

    • X-Admin-Token: <token> header (any HTTP method)
    • Authorization: Bearer <token> header (any HTTP method)
    • ?token=<token> query parameter GET only (intended for dashboard browser links). After a successful auth the token query param is stripped from the request URI so it does not propagate to access logs, downstream handlers, or the Referer header.

    Constant-time compare via subtle::ConstantTimeEq. Rejected requests are rewritten to /__admin_unauthorized, a sentinel route with handlers for GET / POST / PUT / PATCH / DELETE / OPTIONS / HEAD so every method returns 401 (not 404/405).

  • Tests: gateway/src/admin_auth.rs includes a mod tests covering token-extraction precedence (header > Bearer > query), the GET-only restriction for query token, query-token stripping, wrong-token rejection, empty-token (auth disabled), and 401 across all HTTP methods. Uses Rocket's local::asynchronous::Client — no port binding needed.

  • dstack-app plumbing: ADMIN_TOKEN env in docker-compose.yaml, written into [core.admin] admin_token by builder/entrypoint.sh (with the existing validate_env toml-injection guard).

  • bootstrap-cluster.sh now sends X-Admin-Token when ADMIN_TOKEN is set, so cluster bootstrap continues to work after enabling auth.

  • subtle declared in [workspace.dependencies] for consistency with the rest of the workspace.

Why a fairing + URI rewrite (not per-route guards)

The admin server mounts prpc!(...) macro-generated routes plus the dashboard route. Adding request guards to those would require modifying the macro. A Kind::Request fairing that rewrites the URI to a 401 sentinel covers every current and future route on the admin instance with one attachment, and is the canonical Rocket pattern for early rejection in a fairing.

Backward compatibility

  • Default admin_token = "" keeps existing deployments unchanged; the startup log emits WARN admin server enabled without admin_token; admin API is exposed without authentication.
  • Once an operator sets ADMIN_TOKEN, all admin endpoints require it (including the dashboard — visit https://gateway/?token=<token> once, or send the header).

Test plan

  • cargo check -p dstack-gateway clean
  • cargo clippy -p dstack-gateway -- -D warnings -D clippy::expect_used -D clippy::unwrap_used --allow unused_variables clean (CI flags)
  • cargo fmt --check --all clean
  • cargo test -p dstack-gateway admin_auth — 10 tests pass
  • Manual: start gateway with admin_token = "s3cr3t" and verify:
    • curl http://127.0.0.1:8011/prpc/Status → 401
    • curl -H "X-Admin-Token: s3cr3t" http://127.0.0.1:8011/prpc/Status → 200
    • curl -H "X-Admin-Token: wrong" http://127.0.0.1:8011/prpc/Status → 401
    • curl -H "Authorization: Bearer s3cr3t" http://127.0.0.1:8011/prpc/Status → 200
    • curl "http://127.0.0.1:8011/?token=s3cr3t" → 200 dashboard (URI stripped)
    • curl -X POST "http://127.0.0.1:8011/prpc/Status?token=s3cr3t" → 401 (query token rejected on POST)
    • curl http://127.0.0.1:8011/ → 401
    • curl -X OPTIONS http://127.0.0.1:8011/prpc/Status → 401 (not 404/405)
  • Manual: start gateway with admin_token = "" and confirm WARN log + open access (legacy behaviour)
  • Manual: e2e cluster bootstrap via bootstrap-cluster.sh with and without ADMIN_TOKEN

Addressed review feedback (Copilot)

# Comment Fix
1 ?token= query param risks leaking via access logs / history / Referer Restricted query token to GET only; URI is stripped of token param after successful auth
2 Sentinel only had GET/POST handlers — other methods would 404/405 Added handlers for PUT / PATCH / DELETE / OPTIONS / HEAD
3 No tests for the new auth path Added mod tests with Rocket local client — 10 cases covering all extraction paths + all-method 401
4 Startup log mentioned only X-Admin-Token Simplified to "admin server authentication enabled"
5 subtle pinned locally instead of via workspace Moved to [workspace.dependencies] and subtle.workspace = true

Copilot AI review requested due to automatic review settings May 11, 2026 14:28
@kvinwang kvinwang force-pushed the feat/gateway-admin-token branch from f43dc99 to 54640c8 Compare May 11, 2026 14:32
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds optional shared-secret authentication to the gateway admin Rocket instance (admin RPC + dashboard) to harden previously unauthenticated admin endpoints, while keeping legacy behavior when no token is configured.

Changes:

  • Introduces core.admin.admin_token (empty = auth disabled) and wires it into the admin Rocket instance with a request fairing.
  • Adds a new AdminAuthFairing that validates tokens from X-Admin-Token, Authorization: Bearer, or ?token=..., and rewrites unauthorized requests to a 401 sentinel route.
  • Plumbs ADMIN_TOKEN through dstack-app (compose + entrypoint TOML generation) and updates bootstrap-cluster.sh to send the header when set.

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
gateway/src/main.rs Attaches the admin auth fairing and mounts the 401 sentinel route; adds startup logging about auth enablement.
gateway/src/config.rs Extends AdminConfig with admin_token (default empty).
gateway/src/admin_auth.rs Implements token extraction + constant-time comparison and URI-rewrite-based rejection to a sentinel 401 route.
gateway/gateway.toml Documents and adds core.admin.admin_token default.
gateway/dstack-app/docker-compose.yaml Adds ADMIN_TOKEN env passthrough.
gateway/dstack-app/builder/entrypoint.sh Writes admin_token into generated TOML from ADMIN_TOKEN.
gateway/dstack-app/bootstrap-cluster.sh Adds X-Admin-Token header to curl calls when ADMIN_TOKEN is set.
gateway/Cargo.toml Adds subtle dependency for constant-time compare.
Cargo.lock Locks subtle dependency.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread gateway/src/admin_auth.rs
Comment on lines +8 to +12
//! present the configured shared secret. The token is accepted via, in order:
//! 1. `X-Admin-Token` header
//! 2. `Authorization: Bearer <token>` header
//! 3. `?token=<token>` query parameter (for dashboard links / browser use)
//!
Comment thread gateway/src/admin_auth.rs Outdated
Comment thread gateway/src/admin_auth.rs
Comment on lines +32 to +55
impl AdminAuthFairing {
pub fn new(token: String) -> Self {
Self {
token: (!token.is_empty()).then_some(token),
}
}

fn extract_token(req: &Request<'_>) -> Option<String> {
if let Some(t) = req.headers().get_one(HEADER_NAME) {
return Some(t.to_string());
}
if let Some(auth) = req.headers().get_one("Authorization") {
if let Some(t) = auth.strip_prefix("Bearer ") {
return Some(t.to_string());
}
}
for field in req.query_fields() {
if field.name.key_lossy().as_str() == "token" {
return Some(field.value.to_string());
}
}
None
}
}
Comment thread gateway/src/main.rs Outdated
Comment thread gateway/Cargo.toml
The admin server has historically relied on network isolation alone:
`construct()` does no caller checks, so anyone reachable on the admin
port (default `0.0.0.0:8001` inside the deployment container) can call
any admin RPC or load the dashboard. Combined with the dashboard's
visibility into cluster state, this is an obvious hardening target.

Add an opt-in shared-secret check, modelled on KMS `ensure_admin` but
using the plain token in config (no hash) so the value can be injected
via dstack encrypted env without round-tripping a SHA-256 step.

- `AdminConfig.admin_token` (empty = no auth, with a startup WARN so
  existing deployments keep their current behaviour).
- New `admin_auth::AdminAuthFairing` attached to the admin Rocket
  instance. Accepts the token via `X-Admin-Token`, `Authorization:
  Bearer <token>`, or `?token=...` (for the browser dashboard).
  Constant-time compare via `subtle::ConstantTimeEq`. Rejected
  requests are rewritten to a sentinel URI that returns HTTP 401, so
  all currently-mounted routes (prpc + dashboard) are covered without
  per-route guards.
- Thread `ADMIN_TOKEN` through `dstack-app` entrypoint + compose so
  encrypted-env values land in `gateway.toml`.
- Update `bootstrap-cluster.sh` to send `X-Admin-Token` when
  `ADMIN_TOKEN` is set.
@kvinwang kvinwang force-pushed the feat/gateway-admin-token branch from 54640c8 to 87573b1 Compare May 11, 2026 14:48
kvinwang added 4 commits May 11, 2026 08:20
Address review-style improvements lifted from PR #675:

- Fail-by-default. When `admin.enabled = true`, gateway now refuses to
  start unless either an `admin_token` is configured (via `core.admin.admin_token`,
  `DSTACK_GATEWAY_ADMIN_TOKEN`, or `ADMIN_API_TOKEN`) or `insecure_no_auth = true`
  is set explicitly. Replaces the previous "empty token = open access + WARN"
  policy. e2e configs opt into `insecure_no_auth = true`.
- Env-var fallback in Rust. `AdminAuthFairing::from_config` resolves the
  token from config first, then `DSTACK_GATEWAY_ADMIN_TOKEN`, then
  `ADMIN_API_TOKEN`. Operators running the binary directly no longer have
  to template the TOML.
- SHA-256 in-memory storage. The plaintext token is hashed at startup;
  only the 32-byte digest is retained. Verification SHA-256s the request
  token and constant-time compares the digests via `subtle::ConstantTimeEq`.
- dstack-app deployment chain renamed `ADMIN_TOKEN` -> `ADMIN_API_TOKEN`
  end-to-end (deploy-to-vmm.sh auto-generates it into .env, docker-compose
  forwards it, entrypoint.sh injects it into `gateway.toml`,
  bootstrap-cluster.sh requires it and sends `Authorization: Bearer`).
- Docs in `gateway/docs/cluster-deployment.md` now show the bearer-auth
  pattern in every admin-curl example.

Tests: 4 new cases in `admin_auth::tests` cover the from_config policy
(insecure flag, config path, env fallbacks, error message). All 13
`admin_auth` tests pass; clippy with `-D clippy::expect_used
-D clippy::unwrap_used` is clean.
Replace `insecure_no_auth = true` with a real `admin_token` in all three
e2e gateway configs, and update test.sh to send `Authorization: Bearer`
on every admin-RPC call. Adds a Phase-3 "Admin token auth" check that
exercises the auth fairing end-to-end: missing token -> 401, wrong
token -> 401, correct token -> 200.

This makes the e2e suite a regression test for the auth path itself
(not just for the certbot/cert-sync flow that follows), so future
changes to admin auth can't silently break ingress without a test
failure.

Verified locally against the gateway binary (out of band of the docker
compose harness, which depends on an external TDX endpoint): the 8
exhaustive curl cases (no token / wrong / Bearer / X-Admin-Token /
?token= GET / ?token= POST / dashboard root unauthed+authed) all
return the expected status codes. Env-var fallback
(DSTACK_GATEWAY_ADMIN_TOKEN) and fail-by-default policy also confirmed
working.
prek's shellcheck-py started scanning e2e/test.sh once the previous
commit modified it, and surfaced two pre-existing classes of warnings:

- SC3043: `local` is undefined in POSIX sh. The script uses bash-only
  `local` throughout, so the `#!/bin/sh` shebang was already wrong.
  Switch to `#!/bin/bash`.
- SC2155: `local foo=$(cmd)` masks the inner command's exit status.
  Split into `local foo; foo=$(cmd)` at every site (one inside
  `test_certificate_from_pebble`, the rest inside `main`).

Remaining shellcheck findings are info-level only (SC2086, SC2317);
the prek CI run only failed on warning-level findings.
shellcheck-py in prek treats info-level findings as failures, and the
previous lint pass left two info-level classes (SC2317 unreachable
code, SC2086 unquoted in `[ ]`) — both located inside the
`wait_for_service` function. The function is defined but never
called (the actual readiness wait lives in run-e2e.sh's
docker-compose-driven healthchecks), so deleting it clears the
findings without altering test behaviour.
Plain browsers can't set custom headers when typing a URL, so the
existing `X-Admin-Token` / `Bearer` / `?token=` paths leave dashboard
operators with only the query-param option — which leaks the token via
URL bar, history, and Referer.

Add `Authorization: Basic <base64(user:pass)>` as a fourth accepted
transport. The token may sit in either the username or password field
(operators often paste into either side of the native browser prompt),
and the 401 sentinel now emits `WWW-Authenticate: Basic realm="..."`
so the browser actually shows the prompt instead of a plain error page.

RPC clients are unaffected: the new header just goes alongside the
existing transports and they ignore the WWW-Authenticate header.

Adds 5 tests: password field, username field, wrong password, malformed
base64, and the WWW-Authenticate challenge header on 401.
@kvinwang kvinwang merged commit df0b302 into master May 12, 2026
15 checks passed
Copy link
Copy Markdown
Collaborator

@Leechael Leechael left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@Leechael Leechael deleted the feat/gateway-admin-token branch May 12, 2026 14:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants