Skip to content

feat: SeiNode validator.operatorKeyring CRD surface (closes #219)#220

Merged
bdchatham merged 5 commits into
mainfrom
feat/operator-keyring-crd
May 12, 2026
Merged

feat: SeiNode validator.operatorKeyring CRD surface (closes #219)#220
bdchatham merged 5 commits into
mainfrom
feat/operator-keyring-crd

Conversation

@bdchatham
Copy link
Copy Markdown
Collaborator

@bdchatham bdchatham commented May 12, 2026

Summary

  • Implements Phase 1 step 1 of the in-pod governance signing workstream (design: docs(design): in-pod governance transaction signing for Sei validators seictl#166 / docs/design/in-pod-governance-signing.md — Component A).
  • Adds validator.operatorKeyring to SeiNode — discriminated-union surface (Secret variant; future TMKMS/Vault/KMS reserved as comments and an additive CEL exactly-one rule) declaring the operator-account keyring used by the seictl sidecar for governance, MsgEditValidator, withdraw-rewards, and other operator-account transactions.
  • Trust boundary held at admission, build-time, AND runtime: four CEL invariants on ValidatorSpec; pod-spec mount + env wired into the sidecar container only (never seid main, never init, never bootstrap); bootstrap-pod isolation guard generalized to walk both Secret volumes AND valueFrom.secretKeyRef env entries.
  • Sidecar container security context tightened in the same workstream (the keyring mount is precisely the reason): RunAsNonRoot, RunAsUser/Group 65532, ReadOnlyRootFilesystem, AllowPrivilegeEscalation: false, Capabilities.Drop: [ALL], SeccompProfile: RuntimeDefault, pod-level FSGroup: 65532 (required for the non-root UID to read the 0o400 Secret mount).
  • Pre-flight validate-operator-keyring task mirrors validate-signing-key.go. Deliberately does NOT decrypt the keyring — that's the sidecar's startup smoke test (issue feat(controller): wire Spec.Genesis.Accounts via sidecar typed wrappers #162 in seictl), not the controller's TCB.

CRD invariants

Four CEL rules on ValidatorSpec enforce the trust boundary at admission:

  1. operatorKeyring.secret.secretName != signingKey.secret.secretName (data-keyring ≠ consensus-key Secret)
  2. operatorKeyring.secret.secretName != nodeKey.secret.secretName (data-keyring ≠ P2P-key Secret)
  3. operatorKeyring.secret.secretName != passphraseSecretRef.secretName (data-keyring ≠ passphrase Secret)
  4. secretName and passphraseSecretRef.secretName immutable via self == oldSelf

keyName is intentionally NOT immutable (allows key rotation within a Secret).

Discretionary decisions flagged for review

These weren't fully prescribed by the design — reviewers should sanity-check:

  1. Error wording uses Cosmos SDK terminology (e.g., "file-keyring requires at least one encrypted key blob"). Functional, not load-bearing.
  2. defaultStr helper inlined, not extracted. Design referenced a helper that doesn't exist in this repo; YAGNI says don't add a 4-line helper for two call sites. Defaults: KeyName→\"node_admin\", passphraseRef.Key→\"passphrase\" (matches the kubebuilder defaults).
  3. ptr.To(true) / ptr.To(int64(65532)) workaround: modernize/newexpr linter false-positively suggests new(true) (invalid Go). Assigned to local vars before address-taking. Functionally identical; less direct.
  4. Extra test for env-var leakage in init container (PassphraseEnvInInitContainer_Rejects) — not in the design but the bootstrap pod's sidecar runs as InitContainer, so this is the realistic env-leakage path. Cheap belt-and-suspenders.

Test plan

Verification on the implementer side:

  • make manifests generate succeeds and produces the expected new types/CRD fields
  • go build ./... clean
  • go test ./... all packages pass
  • golangci-lint run --new-from-rev=origin/main ./... reports 0 issues

For human reviewers:

  • Cross-check that the new mount + env appear on the sidecar container only (buildSidecarContainer) — NOT on buildNodeMainContainer, NOT on the init container, NOT on the bootstrap pod
  • Verify the four CEL invariants on ValidatorSpec via kubectl apply --dry-run=server on edge cases (collision Secret names, missing fields, immutability-on-update)
  • Confirm validate-operator-keyring does NOT attempt to decrypt the keyring (controller TCB stays small)
  • Spot-check assertNoValidatorSecretsOnBootstrapPod rejects each of the three forbidden Secret types and accepts NodeKey

Cross-review

This PR is being reviewed in parallel by platform-engineer, kubernetes-specialist, and security-specialist before being marked ready for merge. Their findings will be posted as PR review comments.

🤖 Generated with Claude Code


Note

Medium Risk
Introduces a new validator credential surface (operatorKeyring) and wires it into planning, pod generation, and bootstrap isolation; mistakes could accidentally expose secrets or break validator deployments, though the PR adds multiple admission/runtime guards and tests.

Overview
Adds a new validator.operatorKeyring API surface (Secret-backed Cosmos file keyring + separate passphrase Secret), including CRD schema/defaults and CEL validations to enforce secret distinctness/immutability.

Wires operator keyring into reconciliation/planning by inserting a new preflight validate-operator-keyring task (with a new status condition + reasons) and passing params with a fallback default key name when admission defaulting hasn’t run.

Hardens runtime secret containment: updates StatefulSet generation to mount the keyring directory + passphrase env only on the sidecar, adds a fail-closed pod-spec guard to prevent leakage to seid/init containers, tightens sidecar security context (non-root, read-only rootfs, dropped caps, seccomp) and pod fsGroup, and generalizes the bootstrap-pod guard to reject any validator-owned secrets in volumes or env refs.

Tests/regen: updates unit tests for the new GenerateStatefulSet error return, adds extensive coverage for keyring mounts/env and bootstrap leakage checks, and regenerates deepcopy + CRD/manifests.

Reviewed by Cursor Bugbot for commit 1028fb1. Bugbot is set up for automated code reviews on this repo. Configure here.

Implements Phase 1 step 1 of the in-pod governance signing workstream
(design: sei-protocol/seictl#166).

Adds validator.operatorKeyring to the SeiNode CRD — a discriminated
union (Secret variant; future TMKMS/Vault/KMS reserved) declaring the
operator-account keyring used by the sidecar for governance,
MsgEditValidator, withdraw-rewards, and other operator-account txs.

The keyring data Secret (mounted as a volume at $SEI_HOME/keyring-file/)
and a separate passphrase Secret (projected as an env var) are mounted
exclusively on the sidecar container — never on the seid main container,
init container, or bootstrap pod. Four CEL invariants on ValidatorSpec
enforce the trust boundary at admission:
  - operatorKeyring.secret.secretName != signingKey.secret.secretName
  - operatorKeyring.secret.secretName != nodeKey.secret.secretName
  - operatorKeyring.secret.secretName != passphraseSecretRef.secretName
  - secretName / passphraseSecretRef.secretName immutable post-create

Pre-flight task validate-operator-keyring mirrors validate-signing-key:
checks both Secrets exist + non-empty, validates the file-keyring's
*.info and *.address index files, and confirms the named key entry is
present if KeyName is specified. Does NOT decrypt — that sits with the
sidecar's startup smoke test, not the controller's TCB.

Sidecar container security context tightened in the same workstream:
RunAsNonRoot, RunAsUser/Group 65532, ReadOnlyRootFilesystem,
AllowPrivilegeEscalation false, Capabilities.Drop ALL, SeccompProfile
RuntimeDefault, pod-level FSGroup 65532 (required for the non-root UID
to read the 0o400 Secret mount).

Bootstrap-pod isolation guard generalized — assertNoSigningKeyOnBootstrapPod
becomes assertNoValidatorSecretsOnBootstrapPod, walking volumes for any
of {signing-key, operator-keyring, passphrase} Secrets and env vars for
the passphrase Secret. NodeKey deliberately excluded (no signing authority).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread internal/noderesource/noderesource.go
Comment thread internal/noderesource/noderesource.go
Synthesizes platform-engineer / kubernetes-specialist / security-specialist
cross-review of #220. Addresses 9 review items; 1 deferred with explicit
documentation.

CEL invariants tightened (kubernetes-specialist finding):
  Adds two ValidatorSpec rules forbidding the passphrase Secret from
  colliding with signingKey.secret.secretName or nodeKey.secret.secretName.
  Closes a trust-boundary gap where a malicious config could project a
  Secret used by seid main into the sidecar's env-resolution path.

Planner-side defense-in-depth (security-specialist finding):
  validatorPlanner.Validate now mirrors all five operator-keyring
  distinctness rules. CEL alone requires K8s 1.25+ with the feature
  flag; mis-configured webhooks or --validate=false bypass CEL.

Production pod-spec runtime guard (security-specialist finding):
  GenerateStatefulSet asserts operatorKeyringVolumeName never appears
  on the seid main container or init containers. CI catches regressions
  the way assertNoValidatorSecretsOnBootstrapPod does for bootstrap pods.
  Signature changes to (*appsv1.StatefulSet, error); 42 call sites updated.

Bootstrap-pod guard extended (security-specialist finding):
  assertNoValidatorSecretsOnBootstrapPod now walks c.EnvFrom[].SecretRef
  in addition to c.Env[].ValueFrom.SecretKeyRef. Future-proofs against
  envFrom-shaped leakage.

Default constants exported (platform-engineer finding):
  api/v1alpha1.DefaultOperatorKeyName and DefaultPassphraseSecretKey
  replace magic strings in planner + noderesource. Kubebuilder defaults
  retain literal strings (markers can't reference Go consts) but carry
  comments tying them to the constants.

Operator-facing error wording reworded (platform-engineer finding):
  Conditions point at Secret data-key shape (*.info, *.address) instead
  of internal Cosmos SDK backend terminology.

Explicit AutomountServiceAccountToken (security-specialist finding):
  Pod spec now sets ptr.To(true) explicitly with #165 reference comment.
  Cluster-default flips that disable SA tokens would otherwise silently
  break the future TokenReview path.

ptr.To linter situation cleaned (platform-engineer finding):
  Restored ptr.To(true)/ptr.To(false) with //nolint:modernize and a
  reason string. CLAUDE.md "fix lint, don't suppress" applies to real
  issues; new(true) is invalid Go, so the modernize suggestion is a
  false-positive and explicit nolint is the idiomatic answer.

shareProcessNamespace = true documented as deferred (security blocker
   downgraded to deferred follow-up per orchestrator decision):
  A compromised seid container can today read /proc/<sidecar>/environ
  and /proc/<sidecar>/mem, exposing the passphrase and unlocked keyring.
  Not a one-way door — the broader sidecar/seid hardening (drop
  shareProcessNamespace, harden seid main SecurityContext, separate
  sidecar SA) is tracked separately.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@bdchatham
Copy link
Copy Markdown
Collaborator Author

Cross-review complete — findings integrated

3-specialist review (platform-engineer, kubernetes-specialist, security-specialist).

Verdicts: platform-engineer LGTM-with-fixes • kubernetes-specialist LGTM-with-fixes • security-specialist Block-pending-changes (one item).

Addressed in commit 6622404

# Source Item Fix
1 k8s-specialist Trust-boundary CEL gap — passphrase Secret could collide with signing/node-key Secret Two new XValidation rules on ValidatorSpec
2 security Planner-side CEL mirror — CEL alone bypassable on K8s <1.25 or with --validate=false validatorPlanner.Validate now enforces all 5 distinctness rules
3 security Production pod-spec runtime guard GenerateStatefulSet asserts operatorKeyringVolumeName never on seid main / init; signature changes to (*StatefulSet, error)
4 security Bootstrap guard missed EnvFrom assertNoValidatorSecretsOnBootstrapPod now walks both Env.ValueFrom.SecretKeyRef and EnvFrom.SecretRef
5 platform Magic-string defaults in 3 places Exported DefaultOperatorKeyName / DefaultPassphraseSecretKey constants from api/v1alpha1; planner + noderesource reference them; kubebuilder markers carry "must-match" comment
6 platform Cosmos-SDK terminology in operator-facing conditions Reworded to point at Secret data-key shape (*.info, *.address)
7 security AutomountServiceAccountToken left at kubelet-default Explicit ptr.To(true) with sei-protocol/seictl#165 reference comment
8 platform ptr.To(true) workaround inconsistent with rest of file Restored ptr.To(...) everywhere; //nolint:modernize with reason on the affected lines
9 security shareProcessNamespace decision Deferred with explicit code comment + follow-up issue

Deferred — tracked as follow-up

Issue filed: see "Harden sidecar/seid trust boundary against /proc-based leakage" in this repo for the broader sidecar/seid container hardening workstream.

The shareProcessNamespace gap, distinct UIDs for sidecar vs seid main, seid SecurityContext hardening, and separating the sidecar's SA from the controller-manager SA are coherent as a single workstream — orchestrator decision was to defer this PR's resolution rather than partial-fix here.

Gates

  • make manifests generate — idempotent (only the two new CEL rules show up in regenerated CRDs)
  • go test ./... — all packages pass
  • golangci-lint run --new-from-rev=origin/main ./... — 0 issues

Discretionary decisions during cleanup

  • GenerateStatefulSet signature change rippled to 42 test call sites + 1 real caller. Added mustGenerateStatefulSet test helper to keep ergonomics flat. Alternative considered: keep signature and call assertion from apply_statefulset.go — rejected because pushing the invariant out of the builder weakens the "every production pod-spec is checked" guarantee.
  • modernize/newexpr false-positive class also flagged ptr.To(sidecarNonRootUID) — same //nolint:modernize pattern applied; calling new() on a value is invalid Go.
  • goconst-driven container name constants (containerNameSeid, containerNameSidecar) extracted at package level because the new assertion code pushed occurrence counts past the linter threshold. Idiomatic.

This PR addresses all blockers identified by the cross-review. The deferred shareProcessNamespace hardening is tracked in the follow-up issue and is not a one-way door — the v1 trust posture documented in the design doc remains intact (operator keyring on sidecar only, mount-list discipline, CEL invariants), with the /proc leakage path being the explicit remaining work.

Cursor bugbot flagged two real regressions introduced by the sidecar
SecurityContext + pod-level FSGroup additions:

1. ReadOnlyRootFilesystem: true breaks any Go stdlib path that writes
   to /tmp (os.CreateTemp, net/http multipart, encoding/json large
   buffers). Adds an emptyDir volume mounted at /tmp on the sidecar
   container only. Tmpfs-backed, isolated from the seid container's
   TMPDIR=/sei/tmp.

2. Pod-level FSGroup without FSGroupChangePolicy defaults to "Always",
   causing kubelet to recursively chown every file on every mounted
   PVC at every pod start. For Sei archive PVCs (TBs of chain data)
   this adds minutes of startup latency on every restart. Setting
   FSGroupChangePolicy: OnRootMismatch chowns on first creation only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@bdchatham
Copy link
Copy Markdown
Collaborator Author

Addressed Cursor bugbot findings (commit 1792267)

Both were real regressions from the sidecar hardening in 6622404. Fixed:

  1. Missing TMPDIR / writable /tmp — added an emptyDir volume mounted at /tmp on the sidecar container only. Tmpfs-backed, isolated from seid's TMPDIR=/sei/tmp. Resolves the Go-stdlib-temp-file failure mode introduced by ReadOnlyRootFilesystem: true.

  2. FSGroupChangePolicy: OnRootMismatch — added alongside FSGroup: 65532 on the pod SecurityContext. Avoids the kubelet's default Always policy recursively chowning every PVC file on every pod start — substantial savings for archive PVCs.

Tests + lint green.

Comment thread api/v1alpha1/validator_types.go Outdated
Comment thread api/v1alpha1/validator_types.go Outdated
Comment thread api/v1alpha1/validator_types.go Outdated
Comment thread internal/noderesource/noderesource.go Outdated
Comment thread internal/noderesource/noderesource.go
Comment thread internal/task/bootstrap_resources.go Outdated
- Cut speculative "future siblings" doc text on OperatorKeyringSource
  (per YAGNI; the additive CEL shape `(has(self.secret) ? 1 : 0) == 1`
  speaks for itself, doesn't need prose elaboration about hypothetical
  variants).
- Remove the `+kubebuilder:default="passphrase"` on
  PassphraseSecretRef.Key and make Key required. Operators declare the
  data key explicitly rather than relying on a default that hides where
  the passphrase actually lives. KeyName's "node_admin" default is kept
  as the seienv-continuity convention.
- Drop the now-unused DefaultPassphraseSecretKey constant and the
  planner+noderesource fallback code (Key is required at admission;
  no in-code fallback needed).
- Refine the operatorKeyringDirName comment to make clear it's an SDK
  contract, not a controller choice.
- Move the `forbiddenSecret` type declaration to the top of
  `bootstrap_resources.go` per Go convention (types alongside package
  declarations).
- Update the AutomountServiceAccountToken comment to reflect rev2 of
  the design (the proxy uses the SA token for TokenReview, not the
  sidecar's in-process middleware).

No substantive change to the keyring CRD design. The kube-rbac-proxy
work from rev2 is fast-follow scope (#165 expanded).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@bdchatham
Copy link
Copy Markdown
Collaborator Author

Addressed line comments (commit `d55487a`)

# Location Comment Resolution
1 `validator_types.go:148` "points about the future might be worth cutting" Cut the "future siblings (TMKMS, Vault, KMS)" sentence from the doc comment. The additive CEL shape `(has(self.secret) ? 1 : 0) == 1` makes the extension path self-evident.
2 `validator_types.go:157` "example for removal" Removed the `// Future siblings: ...` placeholder inside the struct body.
3 `validator_types.go:213` "we just shouldn't have a default. Any strong reason for it?" Removed the `+kubebuilder:default="passphrase"` and made `Key` required (`MinLength=1`, no `omitempty`). Operators declare the data key explicitly. Also dropped the now-unused `DefaultPassphraseSecretKey` constant and the planner+noderesource fallback code. (`KeyName` default of `"node_admin"` kept — that one is load-bearing as the seienv-continuity convention; happy to discuss if you want it removed for symmetry.)
4 `noderesource.go:45` "Is this a one way door? Is this conventional?" No one-way door. The string `"keyring-file"` is mandated by the Cosmos SDK file-backend keyring — `keyring.New(name, BackendFile, homeDir, ...)` opens `homeDir/keyring-file/` internally. The constant just mirrors the SDK contract; it's conventional to do this in Cosmos SDK integrations. Refined the code comment to say so explicitly ("Not a controller choice; this constant mirrors the SDK contract.").
5 `noderesource.go:55` "Is there any constant we can set this to so that it's more self-explanatory?" No upstream Go constant available — `65532` is the `nonroot` UID baked into `distroless/static-debian12` and `chainguard/static` base images, but neither exports it as a Go-importable constant. The existing comment names the source ("nonroot UID/GID baked into distroless and chainguard static-debian12 base images"); the in-file name (`sidecarNonRootUID`) is named per its use site rather than its source. Left as-is.
6 `bootstrap_resources.go:398` "Please put at top of file." Moved the `forbiddenSecret` type declaration up to the package-level constants block, per Go convention.

Bonus tidy (rev2 wording)

While in the file, refreshed the `AutomountServiceAccountToken` comment to reflect the rev2 design (the future kube-rbac-proxy uses the SA token for TokenReview/SAR, not in-process middleware). Same intent, more accurate phrasing.

Rev2 (kube-rbac-proxy) impact on this PR

None substantive. The operatorKeyring CRD design is unchanged. The kube-rbac-proxy work from rev2 is fast-follow scope (sei-protocol/seictl#165 expanded to span both repos; possibly a new sei-k8s-controller issue for the pod-spec proxy-container addition). This PR stays lean.

Gates

  • `make manifests generate` — idempotent (only the `Key` field shape change in the CRD YAML)
  • `go test ./...` — all packages pass
  • `golangci-lint run --new-from-rev=origin/main ./...` — 0 issues
  • Net diff: +48 / -63 (deletion-positive — YAGNI)

Comment thread internal/noderesource/noderesource.go
Cursor bugbot flagged an asymmetric gap — assertNoOperatorKeyringOnSeidContainers
only walked volume mounts, while its bootstrap-pod sibling
(assertNoValidatorSecretsOnBootstrapPod) walks both volumes AND env vars.
The passphrase env is equally sensitive operator-keyring material: a future
refactor that accidentally injects SEI_KEYRING_PASSPHRASE into the seid
main container (or a non-sidecar init container) would slip past the
volume-only check.

Extends the production guard to also walk Env[].ValueFrom.SecretKeyRef.Name
and EnvFrom[].SecretRef.Name for matches against the operator-keyring
passphrase Secret name. Adds two regression tests covering both leakage
paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@bdchatham
Copy link
Copy Markdown
Collaborator Author

Addressed Cursor bugbot finding (commit `1028fb1`)

"Production pod guard misses passphrase env-var leakage vector" — real and worth catching. Asymmetric gap: the bootstrap-pod guard (`assertNoValidatorSecretsOnBootstrapPod`) walks both volumes AND `valueFrom.secretKeyRef` / `envFrom.secretRef`, but the production-pod guard (`assertNoOperatorKeyringOnSeidContainers`) was volume-only.

Extended `assertNoOperatorKeyringOnSeidContainers` to also walk:

  • `Container.Env[].ValueFrom.SecretKeyRef.Name` — catches single-key env injection (`SEI_KEYRING_PASSPHRASE` from the passphrase Secret)
  • `Container.EnvFrom[].SecretRef.Name` — catches whole-Secret `envFrom` injection

Both paths are now symmetric with the bootstrap-pod guard. Two new regression tests cover the Env and EnvFrom leakage cases on the seid main / seid init containers.

Tests + lint green.

@bdchatham bdchatham merged commit 5b789bc into main May 12, 2026
3 checks passed
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 1028fb1. Configure here.

},
},
}}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Operator keyring key name not passed to sidecar

High Severity

operatorKeyringEnvVars injects only SEI_KEYRING_PASSPHRASE into the sidecar container but never passes the KeyName (e.g., "node_admin") that tells the sidecar which keyring entry to sign with. The keyring directory mount can contain multiple key entries, the pre-flight validator checks for the named key's .info file, and the planner defaults KeyName via DefaultOperatorKeyName — but none of that information reaches the sidecar process. Without a SEI_KEYRING_KEY_NAME (or similar) env var, the sidecar cannot determine which key to use for signing transactions.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 1028fb1. Configure here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant