feat: SeiNode validator.operatorKeyring CRD surface (closes #219)#220
Conversation
Implements Phase 1 step 1 of the in-pod governance signing workstream (design: sei-protocol/seictl#166). Adds validator.operatorKeyring to the SeiNode CRD — a discriminated union (Secret variant; future TMKMS/Vault/KMS reserved) declaring the operator-account keyring used by the sidecar for governance, MsgEditValidator, withdraw-rewards, and other operator-account txs. The keyring data Secret (mounted as a volume at $SEI_HOME/keyring-file/) and a separate passphrase Secret (projected as an env var) are mounted exclusively on the sidecar container — never on the seid main container, init container, or bootstrap pod. Four CEL invariants on ValidatorSpec enforce the trust boundary at admission: - operatorKeyring.secret.secretName != signingKey.secret.secretName - operatorKeyring.secret.secretName != nodeKey.secret.secretName - operatorKeyring.secret.secretName != passphraseSecretRef.secretName - secretName / passphraseSecretRef.secretName immutable post-create Pre-flight task validate-operator-keyring mirrors validate-signing-key: checks both Secrets exist + non-empty, validates the file-keyring's *.info and *.address index files, and confirms the named key entry is present if KeyName is specified. Does NOT decrypt — that sits with the sidecar's startup smoke test, not the controller's TCB. Sidecar container security context tightened in the same workstream: RunAsNonRoot, RunAsUser/Group 65532, ReadOnlyRootFilesystem, AllowPrivilegeEscalation false, Capabilities.Drop ALL, SeccompProfile RuntimeDefault, pod-level FSGroup 65532 (required for the non-root UID to read the 0o400 Secret mount). Bootstrap-pod isolation guard generalized — assertNoSigningKeyOnBootstrapPod becomes assertNoValidatorSecretsOnBootstrapPod, walking volumes for any of {signing-key, operator-keyring, passphrase} Secrets and env vars for the passphrase Secret. NodeKey deliberately excluded (no signing authority). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Synthesizes platform-engineer / kubernetes-specialist / security-specialist cross-review of #220. Addresses 9 review items; 1 deferred with explicit documentation. CEL invariants tightened (kubernetes-specialist finding): Adds two ValidatorSpec rules forbidding the passphrase Secret from colliding with signingKey.secret.secretName or nodeKey.secret.secretName. Closes a trust-boundary gap where a malicious config could project a Secret used by seid main into the sidecar's env-resolution path. Planner-side defense-in-depth (security-specialist finding): validatorPlanner.Validate now mirrors all five operator-keyring distinctness rules. CEL alone requires K8s 1.25+ with the feature flag; mis-configured webhooks or --validate=false bypass CEL. Production pod-spec runtime guard (security-specialist finding): GenerateStatefulSet asserts operatorKeyringVolumeName never appears on the seid main container or init containers. CI catches regressions the way assertNoValidatorSecretsOnBootstrapPod does for bootstrap pods. Signature changes to (*appsv1.StatefulSet, error); 42 call sites updated. Bootstrap-pod guard extended (security-specialist finding): assertNoValidatorSecretsOnBootstrapPod now walks c.EnvFrom[].SecretRef in addition to c.Env[].ValueFrom.SecretKeyRef. Future-proofs against envFrom-shaped leakage. Default constants exported (platform-engineer finding): api/v1alpha1.DefaultOperatorKeyName and DefaultPassphraseSecretKey replace magic strings in planner + noderesource. Kubebuilder defaults retain literal strings (markers can't reference Go consts) but carry comments tying them to the constants. Operator-facing error wording reworded (platform-engineer finding): Conditions point at Secret data-key shape (*.info, *.address) instead of internal Cosmos SDK backend terminology. Explicit AutomountServiceAccountToken (security-specialist finding): Pod spec now sets ptr.To(true) explicitly with #165 reference comment. Cluster-default flips that disable SA tokens would otherwise silently break the future TokenReview path. ptr.To linter situation cleaned (platform-engineer finding): Restored ptr.To(true)/ptr.To(false) with //nolint:modernize and a reason string. CLAUDE.md "fix lint, don't suppress" applies to real issues; new(true) is invalid Go, so the modernize suggestion is a false-positive and explicit nolint is the idiomatic answer. shareProcessNamespace = true documented as deferred (security blocker downgraded to deferred follow-up per orchestrator decision): A compromised seid container can today read /proc/<sidecar>/environ and /proc/<sidecar>/mem, exposing the passphrase and unlocked keyring. Not a one-way door — the broader sidecar/seid hardening (drop shareProcessNamespace, harden seid main SecurityContext, separate sidecar SA) is tracked separately. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cross-review complete — findings integrated3-specialist review (platform-engineer, kubernetes-specialist, security-specialist). Verdicts: platform-engineer LGTM-with-fixes • kubernetes-specialist LGTM-with-fixes • security-specialist Block-pending-changes (one item). Addressed in commit
|
| # | Source | Item | Fix |
|---|---|---|---|
| 1 | k8s-specialist | Trust-boundary CEL gap — passphrase Secret could collide with signing/node-key Secret | Two new XValidation rules on ValidatorSpec |
| 2 | security | Planner-side CEL mirror — CEL alone bypassable on K8s <1.25 or with --validate=false |
validatorPlanner.Validate now enforces all 5 distinctness rules |
| 3 | security | Production pod-spec runtime guard | GenerateStatefulSet asserts operatorKeyringVolumeName never on seid main / init; signature changes to (*StatefulSet, error) |
| 4 | security | Bootstrap guard missed EnvFrom |
assertNoValidatorSecretsOnBootstrapPod now walks both Env.ValueFrom.SecretKeyRef and EnvFrom.SecretRef |
| 5 | platform | Magic-string defaults in 3 places | Exported DefaultOperatorKeyName / DefaultPassphraseSecretKey constants from api/v1alpha1; planner + noderesource reference them; kubebuilder markers carry "must-match" comment |
| 6 | platform | Cosmos-SDK terminology in operator-facing conditions | Reworded to point at Secret data-key shape (*.info, *.address) |
| 7 | security | AutomountServiceAccountToken left at kubelet-default |
Explicit ptr.To(true) with sei-protocol/seictl#165 reference comment |
| 8 | platform | ptr.To(true) workaround inconsistent with rest of file |
Restored ptr.To(...) everywhere; //nolint:modernize with reason on the affected lines |
| 9 | security | shareProcessNamespace decision | Deferred with explicit code comment + follow-up issue |
Deferred — tracked as follow-up
Issue filed: see "Harden sidecar/seid trust boundary against /proc-based leakage" in this repo for the broader sidecar/seid container hardening workstream.
The shareProcessNamespace gap, distinct UIDs for sidecar vs seid main, seid SecurityContext hardening, and separating the sidecar's SA from the controller-manager SA are coherent as a single workstream — orchestrator decision was to defer this PR's resolution rather than partial-fix here.
Gates
make manifests generate— idempotent (only the two new CEL rules show up in regenerated CRDs)go test ./...— all packages passgolangci-lint run --new-from-rev=origin/main ./...— 0 issues
Discretionary decisions during cleanup
GenerateStatefulSetsignature change rippled to 42 test call sites + 1 real caller. AddedmustGenerateStatefulSettest helper to keep ergonomics flat. Alternative considered: keep signature and call assertion fromapply_statefulset.go— rejected because pushing the invariant out of the builder weakens the "every production pod-spec is checked" guarantee.modernize/newexprfalse-positive class also flaggedptr.To(sidecarNonRootUID)— same//nolint:modernizepattern applied; callingnew()on a value is invalid Go.goconst-driven container name constants (containerNameSeid,containerNameSidecar) extracted at package level because the new assertion code pushed occurrence counts past the linter threshold. Idiomatic.
This PR addresses all blockers identified by the cross-review. The deferred shareProcessNamespace hardening is tracked in the follow-up issue and is not a one-way door — the v1 trust posture documented in the design doc remains intact (operator keyring on sidecar only, mount-list discipline, CEL invariants), with the /proc leakage path being the explicit remaining work.
Cursor bugbot flagged two real regressions introduced by the sidecar SecurityContext + pod-level FSGroup additions: 1. ReadOnlyRootFilesystem: true breaks any Go stdlib path that writes to /tmp (os.CreateTemp, net/http multipart, encoding/json large buffers). Adds an emptyDir volume mounted at /tmp on the sidecar container only. Tmpfs-backed, isolated from the seid container's TMPDIR=/sei/tmp. 2. Pod-level FSGroup without FSGroupChangePolicy defaults to "Always", causing kubelet to recursively chown every file on every mounted PVC at every pod start. For Sei archive PVCs (TBs of chain data) this adds minutes of startup latency on every restart. Setting FSGroupChangePolicy: OnRootMismatch chowns on first creation only. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addressed Cursor bugbot findings (commit
|
- Cut speculative "future siblings" doc text on OperatorKeyringSource (per YAGNI; the additive CEL shape `(has(self.secret) ? 1 : 0) == 1` speaks for itself, doesn't need prose elaboration about hypothetical variants). - Remove the `+kubebuilder:default="passphrase"` on PassphraseSecretRef.Key and make Key required. Operators declare the data key explicitly rather than relying on a default that hides where the passphrase actually lives. KeyName's "node_admin" default is kept as the seienv-continuity convention. - Drop the now-unused DefaultPassphraseSecretKey constant and the planner+noderesource fallback code (Key is required at admission; no in-code fallback needed). - Refine the operatorKeyringDirName comment to make clear it's an SDK contract, not a controller choice. - Move the `forbiddenSecret` type declaration to the top of `bootstrap_resources.go` per Go convention (types alongside package declarations). - Update the AutomountServiceAccountToken comment to reflect rev2 of the design (the proxy uses the SA token for TokenReview, not the sidecar's in-process middleware). No substantive change to the keyring CRD design. The kube-rbac-proxy work from rev2 is fast-follow scope (#165 expanded). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addressed line comments (commit `d55487a`)
Bonus tidy (rev2 wording)While in the file, refreshed the `AutomountServiceAccountToken` comment to reflect the rev2 design (the future kube-rbac-proxy uses the SA token for TokenReview/SAR, not in-process middleware). Same intent, more accurate phrasing. Rev2 (kube-rbac-proxy) impact on this PRNone substantive. The operatorKeyring CRD design is unchanged. The kube-rbac-proxy work from rev2 is fast-follow scope (sei-protocol/seictl#165 expanded to span both repos; possibly a new sei-k8s-controller issue for the pod-spec proxy-container addition). This PR stays lean. Gates
|
Cursor bugbot flagged an asymmetric gap — assertNoOperatorKeyringOnSeidContainers only walked volume mounts, while its bootstrap-pod sibling (assertNoValidatorSecretsOnBootstrapPod) walks both volumes AND env vars. The passphrase env is equally sensitive operator-keyring material: a future refactor that accidentally injects SEI_KEYRING_PASSPHRASE into the seid main container (or a non-sidecar init container) would slip past the volume-only check. Extends the production guard to also walk Env[].ValueFrom.SecretKeyRef.Name and EnvFrom[].SecretRef.Name for matches against the operator-keyring passphrase Secret name. Adds two regression tests covering both leakage paths. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addressed Cursor bugbot finding (commit `1028fb1`)"Production pod guard misses passphrase env-var leakage vector" — real and worth catching. Asymmetric gap: the bootstrap-pod guard (`assertNoValidatorSecretsOnBootstrapPod`) walks both volumes AND `valueFrom.secretKeyRef` / `envFrom.secretRef`, but the production-pod guard (`assertNoOperatorKeyringOnSeidContainers`) was volume-only. Extended `assertNoOperatorKeyringOnSeidContainers` to also walk:
Both paths are now symmetric with the bootstrap-pod guard. Two new regression tests cover the Env and EnvFrom leakage cases on the seid main / seid init containers. Tests + lint green. |
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 1028fb1. Configure here.
| }, | ||
| }, | ||
| }} | ||
| } |
There was a problem hiding this comment.
Operator keyring key name not passed to sidecar
High Severity
operatorKeyringEnvVars injects only SEI_KEYRING_PASSPHRASE into the sidecar container but never passes the KeyName (e.g., "node_admin") that tells the sidecar which keyring entry to sign with. The keyring directory mount can contain multiple key entries, the pre-flight validator checks for the named key's .info file, and the planner defaults KeyName via DefaultOperatorKeyName — but none of that information reaches the sidecar process. Without a SEI_KEYRING_KEY_NAME (or similar) env var, the sidecar cannot determine which key to use for signing transactions.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit 1028fb1. Configure here.


Summary
docs/design/in-pod-governance-signing.md— Component A).validator.operatorKeyringtoSeiNode— discriminated-union surface (Secretvariant; futureTMKMS/Vault/KMSreserved as comments and an additive CELexactly-onerule) declaring the operator-account keyring used by the seictl sidecar for governance,MsgEditValidator, withdraw-rewards, and other operator-account transactions.ValidatorSpec; pod-spec mount + env wired into the sidecar container only (never seid main, never init, never bootstrap); bootstrap-pod isolation guard generalized to walk both Secret volumes ANDvalueFrom.secretKeyRefenv entries.RunAsNonRoot,RunAsUser/Group 65532,ReadOnlyRootFilesystem,AllowPrivilegeEscalation: false,Capabilities.Drop: [ALL],SeccompProfile: RuntimeDefault, pod-levelFSGroup: 65532(required for the non-root UID to read the 0o400 Secret mount).validate-operator-keyringtask mirrorsvalidate-signing-key.go. Deliberately does NOT decrypt the keyring — that's the sidecar's startup smoke test (issue feat(controller): wire Spec.Genesis.Accounts via sidecar typed wrappers #162 in seictl), not the controller's TCB.CRD invariants
Four CEL rules on
ValidatorSpecenforce the trust boundary at admission:operatorKeyring.secret.secretName != signingKey.secret.secretName(data-keyring ≠ consensus-key Secret)operatorKeyring.secret.secretName != nodeKey.secret.secretName(data-keyring ≠ P2P-key Secret)operatorKeyring.secret.secretName != passphraseSecretRef.secretName(data-keyring ≠ passphrase Secret)secretNameandpassphraseSecretRef.secretNameimmutable viaself == oldSelfkeyNameis intentionally NOT immutable (allows key rotation within a Secret).Discretionary decisions flagged for review
These weren't fully prescribed by the design — reviewers should sanity-check:
"file-keyring requires at least one encrypted key blob"). Functional, not load-bearing.defaultStrhelper inlined, not extracted. Design referenced a helper that doesn't exist in this repo; YAGNI says don't add a 4-line helper for two call sites. Defaults:KeyName→\"node_admin\",passphraseRef.Key→\"passphrase\"(matches the kubebuilder defaults).ptr.To(true)/ptr.To(int64(65532))workaround:modernize/newexprlinter false-positively suggestsnew(true)(invalid Go). Assigned to local vars before address-taking. Functionally identical; less direct.PassphraseEnvInInitContainer_Rejects) — not in the design but the bootstrap pod's sidecar runs asInitContainer, so this is the realistic env-leakage path. Cheap belt-and-suspenders.Test plan
Verification on the implementer side:
make manifests generatesucceeds and produces the expected new types/CRD fieldsgo build ./...cleango test ./...all packages passgolangci-lint run --new-from-rev=origin/main ./...reports 0 issuesFor human reviewers:
buildSidecarContainer) — NOT onbuildNodeMainContainer, NOT on the init container, NOT on the bootstrap podValidatorSpecviakubectl apply --dry-run=serveron edge cases (collision Secret names, missing fields, immutability-on-update)validate-operator-keyringdoes NOT attempt to decrypt the keyring (controller TCB stays small)assertNoValidatorSecretsOnBootstrapPodrejects each of the three forbidden Secret types and accepts NodeKeyCross-review
This PR is being reviewed in parallel by platform-engineer, kubernetes-specialist, and security-specialist before being marked ready for merge. Their findings will be posted as PR review comments.
🤖 Generated with Claude Code
Note
Medium Risk
Introduces a new validator credential surface (
operatorKeyring) and wires it into planning, pod generation, and bootstrap isolation; mistakes could accidentally expose secrets or break validator deployments, though the PR adds multiple admission/runtime guards and tests.Overview
Adds a new
validator.operatorKeyringAPI surface (Secret-backed Cosmos file keyring + separate passphrase Secret), including CRD schema/defaults and CEL validations to enforce secret distinctness/immutability.Wires operator keyring into reconciliation/planning by inserting a new preflight
validate-operator-keyringtask (with a new status condition + reasons) and passing params with a fallback default key name when admission defaulting hasn’t run.Hardens runtime secret containment: updates StatefulSet generation to mount the keyring directory + passphrase env only on the sidecar, adds a fail-closed pod-spec guard to prevent leakage to
seid/init containers, tightens sidecar security context (non-root, read-only rootfs, dropped caps, seccomp) and podfsGroup, and generalizes the bootstrap-pod guard to reject any validator-owned secrets in volumes or env refs.Tests/regen: updates unit tests for the new
GenerateStatefulSeterror return, adds extensive coverage for keyring mounts/env and bootstrap leakage checks, and regenerates deepcopy + CRD/manifests.Reviewed by Cursor Bugbot for commit 1028fb1. Bugbot is set up for automated code reviews on this repo. Configure here.