feat(apply,upgrade): apply-safety follow-ups (#189 + #190 + #191 + #192)#200
Conversation
|
Warning Rate limit exceeded
You’ve run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (16)
📝 WalkthroughWalkthroughThis PR adds a Phase‑1 net‑addr syntactic walker and tests, appends its findings into preflight, introduces an allowlist-based secret-path redaction (default redaction with a ChangesMulti-phase validation and drift-preview enhancements
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Suggested reviewers
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
pkg/commands/preflight_apply_safety.go (1)
145-146:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winGeneralize the blocker summary now that net-addr blockers are included.
Line 145 currently reports only unresolved host resources, but blockers can now also be malformed net-addr fields. This summary is now misleading.
✏️ Suggested patch
- errors.Newf("pre-flight: %d declared host resource(s) do not resolve on the target node", blockers), + errors.Newf("pre-flight: %d blocking finding(s) detected in rendered MachineConfig", blockers),🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/commands/preflight_apply_safety.go` around lines 145 - 146, Update the error message created by errors.Newf that references the blockers variable so it no longer only mentions unresolved host resources; instead use a generalized phrase (e.g., "declared resource(s) failed validation or could not be resolved on the target node") that covers both unresolved host resources and malformed net-addr fields, and keep the existing guidance about correcting values or using --skip-resource-validation. Locate the errors.Newf(...) call that builds the blocker summary (uses the blockers variable) and replace the specific "do not resolve on the target node" text with the broader wording.pkg/commands/preflight_upgrade_verify.go (1)
112-112:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winHardcoded "90s window" reference inconsistent with configurable reconcile window.
Line 112's error message still hardcodes
"beyond the 90s window", contradicting the configurable--post-upgrade-reconcile-windowflag introduced in this PR. An operator passing--post-upgrade-reconcile-window=180swould see "beyond the 90s window" in the error, which is misleading.🔧 Proposed fix to use dynamic reconcile window reference
Since this error is constructed within
verifyPostUpgradeVersion, which doesn't have direct access to the configured window duration, consider either:
- Pass the reconcile window as a parameter to
verifyPostUpgradeVersionand interpolate it into the error message, or- Replace the hardcoded reference with generic wording like
"beyond the configured reconcile window"to match the hint text pattern.Option 2 is simpler:
- errors.Newf("post-upgrade: requested upgrade to %s but running version is %s — either Talos auto-rolled back, or the node is still booting beyond the 90s window", target, running), + errors.Newf("post-upgrade: requested upgrade to %s but running version is %s — either Talos auto-rolled back, or the node is still booting beyond the configured reconcile window", target, running),🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/commands/preflight_upgrade_verify.go` at line 112, The error message in verifyPostUpgradeVersion currently hardcodes "beyond the 90s window" which conflicts with the new --post-upgrade-reconcile-window flag; update verifyPostUpgradeVersion's error construction (the errors.Newf call that mentions target and running) to either accept the reconcile window as an additional parameter and interpolate its value, or (simpler) replace the hardcoded phrase with a generic "beyond the configured reconcile window" so the message reflects the configurable behavior; reference the verifyPostUpgradeVersion function and the specific errors.Newf invocation when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/commands/preflight_apply_safety_redact.go`:
- Around line 29-57: The secretFieldPaths allowlist is missing WireGuard-related
secret fields, so add the WireGuard key paths to the secretFieldPaths slice (the
bracket-normalised forms used there) to ensure redaction; for example add
entries like "cluster.wireguard.privateKey", "cluster.wireguard.preSharedKey",
"cluster.wireguard.keys[].privateKey", and the machine-scoped equivalents such
as "machine.wireguard.privateKey" and "machine.wireguard.preSharedKey" to the
secretFieldPaths variable so WireGuard private/PSK values are redacted in diffs.
---
Outside diff comments:
In `@pkg/commands/preflight_apply_safety.go`:
- Around line 145-146: Update the error message created by errors.Newf that
references the blockers variable so it no longer only mentions unresolved host
resources; instead use a generalized phrase (e.g., "declared resource(s) failed
validation or could not be resolved on the target node") that covers both
unresolved host resources and malformed net-addr fields, and keep the existing
guidance about correcting values or using --skip-resource-validation. Locate the
errors.Newf(...) call that builds the blocker summary (uses the blockers
variable) and replace the specific "do not resolve on the target node" text with
the broader wording.
In `@pkg/commands/preflight_upgrade_verify.go`:
- Line 112: The error message in verifyPostUpgradeVersion currently hardcodes
"beyond the 90s window" which conflicts with the new
--post-upgrade-reconcile-window flag; update verifyPostUpgradeVersion's error
construction (the errors.Newf call that mentions target and running) to either
accept the reconcile window as an additional parameter and interpolate its
value, or (simpler) replace the hardcoded phrase with a generic "beyond the
configured reconcile window" so the message reflects the configurable behavior;
reference the verifyPostUpgradeVersion function and the specific errors.Newf
invocation when making the change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: fa5b912e-f751-4b4a-8ea6-6315ad88b263
📒 Files selected for processing (12)
docs/manual-test-plan.mdpkg/applycheck/refs_netaddr.gopkg/applycheck/refs_netaddr_test.gopkg/commands/apply.gopkg/commands/preflight_apply_safety.gopkg/commands/preflight_apply_safety_redact.gopkg/commands/preflight_apply_safety_redact_test.gopkg/commands/preflight_apply_safety_test.gopkg/commands/preflight_upgrade_verify.gopkg/commands/talosctl_wrapper.gopkg/commands/upgrade_handler.gopkg/commands/upgrade_handler_test.go
There was a problem hiding this comment.
Code Review
This pull request introduces several enhancements to the talm tool, focusing on security, validation, and user experience. Key changes include the implementation of secret redaction in drift previews and post-apply verification (with an opt-in flag to show them), and the addition of syntactic validation for network address fields (IPs, CIDRs, and host:port endpoints) during pre-flight checks. It also adds a configurable post-upgrade reconcile window and improves multi-node logging by adding per-node prefixes to maintenance warnings. Feedback was provided regarding a potential security vulnerability where slice-based diffing might bypass redaction logic if the order of checks is not adjusted.
| if oldSlice, newSlice, ok := bothSlices(change); ok { | ||
| return formatSliceSetDiff(change.Path, oldSlice, newSlice) | ||
| } |
There was a problem hiding this comment.
The bothSlices check currently takes precedence over the secret redaction logic. If a field path is marked as a secret in the allowlist but its value is represented as a slice (e.g., during a set-diff update), the formatSliceSetDiff function will be called, which is not redaction-aware. This could lead to sensitive data being leaked in the drift preview if a secret-bearing field is also a slice or if a secret is nested within a slice that is being diffed as a whole. Please ensure that isSecretPath is checked before bothSlices to guarantee redaction for all sensitive paths.
#189+#190+#191+#192) Address branch-review blockers on PR #200: - secretFieldPaths advertised Wireguard private/preshared keys (both in --show-secrets-in-drift help text and in the redact.go file comment) but contained zero Wireguard entries. Rotating WireguardConfig.privateKey via talm apply leaked the base64 key to stderr — the security-class bug the redaction feature exists to prevent. Add bare 'privateKey' + 'peers[].presharedKey' paths to the allowlist (the differ's flatten step does not prefix multidoc fields with the doc kind — see pkg/applycheck/diff.go identityOf). Pinned by TestIsSecretPath_WireguardSecretPaths. - docs/apply-safety-gates-test-plan.md (the narrow gates checklist that manual-test-plan.md cites as canonical) was stale on the new behaviour: no rows for the Phase 1 net-addr walker (#192), the Phase 2A secret redaction (#189), the Phase 2A per-node prefix on the maintenance line (#191), or the Phase 2C --post-upgrade-reconcile-window flag (#190). Add rows in the existing table style for each. - formatSecretFieldValue godoc now disclaims the rotation signal for map-shaped values: Go's fmt.Sprintf('%v', m) on a map[string]any iterates keys in randomised order, so length is not a reliable rotation indicator for maps. Today's allowlist has no map entries; if a future addition does, either canonicalise the map (sorted keys before %v) or disclaim the rotation signal for that entry. Signed-off-by: Aleksei Sviridkin <f@lex.la>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/commands/preflight_apply_safety.go`:
- Around line 116-123: The appended net-addr findings from
applycheck.WalkNetAddrFindings are expanding the Phase 1 blockers beyond
unresolved host resources, so update the Phase 1 blocker summary text (the code
that constructs the final error/message for blockers where 'findings' is used)
to use a generalized description instead of referencing only unresolved host
resources; locate where 'findings' is counted/converted into the Phase 1 error
message and replace the specific wording with a broader phrase like "Phase 1
blockers detected" or similar to accurately reflect mixed net-addr and
host-resource blockers.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 87b74f74-5c38-4b02-86ff-a23a35fde2fe
📒 Files selected for processing (10)
README.mddocs/apply-safety-gates-test-plan.mddocs/manual-test-plan.mdpkg/commands/apply.gopkg/commands/apply_test.gopkg/commands/preflight_apply_safety.gopkg/commands/preflight_apply_safety_redact.gopkg/commands/preflight_apply_safety_redact_test.gopkg/commands/upgrade_handler.gopkg/commands/upgrade_handler_test.go
✅ Files skipped from review due to trivial changes (1)
- docs/apply-safety-gates-test-plan.md
🚧 Files skipped from review as they are similar to previous changes (4)
- pkg/commands/upgrade_handler.go
- pkg/commands/preflight_apply_safety_redact.go
- pkg/commands/preflight_apply_safety_redact_test.go
- pkg/commands/upgrade_handler_test.go
|
|
||
| netAddrFindings, err := applycheck.WalkNetAddrFindings(rendered) | ||
| if err != nil { | ||
| return errors.Wrap(err, "pre-flight: walking rendered MachineConfig for net-addr fields") | ||
| } | ||
|
|
||
| findings = append(findings, netAddrFindings...) | ||
|
|
There was a problem hiding this comment.
Broaden the Phase 1 blocker summary wording to match new net-addr blockers.
After appending WalkNetAddrFindings, the blocker count is no longer limited to unresolved host resources. The final error text should be generalized to avoid misleading remediation.
Proposed fix
- return errors.WithHint(
- errors.Newf("pre-flight: %d declared host resource(s) do not resolve on the target node", blockers),
- "correct the values referenced above, or pass --skip-resource-validation to bypass.",
- )
+ return errors.WithHint(
+ errors.Newf("pre-flight: %d validation blocker(s) found in rendered MachineConfig", blockers),
+ "correct the blocked values referenced above (resource refs / net-addr fields), or pass --skip-resource-validation to bypass.",
+ )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/commands/preflight_apply_safety.go` around lines 116 - 123, The appended
net-addr findings from applycheck.WalkNetAddrFindings are expanding the Phase 1
blockers beyond unresolved host resources, so update the Phase 1 blocker summary
text (the code that constructs the final error/message for blockers where
'findings' is used) to use a generalized description instead of referencing only
unresolved host resources; locate where 'findings' is counted/converted into the
Phase 1 error message and replace the specific wording with a broader phrase
like "Phase 1 blockers detected" or similar to accurately reflect mixed net-addr
and host-resource blockers.
4a70764 to
6d7c654
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
pkg/commands/preflight_upgrade_verify_test.go (1)
129-152: ⚡ Quick winAdd a regression assertion for configured reconcile-window wording.
These tests pass the new duration argument, but they don’t assert the emitted mismatch text reflects that value (and no longer regresses to hardcoded
90sphrasing). A small assertion here would lock the new behavior.♻️ Proposed test hardening
func TestVerifyPostUpgradeVersion_MinorMismatch_Blocks(t *testing.T) { @@ msg := err.Error() + " " + buf.String() @@ if !strings.Contains(msg, "v1.13") { t.Errorf("error should cite target version, got %q", msg) } + if !strings.Contains(msg, "configured reconcile window (1ms)") { + t.Errorf("error should reference configured reconcile window value, got %q", msg) + } + if strings.Contains(msg, "90s window") { + t.Errorf("error should not regress to hardcoded 90s wording, got %q", msg) + } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/commands/preflight_upgrade_verify_test.go` around lines 129 - 152, The test TestVerifyPostUpgradeVersion_MinorMismatch_Blocks should also assert that the emitted mismatch message reflects the provided reconcile-window duration (the time.Millisecond argument) instead of any hardcoded "90s" text; update the test to check msg (combined err.Error() + buf.String()) contains the expected duration wording (e.g., "1ms" or the phrase used by verifyPostUpgradeVersion like "reconcile-window") so the new behavior is locked in and the message no longer regresses to "90s".
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/apply-safety-gates-test-plan.md`:
- Line 70: The Markdown table row at the "Real-schema pin" entry is malformed
(too few cells) causing MD056; update the row so it has three pipe-separated
cells matching the table's 3-column layout. Locate the row referencing
"Real-schema pin" and the test names
TestWalkNetAddrFindings_RealSchema_StaticHostConfig and
TestWalkNetAddrFindings_RealSchema_NetworkRuleConfig and ensure the line
contains exactly two pipe separators (three columns), e.g. split the explanatory
text into the appropriate second/third cells and keep the note that the walker
fires on what Talos emits in its own cell so the table renders aligned.
In `@docs/manual-test-plan.md`:
- Line 247: Replace the code span that currently contains the spaced text `node
: ` with a normalized `node:` to satisfy markdownlint MD038; update the example
line that shows `node 192.0.2.10: talm: drift verification unavailable on
maintenance connection` so any code spans use `node:` (no inner spaces) while
leaving the single-node case output `talm: drift verification unavailable on
maintenance connection` unchanged.
---
Nitpick comments:
In `@pkg/commands/preflight_upgrade_verify_test.go`:
- Around line 129-152: The test
TestVerifyPostUpgradeVersion_MinorMismatch_Blocks should also assert that the
emitted mismatch message reflects the provided reconcile-window duration (the
time.Millisecond argument) instead of any hardcoded "90s" text; update the test
to check msg (combined err.Error() + buf.String()) contains the expected
duration wording (e.g., "1ms" or the phrase used by verifyPostUpgradeVersion
like "reconcile-window") so the new behavior is locked in and the message no
longer regresses to "90s".
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 903aa8c6-19ec-4e11-9faf-3353b64fb5fd
📒 Files selected for processing (16)
README.mddocs/apply-safety-gates-test-plan.mddocs/manual-test-plan.mdpkg/applycheck/refs_netaddr.gopkg/applycheck/refs_netaddr_test.gopkg/commands/apply.gopkg/commands/apply_test.gopkg/commands/preflight_apply_safety.gopkg/commands/preflight_apply_safety_redact.gopkg/commands/preflight_apply_safety_redact_test.gopkg/commands/preflight_apply_safety_test.gopkg/commands/preflight_upgrade_verify.gopkg/commands/preflight_upgrade_verify_test.gopkg/commands/talosctl_wrapper.gopkg/commands/upgrade_handler.gopkg/commands/upgrade_handler_test.go
🚧 Files skipped from review as they are similar to previous changes (8)
- pkg/commands/talosctl_wrapper.go
- pkg/commands/preflight_apply_safety_redact.go
- pkg/commands/apply_test.go
- pkg/commands/upgrade_handler.go
- pkg/commands/apply.go
- pkg/commands/upgrade_handler_test.go
- pkg/commands/preflight_apply_safety_test.go
- pkg/commands/preflight_apply_safety.go
| -f nodes/node0.yaml | ||
| ``` | ||
|
|
||
| Expected (per node): `node 192.0.2.10: talm: drift verification unavailable on maintenance connection`. The single-node case (empty `nodeID`, the implicit path) MUST still emit the bare `talm: drift verification unavailable on maintenance connection` line — no `node : ` garbage prefix. |
There was a problem hiding this comment.
Fix code-span spacing to satisfy markdownlint MD038.
Line 247 has spaces inside a code span (`node : `), which triggers lint and is easy to normalize without changing meaning.
📝 Proposed doc fix
-Expected (per node): `node 192.0.2.10: talm: drift verification unavailable on maintenance connection`. The single-node case (empty `nodeID`, the implicit path) MUST still emit the bare `talm: drift verification unavailable on maintenance connection` line — no `node : ` garbage prefix.
+Expected (per node): `node 192.0.2.10: talm: drift verification unavailable on maintenance connection`. The single-node case (empty `nodeID`, the implicit path) MUST still emit the bare `talm: drift verification unavailable on maintenance connection` line — no `node :` garbage prefix.📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| Expected (per node): `node 192.0.2.10: talm: drift verification unavailable on maintenance connection`. The single-node case (empty `nodeID`, the implicit path) MUST still emit the bare `talm: drift verification unavailable on maintenance connection` line — no `node : ` garbage prefix. | |
| Expected (per node): `node 192.0.2.10: talm: drift verification unavailable on maintenance connection`. The single-node case (empty `nodeID`, the implicit path) MUST still emit the bare `talm: drift verification unavailable on maintenance connection` line — no `node :` garbage prefix. |
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)
[warning] 247-247: Spaces inside code span elements
(MD038, no-space-in-code)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/manual-test-plan.md` at line 247, Replace the code span that currently
contains the spaced text `node : ` with a normalized `node:` to satisfy
markdownlint MD038; update the example line that shows `node 192.0.2.10: talm:
drift verification unavailable on maintenance connection` so any code spans use
`node:` (no inner spaces) while leaving the single-node case output `talm: drift
verification unavailable on maintenance connection` unchanged.
Four operator-UX polish changes on the safety surfaces introduced by #173. * #191 — Per-node prefix on the maintenance-connection warning. Multi-node insecure apply used to print identical bare 'talm: drift verification unavailable on maintenance connection' lines for every node with no per-node correlation. Both the previewDrift and verifyAppliedState emission sites now wrap through nodePrefix(nodeID); single-node case (empty nodeID) keeps the bare line so the common UX is unchanged. cosiPreflightContext got a fallback path to GlobalArgs.Nodes[0] when no outgoing-context metadata is attached — the maintenance flow (openClientPerNodeMaintenance) pins the node in GlobalArgs but does not attach context metadata, so without the fallback the per-node prefix collapsed to empty even when --nodes was set explicitly. Fallback is single-element-only; multi-element GlobalArgs.Nodes returns empty so a broken caller surfaces as 'no prefix' rather than 'wrong node prefix'. * #190 — Configurable post-upgrade reconcile window. The hardcoded 90s wait after talosctl upgrade returns is now --post-upgrade-reconcile-window. Default defaultPostUpgradeReconcileWindow = 90 * time.Second preserves byte-identical back-compat. validatePostUpgradeReconcileWindow rejects non-positive values at the TOP of wrapUpgradeCommand RunE — fail-fast BEFORE any talosctl RPC fires, so an operator's '=0s' typo cannot land a partial upgrade. The version-mismatch hint copy and error body de-hardcode the '90s' literal to reference 'the configured reconcile window' (defaultPostUpgradeReconcileWindow flows through verifyPostUpgradeVersion). * #192 — Syntactic net-addr walker for three v1alpha1 multidoc kinds. Validates fields the actual Talos schema emits: * StaticHostConfig.name (IP literal; the `name` field on this kind doubles as the IP — there is no separate `address` field). * NetworkRuleConfig.ingress[].subnet and .except (CIDR shapes; no top-level matchSourceAddress[] field exists in Talos's schema). * WireguardConfig.peers[].endpoint (host:port; empty / absent endpoint is a listener-only peer and NOT a finding). Walker runs in parallel with the Ref-based walker via multidocNetAddrHandlers; dispatch-table disjointness with multidocHandlers is pinned by a unit test. Integration tests verify Phase 1 blocks on bad and passes on valid. * #189 — Drift preview redacts secret-bearing field values by default. secretFieldPaths allowlist with bracket-normalisation regex; secret check fires BEFORE bothSlices in formatFieldChangeLine so slice-shaped allowlist entries (cluster.acceptedCAs, machine.acceptedCAs, peers) never leak via formatSliceSetDiff. Length-disclosing redaction sentinel (***redacted (len=N)***) preserves rotation signal without leaking the value. Non-string secret values route through redactValue(fmt.Sprintf("%v", value)) so int / bool rotations still surface as different-length sentinels. Map iteration order non-determinism is disclaimed in the godoc (no map-shaped allowlist entry today). New flag --show-secrets-in-drift (default off) plumbed through previewDrift / verifyAppliedState / printDriftPreview / formatFieldChangeLine. apply.go callers pass applyCmdFlags.showSecretsInDrift. absentFieldValue const hoisted so formatFieldValue and formatSecretFieldValue stay byte-identical on the absent path — add/remove vs rotate stays distinguishable. Doc updates: * README.md bullet #1 (Phase 1) now mentions the net-addr walker; bullet #2 (Phase 2A) mentions redaction + --show-secrets-in-drift + the per-node prefix; bullet #4 (Phase 2C) says 'the configured reconcile window' instead of hardcoding 90s. * docs/manual-test-plan.md gains sections C5/C6/C7 (redact, flag opt-out, walker), D3 (per-node prefix), E3 (reconcile window with help-text + 0s rejection), M6 (false-positive guard for redaction), M7 (walker boundary cases). Forward- looking 'do X, expect Y' shape with explicit regression anchors. * docs/apply-safety-gates-test-plan.md gets matching rows in Phase 1 (net-addr walker), Phase 2A (redaction + per-node prefix), and Phase 2C (reconcile window) tables. Test coverage: * Walker: per-kind table tests with the real schema shape, plus TestWalkNetAddrFindings_RealSchema_StaticHostConfig and ..._NetworkRuleConfig as schema-anchor regression pins so a future drift back to non-existent fields fails these tests rather than passing trivially. * Reconcile window: validator boundary cases, flag default one-(default)-clause pin, fail-fast ordering pin via a sentinel originalRunE, hint-text no-hardcoded-90s pin, a README-no-hardcoded-90s pin so README and code stay in sync. * Per-node prefix: maintenance message on both emission sites, empty-nodeID bare-line guard, single-vs-multi GlobalArgs.Nodes fallback pair. * Redaction: exact-match, Wireguard paths, slice-rotation no-leak (CA-list and peers shapes), bracket-normalisation, false-prefix guard, non-secret control, opt-in flag, non-string redaction with length-signal, absent-side add-vs-rotate distinction. Closes #189, closes #190, closes #191, closes #192. Assisted-By: Claude <noreply@anthropic.com> Signed-off-by: Aleksei Sviridkin <f@lex.la>
6d7c654 to
b68ec42
Compare
What
Four follow-ups filed during the review of PR #173 (apply-safety gates, merged 2026-05-13), now bundled into one PR because they touch the same surfaces (
pkg/applycheck,pkg/commands/preflight_apply_safety.go,pkg/commands/upgrade_handler.go).drift verification unavailable on maintenance connectionwarning so multi-node insecure apply disambiguates per node.--post-upgrade-reconcile-windowflag (default 90s). Neutralize the version-mismatch hint copy so operators with a custom window see consistent advice.StaticHostConfig.address,NetworkRuleConfig.matchSourceAddress[],WireguardConfig.peers[].endpoint. Phase 1 now blocks malformed net-addr fields before the apply RPC.cluster.token,cluster.ca.key, Wireguard keys, etc.). Operator opt-out via--show-secrets-in-drift. Length-preserving sentinel keeps rotation signal visible.Why
All four are operator-UX polish on the safety surfaces #173 introduced. None change apply semantics:
talm: ...lines on multi-node insecure apply provide no per-node correlation; the drift / divergence headers already disambiguate, the maintenance line lagged behind.StaticHostConfig.addresswas silent until apply-time (or worse, runtime); walker catches before the RPC.Surface
pkg/commands/preflight_apply_safety.go(talm apply: maintenance-connection warning lacks per-node prefix on multi-node insecure #191 emission sites; talm apply: redact secret values in drift preview output #189 redaction-aware formatter; net-addr walker integration in Phase 1).pkg/commands/preflight_apply_safety_redact.go+_test.go(NEW) — talm apply: redact secret values in drift preview output #189 secret-path allowlist + matcher.pkg/applycheck/refs_netaddr.go+_test.go(NEW) — talm apply: extend declared-resource walker to StaticHostConfig / NetworkRuleConfig / WireguardConfig references #192 syntactic net-addr walker.pkg/commands/upgrade_handler.go+preflight_upgrade_verify.go— talm upgrade: make post-upgrade reconcile window configurable via a flag #190 flag plumbing + hint copy.pkg/commands/apply.go—--show-secrets-in-driftflag registration;showSecretsInDriftplumbed throughpreviewDrift/verifyAppliedState/printDriftPreview/formatFieldChangeLine.docs/manual-test-plan.md— Sections C5/C6/C7, D3, E3, M6/M7 with forward-looking regression anchors.Behaviour-pin tests
TestPreviewDrift_MaintenanceMessage_CarriesNodePrefix/_EmptyNodeIDPreservesBareLine(talm apply: maintenance-connection warning lacks per-node prefix on multi-node insecure #191).TestUpgradeFlag_PostUpgradeReconcileWindow_Registered+TestValidatePostUpgradeReconcileWindow_{Zero,Negative,Positive}+TestPostUpgradeVersionMismatchHint_NoHardcoded90s(talm upgrade: make post-upgrade reconcile window configurable via a flag #190).TestWalkNetAddrFindings_{StaticHostConfig,NetworkRuleConfig,WireguardConfig}table tests +TestMultidocNetAddrHandlers_NoOverlapWithRefHandlers+TestPreflightValidateResources_NetAddrFinding_{Blocks,ValidPasses}(talm apply: extend declared-resource walker to StaticHostConfig / NetworkRuleConfig / WireguardConfig references #192).TestIsSecretPath_{ExactMatch,BracketNormalisedPrefix,NoFalseMatchOnPrefix,NonSecretPath_NoMatch}+TestRedactValue_PreservesLength+TestFormatFieldChangeLine_{RedactsSecretByDefault,ShowsSecretsWhenFlagSet,NonSecretPathsUnchanged,AcceptedCAsArrayIndex_Redacted}(talm apply: redact secret values in drift preview output #189).Verification
Local CI gate (host):
go build ./...cleango test ./... -raceall packages greengolangci-lint run ./...0 issuesGOOS=windows go vet ./...cleandev17 smokes pending (#190 flag +
--help, #192 hand-crafted bad chart, #189 secret rotation diff). #191 covered by unit suite end-to-end via stub reader; live multi-node maintenance state is impractical to fake on the standing test cluster.Closes #189, closes #190, closes #191, closes #192.
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Tests