Skip to content

feat(apply,upgrade): apply-safety follow-ups (#189 + #190 + #191 + #192)#200

Merged
lexfrei merged 1 commit into
mainfrom
feat/apply-safety-followups
May 13, 2026
Merged

feat(apply,upgrade): apply-safety follow-ups (#189 + #190 + #191 + #192)#200
lexfrei merged 1 commit into
mainfrom
feat/apply-safety-followups

Conversation

@lexfrei
Copy link
Copy Markdown
Contributor

@lexfrei lexfrei commented May 12, 2026

What

Four follow-ups filed during the review of PR #173 (apply-safety gates, merged 2026-05-13), now bundled into one PR because they touch the same surfaces (pkg/applycheck, pkg/commands/preflight_apply_safety.go, pkg/commands/upgrade_handler.go).

Issue Scope
#191 Per-node prefix on the drift verification unavailable on maintenance connection warning so multi-node insecure apply disambiguates per node.
#190 Promote the hardcoded 90s post-upgrade reconcile window to a tunable --post-upgrade-reconcile-window flag (default 90s). Neutralize the version-mismatch hint copy so operators with a custom window see consistent advice.
#192 New syntactic walker for StaticHostConfig.address, NetworkRuleConfig.matchSourceAddress[], WireguardConfig.peers[].endpoint. Phase 1 now blocks malformed net-addr fields before the apply RPC.
#189 Drift preview redacts secret-bearing field values by default (cluster.token, cluster.ca.key, Wireguard keys, etc.). Operator opt-out via --show-secrets-in-drift. Length-preserving sentinel keeps rotation signal visible.

Why

All four are operator-UX polish on the safety surfaces #173 introduced. None change apply semantics:

Surface

Behaviour-pin tests

Verification

Local CI gate (host):

  • go build ./... clean
  • go test ./... -race all packages green
  • golangci-lint run ./... 0 issues
  • GOOS=windows go vet ./... clean

dev17 smokes pending (#190 flag + --help, #192 hand-crafted bad chart, #189 secret rotation diff). #191 covered by unit suite end-to-end via stub reader; live multi-node maintenance state is impractical to fake on the standing test cluster.

Closes #189, closes #190, closes #191, closes #192.

Summary by CodeRabbit

  • New Features

    • Optional flag to show secret values in drift previews.
    • Configurable post-upgrade reconcile/verify window (default 90s) with validation.
  • Bug Fixes

    • Stronger net-address validation (IPs, CIDRs, host:port) that blocks malformed entries.
    • Default secret redaction with deterministic redaction sentinel; opt-in disclosure when requested.
    • Maintenance/insecure output prefixes multi-node lines; single-node output unchanged.
    • WireGuard peers with empty/omitted endpoints treated as listener-only.
  • Documentation

    • Updated README and test plans for safety gates, redaction, and upgrade timing.
  • Tests

    • Expanded coverage for validation, redaction, output behavior, and upgrade timing.

Review Change Stack

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 12, 2026

Warning

Rate limit exceeded

@lexfrei has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 40 minutes and 31 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 76388ca2-a402-45cc-8296-cfde413dc3f1

📥 Commits

Reviewing files that changed from the base of the PR and between 6d7c654 and b68ec42.

📒 Files selected for processing (16)
  • README.md
  • docs/apply-safety-gates-test-plan.md
  • docs/manual-test-plan.md
  • pkg/applycheck/refs_netaddr.go
  • pkg/applycheck/refs_netaddr_test.go
  • pkg/commands/apply.go
  • pkg/commands/apply_test.go
  • pkg/commands/preflight_apply_safety.go
  • pkg/commands/preflight_apply_safety_redact.go
  • pkg/commands/preflight_apply_safety_redact_test.go
  • pkg/commands/preflight_apply_safety_test.go
  • pkg/commands/preflight_upgrade_verify.go
  • pkg/commands/preflight_upgrade_verify_test.go
  • pkg/commands/talosctl_wrapper.go
  • pkg/commands/upgrade_handler.go
  • pkg/commands/upgrade_handler_test.go
📝 Walkthrough

Walkthrough

This PR adds a Phase‑1 net‑addr syntactic walker and tests, appends its findings into preflight, introduces an allowlist-based secret-path redaction (default redaction with a --show-secrets-in-drift opt‑in), threads the flag through drift/verify rendering, and makes the post‑upgrade reconcile window configurable with validation. Tests and docs/manual plans updated.

Changes

Multi-phase validation and drift-preview enhancements

Layer / File(s) Summary
Net-addr validation walker
pkg/applycheck/refs_netaddr.go, pkg/applycheck/refs_netaddr_test.go
New public WalkNetAddrFindings validates StaticHostConfig.name (IP literal), `NetworkRuleConfig.ingress[].subnet
Phase 1 preflight integration
pkg/commands/preflight_apply_safety.go
preflightValidateResources appends WalkNetAddrFindings results to the main findings slice before blocker handling.
Secret redaction utilities
pkg/commands/preflight_apply_safety_redact.go, pkg/commands/preflight_apply_safety_redact_test.go
Adds secretFieldPaths allowlist, index-normalizing regex, isSecretPath (normalizes numeric indices like [3][]), and redactValue sentinel (***redacted (len=...)***). Tests validate exact/normalized matching, bracket/index normalization, non-redaction of non-secret paths, false-positive prevention, non-string secret handling, absent-side formatting, and rotation/slice regressions to prevent leaks.
Phase 2 drift-preview and verify secret redaction
pkg/commands/apply.go, pkg/commands/preflight_apply_safety.go
Registers --show-secrets-in-drift and threads a showSecrets boolean through previewDrift, verifyAppliedState, printDriftPreview, and formatFieldChangeLine. Introduces formatSecretFieldValue and absentFieldValue, and conditionally redacts allowlisted secret paths when showSecrets is false. Insecure/maintenance output prints a single talm: line prefixed by nodePrefix(nodeID) when applicable.
Apply CLI flag and preflight context
pkg/commands/apply.go, pkg/commands/apply_test.go
apply gains --show-secrets-in-drift; apply code passes the flag into preview/verify calls. cosiPreflightContext now falls back to GlobalArgs.Nodes[0] only when no outgoing metadata and exactly one node target; tests cover single-node fallback and multi-node no-fallback.
Test call-site updates and new assertions
pkg/commands/preflight_apply_safety_test.go, multiple test files
Existing tests updated to pass showSecrets (false) where needed. New integration tests assert maintenance-connection warning prefixing (per-node node <id>: vs bare talm:) and Wireguard endpoint Phase‑1 validation behavior.
Configurable post-upgrade reconcile window
pkg/commands/upgrade_handler.go, pkg/commands/upgrade_handler_test.go, pkg/commands/preflight_upgrade_verify.go, pkg/commands/talosctl_wrapper.go
Adds --post-upgrade-reconcile-window flag with defaultPostUpgradeReconcileWindow (90s) and validatePostUpgradeReconcileWindow (rejects non‑positive durations). Validation is performed early in the upgrade command and at verify start; the validated duration is passed into the inner verifier. Operator hints/docs updated to reference the configured window and tests assert registration, validation, fail-fast behavior, and absence of hardcoded “90s”.
Docs and manual test plan expansion
README.md, docs/apply-safety-gates-test-plan.md, docs/manual-test-plan.md
Documentation expanded to describe Phase‑1 net‑addr checks, Phase‑2 redaction behavior (including array-index normalization and --show-secrets-in-drift), maintenance/insecure messaging differences (single vs multi-node), configurable reconcile window behavior and validation, and manual/regression test cases.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • IvanHunters
  • myasnikovdaniil

🐰 I hopped through YAML, fields tidy and neat,
IPs and endpoints checked before they meet.
Secrets masked by default, show them if you please,
Upgrade windows stretch with flags and gentle ease.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title accurately summarizes the main change as apply-safety follow-ups bundling four related issues (#189, #190, #191, #192) across apply and upgrade functionality.
Docstring Coverage ✅ Passed Docstring coverage is 93.02% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/apply-safety-followups

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
pkg/commands/preflight_apply_safety.go (1)

145-146: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Generalize the blocker summary now that net-addr blockers are included.

Line 145 currently reports only unresolved host resources, but blockers can now also be malformed net-addr fields. This summary is now misleading.

✏️ Suggested patch
-		errors.Newf("pre-flight: %d declared host resource(s) do not resolve on the target node", blockers),
+		errors.Newf("pre-flight: %d blocking finding(s) detected in rendered MachineConfig", blockers),
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/commands/preflight_apply_safety.go` around lines 145 - 146, Update the
error message created by errors.Newf that references the blockers variable so it
no longer only mentions unresolved host resources; instead use a generalized
phrase (e.g., "declared resource(s) failed validation or could not be resolved
on the target node") that covers both unresolved host resources and malformed
net-addr fields, and keep the existing guidance about correcting values or using
--skip-resource-validation. Locate the errors.Newf(...) call that builds the
blocker summary (uses the blockers variable) and replace the specific "do not
resolve on the target node" text with the broader wording.
pkg/commands/preflight_upgrade_verify.go (1)

112-112: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Hardcoded "90s window" reference inconsistent with configurable reconcile window.

Line 112's error message still hardcodes "beyond the 90s window", contradicting the configurable --post-upgrade-reconcile-window flag introduced in this PR. An operator passing --post-upgrade-reconcile-window=180s would see "beyond the 90s window" in the error, which is misleading.

🔧 Proposed fix to use dynamic reconcile window reference

Since this error is constructed within verifyPostUpgradeVersion, which doesn't have direct access to the configured window duration, consider either:

  1. Pass the reconcile window as a parameter to verifyPostUpgradeVersion and interpolate it into the error message, or
  2. Replace the hardcoded reference with generic wording like "beyond the configured reconcile window" to match the hint text pattern.

Option 2 is simpler:

-		errors.Newf("post-upgrade: requested upgrade to %s but running version is %s — either Talos auto-rolled back, or the node is still booting beyond the 90s window", target, running),
+		errors.Newf("post-upgrade: requested upgrade to %s but running version is %s — either Talos auto-rolled back, or the node is still booting beyond the configured reconcile window", target, running),
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/commands/preflight_upgrade_verify.go` at line 112, The error message in
verifyPostUpgradeVersion currently hardcodes "beyond the 90s window" which
conflicts with the new --post-upgrade-reconcile-window flag; update
verifyPostUpgradeVersion's error construction (the errors.Newf call that
mentions target and running) to either accept the reconcile window as an
additional parameter and interpolate its value, or (simpler) replace the
hardcoded phrase with a generic "beyond the configured reconcile window" so the
message reflects the configurable behavior; reference the
verifyPostUpgradeVersion function and the specific errors.Newf invocation when
making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/commands/preflight_apply_safety_redact.go`:
- Around line 29-57: The secretFieldPaths allowlist is missing WireGuard-related
secret fields, so add the WireGuard key paths to the secretFieldPaths slice (the
bracket-normalised forms used there) to ensure redaction; for example add
entries like "cluster.wireguard.privateKey", "cluster.wireguard.preSharedKey",
"cluster.wireguard.keys[].privateKey", and the machine-scoped equivalents such
as "machine.wireguard.privateKey" and "machine.wireguard.preSharedKey" to the
secretFieldPaths variable so WireGuard private/PSK values are redacted in diffs.

---

Outside diff comments:
In `@pkg/commands/preflight_apply_safety.go`:
- Around line 145-146: Update the error message created by errors.Newf that
references the blockers variable so it no longer only mentions unresolved host
resources; instead use a generalized phrase (e.g., "declared resource(s) failed
validation or could not be resolved on the target node") that covers both
unresolved host resources and malformed net-addr fields, and keep the existing
guidance about correcting values or using --skip-resource-validation. Locate the
errors.Newf(...) call that builds the blocker summary (uses the blockers
variable) and replace the specific "do not resolve on the target node" text with
the broader wording.

In `@pkg/commands/preflight_upgrade_verify.go`:
- Line 112: The error message in verifyPostUpgradeVersion currently hardcodes
"beyond the 90s window" which conflicts with the new
--post-upgrade-reconcile-window flag; update verifyPostUpgradeVersion's error
construction (the errors.Newf call that mentions target and running) to either
accept the reconcile window as an additional parameter and interpolate its
value, or (simpler) replace the hardcoded phrase with a generic "beyond the
configured reconcile window" so the message reflects the configurable behavior;
reference the verifyPostUpgradeVersion function and the specific errors.Newf
invocation when making the change.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: fa5b912e-f751-4b4a-8ea6-6315ad88b263

📥 Commits

Reviewing files that changed from the base of the PR and between 020d209 and a337835.

📒 Files selected for processing (12)
  • docs/manual-test-plan.md
  • pkg/applycheck/refs_netaddr.go
  • pkg/applycheck/refs_netaddr_test.go
  • pkg/commands/apply.go
  • pkg/commands/preflight_apply_safety.go
  • pkg/commands/preflight_apply_safety_redact.go
  • pkg/commands/preflight_apply_safety_redact_test.go
  • pkg/commands/preflight_apply_safety_test.go
  • pkg/commands/preflight_upgrade_verify.go
  • pkg/commands/talosctl_wrapper.go
  • pkg/commands/upgrade_handler.go
  • pkg/commands/upgrade_handler_test.go

Comment thread pkg/commands/preflight_apply_safety_redact.go
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces several enhancements to the talm tool, focusing on security, validation, and user experience. Key changes include the implementation of secret redaction in drift previews and post-apply verification (with an opt-in flag to show them), and the addition of syntactic validation for network address fields (IPs, CIDRs, and host:port endpoints) during pre-flight checks. It also adds a configurable post-upgrade reconcile window and improves multi-node logging by adding per-node prefixes to maintenance warnings. Feedback was provided regarding a potential security vulnerability where slice-based diffing might bypass redaction logic if the order of checks is not adjusted.

Comment thread pkg/commands/preflight_apply_safety.go Outdated
Comment on lines +330 to +332
if oldSlice, newSlice, ok := bothSlices(change); ok {
return formatSliceSetDiff(change.Path, oldSlice, newSlice)
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The bothSlices check currently takes precedence over the secret redaction logic. If a field path is marked as a secret in the allowlist but its value is represented as a slice (e.g., during a set-diff update), the formatSliceSetDiff function will be called, which is not redaction-aware. This could lead to sensitive data being leaked in the drift preview if a secret-bearing field is also a slice or if a secret is nested within a slice that is being diffed as a whole. Please ensure that isSecretPath is checked before bothSlices to guarantee redaction for all sensitive paths.

lexfrei added a commit that referenced this pull request May 12, 2026
#189+#190+#191+#192)

Address branch-review blockers on PR #200:

  - secretFieldPaths advertised Wireguard private/preshared keys
    (both in --show-secrets-in-drift help text and in the
    redact.go file comment) but contained zero Wireguard entries.
    Rotating WireguardConfig.privateKey via talm apply leaked the
    base64 key to stderr — the security-class bug the redaction
    feature exists to prevent. Add bare 'privateKey' +
    'peers[].presharedKey' paths to the allowlist (the differ's
    flatten step does not prefix multidoc fields with the doc
    kind — see pkg/applycheck/diff.go identityOf). Pinned by
    TestIsSecretPath_WireguardSecretPaths.
  - docs/apply-safety-gates-test-plan.md (the narrow gates
    checklist that manual-test-plan.md cites as canonical) was
    stale on the new behaviour: no rows for the Phase 1 net-addr
    walker (#192), the Phase 2A secret redaction (#189), the
    Phase 2A per-node prefix on the maintenance line (#191), or
    the Phase 2C --post-upgrade-reconcile-window flag (#190).
    Add rows in the existing table style for each.
  - formatSecretFieldValue godoc now disclaims the rotation
    signal for map-shaped values: Go's fmt.Sprintf('%v', m) on a
    map[string]any iterates keys in randomised order, so length
    is not a reliable rotation indicator for maps. Today's
    allowlist has no map entries; if a future addition does,
    either canonicalise the map (sorted keys before %v) or
    disclaim the rotation signal for that entry.

Signed-off-by: Aleksei Sviridkin <f@lex.la>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/commands/preflight_apply_safety.go`:
- Around line 116-123: The appended net-addr findings from
applycheck.WalkNetAddrFindings are expanding the Phase 1 blockers beyond
unresolved host resources, so update the Phase 1 blocker summary text (the code
that constructs the final error/message for blockers where 'findings' is used)
to use a generalized description instead of referencing only unresolved host
resources; locate where 'findings' is counted/converted into the Phase 1 error
message and replace the specific wording with a broader phrase like "Phase 1
blockers detected" or similar to accurately reflect mixed net-addr and
host-resource blockers.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 87b74f74-5c38-4b02-86ff-a23a35fde2fe

📥 Commits

Reviewing files that changed from the base of the PR and between a337835 and 13a8b4a.

📒 Files selected for processing (10)
  • README.md
  • docs/apply-safety-gates-test-plan.md
  • docs/manual-test-plan.md
  • pkg/commands/apply.go
  • pkg/commands/apply_test.go
  • pkg/commands/preflight_apply_safety.go
  • pkg/commands/preflight_apply_safety_redact.go
  • pkg/commands/preflight_apply_safety_redact_test.go
  • pkg/commands/upgrade_handler.go
  • pkg/commands/upgrade_handler_test.go
✅ Files skipped from review due to trivial changes (1)
  • docs/apply-safety-gates-test-plan.md
🚧 Files skipped from review as they are similar to previous changes (4)
  • pkg/commands/upgrade_handler.go
  • pkg/commands/preflight_apply_safety_redact.go
  • pkg/commands/preflight_apply_safety_redact_test.go
  • pkg/commands/upgrade_handler_test.go

Comment on lines +116 to +123

netAddrFindings, err := applycheck.WalkNetAddrFindings(rendered)
if err != nil {
return errors.Wrap(err, "pre-flight: walking rendered MachineConfig for net-addr fields")
}

findings = append(findings, netAddrFindings...)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Broaden the Phase 1 blocker summary wording to match new net-addr blockers.

After appending WalkNetAddrFindings, the blocker count is no longer limited to unresolved host resources. The final error text should be generalized to avoid misleading remediation.

Proposed fix
-	return errors.WithHint(
-		errors.Newf("pre-flight: %d declared host resource(s) do not resolve on the target node", blockers),
-		"correct the values referenced above, or pass --skip-resource-validation to bypass.",
-	)
+	return errors.WithHint(
+		errors.Newf("pre-flight: %d validation blocker(s) found in rendered MachineConfig", blockers),
+		"correct the blocked values referenced above (resource refs / net-addr fields), or pass --skip-resource-validation to bypass.",
+	)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/commands/preflight_apply_safety.go` around lines 116 - 123, The appended
net-addr findings from applycheck.WalkNetAddrFindings are expanding the Phase 1
blockers beyond unresolved host resources, so update the Phase 1 blocker summary
text (the code that constructs the final error/message for blockers where
'findings' is used) to use a generalized description instead of referencing only
unresolved host resources; locate where 'findings' is counted/converted into the
Phase 1 error message and replace the specific wording with a broader phrase
like "Phase 1 blockers detected" or similar to accurately reflect mixed net-addr
and host-resource blockers.

@lexfrei lexfrei force-pushed the feat/apply-safety-followups branch 2 times, most recently from 4a70764 to 6d7c654 Compare May 13, 2026 08:04
@lexfrei lexfrei self-assigned this May 13, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
pkg/commands/preflight_upgrade_verify_test.go (1)

129-152: ⚡ Quick win

Add a regression assertion for configured reconcile-window wording.

These tests pass the new duration argument, but they don’t assert the emitted mismatch text reflects that value (and no longer regresses to hardcoded 90s phrasing). A small assertion here would lock the new behavior.

♻️ Proposed test hardening
 func TestVerifyPostUpgradeVersion_MinorMismatch_Blocks(t *testing.T) {
@@
 	msg := err.Error() + " " + buf.String()
@@
 	if !strings.Contains(msg, "v1.13") {
 		t.Errorf("error should cite target version, got %q", msg)
 	}
+	if !strings.Contains(msg, "configured reconcile window (1ms)") {
+		t.Errorf("error should reference configured reconcile window value, got %q", msg)
+	}
+	if strings.Contains(msg, "90s window") {
+		t.Errorf("error should not regress to hardcoded 90s wording, got %q", msg)
+	}
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/commands/preflight_upgrade_verify_test.go` around lines 129 - 152, The
test TestVerifyPostUpgradeVersion_MinorMismatch_Blocks should also assert that
the emitted mismatch message reflects the provided reconcile-window duration
(the time.Millisecond argument) instead of any hardcoded "90s" text; update the
test to check msg (combined err.Error() + buf.String()) contains the expected
duration wording (e.g., "1ms" or the phrase used by verifyPostUpgradeVersion
like "reconcile-window") so the new behavior is locked in and the message no
longer regresses to "90s".
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/apply-safety-gates-test-plan.md`:
- Line 70: The Markdown table row at the "Real-schema pin" entry is malformed
(too few cells) causing MD056; update the row so it has three pipe-separated
cells matching the table's 3-column layout. Locate the row referencing
"Real-schema pin" and the test names
TestWalkNetAddrFindings_RealSchema_StaticHostConfig and
TestWalkNetAddrFindings_RealSchema_NetworkRuleConfig and ensure the line
contains exactly two pipe separators (three columns), e.g. split the explanatory
text into the appropriate second/third cells and keep the note that the walker
fires on what Talos emits in its own cell so the table renders aligned.

In `@docs/manual-test-plan.md`:
- Line 247: Replace the code span that currently contains the spaced text `node
: ` with a normalized `node:` to satisfy markdownlint MD038; update the example
line that shows `node 192.0.2.10: talm: drift verification unavailable on
maintenance connection` so any code spans use `node:` (no inner spaces) while
leaving the single-node case output `talm: drift verification unavailable on
maintenance connection` unchanged.

---

Nitpick comments:
In `@pkg/commands/preflight_upgrade_verify_test.go`:
- Around line 129-152: The test
TestVerifyPostUpgradeVersion_MinorMismatch_Blocks should also assert that the
emitted mismatch message reflects the provided reconcile-window duration (the
time.Millisecond argument) instead of any hardcoded "90s" text; update the test
to check msg (combined err.Error() + buf.String()) contains the expected
duration wording (e.g., "1ms" or the phrase used by verifyPostUpgradeVersion
like "reconcile-window") so the new behavior is locked in and the message no
longer regresses to "90s".
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 903aa8c6-19ec-4e11-9faf-3353b64fb5fd

📥 Commits

Reviewing files that changed from the base of the PR and between 13a8b4a and 6d7c654.

📒 Files selected for processing (16)
  • README.md
  • docs/apply-safety-gates-test-plan.md
  • docs/manual-test-plan.md
  • pkg/applycheck/refs_netaddr.go
  • pkg/applycheck/refs_netaddr_test.go
  • pkg/commands/apply.go
  • pkg/commands/apply_test.go
  • pkg/commands/preflight_apply_safety.go
  • pkg/commands/preflight_apply_safety_redact.go
  • pkg/commands/preflight_apply_safety_redact_test.go
  • pkg/commands/preflight_apply_safety_test.go
  • pkg/commands/preflight_upgrade_verify.go
  • pkg/commands/preflight_upgrade_verify_test.go
  • pkg/commands/talosctl_wrapper.go
  • pkg/commands/upgrade_handler.go
  • pkg/commands/upgrade_handler_test.go
🚧 Files skipped from review as they are similar to previous changes (8)
  • pkg/commands/talosctl_wrapper.go
  • pkg/commands/preflight_apply_safety_redact.go
  • pkg/commands/apply_test.go
  • pkg/commands/upgrade_handler.go
  • pkg/commands/apply.go
  • pkg/commands/upgrade_handler_test.go
  • pkg/commands/preflight_apply_safety_test.go
  • pkg/commands/preflight_apply_safety.go

Comment thread docs/apply-safety-gates-test-plan.md
Comment thread docs/manual-test-plan.md
-f nodes/node0.yaml
```

Expected (per node): `node 192.0.2.10: talm: drift verification unavailable on maintenance connection`. The single-node case (empty `nodeID`, the implicit path) MUST still emit the bare `talm: drift verification unavailable on maintenance connection` line — no `node : ` garbage prefix.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix code-span spacing to satisfy markdownlint MD038.

Line 247 has spaces inside a code span (`node : `), which triggers lint and is easy to normalize without changing meaning.

📝 Proposed doc fix
-Expected (per node): `node 192.0.2.10: talm: drift verification unavailable on maintenance connection`. The single-node case (empty `nodeID`, the implicit path) MUST still emit the bare `talm: drift verification unavailable on maintenance connection` line — no `node : ` garbage prefix.
+Expected (per node): `node 192.0.2.10: talm: drift verification unavailable on maintenance connection`. The single-node case (empty `nodeID`, the implicit path) MUST still emit the bare `talm: drift verification unavailable on maintenance connection` line — no `node :` garbage prefix.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
Expected (per node): `node 192.0.2.10: talm: drift verification unavailable on maintenance connection`. The single-node case (empty `nodeID`, the implicit path) MUST still emit the bare `talm: drift verification unavailable on maintenance connection` line — no `node : ` garbage prefix.
Expected (per node): `node 192.0.2.10: talm: drift verification unavailable on maintenance connection`. The single-node case (empty `nodeID`, the implicit path) MUST still emit the bare `talm: drift verification unavailable on maintenance connection` line — no `node :` garbage prefix.
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 247-247: Spaces inside code span elements

(MD038, no-space-in-code)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/manual-test-plan.md` at line 247, Replace the code span that currently
contains the spaced text `node : ` with a normalized `node:` to satisfy
markdownlint MD038; update the example line that shows `node 192.0.2.10: talm:
drift verification unavailable on maintenance connection` so any code spans use
`node:` (no inner spaces) while leaving the single-node case output `talm: drift
verification unavailable on maintenance connection` unchanged.

Four operator-UX polish changes on the safety surfaces introduced
by #173.

* #191 — Per-node prefix on the maintenance-connection warning.
  Multi-node insecure apply used to print identical bare
  'talm: drift verification unavailable on maintenance connection'
  lines for every node with no per-node correlation. Both the
  previewDrift and verifyAppliedState emission sites now wrap
  through nodePrefix(nodeID); single-node case (empty nodeID)
  keeps the bare line so the common UX is unchanged.

  cosiPreflightContext got a fallback path to GlobalArgs.Nodes[0]
  when no outgoing-context metadata is attached — the maintenance
  flow (openClientPerNodeMaintenance) pins the node in GlobalArgs
  but does not attach context metadata, so without the fallback
  the per-node prefix collapsed to empty even when --nodes was
  set explicitly. Fallback is single-element-only; multi-element
  GlobalArgs.Nodes returns empty so a broken caller surfaces as
  'no prefix' rather than 'wrong node prefix'.

* #190 — Configurable post-upgrade reconcile window.
  The hardcoded 90s wait after talosctl upgrade returns is now
  --post-upgrade-reconcile-window. Default
  defaultPostUpgradeReconcileWindow = 90 * time.Second preserves
  byte-identical back-compat. validatePostUpgradeReconcileWindow
  rejects non-positive values at the TOP of wrapUpgradeCommand
  RunE — fail-fast BEFORE any talosctl RPC fires, so an
  operator's '=0s' typo cannot land a partial upgrade. The
  version-mismatch hint copy and error body de-hardcode the '90s'
  literal to reference 'the configured reconcile window'
  (defaultPostUpgradeReconcileWindow flows through
  verifyPostUpgradeVersion).

* #192 — Syntactic net-addr walker for three v1alpha1 multidoc
  kinds. Validates fields the actual Talos schema emits:
    * StaticHostConfig.name (IP literal; the `name` field on
      this kind doubles as the IP — there is no separate
      `address` field).
    * NetworkRuleConfig.ingress[].subnet and .except (CIDR
      shapes; no top-level matchSourceAddress[] field exists
      in Talos's schema).
    * WireguardConfig.peers[].endpoint (host:port; empty /
      absent endpoint is a listener-only peer and NOT a
      finding).
  Walker runs in parallel with the Ref-based walker via
  multidocNetAddrHandlers; dispatch-table disjointness with
  multidocHandlers is pinned by a unit test. Integration tests
  verify Phase 1 blocks on bad and passes on valid.

* #189 — Drift preview redacts secret-bearing field values by
  default. secretFieldPaths allowlist with bracket-normalisation
  regex; secret check fires BEFORE bothSlices in
  formatFieldChangeLine so slice-shaped allowlist entries
  (cluster.acceptedCAs, machine.acceptedCAs, peers) never leak
  via formatSliceSetDiff. Length-disclosing redaction sentinel
  (***redacted (len=N)***) preserves rotation signal without
  leaking the value. Non-string secret values route through
  redactValue(fmt.Sprintf("%v", value)) so int / bool rotations
  still surface as different-length sentinels. Map iteration
  order non-determinism is disclaimed in the godoc (no
  map-shaped allowlist entry today).

  New flag --show-secrets-in-drift (default off) plumbed through
  previewDrift / verifyAppliedState / printDriftPreview /
  formatFieldChangeLine. apply.go callers pass
  applyCmdFlags.showSecretsInDrift.

  absentFieldValue const hoisted so formatFieldValue and
  formatSecretFieldValue stay byte-identical on the absent path
  — add/remove vs rotate stays distinguishable.

Doc updates:
  * README.md bullet #1 (Phase 1) now mentions the net-addr
    walker; bullet #2 (Phase 2A) mentions redaction +
    --show-secrets-in-drift + the per-node prefix; bullet #4
    (Phase 2C) says 'the configured reconcile window' instead
    of hardcoding 90s.
  * docs/manual-test-plan.md gains sections C5/C6/C7 (redact,
    flag opt-out, walker), D3 (per-node prefix), E3 (reconcile
    window with help-text + 0s rejection), M6 (false-positive
    guard for redaction), M7 (walker boundary cases). Forward-
    looking 'do X, expect Y' shape with explicit regression
    anchors.
  * docs/apply-safety-gates-test-plan.md gets matching rows in
    Phase 1 (net-addr walker), Phase 2A (redaction +
    per-node prefix), and Phase 2C (reconcile window) tables.

Test coverage:
  * Walker: per-kind table tests with the real schema shape,
    plus TestWalkNetAddrFindings_RealSchema_StaticHostConfig
    and ..._NetworkRuleConfig as schema-anchor regression pins
    so a future drift back to non-existent fields fails these
    tests rather than passing trivially.
  * Reconcile window: validator boundary cases, flag default
    one-(default)-clause pin, fail-fast ordering pin via a
    sentinel originalRunE, hint-text no-hardcoded-90s pin, a
    README-no-hardcoded-90s pin so README and code stay in
    sync.
  * Per-node prefix: maintenance message on both emission
    sites, empty-nodeID bare-line guard, single-vs-multi
    GlobalArgs.Nodes fallback pair.
  * Redaction: exact-match, Wireguard paths, slice-rotation
    no-leak (CA-list and peers shapes), bracket-normalisation,
    false-prefix guard, non-secret control, opt-in flag,
    non-string redaction with length-signal, absent-side
    add-vs-rotate distinction.

Closes #189, closes #190, closes #191, closes #192.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
@lexfrei lexfrei force-pushed the feat/apply-safety-followups branch from 6d7c654 to b68ec42 Compare May 13, 2026 08:23
@lexfrei lexfrei merged commit d3c602c into main May 13, 2026
8 checks passed
@lexfrei lexfrei deleted the feat/apply-safety-followups branch May 13, 2026 09:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

1 participant