feat(apply,upgrade): apply-safety follow-ups (#189 + #190 + #191 + #192) by lexfrei · Pull Request #200 · cozystack/talm

lexfrei · 2026-05-12T22:37:55Z

What

Four follow-ups filed during the review of PR #173 (apply-safety gates, merged 2026-05-13), now bundled into one PR because they touch the same surfaces (pkg/applycheck, pkg/commands/preflight_apply_safety.go, pkg/commands/upgrade_handler.go).

Issue	Scope
#191	Per-node prefix on the `drift verification unavailable on maintenance connection` warning so multi-node insecure apply disambiguates per node.
#190	Promote the hardcoded 90s post-upgrade reconcile window to a tunable `--post-upgrade-reconcile-window` flag (default 90s). Neutralize the version-mismatch hint copy so operators with a custom window see consistent advice.
#192	New syntactic walker for `StaticHostConfig.address`, `NetworkRuleConfig.matchSourceAddress[]`, `WireguardConfig.peers[].endpoint`. Phase 1 now blocks malformed net-addr fields before the apply RPC.
#189	Drift preview redacts secret-bearing field values by default (`cluster.token`, `cluster.ca.key`, Wireguard keys, etc.). Operator opt-out via `--show-secrets-in-drift`. Length-preserving sentinel keeps rotation signal visible.

Why

All four are operator-UX polish on the safety surfaces #173 introduced. None change apply semantics:

talm apply: redact secret values in drift preview output #189: secrets leaking into stderr / CI logs / screen-shares is a real-world incident class; redact-by-default with explicit opt-out.
talm upgrade: make post-upgrade reconcile window configurable via a flag #190: 90s window is too tight for slow hardware (cold OCI / large extensions / slow registries); operators currently see misleading "rollback" verdicts. The hint itself acknowledges the problem.
talm apply: maintenance-connection warning lacks per-node prefix on multi-node insecure #191: bare talm: ... lines on multi-node insecure apply provide no per-node correlation; the drift / divergence headers already disambiguate, the maintenance line lagged behind.
talm apply: extend declared-resource walker to StaticHostConfig / NetworkRuleConfig / WireguardConfig references #192: typo'd Wireguard endpoint or StaticHostConfig.address was silent until apply-time (or worse, runtime); walker catches before the RPC.

Surface

pkg/commands/preflight_apply_safety.go (talm apply: maintenance-connection warning lacks per-node prefix on multi-node insecure #191 emission sites; talm apply: redact secret values in drift preview output #189 redaction-aware formatter; net-addr walker integration in Phase 1).
pkg/commands/preflight_apply_safety_redact.go + _test.go (NEW) — talm apply: redact secret values in drift preview output #189 secret-path allowlist + matcher.
pkg/applycheck/refs_netaddr.go + _test.go (NEW) — talm apply: extend declared-resource walker to StaticHostConfig / NetworkRuleConfig / WireguardConfig references #192 syntactic net-addr walker.
pkg/commands/upgrade_handler.go + preflight_upgrade_verify.go — talm upgrade: make post-upgrade reconcile window configurable via a flag #190 flag plumbing + hint copy.
pkg/commands/apply.go — --show-secrets-in-drift flag registration; showSecretsInDrift plumbed through previewDrift/verifyAppliedState/printDriftPreview/formatFieldChangeLine.
docs/manual-test-plan.md — Sections C5/C6/C7, D3, E3, M6/M7 with forward-looking regression anchors.

Behaviour-pin tests

TestPreviewDrift_MaintenanceMessage_CarriesNodePrefix / _EmptyNodeIDPreservesBareLine (talm apply: maintenance-connection warning lacks per-node prefix on multi-node insecure #191).
TestUpgradeFlag_PostUpgradeReconcileWindow_Registered + TestValidatePostUpgradeReconcileWindow_{Zero,Negative,Positive} + TestPostUpgradeVersionMismatchHint_NoHardcoded90s (talm upgrade: make post-upgrade reconcile window configurable via a flag #190).
TestWalkNetAddrFindings_{StaticHostConfig,NetworkRuleConfig,WireguardConfig} table tests + TestMultidocNetAddrHandlers_NoOverlapWithRefHandlers + TestPreflightValidateResources_NetAddrFinding_{Blocks,ValidPasses} (talm apply: extend declared-resource walker to StaticHostConfig / NetworkRuleConfig / WireguardConfig references #192).
TestIsSecretPath_{ExactMatch,BracketNormalisedPrefix,NoFalseMatchOnPrefix,NonSecretPath_NoMatch} + TestRedactValue_PreservesLength + TestFormatFieldChangeLine_{RedactsSecretByDefault,ShowsSecretsWhenFlagSet,NonSecretPathsUnchanged,AcceptedCAsArrayIndex_Redacted} (talm apply: redact secret values in drift preview output #189).

Verification

Local CI gate (host):

go build ./... clean
go test ./... -race all packages green
golangci-lint run ./... 0 issues
GOOS=windows go vet ./... clean

dev17 smokes pending (#190 flag + --help, #192 hand-crafted bad chart, #189 secret rotation diff). #191 covered by unit suite end-to-end via stub reader; live multi-node maintenance state is impractical to fake on the standing test cluster.

Closes #189, closes #190, closes #191, closes #192.

Summary by CodeRabbit

New Features
- Optional flag to show secret values in drift previews.
- Configurable post-upgrade reconcile/verify window (default 90s) with validation.
Bug Fixes
- Stronger net-address validation (IPs, CIDRs, host:port) that blocks malformed entries.
- Default secret redaction with deterministic redaction sentinel; opt-in disclosure when requested.
- Maintenance/insecure output prefixes multi-node lines; single-node output unchanged.
- WireGuard peers with empty/omitted endpoints treated as listener-only.
Documentation
- Updated README and test plans for safety gates, redaction, and upgrade timing.
Tests
- Expanded coverage for validation, redaction, output behavior, and upgrade timing.

coderabbitai · 2026-05-12T22:38:07Z

Warning

Rate limit exceeded

@lexfrei has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 40 minutes and 31 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 76388ca2-a402-45cc-8296-cfde413dc3f1

📥 Commits

Reviewing files that changed from the base of the PR and between 6d7c654 and b68ec42.

📒 Files selected for processing (16)

README.md
docs/apply-safety-gates-test-plan.md
docs/manual-test-plan.md
pkg/applycheck/refs_netaddr.go
pkg/applycheck/refs_netaddr_test.go
pkg/commands/apply.go
pkg/commands/apply_test.go
pkg/commands/preflight_apply_safety.go
pkg/commands/preflight_apply_safety_redact.go
pkg/commands/preflight_apply_safety_redact_test.go
pkg/commands/preflight_apply_safety_test.go
pkg/commands/preflight_upgrade_verify.go
pkg/commands/preflight_upgrade_verify_test.go
pkg/commands/talosctl_wrapper.go
pkg/commands/upgrade_handler.go
pkg/commands/upgrade_handler_test.go

📝 Walkthrough

Walkthrough

This PR adds a Phase‑1 net‑addr syntactic walker and tests, appends its findings into preflight, introduces an allowlist-based secret-path redaction (default redaction with a --show-secrets-in-drift opt‑in), threads the flag through drift/verify rendering, and makes the post‑upgrade reconcile window configurable with validation. Tests and docs/manual plans updated.

Changes

Multi-phase validation and drift-preview enhancements

Layer / File(s)	Summary
Net-addr validation walker `pkg/applycheck/refs_netaddr.go`, `pkg/applycheck/refs_netaddr_test.go`	New public `WalkNetAddrFindings` validates `StaticHostConfig.name` (IP literal), `NetworkRuleConfig.ingress[].subnet
Phase 1 preflight integration `pkg/commands/preflight_apply_safety.go`	`preflightValidateResources` appends `WalkNetAddrFindings` results to the main findings slice before blocker handling.
Secret redaction utilities `pkg/commands/preflight_apply_safety_redact.go`, `pkg/commands/preflight_apply_safety_redact_test.go`	Adds `secretFieldPaths` allowlist, index-normalizing regex, `isSecretPath` (normalizes numeric indices like `[3]`→`[]`), and `redactValue` sentinel (`*redacted (len=...)*`). Tests validate exact/normalized matching, bracket/index normalization, non-redaction of non-secret paths, false-positive prevention, non-string secret handling, absent-side formatting, and rotation/slice regressions to prevent leaks.
Phase 2 drift-preview and verify secret redaction `pkg/commands/apply.go`, `pkg/commands/preflight_apply_safety.go`	Registers `--show-secrets-in-drift` and threads a `showSecrets` boolean through `previewDrift`, `verifyAppliedState`, `printDriftPreview`, and `formatFieldChangeLine`. Introduces `formatSecretFieldValue` and `absentFieldValue`, and conditionally redacts allowlisted secret paths when `showSecrets` is false. Insecure/maintenance output prints a single `talm:` line prefixed by `nodePrefix(nodeID)` when applicable.
Apply CLI flag and preflight context `pkg/commands/apply.go`, `pkg/commands/apply_test.go`	`apply` gains `--show-secrets-in-drift`; apply code passes the flag into preview/verify calls. `cosiPreflightContext` now falls back to `GlobalArgs.Nodes[0]` only when no outgoing metadata and exactly one node target; tests cover single-node fallback and multi-node no-fallback.
Test call-site updates and new assertions `pkg/commands/preflight_apply_safety_test.go`, multiple test files	Existing tests updated to pass `showSecrets` (`false`) where needed. New integration tests assert maintenance-connection warning prefixing (per-node `node <id>:` vs bare `talm:`) and Wireguard endpoint Phase‑1 validation behavior.
Configurable post-upgrade reconcile window `pkg/commands/upgrade_handler.go`, `pkg/commands/upgrade_handler_test.go`, `pkg/commands/preflight_upgrade_verify.go`, `pkg/commands/talosctl_wrapper.go`	Adds `--post-upgrade-reconcile-window` flag with `defaultPostUpgradeReconcileWindow` (90s) and `validatePostUpgradeReconcileWindow` (rejects non‑positive durations). Validation is performed early in the upgrade command and at verify start; the validated duration is passed into the inner verifier. Operator hints/docs updated to reference the configured window and tests assert registration, validation, fail-fast behavior, and absence of hardcoded “90s”.
Docs and manual test plan expansion `README.md`, `docs/apply-safety-gates-test-plan.md`, `docs/manual-test-plan.md`	Documentation expanded to describe Phase‑1 net‑addr checks, Phase‑2 redaction behavior (including array-index normalization and `--show-secrets-in-drift`), maintenance/insecure messaging differences (single vs multi-node), configurable reconcile window behavior and validation, and manual/regression test cases.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

IvanHunters
myasnikovdaniil

🐰 I hopped through YAML, fields tidy and neat,
IPs and endpoints checked before they meet.
Secrets masked by default, show them if you please,
Upgrade windows stretch with flags and gentle ease.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title accurately summarizes the main change as apply-safety follow-ups bundling four related issues (`#189`, `#190`, `#191`, `#192`) across apply and upgrade functionality.
Docstring Coverage	✅ Passed	Docstring coverage is 93.02% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/apply-safety-followups

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

Generate code and open pull requests
Plan features and break down work
Investigate incidents and troubleshoot customer tickets together
Automate recurring tasks and respond to alerts with triggers
Summarize progress and report instantly

Built for teams:

Shared memory across your entire org—no repeating context
Per-thread sandboxes to safely plan and execute work
Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

pkg/commands/preflight_apply_safety.go (1)
145-146: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Generalize the blocker summary now that net-addr blockers are included.

Line 145 currently reports only unresolved host resources, but blockers can now also be malformed net-addr fields. This summary is now misleading.
✏️ Suggested patch
-		errors.Newf("pre-flight: %d declared host resource(s) do not resolve on the target node", blockers),
+		errors.Newf("pre-flight: %d blocking finding(s) detected in rendered MachineConfig", blockers),
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/commands/preflight_apply_safety.go` around lines 145 - 146, Update the
error message created by errors.Newf that references the blockers variable so it
no longer only mentions unresolved host resources; instead use a generalized
phrase (e.g., "declared resource(s) failed validation or could not be resolved
on the target node") that covers both unresolved host resources and malformed
net-addr fields, and keep the existing guidance about correcting values or using
--skip-resource-validation. Locate the errors.Newf(...) call that builds the
blocker summary (uses the blockers variable) and replace the specific "do not
resolve on the target node" text with the broader wording.
pkg/commands/preflight_upgrade_verify.go (1)
112-112: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Hardcoded "90s window" reference inconsistent with configurable reconcile window.

Line 112's error message still hardcodes "beyond the 90s window", contradicting the configurable --post-upgrade-reconcile-window flag introduced in this PR. An operator passing --post-upgrade-reconcile-window=180s would see "beyond the 90s window" in the error, which is misleading.
🔧 Proposed fix to use dynamic reconcile window reference

Since this error is constructed within verifyPostUpgradeVersion, which doesn't have direct access to the configured window duration, consider either:

Pass the reconcile window as a parameter to verifyPostUpgradeVersion and interpolate it into the error message, or

Replace the hardcoded reference with generic wording like "beyond the configured reconcile window" to match the hint text pattern.

Option 2 is simpler:
-		errors.Newf("post-upgrade: requested upgrade to %s but running version is %s — either Talos auto-rolled back, or the node is still booting beyond the 90s window", target, running),
+		errors.Newf("post-upgrade: requested upgrade to %s but running version is %s — either Talos auto-rolled back, or the node is still booting beyond the configured reconcile window", target, running),
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/commands/preflight_upgrade_verify.go` at line 112, The error message in
verifyPostUpgradeVersion currently hardcodes "beyond the 90s window" which
conflicts with the new --post-upgrade-reconcile-window flag; update
verifyPostUpgradeVersion's error construction (the errors.Newf call that
mentions target and running) to either accept the reconcile window as an
additional parameter and interpolate its value, or (simpler) replace the
hardcoded phrase with a generic "beyond the configured reconcile window" so the
message reflects the configurable behavior; reference the
verifyPostUpgradeVersion function and the specific errors.Newf invocation when
making the change.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/commands/preflight_apply_safety_redact.go`:
- Around line 29-57: The secretFieldPaths allowlist is missing WireGuard-related
secret fields, so add the WireGuard key paths to the secretFieldPaths slice (the
bracket-normalised forms used there) to ensure redaction; for example add
entries like "cluster.wireguard.privateKey", "cluster.wireguard.preSharedKey",
"cluster.wireguard.keys[].privateKey", and the machine-scoped equivalents such
as "machine.wireguard.privateKey" and "machine.wireguard.preSharedKey" to the
secretFieldPaths variable so WireGuard private/PSK values are redacted in diffs.

---

Outside diff comments:
In `@pkg/commands/preflight_apply_safety.go`:
- Around line 145-146: Update the error message created by errors.Newf that
references the blockers variable so it no longer only mentions unresolved host
resources; instead use a generalized phrase (e.g., "declared resource(s) failed
validation or could not be resolved on the target node") that covers both
unresolved host resources and malformed net-addr fields, and keep the existing
guidance about correcting values or using --skip-resource-validation. Locate the
errors.Newf(...) call that builds the blocker summary (uses the blockers
variable) and replace the specific "do not resolve on the target node" text with
the broader wording.

In `@pkg/commands/preflight_upgrade_verify.go`:
- Line 112: The error message in verifyPostUpgradeVersion currently hardcodes
"beyond the 90s window" which conflicts with the new
--post-upgrade-reconcile-window flag; update verifyPostUpgradeVersion's error
construction (the errors.Newf call that mentions target and running) to either
accept the reconcile window as an additional parameter and interpolate its
value, or (simpler) replace the hardcoded phrase with a generic "beyond the
configured reconcile window" so the message reflects the configurable behavior;
reference the verifyPostUpgradeVersion function and the specific errors.Newf
invocation when making the change.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: fa5b912e-f751-4b4a-8ea6-6315ad88b263

📥 Commits

Reviewing files that changed from the base of the PR and between 020d209 and a337835.

📒 Files selected for processing (12)

docs/manual-test-plan.md
pkg/applycheck/refs_netaddr.go
pkg/applycheck/refs_netaddr_test.go
pkg/commands/apply.go
pkg/commands/preflight_apply_safety.go
pkg/commands/preflight_apply_safety_redact.go
pkg/commands/preflight_apply_safety_redact_test.go
pkg/commands/preflight_apply_safety_test.go
pkg/commands/preflight_upgrade_verify.go
pkg/commands/talosctl_wrapper.go
pkg/commands/upgrade_handler.go
pkg/commands/upgrade_handler_test.go

gemini-code-assist

Code Review

This pull request introduces several enhancements to the talm tool, focusing on security, validation, and user experience. Key changes include the implementation of secret redaction in drift previews and post-apply verification (with an opt-in flag to show them), and the addition of syntactic validation for network address fields (IPs, CIDRs, and host:port endpoints) during pre-flight checks. It also adds a configurable post-upgrade reconcile window and improves multi-node logging by adding per-node prefixes to maintenance warnings. Feedback was provided regarding a potential security vulnerability where slice-based diffing might bypass redaction logic if the order of checks is not adjusted.

gemini-code-assist · 2026-05-12T22:43:23Z

+	if oldSlice, newSlice, ok := bothSlices(change); ok {
+		return formatSliceSetDiff(change.Path, oldSlice, newSlice)
+	}


The bothSlices check currently takes precedence over the secret redaction logic. If a field path is marked as a secret in the allowlist but its value is represented as a slice (e.g., during a set-diff update), the formatSliceSetDiff function will be called, which is not redaction-aware. This could lead to sensitive data being leaked in the drift preview if a secret-bearing field is also a slice or if a secret is nested within a slice that is being diffed as a whole. Please ensure that isSecretPath is checked before bothSlices to guarantee redaction for all sensitive paths.

#189+#190+#191+#192) Address branch-review blockers on PR #200: - secretFieldPaths advertised Wireguard private/preshared keys (both in --show-secrets-in-drift help text and in the redact.go file comment) but contained zero Wireguard entries. Rotating WireguardConfig.privateKey via talm apply leaked the base64 key to stderr — the security-class bug the redaction feature exists to prevent. Add bare 'privateKey' + 'peers[].presharedKey' paths to the allowlist (the differ's flatten step does not prefix multidoc fields with the doc kind — see pkg/applycheck/diff.go identityOf). Pinned by TestIsSecretPath_WireguardSecretPaths. - docs/apply-safety-gates-test-plan.md (the narrow gates checklist that manual-test-plan.md cites as canonical) was stale on the new behaviour: no rows for the Phase 1 net-addr walker (#192), the Phase 2A secret redaction (#189), the Phase 2A per-node prefix on the maintenance line (#191), or the Phase 2C --post-upgrade-reconcile-window flag (#190). Add rows in the existing table style for each. - formatSecretFieldValue godoc now disclaims the rotation signal for map-shaped values: Go's fmt.Sprintf('%v', m) on a map[string]any iterates keys in randomised order, so length is not a reliable rotation indicator for maps. Today's allowlist has no map entries; if a future addition does, either canonicalise the map (sorted keys before %v) or disclaim the rotation signal for that entry. Signed-off-by: Aleksei Sviridkin <f@lex.la>

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/commands/preflight_apply_safety.go`:
- Around line 116-123: The appended net-addr findings from
applycheck.WalkNetAddrFindings are expanding the Phase 1 blockers beyond
unresolved host resources, so update the Phase 1 blocker summary text (the code
that constructs the final error/message for blockers where 'findings' is used)
to use a generalized description instead of referencing only unresolved host
resources; locate where 'findings' is counted/converted into the Phase 1 error
message and replace the specific wording with a broader phrase like "Phase 1
blockers detected" or similar to accurately reflect mixed net-addr and
host-resource blockers.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 87b74f74-5c38-4b02-86ff-a23a35fde2fe

📥 Commits

Reviewing files that changed from the base of the PR and between a337835 and 13a8b4a.

📒 Files selected for processing (10)

README.md
docs/apply-safety-gates-test-plan.md
docs/manual-test-plan.md
pkg/commands/apply.go
pkg/commands/apply_test.go
pkg/commands/preflight_apply_safety.go
pkg/commands/preflight_apply_safety_redact.go
pkg/commands/preflight_apply_safety_redact_test.go
pkg/commands/upgrade_handler.go
pkg/commands/upgrade_handler_test.go

✅ Files skipped from review due to trivial changes (1)

docs/apply-safety-gates-test-plan.md

🚧 Files skipped from review as they are similar to previous changes (4)

pkg/commands/upgrade_handler.go
pkg/commands/preflight_apply_safety_redact.go
pkg/commands/preflight_apply_safety_redact_test.go
pkg/commands/upgrade_handler_test.go

coderabbitai · 2026-05-12T23:53:03Z

+
+	netAddrFindings, err := applycheck.WalkNetAddrFindings(rendered)
+	if err != nil {
+		return errors.Wrap(err, "pre-flight: walking rendered MachineConfig for net-addr fields")
+	}
+
+	findings = append(findings, netAddrFindings...)
+


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Broaden the Phase 1 blocker summary wording to match new net-addr blockers.

After appending WalkNetAddrFindings, the blocker count is no longer limited to unresolved host resources. The final error text should be generalized to avoid misleading remediation.

Proposed fix

- return errors.WithHint( - errors.Newf("pre-flight: %d declared host resource(s) do not resolve on the target node", blockers), - "correct the values referenced above, or pass --skip-resource-validation to bypass.", - ) + return errors.WithHint( + errors.Newf("pre-flight: %d validation blocker(s) found in rendered MachineConfig", blockers), + "correct the blocked values referenced above (resource refs / net-addr fields), or pass --skip-resource-validation to bypass.", + )

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/commands/preflight_apply_safety.go` around lines 116 - 123, The appended net-addr findings from applycheck.WalkNetAddrFindings are expanding the Phase 1 blockers beyond unresolved host resources, so update the Phase 1 blocker summary text (the code that constructs the final error/message for blockers where 'findings' is used) to use a generalized description instead of referencing only unresolved host resources; locate where 'findings' is counted/converted into the Phase 1 error message and replace the specific wording with a broader phrase like "Phase 1 blockers detected" or similar to accurately reflect mixed net-addr and host-resource blockers.

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

pkg/commands/preflight_upgrade_verify_test.go (1)

129-152: ⚡ Quick win

Add a regression assertion for configured reconcile-window wording.

These tests pass the new duration argument, but they don’t assert the emitted mismatch text reflects that value (and no longer regresses to hardcoded 90s phrasing). A small assertion here would lock the new behavior.

♻️ Proposed test hardening

 func TestVerifyPostUpgradeVersion_MinorMismatch_Blocks(t *testing.T) {
@@
 	msg := err.Error() + " " + buf.String()
@@
 	if !strings.Contains(msg, "v1.13") {
 		t.Errorf("error should cite target version, got %q", msg)
 	}
+	if !strings.Contains(msg, "configured reconcile window (1ms)") {
+		t.Errorf("error should reference configured reconcile window value, got %q", msg)
+	}
+	if strings.Contains(msg, "90s window") {
+		t.Errorf("error should not regress to hardcoded 90s wording, got %q", msg)
+	}
 }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/commands/preflight_upgrade_verify_test.go` around lines 129 - 152, The
test TestVerifyPostUpgradeVersion_MinorMismatch_Blocks should also assert that
the emitted mismatch message reflects the provided reconcile-window duration
(the time.Millisecond argument) instead of any hardcoded "90s" text; update the
test to check msg (combined err.Error() + buf.String()) contains the expected
duration wording (e.g., "1ms" or the phrase used by verifyPostUpgradeVersion
like "reconcile-window") so the new behavior is locked in and the message no
longer regresses to "90s".

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/apply-safety-gates-test-plan.md`:
- Line 70: The Markdown table row at the "Real-schema pin" entry is malformed
(too few cells) causing MD056; update the row so it has three pipe-separated
cells matching the table's 3-column layout. Locate the row referencing
"Real-schema pin" and the test names
TestWalkNetAddrFindings_RealSchema_StaticHostConfig and
TestWalkNetAddrFindings_RealSchema_NetworkRuleConfig and ensure the line
contains exactly two pipe separators (three columns), e.g. split the explanatory
text into the appropriate second/third cells and keep the note that the walker
fires on what Talos emits in its own cell so the table renders aligned.

In `@docs/manual-test-plan.md`:
- Line 247: Replace the code span that currently contains the spaced text `node
: ` with a normalized `node:` to satisfy markdownlint MD038; update the example
line that shows `node 192.0.2.10: talm: drift verification unavailable on
maintenance connection` so any code spans use `node:` (no inner spaces) while
leaving the single-node case output `talm: drift verification unavailable on
maintenance connection` unchanged.

---

Nitpick comments:
In `@pkg/commands/preflight_upgrade_verify_test.go`:
- Around line 129-152: The test
TestVerifyPostUpgradeVersion_MinorMismatch_Blocks should also assert that the
emitted mismatch message reflects the provided reconcile-window duration (the
time.Millisecond argument) instead of any hardcoded "90s" text; update the test
to check msg (combined err.Error() + buf.String()) contains the expected
duration wording (e.g., "1ms" or the phrase used by verifyPostUpgradeVersion
like "reconcile-window") so the new behavior is locked in and the message no
longer regresses to "90s".

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 903aa8c6-19ec-4e11-9faf-3353b64fb5fd

📥 Commits

Reviewing files that changed from the base of the PR and between 13a8b4a and 6d7c654.

📒 Files selected for processing (16)

README.md
docs/apply-safety-gates-test-plan.md
docs/manual-test-plan.md
pkg/applycheck/refs_netaddr.go
pkg/applycheck/refs_netaddr_test.go
pkg/commands/apply.go
pkg/commands/apply_test.go
pkg/commands/preflight_apply_safety.go
pkg/commands/preflight_apply_safety_redact.go
pkg/commands/preflight_apply_safety_redact_test.go
pkg/commands/preflight_apply_safety_test.go
pkg/commands/preflight_upgrade_verify.go
pkg/commands/preflight_upgrade_verify_test.go
pkg/commands/talosctl_wrapper.go
pkg/commands/upgrade_handler.go
pkg/commands/upgrade_handler_test.go

🚧 Files skipped from review as they are similar to previous changes (8)

pkg/commands/talosctl_wrapper.go
pkg/commands/preflight_apply_safety_redact.go
pkg/commands/apply_test.go
pkg/commands/upgrade_handler.go
pkg/commands/apply.go
pkg/commands/upgrade_handler_test.go
pkg/commands/preflight_apply_safety_test.go
pkg/commands/preflight_apply_safety.go

coderabbitai · 2026-05-13T08:11:57Z

+  -f nodes/node0.yaml
+```
+
+Expected (per node): `node 192.0.2.10: talm: drift verification unavailable on maintenance connection`. The single-node case (empty `nodeID`, the implicit path) MUST still emit the bare `talm: drift verification unavailable on maintenance connection` line — no `node : ` garbage prefix.


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix code-span spacing to satisfy markdownlint MD038.

Line 247 has spaces inside a code span (`node : `), which triggers lint and is easy to normalize without changing meaning.

📝 Proposed doc fix

-Expected (per node): `node 192.0.2.10: talm: drift verification unavailable on maintenance connection`. The single-node case (empty `nodeID`, the implicit path) MUST still emit the bare `talm: drift verification unavailable on maintenance connection` line — no `node : ` garbage prefix. +Expected (per node): `node 192.0.2.10: talm: drift verification unavailable on maintenance connection`. The single-node case (empty `nodeID`, the implicit path) MUST still emit the bare `talm: drift verification unavailable on maintenance connection` line — no `node :` garbage prefix.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

Expected (per node): `node 192.0.2.10: talm: drift verification unavailable on maintenance connection`. The single-node case (empty `nodeID`, the implicit path) MUST still emit the bare `talm: drift verification unavailable on maintenance connection` line — no `node : ` garbage prefix.

Expected (per node): `node 192.0.2.10: talm: drift verification unavailable on maintenance connection`. The single-node case (empty `nodeID`, the implicit path) MUST still emit the bare `talm: drift verification unavailable on maintenance connection` line — no `node :` garbage prefix.

🧰 Tools

🪛 markdownlint-cli2 (0.22.1)

[warning] 247-247: Spaces inside code span elements

(MD038, no-space-in-code)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/manual-test-plan.md` at line 247, Replace the code span that currently contains the spaced text `node : ` with a normalized `node:` to satisfy markdownlint MD038; update the example line that shows `node 192.0.2.10: talm: drift verification unavailable on maintenance connection` so any code spans use `node:` (no inner spaces) while leaving the single-node case output `talm: drift verification unavailable on maintenance connection` unchanged.

Four operator-UX polish changes on the safety surfaces introduced by #173. * #191 — Per-node prefix on the maintenance-connection warning. Multi-node insecure apply used to print identical bare 'talm: drift verification unavailable on maintenance connection' lines for every node with no per-node correlation. Both the previewDrift and verifyAppliedState emission sites now wrap through nodePrefix(nodeID); single-node case (empty nodeID) keeps the bare line so the common UX is unchanged. cosiPreflightContext got a fallback path to GlobalArgs.Nodes[0] when no outgoing-context metadata is attached — the maintenance flow (openClientPerNodeMaintenance) pins the node in GlobalArgs but does not attach context metadata, so without the fallback the per-node prefix collapsed to empty even when --nodes was set explicitly. Fallback is single-element-only; multi-element GlobalArgs.Nodes returns empty so a broken caller surfaces as 'no prefix' rather than 'wrong node prefix'. * #190 — Configurable post-upgrade reconcile window. The hardcoded 90s wait after talosctl upgrade returns is now --post-upgrade-reconcile-window. Default defaultPostUpgradeReconcileWindow = 90 * time.Second preserves byte-identical back-compat. validatePostUpgradeReconcileWindow rejects non-positive values at the TOP of wrapUpgradeCommand RunE — fail-fast BEFORE any talosctl RPC fires, so an operator's '=0s' typo cannot land a partial upgrade. The version-mismatch hint copy and error body de-hardcode the '90s' literal to reference 'the configured reconcile window' (defaultPostUpgradeReconcileWindow flows through verifyPostUpgradeVersion). * #192 — Syntactic net-addr walker for three v1alpha1 multidoc kinds. Validates fields the actual Talos schema emits: * StaticHostConfig.name (IP literal; the `name` field on this kind doubles as the IP — there is no separate `address` field). * NetworkRuleConfig.ingress[].subnet and .except (CIDR shapes; no top-level matchSourceAddress[] field exists in Talos's schema). * WireguardConfig.peers[].endpoint (host:port; empty / absent endpoint is a listener-only peer and NOT a finding). Walker runs in parallel with the Ref-based walker via multidocNetAddrHandlers; dispatch-table disjointness with multidocHandlers is pinned by a unit test. Integration tests verify Phase 1 blocks on bad and passes on valid. * #189 — Drift preview redacts secret-bearing field values by default. secretFieldPaths allowlist with bracket-normalisation regex; secret check fires BEFORE bothSlices in formatFieldChangeLine so slice-shaped allowlist entries (cluster.acceptedCAs, machine.acceptedCAs, peers) never leak via formatSliceSetDiff. Length-disclosing redaction sentinel (***redacted (len=N)***) preserves rotation signal without leaking the value. Non-string secret values route through redactValue(fmt.Sprintf("%v", value)) so int / bool rotations still surface as different-length sentinels. Map iteration order non-determinism is disclaimed in the godoc (no map-shaped allowlist entry today). New flag --show-secrets-in-drift (default off) plumbed through previewDrift / verifyAppliedState / printDriftPreview / formatFieldChangeLine. apply.go callers pass applyCmdFlags.showSecretsInDrift. absentFieldValue const hoisted so formatFieldValue and formatSecretFieldValue stay byte-identical on the absent path — add/remove vs rotate stays distinguishable. Doc updates: * README.md bullet #1 (Phase 1) now mentions the net-addr walker; bullet #2 (Phase 2A) mentions redaction + --show-secrets-in-drift + the per-node prefix; bullet #4 (Phase 2C) says 'the configured reconcile window' instead of hardcoding 90s. * docs/manual-test-plan.md gains sections C5/C6/C7 (redact, flag opt-out, walker), D3 (per-node prefix), E3 (reconcile window with help-text + 0s rejection), M6 (false-positive guard for redaction), M7 (walker boundary cases). Forward- looking 'do X, expect Y' shape with explicit regression anchors. * docs/apply-safety-gates-test-plan.md gets matching rows in Phase 1 (net-addr walker), Phase 2A (redaction + per-node prefix), and Phase 2C (reconcile window) tables. Test coverage: * Walker: per-kind table tests with the real schema shape, plus TestWalkNetAddrFindings_RealSchema_StaticHostConfig and ..._NetworkRuleConfig as schema-anchor regression pins so a future drift back to non-existent fields fails these tests rather than passing trivially. * Reconcile window: validator boundary cases, flag default one-(default)-clause pin, fail-fast ordering pin via a sentinel originalRunE, hint-text no-hardcoded-90s pin, a README-no-hardcoded-90s pin so README and code stay in sync. * Per-node prefix: maintenance message on both emission sites, empty-nodeID bare-line guard, single-vs-multi GlobalArgs.Nodes fallback pair. * Redaction: exact-match, Wireguard paths, slice-rotation no-leak (CA-list and peers shapes), bracket-normalisation, false-prefix guard, non-secret control, opt-in flag, non-string redaction with length-signal, absent-side add-vs-rotate distinction. Closes #189, closes #190, closes #191, closes #192. Assisted-By: Claude <noreply@anthropic.com> Signed-off-by: Aleksei Sviridkin <f@lex.la>

coderabbitai Bot reviewed May 12, 2026

View reviewed changes

Comment thread pkg/commands/preflight_apply_safety_redact.go

gemini-code-assist Bot reviewed May 12, 2026

View reviewed changes

coderabbitai Bot reviewed May 12, 2026

View reviewed changes

lexfrei force-pushed the feat/apply-safety-followups branch 2 times, most recently from 4a70764 to 6d7c654 Compare May 13, 2026 08:04

lexfrei self-assigned this May 13, 2026

coderabbitai Bot reviewed May 13, 2026

View reviewed changes

lexfrei force-pushed the feat/apply-safety-followups branch from 6d7c654 to b68ec42 Compare May 13, 2026 08:23

lexfrei merged commit d3c602c into main May 13, 2026
8 checks passed

lexfrei deleted the feat/apply-safety-followups branch May 13, 2026 09:11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(apply,upgrade): apply-safety follow-ups (#189 + #190 + #191 + #192)#200

feat(apply,upgrade): apply-safety follow-ups (#189 + #190 + #191 + #192)#200
lexfrei merged 1 commit into
mainfrom
feat/apply-safety-followups

lexfrei commented May 12, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 12, 2026 •

edited

Loading

Rate limit exceeded

Walkthrough

Changes

Estimated code review effort

Suggested reviewers

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot May 12, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot May 12, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

coderabbitai Bot May 13, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

	Expected (per node): `node 192.0.2.10: talm: drift verification unavailable on maintenance connection`. The single-node case (empty `nodeID`, the implicit path) MUST still emit the bare `talm: drift verification unavailable on maintenance connection` line — no `node : ` garbage prefix.
	Expected (per node): `node 192.0.2.10: talm: drift verification unavailable on maintenance connection`. The single-node case (empty `nodeID`, the implicit path) MUST still emit the bare `talm: drift verification unavailable on maintenance connection` line — no `node :` garbage prefix.

Conversation

lexfrei commented May 12, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

Why

Surface

Behaviour-pin tests

Verification

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Walkthrough

Changes

Estimated code review effort

Suggested reviewers

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 12, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 12, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot May 13, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

lexfrei commented May 12, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 12, 2026 •

edited

Loading