Skip to content

Refactor backup and restore orchestration pipeline#203

Closed
tis24dev wants to merge 35 commits into
mainfrom
dev
Closed

Refactor backup and restore orchestration pipeline#203
tis24dev wants to merge 35 commits into
mainfrom
dev

Conversation

@tis24dev
Copy link
Copy Markdown
Owner

@tis24dev tis24dev commented May 5, 2026

Summary by CodeRabbit

  • New Features

    • Dual PVE+PBS host backups, expanded interactive restore (now 16 phases), Pushover webhook support, improved restore/archive flows, and an update-check indicator.
  • Improvements

    • Safer external command execution and sanitization, stronger cloud config validation, quoted‑printable email encoding, network-aware feature fallbacks, richer runtime/notification/run summaries, and more robust, phased backup/restore orchestration.
  • Documentation

    • Major docs expanded: restore guides/diagrams, collector architecture, examples, templates, and developer docs.
  • Tests

    • Numerous new and updated unit/integration tests across backup, restore, safeexec, notification, and orchestrator areas.

tis24dev and others added 15 commits April 19, 2026 14:58
…196)

Bumps the actions-updates group with 2 updates in the / directory: [codecov/codecov-action](https://github.com/codecov/codecov-action) and [dependabot/fetch-metadata](https://github.com/dependabot/fetch-metadata).


Updates `codecov/codecov-action` from 5 to 6
- [Release notes](https://github.com/codecov/codecov-action/releases)
- [Changelog](https://github.com/codecov/codecov-action/blob/main/CHANGELOG.md)
- [Commits](codecov/codecov-action@v5...v6)

Updates `dependabot/fetch-metadata` from 2 to 3
- [Release notes](https://github.com/dependabot/fetch-metadata/releases)
- [Commits](dependabot/fetch-metadata@v2...v3)

---
updated-dependencies:
- dependency-name: codecov/codecov-action
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: actions-updates
- dependency-name: dependabot/fetch-metadata
  dependency-version: '3'
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: actions-updates
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Add native Pushover support as a webhook format

Pushover requires the application token and user/group key in the JSON
body (not in headers), and its API rejects proxsave's existing generic
payload shape. Until now users had no way to wire Pushover up without
running an external relay.

This adds `pushover` as a first-class WEBHOOK_*_FORMAT alongside
discord/slack/teams/generic. It reuses the existing AUTH_TOKEN /
AUTH_USER fields for Pushover credentials (AUTH_TYPE stays "none"),
introduces an optional WEBHOOK_<NAME>_PRIORITY (-2..1, default 0;
emergency priority 2 deferred), and produces a Pushover-shaped JSON
body with rune-aware truncation to the API's 250-char title and
1024-char message limits.

Validation: missing token/user is reported at payload-build time;
out-of-range priority is rejected by NewWebhookNotifier so misconfig
fails fast at startup rather than per-send.

* Address CodeRabbit review on PR #199

- docs/EXAMPLES.md: replace real-looking Pushover token/user values with
  placeholders so secret scanners stop flagging the example, and update
  the Example 7 scenario header and expected-results bullets so the
  narrative matches the Discord + Pushover configuration block.

- internal/notify/webhook.go: skip the debug payload preview/content
  log when the resolved format is "pushover". The Pushover payload puts
  token + user in the JSON body (not headers), so the existing preview
  would write both credentials into debug logs.

* Validate Pushover method and default-format resolution

Fail fast for invalid Pushover webhook configuration.

- resolve effective webhook format and method centrally
- validate Pushover PRIORITY after default format resolution
- require POST for Pushover endpoints at init time
- add regression tests for default-format and invalid method cases

---------

Co-authored-by: Damiano <71268257+tis24dev@users.noreply.github.com>
Remove t.Parallel() from orchestrator tests to avoid unsafe concurrency around package-level test dependencies such as restoreFS, restoreCmd, and restoreTime.

Fix the decrypt TUI E2E test harness by taking synchronized snapshots of the tcell simulation screen instead of reading screen contents concurrently with rendering.

Verified with:
- go test ./internal/orchestrator -count=1
- go test -race ./internal/orchestrator -count=1
- go vet ./internal/orchestrator
Replace direct exec.CommandContext usages with a new internal/safeexec package and introduce a CommandSpec type for validated command invocation. Collector APIs now accept CommandSpec (with validation and stringification) and use safe execution helpers; many PBS-related collector calls and tests were updated accordingly. Archiver and other components were updated to propagate command creation errors, improving error handling and security when spawning external processes. Added internal/safeexec implementation and tests.
Introduce backgroundRollbackCallKey helper and update numerous rollback tests to use it; add TestRunBackgroundRollbackTimer_UsesPositionalArgsForScriptPath. Add ValidateRemoteRelativePath unit test. Extend internal/safeexec to recognize "false" and "sysctl" commands and add a nosemgrep comment for TrustedCommandContext. Remove unused os/exec import in cmd/proxsave/upgrade.go. Add .github/instructions/codacy.instructions.md with Codacy MCP Server rules and guidelines.
Split the large backup startup/orchestration code into focused cmd/proxsave modules: backup_mode, backup_execution, backup_storage and backup_notifications, and updated main.go to call runBackupMode. Centralized orchestration flow (pre-checks, storage init, notifications, RunGoBackup invocation, stats persistence and exit/status logging) and improved error handling for backup runs. Improved collector validation in internal/backup: added validateExcludePatterns, combined validation checks (ensure at least one collection option, CLI timeouts, pxar concurrency, max PVE backup size and absolute system root prefix) and normalization steps. Also added/updated orchestrator helpers and tests to support the refactor.
Introduce BrickID doc and helper constructors (brick, collectorBrick, pbsCommandBrick, systemCommandBrick, pbsInventoryBrick) to standardize collection-brick creation, and split the large collector_bricks implementation into multiple focused files under internal/backup (common, pbs, pbs_features, pbs_inventory, pbs_runtime, pve, pve_finalize, pve_jobs, pve_storage, system). Remove the in-file recipe and brick definitions from collector_bricks.go and update tests (collector_bricks_test.go) to match the new layout. This reorganizes the code for better separation of concerns and easier maintenance.
Split and reorganize the large monolithic cmd/proxsave/main.go into a modular run pipeline. Main now delegates to staged helper functions (startMainRun, setupRunContext, preparePreRuntimeArgs, prepareRuntime, runRuntime) and most runtime logic and helpers were moved into new files under cmd/proxsave/ (main_config_modes.go, main_defers.go, main_footer.go, main_identity.go, main_lifecycle.go, main_modes.go, main_modes_test.go, main_network.go, main_restore_decrypt.go, main_runtime.go, main_runtime_log.go, main_security.go, main_signals.go, main_state.go, main_support.go, main_update.go). This reduces imports and responsibilities in main.go, improves readability and testability, and adds a focused unit test (main_modes_test.go). No behavioral changes intended—code was relocated and organized for clearer control flow and easier maintenance.
Refactor the large restore.go by extracting archive, extraction, decompression, service/ZFS handling and cluster-apply logic into dedicated files (restore_archive.go, restore_archive_entries.go, restore_archive_extract.go, restore_archive_paths.go, restore_cluster_apply.go, restore_decompression.go, restore_services.go, restore_zfs.go and related additions). Clean up imports and unused globals in restore.go, add ErrRestoreAborted and a doc comment for RunRestoreWorkflow, and update tests to use the new restoreArchiveOptions API for archive extraction. This improves modularity and maintainability of the restore code.
Use quoted-printable encoding for text and HTML parts to ensure 7-bit-safe emails and avoid 8bit transfer encoding. Add encodeQuotedPrintableBody helper and required imports, update Content-Transfer-Encoding headers in buildEmailMessage for both plain/text and html parts, and add a test (TestEmailNotifierBuildEmailMessage_EncodesUTF8BodiesAsSevenBitSafe) that verifies quoted-printable is used and no raw non-ASCII bytes remain in the message.
… the security-patches group across 1 directory (#200)

deps(deps): bump github.com/gdamore/tcell/v2

Bumps the security-patches group with 1 update in the / directory: [github.com/gdamore/tcell/v2](https://github.com/gdamore/tcell).


Updates `github.com/gdamore/tcell/v2` from 2.13.8 to 2.13.9
- [Release notes](https://github.com/gdamore/tcell/releases)
- [Changelog](https://github.com/gdamore/tcell/blob/main/CHANGESv3.md)
- [Commits](gdamore/tcell@v2.13.8...v2.13.9)

---
updated-dependencies:
- dependency-name: github.com/gdamore/tcell/v2
  dependency-version: 2.13.9
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: security-patches
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Comment thread internal/orchestrator/restore_archive_extract.go Fixed
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 5, 2026

Dependency Review

The following issues were found:
  • ✅ 0 vulnerable package(s)
  • ✅ 0 package(s) with incompatible licenses
  • ✅ 0 package(s) with invalid SPDX license definitions
  • ⚠️ 1 package(s) with unknown licenses.
See the Details below.

License Issues

go.mod

PackageVersionLicenseIssue Type
github.com/gdamore/tcell/v22.13.9NullUnknown License
Allowed Licenses: MIT, Apache-2.0, BSD-2-Clause, BSD-3-Clause, ISC, LicenseRef-scancode-google-patent-license-golang

OpenSSF Scorecard

PackageVersionScoreDetails
actions/actions/setup-node 49933ea5288caeca8642d1e84afbd3f7d6820020 🟢 5.8
Details
CheckScoreReason
Maintained🟢 79 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 7
Code-Review🟢 10all changesets reviewed
Binary-Artifacts🟢 9binaries present in source code
Dangerous-Workflow🟢 10no dangerous workflow patterns detected
CII-Best-Practices⚠️ 0no effort to earn an OpenSSF best practices badge detected
Packaging⚠️ -1packaging workflow not detected
Token-Permissions⚠️ 0detected GitHub workflow tokens with excessive permissions
Pinned-Dependencies⚠️ 0dependency not pinned by hash detected -- score normalized to 0
Fuzzing⚠️ 0project is not fuzzed
License🟢 10license file detected
Signed-Releases⚠️ -1no releases found
Security-Policy🟢 9security policy file detected
Branch-Protection⚠️ 1branch protection is not maximal on development and all release branches
SAST🟢 9SAST tool is not run on all commits -- score normalized to 9
actions/dependabot/fetch-metadata 25dd0e34f4fe68f24cc83900b1fe3fe149efef98 🟢 7.7
Details
CheckScoreReason
Code-Review🟢 10all changesets reviewed
Maintained🟢 1030 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 10
Packaging⚠️ -1packaging workflow not detected
Binary-Artifacts🟢 10no binaries found in the repo
Dangerous-Workflow🟢 10no dangerous workflow patterns detected
CII-Best-Practices⚠️ 0no effort to earn an OpenSSF best practices badge detected
Pinned-Dependencies🟢 3dependency not pinned by hash detected -- score normalized to 3
Token-Permissions🟢 9detected GitHub workflow tokens with excessive permissions
Fuzzing⚠️ 0project is not fuzzed
License🟢 10license file detected
Signed-Releases⚠️ -1no releases found
Branch-Protection🟢 5branch protection is not maximal on development and all release branches
Security-Policy🟢 10security policy file detected
SAST🟢 8SAST tool is not run on all commits -- score normalized to 8
gomod/github.com/gdamore/tcell/v2 2.13.9 UnknownUnknown

Scanned Files

  • .github/workflows/dependabot-automerge.yml
  • go.mod

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 5, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Large refactor adding a guarded process-execution layer (internal/safeexec), structured command specs, recipe/brick-based backup collector, staged orchestrator backup phases, a secure restore extraction pipeline, extensive CLI lifecycle wiring, storage/notification initialization, many tests, docs, and small CI/dependency updates.

Proxsave runtime, orchestrator & collector (single cohesive DAG)

Layer / File(s) Summary
Data Shape / Small API
internal/backup/collector.go, internal/config/config.go, internal/config/migration.go, internal/orchestrator/deps.go, cmd/proxsave/main_state.go
Adds CommandSpec type and methods; adds WebhookEndpoint.Priority; cloud-remote/path validators; extends FS/Deps with Lchown/UtimesNano; adds runtime state types (appRuntime, appRunState, modeResult).
Safe exec primitives
internal/safeexec/safeexec.go, internal/safeexec/safeexec_test.go
New safeexec package: CommandContext, TrustedCommandContext, CombinedOutput/Output, validators (ValidateRcloneRemoteName, ValidateRemoteRelativePath, ValidateTrustedExecutablePath, ProcPath) and tests; introduces ErrCommandNotAllowed.
Command wiring / migration
internal/*, cmd/proxsave/*, internal/identity/*, internal/pbs/*, internal/notify/*, cmd/proxsave/install.go
Replaces direct os/exec usage with safeexec (or TrustedCommandContext) and updates Archiver/command-dep signatures to return construction errors; call sites handle command-construction failures.
Collector: recipes & bricks
internal/backup/collector_bricks_*.go, internal/backup/collector_bricks.go, internal/backup/collector.go
Introduces recipe/brick abstraction and many PVE/PBS/system/common bricks; refactors file-copy helpers; standardizes command execution through CommandSpec; adds command-run classification and unified capture helpers.
Orchestrator: backup phases & helpers
internal/orchestrator/backup_run_phases.go, internal/orchestrator/backup_run_helpers.go, internal/orchestrator/orchestrator.go
Implements phased backup workflow (workspace, collect, create archive, verify, bundle, dispatch), backup helper utilities, phase-scoped BackupError mapping, metrics export, and workspace lifecycle.
Restore: extraction & safety
internal/orchestrator/restore_archive*.go, internal/orchestrator/restore_decompression.go, internal/orchestrator/restore_archive_paths.go, internal/orchestrator/restore_archive_entries.go
New restore extraction pipeline: decompressor selection, secure path sanitization, skip rules, typed TAR-entry extraction (dir/file/symlink/hardlink), per-entry logs, and summarized extraction reports.
Runtime & CLI lifecycle
cmd/proxsave/main.go, cmd/proxsave/main_lifecycle.go, cmd/proxsave/main_runtime.go, cmd/proxsave/main_modes.go, cmd/proxsave/main_config_modes.go, cmd/proxsave/main_defers.go, cmd/proxsave/main_footer.go, cmd/proxsave/main_runtime_log.go, cmd/proxsave/main_signals.go, cmd/proxsave/main_identity.go, cmd/proxsave/main_security.go, cmd/proxsave/main_update.go, cmd/proxsave/main_state.go
Refactors run() into a staged lifecycle (start, prepare, runtime, run, finish); adds signal-aware context setup, deferred-run handlers, identity detection, security preflight, update check, mode dispatch, and richer final-summary/footer output.
Backup-mode wiring: storage & notifications
cmd/proxsave/backup_mode.go, cmd/proxsave/backup_execution.go, cmd/proxsave/backup_storage.go, cmd/proxsave/backup_notifications.go
Implements backup-mode orchestration: orchestrator init/config, pre-backup checks, primary/secondary/cloud storage initialization and registration, notifier initialization (email/Telegram/Gotify/webhooks), runtime summary and backup statistics logging.
Notifications, email & pushover
internal/notify/webhook.go, internal/notify/webhook_payloads.go, internal/notify/webhook_test.go, internal/notify/email.go, internal/notify/email_delivery_methods_test.go
Adds webhook format/method normalization, Pushover payload builder and validation (auth + priority + truncation), changes buildPayload signature to accept endpoint, improves payload logging; email bodies encoded quoted-printable and mail-tool calls use safeexec; tests added/updated.
Config, templates & docs
internal/config/config.go, internal/config/migration.go, internal/config/templates/backup.env, docs/*, README.md, .github/instructions/codacy.instructions.md
Cloud remote/path validation added; webhook/pushover config support in templates; extensive docs updates for restore (dual-role model), collector architecture, examples, troubleshooting, developer guide; new Codacy MCP instructions file.
Tests & fixtures
many *_test.go under internal/, cmd/proxsave/
Widespread test migration to CommandSpec, new helpers for bundle/fixture creation, many new tests (safeexec, webhook/pushover, backup/run/orchestrator/restore), and removal of numerous t.Parallel() markers to serialize tests.
CI & deps
.github/workflows/codecov.yml, .github/workflows/dependabot-automerge.yml, go.mod
Bumped Codecov action to v6; Dependabot metadata action to v3 and added Node.js 24 setup step; bumped github.com/gdamore/tcell/v2 to v2.13.9.

Sequence Diagram(s)

sequenceDiagram
    participant CLI as proxsave CLI
    participant Lifecycle as Lifecycle Manager
    participant Runtime as Runtime Bootstrap
    participant Orchestrator as Orchestrator
    participant Collector as Collector (recipes/bricks)
    participant Archiver as Archiver
    participant Storage as Storage Targets

    CLI->>Lifecycle: startMainRun()
    Lifecycle->>Runtime: prepareRuntime(args, config)
    Runtime->>Orchestrator: initializeOrchestrator(cfg, identity)
    Orchestrator->>Collector: runBackupCollector(recipe)
    Collector->>Collector: execute bricks (validate, snapshot, runtime, manifest)
    Collector-->>Orchestrator: collectionStats
    Orchestrator->>Archiver: createBackupArchive(collection)
    Archiver-->>Orchestrator: archivePath, stats
    Orchestrator->>Storage: dispatchPostBackup(archivePath, manifest)
    Storage-->>Orchestrator: dispatch results
    Orchestrator-->>Runtime: final stats/exitCode
    Runtime->>Lifecycle: runDeferredActions()
    Lifecycle->>CLI: os.Exit(exitCode)
Loading
sequenceDiagram
    participant RestoreCLI as proxsave restore
    participant Fetch as Fetch/Decrypt
    participant Extractor as extractArchiveNative
    participant Decompress as Decompression Layer
    participant EntryProc as Entry Processor
    participant Services as Service Manager

    RestoreCLI->>Fetch: select/decrypt candidate
    Fetch-->>RestoreCLI: prepared archive
    RestoreCLI->>Extractor: extractArchiveNative(opts)
    Extractor->>Decompress: createDecompressionReader(file)
    Decompress->>Extractor: reader stream
    Extractor->>EntryProc: processRestoreArchiveEntries(tar reader)
    EntryProc->>EntryProc: sanitize paths, skip rules
    EntryProc->>Services: request stop services (if PBS/PVE)
    Services-->>EntryProc: services stopped/started
    EntryProc-->>Extractor: extraction summary
    Extractor-->>RestoreCLI: final status/logs
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

"A rabbit hopped through code and bricks,
Commands now vetted with careful ticks,
Archives wrapped and restores made neat,
Pushover pings and emails sweet,
Hooray — the pipeline hums complete."

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch dev

@codacy-production
Copy link
Copy Markdown

codacy-production Bot commented May 5, 2026

Not up to standards ⛔

🔴 Issues 1 critical · 22 medium

Alerts:
⚠ 23 issues (≤ 0 issues of at least minor severity)

Results:
23 new issues

Category Results
Security 1 critical
Complexity 22 medium

View in Codacy

🟢 Metrics 854 complexity · -86 duplication

Metric Results
Complexity 854
Duplication -86

View in Codacy

AI Reviewer: first review requested successfully. AI can make mistakes. Always validate suggestions.

Run reviewer

TIP This summary will be updated as you push new changes.

Reject empty or absolute hardlink Linkname values and normalize them with filepath.FromSlash. Resolve the hardlink target via resolvePathWithinRootFS and use the resolved path for creation to prevent links escaping the extraction root. Add tests: recordingLinkFS to capture Link() args, TestExtractHardlink_UsesResolvedTargetPath to ensure the resolved target is used, and TestExtractHardlink_RejectsSymlinkEscapeTarget to verify symlink-escape targets are rejected.
Copy link
Copy Markdown

@codacy-production codacy-production Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR provides a significant refactor of the backup and restore pipeline but is currently not up to standards due to high-severity security findings and logic errors. Key concerns involve the orchestration logic in internal/backup/collector.go, where certain valid backup sources (custom paths and script repositories) are ignored during validation, potentially breaking existing user workflows.

From a security perspective, the introduction of safeexec is a positive step, but the CI/CD pipeline itself is vulnerable because third-party GitHub Actions are not pinned to specific commit SHAs. Additionally, several core orchestration files, including restore_cluster_apply.go and restore_archive.go, have extremely high cyclomatic complexity (CCN > 70) and are currently flagged as uncovered by tests, posing a high risk for regressions in disaster recovery paths.

About this PR

  • The PR is exceptionally large, encompassing significant refactoring alongside new features and security changes. This increases the risk of regressions in the application's boot sequence and error handling path.

Test suggestions

  • Verify safeexec.CommandContext allowlist correctly permits known binaries and rejects unauthorized paths/commands.
  • Verify Pushover payload construction handles truncation (title 250, message 1024) and priority passthrough.
  • Verify CLI mode compatibility rules (e.g., rejecting --support with --decrypt).
  • Verify that all backup recipes (PVE, PBS, Dual, System) contain unique and complete brick definitions.
  • Verify background rollback timer correctly handles positional arguments to prevent shell injection.
  • Verify email message construction encodes UTF-8 bodies as 7-bit safe quoted-printable strings.
  • Missing unit tests for complex cluster restoration logic in internal/orchestrator/restore_cluster_apply.go.
  • Missing coverage for archive extraction logic in internal/orchestrator/restore_archive.go.
  • Missing tests for estimated backup size and run helper logic in internal/orchestrator/backup_run_helpers.go.
Prompt proposal for missing tests
Consider implementing these tests if applicable:
1. Missing unit tests for complex cluster restoration logic in `internal/orchestrator/restore_cluster_apply.go`.
2. Missing coverage for archive extraction logic in `internal/orchestrator/restore_archive.go`.
3. Missing tests for estimated backup size and run helper logic in `internal/orchestrator/backup_run_helpers.go`.

TIP Improve review quality by adding custom instructions
TIP How was this review? Give us feedback


- name: Upload coverage reports to Codecov
uses: codecov/codecov-action@v5
uses: codecov/codecov-action@v6
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 HIGH RISK

Pin all third-party GitHub Actions to a full-length commit SHA. Using tags like @v6 is convenient but leaves the CI pipeline vulnerable to supply chain risks.

Try running the following prompt in your IDE agent:

Scan the GitHub workflow files and find the latest commit SHAs for codecov/codecov-action@v6 and dependabot/fetch-metadata@v3. Update the workflow files to use these SHAs for pinning.

See Issue in Codacy

Comment on lines +299 to +306
func (c *CollectorConfig) hasCollectionOptionEnabled() bool {
for _, enabled := range c.collectionOptionFlags() {
if enabled {
return true
}
}
return false
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 MEDIUM RISK

The validation logic for enabled collection options omits CustomBackupPaths and BackupScriptRepository. Users who exclusively back up custom paths or the script repository will be blocked by this check.

Try running the following prompt in your coding agent:

Update hasCollectionOptionEnabled in internal/backup/collector.go to also return true if len(c.CustomBackupPaths) > 0 or if c.BackupScriptRepository is enabled.

}

// NewWebhookNotifier creates a new webhook notifier
func NewWebhookNotifier(webhookConfig *config.WebhookConfig, logger *logging.Logger) (*WebhookNotifier, error) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 MEDIUM RISK

Suggestion: Extract the per-endpoint validation logic into a dedicated helper function to simplify the WebhookNotifier initialization and improve modularity for future notification providers.

Try running the following prompt in your IDE agent:

Refactor NewWebhookNotifier in internal/notify/webhook.go by moving the loop that validates individual endpoints into a separate function named validateEndpoints.

See Issue in Codacy

Comment thread internal/backup/collector.go Outdated
}

func (c *Collector) safeCmdOutput(ctx context.Context, cmd, output, description string, critical bool) error {
func (c *Collector) safeCmdOutput(ctx context.Context, spec CommandSpec, output, description string, critical bool) error {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 MEDIUM RISK

Suggestion: Extract the privilege-sensitive error detection and classification logic into a shared private method. This logic is currently duplicated in safeCmdOutput, safeCmdOutputWithPBSAuth, safeCmdOutputWithPBSAuthForDatastore, and captureCommandOutput. Consolidation will significantly reduce the cyclomatic complexity.

Try running the following prompt in your IDE agent:

In internal/backup/collector.go, identify the repetitive code block used to detect and classify privilege-sensitive errors. Extract this into a helper method called handleCommandError.

See Issue in Codacy

// CommandContext creates commands only for binaries that are intentionally
// allowed by the application. Keep exec.CommandContext calls in the switch so
// static analyzers can see literal command names.
func CommandContext(ctx context.Context, name string, args ...string) (*exec.Cmd, error) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚪ LOW RISK

Suggestion: The high cyclomatic complexity is inherent to the security allowlist. Split the switch block into several smaller helper functions grouped by domain (e.g., handleSystemCommands, handleProxmoxCommands) to satisfy linter requirements without losing visibility.

See Issue in Codacy

Comment on lines +154 to +160
func estimatedBackupSizeGB(bytesCollected int64) float64 {
estimatedSizeGB := float64(bytesCollected) / (1024.0 * 1024.0 * 1024.0)
if estimatedSizeGB < 0.001 {
return 0.001
}
return estimatedSizeGB
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚪ LOW RISK

Nitpick: Use established constants (like bytesPerMegabyte) or bit-shifting for byte-to-GB conversions instead of magic numbers to maintain consistency and clarity.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 14

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
internal/notify/email.go (1)

708-717: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reuse the resolved binary path instead of re-looking it up at exec time.

These branches call exec.LookPath and then still execute systemctl / mailq by bare name. If PATH changes between the check and the exec, the validated binary is not necessarily the one that runs, and you also miss TrustedCommandContext's absolute-path checks.

Suggested fix
 func (e *EmailNotifier) isMTAServiceActive(ctx context.Context) (bool, string) {
 	services := []string{"postfix", "sendmail", "exim4"}

-	if _, err := exec.LookPath("systemctl"); err != nil {
+	systemctlPath, err := exec.LookPath("systemctl")
+	if err != nil {
 		return false, "systemctl not available"
 	}

 	for _, service := range services {
-		cmd, err := safeexec.CommandContext(ctx, "systemctl", "is-active", service)
+		cmd, err := commandForMailTool(ctx, systemctlPath, "is-active", service)
 		if err != nil {
 			return false, err.Error()
 		}
 func (e *EmailNotifier) checkMailQueue(ctx context.Context) (int, error) {
 	// Try mailq command (works for both Postfix and Sendmail)
-	mailqPath := "/usr/bin/mailq"
-	if _, err := exec.LookPath("mailq"); err != nil {
-		if _, err := exec.LookPath(mailqPath); err != nil {
+	mailqPath, err := exec.LookPath("mailq")
+	if err != nil {
+		mailqPath = "/usr/bin/mailq"
+		if _, err := exec.LookPath(mailqPath); err != nil {
 			return 0, fmt.Errorf("mailq command not found")
 		}
-	} else {
-		mailqPath = "mailq"
 	}

Apply the same pattern in detectQueueEntry.

Also applies to: 776-790, 824-835

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/notify/email.go` around lines 708 - 717, The code calls
exec.LookPath("systemctl") / exec.LookPath("mailq") but still invokes
safeexec.CommandContext with the bare command name, which can run a different
binary; change the code to capture the result of exec.LookPath (e.g. store as
systemctlPath or mailqPath) and pass that absolute path into
safeexec.CommandContext(ctx, systemctlPath, "is-active", service) (and similarly
for mailq invocations and inside detectQueueEntry), so the validated absolute
path is executed and TrustedCommandContext checks apply.
internal/backup/collector_pve.go (2)

783-813: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Return per-node command failures instead of always succeeding.

Both helpers drop the error from captureCommandOutput(...) inside the loop, so they can return nil even when every node query failed. That means the new job/replication abort flags never trip and validated command failures get silently masked. Capture the first error (or aggregate them) and return it after best-effort collection.

Also applies to: 958-988

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/backup/collector_pve.go` around lines 783 - 813,
collectPVEBackupJobHistory currently ignores errors returned by
c.captureCommandOutput so the function can return nil even when per-node
commands fail; change the loop to capture and record the first (or aggregate)
non-nil error from each call to c.captureCommandOutput (e.g. var firstErr error;
after each call do if err := c.captureCommandOutput(...); err != nil && firstErr
== nil { firstErr = err }) and after the loop return firstErr instead of nil;
apply the same pattern to the analogous function around the 958-988 region that
also calls c.captureCommandOutput so command failures are propagated.

882-897: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Propagate schedule command errors back to the brick layer.

collectPVEScheduleCrontab and collectPVEScheduleTimers currently ignore captureCommandOutput(...) failures and always return nil. In the new brick flow, scheduleCollectionAborted only flips when these helpers return an error, so broken crontab/systemctl invocations are reported as success.

Also applies to: 899-913

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/backup/collector_pve.go` around lines 882 - 897,
collectPVEScheduleCrontab currently ignores the error returned by
c.captureCommandOutput and always returns nil; update it to capture and
propagate the error (e.g., assign err := c.captureCommandOutput(...); if err !=
nil return fmt.Errorf("collectPVEScheduleCrontab: %w", err)) so failures bubble
to the brick layer; apply the same fix to collectPVEScheduleTimers (check and
return the error from c.captureCommandOutput there as well) and reference the
captureCommandOutput function when locating the call sites.
internal/orchestrator/restore.go (1)

29-39: ⚠️ Potential issue | 🔴 Critical

Add ClearRestoreAbortInfo() call at the start of each restore workflow to prevent abort state leakage between runs.

The global lastRestoreAbortInfo is assigned during restore abort handling (restore_workflow_ui.go:703) and read at exit (main_defers.go:25), but ClearRestoreAbortInfo() is never called anywhere in the codebase. This leaves abort info from a previous interrupted restore visible to the next run, causing stale rollback details to be displayed.

Call ClearRestoreAbortInfo() at the beginning of runRestoreWorkflowWithUI() (internal/orchestrator/restore_workflow_ui.go:49) to ensure each workflow starts with clean state.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/orchestrator/restore.go` around lines 29 - 39, Stale abort state is
left in the global lastRestoreAbortInfo because ClearRestoreAbortInfo() is never
invoked; fix this by calling ClearRestoreAbortInfo() at the start of the restore
flow inside runRestoreWorkflowWithUI() so each run resets lastRestoreAbortInfo
(type RestoreAbortInfo) before any restore logic executes—open the
runRestoreWorkflowWithUI() function and insert a ClearRestoreAbortInfo() call as
the first statement.
🟡 Minor comments (11)
.github/instructions/codacy.instructions.md-1-5 (1)

1-5: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix the YAML front matter structure.

The front matter has a duplicate delimiter on line 5 that creates an empty second YAML document, which may confuse parsers. Additionally, the YAML keys use non-standard 4-space indentation.

📝 Proposed fix for YAML front matter
 ---
-    description: Configuration for AI behavior when interacting with Codacy's MCP Server
-    applyTo: '**'
+description: Configuration for AI behavior when interacting with Codacy's MCP Server
+applyTo: '**'
 ---
----
 # Codacy Rules
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/instructions/codacy.instructions.md around lines 1 - 5, Remove the
duplicate YAML document delimiter and normalize indentation in the front matter:
delete the extra '---' that starts a second empty document and ensure the keys
'description' and 'applyTo' use standard 2-space indentation (or no indentation)
under the single '---' header so the front matter is a single, well-formed YAML
document.
cmd/proxsave/main_update.go-89-96 (1)

89-96: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

isNewerVersion mis-parses versions with + build metadata.

At Line 95, strconv.Atoi("3+build") becomes 0 (error ignored), which can produce wrong update decisions for valid semver tags like v1.2.3+meta.

💡 Proposed fix
-		if i := strings.IndexByte(v, '-'); i >= 0 {
+		if i := strings.IndexAny(v, "-+"); i >= 0 {
 			v = v[:i]
 		}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/proxsave/main_update.go` around lines 89 - 96, The version comparison
currently ignores build metadata, so strings like "v1.2.3+meta" get mis-parsed
(strconv.Atoi("3+build") returns 0); update isNewerVersion to strip build
metadata before splitting by checking for '+' on v (similar to the existing '-'
handling) and trim v = v[:i] when a '+' is found, then proceed with parts :=
strings.Split(v, ".") and the toInt helper as-is so numeric fields parse
correctly; reference the variables/function names v, parts and toInt in
isNewerVersion when applying this change.
README.md-17-17 (1)

17-17: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Clean up PayPal donate URL to follow standard format.

The donate URL mixes incompatible PayPal formats. While the URL is currently accessible, it combines the modern format donate/?hosted_button_id= with legacy parameters (cmd=_donations&business=email), and the button ID parameter is empty. Follow one standard format instead:

  • Modern format (recommended): https://www.paypal.com/donate/?hosted_button_id=YOUR_BUTTON_ID
  • Legacy format: https://www.paypal.com/cgi-bin/webscr?cmd=_donations&business=YOUR_EMAIL

The modern format is preferred as it doesn't expose your email address in the URL.

Suggested fix

If you have a PayPal hosted button ID:

-[![💸 Donate](https://img.shields.io/badge/Donate-PayPal-blue?logo=paypal)](https://www.paypal.com/donate/?hosted_button_id=&cmd=_donations&business=damigioanna%40gmail.com)
+[![💸 Donate](https://img.shields.io/badge/Donate-PayPal-blue?logo=paypal)](https://www.paypal.com/donate/?hosted_button_id=YOUR_BUTTON_ID)

Otherwise, use the legacy email-based format:

-[![💸 Donate](https://img.shields.io/badge/Donate-PayPal-blue?logo=paypal)](https://www.paypal.com/donate/?hosted_button_id=&cmd=_donations&business=damigioanna%40gmail.com)
+[![💸 Donate](https://img.shields.io/badge/Donate-PayPal-blue?logo=paypal)](https://www.paypal.com/cgi-bin/webscr?cmd=_donations&business=damigioanna%40gmail.com)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@README.md` at line 17, The PayPal donate link in the README uses a
mixed/incorrect format; replace the current URL in the badge link (the href that
starts with "https://www.paypal.com/donate/?hosted_button_id=") with a single
standard format: preferably the modern form
"https://www.paypal.com/donate/?hosted_button_id=YOUR_BUTTON_ID" (replace
YOUR_BUTTON_ID with your actual hosted button id), or if you must use the legacy
form use
"https://www.paypal.com/cgi-bin/webscr?cmd=_donations&business=YOUR_EMAIL";
update the markdown badge link accordingly so the donate button points to one
clean, valid PayPal URL.
docs/EXAMPLES.md-942-944 (1)

942-944: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use the actual PXAR config key in the example.

At Line 943, BACKUP_PXAR_FILES=true does not match the current template key naming (PXAR_SCAN_ENABLE). This can cause copy-paste misconfiguration.

✅ Suggested fix
-# Recommended for dual labs: keep diagnostics
-BACKUP_PXAR_FILES=true
+# Recommended for dual labs: keep diagnostics
+PXAR_SCAN_ENABLE=true
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/EXAMPLES.md` around lines 942 - 944, The example uses the outdated
config key BACKUP_PXAR_FILES; replace it with the current PXAR_SCAN_ENABLE key
so examples match runtime config. Update the snippet that sets
BACKUP_PXAR_FILES=true to use PXAR_SCAN_ENABLE=true (and adjust any surrounding
explanatory text that references the old key name), ensuring all mentions of
BACKUP_PXAR_FILES are changed to PXAR_SCAN_ENABLE across the example block and
its description.
internal/backup/collector_pve_patterns_test.go-210-212 (1)

210-212: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Restore the real whitespace-only input in this test case.

At Line 211, the "whitespace only uses default" case now uses "", which duplicates the empty-path case and drops whitespace-only coverage.

✅ Suggested fix
 		{
 			name:       "whitespace only uses default",
-			configPath: "",
+			configPath: "   ",
 			expected:   "/var/lib/pve-cluster",
 		},
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/backup/collector_pve_patterns_test.go` around lines 210 - 212, The
test case named "whitespace only uses default" was changed to an empty string,
duplicating the empty-path case and losing coverage for whitespace-only input;
revert the test data for that case so configPath contains a whitespace-only
string (e.g., a single space or tab) instead of "", locating the case by its
name in internal/backup/collector_pve_patterns_test.go and restoring the
original whitespace-only value to exercise the whitespace handling path.
internal/backup/collector_pve_util_test.go-1124-1126 (1)

1124-1126: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

This case should keep a whitespace-only value, not an empty string.

At Line 1125, the "whitespace only uses default" test now duplicates the empty-string scenario and no longer checks trim-to-default behavior.

✅ Suggested fix
 		{
 			name:       "whitespace only uses default",
-			configPath: "",
+			configPath: "   ",
 			expected:   "/etc/pve",
 		},
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/backup/collector_pve_util_test.go` around lines 1124 - 1126, The
test case named "whitespace only uses default" currently sets configPath to an
empty string instead of a whitespace-only string; update that case in the table
so configPath is a whitespace-only value (e.g., " " or "\t") while leaving
expected as "/etc/pve" to verify trim-to-default behavior; specifically modify
the test entry for the case with name "whitespace only uses default" to use a
whitespace-only configPath rather than "" so the logic in the code paths that
handle trimming/empty-defaults (the test harness invoking the collector_pve
util) is actually exercised.
internal/orchestrator/restore_archive_entries.go-46-50 (1)

46-50: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

/etc/pve prefix check has a false-positive class.

strings.HasPrefix(target, "/etc/pve") will also match unrelated paths like /etc/pveuser, /etc/pveothers, etc., silently skipping them on system-root restores. Use an exact-or-slash-prefix match so only /etc/pve itself and entries strictly under it are blocked.

🛡️ Proposed fix
-	// Hard guard: never write directly into /etc/pve when restoring to system root
-	if strings.HasPrefix(target, "/etc/pve") {
+	// Hard guard: never write directly into /etc/pve when restoring to system root
+	if target == "/etc/pve" || strings.HasPrefix(target, "/etc/pve/") {
 		logger.Warning("Skipping restore to %s (writes to /etc/pve are prohibited)", target)
 		return true, nil
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/orchestrator/restore_archive_entries.go` around lines 46 - 50, The
current guard in restore_archive_entries.go uses strings.HasPrefix(target,
"/etc/pve") which produces false positives (e.g., "/etc/pveuser"); update the
check in the function containing this guard so it only matches the exact path or
paths under it by using a stricter condition like target == "/etc/pve" ||
strings.HasPrefix(target, "/etc/pve/"), then keep the existing
logger.Warning(...) and return true, nil behavior when that condition is met.
docs/RESTORE_GUIDE.md-513-522 (1)

513-522: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a language identifier to the fenced code block.

markdownlint (MD040) flags the opening fence at Line 513. The block contains operator-facing console output, so text (or console) is appropriate and aligns with the surrounding sample blocks.

📝 Proposed fix
 **Partial-compatibility warning**:
-```
+```text
 ⚠ WARNING: Partial compatibility detected
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/RESTORE_GUIDE.md` around lines 513 - 522, The fenced code block showing
the "⚠ WARNING: Partial compatibility detected" console output needs a language
identifier to satisfy markdownlint MD040; update the opening triple backticks
for that block to include a language like "text" (or "console") so the block
becomes ```text (or ```console) and leave the block contents unchanged.
internal/orchestrator/restore_archive_extract.go-206-209 (1)

206-209: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

The restore summary misreports failures.

Total files in archive excludes filesFailed, and the warning always says “see detailed log” even when opts.logFilePath is empty. That makes failed restores look smaller than they were and can send users to a log that doesn't exist.

🩹 Proposed fix
 func (log *restoreExtractionLog) writeSummary(stats restoreExtractionStats) {
 	if log.logFile == nil {
 		return
 	}
 	fmt.Fprintf(log.logFile, "=== FILES RESTORED ===\n")
 	log.copyTempEntries(log.restoredTemp, "restored")
 	fmt.Fprintf(log.logFile, "\n")
@@
 	fmt.Fprintf(log.logFile, "=== SUMMARY ===\n")
 	fmt.Fprintf(log.logFile, "Total files extracted: %d\n", stats.filesExtracted)
 	fmt.Fprintf(log.logFile, "Total files skipped: %d\n", stats.filesSkipped)
-	fmt.Fprintf(log.logFile, "Total files in archive: %d\n", stats.filesExtracted+stats.filesSkipped)
+	fmt.Fprintf(log.logFile, "Total files failed: %d\n", stats.filesFailed)
+	fmt.Fprintf(log.logFile, "Total files in archive: %d\n", stats.filesExtracted+stats.filesSkipped+stats.filesFailed)
 }
@@
 	} else {
-		opts.logger.Warning("Restored %d files/directories; %d item(s) failed (see detailed log)", stats.filesExtracted, stats.filesFailed)
+		if opts.logFilePath != "" {
+			opts.logger.Warning("Restored %d files/directories; %d item(s) failed (see detailed log)", stats.filesExtracted, stats.filesFailed)
+		} else {
+			opts.logger.Warning("Restored %d files/directories; %d item(s) failed", stats.filesExtracted, stats.filesFailed)
+		}
 	}

Also applies to: 231-239

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/orchestrator/restore_archive_extract.go` around lines 206 - 209, The
summary incorrectly omits failed files and always points users to a log; update
the summary logic in restore_archive_extract.go (the block that writes to
log.logFile and the second similar block) to include stats.filesFailed in the
"Total files in archive" calculation (use filesExtracted + filesSkipped +
filesFailed) and only emit the "see detailed log" warning when opts.logFilePath
is non-empty (or log.logFile is actually opened). Ensure you update both
occurrences that format the summary so the totals and conditional warning use
stats.filesFailed and opts.logFilePath respectively.
cmd/proxsave/main_lifecycle.go-34-42 (1)

34-42: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Recover before logging a clean shutdown.

If runRuntime panics, this function emits exit_code=%d and calls run.runDone(nil) before recover(), so the bootstrap log records a successful run even though the process exits with ExitPanicError.

🩹 Proposed fix
 func finishMainRun(run runBootstrap) {
-	logging.DebugStepBootstrap(run.bootstrap, "main run", "exit_code=%d", run.state.finalExitCode)
-	run.runDone(nil)
 	if r := recover(); r != nil {
 		stack := debug.Stack()
+		logging.DebugStepBootstrap(run.bootstrap, "main run", "exit_code=%d", types.ExitPanicError.Int())
+		run.runDone(fmt.Errorf("panic: %v", r))
 		run.bootstrap.Error("PANIC: %v", r)
 		fmt.Fprintf(os.Stderr, "panic: %v\n%s\n", r, stack)
 		os.Exit(types.ExitPanicError.Int())
 	}
+	logging.DebugStepBootstrap(run.bootstrap, "main run", "exit_code=%d", run.state.finalExitCode)
+	run.runDone(nil)
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/proxsave/main_lifecycle.go` around lines 34 - 42, The current
finishMainRun calls logging.DebugStepBootstrap and run.runDone before checking
recover(), causing successful shutdown logs on panic; update finishMainRun to
install a deferred recover handler at the top (using a defer func() { if r :=
recover() { ... } }()) so that any panic is caught before emitting the
"exit_code" log or calling run.runDone; inside that defer, use debug.Stack(),
call run.bootstrap.Error("PANIC: %v", r), print to stderr, and
os.Exit(types.ExitPanicError.Int()) to preserve existing behavior, and leave the
normal path to call logging.DebugStepBootstrap and run.runDone only when no
panic occurred.
internal/backup/collector_bricks_pbs.go-110-124 (1)

110-124: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Raise log level for non-ENOENT parse failures to surface silent degradation.

When loadPBSUserIDsFromCommandFile fails for a reason other than os.ErrNotExist (e.g., malformed user list), this brick logs at Debug and returns success with userIDs = nil. Downstream brickPBSRuntimeAccessUserTokens and brickPBSRuntimeAccessTokensAggregate early-return on empty userIDs, so PBS API token snapshots are silently skipped without anything visible in normal logs. Consider Warning (or at least Info) for the parse-error path so operators notice when token export is disabled due to a broken file rather than a missing one.

🔧 Suggested adjustment
-					if errors.Is(err, os.ErrNotExist) {
-						state.collector.logger.Debug("User list not available for token export: %v", err)
-						state.pbs.userIDs = nil
-						return nil
-					}
-					state.collector.logger.Debug("Failed to parse user list for token export: %v", err)
-					state.pbs.userIDs = nil
-					return nil
+					if errors.Is(err, os.ErrNotExist) {
+						state.collector.logger.Debug("User list not available for token export: %v", err)
+						state.pbs.userIDs = nil
+						return nil
+					}
+					state.collector.logger.Warning("Failed to parse PBS user list; skipping API token snapshots: %v", err)
+					state.pbs.userIDs = nil
+					return nil
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/backup/collector_bricks_pbs.go` around lines 110 - 124, The
parse-failure branch in the Run function that calls
state.collector.loadPBSUserIDsFromCommandFile currently logs failures
(non-os.ErrNotExist) at Debug and swallows the error; change the logger call for
that branch (where you currently call state.collector.logger.Debug("Failed to
parse user list for token export: %v", err)) to a higher level
(state.collector.logger.Warning or Info) so malformed PBS user-list errors are
visible to operators, but keep the behavior of setting state.pbs.userIDs = nil
and returning nil.
🧹 Nitpick comments (19)
.github/instructions/codacy.instructions.md (1)

11-13: 💤 Low value

Consider standardizing nested list indentation.

The nested list items use 1-space indentation, which violates the MD005 linting rule. Standard markdown convention uses 2 or 4 spaces for nested lists to ensure consistent rendering across different parsers.

📋 Example fix (using 2-space indent)
 ## CRITICAL: After ANY successful `edit_file` or `reapply` operation
 - YOU MUST IMMEDIATELY run the `codacy_cli_analyze` tool from Codacy's MCP Server for each file that was edited, with:
- - `rootPath`: set to the workspace path
- - `file`: set to the path of the edited file
- - `tool`: leave empty or unset
+   - `rootPath`: set to the workspace path
+   - `file`: set to the path of the edited file
+   - `tool`: leave empty or unset

Apply similar changes to all nested list items throughout the file (lines 28-29, 37-46, 48-50, 52-54).

Also applies to: 28-29, 37-54

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/instructions/codacy.instructions.md around lines 11 - 13, The nested
list items under the bullets for `rootPath`, `file`, and `tool` use a 1-space
indent which violates MD005; update all nested list indentation to use a
consistent 2-space indent (e.g., change the child list lines after `- rootPath`,
`- file`, and `- tool` and the other nested lists referenced on lines 28-29,
37-46, 48-50, 52-54) so the markdown follows standard nested-list convention and
passes the lint rule.
cmd/proxsave/main_config_modes.go (1)

155-178: 💤 Low value

Two consecutive "successfully" lines in the apply-result output.

bootstrap.Println("Configuration upgraded successfully!") on entry and bootstrap.Println("✓ Configuration upgrade completed successfully.") on exit say essentially the same thing. Consider keeping only the trailing checkmark line so the apply summary frames the details cleanly between the header and the final confirmation.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/proxsave/main_config_modes.go` around lines 155 - 178, In
printConfigUpgradeApplyResult remove the initial redundant
bootstrap.Println("Configuration upgraded successfully!") so the function only
prints the detailed bullets and finishes with the existing trailing confirmation
bootstrap.Println("✓ Configuration upgrade completed successfully."); this keeps
the header-less details framed cleanly and avoids the duplicate success message.
internal/orchestrator/restore_cluster_apply.go (1)

290-302: 💤 Low value

Reusing parseProxmoxNotificationHeader for storage.cfg is a maintenance landmine.

storage.cfg and notifications.cfg happen to share a <type>: <id> header shape today, but they are independent on-disk formats that may evolve separately. Coupling the storage parser to a notification helper means any future change to notification header parsing silently changes how storage blocks are detected.

Consider extracting a small, neutrally named header parser (e.g. parseSectionHeader) that both call sites delegate to, or inline the trivial <type>: <id> split here.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/orchestrator/restore_cluster_apply.go` around lines 290 - 302, The
code is reusing parseProxmoxNotificationHeader to detect storage.cfg section
headers which couples storage parsing to notification-specific logic; extract a
small neutral parser (e.g. parseSectionHeader) that implements the simple
"<type>: <id>" split and returns (ok, type, id) and replace the call to
parseProxmoxNotificationHeader in restore_cluster_apply.go with
parseSectionHeader so storageBlock creation uses the generic header parser;
ensure parseProxmoxNotificationHeader (if still needed) can delegate to
parseSectionHeader to avoid duplicating logic.
internal/orchestrator/restore_archive_entries.go (1)

220-224: 💤 Low value

os.Lchown (and syscall.UtimesNano at line 270) bypass restoreFS.

Every other filesystem op in this file goes through restoreFS (MkdirAll, CreateTemp, Rename, Symlink, Readlink, Link, Remove), which makes the extractor testable against a fake FS. Lchown on the symlink and setTimestamps on the target both reach the real OS directly, so any FakeFS-backed test root will silently fall back to host paths for these calls (and chown/lchown will simply fail in unprivileged tests).

If FS doesn't yet expose Lchown/UtimesNano, consider adding thin methods so the abstraction stays consistent; otherwise gate these on a "real FS" check.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/orchestrator/restore_archive_entries.go` around lines 220 - 224, The
calls to os.Lchown (in the symlink ownership block) and syscall.UtimesNano (used
by setTimestamps) bypass the restoreFS abstraction and should be routed through
the FS abstraction; add thin methods Lchown(path string, uid, gid int) error and
UtimesNano(path string, times []syscall.Timespec) error to the FS interface (and
its real-fs implementation), then replace direct os.Lchown and
syscall.UtimesNano usages with restoreFS.Lchown and restoreFS.UtimesNano in
restore_archive_entries.go (including setTimestamps) so tests using a FakeFS
exercise the same code paths; if you cannot add those methods immediately, gate
the direct calls behind a real-FS check on restoreFS and document the fallback
behavior.
internal/environment/detect.go (1)

341-365: 💤 Low value

Optional: simplify the absolute-vs-allowlist branching in runCommand.

Current callers always pass an absolute path obtained via lookPathFunc, so the non-absolute branch is effectively dead code today. Both pveversion and proxmox-backup-manager are in the safeexec allowlist anyway, so collapsing the two branches into a single safeexec.TrustedCommandContext call (with a precondition that callers always resolve via LookPath) would reduce surface area and make the trust contract explicit. Not a blocker — current behavior is correct for both branches.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/environment/detect.go` around lines 341 - 365, The runCommand
function currently branches between commandContextFunc for absolute paths and
safeexec.CommandContext for others, but callers always pass absolute paths from
lookPathFunc and the binaries are already in the safeexec allowlist; simplify by
collapsing the branches and always using safeexec.TrustedCommandContext (or the
safeexec entrypoint chosen for trusted, e.g., TrustedCommandContext) to make the
trust contract explicit: replace the filepath.IsAbs check and use
safeexec.TrustedCommandContext(ctx, command, args...) (removing
commandContextFunc usage) and ensure callers still resolve commands via
lookPathFunc before calling runCommand.
internal/pbs/namespaces_test.go (1)

216-223: ⚡ Quick win

Add one stub scenario for command-construction failure.

The production code now returns early when execCommand(...) itself fails, but this helper always returns nil, so that new branch is still untested. A tiny case that returns errors.New(...) would lock in the intended propagation behavior.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/pbs/namespaces_test.go` around lines 216 - 223, Add a test stub case
that simulates execCommand failing by replacing the test helper's execCommand
variable to return (nil, errors.New("...")) for a unique scenario name so the
branch in the production function that returns early on execCommand error is
exercised; modify the test in namespaces_test.go to include a scenario (e.g.,
"cmd-failure") and in its setup set execCommand = func(context.Context, string,
...string) (*exec.Cmd, error) { return nil, errors.New("simulated execCommand
failure") } then assert the caller (the function under test) propagates that
error as expected.
internal/notify/webhook.go (1)

81-90: ⚡ Quick win

Validate missing Pushover credentials during notifier construction.

A Pushover endpoint with an empty Auth.Token or Auth.User still passes NewWebhookNotifier and only fails later on the first send inside buildPushoverPayload. Failing fast here makes bad configs visible before a backup run starts.

Suggested fix
 		format := resolveWebhookFormat(ep.Format, webhookConfig.DefaultFormat)
 		method := resolveWebhookMethod(ep.Method)
 		if strings.EqualFold(format, "pushover") {
+			if strings.TrimSpace(ep.Auth.Token) == "" {
+				return nil, fmt.Errorf("webhook endpoint %q: AUTH_TOKEN is required for pushover", ep.Name)
+			}
+			if strings.TrimSpace(ep.Auth.User) == "" {
+				return nil, fmt.Errorf("webhook endpoint %q: AUTH_USER is required for pushover", ep.Name)
+			}
 			if ep.Priority < -2 || ep.Priority > 1 {
 				return nil, fmt.Errorf("webhook endpoint %q: PRIORITY must be in range -2..1 (got %d); priority 2 (emergency) is not supported", ep.Name, ep.Priority)
 			}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/notify/webhook.go` around lines 81 - 90, When format resolved by
resolveWebhookFormat equals "pushover" in the NewWebhookNotifier construction
path, validate that ep.Auth.Token and ep.Auth.User are non-empty and return a
descriptive error if either is missing (e.g., "webhook endpoint %q: Pushover
requires Auth.Token and Auth.User; missing token/user"), in the same block that
currently checks ep.Priority and method so invalid Pushover configs fail fast;
reference the endpoint fields ep.Auth.Token and ep.Auth.User and the later
buildPushoverPayload function to ensure this moves credential validation from
send-time to notifier construction.
internal/safeexec/safeexec_test.go (1)

11-95: ⚡ Quick win

Expand the allowlist tests to cover the commands newly routed through safeexec.

CommandContext is only asserted for rclone, but this PR now depends on additional names like tar, xz, zstd, systemctl, mailq, tail, journalctl, pvesh, pveum, and proxmox-backup-manager. If one of those drops out of the allowlist, backup or email flows regress at runtime without a unit test catching it.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/safeexec/safeexec_test.go` around lines 11 - 95, The
TestCommandContextAllowlist currently only asserts "rclone" is allowed; update
it to also call CommandContext(context.Background(), ...) and assert no error
for each of the newly routed commands: "tar", "xz", "zstd", "systemctl",
"mailq", "tail", "journalctl", "pvesh", "pveum", and "proxmox-backup-manager"
(alongside the existing "rclone" check), keeping the existing negative checks
for unknown commands and absolute paths (ErrCommandNotAllowed); modify the
TestCommandContextAllowlist function to loop or add assertions that each listed
command returns nil error from CommandContext so regressions are caught.
internal/backup/collector.go (1)

971-1069: 🏗️ Heavy lift

Extract the shared command execution/error-classification path before these helpers drift further.

The validation, LookPath, dry-run, pvesh timeout, output summarization, and privilege-sensitive downgrade logic is now copied across multiple helpers. In this state, a behavior fix in one path is easy to miss in the others. Pulling the common execution/result-classification flow behind one helper would make this refactor much safer to maintain.

Also applies to: 1073-1239, 1333-1445

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/backup/collector.go` around lines 971 - 1069, The code in
safeCmdOutput duplicates a full command-execution and error-classification flow
(validate, depLookPath, dry-run, pvesh timeout, depRunCommand, summarize output,
privilege-detection/downgrade, exit-code handling, and logging) that is repeated
in other helpers — extract this shared flow into a single helper (e.g.,
runAndClassifyCommand or executeAndClassifyCommand) that returns the command
output, an error classification (critical vs non-critical vs downgraded-to-skip)
and contextual info; then have safeCmdOutput call that helper, use its returned
output to call writeReportFile, and handle final logging/metrics there; ensure
the helper encapsulates use of spec.validate(), c.depLookPath(spec.Name),
dry-run branch, pvesh timeout creation, c.depRunCommand(runCtx,...),
summarization via summarizeCommandOutputText, privilege detection via
c.depDetectUnprivilegedContainer() and privilegeSensitiveFailureMatch, and
returns errors or nil consistent with the existing semantics so other duplicated
helpers can be refactored to call it.
cmd/proxsave/main_defers.go (2)

77-86: 💤 Low value

Use defer f.Close() for the heap profile file.

pprof.WriteHeapProfile(f) may panic or otherwise return early; deferring Close ensures the descriptor is released regardless of path. At program exit it doesn't strictly matter, but it's idiomatic.

♻️ Suggested fix
 	f, err := os.Create(rt.heapProfilePath)
 	if err != nil {
 		logging.Warning("Failed to create heap profile file: %v", err)
 		return
 	}
+	defer f.Close()
 	if err := pprof.WriteHeapProfile(f); err != nil {
 		logging.Warning("Failed to write heap profile: %v", err)
 	}
-	_ = f.Close()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/proxsave/main_defers.go` around lines 77 - 86, The heap profile file
created with os.Create(rt.heapProfilePath) is closed explicitly after
pprof.WriteHeapProfile(f) which can miss the Close if WriteHeapProfile panics or
returns early; change the code in the block that creates f so that immediately
after successfully creating f you call defer f.Close() (instead of the later _ =
f.Close()), leaving the pprof.WriteHeapProfile(f) call as-is so the descriptor
is always released even on early returns or panics.

16-43: ⚡ Quick win

Document the LIFO execution-order contract.

Because the caller in runRuntime does for _, deferredAction := range runDeferredActions(...) { defer deferredAction() }, the slice executes in reverse order. This is load-bearing here: dispatchDeferredEarlyErrorNotification (slice index 3) sets state.pendingSupportStat, which sendDeferredSupportEmail (slice index 2) then reads — that data dependency only works because LIFO causes index 3 to run before index 2. A short comment above the slice will prevent a future reorder from silently breaking support-email delivery.

♻️ Suggested clarifying comment
 func runDeferredActions(rt *appRuntime, state *appRunState) []runDeferredAction {
+	// NOTE: callers `defer` each entry, so execution is LIFO relative to
+	// this slice. dispatchDeferredEarlyErrorNotification must remain after
+	// sendDeferredSupportEmail in slice order so that the support stats it
+	// produces are observable when the email is sent.
 	return []runDeferredAction{
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/proxsave/main_defers.go` around lines 16 - 43, Add a clear LIFO-order
comment to runDeferredActions explaining that callers (notably runRuntime) defer
each returned runDeferredAction in a loop, causing the slice to execute in
reverse; call out the data dependency where
dispatchDeferredEarlyErrorNotification must run before sendDeferredSupportEmail
because dispatchDeferredEarlyErrorNotification sets state.pendingSupportStat
which sendDeferredSupportEmail reads, and warn maintainers not to reorder the
slice entries or change the defer pattern.
cmd/proxsave/main_signals.go (2)

15-15: 💤 Low value

closeStdinOnce as a package-level sync.Once is not test-safe.

Once consumed by any invocation, it can never close stdin again, and it shares state across any concurrent test using setupRunContext. If setupRunContext is intended strictly for the single main() call this is fine; otherwise consider scoping the sync.Once to the returned context (e.g., a closure variable inside setupRunContext).

Also applies to: 28-32

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/proxsave/main_signals.go` at line 15, The package-level sync.Once
variable closeStdinOnce is shared across tests and prevents repeated stdin close
behavior; move its declaration into the scope of setupRunContext (or attach it
to the run context returned by setupRunContext as a field) so each invocation
gets its own sync.Once instance; update any code that references closeStdinOnce
(e.g., where stdin is closed in setupRunContext/main()) to use the new
per-context once variable or field to ensure test isolation and avoid global
shared state.

21-33: 💤 Low value

Consider letting the signal goroutine exit cleanly and stop signal delivery.

The goroutine blocks indefinitely on <-sigChan and signal.Notify is never paired with signal.Stop. In the current single-shot main() flow this is harmless, but if setupRunContext is ever invoked from tests or multiple lifecycles, the previous goroutine and signal registration both persist. A standard pattern is to also wake on ctx.Done() (so the defer cancel() in run() lets the goroutine return) and to release the signal channel:

♻️ Suggested refactor
 	sigChan := make(chan os.Signal, 1)
 	signal.Notify(sigChan, syscall.SIGINT, syscall.SIGTERM)
 	go func() {
-		sig := <-sigChan
-		logging.DebugStepBootstrap(bootstrap, "signal", "received=%v", sig)
-		bootstrap.Info("\nReceived signal %v, initiating graceful shutdown...", sig)
-		cancel()
-		closeStdinOnce.Do(func() {
-			if file := os.Stdin; file != nil {
-				_ = file.Close()
-			}
-		})
+		defer signal.Stop(sigChan)
+		select {
+		case sig := <-sigChan:
+			logging.DebugStepBootstrap(bootstrap, "signal", "received=%v", sig)
+			bootstrap.Info("\nReceived signal %v, initiating graceful shutdown...", sig)
+			cancel()
+			closeStdinOnce.Do(func() {
+				if file := os.Stdin; file != nil {
+					_ = file.Close()
+				}
+			})
+		case <-ctx.Done():
+		}
 	}()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/proxsave/main_signals.go` around lines 21 - 33, The signal goroutine
currently blocks on sigChan and never calls signal.Stop, so register/unregister
pairing is missing; change the goroutine that uses sigChan (created alongside
signal.Notify) to select between receiving a signal and ctx.Done(), and when
exiting call signal.Stop(sigChan) (and optionally close sigChan) so the
goroutine can return cleanly; keep the existing handling (logging, cancel(),
closeStdinOnce) when a signal is received and ensure the ctx.Done() branch also
stops signal delivery to avoid stale handlers — reference sigChan,
signal.Notify, signal.Stop, cancel(), closeStdinOnce and the goroutine that logs
with logging.DebugStepBootstrap.
internal/backup/collector_bricks_pve_finalize.go (1)

130-156: 💤 Low value

Mixed value/pointer receivers on pveContext.

ensureStorageScanResults uses a pointer receiver (it must — it mutates the field), while runtimeNodes, runtimeStorages, and storageResult use value receivers. This works, but if a pveContext is ever held by value, calling ensureStorageScanResults requires an addressable variable, which can surprise callers. Consider standardizing on pointer receivers for the whole type.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/backup/collector_bricks_pve_finalize.go` around lines 130 - 156, The
methods runtimeNodes, runtimeStorages, and storageResult use value receivers
while ensureStorageScanResults uses a pointer receiver; standardize pveContext
to use pointer receivers for all methods to avoid surprises when the struct is
held by value—change signatures of runtimeNodes, runtimeStorages, and
storageResult to use (p *pveContext), update their bodies accordingly, and then
search & update any call sites if they rely on value semantics so they pass a
pointer (or remain valid); keep ensureStorageScanResults as-is since it already
mutates fields.
cmd/proxsave/main_runtime_log.go (1)

11-22: ⚡ Quick win

logRunContext mutates rt.unprivilegedInfo — surprising for a "log" helper.

Detecting and assigning the unprivileged-container info is a side effect that callers wouldn't expect from a function named logRunContext. If anything else later reads rt.unprivilegedInfo, skipping or reordering this log call would silently leave the field zero. Consider populating rt.unprivilegedInfo during runtime preparation (e.g., in prepareRuntime / main_runtime.go) and have logRunContext only read it.

♻️ Suggested split
 func logRunContext(rt *appRuntime) {
 	logRunDryRunStatus(rt)
 	baseDirSource := runBaseDirSource(rt)
 	logging.Info("Environment: %s %s", rt.envInfo.Type, rt.envInfo.Version)
-	rt.unprivilegedInfo = environment.DetectUnprivilegedContainer()
 	logUserNamespaceContext(rt.logger, rt.unprivilegedInfo)

…and assign rt.unprivilegedInfo = environment.DetectUnprivilegedContainer() in the runtime-preparation path before logRunContext is invoked.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/proxsave/main_runtime_log.go` around lines 11 - 22, logRunContext
currently mutates rt.unprivilegedInfo by calling
environment.DetectUnprivilegedContainer(), which is a surprising side effect for
a logging helper; move the detection/assignment into the runtime preparation
path (e.g., in prepareRuntime or main_runtime.go) so that prepareRuntime sets
rt.unprivilegedInfo = environment.DetectUnprivilegedContainer() before any call
to logRunContext, then remove the assignment from logRunContext so it only reads
rt.unprivilegedInfo and logs it; update any callers/tests that depended on the
old behavior to ensure prepareRuntime is invoked before logging.
internal/orchestrator/restore_archive.go (1)

256-264: 💤 Low value

Use 0o755 for octal consistency with neighbors.

Lines 217 and 246 in this same file use 0o755 for MkdirAll, but this call uses the legacy 0755 literal. Harmless functionally — Go accepts both — but a one-character cleanup keeps the file uniform.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/orchestrator/restore_archive.go` around lines 256 - 264, Replace the
legacy octal literal 0755 with the consistent 0o755 where permissions are
created for the log directory; update the call to restoreFS.MkdirAll (the block
that creates logDir and sets logPath using nowRestore() and restoreLogSequence)
to use 0o755 so it matches the other MkdirAll uses in this file.
cmd/proxsave/main_runtime.go (1)

290-302: 💤 Low value

Misleading helper name; consider clarifying semantics.

newer returns true when versions are equal because the patch comparison is >= while major/minor are strict >. The semantic is actually “meets-or-exceeds minimum”, which matches the call site at Line 300 (!newer(...) → below required). Rename to meetsMinimum (or invert the predicate) to avoid future confusion when this helper is reused.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/proxsave/main_runtime.go` around lines 290 - 302, The helper function
newer has semantics of "meets-or-exceeds minimum" (it returns true when versions
are equal due to patch >=) so rename the function to meetsMinimum (or
equivalent) and update its single call site (the if !newer(rtMaj, rtMin,
rtPatch, minMaj, minMin, minPatch) check) to use the new name; preserve the
existing comparison logic (keep the patch comparison as >= and major/minor as >)
so behavior stays identical while the name accurately reflects intent.
cmd/proxsave/main_modes.go (1)

25-41: 💤 Low value

Fail-fast rule iteration is intentional but loses cumulative diagnostics.

validateModeCompatibility returns the first rule’s messages, so when a user combines flags that violate multiple rules at once, only the first violation is surfaced. Given rejectIncompatibleModes (in cmd/proxsave/main_lifecycle.go) loops over the returned slice to print all messages, the API contract suggests the intent is to return all applicable messages. Consider accumulating across rules so users see every problem in a single run instead of having to re-run after fixing one.

♻️ Optional accumulation
-	for _, rule := range []modeCompatibilityRule{
-		validateCleanupGuardsCompatibility,
-		validateSupportCompatibility,
-		validateInstallCompatibility,
-		validateUpgradeCompatibility,
-	} {
-		if messages := rule(args); len(messages) > 0 {
-			return messages
-		}
-	}
-	return nil
+	var all []string
+	for _, rule := range []modeCompatibilityRule{
+		validateCleanupGuardsCompatibility,
+		validateSupportCompatibility,
+		validateInstallCompatibility,
+		validateUpgradeCompatibility,
+	} {
+		all = append(all, rule(args)...)
+	}
+	return all
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/proxsave/main_modes.go` around lines 25 - 41, validateModeCompatibility
currently short-circuits and returns the first non-empty messages slice from the
rule list, which hides other violations; change it to accumulate messages from
every modeCompatibilityRule (e.g., validateCleanupGuardsCompatibility,
validateSupportCompatibility, validateInstallCompatibility,
validateUpgradeCompatibility) into a single slice and return that combined slice
(or nil/empty when no messages) so rejectIncompatibleModes can print all
diagnostics at once; ensure duplicates are preserved/removed per existing
expectations and keep the function signature validateModeCompatibility(args
*cli.Args) []string unchanged.
cmd/proxsave/backup_storage.go (1)

130-143: 💤 Low value

Surface the underlying detection error before silently disabling cloud.

When detectFilesystemInfo returns nil, this branch disables cloud storage in-place and emits a Skip/debug step but discards the actual error. Operators inspecting why cloud became disabled mid-run won’t see the cause. Logging the error at Debug (or Info) before the disable would help post-mortem debugging without changing behavior.

🪵 Suggested debug log
-	cloudFS, _ := detectFilesystemInfo(opts.ctx, cloudBackend, cfg.CloudRemote, logger)
-	if cloudFS == nil {
-		logging.DebugStep(logger, "storage init", "cloud unavailable, disabling")
+	cloudFS, detectErr := detectFilesystemInfo(opts.ctx, cloudBackend, cfg.CloudRemote, logger)
+	if cloudFS == nil {
+		if detectErr != nil {
+			logging.DebugStep(logger, "storage init", "cloud unavailable, disabling: %v", detectErr)
+		} else {
+			logging.DebugStep(logger, "storage init", "cloud unavailable, disabling")
+		}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/proxsave/backup_storage.go` around lines 130 - 143, detectFilesystemInfo
is discarding the error (cloudFS, _ := detectFilesystemInfo(...)) so when
cloudFS == nil you disable cloud without surfacing why; change the call to
capture the error (e.g., cloudFS, err := detectFilesystemInfo(...)) and, in the
nil branch before setting cfg.CloudEnabled=false and calling
checker.DisableCloud(), log the error via the existing logger (using
logging.DebugStep/logger.Debug or logger.Info) and include it in the message
passed to logStorageInitSummary/logging.Skip so operators can see the underlying
detection failure while preserving current behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6797ba71-8358-42de-91a4-5f1eed3c718b

📥 Commits

Reviewing files that changed from the base of the PR and between e7a98d0 and c7b1caa.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (121)
  • .github/instructions/codacy.instructions.md
  • .github/workflows/codecov.yml
  • .github/workflows/dependabot-automerge.yml
  • README.md
  • cmd/proxsave/backup_execution.go
  • cmd/proxsave/backup_mode.go
  • cmd/proxsave/backup_notifications.go
  • cmd/proxsave/backup_storage.go
  • cmd/proxsave/install.go
  • cmd/proxsave/main.go
  • cmd/proxsave/main_config_modes.go
  • cmd/proxsave/main_defers.go
  • cmd/proxsave/main_footer.go
  • cmd/proxsave/main_identity.go
  • cmd/proxsave/main_lifecycle.go
  • cmd/proxsave/main_modes.go
  • cmd/proxsave/main_modes_test.go
  • cmd/proxsave/main_network.go
  • cmd/proxsave/main_restore_decrypt.go
  • cmd/proxsave/main_runtime.go
  • cmd/proxsave/main_runtime_log.go
  • cmd/proxsave/main_security.go
  • cmd/proxsave/main_signals.go
  • cmd/proxsave/main_state.go
  • cmd/proxsave/main_support.go
  • cmd/proxsave/main_update.go
  • cmd/proxsave/runtime_helpers.go
  • cmd/proxsave/upgrade.go
  • docs/BACKUP_ENV_MAPPING.md
  • docs/CLI_REFERENCE.md
  • docs/COLLECTOR_ARCHITECTURE.md
  • docs/CONFIGURATION.md
  • docs/DEVELOPER_GUIDE.md
  • docs/EXAMPLES.md
  • docs/README.md
  • docs/RESTORE_DIAGRAMS.md
  • docs/RESTORE_GUIDE.md
  • docs/RESTORE_TECHNICAL.md
  • docs/TROUBLESHOOTING.md
  • go.mod
  • internal/backup/archiver.go
  • internal/backup/collector.go
  • internal/backup/collector_bricks.go
  • internal/backup/collector_bricks_common.go
  • internal/backup/collector_bricks_pbs.go
  • internal/backup/collector_bricks_pbs_features.go
  • internal/backup/collector_bricks_pbs_inventory.go
  • internal/backup/collector_bricks_pbs_runtime.go
  • internal/backup/collector_bricks_pve.go
  • internal/backup/collector_bricks_pve_finalize.go
  • internal/backup/collector_bricks_pve_jobs.go
  • internal/backup/collector_bricks_pve_storage.go
  • internal/backup/collector_bricks_system.go
  • internal/backup/collector_bricks_test.go
  • internal/backup/collector_deps.go
  • internal/backup/collector_pbs.go
  • internal/backup/collector_pbs_auth_test.go
  • internal/backup/collector_pbs_datastore.go
  • internal/backup/collector_privilege_sensitive_test.go
  • internal/backup/collector_pve.go
  • internal/backup/collector_pve_patterns_test.go
  • internal/backup/collector_pve_util_test.go
  • internal/backup/collector_system.go
  • internal/backup/collector_system_test.go
  • internal/backup/collector_test.go
  • internal/config/config.go
  • internal/config/migration.go
  • internal/config/templates/backup.env
  • internal/environment/detect.go
  • internal/identity/identity.go
  • internal/notify/email.go
  • internal/notify/email_delivery_methods_test.go
  • internal/notify/webhook.go
  • internal/notify/webhook_payloads.go
  • internal/notify/webhook_test.go
  • internal/orchestrator/additional_helpers_test.go
  • internal/orchestrator/backup_run_helpers.go
  • internal/orchestrator/backup_run_phases.go
  • internal/orchestrator/backup_sources.go
  • internal/orchestrator/decrypt.go
  • internal/orchestrator/decrypt_test.go
  • internal/orchestrator/decrypt_tui_e2e_helpers_test.go
  • internal/orchestrator/deps.go
  • internal/orchestrator/deps_test.go
  • internal/orchestrator/guards_cleanup_test.go
  • internal/orchestrator/mount_guard_more_test.go
  • internal/orchestrator/network_apply.go
  • internal/orchestrator/network_apply_additional_test.go
  • internal/orchestrator/orchestrator.go
  • internal/orchestrator/pbs_mount_guard_test.go
  • internal/orchestrator/pbs_staged_apply_additional_test.go
  • internal/orchestrator/pbs_staged_apply_test.go
  • internal/orchestrator/pve_staged_apply_test.go
  • internal/orchestrator/restore.go
  • internal/orchestrator/restore_access_control_ui.go
  • internal/orchestrator/restore_access_control_ui_additional_test.go
  • internal/orchestrator/restore_archive.go
  • internal/orchestrator/restore_archive_entries.go
  • internal/orchestrator/restore_archive_extract.go
  • internal/orchestrator/restore_archive_paths.go
  • internal/orchestrator/restore_cluster_apply.go
  • internal/orchestrator/restore_decompression.go
  • internal/orchestrator/restore_errors_test.go
  • internal/orchestrator/restore_filesystem_test.go
  • internal/orchestrator/restore_firewall.go
  • internal/orchestrator/restore_firewall_additional_test.go
  • internal/orchestrator/restore_ha.go
  • internal/orchestrator/restore_ha_additional_test.go
  • internal/orchestrator/restore_services.go
  • internal/orchestrator/restore_workflow_ui.go
  • internal/orchestrator/restore_zfs.go
  • internal/orchestrator/temp_registry_test.go
  • internal/orchestrator/unescape_proc_path_test.go
  • internal/pbs/namespaces.go
  • internal/pbs/namespaces_test.go
  • internal/safeexec/safeexec.go
  • internal/safeexec/safeexec_test.go
  • internal/security/procscan.go
  • internal/security/security.go
  • internal/storage/cloud.go
  • internal/tui/wizard/post_install_audit_core.go
💤 Files with no reviewable changes (8)
  • internal/orchestrator/temp_registry_test.go
  • internal/orchestrator/pve_staged_apply_test.go
  • internal/orchestrator/unescape_proc_path_test.go
  • internal/orchestrator/mount_guard_more_test.go
  • internal/orchestrator/pbs_mount_guard_test.go
  • internal/orchestrator/pbs_staged_apply_test.go
  • internal/orchestrator/guards_cleanup_test.go
  • internal/orchestrator/pbs_staged_apply_additional_test.go

Comment thread .github/workflows/dependabot-automerge.yml Outdated
Comment thread cmd/proxsave/main_identity.go
Comment thread cmd/proxsave/main_modes.go Outdated
Comment thread cmd/proxsave/main_state.go Outdated
Comment thread internal/backup/collector_bricks_pve_storage.go
Comment thread internal/orchestrator/backup_run_phases.go
Comment thread internal/orchestrator/orchestrator.go
Comment thread internal/orchestrator/restore_cluster_apply.go Outdated
Comment thread internal/orchestrator/restore_decompression.go
Comment thread internal/orchestrator/restore_zfs.go Outdated
tis24dev added 8 commits May 5, 2026 14:05
Add new unit tests for the orchestrator package. Tests cover estimatedBackupSizeGB scaling and minimum, BackupError wrapping and default messages for disk validation, backup metrics exit code logic, filling backup timing fields, and building backup collector config merging runtime excludes and blacklist. Also add restore tests for extractPlainArchive honoring skip functions and for runSafeClusterApplyWithUI ensuring storage/datacenter apply is skipped when storage_pve is staged.
Return the first capture error from PVE collection loops (so collection continues but callers get notified) and surface errors from schedule captures. Add tests to assert capture-error behavior for backup history, replication status, and schedule collection. Introduce lookupAbsolutePath and use it to resolve systemctl/mailq paths (and use TrustedCommandContext/commandForMailTool) to make mail/service checks more robust. Clear stale restore abort info at start of runRestoreWorkflowWithUI and add a test ensuring abort info is cleared before validation.
Ignore build metadata when comparing semantic versions by trimming any '+' suffix in isNewerVersion, and add unit tests covering build-metadata cases. Also update related docs and metadata: fix Codacy instructions YAML frontmatter formatting, update the PayPal donate link in README, and rename a config key in EXAMPLES.md (BACKUP_PXAR_FILES -> PXAR_SCAN_ENABLE).
Report failed files in restore summary and refine logging/guards. The restore summary now prints the number of failed files and includes them in the total archive count; summary log messages omit the "see detailed log" hint when no log path is configured. shouldSkipRestoreEntryTarget was tightened to match the /etc/pve boundary (exact match or trailing slash) to avoid false positives. Also changed a PBS user-list parse log from Debug to Warning and adjusted a couple of tests to account for whitespace-only config path handling. Added tests covering the restore summary and /etc/pve boundary behavior.
Add Lchown and UtimesNano to the FS interface and implement them in osFS and test fakes; update restore code to use restoreFS.Lchown/UtimesNano instead of direct os/syscall calls. Simplify environment runCommand to use safeexec.TrustedCommandContext and update tests to call echo via LookPath. Refactor Proxmox notification header parsing: introduce parseSectionHeader with a regexp-based validation and keep parseProxmoxNotificationHeader as a wrapper; update callers. Move panic handling in main lifecycle into a deferred function and remove an extra success print in config apply. Also include minor docs/formatting tweaks and add a test simulating execCommand failure in namespaces tests.
Centralize and harden command execution: introduce runAndClassifyCommand with commandRunOptions/result and classification enums; refactor safeCmdOutput and captureCommandOutput to reuse this logic, improving logging, output summarization, unprivileged-container handling, systemctl special cases, and dry-run/lookpath handling.

Runtime and signal updates: detect unprivileged container earlier in bootstrap (rt.unprivilegedInfo), remove duplicate detection in logging, stop signal notification on exit, and make stdin-close handling safe via a scoped sync.Once. Also fix heap profiling to defer file close.

Other changes: make pveContext methods use pointer receivers and nil-checks for safety; enforce Pushover webhook auth (require Token and User) and add tests; expand safeexec tests to validate the list of allowed commands.
Propagate filesystem detection errors for cloud backends so callers get diagnostics (detectFilesystemInfo now returns an error for LocationCloud and backup init logs include the failure reason). Adjust backup storage init logging to include the detection reason and mark cloud disabled. Change validateModeCompatibility to accumulate and return all compatibility violations (with a new test added). Rename helper 'newer' to 'meetsMinimum' for clarity and normalize directory mode literal to 0o755. Tests updated/added to cover the new behaviors.
Pass context.Context through ZFS restore helper functions and tests so operations can be cancelled and use caller-provided contexts. checkZFSPoolsAfterRestore, detectImportableZFSPools and logNoImportableZFSPools now accept a context, check ctx.Err, and pass ctx to restoreCmd.Run. The UI caller was updated to forward the workflow context. Tests were updated to supply contexts and two new tests verify context propagation and behavior on canceled contexts. FakeCommandRunner was extended to record contexts for assertions.
tis24dev added 6 commits May 5, 2026 22:00
Change decompression reader APIs to return io.ReadCloser and ensure readers are closed with errors propagated. Introduce closeDecompressionReader helper to defer Close() and convert close errors into the function's error return. Update callers to use the helper and simplify test cleanup; add a test (TestExtractArchiveNativeReturnsDecompressionCloseError) that verifies decompressor Close() errors surface. Also adjust imports and minor test fixes to read directly from readers.
Refactor how VM and storage configs are applied: storageBlock now stores Type and proxmoxNotificationEntry entries instead of raw data, and VM/storage config files are parsed into --key=value pvesh arguments rather than written to temp files and passed via --filename/-conf. Added parsing helpers (parseColonConfigLine, pveshArgsFromColonConfigFile/Lines, pveshArgsFromProxmoxEntries, storageBlockPveshArgs, storageEntryValue) and updated applyVMConfigs/applyStorageCfg to build proper pvesh calls (e.g. "pvesh create /storage --storage=... --type=... ..."). Tests were updated to reflect the new argument format and to assert that deprecated flags (--filename, -conf) are no longer used.
Add a new ExitEncryptionError exit code and string mapping to represent encryption setup/processing failures. Update createBackupArchive to report the "encryption" phase and use ExitEncryptionError when age recipient preparation fails. Add tests: backup_run_phases_test verifies age-recipient failures are classified as encryption, and exit_codes_test updated to include the new code. Also fix RunGoBackup to return the collected stats value when prepareBackupWorkspace fails so callers receive consistent results.
Change writeArchiveChecksum to return an error and propagate failures: writeArchiveChecksum now returns a wrapped error on write failure (uses 0o640 file mode) and verifyAndWriteBackupArtifacts returns a BackupError when checksum writing fails. Make server identity detection injectable for testing by introducing runtimeServerIdentityDetector and only run detection when ServerID is not configured; add unit tests for initializeServerIdentity. Add unit test ensuring checksum write errors are propagated. Also add Node.js 24 setup to the Dependabot automerge workflow so Dependabot metadata step runs under Node 24.
Add isContextCancellationError helper and use it across PVE collection bricks and storage/metadata steps so context cancellations are propagated (returning the ctx error) instead of being treated as regular failures. Add tests ensuring guest and storage bricks propagate cancellations. Include new standalone collection flags (BackupPBSNotificationsPriv, BackupRootHome, BackupScriptRepository, BackupUserHomes) in collectionOptionFlags and add tests validating they are accepted. Change validateCloudSettings to accept absolute cloud remote refs with a new isAbsoluteCloudRemoteRef helper and add tests for absolute refs and path validation. Remove unused setBackupResult from main_state.go and update related imports in tests.
Introduce simAppCompletionTimeout and group timing constants. Improve test timeout handling by reporting errors and stopping the app if the initial draw doesn't occur or the simulation doesn't finish within the completion timeout after injecting keys. Also properly stop and reset the timer to avoid races.
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
cmd/proxsave/runtime_helpers.go (1)

221-243: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

resolveHostname has no command timeout.

safeexec.TrustedCommandContext(context.Background(), …) runs hostname -f with no deadline. If the host has slow or broken DNS resolution (common on isolated/edge nodes), hostname -f can block for tens of seconds — and crucially, the os.Hostname() fallback never executes while the command is hung. Since resolveHostname is called early during runtime startup, this can stall a backup run before any of the configured timeouts kick in.

🛡️ Suggested fix: bound the `hostname -f` invocation
 func resolveHostname() string {
 	if path, err := exec.LookPath("hostname"); err == nil {
-		cmd, cmdErr := safeexec.TrustedCommandContext(context.Background(), path, "-f")
+		ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
+		defer cancel()
+		cmd, cmdErr := safeexec.TrustedCommandContext(ctx, path, "-f")
 		if cmdErr == nil {
 			if out, err := cmd.Output(); err == nil {
 				if fqdn := strings.TrimSpace(string(out)); fqdn != "" {
 					return fqdn
 				}
 			}
 		}
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/proxsave/runtime_helpers.go` around lines 221 - 243, The call to
safeexec.TrustedCommandContext in resolveHostname has no deadline so hostname -f
can hang; wrap the command creation and execution in a context with a short
timeout (e.g., context.WithTimeout) and call cancel() after use, then run
TrustedCommandContext with that timed context and use cmd.Output(); if the
context times out or returns an error, fall back immediately to os.Hostname()
(preserving the existing trimming/unknown logic). Ensure you reference
resolveHostname and safeexec.TrustedCommandContext, add the timeout import
(time) and properly cancel the context to avoid leaks.
internal/backup/collector.go (1)

1274-1287: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

PBS_REPOSITORY rewriting breaks for custom ports and IPv6 addresses.

The code uses SplitN(repoWithDatastore, ":", 2) to replace the datastore portion, but PBS_REPOSITORY format includes optional port numbers and IPv6 addresses in square brackets, all separated by colons:

  • Custom port: backup-server:8007:store1 → incorrectly becomes backup-server:newds (port lost)
  • IPv6 address: [ff80::51]:8007:store1 → becomes mangled/invalid (splits inside IPv6 literal)

This silently generates invalid repository strings that would fail subsequent PBS operations.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/backup/collector.go` around lines 1274 - 1287, The current
replacement using strings.SplitN(repoWithDatastore, ":", 2) drops ports and
mangles IPv6 literals; instead locate the true datastore-separator (the final
colon that is not inside an IPv6 bracket) and replace only the substring after
it with datastoreName. Concretely, in the block that sets repoWithDatastore
(using c.config.PBSRepository, repoWithDatastore, datastoreName), detect IPv6 by
checking for a leading '['—if present find the matching ']' and then search for
a ':' after that; otherwise use strings.LastIndex to find the last ':'; if such
a separator exists replace everything after that index with datastoreName, else
append ":"+datastoreName. This preserves host, optional port, and IPv6 formats
while swapping the datastore.
♻️ Duplicate comments (2)
cmd/proxsave/main_modes.go (1)

226-230: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

runInstallTUI still runs unconditionally before --force-cli is honored — prior critical issue not addressed.

Lines 227–230 still execute runInstallTUI first and only afterward overwrite err with runInstall when args.ForceCLI is set. This means --install --force-cli continues to launch the TUI installer (terminal takeover, prompts, potential file mutations), silently discards its returned error, and only then runs the CLI installer. The dispatcher's debug log on line 220 (mode=install cli=%v) clearly states the intent is mutual exclusion.

🐛 Proposed fix
-	err := runInstallTUI(ctx, args.ConfigPath, bootstrap)
-	if args.ForceCLI {
-		err = runInstall(ctx, args.ConfigPath, bootstrap)
-	}
+	var err error
+	if args.ForceCLI {
+		err = runInstall(ctx, args.ConfigPath, bootstrap)
+	} else {
+		err = runInstallTUI(ctx, args.ConfigPath, bootstrap)
+	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/proxsave/main_modes.go` around lines 226 - 230, The installer always
calls runInstallTUI unconditionally and then overwrites err with runInstall when
args.ForceCLI is true; change the dispatch so only one of the installers runs:
check args.ForceCLI first and call runInstall(ctx, args.ConfigPath, bootstrap)
when true, otherwise call runInstallTUI(ctx, args.ConfigPath, bootstrap); ensure
only the chosen call sets err (do not call both or discard the first error).
Reference runInstallTUI, runInstall, args.ForceCLI and err to locate the code to
modify.
internal/backup/collector.go (1)

299-326: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate() still rejects custom-path-only backups.

hasCollectionOptionEnabled() now covers the new boolean toggles, but it still ignores len(c.CustomBackupPaths). A config that disables the built-in collectors and backs up only explicit custom paths will still fail with at least one backup option must be enabled.

Suggested fix
 func (c *CollectorConfig) hasCollectionOptionEnabled() bool {
+	if len(c.CustomBackupPaths) > 0 {
+		return true
+	}
 	for _, enabled := range c.collectionOptionFlags() {
 		if enabled {
 			return true
 		}
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/backup/collector.go` around lines 299 - 326, The validation
currently treats a config with only CustomBackupPaths as invalid because
hasCollectionOptionEnabled() only inspects collectionOptionFlags(); update
hasCollectionOptionEnabled() (or collectionOptionFlags() callers) to also
consider len(c.CustomBackupPaths) > 0 so a config that disables all boolean
collectors but supplies CustomBackupPaths passes Validate(); reference
CollectorConfig.hasCollectionOptionEnabled,
CollectorConfig.collectionOptionFlags, and the CustomBackupPaths field when
applying the change.
🧹 Nitpick comments (6)
internal/backup/collector_config_extra_test.go (1)

36-54: ⚡ Quick win

LGTM — consider extending coverage to assert defaults are applied.

The table-driven structure is clean and correctly uses t.Run/t.Fatalf. The test verifies acceptance but, unlike the existing BackupVMConfigs case (lines 27–31), it does not assert that Validate() applies expected defaults (e.g., PxarDatastoreConcurrency) for the new standalone options. All four new fields (BackupPBSNotificationsPriv, BackupRootHome, BackupScriptRepository, BackupUserHomes) are included in collectionOptionFlags(), so Validate() will invoke normalizePxarConcurrency() and apply the default value of 3. For consistency and complete coverage, this assertion should be added.

✨ Proposed extension to also assert defaults are applied
 	for _, tt := range tests {
 		t.Run(tt.name, func(t *testing.T) {
 			if err := tt.cfg.Validate(); err != nil {
 				t.Fatalf("Validate() error = %v", err)
 			}
+			if tt.cfg.PxarDatastoreConcurrency != 3 {
+				t.Fatalf("defaults not applied after Validate(): PxarDatastoreConcurrency = %d", tt.cfg.PxarDatastoreConcurrency)
+			}
 		})
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/backup/collector_config_extra_test.go` around lines 36 - 54, Update
TestCollectorConfigValidateAcceptsNewStandaloneCollectionOptions to also assert
that Validate() applies the default pxar concurrency: after calling
tt.cfg.Validate(), check that tt.cfg.PxarDatastoreConcurrency equals the
expected default (3); this mirrors the existing BackupVMConfigs check and
ensures that CollectorConfig.Validate() (which uses collectionOptionFlags() and
normalizePxarConcurrency()) sets the default value for the new standalone flags
(BackupPBSNotificationsPriv, BackupRootHome, BackupScriptRepository,
BackupUserHomes).
docs/RESTORE_GUIDE.md (2)

72-97: 💤 Low value

Consider clarifying automatic category filtering behavior.

The compatibility model is well explained, but the section could be slightly clearer that partial compatibility automatically filters the selected categories to those compatible with the current host (rather than requiring manual selection). This behavior is mentioned elsewhere (e.g., line 245, line 519), but adding a brief note here would help readers understand the outcome immediately.

For example, after line 86:

 - **Incompatible**: backup and host share no role
+
+ProxSave automatically filters the category selection to match the current
+host's capabilities when partial compatibility is detected.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/RESTORE_GUIDE.md` around lines 72 - 97, In the "System Types and
Compatibility" section add a brief sentence after the examples (e.g., after the
"dual/pve/pbs" examples) stating that when a backup is only partially
compatible, the restore process automatically filters the selectable restore
categories to only those roles that match the current host (so users do not need
to manually deselect incompatible categories); reference the section heading
"System Types and Compatibility" and the examples for placement and mirror the
wording used elsewhere (lines referencing automatic filtering) to keep
terminology consistent.

2502-2512: ⚡ Quick win

Consider rewording for clarity on cross-role restore support.

The FAQ answer starts with "Direct cross-role restore is still not recommended" which may initially confuse readers, given that the section then explains how cross-role restore IS supported via role overlap. Consider rephrasing to distinguish between:

  • Not recommended: pure cross-role with no overlap (e.g., pvepbs only common categories)
  • Supported with automatic filtering: cross-role with role overlap (e.g., dualpve)

Suggested rewording:

-A: Direct cross-role restore is still not recommended. PVE and PBS have
-different role-specific configurations. However, ProxSave now evaluates
-compatibility by **role overlap**:
+A: ProxSave evaluates compatibility by **role overlap** rather than requiring
+exact type matches. Cross-role restore is supported when the backup and current
+host share at least one role (partial compatibility):

This makes it immediately clear that the feature is supported while still cautioning about the limitations.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/RESTORE_GUIDE.md` around lines 2502 - 2512, The opening sentence is
ambiguous — change "Direct cross-role restore is still not recommended" to
explicitly distinguish pure cross-role restores (no role overlap) from supported
overlap cases: e.g., replace with "Pure cross-role restore (no role overlap) is
not recommended; however ProxSave supports restores when roles overlap." Keep
the role examples (`pve`, `pbs`, `dual`) and the bullet list as-is, and ensure
the warning about limitations follows that clarified sentence so readers
immediately see that `dual`↔`pve`/`pbs` overlaps are supported but `pve`→`pbs`
without overlap is discouraged.
internal/orchestrator/restore_cluster_apply.go (2)

77-100: 💤 Low value

Minor: redundant os.IsNotExist after errors.Is(err, os.ErrNotExist).

errors.Is(err, os.ErrNotExist) already handles wrapped *PathError/syscall.ENOENT cases that the older os.IsNotExist recognized, so the second clause is dead. Not a bug, but it adds noise.

♻️ Simplify the not-exist check
-		if errors.Is(err, os.ErrNotExist) || os.IsNotExist(err) {
+		if errors.Is(err, os.ErrNotExist) {
 			return nil, nil
 		}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/orchestrator/restore_cluster_apply.go` around lines 77 - 100, In
listExportNodeDirs remove the redundant os.IsNotExist(err) branch: keep only
errors.Is(err, os.ErrNotExist) in the error check inside listExportNodeDirs so
the function returns nil, nil for missing nodesRoot without the extra
os.IsNotExist call; update the if condition that currently reads if
errors.Is(err, os.ErrNotExist) || os.IsNotExist(err) to just errors.Is(err,
os.ErrNotExist).

215-222: 💤 Low value

Naming: detectNodeForVM ignores the VM and just returns the local short hostname.

The signature suggests per-VM resolution, but the function takes no arguments and returns the local host for every call inside applyVMConfigs. Either accept a vmEntry (or the source node) and use it, or rename to something like localNodeName() so callers don't assume per-VM logic.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/orchestrator/restore_cluster_apply.go` around lines 215 - 222, The
function detectNodeForVM misleadingly suggests VM-specific resolution but always
returns the local short hostname; update it to either accept the VM or source
node (e.g., add a parameter like vmEntry or sourceNode) and use that value to
determine the node for per-VM logic in applyVMConfigs, or rename the function to
localNodeName() to make its behavior explicit; also update all callers (notably
usages inside applyVMConfigs) to pass the VM/source when you choose the first
option or to reflect the new name if you choose the renaming option.
internal/backup/collector_bricks_test.go (1)

126-157: ⚡ Quick win

Resolve the target brick by BrickID instead of slice index.

Both cancellation tests assume [0] stays the intended brick. Any harmless reordering inside the brick builders will break these tests without changing runtime behavior.

Suggested refactor
-	err := newPVEGuestBricks()[0].Run(ctx, state)
+	guestBrick := requireBrick(t, recipe{Name: "pve-guest", Bricks: newPVEGuestBricks()}, brickPVEVMQEMUConfigs)
+	err := guestBrick.Run(ctx, state)
 	if !errors.Is(err, context.Canceled) {
 		t.Fatalf("guest brick error = %v, want %v", err, context.Canceled)
 	}
@@
-	err := newPVEStorageProbeBricks()[0].Run(ctx, state)
+	probeBrick := requireBrick(t, recipe{Name: "pve-storage-probe", Bricks: newPVEStorageProbeBricks()}, brickPVEStorageProbe)
+	err := probeBrick.Run(ctx, state)
 	if !errors.Is(err, context.Canceled) {
 		t.Fatalf("storage probe brick error = %v, want %v", err, context.Canceled)
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/backup/collector_bricks_test.go` around lines 126 - 157, Tests
TestPVEGuestBrickPropagatesQEMUContextCancellation and
TestPVEStorageProbeBrickPropagatesContextCancellation are brittle because they
pick bricks by slice index ([0]); instead resolve the desired brick by its
BrickID (e.g., search the slice returned by newPVEGuestBricks() and
newPVEStorageProbeBricks() for the brick whose BrickID matches the intended
brick constant/name) and call Run on that found brick; update both tests to find
the brick via its BrickID, fail the test if not found, and then invoke Run with
the cancelled context to assert context.Canceled behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/dependabot-automerge.yml:
- Line 20: Replace the mutable action tags with immutable full commit SHAs:
change uses: actions/setup-node@v4 and uses: dependabot/fetch-metadata@v3 to use
their respective full-length commit SHAs (e.g., actions/setup-node@<full-sha>
and dependabot/fetch-metadata@<full-sha>), and add inline comments preserving
the human-readable tag (e.g., # pinned from actions/setup-node@v4) so future
reviewers know the original version while ensuring the workflow executes an
immutable commit.

In `@cmd/proxsave/main_lifecycle.go`:
- Around line 34-45: The current panic recovery is ineffective because
finishMainRun registers an inner deferred anonymous function which means
recover() is not called directly from the deferred function of run(); fix by
moving the recover() logic into finishMainRun's body (i.e., call recover()
directly in finishMainRun rather than inside another deferred func), so when
run() does defer finishMainRun(runInfo) it can catch panics; inside
finishMainRun use a single defer that executes cleanup (logging via
run.bootstrap.Error and logging.DebugStepBootstrap with
run.state.finalExitCode), call run.runDone with an error when a panic was
recovered, and call os.Exit(types.ExitPanicError.Int()) after logging—remove the
extra nested deferred anonymous function and ensure the recovered panic value
and stack (debug.Stack()) are included in the log calls.

In `@cmd/proxsave/main_update.go`:
- Around line 27-39: The function checkForUpdates should guard against a nil ctx
before calling context.WithTimeout to avoid a panic: at the start of
checkForUpdates, if ctx == nil set ctx = context.Background(), then proceed to
create checkCtx, cancel := context.WithTimeout(ctx, 5*time.Second) and defer
cancel() as before; reference the checkForUpdates function and the use of
context.WithTimeout to locate where to add this nil-check and fallback.

In `@internal/backup/collector_bricks_pbs.go`:
- Around line 79-83: CollectPBSConfigs() currently only uses newPBSRecipe()
which appends manifest/runtime/inventory/feature/finalize bricks, so
newPBSUserConfigRecipe() (which registers the user ID/token snapshot/aggregation
bricks) is never executed; modify the PBS config collection flow to include the
user-config recipe by calling newPBSUserConfigRecipe() (or appending its bricks)
alongside newPBSRecipe() inside CollectPBSConfigs(), ensuring the bricks from
newPBSUserConfigRecipe() are appended in the correct order relative to
newPBSManifestBricks(), newPBSRuntimeBricks(), newPBSInventoryBricks(),
newPBSFeatureBricks(), and newPBSFinalizeBricks().

In `@internal/notify/email.go`:
- Around line 687-696: The function lookupAbsolutePath currently calls
exec.LookPath and then resolves non-absolute results with filepath.Abs, which
can allow CWD-relative binaries; change lookupAbsolutePath to reject any
non-absolute execPath returned by exec.LookPath (i.e., if
!filepath.IsAbs(execPath) return an error instead of calling filepath.Abs) so
callers like safeexec.TrustedCommandContext / ValidateTrustedExecutablePath only
see absolute paths from the OS PATH; include a clear error (e.g., via fmt.Errorf
or errors.New) referencing the original execPath.

In `@internal/orchestrator/deps_test.go`:
- Around line 190-192: The FakeFS.Lchown implementation should not call the real
os.Lchown; replace that syscall with a sandboxed simulation that updates
FakeFS's in-memory metadata for the target path (use the existing onDisk(path)
lookup to validate existence), e.g., maintain/update an ownership map or file
metadata struct keyed by path and set uid/gid there, returning appropriate
errors (like os.ErrNotExist) when the path is missing, so tests that replay
ownership changes do not require root/CAP_CHOWN; modify or add the ownership
store on FakeFS and change Lchown to update it instead of delegating to
os.Lchown.

In `@internal/orchestrator/restore_cluster_apply.go`:
- Around line 186-213: The code is emitting args from snapshot/section headers
because pveshArgsFromColonConfigFile (and its helper
pveshArgsFromColonConfigLines) are section-unaware; update those functions so
they stop emitting key:value pairs once a bracketed section header is
encountered (line trimmed startsWith "[" ) — i.e. parse only the active
pre-section portion of the conf (trim whitespace, ignore blank lines until a
line beginning with '[' then break), then build --key=value args only from that
pre-section slice; apply the same change to the other call-site variant used
around the later block (the pveshArgsFromColonConfigLines path referenced in the
diff) so snapshot keys (parent, snaptime, vmstate, runningmachine, etc.) are
never passed to pvesh set.
- Around line 192-213: The restore loop currently only calls pvesh "set" which
fails if the guest doesn't exist; update the logic in restore_cluster_apply.go
(the loop using detectNodeForVM(), pveshArgsFromColonConfigFile(), runPvesh(),
and vm.VMID/vm.Kind) to first check for existence (e.g. GET
/nodes/{node}/{qemu|lxc}/{vmid}/config or attempt a pvesh "get" and detect the
not-found error) and if absent call the appropriate pvesh "create" endpoint
(/nodes/{node}/qemu or /nodes/{node}/lxc) with --vmid=<id> and required create
args (for LXC include ostemplate) to materialize the guest, then proceed to run
the existing "set" call; ensure error handling/logging mirrors the current
pattern and increment failed/applied counters accordingly.

In `@README.md`:
- Line 17: The README contains a PayPal donation badge that embeds a personal
email in the URL
("https://www.paypal.com/cgi-bin/webscr?cmd=_donations&business=damigioanna%40gmail.com");
replace that badge link target with a non-PII redirect such as a paypal.me URL
(or a generic project/donation landing page) so the raw README no longer exposes
the email, updating the markdown badge link (the existing "[![💸
Donate](...)](...)" entry) to point to the new paypal.me or landing URL and
leaving the visible badge text unchanged.

---

Outside diff comments:
In `@cmd/proxsave/runtime_helpers.go`:
- Around line 221-243: The call to safeexec.TrustedCommandContext in
resolveHostname has no deadline so hostname -f can hang; wrap the command
creation and execution in a context with a short timeout (e.g.,
context.WithTimeout) and call cancel() after use, then run TrustedCommandContext
with that timed context and use cmd.Output(); if the context times out or
returns an error, fall back immediately to os.Hostname() (preserving the
existing trimming/unknown logic). Ensure you reference resolveHostname and
safeexec.TrustedCommandContext, add the timeout import (time) and properly
cancel the context to avoid leaks.

In `@internal/backup/collector.go`:
- Around line 1274-1287: The current replacement using
strings.SplitN(repoWithDatastore, ":", 2) drops ports and mangles IPv6 literals;
instead locate the true datastore-separator (the final colon that is not inside
an IPv6 bracket) and replace only the substring after it with datastoreName.
Concretely, in the block that sets repoWithDatastore (using
c.config.PBSRepository, repoWithDatastore, datastoreName), detect IPv6 by
checking for a leading '['—if present find the matching ']' and then search for
a ':' after that; otherwise use strings.LastIndex to find the last ':'; if such
a separator exists replace everything after that index with datastoreName, else
append ":"+datastoreName. This preserves host, optional port, and IPv6 formats
while swapping the datastore.

---

Duplicate comments:
In `@cmd/proxsave/main_modes.go`:
- Around line 226-230: The installer always calls runInstallTUI unconditionally
and then overwrites err with runInstall when args.ForceCLI is true; change the
dispatch so only one of the installers runs: check args.ForceCLI first and call
runInstall(ctx, args.ConfigPath, bootstrap) when true, otherwise call
runInstallTUI(ctx, args.ConfigPath, bootstrap); ensure only the chosen call sets
err (do not call both or discard the first error). Reference runInstallTUI,
runInstall, args.ForceCLI and err to locate the code to modify.

In `@internal/backup/collector.go`:
- Around line 299-326: The validation currently treats a config with only
CustomBackupPaths as invalid because hasCollectionOptionEnabled() only inspects
collectionOptionFlags(); update hasCollectionOptionEnabled() (or
collectionOptionFlags() callers) to also consider len(c.CustomBackupPaths) > 0
so a config that disables all boolean collectors but supplies CustomBackupPaths
passes Validate(); reference CollectorConfig.hasCollectionOptionEnabled,
CollectorConfig.collectionOptionFlags, and the CustomBackupPaths field when
applying the change.

---

Nitpick comments:
In `@docs/RESTORE_GUIDE.md`:
- Around line 72-97: In the "System Types and Compatibility" section add a brief
sentence after the examples (e.g., after the "dual/pve/pbs" examples) stating
that when a backup is only partially compatible, the restore process
automatically filters the selectable restore categories to only those roles that
match the current host (so users do not need to manually deselect incompatible
categories); reference the section heading "System Types and Compatibility" and
the examples for placement and mirror the wording used elsewhere (lines
referencing automatic filtering) to keep terminology consistent.
- Around line 2502-2512: The opening sentence is ambiguous — change "Direct
cross-role restore is still not recommended" to explicitly distinguish pure
cross-role restores (no role overlap) from supported overlap cases: e.g.,
replace with "Pure cross-role restore (no role overlap) is not recommended;
however ProxSave supports restores when roles overlap." Keep the role examples
(`pve`, `pbs`, `dual`) and the bullet list as-is, and ensure the warning about
limitations follows that clarified sentence so readers immediately see that
`dual`↔`pve`/`pbs` overlaps are supported but `pve`→`pbs` without overlap is
discouraged.

In `@internal/backup/collector_bricks_test.go`:
- Around line 126-157: Tests TestPVEGuestBrickPropagatesQEMUContextCancellation
and TestPVEStorageProbeBrickPropagatesContextCancellation are brittle because
they pick bricks by slice index ([0]); instead resolve the desired brick by its
BrickID (e.g., search the slice returned by newPVEGuestBricks() and
newPVEStorageProbeBricks() for the brick whose BrickID matches the intended
brick constant/name) and call Run on that found brick; update both tests to find
the brick via its BrickID, fail the test if not found, and then invoke Run with
the cancelled context to assert context.Canceled behavior.

In `@internal/backup/collector_config_extra_test.go`:
- Around line 36-54: Update
TestCollectorConfigValidateAcceptsNewStandaloneCollectionOptions to also assert
that Validate() applies the default pxar concurrency: after calling
tt.cfg.Validate(), check that tt.cfg.PxarDatastoreConcurrency equals the
expected default (3); this mirrors the existing BackupVMConfigs check and
ensures that CollectorConfig.Validate() (which uses collectionOptionFlags() and
normalizePxarConcurrency()) sets the default value for the new standalone flags
(BackupPBSNotificationsPriv, BackupRootHome, BackupScriptRepository,
BackupUserHomes).

In `@internal/orchestrator/restore_cluster_apply.go`:
- Around line 77-100: In listExportNodeDirs remove the redundant
os.IsNotExist(err) branch: keep only errors.Is(err, os.ErrNotExist) in the error
check inside listExportNodeDirs so the function returns nil, nil for missing
nodesRoot without the extra os.IsNotExist call; update the if condition that
currently reads if errors.Is(err, os.ErrNotExist) || os.IsNotExist(err) to just
errors.Is(err, os.ErrNotExist).
- Around line 215-222: The function detectNodeForVM misleadingly suggests
VM-specific resolution but always returns the local short hostname; update it to
either accept the VM or source node (e.g., add a parameter like vmEntry or
sourceNode) and use that value to determine the node for per-VM logic in
applyVMConfigs, or rename the function to localNodeName() to make its behavior
explicit; also update all callers (notably usages inside applyVMConfigs) to pass
the VM/source when you choose the first option or to reflect the new name if you
choose the renaming option.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: acb6580f-ac79-472f-9020-7e5606409aad

📥 Commits

Reviewing files that changed from the base of the PR and between c7b1caa and 532ebe4.

📒 Files selected for processing (72)
  • .github/instructions/codacy.instructions.md
  • .github/workflows/dependabot-automerge.yml
  • README.md
  • cmd/proxsave/backup_storage.go
  • cmd/proxsave/main_config_modes.go
  • cmd/proxsave/main_defers.go
  • cmd/proxsave/main_identity.go
  • cmd/proxsave/main_identity_test.go
  • cmd/proxsave/main_lifecycle.go
  • cmd/proxsave/main_modes.go
  • cmd/proxsave/main_modes_test.go
  • cmd/proxsave/main_runtime.go
  • cmd/proxsave/main_runtime_log.go
  • cmd/proxsave/main_signals.go
  • cmd/proxsave/main_state.go
  • cmd/proxsave/main_update.go
  • cmd/proxsave/runtime_helpers.go
  • cmd/proxsave/runtime_helpers_more_test.go
  • cmd/proxsave/version_helpers_test.go
  • docs/EXAMPLES.md
  • docs/RESTORE_GUIDE.md
  • internal/backup/collector.go
  • internal/backup/collector_bricks.go
  • internal/backup/collector_bricks_pbs.go
  • internal/backup/collector_bricks_pve.go
  • internal/backup/collector_bricks_pve_finalize.go
  • internal/backup/collector_bricks_pve_jobs.go
  • internal/backup/collector_bricks_pve_storage.go
  • internal/backup/collector_bricks_test.go
  • internal/backup/collector_config_extra_test.go
  • internal/backup/collector_pve.go
  • internal/backup/collector_pve_capture_errors_test.go
  • internal/backup/collector_pve_patterns_test.go
  • internal/backup/collector_pve_util_test.go
  • internal/config/config.go
  • internal/config/config_test.go
  • internal/environment/detect.go
  • internal/environment/detect_additional_test.go
  • internal/notify/email.go
  • internal/notify/webhook.go
  • internal/notify/webhook_test.go
  • internal/orchestrator/additional_helpers_test.go
  • internal/orchestrator/backup_run_helpers.go
  • internal/orchestrator/backup_run_helpers_additional_test.go
  • internal/orchestrator/backup_run_phases.go
  • internal/orchestrator/backup_run_phases_test.go
  • internal/orchestrator/decompress_reader_test.go
  • internal/orchestrator/deps.go
  • internal/orchestrator/deps_test.go
  • internal/orchestrator/nic_mapping.go
  • internal/orchestrator/orchestrator.go
  • internal/orchestrator/resolv_conf_repair.go
  • internal/orchestrator/restore_archive.go
  • internal/orchestrator/restore_archive_additional_test.go
  • internal/orchestrator/restore_archive_entries.go
  • internal/orchestrator/restore_archive_extract.go
  • internal/orchestrator/restore_archive_extract_summary_test.go
  • internal/orchestrator/restore_cluster_apply.go
  • internal/orchestrator/restore_cluster_apply_additional_test.go
  • internal/orchestrator/restore_coverage_extra_test.go
  • internal/orchestrator/restore_decision.go
  • internal/orchestrator/restore_decompression.go
  • internal/orchestrator/restore_errors_test.go
  • internal/orchestrator/restore_notifications.go
  • internal/orchestrator/restore_test.go
  • internal/orchestrator/restore_workflow_abort_test.go
  • internal/orchestrator/restore_workflow_ui.go
  • internal/orchestrator/restore_zfs.go
  • internal/pbs/namespaces_test.go
  • internal/safeexec/safeexec_test.go
  • internal/types/exit_codes.go
  • internal/types/exit_codes_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • cmd/proxsave/main_defers.go
  • cmd/proxsave/main_runtime.go

Comment thread .github/workflows/dependabot-automerge.yml Outdated
Comment thread cmd/proxsave/main_lifecycle.go
Comment thread cmd/proxsave/main_update.go
Comment thread internal/backup/collector_bricks_pbs.go
Comment thread internal/notify/email.go
Comment thread internal/orchestrator/deps_test.go
Comment thread internal/orchestrator/restore_cluster_apply.go
Comment thread internal/orchestrator/restore_cluster_apply.go Outdated
Comment thread README.md Outdated
tis24dev added 2 commits May 6, 2026 12:17
Ensure App.Stop is safe when called before Run by adding run state tracking, a mutex and a stopRequested flag; implement Run and markRunningAndStopIfRequested to defer stopping until the event loop starts. Add synchronized test helpers (setTimedSimAppStopper, stopTimedSimAppForTest) and protect simulated apps with mutexes so tests can reliably stop current TUI instances. Increase several simulation timeouts and update runDecryptWorkflowTUIForTest to use a cancellable context and trigger a timed shutdown path. Also add a test verifying pre-run Stop terminates RunWithContext as expected.
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/orchestrator/decrypt_tui_e2e_helpers_test.go (1)

249-253: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

timedSimKey.Wait is ignored in the injection loop.

Line 250 only treats k.Wait as a boolean, while waitForScreenText always uses the shared 10s timeout. That means the per-step pacing encoded in the sequences (35ms, 100ms, 750ms, etc.) no longer affects when the next key is injected, which can bring back the race this helper is trying to eliminate.

💡 One localized way to preserve the configured step pacing
 				for _, k := range keys {
 					if k.Wait > 0 {
 						if !waitForScreenText(k.WaitForText) {
 							return
 						}
+						timer := time.NewTimer(k.Wait)
+						select {
+						case <-done:
+							timer.Stop()
+							return
+						case <-timer.C:
+						}
 					}
 					current := currentScreenState()
 					mod := k.Mod
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/orchestrator/decrypt_tui_e2e_helpers_test.go` around lines 249 -
253, The loop currently treats timedSimKey.Wait only as a boolean and always
uses the shared 10s timeout, which ignores per-step pacing; update the injection
loop that iterates over keys to actually use k.Wait as the wait duration (e.g.
convert k.Wait to a time.Duration and pass it into waitForScreenText or else
sleep for that duration before continuing). Specifically, change the branch
around timedSimKey.Wait/keys to call waitForScreenText with a timeout derived
from k.Wait (or call time.Sleep(time.Duration(k.Wait)*time.Millisecond) when
waitForText is empty) so each key uses its configured per-step delay instead of
the global 10s.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@internal/orchestrator/decrypt_tui_e2e_helpers_test.go`:
- Around line 249-253: The loop currently treats timedSimKey.Wait only as a
boolean and always uses the shared 10s timeout, which ignores per-step pacing;
update the injection loop that iterates over keys to actually use k.Wait as the
wait duration (e.g. convert k.Wait to a time.Duration and pass it into
waitForScreenText or else sleep for that duration before continuing).
Specifically, change the branch around timedSimKey.Wait/keys to call
waitForScreenText with a timeout derived from k.Wait (or call
time.Sleep(time.Duration(k.Wait)*time.Millisecond) when waitForText is empty) so
each key uses its configured per-step delay instead of the global 10s.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 70dc89d3-5fc0-41ee-a5d5-fcaa659c497d

📥 Commits

Reviewing files that changed from the base of the PR and between daddefa and da7a7e7.

📒 Files selected for processing (4)
  • internal/orchestrator/decrypt_tui_e2e_helpers_test.go
  • internal/orchestrator/tui_simulation_test.go
  • internal/tui/abort_context_test.go
  • internal/tui/app.go

Replace ad-hoc timed simulation plumbing with a reusable timedSimHarness for decrypt TUI e2e tests. Introduce timedSimHarness, timedSimAppState, and richer timedSimKey fields (RequireNewApp, SettleAfterMatch) plus new timing constants to make key injection and app lifecycle more deterministic and to improve diagnostics on timeouts. Update helper functions and tests to return/use the harness and tighten shutdown logic (StopAll, better timeout/error messages).

Also change RunTask startup/shutdown in workflow_ui_tui_decrypt.go to trigger the background task after the first draw (SetAfterDrawFunc), add start synchronization (started, startOnce) and queueProgressUpdate to safely schedule UI updates and ensure proper waiting for task completion on exit. Minor import adjustments (sync) included.
Pin GitHub Actions steps to specific SHAs for reproducible runs. Add a nil-context guard in checkForUpdates to avoid panics when ctx is nil. Include PBS user config collection by running newPBSUserConfigRecipe(). Change lookupAbsolutePath to return an error if exec.LookPath returns a non-absolute path rather than attempting to make it absolute. Enhance TUI e2e test helpers by adding a Wait field to timedSimKey and using it to control timeouts/sleeps. Update test FakeFS to simulate ownership changes via an Ownership map and a FakeOwnership struct in Lchown instead of calling os.Lchown.
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
internal/notify/email.go (1)

790-801: 💤 Low value

Optional: extract the duplicated mailq path lookup.

The two-step lookupAbsolutePath("mailq")/usr/bin/mailq fallback is duplicated verbatim in checkMailQueue and detectQueueEntry. A small helper would prevent these two call sites from drifting (e.g. when adding /usr/local/bin/mailq or improving the error message). Also note that the current fallback wraps to fmt.Errorf("mailq command not found") and discards the underlying LookPath error, which loses useful diagnostics.

♻️ Suggested helper
+func resolveMailqPath() (string, error) {
+	if path, err := lookupAbsolutePath("mailq"); err == nil {
+		return path, nil
+	}
+	path, err := lookupAbsolutePath("/usr/bin/mailq")
+	if err != nil {
+		return "", fmt.Errorf("mailq command not found: %w", err)
+	}
+	return path, nil
+}

Then in both call sites:

-	mailqPath, err := lookupAbsolutePath("mailq")
-	if err != nil {
-		mailqPath, err = lookupAbsolutePath("/usr/bin/mailq")
-		if err != nil {
-			return 0, fmt.Errorf("mailq command not found")
-		}
-	}
+	mailqPath, err := resolveMailqPath()
+	if err != nil {
+		return 0, err
+	}

Also applies to: 836-847

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/notify/email.go` around lines 790 - 801, Extract the duplicated
two-step mailq lookup into a single helper (e.g., findMailqPath) and replace the
direct calls in checkMailQueue and detectQueueEntry with that helper; the helper
should attempt lookupAbsolutePath("mailq"), then
lookupAbsolutePath("/usr/bin/mailq") (and any future paths) and return the
resolved path or a wrapped error that includes the underlying LookPath error(s)
instead of discarding them so commandForMailTool callers get detailed
diagnostics. Ensure the helper's name is used where mailqPath is currently
computed and that returned errors preserve context (e.g., fmt.Errorf("mailq
command not found: %w", err)).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cmd/proxsave/main_update.go`:
- Around line 89-127: The version parser currently strips prerelease suffixes,
so same numeric versions with/without prerelease are treated equal; update the
parse function (parse) to also detect and return whether a prerelease suffix was
present (e.g., add a bool like hasPrerelease), then update the comparison logic
(after curMaj/curMin/curPatch and latMaj/latMin/latPatch are computed) to, when
major/minor/patch are equal, treat a stable release (latest has
hasPrerelease==false and current has hasPrerelease==true) as newer (return
true), and treat the opposite (latest prerelease while current is stable) as not
newer (return false); keep existing numeric comparisons otherwise.

In `@internal/orchestrator/decrypt_tui_e2e_helpers_test.go`:
- Around line 289-296: The test blocks waiting on h.done which is only closed by
h.stop() (registered in t.Cleanup), so slow cleanup can trigger the timeout; add
a dedicated completion signal (e.g., h.runCompleted chan struct{}) and close it
when RunDecryptWorkflowTUI returns inside run(), then change this select to wait
on h.runCompleted (or include it alongside h.done) instead of relying solely on
h.done; keep h.stop()/t.Cleanup for teardown. Ensure to initialize
h.runCompleted in the harness, close it right after RunDecryptWorkflowTUI
returns in run(), and update the select in decrypt_tui_e2e_helpers_test.go to
listen for <-h.runCompleted.

---

Nitpick comments:
In `@internal/notify/email.go`:
- Around line 790-801: Extract the duplicated two-step mailq lookup into a
single helper (e.g., findMailqPath) and replace the direct calls in
checkMailQueue and detectQueueEntry with that helper; the helper should attempt
lookupAbsolutePath("mailq"), then lookupAbsolutePath("/usr/bin/mailq") (and any
future paths) and return the resolved path or a wrapped error that includes the
underlying LookPath error(s) instead of discarding them so commandForMailTool
callers get detailed diagnostics. Ensure the helper's name is used where
mailqPath is currently computed and that returned errors preserve context (e.g.,
fmt.Errorf("mailq command not found: %w", err)).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5e0f7117-086a-4e25-a0f4-71a9a8d0cf5d

📥 Commits

Reviewing files that changed from the base of the PR and between da7a7e7 and 5e36f47.

📒 Files selected for processing (8)
  • .github/workflows/dependabot-automerge.yml
  • cmd/proxsave/main_update.go
  • internal/backup/collector_pbs.go
  • internal/notify/email.go
  • internal/orchestrator/decrypt_tui_e2e_helpers_test.go
  • internal/orchestrator/decrypt_tui_e2e_test.go
  • internal/orchestrator/deps_test.go
  • internal/orchestrator/workflow_ui_tui_decrypt.go

Comment thread cmd/proxsave/main_update.go Outdated
Comment thread internal/orchestrator/decrypt_tui_e2e_helpers_test.go
Multiple fixes and feature additions across the codebase: update README PayPal link; improve finishMainRun panic handling and ensure runDone/logging are called; allow --install to choose CLI vs TUI; make isNewerVersion treat stable > prerelease and add tests; add timeout to hostname resolution. Backup collector: honor CustomBackupPaths, add pbsRepositoryWithDatastore helper and use it when building PBS_REPOSITORY, plus new tests. Email notifier: centralize mailq lookup (findMailqPath) and use it for queue checks. Orchestrator/restore: introduce localNodeName and helpers (pveshGuestExists, pveshCreateGuestArgs, pveshArgValue, isPveshNotFoundError), ensure missing guests are created before applying configs, stop parsing at section headers, and add tests for these behaviors. TUI tests: add runCompleted signaling to timedSimHarness and ensure RunDecryptWorkflowTUI signals completion. Misc: small test and brick-call refactors.
@tis24dev tis24dev closed this May 6, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants