
Add master-slave replication (rsync+SSH config sync)#42

Open
lionevil1 wants to merge 4 commits into SamNet-dev:main from lionevil1:main

Conversation

@lionevil1

Summary

Implements the Replication feature discussed in #39 — automatic config synchronization from a master server to one or more slaves via rsync+SSH on a configurable interval (default: 60s).

  • Synced files: secrets.conf, upstreams.conf, instances.conf, config.toml
  • Never synced: settings.conf, replication.conf (slave role always preserved)
  • Self-contained sync script at /opt/mtproxymax/mtproxymax-sync.sh
  • systemd timer + oneshot service (mtproxymax-sync.timer)
  • flock prevents overlapping sync runs
  • Migration guard in load_settings() ensures exclude list is always correct

New CLI

mtproxymax replication setup        # interactive wizard (master/slave/standalone)
mtproxymax replication status       # role, timer state, last sync, slave list
mtproxymax replication add <host> [port] [label]
mtproxymax replication remove <label>
mtproxymax replication sync         # trigger immediate sync
mtproxymax replication test [host]  # test SSH connectivity
mtproxymax replication logs         # view sync log
mtproxymax replication promote      # promote slave → master (failover)
mtproxymax replication enable/disable/reset

TUI

  • [r] Replication menu item added to main menu
  • Full show_replication_menu() with wizard, status, logs, sync

Status integration

mtproxymax status shows replication role when not standalone.

Tests

tests/test_replication.sh — 1027 lines, no Docker/SSH/systemd required:

  • save_replication / load_replication round-trip
  • replication_add / replication_remove CRUD validation
  • Settings persistence of all REPLICATION_* keys
  • Validation edge cases (invalid role, port out of range)
  • Migration guard (REPLICATION_EXCLUDE auto-append)
bash tests/test_replication.sh

Checklist

Closes #39

lionevil1 and others added 3 commits March 27, 2026 08:54
Implements Section 14b: automatic config synchronization from a master
server to one or more slave servers via rsync+SSH on a configurable
interval (default: 60s).

Synced files: secrets.conf, upstreams.conf, instances.conf, config.toml
Never synced: settings.conf, replication.conf (slave role preserved)

New commands:
  mtproxymax replication setup        — interactive wizard
  mtproxymax replication status       — role, timer state, last sync
  mtproxymax replication add <host>   — register a slave
  mtproxymax replication remove       — remove a slave
  mtproxymax replication sync         — trigger immediate sync
  mtproxymax replication test         — test SSH connectivity
  mtproxymax replication logs         — view sync log
  mtproxymax replication promote      — promote slave to master
  mtproxymax replication enable/disable/reset

Implementation:
- Self-contained sync script at /opt/mtproxymax/mtproxymax-sync.sh
- systemd timer + oneshot service (mtproxymax-sync.timer)
- flock prevents overlapping sync runs
- REPLICATION_EXCLUDE hardcoded to always include settings.conf and
  replication.conf — slave role can never be overwritten by master
- Migration guard in load_settings() ensures exclude list is correct
- SSH key auto-generated (ed25519), copied via ssh-copy-id
- TUI menu: [r] Replication in main menu + show_replication_menu()
- Status: replication role shown in mtproxymax status output

Tests: tests/test_replication.sh (1027 lines, no Docker/SSH required)
  covers save/load round-trip, CRUD, settings persistence, validation,
  migration guard

Closes SamNet-dev#39

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…changelog

- Add Replication (master-slave rsync+SSH) feature section after Telegram Bot
- Add Master-Slave Replication row to comparison table
- Add Replication commands to CLI Reference
- Merge v1.0.4 changelog: Replication + Engine v3.3.32 + SNI + metrics

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Why MTProxyMax: add Replication bullet
- Features: add Replication section (after Telegram Bot)
- Comparison table: add Master-Slave Replication row
- Architecture: include Replication diagram
- CLI Reference: add Replication commands block
- Changelog v1.0.4: merge Replication + Engine v3.3.32 + SNI + metrics

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Owner

@SamNet-dev SamNet-dev left a comment


Thanks for the solid work on this @lionevil1 — the architecture is sound, the exclude-list safety is impressive (triple-layered protection), and the test suite is genuinely thorough. A few things need addressing before we can merge:


Must Fix

1. Silent flock timeout in save_replication()

flock -w 5 201 2>/dev/null
mv "$tmp" "$REPLICATION_FILE"

If the sync script holds the lock, flock -w 5 times out after five seconds, its exit status is ignored, and the mv proceeds without the lock. It needs a failure guard:

flock -w 5 201 2>/dev/null || { log_error "Could not acquire lock"; rm -f "$tmp"; exec 201>&-; return 1; }

2. No dependency checks for rsync, ssh, ssh-keygen

None of these are verified before use. On minimal Alpine/Docker images they may be missing. A missing rsync would silently fail every 60 seconds. Add command -v checks at the top of the wizard and in do_sync().
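
A minimal sketch of such a check, assuming a hypothetical helper named check_replication_deps (not part of the PR). It reports every missing tool in one message rather than stopping at the first:

```bash
# Hypothetical helper: the name and message wording are illustrative only.
# Returns 0 when every listed command is on PATH, 1 otherwise.
check_replication_deps() {
    local missing="" cmd
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            missing="$missing $cmd"
        fi
    done
    if [ -n "$missing" ]; then
        echo "Missing required commands:$missing" >&2
        return 1
    fi
    return 0
}

# Possible call site at the top of the wizard and of do_sync():
#   check_replication_deps rsync ssh ssh-keygen || return 1
```

Calling it from do_sync() as well as the wizard would surface a removed package in the sync log instead of a silent failure on every timer tick.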

3. Hardcoded root@ — no configurable SSH user

Every SSH/rsync call forces root@${host}. Add a REPLICATION_SSH_USER setting (default root) so users with security policies prohibiting root SSH can use a dedicated sync user.
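
One way this could look, assuming the proposed REPLICATION_SSH_USER setting and a hypothetical helper (replication_ssh_target) that every SSH/rsync call site would use instead of a literal root@:

```bash
# REPLICATION_SSH_USER is the setting proposed above; it does not exist yet.
# replication_ssh_target is a hypothetical helper name.
replication_ssh_target() {
    local host="$1"
    # Fall back to root so existing installs keep their current behaviour.
    printf '%s@%s\n' "${REPLICATION_SSH_USER:-root}" "$host"
}

# Possible call site:
#   ssh "$(replication_ssh_target "$host")" true
```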

4. IPv6 addresses silently rejected

Host regex ^[a-zA-Z0-9._-]+$ rejects colons. The error message says "Use IP or FQDN" but doesn't mention IPv6 isn't supported. Either add bracketed IPv6 support ([2001:db8::1]) or document the limitation clearly in the error message.
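
A rough sketch of the bracketed-IPv6 option, assuming a hypothetical validate_replication_host function. It keeps the existing character rule for names and IPv4, and additionally accepts bracketed literals made only of hex digits and colons. It checks the character set, not whether the address is well-formed:

```bash
# Hypothetical validator; the function name and error text are assumptions.
validate_replication_host() {
    local host="$1" inner
    # Existing rule: non-empty, only letters, digits, dot, underscore, hyphen.
    case "$host" in
        ''|*[!a-zA-Z0-9._-]*) ;;
        *) return 0 ;;
    esac
    # Bracketed IPv6 literal: at least two colons, hex digits and colons only.
    case "$host" in
        \[*:*:*\])
            inner=${host#\[}
            inner=${inner%\]}
            case "$inner" in
                *[!0-9a-fA-F:]*) ;;
                *) return 0 ;;
            esac
            ;;
    esac
    echo "Invalid host '$host'. Use a hostname, an IPv4 address, or a bracketed IPv6 address such as [2001:db8::1]." >&2
    return 1
}
```

ssh and rsync differ in how they accept IPv6 literals, so the call sites would probably need adjusting too; that part is not shown here.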

5. StrictHostKeyChecking=accept-new with no warning

Auto-accepts unknown host keys on first connection. For a tool that syncs proxy secrets, the wizard should warn the user that the first connection uses trust-on-first-use, or show the host key fingerprint for confirmation.
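
For the fingerprint option, something along these lines might work. show_host_fingerprint is a hypothetical name, and the sketch assumes OpenSSH's ssh-keyscan and ssh-keygen are installed. The scan itself is unauthenticated, so the printed fingerprint only helps if the user compares it against a value obtained on the slave:

```bash
# Hypothetical helper: fetch the host's ed25519 key and print its fingerprint
# so the wizard can ask for confirmation before the first connection.
show_host_fingerprint() {
    local host="$1" port="${2:-22}" keys
    # -T 5 limits the scan to five seconds per host.
    keys=$(ssh-keyscan -T 5 -t ed25519 -p "$port" "$host" 2>/dev/null)
    if [ -z "$keys" ]; then
        echo "Could not fetch a host key from $host port $port" >&2
        return 1
    fi
    # "ssh-keygen -lf -" reads the scanned key from stdin.
    printf '%s\n' "$keys" | ssh-keygen -lf -
}
```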


Should Fix

6. rsync --delete can destroy slave-local files

Any file in /opt/mtproxymax/ on the slave that doesn't exist on the master (and isn't excluded) gets deleted, including diagnostic scripts and custom configs. Either document this clearly in the wizard, use --delete-after (deletions then happen only once the transfer has finished), or add a REPLICATION_DELETE_EXTRA toggle.
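
A sketch of the toggle, assuming a hypothetical replication_rsync_opts helper and a default of "false" for REPLICATION_DELETE_EXTRA. Both are assumptions; the PR currently always deletes:

```bash
# Hypothetical helper: prints the rsync options one per line.
# The two excludes are the files the PR says are never synced.
replication_rsync_opts() {
    printf '%s\n' -az --exclude=settings.conf --exclude=replication.conf
    # Only remove slave-local extras when the user has opted in.
    if [ "${REPLICATION_DELETE_EXTRA:-false}" = "true" ]; then
        printf '%s\n' --delete-after
    fi
}
```

Note that --delete-after still removes the extra files; it only moves the deletions to the end of the transfer.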

7. Test copies of save_settings/load_settings are missing UNKNOWN_SNI_ACTION

The test copies are already out of sync with production — missing UNKNOWN_SNI_ACTION in the heredoc, whitelist, and validation. This will mask bugs. Sync them up with the current production versions.

8. "KNOWN BUG" comments in tests appear stale

The code already uses _rl_ prefixed variable names in load_replication, so the variable scoping collision described in tests 2.14, 2.15, 3.3, 3.4, 3.6, 6.5 doesn't exist. Verify and remove these misleading comments.

9. replication_promote doesn't generate an SSH key

A freshly promoted slave won't have /opt/mtproxymax/.ssh/id_ed25519. The first replication sync will fail. Either auto-generate a key during promote, or check and warn that replication setup needs to be run first.
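
The auto-generate option could be sketched roughly as follows. ensure_replication_key is a hypothetical name; the default path is the one named above:

```bash
# Hypothetical helper for replication_promote: create the key if it is absent.
ensure_replication_key() {
    local key="${1:-/opt/mtproxymax/.ssh/id_ed25519}"
    [ -f "$key" ] && return 0
    if ! command -v ssh-keygen >/dev/null 2>&1; then
        echo "No SSH key at $key and ssh-keygen is missing; run 'mtproxymax replication setup' first." >&2
        return 1
    fi
    mkdir -p "$(dirname "$key")" && chmod 700 "$(dirname "$key")" || return 1
    # -N "" creates the key without a passphrase so the timer can use it unattended.
    ssh-keygen -q -t ed25519 -N "" -f "$key" || return 1
    echo "Generated $key; copy ${key}.pub to each slave before the first sync." >&2
}
```

Generating the key is only half of the fix: the new master's public key still has to reach each slave, so the warning stays useful either way.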

10. enable subcommand doesn't check if role is master

replication enable starts the systemd timer regardless of role. On a slave, the timer fires every 60s just to exit immediately (sync script checks role). Add a role guard — only masters should enable the timer.
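
A possible shape for the guard, assuming the role is held in a variable named REPLICATION_ROLE (an assumed name) with the wizard's three values master, slave, and standalone:

```bash
# Hypothetical guard for 'replication enable'.
replication_enable_guard() {
    local role="${REPLICATION_ROLE:-standalone}"
    if [ "$role" != "master" ]; then
        echo "The sync timer is only enabled on a master (current role: $role)." >&2
        return 1
    fi
    return 0
}

# Possible call site:
#   replication_enable_guard && systemctl enable --now mtproxymax-sync.timer
```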

11. Sync script mktemp uses /tmp/ instead of $INSTALL_DIR

tmp=$(mktemp /tmp/.mtproxymax-sync.XXXXXX)

If /tmp and /opt/mtproxymax are on different filesystems, mv "$tmp" "$REPLICATION_FILE" isn't atomic — brief window for partial reads. Use $INSTALL_DIR like the production _mktemp() does.
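
To illustrate the point, a small sketch with a hypothetical atomic_write helper. Creating the temp file in the target's own directory keeps the final mv a same-filesystem rename:

```bash
# Hypothetical helper: write stdin to "$1" via a temp file in the same directory.
atomic_write() {
    local target="$1" tmp
    tmp=$(mktemp "$(dirname "$target")/.mtproxymax-sync.XXXXXX") || return 1
    if cat > "$tmp" && mv "$tmp" "$target"; then
        return 0
    fi
    # Clean up the temp file if either step failed.
    rm -f "$tmp"
    return 1
}

# Possible call site:
#   generate_config | atomic_write "$INSTALL_DIR/replication.conf"
```

generate_config in the comment is a placeholder for whatever produces the file contents.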

12. Wizard hostname -I is Linux-specific

hostname -I isn't available on Alpine's busybox or BSDs. If both hostname -I and hostname -s fail, the hint shows mtproxymax replication add 22 with a blank host. Add a fallback or suppress the hint when the commands aren't available.
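
A sketch of the fallback-or-suppress approach. Both function names are hypothetical, and port 22 in the hint mirrors the example above:

```bash
# Hypothetical helper: print a best-guess address or name for this host,
# or print nothing and return 1 so the caller can skip the hint.
detect_local_host() {
    local out="" flag
    for flag in -I -s ""; do
        # $flag is deliberately unquoted so the empty entry passes no argument.
        out=$(hostname $flag 2>/dev/null | awk '{print $1}')
        [ -n "$out" ] && break
    done
    [ -n "$out" ] || return 1
    printf '%s\n' "$out"
}

# Print the hint only when a host could be determined.
print_add_hint() {
    local host
    if host=$(detect_local_host); then
        echo "mtproxymax replication add $host 22"
    fi
}
```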


Once these are addressed we're good to merge. The core design is solid — these are mostly hardening items. Let us know if you have questions on any of them.

@lionevil1 lionevil1 requested a review from SamNet-dev March 27, 2026 09:36
…ages (SamNet-dev#43-SamNet-dev#46)

- reload_proxy_config: flush traffic counters before SIGHUP
- self_update: fix flock FD leak (exec {fd}>&- + trap RETURN)
- self_update: set _SCRIPT_NEEDS_REEXEC=true after script update
- self_update: auto-remove old engine Docker images after engine upgrade
- TUI menu: re-exec updated script instead of continuing with stale binary

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


Development

Successfully merging this pull request may close these issues.

Feature request: Master-Slave Replication (rsync+SSH config sync)

2 participants