Add TestReplayXversion module for cross-version WAL replay testing #42

Open
x4m wants to merge 7 commits into PGBuildFarm:main from x4m:x_version_replay

x4m commented May 26, 2026

Test that WAL generated by the .0 release of a major version replays correctly on the current STABLE binary. This catches backwards-compatibility regressions in WAL replay code, such as the self-deadlock in RecordNewMultiXact introduced by commit 0852643e1c6.

The .0 binary is built once and cached. Each run generates fresh WAL (including 2500 multixacts via the savepoint trick to cross an SLRU page boundary) and verifies that a STABLE standby can replay it.
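
For a rough sense of why 2500 multixacts is enough to cross a page boundary, here is a minimal arithmetic sketch. It assumes the default 8 kB block size and a 4-byte MultiXactOffset, and that a fresh cluster starts at multixact ID 1; none of these values are stated in this PR, and the function names are illustrative only.

```python
# Assumed values (not taken from the PR): default BLCKSZ and a 4-byte offset entry.
BLCKSZ = 8192
OFFSET_SIZE = 4


def offsets_per_page(blcksz=BLCKSZ, offset_size=OFFSET_SIZE):
    """How many multixact offset entries fit on one SLRU page."""
    return blcksz // offset_size


def pages_touched(first_multi, count, per_page):
    """Number of distinct offsets pages covered by IDs first_multi .. first_multi+count-1."""
    first_page = first_multi // per_page
    last_page = (first_multi + count - 1) // per_page
    return last_page - first_page + 1


per_page = offsets_per_page()
print(per_page)                          # 2048 entries per page under these assumptions
print(pages_touched(1, 2500, per_page))  # 2, so at least one page boundary is crossed
```

Because 2500 exceeds the assumed 2048 entries per page, the run crosses a boundary regardless of the starting multixact ID. With an 8-byte offset the page would hold 1024 entries and the same conclusion holds.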

Written using Cursor, needs more testing. I've fixed some issues that caught my eye, but more review is definitely needed.

x4m added 6 commits May 26, 2026 16:02
Test that WAL generated by the .0 release of a major version replays
correctly on the current STABLE binary.  This catches backwards-
compatibility regressions in WAL replay code, such as the self-deadlock
in RecordNewMultiXact introduced by commit 0852643e1c6.

The .0 binary is built once and cached.  Each run generates fresh WAL
(including 2500 multixacts via the savepoint trick to cross an SLRU
page boundary) and verifies that a STABLE standby can replay it.

Run the .0 regression tests via pg_regress against the .0 primary for
diverse WAL coverage.  The regression test files are preserved in
inst-dot0/regress/ during the build phase.  A 180-second watchdog
kills pg_regress if it hangs.
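
The watchdog described in this commit could look roughly like the sketch below. It is not the module's code (the buildfarm client is written in Perl); `run_with_watchdog` is a hypothetical helper, and only the 180-second figure comes from the commit message.

```python
import subprocess
import sys


def run_with_watchdog(cmd, timeout_s=180):
    """Run cmd; return its exit code, or None if it was killed after timeout_s seconds."""
    try:
        return subprocess.run(cmd, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child and waits for it before re-raising.
        return None


# A command that finishes promptly returns its normal exit status.
print(run_with_watchdog([sys.executable, "-c", "pass"]))
```

Note that `subprocess.run` only kills the direct child. A real watchdog around pg_regress would probably also need to deal with the psql processes it spawns, for example by running it in its own process group.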

The .0 regression tests run against the .0 server, so there are no
feature mismatches and allow_in_place_tablespaces is unnecessary.

Use -m immediate for pg_ctl stop so a deadlocked standby does not
hang the test indefinitely.

A STABLE branch is created months before the GA release, so the
REL_x_0 tag may not exist yet.  Silently skip rather than reporting
a build failure.
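
The skip decision in this commit can be sketched as follows. This assumes branches are named `REL_<major>_STABLE`, which the PR does not spell out; `dot0_tag`, `tag_exists`, and `should_skip` are hypothetical names, not functions from the module.

```python
import re
import subprocess


def dot0_tag(branch):
    """Map e.g. 'REL_17_STABLE' to 'REL_17_0'; return None for other branch names."""
    m = re.fullmatch(r"REL_(\d+)_STABLE", branch)
    return f"REL_{m.group(1)}_0" if m else None


def tag_exists(repo_dir, tag):
    """True if the tag is present in the git repository at repo_dir."""
    out = subprocess.run(
        ["git", "-C", repo_dir, "tag", "--list", tag],
        capture_output=True, text=True, check=True,
    ).stdout
    return tag in out.split()


def should_skip(branch, tag_checker):
    """Skip quietly when there is no .0 tag to build, rather than failing."""
    tag = dot0_tag(branch)
    return tag is None or not tag_checker(tag)
```

In use, the checker would be bound to a checkout, for example `should_skip(branch, lambda t: tag_exists("/path/to/pgsql", t))`, where the path is a placeholder.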

- Set allow_in_place_tablespaces on the STABLE standby so it can
  replay in-place tablespace WAL from .0 regression tests.
- Add -t 10 timeout and SIGKILL fallback to stop_and_clean so a
  deadlocked standby does not hang the test indefinitely.
- Reduce --max-concurrent-tests to 4 to keep checkpoint time short.
- Remove unused fields (major, pgsql) and File::Basename import.
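
The stop-with-fallback behaviour from the second bullet might be sketched like this. It assumes a POSIX system and relies on the first line of `postmaster.pid` holding the postmaster's PID; `stop_or_kill` is a hypothetical stand-in, not the module's `stop_and_clean`.

```python
import os
import signal
import subprocess


def stop_or_kill(stop_cmd, datadir, timeout_s=10):
    """Try a stop command; if it fails or times out, SIGKILL the PID in postmaster.pid.

    Returns 'stopped', 'killed', or 'not running'.
    """
    try:
        # Give the command a little slack beyond its own internal timeout.
        if subprocess.run(stop_cmd, timeout=timeout_s + 5).returncode == 0:
            return "stopped"
    except subprocess.TimeoutExpired:
        pass
    try:
        with open(os.path.join(datadir, "postmaster.pid")) as f:
            pid = int(f.readline().strip())
    except (FileNotFoundError, ValueError):
        return "not running"
    try:
        os.kill(pid, signal.SIGKILL)
    except ProcessLookupError:
        return "not running"
    return "killed"
```

A caller would pass something like `["pg_ctl", "-D", datadir, "-m", "immediate", "-t", "10", "stop"]` as `stop_cmd`; the exact invocation used by the module may differ.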

x4m commented May 27, 2026

We use autotools for the .0 build. I'm not sure about that choice and can switch to meson if needed. $stable_inst is built by the buildfarm pipeline. Maybe I'll try to move the .0 build to the buildfarm pipeline too.

When the module runs, add replay_xversion to PG_TEST_EXTRA so that
future TAP tests in src/test/recovery/ can be gated on this token.
Harmless until PostgreSQL adds such tests.
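
As an illustration of the token handling in this commit, the sketch below appends to a whitespace-separated PG_TEST_EXTRA without duplicating an existing entry, and shows how a test could check for the token. Both function names are hypothetical and the real module is Perl, so treat this as a sketch of the idea only.

```python
import os


def add_test_extra(token, env=None):
    """Append token to the whitespace-separated PG_TEST_EXTRA list if it is missing."""
    env = os.environ if env is None else env
    tokens = env.get("PG_TEST_EXTRA", "").split()
    if token not in tokens:
        tokens.append(token)
    env["PG_TEST_EXTRA"] = " ".join(tokens)
    return env["PG_TEST_EXTRA"]


def is_enabled(token, env=None):
    """What a gated test would check: is the token present as a whole word?"""
    env = os.environ if env is None else env
    return token in env.get("PG_TEST_EXTRA", "").split()


env = {"PG_TEST_EXTRA": "ssl kerberos"}
print(add_test_extra("replay_xversion", env))  # ssl kerberos replay_xversion
```

Existing tokens are preserved, so tests that do not know about `replay_xversion` are unaffected.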
x4m force-pushed the x_version_replay branch from a9242b6 to bb65c23 May 28, 2026 19:12