
fix: LOB phantom undo suppression + fuzz test improvements #16

Merged
rophy merged 6 commits into master from fix/lob-phantom-undo
Mar 31, 2026


Owner

@rophy rophy commented Mar 31, 2026

Summary

  • Fix LOB phantom transaction emission by restoring FLG_ROLLBACK_OP0504 check removed in 1a2d316
  • Suppress phantom UPDATE undo on LOB tables in RAC online mode (extends INSERT→DELETE to UPDATE→UPDATE)
  • Add SKIP_LOB flag and tail event classification to fuzz test
  • Document L13 LOB unavailable value as known LogMiner limitation

Changes

OLR fixes

  • Parser.cpp: Restore FLG_ROLLBACK_OP0504 check on opcode 0x0504 commit records. Upstream had this; our olr#10 fix incorrectly removed it. The rollbackLastOp() fix in Transaction.cpp independently handles LOB phantom undo at the op level.
  • Transaction.cpp: Extend phantom undo detection to cover UPDATE→UPDATE (0x0B05→0x0B05) in addition to INSERT→DELETE (0x0B02→0x0B03). Oracle RAC generates phantom undo for both patterns. Guard: !lobStripped && deferCommittedTransactions && LOB table.
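
The guard on the Transaction.cpp change can be sketched as a small predicate. This is an illustrative Python model rather than the C++ in `rollbackLastOp()`: the function name, parameter names, and constant names are hypothetical, and only the two opcode pairs and the three guard conditions are taken from the description above.

```python
# Illustrative model of the phantom-undo guard; all names are hypothetical.
OP_INSERT = 0x0B02
OP_DELETE = 0x0B03
OP_UPDATE = 0x0B05

# (last op, undo op) pairs treated as phantom-undo candidates by this PR.
PHANTOM_PATTERNS = {
    (OP_INSERT, OP_DELETE),  # pattern handled before this PR
    (OP_UPDATE, OP_UPDATE),  # pattern added by this PR
}


def is_phantom_undo(last_op, undo_op, *, lob_stripped,
                    defer_committed_transactions, is_lob_table):
    """Return True when the undo should be ignored instead of rolling back last_op."""
    if lob_stripped:
        # LOB index records were stripped first: treat as a real rollback.
        return False
    if not (defer_committed_transactions and is_lob_table):
        # Guard applies only to LOB tables with deferred commits.
        return False
    return (last_op, undo_op) in PHANTOM_PATTERNS
```

In this model a real rollback (`lob_stripped=True`) is never suppressed, whatever the opcode pair, which is what keeps the wider pattern set safe.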

Fuzz test improvements

  • fuzz-workload.sql: Add p_skip_lob parameter. SKIP_LOB=1 ./fuzz-test.sh run 60 skips LOB table ops for absolute non-LOB accuracy testing.
  • validator.py: Classify events beyond the safe frontier as "tail" (timing lag) instead of mismatches. Skip __debezium_unavailable_value in both before and after images (L13).
  • KNOWN-LIMITATIONS.md: Add L13 documenting LogMiner LOB unavailable value behavior per Debezium docs.
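
The tail classification in validator.py can be pictured roughly as follows. This is a minimal sketch and not the validator's actual code: it assumes every event carries a monotonically increasing position (for example an SCN) and that the safe frontier is the highest position the slower reader is known to have reached. The function name `classify_unmatched` is hypothetical.

```python
def classify_unmatched(unmatched_positions, safe_frontier):
    """Split unmatched event positions into real mismatches and tail-lag events.

    Positions at or below the frontier should have been seen by both sides,
    so they count as mismatches. Positions beyond it are only timing lag.
    """
    mismatches = [p for p in unmatched_positions if p <= safe_frontier]
    tail = [p for p in unmatched_positions if p > safe_frontier]
    return mismatches, tail


if __name__ == "__main__":
    mismatches, tail = classify_unmatched([5, 10, 15], safe_frontier=10)
    print(f"mismatches={mismatches} tail={tail}")
```

Under these assumptions a run passes when `mismatches` is empty, and the tail count is reported separately as a qualifier rather than as a failure.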

Test plan

  • 160/160 redo-log regression tests pass
  • Fuzz 10min SKIP_LOB=1: 35,000 events, 100% match, 0 mismatches
  • Fuzz 10min with LOB: 56,599 events, 0 mismatches, 0 missing from OLR (was 12 before fix)
  • Fuzz 10min non-LOB: 69,443 events, 100% match
  • lob-operations-rac fixture passes (olr#10 scenario unaffected)

Related

Summary by CodeRabbit

  • Bug Fixes

    • Corrected commit/rollback classification and LOB-related rollback handling so transaction rollbacks and emitted metrics are more accurate.
    • Improved handling of mismatched redo fragments to prevent incorrect DML interpretation.
  • Documentation

    • Added note about Oracle LogMiner omitting unchanged LOB after-images, causing __debezium_unavailable_value placeholders.
  • Tests

    • Validator now skips unavailable LOB markers for before/after and distinguishes tail-lag events.
    • Fuzz tests can skip LOB operations via a new parameter.

rophy added 5 commits March 29, 2026 04:00
When Transaction::flush() accumulates multi-piece supplemental log records,
an orphaned first-piece (FB_F only) from a LOB row migration could block
records from other tables. The new record was dropped (warning 60017) but
its FB_L flag still triggered processDml() with the wrong data, permanently
losing the DML event.

Fix: clear the orphaned redo1/redo2 and replace with the current record.

Validated: 27,898 fuzz events, 0 non-LOB mismatches (was 4 before fix).
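
The orphaned first-piece fix can be modelled with a small buffer. This is a hedged Python sketch, not the code in `Transaction::flush()`: the bit values for `FB_F` and `FB_L`, the `Piece` and `PieceBuffer` types, and the use of a table-name mismatch as the trigger for warning 60017 are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

FB_F = 0x01  # hypothetical bit value: first piece of a row
FB_L = 0x02  # hypothetical bit value: last piece of a row


@dataclass
class Piece:
    table: str
    fb: int
    data: str


@dataclass
class PieceBuffer:
    pieces: list = field(default_factory=list)
    emitted: list = field(default_factory=list)

    def add(self, piece):
        if self.pieces and self.pieces[0].table != piece.table:
            # Mismatch (modelled here as the warning 60017 case): discard the
            # orphaned buffer and re-seed with the current piece, instead of
            # dropping the current piece and keeping the stale one.
            self.pieces = []
        self.pieces.append(piece)
        if piece.fb & FB_L:
            # Last piece seen: emit the row from the pieces gathered so far.
            self.emitted.append((piece.table, [p.data for p in self.pieces]))
            self.pieces = []
```

Feeding an `FB_F`-only piece for a LOB table followed by a complete single-piece record for another table emits only the second record, with its own data, which is the behaviour the fix aims for.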
…ctions

Commit 1a2d316 removed the FLG_ROLLBACK_OP0504 check from
appendToTransactionCommit() as part of the olr#10 fix, but this was
overly aggressive. The rollbackLastOp() fix in Transaction.cpp already
handles LOB phantom undo at the op level independently.

Without this check, OLR emits ~2% extra phantom events on LOB tables
where Oracle internally commits then rolls back the same XID.

Fixes #15
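
The restored check amounts to reading one flag bit on the 5.4 record before deciding whether the transaction's events are emitted. The sketch below is a Python model under stated assumptions: the bit value given to `FLG_ROLLBACK_OP0504` is a placeholder, not taken from the OLR headers, and both function names are hypothetical.

```python
FLG_ROLLBACK_OP0504 = 0x0004  # placeholder bit value, for illustration only


def classify_op0504(flg):
    """Classify an opcode 0x0504 record as a commit or a rollback."""
    return "rollback" if flg & FLG_ROLLBACK_OP0504 else "commit"


def events_to_emit(buffered_events, flg):
    """Emit the buffered events only when the 5.4 record is a real commit."""
    if classify_op0504(flg) == "rollback":
        return []
    return list(buffered_events)
```

Without the flag test every 5.4 record looks like a commit, so a transaction that Oracle rolled back internally would still have its buffered events emitted.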
LogMiner only includes LOB column values when explicitly changed by the
SQL statement. Unchanged LOB columns appear as __debezium_unavailable_value.
This is documented Debezium behavior (DBZ-4276), not a bug — OLR delivers
actual LOB content that LogMiner cannot.

Update validator to skip unavailable markers in both before and after
images instead of only before images.
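
A comparison that skips the placeholder in both images might look like the sketch below. It is not the code in validator.py: the event shape (a dict with `before` and `after` column maps) and the function names are assumptions, and only the placeholder string comes from the text above.

```python
UNAVAILABLE = "__debezium_unavailable_value"


def images_match(olr_image, dbz_image):
    """Compare one row image, ignoring columns Debezium could not provide."""
    for column, dbz_value in dbz_image.items():
        if dbz_value == UNAVAILABLE:
            # LogMiner did not supply this LOB value: nothing to compare.
            continue
        if olr_image.get(column) != dbz_value:
            return False
    return True


def event_matches(olr_event, dbz_event):
    """Apply the same skip rule to the before image and the after image."""
    return all(
        images_match(olr_event.get(key) or {}, dbz_event.get(key) or {})
        for key in ("before", "after")
    )
```

Only the columns present in the Debezium image are checked in this sketch, so extra columns on the OLR side do not cause a mismatch.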
- Add p_skip_lob parameter to FUZZ_WKL.run() to skip LOB table
  operations. Usage: SKIP_LOB=1 ./fuzz-test.sh run 60
  Debezium LogMiner has a known bug dropping LOB events on RAC
  (see DEBEZIUM-BUG-RAC-LOB.md). Skipping LOB allows sustained
  fuzz testing focused on absolute accuracy.

- Classify events beyond the safe frontier as "tail" instead of
  mismatches. OLR processes redo faster than Debezium LogMiner,
  so at drain time OLR is ahead. These tail events are timing
  lag, not data loss.

- Add DBZ_LM_CONNECTOR_JAR env var to mount a patched Debezium
  connector JAR for the LogMiner adapter.

- Add Debezium RAC LOB bug report and review questions.
Extend the phantom undo detection in rollbackLastOp() to cover
UPDATE->UPDATE (0x0B05->0x0B05) in addition to INSERT->DELETE
(0x0B02->0x0B03). Oracle RAC generates phantom undo for both
patterns during LOB segment management. Legitimate LOB rollbacks
always strip LOB index records first (lobStripped=true), so the
guard remains safe.

Fixes 12 missing LOB UPDATE events per 10-min fuzz test run.

coderabbitai bot commented Mar 31, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e694f597-7810-446d-acc0-b931dbc735b2

📥 Commits

Reviewing files that changed from the base of the PR and between c544010 and e3bdc5f.

📒 Files selected for processing (3)
  • tests/KNOWN-LIMITATIONS.md
  • tests/dbz-twin/rac/perf/fuzz-workload.sql
  • tests/dbz-twin/rac/validator.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • tests/KNOWN-LIMITATIONS.md
  • tests/dbz-twin/rac/perf/fuzz-workload.sql
  • tests/dbz-twin/rac/validator.py

📝 Walkthrough

Walkthrough

Adds rollback classification for OP0504 records in the parser, expands phantom LOB-undo pattern detection, discards mismatched redo fragments on supplemental-log warnings, and updates test harness and validator to support optional LOB skipping, unavailable-LOB markers, and tail-lag accounting.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Parser commit/rollback handling<br>src/parser/Parser.cpp | Inspect redoLogRecord1->flg for FLG_ROLLBACK_OP0504 and set transaction->rollback = true for OP0504 records before flush/skip evaluation. |
| Transaction logic<br>src/parser/Transaction.cpp | Treat 0x0B05 -> 0x0B05 as a phantom LOB-undo pattern in rollbackLastOp; on warning 60017 discard buffered redo1/redo2 and re-seed with current fragments while resetting transactionType. |
| Known limitations<br>tests/KNOWN-LIMITATIONS.md | Add L13: LogMiner omits unchanged LOB after-images causing Debezium __debezium_unavailable_value placeholders; document validator guidance. |
| Fuzz harness / workload<br>tests/dbz-twin/rac/fuzz-test.sh, tests/dbz-twin/rac/perf/fuzz-workload.sql | Add SKIP_LOB / p_skip_lob parameter, thread it through runner; skip initial LOB seeds and remap table-selection to avoid LOB-targeted ops when enabled. |
| Validator / metrics<br>tests/dbz-twin/rac/validator.py | Skip unavailable-LOB markers on both before/after comparisons; capture a safe_frontier snapshot and classify missing events within tail lag separately, adding tail counters to progress and PASS qualifiers. |

Sequence Diagram(s)

(Skipped — changes are localized control-flow and test updates that do not introduce a new multi-component feature requiring sequential visualization.)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

Poem

🐇 I sniffed the redo stream at night,

Flags and fragments set to right,
Phantom LOBs I chased away,
Now commits and rollbacks clear the way,
Hoppity—tests pass in morning light!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 16.67% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly summarizes the main changes: LOB phantom undo suppression and fuzz test improvements, which aligns with the key code modifications across Parser.cpp, Transaction.cpp, and fuzz testing tooling. |
| Linked Issues check | ✅ Passed | All code requirements from issue #15 are addressed: FLG_ROLLBACK_OP0504 check added to Parser.cpp [#15], phantom undo detection expanded in Transaction.cpp [#15], and fuzz tooling enhanced to validate fixes with skip-LOB parameter and tail-lag classification [#15]. |
| Out of Scope Changes check | ✅ Passed | All changes are in-scope: Parser/Transaction core fixes address phantom LOB commits, fuzz improvements validate the fixes, KNOWN-LIMITATIONS.md documents the context, and all modifications directly support the #15 objectives. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/KNOWN-LIMITATIONS.md (1)

7-10: ⚠️ Potential issue | 🟡 Minor

Update header text to include L13.

The header states external limitations are "L1-L7" but L13 is now added as an external limitation. Update for consistency.

 Entries are split into two categories:
-- **External limitations** (L1-L7): Oracle LogMiner or Debezium behavior that
+- **External limitations** (L1-L7, L13): Oracle LogMiner or Debezium behavior that
   cannot be fixed in OLR. These require workarounds in test comparison scripts.
 - **OLR bugs** (L8-L12): Issues in OLR that should be fixed. Each has a
   corresponding GitHub issue.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/KNOWN-LIMITATIONS.md` around lines 7 - 10, The header "**External
limitations** (L1-L7)" is now out-of-date because L13 was added; update the
header text to list the new range (for example change "(L1-L7)" to "(L1-L7,
L13)" or similar) so it accurately reflects that L13 is an external limitation;
update the same header string in KNOWN-LIMITATIONS.md (the "**External
limitations**" header) and search for any other occurrences of that header text
to keep them consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/dbz-twin/rac/perf/fuzz-workload.sql`:
- Around line 642-648: The comment above the LOB-remapping block is inaccurate:
the code (IF g_skip_lob = 1 AND v_table_dice > 40 AND v_table_dice <= 55 THEN
v_table_dice := rand_int(1, 30); END IF;) remaps the entire 15% LOB range to the
scalar bucket (making scalar ~45%), not the redistributed percentages shown;
update the comment to state that when g_skip_lob=1 the 41–55 LOB range is
remapped entirely to scalar (1–30), or alternatively implement proper
redistribution logic across the other buckets if the original percentages (35%
scalar, 12% wide, etc.) were intended—refer to variables/functions v_table_dice,
g_skip_lob, and rand_int to locate the code to change.

In `@tests/dbz-twin/rac/validator.py`:
- Line 392: The print statement using an f-string with no placeholders
(print(f"\n  RESULT: PASS", flush=True)) should be changed to a regular string
literal to satisfy static analysis; locate the print call that outputs "\n 
RESULT: PASS" in validator.py (the print(...) near line 392) and remove the
unnecessary f prefix so it becomes print("\n  RESULT: PASS", flush=True).

---

Outside diff comments:
In `@tests/KNOWN-LIMITATIONS.md`:
- Around line 7-10: The header "**External limitations** (L1-L7)" is now
out-of-date because L13 was added; update the header text to list the new range
(for example change "(L1-L7)" to "(L1-L7, L13)" or similar) so it accurately
reflects that L13 is an external limitation; update the same header string in
KNOWN-LIMITATIONS.md (the "**External limitations**" header) and search for any
other occurrences of that header text to keep them consistent.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 5b104883-499f-4994-95d4-f3c2171ccb46

📥 Commits

Reviewing files that changed from the base of the PR and between 91acc0b and c544010.

📒 Files selected for processing (6)
  • src/parser/Parser.cpp
  • src/parser/Transaction.cpp
  • tests/KNOWN-LIMITATIONS.md
  • tests/dbz-twin/rac/fuzz-test.sh
  • tests/dbz-twin/rac/perf/fuzz-workload.sql
  • tests/dbz-twin/rac/validator.py

- Update KNOWN-LIMITATIONS.md header to include L13
- Fix inaccurate comment in fuzz-workload.sql about LOB skip redistribution
- Remove unnecessary f-string prefix in validator.py
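
The comment fix concerns simple arithmetic, which can be checked with a short calculation. The sketch below assumes the initial table roll is uniform on 1..100 and that the scalar bucket is 1..30, as the review comment implies. The function name `scalar_share` is hypothetical, and the other buckets are not modelled.

```python
from fractions import Fraction


def scalar_share(skip_lob):
    """Exact probability that the final roll lands in the scalar bucket (1-30)."""
    share = Fraction(0)
    for dice in range(1, 101):  # assumed: initial roll is uniform on 1..100
        p = Fraction(1, 100)
        if skip_lob and 40 < dice <= 55:
            # Remapped to rand_int(1, 30), so it always lands in the scalar bucket.
            share += p
        elif dice <= 30:
            share += p
    return share


if __name__ == "__main__":
    print(scalar_share(False), scalar_share(True))
```

With the remap enabled the whole 15% LOB range is added to the 30% scalar range, giving 45%, so the other buckets keep their original shares and nothing is redistributed among them.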
@rophy rophy merged commit 7f4b893 into master Mar 31, 2026
2 checks passed
@rophy rophy deleted the fix/lob-phantom-undo branch March 31, 2026 02:56

Development

Successfully merging this pull request may close these issues.

LOB phantom transactions: OLR doesn't distinguish commit from rollback in opcode 5.4
