fix: fuzz-test accuracy and infrastructure improvements #14
Conversation
`event_id` is now assigned once at INSERT and never changed by UPDATE/DELETE. The consumer increments `seq` naturally for subsequent ops on the same row. The validator compares per `(event_id, seq)` pair instead of just distinct `event_id`s.
- `event_id` assigned once at INSERT, never changed by UPDATE/DELETE
- Consumer increments `seq` for subsequent ops on the same row
- Validator compares per `(event_id, seq)` pair
- Dockerfile runs as non-root app user
- Kafka retention increased to 24h for investigation
- Validator shares the Dockerfile with the consumer for a consistent user
Search the full OLR log for the startup message instead of `tail -5`, which missed it when DUMP trace output was enabled.
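The `event_id`/`seq` scheme described above can be modeled in a few lines. This is an illustrative sketch, not the project's actual consumer code; the class name and the choice to start `seq` at 0 are hypothetical:

```python
from collections import defaultdict

class SeqTracker:
    """Assigns a monotonically increasing seq per event_id.

    event_id is set once at INSERT and never rewritten, so every
    subsequent UPDATE/DELETE on the same row reuses the same event_id
    and simply receives the next seq. Starting at 0 is arbitrary here.
    """
    def __init__(self):
        self._next_seq = defaultdict(int)

    def record(self, event_id: str) -> int:
        seq = self._next_seq[event_id]
        self._next_seq[event_id] += 1
        return seq

tracker = SeqTracker()
assert tracker.record("E1") == 0   # INSERT
assert tracker.record("E1") == 1   # UPDATE on the same row
assert tracker.record("E2") == 0   # INSERT of another row
assert tracker.record("E1") == 2   # DELETE of the first row
```

Because `event_id` is immutable, the `(event_id, seq)` pair becomes a stable per-operation key that both sides of the comparison can agree on.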
📝 Walkthrough
Refactors the fuzz testing framework: per-sequence validation replaces per-event merging; sequence maps are no longer trimmed; the workload SQL stops updating `event_id`; test scripts and container configs are adjusted; and an "Investigation Policy" was added to AGENTS.md.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Workload as SQL Workload
    participant Kafka as Kafka Broker
    participant Consumer as Kafka Consumer
    participant LM as LM Tracker
    participant OLR as OLR Tracker
    participant Validator as Validator
    Workload->>Kafka: Emit events (preserve event_id)
    Kafka->>Consumer: Deliver messages
    Consumer->>LM: Record (event_id, seq) -> lm_seq
    Consumer->>OLR: Record (event_id, seq) -> olr_seq
    Note over LM,OLR: No periodic trimming (retain all sequences)
    Validator->>LM: Query seqs for event_id
    Validator->>OLR: Query seqs for event_id
    loop for each (event_id, seq)
        Validator->>Validator: Normalize LM row for (eid,seq)
        Validator->>Validator: Normalize OLR row for (eid,seq)
        Validator->>Validator: compare_values -> match/mismatch/diagnostic
    end
    Validator->>Validator: Emit per-seq counts & diagnostics
```
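The per-`(event_id, seq)` loop in the diagram can be sketched as follows; the `normalize` and `compare_values` helpers are stand-ins for the validator's real routines:

```python
def compare_per_seq(lm_by_seq, olr_by_seq, normalize, compare_values):
    """Walk the union of seqs for one event_id, mirroring the loop in
    the diagram above (helper names are placeholders)."""
    results = {}
    for seq in sorted(set(lm_by_seq) | set(olr_by_seq)):
        lm_r = normalize(lm_by_seq.get(seq))    # None if LM lacks this seq
        olr_r = normalize(olr_by_seq.get(seq))  # None if OLR lacks this seq
        results[seq] = compare_values(lm_r, olr_r)
    return results

# Tiny usage with trivial stand-in helpers:
res = compare_per_seq(
    {0: "a", 1: "b"}, {0: "a"},
    normalize=lambda r: r,
    compare_values=lambda a, b: "match" if a == b else "mismatch",
)
assert res == {0: "match", 1: "mismatch"}
```

Iterating the union of both sides' seqs is what surfaces one-sided rows (missing or extra operations) instead of silently skipping them.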
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 warning
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
tests/dbz-twin/rac/perf/fuzz-workload.sql (1)
427-445: ⚠️ Potential issue | 🟠 Major — Seeded rows now stay invisible to CDC validation.
`run()` seeds `FUZZ_SCALAR`, `FUZZ_LOB`, and `FUZZ_PART` with `event_id='SEED'` and immediately adds those IDs to the tracked pools at Lines 731-752 in this file. After these UPDATE paths stopped rewriting `event_id`, any UPDATE that hits one of those seeded rows keeps `SEED`, and tests/dbz-twin/rac/kafka-consumer.py drops `SEED` events at Lines 89-90. Early UPDATE traffic on seeded rows, and any later DELETE on those rows, never reaches SQLite or the validator. Please either keep seeded IDs out of the tracked pools or assign them a real `event_id` before they become UPDATE/DELETE targets. Also applies to: 463-474, 569-579
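To illustrate the failure mode and the first suggested fix (names here are hypothetical, not the project's PL/SQL): the consumer drops `'SEED'` events, so any seeded ID left in the tracked pools produces UPDATE/DELETE traffic that validation never sees. Filtering seeded IDs out before they enter the pools avoids that:

```python
SEED_ID = "SEED"

def should_track(event_id: str) -> bool:
    """The consumer drops seed markers, so a row whose event_id is still
    'SEED' is invisible to validation (sketch of the comment's point)."""
    return event_id != SEED_ID

# Fix sketch: only add IDs to the tracked update/delete pools if the
# consumer will actually record their change events.
tracked = [eid for eid in ["E1", "SEED", "E2"] if should_track(eid)]
assert tracked == ["E1", "E2"]
```

The alternative fix from the comment, rewriting the seeded rows' `event_id` to a real value before pooling them, achieves the same invariant: every pooled ID is one the consumer keeps.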
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/dbz-twin/rac/perf/fuzz-workload.sql` around lines 427 - 445, Seeded rows are being added to the tracked ID pools in run(), which causes later UPDATE/DELETE procedures (e.g., PROCEDURE do_update_scalar using pick_tracked_id and g_scalar_ids/g_scalar_id_cnt) to hit rows with event_id='SEED' that the Kafka consumer drops; either stop adding seeded IDs to the tracked pools or give seeded rows a real event_id before they are added: change run() to exclude SEED-inserted IDs from g_scalar_ids/g_lob_ids/g_part_ids (or add only after updating event_id), or immediately update seeded rows' event_id to a non-'SEED' value before any code pushes their IDs into the tracked arrays so that pick_tracked_id in do_update_scalar / do_update_lob / do_update_part will never return SEED rows.
tests/dbz-twin/rac/validator.py (1)
249-264: ⚠️ Potential issue | 🟠 Major — The frontier still skips later `seq`s for already-validated `event_id`s.
The new `(event_id, seq)` comparison only fixes the inner match. The outer scan at Lines 200-208 still selects `event_id > cursor_by_node`, and Lines 333-335 advance that cursor to the max `event_id`. With immutable `event_id`s from tests/dbz-twin/rac/perf/fuzz-workload.sql, later UPDATE/DELETEs append new `seq`s to older `event_id`s, so those rows are never revisited after their initial insert was validated. Long update-heavy stretches can therefore hit the idle path with unvalidated events still in SQLite. Track validation progress per `(event_id, seq)` or keep revisiting seen `event_id`s until both sides' seq frontiers stop moving.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/dbz-twin/rac/validator.py` around lines 249 - 264, The outer scan and cursor advancement currently use cursor_by_node and only advance to the max event_id, which skips later seqs appended to existing event_id values; change the validation frontier to be per (event_id, seq) or ensure event_ids are revisited until both sides' seq frontiers stop moving. Concretely, replace or augment cursor_by_node with a per-event progress tracker (e.g., cursor_by_event mapping keyed by event_id and last_validated_seq) or, when advancing an event_id in the loop that builds lm_by_seq / olr_by_seq, only advance cursor_by_node for that event_id when max(lm_by_seq.keys()) == max(olr_by_seq.keys()) (i.e., both sides' seq frontiers match); ensure lm_events and olr_events queries and the all_seqs logic (lm_by_seq, olr_by_seq, all_seqs) use that per-(event_id, seq) logic so updates/deletes appended to older event_id values are revisited until validated.
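One way the suggested per-`(event_id, seq)` frontier could look, as a standalone sketch (the class and method names are made up, not taken from validator.py): an event is only retired once both sides' max `seq` agree, so a side that lags keeps the event revisitable.

```python
class SeqFrontier:
    """Tracks the highest validated seq per event_id and retires an
    event_id only when both sides agree on its max seq (sketch)."""
    def __init__(self):
        self.validated = {}  # event_id -> highest seq validated so far

    def pending(self, event_id, lm_seqs, olr_seqs):
        """Return seqs not yet validated for this event_id."""
        done = self.validated.get(event_id, -1)
        return sorted(s for s in set(lm_seqs) | set(olr_seqs) if s > done)

    def advance(self, event_id, lm_seqs, olr_seqs):
        """Advance only when both frontiers match, so a side that lags
        (e.g. OLR still flushing an UPDATE) keeps the event in play."""
        if lm_seqs and olr_seqs and max(lm_seqs) == max(olr_seqs):
            self.validated[event_id] = max(lm_seqs)

f = SeqFrontier()
assert f.pending("E1", [0, 1], [0]) == [0, 1]
f.advance("E1", [0, 1], [0])          # frontiers differ: no advance
assert f.pending("E1", [0, 1], [0, 1]) == [0, 1]
f.advance("E1", [0, 1], [0, 1])       # both at seq 1: retire up to 1
assert f.pending("E1", [0, 1], [0, 1]) == []
```

This replaces a single scalar watermark with per-event state, which is exactly what lets later UPDATE/DELETE `seq`s on an old `event_id` be revisited.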
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/dbz-twin/rac/fuzz-test.sh`:
- Around line 311-328: The idle-check loop in fuzz-test.sh (variables prev_line,
cur_line, idle_count reading docker logs for fuzz-consumer and grepping
'\[consumer\]') can exit prematurely because the consumer prints progress only
every 30s; update the loop to either require two successive identical
'[consumer]' log lines before considering it idle (e.g., track last_two_lines or
a stable_report_count) or replace the log-polling with a direct check of the
SQLite row counts used by the validator (query the DB and ensure counts are
stable across two checks) and then set final_counts accordingly (currently
computed into final_counts) before proceeding to run the validator.
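The second suggested approach, checking that the SQLite row counts are stable across two polls, might be sketched like this in Python rather than shell; the database path and `events` table name are assumptions for illustration:

```python
import sqlite3
import time

def wait_until_idle(db_path, interval=30, checks=2):
    """Poll the consumer's SQLite DB and declare it idle once the row
    count is unchanged across `checks` successive polls. The 'events'
    table name is hypothetical."""
    prev, stable = None, 0
    while stable < checks:
        with sqlite3.connect(db_path) as conn:
            (count,) = conn.execute("SELECT COUNT(*) FROM events").fetchone()
        if count == prev:
            stable += 1
        else:
            stable = 0  # counts moved: reset the stability window
        prev = count
        if stable < checks:
            time.sleep(interval)
    return prev
```

Polling the same store the validator reads avoids the race in the log-grep approach, where a 30-second progress interval can make the consumer look idle between reports.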
In `@tests/dbz-twin/rac/validator.py`:
- Around line 270-289: The per-sequence branches that detect a row present only
on one side (the blocks checking "if lm_r and not olr_r" and "if olr_r and not
lm_r") currently increment total_validated/total_mismatches/total_lob_known but
do not update the aggregate missing counters; update those branches to also
increment total_missing_olr when lm_r exists without olr_r, and increment
total_missing_lm when olr_r exists without lm_r, preserving the existing is_lob
handling (i.e., still count lob rows into total_lob_known instead of the missing
counters) and keep the same print/log behavior so per-seq missing rows
contribute to the final summary.
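A condensed sketch of the corrected branches (counter names follow the comment; the surrounding validator structure and the `counters` dict are assumptions):

```python
from collections import defaultdict

def classify_row(lm_r, olr_r, is_lob, counters):
    """Classify one (event_id, seq) pair; `counters` is a plain dict
    standing in for the validator's aggregate totals (sketch only)."""
    counters["total_validated"] += 1
    if lm_r and not olr_r:
        if is_lob:
            counters["total_lob_known"] += 1    # LOB rows stay in the known bucket
        else:
            counters["total_mismatches"] += 1
            counters["total_missing_olr"] += 1  # previously not incremented
        return "MISSING_OLR"
    if olr_r and not lm_r:
        if is_lob:
            counters["total_lob_known"] += 1
        else:
            counters["total_mismatches"] += 1
            counters["total_missing_lm"] += 1   # previously not incremented
        return "EXTRA_OLR"
    return "BOTH"

c = defaultdict(int)
assert classify_row({"v": 1}, None, False, c) == "MISSING_OLR"
assert c["total_missing_olr"] == 1 and c["total_mismatches"] == 1
```

The point of the fix is simply that the one-sided branches feed both the mismatch counter and the side-specific missing counter, so the final summary no longer under-reports.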
---
Outside diff comments:
In `@tests/dbz-twin/rac/perf/fuzz-workload.sql`:
- Around line 427-445: Seeded rows are being added to the tracked ID pools in
run(), which causes later UPDATE/DELETE procedures (e.g., PROCEDURE
do_update_scalar using pick_tracked_id and g_scalar_ids/g_scalar_id_cnt) to hit
rows with event_id='SEED' that the Kafka consumer drops; either stop adding
seeded IDs to the tracked pools or give seeded rows a real event_id before they
are added: change run() to exclude SEED-inserted IDs from
g_scalar_ids/g_lob_ids/g_part_ids (or add only after updating event_id), or
immediately update seeded rows' event_id to a non-'SEED' value before any code
pushes their IDs into the tracked arrays so that pick_tracked_id in
do_update_scalar / do_update_lob / do_update_part will never return SEED rows.
In `@tests/dbz-twin/rac/validator.py`:
- Around line 249-264: The outer scan and cursor advancement currently use
cursor_by_node and only advance to the max event_id, which skips later seqs
appended to existing event_id values; change the validation frontier to be per
(event_id, seq) or ensure event_ids are revisited until both sides' seq
frontiers stop moving. Concretely, replace or augment cursor_by_node with a
per-event progress tracker (e.g., cursor_by_event mapping keyed by event_id and
last_validated_seq) or, when advancing an event_id in the loop that builds
lm_by_seq / olr_by_seq, only advance cursor_by_node for that event_id when
max(lm_by_seq.keys()) == max(olr_by_seq.keys()) (i.e., both sides' seq frontiers
match); ensure lm_events and olr_events queries and the all_seqs logic
(lm_by_seq, olr_by_seq, all_seqs) use that per-(event_id, seq) logic so
updates/deletes appended to older event_id values are revisited until validated.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: c9f207d9-f95f-4948-8312-bd9412800bdc
📒 Files selected for processing (7)
- AGENTS.md
- tests/dbz-twin/rac/Dockerfile
- tests/dbz-twin/rac/docker-compose-fuzz.yaml
- tests/dbz-twin/rac/fuzz-test.sh
- tests/dbz-twin/rac/kafka-consumer.py
- tests/dbz-twin/rac/perf/fuzz-workload.sql
- tests/dbz-twin/rac/validator.py
The per-seq MISSING_OLR/EXTRA_OLR branches incremented total_mismatches but not total_missing_olr/total_missing_lm, so missing rows were under-reported in the summary output.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/dbz-twin/rac/validator.py (1)
249-264: ⚠️ Potential issue | 🔴 Critical — Don't keep the watermark keyed only by `event_id`.
Lines 249-251 make `event_id` stable for the whole row lifetime, but the surrounding scan still advances `cursor_by_node` by `event_id` and only refetches `event_id > cursor`. That means once a row has been validated, any later `seq` for that same `event_id` is permanently behind the cursor, so subsequent UPDATE/DELETE mismatches are never checked and the idle pass can exit with unvalidated work still pending. This needs progress tracking per `(event_id, seq)` or, at minimum, a revisit path for event IDs whose max `seq` has grown.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/dbz-twin/rac/validator.py` around lines 249 - 264, The validator currently advances cursor_by_node keyed only by event_id so later seq updates for the same event_id are skipped; change progress tracking to include seq (or track max seq per event_id and revisit when it grows). Concretely, modify the scan/advance logic around cursor_by_node, and when you build lm_by_seq / olr_by_seq from lm_events and olr_events, compute the highest seq for that event_id and store cursor_by_node[node] as (event_id, max_seq) or maintain a separate cursor_seq map; ensure the loop refetches event_ids whose max_seq > stored seq (or uses tuple comparison) so UPDATE/DELETE seqs are validated, using the existing variables lm_events, olr_events, lm_by_seq, olr_by_seq, all_seqs to detect and drive the revisit.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@tests/dbz-twin/rac/validator.py`:
- Around line 249-264: The validator currently advances cursor_by_node keyed
only by event_id so later seq updates for the same event_id are skipped; change
progress tracking to include seq (or track max seq per event_id and revisit when
it grows). Concretely, modify the scan/advance logic around cursor_by_node, and
when you build lm_by_seq / olr_by_seq from lm_events and olr_events, compute the
highest seq for that event_id and store cursor_by_node[node] as (event_id,
max_seq) or maintain a separate cursor_seq map; ensure the loop refetches
event_ids whose max_seq > stored seq (or uses tuple comparison) so UPDATE/DELETE
seqs are validated, using the existing variables lm_events, olr_events,
lm_by_seq, olr_by_seq, all_seqs to detect and drive the revisit.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 7893beae-9807-4f20-8afb-b186d4de2563
📒 Files selected for processing (1)
tests/dbz-twin/rac/validator.py
Summary
Fixes and improvements to the fuzz-test framework for OLR data accuracy validation on Oracle RAC.
Bug fixes:
- `INSERT OR REPLACE` to overwrite the INSERT
- Search the full log instead of `tail -5`, which missed the startup message with verbose trace output

Validator improvements:
Infrastructure:
Test Results (30-minute run)
Test plan
Summary by CodeRabbit
Documentation
Tests