Blkseq commitlsn #5848

Open
markhannum wants to merge 2 commits into bloomberg:main from markhannum:blkseq_commitlsn

@markhannum
Contributor

This PR changes the durability algorithm used for blkseq replays. Previously, when the master detected a replay, it returned 'this is durable' only if the most recent commit was durable. This PR instead records each transaction's blkseq->commitlsn mapping in the blkseq_commitlsns table. That lets us determine durability from the transaction's actual commit-lsn, falling through to the previous logic if the mapping cannot be found.
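
For illustration only, here is a minimal sketch of the lookup-then-fallback decision described above. It is not the code in this PR: `lsn_t`, `blkseq_entry_t`, `find_commit_lsn`, and `replay_is_durable` are hypothetical names, the array is an in-memory stand-in for the blkseq_commitlsns table, and it assumes an LSN is a (file, offset) pair that counts as durable once it is at or below the cluster's durable LSN.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical LSN: (file, offset), ordered by file first, then offset. */
typedef struct {
    uint32_t file;
    uint32_t offset;
} lsn_t;

/* Returns <0, 0, >0 when a sorts before, equal to, or after b. */
static int lsn_cmp(lsn_t a, lsn_t b)
{
    if (a.file != b.file)
        return a.file < b.file ? -1 : 1;
    if (a.offset != b.offset)
        return a.offset < b.offset ? -1 : 1;
    return 0;
}

/* Hypothetical stand-in for one row of the blkseq_commitlsns table. */
typedef struct {
    const char *blkseq; /* replay key */
    lsn_t commit_lsn;   /* LSN of that transaction's commit record */
} blkseq_entry_t;

/* Linear lookup of a blkseq key; returns NULL when no mapping is recorded. */
static const lsn_t *find_commit_lsn(const blkseq_entry_t *tbl, size_t n,
                                    const char *blkseq)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(tbl[i].blkseq, blkseq) == 0)
            return &tbl[i].commit_lsn;
    }
    return NULL;
}

/* Decide whether a detected replay can be reported as durable. */
static bool replay_is_durable(const blkseq_entry_t *tbl, size_t n,
                              const char *blkseq, lsn_t durable_lsn,
                              lsn_t latest_commit_lsn)
{
    assert(blkseq != NULL);

    const lsn_t *commit = find_commit_lsn(tbl, n, blkseq);
    if (commit != NULL) {
        /* New path: judge the transaction by its own commit LSN. */
        return lsn_cmp(*commit, durable_lsn) <= 0;
    }
    /* Fallback: previous behaviour, judge by the most recent commit. */
    return lsn_cmp(latest_commit_lsn, durable_lsn) <= 0;
}
```

In this sketch, a replay whose own commit is already durable is reported as durable even when a later, unrelated commit is not, which the previous most-recent-commit check would have rejected.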

@roborivers roborivers left a comment

Cbuild submission: Success ✓.
Regression testing: Success ✓.

The first 10 failing tests are:
truncatesc_offline_generated **quarantined**
sc_resume_logicalsc_generated **quarantined**
reco-ddlk-sql **quarantined**
sc_parallel_logicalsc_generated
consumer_non_atomic_default_consumer_generated **quarantined**

@roborivers roborivers left a comment

Cbuild submission: Success ✓.
Regression testing: Success ✓.

The first 10 failing tests are:
consumer_non_atomic_default_consumer_generated **quarantined**
reco-ddlk-sql [timeout] **quarantined**

@roborivers roborivers left a comment

Cbuild submission: Success ✓.
Regression testing: Success ✓.

The first 10 failing tests are:
scindex_logicalsc_generated [failed with core dumped]
consumer_non_atomic_default_consumer_generated **quarantined**
reco-ddlk-sql [timeout] **quarantined**

@markhannum force-pushed the blkseq_commitlsn branch 3 times, most recently from 9ce4ffe to d4ba33c (April 6, 2026 15:01)

@roborivers roborivers left a comment

Cbuild submission: Error ⚠.
Regression testing: Success ✓.

The first 10 failing tests are:
consumer_non_atomic_default_consumer_generated **quarantined**
unifiedcancel **quarantined**
truncatesc_offline_generated [timeout] **quarantined**
reco-ddlk-sql [timeout] **quarantined**

@roborivers roborivers left a comment

Cbuild submission: Error ⚠.
Regression testing: Success ✓.

The first 10 failing tests are:
socksql_master_swings
cldeadlock
consumer_non_atomic_default_consumer_generated **quarantined**
sc_transactional_rowlocks_generated **quarantined**
reco-ddlk-sql [timeout] **quarantined**

@markhannum force-pushed the blkseq_commitlsn branch 3 times, most recently from b288f1c to ff4d085 (April 7, 2026 13:21)

@roborivers roborivers left a comment

Cbuild submission: Success ✓.
Regression testing: Success ✓.

The first 10 failing tests are:
scindex_logicalsc_generated
consumer_non_atomic_default_consumer_generated **quarantined**
sc_transactional_rowlocks_generated **quarantined**
queuedb_rollover [timeout] **quarantined**
sc_truncate_lockorder_generated [timeout] **quarantined**
reco-ddlk-sql [timeout] **quarantined**

@roborivers roborivers left a comment

Cbuild submission: Success ✓.
Regression testing: Success ✓.

The first 10 failing tests are:
sc_truncate_multiddl_generated [db unavailable at finish] **quarantined**
socksql_master_swings
queuedb_rollover_noroll1_generated **quarantined**
consumer_non_atomic_default_consumer_generated **quarantined**
reco-ddlk-sql [timeout] **quarantined**

Comment thread db/toblock.c
Comment thread bdb/cursor.c
Comment thread db/toblock.c
Comment thread db/toblock.c
/* Prepare failed to reach a majority of the cluster: fail the txn */
dist_txn_abort_write_blkseq(thedb->bdb_env, bskey, bskeylen);
trans_abort(iq, parent_trans);
dist_txn_abort_write_blkseq(thedb->bdb_env, bskey, bskeylen, NULL, 0);
Contributor

should we call this before releasing the locks, and use parent_trans?

Contributor Author

trans_abort releases all locks and will emit the 'dist-abort' record. The code is a bit lazy in forcing all non-durable prepares to fall back to the 'latest-commit-is-durable' case.

I wrote it this way because only prepared transactions require an explicit 'abort' record, and because we don't currently have a way to convey the LSN of that abort record to toblock.

Comment thread tests/final_non_durable_retry.test/runit
Comment thread tests/final_non_durable_retry.test/runit
@markhannum force-pushed the blkseq_commitlsn branch 2 times, most recently from 8818b08 to ba780e3 (April 8, 2026 13:59)

@roborivers roborivers left a comment

Cbuild submission: Success ✓.
Regression testing: Success ✓.

The first 10 failing tests are:
truncatesc_offline_generated [failed with core dumped] **quarantined**
sc_resume_logicalsc_generated **quarantined**
consumer_non_atomic_default_consumer_generated **quarantined**
reco-ddlk-sql [timeout] **quarantined**

@roborivers roborivers left a comment

Cbuild submission: Success ✓.
Regression testing: Success ✓.

The first 10 failing tests are:
lostwrite
consumer_non_atomic_default_consumer_generated **quarantined**
phys_rep_tiered_firstfile_generated [timeout]
reco-ddlk-sql [timeout] **quarantined**

@roborivers roborivers left a comment

Cbuild submission: Error ⚠.
Regression testing: Success ✓.

The first 10 failing tests are:
consumer_non_atomic_default_consumer_generated **quarantined**
longreq_stats

@markhannum force-pushed the blkseq_commitlsn branch 2 times, most recently from 556b21b to 3363f10 (April 13, 2026 16:04)
Signed-off-by: Mark Hannum <mhannum@bloomberg.net>
Signed-off-by: Mark Hannum <mhannum@bloomberg.net>

@roborivers roborivers left a comment

Cbuild submission: Success ✓.
Regression testing: Success ✓.

The first 10 failing tests are:
sc_resume_logicalsc_generated **quarantined**
timepart_retro
consumer_non_atomic_default_consumer_generated **quarantined**
reco-ddlk-sql [timeout] **quarantined**
