GitHub Issue Draft — flashbots/rbuilder
TITLE:
run_check_if_parent_block_is_last_block cancels ~67% of building jobs on non-datacenter hardware
BODY:
Summary
The continuous last_block_number check in run_check_if_parent_block_is_last_block() cancels approximately 67% of building jobs on consumer/VPS hardware. The root cause is a timing mismatch between Reth's database commit latency and the 100ms check interval.
Environment
- rbuilder develop branch (commit 55bbd32, also tested on 80ebfc8)
- Reth 1.9.3 (commit 27a8c0f5, same as rbuilder's pinned version)
- Tested on both integrated (reth-rbuilder) and standalone modes
- Hardware: Intel i5-12600K / 64GB RAM / NVMe (integrated) and AMD EPYC 4244P / 64GB RAM / NVMe VPS (standalone)
- Lighthouse v8.1.2 as CL
The Problem
Logs show constant cancellations:
INFO Cancelling building job reason="last block number" last_block_number=24753843 block=24753845
INFO Cancelling building job reason="last block number" last_block_number=24753843 block=24753846
last_block_number consistently trails the canonical head by 2-3 blocks. Measured bid success rates:
| Mode | Bid Rate | Hardware |
|---|---|---|
| Integrated | 32.6% | i5-12600K, 64GB, consumer NVMe |
| Standalone | 31% | EPYC 4244P, 64GB, datacenter VPS |
Config tuning attempted with no improvement: root_hash_sparse_trie_version (v1, v2, vexp), root_hash_threads (0, 4, 6), faster_finalize (true/false).
Root Cause
run_check_if_parent_block_is_last_block() in crates/rbuilder/src/live_builder/building/mod.rs polls every 100ms:
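    const CHECK_LAST_BLOCK_INTERVAL: Duration = Duration::from_millis(100);

    // Inside the polling loop (abridged):
    let last_block_number = provider.last_block_number()?;
    if last_block_number + 1 != block_ctx.block() {
        // The parent of the block being built is not the last block Reth reports.
        block_cancellation.cancel();
    }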
last_block_number() returns the highest block committed to Reth's MDBX database on disk — not the latest block processed in memory. Reth's database write pipeline introduces 200ms-2s latency between "Block added to canonical chain" and "Canonical chain committed" (consistent with paradigmxyz/reth#8307).
The CL sends payload_attributes at the slot boundary, and rbuilder starts building block N+1. The first 100ms check then asks Reth for last_block_number() and gets N-2, because the disk commit is still in progress. Since (N-2)+1 != N+1, the job is cancelled.
The sparse trie is not involved — the building job is cancelled before the trie does any work. The trie operates on a fixed parent block reference set at job start.
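The cancellation condition can be replayed in isolation. The sketch below is a minimal model, not rbuilder code: should_cancel copies the comparison from the excerpt, and lagging_last_block is a hypothetical helper that stands in for a provider whose disk commit trails the parent block.

```rust
/// Same comparison as in the excerpt: the job for `building_block` survives
/// only while the provider reports its parent as the last block.
fn should_cancel(last_block_number: u64, building_block: u64) -> bool {
    last_block_number + 1 != building_block
}

/// Hypothetical helper (not part of rbuilder or Reth): the block number a
/// lagging provider would report when its disk commit trails the job's
/// parent block by `lag_blocks`.
fn lagging_last_block(parent_block: u64, lag_blocks: u64) -> u64 {
    parent_block.saturating_sub(lag_blocks)
}
```

With the values from the first log line, should_cancel(24753843, 24753845) is true. It would only become false once the provider reports 24753844, the parent of the block being built.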
Why BuilderNet Doesn't See This
Datacenter hardware with NVMe RAID arrays and 256GB+ RAM presumably keeps MDBX commit latency under 100ms, i.e. within the check interval. The consumer and VPS machines tested here do not.
Fix
Commenting out the spawn_blocking call that launches the continuous check removes the cancellations:
    // crates/rbuilder/src/live_builder/building/mod.rs, lines ~110-117
    // BEFORE:
    {
        let provider = self.provider.clone();
        let block_ctx = block_ctx.clone();
        let block_cancellation = block_cancellation.clone();
        tokio::task::spawn_blocking(move || {
            run_check_if_parent_block_is_last_block(provider, block_ctx, block_cancellation);
        });
    }
    // AFTER: commented out entirely
Results
| Machine | Before | After |
|---|---|---|
| Integrated (i5-12600K) | 32.6% | 100% (937/937 bids, 0 cancellations) |
| Standalone (EPYC 4244P) | 31% | 97% (862 bids, 0 cancellations, 26 IPC header misses) |
Safety Analysis
The check's purpose is reorg detection during building. Safety considerations:
- The parent header is already validated in wait_for_block_header() before building starts. This prevents building on the wrong chain.
- If a reorg occurs during building, the relay rejects the block (parent hash mismatch). No funds are at risk; the only cost is a wasted building cycle.
- Mainnet reorgs are extremely rare (~1-2 per month).
- The max_time_to_build timeout still applies, so building jobs have a natural deadline.
The cost of the check (roughly 67% of building jobs cancelled on the tested non-datacenter hardware) vastly exceeds its benefit (catching the ~1-2 reorgs per month noted above).
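To put rough numbers on that comparison, here is a back-of-the-envelope sketch. It assumes one building job per 12-second mainnet slot, a 30-day month, and a cancellation rate of 100% minus the integrated bid rate of 32.6%, i.e. 67.4%. These are simplifying assumptions for illustration, not measurements; the function names are made up for this example.

```rust
/// Mainnet slot time in seconds.
const SLOT_SECONDS: u64 = 12;

/// Number of slots in `days` days, assuming no missed slots.
fn slots_in_days(days: u64) -> u64 {
    days * 24 * 60 * 60 / SLOT_SECONDS
}

/// Jobs cancelled out of `jobs`, with the rate given in tenths of a percent
/// (674 means 67.4%). Integer math keeps the result exact and reproducible.
fn cancelled_jobs(jobs: u64, cancel_rate_permille: u64) -> u64 {
    jobs * cancel_rate_permille / 1000
}
```

Under those assumptions, slots_in_days(30) gives 216,000 slots, and cancelled_jobs(216_000, 674) gives 145,584 cancelled jobs per month, set against the one or two reorgs the check is there to catch.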
Suggested Improvement
Rather than removing the check entirely, consider one of:
- Replace last_block_number() with best_block_number(), which returns the in-memory head rather than disk-committed state. This would make the check hardware-independent.
- Add a grace period at startup: wait for last_block_number() to catch up before starting the check loop, with a configurable timeout.
- Make the check configurable: add a disable_last_block_check config option so non-datacenter operators can opt out.
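The grace-period option could look roughly like the sketch below. It is a standalone model under stated assumptions, not a patch: the LastBlock trait is a hypothetical stand-in for the provider (only last_block_number() is modeled, with a String error for simplicity), and GraceOutcome, wait_for_parent_commit, and CatchingUpProvider are names invented for this example.

```rust
use std::cell::Cell;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the provider; only the call discussed in this
/// issue is modeled.
trait LastBlock {
    fn last_block_number(&self) -> Result<u64, String>;
}

#[derive(Debug, PartialEq, Eq)]
enum GraceOutcome {
    /// The provider now reports the parent of the block being built.
    CaughtUp,
    /// The provider is already past the parent: the job is stale.
    ParentSuperseded,
    /// The provider never caught up within the grace period.
    TimedOut,
}

/// Polls until `last_block_number() + 1 == building_block`, the provider moves
/// past the parent, or `grace` elapses. Only after `CaughtUp` would the
/// existing continuous check be started.
fn wait_for_parent_commit<P: LastBlock>(
    provider: &P,
    building_block: u64,
    grace: Duration,
    poll: Duration,
) -> Result<GraceOutcome, String> {
    let deadline = Instant::now() + grace;
    loop {
        let last = provider.last_block_number()?;
        if last + 1 == building_block {
            return Ok(GraceOutcome::CaughtUp);
        }
        if last + 1 > building_block {
            return Ok(GraceOutcome::ParentSuperseded);
        }
        if Instant::now() >= deadline {
            return Ok(GraceOutcome::TimedOut);
        }
        std::thread::sleep(poll);
    }
}

/// Test double: reports `next` and advances by one block per call up to `max`,
/// imitating a disk commit that gradually catches up.
struct CatchingUpProvider {
    next: Cell<u64>,
    max: u64,
}

impl LastBlock for CatchingUpProvider {
    fn last_block_number(&self) -> Result<u64, String> {
        let n = self.next.get();
        if n < self.max {
            self.next.set(n + 1);
        }
        Ok(n)
    }
}
```

What to do on TimedOut is a policy decision left open here: cancelling preserves today's behaviour for a provider that is genuinely stuck, while continuing without the check matches the workaround described above. The grace duration would need to cover the 200ms-2s commit latency observed in this report.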
Impact
This affects any rbuilder operator running on hardware where Reth's MDBX commit takes >100ms. This likely includes most solo validators and small operators — exactly the audience rbuilder is designed to serve as an open-source block builder.