run_check_if_parent_block_is_last_block cancels ~67% of building jobs on non-datacenter hardware #909

# GitHub Issue Draft — flashbots/rbuilder

## TITLE:
`run_check_if_parent_block_is_last_block` cancels ~67% of building jobs on non-datacenter hardware

---

## BODY:

### Summary

The continuous `last_block_number` check in `run_check_if_parent_block_is_last_block()` cancels approximately 67% of building jobs on consumer/VPS hardware. The root cause is a timing mismatch between Reth's database commit latency and the 100ms check interval.

### Environment

- rbuilder develop branch (commit 55bbd32, also tested on 80ebfc8)
- Reth 1.9.3 (commit 27a8c0f5, same as rbuilder's pinned version)
- Tested on both integrated (reth-rbuilder) and standalone modes
- Hardware: Intel i5-12600K / 64GB RAM / NVMe (integrated) and AMD EPYC 4244P / 64GB RAM / NVMe VPS (standalone)
- Lighthouse v8.1.2 as CL

### The Problem

Logs show constant cancellations:

```
INFO Cancelling building job reason="last block number" last_block_number=24753843 block=24753845
INFO Cancelling building job reason="last block number" last_block_number=24753843 block=24753846
```

`last_block_number` consistently trails the block being built by 2-3 blocks, which puts it 1-2 blocks behind the expected parent in the log lines above. Measured bid success rates:

| Mode | Bid Rate | Hardware |
|------|----------|----------|
| Integrated | 32.6% | i5-12600K, 64GB, consumer NVMe |
| Standalone | 31% | EPYC 4244P, 64GB, datacenter VPS |

Tuning the following config options made no difference: `root_hash_sparse_trie_version` (v1, v2, vexp), `root_hash_threads` (0, 4, 6), `faster_finalize` (true/false).

### Root Cause

`run_check_if_parent_block_is_last_block()` in `crates/rbuilder/src/live_builder/building/mod.rs` polls every 100ms:

```rust
const CHECK_LAST_BLOCK_INTERVAL: Duration = Duration::from_millis(100);

let last_block_number = provider.last_block_number()?;
if last_block_number + 1 != block_ctx.block() {
    block_cancellation.cancel();
}
```

`last_block_number()` returns the highest block committed to Reth's MDBX database on disk — not the latest block processed in memory. Reth's database write pipeline introduces 200ms-2s latency between "Block added to canonical chain" and "Canonical chain committed" (consistent with paradigmxyz/reth#8307).

The CL sends `payload_attributes` at the slot boundary, and rbuilder starts building block N+1. The 100ms check then fires, asks Reth for `last_block_number()`, gets N-2 (the disk commit is still in progress), sees that N-2+1 != N+1, and cancels the job.
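As a minimal illustration of that comparison, the sketch below applies the same predicate to the block numbers from the log lines in "The Problem". The function name `should_cancel` is hypothetical and is not part of rbuilder; only the `last_block_number + 1 != block` condition is taken from the excerpt above.

```rust
/// Hypothetical helper mirroring the comparison quoted above: the job
/// survives only if the provider's last block is exactly the parent of
/// the block being built.
fn should_cancel(last_block_number: u64, building_block: u64) -> bool {
    last_block_number + 1 != building_block
}

fn main() {
    // Values taken from the two log lines in "The Problem".
    let observed = [
        (24_753_843u64, 24_753_845u64),
        (24_753_843u64, 24_753_846u64),
    ];
    for (last, building) in observed {
        println!(
            "last={last} building={building} cancel={}",
            should_cancel(last, building)
        );
    }

    // Once the provider reports the parent block, the job is left alone.
    println!("caught up: cancel={}", should_cancel(24_753_844, 24_753_845));
}
```

Both logged pairs evaluate to a cancellation, and only a provider that already reports the parent (24753844 for block 24753845) lets the job continue.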

The sparse trie is not involved — the building job is cancelled before the trie does any work. The trie operates on a fixed parent block reference set at job start.

### Why BuilderNet Doesn't See This

Datacenter hardware with NVMe RAID arrays and 256GB+ RAM keeps MDBX commit latency under 100ms — within the check interval. Consumer and VPS hardware cannot match this.

### Fix

Commenting out the `spawn_blocking` call that launches the continuous check eliminates the cancellations:

```rust
// crates/rbuilder/src/live_builder/building/mod.rs, lines ~110-117
// BEFORE:
        {
            let provider = self.provider.clone();
            let block_ctx = block_ctx.clone();
            let block_cancellation = block_cancellation.clone();
            tokio::task::spawn_blocking(move || {
                run_check_if_parent_block_is_last_block(provider, block_ctx, block_cancellation);
            });
        }

// AFTER: commented out entirely
```

### Results

| Machine | Before | After |
|---------|--------|-------|
| Integrated (i5-12600K) | 32.6% | **100%** (937/937 bids, 0 cancellations) |
| Standalone (EPYC 4244P) | 31% | **97%** (862 bids, 0 cancellations, 26 IPC header misses) |

### Safety Analysis

The check's purpose is reorg detection during building. Safety considerations:

1. **Parent header is already validated before building starts** in `wait_for_block_header()`. This prevents building on the wrong chain.
2. **If a reorg occurs during building**, the relay rejects the block (parent hash mismatch). No funds at risk — just a wasted building cycle.
3. **Mainnet reorgs are extremely rare** (~1-2 per month).
4. **The `max_time_to_build` timeout still applies** — building jobs have a natural deadline.

The cost of the check (67% of building jobs cancelled on non-datacenter hardware) vastly exceeds the benefit (catching ~1-2 reorgs/month).
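For a rough sense of scale, the sketch below compares the two figures per slot. It assumes 12-second mainnet slots and a 30-day month, and it treats every slot as one building job, which is a simplification. The helper names are illustrative only.

```rust
/// Ethereum mainnet slot time in seconds.
const SECONDS_PER_SLOT: u64 = 12;

/// Number of slots in a period of `days` days.
fn slots_in_days(days: u64) -> u64 {
    days * 24 * 60 * 60 / SECONDS_PER_SLOT
}

/// Fraction of slots affected if `events` occur over `slots` slots.
fn fraction_of_slots(events: u64, slots: u64) -> f64 {
    events as f64 / slots as f64
}

fn main() {
    let slots = slots_in_days(30); // 216_000 slots in 30 days
    let reorg_fraction = fraction_of_slots(2, slots); // upper end of ~1-2 per month
    let cancelled_fraction = 0.67; // measured cancellation rate from this issue

    println!("slots per 30 days:  {slots}");
    println!("reorg fraction:     {:.6}%", reorg_fraction * 100.0);
    println!("cancelled fraction: {:.0}%", cancelled_fraction * 100.0);
}
```

Under these assumptions, reorgs touch fewer than one slot in 100,000, while the check cancels roughly two jobs out of three.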

### Suggested Improvement

Rather than removing the check entirely, consider one of:

1. **Replace `last_block_number()` with `best_block_number()`** — returns the in-memory head rather than disk-committed state. This would make the check hardware-independent.

2. **Add a grace period at startup** — wait for `last_block_number()` to catch up before starting the check loop, with a configurable timeout.

3. **Make the check configurable** — add a `disable_last_block_check` config option so non-datacenter operators can opt out.
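Option 2 could look roughly like the sketch below. It is not rbuilder code: `wait_for_parent`, `GraceOutcome`, and the closure standing in for the provider are hypothetical names, and the grace and polling durations are placeholders. The idea is to poll until the provider reports the expected parent, and only then start the existing cancellation loop.

```rust
use std::time::{Duration, Instant};

/// Outcome of waiting for the provider to report the expected parent block.
#[derive(Debug, PartialEq)]
enum GraceOutcome {
    CaughtUp,
    TimedOut,
}

/// Polls `read_last_block` until it returns at least `parent_number`
/// or until `grace` has elapsed, sleeping `poll` between attempts.
fn wait_for_parent<F>(
    mut read_last_block: F,
    parent_number: u64,
    grace: Duration,
    poll: Duration,
) -> GraceOutcome
where
    F: FnMut() -> u64,
{
    let deadline = Instant::now() + grace;
    loop {
        if read_last_block() >= parent_number {
            return GraceOutcome::CaughtUp;
        }
        if Instant::now() >= deadline {
            return GraceOutcome::TimedOut;
        }
        std::thread::sleep(poll);
    }
}

fn main() {
    // Stand-in for a provider whose disk commit lags for the first three polls.
    let mut calls = 0u32;
    let lagging_provider = || {
        calls += 1;
        if calls < 4 { 98 } else { 100 }
    };

    let outcome = wait_for_parent(
        lagging_provider,
        100,                        // expected parent block number
        Duration::from_secs(2),     // placeholder grace period
        Duration::from_millis(10),  // placeholder poll interval
    );
    println!("{outcome:?}");
}
```

On `CaughtUp` the continuous check would start as it does today. What to do on `TimedOut` (cancel the job, or keep building without the check) is a policy decision this sketch leaves open.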

### Impact

This affects any rbuilder operator running on hardware where Reth's MDBX commit takes >100ms. This likely includes most solo validators and small operators — exactly the audience rbuilder is designed to serve as an open-source block builder.
