Skip to content

feat: multi blob#935

Open
chengwenxi wants to merge 53 commits into
mainfrom
feat/multi_batch
Open

feat: multi blob#935
chengwenxi wants to merge 53 commits into
mainfrom
feat/multi_batch

Conversation

@chengwenxi
Copy link
Copy Markdown
Collaborator

@chengwenxi chengwenxi commented Apr 17, 2026

Summary by CodeRabbit

  • New Features

    • Multi-blob batch support (Batch Header v2) with aggregated blob-hash handling across the stack.
  • Configuration

    • New CLI/env options to set max blob count and V2 upgrade time; batching honors on-chain governance interval/timeout.
  • Improvements

    • Enhanced packing/sealing, multi-blob compression/chunking, progress reporting, and blob-aware gas/fee estimation.
  • Tests

    • Added unit and integration tests covering V2 and multi-blob scenarios.

Review Change Stack

Kukoomomo and others added 6 commits April 15, 2026 15:04
- Add isBatchV2Upgraded hook to BatchCache so V2 header is always
  generated once the upgrade is activated, regardless of blob count.
  Previously the code fell back to V1 for single-blob batches, which
  is incompatible with the V2 public_input_hash (keccak(hash[0]) ≠ hash[0]).

- Remove the MAX_BLOB_PER_BLOCK = 6 constant from Rollup.sol and rely
  solely on blobhash(i) == bytes32(0) to terminate the blob-count loop.
  Per spec §9 design decision, blob count limits should be controlled
  by tx-submitter MaxBlobCount config, not a hardcoded contract constant,
  so Ethereum protocol upgrades (e.g. EIP-7691) require no contract change.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… challenge handler

- Change ExecutorInput.blob_info to blob_infos (Vec<BlobInfo>) with batch_version field
- Add BlobVerifier::verify_blobs for multi-blob KZG verification
- Add BatchInfo::public_input_hash_v2 using keccak256(hash[0]||...||hash[N-1])
- Add multi-blob encoding (encode_multi_blob, encode_blob_from_bytes) in host blob.rs
- Route verify() on batch_version: V2 uses aggregated blob hashes, V0/V1 unchanged
- Update shadow_rollup calc_batch_pi to parse V2 header with blob_count at offset 257
- Add blob_count and extra_blob_hashes to challenge handler BatchInfo and encode

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… entrypoints

- Add batch_version to ProveRequest and thread through gen_client_input/execute_batch
- Use verify_blobs (all blobs) instead of verify (first blob only) in server queue
- Compute blobHashesHash for V2 in server batch_header_ex; pass individual hashes for fill_ext
- fill_ext parses V2 blob_count + per-blob hashes from extended batch_header_ex
- Add batch_version param to try_execute_batch; callers extract version from batch_header[0]
- Add --batch-version CLI arg to host binary
- Add blob_count param to execute_batch for correct PI hash routing

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…V0/V1 compat fields

Replace blob_versioned_hash + blob_count + extra_blob_hashes with a single
blob_hashes: Vec<[u8; 32]>. fill_ext parses all hashes from batch_header_ex,
encode writes blob_hashes[0] at offset 57 and appends count + remaining hashes
for V2. No backward-compatibility shims needed since prover components upgrade together.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 17, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds BatchHeader V2 and multi-blob support across blob utilities, batch cache packing/sealing, node/prover flows, Rollup contract, tx-submitter wiring, gas-oracle sizing, tests, and infra updates.

Changes

Batch V2 header, interfaces, and shared blob/encoding utilities

Layer / File(s) Summary
Interfaces & data contracts
common/batch/interfaces.go
Defines SealedBatchKV, L1/L2/rollup/gov interfaces and ErrKeyNotFound.
Batch header V2
common/batch/batch_header.go, node/types/batch_header.go
Adds BatchHeaderVersion2, validates V2 length, adds BlobHashesHash() and version-gates BlobVersionedHash().
Encoding helpers
common/batch/encoding.go
Big-endian uint64/uint16 helpers and HeightFromBlockContextBytes validator.
Blob payload/sidecar & compression
common/batch/blob.go, common/blob/payload.go
Canonical blob packing, commitments, MakeBlobTxSidecar multi-blob builder, CompressBatchBytes, and Retrieve/Decode helpers.
Blob fee/config helpers
common/blob/fee.go
Chain blob fee configs, ChainConfigMap, GetBlobFeeDenominator, BlobHashes/MakeBlobProof/MakeCellProof, sidecar version helpers.
Module metadata
common/go.mod
Adds morph-l2/common module and dependency pins.

Batch cache V2 enablement and tests

Layer / File(s) Summary
Cache construction and L2Gov wiring
common/batch/batch_cache.go, common/batch/l2_gov.go
New NewBatchCache signature accepts isBatchV2Upgraded, maxBlobCount, L1HeaderClient, L2MultiClient, RollupBatchReader, L2GovCaller, SealedBatchKV; uses L2Gov for timing/tree roots and constructs BatchStorage.
Capacity calculation and sealing
common/batch/batch_cache.go, common/batch/batch_data.go
Effective maxBlobCount gating, capacity checks use maxBlobCount*MaxBlobBytesSize; SealBatch creates sidecars with effective blob count and updated compressed-size checks.
Packing progress & logs
common/batch/batch_cache.go
Adds batch packing progress state, progress logging, and expanded sealed-batch logs including blobCount/version.
Batch storage abstraction
common/batch/batch_storage.go
BatchStorage now uses SealedBatchKV and centralized isKVNotFound handling.
Tests and helpers
common/batch/helpers_test.go, *_test.go
Test KV helper, genesis-header init test, updated cache tests, restart tests using L2Gov, commit tests updated to new blob helpers.

Solidity Rollup V2 and tests

Layer / File(s) Summary
Rollup V2 commit & verify
contracts/contracts/l1/rollup/Rollup.sol
Adds _computeBlobVersionedHash supporting V2 aggregation, accepts versions ≤2, refactors header serialization and public-input sourcing.
Contract tests
contracts/contracts/test/Rollup.t.sol
Updates invalid-version expectations and adds comprehensive V2 header/aggregation tests and V2-specific revert checks.

Node DA verification and batch parsing

Layer / File(s) Summary
Beacon/derivation ordered blob assembly
node/derivation/beacon.go, node/derivation/derivation.go
Adds verifyBlob, indexes beacon sidecars by versioned hash, enforces L1-declared blob order, decodes & verifies each blob deterministically, proofs omitted.
ParseBatch multi-blob handling
node/derivation/batch_info.go, node/derivation/batch_info_test.go
Concatenates blob bodies then single decompress; switches to common batch header/tx decode and adds tests for multi-blob decompression invariant.
Deprecated duplicate header
node/types/batch_header.go
Marks file deprecated; mirrors common/batch V2 behavior for backward compatibility.

Prover/executor multi-blob inputs and hashing

Layer / File(s) Summary
Executor input & BatchInfo hash
prover/crates/executor/client/src/types/*
ExecutorInput uses blob_infos Vec and batch_version; adds public_input_hash_v2 aggregating blob hashes for V2 public input.
BlobVerifier multi-blob API
prover/crates/executor/client/src/verifier/blob_verifier.rs
Adds verify_blobs and verify_raw, separates verify_kzg, concatenates raw scalars then decompresses once.
Host encode & execute API
prover/crates/executor/host/src/blob.rs, prover/bin/host/src/execute.rs
Encodes multi-blob payloads, exposes get_blob_infos_from_* APIs, and threads batch_version into execute_batch.
Queue/shadow & challenge wiring
prover/bin/server/*, prover/bin/shadow-prove/*, prover/bin/challenge/src/handler.rs
Adds batch_version to ProveRequest, gen_client_input, queue processing, shadow path; aggregates versioned_hashes into keccak for V2 header blob_input; Challenge stores blob_hashes Vec.

Gas-oracle multi-blob changes

Layer / File(s) Summary
Blob payload extraction & detection
gas-oracle/app/src/da_scalar/blob.rs
Adds get_payload_bytes, detect_zstd_compressed and check_data accept num_blobs and validate against multi-blob capacity.
Combined payload & scalar
gas-oracle/app/src/da_scalar/{calculate,l1_scalar}.rs
Concatenates payloads, detects/decompresses once; get_data_from_blob returns (batch_size,num_blobs,txn_count); scalar uses num_blobs in capacity math.

Tx-submitter refactor, flags, config, infra

Layer / File(s) Summary
Blob compat & config migration
tx-submitter/types/blob_compat.go, tx-submitter/types/blob*.go
Removes old tx-submitter blob code; adds blob_compat wrappers delegating to morph-l2/common/blob; removes old params file.
Rollup client & service wiring
tx-submitter/services/rollup.go, tx-submitter/iface/client.go, tx-submitter/types/{converter,l2Caller}.go
Blob-aware EstimateGas signature, ChainConfigMap uses common blob types, L2Caller aliases L2Gov, client Len(), converter delegations to common/batch.
Flags & runtime config
tx-submitter/flags/flags.go, tx-submitter/utils/config.go
Adds MaxBlobCount and BatchV2UpgradeTime flags and config fields.
Infra & configs
ops/docker/*, ops/l2-genesis/*, contracts/src/deploy-config/*, go-ethereum
Docker env flags for seal/V2, compose profiles, sampling/custody increases, devnet govBatchBlockInterval bump, go-ethereum submodule update, deploy-config programVkey updates.

Sequence Diagram(s)

sequenceDiagram
  participant Cache
  participant L2Gov
  participant BlobUtils
  participant TxSubmitter
  participant Rollup
  participant Beacon
  participant Prover

  Cache->>L2Gov: BatchBlockInterval/BatchTimeout
  Cache->>BlobUtils: CompressBatchBytes -> MakeBlobTxSidecar
  Cache->>TxSubmitter: Emit V2 header (aggregated blob hashes)
  TxSubmitter->>Rollup: commitBatch(V<=2, sidecar)
  Rollup-->>TxSubmitter: committed
  TxSubmitter->>Beacon: fetch sidecars by tx
  Beacon-->>TxSubmitter: blobs+commitments
  Prover->>Prover: verify_blobs -> versioned_hashes
  Prover->>Prover: public_input_hash_v2(aggregated hash)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested labels

validator

Suggested reviewers

  • panos-xyz
  • curryxbo

"I nibble bytes and pack them tight,
headers flipped to V2 tonight.
Keccak hums, blobs align,
sequencer roots and proofs refine.
Hop, probe, seal — a rabbit's delight."

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/multi_batch

chengwenxi and others added 7 commits April 17, 2026 10:35
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…mpress once

Split get_origin_batch into unpack_blob (field-element unpack) and decompress_batch
(zstd decompress). verify_blobs now KZG-verifies each blob independently, unpacks
all compressed chunks, concatenates them, then calls decompress_batch once.
Previously each blob was independently decompressed which fails for N>1 since the
zstd frame spans all chunks.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…fset 57

V2 headers now use the same 257-byte format as V1 with the aggregated
blob hash (keccak256 of all blob hashes) at offset 57. This eliminates
BatchHeaderCodecV2, simplifies contracts/prover/submitter, and fixes the
multi-blob decompression bug in blob_verifier.

- Delete BatchHeaderCodecV2.sol; V2 commitBatch computes aggregated hash inline
- Unify _verifyProof and _loadBatchHeader for all versions
- Remove BatchHeaderV2 struct in Go; V2 uses V1 format + version override
- Simplify Rust challenge handler, queue, shadow_rollup (uniform 96-byte batch_header_ex)
- Fix verify_blobs: decode BLS scalars per blob, concatenate, decompress once

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
V2 should store the aggregated blob hash (keccak256 of all blob hashes)
in batchBlobVersionedHashes, consistent with the header offset 57 value,
instead of blobhash(0).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…struction

Move blobVersionedHash computation out of _commitBatchWithBatchData into
callers via a new _computeBlobVersionedHash(version) helper:
- V0/V1: blobhash(0) or ZERO_VERSIONED_HASH
- V2: keccak256(blobhash(0)||...||blobhash(N-1)), requires >=1 blob

_commitBatchWithBatchData now has a single unified header construction
path for all versions — no more V2/V0V1 branch split.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove the V2 restriction in commitState — with the simplified V2 header
format (aggregated hash at offset 57), the stored batchBlobVersionedHashes
value is sufficient to recommit without a blob, same as V0/V1.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Kukoomomo Kukoomomo requested a review from dylanCai9 April 23, 2026 02:32
chengwenxi and others added 15 commits April 23, 2026 17:46
…rsionedHash

- Move `require(_blobCount > 0)` before the keccak256 assembly block to
  avoid computing keccak256 on empty input when no blobs are attached
- Add multi-blob aggregated hash unit tests to RollupCommitBatchV2Test:
  single/two/three blob correctness, order-sensitivity, and _verifyProof
  publicInputHash alignment with V2 aggregated hash semantics
- Add Rust unit tests for public_input_hash_v2 in prover-executor-client:
  single/two/three blob correctness, order-sensitivity, and structural
  difference from V1 (V2 uses keccak(h0||...||hN-1), not raw hash)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: corey <corey.zhang@bitget.com>
Co-authored-by: corey <corey.zhang@bitget.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Resolve conflicts after PR #939 (remove sequencer batch generation):
- node/types/batch_test.go: keep HEAD (multi_batch) version; it carries
  TestBatchHeaderV2 / TestBlobHashesHashUnavailableForLegacy which still
  exercise BatchHeaderV0/V1/V2 helpers retained on this branch.
- go.work.sum: union both sides (toolchain entries from x/exp, x/mod,
  x/tools); identical go directive on both branches.
- go-ethereum submodule: take main's 045be0fd (v2.2.2, includes the
  matching "Remove sequencer batch write paths" change for #939) over
  multi_batch's older 62952ec7.
- common/go.mod: bump tendermint replace from v0.3.4 -> v0.3.7 to align
  with node/tx-submitter/contracts/oracle/ops modules; common itself
  has no tendermint imports, the directive was stale and was the cause
  of "conflicting replacements for github.com/tendermint/tendermint" in
  the workspace build.

Co-authored-by: Cursor <cursoragent@cursor.com>
…/batch (#945)

Co-authored-by: corey <corey.zhang@bitget.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Point the go-ethereum submodule at 045be0fdc7ca (morph-v2.2.2) to match origin/main and the locked go.mod version.

Co-authored-by: Cursor <cursoragent@cursor.com>
@secmgt
Copy link
Copy Markdown

secmgt commented May 13, 2026

@claude review

Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a protocol-level multi-blob (Batch V2) upgrade spanning 73 files across the Rollup contract, derivation, prover public-input hashing, gas-oracle DA scalar, and tx-submitter sealing/commit paths — it should get a human review.

Extended reasoning...

Overview

This PR introduces Batch Header V2 and multi-blob support end-to-end:

  • L1 contract (contracts/contracts/l1/rollup/Rollup.sol): accepts version <= 2, adds _computeBlobVersionedHash which aggregates blobhash(0..N-1) via keccak256 for V2, and routes V2 headers through the V1 257-byte loader. The 32-byte field at offset 57 now carries an aggregated hash for V2 (versioned hash for V0/V1), and _verifyProof consumes that field uniformly.
  • Prover (prover/crates/executor/client, prover/bin/{server,shadow-prove,challenge}): ExecutorInput gains blob_infos: Vec<BlobInfo> and batch_version: u8; BlobVerifier::verify_blobs concatenates raw blob scalars and decompresses once; BatchInfo::public_input_hash_v2 hashes keccak256(h0||...||hN-1) into the public input. The proverServer's blob input fed into batch_header.data flips between aggregated and single hash based on batch_version.
  • Node derivation (node/derivation): ParseBatch now concatenates all blob bodies before a single zstd decompress (per-blob decompression would fail mid-frame), enforces L1-declared blob order via a versioned-hash map, and replaces VerifyBlobProof with a commitment round-trip (verifyBlob) since beacon /blob_sidecars no longer guarantees a legacy single-blob KZG proof post-Osaka/PeerDAS.
  • Common module (common/batch, common/blob): new Go module hosting BatchCache, BatchStorage, V2 header codec (BlobHashesHash vs BlobVersionedHash), and multi-blob sidecar/compression helpers; tx-submitter and node now depend on it via common/blob compat shims.
  • Tx-submitter: new flags MAX_BLOB_COUNT and BATCH_V2_UPGRADE_TIME gate V2 activation; EstimateGas now passes BlobHashes/BlobGasFeeCap so blobhash(i) simulates correctly during eth_estimateGas.
  • Gas-oracle: blob extraction switched to multi-blob concat-then-decompress; get_data_from_blob returns (batch_size, num_blobs, txn_count) and blob-scalar capacity scales with num_blobs.
  • Configs/infra: programVkey updated across all networks (holesky/hoodi/l1/qanetl1/sepolia/testnetl1) — verifier-side change that must match the new SP1 ELF.

Security risks

  • L1 contract surface changes: Rollup._computeBlobVersionedHash and _loadBatchHeader accept a new version, and the proof public-input now depends on an aggregated blob hash for V2. A mismatch between the contract's aggregation (loop over blobhash(i) until zero) and the prover's aggregation (Vec from beacon) would silently invalidate proofs or accept malformed batches. CodeRabbit flagged that the V0/V1 branch does not assert blobhash(1) == 0, allowing legacy batches to carry uncommitted extra blobs; the author elected not to add the guard.
  • Verifier key rotation: programVkey changes across all deploy configs — any deploy-script discrepancy between contract upgrade and prover ELF would brick finalization.
  • Derivation hardening deferred: CodeRabbit raised a calldata-prefix length check (since fixed in 2843031), a stale V0→V1 BlockContext.Decode on tx-payload bytes, and a header-vs-calldata blockCount mismatch. The author explicitly deferred the latter two under a 'trusted sequencer' policy — defensible but worth a human sign-off on the threat model.
  • Removed KZG proof check: VerifyBlobProof was replaced with a commitment→versioned-hash round-trip. The reasoning (PeerDAS makes beacon-supplied proofs unreliable) is sound, but the security argument deserves a human review given that this is the only on-the-fly DA check at derivation time.
  • Blob length validation: copy(blob[:], b) is now guarded by an exact length check after CodeRabbit raised it — good — but several other multi-blob index sites (Rust decode_blob_scalars, Go MakeBlobProof) were flagged without explicit fixes landing.

Level of scrutiny

This is the highest-scrutiny tier: a coordinated protocol upgrade that changes consensus-relevant public-input hashing, L1 contract code, and verifier keys, with several deferred hardening items and an explicit invocation of the trusted-sequencer model in derivation. Outside the bot's auto-approval criteria — needs a human reviewer who can validate the V2 activation/coordination plan, confirm prover/contract aggregation parity, and accept the deferred derivation guards.

Other factors

  • 60+ minute estimated review effort per CodeRabbit's own walkthrough.
  • Multiple unresolved CodeRabbit majors (legacy-version extra-blob guard, blobless-batch scalar fallback, batch_version validation, calldata-vs-header blockCount consistency) where the author chose to defer rather than fix; a human should weigh those tradeoffs.
  • Tests added are non-trivial (V2 header parsing, multi-blob ParseBatch round-trip, contract V2 aggregation tests) but do not cover the cross-boundary parity between contract aggregation and prover aggregation end-to-end.
  • The bug hunting system found no bugs this run, but absence of findings here is not a substitute for human review on a change of this scope.

Comment thread contracts/contracts/l1/rollup/Rollup.sol Outdated
@Kukoomomo
Copy link
Copy Markdown
Contributor

@claude review

Comment on lines +27 to 33
#[serde(default = "default_batch_version")]
pub batch_version: u8,
}

fn default_batch_version() -> u8 {
2
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 The server defaults batch_version to 2 in prover/bin/server/src/queue.rs:27-33, but the challenge handler's ProveRequest at prover/bin/challenge/src/handler.rs:21-27 has no batch_version field at all and POSTs JSON without it (handler.rs:184-189). For any V0/V1 batch the challenge handler asks the prover to prove, the server applies the V2 default and dispatches public_input_hash_v2 at lib.rs:37-41 plus writes keccak256(h0||...) to batch_header.data at queue.rs:167-175, producing a public-input hash that does not match the on-chain Rollup contract's V0/V1 blobhash(0)-derived PI — so L1 challenge proof verification fails. Fix by updating challenge handler ProveRequest to include batch_version (read from the batch header it already loads, mirroring shadow_prove.rs:127), or change default_batch_version() to 0 to match host/src/main.rs:36 and ExecutorInput's #[serde(default)].

Extended reasoning...

What the bug is

prover/bin/server/src/queue.rs:21-33 defines the prover server's HTTP API request schema with a #[serde(default = "default_batch_version")] on batch_version, and default_batch_version() returns 2 (the V2 multi-blob variant). Any caller that POSTs to /prove_batch without including the field will have its proof generated under V2 PI semantics.

prover/bin/challenge/src/handler.rs:21-27 defines its own ProveRequest struct that literally has no batch_version field:

#[derive(Serialize)]
pub struct ProveRequest {
    pub batch_index: u64,
    pub start_block: u64,
    pub end_block: u64,
    pub rpc: String,
}

It is serialized to JSON and POSTed to /prove_batch at handler.rs:184-190. The server deserializes that JSON, sees no batch_version field, and applies the default of 2.

Why existing code does not prevent it

This PR is the one introducing batch_version. It updates shadow_prove.rs:127-134 to read batch_version from the on-chain batch header and pass it explicitly, but it does not update challenge/src/handler.rs. Three other entry points in this PR all default to 0:

  • prover/bin/host/src/main.rs:36#[clap(long = "batch-version", default_value_t = 0)]
  • prover/crates/executor/client/src/types/input.rs:73-74#[serde(default)] on pub batch_version: u8 (u8 default = 0)
  • prover/bin/shadow-prove/src/shadow_prove.rs:24#[serde(default)] (u8 default = 0)

Only the server queue defaults to 2. So the challenge handler — the one in-tree caller that doesn't set the field — silently triggers V2 PI math on every batch it asks for proof of.

Step-by-step proof

  1. A challenge is filed against a V0 or V1 batch on L1 (e.g. any historical batch from before BATCH_V2_UPGRADE_TIME).
  2. challenge-handler detects the event, inspects the batch (handler.rs:154-161), and constructs ProveRequest { batch_index, start_block, end_block, rpc } — no batch_version field at all (handler.rs:184-189).
  3. util::call_prover(serde_json::to_string(&request).unwrap(), "/prove_batch") (handler.rs:190) sends a JSON body that lacks batch_version.
  4. The server's ProveRequest deserializer in queue.rs:21-29 applies default_batch_version() = 2.
  5. prove_for_queue calls gen_client_input(..., batch_version=2) (queue.rs:90-96). Inside, execute_batch is invoked with batch_version = 2 (queue.rs:145-153), so the resulting ExecutorInput.batch_version = 2.
  6. gen_client_input then computes blob_input for the saved batch_header.data (queue.rs:163-181):
    let blob_input = if batch_version >= 2 {
        let mut blob_hasher = Keccak256::new();
        for h in &versioned_hashes { blob_hasher.update(h.as_slice()); }
        blob_hasher.finalize()
    } else {
        versioned_hashes[0]
    };
    For the V0/V1 batch we get keccak256(versioned_hashes[0]) instead of versioned_hashes[0].
  7. Inside the zkVM, prover/crates/executor/client/src/lib.rs:37-41 dispatches on input.batch_version:
    let public_input_hash = if input.batch_version >= 2 {
        batch_info.public_input_hash_v2(&versioned_hashes)
    } else {
        batch_info.public_input_hash(&versioned_hashes[0])
    };
    So the proof commits to a public input that uses keccak256(versioned_hashes[0]) as the blob input field.
  8. On L1, the Rollup contract's _verifyProof (Rollup.sol:758-772) reads offset 57 of the V0/V1 batch header, which was set at commit time by _computeBlobVersionedHash(0_or_1) (Rollup.sol:921-942) to blobhash(0) directly, not keccak256(blobhash(0)). The contract then computes its expected public input hash using blobhash(0) as the blob field.
  9. Off-chain PI (keccak256(h0)) != on-chain PI (h0) → SP1 verifier rejects the proof → challenge resolution fails on L1.

Impact

Challenge resolution is the safety-critical defense path: when an honest party challenges an invalid batch, the prover must produce a valid SP1 proof to defend the batch on L1. With this bug, the prover silently produces proofs against the wrong public input for any V0/V1 batch challenged via challenge-handler, so verification fails and the rollup loses the challenge by default. This is bounded to the V0/V1 → V2 transition window (any historical V0/V1 batches still inside the proof window when the V2 upgrade goes live), but during that window the challenge handler is broken.

How to fix

Three options, in order of robustness:

  1. Best: Update challenge/src/handler.rs:21-27 ProveRequest to include batch_version: u8, and at handler.rs:184-189 populate it from the parent batch header that batch_inspect already loads (mirroring shadow_prove.rs:124-134). The challenge handler already parses the batch header at handler.rs:486-500, so the version byte is trivially available.
  2. Change default_batch_version() in queue.rs:31-33 to return 0 instead of 2, matching host/src/main.rs:36 and types/input.rs:73-74. This eliminates the silent regression but still leaves batch_version implicit — a future V2-only caller might forget to set it.
  3. Remove #[serde(default = "default_batch_version")] entirely and make the field mandatory in the request payload. This forces every caller to be explicit, which is the cleanest contract but requires touching all callers.

Option (1) addresses the current production-breaking path; option (2) or (3) is a defense-in-depth follow-up worth landing in the same PR.

curryxbo pushed a commit that referenced this pull request May 13, 2026
Brings in 6 commits from base feat/multi_batch since the previous merge
(09a2c72) so PR #951 stops reporting CONFLICTING:

- 4be9cda update challenge handler dep
- c6f7cc2 add v0/v1 blob check
- 7ec11cf Merge branch 'feat/multi_batch' of github.com:morph-l2/morph
- fa9f4e8 fix submitter replay batch with config max blob count
- 1fe0d74 add rustc version desc
- fbd1bee chore: align go-ethereum submodule with origin/main

Single content conflict in common/batch/batch_cache.go: base added a new
sealEffectiveBlobCount function in the same hunk where SPEC-005 exported
BuildBlockContext (PR #935-derived rename). Resolved by keeping both:
sealEffectiveBlobCount stays where base placed it; the exported
BuildBlockContext (with SPEC-005 docstring) follows. base's lowercase
"buildBlockContext" comment was dropped (now stale -- we renamed it).

All other touched files auto-merged: contracts/Rollup.sol,
BlobVersionedHashLegacy.t.sol (added by base), prover/Cargo.{toml,lock},
prover/{,bin/challenge/}rust-toolchain (deleted by base),
tx-submitter/mock/rollup.go, common/batch/batch_restart_test.go.

go build ./common/batch/ ./common/blob/ ./node/... ./tx-submitter/...
./oracle/... -- clean.
go test ./node/derivation/... -- 22 tests PASS (incl. 9 added in
b9ad14e). common/batch tests hang on a pre-existing txSenderCacher
goroutine leak unrelated to this merge (reproducible on origin/feat/multi_batch
in isolation); not introduced here.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…hes (#953)

Co-authored-by: corey <corey.zhang@bitget.com>
Comment on lines +336 to +341
MaxBlobCountFlag = cli.IntFlag{
Name: "max_blob_count",
Usage: "Maximum number of blobs per batch submission (1-6)",
Value: 6,
EnvVar: prefixEnvVar("MAX_BLOB_COUNT"),
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 The max_blob_count flag at tx-submitter/flags/flags.go:336-341 documents its range as (1-6) in the usage string, but no code on the path from CLI parsing (utils/config.go:194) to BatchCache construction (common/batch/batch_cache.go:99) actually enforces the upper bound — NewBatchCache only guards <= 0 and otherwise accepts any positive value. An operator who sets TX_SUBMITTER_MAX_BLOB_COUNT=10 (e.g. by typo or misreading the docs) gets a tx-submitter that happily packs 7+ blob batches; MakeBlobTxSidecar's byte-capacity check passes, but every resulting blob tx is rejected by the L1 client at submission (EIP-4844 caps at 6, 9 under Pectra) until the operator restarts with corrected config. Suggest a one-line range check (e.g. maxBlobCount > 6 -> reject or clamp) at config load or in NewBatchCache to match the documented contract.

Extended reasoning...

What the bug is

MaxBlobCountFlag (tx-submitter/flags/flags.go:336-341) is declared with:

MaxBlobCountFlag = cli.IntFlag{
    Name:   "max_blob_count",
    Usage:  "Maximum number of blobs per batch submission (1-6)",
    Value:  6,
    EnvVar: prefixEnvVar("MAX_BLOB_COUNT"),
}

The usage string promises a (1-6) range, but cli.IntFlag in this version of urfave/cli has no validator hook, so the bound is documentation-only.

The plumbing then:

  1. utils/config.go:194 reads ctx.GlobalInt(flags.MaxBlobCountFlag.Name) into Config.MaxBlobCount with no clamping or validation.
  2. services/rollup.go:128 passes that value straight into batch.NewBatchCache.
  3. common/batch/batch_cache.go:99-101 is the only guard on the value:
    if maxBlobCount <= 0 {
        maxBlobCount = 2
    }
    — lower bound only. 10, 100, 1000 all pass.

Step-by-step proof

  1. Operator sets TX_SUBMITTER_MAX_BLOB_COUNT=10 (typo, doc misread, or stale config from a future protocol upgrade).
  2. NewBatchCache(..., maxBlobCount=10, ...) stores bc.maxBlobCount = 10. Capacity checks now use 10 * MaxBlobBytesSize = 1,269,760 bytes.
  3. Once V2 is active (isBatchV2Upgraded true), effectiveMaxBlobCount returns 10. Batches grow until the compressed payload approaches that cap.
  4. On a busy enough batch, SealBatchMakeBlobTxSidecar(compressed, 10) in common/blob/payload.go:78 — the only capacity check there is len(blobBytes) > maxBlobs*MaxBlobBytesSize, which a 10-blob payload satisfies. The sidecar is built with up to 10 blob field-element arrays.
  5. createBlobTx wraps the sidecar into a BlobTx and submits via L1Client.SendTransaction.
  6. L1 client rejects: EIP-4844 caps tx.blobVersionedHashes.length at 6 (or 9 post-Pectra/EIP-7691 depending on network). The error surfaces but there is no recovery path — every subsequent attempt builds the same invalid tx because the cap is static config. The submitter is stuck until the operator restarts with corrected config.

Why existing code doesn't prevent it

The only checks in the cache/blob path use maxBlobs * MaxBlobBytesSize for byte capacity, never for blob count. The L1 protocol cap is implicit in the EL/CL clients, not enforced by tx-submitter. The default value (Value: 6) is correct, so a vanilla deployment is fine; the bug only manifests under explicit operator misconfiguration.

Impact

Bounded — this is operator misconfiguration territory, not an attacker-controlled input. The default is the safe maximum, the failure is loud (clear L1 error), no on-chain state is corrupted, and no submitter state is lost. But the documented (1-6) contract is not enforced, which is the kind of mismatch that breaks during operator handoffs or doc drift. One verifier rightly calls this "not actionable" since L1 will reject and the operator sees the error immediately; that view is defensible, but flagging it as a nit costs nothing and the fix is one line.

Fix

Add a range check at config load (utils/config.go around line 194) or in NewBatchCache:

if maxBlobCount > 6 {
    return fmt.Errorf("max_blob_count %d exceeds protocol cap 6", maxBlobCount)
}

Or clamp to match the documented bound. The choice between reject vs clamp depends on whether silent normalization is preferable to an explicit boot-time failure; rejecting matches the principle of failing loudly on misconfiguration.

chengwenxi and others added 3 commits May 18, 2026 16:31
…955)

Co-authored-by: chengwenxi <22697326+chengwenxi@users.noreply.github.com>
…956)

Co-authored-by: chengwenxi <22697326+chengwenxi@users.noreply.github.com>
…batch_pi (#957)

Co-authored-by: chengwenxi <22697326+chengwenxi@users.noreply.github.com>
Comment thread common/blob/fee.go
Comment thread common/blob/payload.go
Comment on lines +43 to +51
func RetrieveBlobBytes(blob *kzg4844.Blob) ([]byte, error) {
data := make([]byte, MaxBlobBytesSize)
for i := 0; i < 4096; i++ {
if blob[i*32] != 0 {
return nil, fmt.Errorf("invalid blob, found non-zero high order byte %x of field element %d", data[i*32], i)
}
copy(data[i*31:i*31+31], blob[i*32+1:i*32+32])
}
return data, nil
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 RetrieveBlobBytes at common/blob/payload.go:47 builds its "non-zero high order byte %x of field element %d" error from data[i*32], but data is the freshly allocated all-zero output buffer at that point — nothing has been written to it yet. The byte that actually failed the check is blob[i*32] (the condition on line 46 tests blob[i*32] != 0), so the error always reports 0x0, directly contradicting its own "non-zero" wording. The same pattern exists in common/batch/blob.go:48; both spots need the one-character fix data[i*32]blob[i*32].

Extended reasoning...

What the bug is

At common/blob/payload.go:43-51, RetrieveBlobBytes walks the 4096 field elements of a blob and rejects any element whose high-order byte is non-zero (the canonical encoding requires that byte to be 0x00):

func RetrieveBlobBytes(blob *kzg4844.Blob) ([]byte, error) {
    data := make([]byte, MaxBlobBytesSize)
    for i := 0; i < 4096; i++ {
        if blob[i*32] != 0 {
            return nil, fmt.Errorf("invalid blob, found non-zero high order byte %x of field element %d", data[i*32], i)
        }
        copy(data[i*31:i*31+31], blob[i*32+1:i*32+32])
    }
    return data, nil
}

The condition on line 46 correctly tests blob[i*32] != 0, but the format argument on line 47 reads data[i*32] — the freshly allocated, all-zero output buffer. The loop only writes data[i*31:i*31+31] (31-byte stride), never the 32-byte stride offset the error reads from, so data[i*32] is guaranteed to be 0x00 every time the error fires.

Step-by-step proof

  1. Call RetrieveBlobBytes(blob) where blob[0] = 0x42 (any non-zero byte at a field-element boundary).
  2. Enter the loop at i = 0. make([]byte, MaxBlobBytesSize) returns a slice where every byte is 0x00.
  3. blob[0*32] != 0 evaluates to true (since blob[0] = 0x42).
  4. fmt.Errorf formats data[0*32] = data[0] = 0x00.
  5. The returned error reads: invalid blob, found non-zero high order byte 0 of field element 0 — internally self-contradicting (the message claims the byte is non-zero while printing it as 0).

Why existing code does not prevent it

There is no prior write to data[i*32] before the error fires:

  • On the first iteration, data is still the zeroed buffer from make.
  • On later iterations the loop only writes data[i*31:i*31+31], i.e. the 31-byte-stride slice, never the 32-byte-stride offset the error reads. So even mid-loop, data[i*32] is always either still the initial zero or, at most, an unrelated byte written by a prior iteration's copy — it has no semantic relation to the field element being validated.

The identical pattern is also present in common/batch/blob.go:48 (the older copy that RetrieveBlobBytes was forked from / moved alongside in this PR). Both spots need the fix.

Impact

Diagnostic-only — the validation itself still correctly rejects malformed blobs. But operators investigating a malformed blob (from a misbehaving sequencer, beacon node, or upstream encoder) get an uninformative error that always reads 0x0 instead of the actual offending byte value, which is the single most useful piece of forensic information. The error message also directly contradicts its own "non-zero" wording — the kind of inconsistency that wastes operator time during a production incident. CodeRabbit flagged this on 2026-05-08 at this exact location and it remains unfixed.

How to fix

One-character change in each file: replace data[i*32] with blob[i*32] in the format argument so the error logs the actual offending source byte.

return nil, fmt.Errorf("invalid blob, found non-zero high order byte %x of field element %d", blob[i*32], i)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing common/blob/payload.go:47 — that one now correctly reports blob[i*32].

However the comment also flagged the same pattern at common/batch/blob.go:48, and that file still has the original data[i*32]:

// common/batch/blob.go:44-52
func RetrieveBlobBytes(blob *kzg4844.Blob) ([]byte, error) {
    data := make([]byte, MaxBlobBytesSize)
    for i := 0; i < 4096; i++ {
        if blob[i*32] != 0 {
            return nil, fmt.Errorf("invalid blob, found non-zero high order byte %x of field element %d", data[i*32], i)
            //                                                                                                ^^^^^^^^^
            //                                                                                always 0x0
        }
        copy(data[i*31:i*31+31], blob[i*32+1:i*32+32])
    }
    return data, nil
}

This is not dead code: node/derivation/batch_info.go:110 calls it as commonbatch.RetrieveBlobBytes(&blobCopy) (imported as commonbatch "morph-l2/common/batch"), so the derivation path still hits the always-zero error message on malformed blobs.

Same one-character fix: data[i*32]blob[i*32].

Comment thread prover/crates/executor/client/src/lib.rs Outdated
Comment thread common/batch/batch_cache.go Outdated
Co-authored-by: FletcherMan <fanciture@163.com>
Co-authored-by: fletcher.fan <fletcher.fan@bitget.com>
Comment on lines 249 to 255
let commit_scalar = (rollup_gas_used.as_u64() + self.finalize_batch_gas_used) * PRECISION /
l2_txn.max(self.txn_per_batch);
let blob_scalar = if l2_data_len > 0 {
MAX_BLOB_TX_PAYLOAD_SIZE as u64 * PRECISION / l2_data_len
num_blobs.max(1) * MAX_BLOB_TX_PAYLOAD_SIZE as u64 * PRECISION / l2_data_len
} else {
MAX_BLOB_SCALAR
};
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 The new commitState() method emits the same CommitBatch event as commitBatch but the tx is a DynamicFeeTx with no blob sidecar; gas-oracle's calculate_scalar does not filter logs by tx selector, so when a commitState tx is the most-recent CommitBatch log, get_data_from_blob() returns (0,0,0) and calculate_from_rollup (l1_scalar.rs:249-255) falls into the l2_data_len == 0 branch, setting blob_scalar = MAX_BLOB_SCALAR (10^11) and pushing it to L2. This inflates the L1-DA fee component on every L2 user transaction until a subsequent blob-carrying commitBatch lands and the oracle re-derives a normal value. Fix by propagating an Option/None sentinel from get_data_from_blob when indexed_hashes.is_empty() and skipping the set_blob_scalar update in that case, or by filtering CommitBatch logs in calculate_scalar by tx selector to exclude commitState.

Extended reasoning...

What the bug is

This PR adds a new commitState() method to Rollup.sol that re-commits a previously-committed batch by reusing the blob versioned hash stored on L1 (batchBlobVersionedHashes[N] populated by a prior commitBatch). The commitState tx itself is a plain DynamicFeeTx with no blobVersionedHashes — see tx-submitter/services/rollup.go:1206 and the new rebuild path at tryRebuildRollupCommitTx (rollup.go:1379-1392) which both call createDynamicFeeTx. Crucially, commitState reaches _commitBatchWithBatchData which unconditionally emits the same CommitBatch event at Rollup.sol:383 — so at the L1 log level a commitState is indistinguishable from a commitBatch.

The gas-oracle's calculate_scalar (gas-oracle/app/src/da_scalar/l1_scalar.rs:175-198) subscribes to CommitBatch events via commit_batch_filter() with no method/selector discrimination, then picks the most-recent log by block number (line 193 max_by_key(log.block_number)). It then calls calculate_from_rollup() -> get_data_from_blob() for that tx hash. Inside get_data_from_blob (line 271+), data_and_hashes_from_txs reads target_tx.other['blobVersionedHashes'] (calculate.rs:155-159). For a commitState DynamicFeeTx this field is empty/absent, so indexed_hashes is empty and the function short-circuits to Ok((0, 0, 0)) at l1_scalar.rs:300-303.

Why existing code does not prevent it

Back in calculate_from_rollup, the new multi-blob math at lines 249-255 is:

let blob_scalar = if l2_data_len > 0 {
    num_blobs.max(1) * MAX_BLOB_TX_PAYLOAD_SIZE as u64 * PRECISION / l2_data_len
} else {
    MAX_BLOB_SCALAR  // 10^11
};

With l2_data_len == 0, blob_scalar saturates to MAX_BLOB_SCALAR. The threshold gate (check_threshold_reached, lines 159-171) only compares relative change to gas_threshold%; jumping from a normal value (~1e9-1e10) to 1e11 is roughly 10-100x, easily clearing any sane threshold. set_blob_scalar (line 147) is then invoked and the inflated value is pushed to the L2 GasPriceOracle, where it feeds directly into the L1-DA fee component of every L2 user transaction.

Why this PR introduces the bug

Pre-PR, every commitBatch tx attached at least one blob — even empty batches used EmptyVersionedHash as a sentinel — so indexed_hashes was always non-empty for any CommitBatch event the oracle processed. The (0,0,0) -> MAX_BLOB_SCALAR branch existed but was unreachable in production. CodeRabbit flagged this exact latent fall-through on 2026-05-08 (PR timeline, l1_scalar.rs:303 'Skip blob-scalar recalculation for blobless batches'); the comment is unaddressed. This PR makes the dead branch live by introducing commitState: a CommitBatch-emitting tx with zero attached blobs.

Step-by-step proof

  1. Owner runs revertBatch(N). committedBatches[N] is cleared but batchBlobVersionedHashes[N] is preserved (Rollup.sol:108 comment).
  2. tx-submitter's next rollup() iteration reads the stored blob hash and chooses commitState (rollup.go:1133-1144). It packs commitState(rollupBatch, signature) and submits via createDynamicFeeTx — no blob sidecar.
  3. The tx lands on L1; _commitBatchWithBatchData emits CommitBatch(batchIndex, hash) exactly as commitBatch would.
  4. Gas-oracle's next cycle: commit_batch_filter() returns all CommitBatch logs in the last 100 blocks; max_by_key(block_number) selects the commitState tx as the most-recent.
  5. calculate_from_rollup -> get_data_from_blob(tx_hash) -> fetches the tx; target_tx.other['blobVersionedHashes'] is empty; indexed_hashes.is_empty() is true; returns Ok((0, 0, 0)).
  6. Back in calculate_from_rollup: l2_data_len == 0 -> blob_scalar = MAX_BLOB_SCALAR = 10^11.
  7. check_threshold_reached compares 10^11 to the current ~1e9-1e10 stored value; actual_change > expected_change trivially holds; set_blob_scalar(MAX_BLOB_SCALAR) is sent to L2.
  8. L2 GasPriceOracle writes 10^11; every subsequent user transaction pays a roughly 10-100x inflated L1-DA fee component until the next blob-carrying commitBatch arrives and the oracle re-derives a normal scalar.

Impact and bounds

Self-healing: any subsequent commitBatch (with blob) lands within a normal sequencing window and overwrites MAX_BLOB_SCALAR with a sane value. The inflated-fee window is bounded to that interval (typically minutes, but operator-controlled). However the trigger is realistic — revertBatch is owner-only but the rebuild path at rollup.go:1379-1392 will dispatch commitState on every recovery, and the oracle picks it up on the very next polling cycle. Users pay real overcharges during the window. No state corruption, no consensus impact, but a concrete user-facing fee regression introduced by this PR's commitState plumbing.

How to fix

Either of the following (the bug submitter and CodeRabbit both proposed essentially the same set):

  1. Propagate a sentinel from get_data_from_blob: change the return type to Result<Option<(u64,u64,u64)>> (or an explicit enum) and have calculate_from_rollup skip the set_blob_scalar push when the sentinel is None ("no blobs in this tx, do not update blob scalar").
  2. Filter logs by tx selector in calculate_scalar: drop logs whose tx data starts with the commitState selector before max_by_key picks the most-recent.
  3. Carry over the previous blob_scalar when l2_data_len == 0, instead of saturating to MAX_BLOB_SCALAR.

Option (1) is the cleanest and matches the existing semantics: a tx that delivers no DA bytes simply should not move the DA scalar.

Comment on lines 338 to +360
return nil, fmt.Errorf("failed to get blobs, continuing processing:%v", err)
}
if len(blobSidecars) > 0 {
// Create blob sidecar
var blobTxSidecar eth.BlobTxSidecar
matchedCount := 0

// Match blobs
// Index beacon sidecars by their KZG-derived versioned hash so we
// can assemble the local sidecar in the exact order the L1 tx
// declared its blobs. Multi-blob batches are decoded by
// concatenating blob bodies in tx order; any reordering here
// would corrupt the resulting zstd stream. The map key is
// derived from the beacon-supplied commitment; verifyBlob below
// re-derives the same hash from the actual blob bytes, so a
// malicious beacon cannot forge an entry by lying about the
// commitment.
byHash := make(map[common.Hash]*BlobSidecar, len(blobSidecars))
for _, sidecar := range blobSidecars {
var commitment kzg4844.Commitment
copy(commitment[:], sidecar.KZGCommitment[:])
versionedHash := KZGToVersionedHash(commitment)

for _, expectedHash := range blobHashes {
if bytes.Equal(versionedHash[:], expectedHash[:]) {
matchedCount++
d.logger.Info("Found matching blob", "index", sidecar.Index, "hash", versionedHash.Hex())

// Decode and process blob data
var blob Blob
b, err := hexutil.Decode(sidecar.Blob)
if err != nil {
d.logger.Error("Failed to decode blob data", "error", err)
continue
}
copy(blob[:], b)

// Verify blob
//if err := VerifyBlobProof(&blob, commitment, kzg4844.Proof(sidecar.KZGProof)); err != nil {
// d.logger.Error("Blob verification failed", "error", err)
// continue
//}

// Add to sidecar
blobTxSidecar.Blobs = append(blobTxSidecar.Blobs, *blob.KZGBlob())
blobTxSidecar.Commitments = append(blobTxSidecar.Commitments, commitment)
blobTxSidecar.Proofs = append(blobTxSidecar.Proofs, kzg4844.Proof(sidecar.KZGProof))
break
}
}
byHash[KZGToVersionedHash(commitment)] = sidecar
}

d.logger.Info("Blob matching results", "matched", matchedCount, "expected", len(blobHashes))
if matchedCount == 0 {
return nil, fmt.Errorf("no matching versionedHash was found")
// Downstream (ParseBatch) only consumes Sidecar.Blobs and
// Sidecar.Commitments; Proofs is intentionally left empty to
// avoid an extra ~O(n) KZG op per blob per batch on every
// sync. If a future consumer needs Proofs, compute them
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 L2 derivation and the oracle silently skip the new commitState transactions introduced by this PR. commitState (Rollup.sol:257-277) calls _commitBatchWithBatchData which emits the same CommitBatch(batchIndex, batchHash) event at line 383 that derivation filters on, but UnPackData at node/derivation/derivation.go:422-516 only matches the four legacy commitBatch selectors and returns ErrNotCommitBatchTx for the commitState selector. The caller at derivation.go:218-226 silently continues on that error, so any L2 blocks committed via commitState are never run through parseBatch/handleL1Message and root/withdrawal-root divergences inside that batch range are silently missed. The same defect exists in oracle/oracle/batch.go:118-184 (else-branch continue on unknown selectors), where it stalls fee/gas-oracle accounting at line 194 with batch is incontinuity, expect N, have N+1 as soon as the next non-commitState batch arrives. Trigger is reachable in production: tx-submitter/services/rollup.go:1133-1156 sets useCommitState=true and packs the commitState selector whenever Rollup.BatchBlobVersionedHashes[batchIndex] is non-zero, which persists across revertBatch (Rollup.sol:438-476 clears committedBatches[idx] and committedStateRoots[idx] but not batchBlobVersionedHashes[idx]). Fix: add a commitState branch in UnPackData that unpacks the same BatchDataInput tuple as the rollupABI.commitBatch case, and add the matching branch in the oracle\x27s switch.

Extended reasoning...

What the bug is

This PR introduces a new commitState entry point on Rollup.sol (lines 257-277) that re-commits a batch using the L1-stored blob versioned hash from a previous commitBatch. Internally it calls the shared _commitBatchWithBatchData (line 271), which at line 383 emits the same CommitBatch(_batchIndex, committedBatches[_batchIndex]) event topic that derivation\x27s fetchRollupLog (derivation.go:280-292) filters on.

The two downstream consumers that decode the event\x27s tx calldata were not updated:

  1. node/derivation/derivation.go:422-516UnPackData has exactly four selector branches: beforeMoveBlockCtxABI.commitBatch, legacyRollupABI.commitBatch, rollupABI.commitBatch, and rollupABI.commitBatchWithProof. The final else at line 513-514 returns types.ErrNotCommitBatchTx for any other selector — including commitState.
  2. The caller at derivation.go:218-226 silently swallows that error:
    if errors.Is(err, types.ErrNotCommitBatchTx) { continue }
  3. oracle/oracle/batch.go:118-184 mirrors the same pattern: only abi.Methods["commitBatch"].ID and beforeRemoveSkipMapAbi.Methods["commitBatch"].ID are matched; the else at line 182-183 continues on unknown selectors. Worse, the oracle tracks batchIndex sequentially and at line 194-196 throws batch is incontinuity, expect N, have N+1 as soon as the next commitBatch after a skipped commitState arrives.

Why existing code does not prevent it

  • The derivation continue on ErrNotCommitBatchTx was originally intended for unrelated tx topics that share the contract address. Adding commitState retroactively turns this defensive skip into a silent data-drop.
  • The oracle has no equivalent ErrNotCommitBatchTx plumbing; it just hits the else branch and falls off the loop.
  • Nothing on L1 forbids commitState from being used: the contract path is fully wired and the staker can reach it.

The trigger is reachable

tx-submitter/services/rollup.go:1133-1156 reads r.Rollup.BatchBlobVersionedHashes(nil, big.NewInt(batchIndex)) and sets useCommitState = stored != [32]byte{}. The path packs the commitState selector at lines 1156-1160. The stored hash survives revertBatch in Rollup.sol:438-476, which clears committedBatches[_batchIndex] = bytes32(0) and committedStateRoots[_batchIndex] = bytes32(0) but never touches batchBlobVersionedHashes[_batchIndex]. So any owner-initiated revert leaves a stored blob hash behind, and the next legitimate re-commit of that batch index goes through commitState rather than commitBatch.

What the impact would be

Once any commitState tx lands on L1 for a given deployment:

  • Derivation: every L2 block in that batch range is never derived locally. parseBatch/handleL1Message is skipped, so any divergence between the canonical postStateRoot/withdrawalRoot written by L1 and what derivation would compute locally is silently lost. The SyncedBatchIndex / L2DeriveHeight metrics jump past the missing batch. Subsequent commitBatch txs still parse fine (their parent batch header bytes are carried in calldata), so the failure is silent end-to-end — operators get no fetch batch info failed log, no error, no metric drop.
  • Oracle: the gas/fee oracle\x27s sequential batch-index tracker hits batch is incontinuity on the very next commitBatch after a skipped commitState (oracle/oracle/batch.go:194-196). The oracle stalls until manual intervention.

Step-by-step proof

  1. Owner calls Rollup.revertBatch(batchHeader, 1) to revert batch N (Rollup.sol:438). committedBatches[N] is cleared but batchBlobVersionedHashes[N] retains the original versioned hash.
  2. tx-submitter wakes up, reads BatchBlobVersionedHashes(N)bytes32(0) at rollup.go:1133, sets useCommitState = true, packs commitState calldata at rollup.go:1156, and submits the L1 tx.
  3. Rollup.sol:257-277 commitState runs, _commitBatchWithBatchData emits CommitBatch(N, hash) at line 383.
  4. Derivation\x27s fetchRollupLog (derivation.go:280-292) picks up the log because the topic is RollupEventTopicHash. It calls fetchRollupDataByTxHash at derivation.go:219.
  5. UnPackData (derivation.go:422-516) compares tx.Data()[:4] against the four registered commitBatch selectors. None match. Line 513-514 returns (batch, types.ErrNotCommitBatchTx).
  6. The caller at derivation.go:221-223 sees errors.Is(err, types.ErrNotCommitBatchTx) and continues. No log line, no metric, no error surfaced.
  7. The next commitBatch for batch N+1 arrives normally; its calldata includes the parent batch header bytes verbatim, so derivation parses it fine. But block heights and roots from batch N were never validated locally against L2 state.

How to fix

The bindings already include commitState (verified in bindings/bindings/rollup.go). The fix is to add a commitState branch in both decoders that unpacks the identical BatchDataInput tuple already used by the rollupABI.commitBatch case (IRollup.sol:16-24):

// node/derivation/derivation.go, inside UnPackData
} else if bytes.Equal(d.rollupABI.Methods["commitState"].ID, data[:4]) {
    args, err := d.rollupABI.Methods["commitState"].Inputs.Unpack(data[4:])
    if err != nil { return batch, fmt.Errorf("commitState Unpack error:%v", err) }
    rollupBatchData := args[0].(struct { /* same BatchDataInput tuple as commitBatch */ })
    batch = geth.RPCRollupBatch{ /* same field copy */ }
}

And the matching branch in oracle/oracle/batch.go after line 140.

Scope note

The synthesis description mentions "parent for the next batch points to an unknown hash, and downstream batches also fail to derive." That part is not quite right — the next batch\x27s parent header bytes are carried verbatim in its own calldata, so subsequent batches still parse. The real damage is the silent skip: divergence inside the commitStated batch range is never caught, and the oracle stalls. This is what the verifiers correctly highlight, and it is sufficient on its own to make the bug normal-severity for any deployment that ever uses revertBatch.

Comment on lines 580 to 599
}

func (r *Rollup) handlePendingTx(txRecord *types.TxRecord, tx *ethtypes.Transaction, method string) error {
// Check for timeout
if txRecord.SendTime+uint64(r.cfg.TxTimeout.Seconds()) < uint64(time.Now().Unix()) {
log.Info("Transaction timed out",
"tx", tx.Hash().Hex(),
"nonce", tx.Nonce(),
"method", method)

// Try to replace the transaction with higher gas price
return r.replaceTimedOutTx(tx)
}

// Check if transaction might fail
if method == constants.MethodCommitBatch {
// Obsolete / doomed txs: cancel before timeout handling so we do not resubmit a tx that will revert.
if constants.IsCommitLikeMethod(method) {
batchIndex := utils.ParseParentBatchIndex(tx.Data()) + 1
lastCommitted, err := r.Rollup.LastCommittedBatchIndex(nil)
if err != nil {
return fmt.Errorf("get last committed batch index error: %w", err)
}

if batchIndex <= lastCommitted.Uint64() {
// This batch is already committed by another submitter
log.Info("Batch already committed by another submitter, trying to cancel transaction",
"batch_index", batchIndex,
"last_committed", lastCommitted.Uint64(),
"tx_hash", tx.Hash().String())

// Try to cancel the transaction since it will fail
cancelTx, err := r.CancelTx(tx)
if err != nil {
log.Error("Failed to cancel commit batch transaction",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟣 Pre-existing bug surfaced by this PR. CancelTx (rollup.go:2088-2116) builds a V0 blob sidecar (kzg4844.ComputeBlobProof + sidecar.Version = 0 by default), unlike createBlobTx (rollup.go:1255-1272) and ReSubmitTx (rollup.go:1840-1846) which version-gate via blob.DetermineBlobVersion / blob.BlobSidecarVersionToV1. Osaka is already active on mainnet (2025-12-03), hoodi and devnet (2025-10-28), so a V0 sidecar is rejected at the L1 EL mempool admission step. The new cancel-first ordering in handlePendingTx (this PR) combined with V2 multi-blob activation makes this cancel path much hotter; fix is to mirror createBlobTx in the BlobTxType branch — capture head (already returned by the GetGasTipAndCap call discarded with _ at the top of CancelTx) and switch on blob.DetermineBlobVersion(head, r.chainId.Uint64()), or call blob.BlobSidecarVersionToV1(sidecar) after constructing the V0 sidecar.

Extended reasoning...

What the bug is

CancelTx in tx-submitter/services/rollup.go (BlobTxType branch, around lines 2088-2116) hardcodes a single-blob V0 sidecar:

sidecar := &ethtypes.BlobTxSidecar{
    Blobs:       []kzg4844.Blob{*emptyBlob},
    Commitments: []kzg4844.Commitment{emptyBlobCommit},
    Proofs:      []kzg4844.Proof{emptyBlobProof}, // kzg4844.ComputeBlobProof — V0 single-blob KZG proof
}
// sidecar.Version is left at the zero default = BlobSidecarVersion0

The head returned by GetGasTipAndCap at the top of CancelTx is discarded with _, so the function has no way to consult the chain config to decide whether V0 or V1 sidecar format is required.

Compare the two other blob-tx construction sites, both of which correctly Osaka-gate:

  • createBlobTx (rollup.go:1255-1272) switches on blob.DetermineBlobVersion(head, chainID) and produces Version=1 + MakeCellProof after Osaka.
  • ReSubmitTx (rollup.go:1840-1846) calls blob.BlobSidecarVersionToV1(sidecar) when the chain has crossed Osaka.

CancelTx has neither path.

Why this matters now

All three configured chain configs in common/blob/fee.go have OsakaTime already in the past relative to the current date (2026-05-19):

  • MainnetChainConfig.OsakaTime = 1764798551 = 2025-12-03
  • HoodiChainConfig.OsakaTime = 1761677592 = 2025-10-28
  • DevnetChainConfig.OsakaTime = 1761677592 = 2025-10-28

After Osaka, the L1 EL mempool requires BlobSidecarVersion1 with cell proofs (EIP-7594 / PeerDAS). A V0-formatted blob tx is rejected at admission. The version-gating logic in createBlobTx and the explicit BlobSidecarVersionToV1 call in ReSubmitTx would otherwise be dead code — they are not, which confirms the morph-l2/go-ethereum mempool enforces the chain-active sidecar version on admission, mirroring upstream Pectra/Osaka behavior.

PR-interaction (why this isn't purely pre-existing)

The CancelTx body itself is byte-identical pre-PR (the diff does not touch lines 2088-2116). However, this PR materially increases the bug's reachability in two ways:

  1. Cancel-first re-ordering in handlePendingTx (around rollup.go:580-621). Pre-PR, an obsolete commitBatch BlobTx that had also timed out would first traverse the timeout branch → replaceTimedOutTxReSubmitTx, which DOES call BlobSidecarVersionToV1 and produces an Osaka-conforming tx (even though it would revert on-chain). Post-PR, the cancel branch is checked first, so an obsolete BlobTx goes straight to CancelTx and produces a V0 sidecar.
  2. V2 multi-blob activation via BATCH_V2_UPGRADE_TIME materially increases the number of BlobTxs in flight, so cancel-on-obsolete-BlobTx becomes a routine occurrence rather than a rare corner case.

The widening to IsCommitLikeMethod (also covers commitState) is harmless for this particular bug since commitState is built via createDynamicFeeTx and hits the safe DynamicFeeTxType branch of CancelTx, not the buggy BlobTxType branch.

Step-by-step proof of reachability

  1. Local tx-submitter sends a V2 multi-blob commitBatch BlobTx for batch X (created correctly by createBlobTx with V1 cell-proof sidecar).
  2. Another active submitter (priority rollup mode, or rotation handoff) wins the race: r.Rollup.LastCommittedBatchIndex(nil) >= X.
  3. handlePendingTx (rollup.go:584-621) detects batchIndex <= lastCommitted for the new IsCommitLikeMethod branch and calls r.CancelTx(tx).
  4. CancelTx builds BlobTxSidecar{Version: 0, Proofs: [ComputeBlobProof(emptyBlob)]} and the cancel tx is signed and passed to SendTx.
  5. L1Client.SendTransaction reaches a post-Osaka L1 EL: the mempool rejects with an admission error (e.g. "invalid sidecar version" / cell-proof shape mismatch).
  6. SendTx removes the would-be cancel tx from pendingTxs (rollup.go:1715-1718) and returns the error. handlePendingTx logs "cancel commit batch transaction failed" at line 609 and returns the error.
  7. Crucially, the original obsolete BlobTx is NOT removed from pendingTxs — the cancel never landed. The next ProcessTx cycle re-enters handlePendingTx, re-hits the same lastCommitted check, and loops on the same failing cancel until either:
    • the original commitBatch BlobTx is mined and reverts with "batch already committed by another submitter" (full gas wasted, including blob fees), or
    • the L1 blob pool TTL-evicts the original BlobTx.

In either case the submitter cannot make progress on that nonce until L1 unblocks it.

How to fix

Mirror createBlobTx: in the BlobTxType branch of CancelTx, capture head from the GetGasTipAndCap call (currently _, _, _, head, err := r.GetGasTipAndCap() — already returned but discarded), then switch on blob.DetermineBlobVersion(head, r.chainId.Uint64()):

switch blob.DetermineBlobVersion(head, r.chainId.Uint64()) {
case ethtypes.BlobSidecarVersion0:
    sidecar.Version = ethtypes.BlobSidecarVersion0
    // existing ComputeBlobProof path
case ethtypes.BlobSidecarVersion1:
    sidecar.Version = ethtypes.BlobSidecarVersion1
    proofs, err := blob.MakeCellProof(sidecar.Blobs)
    if err != nil { return nil, err }
    sidecar.Proofs = proofs
}

Or, equivalently, construct the V0 sidecar as today and unconditionally call blob.BlobSidecarVersionToV1(sidecar) after — same result as the ReSubmitTx upgrade path.

Severity rationale

Filed as pre_existing: the CancelTx body itself is not modified by this PR, and the buggy code is reachable in main today. However, the PR's cancel-first reorder and V2 multi-blob enablement substantially increase the practical hit rate of this latent bug. Bundling the one-line fix here while the area is being touched is the least-disruptive option.

# Conflicts:
#	MakefileEks.mk
#	contracts/src/deploy-config/holesky.ts
#	contracts/src/deploy-config/hoodi.ts
#	contracts/src/deploy-config/l1.ts
#	contracts/src/deploy-config/qanetl1.ts
#	contracts/src/deploy-config/sepolia.ts
#	contracts/src/deploy-config/testnetl1.ts
#	ops/docker/docker-compose-4nodes.yml
#	prover/bin/client/elf/verifier-client
- common/blob/fee.go: BlobHashes iterates over commitments (the slice it
  indexes); MakeBlobProof rejects blob/commitment length mismatch.
- common/blob/payload.go: report blob[i*32] (source byte) instead of zeroed
  data[i*32] in RetrieveBlobBytes validation error.
- prover blob.rs: validate blob_data length before indexing in
  decode_blob_scalars.
- prover lib.rs: version-aware PI hash dispatch — V0/V1 and V2 each require
  >=1 versioned hash; unsupported versions rejected explicitly.
- batch_cache_genesis_header_test: replace blocking <-interrupt loop with a
  single synchronous AssembleCurrentBatchHeader call so the test terminates.
- commit_test: t.Skip when pk is empty to avoid pk[2:] panic.
Rebuilt prover/bin/client/elf/verifier-client via `cargo prove build
--docker --tag v6.2.0` against the current multi_batch tree.

New programVkey: 0x00a1e5d01b309d25dc5014890b88ac4ba57c6065e05bfe352a2a8c120d60e003

Updated programVkey in deploy-configs: holesky, hoodi, l1, qanetl1,
sepolia, testnetl1.
Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Code review skipped — your organization's overage spend limit has been reached.

Code review is billed via overage credits. To resume reviews, an organization admin can raise the monthly limit at claude.ai/admin-settings/claude-code.

Once credits are available, reopen this pull request to trigger a review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants