feat: multi blob#935
Conversation
- Add isBatchV2Upgraded hook to BatchCache so V2 header is always generated once the upgrade is activated, regardless of blob count. Previously the code fell back to V1 for single-blob batches, which is incompatible with the V2 public_input_hash (keccak(hash[0]) ≠ hash[0]). - Remove the MAX_BLOB_PER_BLOCK = 6 constant from Rollup.sol and rely solely on blobhash(i) == bytes32(0) to terminate the blob-count loop. Per spec §9 design decision, blob count limits should be controlled by tx-submitter MaxBlobCount config, not a hardcoded contract constant, so Ethereum protocol upgrades (e.g. EIP-7691) require no contract change. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… challenge handler - Change ExecutorInput.blob_info to blob_infos (Vec<BlobInfo>) with batch_version field - Add BlobVerifier::verify_blobs for multi-blob KZG verification - Add BatchInfo::public_input_hash_v2 using keccak256(hash[0]||...||hash[N-1]) - Add multi-blob encoding (encode_multi_blob, encode_blob_from_bytes) in host blob.rs - Route verify() on batch_version: V2 uses aggregated blob hashes, V0/V1 unchanged - Update shadow_rollup calc_batch_pi to parse V2 header with blob_count at offset 257 - Add blob_count and extra_blob_hashes to challenge handler BatchInfo and encode Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… entrypoints - Add batch_version to ProveRequest and thread through gen_client_input/execute_batch - Use verify_blobs (all blobs) instead of verify (first blob only) in server queue - Compute blobHashesHash for V2 in server batch_header_ex; pass individual hashes for fill_ext - fill_ext parses V2 blob_count + per-blob hashes from extended batch_header_ex - Add batch_version param to try_execute_batch; callers extract version from batch_header[0] - Add --batch-version CLI arg to host binary - Add blob_count param to execute_batch for correct PI hash routing Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…V0/V1 compat fields Replace blob_versioned_hash + blob_count + extra_blob_hashes with a single blob_hashes: Vec<[u8; 32]>. fill_ext parses all hashes from batch_header_ex, encode writes blob_hashes[0] at offset 57 and appends count + remaining hashes for V2. No backward-compatibility shims needed since prover components upgrade together. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds BatchHeader V2 and multi-blob support across blob utilities, batch cache packing/sealing, node/prover flows, Rollup contract, tx-submitter wiring, gas-oracle sizing, tests, and infra updates. ChangesBatch V2 header, interfaces, and shared blob/encoding utilities
Batch cache V2 enablement and tests
Solidity Rollup V2 and tests
Node DA verification and batch parsing
Prover/executor multi-blob inputs and hashing
Gas-oracle multi-blob changes
Tx-submitter refactor, flags, config, infra
Sequence Diagram(s)sequenceDiagram
participant Cache
participant L2Gov
participant BlobUtils
participant TxSubmitter
participant Rollup
participant Beacon
participant Prover
Cache->>L2Gov: BatchBlockInterval/BatchTimeout
Cache->>BlobUtils: CompressBatchBytes -> MakeBlobTxSidecar
Cache->>TxSubmitter: Emit V2 header (aggregated blob hashes)
TxSubmitter->>Rollup: commitBatch(V<=2, sidecar)
Rollup-->>TxSubmitter: committed
TxSubmitter->>Beacon: fetch sidecars by tx
Beacon-->>TxSubmitter: blobs+commitments
Prover->>Prover: verify_blobs -> versioned_hashes
Prover->>Prover: public_input_hash_v2(aggregated hash)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested labels
Suggested reviewers
✨ Finishing Touches🧪 Generate unit tests (beta)
|
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…mpress once Split get_origin_batch into unpack_blob (field-element unpack) and decompress_batch (zstd decompress). verify_blobs now KZG-verifies each blob independently, unpacks all compressed chunks, concatenates them, then calls decompress_batch once. Previously each blob was independently decompressed which fails for N>1 since the zstd frame spans all chunks. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…fset 57 V2 headers now use the same 257-byte format as V1 with the aggregated blob hash (keccak256 of all blob hashes) at offset 57. This eliminates BatchHeaderCodecV2, simplifies contracts/prover/submitter, and fixes the multi-blob decompression bug in blob_verifier. - Delete BatchHeaderCodecV2.sol; V2 commitBatch computes aggregated hash inline - Unify _verifyProof and _loadBatchHeader for all versions - Remove BatchHeaderV2 struct in Go; V2 uses V1 format + version override - Simplify Rust challenge handler, queue, shadow_rollup (uniform 96-byte batch_header_ex) - Fix verify_blobs: decode BLS scalars per blob, concatenate, decompress once Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
V2 should store the aggregated blob hash (keccak256 of all blob hashes) in batchBlobVersionedHashes, consistent with the header offset 57 value, instead of blobhash(0). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…struction Move blobVersionedHash computation out of _commitBatchWithBatchData into callers via a new _computeBlobVersionedHash(version) helper: - V0/V1: blobhash(0) or ZERO_VERSIONED_HASH - V2: keccak256(blobhash(0)||...||blobhash(N-1)), requires >=1 blob _commitBatchWithBatchData now has a single unified header construction path for all versions — no more V2/V0V1 branch split. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove the V2 restriction in commitState — with the simplified V2 header format (aggregated hash at offset 57), the stored batchBlobVersionedHashes value is sufficient to recommit without a blob, same as V0/V1. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rsionedHash - Move `require(_blobCount > 0)` before the keccak256 assembly block to avoid computing keccak256 on empty input when no blobs are attached - Add multi-blob aggregated hash unit tests to RollupCommitBatchV2Test: single/two/three blob correctness, order-sensitivity, and _verifyProof publicInputHash alignment with V2 aggregated hash semantics - Add Rust unit tests for public_input_hash_v2 in prover-executor-client: single/two/three blob correctness, order-sensitivity, and structural difference from V1 (V2 uses keccak(h0||...||hN-1), not raw hash) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: corey <corey.zhang@bitget.com>
Co-authored-by: corey <corey.zhang@bitget.com> Co-authored-by: Cursor <cursoragent@cursor.com>
Resolve conflicts after PR #939 (remove sequencer batch generation): - node/types/batch_test.go: keep HEAD (multi_batch) version; it carries TestBatchHeaderV2 / TestBlobHashesHashUnavailableForLegacy which still exercise BatchHeaderV0/V1/V2 helpers retained on this branch. - go.work.sum: union both sides (toolchain entries from x/exp, x/mod, x/tools); identical go directive on both branches. - go-ethereum submodule: take main's 045be0fd (v2.2.2, includes the matching "Remove sequencer batch write paths" change for #939) over multi_batch's older 62952ec7. - common/go.mod: bump tendermint replace from v0.3.4 -> v0.3.7 to align with node/tx-submitter/contracts/oracle/ops modules; common itself has no tendermint imports, the directive was stale and was the cause of "conflicting replacements for github.com/tendermint/tendermint" in the workspace build. Co-authored-by: Cursor <cursoragent@cursor.com>
…/batch (#945) Co-authored-by: corey <corey.zhang@bitget.com> Co-authored-by: Cursor <cursoragent@cursor.com>
Point the go-ethereum submodule at 045be0fdc7ca (morph-v2.2.2) to match origin/main and the locked go.mod version. Co-authored-by: Cursor <cursoragent@cursor.com>
|
@claude review |
There was a problem hiding this comment.
This is a protocol-level multi-blob (Batch V2) upgrade spanning 73 files across the Rollup contract, derivation, prover public-input hashing, gas-oracle DA scalar, and tx-submitter sealing/commit paths — it should get a human review.
Extended reasoning...
Overview
This PR introduces Batch Header V2 and multi-blob support end-to-end:
- L1 contract (
contracts/contracts/l1/rollup/Rollup.sol): acceptsversion <= 2, adds_computeBlobVersionedHashwhich aggregatesblobhash(0..N-1)viakeccak256for V2, and routes V2 headers through the V1 257-byte loader. The 32-byte field at offset 57 now carries an aggregated hash for V2 (versioned hash for V0/V1), and_verifyProofconsumes that field uniformly. - Prover (
prover/crates/executor/client,prover/bin/{server,shadow-prove,challenge}):ExecutorInputgainsblob_infos: Vec<BlobInfo>andbatch_version: u8;BlobVerifier::verify_blobsconcatenates raw blob scalars and decompresses once;BatchInfo::public_input_hash_v2hasheskeccak256(h0||...||hN-1)into the public input. The proverServer's blob input fed intobatch_header.dataflips between aggregated and single hash based onbatch_version. - Node derivation (
node/derivation):ParseBatchnow concatenates all blob bodies before a single zstd decompress (per-blob decompression would fail mid-frame), enforces L1-declared blob order via a versioned-hash map, and replacesVerifyBlobProofwith a commitment round-trip (verifyBlob) since beacon/blob_sidecarsno longer guarantees a legacy single-blob KZG proof post-Osaka/PeerDAS. - Common module (
common/batch,common/blob): new Go module hostingBatchCache,BatchStorage, V2 header codec (BlobHashesHashvsBlobVersionedHash), and multi-blob sidecar/compression helpers; tx-submitter and node now depend on it viacommon/blobcompat shims. - Tx-submitter: new flags
MAX_BLOB_COUNTandBATCH_V2_UPGRADE_TIMEgate V2 activation;EstimateGasnow passesBlobHashes/BlobGasFeeCapsoblobhash(i)simulates correctly duringeth_estimateGas. - Gas-oracle: blob extraction switched to multi-blob concat-then-decompress;
get_data_from_blobreturns(batch_size, num_blobs, txn_count)and blob-scalar capacity scales withnum_blobs. - Configs/infra:
programVkeyupdated across all networks (holesky/hoodi/l1/qanetl1/sepolia/testnetl1) — verifier-side change that must match the new SP1 ELF.
Security risks
- L1 contract surface changes:
Rollup._computeBlobVersionedHashand_loadBatchHeaderaccept a new version, and the proof public-input now depends on an aggregated blob hash for V2. A mismatch between the contract's aggregation (loop overblobhash(i)until zero) and the prover's aggregation (Vec from beacon) would silently invalidate proofs or accept malformed batches. CodeRabbit flagged that the V0/V1 branch does not assertblobhash(1) == 0, allowing legacy batches to carry uncommitted extra blobs; the author elected not to add the guard. - Verifier key rotation:
programVkeychanges across all deploy configs — any deploy-script discrepancy between contract upgrade and prover ELF would brick finalization. - Derivation hardening deferred: CodeRabbit raised a calldata-prefix length check (since fixed in 2843031), a stale V0→V1
BlockContext.Decodeon tx-payload bytes, and a header-vs-calldatablockCountmismatch. The author explicitly deferred the latter two under a 'trusted sequencer' policy — defensible but worth a human sign-off on the threat model. - Removed KZG proof check:
VerifyBlobProofwas replaced with a commitment→versioned-hash round-trip. The reasoning (PeerDAS makes beacon-supplied proofs unreliable) is sound, but the security argument deserves a human review given that this is the only on-the-fly DA check at derivation time. - Blob length validation:
copy(blob[:], b)is now guarded by an exact length check after CodeRabbit raised it — good — but several other multi-blob index sites (Rustdecode_blob_scalars, GoMakeBlobProof) were flagged without explicit fixes landing.
Level of scrutiny
This is the highest-scrutiny tier: a coordinated protocol upgrade that changes consensus-relevant public-input hashing, L1 contract code, and verifier keys, with several deferred hardening items and an explicit invocation of the trusted-sequencer model in derivation. Outside the bot's auto-approval criteria — needs a human reviewer who can validate the V2 activation/coordination plan, confirm prover/contract aggregation parity, and accept the deferred derivation guards.
Other factors
- 60+ minute estimated review effort per CodeRabbit's own walkthrough.
- Multiple unresolved CodeRabbit majors (legacy-version extra-blob guard, blobless-batch scalar fallback, batch_version validation, calldata-vs-header blockCount consistency) where the author chose to defer rather than fix; a human should weigh those tradeoffs.
- Tests added are non-trivial (V2 header parsing, multi-blob ParseBatch round-trip, contract V2 aggregation tests) but do not cover the cross-boundary parity between contract aggregation and prover aggregation end-to-end.
- The bug hunting system found no bugs this run, but absence of findings here is not a substitute for human review on a change of this scope.
|
@claude review |
| #[serde(default = "default_batch_version")] | ||
| pub batch_version: u8, | ||
| } | ||
|
|
||
| fn default_batch_version() -> u8 { | ||
| 2 | ||
| } |
There was a problem hiding this comment.
🔴 The server defaults batch_version to 2 in prover/bin/server/src/queue.rs:27-33, but the challenge handler's ProveRequest at prover/bin/challenge/src/handler.rs:21-27 has no batch_version field at all and POSTs JSON without it (handler.rs:184-189). For any V0/V1 batch the challenge handler asks the prover to prove, the server applies the V2 default and dispatches public_input_hash_v2 at lib.rs:37-41 plus writes keccak256(h0||...) to batch_header.data at queue.rs:167-175, producing a public-input hash that does not match the on-chain Rollup contract's V0/V1 blobhash(0)-derived PI — so L1 challenge proof verification fails. Fix by updating challenge handler ProveRequest to include batch_version (read from the batch header it already loads, mirroring shadow_prove.rs:127), or change default_batch_version() to 0 to match host/src/main.rs:36 and ExecutorInput's #[serde(default)].
Extended reasoning...
What the bug is
prover/bin/server/src/queue.rs:21-33 defines the prover server's HTTP API request schema with a #[serde(default = "default_batch_version")] on batch_version, and default_batch_version() returns 2 (the V2 multi-blob variant). Any caller that POSTs to /prove_batch without including the field will have its proof generated under V2 PI semantics.
prover/bin/challenge/src/handler.rs:21-27 defines its own ProveRequest struct that literally has no batch_version field:
#[derive(Serialize)]
pub struct ProveRequest {
pub batch_index: u64,
pub start_block: u64,
pub end_block: u64,
pub rpc: String,
}It is serialized to JSON and POSTed to /prove_batch at handler.rs:184-190. The server deserializes that JSON, sees no batch_version field, and applies the default of 2.
Why existing code does not prevent it
This PR is the one introducing batch_version. It updates shadow_prove.rs:127-134 to read batch_version from the on-chain batch header and pass it explicitly, but it does not update challenge/src/handler.rs. Three other entry points in this PR all default to 0:
prover/bin/host/src/main.rs:36—#[clap(long = "batch-version", default_value_t = 0)]prover/crates/executor/client/src/types/input.rs:73-74—#[serde(default)]onpub batch_version: u8(u8 default = 0)prover/bin/shadow-prove/src/shadow_prove.rs:24—#[serde(default)](u8 default = 0)
Only the server queue defaults to 2. So the challenge handler — the one in-tree caller that doesn't set the field — silently triggers V2 PI math on every batch it asks for proof of.
Step-by-step proof
- A challenge is filed against a V0 or V1 batch on L1 (e.g. any historical batch from before
BATCH_V2_UPGRADE_TIME). challenge-handlerdetects the event, inspects the batch (handler.rs:154-161), and constructsProveRequest { batch_index, start_block, end_block, rpc }— nobatch_versionfield at all (handler.rs:184-189).util::call_prover(serde_json::to_string(&request).unwrap(), "/prove_batch")(handler.rs:190) sends a JSON body that lacksbatch_version.- The server's
ProveRequestdeserializer inqueue.rs:21-29appliesdefault_batch_version()= 2. prove_for_queuecallsgen_client_input(..., batch_version=2)(queue.rs:90-96). Inside,execute_batchis invoked withbatch_version = 2(queue.rs:145-153), so the resultingExecutorInput.batch_version = 2.gen_client_inputthen computesblob_inputfor the savedbatch_header.data(queue.rs:163-181):For the V0/V1 batch we getlet blob_input = if batch_version >= 2 { let mut blob_hasher = Keccak256::new(); for h in &versioned_hashes { blob_hasher.update(h.as_slice()); } blob_hasher.finalize() } else { versioned_hashes[0] };
keccak256(versioned_hashes[0])instead ofversioned_hashes[0].- Inside the zkVM,
prover/crates/executor/client/src/lib.rs:37-41dispatches oninput.batch_version:So the proof commits to a public input that useslet public_input_hash = if input.batch_version >= 2 { batch_info.public_input_hash_v2(&versioned_hashes) } else { batch_info.public_input_hash(&versioned_hashes[0]) };
keccak256(versioned_hashes[0])as the blob input field. - On L1, the Rollup contract's
_verifyProof(Rollup.sol:758-772) reads offset 57 of the V0/V1 batch header, which was set at commit time by_computeBlobVersionedHash(0_or_1)(Rollup.sol:921-942) toblobhash(0)directly, notkeccak256(blobhash(0)). The contract then computes its expected public input hash usingblobhash(0)as the blob field. - Off-chain PI (
keccak256(h0)) != on-chain PI (h0) → SP1 verifier rejects the proof → challenge resolution fails on L1.
Impact
Challenge resolution is the safety-critical defense path: when an honest party challenges an invalid batch, the prover must produce a valid SP1 proof to defend the batch on L1. With this bug, the prover silently produces proofs against the wrong public input for any V0/V1 batch challenged via challenge-handler, so verification fails and the rollup loses the challenge by default. This is bounded to the V0/V1 → V2 transition window (any historical V0/V1 batches still inside the proof window when the V2 upgrade goes live), but during that window the challenge handler is broken.
How to fix
Three options, in order of robustness:
- Best: Update
challenge/src/handler.rs:21-27ProveRequestto includebatch_version: u8, and athandler.rs:184-189populate it from the parent batch header thatbatch_inspectalready loads (mirroringshadow_prove.rs:124-134). The challenge handler already parses the batch header athandler.rs:486-500, so the version byte is trivially available. - Change
default_batch_version()inqueue.rs:31-33to return0instead of2, matchinghost/src/main.rs:36andtypes/input.rs:73-74. This eliminates the silent regression but still leavesbatch_versionimplicit — a future V2-only caller might forget to set it. - Remove
#[serde(default = "default_batch_version")]entirely and make the field mandatory in the request payload. This forces every caller to be explicit, which is the cleanest contract but requires touching all callers.
Option (1) addresses the current production-breaking path; option (2) or (3) is a defense-in-depth follow-up worth landing in the same PR.
Brings in 6 commits from base feat/multi_batch since the previous merge (09a2c72) so PR #951 stops reporting CONFLICTING: - 4be9cda update challenge handler dep - c6f7cc2 add v0/v1 blob check - 7ec11cf Merge branch 'feat/multi_batch' of github.com:morph-l2/morph - fa9f4e8 fix submitter replay batch with config max blob count - 1fe0d74 add rustc version desc - fbd1bee chore: align go-ethereum submodule with origin/main Single content conflict in common/batch/batch_cache.go: base added a new sealEffectiveBlobCount function in the same hunk where SPEC-005 exported BuildBlockContext (PR #935-derived rename). Resolved by keeping both: sealEffectiveBlobCount stays where base placed it; the exported BuildBlockContext (with SPEC-005 docstring) follows. base's lowercase "buildBlockContext" comment was dropped (now stale -- we renamed it). All other touched files auto-merged: contracts/Rollup.sol, BlobVersionedHashLegacy.t.sol (added by base), prover/Cargo.{toml,lock}, prover/{,bin/challenge/}rust-toolchain (deleted by base), tx-submitter/mock/rollup.go, common/batch/batch_restart_test.go. go build ./common/batch/ ./common/blob/ ./node/... ./tx-submitter/... ./oracle/... -- clean. go test ./node/derivation/... -- 22 tests PASS (incl. 9 added in b9ad14e). common/batch tests hang on a pre-existing txSenderCacher goroutine leak unrelated to this merge (reproducible on origin/feat/multi_batch in isolation); not introduced here. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…hes (#953) Co-authored-by: corey <corey.zhang@bitget.com>
| MaxBlobCountFlag = cli.IntFlag{ | ||
| Name: "max_blob_count", | ||
| Usage: "Maximum number of blobs per batch submission (1-6)", | ||
| Value: 6, | ||
| EnvVar: prefixEnvVar("MAX_BLOB_COUNT"), | ||
| } |
There was a problem hiding this comment.
🟡 The max_blob_count flag at tx-submitter/flags/flags.go:336-341 documents its range as (1-6) in the usage string, but no code on the path from CLI parsing (utils/config.go:194) to BatchCache construction (common/batch/batch_cache.go:99) actually enforces the upper bound — NewBatchCache only guards <= 0 and otherwise accepts any positive value. An operator who sets TX_SUBMITTER_MAX_BLOB_COUNT=10 (e.g. by typo or misreading the docs) gets a tx-submitter that happily packs 7+ blob batches; MakeBlobTxSidecar's byte-capacity check passes, but every resulting blob tx is rejected by the L1 client at submission (EIP-4844 caps at 6, 9 under Pectra) until the operator restarts with corrected config. Suggest a one-line range check (e.g. maxBlobCount > 6 -> reject or clamp) at config load or in NewBatchCache to match the documented contract.
Extended reasoning...
What the bug is
MaxBlobCountFlag (tx-submitter/flags/flags.go:336-341) is declared with:
MaxBlobCountFlag = cli.IntFlag{
Name: "max_blob_count",
Usage: "Maximum number of blobs per batch submission (1-6)",
Value: 6,
EnvVar: prefixEnvVar("MAX_BLOB_COUNT"),
}The usage string promises a (1-6) range, but cli.IntFlag in this version of urfave/cli has no validator hook, so the bound is documentation-only.
The plumbing then:
- utils/config.go:194 reads
ctx.GlobalInt(flags.MaxBlobCountFlag.Name)intoConfig.MaxBlobCountwith no clamping or validation. - services/rollup.go:128 passes that value straight into
batch.NewBatchCache. - common/batch/batch_cache.go:99-101 is the only guard on the value:
— lower bound only.
if maxBlobCount <= 0 { maxBlobCount = 2 }
10,100,1000all pass.
Step-by-step proof
- Operator sets
TX_SUBMITTER_MAX_BLOB_COUNT=10(typo, doc misread, or stale config from a future protocol upgrade). NewBatchCache(..., maxBlobCount=10, ...)storesbc.maxBlobCount = 10. Capacity checks now use10 * MaxBlobBytesSize = 1,269,760bytes.- Once V2 is active (
isBatchV2Upgradedtrue),effectiveMaxBlobCountreturns 10. Batches grow until the compressed payload approaches that cap. - On a busy enough batch,
SealBatch→MakeBlobTxSidecar(compressed, 10)in common/blob/payload.go:78 — the only capacity check there islen(blobBytes) > maxBlobs*MaxBlobBytesSize, which a 10-blob payload satisfies. The sidecar is built with up to 10 blob field-element arrays. createBlobTxwraps the sidecar into aBlobTxand submits viaL1Client.SendTransaction.- L1 client rejects: EIP-4844 caps
tx.blobVersionedHashes.lengthat 6 (or 9 post-Pectra/EIP-7691 depending on network). The error surfaces but there is no recovery path — every subsequent attempt builds the same invalid tx because the cap is static config. The submitter is stuck until the operator restarts with corrected config.
Why existing code doesn't prevent it
The only checks in the cache/blob path use maxBlobs * MaxBlobBytesSize for byte capacity, never for blob count. The L1 protocol cap is implicit in the EL/CL clients, not enforced by tx-submitter. The default value (Value: 6) is correct, so a vanilla deployment is fine; the bug only manifests under explicit operator misconfiguration.
Impact
Bounded — this is operator misconfiguration territory, not an attacker-controlled input. The default is the safe maximum, the failure is loud (clear L1 error), no on-chain state is corrupted, and no submitter state is lost. But the documented (1-6) contract is not enforced, which is the kind of mismatch that breaks during operator handoffs or doc drift. One verifier rightly calls this "not actionable" since L1 will reject and the operator sees the error immediately; that view is defensible, but flagging it as a nit costs nothing and the fix is one line.
Fix
Add a range check at config load (utils/config.go around line 194) or in NewBatchCache:
if maxBlobCount > 6 {
return fmt.Errorf("max_blob_count %d exceeds protocol cap 6", maxBlobCount)
}Or clamp to match the documented bound. The choice between reject vs clamp depends on whether silent normalization is preferable to an explicit boot-time failure; rejecting matches the principle of failing loudly on misconfiguration.
| func RetrieveBlobBytes(blob *kzg4844.Blob) ([]byte, error) { | ||
| data := make([]byte, MaxBlobBytesSize) | ||
| for i := 0; i < 4096; i++ { | ||
| if blob[i*32] != 0 { | ||
| return nil, fmt.Errorf("invalid blob, found non-zero high order byte %x of field element %d", data[i*32], i) | ||
| } | ||
| copy(data[i*31:i*31+31], blob[i*32+1:i*32+32]) | ||
| } | ||
| return data, nil |
There was a problem hiding this comment.
🟡 RetrieveBlobBytes at common/blob/payload.go:47 builds its "non-zero high order byte %x of field element %d" error from data[i*32], but data is the freshly allocated all-zero output buffer at that point — nothing has been written to it yet. The byte that actually failed the check is blob[i*32] (the condition on line 46 tests blob[i*32] != 0), so the error always reports 0x0, directly contradicting its own "non-zero" wording. The same pattern exists in common/batch/blob.go:48; both spots need the one-character fix data[i*32] → blob[i*32].
Extended reasoning...
What the bug is
At common/blob/payload.go:43-51, RetrieveBlobBytes walks the 4096 field elements of a blob and rejects any element whose high-order byte is non-zero (the canonical encoding requires that byte to be 0x00):
func RetrieveBlobBytes(blob *kzg4844.Blob) ([]byte, error) {
data := make([]byte, MaxBlobBytesSize)
for i := 0; i < 4096; i++ {
if blob[i*32] != 0 {
return nil, fmt.Errorf("invalid blob, found non-zero high order byte %x of field element %d", data[i*32], i)
}
copy(data[i*31:i*31+31], blob[i*32+1:i*32+32])
}
return data, nil
}The condition on line 46 correctly tests blob[i*32] != 0, but the format argument on line 47 reads data[i*32] — the freshly allocated, all-zero output buffer. The loop only writes data[i*31:i*31+31] (31-byte stride), never the 32-byte stride offset the error reads from, so data[i*32] is guaranteed to be 0x00 every time the error fires.
Step-by-step proof
- Call
RetrieveBlobBytes(blob)whereblob[0] = 0x42(any non-zero byte at a field-element boundary). - Enter the loop at
i = 0.make([]byte, MaxBlobBytesSize)returns a slice where every byte is0x00. blob[0*32] != 0evaluates totrue(sinceblob[0] = 0x42).fmt.Errorfformatsdata[0*32]=data[0]=0x00.- The returned error reads:
invalid blob, found non-zero high order byte 0 of field element 0— internally self-contradicting (the message claims the byte is non-zero while printing it as0).
Why existing code does not prevent it
There is no prior write to data[i*32] before the error fires:
- On the first iteration,
datais still the zeroed buffer frommake. - On later iterations the loop only writes
data[i*31:i*31+31], i.e. the 31-byte-stride slice, never the 32-byte-stride offset the error reads. So even mid-loop,data[i*32]is always either still the initial zero or, at most, an unrelated byte written by a prior iteration'scopy— it has no semantic relation to the field element being validated.
The identical pattern is also present in common/batch/blob.go:48 (the older copy that RetrieveBlobBytes was forked from / moved alongside in this PR). Both spots need the fix.
Impact
Diagnostic-only — the validation itself still correctly rejects malformed blobs. But operators investigating a malformed blob (from a misbehaving sequencer, beacon node, or upstream encoder) get an uninformative error that always reads 0x0 instead of the actual offending byte value, which is the single most useful piece of forensic information. The error message also directly contradicts its own "non-zero" wording — the kind of inconsistency that wastes operator time during a production incident. CodeRabbit flagged this on 2026-05-08 at this exact location and it remains unfixed.
How to fix
One-character change in each file: replace data[i*32] with blob[i*32] in the format argument so the error logs the actual offending source byte.
return nil, fmt.Errorf("invalid blob, found non-zero high order byte %x of field element %d", blob[i*32], i)There was a problem hiding this comment.
Thanks for fixing common/blob/payload.go:47 — that one now correctly reports blob[i*32].
However the comment also flagged the same pattern at common/batch/blob.go:48, and that file still has the original data[i*32]:
// common/batch/blob.go:44-52
func RetrieveBlobBytes(blob *kzg4844.Blob) ([]byte, error) {
data := make([]byte, MaxBlobBytesSize)
for i := 0; i < 4096; i++ {
if blob[i*32] != 0 {
return nil, fmt.Errorf("invalid blob, found non-zero high order byte %x of field element %d", data[i*32], i)
// ^^^^^^^^^
// always 0x0
}
copy(data[i*31:i*31+31], blob[i*32+1:i*32+32])
}
return data, nil
}This is not dead code: node/derivation/batch_info.go:110 calls it as commonbatch.RetrieveBlobBytes(&blobCopy) (imported as commonbatch "morph-l2/common/batch"), so the derivation path still hits the always-zero error message on malformed blobs.
Same one-character fix: data[i*32] → blob[i*32].
Co-authored-by: FletcherMan <fanciture@163.com> Co-authored-by: fletcher.fan <fletcher.fan@bitget.com>
| let commit_scalar = (rollup_gas_used.as_u64() + self.finalize_batch_gas_used) * PRECISION / | ||
| l2_txn.max(self.txn_per_batch); | ||
| let blob_scalar = if l2_data_len > 0 { | ||
| MAX_BLOB_TX_PAYLOAD_SIZE as u64 * PRECISION / l2_data_len | ||
| num_blobs.max(1) * MAX_BLOB_TX_PAYLOAD_SIZE as u64 * PRECISION / l2_data_len | ||
| } else { | ||
| MAX_BLOB_SCALAR | ||
| }; |
There was a problem hiding this comment.
🔴 The new commitState() method emits the same CommitBatch event as commitBatch but the tx is a DynamicFeeTx with no blob sidecar; gas-oracle's calculate_scalar does not filter logs by tx selector, so when a commitState tx is the most-recent CommitBatch log, get_data_from_blob() returns (0,0,0) and calculate_from_rollup (l1_scalar.rs:249-255) falls into the l2_data_len == 0 branch, setting blob_scalar = MAX_BLOB_SCALAR (10^11) and pushing it to L2. This inflates the L1-DA fee component on every L2 user transaction until a subsequent blob-carrying commitBatch lands and the oracle re-derives a normal value. Fix by propagating an Option/None sentinel from get_data_from_blob when indexed_hashes.is_empty() and skipping the set_blob_scalar update in that case, or by filtering CommitBatch logs in calculate_scalar by tx selector to exclude commitState.
Extended reasoning...
What the bug is
This PR adds a new commitState() method to Rollup.sol that re-commits a previously-committed batch by reusing the blob versioned hash stored on L1 (batchBlobVersionedHashes[N] populated by a prior commitBatch). The commitState tx itself is a plain DynamicFeeTx with no blobVersionedHashes — see tx-submitter/services/rollup.go:1206 and the new rebuild path at tryRebuildRollupCommitTx (rollup.go:1379-1392) which both call createDynamicFeeTx. Crucially, commitState reaches _commitBatchWithBatchData which unconditionally emits the same CommitBatch event at Rollup.sol:383 — so at the L1 log level a commitState is indistinguishable from a commitBatch.
The gas-oracle's calculate_scalar (gas-oracle/app/src/da_scalar/l1_scalar.rs:175-198) subscribes to CommitBatch events via commit_batch_filter() with no method/selector discrimination, then picks the most-recent log by block number (line 193 max_by_key(log.block_number)). It then calls calculate_from_rollup() -> get_data_from_blob() for that tx hash. Inside get_data_from_blob (line 271+), data_and_hashes_from_txs reads target_tx.other['blobVersionedHashes'] (calculate.rs:155-159). For a commitState DynamicFeeTx this field is empty/absent, so indexed_hashes is empty and the function short-circuits to Ok((0, 0, 0)) at l1_scalar.rs:300-303.
Why existing code does not prevent it
Back in calculate_from_rollup, the new multi-blob math at lines 249-255 is:
let blob_scalar = if l2_data_len > 0 {
num_blobs.max(1) * MAX_BLOB_TX_PAYLOAD_SIZE as u64 * PRECISION / l2_data_len
} else {
MAX_BLOB_SCALAR // 10^11
};With l2_data_len == 0, blob_scalar saturates to MAX_BLOB_SCALAR. The threshold gate (check_threshold_reached, lines 159-171) only compares relative change to gas_threshold%; jumping from a normal value (~1e9-1e10) to 1e11 is roughly 10-100x, easily clearing any sane threshold. set_blob_scalar (line 147) is then invoked and the inflated value is pushed to the L2 GasPriceOracle, where it feeds directly into the L1-DA fee component of every L2 user transaction.
Why this PR introduces the bug
Pre-PR, every commitBatch tx attached at least one blob — even empty batches used EmptyVersionedHash as a sentinel — so indexed_hashes was always non-empty for any CommitBatch event the oracle processed. The (0,0,0) -> MAX_BLOB_SCALAR branch existed but was unreachable in production. CodeRabbit flagged this exact latent fall-through on 2026-05-08 (PR timeline, l1_scalar.rs:303 'Skip blob-scalar recalculation for blobless batches'); the comment is unaddressed. This PR makes the dead branch live by introducing commitState: a CommitBatch-emitting tx with zero attached blobs.
Step-by-step proof
- Owner runs
revertBatch(N).committedBatches[N]is cleared butbatchBlobVersionedHashes[N]is preserved (Rollup.sol:108 comment). - tx-submitter's next
rollup()iteration reads the stored blob hash and choosescommitState(rollup.go:1133-1144). It packscommitState(rollupBatch, signature)and submits viacreateDynamicFeeTx— no blob sidecar. - The tx lands on L1;
_commitBatchWithBatchDataemitsCommitBatch(batchIndex, hash)exactly ascommitBatchwould. - Gas-oracle's next cycle:
commit_batch_filter()returns all CommitBatch logs in the last 100 blocks;max_by_key(block_number)selects thecommitStatetx as the most-recent. calculate_from_rollup->get_data_from_blob(tx_hash)-> fetches the tx;target_tx.other['blobVersionedHashes']is empty;indexed_hashes.is_empty()is true; returnsOk((0, 0, 0)).- Back in
calculate_from_rollup:l2_data_len == 0->blob_scalar = MAX_BLOB_SCALAR = 10^11. check_threshold_reachedcompares 10^11 to the current ~1e9-1e10 stored value;actual_change > expected_changetrivially holds;set_blob_scalar(MAX_BLOB_SCALAR)is sent to L2.- L2
GasPriceOraclewrites 10^11; every subsequent user transaction pays a roughly 10-100x inflated L1-DA fee component until the next blob-carryingcommitBatcharrives and the oracle re-derives a normal scalar.
Impact and bounds
Self-healing: any subsequent commitBatch (with blob) lands within a normal sequencing window and overwrites MAX_BLOB_SCALAR with a sane value. The inflated-fee window is bounded to that interval (typically minutes, but operator-controlled). However the trigger is realistic — revertBatch is owner-only but the rebuild path at rollup.go:1379-1392 will dispatch commitState on every recovery, and the oracle picks it up on the very next polling cycle. Users pay real overcharges during the window. No state corruption, no consensus impact, but a concrete user-facing fee regression introduced by this PR's commitState plumbing.
How to fix
Either of the following (the bug submitter and CodeRabbit both proposed essentially the same set):
- Propagate a sentinel from
get_data_from_blob: change the return type toResult<Option<(u64,u64,u64)>>(or an explicit enum) and havecalculate_from_rollupskip theset_blob_scalarpush when the sentinel isNone("no blobs in this tx, do not update blob scalar"). - Filter logs by tx selector in
calculate_scalar: drop logs whose tx data starts with thecommitStateselector beforemax_by_keypicks the most-recent. - Carry over the previous
blob_scalarwhenl2_data_len == 0, instead of saturating toMAX_BLOB_SCALAR.
Option (1) is the cleanest and matches the existing semantics: a tx that delivers no DA bytes simply should not move the DA scalar.
| return nil, fmt.Errorf("failed to get blobs, continuing processing:%v", err) | ||
| } | ||
| if len(blobSidecars) > 0 { | ||
| // Create blob sidecar | ||
| var blobTxSidecar eth.BlobTxSidecar | ||
| matchedCount := 0 | ||
|
|
||
| // Match blobs | ||
| // Index beacon sidecars by their KZG-derived versioned hash so we | ||
| // can assemble the local sidecar in the exact order the L1 tx | ||
| // declared its blobs. Multi-blob batches are decoded by | ||
| // concatenating blob bodies in tx order; any reordering here | ||
| // would corrupt the resulting zstd stream. The map key is | ||
| // derived from the beacon-supplied commitment; verifyBlob below | ||
| // re-derives the same hash from the actual blob bytes, so a | ||
| // malicious beacon cannot forge an entry by lying about the | ||
| // commitment. | ||
| byHash := make(map[common.Hash]*BlobSidecar, len(blobSidecars)) | ||
| for _, sidecar := range blobSidecars { | ||
| var commitment kzg4844.Commitment | ||
| copy(commitment[:], sidecar.KZGCommitment[:]) | ||
| versionedHash := KZGToVersionedHash(commitment) | ||
|
|
||
| for _, expectedHash := range blobHashes { | ||
| if bytes.Equal(versionedHash[:], expectedHash[:]) { | ||
| matchedCount++ | ||
| d.logger.Info("Found matching blob", "index", sidecar.Index, "hash", versionedHash.Hex()) | ||
|
|
||
| // Decode and process blob data | ||
| var blob Blob | ||
| b, err := hexutil.Decode(sidecar.Blob) | ||
| if err != nil { | ||
| d.logger.Error("Failed to decode blob data", "error", err) | ||
| continue | ||
| } | ||
| copy(blob[:], b) | ||
|
|
||
| // Verify blob | ||
| //if err := VerifyBlobProof(&blob, commitment, kzg4844.Proof(sidecar.KZGProof)); err != nil { | ||
| // d.logger.Error("Blob verification failed", "error", err) | ||
| // continue | ||
| //} | ||
|
|
||
| // Add to sidecar | ||
| blobTxSidecar.Blobs = append(blobTxSidecar.Blobs, *blob.KZGBlob()) | ||
| blobTxSidecar.Commitments = append(blobTxSidecar.Commitments, commitment) | ||
| blobTxSidecar.Proofs = append(blobTxSidecar.Proofs, kzg4844.Proof(sidecar.KZGProof)) | ||
| break | ||
| } | ||
| } | ||
| byHash[KZGToVersionedHash(commitment)] = sidecar | ||
| } | ||
|
|
||
| d.logger.Info("Blob matching results", "matched", matchedCount, "expected", len(blobHashes)) | ||
| if matchedCount == 0 { | ||
| return nil, fmt.Errorf("no matching versionedHash was found") | ||
| // Downstream (ParseBatch) only consumes Sidecar.Blobs and | ||
| // Sidecar.Commitments; Proofs is intentionally left empty to | ||
| // avoid an extra ~O(n) KZG op per blob per batch on every | ||
| // sync. If a future consumer needs Proofs, compute them |
There was a problem hiding this comment.
🔴 L2 derivation and the oracle silently skip the new commitState transactions introduced by this PR. commitState (Rollup.sol:257-277) calls _commitBatchWithBatchData which emits the same CommitBatch(batchIndex, batchHash) event at line 383 that derivation filters on, but UnPackData at node/derivation/derivation.go:422-516 only matches the four legacy commitBatch selectors and returns ErrNotCommitBatchTx for the commitState selector. The caller at derivation.go:218-226 silently continues on that error, so any L2 blocks committed via commitState are never run through parseBatch/handleL1Message and root/withdrawal-root divergences inside that batch range are silently missed. The same defect exists in oracle/oracle/batch.go:118-184 (else-branch continue on unknown selectors), where it stalls fee/gas-oracle accounting at line 194 with batch is incontinuity, expect N, have N+1 as soon as the next non-commitState batch arrives. Trigger is reachable in production: tx-submitter/services/rollup.go:1133-1156 sets useCommitState=true and packs the commitState selector whenever Rollup.BatchBlobVersionedHashes[batchIndex] is non-zero, which persists across revertBatch (Rollup.sol:438-476 clears committedBatches[idx] and committedStateRoots[idx] but not batchBlobVersionedHashes[idx]). Fix: add a commitState branch in UnPackData that unpacks the same BatchDataInput tuple as the rollupABI.commitBatch case, and add the matching branch in the oracle\x27s switch.
Extended reasoning...
What the bug is
This PR introduces a new commitState entry point on Rollup.sol (lines 257-277) that re-commits a batch using the L1-stored blob versioned hash from a previous commitBatch. Internally it calls the shared _commitBatchWithBatchData (line 271), which at line 383 emits the same CommitBatch(_batchIndex, committedBatches[_batchIndex]) event topic that derivation\x27s fetchRollupLog (derivation.go:280-292) filters on.
The two downstream consumers that decode the event\x27s tx calldata were not updated:
- node/derivation/derivation.go:422-516 —
UnPackDatahas exactly four selector branches:beforeMoveBlockCtxABI.commitBatch,legacyRollupABI.commitBatch,rollupABI.commitBatch, androllupABI.commitBatchWithProof. The finalelseat line 513-514 returnstypes.ErrNotCommitBatchTxfor any other selector — includingcommitState. - The caller at derivation.go:218-226 silently swallows that error:
if errors.Is(err, types.ErrNotCommitBatchTx) { continue }
- oracle/oracle/batch.go:118-184 mirrors the same pattern: only
abi.Methods["commitBatch"].IDandbeforeRemoveSkipMapAbi.Methods["commitBatch"].IDare matched; theelseat line 182-183continues on unknown selectors. Worse, the oracle tracksbatchIndexsequentially and at line 194-196 throwsbatch is incontinuity, expect N, have N+1as soon as the nextcommitBatchafter a skippedcommitStatearrives.
Why existing code does not prevent it
- The derivation
continueonErrNotCommitBatchTxwas originally intended for unrelated tx topics that share the contract address. AddingcommitStateretroactively turns this defensive skip into a silent data-drop. - The oracle has no equivalent
ErrNotCommitBatchTxplumbing; it just hits theelsebranch and falls off the loop. - Nothing on L1 forbids
commitStatefrom being used: the contract path is fully wired and the staker can reach it.
The trigger is reachable
tx-submitter/services/rollup.go:1133-1156 reads r.Rollup.BatchBlobVersionedHashes(nil, big.NewInt(batchIndex)) and sets useCommitState = stored != [32]byte{}. The path packs the commitState selector at lines 1156-1160. The stored hash survives revertBatch in Rollup.sol:438-476, which clears committedBatches[_batchIndex] = bytes32(0) and committedStateRoots[_batchIndex] = bytes32(0) but never touches batchBlobVersionedHashes[_batchIndex]. So any owner-initiated revert leaves a stored blob hash behind, and the next legitimate re-commit of that batch index goes through commitState rather than commitBatch.
What the impact would be
Once any commitState tx lands on L1 for a given deployment:
- Derivation: every L2 block in that batch range is never derived locally.
parseBatch/handleL1Messageis skipped, so any divergence between the canonicalpostStateRoot/withdrawalRootwritten by L1 and what derivation would compute locally is silently lost. TheSyncedBatchIndex/L2DeriveHeightmetrics jump past the missing batch. SubsequentcommitBatchtxs still parse fine (their parent batch header bytes are carried in calldata), so the failure is silent end-to-end — operators get nofetch batch info failedlog, no error, no metric drop. - Oracle: the gas/fee oracle\x27s sequential batch-index tracker hits
batch is incontinuityon the very nextcommitBatchafter a skippedcommitState(oracle/oracle/batch.go:194-196). The oracle stalls until manual intervention.
Step-by-step proof
- Owner calls
Rollup.revertBatch(batchHeader, 1)to revert batchN(Rollup.sol:438).committedBatches[N]is cleared butbatchBlobVersionedHashes[N]retains the original versioned hash. - tx-submitter wakes up, reads
BatchBlobVersionedHashes(N)≠bytes32(0)at rollup.go:1133, setsuseCommitState = true, packscommitStatecalldata at rollup.go:1156, and submits the L1 tx. - Rollup.sol:257-277
commitStateruns,_commitBatchWithBatchDataemitsCommitBatch(N, hash)at line 383. - Derivation\x27s
fetchRollupLog(derivation.go:280-292) picks up the log because the topic isRollupEventTopicHash. It callsfetchRollupDataByTxHashat derivation.go:219. UnPackData(derivation.go:422-516) comparestx.Data()[:4]against the four registeredcommitBatchselectors. None match. Line 513-514 returns(batch, types.ErrNotCommitBatchTx).- The caller at derivation.go:221-223 sees
errors.Is(err, types.ErrNotCommitBatchTx)andcontinues. No log line, no metric, no error surfaced. - The next
commitBatchfor batchN+1arrives normally; its calldata includes the parent batch header bytes verbatim, so derivation parses it fine. But block heights and roots from batchNwere never validated locally against L2 state.
How to fix
The bindings already include commitState (verified in bindings/bindings/rollup.go). The fix is to add a commitState branch in both decoders that unpacks the identical BatchDataInput tuple already used by the rollupABI.commitBatch case (IRollup.sol:16-24):
// node/derivation/derivation.go, inside UnPackData
} else if bytes.Equal(d.rollupABI.Methods["commitState"].ID, data[:4]) {
args, err := d.rollupABI.Methods["commitState"].Inputs.Unpack(data[4:])
if err != nil { return batch, fmt.Errorf("commitState Unpack error:%v", err) }
rollupBatchData := args[0].(struct { /* same BatchDataInput tuple as commitBatch */ })
batch = geth.RPCRollupBatch{ /* same field copy */ }
}And the matching branch in oracle/oracle/batch.go after line 140.
Scope note
The synthesis description mentions "parent for the next batch points to an unknown hash, and downstream batches also fail to derive." That part is not quite right — the next batch\x27s parent header bytes are carried verbatim in its own calldata, so subsequent batches still parse. The real damage is the silent skip: divergence inside the commitStated batch range is never caught, and the oracle stalls. This is what the verifiers correctly highlight, and it is sufficient on its own to make the bug normal-severity for any deployment that ever uses revertBatch.
| } | ||
|
|
||
| func (r *Rollup) handlePendingTx(txRecord *types.TxRecord, tx *ethtypes.Transaction, method string) error { | ||
| // Check for timeout | ||
| if txRecord.SendTime+uint64(r.cfg.TxTimeout.Seconds()) < uint64(time.Now().Unix()) { | ||
| log.Info("Transaction timed out", | ||
| "tx", tx.Hash().Hex(), | ||
| "nonce", tx.Nonce(), | ||
| "method", method) | ||
|
|
||
| // Try to replace the transaction with higher gas price | ||
| return r.replaceTimedOutTx(tx) | ||
| } | ||
|
|
||
| // Check if transaction might fail | ||
| if method == constants.MethodCommitBatch { | ||
| // Obsolete / doomed txs: cancel before timeout handling so we do not resubmit a tx that will revert. | ||
| if constants.IsCommitLikeMethod(method) { | ||
| batchIndex := utils.ParseParentBatchIndex(tx.Data()) + 1 | ||
| lastCommitted, err := r.Rollup.LastCommittedBatchIndex(nil) | ||
| if err != nil { | ||
| return fmt.Errorf("get last committed batch index error: %w", err) | ||
| } | ||
|
|
||
| if batchIndex <= lastCommitted.Uint64() { | ||
| // This batch is already committed by another submitter | ||
| log.Info("Batch already committed by another submitter, trying to cancel transaction", | ||
| "batch_index", batchIndex, | ||
| "last_committed", lastCommitted.Uint64(), | ||
| "tx_hash", tx.Hash().String()) | ||
|
|
||
| // Try to cancel the transaction since it will fail | ||
| cancelTx, err := r.CancelTx(tx) | ||
| if err != nil { | ||
| log.Error("Failed to cancel commit batch transaction", |
There was a problem hiding this comment.
🟣 Pre-existing bug surfaced by this PR. CancelTx (rollup.go:2088-2116) builds a V0 blob sidecar (kzg4844.ComputeBlobProof + sidecar.Version = 0 by default), unlike createBlobTx (rollup.go:1255-1272) and ReSubmitTx (rollup.go:1840-1846) which version-gate via blob.DetermineBlobVersion / blob.BlobSidecarVersionToV1. Osaka is already active on mainnet (2025-12-03), hoodi and devnet (2025-10-28), so a V0 sidecar is rejected at the L1 EL mempool admission step. The new cancel-first ordering in handlePendingTx (this PR) combined with V2 multi-blob activation makes this cancel path much hotter; fix is to mirror createBlobTx in the BlobTxType branch — capture head (already returned by the GetGasTipAndCap call discarded with _ at the top of CancelTx) and switch on blob.DetermineBlobVersion(head, r.chainId.Uint64()), or call blob.BlobSidecarVersionToV1(sidecar) after constructing the V0 sidecar.
Extended reasoning...
What the bug is
CancelTx in tx-submitter/services/rollup.go (BlobTxType branch, around lines 2088-2116) hardcodes a single-blob V0 sidecar:
sidecar := ðtypes.BlobTxSidecar{
Blobs: []kzg4844.Blob{*emptyBlob},
Commitments: []kzg4844.Commitment{emptyBlobCommit},
Proofs: []kzg4844.Proof{emptyBlobProof}, // kzg4844.ComputeBlobProof — V0 single-blob KZG proof
}
// sidecar.Version is left at the zero default = BlobSidecarVersion0The head returned by GetGasTipAndCap at the top of CancelTx is discarded with _, so the function has no way to consult the chain config to decide whether V0 or V1 sidecar format is required.
Compare the two other blob-tx construction sites, both of which correctly Osaka-gate:
createBlobTx(rollup.go:1255-1272) switches onblob.DetermineBlobVersion(head, chainID)and producesVersion=1+MakeCellProofafter Osaka.ReSubmitTx(rollup.go:1840-1846) callsblob.BlobSidecarVersionToV1(sidecar)when the chain has crossed Osaka.
CancelTx has neither path.
Why this matters now
All three configured chain configs in common/blob/fee.go have OsakaTime already in the past relative to the current date (2026-05-19):
MainnetChainConfig.OsakaTime= 1764798551 = 2025-12-03HoodiChainConfig.OsakaTime= 1761677592 = 2025-10-28DevnetChainConfig.OsakaTime= 1761677592 = 2025-10-28
After Osaka, the L1 EL mempool requires BlobSidecarVersion1 with cell proofs (EIP-7594 / PeerDAS). A V0-formatted blob tx is rejected at admission. The version-gating logic in createBlobTx and the explicit BlobSidecarVersionToV1 call in ReSubmitTx would otherwise be dead code — they are not, which confirms the morph-l2/go-ethereum mempool enforces the chain-active sidecar version on admission, mirroring upstream Pectra/Osaka behavior.
PR-interaction (why this isn't purely pre-existing)
The CancelTx body itself is byte-identical pre-PR (the diff does not touch lines 2088-2116). However, this PR materially increases the bug's reachability in two ways:
- Cancel-first re-ordering in
handlePendingTx(around rollup.go:580-621). Pre-PR, an obsolete commitBatch BlobTx that had also timed out would first traverse the timeout branch →replaceTimedOutTx→ReSubmitTx, which DOES callBlobSidecarVersionToV1and produces an Osaka-conforming tx (even though it would revert on-chain). Post-PR, the cancel branch is checked first, so an obsolete BlobTx goes straight toCancelTxand produces a V0 sidecar. - V2 multi-blob activation via
BATCH_V2_UPGRADE_TIMEmaterially increases the number of BlobTxs in flight, so cancel-on-obsolete-BlobTx becomes a routine occurrence rather than a rare corner case.
The widening to IsCommitLikeMethod (also covers commitState) is harmless for this particular bug since commitState is built via createDynamicFeeTx and hits the safe DynamicFeeTxType branch of CancelTx, not the buggy BlobTxType branch.
Step-by-step proof of reachability
- Local tx-submitter sends a V2 multi-blob
commitBatchBlobTx for batch X (created correctly bycreateBlobTxwith V1 cell-proof sidecar). - Another active submitter (priority rollup mode, or rotation handoff) wins the race:
r.Rollup.LastCommittedBatchIndex(nil) >= X. handlePendingTx(rollup.go:584-621) detectsbatchIndex <= lastCommittedfor the newIsCommitLikeMethodbranch and callsr.CancelTx(tx).CancelTxbuildsBlobTxSidecar{Version: 0, Proofs: [ComputeBlobProof(emptyBlob)]}and the cancel tx is signed and passed toSendTx.L1Client.SendTransactionreaches a post-Osaka L1 EL: the mempool rejects with an admission error (e.g. "invalid sidecar version" / cell-proof shape mismatch).SendTxremoves the would-be cancel tx frompendingTxs(rollup.go:1715-1718) and returns the error.handlePendingTxlogs "cancel commit batch transaction failed" at line 609 and returns the error.- Crucially, the original obsolete BlobTx is NOT removed from
pendingTxs— the cancel never landed. The nextProcessTxcycle re-entershandlePendingTx, re-hits the samelastCommittedcheck, and loops on the same failing cancel until either:- the original
commitBatchBlobTx is mined and reverts with "batch already committed by another submitter" (full gas wasted, including blob fees), or - the L1 blob pool TTL-evicts the original BlobTx.
- the original
In either case the submitter cannot make progress on that nonce until L1 unblocks it.
How to fix
Mirror createBlobTx: in the BlobTxType branch of CancelTx, capture head from the GetGasTipAndCap call (currently _, _, _, head, err := r.GetGasTipAndCap() — already returned but discarded), then switch on blob.DetermineBlobVersion(head, r.chainId.Uint64()):
switch blob.DetermineBlobVersion(head, r.chainId.Uint64()) {
case ethtypes.BlobSidecarVersion0:
sidecar.Version = ethtypes.BlobSidecarVersion0
// existing ComputeBlobProof path
case ethtypes.BlobSidecarVersion1:
sidecar.Version = ethtypes.BlobSidecarVersion1
proofs, err := blob.MakeCellProof(sidecar.Blobs)
if err != nil { return nil, err }
sidecar.Proofs = proofs
}Or, equivalently, construct the V0 sidecar as today and unconditionally call blob.BlobSidecarVersionToV1(sidecar) after — same result as the ReSubmitTx upgrade path.
Severity rationale
Filed as pre_existing: the CancelTx body itself is not modified by this PR, and the buggy code is reachable in main today. However, the PR's cancel-first reorder and V2 multi-blob enablement substantially increase the practical hit rate of this latent bug. Bundling the one-line fix here while the area is being touched is the least-disruptive option.
# Conflicts: # MakefileEks.mk # contracts/src/deploy-config/holesky.ts # contracts/src/deploy-config/hoodi.ts # contracts/src/deploy-config/l1.ts # contracts/src/deploy-config/qanetl1.ts # contracts/src/deploy-config/sepolia.ts # contracts/src/deploy-config/testnetl1.ts # ops/docker/docker-compose-4nodes.yml # prover/bin/client/elf/verifier-client
- common/blob/fee.go: BlobHashes iterates over commitments (the slice it indexes); MakeBlobProof rejects blob/commitment length mismatch. - common/blob/payload.go: report blob[i*32] (source byte) instead of zeroed data[i*32] in RetrieveBlobBytes validation error. - prover blob.rs: validate blob_data length before indexing in decode_blob_scalars. - prover lib.rs: version-aware PI hash dispatch — V0/V1 and V2 each require >=1 versioned hash; unsupported versions rejected explicitly. - batch_cache_genesis_header_test: replace blocking <-interrupt loop with a single synchronous AssembleCurrentBatchHeader call so the test terminates. - commit_test: t.Skip when pk is empty to avoid pk[2:] panic.
Rebuilt prover/bin/client/elf/verifier-client via `cargo prove build --docker --tag v6.2.0` against the current multi_batch tree. New programVkey: 0x00a1e5d01b309d25dc5014890b88ac4ba57c6065e05bfe352a2a8c120d60e003 Updated programVkey in deploy-configs: holesky, hoodi, l1, qanetl1, sepolia, testnetl1.
There was a problem hiding this comment.
Code review is billed via overage credits. To resume reviews, an organization admin can raise the monthly limit at claude.ai/admin-settings/claude-code.
Once credits are available, reopen this pull request to trigger a review.
Summary by CodeRabbit
New Features
Configuration
Improvements
Tests