fix(bb-prover): pool long-lived bb verifier processes instead of spawning per-call #23093
Merged
spalladino merged 6 commits on May 11, 2026
The socket file appears after bb's bind() but before its listen(). A connect() landing in that window returns ECONNREFUSED. The previous code rejected the first attempt; under contention (e.g. tx_stats_bench spawning ~600 bb processes on an 8-CPU isolate) the race window stretches and ECONNREFUSED becomes a reliable flake.

Retry connect() with exponential backoff (capped at 50ms) while ECONNREFUSED keeps coming, up to the existing 5s budget. Other socket errors fail fast as before.

With this in place the bind→listen race is no longer observable regardless of how many bb spawns happen in parallel.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
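The retry policy described in this commit message can be sketched roughly as follows. This is not the code from native_socket.ts: `connectWithRetry`, `RetryOptions`, and the 1 ms starting delay are hypothetical names and values chosen for illustration, and the connect operation is injected as a callback so the sketch does not depend on a real socket. Only the 50 ms cap and 5 s budget come from the commit message.

```typescript
// Hypothetical sketch of "retry on ECONNREFUSED with capped exponential backoff".
interface RetryOptions {
  initialDelayMs: number; // assumed starting delay (not stated in the PR)
  maxDelayMs: number; // backoff cap; the commit message says 50 ms
  budgetMs: number; // overall budget; the commit message says 5 s
}

// Node-style socket errors carry a string `code` property.
const isConnRefused = (err: unknown): boolean =>
  typeof err === 'object' && err !== null && (err as { code?: string }).code === 'ECONNREFUSED';

async function connectWithRetry<T>(
  connect: () => Promise<T>,
  opts: RetryOptions = { initialDelayMs: 1, maxDelayMs: 50, budgetMs: 5000 },
): Promise<T> {
  const deadline = Date.now() + opts.budgetMs;
  let delay = opts.initialDelayMs;
  for (;;) {
    try {
      return await connect();
    } catch (err) {
      // Retry only the bind→listen race; any other error, or an exhausted
      // budget, is rethrown immediately.
      if (!isConnRefused(err) || Date.now() + delay > deadline) {
        throw err;
      }
      await new Promise<void>(resolve => setTimeout(resolve, delay));
      delay = Math.min(delay * 2, opts.maxDelayMs);
    }
  }
}
```

With these assumed defaults the waits grow 1, 2, 4, ... ms until they reach the 50 ms cap, so a listener that comes up a few milliseconds late costs only a handful of attempts.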
…-or-fresh API

Old API exposed two callback-style methods (withFreshInstance / withVerifierInstance) plus startVerifierPool / stopVerifierPool, with the prover-vs-verifier distinction hardcoded into the surface. The class also handled both proving and verifying despite the "Prover" name, which made it harder to reason about.

New API: a single getInstance() returning BBJsApi & AsyncDisposable.
- No poolSize set: getInstance() spawns a fresh bb that is destroyed on dispose (same behaviour as the old withFreshInstance).
- poolSize: N: pool of N long-lived bb processes; getInstance() borrows from it and dispose returns it. Pool is lazily spawned on first acquire.

Callers use `await using inst = await factory.getInstance()` for RAII-style release, matching the codebase's preference for AsyncDisposable.

BBCircuitVerifier passes poolSize: numConcurrentIVCVerifiers (or undefined when 0), with the same intent as the previous startVerifierPool path. Every other caller (BBNativeRollupProver, AvmProvingTester, prove_native.ts) leaves poolSize unset, preserving the prior fresh-per-call behaviour.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
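To make the pooled-or-fresh behaviour concrete, here is a minimal sketch under stated assumptions. It is not the BBJsFactory implementation: `InstanceFactory`, `Instance`, and `Lease` are hypothetical names, release is an explicit `release()` call rather than `Symbol.asyncDispose`, and each pool slot is spawned on demand, which simplifies the "spawn the pool on first acquire" behaviour described above.

```typescript
// Hypothetical stand-in for a long-lived bb process.
interface Instance {
  id: number;
  destroy(): Promise<void>;
}

// What a caller holds while using an instance.
interface Lease {
  instance: Instance;
  release(): Promise<void>;
}

class InstanceFactory {
  private readonly create: () => Promise<Instance>;
  private readonly poolSize: number | undefined;
  private readonly idle: Instance[] = [];
  private readonly waiters: Array<(inst: Instance) => void> = [];
  private spawned = 0;

  constructor(create: () => Promise<Instance>, poolSize?: number) {
    this.create = create;
    this.poolSize = poolSize;
  }

  async getInstance(): Promise<Lease> {
    const limit = this.poolSize;
    if (limit === undefined) {
      // No pool: spawn a fresh instance and destroy it on release.
      const fresh = await this.create();
      return { instance: fresh, release: () => fresh.destroy() };
    }
    const instance = await this.borrow(limit);
    return { instance, release: async () => { this.giveBack(instance); } };
  }

  private async borrow(limit: number): Promise<Instance> {
    const idle = this.idle.pop();
    if (idle) return idle;
    if (this.spawned < limit) {
      this.spawned++; // reserve the slot before awaiting the spawn
      try {
        return await this.create();
      } catch (err) {
        this.spawned--; // free the slot if the spawn failed
        throw err;
      }
    }
    // Pool exhausted: wait until another caller releases an instance.
    return new Promise<Instance>(resolve => this.waiters.push(resolve));
  }

  private giveBack(instance: Instance): void {
    const waiter = this.waiters.shift();
    if (waiter) waiter(instance);
    else this.idle.push(instance);
  }
}
```

A caller would pair `getInstance()` with `try { ... } finally { await lease.release(); }`. The `await using` form quoted in the commit message expresses the same pairing declaratively on toolchains that support explicit resource management.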
…mFields
splitChonkProofToStructured had three hand-maintained constants (MERGE_PROOF_SIZE,
ECCVM_PROOF_LENGTH, JOINT_PROOF_LENGTH) duplicating C++ values. When C++ shifted
the Chonk proof layout (e.g. databus relation changes shrinking the oink portion),
these constants went stale and chonkVerify failed deep in the verifier with an
opaque "OinkVerifier: num_public_inputs mismatch with VK" error.
Add a new ChonkVerifyFromFields bbapi command that takes a flat Vec<bb::fr> and
calls ChonkProof::from_field_elements server-side, then runs the verifier.
The TS layer now passes the flat fields straight through — no layout knowledge,
no hand-maintained constants.
- bbapi_chonk.{hpp,cpp}: new ChonkVerifyFromFields struct + execute().
- bbapi_execute.hpp: register the command in the Command and CommandResponse unions.
- bb_js_backend.ts: BBJsApi.verifyChonkProof now calls api.chonkVerifyFromFields;
splitChonkProofToStructured and the 3 constants are deleted.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous BBJsFactory swallowed bb destroy() errors with .catch(() => {}) in
three disposal paths (makeOwned, makeBorrowed, destroy). A bb child that failed
to shut down cleanly would silently linger and Jest would later complain about a
worker that "failed to exit gracefully", which is exactly the symptom we hit on
one test run after the rename.
Also fix a partial-failure leak in initPool: the previous Promise.all rejected
on first createInstance() failure but other already-spawned bb children were
dropped on the floor. Switch to allSettled so we destroy them.
Disposal failures now surface (single bb destroy errors propagate; pool destroy
aggregates via AggregateError) instead of being silently swallowed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
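The partial-failure fix described in this commit message follows a common pattern, sketched below under stated assumptions. `spawnAll` and `Child` are hypothetical names and this is not the initPool code from the PR; it only illustrates why `Promise.allSettled` lets the caller see every spawned child before deciding to tear down, where `Promise.all` would reject early and lose track of the siblings that did start.

```typescript
// Hypothetical stand-in for anything that owns an OS process.
interface Child {
  destroy(): Promise<void>;
}

async function spawnAll<T extends Child>(count: number, create: () => Promise<T>): Promise<T[]> {
  // allSettled waits for every spawn, so successful siblings stay reachable
  // even when one of them rejects.
  const results = await Promise.allSettled(Array.from({ length: count }, () => create()));

  const spawned: T[] = [];
  const failures: unknown[] = [];
  for (const result of results) {
    if (result.status === 'fulfilled') spawned.push(result.value);
    else failures.push(result.reason);
  }
  if (failures.length === 0) return spawned;

  // At least one spawn failed: destroy the ones that succeeded and report
  // spawn and teardown errors together instead of swallowing any of them.
  const teardown = await Promise.allSettled(spawned.map(child => child.destroy()));
  for (const result of teardown) {
    if (result.status === 'rejected') failures.push(result.reason);
  }
  throw new AggregateError(failures, `failed to start ${count} instances`);
}
```

Collecting teardown errors into the same `AggregateError` is a design choice of this sketch; the PR only states that pool destroy errors are aggregated rather than dropped.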
The new ChonkVerifyFromFields bbapi variant added to bb.js bindings tips the playground main entrypoint just over the 1750 KB hard limit (1750.02 KB). Bumping to 1800 to give 50 KB of headroom for further bbapi additions. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Why

Follow-up to #21564 (bb-prover bb.js migration) addressing the IVC verification perf regression that surfaced in `tx_stats_bench`.

The migration kept the legacy spawn-per-verification model: every chonk/ultra-honk verification through `BBCircuitVerifier` spawned a fresh `bb` process and SIGTERMed it after one proof. `BB_NUM_IVC_VERIFIERS=8` only capped concurrency at the queue layer (`QueuedIVCVerifier`), not the number of bb processes.

That made the bench spawn ~600 bb processes over its 60s 10 TPS phase inside an 8-CPU isolate. Two compounding problems:

- `bb` startup tax on every verification's hot path.
- `NativeUnixSocket`: bb's socket file appears after `bind()` but before `listen()`. A TS `connect()` landing in that window gets `ECONNREFUSED`. Vanishingly rare under low load; reliable flake under contention. Diagnosis at http://ci.aztec-labs.com/735256f13a268733.

What
Make `BB_NUM_IVC_VERIFIERS` mean what its name says (commits aa99817, 0f4cb77)

Pool of long-lived bb verifier processes instead of fresh-per-call. The factory class is renamed `BBJsProverFactory` → `BBJsFactory` (it's used for both proving and verifying) and given a single `getInstance(): Promise<BBJsApi & AsyncDisposable>` method:

- `new BBJsFactory(path)` → no pool. Every `getInstance()` spawns a fresh bb that is destroyed on dispose. Same as the previous `withFreshInstance` behaviour; used by `BBNativeRollupProver`, the AVM proving tester, and ivc-integration helpers, so their semantics are unchanged.
- `new BBJsFactory(path, { poolSize: N })` → pool of N long-lived bb processes, lazily spawned on first acquire. Used by `BBCircuitVerifier` with `poolSize: numConcurrentIVCVerifiers`.

Callers use `await using inst = await factory.getInstance()` for RAII-style release, matching the codebase's preference for `AsyncDisposable`. `BBCircuitVerifier.stop` (already wired through to aztec-node shutdown) tears the pool down.

Close the bind→listen race in bb.js (commit 8e519b0)
`barretenberg/ts/src/bb_backends/node/native_socket.ts`: retry `connect()` on `ECONNREFUSED` with exponential backoff (capped at 50 ms) up to the existing 5 s budget. Other socket errors fail fast as before. Pool startup still spawns N bb processes in parallel, so the race surface is reduced from ~600 to N; the retry handles the residual.

Server-side Chonk proof split (commit 97577cf)
`splitChonkProofToStructured` in TS had three hand-maintained constants (`MERGE_PROOF_SIZE`, `ECCVM_PROOF_LENGTH`, `JOINT_PROOF_LENGTH`) duplicating C++ values. When C++ shifted Chonk layout (e.g. databus relation changes shrinking the oink portion in the previous round of regressions), these went stale and verification failed deep in the verifier with an opaque "OinkVerifier: num_public_inputs mismatch with VK".

Add a new `ChonkVerifyFromFields` bbapi command that takes a flat `Vec<bb::fr>` and calls `ChonkProof::from_field_elements` server-side, then runs the verifier. The TS layer now passes flat fields straight through: no layout knowledge, no hand-maintained constants.

- `bbapi_chonk.{hpp,cpp}`: new struct + `execute()`.
- `bbapi_execute.hpp`: register the variant.
- `bb_js_backend.ts`: `verifyChonkProof` calls the new API; `splitChonkProofToStructured` and the 3 constants are deleted.

Disposal robustness (commit 5cde220)
The first cut of `BBJsFactory` had three `.catch(() => {})` clauses that silently swallowed bb `destroy()` errors, and an `initPool()` that dropped already-spawned bb children if a sibling creation failed (`Promise.all` short-circuit). Both would manifest as the Jest "worker failed to exit gracefully" warning we hit on one test run.

Now: destroy errors propagate (`AggregateError` for the pool path); `initPool` uses `allSettled` and tears down anything it spawned if any sibling rejects.

Playground bundle size (commit 1681d33)
The new `ChonkVerifyFromFields` bbapi variant tipped the playground main entrypoint over the 1750 KB hard limit. Bumped to 1800 with a bump-log entry.

Effect

- `tx_stats_bench`: 600 bb spawns → 8 bb spawns at boot, then 8 long-lived processes serve every verification. The bind→listen race surface drops 75×, and the residual is handled by the connect retry. Per-call ~50–100 ms `bb` startup cost disappears from the verifier hot path.
- Other callers (`BBNativeRollupProver`, AVM tests, ivc-integration) are unchanged; they still spawn fresh per call.

ClaudeBox log: https://claudebox.work/s/2d65052b0deaeab2?run=3