perf: buffer accumulation in BatchMessage.send_body() #791
Draft
mykaul wants to merge 2 commits into scylladb:master
Replace the per-parameter write_value(f, param) loop in _QueryMessage._write_query_params() with a buffer accumulation approach: list.append + b"".join + single f.write(). This reduces the number of f.write() calls from 2*N+1 to 1, which is significant for vector workloads with large parameters.

Also removes the redundant ExecuteMessage._write_query_params() pass-through override to avoid an extra MRO lookup per call.

Includes 14 unit tests covering normal, NULL, UNSET, empty, large vector, and mixed parameter scenarios for both ExecuteMessage and QueryMessage. Includes a benchmark script (benchmarks/bench_execute_write_params.py).
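To make the 2*N+1 versus 1 count concrete, here is a minimal sketch rather than the driver's actual code. `CountingWriter`, `UNSET`, and both function names are hypothetical; the length-prefix encoding (4-byte big-endian length, -1 for NULL, -2 for UNSET) follows the native protocol's value notation.

```python
import io
import struct

UNSET = object()  # hypothetical stand-in for the driver's "unset" sentinel


class CountingWriter(io.BytesIO):
    """BytesIO that records how many times write() is called."""

    def __init__(self):
        super().__init__()
        self.calls = 0

    def write(self, data):
        self.calls += 1
        return super().write(data)


def write_params_per_call(f, params):
    """One write for the count, then up to two writes per parameter."""
    f.write(struct.pack(">H", len(params)))
    for p in params:
        if p is None:
            f.write(struct.pack(">i", -1))      # NULL: length only
        elif p is UNSET:
            f.write(struct.pack(">i", -2))      # UNSET: length only
        else:
            f.write(struct.pack(">i", len(p)))  # length prefix
            f.write(p)                          # value bytes


def write_params_buffered(f, params):
    """Collect every piece in a list, join once, write once."""
    parts = [struct.pack(">H", len(params))]
    for p in params:
        if p is None:
            parts.append(struct.pack(">i", -1))
        elif p is UNSET:
            parts.append(struct.pack(">i", -2))
        else:
            parts.append(struct.pack(">i", len(p)))
            parts.append(p)
    f.write(b"".join(parts))


params = [b"\x00" * 512, b"abc", b"xy"]  # N = 3, all non-NULL
before, after = CountingWriter(), CountingWriter()
write_params_per_call(before, params)
write_params_buffered(after, params)
print(before.calls, after.calls)
```

With three non-NULL parameters the per-call path makes 2*3+1 = 7 writes and the buffered path makes 1, while both produce the same bytes. NULL and UNSET parameters take a single write each on the per-call path, so 2*N+1 is the upper bound.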
Replace per-write_value()/write_byte()/write_short() calls in BatchMessage.send_body() with buffer accumulation (list.append + b"".join + single f.write()), reducing f.write() calls from Q*(4 + 2*P) + footer to 1 for Q queries with P params each.

Benchmark results (Python 3.14, Cython .so, 50K iters, best of 3, quiet machine):

| Scenario | Before | After | Speedup |
| --- | --- | --- | --- |
| 10 queries x 2 params (128D vec) | 8364 ns | 4475 ns | 1.87x |
| 10 queries x 2 params (768D vec) | 8081 ns | 5516 ns | 1.47x |
| 50 queries x 2 params (128D vec) | 32368 ns | 16271 ns | 1.99x |
| 10 queries x 10 text params | 19138 ns | 9051 ns | 2.11x |
| 50 queries x 10 text params | 86845 ns | 40020 ns | 2.17x |
| 10 unprepared x 2 params | 8666 ns | 4252 ns | 2.04x |

Also updates test_batch_message_with_keyspace to use BytesIO for byte-level verification (compatible with single-write output). Adds 7 batch-specific unit tests covering prepared, unprepared, mixed, empty, many-query, NULL/UNSET, and vector parameter scenarios. Includes benchmark script benchmarks/bench_batch_send_body.py.
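The "best of 3" measurement style can be sketched with the standard `timeit` module. This is not the contents of benchmarks/bench_batch_send_body.py; `per_call`, `buffered`, and `best_ns` are hypothetical names, the 512-byte payload assumes a 128-dimensional float32 vector, and pure-Python timings against `io.BytesIO` will not match the Cython numbers above.

```python
import io
import struct
import timeit


def per_call(params):
    """Two writes per parameter: length prefix, then value."""
    f = io.BytesIO()
    for p in params:
        f.write(struct.pack(">i", len(p)))
        f.write(p)
    return f.getvalue()


def buffered(params):
    """Accumulate pieces in a list and issue a single write."""
    parts = []
    for p in params:
        parts.append(struct.pack(">i", len(p)))
        parts.append(p)
    f = io.BytesIO()
    f.write(b"".join(parts))
    return f.getvalue()


def best_ns(fn, params, iters=50_000, runs=3):
    """Best per-iteration time in nanoseconds over `runs` repetitions."""
    totals = timeit.repeat(lambda: fn(params), number=iters, repeat=runs)
    return min(totals) / iters * 1e9


params = [b"\x00" * 512] * 20  # assumed: 128 float32 values per vector
before = best_ns(per_call, params, iters=2_000)  # fewer iterations for a quick run
after = best_ns(buffered, params, iters=2_000)
print(f"before {before:.0f} ns, after {after:.0f} ns, speedup {before / after:.2f}x")
```

Taking the minimum over repetitions, rather than the mean, reduces the influence of background load on the reported figure.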
Force-pushed from 5104cdf to 6e0eb40.
Summary
Replace per-call `write_value()`/`write_byte()`/`write_short()` in `BatchMessage.send_body()` with buffer accumulation (`list.append` + `b"".join` + single `f.write()`), reducing `f.write()` calls from `Q*(4 + 2*P) + footer` to 1 for Q queries with P params each.

Depends on PR #790 (`perf/buffer-accum-write-params`).

Motivation
Batch INSERTs are a common pattern for vector workloads, where many embeddings are inserted in a single batch. Each query in the batch serializes its parameters through `write_value()` (2 writes per param) plus per-query framing (`write_byte`, `write_short`, `write_longstring`). For a batch of 50 queries with 2 params each, the original code makes roughly 400 individual `f.write()` calls (50 × (4 + 2×2) = 400, plus the footer). Buffer accumulation collapses this to 1.

What changed
`cassandra/protocol.py`
- `BatchMessage.send_body()`: full buffer accumulation for the entire message: batch header, per-query framing (prepared/unprepared), all parameters (with NULL/UNSET/str handling), and trailer (consistency, flags, serial CL, timestamp, keyspace).

`tests/unit/test_protocol.py`
- Updated `test_batch_message_with_keyspace` to use `io.BytesIO` for byte-level verification (compatible with single-write buffer accumulation output).

`benchmarks/bench_batch_send_body.py` (new)
- Standalone benchmark script for reproducibility.
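The header, per-query framing, parameter, and trailer pieces can be illustrated with a simplified batch body, together with the kind of byte-level check that `io.BytesIO` makes possible. This is a sketch under assumptions, not the patched `send_body()`: `batch_parts`, `CountingWriter`, and the tuple layout of `queries` are hypothetical, and the trailer is reduced to consistency plus a one-byte flags field (no serial CL, timestamp, or keyspace).

```python
import io
import struct


class CountingWriter(io.BytesIO):
    """BytesIO that counts write() calls."""

    def __init__(self):
        super().__init__()
        self.calls = 0

    def write(self, data):
        self.calls += 1
        return super().write(data)


def batch_parts(queries, consistency=1):
    """Yield each piece of a simplified batch body in wire order."""
    yield struct.pack(">B", 0)                        # batch type (0 = LOGGED)
    yield struct.pack(">H", len(queries))             # query count
    for prepared, text_or_id, params in queries:
        yield struct.pack(">B", 1 if prepared else 0)  # kind
        if prepared:
            yield struct.pack(">H", len(text_or_id))  # short-bytes length
        else:
            yield struct.pack(">i", len(text_or_id))  # long-string length
        yield text_or_id                              # statement id or query text
        yield struct.pack(">H", len(params))          # parameter count
        for p in params:
            yield struct.pack(">i", len(p))           # value length
            yield p                                   # value bytes
    yield struct.pack(">H", consistency)              # trailer: consistency
    yield struct.pack(">B", 0)                        # trailer: flags


def send_per_call(f, queries):
    for part in batch_parts(queries):
        f.write(part)                                 # one write per piece


def send_buffered(f, queries):
    f.write(b"".join(batch_parts(queries)))           # one write in total


vec = b"\x00" * 512
queries = [(True, b"\x01\x02", [b"key", vec])] * 5
queries += [(False, b"INSERT INTO t (k, v) VALUES (?, ?)", [b"key", vec])] * 5
old, new = CountingWriter(), CountingWriter()
send_per_call(old, queries)
send_buffered(new, queries)
print(old.calls, new.calls, old.getvalue() == new.getvalue())
```

In this layout, 10 queries with 2 params each cost 2 header writes, 10 × (4 + 2×2) = 80 per-query writes, and 2 trailer writes on the per-call path, and the buffered path emits identical bytes in one write.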
Benchmark results
Environment: Python 3.14, Cython `.so` compiled, 50K iterations, best of 3 runs, quiet machine (load < 2.0).

Methodology: Build `.so` from pre-patch code, benchmark, then apply patch, rebuild `.so`, and benchmark again, on the same machine and in the same session.

Implementation notes
- Same `list.append` + `b"".join` pattern as PR #790, "perf: buffer accumulation in _write_query_params() reduces f.write() calls" (`_write_query_params`)
- Uses `_i32`, `_u16`, `_u8`, `_p` for Cython-friendly tight loops
- Handles `str` params by encoding to UTF-8 (matching `write_value`/`write_longstring` behavior)
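One plausible reading of these notes is sketched below. The definitions behind `_i32`, `_u16`, `_u8`, and `_p` are not shown in this description, so the bindings here are assumptions (pre-built `struct.Struct(...).pack` methods and a bound `list.append`), and `encode_prepared_query` is a hypothetical helper.

```python
import struct

# Assumed meanings of the short names; the PR lists them but not their definitions.
_i32 = struct.Struct(">i").pack   # signed 32-bit big-endian
_u16 = struct.Struct(">H").pack   # unsigned 16-bit big-endian
_u8 = struct.Struct(">B").pack    # unsigned 8-bit


def encode_prepared_query(query_id, params):
    """Encode one prepared batch entry: kind, id, param count, values."""
    parts = []
    _p = parts.append             # bound method, looked up once outside the loop
    _p(_u8(1))                    # kind = 1 (prepared)
    _p(_u16(len(query_id)))
    _p(query_id)
    _p(_u16(len(params)))
    for v in params:
        if v is None:
            _p(_i32(-1))          # NULL carries a length of -1 and no body
            continue
        if isinstance(v, str):
            v = v.encode("utf-8")  # str params are sent as UTF-8 bytes
        _p(_i32(len(v)))
        _p(v)
    return b"".join(parts)


encoded = encode_prepared_query(b"\xab", ["\u00e9", None])
print(encoded.hex())
```

Binding the packers and `append` to local names avoids repeated attribute lookups inside the loop, and the length prefix is computed after UTF-8 encoding so it counts bytes rather than characters.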