perf: buffer accumulation in BatchMessage.send_body()#791

Draft
mykaul wants to merge 2 commits into scylladb:master from mykaul:perf/buffer-accum-batch-message

mykaul commented Apr 4, 2026

Summary

Replace per-call write_value()/write_byte()/write_short() in BatchMessage.send_body() with buffer accumulation (list.append + b"".join + single f.write()), reducing f.write() calls from Q*(4 + 2*P) + footer to 1 for Q queries with P params each.

Depends on PR #790 (perf/buffer-accum-write-params).

Motivation

Batch INSERTs are a common pattern for vector workloads, where many embeddings are inserted in a single batch. Each query in the batch serializes its parameters through write_value() (2 writes per param) plus per-query framing (write_byte, write_short, write_longstring). For a batch of 50 queries with 2 params each, the original code makes roughly 400 individual f.write() calls (50 × (4 + 2×2) = 400, plus the footer). Buffer accumulation collapses this to 1.
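To make the write-count arithmetic concrete, here is a minimal sketch that counts `write()` calls for a simplified batch body. It is not the driver's code: `CountingWriter`, `encode_parts`, `send_per_call`, and `send_accumulated` are hypothetical names, and the framing is reduced to a 2-chunk header, 4 framing chunks plus 2 chunks per param for each query, and a 2-chunk trailer.

```python
import io
import struct


class CountingWriter(io.BytesIO):
    """BytesIO that records how many times write() is called."""

    def __init__(self):
        super().__init__()
        self.calls = 0

    def write(self, data):
        self.calls += 1
        return super().write(data)


def encode_parts(queries):
    """Yield the individual chunks of a simplified (not wire-exact) batch body."""
    yield struct.pack(">B", 0)                  # batch type
    yield struct.pack(">H", len(queries))       # query count
    for query_string, params in queries:
        encoded = query_string.encode("utf8")
        yield struct.pack(">B", 0)              # kind: unprepared
        yield struct.pack(">i", len(encoded))   # longstring length
        yield encoded                           # longstring body
        yield struct.pack(">H", len(params))    # param count
        for param in params:
            yield struct.pack(">i", len(param)) # value length
            yield param                         # value bytes
    yield struct.pack(">H", 1)                  # consistency (simplified trailer)
    yield struct.pack(">B", 0)                  # flags (simplified trailer)


def send_per_call(f, queries):
    """One f.write() per chunk, as in the per-call style."""
    for chunk in encode_parts(queries):
        f.write(chunk)


def send_accumulated(f, queries):
    """list.append + b"".join + a single f.write()."""
    parts = []
    for chunk in encode_parts(queries):
        parts.append(chunk)
    f.write(b"".join(parts))
```

With 50 queries of 2 params each, `send_per_call` issues 2 + 50 × (4 + 2×2) + 2 = 404 writes in this sketch, while `send_accumulated` issues 1 and produces identical bytes.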

What changed

cassandra/protocol.py

BatchMessage.send_body(): full buffer accumulation for the entire message, covering the batch header, per-query framing (prepared/unprepared), all parameters (with NULL/UNSET/str handling), and the trailer (consistency, flags, serial CL, timestamp, keyspace).
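The parameter handling can be sketched as follows. This is an illustration under stated assumptions, not the patched method: `UNSET` is a hypothetical stand-in for the driver's unset sentinel, `append_value` is a hypothetical helper, and encoding `str` values as UTF-8 is an assumption about what "str handling" means here. The length markers follow the native protocol's `[value]` encoding, where -1 denotes NULL and -2 denotes "not set".

```python
import struct

UNSET = object()  # hypothetical sentinel standing in for the driver's unset marker

_pack_int = struct.Struct(">i").pack  # signed 32-bit big-endian length prefix


def append_value(parts, value):
    """Append the encoded chunks for one bound value to the parts list."""
    if value is None:
        parts.append(_pack_int(-1))        # NULL: length -1, no body
    elif value is UNSET:
        parts.append(_pack_int(-2))        # UNSET: length -2, no body
    else:
        if isinstance(value, str):
            value = value.encode("utf8")   # assumption: str params are UTF-8 encoded
        parts.append(_pack_int(len(value)))
        parts.append(value)
```

A caller would create one `parts` list for the whole message, call `append_value` for each parameter, and finish with a single `f.write(b"".join(parts))`.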

tests/unit/test_protocol.py

  • Updated test_batch_message_with_keyspace to use io.BytesIO for byte-level verification (compatible with single-write buffer accumulation output)
  • Added 7 new batch-specific test methods: prepared queries, unprepared queries, mixed, empty batch, many queries (50), NULL/UNSET params, and vector params (128D × 10 queries)
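Byte-level verification with io.BytesIO can be sketched like this. The serializer `write_short_prefixed` is a hypothetical stand-in, not a function from the driver. The point is that comparing the final bytes stays valid whether the code under test issues many writes or one, whereas assertions on individual write calls would presumably break after the change.

```python
import io
import struct
import unittest


def write_short_prefixed(f, payload):
    """Hypothetical serializer: unsigned short length prefix + payload, one write."""
    f.write(b"".join([struct.pack(">H", len(payload)), payload]))


class ByteLevelTest(unittest.TestCase):
    def test_output_bytes(self):
        buf = io.BytesIO()
        write_short_prefixed(buf, b"ks")
        # Assert on the serialized bytes, not on how many writes produced them.
        self.assertEqual(buf.getvalue(), b"\x00\x02ks")
```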

benchmarks/bench_batch_send_body.py (new)

Standalone benchmark script for reproducibility.

Benchmark results

Environment: Python 3.14, Cython .so compiled, 50K iterations, best of 3 runs, quiet machine (load < 2.0).

Methodology: Build .so from pre-patch code, benchmark, then apply patch, rebuild .so, and benchmark again — same machine, same session.
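The "best of 3 runs" measurement can be approximated with the standard library's timeit module. This is a minimal sketch, not the contents of the benchmark script: `ns_per_call` is a hypothetical helper, and the default iteration count simply mirrors the 50K figure quoted above.

```python
import timeit


def ns_per_call(func, iterations=50_000, repeats=3):
    """Return the best (minimum) per-call time in nanoseconds over several runs."""
    # timeit.repeat returns one total duration in seconds per run.
    timings = timeit.repeat(func, number=iterations, repeat=repeats)
    return min(timings) / iterations * 1e9
```

Taking the minimum rather than the mean reduces the influence of scheduler noise, which is consistent with running on a quiet machine.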

| Scenario | Before (ns/call) | After (ns/call) | Speedup |
|---|---:|---:|---:|
| 10 queries × 2 params (128D vec) | 8364 | 4475 | 1.87x |
| 10 queries × 2 params (768D vec) | 8081 | 5516 | 1.47x |
| 50 queries × 2 params (128D vec) | 32368 | 16271 | 1.99x |
| 10 queries × 10 text params | 19138 | 9051 | 2.11x |
| 50 queries × 10 text params | 86845 | 40020 | 2.17x |
| 10 unprepared × 2 params | 8666 | 4252 | 2.04x |

Implementation notes

Replace the per-parameter write_value(f, param) loop in
_QueryMessage._write_query_params() with a buffer accumulation approach:
list.append + b"".join + single f.write().

This reduces the number of f.write() calls from 2*N+1 to 1, which is
significant for vector workloads with large parameters.

Also removes the redundant ExecuteMessage._write_query_params()
pass-through override to avoid extra MRO lookup per call.

Includes 14 unit tests covering normal, NULL, UNSET, empty, large vector,
and mixed parameter scenarios for both ExecuteMessage and QueryMessage.

Includes a benchmark script (benchmarks/bench_execute_write_params.py).

mykaul marked this pull request as draft April 4, 2026 17:01

Replace the individual write_value()/write_byte()/write_short() calls
in BatchMessage.send_body() with buffer accumulation (list.append +
b"".join + single f.write()), reducing f.write() calls from
Q*(4 + 2*P) + footer to 1 for Q queries with P params each.

Benchmark results (Python 3.14, Cython .so, 50K iters, best of 3,
quiet machine):

  Scenario                              Before    After    Speedup
  10 queries x 2 params (128D vec)      8364 ns   4475 ns  1.87x
  10 queries x 2 params (768D vec)      8081 ns   5516 ns  1.47x
  50 queries x 2 params (128D vec)     32368 ns  16271 ns  1.99x
  10 queries x 10 text params          19138 ns   9051 ns  2.11x
  50 queries x 10 text params          86845 ns  40020 ns  2.17x
  10 unprepared x 2 params              8666 ns   4252 ns  2.04x

Also updates test_batch_message_with_keyspace to use BytesIO for
byte-level verification (compatible with single-write output).

Adds 7 batch-specific unit tests covering prepared, unprepared, mixed,
empty, many-query, NULL/UNSET, and vector parameter scenarios.

Includes benchmark script benchmarks/bench_batch_send_body.py.

mykaul force-pushed the perf/buffer-accum-batch-message branch from 5104cdf to 6e0eb40 April 4, 2026 17:03