perf: use pre-allocated constant for null sentinel in collection serialization #763

Draft
mykaul wants to merge 1 commit into scylladb:master from mykaul:perf/collection-null-sentinel

mykaul commented Mar 25, 2026

Summary

  • Replace per-call int32_pack(-1) with a module-level _INT32_NULL constant at 5 call sites across 4 collection serialize_safe methods
  • Avoids a struct.pack() call for every null element during collection serialization

Details

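The CQL protocol represents a null collection element as a 4-byte big-endian int32 with value -1, written in place of the element's length and followed by no payload. Previously, every null element triggered a fresh int32_pack(-1) call, which is equivalent to struct.pack('>i', -1). This PR computes the result once at module load time and reuses the resulting bytes object.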

Affected sites

| Type | Method | Null path |
| --- | --- | --- |
| ListType / SetType | _SimpleParameterizedType.serialize_safe | null elements |
| MapType | MapType.serialize_safe | null keys, null values (2 sites) |
| TupleType | TupleType.serialize_safe | null fields |
| UserType | UserType.serialize_safe | null fields |

Benchmark results

Isolated write operation (the buf.write(...) call itself):

| Operation | Time | Speedup |
| --- | --- | --- |
| buf.write(int32_pack(-1)) (before) | 0.069 µs | |
| buf.write(_INT32_NULL) (after) | 0.039 µs | 1.78× |
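
The isolated measurement could be reproduced along these lines. This is a sketch under assumptions: the PR does not show its benchmark script, so the `bench` helper, the iteration count, and the local `int32_pack` stand-in are illustrative, and absolute timings will vary by machine and Python version.

```python
import io
import struct
import timeit

# Assumption: stand-in for the driver's int32_pack helper.
int32_pack = struct.Struct('>i').pack
_INT32_NULL = int32_pack(-1)


def bench(stmt, number=100_000):
    """Return the mean time per call in microseconds for one statement."""
    namespace = {
        'buf': io.BytesIO(),  # fresh buffer for each measurement
        'int32_pack': int32_pack,
        '_INT32_NULL': _INT32_NULL,
    }
    total_seconds = timeit.timeit(stmt, globals=namespace, number=number)
    return total_seconds / number * 1e6


before_us = bench('buf.write(int32_pack(-1))')
after_us = bench('buf.write(_INT32_NULL)')
print(f'before: {before_us:.3f} us  after: {after_us:.3f} us')
```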

End-to-end ListType.serialize (list with 50% null elements):

| List size | Time/call |
| --- | --- |
| 10 elements | 1.5 µs |
| 100 elements | 11.7 µs |

TupleType.serialize (5-element tuple, all nulls):

| Case | Time/call |
| --- | --- |
| 5 null fields | 0.8 µs |

Testing

  • Added CollectionNullSentinelTests with 5 round-trip tests covering List, Set, Map, Tuple, and UserType with None elements
  • All 652 existing unit tests continue to pass (16 pre-existing skips, 860 pre-existing warnings)
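
The shape of such a round-trip check can be sketched without the driver. The code below is not the PR's CollectionNullSentinelTests; `encode_list`, `decode_list`, and `NullRoundTripSketch` are hypothetical stand-ins that use the same length-prefix convention, with a negative length decoded back to None.

```python
import io
import struct
import unittest

_INT32 = struct.Struct('>i')
int32_pack = _INT32.pack  # assumption: stand-in for the driver's helper
_INT32_NULL = int32_pack(-1)


def int32_unpack(data):
    """Decode a 4-byte big-endian signed integer."""
    return _INT32.unpack(data)[0]


def encode_list(items):
    """Encode a list of bytes-or-None as count + length-prefixed elements."""
    buf = io.BytesIO()
    buf.write(int32_pack(len(items)))
    for item in items:
        if item is None:
            buf.write(_INT32_NULL)
        else:
            buf.write(int32_pack(len(item)))
            buf.write(item)
    return buf.getvalue()


def decode_list(data):
    """Inverse of encode_list; a negative length is read back as None."""
    buf = io.BytesIO(data)
    count = int32_unpack(buf.read(4))
    items = []
    for _ in range(count):
        size = int32_unpack(buf.read(4))
        items.append(None if size < 0 else buf.read(size))
    return items


class NullRoundTripSketch(unittest.TestCase):
    def test_list_with_none(self):
        original = [b'a', None, b'', None]
        self.assertEqual(decode_list(encode_list(original)), original)
```

Including an empty bytes value alongside None checks that a zero-length element and a null element stay distinguishable after the round trip.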

…alization

Replace per-call int32_pack(-1) with a module-level _INT32_NULL constant
in serialize methods of ListType, SetType, MapType, TupleType, and
UserType. This avoids a struct.pack call on every null element during
collection serialization.

Affected sites:
  - _SimpleParameterizedType.serialize_safe (ListType/SetType)
  - MapType.serialize_safe (null key + null value)
  - TupleType.serialize_safe
  - UserType.serialize_safe
mykaul marked this pull request as draft March 25, 2026 20:31
mykaul added a commit to mykaul/python-driver that referenced this pull request Apr 3, 2026
Replace io.BytesIO() buffer pattern with list accumulation + b''.join()
in serialize_safe methods for ListType, SetType, MapType, TupleType,
and UserType. Also pre-compute _INT32_NULL = int32_pack(-1) as a
module-level constant to avoid repeated packing of the null sentinel.

Buffer assembly micro-benchmarks (isolating the BytesIO overhead from
per-element to_binary() cost):

  Scenario                  Before (us)   After (us)  Speedup
  List 100 elements              9.0          8.6      1.05x
  List 10 elements               1.1          0.9      1.18x
  List 10 all-null               0.8          0.4      2.23x
  Map 10 entries                 2.0          1.6      1.24x

The all-null case benefits most from the pre-computed _INT32_NULL
constant, which eliminates repeated int32_pack(-1) calls.

Note: PR scylladb#763 on this repo adds only the _INT32_NULL constant;
this commit is a superset that also replaces BytesIO with b''.join()
across all four collection/composite type serializers.
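
The two buffer-assembly patterns named in this commit message can be compared in a small sketch. The function names and the simplified element handling are assumptions for illustration; the driver's serialize_safe methods also convert each element with its subtype, which is omitted here.

```python
import io
import struct

# Assumption: stand-in for the driver's int32_pack helper.
int32_pack = struct.Struct('>i').pack
_INT32_NULL = int32_pack(-1)


def serialize_with_bytesio(items):
    """Older pattern: write into a BytesIO and pack -1 for every null."""
    buf = io.BytesIO()
    buf.write(int32_pack(len(items)))
    for item in items:
        if item is None:
            buf.write(int32_pack(-1))
        else:
            buf.write(int32_pack(len(item)))
            buf.write(item)
    return buf.getvalue()


def serialize_with_join(items):
    """Newer pattern: collect the parts in a list and join them once."""
    parts = [int32_pack(len(items))]
    append = parts.append  # local alias avoids repeated attribute lookups
    for item in items:
        if item is None:
            append(_INT32_NULL)
        else:
            append(int32_pack(len(item)))
            append(item)
    return b''.join(parts)


sample = [b'x', None, b'yz']
assert serialize_with_bytesio(sample) == serialize_with_join(sample)
```

Both functions produce identical bytes, so the change affects only how the output is assembled, not the wire format.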