Skip to content

(improvement) (cython only) cache deserializer instances in find_deserializer and m…#741

Draft
mykaul wants to merge 2 commits intoscylladb:masterfrom
mykaul:perf/cache-deserializer-lookup
Draft

(improvement) (cython only) cache deserializer instances in find_deserializer and m…#741
mykaul wants to merge 2 commits intoscylladb:masterfrom
mykaul:perf/cache-deserializer-lookup

Conversation

@mykaul
Copy link
Copy Markdown

@mykaul mykaul commented Mar 13, 2026

Cache find_deserializer() and make_deserializers() results in Cython cdef dict caches keyed on cqltype objects to avoid repeated class lookups and Deserializer object creation on every result set.

Using cqltype objects (not id()) as cache keys holds strong references, preventing GC/id-reuse correctness issues with parameterized types.

Motivation

On every result set, make_deserializers(coltypes) is called from row_parser.pyx:37, which in turn calls find_deserializer() for each column type. These functions perform class name lookups and issubclass() chains, then create fresh Deserializer objects -- all redundant work when the same column types appear repeatedly (which is always the case for prepared statements).

Benchmark results

Benchmarks compare the original code (Before) against the new cached implementation (After).

find_deserializer (single type lookup) -- 6.6x faster:

Variant Min Mean Median Ops/sec Speedup
Before (original) 367.0 ns 416.7 ns 403.0 ns 2.4 Mops/s --
After (with cache) 54.9 ns 62.9 ns 60.5 ns 15.9 Mops/s 6.6x

make_deserializers (5 types) -- 29x faster:

Variant Min Mean Median Ops/sec Speedup
Before (original) 2,983 ns 3,766 ns 3,620 ns 266 Kops/s --
After (with cache) 110.7 ns 128.4 ns 118.6 ns 7,790 Kops/s 29x

make_deserializers (10 types) -- 40x faster:

Variant Min Mean Median Ops/sec Speedup
Before (original) 4,946 ns 5,830 ns 5,864 ns 172 Kops/s --
After (with cache) 127.7 ns 147.3 ns 140.9 ns 6,787 Kops/s 40x

Design notes

  • Caches are cdef dict (C-level, not accessible from Python) for minimal overhead
  • Cache keys are the cqltype objects themselves, not id(cqltype) -- holds strong references preventing GC and id() reuse
  • For prepared statements (the hot path), cache hit rate is effectively 100%
  • Caches are bounded to 256 entries; on overflow the entire cache is cleared (simple and sufficient for real workloads where the number of distinct column-type signatures is typically dozens to low hundreds)
  • The returned array from make_deserializers may be a shared cached object; callers must not modify it
  • Runtime Des* class overrides (e.g. DesBytesType = DesBytesTypeByteArray) require a call to clear_deserializer_caches() to flush stale entries
  • A clear_deserializer_caches() API and get_deserializer_cache_sizes() diagnostic helper are provided

Tests

All existing unit tests pass. New tests cover:

  • Cache hit/miss behaviour for both find_deserializer and make_deserializers
  • Bounded eviction (cache does not exceed 256 entries)
  • clear_deserializer_caches() invalidation
  • get_deserializer_cache_sizes() diagnostic helper
  • Benchmark correctness and performance comparison vs uncached baseline

Pre-review checklist

  • I have split my patch into logically separate commits.
  • All commit messages clearly explain what they change and why.
  • I added relevant tests for new features and bug fixes.
  • All commits compile, pass static checks and pass test.
  • PR description sums up the changes and reasons why they should be introduced.
  • I have provided docstrings for the public items that I want to introduce.
  • I have adjusted the documentation in ./docs/source/.
  • I added appropriate Fixes: annotations to PR description.

@mykaul mykaul changed the title (improvement) cache deserializer instances in find_deserializer and m… (improvement) (cython only) cache deserializer instances in find_deserializer and m… Mar 13, 2026
@mykaul mykaul marked this pull request as draft March 13, 2026 10:13
@mykaul mykaul requested a review from Copilot March 13, 2026 13:10
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves performance in the Cython fast-path by caching deserializer lookups and per-column deserializer arrays, avoiding repeated class-resolution work and repeated Deserializer object creation across result sets.

Changes:

  • Add module-level Cython dict caches for find_deserializer() and make_deserializers() keyed by cqltype objects / tuples of cqltype objects.
  • Update find_deserializer() / make_deserializers() to consult and populate these caches.
  • Add a pytest-based benchmark module to compare cached vs uncached behavior.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

File Description
cassandra/deserializers.pyx Adds caching for deserializer instance resolution and for per-column deserializer arrays.
benchmarks/test_deserializer_cache_benchmark.py Introduces correctness checks + pytest-benchmark-style benchmarks for the new caching behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

mykaul added 2 commits April 3, 2026 21:34
…ake_deserializers

Cache find_deserializer() and make_deserializers() results in Cython cdef
dict caches keyed on cqltype objects to avoid repeated class lookups and
Deserializer object creation on every result set.

Using cqltype objects (not id()) as cache keys holds strong references,
preventing GC/id-reuse correctness issues with parameterized types.
- Add 256-entry size cap to both _deserializer_cache and
  _make_deserializers_cache to prevent unbounded growth from
  non-interned parameterized types in unprepared queries.
- Add clear_deserializer_caches() public API so that runtime Des*
  class overrides (e.g. DesBytesType = DesBytesTypeByteArray for
  cqlsh) can flush stale cached instances.
- Add get_deserializer_cache_sizes() diagnostic helper.
- Document override/cache interaction in code comments.
- Fix benchmark copyright (DataStax -> ScyllaDB), add
  pytest.importorskip guards for pytest-benchmark and Cython.
- Add 11 unit tests for cache hit/miss, clear, eviction bounds,
  and size reporting.
- Add clear_deserializer_caches() calls to integration test for
  DesBytesType override.
@mykaul mykaul force-pushed the perf/cache-deserializer-lookup branch from 9dc1ddf to 4a7b199 Compare April 3, 2026 18:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants