(improvement) (python code path only): cache namedtuple class in named_tuple_factory to avoid … #740

Draft
mykaul wants to merge 1 commit into scylladb:master from mykaul:perf/cache-named-tuple-factory

@mykaul mykaul commented Mar 13, 2026

Cache the Row namedtuple class keyed on tuple(colnames) so Python's namedtuple() (which internally calls exec()) is only invoked once per unique column schema. For prepared statements the column names never change, eliminating redundant class creation on every result set.

Motivation

named_tuple_factory is the default row_factory in the driver. Every call to namedtuple('Row', columns) internally calls exec() to generate a new class, which is surprisingly expensive. For prepared statements that execute the same query repeatedly, the column names never change, yet the driver pays the namedtuple() + exec() cost on every result set.
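The cost described above can be observed outside the driver. The following is a minimal sketch using only the standard library; the column names, row contents, and iteration count are illustrative assumptions, not the benchmark used for this PR.

```python
import timeit
from collections import namedtuple

# Illustrative inputs (assumptions): 10 columns and a single row.
columns = [f"col{i}" for i in range(10)]
row = tuple(range(10))


def create_class_each_time():
    """Build a fresh Row class on every call, as an uncached factory would."""
    row_class = namedtuple("Row", columns)
    return row_class(*row)


# Build the class once up front and reuse it afterwards.
SHARED_ROW_CLASS = namedtuple("Row", columns)


def reuse_class():
    """Only instantiate the already-built class."""
    return SHARED_ROW_CLASS(*row)


n = 2000
uncached = timeit.timeit(create_class_each_time, number=n)
cached = timeit.timeit(reuse_class, number=n)
print(f"class created per call: {uncached / n * 1e6:.2f} us/call")
print(f"class reused:           {cached / n * 1e6:.2f} us/call")
```

Absolute numbers depend on the machine and the Python version, so only the ratio between the two lines is meaningful.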

Benchmark results

Benchmarks compare the original code (Before) against the new cached implementation (After). All timings in us (microseconds).

10 columns, 1 row (isolates class creation overhead):

| Variant | Min (us) | Mean (us) | Median (us) | Ops/sec | Speedup |
|---|---|---|---|---|---|
| Before (original) | 43.49 | 59.98 | 47.65 | 16,700 | - |
| After (with cache) | 0.24 | 0.45 | 0.35 | 2,210,000 | ~133x |

5 columns, 100 rows:

| Variant | Min (us) | Mean (us) | Median (us) | Ops/sec | Speedup |
|---|---|---|---|---|---|
| Before (original) | 57.4 | 91.2 | 65.8 | 10,969 | - |
| After (with cache) | 19.3 | 25.3 | 24.0 | 39,594 | ~3.6x |

10 columns, 100 rows:

| Variant | Min (us) | Mean (us) | Median (us) | Ops/sec | Speedup |
|---|---|---|---|---|---|
| Before (original) | 56.7 | 101.9 | 75.6 | 9,813 | - |
| After (with cache) | 18.1 | 21.4 | 20.4 | 46,825 | ~4.8x |

Design notes

  • The cache is a plain dict keyed on tuple(colnames), i.e. the raw column names before cleaning.
  • The error-handling paths (SyntaxError, Exception) are preserved unchanged.
  • The cache is naturally bounded by the number of distinct column-name tuples seen, which in practice tracks the number of distinct queries.

Tests

All existing unit tests pass (46 passed).

Pre-review checklist

  • I have split my patch into logically separate commits.
  • All commit messages clearly explain what they change and why.
  • I added relevant tests for new features and bug fixes.
  • All commits compile, pass static checks and pass tests.
  • PR description sums up the changes and reasons why they should be introduced.
  • I have provided docstrings for the public items that I want to introduce.
  • I have adjusted the documentation in ./docs/source/.
  • I added appropriate Fixes: annotations to PR description.

@mykaul mykaul changed the title (improvement) cache namedtuple class in named_tuple_factory to avoid … (improvement) (python code path only): cache namedtuple class in named_tuple_factory to avoid … Mar 13, 2026
@mykaul mykaul marked this pull request as draft March 13, 2026 10:13
@mykaul mykaul force-pushed the perf/cache-named-tuple-factory branch from 9ea1ed1 to 8398d42 Compare April 2, 2026 17:08
…repeated exec() calls

Cache the Row namedtuple class keyed on tuple(colnames) so Python's
namedtuple() (which internally calls exec()) is only invoked once per
unique column schema. For prepared statements the column names never
change, eliminating redundant class creation on every result set.

Cache is a plain dict keyed on tuple(colnames) (raw column names before
cleaning). Error handling paths (SyntaxError, Exception) preserved
unchanged. Cache is naturally bounded by the number of distinct queries.
@mykaul mykaul force-pushed the perf/cache-named-tuple-factory branch from 8398d42 to 9a76016 Compare April 3, 2026 18:15