Add TLS session resumption via SSLSessionCache #789

Open
sylwiaszunejko wants to merge 3 commits into scylladb:master from sylwiaszunejko:tls-ticket

Conversation

@sylwiaszunejko
Collaborator

@sylwiaszunejko sylwiaszunejko commented Apr 3, 2026

Summary

This PR implements TLS session resumption for the Python driver. After the first
successful TLS handshake with a node, the negotiated session is stored in a
thread-safe cache and reused on subsequent connections, skipping the full
handshake.

Both TLS 1.2 (session IDs) and TLS 1.3 (session tickets / PSK) are supported.
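
For readers unfamiliar with the stdlib API this builds on, here is a minimal sketch of client-side resumption with Python's ssl module. It is not the PR's code: connect_tls is a hypothetical helper, and the host and port are supplied by the caller. It relies only on SSLContext.wrap_socket(session=...), SSLSocket.session, and SSLSocket.session_reused.

```python
import socket
import ssl


def connect_tls(context, host, port, session=None):
    """Open one TLS connection, optionally offering a previously saved session.

    Returns (session, reused): the ssl.SSLSession negotiated on this
    connection and whether the handshake resumed the offered session.
    """
    raw = socket.create_connection((host, port), timeout=10)
    try:
        # Passing session= offers the old session to the server; the server
        # may accept it (resumption) or fall back to a full handshake.
        tls = context.wrap_socket(raw, server_hostname=host, session=session)
    except Exception:
        raw.close()
        raise
    try:
        return tls.session, tls.session_reused
    finally:
        tls.close()
```

Calling the helper twice with the same SSLContext, and passing the first call's session into the second, should report reused as True when the server accepts the session. With TLS 1.3 the ticket can arrive after the handshake completes, so a session read this early may not be resumable; that is consistent with the PR caching the session again after the first application-data exchange.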

Changes

cassandra/connection.py: SSLSessionCache class

  • Introduces SSLSessionCache: a thread-safe dict backed by a Lock,
    keyed by (address, port, server_hostname).
  • SNI-aware key design prevents proxy-shared connections from overwriting each
    other's sessions.
  • Connection gains an optional ssl_session_cache parameter:
    • Restores a cached session before connect() to enable resumption; gracefully
      handles ssl.SSLError / AttributeError if the server rejects the session.
    • Caches the negotiated session in three places:
      • After _initiate_connection() in _connect_socket() — TLS 1.2 sessions
        available immediately after connect.
      • After ReadyMessage in _handle_startup_response() — TLS 1.3 sessions
        delivered asynchronously after the first application-data exchange.
      • After AuthSuccessMessage in _handle_auth_response() — same TLS 1.3
        coverage for authenticated connections.
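
The cache described above can be sketched roughly as follows. This is an illustration under assumptions, not the class from the PR: the name SessionCacheSketch is hypothetical, and only the get/set API, the Lock, and the (address, port, server_hostname) key come from the description.

```python
import threading


class SessionCacheSketch:
    """Thread-safe mapping from (address, port, server_hostname) to a TLS session."""

    def __init__(self):
        self._lock = threading.Lock()
        self._sessions = {}

    def get(self, key):
        # Returns None when no session has been stored for this endpoint.
        with self._lock:
            return self._sessions.get(key)

    def set(self, key, session):
        # A newer session for the same key replaces the older one.
        with self._lock:
            self._sessions[key] = session
```

Because server_hostname is part of the key, two nodes reached through the same proxy address and port but with different SNI names occupy separate entries and cannot overwrite each other's sessions.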

cassandra/cluster.py: Cluster integration

  • Adds ssl_session_cache attribute to Cluster.
  • Auto-creates an SSLSessionCache when ssl_context or ssl_options are set;
    no configuration required for the common case.
  • Pass ssl_session_cache=None explicitly to opt out of session caching.
  • A custom SSLSessionCache instance can be supplied to share a cache across
    clusters or inject a custom implementation.
  • Logs a warning when the active connection class uses pyOpenSSL (Twisted,
    Eventlet), which has a different session API and is not covered by this cache.
  • Passes the cache through connection_factory kwargs to every Connection.
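
The auto-create, opt-out, and custom-instance rules above could be resolved along these lines. This is a sketch only: resolve_session_cache, PlaceholderCache, the uses_pyopenssl flag, and the sentinel default are all assumptions made for illustration, since the description does not say how the PR distinguishes "argument not passed" from an explicit None.

```python
import logging

log = logging.getLogger(__name__)

# Sentinel (assumption): lets callers pass None explicitly to opt out.
_NOT_SET = object()


class PlaceholderCache:
    """Stand-in for the real session cache; only here to keep the sketch runnable."""


def resolve_session_cache(ssl_context=None, ssl_options=None,
                          ssl_session_cache=_NOT_SET, uses_pyopenssl=False):
    """Return the cache a cluster should hand to its connections, or None."""
    if ssl_session_cache is not _NOT_SET:
        # Explicit value wins: None disables caching, an instance is used as-is.
        cache = ssl_session_cache
    elif ssl_context is not None or ssl_options:
        # TLS is configured and the caller expressed no preference: auto-create.
        cache = PlaceholderCache()
    else:
        # No TLS configured, so there is nothing to cache.
        cache = None

    if cache is not None and uses_pyopenssl:
        log.warning("TLS session cache has no effect with pyOpenSSL-based "
                    "connection classes (Twisted, Eventlet)")
    return cache
```

A caller that wants two clusters to share sessions would build one cache instance and pass it as ssl_session_cache to both.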

Limitations

  • Only stdlib ssl reactor paths are supported: asyncore, libev, gevent,
    asyncio.
  • Twisted and Eventlet connections use pyOpenSSL and are not covered.
    A warning is emitted when this combination is detected.

Tests

  • tests/unit/test_connection.py (TestSSLSessionCache): empty lookup,
    set/get, key isolation by address/port/SNI, overwrite, thread safety.
  • tests/unit/test_connection.py (Connection unit tests): session restore
    (including SNI-specific lookup), error tolerance, caching on ReadyMessage /
    AuthSuccess, no-op guard paths (cache=None, ssl_context=None,
    session=None).
  • tests/unit/test_cluster.py (TestSSLSessionCacheAutoCreation): auto-create
    with ssl_context / ssl_options, no cache without TLS, explicit None
    opt-out, custom cache injection, factory propagation, pyOpenSSL warnings for
    Twisted and Eventlet.

Fixes: https://scylladb.atlassian.net/browse/DRIVER-165

Pre-review checklist

  • I have split my patch into logically separate commits.
  • All commit messages clearly explain what they change and why.
  • I added relevant tests for new features and bug fixes.
  • All commits compile, pass static checks, and pass tests.
  • PR description sums up the changes and reasons why they should be introduced.
  • I have provided docstrings for the public items that I want to introduce.
  • I have adjusted the documentation in ./docs/source/.
  • I added appropriate Fixes: annotations to PR description.

Introduce SSLSessionCache in connection.py: a thread-safe dict keyed by
(address, port, server_hostname) storing ssl.SSLSession objects.

- get(key) / set(key, session) API backed by a Lock
- SNI-aware key prevents proxy-shared connections from colliding

Includes unit tests for empty lookup, set/get, key isolation by
address/port/SNI, overwrite, and concurrent access.

Wire SSLSessionCache into Connection so TLS sessions are saved after
each handshake and restored on subsequent connections.

- Accept optional ssl_session_cache in __init__
- Restore cached session before connect() to enable resumption;
  handle ssl.SSLError/AttributeError if session is rejected
- Cache negotiated session in three places: after connect (TLS 1.2),
  after ReadyMessage and AuthSuccessMessage (TLS 1.3 async tickets)
- Cache key uses (address, port, server_hostname) to avoid SNI
  collisions on shared proxy addresses
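
The restore and save steps listed above might look like the following helpers. They are a hedged sketch, not the PR's Connection methods: cache_key, restore_session, and save_session are hypothetical names, and the cache is any object with get(key) and set(key, session). Catching ValueError is an addition here, because the stdlib session setter raises it when the session belongs to a different SSLContext; the PR text only mentions ssl.SSLError and AttributeError.

```python
import ssl


def cache_key(address, port, server_hostname=None):
    """Build the SNI-aware key used to look up a session."""
    return (address, port, server_hostname)


def restore_session(ssl_sock, cache, key):
    """Attach a cached session before the handshake; return True if attached."""
    if cache is None:
        return False
    session = cache.get(key)
    if session is None:
        return False
    try:
        ssl_sock.session = session
    except (ssl.SSLError, AttributeError, ValueError):
        # The session could not be attached: continue with a full handshake.
        return False
    return True


def save_session(ssl_sock, cache, key):
    """Store the socket's current session; return True if something was stored."""
    if cache is None:
        return False
    session = getattr(ssl_sock, "session", None)
    if session is None:
        return False
    cache.set(key, session)
    return True
```

In this sketch save_session is safe to call more than once per connection, which fits the three call sites in the description: once after connect, and again after ReadyMessage or AuthSuccessMessage so that a TLS 1.3 ticket delivered late replaces the earlier entry.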

Includes unit tests for restore, SNI lookup, error tolerance,
caching, and no-op paths (cache=None, ssl_context=None, session=None).

Add ssl_session_cache to Cluster so all managed connections share a
single TLS session store.

- Auto-create SSLSessionCache when ssl_context or ssl_options are set;
  pass ssl_session_cache=None to opt out
- Accept explicit ssl_session_cache to allow a custom instance
- Warn when Twisted/Eventlet connection classes are used (pyOpenSSL
  does not support stdlib ssl session resumption; cache has no effect)
- Pass the cache through connection_factory kwargs to each Connection

Includes unit tests for auto-creation, opt-out, custom cache injection,
factory propagation, and pyOpenSSL warnings.

@Lorak-mmk

> This reduces reconnection latency and CPU overhead, especially in
> deployments with short-lived connections or frequent reconnects.

Such claims would ideally be supported by benchmarks. Could you try to create some?
I very vaguely remember this feature being postponed because the performance gains were underwhelming (but perhaps my memory is failing me).

@sylwiaszunejko
Collaborator Author

> > This reduces reconnection latency and CPU overhead, especially in
> > deployments with short-lived connections or frequent reconnects.
>
> Such claims would ideally be supported by benchmarks. Could you try to create some? I very vaguely remember this feature being postponed because the performance gains were underwhelming (but perhaps memory is failing me).

That's the goal, but you're right: I don't have any tests to prove it, so I removed this claim from the PR description. If I manage to create proper benchmarks, I will post an update here.

@mykaul

mykaul commented Apr 3, 2026

We could, if it helps, only support this for TLS 1.3.

@sylwiaszunejko sylwiaszunejko self-assigned this Apr 4, 2026