
Fix JVM <clinit> deadlock by removing static final accessor fields #48689

Merged
jeet1995 merged 6 commits into Azure:main from jeet1995:fix/clinit-deadlock-bridge-methods
Apr 14, 2026


@jeet1995
Member

@jeet1995 jeet1995 commented Apr 3, 2026

Summary

Fixes a JVM <clinit> deadlock (#48622, #48585) that permanently hangs threads when multiple threads concurrently trigger Cosmos SDK class initialization. Also fixes a latent CosmosItemSerializer.DEFAULT_SERIALIZER null bug.

Fixes: #48622, #48585

Root Cause

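Deadlock: Consuming classes cached accessors in private static final fields. During <clinit>, the getter falls back to initializeAllAccessors() when the accessor is not yet registered, eagerly initializing 40+ classes. When threads run <clinit> for different classes concurrently, each can end up waiting on an init lock held by another thread. The initialization procedure in JLS §12.4.2 has no timeout for this circular wait, so the threads deadlock permanently.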
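A minimal, self-contained sketch of the mechanism (not SDK code). Alpha and Beta are hypothetical classes whose static final fields are filled by an eager "initialize everything" fallback, standing in for initializeAllAccessors(). A latch makes both threads enter their <clinit> before either runs the fallback, so each ends up waiting on the other's init lock. The threads are daemons and are checked with a join timeout, similar in spirit to the timeout used by the PR's forked-JVM test.

```java
import java.util.concurrent.CountDownLatch;

public class ClinitDeadlockDemo {
    // Both threads count down here from inside <clinit>, so neither proceeds
    // until both classes are marked "initialization in progress".
    static final CountDownLatch BOTH_IN_CLINIT = new CountDownLatch(2);

    static class Alpha {
        // Eagerly cached at <clinit> time, like a static final accessor field.
        static final Object CACHED = eagerLookup();
    }

    static class Beta {
        static final Object CACHED = eagerLookup();
    }

    // Hypothetical stand-in for an accessor getter whose fallback initializes every class.
    static Object eagerLookup() {
        BOTH_IN_CLINIT.countDown();
        try {
            BOTH_IN_CLINIT.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        initializeAll();
        return new Object();
    }

    // Forces <clinit> of every listed class (initialize = true).
    static void initializeAll() {
        ClassLoader loader = ClinitDeadlockDemo.class.getClassLoader();
        try {
            Class.forName(Alpha.class.getName(), true, loader);
            Class.forName(Beta.class.getName(), true, loader);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns true if both initializing threads are still blocked after the timeout.
    static boolean deadlocks(long timeoutMillis) {
        Thread a = new Thread(() -> {
            Object unused = Alpha.CACHED; // starts Alpha's <clinit>
        }, "init-alpha");
        Thread b = new Thread(() -> {
            Object unused = Beta.CACHED; // starts Beta's <clinit>
        }, "init-beta");
        a.setDaemon(true); // daemon threads let the JVM exit despite the deadlock
        b.setDaemon(true);
        a.start();
        b.start();
        try {
            a.join(timeoutMillis);
            b.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return a.isAlive() && b.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("deadlocked: " + deadlocks(1000));
    }
}
```

Running main prints deadlocked: true. Thread init-alpha holds Alpha's init lock and waits for Beta, while init-beta holds Beta's and waits for Alpha.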

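DEFAULT_SERIALIZER null: CosmosItemSerializer.DEFAULT_SERIALIZER cross-referenced DefaultCosmosItemSerializer.DEFAULT_SERIALIZER. When DefaultCosmosItemSerializer.<clinit> ran first, the JVM initialized the parent class before the child (superclasses are always initialized first). The parent's <clinit> then read the child's field; because the child's initialization was already in progress on the same thread, that recursive request returned immediately and the read saw null, since the child's field had not been set yet.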
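A minimal reproduction of the null read, assuming simplified Parent and Child classes that mirror the original cross-reference (both fields are named DEFAULT_SERIALIZER, as in the SDK). Touching Child first makes the JVM initialize Parent before Child's own static initializers run; Parent's recursive, same-thread request to initialize Child returns immediately, so Parent copies a field that is still null.

```java
public class SerializerNullDemo {
    static class Parent {
        // Reads the child's static field during the parent's <clinit>.
        static final Parent DEFAULT_SERIALIZER = Child.DEFAULT_SERIALIZER;
    }

    static class Child extends Parent {
        // Hides Parent.DEFAULT_SERIALIZER, as in the original class pair.
        static final Child DEFAULT_SERIALIZER = new Child();
    }

    // Touch the child first, the ordering that exposed the bug.
    static boolean parentDefaultIsNull() {
        // Starts Child's <clinit>, which initializes Parent before Child's own fields.
        Child first = Child.DEFAULT_SERIALIZER;
        return first != null && Parent.DEFAULT_SERIALIZER == null;
    }

    public static void main(String[] args) {
        System.out.println("Parent.DEFAULT_SERIALIZER is null: " + parentDefaultIsNull());
    }
}
```

Running it prints Parent.DEFAULT_SERIALIZER is null: true. Touching Parent first instead leaves both fields set, which is why the bug depended on which class happened to be initialized first.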

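Parent-child <clinit> deadlock: DefaultCosmosItemSerializer.INTERNAL_DEFAULT_SERIALIZER was accessed independently by implementation code, so the child's <clinit> could start on one thread while the parent's <clinit> was running on another. The child's initialization must wait for its superclass, while the parent's <clinit> touched the child and waited for it: an AB/BA init-lock deadlock between parent and child.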
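A sketch of the parent-child variant with hypothetical Parent and Child classes. Parent's <clinit> constructs a Child (as a default-instance field would), while a second thread reads a Child static field directly, as implementation code did with INTERNAL_DEFAULT_SERIALIZER. The sleep only widens the race window, so on a heavily loaded machine this interleaving is very likely but not strictly guaranteed.

```java
import java.util.concurrent.CountDownLatch;

public class ParentChildDeadlockDemo {
    static final CountDownLatch PARENT_INIT_STARTED = new CountDownLatch(1);

    static class Parent {
        static {
            PARENT_INIT_STARTED.countDown();
            // Widen the race window so the other thread can start Child's <clinit>.
            sleepQuietly(500);
        }

        // Parent's <clinit> reaches into the child, like a default-instance field.
        static final Parent DEFAULT_SERIALIZER = new Child();
    }

    static class Child extends Parent {
        // Read directly by other code, so Child's <clinit> can start on its own thread.
        static final Object INTERNAL_DEFAULT_SERIALIZER = new Object();
    }

    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Returns true if both initializing threads are still blocked after the timeout.
    static boolean deadlocks(long timeoutMillis) {
        // Holds Parent's init lock, then needs Child.
        Thread parentThread = new Thread(() -> {
            Object unused = Parent.DEFAULT_SERIALIZER;
        }, "init-parent");
        // Marks Child as in progress, then must wait for its superclass Parent.
        Thread childThread = new Thread(() -> {
            try {
                PARENT_INIT_STARTED.await();
            } catch (InterruptedException e) {
                return;
            }
            Object unused = Child.INTERNAL_DEFAULT_SERIALIZER;
        }, "init-child");
        parentThread.setDaemon(true);
        childThread.setDaemon(true);
        parentThread.start();
        childThread.start();
        try {
            parentThread.join(timeoutMillis);
            childThread.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return parentThread.isAlive() && childThread.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("deadlocked: " + deadlocks(1500));
    }
}
```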

Fix

1. Uniform static getter pattern

// Before — triggers initializeAllAccessors() during <clinit>
private static final FeedResponseAccessor feedResponseAccessor =
    ImplementationBridgeHelpers.FeedResponseHelper.getFeedResponseAccessor();

// After — no <clinit> involvement
private static FeedResponseAccessor feedResponseAccessor() {
    return ImplementationBridgeHelpers.FeedResponseHelper.getFeedResponseAccessor();
}
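A self-contained sketch of the lazy-getter pattern under simplified assumptions: WidgetAccessor, WidgetHelper, Widget and Consumer are hypothetical stand-ins for an XxxAccessor, an ImplementationBridgeHelpers.*Helper, the owning model class and a consuming class. The fallback uses Class.forName(name, true, loader) to force the owner's <clinit>; that is one plausible way to write such a fallback, not the SDK's code. The point is that Consumer has no static accessor field, so initializing Consumer never calls into the helper.

```java
public class LazyAccessorDemo {
    // Hypothetical accessor exposing a non-public detail of Widget.
    interface WidgetAccessor {
        String secretName(Widget widget);
    }

    // Hypothetical stand-in for an ImplementationBridgeHelpers.*Helper class.
    static final class WidgetHelper {
        private static volatile WidgetAccessor accessor;

        static void setWidgetAccessor(WidgetAccessor newAccessor) {
            accessor = newAccessor;
        }

        static WidgetAccessor getWidgetAccessor() {
            WidgetAccessor current = accessor;
            if (current == null) {
                // Fallback: force Widget's <clinit>, which registers the accessor.
                try {
                    Class.forName(Widget.class.getName(), true, Widget.class.getClassLoader());
                } catch (ClassNotFoundException e) {
                    throw new IllegalStateException(e);
                }
                current = accessor;
            }
            return current;
        }
    }

    // Owning class: registers its accessor from its own static initializer.
    static final class Widget {
        static {
            initialize();
        }

        private final String name;

        Widget(String name) {
            this.name = name;
        }

        static void initialize() {
            WidgetHelper.setWidgetAccessor(widget -> widget.name);
        }
    }

    // Consuming class: no static accessor field, so its own <clinit> never
    // touches WidgetHelper; the accessor is resolved at each call.
    static final class Consumer {
        private static WidgetAccessor widgetAccessor() {
            return WidgetHelper.getWidgetAccessor();
        }

        static String describe(Widget widget) {
            return "widget:" + widgetAccessor().secretName(widget);
        }
    }

    public static void main(String[] args) {
        System.out.println(Consumer.describe(new Widget("alpha"))); // prints widget:alpha
    }
}
```

In this sketch each call pays one volatile read; in exchange, no consuming class triggers another class's <clinit> while its own init lock is held.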

2. Break CosmosItemSerializer ↔ DefaultCosmosItemSerializer cycle

  • CosmosItemSerializer.DEFAULT_SERIALIZER creates instance directly via new DefaultCosmosItemSerializer(...) — no cross-class <clinit> dependency
  • INTERNAL_DEFAULT_SERIALIZER moved from DefaultCosmosItemSerializer to CosmosItemSerializer (private, exposed via CosmosItemSerializerAccessor.getInternalDefaultSerializer()) — so that accessing it no longer triggers child <clinit> from a different thread, eliminating the AB/BA init-lock between parent and child
  • static { initialize(); } placed before DEFAULT_SERIALIZER so the accessor is registered before construction — eliminates initializeAllAccessors() fallback during <clinit>
  • DefaultCosmosItemSerializer.DEFAULT_SERIALIZER and its serializationInclusionModeAwareObjectMapper removed (dead code)
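A sketch of the restructured <clinit>, again with hypothetical Parent and Child classes and a simplified accessor (the SDK exposes the internal default through CosmosItemSerializerAccessor.getInternalDefaultSerializer()). Static initializers run in textual order, so the accessor is registered before the first Child is built, and the parent no longer reads any child static field. Touching Child first, the ordering that previously produced null, now leaves every field set.

```java
public class SerializerFixDemo {
    // Simplified, hypothetical accessor for the internal default instance.
    interface SerializerAccessor {
        Parent getInternalDefaultSerializer();
    }

    static volatile SerializerAccessor accessor;

    static class Parent {
        // 1. Register the accessor first: static initializers run in textual order.
        static {
            initialize();
        }

        // 2. Construct the child directly instead of reading a child static field.
        static final Parent DEFAULT_SERIALIZER = new Child();

        // 3. The internal default lives on the parent, exposed only via the accessor.
        private static final Parent INTERNAL_DEFAULT_SERIALIZER = new Child();

        static void initialize() {
            accessor = () -> INTERNAL_DEFAULT_SERIALIZER;
        }
    }

    static class Child extends Parent {
        Child() {
            // Holds because initialize() ran before the first Child was constructed.
            if (accessor == null) {
                throw new IllegalStateException("accessor not registered yet");
            }
        }
    }

    // Touch the child first, the ordering that used to leave the parent's field null.
    static boolean parentDefaultSetWhenChildFirst() {
        new Child(); // starts Child's <clinit>, which initializes Parent first
        return Parent.DEFAULT_SERIALIZER != null
            && accessor.getInternalDefaultSerializer() != null;
    }

    public static void main(String[] args) {
        System.out.println("fields set: " + parentDefaultSetWhenChildFirst());
    }
}
```

This removes the single-thread null read only. It does not make the pair safe if other code starts Child's <clinit> on its own thread, which is why INTERNAL_DEFAULT_SERIALIZER also moved to the parent.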

Scope

| Category | Description |
|---|---|
| Static final/instance accessor fields removed | 66 fields across consuming classes |
| Static getter methods added | ~120 private static XxxAccessor xxx() methods |
| Missing static { initialize(); } added | CosmosRequestContext, CosmosOperationDetails, CosmosDiagnosticsContext |
| Accessor rename fix | getCosmosAsyncClientAccessor() → getCosmosDiagnosticsThresholdsAccessor() in CosmosDiagnosticsThresholdsHelper |
| checkNotNull bug fix | DefaultCosmosItemSerializer constructor passed a string literal to checkNotNull instead of the parameter |
| Files changed | 85 |
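A sketch of the checkNotNull bug class, assuming a checkNotNull(value, message) helper and a parameter named mapper (both hypothetical; the PR does not show the constructor). Passing a string literal validates the literal, which is never null, so the guard is a no-op.

```java
public class CheckNotNullDemo {
    // Hypothetical helper in the style of a checkNotNull(value, message) utility.
    static <T> T checkNotNull(T value, String message) {
        if (value == null) {
            throw new NullPointerException(message);
        }
        return value;
    }

    static final class BuggySerializer {
        final Object mapper;

        BuggySerializer(Object mapper) {
            // Bug: a string literal is never null, so this check can never fail.
            checkNotNull("mapper", "Argument 'mapper' must not be null.");
            this.mapper = mapper;
        }
    }

    static final class FixedSerializer {
        final Object mapper;

        FixedSerializer(Object mapper) {
            // Fix: validate the parameter itself.
            this.mapper = checkNotNull(mapper, "Argument 'mapper' must not be null.");
        }
    }

    static boolean fixedRejectsNull() {
        try {
            new FixedSerializer(null);
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("buggy accepts null: " + (new BuggySerializer(null).mapper == null));
        System.out.println("fixed rejects null: " + fixedRejectsNull());
    }
}
```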

Exceptions (not converted to static getters)

  • HttpClient.java: a Java 8 interface, which cannot declare private static methods (private interface methods require Java 9+)

Tests

| Test | What it proves |
|---|---|
| concurrentAccessorInitializationShouldNotDeadlock (×5 invocations) | Forked JVMs with 12 concurrent threads triggering <clinit>; a deadlock is caught by a 30 s timeout |
| allAccessorClassesMustHaveStaticInitializerBlock | Forked JVM verifies every accessor is non-null after <clinit> |
| noStaticOrInstanceAccessorFieldsInConsumingClasses | Reflection scan: fails if any class has a static or final Accessor field |
| accessorInitialization | Validates the initializeAllAccessors() bootstrap path |
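A sketch of a reflection-based guard in the spirit of noStaticOrInstanceAccessorFieldsInConsumingClasses, following the table's description (flag static or final fields whose type is an accessor). The class names and the "type name ends with Accessor" rule are assumptions; the PR's test may select classes and fields differently. Note that getDeclaredFields() does not run <clinit>, so the scan itself cannot trigger the deadlock it guards against.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

public class AccessorFieldScan {
    // Hypothetical accessor type for the demo.
    interface ThingAccessor {
    }

    // Violates the rule: caches the accessor in a static final field.
    static class BadConsumer {
        private static final ThingAccessor thingAccessor = null;
    }

    // Follows the rule: resolves the accessor through a static getter.
    static class GoodConsumer {
        private static ThingAccessor thingAccessor() {
            return null;
        }
    }

    // Returns "Class.field" for each accessor-typed field that is static or final.
    static List<String> offendingFields(Class<?>... classes) {
        List<String> offenders = new ArrayList<>();
        for (Class<?> clazz : classes) {
            for (Field field : clazz.getDeclaredFields()) {
                int mods = field.getModifiers();
                boolean accessorTyped = field.getType().getSimpleName().endsWith("Accessor");
                if (accessorTyped && (Modifier.isStatic(mods) || Modifier.isFinal(mods))) {
                    offenders.add(clazz.getSimpleName() + "." + field.getName());
                }
            }
        }
        return offenders;
    }

    public static void main(String[] args) {
        System.out.println(offendingFields(BadConsumer.class, GoodConsumer.class));
    }
}
```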

@github-actions github-actions bot added the azure-spring (All azure-spring related issues) and Cosmos labels Apr 3, 2026
@jeet1995
Member Author

jeet1995 commented Apr 3, 2026

Closing — bridge classes don't allow adding new methods. Proceeding with #48667 (Class.forName with explicit classloader).

@jeet1995 jeet1995 closed this Apr 3, 2026
@jeet1995 jeet1995 reopened this Apr 3, 2026
@jeet1995 jeet1995 force-pushed the fix/clinit-deadlock-bridge-methods branch from e57066d to 66afd43 April 4, 2026 00:05
@jeet1995 jeet1995 changed the title Fix JVM <clinit> deadlock using targeted bridge methods (alternative to #48667) Fix JVM <clinit> deadlock by removing static final accessor fields (alternative to #48667) Apr 4, 2026
@jeet1995 jeet1995 marked this pull request as ready for review April 5, 2026 00:30
Copilot AI review requested due to automatic review settings April 5, 2026 00:30
@jeet1995 jeet1995 changed the title Fix JVM <clinit> deadlock by removing static final accessor fields (alternative to #48667) Fix JVM <clinit> deadlock by removing static final accessor fields Apr 5, 2026
Contributor

Copilot AI left a comment


Pull request overview

This PR addresses a JVM <clinit> deadlock in the Cosmos Java SDK by removing many static final ...Accessor caches in consuming classes and switching call sites to resolve accessors lazily via ImplementationBridgeHelpers.*Helper.get*Accessor() on demand, reducing class-initialization-time cross-dependencies.

Changes:

  • Replaced numerous private static final XxxAccessor ... = getXxxAccessor() fields with inline (lazy) getter calls at usage sites.
  • Added/adjusted static { initialize(); } blocks and <clinit> ordering to ensure accessors are registered safely during class initialization where required.
  • Added forked-JVM regression/enforcement tests around concurrent <clinit> behavior and accessor registration.

Reviewed changes

Copilot reviewed 54 out of 54 changed files in this pull request and generated 9 comments.

| File | Description |
|---|---|
| sdk/spring/azure-spring-data-cosmos/README.md | Trailing whitespace/newline adjustment. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/util/CosmosPagedFluxStaticListImpl.java | Removed static accessor cache; inline FeedResponse accessor usage. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/util/CosmosPagedFluxDefaultImpl.java | Removed static accessor cache; inline diagnostics context accessor usage. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/models/FeedResponse.java | Removed static diagnostics accessor cache; inline accessor calls. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/models/CosmosOperationDetails.java | Added static { initialize(); } registration block. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/models/CosmosItemRequestOptions.java | Removed static thresholds accessor cache; inline thresholds accessor usage. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/StaleResourceRetryPolicy.java | Removed static exception accessor cache. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/SessionTokenMismatchRetryPolicy.java | Removed static accessor cache; inline session retry options accessor usage. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/RxDocumentClientImpl.java | Removed multiple static accessor caches; inline accessor usage across implementation. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/query/QueryPlanRetriever.java | Removed static accessor caches; inline accessors for options/exception handling. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/query/PipelinedQueryExecutionContext.java | Removed static accessor cache; inline accessor usage when cloning options. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/query/PipelinedDocumentQueryExecutionContext.java | Removed static accessor caches; inline options and serializer accessor usage. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/query/ParallelDocumentQueryExecutionContext.java | Removed static accessor caches; inline options/diagnostics accessor usage. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/query/OrderByUtils.java | Removed static diagnostics accessor cache; inline diagnostics accessor usage. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/query/OrderByDocumentQueryExecutionContext.java | Removed static accessor caches; inline feed/diagnostics accessor usage. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/query/OrderByDocumentProducer.java | Removed static feed accessor cache; inline feed accessor usage. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/query/NonStreamingOrderByUtils.java | Removed static diagnostics accessor cache; inline diagnostics accessor usage. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/query/NonStreamingOrderByDocumentQueryExecutionContext.java | Removed static accessor caches; inline feed/diagnostics accessor usage. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/query/HybridSearchDocumentQueryExecutionContext.java | Removed static accessor caches; inline feed/diagnostics accessor usage. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/query/GroupByDocumentQueryExecutionContext.java | Removed static diagnostics accessor cache; inline diagnostics accessor usage. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/query/Fetcher.java | Removed static diagnostics accessor cache; inline diagnostics accessor usage. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/query/DocumentQueryExecutionContextFactory.java | Inline options accessor usage in creation flow. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/query/DocumentQueryExecutionContextBase.java | Removed static accessor caches; inline accessor usage for request creation and cloning. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/query/DocumentProducer.java | Removed static accessor cache; inline accessor usage when cloning options. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/query/DefaultDocumentQueryExecutionContext.java | Removed static accessor cache; inline accessor usage for partition key definition/properties. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/query/DCountDocumentQueryExecutionContext.java | Removed static diagnostics accessor cache; inline diagnostics accessor usage. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/query/ChangeFeedFetcher.java | Removed static feed accessor cache; inline feed accessor usage. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/query/AggregateDocumentQueryExecutionContext.java | Removed static accessor caches; inline feed/diagnostics accessor usage. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/JsonSerializable.java | Removed static serializer accessor cache; inline serializer accessor usage. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/ImplementationBridgeHelpers.java | Renamed thresholds accessor getter (getCosmosAsyncClientAccessor → getCosmosDiagnosticsThresholdsAccessor). |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/http/HttpClientConfig.java | Removed static HTTP2 config accessor cache; inline accessor usage. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/Document.java | Removed static serializer accessor cache; inline serializer accessor usage. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/directconnectivity/GoneAndRetryWithRetryPolicy.java | Removed static exception accessor cache; inline exception accessor usage. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/DiagnosticsProvider.java | Removed multiple static accessor caches; inline accessors throughout tracing/metrics paths. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/CosmosQueryRequestOptionsImpl.java | Updated thresholds accessor getter name usage. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/CosmosQueryRequestOptionsBase.java | Updated thresholds accessor getter name usage. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/ConnectionPolicy.java | Removed static HTTP2 config accessor cache; inline accessor usage. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/clienttelemetry/ClientTelemetryMetrics.java | Removed static accessor caches; inline accessors for telemetry metrics recording. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/clienttelemetry/ClientMetricsDiagnosticsHandler.java | Removed static telemetry config accessor cache. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/ChangeFeedQueryImpl.java | Removed static accessor caches; inline accessors for change feed request/response. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/caches/RxCollectionCache.java | Removed static exception accessor cache; inline exception accessor usage. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/batch/TransactionalBulkExecutor.java | Removed static batch request options accessor cache; inline accessor usage. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/batch/BulkExecutor.java | Removed static accessor caches; inline accessors for batch response and diagnostics provider. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/CosmosRequestContext.java | Added static { initialize(); } registration block. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/CosmosItemSerializer.java | Reordered <clinit> to register accessor before static fields. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/CosmosDiagnosticsContext.java | Added static { initialize(); } registration block. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/CosmosContainerProactiveInitConfig.java | Removed static container identity accessor cache; inline accessor usage. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/CosmosAsyncUser.java | Removed static accessor caches; inline accessors for query naming/feed response creation. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/CosmosAsyncScripts.java | Removed static accessor caches; inline accessors for query naming/feed response creation. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/CosmosAsyncDatabase.java | Removed static accessor caches; inline accessors for query naming/feed response creation. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/CosmosAsyncContainer.java | Removed many static accessor caches; inline accessors across request/response, policies, and telemetry. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/CosmosAsyncClient.java | Removed static accessor caches; inline telemetry/options/feed response accessors. |
| sdk/cosmos/azure-cosmos/CHANGELOG.md | Added changelog bullet for <clinit> deadlock fix. |
| sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/implementation/ImplementationBridgeHelpersTest.java | Added forked-JVM deadlock regression test and accessor registration enforcement test. |

jeet1995 added a commit to jeet1995/azure-sdk-for-java that referenced this pull request Apr 5, 2026
… stale docs

- Removed remaining static final accessor fields in
  DocumentQueryExecutionContextFactory, CosmosQueryRequestOptionsBase,
  CosmosQueryRequestOptionsImpl
- Extracted local variables for long inline accessor chains in
  SessionTokenMismatchRetryPolicy, RxDocumentClientImpl,
  CosmosPagedFluxDefaultImpl
- Updated test Javadoc to reflect lazy accessor approach (not Class.forName)
- Reduced child JVM runs from 3 to 1 (invocationCount=5 provides repetition)
- Fixed CHANGELOG PR link to Azure#48689

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@jeet1995 jeet1995 force-pushed the fix/clinit-deadlock-bridge-methods branch from 4e41bc9 to 101efda April 7, 2026 22:08
jeet1995 added a commit to jeet1995/azure-sdk-for-java that referenced this pull request Apr 7, 2026
@jeet1995 jeet1995 force-pushed the fix/clinit-deadlock-bridge-methods branch 5 times, most recently from d8edfe4 to 7332533 April 8, 2026 00:43
@rujche rujche requested a review from Copilot April 8, 2026 00:57
@xinlian12
Member

Review complete (36:01)

Posted 2 inline comment(s).

Steps: ✓ context, correctness, cross-sdk, design, history, past-prs, synthesis, test-coverage

jeet1995 and others added 2 commits April 11, 2026 17:21
… null

Fixes Azure#48622, Azure#48585

Replace all static final accessor fields and inline
ImplementationBridgeHelpers calls with uniform private static getter
methods. This eliminates <clinit>-time class loading that caused
permanent deadlocks under concurrent class initialization (JLS 12.4.2).

Fix CosmosItemSerializer.DEFAULT_SERIALIZER circular <clinit> —
create instance directly and move INTERNAL_DEFAULT_SERIALIZER to
parent class to prevent concurrent <clinit> between parent and child.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- ModelBridgeInternal.java: Remove 26 duplicate ImplementationBridgeHelpers imports
- ItemBulkOperation.java: Remove 2 duplicate ImplementationBridgeHelpers imports
- SqlQuerySpecWithEncryption.java: Add private static internalDefaultSerializer()
  getter (matching uniform pattern), replace inline accessor calls, remove
  unused DefaultCosmosItemSerializer import

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@jeet1995
Member Author

/azp run java - cosmos - ci

@jeet1995
Member Author

/azp run java - cosmos - spark

@jeet1995
Member Author

/azp run java - cosmos - tests

@jeet1995
Member Author

/azp run java - cosmos - kafka

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

3 similar comments

jeet1995 and others added 3 commits April 13, 2026 11:02
…Block test

Clarify that the test verifies accessor resolvability (via <clinit> or
initializeAllAccessors fallback), not that each class independently
registers its accessor. Structural enforcement is done by the companion
noStaticOrInstanceAccessorFieldsInConsumingClasses test.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@jeet1995
Member Author

/azp run java - cosmos - ci

@jeet1995
Member Author

/azp run java - cosmos - tests

@jeet1995
Member Author

/azp run java - cosmos - kafka

@jeet1995
Member Author

/azp run java - cosmos - spark

@jeet1995
Member Author

/azp run java - spring - ci

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

4 similar comments

Member

@FabianMeiswinkel FabianMeiswinkel left a comment


LGTM

Member

@xinlian12 xinlian12 left a comment


LGTM, thanks

@jeet1995
Member Author

Thin-client test failures are due to service-side config updates.

@jeet1995
Member Author

/check-enforcer override


Labels

azure-spring (All azure-spring related issues), Cosmos


4 participants