perf: reduce HashMap/collection allocation overhead in gateway path#48662
perf: reduce HashMap/collection allocation overhead in gateway path#48662xinlian12 wants to merge 23 commits intoAzure:mainfrom
Conversation
Eliminate per-response intermediate HashMap allocation by adding a new StoreResponse constructor that accepts HttpHeaders directly. Header names and values are populated into String[] arrays without materializing an intermediate Map. The JsonNodeStorePayload is updated to accept header arrays and only builds a Map lazily on error paths (extremely rare). Pre-size HashMaps throughout the hot path to avoid resize/rehash: - HttpHeaders request construction: sized to defaultHeaders + request headers - StoreResponse.replicaStatusList: pre-sized to 4 - StoreResponse.withRemappedStatusCode: pre-sized to header count - RxDocumentServiceRequest fallback maps: pre-sized to 32 Fix HttpUtils.asMap() double-allocation by iterating HttpHeaders directly instead of calling toMap() which creates an intermediate HashMap. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ve null-guard inconsistency - Fix HashMap<>(4) to HashMap<>(6) for replicaStatusList to avoid rehash at 4 replicas (capacity 4 * 0.75 = threshold 3, resizes on 4th insert) - Refactor JsonNodeStorePayload: extract shared parseJson() method with Supplier<Map<String,String>> to eliminate duplicated error-handling logic - Remove misleading null ternary in getHttpRequestHeaders() since getHeaders() always returns non-null (fallback HashMap<>(32)) - Revert HashMap<>(16) to HashMap<>() in HttpHeaders default constructor (16 is already the default capacity, change was no-op noise) - Add unit tests for StoreResponse HttpHeaders constructor, HttpHeaders populateLowerCaseHeaders, and JsonNodeStorePayload array-header constructor Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Fix HashMap<>(headers.size()) in HttpUtils.asMap() to account for the 0.75 load factor, avoiding resize when all headers are inserted. Extract mapCapacityForSize(int) helper in HttpUtils to consolidate the capacity calculation (n * 4 / 3 + 1) used across HttpUtils.asMap(), StoreResponse.withRemappedStatusCode(), and JsonNodeStorePayload.buildHeaderMap(). Addresses review feedback from alzimmermsft. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ize to HttpHeaders, clarify lowercase key guarantee - Add comments explaining why HashMap<>(32) is used as fallback in RxDocumentServiceRequest: capacity 32 gives threshold 24, covering typical 15-20 request headers without resize. - Apply HttpUtils.mapCapacityForSize() in RxGatewayStoreModel.getHttpRequestHeaders() to account for 0.75 load factor when constructing HttpHeaders. - Make mapCapacityForSize() public so it can be used from other packages. - Document in populateLowerCaseHeaders() Javadoc that keys are guaranteed lowercase because HttpHeaders.set() stores them via toLowerCase(Locale.ROOT). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Extract duplicate contentStream/payload handling from both StoreResponse constructors into a shared parseResponsePayload() static helper method. Both constructors now use the array-based JsonNodeStorePayload constructor, eliminating code duplication while preserving identical behavior. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Remove both unescape(Set<Entry>) and unescape(Map) overloads from HttpUtils as they are no longer needed - Update ResponseUtils to use the HttpHeaders-based StoreResponse constructor (same optimization as RxGatewayStoreModel) - Remove unescape test from HttpUtilsTest, keep asMap() coverage - Clean up unused imports (AbstractMap, ArrayList, List, Set) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace previous benchmark charts with comprehensive V3 analysis: - 20 configs (c1/c8/c16/c32/c128 x Read/Write x HTTP1/HTTP2) - 3 rounds each, 10 min/run, GATEWAY mode - Timeline charts with throughput and P99 latency - JFR allocation breakdown comparison - Detailed per-round analysis of outlier patterns Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
JFR ObjectAllocationSample weight = estimated cumulative bytes allocated over the recording (10 min), not heap residency. Heap was 8 GB committed. The ~271 GB 'targeted' is allocation throughput (~4 GB/s alloc rate), most objects immediately GC'd. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Replace confusing cumulative GB with allocation share % - Add GC comparison table (817 vs 813 pauses - identical) - Frame as code efficiency improvement, not GC impact Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…arison - Remove timeline charts (not needed for review) - Add variance analysis using request-level metrics (transitTime) showing variance is server-side, not SDK-related - Add JFR allocation comparison for all 20 configs - Keep summary bar chart and c128 JFR chart Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The c16 PR showing HashMap\ is from the request-sending path (ReactorNettyClient.bodySendDelegate iterating request headers), NOT the response-side asMap() iterator we eliminated. Added clarifying note and removed the per-config ValueIterator column (too noisy). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…a-jackson.version The recent Jackson dependency update (8a671dd) bumped Jackson from 2.18.4 to 2.18.6 in all Cosmos Spark child modules but missed updating the scala-jackson.version property in the parent POM. This caused the maven-enforcer-plugin BannedDependencies rule to reject jackson-module-scala_2.12 and _2.13 at version 2.18.6. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…to upstream-main
…to upstream-main
…om/xinlian12/azure-sdk-for-java into perf/hashmap-collection-allocation
|
/azp run java - cosmos - tests |
|
Azure Pipelines successfully started running 1 pipeline(s). |
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Optimizes the Cosmos gateway hot path by reducing temporary HashMap/header-collection allocations and removing redundant header normalization.
Changes:
- Adds capacity sizing helper (
HttpUtils.mapCapacityForSize) and applies pre-sizing to avoidHashMaprehash/resizes. - Changes
StoreResponseto acceptHttpHeadersdirectly and avoidsHttpHeaders -> Map -> HttpHeadersround-trips. - Adds/updates unit tests for the new header-array population path and new constructors.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/http/HttpHeaders.java | Adds populateLowerCaseHeaders to extract headers without building an intermediate map. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/directconnectivity/StoreResponse.java | Adds HttpHeaders ctor and factors payload parsing/stream close into a helper; tweaks map sizing. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/directconnectivity/ResponseUtils.java | Routes response creation through the new StoreResponse(HttpHeaders, ...) ctor. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/directconnectivity/JsonNodeStorePayload.java | Adds ctor that consumes header arrays and lazily builds a map only on error paths. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/directconnectivity/HttpUtils.java | Adds mapCapacityForSize, updates asMap, and removes unescape APIs. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/RxGatewayStoreModel.java | Avoids lowercasing/map conversions; pre-sizes HttpHeaders for request headers. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/RxDocumentServiceRequest.java | Pre-sizes request headers map to 32 to reduce resizes. |
| sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/implementation/http/HttpHeadersTests.java | Adds unit tests for populateLowerCaseHeaders. |
| sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/implementation/directconnectivity/StoreResponseTest.java | Adds unit tests covering new StoreResponse(HttpHeaders, ...) behavior. |
| sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/implementation/directconnectivity/JsonNodeStorePayloadTests.java | Adds unit tests for new header-array JsonNodeStorePayload constructor. |
| sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/implementation/directconnectivity/HttpUtilsTest.java | Removes coverage for deleted unescape(Map) API. |
| public void populateLowerCaseHeaders(String[] names, String[] values) { | ||
| int i = 0; | ||
| for (Map.Entry<String, HttpHeader> entry : headers.entrySet()) { | ||
| names[i] = entry.getKey(); | ||
| values[i] = entry.getValue().value(); | ||
| i++; | ||
| } | ||
| } |
There was a problem hiding this comment.
populateLowerCaseHeaders is a public method but it relies on a caller-side precondition (arrays must be >= size()) and will otherwise throw ArrayIndexOutOfBoundsException. Consider adding argument validation (e.g., reject null arrays and enforce names.length >= size() and values.length >= size()) and throw an IllegalArgumentException with a clear message to fail fast and make misuse easier to diagnose.
| // URL-decode OWNER_FULL_NAME header value inline (replaces HttpUtils.unescape). | ||
| // This is kept separate from populateLowerCaseHeaders because HttpHeaders is a | ||
| // general-purpose HTTP class and should not contain Cosmos-specific URL-decoding logic. | ||
| for (int i = 0; i < headerCount; i++) { | ||
| if (HttpConstants.HttpHeaders.OWNER_FULL_NAME.equals(responseHeaderNames[i])) { | ||
| responseHeaderValues[i] = HttpUtils.urlDecode(responseHeaderValues[i]); | ||
| break; | ||
| } | ||
| } else { | ||
| this.responsePayload = null; | ||
| } |
There was a problem hiding this comment.
This comparison assumes HttpConstants.HttpHeaders.OWNER_FULL_NAME matches the lowercase key stored in responseHeaderNames. Since populateLowerCaseHeaders fills keys from the internal lowercase map, decoding may be skipped if the constant isn’t lowercase. To preserve prior behavior, compare case-insensitively (or compare against a lowercase-normalized constant) so decoding is applied regardless of constant casing.
| * without resizing, accounting for the default load factor of 0.75. | ||
| */ | ||
| public static int mapCapacityForSize(int expectedSize) { | ||
| return expectedSize * 4 / 3 + 1; |
There was a problem hiding this comment.
mapCapacityForSize can overflow for large expectedSize values (producing a negative capacity that will fail new HashMap<>(...)) and doesn’t define behavior for negative inputs. Consider guarding expectedSize <= 0 and clamping the computed capacity to a safe upper bound (e.g., Integer.MAX_VALUE) to avoid overflow-driven failures.
| return expectedSize * 4 / 3 + 1; | |
| if (expectedSize <= 0) { | |
| return 1; | |
| } | |
| long capacity = (long) expectedSize * 4 / 3 + 1; | |
| return capacity >= Integer.MAX_VALUE ? Integer.MAX_VALUE : (int) capacity; |
| for (String name : names) { | ||
| assertThat(name).isEqualTo(name.toLowerCase()); | ||
| } |
There was a problem hiding this comment.
Tests should avoid locale-dependent casing behavior. String#toLowerCase() without an explicit locale can behave unexpectedly under certain default locales (e.g., Turkish). Use toLowerCase(Locale.ROOT) for stable assertions.
| logger.warn("Failed to close content stream. This may cause a Netty ByteBuf leak.", e); | ||
| } | ||
| } | ||
| replicaStatusList = new HashMap<>(6); |
There was a problem hiding this comment.
The initial capacity 6 is a magic number repeated across constructors. Consider extracting it into a named constant (or using HttpUtils.mapCapacityForSize(<expectedEntries>) with a brief rationale) so the intent is clearer and future changes don’t require hunting multiple occurrences.
| public StoreResponse( | ||
| String endpoint, | ||
| int status, | ||
| HttpHeaders httpHeaders, | ||
| ByteBufInputStream contentStream, | ||
| int responsePayloadLength) { |
There was a problem hiding this comment.
The new StoreResponse(HttpHeaders, ...) path moves OWNER_FULL_NAME URL-decoding inline (replacing the removed HttpUtils.unescape(...)). Add a unit test that includes an encoded OWNER_FULL_NAME header and asserts it is decoded when accessed via getHeaderValue, to ensure behavioral parity with the previous map-based flow.
Performance: Reduce HashMap/Collection Allocation Overhead in Gateway Path
Motivation
JFR profiling of the baseline (
main) under high-concurrency gateway workloads revealed thatHashMap-related allocations (HashMap$Node,HashMap,HashMap$ValueIterator) and HTTP header collections (DefaultHeaders$HeaderEntry,HttpHeader) are responsible for a significant share of total object allocation churn.Baseline JFR allocation profile (c128 Read HTTP/1,
ObjectAllocationSample, 10-min recording):HashMap$NodeDefaultHeaders$HeaderEntryHashMap$ValueIteratorHttpHeaderHashMapHttpHeadersHashMap$Node[]Root causes:
HashMap<>()default initial capacity (16) forces 1-2 resize+rehash cycles for typical gateway responses with 20-30 headers, creating throwawayHashMap$Node[]arrays and re-hashedHashMap$NodeentriesStoreResponseconstructor convertsHttpHeaderstoMapviaHttpUtils.asMap()on every response, allocating a throwawayHashMap$ValueIteratorand rebuilding allHashMap$NodeentriesHttpHeadersinRxGatewayStoreModel.getHttpRequestHeaders()is undersized, causing internal HashMap resizetoLowerCase()calls on header keys that are already normalizedChanges
HashMap<>(32)instead ofHashMap<>()inRxDocumentServiceRequest, andmapCapacityForSize()helper inHttpUtilsto avoid rehashingStoreResponsenow acceptsHttpHeadersdirectly, removing intermediateasMap()conversion that created throwawayHashMap$ValueIteratorandHashMap$NodearraysRxGatewayStoreModel: sized todefaultHeaders.size() + headers.size()to avoid internal HashMap resizetoLowerCase()calls:HttpHeaders.set()already normalizes keys; callers no longer double-normalize creating extraStringobjectsBenchmark Results
Test matrix: 1-tenant {c1, c8, c32, c128} {HTTP/1, HTTP/2} + 30-tenant c3 {HTTP/1, HTTP/2}, GATEWAY mode, ReadThroughput, 10 min per round, 3 independent JVM rounds per config. All endpoints pinned to the same Cosmos gateway frontend via
/etc/hoststo eliminate Azure Traffic Manager routing variance. Bothmainandhashmap-alloctested on the same VM per config.1-Tenant Throughput & Latency
30-Tenant Throughput & Latency
JFR Allocation Comparison (c128 Read HTTP/1, ObjectAllocationSample)
HashMap$NodeDefaultHeaders$HeaderEntryHashMap$Node[]HttpHeaderHeadersUtils$StringEntryHashMapLinkedHashMap$EntryIdentityHashMap$ValueIteratorPer-Round Detail
Click to expand per-round throughput (ops/s)
VM1 (c1, c8, c32-http1):
VM2 (c32-http2, c128, 30t):
All CVs 1.4% across both refs, confirming stable measurements with pinned endpoints.
Conclusion
HashMap$Node-16%,HashMap$Node[]-22%,IdentityHashMap$ValueIteratoreliminated at c128