Skip to content

Upserts Bitmap Optimizations while taking Snapshot - Segment Commit#18020

Merged
Jackie-Jiang merged 10 commits intoapache:masterfrom
deepthi912:compilation
Mar 31, 2026
Merged

Upserts Bitmap Optimizations while taking Snapshot - Segment Commit#18020
Jackie-Jiang merged 10 commits intoapache:masterfrom
deepthi912:compilation

Conversation

@deepthi912
Copy link
Copy Markdown
Collaborator

@deepthi912 deepthi912 commented Mar 29, 2026

  • Replaced segments and removed segments are not cleaned up from tracking set (_updatedSegmentsSinceLastSnapshot): Pinot tracks which segments need their state saved to disk. When a segment gets replaced (e.g. during a reload), the old segment is removed from the active list but not from the tracking set. It stays there until the next periodic save which only happens every 3-20 hours.
  • Snapshot persistence currently clones each updated segment's bitmap via getMutableRoaringBitmap(), which deep-copies the entire object graph. Under high commit frequency this causes significant heap spikes and GC pressure. Replace the clone with a synchronized in-place serialization (toBytes()) that produces the snapshot bytes directly from the live bitmap, eliminating the intermediate object allocation.
  • When a consuming segment finishes filling up and gets converted to an immutable segment, Pinot is supposed to release the old in-memory data for _reusePreviousRow. Instead, a reusable internal object (_reusePreviousRow) keeps holding a pointer to the old segment's data indefinitely. It's better to cleanup immediately after usage rather than doing it when the next previous row need to be used. This leak is specific for Partial Upsert tables.

Ran a small benchmark with each segment having ~85K valid doc IDs (100K docs, ~41 KB serialized bitmap):

Segments OLD heap (With clone()) (KB) NEW heap (Without clone)(KB)
1 80 40
10 806 406
50 4,032 2,033
100 8,064 4,067
200 16,128 8,134
500 40,320 (39 MB) 20,337 (20 MB)

OLD flow -- dominated by clone internals:

Count Object Type
29,763 long[]
17,844 MappeableBitmapContainer
5,964 HeapByteBuffer
5,952 byte[]
26 MutableRoaringArray

NEW flow :

Count Object Type
8,273 byte[]
3,310 DocIdSnapshot
7 HeapByteBuffer

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Mar 29, 2026

Codecov Report

❌ Patch coverage is 64.51613% with 22 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.34%. Comparing base (43c8ecc) to head (fa244e7).
⚠️ Report is 69 commits behind head on master.

Files with missing lines Patch % Lines
.../index/mutable/ThreadSafeMutableRoaringBitmap.java 0.00% 13 Missing ⚠️
...cal/upsert/BasePartitionUpsertMetadataManager.java 85.71% 0 Missing and 4 partials ⚠️
...tionUpsertMetadataManagerForConsistentDeletes.java 0.00% 4 Missing ⚠️
...l/indexsegment/immutable/ImmutableSegmentImpl.java 91.66% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18020      +/-   ##
============================================
+ Coverage     63.29%   63.34%   +0.05%     
- Complexity     1525     1543      +18     
============================================
  Files          3194     3200       +6     
  Lines        193645   194240     +595     
  Branches      29787    29932     +145     
============================================
+ Hits         122559   123047     +488     
- Misses        61466    61536      +70     
- Partials       9620     9657      +37     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-11 63.30% <64.51%> (+0.08%) ⬆️
java-21 63.32% <64.51%> (+0.09%) ⬆️
temurin 63.34% <64.51%> (+0.05%) ⬆️
unittests 63.34% <64.51%> (+0.05%) ⬆️
unittests1 55.51% <0.00%> (-0.04%) ⬇️
unittests2 34.26% <64.51%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@deepthi912 deepthi912 added memory Related to memory usage or optimization upsert Related to upsert functionality labels Mar 30, 2026
}

public void init(IndexSegment segment, int docId) {
clear();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why have we removed this? are we going to rely on manually clearing this map ourselves ?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

init is called whenever we call the previous row, and then merge with that row. Since we are anyway clearing this map manually after the merge, thought this might be duplicate and unnecessary to clear() again.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i feel other folks who have custom pinot forks may still be relying on init to clear. let's just keep it for now

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, added it back

@Jackie-Jiang Jackie-Jiang added the bug Something is not working as expected label Mar 30, 2026
MutableRoaringBitmap docIdsSnapshot = docIds.getMutableRoaringBitmap();
try (DataOutputStream dataOutputStream = new DataOutputStream(new FileOutputStream(tmpFile))) {
docIdsSnapshot.serialize(dataOutputStream);
byte[] bytes = docIds.toBytes();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you benchmark the performance of serialization vs clone? Will this cause longer pause?
Also, will this save heap memory? Should we consider serializing using off-heap memory?

Copy link
Copy Markdown
Collaborator Author

@deepthi912 deepthi912 Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I haven't seen any longer pause from the small benchmark I have taken . Older code was completely cloning the bitmap. Atleast with that we reduce the number of lock contentions as well as bytes which are eligible for gc immediately. With off-heap memory:

It might not be worth it here:

Direct buffers are slower to allocate. ByteBuffer.allocateDirect() requires a system call (mmap) and has higher per-allocation overhead. For a ~40KB buffer created per segment, this would add overhead dominate any benefit. For maybe larger bitmaps it might make sense but it doesn't explain the purpose. We might just use it to save the heap memory and bring down heap overhead or long gc pauses. I don't think in our scenario we had a long gc pause. @KKcorps correct me if I am wrong.

Copy link
Copy Markdown
Contributor

@xiangfu0 xiangfu0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Auto-approved: low-risk change, all CI checks passed. Reviewed by automated bot.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Optimizes upsert snapshot persistence and segment lifecycle bookkeeping to reduce heap/GC pressure during frequent commits, and fixes retention of previous-segment references in partial-upsert flows.

Changes:

  • Add byte-serialization snapshotting to ThreadSafeMutableRoaringBitmap and switch snapshot persistence to use serialized bytes instead of cloning bitmaps.
  • Eagerly remove replaced/dropped segments from _updatedSegmentsSinceLastSnapshot tracking.
  • Clear the reusable LazyRow after partial-upsert merges to release old segment references sooner.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/mutable/ThreadSafeMutableRoaringBitmap.java Adds synchronized toBytes() and getCardinality() helpers to snapshot the live bitmap without cloning.
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/indexsegment/immutable/ImmutableSegmentImpl.java Persists docIds snapshots by writing serialized bitmap bytes instead of serializing a cloned bitmap.
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/upsert/BasePartitionUpsertMetadataManager.java Eagerly cleans _updatedSegmentsSinceLastSnapshot on segment replace/drop; adjusts snapshot-taking logic and cardinality accounting.
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/upsert/ConcurrentMapPartitionUpsertMetadataManager.java Clears reusable LazyRow after partial-upsert merge to avoid retaining old segment state.
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/upsert/ConcurrentMapPartitionUpsertMetadataManagerForConsistentDeletes.java Same reusable-LazyRow cleanup as the non-consistent-deletes manager.
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/readers/LazyRow.java Adjusts LazyRow lifecycle by no longer clearing on init() and by nulling _segment on clear().

@Jackie-Jiang Jackie-Jiang merged commit b5f9bce into apache:master Mar 31, 2026
16 checks passed
KKcorps pushed a commit to KKcorps/incubator-pinot that referenced this pull request Apr 7, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something is not working as expected memory Related to memory usage or optimization upsert Related to upsert functionality

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants