
Add COLUMNAR_MAP index for per-key columnar storage for dense key and JSON storage for sparse key for MAP datatype #17896

Open
tarun11Mavani wants to merge 1 commit into apache:master from tarun11Mavani:sparse-map-1-storage

@tarun11Mavani
Contributor

@tarun11Mavani tarun11Mavani commented Mar 17, 2026

Summary

This is PR 1 of 2 introducing columnar MAP storage — an opt-in index for MAP columns in Apache Pinot that stores each key in its own columnar format with two-tier (dense/sparse) storage. This PR covers the full storage layer: SPI interfaces, binary format, segment creation, immutable reader, dictionary encoding, and lifecycle wiring.

Stack

  • PR 1 (this): Storage layer — SPI, index format, segment creation, immutable read path
  • PR 2: Query layer — DataSource, mutable segments, filter operators, optimizations

Motivation

Many Pinot use cases need to store and query semi-structured map data (e.g., user metrics, event properties, feature stores) where the set of keys varies across records. Today users must flatten maps into individual columns at ingestion time or store them as opaque JSON blobs without query pushdown. The columnar MAP index solves this by storing maps in a columnar per-key format that supports typed access, filtering, and GROUP BY at query time — without requiring schema changes when new keys appear.

The benefits are discussed in detail in #17894.
RFC: https://docs.google.com/document/d/14kPmjDTKbO8l0ql4rrN7I5Yki5pqMw6GeGmxxc9grsU/edit?tab=t.0

What's in this PR

Schema & Config (pinot-spi)

  • No new DataType — uses existing DataType.MAP from ComplexFieldSpec. Columnar storage is an opt-in index via indexTypes: ["COLUMNAR_MAP"] in table config
  • ComplexFieldSpec.MapFieldSpec — extended with optional keyTypes (per-key type declarations) and defaultValueType for undeclared keys
  • ColumnarMapIndexConfig — controls denseKeyThreshold, explicit denseKeys, maxKeys, inverted index enablement, and noDictionaryKeys
  • FieldConfig.IndexType.COLUMNAR_MAP — explicit index type enum for opt-in

SPI Interfaces (pinot-segment-spi)

  • ColumnarMapIndexCreator — segment-build-time interface: add(Map<String, Object>) per doc, seal() to flush
  • ColumnarMapIndexReader — query-time interface: typed getters (getInt, getString, ...), presence bitmaps, inverted index lookups, per-key DataSource access
  • Constants in V1Constants and StandardIndexes for index type registration
  • ColumnMetadataImpl — extended to persist per-key type declarations in segment metadata

Index Implementation (pinot-segment-local)

Binary format (.columnarmap.idx, SPMX v3) with two-tier storage:

+-------------------------+----------------------------------------------------------+
|         Section         |                        Contents                          |
+-------------------------+----------------------------------------------------------+
| Header (56 bytes)       | Magic 0x53504D58, version=3, numKeys, numDocs,           |
|                         | numDenseKeys, numSparseKeys, 4 offset pointers           |
+-------------------------+----------------------------------------------------------+
| Key Dictionary          | Sorted key name strings (dense + sparse)                 |
+-------------------------+----------------------------------------------------------+
| Key Metadata (70 B/key) | tierFlag (1B), storedType (1B), numDocsForKey (4B),      |
|                         | nullBitmapOffset/Len, fwdIndexOffset/Len,                |
|                         | invertedIndexOffset/Len, dictIdFwdOffset/Len             |
+-------------------------+----------------------------------------------------------+
| Per-Key Data (dense)    | Null bitmap (RLE-optimized Roaring) + forward index      |
|                         | (one entry per segment doc, indexed by docId) +          |
|                         | optional inverted index + dictId forward index           |
+-------------------------+----------------------------------------------------------+
| Value Dictionary        | Per-key sorted distinct values for dict-encoded keys     |
+-------------------------+----------------------------------------------------------+
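
The stated sizes are consistent with 4-byte counters and 8-byte offsets and lengths. The following is a minimal sketch of reading such a header, not the PR's actual reader: the class name, the field order (taken from the order listed in the table), the field widths, and the big-endian byte order are all assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical parser for the 56-byte SPMX header. Field order, widths, and byte order are assumptions. */
final class SpmxHeaderSketch {
  static final int MAGIC = 0x53504D58; // ASCII bytes 'S' 'P' 'M' 'X'

  // Six 4-byte ints (magic, version, numKeys, numDocs, numDenseKeys, numSparseKeys)
  // plus four 8-byte section offsets: 6 * 4 + 4 * 8 = 56.
  static final int HEADER_BYTES = 6 * Integer.BYTES + 4 * Long.BYTES;

  // tierFlag (1) + storedType (1) + numDocsForKey (4) + four (offset, length) pairs
  // of 8-byte longs: 1 + 1 + 4 + 4 * 16 = 70.
  static final int KEY_METADATA_BYTES = 1 + 1 + Integer.BYTES + 4 * 2 * Long.BYTES;

  final int version;
  final int numKeys;
  final int numDocs;
  final int numDenseKeys;
  final int numSparseKeys;
  final long[] sectionOffsets = new long[4];

  SpmxHeaderSketch(ByteBuffer source) {
    // duplicate() leaves the caller's position untouched; big-endian is assumed here.
    ByteBuffer buf = source.duplicate().order(ByteOrder.BIG_ENDIAN);
    if (buf.remaining() < HEADER_BYTES) {
      throw new IllegalArgumentException("Truncated header: " + buf.remaining() + " bytes");
    }
    int magic = buf.getInt();
    if (magic != MAGIC) {
      throw new IllegalArgumentException("Bad magic: 0x" + Integer.toHexString(magic));
    }
    version = buf.getInt();
    numKeys = buf.getInt();
    numDocs = buf.getInt();
    numDenseKeys = buf.getInt();
    numSparseKeys = buf.getInt();
    for (int i = 0; i < sectionOffsets.length; i++) {
      sectionOffsets[i] = buf.getLong();
    }
  }
}
```

Under these assumptions the two constants evaluate to 56 and 70, matching the sizes quoted in the table.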

Sparse sidecar file (.columnarmap.sparse):

+-------------------------+----------------------------------------------------------+
|         Section         |                        Contents                          |
+-------------------------+----------------------------------------------------------+
| Header                  | Magic 0x534D5350, numDocs                                |
+-------------------------+----------------------------------------------------------+
| Offset table            | int[numDocs+1] byte offsets into JSON blob section       |
+-------------------------+----------------------------------------------------------+
| JSON blobs              | Per-doc JSON containing only the sparse key/value pairs  |
+-------------------------+----------------------------------------------------------+
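
An offset table with numDocs+1 entries lets a reader slice one document's blob without scanning the others: document i occupies bytes offsets[i] up to (but excluding) offsets[i+1]. The sketch below illustrates that lookup in memory. The class and method names are hypothetical, UTF-8 encoding is assumed, and representing a document with no sparse keys as a zero-length blob is an assumption rather than something the PR states.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** In-memory stand-in for the sidecar's offset table and JSON blob section (hypothetical). */
final class SparseSidecarSketch {
  private final int[] offsets; // numDocs + 1 entries; doc i spans [offsets[i], offsets[i + 1])
  private final byte[] blobs;  // concatenated per-doc JSON bytes

  SparseSidecarSketch(String[] perDocJson) {
    offsets = new int[perDocJson.length + 1];
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (int docId = 0; docId < perDocJson.length; docId++) {
      byte[] bytes = perDocJson[docId].getBytes(StandardCharsets.UTF_8);
      out.write(bytes, 0, bytes.length);
      offsets[docId + 1] = out.size(); // end of this doc is the start of the next
    }
    blobs = out.toByteArray();
  }

  /** Returns the JSON text stored for one document using two offset-table reads. */
  String jsonForDoc(int docId) {
    if (docId < 0 || docId >= offsets.length - 1) {
      throw new IndexOutOfBoundsException("docId " + docId);
    }
    int start = offsets[docId];
    int end = offsets[docId + 1];
    return new String(blobs, start, end - start, StandardCharsets.UTF_8);
  }
}
```

The extra trailing entry is what removes the special case for the last document: its end offset is simply the total blob length.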

Two-tier storage design:

  • Dense tier — keys with fill rate > denseKeyThreshold (default 0.5) or explicitly listed in denseKeys. One forward index entry per segment document, O(1) access by docId, null bitmap tracks absent documents
  • Sparse tier — all remaining keys. Metadata-only in the SPMX file, data stored in sidecar JSON blobs. Optimizes storage for keys present in a small fraction of documents
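
The tier rule above can be sketched as a single pass over the documents. This is an illustration only: the class and method names are hypothetical, fill rate is taken to mean the fraction of segment documents that contain the key, and treating an explicitly listed key as dense even when it never appears in the data is an assumption.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical tier classifier: dense if fill rate > threshold or explicitly listed. */
final class TierClassifierSketch {
  static Set<String> denseKeys(List<Map<String, Object>> docs, double denseKeyThreshold,
      Set<String> explicitDenseKeys) {
    // Count how many documents contain each key.
    Map<String, Integer> docsPerKey = new HashMap<>();
    for (Map<String, Object> doc : docs) {
      for (String key : doc.keySet()) {
        docsPerKey.merge(key, 1, Integer::sum);
      }
    }
    // Explicitly listed keys are dense regardless of their fill rate.
    Set<String> dense = new TreeSet<>(explicitDenseKeys);
    for (Map.Entry<String, Integer> entry : docsPerKey.entrySet()) {
      double fillRate = (double) entry.getValue() / docs.size();
      if (fillRate > denseKeyThreshold) { // strictly greater, as in the description
        dense.add(entry.getKey());
      }
    }
    return dense; // every other observed key would fall into the sparse tier
  }
}
```

Note that with the strict comparison a key present in exactly half of the documents stays sparse at the default threshold of 0.5 unless it is listed in denseKeys.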

Key storage features:

  • Dictionary encoding by default — per-key dictionary with 0.85 size-ratio heuristic. noDictionaryKeys forces raw encoding
  • Dense forward index — direct docId-indexed access (no rank() call), with optimized null bitmaps for absent documents
  • Per-key inverted index — optional, for fast value-based filtering on dense keys
  • Co-iterator for absent-doc handling — uses presence bitmap to return default values for documents missing a key, avoiding full-segment-sized forward index for sparse keys
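
To illustrate how a dictionary-encoded dense key can be read by docId with a null bitmap, here is a small in-memory sketch. It is not the PR's reader: the names are hypothetical, java.util.BitSet stands in for the Roaring bitmap, only LONG values are handled, and returning a caller-supplied default for absent documents is an assumption about the read contract.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.Objects;

/** Hypothetical dense-key reader: one dictId per doc plus a bitmap of docs missing the key. */
final class DenseKeyReaderSketch {
  private final long[] dictionary; // sorted distinct values
  private final int[] dictIds;     // forward index, indexed directly by docId
  private final BitSet nullDocs;   // stand-in for the Roaring null bitmap
  private final long defaultValue;

  private DenseKeyReaderSketch(long[] dictionary, int[] dictIds, BitSet nullDocs, long defaultValue) {
    this.dictionary = dictionary;
    this.dictIds = dictIds;
    this.nullDocs = nullDocs;
    this.defaultValue = defaultValue;
  }

  /** Builds the structures from per-doc values, where null means the key is absent. */
  static DenseKeyReaderSketch build(Long[] valuesByDocId, long defaultValue) {
    long[] dictionary = Arrays.stream(valuesByDocId).filter(Objects::nonNull)
        .mapToLong(Long::longValue).distinct().sorted().toArray();
    int[] dictIds = new int[valuesByDocId.length];
    BitSet nullDocs = new BitSet(valuesByDocId.length);
    for (int docId = 0; docId < valuesByDocId.length; docId++) {
      Long value = valuesByDocId[docId];
      if (value == null) {
        nullDocs.set(docId);
      } else {
        dictIds[docId] = Arrays.binarySearch(dictionary, value.longValue());
      }
    }
    return new DenseKeyReaderSketch(dictionary, dictIds, nullDocs, defaultValue);
  }

  boolean isPresent(int docId) {
    return !nullDocs.get(docId);
  }

  /** Array lookups only; no rank() over a presence bitmap is needed. */
  long getLong(int docId) {
    return nullDocs.get(docId) ? defaultValue : dictionary[dictIds[docId]];
  }

  int dictionarySize() {
    return dictionary.length;
  }
}
```

Because every segment document has a slot in the forward index, the docId is used as the array index directly, which is the trade-off the dense tier makes in exchange for storing entries for absent documents.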

Segment Creation Wiring

  • BaseSegmentCreator — skips forward index creation for MAP columns with COLUMNAR_MAP enabled; persists per-key type metadata in segment properties
  • ColumnarMapColumnPreIndexStatsCollector — lightweight stats collector (doc count only, no min/max/cardinality)
  • StatsCollectorUtil — routes MAP columns to the columnar map stats collector
  • ColumnMinMaxValueGenerator — skips min/max generation for MAP columns
  • SegmentGeneratorConfig — exposes getColumnarMapColumnNames() for metadata persistence

How to Use

1. Schema definition (complexFieldSpecs)

{
  "schemaName": "myTable",
  "complexFieldSpecs": [
    {
      "name": "metrics",
      "dataType": "MAP",
      "keyTypes": {
        "clicks": "LONG",
        "spend": "DOUBLE",
        "country": "STRING"
      },
      "defaultValueType": "STRING"
    }
  ]
}
  • keyTypes (optional) — declares known keys and their data types for type coercion
  • defaultValueType (optional) — type for undeclared/dynamic keys (defaults to STRING)
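
As a rough illustration of what type coercion driven by keyTypes and defaultValueType could look like at ingestion time, consider the sketch below. The class, the ValueType enum, and the coercion rules are hypothetical stand-ins and do not come from the PR.

```java
import java.util.Map;

/** Hypothetical per-key coercion: declared keys use their type, others use the default type. */
final class KeyTypeCoercionSketch {
  enum ValueType { INT, LONG, FLOAT, DOUBLE, STRING }

  static Object coerce(String key, Object raw, Map<String, ValueType> keyTypes,
      ValueType defaultValueType) {
    ValueType type = keyTypes.getOrDefault(key, defaultValueType);
    String text = String.valueOf(raw);
    switch (type) {
      case INT:
        return raw instanceof Number ? ((Number) raw).intValue() : Integer.parseInt(text);
      case LONG:
        return raw instanceof Number ? ((Number) raw).longValue() : Long.parseLong(text);
      case FLOAT:
        return raw instanceof Number ? ((Number) raw).floatValue() : Float.parseFloat(text);
      case DOUBLE:
        return raw instanceof Number ? ((Number) raw).doubleValue() : Double.parseDouble(text);
      case STRING:
        return text;
      default:
        throw new IllegalArgumentException("Unsupported type: " + type);
    }
  }
}
```

With the schema above, a "clicks" value arriving as the string "42" would be stored as a LONG, while an undeclared key such as "sessions" would fall back to STRING.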

2. Table config (fieldConfigList with COLUMNAR_MAP index)

{
  "fieldConfigList": [
    {
      "name": "metrics",
      "indexTypes": ["COLUMNAR_MAP"],
      "properties": {
        "maxKeys": "1000",
        "denseKeyThreshold": "0.5",
        "denseKeys": "country,sessions",
        "enableInvertedIndexForAll": "false",
        "invertedIndexKeys": "country,sessions"
      }
    }
  ]
}
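
The properties block is a flat string-to-string map, so the comma-separated key lists have to be split by whoever consumes it. The sketch below shows one way that parsing might look. It is not ColumnarMapIndexConfig itself: the class name is hypothetical, the rule that enableInvertedIndexForAll overrides invertedIndexKeys is an assumption, and apart from the 0.5 threshold stated above the fallback values are placeholders.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical parser for the fieldConfig "properties" map shown above. */
final class ColumnarMapPropertiesSketch {
  final int maxKeys;
  final double denseKeyThreshold;
  final Set<String> denseKeys;
  final boolean enableInvertedIndexForAll;
  final Set<String> invertedIndexKeys;

  ColumnarMapPropertiesSketch(Map<String, String> props) {
    // Integer.MAX_VALUE is a placeholder for "no limit configured", not a documented default.
    maxKeys = Integer.parseInt(props.getOrDefault("maxKeys", String.valueOf(Integer.MAX_VALUE)));
    // 0.5 is the default named in the PR description.
    denseKeyThreshold = Double.parseDouble(props.getOrDefault("denseKeyThreshold", "0.5"));
    denseKeys = splitCsv(props.get("denseKeys"));
    enableInvertedIndexForAll = Boolean.parseBoolean(props.get("enableInvertedIndexForAll"));
    invertedIndexKeys = splitCsv(props.get("invertedIndexKeys"));
  }

  /** Assumed semantics: the "for all" flag wins, otherwise the key must be listed. */
  boolean hasInvertedIndex(String key) {
    return enableInvertedIndexForAll || invertedIndexKeys.contains(key);
  }

  private static Set<String> splitCsv(String csv) {
    Set<String> out = new LinkedHashSet<>();
    if (csv != null) {
      for (String part : csv.split(",")) {
        String trimmed = part.trim();
        if (!trimmed.isEmpty()) {
          out.add(trimmed);
        }
      }
    }
    return Collections.unmodifiableSet(out);
  }
}
```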

What's NOT in this PR (comes in PR 2)

  • ColumnarMapDataSource — per-key query routing and DataSource construction
  • MutableColumnarMapIndexImpl — consuming segment support with O(1) lock-free reads
  • MapFilterOperator — per-key inverted index filter strategy, IS NULL/IS NOT NULL
  • ItemTransformFunction — null bitmap propagation for item() expressions
  • ImmutableSegmentImpl / MutableSegmentImpl — segment loading wiring
  • Performance optimizations (TransformBlock bypass, NonScanBasedAggregation, FixedBitSVForwardIndexReaderV2)

Test plan

  • ColumnarMapDataTypeTest — 11 tests (schema serialization, ComplexFieldSpec round-trip, keyTypes/defaultValueType)
  • ColumnarMapIndexConfigTest — 8 tests (config deserialization, denseKeys, properties round-trip)
  • ColumnarMapSegmentCreationTest — 1 test (end-to-end segment build pipeline with COLUMNAR_MAP index)
  • All 20 tests pass

@tarun11Mavani tarun11Mavani changed the title from "Add SPARSE_MAP storage layer: data type, SPI, segment creation, index…" to "[Draft] Add SPARSE_MAP storage layer: data type, SPI, segment creation, index…" Mar 17, 2026
@codecov-commenter

codecov-commenter commented Mar 17, 2026

Codecov Report

❌ Patch coverage is 25.63492% with 937 lines in your changes missing coverage. Please review.
✅ Project coverage is 34.93%. Comparing base (aa483d3) to head (545d24e).
⚠️ Report is 40 commits behind head on master.

Files with missing lines                              | Patch % | Lines
...x/columnarmap/ImmutableColumnarMapIndexReader.java | 0.00%   | 364 Missing
...dex/columnarmap/OnHeapColumnarMapIndexCreator.java | 48.33%  | 217 Missing and 31 partials
...nt/index/columnarmap/ColumnarMapKeyDictionary.java | 2.08%   | 92 Missing and 2 partials
.../columnarmap/ColumnarMapKeyForwardIndexReader.java | 0.00%   | 62 Missing
...va/org/apache/pinot/spi/data/ComplexFieldSpec.java | 24.07%  | 33 Missing and 8 partials
...pinot/spi/config/table/ColumnarMapIndexConfig.java | 26.92%  | 33 Missing and 5 partials
...ent/index/columnarmap/ColumnarMapIndexHandler.java | 44.44%  | 17 Missing and 3 partials
...local/segment/creator/impl/BaseSegmentCreator.java | 41.66%  | 9 Missing and 5 partials
...egment/index/columnarmap/ColumnarMapIndexType.java | 51.85%  | 11 Missing and 2 partials
...segment/spi/index/metadata/ColumnMetadataImpl.java | 0.00%   | 13 Missing
... and 8 more

❗ There is a different number of reports uploaded between BASE (aa483d3) and HEAD (545d24e).

HEAD has 16 uploads less than BASE

Flag                | BASE (aa483d3) | HEAD (545d24e)
java-21             | 5              | 3
unittests1          | 2              | 0
unittests           | 4              | 2
temurin             | 10             | 6
java-11             | 5              | 3
integration         | 6              | 4
custom-integration1 | 2              | 0
Additional details and impacted files
@@              Coverage Diff              @@
##             master   #17896       +/-   ##
=============================================
- Coverage     63.31%   34.93%   -28.39%     
+ Complexity     1627      789      -838     
=============================================
  Files          3229     3255       +26     
  Lines        196705   198612     +1907     
  Branches      30408    30770      +362     
=============================================
- Hits         124544    69380    -55164     
- Misses        62183   123101    +60918     
+ Partials       9978     6131     -3847     
Flag Coverage Δ
custom-integration1 ?
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-11 34.91% <25.63%> (-28.37%) ⬇️
java-21 34.90% <25.63%> (-28.37%) ⬇️
temurin 34.93% <25.63%> (-28.39%) ⬇️
unittests 34.92% <25.63%> (-28.39%) ⬇️
unittests1 ?
unittests2 34.92% <25.63%> (-0.05%) ⬇️


@raghavyadav01
Collaborator

@tarun11Mavani Did you explore adding sparse map handling as part of the existing MAP type (ComplexFieldSpec)? What were the challenges? I was wondering whether you could plug in a SparseIndexCreator/reader (maybe via a config) under the MAP data type rather than creating a new data type.

@xiangfu0 xiangfu0 added labels: ingestion ("Related to data ingestion pipeline"), index ("Related to indexing (general)") Mar 20, 2026
@tarun11Mavani
Contributor Author

Actually, I started with that approach (extending the existing MAP type) before deciding to move to a dedicated data type.

The short answer is that MAP and SPARSE_MAP have different enough semantics that extending MAP cleanly is harder than it appears.

What MAP currently does: a MAP column has a single forward index storing the raw map blob per document. Key access is done by deserializing the full map on the fly via MapIndexReaderWrapper. The $$key/$$value naming only appears in segment metadata properties, not as separate physical files.

Why plugging SparseMapIndexCreator into MAP is non-trivial:

  1. Schema model mismatch. ComplexFieldSpec for MAP has a single homogeneous value type (one DataType for all values). SPARSE_MAP's defining feature is per-key typed values (keyTypes: { "price": DOUBLE, "color": STRING }). There's no place to express this in the current MAP field spec without adding more conditional fields that are only valid if sparse_map = true.

  2. Access pattern is fundamentally different. MAP's MapIndexReaderWrapper deserializes the full map to access any key. SPARSE_MAP uses per-key presence bitmaps and typed columnar storage for O(1) typed key reads without full deserialization.

  3. No per-key inverted index in MAP. SPARSE_MAP adds optional per-key inverted indexes and dictionary-based GROUP BY support. Wiring these into MAP's BaseMapDataSource/MapIndexReader SPI would require extending those interfaces in ways that don't apply to regular MAP.

  4. Scattered conditionals and future maintenance risk. Routing sparse vs. non-sparse MAP through the same DataType means sprinkling if (isSparse) branches across multiple core flows. A new DataType makes the two code paths explicit and independently testable, with no risk of one silently breaking the other.


That said, a hybrid path is difficult but possible. Happy to explore that direction if the community prefers not adding a new DataType.

@raghavyadav01
Collaborator

Thanks for the detailed write-up and the work put into this.

My main question before this moves forward: do we need a new DataType here, or can
this be an opt-in index implementation under the existing MAP type?

Looking at the existing SPI, MapIndexReader already has exactly the right
abstractions:

Map<IndexType, R> getKeyIndexes(String key);
IndexReader getKeyReader(String key, IndexType type);
FieldSpec getKeyFieldSpec(String key);
ColumnMetadata getKeyMetadata(String key);

And MapDataSource.getKeyDataSource(key) already returns a per-key DataSource —
which is precisely where filter operators look for indexes. The current
MapIndexReaderWrapper just implements these interfaces badly (full blob
deserialization). ImmutableSparseMapIndexReader looks like a better implementation
of MapIndexReader, not a fundamentally different contract.

The per-key typing gap (ComplexFieldSpec has one homogeneous value type) seems
addressable with a small backward-compatible addition:

{
  "name": "properties",
  "dataType": "MAP",
  "keyType": "STRING",
  "valueType": "STRING",
  "keyTypes": { "price": "DOUBLE", "country": "STRING" }
}

Existing schemas are unaffected; keyTypes absent → current behavior.

If we go this route, the core storage work (OnHeapSparseMapIndexCreator,
ImmutableSparseMapIndexReader, MutableSparseMapIndexImpl, the binary format) stays
as-is — just wired as a MAP index implementation rather than a new DataType. The
high-blast-radius changes (FieldSpec.DataType enum, PinotDataType,
DataSchema.ColumnDataType, DataBlockBuilder, DataBlockExtractUtils, TypeFactory)
would largely drop out of the PR.

The query layer (PR 2) should also work without new operator types since the
existing filter path already goes through MapDataSource.getKeyDataSource(key) →
standard DataSource → FilterPlanNode index selection.

@tarun11Mavani
Contributor Author

Thanks for the review! I have addressed this. The latest changes remove SPARSE_MAP from DataType/FieldType/ColumnDataType entirely and wire everything under the existing DataType.MAP.
Columnar storage is now an opt-in index via indexTypes: ["COLUMNAR_MAP"] in table config's fieldConfigList, with keyTypes/defaultValueType added as optional fields on ComplexFieldSpec.MapFieldSpec. The storage layer (binary format, creator, reader) is unchanged — just the wiring moved from a separate DataType to an index implementation on MAP.

@tarun11Mavani tarun11Mavani changed the title from "[Draft] Add SPARSE_MAP storage layer: data type, SPI, segment creation, index…" to "Add COLUMNAR_MAP index for per-key columnar storage on MAP columns" Mar 27, 2026
@tarun11Mavani tarun11Mavani force-pushed the sparse-map-1-storage branch 3 times, most recently from 16e23f0 to d66fb54 on March 27, 2026 14:17
@raghavyadav01
Collaborator

@tarun11Mavani Do we need a new index type? I think the storage format could be a flag in the forward index itself.
Do you see any advantage in using a separate index?

@tarun11Mavani
Contributor Author

I had two options in mind for enabling columnar storage on the MAP type.

Option A (current): Explicit IndexType.COLUMNAR_MAP in fieldConfigList
Option B: Derive from the schema. If keyTypes is present in MapFieldSpec, automatically use columnar storage

I went with A because it follows the same pattern as TEXT, JSON, and RANGE indexes — every storage/index optimization in Pinot is table-config-driven via fieldConfigList. The schema defines the logical model (what the data is), table config defines the physical storage (how to store it). This separation means:

  1. Same schema works with blob or columnar storage — you can toggle without a schema change
  2. Storage-level properties (maxKeys, invertedIndexKeys, noDictionaryKeys) live naturally in fieldConfig.properties, exactly like TEXT index properties
  3. Existing MAP tables are safe — no behavioral change unless you explicitly opt in

Option B couples type semantics (keyTypes) with storage layout, and still needs table config for properties anyway — so it doesn't actually eliminate the second config location, it just removes the activation flag.
I also feel that switching to columnar storage based on keyTypes is not very intuitive for users, as they might want to provide keyTypes but still use blob storage.

The dispatch in ImmutableSegmentImpl is clean: if columnarMapReader != null → ColumnarMapDataSource, else → ImmutableMapDataSource (blob path). Both implement MapDataSource, so the query layer uses the same MAP index APIs regardless of storage mode.

Happy to rename COLUMNAR_MAP to just MAP in the IndexType enum if that reads better — the key point is keeping activation in table config rather than schema.

@tarun11Mavani tarun11Mavani force-pushed the sparse-map-1-storage branch 2 times, most recently from b58b73f to a276e48 on April 7, 2026 07:30
@ankitsultana
Contributor

Do we have a conclusion here? Did we decide on index type or field type?

@tarun11Mavani
Contributor Author

For now, I have implemented this as a new index type for the MAP data type. I am connecting with @raghavyadav01 and @Jackie-Jiang tomorrow to discuss this further.

@tarun11Mavani
Contributor Author

tarun11Mavani commented Apr 17, 2026

Here is the RFC based on the offline discussion with @Jackie-Jiang and @raghavyadav01:
https://docs.google.com/document/d/14kPmjDTKbO8l0ql4rrN7I5Yki5pqMw6GeGmxxc9grsU/edit?tab=t.0

Once the design looks good, I will split this work into smaller PRs.

@tarun11Mavani tarun11Mavani changed the title from "Add COLUMNAR_MAP index for per-key columnar storage on MAP columns" to "[draft] [breaking this in multiple prs] Add COLUMNAR_MAP index for per-key columnar storage on MAP columns" Apr 18, 2026
@tarun11Mavani tarun11Mavani changed the title from "[draft] [breaking this in multiple prs] Add COLUMNAR_MAP index for per-key columnar storage on MAP columns" to "[Draft] [breaking this in multiple prs] Add COLUMNAR_MAP index for per-key columnar storage on MAP columns" Apr 18, 2026
@tarun11Mavani tarun11Mavani changed the title from "[Draft] [breaking this in multiple prs] Add COLUMNAR_MAP index for per-key columnar storage on MAP columns" to "Add COLUMNAR_MAP index for per-key columnar storage on MAP columns" Apr 18, 2026
@tarun11Mavani tarun11Mavani changed the title from "Add COLUMNAR_MAP index for per-key columnar storage on MAP columns" to "Add COLUMNAR_MAP index for per-key columnar storage for dense key and JSON storage for sparse key for MAP datatype" Apr 18, 2026
@tarun11Mavani tarun11Mavani force-pushed the sparse-map-1-storage branch 2 times, most recently from 40651ac to e054bb0 on April 19, 2026 13:26
…immutable read path

Introduces the COLUMNAR_MAP index type for MAP columns with per-key columnar
storage. Includes ComplexFieldSpec enhancements, SPMX v3 binary format with
dense/sparse two-tier storage, dictionary encoding, forward index reader with
co-iterator, per-key inverted index, and index plugin/type/handler wiring.

Format details:
- 56-byte header (magic + version + numKeys + numDocs + numDenseKeys +
  numSparseKeys + 4 section offsets)
- 70-byte key metadata (tier flag + storedType + numDocs + 4 offset/length
  pairs for nullBitmap/forward/inverted/dictIdForward)
- Dense tier: full forward index per key with run-optimized null bitmap
- Sparse tier: JSON sidecar file with per-key SPMX entries reduced to type
  metadata (per-key presence bitmap added in PR-2 query layer)

Quality fixes (from self-review):
- sortValues() uses type-aware comparator matching ColumnarMapKeyDictionary,
  preventing wrong range query results and GROUP BY ordering for numeric keys
- Sparse sidecar JSON serialization uses Jackson ObjectMapper to handle
  control characters per RFC 8259
- Class-level Javadoc accurately documents the 56-byte header and 70-byte
  key metadata layout
- StandardIndexes.columnarMap() returns parameterized IndexType<> matching
  other accessor methods
- Preconditions.checkState guards bufferSize long-to-int cast
- ColumnarMapIndexHandler.updateIndices declares throws Exception
- DataOutputStream wrapped in try-with-resources
- WARN log when sparse sidecar missing but SPMX has sparse keys

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
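
The sortValues() fix listed in the commit message matters because numeric values sorted by their string form come out in the wrong order, which would break binary search, range lookups, and ordered output on a sorted dictionary. The sketch below illustrates the difference. The names and the StoredType enum are hypothetical, and this is not the comparator used in ColumnarMapKeyDictionary.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;

/** Hypothetical illustration of type-aware versus string-based value ordering. */
final class TypeAwareSortSketch {
  enum StoredType { LONG, DOUBLE, STRING }

  static Comparator<Object> comparatorFor(StoredType type) {
    switch (type) {
      case LONG:
        return Comparator.comparingLong(v -> ((Number) v).longValue());
      case DOUBLE:
        return Comparator.comparingDouble(v -> ((Number) v).doubleValue());
      default:
        // Falls back to lexicographic order on the string form.
        return Comparator.comparing(Object::toString);
    }
  }

  static List<Object> sortValues(Collection<?> values, StoredType type) {
    List<Object> sorted = new ArrayList<>(values);
    sorted.sort(comparatorFor(type));
    return sorted;
  }
}
```

For the values 10, 9, and 100, the LONG comparator yields 9, 10, 100, while the string comparator yields 10, 100, 9.
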
