Skip to content

Remove native text and FST index support#17998

Merged
xiangfu0 merged 13 commits intoapache:masterfrom
xiangfu0:codex/remove-native-text-index
Apr 1, 2026
Merged

Remove native text and FST index support#17998
xiangfu0 merged 13 commits intoapache:masterfrom
xiangfu0:codex/remove-native-text-index

Conversation

@xiangfu0
Copy link
Copy Markdown
Contributor

@xiangfu0 xiangfu0 commented Mar 27, 2026

Summary

  • remove the native text index reader, creator, mutable index, benchmark, and native-only tests
  • remove TEXT_CONTAINS support because it depended on the native text index implementation
  • remove the native FST creator, reader, realtime mutable index, benchmark, nativefst package, and native-only tests
  • remove the remaining Pinot-local utils/fst package and have the Lucene FST/IFST creators and readers use Lucene APIs directly
  • make Pinot text and FST indexing Lucene-only while keeping only the legacy config rejection and on-disk cleanup or rebuild hooks

Why

Pinot introduced native text and native FST support across several PRs, but the codebase still carried native-specific implementations, query handling, tests, benchmarks, and compatibility surface after the Lucene path became the only intended implementation. This change finishes that cleanup, removes the leftover Pinot-local Lucene FST utility layer, and keeps only the compatibility hooks needed to reject old configs and migrate old on-disk indexes during reload.

User Impact

  • TEXT indexes now use Lucene only.
  • FST indexes now use Lucene only.
  • Existing .nativetext.idx files are removed on reload instead of being treated as live text indexes.
  • Existing legacy native FST files are rebuilt as Lucene FST indexes on reload when the table config still enables FST.
  • TEXT_CONTAINS is no longer supported.
  • Table configs that still specify native text or native FST indexing are rejected instead of silently using removed implementations.

Out Of Scope

  • Compatibility parsing for legacy configs remains so Pinot can reject FSTType.NATIVE and fstType=native explicitly.
  • IFST was already Lucene-only before this PR; this cleanup removes the remaining Pinot-local helper layer rather than changing IFST semantics.

How To Reproduce

  1. Configure a TEXT index with fstType=native, configure an FST index with FSTType.NATIVE, run a query with TEXT_CONTAINS, or load a segment that still contains native text or legacy native FST files.
  2. Observe that Pinot previously still carried native reader or creator code paths, native-only query behavior, and native-only tests and benchmarks, plus a separate Pinot-local Lucene FST utility package.
  3. Apply this change and reload the segment, validate the table config again, or rerun the query.

Validation

  • ./mvnw -pl pinot-core,pinot-segment-local,pinot-segment-spi,pinot-query-runtime,pinot-integration-tests,pinot-perf spotless:apply -DskipTests
  • ./mvnw -pl pinot-core,pinot-segment-local,pinot-segment-spi,pinot-query-runtime,pinot-integration-tests,pinot-perf checkstyle:check license:format license:check -DskipTests
  • ./mvnw -pl pinot-core,pinot-query-runtime -am -Dtest=TextMatchTransformFunctionTest,TableIndexingTest,QueryRunnerTest -Dsurefire.failIfNoSpecifiedTests=false test -rf :pinot-core
  • ./mvnw -pl pinot-common,pinot-core,pinot-query-planner,pinot-query-runtime -am -Dtest=FunctionRegistryTest,FilterOperatorUtilsTest,QueryCompilationTest,QueryRunnerTest -Dsurefire.failIfNoSpecifiedTests=false test
  • ./mvnw -pl pinot-segment-local,pinot-integration-tests,pinot-perf -am -DskipTests test-compile
  • ./mvnw -pl pinot-spi,pinot-segment-spi spotless:apply -DskipTests
  • ./mvnw -pl pinot-spi,pinot-segment-spi checkstyle:check license:format license:check -DskipTests
  • ./mvnw -pl pinot-spi,pinot-segment-spi -am -Dtest=TextIndexConfigTest,FstIndexConfigTest -Dsurefire.failIfNoSpecifiedTests=false test
  • ./mvnw -pl pinot-segment-local -am -Dtest=SegmentPreProcessorTest,FstIndexTypeTest,FilePerIndexDirectoryTest -Dsurefire.failIfNoSpecifiedTests=false test
  • ./mvnw -pl pinot-core -am -Dtest=FSTBasedRegexpLikeQueriesTest,IFSTBasedRegexpLikeQueriesTest,CrcUtilsTest -Dsurefire.failIfNoSpecifiedTests=false test
  • ./mvnw -pl pinot-core,pinot-segment-local,pinot-segment-spi,pinot-spi,pinot-perf spotless:apply -DskipTests
  • ./mvnw -pl pinot-core,pinot-segment-local,pinot-segment-spi,pinot-spi,pinot-perf checkstyle:check -DskipTests
  • ./mvnw -pl pinot-core,pinot-segment-local,pinot-segment-spi,pinot-spi,pinot-perf license:format -DskipTests
  • ./mvnw -pl pinot-core,pinot-segment-local,pinot-segment-spi,pinot-spi,pinot-perf license:check -DskipTests
  • ./mvnw -pl pinot-perf -am -DskipTests test-compile
  • ./mvnw -pl pinot-segment-local spotless:apply -DskipTests
  • ./mvnw -pl pinot-segment-local checkstyle:check license:format license:check -DskipTests
  • ./mvnw -q -pl pinot-segment-local -DskipTests test-compile
  • ./mvnw -q -pl pinot-segment-local -Dtest=LuceneFSTIndexCreatorTest,LuceneIFSTIndexCreatorTest,FstIndexTypeTest,IFSTIndexTypeTest,SegmentPreProcessorTest,FilePerIndexDirectoryTest -Dsurefire.failIfNoSpecifiedTests=false test
  • ./mvnw -q -pl pinot-core -Dtest=FSTBasedRegexpLikeQueriesTest,IFSTBasedRegexpLikeQueriesTest -Dsurefire.failIfNoSpecifiedTests=false test

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Mar 27, 2026

Codecov Report

❌ Patch coverage is 69.11765% with 21 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.74%. Comparing base (cfd7268) to head (f717c19).
⚠️ Report is 22 commits behind head on master.

Files with missing lines Patch % Lines
...pache/pinot/segment/spi/index/TextIndexConfig.java 14.28% 6 Missing ⚠️
...nt/index/loader/invertedindex/FSTIndexHandler.java 73.68% 4 Missing and 1 partial ⚠️
...t/index/loader/invertedindex/TextIndexHandler.java 81.81% 2 Missing and 2 partials ⚠️
.../segment/local/segment/index/fst/FstIndexType.java 66.66% 1 Missing and 1 partial ⚠️
...segment/local/segment/index/fst/FstIndexUtils.java 33.33% 1 Missing and 1 partial ⚠️
...segment/local/segment/index/fst/IFSTIndexType.java 50.00% 1 Missing ⚠️
...apache/pinot/segment/spi/index/FstIndexConfig.java 50.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #17998      +/-   ##
============================================
+ Coverage     63.27%   63.74%   +0.47%     
+ Complexity     1543     1541       -2     
============================================
  Files          3200     3160      -40     
  Lines        194074   190714    -3360     
  Branches      29883    29267     -616     
============================================
- Hits         122792   121563    -1229     
+ Misses        61637    59595    -2042     
+ Partials       9645     9556      -89     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-11 63.68% <69.11%> (+0.43%) ⬆️
java-21 63.71% <69.11%> (+8.23%) ⬆️
temurin 63.74% <69.11%> (+0.47%) ⬆️
unittests 63.73% <69.11%> (+0.47%) ⬆️
unittests1 56.25% <44.11%> (+0.74%) ⬆️
unittests2 34.14% <64.70%> (-0.06%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@xiangfu0 xiangfu0 added the deprecation Marks deprecated APIs, configs, or features label Mar 27, 2026
@xiangfu0 xiangfu0 marked this pull request as ready for review March 27, 2026 18:33
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR removes Pinot’s native text index implementation and makes text indexing Lucene-only, while adding compatibility behavior to reject legacy native configs and clean up legacy native on-disk artifacts during segment reload/rebuild.

Changes:

  • Delete native text index creator/reader/mutable index implementations, related tests, and performance benchmarks.
  • Reject fstType=native text index configurations and drop TEXT_CONTAINS support (was native-dependent).
  • Add cleanup logic intended to remove legacy .nativetext.idx files and rebuild Lucene text indexes where still configured.

Reviewed changes

Copilot reviewed 26 out of 26 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/store/SegmentDirectoryPaths.java Removes native text index path helper.
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/multicolumntext/MultiColumnTextMetadata.java Drops native fstType from shared multi-col text properties.
pinot-segment-local/src/test/java/org/apache/pinot/segment/local/segment/store/FilePerIndexDirectoryTest.java Updates tests to treat .nativetext.idx as legacy/ignored and ensure it is deleted on remove.
pinot-segment-local/src/test/java/org/apache/pinot/segment/local/segment/index/creator/NativeTextIndexCreatorTest.java Removes native text index creator/reader unit tests.
pinot-segment-local/src/test/java/org/apache/pinot/segment/local/realtime/impl/invertedindex/NativeMutableTextIndexReaderWriterTest.java Removes realtime native mutable text index tests.
pinot-segment-local/src/test/java/org/apache/pinot/segment/local/realtime/impl/invertedindex/NativeMutableTextIndexConcurrentTest.java Removes native mutable text index concurrency tests.
pinot-segment-local/src/test/java/org/apache/pinot/segment/local/realtime/impl/invertedindex/NativeAndLuceneMutableTextIndexTest.java Removes native-vs-lucene realtime comparison tests.
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/store/TextIndexUtils.java Makes cleanup helper public and removes native index presence checks from hasTextIndex.
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/text/TextIndexType.java Enforces “Lucene-only” by rejecting FSTType.NATIVE at validation/creation time and removes native reader/mutable index creation.
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/text/TextIndexConfigBuilder.java Stops trying to infer fstType from properties; fstType is now provided by the caller.
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/readers/text/NativeTextIndexReader.java Deletes native text index reader implementation.
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/invertedindex/TextIndexHandler.java Adds logic intended to detect/remove legacy native text index files on reload and rebuild Lucene text indexes.
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/converter/SegmentV1V2ToV3FormatConverter.java Stops copying native text index files during v1/v2→v3 conversion.
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/text/NativeTextIndexCreator.java Deletes native text index creator implementation.
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/realtime/impl/invertedindex/NativeMutableTextIndex.java Deletes realtime native mutable text index implementation.
pinot-query-runtime/src/test/java/org/apache/pinot/query/runtime/queries/QueryRunnerTest.java Updates expected error message for textContains() now that native is removed.
pinot-perf/src/main/java/org/apache/pinot/perf/BenchmarkNativeVsLuceneTextIndex.java Removes native vs lucene benchmark.
pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/custom/TextIndicesTest.java Removes native-text-index integration coverage and schema/config elements.
pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/custom/MultiColumnTextIndicesTest.java Removes native-text-index integration coverage and schema/config elements.
pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/StaleSegmentCheckIntegrationTest.java Removes native fst/native text config helpers from the stale-segment test.
pinot-core/src/test/resources/TableIndexingTest.csv Removes native_text_index cases from indexing test matrix.
pinot-core/src/test/java/org/apache/pinot/queries/TextMatchTransformFunctionTest.java Removes test case for TEXT_MATCH on native text index.
pinot-core/src/test/java/org/apache/pinot/queries/NativeAndLuceneComparisonTest.java Removes native-vs-lucene query comparison tests.
pinot-core/src/test/java/org/apache/pinot/core/data/manager/TableIndexingTest.java Removes native_text_index from test case generation and expectations.
pinot-core/src/main/java/org/apache/pinot/core/plan/FilterPlanNode.java Makes TEXT_CONTAINS unsupported at execution time with an explicit exception.
pinot-core/src/main/java/org/apache/pinot/core/operator/transform/function/TextMatchTransformFunction.java Removes checks related to native text index readers (no longer applicable).

@xiangfu0 xiangfu0 force-pushed the codex/remove-native-text-index branch from 7553a7e to bb5455c Compare March 27, 2026 19:16
@xiangfu0 xiangfu0 changed the title [codex] Remove native text index support Remove native text index support Mar 27, 2026
@xiangfu0 xiangfu0 changed the title Remove native text index support Remove native text and FST index support Mar 27, 2026
@xiangfu0 xiangfu0 force-pushed the codex/remove-native-text-index branch from 3f20784 to b07fbe7 Compare March 27, 2026 20:38
@xiangfu0 xiangfu0 requested a review from Copilot March 27, 2026 21:13
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 122 out of 126 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/fst/FstIndexUtils.java:35

  • isLegacyNativeFst calls fstBuffer.getInt(0) unconditionally. If the FST index file is empty/truncated, this can throw and fail segment reload with a confusing error. Consider guarding with a size check (e.g., require at least 4 bytes) and treating too-small buffers as non-legacy (or throwing a clearer exception).

@xiangfu0 xiangfu0 requested a review from Copilot March 27, 2026 22:56
Comment thread .github/workflows/pinot_tests.yml Outdated
Comment thread pinot-spi/src/main/java/org/apache/pinot/spi/config/table/FSTType.java Outdated
@xiangfu0 xiangfu0 force-pushed the codex/remove-native-text-index branch 3 times, most recently from a2f32b0 to 864cd61 Compare March 29, 2026 07:12
@xiangfu0 xiangfu0 force-pushed the codex/remove-native-text-index branch 3 times, most recently from c633528 to 0aa66d9 Compare March 29, 2026 07:28
@xiangfu0 xiangfu0 force-pushed the codex/remove-native-text-index branch from 0aa66d9 to 7a4ef98 Compare March 29, 2026 07:33
@xiangfu0 xiangfu0 requested a review from Jackie-Jiang March 29, 2026 07:33
- Remove unused findNativeTextIndexIndexFile() from SegmentDirectoryPaths
- Combine contains+remove into single remove() in FSTIndexHandler
- Replace cleanupTextIndexFiles() with segmentWriter.removeIndex() for
  managed indexes; keep targeted deletion only for legacy .nativetext.idx
- Make LEGACY_NATIVE_FST_MAGIC private (only used internally)
- Remove dead compatibility tests (newNativeTypeIsIgnored, etc.)
- Clean up unused imports

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove withLegacyLuceneTypeIgnored, withLegacyNativeTypeIgnored from
FstIndexConfigTest and withLegacyNativeFstTypeIgnored from
TextIndexConfigTest. These only verified that Jackson ignores unknown
JSON properties and did not exercise any live Pinot behavior.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@xiangfu0 xiangfu0 requested a review from Jackie-Jiang March 29, 2026 19:57
xiangfu0 and others added 3 commits March 30, 2026 19:04
Remove legacy native indexes before fetching existingColumns so they
no longer appear in the set. This eliminates the need for the
legacyNativeColumns guard in FSTIndexHandler and the
recreatedLegacyNativeColumns tracking in TextIndexHandler.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove deleteLegacyNativeTextIndexFiles since segmentWriter.removeIndex()
  already calls TextIndexUtils.cleanupTextIndex() which handles native files
- Remove dead conditional inProgress marker block — the marker is
  unconditionally recreated on the next line
- Remove unused indexDir/segmentDirectory locals from updateIndices

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Revert createTextIndexForColumn to use the original inprogress marker
pattern and cleanupExistingTextIndexFiles method from upstream master.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@xiangfu0 xiangfu0 requested a review from Jackie-Jiang March 31, 2026 07:54
@xiangfu0 xiangfu0 merged commit 9866ce4 into apache:master Apr 1, 2026
43 of 48 checks passed
@xiangfu0 xiangfu0 deleted the codex/remove-native-text-index branch April 1, 2026 07:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

deprecation Marks deprecated APIs, configs, or features

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants