Remove native text and FST index support#17998
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #17998 +/- ##
============================================
+ Coverage 63.27% 63.74% +0.47%
+ Complexity 1543 1541 -2
============================================
Files 3200 3160 -40
Lines 194074 190714 -3360
Branches 29883 29267 -616
============================================
- Hits 122792 121563 -1229
+ Misses 61637 59595 -2042
+ Partials 9645 9556 -89
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
This PR removes Pinot’s native text index implementation and makes text indexing Lucene-only, while adding compatibility behavior to reject legacy native configs and clean up legacy native on-disk artifacts during segment reload/rebuild.
Changes:
- Delete native text index creator/reader/mutable index implementations, related tests, and performance benchmarks.
- Reject
fstType=nativetext index configurations and dropTEXT_CONTAINSsupport (was native-dependent). - Add cleanup logic intended to remove legacy
.nativetext.idxfiles and rebuild Lucene text indexes where still configured.
Reviewed changes
Copilot reviewed 26 out of 26 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/store/SegmentDirectoryPaths.java | Removes native text index path helper. |
| pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/multicolumntext/MultiColumnTextMetadata.java | Drops native fstType from shared multi-col text properties. |
| pinot-segment-local/src/test/java/org/apache/pinot/segment/local/segment/store/FilePerIndexDirectoryTest.java | Updates tests to treat .nativetext.idx as legacy/ignored and ensure it is deleted on remove. |
| pinot-segment-local/src/test/java/org/apache/pinot/segment/local/segment/index/creator/NativeTextIndexCreatorTest.java | Removes native text index creator/reader unit tests. |
| pinot-segment-local/src/test/java/org/apache/pinot/segment/local/realtime/impl/invertedindex/NativeMutableTextIndexReaderWriterTest.java | Removes realtime native mutable text index tests. |
| pinot-segment-local/src/test/java/org/apache/pinot/segment/local/realtime/impl/invertedindex/NativeMutableTextIndexConcurrentTest.java | Removes native mutable text index concurrency tests. |
| pinot-segment-local/src/test/java/org/apache/pinot/segment/local/realtime/impl/invertedindex/NativeAndLuceneMutableTextIndexTest.java | Removes native-vs-lucene realtime comparison tests. |
| pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/store/TextIndexUtils.java | Makes cleanup helper public and removes native index presence checks from hasTextIndex. |
| pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/text/TextIndexType.java | Enforces “Lucene-only” by rejecting FSTType.NATIVE at validation/creation time and removes native reader/mutable index creation. |
| pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/text/TextIndexConfigBuilder.java | Stops trying to infer fstType from properties; fstType is now provided by the caller. |
| pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/readers/text/NativeTextIndexReader.java | Deletes native text index reader implementation. |
| pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/invertedindex/TextIndexHandler.java | Adds logic intended to detect/remove legacy native text index files on reload and rebuild Lucene text indexes. |
| pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/converter/SegmentV1V2ToV3FormatConverter.java | Stops copying native text index files during v1/v2→v3 conversion. |
| pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/text/NativeTextIndexCreator.java | Deletes native text index creator implementation. |
| pinot-segment-local/src/main/java/org/apache/pinot/segment/local/realtime/impl/invertedindex/NativeMutableTextIndex.java | Deletes realtime native mutable text index implementation. |
| pinot-query-runtime/src/test/java/org/apache/pinot/query/runtime/queries/QueryRunnerTest.java | Updates expected error message for textContains() now that native is removed. |
| pinot-perf/src/main/java/org/apache/pinot/perf/BenchmarkNativeVsLuceneTextIndex.java | Removes native vs lucene benchmark. |
| pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/custom/TextIndicesTest.java | Removes native-text-index integration coverage and schema/config elements. |
| pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/custom/MultiColumnTextIndicesTest.java | Removes native-text-index integration coverage and schema/config elements. |
| pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/StaleSegmentCheckIntegrationTest.java | Removes native fst/native text config helpers from the stale-segment test. |
| pinot-core/src/test/resources/TableIndexingTest.csv | Removes native_text_index cases from indexing test matrix. |
| pinot-core/src/test/java/org/apache/pinot/queries/TextMatchTransformFunctionTest.java | Removes test case for TEXT_MATCH on native text index. |
| pinot-core/src/test/java/org/apache/pinot/queries/NativeAndLuceneComparisonTest.java | Removes native-vs-lucene query comparison tests. |
| pinot-core/src/test/java/org/apache/pinot/core/data/manager/TableIndexingTest.java | Removes native_text_index from test case generation and expectations. |
| pinot-core/src/main/java/org/apache/pinot/core/plan/FilterPlanNode.java | Makes TEXT_CONTAINS unsupported at execution time with an explicit exception. |
| pinot-core/src/main/java/org/apache/pinot/core/operator/transform/function/TextMatchTransformFunction.java | Removes checks related to native text index readers (no longer applicable). |
7553a7e to
bb5455c
Compare
3f20784 to
b07fbe7
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 122 out of 126 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/fst/FstIndexUtils.java:35
isLegacyNativeFstcallsfstBuffer.getInt(0)unconditionally. If the FST index file is empty/truncated, this can throw and fail segment reload with a confusing error. Consider guarding with a size check (e.g., require at least 4 bytes) and treating too-small buffers as non-legacy (or throwing a clearer exception).
1fbaeaa to
e58bf41
Compare
a2f32b0 to
864cd61
Compare
c633528 to
0aa66d9
Compare
0aa66d9 to
7a4ef98
Compare
- Remove unused findNativeTextIndexIndexFile() from SegmentDirectoryPaths - Combine contains+remove into single remove() in FSTIndexHandler - Replace cleanupTextIndexFiles() with segmentWriter.removeIndex() for managed indexes; keep targeted deletion only for legacy .nativetext.idx - Make LEGACY_NATIVE_FST_MAGIC private (only used internally) - Remove dead compatibility tests (newNativeTypeIsIgnored, etc.) - Clean up unused imports Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove withLegacyLuceneTypeIgnored, withLegacyNativeTypeIgnored from FstIndexConfigTest and withLegacyNativeFstTypeIgnored from TextIndexConfigTest. These only verified that Jackson ignores unknown JSON properties and did not exercise any live Pinot behavior. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove legacy native indexes before fetching existingColumns so they no longer appear in the set. This eliminates the need for the legacyNativeColumns guard in FSTIndexHandler and the recreatedLegacyNativeColumns tracking in TextIndexHandler. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove deleteLegacyNativeTextIndexFiles since segmentWriter.removeIndex() already calls TextIndexUtils.cleanupTextIndex() which handles native files - Remove dead conditional inProgress marker block — the marker is unconditionally recreated on the next line - Remove unused indexDir/segmentDirectory locals from updateIndices Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Revert createTextIndexForColumn to use the original inprogress marker pattern and cleanupExistingTextIndexFiles method from upstream master. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
TEXT_CONTAINSsupport because it depended on the native text index implementationutils/fstpackage and have the Lucene FST/IFST creators and readers use Lucene APIs directlyWhy
Pinot introduced native text and native FST support across several PRs, but the codebase still carried native-specific implementations, query handling, tests, benchmarks, and compatibility surface after the Lucene path became the only intended implementation. This change finishes that cleanup, removes the leftover Pinot-local Lucene FST utility layer, and keeps only the compatibility hooks needed to reject old configs and migrate old on-disk indexes during reload.
User Impact
.nativetext.idxfiles are removed on reload instead of being treated as live text indexes.TEXT_CONTAINSis no longer supported.Out Of Scope
FSTType.NATIVEandfstType=nativeexplicitly.How To Reproduce
fstType=native, configure an FST index withFSTType.NATIVE, run a query withTEXT_CONTAINS, or load a segment that still contains native text or legacy native FST files.Validation
./mvnw -pl pinot-core,pinot-segment-local,pinot-segment-spi,pinot-query-runtime,pinot-integration-tests,pinot-perf spotless:apply -DskipTests./mvnw -pl pinot-core,pinot-segment-local,pinot-segment-spi,pinot-query-runtime,pinot-integration-tests,pinot-perf checkstyle:check license:format license:check -DskipTests./mvnw -pl pinot-core,pinot-query-runtime -am -Dtest=TextMatchTransformFunctionTest,TableIndexingTest,QueryRunnerTest -Dsurefire.failIfNoSpecifiedTests=false test -rf :pinot-core./mvnw -pl pinot-common,pinot-core,pinot-query-planner,pinot-query-runtime -am -Dtest=FunctionRegistryTest,FilterOperatorUtilsTest,QueryCompilationTest,QueryRunnerTest -Dsurefire.failIfNoSpecifiedTests=false test./mvnw -pl pinot-segment-local,pinot-integration-tests,pinot-perf -am -DskipTests test-compile./mvnw -pl pinot-spi,pinot-segment-spi spotless:apply -DskipTests./mvnw -pl pinot-spi,pinot-segment-spi checkstyle:check license:format license:check -DskipTests./mvnw -pl pinot-spi,pinot-segment-spi -am -Dtest=TextIndexConfigTest,FstIndexConfigTest -Dsurefire.failIfNoSpecifiedTests=false test./mvnw -pl pinot-segment-local -am -Dtest=SegmentPreProcessorTest,FstIndexTypeTest,FilePerIndexDirectoryTest -Dsurefire.failIfNoSpecifiedTests=false test./mvnw -pl pinot-core -am -Dtest=FSTBasedRegexpLikeQueriesTest,IFSTBasedRegexpLikeQueriesTest,CrcUtilsTest -Dsurefire.failIfNoSpecifiedTests=false test./mvnw -pl pinot-core,pinot-segment-local,pinot-segment-spi,pinot-spi,pinot-perf spotless:apply -DskipTests./mvnw -pl pinot-core,pinot-segment-local,pinot-segment-spi,pinot-spi,pinot-perf checkstyle:check -DskipTests./mvnw -pl pinot-core,pinot-segment-local,pinot-segment-spi,pinot-spi,pinot-perf license:format -DskipTests./mvnw -pl pinot-core,pinot-segment-local,pinot-segment-spi,pinot-spi,pinot-perf license:check -DskipTests./mvnw -pl pinot-perf -am -DskipTests test-compile./mvnw -pl pinot-segment-local spotless:apply -DskipTests./mvnw -pl pinot-segment-local checkstyle:check license:format license:check -DskipTests./mvnw -q -pl pinot-segment-local -DskipTests test-compile./mvnw -q -pl pinot-segment-local -Dtest=LuceneFSTIndexCreatorTest,LuceneIFSTIndexCreatorTest,FstIndexTypeTest,IFSTIndexTypeTest,SegmentPreProcessorTest,FilePerIndexDirectoryTest -Dsurefire.failIfNoSpecifiedTests=false test./mvnw -q -pl pinot-core -Dtest=FSTBasedRegexpLikeQueriesTest,IFSTBasedRegexpLikeQueriesTest -Dsurefire.failIfNoSpecifiedTests=false test