Wire v1.37 tokenization config (textAnalyzer, stopwordPresets) through public types #429
CI prettier flagged whitespace inside empty `() => { }` arrow bodies.
Strip to `() => {}` to match repo style.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pull request overview
This PR updates the public TypeScript client types and (de)serialization to expose Weaviate v1.37’s per-property text-analysis configuration (textAnalyzer) and collection-level invertedIndex.stopwordPresets, and ensures the tokenize endpoint uses the same shared translation logic.
Changes:
- Exposes `TextAnalyzerConfig` and wires it through collection property create/read types, with shared union↔wire translation helpers.
- Exposes `InvertedIndexConfig.stopwordPresets` on schema create/read surfaces and maps it through config deserialization.
- Updates tokenize endpoint typing/docs and CI matrix to target Weaviate `1.37.2`, plus adds unit + integration coverage for round-tripping.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| test/collections/tokenization/integration.test.ts | Adds integration coverage for schema-config round-tripping of textAnalyzer and stopwordPresets. |
| src/tokenize/index.ts | Switches tokenize analyzerConfig serialization to the shared translator and updates stopword preset typing. |
| src/collections/tokenization/unit.test.ts | Adds type-level tests pinning the public tokenization surface across schema refreshes. |
| src/collections/configure/types/base.ts | Wires textAnalyzer and stopwordPresets into public “configure/create/update” types. |
| src/collections/config/utils.ts | Introduces shared textAnalyzerConfigToWire / textAnalyzerConfigFromWire and plugs into schema create + config.get mapping. |
| src/collections/config/types/index.ts | Adds public TextAnalyzerConfig and exposes stopwordPresets + PropertyConfig.textAnalyzer. |
| .github/workflows/main.yaml | Updates CI matrix Weaviate 1.37 entry to 1.37.2. |
Suggested change:
    - requireAtLeast(1, 37, 2).describe('tokenize stopwords / stopwordPresets (>= 1.37.2)', () => {
    + requireAtLeast(1, 37, 2).describe('tokenize stopwords / stopwordPresets', () => {
nit: the `(>= 1.37.2)` clarification is redundant, since `requireAtLeast(1, 37, 2)` already says the same.
Suggested change:
    - return out.asciiFold !== undefined || out.asciiFoldIgnore !== undefined || out.stopwordPreset !== undefined
    -   ? out
    -   : undefined;
    + return out;
nit: is there any harm in returning an empty `{}` object if all its properties are `?`-optional? Especially on the wire, where these will be turned into nulls or omitted entirely.
The `return long_condition ? out : undefined` seems unnecessary.
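For readers following this thread, here is a minimal sketch of what a union↔wire translation with the simplified `return out;` could look like. The type shapes and the function names `toWireSketch` / `fromWireSketch` are assumptions inferred from the field names quoted in this PR (`asciiFold`, `asciiFoldIgnore`, `stopwordPreset`); they are not the actual `textAnalyzerConfigToWire` / `textAnalyzerConfigFromWire` implementations in `src/collections/config/utils.ts`.

```typescript
// Hypothetical shapes inferred from names in this PR; the real types may differ.
type TextAnalyzerConfigSketch = {
  asciiFold?: boolean | { ignore: string[] };
  stopwordPreset?: string;
};

type WireTextAnalyzerConfigSketch = {
  asciiFold?: boolean;
  asciiFoldIgnore?: string[];
  stopwordPreset?: string;
};

// Union -> wire. The object form of asciiFold is assumed to imply folding is on.
function toWireSketch(config: TextAnalyzerConfigSketch): WireTextAnalyzerConfigSketch {
  const out: WireTextAnalyzerConfigSketch = {};
  if (typeof config.asciiFold === 'boolean') {
    out.asciiFold = config.asciiFold;
  } else if (config.asciiFold !== undefined) {
    out.asciiFold = true;
    out.asciiFoldIgnore = [...config.asciiFold.ignore];
  }
  if (config.stopwordPreset !== undefined) {
    out.stopwordPreset = config.stopwordPreset;
  }
  // Returned unconditionally: an empty {} is harmless when every field is optional.
  return out;
}

// Wire -> union. A non-empty ignore list on a folding config becomes the object form.
function fromWireSketch(wire: WireTextAnalyzerConfigSketch): TextAnalyzerConfigSketch {
  const out: TextAnalyzerConfigSketch = {};
  const ignore = wire.asciiFoldIgnore;
  if (wire.asciiFold === true && ignore !== undefined && ignore.length > 0) {
    out.asciiFold = { ignore: [...ignore] };
  } else if (wire.asciiFold !== undefined) {
    out.asciiFold = wire.asciiFold;
  }
  if (wire.stopwordPreset !== undefined) {
    out.stopwordPreset = wire.stopwordPreset;
  }
  return out;
}
```

With this shape, callers that need to distinguish "nothing configured" can check for an empty object themselves, which keeps the translator free of the long condition.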
Summary
Brings the TS client to parity with the Python client for Weaviate v1.37 tokenization config. Pre-patch, users had to fall back to `as any` for per-property `textAnalyzer`, `invertedIndex.stopwordPresets`, and the `/v1/tokenize` `stopwords` / `stopwordPresets` fields.

Public surface:
- `TextAnalyzerConfig`: new type used for both per-property `textAnalyzer` and `tokenize.text({ analyzerConfig })`. Ergonomic union: `asciiFold: boolean | { ignore: string[] }`.
- `InvertedIndexConfig.stopwordPresets`: exposed on create / read / update, plus on the `configure.invertedIndex(...)` and `reconfigure.invertedIndex(...)` builders.
- `tokenize.text`: now accepts `stopwords` (one-off block) and `stopwordPresets` (named catalog). The two are mutually exclusive; passing both rejects client-side with `WeaviateInvalidInputError`. Version-gated at `>= 1.37.2`.

Schema:
- `tools/refresh_schema.sh v1.37.2` refreshed `src/openapi/schema.ts` so `TokenizeRequest` carries `stopwords` (top-level) and the flat `stopwordPresets` shape. CI matrix bumped to `1.37.2`.

Test plan
- `WEAVIATE_VERSION=1.37.2 npm run test:unit`: 323/323 pass
- `npm run build` / `npm run lint`: clean
- `test/tokenize/integration.test.ts`: covers `analyzerConfig`, `stopwords` (preset+additions / additions-only / removals-only), `stopwordPresets` (named ref / builtin override), mutex rejection. Inputs/outputs match the Python integration suite.
- `test/collections/tokenization/integration.test.ts`: round-trips `textAnalyzer` and `stopwordPresets` through `collection.config.get()`.

🤖 Generated with Claude Code
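The mutual-exclusion rule for `stopwords` and `stopwordPresets` described in the summary can be illustrated with a small client-side guard. This is only a sketch under stated assumptions: `InvalidInputErrorSketch` is a local stand-in for the client's `WeaviateInvalidInputError` (whose constructor may differ), and `TokenizeOptionsSketch` and `assertStopwordOptionsExclusive` are hypothetical names, not part of the client.

```typescript
// Local stand-in so the sketch is self-contained; not the client's own error class.
class InvalidInputErrorSketch extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'InvalidInputErrorSketch';
  }
}

// Hypothetical option shape: only the two mutually exclusive fields matter here,
// so their contents are left untyped.
type TokenizeOptionsSketch = {
  stopwords?: unknown;
  stopwordPresets?: unknown;
};

// Throws when both fields are set. Called at the top of an async method,
// the throw surfaces to the caller as a rejected promise before any request is sent.
function assertStopwordOptionsExclusive(opts: TokenizeOptionsSketch): void {
  if (opts.stopwords !== undefined && opts.stopwordPresets !== undefined) {
    throw new InvalidInputErrorSketch(
      'stopwords and stopwordPresets are mutually exclusive; pass only one'
    );
  }
}
```

Checking before the request is built means the user gets a typed, local error instead of waiting on a server round trip.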