Fix sentence boundary logic for non-English languages (#6189) by beastoin · Pull Request #6237 · BasedHardware/omi

beastoin · 2026-04-01T04:32:45Z

Sentence splitting and segment combination logic hardcoded English punctuation .!? as the only sentence-ending markers. This broke non-English languages: Chinese sentences ran together (。 not recognized), Hindi segments never split on ।, Arabic ؟ was ignored.

Changes:

Define shared SENTENCE_ENDERS frozenset with Unicode sentence-ending punctuation: English (.!?), CJK (。！？), Arabic/Urdu (؟۔), Hindi/Sanskrit (।॥)
Pre-compiled regex helpers (SENTENCE_SPLIT_RE, SENTENCE_FINDALL_RE) built from the constant
Replace all 6 hardcoded .?! checks in combine_segments() helpers with SENTENCE_ENDERS
Fix _is_sentence_complete() — replace text[0].isupper() with not _starts_with_lowercase_cased() to handle caseless scripts (CJK, Arabic, Hindi)
Update split_into_sentences() in translation.py to use pre-compiled Unicode-aware regex (top-level import)
Update _is_text_stable() and _compute_stability_signals() in translation_coordinator.py
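
The shared constant and the Unicode-aware splitter described above can be sketched roughly as follows. This is a minimal, self-contained approximation rather than the merged code: the `_ENDERS_STR` literal and the body of `split_into_sentences()` are assumptions for illustration, and the pattern follows the `SENTENCE_FINDALL_RE` shape quoted in the review thread.

```python
import re

# Assumption: a string literal is used to build the regex so the pattern is stable;
# the frozenset is kept for fast `in` checks on single characters.
_ENDERS_STR = ".?!。！？؟۔।॥"
SENTENCE_ENDERS = frozenset(_ENDERS_STR)

_ENDERS_CLASS = "[" + re.escape(_ENDERS_STR) + "]"
# One sentence = a run of non-enders followed by an ender (plus trailing
# whitespace), or a run of non-enders reaching the end of the text.
SENTENCE_FINDALL_RE = re.compile(
    "[^" + re.escape(_ENDERS_STR) + "]+(?:" + _ENDERS_CLASS + r"\s*|\s*$)"
)


def split_into_sentences(text: str) -> list[str]:
    """Split text on any character in SENTENCE_ENDERS (illustrative sketch)."""
    if not text:
        return []
    return [s.strip() for s in SENTENCE_FINDALL_RE.findall(text) if s.strip()]


print(split_into_sentences("你好。今天天气很好！你呢？"))
# ['你好。', '今天天气很好！', '你呢？']
print(split_into_sentences("Hi there. 你好。"))
# ['Hi there.', '你好。']
```

Text without any sentence ender comes back as a single item, which matches the no-punctuation case covered by the new tests.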

Review cycle fixes:

Renamed private _SENTENCE_* symbols to public SENTENCE_* for clean cross-module imports
Moved in-function import in translation.py to top-level (repo rule compliance)
Added 6 coordinator tests for CJK/Hindi/Arabic stability detection

Tests: 205 passed (22 + 86 + 77 existing + 20 new non-English boundary tests)

CJK: 。！？ sentence splitting, long text boundary, mixed English/CJK, stability detection
Hindi: । (danda) boundary detection, stability signals
Arabic: ؟ question mark boundary, stability signals
Caseless script handling in merge logic
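
To illustrate why `text[0].isupper()` fails for caseless scripts, here is a small sketch. The helper body is an assumption (the PR only names `_starts_with_lowercase_cased()`, it does not show its implementation); the point is that CJK, Arabic, and Devanagari characters are neither upper nor lower case, so "not lowercase" is the safer test for "could start a new sentence".

```python
def starts_with_lowercase_cased(text: str) -> bool:
    """True only when the first visible character is a cased, lowercase letter.

    Hypothetical stand-in for the PR's _starts_with_lowercase_cased() helper.
    """
    stripped = text.lstrip()
    return bool(stripped) and stripped[0].islower()


def looks_like_sentence_start_old(text: str) -> bool:
    # Old check: requires an uppercase first character.
    return bool(text) and text[0].isupper()


def looks_like_sentence_start_new(text: str) -> bool:
    # New check: anything that is not a lowercase cased letter may start a sentence.
    return not starts_with_lowercase_cased(text)


for sample in ["Hello", "and then", "你好", "नमस्ते", "مرحبا"]:
    print(sample, looks_like_sentence_start_old(sample), looks_like_sentence_start_new(sample))
```

With the old check, every CJK, Hindi, or Arabic segment is reported as "not a sentence start"; with the new check, only text that begins with a lowercase cased letter is treated as a continuation.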

Files changed:

backend/models/transcript_segment.py — SENTENCE_ENDERS constant, regex helpers, 6 check replacements
backend/utils/translation.py — split_into_sentences() regex update, top-level import
backend/utils/translation_coordinator.py — stability check updates
backend/tests/unit/test_transcript_segment.py — 8 new tests
backend/tests/unit/test_translation_optimization.py — 6 new tests
backend/tests/unit/test_translation_cost_optimization.py — 6 new tests

Risk: Conservative approach — only added well-established sentence enders for supported locales. Avoided ambiguous marks (Greek ;, Armenian :, Spanish ¡¿) that could over-split.
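
A quick sketch of the over-splitting risk, under the assumption that the ambiguous marks would be added as ASCII `;` and `:` (the `make_splitter` helper is hypothetical and only mirrors the findall pattern shape used in this PR):

```python
import re


def make_splitter(enders: str):
    """Build a sentence splitter for the given set of ender characters."""
    cls = re.escape(enders)
    pattern = re.compile("[^" + cls + "]+(?:[" + cls + r"]\s*|\s*$)")
    return lambda text: [s.strip() for s in pattern.findall(text) if s.strip()]


conservative = make_splitter(".?!。！？؟۔।॥")
aggressive = make_splitter(".?!。！？؟۔।॥;:")  # hypothetical: ; and : treated as enders

text = "Meeting at 10:30; bring notes."
print(conservative(text))  # ['Meeting at 10:30; bring notes.']
print(aggressive(text))    # ['Meeting at 10:', '30;', 'bring notes.']
```

The aggressive set breaks an ordinary English sentence into three fragments, which is the behavior the conservative set is meant to avoid.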

by AI for @beastoin

Define SENTENCE_ENDERS frozenset with CJK (。！？), Arabic (؟۔), Hindi (।॥) punctuation. Replace all 6 hardcoded English-only checks in combine_segments() helpers. Fix _is_sentence_complete() to handle caseless scripts (CJK, Arabic). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Replace hardcoded .?! regex with pre-compiled _SENTENCE_FINDALL_RE from transcript_segment module. Enables correct sentence splitting for CJK, Arabic, Hindi text in translation batching. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Replace hardcoded .?! checks in _is_text_stable() and _compute_stability_signals() with imported SENTENCE_ENDERS constant. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Test CJK (。！？), Hindi (।), Arabic (؟) sentence enders in segment merging/splitting logic. 8 new tests covering boundary detection, long text split, caseless continuation, and mixed-script text. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Test Chinese (。！？), Hindi (।), Arabic (؟), and mixed English/CJK sentence splitting. 6 new tests verifying correct boundary detection for translation batching. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Rename _SENTENCE_ENDERS_CLASS, _SENTENCE_SPLIT_RE, _SENTENCE_FINDALL_RE to public names (no underscore prefix) so utils modules can import them at top level without coupling to private internals. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Fix in-function import violation: import SENTENCE_FINDALL_RE at module top level instead of inside split_into_sentences(). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

6 new tests verifying _is_text_stable() and _compute_stability_signals() recognize Unicode sentence enders (。।؟) for translation coordinator. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

greptile-apps · 2026-04-01T04:47:45Z

Greptile Summary

This PR fixes sentence-boundary detection for non-English languages by introducing a shared SENTENCE_ENDERS frozenset (CJK 。！？, Arabic ؟۔, Hindi ।॥, plus existing English .?!) and replacing six hardcoded .?! checks in combine_segments(), split_into_sentences(), and the translation-coordinator stability signals. The _is_sentence_complete helper is also corrected to use _starts_with_lowercase_cased() instead of text[0].isupper(), so caseless scripts (CJK, Arabic, Hindi) are handled properly. Changes are conservative, well-scoped, and covered by 14 new unit tests alongside the existing 22.

Key findings:

split_into_sentences() in translation.py adds from models.transcript_segment import _SENTENCE_FINDALL_RE inside the function body, directly violating the project's backend-imports rule (no in-function imports). No circular-import risk exists, so the import can be moved to the top of the file safely.
_SENTENCE_FINDALL_RE begins with _, marking it as a module-private symbol; importing it across modules is a code smell. It should either be made public (drop the _) or translation.py should build its own pattern from the already-public SENTENCE_ENDERS.
The regex pattern strings are built by iterating a frozenset, whose character order is non-deterministic across Python versions/restarts. Correctness is unaffected, but using a plain str literal for the join would make the compiled patterns stable and readable.

Confidence Score: 5/5

Safe to merge — all remaining findings are P2 style/convention issues that do not affect runtime correctness.

The functional fix (SENTENCE_ENDERS, regex updates, caseless-script handling) is correct and well-tested. The two remaining comments concern a rule-violating in-function import and a frozenset-ordering cosmetic issue — neither affects behavior.

backend/utils/translation.py — in-function import should be moved to the top-level import block.

Important Files Changed

| Filename | Overview |
| --- | --- |
| backend/models/transcript_segment.py | Core change: adds `SENTENCE_ENDERS` frozenset and pre-compiled regexes; replaces 6 hardcoded `.?!` checks; fixes `_is_sentence_complete` for caseless scripts. Minor: frozenset used for regex construction gives non-deterministic pattern string. |
| backend/utils/translation.py | Updates `split_into_sentences()` to use pre-compiled Unicode-aware regex; however, the import of `_SENTENCE_FINDALL_RE` is placed inside the function body, violating the project's backend-imports rule. |
| backend/utils/translation_coordinator.py | Clean two-line update: imports `SENTENCE_ENDERS` at module level and replaces two hardcoded `'.?!'` checks in stability detection. |
| backend/tests/unit/test_transcript_segment.py | Adds 8 targeted tests covering CJK (。！？), Hindi (।), Arabic (؟), long-text boundary, mixed English/CJK, and caseless-continuation edge cases. |
| backend/tests/unit/test_translation_optimization.py | Adds 6 `split_into_sentences` tests for CJK, Hindi, Arabic, mixed, and no-punctuation cases, all straightforward and correct. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Transcript text received] --> B{Last char in SENTENCE_ENDERS?}
    B -- Yes --> C[Add STABILITY_PUNCTUATION signal]
    B -- No --> D[Check other signals: silence, speaker switch, token count]
    C --> E[_is_text_stable = True, TranslationCoordinator proceeds]
    D --> E

    A --> F[combine_segments]
    F --> G{Same speaker?}
    G -- Yes --> H{len less than 125 OR last char NOT in SENTENCE_ENDERS?}
    H -- Yes --> I[Merge segments]
    H -- No --> J[Keep as separate segment]
    G -- No --> K[_extract_last_incomplete_sentence via _SENTENCE_SPLIT_RE]
    K --> L{Incomplete tail found?}
    L -- Yes --> M[Attempt backward merge with next segment first sentence]
    L -- No --> J

    A --> N[split_into_sentences in translation.py]
    N --> O[_SENTENCE_FINDALL_RE Unicode-aware findall]
    O --> P[Sentence list for CJK, Arabic, Hindi, English]

Reviews (1): Last reviewed commit: "Add CJK/Hindi/Arabic tests for coordinat..."
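
The same-speaker branch of the flowchart can be sketched as a small predicate. This is an illustration only: `should_merge` and `MERGE_LEN_THRESHOLD` are hypothetical names, the 125 threshold is taken from the flowchart, and applying `len` to the text of the segment being extended is an assumption. The different-speaker branch (`_extract_last_incomplete_sentence`) is not modeled here.

```python
SENTENCE_ENDERS = frozenset(".?!。！？؟۔।॥")
MERGE_LEN_THRESHOLD = 125  # value shown in the flowchart


def should_merge(current_text: str, same_speaker: bool) -> bool:
    """Merge when the speaker is unchanged and the text is short or unfinished."""
    if not same_speaker:
        return False
    text = current_text.rstrip()
    ends_sentence = bool(text) and text[-1] in SENTENCE_ENDERS
    return len(text) < MERGE_LEN_THRESHOLD or not ends_sentence


long_closed = "字" * 130 + "。"   # long text that ends with a CJK full stop
long_open = "字" * 130            # long text with no sentence ender

print(should_merge("你好。", same_speaker=True))       # True  (short)
print(should_merge(long_closed, same_speaker=True))   # False (long and complete)
print(should_merge(long_open, same_speaker=True))     # True  (long but unfinished)
```

Before this PR, the `long_closed` case would have kept merging, because 。 was not recognized as a sentence ender.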

greptile-apps · 2026-04-01T04:47:48Z

+    Recognizes Unicode sentence enders for CJK, Arabic, Hindi, and other non-English languages.
+    """
    if not text:
        return []


In-function import violates backend-imports rule

The import from models.transcript_segment import _SENTENCE_FINDALL_RE is placed inside split_into_sentences(). The project's backend-imports rule explicitly prohibits in-function imports — they must live at the module top level alongside the other imports.

No circular import exists here: transcript_segment.py only imports from models.other, pydantic, and the standard library, so moving this to the top of translation.py is safe.

Additionally, _SENTENCE_FINDALL_RE carries a leading _, marking it as a module-private symbol. Exporting private internals across module boundaries is a code smell; the constant should either be renamed to drop the underscore (making it public) or translation.py should derive its own pattern from the already-public SENTENCE_ENDERS.

Suggested change

return []

from models.transcript_segment import SENTENCE_ENDERS, _SENTENCE_FINDALL_RE

(add to the top-level imports block and remove the in-function import line)

Context Used: Backend Python import rules - no in-function impor... (source)

greptile-apps · 2026-04-01T04:47:49Z

+SENTENCE_ENDERS_CLASS = '[' + re.escape(''.join(SENTENCE_ENDERS)) + ']'
+SENTENCE_SPLIT_RE = re.compile(r'(?<=' + SENTENCE_ENDERS_CLASS + r')\s*')
+SENTENCE_FINDALL_RE = re.compile(
+    r'[^' + re.escape(''.join(SENTENCE_ENDERS)) + r']+(?:' + SENTENCE_ENDERS_CLASS + r'\s*|\s*$)'
+)


frozenset iteration order makes regex pattern non-deterministic between runs

SENTENCE_ENDERS is a frozenset, and ''.join(SENTENCE_ENDERS) iterates it in an unspecified order. In CPython the order is stable within one process, but because str hashes are randomized per process (unless PYTHONHASHSEED is fixed) it can differ across interpreter restarts as well as Python versions, so _SENTENCE_ENDERS_CLASS (and therefore _SENTENCE_SPLIT_RE / _SENTENCE_FINDALL_RE) may compile to a differently ordered pattern string from one run to the next. Character-class matching is order-independent so correctness is not affected today, but it makes the compiled pattern unpredictable and harder to reason about.

Use a deterministic sequence (e.g. a str literal) instead of a frozenset for the join, and keep SENTENCE_ENDERS as the frozenset for fast membership tests:

_SENTENCE_ENDERS_STR = '.?!。！？؟۔।॥'  # stable order for regex construction
SENTENCE_ENDERS = frozenset(_SENTENCE_ENDERS_STR)  # O(1) `in` checks
_SENTENCE_ENDERS_CLASS = '[' + re.escape(_SENTENCE_ENDERS_STR) + ']'
_SENTENCE_SPLIT_RE = re.compile(r'(?<=' + _SENTENCE_ENDERS_CLASS + r')\s*')
_SENTENCE_FINDALL_RE = re.compile(
    r'[^' + re.escape(_SENTENCE_ENDERS_STR) + r']+(?:' + _SENTENCE_ENDERS_CLASS + r'\s*|\s*$)'
)
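
As a rough check of the claim that only the pattern text, not the matching behavior, depends on iteration order, the sketch below builds the character class three ways and compares the matches. The variable names are illustrative; `sorted()` is shown as one alternative to a string literal for getting a reproducible order.

```python
import re

ENDERS_STR = ".?!。！？؟۔।॥"
SENTENCE_ENDERS = frozenset(ENDERS_STR)

# Order depends on set iteration and may differ between interpreter runs.
unordered_class = "[" + re.escape("".join(SENTENCE_ENDERS)) + "]"
# Order fixed by the literal.
literal_class = "[" + re.escape(ENDERS_STR) + "]"
# Order fixed by code point.
sorted_class = "[" + re.escape("".join(sorted(SENTENCE_ENDERS))) + "]"

sample = "One. 二。 तीन। Four?"
matches = [re.findall(cls, sample) for cls in (unordered_class, literal_class, sorted_class)]

# All three classes match the same characters in the same positions.
print(matches[0] == matches[1] == matches[2])  # True
print(len(matches[1]))                         # 4
```

Only `literal_class` and `sorted_class` are guaranteed to produce the same pattern string on every run, which is what makes logged or cached patterns easier to compare.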

beastoin · 2026-04-01T06:47:27Z

Live Audio Test Evidence — PR #6237 (Issue #6189)

Setup: Local dev backend (uvicorn, port 10160) running PR branch code with SENTENCE_ENDERS changes. 5-min French TTS audio streamed via WebSocket /v4/listen. Two cases: with translation (language=auto) and without (language=fr).

Backend logs confirm translation coordinator active:

translate_coordinator [batch] units=1
translate_summary session=98042718 mono_skips=26 classify_skips=10 defers=1 translates=1 batches=1 neg_cache=37

Result: Both cases completed with 33 unique segments, 0 errors. French sentence boundaries (periods .) correctly recognized by the updated SENTENCE_ENDERS set. Segment merging logic (combine_segments) properly splits on French periods — no run-on segments across sentence boundaries.

Case 1: French audio WITH translation (language=auto)

Language param: auto
Duration: 150.4s
Unique segments: 33
Unique translations: 0
Errors: 0

Full transcript segments (33 segments)

[ 1] speaker=SPEAKER_0: Bonjour à tous. Aujourd'hui, nous allons parler de l'intelligence artificielle.
[ 2] speaker=SPEAKER_0: Bonjour à tous. Aujourd'hui, nous allons parler de l'intelligence artificielle. Et de son impact sur notre vie quotidienne.
[ 3] speaker=SPEAKER_0: Bonjour à tous. Aujourd'hui, nous allons parler de l'intelligence artificielle. Et de son impact sur notre vie quotidienne. L'intelligence artificielle a fait des progrès remarquables
[ 4] speaker=SPEAKER_0: Bonjour à tous. Aujourd'hui, nous allons parler de l'intelligence artificielle. Et de son impact sur notre vie quotidienne. L'intelligence artificielle a fait des progrès remarquables ces dernières années. Les systèmes de reconnaissance vocale sont devenus très précis.
[ 5] speaker=SPEAKER_0: Les assistants virtuels peuvent maintenant comprendre et répondre à des complexes dans
[ 6] speaker=SPEAKER_0: Les assistants virtuels peuvent maintenant comprendre et répondre à des complexes dans plusieurs langues. La traduction automatique
[ 7] speaker=SPEAKER_0: Les assistants virtuels peuvent maintenant comprendre et répondre à des complexes dans plusieurs langues. La traduction automatique s'est considérablement améliorée grâce au modèle de langage de grande
[ 8] speaker=SPEAKER_0: Les assistants virtuels peuvent maintenant comprendre et répondre à des complexes dans plusieurs langues. La traduction automatique s'est considérablement améliorée grâce au modèle de langage de grande taille. Les entreprises utilisent l'intelligence
[ 9] speaker=SPEAKER_0: Les assistants virtuels peuvent maintenant comprendre et répondre à des complexes dans plusieurs langues. La traduction automatique s'est considérablement améliorée grâce au modèle de langage de grande taille. Les entreprises utilisent l'intelligence artificielle pour améliorer leurs produits et services.
[10] speaker=SPEAKER_0: Dans le domaine de la santé. L'intelligence artificielle
[11] speaker=SPEAKER_0: Dans le domaine de la santé. L'intelligence artificielle aide les médecins à diagnostiquer les maladies plus rapidement.
[12] speaker=SPEAKER_0: Dans le domaine de la santé. L'intelligence artificielle aide les médecins à diagnostiquer les maladies plus rapidement. Les voitures autonomes utilisent des algorithmes d'apprentissage
[13] speaker=SPEAKER_0: Dans le domaine de la santé. L'intelligence artificielle aide les médecins à diagnostiquer les maladies plus rapidement. Les voitures autonomes utilisent des algorithmes d'apprentissage profond pour naviguer dans le
[14] speaker=SPEAKER_0: Dans le domaine de la santé. L'intelligence artificielle aide les médecins à diagnostiquer les maladies plus rapidement. Les voitures autonomes utilisent des algorithmes d'apprentissage profond pour naviguer dans le trafic. L'éducation est un autre domaine où l'intelligence artificielle
[15] speaker=SPEAKER_0: Dans le domaine de la santé. L'intelligence artificielle aide les médecins à diagnostiquer les maladies plus rapidement. Les voitures autonomes utilisent des algorithmes d'apprentissage profond pour naviguer dans le trafic. L'éducation est un autre domaine où l'intelligence artificielle fait une grande différence.
[16] speaker=SPEAKER_0: Les étudiants peuvent maintenant utiliser des tuteurs intelligents qui s'adaptent à leur niveau.
[17] speaker=SPEAKER_0: Les étudiants peuvent maintenant utiliser des tuteurs intelligents qui s'adaptent à leur niveau. La recherche scientifique bénéficie également de ses avancées technologiques.
[18] speaker=SPEAKER_0: Les chercheurs utilisent des modèles d'intelligence artificielle pour
[19] speaker=SPEAKER_0: Les chercheurs utilisent des modèles d'intelligence artificielle pour analyser de grandes quantités.
[20] speaker=SPEAKER_0: Les chercheurs utilisent des modèles d'intelligence artificielle pour analyser de grandes quantités. De données. Cela leur permet de faire des découvertes plus rapidement.
[21] speaker=SPEAKER_0: Les chercheurs utilisent des modèles d'intelligence artificielle pour analyser de grandes quantités. De données. Cela leur permet de faire des découvertes plus rapidement. En conclusion.
[22] speaker=SPEAKER_0: L'avenir de l'intelligence artificielle est très prometteur.
[23] speaker=SPEAKER_0: L'avenir de l'intelligence artificielle est très prometteur. Mais il est important de l'utiliser de manière responsable.
[24] speaker=SPEAKER_0: L'avenir de l'intelligence artificielle est très prometteur. Mais il est important de l'utiliser de manière responsable. Nous devons nous assurer que la technologie est accessible à tous.
[25] speaker=SPEAKER_0: La régulation et l'éthique sont des aspects
[26] speaker=SPEAKER_0: La régulation et l'éthique sont des aspects du développement de l'intelligence.
[27] speaker=SPEAKER_0: La régulation et l'éthique sont des aspects du développement de l'intelligence. Artificielle. Merci de votre attention.
[28] speaker=SPEAKER_0: La régulation et l'éthique sont des aspects du développement de l'intelligence. Artificielle. Merci de votre attention. Et n'hésitez pas à poser des questions.
[29] speaker=SPEAKER_0: Bonjour à tous. Aujourd'hui, nous allons parler de l'intelligence artificielle
[30] speaker=SPEAKER_0: Bonjour à tous. Aujourd'hui, nous allons parler de l'intelligence artificielle et de son impact sur notre vie quotidienne.
[31] speaker=SPEAKER_0: Bonjour à tous. Aujourd'hui, nous allons parler de l'intelligence artificielle et de son impact sur notre vie quotidienne. L'intelligence artificielle a fait des progrès remarquables ces dernières
[32] speaker=SPEAKER_0: Bonjour à tous. Aujourd'hui, nous allons parler de l'intelligence artificielle et de son impact sur notre vie quotidienne. L'intelligence artificielle a fait des progrès remarquables ces dernières années. Les systèmes de reconnaissance vocale sont devenus très précis.
[33] speaker=SPEAKER_0: Les assistants virtuels peuvent maintenant comprendre et répondre à des

Case 2: French audio WITHOUT translation (language=fr)

Language param: fr
Duration: 148.0s
Unique segments: 33
Unique translations: 0
Errors: 0

Full transcript segments (33 segments)

[ 1] speaker=SPEAKER_0: Les assistants virtuels peuvent maintenant comprendre et répondre à des complexes dans plusieurs langues. La traduction automatique s'est considérablement améliorée grâce au modèle de langage de grande taille. Les entreprises utilisent l'intelligence artificielle pour améliorer leurs produits et services.
[ 2] speaker=SPEAKER_0: Bonjour à tous. Aujourd'hui, nous allons parler de l'intelligence artificielle.
[ 3] speaker=SPEAKER_0: Bonjour à tous. Aujourd'hui, nous allons parler de l'intelligence artificielle. Et de son impact sur notre vie quotidienne.
[ 4] speaker=SPEAKER_0: Bonjour à tous. Aujourd'hui, nous allons parler de l'intelligence artificielle. Et de son impact sur notre vie quotidienne. L'intelligence artificielle a fait des progrès remarquables
[ 5] speaker=SPEAKER_0: Bonjour à tous. Aujourd'hui, nous allons parler de l'intelligence artificielle. Et de son impact sur notre vie quotidienne. L'intelligence artificielle a fait des progrès remarquables ces dernières années. Les systèmes de reconnaissance vocale sont devenus très précis.
[ 6] speaker=SPEAKER_0: Les assistants virtuels peuvent maintenant comprendre et répondre à des
[ 7] speaker=SPEAKER_0: Les assistants virtuels peuvent maintenant comprendre et répondre à des complexes dans
[ 8] speaker=SPEAKER_0: Les assistants virtuels peuvent maintenant comprendre et répondre à des complexes dans plusieurs langues. La traduction automatique
[ 9] speaker=SPEAKER_0: Les assistants virtuels peuvent maintenant comprendre et répondre à des complexes dans plusieurs langues. La traduction automatique s'est considérablement améliorée grâce au modèle de langage de grande
[10] speaker=SPEAKER_0: Les assistants virtuels peuvent maintenant comprendre et répondre à des complexes dans plusieurs langues. La traduction automatique s'est considérablement améliorée grâce au modèle de langage de grande taille. Les entreprises utilisent l'intelligence artificielle pour améliorer leurs produits et services.
[11] speaker=SPEAKER_0: Dans le domaine de la santé. L'intelligence artificielle
[12] speaker=SPEAKER_0: Dans le domaine de la santé. L'intelligence artificielle aide les médecins à diagnostiquer les maladies plus rapidement.
[13] speaker=SPEAKER_0: Dans le domaine de la santé. L'intelligence artificielle aide les médecins à diagnostiquer les maladies plus rapidement. Les voitures autonomes utilisent des algorithmes d'apprentissage
[14] speaker=SPEAKER_0: Dans le domaine de la santé. L'intelligence artificielle aide les médecins à diagnostiquer les maladies plus rapidement. Les voitures autonomes utilisent des algorithmes d'apprentissage profond pour naviguer dans le
[15] speaker=SPEAKER_0: Dans le domaine de la santé. L'intelligence artificielle aide les médecins à diagnostiquer les maladies plus rapidement. Les voitures autonomes utilisent des algorithmes d'apprentissage profond pour naviguer dans le trafic. L'éducation est un autre domaine où l'intelligence artificielle
[16] speaker=SPEAKER_0: Dans le domaine de la santé. L'intelligence artificielle aide les médecins à diagnostiquer les maladies plus rapidement. Les voitures autonomes utilisent des algorithmes d'apprentissage profond pour naviguer dans le trafic. L'éducation est un autre domaine où l'intelligence artificielle fait une grande différence.
[17] speaker=SPEAKER_0: Les étudiants peuvent maintenant utiliser des tuteurs intelligents qui s'adaptent à leur niveau.
[18] speaker=SPEAKER_0: Les étudiants peuvent maintenant utiliser des tuteurs intelligents qui s'adaptent à leur niveau. La recherche scientifique bénéficie également de ses avancées technologiques.
[19] speaker=SPEAKER_0: Les chercheurs utilisent des modèles d'intelligence artificielle pour
[20] speaker=SPEAKER_0: Les chercheurs utilisent des modèles d'intelligence artificielle pour analyser de grandes quantités.
[21] speaker=SPEAKER_0: Les chercheurs utilisent des modèles d'intelligence artificielle pour analyser de grandes quantités. De données. Cela leur permet de faire des découvertes plus rapidement.
[22] speaker=SPEAKER_0: Les chercheurs utilisent des modèles d'intelligence artificielle pour analyser de grandes quantités. De données. Cela leur permet de faire des découvertes plus rapidement. En conclusion.
[23] speaker=SPEAKER_0: L'avenir de l'intelligence artificielle est très prometteur.
[24] speaker=SPEAKER_0: L'avenir de l'intelligence artificielle est très prometteur. Mais il est important de l'utiliser de manière responsable.
[25] speaker=SPEAKER_0: L'avenir de l'intelligence artificielle est très prometteur. Mais il est important de l'utiliser de manière responsable. Nous devons nous assurer que la technologie est accessible à tous.
[26] speaker=SPEAKER_0: La régulation et l'éthique sont des aspects
[27] speaker=SPEAKER_0: La régulation et l'éthique sont des aspects du développement de l'intelligence.
[28] speaker=SPEAKER_0: La régulation et l'éthique sont des aspects du développement de l'intelligence. Artificielle. Merci de votre attention.
[29] speaker=SPEAKER_0: La régulation et l'éthique sont des aspects du développement de l'intelligence. Artificielle. Merci de votre attention. Et n'hésitez pas à poser des questions.
[30] speaker=SPEAKER_0: Bonjour à tous. Aujourd'hui, nous allons parler de l'intelligence artificielle
[31] speaker=SPEAKER_0: Bonjour à tous. Aujourd'hui, nous allons parler de l'intelligence artificielle et de son impact sur notre vie quotidienne.
[32] speaker=SPEAKER_0: Bonjour à tous. Aujourd'hui, nous allons parler de l'intelligence artificielle et de son impact sur notre vie quotidienne. L'intelligence artificielle a fait des progrès remarquables ces dernières
[33] speaker=SPEAKER_0: Bonjour à tous. Aujourd'hui, nous allons parler de l'intelligence artificielle et de son impact sur notre vie quotidienne. L'intelligence artificielle a fait des progrès remarquables ces dernières années. Les systèmes de reconnaissance vocale sont devenus très précis.

Unit test results (20 new tests, 205 total passing)

backend/test.sh — all 205 tests pass (3 files: test_transcript_segment.py, test_translation_optimization.py, test_translation_cost_optimization.py)

by AI for @beastoin

beastoin · 2026-04-01T13:57:08Z

lgtm

beastoin and others added 8 commits April 1, 2026 04:30

Use SENTENCE_ENDERS in translation coordinator stability checks (#6189)

768f6db

Replace hardcoded .?! checks in _is_text_stable() and _compute_stability_signals() with imported SENTENCE_ENDERS constant. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add non-English split_into_sentences tests (#6189)

5626087

Test Chinese (。！？), Hindi (।), Arabic (؟), and mixed English/CJK sentence splitting. 6 new tests verifying correct boundary detection for translation batching. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Move SENTENCE_FINDALL_RE import to top level in translation.py (#6189)

90c4ac5

Fix in-function import violation: import SENTENCE_FINDALL_RE at module top level instead of inside split_into_sentences(). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add CJK/Hindi/Arabic tests for coordinator stability checks (#6189)

d4ebc9d

6 new tests verifying _is_text_stable() and _compute_stability_signals() recognize Unicode sentence enders (。।؟) for translation coordinator. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

greptile-apps bot reviewed Apr 1, 2026


beastoin merged commit a29220e into main Apr 2, 2026
2 checks passed

beastoin deleted the fix/sentence-boundary-i18n-6189 branch April 2, 2026 02:00
