fix(explore): keep Tier 0 code-first diversity for popular identifiers (#449) by justrach · Pull Request #457 · justrach/codedb

justrach · 2026-05-11T17:20:17Z

Summary

Fixes #449. Tier 0 of `searchContent` was gated on `word_hits.len <= max_results * 2`, so the entire Tier 0 code-first/doc-second diversity pass was skipped whenever a posting list grew past that bound. For popular identifiers like `fooBar`, that meant markdown files with many incidental mentions could fill `max_results` before any code file was scanned.

Approach

Replace the total-hit-count gate with a code-language-only gate. The new check counts hits in code-language files specifically; when code hits stay within bounds, Tier 0's two-pass (code, then doc) runs even if total hits are large. When the population is all-code (the #427 scenario), Tier 1's existing hit-count sort takes over as before.
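As a rough illustration of the difference between the two gates, here is a minimal, self-contained sketch. It is not the repository's code: `isDocPath`, `countCodeHits`, `oldGate`, and `newGate` are hypothetical names, hits are modelled as bare path strings, and the `max_results * 2` bound on code hits is assumed to carry over from the old gate.

```zig
const std = @import("std");

// Hypothetical stand-in for the repo's language detection: treat Markdown
// and plain-text paths as documentation, everything else as code.
fn isDocPath(path: []const u8) bool {
    return std.mem.endsWith(u8, path, ".md") or std.mem.endsWith(u8, path, ".txt");
}

// Count hits whose path is non-empty and is not a documentation file.
fn countCodeHits(hit_paths: []const []const u8) usize {
    var n: usize = 0;
    for (hit_paths) |p| {
        if (p.len > 0 and !isDocPath(p)) n += 1;
    }
    return n;
}

// Old gate: Tier 0 runs only while the whole posting list is small.
fn oldGate(hit_paths: []const []const u8, max_results: usize) bool {
    return hit_paths.len <= max_results * 2;
}

// New gate (assumed bound): Tier 0 runs while the code hits alone are few.
fn newGate(hit_paths: []const []const u8, max_results: usize) bool {
    return countCodeHits(hit_paths) <= max_results * 2;
}

pub fn main() void {
    // Four doc hits and one code hit, with max_results = 2.
    const hits = [_][]const u8{ "README.md", "docs/a.md", "docs/b.md", "docs/c.md", "src/main.zig" };
    std.debug.assert(!oldGate(&hits, 2)); // 5 total hits > 4: Tier 0 skipped
    std.debug.assert(newGate(&hits, 2)); // 1 code hit <= 4: Tier 0 runs
}
```

In this toy input the doc-heavy posting list trips the old gate but not the new one, which is the situation #449 describes.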

Test plan

`zig build test` passes (519/519, including the new issue-449 test).
issue-427 regression scenario still passes (verified manually).

Commits

test: failing test for #449 (Tier 0 gate bypass)
fix(explore): keep Tier 0 code/doc diversity for popular identifiers (#449)

🤖 Generated with Claude Code

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…449) Gate Tier 0 on code-language hit count instead of total posting-list length so queries where doc files dominate the word index still get the code-first pass, while all-code popular queries (issue-427) still fall through to Tier 1's hit-count sort. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d1ea27adf4

chatgpt-codex-connector · 2026-05-11T17:22:44Z

+            var code_hit_count: usize = 0;
+            for (word_hits) |hit| {
+                const hp = self.word_index.hitPath(hit);
+                if (hp.len > 0 and !isDocLanguage(detectLanguage(hp))) code_hit_count += 1;
+            }


Short-circuit code-hit counting once over the Tier 0 gate

For popular identifiers that have more than max_results * 2 code hits, this loop still walks the entire posting list just to decide Tier 0 should be skipped, and Tier 1 immediately walks the same word_hits again to build hits_per_file. The old total-hit gate was an O(1) length check in this path, so large all-code or mostly-code queries now pay an extra full posting-list traversal before taking the same Tier 1 path; break as soon as code_hit_count exceeds the threshold to avoid regressing common searchContent calls.
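The suggested short-circuit could look roughly like the sketch below. It is a standalone model, not the PR's code: `isDocPath` and `codeHitsExceed` are hypothetical names, and hits are represented as plain path strings rather than the word index's hit type.

```zig
const std = @import("std");

// Hypothetical stand-in for the repo's doc-language check.
fn isDocPath(path: []const u8) bool {
    return std.mem.endsWith(u8, path, ".md") or std.mem.endsWith(u8, path, ".txt");
}

// Returns true as soon as the number of code hits exceeds `limit`,
// so the rest of the posting list is never visited in that case.
fn codeHitsExceed(hit_paths: []const []const u8, limit: usize) bool {
    var n: usize = 0;
    for (hit_paths) |p| {
        if (p.len > 0 and !isDocPath(p)) {
            n += 1;
            if (n > limit) return true; // early exit: Tier 0 will be skipped
        }
    }
    return false; // scanned everything; code hits stayed within the limit
}

pub fn main() void {
    const all_code = [_][]const u8{ "a.zig", "b.zig", "c.zig", "d.zig", "e.zig", "f.zig" };
    const mostly_docs = [_][]const u8{ "README.md", "a.zig", "docs/x.md", "docs/y.md" };
    const limit: usize = 2 * 2; // max_results * 2 with max_results = 2
    std.debug.assert(codeHitsExceed(&all_code, limit)); // stops after the 5th hit
    std.debug.assert(!codeHitsExceed(&mostly_docs, limit)); // full scan, Tier 0 runs
}
```

The early exit only helps the all-code and mostly-code cases the review is concerned about. A doc-dominated posting list is still walked in full, because the gate has to see every hit before it can conclude that code hits stay within the limit.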

github-actions · 2026-05-11T17:22:45Z

Benchmark Regression Report

Thresholds: 10.00% and 50,000 ns absolute delta

NOISE means the percentage threshold was exceeded, but the absolute delta was too small to fail CI.

| Tool | Base (ns) | Head (ns) | Delta | Abs Delta (ns) | Status |
| --- | --- | --- | --- | --- | --- |
| `codedb_bundle` | 559550 | 544898 | -2.62% | -14652 | OK |
| `codedb_changes` | 54771 | 53701 | -1.95% | -1070 | OK |
| `codedb_deps` | 8980 | 10636 | +18.44% | +1656 | NOISE |
| `codedb_edit` | 6213 | 5961 | -4.06% | -252 | OK |
| `codedb_find` | 60989 | 59310 | -2.75% | -1679 | OK |
| `codedb_hot` | 100374 | 100264 | -0.11% | -110 | OK |
| `codedb_outline` | 287492 | 305259 | +6.18% | +17767 | OK |
| `codedb_read` | 94288 | 99367 | +5.39% | +5079 | OK |
| `codedb_search` | 198851 | 240607 | +21.00% | +41756 | NOISE |
| `codedb_snapshot` | 295711 | 293050 | -0.90% | -2661 | OK |
| `codedb_status` | 213582 | 209390 | -1.96% | -4192 | OK |
| `codedb_symbol` | 61131 | 63391 | +3.70% | +2260 | OK |
| `codedb_tree` | 65134 | 67785 | +4.07% | +2651 | OK |
| `codedb_word` | 69601 | 70622 | +1.47% | +1021 | OK |
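The two-threshold rule behind the Status column can be modelled as follows. This is a sketch of the rule as the report describes it, not the CI script itself: the `Status` enum, the `classify` function, the `fail` label, the strict `>` comparisons, and the choice to treat only slowdowns as regressions are all assumptions.

```zig
const std = @import("std");

// `fail` is a hypothetical label; the report above only shows OK and NOISE.
const Status = enum { ok, noise, fail };

// Classify a benchmark against a 10% relative and 50,000 ns absolute threshold.
fn classify(base_ns: i64, head_ns: i64) Status {
    const delta = head_ns - base_ns;
    // delta / base > 10%, written with integers to avoid floating point.
    const pct_exceeded = delta * 100 > base_ns * 10;
    if (!pct_exceeded) return .ok;
    // Percentage threshold exceeded: fail only if the absolute delta is large too.
    if (delta > 50_000) return .fail;
    return .noise;
}

pub fn main() void {
    // Rows taken from the report above.
    std.debug.assert(classify(8980, 10636) == .noise); // codedb_deps: +18.44%, +1656 ns
    std.debug.assert(classify(198851, 240607) == .noise); // codedb_search: +21.00%, +41756 ns
    std.debug.assert(classify(287492, 305259) == .ok); // codedb_outline: +6.18%
    std.debug.assert(classify(559550, 544898) == .ok); // codedb_bundle: faster than base
}
```

Under these assumptions `codedb_search` lands on NOISE because its +41,756 ns delta is below the 50,000 ns absolute threshold, even though the relative change is about twice the 10% limit.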

justrach and others added 2 commits May 12, 2026 00:41

test: failing test for #449 (Tier 0 gate bypass)

f9232c3

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

chatgpt-codex-connector Bot reviewed May 11, 2026

justrach merged commit 26e29c5 into main May 11, 2026
1 check passed

justrach deleted the fix/449-tier0-code-first branch May 11, 2026 17:38

justrach added a commit that referenced this pull request May 11, 2026

Revert "fix(explore): keep Tier 0 code-first diversity for popular id…

1d15f3c

…entifiers (#449) (#457)" This reverts commit 26e29c5.
