P5-search-show: add aggregate search and show commands#22
Conversation
Codecov Report ✅ All modified and coverable lines are covered by tests. Additional details and impacted files:

```
@@           Coverage Diff            @@
##              main       #22   +/-  ##
==========================================
  Coverage   100.00%   100.00%
==========================================
  Files          112       118     +6
  Lines         8817      9235   +418
==========================================
+ Hits          8817      9235   +418
```

☔ View full report in Codecov by Sentry.
Heads-up: PR #21 (P5-calendar) has merged to `main`, so this branch now needs a merge from `origin/main`. File-by-file conflict guidance:
| File | What to do | Why |
|---|---|---|
| `animedex/agg/_fanout.py` | Accept main verbatim (PR #21's 186-line hardened version). Discard the 160-line version on this branch. | main has the `_HTTP_STATUS_RE` regex tightening (status-introducing token required), the `_normalise_items` dict-envelope + raise-on-unknown-shape changes, and the `ApiError` import. None of those are in this branch's version. |
| `animedex/models/aggregate.py` | Accept main verbatim (239-line version with `MergedAnime`, `merge_diagnostics`, `id_conflicts`). | This branch's version (140 lines) predates `MergedAnime`, `merge_diagnostics`, and `id_conflicts`. The new search/show commands consume `AggregateResult` and `AggregateSourceStatus`, both of which exist in main's version unchanged. |
| `animedex/entry/aggregate.py` | Accept main (new file on main, absent on this branch). | Carries `_report_failures` / `_report_merge_diagnostics` / `_finish` / `_emit`. The new search/show CLI entries should register through this shared helper rather than ship parallel implementations in `entry/search.py` and `entry/show.py` — see "Post-merge integration" below. |
| `animedex/render/tty.py` | Accept main as the base, then re-apply any Search / Show / merge-result render helpers this branch added (around 500 lines of diff between main and PR head). | main has the schedule timeline render and the `stream` parameter; this branch has whatever render helpers search / show need. Both sides have to coexist. |
| `animedex/diag/selftest.py` | Union the two `_SELFTEST_TARGETS` lists. | main registers `animedex.agg.calendar`, `animedex.agg._fanout`, `animedex.utils.timezone`, plus the runtime-dep smoke targets (anyascii, unidecode, tzdata, python-dateutil, jaconv). This branch will need to add `animedex.agg.search`, `animedex.agg.show`, `animedex.agg._prefix_id`, `animedex.agg._type_routes`, and the matching entry-module targets. |
| `requirements.txt` | Accept main (12-line version with the 5 new deps). | main has anyascii, jaconv, unidecode, tzdata, python-dateutil. This branch's reverting of them is purely a baseline artefact, not a deliberate choice. |
| `animedex/utils/timezone.py` | Accept main (new file). | Used by the schedule path on main; harmless for search/show but kept for the substrate's shared posture. |
| `tools/generate_spec.py` | Accept main (unidecode `HIDDEN_IMPORTS` + `PACKAGE_DATAS` additions). | This branch doesn't touch the spec generator; no real conflict. |
| `docs/source/api_doc/agg/_fanout.rst` | Accept main (covers PR #21's helper functions). | Auto-generated; regen with `make rst_auto` after the merge if anything looks off. |
| `docs/source/api_doc/models/aggregate.rst` | Accept main (covers `MergedAnime` / `merge_diagnostics`). | Same as above. |
| `AGENTS.md` | Land this branch's "clarify data language policy" commit cleanly on top of main's unchanged AGENTS.md. | Should be a clean three-way merge with no manual reconcile. |
Post-merge integration (after conflicts resolve)
These are not git conflicts but they are integration tasks that have to happen before this PR is review-ready:
- Move `entry/search.py` + `entry/show.py` contents into `entry/aggregate.py` (or have them import the shared helpers). Currently the new CLI entries duplicate the partial-failure inform logic that already exists on main at `entry/aggregate.py`: `_report_failures` and `_report_merge_diagnostics`. The same `_finish` / `_emit` helpers should serve all four aggregate commands (season/schedule/search/show).
- Thread the `stream` parameter through `search`/`show` render paths: `render_tty(result, stream=sys.stdout)` per `animedex/entry/_cli_factory.py:80` so the Unicode/ASCII fallback inherits correctly.
- Drop any `_BACKEND_POLICY` entry for `search`/`show` if one was added — per the issue #19 spec, these are top-level aggregate commands, not pseudo-backends.
- Update this PR's body to reflect the post-merge state: drop any references to "PR #22's own `_fanout.py`" / "PR #22's own `AggregateResult`"; note that this round is annotate-only and that cross-source merging on `search` is deferred to a follow-up slice (per the issue #19 scope-clarification comment).
Verification checklist (run after the merge resolves)
Before re-requesting review:
- `git merge origin/main` completed; no `<<<<<<<` markers anywhere in the tree.
- `git diff main..HEAD -- animedex/agg/_fanout.py animedex/models/aggregate.py animedex/utils/timezone.py animedex/entry/aggregate.py requirements.txt tools/generate_spec.py` returns empty — those are main's files verbatim.
- `git diff main..HEAD -- animedex/agg/search.py animedex/agg/show.py animedex/agg/_prefix_id.py animedex/agg/_type_routes.py` shows the new search/show substrate.
- `entry/search.py` and `entry/show.py` either don't exist (logic moved to `entry/aggregate.py`) or are thin wrappers that import from `entry/aggregate.py`.
- `_SELFTEST_TARGETS` in `animedex/diag/selftest.py` contains both main's calendar entries (`animedex.agg.calendar`, `animedex.utils.timezone`, etc.) and the new search/show entries.
- `make format` clean.
- `make test` green. Specifically, the `test/agg/test_calendar.py` suite still passes against the merged substrate (sanity check that the merge didn't break PR #21's regression coverage), and the new `test/agg/test_search.py` / `test/agg/test_show.py` (or wherever they land) pass.
- `make rst_auto` regenerates `docs/source/api_doc/` cleanly; commit the diff.
- `make build && make test_cli` passes — frozen binary still smokes the calendar substrate plus the new commands.
- `python -m animedex.policy.lint animedex/` green.
- `grep -rE 'Phase [0-9]|AGENTS[. ]§|Reviewer review|Refs #' animedex/ tools/` returns zero.
- CLI dual-path test: `animedex search anime "frieren" --json` and the TTY default both work; same for `animedex show anime anilist:154587`.
- Partial-failure: with a synthetic AniList-429 fixture, `animedex search anime "x"` returns the healthy backends' rows, exits 0, and emits the same `_report_failures` stderr line shape that `season` already does.
- Backward-compat check: `animedex season 2024 spring --json` still produces the merged `MergedAnime` envelope shape from PR #21 (the merge should not have regressed the calendar substrate).
What this PR does not do (deferred to follow-up)
- Cross-source merging on `search` output. Same-anime-across-backends returns as multiple rows in this slice. The follow-up issue will refactor `_merge_season_items` into a generic `_merge_aggregate_items` with per-entity scorers, calibrate match thresholds for the non-anime entity types, and wire `search` to merge by default. See the scope-clarification comment on issue #19 for the rationale.
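As a sketch of what that generic refactor could look like: the dispatcher below is an assumption about shape, not the actual `_merge_season_items` internals — per-entity scorers plug in as callables, and the grouping strategy shown (greedy single-link against a group representative) is only one plausible choice:

```python
from typing import Callable, Iterable

def merge_aggregate_items(
    rows: Iterable[dict],
    scorer: Callable[[dict, dict], float],
    threshold: float = 70.0,
) -> list[list[dict]]:
    """Greedy single-link grouping: a row joins the first existing group
    whose representative row scores at or above the merge threshold."""
    groups: list[list[dict]] = []
    for row in rows:
        for group in groups:
            if scorer(group[0], row) >= threshold:
                group.append(row)
                break
        else:
            groups.append([row])
    return groups

# A shared-external-id scorer mirroring the _shared_external_id shortcut:
# identical mal_id merges instantly, everything else stays apart.
def shared_id_scorer(a: dict, b: dict) -> float:
    return 1000.0 if a.get("mal_id") and a.get("mal_id") == b.get("mal_id") else 0.0

rows = [
    {"mal_id": 101, "_source": "anilist"},
    {"mal_id": 101, "_source": "jikan"},
    {"mal_id": 202, "_source": "jikan"},
]
groups = merge_aggregate_items(rows, shared_id_scorer)
```

A per-entity scorer for manga or characters would slot into the same signature once its adjudication corpus exists.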
Once the merge is clean and the checklist is green, please re-request review. If any of the conflicts above are non-obvious, post the specific file and the conflict shape on this PR before resolving — happy to walk through individual files rather than have you guess.
Merge update summary: I merged current `origin/main` into this branch.

What I kept from PR #21/main:
What I kept from PR #22/search-show:
Conflict handling:
Verification completed:
Search merge-readiness inventory based on the current PR state, the aggregate search route table, backend model fields, and captured fixtures. This is research-only; no code changes were made for this inventory. Important framing: mergeability should not be limited to shared IDs. Future search merge work should support calibrated non-ID matching against an adjudicated groundtruth, while still preserving source attribution and partial-source behavior when one upstream is unavailable or rate-limited.
Recommended priority for future implementation:
Implementation-shape notes for later:
CJK+E search-friendliness follow-up inventory and proposed expansion plan. This is research/planning only; no code changes were made for this comment. The current search backends are not equally friendly to CJK+English users. The observed issue is not simply that one upstream does or does not accept CJK input. Some upstreams return empty results for CJK queries, while others return non-empty but unrelated results, which is more dangerous for aggregate search. Observed behavior from small live CLI samples:
Concrete sample notes:
Follow-up validation with manually simulated aggregate query variants confirmed that the expansion strategy is useful, but only when it is source- and type-specific:
The follow-up tests also clarified an important limitation: the generic transliteration helpers already available in the project (`anyascii`, `unidecode`, `jaconv`) are not sufficient on their own.

Design boundary for implementation:
Proposed aggregate search expansion shape:
A compact implementation sketch for the aggregate-only layer (helpers like `looks_like_cjk_name` and `AliasLookup` are proposed, not existing, names):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SearchVariant:
    query: str
    reason: str
    confidence_hint: int

def plan_variants(entity_type: str, source: str, q: str, aliases: AliasLookup) -> list[SearchVariant]:
    variants = [SearchVariant(q, "original", 100)]
    if entity_type in {"person", "character"} and looks_like_cjk_name(q):
        variants.extend(space_cjk_name_variants(q))
        variants.extend(aliases.romanized_name_variants(q))
    if entity_type in {"anime", "manga", "studio", "publisher"}:
        variants.extend(aliases.canonical_title_or_name_variants(q))
    if source == "ann":
        variants = [v for v in variants if looks_latin(v.query)]
    if source == "mangadex" and entity_type == "manga":
        variants = prefer_original_cjk_title_first(variants)
    return dedupe_variants(variants)

def score_candidate(entity_type: str, source: str, input_q: str, variant: SearchVariant, row: object) -> MatchEvidence:
    fields = searchable_fields(row)
    score = 0
    reasons = []
    if exact_or_normalized_match(input_q, fields):
        score += 100
        reasons.append("input-match")
    if exact_or_normalized_match(variant.query, fields):
        score += variant.confidence_hint
        reasons.append(f"variant-match:{variant.reason}")
    if source in {"kitsu", "shikimori"} and entity_type == "character" and not reasons:
        score -= 80
        reasons.append("cjk-character-untrusted-nonmatch")
    if source == "ann" and not looks_latin(variant.query):
        score -= 100
        reasons.append("ann-non-latin-query")
    return MatchEvidence(score=score, matched_query=variant.query, reasons=reasons)
```

Source-specific rule refinements from the validation:
Suggested implementation order:
Follow-up on whether the PR #21 season anime merge can be reused directly for `search anime`.

Short answer: the PR #21 identity work is useful as a base, but the season merge pipeline does not transfer as-is.

Tested queries included:

Observed behavior:
What should be reused from PR #21:
What should not be reused directly:
Recommended search-anime merge shape:

```python
# Conceptual flow only; backend search APIs stay raw.
rows = aggregate_search_rows(type="anime", query=q)
rows = attach_query_variant_evidence(rows, input_query=q)
projected, passthrough, diagnostics = project_to_common_anime(rows)
projected = dedupe_same_source_same_external_id(projected)
groups = group_by_shared_ids_then_search_title_context(projected)
groups = apply_search_specific_guards(groups, query=q)
return build_merged_search_result(groups, passthrough, diagnostics)
```

Required follow-ups before enabling search anime merge:
Conclusion: PR #21 provides useful identity primitives and output-shape conventions, but the season-specific merge pipeline needs search-specific adaptation before it can back `search anime`.
Search-quality eval — empirical findings (R&D for the `search` merge follow-up)
| type | backend | seeds | raw hit | any-variant hit | net gain | top rescue variants |
|---|---|---|---|---|---|---|
| anime | anilist | 50 | 46 (92%) | 50 (100%) | +4 | alias_english (+4), first_token (+3) |
| anime | ann | 50 | 9 (18%) | 36 (72%) | +27 | alias_english (+20), first_token (+14) |
| anime | jikan | 50 | 49 (98%) | 50 (100%) | +1 | alias_english (+1), first_token (+1), alias_romaji (+1) |
| anime | kitsu | 50 | 48 (96%) | 49 (98%) | +1 | alias_english (+1), first_token (+1) |
| anime | shikimori | 50 | 46 (92%) | 49 (98%) | +3 | alias_english (+2), first_token (+1) |
| character | anilist | 50 | 37 (74%) | 39 (78%) | +2 | first_token (+2) |
| character | jikan | 50 | 48 (96%) | 48 (96%) | +0 | — |
| character | kitsu | 50 | 22 (44%) | 24 (48%) | +2 | first_token (+2) |
| character | shikimori | 50 | 47 (94%) | 47 (94%) | +0 | — |
| manga | anilist | 33 | 18 (55%) | 32 (97%) | +14 | first_token (+13), alias_english (+4) |
| manga | jikan | 33 | 31 (94%) | 32 (97%) | +1 | alias_english (+1), alias_romaji (+1) |
| manga | kitsu | 33 | 25 (76%) | 30 (91%) | +5 | alias_english (+4), first_token (+3), alias_romaji (+1) |
| manga | mangadex | 33 | 31 (94%) | 32 (97%) | +1 | first_token (+1) |
| manga | shikimori | 33 | 26 (79%) | 30 (91%) | +4 | first_token (+3), alias_english (+2), alias_romaji (+1) |
| person | anilist | 24 | 21 (88%) | 21 (88%) | +0 | — |
| person | jikan | 24 | 18 (75%) | 20 (83%) | +2 | first_token (+2) |
| person | kitsu | 24 | 0 (0%) | 0 (0%) | +0 | — |
| person | shikimori | 24 | 23 (96%) | 24 (100%) | +1 | first_token (+1) |
| publisher | shikimori | 50 | 50 (100%) | 50 (100%) | +0 | — |
| studio | anilist | 19 | 12 (63%) | 12 (63%) | +0 | — |
| studio | jikan | 19 | 19 (100%) | 19 (100%) | +0 | — |
| studio | kitsu | 19 | 8 (42%) | 8 (42%) | +0 | — |
| studio | shikimori | 19 | 19 (100%) | 19 (100%) | +0 | — |
Patterns:

- `alias_english` is the most useful rescue when a row carries an English alias. ANN anime is the standout case: raw 18% → any-variant 72%, with `alias_english` alone rescuing 20 of the 41 raw misses. ANN's `substring_search` indexes English-leaning titles; when the seed is romaji the substring lookup just misses. Concrete rescued seeds (raw → alias_english):

  ```
  'Ore wa Seikan Kokka no Akutoku Ryoushu!' → "I'm the Evil Lord of an Intergalactic Empire!" (pos 0)
  'Gimai Seikatsu' → 'Days with My Stepsister' (pos 0)
  'Maou 2099' → 'DEMON LORD 2099' (pos 0)
  'Yesterday wo Utatte' → 'SING "YESTERDAY" FOR ME' (pos 0)
  'Zom 100: Zombie ni Naru Made ni Shitai 100 no Koto' → 'Zom 100: Bucket List of the Dead' (pos 0)
  ```

- `first_token` is the biggest single rescue on AniList manga search, raw 55% → any-variant 97%, with `first_token` alone covering 13 of 15 raw misses. The pattern: AniList GraphQL `Media(search: ..., type: MANGA)` is a substring match that fails on titles with parenthetical or dash-suffix annotations harvested from MangaDex fixtures:

  ```
  'Chainsaw Man (Official Colored)' raw 0 results → 'Chainsaw' (pos 0)
  'Spy × Family (Official Colored)' raw 0 results → 'Spy' (pos 1)
  'Goodnight Punpun' raw 0 results → 'Goodnight' (pos 3)
  'Jujutsu Kaisen - The Possession (Doujinshi)' raw 0 results → 'Jujutsu' (pos 0)
  'Boku no Hero Academia (Official Colored)' raw 0 results → 'Boku' (pos 0)
  ```

  A targeted "strip parenthetical / dash-suffix" client-side preprocessor would catch the same misses without needing the first-token fallback.

  Combined view of the rescued seeds across both standout cells:

- `first_token` also rescues a small fraction of people searches (Jikan `people` and Shikimori `people`), where the seed is "Surname Forename" and the upstream's substring filter behaves differently between full-name and first-token inputs.
- `anyascii` / `unidecode` produced near-zero rescues on real seeds. They occasionally collapse `Sōsō no Furīren` → `Sosou no Furiren`-style noise that no upstream's index normalises through. They remain useful in the merge path (matching catalog rows post-fetch) but should not be fan-out queries.
- `nfkc` and `jaconv_norm` rescue only rare full-width-digit / full-width-punctuation cases in the sample. Worth keeping cheaply because they are zero-cost on already-NFKC strings.
- Studio and publisher rescues are zero for two different reasons depending on the cell:
  - `kitsu producers` and `shikimori studios` / `shikimori publishers` go through the `all_items=True` path in `_type_routes.py` (fetch full catalogue, locally filter), so the local-filter step already canonicalises lowercase + substring and variants add nothing.
  - `anilist studio_search` (63% raw) and `kitsu producer search` (42% raw) are real upstream search routes with no rescue from any variant. The misses are mostly "MAPPA"-class brand-name lookups where the upstream's tokeniser is the binding constraint; only fan-in to the catalogue itself would fix them.
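One practical detail when fanning out rescue variants like `alias_english` and `first_token`: duplicates (e.g. an alias identical to the seed) should be dropped before issuing queries. A hypothetical order-preserving `dedupe_variants`, matching the planner sketch earlier in this thread (the real helper is not shown in this PR):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SearchVariant:
    query: str
    reason: str
    confidence_hint: int

def dedupe_variants(variants: list[SearchVariant]) -> list[SearchVariant]:
    # First occurrence per normalised query wins, so the original user query
    # (emitted first with the highest confidence_hint) is never displaced by
    # a later alias or token-degraded rewrite of the same string.
    seen: set[str] = set()
    out: list[SearchVariant] = []
    for v in variants:
        key = v.query.casefold().strip()
        if key not in seen:
            seen.add(key)
            out.append(v)
    return out

variants = dedupe_variants([
    SearchVariant("Frieren", "original", 100),
    SearchVariant("frieren", "alias_english", 80),   # same query modulo case: dropped
    SearchVariant("Sousou no Frieren", "alias_romaji", 80),
])
```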
Key finding 3 — `_anime_match_score` ports to `search anime`, conditionally
We took the 1,461 high-confidence (score ≥ 9.5) pairs from test/fixtures/aggregate/season_matrix/candidates/ (the PR #21 adjudication corpus) and re-scored under four input shapes that simulate what a search fan-out actually sees:
In the table below, "merge" counts pairs above the 70 merge threshold and "near" counts pairs at 50-69 (would not merge today). The first two scenarios merge everything because the `_shared_external_id` shortcut returns 1000 instantly. Strip the IDs and keep title + year + season: still 82% merge. Strip the context entirely and it collapses to 0% because title alone caps at 55. The implication is that the existing scorer ports as-is, provided the search fan-out preserves either the cross-source ID (best) or the year/season (good enough).
| scenario | merge (≥70) | near (50-69) | partial (30-49) | miss (<30) | raw p50 (title+ctx, pre-clamp) |
|---|---|---|---|---|---|
| `calendar_with_context` (status quo) | 100% | 0% | 0% | 0% | 1000 (ID shortcut) |
| `search_with_ids_no_context` (e.g. AniList/Jikan search rows carry `mal_id`) | 100% | 0% | 0% | 0% | 1000 (ID shortcut) |
| `search_partial_context_no_ids` (title + season + season_year) | 82% | 17% | 1% | 1% | 80 |
| `search_title_only` (title block only) | 0% | 98% | 1% | 1% | 55 (title cap) |
Interpretation:
- The `_shared_external_id` shortcut returns 1000 and dominates whenever both sides carry the same external ID — AniList search already returns `idMal`, Jikan search returns `mal_id` natively, MangaDex search returns its own UUID. These pairs port without any change to the scorer.
- The `_context_match_score` adds up to ~28 (year 15 + season 10 + format 6 + episodes 6 + aired ≤14d 8) and reliably pushes title-only above the 70 threshold whenever just `season_year` is present (year 15 + title 55 = 70). AniList/Jikan/Kitsu all expose `season_year` (or its date-derived equivalent) inside their search payload.
- Pure title-only matches (Shikimori's bare anime search row, ANN's substring row) cap at 55 and never cross the 70 threshold. The substrate does not need a new scorer — it needs the search fan-out to either keep the year field on Shikimori rows (its API has it; we just have to map it) or, for ANN, accept that ANN rows merge on shared ID only.
The takeaway: the follow-up issue's "extend merge to search" slice does not need a new scorer. It needs the search fan-out to retain season_year on every row it can (the easy 82% → 100% bridge) and accept a small Shikimori/ANN miss rate as the cost of not hydrating. Lowering the threshold below 70 would let "Naruto" and "Naruto Shippuden" merge, which is the kind of cross-source false positive the calendar adjudication corpus was specifically built to avoid.
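The arithmetic behind this takeaway can be sketched directly. The weights below are the ones quoted in this thread (title cap 55, year 15, season 10, ID shortcut 1000, threshold 70); the scoring function itself is a stand-in, not the real `_anime_match_score`:

```python
MERGE_THRESHOLD = 70

def sketch_score(*, shared_id=False, title=False, year=False, season=False) -> int:
    # Stand-in for _anime_match_score using the weights quoted above.
    if shared_id:
        return 1000  # _shared_external_id shortcut dominates everything
    score = 55 if title else 0   # title-only cap
    score += 15 if year else 0   # season_year contribution
    score += 10 if season else 0
    return score

# Title-only (Shikimori bare row, ANN substring row) can never merge:
assert sketch_score(title=True) == 55 < MERGE_THRESHOLD
# Title + season_year is exactly enough to cross the threshold:
assert sketch_score(title=True, year=True) == 70 >= MERGE_THRESHOLD
# Shared mal_id merges regardless of context:
assert sketch_score(shared_id=True) >= MERGE_THRESHOLD
```

This is why retaining `season_year` on every fan-out row is the cheap bridge from 82% to near-100% merge, while lowering the threshold is not.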
Key finding 4 — concrete search enhancement (code-ready)
These are the cheapest wins from the variant gain table:
4.A — AniList manga: client-side strip-parenthetical preprocessor
The clearest single win in the eval. AniList's GraphQL manga search misses on titles with annotation suffixes harvested from MangaDex. A 5-line preprocessor before the GraphQL call rescues 13/15 raw misses (55% → 97% any-variant; full retry chain not required):
```python
# animedex/backends/anilist/__init__.py — manga_search wrapper
import re

_PAREN_OR_DASH_SUFFIX = re.compile(r"\s*(?:\([^)]*\)|[-–—:]\s*.*|~[^~]*~)\s*$")

def _normalise_manga_query(q: str) -> str:
    cleaned = q
    while True:
        new = _PAREN_OR_DASH_SUFFIX.sub("", cleaned).strip()
        if new == cleaned or not new:
            break
        cleaned = new
    return cleaned or q
```

The CLI's contract stays unchanged (queries pass through verbatim from the user); this is a per-backend search-query normaliser, not a global rewrite. The animedex-style way to ship this is a `_normalise_manga_query` hook called inside `manga_search` before the GraphQL `Media(search: ...)` parameter is set, with the original query also tried as a tie-break if the normalised query returns nothing the user might want.
4.B — ANN anime: retry with sibling-source English alias (aggregate-level)
ANN's 18% → 72% jump in the eval came from feeding the seed's English alias to ANN's substring search. In production, the user typing animedex search anime "frieren" has no English alias yet — but aggregate.search calls AniList/Jikan/Kitsu in parallel, and the first one to land usually provides one. The fallback can therefore live in the aggregate fan-out:
```python
# animedex/agg/search.py — illustrative shape, not a literal diff
def _call_ann_with_alias_fallback(route, query, limit, *, sibling_results):
    rows = call_search_route(route, query=query, limit=limit)
    if rows:
        return rows
    # Harvest a few English-looking aliases from earlier-completed backends.
    aliases = _english_aliases_from(sibling_results)
    for alias in aliases[:2]:
        if alias != query:
            rows = call_search_route(route, query=alias, limit=limit)
            if rows:
                return rows
    return rows
```

Cost: at most two extra HTTP calls to ANN, and only when the first attempt returned zero rows AND at least one sibling backend already produced a result. Net expected gain on the 50-seed eval: +27 rescued anime per 50 seeds.
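The retry ordering can be checked with a stubbed transport. Everything here is a stand-in (`call_search_route` and `_english_aliases_from` are fakes); only the fallback shape mirrors the sketch above:

```python
calls = []

def call_search_route(route, *, query, limit):
    # Stub transport: this fake ANN "knows" only the English alias.
    calls.append(query)
    return [{"title": query}] if query == "Frieren" else []

def _english_aliases_from(sibling_results):
    # Stub: pretend an earlier-completed AniList call supplied an English alias.
    return ["Frieren"]

def _call_ann_with_alias_fallback(route, query, limit, *, sibling_results):
    rows = call_search_route(route, query=query, limit=limit)
    if rows:
        return rows
    aliases = _english_aliases_from(sibling_results)
    for alias in aliases[:2]:
        if alias != query:
            rows = call_search_route(route, query=alias, limit=limit)
            if rows:
                return rows
    return rows

rows = _call_ann_with_alias_fallback("ann", "Sousou no Frieren", 10, sibling_results=[])
```

The call log confirms the shape: the raw query first, the sibling alias only after the miss, and the first non-empty result short-circuits.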
4.C — Person/character: token-degraded retry on empty result
```python
# animedex/agg/search.py — illustrative
def _person_search_with_fallback(route, query, limit):
    rows = call_search_route(route, query=query, limit=limit)
    if rows:
        return rows
    tokens = query.split()
    if len(tokens) > 1:
        return call_search_route(route, query=tokens[0], limit=limit)
    return rows
```

Expected gain in the eval: Jikan `first_token` rescues 2/6 missed person seeds; Shikimori `first_token` rescues 1/1; Kitsu/AniList character `first_token` rescues 2/13 + 2/26 missed character seeds. Small absolute numbers, but the implementation is a four-line fallback so the ROI is good.
4.D — Merge for search anime (follow-up slice, not this PR)
This commits to the maintainer's scoped guidance ("cross-source merging on search lands in a follow-up slice that reuses PR #21's substrate directly"):
```python
# animedex/agg/search.py — proposed search-merge wrapper for the follow-up slice
from animedex.agg.calendar import _anime_match_score, _merge_season_items
from animedex.models.aggregate import AggregateResult

def _maybe_merge_anime(result: AggregateResult) -> AggregateResult:
    # Same dispatcher as _merge_season_items, but on a search result rather than
    # a season result. _anime_match_score already returns 1000 on shared mal_id,
    # which is the only safe shortcut for search inputs that lack full season
    # context (the eval at score >= 9.5 says 100% of high-confidence pairs share
    # mal_id; the title-only score caps at 55 < threshold so this is safe).
    return _merge_season_items(result)
```

Note: this only fires for `type=anime`. Manga/character/person/studio/publisher would need their own per-entity scorers + adjudication corpora before they merge. Surface only the merged shape for anime in the follow-up; ship the other types annotate-only.
Reproducing the numbers
```shell
# 1. Build seed inventory from real fixtures
PYTHONPATH=. python3 -m tools.search_eval.seeds --limit 50 --output tools/search_eval/seeds.json

# 2. Full eval (requires PP_AO3 or an equivalent proxy to avoid bucket saturation)
export PP_AO3='<your-proxy-url>'
PYTHONPATH=. python3 -m tools.search_eval.run_eval \
    --types anime,manga,character,person,studio,publisher \
    --seeds-per-type 50 --max-variants 6 --workers 8 --limit 10 \
    --out tools/search_eval/runs/$(date -u +%Y%m%dT%H%M%S)

# 3. Per-variant rescue rates from a completed run
PYTHONPATH=. python3 -m tools.search_eval.variant_gain --run tools/search_eval/runs/<dir>

# 4. Merge-fitness analysis (fixture-only, no network needed)
PYTHONPATH=. python3 -m tools.search_eval.merge_fitness
```

The full eval landed in 846 seconds with `--workers 8` through the brightdata proxy; AniList's 0.5 req/s sustained bucket is the dominant wall-clock cost. The artefacts under `tools/search_eval/runs/full50/` (`results-*.jsonl`, `summary.md`, `meta.json`, `variant_gain.md`) are reproducible from the commands above and are auditable from the JSONL rows.
Out-of-scope for this PR (intentionally)
- Wiring `_merge_season_items` into `search anime` is the follow-up issue's slice per maintainer guidance — this comment is the empirical motivation, not the implementation. PR #22 ships annotate-only `search` as agreed.
- The two client bugs (1.1 Jikan, 1.2 Kitsu) are small and self-contained; recommended as a quick follow-up PR rather than expanding PR #22's scope (AGENTS §15.1). Both fixes are 5-10 line patches with HTTP-mocked regression tests against captured `null-alternate_names` / 400-filter fixtures.
- The variant-rescue enhancements (4.A–4.C) also belong in a follow-up. The clean cut from this PR is: PR #22 lands the aggregate search/show surface; the follow-up adds the per-backend query-rewriting hooks and the AniList/ANN fallbacks, now that we have measured numbers to size the implementation by.





Summary
This PR adds the Phase 5 aggregate entity surface: `animedex search <type> <q>` and `animedex show <type> <prefix:id>`. The implementation introduces the shared aggregate substrate under `animedex/agg/`: a backend fan-out helper, prefix-id parser, type route table, Python API modules, and a shared `AggregateResult` envelope. `search` keeps backend-rich rows lossless and annotates every row with `_source` plus `_prefix_id`; `show` routes a prefixed ID back to the owning backend and rejects invalid type/backend pairs before any HTTP call.

Closes #19. Refs #1 Phase 5. Coordinates with #18 through the shared `_fanout.py` and `AggregateResult` substrate.

Demo

The GIF was rendered from `docs/source/_static/gifs/search_show.tape` after prewarming the local cache from committed fixtures with `tools/fixtures/prewarm_aggregate_cache.py`, so the demo does not depend on live upstream availability and does not include normal-example error output.

Examples
Failure-Mode Example
This block is intentionally a failure-mode example. The `ann` row uses the synthetic `test/fixtures/ann/substring_search/17-synthetic-503.yaml` fixture; healthy sources still return rows, stdout remains a valid aggregate envelope, stderr reports the failed source, and the command exits 0 because at least one source succeeded.

Source Matrix

`anime` · `manga` · `character` · `person` · `studio` · `publisher`

Fixture Notes
Most new live fixtures were captured on 2026-05-11 UTC; the reused MangaDex Berserk fixture was captured on 2026-05-07 UTC. The PR includes real upstream fixtures for the positive routes and clearly named synthetic fixtures for failure-path coverage.
Availability observations from fixture capture are intentionally documented instead of hidden: AniList typed `Frieren` anime search currently returned an empty `media` list, AniList typed `Berserk` manga search also yielded zero aggregate demo rows, and Kitsu people search for `Miyazaki` returned upstream 400 because that free-text filter is not accepted. The working examples therefore use positive captured rows from the other available sources while the aggregate layer still reports per-source failures or empty results honestly.

Verification

- `make format`
- `make format-check`
- `python -m animedex.policy.lint animedex/`
- `python -m animedex --help`
- `python -m animedex search --help`
- `python -m animedex show --help`
- `python -m animedex selftest` (111 passed, 0 failed)
- (70 passed)
- `make test` (2821 passed, 84 skipped, total coverage 99%)
- `make rst_auto`
- `make docs` (warning-free Sphinx build)
- `make build && make test_cli` (4 passed, 0 failed)
- `git diff --check`

Abstraction Proposal
Implementation follows the aggregate proposal in #19: #19 (comment). The shared-substrate constraints were also posted on #18 to avoid duplicate fan-out/envelope implementations: #18 (comment).