Fix #57: drop dead _app- filter and refresh hardcoded /api/bleed fallbacks #58
dpashutskii wants to merge 1 commit into ScrappyCocco:master
… /api/bleed fallbacks
HLTB rotated the search endpoint again (/api/finder -> /api/bleed). Runtime
discovery in `SearchInformations.__extract_search_url_script` already adapts
to that, but two things were keeping searches from succeeding on a cold call:
1. `send_website_request_getcode` and its async sibling were filtering scripts
by `'_app-' in script['src']` on the first pass. HLTB used to bundle the
relevant code under `_app-*.js`, but the modern (Turbopack) build emits
opaque chunk names like `0-~-0up.q3_p0.js`. The filter never matches today,
so the first pass always returned None and forced every search through a
redundant retry loop.
2. The hardcoded fallbacks (`SEARCH_URL`, `SearchAuthToken.search_url`) were
pinned to `/api/s` — three rename cycles behind. Refreshed to `/api/bleed`.
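To illustrate item 1, here is a minimal sketch of why a substring filter on `_app-` selects nothing once chunk names become opaque. The `/_next/static/chunks/` prefix, the legacy file name, and the second modern chunk name are assumptions made up for illustration; only `0-~-0up.q3_p0.js` comes from the description above.

```python
# Hypothetical script sources; only the opaque chunk name "0-~-0up.q3_p0.js"
# is taken from the PR text, everything else is invented for the example.
legacy_srcs = ["/_next/static/chunks/pages/_app-0123abcd.js"]
modern_srcs = [
    "/_next/static/chunks/0-~-0up.q3_p0.js",
    "/_next/static/chunks/1a2b3c.js",
]


def first_pass_candidates(srcs):
    """Mimic the old first-pass filter: keep only sources containing '_app-'."""
    return [src for src in srcs if "_app-" in src]


print(first_pass_candidates(legacy_srcs))  # the legacy bundle is selected
print(first_pass_candidates(modern_srcs))  # nothing is selected
```

With no candidates on the first pass, discovery can only succeed on the retry that parses all scripts.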
Changes:
* `send_website_request_getcode` / `async_send_website_request_getcode` now
iterate every `<script src>` tag and stop at the first one yielding a
`search_url`. The `parse_all_scripts` parameter is preserved for backward
compatibility but no longer changes behaviour. (The upstream callers
`send_web_request` / `send_async_web_request` already had a retry pattern
that called the same method twice — that pattern remains and is now just
redundant rather than load-bearing.)
* The async version also fixes a small unrelated bug: it previously executed
`return None` on the first script that returned a non-200 status, which
prevented trying any subsequent script. It now skips that script and
continues, matching the sync version's intent.
* `SEARCH_URL` and `SearchAuthToken.search_url` updated to `/api/bleed`.
* New `tests/test_search_url_extraction.py` with 5 hermetic unit tests
covering the regex against the real `/api/bleed` chunk shape, a
hypothetical future rename, GET-only fetches, no-fetch scripts, and
versioned endpoints. Existing integration tests are untouched.
* `setup.py` version bumped 1.0.21 -> 1.0.22.
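The loop behaviour described in the list above can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `POST_FETCH_RE`, `extract_search_url`, `find_search_url`, and the injected `fetch` callable are hypothetical names, and the real discovery regex may differ. Injecting `fetch` keeps the example hermetic, in the same spirit as the new unit tests.

```python
import re
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical pattern: fetch("/api/<name>...", {... method: "POST" ...}).
# The regex actually used by the library may look different.
POST_FETCH_RE = re.compile(
    r"""fetch\(\s*["'](/api/[A-Za-z0-9_]+)[^"']*["']\s*,\s*\{[^}]*method\s*:\s*["']POST["']""",
    re.IGNORECASE,
)


def extract_search_url(script_text: str) -> Optional[str]:
    """Return the root '/api/<name>' path of the first POST fetch, or None."""
    match = POST_FETCH_RE.search(script_text)
    return match.group(1) if match else None


def find_search_url(
    script_srcs: Iterable[str],
    fetch: Callable[[str], Tuple[int, str]],
) -> Optional[str]:
    """Try every script in order, skip failures, stop at the first hit."""
    for src in script_srcs:
        status, body = fetch(src)
        if status != 200:
            continue  # skip this script instead of aborting the whole search
        url = extract_search_url(body)
        if url is not None:
            return url
    return None


# Fake responses standing in for network fetches (all values are made up).
FAKE_SCRIPTS = {
    "a.js": (404, ""),
    "b.js": (200, 'fetch("/api/items", {method: "GET"})'),
    "c.js": (200, 'fetch("/api/bleed/v2", {method: "POST", body: x})'),
}

print(find_search_url(FAKE_SCRIPTS, FAKE_SCRIPTS.__getitem__))  # prints /api/bleed
```

The 404 on `a.js` is skipped rather than ending the search, the GET-only fetch in `b.js` yields nothing, and the versioned endpoint in `c.js` is reduced to its root path.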
End-to-end verified locally: `HowLongToBeat().search("The Witcher 3")` returns
5 results with "The Witcher 3: Wild Hunt" as best match (51.6h main story).
Refs ScrappyCocco#57.
ScrappyCocco left a comment
I appreciate the support a lot, but I would like you to clean up the code a little.
If you don't have the time, I can do it as soon as I have the time.
Additionally, you should add your test to the ignored files in codecov.yml (I think it's the only place in which the test files are referenced).
    class SearchAuthToken:
    -    search_url = "api/s"
    +    search_url = "api/bleed"
This was intentionally left as /s so that, if the URL retrieval doesn't work, it's easy to notice because this variable was not changed.
    # current as of 2026-05). The runtime extraction in
    # send_website_request_getcode is the source of truth — this is just
    # a backstop.
    SEARCH_URL = BASE_URL + "api/bleed"
Same reason as above, and please remove the comments.
I think no change is required either here or above.
    else:
    if resp is None or resp.status != 200:
    return None
    resp_text = await resp.text()
This is a nice change, but it inverted the "if" condition for no reason. Please restore the condition as it was so the actual change stands out in the diff.
Also, are you sure it's not working? See the comment in issue #57 (comment); maybe I don't need these changes.
Summary
Fixes #57.
HLTB rotated the search endpoint again (`/api/finder` → `/api/bleed`). Runtime discovery in `SearchInformations` adapts to whatever endpoint name shows up in the JS, but the existing `_app-*.js` script-name filter on the first pass of `send_website_request_getcode` (and its async sibling) hasn't matched anything since HLTB moved to Turbopack: chunks now have opaque names like `0-~-0up.q3_p0.js`. So every search was failing the first pass, falling back to the second pass, and either succeeding slowly or hitting the stale `/api/s` hardcoded fallback.
What changes
* Dropped the `_app-*.js` script filter in `send_website_request_getcode` and `async_send_website_request_getcode`. The methods now iterate every `<script src>` tag and stop at the first one that yields a `search_url`. The `parse_all_scripts` parameter is preserved for backward compatibility (no longer changes behaviour, documented in docstring).
* `SEARCH_URL` and `SearchAuthToken.search_url` updated from `/api/s` to `/api/bleed` (current as of 2026-05-07).
* The async version previously used `return None` on a non-200 script response, which broke out of the loop entirely. Now skips and continues to the next script, matching the sync version's intent.
* `tests/test_search_url_extraction.py`: 5 hermetic tests covering the discovery regex against the real `/api/bleed` chunk shape, a hypothetical future rename, GET-only fetches, scripts with no fetch call, and versioned endpoints (`/api/bleed/v2`). These don't hit the network, so they're safe to run in CI without flakes.
* `setup.py` version bumped `1.0.21` → `1.0.22`.
The integration tests in `tests/test_normal_request*.py` are unchanged.
Verification
    $ python3 -m unittest tests.test_search_url_extraction -v
    test_extracts_a_hypothetical_future_endpoint ... ok
    test_extracts_current_api_bleed_endpoint ... ok
    test_extracts_root_path_from_versioned_endpoint ... ok
    test_ignores_get_only_fetches ... ok
    test_returns_none_when_no_post_fetch_present ... ok
    Ran 5 tests in 0.000s
    OK
End-to-end against live HLTB:
Why minimal-diff
I kept the `parse_all_scripts` parameter on both `*_getcode` methods rather than removing it, so any downstream caller that was passing `True`/`False` directly continues to work. The same goes for the duplicate retry call in `send_web_request` / `send_async_web_request`: it's now wasteful but harmless, and removing it would expand the diff. Happy to drop those if you'd prefer a cleaner cut.
🤖 Generated with Claude Code