
Fix #57: drop dead _app- filter and refresh hardcoded /api/bleed fallbacks#58

Open
dpashutskii wants to merge 1 commit into ScrappyCocco:master from dpashutskii:fix/api-bleed-endpoint

Conversation

@dpashutskii

Summary

Fixes #57.

HLTB rotated the search endpoint again (/api/finder -> /api/bleed). Runtime discovery in SearchInformations adapts to whatever endpoint name shows up in the JS, but the existing _app-*.js script-name filter on the first pass of send_website_request_getcode (and its async sibling) hasn't matched anything since HLTB moved to Turbopack — chunks now have opaque names like 0-~-0up.q3_p0.js. So every search was failing the first pass, falling back to the second pass, and either succeeding slowly or hitting the stale /api/s hardcoded fallback.

What changes

  1. Drop the _app-*.js script filter in send_website_request_getcode and async_send_website_request_getcode. The methods now iterate every <script src> tag and stop at the first one that yields a search_url. The parse_all_scripts parameter is preserved for backward compatibility (no longer changes behaviour, documented in docstring).
  2. Refresh hardcoded fallbacks: SEARCH_URL and SearchAuthToken.search_url updated from /api/s to /api/bleed (current as of 2026-05-07).
  3. Bug fix in the async path: previously a non-200 response from any single script caused return None to break out of the loop entirely. Now skips and continues to the next script, matching the sync version's intent.
  4. New unit tests in tests/test_search_url_extraction.py — 5 hermetic tests covering the discovery regex against the real /api/bleed chunk shape, a hypothetical future rename, GET-only fetches, scripts with no fetch call, and versioned endpoints (/api/bleed/v2). These don't hit the network so they're safe to run in CI without flakes.
  5. Version bump 1.0.21 -> 1.0.22.

The integration tests in tests/test_normal_request*.py are unchanged.
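For reference, the reworked first pass amounts to something like the sketch below. Names and the endpoint regex here are illustrative, not the library's actual identifiers, and the script fetcher is injected so the logic can be shown without network access:

```python
import re

# Illustrative sketch only: the real logic lives in
# send_website_request_getcode / async_send_website_request_getcode.
SCRIPT_SRC_RE = re.compile(r'<script[^>]+src=["\']([^"\']+)["\']')
ENDPOINT_RE = re.compile(r'fetch\(\s*["\'](/api/[\w/]+)["\']')

def find_search_url(page_html, fetch_script):
    """Walk every <script src> in page order and return the first
    /api/... endpoint found. fetch_script(src) -> (status_code, body)."""
    for src in SCRIPT_SRC_RE.findall(page_html):
        status, body = fetch_script(src)
        if status != 200:
            continue  # skip this chunk and try the next (item 3's fix)
        match = ENDPOINT_RE.search(body)
        if match:
            return match.group(1)
    return None
```

With the old `'_app-' in src` filter in front of a loop like this, Turbopack chunk names such as `0-~-0up.q3_p0.js` were excluded before the extraction regex ever ran.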

Verification

$ python3 -m unittest tests.test_search_url_extraction -v
test_extracts_a_hypothetical_future_endpoint ... ok
test_extracts_current_api_bleed_endpoint ... ok
test_extracts_root_path_from_versioned_endpoint ... ok
test_ignores_get_only_fetches ... ok
test_returns_none_when_no_post_fetch_present ... ok
Ran 5 tests in 0.000s
OK

End-to-end against live HLTB:

>>> from howlongtobeatpy import HowLongToBeat
>>> results = HowLongToBeat().search("The Witcher 3")
>>> max(results, key=lambda r: r.similarity).game_name
'The Witcher 3: Wild Hunt'
>>> max(results, key=lambda r: r.similarity).main_story
51.61

Why minimal-diff

I kept the parse_all_scripts parameter on both *_getcode methods rather than removing it, so any downstream caller that was passing True/False directly continues to work. Same goes for the duplicate retry call in send_web_request / send_async_web_request — it's now wasteful but harmless, and removing it would expand the diff. Happy to drop those if you'd prefer a cleaner cut.

🤖 Generated with Claude Code

… /api/bleed fallbacks

HLTB rotated the search endpoint again (/api/finder -> /api/bleed). Runtime
discovery in `SearchInformations.__extract_search_url_script` already adapts
to that, but two things were keeping searches from succeeding on a cold call:

1. `send_website_request_getcode` and its async sibling were filtering scripts
   by `'_app-' in script['src']` on the first pass. HLTB used to bundle the
   relevant code under `_app-*.js`, but the modern (Turbopack) build emits
   opaque chunk names like `0-~-0up.q3_p0.js`. The filter never matches today,
   so the first pass always returned None and forced every search through a
   redundant retry loop.

2. The hardcoded fallbacks (`SEARCH_URL`, `SearchAuthToken.search_url`) were
   pinned to `/api/s` — three rename cycles behind. Refreshed to `/api/bleed`.

Changes:

* `send_website_request_getcode` / `async_send_website_request_getcode` now
  iterate every `<script src>` tag and stop at the first one yielding a
  `search_url`. The `parse_all_scripts` parameter is preserved for backward
  compatibility but no longer changes behaviour. (The upstream callers
  `send_web_request` / `send_async_web_request` already had a retry pattern
  that called the same method twice — that pattern remains and is now just
  redundant rather than load-bearing.)
* The async version also fixes a small unrelated bug: it previously called
  `return None` on the first script that returned non-200, which prevented
  trying any subsequent script. Now skips and continues, matching the sync
  version's intent.
* `SEARCH_URL` and `SearchAuthToken.search_url` updated to `/api/bleed`.
* New `tests/test_search_url_extraction.py` with 5 hermetic unit tests
  covering the regex against the real `/api/bleed` chunk shape, a
  hypothetical future rename, GET-only fetches, no-fetch scripts, and
  versioned endpoints. Existing integration tests are untouched.
* `setup.py` version bumped 1.0.21 -> 1.0.22.
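As a rough illustration of the pattern those unit tests exercise (the library's actual discovery regex may differ), something like this distinguishes the POST search fetch from GET-only fetches in a minified chunk:

```python
import re

# Rough illustration only: the real extraction regex in
# SearchInformations may differ. The idea is to pull the endpoint path
# out of a minified fetch("...", {method:"POST"}) call while ignoring
# GET-only fetches.
FETCH_POST_RE = re.compile(
    r'fetch\(\s*["\'](/api/[\w/]+)["\']'   # endpoint path literal
    r'[^)]*method\s*:\s*["\']POST["\']'    # the same call must be a POST
)

def extract_endpoint(chunk_js):
    match = FETCH_POST_RE.search(chunk_js)
    return match.group(1) if match else None
```

For example, `extract_endpoint('fetch("/api/bleed",{method:"POST"})')` yields `/api/bleed`, a GET-only fetch yields None, and a future rename or versioned path like `/api/bleed/v2` is still captured.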

End-to-end verified locally: `HowLongToBeat().search("The Witcher 3")` returns
5 results with "The Witcher 3: Wild Hunt" as best match (51.6h main story).

Refs ScrappyCocco#57.
@ScrappyCocco self-assigned this on May 7, 2026
@ScrappyCocco added the enhancement (New feature or request) label on May 7, 2026
Owner

@ScrappyCocco ScrappyCocco left a comment


I appreciate the support a lot, but I would like you to clean up the code a little. If you don't have the time, I can do it as soon as I find the time.

Additionally, you should add your test file as ignored in codecov.yml (I think that's the only place in which the test file is referenced).


class SearchAuthToken:
-    search_url = "api/s"
+    search_url = "api/bleed"
Owner


This was intentionally left at /api/s, so that if the "retrieve url" logic stops working it's easily noticeable, because this variable was never meant to be kept current.

# current as of 2026-05). The runtime extraction in
# send_website_request_getcode is the source of truth — this is just
# a backstop.
SEARCH_URL = BASE_URL + "api/bleed"
Owner

@ScrappyCocco ScrappyCocco May 7, 2026


Same reason as above, plus: remove the comments. I think no change is required here or above.

else:
    if resp is None or resp.status != 200:
        return None
    resp_text = await resp.text()
Owner


This is a nice change, but it inverted the "if" condition for no reason. Please redo the condition as it was before, so the actual change is more noticeable in the diff.

@ScrappyCocco
Owner

Also, are you sure it's not working? See the comment in issue #57 (comment); maybe I don't need these changes.
