Skip to content

Fix infinite 401 retry loop when app logs resubscribe fails#7055

Merged
craigmichaelmartin merged 5 commits intomainfrom
03-19-fix_infinite_401_retry_loop_when_app_logs_resubscribe_fails
Mar 26, 2026
Merged

Fix infinite 401 retry loop when app logs resubscribe fails#7055
craigmichaelmartin merged 5 commits intomainfrom
03-19-fix_infinite_401_retry_loop_when_app_logs_resubscribe_fails

Conversation

@craigmichaelmartin
Copy link
Copy Markdown
Contributor

@craigmichaelmartin craigmichaelmartin commented Mar 19, 2026

Summary

  • Wrap onResubscribe() in handleFetchAppLogsError with try-catch so resubscribe failures use a 60s retry interval instead of propagating as unhandled exceptions
  • Fixes infinite 401 retry loop in dev/poll-app-logs.ts (catch block retried every 5s with the same expired JWT)
  • Fixes unhandled crash in logs-command/render-json-logs.ts
  • Fixes silent polling stop in logs-command/usePollAppLogs.ts
  • Adds tests for the resubscribe failure path in both poll-app-logs and render-json-logs

WHY are these changes introduced?

WHAT is this pull request doing?

How to test your changes?

Post-release steps

Measuring impact

How do we know this change was effective? Please choose one:

  • n/a - this doesn't need measurement, e.g. a linting rule or a bug-fix
  • Existing analytics will cater for this addition
  • PR includes analytics changes to measure impact

Checklist

  • I've considered possible cross-platform impacts (Mac, Linux, Windows)
  • I've considered possible documentation changes

Copy link
Copy Markdown
Contributor Author

craigmichaelmartin commented Mar 19, 2026

@craigmichaelmartin craigmichaelmartin marked this pull request as ready for review March 19, 2026 19:51
@github-actions
Copy link
Copy Markdown
Contributor

We detected some changes at packages/*/src and there are no updates in the .changeset.
If the changes are user-facing, run pnpm changeset add to track your changes and include them in the next release CHANGELOG.

Caution

DO NOT create changesets for features which you do not wish to be included in the public changelog of the next CLI release.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Mar 19, 2026

Coverage report

St.
Category Percentage Covered / Total
🟢 Statements 82.2% 14967/18208
🟡 Branches 74.56% 7394/9917
🟢 Functions 81.3% 3779/4648
🟢 Lines 82.59% 14151/17135

Test suite run success

3938 tests passing in 1514 suites.

Report generated by 🧪jest coverage report action from 59476a1

Copy link
Copy Markdown
Contributor Author

@craigmichaelmartin craigmichaelmartin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


Review assisted by pair-review

Comment thread packages/app/src/cli/services/app-logs/dev/poll-app-logs.test.ts
Comment thread packages/app/src/cli/services/app-logs/logs-command/render-json-logs.test.ts Outdated
Comment thread packages/app/src/cli/services/app-logs/utils.ts
@craigmichaelmartin
Copy link
Copy Markdown
Contributor Author

Code Review from River 🌊

Overall assessment: Targeted, correct fix for the immediate symptom. Should ship — it'll reduce load by ~99% for this failure mode. But a follow-up is needed.

What this PR gets right

  • Correctly identifies the unhandled exception in the resubscribe path as the root cause of the retry storm
  • Uses the existing throttle interval (60s) which is reasonable
  • Good test coverage for both the poll and JSON render code paths
  • Small, focused change — minimal risk

What remains unfixed

  1. The CLI never refreshes the Identity token. The resubscribeCallback just calls subscribeToAppLogs again with the same expired bearer token. So the CLI will keep retrying every 60s indefinitely — failing every time — until the user manually restarts. The session becomes a zombie.

  2. No user-facing error message. The outputDebug is only visible in debug mode. A regular user will see their log stream silently stop with no indication of what happened. Consider adding an outputWarn like: "Your session token has expired. Please restart shopify app logs."

  3. No max retry count or escalation. Consider capping retries and then surfacing a clear error asking the user to restart.

Observability context

From Observe data (last 1h), the App Management GraphQL endpoint is receiving ~22K 401 requests/hour — all of them are AppLogsSubscribe, from only ~8 distinct CLI sessions. This confirms the tight retry loop this PR addresses. The 60s backoff should reduce this to ~480 requests/hour from those same clients.

Copy link
Copy Markdown
Contributor

@dmerand dmerand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dropping in a bug that looks like it needs addressing.

@@ -109,17 +110,26 @@ export interface AppLogsOptions {

export const handleFetchAppLogsError = async (
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resubscribeFailed is not enough to drive the new counter reset logic. A false value currently means either "resubscribe succeeded" or "this was some other error path, like 429/500, so no resubscribe happened". All three callers reset their consecutive-failure counter on false, which means unrelated errors between two 401 resubscribe failures can clear the counter and prevent the cutoff from triggering when it should.

Either return a distinct state for successful resubscribe vs. non-401 errors, or have the callers reset only when a fresh token is actually returned. This affects [[file:packages/app/src/cli/services/app-logs/dev/poll-app-logs.ts:107-117]], [[file:packages/app/src/cli/services/app-logs/logs-command/render-json-logs.ts:44-54]], and [[file:packages/app/src/cli/services/app-logs/logs-command/ui/components/hooks/usePollAppLogs.ts:169-179]].

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved — replaced resubscribeFailed: boolean with a tri-state resubscribeResult: 'succeeded' | 'failed' | 'not_attempted'. Callers now only reset the consecutive-failure counter on 'succeeded' and leave it unchanged on 'not_attempted' (429/500/other non-401 errors), so unrelated errors no longer clear the counter.

When the JWT for app log polling expires, handleFetchAppLogsError calls
onResubscribe() to get a fresh token. If onResubscribe() itself throws
(network error, expired session), the exception propagates unhandled,
causing the dev polling loop to retry every 5s with the same expired
JWT — an infinite 401 loop.

Wrap onResubscribe() in a try-catch so failures fall back to a 60s
retry interval instead of propagating. This is a 12x reduction in 401
rate per affected client and allows self-healing when the underlying
issue resolves.
@craigmichaelmartin craigmichaelmartin force-pushed the 03-19-fix_infinite_401_retry_loop_when_app_logs_resubscribe_fails branch 2 times, most recently from 45f916a to 9f2f644 Compare March 24, 2026 15:18
@craigmichaelmartin craigmichaelmartin force-pushed the 03-19-fix_infinite_401_retry_loop_when_app_logs_resubscribe_fails branch from 9f2f644 to 59476a1 Compare March 24, 2026 15:29
Copy link
Copy Markdown
Contributor

River's assessment is wrong in one thing, subscribeToAppLogs does try to refresh the identity token if expired using the unauthorizedHandler

@craigmichaelmartin craigmichaelmartin added this pull request to the merge queue Mar 26, 2026
Merged via the queue into main with commit e36b6ab Mar 26, 2026
29 checks passed
@craigmichaelmartin craigmichaelmartin deleted the 03-19-fix_infinite_401_retry_loop_when_app_logs_resubscribe_fails branch March 26, 2026 15:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants