fix: retry redis conn in the bg and fail open otherwise #5377
Aren't we overcomplicating this? Why do we do it differently here and introduce Redis as a potential failure point?
The API doesn't, though? The API also requires Redis to be up at startup, otherwise it errors. It just handles Redis being down at some point after startup.

What does this PR do?
Adds resilient middleware engine handling to prevent service failures when Redis is unavailable. The middleware engine now fails closed with a 503 Service Unavailable response when Redis connectivity is lost, and automatically retries connection in the background with exponential backoff.
Introduces a new ResilientEvaluator wrapper that atomically swaps between unavailable and working engine states. When Redis is configured but fails to connect, the service returns 503 errors instead of crashing, and continues attempting to reconnect until successful.
Adds a new error code EngineUnavailable with appropriate HTTP status mapping and Prometheus metrics for monitoring engine unavailability events.
Fixes #5365
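The swap-and-retry behaviour described above could look roughly like the following. This is a minimal sketch, not the code in this PR: only the names ResilientEvaluator and EngineUnavailable come from the description, while the Evaluator interface, ErrEngineUnavailable, swap, Retry, StatusFor, and allowAll are hypothetical. A plain atomic counter stands in for the sentinel_engine_unavailable_total Prometheus metric so the example needs only the standard library.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"sync/atomic"
	"time"
)

// ErrEngineUnavailable is a hypothetical stand-in for the EngineUnavailable error code.
var ErrEngineUnavailable = errors.New("engine unavailable")

// Evaluator is an assumed interface for the middleware engine.
type Evaluator interface {
	Evaluate(key string) (bool, error)
}

// unavailableEvaluator rejects every call; it is installed until Redis connects.
type unavailableEvaluator struct{}

func (unavailableEvaluator) Evaluate(string) (bool, error) { return false, ErrEngineUnavailable }

// allowAll is a toy "working" engine used only for the demo.
type allowAll struct{}

func (allowAll) Evaluate(string) (bool, error) { return true, nil }

// ResilientEvaluator keeps the active engine behind an atomic pointer so
// request goroutines never observe a half-initialised engine.
type ResilientEvaluator struct {
	current     atomic.Pointer[Evaluator]
	unavailable atomic.Int64 // stand-in for a Prometheus counter
}

func NewResilientEvaluator() *ResilientEvaluator {
	r := &ResilientEvaluator{}
	r.swap(unavailableEvaluator{})
	return r
}

// swap atomically replaces the active engine.
func (r *ResilientEvaluator) swap(e Evaluator) { r.current.Store(&e) }

// Evaluate delegates to the active engine and counts unavailability events.
func (r *ResilientEvaluator) Evaluate(key string) (bool, error) {
	ok, err := (*r.current.Load()).Evaluate(key)
	if errors.Is(err, ErrEngineUnavailable) {
		r.unavailable.Add(1)
	}
	return ok, err
}

// Retry calls connect with exponential backoff (capped at maxDelay) until it
// succeeds or ctx is cancelled, then swaps in the working engine.
func (r *ResilientEvaluator) Retry(ctx context.Context, connect func() (Evaluator, error), base, maxDelay time.Duration) {
	delay := base
	for {
		if e, err := connect(); err == nil {
			r.swap(e)
			return
		}
		select {
		case <-ctx.Done():
			return
		case <-time.After(delay):
		}
		if delay *= 2; delay > maxDelay {
			delay = maxDelay
		}
	}
}

// StatusFor maps an evaluation error to an HTTP status code.
func StatusFor(err error) int {
	switch {
	case err == nil:
		return http.StatusOK
	case errors.Is(err, ErrEngineUnavailable):
		return http.StatusServiceUnavailable
	default:
		return http.StatusInternalServerError
	}
}

func main() {
	r := NewResilientEvaluator()
	_, err := r.Evaluate("req-1")
	fmt.Println(StatusFor(err)) // 503 while no engine is connected

	attempts := 0
	connect := func() (Evaluator, error) { // simulated Redis dial: fails twice, then succeeds
		if attempts++; attempts < 3 {
			return nil, errors.New("connection refused")
		}
		return allowAll{}, nil
	}
	// A service would run this with `go r.Retry(...)`; it is synchronous here to keep the demo deterministic.
	r.Retry(context.Background(), connect, time.Millisecond, 10*time.Millisecond)
	_, err = r.Evaluate("req-2")
	fmt.Println(StatusFor(err)) // 200 once the working engine is swapped in
}
```

Storing the engine behind a single atomic pointer means the request path takes no lock, and the background retry loop is the only writer.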
Type of change
How should this be tested?
sentinel_engine_unavailable_total Prometheus metric during unavailability
Checklist
Required
pnpm build
pnpm fmt
make fmt on /go directory
console.logs
git pull origin main
Appreciated