
fix: retry redis conn in the bg and fail open otherwise #5377

Draft
Flo4604 wants to merge 1 commit into 03-11-feat_cache_parsed_policies_extract_timer_pkg from 03-18-fix_retry_redis_conn_in_the_bg_and_fail_open_otherwise

Conversation

@Flo4604
Member

@Flo4604 Flo4604 commented Mar 18, 2026

What does this PR do?

Adds resilient middleware engine handling to prevent service failures when Redis is unavailable. The middleware engine now fails closed with a 503 Service Unavailable response when Redis connectivity is lost, and automatically retries connection in the background with exponential backoff.
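As a rough illustration of the exponential backoff mentioned above, here is a minimal sketch of a delay schedule that doubles per attempt and is capped. The function name `backoffDelay` and the base and cap values are assumptions for illustration; the PR does not state the actual parameters.

```go
package main

import (
	"fmt"
	"time"
)

// backoffDelay returns the wait before retry number attempt (0-based).
// The delay starts at base, doubles per attempt, and never exceeds max.
// Hypothetical helper: the real schedule used by the PR is not shown in the description.
func backoffDelay(attempt int, base, max time.Duration) time.Duration {
	d := base
	for i := 0; i < attempt; i++ {
		d *= 2
		if d >= max {
			// Returning early also avoids overflow for very large attempt counts.
			return max
		}
	}
	if d > max {
		return max
	}
	return d
}

func main() {
	// Illustrative values only: 500ms base, 10s cap.
	for attempt := 0; attempt < 6; attempt++ {
		fmt.Println(attempt, backoffDelay(attempt, 500*time.Millisecond, 10*time.Second))
	}
}
```

Capping the delay keeps recovery time bounded once Redis comes back, at the cost of a steady trickle of connection attempts during long outages.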

Introduces a new ResilientEvaluator wrapper that atomically swaps between unavailable and working engine states. When Redis is configured but fails to connect, the service returns 503 errors instead of crashing, and continues attempting to reconnect until successful.

Adds a new error code EngineUnavailable with appropriate HTTP status mapping and Prometheus metrics for monitoring engine unavailability events.

Fixes #5365

Type of change

  • Enhancement (small improvements)
  • Bug fix (non-breaking change which fixes an issue)
  • Chore (refactoring code, technical debt, workflow improvements)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How should this be tested?

  • Start sentinel service with Redis URL configured but Redis server unavailable
  • Verify requests return 503 with "middleware engine temporarily unavailable" message
  • Start Redis server and verify engine automatically recovers
  • Monitor sentinel_engine_unavailable_total Prometheus metric during unavailability
  • Test with empty Redis URL to ensure pass-through mode still works

Checklist

Required

  • Filled out the "How to test" section in this PR
  • Read Contributing Guide
  • Self-reviewed my own code
  • Commented on my code in hard-to-understand areas
  • Ran pnpm build
  • Ran pnpm fmt
  • Ran make fmt on /go directory
  • Checked for warnings, there are none
  • Removed all console.logs
  • Merged the latest changes from main onto my branch with git pull origin main
  • My changes don't cause any responsiveness issues

Appreciated

  • If a UI change was made: Added a screen recording or screenshots to this PR
  • Updated the Unkey Docs if changes were necessary

Member Author

Flo4604 commented Mar 18, 2026

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@Flo4604 Flo4604 mentioned this pull request Mar 18, 2026
19 tasks
@vercel

vercel bot commented Mar 18, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| dashboard | Ready | Preview, Comment | Mar 24, 2026 1:23pm |


@coderabbitai
Contributor

coderabbitai bot commented Mar 18, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 3ae8b072-77bd-416d-aaa0-8524da8d3f62

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Comment @coderabbitai help to get the list of available commands and usage tips.

@Flo4604 Flo4604 changed the base branch from 03-18-fix_use_correct_health_handlers to graphite-base/5377 March 18, 2026 16:10
@Flo4604 Flo4604 force-pushed the graphite-base/5377 branch from c5c3602 to 86ca217 Compare March 18, 2026 16:11
@Flo4604 Flo4604 force-pushed the 03-18-fix_retry_redis_conn_in_the_bg_and_fail_open_otherwise branch from 6e1f0ac to d62a21e Compare March 18, 2026 16:11
@chronark
Collaborator

Aren't we overcomplicating this?
Redis being down should be fine; our API handles it gracefully too and falls back to local evaluation.

Why do we do it differently here and introduce Redis as a potential failure point?

Member Author

Flo4604 commented Mar 19, 2026

The API doesn't, though. It also requires Redis to be up at startup; otherwise it errors.

It only handles Redis going down at some point after startup.
pkg/counter/redis.go L78

@Flo4604 Flo4604 force-pushed the graphite-base/5377 branch from 86ca217 to f57a27c Compare March 19, 2026 11:14
@Flo4604 Flo4604 force-pushed the 03-18-fix_retry_redis_conn_in_the_bg_and_fail_open_otherwise branch from d62a21e to 3f550d3 Compare March 19, 2026 11:14
@Flo4604 Flo4604 force-pushed the 03-18-fix_retry_redis_conn_in_the_bg_and_fail_open_otherwise branch from 3f550d3 to 3bdc149 Compare March 23, 2026 12:08
@Flo4604 Flo4604 force-pushed the graphite-base/5377 branch from f57a27c to 06fcc58 Compare March 23, 2026 12:08
@Flo4604 Flo4604 changed the base branch from graphite-base/5377 to 03-11-feat_cache_parsed_policies_extract_timer_pkg March 23, 2026 12:09
@Flo4604 Flo4604 force-pushed the 03-11-feat_cache_parsed_policies_extract_timer_pkg branch from 06fcc58 to 9aa4e45 Compare March 23, 2026 12:26
@Flo4604 Flo4604 force-pushed the 03-18-fix_retry_redis_conn_in_the_bg_and_fail_open_otherwise branch from 3bdc149 to 60fb344 Compare March 23, 2026 12:26
@Flo4604 Flo4604 force-pushed the 03-11-feat_cache_parsed_policies_extract_timer_pkg branch from 9aa4e45 to 98c4bef Compare March 23, 2026 12:44
@Flo4604 Flo4604 force-pushed the 03-18-fix_retry_redis_conn_in_the_bg_and_fail_open_otherwise branch from 60fb344 to 9763159 Compare March 23, 2026 12:44
@Flo4604 Flo4604 force-pushed the 03-18-fix_retry_redis_conn_in_the_bg_and_fail_open_otherwise branch from 9763159 to 4cbd947 Compare March 23, 2026 12:59
@Flo4604 Flo4604 force-pushed the 03-11-feat_cache_parsed_policies_extract_timer_pkg branch from 98c4bef to 26fec89 Compare March 23, 2026 12:59
@Flo4604 Flo4604 force-pushed the 03-18-fix_retry_redis_conn_in_the_bg_and_fail_open_otherwise branch from 4cbd947 to a5f4056 Compare March 23, 2026 13:08
@Flo4604 Flo4604 force-pushed the 03-18-fix_retry_redis_conn_in_the_bg_and_fail_open_otherwise branch from a5f4056 to e64329e Compare March 23, 2026 13:12
@Flo4604 Flo4604 force-pushed the 03-18-fix_retry_redis_conn_in_the_bg_and_fail_open_otherwise branch from e64329e to 0ddba89 Compare March 23, 2026 13:38
@Flo4604 Flo4604 force-pushed the 03-11-feat_cache_parsed_policies_extract_timer_pkg branch from 59db08a to bda6c5e Compare March 23, 2026 13:41
@Flo4604 Flo4604 force-pushed the 03-18-fix_retry_redis_conn_in_the_bg_and_fail_open_otherwise branch from 0ddba89 to 7be0117 Compare March 23, 2026 13:41
@Flo4604 Flo4604 force-pushed the 03-11-feat_cache_parsed_policies_extract_timer_pkg branch from bda6c5e to 82121dd Compare March 23, 2026 13:44
@Flo4604 Flo4604 force-pushed the 03-18-fix_retry_redis_conn_in_the_bg_and_fail_open_otherwise branch from 7be0117 to 832fd10 Compare March 23, 2026 13:44
@Flo4604 Flo4604 force-pushed the 03-11-feat_cache_parsed_policies_extract_timer_pkg branch from 82121dd to 9675019 Compare March 24, 2026 09:38
@Flo4604 Flo4604 force-pushed the 03-18-fix_retry_redis_conn_in_the_bg_and_fail_open_otherwise branch from 832fd10 to a777cd3 Compare March 24, 2026 09:38
@Flo4604 Flo4604 force-pushed the 03-18-fix_retry_redis_conn_in_the_bg_and_fail_open_otherwise branch from a777cd3 to c8d828b Compare March 24, 2026 12:08
@Flo4604 Flo4604 force-pushed the 03-11-feat_cache_parsed_policies_extract_timer_pkg branch from 9675019 to 79fdfc8 Compare March 24, 2026 12:08
@Flo4604 Flo4604 force-pushed the 03-18-fix_retry_redis_conn_in_the_bg_and_fail_open_otherwise branch from c8d828b to 8912607 Compare March 24, 2026 12:22
@Flo4604 Flo4604 force-pushed the 03-11-feat_cache_parsed_policies_extract_timer_pkg branch from 79fdfc8 to 14563fb Compare March 24, 2026 12:22
@Flo4604 Flo4604 force-pushed the 03-11-feat_cache_parsed_policies_extract_timer_pkg branch from 14563fb to 108b602 Compare March 24, 2026 13:21
@Flo4604 Flo4604 force-pushed the 03-18-fix_retry_redis_conn_in_the_bg_and_fail_open_otherwise branch from 8912607 to 8339690 Compare March 24, 2026 13:21
