Problem
pgflow's exponential backoff has two critical issues:
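- Integer overflow crash in SQL at attempt 31 or later (with base_delay = 1; larger base delays overflow sooner)
- Unbounded retry delays: a single delay can grow to 34 years, with 68 years of cumulative waiting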
Timeline to Overflow
| Attempt | Single Delay | Total Time Elapsed | Status |
|---|---|---|---|
| 1 | 2 sec | 2 sec | ✓ Works |
| 10 | 17 min | 34 min | ✓ Works |
| 20 | 12 days | 24 days | ⚠️ Frozen for weeks |
| 30 | 34 years | 68 years | ⚠️ Unreasonable |
| 31 | - | - | ❌ CRASH: integer overflow |
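The figures in this table can be reproduced with a small model of the SQL expression. The sketch below assumes base_delay = 1 (which matches the table) and imitates the `::int` cast by flagging any value above 2,147,483,647. `simulateBackoff`, `BackoffRow`, and `INT32_MAX` are illustrative names, not pgflow code.

```typescript
// Model of floor(base_delay * power(2, attempts_count))::int.
// Illustrative only: none of these names exist in pgflow.
const INT32_MAX = 2147483647;

interface BackoffRow {
  attempt: number;
  delaySeconds: number; // delay computed for this attempt
  totalSeconds: number; // sum of all non-overflowing delays so far
  overflows: boolean;   // true when the ::int cast would raise an error
}

function simulateBackoff(baseDelay: number, maxAttempt: number): BackoffRow[] {
  const rows: BackoffRow[] = [];
  let total = 0;
  for (let attempt = 1; attempt <= maxAttempt; attempt++) {
    const delay = Math.floor(baseDelay * Math.pow(2, attempt));
    const overflows = delay > INT32_MAX;
    if (!overflows) total += delay;
    rows.push({ attempt, delaySeconds: delay, totalSeconds: total, overflows });
  }
  return rows;
}

const rows = simulateBackoff(1, 31);
const firstOverflow = rows.find((r) => r.overflows)?.attempt;
console.log(firstOverflow);                               // 31
console.log(rows[9].delaySeconds, rows[9].totalSeconds);  // 1024 2046 (about 17 min and 34 min)
```

Under the same model, a base_delay of 4 moves the first overflowing attempt down to 29, so the crash point depends on the configured base delay.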
Industry Comparison
| System | Max Delay Cap | Default Max Attempts | Pattern |
|---|---|---|---|
| Temporal | 100 seconds | ∞ | Exponential capped |
| Trigger.dev | 30 seconds | 3 | Exponential capped |
| Graphile Worker | ~6 hours | 24 | exp(least(10, attempt)) |
| Inngest | Not specified | 4 | Exponential with jitter |
| pgflow SQL | NONE ❌ | 3 (unlimited) | 2^attempt uncapped |
| pgflow edge-worker | 300s default, 68 years max ❌ | 50 max | Exponential with cap |
Root Cause
SQL Core (Primary Issue)
Function: pkgs/core/schemas/0030_utilities.sql (lines 21-32)
create or replace function pgflow.calculate_retry_delay(
  base_delay numeric,
  attempts_count int
)
returns int
language sql
as $$
  -- No cap on the exponent or on the result
  select floor(base_delay * power(2, attempts_count))::int
$$;
Problems:
- No exponent cap: power(2, 31) exceeds the PostgreSQL int32 limit (2,147,483,647)
- No result cap: delays can grow to years/decades
- Schema allows unlimited opt_max_attempts (no upper bound)
Used by: fail_task() function when task fails and needs retry
Edge Worker (Secondary Issue)
Function: pkgs/edge-worker/src/queue/createQueueWorker.ts (lines 91-101)
export function calculateRetryDelay(attempt: number, config: RetryConfig): number {
  switch (config.strategy) {
    case 'fixed':
      return config.baseDelay;
    case 'exponential': {
      const delay = config.baseDelay * Math.pow(2, attempt - 1);
      return Math.min(delay, config.maxDelay ?? 300);
    }
  }
}
Current behavior:
- ✅ Default maxDelay: 300 seconds (5 minutes) is reasonable
- ✅ Hard limit of 50 attempts prevents JavaScript overflow
- ✅ JavaScript handles Infinity gracefully (caps via Math.min)
- ❌ Validation allows maxDelay up to 2,147,483,647 seconds (68 years!)
Validation: pkgs/edge-worker/src/queue/validateRetryConfig.ts (lines 48-66)
// Prevents overflow
if (config.limit > 50) {
  throw new Error('For exponential strategy, limit must not exceed 50');
}
// But allows absurd maxDelay!
const MAX_POSTGRES_INTERVAL_SECONDS = 2147483647; // 68 years
if (config.maxDelay > MAX_POSTGRES_INTERVAL_SECONDS) {
  throw new Error(`maxDelay must not exceed ${MAX_POSTGRES_INTERVAL_SECONDS} seconds`);
}
Edge-worker delay comparison (baseDelay = 3 seconds, delay = baseDelay * 2^(attempt - 1)):
| Attempt | Default (maxDelay=300) | Unrestricted (maxDelay=2147483647) |
|---|---|---|
| 1 | 3 sec | 3 sec |
| 5 | 48 sec | 48 sec |
| 10 | 300 sec (capped) | 25.6 min |
| 20 | 300 sec (capped) | 18.2 days |
| 30 | 300 sec (capped) | 51 years |
| 50 | 300 sec (capped) | 68 years (capped) |
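To check the comparison against the quoted formula, the sketch below re-states `calculateRetryDelay` and evaluates it for both configurations. The `RetryConfig` shape is a guess based on the fields the excerpt uses (the real type is not shown in this issue), and baseDelay = 3 is inferred from the first rows of the table.

```typescript
// Assumed shape: only the fields used by the quoted excerpt.
interface RetryConfig {
  strategy: 'fixed' | 'exponential';
  baseDelay: number;
  maxDelay?: number;
}

// Same logic as the calculateRetryDelay excerpt above.
function calculateRetryDelay(attempt: number, config: RetryConfig): number {
  if (config.strategy === 'fixed') return config.baseDelay;
  const delay = config.baseDelay * Math.pow(2, attempt - 1);
  return Math.min(delay, config.maxDelay ?? 300);
}

const defaults: RetryConfig = { strategy: 'exponential', baseDelay: 3 };
const unrestricted: RetryConfig = { ...defaults, maxDelay: 2147483647 };

for (const attempt of [1, 5, 10, 20, 30, 50]) {
  console.log(
    attempt,
    calculateRetryDelay(attempt, defaults),
    calculateRetryDelay(attempt, unrestricted),
  );
}
// 1 3 3
// 5 48 48
// 10 300 1536
// 20 300 1572864
// 30 300 1610612736
// 50 300 2147483647
```

With these inputs, attempt 20 is 1,572,864 seconds (about 18.2 days), attempt 30 is about 51 years, and attempt 50 is held at the 2,147,483,647-second ceiling (about 68 years) by `Math.min`.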
Proposed Solution
1. Fix SQL calculate_retry_delay (High Priority)
Replace line 31 in pkgs/core/schemas/0030_utilities.sql:
select least(86400, greatest(0, floor(base_delay * power(2, least(attempts_count, 30)))::bigint))::int
What this does:
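- Caps exponent at 30 and casts through bigint before the final ::int → prevents overflow
- Caps result at 86400 seconds (24 hours) → prevents unbounded delays
- Keeps exponential growth until the delay reaches the 24-hour cap (attempt 17 with base_delay = 1), then holds it at 86400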
This one-line change fixes both SQL-side problems: the overflow and the unbounded delay.
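The effect of the two caps is easier to see with concrete numbers. The sketch below mirrors the proposed SQL expression in TypeScript; `cappedRetryDelay` and the two constants are hypothetical names used only for illustration, and the function is a model of the SQL, not a replacement for it.

```typescript
const MAX_DELAY_SECONDS = 86400; // 24 hours, the proposed result cap
const MAX_EXPONENT = 30;         // the proposed exponent cap

// Mirrors: least(86400, greatest(0, floor(base_delay * power(2, least(attempts_count, 30)))))
function cappedRetryDelay(baseDelay: number, attemptsCount: number): number {
  const exponent = Math.min(attemptsCount, MAX_EXPONENT);
  const raw = Math.floor(baseDelay * Math.pow(2, exponent));
  return Math.min(MAX_DELAY_SECONDS, Math.max(0, raw));
}

console.log(cappedRetryDelay(1, 0));   // 1
console.log(cappedRetryDelay(1, 16));  // 65536 (still exponential)
console.log(cappedRetryDelay(1, 17));  // 86400 (first capped attempt)
console.log(cappedRetryDelay(1, 31));  // 86400
console.log(cappedRetryDelay(1, 100)); // 86400
console.log(cappedRetryDelay(0, 50));  // 0
```

For base_delay = 1 the result cap, not the exponent cap, is what limits the delay in practice; the exponent cap exists to keep the intermediate value within range.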
2. Update Edge-Worker Validation (Recommended)
Option A: Hard cap at 24 hours (strict)
Update pkgs/edge-worker/src/queue/validateRetryConfig.ts line 64:
const MAX_RETRY_DELAY_SECONDS = 86400; // 24 hours (align with SQL)
if (config.maxDelay > MAX_RETRY_DELAY_SECONDS) {
  throw new Error(`maxDelay must not exceed ${MAX_RETRY_DELAY_SECONDS} seconds (24 hours)`);
}
Option B: Warning + PostgreSQL limit (permissive)
if (config.maxDelay > 86400) {
  console.warn(`maxDelay of ${config.maxDelay}s exceeds recommended maximum of 86400s (24 hours)`);
}
if (config.maxDelay > MAX_POSTGRES_INTERVAL_SECONDS) {
  throw new Error(`maxDelay must not exceed ${MAX_POSTGRES_INTERVAL_SECONDS} seconds`);
}
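The two options differ only in where the hard limit sits, so they can be compared side by side. The sketch below is one possible way to express both behind a single helper; `checkMaxDelay` and `MaxDelayPolicy` are hypothetical names, and returning the warning text instead of calling console.warn is a choice made here so the behavior is easy to assert in tests.

```typescript
const RECOMMENDED_MAX_DELAY_SECONDS = 86400;      // 24 hours, aligned with the SQL cap
const MAX_POSTGRES_INTERVAL_SECONDS = 2147483647; // limit used by the existing validation

type MaxDelayPolicy = 'strict' | 'permissive';    // Option A | Option B

// Hypothetical helper: throws on a hard violation, returns warnings otherwise.
function checkMaxDelay(maxDelay: number, policy: MaxDelayPolicy): string[] {
  const hardLimit =
    policy === 'strict' ? RECOMMENDED_MAX_DELAY_SECONDS : MAX_POSTGRES_INTERVAL_SECONDS;
  if (maxDelay > hardLimit) {
    throw new Error(`maxDelay must not exceed ${hardLimit} seconds`);
  }
  if (maxDelay > RECOMMENDED_MAX_DELAY_SECONDS) {
    // Only reachable under the permissive policy.
    return [
      `maxDelay of ${maxDelay}s exceeds recommended maximum of ${RECOMMENDED_MAX_DELAY_SECONDS}s (24 hours)`,
    ];
  }
  return [];
}

console.log(checkMaxDelay(86400, 'strict'));            // []
console.log(checkMaxDelay(86401, 'permissive').length); // 1
```

Under the strict policy 86401 is rejected outright, while the permissive policy accepts it with a warning and only rejects values above 2,147,483,647.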
3. Add Database Constraint (Optional)
In pkgs/core/schemas/0050_tables_definitions.sql:
constraint opt_max_attempts_is_reasonable check (opt_max_attempts >= 0 and opt_max_attempts <= 100)
New Tests Required
SQL Tests
Add pkgs/core/supabase/tests/functions/calculate_retry_delay.test.sql:
- Normal exponential growth works (attempts 1, 3, 5 with various base_delay)
- Attempt 31 does not crash (overflow prevention)
- Attempt 50 does not crash
- Attempt 100 does not crash
- Attempt 30 caps at 86400 seconds
- Attempt 31 caps at 86400 seconds
- Large base_delay + high attempt caps at 86400
- base_delay=0 returns 0 delay
- Attempt 0 returns base_delay (edge case)
Edge-Worker Tests (if validation updated)
Update existing tests in pkgs/edge-worker/tests/:
- maxDelay=86400 is accepted (boundary)
- maxDelay=86401 is rejected (over limit)
- maxDelay=300 works as default
- Warning logged when maxDelay > 86400 (if using Option B)
Impact
Before fix:
- ❌ SQL crashes at attempt 31 with ERROR: integer out of range
- ❌ Workflows frozen for weeks/years
- ❌ Failed tasks consume resources indefinitely
- ❌ No alerts until catastrophic failure
- ❌ Edge-worker accepts 68-year delays
After fix:
- ✅ No overflow at any attempt count
- ✅ Maximum 24-hour retry delay (industry-aligned)
- ✅ Tasks fail explicitly after reasonable time
- ✅ Faster failure detection and recovery
- ✅ Consistent limits between SQL and edge-worker