Troubleshooting Guide

Recent Fixes Applied (January 2025)

Issue: ETIMEDOUT Errors and API Unavailable Messages

Root Causes Identified:

  1. Double slash in API URLs (https://api.getforge.com//deployed_sites.json)
  2. No timeout configuration for fetch requests
  3. No retry mechanism for transient network errors
  4. Insufficient error handling and logging

Fixes Applied:

1. Fixed Double Slash URL Issue

  • Problem: When the FORGE_API env var ended with a slash and the code prepended another slash to the path, the resulting URL contained //
  • Solution: Added URL normalization to remove trailing slashes before constructing API URLs
  • Files Modified: deleter.js, src/middlewares/common.js
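The normalization described above can be sketched as follows. This is a minimal illustration, not the code from deleter.js or common.js: the helper names normalizeBaseUrl and buildApiUrl are hypothetical.

```javascript
// Hypothetical helpers; the real function names in the modified files are not shown in this guide.

// Remove any trailing slashes from the configured base URL.
function normalizeBaseUrl(baseUrl) {
  return baseUrl.replace(/\/+$/, '')
}

// Join base and path with exactly one slash, whatever the inputs look like.
function buildApiUrl(baseUrl, path) {
  const cleanPath = path.replace(/^\/+/, '')
  return `${normalizeBaseUrl(baseUrl)}/${cleanPath}`
}

console.log(buildApiUrl('https://api.getforge.com/', 'deployed_sites.json'))
// prints https://api.getforge.com/deployed_sites.json
```

Stripping slashes on both sides of the join means the result is the same whether or not FORGE_API carries a trailing slash.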

2. Added Timeout Configuration

  • Added: 10-second timeout for all API requests
  • Implementation: Using AbortController for proper timeout handling
  • Files Modified: deleter.js, src/middlewares/common.js
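A timeout built on AbortController typically looks like the sketch below. It assumes a fetch implementation that accepts a signal option (the global fetch in Node 18+, or a compatible library). The wrapper name fetchWithTimeout and the injectable fetchImpl parameter are hypothetical and exist only to keep the example self-contained.

```javascript
const API_TIMEOUT = 10000 // 10 seconds, matching the value in this guide

// Hypothetical wrapper: aborts the request if it has not settled within timeoutMs.
async function fetchWithTimeout(url, options = {}, timeoutMs = API_TIMEOUT, fetchImpl = fetch) {
  const controller = new AbortController()
  const timer = setTimeout(() => controller.abort(), timeoutMs)
  try {
    // The signal lets the fetch implementation reject once abort() is called.
    return await fetchImpl(url, { ...options, signal: controller.signal })
  } finally {
    // Always clear the timer so a finished request does not keep the process alive.
    clearTimeout(timer)
  }
}
```

With the standard fetch, an aborted request rejects with an error whose name is AbortError, which is one of the retry conditions listed in this guide.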

3. Implemented Retry Mechanism

  • Added: Exponential backoff retry logic for network errors and HTTP 5xx errors
  • Configuration:
    • deleter.js: 3 retries with 1s base delay
    • common.js: 2 retries with 500ms base delay
  • Retry Conditions: ETIMEDOUT, ECONNRESET, ENOTFOUND, AbortError, HTTP 5xx errors (502, 503, 504, etc.)
  • Note: 502 Bad Gateway and other 5xx errors are now retried automatically
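The retry behaviour above could be implemented roughly as shown here. This is a sketch under stated assumptions: the delay is assumed to double on each attempt (the guide only says "exponential backoff"), the names isRetryableError, backoffDelay, and withRetries are hypothetical, and the check on err.cause is an assumption that covers fetch implementations which wrap the system error.

```javascript
const RETRYABLE_CODES = new Set(['ETIMEDOUT', 'ECONNRESET', 'ENOTFOUND'])

// True when a thrown error matches the retry conditions listed above.
function isRetryableError(err) {
  const code = err.code ?? err.cause?.code // some fetch implementations nest the code in err.cause
  return err.name === 'AbortError' || RETRYABLE_CODES.has(code)
}

// Assumed schedule: base delay doubled per attempt (attempt starts at 0).
function backoffDelay(attempt, baseMs) {
  return baseMs * 2 ** attempt
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

// Hypothetical wrapper: doRequest is any async function resolving to an object with a numeric status.
async function withRetries(doRequest, maxRetries = 3, baseMs = 1000) {
  for (let attempt = 0; ; attempt++) {
    try {
      const res = await doRequest()
      // Return on success, on non-5xx errors, or when retries are exhausted.
      if (res.status < 500 || attempt >= maxRetries) return res
    } catch (err) {
      if (!isRetryableError(err) || attempt >= maxRetries) throw err
    }
    const delay = backoffDelay(attempt, baseMs)
    console.log(`🔄 Retrying API request in ${delay}ms`)
    await sleep(delay)
  }
}
```

Called with the deleter.js settings (3 retries, 1s base), this schedule waits 1s, 2s, and 4s between attempts. With the common.js settings (2 retries, 500ms base), it waits 500ms and 1s.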

4. Enhanced Error Handling

  • Added: Better error categorization and logging
  • Added: Specific handling for timeout vs. other network errors
  • Added: Validation for missing FORGE_API environment variable
  • Added: Request timeout protection (30s) when loading site meta, separate from the 10-second API timeout, to prevent hanging requests
  • Added: Automatic retry for HTTP 5xx errors (502, 503, 504)
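As an illustration of the categorization and validation items above, the sketch below separates timeouts from other network errors and checks FORGE_API before use. The function names and category labels are hypothetical. Mapping AbortError to a request timeout and ETIMEDOUT to a connection timeout is an assumption based on the log messages quoted in the Monitoring section.

```javascript
// Hypothetical helper: map an error to a coarse category for logging.
function categorizeError(err) {
  const code = err.code ?? err.cause?.code // assumption: the code may be nested in err.cause
  if (err.name === 'AbortError') return 'request-timeout'
  if (code === 'ETIMEDOUT') return 'connection-timeout'
  if (code === 'ECONNRESET' || code === 'ENOTFOUND') return 'network'
  return 'other'
}

// Hypothetical helper: return the trimmed FORGE_API value, or null (with a warning) when it is missing.
function readForgeApi(env = process.env) {
  const value = (env.FORGE_API || '').trim()
  if (!value) {
    console.warn('⚠️ FORGE_API environment variable not set')
    return null
  }
  return value
}
```

Returning null instead of throwing lets the caller decide whether a missing variable should stop the process or only skip the API call.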

Environment Variables Required

FORGE_API

  • Purpose: Base URL for Forge API endpoints
  • Expected Value: https://api.getforge.com (without trailing slash)
  • Current Issue: If you set it with a trailing slash, use exactly one (https://api.getforge.com/), never a double slash
  • Note: Code now handles both with and without trailing slash

Disk Space Issues

Critical: Disk space issues can cause 504 Gateway Timeout errors. If disk usage exceeds 90%, the server may be unable to write cache files or logs.

Quick Check:

df -h /  # Check disk usage
du -sh /tmp/cache/  # Check cache directory size

Immediate Actions:

  1. Run the disk cleanup script: sudo /app/scripts/disk-space-cleanup.sh
  2. Manually clean old cache: find /tmp/cache -type f -mtime +1 -delete
  3. Check for large log files: find /var/log -type f -size +100M -ls

See README_DISK_SPACE.md for detailed disk space management documentation.

Monitoring

Key Log Messages to Watch:

Success Indicators:

  • 🔄 Retrying API request in Xms - Retry mechanism working
  • Normal site cleaning messages without errors

Error Indicators:

  • 💣 BOOM! Connection timeout to Forge API after retries - Network connectivity issues
  • ⚠️ FORGE_API environment variable not set - Configuration issue
  • 💣 BOOM! Request timeout to Forge API after retries - Server response issues
  • 💣 BOOM! API returned 502 error after retries - Upstream API issues (now retried automatically)
  • ⏱️ Timeout loading site meta for X after 30000ms - Request timeout protection triggered

Recommended Actions:

  1. If ETIMEDOUT errors persist: Check network connectivity from server to api.getforge.com
  2. If retries are frequent: Consider increasing the timeout values, or check the health of the API
  3. If "API unavailable" errors: Verify FORGE_API environment variable is set correctly

Configuration Tweaks

Timeout Settings

// In deleter.js
const API_TIMEOUT = 10000 // 10 seconds
const MAX_RETRIES = 3
const RETRY_DELAY_BASE = 1000 // 1 second

// In src/middlewares/common.js
const API_TIMEOUT = 10000 // 10 seconds
const MAX_RETRIES = 2
const RETRY_DELAY_BASE = 500 // 500ms

To Adjust Timeouts:

Modify the constants at the top of each file if 10 seconds proves insufficient.
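Before raising a timeout, it helps to estimate how long one request can take in the worst case. The sketch below assumes that every attempt runs into the full timeout and that the retry delay doubles after each failed attempt. Both are assumptions about the implementation, and worstCaseMs is a hypothetical helper.

```javascript
// Worst-case wall time for one logical request: every attempt times out,
// and the delay between attempts doubles each time (assumed backoff schedule).
function worstCaseMs({ timeoutMs, maxRetries, baseDelayMs }) {
  let total = timeoutMs * (maxRetries + 1) // initial attempt plus retries
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    total += baseDelayMs * 2 ** attempt
  }
  return total
}

// deleter.js values from this guide
console.log(worstCaseMs({ timeoutMs: 10000, maxRetries: 3, baseDelayMs: 1000 })) // 47000
// src/middlewares/common.js values from this guide
console.log(worstCaseMs({ timeoutMs: 10000, maxRetries: 2, baseDelayMs: 500 })) // 31500
```

Under these assumptions the common.js worst case (about 31.5s) is already close to the 30s request timeout protection, so any increase to API_TIMEOUT there is worth checking against that limit.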