The CJA SDR Generator now supports high-performance batch processing with 3-4x throughput improvement through parallel multiprocessing.
```bash
# Process a single data view
cja_auto_sdr dv_677ea9291244fd082f02dd42

# Multiple data views automatically trigger parallel batch processing
cja_auto_sdr dv_12345 dv_67890 dv_abcde
```

Note: When you provide multiple data view IDs, the script automatically enables parallel processing with auto-detected workers (based on CPU cores and workload). The `--batch` flag is optional.

```bash
# Explicitly use batch mode with custom settings
cja_auto_sdr --batch dv_12345 dv_67890 dv_abcde dv_11111 --workers 8
```

`DATA_VIEW_ID [DATA_VIEW_ID ...]` - One or more data view IDs (must start with `dv_`)
| Argument | Description | Default |
|---|---|---|
| `--profile NAME` / `-p` | Use named profile from `~/.cja/orgs/<NAME>/` | None |
| `--batch` | Explicitly enable batch mode (optional with multiple data views) | Auto-detect (parallel if multiple data views) |
| `--workers N` | Number of parallel workers (1-256), or `auto` for intelligent detection | `auto` |
| `--log-format FORMAT` | Log output format: `text` or `json` (for Splunk/ELK/CloudWatch) | `text` |
| `--output-dir PATH` | Output directory for generated files | Current directory |
| `--config-file PATH` | Path to CJA configuration file (ignored if `--profile` is used) | `config.json` |
| `--continue-on-error` | Continue processing if one data view fails | Stop on first error |
| `--log-level LEVEL` | Logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL) | INFO |
| `--enable-cache` | Enable validation result caching | Disabled |
| `--clear-cache` | Clear cache before processing (use with `--enable-cache`) | - |
| `--cache-size N` | Maximum cached entries (>= 1) | 1000 |
| `--cache-ttl N` | Cache time-to-live in seconds (>= 1) | 3600 |
| `--shared-cache` | Share validation cache across batch workers | Disabled |
| `--api-auto-tune` | Enable automatic API worker tuning | Disabled |
| `--api-min-workers N` | Minimum workers for auto-tuning | 1 |
| `--api-max-workers N` | Maximum workers for auto-tuning | 10 |
| `--circuit-breaker` | Enable circuit breaker pattern | Disabled |
| `--circuit-failure-threshold N` | Failures before opening circuit | 5 |
| `--circuit-timeout N` | Recovery timeout in seconds | 30 |
| `--include-segments` | Include segments inventory in output | Disabled |
| `--include-derived` | Include derived fields inventory in output | Disabled |
| `--include-calculated` | Include calculated metrics inventory in output | Disabled |
| `--inventory-only` | Output only inventory sheets (requires `--include-*`) | Disabled |
| `-h`, `--help` | Show help message and exit | - |
```bash
# Single data view
cja_auto_sdr dv_12345

# Multiple data views (automatically triggers parallel batch processing)
cja_auto_sdr dv_12345 dv_67890 dv_abcde

# Explicitly use batch mode (same result as above when multiple data views)
cja_auto_sdr --batch dv_12345 dv_67890 dv_abcde

# Use a profile for credentials (recommended for multi-org)
cja_auto_sdr --profile client-a dv_12345 dv_67890 dv_abcde

# Custom number of workers (conservative for shared API)
cja_auto_sdr --batch dv_12345 dv_67890 --workers 2

# Custom output directory
cja_auto_sdr dv_12345 --output-dir ./reports

# Continue processing even if some data views fail
cja_auto_sdr --batch dv_12345 dv_67890 dv_abcde --continue-on-error

# Batch processing with custom log level
cja_auto_sdr --batch dv_* --log-level WARNING

# Full production example
cja_auto_sdr --batch \
  dv_12345 dv_67890 dv_abcde \
  --workers 4 \
  --output-dir ./sdr_reports \
  --continue-on-error \
  --log-level INFO
```

```bash
# Create a file with data view IDs (one per line)
cat > dataviews.txt <<EOF
dv_12345
dv_67890
dv_abcde
dv_99999
EOF

# Process all data views from file
cja_auto_sdr --batch $(cat dataviews.txt)

# With continue-on-error
cja_auto_sdr --batch \
  $(cat dataviews.txt) \
  --continue-on-error \
  --output-dir ./batch_reports
```

```bash
$ cja_auto_sdr
usage: cja_auto_sdr [-h] [--batch] ... DATA_VIEW_ID [DATA_VIEW_ID ...]
cja_auto_sdr: error: the following arguments are required: DATA_VIEW_ID

$ cja_auto_sdr invalid_id test123
ERROR: Invalid data view ID format: invalid_id, test123
Data view IDs should start with 'dv_'
Example: dv_677ea9291244fd082f02dd42

$ cja_auto_sdr --help
# Displays full help with all options and examples
```
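The ID check shown in the error output above can be mimicked when generating argument lists from your own scripts, so malformed IDs are caught before the CLI is even invoked. A small sketch; `is_valid_dv_id` is a hypothetical helper, not part of the tool:

```python
def is_valid_dv_id(candidate: str) -> bool:
    """Return True if the string looks like a CJA data view ID."""
    # IDs must carry the 'dv_' prefix plus a non-empty suffix.
    return candidate.startswith("dv_") and len(candidate) > len("dv_")


ids = ["dv_677ea9291244fd082f02dd42", "invalid_id", "test123"]
valid = [i for i in ids if is_valid_dv_id(i)]
rejected = [i for i in ids if not is_valid_dv_id(i)]
print(rejected)  # ['invalid_id', 'test123'] -- the same IDs the CLI rejects above
```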
1 data view × 35s = 35 seconds per data view
10 data views / 4 workers × 35s = ~87.5 seconds (1.5 minutes)
Improvement: 4x faster than processing individually (75% time savings)
Note: Multiple data views automatically trigger parallel batch processing for optimal performance.
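The arithmetic above (total time ≈ data views ÷ workers × per-view time) can be captured in a quick estimator for capacity planning; the function name and the 35-second default are illustrative, matching the example figures:

```python
def estimate_batch_seconds(num_views: int, workers: int,
                           seconds_per_view: float = 35.0) -> float:
    """Idealized batch duration assuming views spread evenly across workers.

    Real runs are somewhat slower: the final 'wave' of workers may be only
    partially filled, and API latency varies per data view.
    """
    return num_views / workers * seconds_per_view


print(estimate_batch_seconds(10, 4))  # 87.5 -- the 10-view / 4-worker example
print(estimate_batch_seconds(1, 1))   # 35.0 -- single data view baseline
```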
| Workers | Best For | Performance |
|---|---|---|
| 1 | Testing, debugging | Baseline (100%) |
| 2 | Shared API, conservative | ~2x faster |
| 4 | Default, balanced | ~4x faster |
| 8 | Dedicated infrastructure | ~8x faster |
Note: Actual performance depends on API rate limits, network latency, and system resources.
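The `auto` worker setting is described only as being "based on CPU cores and workload"; the exact heuristic is not documented here. One plausible sketch of such a rule, purely as an assumption (the function name, cap, and logic are illustrative):

```python
import os


def auto_workers(num_views: int, max_workers: int = 8) -> int:
    """Hypothetical auto-detection: never more workers than data views,
    never more than available CPU cores, capped to stay API-friendly."""
    cores = os.cpu_count() or 1
    return max(1, min(num_views, cores, max_workers))


print(auto_workers(2))  # only 2 data views -> at most 2 workers
```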
```text
Processing 10 data view(s) in batch mode with 4 workers...
2026-01-07 12:00:00 - INFO - ============================================================
2026-01-07 12:00:00 - INFO - BATCH PROCESSING START
2026-01-07 12:00:00 - INFO - ============================================================
2026-01-07 12:00:00 - INFO - Data views to process: 10
2026-01-07 12:00:00 - INFO - Parallel workers: 4
2026-01-07 12:00:00 - INFO - Continue on error: False
2026-01-07 12:00:00 - INFO - Output directory: .
2026-01-07 12:00:00 - INFO - ============================================================
2026-01-07 12:00:15 - INFO - ✓ dv_12345: SUCCESS (14.5s)
2026-01-07 12:00:16 - INFO - ✓ dv_67890: SUCCESS (15.2s)
2026-01-07 12:00:18 - ERROR - ✗ dv_abc123: FAILED - Data view validation failed
2026-01-07 12:00:20 - INFO - ✓ dv_def456: SUCCESS (16.1s)
...
============================================================
BATCH PROCESSING SUMMARY
============================================================
Total data views: 10
Successful: 8
Failed: 2
Success rate: 80.0%
Total duration: 125.3s
Average per data view: 15.7s

Successful Data Views:
✓ dv_12345 Production Analytics 14.5s
✓ dv_67890 Development Analytics 15.2s
✓ dv_def456 Testing Analytics 16.1s
...

Failed Data Views:
✗ dv_abc123 Data view validation failed
✗ dv_xyz789 No metrics or dimensions found
============================================================
Throughput: 4.8 data views per minute
============================================================
```
Batch Mode:
- `logs/SDR_Batch_Generation_YYYYMMDD_HHMMSS.log` - Main batch log

Single Mode:
- `logs/SDR_Generation_{DATA_VIEW_ID}_YYYYMMDD_HHMMSS.log` - Per data view log
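The naming pattern above maps directly onto `strftime` codes, which is handy when a monitoring script needs to locate the log for a given run. A small sketch (the helper function is illustrative; only the path layout comes from the names above):

```python
from datetime import datetime
from pathlib import Path


def batch_log_path(start: datetime, log_dir: str = "logs") -> Path:
    """Build the expected batch log filename for a given start time."""
    stamp = start.strftime("%Y%m%d_%H%M%S")  # YYYYMMDD_HHMMSS
    return Path(log_dir) / f"SDR_Batch_Generation_{stamp}.log"


print(batch_log_path(datetime(2026, 1, 7, 12, 0, 0)))
# logs/SDR_Batch_Generation_20260107_120000.log (POSIX separator shown)
```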
```bash
# Add to crontab (crontab -e)
# Note: In crontab, % has special meaning (newline), so it must be escaped with \
# Each crontab entry must be a single line.

# Process all data views nightly at 2 AM
0 2 * * * cd /path/to/project && cja_auto_sdr --batch dv_12345 dv_67890 dv_abcde --output-dir /reports/$(date +\%Y\%m\%d) --continue-on-error --log-level WARNING

# Process weekly on Sunday at midnight
0 0 * * 0 cd /path/to/project && cja_auto_sdr --batch $(cat /path/to/dataviews.txt) --workers 8 --output-dir /weekly_reports/$(date +\%Y_week_\%V) --continue-on-error
```

```powershell
# Create a scheduled task to run nightly at 2 AM
$action = New-ScheduledTaskAction -Execute "C:\path\to\project\.venv\Scripts\cja_auto_sdr.exe" `
    -Argument "--batch dv_12345 dv_67890 dv_abcde --output-dir C:\reports --continue-on-error --log-level WARNING" `
    -WorkingDirectory "C:\path\to\project"
$trigger = New-ScheduledTaskTrigger -Daily -At 2am
Register-ScheduledTask -Action $action -Trigger $trigger -TaskName "CJA SDR Nightly" -Description "Generate CJA SDR reports"

# Or create a weekly task for Sunday at midnight
$weeklyAction = New-ScheduledTaskAction -Execute "C:\path\to\project\.venv\Scripts\cja_auto_sdr.exe" `
    -Argument "--batch dv_12345 dv_67890 --workers 8 --output-dir C:\weekly_reports --continue-on-error" `
    -WorkingDirectory "C:\path\to\project"
$weeklyTrigger = New-ScheduledTaskTrigger -Weekly -DaysOfWeek Sunday -At 12am
Register-ScheduledTask -Action $weeklyAction -Trigger $weeklyTrigger -TaskName "CJA SDR Weekly"
```

Alternatively, via the Task Scheduler GUI:

- Open Task Scheduler (search "Task Scheduler" in the Start menu)
- Click "Create Basic Task..."
- Set the schedule (Daily/Weekly)
- Action: "Start a program"
- Program: `C:\path\to\project\.venv\Scripts\cja_auto_sdr.exe`
- Arguments: `--batch dv_12345 --output-dir C:\reports`
- Start in: `C:\path\to\project`
```bash
# Conservative (shared API with rate limits)
--workers 2

# Balanced (default, works well for most cases)
--workers 4

# Aggressive (dedicated infrastructure)
--workers 8
```

```bash
# Stop on first error (default, good for testing)
cja_auto_sdr --batch dv_1 dv_2 dv_3

# Continue on error (good for production, get as many results as possible)
cja_auto_sdr --batch dv_1 dv_2 dv_3 --continue-on-error
```

```bash
# Organize by date
--output-dir ./reports/$(date +%Y/%m/%d)

# Organize by environment
--output-dir ./reports/production
--output-dir ./reports/staging
```

```bash
# Development/debugging
--log-level DEBUG

# Production (default)
--log-level INFO

# Production (quiet, only warnings/errors)
--log-level WARNING
```

Solution: Use `uv run` to execute the script:

```bash
uv run cja_auto_sdr dv_12345
```

Solution: Provide at least one data view ID:

```bash
cja_auto_sdr dv_12345
```

Solution: Ensure data view IDs start with `dv_`:

```bash
# Wrong
cja_auto_sdr 12345

# Correct
cja_auto_sdr dv_12345
```

Solution: Close any open Excel files or specify a different output directory:

```bash
cja_auto_sdr dv_12345 --output-dir ./new_reports
```

Solution: Reduce the number of workers:

```bash
cja_auto_sdr --batch dv_1 dv_2 dv_3 --workers 2
```

```bash
# Old way: edit the script to hard-code the data view, then run it
#   data_view = "dv_677ea9291244fd082f02dd42"
cja_auto_sdr

# New way: specify data view(s) as arguments
cja_auto_sdr dv_677ea9291244fd082f02dd42

# Or multiple at once
cja_auto_sdr dv_12345 dv_67890
```

- ProcessPoolExecutor: True parallelism (separate processes)
- No GIL limitations: Full CPU utilization
- Isolated processing: Each data view runs in its own process
- Fault tolerance: One failure doesn't affect others
- Each worker process has its own memory space
- No shared state between workers
- Automatic cleanup after completion
- Suitable for processing large datasets
- Parallel API calls to CJA endpoints
- ThreadPoolExecutor for I/O-bound API fetching within each process
- Optimized to minimize API call overhead
- Respects API rate limits (adjust workers as needed)
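The architecture described above, process-level parallelism per data view with thread-level parallelism for I/O-bound API fetches inside each process, can be sketched roughly as follows. This is an illustrative outline, not the tool's actual code; `fetch_endpoint`, `process_data_view`, and `run_batch` are stand-in names:

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def fetch_endpoint(name: str) -> str:
    """Stand-in for one CJA API call (metrics, dimensions, ...)."""
    return f"{name}-data"


def process_data_view(dv_id: str) -> tuple[str, str]:
    """Runs in its own worker process; fans out API calls to threads.

    Threads suit the inner loop because API fetching is I/O-bound;
    processes suit the outer loop because report generation is CPU-bound
    and each data view stays isolated (one failure can't corrupt others).
    """
    endpoints = ["metrics", "dimensions", "metadata"]
    with ThreadPoolExecutor(max_workers=len(endpoints)) as pool:
        fetched = list(pool.map(fetch_endpoint, endpoints))
    return dv_id, "SUCCESS" if fetched else "FAILED"


def run_batch(dv_ids: list[str], workers: int) -> dict[str, str]:
    """Process pool: one isolated worker process per in-flight data view."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(process_data_view, dv_ids))


if __name__ == "__main__":
    print(run_batch(["dv_12345", "dv_67890"], workers=2))
```

Keeping the worker count modest (see the `--workers` guidance above) matters precisely because each process multiplies its API calls across its internal threads.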
For issues, questions, or feature requests:

- Check this guide first
- Review error messages and logs
- Try with `--log-level DEBUG` for detailed output
- Use `--help` to see all available options
- Configuration Guide - config.json, environment variables, CI/CD setup
- CLI Reference - Complete command-line options
- Performance Guide - Optimization and caching
- Use Cases - Automation workflows
- Org-Wide Analysis - Analyze components across all data views
- Segments Inventory - Segment filter documentation
- Derived Fields Inventory - Derived field documentation
- Calculated Metrics Inventory - Calculated metrics documentation
- Troubleshooting - Error resolution