## Problem
The `@pgpm/jobs` and `@pgpm/database-jobs` packages (v0.15.5) have three compounding performance issues in the job acquisition path that, together, produce the worst-case scenario documented in upstream Graphile Worker issue #240. Under queue contention, job acquisition can be 20x–50x slower than it needs to be.
### 1. `queue_name` defaults to `gen_random_uuid()::text` instead of `NULL`
Both `jobs` and `scheduled_jobs` tables define:

```sql
queue_name text DEFAULT (public.gen_random_uuid())::text,
```
Upstream Graphile Worker made queue_name nullable with no default in v0.4.0 (migration 000003), describing it as a "significant performance enhancement" because it "changes database schema such that a job_queue record is only added/checked when necessary."
With the random UUID default, every job creates a unique queue — meaning:
- The `increase_job_queue_count` trigger fires on every insert, creating a `job_queues` row per job
- `get_job` must evaluate the expensive `EXISTS ... FOR UPDATE SKIP LOCKED` subquery on `job_queues` for every candidate row
- The `job_queues` table grows unboundedly with unique entries
- Queue bookkeeping runs at maximum cost for zero benefit
Note: The queue-count triggers already have `WHEN (NEW.queue_name IS NOT NULL)` guards — so the fix is just changing the default.
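A quick way to confirm the pathology on a live database is to compare row counts: with the UUID default, `job_queues` grows in lockstep with `jobs`. A diagnostic sketch, assuming the `app_jobs` schema used elsewhere in this document:

```sql
-- With the UUID default, expect roughly one job_queues row per job;
-- after the fix, job_queues should only hold explicitly named queues.
SELECT
  (SELECT count(*) FROM app_jobs.jobs)       AS job_count,
  (SELECT count(*) FROM app_jobs.job_queues) AS queue_count;
```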
### 2. `get_job` uses the old row-by-row queue checking pattern (Strategy 1)
The current `get_job` checks queue eligibility per candidate job row:

```sql
AND (jobs.queue_name IS NULL
  OR EXISTS (
    SELECT 1 FROM app_jobs.job_queues
    WHERE job_queues.queue_name = jobs.queue_name
      AND (job_queues.locked_at IS NULL
        OR job_queues.locked_at < (v_now - job_expiry))
    FOR UPDATE SKIP LOCKED
  ))
```
This is Strategy 1 (per-row `EXISTS`) in the upstream nomenclature. Upstream issue #240 documented this exact query dropping from 242ms to 13ms (~18x) when replaced with Strategy 2 (a pre-computed eligible queue set). The upstream benchmarks show:
| Strategy | Description | Jobs/sec |
|----------|-------------|----------|
| 0 | Skip named queues entirely | 11,800 |
| 1 | Per-row `EXISTS` (current pgpm approach) | ~40 |
| 2 | Pre-lock queues upfront (recommended) | 843 |
| 3 | Exclude locked queues (unsafe) | 600 |
### 3. Legacy ordering index
The current index:

```sql
CREATE INDEX priority_run_at_id_idx ON app_jobs.jobs (priority, run_at, id);
```
Upstream v0.14 replaced this with partial covering indexes filtered on an `is_available` generated column:

```sql
CREATE INDEX jobs_main_index ON jobs USING btree (priority, run_at)
  INCLUDE (id, task_id, job_queue_id) WHERE (is_available = true);
```
The partial index excludes locked/exhausted jobs from the index entirely, making it smaller and faster to scan.
## Solution
### Phase 1: Quick Wins (High Impact, Low Risk)
#### 1a. Fix `queue_name` default to `NULL`
In both `@pgpm/jobs` and `@pgpm/database-jobs`, change the `jobs` and `scheduled_jobs` table definitions:

```sql
-- Before:
queue_name text DEFAULT (public.gen_random_uuid())::text,

-- After:
queue_name text DEFAULT NULL,
```
Audit any downstream code that assumes `queue_name` is always non-null. The existing `get_job`, `complete_job`, `fail_job`, and triggers already handle `NULL` correctly.
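For existing deployments, the default can be changed in place without a table rewrite. A migration sketch (the cleanup step is optional and only touches unlocked queues):

```sql
-- Only affects future inserts; existing rows keep their UUID queue names.
ALTER TABLE app_jobs.jobs ALTER COLUMN queue_name SET DEFAULT NULL;
ALTER TABLE app_jobs.scheduled_jobs ALTER COLUMN queue_name SET DEFAULT NULL;

-- Optional cleanup: drop bookkeeping rows for queues no job references.
DELETE FROM app_jobs.job_queues jq
WHERE jq.locked_at IS NULL
  AND NOT EXISTS (
    SELECT 1 FROM app_jobs.jobs j WHERE j.queue_name = jq.queue_name
  );
```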
#### 1b. Rewrite `get_job` with pre-computed queue set (Strategy 2)
Replace the row-by-row `EXISTS` with a single pre-computed subquery:

```sql
-- Key change in the WHERE clause:
AND (jobs.queue_name IS NULL
  OR jobs.queue_name IN (
    SELECT jq.queue_name
    FROM app_jobs.job_queues jq
    WHERE (jq.locked_at IS NULL
        OR jq.locked_at < (v_now - job_expiry))
    FOR UPDATE SKIP LOCKED
  ))
```
The subquery is evaluated once per `get_job` call and its result reused for every candidate row (typically as a hashed subplan), eliminating the per-row queue-lock check.
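To verify the planner change after deploying, `EXPLAIN ANALYZE` the new query shape: the `job_queues` scan should appear once (e.g. as a `LockRows` node feeding a subplan), not once per candidate row. A verification sketch with illustrative parameter values standing in for `v_now` and `job_expiry`:

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT jobs.id
FROM app_jobs.jobs
WHERE jobs.run_at <= now()
  AND (jobs.queue_name IS NULL
    OR jobs.queue_name IN (
      SELECT jq.queue_name FROM app_jobs.job_queues jq
      WHERE jq.locked_at IS NULL
         OR jq.locked_at < (now() - interval '4 hours')
      FOR UPDATE SKIP LOCKED
    ))
ORDER BY jobs.priority, jobs.run_at, jobs.id
LIMIT 1;
```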
### Phase 2: Index Improvements
#### 2a. Add `is_available` generated column

```sql
ALTER TABLE app_jobs.jobs ADD COLUMN is_available boolean
  GENERATED ALWAYS AS ((locked_at IS NULL) AND (attempts < max_attempts)) STORED;
```
#### 2b. Replace legacy index with partial covering indexes

```sql
CREATE INDEX CONCURRENTLY jobs_main_index
  ON app_jobs.jobs USING btree (priority, run_at)
  INCLUDE (id, queue_name) WHERE (is_available = true);

CREATE INDEX CONCURRENTLY jobs_no_queue_index
  ON app_jobs.jobs USING btree (priority, run_at)
  INCLUDE (id) WHERE (is_available = true AND queue_name IS NULL);

-- After benchmarking:
DROP INDEX priority_run_at_id_idx;
```
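Before dropping the legacy index, it is worth confirming the planner actually prefers the partial indexes and checking the size win (this assumes both old and new indexes coexist during the benchmark window):

```sql
-- Plan check: expect an Index Scan on jobs_main_index / jobs_no_queue_index
EXPLAIN SELECT id FROM app_jobs.jobs
WHERE is_available = true
ORDER BY priority, run_at
LIMIT 1;

-- Size comparison: the partial indexes should be much smaller
SELECT relname, pg_size_pretty(pg_relation_size(oid))
FROM pg_class
WHERE relname IN ('priority_run_at_id_idx', 'jobs_main_index', 'jobs_no_queue_index');
```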
### Phase 3: Modernization Features
- Add `revision integer DEFAULT 0` and `flags jsonb` columns
- Add `forbidden_flags` parameter to `get_job`
- Change notify trigger from `FOR EACH ROW` to `FOR EACH STATEMENT`
- Add `remove_job(job_key text)` function
- Add `force_unlock_workers(worker_ids text[])` function
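As one concrete example from the list above, `force_unlock_workers` could be sketched as follows (signature taken from this document; the `locked_by`/`locked_at` column names are assumed to match the existing schema):

```sql
CREATE OR REPLACE FUNCTION app_jobs.force_unlock_workers(worker_ids text[])
RETURNS void AS $$
  -- Release jobs held by crashed or stuck workers so they become claimable
  UPDATE app_jobs.jobs
  SET locked_at = NULL, locked_by = NULL
  WHERE locked_by = ANY(worker_ids);

  -- Release any queue locks those workers held
  UPDATE app_jobs.job_queues
  SET locked_at = NULL, locked_by = NULL
  WHERE locked_by = ANY(worker_ids);
$$ LANGUAGE sql VOLATILE;
```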
### Phase 4: Optional — Full v0.14 Table Rebuild
- `tasks` lookup table with integer FK (replaces text `task_identifier` on hot path)
- `job_queue_id` integer FK (replaces text `queue_name` join)
- `smallint` for priority/attempts/max_attempts
Phase 4 is optional — the performance gains are marginal compared to Phases 1–2, and the migration risk is higher (especially preserving the `database_id` multi-tenant design).
## Expected Impact
| Change | Expected Improvement |
|--------|----------------------|
| Fix `queue_name` default | ~2–5x for `add_job` (eliminates queue bookkeeping) |
| Rewrite `get_job` (Strategy 2) | ~20x acquisition speedup under contention |
| Partial indexes with `is_available` | ~2–3x for `get_job` on large tables |
| Combined Phase 1 + 2 | Estimated 10–50x throughput improvement |
## References