Defer log_lines search_vector indexing off the insert path by stuartc · Pull Request #4818 · OpenFn/lightning

stuartc · 2026-05-30T18:02:06Z

Description

This changes how log_lines full-text search gets indexed — taking it off the
log-insert hot path to cut down the run:log channel timeouts we've been seeing
under heavy log volume.

The old AFTER INSERT trigger computed to_tsvector synchronously inside the
insert transaction (so it never actually deferred anything) and double-wrote
every row via a self-UPDATE. It also had no program_limit_exceeded guard, so
a single oversized message would abort the insert and roll back the whole batch
on the insert_all path.

Now inserts leave search_vector NULL and a background Oban worker
(Lightning.LogLines.SearchVectorWorker) backfills it out-of-band on a dedicated
search_indexing queue. There's a guarded safe_to_tsvector SQL function and a
partial index (WHERE search_vector IS NULL) so finding pending rows stays
cheap. Live log streaming is unaffected — it's push-based over PubSub and never
reads search_vector; only full-text log search now lags ingestion slightly,
typically under a minute.

Three migrations, ordered 1a→1b→1c: add safe_to_tsvector, add the partial index
(built per-partition CONCURRENTLY then attached to the parent), then drop the
trigger.

Closes #4425

Validation steps

Run the migrations (mix ecto.migrate). Confirm the log_lines
set_search_vector trigger is gone, safe_to_tsvector exists, and the
partial index is VALID across the parent + all 100 partitions. The
dataclips trigger should be untouched.
Insert a log line — its search_vector is NULL and it isn't matched by log
search yet.
Run Lightning.LogLines.SearchVectorWorker — the row's search_vector is
populated and now matches to_tsquery('english_nostop', …).
Insert an oversized (>1MB) message — it drains to an empty vector rather than
erroring, and doesn't get stuck retrying.
mix test test/lightning/log_lines/search_vector_worker_test.exs test/lightning/runs_test.exs

Additional notes for the reviewer

The worker drains newest-first and "snowballs" an immediate follow-up while
there's backlog, falling back to a 1-minute cron heartbeat. Concurrency is 1
on the dedicated queue (SKIP LOCKED makes bumping it safe later).
Two subtle bits worth a close look: the snowball's Oban unique states are
restricted to [:available, :scheduled] on purpose — the default includes
:executing/:completed, which makes a running job dedup its own successor
and kills the chain — and the index migration drops any INVALID leftover
before re-creating, so a failed CONCURRENTLY build can be re-run.
safe_to_tsvector is also meant as the template for the dataclip trigger fix
(Dataclip insert times out building the search vector, which loses the run #4800), which is separate.

AI Usage

I have used Claude Code
I have used another model
I have not used AI

Pre-submission checklist

I have performed an AI review of my code
I have implemented and tested all related authorization policies.
I have updated the changelog.
I have ticked a box in "AI usage" in this PR

The AFTER INSERT trigger on log_lines computed to_tsvector synchronously inside the insert transaction (it never actually deferred anything) and double-wrote every row via a self-UPDATE. It also lacked the program_limit_exceeded guard dataclips got, so an oversized message aborted the insert (rolling back the whole batch for insert_all). Remove the trigger so inserts leave search_vector NULL, and backfill it out-of-band via a new Oban worker (Lightning.LogLines.SearchVectorWorker) on a dedicated search_indexing queue. Adds a guarded safe_to_tsvector SQL function and a partial index (WHERE search_vector IS NULL) so draining pending rows stays cheap. Covers both the single (run:log) and batch (run:batch_logs) insert paths. Live log streaming is unaffected (push-based via PubSub); only full-text log search lags slightly behind ingestion.

Follow-up hardening on the search_vector deferral, from a review pass: - Fix the worker's snowball chain. Oban's default unique states include :executing and :completed, so a running snowball job matched itself when enqueueing its successor and the insert was silently deduped, breaking the chain after one hop. Restrict uniqueness to [:available, :scheduled]. - Make the pending-search index migration re-runnable: a failed CREATE INDEX CONCURRENTLY leaves an INVALID index that IF NOT EXISTS would skip and ATTACH would leave the parent invalid forever. Drop any invalid leftover first. - Make safe_to_tsvector NULL-defensive and re-runnable (CREATE OR REPLACE, drop STRICT, COALESCE the doc) so it never returns NULL and leaves a row stuck in the pending set. Also the template for the dataclip fix. - Add snowball uniqueness regression tests.

Trim hindsight/diff-narrating comments down to what's non-obvious, and rewrite the SearchVectorWorker moduledoc to read as documentation of the mechanism rather than a justification of the change.

github-actions · 2026-05-30T18:34:15Z

The single raw SQL call is parameterized with a module constant (@batch_size); migration interpolation uses part_num from a fixed 1..100 range. No user-controlled input flows into SQL.

Security Review ✅

S0 (project scoping): N/A — the new worker performs internal search_vector backfill on log_lines (no project-scoped reads/writes, no web entrypoint); the one Repo.query! (lib/lightning/log_lines/search_vector_worker.ex:82) is parameter-bound and the migration's only interpolation is a fixed 1..100 partition counter.
S1 (authorization): N/A — no new controller, LiveView event, or channel handler; the worker is only invoked by Oban cron/snowball self-enqueue.
S2 (audit trail): N/A — the only write updates a derived full-text index column on log_lines, not a config resource (workflows, credentials, settings, OAuth clients, etc.).

taylordowns2000 · 2026-05-31T09:51:00Z

@stuartc , fantastic find here. I noticed that the description seemed to reference inserting multiple log lines at once. Wanted to double check: batch log lines insert is turned off on app.openfn.org, right?

I think we tried it a few months back and when the crashes started occurring we switched it off. I believe Joe has been tracking a bug related to log line ordering on the bulk log line feature too.

(Just double checking that we're all on the same page here.)

github-project-automation Bot added this to Core May 30, 2026

github-project-automation Bot moved this to New Issues in Core May 30, 2026

stuartc added 3 commits May 30, 2026 20:31

Refine comments and moduledoc for deferred log_lines indexing

cae40bc

Trim hindsight/diff-narrating comments down to what's non-obvious, and rewrite the SearchVectorWorker moduledoc to read as documentation of the mechanism rather than a justification of the change.

stuartc force-pushed the defer-log-lines-search-vector branch from f3f45af to cae40bc Compare May 30, 2026 18:32

stuartc marked this pull request as ready for review May 30, 2026 18:32

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Defer log_lines search_vector indexing off the insert path#4818

Defer log_lines search_vector indexing off the insert path#4818
stuartc wants to merge 3 commits into
mainfrom
defer-log-lines-search-vector

stuartc commented May 30, 2026

Uh oh!

github-actions Bot commented May 30, 2026

Uh oh!

taylordowns2000 commented May 31, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

stuartc commented May 30, 2026

Description

Validation steps

Additional notes for the reviewer

AI Usage

Pre-submission checklist

Uh oh!

github-actions Bot commented May 30, 2026

Security Review ✅

Uh oh!

taylordowns2000 commented May 31, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants