Update adaptors on the fly #4801

Draft: stuartc wants to merge 39 commits into main from adaptors-on-the-fly

stuartc commented May 27, 2026

This PR introduces Lightning.Adaptors.*, a new subsystem for managing adaptor metadata — the NPM packages that power workflow jobs. It runs alongside the existing Lightning.AdaptorRegistry for now; the new subsystem will eventually take over the catalogue path entirely.

Opening as a draft to make the work visible on GitHub and keep the backlog grounded in what's actually being built.

Closes #4473
Closes #3114
Closes #2209
Closes #325
Closes #1996
Closes #220

Status

A top-level item is only ticked once every nested item under it is ticked.

Delivered

  • App boots whether or not npm is reachable — the first caller for a not-yet-loaded key blocks on a bounded Cachex.fetch and gets {:ok, data} or {:error, :timeout}; no :unavailable tri-state spreads into the UI.
  • New adaptor versions appear without a restart (Better Adaptor List Refreshing #2209, Update adaptor icons and schemas list for new adaptors #3114) — hourly diff-driven refresh against the npm listing; steady state ≈ one upstream listing call per tick.
  • Cluster nodes stay in sync (Adaptor registry is out of sync when the app is running in a cluster #1996) — Postgres is the single ledger; a cluster-singleton scheduler writes; peers learn over PubSub and evict their caches; a rejoining node re-warms from Postgres.
  • @latest / @local resolve against the current catalogue, not boot-time state — resolved at fetch:plan time from the live DB row.
  • Credential schemas keep their JSON field order: schema_data is stored as text and decoded with ordered objects, end to end.
  • Local / airgap mode — filesystem-backed strategy makes no outbound calls; interchangeable with npm behind one behaviour.
  • On-demand refresh — superuser maintenance page + mix lightning.refresh_adaptors.
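
The first Delivered item depends on every first caller getting exactly one of two outcomes from a bounded fetch. Below is a rough TypeScript analogue of that contract, not the PR's Elixir code. `BoundedCache`, its `loader` (a stand-in for the Strategy call behind Cachex.fetch) and `timeoutMs` are all hypothetical. Concurrent misses for the same key share one in-flight load. Each caller waits at most `timeoutMs` before receiving a timeout result. Loader failures are not modelled and would surface as a rejected promise.

```typescript
// Hypothetical two-state result, mirroring {:ok, data} | {:error, :timeout}.
type Result<T> = { ok: true; data: T } | { ok: false; error: "timeout" };

class BoundedCache<T> {
  private readonly values = new Map<string, T>();
  // At most one in-flight load per key: concurrent first callers share it.
  private readonly inflight = new Map<string, Promise<T>>();
  private readonly loader: (key: string) => Promise<T>;
  private readonly timeoutMs: number;

  constructor(loader: (key: string) => Promise<T>, timeoutMs: number) {
    this.loader = loader;
    this.timeoutMs = timeoutMs;
  }

  async fetch(key: string): Promise<Result<T>> {
    if (this.values.has(key)) {
      return { ok: true, data: this.values.get(key) as T };
    }
    let load = this.inflight.get(key);
    if (load === undefined) {
      load = this.loader(key)
        .then((value) => {
          this.values.set(key, value); // warm the cache for later callers
          return value;
        })
        .finally(() => {
          this.inflight.delete(key);
        });
      this.inflight.set(key, load);
    }
    // Each caller waits at most timeoutMs; the shared load keeps running.
    let timer: ReturnType<typeof setTimeout> | undefined;
    const timedOut = new Promise<Result<T>>((resolve) => {
      timer = setTimeout(() => resolve({ ok: false, error: "timeout" }), this.timeoutMs);
    });
    const loaded = load.then((data): Result<T> => ({ ok: true, data }));
    try {
      return await Promise.race([loaded, timedOut]);
    } finally {
      clearTimeout(timer);
    }
  }
}
```

The timeout applies per caller, but the load is shared. So a slow upstream response can still warm the cache after the first caller has given up.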

To finish before un-drafting

  • Icons appear without a redeploy: initial load works; live updates don't yet.
    • Icons fetched from GitHub raw on every scheduler tick
    • Content-addressable URLs with Cache-Control: immutable
    • Initial channel join carries icon_urls
    • adaptors_updated broadcast carries icon_urls too — today the payload fails frontend validation and silently clears icons from the picker
  • Credential types list is pre-populated from the live catalogue (Credential types list should be pre-populated (caching?) #220, Build service to pull credential schemas from npm/github #325) — body schemas done; picker list not.
    • Credential body schema resolves through the live DB / cache / npm path
    • "New Credential" type picker still reads priv/schemas/*.json from the build-time install_schemas task — a fresh deploy won't list new adaptor types until that task runs
  • Docker image still builds and runs (definition of done: Docker still works). The image still runs mix lightning.install_adaptor_icons and install_schemas, which now fetch artefacts nothing reads.
    • Remove the dead build steps from the Dockerfile
    • Remove the same dead steps from bin/bootstrap.d/common.sh
    • Confirm the image builds and boots clean
  • Supervisor survives a sibling crash without churning leadership: :rest_for_one currently restarts the scheduler whenever an earlier child dies, forcing a needless leader re-election. Reshape so a sibling crash doesn't disturb leadership.
  • One env var, one outcome for local mode: LOCAL_ADAPTORS=true flips the legacy registry to local mode but leaves the new subsystem on npm. Reconcile the two while both run side by side.
  • Old browser tabs degrade gracefully after a deploy — a stale tab sending request_project_adaptors to the new server hits no handler and is silently dropped. Add a catch-all channel fallback.
  • Icons survive a container restart in production — the default icon path is a temp dir, so icons re-fetch on every restart. Set a durable default (e.g. a PVC path) and document ADAPTOR_ICONS_PATH for deploys.
  • Workflow diagram shows icons on the legacy LiveView editor path — that path has no store provider, so it falls back to text labels. Either mount a provider or consciously defer (see considerations).

Smaller clean-ups

  • request_adaptors field-name match — server emits latest_version, frontend expects latest; add an end-to-end channel assertion
  • Route the icon controller through the Lightning.Adaptors facade rather than reaching into Store directly
  • Delete dead useAdaptorIcons.ts
  • mix lightning.refresh_adaptors exit codes — distinguish not_leader from other failures so non-leader runs report clearly
  • Tidy inspect/1 topic names (cosmetic double-colon in logs / observer)
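
The `request_adaptors` mismatch is the kind of drift an end-to-end channel assertion would catch: the server emits `latest_version` and the frontend expects `latest`. So is the missing `icon_urls` on `adaptors_updated`. Below is a minimal sketch of such a check. It assumes a hand-written type guard standing in for the frontend's AdaptorSchema validation. The `AdaptorPayload` shape keeps only fields named in this PR, and `isAdaptorPayload` is hypothetical.

```typescript
interface IconUrls {
  square: string | null;
  rectangle: string | null;
}

// Only the fields this PR names; the real schema likely has more.
interface AdaptorPayload {
  name: string;
  latest: string;
  icon_urls: IconUrls;
}

function isNullableString(v: unknown): v is string | null {
  return v === null || typeof v === "string";
}

function isAdaptorPayload(v: unknown): v is AdaptorPayload {
  if (typeof v !== "object" || v === null) return false;
  const r = v as Record<string, unknown>;
  const icons = r.icon_urls as Record<string, unknown> | null | undefined;
  return (
    typeof r.name === "string" &&
    typeof r.latest === "string" && // `latest_version` alone fails here
    typeof icons === "object" &&
    icons !== null &&
    isNullableString(icons.square) &&
    isNullableString(icons.rectangle)
  );
}
```

Running the guard over what the server actually sends, rather than over fixtures, is what would make the assertion end to end.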

Considerations / open questions (not yet decided)

  • @local on the wire — confirmed supported, with a deployment caveat. The worker resolves @openfn/language-foo@local against its monorepo path (convert-lightning-plan.ts), but only when started with OPENFN_ADAPTORS_REPO / --monorepo-dir; otherwise the run fails. Decide whether Lightning guards/documents this, or moves to resolving a concrete version on the wire.
  • Cold-DB @latest resolution — on a fresh deploy before the first tick, the adaptors table is empty and resolution falls back to the literal name@latest. Confirm the worker handles that harmlessly (add a CI test).
  • Removed-from-npm packages — a package disappearing upstream is currently a no-op (row stays, hidden at read time). Confirm this matches our data-retention intent, or add a hard-delete / tombstone policy.
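
The `@latest` behaviour described here reduces to a lookup with a fallback. The specifier is resolved against the live catalogue at plan time. On a cold DB, the literal `name@latest` goes on the wire. Below is a minimal sketch, assuming the catalogue is a plain map from package name to latest version. `resolveSpecifier` and `splitSpecifier` are hypothetical stand-ins for Lightning.Adaptors.resolve_version/2 and the adaptors table. In this sketch, `@local` and concrete versions pass through unchanged.

```typescript
function splitSpecifier(spec: string): { name: string; version: string | null } {
  const at = spec.lastIndexOf("@");
  // Index 0 is the npm scope marker ("@openfn/..."), not a version separator.
  if (at <= 0) return { name: spec, version: null };
  return { name: spec.slice(0, at), version: spec.slice(at + 1) };
}

function resolveSpecifier(spec: string, latestByName: ReadonlyMap<string, string>): string {
  const { name, version } = splitSpecifier(spec);
  if (version !== "latest") return spec; // @local, pinned versions, bare names
  const latest = latestByName.get(name);
  // Cold catalogue (no row yet): keep the literal "name@latest" for the worker.
  return latest === undefined ? spec : `${name}@${latest}`;
}
```

The cold-DB branch is exactly the case the proposed CI test would exercise against the worker.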

Module orientation

| Module | Role |
|---|---|
| Lightning.Adaptors | Public facade: packages/0, versions/1, schema/1, icon/2, resolve_version/2, refresh_now/0, refresh_package/1 |
| Lightning.Adaptors.Supervisor | Boots the subsystem as one supervised unit |
| Lightning.Adaptors.Config | Settings resolver |
| Lightning.Adaptors.Strategy | Behaviour; implemented by NPM (with NPM.Registry, NPM.Schema, NPM.GitHub) and Local |
| Lightning.Adaptors.Repo + Repo.Adaptor + Repo.AdaptorVersion | Ecto schemas and queries over the new tables |
| Lightning.Adaptors.Store | Cachex-fronted read facade |
| Lightning.Adaptors.IconCache | Per-node on-disk icon cache |
| Lightning.Adaptors.Scheduler | Cluster-singleton diff-driven refresh (leadership via Postgres advisory lock + HighlanderPG) |
| Lightning.Adaptors.Invalidator / NodeMonitor / ChannelBroadcaster | Cache coherence and fan-out |
| LightningWeb.AdaptorIconController | Serves content-addressable icons |
| Mix.Tasks.Lightning.RefreshAdaptors | Manual refresh from the command line |
| Migration | Creates adaptors and adaptor_versions tables |

How to review

TBD

AI Usage

  • I have used Claude Code
  • I have used another model
  • I have not used AI

Pre-submission checklist

  • I have performed an AI review of my code
  • I have implemented and tested all related authorization policies
  • I have updated the changelog
  • I have ticked a box in "AI usage" in this PR

stuartc added 30 commits May 27, 2026 14:23
Stateless wrapper around Application.get_env/3 for the Lightning.Adaptors
subsystem (Phase A batch 1). Exposes strategy/0, current_source/0,
refresh_interval/0, cache_timeout_ms/0, icon_path/0, strategy_opts/1 —
no internal state, no caching, every call reads fresh.
Pure behaviour module defining the adaptor source contract (Phase A
batch 1). Three callbacks, two types; no test file per PRD — conformance
is exercised via StrategyMock in downstream stories.
Phase A: repo_adaptor. Generated by autonomous harness.
Phase A: repo_adaptor_version. Generated by autonomous harness.
Phase A: icon_cache. Generated by autonomous harness.
Phase A: local. Generated by autonomous harness.
Phase A: npm. Generated by autonomous harness.
Sits alongside the existing .elixir_ls / .elixir-tools entries. Expert
occasionally spawns nested workspace markers in subdirectories when it
loses track of the project root, so a bare-name match catches both
the project-root .expert/ and any stray nested copies.
Phase A: adaptors_migration. Generated by autonomous harness.
Phase A: repo. Generated by autonomous harness.
Phase A: supervisor. Generated by autonomous harness.
Phase A: invalidator. Generated by autonomous harness.
Phase A: node_monitor. Generated by autonomous harness.
Phase A: scheduler. Generated by autonomous harness.
Phase A: adaptor_icon_controller. Generated by autonomous harness.
Phase A: adaptors. Generated by autonomous harness.
Phase A: refresh_adaptors_task. Generated by autonomous harness.
Phase A: channel_broadcaster. Generated by autonomous harness.
Store provides cached package/version/schema/icon reads with Cachex
fallthrough to the active Strategy, persisting NPM-mode results to
Postgres via Repo and warming from Repo on :nodeup.

Supervisor takes an explicit :strategy opt (defaulting to
Config.strategy/0) and stashes {strategy, source} in :persistent_term
keyed by {Supervisor, name}. Store callers resolve both via the new
strategy/1 and source/1 helpers — no Application.put_env, no shared
mutable state, async-safe.

Pattern follows PR #4562: thread the dependency through the opts,
expose it via the supervisor instance, mock the behaviour (not the
caller) in tests. test_helper.exs defines StrategyMock against the
Strategy behaviour.

Supervisor's child list drops Invalidator/ChannelBroadcaster/
NodeMonitor/Scheduler for now — they're added back as their PRDs
land in batches 4 and 5.
Split the monolithic NPM strategy into three focused sub-modules under
Lightning.Adaptors.NPM.{Registry, Schema, Tarball}, each owning its own
Tesla client and upstream concern. The NPM module is now a thin
orchestrator (~96 lines) implementing the Strategy behaviour by
delegating to the sub-modules.

Migrate the test suite from Mox-stubbed Tesla envs to Bypass-driven HTTP
fixtures. Each sub-module gets its own test file with its own Bypass
instance; the orchestrator test exercises real HTTP through the
composed pipeline. The Tesla adapter is overridden per-test, so the
21 other test files relying on the global Lightning.Tesla.Mock are
unaffected.

Add :jsdelivr_url to the NPM strategy_opts block, mirroring the
existing :registry_url. This makes both upstreams swappable at runtime
(e.g. for pointing dev/CI at a local Verdaccio cache).

Apply the @openfn/language-* filter to Registry.list_adaptors/0, matching
legacy AdaptorRegistry semantics (excludes @openfn/cli, @openfn/buildtools,
and other non-language packages).

The icon-strategy reshape (icons live in the OpenFn GitHub monorepo, not
in per-package npm tarballs) is deferred to a follow-up plan; see
context/lightning/adaptors/02-deferred-icon-strategy-fix.md.
Icons live in the OpenFn/adaptors monorepo, not in published npm
artifacts. Replace the broken per-package tarball walk with
NPM.GitHub, which fetches from raw.githubusercontent.com both in
bulk (Scheduler refresh tick) and as a lazy-miss fallback (Store).

* Add Strategy.fetch_icons/0 bulk callback; remove icon fields
  from fetch_adaptor/1 records — Scheduler joins icons in a
  separate pipeline.
* Scheduler runs two parallel pipelines per tick: bulk icons via
  fetch_icons/0, and per-adaptor diff via list_adaptors +
  fetch_adaptor. Write failures (Repo upsert, IconCache.write!)
  now degrade with Logger lines instead of crashing the task
  silently.
* Local strategy implements fetch_icons/0 by walking its
  configured assets directory.
* Delete NPM.Tarball — dead code, icons were never there.
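
The per-adaptor pipeline (list_adaptors, then fetch_adaptor only where needed) is what keeps a steady-state tick to roughly one listing call. Below is a minimal TypeScript sketch of the diff step. It assumes both the upstream listing and the stored rows reduce to a map from package name to latest version. `diffListing` and its three buckets are hypothetical names, and the real Scheduler's notion of "changed" may compare more than one field. Packages missing upstream are only reported, in line with the current no-op behaviour for removed packages.

```typescript
interface Diff {
  toFetch: string[]; // new upstream, or latest version moved
  unchanged: string[]; // nothing to fetch this tick
  goneUpstream: string[]; // stored but no longer listed; left in place
}

function diffListing(
  upstream: ReadonlyMap<string, string>,
  stored: ReadonlyMap<string, string>,
): Diff {
  const diff: Diff = { toFetch: [], unchanged: [], goneUpstream: [] };
  for (const [name, latest] of upstream) {
    if (stored.get(name) === latest) diff.unchanged.push(name);
    else diff.toFetch.push(name);
  }
  for (const name of stored.keys()) {
    if (!upstream.has(name)) diff.goneUpstream.push(name);
  }
  return diff;
}
```

Only `toFetch` costs further upstream requests. This is why a quiet tick is cheap.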
Drop direct Store.icon_meta/2 and Store.icon/3 calls in favour of the
single-arg facade. The controller no longer hard-codes the default
supervisor name; the facade does. Matches REWRITE-2026-05 §6.7.2.
Add Invalidator, NodeMonitor, ChannelBroadcaster, and Scheduler to the
:rest_for_one child list — they were stubbed out pending their own
Phase A stories, which have now all shipped. Each child gets the opts
its own start_link/1 declares (not the speculative shape from the
plan); names come from the Supervisor's name helpers.

Mount Lightning.Adaptors.Supervisor in application.ex alongside the
legacy adaptor_registry_childspec and adaptor_service_childspec.
Coexistence is deliberate per REWRITE-2026-05 §9 — the cheap rollback
path until Phase B migrates callers.

In config/test.exs, point the boot-time supervisor at
Lightning.Adaptors.StrategyMock with refresh_interval: 0 so the
production-name instance is inert in the test suite. Per-test
supervisors override as needed.

Drop now-redundant start_supervised!({Supervisor, ...}) calls from the
four Phase A unit-test files (channel_broadcaster, invalidator,
node_monitor, scheduler) and from the adaptors facade test, since
application.ex provides the instance. Where a test needed a custom
refresh interval, terminate the application-supplied child first then
start a replacement under the same supervisor.

Add supervisor_integration_test.exs covering the :rest_for_one crash
cascade — a load-bearing decision in §6.5a (Invalidator subscribes at
init; a Cachex restart without a downstream cascade would leave it
bound to a stale cache).
Mount GET /adaptors/icons/:name/:filename in the existing public
browser scope. Icons are content-addressed (sha8 in the URL) and must
be cacheable across users, so authentication is intentionally skipped.

The route uses a single :filename path segment rather than the plan's
literal :shape-:sha8.:ext — Phoenix's router rejects multiple dynamic
entries in one path component. The controller's new 2-arg show/2 head
parses shape-sha8.ext via Regex.named_captures and delegates to the
existing 4-key clause that the 22 direct-call tests still exercise.
AdaptorIconURL.build/3 already emits this shape, so the public URL is
unchanged.
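
The `shape-sha8.ext` filename can be built and parsed symmetrically, so a URL the builder emits always round-trips through the controller. Below is a TypeScript sketch with several assumptions:

- shapes are `square` and `rectangle` (as in `icon_urls`);
- `sha8` is the first eight hex characters of the icon's sha256;
- the extension is lower-case alphanumeric.

`iconFilename` and `parseIconFilename` are hypothetical mirrors of AdaptorIconURL.build/3 and the controller's Regex.named_captures step, not their actual code.

```typescript
type Shape = "square" | "rectangle";

interface IconRef {
  shape: Shape;
  sha8: string;
  ext: string;
}

// Named groups play the role of Regex.named_captures on the Elixir side.
const FILENAME = /^(?<shape>square|rectangle)-(?<sha8>[0-9a-f]{8})\.(?<ext>[a-z0-9]+)$/;

function iconFilename(shape: Shape, sha256Hex: string, ext: string): string {
  // Content addressing: new bytes give a new name, which is what makes
  // Cache-Control: immutable safe on the response.
  return `${shape}-${sha256Hex.slice(0, 8).toLowerCase()}.${ext}`;
}

function parseIconFilename(filename: string): IconRef | null {
  const m = FILENAME.exec(filename);
  if (!m || !m.groups) return null; // the controller would answer 404 here
  return { shape: m.groups.shape as Shape, sha8: m.groups.sha8, ext: m.groups.ext };
}
```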

Add two tests under the existing test file that go through the full
router pipeline (200 on sha match, 404 on unknown adaptor) so a
routing-table typo breaks CI. The existing direct-call matrix covers
controller behaviour.
Explicit Lightning.Adaptors.NPM override block in config/dev.exs.
Defaults already work; the block exists so new devs can see and tweak
the four knobs NPM.Registry / NPM.GitHub / NPM.Schema actually read
(registry_url, github_url, github_ref, jsdelivr_url) without
spelunking through the strategy modules.

Expose ChannelBroadcaster's @debounce_ms 250 as a public debounce_ms/0
so the end-to-end test can read the authoritative value rather than
hardcoding 250.

Add end_to_end_broadcast_test.exs proving the §6.5c contract: a
{:changed, name, source} broadcast on the source topic arrives at
subscribers of the client topic as a coalesced %{event:
"adaptors_updated", payload: %{adaptors: _}} envelope. This is the
single test that breaks if any of the four newly-wired children is
misconfigured.
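
Under the coalescing contract, many `{:changed, name, source}` messages inside the debounce window become one `adaptors_updated` envelope. Below is a TypeScript analogue written as a trailing debounce that accumulates names. The scheduler is injected so the window can be driven deterministically. `Coalescer`, `push`, `Schedule` and `onFlush` are hypothetical names. The 250 ms default comes from ChannelBroadcaster's @debounce_ms. Whether the real broadcaster resets its window on every message is not stated here; this sketch does.

```typescript
// Returns a cancel function for the scheduled callback.
type Schedule = (fn: () => void, ms: number) => () => void;

const realSchedule: Schedule = (fn, ms) => {
  const t = setTimeout(fn, ms);
  return () => clearTimeout(t);
};

class Coalescer {
  private readonly pending = new Set<string>();
  private cancel: (() => void) | null = null;
  private readonly onFlush: (names: string[]) => void;
  private readonly debounceMs: number;
  private readonly schedule: Schedule;

  constructor(onFlush: (names: string[]) => void, debounceMs = 250, schedule: Schedule = realSchedule) {
    this.onFlush = onFlush;
    this.debounceMs = debounceMs;
    this.schedule = schedule;
  }

  push(name: string): void {
    this.pending.add(name); // duplicates collapse into one entry
    // Trailing debounce: each new message pushes the flush back.
    if (this.cancel) this.cancel();
    this.cancel = this.schedule(() => this.flush(), this.debounceMs);
  }

  private flush(): void {
    this.cancel = null;
    const names = [...this.pending];
    this.pending.clear();
    if (names.length > 0) this.onFlush(names); // one envelope per window
  }
}
```

In the real broadcaster the flushed payload carries adaptor records under `adaptors`. The sketch only passes names.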
Until now the scheduler only logged failures, so a successful tick
left no trace. Each tick now emits one Logger.info summary line with
listed/changed/touched/fetched/icons/errors/duration counts, plus
init, refresh_now, and refresh_package invocation lines. Per-package
fetch/persist events log at :debug.
New /settings/maintenance LiveView exposes an on-demand
"Refresh Adaptor Registry" action that calls
Lightning.Adaptors.refresh_now/0. Gated by :access_admin_space,
matching the AuditLive/UserLive pattern.
Wraps Lightning.Adaptors.Scheduler in a HighlanderPG child under
Lightning.Adaptors.Supervisor so exactly one node in the cluster runs
the refresh tick. The inner Scheduler registers via {:global, …} so
callers on any node reach the leader transparently via Erlang
distribution; the fictional {:error, :not_leader} surface is removed
everywhere (Adaptors facade, MaintenanceLive, refresh_adaptors task).

Test isolation fixes uncovered while verifying:

* Drop the boot Cachex.clear/1 Task — redundant (Cachex is ETS-backed,
  empty at every (re)start under :rest_for_one) and a real race against
  test setups that put! into the cache immediately after
  start_supervised!.
* Pin Lightning.Adaptors.IconCache to a per-OS-PID directory at
  test_helper boot and wipe on entry. The default path under
  System.tmp_dir!/lightning/adaptor_icons is shared across mix test
  invocations, and System.unique_integer/1 recycles per VM — stale
  files from a prior run masked Mox expectations by short-circuiting
  IconCache.cached?/4.
* Replace two FIFO-dispatched expect/4 calls in store_test and
  scheduler_test with single multi-clause expect/4 calls so Mox routes
  by pattern when parallel tasks fan out in arbitrary order.
GitHub.strip_scope only stripped `@openfn/`, leaving `language-` in the
URL. The adaptors monorepo lays packages out at `packages/<bare-name>/`,
so every icon GET 404'd silently and the persisted rows came out
iconless. Strip `@openfn/language-` first so URLs hit the real path.
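
The fix comes down to prefix order: try the longer `@openfn/language-` prefix before the bare scope. Below is a TypeScript sketch, assuming the `packages/<bare-name>/` layout described above and the standard raw.githubusercontent.com `owner/repo/ref/path` form. `stripScope` and `rawGithubUrl` are hypothetical. The asset path inside the package directory is left as a parameter because this PR doesn't spell it out.

```typescript
// Longest prefix first: matching "@openfn/" alone leaves "language-"
// behind, which is the bug this commit fixes.
const PREFIXES = ["@openfn/language-", "@openfn/"];

function stripScope(name: string): string {
  for (const prefix of PREFIXES) {
    if (name.startsWith(prefix)) return name.slice(prefix.length);
  }
  return name;
}

function rawGithubUrl(name: string, ref: string, assetPath: string): string {
  return `https://raw.githubusercontent.com/OpenFn/adaptors/${ref}/packages/${stripScope(name)}/${assetPath}`;
}
```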

Add debug-level per-URL logging and an info summary with ok/not_found/
errors counts so a 100% miss is loud next time.

Then close the gap for already-broken rows:

  * Repo.list_missing_icons/1 + update_icons/3 — icon-only writer that
    bypasses upsert_adaptor/1 (which would rewrite adaptor_versions).
  * Scheduler self-heal: every tick tops up rows with NULL icon shas
    using the icons map we already fetched. No-op once everyone has
    icons, so cheap.
  * Scheduler.refresh_icons/1 + Lightning.Adaptors.refresh_icons/0,1:
    walk every row, diff against fresh shas, write only where changed.
  * Maintenance LiveView: second card wires the manual force-resync.
Periodic ticks and Maintenance "Refresh Icons" now send If-None-Match
with the server-issued ETag and short-circuit on 304, dropping a warm
refresh from ~4.7s/206 full downloads to ~130ms/206 304s against the
Fastly edge.

Schema gains icon_square_etag/icon_rectangle_etag columns (transport
metadata, not part of the sha256 invariant). Strategy callback becomes
fetch_icons/1 with a :prior_etags option; per-shape result is now a
three-way union (fresh map | :not_modified sentinel | absent) so the
scheduler distinguishes "upstream confirmed unchanged" from "upstream
has no such shape". nil/missing etags never clobber existing columns.
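
The three-way per-shape result and the rule that missing etags never clobber stored ones fit in a small merge function. Below is a TypeScript sketch, assuming a stored row holds one sha and one etag per shape. `StoredShape`, `ShapeFetch`, `conditionalHeaders` and `mergeShape` are hypothetical names. Representing "absent" as `undefined` is this sketch's choice. The sketch hands that case back to the caller, since what the scheduler does with it isn't described here.

```typescript
interface StoredShape {
  sha: string | null;
  etag: string | null;
}

// One shape's fetch outcome: fresh bytes, a 304, or nothing upstream.
type ShapeFetch =
  | { kind: "fresh"; sha: string; etag: string | null }
  | { kind: "not_modified" }
  | undefined; // absent: upstream has no such shape

function conditionalHeaders(prior: StoredShape | undefined): Record<string, string> {
  // Only send If-None-Match when we actually hold a server-issued ETag.
  return prior?.etag ? { "If-None-Match": prior.etag } : {};
}

function mergeShape(stored: StoredShape, result: ShapeFetch): StoredShape | "no_such_shape" {
  if (result === undefined) return "no_such_shape"; // caller decides
  if (result.kind === "not_modified") return stored; // upstream confirmed unchanged
  return {
    sha: result.sha,
    // A missing etag never clobbers the one already on the row.
    etag: result.etag ?? stored.etag,
  };
}
```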
stuartc added 9 commits May 27, 2026 14:23
channel_request_adaptors_enrichment. Generated by autonomous harness.
channel_broadcaster_wiring. Generated by autonomous harness.
Swap the three TSX consumers of adaptor icons (AdaptorIcon, JobNode,
MiniMapNode) off the legacy adaptor_icons.json manifest fetch and onto
the channel-delivered `icon_urls.square` field surfaced by the
collaborative-editor AdaptorStore.

- Add icon_urls to AdaptorSchema (square/rectangle, both nullable).
- Add useAdaptorIconUrl hook in collaborative-editor/hooks/useAdaptors.
- AdaptorIcon reads StoreContext directly with a noop-subscribe
  fallback so consumers without a StoreProvider (e.g. FullScreenIDE
  tests) still get the first-letter placeholder.
- Job/MiniMap nodes consume useAdaptorIconUrl directly; the LiveView
  workflow-editor path falls back to the adaptor string label until a
  follow-up PRD wires a LiveView adaptor source.
- Fixture mockAdaptor* records get explicit icon_urls.
- New tests cover happy path, null icon, and missing-StoreProvider.
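
AdaptorIcon's fallback rule can be sketched without React: use `icon_urls.square` when a store has one, otherwise render a first-letter placeholder. The sketch makes two assumptions:

- a nullable object with a `getAdaptor` lookup stands in for StoreContext;
- the placeholder letter comes from the name after the `@openfn/language-` prefix, which is a guess.

`pickIcon`, `AdaptorLookup` and `IconChoice` are hypothetical.

```typescript
interface IconUrls {
  square: string | null;
  rectangle: string | null;
}

interface AdaptorRecord {
  name: string;
  icon_urls: IconUrls;
}

interface AdaptorLookup {
  getAdaptor(name: string): AdaptorRecord | undefined;
}

type IconChoice = { kind: "image"; src: string } | { kind: "placeholder"; letter: string };

function pickIcon(store: AdaptorLookup | null, name: string): IconChoice {
  // No provider (e.g. a bare test render) behaves like a store with no data.
  const src = store?.getAdaptor(name)?.icon_urls.square ?? null;
  if (src) return { kind: "image", src };
  const bare = name.replace(/^@openfn\/language-/, "");
  return { kind: "placeholder", letter: bare.charAt(0).toUpperCase() };
}
```

These three branches correspond to the three tests listed above: happy path, null icon, and missing StoreProvider.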
Migrate production callers from the old AdaptorRegistry API to the
new Lightning.Adaptors / Lightning.Adaptors.Store API across
credentials, channels, job/adaptor picker, workflow edit/editor/job
views, and the AI assistant.

Add Lightning.Adaptors.PackageName helper for adaptor-name parsing.

Add test/support/adaptor_test_helpers.ex with seed_adaptor /
seed_credential_schema / seed_common_packages and Cachex warming.
Wire it into the test files that previously relied on the implicit
AdaptorRegistry seed.

Known follow-up: credential-form field order regressed because the
new Lightning.Adaptors.Repo stores schema_data as jsonb (:map),
which flattens JSON property order. Two affected tests
(@tag :skip with TODO) covering postgresql and dhis2 credential
creation will be re-enabled when the storage shape is fixed in a
follow-up PRD.
Removes the request_project_adaptors RPC and projectAdaptors field;
derives adaptors-in-use client-side from the Y.Doc job list and merges
into the AdaptorStore. Renames useProjectAdaptors → useAdaptorsInUse
through call sites and (post-hoc) updates the three FullScreenIDE
mocks that were out of the PRD's initial touches: allow-list.
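
Deriving adaptors-in-use on the client amounts to a grouping pass over the job list. Below is a minimal sketch, assuming jobs have already been read out of the Y.Doc into plain `{ adaptor: string }` objects. `adaptorsInUse` is hypothetical. The real useAdaptorsInUse hook reads the Y.Doc directly and merges into AdaptorStore, and this sketch models neither.

```typescript
interface JobLike {
  adaptor: string; // e.g. "@openfn/language-http@7.2.0"
}

// Package name -> set of versions referenced by jobs.
function adaptorsInUse(jobs: readonly JobLike[]): Map<string, Set<string>> {
  const used = new Map<string, Set<string>>();
  for (const { adaptor } of jobs) {
    const at = adaptor.lastIndexOf("@");
    // Position 0 is the npm scope marker, not a version separator.
    const name = at > 0 ? adaptor.slice(0, at) : adaptor;
    const versions = used.get(name) ?? new Set<string>();
    if (at > 0) versions.add(adaptor.slice(at + 1));
    used.set(name, versions);
  }
  return used;
}
```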
Restores credential-form field-rendering order by switching
adaptors.schema_data from jsonb to text and feeding the raw JSON
binary to Lightning.Credentials.Schema.new/2 (re-engaging
Jason.decode!(_, objects: :ordered_objects)). Adds a custom Ecto
type that accepts both maps (legacy rows) and binaries at the
schema layer. Removes the two @tag :skip markers introduced by
PRD #5 for the postgresql/dhis2 credential tests. Deploy: run
`mix lightning.refresh_adaptors` post-migration so existing rows
re-fetch via the strategy in property-preserving order.
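
Storing `schema_data` as text fixes field order because a JSON object's key order survives only if the original bytes do. PostgreSQL's `jsonb` does not preserve object key order. Below is a TypeScript analogue of the custom Ecto type's two input shapes. It assumes legacy rows arrive as already-decoded objects and new rows as raw JSON strings, and that field order is read from a `properties` object. `castSchemaData` and `fieldOrder` are hypothetical. Here `JSON.parse` plays the role of `objects: :ordered_objects`: it keeps insertion order for non-numeric keys, although integer-like keys are ordered numerically in JavaScript.

```typescript
// Legacy rows: already-decoded objects (jsonb era). New rows: raw JSON text.
type StoredSchema = string | Record<string, unknown>;

function castSchemaData(value: StoredSchema): string {
  // Strings pass through byte-for-byte; objects can only be re-serialised,
  // so whatever order they already have is what survives.
  return typeof value === "string" ? value : JSON.stringify(value);
}

function fieldOrder(schemaText: string): string[] {
  const parsed = JSON.parse(schemaText) as { properties?: Record<string, unknown> };
  return Object.keys(parsed.properties ?? {});
}
```

Legacy rows also explain the deploy note. An object that was already reordered by `jsonb` stays reordered until the strategy re-fetches the original text.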
github-project-automation (bot) moved this to New Issues in Core May 27, 2026
stuartc self-assigned this May 28, 2026
stuartc moved this from New Issues to In progress in Core May 28, 2026

Labels: None yet

Projects: Status: In progress

1 participant