(feat): per-IOC locking #165

Open
anderslindho wants to merge 5 commits into master from feat/per-ioc-locking

@anderslindho
Contributor

No description provided.

anderslindho and others added 5 commits May 15, 2026 14:01
Previously abort() returned the error on CancelledError and raised a
new one on other failures, leaving the chain errored so any queued
commit (notably the disconnect transaction) was never executed —
IOCs could stay Active in CF after an upload failure.

Co-authored-by: Sky Brewer <sky.brewer@ess.eu>

The single global DeferredLock serialised all IOC commits behind one
queue. Under load, the lock depth determined how long IOCs waited to
be marked active or inactive, and one stuck commit blocked every other.
Per-IOC locks let different IOCs commit in parallel while keeping
same-IOC transactions serialised.
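
The per-IOC locking idea can be sketched as follows. This is a minimal illustration, not the code in this PR: it uses asyncio.Lock from the standard library in place of Twisted's DeferredLock so that it runs without a reactor, and `PerKeyLocks`, `lock_for`, `prune`, and `commit` are hypothetical names.

```python
import asyncio


class PerKeyLocks:
    """Hypothetical registry holding one lock per IOC id."""

    def __init__(self):
        self._locks = {}

    def lock_for(self, iocid):
        # The same id always maps to the same lock object.
        return self._locks.setdefault(iocid, asyncio.Lock())

    def prune(self, iocid):
        # Drop the lock only when nobody holds it.
        lock = self._locks.get(iocid)
        if lock is not None and not lock.locked():
            del self._locks[iocid]


async def commit(locks, iocid, label, log):
    # Commits for one IOC queue behind its lock; other IOCs are unaffected.
    async with locks.lock_for(iocid):
        log.append(f"start {label}")
        await asyncio.sleep(0.01)  # stands in for the real commit work
        log.append(f"end {label}")


async def demo():
    locks, log = PerKeyLocks(), []
    await asyncio.gather(
        commit(locks, "ioc-a", "a1", log),
        commit(locks, "ioc-a", "a2", log),
        commit(locks, "ioc-b", "b1", log),
    )
    locks.prune("ioc-a")
    return log, locks


log, locks = asyncio.run(demo())
```

In this sketch the second commit for "ioc-a" starts only after the first one ends, while the commit for "ioc-b" starts without waiting for either.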

_state_lock (threading.Lock) guards the shared iocs/channel_ioc_ids
dicts against concurrent thread writes. The CF push receives a targeted
snapshot of the channels relevant to this commit (records being deleted
plus the IOC's current channel set) so snapshot cost scales with the
commit rather than total channel count.
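
A rough sketch of a targeted snapshot taken under a state lock is shown below. The attribute names `_state_lock`, `channel_ioc_ids`, and `_ioc_channels` come from the commit message; the class `ChannelState`, the method `snapshot_for_commit`, and the exact dict shapes are assumptions made for illustration.

```python
import threading


class ChannelState:
    """Hypothetical stand-in for the shared processor state."""

    def __init__(self):
        self._state_lock = threading.Lock()
        self.channel_ioc_ids = {}  # assumed shape: channel name -> list of IOC ids
        self._ioc_channels = {}    # assumed shape: IOC id -> set of channel names

    def snapshot_for_commit(self, iocid, records_to_delete):
        # Copy only the entries this commit can touch, while holding the lock,
        # so the copy cost follows the commit size rather than the total.
        with self._state_lock:
            wanted = set(records_to_delete) | self._ioc_channels.get(iocid, set())
            return {
                name: list(self.channel_ioc_ids[name])
                for name in wanted
                if name in self.channel_ioc_ids
            }


state = ChannelState()
state.channel_ioc_ids = {"pv:1": ["ioc-a"], "pv:2": ["ioc-a"], "pv:3": ["ioc-b"]}
state._ioc_channels = {"ioc-a": {"pv:1", "pv:2"}, "ioc-b": {"pv:3"}}

snap = state.snapshot_for_commit("ioc-a", records_to_delete=["pv:2"])
snap["pv:1"].append("scratch")  # mutating the snapshot leaves shared state alone
```

The snapshot holds copies of the lists, so a worker thread can read or modify it after the lock is released without touching the shared dicts.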

_ioc_channels tracks which channels belong to each IOC for O(1)
disconnect cost; registration deduplicates to prevent double-counting.
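
One way to picture the reverse index is the sketch below. It is an assumption about the shape of the data, not the PR's implementation: `IocIndex`, `register`, and `disconnect` are hypothetical names, and the point is only that an IOC's channels are found with one dict lookup instead of a scan over every channel.

```python
from collections import defaultdict


class IocIndex:
    """Hypothetical forward and reverse index between channels and IOCs."""

    def __init__(self):
        self.channel_ioc_ids = defaultdict(list)  # channel name -> IOC ids
        self.ioc_channels = defaultdict(set)      # IOC id -> channel names

    def register(self, iocid, channel):
        # Deduplicate: a repeated registration must not count the IOC twice.
        if channel in self.ioc_channels[iocid]:
            return
        self.ioc_channels[iocid].add(channel)
        self.channel_ioc_ids[channel].append(iocid)

    def disconnect(self, iocid):
        # One lookup yields exactly the channels this IOC owns.
        channels = self.ioc_channels.pop(iocid, set())
        for channel in channels:
            self.channel_ioc_ids[channel].remove(iocid)
            if not self.channel_ioc_ids[channel]:
                del self.channel_ioc_ids[channel]
        return channels


index = IocIndex()
index.register("ioc-a", "pv:1")
index.register("ioc-a", "pv:1")  # duplicate, ignored
index.register("ioc-b", "pv:1")
count_before = len(index.channel_ioc_ids["pv:1"])
removed = index.disconnect("ioc-a")
```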

stopService drains all in-flight per-IOC locks before running the
clean_on_stop sweep, preventing Active pushes from racing the sweep.
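
The drain-then-sweep ordering might look roughly like this. The sketch uses asyncio locks instead of Twisted primitives, `drain` and `push` are hypothetical helpers, and a real shutdown would also have to stop new commits from being queued, which is not shown here.

```python
import asyncio


async def drain(locks):
    # Acquiring each lock in turn waits until its current holder has finished.
    for lock in list(locks.values()):
        async with lock:
            pass


async def demo():
    locks = {"ioc-a": asyncio.Lock(), "ioc-b": asyncio.Lock()}
    events = []

    async def push(iocid):
        async with locks[iocid]:
            await asyncio.sleep(0.01)  # stands in for an in-flight Active push
            events.append(f"push {iocid}")

    tasks = [asyncio.create_task(push(iocid)) for iocid in locks]
    await asyncio.sleep(0)  # let the pushes take their locks first
    await drain(locks)
    events.append("sweep")  # the clean-on-stop sweep runs only after the drain
    await asyncio.gather(*tasks)
    return events


events = asyncio.run(demo())
```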

The commit path moves to @inlineCallbacks so retry waits use
task.deferLater — no sleeping worker threads between attempts.

Also introduces CFUpdateAbortedError to distinguish exhausted CF
retries from cancellation, and fixes chain_error to surface unexpected
exceptions rather than swallowing them.
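
The retry behaviour described in the last two paragraphs can be approximated as below. This sketch swaps in asyncio for @inlineCallbacks and task.deferLater so that it is runnable on its own; the exception name CFUpdateAbortedError is taken from the commit message, but its definition here, along with `push_with_retry` and its parameters, is assumed.

```python
import asyncio


class CFUpdateAbortedError(Exception):
    """Name from the commit message; this definition is a guess."""


async def push_with_retry(push, retries=3, delay=0.01):
    last = None
    for attempt in range(1, retries + 1):
        try:
            return await push()
        except asyncio.CancelledError:
            raise  # cancellation is passed through, not treated as a failed retry
        except Exception as exc:
            last = exc
            if attempt < retries:
                # Non-blocking wait: the event loop keeps running other work,
                # and no worker thread sleeps between attempts.
                await asyncio.sleep(delay)
    raise CFUpdateAbortedError(f"gave up after {retries} attempts") from last


async def demo():
    calls = {"n": 0}

    async def flaky():
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("CF unavailable")
        return "ok"

    async def broken():
        raise ConnectionError("CF unavailable")

    first = await push_with_retry(flaky)
    try:
        await push_with_retry(broken)
        second = "no error"
    except CFUpdateAbortedError:
        second = "aborted"
    return first, calls["n"], second


result = asyncio.run(demo())
```

A caller can then tell exhausted retries (CFUpdateAbortedError) apart from cancellation, which matches the distinction the commit message draws.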

Co-authored-by: Sky Brewer <sky.brewer@ess.eu>

deferToThread is monkeypatched to return synchronously-resolved
Deferreds so the full _commit_with_lock callback chain can be
exercised without a reactor. The prepare_result tuple matches the
(ioc_info, record_info_by_name, records_to_delete, iocs_snap,
ciids_snap) signature of _prepare_commit; the push phase is
controlled via _push_to_cf_async. Covers lock identity, iocid
routing, prune lifecycle, and all four chain_error paths.
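
The testing trick of replacing a thread hand-off with an already-resolved result can be illustrated with the standard library alone. This is an analogue, not the PR's test code: concurrent.futures.Future stands in for a Twisted Deferred, and `threads`, `defer_to_thread`, `run_now`, and `commit` are hypothetical names.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from types import SimpleNamespace
from unittest import mock

_pool = ThreadPoolExecutor(max_workers=1)

# Hypothetical module-like namespace whose function normally uses a thread.
threads = SimpleNamespace(
    defer_to_thread=lambda func, *args, **kwargs: _pool.submit(func, *args, **kwargs)
)


def commit(record):
    # Code under test: hands work to a thread and chains a callback on the result.
    seen = []
    future = threads.defer_to_thread(str.upper, record)
    future.add_done_callback(lambda done: seen.append(done.result()))
    return seen


def run_now(func, *args, **kwargs):
    # Test double: run inline and return a future that is already resolved,
    # so chained callbacks fire immediately and deterministically.
    future = Future()
    try:
        future.set_result(func(*args, **kwargs))
    except Exception as exc:
        future.set_exception(exc)
    return future


with mock.patch.object(threads, "defer_to_thread", run_now):
    seen = commit("pv:one")
```

Because the patched call returns a finished future, the callback has already run by the time `commit` returns, so the test needs no event loop or worker thread.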

Co-authored-by: Sky Brewer <sky.brewer@ess.eu>

The global lock is gone; update demo.conf and recast.py to reflect
that pushAlwaysRetry and maxActive now operate per-IOC, and fix an
incorrect default value and a long-standing typo in the same pass.

Also rename single-letter CastFactory identifiers (P/P2 → proto/waiting,
addr → _addr) to satisfy the linter, and correct the _stop_service_with_lock
docstring, which still claimed the lifecycle lock prevents concurrent
commits (commits now use per-IOC locks and are drained explicitly before
the clean_on_stop sweep).

application.py unconditionally overwrote session.trlimit with the config
value, which defaulted to 0 (no limit), silently negating the trlimit=5000
class default added in 89be20c. Reference CollectionSession.trlimit directly
so the default cannot silently drift if the class default ever changes,
and add a regression test.
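
The shape of the trlimit fix might be sketched like this. CollectionSession.trlimit and the value 5000 come from the commit message; the `configure` function and the plain dict used for the config are assumptions for illustration only.

```python
class CollectionSession:
    trlimit = 5000  # class default mentioned in the commit message


def configure(session, conf):
    # Fall back to the class attribute rather than a literal such as 0,
    # so the effective default follows the class if it ever changes.
    session.trlimit = int(conf.get("trlimit", CollectionSession.trlimit))
    return session


default_session = configure(CollectionSession(), {})
tuned_session = configure(CollectionSession(), {"trlimit": "100"})
```

A regression test along these lines would check that an empty config leaves the limit at the class default.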