fix(publish): cap concurrent WebSocket connections with semaphore#1781
fix(publish): cap concurrent WebSocket connections with semaphore#1781
Conversation
🟡 Heimdall Review Status
|
|
The latest updates on your projects. Learn more about Vercel for GitHub. |
ced8b30 to
709113e
Compare
709113e to
b79d937
Compare
b79d937 to
315e018
Compare
Add a tokio::sync::Semaphore to bound the number of concurrent handshake + broadcast tasks. Without a limit, heavy connection pressure can exhaust memory and task-scheduler resources before handshake timeouts reclaim anything. Connections beyond the limit are dropped immediately with an on_connection_rejected metric callback. Builds on the handshake timeout and spawned-handshake improvements from #1252.
315e018 to
aa3b6d3
Compare
| /// | ||
| /// Defaults to [`DEFAULT_MAX_CONNECTIONS`] (8192). | ||
| pub const fn with_max_connections(mut self, max_connections: usize) -> Self { | ||
| self.max_connections = max_connections; |
There was a problem hiding this comment.
nit: with_max_connections(0) will create a Semaphore::new(0) that rejects every connection unconditionally. Consider validating the input or using NonZeroUsize:
| self.max_connections = max_connections; | |
| pub const fn with_max_connections(mut self, max_connections: NonZeroUsize) -> Self { | |
| self.max_connections = max_connections.get(); | |
| self |
Alternatively, a runtime assert!(max_connections > 0) or saturating to 1 would also work. Not a blocker, but a zero-capacity listener is almost certainly a misconfiguration.
There was a problem hiding this comment.
we're using this for the unit test
Review SummarySolid PR that addresses a real DoS vector. The semaphore-based approach is the right tool for the job — What looks good
Finding
|
|
Upon some further review, we will hold off/close this PR The main reason is the number of connections on mainnet are too low (~30) since the builder mostly sits behind websocket proxy. |
Builds on the handshake timeout and spawned-handshake improvements from #1252.
Summary
tokio::sync::Semaphoreto bound the number of concurrent WebSocket connections (handshaking + active) to a configurable limit (default 8192)on_connection_rejectedto thePublisherMetricstrait so rejected connections are observableListener::with_max_connectionsbuilder method to override the default limitProblem
There is no bound on the number of concurrently spawned tasks in the WebSocket listener. Under heavy connection pressure, each new TCP connection spawns a task that lives for up to
HANDSHAKE_TIMEOUT(10s) even if it never completes the upgrade. At sustained rates this can exhaust memory and task-scheduler resources before timeouts reclaim anything.Fix
Guard
tokio::spawnwith anArc<Semaphore>usingtry_acquire_owned. The permit is held for the entire lifetime of the spawned task (handshake + broadcast), so it is released when the client disconnects. Connections beyond the limit are dropped immediately with a debug log and a metric callback.Testing
All 25 tests pass (
cargo test -p base-builder-publish), including a newlistener_rejects_connections_beyond_limittest that verifies the semaphore rejects excess connections.