By default, Lightdash processes all queries on the main API server. NATS workers move query execution onto dedicated pods, improving responsiveness under load and letting you scale query capacity independently.

Lightdash uses [NATS](https://nats.io/) — a lightweight, high-performance messaging system — with [JetStream](https://docs.nats.io/nats-concepts/jetstream), its built-in persistent streaming layer, to distribute work between the API server and worker pods.
## Requirements
- **Helm chart** version **2.7.2** or later
- **Lightdash** version [**0.2675.0**](https://hub.docker.com/r/lightdash/lightdash/tags) or later. Older images will fail with `MODULE_NOT_FOUND`.
<Note>
Upgrading the Helm chart alone does not change how Lightdash works. NATS features are entirely opt-in — your existing deployment will behave exactly the same until you explicitly enable the new Helm values described below.
</Note>
## Architecture
When NATS is enabled, the Helm chart deploys three additional components alongside the main Lightdash backend:

| Component | Helm value | Description |
| --- | --- | --- |
| **NATS JetStream** | `nats.enabled: true` | Message broker with dedicated streams for warehouse queries and pre-aggregate jobs |
| **Warehouse worker** | `warehouseNatsWorker.enabled: true` | Consumes warehouse query jobs and executes SQL against your data warehouse |
| **Pre-aggregate worker** | `preAggregateNatsWorker.enabled: true` | Materializes pre-aggregates (warehouse → Parquet → S3) and serves DuckDB queries against Parquet files |

The Lightdash API publishes jobs to NATS JetStream. Worker pods consume messages from their stream and process them concurrently (default 100 concurrent jobs per pod).
## Enabling NATS
We recommend enabling these incrementally so you can validate each step:

| Setting | What changes | Risk level |
| --- | --- | --- |
| `nats.enabled: true` | Deploys the NATS StatefulSet. No queries are routed through it yet. | Low — just adds infrastructure, no behavior change. |
| `warehouseNatsWorker.enabled: true` | All warehouse query execution moves from the API server to dedicated worker pods via NATS. | Higher — changes the query execution path for all users. |
| `preAggregateNatsWorker.enabled: true` | Enables the pre-aggregates feature and deploys the pre-aggregate worker. | Low — queries only use pre-aggregates after you [define them in your dbt YAML](/references/pre-aggregates/getting-started). Until then, nothing changes. |
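
Putting the stages together, a full rollout in your Helm values could look like this. This is a minimal sketch that uses only the three values named in the table above; in practice, enable and validate one stage at a time as recommended:

```yaml
# Stage 1: deploy the NATS StatefulSet (no queries routed through it yet)
nats:
  enabled: true

# Stage 2: move warehouse query execution onto dedicated worker pods
warehouseNatsWorker:
  enabled: true

# Stage 3: opt in to pre-aggregates (a no-op until defined in your dbt YAML)
preAggregateNatsWorker:
  enabled: true
```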
## Auto-configured environment variables
The chart automatically sets these environment variables in the shared ConfigMap — you do not need to set them manually:
The first four are set in the shared ConfigMap and apply to all pods. `NATS_WORKER_CONCURRENCY` is set per worker deployment based on the worker's `concurrency` Helm value.
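
Because `NATS_WORKER_CONCURRENCY` is derived from each worker's `concurrency` Helm value, you can tune concurrency per worker deployment. A hedged sketch (the key placement follows the description above; 100 is the default mentioned earlier):

```yaml
warehouseNatsWorker:
  enabled: true
  concurrency: 50  # rendered into NATS_WORKER_CONCURRENCY for this worker's pods
```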
Additional environment variables are auto-configured per worker deployment — see [Warehouse workers](/self-host/nats-workers/warehouse-workers) and [Pre-aggregate workers](/self-host/nats-workers/pre-aggregate-workers) for details.
## NATS JetStream configuration
JetStream supports two storage backends — we default to memory store, but you can switch to file store depending on your needs.
### Memory store vs file store
For more details, see the NATS documentation on [JetStream storage](https://docs.nats.io/nats-concepts/jetstream).
### Default configuration
Our recommended default configuration:
```yaml
nats:
  config:
    cluster:
      enabled: false
    jetstream:
      memoryStore:
        enabled: true
        maxSize: 1Gi # max memory for message storage
```
| Setting | Default | Description |
| --- | --- | --- |
| `nats.config.jetstream.memoryStore.maxSize` | `1Gi` | Maximum memory for JetStream message storage |
| `nats.config.cluster.enabled` | `false` | Single-node NATS (no clustering) |

The 1Gi memory store default is sufficient for most workloads. If you need to increase it:
```yaml
nats:
  config:
    jetstream:
      memoryStore:
        maxSize: 2Gi
```
To switch to file store instead, disable memory store and enable file store with a PersistentVolumeClaim:
```yaml
nats:
  config:
    jetstream:
      memoryStore:
        enabled: false
      fileStore:
        enabled: true
        pvc:
          size: 10Gi
```
### Pod disruption
NATS is a stateful component — if the NATS pod restarts, in-flight messages are lost (queries will be retried by users). The chart protects against unplanned eviction with: