
Commit cb10e9f

committed
docs: self hosting nats workers
1 parent 6070f82 commit cb10e9f

1 file changed

Lines changed: 13 additions & 59 deletions

File tree

self-host/nats-workers/overview.mdx

@@ -12,7 +12,9 @@ description: "Scale Lightdash query processing with dedicated NATS worker pods u
 
 By default, Lightdash processes all queries on the main API server. NATS workers move query execution onto dedicated pods, improving responsiveness under load and letting you scale query capacity independently.
 
-Lightdash uses [NATS JetStream](https://nats.io/) as the message broker for distributing work between the API server and worker pods.
+Lightdash uses [NATS](https://nats.io/) — a lightweight, high-performance messaging system — with [JetStream](https://docs.nats.io/nats-concepts/jetstream), its built-in persistent streaming layer, to distribute work between the API server and worker pods.
+
+NATS powers two opt-in features in Lightdash:
 
 <CardGroup cols={2}>
 <Card title="Warehouse workers" icon="database" horizontal href="/self-host/nats-workers/warehouse-workers">
@@ -26,28 +28,22 @@ Lightdash uses [NATS JetStream](https://nats.io/) as the message broker for dist
 ## Requirements
 
 - **Helm chart** version **2.7.2** or later
-- Your Lightdash container image must include the NATS worker entrypoint — make sure you are running a recent version. Older images will fail with `MODULE_NOT_FOUND`.
+- **Lightdash** version [**0.2675.0**](https://hub.docker.com/r/lightdash/lightdash/tags) or later. Older images will fail with `MODULE_NOT_FOUND`.
 
 <Note>
 Upgrading the Helm chart alone does not change how Lightdash works. NATS features are entirely opt-in — your existing deployment will behave exactly the same until you explicitly enable the new Helm values described below.
 </Note>
 
 ## Architecture
 
-When NATS is enabled, the Helm chart deploys three additional components alongside the main Lightdash backend:
-
 ```mermaid
 flowchart LR
 API[Lightdash API] -->|publish job| NATS[NATS JetStream]
-NATS -->|deliver message| Worker[Worker pod\nconcurrency: 100]
+NATS -->|deliver message| Worker[Worker pod<br/>concurrency: 100]
 Worker -->|return result| API
 ```
 
-| Component | Helm value | What it does |
-| --- | --- | --- |
-| **NATS JetStream** | `nats.enabled: true` | Message broker with dedicated streams for warehouse queries and pre-aggregate jobs |
-| **Warehouse worker** | `warehouseNatsWorker.enabled: true` | Consumes warehouse query jobs and executes SQL against your data warehouse |
-| **Pre-aggregate worker** | `preAggregateNatsWorker.enabled: true` | Materializes pre-aggregates (warehouse → Parquet → S3) and serves DuckDB queries against Parquet files |
+The Lightdash API publishes jobs to NATS JetStream. Worker pods consume messages from their stream and process them concurrently (default 100 concurrent jobs per pod).
 
 ## Enabling NATS
 
@@ -97,26 +93,23 @@ We recommend enabling these incrementally so you can validate each step:
 | Setting | What changes | Risk level |
 | --- | --- | --- |
 | `nats.enabled: true` | Deploys the NATS StatefulSet. No queries are routed through it yet. | Low — just adds infrastructure, no behavior change. |
-| `warehouseNatsWorker.enabled: true` | All warehouse query execution moves from the API server to dedicated worker pods via NATS. | Higher — changes the query execution path for all users. We have been running this in production for all Lightdash Cloud customers. |
+| `warehouseNatsWorker.enabled: true` | All warehouse query execution moves from the API server to dedicated worker pods via NATS. | Higher — changes the query execution path for all users. |
 | `preAggregateNatsWorker.enabled: true` | Enables the pre-aggregates feature and deploys the pre-aggregate worker. | Low — queries only use pre-aggregates after you [define them in your dbt YAML](/references/pre-aggregates/getting-started). Until then, nothing changes. |
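
[Editor's note] The staged rollout in the table above can be sketched as Helm values, enabled one release at a time. The key names are taken from the table; this is a sketch, not a complete values file:

```yaml
# values.yaml -- enable one step per release and validate before the next
nats:
  enabled: true                  # step 1: deploy NATS only, no routing change
warehouseNatsWorker:
  enabled: true                  # step 2: route warehouse queries through workers
preAggregateNatsWorker:
  enabled: true                  # step 3: opt in to pre-aggregates
```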
 
 ## Auto-configured environment variables
 
-The chart automatically sets these environment variables based on the Helm values above — you do not need to set them manually:
+The chart automatically sets these environment variables in the shared ConfigMap — you do not need to set them manually:
 
 | Variable | Set when | Value |
 | --- | --- | --- |
 | `NATS_ENABLED` | `nats.enabled: true` | `"true"` |
 | `NATS_URL` | `nats.enabled: true` | `nats://<release>-nats:4222` |
-| `NATS_WORKER_CONCURRENCY` | Any NATS worker is enabled | From the worker's `concurrency` Helm value (default `100`) |
-| `PRE_AGGREGATES_ENABLED` | `preAggregateNatsWorker.enabled: true` | `"true"` |
-| `PRE_AGGREGATES_PARQUET_ENABLED` | `preAggregateNatsWorker.enabled: true` | `"true"` |
 
-The first four are set in the shared ConfigMap and apply to all pods. `NATS_WORKER_CONCURRENCY` is set per worker deployment based on the worker's `concurrency` Helm value.
+Additional environment variables are auto-configured per worker deployment — see [Warehouse workers](/self-host/nats-workers/warehouse-workers) and [Pre-aggregate workers](/self-host/nats-workers/pre-aggregate-workers) for details.
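
[Editor's note] As an illustration of the per-worker `concurrency` Helm value mentioned above (default `100`, rendered as `NATS_WORKER_CONCURRENCY` on the worker deployment), a values snippet might look like this. Treat it as a sketch, not the chart's authoritative schema:

```yaml
warehouseNatsWorker:
  enabled: true
  concurrency: 100   # becomes NATS_WORKER_CONCURRENCY on the worker pods
```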
 
 ## NATS JetStream configuration
 
-Lightdash uses [NATS JetStream](https://docs.nats.io/nats-concepts/jetstream) for reliable message delivery between the API server and workers. JetStream supports two storage backends — we default to memory store, but you can switch to file store depending on your needs.
+JetStream supports two storage backends — we default to memory store, but you can switch to file store depending on your needs.
 
 ### Memory store vs file store
 
@@ -132,7 +125,7 @@ For more details, see the NATS documentation on [JetStream storage](https://docs
 
 ### Default configuration
 
-This is how we configure NATS JetStream internally at Lightdash Cloud:
+Our recommended default configuration:
 
 ```yaml
 nats:
@@ -147,43 +140,23 @@ nats:
       memoryStore:
         enabled: true
         maxSize: 1Gi # max memory for message storage
-  reloader:
-    enabled: false
-  container:
-    merge:
-      resources:
-        requests:
-          memory: 256Mi
-          cpu: 100m
-        limits:
-          memory: 1Gi
 ```
 
 | Setting | Default | Description |
 | --- | --- | --- |
-| `nats.config.jetstream.memoryStore.maxSize` | `1Gi` | Maximum memory for JetStream message storage |
 | `nats.config.jetstream.memoryStore.enabled` | `true` | Enable memory-backed storage |
+| `nats.config.jetstream.memoryStore.maxSize` | `1Gi` | Maximum memory for JetStream message storage |
 | `nats.config.jetstream.fileStore.enabled` | `false` | Enable disk-backed storage |
 | `nats.config.cluster.enabled` | `false` | Single-node NATS (no clustering) |
-| `nats.reloader.enabled` | `false` | Config reloader sidecar (not needed for most deployments) |
-| `nats.container.merge.resources.requests.memory` | `256Mi` | NATS server memory request |
-| `nats.container.merge.resources.limits.memory` | `1Gi` | NATS server memory limit |
 
-The 1Gi memory store default is sufficient for most workloads. If you need to increase it, make sure the NATS container memory limit is at least as large as `maxSize`:
+The 1Gi memory store default is sufficient for most workloads. If you need to increase it:
 
 ```yaml
 nats:
   config:
     jetstream:
       memoryStore:
         maxSize: 2Gi
-  container:
-    merge:
-      resources:
-        requests:
-          memory: 512Mi
-        limits:
-          memory: 2Gi
 ```
 
 To switch to file store instead, disable memory store and enable file store with a PersistentVolumeClaim:
@@ -201,28 +174,9 @@ nats:
           size: 10Gi
 ```
 
-### Network policy
-
-The chart deploys a `NetworkPolicy` that restricts NATS access to Lightdash pods only. If you need to allow Prometheus scraping from a specific namespace:
-
-```yaml
-nats:
-  networkPolicy:
-    enabled: true
-    additionalIngress:
-      - from:
-          - namespaceSelector:
-              matchLabels:
-                kubernetes.io/metadata.name: monitoring
-        ports:
-          - protocol: TCP
-            port: 7777
-```
-
 ### Pod disruption
 
 NATS is a stateful component — if the NATS pod restarts, in-flight messages are lost (queries will be retried by users). The chart protects against unplanned eviction with:
 
 - `cluster-autoscaler.kubernetes.io/safe-to-evict: "false"` annotation
 - `PodDisruptionBudget` with `maxUnavailable: 0`
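
[Editor's note] The rendered PodDisruptionBudget would look roughly like this. This is a sketch: the metadata name and selector labels are assumptions, and only `maxUnavailable: 0` comes from the chart behavior described above:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: lightdash-nats              # hypothetical name, depends on your release
spec:
  maxUnavailable: 0                 # never voluntarily evict the NATS pod
  selector:
    matchLabels:
      app.kubernetes.io/name: nats  # hypothetical selector labels
```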
