Commit d109b75

committed
docs: self hosting nats workers
1 parent 8f3273a commit d109b75

6 files changed

Lines changed: 486 additions & 128 deletions

docs.json

Lines changed: 17 additions & 3 deletions
@@ -295,8 +295,14 @@
295295
"self-host/enterprise-on-prem",
296296
"self-host/self-host-lightdash-docker-compose",
297297
"self-host/update-lightdash",
298-
"self-host/pre-aggregates",
299-
"self-host/nats-workers",
298+
{
299+
"group": "NATS workers",
300+
"pages": [
301+
"self-host/nats-workers/overview",
302+
"self-host/nats-workers/warehouse-workers",
303+
"self-host/nats-workers/pre-aggregate-workers"
304+
]
305+
},
300306
{
301307
"group": "Customize deployment",
302308
"pages": [
@@ -423,6 +429,14 @@
423429
"source": "/references/pre-aggregates",
424430
"destination": "/references/pre-aggregates/overview"
425431
},
432+
{
433+
"source": "/self-host/pre-aggregates",
434+
"destination": "/self-host/nats-workers/pre-aggregate-workers"
435+
},
436+
{
437+
"source": "/self-host/nats-workers",
438+
"destination": "/self-host/nats-workers/warehouse-workers"
439+
},
426440
{
427441
"source": "/guides/ai-analyst",
428442
"destination": "/guides/ai-agents"
@@ -677,4 +691,4 @@
677691
"display": "simple"
678692
}
679693
}
680-
}
694+
}

self-host/nats-workers.mdx

Lines changed: 0 additions & 41 deletions
This file was deleted.
Lines changed: 182 additions & 0 deletions
@@ -0,0 +1,182 @@
1+
---
2+
title: "NATS workers"
3+
sidebarTitle: "Overview"
4+
description: "Scale Lightdash query processing with dedicated NATS worker pods using the Helm chart."
5+
---
6+
7+
<Badge color="blue" size="md" shape="pill">Helm chart</Badge>
8+
9+
<Callout icon="wrench" color="#6B7280">
10+
This page is for engineering teams self-hosting their own Lightdash instance.
11+
</Callout>
12+
13+
By default, Lightdash processes all queries on the main API server. NATS workers move query execution onto dedicated pods, improving responsiveness under load and letting you scale query capacity independently.
14+
15+
Lightdash uses [NATS](https://nats.io/) — a lightweight, high-performance messaging system — with [JetStream](https://docs.nats.io/nats-concepts/jetstream), its built-in persistent streaming layer, to distribute work between the API server and worker pods.
16+
17+
NATS powers two opt-in features in Lightdash:
18+
19+
<CardGroup cols={2}>
20+
<Card title="Warehouse workers" icon="database" horizontal href="/self-host/nats-workers/warehouse-workers">
21+
Process interactive and background warehouse queries on dedicated pods.
22+
</Card>
23+
<Card title="Pre-aggregate workers" icon="layer-group" horizontal href="/self-host/nats-workers/pre-aggregate-workers">
24+
Materialize pre-aggregates and serve queries from DuckDB.
25+
</Card>
26+
</CardGroup>
27+
28+
## Requirements
29+
30+
- **Helm chart** version **2.7.2** or later
31+
- **Lightdash** version [**0.2675.0**](https://hub.docker.com/r/lightdash/lightdash/tags) or later. Older images will fail with `MODULE_NOT_FOUND`.
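
The two minimums above can be checked before enabling any NATS values. The following is a minimal sketch, assuming both versions are plain dot-separated integers; `parse_version` and `meets_requirements` are hypothetical helper names, not part of Lightdash or the Helm chart.

```python
# Minimum versions taken from the Requirements list above.
MIN_CHART = "2.7.2"
MIN_LIGHTDASH = "0.2675.0"


def parse_version(version):
    """Turn '0.2675.0' into (0, 2675, 0) so parts compare numerically."""
    return tuple(int(part) for part in version.split("."))


def meets_requirements(chart_version, lightdash_version):
    """Return True when both versions are at or above the documented minimums."""
    return (
        parse_version(chart_version) >= parse_version(MIN_CHART)
        and parse_version(lightdash_version) >= parse_version(MIN_LIGHTDASH)
    )
```

Comparing integer tuples rather than strings matters here: as strings, `"2.10.0"` sorts before `"2.7.2"`, which would wrongly reject a newer chart.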
32+
33+
<Note>
34+
Upgrading the Helm chart alone does not change how Lightdash works. NATS features are entirely opt-in — your existing deployment will behave exactly the same until you explicitly enable the new Helm values described below.
35+
</Note>
36+
37+
## Architecture
38+
39+
```mermaid
40+
flowchart LR
41+
API[Lightdash API] -->|publish job| NATS[NATS JetStream]
42+
NATS -->|deliver message| Worker[Worker pod<br/>concurrency: 100]
43+
Worker -->|return result| API
44+
```
45+
46+
The Lightdash API publishes jobs to NATS JetStream. Worker pods consume messages from their stream and process them concurrently (default 100 concurrent jobs per pod).
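
To illustrate what a per-pod concurrency limit means, here is a minimal in-process sketch. It does not use NATS or any Lightdash code: the job list stands in for a JetStream stream, and `run_worker` and `fake_query` are hypothetical names chosen for this example.

```python
import asyncio

CONCURRENCY = 100  # default per-pod concurrency described above


async def run_worker(jobs, handler, concurrency=CONCURRENCY):
    """Process every job, running at most `concurrency` handlers at once."""
    semaphore = asyncio.Semaphore(concurrency)
    results = {}
    running = 0
    peak = 0

    async def process(job_id):
        nonlocal running, peak
        async with semaphore:  # waits here once the pod is "full"
            running += 1
            peak = max(peak, running)
            try:
                results[job_id] = await handler(job_id)
            finally:
                running -= 1

    await asyncio.gather(*(process(job_id) for job_id in jobs))
    return results, peak


async def fake_query(job_id):
    """Stand-in for executing a warehouse query."""
    await asyncio.sleep(0.01)
    return f"result-{job_id}"


def demo(n_jobs=250, concurrency=CONCURRENCY):
    """Run the sketch and return (results, peak concurrent jobs)."""
    return asyncio.run(run_worker(range(n_jobs), fake_query, concurrency))
```

Calling `demo(250, 100)` completes all 250 jobs while never running more than 100 at a time, which is the behavior the concurrency setting is meant to bound.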
47+
48+
## Enabling NATS
49+
50+
Deploy Lightdash with the [Helm chart](/self-host/self-host-lightdash) to use NATS workers.
51+
52+
### Warehouse queries only
53+
54+
If you only need to scale warehouse query processing:
55+
56+
```yaml
57+
nats:
58+
enabled: true
59+
warehouseNatsWorker:
60+
enabled: true
61+
```
62+
63+
### With pre-aggregates
64+
65+
To also enable [pre-aggregates](/references/pre-aggregates/overview), add the pre-aggregate worker and required storage configuration:
66+
67+
```yaml
68+
nats:
69+
enabled: true
70+
warehouseNatsWorker:
71+
enabled: true
72+
preAggregateNatsWorker:
73+
enabled: true
74+
75+
# S3 storage for materialized data
76+
configMap:
77+
S3_ENDPOINT: "https://s3.us-east-1.amazonaws.com"
78+
PRE_AGGREGATE_RESULTS_S3_BUCKET: "my-lightdash-pre-aggs"
79+
PRE_AGGREGATE_RESULTS_S3_REGION: "us-east-1"
80+
secrets:
81+
PRE_AGGREGATE_RESULTS_S3_ACCESS_KEY: "your-access-key"
82+
PRE_AGGREGATE_RESULTS_S3_SECRET_KEY: "your-secret-key"
83+
```
84+
85+
<Note>
86+
Pre-aggregates require an [Enterprise license key](/self-host/customize-deployment/enterprise-license-keys) and a **dedicated S3 bucket** separate from your main Lightdash results cache bucket. See [Pre-aggregate workers](/self-host/nats-workers/pre-aggregate-workers) for details.
87+
</Note>
88+
89+
### What each setting does
90+
91+
We recommend enabling these incrementally so you can validate each step:
92+
93+
| Setting | What changes | Risk level |
94+
| --- | --- | --- |
95+
| `nats.enabled: true` | Deploys the NATS StatefulSet. No queries are routed through it yet. | Low — just adds infrastructure, no behavior change. |
96+
| `warehouseNatsWorker.enabled: true` | All warehouse query execution moves from the API server to dedicated worker pods via NATS. | Higher — changes the query execution path for all users. |
97+
| `preAggregateNatsWorker.enabled: true` | Enables the pre-aggregates feature and deploys the pre-aggregate worker. | Low — queries only use pre-aggregates after you [define them in your dbt YAML](/references/pre-aggregates/getting-started). Until then, nothing changes. |
98+
99+
## Auto-configured environment variables
100+
101+
The chart automatically sets these environment variables in the shared ConfigMap — you do not need to set them manually:
102+
103+
| Variable | Set when | Value |
104+
| --- | --- | --- |
105+
| `NATS_ENABLED` | `nats.enabled: true` | `"true"` |
106+
| `NATS_URL` | `nats.enabled: true` | `nats://<release>-nats:4222` |
107+
108+
Additional environment variables are auto-configured per worker deployment — see [Warehouse workers](/self-host/nats-workers/warehouse-workers) and [Pre-aggregate workers](/self-host/nats-workers/pre-aggregate-workers) for details.
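
When debugging a pod, it can help to compare its environment with the values in the table above. This is a small sketch under the assumption that the table is complete for the shared ConfigMap; `expected_env` and `env_mismatches` are hypothetical helpers, and `lightdash` is only an example release name.

```python
def nats_url(release, port=4222):
    """Build the URL pattern from the table: nats://<release>-nats:4222."""
    return f"nats://{release}-nats:{port}"


def expected_env(release, nats_enabled=True):
    """Variables the chart is documented to set when nats.enabled is true."""
    if not nats_enabled:
        return {}
    return {"NATS_ENABLED": "true", "NATS_URL": nats_url(release)}


def env_mismatches(actual_env, release):
    """Return {name: (expected, actual)} for variables that differ or are missing."""
    return {
        name: (expected, actual_env.get(name))
        for name, expected in expected_env(release).items()
        if actual_env.get(name) != expected
    }
```

Inside a pod, `env_mismatches(dict(os.environ), "lightdash")` (after `import os`) would return an empty dict when both variables match the documented values.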
109+
110+
## NATS JetStream configuration
111+
112+
JetStream supports two storage backends — we default to memory store, but you can switch to file store depending on your needs.
113+
114+
### Memory store vs file store
115+
116+
| | Memory store (default) | File store |
117+
| --- | --- | --- |
118+
| **How it works** | Messages are held in RAM | Messages are persisted to disk |
119+
| **Performance** | Faster — no disk I/O overhead | Slower — writes go through disk |
120+
| **Persistence** | Messages are lost if NATS restarts | Messages survive NATS restarts |
121+
| **Infrastructure** | No PersistentVolumeClaim needed | Requires a PersistentVolumeClaim |
122+
| **When to use** | Most deployments. Lightdash messages are small (just a query UUID) and are deleted once processed. | High message volume exceeding available RAM, or if you need messages to survive NATS pod restarts. |
123+
124+
For more details, see the NATS documentation on [JetStream storage](https://docs.nats.io/nats-concepts/jetstream/streams#storage-overhead) and [memory vs file store](https://docs.nats.io/running-a-nats-service/nats_admin/jetstream_admin/streams#extracting-stream-details).
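
As a rough way to reason about whether the memory store is large enough, the arithmetic below estimates how many pending messages fit in a given store size. The per-message footprint of 1 KiB is an assumption made for illustration only (a query UUID plus headers and storage overhead); it is not a measured Lightdash or NATS figure.

```python
GIB = 1024 ** 3  # bytes in 1Gi

# Assumed footprint per stored message, including overhead (illustrative only).
ASSUMED_BYTES_PER_MESSAGE = 1024


def max_pending_messages(store_bytes, bytes_per_message=ASSUMED_BYTES_PER_MESSAGE):
    """Estimate how many unprocessed messages fit in the store."""
    return store_bytes // bytes_per_message
```

Under that assumption, a 1Gi store holds on the order of a million pending messages, and because messages are deleted once processed, only the backlog counts toward the limit.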
125+
126+
### Default configuration
127+
128+
Our recommended default configuration:
129+
130+
```yaml
131+
nats:
132+
enabled: true
133+
config:
134+
cluster:
135+
enabled: false # single-node NATS, no clustering
136+
jetstream:
137+
enabled: true
138+
fileStore:
139+
enabled: false # no disk persistence
140+
memoryStore:
141+
enabled: true
142+
maxSize: 1Gi # max memory for message storage
143+
```
144+
145+
| Setting | Default | Description |
146+
| --- | --- | --- |
147+
| `nats.config.jetstream.memoryStore.enabled` | `true` | Enable memory-backed storage |
148+
| `nats.config.jetstream.memoryStore.maxSize` | `1Gi` | Maximum memory for JetStream message storage |
149+
| `nats.config.jetstream.fileStore.enabled` | `false` | Enable disk-backed storage |
150+
| `nats.config.cluster.enabled` | `false` | Single-node NATS (no clustering) |
151+
152+
The 1Gi memory store default is sufficient for most workloads. If you need to increase it:
153+
154+
```yaml
155+
nats:
156+
config:
157+
jetstream:
158+
memoryStore:
159+
maxSize: 2Gi
160+
```
161+
162+
To switch to file store instead, disable memory store and enable file store with a PersistentVolumeClaim:
163+
164+
```yaml
165+
nats:
166+
config:
167+
jetstream:
168+
memoryStore:
169+
enabled: false
170+
fileStore:
171+
enabled: true
172+
dir: /data
173+
pvc:
174+
size: 10Gi
175+
```
176+
177+
### Pod disruption
178+
179+
NATS is a stateful component: if the NATS pod restarts, in-flight messages are lost and users will need to re-run the affected queries. The chart protects the pod against voluntary eviction, such as node drains and cluster autoscaler scale-downs, with:
180+
181+
- `cluster-autoscaler.kubernetes.io/safe-to-evict: "false"` annotation
182+
- `PodDisruptionBudget` with `maxUnavailable: 0`
