
Commit cb10e9f

committed
docs: self hosting nats workers
1 parent 6070f82 commit cb10e9f

1 file changed

Lines changed: 13 additions & 59 deletions

File tree

self-host/nats-workers/overview.mdx

@@ -12,7 +12,9 @@ description: "Scale Lightdash query processing with dedicated NATS worker pods u
 
 By default, Lightdash processes all queries on the main API server. NATS workers move query execution onto dedicated pods, improving responsiveness under load and letting you scale query capacity independently.
 
-Lightdash uses [NATS JetStream](https://nats.io/) as the message broker for distributing work between the API server and worker pods.
+Lightdash uses [NATS](https://nats.io/) — a lightweight, high-performance messaging system — with [JetStream](https://docs.nats.io/nats-concepts/jetstream), its built-in persistent streaming layer, to distribute work between the API server and worker pods.
+
+NATS powers two opt-in features in Lightdash:
 
 <CardGroup cols={2}>
 <Card title="Warehouse workers" icon="database" horizontal href="/self-host/nats-workers/warehouse-workers">
@@ -26,28 +28,22 @@ Lightdash uses [NATS JetStream](https://nats.io/) as the message broker for dist
 ## Requirements
 
 - **Helm chart** version **2.7.2** or later
-- Your Lightdash container image must include the NATS worker entrypoint — make sure you are running a recent version. Older images will fail with `MODULE_NOT_FOUND`.
+- **Lightdash** version [**0.2675.0**](https://hub.docker.com/r/lightdash/lightdash/tags) or later. Older images will fail with `MODULE_NOT_FOUND`.
 
 <Note>
 Upgrading the Helm chart alone does not change how Lightdash works. NATS features are entirely opt-in — your existing deployment will behave exactly the same until you explicitly enable the new Helm values described below.
 </Note>
 
 ## Architecture
 
-When NATS is enabled, the Helm chart deploys three additional components alongside the main Lightdash backend:
-
 ```mermaid
 flowchart LR
 API[Lightdash API] -->|publish job| NATS[NATS JetStream]
-NATS -->|deliver message| Worker[Worker pod\nconcurrency: 100]
+NATS -->|deliver message| Worker[Worker pod<br/>concurrency: 100]
 Worker -->|return result| API
 ```
 
-| Component | Helm value | What it does |
-| --- | --- | --- |
-| **NATS JetStream** | `nats.enabled: true` | Message broker with dedicated streams for warehouse queries and pre-aggregate jobs |
-| **Warehouse worker** | `warehouseNatsWorker.enabled: true` | Consumes warehouse query jobs and executes SQL against your data warehouse |
-| **Pre-aggregate worker** | `preAggregateNatsWorker.enabled: true` | Materializes pre-aggregates (warehouse → Parquet → S3) and serves DuckDB queries against Parquet files |
+The Lightdash API publishes jobs to NATS JetStream. Worker pods consume messages from their stream and process them concurrently (default 100 concurrent jobs per pod).
 
 ## Enabling NATS
 
@@ -97,26 +93,23 @@ We recommend enabling these incrementally so you can validate each step:
 | Setting | What changes | Risk level |
 | --- | --- | --- |
 | `nats.enabled: true` | Deploys the NATS StatefulSet. No queries are routed through it yet. | Low — just adds infrastructure, no behavior change. |
-| `warehouseNatsWorker.enabled: true` | All warehouse query execution moves from the API server to dedicated worker pods via NATS. | Higher — changes the query execution path for all users. We have been running this in production for all Lightdash Cloud customers. |
+| `warehouseNatsWorker.enabled: true` | All warehouse query execution moves from the API server to dedicated worker pods via NATS. | Higher — changes the query execution path for all users. |
 | `preAggregateNatsWorker.enabled: true` | Enables the pre-aggregates feature and deploys the pre-aggregate worker. | Low — queries only use pre-aggregates after you [define them in your dbt YAML](/references/pre-aggregates/getting-started). Until then, nothing changes. |
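
[Editor's note] The staged rollout in the table above can be sketched as Helm values, enabled one release at a time. The key names are taken from the table; this is a sketch, not a complete values file:

```yaml
# values.yaml -- enable one step per release and validate before the next
nats:
  enabled: true                  # step 1: deploy NATS only, no routing change
warehouseNatsWorker:
  enabled: true                  # step 2: route warehouse queries through workers
preAggregateNatsWorker:
  enabled: true                  # step 3: opt in to pre-aggregates
```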
 
 ## Auto-configured environment variables
 
-The chart automatically sets these environment variables based on the Helm values above — you do not need to set them manually:
+The chart automatically sets these environment variables in the shared ConfigMap — you do not need to set them manually:
 
 | Variable | Set when | Value |
 | --- | --- | --- |
 | `NATS_ENABLED` | `nats.enabled: true` | `"true"` |
 | `NATS_URL` | `nats.enabled: true` | `nats://<release>-nats:4222` |
-| `NATS_WORKER_CONCURRENCY` | Any NATS worker is enabled | From the worker's `concurrency` Helm value (default `100`) |
-| `PRE_AGGREGATES_ENABLED` | `preAggregateNatsWorker.enabled: true` | `"true"` |
-| `PRE_AGGREGATES_PARQUET_ENABLED` | `preAggregateNatsWorker.enabled: true` | `"true"` |
 
-The first four are set in the shared ConfigMap and apply to all pods. `NATS_WORKER_CONCURRENCY` is set per worker deployment based on the worker's `concurrency` Helm value.
+Additional environment variables are auto-configured per worker deployment — see [Warehouse workers](/self-host/nats-workers/warehouse-workers) and [Pre-aggregate workers](/self-host/nats-workers/pre-aggregate-workers) for details.
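
[Editor's note] As an illustration of the per-worker `concurrency` Helm value mentioned above (default `100`, rendered as `NATS_WORKER_CONCURRENCY` on the worker deployment), a values snippet might look like this. Treat it as a sketch, not the chart's authoritative schema:

```yaml
warehouseNatsWorker:
  enabled: true
  concurrency: 100   # becomes NATS_WORKER_CONCURRENCY on the worker pods
```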
 
 ## NATS JetStream configuration
 
-Lightdash uses [NATS JetStream](https://docs.nats.io/nats-concepts/jetstream) for reliable message delivery between the API server and workers. JetStream supports two storage backends — we default to memory store, but you can switch to file store depending on your needs.
+JetStream supports two storage backends — we default to memory store, but you can switch to file store depending on your needs.
 
 ### Memory store vs file store
 
@@ -132,7 +125,7 @@ For more details, see the NATS documentation on [JetStream storage](https://docs
 
 ### Default configuration
 
-This is how we configure NATS JetStream internally at Lightdash Cloud:
+Our recommended default configuration:
 
 ```yaml
 nats:
@@ -147,43 +140,23 @@ nats:
       memoryStore:
         enabled: true
         maxSize: 1Gi # max memory for message storage
-  reloader:
-    enabled: false
-  container:
-    merge:
-      resources:
-        requests:
-          memory: 256Mi
-          cpu: 100m
-        limits:
-          memory: 1Gi
 ```
 
 | Setting | Default | Description |
 | --- | --- | --- |
-| `nats.config.jetstream.memoryStore.maxSize` | `1Gi` | Maximum memory for JetStream message storage |
 | `nats.config.jetstream.memoryStore.enabled` | `true` | Enable memory-backed storage |
+| `nats.config.jetstream.memoryStore.maxSize` | `1Gi` | Maximum memory for JetStream message storage |
 | `nats.config.jetstream.fileStore.enabled` | `false` | Enable disk-backed storage |
 | `nats.config.cluster.enabled` | `false` | Single-node NATS (no clustering) |
-| `nats.reloader.enabled` | `false` | Config reloader sidecar (not needed for most deployments) |
-| `nats.container.merge.resources.requests.memory` | `256Mi` | NATS server memory request |
-| `nats.container.merge.resources.limits.memory` | `1Gi` | NATS server memory limit |
 
-The 1Gi memory store default is sufficient for most workloads. If you need to increase it, make sure the NATS container memory limit is at least as large as `maxSize`:
+The 1Gi memory store default is sufficient for most workloads. If you need to increase it:
 
 ```yaml
 nats:
   config:
     jetstream:
       memoryStore:
         maxSize: 2Gi
-  container:
-    merge:
-      resources:
-        requests:
-          memory: 512Mi
-        limits:
-          memory: 2Gi
 ```
 
 To switch to file store instead, disable memory store and enable file store with a PersistentVolumeClaim:
@@ -201,28 +174,9 @@ nats:
           size: 10Gi
 ```
 
-### Network policy
-
-The chart deploys a `NetworkPolicy` that restricts NATS access to Lightdash pods only. If you need to allow Prometheus scraping from a specific namespace:
-
-```yaml
-nats:
-  networkPolicy:
-    enabled: true
-    additionalIngress:
-      - from:
-          - namespaceSelector:
-              matchLabels:
-                kubernetes.io/metadata.name: monitoring
-        ports:
-          - protocol: TCP
-            port: 7777
-```
-
 ### Pod disruption
 
 NATS is a stateful component — if the NATS pod restarts, in-flight messages are lost (queries will be retried by users). The chart protects against unplanned eviction with:
 
 - `cluster-autoscaler.kubernetes.io/safe-to-evict: "false"` annotation
 - `PodDisruptionBudget` with `maxUnavailable: 0`
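
[Editor's note] The rendered PodDisruptionBudget would look roughly like this. This is a sketch: the metadata name and selector labels are assumptions, and only `maxUnavailable: 0` comes from the chart behavior described above:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: lightdash-nats              # hypothetical name, depends on your release
spec:
  maxUnavailable: 0                 # never voluntarily evict the NATS pod
  selector:
    matchLabels:
      app.kubernetes.io/name: nats  # hypothetical selector labels
```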
