## Summary
Add optional ClickHouse support as a parallel write sink alongside PostgreSQL. PG continues to handle point lookups (get block, get tx by hash); ClickHouse handles analytical queries (aggregations, time-range scans, charts, top-N rankings).
## Motivation
Atlas's analytics endpoints (daily tx charts, gas price history, token holder rankings) all hit PostgreSQL today. As chain history grows into hundreds of millions of rows, these queries will degrade — ClickHouse is purpose-built for this workload and can be orders of magnitude faster for full-column scans and aggregations.
## Design

### Dual-Sink Writer
- Add a `ClickHouseSink` that writes blocks, txs, logs, and receipts using the `clickhouse` crate (RowBinary format + LZ4 compression)
- Wrap the PG and CH sinks in a `SinkSet` abstraction that fans out writes via `tokio::try_join!` (parallel writes, both must succeed)
- CH is optional; enabled via `ENABLE_CLICKHOUSE=true` + `CLICKHOUSE_URL`
- CH sink should chunk large batches (e.g., 10K rows per INSERT) to bound memory
- Use the `ReplacingMergeTree` engine so re-inserts after crashes are safe (duplicates are merged away)
- CH sink gets its own retry logic (3 attempts, exponential backoff)
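The fan-out, chunking, and retry behavior above can be sketched as follows. The `Sink` trait, error type, and function names are illustrative stand-ins for Atlas's actual writer interfaces, and plain scoped threads stand in for the async `tokio::try_join!` fan-out:

```rust
use std::thread;
use std::time::Duration;

// Hypothetical sink interface; not Atlas's actual API.
trait Sink: Send + Sync {
    fn write_batch(&self, rows: &[u64]) -> Result<(), String>;
}

// Fan out one batch to both sinks in parallel; both must succeed,
// mirroring the try_join! semantics (threads stand in for async tasks).
fn write_all(pg: &dyn Sink, ch: &dyn Sink, rows: &[u64]) -> Result<(), String> {
    thread::scope(|s| {
        let pg_task = s.spawn(|| pg.write_batch(rows));
        let ch_task = s.spawn(|| ch.write_batch(rows));
        pg_task.join().unwrap()?;
        ch_task.join().unwrap()
    })
}

// 3 attempts with exponential backoff, doubling the delay between tries.
fn with_retry<T>(mut op: impl FnMut() -> Result<T, String>) -> Result<T, String> {
    let mut delay = Duration::from_millis(100);
    let mut last_err = String::new();
    for attempt in 0..3 {
        match op() {
            Ok(v) => return Ok(v),
            Err(e) => {
                last_err = e;
                if attempt < 2 {
                    thread::sleep(delay);
                    delay *= 2;
                }
            }
        }
    }
    Err(last_err)
}

// Chunk large batches (e.g., 10K rows per INSERT) to bound memory;
// each chunk gets its own retry budget.
fn chunked_insert(sink: &dyn Sink, rows: &[u64], chunk_size: usize) -> Result<(), String> {
    for chunk in rows.chunks(chunk_size) {
        with_retry(|| sink.write_batch(chunk))?;
    }
    Ok(())
}
```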
### ClickHouse Schema

- Mirror the core PG tables: `blocks`, `transactions`, `event_logs`, `receipts`
- Partition by month: `toYYYYMM(timestamp)` for efficient time-range pruning
- Bloom filter indexes on `event_logs(topic0, address)` for fast event filtering
- Order keys optimized for analytical access patterns (e.g., logs ordered by `(address, topic0, block_number)`)
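For concreteness, a sketch of the `event_logs` DDL under these choices; column names, types, and index granularity are assumptions, not the final schema:

```sql
CREATE TABLE IF NOT EXISTS atlas.event_logs
(
    block_number  UInt64,
    log_index     UInt32,
    timestamp     DateTime,
    address       FixedString(20),
    topic0        FixedString(32),
    data          String,
    -- Bloom filter skip indexes for fast event filtering
    INDEX idx_topic0  topic0  TYPE bloom_filter GRANULARITY 4,
    INDEX idx_address address TYPE bloom_filter GRANULARITY 4
)
ENGINE = ReplacingMergeTree
PARTITION BY toYYYYMM(timestamp)  -- monthly partitions for time-range pruning
ORDER BY (address, topic0, block_number, log_index);
```

Note that `ReplacingMergeTree` deduplicates rows sharing the same `ORDER BY` key during merges, which is why `log_index` belongs in the key: it keeps distinct logs from being collapsed while still merging away re-inserted duplicates after a crash.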
### Query Routing
- Analytics API endpoints (charts, top holders, aggregations) route to ClickHouse when available
- Point lookups (get block by number, get tx by hash) stay on PostgreSQL
- Could expose an `?engine=clickhouse` query param for explicit routing
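The routing rules above reduce to a small pure function; the `Engine` and `QueryKind` enums here are illustrative, not Atlas's actual types:

```rust
#[derive(Debug, PartialEq)]
enum Engine {
    Postgres,
    ClickHouse,
}

#[derive(Debug, PartialEq)]
enum QueryKind {
    PointLookup, // get block by number, get tx by hash
    Analytics,   // charts, top holders, aggregations
}

// Analytics goes to ClickHouse when it is enabled; point lookups stay
// on PostgreSQL. An explicit ?engine=clickhouse param can force CH,
// but only when CH is actually available.
fn route(kind: QueryKind, ch_enabled: bool, engine_param: Option<&str>) -> Engine {
    if !ch_enabled {
        return Engine::Postgres;
    }
    match (kind, engine_param) {
        (_, Some("clickhouse")) => Engine::ClickHouse,
        (QueryKind::Analytics, _) => Engine::ClickHouse,
        (QueryKind::PointLookup, _) => Engine::Postgres,
    }
}
```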
### PG → CH Backfill
- One-time migration tool that reads historical data from PG and bulk-loads into CH
- Allows adding ClickHouse to an existing Atlas deployment without re-indexing from the chain
- Should run as a background task with configurable concurrency and throttling
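The backfill's work-splitting can be sketched as dividing the historical block range into fixed-size chunks; the function name and chunk size are illustrative:

```rust
// Split [first_block, last_block] into inclusive sub-ranges of at most
// chunk_size blocks. Each chunk would then be read from PG and
// bulk-inserted into CH by a bounded pool of workers, with an optional
// sleep between chunks for throttling.
fn backfill_chunks(first_block: u64, last_block: u64, chunk_size: u64) -> Vec<(u64, u64)> {
    assert!(chunk_size > 0);
    let mut chunks = Vec::new();
    let mut start = first_block;
    while start <= last_block {
        let end = (start + chunk_size - 1).min(last_block);
        chunks.push((start, end)); // inclusive [start, end]
        start = end + 1;
    }
    chunks
}
```

Chunking by block range (rather than row offset) keeps each PG read an index-friendly range scan and makes the task trivially resumable: persist the last completed chunk and restart from there.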
## Configuration

| Var | Default | Description |
| --- | --- | --- |
| `ENABLE_CLICKHOUSE` | `false` | Enable ClickHouse dual-sink |
| `CLICKHOUSE_URL` | — | ClickHouse HTTP endpoint (required if enabled) |
| `CLICKHOUSE_DATABASE` | `atlas` | Database name |
| `CLICKHOUSE_USER` | `default` | Auth user |
| `CLICKHOUSE_PASSWORD` | — | Auth password |
| `CH_INSERT_CHUNK_SIZE` | `10000` | Rows per INSERT batch |
## Docker Compose

- Add an optional ClickHouse service (`clickhouse/clickhouse-server`)
- Started only when the corresponding compose profile (tied to `ENABLE_CLICKHOUSE=true`) is active
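A possible shape for the optional service, using Compose profiles; the profile name, volume, and environment keys are assumptions. Note that profiles are activated with `docker compose --profile clickhouse up` (or the `COMPOSE_PROFILES` env var), so deployments would set that alongside `ENABLE_CLICKHOUSE=true`:

```yaml
services:
  clickhouse:
    image: clickhouse/clickhouse-server
    profiles: ["clickhouse"]   # started only when this profile is activated
    environment:
      CLICKHOUSE_DB: atlas
    volumes:
      - ch-data:/var/lib/clickhouse

volumes:
  ch-data:
```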
## References

- tidx implementation: `sync/sink.rs` (SinkSet fan-out), `sync/ch_sink.rs` (ClickHouse writer with chunked inserts + retry), `clickhouse.rs` (read engine with failover)
- tidx uses `ReplacingMergeTree` + monthly partitioning + bloom filter indexes on logs