boltsql × Spider quickstart
Requirements
- Python 3.9+
- Spider dataset (dev.json, tables.json, database/*.sqlite)
- Built boltsql binary (target/release/bolt-sql)
Steps
- Build boltsql
cargo build --release
- Generate catalogs per DB
python scripts/spider_to_catalog.py --spider-dir /path/to/spider --out-dir spider_catalogs
This creates spider_catalogs/<db_id>/catalog.json.
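Before running the evaluation, it can be useful to confirm that every Spider database received a catalog. The following is a minimal sketch, not part of the boltsql scripts: `missing_catalogs` is a hypothetical helper, and it assumes the standard Spider layout of `database/<db_id>/<db_id>.sqlite` alongside the `spider_catalogs/<db_id>/catalog.json` output described above.

```python
from pathlib import Path


def missing_catalogs(spider_dir, catalogs_dir):
    """Return the db_ids that have a Spider SQLite file but no catalog.json.

    Hypothetical helper; assumes database/<db_id>/<db_id>.sqlite under
    spider_dir and <db_id>/catalog.json under catalogs_dir.
    """
    database_root = Path(spider_dir) / "database"
    catalogs_root = Path(catalogs_dir)
    missing = []
    for db_dir in sorted(p for p in database_root.iterdir() if p.is_dir()):
        db_id = db_dir.name
        has_sqlite = (db_dir / f"{db_id}.sqlite").is_file()
        has_catalog = (catalogs_root / db_id / "catalog.json").is_file()
        if has_sqlite and not has_catalog:
            missing.append(db_id)
    return missing
```

Calling `missing_catalogs("/path/to/spider", "spider_catalogs")` should return an empty list when catalog generation covered every database.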
- Run evaluation (subset)
python scripts/eval_spider.py --spider-dir /path/to/spider \
  --catalogs-dir spider_catalogs \
  --boltsql target/release/bolt-sql \
  --model gpt-4o-mini \
  --limit 100 \
  --qps 2.0 \
  --out spider_eval.jsonl
- Score
python scripts/eval_spider_score.py --in spider_eval.jsonl --out spider_summary.json
Notes
- Uses SQLite DBs that ship with Spider. Connection URL is sqlite://.sqlite.
- The evaluation harness computes:
  - Exact match: canonicalized SQL string comparison
  - Execution match: compares result sets from gold vs generated SQL (order-insensitive, top-100)
  - Token Jaccard: token-level similarity score
- Use the LLM cache (logger/llm-cache.jsonl) to avoid repeated generations.
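To make the three metrics listed above concrete, here is a rough sketch of how each could be computed. It is an illustration under stated assumptions, not the harness's actual code: every function name is hypothetical, the canonicalization (whitespace collapsing, lowercasing, trailing-semicolon removal) is a guess at what "canonicalized" means, and "top-100" is read as "the first 100 rows returned".

```python
import re
import sqlite3
from collections import Counter


def canonicalize(sql):
    """Collapse whitespace, drop a trailing semicolon, and lowercase.

    Crude on purpose: lowercasing also changes string literals.
    """
    collapsed = re.sub(r"\s+", " ", sql).strip()
    return collapsed.rstrip(";").strip().lower()


def exact_match(gold, pred):
    """True when both queries are identical after canonicalization."""
    return canonicalize(gold) == canonicalize(pred)


def tokens(sql):
    """Split into word tokens and single punctuation characters."""
    return re.findall(r"\w+|[^\w\s]", sql.lower())


def token_jaccard(gold, pred):
    """Jaccard similarity over the sets of tokens in each query."""
    a, b = set(tokens(gold)), set(tokens(pred))
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def fetch_rows(conn, sql, limit=100):
    """Run a query and return its first `limit` rows as a multiset."""
    return Counter(conn.execute(sql).fetchmany(limit))


def execution_match(db_path, gold, pred, limit=100):
    """Compare the first `limit` rows of both queries, ignoring row order."""
    conn = sqlite3.connect(db_path)
    try:
        gold_rows = fetch_rows(conn, gold, limit)
        try:
            pred_rows = fetch_rows(conn, pred, limit)
        except sqlite3.Error:
            # A generated query that fails to run counts as a mismatch.
            return False
        return gold_rows == pred_rows
    finally:
        conn.close()
```

Using a `Counter` rather than a `set` keeps duplicate rows significant while still ignoring order. One caveat of this sketch: when a result has more than 100 rows, two equivalent queries that return rows in different orders can be truncated to different subsets and reported as a mismatch.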