Personal LLM analytics product by Nikola Pantelic
GitHub: NikolaPantelic
QueryForge turns a natural-language analytics question into safe SQL, executes it on SQLite, and returns a concise answer with full execution metadata.
Pipeline stages:
- SQL generation (LLM)
- SQL validation
- SQL execution
- Answer generation (LLM)
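The four stages above can be sketched as a simple timed chain. This is an illustrative sketch only: the stage functions are injected as plain callables here, and the `PipelineOutput` shape is a hypothetical stand-in for the real dataclass in `src/types.py`.

```python
# Minimal sketch of the four-stage flow. Stage functions are injected
# for illustration; the real orchestration lives in src/pipeline.py.
import time
from dataclasses import dataclass, field

@dataclass
class PipelineOutput:
    # Hypothetical shape; the real dataclass lives in src/types.py.
    status: str = "ok"
    sql: str = ""
    answer: str = ""
    timings_ms: dict = field(default_factory=dict)

def run(question, generate_sql, validate_sql, execute_sql, generate_answer):
    """Chain the four stages, recording per-stage wall-clock time."""
    out = PipelineOutput()

    def timed(name, fn, *args):
        start = time.perf_counter()
        result = fn(*args)
        out.timings_ms[name] = (time.perf_counter() - start) * 1000
        return result

    out.sql = timed("sql_generation", generate_sql, question)
    timed("sql_validation", validate_sql, out.sql)      # raises on unsafe SQL
    rows = timed("sql_execution", execute_sql, out.sql)
    out.answer = timed("answer_generation", generate_answer, question, rows)
    return out
```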
Core goals:
- Reliable SQL generation for analytical prompts
- Strong query-safety guardrails
- Production-grade observability and error handling
- Good latency and token efficiency
Features:
- OpenRouter-backed LLM integration for SQL and answer synthesis
- Multi-layer SQL validation (SELECT-only, forbidden operations, multi-statement blocking, column checks)
- Token accounting (`prompt_tokens`, `completion_tokens`, `total_tokens`, `llm_calls`)
- Structured timing and stage-level outputs in `PipelineOutput`
- Benchmark script for repeatable latency/success measurement
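The layered validation described in the feature list might look roughly like the sketch below. It covers the first three layers (SELECT-only, multi-statement blocking, forbidden operations); the function name and keyword set are illustrative, not the project's actual implementation in `src/pipeline.py`.

```python
import re

# Illustrative deny-list; the project's actual set may differ.
FORBIDDEN = {"insert", "update", "delete", "drop", "alter", "attach",
             "create", "pragma", "replace", "vacuum"}

def validate_sql(sql: str) -> None:
    """Raise ValueError if the query fails any safety layer (sketch)."""
    stripped = sql.strip().rstrip(";").strip()
    # Layer 1: SELECT-only.
    if not stripped.lower().startswith("select"):
        raise ValueError("only SELECT statements are allowed")
    # Layer 2: multi-statement blocking (any interior semicolon).
    if ";" in stripped:
        raise ValueError("multiple statements are not allowed")
    # Layer 3: forbidden operations. A plain token scan is deliberately
    # conservative: a forbidden word inside a string literal also
    # rejects the query.
    bad = set(re.findall(r"[a-z_]+", stripped.lower())) & FORBIDDEN
    if bad:
        raise ValueError(f"forbidden keyword(s): {sorted(bad)}")
    # The feature list also mentions a fourth layer that checks
    # referenced columns against the table schema (omitted here).
```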
Requirements:
- Python 3.13+
- Dependencies in `requirements.txt` (`openrouter`, `pandas`)
- `OPENROUTER_API_KEY` environment variable
This repository expects the Kaggle dataset file and converts it to SQLite:
- Open Gaming and Mental Health dataset
- Download `gaming_mental_health_10M_40features.csv`
- Place it in `data/`
- Build the SQLite DB:
```
python3 -m pip install -r requirements.txt
python3 scripts/gaming_csv_to_db.py
```

Set your OpenRouter API key:
Windows:
```
set OPENROUTER_API_KEY=<your_key>
```
Linux/macOS:
```
export OPENROUTER_API_KEY=<your_key>
```
Optional env vars:
- `OPENROUTER_MODEL` (default model override)
- `LOG_LEVEL` (default: `INFO`)
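A config loader consuming the variables above could look like this sketch. `load_config` is a hypothetical helper, not a function from the repository; only the environment variable names come from the README.

```python
import os

def load_config():
    """Read the env vars listed above; fail fast if the key is missing."""
    api_key = os.environ.get("OPENROUTER_API_KEY")
    if not api_key:
        raise RuntimeError("OPENROUTER_API_KEY is not set")
    return {
        "api_key": api_key,
        # None means "use the pipeline's built-in default model".
        "model": os.environ.get("OPENROUTER_MODEL"),
        "log_level": os.environ.get("LOG_LEVEL", "INFO"),
    }
```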
Run tests:
```
python3 -m unittest discover -s tests -p "test_public.py"
python3 -m unittest discover -s tests -p "test_validator.py"
```

Run the benchmark:
```
python3 scripts/benchmark.py --runs 3
```

Outputs summary JSON including:
- `success_rate`
- `avg_ms`
- `p50_ms`
- `p95_ms`
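The summary fields can be computed from a list of per-run latencies roughly as below. This is a sketch using a nearest-rank percentile; `scripts/benchmark.py` may use different rounding or interpolation.

```python
def summarize(latencies_ms, successes):
    """Aggregate per-run latencies into the benchmark's summary fields."""
    ordered = sorted(latencies_ms)

    def pct(p):
        # Nearest-rank percentile, no interpolation.
        idx = min(len(ordered) - 1, int(p / 100 * len(ordered)))
        return ordered[idx]

    return {
        "success_rate": successes / len(latencies_ms),
        "avg_ms": sum(ordered) / len(ordered),
        "p50_ms": pct(50),
        "p95_ms": pct(95),
    }
```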
Usage example:
```python
from src.pipeline import AnalyticsPipeline

pipeline = AnalyticsPipeline()
result = pipeline.run("Which addiction level bucket has the highest number of respondents?")
print(result.status)
print(result.sql)
print(result.answer)
print(result.total_llm_stats)
```

Project layout:
- `src/pipeline.py`: orchestration, SQL validation, execution
- `src/llm_client.py`: OpenRouter calls, prompting, token stats
- `src/types.py`: dataclasses for stage and final outputs
- `scripts/gaming_csv_to_db.py`: CSV to SQLite ingestion
- `scripts/benchmark.py`: latency/success benchmark runner
- `tests/`: integration and validation tests
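The CSV-to-SQLite ingestion step handled by `scripts/gaming_csv_to_db.py` might look like the sketch below. The function name, table name, and chunk size are assumptions; the point is that a 10M-row CSV is streamed in chunks rather than loaded into memory at once.

```python
import sqlite3

import pandas as pd

def csv_to_sqlite(csv_path, db_path, table="gaming_mental_health",
                  chunksize=100_000):
    """Stream the CSV into a SQLite table chunk by chunk."""
    with sqlite3.connect(db_path) as conn:
        for i, chunk in enumerate(pd.read_csv(csv_path, chunksize=chunksize)):
            # First chunk replaces any stale table; later chunks append.
            chunk.to_sql(table, conn,
                         if_exists="replace" if i == 0 else "append",
                         index=False)
```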
Current focus is single-turn analytics over one table (gaming_mental_health).
Planned extensions include multi-turn follow-up support, parser-based SQL validation, and metrics export integrations.