Fact Classification System is a FastAPI-based service that classifies English factual text as "правда" (true), "неправда" (false), or "нейтрально" (neutral).
It combines claim extraction, Wikipedia evidence retrieval (FAISS), and NLI verification (roberta-large-mnli) to produce transparent, evidence-backed results.
- End-to-end NLP pipeline with practical trade-offs (accuracy vs latency).
- Stateless API architecture with startup model loading and structured error handling.
- Reproducible local setup with automated knowledge base build.
- Separate unit/integration test suites for fast feedback and realistic validation.
- Simple frontend for interactive demos without build tooling.
- API endpoint to classify text and return per-claim evidence.
- Confidence-aware aggregation for multi-claim text.
- Built-in rate limiting, response caching, and input validation.
- Health/status endpoints for runtime observability.
flowchart LR
A[Input text] --> B[Claim extraction]
B --> C[Evidence retrieval\nFAISS + Wikipedia snippets]
C --> D[NLI verification\nroberta-large-mnli]
D --> E[Claim-level scoring]
E --> F[Weighted aggregation]
F --> G[Overall classification + evidence]
Main modules:
- app/services/claim_extractor.py - sentence splitting and claim filtering.
- app/services/evidence_retriever.py - embedding + FAISS nearest-neighbor lookup.
- app/services/nli_verifier.py - entailment scoring for claim-evidence pairs.
- app/services/classifier.py - thresholds and overall aggregation.
- app/core/models.py - singleton lifecycle manager for all heavy models.
See docs/ARCHITECTURE.md for a deeper walkthrough.
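
To make the last two pipeline stages concrete, here is a minimal sketch of claim-level thresholding followed by confidence-weighted aggregation. It is not the code in app/services/classifier.py: the `ClaimResult` type, the `label_for` and `aggregate` helpers, the meaning of `score` (a support value in [0, 1]), and the inclusive threshold comparisons are all assumptions made for illustration. Only the two threshold defaults come from this README.

```python
from __future__ import annotations

from dataclasses import dataclass

TRUTH_THRESHOLD = 0.75      # default documented in this README
FALSEHOOD_THRESHOLD = 0.4   # default documented in this README


@dataclass
class ClaimResult:
    """Hypothetical per-claim result; field names are illustrative."""
    claim: str
    score: float       # assumed: support score in [0, 1], 1.0 = fully supported
    confidence: float  # assumed: weight in [0, 1] used during aggregation


def label_for(score: float) -> str:
    """Map a support score to one of the three labels (assumed inclusive bounds)."""
    if score >= TRUTH_THRESHOLD:
        return "правда"
    if score <= FALSEHOOD_THRESHOLD:
        return "неправда"
    return "нейтрально"


def aggregate(claims: list[ClaimResult]) -> tuple[str, float]:
    """Confidence-weighted mean of claim scores, then the same thresholds."""
    total_weight = sum(c.confidence for c in claims)
    if total_weight == 0:
        # No claims, or no usable confidence: fall back to neutral.
        return "нейтрально", 0.0
    overall = sum(c.score * c.confidence for c in claims) / total_weight
    return label_for(overall), overall
```

With this weighting, a low-confidence claim moves the overall score less than a high-confidence one, which is one plausible reading of "confidence-aware aggregation".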
git clone https://github.com/levvius/Fact-Classification-System.git
cd Fact-Classification-System
./run.sh

run.sh handles:
- virtual environment creation (if missing),
- dependency installation,
- knowledge base build (if missing),
- API startup on http://localhost:8000.

Open:
- Web UI: http://localhost:8000
- API docs: http://localhost:8000/docs
- Health: http://localhost:8000/api/v1/health

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python scripts/build_kb.py
uvicorn app.main:app --host 0.0.0.0 --port 8000

Request:
curl -X POST http://localhost:8000/api/v1/classify \
-H "Content-Type: application/json" \
-d '{"text":"Albert Einstein was born in 1879 and won the Nobel Prize in Physics in 1921."}'

Response (shape):
{
"overall_classification": "правда",
"confidence": 0.95,
"claims": [
{
"claim": "Albert Einstein was born in 1879.",
"classification": "правда",
"confidence": 0.99,
"best_evidence": {
"snippet": "Albert Einstein was born in Ulm...",
"source": "https://en.wikipedia.org/wiki/Albert_Einstein",
"nli_score": 0.99,
"retrieval_score": 0.98
}
}
]
}

| Endpoint | Method | Description |
|---|---|---|
| / | GET | Frontend UI (or API info fallback) |
| /api/v1/health | GET | Service/model readiness check |
| /api/v1/classify | POST | Main text classification endpoint |
| /api/v1/topics | GET | Available Wikipedia topics |
| /api/v1/cache-info | GET | Cache statistics |
| /docs | GET | OpenAPI/Swagger UI |

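
The classify request and response shape shown earlier can also be exercised from Python using only the standard library. This is a minimal client sketch, not part of the repository: the helper names `build_classify_request`, `summarize`, and `classify` are hypothetical, and the base URL assumes the default local address. Only `classify` needs a running service.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # default local address from this README


def build_classify_request(text: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request for /api/v1/classify."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/v1/classify",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize(response: dict) -> list[str]:
    """Flatten the documented response shape into one line per claim."""
    lines = [
        f"overall: {response['overall_classification']} ({response['confidence']:.2f})"
    ]
    for item in response.get("claims", []):
        evidence = item.get("best_evidence") or {}
        source = evidence.get("source", "no source")
        lines.append(f"- {item['claim']} -> {item['classification']} [{source}]")
    return lines


def classify(text: str, timeout: float = 60.0) -> dict:
    """Send the request; requires the service to be running locally."""
    request = build_classify_request(text)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    result = classify("Albert Einstein was born in 1879.")
    print("\n".join(summarize(result)))
```

Keeping request construction separate from sending makes the payload easy to inspect or unit-test without starting the models.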
Environment variables are loaded from .env (see .env.example).
Key settings:
- TRUTH_THRESHOLD (default 0.75)
- FALSEHOOD_THRESHOLD (default 0.4)
- TOP_K_PROOFS (default 10)
- MAX_CLAIMS (default 8)
- USE_WEIGHTED_AGGREGATION (default true)
- USE_NLI_CONTEXT (default true)

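
As an illustration of how these settings and their defaults fit together, the sketch below reads them from the process environment. It is an assumption-laden stand-in, not the project's config module: the `Settings` dataclass, `load_settings`, and `_env_bool` are hypothetical names, the check that the falsehood threshold sits below the truth threshold is an added assumption, and the sketch does not parse a .env file itself (it expects the variables to be exported already).

```python
import os
from dataclasses import dataclass


def _env_bool(name: str, default: bool) -> bool:
    """Parse a boolean environment variable; unset means use the default."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    truth_threshold: float
    falsehood_threshold: float
    top_k_proofs: int
    max_claims: int
    use_weighted_aggregation: bool
    use_nli_context: bool


def load_settings() -> Settings:
    """Read settings from the environment, falling back to the README defaults."""
    settings = Settings(
        truth_threshold=float(os.environ.get("TRUTH_THRESHOLD", "0.75")),
        falsehood_threshold=float(os.environ.get("FALSEHOOD_THRESHOLD", "0.4")),
        top_k_proofs=int(os.environ.get("TOP_K_PROOFS", "10")),
        max_claims=int(os.environ.get("MAX_CLAIMS", "8")),
        use_weighted_aggregation=_env_bool("USE_WEIGHTED_AGGREGATION", True),
        use_nli_context=_env_bool("USE_NLI_CONTEXT", True),
    )
    # Assumed sanity check: the neutral band must lie between the two thresholds.
    if not 0.0 <= settings.falsehood_threshold < settings.truth_threshold <= 1.0:
        raise ValueError("expected 0 <= FALSEHOOD_THRESHOLD < TRUTH_THRESHOLD <= 1")
    return settings
```

For example, exporting `TRUTH_THRESHOLD=0.9` before calling `load_settings()` would narrow the "правда" band in this sketch while leaving the other defaults unchanged.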
# Unit tests (fast, mocked models)
pytest tests/unit -m unit
# Integration tests (real models, slower)
pytest tests/integration -m integration

Current test layout includes 99 tests in total (82 unit + 17 integration).
Fact-Classification-System/
├── app/
│ ├── api/ # FastAPI routes and schemas
│ ├── core/ # config, model manager, cache, exceptions
│ ├── services/ # claim extraction, retrieval, NLI, classifier
│ ├── static/ # web UI (vanilla HTML/CSS/JS)
│ └── utils/ # KB building helpers
├── scripts/ # helper scripts (KB build)
├── tests/ # unit + integration tests
├── docs/ # architecture and development docs
├── run.sh
└── README.md
- Models load once at startup through ModelManager.
- The API runs inference in a dedicated thread pool to avoid event-loop blocking.
- CPU-only + single-threaded torch settings improve stability on macOS.
- Rate limiting and validation harden public API usage.
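
The thread-pool note above can be sketched with the standard library alone. This is a generic illustration of offloading a blocking call from an asyncio event loop, not the project's implementation: `INFERENCE_POOL`, `run_model`, and `classify_async` are hypothetical names, the pool size of one is an assumption, and `time.sleep` stands in for real model inference.

```python
from __future__ import annotations

import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical dedicated pool; the project's actual pool size is not stated here.
INFERENCE_POOL = ThreadPoolExecutor(max_workers=1, thread_name_prefix="inference")


def run_model(text: str) -> str:
    """Stand-in for a blocking model call such as NLI inference."""
    time.sleep(0.05)  # simulates CPU-bound work that would block the event loop
    return text.upper()


async def classify_async(text: str) -> str:
    """Run the blocking call in the pool so the event loop stays responsive."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(INFERENCE_POOL, run_model, text)


async def main() -> list[str]:
    # Both calls are awaited concurrently; the single worker serializes inference.
    return await asyncio.gather(classify_async("a"), classify_async("b"))


if __name__ == "__main__":
    print(asyncio.run(main()))
```

A single worker trades throughput for predictability: requests queue up instead of competing for CPU, while the event loop remains free to answer lightweight endpoints such as the health check.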
Contributions are welcome. Start with CONTRIBUTING.md for setup and workflow expectations.
MIT - see LICENSE.