Fact Classification System

Fact Classification System is a FastAPI-based service that classifies English factual text as "правда" (true), "неправда" (false), or "нейтрально" (neutral). The label strings are Russian and appear verbatim in API responses.

It combines claim extraction, Wikipedia evidence retrieval (FAISS), and NLI verification (roberta-large-mnli) to produce transparent, evidence-backed results.

Why this project

End-to-end NLP pipeline with practical trade-offs (accuracy vs latency).
Stateless API architecture with startup model loading and structured error handling.
Reproducible local setup with automated knowledge base build.
Separate unit/integration test suites for fast feedback and realistic validation.
Simple frontend for interactive demos without build tooling.

What you get

API endpoint to classify text and return per-claim evidence.
Confidence-aware aggregation for multi-claim text.
Built-in rate limiting, response caching, and input validation.
Health/status endpoints for runtime observability.
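
The confidence-aware aggregation mentioned above could look roughly like the sketch below. It assumes each claim votes for its label with a weight equal to its confidence, and that the overall confidence is the winning label's share of the total weight. The function name `aggregate` and this weighting rule are illustrative assumptions, not the project's actual implementation in app/services/classifier.py.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def aggregate(claims: Iterable[Tuple[str, float]]) -> Tuple[str, float]:
    """Combine (label, confidence) pairs into one overall (label, confidence).

    Hypothetical rule: sum confidences per label, pick the heaviest label,
    and report its share of the total weight as the overall confidence.
    """
    totals = defaultdict(float)
    for label, confidence in claims:
        totals[label] += confidence

    total_weight = sum(totals.values())
    if total_weight == 0:
        # No claims (or only zero-confidence claims): fall back to neutral.
        return "нейтрально", 0.0

    best_label = max(totals, key=totals.get)
    return best_label, totals[best_label] / total_weight
```

With this rule, one low-confidence dissenting claim lowers the overall confidence without flipping the label.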

Architecture

flowchart LR
    A[Input text] --> B[Claim extraction]
    B --> C[Evidence retrieval\nFAISS + Wikipedia snippets]
    C --> D[NLI verification\nroberta-large-mnli]
    D --> E[Claim-level scoring]
    E --> F[Weighted aggregation]
    F --> G[Overall classification + evidence]

Main modules:

app/services/claim_extractor.py - sentence splitting and claim filtering.
app/services/evidence_retriever.py - embedding + FAISS nearest-neighbor lookup.
app/services/nli_verifier.py - entailment scoring for claim-evidence pairs.
app/services/classifier.py - thresholds and overall aggregation.
app/core/models.py - singleton lifecycle manager for all heavy models.

See docs/ARCHITECTURE.md for a deeper walkthrough.
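
To make the data flow concrete, here is a minimal sketch of how the stages might compose. The retrieval and NLI steps are passed in as plain callables so the example runs without FAISS or roberta-large-mnli. The names `extract_claims` and `run_pipeline`, the regex sentence splitter, and the three-word filter are all assumptions for illustration; the real modules may work differently.

```python
import re
from typing import Callable, Dict, List


def extract_claims(text: str, max_claims: int = 8) -> List[str]:
    """Naive claim extraction: split on sentence punctuation, drop short fragments."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    # Assumed filter: keep sentences with at least three words.
    claims = [s for s in sentences if len(s.split()) >= 3]
    return claims[:max_claims]


def run_pipeline(
    text: str,
    retrieve: Callable[[str], List[str]],
    verify: Callable[[str, str], float],
) -> List[Dict]:
    """For each claim, score every retrieved snippet and keep the best one."""
    results = []
    for claim in extract_claims(text):
        scored = [(verify(claim, snippet), snippet) for snippet in retrieve(claim)]
        if not scored:
            # No evidence found: leave the claim unscored.
            results.append({"claim": claim, "nli_score": None, "snippet": None})
            continue
        score, snippet = max(scored, key=lambda pair: pair[0])
        results.append({"claim": claim, "nli_score": score, "snippet": snippet})
    return results
```

In the real service, `retrieve` would correspond to the FAISS lookup and `verify` to the NLI model; stubbing them this way is also how fast unit tests can avoid loading heavy models.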

Quick Start

git clone https://github.com/levvius/Fact-Classification-System.git
cd Fact-Classification-System
./run.sh

run.sh handles:

virtual environment creation (if missing),
dependency installation,
knowledge base build (if missing),
API startup on http://localhost:8000.

Open:

Web UI: http://localhost:8000
API docs: http://localhost:8000/docs
Health: http://localhost:8000/api/v1/health

Manual Setup

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python scripts/build_kb.py
uvicorn app.main:app --host 0.0.0.0 --port 8000

API Example

Request:

curl -X POST http://localhost:8000/api/v1/classify \
  -H "Content-Type: application/json" \
  -d '{"text":"Albert Einstein was born in 1879 and won the Nobel Prize in Physics in 1921."}'

Response (shape):

{
  "overall_classification": "правда",
  "confidence": 0.95,
  "claims": [
    {
      "claim": "Albert Einstein was born in 1879.",
      "classification": "правда",
      "confidence": 0.99,
      "best_evidence": {
        "snippet": "Albert Einstein was born in Ulm...",
        "source": "https://en.wikipedia.org/wiki/Albert_Einstein",
        "nli_score": 0.99,
        "retrieval_score": 0.98
      }
    }
  ]
}
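
The same request can be made from Python using only the standard library. The sketch below assumes the service is running locally on port 8000 and that the response has the shape shown above; the helper names `build_request`, `classify`, and `summarize` are hypothetical.

```python
import json
import urllib.request
from typing import Dict, List


def build_request(text: str, base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build the POST request for /api/v1/classify without sending it."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/v1/classify",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify(text: str, base_url: str = "http://localhost:8000", timeout: float = 60.0) -> Dict:
    """Send the request and decode the JSON response (requires a running server)."""
    with urllib.request.urlopen(build_request(text, base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def summarize(response: Dict) -> List[str]:
    """Turn a response of the documented shape into short human-readable lines."""
    lines = [f"overall: {response['overall_classification']} ({response['confidence']:.2f})"]
    for item in response.get("claims", []):
        evidence = item.get("best_evidence") or {}
        source = evidence.get("source", "no source")
        lines.append(f"- {item['claim']} -> {item['classification']} [{source}]")
    return lines
```

Calling `summarize(classify("Albert Einstein was born in 1879."))` against a running instance would print one line for the overall verdict and one per claim.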

Endpoints

| Endpoint | Method | Description |
| --- | --- | --- |
| `/` | GET | Frontend UI (or API info fallback) |
| `/api/v1/health` | GET | Service/model readiness check |
| `/api/v1/classify` | POST | Main text classification endpoint |
| `/api/v1/topics` | GET | Available Wikipedia topics |
| `/api/v1/cache-info` | GET | Cache statistics |
| `/docs` | GET | OpenAPI/Swagger UI |

Configuration

Environment variables are loaded from .env (see .env.example).

Key settings:

TRUTH_THRESHOLD (default 0.75)
FALSEHOOD_THRESHOLD (default 0.4)
TOP_K_PROOFS (default 10)
MAX_CLAIMS (default 8)
USE_WEIGHTED_AGGREGATION (default true)
USE_NLI_CONTEXT (default true)
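
As an illustration of how these settings might be consumed, the sketch below reads them from a mapping (the process environment by default) with the defaults listed above, and shows one plausible way the two thresholds could partition a score. The `Settings` class, `load_settings`, and `label_for_score` are hypothetical names, and the threshold rule is an assumption; the project's own config loader in app/core may parse .env differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    truth_threshold: float = 0.75
    falsehood_threshold: float = 0.4
    top_k_proofs: int = 10
    max_claims: int = 8
    use_weighted_aggregation: bool = True
    use_nli_context: bool = True


def _get_bool(env: Mapping[str, str], name: str, default: bool) -> bool:
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from env, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        truth_threshold=float(env.get("TRUTH_THRESHOLD", 0.75)),
        falsehood_threshold=float(env.get("FALSEHOOD_THRESHOLD", 0.4)),
        top_k_proofs=int(env.get("TOP_K_PROOFS", 10)),
        max_claims=int(env.get("MAX_CLAIMS", 8)),
        use_weighted_aggregation=_get_bool(env, "USE_WEIGHTED_AGGREGATION", True),
        use_nli_context=_get_bool(env, "USE_NLI_CONTEXT", True),
    )


def label_for_score(score: float, settings: Settings) -> str:
    """Assumed rule: high scores are true, low scores false, the rest neutral."""
    if score >= settings.truth_threshold:
        return "правда"
    if score <= settings.falsehood_threshold:
        return "неправда"
    return "нейтрально"
```

Under this assumed rule, widening the gap between the two thresholds makes the service more willing to answer "нейтрально" instead of committing to a verdict.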

Testing

# Unit tests (fast, mocked models)
pytest tests/unit -m unit

# Integration tests (real models, slower)
pytest tests/integration -m integration

The test suite currently contains 99 tests: 82 unit and 17 integration.

Repository Layout

Fact-Classification-System/
├── app/
│   ├── api/           # FastAPI routes and schemas
│   ├── core/          # config, model manager, cache, exceptions
│   ├── services/      # claim extraction, retrieval, NLI, classifier
│   ├── static/        # web UI (vanilla HTML/CSS/JS)
│   └── utils/         # KB building helpers
├── scripts/           # helper scripts (KB build)
├── tests/             # unit + integration tests
├── docs/              # architecture and development docs
├── run.sh
└── README.md

Engineering Notes

Models load once at startup through ModelManager (app/core/models.py).
The API runs inference in a dedicated thread pool so that model calls do not block the event loop.
CPU-only, single-threaded torch settings improve stability on macOS.
Rate limiting and input validation harden the API for public use.
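
The first two notes can be sketched together: a singleton that loads its model once, and an async wrapper that pushes inference onto a worker thread. This is a stdlib-only illustration; `SketchModelManager`, `classify_async`, and the placeholder model are hypothetical stand-ins, not the project's actual ModelManager.

```python
import asyncio
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional


class SketchModelManager:
    """Hypothetical singleton that loads a (placeholder) model exactly once."""

    _instance: Optional["SketchModelManager"] = None
    _lock = threading.Lock()

    def __init__(self) -> None:
        self.load_count = 0
        self.model: Optional[Callable[[str, str], float]] = None

    @classmethod
    def get(cls) -> "SketchModelManager":
        with cls._lock:
            if cls._instance is None:
                cls._instance = cls()
            return cls._instance

    def load(self) -> None:
        # A real loader would build the NLI model here; this stand-in
        # returns a constant score so the sketch stays dependency-free.
        if self.model is None:
            self.load_count += 1
            self.model = lambda claim, evidence: 0.5


# One worker keeps inference single-threaded, in the spirit of the
# single-threaded torch settings described above.
_executor = ThreadPoolExecutor(max_workers=1)


async def classify_async(claim: str, evidence: str) -> float:
    """Run the blocking model call off the event loop."""
    manager = SketchModelManager.get()
    manager.load()
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(_executor, manager.model, claim, evidence)
```

Because the blocking call runs in the executor, the event loop stays free to answer health checks and other requests while a classification is in progress.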

Contributing

Contributions are welcome. Start with CONTRIBUTING.md for setup and workflow expectations.

License

MIT - see LICENSE.
