Fact Classification System

Fact Classification System is a FastAPI-based service that classifies English factual text as "правда" (true), "неправда" (false), or "нейтрально" (neutral).

It combines claim extraction, Wikipedia evidence retrieval (FAISS), and NLI verification (roberta-large-mnli) to produce transparent, evidence-backed results.

Why this project

  • End-to-end NLP pipeline with practical trade-offs (accuracy vs latency).
  • Stateless API architecture with startup model loading and structured error handling.
  • Reproducible local setup with automated knowledge base build.
  • Separate unit/integration test suites for fast feedback and realistic validation.
  • Simple frontend for interactive demos without build tooling.

What you get

  • API endpoint to classify text and return per-claim evidence.
  • Confidence-aware aggregation for multi-claim text.
  • Built-in rate limiting, response caching, and input validation.
  • Health/status endpoints for runtime observability.
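
The README does not spell out how confidence-aware aggregation works. The following is a minimal sketch of one plausible rule, in which each claim votes for its label with a weight equal to its confidence. The function name `aggregate_claims` and the voting rule itself are illustrative assumptions, not the project's implementation.

```python
from collections import defaultdict

# Label strings as they appear in the API output.
TRUE_LABEL = "правда"
FALSE_LABEL = "неправда"
NEUTRAL_LABEL = "нейтрально"


def aggregate_claims(claims):
    """Combine per-claim (label, confidence) pairs into one overall result.

    Hypothetical rule: sum the confidences per label, pick the label with
    the largest total, and report that label's share of the total weight
    as the overall confidence.
    """
    if not claims:
        # Assumption: text with no extractable claims is treated as neutral.
        return NEUTRAL_LABEL, 0.0

    totals = defaultdict(float)
    for label, confidence in claims:
        totals[label] += confidence

    best_label = max(totals, key=totals.get)
    total_weight = sum(totals.values())
    share = totals[best_label] / total_weight if total_weight else 0.0
    return best_label, share
```

Under this rule a single high-confidence claim outweighs several low-confidence ones, which is one way to read "confidence-aware"; the real service may weight claims differently.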

Architecture

flowchart LR
    A[Input text] --> B[Claim extraction]
    B --> C[Evidence retrieval\nFAISS + Wikipedia snippets]
    C --> D[NLI verification\nroberta-large-mnli]
    D --> E[Claim-level scoring]
    E --> F[Weighted aggregation]
    F --> G[Overall classification + evidence]

Main modules:

  • app/services/claim_extractor.py - sentence splitting and claim filtering.
  • app/services/evidence_retriever.py - embedding + FAISS nearest-neighbor lookup.
  • app/services/nli_verifier.py - entailment scoring for claim-evidence pairs.
  • app/services/classifier.py - thresholds and overall aggregation.
  • app/core/models.py - singleton lifecycle manager for all heavy models.

See docs/ARCHITECTURE.md for a deeper walkthrough.
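
To make the retrieval step concrete, here is a brute-force nearest-neighbor lookup in plain Python. It is a stand-in for the FAISS index, not a use of the FAISS API: the helper names `cosine` and `top_k` and the toy vectors are hypothetical, and the default `k=10` mirrors the documented `TOP_K_PROOFS` default only by assumption.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, snippets, k=10):
    """Return the k most similar snippets as (score, text) pairs.

    `snippets` is a list of (text, vector) pairs. A vector index performs
    the same ranking without scoring every snippet one by one in Python.
    """
    scored = [(cosine(query_vec, vec), text) for text, vec in snippets]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

In a pipeline like the one in the flowchart, each returned snippet would then be paired with the claim and passed to the NLI model for entailment scoring.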

Quick Start

git clone https://github.com/levvius/Fact-Classification-System.git
cd Fact-Classification-System
./run.sh

run.sh handles:

  • virtual environment creation (if missing),
  • dependency installation,
  • knowledge base build (if missing),
  • API startup on http://localhost:8000.

Open:

  • Web UI: http://localhost:8000
  • API docs: http://localhost:8000/docs
  • Health: http://localhost:8000/api/v1/health

Manual Setup

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python scripts/build_kb.py
uvicorn app.main:app --host 0.0.0.0 --port 8000

API Example

Request:

curl -X POST http://localhost:8000/api/v1/classify \
  -H "Content-Type: application/json" \
  -d '{"text":"Albert Einstein was born in 1879 and won the Nobel Prize in Physics in 1921."}'

Response (shape):

{
  "overall_classification": "правда",
  "confidence": 0.95,
  "claims": [
    {
      "claim": "Albert Einstein was born in 1879.",
      "classification": "правда",
      "confidence": 0.99,
      "best_evidence": {
        "snippet": "Albert Einstein was born in Ulm...",
        "source": "https://en.wikipedia.org/wiki/Albert_Einstein",
        "nli_score": 0.99,
        "retrieval_score": 0.98
      }
    }
  ]
}
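
The same request can be made from Python using only the standard library. This is a sketch that assumes the endpoint and the response shape shown above; the helper names `build_request`, `summarize`, and `classify` are hypothetical, and the 60-second timeout is an arbitrary choice.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/api/v1/classify"  # taken from the curl example


def build_request(text, url=API_URL):
    """Build the POST request with a JSON body of the form {"text": ...}."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize(response):
    """Reduce a response dict to (overall label, [(claim, label, source), ...])."""
    rows = []
    for item in response.get("claims", []):
        evidence = item.get("best_evidence") or {}
        rows.append((item["claim"], item["classification"], evidence.get("source")))
    return response["overall_classification"], rows


def classify(text, timeout=60):
    """Send the text to a running service and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(text), timeout=timeout) as resp:
        return json.load(resp)
```

With the service running locally, `summarize(classify("..."))` returns the overall label plus one row per extracted claim.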

Endpoints

Endpoint            Method  Description
/                   GET     Frontend UI (or API info fallback)
/api/v1/health      GET     Service/model readiness check
/api/v1/classify    POST    Main text classification endpoint
/api/v1/topics      GET     Available Wikipedia topics
/api/v1/cache-info  GET     Cache statistics
/docs               GET     OpenAPI/Swagger UI

Configuration

Environment variables are loaded from .env (see .env.example).

Key settings:

  • TRUTH_THRESHOLD (default 0.75)
  • FALSEHOOD_THRESHOLD (default 0.4)
  • TOP_K_PROOFS (default 10)
  • MAX_CLAIMS (default 8)
  • USE_WEIGHTED_AGGREGATION (default true)
  • USE_NLI_CONTEXT (default true)
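
As an illustration of how the two thresholds might interact, the sketch below maps a claim score in [0, 1] to one of the three labels. The comparison rule is an assumption (at or above `TRUTH_THRESHOLD` is true, at or below `FALSEHOOD_THRESHOLD` is false, anything between is neutral), and the helper names are hypothetical. It reads plain environment variables and does not parse a .env file.

```python
import os


def load_thresholds(env=os.environ):
    """Read both thresholds, falling back to the documented defaults."""
    truth = float(env.get("TRUTH_THRESHOLD", "0.75"))
    falsehood = float(env.get("FALSEHOOD_THRESHOLD", "0.4"))
    # Sanity check (an assumption): the neutral band must not be empty.
    if not 0.0 <= falsehood < truth <= 1.0:
        raise ValueError("expected 0 <= FALSEHOOD_THRESHOLD < TRUTH_THRESHOLD <= 1")
    return truth, falsehood


def label_for(score, truth=0.75, falsehood=0.4):
    """Map a score to a label using the assumed comparison rule."""
    if score >= truth:
        return "правда"
    if score <= falsehood:
        return "неправда"
    return "нейтрально"
```

Under this reading, raising `TRUTH_THRESHOLD` or lowering `FALSEHOOD_THRESHOLD` widens the neutral band, trading coverage for fewer wrong verdicts.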

Testing

# Unit tests (fast, mocked models)
pytest tests/unit -m unit

# Integration tests (real models, slower)
pytest tests/integration -m integration

The suite currently contains 99 tests: 82 unit and 17 integration.

Repository Layout

Fact-Classification-System/
├── app/
│   ├── api/           # FastAPI routes and schemas
│   ├── core/          # config, model manager, cache, exceptions
│   ├── services/      # claim extraction, retrieval, NLI, classifier
│   ├── static/        # web UI (vanilla HTML/CSS/JS)
│   └── utils/         # KB building helpers
├── scripts/           # helper scripts (KB build)
├── tests/             # unit + integration tests
├── docs/              # architecture and development docs
├── run.sh
└── README.md

Engineering Notes

  • Models load once at startup through ModelManager.
  • The API runs inference in a dedicated thread pool so the event loop is not blocked.
  • CPU-only, single-threaded torch settings improve stability on macOS.
  • Rate limiting and validation harden public API usage.
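
The thread-pool note can be illustrated with the standard asyncio pattern for offloading blocking work. This is a sketch, not the project's code: `run_inference` is a stand-in for a real model call, the pool size of 1 is a hypothetical choice, and the heartbeat task exists only to show that the event loop keeps running while the worker thread is busy.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical pool size; a single worker serializes model calls.
_executor = ThreadPoolExecutor(max_workers=1)


def run_inference(text):
    """Stand-in for a blocking model call: sleeps, then counts words."""
    time.sleep(0.05)
    return len(text.split())


async def classify_async(text):
    """Run the blocking call on the pool and await its result."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(_executor, run_inference, text)


async def main():
    ticks = 0

    async def heartbeat():
        # Keeps ticking only if the event loop is not blocked.
        nonlocal ticks
        while True:
            ticks += 1
            await asyncio.sleep(0.01)

    hb = asyncio.create_task(heartbeat())
    result = await classify_async("a b c")
    hb.cancel()
    return result, ticks
```

Calling `asyncio.run(main())` returns the inference result together with the number of heartbeat ticks that ran while the worker thread was blocked. Calling `run_inference` directly inside a coroutine would instead stall every other request for the duration of the call.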

Contributing

Contributions are welcome. Start with CONTRIBUTING.md for setup and workflow expectations.

License

MIT - see LICENSE.

About

Repository for "Technologies for the Design and Maintenance of Information Systems"