A full-stack web application that generates personalized interview questions for IT specialists using a modular RAG (Retrieval-Augmented Generation) pipeline. The system retrieves questions from a PDF-based knowledge base and optionally generates additional questions using Google's Gemini AI.
- RAG-based Question Retrieval: Primary source of questions from PDF files organized by specialization
- AI-powered Difficulty Classification: Automatically classifies question difficulty (easy/medium/hard) using Gemini
- Experience-based Distribution: Probabilistic selection of questions based on candidate experience level
- Junior: 60% easy, 30% medium, 10% hard
- Middle: 30% easy, 50% medium, 20% hard
- Senior: 10% easy, 30% medium, 60% hard
- CV Analysis: Optional CV upload (PDF, DOCX, TXT) for personalized question generation
- Specialization Filtering: Supports backend, frontend, DevOps, ML, mobile, data engineering, QA, and other specializations
- Optional LLM Generation: Can generate additional questions via Gemini based on CV and context
- Automated Benchmarking: Evaluates RAG pipeline performance with real CVs
- LIME Explainer: Explains why specific documents were retrieved with high scores using Local Interpretable Model-agnostic Explanations
LIME explains why a specific document was ranked highly for a given query.

- Endpoint: POST /api/interview/explain-retrieval
- Runtime code: backend/app/services/lime_retrieval.py
- Minimal notes: backend/LIME_notes.md
- Request/response guide: backend/LIME_api.md
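A minimal sketch of calling the endpoint is shown below; the payload field names (query, specialization, top_k) are assumptions for illustration, so check backend/LIME_api.md for the actual request/response schema.

```python
# Hypothetical request to the LIME explanation endpoint. The payload fields
# below are illustrative; see backend/LIME_api.md for the real schema.
import requests

payload = {
    "query": "How does Python's GIL affect multithreaded services?",  # assumed field
    "specialization": "backend",                                      # assumed field
    "top_k": 3,                                                       # assumed field
}
resp = requests.post(
    "http://localhost:8000/api/interview/explain-retrieval",
    json=payload,
    timeout=60,
)
resp.raise_for_status()
print(resp.json())  # feature weights explaining why documents scored highly
```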
Location: backend/
Main Entry: app.main:app
Key Services:
- services/ingestion.py: CV upload and text extraction from PDF, DOCX, TXT
- services/embeddings.py: HuggingFace SentenceTransformer embeddings (all-MiniLM-L6-v2, 384 dimensions); see the indexing sketch after this list
- services/vectorstores.py: ChromaDB vector store implementation
- services/retrieval.py: RAG retrieval service with similarity search
- services/reranking.py: Heuristic reranking of retrieved results
- services/prompting.py: Prompt construction for CV understanding and question generation
- services/pipeline.py: End-to-end interview question generation pipeline
  - RAG retrieval from PDF question bank
  - Difficulty classification via Gemini
  - Probabilistic question selection
  - Optional LLM-based question generation
- services/memory.py: In-memory CV storage and response caching
- services/benchmarking.py: Automated RAG evaluation with metrics
- services/gemini_client.py: Gemini API client with retry logic for rate limits
- services/qa_corpus.py: Loads questions from PDF files in data/questions/
- services/pdf_question_parser.py: Extracts Q&A pairs from structured PDF documents
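The sketch below shows how the embedding and vector-store pieces typically fit together, assuming the sentence-transformers and chromadb packages; the collection name, metadata keys, and layout are illustrative, not the actual interfaces of services/embeddings.py and services/vectorstores.py.

```python
# Illustrative sketch of embedding questions and storing them in ChromaDB.
# The collection name and metadata keys are assumptions for this example.
import chromadb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dimensional embeddings
client = chromadb.PersistentClient(path="data/chroma")
collection = client.get_or_create_collection("questions")

qa_pairs = [
    {"id": "backend-001", "question": "What is connection pooling?", "source": "backend.pdf"},
]

collection.add(
    ids=[p["id"] for p in qa_pairs],
    documents=[p["question"] for p in qa_pairs],
    embeddings=model.encode([p["question"] for p in qa_pairs]).tolist(),
    metadatas=[{"source": p["source"]} for p in qa_pairs],
)

# Similarity search filtered by specialization (derived from the source file name)
hits = collection.query(
    query_embeddings=model.encode(["How do you scale a REST API?"]).tolist(),
    n_results=5,
    where={"source": "backend.pdf"},
)
print(hits["documents"])
```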
API Endpoints:
- POST /api/interview/upload-cv: Upload and process CV file
- POST /api/interview/generate: Generate interview questions
- GET /api/interview/specializations: List available specializations
- POST /api/interview/explain-retrieval: Explain retrieval score using LIME (new)
- POST /api/benchmarking/run: Run automated benchmark evaluation
- GET /api/health: Health check
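A hedged example of calling the API from Python follows; the payload field names (specialization, experience_level, num_questions, generate_with_llm) are assumptions based on the descriptions above, so consult http://localhost:8000/api/docs for the exact schema.

```python
# Illustrative client calls; field names in the generate payload are assumed,
# not confirmed -- check the interactive API docs for the real schema.
import requests

BASE = "http://localhost:8000/api"

# 1. Optional: upload a CV for personalized questions
with open("my_cv.pdf", "rb") as f:
    cv = requests.post(f"{BASE}/interview/upload-cv", files={"file": f}).json()

# 2. Generate interview questions
payload = {
    "specialization": "backend",
    "experience_level": "middle",
    "num_questions": 10,
    "generate_with_llm": False,
}
questions = requests.post(f"{BASE}/interview/generate", json=payload).json()
for q in questions.get("questions", []):
    print(q.get("difficulty"), "-", q.get("question"))
```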
Location: frontend/
Tech Stack: Vite + React + TypeScript
Features:
- CV upload interface (optional)
- Specialization and experience level selection
- Prompt mode selection (CV-focused, specialization-focused, mixed)
- Optional custom parameters (tech stack, company profile, job description)
- Question display with difficulty tags
- Export questions to PDF
- Optional LLM generation toggle
- Python 3.11+
- Node.js 18+
- Docker and Docker Compose (for containerized deployment)
- Google Gemini API key
cd backend
pip install --upgrade pip
pip install .
export GEMINI_API_KEY="YOUR_KEY_HERE" # Windows: set GEMINI_API_KEY=...
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

Backend will be available at http://localhost:8000
cd frontend
npm install
npm run dev

Frontend will be available at http://localhost:3000
The dev server proxies /api requests to http://localhost:8000.
- Create a .env file in the project root:

  GEMINI_API_KEY=your_gemini_api_key_here
  CORS_ALLOW_ORIGINS=["http://localhost:3000","http://127.0.0.1:3000"]

- Build and run:

  docker-compose up --build

- Backend API: http://localhost:8000/api
- Frontend: http://localhost:3000
- API Docs: http://localhost:8000/api/docs
Place PDF files containing interview questions in data/questions/ directory. Files should be named by specialization (e.g., backend.pdf, ml.pdf, qa.pdf).
Expected PDF Format:
- Questions marked with "Q:", "Question:", or numbered (1., 2., etc.)
- Answers marked with "A:", "Answer:", "Ideal Answer:", etc.
- Optional category and difficulty markers
The system automatically:
- Parses Q&A pairs from all PDF files
- Generates embeddings using HuggingFace model
- Stores them in ChromaDB vector store
- Filters by specialization based on source file name
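Below is a rough sketch of the marker-based extraction described above, operating on text already pulled out of a PDF; the regexes and function are illustrative and not the actual services/pdf_question_parser.py implementation.

```python
# Illustrative marker-based Q&A extraction; the real parser may use different rules.
import re

QUESTION_MARKER = re.compile(r"^(?:Q:|Question:|\d+\.)\s*(.+)", re.IGNORECASE)
ANSWER_MARKER = re.compile(r"^(?:A:|Answer:|Ideal Answer:)\s*(.+)", re.IGNORECASE)

def parse_qa_pairs(text: str) -> list[dict]:
    pairs, current = [], None
    for line in text.splitlines():
        line = line.strip()
        if (m := QUESTION_MARKER.match(line)):
            if current:
                pairs.append(current)
            current = {"question": m.group(1), "answer": ""}
        elif current and (m := ANSWER_MARKER.match(line)):
            current["answer"] = m.group(1)
        elif current and current["answer"] and line:
            current["answer"] += " " + line  # answer continuation lines
    if current:
        pairs.append(current)
    return pairs

print(parse_qa_pairs("1. What is REST?\nA: An architectural style for APIs."))
```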
An optional questions.csv file can be placed in data/questions/ for LLM generation reference. This file is not used for RAG retrieval, only as a style reference when generating additional questions via LLM.
CSV Format:
Question,Answer,Category,Difficulty
"What is...?","Answer text...","Category","easy"-
Question Retrieval (RAG):
- User selects specialization and experience level
- System retrieves relevant questions from PDF-based vector store
- Questions are filtered by specialization (using source file names and categories)
-
Difficulty Classification:
- Retrieved questions are sent to Gemini for difficulty classification
- Each question is classified as easy, medium, or hard
-
Question Selection:
- Questions are selected probabilistically based on experience level
- System ensures exactly the requested number of questions
-
Optional LLM Generation:
- If
generate_with_llmis enabled, additional questions are generated via Gemini - LLM uses CV summary, retrieved questions, and custom context as references
- If
-
Response:
- Returns list of questions with difficulty, category, ideal answers, and explanations
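Here is a minimal sketch of the probabilistic selection step, using the experience-level weights listed in the features section; the function and its fallback behavior are illustrative rather than the actual services/pipeline.py logic.

```python
# Illustrative weighted selection by experience level; the real pipeline may
# differ, e.g. in how shortfalls of a given difficulty are backfilled.
import random

DIFFICULTY_WEIGHTS = {
    "junior": {"easy": 0.6, "medium": 0.3, "hard": 0.1},
    "middle": {"easy": 0.3, "medium": 0.5, "hard": 0.2},
    "senior": {"easy": 0.1, "medium": 0.3, "hard": 0.6},
}

def select_questions(questions: list[dict], level: str, n: int) -> list[dict]:
    weights = DIFFICULTY_WEIGHTS[level]
    pool = list(questions)
    selected = []
    while pool and len(selected) < n:
        # Draw a target difficulty, then pick a random question of that
        # difficulty; fall back to any remaining question if none match.
        target = random.choices(list(weights), weights=list(weights.values()))[0]
        candidates = [q for q in pool if q["difficulty"] == target] or pool
        choice = random.choice(candidates)
        pool.remove(choice)
        selected.append(choice)
    return selected

demo = [{"question": f"Q{i}", "difficulty": d}
        for i, d in enumerate(["easy"] * 5 + ["medium"] * 5 + ["hard"] * 5)]
print([q["difficulty"] for q in select_questions(demo, "senior", 10)])
```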
To make the application accessible to others over the internet, see PUBLIC_ACCESS.md for detailed instructions.
Quick start with Tuna (recommended):

- Sign up at https://tuna.am/
- Install the Tuna CLI and authenticate: tuna auth (follow the instructions)
- Start your application: docker-compose up or run locally
- Start tunnels:
  - Windows: .\start-tuna.ps1
  - Linux/Mac: chmod +x start-tuna.sh && ./start-tuna.sh
- Copy the Tuna URLs and add them to CORS_ALLOW_ORIGINS in .env
- Restart backend: docker-compose restart backend (or restart the local server)
Alternative: ngrok

- Install ngrok
- Create a .env file with your Gemini API key
- Run docker-compose up
- In a new terminal: ngrok http 3000
- Add the ngrok HTTPS URL to CORS_ALLOW_ORIGINS in .env
- Restart backend: docker-compose restart backend
- GEMINI_API_KEY (required): Your Google Gemini API key
- CORS_ALLOW_ORIGINS: JSON array of allowed origins for CORS
  - Example: ["http://localhost:3000","http://127.0.0.1:3000","https://your-domain.com"]
  - Use ["*"] to allow all origins (not recommended for production)
Default models (configurable in backend/app/core/config.py):
- Embeddings: all-MiniLM-L6-v2 (HuggingFace, 384 dimensions)
- Question Generation: gemini-2.5-pro
- CV Understanding & Classification: gemini-2.5-flash
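As a hedged illustration of how the classification model might be called, the snippet below uses the google-generativeai package with gemini-2.5-flash; the actual services/gemini_client.py wrapper also handles rate-limit retries, which are omitted here.

```python
# Sketch of classifying one question's difficulty with Gemini, assuming the
# google-generativeai package; retry logic for rate limits is omitted.
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel("gemini-2.5-flash")

question = "Explain the difference between optimistic and pessimistic locking."
prompt = (
    "Classify the difficulty of the following interview question as exactly "
    "one of: easy, medium, hard. Reply with the single word only.\n\n"
    f"Question: {question}"
)

response = model.generate_content(prompt)
difficulty = response.text.strip().lower()
print(difficulty)  # e.g. "medium"
```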
.
├── backend/
│ ├── app/
│ │ ├── core/ # Configuration and logging
│ │ ├── models/ # Pydantic schemas
│ │ ├── routes/ # API endpoints
│ │ └── services/ # Business logic
│ ├── data/
│ │ ├── questions/ # PDF question files
│ │ └── basic_cv/ # CV files for benchmarking
│ └── pyproject.toml # Python dependencies
├── frontend/
│ ├── src/
│ │ ├── components/ # React components
│ │ ├── services/ # API client
│ │ └── App.tsx # Main app component
│ └── nginx.conf # Nginx configuration
├── docker-compose.yml # Docker orchestration
└── README.md # This file
This project is provided as-is for educational and demonstration purposes.