# 🗣️ Interview Coach

**Google Live Agent Hackathon Submission** · Category: Live Agents

An advanced multimodal AI interview coach that can **See**, **Hear**, **Speak**, and **Act** to train candidates for developer jobs at top tech companies through interactive mock interviews.
Interview Coach is not just a chatbot; it's a live, interactive interview trainer. Built using the Google ADK (Agent Development Kit) and the Gemini Live API, it allows candidates to:
- Practice DSA and System Design interviews through real-time bidirectional voice conversation.
- Share their screen while coding or upload photos of whiteboard sketches.
- Draw system design diagrams on a built-in whiteboard canvas and get instant visual feedback.
- Have their Python code executed and validated in real-time by the agent.
The agent uses a Socratic teaching method: asking guiding questions, giving progressive hints, and correcting mistakes rather than simply providing answers.
| Capability | How It Works |
|---|---|
| 👂 Hear | Real-time voice streaming at 16kHz PCM. The candidate speaks their thought process naturally, just like in a real interview. |
| 🗣️ Speak | The coach responds with natural, sub-second-latency voice (24kHz PCM), asking follow-ups, giving hints, and correcting mistakes. |
| 👁️ See | Candidates can share their screen showing code in an IDE, upload photos, or use the built-in whiteboard to draw system design diagrams. The coach reads code, examines diagrams, and gives specific visual feedback. |
| 🛠️ Act (Code Execution) | The agent can execute Python code via an MCP server to validate candidate solutions against test cases in real time. |
| ✏️ Whiteboard | Built-in drawing canvas for system design sketches: draw components, arrows, and diagrams, then send them to the coach for review. |
| 🔄 Graceful Interruption | Candidates can interrupt the coach mid-sentence simply by speaking over it. The system instantly clears audio buffers and handles the interruption smoothly, just like a real conversation. |
| 🧠 Adaptive Coaching | The coach adjusts difficulty based on performance. Struggling? More scaffolding. Crushing it? Harder problems and deeper follow-ups. |
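Audio travels both ways as raw 16-bit PCM. As a minimal sketch of the float-to-PCM conversion any capture layer (here, the AudioWorklet) must perform before streaming, assuming samples already arrive at the target rate (the helper name is illustrative, not from the codebase):

```python
import struct

def float32_to_pcm16(samples):
    """Clamp float samples to [-1.0, 1.0] and pack as little-endian 16-bit PCM."""
    return b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
    )

# Three samples -> six bytes of PCM ready to stream over the WebSocket.
frame = float32_to_pcm16([0.0, 1.0, -1.0])
```

The same representation is used in both directions; only the sample rate differs (16kHz up, 24kHz down).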
The application consists of a React frontend, a FastAPI WebSocket backend, and the Google ADK routing to the Gemini model.
```
+--------------------------------------------------------------------------+
|                                 FRONTEND                                 |
|                          React + Vite + Shadcn/UI                        |
|                                                                          |
|  +---------------+  +-------------+  +--------------+  +--------------+  |
|  | ChatInterface |  | VoiceButton |  | Activity Log |  |  Whiteboard  |  |
|  | Messages,     |  | Mic toggle  |  | Live MCP     |  | Canvas for   |  |
|  | text input    |  | with pulse  |  | tracking UI  |  | diagrams     |  |
|  +-------+-------+  +------+------+  +-------+------+  +-------+------+  |
|          |                 |                 |                 |         |
|          +-----------------+--------+--------+-----------------+         |
|                                     |                                    |
|                  +------------------+------------------+                 |
|                  |          useWebSocket Hook          |                 |
|                  |  - Connects to WS server            |                 |
|                  |  - Sends text/audio/images          |                 |
|                  |  - Parses ADK response events       |                 |
|                  |  - Plays audio                      |                 |
|                  +------------------+------------------+                 |
+-------------------------------------+------------------------------------+
                                      |
                          WebSocket Connection
                     ws://localhost:8000/ws/{session}
                     ->  Upstream:   JSON/PCM text/audio/images
                     <-  Downstream: ADK Event objects (JSON)
                                      |
+-------------------------------------+------------------------------------+
|                                 BACKEND                                  |
|                           FastAPI + Google ADK                           |
|                                                                          |
|                  +------------------+------------------+                 |
|                  |          WebSocket Server           |                 |
|                  |            (app/main.py)            |                 |
|                  +------------------+------------------+                 |
|                                     |                                    |
|                  +------------------+------------------+                 |
|                  |       ADK Runner.run_live()         |                 |
|                  |  - Routes multimodal input          |                 |
|                  |  - Yields response events           |                 |
|                  +------------------+------------------+                 |
|                                     |                                    |
|  +--------------------------------------------------------------------+  |
|  |                    Root Agent (Interview Coach)                    |  |
|  |              Acts as MCP Tool Client + Google Search               |  |
|  +---------------+------------------------------------+---------------+  |
|                  |                                    |                  |
|             MCP  |                               MCP  |                  |
|        Protocol  |                          Protocol  |                  |
|  +---------------+----------------+  +----------------+---------------+  |
|  | LeetCode MCP Server            |  | CodeExec MCP (FastMCP)         |  |
|  | (@jinzcdev/leetcode-mcp)       |  | - run_python_code              |  |
|  | - get_daily_challenge          |  |   (execute & validate          |  |
|  | - get_problem                  |  |    candidate solutions)        |  |
|  | - search_problems              |  +--------------------------------+  |
|  | - list_problem_solutions       |                                      |
|  | - get_problem_solution         |                                      |
|  | + user profile/submission tools|                                      |
|  +--------------------------------+                                      |
+--------------------------------------------------------------------------+
```
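The CodeExec server's `run_python_code` tool boils down to sandboxed execution of candidate code in a separate interpreter. A minimal sketch of that idea (the real server exposes it through FastMCP; the timeout value and return shape here are assumptions for illustration):

```python
import subprocess
import sys

def run_python_code(code: str, timeout: float = 5.0) -> dict:
    """Run candidate code in a fresh interpreter and capture its output."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, timeout=timeout,
        )
        return {"stdout": proc.stdout, "stderr": proc.stderr, "exit_code": proc.returncode}
    except subprocess.TimeoutExpired:
        # Infinite loops in candidate solutions must not hang the interview.
        return {"stdout": "", "stderr": f"Timed out after {timeout}s", "exit_code": -1}

result = run_python_code("print(sum(range(5)))")
```

A subprocess alone is not a security boundary; the Cloud Run deployment note below (an isolated container for the sandbox) is the production-grade answer.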
The agent uses the Model Context Protocol (MCP) for its tool capabilities, connecting to two MCP servers:
- **LeetCode MCP Server** (`@jinzcdev/leetcode-mcp-server`): A community MCP server that provides direct access to LeetCode's problem database via their GraphQL API. The agent can search problems by difficulty/tags, fetch full problem details, get daily challenges, and retrieve community solutions, all from real LeetCode data.
- **Code Execution MCP Server** (local FastMCP): A lightweight local server that provides sandboxed Python code execution for validating candidate solutions in real time.
- **Dynamic Bridge**: The MCP client bridge (`mcp_client_bridge.py`) spawns both servers as subprocesses and dynamically fetches their tools over the stdio protocol. Adding new MCP servers is as simple as adding another `create_mcp_bridge_tools_from_command()` call.
- **Google Search**: The agent also has access to Google Search for looking up algorithms, design patterns, and concepts in real time.
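Under the hood, MCP traffic over stdio is JSON-RPC 2.0. As a hedged sketch of the request the bridge ultimately frames when the agent invokes a tool (field layout per the MCP specification's `tools/call` method; the argument names shown are illustrative, the tool name is one of the LeetCode server's actual tools):

```python
import json

def make_tools_call(request_id: int, tool: str, arguments: dict) -> str:
    """Frame an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

# One line of JSON written to the server subprocess's stdin.
msg = make_tools_call(1, "search_problems", {"difficulty": "MEDIUM"})
```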
Graceful interruptions require precise coordination across the full stack:
- **Frontend** (`useWebSocket.ts`): Wraps Web Audio API playback. When the user speaks, it triggers `interruptAgent()`, which instantly clears the audio buffer, drops partial messages, and sends an interruption signal.
- **Backend** (`main.py`): Uses an `asyncio.Event` (`cancel_event`) shared between the upstream (receive) and downstream (send) tasks. If the client disconnects or interrupts, tasks are cleanly cancelled without hanging the server or queue.
- **Auto-Reconnect**: Exponential backoff for unexpected network drops keeps the live session resilient.
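The backend half of this pattern can be sketched with plain `asyncio` (simplified: strings stand in for audio chunks, and the upstream task's reaction to the interruption signal is reduced to a direct `cancel_event.set()`):

```python
import asyncio

async def downstream(cancel_event: asyncio.Event, chunks: list) -> list:
    """Stream chunks to the client, stopping as soon as an interruption fires."""
    sent = []
    for chunk in chunks:
        if cancel_event.is_set():
            break  # drop remaining audio: the user has started speaking
        sent.append(chunk)
        await asyncio.sleep(0)  # yield to the event loop, as a real send would
    return sent

async def demo() -> list:
    cancel_event = asyncio.Event()
    task = asyncio.create_task(downstream(cancel_event, ["c1", "c2", "c3", "c4"]))
    await asyncio.sleep(0)  # let the first chunk go out
    cancel_event.set()      # the upstream task saw the user interrupt
    return await task

sent = asyncio.run(demo())
```

Because both tasks share the same event, neither has to poll the other's queue, and cancellation is observed at the next chunk boundary.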
```
Google_Hackathon/
├── .env                        # API key + model config
├── requirements.txt            # Python dependencies
│
├── bidi_streaming_agent/       # Google ADK Agent Code
│   ├── agent.py                # Root agent: Interview Coach persona, dual MCP Client loading
│   ├── mcp_client_bridge.py    # Bridge: Spawns MCP servers (Python & Node.js) & wraps tools
│   └── mcp_servers/
│       └── interview_mcp_server.py  # FastMCP server for code execution
│
├── app/
│   └── main.py                 # FastAPI WebSocket server (session & interrupt mgmt)
│
└── frontend/                   # React app (Vite + TypeScript)
    └── src/
        ├── hooks/
        │   ├── useWebSocket.ts       # WS lifecycle, streaming, interruption handling
        │   └── useAudioRecorder.ts   # Mic capture via AudioWorklet (16kHz PCM)
        └── components/
            ├── ChatInterface.tsx     # Main UI (Chat, Voice, Whiteboard, Activity Log)
            ├── Whiteboard.tsx        # Canvas drawing tool for system design diagrams
            └── ui/                   # Shadcn UI primitives
```
**Prerequisites:**

- Python 3.10+
- Node.js & `pnpm`
- A Gemini API key

**Backend setup:**

1. Clone the repository and navigate to the root directory.
2. Create and activate a virtual environment:

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Install the LeetCode MCP server (requires Node.js):

   ```bash
   npm install -g @jinzcdev/leetcode-mcp-server
   ```

5. Set up your `.env` file in the root directory:

   ```env
   GEMINI_API_KEY="your_api_key_here"
   DEMO_AGENT_MODEL="gemini-2.5-flash-native-audio-preview-12-2025"
   ```

6. Start the FastAPI server:

   ```bash
   python -m uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
   ```

**Frontend setup:**

7. Open a second terminal and navigate to the `frontend` folder:

   ```bash
   cd frontend
   ```

8. Install dependencies:

   ```bash
   pnpm install
   ```

9. Start the Vite development server:

   ```bash
   pnpm run dev
   ```

10. Open your browser to `http://localhost:5173`. Click the microphone icon to start a coaching session and begin practicing!
The backend infrastructure leverages Google Cloud for all AI capabilities.
How it uses Google Cloud:
- Google ADK & Vertex AI/Gemini API: The core intelligence and multimodal live streaming are powered completely by Google's cloud infrastructure via the Gemini Live API endpoint.
- Production Deployment Strategy: In a production scenario, the FastAPI backend can be deployed via Google Cloud Run using a Dockerfile. The code execution sandbox would run in an isolated container for security, and the frontend would be served via Cloud CDN.
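As an illustration of that deployment path, a minimal Cloud Run-style Dockerfile for the backend might look like this (a sketch only: the base image tag, port handling, and layer layout are assumptions, not taken from the repository):

```dockerfile
FROM python:3.11-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Node.js would also be needed in this image to spawn the LeetCode MCP server.
COPY . .

# Cloud Run injects $PORT at runtime; default to 8080 for local testing.
ENV PORT=8080
CMD ["sh", "-c", "uvicorn app.main:app --host 0.0.0.0 --port ${PORT}"]
```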
- Agent Framework: Google ADK (Agent Development Kit)
- AI Model: Gemini 2.5 Flash Native Audio (bidi streaming API)
- Backend: Python, FastAPI, Uvicorn, WebSockets, `asyncio`
- Frontend: React 19, Vite, TypeScript, Tailwind CSS v4, Shadcn/UI
- Browser APIs: Web Audio API, AudioWorklet (raw PCM conversion), Screen Capture API, Canvas API
- Tooling: MCP (Model Context Protocol), FastMCP, @jinzcdev/leetcode-mcp-server, Google Search