
Commit ba37b26

author: examples-bot

feat(examples): add 530 — Multi-Provider Chat Completions Proxy for Voice Agent (Python)

1 parent fe9e6da

8 files changed: 665 additions & 0 deletions

Lines changed: 5 additions & 0 deletions

# Deepgram — https://console.deepgram.com/
DEEPGRAM_API_KEY=

# OpenAI — https://platform.openai.com/api-keys
OPENAI_API_KEY=
Lines changed: 65 additions & 0 deletions

# Multi-Provider Chat Completions Proxy for Deepgram Voice Agent

A FastAPI proxy server that exposes an OpenAI-compatible `/v1/chat/completions` endpoint, routing requests to multiple LLM backends (OpenAI, AWS Bedrock). The Deepgram Voice Agent API uses this proxy as its `think.endpoint.url`, letting you swap LLM providers without changing application code.

## What you'll build

A Python proxy server that sits between the Deepgram Voice Agent API and your choice of LLM backend. The Voice Agent handles speech-to-text (nova-3) and text-to-speech (aura-2) while all "thinking" routes through your proxy to OpenAI or AWS Bedrock — switchable via a single environment variable.
## Prerequisites

- Python 3.10+
- Deepgram account — [get a free API key](https://console.deepgram.com/)
- OpenAI account — [get an API key](https://platform.openai.com/api-keys)
- AWS account (optional, for Bedrock) — [IAM console](https://console.aws.amazon.com/iam/)
## Environment variables

| Variable | Where to find it |
|----------|-----------------|
| `DEEPGRAM_API_KEY` | [Deepgram console](https://console.deepgram.com/) |
| `LLM_PROVIDER` | Set to `openai` or `bedrock` (default: `openai`) |
| `OPENAI_API_KEY` | [OpenAI dashboard → API keys](https://platform.openai.com/api-keys) |
| `AWS_ACCESS_KEY_ID` | [AWS IAM console](https://console.aws.amazon.com/iam/) (Bedrock only) |
| `AWS_SECRET_ACCESS_KEY` | [AWS IAM console](https://console.aws.amazon.com/iam/) (Bedrock only) |
| `AWS_REGION` | AWS region with Bedrock access, e.g. `us-east-1` (Bedrock only) |
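A startup check along these lines catches missing keys early. This helper is illustrative only (not part of the commit); the variable names match the table above:

```python
import os

# Which variables each backend needs, per the table above.
REQUIRED = {
    "openai": ["DEEPGRAM_API_KEY", "OPENAI_API_KEY"],
    "bedrock": ["DEEPGRAM_API_KEY", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION"],
}


def missing_vars(provider: str, env=None) -> list[str]:
    """Return the required variables that are unset or empty for a provider."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED.get(provider, []) if not env.get(name)]
```

Calling `missing_vars(os.environ.get("LLM_PROVIDER", "openai"))` at startup and failing fast on a non-empty result avoids confusing mid-call errors later.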
## Install and run

```bash
cp .env.example .env
# Fill in your API keys in .env

pip install -r requirements.txt

# Start the proxy server
cd src && uvicorn proxy:app --port 8080

# In another terminal, run the demo Voice Agent
python src/demo_agent.py
```
## Key parameters

| Parameter | Value | Description |
|-----------|-------|-------------|
| `think.provider.type` | `open_ai` | Tells the Voice Agent to use OpenAI-compatible format |
| `think.endpoint.url` | `https://your-proxy.example.com/v1/chat/completions` | Points the agent's LLM calls at the proxy (must be HTTPS) |
| `listen.provider.model` | `nova-3` | Deepgram's flagship STT model |
| `speak.provider.model` | `aura-2-thalia-en` | Deepgram's TTS model |
| `LLM_PROVIDER` | `openai` or `bedrock` | Which backend the proxy routes to |
## How it works

1. **Start the proxy** — FastAPI serves `/v1/chat/completions` on port 8080
2. **Connect the Voice Agent** — The demo script opens a WebSocket to `wss://agent.deepgram.com/v1/agent/converse` with `think.endpoint.url` pointed at the proxy
3. **User speaks** — The Voice Agent transcribes speech using Deepgram nova-3
4. **Agent thinks** — The Voice Agent sends an OpenAI-format chat completion request to the proxy
5. **Proxy routes** — Based on `LLM_PROVIDER` (or the `X-LLM-Provider` header), the proxy forwards to OpenAI or AWS Bedrock
6. **Agent speaks** — The Voice Agent converts the LLM response to speech using Deepgram aura-2 and streams audio back

To switch providers, change `LLM_PROVIDER` in your `.env` — no code changes needed. You can also override per-request using the `X-LLM-Provider: bedrock` header.
## Starter templates

[deepgram-starters](https://github.com/orgs/deepgram-starters/repositories)
Lines changed: 8 additions & 0 deletions

deepgram-sdk==6.1.1
fastapi==0.135.3
starlette==1.0.0
uvicorn[standard]==0.34.0
httpx==0.28.1
python-dotenv==1.2.2
websockets==14.2
boto3==1.37.23

examples/530-voice-agent-multi-provider-proxy-python/src/__init__.py

Whitespace-only changes.
Lines changed: 136 additions & 0 deletions
"""Demo: connect a Deepgram Voice Agent to the multi-provider proxy.

This script opens a WebSocket to the Deepgram Voice Agent API with
think.endpoint.url pointed at the local proxy server, then prints the
agent events it receives. It does not capture microphone audio itself;
send linear16 audio frames over the socket to interact.

Prerequisites:
1. Start the proxy: uvicorn src.proxy:app --port 8080
2. Run this script: python src/demo_agent.py

The Voice Agent handles STT (nova-3) and TTS (aura-2) directly via
Deepgram, while all LLM "thinking" goes through the proxy — which
routes to whichever provider LLM_PROVIDER is set to.
"""
from __future__ import annotations

import json
import os
import sys

from dotenv import load_dotenv

load_dotenv()

import websockets.sync.client

DG_AGENT_URL = "wss://agent.deepgram.com/v1/agent/converse"

PROXY_URL = os.environ.get("PROXY_URL", "http://localhost:8080/v1/chat/completions")

def build_settings(proxy_url: str = PROXY_URL) -> dict:
    """Build the Voice Agent Settings message with the proxy as the LLM backend."""
    return {
        "type": "Settings",
        "audio": {
            "input": {
                "encoding": "linear16",
                "sample_rate": 16000,
            },
            "output": {
                "encoding": "linear16",
                "sample_rate": 16000,
            },
        },
        "agent": {
            "listen": {
                "provider": {
                    "type": "deepgram",
                    "model": "nova-3",
                },
            },
            "think": {
                "provider": {
                    "type": "open_ai",
                    "model": "gpt-4o-mini",
                },
                "endpoint": {
                    "url": proxy_url,
                    "headers": {},
                },
                "prompt": (
                    "You are a helpful voice assistant. Keep responses concise "
                    "and conversational — the user is speaking, not reading."
                ),
            },
            "speak": {
                "provider": {
                    "type": "deepgram",
                    "model": "aura-2-thalia-en",
                },
            },
            "greeting": "Hello! I'm your voice assistant. How can I help?",
        },
    }

def run_agent(proxy_url: str = PROXY_URL) -> None:
    """Connect to the Voice Agent and print events until interrupted."""
    api_key = os.environ.get("DEEPGRAM_API_KEY")
    if not api_key:
        print("Error: DEEPGRAM_API_KEY not set", file=sys.stderr)
        sys.exit(1)

    settings = build_settings(proxy_url)

    print("Connecting to Deepgram Voice Agent…")
    print(f"  LLM proxy: {proxy_url}")

    ws = websockets.sync.client.connect(
        DG_AGENT_URL,
        additional_headers={"Authorization": f"Token {api_key}"},
    )

    ws.send(json.dumps(settings))
    print("Settings sent, waiting for agent…")

    try:
        while True:
            raw = ws.recv()
            if isinstance(raw, bytes):
                print(f"  [audio] {len(raw)} bytes")
                continue

            msg = json.loads(raw)
            msg_type = msg.get("type", "")

            if msg_type == "Welcome":
                print(f"  Connected — request_id: {msg.get('request_id')}")
            elif msg_type == "SettingsApplied":
                print("  Settings applied — agent ready")
                print("  (Send audio to interact, or Ctrl+C to stop)")
            elif msg_type == "ConversationText":
                print(f"  [{msg.get('role')}] {msg.get('content')}")
            elif msg_type == "AgentStartedSpeaking":
                latency = msg.get("total_latency", 0)
                print(f"  Agent speaking (latency: {latency:.2f}s)")
            elif msg_type == "AgentAudioDone":
                print("  Agent audio done")
            elif msg_type == "Error":
                print(f"  ERROR: {msg.get('description')} ({msg.get('code')})")
            elif msg_type == "Warning":
                print(f"  WARNING: {msg.get('description')}")
            else:
                print(f"  [{msg_type}] {json.dumps(msg)[:120]}")

    except KeyboardInterrupt:
        print("\nDisconnecting…")
    finally:
        ws.close()


if __name__ == "__main__":
    run_agent()
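The demo above only prints events; to hear the agent respond you must stream linear16, 16 kHz audio frames over the socket yourself. A minimal pacing helper for that, assuming a raw PCM capture in a hypothetical `audio.raw` file and the connected `ws` from `run_agent` (illustrative, not part of the commit):

```python
from typing import Iterator


def pcm_chunks(data: bytes, chunk_size: int = 3200) -> Iterator[bytes]:
    """Yield ~100 ms frames of 16 kHz, 16-bit mono PCM.

    16000 samples/s * 2 bytes * 0.1 s = 3200 bytes per frame.
    """
    for i in range(0, len(data), chunk_size):
        yield data[i : i + chunk_size]


# Usage against a connected Voice Agent websocket (hypothetical file name):
#
# import time
# with open("audio.raw", "rb") as f:
#     for frame in pcm_chunks(f.read()):
#         ws.send(frame)   # binary frames are treated as input audio
#         time.sleep(0.1)  # pace roughly at real time
```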
Lines changed: 121 additions & 0 deletions
"""LLM provider backends for the OpenAI-compatible proxy.

Each provider implements a completion function that accepts OpenAI-format
messages and returns an OpenAI-format response dict. This keeps the proxy
layer thin — adding a new provider means writing one function.
"""

from __future__ import annotations

import os
import time
import uuid
from typing import Any

import httpx

def openai_completion(
    messages: list[dict[str, Any]],
    model: str = "gpt-4o-mini",
    **kwargs: Any,
) -> dict[str, Any]:
    """Forward the request to OpenAI's chat completions API."""
    api_key = os.environ.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY not set")

    payload: dict[str, Any] = {"model": model, "messages": messages, **kwargs}

    resp = httpx.post(
        "https://api.openai.com/v1/chat/completions",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        json=payload,
        timeout=60.0,
    )
    resp.raise_for_status()
    return resp.json()

def bedrock_completion(
    messages: list[dict[str, Any]],
    model: str = "anthropic.claude-3-haiku-20240307-v1:0",
    **kwargs: Any,
) -> dict[str, Any]:
    """Forward the request to AWS Bedrock's Converse API and reformat as OpenAI."""
    try:
        import boto3
    except ImportError as exc:
        raise RuntimeError("boto3 is required for the bedrock provider") from exc

    region = os.environ.get("AWS_REGION", "us-east-1")
    client = boto3.client(
        "bedrock-runtime",
        region_name=region,
        aws_access_key_id=os.environ.get("AWS_ACCESS_KEY_ID"),
        aws_secret_access_key=os.environ.get("AWS_SECRET_ACCESS_KEY"),
    )

    bedrock_messages = []
    system_prompt = None
    for msg in messages:
        if msg["role"] == "system":
            system_prompt = msg["content"]
            continue
        bedrock_messages.append({
            "role": msg["role"],
            "content": [{"text": msg["content"]}],
        })

    converse_kwargs: dict[str, Any] = {
        "modelId": model,
        "messages": bedrock_messages,
    }
    if system_prompt:
        converse_kwargs["system"] = [{"text": system_prompt}]

    response = client.converse(**converse_kwargs)

    output_text = ""
    if response.get("output", {}).get("message", {}).get("content"):
        for block in response["output"]["message"]["content"]:
            if "text" in block:
                output_text += block["text"]

    usage = response.get("usage", {})
    # Map Bedrock stop reasons onto OpenAI's finish_reason vocabulary.
    finish_reason = {"end_turn": "stop", "max_tokens": "length"}.get(
        response.get("stopReason", "end_turn"), "stop"
    )
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex[:12]}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": model,
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": output_text},
                "finish_reason": finish_reason,
            }
        ],
        "usage": {
            "prompt_tokens": usage.get("inputTokens", 0),
            "completion_tokens": usage.get("outputTokens", 0),
            "total_tokens": usage.get("inputTokens", 0) + usage.get("outputTokens", 0),
        },
    }

PROVIDERS = {
    "openai": openai_completion,
    "bedrock": bedrock_completion,
}


def get_provider(name: str):
    """Return the completion function for the named provider."""
    fn = PROVIDERS.get(name)
    if fn is None:
        raise ValueError(f"Unknown provider '{name}'. Available: {list(PROVIDERS.keys())}")
    return fn
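Per the module docstring, adding a backend is one function plus a registry entry. A toy echo provider, handy for testing the proxy offline (hypothetical, not part of the commit):

```python
import time
import uuid
from typing import Any


def echo_completion(
    messages: list[dict[str, Any]],
    model: str = "echo",
    **kwargs: Any,
) -> dict[str, Any]:
    """Toy provider: return the last user message back in OpenAI format."""
    last_user = next(
        (m["content"] for m in reversed(messages) if m["role"] == "user"), ""
    )
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex[:12]}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": model,
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": last_user},
                "finish_reason": "stop",
            }
        ],
        "usage": {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
    }


# Registering it would be one line:
# PROVIDERS["echo"] = echo_completion
```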
