Overview
Bring Your Own LLM (BYOL) lets you replace the default platform LLM with any inference engine you
control: an on-premise model, a fine-tuned model, a Google ADK agent, a LangGraph workflow, or
anything else that can speak the Vaani WebSocket protocol.
When BYOL is enabled for an agent, every response turn is routed to your WebSocket server instead
of the built-in provider (OpenAI, Google, Groq, etc.). The rest of the pipeline (STT, TTS,
telephony, transcripts, Langfuse tracing) stays exactly the same.
How to enable BYOL
- Open your agent in the dashboard.
- Go to Brain → Reasoning Language Model (LLM).
- Switch to the Bring your Own LLM (BYOL) tab.
- Paste your WebSocket URL (e.g. wss://your-server.example.com/chat/stream).
- Click Test Connection to verify reachability, then Save URL.
- Choose a Fallback LLM: either No fallback or Use platform LLM.
The URL is stored under agent_config.persona.senses_capabilities.brain.llm.extra_params.llm_websocket_url
and takes effect immediately: the next call uses it.
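If you manage agent configs programmatically, the nested key path above can be read with a small helper. This is a sketch assuming the config is available as a plain nested dict (for example, parsed from JSON) whose top level corresponds to agent_config; the helper name get_byol_url is hypothetical and not part of any Vaani SDK.

```python
from typing import Any, Mapping, Optional

# Documented location of extra_params below the agent_config root.
BYOL_PARAMS_PATH = ("persona", "senses_capabilities", "brain", "llm", "extra_params")


def get_byol_url(agent_config: Mapping[str, Any]) -> Optional[str]:
    """Return extra_params.llm_websocket_url, or None if any level is missing."""
    node: Any = agent_config
    for key in BYOL_PARAMS_PATH:
        if not isinstance(node, Mapping) or key not in node:
            return None
        node = node[key]
    if not isinstance(node, Mapping):
        return None
    # Treat an empty string the same as "not configured".
    return node.get("llm_websocket_url") or None
```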
WebSocket protocol
Your server must implement the following JSON message exchange over a persistent WebSocket
connection. Vaani opens one connection per call (identified by session_id / room name) and
sends one request per agent turn.
Connection handshake
Immediately after the WebSocket is accepted, your server must send two JSON frames in order:
```json
{ "interaction_type": "config", "content": "Server ready" }
{ "interaction_type": "greeting", "content": "Hello" }
```
These frames are consumed by the Vaani agent and discarded; they only confirm that the
connection is live. The content strings may be anything.
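The handshake amounts to two fixed frames sent in order. A minimal sketch of building them with Python's json module follows; the function name handshake_frames and its default strings are illustrative only.

```python
import json
from typing import List


def handshake_frames(ready_text: str = "Server ready", greeting_text: str = "Hello") -> List[str]:
    """Return the two handshake frames, in the order they must be sent.

    Vaani discards the content strings, so any text works.
    """
    return [
        json.dumps({"interaction_type": "config", "content": ready_text}),
        json.dumps({"interaction_type": "greeting", "content": greeting_text}),
    ]
```

In a FastAPI (Starlette) WebSocket handler you would call await websocket.accept() and then await websocket.send_text(frame) for each string, in order, before reading the first request.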
Agent → Your server (request)
For every agent turn (after the user finishes speaking), Vaani sends:
```json
{
  "interaction_type": "response_required",
  "response_id": 1,
  "transcript": [
    { "role": "system", "content": "You are a pen salesman …" },
    { "role": "user", "content": "What pens do you sell?" },
    { "role": "assistant", "content": "We carry …" },
    { "role": "user", "content": "Do you have fountain pens?" }
  ]
}
```
| Field | Type | Description |
|---|---|---|
| interaction_type | "response_required" | Always this value for a normal turn |
| response_id | integer | Monotonically increasing; echoed back in every response chunk |
| transcript | array | Full conversation history for this call, including the system prompt as the first "system" message |
The system prompt built from your agent configuration is always the first
entry with "role": "system". Your server must apply it as the LLM’s
instruction/system message for each turn. If you create an ADK LlmAgent or a
LangChain chain, pass this text as the agent instruction or system message so
the configured persona is honoured.
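A sketch of parsing one request and separating the system prompt from the rest of the conversation. It assumes each request arrives as a JSON text message; parse_turn_request is a hypothetical helper, and raising ValueError on a malformed frame is a choice of this sketch, not documented Vaani behaviour.

```python
import json
from typing import Any, Dict, List, Tuple


def parse_turn_request(raw: str) -> Tuple[int, str, List[Dict[str, str]]]:
    """Parse a response_required frame into (response_id, system_prompt, history).

    history is the transcript without its leading "system" entry.
    """
    frame: Dict[str, Any] = json.loads(raw)
    if frame.get("interaction_type") != "response_required":
        raise ValueError(f"not a turn request: {frame.get('interaction_type')!r}")
    response_id = frame["response_id"]
    transcript = frame["transcript"]
    # The system prompt built from the agent configuration is always first.
    if not transcript or transcript[0].get("role") != "system":
        raise ValueError("transcript must start with the system prompt")
    system_prompt = transcript[0]["content"]
    history = [dict(message) for message in transcript[1:]]
    return response_id, system_prompt, history
```

system_prompt is the text to pass as the agent instruction or system message; history is the user/assistant exchange to feed your model for this turn, and response_id must be echoed in every chunk you send back.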
Your server → Agent (streaming response)
Stream back one or more chunks, each as a JSON frame:
```json
{
  "response_type": "response",
  "response_id": 1,
  "content": "Yes, we carry ",
  "content_complete": false
}
```
Send a final frame with "content_complete": true to signal the end of the turn; its content may
be empty:
```json
{
  "response_type": "response",
  "response_id": 1,
  "content": "beautiful Pelikan M800 models.",
  "content_complete": true
}
```
| Field | Type | Description |
|---|---|---|
| response_type | "response" | Always this value |
| response_id | integer | Must match the response_id from the request |
| content | string | Text chunk to speak; may be empty on the final frame |
| content_complete | boolean | true on the last frame of a turn |
You can also add "end_call": true to the final frame to have the agent hang up the call after
speaking the response.
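The streaming rules can be captured in a generator that turns any iterable of text chunks (for example, tokens from your model's stream) into protocol frames. This is a sketch: response_frames is a hypothetical name, and it always closes the turn with an empty final frame, which the content field description above permits.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def response_frames(response_id: int, chunks: Iterable[str], end_call: bool = False) -> Iterator[str]:
    """Yield the JSON frames for one turn, ending with a content_complete frame."""
    for chunk in chunks:
        if chunk:  # skip empty pieces that some model streams emit
            yield json.dumps({
                "response_type": "response",
                "response_id": response_id,
                "content": chunk,
                "content_complete": False,
            })
    final: Dict[str, Any] = {
        "response_type": "response",
        "response_id": response_id,
        "content": "",
        "content_complete": True,
    }
    if end_call:
        final["end_call"] = True  # only ever set on the final frame
    yield json.dumps(final)
```

An empty terminator is convenient when streaming from a model, because you usually cannot tell which chunk is the last until the stream ends. Putting the last text in the final frame, as in the example above, is equally valid.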
Keep-alive (ping/pong)
If your server sends a keep-alive frame, Vaani echoes it back immediately:
```json
{ "response_type": "ping_pong" }
```
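A minimal keep-alive sketch, assuming your server wants to send pings on its own schedule. The send argument stands in for whatever coroutine writes a text frame to the socket (for example, Starlette's websocket.send_text); send_keepalives, is_keepalive_echo and the interval are illustrative, since this page does not specify a required ping interval.

```python
import asyncio
import json
from typing import Awaitable, Callable

# The keep-alive frame from the protocol; Vaani echoes it back.
PING_FRAME = json.dumps({"response_type": "ping_pong"})


async def send_keepalives(send: Callable[[str], Awaitable[None]], interval_s: float) -> None:
    """Send a ping_pong frame every interval_s seconds until the task is cancelled."""
    while True:
        await send(PING_FRAME)
        await asyncio.sleep(interval_s)


def is_keepalive_echo(raw: str) -> bool:
    """True if an incoming text frame is Vaani's echo of a keep-alive."""
    frame = json.loads(raw)
    return frame.get("response_type") == "ping_pong"
```

Run send_keepalives as a background task (asyncio.create_task) for the life of the connection and cancel it when the call ends; in the receive loop, drop frames for which is_keepalive_echo returns True instead of treating them as turn requests.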
Fallback behaviour
You can configure what happens when your server is unreachable or returns an error:
| Setting | Behaviour |
|---|---|
| No fallback | The turn fails silently; the agent waits for the next user utterance |
| Use platform LLM | The agent falls back to the primary provider configured in the From Providers tab; if that also fails, it tries the fallback provider |
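To reason about the two settings, the behaviour in the table can be modelled as a chain of attempts. This is not Vaani's implementation: answer_turn is hypothetical, mode is the stored config value ("none" or "platform"), and the callables stand in for your BYOL server, the primary provider from the From Providers tab, and the fallback provider.

```python
from typing import Callable, Optional, Sequence


def answer_turn(
    mode: str,
    byol: Callable[[], str],
    platform_chain: Sequence[Callable[[], str]],
) -> Optional[str]:
    """Return the turn's reply, or None if the turn is skipped."""
    if mode not in ("none", "platform"):
        raise ValueError(f"unknown fallback mode: {mode!r}")
    try:
        return byol()
    except Exception:  # unreachable server or error response
        if mode == "none":
            return None  # fail silently; wait for the next utterance
    # mode == "platform": primary provider first, then the fallback provider.
    for provider in platform_chain:
        try:
            return provider()
        except Exception:
            continue
    return None
```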
The fallback mode is stored as extra_params.fallback in the agent config: "none" for No fallback,
"platform" for Use platform LLM.
Session management
Each call opens a new connection at <ws_url>/<call_id>; Vaani appends the call ID (the room name)
to your URL automatically. Your server should use this trailing path component, the session ID,
to isolate per-call state (e.g. conversation memory, tool state).
When the call ends, Vaani closes the WebSocket cleanly. You can also expose a
DELETE /session/{session_id} endpoint on your server so Vaani can explicitly clean up state
(see the example ADK server).
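A sketch of per-call state isolation: take the call ID from the connection URL and key an in-memory store on it. session_id_from_url and SessionStore are hypothetical names, and a plain dict only works within a single server process; a multi-worker deployment would need a shared store.

```python
from typing import Any, Dict
from urllib.parse import urlparse


def session_id_from_url(url: str) -> str:
    """Return the last path segment, i.e. the call ID Vaani appends to your URL."""
    path = urlparse(url).path.rstrip("/")
    session_id = path.rsplit("/", 1)[-1]
    if not session_id:
        raise ValueError(f"no session id in {url!r}")
    return session_id


class SessionStore:
    """In-memory per-call state keyed by session ID (single process only)."""

    def __init__(self) -> None:
        self._sessions: Dict[str, Dict[str, Any]] = {}

    def get(self, session_id: str) -> Dict[str, Any]:
        # Create empty state the first time a session is seen.
        return self._sessions.setdefault(session_id, {})

    def delete(self, session_id: str) -> bool:
        # Call on socket close and from a DELETE /session/{session_id} handler.
        return self._sessions.pop(session_id, None) is not None

    def __len__(self) -> int:
        return len(self._sessions)
```

With FastAPI, a route such as /chat/stream/{session_id} (as in the reference server) already receives the ID as a path parameter, so URL parsing is only needed if your framework hands you the raw URL.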
Reference implementation
The vaani-adk-byol-example repository
contains a complete FastAPI server (adk_server.py) that:
- Implements the full WebSocket protocol above
- Runs a Google ADK agent (LlmAgent) with the system prompt injected per session
- Exposes REST (/chat), SSE (/chat/sse), and WebSocket (/chat/stream/{session_id}) endpoints
- Can be deployed locally and exposed with ngrok in under 5 minutes
```bash
# Install (from a clone of the repository)
cd vaani-adk-byol-example
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
cp example.env .env  # add your GOOGLE_API_KEY

# Run
uvicorn adk_server:app --host 0.0.0.0 --port 8090 --reload

# Expose publicly
ngrok http 8090
# → ngrok prints an https://xxxx.ngrok-free.app forwarding URL; paste it into the
#   BYOL tab as wss://xxxx.ngrok-free.app/chat/stream (Vaani appends the call ID)
```