Overview

BYOL (Bring Your Own LLM) lets you replace the default platform LLM with any inference engine you control: an on-premise model, a fine-tuned model, a Google ADK agent, a LangGraph workflow, or anything else that can speak the Vaani WebSocket protocol. When BYOL is enabled for an agent, every response turn is routed to your WebSocket server instead of the built-in provider (OpenAI, Google, Groq, etc.). The rest of the pipeline (STT, TTS, telephony, transcripts, Langfuse tracing) stays exactly the same.

How to enable BYOL

  1. Open your agent in the dashboard.
  2. Go to Brain → Reasoning Language Model (LLM).
  3. Switch to the Bring your Own LLM (BYOL) tab.
  4. Paste your WebSocket URL (e.g. wss://your-server.example.com/chat/stream).
  5. Click Test Connection to verify reachability, then Save URL.
  6. Choose a Fallback LLM — either No fallback or Use platform LLM.
The URL is stored under agent_config.persona.senses_capabilities.brain.llm.extra_params.llm_websocket_url and takes effect on the next call.

WebSocket protocol

Your server must implement the following JSON message exchange over a persistent WebSocket connection. Vaani opens one connection per call (identified by session_id / room name) and sends one request per agent turn.

Connection handshake

Immediately after the WebSocket is accepted, your server must send two JSON frames in order:
{ "interaction_type": "config",    "content": "Server ready" }
{ "interaction_type": "greeting",  "content": "Hello" }
These frames are consumed by the Vaani agent and discarded — they are only used to confirm the connection is live. The content strings may be anything.
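The handshake above can be sketched as follows. This is a minimal illustration, not the reference server's code: `handshake_frames` and `send_handshake` are hypothetical helper names, and `send` stands in for whatever your WebSocket library uses to write one text frame.

```python
import json
from typing import Callable, List


def handshake_frames() -> List[str]:
    """Return the two JSON frames to send, in order, right after accept."""
    return [
        json.dumps({"interaction_type": "config", "content": "Server ready"}),
        json.dumps({"interaction_type": "greeting", "content": "Hello"}),
    ]


def send_handshake(send: Callable[[str], None]) -> None:
    """Send the config frame first, then the greeting frame.

    `send` is an assumed callable that writes one text frame to the socket.
    """
    for frame in handshake_frames():
        send(frame)
```

Keeping the frame construction separate from the transport makes the ordering easy to check without opening a real connection.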

Agent → Your server (request)

For every agent turn (after the user finishes speaking) Vaani sends:
{
  "interaction_type": "response_required",
  "response_id": 1,
  "transcript": [
    { "role": "system", "content": "You are a pen salesman …" },
    { "role": "user", "content": "What pens do you sell?" },
    { "role": "assistant", "content": "We carry …" },
    { "role": "user", "content": "Do you have fountain pens?" }
  ]
}
Field | Type | Description
interaction_type | "response_required" | Always this value for a normal turn
response_id | integer | Monotonically increasing; echoed back in every response chunk
transcript | array | Full conversation history for this call, including the system prompt as the first "system" message
The system prompt built from your agent configuration is always the first entry with "role": "system". Your server must apply it as the LLM’s instruction/system message for each turn. If you create an ADK LlmAgent or a LangChain chain, pass this text as the agent instruction or system message so the configured persona is honoured.
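One way to separate the system prompt from the rest of the history before handing it to your model is sketched below. The function name `parse_request` and the empty-string fallback for a missing system entry are assumptions made for illustration; only the field names come from the request format above.

```python
import json
from typing import Dict, List, Tuple


def parse_request(raw: str) -> Tuple[int, str, List[Dict[str, str]]]:
    """Split a response_required frame into (response_id, system_prompt, messages)."""
    msg = json.loads(raw)
    if msg.get("interaction_type") != "response_required":
        raise ValueError(f"unexpected interaction_type: {msg.get('interaction_type')!r}")

    transcript = msg["transcript"]
    # The system prompt is documented as the first entry; the else branch is
    # a defensive assumption for frames that do not follow that rule.
    if transcript and transcript[0].get("role") == "system":
        system_prompt = transcript[0]["content"]
        messages = transcript[1:]
    else:
        system_prompt = ""
        messages = transcript
    return msg["response_id"], system_prompt, messages
```

The returned `system_prompt` is what you would pass as the agent instruction or system message, and `response_id` must be kept so it can be echoed in every response chunk.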

Your server → Agent (streaming response)

Stream back one or more chunks, each as a JSON frame:
{
  "response_type": "response",
  "response_id": 1,
  "content": "Yes, we carry ",
  "content_complete": false
}
Send a final frame with "content_complete": true and optionally empty content to signal end of turn:
{
  "response_type": "response",
  "response_id": 1,
  "content": "beautiful Pelikan M800 models.",
  "content_complete": true
}
Field | Type | Description
response_type | "response" | Always this value
response_id | integer | Must match the response_id from the request
content | string | Text chunk to speak; may be empty on the final frame
content_complete | boolean | true on the last frame of a turn
You may optionally add "end_call": true on the final frame to signal that the agent should hang up the call after speaking the response.
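The streaming format above could be produced by a small generator like the one below. This is a sketch under stated assumptions: `response_frames` is a hypothetical helper, and it chooses to close the turn with an empty final frame (which the field description allows) rather than putting the last chunk in the final frame.

```python
import json
from typing import Iterable, Iterator


def response_frames(
    response_id: int, chunks: Iterable[str], end_call: bool = False
) -> Iterator[str]:
    """Yield one JSON frame per text chunk, then a final empty frame
    with content_complete set to true."""
    for chunk in chunks:
        yield json.dumps({
            "response_type": "response",
            "response_id": response_id,
            "content": chunk,
            "content_complete": False,
        })
    final = {
        "response_type": "response",
        "response_id": response_id,
        "content": "",
        "content_complete": True,
    }
    if end_call:
        # Optional flag: ask the agent to hang up after speaking.
        final["end_call"] = True
    yield json.dumps(final)
```

Sending the terminator as a separate empty frame means the generator does not need to know in advance which model chunk is the last one.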

Keep-alive (ping/pong)

If your server sends a keep-alive frame, Vaani echoes it back immediately:
{ "response_type": "ping_pong" }
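Because the echo arrives on the same socket as turn requests, the receive loop needs to tell the two apart. The sketch below assumes the echoed frame has the same shape as the one sent; `classify_frame` and its return labels are hypothetical names chosen for this example.

```python
import json

# Keep-alive frame your server may send periodically.
PING_FRAME = json.dumps({"response_type": "ping_pong"})


def classify_frame(raw: str) -> str:
    """Label an incoming frame as 'pong', 'request', or 'unknown'."""
    msg = json.loads(raw)
    if msg.get("response_type") == "ping_pong":
        return "pong"  # echo of our keep-alive; no reply needed
    if msg.get("interaction_type") == "response_required":
        return "request"  # a normal agent turn
    return "unknown"
```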

Fallback behaviour

You can configure what happens when your server is unreachable or returns an error:
Setting | Behaviour
No fallback | The turn fails silently; the agent waits for the next user utterance
Use platform LLM | The agent falls back to the primary provider configured in the From Providers tab; if that also fails, it tries the fallback provider
The fallback mode is stored as extra_params.fallback in the agent config ("none" or "platform").
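To make the config layout concrete, the sketch below reads both BYOL settings from an agent config held as a nested dictionary. It assumes that `fallback` sits in the same `extra_params` object as `llm_websocket_url` and that `agent_config` is the root object rather than a key; `read_byol_settings` is a hypothetical helper, not a platform API.

```python
from typing import Any, Dict, Optional, Tuple

# Path to extra_params, taken from the storage location described in this guide.
_EXTRA_PARAMS_PATH = ("persona", "senses_capabilities", "brain", "llm", "extra_params")


def read_byol_settings(agent_config: Dict[str, Any]) -> Tuple[Optional[str], Optional[str]]:
    """Return (llm_websocket_url, fallback); either is None when not set."""
    node: Dict[str, Any] = agent_config
    for key in _EXTRA_PARAMS_PATH:
        node = node.get(key, {})
    return node.get("llm_websocket_url"), node.get("fallback")
```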

Session management

Each call opens a new connection at <ws_url>/<call_id>; the call_id (the room name) is appended automatically and serves as the session_id. Your server should use this path component to isolate per-call state (e.g. conversation memory, tool state). When the call ends, Vaani closes the WebSocket cleanly. You can also expose a DELETE /session/{session_id} endpoint on your server so Vaani can explicitly clean up state (see the example ADK server).
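Per-call isolation can be as simple as a dictionary keyed by the last path component. The sketch below is an in-memory illustration only; `SessionStore` and `session_id_from_path` are hypothetical names, and a production server would likely add expiry and locking.

```python
from typing import Any, Dict


def session_id_from_path(path: str) -> str:
    """Return the last non-empty path component,
    e.g. '/chat/stream/room-123' gives 'room-123'."""
    return path.rstrip("/").rsplit("/", 1)[-1]


class SessionStore:
    """In-memory per-call state, keyed by session_id."""

    def __init__(self) -> None:
        self._sessions: Dict[str, Dict[str, Any]] = {}

    def get(self, session_id: str) -> Dict[str, Any]:
        """Return the state dict for a call, creating it on first use."""
        return self._sessions.setdefault(session_id, {})

    def delete(self, session_id: str) -> bool:
        """Drop a call's state; return True if anything was removed."""
        return self._sessions.pop(session_id, None) is not None
```

Calling `delete` both when the socket closes and from a DELETE /session/{session_id} handler keeps cleanup idempotent, since the second call simply returns False.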

Reference implementation

The vaani-adk-byol-example repository contains a complete FastAPI server (adk_server.py) that:
  • Implements the full WebSocket protocol above
  • Runs a Google ADK agent (LlmAgent) with the system prompt injected per session
  • Exposes REST (/chat), SSE (/chat/sse), and WebSocket (/chat/stream/{session_id}) endpoints
  • Can be deployed locally and exposed with ngrok in under 5 minutes
# Install
cd vaani-adk-byol-example
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
cp example.env .env   # add your GOOGLE_API_KEY

# Run
uvicorn adk_server:app --host 0.0.0.0 --port 8090 --reload

# Expose publicly
ngrok http 8090
# → paste wss://xxxx.ngrok-free.app/chat/stream into the BYOL tab