## Endpoint
| Parameter | Description |
|---|---|
| `chat_id` | Unique identifier for the chat session. Used to persist and resume conversation history in Firebase. |
| `token` | Firebase authentication token. Validated on connection before any messages are processed. |
The client IP is resolved from the following headers:

- `X-Forwarded-For`: client IP passed by load balancers
- `CF-Connecting-IP`: client IP passed by Cloudflare
## Connection lifecycle

1. **Authenticate.** The server validates the `token` query parameter against Firebase. If authentication fails, the WebSocket is closed before being added to the connection pool.
2. **Register connection.** The `ConnectionManager` accepts the WebSocket and appends it to `active_connections`. The chat is fetched from Firebase or created if it does not exist.
3. **Enter message loop.** The handler enters a `while True` loop, waiting for text frames from the client. Each received frame is treated as a new user prompt.
4. **Stream response.** The orchestrator runs with `runner.run_streamed(...)`. Events are forwarded to the client as they arrive. See message types below.
5. **Finalize.** After the stream ends, the server sends `end_of_stream`, generates suggested follow-up prompts, persists the updated conversation to Firebase, and records token usage.

## ConnectionManager
`websocket/connection_manager.py` provides a lightweight manager that tracks all live connections and handles safe message delivery.
If a `send_text` call raises an exception (e.g. because the client disconnected silently), `ConnectionManager` removes the stale connection automatically. A `broadcast` method is also available for pushing a message to all active connections simultaneously.
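A minimal sketch of the manager described above. The accept-and-append and drop-on-failure behaviour follows the description; the exact method signatures and the FastAPI WebSocket interface are assumptions:

```python
class ConnectionManager:
    """Tracks live WebSocket connections and delivers messages safely."""

    def __init__(self):
        self.active_connections = []  # all currently open WebSockets

    async def connect(self, websocket):
        await websocket.accept()                  # complete the handshake
        self.active_connections.append(websocket)

    def disconnect(self, websocket):
        if websocket in self.active_connections:
            self.active_connections.remove(websocket)

    async def send_text(self, message, websocket):
        try:
            await websocket.send_text(message)
        except Exception:
            # Client vanished without a close frame: drop the stale entry.
            self.disconnect(websocket)

    async def broadcast(self, message):
        # Iterate over a copy so removals during sending are safe.
        for connection in list(self.active_connections):
            await self.send_text(message, connection)
```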
## Sending a message
The client sends a plain-text JSON frame. The `thread_id` field is tracked for analytics; the conversation history is keyed by `chat_id` (from the URL), not `thread_id`.

## Message types
The server sends a stream of JSON messages for each response. Messages arrive in this order:

### Text delta (`raw_response_event`)
Sent for every token the model generates. Clients should concatenate these to build the full response text in real time.
### Tool call notification (`tool_call`)

Sent as soon as the model decides to invoke a tool, before the tool executes. This powers live “what the AI is doing” indicators in the UI.

The `description` string comes from the `TOOL_CALLS` dictionary in `connectors/orchestrator.py`. If the tool maps to `None`, no `tool_call` message is emitted for that call.
### Tool output (`tool_output`)

Sent after a tool returns, when the tool’s `ToolResponse` has `display_response=True`. Clients can use this to render structured data (e.g. a weather card or search results).
### Agent handoff (`agent_updated` / `handoff`)
Sent when the orchestrator transfers control to a specialized agent.
### Completed LLM response (`llm_response`)
Sent once when the model finishes generating text for a turn, carrying the full assembled response string.
### End of stream (`end_of_stream`)
Signals that the response is complete. Clients should stop appending text deltas when this is received.
### Suggested prompts (`suggested_prompts`)

Sent after `end_of_stream`. Contains AI-generated follow-up prompt suggestions based on the conversation context.
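Putting the message types together, a hypothetical client might look like the sketch below. It uses the third-party `websockets` package; the URL shape, the `message` field in the outgoing frame, and the `type`/`delta`/`prompts` keys in server frames are assumptions to verify against the server code:

```python
import json

def handle_event(event, buffer):
    """Dispatch one server message; text deltas accumulate into `buffer`."""
    kind = event.get("type")                         # assumed discriminator key
    if kind == "raw_response_event":
        buffer.append(event.get("delta", ""))        # build text in real time
    elif kind == "tool_call":
        print("AI is:", event.get("description"))    # live activity indicator
    elif kind == "end_of_stream":
        print("Response complete:", "".join(buffer))
    elif kind == "suggested_prompts":
        print("Follow-ups:", event.get("prompts"))

async def chat(chat_id, token):
    import websockets  # third-party: pip install websockets
    url = f"wss://example.com/ws/{chat_id}?token={token}"  # assumed URL shape
    async with websockets.connect(url) as ws:
        await ws.send(json.dumps({"message": "What's the weather in Paris?",
                                  "thread_id": "t-123"}))
        buffer = []
        async for frame in ws:
            event = json.loads(frame)
            handle_event(event, buffer)
            if event.get("type") == "suggested_prompts":
                break  # last message of the turn
```

Run it with `asyncio.run(chat(chat_id, token))` against a live server.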
## Tool call visualization

The sequence of `tool_call` and `agent_updated` messages before text starts arriving lets clients show the user exactly what the AI is doing in real time. For example:

1. `agent_updated` → “Handing off to AccuWeather”
2. `tool_call` → “Getting Current Weather…”
3. `raw_response_event` deltas start arriving
The description strings come from the `TOOL_CALLS` dictionary, which maps every function tool name to a human-readable present-participle string. Tools not meant to surface progress (e.g. internal state writes) map to `None` and are silently skipped.
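The shape of that mapping can be sketched as follows (the real dictionary lives in `connectors/orchestrator.py`; the tool names here are made-up examples):

```python
# Maps function tool names to progress strings; None means "emit nothing".
TOOL_CALLS = {
    "get_current_weather": "Getting Current Weather…",
    "search_web": "Searching the Web…",
    "update_internal_state": None,  # internal write: no tool_call message
}

def tool_call_description(tool_name):
    """Return the progress string for a tool, or None to skip the message."""
    return TOOL_CALLS.get(tool_name)
```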
## Conversation state and reconnection

Chat history is persisted to Firebase at the end of every turn via `chat_service.update_chat_messages`. When a client reconnects to an existing `chat_id`, the server loads the stored message history and the `last_message_id` from the previous OpenAI response.

On reconnect, the runner uses the stored `previous_response_id` to resume the conversation with full context, without resending the entire message history to the model on every request.
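A sketch of that resume logic, assuming the runner accepts a `previous_response_id` keyword; the helper and field names below are illustrative, not the actual server code:

```python
def resume_kwargs(last_message_id):
    """Build extra runner kwargs for resuming a stored conversation."""
    if last_message_id:
        # The Responses API keeps prior turns server-side, so only the new
        # user prompt needs to be sent alongside this ID.
        return {"previous_response_id": last_message_id}
    return {}  # fresh chat: no prior response to chain from

# Usage inside the message loop (orchestrator and prompt come from context):
#     result = runner.run_streamed(orchestrator, input=prompt,
#                                  **resume_kwargs(chat.last_message_id))
```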