## Endpoint
| Parameter | Description |
|---|---|
| `chat_id` | Unique identifier for the chat session. Used to persist and resume conversation history in Firebase. |
| `token` | Firebase authentication token. Validated on connection before any messages are processed. |
The client IP is resolved from the following headers:

- `X-Forwarded-For`: client IP passed by load balancers
- `CF-Connecting-IP`: client IP passed by Cloudflare
## Connection lifecycle

1. **Authenticate.** The server validates the `token` query parameter against Firebase. If authentication fails, the WebSocket is closed before being added to the connection pool.
2. **Register connection.** The `ConnectionManager` accepts the WebSocket and appends it to `active_connections`. The chat is fetched from Firebase or created if it does not exist.
3. **Enter message loop.** The handler enters a `while True` loop, waiting for text frames from the client. Each received frame is treated as a new user prompt.
4. **Stream response.** The orchestrator runs with `runner.run_streamed(...)`. Events are forwarded to the client as they arrive. See message types below.
5. **Finalize.** After the stream ends, the server sends `end_of_stream`, generates suggested follow-up prompts, persists the updated conversation to Firebase, and records token usage.

## ConnectionManager
`websocket/connection_manager.py` provides a lightweight manager that tracks all live connections and handles safe message delivery.
If a `send_text` call raises an exception (e.g. because the client disconnected silently), `ConnectionManager` removes the stale connection automatically. A `broadcast` method is also available for pushing a message to all active connections simultaneously.
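A minimal sketch of the manager described above. The accept-and-append and drop-on-failure behaviour follows the description; the exact method signatures and the FastAPI WebSocket interface are assumptions:

```python
class ConnectionManager:
    """Tracks live WebSocket connections and delivers messages safely."""

    def __init__(self):
        self.active_connections = []  # all currently open WebSockets

    async def connect(self, websocket):
        await websocket.accept()                  # complete the handshake
        self.active_connections.append(websocket)

    def disconnect(self, websocket):
        if websocket in self.active_connections:
            self.active_connections.remove(websocket)

    async def send_text(self, message, websocket):
        try:
            await websocket.send_text(message)
        except Exception:
            # Client vanished without a close frame: drop the stale entry.
            self.disconnect(websocket)

    async def broadcast(self, message):
        # Iterate over a copy so removals during sending are safe.
        for connection in list(self.active_connections):
            await self.send_text(message, connection)
```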
## Sending a message
The client sends a plain-text JSON frame. The `thread_id` field is tracked for analytics; the conversation history is keyed by `chat_id` (from the URL), not `thread_id`.

## Message types
The server sends a stream of JSON messages for each response. Messages arrive in this order:

### Text delta (`raw_response_event`)
Sent for every token the model generates. Clients should concatenate these to build the full response text in real time.
### Tool call notification (`tool_call`)

Sent as soon as the model decides to invoke a tool, before the tool executes. This powers live “what the AI is doing” indicators in the UI.

The `description` string comes from the `TOOL_CALLS` dictionary in `connectors/orchestrator.py`. If the tool maps to `None`, no `tool_call` message is emitted for that call.
### Tool output (`tool_output`)

Sent after a tool returns, when the tool’s `ToolResponse` has `display_response=True`. Clients can use this to render structured data (e.g. a weather card or search results).
### Agent handoff (`agent_updated` / `handoff`)
Sent when the orchestrator transfers control to a specialized agent.
### Completed LLM response (`llm_response`)
Sent once when the model finishes generating text for a turn, carrying the full assembled response string.
### End of stream (`end_of_stream`)
Signals that the response is complete. Clients should stop appending text deltas when this is received.
### Suggested prompts (`suggested_prompts`)

Sent after `end_of_stream`. Contains AI-generated follow-up prompt suggestions based on the conversation context.
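Putting the message types together, a hypothetical client might look like the sketch below. It uses the third-party `websockets` package; the URL shape, the `message` field in the outgoing frame, and the `type`/`delta`/`prompts` keys in server frames are assumptions to verify against the server code:

```python
import json

def handle_event(event, buffer):
    """Dispatch one server message; text deltas accumulate into `buffer`."""
    kind = event.get("type")                         # assumed discriminator key
    if kind == "raw_response_event":
        buffer.append(event.get("delta", ""))        # build text in real time
    elif kind == "tool_call":
        print("AI is:", event.get("description"))    # live activity indicator
    elif kind == "end_of_stream":
        print("Response complete:", "".join(buffer))
    elif kind == "suggested_prompts":
        print("Follow-ups:", event.get("prompts"))

async def chat(chat_id, token):
    import websockets  # third-party: pip install websockets
    url = f"wss://example.com/ws/{chat_id}?token={token}"  # assumed URL shape
    async with websockets.connect(url) as ws:
        await ws.send(json.dumps({"message": "What's the weather in Paris?",
                                  "thread_id": "t-123"}))
        buffer = []
        async for frame in ws:
            event = json.loads(frame)
            handle_event(event, buffer)
            if event.get("type") == "suggested_prompts":
                break  # last message of the turn
```

Run it with `asyncio.run(chat(chat_id, token))` against a live server.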
## Tool call visualization

The sequence of `tool_call` and `agent_updated` messages before text starts arriving lets clients show the user exactly what the AI is doing in real time. For example:

1. `agent_updated` → “Handing off to AccuWeather”
2. `tool_call` → “Getting Current Weather…”
3. `raw_response_event` deltas start arriving
The description strings come from the `TOOL_CALLS` dictionary, which maps every function tool name to a human-readable present-participle string. Tools not meant to surface progress (e.g. internal state writes) map to `None` and are silently skipped.
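The shape of that mapping can be sketched as follows (the real dictionary lives in `connectors/orchestrator.py`; the tool names here are made-up examples):

```python
# Maps function tool names to progress strings; None means "emit nothing".
TOOL_CALLS = {
    "get_current_weather": "Getting Current Weather…",
    "search_web": "Searching the Web…",
    "update_internal_state": None,  # internal write: no tool_call message
}

def tool_call_description(tool_name):
    """Return the progress string for a tool, or None to skip the message."""
    return TOOL_CALLS.get(tool_name)
```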
## Conversation state and reconnection

Chat history is persisted to Firebase at the end of every turn via `chat_service.update_chat_messages`. When a client reconnects to an existing `chat_id`, the server loads the stored message history and the `last_message_id` from the previous OpenAI response.

On reconnect, the runner uses the stored `previous_response_id` to resume the conversation with full context, without resending the entire message history to the model on every request.
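A sketch of that resume logic, assuming the runner accepts a `previous_response_id` keyword; the helper and field names below are illustrative, not the actual server code:

```python
def resume_kwargs(last_message_id):
    """Build extra runner kwargs for resuming a stored conversation."""
    if last_message_id:
        # The Responses API keeps prior turns server-side, so only the new
        # user prompt needs to be sent alongside this ID.
        return {"previous_response_id": last_message_id}
    return {}  # fresh chat: no prior response to chain from

# Usage inside the message loop (orchestrator and prompt come from context):
#     result = runner.run_streamed(orchestrator, input=prompt,
#                                  **resume_kwargs(chat.last_message_id))
```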