Overview
The LLM module provides abstract base classes for the different types of language models in Vision Agents. These classes define the interface that all LLM implementations must follow.

Base Classes
LLM
The base class for all language model implementations.

Location: vision_agents.core.llm.llm.LLM
Abstract Methods
Generate a response from text input.

Parameters:
- text (str): The input text to process
- processors (Optional[List[Processor]]): Optional list of processors to apply
- participant (Optional[Participant]): Optional participant information
Returns: LLMResponseEvent[Any] - Response event containing the model's output
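The contract above can be sketched as a minimal abstract base class. Note that the method name `simple_response` and the shape of `LLMResponseEvent` are illustrative assumptions, not the library's exact API:

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Generic, List, Optional, TypeVar

T = TypeVar("T")

@dataclass
class LLMResponseEvent(Generic[T]):
    # Assumed shape: the provider-native payload plus the extracted text.
    original: T
    text: str

class LLM(ABC):
    @abstractmethod
    async def simple_response(
        self,
        text: str,
        processors: Optional[List[Any]] = None,
        participant: Optional[Any] = None,
    ) -> LLMResponseEvent[Any]:
        """Generate a response from text input."""

class EchoLLM(LLM):
    # Toy subclass showing the contract in action.
    async def simple_response(self, text, processors=None, participant=None):
        return LLMResponseEvent(original=None, text=f"echo: {text}")
```

A concrete provider implementation would call its backend inside `simple_response` and wrap the native response object in the event.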
Key Methods

Decorator to register a function with the LLM's function registry.

Parameters:
- name (Optional[str]): Custom name for the function. Defaults to the function name.
- description (Optional[str]): Function description. Defaults to the docstring.
Call a registered function with the given arguments.

Parameters:
- name (str): Name of the function to call
- arguments (Dict[str, Any]): Dictionary of arguments to pass
Set instructions for the LLM.

Parameters:
- instructions (Instructions | str): Instructions object or string
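The registration decorator and call mechanism described above follow a common registry pattern. The sketch below is a standalone illustration; the names `register_function` and `call_function` and their exact signatures are assumptions based on the descriptions, not the library's verbatim API:

```python
from typing import Any, Callable, Dict, Optional

class FunctionRegistry:
    """Minimal sketch of an LLM function registry (names are illustrative)."""

    def __init__(self) -> None:
        self._functions: Dict[str, Callable[..., Any]] = {}
        self._descriptions: Dict[str, str] = {}

    def register_function(
        self,
        name: Optional[str] = None,
        description: Optional[str] = None,
    ) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
        # Decorator: name defaults to the function name,
        # description defaults to the docstring.
        def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
            key = name or fn.__name__
            self._functions[key] = fn
            self._descriptions[key] = description or (fn.__doc__ or "")
            return fn
        return decorator

    def call_function(self, name: str, arguments: Dict[str, Any]) -> Any:
        # Look up the registered function and invoke it with keyword arguments.
        return self._functions[name](**arguments)

registry = FunctionRegistry()

@registry.register_function(description="Add two integers.")
def add(a: int, b: int) -> int:
    return a + b
```

With this in place, `registry.call_function("add", {"a": 2, "b": 3})` returns `5`, mirroring how an LLM dispatches a tool call by name with a dictionary of arguments.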
Provider-Specific Methods (Override in Subclasses)
Convert ToolSchema objects to a provider-specific format.

Parameters:
- tools (List[ToolSchema]): List of ToolSchema objects
Returns: List[Dict[str, Any]] - Tools in provider-specific format

Extract tool calls from a provider-specific response.

Parameters:
- response (Any): Provider-specific response object
Returns: List[NormalizedToolCallItem] - Normalized tool call items

Create tool result messages for the provider.

Parameters:
- tool_calls (List[NormalizedToolCallItem]): List of executed tool calls
- results (List[Any]): List of results from function execution
Returns: List[Dict[str, Any]] - Tool result messages in provider format
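To make the conversion hooks concrete, here is a sketch of what an OpenAI-style adapter might do. The dictionary layouts, and the stand-ins for `ToolSchema` and `NormalizedToolCallItem`, are assumptions for illustration; the real classes live in vision_agents and each provider defines its own wire format:

```python
from typing import Any, Dict, List

# Illustrative stand-ins; the real ToolSchema and NormalizedToolCallItem
# classes in vision_agents may carry more structure.
ToolSchema = Dict[str, Any]
NormalizedToolCallItem = Dict[str, Any]

def convert_tools(tools: List[ToolSchema]) -> List[Dict[str, Any]]:
    # Map a generic schema to an OpenAI-style "function" tool entry.
    return [
        {
            "type": "function",
            "function": {
                "name": t["name"],
                "description": t.get("description", ""),
                "parameters": t.get("parameters", {}),
            },
        }
        for t in tools
    ]

def create_tool_results(
    tool_calls: List[NormalizedToolCallItem], results: List[Any]
) -> List[Dict[str, Any]]:
    # Pair each executed call with a result message the provider accepts.
    return [
        {"role": "tool", "tool_call_id": call["id"], "content": str(result)}
        for call, result in zip(tool_calls, results)
    ]
```

Subclasses for other providers would override these hooks with their own payload shapes while keeping the normalized representation in between.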
AudioLLM

Base class for LLMs capable of processing speech-to-speech audio. These models do not require separate TTS and STT services.

Location: vision_agents.core.llm.llm.AudioLLM
Abstract Methods
Process PCM audio frames and generate a response.

Parameters:
- pcm (PcmData): PCM audio frame to process (typically 48 kHz mono, 16-bit)
- participant (Optional[Participant]): Optional participant information
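A minimal sketch of the audio contract, using the `simple_audio_response` name listed under OmniLLM's inherited methods. The `PcmData` shape shown here is an assumption for illustration; the real class carries more detail:

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, List, Optional

@dataclass
class PcmData:
    # Assumed minimal shape: a 48 kHz mono, 16-bit frame.
    sample_rate: int
    samples: bytes

class AudioLLM(ABC):
    @abstractmethod
    async def simple_audio_response(
        self, pcm: PcmData, participant: Optional[Any] = None
    ) -> None:
        """Process a PCM audio frame; speech output is emitted as events."""

class CountingAudioLLM(AudioLLM):
    # Toy subclass that just records the frames it receives.
    def __init__(self) -> None:
        self.frames: List[PcmData] = []

    async def simple_audio_response(self, pcm, participant=None):
        self.frames.append(pcm)
```

Because the model is speech-to-speech, a real implementation streams the frame to the provider and emits audio back through the event system rather than returning text.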
VideoLLM
Base class for LLMs capable of processing video input.

Location: vision_agents.core.llm.llm.VideoLLM
Abstract Methods
Watch and forward video tracks.

Parameters:
- track (aiortc.mediastreams.MediaStreamTrack): Video track to watch
- shared_forwarder (Optional[VideoForwarder]): Optional shared VideoForwarder instance to use instead of creating a new one. Allows multiple consumers to share the same video stream.
Stop watching the video track and clean up resources.
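The shared-forwarder pattern above lets several consumers read one video stream without duplicating it. The sketch below illustrates the idea with plain callables instead of real aiortc tracks; the `VideoForwarder` internals shown here are assumptions, not the library's implementation:

```python
from typing import Any, Callable, List, Optional

class VideoForwarder:
    """Sketch: fans one track out to many consumers (illustrative only)."""

    def __init__(self, track: Any) -> None:
        self.track = track
        self._consumers: List[Callable[[Any], None]] = []

    def add_consumer(self, on_frame: Callable[[Any], None]) -> None:
        self._consumers.append(on_frame)

    def push_frame(self, frame: Any) -> None:
        # Deliver each incoming frame to every registered consumer.
        for consumer in self._consumers:
            consumer(frame)

class VideoLLM:
    # Sketch: reuse a shared forwarder when one is provided, so several
    # components can consume the same video stream.
    def __init__(self) -> None:
        self._forwarder: Optional[VideoForwarder] = None
        self.frames: List[Any] = []

    def watch_video_track(
        self, track: Any, shared_forwarder: Optional[VideoForwarder] = None
    ) -> None:
        self._forwarder = shared_forwarder or VideoForwarder(track)
        self._forwarder.add_consumer(self.frames.append)

    def stop_watching_video_track(self) -> None:
        self._forwarder = None
```

Passing the same `shared_forwarder` to two watchers means both receive every pushed frame while the underlying track is read only once.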
OmniLLM
Base class for LLMs capable of both video and speech-to-speech audio processing. Combines AudioLLM and VideoLLM capabilities.
Location: vision_agents.core.llm.llm.OmniLLM
Inherited Methods
OmniLLM inherits all abstract methods from both AudioLLM and VideoLLM:
- simple_audio_response() - from AudioLLM
- watch_video_track() - from VideoLLM
- stop_watching_video_track() - from VideoLLM
Properties
- Reference to the agent using this LLM. Set automatically by the agent.
- Event manager for emitting LLM events (tool calls, responses, etc.).
- Registry of functions available for tool calling.
Tool Execution
The base LLM class provides built-in support for concurrent tool execution with timeout and deduplication.
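The library's actual executor is internal, but the pattern it names, running tool calls concurrently with a per-call timeout and deduplicating identical calls, can be sketched as follows (all names and the timeout default are illustrative assumptions):

```python
import asyncio
import json
from typing import Any, Callable, Dict, List, Tuple

async def execute_tool_calls(
    calls: List[Dict[str, Any]],
    functions: Dict[str, Callable[..., Any]],
    timeout: float = 10.0,
) -> List[Any]:
    """Run tool calls concurrently, with a per-call timeout and dedup."""
    seen: Dict[Tuple[str, str], "asyncio.Task[Any]"] = {}

    async def run_one(name: str, arguments: Dict[str, Any]) -> Any:
        try:
            # Run the (possibly blocking) function off the event loop.
            return await asyncio.wait_for(
                asyncio.to_thread(functions[name], **arguments), timeout
            )
        except asyncio.TimeoutError:
            return {"error": f"tool {name!r} timed out after {timeout}s"}

    tasks = []
    for call in calls:
        # Deduplicate identical (name, arguments) pairs: run once, share the result.
        key = (call["name"], json.dumps(call["arguments"], sort_keys=True))
        if key not in seen:
            seen[key] = asyncio.create_task(run_one(call["name"], call["arguments"]))
        tasks.append(seen[key])

    await asyncio.gather(*seen.values())
    # Awaiting an already-finished task simply returns its stored result.
    return [await t for t in tasks]
```

If the model requests the same tool with the same arguments twice, the function runs once and both call sites receive the shared result; a call that exceeds the timeout yields an error payload instead of blocking the batch.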
Events

LLM implementations emit the following events:
- ToolStartEvent - When a tool execution starts
- ToolEndEvent - When a tool execution completes (success or failure)
- LLMResponseChunkEvent - For streaming responses
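The event names above are the library's; the subscribe/emit mechanics and event fields in this sketch are assumptions to show how a consumer might observe tool execution:

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Type

# Field names on these events are illustrative guesses, not the real schema.
@dataclass
class ToolStartEvent:
    tool_name: str

@dataclass
class ToolEndEvent:
    tool_name: str
    success: bool
    result: Any = None

@dataclass
class LLMResponseChunkEvent:
    delta: str

class EventManager:
    """Sketch of a subscribe/emit event manager (the real API may differ)."""

    def __init__(self) -> None:
        self._handlers: Dict[Type, List[Callable[[Any], None]]] = {}

    def subscribe(self, event_type: Type, handler: Callable[[Any], None]) -> None:
        self._handlers.setdefault(event_type, []).append(handler)

    def emit(self, event: Any) -> None:
        # Dispatch to handlers registered for this exact event type.
        for handler in self._handlers.get(type(event), []):
            handler(event)
```

A consumer would subscribe to `ToolEndEvent` to log tool outcomes, or to `LLMResponseChunkEvent` to stream partial text to a UI as it arrives.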