Overview

The LLM module provides abstract base classes for different types of language models in Vision Agents. These classes define the interface that all LLM implementations must follow.

Base Classes

LLM

The base class for all language model implementations. Location: vision_agents.core.llm.llm.LLM
from typing import Any, List, Optional

from vision_agents.core.llm.llm import LLM

class MyLLM(LLM):
    async def simple_response(
        self,
        text: str,
        processors: Optional[List[Processor]] = None,
        participant: Optional[Participant] = None,
    ) -> LLMResponseEvent[Any]:
        # Implementation here
        pass

Abstract Methods

simple_response
async method
required
Generate a response from text input.

Parameters:
  • text (str): The input text to process
  • processors (Optional[List[Processor]]): Optional list of processors to apply
  • participant (Optional[Participant]): Optional participant information
Returns: LLMResponseEvent[Any] - Response event containing the model’s output

Key Methods

register_function
method
Decorator to register a function with the LLM’s function registry.

Parameters:
  • name (Optional[str]): Custom name for the function. Defaults to the function name.
  • description (Optional[str]): Function description. Defaults to the docstring.
Returns: Decorator function

Example:
@llm.register_function(name="get_weather", description="Get current weather")
async def get_weather(location: str) -> dict:
    return {"temperature": 72, "condition": "sunny"}
call_function
async method
Call a registered function with the given arguments.

Parameters:
  • name (str): Name of the function to call
  • arguments (Dict[str, Any]): Dictionary of arguments to pass
Returns: Result of the function call
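The register/call pattern above can be sketched in plain Python. This is a minimal illustration of the mechanism, not the library’s actual FunctionRegistry implementation; the `MiniRegistry` class and its internals are hypothetical:

```python
import asyncio
from typing import Any, Callable, Dict, Optional

class MiniRegistry:
    """Minimal sketch of decorator-based function registration plus call-by-name."""

    def __init__(self) -> None:
        self._functions: Dict[str, Callable[..., Any]] = {}

    def register_function(self, name: Optional[str] = None, description: Optional[str] = None):
        # Returns a decorator, mirroring the register_function API described above.
        def decorator(fn: Callable[..., Any]):
            self._functions[name or fn.__name__] = fn
            return fn
        return decorator

    async def call_function(self, name: str, arguments: Dict[str, Any]) -> Any:
        # Look up the function by name and invoke it with keyword arguments.
        result = self._functions[name](**arguments)
        if asyncio.iscoroutine(result):
            result = await result
        return result

registry = MiniRegistry()

@registry.register_function(name="get_weather", description="Get current weather")
async def get_weather(location: str) -> dict:
    return {"temperature": 72, "condition": "sunny"}

print(asyncio.run(registry.call_function("get_weather", {"location": "Boulder"})))
# → {'temperature': 72, 'condition': 'sunny'}
```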
set_instructions
method
Set instructions for the LLM.

Parameters:
  • instructions (Instructions | str): Instructions object or string

Provider-Specific Methods (Override in Subclasses)

_convert_tools_to_provider_format
method
Convert ToolSchema objects to provider-specific format.

Parameters:
  • tools (List[ToolSchema]): List of ToolSchema objects
Returns: List[Dict[str, Any]] - Tools in provider-specific format
_extract_tool_calls_from_response
method
Extract tool calls from provider-specific response.

Parameters:
  • response (Any): Provider-specific response object
Returns: List[NormalizedToolCallItem] - Normalized tool call items
_create_tool_result_message
method
Create tool result messages for the provider.

Parameters:
  • tool_calls (List[NormalizedToolCallItem]): List of executed tool calls
  • results (List[Any]): List of results from function execution
Returns: List[Dict[str, Any]] - Tool result messages in provider format
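As one concrete illustration of `_convert_tools_to_provider_format`, a provider override might map generic tool schemas onto the OpenAI function-calling shape. The input dicts below stand in for ToolSchema objects; the field names are assumptions for this sketch, not the library’s actual attributes:

```python
from typing import Any, Dict, List

def convert_tools_to_openai_format(tools: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Sketch: map generic tool schemas to OpenAI's function-calling tool format."""
    return [
        {
            "type": "function",
            "function": {
                "name": t["name"],
                "description": t.get("description", ""),
                # Fall back to an empty JSON-Schema object if no parameters are declared.
                "parameters": t.get("parameters", {"type": "object", "properties": {}}),
            },
        }
        for t in tools
    ]

tools = [{
    "name": "get_weather",
    "description": "Get current weather",
    "parameters": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"],
    },
}]
print(convert_tools_to_openai_format(tools))
```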

AudioLLM

Base class for LLMs capable of processing speech-to-speech audio. These models do not require separate TTS and STT services. Location: vision_agents.core.llm.llm.AudioLLM
from typing import Optional

from vision_agents.core.llm.llm import AudioLLM

class MyAudioLLM(AudioLLM):
    async def simple_audio_response(
        self,
        pcm: PcmData,
        participant: Optional[Participant] = None
    ):
        # Process audio and generate audio response
        pass

Abstract Methods

simple_audio_response
async method
required
Process PCM audio frames and generate a response.

Parameters:
  • pcm (PcmData): PCM audio frame to process (typically 48 kHz mono, 16-bit)
  • participant (Optional[Participant]): Optional participant information
The audio should be raw PCM matching the model’s expected format.
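To make the expected format concrete, here is a sketch that builds one 20 ms frame of 48 kHz mono 16-bit PCM, the kind of raw buffer a PcmData frame would carry. The frame length and helper name are illustrative assumptions, not part of the library API:

```python
import array
import math

SAMPLE_RATE = 48_000  # 48 kHz mono, 16-bit signed, as noted above
FRAME_MS = 20         # a common frame duration for real-time audio

def make_sine_frame(freq_hz: float = 440.0) -> bytes:
    """Build one 20 ms frame of 16-bit signed PCM (a stand-in for PcmData)."""
    n_samples = SAMPLE_RATE * FRAME_MS // 1000  # 960 samples
    samples = array.array("h", (
        int(32767 * 0.5 * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE))
        for i in range(n_samples)
    ))
    return samples.tobytes()

frame = make_sine_frame()
print(len(frame))  # 960 samples * 2 bytes per sample = 1920 bytes
```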

VideoLLM

Base class for LLMs capable of processing video input. Location: vision_agents.core.llm.llm.VideoLLM
from typing import Optional

from vision_agents.core.llm.llm import VideoLLM
import aiortc

class MyVideoLLM(VideoLLM):
    async def watch_video_track(
        self,
        track: aiortc.mediastreams.MediaStreamTrack,
        shared_forwarder: Optional[VideoForwarder] = None,
    ) -> None:
        # Start watching video track
        pass
    
    async def stop_watching_video_track(self) -> None:
        # Stop watching video track
        pass

Abstract Methods

watch_video_track
async method
required
Watch and forward video tracks.

Parameters:
  • track (aiortc.mediastreams.MediaStreamTrack): Video track to watch
  • shared_forwarder (Optional[VideoForwarder]): Optional shared VideoForwarder instance to use instead of creating a new one. Allows multiple consumers to share the same video stream.
stop_watching_video_track
async method
required
Stop watching the video track and clean up resources.

OmniLLM

Base class for LLMs capable of both video and speech-to-speech audio processing. Combines AudioLLM and VideoLLM capabilities. Location: vision_agents.core.llm.llm.OmniLLM
from typing import Optional

import aiortc

from vision_agents.core.llm.llm import OmniLLM

class MyOmniLLM(OmniLLM):
    async def simple_audio_response(
        self,
        pcm: PcmData,
        participant: Optional[Participant] = None
    ):
        # Process audio
        pass
    
    async def watch_video_track(
        self,
        track: aiortc.mediastreams.MediaStreamTrack,
        shared_forwarder: Optional[VideoForwarder] = None,
    ) -> None:
        # Watch video
        pass
    
    async def stop_watching_video_track(self) -> None:
        # Stop watching video
        pass

Inherited Methods

OmniLLM inherits all abstract methods from both AudioLLM and VideoLLM:
  • simple_audio_response() - from AudioLLM
  • watch_video_track() - from VideoLLM
  • stop_watching_video_track() - from VideoLLM

Properties

agent
Optional[Agent]
Reference to the agent using this LLM. Set automatically by the agent.
events
EventManager
Event manager for emitting LLM events (tool calls, responses, etc.).
function_registry
FunctionRegistry
Registry of functions available for tool calling.

Tool Execution

The base LLM class provides built-in support for concurrent tool execution with timeout and deduplication:
# Tools are automatically executed with:
# - Concurrency limit (default: 8)
# - Timeout per tool (default: 30s)
# - Automatic deduplication
# - Event emission (ToolStartEvent, ToolEndEvent)
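The combination of a concurrency limit, per-tool timeout, and deduplication can be sketched with `asyncio.Semaphore` and `asyncio.wait_for`. This is an illustrative reimplementation under the defaults listed above, not the base class’s actual code:

```python
import asyncio
from typing import Any, Callable, Dict, List, Tuple

MAX_CONCURRENCY = 8   # default concurrency limit documented above
TOOL_TIMEOUT_S = 30.0  # default per-tool timeout documented above

async def execute_tools(
    calls: List[Tuple[str, Dict[str, Any]]],
    functions: Dict[str, Callable[..., Any]],
) -> List[Any]:
    """Sketch: run tool calls concurrently with a limit, a per-call timeout,
    and deduplication of identical (name, arguments) pairs."""
    sem = asyncio.Semaphore(MAX_CONCURRENCY)
    seen = set()
    unique: List[Tuple[str, Dict[str, Any]]] = []
    for name, args in calls:
        key = (name, tuple(sorted(args.items())))
        if key not in seen:  # drop exact duplicate calls
            seen.add(key)
            unique.append((name, args))

    async def run_one(name: str, args: Dict[str, Any]) -> Any:
        async with sem:  # at most MAX_CONCURRENCY tools run at once
            try:
                return await asyncio.wait_for(functions[name](**args), TOOL_TIMEOUT_S)
            except asyncio.TimeoutError:
                return {"error": f"{name} timed out"}

    return await asyncio.gather(*(run_one(n, a) for n, a in unique))

async def add(a: int, b: int) -> int:
    return a + b

results = asyncio.run(execute_tools(
    [("add", {"a": 1, "b": 2}), ("add", {"a": 1, "b": 2})],  # duplicate is dropped
    {"add": add},
))
print(results)  # → [3]
```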

Events

LLM implementations emit the following events:
  • ToolStartEvent - When a tool execution starts
  • ToolEndEvent - When a tool execution completes (success or failure)
  • LLMResponseChunkEvent - For streaming responses
See Events for more details.
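A subscribe-and-emit pattern for events like these can be sketched as follows. The EventManager API itself is documented on the Events page; `MiniEventManager` below is a generic illustration only, and its method names are assumptions:

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List

class MiniEventManager:
    """Sketch of a subscribe/emit event bus keyed by event-type name."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Callable[[Any], None]) -> None:
        self._handlers[event_type].append(handler)

    def emit(self, event_type: str, payload: Any) -> None:
        # Every handler registered for this event type is invoked in order.
        for handler in self._handlers[event_type]:
            handler(payload)

events = MiniEventManager()
log = []
events.subscribe("ToolStartEvent", lambda e: log.append(("start", e["tool"])))
events.subscribe("ToolEndEvent", lambda e: log.append(("end", e["tool"])))

events.emit("ToolStartEvent", {"tool": "get_weather"})
events.emit("ToolEndEvent", {"tool": "get_weather"})
print(log)  # → [('start', 'get_weather'), ('end', 'get_weather')]
```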
