Overview
The LLM module provides abstract base classes for the different types of language models in Vision Agents. These classes define the interface that all LLM implementations must follow.

Base Classes
LLM
The base class for all language model implementations.

Location: vision_agents.core.llm.llm.LLM
Abstract Methods
Generate a response from text input.

Parameters:
- text (str): The input text to process
- processors (Optional[List[Processor]]): Optional list of processors to apply
- participant (Optional[Participant]): Optional participant information
Returns: LLMResponseEvent[Any] - Response event containing the model's output
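The contract above can be sketched as a minimal abstract base class. Note that the method name `simple_response` and the shape of `LLMResponseEvent` are illustrative assumptions, not the library's exact API:

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Generic, List, Optional, TypeVar

T = TypeVar("T")

@dataclass
class LLMResponseEvent(Generic[T]):
    # Assumed shape: the provider-native payload plus the extracted text.
    original: T
    text: str

class LLM(ABC):
    @abstractmethod
    async def simple_response(
        self,
        text: str,
        processors: Optional[List[Any]] = None,
        participant: Optional[Any] = None,
    ) -> LLMResponseEvent[Any]:
        """Generate a response from text input."""

class EchoLLM(LLM):
    # Toy subclass showing the contract in action.
    async def simple_response(self, text, processors=None, participant=None):
        return LLMResponseEvent(original=None, text=f"echo: {text}")
```

A concrete provider implementation would call its backend inside `simple_response` and wrap the native response object in the event.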
Key Methods

Decorator to register a function with the LLM's function registry.

Parameters:
- name (Optional[str]): Custom name for the function. Defaults to the function name.
- description (Optional[str]): Function description. Defaults to the docstring.
Call a registered function with the given arguments.

Parameters:
- name (str): Name of the function to call
- arguments (Dict[str, Any]): Dictionary of arguments to pass
Set instructions for the LLM.

Parameters:
- instructions (Instructions | str): Instructions object or string
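The registration decorator and call mechanism described above follow a common registry pattern. The sketch below is a standalone illustration; the names `register_function` and `call_function` and their exact signatures are assumptions based on the descriptions, not the library's verbatim API:

```python
from typing import Any, Callable, Dict, Optional

class FunctionRegistry:
    """Minimal sketch of an LLM function registry (names are illustrative)."""

    def __init__(self) -> None:
        self._functions: Dict[str, Callable[..., Any]] = {}
        self._descriptions: Dict[str, str] = {}

    def register_function(
        self,
        name: Optional[str] = None,
        description: Optional[str] = None,
    ) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
        # Decorator: name defaults to the function name,
        # description defaults to the docstring.
        def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
            key = name or fn.__name__
            self._functions[key] = fn
            self._descriptions[key] = description or (fn.__doc__ or "")
            return fn
        return decorator

    def call_function(self, name: str, arguments: Dict[str, Any]) -> Any:
        # Look up the registered function and invoke it with keyword arguments.
        return self._functions[name](**arguments)

registry = FunctionRegistry()

@registry.register_function(description="Add two integers.")
def add(a: int, b: int) -> int:
    return a + b
```

With this in place, `registry.call_function("add", {"a": 2, "b": 3})` returns `5`, mirroring how an LLM dispatches a tool call by name with a dictionary of arguments.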
Provider-Specific Methods (Override in Subclasses)
Convert ToolSchema objects to a provider-specific format.

Parameters:
- tools (List[ToolSchema]): List of ToolSchema objects
Returns: List[Dict[str, Any]] - Tools in provider-specific format

Extract tool calls from a provider-specific response.

Parameters:
- response (Any): Provider-specific response object
Returns: List[NormalizedToolCallItem] - Normalized tool call items

Create tool result messages for the provider.

Parameters:
- tool_calls (List[NormalizedToolCallItem]): List of executed tool calls
- results (List[Any]): List of results from function execution
Returns: List[Dict[str, Any]] - Tool result messages in provider format
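To make the conversion hooks concrete, here is a sketch of what an OpenAI-style adapter might do. The dictionary layouts, and the stand-ins for `ToolSchema` and `NormalizedToolCallItem`, are assumptions for illustration; the real classes live in vision_agents and each provider defines its own wire format:

```python
from typing import Any, Dict, List

# Illustrative stand-ins; the real ToolSchema and NormalizedToolCallItem
# classes in vision_agents may carry more structure.
ToolSchema = Dict[str, Any]
NormalizedToolCallItem = Dict[str, Any]

def convert_tools(tools: List[ToolSchema]) -> List[Dict[str, Any]]:
    # Map a generic schema to an OpenAI-style "function" tool entry.
    return [
        {
            "type": "function",
            "function": {
                "name": t["name"],
                "description": t.get("description", ""),
                "parameters": t.get("parameters", {}),
            },
        }
        for t in tools
    ]

def create_tool_results(
    tool_calls: List[NormalizedToolCallItem], results: List[Any]
) -> List[Dict[str, Any]]:
    # Pair each executed call with a result message the provider accepts.
    return [
        {"role": "tool", "tool_call_id": call["id"], "content": str(result)}
        for call, result in zip(tool_calls, results)
    ]
```

Subclasses for other providers would override these hooks with their own payload shapes while keeping the normalized representation in between.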
AudioLLM

Base class for LLMs capable of processing speech-to-speech audio. These models do not require separate TTS and STT services.

Location: vision_agents.core.llm.llm.AudioLLM
Abstract Methods
Process PCM audio frames and generate a response.

Parameters:
- pcm (PcmData): PCM audio frame to process (typically 48 kHz mono, 16-bit)
- participant (Optional[Participant]): Optional participant information
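A minimal sketch of the audio contract, using the `simple_audio_response` name listed under OmniLLM's inherited methods. The `PcmData` shape shown here is an assumption for illustration; the real class carries more detail:

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, List, Optional

@dataclass
class PcmData:
    # Assumed minimal shape: a 48 kHz mono, 16-bit frame.
    sample_rate: int
    samples: bytes

class AudioLLM(ABC):
    @abstractmethod
    async def simple_audio_response(
        self, pcm: PcmData, participant: Optional[Any] = None
    ) -> None:
        """Process a PCM audio frame; speech output is emitted as events."""

class CountingAudioLLM(AudioLLM):
    # Toy subclass that just records the frames it receives.
    def __init__(self) -> None:
        self.frames: List[PcmData] = []

    async def simple_audio_response(self, pcm, participant=None):
        self.frames.append(pcm)
```

Because the model is speech-to-speech, a real implementation streams the frame to the provider and emits audio back through the event system rather than returning text.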
VideoLLM
Base class for LLMs capable of processing video input.

Location: vision_agents.core.llm.llm.VideoLLM
Abstract Methods
Watch and forward video tracks.

Parameters:
- track (aiortc.mediastreams.MediaStreamTrack): Video track to watch
- shared_forwarder (Optional[VideoForwarder]): Optional shared VideoForwarder instance to use instead of creating a new one. Allows multiple consumers to share the same video stream.
Stop watching the video track and clean up resources.
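The shared-forwarder pattern above lets several consumers read one video stream without duplicating it. The sketch below illustrates the idea with plain callables instead of real aiortc tracks; the `VideoForwarder` internals shown here are assumptions, not the library's implementation:

```python
from typing import Any, Callable, List, Optional

class VideoForwarder:
    """Sketch: fans one track out to many consumers (illustrative only)."""

    def __init__(self, track: Any) -> None:
        self.track = track
        self._consumers: List[Callable[[Any], None]] = []

    def add_consumer(self, on_frame: Callable[[Any], None]) -> None:
        self._consumers.append(on_frame)

    def push_frame(self, frame: Any) -> None:
        # Deliver each incoming frame to every registered consumer.
        for consumer in self._consumers:
            consumer(frame)

class VideoLLM:
    # Sketch: reuse a shared forwarder when one is provided, so several
    # components can consume the same video stream.
    def __init__(self) -> None:
        self._forwarder: Optional[VideoForwarder] = None
        self.frames: List[Any] = []

    def watch_video_track(
        self, track: Any, shared_forwarder: Optional[VideoForwarder] = None
    ) -> None:
        self._forwarder = shared_forwarder or VideoForwarder(track)
        self._forwarder.add_consumer(self.frames.append)

    def stop_watching_video_track(self) -> None:
        self._forwarder = None
```

Passing the same `shared_forwarder` to two watchers means both receive every pushed frame while the underlying track is read only once.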
OmniLLM
Base class for LLMs capable of both video and speech-to-speech audio processing. Combines AudioLLM and VideoLLM capabilities.
Location: vision_agents.core.llm.llm.OmniLLM
Inherited Methods
OmniLLM inherits all abstract methods from both AudioLLM and VideoLLM:
- simple_audio_response() - from AudioLLM
- watch_video_track() - from VideoLLM
- stop_watching_video_track() - from VideoLLM
Properties
- Reference to the agent using this LLM. Set automatically by the agent.
- Event manager for emitting LLM events (tool calls, responses, etc.).
- Registry of functions available for tool calling.
Tool Execution
The base LLM class provides built-in support for concurrent tool execution with timeout and deduplication.
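The library's actual executor is internal, but the pattern it names, running tool calls concurrently with a per-call timeout and deduplicating identical calls, can be sketched as follows (all names and the timeout default are illustrative assumptions):

```python
import asyncio
import json
from typing import Any, Callable, Dict, List, Tuple

async def execute_tool_calls(
    calls: List[Dict[str, Any]],
    functions: Dict[str, Callable[..., Any]],
    timeout: float = 10.0,
) -> List[Any]:
    """Run tool calls concurrently, with a per-call timeout and dedup."""
    seen: Dict[Tuple[str, str], "asyncio.Task[Any]"] = {}

    async def run_one(name: str, arguments: Dict[str, Any]) -> Any:
        try:
            # Run the (possibly blocking) function off the event loop.
            return await asyncio.wait_for(
                asyncio.to_thread(functions[name], **arguments), timeout
            )
        except asyncio.TimeoutError:
            return {"error": f"tool {name!r} timed out after {timeout}s"}

    tasks = []
    for call in calls:
        # Deduplicate identical (name, arguments) pairs: run once, share the result.
        key = (call["name"], json.dumps(call["arguments"], sort_keys=True))
        if key not in seen:
            seen[key] = asyncio.create_task(run_one(call["name"], call["arguments"]))
        tasks.append(seen[key])

    await asyncio.gather(*seen.values())
    # Awaiting an already-finished task simply returns its stored result.
    return [await t for t in tasks]
```

If the model requests the same tool with the same arguments twice, the function runs once and both call sites receive the shared result; a call that exceeds the timeout yields an error payload instead of blocking the batch.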
Events

LLM implementations emit the following events:
- ToolStartEvent - When a tool execution starts
- ToolEndEvent - When a tool execution completes (success or failure)
- LLMResponseChunkEvent - For streaming responses
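The event names above are the library's; the subscribe/emit mechanics and event fields in this sketch are assumptions to show how a consumer might observe tool execution:

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Type

# Field names on these events are illustrative guesses, not the real schema.
@dataclass
class ToolStartEvent:
    tool_name: str

@dataclass
class ToolEndEvent:
    tool_name: str
    success: bool
    result: Any = None

@dataclass
class LLMResponseChunkEvent:
    delta: str

class EventManager:
    """Sketch of a subscribe/emit event manager (the real API may differ)."""

    def __init__(self) -> None:
        self._handlers: Dict[Type, List[Callable[[Any], None]]] = {}

    def subscribe(self, event_type: Type, handler: Callable[[Any], None]) -> None:
        self._handlers.setdefault(event_type, []).append(handler)

    def emit(self, event: Any) -> None:
        # Dispatch to handlers registered for this exact event type.
        for handler in self._handlers.get(type(event), []):
            handler(event)
```

A consumer would subscribe to `ToolEndEvent` to log tool outcomes, or to `LLMResponseChunkEvent` to stream partial text to a UI as it arrives.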