Overview
Processors are modular components that:
- Process incoming audio/video streams
- Publish outgoing processed streams back to calls
- Attach to agents and access agent functionality
- Run continuously or at intervals
- Provide state to LLMs for context-aware responses
Processor Types
Vision Agents provides base classes for different processing scenarios.
Base Processor
All processors extend this abstract base.
Reference: base_processor.py:16-43
Video Processors
VideoProcessor
Process incoming video streams.
Reference: base_processor.py:58-82
VideoPublisher
Publish outgoing video streams.
Reference: base_processor.py:45-56
VideoProcessorPublisher
Process incoming video and publish transformed output.
Reference: base_processor.py:85-91
Audio Processors
AudioProcessor
Process incoming audio streams.
Reference: base_processor.py:102-115
AudioPublisher
Publish outgoing audio streams.
Reference: base_processor.py:93-100
AudioProcessorPublisher
Process incoming audio and publish transformed output.
Reference: base_processor.py:117-121
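The relationship between the video base classes can be illustrated with a minimal, self-contained sketch. It does not import Vision Agents: the method names `process_video` and `publish_video_track`, the use of raw `bytes` as frames, and the `InvertFrames` class are assumptions made for illustration, and the real signatures in base_processor.py may differ. The audio classes are assumed to mirror the same shape.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, List, Optional


class Processor(ABC):
    """Hypothetical stand-in for the abstract base; not the library's code."""

    def __init__(self) -> None:
        self.agent: Optional[Any] = None

    def attach_agent(self, agent: Any) -> None:
        self.agent = agent

    @abstractmethod
    async def close(self) -> None:
        """Release any resources held by the processor."""


class VideoProcessor(Processor):
    """Consumes incoming video."""

    @abstractmethod
    async def process_video(self, frame: bytes) -> None: ...


class VideoPublisher(Processor):
    """Produces outgoing video."""

    @abstractmethod
    def publish_video_track(self) -> List[bytes]: ...


class VideoProcessorPublisher(VideoProcessor, VideoPublisher):
    """Consumes incoming video and publishes a transformed version."""


class InvertFrames(VideoProcessorPublisher):
    """Toy transform: invert every byte of each incoming frame."""

    def __init__(self) -> None:
        super().__init__()
        self.outgoing: List[bytes] = []

    async def process_video(self, frame: bytes) -> None:
        self.outgoing.append(bytes(255 - b for b in frame))

    def publish_video_track(self) -> List[bytes]:
        # A plain list stands in for a real outgoing video track.
        return self.outgoing

    async def close(self) -> None:
        self.outgoing.clear()
```

The combined class adds no behaviour of its own in this sketch; it only requires a subclass to implement both the consuming and the publishing side.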
Using Processors
Attaching to Agents
Pass processors to the agent constructor. The agent then:
- Calls attach_agent() on each processor
- Routes video/audio to appropriate processors
- Calls start() when joining a call
- Calls stop() and close() during cleanup
Reference: agents.py:249-250
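The order of those calls can be sketched with a toy agent. `ToyAgent`, its `join()` and `finish()` methods, and `RecordingProcessor` are hypothetical names used only for this illustration, and treating `start()`, `stop()`, and `close()` as coroutines is an assumption; the real wiring lives in agents.py.

```python
import asyncio
from typing import Iterable, List, Optional


class RecordingProcessor:
    """Hypothetical processor that records each lifecycle call it receives."""

    def __init__(self, name: str, log: List[str]) -> None:
        self.name = name
        self.log = log
        self.agent: Optional["ToyAgent"] = None

    def attach_agent(self, agent: "ToyAgent") -> None:
        self.agent = agent
        self.log.append(f"{self.name}.attach_agent")

    async def start(self) -> None:
        self.log.append(f"{self.name}.start")

    async def stop(self) -> None:
        self.log.append(f"{self.name}.stop")

    async def close(self) -> None:
        self.log.append(f"{self.name}.close")


class ToyAgent:
    """Stand-in for the real agent; only the processor wiring is modeled."""

    def __init__(self, processors: Iterable[RecordingProcessor]) -> None:
        self.processors = list(processors)
        for processor in self.processors:
            processor.attach_agent(self)  # attach at construction time

    async def join(self) -> None:
        for processor in self.processors:
            await processor.start()  # start when joining a call

    async def finish(self) -> None:
        for processor in self.processors:
            await processor.stop()   # stop, then release resources
            await processor.close()


async def demo() -> List[str]:
    log: List[str] = []
    agent = ToyAgent([RecordingProcessor("detector", log)])
    await agent.join()
    await agent.finish()
    return log
```

Running `asyncio.run(demo())` returns the recorded calls in the order attach, start, stop, close.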
Processor Lifecycle
Processors follow a defined lifecycle, from attachment through cleanup.
Reference: agents.py:800-822
Accessing Agent State
Processors can access the agent they’re attached to.
Reference: base_processor.py:34-42
Providing State to LLMs
Processors can provide context to LLM responses made through simple_response().
Reference: agents.py:586-588
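As a rough illustration of the idea, the sketch below collects `get_state()` output from each processor and renders it as text that could accompany a prompt. `ObjectCounter`, its `observe()` method, and `build_context()` are hypothetical helpers; how the agent actually merges processor state into an LLM call is not shown in this document and is assumed here.

```python
import json
from typing import Any, Dict, Iterable


class ObjectCounter:
    """Hypothetical processor that tracks which labels it has seen."""

    name = "object_counter"

    def __init__(self) -> None:
        self.counts: Dict[str, int] = {}

    def observe(self, label: str) -> None:
        self.counts[label] = self.counts.get(label, 0) + 1

    def get_state(self) -> Dict[str, Any]:
        # Keep the state small and self-describing so an LLM can use it.
        return {"objects_seen": dict(self.counts)}


def build_context(processors: Iterable[Any]) -> str:
    """Assumed helper: serialize every processor's state into prompt text."""
    state = {p.name: p.get_state() for p in processors}
    return "Processor state:\n" + json.dumps(state, sort_keys=True)
```

Short keys and plain values keep the serialized state cheap to include with every response.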
Video Track Management
Shared Video Forwarders
Multiple processors can share the same video stream efficiently.
Reference: agents.py:1176-1190
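The sharing idea can be sketched as a single reader that fans each frame out to every registered handler, so the stream is read once no matter how many processors consume it. The `SharedForwarder` class, `add_handler()`, and the `None` end-of-stream sentinel below are assumptions for illustration and are not the library's forwarder API.

```python
import asyncio
from typing import Awaitable, Callable, List, Tuple

FrameHandler = Callable[[bytes], Awaitable[None]]


class SharedForwarder:
    """Hypothetical fan-out: one reader, many frame handlers."""

    def __init__(self, source: asyncio.Queue) -> None:
        self._source = source
        self._handlers: List[FrameHandler] = []

    def add_handler(self, handler: FrameHandler) -> None:
        self._handlers.append(handler)

    async def run(self) -> None:
        # Read each frame once and hand the same object to every handler.
        while True:
            frame = await self._source.get()
            if frame is None:  # sentinel: the track ended
                return
            await asyncio.gather(*(h(frame) for h in self._handlers))


async def demo() -> Tuple[List[bytes], List[bytes]]:
    source: asyncio.Queue = asyncio.Queue()
    forwarder = SharedForwarder(source)
    seen_a: List[bytes] = []
    seen_b: List[bytes] = []

    async def handler_a(frame: bytes) -> None:
        seen_a.append(frame)

    async def handler_b(frame: bytes) -> None:
        seen_b.append(frame)

    forwarder.add_handler(handler_a)
    forwarder.add_handler(handler_b)
    for item in (b"f1", b"f2", None):
        source.put_nowait(item)
    await forwarder.run()
    return seen_a, seen_b
```

Both handlers receive the same frames in the same order, while the source queue is drained only once.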
Track Priority
Screen shares are prioritized over regular video.
Reference: agents.py:1231-1237
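One simple way to express such a rule is a priority table consulted when choosing which track to process. The `Track` dataclass, the kind strings, and the numeric priorities below are illustrative assumptions rather than the values used in agents.py.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Higher number wins; the kinds and values are illustrative assumptions.
PRIORITY: Dict[str, int] = {"screen_share": 1, "video": 0}


@dataclass
class Track:
    track_id: str
    kind: str


def pick_track(tracks: List[Track]) -> Optional[Track]:
    """Return the highest-priority track, or None if there are no tracks."""
    if not tracks:
        return None
    # max() keeps the first track among equal priorities; unknown kinds rank last.
    return max(tracks, key=lambda t: PRIORITY.get(t.kind, -1))
```

With a camera track and a screen share both present, `pick_track` selects the screen share; with only camera tracks, it keeps the first one.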
Audio Track Management
Per-Participant Audio
Audio processors receive PCM data with participant info.
Reference: base_processor.py:108-114
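A minimal sketch of a handler that uses both pieces of information follows. The `Pcm` and `Participant` dataclasses, the `process_audio` signature, and the 16-bit mono sample layout are assumptions for illustration; the library's own PCM and participant types may look different.

```python
import asyncio
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Pcm:
    samples: bytes    # assumed layout: 16-bit mono samples
    sample_rate: int  # samples per second


@dataclass
class Participant:
    user_id: str


class TalkTimeProcessor:
    """Hypothetical audio processor that totals speaking time per participant."""

    def __init__(self) -> None:
        self.seconds: Dict[str, float] = defaultdict(float)

    async def process_audio(self, pcm: Pcm, participant: Optional[Participant]) -> None:
        if participant is None:
            return  # audio without participant info cannot be attributed
        n_samples = len(pcm.samples) // 2  # two bytes per 16-bit sample
        self.seconds[participant.user_id] += n_samples / pcm.sample_rate
```

Checking for a missing participant first keeps unattributed audio from raising an error or being credited to the wrong speaker.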
Multi-Speaker Filtering
The agent filters audio before passing it to processors.
Reference: agents.py:1121-1130
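This document does not say which filtering policy the agent applies, so the following is only one plausible policy, shown to make the idea concrete: lock onto the first participant who speaks and drop audio from everyone else until that speaker has been silent for a while. `SpeakerLock`, `accept()`, and the `release_after` timeout are hypothetical.

```python
from typing import Optional


class SpeakerLock:
    """Assumed policy: pass audio from one active speaker at a time."""

    def __init__(self, release_after: float = 2.0) -> None:
        self.release_after = release_after  # seconds of silence before unlocking
        self.active: Optional[str] = None
        self.last_heard = 0.0

    def accept(self, user_id: str, now: float) -> bool:
        """Return True if audio from user_id at time `now` should be forwarded."""
        # Release the lock once the active speaker has been quiet long enough.
        if self.active is not None and now - self.last_heard > self.release_after:
            self.active = None
        if self.active is None:
            self.active = user_id
        if user_id != self.active:
            return False
        self.last_heard = now
        return True
```

Passing the timestamp in explicitly keeps the policy deterministic and easy to test without real audio.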
Performance Considerations
Video Frame Rate
Control how many frames you process:
Async Processing
Avoid blocking the event loop:
Resource Cleanup
Always clean up in close():
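The three considerations above can be combined in one small sketch: drop frames that arrive faster than the target rate, push heavy work to a worker thread, and cancel outstanding tasks in `close()`. `ThrottledProcessor`, `on_frame()`, and the placeholder `_analyze()` are hypothetical, and `on_frame()` is assumed to be called from inside a running event loop.

```python
import asyncio
import time
from typing import List, Optional, Set


class ThrottledProcessor:
    """Hypothetical processor: caps FPS, offloads heavy work, cleans up."""

    def __init__(self, fps: float = 1.0) -> None:
        self._min_interval = 1.0 / fps
        self._last = float("-inf")
        self._tasks: Set[asyncio.Task] = set()
        self.results: List[int] = []

    def _analyze(self, frame: bytes) -> int:
        # Placeholder for CPU-heavy work such as model inference.
        return sum(frame)

    async def _handle(self, frame: bytes) -> None:
        # Run blocking work in a worker thread so the event loop stays free.
        self.results.append(await asyncio.to_thread(self._analyze, frame))

    def on_frame(self, frame: bytes, now: Optional[float] = None) -> bool:
        """Schedule processing for this frame, or drop it if it came too soon."""
        now = time.monotonic() if now is None else now
        if now - self._last < self._min_interval:
            return False  # dropped to respect the frame-rate cap
        self._last = now
        task = asyncio.create_task(self._handle(frame))
        self._tasks.add(task)  # keep a reference so close() can cancel it
        task.add_done_callback(self._tasks.discard)
        return True

    async def close(self) -> None:
        # Cancel unfinished work and wait for the cancellations to settle.
        for task in list(self._tasks):
            task.cancel()
        await asyncio.gather(*self._tasks, return_exceptions=True)
        self._tasks.clear()
```

Cancelling a task does not interrupt work already running in the worker thread, so the heavy function itself should be kept short or made interruptible.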
Complete Example
Here’s a full processor that detects objects and provides context to the LLM:
Best Practices
- Use shared forwarders: Always prefer shared_forwarder over creating new ones
- Process asynchronously: Don’t block the event loop with heavy computation
- Control frame rate: Process at the minimum FPS needed for your use case
- Clean up properly: Always implement close() and cancel tasks
- Provide useful state: Make get_state() return LLM-friendly context
- Handle missing participants: Check if participant is present before using it
- Use background tasks: Process frames in separate tasks for parallelism
- Test resource cleanup: Ensure no memory leaks when processors stop/start
Code References
- Base classes: base_processor.py:16-121
- Agent integration: agents.py:249-250, agents.py:1176-1190
- Video routing: agents.py:1217-1253
- Audio routing: agents.py:1102-1156
Next Steps
- Learn about Agents orchestration
- Explore Edge Networks for media transport
- Understand Realtime vs Interval processing modes