Vision Agents provides built-in observability through OpenTelemetry for metrics and tracing.
Quick Start
Metrics are automatically collected when you configure OpenTelemetry:
from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.exporter.prometheus import PrometheusMetricReader
from prometheus_client import start_http_server

# Configure Prometheus exporter
reader = PrometheusMetricReader()
provider = MeterProvider(metric_readers=[reader])
metrics.set_meter_provider(provider)

# Start Prometheus HTTP server
start_http_server(port=9464)

# Create your agent - metrics are automatically collected
agent = Agent(
    llm=gemini.LLM("gemini-2.5-flash-lite"),
    stt=deepgram.STT(eager_turn_detection=True),
    tts=elevenlabs.TTS(),
    ...
)
Metrics collection is automatic - no need to manually create a MetricsCollector. The framework subscribes to events internally.
View metrics at http://localhost:9464/metrics.
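A scrape of that endpoint returns the standard Prometheus text exposition format. An abridged, purely illustrative excerpt (real output also includes histogram buckets, labels, and HELP lines):

```
# TYPE llm_latency_ms histogram
llm_latency_ms_sum 12450.0
llm_latency_ms_count 9
```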
Available Metrics
LLM Metrics
llm_latency_ms - Total LLM response latency (request to completion)
llm_time_to_first_token_ms - Time to first token (streaming)
llm_tokens_input - Input/prompt tokens consumed
llm_tokens_output - Output/completion tokens generated
llm_tool_calls - Function/tool calls executed
llm_tool_latency_ms - Tool execution latency
llm_errors - LLM errors
STT Metrics
stt_latency_ms - Speech-to-text processing latency
stt_audio_duration_ms - Duration of audio processed
stt_errors - STT errors
TTS Metrics
tts_latency_ms - Text-to-speech synthesis latency
tts_audio_duration_ms - Duration of synthesized audio
tts_characters - Characters synthesized
tts_errors - TTS errors
Realtime Metrics
realtime_sessions - Realtime LLM sessions started
realtime_session_duration_ms - Duration of realtime sessions
realtime_audio_input_bytes - Audio bytes sent to realtime LLM
realtime_audio_output_bytes - Audio bytes received from realtime LLM
realtime_responses - Realtime LLM responses received
realtime_user_transcriptions - User speech transcriptions
realtime_agent_transcriptions - Agent speech transcriptions
realtime_errors - Realtime LLM errors
Video Metrics
video_frames_processed - Video frames processed
video_processing_latency_ms - Frame processing latency
video_detections - Objects/items detected
vlm_inferences - Vision LLM inference requests
vlm_inference_latency_ms - VLM inference latency
vlm_input_tokens - VLM input tokens (text + image)
vlm_output_tokens - VLM output tokens
Turn Detection Metrics
turn_duration_ms - Duration of detected turns
turn_trailing_silence_ms - Trailing silence before turn end
Complete Example
import logging
import sys

from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.exporter.prometheus import PrometheusMetricReader
from prometheus_client import start_http_server

from vision_agents.core import Agent, User
from vision_agents.plugins import getstream, gemini, deepgram, elevenlabs

# Configure OpenTelemetry
PROMETHEUS_PORT = 9464
reader = PrometheusMetricReader()
provider = MeterProvider(metric_readers=[reader])
metrics.set_meter_provider(provider)

# Start Prometheus HTTP server
start_http_server(PROMETHEUS_PORT)

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
handler = logging.StreamHandler(sys.stderr)
logger.addHandler(handler)


async def create_agent(**kwargs) -> Agent:
    llm = gemini.LLM("gemini-2.5-flash-lite")
    agent = Agent(
        edge=getstream.Edge(),
        agent_user=User(name="Metrics Demo Agent", id="agent"),
        instructions="You're a helpful assistant.",
        llm=llm,
        tts=elevenlabs.TTS(),
        stt=deepgram.STT(eager_turn_detection=True),
    )
    return agent


async def join_call(agent: Agent, call_type: str, call_id: str, **kwargs) -> None:
    logger.info("=" * 60)
    logger.info("Prometheus Metrics Agent")
    logger.info("=" * 60)
    logger.info(f"Metrics endpoint: http://localhost:{PROMETHEUS_PORT}/metrics")
    logger.info("")
    logger.info("Metrics being collected:")
    logger.info(" - llm_latency_ms, llm_time_to_first_token_ms")
    logger.info(" - llm_tokens_input, llm_tokens_output")
    logger.info(" - stt_latency_ms, tts_latency_ms")
    logger.info("=" * 60)

    call = await agent.create_call(call_type, call_id)
    async with agent.join(call):
        await agent.simple_response(
            "Hello! I'm demonstrating metrics collection. Ask me anything!"
        )
        await agent.finish()


if __name__ == "__main__":
    from vision_agents.core import AgentLauncher, Runner

    Runner(AgentLauncher(create_agent=create_agent, join_call=join_call)).cli()
Run the example:
uv run python prometheus_example.py run --call-type default --call-id test
View metrics at http://localhost:9464/metrics.
Distributed Tracing
Enable tracing with Jaeger or other OTLP-compatible backends:
Install Dependencies
uv add opentelemetry-sdk opentelemetry-exporter-otlp-proto-grpc
Configure Tracing
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

resource = Resource.create({"service.name": "vision-agents"})
tp = TracerProvider(resource=resource)
exporter = OTLPSpanExporter(endpoint="localhost:4317", insecure=True)
tp.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(tp)
Run Jaeger
docker run --rm -it \
  -e COLLECTOR_OTLP_ENABLED=true \
  -p 16686:16686 -p 4317:4317 -p 4318:4318 \
  jaegertracing/all-in-one:1.51
View Traces
Open http://localhost:16686 to see traces in Jaeger UI.
Custom Metrics
Add custom metrics for your application:
from opentelemetry import metrics

meter = metrics.get_meter("my_application")

# Create custom metrics
custom_counter = meter.create_counter(
    "custom.events",
    description="Custom events counter",
)
custom_histogram = meter.create_histogram(
    "custom.latency.ms",
    unit="ms",
    description="Custom operation latency",
)

# Record metrics
custom_counter.add(1, {"event_type": "user_action"})
custom_histogram.record(123.45, {"operation": "process_frame"})
Agent-Level Metrics
Access built-in agent metrics directly:
agent = Agent(...)

# Agent exposes simple metrics
print(f"LLM latency avg: {agent.metrics.llm_latency_ms__avg.value} ms")
print(f"Total tokens: {agent.metrics.llm_input_tokens__total.value}")
print(f"Tool calls: {agent.metrics.llm_tool_calls__total.value}")
Available agent metrics:
agent.metrics.llm_latency_ms__avg
agent.metrics.llm_time_to_first_token_ms__avg
agent.metrics.llm_input_tokens__total
agent.metrics.llm_output_tokens__total
agent.metrics.llm_tool_calls__total
agent.metrics.llm_tool_latency_ms__avg
agent.metrics.stt_latency_ms__avg
agent.metrics.stt_audio_duration_ms__total
agent.metrics.tts_latency_ms__avg
agent.metrics.tts_audio_duration_ms__total
agent.metrics.tts_characters__total
agent.metrics.turn_duration_ms__avg
agent.metrics.turn_trailing_silence_ms__avg
agent.metrics.video_frames_processed__total
agent.metrics.video_processing_latency_ms__avg
agent.metrics.vlm_inferences__total
agent.metrics.vlm_inference_latency_ms__avg
Grafana Dashboard
Create a Grafana dashboard for visualization:
Add Prometheus data source pointing to http://localhost:9464
Create panels for key metrics:
# LLM latency over time
rate(llm_latency_ms_sum[5m]) / rate(llm_latency_ms_count[5m])
# Token usage
rate(llm_tokens_input[5m])
rate(llm_tokens_output[5m])
# Error rates
rate(llm_errors[5m])
rate(stt_errors[5m])
rate(tts_errors[5m])
# Video processing
rate(video_frames_processed[5m])
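The latency panel above divides the rate of a histogram's `_sum` series by the rate of its `_count` series, which reduces to "change in total milliseconds divided by change in request count" over the window. A quick sketch with hypothetical scrape values:

```python
# Two successive scrapes of the llm_latency_ms histogram series,
# 5 minutes apart (values are hypothetical).
sum_t0, count_t0 = 12_000.0, 10   # cumulative ms, cumulative requests
sum_t1, count_t1 = 30_000.0, 22

# rate(llm_latency_ms_sum[5m]) / rate(llm_latency_ms_count[5m])
# reduces to delta-sum / delta-count over the window:
avg_latency_ms = (sum_t1 - sum_t0) / (count_t1 - count_t0)
print(avg_latency_ms)  # 1500.0 ms average over the window
```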
Profiling
Use the built-in profiler for detailed performance analysis:
from vision_agents.core.profiling import Profiler

agent = Agent(
    ...,
    profiler=Profiler(output_path="./profile.html"),
)

# Profiling starts automatically
# When the agent finishes, the profile is saved to profile.html
The profiler:
Starts when agent is created
Stops when AgentFinishEvent is emitted
Generates an HTML report with timeline visualization
Shows function calls and time spent
Open profile.html in a browser to analyze performance.
Production Monitoring
Export to Cloud
Configure exporters for your cloud provider (Google Cloud, AWS CloudWatch, Datadog, etc.). For example, with Google Cloud Monitoring:
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.exporter.cloud_monitoring import CloudMonitoringMetricsExporter

exporter = CloudMonitoringMetricsExporter()
provider = MeterProvider(metric_readers=[PeriodicExportingMetricReader(exporter)])
Set Up Alerts
Create alerts for critical metrics:
High LLM latency (> 2s)
High error rates (> 1%)
Low response rates
Resource exhaustion
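As a sketch, the latency alert can be expressed as a standard Prometheus alerting rule; the expression reuses the Grafana latency query above, and the 2000 ms threshold matches the 2 s guideline (group and alert names are illustrative):

```yaml
groups:
  - name: vision-agents
    rules:
      - alert: HighLLMLatency
        expr: rate(llm_latency_ms_sum[5m]) / rate(llm_latency_ms_count[5m]) > 2000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Average LLM latency above 2s for 5 minutes"
```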
Monitor Costs
Track token usage to control costs:
# Daily token usage
sum(rate(llm_tokens_input[1h])) * 3600 * 24
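Since `rate()` yields tokens per second, multiplying by 3600 * 24 projects a daily total, which you can combine with your provider's pricing. A sketch with hypothetical numbers (the rate and price are assumptions, not real figures):

```python
# Hypothetical value of sum(rate(llm_tokens_input[1h]))
tokens_per_second = 50.0

# Project to a daily total, then apply a hypothetical price
daily_tokens = tokens_per_second * 3600 * 24
price_per_million_tokens = 0.10  # USD, illustrative only
daily_cost = daily_tokens / 1_000_000 * price_per_million_tokens

print(daily_tokens, round(daily_cost, 2))  # 4320000.0 0.43
```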
Best Practices
Always configure OpenTelemetry before creating agents
Use labels/attributes to filter metrics by provider, model, etc.
Set up alerts for errors and latency spikes
Monitor token usage to control costs
Use tracing to debug complex flows
Profile in development to identify bottlenecks
Example: Metrics in Production
import os

from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.exporter.prometheus import PrometheusMetricReader
from opentelemetry.sdk.resources import Resource
from prometheus_client import start_http_server

# Configure with resource attributes
resource = Resource.create({
    "service.name": "vision-agents",
    "service.version": "1.0.0",
    "deployment.environment": os.environ.get("ENVIRONMENT", "production"),
})

reader = PrometheusMetricReader()
provider = MeterProvider(resource=resource, metric_readers=[reader])
metrics.set_meter_provider(provider)

# Start metrics server
start_http_server(port=9464)

# Now create your agent
agent = Agent(...)
Examples
See examples/06_prometheus_metrics_example/prometheus_metrics_example.py for a complete working example.
Next Steps
Deploy agents: Deployment
Review metrics in agents-core/vision_agents/core/observability/metrics.py
Check event definitions in agents-core/vision_agents/core/*/events.py