Vision Agents provides built-in observability through OpenTelemetry for metrics and tracing.
Quick Start
Metrics are automatically collected when you configure OpenTelemetry:
from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.exporter.prometheus import PrometheusMetricReader
from prometheus_client import start_http_server

# Configure Prometheus exporter
reader = PrometheusMetricReader()
provider = MeterProvider(metric_readers=[reader])
metrics.set_meter_provider(provider)

# Start Prometheus HTTP server
start_http_server(port=9464)

# Create your agent - metrics are automatically collected
agent = Agent(
    llm=gemini.LLM("gemini-2.5-flash-lite"),
    stt=deepgram.STT(eager_turn_detection=True),
    tts=elevenlabs.TTS(),
    ...
)
Metrics collection is automatic - no need to manually create a MetricsCollector. The framework subscribes to events internally.
View metrics at http://localhost:9464/metrics.
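A scrape of that endpoint returns the standard Prometheus text exposition format. An abridged, purely illustrative excerpt (real output also includes histogram buckets, labels, and HELP lines):

```
# TYPE llm_latency_ms histogram
llm_latency_ms_sum 12450.0
llm_latency_ms_count 9
```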
Available Metrics
LLM Metrics
llm_latency_ms - Total LLM response latency (request to completion)
llm_time_to_first_token_ms - Time to first token (streaming)
llm_tokens_input - Input/prompt tokens consumed
llm_tokens_output - Output/completion tokens generated
llm_tool_calls - Function/tool calls executed
llm_tool_latency_ms - Tool execution latency
llm_errors - LLM errors
STT Metrics
stt_latency_ms - Speech-to-text processing latency
stt_audio_duration_ms - Duration of audio processed
stt_errors - STT errors
TTS Metrics
tts_latency_ms - Text-to-speech synthesis latency
tts_audio_duration_ms - Duration of synthesized audio
tts_characters - Characters synthesized
tts_errors - TTS errors
Realtime Metrics
realtime_sessions - Realtime LLM sessions started
realtime_session_duration_ms - Duration of realtime sessions
realtime_audio_input_bytes - Audio bytes sent to realtime LLM
realtime_audio_output_bytes - Audio bytes received from realtime LLM
realtime_responses - Realtime LLM responses received
realtime_user_transcriptions - User speech transcriptions
realtime_agent_transcriptions - Agent speech transcriptions
realtime_errors - Realtime LLM errors
Video Metrics
video_frames_processed - Video frames processed
video_processing_latency_ms - Frame processing latency
video_detections - Objects/items detected
vlm_inferences - Vision LLM inference requests
vlm_inference_latency_ms - VLM inference latency
vlm_input_tokens - VLM input tokens (text + image)
vlm_output_tokens - VLM output tokens
Turn Detection Metrics
turn_duration_ms - Duration of detected turns
turn_trailing_silence_ms - Trailing silence before turn end
Complete Example
import logging
import sys

from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.exporter.prometheus import PrometheusMetricReader
from prometheus_client import start_http_server

from vision_agents.core import Agent, User
from vision_agents.plugins import getstream, gemini, deepgram, elevenlabs

# Configure OpenTelemetry
PROMETHEUS_PORT = 9464
reader = PrometheusMetricReader()
provider = MeterProvider(metric_readers=[reader])
metrics.set_meter_provider(provider)

# Start Prometheus HTTP server
start_http_server(PROMETHEUS_PORT)

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
handler = logging.StreamHandler(sys.stderr)
logger.addHandler(handler)


async def create_agent(**kwargs) -> Agent:
    llm = gemini.LLM("gemini-2.5-flash-lite")
    agent = Agent(
        edge=getstream.Edge(),
        agent_user=User(name="Metrics Demo Agent", id="agent"),
        instructions="You're a helpful assistant.",
        llm=llm,
        tts=elevenlabs.TTS(),
        stt=deepgram.STT(eager_turn_detection=True),
    )
    return agent


async def join_call(agent: Agent, call_type: str, call_id: str, **kwargs) -> None:
    logger.info("=" * 60)
    logger.info("Prometheus Metrics Agent")
    logger.info("=" * 60)
    logger.info(f"Metrics endpoint: http://localhost:{PROMETHEUS_PORT}/metrics")
    logger.info("")
    logger.info("Metrics being collected:")
    logger.info(" - llm_latency_ms, llm_time_to_first_token_ms")
    logger.info(" - llm_tokens_input, llm_tokens_output")
    logger.info(" - stt_latency_ms, tts_latency_ms")
    logger.info("=" * 60)

    call = await agent.create_call(call_type, call_id)
    async with agent.join(call):
        await agent.simple_response(
            "Hello! I'm demonstrating metrics collection. Ask me anything!"
        )
        await agent.finish()


if __name__ == "__main__":
    from vision_agents.core import AgentLauncher, Runner

    Runner(AgentLauncher(create_agent=create_agent, join_call=join_call)).cli()
Run the example:
uv run python prometheus_example.py run --call-type default --call-id test
View metrics at http://localhost:9464/metrics.
Distributed Tracing
Enable tracing with Jaeger or other OTLP-compatible backends:
Install Dependencies
uv add opentelemetry-sdk opentelemetry-exporter-otlp-proto-grpc
Configure Tracing
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

resource = Resource.create({"service.name": "vision-agents"})
tp = TracerProvider(resource=resource)
exporter = OTLPSpanExporter(endpoint="localhost:4317", insecure=True)
tp.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(tp)
Run Jaeger
docker run --rm -it \
  -e COLLECTOR_OTLP_ENABLED=true \
  -p 16686:16686 -p 4317:4317 -p 4318:4318 \
  jaegertracing/all-in-one:1.51
View Traces
Open http://localhost:16686 to see traces in Jaeger UI.
Custom Metrics
Add custom metrics for your application:
from opentelemetry import metrics

meter = metrics.get_meter("my_application")

# Create custom metrics
custom_counter = meter.create_counter(
    "custom.events",
    description="Custom events counter",
)
custom_histogram = meter.create_histogram(
    "custom.latency.ms",
    unit="ms",
    description="Custom operation latency",
)

# Record metrics
custom_counter.add(1, {"event_type": "user_action"})
custom_histogram.record(123.45, {"operation": "process_frame"})
Agent-Level Metrics
Access built-in agent metrics directly:
agent = Agent(...)

# Agent exposes simple metrics
print(f"LLM latency avg: {agent.metrics.llm_latency_ms__avg.value} ms")
print(f"Total tokens: {agent.metrics.llm_input_tokens__total.value}")
print(f"Tool calls: {agent.metrics.llm_tool_calls__total.value}")
Available agent metrics:
agent.metrics.llm_latency_ms__avg
agent.metrics.llm_time_to_first_token_ms__avg
agent.metrics.llm_input_tokens__total
agent.metrics.llm_output_tokens__total
agent.metrics.llm_tool_calls__total
agent.metrics.llm_tool_latency_ms__avg
agent.metrics.stt_latency_ms__avg
agent.metrics.stt_audio_duration_ms__total
agent.metrics.tts_latency_ms__avg
agent.metrics.tts_audio_duration_ms__total
agent.metrics.tts_characters__total
agent.metrics.turn_duration_ms__avg
agent.metrics.turn_trailing_silence_ms__avg
agent.metrics.video_frames_processed__total
agent.metrics.video_processing_latency_ms__avg
agent.metrics.vlm_inferences__total
agent.metrics.vlm_inference_latency_ms__avg
Grafana Dashboard
Create a Grafana dashboard for visualization:
Add Prometheus data source pointing to http://localhost:9464
Create panels for key metrics:
# LLM latency over time
rate(llm_latency_ms_sum[5m]) / rate(llm_latency_ms_count[5m])
# Token usage
rate(llm_tokens_input[5m])
rate(llm_tokens_output[5m])
# Error rates
rate(llm_errors[5m])
rate(stt_errors[5m])
rate(tts_errors[5m])
# Video processing
rate(video_frames_processed[5m])
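The latency panel above divides the rate of a histogram's `_sum` series by the rate of its `_count` series, which reduces to "change in total milliseconds divided by change in request count" over the window. A quick sketch with hypothetical scrape values:

```python
# Two successive scrapes of the llm_latency_ms histogram series,
# 5 minutes apart (values are hypothetical).
sum_t0, count_t0 = 12_000.0, 10   # cumulative ms, cumulative requests
sum_t1, count_t1 = 30_000.0, 22

# rate(llm_latency_ms_sum[5m]) / rate(llm_latency_ms_count[5m])
# reduces to delta-sum / delta-count over the window:
avg_latency_ms = (sum_t1 - sum_t0) / (count_t1 - count_t0)
print(avg_latency_ms)  # 1500.0 ms average over the window
```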
Profiling
Use the built-in profiler for detailed performance analysis:
from vision_agents.core.profiling import Profiler

agent = Agent(
    ...,
    profiler=Profiler(output_path="./profile.html"),
)

# Profiling starts automatically
# When the agent finishes, the profile is saved to profile.html
The profiler:
Starts when agent is created
Stops when AgentFinishEvent is emitted
Generates an HTML report with timeline visualization
Shows function calls and time spent
Open profile.html in a browser to analyze performance.
Production Monitoring
Export to Cloud
Configure exporters for your cloud provider (Google Cloud, AWS CloudWatch, Datadog, etc.). For example, with Google Cloud Monitoring:
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.exporter.cloud_monitoring import CloudMonitoringMetricsExporter

exporter = CloudMonitoringMetricsExporter()
provider = MeterProvider(metric_readers=[PeriodicExportingMetricReader(exporter)])
Set Up Alerts
Create alerts for critical metrics:
High LLM latency (> 2s)
High error rates (> 1%)
Low response rates
Resource exhaustion
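As a sketch, the latency alert can be expressed as a standard Prometheus alerting rule; the expression reuses the Grafana latency query above, and the 2000 ms threshold matches the 2 s guideline (group and alert names are illustrative):

```yaml
groups:
  - name: vision-agents
    rules:
      - alert: HighLLMLatency
        expr: rate(llm_latency_ms_sum[5m]) / rate(llm_latency_ms_count[5m]) > 2000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Average LLM latency above 2s for 5 minutes"
```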
Monitor Costs
Track token usage to control costs:
# Daily token usage
sum(rate(llm_tokens_input[1h])) * 3600 * 24
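Since `rate()` yields tokens per second, multiplying by 3600 * 24 projects a daily total, which you can combine with your provider's pricing. A sketch with hypothetical numbers (the rate and price are assumptions, not real figures):

```python
# Hypothetical value of sum(rate(llm_tokens_input[1h]))
tokens_per_second = 50.0

# Project to a daily total, then apply a hypothetical price
daily_tokens = tokens_per_second * 3600 * 24
price_per_million_tokens = 0.10  # USD, illustrative only
daily_cost = daily_tokens / 1_000_000 * price_per_million_tokens

print(daily_tokens, round(daily_cost, 2))  # 4320000.0 0.43
```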
Best Practices
Always configure OpenTelemetry before creating agents
Use labels/attributes to filter metrics by provider, model, etc.
Set up alerts for errors and latency spikes
Monitor token usage to control costs
Use tracing to debug complex flows
Profile in development to identify bottlenecks
Example: Metrics in Production
import os

from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.exporter.prometheus import PrometheusMetricReader
from opentelemetry.sdk.resources import Resource
from prometheus_client import start_http_server

# Configure with resource attributes
resource = Resource.create({
    "service.name": "vision-agents",
    "service.version": "1.0.0",
    "deployment.environment": os.environ.get("ENVIRONMENT", "production"),
})

reader = PrometheusMetricReader()
provider = MeterProvider(resource=resource, metric_readers=[reader])
metrics.set_meter_provider(provider)

# Start metrics server
start_http_server(port=9464)

# Now create your agent
agent = Agent(...)
Examples
See examples/06_prometheus_metrics_example/prometheus_metrics_example.py for a complete working example.
Next Steps
Deploy agents: Deployment
Review metrics in agents-core/vision_agents/core/observability/metrics.py
Check event definitions in agents-core/vision_agents/core/*/events.py