## What is OpenTelemetry?

OpenTelemetry is an observability framework that provides:

- Distributed Tracing: Track requests across system boundaries
- Metrics Collection: Monitor performance and usage patterns
- Spans: Represent individual operations with timing and metadata
- Exporters: Send telemetry to various backends
- Semantic Conventions: Standardized attribute names for AI/LLM operations
The Koog OpenTelemetry feature automatically creates spans for:

- Agent creation and invocation
- Strategy execution
- Node execution
- LLM calls (including token usage)
- Tool calls
## Installation
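A Gradle sketch of the dependencies used in this guide. The artifact coordinates and versions below are placeholders, not confirmed values; check the Koog release notes for the exact module name and current version.

```kotlin
// build.gradle.kts — coordinates are illustrative
dependencies {
    implementation("ai.koog:koog-agents:LATEST_VERSION") // Koog with the OpenTelemetry feature

    // OpenTelemetry exporters referenced later in this guide
    implementation("io.opentelemetry:opentelemetry-exporter-otlp:1.40.0")
    implementation("io.opentelemetry:opentelemetry-exporter-logging:1.40.0")
}
```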
## Configuration
### Basic Configuration
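A minimal sketch of installing the feature on an agent. The agent-constructor shape and `simpleOpenAIExecutor` helper are assumptions for illustration; `setVerbose(true)` and the `serviceName`/`serviceVersion` options are documented on this page, while the `setServiceInfo` method name is assumed.

```kotlin
val agent = AIAgent(
    executor = simpleOpenAIExecutor(System.getenv("OPENAI_API_KEY")), // assumed helper
    systemPrompt = "You are a helpful assistant.",
) {
    install(OpenTelemetry) {
        setServiceInfo("my-agent-service", "1.0.0") // serviceName, serviceVersion
        setVerbose(true)                            // isVerbose, for debugging
    }
}
```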
### Configuration Options
| Option | Type | Default | Description |
|---|---|---|---|
| `serviceName` | `String` | `"ai.koog"` | Name of the service being instrumented |
| `serviceVersion` | `String` | `"0.0.0"` | Version of the service |
| `isVerbose` | `Boolean` | `false` | Enable verbose logging for debugging |
| `sdk` | `OpenTelemetrySdk` | Auto-configured | Custom SDK instance |
| `tracer` | `Tracer` | Auto-created | Custom tracer instance |
### Configuration Methods
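Later sections of this page reference `addSpanExporter()`, `setSdk()`, and `setVerbose(true)`; a sketch of how they combine inside the configuration block (`setServiceInfo` is an assumed name for setting the `serviceName`/`serviceVersion` options):

```kotlin
install(OpenTelemetry) {
    setServiceInfo("my-agent-service", "1.0.0")   // assumed method name
    setVerbose(true)                              // documented in Troubleshooting below
    addSpanExporter(LoggingSpanExporter.create()) // add any OpenTelemetry SpanExporter
    // setSdk(customSdk)                          // or supply a fully configured SDK instead
}
```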
## Exporters
### OTLP Exporter (Recommended)
Send telemetry to an OpenTelemetry Collector:
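A sketch using the OpenTelemetry Java SDK's OTLP/gRPC exporter, pointed at a local Collector on the standard port 4317; `addSpanExporter` is the method this page documents.

```kotlin
import io.opentelemetry.exporter.otlp.trace.OtlpGrpcSpanExporter

install(OpenTelemetry) {
    addSpanExporter(
        OtlpGrpcSpanExporter.builder()
            .setEndpoint("http://localhost:4317") // default OTLP/gRPC port
            .build()
    )
}
```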
### Logging Exporter

Output traces to the console (useful for development):
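A sketch using the console exporter from `io.opentelemetry:opentelemetry-exporter-logging`:

```kotlin
import io.opentelemetry.exporter.logging.LoggingSpanExporter

install(OpenTelemetry) {
    addSpanExporter(LoggingSpanExporter.create()) // prints spans to stdout
}
```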
### Jaeger Exporter

Send traces directly to Jaeger:
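Recent Jaeger versions ingest OTLP natively, and the dedicated Jaeger exporter artifact is deprecated upstream, so the usual route is pointing an OTLP exporter at Jaeger's OTLP port; a sketch:

```kotlin
import io.opentelemetry.exporter.otlp.trace.OtlpGrpcSpanExporter

install(OpenTelemetry) {
    addSpanExporter(
        OtlpGrpcSpanExporter.builder()
            .setEndpoint("http://localhost:4317") // Jaeger all-in-one OTLP/gRPC port
            .build()
    )
}
```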
### Zipkin Exporter

Send traces to Zipkin:
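A sketch using the Zipkin exporter from `io.opentelemetry:opentelemetry-exporter-zipkin`, with Zipkin's default span-ingest endpoint:

```kotlin
import io.opentelemetry.exporter.zipkin.ZipkinSpanExporter

install(OpenTelemetry) {
    addSpanExporter(
        ZipkinSpanExporter.builder()
            .setEndpoint("http://localhost:9411/api/v2/spans") // Zipkin default
            .build()
    )
}
```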
## Integration with Jaeger

Jaeger is a popular distributed tracing system. Here's how to set it up:

### 1. Start Jaeger with Docker
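A minimal `docker-compose.yml` for this step might look like the following; the image and port mappings are the standard Jaeger all-in-one defaults (adjust the tag as needed):

```yaml
services:
  jaeger:
    image: jaegertracing/all-in-one:latest
    ports:
      - "16686:16686"  # Jaeger UI
      - "4317:4317"    # OTLP/gRPC ingest
```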
```bash
docker-compose up -d
```
### 2. Configure Agent
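A sketch pointing the OTLP exporter at the Jaeger container started above (`setServiceInfo` is an assumed method name; `addSpanExporter` is documented on this page):

```kotlin
import io.opentelemetry.exporter.otlp.trace.OtlpGrpcSpanExporter

install(OpenTelemetry) {
    setServiceInfo("my-agent-service", "1.0.0") // assumed method name
    addSpanExporter(
        OtlpGrpcSpanExporter.builder()
            .setEndpoint("http://localhost:4317") // Jaeger's OTLP/gRPC port
            .build()
    )
}
```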
### 3. View Traces
Open the Jaeger UI at http://localhost:16686 to view your traces.
## Span Types and Attributes
Koog creates different span types following the OpenTelemetry Semantic Conventions for GenAI:

### Create Agent Span
Long-lived span for the agent's lifetime. Attributes:

- `gen_ai.operation.name` = `"create_agent"`
- `gen_ai.agent.id`
- `gen_ai.request.model`
### Invoke Agent Span
One execution run of an agent. Attributes:

- `gen_ai.operation.name` = `"invoke_agent"`
- `gen_ai.agent.id`
- `gen_ai.conversation.id`
- `gen_ai.system` (LLM provider)
- `gen_ai.response.finish_reasons` (on error)
### Strategy Span
Strategy execution. Attributes:

- `gen_ai.conversation.id`
- `koog.strategy.name`
- `koog.event.id`
### Node Execute Span
Individual node execution. Attributes:

- `gen_ai.conversation.id`
- `koog.node.id`
### Inference Span (LLM Call)
Single LLM call. Attributes:

- `gen_ai.operation.name` = `"chat"`
- `gen_ai.system` (provider: "openai", "anthropic", etc.)
- `gen_ai.request.model`
- `gen_ai.request.temperature`
- `gen_ai.request.max_tokens`
- `gen_ai.usage.input_tokens`
- `gen_ai.usage.output_tokens`
- `gen_ai.usage.total_tokens`
- `gen_ai.response.finish_reasons`
The span also captures message content, including:

- System, user, and assistant messages
- Tool choice and tool result messages
- Moderation responses
### Execute Tool Span
Tool execution. Attributes:

- `gen_ai.tool.name`
- `gen_ai.tool.description`
- `gen_ai.tool.arguments`
- `gen_ai.tool.call_id`
- `gen_ai.tool.output`
- `error.type` (on failure)
## Resource Attributes
Default resource attributes automatically added:

- `service.name`: Service name
- `service.version`: Service version
- `service.instance.time`: Instance creation timestamp
- `os.type`: Operating system type
- `os.version`: OS version
- `os.arch`: OS architecture
## Sampling
Control which spans are collected:
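Sampling is configured on the OpenTelemetry SDK's tracer provider. Since the knobs this page documents are exporters and `setSdk()`, a sketch via a custom tracer provider (see Custom SDK):

```kotlin
import io.opentelemetry.sdk.trace.SdkTracerProvider
import io.opentelemetry.sdk.trace.export.BatchSpanProcessor
import io.opentelemetry.sdk.trace.samplers.Sampler

// Keep ~10% of traces; child spans follow their parent's decision.
val tracerProvider = SdkTracerProvider.builder()
    .setSampler(Sampler.parentBased(Sampler.traceIdRatioBased(0.1)))
    .addSpanProcessor(BatchSpanProcessor.builder(otlpExporter).build())
    .build()
```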
## Custom SDK

Provide a pre-configured OpenTelemetry SDK. When using `setSdk()`, other configuration methods like `addSpanExporter()` are ignored since the SDK is already configured.
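A sketch handing Koog a fully configured SDK through the documented `setSdk()` method:

```kotlin
import io.opentelemetry.sdk.OpenTelemetrySdk
import io.opentelemetry.sdk.trace.SdkTracerProvider
import io.opentelemetry.sdk.trace.export.BatchSpanProcessor

val sdk = OpenTelemetrySdk.builder()
    .setTracerProvider(
        SdkTracerProvider.builder()
            .addSpanProcessor(BatchSpanProcessor.builder(otlpExporter).build())
            .build()
    )
    .build()

install(OpenTelemetry) {
    setSdk(sdk) // addSpanExporter() etc. are ignored from here on
}
```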
## Examples

### Basic Example
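A sketch of a complete minimal setup: console exporter, one agent run, and a flush delay. The agent-construction names (`AIAgent`, `simpleOpenAIExecutor`, `run`) are assumptions for illustration; the OpenTelemetry calls match the configuration shown earlier.

```kotlin
import io.opentelemetry.exporter.logging.LoggingSpanExporter
import kotlinx.coroutines.delay
import kotlinx.coroutines.runBlocking

fun main() = runBlocking {
    val agent = AIAgent(
        executor = simpleOpenAIExecutor(System.getenv("OPENAI_API_KEY")), // assumed helper
        systemPrompt = "You are a helpful assistant.",
    ) {
        install(OpenTelemetry) {
            addSpanExporter(LoggingSpanExporter.create()) // spans printed to stdout
        }
    }

    println(agent.run("Hello!")) // assumed run method
    delay(2_000) // let the exporter flush before the process exits
}
```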
### Production Example
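A sketch combining the practices recommended below: OTLP to a collector, batch processing, 10% parent-based sampling, and an extra resource attribute. The collector hostname is a placeholder.

```kotlin
import io.opentelemetry.api.common.AttributeKey
import io.opentelemetry.api.common.Attributes
import io.opentelemetry.exporter.otlp.trace.OtlpGrpcSpanExporter
import io.opentelemetry.sdk.OpenTelemetrySdk
import io.opentelemetry.sdk.resources.Resource
import io.opentelemetry.sdk.trace.SdkTracerProvider
import io.opentelemetry.sdk.trace.export.BatchSpanProcessor
import io.opentelemetry.sdk.trace.samplers.Sampler

val resource = Resource.getDefault().merge(
    Resource.create(
        Attributes.of(AttributeKey.stringKey("deployment.environment"), "production")
    )
)

val sdk = OpenTelemetrySdk.builder()
    .setTracerProvider(
        SdkTracerProvider.builder()
            .setResource(resource)
            .setSampler(Sampler.parentBased(Sampler.traceIdRatioBased(0.1))) // 10%
            .addSpanProcessor(
                BatchSpanProcessor.builder(
                    OtlpGrpcSpanExporter.builder()
                        .setEndpoint("http://otel-collector:4317") // placeholder host
                        .build()
                ).build()
            )
            .build()
    )
    .build()

install(OpenTelemetry) { setSdk(sdk) }
```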
### Multi-Exporter Example
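A sketch fanning the same spans out to the console and to a collector by calling the documented `addSpanExporter()` more than once:

```kotlin
import io.opentelemetry.exporter.logging.LoggingSpanExporter
import io.opentelemetry.exporter.otlp.trace.OtlpGrpcSpanExporter

install(OpenTelemetry) {
    addSpanExporter(LoggingSpanExporter.create()) // local debugging
    addSpanExporter(
        OtlpGrpcSpanExporter.builder()
            .setEndpoint("http://localhost:4317") // collector / Jaeger
            .build()
    )
}
```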
## Token Usage Tracking
OpenTelemetry automatically tracks token usage for LLM calls, which enables:

- Cost monitoring: Track API usage costs
- Performance analysis: Identify expensive prompts
- Optimization: Find opportunities to reduce token usage
## Troubleshooting
### No traces appearing
- Verify the exporter endpoint is accessible
- Check that sampling is not set to `alwaysOff()`
- Ensure you wait for async export (add a delay before exit)
- Enable verbose logging: `setVerbose(true)`
### Missing spans or incomplete traces
- Verify agent execution completes successfully
- Check for exceptions in your code
- Ensure proper span processor configuration
- Wait sufficient time for batch processing
### Too many spans
- Adjust the sampling rate: `Sampler.traceIdRatioBased(0.1)`
- Use parent-based sampling for consistency
- Filter at the collector level
### High memory usage
- Reduce the max queue size in `BatchSpanProcessor`
- Decrease the schedule delay for more frequent exports
- Decrease the sampling ratio to collect fewer traces
## Best Practices
### Use OTLP exporter
OTLP is the standard protocol and works with all major backends. Prefer it over backend-specific exporters.
### Configure appropriate sampling
In production, use ratio-based sampling (e.g., 0.1 = 10%) to balance observability with overhead.
### Add meaningful resource attributes
Include environment, region, and version attributes to make traces easier to filter and analyze.
### Monitor token usage
Use `gen_ai.usage.*` attributes to track and optimize LLM costs.

### Wait for export completion
Add a delay before application exit to ensure all spans are exported.
## Related Features
- Tracing: Koog-specific tracing with custom message processors
- Event Handlers: Lightweight hooks for custom monitoring logic