Observability

Genkit provides built-in observability through automatic tracing, a local Developer UI, and production monitoring integrations with OpenTelemetry.

Why Observability Matters for AI

AI applications are inherently non-deterministic and complex:

Multi-step workflows: Model calls, tool invocations, data retrieval
Non-deterministic: Same input can produce different outputs
Expensive: Token costs, latency, rate limits
Hard to debug: What went wrong in a 10-step agentic workflow?

Genkit’s observability features help you:

Debug failures: See exactly which step failed and why
Optimize performance: Identify slow model calls or bottlenecks
Monitor costs: Track token usage across models
Improve quality: Analyze outputs and refine prompts

Automatic Tracing

Every action in Genkit is automatically traced:

┌──────────────────────────────────────────────────────────────┐
│ Trace: myFlow                                                │
│ Duration: 3.5s                                               │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  Span: flow/myFlow (3.5s)                                    │
│  ├─ input: {"query": "quantum computing"}                    │
│  ├─ output: {"result": "Quantum computing is..."}            │
│  │                                                           │
│  ├──► Span: generate (2.1s)                                  │
│  │   ├─ model: googleai/gemini-2.0-flash                     │
│  │   ├─ input tokens: 150                                    │
│  │   ├─ output tokens: 450                                   │
│  │   ├─ tool calls: [search, analyze]                        │
│  │   │                                                       │
│  │   ├──► Span: tool/search (0.8s)                           │
│  │   │   ├─ input: {"query": "quantum computing"}            │
│  │   │   └─ output: [...search results...]                   │
│  │   │                                                       │
│  │   └──► Span: tool/analyze (0.5s)                          │
│  │       ├─ input: {"text": "..."}                           │
│  │       └─ output: {"summary": "..."}                       │
│  │                                                           │
│  └──► Span: generate (1.2s)                                  │
│      ├─ model: googleai/gemini-2.0-flash                     │
│      ├─ input tokens: 300                                    │
│      └─ output tokens: 200                                   │
└──────────────────────────────────────────────────────────────┘

Every span captures:

Timing: Start time, duration
Input/Output: Request and response data
Metadata: Model name, token usage, cost
Errors: Stack traces and error messages
Hierarchy: Parent-child relationships
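
To make the span fields concrete, here is a minimal sketch that models the example trace as plain TypeScript data and walks its hierarchy. The `SpanNode` type and the `sumTokens` helper are hypothetical names chosen for illustration; they are not Genkit's internal span representation.

```typescript
// Hypothetical span shape for illustration; not a Genkit type.
interface SpanNode {
  name: string;
  durationMs: number;
  inputTokens?: number;
  outputTokens?: number;
  error?: string;
  children: SpanNode[];
}

// The example trace from this section, expressed as data.
const exampleTrace: SpanNode = {
  name: 'flow/myFlow',
  durationMs: 3500,
  children: [
    {
      name: 'generate',
      durationMs: 2100,
      inputTokens: 150,
      outputTokens: 450,
      children: [
        { name: 'tool/search', durationMs: 800, children: [] },
        { name: 'tool/analyze', durationMs: 500, children: [] },
      ],
    },
    {
      name: 'generate',
      durationMs: 1200,
      inputTokens: 300,
      outputTokens: 200,
      children: [],
    },
  ],
};

// Walk the parent-child hierarchy and total the token counts.
function sumTokens(span: SpanNode): { input: number; output: number } {
  let input = span.inputTokens ?? 0;
  let output = span.outputTokens ?? 0;
  for (const child of span.children) {
    const sub = sumTokens(child);
    input += sub.input;
    output += sub.output;
  }
  return { input, output };
}

console.log(sumTokens(exampleTrace)); // { input: 450, output: 650 }
```

Because every span records its parent, totals like these can be computed for a whole flow or for any subtree.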

Developer UI

The Developer UI provides a local dashboard for testing and debugging flows, prompts, and models before you deploy.

Starting the Dev UI

npx genkit start

Then open http://localhost:4000

Features

1. Action Browser

Browse all flows, models, prompts, tools
See input/output schemas
Read descriptions and metadata

2. Flow Runner

Run flows with test inputs
See results in real-time
Test streaming responses
Try different configurations

3. Trace Inspector

View execution traces
Expand/collapse spans
See timing breakdowns
Inspect input/output at each step
Filter by flow, model, or time range

4. Prompt Editor

Edit .prompt files
Test with sample inputs
See rendered output
Compare variants

5. Model Tester

Test models directly
Compare different models
Adjust temperature, topK, etc.
See token usage and cost

OpenTelemetry Integration

Genkit uses OpenTelemetry for all tracing, making it compatible with any observability platform that can ingest OpenTelemetry data.

How It Works

┌─────────────────────────────────────────────────────────────────────┐
│                    Your Genkit Application                          │
│                                                                     │
│  Flows, Tools, Models ─── All actions automatically traced          │
└──────────────────────────────────┬──────────────────────────────────┘
                                   │
                                   ▼
                   ┌───────────────────────────────┐
                   │      OpenTelemetry SDK        │
                   │  (automatic instrumentation)  │
                   └───────────────┬───────────────┘
                                   │
                   ┌───────────────┼───────────────┐
                   │               │               │
                   ▼               ▼               ▼
              ┌──────────┐    ┌──────────┐    ┌──────────┐
              │  Cloud   │    │ Datadog  │    │ Sentry   │  ...
              │  Trace   │    │          │    │          │
              └──────────┘    └──────────┘    └──────────┘

Genkit automatically:

Creates OpenTelemetry spans for every action
Exports spans to configured backends
Includes Genkit-specific attributes (tokens, model, cost)

Production Monitoring

Google Cloud Trace

Integrate with Google Cloud for production tracing:

import { genkit } from 'genkit';
import { googleAI } from '@genkit-ai/google-genai';
import { googleCloud } from '@genkit-ai/google-cloud';

const ai = genkit({
  plugins: [
    googleAI(),
    googleCloud({
      projectId: 'my-project',
      telemetryConfig: {
        forceDevExport: false, // Only export in production
        autoInstrumentation: true,
      },
    }),
  ],
});

Features:

Cloud Trace: Distributed tracing across services
Cloud Logging: Structured logs with trace correlation
Metrics: Token usage, latency, error rates
Alerts: Set up alerts on errors or slow requests

Firebase

For Firebase projects:

import { firebase } from '@genkit-ai/firebase';

const ai = genkit({
  plugins: [
    firebase({
      telemetryConfig: {
        autoInstrumentation: true,
      },
    }),
  ],
});

Provides:

Cloud Trace integration
Cloud Logging
Firebase Console integration

Third-Party Observability

Support for popular platforms:

import { observability } from '@genkit-ai/observability';

const ai = genkit({
  plugins: [
    observability({
      sentry: {
        dsn: process.env.SENTRY_DSN,
      },
      datadog: {
        apiKey: process.env.DD_API_KEY,
        site: 'datadoghq.com',
      },
    }),
  ],
});

Supported platforms:

Sentry: Error tracking and performance monitoring
Datadog: APM, logs, metrics
Honeycomb: Distributed tracing and observability
New Relic: Application performance monitoring
Jaeger: Open-source distributed tracing
Zipkin: Distributed tracing system

Custom Telemetry

Add custom attributes to traces:

import { runInNewSpan } from 'genkit';

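export const myFlow = ai.defineFlow(
  { name: 'myFlow' },
  // The input is an object so that input.userId is defined
  async (input: { userId: string }) => {
    return await runInNewSpan(
      {
        metadata: { name: 'custom-step' },
        labels: {
          userId: input.userId,
          requestType: 'premium',
          version: '2.0',
        },
      },
      async () => {
        // Your logic here; this placeholder stands in for the real result
        const result = `Handled request for ${input.userId}`;
        return result;
      }
    );
  }
);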

Metrics and Token Tracking

Genkit automatically tracks:

Token Usage

Input tokens per request
Output tokens per request
Total tokens per flow
Tokens by model

Latency

Model call duration
Tool execution time
End-to-end flow time
Time to first token (TTFT)

Costs (when supported by model)

Cost per request
Cost per model
Daily/monthly spend

Error Rates

Failed requests
Timeout errors
Rate limit errors
Model-specific errors

The same usage data is available on the response object:

const response = await ai.generate({
  model: 'googleai/gemini-2.0-flash',
  prompt: 'Hello!',
});

console.log(response.usage);
// {
//   inputTokens: 5,
//   outputTokens: 12,
//   totalTokens: 17,
//   inputCharacters: 6,
//   outputCharacters: 42
// }

console.log(response.latencyMs); // 1234
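
Cost is only reported when a model supports it, so a rough estimate can be derived from token counts instead. The sketch below assumes the `usage` shape shown above; the `estimateCostUsd` helper and the per-million-token prices are hypothetical placeholders, not real pricing for any model.

```typescript
interface Usage {
  inputTokens: number;
  outputTokens: number;
}

interface Price {
  inputPerMillion: number; // USD per 1M input tokens (placeholder value below)
  outputPerMillion: number; // USD per 1M output tokens (placeholder value below)
}

// Estimate the cost of one request from its token counts.
function estimateCostUsd(usage: Usage, price: Price): number {
  const micros =
    usage.inputTokens * price.inputPerMillion +
    usage.outputTokens * price.outputPerMillion;
  return micros / 1_000_000;
}

// Placeholder prices for illustration only.
const placeholderPrice: Price = { inputPerMillion: 1, outputPerMillion: 2 };

const cost = estimateCostUsd({ inputTokens: 5, outputTokens: 12 }, placeholderPrice);
console.log(cost.toFixed(6)); // "0.000029"
```

Input and output tokens are priced separately because most providers charge different rates for each.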

Trace Export Formats

Genkit supports multiple trace export formats:

JSON

genkit trace export --format json trace-id > trace.json

OpenTelemetry Protocol (OTLP)

export OTEL_EXPORTER_OTLP_ENDPOINT=https://your-collector:4318

Zipkin

export OTEL_EXPORTER_ZIPKIN_ENDPOINT=http://localhost:9411/api/v2/spans

Jaeger

export OTEL_EXPORTER_JAEGER_ENDPOINT=http://localhost:14250

Best Practices

1. Use Meaningful Step Names

Name your steps clearly:

@ai.flow()
async def research_flow(topic: str) -> str:
    # Good: Clear step names
    facts = await run('gather-facts', lambda: ...)
    analysis = await run('analyze-facts', lambda: ...)
    summary = await run('generate-summary', lambda: ...)

    # Bad: Generic names
    # step1 = await run('step1', lambda: ...)
    # step2 = await run('step2', lambda: ...)

    # Return the final step's result so the flow matches its -> str annotation
    return summary

2. Add Custom Metadata

Enrich traces with business context:

const response = await ai.generate({
  model: 'googleai/gemini-2.0-flash',
  prompt: input,
  metadata: {
    userId: user.id,
    requestId: req.id,
    feature: 'chat',
    tier: 'premium',
  },
});

3. Monitor Token Usage

Set up alerts for unexpected token usage:

if (response.usage.totalTokens > 10000) {
  logger.warn('High token usage detected', {
    tokens: response.usage.totalTokens,
    userId: user.id,
  });
}
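
A per-request threshold misses users who make many moderately sized requests. As a complement, here is a minimal sketch of a cumulative per-user budget. The `TokenBudget` class is a hypothetical helper, not part of Genkit; it assumes you call `record` with `response.usage.totalTokens` after each request and `reset` at the start of each accounting window.

```typescript
// Hypothetical helper: tracks cumulative tokens per user within one window.
class TokenBudget {
  private used = new Map<string, number>();
  private limitPerUser: number;

  constructor(limitPerUser: number) {
    this.limitPerUser = limitPerUser;
  }

  // Records usage and returns true once the user's total exceeds the limit.
  record(userId: string, totalTokens: number): boolean {
    const total = (this.used.get(userId) ?? 0) + totalTokens;
    this.used.set(userId, total);
    return total > this.limitPerUser;
  }

  // Clears all counters; call at the start of each window (for example, daily).
  reset(): void {
    this.used.clear();
  }
}

const budget = new TokenBudget(10_000);
console.log(budget.record('user-1', 6_000)); // false (6,000 so far)
console.log(budget.record('user-1', 5_000)); // true (11,000 exceeds 10,000)
```

The counters live in memory, so a multi-instance deployment would need a shared store instead.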

4. Sample in Production

For high-traffic apps, sample traces:

const ai = genkit({
  plugins: [
    googleCloud({
      telemetryConfig: {
        sampler: {
          type: 'probabilistic',
          probability: 0.1, // Sample 10% of traces
        },
      },
    }),
  ],
});
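
To illustrate what a 10% probabilistic sampler does, the sketch below derives the keep-or-drop decision from the trace ID, so every span in the same trace gets the same answer. This is a simplified illustration and not the OpenTelemetry SDK's exact algorithm; `shouldSample` is a hypothetical name.

```typescript
// Simplified trace-ID-based sampling; illustrative only.
function shouldSample(traceId: string, probability: number): boolean {
  if (probability >= 1) return true;
  if (probability <= 0) return false;
  // Treat the last 8 hex characters of the trace ID as a number in [0, 2^32).
  const bucket = parseInt(traceId.slice(-8), 16);
  // A malformed ID parses to NaN, which compares false and is dropped.
  return bucket < probability * 0x100000000;
}

// Same trace ID, same decision, no matter which service asks.
const traceId = '4bf92f3577b34da6a3ce929d0e0e4736';
console.log(shouldSample(traceId, 0.1) === shouldSample(traceId, 0.1)); // true
```

Deciding per trace rather than per span keeps sampled traces complete, which matters when you inspect a multi-step flow.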

5. Use Dev UI for Debugging

Before deploying:

Run flows in Dev UI with test data
Inspect traces for unexpected behavior
Verify token usage and latency
Test error handling

Debugging Common Issues

High Latency

Check trace for:

Slow model calls → Try faster model
Multiple sequential tool calls → Can any run in parallel?
Large prompts → Reduce context size
Network delays → Check model endpoint
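
One way to find the bottleneck is to rank spans by self time, the portion of a span's duration not spent in its children. The sketch below applies this to the example trace from the Automatic Tracing section. `TimedSpan` and `selfTimes` are hypothetical names, and the span labels are adjusted to tell the two generate calls apart.

```typescript
// Hypothetical span shape for illustration; not a Genkit type.
interface TimedSpan {
  name: string;
  durationMs: number;
  children: TimedSpan[];
}

interface SelfTime {
  name: string;
  selfMs: number;
}

// Self time = span duration minus time spent in direct children.
// Assumes children run sequentially; clamp at 0 in case they overlapped.
function selfTimes(span: TimedSpan, out: SelfTime[] = []): SelfTime[] {
  const childMs = span.children.reduce((sum, c) => sum + c.durationMs, 0);
  out.push({ name: span.name, selfMs: Math.max(0, span.durationMs - childMs) });
  for (const child of span.children) selfTimes(child, out);
  return out;
}

const timedTrace: TimedSpan = {
  name: 'flow/myFlow',
  durationMs: 3500,
  children: [
    {
      name: 'generate (with tools)',
      durationMs: 2100,
      children: [
        { name: 'tool/search', durationMs: 800, children: [] },
        { name: 'tool/analyze', durationMs: 500, children: [] },
      ],
    },
    { name: 'generate (final)', durationMs: 1200, children: [] },
  ],
};

const ranked = selfTimes(timedTrace).sort((a, b) => b.selfMs - a.selfMs);
console.log(ranked[0]); // { name: 'generate (final)', selfMs: 1200 }
```

Here the final model call dominates, so a faster model or a smaller prompt for that step would help more than optimizing the tools.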

High Token Usage

Check trace for:

Long conversation history → Summarize or truncate
Verbose prompts → Simplify instructions
Unnecessary tool calls → Refine tool descriptions
Large tool responses → Return only needed data
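
Truncating conversation history can be sketched as keeping the most recent messages that fit a token budget. The `Message` shape, the helper names, and the four-characters-per-token heuristic are all assumptions for illustration; real tokenizers differ by model, so treat the estimate as approximate.

```typescript
// Hypothetical message shape for illustration.
interface Message {
  role: 'user' | 'model';
  text: string;
}

// Rough heuristic: about 4 characters per token. Real tokenizers differ.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Keep the newest messages whose estimated tokens fit within maxTokens.
function truncateHistory(history: Message[], maxTokens: number): Message[] {
  const kept: Message[] = [];
  let used = 0;
  // Walk from newest to oldest and stop at the first message that does not fit.
  for (const message of [...history].reverse()) {
    const cost = estimateTokens(message.text);
    if (used + cost > maxTokens) break;
    kept.unshift(message); // restore chronological order
    used += cost;
  }
  return kept;
}

const history: Message[] = [
  { role: 'user', text: 'a'.repeat(40) }, // ~10 tokens
  { role: 'model', text: 'b'.repeat(40) }, // ~10 tokens
  { role: 'user', text: 'c'.repeat(40) }, // ~10 tokens
];

console.log(truncateHistory(history, 25).length); // 2
```

Dropping the oldest messages first keeps the most recent context intact; summarizing the dropped portion is an alternative when earlier turns still matter.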

Errors

Check trace for:

Stack trace and error message
Which step failed
Input that caused the error
Model-specific error codes (rate limits, etc.)
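
Finding which step failed amounts to locating the deepest span that recorded an error, since parent spans usually fail only because a child did. The sketch below shows that search over a hypothetical `StepSpan` tree; the type, the helper name, and the error text are illustrative, not Genkit APIs.

```typescript
// Hypothetical span shape for illustration; not a Genkit type.
interface StepSpan {
  name: string;
  error?: string;
  children: StepSpan[];
}

// Returns the path from the root to the deepest failing span, or null if none failed.
function findFailurePath(span: StepSpan): string[] | null {
  // Prefer a failing descendant: it is closer to the root cause.
  for (const child of span.children) {
    const path = findFailurePath(child);
    if (path) return [span.name, ...path];
  }
  return span.error ? [span.name] : null;
}

const failedTrace: StepSpan = {
  name: 'flow/myFlow',
  error: 'generate failed',
  children: [
    {
      name: 'generate',
      error: 'tool call failed',
      children: [
        { name: 'tool/search', error: 'rate limit exceeded', children: [] },
        { name: 'tool/analyze', children: [] },
      ],
    },
  ],
};

console.log(findFailurePath(failedTrace)?.join(' > '));
// flow/myFlow > generate > tool/search
```

The last element of the path is the step to inspect first, together with the input recorded on that span.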

Example: Monitoring Dashboard

Query traces programmatically:

from genkit.core.registry import Registry

registry = ai.registry

# Get recent traces
traces = await registry.get_traces(
    flow_name='myFlow',
    limit=100,
    time_range='24h',
)

# Analyze token usage, latency, and errors (guard against an empty result)
if traces:
    total_tokens = sum(t.usage.total_tokens for t in traces)
    avg_latency = sum(t.latency_ms for t in traces) / len(traces)
    error_rate = len([t for t in traces if t.error]) / len(traces)

    print(f'Total tokens: {total_tokens}')
    print(f'Avg latency: {avg_latency:.0f}ms')
    print(f'Error rate: {error_rate:.1%}')
else:
    print('No traces found')
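
Averages hide tail latency, so dashboards often report percentiles as well. The sketch below computes them over trace summaries that have already been fetched. The `TraceSummary` shape and both helpers are hypothetical names for illustration, and the sample values are made up.

```typescript
// Hypothetical per-trace summary; field names are assumptions for illustration.
interface TraceSummary {
  latencyMs: number;
  totalTokens: number;
  failed: boolean;
}

// Nearest-rank percentile: the smallest value with at least p% of samples at or below it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) return NaN;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  const index = Math.min(sorted.length - 1, Math.max(0, rank - 1));
  return sorted[index] ?? NaN;
}

function summarize(traces: TraceSummary[]) {
  const latencies = traces.map((t) => t.latencyMs);
  const failures = traces.filter((t) => t.failed).length;
  return {
    totalTokens: traces.reduce((sum, t) => sum + t.totalTokens, 0),
    p50LatencyMs: percentile(latencies, 50),
    p95LatencyMs: percentile(latencies, 95),
    errorRate: traces.length === 0 ? 0 : failures / traces.length,
  };
}

// Made-up sample data.
const sample: TraceSummary[] = [
  { latencyMs: 100, totalTokens: 100, failed: false },
  { latencyMs: 200, totalTokens: 200, failed: false },
  { latencyMs: 300, totalTokens: 300, failed: true },
  { latencyMs: 400, totalTokens: 400, failed: false },
  { latencyMs: 1000, totalTokens: 500, failed: false },
];

console.log(summarize(sample));
// { totalTokens: 1500, p50LatencyMs: 300, p95LatencyMs: 1000, errorRate: 0.2 }
```

In this sample the mean latency is 400ms, while the 95th percentile shows that the slowest requests take a full second.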

Next Steps

Learn about Flows - building traceable workflows
Explore Architecture - how tracing works internally
See Plugins - telemetry plugin options
