The Observability module (OtelModule) provides comprehensive monitoring through OpenTelemetry-compliant traces, metrics, and logs. It is essential for debugging, performance analysis, and production monitoring.
Configuration
Configure the Observability module in config.yaml:
modules:
  - class: modules::observability::OtelModule
    config:
      # Core Configuration
      enabled: true
      service_name: my-app
      service_version: 0.1.0
      service_namespace: production

      # Trace Exporter
      exporter: memory # Options: otlp, memory, both
      endpoint: http://localhost:4317 # OTLP endpoint

      # Sampling
      sampling_ratio: 1.0 # 1.0 = 100% sampling

      # Memory Storage
      memory_max_spans: 10000

      # Metrics
      metrics_enabled: true
      metrics_exporter: memory
      metrics_retention_seconds: 3600
      metrics_max_count: 10000

      # Logs
      logs_enabled: true
      logs_exporter: memory
      logs_max_count: 1000
      logs_retention_seconds: 3600
      logs_sampling_ratio: 1.0
      logs_console_output: true
Configuration Options
Core Settings
- enabled - Enable/disable the observability module
- service_name - Service name for telemetry data
- service_version - Service version identifier
- service_namespace - Service namespace/environment (e.g., production, staging)
Trace Configuration
- exporter - Trace exporter type:
  - memory: Store in-memory (for development/debugging)
  - otlp: Export to an OTLP collector
  - both: Both memory and OTLP
- endpoint - OTLP endpoint URL (required when exporter is otlp or both)
- sampling_ratio - Basic sampling ratio (0.0 to 1.0)
Metrics Configuration
- metrics_enabled - Enable metrics collection
- metrics_exporter - Metrics exporter: memory, otlp
- metrics_retention_seconds - How long to keep metrics in memory (seconds)
Logs Configuration
- logs_exporter - Logs exporter: memory, otlp, both
- logs_sampling_ratio - Log sampling ratio (0.0 to 1.0)
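For intuition, a ratio sampler of this kind typically hashes the trace ID into [0, 1) and keeps the trace when the value falls below the configured ratio (the approach OpenTelemetry's TraceIdRatioBased sampler takes). A minimal TypeScript sketch, not this module's actual implementation:

```typescript
// Decide whether to sample a trace from its ID and a ratio in [0.0, 1.0].
// The same trace_id always yields the same decision, so every span of a
// given trace agrees on sampling.
function shouldSample(traceId: string, ratio: number): boolean {
  if (ratio >= 1.0) return true;
  if (ratio <= 0.0) return false;
  // Interpret the first 13 hex digits (52 bits, exact in a JS number)
  // as a fraction of the 52-bit range.
  const bits = parseInt(traceId.slice(0, 13), 16);
  return bits / 0x10000000000000 < ratio;
}
```

Because the decision is a pure function of the trace ID, downstream services reach the same verdict without coordination.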
Logging Functions
Use built-in logging functions for structured logs:
// Info level
await client.call('engine::log::info', {
message: 'User logged in',
data: { userId: 123, ip: '192.168.1.1' },
});
// Warning level
await client.call('engine::log::warn', {
message: 'Rate limit approaching',
data: { current: 95, limit: 100 },
});
// Error level
await client.call('engine::log::error', {
message: 'Payment failed',
data: { orderId: 456, error: 'Card declined' },
});
// Debug level
await client.call('engine::log::debug', {
message: 'Cache miss',
data: { key: 'user_123' },
});
// Trace level
await client.call('engine::log::trace', {
message: 'Database query',
data: { query: 'SELECT * FROM users' },
});
All logging functions accept the following parameters:
{
trace_id?: string; // Optional: link to trace
span_id?: string; // Optional: link to span
message: string; // Log message
data?: object; // Structured data
service_name?: string; // Override service name
}
Traces
All function invocations are automatically traced. Access traces via the following built-in functions:
List Traces
const traces = await client.call('engine::traces::list', {
offset: 0,
limit: 100,
service_name: 'my-app', // Filter by service
name: 'api.users', // Filter by span name
status: 'error', // Filter by status
min_duration_ms: 100, // Min duration
max_duration_ms: 5000, // Max duration
start_time: 1234567890000, // Start timestamp (ms)
end_time: 1234567890000, // End timestamp (ms)
sort_by: 'duration', // Sort: duration, start_time, name
sort_order: 'desc', // asc or desc
include_internal: false, // Include engine.* functions
});
Get Trace Tree
const tree = await client.call('engine::traces::tree', {
trace_id: 'abc123...',
});
// Returns hierarchical trace structure
Clear Traces
await client.call('engine::traces::clear', {});
Metrics
Access collected metrics:
const metrics = await client.call('engine::metrics::list', {
start_time: Date.now() - 3600000, // Last hour
end_time: Date.now(),
metric_name: 'iii.invocations.total', // Optional filter
aggregate_interval: 60, // Aggregate by 60 seconds
});
// Returns:
// {
// engine_metrics: {
// invocations: { total, success, error, deferred, by_function },
// workers: { spawns, deaths, active },
// performance: { avg_duration_ms, p50, p95, p99, min, max }
// },
// sdk_metrics: [...],
// aggregated_metrics: [...], // If aggregate_interval specified
// timestamp: 1234567890
// }
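The effect of aggregate_interval can be pictured as bucketing raw samples into fixed time windows; a TypeScript sketch (the Sample shape here is illustrative, not the engine's actual wire format):

```typescript
// One raw metric sample; timestamps in milliseconds.
interface Sample { name: string; timestamp: number; value: number }

// Group samples into windows of `intervalSeconds`, summing values per
// (metric name, window start). Illustrates what an aggregate_interval
// option typically does server-side.
function aggregate(samples: Sample[], intervalSeconds: number): Map<string, number> {
  const ms = intervalSeconds * 1000;
  const buckets = new Map<string, number>();
  for (const s of samples) {
    const windowStart = Math.floor(s.timestamp / ms) * ms;
    const key = `${s.name}@${windowStart}`;
    buckets.set(key, (buckets.get(key) ?? 0) + s.value);
  }
  return buckets;
}
```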
Built-in Metrics
- iii.invocations.total - Total function invocations
- iii.invocations.success - Successful invocations
- iii.invocations.error - Failed invocations
- iii.invocations.deferred - Deferred invocations
- iii.workers.spawns - Worker spawn count
- iii.workers.deaths - Worker death count
- iii.workers.active - Active workers
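A typical use of the by_function breakdown is ranking hot paths. A small helper, assuming by_function is a plain map of function names to invocation counts (the exact shape is not confirmed here):

```typescript
// Return the top-N entries by count from a name -> count map, e.g. the
// by_function breakdown of iii.invocations.total.
function topFunctions(byFunction: Record<string, number>, n: number): [string, number][] {
  return Object.entries(byFunction)
    .sort((a, b) => b[1] - a[1]) // descending by count
    .slice(0, n);
}
```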
Logs
Query stored logs:
const logs = await client.call('engine::logs::list', {
start_time: Date.now() - 3600000,
end_time: Date.now(),
trace_id: 'abc123', // Filter by trace
span_id: 'xyz789', // Filter by span
severity_min: 13, // Min severity (13 = WARN)
severity_text: 'ERROR', // Filter by level
offset: 0,
limit: 100,
});
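The numeric severities follow the OpenTelemetry log data model, where each text level anchors a number (TRACE=1, DEBUG=5, INFO=9, WARN=13, ERROR=17, FATAL=21). A small lookup helper for building severity_min filters:

```typescript
// OpenTelemetry severity numbers: each text level anchors a range of 4.
const SEVERITY: Record<string, number> = {
  TRACE: 1, DEBUG: 5, INFO: 9, WARN: 13, ERROR: 17, FATAL: 21,
};

// Translate a human-readable level into a severity_min filter value.
function severityMin(level: string): number {
  const n = SEVERITY[level.toUpperCase()];
  if (n === undefined) throw new Error(`unknown level: ${level}`);
  return n;
}
```

For example, severityMin("warn") gives the 13 used in the query above.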
Clear Logs
await client.call('engine::logs::clear', {});
Log Triggers
React to log events:
export default iii({
triggers: {
'on-error-log': {
type: 'log',
config: {
level: 'error', // Options: all, error, warn, info, debug, trace
},
},
},
});
export async function onErrorLog(log: any) {
console.log('Error logged:', log.body);
console.log('Severity:', log.severity_text);
console.log('Data:', log.attributes);
// Send alert
if (log.severity_number >= 17) { // ERROR or higher
await sendAlert({
message: log.body,
service: log.service_name,
trace_id: log.trace_id,
});
}
}
Advanced Sampling
Configure rule-based sampling:
modules:
  - class: modules::observability::OtelModule
    config:
      sampling:
        default: 0.1 # 10% default
        parent_based: true # Respect parent sampling decisions
        rules:
          # Sample 100% of critical operations
          - operation: "api.critical.*"
            rate: 1.0
          # Sample 1% of health checks
          - operation: "health.*"
            rate: 0.01
          # Sample 80% of production API calls
          - operation: "api.*"
            service: "production-*"
            rate: 0.8
          # Sample 10% of development
          - service: "development-*"
            rate: 0.1
        # Rate limiting
        rate_limit:
          max_traces_per_second: 100
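The rule list reads like a first-match-wins table keyed by glob patterns. The matching can be sketched as follows (the first-match semantics and the `*` glob handling are assumptions for illustration, not confirmed engine behaviour):

```typescript
interface SamplingRule { operation?: string; service?: string; rate: number }

// Convert a simple glob like "api.critical.*" into an anchored RegExp.
function globToRegex(glob: string): RegExp {
  const escaped = glob
    .replace(/[.+?^${}()|[\]\\]/g, "\\$&") // escape regex metacharacters
    .replace(/\*/g, ".*");                 // then expand globs
  return new RegExp(`^${escaped}$`);
}

// First rule whose operation and service patterns both match wins;
// an omitted pattern matches everything. Fall back to the default rate.
function sampleRate(
  rules: SamplingRule[], defaultRate: number,
  operation: string, service: string,
): number {
  for (const r of rules) {
    const opOk = !r.operation || globToRegex(r.operation).test(operation);
    const svcOk = !r.service || globToRegex(r.service).test(service);
    if (opOk && svcOk) return r.rate;
  }
  return defaultRate;
}
```

Ordering matters under first-match-wins: the specific "api.critical.*" rule must precede the broader "api.*" rule, as in the config above.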
Alerts
Configure metric-based alerts:
modules:
  - class: modules::observability::OtelModule
    config:
      alerts:
        - name: high_error_rate
          metric: iii.invocations.error
          threshold: 100
          operator: ">" # >, >=, <, <=, ==, !=
          window_seconds: 60
          cooldown_seconds: 300
          action: webhook
          webhook_url: https://hooks.slack.com/services/xxx
        - name: low_workers
          metric: iii.workers.active
          threshold: 1
          operator: "<"
          action: log
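Evaluating an alert of this shape reduces to comparing a windowed metric value against the threshold under the configured operator; a sketch:

```typescript
// The six comparison operators supported by the alerts config.
type Operator = ">" | ">=" | "<" | "<=" | "==" | "!=";

// True when `value` crosses `threshold` under the alert's operator.
function alertFires(value: number, operator: Operator, threshold: number): boolean {
  switch (operator) {
    case ">": return value > threshold;
    case ">=": return value >= threshold;
    case "<": return value < threshold;
    case "<=": return value <= threshold;
    case "==": return value === threshold;
    case "!=": return value !== threshold;
    default: throw new Error(`unknown operator: ${operator}`);
  }
}
```

With the configs above, high_error_rate fires when the windowed error count exceeds 100, and low_workers fires when active workers drop below 1.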
Query alert states:
const alerts = await client.call('engine::alerts::list', {});
// Manually evaluate alerts
const triggered = await client.call('engine::alerts::evaluate', {});
Health Check
Check observability system health:
const health = await client.call('engine::health::check', {});
// Returns:
// {
// status: 'healthy',
// components: {
// otel: { status: 'healthy', details: {...} },
// metrics: { status: 'healthy', details: {...} },
// logs: { status: 'healthy', details: {...} },
// spans: { status: 'healthy', details: {...} }
// },
// timestamp: 1234567890,
// version: '0.1.0'
// }
Distributed Tracing
Traces automatically propagate across:
- HTTP calls
- Queue messages
- Stream events
- State changes
// Publisher (HTTP endpoint)
export async function createOrder(input: any) {
// This creates a trace span
const order = await db.orders.create(input.body);
// Queue message inherits trace context
await client.call('queue.enqueue', {
topic: 'orders.process',
data: order,
});
return order;
}
// Subscriber (continues the trace)
export async function processOrder(data: any) {
// This span is linked to the HTTP request
console.log('Processing order:', data.id);
}
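Propagation across HTTP hops is conventionally carried in the W3C traceparent header (version-traceid-spanid-flags); a minimal parser sketch, independent of this module's actual wire format:

```typescript
interface TraceParent { version: string; traceId: string; spanId: string; sampled: boolean }

// Parse a W3C Trace Context header, e.g.
// "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01".
// Returns null for anything that does not match the spec's layout.
function parseTraceparent(header: string): TraceParent | null {
  const m = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header);
  if (!m) return null;
  return {
    version: m[1],
    traceId: m[2],
    spanId: m[3],
    sampled: (parseInt(m[4], 16) & 0x01) === 1, // bit 0 = sampled flag
  };
}
```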
Baggage
Propagate custom context across traces:
// Set baggage
await client.call('engine::baggage::set', {
key: 'user_id',
value: '123',
});
// Get baggage
const baggage = await client.call('engine::baggage::get', {
key: 'user_id',
});
// Get all baggage
const all = await client.call('engine::baggage::get_all', {});
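For reference, the W3C Baggage specification serializes entries as comma-separated key=value pairs with percent-encoded values; a minimal encoder sketch (how this module transports baggage internally is not shown here):

```typescript
// Serialize baggage entries into a W3C `baggage` header value:
// comma-separated key=value pairs with percent-encoded values.
function encodeBaggage(entries: Record<string, string>): string {
  return Object.entries(entries)
    .map(([key, value]) => `${key}=${encodeURIComponent(value)}`)
    .join(",");
}
```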
Exporting to OTLP Collector
Export to Jaeger, Zipkin, or other OTLP collectors:
modules:
  - class: modules::observability::OtelModule
    config:
      exporter: otlp
      endpoint: http://localhost:4317
      metrics_exporter: otlp
      logs_exporter: otlp
Docker Compose example:
services:
  jaeger:
    image: jaegertracing/all-in-one:latest
    ports:
      - "16686:16686" # UI
      - "4317:4317"   # OTLP gRPC
      - "4318:4318"   # OTLP HTTP
Best Practices
- Production: Use exporter: otlp with sampling
- Development: Use exporter: memory for debugging
- Sampling: Adjust based on traffic (0.1 = 10% for high traffic)
- Alerts: Configure for critical metrics
- Log Triggers: Use for error notification
- Retention: Balance storage vs. retention needs
- Memory exporter: Low overhead, limited by memory_max_spans
- OTLP exporter: Network overhead, offloads storage
- Sampling: Reduces overhead proportionally
- Logs: Consider sampling for high-volume applications
Source Code Reference
- Module: src/modules/observability/mod.rs:278
- Logging functions: src/modules/observability/mod.rs:329
- Traces API: src/modules/observability/mod.rs:591
- Metrics API: src/modules/observability/mod.rs:769
- Logs API: src/modules/observability/mod.rs:947