Overview

Carrier provides comprehensive logging and monitoring capabilities to help you track message processing, resource usage, and system health in production environments.

JSON Logging

Structured logs for easy parsing and analysis

Colorized Output

Human-friendly logs for local development

Statistics Tracking

Periodic resource usage reports

Source Attribution

Every log includes component identification

Logging Configuration

Carrier supports two logging formats controlled by environment variables:

JSON Logging (Production)

Default format for production environments. Each log entry is a valid JSON object:
CARRIER_ENABLE_COLORIZED_LOGGING=false  # Default
Example output:
{"time":"2024-03-09T10:15:30Z","level":"INFO","source":"sqs.Receiver","msg":"starting event loop","batch_size":10,"max_workers":10}
{"time":"2024-03-09T10:15:31Z","level":"INFO","source":"webhook.HealthChecker","msg":"webhook online","endpoint":"http://worker:9000"}
{"time":"2024-03-09T10:15:31Z","level":"INFO","source":"main","msg":"carrier has arrived"}
{"time":"2024-03-09T10:15:35Z","level":"DEBUG","source":"sqs.Receiver","msg":"received messages","count":5}
{"time":"2024-03-09T10:15:36Z","level":"DEBUG","source":"sqs.Receiver","msg":"deleted messages","count":5}

Colorized Logging (Development)

Human-friendly format for local development and debugging:
CARRIER_ENABLE_COLORIZED_LOGGING=true
Example output:
10:15:30 INF starting event loop source=sqs.Receiver batch_size=10 max_workers=10
10:15:31 INF webhook online source=webhook.HealthChecker endpoint=http://worker:9000
10:15:31 INF carrier has arrived source=main
10:15:35 DBG received messages source=sqs.Receiver count=5
10:15:36 DBG deleted messages source=sqs.Receiver count=5

Implementation

Logging is configured in main.go:84-92:
var logHandler slog.Handler
if envCfg.EnableColorizedLogging {
    logHandler = tint.NewHandler(os.Stdout, &tint.Options{Level: slog.LevelInfo})
} else {
    logHandler = slog.NewJSONHandler(os.Stdout, &slog.HandlerOptions{Level: slog.LevelInfo})
}
log := slog.New(logHandler).With("source", "main")
All components use Go’s structured logging (slog) with source attribution:
log := slog.New(logHandler).With("source", "sqs.Receiver")
log.Info("starting event loop", "batch_size", batchSize, "max_workers", maxWorkers)

Statistics Logging

Enable periodic statistics reporting to track resource usage:
CARRIER_ENABLE_STAT_LOG=true
CARRIER_STAT_LOG_TIMER=120s  # Default: every 2 minutes
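The timer accepts Go duration syntax (90s, 2m, 1h30m, and so on). As a point of reference, here is a minimal sketch of how such a value can be parsed with time.ParseDuration; this is a hypothetical helper for illustration, not Carrier's actual code:
package main

import (
    "fmt"
    "os"
    "time"
)

// statLogInterval reads CARRIER_STAT_LOG_TIMER and falls back to the
// documented default of 2 minutes. Hypothetical sketch, not Carrier's code.
func statLogInterval() time.Duration {
    v := os.Getenv("CARRIER_STAT_LOG_TIMER")
    if v == "" {
        return 2 * time.Minute
    }
    d, err := time.ParseDuration(v)
    if err != nil {
        fmt.Fprintf(os.Stderr, "invalid CARRIER_STAT_LOG_TIMER %q: %v\n", v, err)
        return 2 * time.Minute
    }
    return d
}

func main() {
    fmt.Println(statLogInterval()) // e.g. 2m0s
}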

StatLogger Implementation

The StatLogger component (main.go:42-78) tracks goroutines and memory usage:
// StatLogger is a utility for logging runtime statistics.
type StatLogger struct {
    ticker *time.Ticker
    log    *slog.Logger
    ctx    context.Context
}

// Run executes the execution loop of the StatLogger.
func (l *StatLogger) Run() {
    for {
        select {
        case <-l.ctx.Done():
            return
        case <-l.ticker.C:
            var m runtime.MemStats
            runtime.ReadMemStats(&m)
            l.log.Info("stats", "goroutines", runtime.NumGoroutine(), "memory", humanize.Bytes(m.Sys))
        }
    }
}
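The excerpt above omits construction. Here is a minimal sketch of how the StatLogger might be wired up; the constructor shape is assumed for illustration (it is not taken from main.go) and presumes code living in the same package as the StatLogger definition above:
// startStatLogger builds a StatLogger from an interval (see
// CARRIER_STAT_LOG_TIMER) and a shared slog handler, then runs it in the
// background until ctx is cancelled. Hypothetical helper, not Carrier's code.
func startStatLogger(ctx context.Context, handler slog.Handler, interval time.Duration) *StatLogger {
    l := &StatLogger{
        ticker: time.NewTicker(interval),
        log:    slog.New(handler).With("source", "main.StatLogger"),
        ctx:    ctx,
    }
    go l.Run()
    return l
}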

Stats Output Example

JSON format:
{"time":"2024-03-09T10:15:30Z","level":"INFO","source":"main.StatLogger","msg":"stats","goroutines":15,"memory":"12 MB"}
{"time":"2024-03-09T10:17:30Z","level":"INFO","source":"main.StatLogger","msg":"stats","goroutines":15,"memory":"12 MB"}
{"time":"2024-03-09T10:19:30Z","level":"INFO","source":"main.StatLogger","msg":"stats","goroutines":18,"memory":"13 MB"}
Colorized format:
10:15:30 INF stats source=main.StatLogger goroutines=15 memory="12 MB"
10:17:30 INF stats source=main.StatLogger goroutines=15 memory="12 MB"
10:19:30 INF stats source=main.StatLogger goroutines=18 memory="13 MB"
The StatLogger uses the humanize library to format memory sizes in human-readable units (MB, GB, etc.).
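For reference, humanize here is github.com/dustin/go-humanize; Bytes formats decimal (SI) units and IBytes formats binary units:
package main

import (
    "fmt"

    "github.com/dustin/go-humanize"
)

func main() {
    fmt.Println(humanize.Bytes(12_000_000)) // "12 MB" (decimal units)
    fmt.Println(humanize.IBytes(12 << 20))  // "12 MiB" (binary units)
}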

Log Levels and Sources

Log Levels

Carrier uses standard log levels:
| Level | Usage                   | Examples                                 |
|-------|-------------------------|------------------------------------------|
| INFO  | Normal operation events | Startup, shutdown, state changes         |
| WARN  | Potential issues        | Webhook offline, configuration warnings  |
| ERROR | Error conditions        | Failed API calls, transmission failures  |
| DEBUG | Detailed diagnostics    | Message counts, batch operations         |

Log Sources

Each log entry includes a source field identifying the component:
| Source                | Component         | Purpose                                  |
|-----------------------|-------------------|------------------------------------------|
| main                  | Main process      | Startup, shutdown, configuration         |
| main.StatLogger       | Statistics logger | Resource usage tracking                  |
| sqs.Receiver          | SQS receiver      | Message polling and processing           |
| webhook.Transmitter   | HTTP transmitter  | Webhook delivery and transmission errors |
| webhook.HealthChecker | Health checker    | Endpoint health monitoring               |

Production Monitoring Setup

Docker Compose Example

version: '3.8'

services:
  carrier:
    image: amplifysecurity/carrier
    environment:
      # Logging configuration
      CARRIER_ENABLE_COLORIZED_LOGGING: "false"
      CARRIER_ENABLE_STAT_LOG: "true"
      CARRIER_STAT_LOG_TIMER: "60s"
      
      # SQS configuration
      CARRIER_SQS_ENDPOINT: "https://sqs.us-west-2.amazonaws.com"
      CARRIER_SQS_QUEUE_NAME: "my-queue"
      CARRIER_SQS_BATCH_SIZE: "10"
      
      # Webhook configuration
      CARRIER_WEBHOOK_ENDPOINT: "http://worker:9000/webhook"
      CARRIER_WEBHOOK_HEALTH_CHECK_ENDPOINT: "http://worker:9000/health"
    
    # Send logs to stdout for container log collection
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"

  worker:
    image: my-worker:latest

Kubernetes Example

apiVersion: apps/v1
kind: Deployment
metadata:
  name: carrier-worker
spec:
  template:
    spec:
      containers:
        - name: carrier
          image: amplifysecurity/carrier
          env:
            - name: CARRIER_ENABLE_COLORIZED_LOGGING
              value: "false"
            - name: CARRIER_ENABLE_STAT_LOG
              value: "true"
            - name: CARRIER_STAT_LOG_TIMER
              value: "60s"
            - name: CARRIER_SQS_ENDPOINT
              value: "https://sqs.us-west-2.amazonaws.com"
            - name: CARRIER_SQS_QUEUE_NAME
              value: "my-queue"
            - name: CARRIER_WEBHOOK_ENDPOINT
              value: "http://localhost:9000/webhook"
            - name: CARRIER_WEBHOOK_HEALTH_CHECK_ENDPOINT
              value: "http://localhost:9000/health"
          
          resources:
            requests:
              memory: "32Mi"
              cpu: "100m"
            limits:
              memory: "128Mi"
              cpu: "500m"
        
        - name: worker
          image: my-worker:latest

Integration with Monitoring Tools

CloudWatch Logs

When running on AWS ECS or EKS with a CloudWatch log driver or agent configured (for example, the awslogs driver on ECS or Fluent Bit on EKS), container logs flow into CloudWatch Logs and can be queried with Logs Insights:
# Logs Insights query to track message throughput
fields @timestamp, source, msg, count
| filter source = "sqs.Receiver" and msg = "deleted messages"
| stats sum(count) as total_messages by bin(5m)

# Logs Insights query to monitor resource usage
fields @timestamp, goroutines, memory
| filter source = "main.StatLogger"
| sort @timestamp desc

Datadog

Configure the Datadog Agent to parse Carrier’s JSON logs:
logs:
  - type: file
    path: "/var/log/containers/*carrier*.log"
    service: carrier
    source: golang
    sourcecategory: sourcecode
Example Datadog monitor, assuming you have created a log-based metric named carrier.message.errors from Carrier's ERROR-level logs:
# Alert on high message processing errors
avg(last_5m):sum:carrier.message.errors{*} > 10

Prometheus

While Carrier doesn’t expose metrics directly, you can use a log-to-metrics exporter:
# carrier.mtail: log-to-metrics program for mtail
# (exported metrics must be declared before use)
counter carrier_messages_processed_total
counter carrier_message_errors_total
gauge carrier_goroutines

/"msg":"deleted messages","count":(\d+)/ {
  carrier_messages_processed_total += $1
}

/"msg":"failed to transmit message"/ {
  carrier_message_errors_total++
}

/"msg":"stats","goroutines":(\d+)/ {
  carrier_goroutines = $1
}

Grafana Loki

Query Carrier logs in Loki:
{container="carrier"} | json
# Messages processed per minute (sums the count field from batch-delete logs)
sum(sum_over_time({container="carrier"}
  | json
  | source="sqs.Receiver"
  | msg="deleted messages"
  | unwrap count [1m]))

Elasticsearch (ELK Stack)

Index Carrier logs with Filebeat:
filebeat.inputs:
  - type: container
    paths:
      - '/var/lib/docker/containers/*/*.log'
    processors:
      - add_kubernetes_metadata:
          host: ${NODE_NAME}
          matchers:
          - logs_path:
              logs_path: "/var/lib/docker/containers/"

output.elasticsearch:
  hosts: ["elasticsearch:9200"]
Kibana query:
source:"sqs.Receiver" AND msg:"deleted messages"

Key Metrics to Monitor

Message Throughput

Track deleted messages count to measure successful processing rate

Error Rate

Monitor failed to transmit message errors for processing issues

Memory Usage

Watch memory values in stats logs for memory leaks

Goroutine Count

Track goroutines to detect goroutine leaks

Health Status

Monitor webhook online/offline events for service health

Visibility Updates

Track updated message visibility for retry pattern analysis

Sample Monitoring Queries

CloudWatch Logs Insights
# Total messages processed in the last hour
fields @timestamp, count
| filter source = "sqs.Receiver" and msg = "deleted messages"
| stats sum(count) as total
Loki
sum(sum_over_time({container="carrier"}
  | json
  | source="sqs.Receiver"
  | msg="deleted messages"
  | unwrap count [1h]))

Alerting Recommendations

Condition: More than 5 transmission errors in 5 minutes
Action:
  • Check webhook service health
  • Review recent code deployments
  • Verify network connectivity
Query:
sum(count_over_time({container="carrier"}
  | json
  | level="ERROR"
  | msg=~"failed to transmit.*" [5m])) > 5

Condition: Webhook marked as offline
Action:
  • Check worker service status
  • Review worker logs for errors
  • Verify health check endpoint
Query:
{container="carrier"} | json | msg="webhook offline"

Condition: Memory usage increasing over time
Action:
  • Review memory stats trends
  • Check for message processing backlog
  • Consider restarting the container
Query:
avg_over_time({container="carrier"}
  | json
  | source="main.StatLogger"
  | unwrap bytes(memory) [30m])

Condition: Goroutine count continuously increasing
Action:
  • Check for stuck message processing
  • Review recent configuration changes
  • Restart the container if count exceeds threshold
Query (100 is an example threshold):
max_over_time({container="carrier"}
  | json
  | source="main.StatLogger"
  | unwrap goroutines [10m]) > 100

Best Practices

Use JSON in Production

Always use JSON logging (CARRIER_ENABLE_COLORIZED_LOGGING=false) in production for better parsing and analysis.

Enable Stats Logging

Set CARRIER_ENABLE_STAT_LOG=true to track resource usage trends over time.

Configure Log Rotation

Use container logging drivers with size and file limits to prevent disk space issues.

Correlate Logs

Use message IDs from SQS to correlate logs across Carrier and your worker application (see the sketch at the end of this section).

Set Up Alerts

Create alerts for error rates, webhook offline events, and resource anomalies.

Monitor Trends

Track message throughput, error rates, and resource usage trends over time.
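A hedged sketch of message-ID correlation on the worker side, assuming the SQS message ID can be recovered from the delivered payload; the MessageId field below is illustrative, so verify against Carrier's actual delivery format before relying on it:
package main

import (
    "encoding/json"
    "log/slog"
    "net/http"
    "os"
)

func main() {
    log := slog.New(slog.NewJSONHandler(os.Stdout, nil)).With("source", "worker")

    http.HandleFunc("/webhook", func(w http.ResponseWriter, r *http.Request) {
        // Illustrative payload shape; verify against Carrier's delivery format.
        var body struct {
            MessageId string `json:"MessageId"`
        }
        if err := json.NewDecoder(r.Body).Decode(&body); err != nil {
            http.Error(w, "bad payload", http.StatusBadRequest)
            return
        }
        // Attach the SQS message ID to every log line for this message so the
        // worker's logs can be joined with Carrier's logs on the same ID.
        msgLog := log.With("message_id", body.MessageId)
        msgLog.Info("processing message")
        w.WriteHeader(http.StatusOK)
    })

    log.Info("worker listening", "addr", ":9000")
    http.ListenAndServe(":9000", nil)
}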

Troubleshooting

No Logs Appearing

  • Verify logs are being written to stdout: docker logs <container>
  • Check log aggregation configuration
  • Ensure JSON parsing is configured correctly

Missing Stats Logs

  • Confirm CARRIER_ENABLE_STAT_LOG=true
  • Check CARRIER_STAT_LOG_TIMER value
  • Verify the StatLogger goroutine is running

Log Volume Too High

  • Increase CARRIER_STAT_LOG_TIMER for less frequent stats
  • Filter out debug-level logs in your aggregation tool
  • Increase batch size to reduce per-message log entries

Health Checks

Configure webhook health monitoring

Dynamic Timeouts

Implement intelligent retry strategies

Configuration

Complete environment variable reference
