Overview

Carrier provides built-in health check monitoring for webhook endpoints. When enabled, Carrier will:
  • Wait for the webhook endpoint to come online before processing messages
  • Continuously monitor the webhook’s health status
  • Automatically exit if the webhook goes offline, allowing orchestration systems like Kubernetes to restart the container
This prevents messages from being unnecessarily sent to dead letter queues when services are starting up or experiencing issues.

Configuration

Health checks are configured through environment variables:
Variable                                   Required   Default   Description
CARRIER_WEBHOOK_HEALTH_CHECK_ENDPOINT      No         -         Enables health checks when set. Should be a full URL to your health check endpoint
CARRIER_WEBHOOK_HEALTH_CHECK_INTERVAL      No         60s       Time interval between health checks
CARRIER_WEBHOOK_HEALTH_CHECK_TIMEOUT       No         10s       Timeout for each health check request
CARRIER_WEBHOOK_OFFLINE_THRESHOLD_COUNT    No         5         Number of consecutive failed checks before marking webhook as offline
All time duration values support Go’s time.ParseDuration() format (e.g., 30s, 2m, 1h30m).

Example Configuration

Docker Compose
services:
  carrier:
    image: amplifysecurity/carrier
    environment:
      CARRIER_WEBHOOK_ENDPOINT: http://worker:9000/webhook
      CARRIER_WEBHOOK_HEALTH_CHECK_ENDPOINT: http://worker:9000/health
      CARRIER_WEBHOOK_HEALTH_CHECK_INTERVAL: 30s
      CARRIER_WEBHOOK_HEALTH_CHECK_TIMEOUT: 5s
      CARRIER_WEBHOOK_OFFLINE_THRESHOLD_COUNT: 3
      CARRIER_SQS_ENDPOINT: https://sqs.us-west-2.amazonaws.com
      CARRIER_SQS_QUEUE_NAME: my-queue
Kubernetes
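apiVersion: apps/v1
kind: Deployment
metadata:
  name: carrier-demo
spec:
  # apps/v1 requires a selector that matches the pod template labels;
  # the label value here is only an example
  selector:
    matchLabels:
      app: carrier-demo
  template:
    metadata:
      labels:
        app: carrier-demo
    spec:
      containers:
        - name: carrier
          image: amplifysecurity/carrier
          env:
            - name: CARRIER_WEBHOOK_ENDPOINT
              value: http://localhost:9000/webhook
            - name: CARRIER_WEBHOOK_HEALTH_CHECK_ENDPOINT
              value: http://localhost:9000/health
            - name: CARRIER_WEBHOOK_HEALTH_CHECK_INTERVAL
              value: 30s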

How It Works

Health Check Mechanism

The health checker (transmitter/webhook/checker.go:76-93) performs periodic GET requests to the configured endpoint:
func (c *HealthChecker) checkEndpoint() (EndpointState, error) {
    req, err := http.NewRequest(http.MethodGet, c.endpoint, nil)
    if err != nil {
        return EndpointStateOffline, err
    }
    resp, err := c.client.Do(req)
    if resp != nil && resp.Body != nil {
        defer resp.Body.Close()
    }
    if err != nil {
        return EndpointStateOffline, err
    }
    if resp.StatusCode == http.StatusOK {
        return EndpointStateOnline, nil
    }
    return EndpointStateOffline, fmt.Errorf("%w: %d", ErrNon200StatusCode, resp.StatusCode)
}
The health checker only considers HTTP 200 responses as healthy. Any other status code or network error marks the endpoint as offline.

State Transitions

Carrier manages two endpoint states defined in transmitter/webhook/checker.go:13-16:
const (
    // EndpointStateOnline represents an online endpoint.
    EndpointStateOnline EndpointState = iota
    // EndpointStateOffline represents an offline endpoint.
    EndpointStateOffline
)

Startup Sequence

  1. Initial State: Carrier starts with the endpoint in EndpointStateOffline
  2. Waiting: Health checks run continuously until the endpoint returns HTTP 200
  3. Online: Once healthy, Carrier logs “webhook online” and begins processing messages from SQS
  4. Ready: The message “carrier has arrived” indicates the system is fully operational
if c.currentState == EndpointStateOffline {
    // waiting for endpoint to initialize
    if state == EndpointStateOnline {
        c.log.Info("webhook online", "endpoint", c.webhookEndpoint)
        c.currentState = state
        c.offlineCount = 0
        c.ctrl <- state
        continue
    }
}
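The receiving side of the control channel is not shown above. As a rough sketch of how a consumer of such a channel might gate its work, the example below waits for an online signal, then runs until an offline signal arrives. waitThenServe and its event strings are hypothetical, and EndpointState is re-declared locally so the example compiles on its own; Carrier's actual main loop may differ.

```go
package main

import "fmt"

// EndpointState is re-declared here so the sketch is self-contained.
type EndpointState int

const (
	EndpointStateOnline EndpointState = iota
	EndpointStateOffline
)

// waitThenServe blocks until the first signal on ctrl. If it is an online
// signal, it "processes" until the next signal, which is treated as offline.
// It returns the sequence of events it went through. Hypothetical helper.
func waitThenServe(ctrl <-chan EndpointState) []string {
	events := []string{"waiting for webhook"}
	if state := <-ctrl; state != EndpointStateOnline {
		return append(events, "unexpected state before online")
	}
	events = append(events, "processing messages")
	<-ctrl // after going online, the next signal means the threshold was hit
	return append(events, "exiting so the orchestrator can restart the container")
}

func main() {
	// Buffered so both signals can be queued without a second goroutine.
	ctrl := make(chan EndpointState, 2)
	ctrl <- EndpointStateOnline
	ctrl <- EndpointStateOffline
	for _, e := range waitThenServe(ctrl) {
		fmt.Println(e)
	}
}
```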

Runtime Monitoring

Once online, the health checker:
  • Resets the offline counter on each successful check
  • Increments the counter on each failed check
  • Marks the endpoint offline after reaching the threshold
  • Signals Carrier to exit, triggering a container restart
if state == EndpointStateOnline {
    // reset the current offline count
    c.offlineCount = 0
    continue
}
c.offlineCount++
if c.offlineCount >= c.offlineThresholdCount {
    c.log.Warn("webhook offline", "endpoint", c.webhookEndpoint)
    c.currentState = state
    c.ctrl <- state
}
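Because a successful check resets the counter, only consecutive failures count toward the threshold. The simulation below replays a sequence of check results through the same counting logic. offlineAt is a hypothetical helper written for this illustration; it is not part of Carrier.

```go
package main

import "fmt"

// offlineAt returns the index of the check at which an already-online endpoint
// would be marked offline, or -1 if the threshold is never reached.
// results holds one entry per check: true for HTTP 200, false otherwise.
func offlineAt(results []bool, threshold int) int {
	offlineCount := 0
	for i, ok := range results {
		if ok {
			offlineCount = 0 // a success resets the failure streak
			continue
		}
		offlineCount++
		if offlineCount >= threshold {
			return i
		}
	}
	return -1
}

func main() {
	flaky := []bool{false, false, true, false, false, true}
	down := []bool{true, false, false, false}

	fmt.Println(offlineAt(flaky, 3)) // -1: never three failures in a row
	fmt.Println(offlineAt(down, 3))  // 3: third consecutive failure
}
```

A flaky endpoint that recovers before the threshold is reached never triggers an exit, which is why raising the threshold makes Carrier more tolerant of brief outages.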

Implementing Health Check Endpoints

Best Practices

Keep It Simple

Health checks should be fast and lightweight. Avoid expensive database queries or slow external API calls.

Check Dependencies

Verify that the critical dependencies your webhook needs are available, using cheap checks such as a connection ping.

Return Quickly

Respond within the configured timeout (default 10s). Faster is better.

Use Standard Codes

Return HTTP 200 for healthy, anything else for unhealthy.

Example Implementations

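import (
    "database/sql"
    "net/http"
)

// db is assumed to be an initialized *sql.DB shared by the handlers.
var db *sql.DB

// healthHandler is a simple health check that always reports healthy.
func healthHandler(w http.ResponseWriter, r *http.Request) {
    w.WriteHeader(http.StatusOK)
    w.Write([]byte("OK"))
}

// healthHandlerWithChecks reports unhealthy when the database is unreachable.
func healthHandlerWithChecks(w http.ResponseWriter, r *http.Request) {
    // Check database connection
    if err := db.Ping(); err != nil {
        w.WriteHeader(http.StatusServiceUnavailable)
        return
    }

    w.WriteHeader(http.StatusOK)
    w.Write([]byte("OK"))
}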

Monitoring with StatLogger

When health checks are enabled, you can monitor their operation using Carrier’s statistics logging feature. Enable it by setting:
CARRIER_ENABLE_STAT_LOG=true
The StatLogger (from main.go:42-78) tracks goroutines and memory usage:
type StatLogger struct {
    ticker *time.Ticker
    log    *slog.Logger
    ctx    context.Context
}

func (l *StatLogger) Run() {
    for {
        select {
        case <-l.ctx.Done():
            return
        case <-l.ticker.C:
            var m runtime.MemStats
            runtime.ReadMemStats(&m)
            l.log.Info("stats", "goroutines", runtime.NumGoroutine(), "memory", humanize.Bytes(m.Sys))
        }
    }
}
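This helps you track Carrier's resource usage while health checks are running. The stats cover goroutine count and system memory only; health check results themselves appear in the “webhook online” and “webhook offline” log messages.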

Troubleshooting

If Carrier never reports the webhook as online:
  • Verify your health check endpoint is accessible from Carrier
  • Check that the endpoint returns HTTP 200
  • Review logs for connection errors
  • Ensure the endpoint URL is correct (protocol, host, port, path)
If Carrier exits too often because of transient failures, increase CARRIER_WEBHOOK_OFFLINE_THRESHOLD_COUNT to allow more consecutive failures before the endpoint is marked offline. The default is 5.
If an offline webhook is detected too slowly, decrease CARRIER_WEBHOOK_HEALTH_CHECK_INTERVAL. The default is 60s; for faster detection, try 30s or 15s.
If health check requests time out:
  • Check network latency between Carrier and the webhook
  • Optimize your health check endpoint to respond faster
  • Increase CARRIER_WEBHOOK_HEALTH_CHECK_TIMEOUT if necessary

Kubernetes Integration

When running Carrier in Kubernetes, health checks work seamlessly with pod lifecycle management:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: carrier-worker
spec:
  # apps/v1 requires a selector that matches the pod template labels;
  # the label value here is only an example
  selector:
    matchLabels:
      app: carrier-worker
  template:
    metadata:
      labels:
        app: carrier-worker
    spec:
      containers:
        - name: carrier
          image: amplifysecurity/carrier
          env:
            - name: CARRIER_WEBHOOK_HEALTH_CHECK_ENDPOINT
              value: http://localhost:9000/health
            - name: CARRIER_WEBHOOK_OFFLINE_THRESHOLD_COUNT
              value: "3"
        - name: worker
          image: my-worker:latest
          ports:
            - containerPort: 9000
          livenessProbe:
            httpGet:
              path: /health
              port: 9000
            initialDelaySeconds: 10
            periodSeconds: 30
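Carrier exits when it detects that the webhook is offline. Pods managed by a Deployment always use restartPolicy: Always, so Kubernetes restarts the Carrier container automatically. For other workload types, confirm that the restart policy allows the container to restart.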
