Health Status Overview
Your automaton continuously monitors its own health and reports status through multiple channels.
Quick Status Check
Output:
Name: MyAutomaton
Address: 0x742d35Cc6634C0532925a3b844Bc9e7595f0bEb1
State: running
Credits: $12.45
USDC Balance: $50.00
Survival Tier: normal
Uptime: 3h 24m
Last Heartbeat: 2 minutes ago
Turns: 147
Agent States
Your automaton transitions through different operational states:
export type AgentState =
  | "setup"        // Initial configuration
  | "waking"       // Starting up
  | "running"      // Normal operation
  | "sleeping"     // Idle, waiting for input
  | "low_compute"  // Low credits, reduced activity
  | "critical"     // Zero credits, survival mode
  | "dead";        // Non-operational
State Transitions
┌──────┐
│ setup│
└───┬──┘
    ▼
┌───────┐
│waking │
└───┬───┘
    ▼
┌────────┐    ┌──────────┐
│running │◄──►│ sleeping │
└───┬────┘    └──────────┘
    │
    ▼ (credits low)
┌─────────────┐
│ low_compute │
└──────┬──────┘
       │
       ▼ (credits exhausted)
┌──────────┐
│ critical │
└─────┬────┘
      │
      ▼ (negative balance)
┌──────┐
│ dead │
└──────┘
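The diagram can be read as a small transition table. The sketch below is a hypothetical helper, not the runtime's own code: `TRANSITIONS` and `canTransition` are names introduced for illustration, and the table contains only the edges drawn in the diagram. Recovery paths, such as leaving `low_compute` after a top-up, are not shown there and are deliberately left out.

```typescript
type AgentState =
  | "setup"
  | "waking"
  | "running"
  | "sleeping"
  | "low_compute"
  | "critical"
  | "dead";

// Edges read off the state diagram. This table is an assumption made for
// illustration; the real runtime may allow additional (recovery) transitions.
const TRANSITIONS: Record<AgentState, readonly AgentState[]> = {
  setup: ["waking"],
  waking: ["running"],
  running: ["sleeping", "low_compute"],
  sleeping: ["running"],
  low_compute: ["critical"],
  critical: ["dead"],
  dead: [],
};

// True when the diagram draws an edge from `from` to `to`.
function canTransition(from: AgentState, to: AgentState): boolean {
  return TRANSITIONS[from].includes(to);
}
```

A table like this makes it easy to flag unexpected state changes when reading logs, for example a jump from `setup` straight to `running`.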
Observability System
The automaton includes a comprehensive observability stack with structured logging, metrics, and alerts.
Structured Logging
All logs are JSON-formatted for easy parsing:
{
  "timestamp": "2026-03-03T10:15:30.123Z",
  "level": "info",
  "module": "heartbeat.scheduler",
  "message": "Heartbeat task completed",
  "context": {
    "taskName": "check_inbox",
    "durationMs": 234
  }
}
Log Levels
- debug: Detailed diagnostic information
- info: General informational messages
- warn: Warning messages (non-critical issues)
- error: Error messages (operation failed)
- fatal: Fatal errors (system shutdown)
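Because each entry is JSON with a `level` field, filtering by severity is straightforward. The following is a minimal sketch that assumes logs are stored one JSON object per line, which the page does not state. `LogEntry`, `LEVELS`, and `filterLogs` are illustrative names, not part of the automaton codebase.

```typescript
type LogLevel = "debug" | "info" | "warn" | "error" | "fatal";

interface LogEntry {
  timestamp: string;
  level: LogLevel;
  module: string;
  message: string;
  context?: Record<string, unknown>;
}

// Severity order, lowest to highest, matching the list of log levels.
const LEVELS: readonly LogLevel[] = ["debug", "info", "warn", "error", "fatal"];

// Parse newline-delimited JSON and keep entries at or above minLevel.
// Assumption: one JSON object per line.
function filterLogs(ndjson: string, minLevel: LogLevel): LogEntry[] {
  const threshold = LEVELS.indexOf(minLevel);
  const kept: LogEntry[] = [];
  for (const line of ndjson.split("\n")) {
    if (line.trim() === "") continue;
    let entry: LogEntry;
    try {
      entry = JSON.parse(line) as LogEntry;
    } catch {
      continue; // skip lines that are not valid JSON
    }
    // Unknown levels yield index -1 and are dropped.
    if (LEVELS.indexOf(entry.level) >= threshold) kept.push(entry);
  }
  return kept;
}
```

Calling `filterLogs(text, "warn")` keeps `warn`, `error`, and `fatal` entries and drops the rest.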
Viewing Logs
# All logs
automaton-cli logs
# Filter by level
automaton-cli logs --level error
# Follow live
automaton-cli logs --follow
# Search content
automaton-cli logs --grep "credit"
# Last 100 lines
automaton-cli logs --tail 100
Metrics Collection
The automaton tracks three types of metrics:
Counters
Monotonically increasing values:
turns_total: Total turns executed
inference_cost_cents: Cumulative inference cost
heartbeat_task_successes_total: Successful heartbeat tasks
heartbeat_task_failures_total: Failed heartbeat tasks
policy_decisions_total: Total policy evaluations
policy_denies_total: Denied policy decisions
Gauges
Point-in-time measurements:
balance_cents: Current credit balance
usdc_balance: Current USDC balance
context_tokens_total: Current context size
turns_last_hour: Turns in last hour (windowed)
unhealthy_child_count: Number of unhealthy child agents
Histograms
Distribution of values over time:
turn_duration_ms: Turn execution time
inference_latency_ms: Model API latency
tool_duration_ms: Tool execution time
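To make the difference between the three types concrete, here is a small in-memory sketch. It is not the collector in `src/observability/metrics.ts`: `MetricsSketch` is a hypothetical class, it omits labels, and the `percentile` helper is an addition for illustration.

```typescript
class MetricsSketch {
  private counters = new Map<string, number>();
  private gauges = new Map<string, number>();
  private histograms = new Map<string, number[]>();

  // Counters only ever go up.
  increment(name: string, by = 1): void {
    if (by < 0) throw new RangeError("counters cannot decrease");
    this.counters.set(name, (this.counters.get(name) ?? 0) + by);
  }

  // Gauges overwrite the previous reading.
  gauge(name: string, value: number): void {
    this.gauges.set(name, value);
  }

  // Histograms keep every observation so a distribution can be derived.
  histogram(name: string, value: number): void {
    const values = this.histograms.get(name) ?? [];
    values.push(value);
    this.histograms.set(name, values);
  }

  getCounter(name: string): number {
    return this.counters.get(name) ?? 0;
  }

  getGauge(name: string): number | undefined {
    return this.gauges.get(name);
  }

  // Nearest-rank percentile over recorded observations, for p in (0, 100].
  percentile(name: string, p: number): number | undefined {
    const sorted = [...(this.histograms.get(name) ?? [])].sort((a, b) => a - b);
    if (sorted.length === 0) return undefined;
    const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
    return sorted[rank - 1];
  }
}
```

A counter answers "how many so far", a gauge answers "what is it right now", and a histogram answers "how are the values distributed", which is why latency-style metrics use the third type.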
Metrics API
From src/observability/metrics.ts:
import { getMetrics } from "./observability/metrics.js";
const metrics = getMetrics();
// Increment counter
metrics.increment("turns_total", { state: "running" });
// Set gauge
metrics.gauge("balance_cents", creditsCents);
// Record histogram value
metrics.histogram("turn_duration_ms", durationMs);
// Query metrics
const turnCount = metrics.getCounter("turns_total");
const balance = metrics.getGauge("balance_cents");
Viewing Metrics
# Current metrics snapshot
automaton-cli metrics
# Specific metric
automaton-cli metrics --name balance_cents
# Export to JSON
automaton-cli metrics --format json > metrics.json
Alert System
The automaton evaluates alert rules against metric snapshots and triggers notifications.
Built-in Alert Rules
From src/observability/alerts.ts:
1. Balance Below Reserve
{
  name: "balance_below_reserve",
  severity: "critical",
  message: "Balance is below minimum reserve (1000 cents)",
  cooldownMs: 5 * 60 * 1000,
  condition: (metrics) => {
    const balance = metrics.gauges.get("balance_cents") ?? Infinity;
    return balance < 1000;
  },
}
2. High Heartbeat Failure Rate
{
  name: "heartbeat_high_failure_rate",
  severity: "warning",
  message: "Heartbeat task failure rate exceeds 20%",
  cooldownMs: 15 * 60 * 1000,
  condition: (metrics) => {
    const failures = metrics.counters.get("heartbeat_task_failures_total") ?? 0;
    const successes = metrics.counters.get("heartbeat_task_successes_total") ?? 0;
    const total = failures + successes;
    if (total === 0) return false;
    return failures / total > 0.2;
  },
}
3. Context Near Capacity
{
  name: "context_near_capacity",
  severity: "warning",
  message: "Context token usage above 90% of budget",
  cooldownMs: 10 * 60 * 1000,
  condition: (metrics) => {
    const tokens = metrics.gauges.get("context_tokens_total") ?? 0;
    return tokens > 90_000; // 100k default budget
  },
}
4. Zero Turns Last Hour
{
  name: "zero_turns_last_hour",
  severity: "critical",
  message: "No successful turns in the last hour",
  cooldownMs: 60 * 60 * 1000,
  condition: (metrics) => {
    const turnsLastHour = metrics.gauges.get("turns_last_hour") ?? -1;
    if (turnsLastHour >= 0) return turnsLastHour === 0;
    return false;
  },
}
Alert Cooldowns
Each alert has a cooldown period to prevent alert storms. Once an alert fires, it won’t fire again until the cooldown expires.
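The cooldown behaviour can be sketched as follows. This is an illustration under assumptions, not the evaluator in `src/observability/alerts.ts`: `MetricSnapshot`, `AlertRule`, and `evaluateAlerts` are hypothetical names whose shapes are inferred from the rule objects shown on this page, and the clock is passed in so the logic is easy to test.

```typescript
interface MetricSnapshot {
  counters: Map<string, number>;
  gauges: Map<string, number>;
}

interface AlertRule {
  name: string;
  severity: "warning" | "critical";
  message: string;
  cooldownMs: number;
  condition: (metrics: MetricSnapshot) => boolean;
}

// Returns the rules that fire for this snapshot and records their fire time.
// lastFired maps rule name -> epoch milliseconds of the most recent firing.
function evaluateAlerts(
  rules: AlertRule[],
  metrics: MetricSnapshot,
  lastFired: Map<string, number>,
  now: number = Date.now(),
): AlertRule[] {
  const fired: AlertRule[] = [];
  for (const rule of rules) {
    const last = lastFired.get(rule.name);
    const coolingDown = last !== undefined && now - last < rule.cooldownMs;
    // Skip rules still in cooldown, and rules whose condition is false.
    if (coolingDown || !rule.condition(metrics)) continue;
    lastFired.set(rule.name, now);
    fired.push(rule);
  }
  return fired;
}
```

With a 5-minute cooldown, a rule whose condition stays true fires on the first evaluation, stays silent for evaluations during the next five minutes, and fires again once the cooldown has elapsed.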
Viewing Alerts
# Active alerts
automaton-cli alerts
# Alert history
automaton-cli alerts --history
Heartbeat Health
The heartbeat system provides autonomous health monitoring.
Heartbeat Tasks
From ~/.automaton/heartbeat.yml:
entries:
  - name: check_balance
    schedule: "0 */5 * * * *"  # Every 5 minutes
    task: check_balance
    enabled: true
  - name: check_inbox
    schedule: "0 */2 * * * *"  # Every 2 minutes
    task: check_inbox
    enabled: true
  - name: self_reflect
    schedule: "0 0 */6 * * *"  # Every 6 hours
    task: self_reflect
    enabled: true
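The schedules above have six fields, and the inline comments suggest that the first field is seconds. The sketch below shows how such an expression could be matched against a timestamp. It is a simplified illustration, not the scheduler's parser: `matchesField` and `matchesSchedule` are hypothetical helpers, the field order (second, minute, hour, day of month, month, day of week) is an assumption, and only `*`, `*/n`, and plain integers are supported.

```typescript
// Supports only "*", "*/n", and a single integer per field.
// Step matching uses value % n, which is adequate for the second, minute,
// and hour fields used above but is not full cron semantics for day or month.
function matchesField(field: string, value: number): boolean {
  if (field === "*") return true;
  if (field.startsWith("*/")) {
    const step = Number(field.slice(2));
    return Number.isInteger(step) && step > 0 && value % step === 0;
  }
  return Number(field) === value;
}

// Assumed field order: second minute hour day-of-month month day-of-week (UTC).
function matchesSchedule(expr: string, date: Date): boolean {
  const fields = expr.trim().split(/\s+/);
  if (fields.length !== 6) {
    throw new Error(`expected 6 fields, got ${fields.length}`);
  }
  const values = [
    date.getUTCSeconds(),
    date.getUTCMinutes(),
    date.getUTCHours(),
    date.getUTCDate(),
    date.getUTCMonth() + 1,
    date.getUTCDay(),
  ];
  return fields.every((field, i) => matchesField(field, values[i] as number));
}
```

Under these assumptions, `"0 */5 * * * *"` matches at second 0 of every fifth minute, and `"0 0 */6 * * *"` matches at 00:00, 06:00, 12:00, and 18:00 UTC.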
Heartbeat Status
automaton-cli heartbeat status
Output:
Task           Schedule      Last Run       Next Run   Status
check_balance  */5 * * * *   2 minutes ago  3 minutes  OK
check_inbox    */2 * * * *   1 minute ago   1 minute   OK
self_reflect   0 */6 * * *   4 hours ago    2 hours    OK
Child Agent Health Monitoring
If your automaton has spawned child agents, it monitors their health:
Health Checks
From src/orchestration/health-monitor.ts:
export interface AgentHealthStatus {
  address: string;
  name: string;
  status: string;
  healthy: boolean;
  lastHeartbeat: string | null;
  currentTaskId: string | null;
  creditBalance: number | null;
  errorRate: number;
  issues: string[];
}
Health Issues
The system detects:
- heartbeat_missing: No recent heartbeat
- heartbeat_stale: Heartbeat older than 15 minutes
- process_crashed: No response for 45 minutes
- stuck_on_task: Task execution exceeding timeout
- out_of_credits: Balance below minimum (10 cents)
- error_loop: Error rate exceeds 60%
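The thresholds in this list can be expressed as a small classification function. This is a sketch under several assumptions, not the logic in `src/orchestration/health-monitor.ts`: `ChildSnapshot` and `detectIssues` are hypothetical names, `heartbeat_missing` is taken to mean that no heartbeat was ever recorded, `process_crashed` is taken to supersede `heartbeat_stale`, `errorRate` is taken to be a fraction between 0 and 1, and `stuck_on_task` is omitted because the page does not give the timeout.

```typescript
interface ChildSnapshot {
  lastHeartbeat: string | null; // ISO 8601 timestamp, as in AgentHealthStatus
  creditBalance: number | null; // cents
  errorRate: number;            // assumed to be a fraction between 0 and 1
}

const MINUTE_MS = 60_000;

// Map a child's latest readings to the issue names listed above.
function detectIssues(child: ChildSnapshot, now: Date = new Date()): string[] {
  const issues: string[] = [];
  if (child.lastHeartbeat === null) {
    issues.push("heartbeat_missing");
  } else {
    const ageMs = now.getTime() - new Date(child.lastHeartbeat).getTime();
    if (ageMs > 45 * MINUTE_MS) issues.push("process_crashed");
    else if (ageMs > 15 * MINUTE_MS) issues.push("heartbeat_stale");
  }
  if (child.creditBalance !== null && child.creditBalance < 10) {
    issues.push("out_of_credits");
  }
  if (child.errorRate > 0.6) issues.push("error_loop");
  return issues;
}
```

A child whose last heartbeat is 20 minutes old, with 5 cents of credit and a 70% error rate, would be reported with `heartbeat_stale`, `out_of_credits`, and `error_loop`.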
Auto-Healing
The health monitor can automatically:
- Fund agents low on credits
- Restart crashed agents
- Reassign stuck tasks
- Stop agents in error loops
# View child health
automaton-cli children health
# Trigger auto-heal
automaton-cli children heal
Track agent turn execution:
SELECT
  AVG(cost_cents) AS avg_cost,
  AVG(json_extract(token_usage, '$.totalTokens')) AS avg_tokens,
  COUNT(*) AS turn_count
FROM turns
WHERE timestamp > datetime('now', '-1 day');
Inference Costs
Query inference spending:
SELECT
  model,
  SUM(cost_cents) AS total_cost,
  SUM(input_tokens) AS total_input,
  SUM(output_tokens) AS total_output,
  AVG(latency_ms) AS avg_latency
FROM inference_costs
GROUP BY model;
Analyze tool call patterns:
SELECT
  name,
  COUNT(*) AS call_count,
  AVG(duration_ms) AS avg_duration,
  SUM(CASE WHEN error IS NOT NULL THEN 1 ELSE 0 END) AS error_count
FROM tool_calls
GROUP BY name
ORDER BY call_count DESC;
Database Health
Monitor SQLite database:
# Database size
ls -lh ~/.automaton/state.db
# Integrity check
sqlite3 ~/.automaton/state.db "PRAGMA integrity_check;"
# Table sizes
sqlite3 ~/.automaton/state.db "SELECT name, SUM(pgsize) as size FROM dbstat GROUP BY name;"
Vacuum and Optimize
# Compact database
sqlite3 ~/.automaton/state.db "VACUUM;"
# Refresh query planner statistics
sqlite3 ~/.automaton/state.db "ANALYZE;"
System Resource Monitoring
Sandbox Resources
Check Conway sandbox usage:
automaton-cli sandbox stats
Memory Usage
# Process memory
ps aux | grep automaton
# Disk usage of the data directory (including the database)
du -sh ~/.automaton/
Export State
Export full automaton state for debugging:
automaton-cli export --output state-dump.json
Health Report
Generate comprehensive health report:
automaton-cli health-report
Includes:
- Current state and uptime
- Credit and USDC balances
- Recent turns and costs
- Active alerts
- Heartbeat status
- Child agent health
- Database stats
Best Practices
Proactive Monitoring
- Set up alerts: Configure notifications for critical alerts
- Review daily: Check status and metrics daily
- Trend analysis: Track spending and performance trends
- Capacity planning: Monitor context usage and database growth
- Optimize context: Prune old turns to reduce context size
- Tune heartbeat: Balance responsiveness vs cost
- Index optimization: Add database indexes for common queries
- Model selection: Use appropriate models for task complexity
Debugging
- Enable debug logs: Set logLevel: "debug" temporarily
- Increase verbosity: Add context to critical operations
- Trace tool calls: Monitor tool execution and errors
- Profile turns: Measure turn duration and token usage
Troubleshooting Common Issues
High error rate
- Check logs: automaton-cli logs --level error
- Review failed tool calls
- Verify API keys and network connectivity
- Check treasury policy for denied operations
Stuck in sleeping state
- Verify heartbeat is running
- Check inbox for unprocessed messages
- Review wake conditions
- Manually trigger: automaton-cli wake
High costs
- Review inference costs by model
- Check for inefficient tool usage
- Optimize heartbeat frequency
- Switch to cheaper models for routine tasks
Database corruption
- Run integrity check
- Restore from backup
- Check disk space
- Review logs for write errors