
Documentation Index

Fetch the complete documentation index at: https://mintlify.com/nearai/ironclaw/llms.txt

Use this file to discover all available pages before exploring further.

Overview

IronClaw defaults to NEAR AI for model access but supports any OpenAI-compatible endpoint as well as Anthropic and Ollama directly. This guide covers configuration for all supported providers.

Provider Overview

| Provider | Backend Value | Requires API Key | Notes |
|---|---|---|---|
| NEAR AI | nearai | OAuth (browser) | Default; multi-model |
| Anthropic | anthropic | ANTHROPIC_API_KEY | Claude models |
| OpenAI | openai | OPENAI_API_KEY | GPT models |
| Ollama | ollama | No | Local inference |
| OpenRouter | openai_compatible | LLM_API_KEY | 300+ models |
| Together AI | openai_compatible | LLM_API_KEY | Fast inference |
| Fireworks AI | openai_compatible | LLM_API_KEY | Fast inference |
| vLLM / LiteLLM | openai_compatible | Optional | Self-hosted |
| LM Studio | openai_compatible | No | Local GUI |

Provider Configuration

NEAR AI (Default)

No additional configuration required. On first run, ironclaw onboard opens a browser for OAuth authentication. Credentials are saved to ~/.ironclaw/session.json.
# Optional: customize model and base URL
NEARAI_MODEL=claude-3-5-sonnet-20241022
NEARAI_BASE_URL=https://private.near.ai
Features:
  • OAuth authentication (no API key needed)
  • Multi-model support (Claude, GPT, Llama, etc.)
  • Usage tracking and billing through NEAR

Anthropic (Claude)

Direct access to Claude models:
LLM_BACKEND=anthropic
ANTHROPIC_API_KEY=sk-ant-...
Popular Models:
  • claude-sonnet-4-20250514 - Latest Sonnet (recommended)
  • claude-3-5-sonnet-20241022 - Sonnet 3.5
  • claude-3-5-haiku-20241022 - Fast, cost-effective
Configuration Options:
# Model selection
ANTHROPIC_MODEL=claude-sonnet-4-20250514

# Base URL (for custom endpoints)
ANTHROPIC_BASE_URL=https://api.anthropic.com

# API version
ANTHROPIC_VERSION=2023-06-01
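The variables above map directly onto Anthropic's Messages API. A minimal Python sketch of the request they produce (the endpoint, headers, and body shape follow Anthropic's public API; the helper itself is illustrative, not IronClaw code):

```python
import json
import os

def build_anthropic_request(prompt: str) -> dict:
    """Assemble a Messages API request from the env vars above."""
    base_url = os.environ.get("ANTHROPIC_BASE_URL", "https://api.anthropic.com")
    return {
        "url": f"{base_url}/v1/messages",
        "headers": {
            "x-api-key": os.environ["ANTHROPIC_API_KEY"],
            "anthropic-version": os.environ.get("ANTHROPIC_VERSION", "2023-06-01"),
            "content-type": "application/json",
        },
        "body": json.dumps({
            "model": os.environ.get("ANTHROPIC_MODEL", "claude-sonnet-4-20250514"),
            "max_tokens": 1024,
            "messages": [{"role": "user", "content": prompt}],
        }),
    }
```

Note that Anthropic authenticates with an `x-api-key` header rather than the `Authorization: Bearer` scheme used by OpenAI-compatible endpoints.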

OpenAI (GPT)

Access GPT models:
LLM_BACKEND=openai
OPENAI_API_KEY=sk-...
Popular Models:
  • gpt-4o - GPT-4 Omni (flagship multimodal)
  • gpt-4o-mini - Fast, cost-effective
  • o3-mini - Reasoning model
Configuration Options:
# Model selection
OPENAI_MODEL=gpt-4o

# Base URL (for Azure OpenAI, etc.)
OPENAI_BASE_URL=https://api.openai.com/v1

# Organization ID (optional)
OPENAI_ORG_ID=org-...

Ollama (Local)

Run models locally:
LLM_BACKEND=ollama
OLLAMA_MODEL=llama3.2
Setup:
  1. Install Ollama from ollama.com
  2. Pull a model: ollama pull llama3.2
  3. Start Ollama service (automatic on most systems)
  4. Configure IronClaw to use Ollama
Configuration Options:
# Model
OLLAMA_MODEL=llama3.2

# Base URL (if running on different host)
OLLAMA_BASE_URL=http://localhost:11434

# Context window (override model default)
OLLAMA_CONTEXT_LENGTH=8192
Popular Models:
  • llama3.2 - Meta’s latest
  • mistral - Fast and efficient
  • codellama - Code-specialized
  • deepseek-coder - Code understanding
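These variables correspond to Ollama's /api/chat endpoint, where num_ctx is Ollama's own option for overriding the context window. A hypothetical sketch of how a client might assemble the request (illustrative, not IronClaw's actual code):

```python
import os

def build_ollama_request(prompt: str) -> tuple[str, dict]:
    """Build an /api/chat request from the Ollama env vars above."""
    base = os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434")
    body = {
        "model": os.environ.get("OLLAMA_MODEL", "llama3.2"),
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    ctx = os.environ.get("OLLAMA_CONTEXT_LENGTH")
    if ctx:
        # Ollama exposes the context window as the num_ctx option
        body["options"] = {"num_ctx": int(ctx)}
    return f"{base}/api/chat", body
```

No API key header is needed: Ollama's local server is unauthenticated by default.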

OpenRouter

Access 300+ models through a single API:
LLM_BACKEND=openai_compatible
LLM_BASE_URL=https://openrouter.ai/api/v1
LLM_API_KEY=sk-or-...
LLM_MODEL=anthropic/claude-sonnet-4
Popular Models:

| Model | ID |
|---|---|
| Claude Sonnet 4 | anthropic/claude-sonnet-4 |
| GPT-4o | openai/gpt-4o |
| Llama 4 Maverick | meta-llama/llama-4-maverick |
| Gemini 2.0 Flash | google/gemini-2.0-flash-001 |
| Mistral Small | mistralai/mistral-small-3.1-24b-instruct |

Browse all models at openrouter.ai/models.

Features:
  • Unified API for all major model providers
  • Automatic fallback if primary model is unavailable
  • Usage analytics and cost tracking
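Under the hood, every openai_compatible backend speaks the same chat-completions protocol, so the four LLM_* variables are enough to construct a request. A minimal sketch (illustrative helper, not IronClaw's actual code):

```python
import os

def build_chat_request(prompt: str) -> tuple[str, dict, dict]:
    """Standard OpenAI-compatible chat completion request from LLM_* vars."""
    base = os.environ["LLM_BASE_URL"].rstrip("/")
    headers = {"Authorization": f"Bearer {os.environ['LLM_API_KEY']}"}
    body = {
        "model": os.environ["LLM_MODEL"],
        "messages": [{"role": "user", "content": prompt}],
    }
    return f"{base}/chat/completions", headers, body
```

The same shape works unchanged for Together AI, Fireworks, vLLM, LiteLLM, and LM Studio below; only the base URL, key, and model ID differ.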

Together AI

Fast inference for open-source models:
LLM_BACKEND=openai_compatible
LLM_BASE_URL=https://api.together.xyz/v1
LLM_API_KEY=...
LLM_MODEL=meta-llama/Llama-3.3-70B-Instruct-Turbo
Popular Models:

| Model | ID |
|---|---|
| Llama 3.3 70B | meta-llama/Llama-3.3-70B-Instruct-Turbo |
| DeepSeek R1 | deepseek-ai/DeepSeek-R1 |
| Qwen 2.5 72B | Qwen/Qwen2.5-72B-Instruct-Turbo |
Features:
  • Fast inference (optimized infrastructure)
  • Competitive pricing
  • Open-source model focus

Fireworks AI

High-performance inference with compound AI support:
LLM_BACKEND=openai_compatible
LLM_BASE_URL=https://api.fireworks.ai/inference/v1
LLM_API_KEY=fw_...
LLM_MODEL=accounts/fireworks/models/llama4-maverick-instruct-basic
Features:
  • Sub-second latency
  • Compound AI system support (function calling, tool use)
  • Multi-model support

vLLM / LiteLLM (Self-Hosted)

Run your own inference server:

vLLM

LLM_BACKEND=openai_compatible
LLM_BASE_URL=http://localhost:8000/v1
LLM_API_KEY=token-abc123  # Any string if auth not configured
LLM_MODEL=meta-llama/Llama-3.1-8B-Instruct
Setup:
# Install vLLM
pip install vllm

# Start server
vllm serve meta-llama/Llama-3.1-8B-Instruct \
  --host 0.0.0.0 \
  --port 8000

LiteLLM

Proxy that forwards to any backend (Bedrock, Vertex, Azure, etc.):
LLM_BACKEND=openai_compatible
LLM_BASE_URL=http://localhost:4000/v1
LLM_API_KEY=sk-...
LLM_MODEL=gpt-4o  # As configured in litellm config.yaml
Setup:
# Install LiteLLM
pip install litellm

# Create config.yaml
cat > config.yaml <<EOF
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: azure/gpt-4o
      api_base: https://my-azure.openai.azure.com
      api_key: os.environ/AZURE_API_KEY
EOF

# Start proxy
litellm --config config.yaml

LM Studio (Local GUI)

User-friendly local model hosting:
LLM_BACKEND=openai_compatible
LLM_BASE_URL=http://localhost:1234/v1
LLM_MODEL=llama-3.2-3b-instruct-q4_K_M
# LLM_API_KEY not required
Setup:
  1. Download LM Studio
  2. Download a model from the catalog
  3. Start the local server (tab in LM Studio)
  4. Configure IronClaw to use the endpoint

Advanced Configuration

Model Metadata Override

Override context length and max output:
# Context window size
LLM_CONTEXT_LENGTH=200000

# Max output tokens
LLM_MAX_OUTPUT_TOKENS=8192

# Temperature
LLM_TEMPERATURE=0.7

Streaming

Enable/disable streaming responses:
# Enable streaming (default)
LLM_STREAMING=true

# Disable streaming
LLM_STREAMING=false

Retry Configuration

# Max retries on failure
LLM_MAX_RETRIES=3

# Retry delay (milliseconds)
LLM_RETRY_DELAY=1000

# Timeout (seconds)
LLM_TIMEOUT=120
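A client honoring these settings might look like the following sketch (illustrative only; whether IronClaw waits a fixed or exponentially growing delay between attempts is not documented here):

```python
import os
import time

def call_with_retries(send, request):
    """Retry a failing request, honoring LLM_MAX_RETRIES and LLM_RETRY_DELAY."""
    max_retries = int(os.environ.get("LLM_MAX_RETRIES", "3"))
    delay_ms = int(os.environ.get("LLM_RETRY_DELAY", "1000"))
    for attempt in range(max_retries + 1):
        try:
            return send(request)
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: surface the original error
            time.sleep(delay_ms / 1000)
```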

Request Headers

Add custom headers to LLM requests:
# Single header
LLM_HEADER_X_Custom=value

# Multiple headers
LLM_HEADER_X_Request_ID=req-123
LLM_HEADER_X_User_Agent=ironclaw/1.0
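The examples suggest that underscores in the variable suffix map to hyphens in the header name (e.g. LLM_HEADER_X_Request_ID becomes X-Request-ID). Assuming that convention, the parsing can be sketched as:

```python
import os

def custom_headers() -> dict:
    """Collect LLM_HEADER_* env vars into HTTP headers.

    Assumes underscores in the suffix map to hyphens, e.g.
    LLM_HEADER_X_Request_ID -> X-Request-ID.
    """
    prefix = "LLM_HEADER_"
    return {
        name[len(prefix):].replace("_", "-"): value
        for name, value in os.environ.items()
        if name.startswith(prefix)
    }
```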

Proxy Configuration

Route LLM requests through HTTP proxy:
# HTTP proxy
HTTP_PROXY=http://proxy.company.com:8080

# HTTPS proxy
HTTPS_PROXY=http://proxy.company.com:8080

# No proxy (comma-separated hosts)
NO_PROXY=localhost,127.0.0.1,.local
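NO_PROXY semantics vary slightly across tools; a common convention, sketched below, is exact-host matching plus suffix matching for entries that start with a dot:

```python
def bypass_proxy(host: str, no_proxy: str) -> bool:
    """Return True if `host` matches an entry in a NO_PROXY list.

    Entries starting with '.' match any subdomain of that suffix.
    """
    for entry in (e.strip() for e in no_proxy.split(",") if e.strip()):
        if host == entry or (entry.startswith(".") and host.endswith(entry)):
            return True
    return False
```

With the example above, `localhost` and `printer.local` bypass the proxy while `api.openai.com` is routed through it.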

Setup Wizard

Instead of editing .env manually, run the onboarding wizard:
ironclaw onboard
The wizard will:
  1. Prompt for LLM backend selection
  2. Request API keys (securely masked)
  3. Test the connection
  4. Save configuration to .env
Wizard Options:
  • NEAR AI (OAuth flow)
  • Anthropic (API key)
  • OpenAI (API key)
  • Ollama (model selection)
  • OpenAI-compatible (custom endpoint)

Provider-Specific Features

Anthropic

Tool Use (Function Calling): Anthropic’s native tool use format is fully supported:
// Tools are automatically converted to Anthropic format
pub struct AnthropicTool {
    pub name: String,
    pub description: String,
    pub input_schema: serde_json::Value,
}
Prompt Caching: Long prompts are automatically cached:
# Enable prompt caching (default: true)
ANTHROPIC_PROMPT_CACHING=true
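Anthropic's API marks cacheable content with a cache_control field on a content block. A sketch of how a long system prompt might be flagged (illustrative helper, not IronClaw code):

```python
def cached_system_prompt(text: str) -> list:
    """Mark a long system prompt as cacheable using Anthropic's
    cache_control content-block field."""
    return [{
        "type": "text",
        "text": text,
        "cache_control": {"type": "ephemeral"},
    }]
```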

OpenAI

Function Calling: Native OpenAI function calling:
pub struct OpenAIFunction {
    pub name: String,
    pub description: String,
    pub parameters: serde_json::Value,
}
Response Format: Enforce JSON output:
OPENAI_RESPONSE_FORMAT=json_object
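With OpenAI's API, JSON mode is requested via the response_format field in the request body; note that the API also requires the prompt itself to mention JSON. A sketch of the resulting body (illustrative helper):

```python
def json_mode_body(model: str, prompt: str) -> dict:
    """Chat request body that enforces JSON output via response_format."""
    return {
        "model": model,
        "response_format": {"type": "json_object"},
        "messages": [
            # json_object mode requires the word "JSON" in the prompt
            {"role": "system", "content": "Reply in JSON."},
            {"role": "user", "content": prompt},
        ],
    }
```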

Ollama

Model Pull: Automatically pull models if missing:
OLLAMA_AUTO_PULL=true
Keep Alive: Control model unloading:
# Keep model loaded indefinitely
OLLAMA_KEEP_ALIVE=-1

# Unload after 5 minutes
OLLAMA_KEEP_ALIVE=5m

Testing Configuration

Connection Test

# Test LLM connection
ironclaw llm test

# Expected output:
# ✅ Connected to Anthropic (claude-sonnet-4-20250514)
# ✅ Context length: 200000 tokens
# ✅ Max output: 8192 tokens

Completion Test

# Send test completion
ironclaw llm complete "What is 2+2?"

# Expected output:
# 2 + 2 = 4

Troubleshooting

Authentication Errors

Error: Authentication failed (401)
Solutions:
  1. Verify API key is correct
  2. Check API key has not expired
  3. Ensure API key has necessary permissions
  4. For NEAR AI, re-run ironclaw onboard to refresh OAuth token

Rate Limiting

Error: Rate limit exceeded (429)
Solutions:
  1. Reduce request frequency
  2. Increase retry delay: LLM_RETRY_DELAY=5000
  3. Switch to a different provider/model
  4. Upgrade API plan for higher limits

Connection Timeout

Error: Request timeout after 120s
Solutions:
  1. Increase timeout: LLM_TIMEOUT=300
  2. Check network connectivity
  3. Verify proxy configuration
  4. Try a different model (some are slower)

Model Not Found

Error: Model not found: gpt-5
Solutions:
  1. Check model name spelling
  2. Verify model is available for your API key
  3. List available models: ironclaw llm models
  4. For Ollama, pull the model: ollama pull model-name

Invalid Response Format

Error: Invalid JSON response from LLM
Solutions:
  1. Check base URL is correct (must include /v1 for OpenAI-compatible)
  2. Verify provider is actually OpenAI-compatible
  3. Enable debug logging: RUST_LOG=ironclaw::llm=debug
  4. Test endpoint directly with curl

Cost Optimization

Model Selection

Choose cost-effective models:
| Use Case | Recommended Model | Why |
|---|---|---|
| Quick tasks | claude-3-5-haiku-20241022 | Fastest, cheapest Claude |
| Code | gpt-4o-mini | Good code understanding, low cost |
| Complex reasoning | claude-sonnet-4 | Best performance |
| Local/free | Ollama llama3.2 | No API costs |

Prompt Optimization

  1. Reduce context: Minimize system prompts and skill content
  2. Cache prompts: Use Anthropic prompt caching for repeated long prompts
  3. Batch requests: Group similar tasks together
  4. Output limiting: Set max_tokens appropriately
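As a back-of-the-envelope illustration of why trimming context pays off (the per-million-token prices in the usage example are placeholders, not current list prices):

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  in_price_per_m: float, out_price_per_m: float) -> float:
    """Request cost in dollars, given per-million-token prices."""
    return (input_tokens * in_price_per_m
            + output_tokens * out_price_per_m) / 1e6
```

At a hypothetical $3/M input and $15/M output, estimate_cost(100_000, 1_000, 3.0, 15.0) comes to $0.315 per request, almost all of it input: halving a 100k-token context saves more than any output tweak.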

Provider Comparison

| Provider | Cost | Speed | Quality |
|---|---|---|---|
| NEAR AI | Medium | Fast | High (multi-model) |
| Anthropic | High | Fast | Highest (Claude) |
| OpenAI | High | Medium | High (GPT) |
| OpenRouter | Variable | Variable | Variable |
| Together AI | Low | Fast | Medium-High |
| Ollama | Free | Slow | Medium |

Migration Guide

From OpenAI to Anthropic

# Before
LLM_BACKEND=openai
OPENAI_API_KEY=sk-...
OPENAI_MODEL=gpt-4o

# After
LLM_BACKEND=anthropic
ANTHROPIC_API_KEY=sk-ant-...
ANTHROPIC_MODEL=claude-sonnet-4-20250514

From Cloud to Local (Ollama)

# Before
LLM_BACKEND=anthropic
ANTHROPIC_API_KEY=sk-ant-...

# After
LLM_BACKEND=ollama
OLLAMA_MODEL=llama3.2
# No API key needed

From Direct to OpenRouter

# Before (direct Anthropic)
LLM_BACKEND=anthropic
ANTHROPIC_API_KEY=sk-ant-...
ANTHROPIC_MODEL=claude-sonnet-4-20250514

# After (via OpenRouter)
LLM_BACKEND=openai_compatible
LLM_BASE_URL=https://openrouter.ai/api/v1
LLM_API_KEY=sk-or-...
LLM_MODEL=anthropic/claude-sonnet-4

Source Code

Key files:
  • src/llm/mod.rs - LLM provider abstraction
  • src/llm/anthropic.rs - Anthropic implementation
  • src/llm/openai.rs - OpenAI implementation
  • src/llm/ollama.rs - Ollama implementation
  • src/llm/nearai.rs - NEAR AI implementation
  • docs/LLM_PROVIDERS.md - Additional provider documentation
