Documentation Index
Fetch the complete documentation index at: https://mintlify.com/nearai/ironclaw/llms.txt
Use this file to discover all available pages before exploring further.
Overview
IronClaw defaults to NEAR AI for model access, but also supports Anthropic and Ollama directly, as well as any OpenAI-compatible endpoint. This guide covers configuration for all supported providers.
Provider Overview
| Provider | Backend Value | Requires API Key | Notes |
|---|---|---|---|
| NEAR AI | nearai | OAuth (browser) | Default; multi-model |
| Anthropic | anthropic | ANTHROPIC_API_KEY | Claude models |
| OpenAI | openai | OPENAI_API_KEY | GPT models |
| Ollama | ollama | No | Local inference |
| OpenRouter | openai_compatible | LLM_API_KEY | 300+ models |
| Together AI | openai_compatible | LLM_API_KEY | Fast inference |
| Fireworks AI | openai_compatible | LLM_API_KEY | Fast inference |
| vLLM / LiteLLM | openai_compatible | Optional | Self-hosted |
| LM Studio | openai_compatible | No | Local GUI |
Provider Configuration
NEAR AI (Default)
No additional configuration required. On first run, ironclaw onboard opens a browser for OAuth authentication. Credentials are saved to ~/.ironclaw/session.json.
# Optional: customize model and base URL
NEARAI_MODEL=claude-3-5-sonnet-20241022
NEARAI_BASE_URL=https://private.near.ai
Features:
- OAuth authentication (no API key needed)
- Multi-model support (Claude, GPT, Llama, etc.)
- Usage tracking and billing through NEAR
Anthropic (Claude)
Direct access to Claude models:
LLM_BACKEND=anthropic
ANTHROPIC_API_KEY=sk-ant-...
Popular Models:
claude-sonnet-4-20250514 - Latest Sonnet (recommended)
claude-3-5-sonnet-20241022 - Sonnet 3.5
claude-3-5-haiku-20241022 - Fast, cost-effective
Configuration Options:
# Model selection
ANTHROPIC_MODEL=claude-sonnet-4-20250514
# Base URL (for custom endpoints)
ANTHROPIC_BASE_URL=https://api.anthropic.com
# API version
ANTHROPIC_VERSION=2023-06-01
OpenAI (GPT)
Access GPT models:
LLM_BACKEND=openai
OPENAI_API_KEY=sk-...
Popular Models:
gpt-4o - Latest GPT-4 Optimized
gpt-4o-mini - Fast, cost-effective
o3-mini - Reasoning model
Configuration Options:
# Model selection
OPENAI_MODEL=gpt-4o
# Base URL (for Azure OpenAI, etc.)
OPENAI_BASE_URL=https://api.openai.com/v1
# Organization ID (optional)
OPENAI_ORG_ID=org-...
Ollama (Local)
Run models locally:
LLM_BACKEND=ollama
OLLAMA_MODEL=llama3.2
Setup:
- Install Ollama from ollama.com
- Pull a model:
ollama pull llama3.2
- Start Ollama service (automatic on most systems)
- Configure IronClaw to use Ollama
Configuration Options:
# Model
OLLAMA_MODEL=llama3.2
# Base URL (if running on different host)
OLLAMA_BASE_URL=http://localhost:11434
# Context window (override model default)
OLLAMA_CONTEXT_LENGTH=8192
Popular Models:
llama3.2 - Meta’s latest
mistral - Fast and efficient
codellama - Code-specialized
deepseek-coder - Code understanding
OpenRouter
Access 300+ models through a single API:
LLM_BACKEND=openai_compatible
LLM_BASE_URL=https://openrouter.ai/api/v1
LLM_API_KEY=sk-or-...
LLM_MODEL=anthropic/claude-sonnet-4
Popular Models:
| Model | ID |
|---|---|
| Claude Sonnet 4 | anthropic/claude-sonnet-4 |
| GPT-4o | openai/gpt-4o |
| Llama 4 Maverick | meta-llama/llama-4-maverick |
| Gemini 2.0 Flash | google/gemini-2.0-flash-001 |
| Mistral Small | mistralai/mistral-small-3.1-24b-instruct |
Browse all models at openrouter.ai/models.
Features:
- Unified API for all major model providers
- Automatic fallback if primary model is unavailable
- Usage analytics and cost tracking
Together AI
Fast inference for open-source models:
LLM_BACKEND=openai_compatible
LLM_BASE_URL=https://api.together.xyz/v1
LLM_API_KEY=...
LLM_MODEL=meta-llama/Llama-3.3-70B-Instruct-Turbo
Popular Models:
| Model | ID |
|---|---|
| Llama 3.3 70B | meta-llama/Llama-3.3-70B-Instruct-Turbo |
| DeepSeek R1 | deepseek-ai/DeepSeek-R1 |
| Qwen 2.5 72B | Qwen/Qwen2.5-72B-Instruct-Turbo |
Features:
- Fast inference (optimized infrastructure)
- Competitive pricing
- Open-source model focus
Fireworks AI
High-performance inference with compound AI support:
LLM_BACKEND=openai_compatible
LLM_BASE_URL=https://api.fireworks.ai/inference/v1
LLM_API_KEY=fw_...
LLM_MODEL=accounts/fireworks/models/llama4-maverick-instruct-basic
Features:
- Sub-second latency
- Compound AI system support (function calling, tool use)
- Multi-model support
vLLM / LiteLLM (Self-Hosted)
Run your own inference server:
vLLM
LLM_BACKEND=openai_compatible
LLM_BASE_URL=http://localhost:8000/v1
LLM_API_KEY=token-abc123 # Any string if auth not configured
LLM_MODEL=meta-llama/Llama-3.1-8B-Instruct
Setup:
# Install vLLM
pip install vllm
# Start server
vllm serve meta-llama/Llama-3.1-8B-Instruct \
    --host 0.0.0.0 \
    --port 8000
LiteLLM
Proxy that forwards to any backend (Bedrock, Vertex, Azure, etc.):
LLM_BACKEND=openai_compatible
LLM_BASE_URL=http://localhost:4000/v1
LLM_API_KEY=sk-...
LLM_MODEL=gpt-4o # As configured in litellm config.yaml
Setup:
# Install LiteLLM
pip install litellm
# Create config.yaml
cat > config.yaml <<EOF
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: azure/gpt-4o
      api_base: https://my-azure.openai.azure.com
      api_key: os.environ/AZURE_API_KEY
EOF
# Start proxy
litellm --config config.yaml
LM Studio (Local GUI)
User-friendly local model hosting:
LLM_BACKEND=openai_compatible
LLM_BASE_URL=http://localhost:1234/v1
LLM_MODEL=llama-3.2-3b-instruct-q4_K_M
# LLM_API_KEY not required
Setup:
- Download LM Studio
- Download a model from the catalog
- Start the local server (tab in LM Studio)
- Configure IronClaw to use the endpoint
Advanced Configuration
Override context length and max output:
# Context window size
LLM_CONTEXT_LENGTH=200000
# Max output tokens
LLM_MAX_OUTPUT_TOKENS=8192
# Temperature
LLM_TEMPERATURE=0.7
Streaming
Enable/disable streaming responses:
# Enable streaming (default)
LLM_STREAMING=true
# Disable streaming
LLM_STREAMING=false
Retry Configuration
# Max retries on failure
LLM_MAX_RETRIES=3
# Retry delay (milliseconds)
LLM_RETRY_DELAY=1000
# Timeout (seconds)
LLM_TIMEOUT=120
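The retry settings above can be read as "attempt the request, and on failure wait LLM_RETRY_DELAY milliseconds before trying again, up to LLM_MAX_RETRIES times." A minimal sketch of that behavior (hypothetical logic in Python for illustration; the actual IronClaw implementation is in Rust and may add backoff or jitter):

```python
# Sketch of retry semantics implied by LLM_MAX_RETRIES / LLM_RETRY_DELAY.
# Hypothetical illustration, not IronClaw's actual code.
import time

def call_with_retries(request_fn, max_retries=3, retry_delay_ms=1000):
    last_err = None
    for attempt in range(max_retries + 1):
        try:
            return request_fn()
        except ConnectionError as err:
            last_err = err
            if attempt < max_retries:
                time.sleep(retry_delay_ms / 1000)
    raise last_err

attempts = []
def flaky():
    # Fails twice, then succeeds on the third attempt.
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("transient")
    return "ok"

print(call_with_retries(flaky, max_retries=3, retry_delay_ms=1))  # ok
```

Note that a timeout (LLM_TIMEOUT) bounds each individual attempt, so the worst-case wall time is roughly (retries + 1) × timeout plus the delays in between.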
Custom Headers
Add custom headers to LLM requests:
# Single header
LLM_HEADER_X_Custom=value
# Multiple headers
LLM_HEADER_X_Request_ID=req-123
LLM_HEADER_X_User_Agent=ironclaw/1.0
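The examples above suggest that the LLM_HEADER_ prefix is stripped and underscores in the remainder become hyphens (so LLM_HEADER_X_Request_ID sends the header X-Request-ID). A plausible sketch of that mapping, assuming this convention holds (check the IronClaw source for the actual rule):

```python
# Hypothetical sketch of how LLM_HEADER_* env vars could map to HTTP headers.
# Assumption: the LLM_HEADER_ prefix is stripped and "_" becomes "-".
def collect_llm_headers(environ: dict) -> dict:
    headers = {}
    for key, value in environ.items():
        if key.startswith("LLM_HEADER_"):
            name = key.removeprefix("LLM_HEADER_").replace("_", "-")
            headers[name] = value
    return headers

env = {"LLM_HEADER_X_Request_ID": "req-123", "PATH": "/usr/bin"}
print(collect_llm_headers(env))  # {'X-Request-ID': 'req-123'}
```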
Proxy Configuration
Route LLM requests through HTTP proxy:
# HTTP proxy
HTTP_PROXY=http://proxy.company.com:8080
# HTTPS proxy
HTTPS_PROXY=http://proxy.company.com:8080
# No proxy (comma-separated hosts)
NO_PROXY=localhost,127.0.0.1,.local
Setup Wizard
Instead of editing .env manually, run the onboarding wizard:
ironclaw onboard
The wizard will:
- Prompt for LLM backend selection
- Request API keys (securely masked)
- Test the connection
- Save configuration to .env
Wizard Options:
- NEAR AI (OAuth flow)
- Anthropic (API key)
- OpenAI (API key)
- Ollama (model selection)
- OpenAI-compatible (custom endpoint)
Provider-Specific Features
Anthropic
Tool Use (Function Calling):
Anthropic’s native tool use format is fully supported:
// Tools are automatically converted to Anthropic format
pub struct AnthropicTool {
    pub name: String,
    pub description: String,
    pub input_schema: serde_json::Value,
}
Prompt Caching:
Long prompts are automatically cached:
# Enable prompt caching (default: true)
ANTHROPIC_PROMPT_CACHING=true
OpenAI
Function Calling:
Native OpenAI function calling:
pub struct OpenAIFunction {
    pub name: String,
    pub description: String,
    pub parameters: serde_json::Value,
}
Response Format:
Enforce JSON output:
OPENAI_RESPONSE_FORMAT=json_object
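With json_object mode enforced, the assistant's message content is guaranteed to be valid JSON and can be parsed directly. A quick illustration (the reply string here is made up):

```python
# With response_format=json_object, the reply parses without cleanup.
import json

reply = '{"answer": 4}'  # example assistant message content (illustrative)
data = json.loads(reply)
print(data["answer"])  # 4
```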
Ollama
Model Pull:
Models are pulled automatically if missing.
Keep Alive:
Control model unloading:
# Keep model loaded indefinitely
OLLAMA_KEEP_ALIVE=-1
# Unload after 5 minutes
OLLAMA_KEEP_ALIVE=5m
Testing Configuration
Connection Test
# Test LLM connection
ironclaw llm test
# Expected output:
# ✅ Connected to Anthropic (claude-sonnet-4-20250514)
# ✅ Context length: 200000 tokens
# ✅ Max output: 8192 tokens
Completion Test
# Send test completion
ironclaw llm complete "What is 2+2?"
# Expected output:
# 2 + 2 = 4
Troubleshooting
Authentication Errors
Error: Authentication failed (401)
Solutions:
- Verify API key is correct
- Check API key has not expired
- Ensure API key has necessary permissions
- For NEAR AI, re-run ironclaw onboard to refresh the OAuth token
Rate Limiting
Error: Rate limit exceeded (429)
Solutions:
- Reduce request frequency
- Increase retry delay:
LLM_RETRY_DELAY=5000
- Switch to a different provider/model
- Upgrade API plan for higher limits
Connection Timeout
Error: Request timeout after 120s
Solutions:
- Increase timeout:
LLM_TIMEOUT=300
- Check network connectivity
- Verify proxy configuration
- Try a different model (some are slower)
Model Not Found
Error: Model not found: gpt-5
Solutions:
- Check model name spelling
- Verify model is available for your API key
- List available models:
ironclaw llm models
- For Ollama, pull the model:
ollama pull model-name
Invalid JSON Response
Error: Invalid JSON response from LLM
Solutions:
- Check the base URL is correct (must include /v1 for OpenAI-compatible endpoints)
- Verify provider is actually OpenAI-compatible
- Enable debug logging:
RUST_LOG=ironclaw::llm=debug
- Test endpoint directly with curl
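Testing the endpoint directly shows whether the problem is in the server or the configuration. A stdlib-only sketch that builds the standard OpenAI-compatible chat-completions request (the base URL, model, and key are placeholders; the actual send is commented out so you can run it against your own server):

```python
# Build a chat-completions request for an OpenAI-compatible endpoint.
# base_url, model, and the API key are placeholder assumptions.
import json
import urllib.request

base_url = "http://localhost:8000/v1"  # must include /v1
payload = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [{"role": "user", "content": "What is 2+2?"}],
}
req = urllib.request.Request(
    f"{base_url}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer token-abc123",
    },
)
# Uncomment to actually send once the server is reachable:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
print(req.full_url)  # http://localhost:8000/v1/chat/completions
```

If this request fails while ironclaw llm test also fails, the endpoint itself is the problem, not the IronClaw configuration.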
Cost Optimization
Model Selection
Choose cost-effective models:
| Use Case | Recommended Model | Why |
|---|---|---|
| Quick tasks | claude-3-5-haiku-20241022 | Fastest, cheapest Claude |
| Code | gpt-4o-mini | Good code understanding, low cost |
| Complex reasoning | claude-sonnet-4 | Best performance |
| Local/free | Ollama llama3.2 | No API costs |
Prompt Optimization
- Reduce context: Minimize system prompts and skill content
- Cache prompts: Use Anthropic prompt caching for repeated long prompts
- Batch requests: Group similar tasks together
- Output limiting: Set max_tokens appropriately
Provider Comparison
| Provider | Cost | Speed | Quality |
|---|---|---|---|
| NEAR AI | Medium | Fast | High (multi-model) |
| Anthropic | High | Fast | Highest (Claude) |
| OpenAI | High | Medium | High (GPT) |
| OpenRouter | Variable | Variable | Variable |
| Together AI | Low | Fast | Medium-High |
| Ollama | Free | Slow | Medium |
Migration Guide
From OpenAI to Anthropic
# Before
LLM_BACKEND=openai
OPENAI_API_KEY=sk-...
OPENAI_MODEL=gpt-4o
# After
LLM_BACKEND=anthropic
ANTHROPIC_API_KEY=sk-ant-...
ANTHROPIC_MODEL=claude-sonnet-4-20250514
From Cloud to Local (Ollama)
# Before
LLM_BACKEND=anthropic
ANTHROPIC_API_KEY=sk-ant-...
# After
LLM_BACKEND=ollama
OLLAMA_MODEL=llama3.2
# No API key needed
From Direct to OpenRouter
# Before (direct Anthropic)
LLM_BACKEND=anthropic
ANTHROPIC_API_KEY=sk-ant-...
ANTHROPIC_MODEL=claude-sonnet-4-20250514
# After (via OpenRouter)
LLM_BACKEND=openai_compatible
LLM_BASE_URL=https://openrouter.ai/api/v1
LLM_API_KEY=sk-or-...
LLM_MODEL=anthropic/claude-sonnet-4
Source Code
Key files:
src/llm/mod.rs - LLM provider abstraction
src/llm/anthropic.rs - Anthropic implementation
src/llm/openai.rs - OpenAI implementation
src/llm/ollama.rs - Ollama implementation
src/llm/nearai.rs - NEAR AI implementation
docs/LLM_PROVIDERS.md - Additional provider documentation