Documentation Index
Fetch the complete documentation index at: https://mintlify.com/nearai/ironclaw/llms.txt
Use this file to discover all available pages before exploring further.
Overview
IronClaw defaults to NEAR AI for model access, but also supports Anthropic and Ollama directly, as well as any OpenAI-compatible endpoint. This guide covers configuration for all supported providers.
Provider Overview
| Provider | Backend Value | Requires API Key | Notes |
|---|---|---|---|
| NEAR AI | nearai | OAuth (browser) | Default; multi-model |
| Anthropic | anthropic | ANTHROPIC_API_KEY | Claude models |
| OpenAI | openai | OPENAI_API_KEY | GPT models |
| Ollama | ollama | No | Local inference |
| OpenRouter | openai_compatible | LLM_API_KEY | 300+ models |
| Together AI | openai_compatible | LLM_API_KEY | Fast inference |
| Fireworks AI | openai_compatible | LLM_API_KEY | Fast inference |
| vLLM / LiteLLM | openai_compatible | Optional | Self-hosted |
| LM Studio | openai_compatible | No | Local GUI |
Provider Configuration
NEAR AI (Default)
No additional configuration required. On first run, ironclaw onboard opens a browser for OAuth authentication. Credentials are saved to ~/.ironclaw/session.json.
# Optional: customize model and base URL
NEARAI_MODEL=claude-3-5-sonnet-20241022
NEARAI_BASE_URL=https://private.near.ai
Features:
- OAuth authentication (no API key needed)
- Multi-model support (Claude, GPT, Llama, etc.)
- Usage tracking and billing through NEAR
Anthropic (Claude)
Direct access to Claude models:
LLM_BACKEND=anthropic
ANTHROPIC_API_KEY=sk-ant-...
Popular Models:
claude-sonnet-4-20250514 - Latest Sonnet (recommended)
claude-3-5-sonnet-20241022 - Sonnet 3.5
claude-3-5-haiku-20241022 - Fast, cost-effective
Configuration Options:
# Model selection
ANTHROPIC_MODEL=claude-sonnet-4-20250514
# Base URL (for custom endpoints)
ANTHROPIC_BASE_URL=https://api.anthropic.com
# API version
ANTHROPIC_VERSION=2023-06-01
OpenAI (GPT)
Access GPT models:
LLM_BACKEND=openai
OPENAI_API_KEY=sk-...
Popular Models:
gpt-4o - Latest GPT-4 Optimized
gpt-4o-mini - Fast, cost-effective
o3-mini - Reasoning model
Configuration Options:
# Model selection
OPENAI_MODEL=gpt-4o
# Base URL (for Azure OpenAI, etc.)
OPENAI_BASE_URL=https://api.openai.com/v1
# Organization ID (optional)
OPENAI_ORG_ID=org-...
Ollama (Local)
Run models locally:
LLM_BACKEND=ollama
OLLAMA_MODEL=llama3.2
Setup:
- Install Ollama from ollama.com
- Pull a model:
ollama pull llama3.2
- Start Ollama service (automatic on most systems)
- Configure IronClaw to use Ollama
Configuration Options:
# Model
OLLAMA_MODEL=llama3.2
# Base URL (if running on different host)
OLLAMA_BASE_URL=http://localhost:11434
# Context window (override model default)
OLLAMA_CONTEXT_LENGTH=8192
Popular Models:
llama3.2 - Meta’s latest
mistral - Fast and efficient
codellama - Code-specialized
deepseek-coder - Code understanding
OpenRouter
Access 300+ models through a single API:
LLM_BACKEND=openai_compatible
LLM_BASE_URL=https://openrouter.ai/api/v1
LLM_API_KEY=sk-or-...
LLM_MODEL=anthropic/claude-sonnet-4
Popular Models:
| Model | ID |
|---|---|
| Claude Sonnet 4 | anthropic/claude-sonnet-4 |
| GPT-4o | openai/gpt-4o |
| Llama 4 Maverick | meta-llama/llama-4-maverick |
| Gemini 2.0 Flash | google/gemini-2.0-flash-001 |
| Mistral Small | mistralai/mistral-small-3.1-24b-instruct |
Browse all models at openrouter.ai/models.
Features:
- Unified API for all major model providers
- Automatic fallback if primary model is unavailable
- Usage analytics and cost tracking
Together AI
Fast inference for open-source models:
LLM_BACKEND=openai_compatible
LLM_BASE_URL=https://api.together.xyz/v1
LLM_API_KEY=...
LLM_MODEL=meta-llama/Llama-3.3-70B-Instruct-Turbo
Popular Models:
| Model | ID |
|---|---|
| Llama 3.3 70B | meta-llama/Llama-3.3-70B-Instruct-Turbo |
| DeepSeek R1 | deepseek-ai/DeepSeek-R1 |
| Qwen 2.5 72B | Qwen/Qwen2.5-72B-Instruct-Turbo |
Features:
- Fast inference (optimized infrastructure)
- Competitive pricing
- Open-source model focus
Fireworks AI
High-performance inference with compound AI support:
LLM_BACKEND=openai_compatible
LLM_BASE_URL=https://api.fireworks.ai/inference/v1
LLM_API_KEY=fw_...
LLM_MODEL=accounts/fireworks/models/llama4-maverick-instruct-basic
Features:
- Sub-second latency
- Compound AI system support (function calling, tool use)
- Multi-model support
vLLM / LiteLLM (Self-Hosted)
Run your own inference server:
vLLM
LLM_BACKEND=openai_compatible
LLM_BASE_URL=http://localhost:8000/v1
LLM_API_KEY=token-abc123 # Any string if auth not configured
LLM_MODEL=meta-llama/Llama-3.1-8B-Instruct
Setup:
# Install vLLM
pip install vllm
# Start server
vllm serve meta-llama/Llama-3.1-8B-Instruct \
    --host 0.0.0.0 \
    --port 8000
LiteLLM
Proxy that forwards to any backend (Bedrock, Vertex, Azure, etc.):
LLM_BACKEND=openai_compatible
LLM_BASE_URL=http://localhost:4000/v1
LLM_API_KEY=sk-...
LLM_MODEL=gpt-4o # As configured in litellm config.yaml
Setup:
# Install LiteLLM
pip install litellm
# Create config.yaml
cat > config.yaml <<EOF
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: azure/gpt-4o
      api_base: https://my-azure.openai.azure.com
      api_key: os.environ/AZURE_API_KEY
EOF
# Start proxy
litellm --config config.yaml
LM Studio (Local GUI)
User-friendly local model hosting:
LLM_BACKEND=openai_compatible
LLM_BASE_URL=http://localhost:1234/v1
LLM_MODEL=llama-3.2-3b-instruct-q4_K_M
# LLM_API_KEY not required
Setup:
- Download LM Studio
- Download a model from the catalog
- Start the local server (tab in LM Studio)
- Configure IronClaw to use the endpoint
Advanced Configuration
Override context length and max output:
# Context window size
LLM_CONTEXT_LENGTH=200000
# Max output tokens
LLM_MAX_OUTPUT_TOKENS=8192
# Temperature
LLM_TEMPERATURE=0.7
Streaming
Enable/disable streaming responses:
# Enable streaming (default)
LLM_STREAMING=true
# Disable streaming
LLM_STREAMING=false
Retry Configuration
# Max retries on failure
LLM_MAX_RETRIES=3
# Retry delay (milliseconds)
LLM_RETRY_DELAY=1000
# Timeout (seconds)
LLM_TIMEOUT=120
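The retry settings above can be read as "attempt the request, and on failure wait LLM_RETRY_DELAY milliseconds before trying again, up to LLM_MAX_RETRIES times." A minimal sketch of that behavior (hypothetical logic in Python for illustration; the actual IronClaw implementation is in Rust and may add backoff or jitter):

```python
# Sketch of retry semantics implied by LLM_MAX_RETRIES / LLM_RETRY_DELAY.
# Hypothetical illustration, not IronClaw's actual code.
import time

def call_with_retries(request_fn, max_retries=3, retry_delay_ms=1000):
    last_err = None
    for attempt in range(max_retries + 1):
        try:
            return request_fn()
        except ConnectionError as err:
            last_err = err
            if attempt < max_retries:
                time.sleep(retry_delay_ms / 1000)
    raise last_err

attempts = []
def flaky():
    # Fails twice, then succeeds on the third attempt.
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("transient")
    return "ok"

print(call_with_retries(flaky, max_retries=3, retry_delay_ms=1))  # ok
```

Note that a timeout (LLM_TIMEOUT) bounds each individual attempt, so the worst-case wall time is roughly (retries + 1) × timeout plus the delays in between.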
Custom Headers
Add custom headers to LLM requests:
# Single header
LLM_HEADER_X_Custom=value
# Multiple headers
LLM_HEADER_X_Request_ID=req-123
LLM_HEADER_X_User_Agent=ironclaw/1.0
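The examples above suggest that the LLM_HEADER_ prefix is stripped and underscores in the remainder become hyphens (so LLM_HEADER_X_Request_ID sends the header X-Request-ID). A plausible sketch of that mapping, assuming this convention holds (check the IronClaw source for the actual rule):

```python
# Hypothetical sketch of how LLM_HEADER_* env vars could map to HTTP headers.
# Assumption: the LLM_HEADER_ prefix is stripped and "_" becomes "-".
def collect_llm_headers(environ: dict) -> dict:
    headers = {}
    for key, value in environ.items():
        if key.startswith("LLM_HEADER_"):
            name = key.removeprefix("LLM_HEADER_").replace("_", "-")
            headers[name] = value
    return headers

env = {"LLM_HEADER_X_Request_ID": "req-123", "PATH": "/usr/bin"}
print(collect_llm_headers(env))  # {'X-Request-ID': 'req-123'}
```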
Proxy Configuration
Route LLM requests through HTTP proxy:
# HTTP proxy
HTTP_PROXY=http://proxy.company.com:8080
# HTTPS proxy
HTTPS_PROXY=http://proxy.company.com:8080
# No proxy (comma-separated hosts)
NO_PROXY=localhost,127.0.0.1,.local
Setup Wizard
Instead of editing .env manually, run the onboarding wizard:
ironclaw onboard
The wizard will:
- Prompt for LLM backend selection
- Request API keys (securely masked)
- Test the connection
- Save configuration to .env
Wizard Options:
- NEAR AI (OAuth flow)
- Anthropic (API key)
- OpenAI (API key)
- Ollama (model selection)
- OpenAI-compatible (custom endpoint)
Provider-Specific Features
Anthropic
Tool Use (Function Calling):
Anthropic’s native tool use format is fully supported:
// Tools are automatically converted to Anthropic format
pub struct AnthropicTool {
    pub name: String,
    pub description: String,
    pub input_schema: serde_json::Value,
}
Prompt Caching:
Long prompts are automatically cached:
# Enable prompt caching (default: true)
ANTHROPIC_PROMPT_CACHING=true
OpenAI
Function Calling:
Native OpenAI function calling:
pub struct OpenAIFunction {
    pub name: String,
    pub description: String,
    pub parameters: serde_json::Value,
}
Response Format:
Enforce JSON output:
OPENAI_RESPONSE_FORMAT=json_object
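With json_object mode enforced, the assistant's message content is guaranteed to be valid JSON and can be parsed directly. A quick illustration (the reply string here is made up):

```python
# With response_format=json_object, the reply parses without cleanup.
import json

reply = '{"answer": 4}'  # example assistant message content (illustrative)
data = json.loads(reply)
print(data["answer"])  # 4
```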
Ollama
Model Pull:
Models are pulled automatically if missing.
Keep Alive:
Control model unloading:
# Keep model loaded indefinitely
OLLAMA_KEEP_ALIVE=-1
# Unload after 5 minutes
OLLAMA_KEEP_ALIVE=5m
Testing Configuration
Connection Test
# Test LLM connection
ironclaw llm test
# Expected output:
# ✅ Connected to Anthropic (claude-sonnet-4-20250514)
# ✅ Context length: 200000 tokens
# ✅ Max output: 8192 tokens
Completion Test
# Send test completion
ironclaw llm complete "What is 2+2?"
# Expected output:
# 2 + 2 = 4
Troubleshooting
Authentication Errors
Error: Authentication failed (401)
Solutions:
- Verify API key is correct
- Check API key has not expired
- Ensure API key has necessary permissions
- For NEAR AI, re-run ironclaw onboard to refresh the OAuth token
Rate Limiting
Error: Rate limit exceeded (429)
Solutions:
- Reduce request frequency
- Increase retry delay:
LLM_RETRY_DELAY=5000
- Switch to a different provider/model
- Upgrade API plan for higher limits
Connection Timeout
Error: Request timeout after 120s
Solutions:
- Increase timeout:
LLM_TIMEOUT=300
- Check network connectivity
- Verify proxy configuration
- Try a different model (some are slower)
Model Not Found
Error: Model not found: gpt-5
Solutions:
- Check model name spelling
- Verify model is available for your API key
- List available models:
ironclaw llm models
- For Ollama, pull the model:
ollama pull model-name
Invalid JSON Response
Error: Invalid JSON response from LLM
Solutions:
- Check the base URL is correct (must include /v1 for OpenAI-compatible endpoints)
- Verify provider is actually OpenAI-compatible
- Enable debug logging:
RUST_LOG=ironclaw::llm=debug
- Test endpoint directly with curl
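Testing the endpoint directly shows whether the problem is in the server or the configuration. A stdlib-only sketch that builds the standard OpenAI-compatible chat-completions request (the base URL, model, and key are placeholders; the actual send is commented out so you can run it against your own server):

```python
# Build a chat-completions request for an OpenAI-compatible endpoint.
# base_url, model, and the API key are placeholder assumptions.
import json
import urllib.request

base_url = "http://localhost:8000/v1"  # must include /v1
payload = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [{"role": "user", "content": "What is 2+2?"}],
}
req = urllib.request.Request(
    f"{base_url}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer token-abc123",
    },
)
# Uncomment to actually send once the server is reachable:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
print(req.full_url)  # http://localhost:8000/v1/chat/completions
```

If this request fails while ironclaw llm test also fails, the endpoint itself is the problem, not the IronClaw configuration.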
Cost Optimization
Model Selection
Choose cost-effective models:
| Use Case | Recommended Model | Why |
|---|---|---|
| Quick tasks | claude-3-5-haiku-20241022 | Fastest, cheapest Claude |
| Code | gpt-4o-mini | Good code understanding, low cost |
| Complex reasoning | claude-sonnet-4 | Best performance |
| Local/free | Ollama llama3.2 | No API costs |
Prompt Optimization
- Reduce context: Minimize system prompts and skill content
- Cache prompts: Use Anthropic prompt caching for repeated long prompts
- Batch requests: Group similar tasks together
- Output limiting: Set max_tokens appropriately
Provider Comparison
| Provider | Cost | Speed | Quality |
|---|---|---|---|
| NEAR AI | Medium | Fast | High (multi-model) |
| Anthropic | High | Fast | Highest (Claude) |
| OpenAI | High | Medium | High (GPT) |
| OpenRouter | Variable | Variable | Variable |
| Together AI | Low | Fast | Medium-High |
| Ollama | Free | Slow | Medium |
Migration Guide
From OpenAI to Anthropic
# Before
LLM_BACKEND=openai
OPENAI_API_KEY=sk-...
OPENAI_MODEL=gpt-4o
# After
LLM_BACKEND=anthropic
ANTHROPIC_API_KEY=sk-ant-...
ANTHROPIC_MODEL=claude-sonnet-4-20250514
From Cloud to Local (Ollama)
# Before
LLM_BACKEND=anthropic
ANTHROPIC_API_KEY=sk-ant-...
# After
LLM_BACKEND=ollama
OLLAMA_MODEL=llama3.2
# No API key needed
From Direct to OpenRouter
# Before (direct Anthropic)
LLM_BACKEND=anthropic
ANTHROPIC_API_KEY=sk-ant-...
ANTHROPIC_MODEL=claude-sonnet-4-20250514
# After (via OpenRouter)
LLM_BACKEND=openai_compatible
LLM_BASE_URL=https://openrouter.ai/api/v1
LLM_API_KEY=sk-or-...
LLM_MODEL=anthropic/claude-sonnet-4
Source Code
Key files:
src/llm/mod.rs - LLM provider abstraction
src/llm/anthropic.rs - Anthropic implementation
src/llm/openai.rs - OpenAI implementation
src/llm/ollama.rs - Ollama implementation
src/llm/nearai.rs - NEAR AI implementation
docs/LLM_PROVIDERS.md - Additional provider documentation