Overview

The LLM module provides integration with multiple language model providers:
  • NEAR AI (default): Session token or API key auth via Chat Completions API
  • OpenAI: Direct API access with your own key
  • Anthropic: Claude models via direct API
  • Ollama: Local model inference
  • OpenAI-compatible: Any endpoint that speaks the OpenAI API
  • Tinfoil: Private inference endpoints
The module also includes resilience features: retry logic, failover, circuit breakers, response caching, and smart routing.

Core Types

LlmProvider Trait

The main trait that all LLM providers must implement.
#[async_trait]
pub trait LlmProvider: Send + Sync {
    fn model_name(&self) -> &str;
    fn cost_per_token(&self) -> (Decimal, Decimal);
    
    async fn complete(&self, request: CompletionRequest) -> Result<CompletionResponse, LlmError>;
    
    async fn complete_with_tools(
        &self,
        request: ToolCompletionRequest,
    ) -> Result<ToolCompletionResponse, LlmError>;
    
    async fn list_models(&self) -> Result<Vec<String>, LlmError> {
        Ok(Vec::new())
    }
    
    async fn model_metadata(&self) -> Result<ModelMetadata, LlmError> {
        Ok(ModelMetadata {
            id: self.model_name().to_string(),
            context_length: None,
        })
    }
}

Methods

• model_name: fn(&self) -> &str
  Return the model identifier
• cost_per_token: fn(&self) -> (Decimal, Decimal)
  Return (input_cost, output_cost) per token in USD
• complete: async fn(&self, request: CompletionRequest) -> Result<CompletionResponse>
  Generate a completion for a conversation
• complete_with_tools: async fn(&self, request: ToolCompletionRequest) -> Result<ToolCompletionResponse>
  Generate a completion with tool calling support
• list_models: async fn(&self) -> Result<Vec<String>>
  List available models from the provider (optional; the default implementation returns an empty list)
• model_metadata: async fn(&self) -> Result<ModelMetadata>
  Fetch metadata about the model (context length, etc.)
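
Only the two accessors and the two completion methods are required; list_models and model_metadata ship default implementations. Below is a minimal sketch of a custom provider. EchoProvider is illustrative only, and it assumes the response types can be constructed directly from the fields documented on this page.

use async_trait::async_trait;
use rust_decimal::Decimal;
use ironclaw::llm::{
    CompletionRequest, CompletionResponse, FinishReason, LlmError, LlmProvider,
    ToolCompletionRequest, ToolCompletionResponse,
};

struct EchoProvider;

#[async_trait]
impl LlmProvider for EchoProvider {
    fn model_name(&self) -> &str {
        "echo-test"
    }

    fn cost_per_token(&self) -> (Decimal, Decimal) {
        // (input_cost, output_cost) per token in USD; this toy provider is free.
        (Decimal::ZERO, Decimal::ZERO)
    }

    async fn complete(&self, request: CompletionRequest) -> Result<CompletionResponse, LlmError> {
        // A real provider would call its API here; we just echo the last message.
        let content = request
            .messages
            .last()
            .map(|m| m.content.clone())
            .unwrap_or_default();
        Ok(CompletionResponse {
            content,
            input_tokens: 0,
            output_tokens: 0,
            finish_reason: FinishReason::Stop,
        })
    }

    async fn complete_with_tools(
        &self,
        request: ToolCompletionRequest,
    ) -> Result<ToolCompletionResponse, LlmError> {
        // This toy provider never requests tool calls.
        Ok(ToolCompletionResponse {
            content: request.messages.last().map(|m| m.content.clone()),
            tool_calls: Vec::new(),
            input_tokens: 0,
            output_tokens: 0,
            finish_reason: FinishReason::Stop,
        })
    }
}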

ChatMessage

A message in a conversation.
• role (Role): Message role (System, User, Assistant, Tool)
• content (String): Message content/text
• tool_call_id (Option<String>): Tool call ID if this is a tool result message
• name (Option<String>): Tool name for tool result messages
• tool_calls (Option<Vec<ToolCall>>): Tool calls made by the assistant

Constructors

• system: fn(content: impl Into<String>) -> Self
  Create a system message
• user: fn(content: impl Into<String>) -> Self
  Create a user message
• assistant: fn(content: impl Into<String>) -> Self
  Create an assistant message
• assistant_with_tool_calls: fn(content: Option<String>, tool_calls: Vec<ToolCall>) -> Self
  Create an assistant message with tool calls
• tool_result: fn(tool_call_id: impl Into<String>, name: impl Into<String>, content: impl Into<String>) -> Self
  Create a tool result message

Example

use ironclaw::llm::ChatMessage;

let messages = vec![
    ChatMessage::system("You are a helpful assistant."),
    ChatMessage::user("What is the capital of France?"),
    ChatMessage::assistant("The capital of France is Paris."),
];

CompletionRequest

Request for a chat completion.
• messages (Vec<ChatMessage>): Conversation history
• model (Option<String>): Optional per-request model override
• max_tokens (Option<u32>): Maximum number of tokens to generate
• temperature (Option<f32>): Sampling temperature (0.0 to 2.0)
• stop_sequences (Option<Vec<String>>): Sequences that stop generation
• metadata (HashMap<String, String>): Opaque metadata passed to the provider

Methods

• new: fn(messages: Vec<ChatMessage>) -> Self
  Create a new completion request
• with_model: fn(self, model: impl Into<String>) -> Self
  Set the model override
• with_max_tokens: fn(self, max_tokens: u32) -> Self
  Set the maximum tokens
• with_temperature: fn(self, temperature: f32) -> Self
  Set the sampling temperature

Example

use ironclaw::llm::{CompletionRequest, ChatMessage};

let request = CompletionRequest::new(messages)
    .with_max_tokens(1000)
    .with_temperature(0.7);

let response = llm.complete(request).await?;
println!("Response: {}", response.content);

CompletionResponse

Response from a chat completion.
• content (String): Generated text content
• input_tokens (u32): Number of tokens in the prompt
• output_tokens (u32): Number of tokens generated
• finish_reason (FinishReason): Why generation stopped

FinishReason

• Stop: Model naturally completed the response
• Length: Hit the max_tokens limit
• ToolUse: Model wants to use tools
• ContentFilter: Filtered by content policy
• Unknown: Unknown reason
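
A typical caller branches on the finish reason after a completion, for example:

use ironclaw::llm::FinishReason;

let response = llm.complete(request).await?;
match response.finish_reason {
    FinishReason::Stop => println!("{}", response.content),
    FinishReason::Length => eprintln!("truncated; consider raising max_tokens"),
    FinishReason::ToolUse => eprintln!("model wants tools; use complete_with_tools"),
    FinishReason::ContentFilter | FinishReason::Unknown => eprintln!("no usable output"),
}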

ToolCompletionRequest

Request for a completion with tool use support.
• messages (Vec<ChatMessage>): Conversation history
• tools (Vec<ToolDefinition>): Available tools for the model
• model (Option<String>): Optional model override
• max_tokens (Option<u32>): Maximum tokens to generate
• temperature (Option<f32>): Sampling temperature
• tool_choice (Option<String>): How to handle tools: "auto", "required", or "none"
• metadata (HashMap<String, String>): Opaque metadata

Methods

• new: fn(messages: Vec<ChatMessage>, tools: Vec<ToolDefinition>) -> Self
  Create a new tool completion request
• with_tool_choice: fn(self, choice: impl Into<String>) -> Self
  Set the tool choice mode ("auto", "required", "none")

Example

use ironclaw::llm::{ToolCompletionRequest, ToolDefinition};

let tools = vec![
    ToolDefinition {
        name: "get_weather".to_string(),
        description: "Get current weather".to_string(),
        parameters: serde_json::json!({
            "type": "object",
            "properties": {
                "location": { "type": "string" }
            },
            "required": ["location"]
        }),
    }
];

let request = ToolCompletionRequest::new(messages, tools)
    .with_tool_choice("auto");

let response = llm.complete_with_tools(request).await?;
for tool_call in response.tool_calls {
    println!("Tool: {} - Args: {}", tool_call.name, tool_call.arguments);
}

ToolCompletionResponse

Response from a tool-enabled completion.
• content (Option<String>): Text content (may be None when tool calls are present)
• tool_calls (Vec<ToolCall>): Tool calls requested by the model
• input_tokens (u32): Prompt tokens
• output_tokens (u32): Generated tokens
• finish_reason (FinishReason): Why generation stopped

ToolCall

A tool call requested by the LLM.
• id (String): Unique call identifier
• name (String): Tool name
• arguments (serde_json::Value): Tool arguments as JSON

ToolDefinition

Definition of a tool for the LLM.
• name (String): Tool name
• description (String): What the tool does
• parameters (serde_json::Value): JSON Schema for the parameters

ToolResult

Result of tool execution to send back to the LLM.
• tool_call_id (String): ID of the tool call this result answers
• name (String): Tool name
• content (String): Result content
• is_error (bool): Whether this result is an error
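
After a tool executes locally, its output goes back into the conversation as a tool result message and the model is called again. A sketch of the round trip, where execute_tool is a hypothetical local dispatcher and tools is the definition list from the earlier example:

use ironclaw::llm::{ChatMessage, ToolCompletionRequest};

let mut messages = vec![ChatMessage::user("What's the weather in Paris?")];

let response = llm
    .complete_with_tools(ToolCompletionRequest::new(messages.clone(), tools.clone()))
    .await?;

// Record the assistant turn, run each requested tool, and append its result.
messages.push(ChatMessage::assistant_with_tool_calls(
    response.content.clone(),
    response.tool_calls.clone(),
));
for call in &response.tool_calls {
    let output = execute_tool(&call.name, &call.arguments)?; // hypothetical dispatcher
    messages.push(ChatMessage::tool_result(call.id.clone(), call.name.clone(), output));
}

// Ask the model to finish now that the tool output is in context.
let final_response = llm
    .complete_with_tools(ToolCompletionRequest::new(messages, tools))
    .await?;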

Provider Creation

create_llm_provider

fn(config: &LlmConfig, session: Arc<SessionManager>) -> Result<Arc<dyn LlmProvider>>

Create an LLM provider based on the configuration.

use std::sync::Arc;

use ironclaw::llm::{create_llm_provider, SessionManager};
use ironclaw::config::LlmConfig;

let session_mgr = Arc::new(SessionManager::new(session_config));
let llm = create_llm_provider(&llm_config, session_mgr)?;

build_provider_chain

fn(config: &LlmConfig, session: Arc<SessionManager>) -> Result<(Arc<dyn LlmProvider>, Option<Arc<dyn LlmProvider>>)>

Build the full provider chain with retry, smart routing, failover, circuit breaker, and cache. Returns (main_provider, cheap_provider).

use ironclaw::llm::build_provider_chain;

let (llm, cheap_llm) = build_provider_chain(&config, session_mgr)?;

The provider chain applies decorators in this order (a hand-built equivalent is sketched after the list):
  1. Raw provider (from config)
  2. RetryProvider (exponential backoff)
  3. SmartRoutingProvider (cheap/primary split)
  4. FailoverProvider (fallback model)
  5. CircuitBreakerProvider (fast-fail when degraded)
  6. CachedProvider (response cache)
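
build_provider_chain assembles this stack for you. Composed by hand (and skipping smart routing, which needs a second cheap provider), the same layering looks roughly like this; raw_provider and fallback_provider are assumed to already exist, and CircuitBreakerConfig is assumed to implement Default as the example further below suggests:

use std::sync::Arc;
use std::time::Duration;

use ironclaw::llm::{
    CachedProvider, CircuitBreakerConfig, CircuitBreakerProvider, FailoverProvider,
    LlmProvider, ResponseCacheConfig, RetryConfig, RetryProvider,
};

// Innermost first: each decorator wraps the layer before it.
let retried: Arc<dyn LlmProvider> =
    Arc::new(RetryProvider::new(raw_provider, RetryConfig { max_retries: 3 }));
let failover: Arc<dyn LlmProvider> =
    Arc::new(FailoverProvider::new(vec![retried, fallback_provider])?);
let breaker: Arc<dyn LlmProvider> =
    Arc::new(CircuitBreakerProvider::new(failover, CircuitBreakerConfig::default()));
let llm: Arc<dyn LlmProvider> = Arc::new(CachedProvider::new(
    breaker,
    ResponseCacheConfig { ttl: Duration::from_secs(3600), max_entries: 1000 },
));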

Resilience Features

RetryProvider

Retries failed requests with exponential backoff.
• new: fn(provider: Arc<dyn LlmProvider>, config: RetryConfig) -> Self
  Wrap a provider with retry logic

RetryConfig

• max_retries (u32, default 3): Maximum number of retry attempts

use std::sync::Arc;

use ironclaw::llm::{RetryProvider, RetryConfig};

let retry_llm = Arc::new(RetryProvider::new(
    base_llm,
    RetryConfig { max_retries: 3 }
));

FailoverProvider

Fails over to backup providers when primary fails.
• new: fn(providers: Vec<Arc<dyn LlmProvider>>) -> Result<Self>
  Create a failover provider with an ordered list of providers
• with_cooldown: fn(providers: Vec<Arc<dyn LlmProvider>>, config: CooldownConfig) -> Result<Self>
  Create failover with cooldown (temporarily disable a provider after failures)

CooldownConfig

• cooldown_duration (Duration, default 300s): How long to disable a provider after failures
• failure_threshold (usize, default 3): Number of failures before cooldown

use std::sync::Arc;
use std::time::Duration;

use ironclaw::llm::{FailoverProvider, CooldownConfig};

let failover = Arc::new(FailoverProvider::with_cooldown(
    vec![primary_llm, fallback_llm],
    CooldownConfig {
        cooldown_duration: Duration::from_secs(300),
        failure_threshold: 3,
    }
)?);

CircuitBreakerProvider

Fast-fails when backend is degraded (circuit breaker pattern).
• new: fn(provider: Arc<dyn LlmProvider>, config: CircuitBreakerConfig) -> Self
  Wrap a provider with a circuit breaker

CircuitBreakerConfig

• failure_threshold (usize, default 5): Failures before the circuit opens
• recovery_timeout (Duration, default 30s): How long to wait before testing recovery
• half_open_max_calls (usize, default 3): Test calls allowed in the half-open state

use std::sync::Arc;
use std::time::Duration;

use ironclaw::llm::{CircuitBreakerProvider, CircuitBreakerConfig};

let cb_llm = Arc::new(CircuitBreakerProvider::new(
    base_llm,
    CircuitBreakerConfig {
        failure_threshold: 5,
        recovery_timeout: Duration::from_secs(30),
        ..Default::default()
    }
));

CachedProvider

Caches responses to avoid redundant API calls.
• new: fn(provider: Arc<dyn LlmProvider>, config: ResponseCacheConfig) -> Self
  Wrap a provider with response caching

ResponseCacheConfig

• ttl (Duration, default 3600s): Cache entry time-to-live
• max_entries (usize, default 1000): Maximum number of cache entries

use std::sync::Arc;
use std::time::Duration;

use ironclaw::llm::{CachedProvider, ResponseCacheConfig};

let cached = Arc::new(CachedProvider::new(
    base_llm,
    ResponseCacheConfig {
        ttl: Duration::from_secs(3600),
        max_entries: 1000,
    }
));

SmartRoutingProvider

Routes simple requests to cheap model, complex to primary.
• new: fn(primary: Arc<dyn LlmProvider>, cheap: Arc<dyn LlmProvider>, config: SmartRoutingConfig) -> Self
  Create smart routing between two providers

SmartRoutingConfig

• cascade_enabled (bool, default true): Retry on the primary if the cheap model fails
• complexity_threshold (f32, default 0.6): Complexity score above which to use the primary (0.0 to 1.0)

use std::sync::Arc;

use ironclaw::llm::{SmartRoutingProvider, SmartRoutingConfig};

let smart = Arc::new(SmartRoutingProvider::new(
    expensive_llm,
    cheap_llm,
    SmartRoutingConfig {
        cascade_enabled: true,
        complexity_threshold: 0.6,
    }
));

Session Management

SessionManager

Manages NEAR AI authentication sessions.
• new: fn(config: SessionConfig) -> Self
  Create a new session manager
• get_or_create_session: async fn(&self) -> Result<String>
  Get the current session token or create a new session
• refresh_session: async fn(&self) -> Result<()>
  Refresh the current session

SessionConfig

• session_path (PathBuf): Where to store session data
• auth_base_url (String): NEAR AI auth endpoint
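
A short sketch of acquiring a token; the session path and auth URL here are illustrative values, not defaults from the crate:

use std::path::PathBuf;

use ironclaw::llm::{SessionConfig, SessionManager};

let session_mgr = SessionManager::new(SessionConfig {
    session_path: PathBuf::from("/home/user/.ironclaw/session.json"), // illustrative
    auth_base_url: "https://auth.near.ai".to_string(),                // illustrative
});

// Returns the cached token or runs the auth flow to create a session.
let token = session_mgr.get_or_create_session().await?;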

Error Handling

LlmError

• AuthFailed { provider: String }: Authentication failed
• RequestFailed { provider: String, reason: String }: API request failed
• RateLimited { retry_after: Option<Duration> }: Rate limit exceeded
• InvalidResponse(String): Response parsing failed
• ContextLengthExceeded { requested: usize, max: usize }: Request exceeds the model's context window
• CircuitOpen(String): Circuit breaker is open
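
Variants can be matched to decide whether to back off, shrink the prompt, or surface the failure. A sketch, assuming a tokio runtime:

use std::time::Duration;

use ironclaw::llm::LlmError;

match llm.complete(request).await {
    Ok(response) => println!("{}", response.content),
    Err(LlmError::RateLimited { retry_after }) => {
        // Honor the server's hint when present, otherwise back off conservatively.
        let wait = retry_after.unwrap_or(Duration::from_secs(30));
        tokio::time::sleep(wait).await;
    }
    Err(LlmError::ContextLengthExceeded { requested, max }) => {
        eprintln!("prompt too large: {requested} tokens, model max {max}");
    }
    Err(LlmError::CircuitOpen(provider)) => {
        eprintln!("{provider} is temporarily disabled by the circuit breaker");
    }
    Err(e) => return Err(e.into()),
}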

Cost Tracking

TokenUsage

Tracks token usage and costs.
• input_tokens (u32): Tokens in prompts
• output_tokens (u32): Tokens in completions
• total_cost (Decimal): Total cost in USD

use ironclaw::llm::TokenUsage;

let usage = TokenUsage {
    input_tokens: response.input_tokens,
    output_tokens: response.output_tokens,
    total_cost: calculate_cost(&response, &llm),
};
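
calculate_cost above is not part of the module; one plausible implementation derives it from cost_per_token():

use rust_decimal::Decimal;

use ironclaw::llm::{CompletionResponse, LlmProvider};

fn calculate_cost(response: &CompletionResponse, llm: &dyn LlmProvider) -> Decimal {
    let (input_cost, output_cost) = llm.cost_per_token();
    // Per-token USD rates multiplied by the token counts reported in the response.
    Decimal::from(response.input_tokens) * input_cost
        + Decimal::from(response.output_tokens) * output_cost
}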
