Overview

The LLM module provides integration with multiple language model providers:
  • NEAR AI (default): Session token or API key auth via Chat Completions API
  • OpenAI: Direct API access with your own key
  • Anthropic: Claude models via direct API
  • Ollama: Local model inference
  • OpenAI-compatible: Any endpoint that speaks the OpenAI API
  • Tinfoil: Private inference endpoints
The module also includes resilience features: retry logic, failover, circuit breakers, response caching, and smart routing.

Core Types

LlmProvider Trait

The main trait that all LLM providers must implement.
#[async_trait]
pub trait LlmProvider: Send + Sync {
    fn model_name(&self) -> &str;
    fn cost_per_token(&self) -> (Decimal, Decimal);
    
    async fn complete(&self, request: CompletionRequest) -> Result<CompletionResponse, LlmError>;
    
    async fn complete_with_tools(
        &self,
        request: ToolCompletionRequest,
    ) -> Result<ToolCompletionResponse, LlmError>;
    
    async fn list_models(&self) -> Result<Vec<String>, LlmError> {
        Ok(Vec::new())
    }
    
    async fn model_metadata(&self) -> Result<ModelMetadata, LlmError> {
        Ok(ModelMetadata {
            id: self.model_name().to_string(),
            context_length: None,
        })
    }
}

Methods

• model_name: fn(&self) -> &str
  Return the model identifier
• cost_per_token: fn(&self) -> (Decimal, Decimal)
  Return (input_cost, output_cost) per token in USD
• complete: async fn(&self, request: CompletionRequest) -> Result<CompletionResponse>
  Generate a completion for a conversation
• complete_with_tools: async fn(&self, request: ToolCompletionRequest) -> Result<ToolCompletionResponse>
  Generate a completion with tool calling support
• list_models: async fn(&self) -> Result<Vec<String>>
  List available models from the provider (optional; the default implementation returns an empty list)
• model_metadata: async fn(&self) -> Result<ModelMetadata>
  Fetch metadata about the model (context length, etc.)
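
Only the two accessors and the two completion methods are required; list_models and model_metadata ship default implementations. Below is a minimal sketch of a custom provider. EchoProvider is illustrative only, and it assumes the response types can be constructed directly from the fields documented on this page.

use async_trait::async_trait;
use rust_decimal::Decimal;
use ironclaw::llm::{
    CompletionRequest, CompletionResponse, FinishReason, LlmError, LlmProvider,
    ToolCompletionRequest, ToolCompletionResponse,
};

struct EchoProvider;

#[async_trait]
impl LlmProvider for EchoProvider {
    fn model_name(&self) -> &str {
        "echo-test"
    }

    fn cost_per_token(&self) -> (Decimal, Decimal) {
        // (input_cost, output_cost) per token in USD; this toy provider is free.
        (Decimal::ZERO, Decimal::ZERO)
    }

    async fn complete(&self, request: CompletionRequest) -> Result<CompletionResponse, LlmError> {
        // A real provider would call its API here; we just echo the last message.
        let content = request
            .messages
            .last()
            .map(|m| m.content.clone())
            .unwrap_or_default();
        Ok(CompletionResponse {
            content,
            input_tokens: 0,
            output_tokens: 0,
            finish_reason: FinishReason::Stop,
        })
    }

    async fn complete_with_tools(
        &self,
        request: ToolCompletionRequest,
    ) -> Result<ToolCompletionResponse, LlmError> {
        // This toy provider never requests tool calls.
        Ok(ToolCompletionResponse {
            content: request.messages.last().map(|m| m.content.clone()),
            tool_calls: Vec::new(),
            input_tokens: 0,
            output_tokens: 0,
            finish_reason: FinishReason::Stop,
        })
    }
}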

ChatMessage

A message in a conversation.
• role (Role): Message role (System, User, Assistant, Tool)
• content (String): Message content/text
• tool_call_id (Option<String>): Tool call ID if this is a tool result message
• name (Option<String>): Tool name for tool result messages
• tool_calls (Option<Vec<ToolCall>>): Tool calls made by the assistant

Constructors

• system: fn(content: impl Into<String>) -> Self
  Create a system message
• user: fn(content: impl Into<String>) -> Self
  Create a user message
• assistant: fn(content: impl Into<String>) -> Self
  Create an assistant message
• assistant_with_tool_calls: fn(content: Option<String>, tool_calls: Vec<ToolCall>) -> Self
  Create an assistant message with tool calls
• tool_result: fn(tool_call_id: impl Into<String>, name: impl Into<String>, content: impl Into<String>) -> Self
  Create a tool result message

Example

use ironclaw::llm::ChatMessage;

let messages = vec![
    ChatMessage::system("You are a helpful assistant."),
    ChatMessage::user("What is the capital of France?"),
    ChatMessage::assistant("The capital of France is Paris."),
];

CompletionRequest

Request for a chat completion.
• messages (Vec<ChatMessage>): Conversation history
• model (Option<String>): Optional per-request model override
• max_tokens (Option<u32>): Maximum number of tokens to generate
• temperature (Option<f32>): Sampling temperature (0.0 to 2.0)
• stop_sequences (Option<Vec<String>>): Sequences that stop generation
• metadata (HashMap<String, String>): Opaque metadata passed to the provider

Methods

• new: fn(messages: Vec<ChatMessage>) -> Self
  Create a new completion request
• with_model: fn(self, model: impl Into<String>) -> Self
  Set the model override
• with_max_tokens: fn(self, max_tokens: u32) -> Self
  Set the maximum tokens
• with_temperature: fn(self, temperature: f32) -> Self
  Set the sampling temperature

Example

use ironclaw::llm::{CompletionRequest, ChatMessage};

let request = CompletionRequest::new(messages)
    .with_max_tokens(1000)
    .with_temperature(0.7);

let response = llm.complete(request).await?;
println!("Response: {}", response.content);

CompletionResponse

Response from a chat completion.
• content (String): Generated text content
• input_tokens (u32): Number of tokens in the prompt
• output_tokens (u32): Number of tokens generated
• finish_reason (FinishReason): Why generation stopped

FinishReason

• Stop: Model naturally completed the response
• Length: Hit the max_tokens limit
• ToolUse: Model wants to use tools
• ContentFilter: Filtered by content policy
• Unknown: Unknown reason
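
A typical caller branches on the finish reason after a completion, for example:

use ironclaw::llm::FinishReason;

let response = llm.complete(request).await?;
match response.finish_reason {
    FinishReason::Stop => println!("{}", response.content),
    FinishReason::Length => eprintln!("truncated; consider raising max_tokens"),
    FinishReason::ToolUse => eprintln!("model wants tools; use complete_with_tools"),
    FinishReason::ContentFilter | FinishReason::Unknown => eprintln!("no usable output"),
}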

ToolCompletionRequest

Request for a completion with tool use support.
• messages (Vec<ChatMessage>): Conversation history
• tools (Vec<ToolDefinition>): Available tools for the model
• model (Option<String>): Optional model override
• max_tokens (Option<u32>): Maximum tokens to generate
• temperature (Option<f32>): Sampling temperature
• tool_choice (Option<String>): How to handle tools: "auto", "required", or "none"
• metadata (HashMap<String, String>): Opaque metadata

Methods

• new: fn(messages: Vec<ChatMessage>, tools: Vec<ToolDefinition>) -> Self
  Create a new tool completion request
• with_tool_choice: fn(self, choice: impl Into<String>) -> Self
  Set the tool choice mode ("auto", "required", "none")

Example

use ironclaw::llm::{ToolCompletionRequest, ToolDefinition};

let tools = vec![
    ToolDefinition {
        name: "get_weather".to_string(),
        description: "Get current weather".to_string(),
        parameters: serde_json::json!({
            "type": "object",
            "properties": {
                "location": { "type": "string" }
            },
            "required": ["location"]
        }),
    }
];

let request = ToolCompletionRequest::new(messages, tools)
    .with_tool_choice("auto");

let response = llm.complete_with_tools(request).await?;
for tool_call in response.tool_calls {
    println!("Tool: {} - Args: {}", tool_call.name, tool_call.arguments);
}

ToolCompletionResponse

Response from a tool-enabled completion.
• content (Option<String>): Text content (may be None when tool calls are present)
• tool_calls (Vec<ToolCall>): Tool calls requested by the model
• input_tokens (u32): Prompt tokens
• output_tokens (u32): Generated tokens
• finish_reason (FinishReason): Why generation stopped

ToolCall

A tool call requested by the LLM.
• id (String): Unique call identifier
• name (String): Tool name
• arguments (serde_json::Value): Tool arguments as JSON

ToolDefinition

Definition of a tool for the LLM.
• name (String): Tool name
• description (String): What the tool does
• parameters (serde_json::Value): JSON Schema for the parameters

ToolResult

Result of tool execution to send back to the LLM.
• tool_call_id (String): ID of the tool call this result answers
• name (String): Tool name
• content (String): Result content
• is_error (bool): Whether this result is an error
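
After a tool executes locally, its output goes back into the conversation as a tool result message and the model is called again. A sketch of the round trip, where execute_tool is a hypothetical local dispatcher and tools is the definition list from the earlier example:

use ironclaw::llm::{ChatMessage, ToolCompletionRequest};

let mut messages = vec![ChatMessage::user("What's the weather in Paris?")];

let response = llm
    .complete_with_tools(ToolCompletionRequest::new(messages.clone(), tools.clone()))
    .await?;

// Record the assistant turn, run each requested tool, and append its result.
messages.push(ChatMessage::assistant_with_tool_calls(
    response.content.clone(),
    response.tool_calls.clone(),
));
for call in &response.tool_calls {
    let output = execute_tool(&call.name, &call.arguments)?; // hypothetical dispatcher
    messages.push(ChatMessage::tool_result(call.id.clone(), call.name.clone(), output));
}

// Ask the model to finish now that the tool output is in context.
let final_response = llm
    .complete_with_tools(ToolCompletionRequest::new(messages, tools))
    .await?;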

Provider Creation

create_llm_provider

fn(config: &LlmConfig, session: Arc<SessionManager>) -> Result<Arc<dyn LlmProvider>>

Create an LLM provider based on the configuration.

use std::sync::Arc;

use ironclaw::llm::{create_llm_provider, SessionManager};
use ironclaw::config::LlmConfig;

let session_mgr = Arc::new(SessionManager::new(session_config));
let llm = create_llm_provider(&llm_config, session_mgr)?;

build_provider_chain

fn(config: &LlmConfig, session: Arc<SessionManager>) -> Result<(Arc<dyn LlmProvider>, Option<Arc<dyn LlmProvider>>)>

Build the full provider chain with retry, smart routing, failover, circuit breaker, and cache. Returns (main_provider, cheap_provider).

use ironclaw::llm::build_provider_chain;

let (llm, cheap_llm) = build_provider_chain(&config, session_mgr)?;

The provider chain applies decorators in this order (a hand-built equivalent is sketched after the list):
  1. Raw provider (from config)
  2. RetryProvider (exponential backoff)
  3. SmartRoutingProvider (cheap/primary split)
  4. FailoverProvider (fallback model)
  5. CircuitBreakerProvider (fast-fail when degraded)
  6. CachedProvider (response cache)
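
build_provider_chain assembles this stack for you. Composed by hand (and skipping smart routing, which needs a second cheap provider), the same layering looks roughly like this; raw_provider and fallback_provider are assumed to already exist, and CircuitBreakerConfig is assumed to implement Default as the example further below suggests:

use std::sync::Arc;
use std::time::Duration;

use ironclaw::llm::{
    CachedProvider, CircuitBreakerConfig, CircuitBreakerProvider, FailoverProvider,
    LlmProvider, ResponseCacheConfig, RetryConfig, RetryProvider,
};

// Innermost first: each decorator wraps the layer before it.
let retried: Arc<dyn LlmProvider> =
    Arc::new(RetryProvider::new(raw_provider, RetryConfig { max_retries: 3 }));
let failover: Arc<dyn LlmProvider> =
    Arc::new(FailoverProvider::new(vec![retried, fallback_provider])?);
let breaker: Arc<dyn LlmProvider> =
    Arc::new(CircuitBreakerProvider::new(failover, CircuitBreakerConfig::default()));
let llm: Arc<dyn LlmProvider> = Arc::new(CachedProvider::new(
    breaker,
    ResponseCacheConfig { ttl: Duration::from_secs(3600), max_entries: 1000 },
));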

Resilience Features

RetryProvider

Retries failed requests with exponential backoff.
• new: fn(provider: Arc<dyn LlmProvider>, config: RetryConfig) -> Self
  Wrap a provider with retry logic

RetryConfig

• max_retries (u32, default 3): Maximum number of retry attempts

use std::sync::Arc;

use ironclaw::llm::{RetryProvider, RetryConfig};

let retry_llm = Arc::new(RetryProvider::new(
    base_llm,
    RetryConfig { max_retries: 3 }
));

FailoverProvider

Fails over to backup providers when primary fails.
• new: fn(providers: Vec<Arc<dyn LlmProvider>>) -> Result<Self>
  Create a failover provider with an ordered list of providers
• with_cooldown: fn(providers: Vec<Arc<dyn LlmProvider>>, config: CooldownConfig) -> Result<Self>
  Create failover with cooldown (temporarily disable a provider after failures)

CooldownConfig

• cooldown_duration (Duration, default 300s): How long to disable a provider after failures
• failure_threshold (usize, default 3): Number of failures before cooldown

use std::sync::Arc;
use std::time::Duration;

use ironclaw::llm::{FailoverProvider, CooldownConfig};

let failover = Arc::new(FailoverProvider::with_cooldown(
    vec![primary_llm, fallback_llm],
    CooldownConfig {
        cooldown_duration: Duration::from_secs(300),
        failure_threshold: 3,
    }
)?);

CircuitBreakerProvider

Fast-fails when backend is degraded (circuit breaker pattern).
• new: fn(provider: Arc<dyn LlmProvider>, config: CircuitBreakerConfig) -> Self
  Wrap a provider with a circuit breaker

CircuitBreakerConfig

• failure_threshold (usize, default 5): Failures before the circuit opens
• recovery_timeout (Duration, default 30s): How long to wait before testing recovery
• half_open_max_calls (usize, default 3): Test calls allowed in the half-open state

use std::sync::Arc;
use std::time::Duration;

use ironclaw::llm::{CircuitBreakerProvider, CircuitBreakerConfig};

let cb_llm = Arc::new(CircuitBreakerProvider::new(
    base_llm,
    CircuitBreakerConfig {
        failure_threshold: 5,
        recovery_timeout: Duration::from_secs(30),
        ..Default::default()
    }
));

CachedProvider

Caches responses to avoid redundant API calls.
• new: fn(provider: Arc<dyn LlmProvider>, config: ResponseCacheConfig) -> Self
  Wrap a provider with response caching

ResponseCacheConfig

• ttl (Duration, default 3600s): Cache entry time-to-live
• max_entries (usize, default 1000): Maximum number of cache entries

use std::sync::Arc;
use std::time::Duration;

use ironclaw::llm::{CachedProvider, ResponseCacheConfig};

let cached = Arc::new(CachedProvider::new(
    base_llm,
    ResponseCacheConfig {
        ttl: Duration::from_secs(3600),
        max_entries: 1000,
    }
));

SmartRoutingProvider

Routes simple requests to cheap model, complex to primary.
• new: fn(primary: Arc<dyn LlmProvider>, cheap: Arc<dyn LlmProvider>, config: SmartRoutingConfig) -> Self
  Create smart routing between two providers

SmartRoutingConfig

• cascade_enabled (bool, default true): Retry on the primary if the cheap model fails
• complexity_threshold (f32, default 0.6): Complexity score above which to use the primary (0.0 to 1.0)

use std::sync::Arc;

use ironclaw::llm::{SmartRoutingProvider, SmartRoutingConfig};

let smart = Arc::new(SmartRoutingProvider::new(
    expensive_llm,
    cheap_llm,
    SmartRoutingConfig {
        cascade_enabled: true,
        complexity_threshold: 0.6,
    }
));

Session Management

SessionManager

Manages NEAR AI authentication sessions.
• new: fn(config: SessionConfig) -> Self
  Create a new session manager
• get_or_create_session: async fn(&self) -> Result<String>
  Get the current session token or create a new session
• refresh_session: async fn(&self) -> Result<()>
  Refresh the current session

SessionConfig

• session_path (PathBuf): Where to store session data
• auth_base_url (String): NEAR AI auth endpoint
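
A short sketch of acquiring a token; the session path and auth URL here are illustrative values, not defaults from the crate:

use std::path::PathBuf;

use ironclaw::llm::{SessionConfig, SessionManager};

let session_mgr = SessionManager::new(SessionConfig {
    session_path: PathBuf::from("/home/user/.ironclaw/session.json"), // illustrative
    auth_base_url: "https://auth.near.ai".to_string(),                // illustrative
});

// Returns the cached token or runs the auth flow to create a session.
let token = session_mgr.get_or_create_session().await?;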

Error Handling

LlmError

• AuthFailed { provider: String }: Authentication failed
• RequestFailed { provider: String, reason: String }: API request failed
• RateLimited { retry_after: Option<Duration> }: Rate limit exceeded
• InvalidResponse(String): Response parsing failed
• ContextLengthExceeded { requested: usize, max: usize }: Request exceeds the model's context window
• CircuitOpen(String): Circuit breaker is open
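
Variants can be matched to decide whether to back off, shrink the prompt, or surface the failure. A sketch, assuming a tokio runtime:

use std::time::Duration;

use ironclaw::llm::LlmError;

match llm.complete(request).await {
    Ok(response) => println!("{}", response.content),
    Err(LlmError::RateLimited { retry_after }) => {
        // Honor the server's hint when present, otherwise back off conservatively.
        let wait = retry_after.unwrap_or(Duration::from_secs(30));
        tokio::time::sleep(wait).await;
    }
    Err(LlmError::ContextLengthExceeded { requested, max }) => {
        eprintln!("prompt too large: {requested} tokens, model max {max}");
    }
    Err(LlmError::CircuitOpen(provider)) => {
        eprintln!("{provider} is temporarily disabled by the circuit breaker");
    }
    Err(e) => return Err(e.into()),
}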

Cost Tracking

TokenUsage

Tracks token usage and costs.
• input_tokens (u32): Tokens in prompts
• output_tokens (u32): Tokens in completions
• total_cost (Decimal): Total cost in USD

use ironclaw::llm::TokenUsage;

let usage = TokenUsage {
    input_tokens: response.input_tokens,
    output_tokens: response.output_tokens,
    total_cost: calculate_cost(&response, &llm),
};
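
calculate_cost above is not part of the module; one plausible implementation derives it from cost_per_token():

use rust_decimal::Decimal;

use ironclaw::llm::{CompletionResponse, LlmProvider};

fn calculate_cost(response: &CompletionResponse, llm: &dyn LlmProvider) -> Decimal {
    let (input_cost, output_cost) = llm.cost_per_token();
    // Per-token USD rates multiplied by the token counts reported in the response.
    Decimal::from(response.input_tokens) * input_cost
        + Decimal::from(response.output_tokens) * output_cost
}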
