Anthropic Claude

Overview

LiteLLM provides comprehensive support for Anthropic’s Claude models, including advanced features like prompt caching, computer use, web search, and extended thinking.

Quick Start

Install LiteLLM

pip install litellm

Set API Key

export ANTHROPIC_API_KEY="sk-ant-..."

Make Your First Call

from litellm import completion

response = completion(
    model="anthropic/claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Hello Claude!"}]
)
print(response.choices[0].message.content)

Supported Models

Claude 4
Claude 3.7
Claude 3.5
Claude 3

Claude 4: latest generation with extended thinking and advanced reasoning.

# Claude Sonnet 4 - latest generation with reasoning
response = completion(
    model="anthropic/claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Solve this complex problem..."}]
)

# With extended thinking (reasoning)
response = completion(
    model="anthropic/claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Complex analysis task..."}],
    thinking={
        "type": "enabled",
        "budget_tokens": 10000  # Allocate tokens for thinking
    }
)

Claude 3.7: advanced Sonnet model with excellent performance.

response = completion(
    model="anthropic/claude-3-7-sonnet-20250219",
    messages=[{"role": "user", "content": "Write detailed analysis..."}],
    max_tokens=4096
)

Claude 3.5: popular Sonnet and Haiku models.

# Claude 3.5 Sonnet - Great balance
response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{"role": "user", "content": "Analyze this data..."}]
)

# Claude 3.5 Haiku - Fast and efficient
response = completion(
    model="anthropic/claude-3-5-haiku-20241022",
    messages=[{"role": "user", "content": "Quick task..."}]
)

Claude 3: previous generation models.

# Claude 3 Opus - Most capable
response = completion(
    model="anthropic/claude-3-opus-20240229",
    messages=[{"role": "user", "content": "Complex reasoning..."}]
)

# Claude 3 Sonnet
response = completion(
    model="anthropic/claude-3-sonnet-20240229",
    messages=[{"role": "user", "content": "Balanced task..."}]
)

# Claude 3 Haiku - Fast
response = completion(
    model="anthropic/claude-3-haiku-20240307",
    messages=[{"role": "user", "content": "Quick query..."}]
)

Authentication

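Provide the API key either through an environment variable or as a direct parameter. The two examples below show each option in that order.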

export ANTHROPIC_API_KEY="sk-ant-..."

from litellm import completion

response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{"role": "user", "content": "Hello!"}]
)

from litellm import completion

response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{"role": "user", "content": "Hello!"}],
    api_key="sk-ant-..."
)

Extended Thinking (Reasoning)

Claude Sonnet 4 supports extended thinking for complex reasoning tasks:

response = completion(
    model="anthropic/claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Solve this math problem: ..."}],
    thinking={
        "type": "enabled",
        "budget_tokens": 5000  # Tokens allocated for thinking
    }
)

# Access thinking content (LiteLLM returns it separately from the answer text)
message = response.choices[0].message
print(f"Thinking: {getattr(message, 'reasoning_content', None)}")
print(f"Response: {message.content}")
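
Anthropic documents a minimum thinking budget of 1,024 tokens and requires `budget_tokens` to be smaller than `max_tokens`. The following is a minimal sketch of a helper that derives both values together; the name `thinking_kwargs` and its `answer_tokens` parameter are hypothetical and not part of LiteLLM.

```python
MIN_THINKING_BUDGET = 1024  # Anthropic's documented minimum for budget_tokens


def thinking_kwargs(budget_tokens: int, answer_tokens: int = 2048) -> dict:
    """Build `thinking` and `max_tokens` keyword arguments for completion().

    Hypothetical helper: max_tokens is the thinking budget plus room for the
    visible answer, so the budget always stays below max_tokens.
    """
    if budget_tokens < MIN_THINKING_BUDGET:
        raise ValueError(f"budget_tokens must be at least {MIN_THINKING_BUDGET}")
    if answer_tokens <= 0:
        raise ValueError("answer_tokens must be positive")
    return {
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
        "max_tokens": budget_tokens + answer_tokens,
    }


kwargs = thinking_kwargs(5000)
print(kwargs["max_tokens"])  # 7048
```

The result can be unpacked into a call, for example `completion(model=..., messages=..., **thinking_kwargs(5000))`.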

Prompt Caching

Save costs by caching frequently used context:

response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[
        {
            "role": "system",
            "content": [
                {
                    "type": "text",
                    "text": "You are an expert in...",  # Long system prompt
                    "cache_control": {"type": "ephemeral"}  # Cache this
                }
            ]
        },
        {"role": "user", "content": "Question 1"}
    ]
)

# Subsequent requests reuse cached context (5-minute TTL)
response2 = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[
        # Same cached system message
        {"role": "system", "content": [{
            "type": "text",
            "text": "You are an expert in...",
            "cache_control": {"type": "ephemeral"}
        }]},
        {"role": "user", "content": "Question 2"}  # Only this is new
    ]
)
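
A cache hit depends on the cached prefix being identical from one request to the next, so it can help to build that prefix in one place. A minimal sketch, assuming hypothetical helper names `cached_system_message` and `build_messages`:

```python
def cached_system_message(text: str) -> dict:
    """Return a system message whose text block is marked for caching."""
    return {
        "role": "system",
        "content": [
            {"type": "text", "text": text, "cache_control": {"type": "ephemeral"}}
        ],
    }


def build_messages(system_text: str, question: str) -> list:
    """Put the cached prefix first so every request shares the same prefix."""
    return [cached_system_message(system_text), {"role": "user", "content": question}]


SYSTEM_PROMPT = "You are an expert in..."  # placeholder; use your own long prompt
first = build_messages(SYSTEM_PROMPT, "Question 1")
second = build_messages(SYSTEM_PROMPT, "Question 2")
print(first[0] == second[0])  # True: both requests share the cacheable prefix
```

Each list can then be passed as `messages` to `completion(...)`; only the final user message differs between requests.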

Computer Use

Claude can interact with computers through screenshots and commands:

tools = [{
    "type": "computer_20241022",
    "name": "computer",
    "display_width_px": 1920,
    "display_height_px": 1080,
    "display_number": 1
}]

response = completion(
    model="anthropic/claude-3-5-sonnet-20241022",
    messages=[{
        "role": "user",
        "content": "Click on the search button and type 'hello'"
    }],
    tools=tools
)

# Claude returns computer actions as tool calls
import json

for tool_call in response.choices[0].message.tool_calls or []:
    action = json.loads(tool_call.function.arguments)
    print(f"Action: {action.get('action')}")
    # Actions: key, type, mouse_move, left_click, etc.
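
Once an action has been decoded, your own code has to carry it out. The sketch below only summarizes an action rather than executing it. It assumes the tool input carries `action`, `text`, and `coordinate` fields as in Anthropic's computer tool schema, and `describe_action` is a hypothetical name.

```python
import json


def describe_action(arguments_json: str) -> str:
    """Turn a computer-use tool call's JSON arguments into a readable summary."""
    action = json.loads(arguments_json)
    kind = action.get("action")
    if kind in ("type", "key"):
        # Keyboard actions carry the text or key combination to send.
        return f"{kind}: {action.get('text', '')}"
    if "coordinate" in action:
        # Pointer actions may carry an [x, y] screen position.
        x, y = action["coordinate"]
        return f"{kind} at ({x}, {y})"
    return str(kind)


print(describe_action('{"action": "mouse_move", "coordinate": [640, 360]}'))
print(describe_action('{"action": "type", "text": "hello"}'))
```

A real handler would replace each `return` with a call into whatever automation layer drives the screen, then send a screenshot back as the tool result.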

Web Search

Claude can search the web for current information:

# Enable web search tool
tools = [{
    "type": "web_search_20250305",
    "name": "web_search",
    "max_uses": 5,  # Limit search queries
    "user_location": {  # Optional: localize results
        "type": "approximate",
        "city": "San Francisco",
        "region": "California",
        "country": "US",
        "timezone": "America/Los_Angeles"
    }
}]

response = completion(
    model="anthropic/claude-3-7-sonnet-20250219",
    messages=[{
        "role": "user",
        "content": "What are the latest developments in AI this week?"
    }],
    tools=tools
)

# Claude automatically searches and cites sources; the answer is returned as a string
print(response.choices[0].message.content)

Function Calling

Claude supports sophisticated tool use:

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City name"
                }
            },
            "required": ["location"]
        }
    }
}]

response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools
)

if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    print(f"Function: {tool_call.function.name}")
    print(f"Args: {tool_call.function.arguments}")
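
The example above stops once Claude has requested a tool. To finish the exchange, run the function locally and send its result back as a `tool` message. A minimal sketch, assuming a stand-in `get_weather` implementation and hypothetical names `LOCAL_TOOLS` and `tool_result_message`:

```python
import json


def get_weather(location: str) -> dict:
    """Stand-in implementation; a real one would call a weather service."""
    return {"location": location, "forecast": "sunny", "temp_c": 21}


LOCAL_TOOLS = {"get_weather": get_weather}  # maps tool name to a Python callable


def tool_result_message(tool_call_id: str, name: str, arguments_json: str) -> dict:
    """Run the named local function and wrap its result as a `tool` message."""
    result = LOCAL_TOOLS[name](**json.loads(arguments_json))
    return {"role": "tool", "tool_call_id": tool_call_id, "content": json.dumps(result)}


msg = tool_result_message("call_1", "get_weather", '{"location": "Paris"}')
print(msg["content"])
```

In a real exchange, pass `tool_call.id`, `tool_call.function.name`, and `tool_call.function.arguments`, append the assistant message and the returned `tool` message to `messages`, and call `completion` again with the same `tools`.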

Vision (Multimodal)

Claude models support image analysis:

The three examples below cover, in order, an image URL, a base64-encoded image, and multiple images in one message.

response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {
                "type": "image_url",
                "image_url": {"url": "https://example.com/image.jpg"}
            }
        ]
    }]
)

import base64

with open("image.jpg", "rb") as f:
    image_data = base64.b64encode(f.read()).decode('utf-8')

response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this"},
            {
                "type": "image_url",
                "image_url": {
                    "url": f"data:image/jpeg;base64,{image_data}"
                }
            }
        ]
    }]
)

response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Compare these screenshots"},
            {"type": "image_url", "image_url": {"url": "https://..."}},
            {"type": "image_url", "image_url": {"url": "https://..."}}
        ]
    }]
)
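
The base64 example hard-codes `image/jpeg` in the data URL. A small sketch that infers the media type from the file name instead, using only the standard library; `image_part` is a hypothetical helper name:

```python
import base64
import mimetypes


def image_part(filename: str, data: bytes) -> dict:
    """Build an `image_url` content part with a data URL for the given bytes."""
    media_type, _ = mimetypes.guess_type(filename)
    if media_type is None or not media_type.startswith("image/"):
        raise ValueError(f"cannot infer an image type from {filename!r}")
    encoded = base64.b64encode(data).decode("utf-8")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{media_type};base64,{encoded}"},
    }


part = image_part("chart.png", b"\x89PNG")  # tiny placeholder bytes, not a real image
print(part["image_url"]["url"])  # data:image/png;base64,iVBORw==
```

The returned dictionary can be placed directly in a message's `content` list next to a `text` part.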

Streaming

from litellm import completion

response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{"role": "user", "content": "Write a story"}],
    stream=True
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
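
If the full text is needed after streaming, the deltas can be accumulated while they arrive. The sketch below follows the same attribute path as the loop above; `collect_text` and `fake_chunk` are hypothetical names, and the fake chunks only stand in for a real stream so the example runs offline.

```python
from types import SimpleNamespace


def collect_text(chunks) -> str:
    """Concatenate the text deltas of a stream, skipping empty ones."""
    parts = []
    for chunk in chunks:
        piece = chunk.choices[0].delta.content
        if piece:
            parts.append(piece)
    return "".join(parts)


def fake_chunk(text):
    """Stand-in object with the same attribute path as a stream chunk."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


stream = [fake_chunk("Once "), fake_chunk(None), fake_chunk("upon a time")]
print(collect_text(stream))  # Once upon a time
```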

Streaming with Thinking

response = completion(
    model="anthropic/claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Solve this problem..."}],
    thinking={"type": "enabled", "budget_tokens": 5000},
    stream=True
)

for chunk in response:
    delta = chunk.choices[0].delta
    
    # Handle thinking content (streamed as reasoning_content)
    reasoning = getattr(delta, "reasoning_content", None)
    if reasoning:
        print(f"[Thinking] {reasoning}", end="")
    
    # Handle regular content
    if delta.content:
        print(delta.content, end="", flush=True)

JSON Mode

# JSON object mode
response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{
        "role": "user",
        "content": "Extract: John is 30, lives in NYC, likes pizza"
    }],
    response_format={"type": "json_object"}
)

import json
data = json.loads(response.choices[0].message.content)
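
`json.loads` raises an error if the reply is not bare JSON, for example when the model wraps it in a Markdown code fence. A defensive sketch, assuming that a wrapping fence is the only deviation you want to tolerate; `parse_json_reply` is a hypothetical helper:

```python
import json

FENCE = "`" * 3  # a Markdown code fence marker


def parse_json_reply(content: str):
    """Parse a model reply as JSON, tolerating a surrounding Markdown fence."""
    text = content.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (with its optional language tag)...
        text = text.split("\n", 1)[1] if "\n" in text else ""
        # ...and everything from the closing fence onward.
        text = text.rsplit(FENCE, 1)[0]
    return json.loads(text)


print(parse_json_reply('{"name": "John", "age": 30}'))
print(parse_json_reply(FENCE + 'json\n{"city": "NYC"}\n' + FENCE))
```

Anything else that is not valid JSON still raises `json.JSONDecodeError`, which the caller can catch and handle, for example by retrying the request.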

Batch Processing

Process requests asynchronously in batches:

from litellm import create_batch, retrieve_batch

# Create batch
batch = create_batch(
    completion_window="24h",  # Required by create_batch
    custom_llm_provider="anthropic",
    input_file_id="file-abc123",
    endpoint="/v1/messages"
)

print(f"Batch ID: {batch.id}")

# Retrieve results
batch_result = retrieve_batch(
    custom_llm_provider="anthropic",
    batch_id=batch.id
)

Advanced Parameters

System Messages

response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ]
)

Temperature and Top P

response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{"role": "user", "content": "Be creative"}],
    temperature=1.0,  # 0.0 to 1.0
    top_p=0.9,
    top_k=50
)

Stop Sequences

response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{"role": "user", "content": "Count to 10"}],
    stop=["5", "\n\n"]  # Stop at these sequences
)

Max Tokens

# Anthropic's API requires max_tokens; set it explicitly to control output length
response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{"role": "user", "content": "Write an essay"}],
    max_tokens=4096  # Upper bound on generated tokens
)

Error Handling

from litellm import completion
from litellm.exceptions import (
    AuthenticationError,
    RateLimitError,
    ContextWindowExceededError,
    APIError
)

try:
    response = completion(
        model="anthropic/claude-3-5-sonnet-20240620",
        messages=[{"role": "user", "content": "Hello"}],
        max_tokens=1024
    )
except AuthenticationError:
    print("Invalid API key")
except RateLimitError:
    print("Rate limit hit")
except ContextWindowExceededError:
    print("Input too long")
except APIError as e:
    print(f"API error: {e}")
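
Rate-limit errors are usually transient, so retrying with exponential backoff is a common response. The sketch below is generic and does not import LiteLLM. The name `call_with_retry` and its parameters are hypothetical, and a stand-in function simulates the failures so the example runs offline.

```python
import time


def call_with_retry(fn, retry_on=(Exception,), attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on a retryable error wait base_delay * 2**n, then try again."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)


# Demo with a stand-in function that fails twice, then succeeds.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("simulated transient failure")
    return "ok"


delays = []  # record the waits instead of actually sleeping
print(call_with_retry(flaky, retry_on=(TimeoutError,), sleep=delays.append))  # ok
print(delays)  # [1.0, 2.0]
```

With LiteLLM this could be used as `call_with_retry(lambda: completion(...), retry_on=(RateLimitError,))`. LiteLLM's `completion` also accepts a `num_retries` argument, which may be enough for simple cases.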

Cost Tracking

from litellm import completion, completion_cost

response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=100
)

# Track costs including cache usage
cost = completion_cost(completion_response=response)
print(f"Cost: ${cost:.6f}")

# Check cache usage
cached = getattr(response.usage, "cache_read_input_tokens", None)
if cached is not None:
    print(f"Cached tokens: {cached}")
    print(f"Prompt tokens: {response.usage.prompt_tokens}")

Best Practices

Use Prompt Caching

Cache system prompts and long documents to reduce costs by up to 90%.
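
The 90% figure corresponds to cache reads being billed at about a tenth of the normal input price, while the request that writes the cache costs somewhat more than normal. A worked sketch of the arithmetic follows. The base price, prompt size, and request count are illustrative assumptions, and the multipliers follow Anthropic's published pricing for the 5-minute cache.

```python
def input_cost(prompt_tokens: int, price_per_mtok: float, multiplier: float = 1.0) -> float:
    """Cost in dollars for prompt_tokens at a per-million-token price."""
    return prompt_tokens / 1_000_000 * price_per_mtok * multiplier


BASE_PRICE = 3.00   # illustrative $ per million input tokens (assumption)
CACHE_WRITE = 1.25  # multiplier for the request that writes the cache
CACHE_READ = 0.10   # multiplier for requests that read from the cache

prefix_tokens = 50_000  # size of the cached system prompt (assumption)
requests = 10           # requests sent while the cache stays warm (assumption)

uncached = requests * input_cost(prefix_tokens, BASE_PRICE)
cached = input_cost(prefix_tokens, BASE_PRICE, CACHE_WRITE) + (requests - 1) * input_cost(
    prefix_tokens, BASE_PRICE, CACHE_READ
)
print(f"uncached: ${uncached:.4f}, cached: ${cached:.4f}")
print(f"saving: {1 - cached / uncached:.1%}")
```

Under these assumptions the saving on the shared prefix is below 90% because of the one-time write premium; it approaches 90% as the number of requests served from the cache grows.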

Set Max Tokens

Anthropic’s API requires max_tokens. Set it explicitly rather than relying on a default, so output length and cost stay predictable.

Use Extended Thinking

Enable thinking for complex reasoning, math, and analysis tasks.

Try Haiku First

Use Claude 3.5 Haiku for simple tasks - it’s fast and cost-effective.

Related Documentation

Function Calling

Deep dive into tool use with Claude

Vision

Working with images in Claude

Streaming

Stream responses in real-time

Batching

Process requests in batches
