Overview

LiteLLM provides comprehensive support for Anthropic’s Claude models, including advanced features like prompt caching, computer use, web search, and extended thinking.

Quick Start

1. Install LiteLLM

pip install litellm

2. Set API Key

export ANTHROPIC_API_KEY="sk-ant-..."

3. Make Your First Call

from litellm import completion

response = completion(
    model="anthropic/claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Hello Claude!"}]
)
print(response.choices[0].message.content)

Supported Models

Latest generation with extended thinking and advanced reasoning.
# Claude 4.6 - Latest model with reasoning
response = completion(
    model="anthropic/claude-4-6-sonnet-20250514",
    messages=[{"role": "user", "content": "Solve this complex problem..."}]
)

# With extended thinking (reasoning)
response = completion(
    model="anthropic/claude-4-6-sonnet-20250514",
    messages=[{"role": "user", "content": "Complex analysis task..."}],
    thinking={
        "type": "enabled",
        "budget_tokens": 10000  # Allocate tokens for thinking
    }
)

Authentication

LiteLLM reads the API key from the ANTHROPIC_API_KEY environment variable:
export ANTHROPIC_API_KEY="sk-ant-..."

from litellm import completion

response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{"role": "user", "content": "Hello!"}]
)
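
Because the key comes from the environment, a missing variable usually surfaces only when the first request fails. The following is a minimal sketch of a pre-flight check. The helper name `require_anthropic_key` is hypothetical, and the `sk-ant-` prefix test is an assumption based on the placeholder shown above, not something LiteLLM enforces.

```python
import os


def require_anthropic_key(env=None):
    """Return the Anthropic API key from the environment or raise a clear error."""
    env = os.environ if env is None else env
    key = env.get("ANTHROPIC_API_KEY", "").strip()
    if not key:
        raise RuntimeError("ANTHROPIC_API_KEY is not set")
    if not key.startswith("sk-ant-"):
        # Assumption: keys follow the "sk-ant-" shape used in this page's examples
        raise RuntimeError("ANTHROPIC_API_KEY does not look like an Anthropic key")
    return key
```

Calling `require_anthropic_key()` once at startup turns a late authentication failure into an immediate, readable error.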

Extended Thinking (Reasoning)

Claude 4.6 supports extended thinking for complex reasoning tasks:
response = completion(
    model="anthropic/claude-4-6-sonnet-20250514",
    messages=[{"role": "user", "content": "Solve this math problem: ..."}],
    thinking={
        "type": "enabled",
        "budget_tokens": 5000  # Tokens allocated for thinking
    }
)

# Access thinking content (LiteLLM returns it as reasoning_content;
# message.content is a plain string, not a list of blocks)
message = response.choices[0].message
print(f"Thinking: {message.reasoning_content}")
print(f"Response: {message.content}")
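
A thinking budget that is too small, or not smaller than `max_tokens`, is rejected by Anthropic's API. The sketch below validates both before a request is built. The helper name `thinking_params` is hypothetical; the 1,024-token minimum and the "budget below max_tokens" rule reflect Anthropic's documented constraints at the time of writing, so treat them as assumptions to re-check.

```python
MIN_THINKING_BUDGET = 1024  # Assumption: Anthropic's documented minimum budget


def thinking_params(budget_tokens, max_tokens):
    """Build `thinking` and `max_tokens` kwargs after validating the budget."""
    if budget_tokens < MIN_THINKING_BUDGET:
        raise ValueError(f"budget_tokens must be at least {MIN_THINKING_BUDGET}")
    if budget_tokens >= max_tokens:
        # The thinking budget is spent out of the overall output allowance
        raise ValueError("budget_tokens must be smaller than max_tokens")
    return {
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
        "max_tokens": max_tokens,
    }
```

The result can be unpacked into a call, for example `completion(model=..., messages=..., **thinking_params(5000, 8000))`.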

Prompt Caching

Save costs by caching frequently used context:
response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[
        {
            "role": "system",
            "content": [
                {
                    "type": "text",
                    "text": "You are an expert in...",  # Long system prompt
                    "cache_control": {"type": "ephemeral"}  # Cache this
                }
            ]
        },
        {"role": "user", "content": "Question 1"}
    ]
)

# Subsequent requests reuse cached context (5-minute TTL)
response2 = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[
        # Same cached system message
        {"role": "system", "content": [{
            "type": "text",
            "text": "You are an expert in...",
            "cache_control": {"type": "ephemeral"}
        }]},
        {"role": "user", "content": "Question 2"}  # Only this is new
    ]
)
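
A cache hit depends on the cached prefix being identical from one request to the next, so it helps to build the system message in one place. This is a minimal sketch; `cached_system` and `build_messages` are hypothetical helper names, not LiteLLM functions.

```python
def cached_system(text):
    """Return a system message whose text block is marked for caching."""
    return {
        "role": "system",
        "content": [
            {"type": "text", "text": text, "cache_control": {"type": "ephemeral"}}
        ],
    }


def build_messages(system_text, question):
    """Pair the (identical) cached system message with a new user question."""
    return [cached_system(system_text), {"role": "user", "content": question}]
```

Passing `build_messages(prompt, "Question 1")` and then `build_messages(prompt, "Question 2")` to `completion` keeps the cached portion byte-for-byte the same while only the user turn changes.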

Computer Use

Claude can interact with computers through screenshots and commands:
tools = [{
    "type": "computer_20241022",
    "name": "computer",
    "display_width_px": 1920,
    "display_height_px": 1080,
    "display_number": 1
}]

response = completion(
    model="anthropic/claude-3-5-sonnet-20241022",
    messages=[{
        "role": "user",
        "content": "Click on the search button and type 'hello'"
    }],
    tools=tools
)

# Claude returns computer actions as tool calls
import json

for tool_call in response.choices[0].message.tool_calls or []:
    action = json.loads(tool_call.function.arguments)
    print(f"Action: {action.get('action')}")
    # Actions: key, type, mouse_move, left_click, etc.
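
Before executing anything on a real machine, it is useful to log what Claude asked for. The sketch below turns one action payload into a readable line. The function name `describe_action` is hypothetical, and the `coordinate` and `text` fields are assumptions about the computer tool's input shape; your own executor would replace the string formatting with real mouse and keyboard calls.

```python
import json


def describe_action(arguments):
    """Summarize a computer-use action given as a JSON string or a dict."""
    action = json.loads(arguments) if isinstance(arguments, str) else dict(arguments)
    kind = action.get("action", "unknown")
    if "coordinate" in action:
        # Assumption: coordinates arrive as an [x, y] pair in screen pixels
        x, y = action["coordinate"]
        return f"{kind} at ({x}, {y})"
    if "text" in action:
        # Assumption: `type` and `key` actions carry their payload in `text`
        return f"{kind}: {action['text']!r}"
    return kind
```
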

Web Search

Claude can search the web for current information:
# Enable web search tool
tools = [{
    "type": "web_search_20250305",
    "name": "web_search",
    "max_uses": 5,  # Limit search queries
    "user_location": {  # Optional: approximate location used to localize results
        "type": "approximate",
        "city": "San Francisco",
        "region": "California",
        "country": "US"
    }
}]

response = completion(
    model="anthropic/claude-3-7-sonnet-20250219",
    messages=[{
        "role": "user",
        "content": "What are the latest developments in AI this week?"
    }],
    tools=tools
)

# Claude automatically searches and cites sources;
# the final answer is returned as a plain string
print(response.choices[0].message.content)
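
Search behavior is controlled entirely by the tool entry, so tightening limits is a matter of copying and adjusting that dictionary. This is a sketch under assumptions: `restrict_search` is a hypothetical helper, and `allowed_domains` is a field from Anthropic's web search tool that is assumed here to be passed through unchanged.

```python
def restrict_search(tool, max_uses, allowed_domains=None):
    """Return a copy of a web search tool entry with tighter limits."""
    if max_uses < 1:
        raise ValueError("max_uses must be at least 1")
    limited = dict(tool, max_uses=max_uses)  # copy; the original entry is untouched
    if allowed_domains:
        # Assumption: `allowed_domains` restricts results to these sites
        limited["allowed_domains"] = sorted(set(allowed_domains))
    return limited
```

For example, `tools = [restrict_search(tools[0], 2, ["example.com"])]` caps a request at two searches on a single site.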

Function Calling

Claude supports tool use through the OpenAI-style tools parameter:
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City name"
                }
            },
            "required": ["location"]
        }
    }
}]

response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools
)

if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    print(f"Function: {tool_call.function.name}")
    print(f"Args: {tool_call.function.arguments}")
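
Printing the call is only half of the loop: the function has to run and its result has to go back to the model. The sketch below shows one way to do that in the OpenAI-style message format. `get_weather` here is a hypothetical stand-in that returns canned data, and `TOOL_REGISTRY` and `run_tool_call` are illustrative names, not LiteLLM APIs.

```python
import json


def get_weather(location):
    """Hypothetical stand-in; a real version would call a weather service."""
    return {"location": location, "forecast": "sunny"}


TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_call(name, arguments, call_id):
    """Execute one tool call and return the `tool` message to send back."""
    func = TOOL_REGISTRY.get(name)
    if func is None:
        result = {"error": f"unknown tool: {name}"}
    else:
        result = func(**json.loads(arguments))  # arguments arrive as a JSON string
    return {"role": "tool", "tool_call_id": call_id, "content": json.dumps(result)}
```

In practice you would append the assistant message and then `run_tool_call(tool_call.function.name, tool_call.function.arguments, tool_call.id)` to `messages`, and call `completion` again so Claude can produce the final answer.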

Vision (Multimodal)

Claude models support image analysis:
response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {
                "type": "image_url",
                "image_url": {"url": "https://example.com/image.jpg"}
            }
        ]
    }]
)
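
When the image is a local file rather than a public URL, a common approach is to send it inline as a base64 data URL in the same `image_url` field. This is a minimal sketch; `image_part` is a hypothetical helper name, and the default MIME type is an assumption you should match to your file.

```python
import base64


def image_part(image_bytes, mime_type="image/jpeg"):
    """Wrap raw image bytes as an `image_url` content part using a data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }
```

A message would then use `[{"type": "text", "text": "What's in this image?"}, image_part(open("photo.jpg", "rb").read())]` as its `content`.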

Streaming

from litellm import completion

response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{"role": "user", "content": "Write a story"}],
    stream=True
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Streaming with Thinking

response = completion(
    model="anthropic/claude-4-6-sonnet-20250514",
    messages=[{"role": "user", "content": "Solve this problem..."}],
    thinking={"type": "enabled", "budget_tokens": 5000},
    stream=True
)

for chunk in response:
    delta = chunk.choices[0].delta
    
    # Handle thinking content (LiteLLM streams it as reasoning_content)
    thinking = getattr(delta, "reasoning_content", None)
    if thinking:
        print(f"[Thinking] {thinking}", end="", flush=True)
    
    # Handle regular content
    if delta.content:
        print(delta.content, end="", flush=True)
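
If you need the complete reasoning and the complete answer as separate strings rather than interleaved console output, the stream can be accumulated into two buffers. The sketch assumes thinking text arrives on a delta attribute named `reasoning_content` (configurable through the hypothetical `reasoning_attr` parameter); `split_stream` and `fake_chunk` are illustrative helpers, the latter only imitating the chunk shape for offline testing.

```python
from types import SimpleNamespace


def split_stream(chunks, reasoning_attr="reasoning_content"):
    """Accumulate streamed deltas into (reasoning_text, answer_text)."""
    reasoning, answer = [], []
    for chunk in chunks:
        delta = chunk.choices[0].delta
        thought = getattr(delta, reasoning_attr, None)
        if thought:
            reasoning.append(thought)
        text = getattr(delta, "content", None)
        if text:
            answer.append(text)
    return "".join(reasoning), "".join(answer)


def fake_chunk(**fields):
    """Build a stand-in chunk with the same attribute path as a real one."""
    delta = SimpleNamespace(**fields)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])
```

With a real stream, `thinking_text, answer_text = split_stream(response)` consumes the iterator once and returns both parts.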

JSON Mode

# JSON object mode
response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{
        "role": "user",
        "content": "Extract: John is 30, lives in NYC, likes pizza"
    }],
    response_format={"type": "json_object"}
)

import json
data = json.loads(response.choices[0].message.content)
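
`json.loads` raises if the reply contains anything besides the JSON object, and it says nothing about missing fields. The sketch below is a slightly more defensive parser: it extracts the outermost `{...}` span and checks for required keys. `parse_json_reply` is a hypothetical helper, and the key names in the usage note are illustrative.

```python
import json


def parse_json_reply(text, required_keys=()):
    """Parse the outermost JSON object in `text` and verify required keys."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in reply")
    data = json.loads(text[start:end + 1])  # JSONDecodeError is a ValueError
    missing = [key for key in required_keys if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data
```

For the extraction prompt above, `parse_json_reply(response.choices[0].message.content, ["name", "age"])` would fail loudly if either field were absent.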

Batch Processing

Process requests asynchronously in batches:
from litellm import create_batch, retrieve_batch

# Create batch
batch = create_batch(
    completion_window="24h",  # Required by create_batch
    custom_llm_provider="anthropic",
    input_file_id="file-abc123",
    endpoint="/v1/messages"
)

print(f"Batch ID: {batch.id}")

# Retrieve results
batch_result = retrieve_batch(
    custom_llm_provider="anthropic",
    batch_id=batch.id
)
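
Batches finish asynchronously, so a single retrieval right after creation will usually show an unfinished job. The sketch below polls until a terminal state is reached. `wait_for_batch` is a hypothetical helper, and the status names are an assumption that the returned object follows the OpenAI-style batch shape; adjust them if your provider reports different values.

```python
import time

# Assumption: OpenAI-style terminal status names
TERMINAL_STATUSES = {"completed", "failed", "expired", "cancelled"}


def wait_for_batch(fetch, batch_id, poll_seconds=30.0, max_polls=120, sleep=time.sleep):
    """Call `fetch(batch_id)` until the batch reaches a terminal status."""
    for attempt in range(max_polls):
        batch = fetch(batch_id)
        if batch.status in TERMINAL_STATUSES:
            return batch
        if attempt < max_polls - 1:
            sleep(poll_seconds)  # wait before asking again
    raise TimeoutError(f"batch {batch_id} not finished after {max_polls} polls")
```

A call might look like `wait_for_batch(lambda bid: retrieve_batch(custom_llm_provider="anthropic", batch_id=bid), batch.id)`. Taking `fetch` and `sleep` as parameters keeps the loop testable without network access.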

Advanced Parameters

System Messages

response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ]
)

Temperature and Top P

response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{"role": "user", "content": "Be creative"}],
    temperature=1.0,  # 0.0 to 1.0
    top_p=0.9,
    top_k=50
)

Stop Sequences

response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{"role": "user", "content": "Count to 10"}],
    stop=["5", "\n\n"]  # Stop at these sequences
)

Max Tokens

# Anthropic's API requires max_tokens; LiteLLM falls back to a default
# when it is omitted, so set it explicitly to control output length
response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{"role": "user", "content": "Write an essay"}],
    max_tokens=4096  # Upper bound on generated tokens
)

Error Handling

from litellm import completion
from litellm.exceptions import (
    AuthenticationError,
    RateLimitError,
    ContextWindowExceededError,
    APIError
)

try:
    response = completion(
        model="anthropic/claude-3-5-sonnet-20240620",
        messages=[{"role": "user", "content": "Hello"}],
        max_tokens=1024
    )
except AuthenticationError:
    print("Invalid API key")
except RateLimitError:
    print("Rate limit hit")
except ContextWindowExceededError:
    print("Input too long")
except APIError as e:
    print(f"API error: {e}")
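
Rate limits are usually transient, so printing a message is rarely the best response. A common pattern is to retry with exponential backoff and jitter. The sketch below is generic on purpose: `call_with_retries` is a hypothetical helper, and the delays and attempt count are illustrative defaults rather than recommended values.

```python
import random
import time


def call_with_retries(call, retry_on, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Run `call()`, retrying on `retry_on` with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the original error
            delay = base_delay * 2 ** (attempt - 1)  # 1s, 2s, 4s, ...
            sleep(delay + random.uniform(0, delay / 2))  # jitter spreads out retries
```

With the imports from the example above, this could be used as `call_with_retries(lambda: completion(model=..., messages=...), RateLimitError)`, leaving `AuthenticationError` and `ContextWindowExceededError` to fail immediately since retrying cannot fix them.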

Cost Tracking

from litellm import completion, completion_cost

response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=100
)

# Track costs including cache usage
cost = completion_cost(completion_response=response)
print(f"Cost: ${cost:.6f}")

# Check cache usage
if hasattr(response.usage, 'cache_read_input_tokens'):
    print(f"Cached tokens: {response.usage.cache_read_input_tokens}")
    print(f"Prompt tokens: {response.usage.prompt_tokens}")
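
To see how cache usage changes the bill, the input side can be estimated by hand. The sketch assumes Anthropic's published multipliers at the time of writing (cache reads at 10% of the base input price, 5-minute cache writes at 125%); `input_cost` is a hypothetical helper, the token counts are passed in explicitly, and the price argument is whatever your model charges per million input tokens.

```python
CACHE_READ_MULTIPLIER = 0.10   # Assumption: cached reads cost 10% of base input
CACHE_WRITE_MULTIPLIER = 1.25  # Assumption: 5-minute cache writes cost 125%


def input_cost(uncached_tokens, cache_read_tokens, cache_write_tokens, price_per_million):
    """Estimate input cost in dollars from token counts and a base price."""
    weighted_tokens = (
        uncached_tokens
        + cache_read_tokens * CACHE_READ_MULTIPLIER
        + cache_write_tokens * CACHE_WRITE_MULTIPLIER
    )
    return weighted_tokens * price_per_million / 1_000_000
```

At an illustrative base price of $3 per million tokens, 100,000 tokens read from cache cost about $0.03 instead of $0.30. For actual billing figures, prefer `completion_cost`, which uses LiteLLM's own pricing data.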

Best Practices

Use Prompt Caching

Cache system prompts and long documents to cut the cost of repeated input tokens by up to 90%.

Set Max Tokens

Set max_tokens explicitly. Anthropic’s API requires it, so LiteLLM substitutes a default whenever you omit it.

Use Extended Thinking

Enable thinking for complex reasoning, math, and analysis tasks.

Try Haiku First

Use Claude 3.5 Haiku for simple tasks - it’s fast and cost-effective.

Function Calling

Deep dive into tool use with Claude

Vision

Working with images in Claude

Streaming

Stream responses in real-time

Batching

Process requests in batches
