Overview
LiteLLM provides comprehensive support for Anthropic’s Claude models, including advanced features like prompt caching, computer use, web search, and extended thinking.
Quick Start
Set API Key
export ANTHROPIC_API_KEY="sk-ant-..."
Make Your First Call
from litellm import completion
response = completion(
    model="anthropic/claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Hello Claude!"}]
)
print(response.choices[0].message.content)
Supported Models
Claude 4
Latest generation with extended thinking and advanced reasoning.
# Claude 4.6 - Latest model with reasoning
response = completion(
    model="anthropic/claude-4-6-sonnet-20250514",
    messages=[{"role": "user", "content": "Solve this complex problem..."}]
)
# With extended thinking (reasoning)
response = completion(
    model="anthropic/claude-4-6-sonnet-20250514",
    messages=[{"role": "user", "content": "Complex analysis task..."}],
    thinking={
        "type": "enabled",
        "budget_tokens": 10000  # Allocate tokens for thinking
    }
)
Claude 3.7
Advanced Sonnet model with excellent performance.
response = completion(
    model="anthropic/claude-3-7-sonnet-20250219",
    messages=[{"role": "user", "content": "Write detailed analysis..."}],
    max_tokens=4096
)
Claude 3.5
Popular Sonnet and Haiku models.
# Claude 3.5 Sonnet - Great balance
response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{"role": "user", "content": "Analyze this data..."}]
)
# Claude 3.5 Haiku - Fast and efficient
response = completion(
    model="anthropic/claude-3-5-haiku-20241022",
    messages=[{"role": "user", "content": "Quick task..."}]
)
Claude 3
Previous generation models.
# Claude 3 Opus - Most capable
response = completion(
    model="anthropic/claude-3-opus-20240229",
    messages=[{"role": "user", "content": "Complex reasoning..."}]
)
# Claude 3 Sonnet
response = completion(
    model="anthropic/claude-3-sonnet-20240229",
    messages=[{"role": "user", "content": "Balanced task..."}]
)
# Claude 3 Haiku - Fast
response = completion(
    model="anthropic/claude-3-haiku-20240307",
    messages=[{"role": "user", "content": "Quick query..."}]
)
Authentication
Environment Variable
export ANTHROPIC_API_KEY="sk-ant-..."
from litellm import completion
response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{"role": "user", "content": "Hello!"}]
)
Direct Parameter
from litellm import completion
response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{"role": "user", "content": "Hello!"}],
    api_key="sk-ant-..."
)
Extended Thinking (Reasoning)
Claude 4.6 supports extended thinking for complex reasoning tasks:
Enable Thinking
Reasoning Effort
response = completion(
    model="anthropic/claude-4-6-sonnet-20250514",
    messages=[{"role": "user", "content": "Solve this math problem: ..."}],
    thinking={
        "type": "enabled",
        "budget_tokens": 5000  # Tokens allocated for thinking
    }
)
# Access thinking content: LiteLLM exposes it as reasoning_content,
# while message.content holds the final answer as a plain string
message = response.choices[0].message
if getattr(message, "reasoning_content", None):
    print(f"Thinking: {message.reasoning_content}")
print(f"Response: {message.content}")
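Anthropic's extended thinking documentation describes a minimum budget of 1,024 tokens and expects the budget to stay below max_tokens. The following is a minimal sketch of a check that could run before the request is sent; build_thinking_config and MIN_THINKING_BUDGET are hypothetical names, not part of LiteLLM, and the limits are assumptions that should be confirmed against the current Anthropic documentation.

```python
MIN_THINKING_BUDGET = 1024  # assumed minimum, taken from Anthropic's extended thinking docs


def build_thinking_config(budget_tokens: int, max_tokens: int) -> dict:
    """Return a `thinking` dict after basic sanity checks (hypothetical helper)."""
    if budget_tokens < MIN_THINKING_BUDGET:
        raise ValueError(f"budget_tokens must be at least {MIN_THINKING_BUDGET}")
    if budget_tokens >= max_tokens:
        raise ValueError("budget_tokens must be smaller than max_tokens")
    return {"type": "enabled", "budget_tokens": budget_tokens}


thinking = build_thinking_config(budget_tokens=5000, max_tokens=8000)
print(thinking)
```

The returned dict can be passed as the thinking argument of completion(), together with the same max_tokens value.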
Prompt Caching
Save costs by caching frequently used context:
System Message Caching
User Message Caching
Tool Caching
response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[
        {
            "role": "system",
            "content": [
                {
                    "type": "text",
                    "text": "You are an expert in...",  # Long system prompt
                    "cache_control": {"type": "ephemeral"}  # Cache this
                }
            ]
        },
        {"role": "user", "content": "Question 1"}
    ]
)
# Subsequent requests reuse cached context (5-minute TTL)
response2 = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[
        # Same cached system message
        {"role": "system", "content": [{
            "type": "text",
            "text": "You are an expert in...",
            "cache_control": {"type": "ephemeral"}
        }]},
        {"role": "user", "content": "Question 2"}  # Only this is new
    ]
)
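The same cache_control marker can also be attached to a content block inside a user message, for example a long document that several questions refer to. The sketch below only builds the messages list; cached_text_block and build_messages are hypothetical helper names introduced for illustration.

```python
def cached_text_block(text: str) -> dict:
    """Wrap text in a content block marked for caching (hypothetical helper)."""
    return {"type": "text", "text": text, "cache_control": {"type": "ephemeral"}}


def build_messages(system_prompt: str, document: str, question: str) -> list:
    """Mark the stable parts (system prompt, document) for caching; the question stays uncached."""
    return [
        {"role": "system", "content": [cached_text_block(system_prompt)]},
        {
            "role": "user",
            "content": [
                cached_text_block(document),
                {"type": "text", "text": question},
            ],
        },
    ]


messages = build_messages("You are an expert in...", "<long document>", "Question 1")
print(messages[1]["content"][0]["cache_control"])
```

Passing messages to completion() works as in the example above; only the question needs to change between calls for the cached prefix to be reused.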
Computer Use
Claude can interact with computers through screenshots and commands:
import json
tools = [{
    "type": "computer_20241022",
    "name": "computer",
    "display_width_px": 1920,
    "display_height_px": 1080,
    "display_number": 1
}]
response = completion(
    model="anthropic/claude-3-5-sonnet-20241022",
    messages=[{
        "role": "user",
        "content": "Click on the search button and type 'hello'"
    }],
    tools=tools
)
# Claude returns computer actions as tool calls; the arguments arrive as a JSON string
for tool_call in response.choices[0].message.tool_calls or []:
    action = json.loads(tool_call.function.arguments)
    print(f"Action: {action.get('action')}")
    # Actions: key, type, mouse_move, left_click, etc.
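Each returned action still has to be carried out by your own code. As a rough sketch, the helper below turns one action payload into a readable description before it is executed. It assumes the payload is a JSON string and that coordinates and text arrive under the keys "coordinate" and "text"; those key names and the helper describe_action are assumptions for illustration, not something this page specifies.

```python
import json


def describe_action(arguments_json: str) -> str:
    """Summarize one computer-use action payload (hypothetical helper)."""
    args = json.loads(arguments_json)
    action = args.get("action", "unknown")
    details = []
    if "coordinate" in args:  # assumed key for [x, y] screen positions
        x, y = args["coordinate"]
        details.append(f"at ({x}, {y})")
    if "text" in args:  # assumed key for typed text or key names
        details.append(f"text={args['text']!r}")
    return " ".join([action, *details])


print(describe_action('{"action": "type", "text": "hello"}'))
print(describe_action('{"action": "mouse_move", "coordinate": [10, 20]}'))
```

A real agent loop would replace the description with a call into a screen automation layer and send a screenshot back to the model as the tool result.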
Web Search
Claude can search the web for current information:
# Enable web search tool
tools = [{
    "type": "web_search_20250305",
    "name": "web_search",
    "max_uses": 5,  # Limit search queries
    "user_location": {  # Optional: localize search results
        "type": "approximate",
        "city": "San Francisco",
        "region": "California",
        "country": "US"
    }
}]
response = completion(
    model="anthropic/claude-3-7-sonnet-20250219",
    messages=[{
        "role": "user",
        "content": "What are the latest developments in AI this week?"
    }],
    tools=tools
)
# Claude searches automatically and cites sources in its answer
print(response.choices[0].message.content)
Function Calling
Claude supports sophisticated tool use:
Basic Tool Use
Force Tool Usage
MCP Server Tools
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City name"
                }
            },
            "required": ["location"]
        }
    }
}]
response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools
)
if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    print(f"Function: {tool_call.function.name}")
    print(f"Args: {tool_call.function.arguments}")
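Once a tool call comes back, the usual next step is to run the matching local function and return its result to the model as a tool message. Here is a minimal sketch of that step in the OpenAI-style message format; TOOL_REGISTRY, run_tool_call, and the stand-in get_weather implementation are hypothetical, and the fake tool call only mimics the attribute layout printed above.

```python
import json
from types import SimpleNamespace


def get_weather(location: str) -> str:
    """Stand-in implementation; a real one would query a weather service."""
    return f"Sunny in {location}"


TOOL_REGISTRY = {"get_weather": get_weather}  # maps tool names to local functions


def run_tool_call(tool_call) -> dict:
    """Execute one tool call and build the `tool` message to send back."""
    func = TOOL_REGISTRY[tool_call.function.name]
    kwargs = json.loads(tool_call.function.arguments)  # arguments are a JSON string
    return {
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": func(**kwargs),
    }


# Fake tool call used only to exercise the helper without an API request
fake_call = SimpleNamespace(
    id="call_1",
    function=SimpleNamespace(name="get_weather", arguments='{"location": "Paris"}'),
)
tool_message = run_tool_call(fake_call)
print(tool_message)
```

In a full exchange, the assistant message and tool_message would be appended to messages and sent in a second completion() call with the same tools.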
Vision (Multimodal)
Claude models support image analysis:
Image URL
response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {
                "type": "image_url",
                "image_url": {"url": "https://example.com/image.jpg"}
            }
        ]
    }]
)
Base64 Image
import base64
with open("image.jpg", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")
response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this"},
            {
                "type": "image_url",
                "image_url": {
                    "url": f"data:image/jpeg;base64,{image_data}"
                }
            }
        ]
    }]
)
Multiple Images
response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Compare these screenshots"},
            {"type": "image_url", "image_url": {"url": "https://..."}},
            {"type": "image_url", "image_url": {"url": "https://..."}}
        ]
    }]
)
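When images come in several formats, hard-coding image/jpeg in the data URL is easy to get wrong. The sketch below guesses the MIME type from the filename with the standard library and builds the image_url block; to_data_url and image_block are hypothetical helper names.

```python
import base64
import mimetypes


def to_data_url(data: bytes, filename: str) -> str:
    """Encode raw image bytes as a data URL, guessing the MIME type from the filename."""
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image type for {filename!r}")
    encoded = base64.b64encode(data).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


def image_block(data: bytes, filename: str) -> dict:
    """Build an image_url content block in the format used above."""
    return {"type": "image_url", "image_url": {"url": to_data_url(data, filename)}}


# Four placeholder bytes stand in for real file contents
block = image_block(b"\x89PNG", "screenshot.png")
print(block["image_url"]["url"])
```

The resulting block can be placed in the content list next to a text block, exactly like the hand-written image_url entries.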
Streaming
from litellm import completion
response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{"role": "user", "content": "Write a story"}],
    stream=True
)
for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
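If the full text is needed after streaming finishes, the deltas can be accumulated while they are consumed. A minimal sketch, assuming chunks with the same choices[0].delta.content layout as above; collect_stream and fake_chunk are hypothetical helpers, and the fake chunks exist only so the example runs without an API call.

```python
from types import SimpleNamespace


def collect_stream(chunks) -> str:
    """Concatenate the text deltas of a streamed response into one string."""
    parts = []
    for chunk in chunks:
        text = chunk.choices[0].delta.content
        if text:  # skips None and empty deltas
            parts.append(text)
    return "".join(parts)


def fake_chunk(text):
    """Build a stand-in chunk with the attribute layout used above."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])


full_text = collect_stream([fake_chunk("Once "), fake_chunk(None), fake_chunk("upon a time")])
print(full_text)
```

With a real stream, collect_stream(response) would replace the for loop, at the cost of not printing tokens as they arrive.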
Streaming with Thinking
response = completion(
    model="anthropic/claude-4-6-sonnet-20250514",
    messages=[{"role": "user", "content": "Solve this problem..."}],
    thinking={"type": "enabled", "budget_tokens": 5000},
    stream=True
)
for chunk in response:
    delta = chunk.choices[0].delta
    # Handle thinking content (LiteLLM streams it as reasoning_content)
    reasoning = getattr(delta, "reasoning_content", None)
    if reasoning:
        print(f"[Thinking] {reasoning}", end="")
    # Handle regular content
    if delta.content:
        print(delta.content, end="", flush=True)
JSON Mode
# JSON object mode
response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{
        "role": "user",
        "content": "Extract: John is 30, lives in NYC, likes pizza"
    }],
    response_format={"type": "json_object"}
)
import json
data = json.loads(response.choices[0].message.content)
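json.loads raises an exception if the reply is not bare JSON, for instance when the model wraps it in a Markdown code fence. As a defensive sketch (not something LiteLLM requires), the hypothetical helper below strips one surrounding fence before parsing.

```python
import json
import re

# Matches a reply wrapped in a three-backtick fence, with an optional "json" tag
FENCE_PATTERN = re.compile(r"^`{3}(?:json)?\s*(.*?)\s*`{3}$", flags=re.DOTALL)


def parse_json_reply(text: str) -> dict:
    """Parse a model reply as JSON, tolerating one surrounding code fence."""
    cleaned = text.strip()
    fenced = FENCE_PATTERN.match(cleaned)
    if fenced:
        cleaned = fenced.group(1)
    return json.loads(cleaned)  # still raises json.JSONDecodeError on invalid JSON


plain = parse_json_reply('{"name": "John", "age": 30}')
wrapped = parse_json_reply("`" * 3 + 'json\n{"city": "NYC"}\n' + "`" * 3)
print(plain, wrapped)
```

Callers should still catch json.JSONDecodeError, since the helper only handles the fenced case.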
Batch Processing
Process requests asynchronously in batches:
from litellm import create_batch, retrieve_batch
# Create batch
batch = create_batch(
    custom_llm_provider="anthropic",
    input_file_id="file-abc123",
    endpoint="/v1/messages"
)
print(f"Batch ID: {batch.id}")
# Retrieve results
batch_result = retrieve_batch(
    custom_llm_provider="anthropic",
    batch_id=batch.id
)
Advanced Parameters
System Messages
response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ]
)
Temperature and Top P
response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{"role": "user", "content": "Be creative"}],
    temperature=1.0,  # 0.0 to 1.0
    top_p=0.9,
    top_k=50
)
Stop Sequences
response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{"role": "user", "content": "Count to 10"}],
    stop=["5", "\n\n"]  # Stop at these sequences
)
Max Tokens
# Important: Anthropic's API requires max_tokens. LiteLLM fills in a default
# when it is omitted, so set it explicitly to control output length.
response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{"role": "user", "content": "Write an essay"}],
    max_tokens=4096  # Upper bound on generated tokens
)
Error Handling
from litellm import completion
from litellm.exceptions import (
    AuthenticationError,
    RateLimitError,
    ContextWindowExceededError,
    APIError
)
try:
    response = completion(
        model="anthropic/claude-3-5-sonnet-20240620",
        messages=[{"role": "user", "content": "Hello"}],
        max_tokens=1024
    )
except AuthenticationError:
    print("Invalid API key")
except RateLimitError:
    print("Rate limit hit")
except ContextWindowExceededError:
    print("Input too long")
except APIError as e:
    print(f"API error: {e}")
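Rate limit errors are usually transient, so retrying with a growing delay is a common pattern. The sketch below is a generic exponential backoff wrapper; call_with_retries is a hypothetical helper, and FakeRateLimit stands in for RateLimitError so the example runs without network access.

```python
import time


def call_with_retries(func, retry_on, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call func(); on a retryable exception wait base_delay * 2**attempt, then retry."""
    for attempt in range(max_attempts):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))


class FakeRateLimit(Exception):
    """Stand-in error type; in practice pass retry_on=RateLimitError."""


attempts = {"count": 0}


def flaky():
    """Fails twice, then succeeds, to exercise the retry path."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise FakeRateLimit()
    return "ok"


delays = []  # record requested delays instead of actually sleeping
result = call_with_retries(flaky, FakeRateLimit, sleep=delays.append)
print(result, delays)
```

In real use, func would be a lambda wrapping the completion() call. LiteLLM's completion() also accepts a num_retries argument, which may be enough for simple cases.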
Cost Tracking
from litellm import completion, completion_cost
response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=100
)
# Track costs including cache usage
cost = completion_cost(completion_response=response)
print(f"Cost: ${cost:.6f}")
# Check cache usage
if hasattr(response.usage, "cache_read_input_tokens"):
    print(f"Cached tokens: {response.usage.cache_read_input_tokens}")
    print(f"Prompt tokens: {response.usage.prompt_tokens}")
Best Practices
Use Prompt Caching: Cache system prompts and long documents. Cached input tokens are billed at a reduced rate, which can cut input costs by up to 90%.
Set Max Tokens: Always set max_tokens explicitly. Anthropic’s API requires it, and an explicit value keeps output length under your control.
Use Extended Thinking: Enable thinking for complex reasoning, math, and analysis tasks.
Try Haiku First: Use Claude 3.5 Haiku for simple tasks. It’s fast and cost-effective.
Function Calling: Deep dive into tool use with Claude
Vision: Working with images in Claude
Streaming: Stream responses in real-time
Batching: Process requests in batches