Installation
Install LiteLLM using pip:
pip install litellm
Basic Usage
LiteLLM provides a single, unified interface for calling any supported LLM: set the environment variables for your provider, then call the completion() function.
Set API Keys
Set your API keys as environment variables:
import os

# Set API keys for the providers you want to use
os.environ["OPENAI_API_KEY"] = "your-openai-key"
os.environ["ANTHROPIC_API_KEY"] = "your-anthropic-key"
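Because completion() reads credentials from the environment, a missing variable usually shows up only when the first request fails. The following is a minimal sketch of an up-front check. The helper name `missing_api_keys` is hypothetical and not part of LiteLLM.

```python
import os


def missing_api_keys(required, env=None):
    """Return the names in `required` that are unset or empty in `env`.

    `env` defaults to os.environ; pass a plain dict to check other mappings.
    """
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]
```

Calling `missing_api_keys(["OPENAI_API_KEY", "ANTHROPIC_API_KEY"])` before the first request returns an empty list when both keys are set.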
Make Your First Call
Use the completion() function with any supported model:
from litellm import completion

# Call OpenAI GPT-4o
response = completion(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)

# Call Anthropic Claude
response = completion(
    model="anthropic/claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
Try Streaming
Enable streaming for real-time responses:
from litellm import completion

response = completion(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Write a haiku about coding"}],
    stream=True
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
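A streamed response arrives as text fragments, so code that needs the full reply has to join them. A minimal sketch, assuming each chunk has the `choices[0].delta.content` shape used in the loop above. `collect_stream` and its `on_text` callback are hypothetical names, not LiteLLM functions.

```python
def collect_stream(chunks, on_text=None):
    """Join the text deltas of a streamed response into one string.

    `on_text`, if given, is called with each fragment as it arrives,
    for example to print it while the full text is accumulated.
    """
    parts = []
    for chunk in chunks:
        text = chunk.choices[0].delta.content
        if text:  # skip chunks whose delta carries no text (None or "")
            parts.append(text)
            if on_text is not None:
                on_text(text)
    return "".join(parts)
```

Passing the object returned by `completion(..., stream=True)` as `chunks` gives the complete reply once the stream ends.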
Supported Providers
LiteLLM supports 100+ providers. Some of the most popular:
OpenAI
Anthropic
Azure OpenAI
Vertex AI
AWS Bedrock
Groq
The OpenAI example is shown here; other providers use the same completion() call with a different model string:
from litellm import completion

response = completion(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)
Async Support
LiteLLM provides async support out of the box:
import asyncio
from litellm import acompletion

async def main():
    response = await acompletion(
        model="openai/gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}]
    )
    print(response.choices[0].message.content)

    # Async streaming
    response = await acompletion(
        model="openai/gpt-4o",
        messages=[{"role": "user", "content": "Write a haiku"}],
        stream=True
    )
    async for chunk in response:
        if chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="")

asyncio.run(main())
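One common reason to use the async API is to send several prompts at once. A minimal sketch of that pattern with `asyncio.gather` follows. `gather_replies` is a hypothetical helper, and its `complete` parameter is assumed to be an async callable that accepts `model` and `messages` keyword arguments, such as `acompletion`.

```python
import asyncio


async def gather_replies(complete, model, prompts):
    """Send one request per prompt concurrently and return the reply texts.

    Replies come back in the same order as `prompts`, because
    asyncio.gather preserves the order of the awaitables it is given.
    """
    tasks = [
        complete(model=model, messages=[{"role": "user", "content": prompt}])
        for prompt in prompts
    ]
    responses = await asyncio.gather(*tasks)
    return [response.choices[0].message.content for response in responses]
```

For example, `asyncio.run(gather_replies(acompletion, "openai/gpt-4o", ["Hello!", "Write a haiku"]))` would issue both requests without waiting for the first to finish.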
Function Calling
LiteLLM standardizes function calling across all providers:
from litellm import completion

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA"
                    },
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
                },
                "required": ["location"]
            }
        }
    }
]

response = completion(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in SF?"}],
    tools=tools,
    tool_choice="auto"
)

if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    print(f"Function: {tool_call.function.name}")
    print(f"Arguments: {tool_call.function.arguments}")
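The model only names a function and supplies its arguments; running it is up to the caller. A minimal sketch of that step, assuming the OpenAI-style shape in which `tool_call.function.arguments` is a JSON-encoded string. The `get_weather` body, `TOOL_REGISTRY`, and `run_tool_call` are all hypothetical stand-ins.

```python
import json


def get_weather(location, unit="celsius"):
    # Stand-in implementation; a real one would query a weather service.
    return {"location": location, "unit": unit, "forecast": "unknown"}


# Maps the tool names advertised to the model onto local Python callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_call(tool_call, registry=TOOL_REGISTRY):
    """Decode a tool call's JSON arguments and invoke the matching function."""
    name = tool_call.function.name
    if name not in registry:
        raise KeyError(f"No local function registered for tool {name!r}")
    arguments = json.loads(tool_call.function.arguments or "{}")
    return registry[name](**arguments)
```

Keeping the registry explicit means the model can only trigger functions that were deliberately exposed to it.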
Error Handling
LiteLLM provides OpenAI-compatible exceptions:
from litellm import completion
from litellm.exceptions import (
    RateLimitError,
    AuthenticationError,
    ContextWindowExceededError,
    APIError
)

try:
    response = completion(
        model="openai/gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}]
    )
except RateLimitError as e:
    print(f"Rate limit exceeded: {e}")
except AuthenticationError as e:
    print(f"Authentication failed: {e}")
except ContextWindowExceededError as e:
    print(f"Context too long: {e}")
except APIError as e:
    print(f"API error: {e}")
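Rate-limit errors are usually transient, so retrying after a pause is often more useful than printing the error. A minimal sketch of exponential backoff follows. `call_with_backoff` and its parameters are hypothetical and written without any LiteLLM dependency; the exception class to retry on is passed in by the caller.

```python
import time


def call_with_backoff(fn, retry_on, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(); if it raises `retry_on`, wait and try again.

    The wait doubles after each failed attempt (base_delay, 2 * base_delay, ...).
    The final failure is re-raised so the caller still sees the error.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
```

With the imports above, a call could look like `call_with_backoff(lambda: completion(model="openai/gpt-4o", messages=messages), RateLimitError)`, where `messages` is the usual message list.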
Router with Fallbacks
The Router provides load balancing and automatic fallbacks:
from litellm import Router

# Deployments that share a model_name are load-balanced as one group;
# each group below has a single deployment.
router = Router(
    model_list=[
        {
            "model_name": "gpt-4",
            "litellm_params": {
                "model": "openai/gpt-4o",
                "api_key": "your-openai-key"
            }
        },
        {
            "model_name": "claude-sonnet",
            "litellm_params": {
                "model": "anthropic/claude-sonnet-4-20250514",
                "api_key": "your-anthropic-key"
            }
        }
    ],
    # If the "gpt-4" group fails, fall back to the "claude-sonnet" group
    fallbacks=[{"gpt-4": ["claude-sonnet"]}],
    num_retries=2
)

response = router.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}]
)
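To make the fallback idea concrete, here is a plain-Python illustration of trying a primary model group and then its fallbacks in order. This is a conceptual sketch only, not how the Router is implemented; `complete_with_fallbacks`, its `call` parameter, and the mapping format are hypothetical.

```python
def complete_with_fallbacks(call, primary, fallbacks, **kwargs):
    """Try `primary`, then each of its fallback groups, in order.

    `fallbacks` maps a group name to the list of groups to try after it.
    Returns the first successful result; if every group fails, the last
    error is re-raised.
    """
    last_error = None
    for group in [primary, *fallbacks.get(primary, [])]:
        try:
            return call(model=group, **kwargs)
        except Exception as error:  # a real router would be more selective
            last_error = error
    raise last_error
```

The sketch leaves out retries, cooldowns, and load balancing within a group, which a full router would also handle.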
Embeddings
Generate embeddings with any provider:
from litellm import embedding

response = embedding(
    model="openai/text-embedding-3-small",
    input=["Hello, world!", "LiteLLM is awesome"]
)
print(f"Embeddings: {len(response.data)} vectors")
print(f"Dimensions: {len(response.data[0].embedding)}")
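Embedding vectors are typically compared with cosine similarity, where values near 1 indicate similar texts. A minimal pure-Python sketch follows. `cosine_similarity` is a hypothetical helper, not a LiteLLM function, and takes two equal-length lists of floats.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors `a` and `b`."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Passing the two vectors returned by the embedding call gives a single score for how closely the two input strings are related.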
Image Generation
Generate images with supported providers:
from litellm import image_generation

response = image_generation(
    model="openai/dall-e-3",
    prompt="A serene landscape with mountains and a lake",
    n=1,
    size="1024x1024"
)
print(f"Image URL: {response.data[0].url}")
What’s Next?
Explore Providers: Learn about all 100+ supported providers and their capabilities.
Caching: Enable caching to reduce costs and improve response times.
Observability: Integrate with Langfuse, Lunary, MLflow, and other observability tools.
Deploy Proxy: Deploy the AI Gateway for team-wide LLM access.