Overview
LiteLLM provides comprehensive support for Azure OpenAI Service, allowing you to use GPT-4, GPT-3.5, embeddings, and more through your Azure deployments.
Quick Start
Set Azure Credentials
export AZURE_API_KEY="your-azure-api-key"
export AZURE_API_BASE="https://your-resource.openai.azure.com"
export AZURE_API_VERSION="2024-02-15-preview"
Make Your First Call
from litellm import completion

response = completion(
    model="azure/gpt-4o",  # Your Azure deployment name
    messages=[{"role": "user", "content": "Hello Azure!"}],
)
print(response.choices[0].message.content)
Authentication
Environment Variables
Direct Parameters
Azure Active Directory
Managed Identity
Set Azure credentials via environment variables:
export AZURE_API_KEY="your-api-key"
export AZURE_API_BASE="https://your-resource.openai.azure.com"
export AZURE_API_VERSION="2024-02-15-preview"
from litellm import completion

response = completion(
    model="azure/gpt-4o",  # Your deployment name
    messages=[{"role": "user", "content": "Hello!"}],
)
Pass credentials directly:
from litellm import completion

response = completion(
    model="azure/gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    api_key="your-api-key",
    api_base="https://your-resource.openai.azure.com",
    api_version="2024-02-15-preview",
)
Use Azure AD (Microsoft Entra ID) authentication:
export AZURE_AD_TOKEN="your-ad-token"
export AZURE_API_BASE="https://your-resource.openai.azure.com"
export AZURE_API_VERSION="2024-02-15-preview"
from litellm import completion

response = completion(
    model="azure/gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
Use Azure Managed Identity:
export AZURE_USE_MANAGED_IDENTITY="true"
export AZURE_API_BASE="https://your-resource.openai.azure.com"
export AZURE_API_VERSION="2024-02-15-preview"
from litellm import completion

response = completion(
    model="azure/gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
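The four options above differ only in which settings are present, so a pre-flight check can report which one a given environment would select. The following is a minimal sketch: `detect_azure_auth` is a hypothetical helper (not part of LiteLLM), and the precedence order it applies is an illustrative assumption, not LiteLLM's own resolution logic.

```python
import os


def detect_azure_auth(env=None):
    """Return which auth method the given environment would select.

    Hypothetical helper for pre-flight checks. The precedence order below
    (API key, then AD token, then managed identity) is an assumption.
    """
    env = os.environ if env is None else env
    if not env.get("AZURE_API_BASE"):
        raise ValueError("AZURE_API_BASE is not set")
    if env.get("AZURE_API_KEY"):
        return "api_key"
    if env.get("AZURE_AD_TOKEN"):
        return "azure_ad"
    if env.get("AZURE_USE_MANAGED_IDENTITY", "").lower() == "true":
        return "managed_identity"
    raise ValueError("no Azure credentials found")
```

Calling `detect_azure_auth()` with no argument inspects the real process environment; passing a plain dict makes the check easy to unit test.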
Model Naming
Azure uses deployment names, not model names. Format: azure/{deployment_name}
from litellm import completion

# If your Azure deployment is named "gpt-4o-deployment"
response = completion(
    model="azure/gpt-4o-deployment",
    messages=[{"role": "user", "content": "Hello!"}],
)

# If your deployment is named "my-gpt-35-turbo"
response = completion(
    model="azure/my-gpt-35-turbo",
    messages=[{"role": "user", "content": "Hello!"}],
)
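Because the model string is just the `azure/` prefix plus the deployment name, it can be built in one place instead of being hand-typed at every call site. A small sketch, assuming a hypothetical helper named `azure_model` (not a LiteLLM function):

```python
def azure_model(deployment_name: str) -> str:
    """Build the LiteLLM model string for an Azure deployment name."""
    name = deployment_name.strip()
    if not name:
        raise ValueError("deployment name must not be empty")
    if name.startswith("azure/"):
        return name  # already prefixed, leave unchanged
    return f"azure/{name}"
```

The result can be passed directly as the `model` argument, for example `completion(model=azure_model("gpt-4o-deployment"), messages=...)`.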
Common Azure Deployments
GPT-3.5 Turbo: model="azure/gpt-35-turbo". Fast and efficient.
Embeddings: model="azure/text-embedding-ada-002". Text embeddings.
API Versions
Every Azure OpenAI request targets a dated API version. Recommended versions:
| Version | Features | Recommended For |
|---|---|---|
| 2024-02-15-preview | Latest features | Production use |
| 2024-08-01-preview | Newest preview | Testing new features |
| 2023-12-01-preview | Stable | Legacy support |
from litellm import completion

response = completion(
    model="azure/gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    api_version="2024-02-15-preview",
)
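The version strings above follow a `YYYY-MM-DD` date with an optional `-preview` suffix, so they can be compared by date rather than as raw strings. A minimal sketch, assuming a hypothetical helper `parse_api_version` (not part of LiteLLM) and only the version strings listed in the table:

```python
from datetime import date


def parse_api_version(version: str):
    """Split an Azure API version string into (release_date, is_preview)."""
    is_preview = version.endswith("-preview")
    date_part = version[: -len("-preview")] if is_preview else version
    # date.fromisoformat raises ValueError on a malformed date.
    return date.fromisoformat(date_part), is_preview


versions = ["2024-02-15-preview", "2024-08-01-preview", "2023-12-01-preview"]
# Pick the most recent version by its release date.
newest = max(versions, key=lambda v: parse_api_version(v)[0])
```

A check like this catches typos in `AZURE_API_VERSION` before a request is sent, although only Azure's own documentation can confirm that a version is actually supported.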
Streaming
from litellm import completion

response = completion(
    model="azure/gpt-4o",
    messages=[{"role": "user", "content": "Write a story"}],
    stream=True,
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
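When the full text is needed after streaming finishes, the deltas can be accumulated instead of printed. The sketch below assumes only the chunk shape used in the loop above (`chunk.choices[0].delta.content`); `collect_stream` and `_fake_chunk` are hypothetical names, and the fake chunks stand in for a real streamed response so the example runs offline.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Concatenate the text deltas of a streamed response into one string."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # defensively skip chunks that carry no choices
        delta = chunk.choices[0].delta.content
        if delta:  # skip None or empty deltas
            parts.append(delta)
    return "".join(parts)


def _fake_chunk(text):
    # Stand-in object with the same attribute shape as a streamed chunk.
    return SimpleNamespace(
        choices=[SimpleNamespace(delta=SimpleNamespace(content=text))]
    )


demo = collect_stream(
    [_fake_chunk("Once "), _fake_chunk(None), _fake_chunk("upon a time")]
)
```

With a real call, `collect_stream(response)` would replace the `for` loop when incremental display is not required.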
Function Calling
from litellm import completion

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"}
            },
            "required": ["location"],
        },
    },
}]

response = completion(
    model="azure/gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Seattle?"}],
    tools=tools,
)

# The model may answer directly, so check for tool calls first
if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    print(f"Function: {tool_call.function.name}")
    print(f"Arguments: {tool_call.function.arguments}")
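The example above stops at printing the tool call. A typical next step is to run the named function and send its result back as a `tool` message. The following is a sketch under stated assumptions: `get_weather` is a placeholder implementation, `TOOL_REGISTRY` and `run_tool_call` are hypothetical names, and the fake tool call mimics the attribute shape used above so the example runs without a network call.

```python
import json
from types import SimpleNamespace


def get_weather(location: str) -> str:
    # Placeholder: a real implementation would query a weather service.
    return f"Sunny in {location}"


# Maps tool names declared in `tools` to local Python callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_call(tool_call, registry=TOOL_REGISTRY):
    """Execute one tool call and build the tool-role message to send back."""
    func = registry[tool_call.function.name]
    # Arguments arrive as a JSON-encoded string, not a dict.
    kwargs = json.loads(tool_call.function.arguments)
    return {
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": str(func(**kwargs)),
    }


fake_call = SimpleNamespace(
    id="call_1",
    function=SimpleNamespace(
        name="get_weather", arguments='{"location": "Seattle"}'
    ),
)
tool_message = run_tool_call(fake_call)
```

Appending the assistant message and `tool_message` to `messages` and calling `completion` again lets the model produce a final answer from the tool output.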
Vision (Multimodal)
Use GPT-4 Vision on Azure:
from litellm import completion

response = completion(
    model="azure/gpt-4o",  # Or your GPT-4-vision deployment
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {
                "type": "image_url",
                "image_url": {"url": "https://example.com/image.jpg"},
            },
        ],
    }],
)
Embeddings
Generate embeddings using Azure:
from litellm import embedding

response = embedding(
    model="azure/text-embedding-ada-002",  # Your deployment name
    input="Hello world",
)
print(response.data[0].embedding)
print(f"Dimensions: {len(response.data[0].embedding)}")

# Multiple texts
response = embedding(
    model="azure/text-embedding-ada-002",
    input=["Text 1", "Text 2", "Text 3"],
)
Azure Embedding Models
# text-embedding-ada-002
embedding(model="azure/text-embedding-ada-002", input="...")

# text-embedding-3-small
embedding(model="azure/text-embedding-3-small", input="...")

# text-embedding-3-large, shortened to 256 dimensions
embedding(model="azure/text-embedding-3-large", input="...", dimensions=256)
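Embedding vectors are usually compared with cosine similarity. The sketch below uses only the standard library and operates on plain lists of floats, such as the `response.data[i].embedding` values shown earlier; `cosine_similarity` is a helper defined here, not a LiteLLM function.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Vectors must come from the same model and the same `dimensions` setting, otherwise their lengths differ and the comparison is meaningless.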
Image Generation (DALL-E)
Generate images using DALL-E on Azure:
from litellm import image_generation

response = image_generation(
    model="azure/dall-e-3",
    prompt="A sunset over mountains",
    n=1,
    size="1024x1024",
    quality="standard",  # or "hd"
)
print(response.data[0].url)
Audio Transcription (Whisper)
Transcribe audio using Whisper on Azure:
from litellm import transcription

with open("audio.mp3", "rb") as audio_file:
    response = transcription(
        model="azure/whisper",  # Your Whisper deployment
        file=audio_file,
        language="en",
    )

print(response.text)
Text-to-Speech
Generate speech from text:
from litellm import speech

response = speech(
    model="azure/tts",  # Your TTS deployment
    input="Hello, this is a test.",
    voice="alloy",  # alloy, echo, fable, onyx, nova, shimmer
)

# Save audio file
with open("output.mp3", "wb") as f:
    f.write(response.content)
Batch Processing
Process requests in batches:
from litellm import create_batch, retrieve_batch

# Create batch
batch = create_batch(
    custom_llm_provider="azure",
    input_file_id="file-abc123",
    endpoint="/chat/completions",
    completion_window="24h",
)
print(f"Batch ID: {batch.id}")

# Check status
batch_status = retrieve_batch(
    custom_llm_provider="azure",
    batch_id=batch.id,
)
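A batch rarely finishes on the first status check, so the retrieval step is normally wrapped in a polling loop. This is a sketch, not LiteLLM functionality: `wait_for_batch` is a hypothetical helper, the polling interval and budget are arbitrary, and the set of terminal statuses assumes the OpenAI-style batch lifecycle. The fake status sequence keeps the example runnable offline.

```python
import time
from types import SimpleNamespace

# Statuses after which a batch no longer changes (assumed OpenAI-style names).
TERMINAL_STATUSES = {"completed", "failed", "expired", "cancelled"}


def wait_for_batch(fetch_batch, poll_seconds=30.0, max_polls=120, sleep=time.sleep):
    """Poll until the batch reaches a terminal status, then return it.

    fetch_batch: zero-argument callable returning an object with `.status`.
    """
    for _ in range(max_polls):
        batch = fetch_batch()
        if batch.status in TERMINAL_STATUSES:
            return batch
        sleep(poll_seconds)
    raise TimeoutError("batch did not finish within the polling budget")


# Offline demonstration with a scripted status sequence and no real sleeping.
_statuses = iter(["validating", "in_progress", "completed"])
result = wait_for_batch(
    lambda: SimpleNamespace(status=next(_statuses)),
    sleep=lambda seconds: None,
)
```

With LiteLLM, `fetch_batch` could be `lambda: retrieve_batch(custom_llm_provider="azure", batch_id=batch.id)`.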
Advanced Features
JSON Mode
import json

from litellm import completion

response = completion(
    model="azure/gpt-4o",
    messages=[{
        "role": "user",
        # JSON mode requires the word "JSON" to appear in the messages
        "content": "Extract info as JSON: John is 30 and lives in NYC",
    }],
    response_format={"type": "json_object"},
)

data = json.loads(response.choices[0].message.content)
Seed for Reproducibility
from litellm import completion

response = completion(
    model="azure/gpt-4o",
    messages=[{"role": "user", "content": "Tell a joke"}],
    seed=42,
    temperature=0.7,
)
Logprobs
from litellm import completion

response = completion(
    model="azure/gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    logprobs=True,
    top_logprobs=3,
)

# Print each generated token with its log probability
for token in response.choices[0].logprobs.content:
    print(f"{token.token}: {token.logprob}")
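Log probabilities are natural logarithms, so exponentiating them gives ordinary probabilities that are easier to read. A minimal sketch, assuming the `token` and `logprob` attributes used in the loop above; `token_probabilities` is a hypothetical helper and the fake tokens are stand-ins for a real response.

```python
import math
from types import SimpleNamespace


def token_probabilities(logprob_content):
    """Convert each token's log probability into a probability in [0, 1]."""
    return [(t.token, math.exp(t.logprob)) for t in logprob_content]


# Stand-in tokens with the same attribute shape as the response entries.
fake_tokens = [
    SimpleNamespace(token="Hello", logprob=0.0),
    SimpleNamespace(token="!", logprob=math.log(0.5)),
]
probs = token_probabilities(fake_tokens)
```

With a real response, the argument would be `response.choices[0].logprobs.content`.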
Multiple Azure Deployments
Use different Azure resources:
from litellm import completion

# Resource 1
response1 = completion(
    model="azure/gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    api_base="https://resource1.openai.azure.com",
    api_key="key1",
)

# Resource 2
response2 = completion(
    model="azure/gpt-35-turbo",
    messages=[{"role": "user", "content": "Hello"}],
    api_base="https://resource2.openai.azure.com",
    api_key="key2",
)
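Once several resources are configured, one use is to try them in order and take the first that responds. The following is a hand-rolled sketch, not a LiteLLM feature: `first_successful` is a hypothetical helper, and `_fake_completion` stands in for `completion` so the example runs without credentials.

```python
def first_successful(call, deployments):
    """Try each deployment config in order; return the first successful result."""
    errors = []
    for config in deployments:
        try:
            return call(**config)
        except Exception as exc:  # deliberately broad for this sketch
            errors.append((config["model"], exc))
    raise RuntimeError(f"all deployments failed: {errors}")


def _fake_completion(model, api_base, api_key, messages):
    # Stand-in for litellm.completion: pretend the first resource is down.
    if "resource1" in api_base:
        raise ConnectionError("resource1 unavailable")
    return f"ok from {model}"


deployments = [
    {"model": "azure/gpt-4o", "api_base": "https://resource1.openai.azure.com", "api_key": "key1"},
    {"model": "azure/gpt-35-turbo", "api_base": "https://resource2.openai.azure.com", "api_key": "key2"},
]
messages = [{"role": "user", "content": "Hello"}]
result = first_successful(
    lambda **cfg: _fake_completion(messages=messages, **cfg), deployments
)
```

In real use, the lambda would call `completion(messages=messages, **cfg)`, and the `except` clause would be narrowed to the error types that justify a fallback.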
Content Filtering
Azure applies content filtering by default:
from litellm import completion

try:
    response = completion(
        model="azure/gpt-4o",
        messages=[{"role": "user", "content": "..."}],
    )
except Exception as e:
    # Check if content was filtered
    if "content_filter" in str(e).lower():
        print("Content was filtered by Azure")
    raise

# Access content filter results
if hasattr(response.choices[0], "content_filter_results"):
    print(response.choices[0].content_filter_results)
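When filter results are present, it is often enough to list which categories triggered. The sketch below assumes the results are a dict that maps each category name to an entry with a boolean `filtered` field and a `severity` label; that shape and the sample values are assumptions for illustration, and `filtered_categories` is a hypothetical helper.

```python
def filtered_categories(content_filter_results):
    """Return the sorted names of categories marked as filtered."""
    return sorted(
        name
        for name, result in content_filter_results.items()
        if isinstance(result, dict) and result.get("filtered")
    )


# Illustrative sample only; real results come from the Azure response.
sample = {
    "hate": {"filtered": False, "severity": "safe"},
    "violence": {"filtered": True, "severity": "medium"},
    "self_harm": {"filtered": False, "severity": "safe"},
}
blocked = filtered_categories(sample)
```

An empty list means no category was filtered for that choice.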
Error Handling
from litellm import completion
from litellm.exceptions import (
    AuthenticationError,
    RateLimitError,
    ContextWindowExceededError,
    APIError,
)

try:
    response = completion(
        model="azure/gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}],
    )
except AuthenticationError:
    print("Invalid API key or auth")
except RateLimitError:
    print("Rate limit exceeded")
except ContextWindowExceededError:
    print("Input too long")
except APIError as e:
    print(f"Azure API error: {e}")
Cost Tracking
from litellm import completion, completion_cost

response = completion(
    model="azure/gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)

# Calculate cost
cost = completion_cost(completion_response=response)
print(f"Cost: ${cost:.6f}")

# Token usage
print(f"Prompt tokens: {response.usage.prompt_tokens}")
print(f"Completion tokens: {response.usage.completion_tokens}")
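The token counts above are enough to cross-check a cost by hand. The sketch below shows the arithmetic only: `estimate_cost` is a hypothetical helper, and the per-1K-token prices are made-up placeholders, not Azure's actual rates.

```python
def estimate_cost(prompt_tokens, completion_tokens, input_price_per_1k, output_price_per_1k):
    """Return the cost in dollars given token counts and per-1K-token prices."""
    input_cost = prompt_tokens / 1000 * input_price_per_1k
    output_cost = completion_tokens / 1000 * output_price_per_1k
    return input_cost + output_cost


# Placeholder prices for illustration only.
example = estimate_cost(
    prompt_tokens=2000,
    completion_tokens=500,
    input_price_per_1k=0.005,
    output_price_per_1k=0.015,
)
```

Here 2,000 prompt tokens cost 2 × 0.005 = 0.01 and 500 completion tokens cost 0.5 × 0.015 = 0.0075, for a total of 0.0175.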
Regional Deployments
Azure OpenAI is available in multiple regions:
from litellm import completion

# East US
response = completion(
    model="azure/gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    api_base="https://eastus.api.cognitive.microsoft.com/",
    api_key="...",
)

# West Europe
response = completion(
    model="azure/gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    api_base="https://westeurope.api.cognitive.microsoft.com/",
    api_key="...",
)
Best Practices
Use Latest API Version: Always use the latest stable API version for new features and improvements.
Handle Content Filters: Azure applies content filtering, so handle filtered responses appropriately.
Use Managed Identity: For Azure-hosted apps, use Managed Identity instead of API keys.
Monitor Rate Limits: Track TPM (tokens per minute) and RPM (requests per minute) limits.
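For the rate limit item, a client-side sliding window gives a rough view of current RPM and TPM. This is an illustrative sketch: `UsageWindow` is a hypothetical class, it only sees traffic from the current process, and Azure's own quota accounting may differ. The injected clock makes the demonstration deterministic.

```python
import time
from collections import deque


class UsageWindow:
    """Track requests and tokens over a sliding window (60 seconds by default)."""

    def __init__(self, window_seconds=60.0, clock=time.monotonic):
        self.window_seconds = window_seconds
        self.clock = clock
        self.events = deque()  # (timestamp, tokens) pairs, oldest first

    def record(self, tokens):
        self.events.append((self.clock(), tokens))

    def _prune(self):
        # Drop events that have aged out of the window.
        cutoff = self.clock() - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            self.events.popleft()

    def rpm(self):
        self._prune()
        return len(self.events)

    def tpm(self):
        self._prune()
        return sum(tokens for _, tokens in self.events)


# Deterministic demonstration with a manually advanced clock.
_now = [0.0]
window = UsageWindow(clock=lambda: _now[0])
window.record(tokens=100)
_now[0] = 30.0
window.record(tokens=200)
usage_at_30s = (window.rpm(), window.tpm())  # both requests are in the window
_now[0] = 61.0
usage_at_61s = (window.rpm(), window.tpm())  # the first request has aged out
```

In practice, `record` would be called after each response with `response.usage.total_tokens`, and the counts compared against the deployment's configured limits.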
Troubleshooting
Deployment Not Found
from litellm import completion

# Make sure deployment name matches Azure
response = completion(
    model="azure/your-exact-deployment-name",  # Must match Azure portal
    messages=[{"role": "user", "content": "Hello"}],
)
API Version Issues
# Use a supported API version
response = completion(
model = "azure/gpt-4o" ,
api_version = "2024-02-15-preview" , # Check Azure docs for valid versions
messages = [{ "role" : "user" , "content" : "Hello" }]
)
OpenAI: Learn about OpenAI models and features.
Streaming: Stream responses in real time.
Function Calling: Implement function calling.
Embeddings: Generate embeddings on Azure.