Overview
LiteLLM automatically calculates and tracks costs for all supported LLM providers. Track spending across models, users, teams, and API keys to manage budgets and optimize usage.
Automatic Cost Calculation
Costs are calculated automatically for every request:
from litellm import completion

response = completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Explain quantum computing"}]
)

# Cost information in response
print(f"Prompt tokens: {response.usage.prompt_tokens}")
print(f"Completion tokens: {response.usage.completion_tokens}")
print(f"Total cost: ${response._hidden_params['response_cost']}")
Supported Cost Metrics
LiteLLM tracks costs for:
- Completion/Chat - Input and output tokens
- Embeddings - Per token or per request
- Image Generation - Per image, resolution, quality
- Audio (Speech) - Per character or per second
- Audio (Transcription) - Per second
- Fine-tuning - Training tokens
- Realtime API - Session duration, audio input/output
Provider Support
Cost tracking for 100+ providers:
- OpenAI (GPT-4, GPT-3.5, etc.)
- Anthropic (Claude)
- Google (Gemini, Vertex AI)
- Azure OpenAI
- AWS Bedrock
- Cohere
- Replicate
- Together AI
- And many more…
Response Object
from litellm import completion

response = completion(
    model="gpt-4-turbo",
    messages=[{"role": "user", "content": "Hello!"}]
)

# Standard usage information
usage = response.usage
print(f"Prompt tokens: {usage.prompt_tokens}")
print(f"Completion tokens: {usage.completion_tokens}")
print(f"Total tokens: {usage.total_tokens}")

# Cost information (falls back to 0 if no cost was recorded)
cost = response._hidden_params.get("response_cost") or 0
print(f"Request cost: ${cost:.6f}")
Streaming Responses
For streaming, cost is available in the final chunk:
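from litellm import completion

response = completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Write a story"}],
    stream=True
)

total_cost = 0
for chunk in response:
    # Keep the most recent cost reported on a chunk; skip chunks without one
    hidden_params = getattr(chunk, "_hidden_params", None) or {}
    cost = hidden_params.get("response_cost")
    if cost is not None:
        total_cost = cost

print(f"Total streaming cost: ${total_cost:.6f}")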
Custom Pricing
Override default pricing for custom deployments:
from litellm import completion

response = completion(
    model="custom-model",
    messages=[{"role": "user", "content": "Hello"}],
    input_cost_per_token=0.00001,  # $0.00001 per input token
    output_cost_per_token=0.00003  # $0.00003 per output token
)
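To see what these overrides mean for a single request, the arithmetic can be sketched in plain Python. The estimate_cost helper below is hypothetical (it is not part of LiteLLM), and the token counts are made-up sample values; it only multiplies token counts by the per-token prices from the example above. Decimal is used here to avoid floating-point rounding noise in the printed figure.

```python
from decimal import Decimal


def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  input_cost_per_token: str, output_cost_per_token: str) -> Decimal:
    """Hypothetical helper: input tokens * input price + output tokens * output price."""
    input_cost = prompt_tokens * Decimal(input_cost_per_token)
    output_cost = completion_tokens * Decimal(output_cost_per_token)
    return input_cost + output_cost


# Sample request: 1,000 prompt tokens and 500 completion tokens (made-up numbers),
# priced with the custom per-token rates shown above.
cost = estimate_cost(1000, 500, "0.00001", "0.00003")
print(f"Estimated cost: ${cost:.6f}")  # Estimated cost: $0.025000
```

Passing the prices as strings keeps the Decimal values exact; passing floats would carry their binary rounding error into the result.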
Custom Pricing with Router
from litellm import Router

router = Router(
    model_list=[
        {
            "model_name": "custom-gpt",
            "litellm_params": {
                "model": "openai/custom-deployment",
                "api_key": "sk-...",
                "input_cost_per_token": 0.00002,
                "output_cost_per_token": 0.00004
            }
        }
    ]
)
Budget Management
Set Budget Limits
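Prevent overspending with budget limits. BudgetManager tracks spend per user: check the remaining budget before each request and record the cost afterwards.
from litellm import BudgetManager, completion

# Initialize budget manager
budget_manager = BudgetManager(project_name="my-project")
user = "user-1234"  # placeholder user ID

# Create a $100 monthly budget the first time this user is seen
if not budget_manager.is_valid_user(user):
    budget_manager.create_budget(total_budget=100.00, user=user, duration="monthly")

# Only send the request while the user is still under budget
if budget_manager.get_current_cost(user=user) <= budget_manager.get_total_budget(user):
    response = completion(
        model="gpt-4",
        messages=[{"role": "user", "content": "Hello"}]
    )
    # Record what this request cost against the user's budget
    budget_manager.update_cost(completion_obj=response, user=user)
else:
    print("Budget exceeded")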
Provider-Level Budgets
Set budgets per provider with the Router's provider_budget_config:
from litellm import Router

router = Router(
    model_list=[...],
    provider_budget_config={
        "openai": {"budget_limit": 100.0, "time_period": "1d"},    # $100/day for OpenAI
        "anthropic": {"budget_limit": 50.0, "time_period": "1d"},  # $50/day for Anthropic
        "gemini": {"budget_limit": 75.0, "time_period": "1d"}      # $75/day for Google Gemini
    }
)
Cost Logging
Custom Cost Logger
import litellm
from litellm import completion
from litellm.integrations.custom_logger import CustomLogger

class CostLogger(CustomLogger):
    def __init__(self):
        super().__init__()
        self.total_cost = 0.0

    def log_success_event(self, kwargs, response_obj, start_time, end_time):
        # response_cost can be missing or None when pricing is unknown
        cost = kwargs.get("response_cost") or 0
        self.total_cost += cost
        print(f"Request cost: ${cost:.6f}")
        print(f"Total cost: ${self.total_cost:.6f}")
        print(f"Model: {kwargs.get('model')}")
        print(f"Tokens: {response_obj.usage.total_tokens}")

cost_logger = CostLogger()
litellm.callbacks = [cost_logger]

# Make requests
response = completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}]
)
print(f"Total accumulated cost: ${cost_logger.total_cost:.6f}")
Database Cost Tracking
import sqlite3
from datetime import datetime

import litellm
from litellm.integrations.custom_logger import CustomLogger

class DatabaseCostLogger(CustomLogger):
    def __init__(self, db_path="costs.db"):
        super().__init__()
        # Callbacks may run on a worker thread, so allow cross-thread use
        self.conn = sqlite3.connect(db_path, check_same_thread=False)
        self.create_table()

    def create_table(self):
        self.conn.execute("""
            CREATE TABLE IF NOT EXISTS costs (
                id INTEGER PRIMARY KEY,
                timestamp TEXT,
                model TEXT,
                prompt_tokens INTEGER,
                completion_tokens INTEGER,
                total_tokens INTEGER,
                cost REAL,
                user_id TEXT
            )
        """)
        self.conn.commit()

    def log_success_event(self, kwargs, response_obj, start_time, end_time):
        usage = response_obj.usage
        cost = kwargs.get("response_cost") or 0
        # Request metadata is passed through under litellm_params
        metadata = kwargs.get("litellm_params", {}).get("metadata") or {}
        user_id = metadata.get("user_id")
        self.conn.execute("""
            INSERT INTO costs (timestamp, model, prompt_tokens,
                completion_tokens, total_tokens, cost, user_id)
            VALUES (?, ?, ?, ?, ?, ?, ?)
        """, (
            datetime.now().isoformat(),
            kwargs.get("model"),
            usage.prompt_tokens,
            usage.completion_tokens,
            usage.total_tokens,
            cost,
            user_id
        ))
        self.conn.commit()

litellm.callbacks = [DatabaseCostLogger()]
Cost Analytics
Query Costs by Model
import sqlite3

conn = sqlite3.connect("costs.db")

# Total cost by model
result = conn.execute("""
    SELECT model, SUM(cost) as total_cost, COUNT(*) as request_count
    FROM costs
    GROUP BY model
    ORDER BY total_cost DESC
""").fetchall()

for model, cost, count in result:
    print(f"{model}: ${cost:.2f} ({count} requests)")
Query Costs by User
# Total cost by user (top 10 spenders)
result = conn.execute("""
    SELECT user_id, SUM(cost) as total_cost
    FROM costs
    WHERE user_id IS NOT NULL
    GROUP BY user_id
    ORDER BY total_cost DESC
    LIMIT 10
""").fetchall()

for user_id, cost in result:
    print(f"User {user_id}: ${cost:.2f}")
Time-Based Analysis
# Daily costs (most recent 30 days with activity)
result = conn.execute("""
    SELECT DATE(timestamp) as date, SUM(cost) as daily_cost
    FROM costs
    GROUP BY DATE(timestamp)
    ORDER BY date DESC
    LIMIT 30
""").fetchall()

for date, cost in result:
    print(f"{date}: ${cost:.2f}")
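The queries above can be tried without a live costs.db. The sketch below builds an in-memory SQLite database with a reduced version of the costs table and three made-up rows (the users, timestamps, and costs are illustrative, not real data), then runs the daily and per-model aggregations.

```python
import sqlite3

# In-memory database with only the columns these queries rely on
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE costs (
        id INTEGER PRIMARY KEY,
        timestamp TEXT,
        model TEXT,
        cost REAL,
        user_id TEXT
    )
""")

# Made-up sample rows, for illustration only
sample_rows = [
    ("2024-01-01T09:00:00", "gpt-4", 0.50, "alice"),
    ("2024-01-01T10:30:00", "gpt-3.5-turbo", 0.25, "bob"),
    ("2024-01-02T08:15:00", "gpt-4", 1.00, "alice"),
]
conn.executemany(
    "INSERT INTO costs (timestamp, model, cost, user_id) VALUES (?, ?, ?, ?)",
    sample_rows,
)
conn.commit()

# Daily totals, newest day first
daily = conn.execute("""
    SELECT DATE(timestamp) AS date, SUM(cost) AS daily_cost
    FROM costs
    GROUP BY DATE(timestamp)
    ORDER BY date DESC
""").fetchall()

# Totals and request counts per model, most expensive first
by_model = conn.execute("""
    SELECT model, SUM(cost) AS total_cost, COUNT(*) AS request_count
    FROM costs
    GROUP BY model
    ORDER BY total_cost DESC
""").fetchall()

for date, cost in daily:
    print(f"{date}: ${cost:.2f}")
# 2024-01-02: $1.00
# 2024-01-01: $0.75

for model, cost, count in by_model:
    print(f"{model}: ${cost:.2f} ({count} requests)")
# gpt-4: $1.50 (2 requests)
# gpt-3.5-turbo: $0.25 (1 requests)
```

SQLite's DATE() accepts the ISO 8601 strings produced by datetime.now().isoformat(), which is why grouping by DATE(timestamp) works on a TEXT column.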
Cost Optimization
Model Cost Comparison
from litellm import model_cost

# Get costs for different models
models = ["gpt-4", "gpt-3.5-turbo", "claude-3-opus", "claude-3-sonnet"]

for model in models:
    # model_cost is a dict keyed by model name; .get returns None for unknown names
    cost_info = model_cost.get(model)
    if cost_info:
        input_cost = cost_info.get("input_cost_per_token", 0)
        output_cost = cost_info.get("output_cost_per_token", 0)
        print(f"{model}:")
        print(f"  Input: ${input_cost * 1000000:.2f}/1M tokens")
        print(f"  Output: ${output_cost * 1000000:.2f}/1M tokens")
    else:
        print(f"{model}: Cost info not available")
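Per-token prices are easier to compare once they are applied to a realistic request size. The sketch below does that with a hypothetical price table: the model names and prices are invented for illustration and only mirror the input_cost_per_token / output_cost_per_token keys used above. The request_cost helper and the expected token counts are assumptions as well.

```python
# Hypothetical price table, shaped like the per-model entries read above
prices = {
    "model-a": {"input_cost_per_token": 0.00003, "output_cost_per_token": 0.00006},
    "model-b": {"input_cost_per_token": 0.000001, "output_cost_per_token": 0.000002},
}


def request_cost(cost_info: dict, prompt_tokens: int, completion_tokens: int) -> float:
    """Cost of one request given per-token prices (missing prices count as 0)."""
    return (prompt_tokens * cost_info.get("input_cost_per_token", 0)
            + completion_tokens * cost_info.get("output_cost_per_token", 0))


# Assumed typical request size for the workload being compared
expected_prompt_tokens = 2000
expected_completion_tokens = 500

estimates = {
    name: request_cost(info, expected_prompt_tokens, expected_completion_tokens)
    for name, info in prices.items()
}
cheapest = min(estimates, key=estimates.get)

# Print models from cheapest to most expensive for this request size
for name, cost in sorted(estimates.items(), key=lambda item: item[1]):
    print(f"{name}: ${cost:.6f} per request")
print(f"Cheapest: {cheapest}")
```

Because input and output tokens are priced differently, the ranking can change with the prompt-to-completion ratio, so it is worth re-running the comparison with token counts that match the actual workload.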
Prompt Optimization
Reduce costs by optimizing prompts:
from litellm import token_counter

prompt = "Your long prompt here..."

# Count tokens before sending
token_count = token_counter(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}]
)
print(f"Prompt tokens: {token_count}")

# Estimate cost
input_cost_per_token = 0.00003  # GPT-4
estimated_cost = token_count * input_cost_per_token
print(f"Estimated input cost: ${estimated_cost:.6f}")
Choose Cost-Effective Models
from litellm import Router

router = Router(
    model_list=[
        # Primary: Cheap model
        {
            "model_name": "smart",
            "litellm_params": {"model": "gpt-3.5-turbo"}
        },
        # Fallback: Expensive model
        {
            "model_name": "smart",
            "litellm_params": {"model": "gpt-4"}
        }
    ],
    routing_strategy="cost-based-routing"  # Prefer cheaper models
)
Cost Alerting
Threshold-Based Alerts
from datetime import date

import litellm
from litellm.integrations.custom_logger import CustomLogger

class CostAlertLogger(CustomLogger):
    def __init__(self, daily_threshold=10.0):
        super().__init__()
        self.daily_threshold = daily_threshold
        self.daily_cost = 0.0
        self.alert_sent = False
        self.current_day = date.today()

    def log_success_event(self, kwargs, response_obj, start_time, end_time):
        # Start a fresh total when the calendar day changes
        today = date.today()
        if today != self.current_day:
            self.current_day = today
            self.daily_cost = 0.0
            self.alert_sent = False

        cost = kwargs.get("response_cost") or 0
        self.daily_cost += cost
        # Alert once per day, the first time the threshold is crossed
        if self.daily_cost > self.daily_threshold and not self.alert_sent:
            self.send_alert(self.daily_cost)
            self.alert_sent = True

    def send_alert(self, cost):
        print(f"⚠️ ALERT: Daily cost ${cost:.2f} exceeds threshold ${self.daily_threshold:.2f}")
        # Send email, Slack notification, etc.

litellm.callbacks = [CostAlertLogger(daily_threshold=10.0)]
Langfuse Integration
import os

import litellm
from litellm import completion

# Set Langfuse credentials
os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-..."
os.environ["LANGFUSE_SECRET_KEY"] = "sk-..."
os.environ["LANGFUSE_HOST"] = "https://cloud.langfuse.com"

litellm.success_callback = ["langfuse"]
litellm.failure_callback = ["langfuse"]

# Costs automatically tracked in Langfuse
response = completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}]
)
DataDog Integration
import os

import litellm
from litellm import completion

# Set DataDog credentials
os.environ["DD_API_KEY"] = "..."
os.environ["DD_SITE"] = "datadoghq.com"

litellm.success_callback = ["datadog"]

# Costs sent as metrics to DataDog
response = completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}]
)
Best Practices
Cost Management Tips
- Monitor daily - Track costs in real-time
- Set budgets - Use budget limits to prevent overruns
- Optimize prompts - Reduce token usage
- Cache responses - Avoid redundant API calls
- Use cheaper models - Balance cost vs. quality
- Track by user - Identify high-cost users
- Alert on thresholds - Get notified of unusual spending
- Analyze trends - Review cost patterns weekly
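The "cache responses" tip can be illustrated with a small in-memory cache keyed by model and messages. This is a generic sketch, not LiteLLM's own caching feature: the ResponseCache class and the fake_model stand-in are hypothetical, and fake_model replaces a real API call so the example runs offline.

```python
import hashlib
import json
from typing import Callable


class ResponseCache:
    """Hypothetical in-memory cache: identical (model, messages) pairs are served once."""

    def __init__(self, call_model: Callable[[str, list], str]):
        self.call_model = call_model
        self._store = {}
        self.hits = 0
        self.misses = 0

    def _key(self, model: str, messages: list) -> str:
        # Stable hash of the request so equal requests map to the same entry
        payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get(self, model: str, messages: list) -> str:
        key = self._key(model, messages)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        # Cache miss: pay for one real call, then remember the result
        self.misses += 1
        result = self.call_model(model, messages)
        self._store[key] = result
        return result


def fake_model(model: str, messages: list) -> str:
    """Stand-in for a paid API call (assumption for this sketch)."""
    return f"{model} reply to: {messages[-1]['content']}"


cache = ResponseCache(fake_model)
messages = [{"role": "user", "content": "Hello"}]

first = cache.get("gpt-4", messages)   # miss: calls fake_model
second = cache.get("gpt-4", messages)  # hit: served from the cache
print(f"hits={cache.hits} misses={cache.misses}")  # hits=1 misses=1
```

Exact-match caching only pays off when requests repeat verbatim, and cached answers are only appropriate where a repeated, identical response is acceptable.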
Cost Calculation Details
Token-Based Pricing
Most models charge per token:
# Example calculation for GPT-4
input_tokens = 100
output_tokens = 50
input_cost = input_tokens * 0.00003 # $0.03/1K tokens
output_cost = output_tokens * 0.00006 # $0.06/1K tokens
total_cost = input_cost + output_cost
print(f"Total cost: ${total_cost:.6f}") # $0.006000
Image Generation Pricing
from litellm import image_generation

response = image_generation(
    model="dall-e-3",
    prompt="A sunset over mountains",
    size="1024x1024",
    quality="hd",
    n=1
)

# Cost based on size, quality, and number of images
cost = response._hidden_params.get("response_cost")
print(f"Image generation cost: ${cost}")
Audio Pricing
# Speech (TTS)
from litellm import speech

response = speech(
    model="tts-1",
    input="Hello, world!",
    voice="alloy"
)

# Cost based on character count
cost = response._hidden_params.get("response_cost")