Documentation Index
Fetch the complete documentation index at: https://mintlify.com/BerriAI/litellm/llms.txt
Use this file to discover all available pages before exploring further.
Quick Start with Helm
LiteLLM provides official Helm charts for Kubernetes deployment.Install with Default Values
helm install litellm litellm/litellm-helm \
--set postgresql.auth.password=your-secure-password \
--set postgresql.auth.postgres-password=your-admin-password
Verify Installation
kubectl get pods -l app.kubernetes.io/name=litellm
kubectl logs -l app.kubernetes.io/name=litellm -f
Helm Chart Configuration
Basic Values
Create avalues.yaml file:
values.yaml
# Number of replicas
replicaCount: 3
# Image configuration
image:
repository: ghcr.io/berriai/litellm-database
tag: "main-stable"
pullPolicy: Always
# Service configuration
service:
type: ClusterIP
port: 4000
# Resource limits
resources:
requests:
cpu: 500m
memory: 512Mi
limits:
cpu: 2000m
memory: 2Gi
# LiteLLM configuration
proxy_config:
model_list:
- model_name: gpt-4o
litellm_params:
model: gpt-4o
api_key: os.environ/OPENAI_API_KEY
- model_name: claude-sonnet-4
litellm_params:
model: anthropic/claude-sonnet-4-20250514
api_key: os.environ/ANTHROPIC_API_KEY
general_settings:
master_key: os.environ/PROXY_MASTER_KEY
database_url: os.environ/DATABASE_URL
# Database configuration
db:
deployStandalone: true # Deploy PostgreSQL with chart
useExisting: false # Or use existing database
postgresql:
architecture: standalone
auth:
username: litellm
database: litellm
password: "ChangeMe123!" # Override via --set
postgres-password: "AdminPass123!" # Override via --set
# Enable autoscaling
autoscaling:
enabled: true
minReplicas: 2
maxReplicas: 10
targetCPUUtilizationPercentage: 70
targetMemoryUtilizationPercentage: 80
helm install litellm litellm/litellm-helm -f values.yaml
Environment Variables from Secrets
Store API keys and sensitive data in Kubernetes Secrets, not in values.yaml.
kubectl create secret generic litellm-secrets \
--from-literal=OPENAI_API_KEY=sk-... \
--from-literal=ANTHROPIC_API_KEY=sk-ant-... \
--from-literal=PROXY_MASTER_KEY=sk-1234
values.yaml:
environmentSecrets:
- litellm-secrets
proxy_config:
model_list:
- model_name: gpt-4o
litellm_params:
model: gpt-4o
api_key: os.environ/OPENAI_API_KEY
general_settings:
master_key: os.environ/PROXY_MASTER_KEY
Manual Kubernetes Deployment
For custom deployments without Helm:Deployment YAML
litellm-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: litellm-deployment
spec:
replicas: 3
selector:
matchLabels:
app: litellm
template:
metadata:
labels:
app: litellm
spec:
containers:
- name: litellm-container
image: ghcr.io/berriai/litellm:main-stable
imagePullPolicy: Always
ports:
- containerPort: 4000
name: http
env:
- name: LITELLM_MASTER_KEY
valueFrom:
secretKeyRef:
name: litellm-secrets
key: master-key
- name: DATABASE_URL
valueFrom:
secretKeyRef:
name: litellm-secrets
key: database-url
- name: OPENAI_API_KEY
valueFrom:
secretKeyRef:
name: litellm-secrets
key: openai-api-key
args:
- "--config"
- "/app/proxy_config.yaml"
volumeMounts:
- name: config-volume
mountPath: /app
readOnly: true
livenessProbe:
httpGet:
path: /health/liveliness
port: 4000
initialDelaySeconds: 120
periodSeconds: 15
successThreshold: 1
failureThreshold: 3
timeoutSeconds: 10
readinessProbe:
httpGet:
path: /health/readiness
port: 4000
initialDelaySeconds: 120
periodSeconds: 15
successThreshold: 1
failureThreshold: 3
timeoutSeconds: 10
resources:
requests:
cpu: 500m
memory: 512Mi
limits:
cpu: 2000m
memory: 2Gi
volumes:
- name: config-volume
configMap:
name: litellm-config
---
apiVersion: v1
kind: Service
metadata:
name: litellm-service
spec:
selector:
app: litellm
ports:
- protocol: TCP
port: 4000
targetPort: 4000
type: ClusterIP
kubectl apply -f litellm-deployment.yaml
ConfigMap for Proxy Config
litellm-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: litellm-config
data:
proxy_config.yaml: |
model_list:
- model_name: gpt-4o
litellm_params:
model: gpt-4o
api_key: os.environ/OPENAI_API_KEY
- model_name: claude-sonnet-4
litellm_params:
model: anthropic/claude-sonnet-4-20250514
api_key: os.environ/ANTHROPIC_API_KEY
general_settings:
master_key: os.environ/LITELLM_MASTER_KEY
database_url: os.environ/DATABASE_URL
kubectl apply -f litellm-configmap.yaml
Ingress Configuration
NGINX Ingress
litellm-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: litellm-ingress
annotations:
kubernetes.io/ingress.class: nginx
cert-manager.io/cluster-issuer: letsencrypt-prod
nginx.ingress.kubernetes.io/ssl-redirect: "true"
nginx.ingress.kubernetes.io/proxy-body-size: "50m"
nginx.ingress.kubernetes.io/proxy-read-timeout: "600"
spec:
tls:
- hosts:
- api.yourdomain.com
secretName: litellm-tls
rules:
- host: api.yourdomain.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: litellm-service
port:
number: 4000
ingress:
enabled: true
className: nginx
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod
hosts:
- host: api.yourdomain.com
paths:
- path: /
pathType: Prefix
tls:
- secretName: litellm-tls
hosts:
- api.yourdomain.com
Autoscaling
Horizontal Pod Autoscaler (HPA)
litellm-hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: litellm-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: litellm-deployment
minReplicas: 2
maxReplicas: 10
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80
KEDA Autoscaling
Use KEDA for advanced autoscaling based on custom metrics like request queue depth or Prometheus metrics.
keda:
enabled: true
minReplicas: 2
maxReplicas: 20
pollingInterval: 30
cooldownPeriod: 300
triggers:
- type: prometheus
metadata:
serverAddress: http://prometheus:9090
metricName: litellm_requests_total
threshold: '1000'
query: sum(rate(litellm_requests_total[2m]))
Database Configuration
Using External PostgreSQL
For production, use a managed database service (AWS RDS, GCP Cloud SQL, Azure Database) for better reliability.
db:
useExisting: true
endpoint: postgres.example.com
database: litellm
url: postgresql://$(DATABASE_USERNAME):$(DATABASE_PASSWORD)@$(DATABASE_HOST)/$(DATABASE_NAME)
secret:
name: postgres-credentials
usernameKey: username
passwordKey: password
# Disable bundled PostgreSQL
postgresql:
enabled: false
kubectl create secret generic postgres-credentials \
--from-literal=username=litellm \
--from-literal=password=your-secure-password
Prisma Migrations
The Helm chart includes a migration job that runs before deployment:migrationJob:
enabled: true
retries: 3
backoffLimit: 4
ttlSecondsAfterFinished: 120
hooks:
argocd:
enabled: true # Run as ArgoCD hook
helm:
enabled: false # Or as Helm pre-install hook
High Availability Setup
Pod Disruption Budget
pdb:
enabled: true
minAvailable: 2 # Ensure at least 2 pods always running
# Or use: maxUnavailable: 1
Topology Spread Constraints
topologySpreadConstraints:
- maxSkew: 1
topologyKey: kubernetes.io/hostname
whenUnsatisfiable: DoNotSchedule
labelSelector:
matchLabels:
app: litellm
- maxSkew: 1
topologyKey: topology.kubernetes.io/zone
whenUnsatisfiable: ScheduleAnyway
labelSelector:
matchLabels:
app: litellm
Graceful Shutdown
terminationGracePeriodSeconds: 90
lifecycle:
preStop:
exec:
command: ["/bin/sh", "-c", "sleep 15"] # Drain requests
Monitoring with Prometheus
ServiceMonitor
serviceMonitor:
enabled: true
interval: 15s
scrapeTimeout: 10s
labels:
prometheus: kube-prometheus
Redis for Caching
redis:
enabled: true
architecture: standalone
auth:
enabled: true
password: "your-redis-password"
proxy_config:
general_settings:
cache: true
redis_host: os.environ/REDIS_HOST
redis_port: os.environ/REDIS_PORT
redis_password: os.environ/REDIS_PASSWORD
Security Best Practices
Network Policies
networkpolicy.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: litellm-netpol
spec:
podSelector:
matchLabels:
app: litellm
policyTypes:
- Ingress
- Egress
ingress:
- from:
- namespaceSelector:
matchLabels:
name: ingress-nginx
ports:
- protocol: TCP
port: 4000
egress:
- to:
- podSelector:
matchLabels:
app: postgresql
ports:
- protocol: TCP
port: 5432
- to: # Allow external API calls
- namespaceSelector: {}
ports:
- protocol: TCP
port: 443
Pod Security Standards
podSecurityContext:
runAsNonRoot: true
runAsUser: 1000
fsGroup: 1000
seccompProfile:
type: RuntimeDefault
securityContext:
allowPrivilegeEscalation: false
capabilities:
drop:
- ALL
readOnlyRootFilesystem: false # Prisma needs write access
Troubleshooting
Pod Won’t Start
# Check pod status
kubectl get pods
kubectl describe pod litellm-xxx
# View logs
kubectl logs litellm-xxx -f
# Common issues:
# - Database migrations failing (check migration job logs)
# - ConfigMap not mounted (verify configmap exists)
# - Secrets missing (check secret creation)
Database Connection Issues
# Test database connectivity from pod
kubectl exec -it litellm-xxx -- sh
psql $DATABASE_URL
# Check service DNS resolution
kubectl exec -it litellm-xxx -- nslookup postgres
Health Check Failures
# Manually test health endpoint
kubectl exec -it litellm-xxx -- curl localhost:4000/health/liveliness
# Increase startup probe failure threshold
startupProbe:
failureThreshold: 30 # Allow 5 minutes (30 * 10s)
periodSeconds: 10
Next Steps
High Availability
Multi-region HA deployment patterns
Monitoring
Set up Prometheus and Grafana dashboards
Security
Harden your Kubernetes deployment
Performance
Optimize for high throughput