Overview
LiteLLM supports Cohere’s Command models (Command R+, Command R, and Command) for chat completions, along with Cohere’s embedding and rerank models.
Quick Start
Set API Key
export COHERE_API_KEY="your-api-key"
Make Your First Call
from litellm import completion

response = completion(
    model="cohere/command-r-plus",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
Supported Models
Command R+
Most capable model for complex tasks.
from litellm import completion

response = completion(
    model="cohere/command-r-plus",
    messages=[{"role": "user", "content": "Analyze this data..."}],
    max_tokens=1000,
    temperature=0.7
)
Command R
Balanced model for general use.
from litellm import completion

response = completion(
    model="cohere/command-r",
    messages=[{"role": "user", "content": "Summarize this text..."}]
)
Command
Standard chat model.
from litellm import completion

response = completion(
    model="cohere/command",
    messages=[{"role": "user", "content": "Quick question..."}]
)
Authentication
Environment Variable
Set the key in your shell, then call completion without passing it explicitly.
export COHERE_API_KEY="your-api-key"

from litellm import completion

response = completion(
    model="cohere/command-r-plus",
    messages=[{"role": "user", "content": "Hello!"}]
)
Direct Parameter
Pass the key on each call with the api_key argument.
from litellm import completion

response = completion(
    model="cohere/command-r-plus",
    messages=[{"role": "user", "content": "Hello!"}],
    api_key="your-api-key"
)
Function Calling
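Cohere models support function calling. LiteLLM translates OpenAI-format tool definitions into Cohere's tool format automatically, so you can pass the standard tools parameter.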
from litellm import completion

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather in a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City and state, e.g. San Francisco, CA"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"]
                    }
                },
                "required": ["location"]
            }
        }
    }
]

response = completion(
    model="cohere/command-r-plus",
    messages=[{"role": "user", "content": "What's the weather in NYC?"}],
    tools=tools
)

# Check for tool calls
if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    print(f"Tool: {tool_call.function.name}")
    print(f"Args: {tool_call.function.arguments}")
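Once the model requests a tool, the arguments arrive as a JSON string, and the tool's result has to go back into the conversation history. The following is a minimal sketch of that step, assuming OpenAI-style message dictionaries. The names get_weather, TOOL_REGISTRY, and run_tool_call are hypothetical, the weather data is a stub, and the tool call is simulated so the snippet runs offline.

```python
import json
from types import SimpleNamespace


def get_weather(location, unit="fahrenheit"):
    # Hypothetical stub; a real implementation would query a weather service.
    return {"location": location, "temperature": 72, "unit": unit}


# Hypothetical registry mapping tool names to local Python callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_call(tool_call):
    """Execute one tool call and return an OpenAI-style `tool` message."""
    func = TOOL_REGISTRY[tool_call.function.name]
    # The arguments field is a JSON-encoded string, so decode it first.
    kwargs = json.loads(tool_call.function.arguments)
    result = func(**kwargs)
    return {
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": json.dumps(result),
    }


# Stand-in for response.choices[0].message.tool_calls[0] (assumed shape).
fake_tool_call = SimpleNamespace(
    id="call_1",
    function=SimpleNamespace(
        name="get_weather",
        arguments='{"location": "New York, NY", "unit": "celsius"}',
    ),
)

tool_message = run_tool_call(fake_tool_call)
print(tool_message)
```

In a real exchange, you would append the assistant message that contains the tool calls, followed by this tool message, to messages and call completion again with the same tools.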
Streaming
from litellm import completion

response = completion(
    model="cohere/command-r-plus",
    messages=[{"role": "user", "content": "Write a story..."}],
    stream=True
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
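If you need the full text after streaming finishes, one option is to accumulate the deltas as they arrive. This is a small sketch, assuming chunks expose choices[0].delta.content as in the loop above. The helper names collect_stream and fake_chunk are hypothetical, and the chunks are simulated so the snippet runs without an API key.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Join the text deltas of a streamed response into one string."""
    parts = []
    for chunk in chunks:
        delta = chunk.choices[0].delta
        # Some chunks (for example the final one) may carry no text content.
        if delta.content:
            parts.append(delta.content)
    return "".join(parts)


def fake_chunk(text):
    # Hypothetical stand-in that mimics the shape of a streamed chunk.
    return SimpleNamespace(
        choices=[SimpleNamespace(delta=SimpleNamespace(content=text))]
    )


story = collect_stream(
    [fake_chunk("Once "), fake_chunk("upon a time"), fake_chunk(None)]
)
print(story)
```

With a live call, pass the object returned by completion(..., stream=True) to collect_stream in place of the simulated list.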
Embeddings
v3 Models
Latest embedding models with improved performance.
from litellm import embedding

response = embedding(
    model="cohere/embed-english-v3.0",
    input=["Text to embed", "Another text"]
)
embeddings = [data.embedding for data in response.data]
v2 Models
Previous generation embeddings.
from litellm import embedding

response = embedding(
    model="cohere/embed-english-v2.0",
    input=["Text to embed"]
)
Input Types
Specify input type for better performance.
from litellm import embedding

# For search queries
response = embedding(
    model="cohere/embed-english-v3.0",
    input=["search query"],
    input_type="search_query"
)

# For documents
response = embedding(
    model="cohere/embed-english-v3.0",
    input=["document content"],
    input_type="search_document"
)
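Query and document embeddings are typically compared with cosine similarity to find the closest documents. The sketch below illustrates that comparison in plain Python. The function names cosine_similarity and rank_documents are hypothetical, and the three-dimensional vectors are toy values standing in for real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat a zero vector as having no similarity.
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return (index, score) pairs sorted by descending similarity."""
    scored = [
        (i, cosine_similarity(query_vec, vec)) for i, vec in enumerate(doc_vecs)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors; real Cohere embeddings have far more dimensions.
query_vec = [1.0, 0.0, 0.0]
doc_vecs = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [-1.0, 0.0, 0.0]]

ranking = rank_documents(query_vec, doc_vecs)
print([index for index, _ in ranking])
```

In practice, query_vec would come from a call with input_type="search_query" and doc_vecs from a call with input_type="search_document".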
Reranking
Cohere’s rerank models reorder a list of candidate documents by relevance to a query, which can improve search results.
from litellm import rerank

response = rerank(
    model="cohere/rerank-english-v3.0",
    query="What is the capital of France?",
    documents=[
        "Paris is the capital of France.",
        "London is the capital of England.",
        "Berlin is the capital of Germany."
    ],
    top_n=2
)

# Get ranked results
for result in response.results:
    print(f"Score: {result.relevance_score}")
    print(f"Document: {result.document}")
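Rerank results usually carry an index that points back into the documents list you submitted, which is handy when you want the original text alongside the score. The helper below is a sketch under two assumptions: each result has index and relevance_score fields, and results may be either dictionaries or objects depending on the LiteLLM version. The names _field and top_documents are hypothetical, and the results are simulated.

```python
from types import SimpleNamespace


def _field(result, name):
    # Assumption: results may be dicts or attribute-style objects.
    if isinstance(result, dict):
        return result[name]
    return getattr(result, name)


def top_documents(results, documents, min_score=0.0):
    """Pair each reranked result with its original document text."""
    ranked = []
    for result in results:
        score = _field(result, "relevance_score")
        if score >= min_score:
            ranked.append((documents[_field(result, "index")], score))
    return ranked


documents = [
    "Paris is the capital of France.",
    "London is the capital of England.",
    "Berlin is the capital of Germany.",
]

# Simulated results covering both shapes; the scores are made up.
fake_results = [
    {"index": 0, "relevance_score": 0.98},
    SimpleNamespace(index=2, relevance_score=0.12),
]

best = top_documents(fake_results, documents, min_score=0.5)
print(best)
```

If attribute access on a result raises an AttributeError in your environment, dictionary-style access as handled by _field is worth trying.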
Citations
When you pass documents for grounding, Cohere returns citations that tie parts of the response to the documents they came from.
from litellm import completion

response = completion(
    model="cohere/command-r-plus",
    messages=[{"role": "user", "content": "Tell me about LiteLLM"}],
    documents=[
        {"text": "LiteLLM is a unified interface for LLMs."},
        {"text": "It supports 100+ LLM providers."}
    ]
)

# Access citations
if hasattr(response, 'citations'):
    for citation in response.citations:
        print(f"Cited document: {citation.document_ids}")
Configuration
Basic Config
from litellm import completion

response = completion(
    model="cohere/command-r-plus",
    messages=[{"role": "user", "content": "Hello!"}],
    temperature=0.8,
    max_tokens=500,
    top_p=0.9,
    frequency_penalty=0.5,
    presence_penalty=0.5
)
Cohere-Specific
from litellm import completion

response = completion(
    model="cohere/command-r-plus",
    messages=[{"role": "user", "content": "Hello!"}],
    # Cohere-specific parameters
    preamble="You are a helpful assistant.",
    k=50,     # Top-k sampling
    p=0.75,   # Nucleus sampling
    seed=42   # For reproducibility
)
Supported Parameters
Parameter              Type   Description
temperature            float  Randomness (0-1)
max_tokens             int    Max output tokens
max_completion_tokens  int    Alternative to max_tokens
top_p                  float  Nucleus sampling
frequency_penalty      float  Reduce repetition
presence_penalty       float  Encourage diversity
stop                   list   Stop sequences
n                      int    Number of completions
seed                   int    Reproducibility
preamble               str    System message
k                      int    Top-k sampling
documents              list   Documents for grounding
Error Handling
from litellm import completion
from litellm.exceptions import APIError, RateLimitError

try:
    response = completion(
        model="cohere/command-r-plus",
        messages=[{"role": "user", "content": "Hello!"}]
    )
except RateLimitError as e:
    print(f"Rate limit exceeded: {e}")
except APIError as e:
    print(f"API error: {e.status_code} - {e.message}")
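Rate limit errors are often transient, so a common follow-up is to retry with exponential backoff. The helper below is a generic sketch, not part of LiteLLM: call_with_retries, its parameters, and FakeRateLimitError are hypothetical names. The demonstration uses a simulated failure and records delays instead of sleeping, so it runs offline.

```python
import time


def call_with_retries(func, retry_on=(Exception,), max_attempts=3,
                      base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying on the given exceptions with doubling delays."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error.
            sleep(base_delay * 2 ** (attempt - 1))


class FakeRateLimitError(Exception):
    """Hypothetical stand-in for a provider rate limit error."""


attempts = {"count": 0}


def flaky():
    # Fails twice, then succeeds, to exercise the retry path.
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise FakeRateLimitError("slow down")
    return "ok"


delays = []  # Record delays instead of actually sleeping.
result = call_with_retries(
    flaky, retry_on=(FakeRateLimitError,), sleep=delays.append
)
print(result, delays)
```

With LiteLLM, func could be a lambda wrapping the completion call, and retry_on could be (RateLimitError,) imported from litellm.exceptions as shown above.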
LiteLLM Proxy
Use Cohere through the LiteLLM proxy server.
Define the model in the proxy configuration:
model_list:
  - model_name: command-r-plus
    litellm_params:
      model: cohere/command-r-plus
      api_key: os.environ/COHERE_API_KEY

Then call the proxy with the OpenAI client, using the model_name from the configuration:
import openai

client = openai.OpenAI(
    api_key="sk-1234",  # LiteLLM proxy key
    base_url="http://0.0.0.0:4000"
)

response = client.chat.completions.create(
    model="command-r-plus",
    messages=[{"role": "user", "content": "Hello!"}]
)
Best Practices
Use max_completion_tokens instead of deprecated max_tokens
Monitor token usage via response.usage
Cohere reports usage in billed units, which are the basis for accurate billing
LiteLLM automatically converts OpenAI format to Cohere format
Pass force_single_step=True when the model should answer in a single step rather than chaining several tool-use steps
Handle tool results properly in conversation history
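To make the token-monitoring advice concrete, here is a small sketch that sums usage across several responses. It assumes each response exposes a usage object with the OpenAI-style fields prompt_tokens, completion_tokens, and total_tokens. The helper name total_usage is hypothetical, and the responses are simulated so the snippet runs offline.

```python
from types import SimpleNamespace


def total_usage(responses):
    """Sum OpenAI-style token counts over a list of responses."""
    totals = {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}
    for response in responses:
        usage = getattr(response, "usage", None)
        if usage is None:
            continue  # Skip responses that carry no usage information.
        for key in totals:
            totals[key] += getattr(usage, key, 0) or 0
    return totals


# Simulated responses; the token counts are made up for illustration.
fake_responses = [
    SimpleNamespace(usage=SimpleNamespace(
        prompt_tokens=10, completion_tokens=5, total_tokens=15)),
    SimpleNamespace(usage=SimpleNamespace(
        prompt_tokens=20, completion_tokens=8, total_tokens=28)),
    SimpleNamespace(usage=None),
]

totals = total_usage(fake_responses)
print(totals)
```

A running total like this can be logged per request or per user to spot unexpectedly expensive prompts.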