
POST /v1/chat/completions

Sends a list of messages to a supported chat model and returns a generated response. This endpoint is fully compatible with the OpenAI Chat Completions API, so any OpenAI-compatible SDK works without code changes. Requests are routed through the gateway according to the config or headers you provide. Fallback, retry, load balancing, and caching all apply transparently.
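Because the endpoint is OpenAI-compatible, an existing SDK can be pointed at the gateway by changing only the base URL and adding the routing headers. A minimal sketch with the official OpenAI Python SDK, assuming a gateway running locally on port 8787 (as in the curl example below); the placeholder key is an assumption you should replace:

```python
from openai import OpenAI

# Point the SDK at the gateway instead of api.openai.com.
# The gateway reads the real provider key from the x-portkey-api-key header,
# so the SDK-level api_key can be a placeholder.
client = OpenAI(
    base_url="http://localhost:8787/v1",
    api_key="placeholder",
    default_headers={
        "x-portkey-provider": "openai",
        "x-portkey-api-key": "<YOUR_OPENAI_API_KEY>",
    },
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the Portkey AI Gateway?"},
    ],
)
print(response.choices[0].message.content)
```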

Request headers

x-portkey-provider
string
The provider to route the request to (e.g. openai, anthropic, azure-openai). Required when not using a config.
x-portkey-api-key
string
Your provider API key. Alternative to setting the key in a virtual key or config.
x-portkey-config
string
A JSON config object (or config ID from Portkey Cloud) that defines routing, fallbacks, retries, and more.
x-portkey-virtual-key
string
A virtual key ID from Portkey Cloud that maps to a stored provider credential.
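As an illustration of the x-portkey-config header, here is a sketch of a config that falls back from OpenAI to Anthropic. The exact schema is defined by the gateway; field names here (strategy, mode, targets) follow the documented fallback pattern, and the keys are placeholders:

```json
{
  "strategy": { "mode": "fallback" },
  "targets": [
    { "provider": "openai", "api_key": "sk-..." },
    { "provider": "anthropic", "api_key": "sk-ant-..." }
  ]
}
```

The object is passed serialized as the header value, e.g. -H 'x-portkey-config: {...}', or referenced by its config ID from Portkey Cloud.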

Request body

model
string
required
The model identifier to use (e.g. gpt-4o, claude-3-5-sonnet-20241022). The gateway forwards this value to the target provider; use the provider’s model name.
messages
object[]
required
The conversation history as an array of message objects. Each object must have a role and content.
temperature
number
default:"1"
Sampling temperature between 0 and 2. Higher values produce more varied output; lower values produce more focused, deterministic output. Avoid using this alongside top_p.
max_tokens
integer
default:"100"
The maximum number of tokens to generate. The request fails if the prompt plus max_tokens exceeds the model’s context length.
max_completion_tokens
integer
An upper bound on tokens generated for a completion, including visible and reasoning tokens. Preferred over max_tokens for models that support reasoning.
stream
boolean
default:"false"
When true, responses are sent as server-sent events (text/event-stream). Each event is a JSON delta prefixed with data: . The stream ends with data: [DONE].
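A minimal sketch of consuming the stream described above: read lines, keep only those starting with the data: prefix, stop at the [DONE] sentinel, and accumulate the content deltas. The sample chunks mirror the shape of real stream events:

```python
import json

def parse_sse_stream(lines):
    """Yield parsed JSON deltas from a chat-completions SSE stream.

    `lines` is any iterable of decoded text lines, e.g. from
    response.iter_lines() when using the requests library.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines and comments
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        yield json.loads(payload)

# Representative stream chunks:
sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
text = "".join(
    chunk["choices"][0]["delta"].get("content", "")
    for chunk in parse_sse_stream(sample)
)
# text == "Hello"
```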
top_p
number
default:"1"
Nucleus sampling threshold between 0 and 1. The model considers only the tokens whose cumulative probability reaches top_p. Avoid using alongside temperature.
frequency_penalty
number
default:"0"
Number between -2.0 and 2.0. Positive values penalize tokens that have already appeared in the output, reducing repetition.
presence_penalty
number
default:"0"
Number between -2.0 and 2.0. Positive values penalize tokens that appear anywhere in the text so far, encouraging the model to introduce new topics.
n
integer
default:"1"
The number of chat completion choices to generate for each input message.
stop
string | string[]
One or more sequences where the model stops generating further tokens. The stop sequence itself is not included in the output.
tools
object[]
A list of tools the model may call. Each tool describes a function the model can invoke.
tool_choice
string | object
default:"auto"
Controls how the model selects tools. One of none, auto, required, or an object {"type": "function", "function": {"name": "..."}}.
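For illustration, a request-body fragment declaring one tool and forcing the model to call it. The get_weather function and its parameters are hypothetical; the surrounding structure follows the standard function-calling schema:

```json
{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {
            "city": { "type": "string" }
          },
          "required": ["city"]
        }
      }
    }
  ],
  "tool_choice": { "type": "function", "function": { "name": "get_weather" } }
}
```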
response_format
object
Specifies the output format. Set type to json_object to enable JSON mode, or json_schema to enforce a specific schema.
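A sketch of a response_format fragment enforcing a schema; the schema name and fields here are illustrative assumptions:

```json
{
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "answer",
      "schema": {
        "type": "object",
        "properties": { "summary": { "type": "string" } },
        "required": ["summary"]
      }
    }
  }
}
```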
seed
integer
A seed for deterministic sampling. The same seed and parameters should return the same result, though determinism is not guaranteed.
logprobs
boolean
default:"false"
Whether to return log probabilities for the output tokens.
top_logprobs
integer
The number of most likely tokens (0–20) to return at each position, along with their log probabilities. Requires logprobs to be true.
stream_options
object
Options that modify streaming behavior. For example, set {"include_usage": true} to receive a final chunk containing token usage before data: [DONE].
user
string
A unique identifier for the end user. Helps with monitoring and abuse detection.
reasoning_effort
string
Controls the amount of reasoning for models that support extended thinking (e.g. o3, Claude). One of none, minimal, low, medium, or high.
store
boolean
Whether to store the output of this request for use with model distillation or evals.
metadata
object
Key-value pairs attached to the stored output when store is true.

Response

id
string
A unique identifier for the completion in the form chatcmpl-....
object
string
Always chat.completion for non-streaming responses, or chat.completion.chunk for streaming deltas.
created
integer
Unix timestamp of when the completion was created.
model
string
The model used for the completion.
choices
object[]
An array of completion choices. Contains one item unless n is greater than 1.
usage
object
Token usage for this request.
system_fingerprint
string
A fingerprint representing the backend configuration used to serve the request.
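The fields above can be read straight from the parsed response body. A sketch with a representative (made-up) non-streaming response:

```python
# Representative response body, already parsed from JSON.
response = {
    "id": "chatcmpl-abc123",
    "object": "chat.completion",
    "created": 1717000000,
    "model": "gpt-4o",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Portkey is an AI gateway."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 25, "completion_tokens": 8, "total_tokens": 33},
}

# choices holds one item unless n > 1; the generated text lives in message.content.
answer = response["choices"][0]["message"]["content"]
total = response["usage"]["total_tokens"]
# answer == "Portkey is an AI gateway.", total == 33
```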

Code examples

curl http://localhost:8787/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-portkey-provider: openai" \
  -H "x-portkey-api-key: $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the Portkey AI Gateway?"}
    ],
    "temperature": 0.7,
    "max_tokens": 256
  }'
