POST /v1/chat/completions
Sends a list of messages to a supported chat model and returns a generated response. This endpoint is fully compatible with the OpenAI Chat Completions API, so any OpenAI-compatible SDK works without code changes. Requests are routed through the gateway according to the config or headers you provide. Fallback, retry, load balancing, and caching all apply transparently.

Request headers
Provider
The provider to route the request to (e.g. openai, anthropic, azure-openai). Required when not using a config.

Provider API key
Your provider API key. An alternative to setting the key in a virtual key or config.

Config
A JSON config object (or a config ID from Portkey Cloud) that defines routing, fallbacks, retries, and more.

Virtual key
A virtual key ID from Portkey Cloud that maps to a stored provider credential.
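Putting the routing options together, here is a minimal request sketch in Python. The gateway URL and the x-portkey-* header name are assumptions based on common Portkey usage, not taken from this page; check your deployment's documentation for the exact names.

```python
import json
import urllib.request

# Hypothetical gateway URL and header name -- verify against your deployment.
GATEWAY_URL = "https://api.portkey.ai/v1/chat/completions"

headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_PROVIDER_API_KEY",  # provider API key
    "x-portkey-provider": "openai",                   # assumed header name
}

payload = {
    "model": "gpt-4o",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
}

# Build (but do not send) the request object.
request = urllib.request.Request(
    GATEWAY_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers=headers,
    method="POST",
)
```

Sending the request with urllib.request.urlopen(request) (or any HTTP client) returns the completion body described below.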
Request body
model
The model identifier to use (e.g. gpt-4o, claude-3-5-sonnet-20241022). The gateway forwards this value to the target provider, so use the provider's model name.

messages
The conversation history as an array of message objects. Each object must have a role and content.

temperature
Sampling temperature between 0 and 2. Higher values produce more varied output; lower values produce more focused, deterministic output. Avoid using this alongside top_p.

max_tokens
The maximum number of tokens to generate. The request fails if the prompt plus max_tokens exceeds the model's context length.

max_completion_tokens
An upper bound on tokens generated for a completion, including visible and reasoning tokens. Preferred over max_tokens for models that support reasoning.

stream
When true, responses are sent as server-sent events (text/event-stream). Each event is a JSON delta prefixed with data: . The stream ends with data: [DONE].

top_p
Nucleus sampling threshold between 0 and 1. The model considers only the tokens whose cumulative probability reaches top_p. Avoid using alongside temperature.

frequency_penalty
Number between -2.0 and 2.0. Positive values penalize tokens that have already appeared in the output, reducing repetition.

presence_penalty
Number between -2.0 and 2.0. Positive values penalize tokens that appear anywhere in the text so far, encouraging the model to introduce new topics.

n
The number of chat completion choices to generate for each input message.
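The streaming format described above can be consumed by filtering for data:-prefixed lines and stopping at the [DONE] sentinel. A minimal parser sketch over a captured stream (the abbreviated chunk objects follow the OpenAI delta shape):

```python
import json

def parse_sse_stream(lines):
    """Yield parsed JSON deltas from data:-prefixed SSE lines until [DONE]."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines and comments
        data = line[len("data: "):]
        if data == "[DONE]":
            break  # end-of-stream sentinel
        yield json.loads(data)

# Example captured stream (chunk objects abbreviated for illustration).
raw = [
    'data: {"object": "chat.completion.chunk", "choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"object": "chat.completion.chunk", "choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]

# Concatenate the content deltas to recover the full message.
text = "".join(
    chunk["choices"][0]["delta"].get("content", "")
    for chunk in parse_sse_stream(raw)
)
# text is now "Hello"
```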
stop
One or more sequences where the model stops generating further tokens. The stop sequence itself is not included in the output.
tools
A list of tools the model may call. Each tool describes a function the model can invoke.
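Each tool entry is a JSON object describing one callable function. The get_weather function below is purely illustrative, not part of this API:

```python
# Illustrative tool definition -- the function name and parameters are invented.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

payload = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Weather in Paris?"}],
    "tools": [weather_tool],
    # Force the model to call this specific function:
    "tool_choice": {"type": "function", "function": {"name": "get_weather"}},
}
```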
tool_choice
Controls how the model selects tools. One of none, auto, required, or an object {"type": "function", "function": {"name": "..."}}.

response_format
Specifies the output format. Set type to json_object to enable JSON mode, or json_schema to enforce a specific schema.

seed
A seed for deterministic sampling. The same seed and parameters should return the same result, though determinism is not guaranteed.
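Both output modes are expressed through the response_format object. The person schema below is an invented example, shown only to illustrate the shape:

```python
# JSON mode: the model must emit syntactically valid JSON.
json_mode = {"type": "json_object"}

# Schema enforcement: the model output must match a specific structure.
# This particular schema is invented for illustration.
schema_mode = {
    "type": "json_schema",
    "json_schema": {
        "name": "person",
        "schema": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "age": {"type": "integer"},
            },
            "required": ["name", "age"],
        },
    },
}
```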
logprobs
Whether to return log probabilities for the output tokens.
top_logprobs
The number of most likely tokens (0–20) to return at each position, along with their log probabilities. Requires logprobs to be true.

stream_options
Options that modify streaming behavior.
user
A unique identifier for the end user. Helps with monitoring and abuse detection.
reasoning_effort
Controls the amount of reasoning for models that support extended thinking (e.g. o3, Claude). One of none, minimal, low, medium, or high.

store
Whether to store the output of this request for use with model distillation or evals.
metadata
Key-value pairs attached to the stored output when store is true.

Response
id
A unique identifier for the completion in the form chatcmpl-....

object
Always chat.completion for non-streaming responses, or chat.completion.chunk for streaming deltas.

created
Unix timestamp of when the completion was created.
model
The model used for the completion.
choices
An array of completion choices. Contains one item unless n is greater than 1.

usage
Token usage for this request.
system_fingerprint
A fingerprint representing the backend configuration used to serve the request.
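A non-streaming response can be unpacked as follows. The sample body is abbreviated but follows the field layout described above:

```python
import json

# Abbreviated example response body for illustration.
raw = """
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Hello there!"},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 9, "completion_tokens": 3, "total_tokens": 12}
}
"""

response = json.loads(raw)
answer = response["choices"][0]["message"]["content"]   # "Hello there!"
total = response["usage"]["total_tokens"]               # 12
```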