
POST /v1/chat/completions

Sends a list of messages to a supported chat model and returns a generated response. This endpoint is fully compatible with the OpenAI Chat Completions API, so any OpenAI-compatible SDK works without code changes. Requests are routed through the gateway according to the config or headers you provide. Fallback, retry, load balancing, and caching all apply transparently.
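Because the endpoint is OpenAI-compatible, an existing SDK can be pointed at the gateway by changing only the base URL and adding the routing headers. A minimal sketch with the official OpenAI Python SDK, assuming a gateway running locally on port 8787 (as in the curl example below); the placeholder key is an assumption you should replace:

```python
from openai import OpenAI

# Point the SDK at the gateway instead of api.openai.com.
# The gateway reads the real provider key from the x-portkey-api-key header,
# so the SDK-level api_key can be a placeholder.
client = OpenAI(
    base_url="http://localhost:8787/v1",
    api_key="placeholder",
    default_headers={
        "x-portkey-provider": "openai",
        "x-portkey-api-key": "<YOUR_OPENAI_API_KEY>",
    },
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the Portkey AI Gateway?"},
    ],
)
print(response.choices[0].message.content)
```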

Request headers

x-portkey-provider
string
The provider to route the request to (e.g. openai, anthropic, azure-openai). Required when not using a config.
x-portkey-api-key
string
Your provider API key. Alternative to setting the key in a virtual key or config.
x-portkey-config
string
A JSON config object (or config ID from Portkey Cloud) that defines routing, fallbacks, retries, and more.
x-portkey-virtual-key
string
A virtual key ID from Portkey Cloud that maps to a stored provider credential.
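As an illustration of the x-portkey-config header, here is a sketch of a config that falls back from OpenAI to Anthropic. The exact schema is defined by the gateway; field names here (strategy, mode, targets) follow the documented fallback pattern, and the keys are placeholders:

```json
{
  "strategy": { "mode": "fallback" },
  "targets": [
    { "provider": "openai", "api_key": "sk-..." },
    { "provider": "anthropic", "api_key": "sk-ant-..." }
  ]
}
```

The object is passed serialized as the header value, e.g. -H 'x-portkey-config: {...}', or referenced by its config ID from Portkey Cloud.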

Request body

model
string
required
The model identifier to use (e.g. gpt-4o, claude-3-5-sonnet-20241022). The gateway forwards this value to the target provider; use the provider’s model name.
messages
object[]
required
The conversation history as an array of message objects. Each object must have a role and content.
temperature
number
default:"1"
Sampling temperature between 0 and 2. Higher values produce more varied output; lower values produce more focused, deterministic output. Avoid using this alongside top_p.
max_tokens
integer
default:"100"
The maximum number of tokens to generate. The request fails if the prompt plus max_tokens exceeds the model’s context length.
max_completion_tokens
integer
An upper bound on tokens generated for a completion, including visible and reasoning tokens. Preferred over max_tokens for models that support reasoning.
stream
boolean
default:"false"
When true, responses are sent as server-sent events (text/event-stream). Each event is a JSON delta prefixed with data: . The stream ends with data: [DONE].
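A minimal sketch of consuming the stream described above: read lines, keep only those starting with the data: prefix, stop at the [DONE] sentinel, and accumulate the content deltas. The sample chunks mirror the shape of real stream events:

```python
import json

def parse_sse_stream(lines):
    """Yield parsed JSON deltas from a chat-completions SSE stream.

    `lines` is any iterable of decoded text lines, e.g. from
    response.iter_lines() when using the requests library.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines and comments
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        yield json.loads(payload)

# Representative stream chunks:
sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
text = "".join(
    chunk["choices"][0]["delta"].get("content", "")
    for chunk in parse_sse_stream(sample)
)
# text == "Hello"
```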
top_p
number
default:"1"
Nucleus sampling threshold between 0 and 1. The model considers only the tokens whose cumulative probability reaches top_p. Avoid using alongside temperature.
frequency_penalty
number
default:"0"
Number between -2.0 and 2.0. Positive values penalize tokens that have already appeared in the output, reducing repetition.
presence_penalty
number
default:"0"
Number between -2.0 and 2.0. Positive values penalize tokens that appear anywhere in the text so far, encouraging the model to introduce new topics.
n
integer
default:"1"
The number of chat completion choices to generate for each input message.
stop
string | string[]
One or more sequences where the model stops generating further tokens. The stop sequence itself is not included in the output.
tools
object[]
A list of tools the model may call. Each tool describes a function the model can invoke.
tool_choice
string | object
default:"auto"
Controls how the model selects tools. One of none, auto, required, or an object {"type": "function", "function": {"name": "..."}}.
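For illustration, a request-body fragment declaring one tool and forcing the model to call it. The get_weather function and its parameters are hypothetical; the surrounding structure follows the standard function-calling schema:

```json
{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {
            "city": { "type": "string" }
          },
          "required": ["city"]
        }
      }
    }
  ],
  "tool_choice": { "type": "function", "function": { "name": "get_weather" } }
}
```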
response_format
object
Specifies the output format. Set type to json_object to enable JSON mode, or json_schema to enforce a specific schema.
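A sketch of a response_format fragment enforcing a schema; the schema name and fields here are illustrative assumptions:

```json
{
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "answer",
      "schema": {
        "type": "object",
        "properties": { "summary": { "type": "string" } },
        "required": ["summary"]
      }
    }
  }
}
```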
seed
integer
A seed for deterministic sampling. The same seed and parameters should return the same result, though determinism is not guaranteed.
logprobs
boolean
default:"false"
Whether to return log probabilities for the output tokens.
top_logprobs
integer
The number of most likely tokens (0–20) to return at each position, along with their log probabilities. Requires logprobs to be true.
stream_options
object
Options that modify streaming behavior. For example, set {"include_usage": true} to receive a final chunk containing token usage before data: [DONE].
user
string
A unique identifier for the end user. Helps with monitoring and abuse detection.
reasoning_effort
string
Controls the amount of reasoning for models that support extended thinking (e.g. o3, Claude). One of none, minimal, low, medium, or high.
store
boolean
Whether to store the output of this request for use with model distillation or evals.
metadata
object
Key-value pairs attached to the stored output when store is true.

Response

id
string
A unique identifier for the completion in the form chatcmpl-....
object
string
Always chat.completion for non-streaming responses, or chat.completion.chunk for streaming deltas.
created
integer
Unix timestamp of when the completion was created.
model
string
The model used for the completion.
choices
object[]
An array of completion choices. Contains one item unless n is greater than 1.
usage
object
Token usage for this request.
system_fingerprint
string
A fingerprint representing the backend configuration used to serve the request.
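The fields above can be read straight from the parsed response body. A sketch with a representative (made-up) non-streaming response:

```python
# Representative response body, already parsed from JSON.
response = {
    "id": "chatcmpl-abc123",
    "object": "chat.completion",
    "created": 1717000000,
    "model": "gpt-4o",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Portkey is an AI gateway."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 25, "completion_tokens": 8, "total_tokens": 33},
}

# choices holds one item unless n > 1; the generated text lives in message.content.
answer = response["choices"][0]["message"]["content"]
total = response["usage"]["total_tokens"]
# answer == "Portkey is an AI gateway.", total == 33
```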

Code examples

curl http://localhost:8787/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-portkey-provider: openai" \
  -H "x-portkey-api-key: $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the Portkey AI Gateway?"}
    ],
    "temperature": 0.7,
    "max_tokens": 256
  }'
