POST /v1/messages

Creates a message using the Anthropic Messages API format. This endpoint lets you call Anthropic models (and compatible providers) with the native request structure — including system as a top-level field, Anthropic-style content blocks, and extended thinking configuration — without translating to the OpenAI format.
This endpoint is intended for Anthropic-native integrations. For cross-provider routing with a unified format, use chat completions instead.

Request headers

x-portkey-provider
string
The provider to route the request to. Typically anthropic for this endpoint.
x-portkey-api-key
string
Your provider API key.
x-portkey-config
string
A JSON config object or config ID that defines routing, fallbacks, retries, and more.
x-portkey-virtual-key
string
A virtual key ID from Portkey Cloud.
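A config passed via x-portkey-config can be an inline JSON object. An illustrative sketch of a config with a fallback strategy and retries (field names follow Portkey's config schema; the API keys are placeholders):

```json
{
  "strategy": { "mode": "fallback" },
  "targets": [
    { "provider": "anthropic", "api_key": "sk-ant-..." },
    { "provider": "openai", "api_key": "sk-..." }
  ],
  "retry": { "attempts": 3 }
}
```

Alternatively, pass a saved config's ID as the header value instead of inline JSON.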

Request body

model
string
required
The model identifier (e.g. claude-opus-4-5, claude-3-5-sonnet-20241022).
messages
object[]
required
The conversation messages. Only user and assistant roles are accepted; system instructions go in the top-level system field.
max_tokens
integer
required
The maximum number of tokens to generate. The request fails if the prompt plus max_tokens exceeds the model’s context length.
system
string | object[]
The system prompt. Accepts a plain string or an array of TextBlockParam objects for structured system prompts (including prompt caching with cache_control).
temperature
number
Sampling temperature between 0 and 1 for most Claude models.
top_p
number
Nucleus sampling threshold. Only tokens in the top top_p cumulative probability are considered.
top_k
integer
Only sample from the top top_k tokens at each step.
stream
boolean
default:"false"
When true, responses stream as server-sent events.
stop_sequences
string[]
Custom sequences that cause the model to stop generating.
tools
object[]
Definitions of tools the model may call.
tool_choice
object
Controls tool selection. One of {"type": "auto"}, {"type": "any"}, {"type": "tool", "name": "..."}, or {"type": "none"}.
thinking
object
Configuration for extended thinking (reasoning).
metadata
object
Optional metadata about the request.
service_tier
string
Capacity tier to use. One of auto or standard_only.
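Several of the fields above compose together. A hypothetical request body sketch showing a structured system prompt with prompt caching, a tool definition, extended thinking, and automatic tool choice (the tool name and schema are invented for illustration):

```json
{
  "model": "claude-opus-4-5",
  "max_tokens": 2048,
  "system": [
    {
      "type": "text",
      "text": "You are a weather assistant.",
      "cache_control": { "type": "ephemeral" }
    }
  ],
  "thinking": { "type": "enabled", "budget_tokens": 1024 },
  "tools": [
    {
      "name": "get_weather",
      "description": "Look up the current weather for a city.",
      "input_schema": {
        "type": "object",
        "properties": { "city": { "type": "string" } },
        "required": ["city"]
      }
    }
  ],
  "tool_choice": { "type": "auto" },
  "messages": [
    { "role": "user", "content": "What is the weather in Paris?" }
  ]
}
```

Note that the thinking budget_tokens must be smaller than max_tokens, since thinking tokens count toward the overall limit.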

Response

id
string
Unique message identifier.
type
string
Always message.
role
string
Always assistant.
model
string
The model that produced the response.
content
object[]
The generated content as an array of typed content blocks.
stop_reason
string
Why the model stopped. One of end_turn, max_tokens, stop_sequence, or tool_use.
stop_sequence
string | null
The stop sequence that triggered a stop, if any.
usage
object
Token usage for the request.
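An illustrative (not captured) response for a simple text completion; the ID, text, and token counts are invented:

```json
{
  "id": "msg_01AbCdEfGh",
  "type": "message",
  "role": "assistant",
  "model": "claude-opus-4-5",
  "content": [
    { "type": "text", "text": "The Portkey AI Gateway is ..." }
  ],
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": { "input_tokens": 21, "output_tokens": 58 }
}
```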

Code examples

curl http://localhost:8787/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-portkey-provider: anthropic" \
  -H "Authorization: Bearer $ANTHROPIC_API_KEY" \
  -d '{
    "model": "claude-opus-4-5",
    "max_tokens": 1024,
    "system": "You are a helpful assistant.",
    "messages": [
      {"role": "user", "content": "What is the Portkey AI Gateway?"}
    ]
  }'
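The required-field and role constraints documented above can be checked client-side before sending a request. A minimal, hypothetical validator (not part of the gateway or any SDK), covering only the constraints stated on this page:

```python
def validate_messages_request(body: dict) -> list[str]:
    """Return a list of problems with an Anthropic-style request body,
    checked against the constraints documented for POST /v1/messages."""
    errors = []
    # model, messages, and max_tokens are required fields.
    for field in ("model", "messages", "max_tokens"):
        if field not in body:
            errors.append(f"missing required field: {field}")
    # Only user and assistant roles are accepted; system
    # instructions belong in the top-level `system` field.
    for i, msg in enumerate(body.get("messages", [])):
        if msg.get("role") not in ("user", "assistant"):
            errors.append(f"messages[{i}]: role must be 'user' or 'assistant'")
    # max_tokens must be a positive integer.
    max_tokens = body.get("max_tokens")
    if max_tokens is not None and (not isinstance(max_tokens, int) or max_tokens < 1):
        errors.append("max_tokens must be a positive integer")
    return errors
```

An empty list means the body is structurally valid for the fields documented here; the gateway and provider still enforce the full schema (context-length limits, content-block shapes, and so on).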

POST /v1/messages/count_tokens

Counts the number of tokens a given request would consume, without generating a response. Useful for checking whether a prompt fits within a model’s context window before making the full request. The request body has the same structure as POST /v1/messages.
curl http://localhost:8787/v1/messages/count_tokens \
  -H "Content-Type: application/json" \
  -H "x-portkey-provider: anthropic" \
  -H "Authorization: Bearer $ANTHROPIC_API_KEY" \
  -d '{
    "model": "claude-opus-4-5",
    "system": "You are a helpful assistant.",
    "messages": [
      {"role": "user", "content": "What is the Portkey AI Gateway?"}
    ]
  }'

Response

input_tokens
integer
The number of tokens the request would use, including the system prompt and all messages.
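An illustrative response (the token count is invented):

```json
{ "input_tokens": 27 }
```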