Messages

POST /v1/messages

Creates a message using the Anthropic Messages API format. This endpoint lets you call Anthropic models (and compatible providers) with the native request structure — including system as a top-level field, Anthropic-style content blocks, and extended thinking configuration — without translating to the OpenAI format.

This endpoint is intended for Anthropic-native integrations. For cross-provider routing with a unified format, use chat completions instead.

Request headers

x-portkey-provider

string

The provider to route the request to. Typically anthropic for this endpoint.

x-portkey-api-key

string

Your provider API key.

x-portkey-config

string

A JSON config object or config ID that defines routing, fallbacks, retries, and more.

x-portkey-virtual-key

string

A virtual key ID from Portkey Cloud.

Request body

model

string

required

The model identifier (e.g. claude-opus-4-5, claude-3-5-sonnet-20241022).

messages

object[]

required

The conversation messages. Only user and assistant roles are accepted; system instructions go in the top-level system field.

Show properties

role

string

required

Either user or assistant.

content

string | object[]

required

The message content. A plain string or an array of typed content blocks (text, image, document, tool_use, tool_result, etc.).

max_tokens

integer

required

The maximum number of tokens to generate. The request fails if the prompt plus max_tokens exceeds the model’s context length.

system

string | object[]

The system prompt. Accepts a plain string or an array of TextBlockParam objects for structured system prompts (including prompt caching with cache_control).

temperature

number

Sampling temperature between 0 and 1 for most Claude models.

top_p

number

Nucleus sampling threshold. Only tokens in the top top_p cumulative probability are considered.

top_k

integer

Only sample from the top top_k tokens at each step.

stream

boolean

default:"false"

When true, responses stream as server-sent events.

stop_sequences

string[]

Custom sequences that cause the model to stop generating.

tools

object[]

Definitions of tools the model may call.

Show properties

name

string

required

The tool name.

description

string

A description of what the tool does.

input_schema

object

required

JSON Schema describing the tool’s input parameters.

tool_choice

object

Controls tool selection. One of {"type": "auto"}, {"type": "any"}, {"type": "tool", "name": "..."}, or {"type": "none"}.

thinking

object

Configuration for extended thinking (reasoning).

Show properties

type

string

required

enabled to turn on thinking, disabled to turn it off.

budget_tokens

integer

The number of tokens Claude may use for internal reasoning. Required when type is enabled.

metadata

object

Optional metadata about the request.

Show properties

user_id

string

An external identifier for the end user.

service_tier

string

Capacity tier to use. One of auto or standard_only.

Response

string

Unique message identifier.

type

string

Always message.

role

string

Always assistant.

model

string

The model that produced the response.

content

object[]

The generated content as an array of typed content blocks.

Show properties

type

string

Block type: text, tool_use, thinking, or redacted_thinking.

text

string

The generated text. Present when type is text.

string

Tool call identifier. Present when type is tool_use.

name

string

Tool name. Present when type is tool_use.

input

object

Tool input arguments. Present when type is tool_use.

thinking

string

The model’s reasoning. Present when type is thinking.

stop_reason

string

Why the model stopped. One of end_turn, max_tokens, stop_sequence, or tool_use.

stop_sequence

string | null

The stop sequence that triggered a stop, if any.

usage

object

Token usage for the request.

Show properties

input_tokens

integer

Number of input tokens used.

output_tokens

integer

Number of output tokens generated.

cache_creation_input_tokens

integer

Tokens written to the prompt cache (when prompt caching is active).

cache_read_input_tokens

integer

Tokens read from the prompt cache (when prompt caching is active).

Code examples

curl http://localhost:8787/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-portkey-provider: anthropic" \
  -H "Authorization: Bearer $ANTHROPIC_API_KEY" \
  -d '{
    "model": "claude-opus-4-5",
    "max_tokens": 1024,
    "system": "You are a helpful assistant.",
    "messages": [
      {"role": "user", "content": "What is the Portkey AI Gateway?"}
    ]
  }'

POST /v1/messages/count_tokens

Count the number of tokens that a given request would consume without generating a response. Useful for checking whether a prompt fits within a model’s context window before making the full request. The request body has the same structure as POST /v1/messages.

curl

curl http://localhost:8787/v1/messages/count_tokens \
  -H "Content-Type: application/json" \
  -H "x-portkey-provider: anthropic" \
  -H "Authorization: Bearer $ANTHROPIC_API_KEY" \
  -d '{
    "model": "claude-opus-4-5",
    "system": "You are a helpful assistant.",
    "messages": [
      {"role": "user", "content": "What is the Portkey AI Gateway?"}
    ]
  }'

Response

input_tokens

integer

The number of tokens the request would use, including the system prompt and all messages.

Overview

Chat

Multimodal

Files & Batches

Other

POST /v1/messages

Request headers

Request body

Response

Code examples

POST /v1/messages/count_tokens

Response

Build docs developers (and LLMs) love

Overview

Chat

Multimodal

Files & Batches

Other

​POST /v1/messages

​Request headers

​Request body

​Response

​Code examples

​POST /v1/messages/count_tokens

​Response

Build docs developers (and LLMs) love

POST /v1/messages

Request headers

Request body

Response

Code examples

POST /v1/messages/count_tokens

Response