POST /v1/messages
Creates a message using the Anthropic Messages API format. This endpoint lets you call Anthropic models (and compatible providers) with the native request structure — including system as a top-level field, Anthropic-style content blocks, and extended thinking configuration — without translating to the OpenAI format.
This endpoint is intended for Anthropic-native integrations. For cross-provider routing with a unified format, use chat completions instead.
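As a sketch of the difference, a native Messages request keeps the system prompt at the top level and can use typed content blocks inside each message. The model name and payload values below are illustrative:

```python
# Illustrative payload in the native Anthropic Messages format:
# "system" is a top-level field, and message content can be a list
# of typed content blocks rather than a single string.
native_request = {
    "model": "claude-3-5-sonnet-20241022",
    "max_tokens": 1024,
    "system": "You are a concise assistant.",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Summarize this in one sentence."}
            ],
        }
    ],
}

# The OpenAI-style equivalent would instead carry the system prompt
# inside the messages array as {"role": "system", "content": "..."};
# this endpoint avoids that translation.
```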
Request headers
x-portkey-provider: The provider to route the request to. Typically anthropic for this endpoint.
Authorization: Your provider API key.
x-portkey-config: A JSON config object or config ID that defines routing, fallbacks, retries, and more.
x-portkey-virtual-key: A virtual key ID from Portkey Cloud.
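A minimal sketch of assembling these headers, assuming the x-portkey-* header names described above; the virtual key ID is a placeholder:

```python
import os

# Route directly to Anthropic with a provider API key.
headers = {
    "content-type": "application/json",
    "x-portkey-provider": "anthropic",
    "Authorization": f"Bearer {os.environ.get('ANTHROPIC_API_KEY', '')}",
}

# Alternatively, reference a stored virtual key instead of sending a
# raw provider key ("vk-example-123" is a placeholder ID).
virtual_key_headers = {
    "content-type": "application/json",
    "x-portkey-virtual-key": "vk-example-123",
}
```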
Request body
model: The model identifier (e.g. claude-opus-4-5, claude-3-5-sonnet-20241022).
messages: The conversation messages. Only user and assistant roles are accepted; system instructions go in the top-level system field.
max_tokens: The maximum number of tokens to generate. The request fails if the prompt plus max_tokens exceeds the model's context length.
system: The system prompt. Accepts a plain string or an array of TextBlockParam objects for structured system prompts (including prompt caching with cache_control).
temperature: Sampling temperature between 0 and 1 for most Claude models.
top_p: Nucleus sampling threshold. Only tokens within the top top_p cumulative probability are considered.
top_k: Only sample from the top top_k tokens at each step.
stream: When true, responses stream as server-sent events.
stop_sequences: Custom sequences that cause the model to stop generating.
tools: Definitions of tools the model may call.
tool_choice: Controls tool selection. One of {"type": "auto"}, {"type": "any"}, {"type": "tool", "name": "..."}, or {"type": "none"}.
thinking: Configuration for extended thinking (reasoning).
metadata: Optional metadata about the request.
service_tier: Capacity tier to use. One of auto or standard_only.
Response
id: Unique message identifier.
type: Always message.
role: Always assistant.
model: The model that produced the response.
content: The generated content as an array of typed content blocks.
stop_reason: Why the model stopped. One of end_turn, max_tokens, stop_sequence, or tool_use.
stop_sequence: The stop sequence that triggered a stop, if any.
usage: Token usage for the request.
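A minimal sketch of reading the typed content blocks and stop_reason from a response body; the sample response values are illustrative:

```python
# Illustrative response body in the Messages format.
sample_response = {
    "id": "msg_012345",
    "type": "message",
    "role": "assistant",
    "model": "claude-3-5-sonnet-20241022",
    "content": [
        {"type": "text", "text": "Here is the summary."},
    ],
    "stop_reason": "end_turn",
    "stop_sequence": None,
    "usage": {"input_tokens": 42, "output_tokens": 8},
}

def extract_text(response: dict) -> str:
    """Concatenate the text blocks from a Messages response,
    skipping non-text blocks such as tool_use."""
    return "".join(
        block["text"]
        for block in response["content"]
        if block["type"] == "text"
    )

def hit_token_limit(response: dict) -> bool:
    """True if generation stopped because max_tokens was reached."""
    return response["stop_reason"] == "max_tokens"
```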
Code examples
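The original examples were not preserved in this extract; the following is a hedged Python sketch of a non-streaming call using only the standard library. The base URL, header names, and model name are assumptions; substitute your own gateway endpoint and credentials.

```python
import json
import urllib.request

# Assumed endpoint; substitute your gateway's base URL.
URL = "https://api.portkey.ai/v1/messages"

def create_message(api_key: str, payload: dict) -> dict:
    """POST a Messages-format payload and return the parsed JSON response."""
    req = urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "content-type": "application/json",
            "x-portkey-provider": "anthropic",  # assumed header name
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

payload = {
    "model": "claude-3-5-sonnet-20241022",
    "max_tokens": 256,
    "system": "Answer briefly.",
    "messages": [{"role": "user", "content": "What is a context window?"}],
}
# response = create_message(os.environ["ANTHROPIC_API_KEY"], payload)
```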
POST /v1/messages/count_tokens
Counts the number of tokens that a given request would consume without generating a response. Useful for checking whether a prompt fits within a model's context window before making the full request. The request body has the same structure as POST /v1/messages.
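As a sketch, a count request reuses the message-request body shape, and the reported count can be checked against the model's context window before the full call. The model name, token counts, and context window size below are illustrative:

```python
# A count_tokens request body has the same shape as /v1/messages
# (no generation settings such as max_tokens are needed).
count_request = {
    "model": "claude-3-5-sonnet-20241022",
    "system": "Answer briefly.",
    "messages": [{"role": "user", "content": "Explain tokenization."}],
}

def fits_context(input_tokens: int, max_tokens: int, context_window: int) -> bool:
    """Check that the prompt tokens plus the requested completion
    budget fit within the model's context window."""
    return input_tokens + max_tokens <= context_window

# Suppose the endpoint reported {"input_tokens": 1200} (illustrative):
reported = {"input_tokens": 1200}
ok = fits_context(reported["input_tokens"], max_tokens=1024, context_window=200_000)
```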
Response
input_tokens: The number of tokens the request would use, including the system prompt and all messages.