The /v1/messages endpoint is a drop-in replacement for the Anthropic Messages API (POST https://api.anthropic.com/v1/messages). It accepts the same request and response structure as the official Anthropic API, so the official `anthropic` Python SDK and any other Anthropic-compatible client work after changing only the `base_url`. oMLX supports adaptive thinking/reasoning, tool use, multi-image vision inputs, and SSE streaming using Anthropic's event schema.
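For example, a minimal sketch of the `base_url` swap with the official SDK; the local address, placeholder API key, and model alias are assumptions, so substitute values from your own deployment:

```python
# Sketch: point the official Anthropic SDK at a local oMLX server.
# The base_url, api_key placeholder, and model alias are assumptions.
import anthropic

client = anthropic.Anthropic(
    base_url="http://localhost:8080",  # assumed address of your oMLX server
    api_key="not-needed",              # local server; any placeholder string
)

message = client.messages.create(
    model="my-local-model",  # any model or alias from GET /v1/models
    max_tokens=256,
    messages=[{"role": "user", "content": "Hello!"}],
)
# For a plain prompt the first content block is typically text.
print(message.content[0].text)
```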
Request
POST /v1/messages
Parameters
- `model`: The model name or alias to use. Accepts any model discovered in your model directory. Use `GET /v1/models` to list available models.
- `messages`: The conversation history. Each message has a `role` of `"user"` or `"assistant"`, and `content` as either a plain string or an array of content blocks.
- `max_tokens`: Maximum number of output tokens to generate.
- `system`: System prompt. Accepts a plain string or an array of `SystemContent` blocks (each with `type: "text"`, `text`, and optional `cache_control`).
- `stream`: If `true`, the server streams response events using Anthropic's SSE schema: `message_start`, `content_block_start`, `content_block_delta`, `content_block_stop`, `message_delta`, and `message_stop`. A streaming example appears under Examples below.
- `temperature`: Sampling temperature for generation.
- `top_p`: Nucleus sampling cutoff probability.
- `top_k`: Top-k sampling limit.
- `stop_sequences`: List of stop strings. Generation halts when any sequence is produced.
- `tools`: Tool definitions available to the model. Each tool has `name`, an optional `description`, and `input_schema` (JSON Schema). Anthropic server-side tool types (e.g., `web_search_20250305`) are accepted for client compatibility but dropped before inference, since oMLX cannot execute them locally.
- `tool_choice`: Controls tool selection. `{"type": "auto"}` lets the model decide, `{"type": "any"}` forces a tool call, and `{"type": "tool", "name": "my_tool"}` forces a specific tool.
- `thinking`: Configures adaptive thinking/reasoning. A request sketch combining tools and thinking follows this list.
- Chat template arguments: Extra keyword arguments passed directly to the model's chat template (e.g., `{"enable_thinking": true}`).
- `metadata`: Optional metadata for the request. Accepted for API compatibility; not used by the server.
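Below is a hedged sketch of a request combining `tools`, `tool_choice`, and `thinking`, reusing the client from the sketch above. The `get_weather` tool is hypothetical, and the Anthropic-style shape of the `thinking` payload is an assumption based on the drop-in compatibility claim, not something this page specifies.

```python
# Sketch: tool definitions plus a thinking configuration.
# get_weather is a hypothetical tool; the thinking payload is assumed to
# follow Anthropic's {"type": "enabled", "budget_tokens": ...} shape.
response = client.messages.create(
    model="my-local-model",
    max_tokens=512,
    thinking={"type": "enabled", "budget_tokens": 1024},
    tools=[
        {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "input_schema": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        }
    ],
    tool_choice={"type": "auto"},
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
)

# When the model calls a tool, stop_reason is "tool_use".
for block in response.content:
    if block.type == "tool_use":
        print(block.name, block.input)
```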
Token counting
To count tokens without generating a response, send a request to:

POST /v1/messages/count_tokens
The request body accepts the same fields as /v1/messages (minus generation-only settings such as stream and temperature) and returns an object with a single `input_tokens` count.
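A sketch of token counting through the SDK's `count_tokens` helper, assuming the client and model alias from the earlier sketch:

```python
# Sketch: count input tokens without generating any output.
count = client.messages.count_tokens(
    model="my-local-model",
    messages=[{"role": "user", "content": "How many tokens is this?"}],
)
print(count.input_tokens)
```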
Examples
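A minimal streaming sketch, assuming the same client as in the sketches above. The SDK's `stream` helper consumes the SSE events listed under `stream` and exposes the incremental text:

```python
# Sketch: stream a response; the helper consumes message_start,
# content_block_delta, and the other SSE events internally.
with client.messages.stream(
    model="my-local-model",
    max_tokens=256,
    messages=[{"role": "user", "content": "Write a haiku about MLX."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
    print()
```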
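And a sketch of consuming a non-streaming response, dispatching on each content block's `type` (see the Response section below) and then branching on `stop_reason`. Attribute access follows the `anthropic` SDK's typed objects; `response` is the result of the tool-use sketch above.

```python
# Sketch: handle each content block type, then branch on stop_reason.
for block in response.content:
    if block.type == "thinking":
        print("[thinking]", block.thinking)  # reasoning models only
    elif block.type == "text":
        print(block.text)
    elif block.type == "tool_use":
        print("[tool call]", block.id, block.name, block.input)

if response.stop_reason == "tool_use":
    # Run the requested tool, then send a tool_result content block back
    # in a follow-up user message to continue the conversation.
    pass
```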
Response
- `id`: Unique identifier for the message, prefixed with `msg_`.
- `type`: Always `"message"`.
- `role`: Always `"assistant"`.
- `model`: The model that generated the response.
- `content`: Array of content blocks. Each block has a `type` field:
  - `text`: contains a `text` string
  - `tool_use`: contains `id`, `name`, and `input`
  - `thinking`: contains a `thinking` string (reasoning models only)
- `stop_reason`: Why generation stopped: `"end_turn"`, `"max_tokens"`, `"stop_sequence"`, or `"tool_use"`.
- `stop_sequence`: The stop sequence that triggered the halt, if applicable.

The second sketch under Examples above shows these fields in use.