Documentation Index
Fetch the complete documentation index at: https://mintlify.com/jundot/omlx/llms.txt
Use this file to discover all available pages before exploring further.
/v1/chat/completions endpoint is a drop-in replacement for the OpenAI Chat Completions API. It accepts the same request shape and returns the same response structure, so any client or library built for OpenAI works without modification — just change the base URL. The endpoint supports streaming via SSE, function/tool calling, JSON structured output, reasoning model output, and vision inputs for vision-language models (VLMs).
Request
POST /v1/chat/completions
Parameters
- `model`: The model name or alias to use. Must match a model discovered in your model directory. You can use either the directory name or the alias configured in per-model settings. Use `GET /v1/models` to list available models.
- `messages`: The conversation history as an array of message objects.
- `stream`: If `true`, the server streams partial message deltas as SSE events (one `data:` line per token). The stream ends with `data: [DONE]`.
- `stream_options`: Options for streaming responses.
- `temperature`: Sampling temperature. Higher values produce more random output. Overrides the per-model default when set.
- `top_p`: Nucleus sampling probability cutoff. Only tokens comprising the top `top_p` probability mass are considered.
- `top_k`: Top-k sampling: restricts sampling to the `top_k` most probable tokens at each step.
- `min_p`: Minimum probability threshold for token sampling.
- `max_tokens`: Maximum number of tokens to generate. Defaults to the server’s `max_tokens` setting (32768 by default).
- `stop`: Stop sequence(s). Generation halts when any of these strings is produced. Accepts a single string or an array.
- `seed`: Random seed for reproducible outputs. Best-effort: identical seeds on the same hardware produce identical outputs.
- `tools`: List of tools the model may call. Each tool is an object with `type: "function"` and a `function` object containing `name`, `description`, and `parameters` (JSON Schema).
- `tool_choice`: Controls when the model calls a tool. `"auto"` lets the model decide, `"none"` disables tools, or pass `{"type": "function", "function": {"name": "my_func"}}` to force a specific tool.
- `response_format`: Enforce structured output format. Set `type` to `"json_object"` to require valid JSON, or `"json_schema"` with a `json_schema` definition to enforce a specific schema.
- Guided decoding: vLLM-compatible structured output options. Supports `json` (JSON schema), `regex`, `choice`, and `grammar` fields. Pass via `extra_body` in the OpenAI Python client.
- Chat template arguments: Extra keyword arguments passed directly to the model’s chat template (e.g., `{"enable_thinking": true}`, `{"reasoning_effort": "low"}`).
- Thinking budget: Maximum number of thinking tokens for reasoning models. `null` means unlimited. Applies when the model supports adaptive thinking.
- `presence_penalty`: Penalty for token presence in the generated text so far. Positive values reduce repetition.
- `frequency_penalty`: Penalty proportional to token frequency in the generated text so far.
- XTC probability: XTC (exclude top choices) sampling probability.
- XTC threshold: XTC sampling probability threshold.
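The parameters above can be sketched as a plain request body. This is a minimal illustration using only the standard library; the model name and server address are placeholder assumptions:

```python
import json
import urllib.request

# Request body built from the parameters above. "my-model" and the
# server address are placeholders for your own setup.
payload = {
    "model": "my-model",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain nucleus sampling in one sentence."},
    ],
    "temperature": 0.7,   # overrides the per-model default
    "top_p": 0.9,         # keep only the top 90% probability mass
    "max_tokens": 256,
    "stop": ["\n\n"],     # a single string is also accepted
    "seed": 42,           # best-effort reproducibility
}

req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",  # assumed host and port
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# response = json.load(urllib.request.urlopen(req))  # needs a running server
```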
Vision inputs
For vision-language models, pass image content as a content part array in the user message. Both URL and base64 data URIs are accepted. The `detail` field accepts `"auto"`, `"low"`, or `"high"`.
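A sketch of one such user message, with a text part, a URL image, and a base64 data URI. The URLs are placeholders, and the byte string merely stands in for a real image file (in practice, `open("photo.png", "rb").read()`):

```python
import base64

# Illustrative bytes standing in for a real image file.
image_bytes = b"\x89PNG\r\n\x1a\n illustrative bytes"
data_uri = "data:image/png;base64," + base64.b64encode(image_bytes).decode()

# One user message mixing a text part with a URL image and a base64 image.
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What do these two images show?"},
        {
            "type": "image_url",
            "image_url": {"url": "https://example.com/cat.png", "detail": "auto"},
        },
        {
            "type": "image_url",
            "image_url": {"url": data_uri, "detail": "high"},
        },
    ],
}
```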
Examples
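As one example, here is a hedged sketch of consuming the SSE stream by hand (client libraries do this automatically when `stream` is set). The event shape assumed here follows the streaming format described above: one `data:` line per delta, terminated by `data: [DONE]`:

```python
import json

def parse_sse_stream(lines):
    """Yield content deltas from chat-completion SSE lines until [DONE]."""
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank separator / keep-alive lines
        data = line[len("data: "):]
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]

# Hand-written events shaped like the server's streaming output:
events = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print("".join(parse_sse_stream(events)))  # → Hello
```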
Response
- `id`: Unique identifier for the completion, prefixed with `chatcmpl-`.
- `object`: Always `"chat.completion"`.
- `created`: Unix timestamp of when the completion was created.
- `model`: The model that generated the response.
- `choices`: Array of completion choices.
- `usage`: Token usage and optional timing metrics.
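A sketch of reading those fields from a decoded response body. The values are illustrative, and the inner `message`/`finish_reason` shape of each choice is assumed to follow the OpenAI response format:

```python
# A response shaped like the fields above (all values illustrative).
response = {
    "id": "chatcmpl-abc123",
    "object": "chat.completion",
    "created": 1700000000,
    "model": "my-model",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello!"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 12, "completion_tokens": 3, "total_tokens": 15},
}

text = response["choices"][0]["message"]["content"]
total = response["usage"]["total_tokens"]
```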