Documentation Index
Fetch the complete documentation index at: https://mintlify.com/jundot/omlx/llms.txt
Use this file to discover all available pages before exploring further.
The /v1/completions endpoint provides raw text completion: you supply a prompt string and the model continues it. Unlike chat completions, no message formatting or chat template is applied; the prompt is passed directly to the model. This is the right endpoint for legacy pipelines, fill-in-the-middle tasks, or any case where you want precise control over the exact text fed to the model.
Request
POST /v1/completions
Parameters
- `model`: The model name or alias to use. Use `GET /v1/models` to list available models.
- `prompt`: The prompt(s) to complete. Accepts a single string or a list of strings. When a list is provided, each prompt is completed independently.
- `stream`: If `true`, the server streams partial token deltas as SSE events. The stream ends with `data: [DONE]`.
- `max_tokens`: Maximum number of tokens to generate. Defaults to the server's `max_tokens` setting.
- `temperature`: Sampling temperature. Higher values produce more varied output.
- `top_p`: Nucleus sampling cutoff. Only tokens within the top probability mass summing to `top_p` are sampled.
- `min_p`: Minimum probability threshold for sampling.
- `stop`: Stop sequence(s). Generation halts when any sequence is produced.
- `seed`: Seed for reproducible generation. Best-effort on Apple Silicon.
- `presence_penalty`: Penalty for tokens already present in the output.
- `frequency_penalty`: Penalty proportional to token frequency in the output so far.
- `xtc_probability`: XTC (exclude top choices) sampling probability.
- `xtc_threshold`: XTC sampling probability threshold.
Examples
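A minimal sketch of a non-streaming request using only the standard library. The base URL, port, and model name `llama-3.1-8b` are placeholders, not values from this page; list real model names via `GET /v1/models`.

```python
import json
import urllib.request

def build_completion_request(model, prompt, **params):
    """Assemble a /v1/completions request body; extra keyword arguments
    (max_tokens, temperature, stop, ...) pass through unchanged."""
    body = {"model": model, "prompt": prompt}
    body.update(params)
    return body

def complete(base_url, body):
    """POST the body to /v1/completions and return the parsed JSON response."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

body = build_completion_request(
    "llama-3.1-8b",            # placeholder model name
    "The capital of France is",
    max_tokens=16,
    temperature=0.7,
    stop=["\n"],
)
print(json.dumps(body, indent=2))

# With a server running locally you would then call:
# result = complete("http://localhost:8080", body)
# print(result["choices"][0]["text"])
```

Because the prompt is passed to the model verbatim, any formatting (trailing spaces, newlines) is part of the completion context.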
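With `stream` set to `true`, the reply arrives as SSE `data:` lines terminated by `data: [DONE]`. A sketch of the client-side framing; the sample lines below are mock data, and the per-choice `text` delta field is an assumption modeled on the non-streaming schema:

```python
import json

# Mock SSE lines as a server might emit them with stream=true (illustrative data).
sse_lines = [
    'data: {"id": "cmpl-abc123", "object": "text_completion", "choices": [{"index": 0, "text": " Par"}]}',
    'data: {"id": "cmpl-abc123", "object": "text_completion", "choices": [{"index": 0, "text": "is."}]}',
    "data: [DONE]",
]

def iter_deltas(lines):
    """Yield the text delta from each SSE data line, stopping at [DONE]."""
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip comments / keep-alive lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            return
        yield json.loads(payload)["choices"][0]["text"]

completion = "".join(iter_deltas(sse_lines))
print(completion)  # " Paris."
```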
Response
- `id`: Unique identifier for the completion, prefixed with `cmpl-`.
- `object`: Always `"text_completion"`.
- `created`: Unix timestamp of when the completion was created.
- `model`: The model that generated the completion.
- `choices`: Array of completion choices, one per prompt.
- `usage`: Token usage statistics.
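Putting the fields together, a response can be unpacked like this. The JSON below is an illustrative mock, and the per-choice fields (`index`, `text`, `finish_reason`) are assumptions following the OpenAI completions convention:

```python
import json

# Illustrative mock of a /v1/completions response using the fields above.
raw = json.dumps({
    "id": "cmpl-abc123",
    "object": "text_completion",
    "created": 1700000000,
    "model": "llama-3.1-8b",
    "choices": [
        {"index": 0, "text": " Paris.", "finish_reason": "stop"}
    ],
    "usage": {"prompt_tokens": 6, "completion_tokens": 3, "total_tokens": 9},
})

resp = json.loads(raw)
assert resp["object"] == "text_completion"

# One choice per prompt; with a list of prompts, match choices back by index.
texts = [choice["text"] for choice in resp["choices"]]
print(texts[0])                        # " Paris."
print(resp["usage"]["total_tokens"])   # 9
```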