
The /v1/completions endpoint provides raw text completion — you supply a prompt string and the model continues it. Unlike chat completions, there is no message formatting or chat template applied; the prompt is passed directly to the model. This is the right endpoint for legacy pipelines, fill-in-the-middle tasks, or any case where you want precise control over the exact text fed to the model.
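Fill-in-the-middle, for instance, works by concatenating sentinel tokens around the prefix and suffix and sending the result as a raw prompt. A minimal sketch — the sentinel tokens below are hypothetical placeholders; each model defines its own FIM tokens, so check the model card for the exact format:

```python
# Sketch: building a fill-in-the-middle prompt for /v1/completions.
# The sentinel tokens here are illustrative placeholders, not the
# tokens of any particular model.

def build_fim_prompt(prefix: str, suffix: str,
                     pre_tok: str = "<FIM_PREFIX>",
                     suf_tok: str = "<FIM_SUFFIX>",
                     mid_tok: str = "<FIM_MIDDLE>") -> str:
    """Concatenate sentinel tokens so the model generates the middle."""
    return f"{pre_tok}{prefix}{suf_tok}{suffix}{mid_tok}"

prompt = build_fim_prompt("def add(a, b):\n    return ",
                          "\n\nprint(add(1, 2))")
# The resulting string is sent verbatim as the "prompt" field;
# the server applies no chat template on top of it.
```

Because the prompt is passed through untouched, the exact placement of the sentinel tokens is entirely under the client's control.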

Request

POST /v1/completions

Parameters

model
string
required
The model name or alias to use. Use GET /v1/models to list available models.
prompt
string | string[]
required
The prompt(s) to complete. Accepts a single string or a list of strings. When a list is provided, each prompt is completed independently.
stream
boolean
default:"false"
If true, the server streams partial token deltas as SSE events. The stream ends with data: [DONE].
stream_options
object
Additional options for the streaming response. Only applies when stream is true.
max_tokens
number
Maximum number of tokens to generate. Defaults to the server’s max_tokens setting.
temperature
number
Sampling temperature. Higher values produce more varied output.
top_p
number
Nucleus sampling cutoff: only tokens in the smallest set whose cumulative probability reaches top_p are considered.
min_p
number
Minimum probability threshold for sampling: tokens whose probability falls below min_p, scaled by the probability of the most likely token, are excluded.
stop
string | string[]
Stop sequence(s). Generation halts when any sequence is produced.
seed
number
Seed for reproducible generation. Best-effort on Apple Silicon.
presence_penalty
number
Penalty for tokens already present in the output.
frequency_penalty
number
Penalty proportional to token frequency in the output so far.
xtc_probability
number
Probability that XTC (exclude top choices) sampling is applied at a given sampling step.
xtc_threshold
number
Probability threshold above which top-ranked tokens become candidates for exclusion under XTC.
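With stream set to true, the response arrives as SSE lines of the form data: {...}, each carrying a partial text delta, and ends with data: [DONE]. A minimal client-side sketch, assuming each streamed chunk mirrors the non-streaming shape with a text field on each choice (the usual OpenAI-compatible layout):

```python
import json

# Sketch: assembling streamed completion text from SSE "data:" lines.
# Assumes each chunk carries choices[0]["text"] with a token delta.

def collect_stream(lines):
    """Concatenate text deltas from SSE lines until [DONE]."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        parts.append(chunk["choices"][0]["text"])
    return "".join(parts)

sample = [
    'data: {"choices": [{"index": 0, "text": " jumps"}]}',
    'data: {"choices": [{"index": 0, "text": " over"}]}',
    "data: [DONE]",
]
text = collect_stream(sample)  # " jumps over"
```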

Examples

curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "your-model",
    "prompt": "The quick brown fox",
    "max_tokens": 50,
    "temperature": 0.7
  }'
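The same request in Python, sketched with only the standard library. Building the payload runs offline; send_completion performs the actual HTTP call against a running server:

```python
import json
import urllib.request

# Sketch: the curl example above, expressed with urllib.

def build_request(model: str, prompt: str, **params) -> bytes:
    """Serialize a /v1/completions request body to JSON bytes."""
    body = {"model": model, "prompt": prompt, **params}
    return json.dumps(body).encode("utf-8")

def send_completion(base_url: str, payload: bytes) -> dict:
    """POST the payload to /v1/completions and decode the JSON response."""
    req = urllib.request.Request(
        f"{base_url}/v1/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

payload = build_request("your-model", "The quick brown fox",
                        max_tokens=50, temperature=0.7)
# result = send_completion("http://localhost:8000", payload)
```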

Response

id
string
Unique identifier for the completion, prefixed with cmpl-.
object
string
Always "text_completion".
created
number
Unix timestamp of when the completion was created.
model
string
The model that generated the completion.
choices
object[]
Array of completion choices. Each choice's index maps it back to the corresponding prompt when a list of prompts is provided.
usage
object
Token usage statistics.

Example response

{
  "id": "cmpl-abc123",
  "object": "text_completion",
  "created": 1746835200,
  "model": "your-model",
  "choices": [
    {
      "index": 0,
      "text": " jumps over the lazy dog near the old oak tree.",
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 5,
    "completion_tokens": 12,
    "total_tokens": 17
  }
}
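Extracting the generated text is a matter of walking choices; when a list of prompts was sent, each choice's index identifies the prompt it belongs to. A small sketch using the example response above:

```python
import json

# Sketch: pulling completion text out of a /v1/completions response.

response = json.loads("""
{
  "id": "cmpl-abc123",
  "object": "text_completion",
  "created": 1746835200,
  "model": "your-model",
  "choices": [
    {"index": 0,
     "text": " jumps over the lazy dog near the old oak tree.",
     "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 5, "completion_tokens": 12, "total_tokens": 17}
}
""")

# Map each choice back to its prompt position via "index".
texts = {c["index"]: c["text"] for c in response["choices"]}
first = texts[0]

# A finish_reason of "length" means generation hit max_tokens.
truncated = any(c["finish_reason"] == "length"
                for c in response["choices"])
```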
