
The /v1/completions endpoint provides raw text completion — you supply a prompt string and the model continues it. Unlike chat completions, there is no message formatting or chat template applied; the prompt is passed directly to the model. This is the right endpoint for legacy pipelines, fill-in-the-middle tasks, or any case where you want precise control over the exact text fed to the model.
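Fill-in-the-middle, for instance, works by concatenating sentinel tokens around the prefix and suffix and sending the result as a raw prompt. A minimal sketch — the sentinel tokens below are hypothetical placeholders; each model defines its own FIM tokens, so check the model card for the exact format:

```python
# Sketch: building a fill-in-the-middle prompt for /v1/completions.
# The sentinel tokens here are illustrative placeholders, not the
# tokens of any particular model.

def build_fim_prompt(prefix: str, suffix: str,
                     pre_tok: str = "<FIM_PREFIX>",
                     suf_tok: str = "<FIM_SUFFIX>",
                     mid_tok: str = "<FIM_MIDDLE>") -> str:
    """Concatenate sentinel tokens so the model generates the middle."""
    return f"{pre_tok}{prefix}{suf_tok}{suffix}{mid_tok}"

prompt = build_fim_prompt("def add(a, b):\n    return ",
                          "\n\nprint(add(1, 2))")
# The resulting string is sent verbatim as the "prompt" field;
# the server applies no chat template on top of it.
```

Because the prompt is passed through untouched, the exact placement of the sentinel tokens is entirely under the client's control.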

Request

POST /v1/completions

Parameters

model
string
required
The model name or alias to use. Use GET /v1/models to list available models.
prompt
string | string[]
required
The prompt(s) to complete. Accepts a single string or a list of strings. When a list is provided, each prompt is completed independently.
stream
boolean
default:"false"
If true, the server streams partial token deltas as SSE events. The stream ends with data: [DONE].
stream_options
object
Additional options for the streaming response. Only applies when stream is true.
max_tokens
number
Maximum number of tokens to generate. Defaults to the server’s max_tokens setting.
temperature
number
Sampling temperature. Higher values produce more varied output.
top_p
number
Nucleus sampling cutoff: only tokens in the smallest set whose cumulative probability reaches top_p are considered.
min_p
number
Minimum probability threshold for sampling: tokens whose probability falls below min_p, scaled by the probability of the most likely token, are excluded.
stop
string | string[]
Stop sequence(s). Generation halts when any sequence is produced.
seed
number
Seed for reproducible generation. Best-effort on Apple Silicon.
presence_penalty
number
Penalty for tokens already present in the output.
frequency_penalty
number
Penalty proportional to token frequency in the output so far.
xtc_probability
number
Probability that XTC (exclude top choices) sampling is applied at a given sampling step.
xtc_threshold
number
Probability threshold above which top-ranked tokens become candidates for exclusion under XTC.
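With stream set to true, the response arrives as SSE lines of the form data: {...}, each carrying a partial text delta, and ends with data: [DONE]. A minimal client-side sketch, assuming each streamed chunk mirrors the non-streaming shape with a text field on each choice (the usual OpenAI-compatible layout):

```python
import json

# Sketch: assembling streamed completion text from SSE "data:" lines.
# Assumes each chunk carries choices[0]["text"] with a token delta.

def collect_stream(lines):
    """Concatenate text deltas from SSE lines until [DONE]."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        parts.append(chunk["choices"][0]["text"])
    return "".join(parts)

sample = [
    'data: {"choices": [{"index": 0, "text": " jumps"}]}',
    'data: {"choices": [{"index": 0, "text": " over"}]}',
    "data: [DONE]",
]
text = collect_stream(sample)  # " jumps over"
```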

Examples

curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "your-model",
    "prompt": "The quick brown fox",
    "max_tokens": 50,
    "temperature": 0.7
  }'
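The same request in Python, sketched with only the standard library. Building the payload runs offline; send_completion performs the actual HTTP call against a running server:

```python
import json
import urllib.request

# Sketch: the curl example above, expressed with urllib.

def build_request(model: str, prompt: str, **params) -> bytes:
    """Serialize a /v1/completions request body to JSON bytes."""
    body = {"model": model, "prompt": prompt, **params}
    return json.dumps(body).encode("utf-8")

def send_completion(base_url: str, payload: bytes) -> dict:
    """POST the payload to /v1/completions and decode the JSON response."""
    req = urllib.request.Request(
        f"{base_url}/v1/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

payload = build_request("your-model", "The quick brown fox",
                        max_tokens=50, temperature=0.7)
# result = send_completion("http://localhost:8000", payload)
```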

Response

id
string
Unique identifier for the completion, prefixed with cmpl-.
object
string
Always "text_completion".
created
number
Unix timestamp of when the completion was created.
model
string
The model that generated the completion.
choices
object[]
Array of completion choices. Each choice's index maps it back to the corresponding prompt when a list of prompts is provided.
usage
object
Token usage statistics.

Example response

{
  "id": "cmpl-abc123",
  "object": "text_completion",
  "created": 1746835200,
  "model": "your-model",
  "choices": [
    {
      "index": 0,
      "text": " jumps over the lazy dog near the old oak tree.",
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 5,
    "completion_tokens": 12,
    "total_tokens": 17
  }
}
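Extracting the generated text is a matter of walking choices; when a list of prompts was sent, each choice's index identifies the prompt it belongs to. A small sketch using the example response above:

```python
import json

# Sketch: pulling completion text out of a /v1/completions response.

response = json.loads("""
{
  "id": "cmpl-abc123",
  "object": "text_completion",
  "created": 1746835200,
  "model": "your-model",
  "choices": [
    {"index": 0,
     "text": " jumps over the lazy dog near the old oak tree.",
     "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 5, "completion_tokens": 12, "total_tokens": 17}
}
""")

# Map each choice back to its prompt position via "index".
texts = {c["index"]: c["text"] for c in response["choices"]}
first = texts[0]

# A finish_reason of "length" means generation hit max_tokens.
truncated = any(c["finish_reason"] == "length"
                for c in response["choices"])
```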
