POST /v1/completions — legacy text completion endpoint

The /v1/completions endpoint provides legacy text completion behavior: you supply a raw text prompt and the model continues it. This differs from chat completions, which use a structured message list. The endpoint is maintained for compatibility with older OpenAI SDK versions and tools that target the original text-davinci-003-style interface.

For modern chat-oriented models such as GPT-4o, Claude, or Gemini, prefer /v1/chat/completions. The completions endpoint is best suited for base models or instruct-tuned models that expect a raw prompt rather than a conversation.

Method and path

POST /v1/completions

Authentication

Include your Bearer token in the Authorization header on every request.

Authorization: Bearer <your-access-token>
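As a rough illustration of the header above, the following Python sketch prepares (but does not send) an authenticated request using only the standard library. The host `relay.example.com`, the token value, and the helper name `build_request` are placeholders invented for this example, not part of MonoRelay.

```python
import json
import urllib.request


def build_request(base_url: str, token: str, payload: dict) -> urllib.request.Request:
    """Prepare a POST to /v1/completions carrying a Bearer token.

    Hypothetical helper: nothing is sent until the request is passed
    to urllib.request.urlopen().
    """
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder host and token; substitute the values for your own deployment.
req = build_request(
    "https://relay.example.com",
    "my-token",
    {"model": "gpt-3.5-turbo-instruct", "prompt": "Hi"},
)
print(req.get_method(), req.full_url)
print(req.get_header("Authorization"))
```

Passing `req` to `urllib.request.urlopen` would perform the call; any HTTP client that lets you set the Authorization header works the same way.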

Request body

model

string

required

Model name, alias, or model@provider syntax. MonoRelay resolves the model through the same routing rules used by the chat endpoint.

prompt

string | string[]

required

The prompt text (or array of prompts) to complete. The model generates a continuation starting from where this text ends.

max_tokens

integer

Maximum number of tokens to generate. When omitted, the upstream provider’s default limit applies.

temperature

number

Sampling temperature between 0 and 2. Lower values produce more focused, deterministic output.

stream

boolean

default: false

When true, the response is delivered as a server-sent events (SSE) stream of chunks, terminated by data: [DONE].

stop

string | string[]

One or more sequences at which generation should stop. The stop sequence is not included in the output.

n

integer

default: 1

Number of completion choices to generate for the prompt.
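The fields above can be assembled into a request body with a small helper. This is a minimal sketch, not MonoRelay's own validation: the function name `build_payload` is hypothetical, the choice-count field is assumed to be named `n` as in the OpenAI API, and the checks only mirror the constraints stated in the descriptions (required model and prompt, temperature between 0 and 2).

```python
from typing import Optional, Sequence, Union


def build_payload(
    model: str,
    prompt: Union[str, Sequence[str]],
    max_tokens: Optional[int] = None,
    temperature: Optional[float] = None,
    stream: bool = False,
    stop: Union[str, Sequence[str], None] = None,
    n: int = 1,
) -> dict:
    """Build a /v1/completions request body (illustrative helper)."""
    if not model:
        raise ValueError("model is required")
    # prompt may be a single string or a list of strings
    prompts = prompt if isinstance(prompt, str) else list(prompt)
    if not prompts:
        raise ValueError("prompt is required")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if n < 1:
        raise ValueError("n must be at least 1")

    payload: dict = {"model": model, "prompt": prompts}
    # Optional fields are left out when unset so that the documented
    # defaults (or the upstream provider's limits) apply.
    if max_tokens is not None:
        payload["max_tokens"] = max_tokens
    if temperature is not None:
        payload["temperature"] = temperature
    if stream:
        payload["stream"] = True
    if stop is not None:
        payload["stop"] = stop if isinstance(stop, str) else list(stop)
    if n != 1:
        payload["n"] = n
    return payload


body = build_payload(
    "gpt-3.5-turbo-instruct",
    "The capital of France is",
    max_tokens=10,
    temperature=0,
)
print(body)
```

Omitting unset fields keeps the body small and avoids overriding provider defaults by accident.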

Example

curl https://<host>/v1/completions \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo-instruct",
    "prompt": "The capital of France is",
    "max_tokens": 10,
    "temperature": 0
  }'

The response follows the standard OpenAI completions format:

{
  "id": "cmpl-...",
  "object": "text_completion",
  "created": 1710000000,
  "model": "gpt-3.5-turbo-instruct",
  "choices": [
    {
      "text": " Paris.",
      "index": 0,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 7,
    "completion_tokens": 3,
    "total_tokens": 10
  }
}
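Reading the generated text out of this response is a matter of walking the choices array. The sketch below also shows one way to consume the stream variant. It assumes each SSE `data:` line carries a JSON chunk with the same `choices[].text` shape, as in the OpenAI legacy completions stream; this page only states that the stream ends with data: [DONE]. The function names are hypothetical.

```python
import json
from typing import Iterable, Iterator, List


def choice_texts(response: dict) -> List[str]:
    """Return the generated texts, ordered by each choice's index."""
    choices = sorted(response.get("choices", []), key=lambda c: c.get("index", 0))
    return [c.get("text", "") for c in choices]


def iter_stream_texts(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from SSE lines until the [DONE] sentinel.

    Assumption: every data line other than [DONE] holds a JSON chunk
    shaped like a regular completions response.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            yield choice.get("text", "")


response = {
    "object": "text_completion",
    "choices": [{"text": " Paris.", "index": 0, "finish_reason": "stop"}],
}
print(choice_texts(response))

# Hand-written sample stream, not captured from a real server.
sample_stream = [
    'data: {"choices": [{"text": " Par", "index": 0}]}',
    "",
    'data: {"choices": [{"text": "is.", "index": 0}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_stream_texts(sample_stream)))
```

With `n` greater than 1, streamed fragments from different choices interleave, so a real client would group them by `index` rather than joining them directly.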

Error responses

Errors use the same structure as every other MonoRelay endpoint: a JSON body containing an error object. Upstream failures return HTTP 503.

{
  "error": {
    "message": "[openrouter] Provider 'openrouter' is not enabled",
    "type": "provider_disabled"
  }
}
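A client can turn this error body into an exception before touching the completion fields. The following is a minimal sketch under the assumption that every error body carries `error.message` and `error.type` as shown above; the `RelayError` class and `check_response` function are hypothetical names, not part of any MonoRelay SDK.

```python
import json


class RelayError(Exception):
    """Raised when the response body carries an error object."""

    def __init__(self, status: int, error_type: str, message: str) -> None:
        super().__init__(f"HTTP {status} [{error_type}] {message}")
        self.status = status
        self.error_type = error_type
        self.message = message


def check_response(status: int, body: str) -> dict:
    """Return the parsed body, or raise RelayError if it describes a failure."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    if status >= 400 or "error" in data:
        err = data.get("error") or {}
        raise RelayError(
            status,
            err.get("type", "unknown"),
            err.get("message", body),
        )
    return data


error_body = json.dumps(
    {
        "error": {
            "message": "[openrouter] Provider 'openrouter' is not enabled",
            "type": "provider_disabled",
        }
    }
)

try:
    check_response(503, error_body)
except RelayError as exc:
    print(exc.status, exc.error_type)
```

Branching on `error_type` rather than on the message text keeps client logic stable if the wording of messages changes.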
