The /v1/completions endpoint provides legacy text completion behavior: you supply a raw text prompt and the model continues it. This differs from chat completions, which use a structured message list. The endpoint is maintained for compatibility with older OpenAI SDK versions and tools that target the original text-davinci-003-style interface.
For modern chat-oriented models such as GPT-4o, Claude, or Gemini, prefer /v1/chat/completions. The completions endpoint is best suited for base models or instruct-tuned models that expect a raw prompt rather than a conversation.

Method and path

POST /v1/completions

Authentication

Include your Bearer token in the Authorization header on every request.
Authorization: Bearer <your-access-token>
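
As a minimal sketch of attaching the header from Python, the snippet below builds (but does not send) a request using only the standard library. The function name build_completion_request, the host relay.example.com, and the token value are placeholders introduced for illustration; they are not part of MonoRelay.

```python
import json
import urllib.request


def build_completion_request(base_url: str, token: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/completions."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # The Bearer token goes in the Authorization header on every request.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder host and token, for illustration only.
req = build_completion_request(
    "https://relay.example.com",
    "my-access-token",
    {"model": "gpt-3.5-turbo-instruct", "prompt": "Hello"},
)
print(req.get_method(), req.full_url)
```

Passing req to urllib.request.urlopen would send it; that step is left out here so the sketch runs without a live server.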

Request body

model (string, required)
Model name, alias, or model@provider syntax. MonoRelay resolves the model through the same routing rules used by the chat endpoint.

prompt (string | string[], required)
The prompt text (or array of prompts) to complete. The model generates a continuation starting from where this text ends.

max_tokens (integer)
Maximum number of tokens to generate. When omitted, the upstream provider’s default limit applies.

temperature (number)
Sampling temperature between 0 and 2. Lower values produce more focused, deterministic output.

stream (boolean, default: false)
When true, the response is delivered as SSE stream chunks ending with data: [DONE].

stop (string | string[])
One or more sequences at which generation should stop. The stop sequence is not included in the output.

n (integer, default: 1)
Number of completion choices to generate for the prompt.
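
When stream is true, a client has to read the SSE lines and stop at data: [DONE]. The sketch below shows one way to do that in Python. It assumes each chunk is a JSON object whose choices[].text holds the next piece of text, mirroring the non-streaming response shape; that chunk layout is an assumption, not something this page specifies. The helper name iter_stream_text and the sample lines are hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from SSE lines until the [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end of stream
        chunk = json.loads(data)
        # Assumed chunk shape: {"choices": [{"text": "...", "index": 0}]}
        for choice in chunk.get("choices", []):
            text = choice.get("text")
            if text:
                yield text


# Hand-written sample lines standing in for a real stream.
sample = [
    'data: {"choices": [{"text": " Par", "index": 0}]}',
    "",
    'data: {"choices": [{"text": "is.", "index": 0}]}',
    "",
    "data: [DONE]",
]
print(repr("".join(iter_stream_text(sample))))
```

With n greater than 1, fragments for different choices would be interleaved, so a real client would likely group them by index rather than concatenate them as done here.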

Example

curl https://<host>/v1/completions \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo-instruct",
    "prompt": "The capital of France is",
    "max_tokens": 10,
    "temperature": 0
  }'
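
The body of the curl request above can also be assembled in Python. The following is a rough sketch, not MonoRelay code: build_payload is a hypothetical helper that includes only the documented fields, leaves optional ones out when they are unset, and checks the documented 0 to 2 temperature range on the client side.

```python
import json
from typing import Optional, Sequence, Union


def build_payload(
    model: str,
    prompt: Union[str, Sequence[str]],
    max_tokens: Optional[int] = None,
    temperature: Optional[float] = None,
    stream: bool = False,
    stop: Union[str, Sequence[str], None] = None,
    n: int = 1,
) -> dict:
    """Assemble a /v1/completions request body from the documented fields."""
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    payload = {"model": model, "prompt": prompt}
    # Optional fields are omitted when unset so upstream defaults apply.
    optional = {"max_tokens": max_tokens, "temperature": temperature, "stop": stop}
    payload.update({k: v for k, v in optional.items() if v is not None})
    if stream:
        payload["stream"] = True
    if n != 1:
        payload["n"] = n
    return payload


payload = build_payload(
    "gpt-3.5-turbo-instruct",
    "The capital of France is",
    max_tokens=10,
    temperature=0,
)
print(json.dumps(payload, indent=2))
```

Note that temperature=0 is kept in the body because the check is against None, not truthiness.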
The response follows the standard OpenAI completions format:
{
  "id": "cmpl-...",
  "object": "text_completion",
  "created": 1710000000,
  "model": "gpt-3.5-turbo-instruct",
  "choices": [
    {
      "text": " Paris.",
      "index": 0,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 7,
    "completion_tokens": 3,
    "total_tokens": 10
  }
}
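
To illustrate reading this response, here is a small Python sketch that pulls the generated text out of choices and the token counts out of usage. The helper name completion_texts is made up for this example, and the body is the sample response shown above.

```python
import json
from typing import List

# The sample response from this page, as the raw body a client would receive.
RESPONSE_BODY = """
{
  "id": "cmpl-...",
  "object": "text_completion",
  "created": 1710000000,
  "model": "gpt-3.5-turbo-instruct",
  "choices": [
    {"text": " Paris.", "index": 0, "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 7, "completion_tokens": 3, "total_tokens": 10}
}
"""


def completion_texts(body: str) -> List[str]:
    """Return the text of each choice, ordered by its index."""
    data = json.loads(body)
    choices = sorted(data["choices"], key=lambda c: c["index"])
    return [c["text"] for c in choices]


texts = completion_texts(RESPONSE_BODY)
usage = json.loads(RESPONSE_BODY)["usage"]
print(texts, usage["total_tokens"])
```

When n is greater than 1, or prompt is an array, the list holds one entry per choice.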

Error responses

Errors use the same structure as every other MonoRelay endpoint: a JSON body containing an error object. Upstream failures are returned with HTTP 503.
{
  "error": {
    "message": "[openrouter] Provider 'openrouter' is not enabled",
    "type": "provider_disabled"
  }
}
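
A client can turn this body into a structured value before deciding how to react. The sketch below is one possible approach; RelayError and parse_error are hypothetical names, and treating a missing type as "unknown" is an assumption made for the example.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class RelayError:
    status: int
    type: str
    message: str

    @property
    def is_upstream_failure(self) -> bool:
        # This page documents HTTP 503 for upstream failures.
        return self.status == 503


def parse_error(status: int, body: str) -> Optional[RelayError]:
    """Return a RelayError if the body carries an error object, else None."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return None
    error = data.get("error") if isinstance(data, dict) else None
    if not isinstance(error, dict):
        return None
    return RelayError(status, error.get("type", "unknown"), error.get("message", ""))


body = (
    '{"error": {"message": "[openrouter] Provider \'openrouter\' is not enabled", '
    '"type": "provider_disabled"}}'
)
err = parse_error(503, body)
print(err.type, err.is_upstream_failure)
```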
