POST /v1/embeddings — generate vector embeddings for text

The /v1/embeddings endpoint converts text into dense numerical vectors (embeddings) that capture semantic meaning. These vectors are useful for semantic search, retrieval-augmented generation (RAG), clustering, and similarity ranking. MonoRelay routes the request to the appropriate provider based on your model routing configuration and returns the embeddings in the standard OpenAI format.

Method and path

POST /v1/embeddings

Authentication

Send your access token as a Bearer token in the Authorization header.

Authorization: Bearer <your-access-token>

Request body

model

string

required

The embedding model to use, such as text-embedding-3-small or text-embedding-ada-002. Accepts aliases and model@provider syntax.

input

string | string[]

required

The text to embed. Pass a single string for one embedding, or an array of strings to embed multiple texts in a single request. All strings are processed as a batch by the upstream provider.

encoding_format

string

default:"float"

The format of the returned embeddings. Use "float" for a list of floating-point numbers, or "base64" for a base64-encoded binary representation. Not all providers support "base64".
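To show how these fields fit together, here is a minimal sketch that assembles the request with Python's standard library instead of the OpenAI SDK. The host relay.example.com and the helper name build_embeddings_request are placeholders invented for this example, not part of MonoRelay. The request is only constructed, not sent.

```python
import json
import urllib.request


def build_embeddings_request(base_url, token, model, texts, encoding_format="float"):
    """Assemble (but do not send) a POST /v1/embeddings request.

    Hypothetical helper for illustration only.
    """
    if encoding_format not in ("float", "base64"):
        raise ValueError("encoding_format must be 'float' or 'base64'")
    body = {
        "model": model,                      # required
        "input": texts,                      # required: a string or a list of strings
        "encoding_format": encoding_format,  # optional: defaults to "float"
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/embeddings",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_embeddings_request(
    "https://relay.example.com",  # placeholder host, replace with your own
    "<your-access-token>",
    "text-embedding-3-small",
    ["first text", "second text"],
)
print(req.get_method(), req.full_url)
```

Passing the resulting object to urllib.request.urlopen would send it. Any HTTP client works equally well, as long as it sets the same headers and JSON body.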

Response

object

string

Always "list".

data

object[]

Array of embedding objects, one per input string, in the same order as the input.

Each item in data has three properties: object, index, and embedding.

object

string

Always "embedding".

index

integer

Zero-based index of this embedding in the input array.

embedding

number[]

The embedding vector as an array of floating-point numbers. The dimensionality depends on the model.

model

string

The model name as returned by the upstream provider.

usage

object

Token usage for the request.

The usage object has two properties: prompt_tokens and total_tokens.

prompt_tokens

integer

Number of tokens in the input text.

total_tokens

integer

Total tokens processed (same as prompt_tokens for embeddings).
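The response fields above can be consumed roughly as in the sketch below, which pulls the vectors out of a response dictionary and compares two of them with cosine similarity. The helper names and the sample values are made up for illustration. The base64 branch assumes that the payload is a sequence of little-endian 32-bit floats, as in OpenAI's API. Verify this against your provider before relying on it.

```python
import base64
import math
import struct


def vectors_in_input_order(response):
    """Return the embedding vectors sorted by their `index` field."""
    # The documented order already matches the input; sorting is a cheap safeguard.
    items = sorted(response["data"], key=lambda item: item["index"])
    vectors = []
    for item in items:
        emb = item["embedding"]
        if isinstance(emb, str):
            # Assumption: base64 payloads hold little-endian float32 values.
            raw = base64.b64decode(emb)
            emb = list(struct.unpack(f"<{len(raw) // 4}f", raw))
        vectors.append(emb)
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hand-written sample in the documented response shape (values are made up).
sample = {
    "object": "list",
    "data": [
        {"object": "embedding", "index": 0, "embedding": [1.0, 0.0]},
        {"object": "embedding", "index": 1, "embedding": [0.0, 1.0]},
    ],
    "model": "text-embedding-3-small",
    "usage": {"prompt_tokens": 4, "total_tokens": 4},
}

first, second = vectors_in_input_order(sample)
print(cosine_similarity(first, second))
```

With the OpenAI SDK, the same fields are available as attributes (for example item.index and item.embedding) rather than dictionary keys.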

Examples

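from openai import OpenAI

# Point the OpenAI SDK at your MonoRelay host instead of api.openai.com.
client = OpenAI(
    base_url="https://<host>/v1",
    api_key="<your-access-token>",
)

# Embed two texts in a single request.
response = client.embeddings.create(
    model="text-embedding-3-small",
    input=["MonoRelay is an LLM relay server.", "It supports multiple providers."],
)

# One embedding object is returned per input string, in input order.
for item in response.data:
    print(f"Index {item.index}: {len(item.embedding)}-dimensional vector")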

Error responses

Errors are returned as a JSON body with HTTP status 503. The error.message field is prefixed with the provider name in square brackets for easier debugging.

{
  "error": {
    "message": "[openai] No available keys for provider 'openai'",
    "type": "no_keys"
  }
}
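A client can separate the provider prefix from the rest of the message when logging or retrying. The sketch below is one way to do that; parse_relay_error is a hypothetical helper, and it assumes every provider prefix follows the "[provider] message" pattern seen in the sample above.

```python
import json
import re


def parse_relay_error(body_text):
    """Split an error body into (provider, detail, type).

    Hypothetical helper; assumes the "[provider] message" prefix format.
    """
    error = json.loads(body_text)["error"]
    message = error.get("message", "")
    match = re.match(r"\[([^\]]+)\]\s*(.*)", message, re.DOTALL)
    if match:
        provider, detail = match.group(1), match.group(2)
    else:
        # No recognizable prefix: return the message unchanged.
        provider, detail = None, message
    return provider, detail, error.get("type")


sample_body = """
{
  "error": {
    "message": "[openai] No available keys for provider 'openai'",
    "type": "no_keys"
  }
}
"""

provider, detail, error_type = parse_relay_error(sample_body)
print(provider, error_type)
```

Returning None for the provider when no prefix is found keeps the helper usable if an error is ever produced without one.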
