
The /v1/embeddings endpoint generates dense vector representations of text using embedding models. It is fully compatible with the OpenAI Embeddings API, so any library using openai.embeddings.create(...) works by changing only the base URL. oMLX auto-detects embedding model families — BERT, BGE-M3, and ModernBERT are supported — and routes requests to the appropriate engine automatically.
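
For example, the official OpenAI Python client can target a local oMLX server by overriding the base URL. A minimal sketch, assuming the server listens on http://localhost:8000 and a bge-m3 model is present in your model directory; the API key value is a placeholder:

from openai import OpenAI

# Point the OpenAI client at the local oMLX server instead of api.openai.com.
# The api_key value is a placeholder; supply a real key only if your deployment requires one.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.embeddings.create(
    model="bge-m3",
    input="The quick brown fox jumps over the lazy dog",
)

vector = response.data[0].embedding  # list of floats
print(len(vector))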

Supported models

Family     | Examples
BERT       | bert-base-uncased, bert-large-uncased
BGE-M3     | bge-m3, bge-large-en-v1.5
ModernBERT | modernbert-base, modernbert-large

Point --model-dir at a directory containing MLX-format embedding model subdirectories. The model type is detected automatically; no manual configuration is needed.
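
A hypothetical model directory layout (the folder names are illustrative; each subdirectory holds one MLX-format embedding model):

models/
├── bge-m3/
├── bert-base-uncased/
└── modernbert-base/

Pass this path to --model-dir and the server detects each model's family automatically.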

Request

POST /v1/embeddings

Parameters

model (string, required)
The embedding model name or alias to use. Must match an embedding model discovered in your model directory.

input (string | string[])
The text to embed. Accepts a single string or an array of strings. Each string is embedded independently. Either input or items must be provided, but not both.

items (object[])
Structured input for multimodal embedding models. Each item is an object with at least one of text (string) or image (string). Mutually exclusive with input.

encoding_format (string, default "float")
Format of the returned embedding vector. "float" returns a JSON array of numbers; "base64" returns a base64-encoded string of little-endian 32-bit floats, suitable for compact storage (a decoding sketch follows this list).

dimensions (number)
Truncate the output embedding to this many dimensions. Only supported by models that allow dimension reduction. If the model does not support it, the full-dimension vector is returned.
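
When encoding_format is "base64", the returned string can be decoded back into floats client-side. A minimal decoding sketch in Python, assuming the requests library and the same local server as above; the little-endian 32-bit float layout follows the description of encoding_format:

import base64
import struct

import requests

# Request a base64-encoded embedding for compact transfer.
resp = requests.post(
    "http://localhost:8000/v1/embeddings",
    json={
        "model": "bge-m3",
        "input": "The quick brown fox jumps over the lazy dog",
        "encoding_format": "base64",
    },
)
resp.raise_for_status()
encoded = resp.json()["data"][0]["embedding"]

# Decode: base64 string -> raw bytes -> little-endian 32-bit floats.
raw = base64.b64decode(encoded)
vector = struct.unpack(f"<{len(raw) // 4}f", raw)
print(len(vector))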

Examples

curl http://localhost:8000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bge-m3",
    "input": "The quick brown fox jumps over the lazy dog"
  }'
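
To embed several strings in one request, pass an array as input. A Python sketch using the requests library, assuming the same local server and model as the curl example:

import requests

# Each string in the array is embedded independently; results come back
# in the same order as the input.
resp = requests.post(
    "http://localhost:8000/v1/embeddings",
    json={
        "model": "bge-m3",
        "input": [
            "The quick brown fox jumps over the lazy dog",
            "Pack my box with five dozen liquor jugs",
        ],
    },
)
resp.raise_for_status()
for item in resp.json()["data"]:
    print(item["index"], len(item["embedding"]))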

Response

object (string)
Always "list".

data (object[])
List of embedding results, one per input string, in the same order as the input.

model (string)
The embedding model used.

usage (object)
Token usage for the request, containing prompt_tokens and total_tokens.

Example response

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0234, -0.0891, 0.1234, "..."]
    }
  ],
  "model": "bge-m3",
  "usage": {
    "prompt_tokens": 9,
    "total_tokens": 9
  }
}
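
Embedding vectors are typically compared with cosine similarity. A small plain-Python illustration; the two short vectors stand in for real embeddings, which usually have hundreds or thousands of dimensions:

import math

def cosine_similarity(a, b):
    # Cosine similarity: dot(a, b) / (|a| * |b|).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([0.0234, -0.0891, 0.1234], [0.0210, -0.0850, 0.1300]))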
