
The /v1/embeddings endpoint generates dense vector representations of text using embedding models. It is fully compatible with the OpenAI Embeddings API, so any library using openai.embeddings.create(...) works by changing only the base URL. oMLX auto-detects embedding model families — BERT, BGE-M3, and ModernBERT are supported — and routes requests to the appropriate engine automatically.
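
For example, the official OpenAI Python client can target a local oMLX server by overriding the base URL. A minimal sketch, assuming the server listens on http://localhost:8000 and a bge-m3 model is present in your model directory; the API key value is a placeholder:

from openai import OpenAI

# Point the OpenAI client at the local oMLX server instead of api.openai.com.
# The api_key value is a placeholder; supply a real key only if your deployment requires one.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.embeddings.create(
    model="bge-m3",
    input="The quick brown fox jumps over the lazy dog",
)

vector = response.data[0].embedding  # list of floats
print(len(vector))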

Supported models

Family     | Examples
BERT       | bert-base-uncased, bert-large-uncased
BGE-M3     | bge-m3, bge-large-en-v1.5
ModernBERT | modernbert-base, modernbert-large

Point --model-dir at a directory containing MLX-format embedding model subdirectories. The model type is detected automatically; no manual configuration is needed.
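
A hypothetical model directory layout (the folder names are illustrative; each subdirectory holds one MLX-format embedding model):

models/
├── bge-m3/
├── bert-base-uncased/
└── modernbert-base/

Pass this path to --model-dir and the server detects each model's family automatically.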

Request

POST /v1/embeddings

Parameters

model (string, required)
The embedding model name or alias to use. Must match an embedding model discovered in your model directory.

input (string | string[])
The text to embed. Accepts a single string or an array of strings. Each string is embedded independently. Either input or items must be provided, but not both.

items (object[])
Structured input for multimodal embedding models. Each item is an object with at least one of text (string) or image (string). Mutually exclusive with input.

encoding_format (string, default "float")
Format of the returned embedding vector. "float" returns a JSON array of numbers; "base64" returns a base64-encoded string of little-endian 32-bit floats, suitable for compact storage (a decoding sketch follows this list).

dimensions (number)
Truncate the output embedding to this many dimensions. Only supported by models that allow dimension reduction. If the model does not support it, the full-dimension vector is returned.
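
When encoding_format is "base64", the returned string can be decoded back into floats client-side. A minimal decoding sketch in Python, assuming the requests library and the same local server as above; the little-endian 32-bit float layout follows the description of encoding_format:

import base64
import struct

import requests

# Request a base64-encoded embedding for compact transfer.
resp = requests.post(
    "http://localhost:8000/v1/embeddings",
    json={
        "model": "bge-m3",
        "input": "The quick brown fox jumps over the lazy dog",
        "encoding_format": "base64",
    },
)
resp.raise_for_status()
encoded = resp.json()["data"][0]["embedding"]

# Decode: base64 string -> raw bytes -> little-endian 32-bit floats.
raw = base64.b64decode(encoded)
vector = struct.unpack(f"<{len(raw) // 4}f", raw)
print(len(vector))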

Examples

curl http://localhost:8000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bge-m3",
    "input": "The quick brown fox jumps over the lazy dog"
  }'
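
To embed several strings in one request, pass an array as input. A Python sketch using the requests library, assuming the same local server and model as the curl example:

import requests

# Each string in the array is embedded independently; results come back
# in the same order as the input.
resp = requests.post(
    "http://localhost:8000/v1/embeddings",
    json={
        "model": "bge-m3",
        "input": [
            "The quick brown fox jumps over the lazy dog",
            "Pack my box with five dozen liquor jugs",
        ],
    },
)
resp.raise_for_status()
for item in resp.json()["data"]:
    print(item["index"], len(item["embedding"]))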

Response

object (string)
Always "list".

data (object[])
List of embedding results, one per input string, in the same order as the input.

model (string)
The embedding model used.

usage (object)
Token usage for the request, containing prompt_tokens and total_tokens.

Example response

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0234, -0.0891, 0.1234, "..."]
    }
  ],
  "model": "bge-m3",
  "usage": {
    "prompt_tokens": 9,
    "total_tokens": 9
  }
}
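
Embedding vectors are typically compared with cosine similarity. A small plain-Python illustration; the two short vectors stand in for real embeddings, which usually have hundreds or thousands of dimensions:

import math

def cosine_similarity(a, b):
    # Cosine similarity: dot(a, b) / (|a| * |b|).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([0.0234, -0.0891, 0.1234], [0.0210, -0.0850, 0.1300]))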
