Skip to main content

Documentation Index

Fetch the complete documentation index at: https://mintlify.com/jundot/omlx/llms.txt

Use this file to discover all available pages before exploring further.

oMLX exposes a local HTTP API that is a drop-in replacement for both the OpenAI and Anthropic REST APIs. Any client that works with https://api.openai.com/v1 or https://api.anthropic.com can be pointed at http://localhost:8000 with no code changes beyond the base URL. The server runs fully offline on Apple Silicon and supports streaming via Server-Sent Events (SSE), tool calling, structured output, vision inputs, and MCP tool integration.

Base URL

http://localhost:8000
The host and port are configurable via --host and --port CLI flags, or through the admin panel at /admin. All API paths below are relative to this base URL.

All endpoints

MethodPathDescription
POST/v1/chat/completionsChat completions (OpenAI compatible)
POST/v1/completionsText completions
POST/v1/messagesAnthropic Messages API
POST/v1/responsesOpenAI Responses API (Codex compatibility)
POST/v1/embeddingsText embeddings
POST/v1/rerankDocument reranking
GET/v1/modelsList available models
GET/healthHealth check
GET/v1/mcp/toolsList MCP tools
GET/v1/mcp/serversMCP server status
POST/v1/mcp/executeExecute an MCP tool

Streaming

Most generation endpoints accept "stream": true in the request body. When streaming is enabled, the server responds with a series of text/event-stream SSE events. Pass "stream_options": {"include_usage": true} to receive token usage statistics on the final chunk.

Authentication

Authentication is optional. By default, oMLX accepts all requests without an API key, which is safe for localhost-only use. To require a key, start the server with --api-key or configure it in the admin panel. See Authentication for full details.

Explore the API

Chat completions

OpenAI-compatible chat endpoint with streaming, tool calling, and vision support.

Anthropic Messages

Drop-in replacement for the Anthropic Messages API with adaptive thinking.

Embeddings

Generate text embeddings with BERT, BGE-M3, and ModernBERT models.

Rerank

Rerank documents by relevance using ModernBERT and XLM-RoBERTa.

Models

List all discovered models and their load status.

MCP tools

List and execute tools from connected MCP servers.

Build docs developers (and LLMs) love