Documentation Index

Fetch the complete documentation index at: https://mintlify.com/jundot/omlx/llms.txt

Use this file to discover all available pages before exploring further.

oMLX is a local LLM inference server built specifically for Apple Silicon. It combines continuous batching, a two-tier KV cache (RAM + SSD), and an OpenAI-compatible API — all managed from a native macOS menu bar app or a simple CLI command. Whether you’re running coding agents or experimenting with large models, oMLX keeps your most-used models hot in memory and handles the rest automatically.

Installation

Install via macOS DMG, Homebrew, or from source. Up and running in minutes.

Quickstart

Start the server, load a model, and make your first API call.

API Reference

Drop-in OpenAI and Anthropic API compatibility. Chat, completions, embeddings, and more.

Integrations

Connect Claude Code, Codex, OpenClaw, and Pi with one click.

Why oMLX?

oMLX solves the real pain of running local LLMs for serious development work. Most local inference servers make you choose between convenience and control — oMLX gives you both.

Tiered KV Cache

Hot RAM tier plus cold SSD tier. Context survives restarts and is reused across requests — no recomputation.

Multi-Model Serving

Pin everyday models, auto-load heavier ones on demand. LRU eviction and per-model TTL keep memory in check.

Vision-Language Models

Run VLMs with the same continuous batching and paged cache stack as text LLMs.

Tool Calling & MCP

Full function calling support for Llama, Qwen, DeepSeek, Gemma, Mistral, and more. MCP tool integration included.
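Because the API is OpenAI-compatible, tool calling can be exercised with the standard OpenAI Python SDK. The snippet below is a minimal sketch: it assumes the default server address used in the quickstart below, and the model name and get_weather tool are placeholders rather than anything shipped with oMLX:

from openai import OpenAI

# Point the standard OpenAI client at the local oMLX server (default quickstart address).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

# A placeholder tool definition in the standard OpenAI function-calling format.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="your-model-name",  # placeholder: use a model from your models directory
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
)

# If the model chooses to call the tool, its name and JSON arguments arrive here.
print(response.choices[0].message.tool_calls)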

Get started in three steps

1. Install oMLX

Download the macOS app from GitHub Releases or install via Homebrew:
brew tap jundot/omlx https://github.com/jundot/omlx
brew install omlx

2. Start the server

Point oMLX at your models directory:
omlx serve --model-dir ~/models
Or launch the macOS app from your Applications folder — the Welcome screen guides you through setup.

3. Connect your client

Any OpenAI-compatible client works out of the box:
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "your-model-name",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
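The same request works from the official OpenAI Python SDK by pointing it at the local server. This is a minimal sketch: the base URL assumes the default port shown above, the API key is a dummy value since the server runs locally, and the model name is a placeholder:

from openai import OpenAI

# The API key is a dummy value; oMLX runs locally, but the SDK still requires one.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="your-model-name",  # placeholder: use a model from your models directory
    messages=[{"role": "user", "content": "Hello!"}],
)

print(response.choices[0].message.content)

If you are unsure which model names the server exposes, the standard OpenAI models endpoint (client.models.list(), or GET /v1/models) should list them, assuming the same drop-in compatibility described above.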
oMLX requires macOS 15.0+ (Sequoia), Python 3.10+, and Apple Silicon (M1/M2/M3/M4).
