Documentation Index

Fetch the complete documentation index at: https://mintlify.com/jundot/omlx/llms.txt

Use this file to discover all available pages before exploring further.

oMLX is a local LLM inference server built specifically for Apple Silicon. It combines continuous batching, a two-tier KV cache (RAM + SSD), and an OpenAI-compatible API — all managed from a native macOS menu bar app or a simple CLI command. Whether you’re running coding agents or experimenting with large models, oMLX keeps your most-used models hot in memory and handles the rest automatically.

Installation

Install via macOS DMG, Homebrew, or from source. Up and running in minutes.

Quickstart

Start the server, load a model, and make your first API call.

API Reference

Drop-in OpenAI and Anthropic API compatibility. Chat, completions, embeddings, and more.

Integrations

Connect Claude Code, Codex, OpenClaw, and Pi with one click.

Why oMLX?

oMLX solves the real pain of running local LLMs for serious development work. Most local inference servers make you choose between convenience and control — oMLX gives you both.

Tiered KV Cache

Hot RAM tier plus cold SSD tier. Context survives restarts and is reused across requests — no recomputation.

Multi-Model Serving

Pin everyday models, auto-load heavier ones on demand. LRU eviction and per-model TTL keep memory in check.

Vision-Language Models

Run VLMs with the same continuous batching and paged cache stack as text LLMs.

Tool Calling & MCP

Full function calling support for Llama, Qwen, DeepSeek, Gemma, Mistral, and more. MCP tool integration included.
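Because the API is OpenAI-compatible, tool calling can be exercised with the standard OpenAI Python SDK. The snippet below is a minimal sketch: it assumes the default server address used in the quickstart below, and the model name and get_weather tool are placeholders rather than anything shipped with oMLX:

from openai import OpenAI

# Point the standard OpenAI client at the local oMLX server (default quickstart address).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

# A placeholder tool definition in the standard OpenAI function-calling format.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="your-model-name",  # placeholder: use a model from your models directory
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
)

# If the model chooses to call the tool, its name and JSON arguments arrive here.
print(response.choices[0].message.tool_calls)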

Get started in three steps

1. Install oMLX

Download the macOS app from GitHub Releases or install via Homebrew:
brew tap jundot/omlx https://github.com/jundot/omlx
brew install omlx

2. Start the server

Point oMLX at your models directory:
omlx serve --model-dir ~/models
Or launch the macOS app from your Applications folder — the Welcome screen guides you through setup.

3. Connect your client

Any OpenAI-compatible client works out of the box:
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "your-model-name",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
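The same request works from the official OpenAI Python SDK by pointing it at the local server. This is a minimal sketch: the base URL assumes the default port shown above, the API key is a dummy value since the server runs locally, and the model name is a placeholder:

from openai import OpenAI

# The API key is a dummy value; oMLX runs locally, but the SDK still requires one.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="your-model-name",  # placeholder: use a model from your models directory
    messages=[{"role": "user", "content": "Hello!"}],
)

print(response.choices[0].message.content)

If you are unsure which model names the server exposes, the standard OpenAI models endpoint (client.models.list(), or GET /v1/models) should list them, assuming the same drop-in compatibility described above.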
oMLX requires macOS 15.0+ (Sequoia), Python 3.10+, and Apple Silicon (M1/M2/M3/M4).
