oMLX exposes a standard OpenAI-compatible API, so any client that works with OpenAI works with oMLX — no changes required. This guide walks you through starting the server, pointing it at your models, and making your first inference request.

Documentation Index
Fetch the complete documentation index at: https://mintlify.com/jundot/omlx/llms.txt
Use this file to discover all available pages before exploring further.
Start the server
You can start the server either from the macOS app or from the command line.

macOS app: Launch oMLX from your Applications folder. The Welcome screen guides you through model directory selection and server startup automatically.

CLI: Run omlx serve and point it at your models directory; the server starts on http://localhost:8000 by default.
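For example, assuming your models live under ~/models (the path is a placeholder for your own models folder):

```bash
# Start the OpenAI-compatible server and tell it where your MLX models live
omlx serve --model-dir ~/models
```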
Organize your models

oMLX discovers models from subdirectories of your --model-dir. Each subdirectory should contain a valid MLX-format model (a config.json plus .safetensors weight files). oMLX auto-detects model type — LLM, VLM, embedding, or reranker — from the model's config, so you don't need to declare types manually. Both flat and two-level organization are supported.
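An illustrative layout (the model folder names here are made up; any valid MLX model directory works):

```text
models/
├── llama-3.2-3b-4bit/          # flat: model folder directly under --model-dir
│   ├── config.json
│   └── model.safetensors
└── my-org/
    └── qwen2-vl-7b-8bit/       # two-level: org/model layout
        ├── config.json
        └── model.safetensors
```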
Make your first API call

Check which models oMLX has discovered.
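Since the API is OpenAI-compatible, the standard /v1/models endpoint should return the list; for example with curl:

```bash
# List the models oMLX has discovered in your model directory
curl http://localhost:8000/v1/models
```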
Then send a chat completion request. Replace your-model-name with a model ID from the list above. For streaming responses, add "stream": true to the request body.
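A minimal curl sketch against the standard /v1/chat/completions endpoint (the prompt is just an example):

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "your-model-name",
    "messages": [{"role": "user", "content": "Hello, what can you do?"}]
  }'
```

With "stream": true added to the same JSON body, the response should arrive incrementally as server-sent events, following the standard OpenAI streaming format.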
Using the Python OpenAI SDK

Any OpenAI-compatible client works by pointing base_url at your local server.
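A minimal sketch with the official openai Python package; the api_key value is a placeholder, since a local server typically ignores it:

```python
from openai import OpenAI

# Point the standard OpenAI client at the local oMLX server
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="your-model-name",  # use a model ID from /v1/models
    messages=[{"role": "user", "content": "Hello, what can you do?"}],
)
print(response.choices[0].message.content)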
Streaming works the same way.
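For example, passing stream=True to the same client and printing chunks as they arrive:

```python
stream = client.chat.completions.create(
    model="your-model-name",
    messages=[{"role": "user", "content": "Write a haiku about local inference."}],
    stream=True,
)
for chunk in stream:
    # Each chunk carries an incremental piece of the assistant's reply
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```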
Explore the admin dashboard

Open http://localhost:8000/admin in your browser. The admin dashboard lets you:

- Monitor loaded models, memory usage, and request throughput in real time
- Load and unload models on demand using interactive status badges
- Pin models to keep frequently used ones always in memory
- Configure per-model settings — sampling parameters, chat template kwargs, TTL, model alias, and more
- Chat directly with any loaded model, including image uploads for VLMs
- Run benchmarks to measure prefill and generation speed with prefix cache testing
- Download models from HuggingFace directly in the dashboard
The built-in chat UI is available at http://localhost:8000/admin/chat if you want a quick conversational interface without any external client.

Connect coding tools
oMLX integrates with Claude Code, Codex, OpenClaw, and Pi. You can set up any of these from the admin dashboard with a single click, or use the CLI. The launch command checks that oMLX is running, lets you pick a model interactively if needed, and configures the tool to use your local server automatically.
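A sketch of what that might look like; the exact subcommand argument for each tool is an assumption here, so check the Integrations page or the CLI help for the real identifiers:

```bash
# Hypothetical invocation: verify the server is up, pick a model, configure Claude Code
omlx launch claude-code
```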
See Integrations for the full list of supported tools and manual configuration options.
Common CLI options
CLI options can also be configured from the admin dashboard at /admin and are persisted to ~/.omlx/settings.json. CLI flags always take precedence over saved settings.
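To see what has been persisted, you can inspect the file directly; the precedence rule is the one described above:

```bash
# Settings saved from the admin dashboard live here;
# any flag passed to `omlx serve` (e.g. --model-dir) overrides the matching saved value
cat ~/.omlx/settings.json
```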