oMLX discovers models by scanning the directory (or directories) specified by --model-dir. No manifest file or registration step is required — drop an MLX-format model into the folder, and it appears in /v1/models on the next request or after a manual refresh from the admin panel. Understanding the directory layout, format requirements, and auto-detection rules will help you organize models and avoid surprises.
Directory layout
oMLX accepts two directory structures: flat (one level) and two-level (namespaced, common for HuggingFace organization names).
- Flat layout
- Two-level layout
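As an illustration (the model names here are hypothetical), the two layouts look like:

```
models/                              # flat: one folder per model
├── Llama-3-8B-4bit/
│   ├── config.json
│   └── model.safetensors
└── bge-m3/
    ├── config.json
    └── model.safetensors

models/                              # two-level: namespace/model
└── mlx-community/
    ├── Llama-3-8B-4bit/
    │   ├── config.json
    │   └── model.safetensors
    └── Qwen2.5-7B-Instruct-4bit/
        ├── config.json
        └── model.safetensors
```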
In either layout, the directory name becomes the model ID shown in /v1/models.
MLX format requirement
Every model must be in MLX format: a config.json describing the architecture, plus one or more .safetensors files containing the weights. Models in PyTorch (.bin) or GGUF format are not supported. Use the mlx-lm convert tool or download pre-converted models from HuggingFace.
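If you convert a model yourself, mlx-lm ships a converter CLI; the repository name below is only an example. The format check afterwards is a sketch of the rule above (a config.json plus at least one .safetensors file), shown against a stand-in directory:

```shell
# Convert a HuggingFace checkpoint to 4-bit MLX (network-heavy; requires
# `pip install mlx-lm`; the repo name is an example):
#   mlx_lm.convert --hf-path mistralai/Mistral-7B-Instruct-v0.3 -q --mlx-path ./models/Mistral-7B-4bit

# Minimal MLX-format check matching the requirement above.
check_mlx_dir() {
  [ -f "$1/config.json" ] && ls "$1"/*.safetensors >/dev/null 2>&1
}

# Demo against a stand-in directory:
mkdir -p /tmp/demo-model
printf '{"model_type": "llama"}' > /tmp/demo-model/config.json
touch /tmp/demo-model/model.safetensors
check_mlx_dir /tmp/demo-model && echo "MLX format OK"
```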
Auto-detection by type
oMLX reads config.json and probes the weight layout to determine model type automatically. No manual labeling is needed in the common case.
| Type | Examples | Notes |
|---|---|---|
| LLM | Any model supported by mlx-lm | Default type for text generation models |
| VLM | Qwen3.5 Series, GLM-4V, Pixtral, other mlx-vlm models | Enables vision inputs and multi-image chat |
| OCR | DeepSeek-OCR, DOTS-OCR, GLM-OCR | Auto-detected with optimized system prompts |
| Embedding | BERT, BGE-M3, ModernBERT | Served via /v1/embeddings |
| Reranker | ModernBERT, XLM-RoBERTa | Served via /v1/rerank |
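For reference, the model_type and architectures fields of config.json are the kind of signal such detection typically inspects; a hypothetical embedding model's config might begin (fields abbreviated):

```json
{
  "model_type": "bert",
  "architectures": ["BertModel"],
  "hidden_size": 1024
}
```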
Per-model settings
oMLX stores per-model configuration in ~/.omlx/model_settings.json. You can edit settings from the admin panel at /admin — changes apply immediately without a server restart.
Model alias
Set a custom API-visible name for any model. Once set:
- /v1/models returns the alias as the model ID.
- Requests can use either the alias or the original directory name.
Model type override
If auto-detection produces the wrong result, you can manually set the type to llm or vlm from the admin panel. This is stored in model_settings.json and persists across restarts.
Sampling parameters
Per-model overrides for max_tokens, temperature, top_p, top_k, repetition_penalty, and other generation parameters can be set from the admin panel. When set, they take precedence over the global defaults in settings.json. When left unset (null), the global defaults apply.
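The exact schema of model_settings.json is not specified here; as a hypothetical illustration of the three kinds of per-model settings just described (alias, type override, sampling overrides — field names assumed), an entry might look like:

```json
{
  "Qwen2.5-7B-Instruct-4bit": {
    "alias": "qwen-chat",
    "model_type": "llm",
    "temperature": 0.7,
    "top_p": 0.9,
    "max_tokens": null
  }
}
```

A null value would mean the global default from settings.json applies, per the rule above.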
Downloading models
The recommended way to add models is the Model Downloader in the admin dashboard at /admin. It lets you search HuggingFace directly, inspect model cards and file sizes, and download with one click into your configured model directory.
HuggingFace mirror endpoint
For regions with restricted access to huggingface.co, specify an alternate endpoint: set the HF_ENDPOINT environment variable before the model downloader or any mlx-lm load call makes network requests.
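For example (hf-mirror.com is one public mirror — substitute whichever endpoint is reachable for you; the serve command is abbreviated from this page's flags):

```shell
# Export the mirror endpoint, then launch the server in the same shell
# so the downloader and mlx-lm inherit it:
export HF_ENDPOINT=https://hf-mirror.com
echo "Using HF endpoint: $HF_ENDPOINT"
# omlx serve --model-dir ~/models    # start the server afterwards
```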
ModelScope
For users in regions where ModelScope (modelscope.cn) is preferred, download models from ModelScope and place them in your configured model directory; the MLX format requirements above apply unchanged.
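One way to fetch from ModelScope is its CLI (a sketch using tooling outside oMLX; requires `pip install modelscope`, and the repo name is only an example). The download is guarded behind an env flag so the sketch is safe to run without network access:

```shell
MODEL_ID="mlx-community/Qwen2.5-7B-Instruct-4bit"     # example repo name
TARGET_DIR="$HOME/models/Qwen2.5-7B-Instruct-4bit"    # inside --model-dir
if [ "${RUN_DOWNLOAD:-0}" = "1" ]; then
  # Pull the model files from ModelScope into the oMLX model directory.
  modelscope download --model "$MODEL_ID" --local_dir "$TARGET_DIR"
else
  echo "dry run: would download $MODEL_ID to $TARGET_DIR"
fi
```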
trust_remote_code
trust_remote_code is disabled by default. When a HuggingFace model repository ships custom Python files — typically named modeling_*.py or tokenization_*.py — those files are executed at load time if trust_remote_code is enabled. This is a significant security risk for repositories you have not audited.
You can enable it per model from the admin panel under Per-Model Settings. The setting is stored in model_settings.json and applies only to that specific model. There is no global trust_remote_code flag in omlx serve — the granular per-model control is intentional.
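Before enabling the flag for a repository, it is worth listing the custom code it ships; a quick audit sketch (the path is a stand-in — point it at the real model folder):

```shell
# List Python files a model folder would execute at load time with
# trust_remote_code enabled. Demo uses a stand-in directory:
mkdir -p /tmp/audit-model
touch /tmp/audit-model/modeling_custom.py    # simulates a shipped file
find /tmp/audit-model -maxdepth 2 -name '*.py'
```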