The Models tab in the oMLX Admin Dashboard is the primary interface for controlling which models are in memory and how they behave. Every action — loading, unloading, pinning, configuring — takes effect immediately, and all settings persist across server restarts in ~/.omlx/model_settings.json.
Model status badges
Each model in the list has a status badge indicating whether it is currently loaded or unloaded. Click the badge to toggle:
- Loaded — the model is in memory and ready to serve requests. Click to unload.
- Unloaded — the model is available on disk but not in memory. Click to load.
Pinning models
Click the pin icon next to any loaded model to mark it as pinned. Pinned models are excluded from LRU eviction and remain loaded until you manually unload them or remove the pin. Use pinning for models you use constantly — small chat models, embedding models, or rerankers — so they are never swapped out during heavy workloads.
Model downloader
The downloader is accessible from the Models tab. Enter a model name or search query to find MLX-format models on HuggingFace. The panel shows the model card, file sizes, and quantization details before you commit to a download. Click Download to pull the model into your configured model directory. Downloaded models appear in the model list automatically once the download completes. No server restart is required.
Per-model settings
Open the settings panel for any model by clicking its name. All fields apply immediately without a server restart.
Sampling parameters
| Parameter | Description |
|---|---|
| max_tokens | Maximum output tokens per request. Overrides the global default. |
| temperature | Sampling temperature. Higher values increase randomness. |
| top_p | Nucleus sampling probability cutoff. |
| top_k | Limit the next-token selection to the top K candidates. |
| min_p | Minimum probability threshold for a candidate token, relative to the probability of the most likely token. |
| repetition_penalty | Penalise repeated tokens. 1.0 disables the penalty. |
| presence_penalty | Penalise tokens that have already appeared in the output. |
| max_context_window | Reject requests whose prompt exceeds this token count. |
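These parameters map onto the fields of an OpenAI-compatible completion request. The sketch below shows a request body that supplies its own sampling values, assuming the standard /v1/chat/completions endpoint; the model name and all values are purely illustrative, and how request values interact with dashboard defaults is server-specific.

```python
import json

# Sketch of an OpenAI-compatible chat completion request body.
# "my-coder" is a placeholder model name, not a real model.
payload = {
    "model": "my-coder",
    "messages": [{"role": "user", "content": "Write a haiku about MLX."}],
    "max_tokens": 256,
    "temperature": 0.7,         # higher values increase randomness
    "top_p": 0.9,               # nucleus sampling cutoff
    "top_k": 40,                # consider only the 40 most likely tokens
    "repetition_penalty": 1.1,  # 1.0 would disable the penalty
}

# Serialise for use as the POST body to /v1/chat/completions.
body = json.dumps(payload)
```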
Chat template kwargs
chat_template_kwargs passes extra keyword arguments to the model’s Jinja2 chat template. This is useful for models that expose template-level toggles — for example, enabling thinking mode or disabling system prompt injection. You can also mark specific keys as forced_ct_kwargs so API callers cannot override them.
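Because API callers can normally override these keys (unless they are marked as forced_ct_kwargs), the same kwargs can also be sent per request. A minimal sketch follows; the key name enable_thinking is an assumption, since valid keys depend entirely on the model's own Jinja2 chat template.

```python
import json

# Hypothetical request passing a template-level toggle per call.
# "enable_thinking" is a placeholder key; check the model's chat
# template for the toggles it actually exposes.
payload = {
    "model": "my-coder",
    "messages": [{"role": "user", "content": "What is 2 + 2?"}],
    "chat_template_kwargs": {"enable_thinking": False},
}

body = json.dumps(payload)
```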
TTL (idle timeout)
Set ttl_seconds to automatically unload a model after it has been idle for that many seconds. This is useful for large models that you only use occasionally — they load on demand and free memory after inactivity.
Model alias
model_alias sets a custom API-visible name for the model. When an alias is set:
- GET /v1/models returns the alias instead of the directory name.
- Requests accept both the alias and the original directory name.
This lets you point a stable name such as my-coder at a model directory that might change between quantization updates.
Model type override
oMLX auto-detects whether a model is an LLM, VLM, embedding model, or reranker. If auto-detection produces the wrong result, use model_type_override to manually set the type. Valid values are llm, vlm, embedding, and reranker.
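Since per-model settings persist to ~/.omlx/model_settings.json, an entry combining the fields above might look roughly like this. The exact file layout is an assumption on my part; only the field names come from the settings described here, and the model directory name is a placeholder.

```json
{
  "Qwen3-4B-4bit": {
    "ttl_seconds": 600,
    "model_alias": "my-coder",
    "model_type_override": "llm",
    "chat_template_kwargs": { "enable_thinking": false },
    "forced_ct_kwargs": ["enable_thinking"]
  }
}
```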
Testing models with built-in chat
Any loaded model can be tested directly from the dashboard without leaving your browser. The chat UI supports:
- Full conversation history
- Mid-conversation model switching
- Image upload for VLMs and OCR models
- Reasoning model output (thinking blocks rendered separately)
- Dark mode
Serving stats
The dashboard displays per-model serving statistics — total requests, prompt tokens, completion tokens, cached tokens, average prefill TPS, and average generation TPS. These stats have two scopes:
- Session — counters since the server last started.
- All-time — counters persisted across restarts to ~/.omlx/stats.json.
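The persisted all-time counters can be inspected directly from the file. A minimal sketch, assuming a flat per-model layout with prompt_tokens and completion_tokens keys; the actual schema of stats.json is not documented here, so treat the key names as placeholders.

```python
import json
from pathlib import Path

# Location of the persisted all-time counters.
STATS_PATH = Path.home() / ".omlx" / "stats.json"

def total_tokens(stats: dict) -> int:
    """Sum prompt and completion tokens across all models (assumed keys)."""
    return sum(
        entry.get("prompt_tokens", 0) + entry.get("completion_tokens", 0)
        for entry in stats.values()
    )

# Example with a fabricated stats dict rather than the real file:
sample = {"my-coder": {"prompt_tokens": 1200, "completion_tokens": 300}}
print(total_tokens(sample))  # 1500
# Against the real file: total_tokens(json.loads(STATS_PATH.read_text()))
```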
Load a model
Click the Unloaded badge next to the model you want to load. The badge turns green when the model is ready.
Configure settings
Click the model name to open the settings panel. Adjust sampling parameters, TTL, alias, or model type, then click Save.