

The /v1/models endpoint returns the list of all models that oMLX has discovered in your configured model directory. The response is compatible with the OpenAI List Models API, so any OpenAI client can use it to enumerate available models. In addition, oMLX exposes model type and load status information through the separate /v1/models/status endpoint.

List models

GET /v1/models

Returns all discovered models. For each model, the id field is the alias if one is configured in per-model settings, or the directory name otherwise. Both the alias and the directory name are accepted when specifying a model in generation requests.

Example

curl http://localhost:8000/v1/models

Response

object (string)
Always "list".

data (object[])
Array of model objects.

Example response

{
  "object": "list",
  "data": [
    {
      "id": "qwen3-8b",
      "object": "model",
      "created": 1746835200,
      "owned_by": "omlx"
    },
    {
      "id": "bge-m3",
      "object": "model",
      "created": 1746835200,
      "owned_by": "omlx"
    }
  ]
}
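A client typically only needs the id of each entry, since that is the value passed as the model parameter of a generation request. A minimal sketch of extracting the ids, using only the Python standard library and the example payload shown above (no live server required):

```python
import json

# Example /v1/models payload, mirroring the response shown above.
payload = json.loads("""
{
  "object": "list",
  "data": [
    {"id": "qwen3-8b", "object": "model", "created": 1746835200, "owned_by": "omlx"},
    {"id": "bge-m3", "object": "model", "created": 1746835200, "owned_by": "omlx"}
  ]
}
""")

# Each "id" (alias or directory name) can be passed as the "model"
# parameter of a generation request.
model_ids = [m["id"] for m in payload["data"]]
print(model_ids)  # ['qwen3-8b', 'bge-m3']
```

Because the response follows the OpenAI List Models shape, any OpenAI-compatible client library can retrieve the same list by pointing its base URL at the oMLX server.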

Model status

GET /v1/models/status

Returns extended per-model information, including the model type and context window configuration. This endpoint is an oMLX extension and is not part of the OpenAI API spec.

Example

curl http://localhost:8000/v1/models/status
Response

models (object[])
Array of model status objects.

Example response

{
  "models": [
    {
      "id": "qwen3-8b",
      "model_type": "llm",
      "loaded": true,
      "max_context_window": 32768,
      "max_tokens": 32768
    },
    {
      "id": "bge-m3",
      "model_type": "embedding",
      "loaded": false,
      "max_context_window": 8192,
      "max_tokens": 8192
    }
  ]
}
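The status payload is convenient for deciding which model to use at runtime, for example picking out models that are currently resident in memory, or separating chat models from embedding models. A sketch over the example response above, using only the standard library:

```python
import json

# Example /v1/models/status payload, mirroring the response shown above.
status = json.loads("""
{
  "models": [
    {"id": "qwen3-8b", "model_type": "llm", "loaded": true,
     "max_context_window": 32768, "max_tokens": 32768},
    {"id": "bge-m3", "model_type": "embedding", "loaded": false,
     "max_context_window": 8192, "max_tokens": 8192}
  ]
}
""")

# Models currently resident in memory.
loaded = [m["id"] for m in status["models"] if m["loaded"]]

# Group model ids by type (e.g. "llm" vs "embedding").
by_type = {}
for m in status["models"]:
    by_type.setdefault(m["model_type"], []).append(m["id"])

print(loaded)   # ['qwen3-8b']
print(by_type)  # {'llm': ['qwen3-8b'], 'embedding': ['bge-m3']}
```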

Health check

GET /health

Returns a simple health check response. Useful for monitoring and readiness checks in scripts or container orchestration.

Example

curl http://localhost:8000/health

Response
{
  "status": "ok"
}
Use the /health endpoint in shell scripts to wait for the server to be ready before sending the first request:
until curl -sf http://localhost:8000/health > /dev/null; do sleep 1; done
echo "Server is ready"
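The same wait-until-ready pattern can be written in Python with only the standard library. This is a sketch, not part of oMLX itself; it polls a health URL until it returns HTTP 200 or a timeout expires. To keep the snippet runnable anywhere, it is demonstrated against a throwaway local server that mimics the /health response, rather than a live oMLX instance:

```python
import json
import threading
import time
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

def wait_for_ready(url: str, timeout: float = 30.0, interval: float = 0.2) -> bool:
    """Poll a health endpoint until it returns HTTP 200 or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not up yet; retry
        time.sleep(interval)
    return False

# Throwaway stand-in server so the example runs without oMLX.
class _Health(BaseHTTPRequestHandler):
    def do_GET(self):
        body = json.dumps({"status": "ok"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), _Health)  # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

ready = wait_for_ready(f"http://127.0.0.1:{server.server_port}/health")
print(ready)  # True once /health responds with 200
server.shutdown()
```

Against a real deployment, you would call wait_for_ready("http://localhost:8000/health") before sending the first request, exactly as the shell loop above does.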

Load and unload models

Two additional endpoints let you control model loading state programmatically:

POST /v1/models/{model_id}/load — Load a model into memory.
POST /v1/models/{model_id}/unload — Unload a model from memory.

These are equivalent to using the status badges in the admin panel. The model_id path parameter accepts the model alias or directory name.

Example
# Unload a model to free memory
curl -X POST http://localhost:8000/v1/models/qwen3-8b/unload

# Load it back
curl -X POST http://localhost:8000/v1/models/qwen3-8b/load
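The same calls can be issued from Python with the standard library. The helper below is a hypothetical sketch (the function name and structure are not part of oMLX): it builds the POST request for either action, following the base URL and model id from the curl examples above. The requests are constructed but not sent, so the snippet runs without a server; against a live instance you would pass each one to urllib.request.urlopen().

```python
import urllib.request

BASE = "http://localhost:8000"

def model_action_request(model_id: str, action: str) -> urllib.request.Request:
    """Build a POST request for /v1/models/{model_id}/load or .../unload."""
    if action not in ("load", "unload"):
        raise ValueError(f"unknown action: {action}")
    return urllib.request.Request(
        f"{BASE}/v1/models/{model_id}/{action}", method="POST"
    )

# Unload a model to free memory, then load it back.
unload_req = model_action_request("qwen3-8b", "unload")
load_req = model_action_request("qwen3-8b", "load")

print(unload_req.get_method(), unload_req.full_url)
# POST http://localhost:8000/v1/models/qwen3-8b/unload
```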
