

The /v1/models endpoint returns the list of all models that oMLX has discovered in your configured model directory. The response is compatible with the OpenAI List Models API, so any OpenAI client can use it to enumerate available models. In addition, oMLX exposes model type and load status information through the separate /v1/models/status endpoint.

List models

GET /v1/models

Returns all discovered models. For each model, the id field is the alias if one is configured in per-model settings, or the directory name otherwise. Both the alias and the directory name are accepted when specifying a model in generation requests.

Example

curl http://localhost:8000/v1/models

Response

object (string)
Always "list".

data (object[])
Array of model objects.

Example response

{
  "object": "list",
  "data": [
    {
      "id": "qwen3-8b",
      "object": "model",
      "created": 1746835200,
      "owned_by": "omlx"
    },
    {
      "id": "bge-m3",
      "object": "model",
      "created": 1746835200,
      "owned_by": "omlx"
    }
  ]
}
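A client typically only needs the id of each entry, since that is the value passed as the model parameter of a generation request. A minimal sketch of extracting the ids, using only the Python standard library and the example payload shown above (no live server required):

```python
import json

# Example /v1/models payload, mirroring the response shown above.
payload = json.loads("""
{
  "object": "list",
  "data": [
    {"id": "qwen3-8b", "object": "model", "created": 1746835200, "owned_by": "omlx"},
    {"id": "bge-m3", "object": "model", "created": 1746835200, "owned_by": "omlx"}
  ]
}
""")

# Each "id" (alias or directory name) can be passed as the "model"
# parameter of a generation request.
model_ids = [m["id"] for m in payload["data"]]
print(model_ids)  # ['qwen3-8b', 'bge-m3']
```

Because the response follows the OpenAI List Models shape, any OpenAI-compatible client library can retrieve the same list by pointing its base URL at the oMLX server.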

Model status

GET /v1/models/status

Returns extended per-model information, including the model type and context window configuration. This endpoint is an oMLX extension and is not part of the OpenAI API spec.

Example

curl http://localhost:8000/v1/models/status
Response

models (object[])
Array of model status objects.

Example response

{
  "models": [
    {
      "id": "qwen3-8b",
      "model_type": "llm",
      "loaded": true,
      "max_context_window": 32768,
      "max_tokens": 32768
    },
    {
      "id": "bge-m3",
      "model_type": "embedding",
      "loaded": false,
      "max_context_window": 8192,
      "max_tokens": 8192
    }
  ]
}
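The status payload is convenient for deciding which model to use at runtime, for example picking out models that are currently resident in memory, or separating chat models from embedding models. A sketch over the example response above, using only the standard library:

```python
import json

# Example /v1/models/status payload, mirroring the response shown above.
status = json.loads("""
{
  "models": [
    {"id": "qwen3-8b", "model_type": "llm", "loaded": true,
     "max_context_window": 32768, "max_tokens": 32768},
    {"id": "bge-m3", "model_type": "embedding", "loaded": false,
     "max_context_window": 8192, "max_tokens": 8192}
  ]
}
""")

# Models currently resident in memory.
loaded = [m["id"] for m in status["models"] if m["loaded"]]

# Group model ids by type (e.g. "llm" vs "embedding").
by_type = {}
for m in status["models"]:
    by_type.setdefault(m["model_type"], []).append(m["id"])

print(loaded)   # ['qwen3-8b']
print(by_type)  # {'llm': ['qwen3-8b'], 'embedding': ['bge-m3']}
```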

Health check

GET /health

Returns a simple health check response. Useful for monitoring and readiness checks in scripts or container orchestration.

Example

curl http://localhost:8000/health

Response
{
  "status": "ok"
}
Use the /health endpoint in shell scripts to wait for the server to be ready before sending the first request:
until curl -sf http://localhost:8000/health > /dev/null; do sleep 1; done
echo "Server is ready"
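The same wait-until-ready pattern can be written in Python with only the standard library. This is a sketch, not part of oMLX itself; it polls a health URL until it returns HTTP 200 or a timeout expires. To keep the snippet runnable anywhere, it is demonstrated against a throwaway local server that mimics the /health response, rather than a live oMLX instance:

```python
import json
import threading
import time
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

def wait_for_ready(url: str, timeout: float = 30.0, interval: float = 0.2) -> bool:
    """Poll a health endpoint until it returns HTTP 200 or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not up yet; retry
        time.sleep(interval)
    return False

# Throwaway stand-in server so the example runs without oMLX.
class _Health(BaseHTTPRequestHandler):
    def do_GET(self):
        body = json.dumps({"status": "ok"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), _Health)  # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

ready = wait_for_ready(f"http://127.0.0.1:{server.server_port}/health")
print(ready)  # True once /health responds with 200
server.shutdown()
```

Against a real deployment, you would call wait_for_ready("http://localhost:8000/health") before sending the first request, exactly as the shell loop above does.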

Load and unload models

Two additional endpoints let you control model loading state programmatically:

POST /v1/models/{model_id}/load — Load a model into memory.
POST /v1/models/{model_id}/unload — Unload a model from memory.

These are equivalent to using the status badges in the admin panel. The model_id path parameter accepts the model alias or directory name.

Example
# Unload a model to free memory
curl -X POST http://localhost:8000/v1/models/qwen3-8b/unload

# Load it back
curl -X POST http://localhost:8000/v1/models/qwen3-8b/load
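The same calls can be issued from Python with the standard library. The helper below is a hypothetical sketch (the function name and structure are not part of oMLX): it builds the POST request for either action, following the base URL and model id from the curl examples above. The requests are constructed but not sent, so the snippet runs without a server; against a live instance you would pass each one to urllib.request.urlopen().

```python
import urllib.request

BASE = "http://localhost:8000"

def model_action_request(model_id: str, action: str) -> urllib.request.Request:
    """Build a POST request for /v1/models/{model_id}/load or .../unload."""
    if action not in ("load", "unload"):
        raise ValueError(f"unknown action: {action}")
    return urllib.request.Request(
        f"{BASE}/v1/models/{model_id}/{action}", method="POST"
    )

# Unload a model to free memory, then load it back.
unload_req = model_action_request("qwen3-8b", "unload")
load_req = model_action_request("qwen3-8b", "load")

print(unload_req.get_method(), unload_req.full_url)
# POST http://localhost:8000/v1/models/qwen3-8b/unload
```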
