oMLX includes specific optimizations for Claude Code that address two common pain points when running local models: context compaction timing and request timeouts during long prefill. Once connected, Claude Code sends every request to your local oMLX server instead of Anthropic’s API, so your code and conversations stay on-device.
Claude Code optimizations in oMLX
Context scaling: Claude Code’s auto-compact feature triggers based on the model’s reported context window. Smaller local models often have shorter context limits than Claude’s hosted counterparts. oMLX scales the token counts it reports to Claude Code so that auto-compact fires at the right point relative to the model’s actual capacity.

SSE keep-alive: During long prefill operations (loading a large codebase into context, for example), local inference can take several seconds before the first token is generated. oMLX emits SSE keep-alive events during this gap to prevent Claude Code from timing out before generation begins.

Connecting Claude Code
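As a rough illustration of the two optimizations described above (context scaling and SSE keep-alive), the following minimal sketch shows one way each idea could work. It is not oMLX's actual implementation. The function names, the linear scaling rule, the 200,000-token client window, and the use of an SSE comment line as the keep-alive frame are all assumptions made for this example.

```python
import queue
from typing import Iterator

# Assumption: the context window the client is presumed to expect.
ASSUMED_CLIENT_WINDOW = 200_000


def scale_reported_tokens(actual_tokens: int, model_context: int,
                          client_window: int = ASSUMED_CLIENT_WINDOW) -> int:
    """Scale a real token count so the same fraction of the window appears used.

    Hypothetical linear rule: a model that is half full reports half of
    the client's assumed window, whatever its real capacity is.
    """
    if model_context <= 0:
        raise ValueError("model_context must be positive")
    return round(actual_tokens * client_window / model_context)


def sse_with_keep_alive(tokens: queue.Queue, interval_s: float = 5.0) -> Iterator[str]:
    """Yield SSE frames, filling silent gaps with keep-alive frames.

    `tokens` is fed by the inference thread; a None item marks the end
    of the stream. If nothing arrives within `interval_s` seconds (for
    example during prefill), a comment frame is sent instead. Lines that
    start with ':' are ignored by SSE clients but keep the connection busy.
    """
    while True:
        try:
            item = tokens.get(timeout=interval_s)
        except queue.Empty:
            yield ": keep-alive\n\n"
            continue
        if item is None:
            return
        yield f"data: {item}\n\n"


# A 32k-context model holding 16k tokens is half full, so it reports
# half of the assumed 200k window.
print(scale_reported_tokens(16_000, 32_000))  # → 100000
```

With this rule, a client that compacts at a fixed fraction of its assumed window would compact at the same fraction of the local model's real window. The keep-alive interval only needs to be shorter than the client's idle timeout.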
Admin dashboard
- Open the Integrations tab. Navigate to http://localhost:8000/admin and click Integrations in the top navigation.
- Find Claude Code. Locate the Claude Code card and click the one-click setup button. The dashboard fetches your loaded models and writes the required environment variables automatically.

CLI
- Run omlx launch claude from a terminal.
What the launch command does
omlx launch claude configures the following environment variables before exec-ing the claude binary:
| Variable | Value | Purpose |
|---|---|---|
| ANTHROPIC_BASE_URL | http://localhost:8000 | Points Claude Code at your oMLX server |
| ANTHROPIC_AUTH_TOKEN | your API key, or "omlx" | Authenticates with oMLX |
| ANTHROPIC_API_KEY | (empty) | Prevents Claude Code from using a real Anthropic key |
| ANTHROPIC_DEFAULT_OPUS_MODEL | selected model | Routes all Claude tiers to your local model |
| ANTHROPIC_DEFAULT_SONNET_MODEL | selected model | Routes all Claude tiers to your local model |
| ANTHROPIC_DEFAULT_HAIKU_MODEL | selected model | Routes all Claude tiers to your local model |
| CLAUDE_CODE_SUBAGENT_MODEL | selected model | Ensures sub-agents also use your local model |
| API_TIMEOUT_MS | 3000000 | Extended timeout for local model inference |
| CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC | 1 | Disables telemetry and background requests |
API key
If your oMLX server has no API key configured (the default), any non-empty string works as the auth token. omlx launch claude uses "omlx" as the fallback. If you started oMLX with --api-key your-secret, that key is used automatically.
Endpoint and port
The default endpoint is http://localhost:8000. If you started oMLX on a different host or port, pass --host and --port to the launch command, for example omlx launch claude --host <host> --port <port>.
Installing Claude Code
If claude is not yet installed, install it first by following Anthropic’s installation instructions for Claude Code.