OpenGauss supports any OpenAI-compatible inference API through the standardDocumentation Index
Fetch the complete documentation index at: https://mintlify.com/math-inc/OpenGauss/llms.txt
Use this file to discover all available pages before exploring further.
OPENAI_BASE_URL environment variable. This means you can point Gauss at a locally running vLLM server, llama.cpp, Ollama in OpenAI-compatible mode, LM Studio, or any other server that speaks the OpenAI chat-completion format — without touching the cloud.
Start a vLLM inference server
Start your vLLM server on a local port:http://localhost:8000/v1.
Point Gauss at your local server
Option A — run gauss setup (interactive wizard)
The setup wizard has a dedicated step for custom endpoints:When prompted for your provider or API configuration, select the custom/local
option and enter
http://localhost:8000/v1 as the base URL. The wizard writes
the value to ~/.gauss/.env automatically.Option B — set OPENAI_BASE_URL directly
Edit No API key is required for local endpoints. OpenGauss treats a configured
~/.gauss/.env and add:OPENAI_BASE_URL as sufficient to consider the local provider active.Select the model
Tell Gauss which model name to request (must match the Or set it directly in
--model you passed
to vLLM):~/.gauss/config.yaml:Full config example
Here is a complete~/.gauss/config.yaml snippet for a local vLLM setup:
~/.gauss/.env:
Compatible inference servers
Any server that implements the OpenAI chat-completion API works with the sameOPENAI_BASE_URL approach:
| Server | Notes |
|---|---|
| vLLM | Recommended for GPU-accelerated inference; supports most HuggingFace models |
llama.cpp (--server) | CPU-friendly; use --port and --host to control the endpoint |
| Ollama | Start with ollama serve; OpenAI-compatible endpoint at http://localhost:11434/v1 |
| LM Studio | Enable “Local Server” in the app; exposes http://localhost:1234/v1 by default |
| LocalAI | Drop-in OpenAI replacement with broad model support |
Routing auxiliary tasks to a different endpoint
OpenGauss uses auxiliary models for side tasks like vision analysis, context compression, and MCP sampling. You can route these to a different local endpoint — for example, a lighter model on the same machine — without changing the primary model:base_url is set under an auxiliary task it takes precedence over the provider setting for that task. This lets you run a large model for primary interactions and a small model for background work.
Performance tuning
Local models often have shorter context windows than hosted frontier models. A few config settings that help:gauss config set to change these without editing the file directly: