Hermes Portable can connect to any OpenAI-compatible API endpoint, including a locally running Ollama server. With Ollama on the same machine, inference runs entirely on your own hardware: no API keys, no data leaving your machine, and no per-token costs. This suits air-gapped environments, sensitive workflows, and experimenting with open-weight models.

Prerequisites

Ollama Installed

Ollama must be installed on the same machine (or a reachable host). Download it from ollama.com.

Ollama Running

The Ollama server must be started before launching Hermes. By default it listens on http://127.0.0.1:11434.
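Before launching Hermes, it can help to confirm that something is actually answering on the default address. The following is a minimal sketch, not part of Hermes or Ollama: the function name `ollama_is_running` is hypothetical, and it only checks that an HTTP server replies with status 200 at the base URL.

```python
import urllib.error
import urllib.request

# Default Ollama address stated above; change it if your server listens elsewhere.
OLLAMA_BASE_URL = "http://127.0.0.1:11434"


def ollama_is_running(base_url: str = OLLAMA_BASE_URL, timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers with status 200 at base_url."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, HTTP error, or DNS failure:
        # treat all of these as "not running".
        return False


if __name__ == "__main__":
    state = "reachable" if ollama_is_running() else "not reachable"
    print(f"Ollama at {OLLAMA_BASE_URL} is {state}")
```

If the check reports that the server is not reachable, start Ollama first and run it again before opening the Hermes setup wizard.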

Connect Hermes to Local Ollama

Step 1: Start Ollama and pull a model

Open a terminal and pull the model you want to use. For example:
ollama pull qwen3:latest
You can pull any model supported by Ollama. Run ollama list to see what is already available on your machine.

Step 2: Launch Hermes Portable and open Setup

Start the launcher for your platform. On Windows:
launch.bat
From the interactive menu, select [2] Setup / Reconfigure Hermes. You can also run setup directly from the command line:
launch.bat setup

Step 3: Choose Quick setup → More providers → Custom endpoint

Inside the Hermes setup wizard:
  1. Choose Quick setup.
  2. Select More providers.
  3. Select Custom endpoint (enter URL manually).

Step 4: Enter the local Ollama endpoint

When prompted for the endpoint URL, enter:
http://127.0.0.1:11434/v1
Ollama exposes an OpenAI-compatible /v1 API. The /v1 suffix is required — Hermes uses the OpenAI client format to communicate with it.

Step 5: Leave the API key blank

When the wizard asks for an API key, press Enter to leave it empty. Ollama does not require authentication when running locally.

Step 6: Select your model and auto-detect context length

The wizard will detect the models available from the running Ollama instance. Select the model you pulled in Step 1. When prompted for context length, press Enter to leave it blank; Hermes will auto-detect the correct value from the model’s metadata.
Hermes is now configured to use your local Ollama instance. Start a chat session from the launcher menu ([1] Start Hermes Chat) to begin.
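The values entered in the wizard steps above (an endpoint ending in /v1, a blank API key, and a model chosen from the server's list) can be sanity-checked outside Hermes. The sketch below is illustrative only and does not reflect how Hermes implements its wizard: the helper names are hypothetical, and it assumes the server exposes the OpenAI-style model listing at `/v1/models`.

```python
import json
import urllib.request


def normalize_endpoint(url: str) -> str:
    """Return url without a trailing slash and ending in /v1."""
    trimmed = url.strip().rstrip("/")
    return trimmed if trimmed.endswith("/v1") else trimmed + "/v1"


def build_models_request(endpoint: str, api_key: str = "") -> urllib.request.Request:
    """Build a GET request for <endpoint>/models; add auth only if a key is set."""
    headers = {"Accept": "application/json"}
    if api_key:
        # A blank key (the local Ollama case) sends no Authorization header at all.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(f"{normalize_endpoint(endpoint)}/models", headers=headers)


def parse_model_ids(payload: str) -> list[str]:
    """Extract model ids from an OpenAI-style {"data": [{"id": ...}]} listing."""
    return [entry["id"] for entry in json.loads(payload).get("data", [])]


def list_models(endpoint: str, api_key: str = "", timeout: float = 5.0) -> list[str]:
    """Fetch and parse the model list from a running server."""
    request = build_models_request(endpoint, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_model_ids(response.read().decode("utf-8"))
```

With Ollama running, calling `list_models("http://127.0.0.1:11434")` should return the same model names that `ollama list` shows, including the model pulled in Step 1. If the model is missing from that list, the wizard will not be able to offer it either.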

Remote Ollama Host

If Ollama is running on a different machine on your network, use the same /v1 endpoint format with the remote host’s IP address or hostname. For example:
http://192.168.1.20:11434/v1
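Verify that the remote Ollama host is reachable before running setup. ping confirms only that the machine is up; to confirm that the server itself responds, open http://192.168.1.20:11434 in a browser, where Ollama's root URL replies with "Ollama is running". By default Ollama listens only on 127.0.0.1, so the remote machine must be configured to accept connections on other interfaces, for example by setting OLLAMA_HOST=0.0.0.0 before starting the server.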
Follow the same wizard steps as above, substituting the remote URL when prompted for the endpoint.
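As a complement to ping, a TCP connection test shows whether the Ollama port itself accepts connections from your machine. This is a minimal sketch with hypothetical helper names; it assumes the default port 11434 and an IPv4 address or hostname, and it does not verify that the listener is actually Ollama.

```python
import socket


def remote_endpoint(host: str, port: int = 11434) -> str:
    """Build the /v1 endpoint URL for an IPv4 address or hostname."""
    return f"http://{host}:{port}/v1"


def tcp_reachable(host: str, port: int = 11434, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unresolvable: the port is not usable from here.
        return False


if __name__ == "__main__":
    host = "192.168.1.20"  # example address from this page; replace with your own
    if tcp_reachable(host):
        print(f"Port open. Enter this in the wizard: {remote_endpoint(host)}")
    else:
        print(f"Cannot connect to {host}:11434. Check OLLAMA_HOST and firewall rules.")
```

A host that answers ping but fails this check usually means Ollama is still bound to 127.0.0.1 on the remote machine, or a firewall is blocking the port.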

Summary

Setting | Value
Endpoint (local) | http://127.0.0.1:11434/v1
Endpoint (remote example) | http://192.168.1.20:11434/v1
API Key | (leave blank)
Context length | (leave blank to auto-detect)
API format | OpenAI-compatible /v1
