Odysseus Portable ships with two fully offline inference backends: llama.cpp and Ollama. Both run entirely on your machine without sending data to any external service, but they differ in how they acquire models, which GPU runtimes they support, and how much control they give you over memory usage. Understanding these differences lets you pick the right engine from the start — and switch later if your needs change.Documentation Index
Fetch the complete documentation index at: https://mintlify.com/techjarves/Odysseus-Portable/llms.txt
Use this file to discover all available pages before exploring further.
Backend Comparison
| Feature | llama.cpp | Ollama |
|---|---|---|
| Model format | GGUF files in models/ folder | Pulled via ollama pull |
| Model storage | models/ inside project folder | models/ollama/ inside project folder |
| API endpoint | http://127.0.0.1:8080/v1 (proxy) | http://127.0.0.1:11434/v1 |
| Context auto-scaling | Yes — retries with smaller context on OOM | No |
| GPU support | CUDA, Vulkan, Metal, CPU | CUDA, Metal, CPU |
| Best for | Portable GGUF files, USB drives | Convenient model management via web UI |
Selecting a Backend
Odysseus Portable gives you four ways to choose your backend, from most ephemeral to most persistent.Interactive prompt on first launch
When no configuration exists yet, the launcher presents a menu at startup. Enter the number for the backend you want:Your choice is saved to
data/launcher_config.json so it persists across future launches.Environment variable
Set
ODYSSEUS_BACKEND before running the launcher. This is useful in scripts or CI-like environments where you don’t want to modify any files:Which Backend Should I Use?
llama.cpp
Best when you need maximum portability — copy the entire project folder to a USB drive, external SSD, or another machine and everything works out of the box. The built-in context auto-scaling means it gracefully handles low-VRAM situations by automatically stepping down to a smaller context window instead of crashing. Supports CUDA, Vulkan, Metal, and CPU.
Ollama
Best when you prefer a polished model-management experience. Use the Cookbook/Models section in the Odysseus web UI to browse, pull, and switch models without leaving the browser. Ollama’s library covers a broad range of quantised models and its familiar CLI is well-documented. Supports CUDA, Metal, and CPU.