Odysseus Portable is not limited to its predefined model list. Because llama-server is configured to scan the entire models/ folder recursively at startup, any valid GGUF file you place there, regardless of where it came from, is automatically registered and available for inference. This makes it straightforward to experiment with the latest community releases, fine-tuned variants, or private models you have converted yourself, without touching any configuration files.
What Is a GGUF File?
GGUF (GPT-Generated Unified Format) is the standard model serialization format used by llama.cpp. It packages model weights, tokenizer data, and architecture metadata into a single portable file. Most popular open-weight models on Hugging Face have a GGUF variant available, usually at multiple quantization levels. If a model page lists files ending in .gguf, the model should work with Odysseus Portable, provided the llama-server build it ships with supports that model's architecture.
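A quick way to confirm that a file really is GGUF before dropping it into models/ is to read its fixed-size header. The sketch below assumes the little-endian header layout used by GGUF version 2 and later (the 4-byte magic GGUF, a uint32 version, then uint64 tensor and metadata key/value counts). parse_gguf_header and read_gguf_header are hypothetical helper names, not part of Odysseus Portable, and passing this check does not prove that llama-server supports the model's architecture.

```python
import struct

GGUF_MAGIC = b"GGUF"
# Assumed little-endian GGUF v2+ header: magic, uint32 version,
# uint64 tensor count, uint64 metadata key/value count (24 bytes total).
_HEADER = struct.Struct("<4sIQQ")


def parse_gguf_header(data):
    """Return (version, tensor_count, metadata_kv_count), or None if data is not a GGUF header."""
    if len(data) < _HEADER.size:
        return None
    magic, version, n_tensors, n_kv = _HEADER.unpack_from(data)
    if magic != GGUF_MAGIC:
        return None
    return version, n_tensors, n_kv


def read_gguf_header(path):
    """Read only the first 24 bytes of a file and parse them as a GGUF header."""
    with open(path, "rb") as f:
        return parse_gguf_header(f.read(_HEADER.size))
```

Calling read_gguf_header on a downloaded file returns None for anything that does not start with the GGUF magic, such as an HTML error page saved by a failed download. Version 1 files used 32-bit counts, so their counts would be misread by this sketch.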
Adding a Custom Model
Simple drag-and-drop
The fastest method. Copy or move your .gguf file into the models/ folder at the Odysseus Portable project root. Flat top-level files are picked up on the next launch or model reload, with no additional steps required.
You can also organise models into subfolders if you prefer. The recursive scanner will find them regardless of nesting depth.
Download from Hugging Face via curl
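Navigate to the model’s page on huggingface.co, locate the GGUF file you want under the Files and versions tab, and copy its direct URL. Then download it straight into models/ with curl, passing -L so that curl follows Hugging Face's redirect to the file's storage location. For gated or private models, add your Hugging Face read token to the request as an Authorization: Bearer header.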
You can persist your Hugging Face token so the Odysseus Cookbook and downloader can access gated models automatically. Create a .env file inside the odysseus/ directory with the line HUGGING_FACE_HUB_TOKEN=hf_yourTokenHere.
Via the Odysseus Cookbook UI
Open the Odysseus web interface at http://127.0.0.1:7070 and navigate to the Cookbook tab. Use the built-in model download panel to search for and pull any Hugging Face GGUF model directly into the models/ folder without leaving your browser. The Cookbook shows download progress and organises files into the standard hub/ cache layout automatically.
How Nested Models Are Resolved
When llama-server starts, Odysseus Portable scans models/ recursively and creates a flat symlink (or a hardlink on FAT32/exFAT drives) at the top level for every GGUF file discovered in a subdirectory. This lets llama-server address all models through a single flat namespace.
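The linking step can be sketched in Python as follows. This illustrates the behaviour described above and is not Odysseus Portable's actual code: flatten_models is a hypothetical helper, it assumes the flat name is simply the GGUF file's own name (the real naming rule may differ), and it falls back to a hard link only when creating the symlink raises OSError, as can happen on Windows without the required privilege.

```python
import os
from pathlib import Path


def flatten_models(models_dir):
    """Link every nested *.gguf file into the top level of models_dir.

    Returns the list of links created. Existing top-level names are never overwritten.
    """
    root = Path(models_dir)
    created = []
    # sorted() materialises the scan first, so links created below are not re-scanned.
    for gguf in sorted(root.rglob("*.gguf")):
        if gguf.parent == root or not gguf.is_file():
            continue  # already at the top level, or not a regular file
        link = root / gguf.name  # hypothetical naming rule: reuse the file's own name
        if link.exists() or link.is_symlink():
            continue  # name already taken, e.g. by a link from a previous run
        target = gguf.resolve()  # hub cache snapshot entries are themselves symlinks
        try:
            link.symlink_to(target)
        except OSError:
            os.link(target, link)  # fallback when the OS or filesystem refuses symlinks
        created.append(link)
    return created
```

Running it twice is harmless: the second call finds every name already present and creates nothing.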
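For example, a GGUF file inside a Hugging Face hub cache entry for the repository myorg/mymodel is exposed through a flat link at the top level of models/. Odysseus Portable also maps the repository ID (myorg/mymodel) to that flat filename, so API clients can reference the model by its canonical name.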
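Hugging Face hub cache folders encode the repository ID in their directory name (models--myorg--mymodel for myorg/mymodel), so the canonical name can be recovered from a file's path. The helpers below are a hypothetical sketch of that mapping; they assume the flat name is the GGUF file's own name, and how Odysseus Portable actually registers these aliases is not specified here.

```python
from pathlib import PurePath


def repo_id_from_cache_path(path):
    """Return the repository ID for a file inside a hub cache folder, or None."""
    for part in PurePath(path).parts:
        if part.startswith("models--"):
            # models--myorg--mymodel -> myorg/mymodel ('--' stands in for '/')
            return "/".join(part.split("--")[1:])
    return None


def alias_map(gguf_paths):
    """Map each repository ID to the flat file names found for it (hypothetical naming rule)."""
    aliases = {}
    for p in gguf_paths:
        repo_id = repo_id_from_cache_path(p)
        if repo_id is not None:
            aliases.setdefault(repo_id, []).append(PurePath(p).name)
    return {repo_id: sorted(names) for repo_id, names in aliases.items()}
```

A repository usually holds several quantizations, which is why each ID maps to a list of file names rather than a single one.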
Quantization Guide
GGUF files for the same base model are distributed at multiple quantization levels. Choosing the right level means balancing file size, RAM/VRAM usage, and output quality.
Q2_K — Smallest size, lowest quality
Aggressively compressed. Use this only when storage or RAM is the hard constraint (e.g. 4 GB RAM machines). Output quality degrades noticeably compared to higher quantizations, but the model remains usable for simple tasks.
Typical size vs Q4_K_M: ~55–60%
Q4_K_M — Best balance (recommended)
The default choice for most users. Provides output quality very close to the full-precision model at roughly a third of the F16 file size. All of the predefined models in Odysseus Portable use Q4_K_M.
Typical size vs F16: ~30–35%
Q5_K_M — Better quality, ~25% larger than Q4_K_M
A step up from Q4_K_M with a modest size increase. A good option if you have a few GB to spare and want tighter fidelity to the original model weights, particularly for structured output or code generation tasks.
Q6_K — Near-lossless quality
Very close to the original floating-point weights. Recommended when output accuracy is critical and storage is not a concern. Noticeably larger than Q4_K_M but still significantly smaller than F16.
Typical size vs Q4_K_M: ~140–150%
Q8_0 — Highest quality GGUF
The highest-fidelity quantized format. Output is nearly indistinguishable from F16 at about half the size. Requires significant RAM/VRAM; on mid-range hardware it is typically practical only for models of about 3B parameters or smaller.
Typical size vs Q4_K_M: ~170–180%
F16 / F32 — Full precision
Full 16-bit or 32-bit floating-point weights. These are almost always too large for local inference on consumer hardware. A 7B model in F16 is roughly 14 GB; Q4_K_M is under 5 GB. Use quantized formats instead.
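File size scales roughly with parameter count times bits per weight, which is where the 7B figures above come from. The sketch below makes that arithmetic explicit. The F16 and Q8_0 values follow from their storage layouts (Q8_0 stores blocks of 32 int8 weights plus one fp16 scale, 8.5 bits per weight); the Q4_K_M value of about 4.85 bits per weight is an assumed approximate average, since K-quant _M files mix quantization types across tensors. The estimate ignores metadata and the extra runtime memory needed for the context (KV cache).

```python
# Approximate bits per weight (bpw) for a few GGUF types.
BITS_PER_WEIGHT = {
    "F16": 16.0,     # one 16-bit float per weight
    "Q8_0": 8.5,     # blocks of 32 int8 weights + one fp16 scale = 34 bytes per 32 weights
    "Q4_K_M": 4.85,  # approximate average; _M files mix quant types across tensors
}


def estimated_size_gb(n_params, quant):
    """Estimate weight storage in decimal gigabytes (1e9 bytes), ignoring metadata."""
    return n_params * BITS_PER_WEIGHT[quant] / 8 / 1e9


if __name__ == "__main__":
    for quant in ("F16", "Q8_0", "Q4_K_M"):
        print(f"7B {quant}: ~{estimated_size_gb(7e9, quant):.1f} GB")
```

For 7e9 parameters this prints about 14.0 GB for F16, 7.4 GB for Q8_0, and 4.2 GB for Q4_K_M, consistent with the F16 and Q4_K_M figures quoted above.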
Finding GGUF Models
TheBloke on Hugging Face
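One of the most prolific GGUF quantizers in the community. TheBloke has published Q2_K through Q8_0 versions of hundreds of popular open-weight models with consistent naming conventions. The account stopped publishing new quantizations in early 2024, so its catalogue covers older releases.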
unsloth on Hugging Face
Maintains optimized GGUF quantizations of DeepSeek, Llama, Mistral, and other frontier models — including the models bundled in Odysseus Portable’s predefined list.
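Because these uploaders name files consistently, picking a quantization level from a repository's Files and versions list can be automated. The sketch below assumes the quant tag appears in the file name delimited by '.', '-' or '_' (for example model.Q4_K_M.gguf or Model-Q4_K_M.gguf). pick_quant is a hypothetical helper, and repositories that use other naming schemes will not be recognised.

```python
import re


def pick_quant(filenames, quant="Q4_K_M"):
    """Return the first .gguf file name carrying the given quant tag, or None."""
    # The tag must be preceded by '.', '-', '_' (or the start) and followed by '.', '-'
    # (or the end), so that 'Q4_K' does not match inside 'Q4_K_M' or 'Q4_K_S'.
    pattern = re.compile(rf"(?:^|[.\-_]){re.escape(quant)}(?:[.\-]|$)", re.IGNORECASE)
    for name in filenames:
        if not name.lower().endswith(".gguf"):
            continue  # skip README.md, config files, etc.
        stem = name[: -len(".gguf")]
        if pattern.search(stem):
            return name
    return None
```

Split uploads such as model-Q4_K_M-00001-of-00002.gguf also match, but only the first matching part is returned, so multi-part models need extra handling.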