Add Custom GGUF Models to Your Odysseus Portable Workspace

Odysseus Portable is not limited to its predefined model list. Because llama-server is configured to scan the entire models/ folder recursively at startup, any valid GGUF file you place there — regardless of where it came from — is automatically registered and available for inference. This makes it straightforward to experiment with the latest community releases, fine-tuned variants, or private models you have converted yourself, without touching any configuration files.

What Is a GGUF File?

GGUF (GPT-Generated Unified Format) is the standard model serialization format used by llama.cpp. It packages model weights, tokenizer data, and architecture metadata into a single portable file. Most popular open-weight models on Hugging Face have a GGUF variant available, usually at multiple quantization levels. If a model page lists files ending in .gguf, the model should work with Odysseus Portable, provided the llama.cpp build behind llama-server supports the model's architecture.
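As a quick sanity check before copying a file into models/, the fixed-size GGUF header can be read directly. The sketch below assumes the published GGUF layout for version 2 and later (a 4-byte magic GGUF, then a little-endian uint32 version, a uint64 tensor count, and a uint64 metadata key/value count). The names read_gguf_header and GGUFHeader are illustrative and not part of Odysseus Portable.

```python
import struct
from pathlib import Path
from typing import NamedTuple


class GGUFHeader(NamedTuple):
    version: int
    tensor_count: int
    metadata_kv_count: int


def read_gguf_header(path: Path) -> GGUFHeader:
    """Read the fixed-size header of a GGUF file (layout of GGUF v2 and later)."""
    with open(path, "rb") as f:
        raw = f.read(24)  # 4-byte magic + uint32 + 2 x uint64
    if len(raw) < 24 or raw[:4] != b"GGUF":
        raise ValueError(f"{path} does not look like a GGUF file")
    # Little-endian: version (uint32), tensor count (uint64), metadata KV count (uint64)
    version, tensor_count, kv_count = struct.unpack("<IQQ", raw[4:24])
    return GGUFHeader(version, tensor_count, kv_count)
```

A file that raises ValueError here is either truncated or not a GGUF file at all, which is worth knowing before a multi-gigabyte copy onto a portable drive.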

Adding a Custom Model

Simple drag-and-drop

This is the fastest method. Copy or move your .gguf file into the models/ folder at the Odysseus Portable project root. Files placed directly at the top level are picked up on the next launch or model reload, with no additional steps required.

models/
  my-fine-tuned-model-q4_k_m.gguf

You can also organise models into subfolders if you prefer. The recursive scanner will find them regardless of nesting depth.
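To preview which files a recursive scan of models/ would encounter, a few lines of Python suffice. This is an illustrative helper, not the scanner Odysseus Portable itself uses; it assumes models are identified purely by the .gguf suffix.

```python
from pathlib import Path


def find_gguf_files(models_dir: Path) -> list[Path]:
    """Return every .gguf file under models_dir, at any nesting depth, sorted by path."""
    return sorted(p for p in models_dir.rglob("*.gguf") if p.is_file())


models = Path("models")
if models.is_dir():
    for path in find_gguf_files(models):
        size_gb = path.stat().st_size / 1e9
        print(f"{size_gb:6.2f} GB  {path}")
```

Run from the project root, this lists each model with its size on disk, which is handy for spotting duplicates across subfolders.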

Download from Hugging Face via curl

Navigate to the model’s page on huggingface.co, locate the GGUF file you want under the Files and versions tab, and copy its direct URL. Then download it straight into models/:

curl -L -o models/my-model-q4_k_m.gguf \
  https://huggingface.co/org/repo/resolve/main/model-q4_k_m.gguf

For gated or private models, add your Hugging Face read token to the request:

curl -L \
  -H "Authorization: Bearer hf_yourTokenHere" \
  -o models/my-model-q4_k_m.gguf \
  https://huggingface.co/org/private-repo/resolve/main/model-q4_k_m.gguf

You can persist your Hugging Face token so the Odysseus Cookbook and downloader can access gated models automatically. Create a .env file inside the odysseus/ directory with the line HUGGING_FACE_HUB_TOKEN=hf_yourTokenHere.

Via the Odysseus Cookbook UI

Open the Odysseus web interface at http://127.0.0.1:7070 and navigate to the Cookbook tab. Use the built-in model download panel to search for and pull any Hugging Face GGUF model directly into the models/ folder without leaving your browser. The Cookbook shows download progress and organises files into the standard hub/ cache layout automatically.

How Nested Models Are Resolved

When llama-server starts, Odysseus Portable scans models/ recursively and creates a flat symlink (or a hardlink on FAT32/exFAT drives) at the top level for every GGUF file discovered in a subdirectory. This lets llama-server address all models through a single flat namespace. For example, a Hugging Face hub cache entry like:

models/hub/models--myorg--mymodel/snapshots/abc123/mymodel-q4_k_m.gguf

results in a flat link that is created automatically:

models/myorg--mymodel.gguf   ← symlink pointing to hub/models--myorg--mymodel/...

The proxy layer then maps the Hugging Face-style repo ID (myorg/mymodel) to that flat filename, so API clients can reference the model by its canonical name:

{
  "model": "myorg/mymodel",
  "messages": [{ "role": "user", "content": "Hello!" }]
}

Flat links are regenerated on every launch — stale links from deleted models are cleaned up automatically, so you never need to manage them manually.
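The naming convention in the example above can be sketched as two small functions. This is an illustration inferred from the single example shown (hub/models--myorg--mymodel/... becomes myorg--mymodel.gguf); the function names are hypothetical, and how Odysseus Portable names links for repos containing several GGUF files is not covered here.

```python
from pathlib import PurePosixPath

HUB_PREFIX = "models--"


def flat_name_from_hub_path(gguf_path: str) -> str:
    """Derive the flat link name from a path inside the hub/ cache layout."""
    for part in PurePosixPath(gguf_path).parts:
        if part.startswith(HUB_PREFIX):
            return part[len(HUB_PREFIX):] + ".gguf"
    raise ValueError(f"no '{HUB_PREFIX}<org>--<repo>' component in {gguf_path}")


def flat_name_from_repo_id(repo_id: str) -> str:
    """Map a Hugging Face-style repo ID (org/name) to the same flat filename."""
    org, name = repo_id.split("/", 1)
    return f"{org}--{name}.gguf"


path = "models/hub/models--myorg--mymodel/snapshots/abc123/mymodel-q4_k_m.gguf"
assert flat_name_from_hub_path(path) == flat_name_from_repo_id("myorg/mymodel")
print(flat_name_from_hub_path(path))  # myorg--mymodel.gguf
```

Both directions arrive at the same filename, which is what lets a client send "myorg/mymodel" while llama-server only ever sees flat names.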

Quantization Guide

GGUF files for the same base model are distributed at multiple quantization levels. Choosing the right level means balancing file size, RAM/VRAM usage, and output quality.

Q2_K — Smallest size, lowest quality

Aggressively compressed. Use this only when storage or RAM is the hard constraint (e.g. 4 GB RAM machines). Output quality degrades noticeably compared to higher quantizations, but the model remains usable for simple tasks.

Typical size vs Q4_K_M: ~55–60%

Q4_K_M — Best balance (recommended)

The default choice for most users. Provides output quality close to the full-precision model at roughly a third of the F16 file size. All of the predefined models in Odysseus Portable use Q4_K_M.

Typical size vs F16: ~30–35%

Q5_K_M — Better quality, ~25% larger than Q4_K_M

A step up from Q4_K_M with a modest size increase. A good option if you have a few GB to spare and want tighter fidelity to the original model weights, particularly for structured output or code generation tasks.

Q6_K — Near-lossless quality

Very close to the original floating-point weights. Recommended when output accuracy is critical and storage is not a concern. Noticeably larger than Q4_K_M but still significantly smaller than F16.

Typical size vs Q4_K_M: ~140–150%

Q8_0 — Highest quality GGUF

The highest-fidelity quantized format. Output is nearly indistinguishable from F16 at about half the size. Requires significant RAM/VRAM; typically only practical for 3B models and smaller on mid-range hardware.

Typical size vs Q4_K_M: ~190–210%

F16 / F32 — Full precision

Full 16-bit or 32-bit floating-point weights. These are almost always too large for local inference on consumer hardware. A 7B model in F16 is roughly 14 GB; Q4_K_M is under 5 GB. Use quantized formats instead.
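The ratios quoted in this guide can be turned into a rough size estimate. The sketch below takes F16 as 2 bytes per parameter and uses the midpoints of the ranges given above (Q4_K_M at about 32.5% of F16, the other levels relative to Q4_K_M). Real files vary by architecture and quantizer, so treat the output as ballpark figures only; estimate_gb is an illustrative name.

```python
# Midpoints of the ranges quoted in this guide, relative to Q4_K_M.
RELATIVE_TO_Q4_K_M = {
    "Q2_K": 0.575,   # ~55-60%
    "Q4_K_M": 1.0,
    "Q5_K_M": 1.25,  # ~25% larger
    "Q6_K": 1.45,    # ~140-150%
    "Q8_0": 2.0,     # ~190-210%
}
Q4_K_M_VS_F16 = 0.325    # ~30-35% of F16
BYTES_PER_PARAM_F16 = 2  # 16-bit weights


def estimate_gb(params_billions: float, quant: str) -> float:
    """Approximate file size in GB for a model with the given parameter count."""
    # One billion parameters at 2 bytes each is 2 GB.
    f16_gb = params_billions * BYTES_PER_PARAM_F16
    if quant == "F16":
        return f16_gb
    return f16_gb * Q4_K_M_VS_F16 * RELATIVE_TO_Q4_K_M[quant]


for quant in ["Q2_K", "Q4_K_M", "Q5_K_M", "Q6_K", "Q8_0", "F16"]:
    print(f"7B {quant:7s} ~{estimate_gb(7, quant):.1f} GB")
```

For a 7B model this gives 14 GB for F16 and about 4.6 GB for Q4_K_M, in line with the figures quoted above. Remember that the context window needs memory on top of the weights.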

If a model is too large for your available VRAM or RAM, the proxy will automatically retry the request with a progressively smaller context window (stepping down through 32768 → 24576 → 16384 → 12288 → 8192 → 4096 → 2048 tokens). This often resolves memory pressure without any action on your part. However, if the model itself does not fit in memory at any context size, the request will still fail — in that case, switch to a smaller quantization (e.g. Q2_K or Q4_K_M) or a model with fewer parameters (e.g. 1.5B or 3B instead of 7B).
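The fallback behaviour described above amounts to a retry loop over a fixed list of context sizes. The sketch below is not the proxy's actual code: run_inference stands in for whatever call sends the request to the backend, and MemoryError stands in for however an out-of-memory failure is actually reported.

```python
from typing import Callable

# Context sizes from the paragraph above, largest first.
CONTEXT_STEPS = [32768, 24576, 16384, 12288, 8192, 4096, 2048]


def infer_with_fallback(run_inference: Callable[[int], str]) -> tuple[int, str]:
    """Try each context size in turn; return (context_size, result) for the first success."""
    last_error = None
    for n_ctx in CONTEXT_STEPS:
        try:
            return n_ctx, run_inference(n_ctx)
        except MemoryError as err:  # placeholder for the backend's out-of-memory signal
            last_error = err
    # Even the smallest context failed, so the model itself does not fit.
    raise RuntimeError("model does not fit in memory at any context size") from last_error


def fake_backend(n_ctx: int) -> str:
    """Hypothetical backend that only has room for 8192 tokens of context."""
    if n_ctx > 8192:
        raise MemoryError(f"cannot allocate context of {n_ctx} tokens")
    return "ok"


print(infer_with_fallback(fake_backend))  # (8192, 'ok')
```

The final RuntimeError corresponds to the case where shrinking the context is not enough and a smaller quantization or model is the only remedy.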

Finding GGUF Models

TheBloke on Hugging Face

One of the most prolific GGUF quantizers in the community. TheBloke has published Q2_K through Q8_0 versions of hundreds of popular open-weight models with consistent naming conventions.

unsloth on Hugging Face

Maintains optimized GGUF quantizations of DeepSeek, Llama, Mistral, and other frontier models — including the models bundled in Odysseus Portable’s predefined list.

When browsing Hugging Face for GGUF files, filter the model search by the gguf tag or look for repos with names ending in -GGUF. The Files and versions tab of each repo shows all available quantization variants with their exact file sizes, so you can choose before downloading.
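When a repo lists many variants, the quantization level is usually recoverable from the filename. The sketch below assumes the common naming pattern in which the level appears as a token such as q4_k_m or Q8_0 before the .gguf extension. The regular expression, the function names, and the default preference order are illustrative and will not match every repo's convention.

```python
import re
from typing import Iterable, Optional

# Matches tokens like Q4_K_M, q5_k_s, Q6_K, Q8_0, F16 (case-insensitive).
QUANT_RE = re.compile(
    r"(?<![a-z0-9])(q\d_(?:k(?:_[sml])?|\d)|f16|f32)(?![a-z0-9])",
    re.IGNORECASE,
)


def quant_level(filename: str) -> Optional[str]:
    """Return the quantization level found in a filename, upper-cased, or None."""
    match = QUANT_RE.search(filename)
    return match.group(1).upper() if match else None


def pick_file(
    filenames: Iterable[str],
    preferred: tuple[str, ...] = ("Q4_K_M", "Q5_K_M", "Q6_K", "Q8_0"),
) -> Optional[str]:
    """Return the first .gguf file whose level appears in the preference order."""
    by_level = {quant_level(f): f for f in filenames if f.endswith(".gguf")}
    for level in preferred:
        if level in by_level:
            return by_level[level]
    return None


files = ["README.md", "model-q2_k.gguf", "model-q4_k_m.gguf", "model-q8_0.gguf"]
print(pick_file(files))  # model-q4_k_m.gguf
```

Passing a different preferred tuple, for example ("Q2_K",) on a memory-constrained machine, changes which variant is selected.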
