Odysseus Portable comes pre-configured with a hand-picked set of GGUF models that cover the most common use cases (coding assistance, reasoning, and general-purpose chat) across a wide range of hardware. On the first launch with the llama.cpp backend, or any time the models/ folder is empty, the orchestrator pauses at the terminal and presents an interactive numbered menu. The menu shows any locally detected GGUF files first, followed by the full predefined download list. Selecting a download option streams the file directly from Hugging Face into your models/ folder before llama-server is started.
Predefined Model List
The full list below is sourced directly from src/model.js in the Odysseus Portable codebase. Every entry is a Q4_K_M quantization hosted on Hugging Face, chosen for its balance of capability and download size.
| Model | Hugging Face Repo | Quantization | Size | Best For |
|---|---|---|---|---|
| Qwen 2.5 Coder 0.5B Instruct | Qwen/Qwen2.5-Coder-0.5B-Instruct-GGUF | Q4_K_M | 0.38 GB | Super light, quick testing |
| Qwen 2.5 Coder 1.5B Instruct | Qwen/Qwen2.5-Coder-1.5B-Instruct-GGUF | Q4_K_M | 1.2 GB | Ultra-fast coding, light devices |
| Qwen 2.5 Coder 7B Instruct | Qwen/Qwen2.5-Coder-7B-Instruct-GGUF | Q4_K_M | 4.7 GB | Best for development |
| DeepSeek R1 Distill Qwen 1.5B | unsloth/DeepSeek-R1-Distill-Qwen-1.5B-GGUF | Q4_K_M | 1.1 GB | Fast reasoning, light devices |
| DeepSeek R1 Distill Qwen 7B | unsloth/DeepSeek-R1-Distill-Qwen-7B-GGUF | Q4_K_M | 4.7 GB | High-performance reasoning |
| Llama 3.2 3B Instruct | unsloth/Llama-3.2-3B-Instruct-GGUF | Q4_K_M | 2.0 GB | Balanced general intelligence |
| Llama 3 8B Instruct | unsloth/llama-3-8b-Instruct-gguf | Q4_K_M | 4.9 GB | Standard general model |
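The table maps naturally onto an array of plain objects. The sketch below is illustrative only: the field names (name, repo, quant, sizeGb) and the helper modelsUnder are assumptions made for this example, not the actual shape of src/model.js.

```javascript
// Hypothetical representation of the predefined list. Field names are
// assumptions for illustration, not the real contents of src/model.js.
const PREDEFINED_MODELS = [
  { name: "Qwen 2.5 Coder 0.5B Instruct", repo: "Qwen/Qwen2.5-Coder-0.5B-Instruct-GGUF", quant: "Q4_K_M", sizeGb: 0.38 },
  { name: "Qwen 2.5 Coder 1.5B Instruct", repo: "Qwen/Qwen2.5-Coder-1.5B-Instruct-GGUF", quant: "Q4_K_M", sizeGb: 1.2 },
  { name: "Qwen 2.5 Coder 7B Instruct", repo: "Qwen/Qwen2.5-Coder-7B-Instruct-GGUF", quant: "Q4_K_M", sizeGb: 4.7 },
  { name: "DeepSeek R1 Distill Qwen 1.5B", repo: "unsloth/DeepSeek-R1-Distill-Qwen-1.5B-GGUF", quant: "Q4_K_M", sizeGb: 1.1 },
  { name: "DeepSeek R1 Distill Qwen 7B", repo: "unsloth/DeepSeek-R1-Distill-Qwen-7B-GGUF", quant: "Q4_K_M", sizeGb: 4.7 },
  { name: "Llama 3.2 3B Instruct", repo: "unsloth/Llama-3.2-3B-Instruct-GGUF", quant: "Q4_K_M", sizeGb: 2.0 },
  { name: "Llama 3 8B Instruct", repo: "unsloth/llama-3-8b-Instruct-gguf", quant: "Q4_K_M", sizeGb: 4.9 },
];

// Return the models whose download size is at most maxGb, smallest first.
// filter() creates a new array, so the original list is not reordered.
function modelsUnder(maxGb, models = PREDEFINED_MODELS) {
  return models
    .filter((m) => m.sizeGb <= maxGb)
    .sort((a, b) => a.sizeGb - b.sizeGb);
}

// Prints the three models of 1.5 GB or less:
// Qwen 2.5 Coder 0.5B Instruct, DeepSeek R1 Distill Qwen 1.5B, Qwen 2.5 Coder 1.5B Instruct
console.log(modelsUnder(1.5).map((m) => m.name).join(", "));
```

Filtering by download size like this is a quick way to shortlist models when disk space or bandwidth is the main constraint.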
How the Selection Menu Works
When the CLI menu appears, the orchestrator:
- Scans models/ recursively for any .gguf files already present and lists them as Local Model: entries at the top.
- Appends all predefined download entries. If a predefined model's filename is already found locally, it is marked (Already Downloaded) and will not be re-downloaded if selected; the existing file is used directly.
- Prompts you to enter a number. If a default model was previously used, its entry is highlighted, and pressing Enter without typing accepts it.
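The three steps above can be sketched in a few small functions. This is a minimal illustration under stated assumptions, not the orchestrator's actual code: findGgufFiles, buildMenu, resolveChoice, and the file field on each predefined entry are all hypothetical names introduced here.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Step 1: recursively collect every .gguf file under dir.
function findGgufFiles(dir) {
  if (!fs.existsSync(dir)) return [];
  const found = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) found.push(...findGgufFiles(full));
    else if (entry.name.toLowerCase().endsWith(".gguf")) found.push(full);
  }
  return found;
}

// Step 2: local files first, then the predefined list. A predefined entry
// whose filename already exists locally is labeled and not re-downloaded.
// The "file" field on predefined entries is an assumed name.
function buildMenu(localFiles, predefined, defaultFile = null) {
  const localNames = new Set(localFiles.map((f) => path.basename(f)));
  const entries = [
    ...localFiles.map((f) => ({
      label: `Local Model: ${path.basename(f)}`,
      file: path.basename(f),
      download: false,
    })),
    ...predefined.map((m) => {
      const have = localNames.has(m.file);
      return {
        label: have ? `${m.name} (Already Downloaded)` : m.name,
        file: m.file,
        download: !have,
      };
    }),
  ];
  // Highlight only the first entry that matches the previous default.
  const defaultIndex = entries.findIndex((e) => e.file === defaultFile);
  return entries.map((e, i) => ({ ...e, number: i + 1, isDefault: i === defaultIndex }));
}

// Step 3: empty input accepts the default; otherwise expect a menu number.
// Returns null when the input does not match any entry.
function resolveChoice(input, menu) {
  const trimmed = input.trim();
  if (trimmed === "") return menu.find((e) => e.isDefault) ?? null;
  if (!/^\d+$/.test(trimmed)) return null;
  const n = Number.parseInt(trimmed, 10);
  return menu.find((e) => e.number === n) ?? null;
}
```

A caller would typically pass findGgufFiles("models") as the first argument to buildMenu, print each entry's number and label, and feed the typed line to resolveChoice.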
The predefined list is just a starting point. Any GGUF model from any source can be used by placing it in the models/ folder manually. See Custom GGUF Models for details.
Recommended Models by Hardware Tier
Low-end hardware (4–8 GB RAM, no dedicated GPU)
On CPU-only machines or systems with less than 8 GB of RAM, the smaller models keep inference responsive and avoid out-of-memory crashes.
Recommended picks:
- Qwen 2.5 Coder 0.5B Instruct (0.38 GB) — the lightest option, ideal for quick code completions and testing on very constrained hardware.
- DeepSeek R1 Distill Qwen 1.5B (1.1 GB) — a reasoning-capable model at a size that fits comfortably in 4 GB of RAM.
Mid-range hardware (8–16 GB RAM or entry-level GPU)
Systems with 8–16 GB of RAM, or a GPU with 6–8 GB of VRAM, can run the 1.5B–3B range comfortably and get noticeably better output quality.
Recommended picks:
- Qwen 2.5 Coder 1.5B Instruct (1.2 GB) — fast coding assistance with a small memory footprint.
- Llama 3.2 3B Instruct (2.0 GB) — well-rounded general chat and reasoning.
- DeepSeek R1 Distill Qwen 1.5B (1.1 GB) — structured reasoning tasks at low cost.
High-end hardware (16+ GB RAM or 8+ GB VRAM)
With a modern GPU or a system with 16 GB or more of RAM, the 7B–8B models run efficiently and deliver the highest output quality among the predefined models.
Recommended picks:
- Qwen 2.5 Coder 7B Instruct (4.7 GB) — the top choice for serious coding work, with deep language and tool-use capabilities.
- DeepSeek R1 Distill Qwen 7B (4.7 GB) — best-in-class reasoning at the 7B scale.
- Llama 3 8B Instruct (4.9 GB) — a reliable general-purpose model for chat, summarisation, and agent tasks.
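As a rough aid, the tier boundaries above can be turned into a small lookup. This is a sketch, not part of Odysseus Portable: tierFor and RECOMMENDED are hypothetical names, and the thresholds are simply read off the tier headings (under 8 GB RAM is low, 8 to 16 GB RAM or 6+ GB VRAM is mid, 16+ GB RAM or 8+ GB VRAM is high).

```javascript
const os = require("node:os");

// Picks per tier, copied from the recommendations above.
const RECOMMENDED = {
  low: ["Qwen 2.5 Coder 0.5B Instruct", "DeepSeek R1 Distill Qwen 1.5B"],
  mid: ["Qwen 2.5 Coder 1.5B Instruct", "Llama 3.2 3B Instruct", "DeepSeek R1 Distill Qwen 1.5B"],
  high: ["Qwen 2.5 Coder 7B Instruct", "DeepSeek R1 Distill Qwen 7B", "Llama 3 8B Instruct"],
};

// Map RAM (and optionally VRAM) in GB to a tier name.
// Thresholds are assumptions derived from the tier headings.
function tierFor(ramGb, vramGb = 0) {
  if (ramGb >= 16 || vramGb >= 8) return "high";
  if (ramGb >= 8 || vramGb >= 6) return "mid";
  return "low";
}

// os.totalmem() returns total system memory in bytes.
const ramGb = os.totalmem() / 1024 ** 3;
const tier = tierFor(ramGb);
console.log(`Detected ${ramGb.toFixed(1)} GB RAM, tier: ${tier}`);
console.log(RECOMMENDED[tier].join("\n"));
```

The operating system often reports slightly less than the nominal capacity (a 16 GB machine may show about 15.6 GB), so a real tool would likely use somewhat lower cut-offs. VRAM is passed in by hand here because Node.js has no built-in way to query it.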
Model Sources
All predefined models are hosted on Hugging Face and downloaded directly over HTTPS. No third-party relay or proxy is involved. Downloads are stored in models/ and remain entirely local to your Odysseus Portable workspace; nothing is sent to external services during inference.
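For reference, a direct HTTPS download from Hugging Face can be sketched as follows. Files in a Hugging Face repository are served at the /resolve/<revision>/<filename> path. Everything else here is an assumption for illustration: downloadUrl and downloadModel are hypothetical helpers rather than functions from the Odysseus Portable codebase, the sketch relies on the global fetch available in Node.js 18 and later, and the exact GGUF filename must be looked up in the repository.

```javascript
const fs = require("node:fs");
const path = require("node:path");
const { Readable } = require("node:stream");
const { pipeline } = require("node:stream/promises");

// Build the direct-download URL for a file in a Hugging Face repo.
function downloadUrl(repo, filename, revision = "main") {
  return `https://huggingface.co/${repo}/resolve/${revision}/${encodeURIComponent(filename)}`;
}

// Stream a model file into modelsDir, reusing an existing copy if present.
// Hypothetical helper: the real orchestrator may work differently.
async function downloadModel(repo, filename, modelsDir = "models") {
  const dest = path.join(modelsDir, filename);
  if (fs.existsSync(dest)) return dest; // already downloaded, use as-is

  fs.mkdirSync(modelsDir, { recursive: true });
  const res = await fetch(downloadUrl(repo, filename));
  if (!res.ok || !res.body) {
    throw new Error(`Download failed: HTTP ${res.status}`);
  }

  // Write to a temporary name first so an interrupted download is never
  // mistaken for a complete model file on the next launch.
  const partial = `${dest}.part`;
  await pipeline(Readable.fromWeb(res.body), fs.createWriteStream(partial));
  fs.renameSync(partial, dest);
  return dest;
}
```

A call might look like downloadModel("Qwen/Qwen2.5-Coder-7B-Instruct-GGUF", "qwen2.5-coder-7b-instruct-q4_k_m.gguf"), where the filename is an assumed example. Streaming through pipeline keeps memory use flat regardless of file size, which matters for multi-gigabyte models.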
Qwen Models
The Qwen 2.5 Coder series from Alibaba is optimized for code generation and completion tasks across multiple languages.
DeepSeek & Llama via unsloth
The unsloth organization on Hugging Face provides well-maintained GGUF quantizations of DeepSeek R1 Distill and Llama models.