Skip to main content

Documentation Index

Fetch the complete documentation index at: https://mintlify.com/pewdiepie-archdaemon/odysseus/llms.txt

Use this file to discover all available pages before exploring further.

Cookbook is Odysseus’s built-in model manager. Built on llmfit, it scans your hardware, scores every compatible model against your specific GPU and RAM configuration, and presents a ranked list of recommendations. Download from HuggingFace with one click, then spin up a server — all without touching a terminal. No CLI knowledge required.

Hardware scan

Cookbook probes your system at startup and on demand (the Rescan button). It detects:
  • GPU — vendor (NVIDIA/AMD/Apple Silicon), model name, and VRAM size
  • RAM — total system memory, used as overflow when VRAM is insufficient for full GPU serving
  • Backend — CUDA, ROCm, Metal, or CPU
Detection results are cached for 24 hours. For remote servers, hardware is probed over SSH.
In Docker, Cookbook can only detect GPUs that the container runtime exposes. If you see the wrong GPU (or no GPU), the Docker GPU overlay is not configured. See GPU Setup for the NVIDIA and AMD passthrough setup scripts.

Model recommendations and fit scoring

Using the detected hardware profile, Cookbook calculates a fit score for every model in its catalog. The score weighs four factors based on the use case (general, coding, reasoning, chat, multimodal):
  • Hardware fit — does the model’s memory footprint realistically fit in VRAM (or with acceptable CPU offload)?
  • Estimated speed — predicted tokens per second based on GPU memory bandwidth and quantization
  • Quality — parameter count and architecture generation bonus (e.g. Qwen3 scores higher than older Qwen2 at the same size)
  • Context headroom — how much context window remains after the model weights are loaded
Models are ranked by fit score so the best option for your exact hardware appears first. Supported formats:
FormatDescription
GGUFQuantized weights for llama.cpp (Q4_K_M, Q5_K_M, Q8_0, and others)
FP88-bit floating point for vLLM on Hopper (H100) and newer GPUs
AWQActivation-aware weight quantization for vLLM

Downloading models

Click Download next to any recommended model to begin a background download from HuggingFace.
  • Downloads run in a tmux session in the background so Odysseus stays responsive.
  • Progress is tracked and visible in the Downloads panel.
  • Downloaded files are stored in ./data/huggingface (mapped to ~/.cache/huggingface inside Docker).
For gated or private models, add your HuggingFace token in Cookbook → Settings → HuggingFace Token. The token is encrypted at rest.
tmux must be installed on the server for background downloads and serves to work on Linux and macOS. Install it via your OS package manager (apt install tmux, brew install tmux, etc.).

Serving models

Once a model is downloaded, click Serve to start a local inference server. Cookbook manages the tmux session, writes the task to cookbook_state.json, and automatically registers the running model as a chat endpoint in Settings → Models. Supported serving engines:
CUDA and ROCm only. Best throughput for GPU inference. Install via Cookbook → Dependencies. Not available on macOS.
# Example vLLM serve command (generated by Cookbook)
vllm serve Qwen/Qwen3-8B --port 8000
Install missing serve engines from Cookbook → Dependencies — Cookbook handles the pip install and build steps.

Serve presets

Once you’ve dialed in a working serve command (model, flags, port, quantization level), save it as a Serve Preset from the Cookbook UI. Presets let you re-launch the same configuration with one click, or ask the agent to “start the Qwen serve preset” by name.

Remote servers

Cookbook can download and serve models on a remote GPU server over SSH, so you don’t need Odysseus itself running on the GPU machine.
1

Generate the Cookbook SSH key

Go to Cookbook → Settings → Servers and click Generate SSH Key. Cookbook creates an ed25519 key pair. The public key is shown on screen.Alternatively, from the host you can copy the key manually:
ssh-copy-id -i data/ssh/id_ed25519.pub user@server
2

Add the public key to the remote server

On the remote server, append the public key to ~/.ssh/authorized_keys:
echo "ssh-ed25519 AAAA... odysseus-cookbook" >> ~/.ssh/authorized_keys
3

Add the server in Cookbook

In Cookbook → Settings → Servers, click Add Server and enter user@hostname. Cookbook will SSH in for hardware probes and model operations.

Docker storage

When running Odysseus in Docker, model downloads and serve-engine installations survive container recreation because they are mapped to host volumes:
Path inside containerHost pathContents
~/.cache/huggingface./data/huggingfaceDownloaded model files
~/.local./data/localCookbook-installed Python CLIs and serve engines
Do not delete these directories unless you intend to re-download all models.

macOS notes

Apple Silicon Macs are fully supported with Metal acceleration via llama.cpp. However:
  • vLLM and SGLang are CUDA/ROCm only — they do not run on macOS.
  • Docker on macOS cannot use the GPU. For GPU-accelerated Cookbook on an M-series Mac, run Odysseus natively with ./start-macos.sh.
  • MLX-only models are not served by Odysseus Cookbook.
Docker requires GPU passthrough configured at the container-runtime level before Cookbook can see or use your GPU. Run the diagnostic script first:
# NVIDIA
scripts/check-docker-gpu.sh

# AMD
scripts/check-docker-amd-gpu.sh
See GPU Setup for full instructions.

Build docs developers (and LLMs) love