Cookbook is Odysseus’s built-in model manager. Built on llmfit, it scans your hardware, scores every compatible model against your specific GPU and RAM configuration, and presents a ranked list of recommendations. Download from HuggingFace with one click, then spin up a server, all without touching a terminal or knowing any CLI commands.
Hardware scan
Cookbook probes your system at startup and on demand (the Rescan button). It detects:
- GPU: vendor (NVIDIA, AMD, or Apple Silicon), model name, and VRAM size
- RAM: total system memory, used as overflow when VRAM is insufficient for full GPU serving
- Backend: CUDA, ROCm, Metal, or CPU
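The relationship between the detected GPU vendor and the backend can be pictured with a small sketch. This is not Cookbook's actual detection code; the vendor strings and the `pick_backend` helper are hypothetical, and the fallback to CPU for unknown vendors is an assumption.

```python
from enum import Enum
from typing import Optional


class Backend(str, Enum):
    CUDA = "CUDA"
    ROCM = "ROCm"
    METAL = "Metal"
    CPU = "CPU"


# Assumed vendor-to-backend mapping (the vendor labels are hypothetical strings).
_VENDOR_BACKEND = {
    "nvidia": Backend.CUDA,
    "amd": Backend.ROCM,
    "apple": Backend.METAL,
}


def pick_backend(gpu_vendor: Optional[str]) -> Backend:
    """Return the backend for a detected GPU vendor, or CPU when no usable GPU is visible."""
    if not gpu_vendor:
        return Backend.CPU
    # Unknown vendors fall back to CPU in this sketch.
    return _VENDOR_BACKEND.get(gpu_vendor.strip().lower(), Backend.CPU)
```

In a container where no GPU is exposed, a probe like this would see no vendor at all and report CPU, which is why a missing GPU overlay shows up as a wrong or absent GPU.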
In Docker, Cookbook can only detect GPUs that the container runtime exposes. If you see the wrong GPU (or no GPU), the Docker GPU overlay is not configured. See GPU Setup for the NVIDIA and AMD passthrough setup scripts.
Model recommendations and fit scoring
Using the detected hardware profile, Cookbook calculates a fit score for every model in its catalog. The score weighs four factors based on the use case (general, coding, reasoning, chat, multimodal):
- Hardware fit: does the model’s memory footprint realistically fit in VRAM (or with acceptable CPU offload)?
- Estimated speed: predicted tokens per second based on GPU memory bandwidth and quantization
- Quality: parameter count and architecture generation bonus (e.g. Qwen3 scores higher than older Qwen2 at the same size)
- Context headroom: how much context window remains after the model weights are loaded
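One way to picture a use-case-dependent score over these four factors is a weighted sum. The sketch below is illustrative only: the weights, the 0 to 100 scale, and the names `Factors`, `WEIGHTS`, and `fit_score` are assumptions, not the formula llmfit or Cookbook actually uses.

```python
from dataclasses import dataclass


@dataclass
class Factors:
    """Each factor is assumed to be normalized to the range 0..1."""
    hardware_fit: float
    speed: float
    quality: float
    context_headroom: float


# Hypothetical weights per use case, in the order
# (hardware_fit, speed, quality, context_headroom). Each row sums to 1.0.
WEIGHTS = {
    "general":    (0.40, 0.20, 0.25, 0.15),
    "coding":     (0.35, 0.15, 0.30, 0.20),
    "reasoning":  (0.35, 0.10, 0.35, 0.20),
    "chat":       (0.40, 0.30, 0.20, 0.10),
    "multimodal": (0.45, 0.15, 0.25, 0.15),
}


def fit_score(f: Factors, use_case: str = "general") -> float:
    """Combine the four factors into a 0..100 score using the use case's weights."""
    weights = WEIGHTS[use_case]
    values = (f.hardware_fit, f.speed, f.quality, f.context_headroom)
    return 100.0 * sum(w * v for w, v in zip(weights, values))
```

With these example weights, a model that fits fully in VRAM but is average elsewhere (`Factors(1.0, 0.5, 0.5, 0.5)`) outranks a stronger model that barely fits (`Factors(0.2, 0.9, 0.9, 0.9)`) for the general use case.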
| Format | Description |
|---|---|
| GGUF | Quantized weights for llama.cpp (Q4_K_M, Q5_K_M, Q8_0, and others) |
| FP8 | 8-bit floating point for vLLM on Hopper (H100) and newer GPUs |
| AWQ | Activation-aware weight quantization for vLLM |
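The formats differ mainly in how many bits each weight occupies, which drives both the hardware-fit and speed estimates. The sketch below shows the arithmetic under stated assumptions: the bits-per-weight values are rough approximations, the 1.5 GiB reserve for KV cache and runtime overhead is a made-up default, and the speed figure is only the usual memory-bandwidth upper bound for decoding, not a measured number. None of these helpers are Cookbook APIs.

```python
# Approximate bits per weight. Rough, assumed values for illustration only.
BITS_PER_WEIGHT = {
    "Q4_K_M": 4.8,
    "Q5_K_M": 5.7,
    "Q8_0": 8.5,
    "FP8": 8.0,
    "AWQ": 4.0,  # assumes the common 4-bit variant, ignoring scale overhead
}

GIB = 1024 ** 3


def weight_bytes(n_params: float, fmt: str) -> float:
    """Estimated size of the model weights in bytes."""
    return n_params * BITS_PER_WEIGHT[fmt] / 8


def fits_in_vram(n_params: float, fmt: str, vram_gib: float,
                 reserve_gib: float = 1.5) -> bool:
    """True if weights plus an assumed reserve fit in the given VRAM."""
    return weight_bytes(n_params, fmt) / GIB + reserve_gib <= vram_gib


def decode_tokens_per_sec_upper_bound(n_params: float, fmt: str,
                                      bandwidth_gb_s: float) -> float:
    """Bandwidth-bound ceiling: every weight is read once per generated token."""
    return bandwidth_gb_s * 1e9 / weight_bytes(n_params, fmt)
```

For example, under these assumptions an 8-billion-parameter model at Q4_K_M fits on an 8 GiB card, while the same model at Q8_0 does not.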
Downloading models
Click Download next to any recommended model to begin a background download from HuggingFace.
- Downloads run in a tmux session in the background so Odysseus stays responsive.
- Progress is tracked and visible in the Downloads panel.
- Downloaded files are stored in ./data/huggingface (mapped to ~/.cache/huggingface inside Docker).
Serving models
Once a model is downloaded, click Serve to start a local inference server. Cookbook manages the tmux session, writes the task to cookbook_state.json, and automatically registers the running model as a chat endpoint in Settings → Models.
Supported serving engines:
- vLLM: CUDA and ROCm only. Best throughput for GPU inference. Install via Cookbook → Dependencies. Not available on macOS.
- llama.cpp
- Ollama
Serve presets
Once you’ve dialed in a working serve command (model, flags, port, quantization level), save it as a Serve Preset from the Cookbook UI. Presets let you re-launch the same configuration with one click, or ask the agent to “start the Qwen serve preset” by name.
Remote servers
Cookbook can download and serve models on a remote GPU server over SSH, so you don’t need Odysseus itself running on the GPU machine.
Generate the Cookbook SSH key
Go to Cookbook → Settings → Servers and click Generate SSH Key. Cookbook creates an ed25519 key pair. The public key is shown on screen.
Alternatively, you can copy the key manually from the host.
Add the public key to the remote server
On the remote server, append the public key to ~/.ssh/authorized_keys.
Docker storage
When running Odysseus in Docker, model downloads and serve-engine installations survive container recreation because they are mapped to host volumes:
| Path inside container | Host path | Contents |
|---|---|---|
| ~/.cache/huggingface | ./data/huggingface | Downloaded model files |
| ~/.local | ./data/local | Cookbook-installed Python CLIs and serve engines |
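When looking for a downloaded file on the host, the mapping in the table can be applied mechanically. The following is a minimal sketch, assuming the two volume mappings listed above; `to_host_path` is a hypothetical helper, the example file names are made up, and `~` is left literal because the container user is not specified here.

```python
from pathlib import PurePosixPath
from typing import Optional

# Container path -> host path, taken from the table above.
VOLUMES = {
    PurePosixPath("~/.cache/huggingface"): PurePosixPath("./data/huggingface"),
    PurePosixPath("~/.local"): PurePosixPath("./data/local"),
}


def to_host_path(container_path: str) -> Optional[PurePosixPath]:
    """Translate a path inside the container to its host location, or None if unmapped."""
    p = PurePosixPath(container_path)
    for inside, host in VOLUMES.items():
        try:
            # relative_to compares whole path components, so "~/.localfoo" does not match "~/.local".
            rel = p.relative_to(inside)
        except ValueError:
            continue
        return host / rel
    return None
```

Paths outside the two mapped directories return `None`, which matches the point of the table: anything not stored under a mapped volume is lost when the container is recreated.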
macOS notes
Apple Silicon Macs are fully supported with Metal acceleration via llama.cpp. However:
- vLLM and SGLang are CUDA/ROCm only and do not run on macOS.
- Docker on macOS cannot use the GPU. For GPU-accelerated Cookbook on an M-series Mac, run Odysseus natively with ./start-macos.sh.
- MLX-only models are not served by Odysseus Cookbook.
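The platform constraints above can be summarized as a lookup from backend to usable engines. This is a sketch, not Cookbook's logic: the vLLM and SGLang rows follow the notes above, while the llama.cpp and Ollama rows (all four backends) are assumptions based on those projects' general hardware support, and `engines_for` is a hypothetical helper.

```python
from typing import List

# Engine -> backends it can run on.
# vLLM/SGLang rows come from the notes above; the other two rows are assumptions.
ENGINE_BACKENDS = {
    "vLLM": {"CUDA", "ROCm"},
    "SGLang": {"CUDA", "ROCm"},
    "llama.cpp": {"CUDA", "ROCm", "Metal", "CPU"},
    "Ollama": {"CUDA", "ROCm", "Metal", "CPU"},
}


def engines_for(backend: str, docker_on_macos: bool = False) -> List[str]:
    """List engines usable on a backend; Docker on macOS is treated as CPU-only."""
    effective = "CPU" if docker_on_macos else backend
    return [name for name, backends in ENGINE_BACKENDS.items() if effective in backends]
```

For an M-series Mac running natively, `engines_for("Metal")` leaves out vLLM and SGLang, consistent with the first bullet above.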