Cookbook: Hardware-Aware Model Manager for Odysseus

Cookbook is Odysseus’s built-in model manager. Built on llmfit, it scans your hardware, scores every compatible model against your specific GPU and RAM configuration, and presents a ranked list of recommendations. Download from HuggingFace with one click, then spin up a server — all without touching a terminal. No CLI knowledge required.

Hardware scan

Cookbook probes your system at startup and on demand (the Rescan button). It detects:

GPU — vendor (NVIDIA/AMD/Apple Silicon), model name, and VRAM size
RAM — total system memory, used as overflow when VRAM is insufficient for full GPU serving
Backend — CUDA, ROCm, Metal, or CPU

Detection results are cached for 24 hours. For remote servers, hardware is probed over SSH.

In Docker, Cookbook can only detect GPUs that the container runtime exposes. If you see the wrong GPU (or no GPU), the Docker GPU overlay is not configured. See GPU Setup for the NVIDIA and AMD passthrough setup scripts.

Model recommendations and fit scoring

Using the detected hardware profile, Cookbook calculates a fit score for every model in its catalog. The score weighs four factors based on the use case (general, coding, reasoning, chat, multimodal):

Hardware fit — does the model’s memory footprint realistically fit in VRAM (or with acceptable CPU offload)?
Estimated speed — predicted tokens per second based on GPU memory bandwidth and quantization
Quality — parameter count and architecture generation bonus (e.g. Qwen3 scores higher than older Qwen2 at the same size)
Context headroom — how much context window remains after the model weights are loaded

Models are ranked by fit score so the best option for your exact hardware appears first. Supported formats:

Format	Description
GGUF	Quantized weights for llama.cpp (Q4_K_M, Q5_K_M, Q8_0, and others)
FP8	8-bit floating point for vLLM on Hopper (H100) and newer GPUs
AWQ	Activation-aware weight quantization for vLLM

Downloading models

Click Download next to any recommended model to begin a background download from HuggingFace.

Downloads run in a tmux session in the background so Odysseus stays responsive.
Progress is tracked and visible in the Downloads panel.
Downloaded files are stored in ./data/huggingface (mapped to ~/.cache/huggingface inside Docker).

For gated or private models, add your HuggingFace token in Cookbook → Settings → HuggingFace Token. The token is encrypted at rest.

tmux must be installed on the server for background downloads and serves to work on Linux and macOS. Install it via your OS package manager (apt install tmux, brew install tmux, etc.).

Serving models

Once a model is downloaded, click Serve to start a local inference server. Cookbook manages the tmux session, writes the task to cookbook_state.json, and automatically registers the running model as a chat endpoint in Settings → Models. Supported serving engines:

vLLM
llama.cpp
Ollama

CUDA and ROCm only. Best throughput for GPU inference. Install via Cookbook → Dependencies. Not available on macOS.

# Example vLLM serve command (generated by Cookbook)
vllm serve Qwen/Qwen3-8B --port 8000

Works on CUDA, ROCm, Metal (Apple Silicon), and CPU. Cross-platform. Cookbook builds a hardware-appropriate binary for you.

# Example llama.cpp serve command (generated by Cookbook)
llama-server -m ~/.cache/huggingface/Qwen3-8B-Q4_K_M.gguf --port 8080

Wraps the Ollama CLI. Useful if Ollama is already installed on the system. Serves via the Ollama API (/api/chat or /v1).

Install missing serve engines from Cookbook → Dependencies — Cookbook handles the pip install and build steps.

Serve presets

Once you’ve dialed in a working serve command (model, flags, port, quantization level), save it as a Serve Preset from the Cookbook UI. Presets let you re-launch the same configuration with one click, or ask the agent to “start the Qwen serve preset” by name.

Remote servers

Cookbook can download and serve models on a remote GPU server over SSH, so you don’t need Odysseus itself running on the GPU machine.

Generate the Cookbook SSH key

Go to Cookbook → Settings → Servers and click Generate SSH Key. Cookbook creates an ed25519 key pair. The public key is shown on screen.Alternatively, from the host you can copy the key manually:

ssh-copy-id -i data/ssh/id_ed25519.pub user@server

Add the public key to the remote server

On the remote server, append the public key to ~/.ssh/authorized_keys:

echo "ssh-ed25519 AAAA... odysseus-cookbook" >> ~/.ssh/authorized_keys

Add the server in Cookbook

In Cookbook → Settings → Servers, click Add Server and enter user@hostname. Cookbook will SSH in for hardware probes and model operations.

Docker storage

When running Odysseus in Docker, model downloads and serve-engine installations survive container recreation because they are mapped to host volumes:

Path inside container	Host path	Contents
`~/.cache/huggingface`	`./data/huggingface`	Downloaded model files
`~/.local`	`./data/local`	Cookbook-installed Python CLIs and serve engines

Do not delete these directories unless you intend to re-download all models.

macOS notes

Apple Silicon Macs are fully supported with Metal acceleration via llama.cpp. However:

vLLM and SGLang are CUDA/ROCm only — they do not run on macOS.
Docker on macOS cannot use the GPU. For GPU-accelerated Cookbook on an M-series Mac, run Odysseus natively with ./start-macos.sh.
MLX-only models are not served by Odysseus Cookbook.

Docker requires GPU passthrough configured at the container-runtime level before Cookbook can see or use your GPU. Run the diagnostic script first:

# NVIDIA
scripts/check-docker-gpu.sh

# AMD
scripts/check-docker-amd-gpu.sh

See GPU Setup for full instructions.

Get Started

Features

Deployment

Integrations

Security & Administration

Cookbook: Hardware-Aware Model Manager for Odysseus

Hardware scan

Model recommendations and fit scoring

Downloading models

Serving models

Serve presets

Remote servers

Docker storage

macOS notes

Build docs developers (and LLMs) love

Get Started

Features

Deployment

Integrations

Security & Administration

Documentation Index

​Hardware scan

​Model recommendations and fit scoring

​Downloading models

​Serving models

​Serve presets

​Remote servers

​Docker storage

​macOS notes

Build docs developers (and LLMs) love

Hardware scan

Model recommendations and fit scoring

Downloading models

Serving models

Serve presets

Remote servers

Docker storage

macOS notes