
Overview

The cactus download command fetches models from HuggingFace and converts them to Cactus format. Models are cached in ./weights for offline use.

Syntax

cactus download <model> [flags]

Arguments

  • <model> - Model short name or HuggingFace repository ID (see Model Naming Conventions below)

Model Naming Conventions

Cactus supports several model name formats:

Short Names

cactus download qwen-2.5-1.5b
cactus download llama-3.2-1b
cactus download phi-4

HuggingFace Repository Format

cactus download Qwen/Qwen2.5-1.5B-Instruct
cactus download meta-llama/Llama-3.2-1B-Instruct

Flags

--precision

Set the quantization precision level:
cactus download <model> --precision INT4|INT8|FP16
Default: INT4

Options:
  • INT4 - 4-bit quantization (smallest size, ~1-2GB per model)
  • INT8 - 8-bit quantization (medium size, ~2-4GB per model)
  • FP16 - 16-bit floating point (largest size, ~4-8GB per model)
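For example, to trade a larger download for higher fidelity with 8-bit quantization (model name taken from the short-name list above):

```shell
# Download with INT8 instead of the INT4 default
cactus download qwen-2.5-1.5b --precision INT8
```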

--token

Provide a HuggingFace API token for authentication:
cactus download <model> --token <your-hf-token>
Required for:
  • Gated models (Llama, Gemma)
  • Private repositories
  • Rate-limited downloads
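For instance, assuming you keep your token in an HF_TOKEN environment variable (a common shell convention, not something Cactus reads automatically):

```shell
# Authenticate to download a gated model
cactus download meta-llama/Llama-3.2-1B-Instruct --token $HF_TOKEN
```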

--reconvert

Force reconversion from source weights:
cactus download <model> --reconvert
Useful when:
  • Model format has been updated
  • Previous conversion was incomplete
  • Switching between precision levels
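A typical case is combining it with a precision change, so the cached copy is rebuilt from the original source weights:

```shell
# Rebuild the cached model at a different precision
cactus download qwen-2.5-1.5b --precision INT8 --reconvert
```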

Examples

# Download Qwen with default INT4 precision
cactus download qwen-2.5-1.5b
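A few more invocations combining the flags documented above (token assumed to be in an HF_TOKEN environment variable):

```shell
# Download a gated model at FP16 precision
cactus download meta-llama/Llama-3.2-1B-Instruct --precision FP16 --token $HF_TOKEN

# Force a fresh conversion of an already-cached model
cactus download phi-4 --reconvert
```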

Download Progress

The command shows real-time download and conversion progress:
┌─────────────────────────────────────────────┐
│ Downloading: qwen-2.5-1.5b                  │
│ Precision: INT4                             │
└─────────────────────────────────────────────┘

Fetching from HuggingFace...
model.safetensors ████████████████ 100% 1.2GB
tokenizer.json    ████████████████ 100% 2.1MB
config.json       ████████████████ 100% 1.8KB

Converting to Cactus format...
Quantizing to INT4 ████████████████ 100%

✓ Model downloaded to ./weights/qwen-2.5-1.5b-int4

Cache Location

All downloaded models are stored in:
./weights/
├── qwen-2.5-1.5b-int4/
├── llama-3.2-1b-fp16/
├── phi-4-int8/
└── parakeet-1.1b-int4/
Each model directory contains:
  • Quantized weights
  • Tokenizer files
  • Model configuration
  • Metadata
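Since the cache is an ordinary directory, standard shell tools can show what is cached and how much space each model uses (assuming at least one model has been downloaded):

```shell
# List cached models and their on-disk sizes
du -sh ./weights/*
```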

Disk Space Requirements

Typical sizes by precision:
Precision | 1B Model | 3B Model | 7B Model
INT4      | ~800MB   | ~2GB     | ~4GB
INT8      | ~1.5GB   | ~3.5GB   | ~7GB
FP16      | ~3GB     | ~7GB     | ~14GB

Offline Usage

Once downloaded, models can be used without an internet connection:
# Download while online
cactus download qwen-2.5-1.5b

# Use offline later
cactus run qwen-2.5-1.5b  # Uses cached version

See Also

Run Command

Run downloaded models interactively

Convert Command

Convert models with custom settings
