Overview

The cactus download command fetches models from HuggingFace and converts them to Cactus format. Models are cached in ./weights for offline use.

Syntax

cactus download <model> [flags]

Arguments

  • <model> - Model name or HuggingFace repository

Model Naming Conventions

Cactus supports several model name formats:

Short Names

cactus download qwen-2.5-1.5b
cactus download llama-3.2-1b
cactus download phi-4

HuggingFace Repository Format

cactus download Qwen/Qwen2.5-1.5B-Instruct
cactus download meta-llama/Llama-3.2-1B-Instruct

Flags

--precision

Set the quantization precision level:
cactus download <model> --precision INT4|INT8|FP16
Default: INT4
Options:
  • INT4 - 4-bit quantization (smallest size, ~1-2GB per model)
  • INT8 - 8-bit quantization (medium size, ~2-4GB per model)
  • FP16 - 16-bit floating point (largest size, ~4-8GB per model)

--token

Provide a HuggingFace API token for authentication:
cactus download <model> --token <your-hf-token>
Required for:
  • Gated models (Llama, Gemma)
  • Private repositories
  • Downloads that would otherwise hit anonymous rate limits
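
To keep a token out of shell history, it can be read from an environment variable instead of being typed inline. The following is a minimal sketch, not part of Cactus itself: the function name download_gated and the variables HF_TOKEN and CACTUS_BIN are names chosen for this example, and nothing above says that cactus reads HF_TOKEN on its own. Only the documented --token flag is used.

```shell
# Hypothetical wrapper around `cactus download` for gated or private models.
# HF_TOKEN (the token) and CACTUS_BIN (the binary to run, default "cactus")
# are assumed names for this sketch, not documented Cactus settings.
download_gated() {
    model=$1
    if [ -z "${HF_TOKEN:-}" ]; then
        echo "HF_TOKEN is not set; export it before downloading gated models" >&2
        return 1
    fi
    # Pass the token through the documented --token flag.
    "${CACTUS_BIN:-cactus}" download "$model" --token "$HF_TOKEN"
}
```

With HF_TOKEN exported in the current shell, running download_gated meta-llama/Llama-3.2-1B-Instruct is equivalent to the --token form shown above.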

--reconvert

Force reconversion from source weights:
cactus download <model> --reconvert
Useful when:
  • Model format has been updated
  • Previous conversion was incomplete
  • Switching between precision levels
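
Switching a cached model to another precision can be sketched as a small wrapper that validates the requested level and then forces a reconversion. This assumes that --precision and --reconvert can be combined in one invocation, which the list above implies but does not state outright. The names switch_precision and CACTUS_BIN are hypothetical.

```shell
# Hypothetical helper: re-download/convert a model at a different precision.
# Assumes --precision and --reconvert may be passed together.
# CACTUS_BIN is an assumed override for the binary name (default "cactus").
switch_precision() {
    model=$1
    precision=$2
    # Accept only the precision levels listed in this page.
    case $precision in
        INT4|INT8|FP16) ;;
        *) echo "unsupported precision: $precision" >&2; return 2 ;;
    esac
    "${CACTUS_BIN:-cactus}" download "$model" --precision "$precision" --reconvert
}
```

For example, switch_precision qwen-2.5-1.5b INT8 would run cactus download qwen-2.5-1.5b --precision INT8 --reconvert.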

Examples

# Download Qwen with default INT4 precision
cactus download qwen-2.5-1.5b

Download Progress

The command shows real-time download and conversion progress:
┌─────────────────────────────────────────────┐
│ Downloading: qwen-2.5-1.5b                  │
│ Precision: INT4                             │
└─────────────────────────────────────────────┘

Fetching from HuggingFace...
model.safetensors ████████████████ 100% 1.2GB
tokenizer.json    ████████████████ 100% 2.1MB
config.json       ████████████████ 100% 1.8KB

Converting to Cactus format...
Quantizing to INT4 ████████████████ 100%

✓ Model downloaded to ./weights/qwen-2.5-1.5b-int4

Cache Location

All downloaded models are stored in:
./weights/
├── qwen-2.5-1.5b-int4/
├── llama-3.2-1b-fp16/
├── phi-4-int8/
└── parakeet-1.1b-int4/
Each model directory contains:
  • Quantized weights
  • Tokenizer files
  • Model configuration
  • Metadata
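
A script can check whether a model is already cached by rebuilding the directory name seen in the listing above. This sketch assumes the pattern <short-name>-<lowercase precision>, inferred only from those example directories; how names in HuggingFace repository format are mapped to directories is not covered here. The names cached_model_dir, is_cached, and WEIGHTS_DIR are hypothetical.

```shell
# Hypothetical helpers for locating a cached model.
# Directory pattern "<model>-<precision in lowercase>" is inferred from the
# example listing, not from a documented guarantee.
# WEIGHTS_DIR is an assumed override for the cache root (default ./weights).
cached_model_dir() {
    model=$1
    precision=${2:-INT4}   # INT4 is the documented default precision
    suffix=$(printf '%s' "$precision" | tr '[:upper:]' '[:lower:]')
    printf '%s/%s-%s\n' "${WEIGHTS_DIR:-./weights}" "$model" "$suffix"
}

# Succeeds when the expected cache directory exists.
is_cached() {
    [ -d "$(cached_model_dir "$@")" ]
}
```

For example, is_cached qwen-2.5-1.5b || cactus download qwen-2.5-1.5b downloads the model only when ./weights/qwen-2.5-1.5b-int4 is missing.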

Disk Space Requirements

Typical sizes by precision:
Precision   1B Model   3B Model   7B Model
INT4        ~800MB     ~2GB       ~4GB
INT8        ~1.5GB     ~3.5GB     ~7GB
FP16        ~3GB       ~7GB       ~14GB
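
The table can be turned into a rough pre-flight check before a large download. The sketch below hard-codes the approximate sizes from the table (treating 1GB as 1000MB) and compares them with the free space reported by df. The function names estimate_mb, free_mb, and has_room are hypothetical, and the figures are only the typical values listed here, not exact sizes for any specific model.

```shell
# Approximate converted size in MB, taken from the table above.
# Arguments: precision (INT4|INT8|FP16) and size class (1B|3B|7B).
estimate_mb() {
    case "$1:$2" in
        INT4:1B) echo 800 ;;
        INT4:3B) echo 2000 ;;
        INT4:7B) echo 4000 ;;
        INT8:1B) echo 1500 ;;
        INT8:3B) echo 3500 ;;
        INT8:7B) echo 7000 ;;
        FP16:1B) echo 3000 ;;
        FP16:3B) echo 7000 ;;
        FP16:7B) echo 14000 ;;
        *) return 1 ;;   # unknown combination
    esac
}

# Free space in MB on the filesystem holding the given directory (default ".").
free_mb() {
    df -Pk "${1:-.}" | awk 'NR == 2 { print int($4 / 1024) }'
}

# Succeeds when the estimated size fits; returns 2 for unknown inputs.
has_room() {
    need=$(estimate_mb "$1" "$2") || return 2
    [ "$(free_mb "${3:-.}")" -ge "$need" ]
}
```

For example, has_room INT8 3B && cactus download <model> --precision INT8 skips the download when less than roughly 3.5GB is free in the current directory's filesystem.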

Offline Usage

Once downloaded, models can be used without an internet connection:
# Download while online
cactus download qwen-2.5-1.5b

# Use offline later
cactus run qwen-2.5-1.5b  # Uses cached version

See Also

Run Command

Run downloaded models interactively

Convert Command

Convert models with custom settings
