Overview

The cactus convert command converts models from HuggingFace format to Cactus format, applying quantization in the process. It can also merge LoRA adapters into the base model during conversion.

Syntax

cactus convert <model> [output_dir] [flags]

Arguments

  • <model> - Model name or HuggingFace repository
  • [output_dir] - Optional output directory (default: ./weights/<model-name>)
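As a sketch of the positional arguments (output path chosen for illustration; the model name is reused from the examples later on this page):

```shell
# Convert and write to a custom output directory
# instead of the default ./weights/<model-name>
cactus convert qwen-2.5-1.5b ./my-models/qwen-int4
```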

Flags

--precision

Set the quantization precision level:
cactus convert <model> --precision INT4|INT8|FP16
Default: INT4

Options:
  • INT4 - 4-bit quantization (smallest size)
  • INT8 - 8-bit quantization (balanced)
  • FP16 - 16-bit floating point (highest quality)

--lora

Merge a LoRA adapter into the base model:
cactus convert <model> --lora <path/to/lora>
Supports:
  • Local LoRA adapter directories
  • HuggingFace LoRA repositories
  • Multiple LoRA adapters (specify flag multiple times)
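Stacking adapters might look like the following, assuming each repeated --lora flag is merged in order (the adapter paths here are illustrative):

```shell
# Merge a local adapter and a HuggingFace-hosted adapter
# into the same base model
cactus convert qwen-2.5-7b \
  --lora ./coding-lora \
  --lora username/style-lora
```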

--token

Provide a HuggingFace API token for downloading source models:
cactus convert <model> --token <your-hf-token>
Required for gated models or private repositories.

Examples

# Convert Qwen to INT4 format
cactus convert qwen-2.5-1.5b
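The flags documented above can be combined with the basic invocation; a few hedged sketches (model names and paths are taken from elsewhere on this page, the token value is a placeholder):

```shell
# Convert with 8-bit quantization instead of the INT4 default
cactus convert qwen-2.5-1.5b --precision INT8

# Convert a gated or private model using a HuggingFace token
cactus convert username/my-finetuned-llama --token hf_xxxxxxxxxxxxx

# Merge a local LoRA adapter during conversion
cactus convert qwen-2.5-1.5b --lora ./adapters/my-finetune
```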

Conversion Process

The conversion pipeline includes:
  1. Download - Fetch source model from HuggingFace (if needed)
  2. LoRA Merge - Apply LoRA adapters to base weights (if specified)
  3. Quantization - Convert to target precision level
  4. Optimization - Apply Cactus-specific optimizations
  5. Export - Write converted model to output directory
┌─────────────────────────────────────────────┐
│ Converting: qwen-2.5-1.5b                   │
│ Precision: INT4                             │
│ LoRA: ./adapters/my-finetune                │
└─────────────────────────────────────────────┘

Loading base model...
Applying LoRA adapter... ████████████ 100%
Quantizing to INT4...    ████████████ 100%
Optimizing for ARM...    ████████████ 100%
Writing weights...       ████████████ 100%

✓ Conversion complete
  Output: ./weights/qwen-2.5-1.5b-int4
  Size: 1.2GB

LoRA Adapter Format

Supported LoRA formats:

Local Directory

./adapters/my-lora/
├── adapter_config.json
├── adapter_model.safetensors  # or .bin
└── README.md

HuggingFace Repository

# Public repository
cactus convert base-model --lora username/lora-adapter

# Private repository (requires token)
cactus convert base-model \
  --lora username/private-lora \
  --token hf_xxxxxxxxxxxxx

Output Format

Converted models include:
./weights/model-name-precision/
├── weights.bin          # Quantized model weights
├── tokenizer.json       # Tokenizer vocabulary
├── config.json          # Model configuration
└── metadata.json        # Conversion metadata
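After a conversion finishes, the output directory can be sanity-checked against this layout (the path below comes from the sample output earlier on this page):

```shell
# List the exported files; per the layout above this should show
# config.json, metadata.json, tokenizer.json, and weights.bin
ls ./weights/qwen-2.5-1.5b-int4
```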

Use Cases

Fine-tuned Models

Convert your custom fine-tuned models:
# Convert your HuggingFace fine-tune
cactus convert username/my-finetuned-llama

LoRA Experimentation

Test different LoRA combinations:
# Base model
cactus convert qwen-2.5-7b

# With coding LoRA
cactus convert qwen-2.5-7b --lora ./coding-lora

# With math LoRA
cactus convert qwen-2.5-7b --lora ./math-lora

Precision Optimization

Create multiple precision variants:
# Small & fast (mobile)
cactus convert phi-4 --precision INT4

# Balanced (tablets)
cactus convert phi-4 --precision INT8

# High quality (desktop)
cactus convert phi-4 --precision FP16

See Also

Download Command

Download models without custom conversion

Run Command

Run converted models interactively
