Overview

The cactus convert command converts models from HuggingFace format to Cactus format and quantizes them in the process. It can also merge LoRA adapters into the base model.

Syntax

cactus convert <model> [output_dir] [flags]

Arguments

  • <model> - Model name or HuggingFace repository
  • [output_dir] - Optional output directory (default: ./weights/<model-name>)

Flags

--precision

Set the quantization precision level:
cactus convert <model> --precision INT4|INT8|FP16
Default: INT4
Options:
  • INT4 - 4-bit quantization (smallest size)
  • INT8 - 8-bit quantization (balanced)
  • FP16 - 16-bit floating point (highest quality)

--lora

Merge a LoRA adapter into the base model:
cactus convert <model> --lora <path/to/lora>
Supports:
  • Local LoRA adapter directories
  • HuggingFace LoRA repositories
  • Multiple LoRA adapters (specify flag multiple times)
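
Repeating the flag by hand gets unwieldy with several adapters. The following is a minimal sketch of a wrapper that adds one --lora flag per adapter. The function name convert_with_loras and the CACTUS variable are hypothetical helpers of this script, not part of the CLI; only the repeated --lora flag comes from the list above.

```shell
# Sketch: merge any number of LoRA adapters by repeating --lora.
# CACTUS is a variable of this script (defaults to the cactus binary);
# set CACTUS=echo to preview the command instead of running it.
convert_with_loras() {
  model=$1
  shift
  count=$#                      # number of adapter paths passed in
  for adapter in "$@"; do
    set -- "$@" --lora "$adapter"   # append one flag/value pair per adapter
  done
  shift "$count"                # drop the bare adapter paths, keep the pairs
  "${CACTUS:-cactus}" convert "$model" "$@"
}

# Usage (adapter paths taken from the examples on this page):
#   convert_with_loras qwen-2.5-7b ./coding-lora ./math-lora
```

Building the argument list with set -- keeps adapter paths that contain spaces intact.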

--token

Provide a HuggingFace API token for downloading source models:
cactus convert <model> --token <your-hf-token>
Required for gated models or private repositories.
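
To keep the token out of shell history, it can be read from an environment variable and passed to --token. This is a sketch under assumptions: HF_TOKEN is a variable name chosen for this script (nothing above says the CLI reads it on its own), and convert_gated and CACTUS are hypothetical helpers.

```shell
# Sketch: pass the HuggingFace token from the environment.
# HF_TOKEN is an assumed variable name used only by this script.
# CACTUS defaults to the cactus binary; set CACTUS=echo for a dry run.
convert_gated() {
  model=$1
  if [ -z "${HF_TOKEN:-}" ]; then
    echo "HF_TOKEN is not set; export it before converting a gated model" >&2
    return 1
  fi
  "${CACTUS:-cactus}" convert "$model" --token "$HF_TOKEN"
}

# Usage:
#   export HF_TOKEN=<your-hf-token>
#   convert_gated <model>
```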

Examples

# Convert Qwen to INT4 format
cactus convert qwen-2.5-1.5b

Conversion Process

The conversion pipeline runs the following steps in order:
  1. Download - Fetch source model from HuggingFace (if needed)
  2. LoRA Merge - Apply LoRA adapters to base weights (if specified)
  3. Quantization - Convert to target precision level
  4. Optimization - Apply Cactus-specific optimizations
  5. Export - Write converted model to output directory

Example output for an INT4 conversion with a LoRA adapter:
┌─────────────────────────────────────────────┐
│ Converting: qwen-2.5-1.5b                   │
│ Precision: INT4                             │
│ LoRA: ./adapters/my-finetune                │
└─────────────────────────────────────────────┘

Loading base model...
Applying LoRA adapter... ████████████ 100%
Quantizing to INT4...    ████████████ 100%
Optimizing for ARM...    ████████████ 100%
Writing weights...       ████████████ 100%

✓ Conversion complete
  Output: ./weights/qwen-2.5-1.5b-int4
  Size: 1.2GB

LoRA Adapter Format

LoRA adapters can be loaded from either of the following sources:

Local Directory

./adapters/my-lora/
├── adapter_config.json
├── adapter_model.safetensors  # or .bin
└── README.md
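
Before starting a long conversion, it can help to confirm that a local adapter directory matches the layout above. The check below is a sketch: check_lora_dir is a hypothetical helper, the name adapter_model.bin is inferred from the "or .bin" note, and README.md is assumed to be optional.

```shell
# Sketch: verify a local LoRA adapter directory against the layout above.
# Assumes README.md is optional and the .bin variant is adapter_model.bin.
check_lora_dir() {
  dir=$1
  if [ ! -f "$dir/adapter_config.json" ]; then
    echo "missing: $dir/adapter_config.json" >&2
    return 1
  fi
  if [ -f "$dir/adapter_model.safetensors" ] || [ -f "$dir/adapter_model.bin" ]; then
    return 0
  fi
  echo "missing: adapter_model.safetensors or adapter_model.bin in $dir" >&2
  return 1
}

# Usage:
#   check_lora_dir ./adapters/my-lora && cactus convert <model> --lora ./adapters/my-lora
```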

HuggingFace Repository

# Public repository
cactus convert base-model --lora username/lora-adapter

# Private repository (requires token)
cactus convert base-model \
  --lora username/private-lora \
  --token hf_xxxxxxxxxxxxx

Output Format

A converted model directory contains:
./weights/model-name-precision/
├── weights.bin          # Quantized model weights
├── tokenizer.json       # Tokenizer vocabulary
├── config.json          # Model configuration
└── metadata.json        # Conversion metadata
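
A script that consumes converted models can check for these four files before using a directory. The helper below is a minimal sketch; check_converted_model is a hypothetical name, and the file list is copied from the tree above.

```shell
# Sketch: confirm that a conversion output directory holds the files listed above.
check_converted_model() {
  dir=$1
  missing=0
  for f in weights.bin tokenizer.json config.json metadata.json; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2   # report every absent file, not just the first
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   check_converted_model ./weights/model-name-precision
```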

Use Cases

Fine-tuned Models

Convert your custom fine-tuned models:
# Convert your HuggingFace fine-tune
cactus convert username/my-finetuned-llama

LoRA Experimentation

Test different LoRA combinations:
# Base model
cactus convert qwen-2.5-7b

# With coding LoRA
cactus convert qwen-2.5-7b --lora ./coding-lora

# With math LoRA
cactus convert qwen-2.5-7b --lora ./math-lora

Precision Optimization

Create multiple precision variants:
# Small & fast (mobile)
cactus convert phi-4 --precision INT4

# Balanced (tablets)
cactus convert phi-4 --precision INT8

# High quality (desktop)
cactus convert phi-4 --precision FP16
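
The three commands above can be folded into a loop. This sketch passes an explicit [output_dir] per precision so the variants cannot overwrite each other. The directory naming scheme, the function name convert_all_precisions, and the CACTUS variable are assumptions of this script, not CLI behavior.

```shell
# Sketch: build INT4, INT8, and FP16 variants of one model.
# The ./weights/<model>-<precision> naming is chosen here for illustration.
# CACTUS defaults to the cactus binary; set CACTUS=echo for a dry run.
convert_all_precisions() {
  model=$1
  for precision in INT4 INT8 FP16; do
    "${CACTUS:-cactus}" convert "$model" "./weights/$model-$precision" \
      --precision "$precision" || return 1   # stop at the first failed conversion
  done
}

# Usage:
#   convert_all_precisions phi-4
```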

See Also

  • Download Command - Download models without custom conversion
  • Run Command - Run converted models interactively
