Overview
The cactus convert command converts models from HuggingFace format to Cactus format and quantizes the weights along the way. It can also merge LoRA adapters into the base model.
Syntax
cactus convert <model> [output_dir] [flags]
Arguments
<model> - Model name or HuggingFace repository
[output_dir] - Optional output directory (default: ./weights/<model-name>)
Flags
--precision
Set the quantization precision level:
cactus convert <model> --precision INT4|INT8|FP16
Default: INT4
Options:
INT4 - 4-bit quantization (smallest size)
INT8 - 8-bit quantization (balanced)
FP16 - 16-bit floating point (highest quality)
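As a rough sizing rule (a back-of-the-envelope approximation, not a figure published for Cactus), the raw weight payload scales with the number of bits stored per parameter. For a model with N parameters at b bits each, taking N ≈ 1.5 × 10^9 as an illustrative example and GB = 10^9 bytes:

```latex
\text{size} \approx \frac{N \cdot b}{8}\ \text{bytes},
\qquad N = 1.5 \times 10^{9}:
\quad
\begin{aligned}
\text{INT4}&: \tfrac{1.5 \times 10^{9} \cdot 4}{8} = 0.75\ \text{GB}\\
\text{INT8}&: \tfrac{1.5 \times 10^{9} \cdot 8}{8} = 1.5\ \text{GB}\\
\text{FP16}&: \tfrac{1.5 \times 10^{9} \cdot 16}{8} = 3.0\ \text{GB}
\end{aligned}
```

Treat these numbers as lower bounds: the output directory also holds the tokenizer, config, and metadata files, and quantized formats typically store scale factors alongside the weights, so actual sizes on disk will be larger.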
--lora
Merge a LoRA adapter into the base model:
cactus convert <model> --lora <path/to/lora>
Supports:
Local LoRA adapter directories
HuggingFace LoRA repositories
Multiple LoRA adapters (repeat the flag once per adapter)
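Merging folds each adapter's low-rank update into the base weights, so the converted model needs no separate adapter at inference time. The math below is the standard LoRA formulation and is given only as background; this page does not describe Cactus's exact merge procedure. For a base weight matrix W with rank-r factors B and A and scaling factor α (in PEFT-style adapters these come from the r and lora_alpha fields of adapter_config.json), the merged weight is shown first. The second line assumes, without confirmation from this page, that multiple adapters are combined by summing their individual updates.

```latex
W' = W + \frac{\alpha}{r}\, B A,
\qquad W \in \mathbb{R}^{d \times k},\; B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k}

W' = W + \sum_{i=1}^{m} \frac{\alpha_i}{r_i}\, B_i A_i
\qquad \text{(assumed additive merge of } m \text{ adapters)}
```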
--token
Provide a HuggingFace API token for downloading source models:
cactus convert <model> --token <your-hf-token>
Required for gated models or private repositories.
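To avoid pasting the token into every command, you can keep it in the HF_TOKEN environment variable (the name HuggingFace tooling conventionally uses) and forward it explicitly. This page does not say whether cactus reads HF_TOKEN on its own, so the sketch below passes it through the documented --token flag. The function name convert_with_hf_token and the CACTUS_BIN override are hypothetical and exist only for illustration.

```shell
# Hypothetical helper (not part of the cactus CLI): forward a HuggingFace token
# from the HF_TOKEN environment variable via the documented --token flag.
# CACTUS_BIN is an assumed override for the executable (defaults to "cactus").
convert_with_hf_token() {
  if [ $# -lt 1 ]; then
    echo "usage: convert_with_hf_token <model> [output_dir] [flags]" >&2
    return 2
  fi
  if [ -z "${HF_TOKEN:-}" ]; then
    echo "HF_TOKEN is not set; export it before converting gated or private models" >&2
    return 1
  fi
  local model=$1
  shift
  # Keep output_dir and any other flags in the documented order, then append --token.
  "${CACTUS_BIN:-cactus}" convert "$model" "$@" --token "$HF_TOKEN"
}
```

Usage: convert_with_hf_token <model> [output_dir] [flags]. The token still appears in the process argument list while the conversion runs. The helper only keeps it out of the conversion command you type.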
Examples
Basic Conversion
# Convert Qwen to INT4 format
cactus convert qwen-2.5-1.5b
Conversion Process
The conversion pipeline includes:
Download - Fetch source model from HuggingFace (if needed)
LoRA Merge - Apply LoRA adapters to base weights (if specified)
Quantization - Convert to target precision level
Optimization - Apply Cactus-specific optimizations
Export - Write converted model to output directory
┌─────────────────────────────────────────────┐
│ Converting: qwen-2.5-1.5b │
│ Precision: INT4 │
│ LoRA: ./adapters/my-finetune │
└─────────────────────────────────────────────┘
Loading base model...
Applying LoRA adapter... ████████████ 100%
Quantizing to INT4... ████████████ 100%
Optimizing for ARM... ████████████ 100%
Writing weights... ████████████ 100%
✓ Conversion complete
Output: ./weights/qwen-2.5-1.5b-int4
Size: 1.2GB
Supported LoRA formats:
Local Directory
./adapters/my-lora/
├── adapter_config.json
├── adapter_model.safetensors # or .bin
└── README.md
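Before a long conversion, it can help to confirm that a local adapter directory matches the layout above. The check below is a sketch based only on that layout: it requires adapter_config.json plus either adapter_model.safetensors or adapter_model.bin. The function name check_lora_dir is hypothetical, and cactus itself may accept or reject other layouts.

```shell
# Hypothetical pre-flight check (not part of the cactus CLI) for a local LoRA
# adapter directory, following the documented layout: adapter_config.json plus
# adapter weights stored as adapter_model.safetensors or adapter_model.bin.
check_lora_dir() {
  local dir=$1
  if [ ! -d "$dir" ]; then
    echo "not a directory: $dir" >&2
    return 1
  fi
  if [ ! -f "$dir/adapter_config.json" ]; then
    echo "missing adapter_config.json in $dir" >&2
    return 1
  fi
  if [ ! -f "$dir/adapter_model.safetensors" ] && [ ! -f "$dir/adapter_model.bin" ]; then
    echo "missing adapter_model.safetensors or adapter_model.bin in $dir" >&2
    return 1
  fi
}
```

For example: check_lora_dir ./adapters/my-lora && cactus convert base-model --lora ./adapters/my-lora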
HuggingFace Repository
# Public repository
cactus convert base-model --lora username/lora-adapter
# Private repository (requires token)
cactus convert base-model \
--lora username/private-lora \
--token hf_xxxxxxxxxxxxx
Each converted model directory contains:
./weights/model-name-precision/
├── weights.bin # Quantized model weights
├── tokenizer.json # Tokenizer vocabulary
├── config.json # Model configuration
└── metadata.json # Conversion metadata
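After converting, a quick way to catch an incomplete output is to check that the four files listed above are present. The sketch below checks only that the files exist, not whether their contents are valid. The function name check_converted_model is hypothetical.

```shell
# Hypothetical post-conversion check (not part of the cactus CLI): report any of
# the documented output files that are missing and return non-zero if so.
check_converted_model() {
  local dir=$1 missing=0 f
  for f in weights.bin tokenizer.json config.json metadata.json; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example: check_converted_model ./weights/qwen-2.5-1.5b-int4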
Use Cases
Fine-tuned Models
Convert your custom fine-tuned models:
# Convert your HuggingFace fine-tune
cactus convert username/my-finetuned-llama
LoRA Experimentation
Test different LoRA combinations:
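# Give each variant its own output directory so conversions don't overwrite each other
# Base model
cactus convert qwen-2.5-7b ./weights/qwen-2.5-7b-base
# With coding LoRA
cactus convert qwen-2.5-7b ./weights/qwen-2.5-7b-coding --lora ./coding-lora
# With math LoRA
cactus convert qwen-2.5-7b ./weights/qwen-2.5-7b-math --lora ./math-lora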
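To try adapter combinations systematically, you can wrap the command so that each run gets its own output directory and every adapter becomes a separate --lora flag, since the flag may be repeated. This sketch uses only the documented positional arguments and --lora. The function name convert_with_loras and the CACTUS_BIN override are hypothetical.

```shell
# Hypothetical helper (not part of the cactus CLI): convert one base model with
# zero or more LoRA adapters into a dedicated output directory.
# CACTUS_BIN is an assumed override for the executable (defaults to "cactus").
convert_with_loras() {
  if [ $# -lt 2 ]; then
    echo "usage: convert_with_loras <model> <output_dir> [lora ...]" >&2
    return 2
  fi
  local model=$1 out_dir=$2
  shift 2
  # Rotate the remaining arguments into "--lora A --lora B ..." (keeps spaces intact).
  local n=$#
  while [ "$n" -gt 0 ]; do
    set -- "$@" --lora "$1"
    shift
    n=$((n - 1))
  done
  "${CACTUS_BIN:-cactus}" convert "$model" "$out_dir" "$@"
}
```

For example, convert_with_loras qwen-2.5-7b ./weights/qwen-2.5-7b-coding-math ./coding-lora ./math-lora merges both adapters into a single converted model.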
Precision Optimization
Create multiple precision variants:
# Small & fast (mobile)
cactus convert phi-4 --precision INT4
# Balanced (tablets)
cactus convert phi-4 --precision INT8
# High quality (desktop)
cactus convert phi-4 --precision FP16
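The three commands above can also be scripted as a loop. The sketch below writes each precision to its own directory so that the variants can sit side by side. The function name convert_all_precisions, the ./variants default, and the CACTUS_BIN override are hypothetical choices for illustration.

```shell
# Hypothetical helper (not part of the cactus CLI): build INT4, INT8 and FP16
# variants of one model, each under base_dir (assumed default: ./variants).
# CACTUS_BIN is an assumed override for the executable (defaults to "cactus").
# Stops at the first failed conversion.
convert_all_precisions() {
  if [ $# -lt 1 ]; then
    echo "usage: convert_all_precisions <model> [base_dir]" >&2
    return 2
  fi
  local model=$1 base_dir=${2:-./variants} p lower
  # Use only the last path segment, so "username/model" becomes "model-int4", etc.
  local name=${model##*/}
  for p in INT4 INT8 FP16; do
    lower=$(printf '%s' "$p" | tr '[:upper:]' '[:lower:]')
    "${CACTUS_BIN:-cactus}" convert "$model" "$base_dir/$name-$lower" --precision "$p" || return 1
  done
}
```

For example, convert_all_precisions phi-4 ./variants produces ./variants/phi-4-int4, ./variants/phi-4-int8 and ./variants/phi-4-fp16.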
See Also
Download Command - Download models without custom conversion
Run Command - Run converted models interactively