
Compatibility

This guide covers weight versioning across Cactus releases, breaking changes in the API, and platform requirements.

Weight Versioning

Some Cactus releases change the internal weight format. When this happens, cached weights from an older version will not load with a newer runtime and must be re-downloaded. Breaking weight changes are called out in the release notes.

How Versioning Works

Weights are published to Hugging Face and re-tagged only when they actually change. If a release does not affect the weight format, the previous tag remains current and no new weights are uploaded.
Runtime v1.7  -> weights tagged v1.7 on HF
Runtime v1.8  -> no new tag (unchanged) - still use v1.7
...
Runtime v1.14 -> no new tag - still use v1.7
Runtime v1.15 -> new tag v1.15 (changed!) - must update
The rule: use the latest HF weight tag that is ≤ your runtime version.
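The rule can be sketched in a few lines of Python. This is an illustration only, not part of the Cactus CLI or SDK: `parse_version` and `select_weight_tag` are hypothetical helper names, and the tag format (`v1.7`, `v1.15`) is assumed from the example above.

```python
from typing import List, Optional, Tuple


def parse_version(tag: str) -> Tuple[int, ...]:
    """Turn 'v1.15' or '1.15.0' into a comparable tuple such as (1, 15)."""
    return tuple(int(part) for part in tag.lstrip("v").split("."))


def select_weight_tag(runtime: str, published_tags: List[str]) -> Optional[str]:
    """Return the newest published tag that is <= the runtime version."""
    runtime_key = parse_version(runtime)[:2]  # compare on major.minor only
    candidates = [t for t in published_tags if parse_version(t)[:2] <= runtime_key]
    return max(candidates, key=parse_version, default=None)


tags = ["v1.0", "v1.7", "v1.15"]
print(select_weight_tag("v1.14", tags))  # v1.7
print(select_weight_tag("v1.15", tags))  # v1.15
```

Comparing versions as tuples of integers matters here: compared as strings or floats, 1.15 would sort before 1.7.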

Checking Compatibility

  1. Open your model on huggingface.co/Cactus-Compute
  2. Click Files and versions, then open the branch dropdown (labeled main by default)
  3. Find the latest tag that is ≤ your runtime version
  4. If your local weights use an older tag, re-download them
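These steps can also be scripted. The sketch below assumes the `huggingface_hub` package is installed and uses its `list_repo_refs` call to read a repository's tags. The helpers `fetch_tags`, `latest_compatible_tag` and `needs_redownload` are hypothetical names, the repository ID is whatever your model uses under Cactus-Compute, and tags are assumed to be named like `v1.7`.

```python
import re
from typing import List, Optional, Tuple

VERSION_TAG = re.compile(r"v?(\d+)\.(\d+)")


def _key(tag: str) -> Tuple[int, int]:
    """Extract (major, minor) from a tag such as 'v1.15' or 'v1.15.0'."""
    match = VERSION_TAG.match(tag)
    if match is None:
        raise ValueError(f"not a version tag: {tag!r}")
    return int(match.group(1)), int(match.group(2))


def fetch_tags(repo_id: str) -> List[str]:
    """Steps 1 and 2: list version-like tag names for a repo (needs network)."""
    from huggingface_hub import list_repo_refs  # lazy import: optional dependency

    names = [ref.name for ref in list_repo_refs(repo_id).tags]
    return [name for name in names if VERSION_TAG.match(name)]


def latest_compatible_tag(runtime: str, tags: List[str]) -> Optional[str]:
    """Step 3: newest tag that is <= the runtime version."""
    usable = [t for t in tags if _key(t) <= _key(runtime)]
    return max(usable, key=_key, default=None)


def needs_redownload(local_tag: str, runtime: str, tags: List[str]) -> bool:
    """Step 4: True when local weights are older than the tag the runtime expects."""
    expected = latest_compatible_tag(runtime, tags)
    return expected is not None and _key(local_tag) < _key(expected)
```

For example, `needs_redownload("v1.7", "v1.15.0", ["v1.0", "v1.7", "v1.15"])` returns `True`, while the same call with runtime `"v1.14"` returns `False`.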

Re-downloading Weights

# Force reconversion from the Hugging Face source
cactus download LiquidAI/LFM2.5-1.2B-Instruct --reconvert

# Or for custom models
cactus convert Qwen/Qwen3-0.6B ./my-model --reconvert
The --reconvert flag forces re-downloading and re-converting weights from source.
Weight mismatch errors: If you see errors like “incompatible weight format” or “failed to load weights”, re-download your models with --reconvert.
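A wrapper script could automate this recovery. The following is a minimal sketch that assumes the `cactus` CLI is on `PATH` and that error text contains one of the phrases quoted above; `is_weight_mismatch`, `reconvert_command` and `recover` are hypothetical helper names, not part of Cactus.

```python
import subprocess
from typing import List

# Substrings quoted in the text above; real messages may carry extra detail.
MISMATCH_MARKERS = ("incompatible weight format", "failed to load weights")


def is_weight_mismatch(error_text: str) -> bool:
    """True if the error text looks like a weight-format mismatch."""
    lowered = error_text.lower()
    return any(marker in lowered for marker in MISMATCH_MARKERS)


def reconvert_command(model_name: str) -> List[str]:
    """Build the re-download command shown above as an argument list."""
    return ["cactus", "download", model_name, "--reconvert"]


def recover(model_name: str, error_text: str) -> bool:
    """Re-download the model if the error indicates stale weights."""
    if not is_weight_mismatch(error_text):
        return False
    subprocess.run(reconvert_command(model_name), check=True)
    return True
```

Passing the command as a list rather than a shell string avoids quoting problems with model names.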

Breaking Changes

v1.15 (Jan 2026)

NPU Acceleration & Hybrid Inference
  • ✅ New weight format with NPU prefill models
  • ✅ Vision models now include .mlmodelc files for Apple NPU
  • ✅ Speech models now include .mlmodelc files for Apple NPU
  • ⚠️ Breaking: Weights from v1.14 and earlier must be re-downloaded
Migration:
cactus download <model-name> --reconvert

v1.7 (Oct 2025)

Chunked Prefill & KV Cache Quantization
  • ✅ KV cache now uses INT8 (2x memory reduction)
  • ✅ Chunked prefill enabled by default
  • ⚠️ Breaking: KV cache format changed
  • ⚠️ Breaking: Weights must be re-downloaded
Migration:
cactus download <model-name> --reconvert
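To see where the 2x figure for the INT8 KV cache comes from, here is a back-of-the-envelope estimate. It assumes the earlier cache stored 16-bit values (the text does not state the previous format) and ignores any scale factors a quantized cache may store alongside the values. The model dimensions are made-up round numbers, not those of any specific model.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_value: int) -> int:
    """Keys and values: 2 tensors per layer, each kv_heads * head_dim per token."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value


# Hypothetical model shape, chosen only to make the arithmetic concrete.
shape = dict(layers=16, kv_heads=8, head_dim=64, seq_len=4096)

fp16 = kv_cache_bytes(**shape, bytes_per_value=2)  # assumed pre-v1.7 baseline
int8 = kv_cache_bytes(**shape, bytes_per_value=1)

print(fp16 // 2**20, "MiB ->", int8 // 2**20, "MiB")  # 128 MiB -> 64 MiB
```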

v1.0 (Sep 2025)

Initial Release
  • First public release
  • Initial weight format (v1.0 tag on HF)

API Breaking Changes

v1.15 (Jan 2026)

Added NPU APIs:
// New methods in engine.h
bool load_npu_prefill(const std::string& model_path);
bool has_npu_prefill() const;
size_t get_prefill_chunk_size() const;
Backward compatible — Existing code continues to work.

v1.7 (Oct 2025)

Added Cache Control:
// New methods in engine.h
void set_cache_window(size_t window_size, size_t sink_size = 4);
void reset_cache();
Backward compatible — Existing code continues to work.
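The parameter names of `set_cache_window` suggest a sliding-window cache that also keeps a few initial "sink" tokens. The source does not spell out the semantics, so the following only illustrates that general idea in Python. `retained_positions` is a hypothetical function, and whether `window_size` includes the sink tokens in Cactus is an assumption (here it does not).

```python
from typing import List


def retained_positions(seq_len: int, window_size: int, sink_size: int = 4) -> List[int]:
    """Token positions kept: the first sink_size plus the last window_size."""
    sinks = list(range(min(sink_size, seq_len)))
    # Start of the recent window, never overlapping the sink positions.
    recent_start = max(seq_len - window_size, len(sinks))
    return sinks + list(range(recent_start, seq_len))


print(retained_positions(seq_len=12, window_size=4, sink_size=2))
# [0, 1, 8, 9, 10, 11]
```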

v1.0 (Sep 2025)

Initial API — All APIs introduced.

Platform Requirements

iOS

Minimum Version: iOS 14.0+
Recommended: iOS 15.0+ for better Neural Engine APIs
Device Requirements:
  • iPhone XS / XR or newer (A12 Bionic+)
  • iPad Air (3rd gen) or newer
  • iPad Pro (3rd gen) or newer
  • iPad mini (5th gen) or newer
NPU Acceleration:
  • iOS 14.0+ required
  • A12 Bionic or newer (Neural Engine)

macOS

Minimum Version: macOS 11.0+ (Big Sur)
Device Requirements:
  • Apple Silicon (M1, M2, M3, M4)
  • Intel Macs not supported (no Neural Engine, slower CPU)
NPU Acceleration:
  • macOS 11.0+ required
  • Apple Silicon required (M1+)

Android

Minimum API Level: API 24+ (Android 7.0)
Recommended: API 29+ (Android 10) for better performance
Device Requirements:
  • ARM64 (arm64-v8a) architecture
  • 2GB+ RAM recommended
  • 4GB+ RAM for larger models (>1B params)
NPU Acceleration (Coming Mar 2026):
  • API 29+ required
  • Snapdragon 8 Gen 1+ or Dimensity 9000+
  • NNAPI or QNN runtime installed

Linux (Desktop/Raspberry Pi)

Minimum: Ubuntu 20.04+ or Debian 11+
Device Requirements:
  • ARM64 or x86_64 architecture
  • 4GB+ RAM for development
  • 2GB+ RAM for inference only
Raspberry Pi:
  • Raspberry Pi 4 (4GB+ RAM) or Raspberry Pi 5
  • 64-bit Raspberry Pi OS
No GPU required: Cactus runs entirely on CPU, with optional NPU acceleration on supported devices. No CUDA, OpenCL, or discrete GPU is needed.
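The minimum OS versions listed in this section can be encoded as a small lookup for a pre-flight check in build or deployment scripts. This is a sketch only: `MINIMUMS` and `meets_minimum` are hypothetical names. Linux is left out because its minimum is stated per distribution.

```python
from typing import Dict, Tuple

# Minimum versions as listed in this section.
MINIMUMS: Dict[str, Tuple[int, ...]] = {
    "ios": (14, 0),
    "macos": (11, 0),
    "android_api": (24,),
}


def _pad(version: Tuple[int, ...], width: int) -> Tuple[int, ...]:
    """Right-pad with zeros so that (14,) compares equal to (14, 0)."""
    return version + (0,) * (width - len(version))


def meets_minimum(platform_name: str, version: str) -> bool:
    """Compare a dotted version string against the documented minimum."""
    minimum = MINIMUMS[platform_name]
    found = tuple(int(part) for part in version.split("."))
    width = max(len(found), len(minimum))
    return _pad(found, width) >= _pad(minimum, width)


print(meets_minimum("ios", "13.7"))        # False
print(meets_minimum("android_api", "29"))  # True
```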

Runtime Version Detection

Checking Runtime Version

# Check installed version
cactus --version
Output:
Cactus v1.15.0
Build: Mar 05 2026
Commit: a1b2c3d

Checking from Code

// Currently not exposed in public API
// Check release notes or GitHub releases page
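Until the version is exposed in the API, one workaround is to call the CLI and parse its output. The sketch below assumes the output format shown above stays stable and that `cactus` is on `PATH`; `parse_cli_version` and `installed_version` are hypothetical helper names.

```python
import re
import subprocess
from typing import Optional, Tuple

VERSION_LINE = re.compile(r"^Cactus v(\d+)\.(\d+)\.(\d+)\s*$", re.MULTILINE)


def parse_cli_version(output: str) -> Optional[Tuple[int, int, int]]:
    """Extract (major, minor, patch) from `cactus --version` output."""
    match = VERSION_LINE.search(output)
    if match is None:
        return None
    major, minor, patch = (int(group) for group in match.groups())
    return major, minor, patch


def installed_version() -> Optional[Tuple[int, int, int]]:
    """Run the CLI and parse its version; raises if the command fails."""
    result = subprocess.run(["cactus", "--version"],
                            capture_output=True, text=True, check=True)
    return parse_cli_version(result.stdout)


sample = "Cactus v1.15.0\nBuild: Mar 05 2026\nCommit: a1b2c3d\n"
print(parse_cli_version(sample))  # (1, 15, 0)
```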

SDK Compatibility

Python SDK

Requires: Python 3.8+
Platforms: macOS (Apple Silicon), Linux (ARM64/x86_64)
Install:
pip install cactus-compute
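Before installing, the stated requirements can be checked with the standard library. This is a sketch under the assumption that the listed platforms map to the `(system, machine)` pairs that Python's `platform` module typically reports; `python_sdk_supported` is a hypothetical helper, not part of the SDK.

```python
import platform
import sys
from typing import Tuple

# Platforms listed for the Python SDK, as (system, machine) pairs.
SUPPORTED = {
    ("Darwin", "arm64"),    # macOS on Apple Silicon
    ("Linux", "aarch64"),   # Linux ARM64 (usual spelling on Linux)
    ("Linux", "arm64"),     # Linux ARM64 (alternate spelling)
    ("Linux", "x86_64"),
}


def python_sdk_supported(py_version: Tuple[int, int], system: str, machine: str) -> bool:
    """True if the interpreter and host match the stated requirements."""
    return py_version >= (3, 8) and (system, machine) in SUPPORTED


if __name__ == "__main__":
    ok = python_sdk_supported(sys.version_info[:2], platform.system(), platform.machine())
    print("supported" if ok else "not supported")
```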

Swift SDK

Requires: Swift 5.5+, Xcode 13+
Platforms: iOS 14+, macOS 11+
Install: Link cactus-ios.xcframework or cactus-macos.xcframework

Kotlin SDK (Android)

Requires: Kotlin 1.8+, Android Gradle Plugin 7.0+
Platforms: Android API 24+
Install: Copy libcactus.so to jniLibs/arm64-v8a/

Flutter SDK

Requires: Flutter 3.0+, Dart 2.17+
Platforms: iOS 14+, Android API 24+, macOS 11+
Install:
flutter pub add cactus_flutter

React Native

Requires: React Native 0.70+
Platforms: iOS 14+, Android API 24+
Install:
npm install @cactus-compute/react-native

Model Compatibility

Supported Architectures

Cactus supports these model families:
  • Gemma 3 — All variants (270m, 1b)
  • Qwen 3 — All variants (0.6B, 1.7B)
  • LFM 2/2.5 — All variants (350M to 8B)
  • SmolLM 2 — Coming soon
  • Whisper — All variants (tiny, base, small, medium)
  • Parakeet — All variants (0.6b, 1.1b)
  • Nomic Embed — Embedding models
  • Silero VAD — Voice activity detection
For the complete list, see Supported Models.

Custom Models

Custom models and fine-tunes must:
  • Use a supported base architecture
  • Be converted with a matching Cactus runtime version
  • Use LoRA adapters trained with PEFT or Unsloth
New architectures require a newer runtime: When Cactus adds support for a new model architecture, existing weights for other models may still work, but models that use the new architecture require the latest runtime.

Cross-Platform Weights

Cactus weights are platform-independent:
  • Convert once on Mac/Linux
  • Deploy to iOS, Android, macOS, Linux, Raspberry Pi
  • Same weight files work everywhere
Exception: NPU models (.mlmodelc files) are Apple-specific and only load on iOS/macOS.
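When packaging one converted model for several targets, the Apple-specific entries can be filtered out for non-Apple platforms. The following is a minimal sketch: `deployable_files` is a hypothetical helper and the file names in the example are invented for illustration, not the actual layout of a Cactus model directory.

```python
from pathlib import PurePosixPath
from typing import Iterable, List

APPLE_TARGETS = {"ios", "macos"}


def deployable_files(paths: Iterable[str], target: str) -> List[str]:
    """Drop Apple-only .mlmodelc entries when the target is not iOS/macOS."""
    if target in APPLE_TARGETS:
        return list(paths)
    return [
        p for p in paths
        # Match the bundle itself and anything stored inside it.
        if not any(part.endswith(".mlmodelc") for part in PurePosixPath(p).parts)
    ]


# Invented file names, for illustration only.
files = ["weights.bin", "encoder.mlmodelc/model.mil", "config.json"]
print(deployable_files(files, "android"))  # ['weights.bin', 'config.json']
```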

Troubleshooting

Incompatible Weight Format

Error: incompatible weight format version 1.7, expected 1.15
Solution: Re-download weights with --reconvert:
cactus download <model-name> --reconvert
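For logging or automated handling, the two versions can be pulled out of the message. This sketch assumes the error text follows the exact wording shown above, which may differ between releases; `parse_mismatch` is a hypothetical helper.

```python
import re
from typing import Optional, Tuple

MISMATCH = re.compile(
    r"incompatible weight format version (\d+(?:\.\d+)*), expected (\d+(?:\.\d+)*)"
)


def parse_mismatch(message: str) -> Optional[Tuple[str, str]]:
    """Return (found, expected) weight versions from the error, or None."""
    match = MISMATCH.search(message)
    return (match.group(1), match.group(2)) if match else None


print(parse_mismatch("Error: incompatible weight format version 1.7, expected 1.15"))
# ('1.7', '1.15')
```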

Platform Not Supported

Error: platform 'x86_64-linux' not supported
Solution: Cactus requires ARM64 on mobile. Use Apple Silicon or ARM64 Linux for desktop.

NPU Not Available

Warning: NPU not available, using CPU fallback
Reasons:
  • Device doesn’t have a compatible NPU (Intel Mac, older iPhone)
  • Platform not yet supported (Android NPU coming Mar 2026)
  • Simulator build (NPU is available only on physical devices)
Solution: Rely on the automatic CPU fallback, or test on a physical device.

Missing Dependencies (Linux)

Error: libcurl.so.4: cannot open shared object file
Solution: Install dependencies:
sudo apt-get install libcurl4-openssl-dev build-essential
