The optimize command improves model performance by applying optimization techniques such as quantization, pruning, knowledge distillation, and hyperparameter tuning.
Usage
```bash
neurenix optimize --model <model_file> [options]
```
Options
| Option | Type | Default | Description |
|---|---|---|---|
| `--model` | string | required | Path to the model file |
| `--output` | string | auto | Output path for the optimized model; by default derived from the input filename |
| `--technique` | string | auto | Optimization technique (see Optimization Techniques) |
| `--quantize` | string | None | Quantization precision: `int8`, `fp16`, or `fp8` |
| `--prune` | float | None | Fraction of weights to prune, from 0.0 to 1.0 |
| `--data` | string | None | Calibration data file used during optimization |
| `--config` | string | None | JSON optimization configuration file |
| `--device` | string | auto | Device to run optimization on (e.g. `cuda`) |
Optimization Techniques
| Technique | Description |
|---|---|
| `quantize` | Reduce numeric precision to int8, fp16, or fp8 |
| `prune` | Remove less important weights |
| `distill` | Transfer knowledge to a smaller model |
| `hyperparameter` | Tune hyperparameters automatically |
| `auto` | Automatically select the best technique |
Examples
Auto optimization
```bash
neurenix optimize --model models/classifier.nrx
```

```text
Loading model from models/classifier.nrx...
Optimizing model using auto technique...
Saving optimized model to models/classifier_optimized.nrx...
Optimization Results:
model_size_reduction: 0.45
inference_speedup: 2.3x
accuracy_change: -0.002
Model successfully optimized and saved to models/classifier_optimized.nrx
```
Quantize to int8
```bash
neurenix optimize \
  --model models/model.nrx \
  --technique quantize \
  --quantize int8
```

```text
Loading model from models/model.nrx...
Optimizing model using quantize technique...
Saving optimized model to models/model_int8.nrx...
Optimization Results:
model_size: 24.5 MB -> 6.2 MB
inference_time: 12.3ms -> 3.8ms
accuracy: 0.923 -> 0.918
Model successfully optimized and saved to models/model_int8.nrx
```
Quantize to fp16
```bash
neurenix optimize \
  --model models/large_model.nrx \
  --quantize fp16 \
  --output models/large_model_fp16.nrx
```

```text
Loading model from models/large_model.nrx...
Optimizing model using auto technique...
Saving optimized model to models/large_model_fp16.nrx...
Optimization Results:
model_size_reduction: 0.5
inference_speedup: 1.8x
accuracy_change: -0.001
Model successfully optimized and saved to models/large_model_fp16.nrx
```
Prune model
```bash
neurenix optimize \
  --model models/model.nrx \
  --technique prune \
  --prune 0.3
```

```text
Loading model from models/model.nrx...
Optimizing model using prune technique...
Saving optimized model to models/model_pruned_30.nrx...
Optimization Results:
weights_removed: 30%
model_size_reduction: 0.25
inference_speedup: 1.4x
accuracy_change: -0.015
Model successfully optimized and saved to models/model_pruned_30.nrx
```
Optimize with calibration data
```bash
neurenix optimize \
  --model models/model.nrx \
  --technique quantize \
  --quantize int8 \
  --data data/calibration.csv
```

```text
Loading model from models/model.nrx...
Loading calibration data from data/calibration.csv...
Optimizing model using quantize technique...
Saving optimized model to models/model_int8.nrx...
Optimization Results:
model_size_reduction: 0.75
inference_speedup: 3.2x
accuracy_change: -0.005
Model successfully optimized and saved to models/model_int8.nrx
```
Use configuration file
```bash
neurenix optimize \
  --model models/model.nrx \
  --config configs/optimize.json
```

```text
Loading model from models/model.nrx...
Optimizing model using quantize technique...
Saving optimized model to models/model_optimized.nrx...
Optimization Results:
model_size_reduction: 0.6
inference_speedup: 2.5x
accuracy_change: -0.008
Model successfully optimized and saved to models/model_optimized.nrx
```
Configuration File
For more complex optimizations, define the settings in a JSON configuration file:

```json
{
  "technique": "quantize",
  "quantize": "int8",
  "calibration": {
    "samples": 1000,
    "method": "percentile"
  },
  "validation": {
    "min_accuracy": 0.90
  }
}
```

Then pass it with `--config`:

```bash
neurenix optimize --model model.nrx --config optimize_config.json
```
Quantization Precision
int8 (8-bit Integer)
- Best for: Edge deployment, mobile devices
- Size reduction: ~75% (relative to fp32)
- Speed improvement: 2-4x
- Accuracy loss: 1-3%

```bash
neurenix optimize --model model.nrx --quantize int8
```

fp16 (16-bit Float)
- Best for: GPU deployment
- Size reduction: ~50% (relative to fp32)
- Speed improvement: 1.5-2x
- Accuracy loss: less than 1%

```bash
neurenix optimize --model model.nrx --quantize fp16
```

fp8 (8-bit Float)
- Best for: GPUs with native FP8 support, such as NVIDIA H100
- Size reduction: ~75% (relative to fp32)
- Speed improvement: 2-3x
- Accuracy loss: less than 2%

```bash
neurenix optimize --model model.nrx --quantize fp8
```
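The size-reduction figures above follow from the bit widths, assuming the original weights are stored as 32-bit floats (fp32). Keeping 8 of 32 bits saves 75%, and keeping 16 saves 50%. The sketch below does that arithmetic. `estimate_quantized_size` and `bits_for` are hypothetical helpers, not part of the neurenix CLI. The estimate ignores metadata and any layers left unquantized, so real files may come out slightly larger. For example, 24.5 MB estimates to about 6.1 MB, while the int8 example above reports 6.2 MB.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of neurenix): estimate the size of a
# quantized model from its size on disk, assuming the original is fp32.
# Metadata and unquantized layers are ignored.

bits_for() {
  case "$1" in
    int8|fp8) echo 8 ;;
    fp16)     echo 16 ;;
    *) echo "unknown precision: $1" >&2; return 1 ;;
  esac
}

# Usage: estimate_quantized_size <size_mb> <precision>
# Prints the estimated size in MB and the fractional size reduction.
estimate_quantized_size() {
  local size_mb=$1 bits
  bits=$(bits_for "$2") || return 1
  awk -v s="$size_mb" -v b="$bits" \
    'BEGIN { printf "%.1f MB (reduction %.2f)\n", s * b / 32, 1 - b / 32 }'
}
```

For example, `estimate_quantized_size 24.5 int8` prints `6.1 MB (reduction 0.75)`.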
Pruning Levels
| Level | Weights pruned | Use case |
|---|---|---|
| 0.1 | 10% | Minimal optimization |
| 0.3 | 30% | Balanced optimization |
| 0.5 | 50% | Aggressive optimization |
| 0.7 | 70% | Maximum optimization |
```bash
# Light pruning
neurenix optimize --model model.nrx --prune 0.1

# Aggressive pruning
neurenix optimize --model model.nrx --prune 0.5
```
Pruning levels above 0.5 can significantly reduce model accuracy. Always validate the pruned model before deploying it.
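To choose a level from the table empirically, you can sweep the listed levels and write each result to its own file, leaving the original model untouched. This is a sketch. `prune_sweep` is a hypothetical helper. `NEURENIX_BIN` is a hypothetical variable introduced here so the loop can be dry-run (for example with `NEURENIX_BIN=echo`); it is not a neurenix setting. The output names follow the `model_pruned_30.nrx` pattern from the earlier pruning example.

```bash
#!/usr/bin/env bash
# Hypothetical helper: sweep the pruning levels from the table above.
# NEURENIX_BIN is a hypothetical override (not a neurenix setting) that
# lets you dry-run the loop, e.g. NEURENIX_BIN=echo.
# Usage: prune_sweep <model_file>
prune_sweep() {
  local model=$1 level pct out
  local base=${model%.nrx}               # models/model.nrx -> models/model
  for level in 0.1 0.3 0.5 0.7; do
    # Convert the fraction to a whole percentage for the file name.
    pct=$(awk -v l="$level" 'BEGIN { printf "%.0f", l * 100 }')
    out="${base}_pruned_${pct}.nrx"      # e.g. models/model_pruned_30.nrx
    "${NEURENIX_BIN:-neurenix}" optimize \
      --model "$model" \
      --technique prune \
      --prune "$level" \
      --output "$out" || return 1
  done
}
```

Run `prune_sweep models/model.nrx`, then evaluate each `models/model_pruned_*.nrx` file before picking one.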
Optimization Results
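After a successful run, the command prints a summary of the optimization metrics, for example:

```text
Optimization Results:
model_size: 245.3 MB -> 61.2 MB (75% reduction)
inference_time: 45.6ms -> 12.3ms (3.7x speedup)
accuracy: 0.934 -> 0.928 (0.6% decrease)
throughput: 22 samples/sec -> 81 samples/sec
```

Some runs report relative values instead, such as `model_size_reduction: 0.45` and `accuracy_change: -0.002`, as in the examples above.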
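In scripts, you may want a job to fail when the accuracy cost exceeds a budget. The sketch below reads the results summary from standard input. It handles both accuracy forms shown on this page: `accuracy: 0.934 -> 0.928` and `accuracy_change: -0.002`. The function name `check_accuracy_drop` and the threshold value are assumptions. The parsing is illustrative only: it relies on the summary format in these examples, which may differ between neurenix versions.

```bash
#!/usr/bin/env bash
# Hypothetical helper: fail when the reported accuracy drop exceeds a budget.
# Reads neurenix optimize output on stdin and understands both forms
# shown in the docs:
#   accuracy: 0.934 -> 0.928
#   accuracy_change: -0.002
# Usage: check_accuracy_drop <max_drop>   (e.g. 0.01)
# Exit status: 0 within budget, 1 over budget, 2 no accuracy line found.
check_accuracy_drop() {
  awk -v max="$1" '
    $1 == "accuracy:"        { drop = $2 - $4; found = 1 }   # before - after
    $1 == "accuracy_change:" { drop = -$2;     found = 1 }   # negative = loss
    END {
      if (!found) { print "no accuracy metric found" > "/dev/stderr"; exit 2 }
      printf "accuracy drop: %.4f (budget %.4f)\n", drop, max
      exit ((drop > max + 1e-12) ? 1 : 0)
    }'
}
```

For example, pipe the output of a `neurenix optimize` run into `check_accuracy_drop 0.01` to stop a pipeline when accuracy falls by more than 0.01.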
Error Handling
Model not found
```bash
neurenix optimize --model missing.nrx
```

```text
Error: Model file 'missing.nrx' not found.
```
Invalid quantization precision
```bash
neurenix optimize --model model.nrx --quantize invalid
```

```text
Error: Invalid quantization precision. Choose from: int8, fp16, fp8
```
Invalid pruning level
```bash
neurenix optimize --model model.nrx --prune 1.5
```

```text
Error: Pruning level must be between 0.0 and 1.0
```
Optimization failed
```bash
neurenix optimize --model model.nrx --quantize int8
```

```text
Loading model from model.nrx...
Error optimizing model: Quantization failed - model architecture not supported
```
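Because optimization jobs can take a long time, a wrapper script can reject invalid values before calling neurenix. The sketch below mirrors the constraints stated in the error messages above: the model file must exist, the precision must be int8, fp16, or fp8, and the pruning level must be between 0.0 and 1.0. `validate_optimize_args` is a hypothetical helper. neurenix performs its own validation, so this only catches mistakes earlier.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight checks mirroring the documented
# neurenix optimize errors. Pass "" for options you are not using.
# Usage: validate_optimize_args <model_file> <precision|""> <prune|"">
validate_optimize_args() {
  local model=$1 precision=$2 prune=$3

  if [ ! -f "$model" ]; then
    echo "Error: Model file '$model' not found." >&2
    return 1
  fi

  case "$precision" in
    ""|int8|fp16|fp8) ;;   # empty means quantization was not requested
    *) echo "Error: Invalid quantization precision. Choose from: int8, fp16, fp8" >&2
       return 1 ;;
  esac

  if [ -n "$prune" ]; then
    # awk checks the value is a non-negative decimal and compares it as a float.
    if ! awk -v p="$prune" 'BEGIN {
           exit !(p ~ /^[0-9]*\.?[0-9]+$/ && p + 0 >= 0 && p + 0 <= 1)
         }'; then
      echo "Error: Pruning level must be between 0.0 and 1.0" >&2
      return 1
    fi
  fi
}
```

For example: `validate_optimize_args model.nrx int8 "" && neurenix optimize --model model.nrx --quantize int8`.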
Use Cases
1. Deploy to mobile devices
```bash
neurenix optimize \
  --model models/production.nrx \
  --quantize int8 \
  --data data/calibration.csv \
  --output models/mobile.nrx
```
2. Reduce inference costs
```bash
neurenix optimize \
  --model models/large_model.nrx \
  --quantize fp16 \
  --output models/efficient.nrx
```
3. Speed up real-time inference
```bash
neurenix optimize \
  --model models/detector.nrx \
  --technique quantize \
  --quantize int8 \
  --device cuda
```
4. Compress models for storage
```bash
neurenix optimize \
  --model models/checkpoint.nrx \
  --prune 0.4 \
  --output models/compressed.nrx
```
5. Automated optimization pipeline
```bash
# Try different optimization strategies
for technique in quantize prune auto; do
  neurenix optimize \
    --model models/base.nrx \
    --technique "$technique" \
    --output "models/optimized_${technique}.nrx"
done
```
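After the loop finishes, you can compare the artifacts it produced. The sketch below ranks files only by size on disk, using `wc -c`. It says nothing about accuracy, so evaluate the chosen file (for example with `neurenix eval`) before using it. The function name `smallest_artifact` is hypothetical. The `models/optimized_*.nrx` glob matches the loop's output names.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the smallest existing file among the given
# paths (ties: the first one wins). Size only, not accuracy.
# Usage: smallest_artifact models/optimized_*.nrx
smallest_artifact() {
  local f size best="" best_size=""
  for f in "$@"; do
    [ -f "$f" ] || continue                 # skip missing or unmatched globs
    size=$(wc -c < "$f" | tr -d ' ')        # byte count, portable across wc variants
    if [ -z "$best_size" ] || [ "$size" -lt "$best_size" ]; then
      best=$f
      best_size=$size
    fi
  done
  [ -n "$best" ] || return 1                # no candidate files found
  printf '%s (%s bytes)\n' "$best" "$best_size"
}
```

For example, `smallest_artifact models/optimized_*.nrx` prints the path and byte count of the smallest optimized model.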
Best Practices
1. Use calibration data
Provide representative data for better quantization:
```bash
neurenix optimize \
  --model model.nrx \
  --quantize int8 \
  --data data/representative_samples.csv
```
2. Start with conservative settings
Begin with less aggressive optimization:
```bash
# Start with fp16
neurenix optimize --model model.nrx --quantize fp16

# If accuracy is acceptable, try int8
neurenix optimize --model model.nrx --quantize int8
```
3. Validate after optimization
Always check model performance:
```bash
# Optimize
neurenix optimize --model model.nrx --output optimized.nrx

# Validate
neurenix eval --model optimized.nrx --data data/test.csv
```
4. Keep original model
Never overwrite your original model:
```bash
# Good: explicit output
neurenix optimize --model model.nrx --output model_optimized.nrx

# Bad: might overwrite (don't do this)
# neurenix optimize --model model.nrx --output model.nrx
```
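If you script optimization runs, a small guard can refuse an `--output` path that points at the input model. `check_output_path` is a hypothetical helper, not a neurenix feature. It first compares the two paths as strings. When both files exist, it also uses bash's `-ef` test, which is true when two paths name the same file (same device and inode). This catches cases like `./model.nrx` versus `model.nrx`.

```bash
#!/usr/bin/env bash
# Hypothetical guard: refuse to write the optimized model over the original.
# Usage: check_output_path <model_file> <output_file>
check_output_path() {
  local model=$1 output=$2
  # -ef is true when both paths exist and refer to the same file (device+inode).
  if [ "$model" = "$output" ] || [ "$model" -ef "$output" ]; then
    echo "Refusing to overwrite original model: $model" >&2
    return 1
  fi
}
```

For example: `check_output_path model.nrx model_optimized.nrx && neurenix optimize --model model.nrx --output model_optimized.nrx`.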
5. Document optimization settings
Save the optimization settings in a configuration file so the run can be reproduced:

```bash
cat > optimize_config.json << EOF
{
  "technique": "quantize",
  "quantize": "int8",
  "notes": "Optimized for mobile deployment"
}
EOF
neurenix optimize --model model.nrx --config optimize_config.json
```
Optimization Workflow
```bash
#!/bin/bash

# 1. Train model
neurenix run train.py

# 2. Evaluate baseline
neurenix eval --model models/model.nrx --data data/test.csv > baseline.txt

# 3. Optimize
neurenix optimize \
  --model models/model.nrx \
  --quantize int8 \
  --data data/calibration.csv \
  --output models/model_int8.nrx

# 4. Evaluate optimized
neurenix eval --model models/model_int8.nrx --data data/test.csv > optimized.txt

# 5. Compare results
diff baseline.txt optimized.txt
```