The hls4ml Optimization API provides hardware-aware pruning and weight sharing techniques to reduce model footprint and computational requirements while targeting specific hardware resources.
Overview
The optimization framework solves a knapsack problem: maximize model performance subject to a budget on the target hardware resource. It supports multiple objectives, including:
Network sparsity (parameter reduction)
GPU FLOPs (computational efficiency)
FPGA DSP blocks (hardware multipliers)
Memory utilization (BRAM/FF)
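To make the knapsack framing concrete, here is a toy greedy solver (illustrative only, not the hls4ml implementation): each prunable group of weights carries a value (its estimated contribution to accuracy) and a cost (its estimated resource usage), and groups are kept in order of value density until the resource budget is spent.

```python
def greedy_knapsack(values, costs, budget):
    # Sort candidate group indices by value-per-cost, best first
    order = sorted(range(len(values)), key=lambda i: values[i] / costs[i], reverse=True)
    kept, used = [], 0
    for i in order:
        # Keep a group only if it still fits in the resource budget
        if used + costs[i] <= budget:
            kept.append(i)
            used += costs[i]
    return sorted(kept)

# Four hypothetical weight groups, budget of 5 resource units
print(greedy_knapsack([6.0, 3.0, 5.0, 1.0], [4, 2, 3, 1], 5))  # [1, 2]
```

The actual solver (CBC MIP by default, see the API reference below) finds exact solutions; the greedy variant trades optimality for speed on very large networks.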
Installation
Optimization features require TensorFlow/Keras:
pip install hls4ml tensorflow
Optimization Structures
The API supports four pruning structures:
Unstructured Removes individual weights. Maximizes flexibility but may not reduce hardware resources efficiently.
Structured Removes entire neurons (Dense) or filters (Conv2D). Directly reduces computational requirements.
Pattern Groups weights processed by the same DSP. Optimizes DSP utilization in Resource strategy.
Block Removes rectangular blocks of weights. Supports only rank-2 layers (Dense).
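The four structures can be pictured as masks over a small Dense kernel. The sketch below uses hypothetical shapes and patterns to show which entries each structure zeroes; it is not hls4ml's internal masking code.

```python
import numpy as np

w = np.arange(16, dtype=float).reshape(4, 4)  # toy 4x4 Dense kernel

# Unstructured: zero individual weights (here, the four smallest magnitudes)
unstructured = w.copy()
idx = np.argsort(np.abs(w), axis=None)[:4]
unstructured[np.unravel_index(idx, w.shape)] = 0.0

# Structured: zero an entire output neuron (one full column)
structured = w.copy()
structured[:, 0] = 0.0

# Pattern: zero a repeating group of weights (hypothetical stride pattern)
pattern = w.copy()
pattern.flat[::5] = 0.0  # here: the diagonal

# Block: zero a rectangular 2x2 sub-block
block = w.copy()
block[0:2, 0:2] = 0.0
```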
Unstructured Pruning
Minimize total parameter count with weight-level pruning:
import numpy as np
from sklearn.metrics import accuracy_score
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.metrics import CategoricalAccuracy
from tensorflow.keras.losses import CategoricalCrossentropy
from hls4ml.optimization.dsp_aware_pruning.keras import optimize_model
from hls4ml.optimization.dsp_aware_pruning.keras.utils import get_model_sparsity
from hls4ml.optimization.dsp_aware_pruning.attributes import get_attributes_from_keras_model
from hls4ml.optimization.dsp_aware_pruning.objectives import ParameterEstimator
from hls4ml.optimization.dsp_aware_pruning.scheduler import PolynomialScheduler
# Load model and data
# baseline_model = ...
# X_train, y_train, X_val, y_val, X_test, y_test = ...
# Evaluate baseline
y_baseline = baseline_model.predict(X_test)
acc_base = accuracy_score(np.argmax(y_test, axis=1), np.argmax(y_baseline, axis=1))
sparsity, layers = get_model_sparsity(baseline_model)
print(f'Baseline accuracy: {acc_base}')
print(f'Baseline sparsity: {sparsity}')
# Configure optimization
epochs = 10
batch_size = 128
optimizer = Adam()
loss_fn = CategoricalCrossentropy(from_logits=True)
metric = CategoricalAccuracy()
increasing = True # Higher metric value means better performance
rtol = 0.975 # Allow 2.5% performance drop
# Create sparsity scheduler
# Polynomial schedule: gradually increase sparsity to 50% over 5 steps
scheduler = PolynomialScheduler(steps=5, final_sparsity=0.5)
# Get model attributes
model_attributes = get_attributes_from_keras_model(baseline_model)
# Optimize for minimum parameters
optimized_model = optimize_model(
    baseline_model, model_attributes, ParameterEstimator, scheduler,
    X_train, y_train, X_val, y_val, batch_size, epochs,
    optimizer, loss_fn, metric, increasing, rtol
)
# Evaluate optimized model
y_optimized = optimized_model.predict(X_test)
acc_optimized = accuracy_score(np.argmax(y_test, axis=1), np.argmax(y_optimized, axis=1))
sparsity, layers = get_model_sparsity(optimized_model)
print(f'Optimized accuracy: {acc_optimized}')
print(f'Optimized sparsity: {sparsity}')
GPU FLOP Optimization
Reduce computational complexity with structured pruning:
from hls4ml.optimization.dsp_aware_pruning.objectives.gpu_objectives import GPUFLOPEstimator
# Get model attributes
model_attributes = get_attributes_from_keras_model(baseline_model)
# Optimize for GPU FLOPs (structured pruning)
optimized_model = optimize_model(
    baseline_model, model_attributes, GPUFLOPEstimator, scheduler,
    X_train, y_train, X_val, y_val, batch_size, epochs,
    optimizer, loss_fn, metric, increasing, rtol
)
# Structured pruning removes entire neurons/filters
print("Baseline model:")
baseline_model.summary()
print("\nOptimized model:")
optimized_model.summary()
GPU FLOP optimization performs structured pruning, removing entire neurons or filters. This directly reduces the model architecture size.
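As a rough illustration of why structured pruning helps, a Dense layer's FLOPs scale with the product of its input and output widths (using the common two-FLOPs-per-weight convention, one multiply plus one add), so halving the neuron count halves the layer's cost:

```python
def dense_flops(n_in, n_out):
    # One multiply + one add per weight (common FLOP convention)
    return 2 * n_in * n_out

baseline = dense_flops(64, 128)  # 16384
pruned = dense_flops(64, 64)     # 8192: half the neurons, half the FLOPs
print(baseline, pruned)
```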
FPGA DSP Optimization
Target Vivado DSP blocks for hardware-efficient designs:
from hls4ml.utils.config import config_from_keras_model
from hls4ml.optimization.dsp_aware_pruning.objectives.vivado_objectives import VivadoDSPEstimator
from hls4ml.optimization import optimize_keras_model_for_hls4ml
# Create hls4ml configuration
default_reuse_factor = 4
default_precision = 'ap_fixed<16,6>'
hls_config = config_from_keras_model(
    baseline_model,
    granularity='name',
    default_precision=default_precision,
    default_reuse_factor=default_reuse_factor
)
hls_config['IOType'] = 'io_parallel'
hls_config['Model']['Strategy'] = 'Resource' # Required for DSP optimization
# Optimize for Vivado DSPs
optimized_model = optimize_keras_model_for_hls4ml(
    baseline_model, hls_config, VivadoDSPEstimator, scheduler,
    X_train, y_train, X_val, y_val, batch_size, epochs,
    optimizer, loss_fn, metric, increasing, rtol
)
# Evaluate
y_optimized = optimized_model.predict(X_test)
acc_optimized = accuracy_score(np.argmax(y_test, axis=1), np.argmax(y_optimized, axis=1))
print(f'Optimized accuracy: {acc_optimized}')
For DSP optimization to work correctly, you must use the Resource strategy. After optimization, you can convert to hls4ml with the Unrolled strategy to realize DSP savings.
DSP Optimization Workflow
Configure with Resource strategy
Set Strategy: 'Resource' in hls_config to enable pattern-based DSP optimization.
Run optimization
Use optimize_keras_model_for_hls4ml() with VivadoDSPEstimator to optimize the model.
Convert to hls4ml
After optimization, create a new config with Strategy: 'Unrolled' for synthesis:
import hls4ml
hls_config = config_from_keras_model(optimized_model)
hls_config['Model']['Strategy'] = 'Unrolled'
hls_model = hls4ml.converters.convert_from_keras_model(
    optimized_model, hls_config=hls_config
)
Additional Objectives
Vivado FF Optimization
Minimize register (flip-flop) utilization:
from hls4ml.optimization.dsp_aware_pruning.objectives.vivado_objectives import VivadoFFEstimator
optimized_model = optimize_keras_model_for_hls4ml(
    baseline_model, hls_config, VivadoFFEstimator, scheduler,
    X_train, y_train, X_val, y_val, batch_size, epochs,
    optimizer, loss_fn, metric, increasing, rtol
)
Multi-Objective Optimization
Optimize for both DSP and BRAM utilization:
from hls4ml.optimization.dsp_aware_pruning.objectives.vivado_objectives import VivadoMultiObjectiveEstimator
optimized_model = optimize_keras_model_for_hls4ml(
    baseline_model, hls_config, VivadoMultiObjectiveEstimator, scheduler,
    X_train, y_train, X_val, y_val, batch_size, epochs,
    optimizer, loss_fn, metric, increasing, rtol
)
Optimization Schedulers
Schedulers control how sparsity increases during optimization:
PolynomialScheduler
from hls4ml.optimization.dsp_aware_pruning.scheduler import PolynomialScheduler
# Increase sparsity polynomially over 10 steps to 75%
scheduler = PolynomialScheduler(steps=10, final_sparsity=0.75)
ConstantScheduler
from hls4ml.optimization.dsp_aware_pruning.scheduler import ConstantScheduler
# Apply constant sparsity increment per step
scheduler = ConstantScheduler(steps=10, final_sparsity=0.5)
BinaryScheduler
from hls4ml.optimization.dsp_aware_pruning.scheduler import BinaryScheduler
# Binary search for optimal sparsity
scheduler = BinaryScheduler(steps=10, final_sparsity=0.8)
If final_sparsity is not specified, it defaults to 1.0 (100% sparsity). Optimization stops when either the performance threshold is reached or final sparsity is achieved.
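For intuition, a polynomial schedule typically ramps sparsity quickly at first and flattens as it approaches the target. The sketch below uses a cubic form common in the gradual-pruning literature; the exact curve used by hls4ml's PolynomialScheduler may differ.

```python
def polynomial_sparsity(step, steps, final_sparsity, power=3):
    # Assumed cubic ramp from 0 to final_sparsity over `steps` steps
    # (illustrative, not the library's exact formula)
    frac = min(step / steps, 1.0)
    return final_sparsity * (1.0 - (1.0 - frac) ** power)

# Sparsity targets over 5 steps towards 50%
schedule = [round(polynomial_sparsity(t, 5, 0.5), 3) for t in range(6)]
print(schedule)  # [0.0, 0.244, 0.392, 0.468, 0.496, 0.5]
```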
Advanced Configuration
Custom Regularization Range
import numpy as np
# Define custom regularization values for weight decay
regularization_range = np.logspace(-7, -1, num=20).tolist()
optimized_model = optimize_model(
    baseline_model, model_attributes, ParameterEstimator, scheduler,
    X_train, y_train, X_val, y_val, batch_size, epochs,
    optimizer, loss_fn, metric, increasing, rtol,
    regularization_range=regularization_range
)
Knapsack Solver Selection
# Use greedy algorithm for very large networks (faster but less optimal)
optimized_model = optimize_model(
    baseline_model, model_attributes, ParameterEstimator, scheduler,
    X_train, y_train, X_val, y_val, batch_size, epochs,
    optimizer, loss_fn, metric, increasing, rtol,
    knapsack_solver='greedy' # Default: 'CBC_MIP'
)
Local vs Global Pruning
# Layer-wise (local) pruning
optimized_model = optimize_model(
    baseline_model, model_attributes, ParameterEstimator, scheduler,
    X_train, y_train, X_val, y_val, batch_size, epochs,
    optimizer, loss_fn, metric, increasing, rtol,
    local=True # Default: False (global)
)
Ranking Metrics
Choose how to rank weights for pruning:
# Available: 'l1', 'l2', 'saliency', 'Oracle'
optimized_model = optimize_model(
    baseline_model, model_attributes, ParameterEstimator, scheduler,
    X_train, y_train, X_val, y_val, batch_size, epochs,
    optimizer, loss_fn, metric, increasing, rtol,
    ranking_metric='l1' # Default: 'l1'
)
Best Practices
Choose appropriate epochs
For pre-trained models, use 1/3 to 1/2 of the original training epochs at each optimization step. Too few epochs may not recover accuracy; too many waste time.
Set realistic performance tolerance
Use rtol to bound the acceptable drop relative to the baseline metric; for example, rtol = 0.975 accepts up to a 2.5% accuracy loss. An overly tight tolerance can prevent any pruning step from being accepted.
Start with conservative sparsity
Begin with lower target sparsity (30-50%) and gradually increase if accuracy permits. Extremely high sparsity may not converge.
Match objective to deployment
Use ParameterEstimator for model size reduction
Use GPUFLOPEstimator for inference speed
Use VivadoDSPEstimator when DSPs are the bottleneck
Use VivadoMultiObjectiveEstimator for balanced FPGA designs
Always validate optimized models by synthesizing with hls4ml and checking actual resource utilization, not just estimates.
API Reference
optimize_model()
hls4ml.optimization.dsp_aware_pruning.keras.optimize_model(
    keras_model,
    model_attributes,
    objective,
    scheduler,
    X_train, y_train,
    X_val, y_val,
    batch_size,
    epochs,
    optimizer,
    loss_fn,
    validation_metric,
    increasing,
    rtol,
    callbacks=None,
    ranking_metric='l1',
    local=False,
    verbose=False,
    rewinding_epochs=1,
    cutoff_bad_trials=3,
    directory='hls4ml-optimization',
    tuner='Bayesian',
    knapsack_solver='CBC_MIP',
    regularization_range=None
)
optimize_keras_model_for_hls4ml()
hls4ml.optimization.optimize_keras_model_for_hls4ml(
    keras_model,
    hls_config,
    objective,
    scheduler,
    X_train, y_train,
    X_val, y_val,
    batch_size,
    epochs,
    optimizer,
    loss_fn,
    validation_metric,
    increasing,
    rtol,
    **kwargs
)
Wrapper for optimize_model() that automatically extracts attributes from hls4ml config.
get_model_sparsity()
from hls4ml.optimization.dsp_aware_pruning.keras.utils import get_model_sparsity
sparsity, layer_sparsity = get_model_sparsity(model)
Returns overall sparsity and per-layer sparsity dictionary.
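For reference, overall sparsity is simply the fraction of zero-valued weights across all arrays. A standalone sketch of the same computation (get_model_sparsity additionally reports it per layer):

```python
import numpy as np

def sparsity_of(arrays):
    # Fraction of exactly-zero entries across all weight arrays
    total = sum(a.size for a in arrays)
    zeros = sum(int(np.count_nonzero(a == 0)) for a in arrays)
    return zeros / total

# Toy example: 3 zeros out of 6 weights -> sparsity 0.5
weights = [np.array([[0.0, 1.5], [0.0, -2.0]]), np.array([0.0, 3.0])]
print(sparsity_of(weights))
```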