Conversion Process
The model conversion process in hls4ml transforms a trained machine learning model into synthesizable HLS code. The conversion is highly configurable, allowing you to balance latency, resource usage, and accuracy.
Basic Conversion
From Keras Models
import hls4ml
import tensorflow as tf

# Load your trained model
model = tf.keras.models.load_model('my_model.h5')

# Create configuration
config = hls4ml.utils.config_from_keras_model(model)

# Convert to HLS
hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    output_dir='my-hls-test',
    backend='Vivado',
    io_type='io_parallel'
)
Converter function: hls4ml/converters/__init__.py:169
From PyTorch Models
import torch
import hls4ml

# Define a simple PyTorch model
model = torch.nn.Sequential(
    torch.nn.Linear(16, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 10)
)

# Convert to HLS
hls_model = hls4ml.converters.convert_from_pytorch_model(
    model,
    input_shape=(None, 16),
    output_dir='pytorch-hls-test',
    backend='Vivado',
    io_type='io_parallel'
)
PyTorch uses a channels-first data layout, while hls4ml expects channels-last. The converter handles this transformation automatically for io_parallel, but for io_stream you may need to transpose the input data manually.
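The manual transpose for the io_stream case can be sketched with NumPy (the batch and image dimensions below are purely illustrative):

```python
import numpy as np

# Illustrative batch: PyTorch's channels-first layout (N, C, H, W)
X_torch = np.random.rand(4, 3, 32, 32).astype(np.float32)

# Move the channel axis last to get the (N, H, W, C) layout hls4ml expects
X_hls = np.transpose(X_torch, (0, 2, 3, 1))

print(X_hls.shape)  # (4, 32, 32, 3)
```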
Converter function: hls4ml/converters/__init__.py:251
From ONNX Models
import onnx
import hls4ml

# Load ONNX model
onnx_model = onnx.load('model.onnx')

# Convert to HLS
hls_model = hls4ml.converters.convert_from_onnx_model(
    onnx_model,
    output_dir='onnx-hls-test',
    backend='Vivado',
    io_type='io_parallel'
)
Converter function: hls4ml/converters/__init__.py:323
Configuration Deep Dive
Creating HLS Config
The hls_config dictionary controls all aspects of the conversion:
config = hls4ml.utils.config_from_keras_model(
    model,
    granularity='name',  # 'model', 'type', or 'name'
    default_precision='fixed<16,6>',
    default_reuse_factor=1
)
Config creation: hls4ml/utils/config.py:115
Precision Configuration
Precision is specified using the format fixed<width,integer> or ap_fixed<width,integer>:
width - Total number of bits
integer - Number of integer bits (left of decimal point)
Fractional bits = width - integer
config = {
    'Model': {
        'Precision': 'fixed<16,6>',  # 16 total bits, 6 integer bits
        'ReuseFactor': 1
    }
}
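To build intuition for what fixed<16,6> can represent, here is a rough software emulation of the quantization. It is a simplified sketch: it uses round-to-nearest and saturation for clarity, whereas the actual ap_fixed defaults in Vivado HLS are truncation and wraparound, so hardware behavior can differ at the extremes.

```python
def quantize_fixed(x, width=16, integer=6):
    """Rough software emulation of a signed fixed<width,integer> value.

    Simplified: round-to-nearest with saturation; real ap_fixed defaults
    (truncation, wraparound) behave differently at the range limits.
    """
    frac_bits = width - integer       # fixed<16,6> -> 10 fractional bits
    step = 2.0 ** -frac_bits          # resolution: 2^-10 ~= 0.000977
    lo = -(2.0 ** (integer - 1))      # -32.0 (sign bit counts toward integer bits)
    hi = 2.0 ** (integer - 1) - step  # largest representable value: 31.999023...
    q = round(x / step) * step
    return max(lo, min(hi, q))

print(quantize_fixed(3.14159))  # 3.1416015625 (nearest multiple of 2^-10)
print(quantize_fixed(100.0))    # saturates at 31.9990234375
```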
Start with fixed<16,6> as a baseline, then adjust based on your accuracy requirements and available resources.
Reuse Factor
The reuse factor determines hardware parallelism and resource usage:
ReuseFactor = 1 - Fully parallel: every multiplication gets its own hardware, giving the lowest latency at the highest resource cost
ReuseFactor = N - Each multiplier is shared across N operations, reducing resource usage roughly N-fold at the cost of N× longer latency
config = {
    'Model': {
        'Precision': 'fixed<16,6>',
        'ReuseFactor': 1  # Fully parallel
    }
}
For a Dense layer with 128 inputs and 64 outputs:
ReuseFactor = 1 - 8,192 multipliers (128 × 64)
ReuseFactor = 8 - 1,024 multipliers, 8× longer latency
ReuseFactor = 64 - 128 multipliers, 64× longer latency
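The numbers above follow from a simple relation: a fully parallel Dense layer needs one multiplier per weight (n_in × n_out), and each unit of reuse divides that count while multiplying latency. A quick sanity check of the arithmetic (a simplified first-order model that ignores zero weights, DSP packing, and backend constraints on valid ReuseFactor values):

```python
def dense_multipliers(n_in, n_out, reuse_factor=1):
    """Multipliers needed by a Dense layer at a given ReuseFactor.

    Simplified model: ignores pruned weights, precision-dependent DSP
    packing, and backend-specific ReuseFactor validity rules.
    """
    total_ops = n_in * n_out  # one multiply per weight
    assert total_ops % reuse_factor == 0, "ReuseFactor must divide n_in * n_out"
    return total_ops // reuse_factor

for rf in (1, 8, 64):
    print(f"ReuseFactor={rf}: {dense_multipliers(128, 64, rf)} multipliers")
```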
Configuration handling: hls4ml/model/graph.py:34
IO Types
hls4ml supports two I/O implementation styles:
io_parallel
Data is passed as arrays with fully parallel access:
hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    io_type='io_parallel',
    backend='Vivado'
)
✅ Lowest latency
✅ Simple interface
❌ High I/O pin count for large models
❌ Not suitable for large data sizes
io_stream
Data is passed as streaming interfaces (FIFOs/AXI-Stream):
hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    io_type='io_stream',
    backend='Vivado'
)
✅ Minimal I/O pins
✅ Supports large models
✅ Compatible with streaming architectures
❌ Slightly higher latency
❌ More complex dataflow
Example YAML config: hls4ml/converters/__init__.py:67
Granular Configuration
Model-Level Configuration
Applies to all layers unless overridden:
config = {
    'Model': {
        'Precision': 'fixed<16,6>',
        'ReuseFactor': 1,
        'Strategy': 'Latency'
    }
}
Layer-Type Configuration
Applies to all layers of a specific type:
config = {
    'Model': {
        'Precision': 'fixed<16,6>',
        'ReuseFactor': 1
    },
    'LayerType': {
        'Dense': {
            'Precision': 'fixed<18,8>',
            'ReuseFactor': 2
        },
        'Activation': {
            'Precision': 'fixed<18,8>'
        }
    }
}
Layer-Name Configuration
Applies to specific layers (highest priority):
config = {
    'Model': {
        'Precision': 'fixed<16,6>',
        'ReuseFactor': 1
    },
    'LayerName': {
        'dense_1': {
            'Precision': 'fixed<32,16>',  # Higher precision for critical layer
            'ReuseFactor': 1,
            'Strategy': 'Latency'
        },
        'dense_2': {
            'Precision': 'fixed<12,4>',  # Lower precision for less critical layer
            'ReuseFactor': 8,            # Higher reuse to save resources
            'Strategy': 'Resource'
        }
    }
}
Configuration hierarchy: hls4ml/model/graph.py:90
Advanced Configuration
Strategy Selection
Implementation strategy affects resource/latency trade-offs:
config['LayerName']['conv2d_1'] = {
    'Strategy': 'Latency',  # Options: 'Latency', 'Resource', 'Resource_Unrolled'
    'ReuseFactor': 1
}
Latency - Fully pipelined, all operations in parallel
Resource - Sequential implementation, minimal resources
Resource_Unrolled - Balanced approach with partial parallelism
Precision per Tensor
For fine-grained control, specify precision for individual tensors:
config['LayerName']['dense_1'] = {
    'Precision': {
        'weight': 'fixed<8,4>',   # 8-bit weights
        'bias': 'fixed<16,8>',    # 16-bit biases
        'result': 'fixed<16,6>'   # 16-bit outputs
    }
}
Convolutional Implementation
For Conv layers, choose the implementation style:
config['LayerName']['conv2d_1'] = {
    'ConvImplementation': 'LineBuffer',  # Options: 'LineBuffer', 'Encoded'
    'ParallelizationFactor': 1
}
LineBuffer - Streaming convolution with line buffers
Encoded - Optimized for small kernels
Backend attributes: hls4ml/backends/vivado/vivado_backend.py:80
RNN-Specific Options
For recurrent layers:
config['LayerName']['lstm_1'] = {
    'ReuseFactor': 1,
    'RecurrentReuseFactor': 1,  # Separate reuse factor for recurrent weights
    'Static': True,             # Static (fixed timesteps) vs dynamic
    'TableSize': 1024           # LUT size for activation functions
}
RNN attributes: hls4ml/backends/vivado/vivado_backend.py:48
YAML Configuration
You can also use YAML files for configuration:
KerasH5: my_keras_model.h5
OutputDir: my-hls-test
ProjectName: myproject
Part: xcvu13p-flga2577-2-e
ClockPeriod: 5
IOType: io_stream
HLSConfig:
  Model:
    Precision: ap_fixed<16,6>
    ReuseFactor: 10
  LayerName:
    dense_1:
      Precision: ap_fixed<32,16>
      ReuseFactor: 1
    dense_2:
      Precision: ap_fixed<12,4>
      ReuseFactor: 8
Then convert using:
import hls4ml

hls_model = hls4ml.converters.convert_from_config('config.yml')
YAML parsing: hls4ml/converters/__init__.py:61
Backend-Specific Parameters
Vivado/Vitis Backends
hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    backend='Vivado',
    board='pynq-z2',               # Or specify 'part' directly
    part='xc7z020clg400-1',        # FPGA part number
    clock_period=5,                # Clock period in ns
    clock_uncertainty='12.5%',     # Clock uncertainty
    io_type='io_parallel'
)
Vivado backend: hls4ml/backends/vivado/vivado_backend.py:1
Quartus Backend (Intel FPGAs)
hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    backend='Quartus',
    part='Arria10',
    clock_period=5,
    io_type='io_parallel'
)
Quartus backend: hls4ml/backends/quartus/quartus_backend.py:27
Optimization Passes
hls4ml applies various optimization passes during conversion. You can control these:
config['HLSConfig']['Flows'] = ['convert']  # Minimal conversion

# Or skip specific optimizers
config['HLSConfig']['SkipOptimizers'] = ['fuse_consecutive_batch_normalization']

# Or add custom optimizers
config['HLSConfig']['Optimizers'] = ['custom_optimizer_name']
Optimization system: hls4ml/model/optimizer.py
Verification
Software Prediction
Test the converted model in software:
import numpy as np

# Compile the model into a shared library for software emulation
hls_model.compile()

# Run prediction
X_test = np.random.rand(100, 16)
y_keras = model.predict(X_test)
y_hls = hls_model.predict(X_test)

# Check agreement between the HLS model and the original
np.testing.assert_allclose(y_hls, y_keras, rtol=1e-2, atol=0.01)
C Simulation
Run HLS C simulation:
# Write the project
hls_model.write()

# Build with C simulation
hls_model.build(csim=True)
Co-simulation
Run RTL co-simulation (requires synthesis):
# Build with synthesis and co-simulation
hls_model.build(csim=True, synth=True, cosim=True)
Common Patterns
Quick Conversion for Testing
# Minimal config for quick testing
config = hls4ml.utils.config_from_keras_model(model)

hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    output_dir='quick-test'
)
hls_model.compile()
Resource-Constrained Conversion
# Optimize for minimal resource usage
config = hls4ml.utils.config_from_keras_model(
    model,
    default_precision='fixed<8,3>',  # Lower precision
    default_reuse_factor=32          # Higher reuse
)
config['Model']['Strategy'] = 'Resource'
Ultra-Low Latency Conversion
# Optimize for minimum latency
config = hls4ml.utils.config_from_keras_model(
    model,
    default_precision='fixed<16,6>',
    default_reuse_factor=1  # Fully parallel
)
config['Model']['Strategy'] = 'Latency'

hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    io_type='io_parallel',
    backend='Vivado'
)
Best Practices
Start with defaults - Begin with fixed<16,6> and ReuseFactor=1, then adjust based on synthesis results.
Profile before tuning - Use profiling to identify resource bottlenecks: hls4ml.model.profiling.numerical(keras_model, hls_model, X_test)
Tune per layer - Apply different precision and reuse factors to different layers based on their sensitivity and resource usage.
Verify incrementally - Check accuracy after each configuration change using software simulation before running synthesis.
Next Steps
Precision Optimization Learn how to optimize fixed-point precision
HLS Backends Understand backend-specific features