
Conversion Process

The model conversion process in hls4ml transforms a trained machine learning model into synthesizable HLS code. The conversion is highly configurable, allowing you to balance latency, resource usage, and accuracy.

Basic Conversion

From Keras Models

import hls4ml
import tensorflow as tf

# Load your trained model
model = tf.keras.models.load_model('my_model.h5')

# Create configuration
config = hls4ml.utils.config_from_keras_model(model)

# Convert to HLS
hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    output_dir='my-hls-test',
    backend='Vivado',
    io_type='io_parallel'
)
Converter function: hls4ml/converters/__init__.py:169

From PyTorch Models

import torch
import hls4ml

# Your PyTorch model
model = torch.nn.Sequential(
    torch.nn.Linear(16, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 10)
)

# Convert to HLS
hls_model = hls4ml.converters.convert_from_pytorch_model(
    model,
    input_shape=(None, 16),
    output_dir='pytorch-hls-test',
    backend='Vivado',
    io_type='io_parallel'
)
PyTorch uses “channels_first” format while hls4ml expects “channels_last”. The converter automatically handles this transformation for io_parallel, but you may need to transpose manually for io_stream.
Converter function: hls4ml/converters/__init__.py:251
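If you do need to reorder data yourself (for example when preparing test inputs for an io_stream model), a plain NumPy transpose does it. The shapes below are illustrative:

```python
import numpy as np

# A batch of 8 images in PyTorch's channels_first layout: (N, C, H, W)
x_nchw = np.random.rand(8, 3, 32, 32).astype(np.float32)

# Reorder to channels_last (N, H, W, C), the layout hls4ml expects
x_nhwc = np.transpose(x_nchw, (0, 2, 3, 1))

print(x_nhwc.shape)  # (8, 32, 32, 3)
```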

From ONNX Models

import onnx
import hls4ml

# Load ONNX model
onnx_model = onnx.load('model.onnx')

# Convert to HLS
hls_model = hls4ml.converters.convert_from_onnx_model(
    onnx_model,
    output_dir='onnx-hls-test',
    backend='Vivado',
    io_type='io_parallel'
)
Converter function: hls4ml/converters/__init__.py:323

Configuration Deep Dive

Creating HLS Config

The hls_config dictionary controls all aspects of the conversion:
config = hls4ml.utils.config_from_keras_model(
    model,
    granularity='name',  # 'model', 'type', or 'name'
    default_precision='fixed<16,6>',
    default_reuse_factor=1
)
Config creation: hls4ml/utils/config.py:115

Precision Configuration

Precision is specified using the format fixed<width,integer> or ap_fixed<width,integer>:
  • width - Total number of bits
  • integer - Number of integer bits (left of decimal point)
  • Fractional bits = width - integer
config = {
    'Model': {
        'Precision': 'fixed<16,6>',  # 16 total bits, 6 integer bits
        'ReuseFactor': 1
    }
}
Start with fixed<16,6> as a baseline, then adjust based on your accuracy requirements and available resources.
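To see what fixed<16,6> implies numerically, the following sketch models its rounding and saturation behavior in plain Python (this is an illustration, not hls4ml's implementation; ap_fixed is signed, so the integer bits include the sign):

```python
# Illustrative model of signed fixed<16,6>: 10 fractional bits, range [-32, 32)
WIDTH, INT = 16, 6
FRAC = WIDTH - INT            # 10 fractional bits
STEP = 2.0 ** -FRAC           # smallest representable step
LO = -2.0 ** (INT - 1)        # most negative value: -32.0
HI = 2.0 ** (INT - 1) - STEP  # most positive value: 32 - STEP

def quantize(x):
    # Round to the nearest step, then saturate to the representable range
    return min(max(round(x / STEP) * STEP, LO), HI)

print(STEP)              # 0.0009765625
print(quantize(0.1))     # 0.099609375 (nearest representable value)
print(quantize(100.0))   # 31.9990234375 (saturates at the top of the range)
```

Values outside [-32, 32) saturate, and values between steps lose up to half a step of precision; this is why the integer/fractional split matters as much as the total width.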

Reuse Factor

The reuse factor determines hardware parallelism and resource usage:
  • ReuseFactor = 1 - Fully parallel, lowest latency, highest resource usage
  • ReuseFactor = N - Each hardware multiplier is reused N times, serializing the computation over roughly N cycles and reducing resource usage
config = {
    'Model': {
        'Precision': 'fixed<16,6>',
        'ReuseFactor': 1  # Fully parallel
    }
}
For a Dense layer with 128 inputs and 64 outputs:
  • ReuseFactor = 1 - 8,192 multipliers (128 × 64)
  • ReuseFactor = 8 - 1,024 multipliers, 8× longer latency
  • ReuseFactor = 64 - 128 multipliers, 64× longer latency
Configuration handling: hls4ml/model/graph.py:34
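The multiplier counts above follow directly from the layer dimensions. A quick sketch of the arithmetic (cycle counts are a simplification; actual latency depends on the backend's pipelining):

```python
def dense_multipliers(n_in, n_out, reuse_factor):
    # Total multiplications needed for one pass through a Dense layer
    total_mults = n_in * n_out
    # Each physical multiplier is time-multiplexed reuse_factor times
    assert total_mults % reuse_factor == 0, "ReuseFactor must divide n_in * n_out"
    return total_mults // reuse_factor

for rf in (1, 8, 64):
    print(rf, dense_multipliers(128, 64, rf))
# 1  -> 8192 multipliers
# 8  -> 1024 multipliers
# 64 -> 128  multipliers
```

Note the divisibility constraint in the sketch: reuse factors that evenly divide the multiplication count map most cleanly onto hardware.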

IO Types

hls4ml supports two I/O implementation styles:

io_parallel

Data is passed as arrays with fully parallel access:
hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    io_type='io_parallel',
    backend='Vivado'
)
  • ✅ Lowest latency
  • ✅ Simple interface
  • ❌ High I/O pin count for large models
  • ❌ Not suitable for large data sizes

io_stream

Data is passed as streaming interfaces (FIFOs/AXI-Stream):
hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    io_type='io_stream',
    backend='Vivado'
)
  • ✅ Minimal I/O pins
  • ✅ Supports large models
  • ✅ Compatible with streaming architectures
  • ❌ Slightly higher latency
  • ❌ More complex dataflow
Example YAML config: hls4ml/converters/__init__.py:67

Granular Configuration

Model-Level Configuration

Applies to all layers unless overridden:
config = {
    'Model': {
        'Precision': 'fixed<16,6>',
        'ReuseFactor': 1,
        'Strategy': 'Latency'
    }
}

Layer-Type Configuration

Applies to all layers of a specific type:
config = {
    'Model': {
        'Precision': 'fixed<16,6>',
        'ReuseFactor': 1
    },
    'LayerType': {
        'Dense': {
            'Precision': 'fixed<18,8>',
            'ReuseFactor': 2
        },
        'Activation': {
            'Precision': 'fixed<18,8>'
        }
    }
}

Layer-Name Configuration

Applies to specific layers (highest priority):
config = {
    'Model': {
        'Precision': 'fixed<16,6>',
        'ReuseFactor': 1
    },
    'LayerName': {
        'dense_1': {
            'Precision': 'fixed<32,16>',  # Higher precision for critical layer
            'ReuseFactor': 1,
            'Strategy': 'Latency'
        },
        'dense_2': {
            'Precision': 'fixed<12,4>',   # Lower precision for less critical layer
            'ReuseFactor': 8,              # Higher reuse to save resources
            'Strategy': 'Resource'
        }
    }
}
Configuration hierarchy: hls4ml/model/graph.py:90
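The precedence just described (LayerName over LayerType over Model) can be illustrated with a small lookup helper. This mimics the merge order for clarity; it is not hls4ml's actual resolution code:

```python
def resolve(config, layer_name, layer_type, key):
    # Most specific setting wins: LayerName > LayerType > Model
    for section, lookup in (('LayerName', layer_name), ('LayerType', layer_type)):
        entry = config.get(section, {}).get(lookup, {})
        if key in entry:
            return entry[key]
    return config['Model'][key]  # fall back to the model-level default

config = {
    'Model':     {'Precision': 'fixed<16,6>', 'ReuseFactor': 1},
    'LayerType': {'Dense': {'ReuseFactor': 2}},
    'LayerName': {'dense_1': {'Precision': 'fixed<32,16>'}},
}

print(resolve(config, 'dense_1', 'Dense', 'Precision'))     # fixed<32,16> (name-level)
print(resolve(config, 'dense_1', 'Dense', 'ReuseFactor'))   # 2 (type-level)
print(resolve(config, 'relu_1', 'Activation', 'Precision')) # fixed<16,6> (model default)
```

Note that settings merge per key: dense_1 gets its name-level Precision but still inherits ReuseFactor from the Dense type-level entry.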

Advanced Configuration

Strategy Selection

Implementation strategy affects resource/latency trade-offs:
config['LayerName']['conv2d_1'] = {
    'Strategy': 'Latency',  # Options: 'Latency', 'Resource', 'Resource_Unrolled'
    'ReuseFactor': 1
}
  • Latency - Fully pipelined, all operations in parallel
  • Resource - Sequential implementation, minimal resources
  • Resource_Unrolled - Balanced approach with partial parallelism

Precision per Tensor

For fine-grained control, specify precision for individual tensors:
config['LayerName']['dense_1'] = {
    'Precision': {
        'weight': 'fixed<8,4>',     # 8-bit weights
        'bias': 'fixed<16,8>',      # 16-bit biases
        'result': 'fixed<16,6>'     # 16-bit outputs
    }
}

Convolutional Implementation

For Conv layers, choose the implementation style:
config['LayerName']['conv2d_1'] = {
    'ConvImplementation': 'LineBuffer',  # Options: 'LineBuffer', 'Encoded'
    'ParallelizationFactor': 1
}
  • LineBuffer - Streaming convolution with line buffers
  • Encoded - Optimized for small kernels
Backend attributes: hls4ml/backends/vivado/vivado_backend.py:80

RNN-Specific Options

For recurrent layers:
config['LayerName']['lstm_1'] = {
    'ReuseFactor': 1,
    'RecurrentReuseFactor': 1,  # Separate reuse factor for recurrent weights
    'Static': True,              # Static (fixed timesteps) vs dynamic
    'TableSize': 1024            # LUT size for activation functions
}
RNN attributes: hls4ml/backends/vivado/vivado_backend.py:48

YAML Configuration

You can also use YAML files for configuration:
KerasH5: my_keras_model.h5
OutputDir: my-hls-test
ProjectName: myproject
Part: xcvu13p-flga2577-2-e
ClockPeriod: 5
IOType: io_stream
HLSConfig:
  Model:
    Precision: ap_fixed<16,6>
    ReuseFactor: 10
  LayerName:
    dense_1:
      Precision: ap_fixed<32,16>
      ReuseFactor: 1
    dense_2:
      Precision: ap_fixed<12,4>
      ReuseFactor: 8
Then convert using:
import hls4ml

hls_model = hls4ml.converters.convert_from_config('config.yml')
YAML parsing: hls4ml/converters/__init__.py:61

Backend-Specific Parameters

Vivado/Vitis Backends

hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    backend='Vivado',
    board='pynq-z2',              # Target board name
    part='xc7z020clg400-1',       # FPGA part number
    clock_period=5,               # Clock period in ns
    clock_uncertainty='12.5%',    # Clock uncertainty
    io_type='io_parallel'
)
Vivado backend: hls4ml/backends/vivado/vivado_backend.py:1

Quartus Backend (Intel FPGAs)

hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    backend='Quartus',
    part='Arria10',
    clock_period=5,
    io_type='io_parallel'
)
Quartus backend: hls4ml/backends/quartus/quartus_backend.py:27

Optimization Passes

hls4ml applies various optimization passes during conversion. You can control these:
config['HLSConfig']['Flows'] = ['convert']  # Minimal conversion

# Or skip specific optimizers
config['HLSConfig']['SkipOptimizers'] = ['fuse_consecutive_batch_normalization']

# Or add custom optimizers
config['HLSConfig']['Optimizers'] = ['custom_optimizer_name']
Optimization system: hls4ml/model/optimizer.py

Verification

Software Prediction

Test the converted model in software:
import numpy as np

# Compile the model
hls_model.compile()

# Run prediction
X_test = np.random.rand(100, 16)
y_keras = model.predict(X_test)
y_hls = hls_model.predict(X_test)

# Check accuracy
np.testing.assert_allclose(y_hls, y_keras, rtol=1e-2, atol=0.01)

C Simulation

Run HLS C simulation:
# Write the project
hls_model.write()

# Build with C simulation
hls_model.build(csim=True)

Co-simulation

Run RTL co-simulation (requires synthesis):
# Build with synthesis and co-simulation
hls_model.build(csim=True, synth=True, cosim=True)

Common Patterns

Quick Conversion for Testing

# Minimal config for quick testing
config = hls4ml.utils.config_from_keras_model(model)
hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    output_dir='quick-test'
)
hls_model.compile()

Resource-Constrained Conversion

# Optimize for minimal resource usage
config = hls4ml.utils.config_from_keras_model(
    model,
    default_precision='fixed<8,3>',  # Lower precision
    default_reuse_factor=32           # Higher reuse
)
config['Model']['Strategy'] = 'Resource'

Ultra-Low Latency Conversion

# Optimize for minimum latency
config = hls4ml.utils.config_from_keras_model(
    model,
    default_precision='fixed<16,6>',
    default_reuse_factor=1  # Fully parallel
)
config['Model']['Strategy'] = 'Latency'
hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    io_type='io_parallel',
    backend='Vivado'
)

Best Practices

  • Begin with fixed<16,6> and ReuseFactor=1, then adjust based on synthesis results.
  • Use profiling to identify resource bottlenecks:
hls4ml.model.profiling.numerical(keras_model, hls_model, X_test)
  • Apply different precision and reuse factors to different layers based on their sensitivity and resource usage.
  • Check accuracy with software simulation after each configuration change, before running synthesis.
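For the accuracy check, a simple agreement metric between the reference and HLS outputs can be computed like this (the tolerance value and the small floor term are illustrative choices, not hls4ml defaults):

```python
import numpy as np

def relative_agreement(y_ref, y_test, tol=1e-2):
    # Fraction of outputs whose absolute error stays within tol of the
    # reference magnitude (a small floor avoids division-by-zero effects)
    err = np.abs(y_test - y_ref)
    bound = tol * np.abs(y_ref) + 1e-6
    return np.mean(err <= bound)

y_keras = np.array([0.10, 0.50, 0.40])
y_hls   = np.array([0.10, 0.51, 0.39])
print(relative_agreement(y_keras, y_hls, tol=0.05))  # 1.0: all outputs within 5%
```

Tracking a scalar like this across configuration changes makes precision/accuracy trade-offs easy to compare before committing to a synthesis run.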

Next Steps

Precision Optimization

Learn how to optimize fixed-point precision

HLS Backends

Understand backend-specific features
