What is hls4ml?
hls4ml is a package that converts machine learning models into highly optimized firmware implementations for FPGAs and ASICs using High-Level Synthesis (HLS). It enables ultra-low latency inference by translating neural networks into hardware descriptions that can be synthesized into custom accelerators.
hls4ml achieves inference latencies as low as tens of nanoseconds by implementing neural networks directly in hardware logic, making it ideal for real-time applications in particle physics, edge computing, and low-power AI.
Architecture Overview
The hls4ml conversion pipeline transforms machine learning models through several stages:
Conversion Pipeline Stages
Model Parsing
The framework-specific model (Keras, PyTorch, or ONNX) is parsed and converted into an internal graph representation. Each layer is mapped to an hls4ml layer type.
Graph IR Construction
An intermediate representation (IR) is built that captures the computational graph structure, layer connectivity, and data flow patterns independent of the original framework.
Optimization
Multiple optimization passes are applied (a sketch of the pass interface follows the list):
- Layer fusion (e.g., BatchNorm + Activation)
- Precision inference and quantization
- Resource allocation strategies
- Memory optimization
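As an illustration of how such passes are structured, here is a minimal sketch built on the `OptimizerPass` interface (`match`/`transform`) from `hls4ml.model.optimizer`. The fusion logic shown is illustrative only, not the library's actual BatchNorm-fusion implementation.

```python
from hls4ml.model.optimizer import OptimizerPass

class ExampleFusePass(OptimizerPass):
    """Illustrative pass: fold a BatchNormalization node into a preceding Dense layer."""

    def match(self, node):
        # Fire only on BatchNormalization nodes that directly follow a Dense layer
        prev = node.get_input_node()
        return (
            node.class_name == 'BatchNormalization'
            and prev is not None
            and prev.class_name == 'Dense'
        )

    def transform(self, model, node):
        # In a real fusion pass the scale/bias would be folded into the
        # preceding layer's weights here; then the node is removed.
        model.remove_node(node)
        return True  # signal to the optimizer that the graph changed
```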
Backend Code Generation
The optimized graph is translated to backend-specific HLS code (C++/SystemC) with appropriate pragmas and directives for the target tool.
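As a sketch, assuming `hls_model` is an already-converted model (see the workflow example below), code generation and synthesis are driven from the model object; the exact `build` arguments vary by backend:

```python
hls_model.write()            # emit the HLS C++ project for the chosen backend
hls_model.build(csim=False)  # optionally invoke the backend tool to synthesize
```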
Key Components
Model Graph
The ModelGraph class is the central data structure that represents the converted model:
hls4ml/model/graph.py:1
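A minimal sketch of inspecting a ModelGraph, assuming `hls_model` was produced by one of the `convert_from_*` functions:

```python
# Walk the graph nodes; each node is an hls4ml layer with a class name
# and configured attributes.
for layer in hls_model.get_layers():
    print(layer.name, layer.class_name, layer.get_attr('reuse_factor'))
```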
Backends
Backends translate the internal representation to vendor-specific HLS:
- Vivado HLS - Xilinx FPGAs (legacy toolchain)
- Vitis HLS - Xilinx FPGAs (modern toolchain)
- Quartus - Intel/Altera FPGAs
- Catapult HLS - Siemens high-level synthesis
- oneAPI - Intel FPGA alternative flow
hls4ml/backends/backend.py:17
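The backend is selected at conversion time. A sketch assuming a trained Keras model `model`; the part number is a placeholder:

```python
import hls4ml

config = hls4ml.utils.config_from_keras_model(model)
hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    backend='Vitis',              # or 'Vivado', 'Quartus', 'Catapult', 'oneAPI'
    part='xcu250-figd2104-2L-e',  # placeholder Xilinx part
)
```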
Converters
Framework-specific converters handle model parsing:
- keras_v2_to_hls - TensorFlow/Keras 2.x models
- keras_v3_to_hls - Keras 3.x models
- pytorch_to_hls - PyTorch models via tracing
- onnx_to_hls - ONNX models
hls4ml/converters/__init__.py:34
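From the user side these are exposed as `convert_from_*` functions. A sketch, assuming `torch_model`, `onnx_model`, and a config dict `config` already exist (for PyTorch, an input shape is required because the model is traced):

```python
import hls4ml

# PyTorch: the model is traced, so the input shape must be given
hls_model = hls4ml.converters.convert_from_pytorch_model(
    torch_model, input_shape=(None, 16), hls_config=config
)

# ONNX: the graph structure is read directly from the ONNX model
hls_model = hls4ml.converters.convert_from_onnx_model(
    onnx_model, hls_config=config
)
```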
Workflow Example
Here’s a complete example showing the typical workflow.
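The sketch below assumes a trained Keras model `model` and test inputs `X_test`; the backend, part number, and output directory are placeholders:

```python
import hls4ml

# 1. Generate a baseline configuration from the model
config = hls4ml.utils.config_from_keras_model(model, granularity='name')

# 2. Convert to an hls4ml ModelGraph targeting a specific backend and part
hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    backend='Vitis',
    output_dir='my_hls_project',
    part='xcu250-figd2104-2L-e',
)

# 3. Compile the C++ emulation and check numerical agreement
hls_model.compile()
y_hls = hls_model.predict(X_test)

# 4. Run HLS synthesis to get latency and resource estimates
hls_model.build(csim=False)
```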
Design Philosophy
hls4ml is designed with several key principles:
Layer-by-Layer Translation
Each neural network layer is mapped to a corresponding hardware function with specific precision, parallelism, and resource allocation. This modular approach allows fine-grained control over implementation trade-offs.
Fixed-Point Arithmetic
To maximize efficiency and minimize resource usage, hls4ml uses fixed-point arithmetic instead of floating-point. Precision can be configured per-layer or per-tensor.
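For example, with name-level granularity the precision can be set globally and then overridden per layer and per tensor. A sketch, where 'fc1' is a placeholder layer name and `ap_fixed<W,I>` denotes W total bits with I integer bits:

```python
config = hls4ml.utils.config_from_keras_model(model, granularity='name')
config['Model']['Precision'] = 'ap_fixed<16,6>'  # model-wide default
# Per-layer, per-tensor overrides ('fc1' is a placeholder layer name)
config['LayerName']['fc1']['Precision']['weight'] = 'ap_fixed<8,3>'
config['LayerName']['fc1']['Precision']['result'] = 'ap_fixed<18,8>'
```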
Configurable Resource Usage
The reuse factor controls the trade-off between latency and resource usage. A reuse factor of 1 fully unrolls operations for minimum latency, while higher values serialize computations to save resources.
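A sketch of setting the reuse factor through the same config dict ('fc1' again a placeholder):

```python
config['Model']['ReuseFactor'] = 1             # fully unrolled: minimum latency
config['LayerName']['fc1']['ReuseFactor'] = 4  # reuse each multiplier 4 times
```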
Multiple Implementation Strategies
- Latency - Fully pipelined, minimum latency
- Resource - Serialized computation, minimum resource usage
- Resource Unrolled - Balanced approach with configurable parallelism
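Strategies are selected through the same config, globally or per layer. A sketch (the exact string accepted for the Resource Unrolled strategy may vary by hls4ml version and backend):

```python
config['Model']['Strategy'] = 'Latency'              # fully pipelined
config['LayerName']['fc1']['Strategy'] = 'Resource'  # serialize this layer only
```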
Next Steps
- Model Conversion - Learn how to configure and convert your models
- HLS Backends - Understand the different backend options
- Precision Optimization - Master fixed-point precision configuration
- API Reference - Explore the complete API documentation
