General Questions
What is hls4ml?
hls4ml is an open-source Python package that translates trained machine learning models into FPGA (and ASIC) firmware using high-level synthesis (HLS), enabling low-latency, low-power inference. Example application areas:
- High-energy physics triggers at CERN’s Large Hadron Collider
- Real-time control systems for quantum computing
- Feedback loops in nuclear fusion reactors
- Low-power environmental monitoring on satellites
- Biomedical signal processing (e.g., arrhythmia classification)
How does hls4ml work?
hls4ml follows a compiler-style pipeline:
- Parse the trained model to extract architecture and weights
- Optimize by fusing layers (e.g., BatchNorm into Conv), applying precision inference
- Generate C++ HLS code implementing each layer
- Synthesize with vendor tools (Vivado HLS, Vitis, Quartus, etc.) to create RTL
- Integrate into FPGA designs or run standalone
How is hls4ml so fast?
Several design choices combine to give sub-microsecond inference:
- On-chip weight storage: All model parameters stored in FPGA block RAM for immediate access
- Spatial dataflow architecture: Exploits FPGA parallelism by implementing layers as hardware pipelines
- Configurable parallelism: Fully parallel (ReuseFactor=1) for minimum latency, or resource sharing for efficiency
- Fixed-point arithmetic: Uses optimized fixed-point types instead of floating-point
- Layer fusion: Merges operations (e.g., Conv+BatchNorm+ReLU) into single hardware blocks
Typical latencies:
- io_parallel: 50-200 nanoseconds for small models
- io_stream: 1-10 microseconds for larger models
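On the fixed-point bullet: an `ap_fixed<W,I>` value has W total bits, I of them integer bits. A pure-Python sketch of what quantizing to `ap_fixed<16,6>` does, using round-to-nearest and saturation (real `ap_fixed` defaults to truncation and wraparound unless `AP_RND`/`AP_SAT` are chosen):

```python
def ap_fixed_quantize(x, total_bits=16, int_bits=6):
    """Sketch of ap_fixed<16,6> quantization with rounding and saturation."""
    frac_bits = total_bits - int_bits
    scale = 1 << frac_bits                    # 2^10 = 1024 steps per unit
    lo = -(1 << (int_bits - 1))               # -32, most negative value
    hi = (1 << (int_bits - 1)) - 1 / scale    # +31.999..., most positive value
    q = round(x * scale) / scale              # snap to the nearest step
    return min(max(q, lo), hi)                # saturate out-of-range values

print(ap_fixed_quantize(0.7071))   # -> 0.70703125 (nearest multiple of 2^-10)
print(ap_fixed_quantize(100.0))    # -> 31.9990234375 (saturates)
```

Narrower types mean smaller multipliers and adders, which is where much of the speed and resource saving comes from.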
Will my model work with hls4ml?
Most common layer types are fully supported:
- Dense (fully connected) layers
- Convolutional layers (Conv1D, Conv2D, DepthwiseConv)
- Pooling layers (MaxPooling, AveragePooling, GlobalPooling)
- Activations (ReLU, sigmoid, tanh, softmax, ELU, LeakyReLU, etc.)
- Batch normalization
- Recurrent layers (LSTM, GRU)
- Residual/skip connections
- Quantized layers (QKeras, HGQ, Brevitas)
Experimental or partial support:
- Graph Neural Networks (experimental)
- Transformers (early development, not stable)
- Custom layers (requires extension API)
Not supported:
- Large Language Models (LLMs)
- Large vision transformers
- Extremely novel architectures
Will my X-parameter model fit on FPGA Y?
It depends on the model architecture, the target device, and the hls4ml configuration. Rough guidance:
- Small models: O(1,000) parameters on modest FPGAs
- Medium models: O(10,000) parameters with quantization
- Larger models: O(100,000) parameters on large FPGAs with aggressive optimization
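A back-of-envelope way to sanity-check fit, assuming roughly one multiply per parameter per inference: the multiplier (DSP) demand is about parameters divided by ReuseFactor, compared against the DSP count of the target part (the numbers below are illustrative, not from a datasheet):

```python
def multipliers_needed(n_params, reuse_factor):
    """Rough DSP estimate: each multiplier is time-shared reuse_factor times."""
    return -(-n_params // reuse_factor)  # ceiling division

# a 50k-parameter model against a mid-range part with ~2,000 DSPs (illustrative)
print(multipliers_needed(50_000, 1))   # -> 50000: will not fit fully parallel
print(multipliers_needed(50_000, 32))  # -> 1563: plausible, especially quantized
```

Quantizing to narrow fixed-point types shifts some multiplies from DSPs into LUTs, so the real budget is usually more forgiving than this estimate.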
Model Architecture
- CNNs reuse parameters → lower resource usage
- Dense layers → higher resource usage
- Activations with LUTs → additional resources
Configuration
- Precision: Lower bit widths → fewer resources
- ReuseFactor: Higher values → more resource sharing
- IOType: io_stream → lower resource usage than io_parallel
- Strategy: Resource → more sharing than Latency
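All of these knobs live in the hls4ml configuration dictionary (IOType is passed separately at conversion time). A hypothetical model-level configuration combining them:

```python
# hypothetical hls4ml-style model-level configuration
hls_config = {
    'Model': {
        'Precision': 'ap_fixed<16,6>',  # lower bit widths -> fewer resources
        'ReuseFactor': 16,              # higher -> more resource sharing
        'Strategy': 'Resource',         # shares more hardware than 'Latency'
    },
}
io_type = 'io_stream'  # lower resource usage than 'io_parallel'
```

Per-layer overrides are also possible by generating the config with a finer granularity.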
Getting Started
How do I get started with hls4ml?
Work through the hls4ml tutorial, which covers the full flow:
- Part 1: Introduction to FPGAs and HLS
- Part 2: Your first hls4ml model conversion
- Part 3: Model optimization and compression
- Part 4: Advanced features (profiling, custom layers)
What hardware do I need?
No FPGA board is required to start; model conversion and C simulation run on an ordinary machine. For development:
- Linux or macOS machine with Python 3.10+
- C++ compiler (g++ on Linux, Xcode on macOS)
- 8GB+ RAM recommended
For synthesis:
- Vendor HLS tools (see Installation):
- Vivado HLS 2020.1+ or Vitis HLS 2022.2+ (Xilinx/AMD)
- Quartus 20.1-21.4 or oneAPI 2024.1-2025.0 (Intel/Altera)
- Catapult HLS 2024.1+ (Siemens/Mentor)
- 16GB+ RAM recommended for synthesis
- Multi-core CPU helpful (synthesis is parallelizable)
For deployment:
- Target FPGA board matching your chosen part
- JTAG programmer/debugger
- Host interface (PCIe, Ethernet, etc.) if needed
Which backend should I choose?
| Backend | Vendor | Best For | Notes |
|---|---|---|---|
| Vitis | Xilinx/AMD | New designs, UltraScale+ | Recommended for new projects |
| Vivado | Xilinx/AMD | Legacy 7-series, UltraScale | Mature, well-tested |
| Quartus | Intel/Altera | Stratix, Arria, Cyclone | Stratix 10, Arria 10 |
| oneAPI | Intel/Altera | Modern Intel FPGAs | Experimental, uses SYCL |
| Catapult | Any | ASICs and FPGAs | Best for ASIC flows |
Do I need to know FPGA programming?
No; hls4ml hides the hardware details:
- No HDL (Verilog/VHDL) knowledge required
- No understanding of FPGA primitives needed
- Basic Python and ML framework knowledge sufficient
That said, understanding a few concepts helps you get better results:
- Fixed-point arithmetic and precision
- Parallelism vs resource usage tradeoffs
- Latency vs throughput
- FPGA resources (LUTs, FFs, DSPs, BRAMs)
Common Issues
My predictions don't match the original model
Small mismatches are expected and usually come from fixed-point quantization. If the error is large, profile the weights and activations (see the profiling question under Optimization & Performance) and widen the precision of layers whose values fall outside the representable range.
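Bit-exact agreement with the floating-point model is not the goal; the usual check is closeness within a tolerance. A sketch, with illustrative arrays standing in for `model.predict(X)` and `hls_model.predict(X)`:

```python
import numpy as np

# stand-ins for y_keras = model.predict(X) and y_hls = hls_model.predict(X)
y_keras = np.array([0.12, 0.88])
y_hls = np.array([0.118, 0.882])

# compare within a tolerance; if this fails badly, widen the precision
assert np.allclose(y_hls, y_keras, atol=0.05)
print(np.max(np.abs(y_hls - y_keras)))  # worst-case absolute error
```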
Build fails: 'Stop unrolling loop because it may cause large runtime'
The HLS compiler refuses to fully unroll a loop that is too large, which happens when a layer has too many elements for the current configuration. Mitigations fall into three groups:
- Quick fixes: increase ReuseFactor, switch Strategy to Resource, or use io_stream
- Model compression: prune and/or quantize the model to shrink its layers
- Architecture changes: reduce layer sizes or split very large layers
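The quickest fix is usually a configuration change; a hypothetical snippet trading latency for a design the tool can actually build:

```python
# hypothetical hls4ml-style config: start from a fully parallel setup...
config = {'Model': {'ReuseFactor': 1, 'Strategy': 'Latency'}}

# ...then share hardware so no single loop must be fully unrolled
config['Model']['ReuseFactor'] = 64       # time-share each multiplier 64x
config['Model']['Strategy'] = 'Resource'  # resource-sharing implementation

print(config['Model'])
```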
Build fails: 'cannot open shared object file'
Synthesis takes forever / uses too much memory
Synthesis effort grows quickly with the degree of unrolling. Increasing ReuseFactor, switching to io_stream, lowering precision, or shrinking the model all reduce the amount of logic the tool must synthesize.
ImportError: cannot import converters or utils
This usually means a local file or directory named hls4ml is shadowing the installed package. Rename it or run from a different directory; if the problem persists, reinstall with pip install --force-reinstall hls4ml.
Model with custom layers fails to convert
Layers without a registered converter stop the conversion. Your options:
- Replace with equivalent supported layers (easiest)
- Use the extension API to add support
- Request support via GitHub Discussions
Optimization & Performance
How do I reduce resource usage?
In rough order of impact: lower the precision, increase ReuseFactor, set Strategy to Resource, switch to io_stream, and compress the model with pruning or quantization.
How do I reduce latency?
Do the opposite: keep ReuseFactor at 1, set Strategy to Latency, use io_parallel, and keep the model small. Lower precision also shortens the arithmetic paths.
What's the difference between io_parallel and io_stream?
| Feature | io_parallel | io_stream |
|---|---|---|
| Latency | Lowest (50-200ns) | Higher (1-10μs) |
| Resources | Highest | Lower |
| Throughput | One inference at a time | Pipelined |
| Best for | Single inputs, min latency | Streaming data, large models |
| I/O interface | All inputs/outputs at once | Sequential stream |
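The mode is selected per model with the `io_type` argument of the converter functions. A small illustrative chooser based on the table above (the 10k-parameter threshold is an assumption for the example, not an hls4ml rule):

```python
def pick_io_type(n_params, latency_critical):
    """io_parallel for small latency-critical models, io_stream otherwise."""
    if latency_critical and n_params < 10_000:  # illustrative threshold
        return 'io_parallel'
    return 'io_stream'

print(pick_io_type(2_000, latency_critical=True))     # -> io_parallel
print(pick_io_type(200_000, latency_critical=False))  # -> io_stream
```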
How do I use profiling tools?
hls4ml's numerical profiling plots the distributions of weights and activations against the range representable by your chosen fixed-point precision:
- Box plots show distribution of weights/activations
- Grey boxes show range representable with current precision
- If box-and-whisker extends outside grey box → increase precision
- If grey box is much larger → can reduce precision
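The grey-box rule can also be checked numerically: the integer bits of an `ap_fixed` type must cover the largest magnitude observed. A sketch of that heuristic (hls4ml's profiling plots make the same comparison visually):

```python
import numpy as np

def required_int_bits(values):
    """Signed integer bits needed to cover max |value| (heuristic; values
    exactly on the positive boundary may need one more bit)."""
    m = float(np.max(np.abs(values)))
    if m < 1.0:
        return 1  # the sign bit alone covers (-1, 1)
    return int(np.ceil(np.log2(m))) + 1  # magnitude bits + sign bit

weights = np.array([-2.7, 0.3, 1.9, -0.05])
print(required_int_bits(weights))  # -> 3, so e.g. ap_fixed<16,3> avoids saturation
```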
Contributing
How do I contribute to hls4ml?
Contributions are welcome. A typical path:
- Start a discussion (GitHub Discussions or an issue)
- Review the contribution guidelines, which cover:
- Code style (using ruff formatter)
- Testing requirements
- Documentation standards
- Attend online meetings (request invite via CERN e-group)
- Chat on GitHub Discussions
- Present your use cases
We acknowledge the Fast Machine Learning collective as an open community of multi-domain experts and collaborators.
How should I report bugs?
Open an issue on GitHub and include:
- Environment info: hls4ml version, Python version, OS, and vendor tool versions
- Minimal reproducible example: the smallest model and script that triggers the bug
- Error messages:
  - Full stack trace
  - HLS log files if synthesis failed
- Expected vs actual behavior
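A quick way to gather the environment details (commands are examples; adjust paths and tools to your setup):

```shell
python --version
python -c "import hls4ml; print('hls4ml', hls4ml.__version__)" 2>/dev/null \
  || echo "hls4ml not importable"
uname -a                    # OS and kernel
g++ --version | head -n 1   # host compiler used for C simulation
```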
