Profile AI Model Memory and Performance on ADI Hardware

CodeFusion Studio provides two complementary profiling approaches for embedded AI models. Static resource profiling runs before any deployment and estimates memory requirements, inference latency, compute cycles, and per-layer bottlenecks directly from the model file and target hardware profile — no physical board is needed. Runtime hardware profiling captures live trace data from a running application on the actual device, delivering operator-level inference timing alongside system-level metrics such as CPU load and memory usage. Together, these tools give you a complete picture of how your model will behave on the target hardware before and after deployment.

The static resource profiler currently supports TensorFlow Lite Micro (TFLM) models only. CNN accelerator models on the MAX78002 that use the PyTorch izer backend are not supported by the profiler or compatibility analyzer. The Open Profiling Report option in System Planner is disabled for izer models.

Static resource profiling

Static profiling analyzes a model file against a target hardware profile without requiring a physical board. It produces a five-section report covering model metadata, memory requirements, hardware performance estimates, per-layer breakdown, and optimization recommendations.

Run from System Planner

Open Embedded AI Tools

In your workspace, open System Planner and click the Embedded AI Tools tab.

Add or select a TFLM model

If no model is configured, click Add Model and complete the model configuration. See Embedded AI Tools for field descriptions.

Open the Profiling Report

Click Open Profiling Report to generate and view the resource profiling report. The report opens as an interactive graphical view with filtering, recommendations panels, and an expandable layer table.

Review optimization recommendations

If the profiler detects optimization opportunities, a banner appears at the top of the Summary section. Click View recommendations to open the side panel containing memory recommendations, layer optimization opportunities, and MACs optimization opportunities.

To reopen a previously generated report, select AI Tools > Open Report from the CFS Home Page. By default, reports are stored in the system temp directory; use Save As in VS Code to save the report file to a custom location.

Run from the CLI

Use cfsutil ai profile to generate a resource profiling report from the terminal. Provide --model, --soc, and --core at minimum.

cfsutil ai profile --soc <soc> --core <core> --model <path/to/model_file>

Flags

Flag	Short	Description
`--model`	`-m`	Path or URL to the model file.
`--soc`	`-s`	Target SoC (e.g. `MAX32690`, `MAX32657`, `ADSP-SC835`).
`--core`	`-c`	Target core (e.g. `CM4`, `CM33`, `FX`).
`--acc`	`-a`	Target accelerator (optional).
`--package`	`-p`	SoC package variant (optional).
`--format`		Console output format: `text` (default) or `json`.
`--report-file`		Path to write the output report file.
`--report-format`		Report file format: `json` (default) or `text`.
`--search-path`	`-x`	Additional search path for data models. Can be repeated.
`--ignore-cache`		Bypass cache and fetch the latest remote files.

Example commands

# Profile a TFLM model on MAX32690 Cortex-M4
cfsutil ai profile --soc MAX32690 --core CM4 --model model.tflite

# Profile for a specific SoC package variant
cfsutil ai profile --soc MAX32690 --core CM4 --package WLP --model model.tflite

# Output console results as JSON
cfsutil ai profile --soc MAX32690 --core CM4 --model model.tflite --format json

# Write a JSON report to file
cfsutil ai profile --soc MAX32690 --core CM4 --model model.tflite \
    --report-file reports/profile-max32690.json

# Write a human-readable text report to file
cfsutil ai profile --soc MAX32690 --core CM4 --model model.tflite \
    --report-file reports/profile-max32690.txt \
    --report-format text

# Profile a model on SHARC-FX
cfsutil ai profile --soc ADSP-SC835 --core FX --model model.tflite
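For scripted use, such as a CI job, the documented flags can be assembled programmatically. The following is a minimal Python sketch, not part of CodeFusion Studio. The helper names `build_profile_command` and `run_profile` are hypothetical, and the sketch assumes that `cfsutil` is on the PATH and that `--format json` prints a single JSON document to standard output.

```python
import json
import subprocess


def build_profile_command(soc, core, model, package=None, report_file=None):
    """Assemble a `cfsutil ai profile` argument list from documented flags."""
    cmd = ["cfsutil", "ai", "profile", "--soc", soc, "--core", core, "--model", model]
    if package:
        cmd += ["--package", package]
    if report_file:
        cmd += ["--report-file", report_file]
    return cmd


def run_profile(soc, core, model, **options):
    """Run the profiler and parse its console output as JSON.

    Assumption: with --format json, stdout holds exactly one JSON document.
    """
    cmd = build_profile_command(soc, core, model, **options) + ["--format", "json"]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


if __name__ == "__main__":
    # Print the command line only; nothing is executed here.
    print(" ".join(build_profile_command("MAX32690", "CM4", "model.tflite", package="WLP")))
```

Passing the arguments as a list avoids shell quoting problems when a model path contains spaces.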

Understanding the profiling report

The resource profiling report is organized into five sections that progressively drill down from high-level summaries to detailed layer-by-layer analysis.

1. Model summary

Provides a high-level overview of the analyzed model file before detailed profiling begins.

Field	Description
Model Name	Filename of the model under test.
Model Path	Path to the model file on disk.
Framework	ML framework type. Currently only `TensorFlow Lite` is supported.
Model Size	Memory required to store the model (in KB).
Data Type	Numerical precision defined in the model file (e.g. `float32`, `int8`).
Layer Count	Total number of layers parsed from the model.

Example output:

=== MODEL SUMMARY ===
| Metric      | Value                           |
|-------------+---------------------------------|
| Model Name  | hello_world_f32.tflite          |
| Model Path  | examples/hello_world_f32.tflite |
| Framework   | TensorFlow Lite                 |
| Model Size  | 1.25 KB                         |
| Data Type   | float32                         |
| Layer Count | 3                               |

2. Memory analysis

Shows peak runtime RAM requirements and compares them to the available RAM on the target hardware.

Field	Description
Peak RAM Required	Maximum RAM usage during model execution.
RAM Status	Whether peak RAM usage fits the target hardware constraints (`OK` or overflow).
Available RAM	Total RAM available on the target hardware, from the hardware profile.
RAM Utilization	Percentage of available RAM consumed. Formula: (Peak RAM ÷ Available RAM) × 100.

If the profiler detects over-usage, a Memory Issues subsection lists the specific problems (e.g. Peak RAM usage (5880.0 KB) exceeds available RAM (1024.0 KB)). A Memory Recommendations subsection follows with context-specific mitigations.

Example output (healthy model):

=== MEMORY ANALYSIS ===
| Memory Metric     | Value               |
|-------------------+---------------------|
| Peak RAM Required | 0.12 KB (0.00 MB)   |
| RAM Status        | OK                  |
| Available RAM     | 284.00 KB (0.28 MB) |
| RAM Utilization   | 0.0%                |
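The RAM Utilization formula can be checked by hand. The sketch below applies it to the two cases quoted in this section: the healthy example output and the over-usage message. The function names and the literal "overflow" status label are illustrative assumptions, not the profiler's own code.

```python
def ram_utilization_percent(peak_ram_kb, available_ram_kb):
    """(Peak RAM / Available RAM) x 100, as given in the field table."""
    if available_ram_kb <= 0:
        raise ValueError("available RAM must be positive")
    return peak_ram_kb / available_ram_kb * 100.0


def ram_status(peak_ram_kb, available_ram_kb):
    """Return "OK" when peak usage fits, otherwise "overflow" (illustrative label)."""
    return "OK" if peak_ram_kb <= available_ram_kb else "overflow"


# Healthy model from the example output: 0.12 KB peak, 284 KB available.
healthy = ram_utilization_percent(0.12, 284.0)
print(f"{healthy:.1f}% {ram_status(0.12, 284.0)}")  # 0.0% OK

# Over-usage case from the Memory Issues example: 5880 KB peak, 1024 KB available.
over = ram_utilization_percent(5880.0, 1024.0)
print(f"{over:.1f}% {ram_status(5880.0, 1024.0)}")  # 574.2% overflow
```

The healthy model's utilization is about 0.04%, which is why the report rounds it to 0.0%.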

3. Hardware performance

Aggregates per-layer metrics into an overall performance estimate for the target hardware.

Field	Description
Total Cycles	Total compute cycles for full model execution.
Estimated Latency	End-to-end inference time in milliseconds, derived from the total cycle count and the maximum CPU clock frequency.
Peak Memory	Maximum RAM required during inference.
Accelerated Layers	Number of layers executed using hardware accelerators (DSP, NPU).
CPU-Only Layers	Number of layers that must run on the CPU without hardware acceleration.

Example output:

=== HARDWARE PERFORMANCE ===
| Metric             | Value     |
|--------------------+-----------|
| Total Cycles       | 576       |
| Estimated Latency  | 0.00 ms   |
| Peak Memory        | 0.12 KB   |
| Accelerated Layers | 0         |
| CPU-Only Layers    | 3         |
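Estimated Latency is derived from the cycle count and the clock frequency, so it can be approximated as cycles divided by frequency. The sketch below uses an illustrative 120 MHz clock. That value is an assumption; the profiler takes the real frequency from the target's hardware profile.

```python
def estimated_latency_ms(total_cycles, clock_hz):
    """Convert a cycle count to milliseconds at a given core clock."""
    if clock_hz <= 0:
        raise ValueError("clock frequency must be positive")
    return total_cycles / clock_hz * 1000.0


# Assumption: a 120 MHz core clock, chosen only for illustration.
CLOCK_HZ = 120_000_000

latency = estimated_latency_ms(576, CLOCK_HZ)
print(f"{latency:.4f} ms")  # 0.0048 ms, shown as 0.00 ms at two decimals
```

For very small models the latency falls below the report's two-decimal precision, so Total Cycles is the more useful figure to compare.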

4. Per-layer performance table

A detailed breakdown of compute and memory requirements for each layer. Use this table to identify specific performance bottlenecks.

Field	Description
Layer	Index of the layer in the model (e.g. `0`, `1`, `2`).
Operator	Operator type used in this layer (e.g. `CONV_2D`, `ADD`, `FULLY_CONNECTED`).
Cycles	Total compute cycles to execute the layer.
Latency	Estimated runtime of the layer in milliseconds.
Energy	Estimated energy consumption in microjoules (µJ).
MACs	Number of multiply–accumulate operations.
Memory	Memory footprint of the layer in KB.
Accel	Whether the layer runs on hardware acceleration (`Yes`) or CPU only (`No`).

Example output:

=== PER-LAYER PERFORMANCE ===
| Layer | Operator        | Cycles | Latency (ms) | Energy (uJ) | MACs | Memory (KB) | Accel |
|-------+-----------------+--------+--------------+-------------+------+-------------+-------|
| 0     | FULLY_CONNECTED |     32 |         0.00 |        0.24 |   16 |        0.12 |  No   |
| 1     | FULLY_CONNECTED |    512 |         0.00 |        3.84 |  256 |        1.06 |  No   |
| 2     | FULLY_CONNECTED |     32 |         0.00 |        0.24 |   16 |        0.07 |  No   |
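When working from the text report rather than the graphical view, the table can be parsed with a few lines of Python to find the dominant layer. This is a minimal sketch that assumes the pipe-delimited layout shown above; `parse_layer_table` is a hypothetical helper, not part of cfsutil.

```python
REPORT = """\
| Layer | Operator        | Cycles | Latency (ms) | Energy (uJ) | MACs | Memory (KB) | Accel |
|-------+-----------------+--------+--------------+-------------+------+-------------+-------|
| 0     | FULLY_CONNECTED |     32 |         0.00 |        0.24 |   16 |        0.12 |  No   |
| 1     | FULLY_CONNECTED |    512 |         0.00 |        3.84 |  256 |        1.06 |  No   |
| 2     | FULLY_CONNECTED |     32 |         0.00 |        0.24 |   16 |        0.07 |  No   |
"""


def parse_layer_table(text):
    """Parse the pipe-delimited per-layer table into a list of dicts."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip().startswith("|")]
    header = [cell.strip() for cell in lines[0].strip("|").split("|")]
    rows = []
    for line in lines[1:]:
        if set(line) <= set("|-+"):  # skip the separator row
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        rows.append(dict(zip(header, cells)))
    return rows


layers = parse_layer_table(REPORT)
total_cycles = sum(int(r["Cycles"]) for r in layers)
total_macs = sum(int(r["MACs"]) for r in layers)
slowest = max(layers, key=lambda r: int(r["Cycles"]))
share = int(slowest["Cycles"]) / total_cycles * 100

print(total_cycles, total_macs)  # 576 288
print(f"layer {slowest['Layer']} uses {share:.1f}% of cycles")  # layer 1 uses 88.9% of cycles
```

The per-layer sums match the Total Cycles value (576) in the hardware performance section and the Total MACs value (288) in the optimization section.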

In the System Planner graphical report, click the chevron icon to expand a row. The table also supports SQL-like queries, for example:

-- Top 10 slowest layers
SELECT ID, name, latency ORDER BY latency DESC LIMIT 10

-- Layers with more than 1 million MACs
SELECT * WHERE MACs > 1000000

-- All CONV_2D layers
SELECT * WHERE name LIKE '%CONV_2D%'

5. Optimization opportunities

Reports baseline totals and highlights the specific layers most likely to benefit from optimization, grouped into memory (parameter size) and compute (MACs) categories.

Summary baseline metrics:

Field	Description
Total Parameter Memory	Size of all model weights in KB.
Total MACs	Total multiply–accumulate operations for the model.

Layerwise memory opportunities flag layers with high parameter memory and suggest strategies such as depthwise separable convolution or low-rank factorization.

Layerwise MAC opportunities flag layers with high MACs and suggest strategies such as replacing with depthwise convolution or using sparse matrices.

Example output:

=== OPTIMIZATION OPPORTUNITIES ===
| Metric                 | Value   |
|------------------------+---------|
| Total Parameter Memory | 1.25 KB |
| Total MACs             | 288     |

=== Layerwise Memory Optimization Opportunities ===
| Layer | Op Type         | Param Mem (KB) | MACs | Suggestion                             |
|-------+-----------------+----------------+------+----------------------------------------|
| 1     | FULLY_CONNECTED |           1.06 |  256 | Quantize weights and consider sparsity |

The resource profiler logic is located in profile_resources.py in the cfs-ai/packages directory of the CodeFusion Studio repository. This script can be adapted or extended for custom use cases.

Runtime hardware profiling

Runtime profiling captures live trace data from a model running on actual hardware. It is powered by the Zephelin middleware layer developed by Antmicro for Analog Devices. Zephelin extends Zephyr RTOS with advanced tracing capabilities including operator-level AI inference analysis, task switch events, interrupts, and timing information.

The Zephelin profiling feature is currently in beta. Some options or trace formats may change in future releases.

Configure profiling options in System Planner

Open the Profiling tab

In System Planner, click the Profiling tab to see a list of cores in your workspace.

Enable Zephelin Profiler

Turn on Enable Zephelin Profiler for each core you want to profile.

Configure profiling options

Expand the core entry and configure the available options:

Application Callgraph — Enables instrumentation for capturing function call graphs and application-level tracing.
AI Model Profiling — Enables TFLM inference tracing. Only available when AI models are configured in Embedded AI Tools and the target processor supports it.
CPU Load — Samples CPU usage at a configurable interval (in milliseconds).
Memory Usage — Samples memory consumption at a configurable interval (in milliseconds).

Configure interface options

Set the trace transport interface:

Trace Interface Type — Select UART (USB is planned for a future release).
Trace Interface — Select the UART port number for trace output (e.g. 0 for UART0, 2 for UART2). Available options depend on UART peripherals allocated in Peripheral Allocation.
Baud Rate — Displays the configured baud rate for the trace interface (typically 115200).
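The baud rate caps how much trace data the device can send per second, which matters when several profiling options are enabled at once. The sketch below estimates that ceiling. It assumes 8N1 framing (one start bit, eight data bits, one stop bit), which this page does not state, so treat the result as a rough upper bound.

```python
def uart_bytes_per_second(baud_rate, data_bits=8, parity_bits=0, stop_bits=1):
    """Upper bound on payload bytes per second for an asynchronous UART link."""
    bits_per_frame = 1 + data_bits + parity_bits + stop_bits  # 1 start bit
    return baud_rate / bits_per_frame


rate = uart_bytes_per_second(115200)
print(f"{rate:.0f} bytes/s")  # 11520 bytes/s with 8N1 framing
```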

Generate code and build

Click Generate Code in System Planner (enable Generate AI Models if your project includes an AI model), then run the CFS: build and CFS: flash tasks to compile and program the board.

Enabling Zephelin profiling options increases application memory usage. If the build fails with a linker error such as section 'XXXXX' will not fit in region 'XXXXX', the device may not have sufficient memory for all enabled profiling options. Try disabling some options (for example, turn off Memory Usage or CPU Load) to reduce overhead.

Capture a profiling trace

Open the Trace Capture panel

On the CFS Home Page, expand the TRACE CAPTURE section. If you see a Setup required or Source not available message, click Configure capture to open the Trace Configuration view.

Configure the trace source

In the Trace Configuration view, set:

Trace Interface Type — UART.
Serial Port — The serial port connected to your board’s UART trace output (e.g. /dev/ttyUSB0 on Linux, /dev/tty.usbserial-* on macOS, COM3 on Windows).
Baud Rate — Must match the firmware configuration (default: 115200).
Output Directory — Where trace files are saved (default: <core>/tracefiles).
ELF File (optional) — Path to the application ELF binary, used to symbolize trace data (default: <core>/build/zephyr/zephyr.elf). Without this, function names may not appear in the trace viewer.
Build Directory (optional) — Path to the build output directory for debug symbol lookup.

Start capture

Return to the TRACE CAPTURE panel and click Start Capture.

Close any serial monitor or terminal before starting capture. The trace capture requires exclusive access to the UART port. If another process is using the same port, the capture will fail.

Reset the board

Press the Reset button on your development board to restart the application and begin trace data transmission. The .ctf file will appear in the output directory once the device begins transmitting. For applications that run once and exit (such as the AI profiling examples), you may need to press Reset multiple times to capture additional trace data.

Stop capture

Click Stop Capture when you have collected sufficient data. The capture generates timestamped trace files using the naming pattern tracefile_YYYYMMDD_HHMMSS. Each board reset during a single capture session produces a separate pair of files:

.ctf file — Binary trace in Common Trace Format, optimized for efficient UART transmission.
.tef file — JSON-based Trace Event Format file for visualization in the Zephelin Trace Viewer.
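Because every board reset produces its own pair of files, a small script can help group captures by timestamp. The sketch below assumes the extension is appended directly to the documented pattern (for example `tracefile_20250101_093000.ctf`). The file names and helper functions are hypothetical.

```python
from datetime import datetime
from pathlib import PurePath


def parse_trace_name(filename):
    """Return (timestamp, extension) for names like tracefile_20250101_093000.ctf."""
    path = PurePath(filename)
    stamp = datetime.strptime(path.stem, "tracefile_%Y%m%d_%H%M%S")
    return stamp, path.suffix


def pair_traces(filenames):
    """Group .ctf/.tef files by timestamp, oldest first."""
    sessions = {}
    for name in filenames:
        stamp, suffix = parse_trace_name(name)
        sessions.setdefault(stamp, {})[suffix] = name
    return [sessions[stamp] for stamp in sorted(sessions)]


# Hypothetical file names following the documented pattern.
files = [
    "tracefile_20250102_101500.tef",
    "tracefile_20250101_093000.ctf",
    "tracefile_20250101_093000.tef",
    "tracefile_20250102_101500.ctf",
]
for pair in pair_traces(files):
    print(pair[".ctf"], pair[".tef"])
```

A name that does not follow the pattern raises `ValueError` from `strptime`, which flags stray files in the output directory.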

For a terminal-based trace capture workflow, use cfsutil tasks run with the --capture and --port flags:

cfsutil tasks run flash_run_JLink --capture --port /dev/ttyUSB0 --project m4

The CLI automatically detects CTF tracing and generates TEF files when you stop capture with Ctrl+C.

AI Hardware Profiling view

For workspaces created from an AI model, the AI Hardware Profiling view (.cfs/ai.cfsaiprof file) provides a guided interface for deploying the model to hardware and capturing profiling data. It exposes the same trace capture workflow but adds deployment-specific status indicators and hardware configuration options. The view includes:

Building / Built / Error status indicator for the compilation state.
Undeployed → Deploying → Running → Stopped → Error status indicator for the deployment and trace capture state.
Host USB port designation dropdown — select the serial port for UART trace output.
Run with dropdown — select the debug probe type (J-Link or CMSIS-DAP) used to flash the application.
Run and Stop buttons to control deployment and trace capture.

If you closed the view and need to reopen it: in the VS Code Explorer, navigate to the .cfs folder in your workspace and open the ai.cfsaiprof file.

Visualize trace data with the Zephelin Trace Viewer

The Zephelin Trace Viewer VS Code extension is automatically installed as a dependency of the CodeFusion Studio extension. You can open .tef trace files in two ways:

Immediately after capture — When capture completes, a notification displays Traces captured successfully with a list of generated files. If multiple files were produced, click Choose a file to open to select which to view.
From Explorer — In the VS Code Explorer, locate any .tef file and click it to open in the trace viewer.

For a better viewing experience, right-click the trace viewer tab and select Move to New Window to get a full-screen view.

The trace viewer displays profiling data based on the options you enabled in the Profiling Configuration:

CPU Load

Shows CPU usage sampled at the configured interval. Visualizes how much CPU time is consumed by the AI inference relative to other application tasks.

AI Model Profiling

Displays per-operator inference timing for TensorFlow Lite Micro models. Enables detailed layer-level performance analysis of the live inference run.

Memory Usage

Shows per-thread memory consumption with multiple visualization types, sampled at the configured interval.

For advanced trace configuration options — such as memory profiling scopes and additional tracing features — see the Zephelin documentation.

Next steps

Check Compatibility First

Run the Compatibility Analyzer to validate operator support and memory constraints before profiling.

Create an AI Workspace

Set up a workspace pre-configured for your model and target device using the GUI wizard or CLI.
