
Real-time performance is a first-class requirement for any VIO system intended for robotic deployment. ov_eval includes a set of timing tools that let you measure how long each processing stage takes, visualise the distribution of execution times, compare mono vs. stereo configurations, and monitor CPU and memory load over a full run — all without modifying the core estimator code.

How timing data is recorded

Timing information is written directly by ov_msckf::VioManager. The output is a comma-separated (CSV) file where:
  • The first column is the wall-clock timestamp (seconds).
  • The middle columns contain per-component durations (seconds), with component names taken from the CSV header row.
  • The last column is the total processing time for that frame.
The components timed in a standard OpenVINS run are:
  • tracking: Feature detection and optical-flow tracking
  • propagation: IMU state and covariance propagation between camera frames
  • msckf update: MSCKF feature measurement update
  • slam update: SLAM-feature EKF update (if SLAM features are enabled)
  • slam delayed: Delayed initialisation of new SLAM features
  • marginalization: Sliding-window marginalization step
  • total: Sum of all components for the frame
You can add or remove columns to match a modified estimator — the timing tools read component names from the CSV header at runtime.
Timing output is enabled via the record_timing_information option in VioManagerOptions. Set it to true in your parameter YAML before launching to start generating a timing file.
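In the parameter YAML this might look like the following (the exact key for the output location is an assumption here; verify both option names against the VioManagerOptions of your OpenVINS version):

```yaml
# Record per-frame timing statistics to disk
record_timing_information: true
# Assumed option name for where the timing file is written
record_timing_filepath: "/tmp/ov_timing.txt"
```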

Recording CPU and memory usage

Execution time alone does not capture the full computational load. To measure percent CPU usage, percent memory, and thread count in real time, use the pid_ros.py Python script with the psutil library.

Installing psutil

pip install psutil

Adding the recorder to your launch file

<!-- Monitor a single OpenVINS node -->
<node name="recorder_timing" pkg="ov_eval" type="pid_ros.py" output="screen">
    <param name="nodes"   type="str" value="/run_subscribe_msckf" />
    <param name="output"  type="str" value="/tmp/psutil_log.txt" />
</node>
To monitor a multi-node system (e.g., VINS-Mono), pass a comma-separated list of node names:
<node name="recorder_timing" pkg="ov_eval" type="pid_ros.py" output="screen">
    <param name="nodes"   type="str" value="/feature_tracker,/vins_estimator,/pose_graph" />
    <param name="output"  type="str" value="/tmp/psutil_log.txt" />
</node>
The script polls each node at 1 Hz and writes a header plus one row per second to the output file. The header format is:
# timestamp(s) summed_cpu_perc summed_mem_perc summed_threads <node>_cpu_perc <node>_mem_perc <node>_threads ...
CPU percentage reported by psutil uses the convention where 100% = one full CPU thread. A value of 200% means the process is consuming the equivalent of two fully loaded threads. This is standard top/htop behaviour. Do not compare CPU percentages across machines with different CPU counts.

Analyzing timing output

timing_flamegraph

The flame-graph script produces a stacked breakdown of per-component times that makes it immediately obvious which component dominates.
rosrun ov_eval timing_flamegraph timing_mono_ethV101.txt
Example output:
[TIME]: loaded 2817 timestamps from file (7 categories)!!
mean_time = 0.0037 | std = 0.0011 | 99th = 0.0063 | max = 0.0254 (tracking)
mean_time = 0.0004 | std = 0.0001 | 99th = 0.0006 | max = 0.0014 (propagation)
mean_time = 0.0032 | std = 0.0022 | 99th = 0.0083 | max = 0.0309 (msckf update)
mean_time = 0.0034 | std = 0.0013 | 99th = 0.0063 | max = 0.0099 (slam update)
mean_time = 0.0012 | std = 0.0017 | 99th = 0.0052 | max = 0.0141 (slam delayed)
mean_time = 0.0009 | std = 0.0003 | 99th = 0.0015 | max = 0.0027 (marginalization)
mean_time = 0.0128 | std = 0.0035 | 99th = 0.0208 | max = 0.0403 (total)
The 99th percentile column is especially useful for real-time analysis: it is the time exceeded by only 1% of frames, which is the relevant bound for a robot running at a fixed camera rate. An optional second argument subsamples the data before plotting (default: every 10th frame):
rosrun ov_eval timing_flamegraph timing_mono_ethV101.txt 5
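The summary statistics printed above are easy to reproduce for your own analysis. A sketch using only the standard library (the 99th percentile here is nearest-rank, which may differ marginally from the interpolation the ov_eval script uses):

```python
import statistics

def timing_stats(times):
    """Compute mean, sample std, nearest-rank 99th percentile, and max
    of a list of per-frame execution times."""
    ordered = sorted(times)
    rank = max(0, round(0.99 * len(ordered)) - 1)
    return {
        "mean": statistics.mean(ordered),
        "std": statistics.stdev(ordered),
        "99th": ordered[rank],
        "max": ordered[-1],
    }
```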

timing_histogram

The histogram script bins execution times for each component to reveal the shape of the distribution — whether times are tightly clustered or have a heavy tail.
rosrun ov_eval timing_histogram timing_mono_ethV101.txt 75
The second argument is the number of histogram bins. A higher bin count gives finer resolution but requires more data to be statistically meaningful.
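The binning itself is simple to replicate. A pure-Python sketch with equal-width bins (the actual script plots with matplotlib; equal-width binning over [min, max] is an assumption about its behaviour):

```python
def histogram(times, num_bins):
    """Bin execution times into equal-width bins over [min(times), max(times)].

    Returns (counts, bin_edges). Values equal to the maximum fall in the
    last bin.
    """
    lo, hi = min(times), max(times)
    width = (hi - lo) / num_bins or 1.0  # guard against all-equal input
    counts = [0] * num_bins
    for t in times:
        idx = min(int((t - lo) / width), num_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(num_bins + 1)]
    return counts, edges
```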

timing_comparison

Use this script to compare timing files side-by-side — for example, to quantify the overhead of stereo tracking vs. monocular:
rosrun ov_eval timing_comparison timing_mono_ethV101.txt timing_stereo_ethV101.txt
Example output:
======================================
[TIME]: loading data for timing_mono
mean_time = 0.0037 | std = 0.0011 | 99th = 0.0063 | max = 0.0254 (tracking)
mean_time = 0.0128 | std = 0.0035 | 99th = 0.0208 | max = 0.0403 (total)
======================================
======================================
[TIME]: loading data for timing_stereo
mean_time = 0.0077 | std = 0.0020 | 99th = 0.0124 | max = 0.0219 (tracking)
mean_time = 0.0263 | std = 0.0063 | 99th = 0.0410 | max = 0.0979 (total)
======================================
In this example, stereo tracking takes roughly twice as long as monocular (7.7 ms vs. 3.7 ms mean), and the total processing time increases from 12.8 ms to 26.3 ms — well within a 10 Hz budget but relevant for 30 Hz operation.

timing_percentages

Processes psutil log folders to summarise CPU and memory usage across multiple runs:
rosrun ov_eval timing_percentages psutil_logs/
The expected folder structure mirrors the algorithm/dataset structure used by the error evaluation tools:
psutil_logs/
    ov_mono/
        run1.txt
        run2.txt
        run3.txt
    ov_stereo/
        run1.txt
        run2.txt
        run3.txt
Example output:
======================================
[COMP]: processing ov_mono algorithm
    loaded 149 timestamps from file!!
    PREC: mean_cpu = 83.858 +- 17.130
    PREC: mean_mem = 0.266 +- 0.019
    THR:  mean_threads = 12.893 +- 0.924
======================================
[COMP]: processing ov_stereo algorithm
    loaded 148 timestamps from file!!
    PREC: mean_cpu = 111.007 +- 16.519
    PREC: mean_mem = 5.139 +- 2.889
    THR:  mean_threads = 12.905 +- 0.943
======================================
The monocular run uses ~84% of one CPU thread on average, while stereo uses ~111% — just over one full thread. Memory usage increases significantly with stereo because additional feature maps and descriptor buffers must be maintained.

Performance expectations and bottlenecks

Based on the timing data from a standard EuRoC run on a modern laptop:
Optical-flow tracking over a full image is O(N · W²) in the number of features N and patch width W. For 200 features with a 15×15 patch, tracking typically takes 3–8 ms per frame. Stereo roughly doubles this. Reducing the maximum feature count or enabling image downsampling are the most effective ways to cut tracking time.
The MSCKF update cost scales with the number of features that are marginalized at each step and the size of the sliding window. With a window of 10 poses and 50 features, the update is typically 3–8 ms. Reducing the window size or the number of SLAM features directly reduces this cost.
IMU propagation is a fixed-cost O(1) operation per IMU measurement and typically takes under 0.5 ms regardless of state size. It is rarely a bottleneck.
Marginalization involves Schur-complement operations whose cost grows with the number of features and clone states. On datasets with long corridors or slow motion (many features per frame), marginalization times can spike to 5–10 ms in the worst case.
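The O(N · W²) tracking model above makes it easy to reason about parameter changes before re-running. A back-of-envelope sketch (the returned numbers are unitless; only ratios between configurations are meaningful):

```python
def relative_tracking_cost(num_features, patch_width, stereo=False):
    """Relative optical-flow cost under the O(N * W^2) model; stereo is
    modelled as exactly double the mono cost."""
    cost = num_features * patch_width ** 2
    return 2 * cost if stereo else cost

mono = relative_tracking_cost(200, 15)             # baseline mono config
halved = relative_tracking_cost(100, 15)           # half the features
stereo = relative_tracking_cost(200, 15, stereo=True)
```

Halving the feature count halves the predicted tracking cost, matching the advice below to reduce the feature budget first.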

Tips for improving real-time performance

1. Profile first
Run timing_flamegraph to identify which component accounts for the majority of the total time. Only optimise the bottleneck: improving a component that takes 0.4 ms will have negligible effect on a 12 ms total budget.
2. Reduce the feature count
Lower the number of tracked features (num_pts) or shrink the optical-flow window (klt_win_size) to reduce tracking cost. Fewer features also speed up the MSCKF update.
3. Shorten the clone window
A shorter sliding window reduces both the MSCKF update dimension and the marginalization cost, at the expense of slightly reduced accuracy on fast-motion sequences.
4. Disable inline plotting
In production deployments, set DISABLE_MATPLOTLIB=ON at build time to avoid any overhead from the matplotlib-cpp bridge (even when not actively plotting).
5. Compare across hardware with caution
CPU percentage from pid_ros.py is machine-dependent. Always compare methods on the same hardware under the same load conditions. Cross-machine comparisons should use raw execution-time files (timing_flamegraph) rather than psutil percentages.
