
Evaluating a visual-inertial odometry (VIO) system fairly requires more than a single number. Different metrics expose different failure modes: a method may have low global error but high local drift, or may be accurate on average while producing over-confident covariance estimates. This page defines each metric used by ov_eval, explains when to apply it, and shows how to compute it from the command line. We recommend reading Zhang and Scaramuzza (2018) for a thorough treatment of trajectory evaluation methodology.

Metric comparison

Metric | Requires covariance | Sensitive to outliers | Captures drift | Best for
------ | ------------------- | --------------------- | -------------- | --------
ATE    | No                  | Yes                   | No             | Overall global accuracy
RPE    | No                  | Less so               | Yes            | Local / incremental drift
RMSE   | No                  | Yes                   | Partial        | Per-timestep debugging
NEES   | Yes                 | No                    | No             | Estimator consistency

Absolute Trajectory Error (ATE)

The Absolute Trajectory Error measures the global accuracy of an estimated trajectory by computing the difference between it and the ground truth after optimal alignment. The alignment step finds the best rigid-body (or similarity) transform that minimises the total error, after which the residual is averaged across all timesteps and runs. For $N$ independent runs of the same algorithm, each producing $K$ pose measurements, and an aligned estimated trajectory $\hat{\mathbf{x}}^+$:

$$e_{\text{ATE}} = \frac{1}{N} \sum_{i=1}^{N} \sqrt{ \frac{1}{K} \sum_{k=1}^{K} \left\| \mathbf{x}_{k,i} \boxminus \hat{\mathbf{x}}^+_{k,i} \right\|^2_2 }$$

The $\boxminus$ operator denotes the manifold-aware pose difference (geodesic error on $SO(3) \times \mathbb{R}^3$).

When to use ATE:
  • Comparing the final global accuracy of multiple algorithms on the same dataset.
  • Benchmarking against published results, where ATE is widely reported.
Limitation: Because ATE aligns the full trajectory before computing error, a single large outlier (e.g., from a tracking failure) can distort the alignment and inflate the apparent error of an otherwise good run. ATE is also less informative about where in the trajectory the estimation degrades.
Some datasets, such as UZH-FPV, provide only intermittent ground truth. For these, only ATE is a valid metric because RPE requires densely sampled ground truth to compute segment-level errors reliably.
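The align-then-average recipe can be sketched for the position component with NumPy. This is a simplified illustration, not ov_eval's implementation: it performs a full rigid (SE(3)) alignment via the Kabsch/Umeyama method, whereas the `posyaw` mode shown below aligns only position and yaw, and the rotation error term is omitted.

```python
import numpy as np

def align_rigid(est, gt):
    """Least-squares rigid alignment (Kabsch/Umeyama, no scale).

    Finds R, t minimising sum ||gt_i - (R @ est_i + t)||^2 and returns
    the aligned copy of `est`. Both inputs are (K, 3) position arrays.
    """
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    H = (est - mu_e).T @ (gt - mu_g)            # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(U @ Vt))          # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_g - R @ mu_e
    return est @ R.T + t

def ate_position(est, gt):
    """Position RMSE after optimal rigid alignment (single run)."""
    err = gt - align_rigid(est, gt)
    return np.sqrt(np.mean(np.sum(err ** 2, axis=1)))
```

For $N$ runs, the per-run RMSE values are then averaged as in the $e_{\text{ATE}}$ formula above. Note that any constant offset or rotation of the whole trajectory is absorbed by the alignment and contributes nothing to ATE.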

Computing ATE with ov_eval

rosrun ov_eval error_singlerun posyaw truths/V1_01_easy.txt estimate.txt
Example console output:
======================================
Absolute Trajectory Error
======================================
rmse_ori = 0.677 | rmse_pos = 0.055
mean_ori = 0.606 | mean_pos = 0.051
min_ori  = 0.044 | min_pos  = 0.001
max_ori  = 1.856 | max_pos  = 0.121
std_ori  = 0.302 | std_pos  = 0.021
Orientation errors are in degrees; position errors are in metres.

Relative Pose Error (RPE)

The Relative Pose Error evaluates drift over short, fixed-length sub-segments of the trajectory. Rather than aligning the full trajectory, RPE measures how well the relative motion predicted by the estimator matches the relative motion of the ground truth over the same window. We define a set of segment lengths $\mathcal{D} = [d_1, d_2, \dots, d_V]$ and split the trajectory into overlapping segments of each length. For segments of length $d_i$, yielding $D_i$ segment pairs:

$$\tilde{\mathbf{x}}_{r} = \mathbf{x}_{k} \boxminus \mathbf{x}_{k+d_i}$$

$$e_{\text{rpe}, d_i} = \frac{1}{D_i} \sum_{k=1}^{D_i} \left\| \tilde{\mathbf{x}}_{r} \boxminus \hat{\tilde{\mathbf{x}}}_{r} \right\|_2$$

When to use RPE:
  • Measuring how quickly the estimator drifts as the trajectory length increases.
  • Comparing methods on datasets where alignment is ambiguous or unavailable.
  • Publishing results in papers — RPE over multiple segment lengths gives reviewers more information than ATE alone.
Key advantage over ATE: RPE is less sensitive to isolated tracking failures or sudden jumps in estimation error because it samples the trajectory over many shorter windows. A single bad segment affects only the samples that include it, rather than corrupting the global alignment.
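A minimal position-only sketch of the idea, using a fixed index gap rather than a metric segment length for brevity (the full metric also compares the relative rotations with a manifold difference):

```python
import numpy as np

def rpe_position(est, gt, delta):
    """Mean relative position error over all windows of `delta` samples.

    Compares the displacement the estimator predicts over each window
    against the ground-truth displacement over the same window.
    """
    rel_gt = gt[delta:] - gt[:-delta]     # ground-truth displacement
    rel_est = est[delta:] - est[:-delta]  # estimated displacement
    return np.mean(np.linalg.norm(rel_gt - rel_est, axis=1))
```

Note that an estimate offset from the ground truth by a constant bias has zero RPE, which is exactly why the metric isolates incremental drift from global misalignment.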

Default segment lengths

The error_singlerun binary uses segment lengths of 8, 16, 24, 32, and 40 metres by default. To use different lengths, edit the segments vector in error_singlerun.cpp:134 and recompile.
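Because the default lengths are specified in metres, segment endpoints must be found by traveled distance rather than sample index. A possible sketch of that pairing step (an illustration, not the ov_eval implementation):

```python
import numpy as np

def segment_pairs(gt, seg_len):
    """Index pairs (k, j) such that the traveled ground-truth distance
    from pose k to pose j first reaches `seg_len` metres.

    `gt` is a (K, 3) array of ground-truth positions; one pair is
    produced per starting index that has enough trajectory left.
    """
    step = np.linalg.norm(np.diff(gt, axis=0), axis=1)
    dist = np.concatenate([[0.0], np.cumsum(step)])  # arc length per pose
    pairs = []
    for k in range(len(gt)):
        j = np.searchsorted(dist, dist[k] + seg_len)
        if j < len(gt):
            pairs.append((k, j))
    return pairs
```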

Computing RPE with ov_eval

rosrun ov_eval error_singlerun posyaw truths/V1_01_easy.txt estimate.txt
Example console output (RPE section):
======================================
Relative Pose Error
======================================
seg 8  - median_ori = 0.566 | median_pos = 0.068 (2484 samples)
seg 16 - median_ori = 0.791 | median_pos = 0.060 (2280 samples)
seg 24 - median_ori = 0.736 | median_pos = 0.070 (2108 samples)
seg 32 - median_ori = 0.715 | median_pos = 0.071 (1943 samples)
seg 40 - median_ori = 0.540 | median_pos = 0.063 (1792 samples)
The sample count decreases for longer segments because fewer windows of that length fit within the trajectory as the segment length approaches the total trajectory length.

Root Mean Squared Error (RMSE)

RMSE is a per-timestep metric most useful when evaluating a single dataset. Instead of collapsing accuracy into a scalar, RMSE plots reveal exactly when during the trajectory the estimator struggles, making it a powerful debugging tool. For $N$ runs of the same algorithm, the RMSE at timestep $k$ is:

$$e_{\text{rmse}, k} = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} \left\| \mathbf{x}_{k,i} \boxminus \hat{\mathbf{x}}_{k,i} \right\|^2_2 }$$

When $N = 1$ (a single run), RMSE reduces to the instantaneous pose error at each timestep.

When to use RMSE:
  • Diagnosing where a run degrades (e.g., aggressive motion, poor lighting, long corridors).
  • Verifying that a bug fix improves error consistently across the trajectory, not just on average.
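The per-timestep formula is straightforward to evaluate for the position component. A sketch assuming the runs are already expressed in the ground-truth frame:

```python
import numpy as np

def rmse_per_timestep(runs, gt):
    """Position RMSE at each timestep across N runs.

    `runs` is an (N, K, 3) array of estimated positions and `gt` is
    (K, 3). Returns a length-K array; plotting it reveals *when* the
    estimator struggles. With N = 1 this is the instantaneous error.
    """
    sq_err = np.sum((runs - gt[None, :, :]) ** 2, axis=2)  # (N, K)
    return np.sqrt(np.mean(sq_err, axis=0))
```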

Computing RMSE with ov_eval

rosrun ov_eval error_dataset posyaw truths/V1_01_easy.txt algorithms/
Example output (RMSE lines):
    RMSE: mean_ori = 0.644 | mean_pos = 0.056
    RMSE 2D: mean_ori = 0.516 | mean_pos = 0.052
The 2D variant projects positions onto the horizontal plane, which is useful for ground-vehicle datasets where vertical accuracy is not a concern.
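The horizontal projection amounts to dropping the vertical component before taking the error norm, as in this small sketch (the axis assignment assumes z is the vertical axis):

```python
import numpy as np

def rmse_pos_2d(est, gt):
    """Horizontal-plane position RMSE: discard z before the norm."""
    err = (gt - est)[:, :2]
    return np.sqrt(np.mean(np.sum(err ** 2, axis=1)))
```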

Normalized Estimation Error Squared (NEES)

NEES is the standard metric for assessing estimator consistency: does the filter’s self-reported uncertainty actually match the observed errors? An estimator that reports a small covariance while making large errors is called over-confident and is inconsistent. For $N$ runs at timestep $k$, NEES is:

$$e_{\text{nees}, k} = \frac{1}{N} \sum_{i=1}^{N} \left( \mathbf{x}_{k,i} \boxminus \hat{\mathbf{x}}_{k,i} \right)^\top \mathbf{P}^{-1}_{k,i} \left( \mathbf{x}_{k,i} \boxminus \hat{\mathbf{x}}_{k,i} \right)$$

where $\mathbf{P}_{k,i}$ is the estimator’s covariance at that timestep.

Interpretation: For a consistent estimator, the expected NEES equals the degrees of freedom of the state variable. For 3-D position or 3-D orientation (each 3-DOF), a well-calibrated filter should produce a NEES of approximately 3 at every timestep. Values significantly above 3 indicate over-confidence; values below 3 indicate under-confidence (the filter is being too conservative).
NEES requires that your estimator output includes covariance information. ov_eval reads covariance from the PoseWithCovarianceStamped message. If you record from a PoseStamped or plain Odometry topic without covariance, the NEES calculation will be skipped.
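The quadratic form is a few lines of NumPy. The sketch below evaluates NEES at a single timestep across runs and is an illustration of the formula rather than ov_eval's code:

```python
import numpy as np

def nees_at_timestep(errors, covs):
    """Average NEES over N runs at one timestep.

    `errors` is (N, d) of pose differences and `covs` is (N, d, d) of
    the estimator's reported covariances. For a consistent estimator
    the result should hover around d (e.g. 3 for 3-D position).
    """
    P_inv = np.linalg.inv(covs)                          # (N, d, d)
    nees = np.einsum('ni,nij,nj->n', errors, P_inv, errors)
    return float(nees.mean())
```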

Single-run consistency ($3\sigma$ bounds)

For a single run, you can inspect consistency by plotting the per-component error alongside the estimator’s $3\sigma$ bound:

$$\mathbf{e}_k = \mathbf{x}_{k} \boxminus \hat{\mathbf{x}}_{k}, \quad \mathbf{e}_k \sim \mathcal{N}(\mathbf{0}, \mathbf{P})$$

If the error frequently exceeds the $\pm 3\sigma$ envelope, the estimator is over-confident in that component. This plot is produced automatically by error_singlerun when matplotlib is available.
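A quick numerical version of the same check counts how often each component leaves its envelope. For a well-calibrated Gaussian error, roughly 0.27% of samples should fall outside $\pm 3\sigma$ per component; a much larger fraction signals over-confidence. A sketch under those assumptions:

```python
import numpy as np

def three_sigma_violations(errors, sigmas):
    """Per-component fraction of samples outside the +/-3-sigma envelope.

    `errors` and `sigmas` are (K, d) arrays, with `sigmas` taken from
    the square root of the reported covariance diagonal at each step.
    """
    return np.mean(np.abs(errors) > 3.0 * sigmas, axis=0)
```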

Computing NEES with ov_eval

rosrun ov_eval error_singlerun posyaw truths/V1_01_easy.txt estimate_with_cov.txt
Example console output (NEES section):
======================================
Normalized Estimation Error Squared
======================================
mean_ori = 793.646 | mean_pos = 13.095
A NEES of 793 for orientation is a clear signal of over-confidence: the filter is producing much smaller uncertainty bounds than the actual errors warrant. This often points to under-tuned IMU noise parameters or unmodeled linearization errors.

Multi-dataset comparison

To compare multiple algorithms across multiple datasets and produce a publication-ready LaTeX table:
rosrun ov_eval error_comparison posyaw truths/ algorithms/
This prints per-dataset ATE and RPE for each method, then outputs a LaTeX table at the end:
============================================
ATE LATEX TABLE
============================================
 & \textbf{V1\_01\_easy} & \textbf{V1\_02\_medium} & \textbf{Average} \\\hline
open_vins  & 0.699 / 0.058 & 1.675 / 0.076 & 1.445 / 0.079 \\
okvis      & 0.642 / 0.076 & 1.766 / 0.096 & 1.442 / 0.148 \\
Values are reported as orientation_error / position_error in degrees and metres respectively.
