
Evaluating a visual-inertial odometry (VIO) system fairly requires more than a single number. Different metrics expose different failure modes: a method may have low global error but high local drift, or may be accurate on average while producing over-confident covariance estimates. This page defines each metric used by ov_eval, explains when to apply it, and shows how to compute it from the command line. We recommend reading Zhang and Scaramuzza (2018) for a thorough treatment of trajectory evaluation methodology.

Metric comparison

Metric | Requires covariance | Sensitive to outliers | Captures drift | Best for
------ | ------------------- | --------------------- | -------------- | --------
ATE    | No                  | Yes                   | No             | Overall global accuracy
RPE    | No                  | Less so               | Yes            | Local / incremental drift
RMSE   | No                  | Yes                   | Partial        | Per-timestep debugging
NEES   | Yes                 | No                    | No             | Estimator consistency

Absolute Trajectory Error (ATE)

The Absolute Trajectory Error measures the global accuracy of an estimated trajectory by computing the difference between it and the ground truth after optimal alignment. The alignment step finds the best rigid-body (or similarity) transform that minimises the total error, after which the residual is averaged across all timesteps and runs. For $N$ independent runs of the same algorithm, each producing $K$ pose measurements, and an aligned estimated trajectory $\hat{\mathbf{x}}^+$:

$$e_{\text{ATE}} = \frac{1}{N} \sum_{i=1}^{N} \sqrt{ \frac{1}{K} \sum_{k=1}^{K} \left\| \mathbf{x}_{k,i} \boxminus \hat{\mathbf{x}}^+_{k,i} \right\|^2_2 }$$

The $\boxminus$ operator denotes the manifold-aware pose difference (geodesic error on $SO(3) \times \mathbb{R}^3$).

When to use ATE:
  • Comparing the final global accuracy of multiple algorithms on the same dataset.
  • Benchmarking against published results, where ATE is widely reported.
Limitation: Because ATE aligns the full trajectory before computing error, a single large outlier (e.g., from a tracking failure) can distort the alignment and inflate the apparent error of an otherwise good run. ATE is also less informative about where in the trajectory the estimation degrades.
Some datasets, such as UZH-FPV, provide only intermittent ground truth. For these, only ATE is a valid metric because RPE requires densely sampled ground truth to compute segment-level errors reliably.
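The align-then-average recipe can be sketched for the position component with NumPy. This is a simplified illustration, not ov_eval's implementation: it performs a full rigid (SE(3)) alignment via the Kabsch/Umeyama method, whereas the `posyaw` mode shown below aligns only position and yaw, and the rotation error term is omitted.

```python
import numpy as np

def align_rigid(est, gt):
    """Least-squares rigid alignment (Kabsch/Umeyama, no scale).

    Finds R, t minimising sum ||gt_i - (R @ est_i + t)||^2 and returns
    the aligned copy of `est`. Both inputs are (K, 3) position arrays.
    """
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    H = (est - mu_e).T @ (gt - mu_g)            # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(U @ Vt))          # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_g - R @ mu_e
    return est @ R.T + t

def ate_position(est, gt):
    """Position RMSE after optimal rigid alignment (single run)."""
    err = gt - align_rigid(est, gt)
    return np.sqrt(np.mean(np.sum(err ** 2, axis=1)))
```

For $N$ runs, the per-run RMSE values are then averaged as in the $e_{\text{ATE}}$ formula above. Note that any constant offset or rotation of the whole trajectory is absorbed by the alignment and contributes nothing to ATE.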

Computing ATE with ov_eval

rosrun ov_eval error_singlerun posyaw truths/V1_01_easy.txt estimate.txt
Example console output:
======================================
Absolute Trajectory Error
======================================
rmse_ori = 0.677 | rmse_pos = 0.055
mean_ori = 0.606 | mean_pos = 0.051
min_ori  = 0.044 | min_pos  = 0.001
max_ori  = 1.856 | max_pos  = 0.121
std_ori  = 0.302 | std_pos  = 0.021
Orientation errors are in degrees; position errors are in metres.

Relative Pose Error (RPE)

The Relative Pose Error evaluates drift over short, fixed-length sub-segments of the trajectory. Rather than aligning the full trajectory, RPE measures how well the relative motion predicted by the estimator matches the relative motion of the ground truth over the same window. We define a set of segment lengths $\mathcal{D} = [d_1, d_2, \dots, d_V]$ and split the trajectory into overlapping segments of each length. For segments of length $d_i$, yielding $D_i$ segment pairs:

$$\tilde{\mathbf{x}}_{r} = \mathbf{x}_{k} \boxminus \mathbf{x}_{k+d_i}$$

$$e_{\text{rpe}, d_i} = \frac{1}{D_i} \sum_{k=1}^{D_i} \left\| \tilde{\mathbf{x}}_{r} \boxminus \hat{\tilde{\mathbf{x}}}_{r} \right\|_2$$

When to use RPE:
  • Measuring how quickly the estimator drifts as the trajectory length increases.
  • Comparing methods on datasets where alignment is ambiguous or unavailable.
  • Publishing results in papers — RPE over multiple segment lengths gives reviewers more information than ATE alone.
Key advantage over ATE: RPE is less sensitive to isolated tracking failures or sudden jumps in estimation error because it samples the trajectory over many shorter windows. A single bad segment affects only the samples that include it, rather than corrupting the global alignment.
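A minimal position-only sketch of the idea, using a fixed index gap rather than a metric segment length for brevity (the full metric also compares the relative rotations with a manifold difference):

```python
import numpy as np

def rpe_position(est, gt, delta):
    """Mean relative position error over all windows of `delta` samples.

    Compares the displacement the estimator predicts over each window
    against the ground-truth displacement over the same window.
    """
    rel_gt = gt[delta:] - gt[:-delta]     # ground-truth displacement
    rel_est = est[delta:] - est[:-delta]  # estimated displacement
    return np.mean(np.linalg.norm(rel_gt - rel_est, axis=1))
```

Note that an estimate offset from the ground truth by a constant bias has zero RPE, which is exactly why the metric isolates incremental drift from global misalignment.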

Default segment lengths

The error_singlerun binary uses segment lengths of 8, 16, 24, 32, and 40 metres by default. To use different lengths, edit the segments vector in error_singlerun.cpp:134 and recompile.
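Because the default lengths are specified in metres, segment endpoints must be found by traveled distance rather than sample index. A possible sketch of that pairing step (an illustration, not the ov_eval implementation):

```python
import numpy as np

def segment_pairs(gt, seg_len):
    """Index pairs (k, j) such that the traveled ground-truth distance
    from pose k to pose j first reaches `seg_len` metres.

    `gt` is a (K, 3) array of ground-truth positions; one pair is
    produced per starting index that has enough trajectory left.
    """
    step = np.linalg.norm(np.diff(gt, axis=0), axis=1)
    dist = np.concatenate([[0.0], np.cumsum(step)])  # arc length per pose
    pairs = []
    for k in range(len(gt)):
        j = np.searchsorted(dist, dist[k] + seg_len)
        if j < len(gt):
            pairs.append((k, j))
    return pairs
```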

Computing RPE with ov_eval

rosrun ov_eval error_singlerun posyaw truths/V1_01_easy.txt estimate.txt
Example console output (RPE section):
======================================
Relative Pose Error
======================================
seg 8  - median_ori = 0.566 | median_pos = 0.068 (2484 samples)
seg 16 - median_ori = 0.791 | median_pos = 0.060 (2280 samples)
seg 24 - median_ori = 0.736 | median_pos = 0.070 (2108 samples)
seg 32 - median_ori = 0.715 | median_pos = 0.071 (1943 samples)
seg 40 - median_ori = 0.540 | median_pos = 0.063 (1792 samples)
The sample count decreases for longer segments because fewer windows of that length fit within the trajectory as the segment length approaches the total trajectory length.

Root Mean Squared Error (RMSE)

RMSE is a per-timestep metric most useful when evaluating a single dataset. Instead of collapsing accuracy into a scalar, RMSE plots reveal exactly when during the trajectory the estimator struggles, making it a powerful debugging tool. For $N$ runs of the same algorithm, the RMSE at timestep $k$ is:

$$e_{\text{rmse}, k} = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} \left\| \mathbf{x}_{k,i} \boxminus \hat{\mathbf{x}}_{k,i} \right\|^2_2 }$$

When $N = 1$ (a single run), RMSE reduces to the instantaneous pose error at each timestep.

When to use RMSE:
  • Diagnosing where a run degrades (e.g., aggressive motion, poor lighting, long corridors).
  • Verifying that a bug fix improves error consistently across the trajectory, not just on average.
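The per-timestep formula is straightforward to evaluate for the position component. A sketch assuming the runs are already expressed in the ground-truth frame:

```python
import numpy as np

def rmse_per_timestep(runs, gt):
    """Position RMSE at each timestep across N runs.

    `runs` is an (N, K, 3) array of estimated positions and `gt` is
    (K, 3). Returns a length-K array; plotting it reveals *when* the
    estimator struggles. With N = 1 this is the instantaneous error.
    """
    sq_err = np.sum((runs - gt[None, :, :]) ** 2, axis=2)  # (N, K)
    return np.sqrt(np.mean(sq_err, axis=0))
```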

Computing RMSE with ov_eval

rosrun ov_eval error_dataset posyaw truths/V1_01_easy.txt algorithms/
Example output (RMSE lines):
    RMSE: mean_ori = 0.644 | mean_pos = 0.056
    RMSE 2D: mean_ori = 0.516 | mean_pos = 0.052
The 2D variant projects positions onto the horizontal plane, which is useful for ground-vehicle datasets where vertical accuracy is not a concern.
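The horizontal projection amounts to dropping the vertical component before taking the error norm, as in this small sketch (the axis assignment assumes z is the vertical axis):

```python
import numpy as np

def rmse_pos_2d(est, gt):
    """Horizontal-plane position RMSE: discard z before the norm."""
    err = (gt - est)[:, :2]
    return np.sqrt(np.mean(np.sum(err ** 2, axis=1)))
```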

Normalized Estimation Error Squared (NEES)

NEES is the standard metric for assessing estimator consistency: does the filter’s self-reported uncertainty actually match the observed errors? An estimator that reports a small covariance while making large errors is called over-confident and is inconsistent. For $N$ runs at timestep $k$, NEES is:

$$e_{\text{nees}, k} = \frac{1}{N} \sum_{i=1}^{N} \left( \mathbf{x}_{k,i} \boxminus \hat{\mathbf{x}}_{k,i} \right)^\top \mathbf{P}^{-1}_{k,i} \left( \mathbf{x}_{k,i} \boxminus \hat{\mathbf{x}}_{k,i} \right)$$

where $\mathbf{P}_{k,i}$ is the estimator’s covariance at that timestep.

Interpretation: For a consistent estimator, the expected NEES equals the degrees of freedom of the state variable. For 3-D position or 3-D orientation (each 3-DOF), a well-calibrated filter should produce a NEES of approximately 3 at every timestep. Values significantly above 3 indicate over-confidence; values below 3 indicate under-confidence (the filter is being too conservative).
NEES requires that your estimator output includes covariance information. ov_eval reads covariance from the PoseWithCovarianceStamped message. If you record from a PoseStamped or plain Odometry topic without covariance, the NEES calculation will be skipped.
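The quadratic form is a few lines of NumPy. The sketch below evaluates NEES at a single timestep across runs and is an illustration of the formula rather than ov_eval's code:

```python
import numpy as np

def nees_at_timestep(errors, covs):
    """Average NEES over N runs at one timestep.

    `errors` is (N, d) of pose differences and `covs` is (N, d, d) of
    the estimator's reported covariances. For a consistent estimator
    the result should hover around d (e.g. 3 for 3-D position).
    """
    P_inv = np.linalg.inv(covs)                          # (N, d, d)
    nees = np.einsum('ni,nij,nj->n', errors, P_inv, errors)
    return float(nees.mean())
```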

Single-run consistency ($3\sigma$ bounds)

For a single run, you can inspect consistency by plotting the per-component error alongside the estimator’s $3\sigma$ bound:

$$\mathbf{e}_k = \mathbf{x}_{k} \boxminus \hat{\mathbf{x}}_{k}, \quad \mathbf{e}_k \sim \mathcal{N}(\mathbf{0}, \mathbf{P})$$

If the error frequently exceeds the $\pm 3\sigma$ envelope, the estimator is over-confident in that component. This plot is produced automatically by error_singlerun when matplotlib is available.
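A quick numerical version of the same check counts how often each component leaves its envelope. For a well-calibrated Gaussian error, roughly 0.27% of samples should fall outside $\pm 3\sigma$ per component; a much larger fraction signals over-confidence. A sketch under those assumptions:

```python
import numpy as np

def three_sigma_violations(errors, sigmas):
    """Per-component fraction of samples outside the +/-3-sigma envelope.

    `errors` and `sigmas` are (K, d) arrays, with `sigmas` taken from
    the square root of the reported covariance diagonal at each step.
    """
    return np.mean(np.abs(errors) > 3.0 * sigmas, axis=0)
```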

Computing NEES with ov_eval

rosrun ov_eval error_singlerun posyaw truths/V1_01_easy.txt estimate_with_cov.txt
Example console output (NEES section):
======================================
Normalized Estimation Error Squared
======================================
mean_ori = 793.646 | mean_pos = 13.095
A NEES of 793 for orientation is a clear signal of over-confidence: the filter is producing much smaller uncertainty bounds than the actual errors warrant. This often points to under-tuned IMU noise parameters or unmodeled linearization errors.

Multi-dataset comparison

To compare multiple algorithms across multiple datasets and produce a publication-ready LaTeX table:
rosrun ov_eval error_comparison posyaw truths/ algorithms/
This prints per-dataset ATE and RPE for each method, then outputs a LaTeX table at the end:
============================================
ATE LATEX TABLE
============================================
 & \textbf{V1\_01\_easy} & \textbf{V1\_02\_medium} & \textbf{Average} \\\hline
open_vins  & 0.699 / 0.058 & 1.675 / 0.076 & 1.445 / 0.079 \\
okvis      & 0.642 / 0.076 & 1.766 / 0.096 & 1.442 / 0.148 \\
Values are reported as orientation_error / position_error in degrees and metres respectively.
