Evaluating a visual-inertial odometry (VIO) system fairly requires more than a single number. Different metrics expose different failure modes: a method may have low global error but high local drift, or may be accurate on average while producing over-confident covariance estimates. This page defines each metric used by ov_eval, explains when to apply it, and shows how to compute it from the command line.
We recommend reading Zhang and Scaramuzza (2018) for a thorough treatment of trajectory evaluation methodology.
## Metric comparison
| Metric | Requires covariance | Sensitive to outliers | Captures drift | Best for |
|---|---|---|---|---|
| ATE | No | Yes | No | Overall global accuracy |
| RPE | No | Less so | Yes | Local / incremental drift |
| RMSE | No | Yes | Partial | Per-timestep debugging |
| NEES | Yes | No | No | Estimator consistency |
## Absolute Trajectory Error (ATE)

The Absolute Trajectory Error measures the global accuracy of an estimated trajectory by computing the difference between it and the ground truth after optimal alignment. The alignment step finds the best rigid-body (or similarity) transform that minimises the total error, after which the residual is averaged across all timesteps and runs. For $N$ independent runs of the same algorithm, each producing $K$ pose estimates, and an aligned estimated trajectory $\hat{\mathbf{x}}_{k,i}$:

$$e_{ATE} = \frac{1}{N} \sum_{i=1}^{N} \sqrt{ \frac{1}{K} \sum_{k=1}^{K} \left\| \mathbf{x}_{k} \boxminus \hat{\mathbf{x}}_{k,i} \right\|_2^2 }$$

The operator $\boxminus$ denotes the manifold-aware pose difference (geodesic error on $SE(3)$).

When to use ATE:
- Comparing the final global accuracy of multiple algorithms on the same dataset.
- Benchmarking against published results, where ATE is widely reported.
Some datasets, such as UZH-FPV, provide only intermittent ground truth. For these, only ATE is a valid metric because RPE requires densely sampled ground truth to compute segment-level errors reliably.
### Computing ATE with ov_eval
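The `error_singlerun` binary handles alignment and error computation for you. Purely as an illustration of what happens internally, here is a minimal NumPy sketch of position-only ATE for a single run; the function names and index-based handling are my own, and ov_eval additionally supports other alignment modes (e.g. yaw-only alignment):

```python
import numpy as np

def align_se3(est, gt):
    """Kabsch/Umeyama alignment (no scale): find R, t minimising
    sum_k || R @ est_k + t - gt_k ||^2 over all timesteps."""
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    H = (est - mu_e).T @ (gt - mu_g)           # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # correct for a possible reflection in the SVD solution
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = mu_g - R @ mu_e
    return R, t

def ate_position(est, gt):
    """RMSE of the position error after optimal rigid alignment."""
    R, t = align_se3(est, gt)
    aligned = est @ R.T + t
    return np.sqrt(np.mean(np.sum((aligned - gt) ** 2, axis=1)))
```

Note that any constant rigid offset between estimate and ground truth is removed by the alignment step, so ATE reflects only the residual shape error of the trajectory.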
## Relative Pose Error (RPE)

The Relative Pose Error evaluates drift over short, fixed-length sub-segments of the trajectory. Rather than aligning the full trajectory, RPE measures how well the relative motion predicted by the estimator matches the relative motion of the ground truth over the same window. We define a set of segment lengths $\mathcal{V} = \{d_1, d_2, \ldots, d_V\}$ and split the trajectory into overlapping segments of each length. For segments of length $d_i$, yielding $D_i$ segment pairs:

$$e_{rpe,d_i} = \sqrt{ \frac{1}{D_i} \sum_{r=1}^{D_i} \left\| \left( \mathbf{x}_{r+d_i} \boxminus \mathbf{x}_{r} \right) \boxminus \left( \hat{\mathbf{x}}_{r+d_i} \boxminus \hat{\mathbf{x}}_{r} \right) \right\|_2^2 }$$

When to use RPE:
- Measuring how quickly the estimator drifts as the trajectory length increases.
- Comparing methods on datasets where alignment is ambiguous or unavailable.
- Publishing results in papers, since RPE over multiple segment lengths gives reviewers more information than ATE alone.
### Default segment lengths

The `error_singlerun` binary uses segment lengths of 8, 16, 24, 32, and 40 metres by default. To use different lengths, edit the `segments` vector in `error_singlerun.cpp:134` and recompile.
### Computing RPE with ov_eval
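The binaries compute RPE over the distance-based segments described above. As a simplified illustration of the idea, the NumPy sketch below measures position drift over sub-segments of a fixed *index* length (ov_eval segments by travelled distance in metres, which this sketch does not reproduce):

```python
import numpy as np

def rpe_position(est, gt, seg_len, hop=1):
    """Position RPE over overlapping sub-segments of `seg_len` indices.
    est, gt: (K, 3) position arrays; returns the RMS segment error."""
    errs = []
    for k in range(0, len(gt) - seg_len, hop):
        d_est = est[k + seg_len] - est[k]   # relative motion of the estimate
        d_gt = gt[k + seg_len] - gt[k]      # relative motion of ground truth
        errs.append(np.linalg.norm(d_est - d_gt))
    return np.sqrt(np.mean(np.square(errs)))
```

Because only relative motion enters the error, a constant offset between the two trajectories cancels exactly, which is why RPE needs no global alignment.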
## Root Mean Squared Error (RMSE)

RMSE is a per-timestep metric most useful when evaluating a single dataset. Instead of collapsing accuracy into a scalar, RMSE plots reveal exactly when during the trajectory the estimator struggles, making it a powerful debugging tool. For $N$ runs of the same algorithm, the RMSE at timestep $k$ is:

$$e_{rmse,k} = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} \left\| \mathbf{x}_{k} \boxminus \hat{\mathbf{x}}_{k,i} \right\|_2^2 }$$

When $N = 1$ (a single run), RMSE reduces to the instantaneous pose error at each timestep.

When to use RMSE:
- Diagnosing where a run degrades (e.g., aggressive motion, poor lighting, long corridors).
- Verifying that a bug fix improves error consistently across the trajectory, not just on average.
### Computing RMSE with ov_eval
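As an illustration of the per-timestep formula above, a minimal NumPy sketch for the position component might look like this (the function name and array layout are my own, not ov_eval's API):

```python
import numpy as np

def rmse_per_timestep(runs, gt):
    """Per-timestep position RMSE over N runs.
    runs: (N, K, 3) estimated positions, gt: (K, 3) ground truth.
    Returns a (K,) RMSE curve suitable for plotting over time."""
    sq = np.sum((runs - gt[None, :, :]) ** 2, axis=2)  # (N, K) squared norms
    return np.sqrt(sq.mean(axis=0))                    # average over runs
```

Plotting the returned curve against time is what makes RMSE useful for debugging: spikes line up with the exact trajectory sections where the estimator struggles.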
The 2D variant projects positions onto the horizontal plane, which is useful for ground-vehicle datasets where vertical accuracy is not a concern.
## Normalized Estimation Error Squared (NEES)

NEES is the standard metric for assessing estimator consistency: does the filter's self-reported uncertainty actually match the observed errors? An estimator that reports a small covariance while making large errors is called over-confident and is inconsistent. For $N$ runs at timestep $k$, NEES is:

$$e_{nees,k} = \frac{1}{N} \sum_{i=1}^{N} \left( \mathbf{x}_{k} \boxminus \hat{\mathbf{x}}_{k,i} \right)^\top \mathbf{P}_{k,i}^{-1} \left( \mathbf{x}_{k} \boxminus \hat{\mathbf{x}}_{k,i} \right)$$

where $\mathbf{P}_{k,i}$ is the estimator's covariance at that timestep.

Interpretation: for a consistent estimator, the expected NEES equals the degrees of freedom of the state variable. For 3-D position or 3-D orientation (each 3-DOF), a well-calibrated filter should produce a NEES of approximately 3 at every timestep. Values significantly above 3 indicate over-confidence; values below 3 indicate under-confidence (the filter is being too conservative).

### Single-run consistency (3σ bounds)
For a single run, you can inspect consistency by plotting the per-component error alongside the estimator's $\pm 3\sigma$ bound. If the error frequently exceeds the envelope, the estimator is over-confident in that component. This plot is produced automatically by `error_singlerun` when matplotlib is available.
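The same envelope check can be done numerically. Here is an illustrative NumPy sketch (the function name and array layout are hypothetical) that counts how often a per-component error escapes the estimator's $\pm 3\sigma$ band:

```python
import numpy as np

def sigma_violations(err, P, n_sigma=3.0):
    """Fraction of (timestep, component) pairs whose signed error falls
    outside the estimator's +/- n_sigma envelope.
    err: (K, d) signed errors, P: (K, d, d) covariance at each timestep."""
    # per-component standard deviations from the covariance diagonals
    bound = n_sigma * np.sqrt(np.diagonal(P, axis1=1, axis2=2))  # (K, d)
    return float(np.mean(np.abs(err) > bound))
```

For Gaussian errors, a consistent estimator should leave the $3\sigma$ band only about 0.3% of the time per component; a much larger fraction is a sign of over-confidence.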
### Computing NEES with ov_eval
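To make the definition concrete, a minimal NumPy sketch of the averaged NEES at one timestep could look like this (the function name and array layout are my own; in practice the errors must be computed manifold-aware, which this sketch takes as given):

```python
import numpy as np

def avg_nees(errs, covs):
    """Average NEES at a single timestep across N runs.
    errs: (N, d) state errors, covs: (N, d, d) reported covariances."""
    # solve P e = x instead of forming P^{-1} explicitly (better conditioned)
    vals = [e @ np.linalg.solve(P, e) for e, P in zip(errs, covs)]
    return float(np.mean(vals))
```

Note how the covariance enters inversely: if the filter doubles its reported covariance for the same errors, the NEES halves, which is exactly why a small covariance with large errors (over-confidence) pushes NEES above the expected degrees of freedom.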
## Multi-dataset comparison
To compare multiple algorithms across multiple datasets and produce a publication-ready LaTeX table, use the comparison utility, which reports orientation_error / position_error in degrees and metres respectively.
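The comparison utility emits the table for you. Purely as an illustration of the output format described above (the function and exact cell layout are hypothetical, not ov_eval's actual output), one table row could be assembled like this:

```python
def latex_row(algo, results):
    """Format one algorithm's (orientation_deg, position_m) pairs, one per
    dataset, as a LaTeX table row ending in a row terminator."""
    cells = " & ".join(f"{o:.3f} / {p:.3f}" for o, p in results)
    return f"{algo} & {cells} \\\\"
```

Keeping both errors in one cell (degrees / metres) keeps the table compact when many datasets are compared side by side.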