
Before a visual feature can contribute to the EKF update — whether as a short-lived MSCKF feature or a persistent SLAM landmark — its 3D position must be estimated from a set of 2D bearing observations across multiple camera frames. This process, called feature initialization or triangulation, establishes the linearization point at which all subsequent Jacobians for that feature will be evaluated. A poor linearization point produces biased Jacobians that degrade filter consistency. OpenVINS therefore invests in a careful two-stage pipeline: a fast linear least-squares for an initial estimate, followed by a nonlinear Gauss-Newton refinement in inverse-depth parameterization.

Why Feature Initialization Matters

EKF linearization of the camera measurement model requires evaluating $\frac{\partial\mathbf{h}}{\partial{}^G\mathbf{p}_f}$ at a fixed feature estimate $\hat{\mathbf{p}}_f$. When First-Estimate Jacobians (FEJ) are enabled, this estimate is frozen at initialization time and never updated. A poor triangulation therefore permanently degrades the Jacobians for the entire lifetime of the feature. Conversely, a good initialization with sufficient baseline and view diversity leads to accurate, consistent updates.
OpenVINS requires a minimum of two views to triangulate a feature and recommends at least five views for robust performance. The linear system’s condition number is checked to reject geometrically degenerate configurations (e.g., near-parallel viewing rays).

Feature Representations

OpenVINS supports six distinct parameterizations of a 3D point feature, selectable at runtime. Each trades off between singularity robustness, state dimension, and Jacobian complexity.
| Representation | Parameters $\boldsymbol{\lambda}$ | Dim | In State? | Notes |
|---|---|---|---|---|
| Global XYZ | $[{}^Gx,\ {}^Gy,\ {}^Gz]^\top$ | 3 | Yes (SLAM) | Canonical; full ${}^G\mathbf{p}_f$ in global frame |
| Global Inverse Depth | $[\theta,\ \phi,\ \rho]^\top$ | 3 | Yes (SLAM) | Spherical coordinates; singularity when $z \to 0$ |
| Anchored XYZ | $[{}^{C_a}x,\ {}^{C_a}y,\ {}^{C_a}z]^\top$ | 3 | Yes (SLAM) | Position in anchor camera frame $\{C_a\}$ |
| Anchored Inverse Depth | $[\theta,\ \phi,\ \rho]^\top$ | 3 | Yes (SLAM) | Spherical coords in anchor frame; no $z \to 0$ singularity |
| Anchored MSCKF Inv. Depth | $[\alpha,\ \beta,\ \rho]^\top$ | 3 | No (MSCKF) | Planar bearing $[\alpha,\beta,1]^\top/\rho$; default MSCKF rep. |
| Anchored Single Inv. Depth | $[\rho]$ | 1 | Yes (SLAM, 1-DoF) | Bearing fixed at init; only depth in state |
Global XYZ simply stores $\boldsymbol{\lambda} = {}^G\mathbf{p}_f$ with identity Jacobian $\frac{\partial\mathbf{f}}{\partial\boldsymbol{\lambda}} = \mathbf{I}_{3\times3}$.

Global Inverse Depth uses spherical coordinates:

$${}^G\mathbf{p}_f = \frac{1}{\rho}\begin{bmatrix}\cos\theta\sin\phi \\ \sin\theta\sin\phi \\ \cos\phi\end{bmatrix}$$

where $\rho = 1/\|{}^G\mathbf{p}_f\|$ is the inverse distance. This avoids the near-linear degeneracy of XYZ for far-away features, but has a singularity when the global $z$-coordinate approaches zero.

Anchored representations express the feature relative to an “anchor” camera pose $\{C_a\}$, typically the first frame in which the feature was detected. This prevents the global $z$-singularity and is preferred in practice. The global position is recovered via:

$${}^G\mathbf{p}_f = {}^{I_a}_G\mathbf{R}^\top\,{}^C_I\mathbf{R}^\top(\boldsymbol{\lambda}' - {}^C\mathbf{p}_I) + {}^G\mathbf{p}_{I_a}$$

where $\boldsymbol{\lambda}'$ is the feature expressed in the anchor camera frame.

Anchored MSCKF Inverse Depth ($[\alpha,\beta,\rho]$) is the representation from the original MSCKF paper. The feature position in the anchor frame is the bearing $[\alpha,\ \beta,\ 1]^\top$ scaled by the depth, i.e. $[{\alpha}/{\rho},\ {\beta}/{\rho},\ {1}/{\rho}]^\top$. This is the default MSCKF representation and is numerically well-conditioned as long as features are observed from the camera frame they were anchored in (depth always positive).

Anchored Single Inverse Depth keeps only the scalar depth $\rho$ in the state after marginalizing the bearing at initialization time. This is useful for applications where reducing state dimensionality is critical.
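To make the anchored-to-global conversion concrete, here is a minimal Eigen sketch that applies the recovery formula above to the Anchored MSCKF Inverse Depth parameters. It is not the OpenVINS API; the function and argument names are illustrative, and the rotation conventions follow the notation used on this page.

```cpp
#include <Eigen/Dense>

// Sketch only (not the OpenVINS API): recover the global position of a feature
// stored as Anchored MSCKF Inverse Depth [alpha, beta, rho] in anchor camera {C_a}.
Eigen::Vector3d anchored_invdepth_to_global(
    const Eigen::Vector3d &lambda,    // [alpha, beta, rho]
    const Eigen::Matrix3d &R_GtoIa,   // {}^{I_a}_G R : global -> anchor IMU
    const Eigen::Vector3d &p_IainG,   // {}^G p_{I_a} : anchor IMU position in global
    const Eigen::Matrix3d &R_ItoC,    // {}^C_I R : IMU -> camera extrinsic rotation
    const Eigen::Vector3d &p_IinC) {  // {}^C p_I : IMU position in the camera frame
  const double alpha = lambda(0), beta = lambda(1), rho = lambda(2);
  // Feature in the anchor camera frame: bearing [alpha, beta, 1] scaled by depth 1/rho
  Eigen::Vector3d p_FinCa(alpha / rho, beta / rho, 1.0 / rho);
  // {}^G p_f = {}^{I_a}_G R^T {}^C_I R^T (lambda' - {}^C p_I) + {}^G p_{I_a}
  return R_GtoIa.transpose() * R_ItoC.transpose() * (p_FinCa - p_IinC) + p_IainG;
}
```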

Triangulation Pipeline

1

Select anchor frame

Choose an anchor camera pose $\{A\}$, typically the earliest camera frame that observed the feature. All subsequent computation is done relative to this frame.
2

Build bearing vectors

For each observing camera pose $C_i$ ($i = 1\ldots m$), compute the normalized image bearing in the anchor frame:

$${}^A\mathbf{b}_{C_i\to f} = {}^{C_i}_A\mathbf{R}^\top\begin{bmatrix}u_{n,i} \\ v_{n,i} \\ 1\end{bmatrix}$$

where $(u_{n,i}, v_{n,i})$ are the undistorted normalized image coordinates in frame $C_i$.
3

Form linear system (3D triangulation)

For each frame, the depth along the bearing is an unknown. To eliminate it, pre-multiply by the skew-symmetric (cross-product) matrix of the bearing:

$${}^A\mathbf{N}_i = \lfloor{}^A\mathbf{b}_{C_i\to f}\times\rfloor$$

This gives two independent constraints per frame. Stacking all $m$ frames yields $\mathbf{A}\,{}^A\mathbf{p}_f = \mathbf{b}$, which is solved in the least-squares sense via the $3\times3$ normal equations:

$$\Bigl(\sum_i {}^A\mathbf{N}_i^\top{}^A\mathbf{N}_i\Bigr)\,{}^A\mathbf{p}_f = \sum_i {}^A\mathbf{N}_i^\top{}^A\mathbf{N}_i\,{}^A\mathbf{p}_{C_i}$$

Because the normal equations are always $3\times3$, the solve cost is fixed regardless of the number of observations $m$.
4

Validate the linear estimate

Check two conditions before proceeding to refinement:
  • Positive depth: the triangulated feature must lie in front of all observing cameras (${}^{C_i}z_f > 0$).
  • Condition number: the condition number of $\mathbf{A}^\top\mathbf{A}$ must be below a threshold. A large condition number indicates near-parallel rays (insufficient baseline) and an unreliable estimate.
Features failing either check are discarded.
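The sketch below illustrates steps 2–4 for observations already expressed in the anchor frame: it accumulates the $3\times3$ normal equations, solves them, and applies the condition-number and positive-depth checks. It is an illustration of the math above, not the OpenVINS implementation; the function name and threshold argument are assumptions.

```cpp
#include <Eigen/Dense>
#include <vector>

// Sketch only (not the OpenVINS implementation).
// bearings_inA[i] is {}^A b_{C_i -> f} and p_CinA[i] is {}^A p_{C_i}.
bool triangulate_linear(const std::vector<Eigen::Vector3d> &bearings_inA,
                        const std::vector<Eigen::Vector3d> &p_CinA,
                        double max_cond_number,
                        Eigen::Vector3d &p_FinA) {
  Eigen::Matrix3d AtA = Eigen::Matrix3d::Zero();
  Eigen::Vector3d Atb = Eigen::Vector3d::Zero();
  for (size_t i = 0; i < bearings_inA.size(); i++) {
    // N_i = skew({}^A b_{C_i -> f}) eliminates the unknown depth along the ray
    const Eigen::Vector3d &b = bearings_inA[i];
    Eigen::Matrix3d N;
    N << 0, -b.z(), b.y(),
         b.z(), 0, -b.x(),
         -b.y(), b.x(), 0;
    AtA += N.transpose() * N;
    Atb += N.transpose() * N * p_CinA[i];
  }
  // Solve the 3x3 normal equations for the feature position in the anchor frame
  p_FinA = AtA.ldlt().solve(Atb);
  // Reject degenerate geometry (near-parallel rays) via the condition number of A^T A
  Eigen::JacobiSVD<Eigen::Matrix3d> svd(AtA);
  double cond = svd.singularValues()(0) / svd.singularValues()(2);
  if (!p_FinA.allFinite() || cond > max_cond_number)
    return false;
  // Positive depth: the feature must lie in front of every observing camera,
  // i.e. the depth along each bearing ray must be positive
  for (size_t i = 0; i < bearings_inA.size(); i++) {
    if (bearings_inA[i].dot(p_FinA - p_CinA[i]) <= 0.0)
      return false;
  }
  return true;
}
```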
5

Gauss-Newton refinement in inverse depth

The linear estimate is refined using Gauss-Newton optimization in the inverse-depth parameterization $(u_A, v_A, \rho_A)$, where $u_A = {}^Ax_f/{}^Az_f$, $v_A = {}^Ay_f/{}^Az_f$, and $\rho_A = 1/{}^Az_f$.

For each observing frame, the predicted bearing is:

$$\mathbf{h}(u_A, v_A, \rho_A) = {}^{C_i}_A\mathbf{R}\begin{bmatrix}u_A \\ v_A \\ 1\end{bmatrix} - \rho_A\,{}^{C_i}_A\mathbf{R}\,{}^A\mathbf{p}_{C_i}$$

and the projected measurement is $\hat{\mathbf{z}} = [h_1/h_3,\ h_2/h_3]^\top$. The total least-squares cost is:

$$\min_{u_A,v_A,\rho_A}\sum_i\|\mathbf{z}_i - \hat{\mathbf{z}}_i(u_A,v_A,\rho_A)\|^2$$

The Jacobian is derived via the chain rule and the update step is applied iteratively. Convergence typically requires 2–3 Gauss-Newton iterations in indoor environments.
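A minimal Gauss-Newton sketch of this refinement is given below, assuming Eigen and illustrative names; it is not the OpenVINS code, and a production implementation would typically add damping and configurable convergence thresholds. Each iteration accumulates the per-observation $2\times3$ Jacobians and residuals and solves the resulting $3\times3$ normal equations for the update.

```cpp
#include <Eigen/Dense>
#include <vector>

// Sketch only: Gauss-Newton over x = [u_A, v_A, rho_A].
// R_AtoC[i] is {}^{C_i}_A R, p_CinA[i] is {}^A p_{C_i}, and uv[i] is the
// undistorted normalized measurement z_i observed in frame C_i.
Eigen::Vector3d refine_inverse_depth(Eigen::Vector3d x,
                                     const std::vector<Eigen::Matrix3d> &R_AtoC,
                                     const std::vector<Eigen::Vector3d> &p_CinA,
                                     const std::vector<Eigen::Vector2d> &uv,
                                     int max_iters = 5) {
  for (int iter = 0; iter < max_iters; iter++) {
    Eigen::Matrix3d H = Eigen::Matrix3d::Zero();  // sum_i J_i^T J_i
    Eigen::Vector3d g = Eigen::Vector3d::Zero();  // sum_i J_i^T r_i
    for (size_t i = 0; i < uv.size(); i++) {
      // h = {}^{C_i}_A R [u_A, v_A, 1]^T - rho_A {}^{C_i}_A R {}^A p_{C_i}
      Eigen::Vector3d h =
          R_AtoC[i] * Eigen::Vector3d(x(0), x(1), 1.0) - x(2) * R_AtoC[i] * p_CinA[i];
      Eigen::Vector2d zhat(h(0) / h(2), h(1) / h(2));
      Eigen::Vector2d r = uv[i] - zhat;  // residual z_i - zhat_i
      // Chain rule: d zhat / dh (2x3) times dh / dx (3x3)
      Eigen::Matrix<double, 2, 3> dz_dh;
      dz_dh << 1.0 / h(2), 0.0, -h(0) / (h(2) * h(2)),
               0.0, 1.0 / h(2), -h(1) / (h(2) * h(2));
      Eigen::Matrix3d dh_dx;
      dh_dx.col(0) = R_AtoC[i].col(0);        // dh / du_A
      dh_dx.col(1) = R_AtoC[i].col(1);        // dh / dv_A
      dh_dx.col(2) = -R_AtoC[i] * p_CinA[i];  // dh / drho_A
      Eigen::Matrix<double, 2, 3> J = dz_dh * dh_dx;
      H += J.transpose() * J;
      g += J.transpose() * r;
    }
    // Gauss-Newton step for the cost sum_i ||z_i - zhat_i||^2
    Eigen::Vector3d dx = H.ldlt().solve(g);
    x += dx;
    if (dx.norm() < 1e-8)
      break;
  }
  return x;
}
```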
6

Final depth check

After refinement, verify that the feature is still in front of all cameras and within a reasonable depth range. The refined $(u_A, v_A, \rho_A)$ is then converted back to the desired feature representation (Global XYZ, Anchored Inverse Depth, etc.) for use in the EKF.

1D Depth Triangulation (Single-Depth Variant)

For the Anchored Single Inverse Depth representation, the bearing in the anchor frame ${}^A\mathbf{b}_f = [u_{n,A},\ v_{n,A},\ 1]^\top$ is known from the anchor observation. Only the scalar depth ${}^Az_f$ is unknown. The same cross-product trick yields a scalar equation:

$$\Bigl(\sum_i ({}^A\mathbf{N}_i\,{}^A\mathbf{b}_f)^\top({}^A\mathbf{N}_i\,{}^A\mathbf{b}_f)\Bigr)\,{}^Az_f = \sum_i ({}^A\mathbf{N}_i\,{}^A\mathbf{b}_f)^\top\,{}^A\mathbf{N}_i\,{}^A\mathbf{p}_{C_i}$$

This reduces to a single scalar division, making it extremely fast. The full feature position is then ${}^A\mathbf{p}_f = {}^Az_f\,{}^A\mathbf{b}_f$.
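As a sketch (illustrative names, not the OpenVINS code), the scalar solve amounts to one accumulation loop followed by a single division:

```cpp
#include <Eigen/Dense>
#include <vector>

// Sketch of the single-depth triangulation. b_A is the known anchor bearing
// [u_nA, v_nA, 1]^T; bearings_inA[i] and p_CinA[i] are {}^A b_{C_i -> f} and
// {}^A p_{C_i} as in the 3D case.
Eigen::Vector3d triangulate_1d(const Eigen::Vector3d &b_A,
                               const std::vector<Eigen::Vector3d> &bearings_inA,
                               const std::vector<Eigen::Vector3d> &p_CinA) {
  double num = 0.0, den = 0.0;
  for (size_t i = 0; i < bearings_inA.size(); i++) {
    const Eigen::Vector3d &b = bearings_inA[i];
    Eigen::Matrix3d N;
    N << 0, -b.z(), b.y(),
         b.z(), 0, -b.x(),
         -b.y(), b.x(), 0;
    Eigen::Vector3d Nb = N * b_A;  // N_i {}^A b_f
    den += Nb.dot(Nb);             // (N_i b_f)^T (N_i b_f)
    num += Nb.dot(N * p_CinA[i]);  // (N_i b_f)^T N_i {}^A p_{C_i}
  }
  double z_A = num / den;          // scalar least-squares depth {}^A z_f
  return z_A * b_A;                // {}^A p_f = {}^A z_f {}^A b_f
}
```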

Why Inverse Depth for Refinement?

Expressing depth as its reciprocal $\rho = 1/z$ has favorable numerical properties:
  • Features at large distances have large $z$ and correspondingly large, poorly scaled Cartesian coordinates. In inverse depth, they cluster near $\rho \approx 0$ and the optimization landscape is well-conditioned.
  • The inverse-depth parameterization is closer to the natural “uncertainty space” of bearing-only sensors: uncertainty in bearing translates to roughly uniform uncertainty in $\rho$ (rather than a skewed distribution in $z$).
  • Gauss-Newton converges faster and with fewer iterations compared to direct Cartesian optimization.
