Edge Alignment-Based Visual-Inertial Fusion for Tracking of Aggressive Motions

Yonggen Ling · Manohar Kuse · Shaojie Shen
Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong, SAR China.
E-mail: [email protected], [email protected], [email protected]

Abstract We propose a novel edge-based visual-inertial fusion approach to address the problem of tracking aggressive motions with real-time state estimates. At the front-end, our system performs edge alignment, which estimates the relative poses in the distance transform domain with a larger convergence basin and stronger resistance to changing lighting conditions or camera exposures compared to the popular direct dense tracking. At the back-end, a sliding-window optimization-based framework is applied to fuse visual and inertial measurements. We utilize efficient inertial measurement unit (IMU) preintegration and two-way marginalization to generate accurate and smooth estimates with limited computational resources. To increase the robustness of our proposed system, we perform an edge alignment self check and an IMU-aided external check. Extensive statistical analysis and comparisons are presented to verify the performance of our proposed approach and its usability on resource-constrained platforms. Compared to state-of-the-art point feature-based visual-inertial fusion methods, our approach achieves better robustness under extreme motions or low frame rates, at the expense of slightly lower accuracy in general scenarios. We release our implementation as open-source ROS packages.

Keywords Visual-inertial fusion · Edge alignment · Tracking of aggressive motions · Visual-inertial odometry

1 Introduction

Real-time, robust, and accurate state estimation is the foremost important component for many autonomous robotics applications. In particular, reliable tracking of fast and aggressive motions is essential for popular applications that involve highly dynamic mobile platforms/devices, such as aerial robotics and augmented reality.

As demonstrated in the literature (Baker and Matthews (2004)), cameras are ideal sensors for tracking slow to moderate motions using feature-based methods under constant lighting conditions and camera exposures. However, large image displacement caused by fast motions can seriously downgrade feature tracking performance. Recent advances in direct dense tracking have shown good adaptability to fast motions (Newcombe et al (2011); Engel et al (2014); Ling and Shen (2015)). These methods operate on image intensities, rather than on sparse features, to minimize a photometric cost function and make full use of all the available information within an image. Thus they essentially bypass the feature processing pipeline and eliminate some of the issues found with feature-based methods. To ensure high image quality under different lighting conditions, camera auto-exposure is usually employed, since the same physical location in space may be imaged with different intensities across frames. The photo-consistency assumption behind direct dense tracking is therefore easily violated when the lighting conditions of the environment change. In contrast to direct methods, our earlier work on edge alignment (Kuse and Shen (2016)) uses the distance transform in the energy formulation and elegantly addresses the lack of photo-consistency. Nevertheless, it fails if the captured images undergo severe blurring. In contrast to cameras, IMUs generate noisy but outlier-free measurements, making them very effective for short-term tracking even under fast motions. On the flip side, low-cost IMUs suffer significant drift in the long run. We believe that combining the complementary nature of edge alignment and IMU measurements opens up the possibility of reliable tracking of aggressive motions.
…motions that lead to large image movement and severe motion blur.

2 Related Work

With recent advances in high-performance mobile computing, direct dense methods have become popular (Kerl et al (2013); Newcombe et al (2011); Engel et al (2013, 2014)). Kerl et al (2013) propose a probabilistic formulation of direct dense tracking based on the Student's t-distribution, which alleviates the influence of outliers and leads to robust estimation. Newcombe et al (2011) present a system for real-time camera tracking and reconstruction using commodity GPU hardware. Recent work on direct dense tracking (Engel et al (2013, 2014)) models the uncertainty of the inverse depth of pixels and achieves impressive performance in large-scale environments. However, these methods rely on the photo-consistency assumption, under which motion estimation is performed by following local gradient directions to minimize the total intensity error. Direct dense methods have a small basin of attraction (as noted in Kerl et al (2013)) and are sensitive to changing lighting conditions.

Another way to estimate the states using visual measurements is an iterative closest point (ICP) based method, which directly aligns three-dimensional point clouds. Stückler and Behnke (2012) employ a method based on ICP for the alignment of point clouds obtained from an RGB-D camera, while Rusinkiewicz and Levoy (2001) present a survey of other attempts to use efficient ICP-like methods for pose estimation. A generalized formulation of ICP is proposed in Segal et al (2005), in which ICP and point-to-plane ICP are combined into a single probabilistic framework. Fitzgibbon (2003) proposes an algorithm to align two two-dimensional point sets that is also extensible to three-dimensional point sets, and uses the distance transform to model the point correspondence function when aligning two-dimensional curves. Since ICP relies on three-dimensional point clouds, its applications to monocular or stereo cameras, which generally give neither dense nor accurate three-dimensional points, are limited.

To overcome the disadvantages of vision-only approaches, visual-inertial fusion has recently gained a lot of attention. It is straightforward to apply variations of Kalman filtering (Huang et al (2011); Shen et al (2013); Scaramuzza et al (2014); Omari et al (2015); Bloesch et al (2015)) to loosely fuse visual and inertial measurements. Huang et al (2011) utilize the information richness of RGB-D cameras and fuse the visual tracking and inertial measurements in an extended Kalman filter (EKF) framework. Shen et al (2013) combine the information from a KLT tracker and IMU measurements with an unscented Kalman filter (UKF), and Omari et al (2015) leverage the recent development of direct dense tracking and fuse it with inertial information in the EKF fashion. Scaramuzza et al (2014) also apply an EKF in a similar way to Shen et al (2013). Intensity errors of image patches as well as an inverse depth parametrization are considered in Bloesch et al (2015). The high-level effect of such fusion is the smoothing of vision-based tracking. Even when visual tracking fails, estimation can be carried out using the IMU for short-term motion prediction. In loosely-coupled methods, visual measurements are usually presented in the form of relative pose transformations, while leaving the visual pose tracking as a black box. This leads to lower computational complexity at the cost of suboptimal results.

Recent developments in visual-inertial fusion indicate that tightly-coupled methods outperform their loosely-coupled counterparts in terms of estimation accuracy (Hesch et al (2014); Li and Mourikis (2013); Leutenegger et al (2015); Shen et al (2015); G. Huang, M. Kaess and Leonard J.J. (2014); Christian et al (2015); Usenko et al (2016)). Shen et al (2015), Christian et al (2015) and Hesch et al (2014) propose schemes of IMU preintegration on the Lie manifold and then fuse it with monocular camera tracking information in a tightly-coupled graph-based optimization framework. The inertial measurement integration approaches in Leutenegger et al (2015), Shen et al (2015) and Christian et al (2015) are slightly different, with respective pros and cons. Hesch et al (2014), Li and Mourikis (2013) and G. Huang, M. Kaess and Leonard J.J. (2014) put emphasis on the system's observability and build up the mathematical foundation of visual-inertial systems. Upon their analysis, they develop monocular visual-inertial systems that are high-precision as well as consistent. Besides this, Hesch et al (2014), Li and Mourikis (2013), Dong-Si and Mourikis (2012), L. Heng, G. H. Lee, and M. Pollefeys (2014), and Yang and Shen (2015) relax the assumption that the transformation between the camera and IMU (camera extrinsics) is known. IMU-camera extrinsics are also optimized in their algorithms, which takes a step forward towards practical applications. Instead of geometric errors of sparse features, Usenko et al (2016) propose a direct visual-inertial odometry that minimizes intensity errors. Tightly-coupled methods consider the coupling between the two types of measurements and allow the adoption of a graph optimization-based framework with the ability to iteratively re-linearize nonlinear functions. In this way, tightly-coupled approaches gain the potential to achieve better performance. On the other hand, these kinds of methods usually come with a higher computational cost, as the number of variables involved in the optimization is large.

3 System Overview

The pipeline of our proposed system is illustrated in Fig. 2 (also see Table 1 in Sect. 7.1). Three threads run simultaneously, utilizing the multi-core architecture.
Fig. 2: The pipeline of our proposed system. It consists of three modules running in separate threads to ensure real-time
availability of state estimates (denoted as three dashed boxes). The driver thread runs at 200 Hz, the edge alignment thread
runs at 25 Hz, and the optimization-based sensor fusion thread runs at 25 Hz.
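To make the threading structure concrete, the following is a minimal C++ sketch of the three-thread layout summarized in Fig. 2 and described below. The names (FrameQueue, driverLoop, edgeAlignmentLoop, fusionLoop) and the polling details are illustrative assumptions, not taken from the released ROS packages.

```cpp
// Illustrative sketch only: three concurrent loops sharing a frame buffer.
#include <chrono>
#include <mutex>
#include <queue>
#include <thread>

struct Frame { /* rectified stereo pair, IMU batch, timestamp, ... */ };

class FrameQueue {                     // thread-safe frame list buffer
 public:
  void push(const Frame& f) { std::lock_guard<std::mutex> lk(m_); q_.push(f); }
  bool popAll(std::queue<Frame>& out) {
    std::lock_guard<std::mutex> lk(m_);
    if (q_.empty()) return false;
    std::swap(out, q_);
    return true;
  }
 private:
  std::queue<Frame> q_;
  std::mutex m_;
};

FrameQueue frame_list;
using namespace std::chrono_literals;

void driverLoop() {                    // ~200 Hz: acquisition and rectification
  while (true) { /* read IMU + stereo, rectify */ std::this_thread::sleep_for(5ms); }
}
void edgeAlignmentLoop() {             // ~25 Hz: Canny, distance transform, alignment
  while (true) {
    Frame f;                           // result of aligning against the keyframe
    frame_list.push(f);
    std::this_thread::sleep_for(40ms);
  }
}
void fusionLoop() {                    // ~25 Hz: sliding-window optimization
  while (true) {
    std::queue<Frame> batch;
    if (frame_list.popAll(batch)) { /* add frames, optimize, marginalize */ }
    std::this_thread::sleep_for(40ms);
  }
}

int main() {
  std::thread t1(driverLoop), t2(edgeAlignmentLoop), t3(fusionLoop);
  t1.join(); t2.join(); t3.join();
}
```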
The first thread is the driver thread, which performs basic operations, such as data acquisition and image rectification.

The edge alignment thread (Sect. 5) performs keyframe-to-frame edge alignment periodically. Canny edge detection and the distance transform are performed for each incoming image. The angular prior from the integration of gyroscope measurements initializes the incremental rotation. This thread also assesses instantaneous tracking performance, detects tracking failure and determines whether to add a new keyframe. A disparity map is computed using a standard block matching algorithm for every new keyframe. The visual measurements and their corresponding frames are stored in a frame list buffer for further processing by the optimization thread.

The optimization-based sensor fusion thread maintains a sliding window of states and measurements, and checks the frame list buffer periodically. If it is not empty, all the frames within the buffer are added into the sliding window. If a keyframe is added, loop closure detection is performed to find possible visual connections between keyframes. Graph optimization is then applied to find the maximum a posteriori estimate of all the states within the sliding window using measurements from IMU preintegration (Sect. 6.2), multi-constrained relative pose measurements (Sect. 6.3) and the prior. A two-way marginalization scheme that selectively removes states is used to bound the computational complexity and the time interval of the IMU integration, and to maximize the information stored within the sliding window (Sect. 6.5).

4 Notations

We begin by giving notations. We consider $(\cdot)^w$ as the earth's inertial frame, and $(\cdot)^{b_k}$ and $(\cdot)^{c_k}$ as the IMU body frame and camera frame while taking the $k$-th image. We assume that the IMU-camera sensor suite is rigidly mounted, and the translation and rotation between the left camera and the IMU are $t^c_b$, $q^c_b$. The intrinsics of the stereo cameras are calibrated beforehand. $p^X_Y$, $v^X_Y$ and $R^X_Y$ are the 3D position, velocity and rotation of frame $Y$ with respect to frame $X$, respectively. We also have the corresponding quaternion ($q^X_Y = [q_x, q_y, q_z, q_w]$) representation for rotation; Hamilton notation is used for quaternions. The states are defined as the combinations of positions, velocities, rotations, linear acceleration biases and angular velocity biases: $x_k = [p^w_{b_k}, v^w_{b_k}, q^w_{b_k}, b^{b_k}_a, b^{b_k}_\omega]$. For camera frame $c_r$ (which denotes the reference frame) and camera frame $c_n$ (which denotes the current frame), the rigid-body transformation between them is $T^{c_n}_{c_r} = \{p^{c_n}_{c_r}, R^{c_n}_{c_r}\} \in SE(3)$, where $p^{c_n}_{c_r}$ and $R^{c_n}_{c_r}$ are the translation and rotation, respectively. Next, we denote a 3D scene point $i$ in the coordinate system of the camera optical center at time instance $k$ by ${}^k f_i \in \mathbb{R}^3$. The camera projection function $\Pi : \mathbb{R}^3 \mapsto \mathbb{R}^2$ projects a visible 3D scene point onto the image domain. The inverse projection function $\tilde{\Pi} : (\mathbb{R}^2, \mathbb{R}) \mapsto \mathbb{R}^3$ back-projects a pixel coordinate given the depth at this pixel coordinate:

$$ {}^k u_i = \Pi({}^k f_i), \qquad (1) $$
$$ {}^k f_i = \tilde{\Pi}({}^k u_i, Z_k({}^k u_i)), \qquad (2) $$

where ${}^k u_i \in \mathbb{R}^2$ denotes the image coordinates of the 3D point ${}^k f_i$, and $Z_k({}^k u_i)$ denotes the depth of point ${}^k f_i$. We use a graph structure to represent the variables (states, combinations of poses and velocities) we aim to solve for and the constraints (links) between variables. See Fig. 3 for an illustration and Sect. 6 for details about variables and constraints.
5 Edge Alignment

In this section, we introduce our formulation for relative camera motion estimation, which we refer to as the edge alignment formulation. It is based on the minimization of a geometric error term at each edge pixel to obtain an estimate of the rigid body transform between two frames, i.e., to find a pose (rotation and translation) such that the edges of the two images align. This is in contrast to previous direct methods, notably the one proposed by Kerl et al (2013), which minimize the photometric error at every pixel. The energy formulation we propose in this work is the sum of the squared distances between the transformed-projected (onto the current frame) coordinates of the edge pixels from the reference frame and the nearest edge pixels in the current frame.

For convenience of notation we derive our energy formulation using $R$ and $p$ as aliases for $R^{c_n}_{c_r}$ and $p^{c_n}_{c_r}$ for a particular instance of the reference and current frames. Our proposed geometric energy function is the sum of the distances between the re-projections (of edge points from the reference image) and the nearest edge points in the current image:

$$ f(R, p) = \sum_i \min_j D^2\big(\Pi[R\,{}^r f_i + p],\ {}^n u_j\big), \qquad (3) $$

where $D : (\mathbb{R}^2, \mathbb{R}^2) \mapsto \mathbb{R}$ denotes the Euclidean distance between those points. The best estimate for the rigid transform can be obtained by solving the following optimization problem:

$$ \underset{R,\,p}{\text{minimize}}\ f(R, p) \quad \text{subject to}\ R \in SO(3). \qquad (4) $$

We relax the geometric energy function by restricting it only to edge points. In this approach, we observe that, if the image points corresponding to edge points in the reference image (denoted by ${}^r e_i \in \mathbb{R}^2$ with corresponding 3D points ${}^r E_i$) are pre-selected, then the function $\min_j D(u_i, u_j)$ is exactly the definition of the distance transform (Felzenszwalb and Huttenlocher (2012)). We denote the distance transform of the edge-map of the current image as $V^{(n)} : \mathbb{R}^2 \mapsto \mathbb{R}$. Thus, the energy term for an edge pixel of the reference frame is given by

$$ \upsilon_{e_i}(R, p) = V^{(n)}\big(\Pi[R\,\tilde{\Pi}({}^r e_i, Z_r({}^r e_i)) + p]\big). \qquad (5) $$

To summarize, the relaxed energy function is

$$ f(R, p) = \sum_{\forall e_i} \big(\upsilon_{e_i}(R, p)\big)^2 \qquad (6) $$

and our goal is to solve for $R^*$ and $p^*$:

$$ \underset{R,\,p}{\text{argmin}}\ f(R, p) \quad \text{subject to}\ R \in SO(3). \qquad (7) $$

Since (5) is highly nonlinear with respect to $R$ and $p$, we linearize it on the Lie group manifold $SE(3)$ with respect to $\xi = (\delta p, \delta\theta) \in \mathfrak{se}(3)$, which is the minimum-dimension error representation,

$$ \upsilon_{e_i}(R, p, \xi) = \upsilon_{e_i}(R, p) + J_i|_{\xi=0}\cdot\xi, \qquad (8) $$

where $J_i$ is the Jacobian matrix of $\upsilon_{e_i}(R, p)$ with respect to $\xi$ at $\xi = 0$. The Lie group $SE(3)$ and the Lie algebra $\mathfrak{se}(3)$ are linked by the exponential and logarithm maps; more details can be found in Ma et al (2012).

Unlike our earlier work Kuse and Shen (2016), which provides a strong theoretical guarantee on convergence, we adopt the Gauss-Newton method to solve (7). We empirically find that the Gauss-Newton method works in most cases and converges to a local minimum quickly, so the real-time requirement of our proposed system is satisfied. To guard against disturbances caused by poor convergence, we have mechanisms to detect and reject failures of edge alignment (Sect. 5.3 and Sect. 6.4).

Following the scheme of the Gauss-Newton approach, we iteratively solve (7) using the linearization (8)
around the current estimate $\hat{T} = \{\hat{R}, \hat{p}\}$ and then perform incremental updates until convergence:

$$ \hat{T} \leftarrow \hat{T} \otimes \exp(\xi). \qquad (9) $$

Fig. 4: Reprojections of edge pixels in the reference frame onto the current frame as the Gauss-Newton optimization progresses (iterations 0, 2, 4, 6 and 8). The middle row shows the reprojections on the current gray image; they are false colored to represent $\upsilon_{e_i}(\xi)$. The last row shows reprojections on the distance transform image of the edge-map of the current frame. Note that the current frame and the reference frame are about 160 ms apart (5 frames) and the Gauss-Newton progress is shown without pyramids, with the initial guess set to the identity. Viewing in color is recommended.
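The following C++ sketch illustrates the edge alignment loop of Sect. 5: Canny edges and the distance transform $V^{(n)}$ of the current frame (computed with OpenCV), the per-edge-pixel residual of (5), and the right-multiplied Gauss-Newton update of (9). For brevity it uses numeric Jacobians, a fixed iteration count and no image pyramid, unlike the actual implementation; all names, thresholds and constants are illustrative.

```cpp
// Illustrative sketch of edge alignment as Gauss-Newton on the distance
// transform (cf. Eqs. (5)-(9)).  Requires OpenCV and Eigen.
#include <opencv2/opencv.hpp>
#include <Eigen/Dense>
#include <vector>

struct Pinhole { double fx, fy, cx, cy; };
struct RefEdgePixel { double u, v, depth; };      // edge pixel of the reference frame

// Distance transform V(n) of the current frame's edge map.
cv::Mat distanceTransformOfEdges(const cv::Mat& gray) {
  cv::Mat edges, inverted, dt;
  cv::Canny(gray, edges, 50, 150);                // thresholds are placeholders
  cv::bitwise_not(edges, inverted);               // edge pixels become zero
  cv::distanceTransform(inverted, dt, cv::DIST_L2, 3);
  return dt;                                      // CV_32F distance to nearest edge
}

// Bilinear lookup of the distance transform; a large value outside the image.
double sampleDT(const cv::Mat& dt, double u, double v) {
  if (u < 1 || v < 1 || u > dt.cols - 2 || v > dt.rows - 2) return 50.0;
  int x = (int)u, y = (int)v;
  double a = u - x, b = v - y;
  return (1 - a) * (1 - b) * dt.at<float>(y, x) + a * (1 - b) * dt.at<float>(y, x + 1) +
         (1 - a) * b * dt.at<float>(y + 1, x) + a * b * dt.at<float>(y + 1, x + 1);
}

Eigen::Matrix3d expSO3(const Eigen::Vector3d& w) {   // exponential map of so(3)
  double angle = w.norm();
  if (angle < 1e-12) return Eigen::Matrix3d::Identity();
  return Eigen::AngleAxisd(angle, w / angle).toRotationMatrix();
}

// Residual of one reference edge pixel under (R, p), cf. Eq. (5).
double residual(const cv::Mat& dt, const Pinhole& K, const RefEdgePixel& e,
                const Eigen::Matrix3d& R, const Eigen::Vector3d& p) {
  Eigen::Vector3d f((e.u - K.cx) / K.fx * e.depth,
                    (e.v - K.cy) / K.fy * e.depth, e.depth);   // back-project
  Eigen::Vector3d g = R * f + p;                               // transform
  if (g.z() < 0.1) return 50.0;
  return sampleDT(dt, K.fx * g.x() / g.z() + K.cx,
                  K.fy * g.y() / g.z() + K.cy);                // project + V(n)
}

// Gauss-Newton with numeric Jacobians and right-multiplied update T <- T * exp(xi).
void alignEdges(const cv::Mat& dt, const Pinhole& K,
                const std::vector<RefEdgePixel>& edges,
                Eigen::Matrix3d& R, Eigen::Vector3d& p, int iterations = 10) {
  const double eps = 1e-5;
  for (int it = 0; it < iterations; ++it) {
    Eigen::Matrix<double, 6, 6> H = Eigen::Matrix<double, 6, 6>::Zero();
    Eigen::Matrix<double, 6, 1> rhs = Eigen::Matrix<double, 6, 1>::Zero();
    for (const RefEdgePixel& e : edges) {
      double r0 = residual(dt, K, e, R, p);
      Eigen::Matrix<double, 1, 6> J;
      for (int k = 0; k < 6; ++k) {               // numeric derivative w.r.t. xi_k
        Eigen::Matrix<double, 6, 1> xi = Eigen::Matrix<double, 6, 1>::Zero();
        xi(k) = eps;
        Eigen::Matrix3d Rp = R * expSO3(xi.tail<3>());
        Eigen::Vector3d pp = R * xi.head<3>() + p;
        J(k) = (residual(dt, K, e, Rp, pp) - r0) / eps;
      }
      H += J.transpose() * J;
      rhs += -J.transpose() * r0;
    }
    Eigen::Matrix<double, 6, 1> xi = H.ldlt().solve(rhs);
    p = R * xi.head<3>() + p;                     // T <- T * exp(xi)
    R = R * expSO3(xi.tail<3>());
  }
}
```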
6 Sliding Window-Based Sensor Fusion

Given two time instants that correspond to two images, we can write the IMU propagation model for position, velocity and rotation with respect to the earth's inertial frame.

6.2 IMU Preintegration

We adopt the IMU preintegration approach proposed in Yang and Shen (2016). The linear acceleration $a^{b_t}$ and angular velocity $\omega^{b_t}$ at time $t$ are modeled as

$$ a^{b_t} = a^{b_t*} + b^{b_t}_a + n^{b_t}_a \qquad (14) $$
$$ \omega^{b_t} = \omega^{b_t*} + b^{b_t}_\omega + n^{b_t}_\omega \qquad (15) $$

where $a^{b_t*}$ and $\omega^{b_t*}$ are the true values, $b^{b_t}_a$ and $b^{b_t}_\omega$ are slowly varying biases which are modeled as Gaussian random walks, and $n^{b_t}_a$ and $n^{b_t}_\omega$ are additive Gaussian white noises. The integration of IMU measurements between time instants $k$ and $k+1$ is

$$ \hat{z}^k_{k+1} = \begin{bmatrix} \hat{\alpha}^k_{k+1} \\ \hat{\beta}^k_{k+1} \\ \hat{q}^k_{k+1} \end{bmatrix} = \begin{bmatrix} \iint_{t\in[k,k+1]} R^k_t\, a^{b_t}\, dt^2 \\ \int_{t\in[k,k+1]} R^k_t\, a^{b_t}\, dt \\ \int_{t\in[k,k+1]} \tfrac{1}{2}\begin{bmatrix} -\lfloor\omega^{b_t}\rfloor_\times & \omega^{b_t} \\ -\omega^{b_t\,T} & 0 \end{bmatrix} q^k_t\, dt \end{bmatrix}. \qquad (16) $$

The residual function between the states and the IMU integration is defined as

$$ r_{\mathcal{S}_i}(\hat{z}^k_{k+1}, \mathcal{X}) = \begin{bmatrix} \delta\alpha^k_{k+1} \\ \delta\beta^k_{k+1} \\ \delta\theta^k_{k+1} \\ \delta b^{b_k}_a \\ \delta b^{b_k}_\omega \end{bmatrix} = \begin{bmatrix} R^{b_k}_w \big(p^w_{b_{k+1}} - p^w_{b_k} - v^w_{b_k}\Delta t + g^w \tfrac{\Delta t^2}{2}\big) - \hat{\alpha}^k_{k+1} \\ R^{b_k}_w \big(v^w_{b_{k+1}} - v^w_{b_k} + g^w \Delta t\big) - \hat{\beta}^k_{k+1} \\ 2\big[(\hat{q}^k_{k+1})^{-1} (q^w_{b_k})^{-1} q^w_{b_{k+1}}\big]_{xyz} \\ b^{b_{k+1}}_a - b^{b_k}_a \\ b^{b_{k+1}}_\omega - b^{b_k}_\omega \end{bmatrix}. \qquad (17) $$
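A minimal sketch of how the preintegrated terms in (16) can be accumulated from raw IMU samples with simple Euler steps is given below. Bias random walks, the covariance propagation of (18) and the bias Jacobians are omitted; the structure and names are illustrative assumptions rather than the released implementation.

```cpp
// Sketch of discrete IMU preintegration between camera frames k and k+1,
// accumulating the alpha/beta/q terms of Eq. (16) with simple Euler steps.
#include <Eigen/Dense>
#include <vector>

struct ImuSample { Eigen::Vector3d acc, gyr; double dt; };   // body-frame a, w

struct Preintegrated {
  Eigen::Vector3d alpha = Eigen::Vector3d::Zero();           // double integral of R a
  Eigen::Vector3d beta  = Eigen::Vector3d::Zero();           // integral of R a
  Eigen::Quaterniond q  = Eigen::Quaterniond::Identity();    // rotation from frame k to t
};

Preintegrated preintegrate(const std::vector<ImuSample>& samples,
                           const Eigen::Vector3d& ba,        // accelerometer bias
                           const Eigen::Vector3d& bw) {      // gyroscope bias
  Preintegrated z;
  for (const ImuSample& s : samples) {
    Eigen::Vector3d a = s.acc - ba;                          // bias-corrected accel
    Eigen::Vector3d w = s.gyr - bw;                          // bias-corrected gyro
    Eigen::Matrix3d R = z.q.toRotationMatrix();              // R^k_t
    z.alpha += z.beta * s.dt + 0.5 * R * a * s.dt * s.dt;
    z.beta  += R * a * s.dt;
    Eigen::Vector3d half = 0.5 * w * s.dt;                   // small-angle quaternion
    Eigen::Quaterniond dq(1.0, half.x(), half.y(), half.z());
    z.q = (z.q * dq).normalized();
  }
  return z;
}
```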
The covariance $P^k_{k+1}$ can be calculated by iteratively linearizing the continuous-time dynamics of the error term and then updating it with a discrete-time approximation:

$$ P^k_{t+\delta t} = (I + F_t \delta t)\, P^k_t\, (I + F_t \delta t)^T + (I + G_t \delta t)\, Q_t\, (I + G_t \delta t)^T, \qquad (18) $$

with the initial condition $P^k_k = 0$. $F_t$ and $G_t$ are the state transition Jacobians with respect to the states and the IMU measurement noise, respectively. Detailed derivations can be found in Yang and Shen (2016). IMU preintegration forms constraints between consecutive state variables (inertial links in the graph model, see Fig. 3).

6.3 Multi-constrained Edge Alignments

Edge alignment (Sect. 5) is performed between the latest keyframe within the sliding window and the latest incoming frame (also referred to as the current frame). The resultant visual measurements are named keyframe-to-frame links in the graph model (Fig. 3). In addition, since significant drift may occur after aggressive motions, we introduce a local loop closure module for recovery. Once a new keyframe is added, loop closure detection is performed to seek possible visual measurements between existing keyframes within the sliding window and the new keyframe using edge alignment. Note that a cross check is adopted to avoid wrong loop closures: the cross check is passed if and only if the two corresponding estimated rigid-body transformations are consistent. The outputs from the loop closure detection module are denoted as loop closure links in the graph model (Fig. 3).

Suppose the visual measurement between reference frame $i$ and aligned frame $j$ obtained from edge alignment is $\hat{z}^j_i = T^{j*}_i$. The residual function is defined as

$$ r_{\mathcal{S}_c}(\hat{z}^j_i, \mathcal{X}) = \begin{bmatrix} \delta p^j_i \\ \delta\theta^j_i \end{bmatrix} = \begin{bmatrix} R^{b_j}_w (p^w_{b_i} - p^w_{b_j}) - R^b_c (R^{c_j}_{c_i} t^c_b + \hat{p}^{c_j}_{c_i}) - t^b_c \\ 2\big[(q^b_c\, \hat{q}^{c_j}_{c_i}\, q^c_b)^{-1} (q^w_{b_j})^{-1} q^w_{b_i}\big]_{xyz} \end{bmatrix}, \qquad (19) $$

where $\hat{q}^j_i$ is the quaternion representation of $\hat{R}^j_i$, and vice versa. It can be derived mathematically that the corresponding covariance $P^j_i$ is the inverse of the Hessian matrix $J^T W J$ at the final Gauss-Newton iteration.

6.4 IMU-aided External Check

Since IMUs provide noisy but outlier-free measurements, estimation using IMU preintegration is reliable in the short term. Moreover, though we can tune the related parameters so that edge alignment exhibits good performance, it can fail when the surroundings are complicated and unknown in advance. Additionally, tuning the parameters is not an easy job, as different external conditions may result in different parameter settings. We propose to use IMU preintegration to threshold the performance of instantaneous edge alignment. The characteristics of an IMU can be easily calculated offline and are assumed to be known prior to the start of the system. We detect possible false edge alignments according to the difference between the IMU preintegration and the edge alignment estimate, and ignore the visual measurements from edge alignment if they are not consistent with the IMU preintegration (for both the rotation and translation estimation). The IMU-aided external check is a vitally important step towards a practical and robust system.

We declare edge alignment estimates as failures if either of the criteria (Sect. 5.3 and Sect. 6.4) fails.

6.5 Two-way Marginalization

Due to the limited memory and computational resources of the system, we can only maintain a certain number of states and measurements within the sliding window. We convert states that carry less information into priors $\{\Lambda_p, b_p\}$ by marginalization, where $\Lambda_p = H_p^T H_p$. Note that the effectiveness of loop closure (Sect. 6.3) and drift elimination depends on whether an older state is kept within the sliding window. For this reason, unlike traditional approaches, which only marginalize old states, we use the two-way marginalization scheme that was first introduced in our earlier work, Shen et al (2014), to selectively remove old or more recent states in order to enlarge the regions covered by the sliding window.

Fig. 5 illustrates the process of our two-way marginalization. Front marginalization removes the second newest state, while back marginalization removes the oldest state. Blue circles represent key states, green circles represent the states to be marginalized and brown circles represent the incoming states. The relation between states and frames is that states include poses and velocities, while frames include poses and images. Each frame has its corresponding state and vice versa. States are linked by IMU preintegration (inertial links), incremental edge alignment (tracking links), loop closure (loop closure links) and the prior (prior link). To perform front marginalization, the second newest state is first linked with the incoming state (step 1) and then marginalized out (step 2). For back marginalization, the oldest state is simply marginalized out (steps 1-2). After marginalization, the third step decides which state is to be marginalized in the next round (front marginalization or back marginalization). Mathematically, to marginalize a specific state, we remove all links related to it and then add the information of the removed links into the new prior.
Fig. 5: The process of our two-way marginalization, which marginalizes all the available information (motion estimates from
edge alignment, inertial measurement, loop closure relation and prior) into a new prior and maintains bounded computational
complexity. Front marginalization marginalizes the second newest state, while back marginalization marginalizes the oldest
state within the sliding window.
…outliers in edge alignment that pass the DEA self check (Sect. 5.3) and IMU-aided external check (Sect. 6.4). $W^j_i$ is computed according to Huber norm thresholding on the current estimate:

$$ (W^j_i)_{ul} = \begin{cases} I_{3\times3}, & \text{if}\ \|R^0_j(p^0_i - p^0_j) - \hat{t}^j_i\| \le c_t \\[4pt] \dfrac{c_t}{\|R^0_j(p^0_i - p^0_j) - \hat{t}^j_i\|}\, I_{3\times3}, & \text{otherwise} \end{cases} \qquad (24) $$

$$ (W^j_i)_{lr} = \begin{cases} I_{3\times3}, & \text{if}\ \big\|2[(\hat{q}^j_i)^{-1}(q^0_j)^{-1} q^0_i]_{xyz}\big\| \le c_a \\[4pt] \dfrac{c_a}{\big\|2[(\hat{q}^j_i)^{-1}(q^0_j)^{-1} q^0_i]_{xyz}\big\|}\, I_{3\times3}, & \text{otherwise,} \end{cases} \qquad (25) $$

where $(W^j_i)_{ul}$ is the upper left $3\times3$ block of $W^j_i$, $(W^j_i)_{lr}$ is the lower right $3\times3$ block of $W^j_i$, $I_{3\times3}$ is the identity matrix, and $c_t$ and $c_a$ are the given translation and angular thresholds, respectively.

7 Experiments

For sensing, we use a VI-sensor, which consists of a MEMS IMU and two global shutter cameras in a fronto-parallel stereo configuration. A power-efficient small-form-factor computer, an Intel NUC with a dual-core i5-4250U CPU running at 1.3 GHz and 16 GB RAM, is used for the computing needs. All the algorithms are developed in C++ with ROS as the interfacing robotics middleware. The IMU generates data at 200 Hz and the stereo camera produces time-synchronized data at 25 Hz.

7.1 Real-time Implementation

To obtain the depth map from the stereo camera, we use the block matching algorithm implemented in OpenCV (StereoBM). Since the proposed edge alignment requires depth values at edge pixels only, simple stereo block matching suffices for our needs. We adopt image pyramids (with three levels) in the edge alignment to handle the large image displacement caused by fast motion and to increase the speed of convergence of the underlying iterative optimization procedure. We set the size of the sliding window to 30. The threshold on the average reprojection distance of the edge alignment self check is set to 5. For the local loop closure module, we first do the cross check of the edge alignment at the coarsest level, and ignore the candidates that fail this test. We then do the cross check of the edge alignment with full image pyramids for the remaining candidates. Meanwhile, we restrict the number of cross checks with full image pyramids so as to limit the maximum time spent on the loop closure module. The computing times of each component are summarized in Table 1.

We do not impose a global prior (like fixing the oldest pose) when solving equations (13) and (22). Instead, we solve equations (13) and (22) without any global prior (the resultant equations may not be well constrained, so we use a perturbed Cholesky decomposition that ensures positive definiteness to solve them). The obtained positions and yaw angles of states in the sliding window after the iteration are subtracted by the position and yaw angle difference of the oldest pose before and after the iteration. We do NOT enforce a global prior, as the pitch and roll angles of the oldest state in the sliding window are observable. The initial position and yaw angle of the oldest pose at time instant $b_0$ before the iteration are zero. The prior matrix $\Lambda_p$ and prior vector $b_p$ obtained in the marginalization step are relative priors between states.
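As an illustration of the depth computation described above, the following sketch runs OpenCV's StereoBM on a rectified grayscale pair and keeps disparities only at Canny edge pixels; the block-matching settings, Canny thresholds and baseline handling are placeholders rather than the parameters of the released code.

```cpp
// Sketch: depth only at edge pixels of a new keyframe via StereoBM + Canny.
#include <opencv2/opencv.hpp>
#include <vector>

struct EdgeDepth { cv::Point2f px; float depth; };

std::vector<EdgeDepth> edgeDepths(const cv::Mat& left, const cv::Mat& right,
                                  double fx, double baseline) {
  // StereoBM expects 8-bit, single-channel, rectified images.
  cv::Ptr<cv::StereoBM> bm = cv::StereoBM::create(64, 21);   // ndisp, block size
  cv::Mat disp16;
  bm->compute(left, right, disp16);            // fixed-point disparity (x16)

  cv::Mat edges;
  cv::Canny(left, edges, 50, 150);             // thresholds are placeholders

  std::vector<EdgeDepth> out;
  for (int v = 0; v < edges.rows; ++v)
    for (int u = 0; u < edges.cols; ++u) {
      if (!edges.at<uchar>(v, u)) continue;    // keep edge pixels only
      float d = disp16.at<short>(v, u) / 16.0f;
      if (d <= 1.0f) continue;                 // reject invalid disparities
      out.push_back({cv::Point2f((float)u, (float)v),
                     (float)(fx * baseline / d)});   // depth = f * b / disparity
    }
  return out;
}
```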
Fig. 6: Part of the captured images during a walk around a circular path with various lighting conditions. (a) Transition from an indoor corridor to an outdoor corridor. (b) Transition from an outdoor corridor to an indoor corridor. (c) Featureless and structured surroundings. (d) Fast changing lighting conditions and strong reflections on windows.
Fig. 7: The estimated trajectory of a walk around a circular path with different lighting conditions (axes: X and Y, in meters). The plot legend marks the estimated trajectory, keyframe insertions via the DEA self check, and failure detections via the IMU-aided external check.

…exposure in a dimly lit room, and using this exposure setting in a brighter room causes severe degradation of image quality.

We note that previous visual odometry algorithms which rely on the photo-consistency assumption fail in this challenging environment. The dense approach proposed by Kerl et al (2013) fails to produce any meaningful results due to the violation of the photo-consistency assumption. Furthermore, feature-based algorithms also fail in this situation because of the rather featureless corridors.

Fig. 7 presents the estimated output of our estimator. The red curve represents the estimated trajectory, green circles represent the locations at which new keyframes are inserted by the edge alignment self check, and blue circles represent the locations at which edge alignment is detected as failed by the IMU-aided external check. Some of the images captured during this experiment are shown in Fig. 6 (a)-(d). The total distance travelled is about 120 meters and the final drift is about 1.4 meters. More details can be found in the accompanying video.

The segments within the blue dashed boxes indicate the locations at which the captured surroundings transition from an indoor corridor to an outdoor corridor or from an outdoor corridor to an indoor corridor (Fig. 6 (a), (b)). Since the lighting conditions change rapidly and greatly, edge detection is not consistent between frames, which results in alignment failure. Our proposed system is able to detect this alignment failure by the IMU-aided external check, and this increases the robustness of the system.

The segments within the purple dashed boxes indicate the locations at which the captured surroundings are featureless (Fig. 6 (c)). Since our estimator tracks incremental motions based on edges instead of sparse features, the alignment succeeds in most cases. Though occasional failures exist (see the blue circles), our system overcomes them by the IMU-aided external check.

The segments within the red dashed boxes indicate the locations at which the lighting conditions of the captured surroundings change rapidly and alternately (Fig. 6 (d)). The changing and alternating lighting conditions are caused by the transitions between the glass and the borders of the windows. We also notice that there are strong reflections on the windows on both sides. Our edge alignment module inserts keyframes frequently to handle these cases (see the green circles). Again, the IMU-aided external check is required (see the blue circles).

7.3 Throw it!

In this experiment, we test the proposed system under extreme experimental conditions. This experiment was first designed to demonstrate the superior tracking performance of our previous work Ling and Shen (2015). We play back the recorded data and redo this experiment using the system proposed in this work. In this challenging experiment, we throw the VI-sensor while walking (Fig. 1). The total walking distance is about 50 meters and the final position drift is about 2.24 meters. The VI-sensor is thrown eight times in total. The estimation results of our proposed method
are shown in Fig. 8. From the figure, we see that our estimator can successfully track the motions of these eight throws, resulting in a smooth estimated trajectory. Though edge alignment in our proposed system is able to handle the large image displacement caused by challenging motions, it fails when the motions become more and more aggressive (the captured images become more and more blurry; see the green and blue circles for indications). Inertial measurements are the last resort that provide crucial links between consecutive states to ensure continuous operation of the estimator. Moreover, failure detection via the edge alignment self check (highlighted with green circles, detailed in Sect. 5.3) and failure detection via the IMU prior (highlighted with blue circles, detailed in Sect. 6.4) are of vital importance to the smoothness of the estimated trajectory. In all cases, our local loop closure is able to largely eliminate drifts after throwing (Sect. 6.3).

Notice that, to the best of our knowledge, this experiment is the toughest test for a visual-inertial estimator that has ever been reported.

Fig. 8: The estimated trajectory of a walk around a circular path during which we throw the VI-sensor while walking (axes in meters; the plot highlights failure detections via the IMU prior).

7.4 Performance on the EuRoC MAV Dataset

We compare our proposed method with other state-of-the-art approaches on the public EuRoC MAV dataset (Burri et al (2016)). The complexity of the sequences in this dataset varies in terms of trajectory length, flight dynamics, and illumination conditions. The reference methods are OKVIS (Leutenegger et al (2015)) and ROVIO (Bloesch et al (2015)). Both OKVIS and ROVIO contain default parameters for the EuRoC MAV dataset in their open-source implementations. Since we use stereo cameras in our proposed system, for fair comparison, we set the "doStereoInitialization" flag to true in ROVIO, and also use stereo images. For evaluation we adopt the average relative error metrics proposed in Geiger et al (2012). The summaries are shown in Table 2. No data means the concerned method fails to converge at some point in the sequence.

OKVIS, a tightly-coupled feature-based approach, is the best in terms of ARE-rot and ARE-trans. Nevertheless, it fails to track the V2_03_difficult sequence. The other approaches are able to track all the sequences successfully. The ARE-rot and ARE-trans of "Edge+IMU" are smaller than those of "Edge+IMU+Loop" in most of the sequences. This is because local loop closure usually causes a noticeable pose correction to the latest estimate, which is unfriendly to the relative metrics ARE-rot and ARE-trans. The same holds for the comparison of "Edge-Only" and "Edge+Loop". In terms of ARE-rot, our proposed method ("Edge+IMU+Loop") is better than ROVIO. However, for ARE-trans, ROVIO obtains smaller errors than our approach. The reason is that the estimation of rotation is not related to the scene depth; its error only depends on the number of pixels that well constrain the rotation. ARE-trans greatly depends on scene depth, thus ROVIO, which is a tightly-coupled approach that jointly optimizes the poses and the scene depth, performs better than our method.

7.5 Discussions on Convergence Basin

One advantage of our proposed method compared to dense tracking based on image intensities is that the convergence basin is larger. Put differently, with the proposed formulation, the iterations converge even for a rather poor initial guess. We evaluate this property by skipping frames (downsampling the image temporal frequency). Suppose the original image temporal frequency is $f_n$ and the number of skipped frames is $s_m$. The downsampled temporal frequency is

$$ f'_n = \frac{f_n}{1 + s_m}. \qquad (26) $$

We use the most difficult sequence (V2_03_difficult) of the EuRoC MAV dataset to give a detailed assessment of our system. Since OKVIS fails to track this sequence, we exclude it from this comparison. We compare our system with ROVIO. As in the previous experiment, we set the "doStereoInitialization" flag to true in ROVIO for fair comparison.
Table 2: Average relative angle error (ARE-rot, deg/m) of different approaches on the EuRoC MAV dataset. The best result
is bold. No data means the concerned method fails to converge at some point in the sequence.
Table 3: Average relative translation error (ARE-trans, m/m) of different approaches on the EuRoC MAV dataset. The best
result is bold. No data means the concerned method fails to converge at some point in the sequence.
Table 4: Comparison between different methods with different numbers of skipped frames in the V2_03_difficult sequence of the EuRoC MAV dataset. The error metric is the average relative angle error (ARE-rot, deg/m). The best result is bold. No data means the concerned method fails to converge at some point in the sequence.

Table 5: Comparison between different methods with different numbers of skipped frames in the V2_03_difficult sequence of the EuRoC MAV dataset. The error metric is the average relative translation error (ARE-trans, m/m). The best result is bold. No data means the concerned method fails to converge at some point in the sequence.
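For reference, one plausible way to compute relative error metrics with the deg/m and m/m units reported in Tables 2-5, in the spirit of the KITTI-style evaluation of Geiger et al (2012), is sketched below. The segment length, trajectory alignment and sampling are assumptions for illustration and may differ from the evaluation actually used.

```cpp
// Hedged sketch of a relative error metric (deg/m and m/m) between an
// estimated and a ground-truth trajectory sampled at the same timestamps.
#include <Eigen/Dense>
#include <cmath>
#include <vector>

struct Pose { Eigen::Matrix3d R; Eigen::Vector3d p; };

static Pose relative(const Pose& a, const Pose& b) {         // a^{-1} * b
  return { a.R.transpose() * b.R, a.R.transpose() * (b.p - a.p) };
}

void relativeErrors(const std::vector<Pose>& est, const std::vector<Pose>& gt,
                    double segment_m, double* are_rot_deg_per_m,
                    double* are_trans_m_per_m) {
  double rot_sum = 0, trans_sum = 0;
  int count = 0;
  // Cumulative ground-truth path length, used to pick segment end points.
  std::vector<double> dist(gt.size(), 0.0);
  for (size_t i = 1; i < gt.size(); ++i)
    dist[i] = dist[i - 1] + (gt[i].p - gt[i - 1].p).norm();

  for (size_t i = 0; i < gt.size(); ++i) {
    size_t j = i;
    while (j < gt.size() && dist[j] - dist[i] < segment_m) ++j;
    if (j >= gt.size()) break;
    Pose d_est = relative(est[i], est[j]);
    Pose d_gt  = relative(gt[i],  gt[j]);
    Pose err   = relative(d_gt, d_est);                       // residual motion
    double c   = std::min(1.0, std::max(-1.0, 0.5 * (err.R.trace() - 1.0)));
    double len = dist[j] - dist[i];
    rot_sum   += (std::acos(c) * 180.0 / M_PI) / len;         // deg per meter
    trans_sum += err.p.norm() / len;                          // meters per meter
    ++count;
  }
  *are_rot_deg_per_m = count ? rot_sum / count : 0.0;
  *are_trans_m_per_m = count ? trans_sum / count : 0.0;
}
```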
We further test our system performance in an outdoor environment with more complex textures and less prominent edge data. There are trees, grass and shadows in the test sequence. Our system is able to handle this tracking sequence. The estimated trajectory is shown in Fig. 9 (a). One of the captured images is shown in Fig. 9 (b). The corresponding edges and distance transform are shown in Fig. 9 (c) and (d). The reference keyframe edges are shown in Fig. 9 (e). The total travel distance is about 120 meters and the final position drift is about 1.5 meters. More tracking details can be found in the supplementary video: https://1drv.ms/u/s!ApzRxvwAxXqQmgX66v7srdWZNvAs.

Fig. 9: Our system is able to track in an outdoor environment with more complex textures and less prominent edge data. (a) The estimated trajectory (in meters). (b) One of the captured images. (c) The edges detected in the current frame. (d) The distance transform of the current frame. (e) The reference keyframe edges. More tracking details can be found in the supplementary video: https://1drv.ms/u/s!ApzRxvwAxXqQmgX66v7srdWZNvAs

8 Conclusions and Future Work

We propose a novel and robust real-time system for state estimation of aggressive motions. Our system is designed specifically for aggressive quadrotor flights or other applications in which aggressive motions are encountered (such as augmented reality). We employ a novel edge-tracking formulation for visual relative pose estimation. We also propose a semi-tightly coupled probabilistic framework for fusion of sensor states over a sliding window. The multi-thread framework enables fast and stable estimation with only the CPU of an off-the-shelf computing platform. Experiments have verified the performance of our system and its potential for use in embedded system applications.

We note that tightly-coupled methods, which jointly optimize poses and point depths, outperform our semi-tightly coupled approach if their front-end trackers work well. As future work, we will integrate the front-end edge tracker and back-end tightly-coupled optimization in a
whole framework to achieve better performance. The main challenge is to handle the greatly increased system complexity. A probabilistic formulation of the edge alignment will also be considered.

9 Acknowledgments

The authors acknowledge the funding support from HKUST internal grant R9341 and the HKUST institutional scholarship. We would like to thank all AURO reviewers for their exceptionally useful reviews.

References

Baker S, Matthews I (2004) Lucas-Kanade 20 years on: A unifying framework. Intl J Comput Vis 56(3):221–255
Bay H, Ess A, Tuytelaars T, Van Gool L (2008) Speeded-up robust features. Computer Vision and Image Understanding
Bloesch M, Omari S, Hutter M, Roland S (2015) Robust visual inertial odometry using a direct EKF-based approach. In: Proc. of the IEEE/RSJ Intl. Conf. on Intell. Robots and Syst.
Burri M, Nikolic J, Gohl P, Schneider T, Rehder J, Omari S, Achtelik MW, Siegwart R (2016) The EuRoC micro aerial vehicle datasets. Intl J Robot Research 35(10):1–11
Christian F, Luca C, Frank D, Davide S (2015) IMU preintegration on manifold for efficient visual-inertial maximum-a-posteriori estimation. In: Proc. of Robot.: Sci. and Syst.
Dong-Si T, Mourikis AI (2012) Estimator initialization in vision-aided inertial navigation with unknown camera-IMU calibration. In: Proc. of the IEEE/RSJ Intl. Conf. on Intell. Robots and Syst.
Engel J, Sturm J, Cremers D (2013) Semi-dense visual odometry for a monocular camera. In: Proc. of the IEEE Intl. Conf. Comput. Vis., Sydney, Australia
Engel J, Schöps T, Cremers D (2014) LSD-SLAM: Large-scale direct monocular SLAM. In: Euro. Conf. on Comput. Vis.
Felzenszwalb PF, Huttenlocher DP (2012) Distance transforms of sampled functions. Theory of Computing 8(1):415–428
Fitzgibbon A (2003) Robust registration of 2D and 3D point sets. Image and Vision Computing 21(14):1145–1153
G Huang, M Kaess and Leonard JJ (2014) Towards consistent visual-inertial navigation. In: Proc. of the IEEE Intl. Conf. on Robot. and Autom., Hong Kong
Geiger A, Lenz P, Urtasun R (2012) Are we ready for autonomous driving? The KITTI vision benchmark suite. In: Proc. of the IEEE Intl. Conf. on Pattern Recognition
Harris CG, Pike JM (1987) 3D positional integration from image sequences. In: Proc. Alvey Vision Conference
Hesch JA, Kottas DG, Bowman SL, Roumeliotis SI (2014) Consistency analysis and improvement of vision-aided inertial navigation. IEEE Trans Robot 30(1):158–176
Huang AS, Bachrach A, Henry P, Krainin M, Maturana D, Fox D, Roy N (2011) Visual odometry and mapping for autonomous flight using an RGB-D camera. In: Proc. of the Intl. Sym. of Robot. Research, Flagstaff, AZ
Kerl C, Sturm J, Cremers D (2013) Robust odometry estimation for RGB-D cameras. In: Proc. of the IEEE Intl. Conf. on Robot. and Autom.
Kuse M, Shen S (2016) Robust camera motion estimation using direct edge alignment and sub-gradient method. In: Proc. of the IEEE Intl. Conf. on Robot. and Autom.
L Heng, G H Lee, and M Pollefeys (2014) Self-calibration and visual SLAM with a multi-camera system on a micro aerial vehicle. In: Proc. of Robot.: Sci. and Syst.
Leutenegger S, Furgale P, Rabaud V, Chli M, Konolige K, Siegwart R (2015) Keyframe-based visual-inertial odometry using nonlinear optimization. Intl J Robot Research 34(3):314–334
Li M, Mourikis A (2013) High-precision, consistent EKF-based visual-inertial odometry. Intl J Robot Research 32(6):690–711
Ling Y, Shen S (2015) Dense visual-inertial odometry for tracking of aggressive motions. In: Proc. of the IEEE Intl. Conf. on Robot. and Bio.
Ling Y, Liu T, Shen S (2016) Aggressive quadrotor flight using dense visual-inertial fusion. In: Proc. of the IEEE Intl. Conf. on Robot. and Autom.
Lowe DG (2004) Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision 60(2):91–110
Ma Y, Soatto S, Kosecka J, Sastry SS (2012) An invitation to 3-D vision: from images to geometric models, vol 26. Springer Science & Business Media
Newcombe RA, Lovegrove S, Davison AJ (2011) DTAM: Dense tracking and mapping in real-time. In: Proc. of the IEEE Intl. Conf. Comput. Vis.
Omari S, Bloesch M, Gohl P, Siegwart R (2015) Dense visual-inertial navigation system for mobile robots. In: Proc. of the IEEE Intl. Conf. on Robot. and Autom.
Rosten E, Drummond T (2006) Machine learning for high-speed corner detection. In: Euro. Conf. on Comput. Vis.
Rusinkiewicz S, Levoy M (2001) Efficient variants of the ICP algorithm. In: International Conference on 3-D Imaging and Modeling, pp 145–152
Scaramuzza D, Achtelik M, Doitsidis L, Fraundorfer F, Kosmatopoulos E, Martinelli A, Achtelik M, Chli M, Chatzichristofis S, Kneip L, Gurdan D, Heng L, Lee G,