Camera Algorithm for Estimating the Distance of Objects
between Autonomous Vehicles
Master Thesis
Technical University Berlin
Faculty of Electrical Engineering and Computer Science
Department of Computer Vision and Remote Sensing
Field of Study: Computer Engineering
Submitted on:
Statutory Declaration (Eidesstattliche Erklärung)

............................................................................
Signature
Abstract
The aim of this work is the investigation of camera-based techniques for distance esti-
mation between two autonomous vehicles. While both monocular- and stereo-camera
methods are explored, this study focuses on the usage of fiducial markers.
To this end, existing fiducial markers are discussed and a selection is made. Based on this selection, three marker configurations are proposed and applied to different distance estimation methods. The chosen markers are AprilTag and WhyCon. Their distances
are estimated by means of Perspective-n-Point, 3D position calculation of a circle and
stereo-based triangulation.
Within this study the presented methods are evaluated based on their distance esti-
mation accuracy and applicable range. They are compared with each other and with
the common stereo method Semi-Global-Matching. Moreover, the influence of uncer-
tainties is explored with reference to geometrical calibration. A setup is presented
to evaluate the techniques based on real-world and simulated data. In order to gain insights into the properties of the methods, a simulation is used that facilitates variation of the image data. In addition, a Monte-Carlo-Simulation allows calibration uncertainty to be modeled. The obtained observations are substantiated by two real-world experiments.
The results demonstrate the potential of fiducial markers for relative distance estima-
tion of vehicles in terms of high accuracy and low uncertainty. The lower sensitiv-
ity to uncertainties in camera calibration makes fiducial markers preferable to stereo
methods.
Zusammenfassung
Contents
List of Figures
Nomenclature
1 Introduction
   1.1 Motivation
   1.2 Research Questions
   1.3 Organization of the Thesis
2 Related Work
   2.1 Stereo Methods
   2.2 Monocular Methods
   2.3 Outline
3 Fundamentals
   3.1 Camera Modeling
      3.1.1 Camera Model
      3.1.2 Distortion Model
      3.1.3 Noise Model
   3.2 Position estimation
      3.2.1 Stereo Triangulation
      3.2.2 Perspective-n-Point
      3.2.3 3D Position Calculation of a Circle
   3.3 Image Processing
      3.3.1 AprilTag
      3.3.2 WhyCon and WhyCode
      3.3.3 SGM
   3.4 Image Synthesis
      3.4.1 Basic Rendering
      3.4.2 Extended Rendering Pipeline
      3.4.3 Anti-Aliasing
4 Evaluation Pipeline
   4.1 General Setup and Definitions
   4.2 Real-World Datasets
   4.3 Simulation
      4.3.1 Simulation Stage
      4.3.2 Application Stage
   4.4 Evaluation Procedure
5 Integration of Methods
   5.1 Integration of AprilTags
      5.1.1 Application
      5.1.2 Preliminary Evaluation and Summary
   5.2 Integration of WhyCon
      5.2.1 Application
      5.2.2 Proposed Code System and Extraction
      5.2.3 Preliminary Evaluation - Coding
      5.2.4 Summary
   5.3 Integration of Stereo Methods
      5.3.1 Application of SGM
      5.3.2 Triangulation of Markers
      5.3.3 Preliminary Evaluation and Summary
6 Evaluation
   6.1 Qualitative Comparison
      6.1.1 Distance
      6.1.2 Application Range and View-Angle
      6.1.3 Image Exposure
   6.2 Consideration of Calibration Uncertainty
      6.2.1 Direct Comparison
      6.2.2 Correlation
      6.2.3 Marker Uncertainty
      6.2.4 Camera Uncertainty
      6.2.5 Influence of the Baseline
   6.3 Accumulation of RPV-Methods
      6.3.1 Correlation of RPV-Methods
      6.3.2 Combination of RPV-Methods
7 Conclusion
References
Appendix
List of Figures
1 Simulation based illustration of 'Virtual Coupling' of trains
2 Visualization of exemplary stereo-based methods
3 Visualization of natural features
4 Fiducial markers used for reconstruction
5 Selection of rectangular shaped fiducial markers
6 Selection of circular shaped fiducial markers
7 Camera model in a stereo setup
8 Visualization of the essential forms of radial distortion
9 Illustration of the noise model
10 Epipolar geometry
11 The stereo normal case
12 Theoretical dependency of disparity and distance in a stereo setup
13 Comparison of PnP-methods in a planar setup
14 Visualization of the concept to compensate radial distortion of ellipses
15 Illustration of Semi-Global-Matching
16 The graphic rendering pipeline
17 The extended rendering pipeline
18 Specification of significant coordinate systems and transformations
19 Schematic representation of the evaluation pipeline
20 Scenedata content
21 Exemplary subimages from the proposed datasets
22 Specification of the marker areas
23 Exemplary simulated subimages from the proposed datasets
24 Embedding of the shader pipeline
25 Important components of the scenegraph
26 General procedure in the application stage
27 Explanation of evaluation procedure and plots
28 Preliminary evaluation of AprilTags
29 Specification of marker configurations for the evaluation
30 Applied processing chain with WhyCon
31 Visualization of steps to estimate the angular shift of the code
32 Pipeline for extracting the binary code
33 Extracts from the experiments of WhyCon detection
34 Analysis of the detection range for different WhyCon patterns
35 Illustration of the SGM application
36 Comparison of SGM with marker-based triangulation
37 Comparison based on simulation and a real-world experiment
38 Evaluation of the application range for different view-angles
39 Exemplary simulated images for the application range comparison
40 Consideration of image exposure
41 Illustration of the influence of exposure
42 Comparison based on simulation with variation and uncertainty
43 Consideration of faulty outliers
44 Correlation to uncertainty in markers
List of Tables
1 Visualization of methods for anti-aliasing
2 Varied parameters in the evaluation pipeline
Register of listings
Nomenclature
CAD   Computer-Aided Design
MCS   Monte-Carlo-Simulation
SGM   Semi-Global-Matching
1 Introduction
This chapter provides the motivation for this work and a description of the objective, formulated as research questions. Then the general structure of the thesis is outlined.
1.1 Motivation
Relative distance estimation plays an important role in numerous safety-critical applications in the context of Advanced Driver Assistance Systems. Thus, systems such as Automatic Cruise Control in vehicle-platooning applications rely on continuous knowl-
edge about the relative position of the preceding vehicle. In this context, a future-
oriented application is investigated by the German Aerospace Center with "Virtual
Coupling" of trains in the project Next Generation Train (DLR, 2016, NGT). The
vision is to replace physical coupling by driving in short distances as illustrated in
Figure 1. This allows multiple trains to be composed during continuous driving, which
promises a more effective use of the rail system and shortened travel times.
Certainly, the estimation of the relative distance to the preceding train needs to be
highly reliable and accurate. To accomplish this task while ensuring flexibility and
independence from external systems, such as the Global Positioning System (GPS), vehicle-based sensors are commonly used. The most widely used are radar sensors, Light Detection
and Ranging (LIDAR) devices as well as camera systems. These sensors show different
advantages and disadvantages, which is why safety-critical systems are usually designed
to combine the results of different sensors. However, in recent years camera systems have enjoyed increasing attention due to greater available computing power and rapid advances in camera technology. Moreover, camera systems come with different methods to estimate the relative distance. For instance, they can be differentiated into monocular and stereo systems, and into methods based on image features, natural features or fiducial features. Due
to this versatility, this thesis investigates the use of camera sensors.
Before methods are applied in a safety-critical real-world scenario, they have to be examined regarding their reliability, accuracy and sensitivity to various influences. In general, real-world experiments are the key to a conclusive investigation. Nonetheless, real-world experiments do not allow the dependency on complex variations of the input parameters, such as calibration uncertainty, to be analyzed. Because of that, this work presents an evaluation setup that evaluates real-world data and complements these experiments with simulated data based on a Monte-Carlo-Simulation.
2 Related Work
Relative Positioning has been the subject of research for many years and has been
investigated in many studies with different approaches. (Ponte Muller, 2017) presents
a comprehensive review of different vehicle-based sensors to estimate the relative dis-
tance. That includes radar, LIDAR and monocular-, stereo- and time-of-flight camera
systems. Furthermore, different cooperative methods are discussed that include abso-
lute positioning methods and direct communication of autonomous vehicles. Compre-
hensive reviews for vision based vehicle detection and distance estimation are presented
in (Bernini et al., 2014; Dhanaekaran et al., 2015; Sivaraman and Trivedi, 2013b).
The following survey of related work differentiates between stereo-based and monocular
approaches. In conclusion, the methods selected for this work are named and justified.
2.1 Stereo Methods

Figure 2: Visualization of exemplary stereo-based methods: (a) Superpixel (Menze and Geiger, 2015), (b) Stixel (Erbs et al., 2011)
Another widespread method is optical flow (Lucas and Kanade, 1981), which is applied in stereo approaches to make use of temporal information. For instance, (Lenz et al., 2011) matches interest points in a temporal as well as a stereo sense to distinguish between moving and rigid spatial objects. (Menze and Geiger, 2015) proposed a slanted-plane model assuming that the 3D structure of the scene can be approximated by a set of piece-wise planar superpixels (Figure 2 (a)). They optimized their estimate by using a disparity map generated by SGM and a CAD model to apply 3D model fitting.
2.2 Monocular Methods

Figure 3: Visualization of natural features used for monocular distance estimation: (a) captured image with license plate, vehicle bottom position yb, vehicle width and lane width; (b) triangulation geometry with camera height HC and distance d
In contrast, license plate based distance estimation employs prior knowledge of a pattern with fixed dimensions. License plate recognition has already been applied in (Chen and Chen, 2011) and in (Lu et al., 2011), which additionally uses the vehicle's taillights to recognize the license plate. Recent research (Liu et al., 2017) proposed a robust license plate detection based on a fusion method and estimated the distance based on the known plate height to avoid influences caused by turning vehicles. These applications provide a reliable distance estimation when a pattern of known size is attached to the vehicle. However, this thesis rather addresses train applications, where license plates are not present. Nevertheless, this method demonstrates the opportunities that come with vehicle-attached markers of known size.
Fiducial features (also referred to as tags) exist in various appearances and are used in different applications such as 3D reconstruction, pose estimation and object identification. They generally consist of a rectangular or circular, two-dimensional shape of known dimension and include a visual code to identify the tag. The following
summary starts with a brief overview of tags created for reconstruction and ends with
markers especially designed for pose and distance estimation.
Reconstruction presupposes an accurate pose estimation of the camera, which requires multiple well-extracted image features. A state-of-the-art tag is Rune-Tag (Bergamasco et al., 2011, 2016). It is based on multiple small black circles that are themselves arranged in circles, shown in Figure 4 (a). By accommodating redundant coding, it achieves great robustness to occlusion, and the large number of dots favors an accurate pose estimation. Similarly, (Bergamasco et al., 2013) proposed Pi-Tag, which combines
multiple circles in a rectangular shape. By exploiting collinearity and cross-ratios
they reduce the influence of perspective distortion. (Birdal et al., 2016) proposed X-Tag, which uses a circular shape with randomly positioned inner dots and two additional white dots to identify the marker's orientation. While using multiple tags in a non-
planar configuration, they show superiority over using co-planar circle features given
by Rune-Tag during 3D reconstruction. Even though these tags allow precise pose
estimation they need many pixels to be detected, which is why they are not well suited
for distance estimation.
Rectangular shaped markers are frequently applied for pose estimation. A selection is
presented in Figure 5. The outer rectangular shape facilitates a reliable recognition
and provides four points that are used for the Perspective-n-Point problem (PnP) to
estimate the camera pose. Early approaches are represented by ARToolKit (Kato and
Billinghurst, 1999) and (Ababsa and Mallem, 2004), both originally developed for real-
time augmented reality systems. However, ARToolKit was successfully applied in (Seng
et al., 2013) to estimate the pose of an unmanned aerial vehicle. To differentiate be-
tween multiple markers the inner area of the tag is usually equipped with a coding
system. While ARToolKit uses Latin characters that are disadvantageous due to their
high computational effort of decoding, ARTag (Fiala, 2005) is equipped with a binary coding system based on forward error correction, which leads to easier generation and correlation of tags. Furthermore, this tag is robust to changes of lighting and partial
occlusion. (Olson, 2011) proposed AprilTag which improves upon ARTag in detection
and encoding by using a graph-based clustering for detecting the tag borders and a new
coding system preferring complex patterns to reduce the false positive rate. AprilTag is fully open source and has been successfully used in many applications. In (Britto et al., 2015) AprilTags are used to estimate the pose of an unmanned underwater vehicle, and (Winkens and Paulus, 2017) applied these markers for truck tracking at short range
before building a model of natural features for long range tracking. (Wang and Olson,
2016) improved AprilTag especially for the reliable detection of small tags. (Walters
and Manja, 2015) proposed Chroma-Tag and expanded the coding system to use color
information to provide more distinguishable IDs. (Mangelson et al., 2016) experimentally demonstrated the sensitivity of AprilTag to image exposure and expanded AprilTag with multiple circles for robustness. (Pertile et al., 2015) evaluated the uncertainty of a vision system with a rectangular marker based on a Monte-Carlo-Simulation.

Circle-based features are often applied since their image projections can be detected cheaply and robustly. A selection is presented in Figure 6: (a) Intersense, (b) Cucci, (c) Calvet, (d) WhyCon, (e) WhyCode. (Naimark and Foxlin, 2002)
used the centers of four of their depicted markers for pose estimation in the context
of visual-inertial self-tracking. Similarly, (Wilson et al., 2014) used four LED-markers
on a plane for applying PnP for a formation flight. (Krajník et al., 2013) proposed
WhyCon, a fiducial marker based on a simple concentric contrasting circle (Gatrell and
Hoff, 1991). WhyCon impresses with short detection time, long detection range and
precise pose estimation based on one marker. It is mainly applied for tracking multiple
mobile robots and is fully opensource. (Lightbody et al., 2017) proposed WhyCode
that extends WhyCon with a circular binary code. (Calvet et al., 2016) investigated an
opensource fiducial marker based on multiple concentric circles under challenging con-
ditions as motion blur and weak lightning. (Cucci, 2016) proposed a circular fiducial
feature design based on two black circles coded with white blobs for aerial photogram-
metry. The hierarchical design allows a reliable detection while landing and from far
distances.
2.3 Outline
The presented related work shows the intensity of research related to relative distance
estimation of autonomous vehicles. Natural features have shown to be an encouraging solution since only one camera is required and no further attachments to the preceding vehicle are needed. Disadvantages, however, are the potential for inaccurate or even wrong assumptions and the fact that implementations are generally not publicly available. The implementations of fiducial feature detection, on the other hand, are mostly publicly available, which makes them attractive for further research. Furthermore, the known dimensions
of fiducial markers prevent the system from inaccurate assumptions that are crucial
in safety-critical applications such as dynamic train composition. As a consequence,
this work focuses on the application of fiducial markers. Two tags are selected. First,
AprilTag is chosen because of its popularity and frequent use. Second, WhyCon is used
since its simplicity promises a long range application.
Many stereo-based approaches rely on SGM. Therefore, this work evaluates the direct
distance estimation by SGM without any further extensions.
Several benchmarks evaluating and comparing different vision-based methods for ve-
hicle applications are publicly available. (Geiger et al., 2012) provides the popular KITTI benchmark, which includes stereo recordings complemented with CAN-bus and LIDAR data. (Menze and Geiger, 2015) partially extended this dataset by providing
ground truth labeling of vehicles especially for scene flow applications. (Sivaraman
and Trivedi, 2013b) and (Caraffi et al., 2012) provide image sequences of a monoc-
ular camera for vehicle detection and tracking. However, these benchmarks are not
applicable in this work since different fiducial markers in various configurations will be
applied. Therefore, two task-specific benchmarks are presented in Section 4.2.
3 Fundamentals
This chapter describes basic theories and methods from the field of computer vision and
computer graphics that are used in this work. First, the geometry of single and stereo
camera views is stated in conjunction with utilized mathematical models for describing
camera characteristics. Second, methods are introduced that allow the distances to objects projected onto the image plane to be estimated. Third, the methods selected in the previous chapter are recapitulated to call attention to important characteristics. Finally, the underlying rendering pipeline is explained, which is used to generate synthetic image data within this work.
3.1 Camera Modeling

3.1.1 Camera Model

To model the projection of a camera, the pinhole camera model (Schreer, 2005, p.40)
is used, illustrated in Figure 7 (left). It describes the central projection of an object
point M onto the image plane Π, which is defined parallel to the xy-plane of the
camera coordinate-frame CL and is placed in front of the camera in the mathematical
model. Related, the principal point c describes the point on the image plane that is
the intersection of the principal axis z and Π. The projected image point m is then
defined in image coordinates (u, v)T .
P = K \cdot P_N = \begin{pmatrix} \alpha & 0 & u_0 \\ 0 & \alpha & v_0 \\ 0 & 0 & 1 \end{pmatrix} \cdot \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}, \quad \text{with } \alpha = f/\delta \qquad (1)
Figure 7: Camera model in a stereo setup: the object point M is projected to the image points m and m' on the image planes Π and Π' of the left and right camera coordinate frames CL and CR, which are related by the transformation HCL2CR

Direct mapping of an object point from camera coordinates to ideal image coordinates is realized by the projection matrix P of Equation 1, which consists of the camera matrix K and a normalized 3×4 matrix. The algebraic model of K is composed of the
principal point c = (u0 , v0 )T and the principal distance in pixel units α, which is based
on the focal length f and pixel size δ.
w \cdot m = P \cdot H_w \cdot M \qquad (2)
In the case that M is defined in a world coordinate frame, it is first transformed into the camera coordinate frame CL by an associated Euclidean homography matrix Hw before it is applied to the projection matrix, as shown in Equation 2. w is a scale factor for the transformation into the two-dimensional Euclidean space.
In the case of a stereo setup, as illustrated in Figure 7, the object point is additionally
projected onto the right image plane Π‘ by Equation 3. It applies the homography
matrix HCL2CR after the world-to-camera frame transformation Hw . Thus, HCL2CR
describes the transformation from the left to the right camera-coordinate frame.
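To make the projection model concrete, the following Python/NumPy sketch projects an object point into both image planes of a stereo pair according to Equations 1 and 2. The numeric values (principal distance, principal point, baseline, object point) are illustrative assumptions, not the calibrated parameters of the thesis camera.

# Sketch of the pinhole projection of Equations 1-3 (illustrative values only).
import numpy as np

def camera_matrix(alpha, u0, v0):
    """K with principal distance alpha = f / delta and principal point (u0, v0)."""
    return np.array([[alpha, 0.0, u0],
                     [0.0, alpha, v0],
                     [0.0, 0.0, 1.0]])

def project(K, H, M_world):
    """Project a homogeneous world point into pixel coordinates (Equation 2)."""
    P = K @ np.hstack([np.eye(3), np.zeros((3, 1))])   # P = K * P_N (Equation 1)
    m = P @ H @ M_world                                # w * m
    return m[:2] / m[2]                                # divide by the scale factor w

K = camera_matrix(alpha=1960.0, u0=640.0, v0=480.0)    # assumed values
H_w = np.eye(4)                                        # world frame chosen equal to CL here
H_CL2CR = np.eye(4); H_CL2CR[0, 3] = -0.34             # right camera shifted by the baseline
M = np.array([1.0, 0.5, 20.0, 1.0])                    # object point, 20 m in front

m_left = project(K, H_w, M)
m_right = project(K, H_CL2CR @ H_w, M)                 # apply H_CL2CR after H_w
print(m_left, m_right)                                 # disparity of ~33 px at 20 m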
3.1.2 Distortion Model

When using real lenses, deviations from the ideal pinhole model occur, for example in the form of defocus, spherical and chromatic aberration, coma, and image distortion, of which image distortion is generally the most significant. For illustrative purposes, Figure 8 shows two exaggerated forms of radial distortion.
(Brown, 1971) proposed a distortion model that models radial distortion δr and tan-
gential distortion δt , as formulated in Equation 4. It describes the relation between
the distorted point (û, v̂)T and the ideal point (u, v)T . This model is frequently used
in many applications such as (Heikkila and Silven, 1997; Zhang, 2000).
g(u, v) = \begin{pmatrix} \hat{u} \\ \hat{v} \end{pmatrix} = \begin{pmatrix} u \\ v \end{pmatrix} + \delta_r(u, v, k_{1,2,3}) + \delta_t(u, v, p_{1,2}), \quad \text{with } r^2 = u^2 + v^2 \qquad (4a)

\delta_r(u, v, k_{1,2,3}) = \begin{pmatrix} u \\ v \end{pmatrix} \cdot (k_1 r^2 + k_2 r^4 + k_3 r^6) \qquad (4b)

\delta_t(u, v, p_{1,2}) = \begin{pmatrix} p_1 (3u^2 + v^2) + 2 p_2 u v \\ p_2 (u^2 + 3v^2) + 2 p_1 u v \end{pmatrix} \qquad (4c)
The radial parameters k1 , k2 , k3 and tangential parameters p1 , p2 of this model are
estimated using a calibration process. The calibration estimates a probabilistic dis-
tribution for each parameter by assigning a Gaussian distribution with a bias and
a standard deviation. The standard deviation describes the uncertainties of the in-
dividual parameters, which depend among others on the accuracy and the number
of detected chessboard corners in the image used for the calibration. Depending on
the manufacturing quality of the camera to be calibrated, single parameters of the
model are usually set to zero. This improves the accuracy and reduces the uncertainty, since fewer parameters have to be estimated while the number of used image points remains unchanged. Thus,
the tangential parameters are defined as zero within this work since the used stereo
camera [DLRStereo] has cameras with negligible tangential distortion.
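A minimal sketch of the Brown model of Equation 4 is given below (radial part with the tangential terms available but set to zero, as in this work). The coefficients are placeholders, not calibrated values.

# Sketch of the Brown distortion model of Equation 4 (placeholder coefficients).
import numpy as np

def distort(u, v, k1, k2, k3, p1=0.0, p2=0.0):
    """Map an ideal (normalized) image point (u, v) to its distorted position."""
    r2 = u * u + v * v
    radial = k1 * r2 + k2 * r2**2 + k3 * r2**3
    du = u * radial + p1 * (3 * u * u + v * v) + 2 * p2 * u * v   # Equations 4b/4c
    dv = v * radial + p2 * (u * u + 3 * v * v) + 2 * p1 * u * v
    return u + du, v + dv

# Example: mild barrel distortion (k1 < 0) applied to a point near the image corner.
print(distort(0.8, 0.6, k1=-0.1, k2=0.01, k3=0.0))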
3.1.3 Noise Model

In addition to degradation caused by the lens, the conversion of the captured light into a digital signal adds noise to the image. Image noise sources are mainly classified into fixed pattern noise and dynamic noise. Fixed pattern noise such as Photo Response
Non Uniformity (PRNU) and Dark Signal Non Uniformity (DSNU) are usually auto-
matically corrected by the camera itself. In contrast, dynamic noise varies between
each captured frame due to read-out noise and photon noise. A comprehensive review
of different noise models is presented by (Boyat and Joshi, 2015). A recent approach
is proposed by (Zhang et al., 2017), which is used in this work to degrade the image.
Figure 9 shows the root-shaped dependency of noise‘s standard deviation to the pixel
grey-scale values. This distribution is described by Equation 5. As formulated in (Zhang et al., 2017), it is composed, first, of a parameter NE representing the electronic noise of the camera, which ensures a standard deviation greater than zero even for dark pixels. Second, the grey value I divided by a gain parameter G represents
shot noise. The shown image noise of Figure 9 is generated by the simulator based on
the calibrated parameters NE = 0.2658 and G = 59.1944.
\text{Noise}_{\text{std}} = \sqrt{N_E^2 + I/G} \qquad (5)
Figure 9: Illustration of the noise model (Zhang et al., 2017). Blue points show the standard deviation for each intensity; the red curve is fitted by Equation 5.
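The following sketch applies Equation 5 to degrade a grey-scale image, using the calibrated parameters quoted in the text (NE = 0.2658, G = 59.1944); everything else is illustrative.

# Sketch of the noise model of Equation 5.
import numpy as np

def add_sensor_noise(image, ne=0.2658, gain=59.1944, seed=0):
    """Degrade a grey-scale image with intensity-dependent Gaussian noise."""
    rng = np.random.default_rng(seed)
    sigma = np.sqrt(ne**2 + image.astype(np.float64) / gain)   # Equation 5
    noisy = image + rng.normal(0.0, 1.0, image.shape) * sigma
    return np.clip(noisy, 0, 255).astype(np.uint8)

flat = np.full((100, 100), 128, dtype=np.uint8)   # a uniform grey test patch
print(add_sensor_noise(flat).std())               # roughly sqrt(0.2658**2 + 128/59.1944) ~ 1.5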
3.2 Position estimation

3.2.1 Stereo Triangulation

Figure 10: Epipolar geometry: the object point M is observed as m and m' in the image planes Π and Π' of the cameras CL and CR, which are connected by the baseline B; e and e' denote the epipoles and l' the epipolar line in Π'
The estimation of the distance to an object point M by its projected image points m
and m‘ in a stereo setup is based on the epipolar geometry, illustrated in Figure 10.
The baseline B describes the connection of the origins of the camera-coordinate frames
CL and CR. Its intersections with the image planes Π and Π' define the epipoles e and e'. The object point M and the origins of CL and CR define the epipolar plane,
while m, m’, e, e’ lie on this plane. Its intersections with the image planes denote the
epipolar lines. The epipolar geometry then states that the related image point of m in
image plane Π’ lies on the epipolar line l’.
Thus, the epipolar geometry reduces the costs of matching image features due to a
smaller search space. In the case of a known stereo geometry HCL2CR , rectification is
used to simplify the epipolar geometry (Schreer, 2005, p.105). By virtually rotating
CL and CR to form an axis-parallel camera system, the epipolar lines become parallel.
Consequently, the corresponding image point mr ’ lies in one image row of Π’r . The
result of the rectification is the normalized stereo case shown in Figure 11, which reveals an intercept theorem for estimating the depth dz of M in rectified camera coordinates.
d_z = \frac{B \cdot f}{\delta \cdot (u - u')} \qquad (6)
Figure 11: The stereo normal case, based on (Schreer, 2005, p.67)
Figure 12: Theoretical dependency of disparity [px] and distance [m] in a stereo setup with a baseline of 0.34 m; the red lines show the resulting distance deviation dz± for a disparity deviation of ±1 px at the displayed distances (from +0.6 m at 20 m up to +10.9 m at 80 m)
The relation is described in Equation 6, in which δ denotes the pixel size. It reveals a strong dependency between the accuracy of the estimated disparity (u − u')
based on the image point matching and the resulting calculated distance dz . Figure 12
illustrates this dependency. Red lines show the resulting distance deviations dz± for
±1px deviation in disparity space starting from the disparity at the displayed distances.
E.g. the exact disparity value at 70m is 9.52px and at 80m it is 8.33px. The value of
the positive deviation is displayed. It is recognizable that the deviation rises quadratically for smaller disparities and thus larger distances. In addition, a
disparity deviation of +1px at 10m results in a distance deviation dz+ of 0.15m.
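The numbers of Figure 12 can be reproduced with a few lines of Python. The baseline B = 0.34 m is taken from the figure; the principal distance α = f/δ ≈ 1960 px is an assumption derived from the quoted disparities, not a value stated in the text.

# Sketch of the disparity/distance relation of Equation 6.
B, alpha = 0.34, 1960.0          # baseline [m] and assumed principal distance [px]

def disparity(d):                # disparity in pixels for a distance d in metres
    return B * alpha / d

def distance(disp):              # Equation 6 with alpha = f / delta
    return B * alpha / disp

print(round(disparity(70), 2), round(disparity(80), 2))   # ~9.52 px and ~8.33 px
print(round(10 - distance(disparity(10) + 1), 2))         # ~0.15 m deviation for +1 px at 10 m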
3.2.2 Perspective-n-Point
The "Perspective-n-Point problem" (Fischler and Bolles, 1981) can be applied to esti-
mate the pose and thus the distance between the camera with respect to another object
if known correspondences between 3D world points and their 2D projections exist. A
comprehensive overview and comparison is presented in (Urban et al., 2016).
Frequently used representatives are Efficient-PnP (Lepetit et al., 2009, EPnP) and
Robust-PnP (Li et al., 2012, RPnP), which are tested for this work¹. Both represent
non-iterative linear solutions to the PnP problem. EPnP accomplishes linearity by
expressing the 3D reference points as a weighted sum of four virtual control points
and refining the solution using a Gauss-Newton optimization. RPnP on the other
hand investigates 3-point subsets by exploring the local minima of the equation sys-
tem, which is based on the fourth-order polynomials (Quan and Lan, 1999), in terms
of least-squares residual. For each minimum the camera pose is estimated before the
final pose is selected based on the reprojection error.
Figure 13 shows a comparison of both methods concerning their accuracy and robustness to noised image and reference points, based on a Monte-Carlo test.
¹ All methods used in the comparison of Figure 13 are accessible for this work as C++ implementations [OSVisionLib], while EPnP and iterative PnP are implemented by [OpenCV].
Figure 13: Comparison of PnP-methods in a planar scene: deviation in the estimated position (XYZ) [m] and mean reprojection error [pixel] over the noise level [·σi], shown for four and twelve used points; EPnP (øt: 93 µs), RPnP (øt: 177 µs) and RPnP+Iterative (øt: 245 µs)
In an arbitrary scene², which simulates the projection of the corners of three coplanar AprilTags, four and twelve correspondences are used to solve the PnP problem. Each method is applied to different noise levels nl, which apply Gaussian noise to the positions of the reference points with σ3D = nl · 1 cm and to the image points with σ2D = nl · 1 pixel. Each distribution of Figure 13 is based on 100,000 iterations in the Monte-Carlo test. For a description of the box plots, see Section 4.4 (p. 28).
Figure 13 (left) shows the deviation of the estimated homography from the ground truth matrix based on its translation, where each box includes the deviations in all three directions. With increasing noise level, RPnP is more accurate using either four or twelve points. The same result is obtained when comparing the mean reprojection error. The grey-lined box that marks the result of EPnP at zero noise indicates obviously invalid calculations, which implies a lack of robustness of EPnP.

In this work PnP is mainly applied to four correspondences provided by one AprilTag or four WhyCon markers, which implies the application of RPnP because of its superior performance in this case. The result is then refined by applying an iterative PnP (Levenberg-Marquardt, [OpenCV]) approach that directly optimizes the pose based on the reprojection error. Figure 13 shows a clear improvement with respect to stand-alone EPnP or RPnP. However, the quality and validity of the iterative approach strongly depend on the initial pose assumption. For this reason, the approach RPnP+Iterative is used in the further course of the work, named PnP.
² A visualization of this scene with coplanar 3D reference points can be found in Appendix A.1. A scene with non-planar reference points is also provided.
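The sketch below illustrates the "initial solution plus iterative refinement" scheme for the four corners of one tag, using OpenCV. RPnP is not available in OpenCV, so EPnP provides the initial pose here; tag size, camera matrix and corner coordinates are assumed example values.

# Sketch of PnP-based pose estimation for the four corners of one tag (OpenCV stand-in).
import numpy as np
import cv2

s = 0.6                                            # assumed tag side length in metres
object_points = np.array([[-s/2, -s/2, 0.0],       # tag corners in the marker frame
                          [ s/2, -s/2, 0.0],
                          [ s/2,  s/2, 0.0],
                          [-s/2,  s/2, 0.0]])
image_points = np.array([[610.0, 450.0], [670.0, 452.0],
                         [668.0, 512.0], [612.0, 510.0]])     # detected corners (example)
K = np.array([[1960.0, 0.0, 640.0], [0.0, 1960.0, 480.0], [0.0, 0.0, 1.0]])
dist = np.zeros(5)

ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, dist,
                              flags=cv2.SOLVEPNP_EPNP)        # initial solution
rvec, tvec = cv2.solvePnPRefineLM(object_points, image_points, K, dist, rvec, tvec)
print(np.linalg.norm(tvec))                        # distance camera -> tag in metres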
3.2.3 3D Position Calculation of a Circle

The projected ellipse of a circular shaped marker can be used to estimate its pose, as detailed in (Krajník et al., 2014, p.9)³. In the further course of this work, this method is applied to estimate the position of the circle of a WhyCon marker (WhyCon+Circle). The projected ellipse is defined by its center ce(u, v) and semiaxes e0, e1, which result from the marker detection of Section 3.3.2. It is transformed to a canonical camera system to compensate the radial distortion at the position of the detected ellipse. As illustrated in Figure 14, the image coordinates of the ellipse vertices a0,1 and b0,1 are calculated and undistorted using the distortion model of Section 3.1.2. The transformed image points a'0,1 and b'0,1 are used to define the transformed ellipse, resulting in c'e(u'c, v'c) and e'0, e'1.
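A sketch of this compensation step is shown below: the four vertices of the detected ellipse are undistorted with the camera model and the transformed ellipse parameters are recomputed from them. The calibration values are assumed; the actual implementation is provided by [WhyConLib].

# Sketch of the ellipse-vertex undistortion illustrated in Figure 14 (assumed calibration).
import numpy as np
import cv2

def undistort_ellipse(center, e0, e1, K, dist):
    """Undistort the vertices a0, a1, b0, b1 and refit the ellipse from them."""
    c = np.asarray(center, dtype=np.float64)
    vertices = np.array([c + e0, c - e0, c + e1, c - e1], dtype=np.float64)
    und = cv2.undistortPoints(vertices.reshape(-1, 1, 2), K, dist, P=K).reshape(-1, 2)
    a0, a1, b0, b1 = und
    c_t = (a0 + a1 + b0 + b1) / 4.0                   # transformed centre c'_e
    return c_t, (a0 - a1) / 2.0, (b0 - b1) / 2.0      # transformed semiaxes e'_0, e'_1

K = np.array([[1960.0, 0.0, 640.0], [0.0, 1960.0, 480.0], [0.0, 0.0, 1.0]])
dist = np.array([-0.1, 0.01, 0.0, 0.0, 0.0])          # k1, k2, p1, p2, k3 (example values)
print(undistort_ellipse((900.0, 700.0), np.array([40.0, 5.0]), np.array([-3.0, 24.0]), K, dist))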
The resulting parameters are then used to establish the parameters of the conic, defined
in the ellipse characteristic equation of Equation 7.
\begin{pmatrix} u' & v' & 1 \end{pmatrix}
\begin{pmatrix} q_a & q_b & q_d \\ q_b & q_c & q_e \\ q_d & q_e & q_f \end{pmatrix}
\begin{pmatrix} u' \\ v' \\ 1 \end{pmatrix} = 0,
\quad \text{with} \quad
\begin{aligned}
q_a &= e'_{0u} e'_{0u} / |e'_0|^2 + e'_{1u} e'_{1u} / |e'_1|^2 \\
q_b &= e'_{0u} e'_{0v} / |e'_0|^2 + e'_{1u} e'_{1v} / |e'_1|^2 \\
q_c &= e'_{0v} e'_{0v} / |e'_0|^2 + e'_{1v} e'_{1v} / |e'_1|^2 \\
q_d &= -u'_c q_a - v'_c q_b \\
q_e &= -u'_c q_b - v'_c q_c \\
q_f &= q_a u'^2_c + q_c v'^2_c + 2 q_b u'_c v'_c - 1
\end{aligned}
\qquad (7)
³ The implementation is provided in the open-source package [WhyConLib].
3.3 Image Processing

3.3.1 AprilTag
AprilTag is a "visual fiducial system that uses a 2D bar code style" (Olson, 2011),
shown in Figure 5 (c, p.6). Its detection is divided into two main steps: the detection
of the pattern and the coding system used to identify the different tags.
The detection phase starts by smoothing the image to reduce the influence of image
noise. Then for each pixel the gradient and its magnitude are estimated. The use of
gradients reduces the influence of exposure. Based on the gradients, lines are detected
by a graph-based clustering method. The set of detected lines is then investigated by
a recursive "depth-first search with a depth of four" (Olson, 2011, p.4) to find quads
of lines. For each candidate quad the 2D homography transformation from the tag
coordinate system into the system defined by the four corners of the quad is estimated.
This homography is then used to calculate the image position of each bit to extract the
pixel values. Finally, a proposed spatially-varying threshold method is used to classify
black and white values, which also increases the robustness to exposure.
The extracted binary codes of the candidate quads are used to distinguish between
patterns with different IDs and to filter out quads with invalid IDs. To this end, AprilTag uses a complex coding system. First, it rejects code words that result in simple
geometric patterns under the assumption that complex patterns occur less frequently
in nature. Second, tag clusters are chosen in such a way that the entropy in each bit is maximized by maximizing the Hamming distance.
The encoding and identification of the binary code plays an important role in the
AprilTag pattern detection. The quad detection generates many candidates, which
are verified by the subsequent identification⁴. That means the maximum detection distance of an AprilTag is bounded by the granularity of the included code.

The result of the AprilTag pattern detection is the four corner points of each tag, which are used to apply a PnP approach, as described in Section 3.2.2.
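The original AprilTag implementation is not reproduced here. As an illustration only, OpenCV's ArUco module (version 4.7 or later) ships AprilTag 36h11 dictionaries and yields the four corner points per tag that feed the PnP step of Section 3.2.2; the image file name is hypothetical.

# Illustration: AprilTag 36h11 detection via OpenCV's ArUco module (not the reference code).
import cv2

gray = cv2.imread("frame_left.png", cv2.IMREAD_GRAYSCALE)    # hypothetical file name
dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_APRILTAG_36h11)
detector = cv2.aruco.ArucoDetector(dictionary, cv2.aruco.DetectorParameters())
corners, ids, _ = detector.detectMarkers(gray)

if ids is not None:
    for tag_corners, tag_id in zip(corners, ids.flatten()):
        print(int(tag_id), tag_corners.reshape(4, 2))         # four corners per detected tag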
3.3.2 WhyCon and WhyCode

The detection of the circular pattern of Figure 6 (d, p.6) is based on the assumption of a coherent circular white segment enclosed by a ring-shaped black segment. The algorithm is proposed and described in detail in (Krajník et al., 2013, 2014).
Given a new frame Image a buffer pixel_class is initialized, which is used to store
the information for each pixel whether it is black, white or initially unknown. Then,
Algorithm 1 is applied multiple times to find the next pattern until all possible patterns
in the image are detected or a maximum number is reached. The algorithm starts at
a passed pixel position p0 and iterates pixel by pixel i through the image until a black
pixel is reached, classified by the passed threshold τ . Using a Floodfill algorithm,
⁴ An example is provided in Appendix A.2.
all connected black pixels are segmented and additionally marked in the buffer. If
the segment c_outer surpasses a minimum size and passes a simple roundness test to guarantee a circular shape, a new segmentation of white pixels is started from the center of c_outer to verify its annularity. The white segment c_inner is also checked for its minimum size and roundness. Given both segments and prior knowledge about
the proportion of the pattern's inner and outer radius, a ratio test on the segment sizes is performed. Using all included pixels of the segments, the ellipse center (uc, vc) is defined by their mean position. As formulated in Equation 9, the ellipse semiaxes e0, e1 are calculated from the eigenvalues λ0, λ1 and eigenvectors v0, v1 of the covariance matrix C.
e_{0,1} = 2\,\lambda_{0,1}^{1/2}\, v_{0,1}, \quad \text{with } C = \frac{1}{s}\sum_{i=0}^{s-1} \begin{pmatrix} u_i u_i & u_i v_i \\ u_i v_i & v_i v_i \end{pmatrix} - \begin{pmatrix} u_c u_c & u_c v_c \\ u_c v_c & v_c v_c \end{pmatrix} \qquad (9)
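A NumPy sketch of Equation 9 on a synthetic filled ellipse is given below; for a uniformly filled ellipse, 2·λ^(1/2) recovers the semiaxis lengths.

# Sketch of Equation 9: centre and semiaxes from the pixel mean and covariance.
import numpy as np

def ellipse_from_pixels(u, v):
    uc, vc = u.mean(), v.mean()                     # ellipse centre (u_c, v_c)
    C = np.cov(np.stack([u, v]), bias=True)         # covariance matrix C of Equation 9
    lam, vec = np.linalg.eigh(C)                    # eigenvalues/eigenvectors (ascending)
    e1 = 2.0 * np.sqrt(lam[0]) * vec[:, 0]          # minor semiaxis e_1
    e0 = 2.0 * np.sqrt(lam[1]) * vec[:, 1]          # major semiaxis e_0
    return (uc, vc), e0, e1

# Synthetic filled ellipse centred at (200, 150) with semiaxes of 40 px and 20 px.
uu, vv = np.meshgrid(np.arange(120, 281), np.arange(110, 191))
mask = ((uu - 200) / 40.0)**2 + ((vv - 150) / 20.0)**2 <= 1.0
print(ellipse_from_pixels(uu[mask].astype(float), vv[mask].astype(float)))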
Algorithm 1: Detection of the next pattern (Krajník et al., 2013): starting at position p0, the image is traversed pixel by pixel (i ← (i + 1) mod sizeof(Image)) until a valid segment c_outer is found or the search returns to p0.
For the next detection, the threshold τ is adapted based on the mean intensities of the segmented black pixels µ_outer and white pixels µ_inner. This technique ensures optimal thresholding during the segmentation and thus an accurate estimation of the pattern borders. This is necessary since the
position estimation based on a projected ellipse (see Section 3.2.3) is highly affected
by the estimated pattern borders. Also, a correction of the estimated semiaxes is proposed that takes the true ratio of the inner and outer circle into account (Krajník et al., 2014, p. 8). However, this time-dependent determination of the threshold assumes a constant exposure, which only applies to indoor applications.
By reducing the precision of the roundness check for the inner circle, non-circular inner white segments can be used in the pattern. As illustrated in Figure 6 (e, p.6),
WhyCode (Lightbody et al., 2017) extends the pattern by applying a binary code to the
inner circle. For code identification, they combine a "Necklace code" with Manchester
Encoding (Forster, 2000), which provides rotation invariance and different IDs.
3.3.3 SGM
Semi-Global Matching "[...] uses a pixelwise, Mutual Information based matching cost for compensating radiometric differences of input images" (Hirschmüller, 2007). It is
based on a known interior and relative calibration of a rectified stereo setup. This
implies that the corresponding pixel in the other image is known to lie on the same
image line. This knowledge is used to apply a local smoothness constraint. This is
usually realized by calculating the matching costs of all possible disparities for each
pixel p on that line, registered in a matrix and applying dynamic programming to find
the path through the matrix which has minimal costs. However, neighboring pixels
of contiguous lines often show irregular jumps in disparity (Moratto, 2013). SGM efficiently solves this problem by combining several one-dimensional optimizations from all directions, as illustrated in Figure 15 (a). The result is a dense disparity image with
sub-pixel estimation that provides sharp object boundaries and fine details.
Figure 15 (b) shows an exemplary colored disparity image based on a simulated image
pair with a pictured train. Distant objects have small disparities and near objects rather large ones, as can be traced in Figure 11 (p.11) with the disparity (u − u').
The colored disparity visualization shows large gaps colored in grey, which arise in
untextured areas. The implementation of (Ernst and Hirschmüller, 2008) is provided
by the DLR for this work.
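The DLR implementation is not publicly available; as a stand-in, the sketch below uses OpenCV's StereoSGBM to show how a dense disparity map and a distance estimate for a labelled vehicle region could be obtained. File names, matcher parameters and the region of interest are assumptions.

# Stand-in sketch: OpenCV's StereoSGBM illustrates the SGM principle on a rectified pair.
import numpy as np
import cv2

left = cv2.imread("rect_left.png", cv2.IMREAD_GRAYSCALE)    # hypothetical file names
right = cv2.imread("rect_right.png", cv2.IMREAD_GRAYSCALE)

sgbm = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=5,
                             P1=8 * 5 * 5, P2=32 * 5 * 5, uniquenessRatio=10)
disparity = sgbm.compute(left, right).astype(np.float32) / 16.0   # fixed-point output

B, alpha = 0.34, 1960.0                       # assumed baseline and principal distance
roi = disparity[400:480, 600:700]             # hypothetical labelled vehicle region
valid = roi[roi > 0]
print(B * alpha / np.median(valid))           # median distance in metres (Equation 6)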
Figure 15: Illustration of Semi-Global-Matching: (a) aggregation of one-dimensional optimizations from all directions for a pixel p in image coordinates (u, v); (b) colored disparity image from small to large disparity
3.4 Image Synthesis

3.4.1 Basic Rendering
Synthetic data can be generated by using the graphic rendering pipeline. "The main
function of the pipeline is to generate, or to render, a two-dimensional image, given
a virtual camera, three-dimensional objects, light sources, shading equations, textures
and more" (Akenine-Möller et al., 2008, p.11). This basic rendering pipeline consists
of three conceptual stages, illustrated in Figure 16. First, the geometry of the scene is set up in the application stage, which defines the positions of all elements based on the scene specification. Elements are points (or vertices), lines and faces; faces are each defined by three vertices and represent the surface of the object. Second, all positions of the objects to be rendered are projected into image coordinates during the geometry stage, based on a normalized camera model. Vertices that lie outside the image borders are clipped. Last, the rasterizer stage uses the transformed and
projected vertices to compute the nearest face and set colors for each pixel defined by
the face. The result is an ideal image that coincides with the pinhole model, as shown
in Figure 16 (iii) or Figure 17 (ii).
3.4.2 Extended Rendering Pipeline

In this work, an extended graphics rendering pipeline is used, provided by the DLR (Lehmann, 2015, 2016). The extension is realized in two additional shader stages linked to the end of the basic shader pipeline, which apply image degradation.
In the first step, lens distortion is realized in the lens-shader. It distorts the resulting
image (Figure 17 (ii)) of the basic rendering pipeline by using the Brown distortion
model, explained in Section 3.1.2 (p.9), and bilinear interpolation (Akenine-Möller et
al., 2008, p.158). This is done by precomputing a lookup table on the central pro-
cessing unit (CPU), which holds the position in the ideal image (ii) for each pixel of
Figure 17: The extended rendering pipeline: the output of the graphic rendering pipeline is processed by the lens-shader and the sensor-shader
the distorted image (iii). This lookup table is initially also used to adapt the camera
model used in the basic rendering pipeline to increase its image borders (w0 ,h0 ) of (ii)
in the way that all transformed positions of the distorted image can be mapped. This
is necessary since the distortion can exceed the original image borders as shown in
Figure 8 (p.10), which would lead to undefined regions in the distorted image (iii).
Finally, various image degradation effects are modeled in the sensor-shader. This
includes blurring with a Gaussian kernel, greyscaling, exposure and also an implemen-
tation of the noise model of Section 3.1.3 (p.10).
3.4.3 Anti-Aliasing
shows that mip-mapping removes artifacts and pixel flickering within the tags, but
jaggies on the wall border remain.
For this work, a combination of these three methods is used to ensure sufficient image quality. Supersampling is applied with a sampling grid of 4² px for each original image pixel, followed by multisampling with a 4² px grid for each supersampled pixel. Last, mip-mapping prevents undersampling of the pictured textures.
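A simplified stand-alone sketch of the supersampling idea follows: each output pixel is the average of an s × s block of the high-resolution rendering (s = 4 in this work). The actual pipeline embeds this step in the distortion mapping with bilinear interpolation.

# Simplified supersampling sketch: average each s x s block of the high-resolution image.
import numpy as np

def downsample(image_hi, s=4):
    h, w = image_hi.shape[0] // s, image_hi.shape[1] // s
    blocks = image_hi[:h * s, :w * s].reshape(h, s, w, s)
    return blocks.mean(axis=(1, 3))

hi = np.random.default_rng(0).integers(0, 256, (400, 640)).astype(np.float64)
print(downsample(hi).shape)        # (100, 160)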
Table 1: Visualization of methods for anti-aliasing: (a) proposed combination (frame 1 in higher resolution), (b) standard rendering, (c) supersampling, (d) multisampling, (e) mip-mapping. Frames 1 and 2 represent low resolution images captured from a distance of 100 m to the wall, with a vertical camera shift of 0.05 m from frame 1 to 2. The pictured AprilTag has a width of 0.6 m. Visualization (e) shows an exemplary image pyramid for mip-mapping. Visualizations (b, c, d) show the rasterization of a scene with one face, the original sample positions as grey points and the new sample positions as red crosses, with a sample grid of 2² for each original pixel, based on (Thoman, 2014).
4 Evaluation Pipeline
In this chapter an evaluation pipeline is proposed to evaluate and compare different RPV-methods. This setup allows the accuracy of the applied distance estimation methods to be compared under the influence of variation and uncertainty of versatile parameters such as exposure and calibration uncertainty. First, the general experiment
setup and an outline of the stages are explained. Second, real-world experiment setups
as well as the simulation concept are presented. Finally, the evaluation procedure is
introduced.
4.1 General Setup and Definitions
Figure 18: Specification of significant coordinate systems and transformations, e.g. the
transformation of the vehicle V to the world coordinate system W is labeled
with HV 2W .
For each experiment, a data container scenedata is created and passed through, which holds the specifications
of each individual iteration and is filled during each stage. In the first stage of the
evaluation pipeline, the scenedata is set up using following parameter groups which are
defined in the experiment setup:
• Number of iterations defines how often each scene of the two-dimensional scene array of the scenedata is repeated with individually sampled target parameters.
• Secondary parameters define the Gaussian distributions of the variation and calibration parameters for the experiment. Based on these distributions, the corresponding target parameters are sampled individually for each iteration.
• Primary parameters fix target parameters to specific values for each scene. For each simulated experiment up to two primary parameters are chosen, which define the two-dimensional scene array (Figure 20) and the resulting plots (Figure 27).
As illustrated in Figure 20, each scene consists of multiple iterations, each defined by
a set of parameters. These parameters are divided into four parameter groups:
• Target parameters hold the sampled input values that are used in the aggregation
stage and the application stage such as the exposure or the noised focal length.
• Ground truth parameters hold the ground truth distance of the stereo-camera to
the vehicle. In the case of simulated data, it also contains all true transformations
of all simulated objects.
• Support parameters hold additional information about the scene, which would be
estimated by another not-implemented algorithm. For instance, the labeling of
the vehicle in the image used for the SGM approach in Section 5.3.1.
• Estimated parameters hold the estimated distance of the stereo-camera to the
vehicle and information about the success of the marker detection.
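A sketch of how the scenedata container and its four per-iteration parameter groups could be organized is given below; the field names are hypothetical, not those of the actual evaluation pipeline.

# Hypothetical sketch of the scenedata container described above.
from dataclasses import dataclass, field

@dataclass
class Iteration:
    target: dict = field(default_factory=dict)        # sampled inputs (exposure, noised calibration, ...)
    ground_truth: dict = field(default_factory=dict)  # true distance and object transformations
    support: dict = field(default_factory=dict)       # e.g. vehicle labelling for the SGM approach
    estimated: dict = field(default_factory=dict)     # estimated distance and detection success

@dataclass
class Scene:
    primary: dict                                      # fixed primary parameters of this scene
    iterations: list = field(default_factory=list)

scenedata = [[Scene(primary={"distance": d, "view_angle": a}) for d in (10, 20, 40)]
             for a in (0, 15, 30)]                     # two-dimensional scene array (example values)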
During the aggregation stage and in the case of virtual test data, each iteration is sim-
ulated with the related target parameters and all required ground truth and support
parameters are noted. In the case of real-world test data, the corresponding stereo
frames of the dataset are linked to each iteration, and the ground truth and support parameters are defined by manually labeling the images.

Figure 20: Scenedata content: a two-dimensional array of scenes spanned by the primary parameters A (a1, a2, a3) and B (b1, b2, b3); each scene holds a number of iterations (It.1 ... It.X) with target, ground truth, support and estimated parameters
In the application stage the individual methods are applied to each repetition to es-
timate the distance. The noised calibration parameters that are stored in the target
parameters are applied in this stage. Afterwards, the detection of the markers in the
image is checked by using the ground truth data to assess the success of the marker
detection and thus the distance estimation. Finally, the generated data is evaluated and interpreted with the help of different plots in the last stage.
4.2 Real-World Datasets

Figure 21: Exemplary left camera subimages from the proposed datasets (increased contrast and brightness for better illustration)
Thus, the only influence that varies between each iteration is image noise. For each
position to the train or vehicle a reference measurement with a laser scanner [GLM-80]
was conducted. Detailed information about the datasets can be found in Appendix B.3.
The first dataset is based on a measurement campaign (Funk, 2017) in which two con-
figurations of different AprilTags attached to a train (BR219) were recorded, each from
three distances of up to 24m. Figure 21 (a) and (b) shows two exemplary subimages
captured by the left camera. This dataset is used for a preliminary experiment to
find a promising configuration for AprilTags. Characteristic of this data is the visible overexposure, exemplarily shown in Appendix B.3.1.
The second dataset is divided into two subsets. First, the setup of Figure 21 (d) is
based on multiple configurations of different markers attached to a vehicle. It includes
five close distance recordings between 5 and 15m and six far distance recordings be-
tween 20 and 60m. It is used to directly compare three different marker configurations,
which are discussed in Section 5. Figure 22 specifies the occupied marker area of all
four configurations5 . Second, the dataset of Figure 21 (c) is used to experimentally
determine the detection range of different WhyCon-based markers, recorded in 5m
steps up to 55m. Both subsets show the characteristic of low exposure beginning at a
distance of 50m. This is caused by the shadow of a row of trees that darkens the part
of the image with the pictured vehicle.
All real-world setups are also simulated, as exemplary shown in Figure 23, to facilitate
a deeper evaluation of the applied marker configurations. The reference point RP is
placed in the center of the middle AprilTag for datasets (a, b) and in the center of the
large WhyCon marker for dataset (c).
Figure 23: Exemplary simulated left camera subimages from the proposed datasets.
(Increased contrast and brightness for better illustration)
⁵ The marker dimensions are listed in Appendix B.3.
4.3 Simulation
In general, real-world benchmark datasets are rather difficult and time consuming to
produce. In contrast, the creation of synthetic datasets is limited only by the available computational power and the number of implemented variable parameters. Further-
more, simulated data provides perfect noiseless ground truth data.
The goal of the proposed simulator is to complement the real-world datasets with more extensive synthetic datasets, which are generated with varying properties. In addition to the extensive evaluation and comparison of different RPV-methods, it also allows one to "[...]
investigate the influence of camera or scene properties on the [distance evaluation],
to prototype, design, and test experiments before realizing them in the real-world,
[...]" (Ley et al., 2016, p.4).
The evaluation of this work is based on a Monte-Carlo-Simulation, which is used to
statistically estimate the uncertainty of the estimated distances and to analyze correlations with the input parameters. For a number of trials, values are sampled from the assigned probability density function (PDF) of each individual parameter. For simplification, the parameters are assumed to be independent in this work. The estimated distances of all iterations form a PDF that is used to define the resulting uncertainty. The implementation of the Monte-Carlo-Simulation is carried out according to the step-by-step procedure of (JCGM, 2008b).
Table 2 shows all parameters that are varied between each iteration. They are divided
into two categories. First, in the simulation stage all parameters that vary the scene are changed for each iteration. Second, uncertainties of all calibration parameters are modeled in the application stage. In the following, the implementation of
the Monte-Carlo-Simulation in the evaluation pipeline of Figure 19 is explained. The
explanation starts with the embedding of the extended shader pipeline and continues
with the realization of the application stage. Finally, the main evaluation procedure is
introduced, which is used in the further course of this work.
4.3.1 Simulation Stage

The simulation stage is based on the extended rendering pipeline of Section 3.4.2. Figure 24 illustrates how this pipeline is embedded in the aggregation stage to realize and update all scene and iteration specifications.
First, the spatial correlations of all objects of the scene are represented in a hierarchi-
cal fashion, the scenegraph (Akenine-Möller et al., 2008, p.658). This graph represents
a tree with objects as nodes and three-dimensional homography transformations as
Figure 24: Embedding of the shader pipeline in the aggregation stage: for each iteration in a scene, the iteration specification (scene geometry, distortion parameters, sensor specification) configures the graphic rendering pipeline (rendering, anti-aliasing), the lens-shader (distortion, downsampling) and the sensor-shader (exposure, image noise, greyscale)
edges. Figure 25 shows the important elements of the applied scenegraph. Starting in
the world coordinate frame, it starts with the transformation HV 2W from the vehicle
coordinate frame. In the example of Figure 18 (p.21), it describes the position of a
train in world coordinates, visualized with a mesh of a train [Blend1] bound to this
node. To this object, several markers Tn are attached. The asterisk of their homog-
raphy marks that this transformation stays unchanged in the simulation, but will be
noised in the application stage to model calibration uncertainty. On the same level, a
node for the reference point RP is defined that represents the point on the vehicle to
which the distance is to be estimated. To this node, the position of the left camera CL
is set by the matrix HCL2RP, defined by the distance d and view-angle α to be evaluated. To vary the position of the projected vehicle in the image, the orientation and position of the camera are minimally varied by the transformation HTremble between iterations⁶.
Then the defined scene is rendered based on the camera model with the extended image borders (w′, h′). This modification of the texture size includes, on the one hand, the addition of the offsets ou and ov to handle the complete distortion (see Section 3.1.2) and, on the other hand, a scaling by the supersampling factor s, which is set to 4.
The supersampling is directly embedded in the distortion step. As formulated in Equa-
tion 10a, each pixel (u,v) of the distorted image (iii) is mapped to the larger ideal image
(ii) by the inverse distortion equation, the offsets ou , ov and the scaling s. Within the
ideal image, the neighborhood (is, js) of s² pixels is sampled and averaged. The samples are interpolated by bilinear interpolation (Akenine-Möller et al., 2008, p.158). Besides supersampling, multisampling and mip-mapping are applied, which are already implemented by [OSG].
p_{iii}\!\left(\begin{bmatrix} u \\ v \end{bmatrix}\right) = \frac{1}{s^2} \sum_{i_s=0}^{s-1} \sum_{j_s=0}^{s-1} p_{ii}\!\left(\begin{bmatrix} u' \\ v' \end{bmatrix} + \begin{bmatrix} i_s - \frac{s-1}{2} \\ j_s - \frac{s-1}{2} \end{bmatrix}\right)  \qquad (10a)

\text{with} \quad \begin{bmatrix} u' \\ v' \end{bmatrix} = \left(\begin{bmatrix} o_u \\ o_v \end{bmatrix} + \mathrm{distort}^{-1}\!\left(\begin{bmatrix} u \\ v \end{bmatrix}\right)\right) \cdot s  \qquad (10b)
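A minimal Python sketch of this per-pixel lookup, following the reconstruction of Equation 10 above, is given below; undistort_point is a placeholder for the actual inverse distortion model, and all image sizes are illustrative.

```python
import numpy as np

def undistort_point(u, v):
    """Stand-in for distort^-1: maps a distorted pixel to ideal image coordinates.
    A real implementation would invert the radial/tangential distortion model."""
    return u, v  # identity placeholder

def bilinear(img, x, y):
    """Bilinear interpolation of img at the floating-point position (x, y)."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, img.shape[1] - 1), min(y0 + 1, img.shape[0] - 1)
    ax, ay = x - x0, y - y0
    return ((1 - ax) * (1 - ay) * img[y0, x0] + ax * (1 - ay) * img[y0, x1] +
            (1 - ax) * ay * img[y1, x0] + ax * ay * img[y1, x1])

def distort_and_downsample(ideal, w, h, o_u, o_v, s=4):
    """Equation 10: average an s x s neighbourhood of the ideal image for every
    pixel of the distorted output image."""
    out = np.zeros((h, w))
    offsets = np.arange(s) - (s - 1) / 2.0
    for v in range(h):
        for u in range(w):
            uu, vv = undistort_point(u, v)
            up = (o_u + uu) * s              # Eq. 10b: offset and supersampling scale
            vp = (o_v + vv) * s
            acc = 0.0
            for j_s in offsets:              # Eq. 10a: average s^2 bilinear samples
                for i_s in offsets:
                    acc += bilinear(ideal, up + i_s, vp + j_s)
            out[v, u] = acc / (s * s)
    return out

# Usage with an illustrative ideal image four times larger than the output:
ideal = np.random.default_rng(0).random((4 * 120, 4 * 160))
img = distort_and_downsample(ideal, w=150, h=110, o_u=5, o_v=5, s=4)
```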
Finally, exposure is applied, which scales all pixel values by a factor that is varied for each scene, and image noise is added. Further lens and sensor effects such as blurring and vignetting are not applied due to incomplete information about these properties.
In the application stage, all RPV-methods are applied to each iteration of all scenes. As illustrated in Figure 26, before an iteration of one scene is processed, all geometric calibration parameters are resampled, so that perturbed values drawn from their calibrated distributions are employed.
In the simulation itself all calibration parameters are fixed to the values estimated from
the calibration of the real-world camera [DLRStereo]. The calibration uncertainty is
subsequently modeled in the application stage by sampling from the parameter dis-
tribution around the simulated value. Compared to simulating the calibration uncertainty directly in the simulation stage, this variant loses some unknown image effects. However, it is used for several reasons. First, applying the interior camera calibration uncertainty in the simulation, and thus recreating the distortion lookup table at each iteration, makes it computationally infeasible to apply an extensive Monte-Carlo test (the generated frame rate drops from around 15 frames per second to around 3 frames per minute). Second, the calibration of the markers cannot be simulated, since a variation in their pose could lead to an overlap with the rigidly modeled vehicle surface. Third, the uncertainty of the stereo calibration HCL2CR (Figure 7, p.8) is also modeled in this stage to cleanly separate the application of variation and calibration uncertainty.
For each RPV-method, an object is initially created to avoid setting up all buffers anew for each iteration. However, these buffers are cleared before each iteration to facilitate independent estimations, as marked in Figure 26. Then the target parameters, i.e. the calibration parameters and additional support parameters, are updated. Afterwards, the distance is estimated for the current stereo frame. Finally, ground truth data is used to verify the detection of the required markers, so that erroneous detections are excluded from the statistical evaluation of the estimated distance.

[Figure 26: Application stage — for each iteration of a scene, the calibration parameters are sampled and the support parameters, stereo frame and ground-truth parameters are provided; for each RPV-instance, the temporal buffers are cleared, the calibration and support parameters are updated, the distance is estimated and the detection is validated.]
[Figure: Main evaluation procedure — different experiments, primary parameters, iterations.]
5 Integration of Methods
This chapter describes the RPV-methods applied in this work. First, the usage of the fiducial markers AprilTag and WhyCon is outlined. For each marker, different configurations are proposed and verified in a preliminary evaluation, so that the individual best configurations can be used in the final evaluation phase. Finally, it is explained how SGM is applied and how the markers can be used in a stereo setup.
5.1.1 Application
When using AprilTags, the distance is estimated by applying a PnP method on the
corners of the detected AprilTags (see Section 3.3.1). The advantage is that multiple
AprilTags can be applied at the same time. Formulated in Algorithm 2, multiple
markers Mcalib can be attached to the same vehicle, each defined by its pose H*T2V in vehicle coordinates and the id of the marker. (Square brackets represent lists.)
The algorithm starts by extracting the AprilTags from the image, where mT holds the respective four corners. Based on the id, the extracted AprilTags are assigned to the defined markers of Mcalib. If more than one extracted AprilTag has the same id, multiple candidates are created. If a marker is missing, only the detected AprilTags are used. For each candidate, all image points ximage and corresponding world points xworld are stored in Mcorr. For each candidate Ct, the pose of the camera relative to the vehicle HCL2V is estimated by RPnP and, by using the average reprojection error rt, the best candidate
Cb is selected. rthresh represents a maximum allowed error (50px) to sort out obviously wrong estimations. Finally, the pose is refined by an iterative PnP approach, before the distance d from the vehicle's reference point to the left camera is estimated.
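The following Python sketch outlines the core of this procedure under simplifying assumptions: cv2.solvePnP stands in for RPnP, the handling of multiple candidates for ambiguous ids is omitted, and all names are illustrative.

```python
import numpy as np
import cv2

def reprojection_error(obj_pts, img_pts, rvec, tvec, K, dist):
    proj, _ = cv2.projectPoints(obj_pts, rvec, tvec, K, dist)
    return float(np.mean(np.linalg.norm(proj.reshape(-1, 2) - img_pts, axis=1)))

def estimate_vehicle_distance(detections, markers_calib, reference_point, K, dist, r_thresh=50.0):
    """detections: {tag_id: 4x2 image corners}; markers_calib: {tag_id: 4x3 corner
    positions in vehicle coordinates}. Returns the distance from the left camera
    to the reference point, or None if no valid estimate is possible."""
    img_pts, obj_pts = [], []
    for tag_id, corners in detections.items():
        if tag_id in markers_calib:                 # assign detections to calibrated markers
            img_pts.append(np.asarray(corners, dtype=np.float64))
            obj_pts.append(np.asarray(markers_calib[tag_id], dtype=np.float64))
    if not img_pts:
        return None
    img_pts, obj_pts = np.vstack(img_pts), np.vstack(obj_pts)
    # PnP on all available correspondences (cv2.solvePnP stands in for RPnP here).
    ok, rvec, tvec = cv2.solvePnP(obj_pts, img_pts, K, dist, flags=cv2.SOLVEPNP_ITERATIVE)
    if not ok or reprojection_error(obj_pts, img_pts, rvec, tvec, K, dist) > r_thresh:
        return None                                 # sort out obviously wrong estimations
    R, _ = cv2.Rodrigues(rvec)
    rp_cam = R @ np.asarray(reference_point, dtype=np.float64) + tvec.ravel()
    return float(np.linalg.norm(rp_cam))            # distance camera -> reference point
```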
The preliminary evaluation of Figure 28 compares the usage of two different AprilTag
types. It is based on the setups (a) and (b) of Figure 21, applied in real-world and sim-
ulation. Type 1 applies three small markers with high code density and Type 2 shows
large markers with smaller code density. Additionally, the applications of all three
markers (AprilTags #3) and only the middle one (AprilTags #1) are investigated.
The real-world experiment (ii) shows an increased deviation from the ground truth distance when using only one marker. This is caused by a strong overexposure of the recordings that makes the marker appear smaller, illustrated in Appendix B.3. This effect is also present in the simulated experiment (i), revealed by the long upper whisker when using only one marker (blue) of Type 2. This effect can be bypassed by using multiple markers (red), since the influence of inaccuracies in the corner detection is balanced by a greater number of correspondences for PnP (see Section 3.2.2). When comparing the different marker types, it is striking that Type 2 has a greater detection range. Furthermore, the number of used markers also increases the detection range: only one marker needs to be detected for a successful distance estimation, and using only a single marker additionally leads to a more frequent failure of RPnP itself.
To summarize, this experiment shows the superiority of large markers with small code density. Also, using three markers increases the accuracy of the estimation due to more correspondences. However, to provide a total marker area that is comparable to the other RPV-methods, only one large and two small markers are applied in the final configuration of Figure 29 (a). This setup should support a wide detection range due to the large marker and an accurate estimation for smaller distances due to three usable markers.
[Figure 28: Preliminary AprilTag evaluation — distance error [m] over distance [m] for (i) the simulated experiment (+Var, 500 fpb) and (ii) the real-world experiment (35 fpb), comparing AprilTags #3 + PnP CL and AprilTags #1 + PnP CL for AprilTag Type 1 and Type 2.]
5.2.1 Application
[Figure: WhyCon processing pipeline — pre-detection, re-detection, identification, position estimation.]
medians to calculate the optimal threshold τopt for each individual pattern. Based on
this threshold, each WhyCon marker mT is re-detected with optimal thresholding. As
discussed in Appendix A.3, the speed advantage that WhyCon gains by tracking the pattern is lost, but the proposed extension is robust to variations in exposure.
In the third step, the patterns that most likely represent the WhyCon markers attached to the vehicle are identified. A distinction is made between the two setups shown in Figure 29 (b) and (c).
First, in Figure 29 (b) a single WhyCon marker with the proposed coding of the next section is attached to the vehicle. The purpose of this configuration is to evaluate the performance of the distance estimation based on the circle of one large pattern. Due to the attached coding, the pattern can be directly identified by its code. In the comprehensive evaluation, however, the id is not used, as described in Section 5.2.4. Instead, the pattern is identified by its projected size, with the assumption that all faultily detected WhyCon patterns are smaller than this projected pattern or have a less circular shape. This assumption holds true for all datasets considered in this work. Moreover, due to the applied code the inner circle no longer represents a perfect circle, which is why the proposed correction of the circle semiaxis of (Krajník et al., 2014, p. 8) is not applied.
Second, Figure 29 (c) shows a configuration with four normal WhyCon markers whose detected center points are used for PnP, using a method similar to Algorithm 2 (p.29). The idea is to reduce inaccuracies by employing a large quadrangle spanned by four small patterns. They are identified by their equal appearance and their spatial correlations during the creation of candidates in the method assignMarkers. Thus, four patterns form a candidate if the following condition regarding their size is fulfilled, where the square brackets include the four patterns and e0 states the first eigenvector of the ellipse of mT (see Section 3.2.3).
\frac{\max([m_T.e_0])}{\min([m_T.e_0])} < 1.5  \qquad (11)
The patterns are then assigned to the attached WhyCon markers based on their spatial correlation. For example, the projected down-left pattern of Figure 29 (c) has a smaller x-value than the two patterns on the right and a larger y-value than the two above. Only one possibility remains for each foursome combination.
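A minimal Python sketch of this grouping and assignment is given below; the pattern representation and the field names are illustrative assumptions, and e0 stands for a scalar size measure derived from the first eigenvector of the ellipse, as used in Equation 11.

```python
from itertools import combinations

def candidate_foursomes(patterns, ratio_limit=1.5):
    """patterns: list of dicts with keys 'x', 'y' (image position) and 'e0' (size measure).
    Yields groups of four patterns whose sizes differ by less than the allowed ratio (Eq. 11)."""
    for group in combinations(patterns, 4):
        e0 = [p["e0"] for p in group]
        if max(e0) / min(e0) < ratio_limit:
            yield group

def assign_corners(group):
    """Assign the four patterns to down-left, down-right, up-right, up-left by their
    image coordinates (y grows downwards)."""
    xs = sorted(group, key=lambda p: p["x"])
    left, right = xs[:2], xs[2:]
    up_left, down_left = sorted(left, key=lambda p: p["y"])
    up_right, down_right = sorted(right, key=lambda p: p["y"])
    return {"down_left": down_left, "down_right": down_right,
            "up_right": up_right, "up_left": up_left}

# Illustrative usage with four detected patterns:
patterns = [{"x": 100, "y": 300, "e0": 12}, {"x": 300, "y": 305, "e0": 13},
            {"x": 305, "y": 120, "e0": 11}, {"x": 98, "y": 118, "e0": 12}]
for group in candidate_foursomes(patterns):
    corners = assign_corners(group)
```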
Figure 31: Visualization of steps to estimate the angular shift of the code
Inspired by WhyCode (Lightbody et al., 2017), a coding system and a code extraction method are proposed in this section. The goal of this implementation is to evaluate the detection range of WhyCon with code identification, not to compete with WhyCode itself. The proposed coding system is presented in Figure 31 (a). The code is attached to the inner border of the black circle. Two large opposing excesses define the beginning of the hidden code. Three bits on the left that are used to store the id are negated and mirrored on the right side.
[Figure: Code layout (start marker, bits B1 B2 B3 and their negated mirror) and code extraction steps — transform to circle coordinates, circular gradient, Hough transform, α-shift estimation, code extraction.]
The code extraction algorithm is summarized in Figure 31 (b) and detailed in Figure 32. First, the subimage of the detected ellipse is transformed into circle coordinates with a fixed resolution of 312 px. This transformation ensures the correct weighting of the gradients in the Hough space. Second, a clockwise circular gradient is applied by first calculating the image gradients in x- and y-direction with the Sobel operator (Jähne, 2005, p. 365) and then projecting them in circular direction. The circular gradients are then registered in a one-dimensional Hough space to find the angle that corresponds to the beginning of the code. Since two possibilities remain, the assumption that the pattern cannot be upside down in the image is employed. With the α-shift at hand, the n pixel values pi of each bit are sampled and a binary value bi is assigned based on the spatially-varying threshold formulated in Equation 12. This assignment method requires equal numbers of black and white bits.
b_i = \begin{cases} 0 & \text{if } p_i < \frac{1}{n} \sum_{j=1}^{n} p_j \\ 1 & \text{else} \end{cases}  \qquad (12)
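Under one possible reading of Equation 12 — pi being the sampled grey value of bit i and n the number of code bits — the bit assignment and the redundancy check of the proposed 3+3 bit layout could look like the following Python sketch (grey values and helper names are illustrative):

```python
import numpy as np

def extract_bits(bit_samples):
    """bit_samples: one representative grey value per code bit, already sampled after
    the alpha-shift has been applied. The threshold is the mean over all bits (Eq. 12),
    which requires equal numbers of black and white bits in the code."""
    p = np.asarray(bit_samples, dtype=float)
    return (p >= p.mean()).astype(int)

def decode_id(bits):
    """Check the proposed redundancy: the three id bits on the left must be the negated
    and mirrored version of the three bits on the right."""
    left, right = bits[:3], bits[:-4:-1]        # right half read mirrored
    if not np.array_equal(left, 1 - right):
        return None                             # code inconsistent -> reject detection
    return int("".join(map(str, left)), 2)

bits = extract_bits([20, 240, 235, 18, 25, 230])   # illustrative grey values
marker_id = decode_id(bits)
```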
This preliminary evaluation investigates the detection range of WhyCon markers with and without an attached binary code. Figure 33 shows the used datasets. The simulated setup (a) includes multiple WhyCon markers in front of a wall with different tilts. This setup is used to estimate the detection range for scenes with variation (WC Detection). (b) shows multiple WhyCon markers extended with a binary code. This setup is used to apply the WhyCon pattern detection either with (WC* Identification) or without identification (WC* Detection). Finally, on the real-world setup (c) all three methods are applied to confirm the results from the simulated experiment. All markers of these setups have a total diameter of 24cm, including the outer white circle (see Appendix B.2).
One property of the simulated experiment is that all detection rates start decreasing quite early at around 30m, which is caused by the strongly angled markers. The displayed line at a detection rate of 0.5 seems to be suited for a comparison of the methods since it still includes all slightly tilted markers. The experiment (a) of Figure 34 shows that the extended WhyCon pattern even performs better than the normal WhyCon. This is caused by the adaptive thresholding, which favours a different black ring width than the original method, as detailed in App. A.3.2. When applying the identification of the code, the detection range drops by 10m at a detection rate of 0.5.
Both observations can be reproduced in the real-world experiment, which only contains slightly tilted markers. First, the detection range of both marker types is nearly equal. Second, the identification reduces the detection range of the extended WhyCon marker by 10-15m. In absolute terms, the code of this marker size was always successfully identified up to 25 meters, while the detection of the extended marker succeeds up to 40m with a 100% detection rate. This observation implies a loss of around 30% of the detection range when applying code extraction.
Figure 34: Analysis of the detection range for different WhyCon patterns
5.2.4 Summary
Two WhyCon configurations are chosen for the comprehensive evaluation. First, a sin-
gle large WhyCon marker with attached binary code. However, the code identification
is not applied further since the advantage of the extended WhyCon marker is that
it can be detected even though the code can not be extracted. Second, four single
WhyCon markers without attached code are used to apply the PnP approach.
The result of Semi-Global Matching is a disparity map, which allows the depth of each pixel to be estimated by triangulation. From this disparity map, all disparities that belong to the preceding car need to be classified, which requires a labeling of the vehicle in the image. This could be done by using a vehicle detection and classification method (Sivaraman and Trivedi, 2013a). However, such a preliminary step makes the estimated result dependent on the quality of the classification. This is avoided in this work since the focus is to test the potential of estimating the distance by SGM. Thus, the vehicle is labeled in the image by the support parameters sm, sw, sh. During simulation, these points are defined on the three-dimensional vehicle mesh, as illustrated in Figure 35, and are then projected into the camera. For the real-world experiment, they are labeled by hand for each different position.
The image points sm, sw, sh are then used to create a rectangle, as shown in Figure 35. For each pixel of this subimage, a sampling weight is assigned based on a two-dimensional Gaussian distribution with the mean sm and the sigmas σx = 1/3·|sm,x − sw,x|, σy = 1/3·|sm,y − sh,y|. After normalization, this distribution is used to sample 101 disparities by stochastic universal sampling. For all samples, the depths of the pixels are triangulated. Finally, the median of all distances is chosen as the representative vehicle distance. This sampling is implemented to compensate for single outliers, which can occur due to image noise and light reflections. However, if the rectangle contains a bulge of the vehicle surface, a constant bias can occur in the statistical evaluation, which is not considered in this implementation.
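A minimal Python sketch of this sampling scheme is shown below; it applies the Gaussian weighting over the whole disparity map instead of an explicit rectangle crop, and all parameter names are illustrative.

```python
import numpy as np

def sample_vehicle_distance(disparity, s_m, s_w, s_h, focal_px, baseline, n_samples=101, seed=0):
    """Median depth of Gaussian-weighted disparity samples around the support point s_m.
    s_m, s_w, s_h are (x, y) image points; focal_px and baseline define the triangulation."""
    sm_x, sm_y = s_m
    sigma_x = abs(sm_x - s_w[0]) / 3.0                # sigmas as defined in the text
    sigma_y = abs(sm_y - s_h[1]) / 3.0
    h, w = disparity.shape
    ys, xs = np.mgrid[0:h, 0:w]
    weights = np.exp(-0.5 * (((xs - sm_x) / sigma_x) ** 2 + ((ys - sm_y) / sigma_y) ** 2))
    weights[disparity <= 0] = 0.0                     # ignore invalid disparities
    weights /= weights.sum()
    # Stochastic universal sampling: evenly spaced pointers over the cumulative weights.
    cdf = np.cumsum(weights.ravel())
    rng = np.random.default_rng(seed)
    pointers = rng.uniform(0, 1.0 / n_samples) + np.arange(n_samples) / n_samples
    idx = np.searchsorted(cdf, pointers)
    depths = focal_px * baseline / disparity.ravel()[idx]   # triangulate each sample
    return float(np.median(depths))                   # the median compensates single outliers
```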
[Figure 35: Support points sm, sw, sh labeled on the vehicle and the sampling rectangle derived from them.]
If markers are applied to the vehicle and the scene is observed by a stereo-camera, then it is convenient to apply triangulation to the detected pattern in both images to produce another, potentially independent measurement. For this, one reference pattern each of AprilTag and of WhyCon is used for triangulation.
In the case of AprilTag, the pattern with the correct id is matched between both images. All four corner points are triangulated to estimate their three-dimensional position relative to the left camera. The distance is then calculated to the center of the marker, defined as the mean of the corner positions. In the case of WhyCon, a large pattern attached to the vehicle is used for triangulation. The pattern is identified as explained in Section 5.2.1.
In the case that the reference point is not the center of the considered marker, the distance is corrected by the Pythagorean theorem under the assumption that the reference point lies on a vertical line through the estimated marker center. Thus, this correction is not generally applicable, but it is sufficient for the forthcoming consideration and is only used to correct the distance of the estimation with the large AprilTag of Figure 21 (c).
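The following Python sketch illustrates the triangulation of matched marker points in a rectified stereo pair and one possible form of the described Pythagorean correction (assuming the reference point lies at camera height on the vertical line through the marker center); all numeric values are illustrative.

```python
import numpy as np

def triangulate(pt_left, pt_right, focal_px, cx, cy, baseline):
    """Triangulate a point matched in the rectified left and right image.
    Returns its 3D position in the left camera frame."""
    disparity = pt_left[0] - pt_right[0]
    z = focal_px * baseline / disparity
    x = (pt_left[0] - cx) * z / focal_px
    y = (pt_left[1] - cy) * z / focal_px
    return np.array([x, y, z])

def corrected_distance(marker_center_3d, vertical_offset):
    """Distance to the reference point, assuming it lies on a vertical line through
    the estimated marker center (one reading of the Pythagorean correction)."""
    d_marker = np.linalg.norm(marker_center_3d)
    return np.sqrt(max(d_marker ** 2 - vertical_offset ** 2, 0.0))

# Illustrative values: the four AprilTag corners matched in both rectified images.
corners_left = [(610, 300), (650, 300), (650, 340), (610, 340)]
corners_right = [(598, 300), (637, 300), (637, 340), (598, 340)]
pts = [triangulate(l, r, focal_px=1200.0, cx=640.0, cy=360.0, baseline=0.34)
       for l, r in zip(corners_left, corners_right)]
center = np.mean(pts, axis=0)
d = corrected_distance(center, vertical_offset=0.8)
```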
Figure 36 shows a short comparison of the SGM and the marker-based triangulation
methods. The configuration of Figure 21 (c) is simulated for two small distances (i)
and two far distances (ii). The experiment shows that the triangulated distance by
the AprilTag is most accurate for short distances, but not applicable for far distances
since the pattern is not detected anymore. WhyCon is more accurate than SGM
for short distances. This is because the triangulation of WhyCon is based on the
center of the circular pattern, which can be determined with high precision. This
advantage decreases for far distances, because the number of pixels used for the center
determination gets smaller. The consequence is that WhyCon shows a larger spread
for far distances than SGM.
To reduce the number of compared methods in the following evaluation, only SGM is considered at first as representative of the stereo methods. The triangulation of the markers is picked up again in Section 6.3.
Figure 36: Comparison of SGM with marker-based triangulation — distance error [m] over distance [m] (500 fpb) for AprilTag + Tri., WhyCon + Tri. and SGM
6 Evaluation
In this section, the selected configurations of Figure 29 (p.31) and the SGM-based
approach are evaluated, with respect to the formulated research questions. For an ex-
planation of the plot characteristics, please revisit section 4.4 (p.28).
Concerning Research Question (1), the general potential is investigated by concentrat-
ing on specific variation parameters in simulation. That includes the distance, the
view-angle and image exposure. During these simulated experiments, the calibration
uncertainty is not modeled to retain noiseless results. Then the influence of calibration uncertainty on the different methods is investigated, motivated by Research Question (2). Based on a Monte-Carlo-Simulation (MCS), the uncertainties of the methods are estimated and compared. Then, the correlation of the methods with specific calibration parameters is explored and the discovered correlations are analyzed in more detail. Finally, to answer Research Question (3), it is investigated how the methods could be combined to yield a more accurate, more robust and less uncertain estimation.
A few of the experiments refer to the appendix, which contains tables with specified
information. The computational time is briefly considered in Appendix A.4.
6.1.1 Distance
Figure 37 shows a comparison of the RPV-methods pointed out in the previous section
based on the distance to the preceding vehicle. Therefore, a simulated and a real-world
experiment are conducted based on the setup of Figure 21 (c, p.23). For both exper-
iments, three short and three far distances are evaluated in Figure 37. The simulated
experiment (i, left) shows a high precision of the PnP-based methods for short distances. However, the spread of AprilTags+PnP suddenly increases at a distance of 20m. This is caused by the missing detection of the smaller AprilTags and the accompanying dependency on image exposure, discussed in Section 6.1.3. Relatedly, the estimation based on the single WhyCon shows a relatively large spread for short distances caused by the same dependency on image exposure. However, Figure 37 (i, right) shows the advantage of WhyCon+Circle manifested in its large detection range, even though this marker configuration has the smallest occupied area of all three considered marker configurations of Figure 22 (p.24). Finally, the SGM approach provides good results comparable to WhyCon+Circle. Its results for the large distance indicate a smaller correlation with the varied parameters than WhyCon+Circle, but show an increasing bias of the average distance deviation for larger distances, which reveals the limitations of the disparity sub-pixel resolution of this matching algorithm.
Figure 37: Comparison based on simulation and a real-world experiment — distance error over distance [m] for AprilTags + PnP CL, WhyCons + PnP CL, WhyCon + Circle CL and SGM

The results of the real-world experiment of Figure 37 (ii) confirm these observations. Please note that the real-world experiment only represents a snapshot of all applied variations, since the 50 frames at each distance are taken with the same pose, which
makes image noise the only varied parameter. Also, the calibration parameters represent only a fixed but random selection based on the individual distributions. The calibration uncertainty is considered for the simulation in Section 6.2. The real-world experiment (ii, left) shows an average bias of around 4cm for all methods, which could be caused by inaccurate recordings of the ground truth values. And similar to (i, right), (ii, right) indicates a wide application range of WhyCons+PnP, which is considered in more detail in the next section. This method shows a large spread in the simulated data at a distance of 80m, which is caused by the variation of the scene. Especially the image exposure has a great influence, as discussed in Section 6.1.3 (p.41). This large spread is not present in the real-world experiment (ii, right), where the frames differ only in image noise at each distance. The influence of image noise on WhyCon is briefly discussed in Appendix A.3.3.
Figure 38 shows an evaluation of the application range (i) and the accuracy for two
selected distances (ii) for different view-angles, accomplished by varying the parameter
α of Section 4.1 (p.21). The consideration of the view-angle is necessary, because trains
have different flat noses and the view-angle changes in curves. Figure 39 illustrates the
considered α values.
The bars of Figure 38 (i) illustrate the application range for each method with a marking
of the success rate of the estimations for 95%, 80% and 50%. For this experiment, 100
frames are simulated and evaluated at each noted distance for each angle. Based on
this estimation, the success rate is estimated and interpolated between the evaluated distances to provide a good impression of the method behaviors.

Figure 38: Evaluation of the application range for different view-angles (App. B.4.1) — success rates (95%, 80%, 50%) and distance errors over distance [m] for view-angles of 0°, 30° and 60°; methods: AprilTags + PnP CL, WhyCons + PnP CL, WhyCon + Circle CL, SGM

It is obvious that
the application range of the marker-based methods decreases with a more acute view-angle. However, the higher application range of AprilTags+PnP at 30° than at 0° is unexpected. At a distance of 30m and an angle of 0°, the big AprilTag is still detected
in almost all frames. But the subsequent distance estimation with RPnP based on
the four corners of the marker frequently fails in this situation where the AprilTag is
almost perpendicular in front of the vehicle. The application range of WhyCons+PnP
is slightly worse than AprilTags+PnP at a view-angle of 30°.

Figure 39: Exemplary simulated images for the application range comparison. (Increased contrast and brightness for better illustration)

This is caused by the slight curvature of the vehicle rear, noticeable in Figure 39 (c). This leads to a rapidly
occurring non-detection of the upper left WhyCon and, since all four corners need to be detected for the PnP estimation, the application range is restricted. On the other hand, WhyCon+Circle provides a robust application range with a success rate of 95% up to 70m at a view-angle of 0°, but it drops down to 40m when the view-angle is greatly increased. The application of SGM is less meaningful at this point, since the car is labeled in the image for this approach.
When considering the accuracy for the selected distances in Figure 38 (ii), it is striking that the influence of the view-angle is negligible. An exception is WhyCon+Circle, which shows an increasing inaccuracy for more acute angles, well visible at the distance of 40m in Figure 38 (ii, right). This is caused by the increased view-angle, which reduces the number of pixels covered by the projected marker and thus has a similar effect as increasing the distance of the marker to the camera.
When using fiducial markers for distance estimation, image exposure can have a relevant influence on the estimated distance. (Mangelson et al., 2016) has shown in a real-world experiment that the corner detection of AprilTags is highly affected by image blooming, i.e. the bleeding of white areas into surrounding pixels, which varies for different exposure factors. They solved this problem by surrounding the AprilTag with small circles, whose center estimation is more robust to blooming effects. Thus, this effect needs to be considered for the application of the proposed marker configurations. For this purpose, the different RPV-methods are examined for different exposure factors in simulation. Please note that this investigation is restricted in its universality, since the applied simulator does not explicitly model image blooming and no suitable real-world experiment was conducted.
[Figure 40: Distance error [m] over image exposure (500 fpb) for AprilTag #1 + PnP CL, AprilTags #3 + PnP CL, WhyCons + PnP CL and WhyCon + Circle CL.]
Figure 40 shows the behavior of the approaches for variation of image exposure. Since
Section 6.1.1 implied different behaviors for the application of a single (AprilTag#1)
and multiple AprilTags (AprilTags#3), this experiment considers both variants. Also
both WhyCon approaches are evaluated. SGM has shown that it does not significantly
correlate with image exposure, which is why it is not considered in this section.
In the case of image exposure with a value of 1.0, the projected pixels of the markers' black areas have a grey value of 5 and the pixels of white areas a value of 255 in simulation. This implies that an exposure of 1.0 marks the transition to saturation. The single AprilTag shows a bias in the distance error for overexposure (>1.0). This is caused by the saturation that pushes the AprilTag borders inwards, as described in (Mangelson et al., 2016). Figure 41 illustrates this effect. It shows the transition between the marker's black and white areas, quantized to pixels xj and grey values, for different exposure factors. When increasing the exposure from 1.0 to 1.5, the grey value of the border pixel xi increases and the estimated border is pushed towards the black area, which makes the marker appear smaller in the image; thus, on average a too large distance is estimated. This effect does not occur for low exposure, because the relation between the values of the pixels xi−1, xi, xi+1 remains the same. This effect is also balanced when using multiple AprilTags, since the relative distance of the markers to each other remains unchanged. However, since two of the AprilTags are rather small in the chosen configuration of Figure 29 (a, p.31), this effect occurs for larger distances, as observed in Section 6.1.1. This results in a long tail of the upper whiskers.
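A toy numeric example of this border shift is given below; a simple mid-level interpolation stands in for the actual edge fit of the detector, and the grey values are illustrative.

```python
import numpy as np

def edge_position(values, positions):
    """Sub-pixel edge location: linear interpolation at the half-way grey level.
    Assumes the black-to-white transition lies inside the sampled window."""
    level = 0.5 * (values.min() + values.max())
    i = int(np.argmax(values >= level))               # first pixel above the level
    x0, x1, v0, v1 = positions[i - 1], positions[i], values[i - 1], values[i]
    return x0 + (level - v0) / (v1 - v0) * (x1 - x0)

positions = np.array([-2, -1, 0, 1, 2], dtype=float)  # pixels x_{i-2} ... x_{i+2}
ideal = np.array([5.0, 5.0, 130.0, 255.0, 255.0])     # black-to-white transition

for exposure in (0.8, 1.0, 1.5):
    observed = np.clip(ideal * exposure, 0, 255)      # saturation at 255 for over-exposure
    print(exposure, edge_position(observed, positions))
```

For exposures of 0.8 and 1.0 the interpolated edge stays at the same position, while for 1.5 the clipping of the white side shifts it towards the black area, i.e. the marker appears smaller.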
[Figure 41: Grey-value transition at the marker border over the pixels xi−2 … xi+2 for different exposure factors, with the estimated threshold and the resulting binary assignment.]
The same overexposure effect occurs for the application of WhyCon+Circle. Please note that the compensation of the incorrect diameter estimation of (Krajník et al., 2014, p. 8) is not applied, as stated in Section 5.2.1. For this approach, Figure 41 also provides an illustration of the estimated threshold (red line) and the resulting binary assignment for each pixel. In contrast to AprilTags+PnP, WhyCon+Circle shows a slightly negative bias for the remaining exposure factors, which implies that the line detection of AprilTag is potentially more accurate than the hard binary assignment of WhyCon's circle detection. Similar to the usage of multiple AprilTags, the PnP estimation based on the centers of four WhyCons is invariant to image exposure, since the projected pattern is equally affected in all directions and the estimated center remains the same. This was shown in (Mangelson et al., 2016) for circular markers in a real-world experiment.
Figure 42 repeats the experiment of Figure 37 (p.38, i) and additionally applies the uncertainty of the calibration parameters according to Section 4.3.2. This includes the uncertainty of the stereo calibration, of the interior camera parameters and of the marker attachment on the vehicle, noted in Table 2 (p. 25). When comparing Figure 42 (iii) with Figure 37 (i), it is striking that the modeled calibration uncertainties increase the spread and thus the uncertainty of all RPV-methods. The uncertainty of each method is represented by the standard deviation of the assumed resulting Gaussian distribution, marked with three small ticks. The middle tick represents the bias.
Figure 42: Comparison based on simulation with variation and uncertainty — distance error over distance [m] (5000 fpb) for AprilTags + PnP CL, WhyCons + PnP CL, WhyCon + Circle CL and SGM
When considering the marker-based methods for near distances, WhyCon+Circle has the smallest uncertainty. This is caused by its small dependency on all calibration parameters; the dependencies are explored in the next section. The two PnP-based methods show approximately the same behavior, with a relatively large uncertainty compared to WhyCon+Circle. This is caused by their strong dependency on the marker calibration. In contrast, SGM shows a comparatively large uncertainty that rapidly increases with larger distances. This is caused by strong dependencies on many calibration parameters of the camera. The course of the pictured standard deviation shows a quadratic rise of the uncertainty of the estimated distance with the examined distance. This matches the theoretical consideration of Section 3.2.1 with Figure 12 (p. 12).
Figure 43: (a) Examples of result distributions with extreme outliers (probability over distance error [m]; gm-100%, gm-99.5%, gm-SGM, gm-SGM-outliers); (b) anaglyph and disparity image
The pictured standard deviations of Figure 42 exclude extreme outliers from the calculation. Figure 43 (a) shows two distributions that contain extreme outliers (the outliers themselves are not pictured). Concerning AprilTags+PnP, gm-100% represents the Gaussian PDF that results if the extreme outliers are included. The actual distribution is not well represented. Because of that, only the middle 99.5% of the sorted data is considered for all standard deviation estimations, resulting in gm-99.5%.
Second, SGM shows another characteristic for far distances in (a, bottom). The plot for 70m shows a concentration of outliers at a distance error of around -65m. Figure 43 (b, top) shows an example of a rectified image pair used for the SGM approach at 80m that reveals the reason for this characteristic. It is rectified by a camera model composed of sampled calibration parameters. In such an extreme case, SGM is no longer able to match both images correctly, since it only matches along the same image line. The resulting disparity map of (b, bottom) shows a sparse disparity estimation with only wrong estimations due to incorrect matching. To filter out these faulty estimations, the estimated results of all iterations of one scene with a distance of less than 1/3 of the currently investigated distance are sorted out, if this condition concerns at least 0.5% of the already cropped data. Figure 43 (a, bottom) shows that the resulting Gaussian distribution gm-SGM represents the data better. This consideration shows that SGM is not robust to calibration uncertainty and requires calibrated camera parameters of high precision with low uncertainty.
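The following Python sketch summarizes this robust evaluation (central 99.5% of the sorted data, removal of the SGM mismatch cluster below one third of the investigated distance); the thresholds follow the text, the function names are illustrative.

```python
import numpy as np

def robust_std(estimates, keep=0.995):
    """Standard deviation and mean over the central fraction of the sorted data (gm-99.5%)."""
    d = np.sort(np.asarray(estimates, dtype=float))
    cut = int(round(len(d) * (1.0 - keep) / 2.0))
    core = d[cut:len(d) - cut] if cut > 0 else d
    return core.std(ddof=1), core.mean()

def filter_sgm_mismatches(estimates, true_distance, min_fraction=1.0 / 3.0, min_share=0.005):
    """Remove the concentration of mismatched SGM results (estimates below one third
    of the investigated distance), but only if they make up at least 0.5% of the data."""
    d = np.asarray(estimates, dtype=float)
    mask = d < min_fraction * true_distance
    return d[~mask] if mask.mean() >= min_share else d
```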
6.2.2 Correlation
[Figure 44: Absolute correlation of the RPV-methods (AprilTags + PnP CL, WhyCons + PnP CL, WhyCon + Circle CL, SGM) with the calibration uncertainties of the marker positions (X, Y, Z) and scales (S) of Tag 0 to Tag 7.]
The correlations are displayed by their absolute values; red indicates a high correlation and dark blue almost none. Please note that only the distance estimations of the left camera are considered for the marker-based methods. The corresponding right-camera estimations are shown in Appendix B.4.4.
Figure 44 shows the dependencies of the applied RPV-methods on the calibration uncertainty of the marker positions in X-, Y- and Z-direction and of the marker scaling S. The rotation of each marker is not displayed, because its correlations have proven to be negligible. Figure 44 points out that the PnP-based methods strongly correlate with the calibrated positions in Y- and Z-direction of the markers, which scale the most important reference lengths of the model. Based on these correlations, the most significant direction of each marker is drawn into Figure 46 (a). In contrast to the PnP methods, WhyCon+Circle shows a great correlation only with the scale of the marker, which is most significant for the estimation with a single marker, even in comparison to the X-direction itself.
[Figure 45: Absolute correlation of the RPV-methods with the camera calibration parameters (focal length f, principal point u0, v0 and distortion parameters k1, k2 of CL and CR, the rotations and translations of the stereo transformation HCR2CL) and with image exposure.]
Figure 45 shows the correlation of the methods with all considered camera calibration
parameters. This includes the focal length, principal point and two distortion parame-
ters of each camera as well as the stereo transformation HCR2CL . When considering all
marker-based methods, the correlation matrix indicates that only the uncertainty of
the focal length has a noticeable influence on the measured distance. In contrast, SGM
is highly affected by the uncertainties of the principal points in horizontal direction and
also by the uncertainty of the rotation around the y-axis of HCR2CL . Both influences
are caused by the direct connection to the triangulation calculation. Figure 46 (b) highlights these parameters.
To complete this consideration, Figure 45 also includes a visualization of the methods' correlation with image exposure. As pointed out in Section 6.1.3, only WhyCon+Circle has a noticeable correlation. The correlation with the position and orientation trembling of the cameras is negligible and not illustrated.
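Such a correlation matrix can be computed from the Monte-Carlo samples as in the following Python sketch (absolute Pearson correlation; the data layout is an illustrative assumption):

```python
import numpy as np

def correlation_matrix(samples, distances):
    """samples: dict {parameter name: array of sampled values over all Monte-Carlo
    iterations}; distances: dict {method name: array of estimated distances}. Returns
    the absolute Pearson correlation of every method with every parameter."""
    names, methods = list(samples), list(distances)
    corr = np.zeros((len(methods), len(names)))
    for i, m in enumerate(methods):
        for j, p in enumerate(names):
            corr[i, j] = abs(np.corrcoef(samples[p], distances[m])[0, 1])
    return methods, names, corr
```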
Figure 46: Visualization of the most influencing calibration parameters (App. B.4.4)
In Figure 47, the investigation of different severities of uncertainty of the marker calibration is illustrated. For this, the standard deviations σm of all components of the marker calibration are scaled by a factor sm; the camera calibration uncertainties stay unchanged. A clear difference in the uncertainties of the measured distances of the marker-based methods is recognizable between sm = 0, which corresponds to no marker calibration uncertainty, and the normal case sm = 1. This tremendous difference confirms the observation of the previous section that the calibration of the markers is most significant for these methods. When increasing the factor up to three, which implies a marker
uncertainty of 3cm translation on each axis and 3° rotation around each axis, the uncertainty of the PnP-based methods exceeds that of the SGM-approach. This shows the significance of an accurate marker calibration.

Figure 47: Uncertainty of marker calibration parameters (App. B.4.5) — distance error at 10m over the scaling factor sm [∗σm] (simulated experiment, +Var,Unc, 5000 fpb)

Figure 48: Uncertainty of camera calibration parameters (App. B.4.6) — distance error at 10m over the scaling factor sc [∗σc] (simulated experiment, +Var,Unc, 5000 fpb)

Figure 48 shows the complementary experiment: the standard deviations σc of the camera calibration parameters are scaled by a factor sc, while the marker calibration uncertainties stay unchanged. Reducing the camera calibration uncertainty has only a small influence on
the marker-based approaches. In contrast, the resulting uncertainty of SGM decreases
almost linearly with a smaller camera uncertainty. In order to be competitive with
the marker-based methods, SGM needs a camera calibration that is three times less
uncertain than the applied parameters.
Figure 49: Influence of the baseline on uncertainties (App. B.4.7) — distance error over baseline [m] (5000 fpb)

Finally, Figure 49 considers different baselines of the used camera system to investigate their influence on the uncertainty of the SGM distance estimation. The standard baseline of the camera system is 0.34m,
which is represented by the first box plot of the plot. Then, when linearly increasing
the baseline, the uncertainty of SGM decreases with a square-root shape. At a baseline three times larger than the applied one, the uncertainty of SGM is comparable to the marker-based methods. Thus, this experiment shows the advantage of a large baseline. However, at a baseline of 1.36m the disparity of the object at a distance of 20m exceeds the maximum set disparity of 128px, which is limited to bound the required resources and computational power. Thus, the matching fails and produces a distance estimate that is not within the plotted limits of the distance error.
Besides the baseline, the focal length could also be increased. This is not considered further, since the given focal length of around 12.5mm [DLRStereo] is already rather large.
[Figure 50: Absolute correlation between the RPV-methods — AprilTag + PnP (CL, CR), WhyCons + PnP (CL, CR), WhyCon + Circle (CL, CR), AprilTag + Tri., WhyCon + Tri. and SGM.]
Figure 50 visualizes the correlation of the different RPV-methods. For this correla-
tion analysis, the same extensive Monte-Carlo-Experiment is used as in Section 6.2.2.
Figure 50 shows a clear independence between the marker-based methods and the tri-
angulation methods. The stereo methods strongly correlate with each other, caused by
the shared correlation with the camera calibration parameters. The monocular marker-based methods show a strong correlation between the estimations of the left camera CL and the right camera CR, which is caused by their shared strong correlation with the calibration uncertainty of the markers.
Figure 51: Combination of results based on uncertainties (App. B.4.8) — (i) estimated uncertainties and (ii) distance errors over distance [m] for WhyCon + Circle CL/CR, WhyCon + Tri., SGM and the Combined method

Figure 51 (i) shows that the estimated uncertainties of the stereo-based methods are much higher than those of the monocular-
based methods. Thus, it is unattractive for this theoretical investigation to use only
the result of the method with the smallest uncertainty. The distance of the Combined
method is estimated for each iteration based on the uncertainties of Figure 51 (i),
formulated in the following equation:
d_c = \frac{1}{K} \sum_{m=1}^{4} \frac{1}{\sigma_m^2}\, d_m \,, \quad \text{with} \quad K = \sum_{m=1}^{4} \frac{1}{\sigma_m^2}  \qquad (13)
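A minimal Python sketch of this inverse-variance weighting, with illustrative values for the four estimates, is:

```python
import numpy as np

def combine(distances, sigmas):
    """Inverse-variance weighted combination of the individual estimates (Eq. 13)."""
    d = np.asarray(distances, dtype=float)
    w = 1.0 / np.asarray(sigmas, dtype=float) ** 2
    return float(np.sum(w * d) / np.sum(w))

# Illustrative values for the four estimates and their estimated uncertainties:
d_c = combine([20.02, 20.05, 19.6, 20.9], [0.05, 0.05, 0.4, 0.6])
```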
Figure 51 (ii) shows a small improvement in accuracy in comparison to the standalone methods, especially when considering a distance of 20m. However, when considering the resulting uncertainty of the combined method in Figure 51 (i), it is striking that it does not get smaller than the uncertainties of the marker-based methods. This is caused by the shared strong correlation of the standalone methods with the calibration uncertainties. As a result, the uncertainty does not improve.
This theoretical consideration shows that the uncertainty of the marker calibration not only worsens the estimation for one camera, it also prevents a multi-camera system from reducing the uncertainty. Furthermore, because of the strong influence of the camera calibration on the stereo-based methods, the variance-based weights of the stereo-based methods are too small to improve the result based on Equation 13.
However, a multi-camera system comes along with redundant measurements, which
ensure a valid result in the case that the WhyCon marker is not detected. Section 6.1.2
shows that this frequently applies for large distances and acute view-angles, whereas
SGM succeeds in all considered cases. Thus, the second pattern detection with the
right camera and the additional estimation based on SGM increase the robustness.
7 Conclusion
In this work, the application of fiducial markers for relative distance estimation was investigated with particular reference to "Virtual Coupling" of trains. Different marker configurations were evaluated, which include the application of multiple AprilTags as well as of four WhyCon markers to apply a PnP method and of a single WhyCon to estimate the
distance based on its outer circle. They were compared with an SGM approach and
were tested in a stereo setup. The related experiments were conducted in simulation
and real-world.
The first research question concerns the overall performance of different marker con-
figurations with respect to the applied distance, view-angle and image exposure and
noise. The application of multiple markers provided accurate results and bypassed the
dependency on image exposure that occurred when using single markers. However, the occupied marker area was relatively large with a comparatively low application range. In contrast, the single large WhyCon marker was more affected by image exposure, but provided the longest application range, boosted by the fact that its code did not need to be extracted in order to detect the marker. The SGM approach proved to be superior with respect to the application range even for acute view-angles, while providing the same accuracy and invariance to image exposure.
The second research question addresses the uncertainty of the individual estimations caused by uncertainty in calibration and their correlation with specific calibration parameters. The research has shown that SGM comes with many strong dependencies on camera calibration parameters, which resulted in comparatively large uncertainties. The experiments showed that the uncertainty of the given camera calibration parameters needs to be improved by a factor of three to provide an uncertainty comparable to the other methods. Enlarging the baseline by the same factor provided similar results. The approaches that are based on multiple markers correlated strongly with the calibration of the markers, but less with the camera calibration. The estimation based on the single marker was the most independent of all calibration parameters, which led to the smallest uncertainty of all considered methods.
The last research question scrutinizes the application of the marker-based methods in a stereo setup. For this, the single WhyCon configuration was used to estimate the
distance from both cameras individually and by triangulation. The SGM approach was
applied to generate a fourth estimation. The individual results were combined based on
the estimated uncertainties by the Monte-Carlo-Simulation. The research has shown
that the result of the mono-camera method was not improved significantly, due to
strong correlations between both single-camera estimations and the low influence of
the stereo methods caused by their high uncertainty. However, the robustness was in-
creased in cases of non-detection of the marker due to additional measurements.
The presented results imply that, of all considered methods and based on the given camera setup and the used calibration parameters, the single WhyCon estimation is most suited for the task of relative position estimation of vehicles. Therefore, I suggest using a stereo-camera system with SGM to ensure robustness and applying fiducial markers to gain high certainty. Effort should be put into the geometrical calibration.
Since the markers are robustly detected for short distances, the baseline should be set
as large as possible.
References
Ababsa, Fakhr-eddine and Malik Mallem (2004). “Robust camera pose estimation using
2d fiducials tracking for real-time augmented reality systems”. In: Proceedings of the
2004 ACM SIGGRAPH international conference on Virtual Reality continuum and
its applications in industry. Ed. by Judith Brown. New York, NY: ACM, p. 431.
isbn: 1581138849. doi: 10.1145/1044588.1044682.
Akenine-Möller, Tomas, Eric Haines, and Naty Hoffman (2008). Real-Time Rendering
3rd Edition. Natick, MA, USA: A. K. Peters, Ltd. isbn: 978-1-56881-424-7.
Badino, Hernán, Uwe Franke, and Rudolf Mester (2007). “Free Space Computation
Using Stochastic Occupancy Grids and Dynamic Programming”. In: Proc. Int’l Conf.
Computer Vision, Workshop Dynamical Vision.
Badino, Hernán, Uwe Franke, and David Pfeiffer (2009). “The Stixel World - A Com-
pact Medium Level Representation of the 3D-World”. In: url: http://www.lelaps.
de/papers/badino_dagm09.pdf.
Bergamasco, Filippo et al. (2011). “RUNE-Tag: A high accuracy fiducial marker with
strong occlusion resilience”. In: CVPR 2011. IEEE, pp. 113–120. isbn: 978-1-4577-
0394-2. doi: 10.1109/CVPR.2011.5995544.
Bergamasco, Filippo, Andrea Albarelli, and Andrea Torsello (2013). “Pi-Tag: A fast
image-space marker design based on projective invariants”. In: Machine Vision and
Applications 24.6, pp. 1295–1310. issn: 0932-8092. doi: 10.1007/s00138-012-0469-6.
Bergamasco, Filippo et al. (2016). “An Accurate and Robust Artificial Marker Based
on Cyclic Codes”. In: IEEE TRANSACTIONS ON PATTERN ANALYSIS AND
MACHINE INTELLIGENCE 38.12, pp. 2359–2373. doi: 10.1109/TPAMI.2016.
2519024.
Bernini, Nicola et al. (2014). “Real-time obstacle detection using stereo vision for au-
tonomous ground vehicles: A survey”. In: IEEE 17th International Conference on
Intelligent Transportation Systems (ITSC), 2014. Piscataway, NJ: IEEE, pp. 873–
878. isbn: 978-1-4799-6078-1. doi: 10.1109/ITSC.2014.6957799.
Birdal, Tolga, Ievgeniia Dobryden, and Slobodan Ilic (2016). “X-Tag: A Fiducial Tag
for Flexible and Accurate Bundle Adjustment”. In: 2016 Fourth International Con-
ference on 3D Vision (3DV). IEEE, pp. 556–564. isbn: 978-1-5090-5407-7. doi:
10.1109/3DV.2016.65.
Boyat, Ajay Kumar and Brijendra Kumar Joshi (2015). “A Review Paper: Noise Mod-
els in Digital Image Processing”. In: Signal & Image Processing : An International
Journal 6.2, pp. 63–75. issn: 22293922. doi: 10.5121/sipij.2015.6206.
Britto, Joao et al. (2015). “Model identification of an unmanned underwater vehi-
cle via an adaptive technique and artificial fiducial markers”. In: OCEANS 2015 -
MTS/IEEE Washington. Piscataway, NJ: IEEE, pp. 1–6. isbn: 978-0-9339-5743-5.
doi: 10.23919/OCEANS.2015.7404391.
Brown, Duane C. (1971). “Close-range camera calibration”. In: PHOTOGRAMMET-
RIC ENGINEERING 37.8, pp. 855–866.
Calvet, Lilian et al. (2016). “Detection and Accurate Localization of Circular Fidu-
cials under Highly Challenging Conditions”. In: 29th IEEE Conference on Computer
Vision and Pattern Recognition. Piscataway, NJ: IEEE, pp. 562–570. isbn: 978-1-
4673-8851-1. doi: 10.1109/CVPR.2016.67.
Caraffi, Claudio et al. (2012). “A system for real-time detection and tracking of vehicles
from a single car-mounted camera”. In: 2012 15th International IEEE Conference
on Intelligent Transportation Systems. IEEE, pp. 975–982. isbn: 978-1-4673-3063-3.
doi: 10.1109/ITSC.2012.6338748.
Chen, Shi-Huang and Ruie-Shen Chen (2011). “Vision-Based Distance Estimation for
Multiple Vehicles Using Single Optical Camera”. In: Second International Conference
on Innovations in Bio-inspired Computing and Applications (IBICA), 2011. Piscat-
away, NJ: IEEE, pp. 9–12. isbn: 978-1-4577-1219-7. doi: 10.1109/IBICA.2011.7.
Cordts, Marius et al. (2017). “The Stixel world: A medium-level representation of traffic
scenes”. In: Image and Vision Computing. issn: 02628856. doi: 10.1016/j.imavis.
2017.01.009.
Cucci, D. A. (2016). “Accurate Optical Target Pose Determination for Application
in Aerial photogrammetry”. In: ISPRS Annals of Photogrammetry, Remote Sensing
and Spatial Information Sciences III-3, pp. 257–262. issn: 2194-9050. doi: 10.5194/
isprsannals-III-3-257-2016.
Danescu, Radu and Sergiu Nedevschi (2014). “A Particle-Based Solution for Modeling
and Tracking Dynamic Digital Elevation Maps”. In: IEEE Transactions on Intelli-
gent Transportation Systems 15.3, pp. 1002–1015. issn: 1524-9050. doi: 10.1109/
TITS.2013.2291447.
Dhanaekaran, Surender et al. (2015). “A Survey on Vehicle Detection based on Vi-
sion”. In: Modern Applied Science 9.12, p. 118. issn: 1913-1852. doi: 10.5539/mas.
v9n12p118.
DLR, ed. (2016). Im Hochgeschwindigkeitszug durch die Nacht - DLR Wissenschaftler
entwickeln Zug-zu-Zug-Kommunikation. Germany. url: http://www.dlr.de/dlr/
desktopdefault.aspx/tabid-10122/333_read-17514/#/gallery/22712.
Elfes, Alberto (1989). “Using occupancy grids for mobile robot perception and naviga-
tion - Computer”. In: IEEE.
Erbs, Friedrich, Alexander Barth, and Uwe Franke (2011). “Moving vehicle detection
by optimal segmentation of the Dynamic Stixel World”. In: IEEE Intelligent Vehicles
Symposium (IV), 2011 ; 5 - 9 June 2011 ; Baden-Baden, Germany. Piscataway, NJ:
IEEE, pp. 951–956. isbn: 978-1-4577-0890-9. doi: 10.1109/IVS.2011.5940532.
Ernst, Ines and Heiko Hirschmüller (2008). “Mutual Information Based Semi-Global
Stereo Matching on the GPU”. In: Advances in visual computing. Ed. by George
Bebis. Vol. 5358. Lecture Notes in Computer Science. Berlin: Springer, pp. 228–239.
isbn: 978-3-540-89638-8. doi: 10.1007/978-3-540-89639-5_22.
Fiala, Mark (2005). “ARTag, a Fiducial Marker System Using Digital Techniques”. In:
CVPR ’05 Proceedings of the 2005 IEEE Computer Society Conference on Computer
Vision and Pattern Recognition, pp. 590–596.
Fischler, Martin A. and Robert C. Bolles (1981). “Random sample consensus: A paradigm
for model fitting with applications to image analysis and automated cartography”.
In: Communications of the ACM 24.6, pp. 381–395. issn: 00010782. doi: 10.1145/
358669.358692.
Forster, Roger (2000). “Manchester encoding: opposing definitions resolved”. In: Engi-
neering Science and Education Journal.
Funk, Eugen (2017). Next Generation Train. Meilenstein 24301703. Ed. by Deutsches
Zentrum für Luft- und Raumfahrt e.V. Berlin.
Gatrell, Lance B. and William A. Hoff (1991). “Robust Image Features: Concentric
Contrasting Circles and Their Image Extraction”. In: Proceedings Volume 1612, Co-
operative Intelligent Robotics in Space II 1992.
Geiger, Andreas, Philip Lenz, and Raquel Urtasun (2012). “Are we ready for Au-
tonomous Driving? The KITTI Vision Benchmark Suite”. In: Conference on Com-
puter Vision and Pattern Recognition (CVPR).
Griesbach, Denis, Dirk Baumbach, and Sergey Zuev (2014). “Stereo-vision-aided in-
ertial navigation for unknown indoor and outdoor environments”. In: 2014 Inter-
national Conference on Indoor Positioning and Indoor Navigation (IPIN). Piscat-
away, NJ: IEEE, pp. 709–716. isbn: 978-1-4673-8054-6. doi: 10.1109/IPIN.2014.
7275548.
Heikkila, J. and O. Silven (1997). “A four-step camera calibration procedure with
implicit image correction”. In: Proceedings of IEEE Computer Society Conference
on Computer Vision and Pattern Recognition. IEEE Comput. Soc, pp. 1106–1112.
isbn: 0-8186-7822-4. doi: 10.1109/CVPR.1997.609468.
Hermann, Simon and Reinhard Klette (2013). “Iterative Semi-Global Matching for
Robust Driver Assistance Systems”. In: Computer Vision - ACCV 2012. Ed. by David
Hutchison et al. Vol. 7726. Lecture Notes in Computer Science / Image Processing,
Computer Vision, Pattern Recognition, and Graphics. Berlin/Heidelberg: Springer
Berlin Heidelberg, pp. 465–478. isbn: 978-3-642-37430-2. doi: 10.1007/978-3-642-37431-9_36.
Hirschmuller, H. (2005). “Accurate and Efficient Stereo Processing by Semi-Global
Matching and Mutual Information”. In: CVPR. Ed. by Cordelia Schmid. Los Alami-
tos, Calif.: IEEE Computer Society, pp. 807–814. isbn: 0-7695-2372-2. doi: 10.1109/
CVPR.2005.56.
Hirschmüller, Heiko (2007). “Stereo Processing by Semi-Global Matching and Mutual
Information”. In: IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MA-
CHINE INTELLIGENCE. url: https://core.ac.uk/download/pdf/11134866.
pdf.
Jähne, Bernd (2005). Digitale Bildverarbeitung. 6th ed. Springer Berlin Heidelberg.
isbn: 3-540-24999-0.
JCGM (2008a). “Evaluation of measurement data - Guide to the expression of uncer-
tainty in measurement: GUM 1995 with minor corrections”. In: Joint Committee for
Guides in Metrology.
– (2008b). “Evaluation of measurement data - Supplement 1 to the Guide to the ex-
pression of uncertainty in measurement: Propagation of distributions using a Monte
Carlo method”. In: Joint Committee for Guides in Metrology.
Kato, H. and M. Billinghurst (1999). “Marker tracking and HMD calibration for a
video-based augmented reality conferencing system”. In: Proceedings, 2nd IEEE
and ACM International Workshop on Augmented Reality (IWAR’99). Los Alami-
tos, Calif: IEEE Computer Society, pp. 85–94. isbn: 0-7695-0359-4. doi: 10.1109/
IWAR.1999.803809.
Krajník, Tomáš et al. (2013). “External localization system for mobile robotics”. In:
16th International Conference on Advanced Robotics (ICAR), 2013. Piscataway, NJ:
IEEE, pp. 1–6. isbn: 978-1-4799-2722-7. doi: 10.1109/ICAR.2013.6766520.
Krajník, Tomáš et al. (2014). “A Practical Multirobot Localization System”. In: Journal
of Intelligent & Robotic Systems 76.3-4, pp. 539–562. issn: 0921-0296. doi: 10.1007/
s10846-014-0041-x.
Lehmann, Florian (2015). Evaluierung eines Inertialsensors. Implementierung einer
virtuellen Kamera mit Verzeichnung. Ed. by Deutsches Zentrum für Luft- und Raum-
fahrt e.V. in der Helmholtz-Gemeinschaft.
– (2016). Implementierung einer virtuellen Stereokamera. Ed. by Deutsches Zentrum
für Luft- und Raumfahrt e.V. in der Helmholtz-Gemeinschaft.
Lenz, Philip et al. (2011). “Sparse scene flow segmentation for moving object detection
in urban environments”. In: IEEE Intelligent Vehicles Symposium (IV), 2011 ; 5 -
9 June 2011 ; Baden-Baden, Germany. Piscataway, NJ: IEEE, pp. 926–932. isbn:
978-1-4577-0890-9. doi: 10.1109/IVS.2011.5940558.
Lepetit, Vincent, Francesc Moreno-Noguer, and Pascal Fua (2009). “EPnP: An Ac-
curate O(n) Solution to the PnP Problem”. In: International Journal of Computer
Vision 81.2, pp. 155–166. issn: 0920-5691. doi: 10.1007/s11263-008-0152-6.
Lessmann, Stephanie et al. (2016). “Probabilistic distance estimation for vehicle track-
ing application in monocular vision”. In: 2016 IEEE Intelligent Vehicles Symposium
(IV). IEEE, pp. 1199–1204. isbn: 978-1-5090-1821-5. doi: 10.1109/IVS.2016.7535542.
Ley, Andreas, Ronny Hänsch, and Olaf Hellwich (2016). “SyB3R: A Realistic Synthetic
Benchmark for 3D Reconstruction from Images”. In: SpringerLink.
Li, Shiqi, Chi Xu, and Ming Xie (2012). “A Robust O(n) Solution to the Perspective-
n-Point Problem”. In: IEEE Transactions on Pattern Analysis and Machine Intelligence
34.7, pp. 1444–1450. doi: 10.1109/TPAMI.2012.41.
Lightbody, Peter, Tomas Krajnik, and Marc Hanheide (2017). “A Versatile High-
Performance Visual Fiducial Marker Detection System with Scalable Identity En-
coding”. In: Proceedings of the Symposium on Applied Computing, pp. 276–282.
Liu, Yinan et al. (2017). “Calculating Vehicle-to-Vehicle Distance Based on License
Plate Detection”. In: Advances in Intelligent Systems and Computing 454.
Lu, Yin-Yu et al. (2011). “A vision-based system for the prevention of car collisions
at night”. In: Machine Vision and Applications 22.1, pp. 117–127. issn: 0932-8092.
doi: 10.1007/s00138-009-0239-2.
Lucas, Bruce D. and Takeo Kanade (1981). “An iterative image registration technique
with an application to stereo vision”. In: In IJCAI81, pp. 674–679.
Mangelson, Joshua G. et al. (2016). “Robust visual fiducials for skin-to-skin relative
ship pose estimation”. In: OCEANS 2016 MTS/IEEE Monterey. IEEE, pp. 1–8.
isbn: 978-1-5090-1537-5. doi: 10.1109/OCEANS.2016.7761168.
Menze, Moritz and Andreas Geiger (2015). “Object scene flow for autonomous ve-
hicles”. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition
(CVPR). Piscataway, NJ: IEEE, pp. 3061–3070. isbn: 978-1-4673-6964-0. doi: 10.
1109/CVPR.2015.7298925.
Moratto, Zack (2013). Semi-Global Matching. Ed. by LUNOKHOD. url: http://lunokhod.org/?p=1356.
Naimark, L. and E. Foxlin (2002). “Circular data matrix fiducial system and robust
image processing for a wearable vision-inertial self-tracker”. In: Proceedings / Inter-
national Symposium on Mixed and Augmented Reality. Los Alamitos, Calif.: IEEE
Computer Society, pp. 27–36. isbn: 0-7695-1781-1. doi: 10.1109/ISMAR.2002.1115065.
Nakamura, Katsuyuki et al. (2013). “Real-time monocular ranging by Bayesian trian-
gulation”. In: 2013 IEEE Intelligent Vehicles Symposium (IV). IEEE, pp. 1368–1373.
isbn: 978-1-4673-2755-8. doi: 10.1109/IVS.2013.6629657.
Olson, Edwin (2011). “AprilTag: A robust and flexible visual fiducial system”. In: 2011
IEEE International Conference on Robotics and Automation. Ed. by Antonio Bicchi.
Piscataway, NJ: IEEE, pp. 3400–3407. isbn: 978-1-61284-386-5. doi: 10.1109/ICRA.
2011.5979561.
Oniga, F. and S. Nedevschi (2010). “Processing Dense Stereo Data Using Elevation
Maps: Road Surface, Traffic Isle, and Obstacle Detection”. In: IEEE Transactions
on Vehicular Technology 59.3, pp. 1172–1182. issn: 0018-9545. doi: 10.1109/TVT.
2009.2039718.
Park, Ki-Yeong and Sun-Young Hwang (2014). “Robust range estimation with a monoc-
ular camera for vision-based forward collision warning system”. In: TheScientific-
WorldJournal 2014, p. 923632. issn: 1537-744X. doi: 10.1155/2014/923632.
Pertile, Marco et al. (2015). “Uncertainty evaluation of a vision system for pose mea-
surement of a spacecraft with fiducial markers”. In: Metrology for Aerospace, IEEE
2015.
Ponte Muller, Fabian de (2017). “Survey on Ranging Sensors and Cooperative Tech-
niques for Relative Positioning of Vehicles”. In: Sensors (Basel, Switzerland) 17.2.
issn: 1424-8220. doi: 10.3390/s17020271.
Quan, Long and Zhongdan Lan (1999). “Linear N-point camera pose determination”.
In: IEEE Transactions on Pattern Analysis and Machine Intelligence 21.8, pp. 774–
780. issn: 01628828. doi: 10.1109/34.784291.
Remondino, Fabio et al. (2013). “Dense image matching: Comparisons and analyses”.
In: 2013 Digital Heritage International Congress (DigitalHeritage). IEEE, pp. 47–54.
isbn: 978-1-4799-3170-5. doi: 10.1109/DigitalHeritage.2013.6743712.
Schreer, Oliver (2005). Stereoanalyse und Bildsynthese: Mit 6 Tabellen. Berlin, Heidel-
berg: Springer-Verlag Berlin Heidelberg. isbn: 3-540-23439-X. doi: 10.1007/3-540-
27473-1. url: http://dx.doi.org/10.1007/3-540-27473-1.
Seng, Kian Lee et al. (2013). “Vision-based State Estimation of an Unmanned Aerial
Vehicle”. In: Trends in Bioinformatics 10, pp. 11–19.
Sivaraman, Sayanan and Mohan M. Trivedi (2013a). “A review of recent develop-
ments in vision-based vehicle detection”. In: 2013 IEEE Intelligent Vehicles Sympo-
sium (IV). IEEE, pp. 310–315. isbn: 978-1-4673-2755-8. doi: 10.1109/IVS.2013.
6629487.
Sivaraman, Sayanan and Mohan Manubhai Trivedi (2013b). “Looking at Vehicles on
the Road: A Survey of Vision-Based Vehicle Detection, Tracking, and Behavior Anal-
ysis”. In: IEEE Transactions on Intelligent Transportation Systems 14.4, pp. 1773–
1795. issn: 1524-9050. doi: 10.1109/TITS.2013.2266661.
Stein, G. P., O. Mano, and A. Shashua (2003). “Vision-based ACC with a single cam-
era: bounds on range and range rate accuracy”. In: Proceedings / IEEE IV 2003,
Intelligent Vehicles Symposium. Piscataway, NJ: IEEE Operations Center, pp. 120–
125. isbn: 0-7803-7848-2. doi: 10.1109/IVS.2003.1212895.
Stein, Gideon P., D. Ferencz, and Ofer Avni (2012). “Estimating distance to an object
using a sequence of images recorded by a monocular camera”. Pat. US8164628 B2.
Thoman, Peter (2014). Diving into Anti-Aliasing: Sampling-based Anti-Aliasing Tech-
niques. Ed. by Beyond3D. url: https://www.beyond3d.com/content/articles/
122/4.
Thrun, Sebastian, Wolfram Burgard, and Dieter Fox (2006). Probabilistic robotics. In-
telligent robotics and autonomous agents series. Cambridge, Mass.: MIT Press. isbn:
978-0-262-20162-9.
Tukey, John W. (1977). “Exploratory data analysis”. In: Addison-Wesley, pp. 530–537.
Urban, Steffen, Jens Leitloff, and Stefan Hinz (2016). “MLPnP - A Real-Time Maxi-
mum Likelihood Solution to the Perspective-n-Point Problem”. In: ISPRS Annals of
Photogrammetry, Remote Sensing and Spatial Information Sciences III-3, pp. 131–
138. issn: 2194-9050. doi: 10.5194/isprs-annals-III-3-131-2016.
Walters, Austin and Bhargava Manja (2015). “ChromaTag - A Colored Fiducial Marker”.
In: International Conference on Computer Vision arXiv:1708.02982.
Wang, John and Edwin Olson (2016). “AprilTag 2: Efficient and robust fiducial de-
tection”. In: 2016 IEEE/RSJ International Conference on Intelligent Robots and
Systems (IROS). IEEE, pp. 4193–4198. isbn: 978-1-5090-3762-9. doi: 10.1109/IROS.2016.7759617.
Wilson, Daniel B., Ali H. Goktogan, and Salah Sukkarieh (2014). “A vision based rel-
ative navigation framework for formation flight”. In: IEEE International Conference
on Robotics and Automation (ICRA), 2014. Piscataway, NJ: IEEE, pp. 4988–4995.
isbn: 978-1-4799-3685-4. doi: 10.1109/ICRA.2014.6907590.
Winkens, Christian and Dietrich Paulus (2017). “Long Range Optical Truck Tracking”.
In: Proceedings of the 9th International Conference on Agents and Artificial Intel-
ligence. SCITEPRESS - Science and Technology Publications, pp. 330–339. isbn:
978-989-758-219-6. doi: 10.5220/0006296003300339.
Zhang, Hongmou et al. (2017). “Uncertainty Model for Template Feature Matching”.
In: PSIVT2017.
Zhang, Z. (2000). “A flexible new technique for camera calibration”. In: IEEE Trans-
actions on Pattern Analysis and Machine Intelligence 22.11, pp. 1330–1334. issn:
01628828. doi: 10.1109/34.888718.
Technology List
[OSG] OpenSceneGraph-3.4.0. 2015. OpenSceneGraph is an OpenGL-based high per-
formance 3D graphics toolkit for visual simulation, games, virtual reality, scien-
tific visualization, and modeling. http://www.openscenegraph.org. Last down-
loaded 2017-06-12.
[OpenGL] OpenGL. The Industry’s Foundation of High Performance Graphics. https:
//www.opengl.org/. Last downloaded 2017-06-12. Embedded in OpenScene-
Graph.
[OpenCV] OpenCV-3.1 2015. Open Source Computer Vision Library. http://opencv.
org/. Last downloaded 2016.
[OSLib] OSLib. DLR internal. C++ Software Library for Image Processing. Implements
basic structures, classes and algorithms. Last downloaded 2017-07-27.
[OSVisionLib] OSVisionLib. DLR internal. C++ Software Library for Image Processing.
Implements computer vision algorithms and interfaces to access external libraries.
Last downloaded 2017-07-27.
[AprilTagLib] Michael Kaess. 2012. AprilTags Library. https://github.com/NifTK/
apriltags. Last downloaded 2016.
[WhyConLib] Tomáš Krajník, Matias Nitsche, Jan Faigl. 2016. WhyCon. https://
github.com/LCAS/whycon. Last downloaded April 2017.
[CalLab] K. H. Strobl and W. Sepp and S. Fuchs and C. Paredes and M. Smisek
and K. Arbter. DLR CalDe and DLR CalLab. Institute of Robotics and Mecha-
tronics, German Aerospace Center (DLR). Oberpfaffenhofen, Germany. http:
//www.robotic.dlr.de/callab/. Last checked 2017.
[DLRStereo] DLR. Outdoor Stereo Camera. Cameras: Prosilica GC1380H (resolution:
1360x1024, cell size: 6.52 µm²). Baseline: 0.34 m.
[DellPrecision] Dell Precision Tower 3620. Processor: Intel(R) Xeon(R) CPU E3-1270
v5 @ 3.6 GHz, 4 Cores, 8 Threads. Graphics Card: NVIDIA Quadro M4000 8 GB
GDDR5 1664 CUDA Cores.
[Blend1] Mesh of a train. Blendswap. German Train BR646 of the UBB. Source and li-
cense information: https://www.blendswap.com/blends/view/83719. License
type: CC-BY. Last downloaded 2017-06-25. Changes: Added DLR logo.
[Blend2] Mesh of a rail. Blendswap. Train. Source and license information: https://
www.blendswap.com/blends/view/22626. License: CC-Zero. Last downloaded
2017-04-23. Changes: Used and changed rails and ground.
[Town] Institut für Verkehrssystemtechnik. DLR internal. Demo Small Town.
[GLM-80] Bosch. Bosch GLM-80.
Appendix
The appendix provides further considerations and additions that are less decisive for
the work itself. This includes a further investigation of the fundamental methods. In
addition, all parameters used in the experiments are listed and further details on the
individual experiments are provided. Finally, a few additional simulated experiments
are presented.
A - Method Characteristics
(Figure 52: comparison of EPnP (øt: 87 µs), RPnP (øt: 175 µs) and RPnP+Iterative (øt: 248 µs) over increasing noise level [·σi])
Figure 53 visualizes the applied camera pose and the used correspondences for comparing
different PnP methods. The chosen points are the corners of the visualized AprilTags.
However, the AprilTags are only used for visualization; their detection is not applied
during these experiments. White crosses mark the points used for the 4-point evaluation.
(a) shows the setup for the experiment of Section 3.2.2, in which all applied world points
are co-planar. (b) shows a setup where the upper marker is positioned half a meter in
front of the wall, marked with a white arrow. This setup is used in Figure 52 to compare
EPnP and RPnP for non-planar world points. When using twelve correspondences, EPnP
is more accurate than RPnP. However, when using only four points, which is the use case
for this work, RPnP is more accurate. Furthermore, in the case of noise-free data, EPnP
again shows a degenerate solution.
Figure 54: Application of AprilTag pattern detection on an outdoor image (a), a cor-
responding subimage (b) and an indoor image (c). Blue quads illustrate the
detected AprilTags.
This experiment addresses the close relationship between the quad detection and the code
identification of AprilTags. Table 3 (c) shows that the method generates multiple quad
candidates, which are subsequently verified by the code identification. Furthermore, the
number of candidates increases greatly in texture-rich images, as shown in Figure 54 (a).
Because of the gravel, a high number of quad candidates is produced, which also increases
the false positive rate of detected AprilTags within the used tag set.
Table 3 also lists the average processing time over 100 repetitions for each frame. It
shows that image (a) requires much more computational time than the indoor image (c).
Moreover, the process occupied multiple cores on a [DellPrecision], which shows that the
processing of a full image is not real-time capable in the underlying implementation.
However, the results of image (c) show that restricting the search area in the image, for
instance by tracking the vehicle, would lead to faster processing times, as sketched below.
This supports the decision not to evaluate the processing time in this study.
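The following snippet sketches this timing argument by running a detector on the full frame and on a cropped region of interest around a hypothetically tracked vehicle. The function detect_markers is only a stand-in for the AprilTag detection of [AprilTagLib]; the frame size and region of interest are illustrative assumptions.

```python
# Sketch of the timing argument: restricting detection to a region of interest.
# `detect_markers` is a hypothetical stand-in for the AprilTag detection of
# [AprilTagLib]; frame size and ROI are illustrative assumptions.
import time
import numpy as np

def detect_markers(gray):
    # placeholder for the real quad detection and code identification
    return []

def mean_runtime(gray, repetitions=100):
    # average wall-clock time per call, analogous to the 100 repetitions above
    start = time.perf_counter()
    for _ in range(repetitions):
        detect_markers(gray)
    return (time.perf_counter() - start) / repetitions

frame = np.zeros((1024, 1360), dtype=np.uint8)     # full camera frame
x, y, w, h = 600, 400, 200, 150                    # ROI around the tracked vehicle
roi = frame[y:y + h, x:x + w]                      # subimage view, no copy

print(f"full frame: {mean_runtime(frame) * 1e3:.2f} ms")
print(f"ROI only:   {mean_runtime(roi) * 1e3:.2f} ms")
# detections in the ROI must be shifted back by (x, y) into full-image coordinates
```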
Table 4 lists the required computational time for each step of the single WhyCon distance
estimation based on the stereo frame of Figure 55. The majority of the processing time is
consumed by the initial detection of the pattern in the image pyramid, which processes
the entire image on multiple levels. The redetection, which represents the processing of
the original method, requires only a minor part of the overall computational time. Thus,
the processing in the image pyramid could be reduced greatly in the presence of marker
tracking; a coarse-to-fine sketch of such a pyramid search is given below.
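The sketch below illustrates how such a coarse-to-fine search over an image pyramid could look; find_circle_candidate is a hypothetical stand-in for the WhyCon ring detection of [WhyConLib], and the number of levels is an illustrative assumption.

```python
# Coarse-to-fine search over an image pyramid, sketched with OpenCV.
# `find_circle_candidate` is a hypothetical stand-in for the WhyCon ring
# detection of [WhyConLib]; the number of levels is an illustrative assumption.
import cv2
import numpy as np

def find_circle_candidate(gray):
    # placeholder: returns the (x, y) centre of a candidate ring or None
    return None

def detect_in_pyramid(gray, levels=4):
    pyramid = [gray]                               # level 0: full resolution
    for _ in range(levels - 1):
        pyramid.append(cv2.pyrDown(pyramid[-1]))   # each level halves the resolution
    for level in reversed(range(levels)):          # search coarse to fine
        hit = find_circle_candidate(pyramid[level])
        if hit is not None:
            scale = 2 ** level                     # map back to full resolution
            return hit[0] * scale, hit[1] * scale
    return None

image = np.zeros((1024, 1360), dtype=np.uint8)     # placeholder for a camera frame
print(detect_in_pyramid(image))
```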
Figure 56 shows an experiment to evaluate different widths of the outer circle of the
WhyCon pattern. This is done by varying the ratio of the circle width to the inner radius.
In the configuration of WhyCons+PnP, a ratio of 0.6 is used, which approximately
corresponds to the ratio of a test marker provided in [WhyConLib]. Figure 56 shows that
a ratio of 0.5 or less provides a better detection range in the proposed setup, which is
based on adaptive thresholding (a minimal sketch is given below). This experiment shows
that WhyCons+PnP has the potential to provide an even larger application range than
estimated in Section 6.1.2.
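A minimal sketch of such an adaptive thresholding with OpenCV is shown below; the block size and offset are illustrative parameters, not the values used in this study.

```python
# Adaptive thresholding sketch: each pixel is compared against the mean of its
# local neighbourhood minus an offset, which keeps the black/white segmentation
# stable under uneven illumination. Block size and offset are illustrative.
import cv2
import numpy as np

gray = np.full((1024, 1360), 128, dtype=np.uint8)  # stand-in for a grayscale frame
binary = cv2.adaptiveThreshold(
    gray, 255,
    cv2.ADAPTIVE_THRESH_MEAN_C,                    # local mean as threshold estimate
    cv2.THRESH_BINARY,
    31,                                            # neighbourhood size in pixels (odd)
    5)                                             # offset subtracted from the mean
print(binary.shape, binary.dtype)                  # (1024, 1360) uint8
```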
Figure 55: Illustration of all detected WhyCon patterns in the stereo image
Figure 56: Experiment to find the most suitable width for WhyCon
For all simulated experiments of this study, image noise is applied based on the noise
model of Section 3.1.3 (p. 10). Its influence on the methods has proven to be very small,
which is why it was not considered separately in this study. However, one characteristic
of WhyCon has been noticed, which is outlined in this section. First of all, Figure 57 (i)
shows only a weak influence of image noise. It evaluates WhyCon+Circle at a distance
of 80 m with image noise applied (on) and without (off), while applying the general
variation of the simulation. The figure shows the individual estimations for the left and
the right camera. The applied image noise does not change the distribution greatly.
In contrast, Figure 57 (ii) shows a different behavior. For this experiment, 200 frames
are captured from the same position for the left and the right camera; thus, they
represent two snapshots. The images are captured with an image exposure of either 0.1
or 0.8, in each case with image noise switched off and on. When image noise is applied,
it is striking that a spreading occurs only for an exposure of 0.1. Furthermore, the
spreads vary in their severity.
An explanation for this is WhyCon's hard classification of pixels into black and white.
Correspondingly, Figure 41 (p. 41) (left) illustrates for low image exposure that the
value of xi and the estimated threshold are very close. Thus, if the noise range of xi
overlaps with the estimated threshold, the assignment of xi varies between different
images taken from the same pose. The more border pixels of the projected pattern
overlap with the estimated threshold, the greater the spreading of the resulting distance
becomes. This effect also occurs for long exposure, as illustrated in Figure 41 (p. 41)
(right), but it is less likely since the value range is much wider. The effect is therefore
strongly related to the influence of image exposure and is sketched numerically below.
(Krajník et al., 2014) propose to use the ratio of the inner and outer circle to correct
the estimated results and state that this "compensation of the pattern diameter reduces
the average localization error by approximately 15%".
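The following small Monte-Carlo sketch reproduces the described behavior: a border pixel lying slightly on the white side of the estimated threshold is classified repeatedly under Gaussian image noise, and the classification flips far more often at low exposure because the value range, and hence the margin to the threshold, is much smaller. The grey values, noise level and exposures are illustrative assumptions, not the parameters of the simulation.

```python
# Monte-Carlo sketch of the threshold-flipping effect. All numbers are
# illustrative assumptions, not the parameters of the simulation in this study.
import numpy as np

rng = np.random.default_rng(1)
sigma = 2.0                                        # image noise std. dev. [grey values]

def flip_rate(exposure, trials=10000):
    white = 255.0 * exposure                       # bright pattern value at this exposure
    black = 10.0 * exposure                        # dark pattern value at this exposure
    threshold = 0.5 * (white + black)              # estimated segmentation threshold
    border = threshold + 0.05 * (white - black)    # border pixel just on the white side
    noisy = border + rng.normal(0.0, sigma, trials)
    return np.mean(noisy < threshold)              # fraction classified as black

for exposure in (0.1, 0.8):
    print(f"exposure {exposure}: {flip_rate(exposure) * 100:.1f}% of the frames flip")
```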
(Panel labels: WhyCon + Conic and WhyCon + Circle, each for CL and CR)
Figure 57: Evaluation of the application of image noise at a distance of 80m
Figure 5 presents a short consideration of the computation time of the four most
important methods of this study. The time is averaged over 1000 distance estimations
with each method, based on the pictured image and run on a [DellPrecision]. The results
show that all methods require approximately the same time for this exemplary image.
Please note that the AprilTag detection uses multiple CPU cores, WhyCon runs on a
single core and SGM runs on the GPU.
B - Tables
Table 8 defines the measured distances for the real-world dataset of Figure 21 (a,b)
that are used as ground-truth values. The distances were measured with a laser distance
meter [GLM-80]. The images of this dataset show large image blooming caused by
extreme overexposure, which is shown exemplarily in Figure 58.
Table 9 defines the measured distances of the real-world experiment of Figure 21 (c),
divided into near and far distances. The measurements of dataset (d) are based only on
markings on the ground measured with a tape measure. A precise measurement is not
required for this particular dataset, because it is only used to estimate the application
range.
alpha = 0°
distance[m] 10.0 20.0 30.0 40.0 50.0 60.0 70.0 80.0
unit % % % % % % % %
AprilTags+PnP CL 100.0 99.0 56.0 10.0 0.0 0.0 0.0 0.0
WhyCons+PnP CL 100.0 100.0 99.0 96.0 68.0 6.0 0.0 0.0
WhyCon+Circle CL 100.0 100.0 100.0 100.0 100.0 100.0 97.0 86.0
SGM 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0
alpha = 30°
distance[m] 10.0 20.0 30.0 40.0 50.0 60.0 70.0 80.0
unit % % % % % % % %
AprilTags+PnP CL 100.0 100.0 93.0 3.0 0.0 0.0 0.0 0.0
WhyCons+PnP CL 100.0 100.0 100.0 91.0 14.0 0.0 0.0 0.0
WhyCon+Circle CL 100.0 100.0 100.0 100.0 100.0 99.0 89.0 68.0
SGM 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0
alpha = 60°
distance[m] 10.0 20.0 30.0 40.0 50.0 60.0 70.0 80.0
unit % % % % % % % %
AprilTags+PnP CL 100.0 97.0 1.0 0.0 0.0 0.0 0.0 0.0
WhyCons+PnP CL 100.0 93.0 0.0 0.0 0.0 0.0 0.0 0.0
WhyCon+Circle CL 100.0 100.0 100.0 99.0 80.0 29.0 1.0 2.0
SGM 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0
Near distances
distance[m] 5 7.5 10 12.5 15 20
unit m m m m m m
AprilTags+PnP CL 0.052 0.071 0.09 0.111 0.131 0.187
WhyCons+PnP CL 0.043 0.061 0.083 0.102 0.121 0.162
WhyCon+Circle CL 0.021 0.03 0.041 0.053 0.064 0.096
SGM 0.057 0.127 0.225 0.353 0.51 0.905
Far distances
distance[m] 30 40 50 60 70 80
unit m m m m m m
AprilTags+PnP CL - - - - - -
WhyCons+PnP CL 0.238 0.318 - - - -
WhyCon+Circle CL 0.205 0.43 0.707 0.692 1.41 3.008
SGM - 3.803 6.04 8.598 12.182 16.811
(Bar chart, correlation from 0.00 to 1.00; legend: AprilTags + PnP CL/CR, WhyCons + PnP CL/CR, WhyCon + Circle CL/CR, AprilTag + Tri., WhyCon + Tri., SGM; horizontal axis: TX, TY, TZ and S of Tag 0 to Tag 7 at the positions down left, down right, up right and up left)
Figure 59: Correlation to uncertainty of the marker pose (Figure 44, p.44)
(Bar chart, correlation from 0.00 to 1.00, legend as above; horizontal axis: f, u0, v0, k1, k2 of CL and CR, the stereo rotation RX, RY, RZ and translation TX, TY, TZ of HCR2CL, and the exposure of CL and CR)
(Bar chart, correlation from 0.00 to 1.00, legend as above; horizontal axis: trembling TX, TY, TZ, RX, RY, RZ)
Figure 61: Correlation to orientation trembling. Please note that the trembling is part
of the ground truth matrix.
Table 13: The columns of "WC" correspond to the correlation between WhyCons+PnP
and the four WhyCon markers. The columns of "AT" correspond to the correlation
between AprilTags+PnP and the three AprilTags. The angle is defined in the yz-plane.
(m: middle, d: down, t: top, l: left, r: right)
Far distances
distance[m] 30 40 50 60 70 80
unit m m m m m m
WhyCon+Circle CL 0.205 0.43 0.707 0.692 1.41 3.008
WhyCon+Circle CR 0.209 0.422 0.695 0.709 1.436 3.069
WhyCon+Tri. 2.056 3.662 5.803 8.445 11.95 16.204
SGM 2.078 3.817 6.0 8.654 12.236 16.771
Combined 0.18 0.349 0.543 0.573 1.103 3.021
Near distances
distance[m] 5 7.5 10 12.5 15 20
unit m m m m m m
WhyCon+Circle CL 0.021 0.03 0.041 0.053 0.064 0.096
WhyCon+Circle CR 0.021 0.03 0.04 0.053 0.064 0.096
WhyCon+Tri. 0.057 0.126 0.223 0.347 0.504 0.897
SGM - 0.127 0.225 0.353 0.51 0.905
Combined - 0.029 0.039 0.051 0.062 0.091