Papers by Guillermo Gallego

Event cameras are bio-inspired vision sensors that output pixel-level brightness changes instead of standard intensity frames. They offer significant advantages over standard cameras, namely a very high dynamic range, no motion blur, and a latency in the order of microseconds. However, because the output is composed of a sequence of asynchronous events rather than actual intensity images, traditional vision algorithms cannot be applied, so a paradigm shift is needed. We introduce the problem of Event-based Multi-View Stereo (EMVS) for event cameras and propose a solution to it. Unlike traditional MVS methods, which address the problem of estimating dense 3D structure from a set of known viewpoints, EMVS estimates semi-dense 3D structure from an event camera with a known trajectory. Our EMVS solution elegantly exploits two inherent properties of an event camera: (i) its ability to respond to scene edges, which naturally provide semi-dense geometric information without any pre-processing operation, and (ii) the fact that it provides continuous measurements as the sensor moves. Despite its simplicity (it can be implemented in a few lines of code), our algorithm is able to produce accurate, semi-dense depth maps. We successfully validate our method on both synthetic and real data. Our method is computationally very efficient and runs in real time on a CPU.
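As the abstract suggests, the core of such a method fits in a few lines. The following is a minimal illustration of a space-sweep scheme consistent with the description: each event is back-projected as a ray using the known trajectory, and its intersections with a stack of depth planes of a reference view are accumulated into a Disparity Space Image (DSI); depths receiving many votes correspond to scene edges. The pinhole model, the voting details, and all names are illustrative assumptions, not the authors' implementation.

```python
# Sketch: accumulate event rays into a DSI by plane sweeping (assumed scheme).
import numpy as np

def build_dsi(events, poses_w_c, T_ref_w, K, depths, height, width):
    """events: iterable of (x, y, i) pixel events; poses_w_c[i]: 4x4 world-from-camera
    pose at the event's time (known trajectory); T_ref_w: 4x4 reference-from-world."""
    Kinv = np.linalg.inv(K)
    dsi = np.zeros((len(depths), height, width))
    for (x, y, i) in events:
        T_w_c = poses_w_c[i]
        # Event ray in world frame: origin = camera center, direction = R * K^-1 * [x, y, 1].
        origin = T_w_c[:3, 3]
        direction = T_w_c[:3, :3] @ (Kinv @ np.array([x, y, 1.0]))
        # Express the ray in the reference camera frame.
        o = T_ref_w[:3, :3] @ origin + T_ref_w[:3, 3]
        d = T_ref_w[:3, :3] @ direction
        for k, Z in enumerate(depths):
            if abs(d[2]) < 1e-9:
                continue
            s = (Z - o[2]) / d[2]        # intersect the ray with the plane z = Z
            p = o + s * d                # 3D point on the depth plane (p[2] == Z)
            u = K[0, 0] * p[0] / Z + K[0, 2]
            v = K[1, 1] * p[1] / Z + K[1, 2]
            ui, vi = int(round(u)), int(round(v))
            if 0 <= ui < width and 0 <= vi < height:
                dsi[k, vi, ui] += 1      # one vote per ray-plane crossing
    # Semi-dense depth: pick the best-voted depth per pixel; the vote count
    # serves as a confidence to keep only well-supported (edge) pixels.
    return dsi, dsi.argmax(axis=0), dsi.max(axis=0)
```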

New vision sensors, such as the Dynamic and Active-pixel Vision sensor (DAVIS), incorporate a conventional camera and an event-based sensor in the same pixel array. These sensors have great potential for robotics because they allow us to combine the benefits of conventional cameras with those of event-based sensors: low latency, high temporal resolution, and high dynamic range. However, new algorithms are required to exploit the sensor characteristics and cope with its unconventional output, which consists of a stream of asynchronous brightness changes (called "events") and synchronous grayscale frames. In this paper, we present a low-latency visual odometry algorithm for the DAVIS sensor using event-based feature tracks. Features are first detected in the grayscale frames and then tracked asynchronously using the stream of events. The features are then fed to an event-based visual odometry algorithm that tightly interleaves robust pose optimization and probabilistic mapping. We show that our method successfully tracks the 6-DOF motion of the sensor in natural scenes. This is the first work on event-based visual odometry with the DAVIS sensor using feature tracks.
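As a rough illustration of the pose-optimization half of such a pipeline, the sketch below refines a 6-DOF pose from mapped 3D landmarks and their (event-tracked) 2D observations by minimizing a robust reprojection error. The rotation-vector parametrization, the Cauchy loss, and all names are assumptions for illustration; the paper's actual optimizer and probabilistic mapping back-end are not reproduced here.

```python
# Sketch: robust 6-DOF pose refinement from 3D-2D correspondences (assumed setup).
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation as R

def reproj_residuals(pose6, pts3d, obs2d, K):
    # pose6 = [rotation vector (3), translation (3)], world -> camera.
    rot = R.from_rotvec(pose6[:3])
    pc = rot.apply(pts3d) + pose6[3:]
    proj = (K @ pc.T).T
    proj = proj[:, :2] / proj[:, 2:3]      # pinhole projection to pixels
    return (proj - obs2d).ravel()

def optimize_pose(pose6_init, pts3d, obs2d, K):
    # The Cauchy loss downweights outlier feature tracks (robustness).
    sol = least_squares(reproj_residuals, pose6_init, loss='cauchy', f_scale=1.0,
                        args=(pts3d, obs2d, K))
    return sol.x
```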

Because standard cameras sample the scene at constant time intervals, they do not provide any information in the blind time between subsequent frames. However, for many high-speed robotic and vision applications, it is crucial to provide high-frequency measurement updates also during this blind time. This can be achieved using a novel vision sensor, called DAVIS, which combines a standard camera and an asynchronous event-based sensor in the same pixel array. The DAVIS encodes the visual content between two subsequent frames by an asynchronous stream of events that convey pixel-level brightness changes at microsecond resolution. We present the first algorithm to detect and track visual features using both the frames and the event data provided by the DAVIS. Features are first detected in the grayscale frames and then tracked asynchronously in the blind time between frames using the stream of events. To best take into account the hybrid characteristics of the DAVIS, features are built based on large spatial contrast variations (i.e., visual edges), which are the source of most of the events generated by the sensor. An event-based algorithm is further presented to track the features using an iterative, geometric registration approach. The performance of the proposed method is evaluated on real data acquired by the DAVIS.
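The iterative, geometric registration can be pictured as an ICP-style alignment between recent event locations and the feature's edge template extracted from a grayscale frame. Below is a minimal sketch with a translation-only motion model; this simplification and all names are assumptions, not the paper's exact registration.

```python
# Sketch: ICP-style alignment of events to a feature's edge template (assumed model).
import numpy as np

def track_feature(template_pts, event_pts, n_iters=10):
    """template_pts: (M, 2) edge points of the feature (from the grayscale frame).
    event_pts: (N, 2) recent event locations near the feature.
    Returns the 2D translation that maps the template onto the events."""
    t = np.zeros(2)
    for _ in range(n_iters):
        shifted = event_pts - t                                   # undo current estimate
        d2 = ((shifted[:, None, :] - template_pts[None, :, :]) ** 2).sum(-1)
        matched = template_pts[d2.argmin(axis=1)]                 # nearest-neighbour association
        t = (event_pts - matched).mean(axis=0)                    # closed-form translation update
    return t
```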

Event-based vision sensors, such as the Dynamic Vision Sensor (DVS), do not output a sequence of video frames like standard cameras, but a stream of asynchronous events. An event is triggered when a pixel detects a change of brightness in the scene. An event contains the location, sign, and precise timestamp of the change. The high dynamic range of the DVS and its temporal resolution, which is in the order of microseconds, make it a very promising sensor for high-speed applications, such as robotics and wearable computing. However, due to the fundamentally different structure of the sensor's output, new algorithms that exploit the high temporal resolution and the asynchronous nature of the sensor are required. In this paper, we address ego-motion estimation for an event-based vision sensor using a continuous-time framework to directly integrate the information conveyed by the sensor. The DVS pose trajectory is approximated by a smooth curve in the space of rigid-body motions using cubic splines and is optimized according to the observed events. We evaluate our method using datasets acquired from sensor-in-the-loop simulations and onboard a quadrotor performing flips. The results are compared to ground truth, showing the good performance of the proposed technique.
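A common way to realize such a smooth trajectory is a cumulative cubic B-spline evaluated in the Lie algebra. The sketch below uses the standard cumulative basis but, for simplicity, splines rotation (via scipy) and translation separately rather than on the full rigid-body motion group; this simplification and all names are assumptions.

```python
# Sketch: cumulative cubic B-spline trajectory, rotation and translation split (assumed).
import numpy as np
from scipy.spatial.transform import Rotation as R

def btilde(u):
    # Cumulative cubic B-spline basis values Btilde_1..3(u), u in [0, 1).
    return np.array([(5 + 3*u - 3*u**2 + u**3) / 6.0,
                     (1 + 3*u + 3*u**2 - 2*u**3) / 6.0,
                     u**3 / 6.0])

def spline_pose(rots, trans, i, u):
    """Pose at parameter u in [0, 1) on the segment with control poses
    rots[i..i+3] (scipy Rotation objects) and trans[i..i+3] ((K, 3) array)."""
    b = btilde(u)
    rot, t = rots[i], trans[i].astype(float).copy()
    for j in range(1, 4):
        # Relative increment between consecutive control poses, in the Lie algebra.
        omega = (rots[i + j - 1].inv() * rots[i + j]).as_rotvec()
        rot = rot * R.from_rotvec(b[j - 1] * omega)
        t = t + b[j - 1] * (trans[i + j] - trans[i + j - 1])
    return rot, t
```

Note that a cubic B-spline approximates rather than interpolates its control poses; this is what makes the trajectory smooth enough to differentiate for event-based optimization.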

2015 IEEE International Conference on Robotics and Automation (ICRA), 2015
We propose an algorithm to estimate the "lifetime" of events from retinal cameras, such as a Dynamic Vision Sensor (DVS). Unlike standard CMOS cameras, a DVS only transmits pixel-level brightness changes ("events") at the time they occur, with microsecond resolution. Due to its low latency and sparse output, this sensor is very promising for high-speed mobile robotic applications. We develop an algorithm that augments each event with its lifetime, which is computed from the event's velocity on the image plane. The generated stream of augmented events gives a continuous representation of events in time, hence enabling the design of new algorithms that outperform those based on the accumulation of events over fixed, artificially chosen time intervals. A direct application of this augmented stream is the construction of sharp gradient (edge-like) images at any time instant. We successfully demonstrate our method in different scenarios, including high-speed quadrotor flips, and compare it to standard visualization methods.
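The key relation is simple: if an event's image-plane velocity is v (pixels per second), the moving edge needs roughly 1/||v|| seconds to traverse one pixel, and that duration can serve as the event's lifetime. A sharp edge image at time t then displays exactly the events still "alive" at t. The sketch below assumes per-event velocities are already given (the paper estimates them from the event stream; that step is omitted here), and all names are illustrative.

```python
# Sketch: augment events with lifetimes and render a sharp edge image (assumed names).
import numpy as np

def lifetimes_from_velocity(vx, vy, eps=1e-9):
    speed = np.hypot(vx, vy)                 # image-plane speed, pixels per second
    return 1.0 / np.maximum(speed, eps)      # time for the edge to cross one pixel

def edge_image_at(t, xs, ys, ts, lifetimes, height, width):
    img = np.zeros((height, width))
    alive = (ts <= t) & (t < ts + lifetimes)  # events whose edge is still at the pixel
    img[ys[alive], xs[alive]] = 1.0
    return img
```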
We propose a novel Variational Wave Acquisition Stereo System (VWASS) that exploits new stereo reconstruction techniques for accurate estimates of the space-time dynamics of oceanic sea states. The rich information content of the acquired three-dimensional video data is exploited to make predictions on directional spectra and large waves over an area. These are based on a new statistical analysis that stems from the theory of Euler characteristics of random fields. The broader impact of these results for oceanic applications is finally discussed.
Real-time surveillance application by multiple detectors and compressive trackers
2015 IEEE International Conference on Consumer Electronics (ICCE), 2015

IEEE Transactions on Image Processing, Nov 2013
An image processing observational technique for the stereoscopic reconstruction of the wave form of oceanic sea states is developed. The technique incorporates the enforcement of any given statistical wave law modeling the quasi-Gaussianity of oceanic waves observed in nature. The problem is posed in a variational optimization framework, where the desired wave form is obtained as the minimizer of a cost functional that combines image observations, smoothness priors, and a weak statistical constraint. The minimizer is obtained by combining gradient descent and multigrid methods on the necessary optimality equations of the cost functional. Robust photometric error criteria and a spatial intensity compensation model are also developed to improve the performance of the presented image matching strategy. The weak statistical constraint is thoroughly evaluated in combination with the other presented elements to reconstruct and enforce constraints on experimental stereo data, demonstrating the improvement in the estimation of the observed ocean surface.
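Schematically, the described cost functional has the structure below, where Z is the wave elevation, w_Z the stereo warp induced by Z between the two views, rho a robust photometric error, and the last term weakly (small beta) penalizes the distance between the empirical elevation statistics and the prescribed quasi-Gaussian law. The exact terms and weights are defined in the paper; this reconstruction is an assumption.

```latex
% Schematic structure of the cost functional (assumed form, not the paper's exact one):
E(Z) \;=\; \underbrace{\int_\Omega \rho\big(I_1(\mathbf{x}) - I_2(w_Z(\mathbf{x}))\big)\,d\mathbf{x}}_{\text{robust photometric data term}}
\;+\; \alpha\, \underbrace{\int_\Omega \|\nabla Z\|^2\, d\mathbf{x}}_{\text{smoothness prior}}
\;+\; \beta\, \underbrace{d\big(\hat{p}_Z,\, p_{\mathrm{stat}}\big)}_{\text{weak statistical constraint}}
```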

IEEE Transactions on Circuits and Systems I: Regular Papers, Nov 2014
The analysis of complex nonlinear systems is often carried out using simpler piecewise linear representations of them. A principled and practical technique is proposed to linearize and evaluate arbitrary continuous nonlinear functions using polygonal (continuous piecewise linear) models under the L1 norm. A thorough error analysis is developed to guide an optimal design of two kinds of polygonal approximations in the asymptotic case of a large budget of evaluation subintervals N. The method allows the user to obtain the level of linearization (N) for a target approximation error, and vice versa. It is suitable for, but not limited to, an efficient implementation in modern Graphics Processing Units (GPUs), allowing real-time performance of computationally demanding applications. The quality and efficiency of the technique have been measured in detail on two nonlinear functions that are widely used in many areas of scientific computing and are expensive to evaluate.
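As a small numerical illustration of the trade-off between the budget N and the approximation error, the sketch below builds an interpolating polygonal approximant on N uniform subintervals and measures its L1 error; for a smooth function the error decays roughly as O(1/N^2), so doubling N shrinks it about fourfold. The uniform partition, the test function, and all names are illustrative assumptions (the paper derives optimal designs, not necessarily uniform ones).

```python
# Sketch: L1 error of an interpolating polygonal approximation on N uniform pieces.
import numpy as np

def polygonal_l1_error(f, a, b, N, samples_per_piece=200):
    knots = np.linspace(a, b, N + 1)
    err = 0.0
    for x0, x1 in zip(knots[:-1], knots[1:]):
        x = np.linspace(x0, x1, samples_per_piece)
        # Linear interpolant of f on [x0, x1].
        p = f(x0) + (f(x1) - f(x0)) * (x - x0) / (x1 - x0)
        err += np.trapz(np.abs(f(x) - p), x)   # L1 error contributed by this piece
    return err

# Doubling N should shrink the L1 error by roughly 4x for a smooth function:
for N in (8, 16, 32):
    print(N, polygonal_l1_error(np.exp, 0.0, 1.0, N))
```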

Journal of Mathematical Imaging and Vision, Nov 2014
In 3D reconstruction, the recovery of the calibration parameters of the cameras is paramount since it provides metric information about the observed scene, e.g., measures of angles and ratios of distances. Autocalibration enables the estimation of the camera parameters without using a calibration device, by instead enforcing simple constraints on the camera parameters. In the absence of information about the internal camera parameters, such as the focal length and the principal point, the knowledge of the camera pixel shape is usually the only available constraint. Given a projective reconstruction of a rigid scene, we address the problem of the autocalibration of a minimal set of cameras with known pixel shape and otherwise arbitrarily varying intrinsic and extrinsic parameters. We propose an algorithm that only requires five cameras (the theoretical minimum), thus halving the number of cameras required by previous algorithms based on the same constraint. To this purpose, we introduce as our basic geometric tool the six-line conic variety (SLCV), consisting of the set of planes intersecting six given lines of 3D space in points of a conic. We show that the set of solutions of the Euclidean upgrading problem for three cameras with known pixel shape can be parameterized in a computationally efficient way. This parameterization is then used to solve autocalibration from five or more cameras, reducing the three-dimensional search space to a two-dimensional one. We provide experiments with real images showing the good performance of the technique.
Volume 4: Ocean Engineering; Offshore Renewable Energy, Jun 2008
In-loop feature tracking for structure and motion with out-of-core optimization
In this paper, a novel approach for obtaining 3D models from video sequences captured with hand-held cameras is presented. We define a pipeline that robustly deals with different types of sequences and acquisition devices. Our system follows a divide-and-conquer approach: after a frame decimation that pre-conditions the input sequence, the video is split into short-length clips.

A variational wave acquisition stereo system for the 3-D reconstruction of oceanic sea states
ABSTRACT: We propose a novel remote sensing technique that infers the three-dimensional wave form and radiance of oceanic sea states via a variational stereo imagery formulation. In this setting, the shape and radiance of the wave surface are minimizers of a composite cost functional which combines a data fidelity term and smoothness priors on the unknowns. The solution of a system of coupled partial differential equations derived from the cost functional yields the desired ocean surface shape and radiance. The proposed method is naturally extended to study the spatio-temporal dynamics of ocean waves, and applied to three sets of video data. Statistical and spectral analyses are carried out. The results show that the omni-directional wavenumber spectrum S(k) of the reconstructed waves decays as k^(-2.5), in agreement with Zakharov's theory (1999). Further, the three-dimensional spectrum of the reconstructed wave surface is exploited to estimate wave dispersion and currents.

INTRODUCTION: Wind-generated waves play a prominent role at the interfaces of the ocean with the atmosphere, land, and solid Earth. Waves also define in many ways the appearance of the ocean seen by remote-sensing instruments. Classical observational methods rely on time series retrieved from wave gauges and ultrasonic instruments or buoys to measure the space-time dynamics of ocean waves. Global altimeters or Synthetic Aperture Radar (SAR) instruments are exploited for observations of large oceanic areas via satellites, but details on small scales are lost. Herein, we propose to complement the aforementioned instruments with a novel video observational system that relies on variational stereo techniques to reconstruct the 3-D wave surface both in space and time. Such a system uses two or more stereo camera views pointing at the ocean to provide spatio-temporal data and statistical content richer than that of previous monitoring methods. Vision systems are non-intrusive and have economic advantages over their predecessors, but they require more processing power to extract information from the ocean.
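As an illustration of the spectral analysis mentioned above, the sketch below estimates the omni-directional wavenumber spectrum S(k) from a reconstructed elevation map Z(x, y) on a uniform grid: take the 2-D FFT, form the power spectrum, and integrate it over annuli of constant |k|; a log-log fit of S(k) against k then reveals the decay exponent (e.g., about -2.5 here). Grid spacing, normalization, and all names are illustrative assumptions.

```python
# Sketch: omni-directional wavenumber spectrum from a gridded elevation map (assumed).
import numpy as np

def omnidirectional_spectrum(Z, dx, n_bins=64):
    ny, nx = Z.shape
    F = np.fft.fftshift(np.fft.fft2(Z - Z.mean()))
    power = np.abs(F) ** 2 / (nx * ny)          # power spectrum (normalization assumed)
    kx = np.fft.fftshift(np.fft.fftfreq(nx, d=dx)) * 2 * np.pi
    ky = np.fft.fftshift(np.fft.fftfreq(ny, d=dx)) * 2 * np.pi
    kmag = np.hypot(*np.meshgrid(kx, ky))       # wavenumber magnitude |k| per FFT bin
    bins = np.linspace(0.0, kmag.max(), n_bins + 1)
    k = 0.5 * (bins[:-1] + bins[1:])
    # Integrate the power over each annulus bins[i] <= |k| < bins[i+1].
    S = np.array([power[(kmag >= lo) & (kmag < hi)].sum()
                  for lo, hi in zip(bins[:-1], bins[1:])])
    return k, S
```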
Weak statistical constraints for variational stereo imaging of oceanic waves
We present a non-conformal metric that generalizes the geodesic active contours approach for image segmentation. The new metric is obtained by adding to the Euclidean metric an additional term that penalizes the misalignment of the curve with the image gradient, and by multiplying the resulting metric by a conformal factor that depends on the edge intensity. In this way, a closer fit to the edge direction results. The provided experimental results address the computation of the geodesics of the new metric by applying gradient descent to externally provided curves. The good performance of the proposed techniques is demonstrated in comparison with other active contour methods.
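One schematic metric consistent with this description is shown below: a conformal factor phi, depending on the edge intensity, multiplies the sum of the Euclidean term and a term penalizing the component of the curve tangent along the image gradient (which vanishes when the curve runs along an edge). The exact metric is defined in the paper; this reconstruction is an assumption.

```latex
% Schematic non-conformal metric (assumed form, not the paper's exact definition):
ds^2 \;=\; \phi\!\big(|\nabla I|\big)
\left( \|C'(u)\|^2 \;+\; \lambda \left\langle C'(u),\, \frac{\nabla I}{|\nabla I|} \right\rangle^{\!2} \right) du^2
```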
Two variational stereo methods for space-time measurements of ocean waves
On the Mahalanobis distance classification criterion for multidimensional normal distributions
Improving 3-D variational stereo reconstruction of oceanic sea states by camera calibration refinement
