2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009
Depth maps captured with time-of-flight cameras have very low data quality: the image resolution is rather limited and the level of random noise contained in the depth maps is very high. Therefore, such flash lidars cannot be used out of the box for high-quality 3D object scanning. To solve this problem, we present LidarBoost, a 3D depth superresolution method that combines several low resolution noisy depth images of a static scene from slightly displaced viewpoints, and merges them into a high-resolution depth image. We have developed an optimization framework that uses a data fidelity term and a geometry prior term that is tailored to the specific characteristics of flash lidars. We demonstrate both visually and quantitatively that LidarBoost produces better results than previous methods from the literature.
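To make the optimization concrete, below is a minimal sketch of a LidarBoost-style superresolution energy, under the assumption that the low-resolution depth maps have already been registered and resampled onto the high-resolution grid (NaN where a map contributes no sample); the quadratic smoothness term is a stand-in for the paper's tailored geometry prior, and all parameter values are illustrative.

```python
import numpy as np

def superresolve(depth_stack, lam=0.1, iters=200, step=0.2):
    """depth_stack: (K, H, W) registered low-res depth maps, NaN = no sample."""
    valid = ~np.isnan(depth_stack)
    data = np.where(valid, depth_stack, 0.0)
    x = np.nanmedian(depth_stack, axis=0)          # robust initialisation
    x = np.where(np.isnan(x), 0.0, x)
    for _ in range(iters):
        # Data fidelity: pull the estimate towards every observed sample.
        grad = (valid * (x[None] - data)).sum(axis=0)
        # Quadratic smoothness prior via the discrete Laplacian.
        lap = (np.roll(x, 1, 0) + np.roll(x, -1, 0)
               + np.roll(x, 1, 1) + np.roll(x, -1, 1) - 4.0 * x)
        x -= step * (grad - lam * lap)             # gradient descent step
    return x
```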
2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2008
Time-of-flight (TOF) cameras robustly provide depth data of real-world scenes at video frame rates. Unfortunately, currently available camera models offer rather low X-Y resolution, and their depth measurements are strongly affected by random and systematic errors, which renders them inappropriate for high-quality 3D scanning. In this paper we show that ideas from traditional color image superresolution can be applied to TOF cameras in order to obtain 3D data of higher X-Y resolution and less noise. We also show that our approach, which works on depth images only, bears many advantages over alternative depth upsampling methods that combine information from separate high-resolution color and low-resolution depth data.
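As a toy illustration of the transferred recipe, here is a shift-and-add sketch: each registered low-resolution depth map is scattered onto a finer grid according to its sub-pixel shift and the contributions are averaged, which raises X-Y resolution and suppresses random noise. The 2x magnification factor and the names are assumptions for illustration only.

```python
import numpy as np

def shift_and_add(lr_maps, shifts, scale=2):
    """lr_maps: list of (h, w) depth maps; shifts: sub-pixel (dy, dx) per map."""
    h, w = lr_maps[0].shape
    acc = np.zeros((h * scale, w * scale))
    cnt = np.zeros_like(acc)
    for depth, (dy, dx) in zip(lr_maps, shifts):
        ys = np.clip(np.round((np.arange(h) + dy) * scale).astype(int), 0, h * scale - 1)
        xs = np.clip(np.round((np.arange(w) + dx) * scale).astype(int), 0, w * scale - 1)
        acc[np.ix_(ys, xs)] += depth               # scatter onto the fine grid
        cnt[np.ix_(ys, xs)] += 1
    return np.where(cnt > 0, acc / np.maximum(cnt, 1), np.nan)  # NaN = unseen
```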
IEEE Robotics and Automation Letters
LiDAR depth completion is the task of predicting a depth value for every pixel of the corresponding camera frame, even though only sparse LiDAR points are available. Most existing state-of-the-art solutions are based on deep neural networks, which need large amounts of data and heavy computation to train. In this letter, a novel non-learning depth completion method is proposed that exploits local surface geometry, enhanced by an outlier removal algorithm. The proposed surface geometry model is inspired by the observation that most pixels with unknown depth have a nearby LiDAR point. It is therefore assumed that those pixels share the same surface as the nearest LiDAR point, so their depth can be estimated as the nearest LiDAR depth value plus a residual error. The residual error is calculated from a derived equation with several physical parameters as input, including the known camera intrinsic parameters, the estimated normal vector, and the offset distance on the image plane. The method is further enhanced by an outlier removal algorithm designed to remove incorrectly mapped LiDAR points from occluded regions. On the KITTI dataset, the proposed solution achieves the best error performance among all existing non-learning methods and is comparable to the best self-supervised learning method and some supervised learning methods. Moreover, since outlier points from occluded regions are a common problem, the proposed outlier removal algorithm is a general preprocessing step applicable to many robotic systems with both camera and LiDAR sensors. The code has been published at https://github.com/placeforyiming/RAL_Non-Learning_DepthCompletion.
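One algebraically equivalent way to realise the nearest-point plane assumption is to intersect each pixel's viewing ray with the tangent plane of its nearest LiDAR point, which yields the nearest depth plus the residual described above. The sketch below is a hedged reading of that idea, not the paper's exact derivation; the KITTI-like intrinsics are illustrative.

```python
import numpy as np

def plane_depth(p_uv, Xq, n, K):
    """p_uv: (u, v) pixel; Xq: (3,) nearest LiDAR point; n: (3,) unit normal."""
    # Back-project the pixel to a viewing ray with unit z-component.
    r = np.linalg.solve(K, np.array([p_uv[0], p_uv[1], 1.0]))
    denom = n @ r
    if abs(denom) < 1e-6:            # ray nearly parallel to the local plane
        return Xq[2]                 # fall back to the nearest LiDAR depth
    return (n @ Xq) / denom          # depth z of the ray/plane intersection

K = np.array([[721.5, 0.0, 609.6],   # illustrative KITTI-like intrinsics
              [0.0, 721.5, 172.9],
              [0.0, 0.0, 1.0]])
```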
This paper presents an algorithm that reconstructs a broad-view 3D image from multiple depth and optical cameras. The single-view super-resolution reconstruction method based on a Markov Random Field is extended to a multi-view setting by incorporating the noise characteristics of the depth sensors and depth information from multiple views in addition to the color information. The noisy, low-resolution time-of-flight depth measurements are transformed into high-quality 3D information. We also analyze the effect of varying the weights of the different terms in the optimization objective function.
Springer Tracts in Advanced Robotics, 2013
Globally consistent 3D maps are commonly used for robot mission planning, navigation, and teleoperation in unstructured and uncontrolled environments. These maps are typically represented as 3D point clouds; however, other representations, such as surface or solid models, are often required for humans to perform scientific analyses, infrastructure planning, or for general visualization purposes. Robust large-scale solid model reconstruction from point clouds of outdoor scenes can be challenging due to the presence of dynamic objects, the ambiguity between non-returns and sky-points, and scalability requirements. Volume-based methods are able to remove spurious points arising from moving objects in the scene by considering the entire ray of each measurement, rather than simply the end point. Scalability can be addressed by decomposing the overall space into multiple tiles, from which the resulting surfaces can later be merged. We propose an approach that applies a weighted signed distance function along each measurement ray, where the weight indicates the confidence of the calculated distance. Due to the unenclosed nature of outdoor environments, we introduce a technique to automatically generate a thickened structure in order to model surfaces seen from only one side. The final solid models are thus suitable to be physically printed by a rapid prototyping machine. The approach is evaluated on 3D laser point cloud data collected from a mobile lidar in unstructured and uncontrolled environments, including outdoors and inside caves. The accuracy of the solid model reconstruction is compared to a previously developed binary voxel carving method. The results show that the weighted signed distance approach produces a more accurate reconstruction of the surface, and since higher-accuracy models can be produced at lower resolutions, this additionally results in significant improvements in processing time.
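A minimal sketch of the weighted signed-distance update along one measurement ray follows: every voxel the ray traverses accumulates a confidence-weighted running average of its signed distance to the measured hit point. The linear confidence ramp behind the surface is an assumption; the paper derives its own weighting.

```python
def integrate_ray(tsdf, weight, voxels_on_ray, hit_index, trunc=3.0):
    """tsdf, weight: flat arrays over the voxel grid; voxels_on_ray: voxel
    indices along the ray, ordered from the sensor; hit_index: position of
    the measured surface point along that list."""
    for i, v in enumerate(voxels_on_ray):
        d = float(hit_index - i)             # + in front of the surface, - behind
        if d <= -trunc:                      # occluded: the ray says nothing more
            break
        sd = min(d, trunc)                   # truncated signed distance
        w = 1.0 if d >= 0 else 1.0 + d / trunc   # confidence decays behind the hit
        tsdf[v] = (weight[v] * tsdf[v] + w * sd) / (weight[v] + w)
        weight[v] += w
```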
2016 IEEE 19th International Conference on Intelligent Transportation Systems (ITSC), 2016
High-resolution depth maps, obtained by upsampling sparse range data from a 3D LIDAR, find applications in many fields ranging from sensory perception to semantic segmentation and object detection. Upsampling is often based on combining data from a monocular camera to compensate for the low resolution of a LIDAR. This paper, on the other hand, introduces a novel framework to obtain a dense depth map solely from a single LIDAR point cloud, a research direction that has been barely explored. The formulation behind the proposed depth-mapping process relies on local spatial interpolation, using a sliding-window (mask) technique, and on the Bilateral Filter (BF), where the variable of interest, the distance from the sensor, is considered in the interpolation problem. In particular, the BF is conveniently modified to perform depth-map upsampling such that the edges (foreground-background discontinuities) are better preserved by means of a proposed method which influences the range-based weighting term. Other methods for spatial upsampling are discussed, evaluated, and compared in terms of different error measures. This paper also investigates the role of the mask's size in the performance of the implemented methods. Quantitative and qualitative results from experiments on the KITTI Database, using LIDAR point clouds only, show very satisfactory performance of the approach introduced in this work.
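The core of such a depth-only bilateral upsampling scheme can be sketched as follows: the unknown depth at a pixel is a weighted average of the sparse LiDAR samples inside a sliding window, combining a spatial Gaussian with a range Gaussian on depth differences so that background returns do not bleed across object edges. Window size and sigma values are illustrative, not the paper's tuned settings, and the choice of reference depth is an assumption.

```python
import numpy as np

def bf_upsample(sparse, mask, win=7, sigma_s=3.0, sigma_r=1.0):
    """sparse: (H, W) projected LiDAR depth, valid where mask is True."""
    H, W = sparse.shape
    out = sparse.copy()
    r = win // 2
    for y in range(H):
        for x in range(W):
            if mask[y, x]:
                continue                       # keep measured depths untouched
            ys, xs = np.mgrid[max(0, y - r):min(H, y + r + 1),
                              max(0, x - r):min(W, x + r + 1)]
            m = mask[ys, xs]
            if not m.any():
                continue                       # no LiDAR sample in the window
            d = sparse[ys, xs][m]
            ws = np.exp(-((ys[m] - y) ** 2 + (xs[m] - x) ** 2) / (2 * sigma_s ** 2))
            ref = d[ws.argmax()]               # depth of the spatially nearest sample
            wr = np.exp(-((d - ref) ** 2) / (2 * sigma_r ** 2))
            out[y, x] = np.sum(ws * wr * d) / np.sum(ws * wr)
    return out
```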
2012 IEEE International Conference on Robotics and Automation, 2012
The combination of range sensors with color cameras can be very useful for robot navigation, semantic perception, manipulation, and telepresence. Several methods of combining range and color data have been investigated and successfully used in various robotic applications. Most of these systems suffer from noise in the range data and from the resolution mismatch between the range sensor and the color cameras, since the resolution of current range sensors is much lower than that of color cameras. High-resolution depth maps can be obtained using stereo matching, but this often fails to construct accurate depth maps of weakly or repetitively textured scenes, or if the scene exhibits complex self-occlusions. Range sensors provide coarse depth information regardless of the presence or absence of texture. The use of a calibrated system, composed of a time-of-flight (TOF) camera and a stereoscopic camera pair, allows data fusion, thus overcoming the weaknesses of both individual sensors. We propose a novel TOF-stereo fusion method based on an efficient seed-growing algorithm which uses the TOF data projected onto the stereo image pair as an initial set of correspondences. These initial "seeds" are then propagated based on a Bayesian model which combines an image similarity score with rough depth priors computed from the low-resolution range data. The overall result is a dense and accurate depth map at the resolution of the color cameras at hand. We show that the proposed algorithm outperforms 2D image-based stereo algorithms and that the results are of higher resolution than off-the-shelf color-range sensors, e.g., the Kinect. Moreover, the algorithm potentially exhibits real-time performance on a single CPU.
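The seed-growing core can be sketched as a best-first propagation: ToF points projected into the left image give seed (pixel, disparity) pairs, and each accepted pixel proposes its 4-neighbours, testing only disparities near its own. The plain NCC score below stands in for the paper's Bayesian combination of similarity and ToF depth priors; seeds are assumed to lie safely inside the image with positive disparities.

```python
import heapq
import numpy as np

def ncc(left, right, y, x, d, r=2):
    """Normalised cross-correlation of a (2r+1)^2 patch at disparity d."""
    a = left[y - r:y + r + 1, x - r:x + r + 1].ravel().astype(float)
    b = right[y - r:y + r + 1, x - d - r:x - d + r + 1].ravel().astype(float)
    a, b = a - a.mean(), b - b.mean()
    den = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / den) if den > 0 else -1.0

def grow(left, right, seeds, tau=0.6):
    """seeds: list of (y, x, disparity) from projected ToF points."""
    disp = np.full(left.shape, -1, dtype=int)
    heap = [(-ncc(left, right, y, x, d), y, x, d) for (y, x, d) in seeds]
    heapq.heapify(heap)
    while heap:
        score, y, x, d = heapq.heappop(heap)
        if -score < tau or disp[y, x] >= 0:
            continue                            # too weak, or already decided
        disp[y, x] = d
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 2 < ny < left.shape[0] - 3 and d + 2 < nx < left.shape[1] - 3 \
                    and disp[ny, nx] < 0:
                best = max(range(d - 1, d + 2),
                           key=lambda c: ncc(left, right, ny, nx, c))
                heapq.heappush(heap, (-ncc(left, right, ny, nx, best), ny, nx, best))
    return disp
```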
Sensors, 2014
The 3D acquisition of object structures has become a common technique in many fields of work, e.g., industrial quality management, cultural heritage, or crime scene documentation. The requirements on the measuring devices are versatile, because spacious scenes have to be imaged along with a high level of detail for selected objects. Thus, the measuring systems used are expensive and require an experienced operator. With the rise of low-cost 3D imaging systems, their integration into the digital documentation process becomes possible. However, common low-cost sensors are limited by a trade-off between range and accuracy, providing either a low resolution of single objects or a limited imaging field. Therefore, the use of multiple sensors is desirable. We show the combined use of two low-cost sensors, the Microsoft Kinect and the David laserscanning system, to achieve low-resolution scans of the whole scene and a high level of detail for selected objects, respectively. Afterwards, the high-resolution David scans are automatically assigned to their corresponding Kinect objects using surface feature histograms and SVM classification. The corresponding objects are fitted using an ICP implementation to produce a multi-resolution map. The applicability is shown for a fictional crime scene and the reconstruction of a ballistic trajectory.
Lecture Notes in Computer Science, 2009
In this paper a systematic approach to the processing and combination of high-resolution color images and low-resolution time-of-flight depth maps is described. The purpose is the calculation of a dense depth map for one of the high-resolution color images. Special attention is paid to the different nature of the input data and their large difference in resolution. In this way, the low-resolution time-of-flight measurements are exploited without sacrificing the high-resolution observations in the color data.
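For intuition, the colour-guided upsampling idea can be sketched with the classic joint-bilateral recipe (the paper develops its own, more systematic pipeline): each high-resolution pixel averages nearby low-resolution ToF depths, weighted by spatial distance and by colour similarity in the high-resolution image. All parameter values are illustrative.

```python
import numpy as np

def jbu(depth_lr, color_hr, scale, win=2, sigma_s=1.0, sigma_c=10.0):
    """depth_lr: (h, w) ToF depth; color_hr: (h*scale, w*scale, 3) image."""
    H, W = color_hr.shape[:2]
    h, w = depth_lr.shape
    out = np.zeros((H, W))
    for Y in range(H):
        for X in range(W):
            cy, cx = Y / scale, X / scale          # position on the low-res grid
            y0, x0 = int(cy), int(cx)
            num = den = 0.0
            for dy in range(-win, win + 1):
                for dx in range(-win, win + 1):
                    y, x = y0 + dy, x0 + dx
                    if not (0 <= y < h and 0 <= x < w):
                        continue
                    ws = np.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2 * sigma_s ** 2))
                    a = color_hr[Y, X].astype(float)
                    b = color_hr[min(y * scale, H - 1), min(x * scale, W - 1)].astype(float)
                    wc = np.exp(-np.sum((a - b) ** 2) / (2 * sigma_c ** 2))
                    num += ws * wc * depth_lr[y, x]
                    den += ws * wc
            out[Y, X] = num / den if den > 0 else 0.0
    return out
```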
4th International Symposium on Innovative Approaches in Engineering and Natural Sciences Proceedings, 2019
One of the fastest 3D data acquisition methods is LiDAR scanning technology. Using LiDAR scanners, 3D point clouds of the scanned scene are obtained easily. Although the scanning phase is fast, meaningful and effective 3D visualization of the scene requires the raw point cloud data to be processed at advanced levels. This paper aims at explaining those processing steps and comparing LiDAR technology to state-of-the-art methods. For this purpose, point cloud data of a building obtained with a terrestrial LiDAR scanner was used. This paper explains the registration of point cloud data, its underlying concepts, and mathematical models, as well as the free open-source libraries that can be used to perform these operations. Despite the remarkable achievements in LiDAR technology and data processing, it is still a relatively young subject and would likely change its course rather quickly in the near future.
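As an example of performing the registration step with a free open-source library, the sketch below uses Open3D's point-to-plane ICP; the library choice, file names, and thresholds are illustrative assumptions rather than details taken from the paper.

```python
import numpy as np
import open3d as o3d

# Two hypothetical scan stations of the same building.
source = o3d.io.read_point_cloud("scan_station_1.pcd")
target = o3d.io.read_point_cloud("scan_station_2.pcd")

# Point-to-plane ICP needs normals on the target cloud.
target.estimate_normals(
    o3d.geometry.KDTreeSearchParamHybrid(radius=0.5, max_nn=30))

result = o3d.pipelines.registration.registration_icp(
    source, target,
    0.2,                 # max correspondence distance in metres (scene-dependent)
    np.eye(4),           # a coarse initial alignment would go here
    o3d.pipelines.registration.TransformationEstimationPointToPlane())

print(result.fitness)          # overlap ratio of the final alignment
print(result.transformation)   # 4x4 rigid transform mapping source onto target
```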
Proceedings. International Conference on Virtual Systems and MultiMedia VSMM '97 (Cat. No.97TB100182), 1997
The increasing use of virtual object representations for various applications creates a need for fast and simple object digitizing systems. Range finders provide a convenient way to digitize solid objects and permit the accurate and fast scanning of an object's shape without any probe contact. However, only one view of an object can be captured at once; therefore, for most objects, several views have to be combined in order to obtain a description of the complete surface. We consider a digitizing system which captures and triangulates views of a real-world 3D object and finally registers and integrates them. Registration is based on geometric matching and uses an interactively entered pose estimate. Integration is performed by a new fusion algorithm proposed in this paper. This algorithm takes advantage of the previous view registration to remove the redundant overlap area of two views and to fuse together their respective meshes with a gap-filling algorithm. The fusion algorithm integrates well into the whole reconstruction process and is simple and successful.
Journal of Artificial Intelligence and Soft Computing Research, 2023
Scanning real 3D objects poses many technical challenges. Stationary solutions allow for accurate scanning, but they usually require special and expensive equipment. Competing mobile solutions (handheld scanners, LiDARs on vehicles, etc.) do not allow for accurate and fast mapping of the surface of the scanned object. The article proposes an end-to-end automated solution that enables the use of widely available mobile and stationary scanners. The related system generates a full 3D model of the object based on multiple depth sensors. For this purpose, the scanned object is marked with markers. Marker types and positions are automatically detected and mapped to a template mesh. A reference template is automatically selected for the scanned object and is then transformed according to the data from the scanners with a non-rigid transformation. The solution allows for the fast scanning of complex objects of varied sizes, constituting a set of training data for segmentation and classification systems for 3D scenes. The main advantages of the proposed solution are its efficiency, which enables real-time scanning, and its ability to generate a mesh with a regular structure, which is critical for training data for machine learning algorithms. The source code is available at https://github.com/SATOffice/improved_scanner3D.
2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops, 2009
Multi-view stereo methods frequently fail to properly reconstruct 3D scene geometry if visible texture is sparse or the scene exhibits difficult self-occlusions. Time-of-Flight (ToF) depth sensors can provide 3D information regardless of texture but with only limited resolution and accuracy. To find an optimal reconstruction, we propose an integrated multi-view sensor fusion approach that combines information from multiple color cameras and multiple ToF depth sensors. First, multi-view ToF sensor measurements are combined to obtain a coarse but complete model. Then, the initial model is refined by means of a probabilistic multi-view fusion framework, optimizing over an energy function that aggregates ToF depth sensor information with multi-view stereo and silhouette constraints. We obtain high-quality, dense, and detailed 3D models of scenes challenging for stereo alone, while simultaneously reducing complex noise of ToF sensors.
ABCM Symposium Series in Mechatronics, 2006
The need for three-dimensional geometric models that accurately represent real-world objects is becoming more common every day, and meeting it requires 3D modeling methods. Three-dimensional models have applications in several areas, including photogrammetry, archaeology, reverse engineering, robotic guidance, virtual reality, medicine, cinema, and game programming. A current challenge is the construction of digitized 3D models precise enough to be used in manufacturing systems or in numerical simulation of the performance of machines and components in operation, such as turbines and flows in non-circular ducts, when the geometric model is not available. Reconstructing the 3D shapes of objects or scenes from range images, also known as depth maps, is preferable to using intensity images or stereoscopy. These maps represent distances measured from an observer (optical sensor or camera) to the scene on a rectangular grid; the 3D information is therefore explicit and need not be recovered as in the case of intensity images. The reconstruction process has three stages. The first is the sampling of the real world into depth maps. The second is the alignment of several views within the same coordinate system, known as image registration. The third is the integration of the views to generate surface meshes, known as merging. Current challenges converge on finding methods that satisfy the largest number of desirable properties, such as robustness to outliers, efficiency in time and space complexity, and precision of results. This work consists of a discussion of different methods for 3D shape reconstruction from range images found in the literature and an implementation of the second stage of 3D reconstruction: range image registration.
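The registration stage implemented in this work can be sketched as classic point-to-point ICP: alternate nearest-neighbour matching with the closed-form (Kabsch/SVD) rigid alignment. The bare-bones version below assumes the views are already roughly aligned and that gross outliers have been filtered beforehand.

```python
import numpy as np
from scipy.spatial import cKDTree

def best_rigid(A, B):
    """Least-squares rotation R and translation t mapping points A onto B."""
    ca, cb = A.mean(0), B.mean(0)
    U, _, Vt = np.linalg.svd((A - ca).T @ (B - cb))
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflections
    R = Vt.T @ D @ U.T
    return R, cb - R @ ca

def icp(source, target, iters=30):
    """source, target: (N, 3) and (M, 3) point sets from two range images."""
    tree = cKDTree(target)
    src = source.copy()
    for _ in range(iters):
        _, idx = tree.query(src)            # closest target point per source point
        R, t = best_rigid(src, target[idx])
        src = src @ R.T + t                 # apply the incremental transform
    return src
```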
There is considerable interest in the development of new optical imaging systems that are able to produce three-dimensional images. In this paper, we present some considerations concerning three-dimensional laser imaging, a field in which significant technological advances have encouraged research over the past decade. Potential applications range across medical imaging, surveillance, and robotic vision. Identifying targets or objects concealed by foliage or camouflage is a critical requirement for operations in public safety, law enforcement, and defense.
2021
Depth-Image-Based Rendering (DIBR) can synthesize a virtual view image from a set of multiview images and corresponding depth maps. However, this requires an accurate depth map estimation that incurs a high computational cost of several minutes per frame in DERS (MPEG-I's Depth Estimation Reference Software), even on a high-end computer. LiDAR cameras can thus be an alternative to DERS in real-time DIBR applications. We compare the quality of a low-cost LiDAR camera, the Intel RealSense LiDAR L515, calibrated and configured adequately, against DERS using MPEG-I's Reference View Synthesizer (RVS). In IV-PSNR, the LiDAR camera reaches 32.2 dB view synthesis quality with a 15 cm camera baseline and 40.3 dB with a 2 cm baseline. Though DERS outperforms the LiDAR camera by 4.2 dB, the latter provides a better quality-performance trade-off. Moreover, visual inspection demonstrates that LiDAR's virtual views have even slightly higher quality than with DERS in most tested low-texture ...
Lecture Notes in Computer Science, 2011
3D imaging sensors for the acquisition of three-dimensional faces have attracted considerable interest in recent years for a number of applications. Structured-light camera/projector systems are often used to overcome the relatively uniform appearance of skin. In this paper, we propose a 3D acquisition solution with a 3D space-time non-rigid super-resolution capability, using three calibrated cameras coupled with an uncalibrated projector device, which is particularly suited to 3D face scanning: it is rapid, easily movable, and robust to ambient lighting conditions. The proposed solution is a hybrid stereovision and phase-shifting approach, using two shifted patterns and a texture image, which not only takes advantage of the assets of stereovision and structured light but also overcomes their weaknesses. The super-resolution process deals with 3D artefacts and completes the 3D scanned view in the presence of small non-rigid deformations such as facial expressions. The experimental results validate the effectiveness of the proposed approach.
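For intuition about the phase-shifting half of the approach, here is the classic three-step phase recovery (the paper itself uses a two-pattern-plus-texture variant): with projected patterns I_k = A + B*cos(phi + 2*pi*(k-1)/3) for k = 0, 1, 2, the wrapped phase follows in closed form and encodes the projector coordinate, hence depth by triangulation.

```python
import numpy as np

def wrapped_phase(i0, i1, i2):
    """i0..i2: the three captured pattern images as float arrays."""
    # phi = atan2(sqrt(3) * (I0 - I2), 2*I1 - I0 - I2), wrapped to (-pi, pi].
    return np.arctan2(np.sqrt(3.0) * (i0 - i2), 2.0 * i1 - i0 - i2)
```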
Pattern Recognition Letters, 2014
This paper proposes to accurately infer the 3D shape of an object captured by a depth camera from multiple viewpoints. The Generalised Relaxed Radon Transform (GR²T) [1] is used here to merge all depth images into a robust kernel density estimate that models the surface of the object in 3D space. The kernel is tailored to capture the uncertainty associated with each pixel in the depth images. The resulting cost function is suitable for stochastic exploration with gradient ascent algorithms when the noise of the observations is modelled with a differentiable distribution. When merging several depth images captured from several viewpoints, the extrinsic camera parameters need to be known accurately, and we extend GR²T to also estimate these nuisance parameters. We illustrate qualitatively the performance of our modelling and we assess quantitatively the accuracy of our 3D shape reconstructions computed from depth images captured with a Kinect camera.
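The kernel-density idea can be sketched as follows: the back-projected depth samples from all views define a smooth density over 3D space whose modes trace the object surface, and with a Gaussian kernel the density is differentiable, so a surface point can be refined by gradient ascent. The fixed isotropic bandwidth below is an assumption; the paper tailors the kernel to each pixel's depth uncertainty.

```python
import numpy as np

def density_and_grad(x, samples, sigma=0.01):
    """x: (3,) query point; samples: (N, 3) points fused from all depth images."""
    diff = samples - x                                    # (N, 3)
    w = np.exp(-np.sum(diff ** 2, axis=1) / (2 * sigma ** 2))
    f = w.sum()                                           # density value at x
    grad = (w[:, None] * diff).sum(axis=0) / sigma ** 2   # gradient of f at x
    return f, grad

def ascend(x, samples, step=1e-5, iters=100):
    """Move x uphill towards the nearest density mode (the surface)."""
    for _ in range(iters):
        _, g = density_and_grad(x, samples)
        x = x + step * g
    return x
```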
Journal of Information Processing Systems, 2016
The recent advent of increasingly affordable and powerful 3D scanning devices capable of capturing high resolution range data about real-world objects and environments has fueled research into effective 3D surface reconstruction techniques for rendering the raw point cloud data produced by many of these devices into a form that would make it usable in a variety of application domains. This paper, therefore, provides an overview of the existing literature on surface reconstruction from 3D point clouds. It explains some of the basic surface reconstruction concepts, describes the various factors used to evaluate surface reconstruction methods, highlights some commonly encountered issues in dealing with the raw 3D point cloud data and delineates the tradeoffs between data resolution/accuracy and processing speed. It also categorizes the various techniques for this task and briefly analyzes their empirical evaluation results demarcating their advantages and disadvantages. The paper concludes with a cross-comparison of methods which have been evaluated on the same benchmark data sets along with a discussion of the overall trends reported in the literature. The objective is to provide an overview of the state of the art on surface reconstruction from point cloud data in order to facilitate and inspire further research in this area.
Proceedings of the Gruppo Telecomunicazioni e Tecnologie dell'Informazione (GTTI) Meeting, 2010
3D video applications require the acquisition of high-quality depth information in real time. This problem can be addressed using stereo vision systems or time-of-flight (ToF) cameras. Both solutions present critical issues that can be overcome by their combined use. A heterogeneous acquisition system is considered in this paper, made of two high-resolution standard cameras (a stereo pair) and one ToF camera. The ToF range camera technology is quite new; therefore, a brief description of how a ToF range camera works is first given, ...
Proceedings of the Eighth Indian Conference on Computer Vision, Graphics and Image Processing, 2012
In this paper we describe a system for building geometrically consistent 3D models using structured-light depth cameras. While the commercial availability of such devices, e.g., the Kinect, has made obtaining depth images easy, the data tend to be corrupted by high levels of noise. To cope with such noise levels, our approach decouples the problem of scan alignment from that of merging the aligned scans. The alignment problem is solved by using two methods tailored to handle the effects of depth image noise and erroneous alignment estimation. The noisy depth images are smoothed by means of an adaptive bilateral filter that explicitly accounts for the sensitivity of the depth estimation by the scanner. Our robust method overcomes failures due to individual pairwise ICP errors and gives alignments that are accurate and consistent. Finally, the aligned scans are merged using a standard procedure based on the signed distance function representation to build a full 3D model of the object of interest. We demonstrate the performance of our system by building complete 3D models of objects of different physical sizes, ranging from cast-metal busts to a complete model of a small room, as well as a complex scale model of an aircraft.
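The adaptive ingredient of such a filter can be sketched as follows: structured-light triangulation noise on Kinect-class sensors grows roughly quadratically with distance, so the range sigma that decides which neighbours belong to the same surface should widen with the measured depth. The quadratic coefficient and window settings below are illustrative assumptions, not the paper's calibrated values.

```python
import numpy as np

def adaptive_sigma_r(depth_m, k=0.0028):
    """Per-pixel range sigma (metres), assuming noise ~ k * z^2."""
    return k * depth_m ** 2

def adaptive_bilateral(depth, win=3, sigma_s=2.0):
    """depth: (H, W) metric depth image; returns a smoothed copy."""
    H, W = depth.shape
    out = depth.copy()
    yy, xx = np.mgrid[-win:win + 1, -win:win + 1]
    ws = np.exp(-(yy ** 2 + xx ** 2) / (2 * sigma_s ** 2))   # spatial weights
    for y in range(win, H - win):
        for x in range(win, W - win):
            patch = depth[y - win:y + win + 1, x - win:x + win + 1]
            sr = max(adaptive_sigma_r(depth[y, x]), 1e-4)    # depth-dependent sigma
            wr = np.exp(-((patch - depth[y, x]) ** 2) / (2 * sr ** 2))
            out[y, x] = np.sum(ws * wr * patch) / np.sum(ws * wr)
    return out
```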