2020, Computer Vision – ECCV 2020
A 3D imaging and mapping system that can handle both multiple viewers and dynamic objects is attractive for many applications. We propose a freeform structured light system that does not rigidly constrain camera(s) to the projector. By introducing an optimized phase-coded aperture in the projector, we transform the projector pattern to robustly encode depth in its defocus; this allows a camera to estimate depth, in projector coordinates, using local information. Additionally, we project a Kronecker-multiplexed pattern that provides global context to establish correspondence between camera and projector pixels. Together, the aperture coding and the projected pattern offer a unique 3D labeling for every location in the scene. The projected pattern can be observed in part or in full by any camera to reconstruct both the 3D map of the scene and the camera pose in projector coordinates. The system is optimized using a fully differentiable rendering model and a CNN-based reconstruction. We build a prototype and demonstrate high-quality 3D reconstruction with an unconstrained camera, for both dynamic scenes and multi-camera systems.
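A minimal sketch of the depth-dependent projection idea described in this abstract, with an isotropic Gaussian blur standing in for the optimized phase-coded aperture PSF and a random binary pattern standing in for the Kronecker-multiplexed pattern; all names, constants, and the blur model are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def defocus_sigma(depth_m, focus_m=1.0, k=4.0):
    """Blur radius (px) grows with defocus |1/focus - 1/depth|; the small floor
    models residual blur at the in-focus plane (thin-lens-style stand-in)."""
    return 0.2 + k * abs(1.0 / focus_m - 1.0 / depth_m)

def render_pattern_at_depth(pattern, depth_m):
    """Appearance of the projected pattern on a fronto-parallel plane at depth_m."""
    return gaussian_filter(pattern, sigma=defocus_sigma(depth_m))

# Per-depth "templates": a local decoder (here, the paper uses a CNN) can compare an
# observed patch against these to estimate depth in projector coordinates from defocus.
pattern = (np.random.rand(64, 64) > 0.5).astype(np.float32)   # stand-in for the Kronecker pattern
templates = {d: render_pattern_at_depth(pattern, d) for d in np.linspace(0.5, 2.0, 16)}
```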
Journal of Computational Vision and Imaging Systems
Structured Light (SL) patterns generated from pseudo-random arrays are widely used for single-shot 3D reconstruction with projector-camera systems. These SL images consist of a set of tags with different appearances; the patterns are projected onto a target surface, captured by a camera, and decoded. The precision of localizing these tags in the captured camera images affects the quality of the pixel correspondences between the projector and the camera, and consequently the quality of the derived 3D shape. In this paper, we incorporate a quadrilateral representation for the detected SL tags that allows the construction of robust and accurate pixel correspondences and the application of a spatial rectification module that leads to high tag classification accuracy. When applying the proposed method to single-shot 3D reconstruction, we show its effectiveness over a baseline in estimating denser and more accurate 3D point clouds.
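A hedged sketch of the spatial rectification idea: warp a detected tag's quadrilateral to a canonical square before classifying it. The corner ordering, canonical size, and function name are illustrative assumptions, not the paper's code.

```python
import cv2
import numpy as np

def rectify_tag(image, quad_corners, size=32):
    """quad_corners: 4x2 float array, ordered TL, TR, BR, BL in the camera image."""
    src = np.asarray(quad_corners, dtype=np.float32)
    dst = np.array([[0, 0], [size - 1, 0], [size - 1, size - 1], [0, size - 1]],
                   dtype=np.float32)
    H = cv2.getPerspectiveTransform(src, dst)           # homography: quad -> canonical square
    return cv2.warpPerspective(image, H, (size, size))  # rectified patch fed to the tag classifier
```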
The quality of the captured point cloud and the scanning speed of a structured light 3D camera system depend on its ability to handle object surfaces with large reflectance variation, traded off against the number of patterns that must be projected. In this paper, we propose and implement a flexible embedded framework that can trigger the camera one or more times to capture one or more projections within a single camera exposure setting. This allows the 3D camera system to synchronize the camera and projector even at mismatched frame rates, so that the system can project different types of patterns for different scan-speed applications. As a result, the system captures a high-quality 3D point cloud even for surfaces with large reflectance variation while achieving a high scan speed. The proposed framework is implemented on a Field Programmable Gate Array (FPGA), where the camera trigger is generated adaptively so that the position and number of triggers are determined automatically from the camera exposure settings. In other words, the projection frequency adapts to different scanning applications without altering the architecture. In addition, the proposed framework is unique in that it does not require any external memory for storage, because pattern pixels are generated in real time, which minimizes the complexity and size of the application-specific integrated circuit (ASIC) design and implementation.
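A simple software sketch of the adaptive triggering arithmetic: given the camera exposure and the projector's per-pattern period, compute how many pattern projections fit in one exposure and where the triggers fall. Parameter names and values are illustrative; the paper realizes this logic in FPGA hardware.

```python
def plan_triggers(exposure_ms, pattern_period_ms, start_offset_ms=0.0):
    """Number of projections that fit in one exposure, and their trigger times (ms)."""
    n = max(1, int(exposure_ms // pattern_period_ms))
    positions = [start_offset_ms + i * pattern_period_ms for i in range(n)]
    return n, positions

n, times = plan_triggers(exposure_ms=33.3, pattern_period_ms=8.3)
print(n, times)   # e.g. 4 pattern projections packed into one 33.3 ms exposure
```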
2006
Depth cameras, which provide color and depth information per pixel at video rates, offer exciting new opportunities in computer graphics. We address the challenge of supporting free-viewpoint video of dynamic 3D scenes using live data captured and streamed from widely spaced viewpoints by a handful of synchronized depth cameras. We introduce the concept of the depth hull, which is a generalization of the well-known visual hull. The depth hull reflects all the dense depth information as observed from several centers of projection around the scene. It is the best approximation of the scene geometry that can be obtained from a given set of depth camera recordings. We first present a general improvement to the best existing visual hull rendering algorithm, which is of independent interest. We then use this to contribute a hardware-accelerated method for rendering novel views from depth hulls in real time. This method is based on a combination of techniques from projective shadow mapping and constructive solid geometry (CSG). Our rendering method achieves high-quality results even when only a modest number of depth cameras are deployed, and it is applicable to any set of images with accompanying dense depth maps that correspond to arbitrary viewing positions around the scene. We provide experimental results using a system incorporating two depth cameras recording a dynamic scene. To the best of our knowledge, these are the first results on free-viewpoint image synthesis using commercially available depth cameras.
2013
Research interest in rapid structured-light imaging has grown steadily for the modeling of moving objects, and a number of methods have been proposed for range capture in a single video frame. The imaging area of a 3D object using a single projector is restricted, since the structured light is projected onto only a limited area of the object surface. Employing additional projectors to broaden the imaging area is a challenging problem, since simultaneous projection of multiple patterns results in their superposition in the light-intersected areas, and recognizing the original patterns is by no means trivial. This paper presents a novel method of multi-projector color structured-light vision based on projector-camera triangulation. By analyzing the behavior of superposed light colors in a chromaticity domain, we show that the original light colors cannot be properly extracted by conventional direct estimation. We disambiguate multiple projectors by multiplexing the orientations of the projector patterns so that the superposed patterns can be separated by explicit derivative computations. Experimental studies are carried out to demonstrate the validity of the presented method. The proposed method increases the efficiency of range acquisition compared to conventional active stereo using multiple projectors.
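A toy illustration of the orientation-multiplexing idea: if projector A's pattern varies only along x and projector B's only along y, directional derivatives of the superposed image separate their contributions. This is a conceptual stand-in, not the paper's chromaticity-domain analysis; stripe periods and image size are arbitrary.

```python
import numpy as np

h, w = 128, 128
x = np.arange(w)[None, :].repeat(h, 0)
y = np.arange(h)[:, None].repeat(w, 1)
pattern_a = 0.5 + 0.5 * np.sin(2 * np.pi * x / 16)   # vertical stripes (projector A)
pattern_b = 0.5 + 0.5 * np.sin(2 * np.pi * y / 16)   # horizontal stripes (projector B)
superposed = pattern_a + pattern_b                    # light-intersected region

dy, dx = np.gradient(superposed)
# dx carries only projector A's structure, dy only projector B's:
assert np.allclose(dx, np.gradient(pattern_a, axis=1), atol=1e-6)
assert np.allclose(dy, np.gradient(pattern_b, axis=0), atol=1e-6)
```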
Optical Engineering, 2011
We present a novel 3D recovery method based on structured light. This method unifies depth from focus (DFF) and depth from defocus (DFD) techniques through the use of a dynamically (de)focused projection. With this approach, the image acquisition system is specifically constructed to keep the whole object sharp in all captured images, so that only the projected patterns undergo different defocus deformations according to the object's depths. When the projected patterns are out of focus, their point-spread function (PSF) is assumed to follow a Gaussian distribution. The final depth is computed by analyzing the relationship between the sets of PSFs obtained from different blurs and the variation of the object's depths. Our depth estimation can be employed as a stand-alone strategy. It avoids occlusion and correspondence issues, and it handles textureless and partially reflective surfaces. Experimental results on real objects demonstrate the effective performance of our approach, providing reliable depth estimation and competitive time consumption. It uses fewer input images than DFF and, unlike DFD, it ensures that the PSF is locally unique.
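A minimal sketch of the blur-depth relation underlying the DFD side of such a method, assuming a Gaussian PSF and a simple projection-cone model; the constants (aperture, focus distance, pixel scale) are illustrative, not the paper's calibration.

```python
import numpy as np

def blur_sigma(depth, aperture=0.02, focus_dist=1.0, k=500.0):
    """Gaussian PSF sigma (px) of the pattern on a surface at `depth` (m): the projection
    cone focused at focus_dist has diameter aperture*|1 - depth/focus_dist| at that
    surface; k folds in camera magnification (roughly 1/depth)."""
    return k * aperture * np.abs(1.0 - depth / focus_dist) / depth

def depth_from_sigma(sigma_obs, candidates=np.linspace(1.05, 3.0, 1000)):
    """Invert the relation numerically; candidates stay on one side of focus,
    where the sigma-to-depth mapping is unique (the classic DFD ambiguity)."""
    return candidates[np.argmin(np.abs(blur_sigma(candidates) - sigma_obs))]
```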
Journal of Computational Vision and Imaging Systems
Multi-frame structured light in projector-camera systems affords high-density, non-contact 3D surface reconstruction. However, such systems have strict setup constraints that can become expensive and time-consuming. Here, we investigate the conditions under which a projective homography can compensate for small perturbations in pose caused by a hand-held camera. We synthesize data using a pinhole camera model and use it to determine the average 2D reprojection error per point correspondence. This error map is grouped into regions with specified upper bounds to classify which regions produce sufficiently small error to be considered feasible for a structured-light projector-camera system with a hand-held camera. Empirical results demonstrate that sub-pixel reprojection accuracy is achievable under feasible geometric constraints.
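A hedged sketch of the kind of evaluation described above: project synthetic 3D points with a pinhole model under a nominal and a slightly perturbed camera pose, fit a homography between the two images, and measure the residual reprojection error. The intrinsics, pose perturbation, and scene are illustrative assumptions.

```python
import cv2
import numpy as np

K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])

def project(points_3d, rvec, tvec):
    pts, _ = cv2.projectPoints(points_3d, rvec, tvec, K, None)
    return pts.reshape(-1, 2)

rng = np.random.default_rng(0)
scene = rng.uniform([-0.5, -0.5, 1.8], [0.5, 0.5, 2.2], size=(200, 3))  # mildly non-planar scene

uv_ref  = project(scene, rvec=np.zeros(3), tvec=np.zeros(3))
uv_pert = project(scene, rvec=np.array([0.01, 0.005, 0.0]),   # small hand-held rotation (rad)
                  tvec=np.array([0.005, 0.0, 0.002]))          # small translation (m)

H, _ = cv2.findHomography(uv_pert.astype(np.float32), uv_ref.astype(np.float32))
uv_comp = cv2.perspectiveTransform(uv_pert.reshape(-1, 1, 2).astype(np.float32), H).reshape(-1, 2)
err = np.linalg.norm(uv_comp - uv_ref, axis=1).mean()
print(f"mean 2D reprojection error after homography compensation: {err:.3f} px")
```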
2015 IEEE International Conference on Computer Vision (ICCV), 2015
The central projection model commonly used for cameras as well as projectors results in similar advantages and disadvantages in both types of system. In active stereo systems using a projector-camera setup, a central projection model creates several problems; among them, a narrow depth range and the need for a wide baseline are crucial. In this paper, we solve these problems by introducing a light field projector that can project a depth-dependent pattern. The light field projector is realized by attaching a coded aperture with a high-frequency mask in front of the lens of a video projector, which also projects a high-frequency pattern. Because the light field projector cannot be approximated by a thin-lens model and no precise calibration method has been established yet, an image-based approach is proposed to apply a stereo technique to the system. Although image-based techniques usually require a large database and often imply heavy computational costs, we propose a hierarchical approach and a feature-based search as a solution. In the experiments, we confirm that our method can accurately recover the dense shape of curved and textured objects over a wide range of depths from a single captured image.
2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008
3D acquisition techniques to measure dynamic scenes and deformable objects with little texture are extensively researched for applications like motion capture of human facial expressions. To allow such measurement, several techniques using structured light have been proposed. These techniques can be largely categorized into two types. The first comprises techniques that temporally encode the positional information of a projector's pixels using multiple projected patterns, and the second comprises techniques that spatially encode positional information into areas or color spaces. Although the former allows dense reconstruction with a sufficient number of patterns, it has difficulty scanning objects in rapid motion. The latter uses only a single pattern, so this problem is resolved; however, it often relies on complex patterns or color intensities, which are sensitive to noise, shape distortions, or textures. Thus, achieving dense and stable 3D acquisition in real cases remains an open problem. In this paper, we propose a technique for dense shape reconstruction that requires only a single-frame image of a grid pattern. The proposed technique also has the advantage of being robust in terms of image processing.
This paper describes how a mirror can be integrated as another view and another source of light patterns in an interactive reconstruction system with structured light, where the object, the camera, and the mirror can move. We show how a single pass of structured light can provide 3D points to accurately estimate the pose of a mirror, while also reconstructing 3D points on the object. We develop new structured light patterns that are unaffected by the reversed order created by some mirror configurations. We also describe hardware rendering support to avoid conflicting emitted/captured light patterns, and demonstrate how all the proposed realizations extend naturally for multiple mirror configurations. We finally conclude with results, discuss limitations, and suggest further improvements.
2009
In this paper, we present and analyze a depth imaging system based on the integration of active stereo matching and structured light methods. The integration benefits from the advantages of both approaches, allowing shape recovery from a wider view with less occlusion. We build a system composed of two cameras and a projector, and project a single one-shot pattern. We first use the structured light part to estimate reliable correspondences between each camera and the projector via an efficient pattern decoding technique. The remaining unresolved regions are explored by a stereo matching technique, which is less sensitive to object surface colors and to the projector's short depth of field, to estimate additional correspondences. By switching between the colored pattern and white light, texture information is retrieved at the same time and from the same viewpoint. Finally, we present a thorough, in-depth analysis of the capabilities and limitations of the presented system in the context of development and content creation for depth image-based representation (DIBR) 3DTV. Through carefully designed experiments, we quantify the depth range of the camera system, the effect of the projector's depth of field on pattern decoding performance, and the robustness to scene surface colors with respect to their hue, saturation, and brightness.
The Visual Computer, 2005
In this paper we present a scalable 3D video framework for capturing and rendering dynamic scenes. The acquisition system is based on multiple sparsely placed 3D video bricks, each comprising a projector, two grayscale cameras, and a color camera. Relying on structured light with complementary patterns, texture images and pattern-augmented views of the scene are acquired simultaneously by time-multiplexed projections and synchronized camera exposures. Using space-time stereo on the acquired pattern images, high-quality depth maps are extracted, whose corresponding surface samples are merged into a view-independent, point-based 3D data structure. This representation allows for effective photo-consistency enforcement and outlier removal, leading to a significant decrease of visual artifacts and a high resulting rendering quality using EWA volume splatting. Our framework and its view-independent representation allow for simple and straightforward editing of 3D video. In order to demonstrate its flexibility, we show compositing techniques and spatiotemporal effects.
Structured light methods achieve 3D modeling by observing, with a camera system, a known pattern projected on the scene. The main drawback of single-projection structured light methods is that moving the projector significantly changes the appearance of the scene at every acquisition time, so classical multi-view stereovision approaches based on appearance matching are not usable. The presented work is based on a system of two cameras and a single slide projector embedded in a hand-held device for industrial applications (reverse engineering, dimensional control, etc.). We propose a method that achieves multi-view modeling by estimating camera pose and surface reconstruction in a joint process. The proposed method is based on the extension of a stereo-correlation criterion, with acquisitions linked through a generalized expression of local homographies. The constraints brought by this formulation allow an accurate estimation of the modeling parameters for dense reconstruction of the scene and improve the result when dealing with detailed or sharp objects, compared to pairwise stereovision methods.
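A minimal stand-in for the kind of stereo-correlation criterion this method builds on: zero-mean normalized cross-correlation (ZNCC) between two patches. The paper generalizes such a criterion across views via local homographies; this sketch only shows the basic per-patch score.

```python
import numpy as np

def zncc(patch_a, patch_b, eps=1e-8):
    """ZNCC in [-1, 1]; higher means the two patches match better photometrically."""
    a = patch_a.astype(np.float64).ravel()
    b = patch_b.astype(np.float64).ravel()
    a -= a.mean()
    b -= b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))
```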
Geoinformatics FCE CTU, 2011
Recently, one of the central issues in the fields of Photogrammetry, Computer Vision, Computer Graphics and Image Processing has been the development of tools for the automatic reconstruction of complex 3D objects. Among various approaches, one of the most promising is Structured Light 3D scanning (SL), which combines automation and high accuracy with low cost, given the steady decrease in the price of cameras and projectors. SL relies on the projection of different light patterns, by means of a video projector, onto 3D object surfaces, which are recorded by one or more digital cameras. Automatic pattern identification in the images allows reconstructing the shape of the recorded 3D objects via triangulation of the optical rays corresponding to projector and camera pixels. Models draped with realistic phototexture may thus also be generated, reproducing both the geometry and appearance of the 3D world. In this context, the subject of our research is a synthesis of the state of the art as well as the development of...
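A hedged sketch of the triangulation step mentioned above: intersect, in the midpoint sense, the camera ray through a decoded pixel with the projector ray through the corresponding pattern pixel. Calibration values are assumed known; the geometry below is generic, not tied to any particular scanner.

```python
import numpy as np

def pixel_ray(K, R, t, uv):
    """Ray origin and unit direction, in world coordinates, through pixel uv of a
    pinhole device with intrinsics K and pose [R|t] (world -> device)."""
    origin = -R.T @ t
    d = R.T @ np.linalg.solve(K, np.array([uv[0], uv[1], 1.0]))
    return origin, d / np.linalg.norm(d)

def triangulate_midpoint(o1, d1, o2, d2):
    """Midpoint of the shortest segment between two (generally skew) rays."""
    b = o2 - o1
    d11, d22, d12 = d1 @ d1, d2 @ d2, d1 @ d2
    denom = d11 * d22 - d12 ** 2                      # ~0 only for parallel rays
    s = (d22 * (d1 @ b) - d12 * (d2 @ b)) / denom     # parameter along the camera ray
    u = (d12 * (d1 @ b) - d11 * (d2 @ b)) / denom     # parameter along the projector ray
    return 0.5 * ((o1 + s * d1) + (o2 + u * d2))
```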
2014 IEEE/ASME International Conference on Advanced Intelligent Mechatronics, 2014
Structured Light 3D scanners have enabled a wave of successful applications in robotics. This opens up the possibility of identifying subtle deficiencies of actual sensors and proposing solutions. In this paper, we present the design of a novel Structured Light scanner that departs from the conventional and simplistic static camera-projector configuration. We found that by adding some degrees of freedom to the cameras it is possible to improve the capabilities of the system, for example by reducing the minimum operating distance and increasing the depth estimation accuracy. This multidisciplinary project involves the fields of mechanics, electronics, control, optics, and computer engineering, and aims to design a powerful tool for academic and industrial applications.
Sixth International Conference on 3-D Digital Imaging and Modeling (3DIM 2007), 2007
This paper presents a new system for acquiring complete 3D surface models using a single structured light projector, a pair of planar mirrors, and one or more synchronized cameras. We project structured light patterns that illuminate the object from all sides (not just the side of the projector) and are able to observe the object from several vantage points simultaneously. This system requires that projected planes of light be parallel, and so we construct an orthographic projector using a Fresnel lens and a commercial DLP projector. A single Gray code sequence is used to encode a set of vertically-spaced light planes within the scanning volume, and five views of the illuminated object are obtained from a single image of the planar mirrors located behind it. Using each real and virtual camera, we then recover a dense 3D point cloud spanning the entire object surface using traditional structured light algorithms. As we demonstrate, this configuration overcomes a major hurdle to achieving full 360 degree reconstructions using a single structured light sequence by eliminating the need for merging multiple scans or multiplexing several projectors.
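A small sketch of the Gray code labeling used for the light planes: each of the 2**n planes gets an n-bit Gray code (successive planes differ in one bit), one binary stripe pattern is projected per bit, and the decoder converts the bits observed at a camera pixel back to a plane index. The bit depth and the example plane index are arbitrary.

```python
def gray_encode(index: int) -> int:
    return index ^ (index >> 1)

def gray_decode(code: int) -> int:
    index = 0
    while code:
        index ^= code
        code >>= 1
    return index

n_bits = 10                                           # 1024 addressable light planes
patterns = [[(gray_encode(p) >> b) & 1 for p in range(2 ** n_bits)]
            for b in range(n_bits - 1, -1, -1)]       # one on/off stripe pattern per bit (MSB first)

bits = [patterns[b][437] for b in range(n_bits)]      # bits seen at a pixel lit by plane 437
code = int("".join(map(str, bits)), 2)
assert gray_decode(code) == 437                       # decoded plane index matches
```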
Computer Vision and Image Understanding, 2004
The view-independent visualization of 3D scenes is most often based on rendering accurate 3-dimensional models or utilizes image-based rendering techniques. To compute the 3D structure of a scene from a moving vision sensor, or to use image-based rendering approaches, we need to be able to estimate the motion of the sensor from the recorded image information with high accuracy, a problem that has been well studied. In this work, we investigate the relationship between camera design and our ability to perform accurate 3D photography by examining the influence of camera design on the estimation of the motion and structure of a scene from video data. By relating the differential structure of the time-varying plenoptic function to different known and new camera designs, we can establish a hierarchy of cameras based upon the stability and complexity of the computations necessary to estimate structure and motion. At the low end of this hierarchy is the standard planar pinhole camera, for which the structure-from-motion problem is non-linear and ill-posed. At the high end is a camera, which we call the full field of view polydioptric camera, for which the motion estimation problem can be solved independently of the depth of the scene, leading to fast and robust algorithms for 3D photography. In between are multiple-view cameras with a large field of view, which we have built, as well as omni-directional sensors.
This research concerns the acquisition of 3-dimensional data from images for the purpose of modeling a person's head. This paper proposes an approach for acquiring the 3-dimensional reconstruction using a multiple stereo camera vision platform and a combination of passive and active lighting techniques. The proposed one-shot active lighting method projects a single, binary dot pattern, hence ensuring the suitability of the method to reconstruct dynamic scenes. Contrary to the conventional spatial neighborhood coding techniques, this approach matches corresponding spots between image pairs by exploiting solely the redundant data available in the multiple camera images. This produces an initial, sparse reconstruction, which is then used to guide a passive lighting technique to obtain a dense 3-dimensional representation of the object of interest. The results obtained reveal the robustness of the projected pattern and the spot matching algorithm, and a decrease in the number of fal...
IPSJ Transactions on Computer Vision and Applications
The combination of a pattern projector and a camera is widely used for 3D measurement. To recover shape from a captured image, various kinds of depth cues are extracted from projected patterns in the image, such as disparities from active stereo or blurriness for depth from defocus. Recently, several techniques have been proposed to improve 3D quality using multiple depth cues by installing coded apertures in projectors or by increasing the number of projectors. However, superposition of projected patterns forms a complicated light field in 3D space, which makes the process of analyzing captured images challenging. In this paper, we propose a learning-based technique to extract depth information from such a light field, which includes multiple depth cues. In the learning phase, prior to the 3D measurement of unknown scenes, projected patterns as they appear at various depths are prepared from not only actual images but also ones generated virtually using computer graphics and geomet...
2012 IEEE International Conference on Robotics and Automation, 2012
In this paper, we present a new method for decoding pixel correspondences in structured light based 3D reconstruction, referred to here as the Ray-Tracing codec. The key idea of the Ray-Tracing codec is to correctly define the region boundaries, as real numbers, for each layer of the Hierarchical Orthogonal Code (HOC) based on an accurate boundary estimator, and to inherit the correct region boundaries between layers sharing common boundaries. Furthermore, each region in a lower layer is traced back to the upper layer to establish the correct correspondence between regions. This is an improvement over existing HOC decoding algorithms, since wrongly decoded pixel correspondences can be greatly reduced. The experimental results show that the proposed Ray-Tracing codec significantly enhances the robustness and precision of depth imaging compared with HOC and other well-known conventional approaches. The proposed approach opens greater feasibility of applying structured light based depth imaging to 3D modeling of cluttered workspaces for home service robots.