2010 IEEE International Symposium on Mixed and Augmented Reality, 2010
We propose a real-time solution for modeling and tracking multiple 3D objects in unknown environments. Our contribution is twofold: First, we show how to scale with the number of objects. This is done by combining recent techniques for image retrieval and online Structure from Motion, which can be run in parallel. As a result, tracking 40 objects in 3D can be done within 6 to 25 milliseconds per frame, even under difficult conditions for tracking. Second, we propose a method to let the user add new objects very quickly. The user simply selects, in an image, a 2D region lying on the object. A 3D primitive is then fitted to the features within this region and adjusted to create the object's 3D model. In practice, this procedure takes less than a minute.
Computers & Graphics, 2012
We propose a real-time solution for modeling and tracking multiple 3D objects in unknown environments for Augmented Reality. The proposed solution consists of both scalable tracking and interactive modeling. Our contribution is twofold: First, we show how to scale with the number of objects using keyframes. This is done by combining recent techniques for image retrieval and online Structure from Motion, which can be run in parallel. As a result, tracking 50 objects in 3D can be done within 6-35 ms per frame, even under difficult conditions for tracking. Second, we propose a method to let the user add new objects very quickly. The user simply selects, in an image, a 2D region lying on the object. A 3D primitive is then fitted to the features within this region and adjusted to create the object's 3D model. We demonstrate the modeling of polygonal and circular-based objects. In practice, this procedure takes less than a minute.
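The primitive-fitting step described above lends itself to a compact illustration. The following is a minimal sketch, with our own function names and thresholds, of one ingredient of such a pipeline: robustly fitting a plane to the reconstructed 3D feature points that fall inside the user-selected region. The paper's actual fitting and adjustment procedure, which also handles circular-based primitives, is more involved.

```python
import numpy as np

def fit_plane_ransac(points, n_iters=200, inlier_thresh=0.01, seed=None):
    """Robustly fit a plane n.x + d = 0 to an (N, 3) array of 3D points."""
    rng = np.random.default_rng(seed)
    best_plane, best_count = None, -1
    for _ in range(n_iters):
        # Sample three distinct points and form a candidate plane.
        p0, p1, p2 = points[rng.choice(len(points), size=3, replace=False)]
        normal = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(normal)
        if norm < 1e-9:                      # degenerate (collinear) sample
            continue
        normal /= norm
        d = -normal.dot(p0)
        # Count points within the distance threshold of the candidate plane.
        inliers = np.abs(points @ normal + d) < inlier_thresh
        if inliers.sum() > best_count:
            best_count, best_plane = inliers.sum(), (normal, d)
    return best_plane
```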
2012 IEEE International Conference on Multimedia and Expo, 2012
This paper presents a flexible and easy-to-use tracking method for 3D interaction. The method reconstructs points of a user-specified object from a video sequence and recovers the 6-degree-of-freedom (DOF) camera pose relative to the reconstructed points in each video frame. As opposed to most existing 3D object tracking methods, the proposed method does not need any offline modeling or training process. Instead, it first segments the object from the background, then reconstructs and tracks the object using Visual Simultaneous Localization And Mapping (VSLAM) techniques. To our knowledge, no existing works investigate this kind of online reconstruction and tracking of moving objects. The proposed method employs an adapted pyramidal Lucas-Kanade tracker to increase the stability and robustness of the tracking when dealing with a lightly textured or fast-moving object. Experiments show that fast, accurate, stable, and robust tracking can be achieved in everyday environments. Moreover, a simple stereo initialization approach is adopted to minimize user intervention. All these attributes combine to make the method an adequate tool for interaction applications. As a concrete example, an interactive 3D scene displaying system is demonstrated.
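As a concrete reference point for the tracking component, the snippet below shows standard pyramidal Lucas-Kanade tracking with OpenCV, plus a forward-backward consistency check, which is one common way to improve stability on lightly textured or fast-moving objects. This is a generic sketch, not the paper's specific adaptation; the window size, pyramid depth, and error threshold are our own choices.

```python
import cv2
import numpy as np

# Pyramidal LK parameters: a deeper pyramid tolerates faster motion, and the
# termination criterion bounds the per-level iteration cost.
lk_params = dict(winSize=(21, 21),
                 maxLevel=3,
                 criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 30, 0.01))

def track_points(prev_gray, next_gray, prev_pts):
    """Track (N, 1, 2) float32 points between frames; keep only points that
    survive a forward-backward consistency check."""
    next_pts, status, _ = cv2.calcOpticalFlowPyrLK(
        prev_gray, next_gray, prev_pts, None, **lk_params)
    back_pts, back_status, _ = cv2.calcOpticalFlowPyrLK(
        next_gray, prev_gray, next_pts, None, **lk_params)
    fb_err = np.linalg.norm(prev_pts - back_pts, axis=2).ravel()
    good = (status.ravel() == 1) & (back_status.ravel() == 1) & (fb_err < 1.0)
    return next_pts[good], good
```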
IEEE Computer Graphics and Applications, 2002
In research on 3D image communications and virtual reality, developing techniques for synthesizing arbitrary views has become an important technical issue. Given an object's structural model (such as a polygon or volume model), it's relatively easy to synthesize arbitrary views. Generating a structural model of an object, however, isn't necessarily easy. For this reason, research has been progressing on a technique called image-based modeling and rendering (IBMR) that avoids this problem. To date, researchers have performed studies on various IBMR techniques. (See the "Related Work" sidebar for more specific information.) Our work targets 3D scenes in motion. In this article, we propose a method for view-dependent layered representation of 3D dynamic scenes. Using densely arranged cameras, we've developed a system that can perform processing in real time from image pickup to interactive display, using video sequences instead of static images, at 10 frames per second. In our system, images on layers are view dependent, and we update both the shape and image of each layer in real time. This lets us use the dynamic layers as the coarse structure of the dynamic 3D scenes, which improves the quality of the synthesized images. In this sense, our prototype system may be one of the first full real-time IBMR systems. Our experimental results show that this method is useful for interactive 3D rendering of real scenes.
Journal of Real-Time Image Processing, 2007
This work presents a system for the generation of a free-form surface model from video sequences. Although any single-centered camera can be used with the proposed system, the approach is demonstrated using fish-eye lenses because of their good properties for tracking. The system is designed to function automatically and to be flexible with respect to the size and shape of the reconstructed scene. To minimize geometric assumptions, a statistical fusion of dense depth maps is utilized. Special attention is paid to the necessary rectification of the spherical images and the resulting iso-disparity surfaces, which can be exploited in the fusion approach. Before dense depth estimation can be performed, the cameras' pose parameters are extracted by means of a Structure-from-Motion (SfM) scheme. In this respect, automation of the system is achieved by a thorough decision model based on robust statistics and error propagation of projective measurement uncertainties. This leads to a scene-independent set of only a few parameters. All system components are formulated in a general way, making it possible to cope with any single-centered projection model, in particular with spherical cameras. By using wide field-of-view cameras, the presented system is able to reliably retrieve poses and consistently reconstruct large scenes. A textured triangle mesh, constructed on the basis of the scene's reconstructed depth, makes the system's results suitable to serve as reference models in a GPU-driven analysis-by-synthesis framework for real-time tracking.
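To make the fusion idea concrete, here is a minimal sketch, under our own simplifying assumptions, of per-pixel inverse-variance fusion of registered depth maps. The system described above goes further, exploiting the iso-disparity surfaces of the rectified spherical images.

```python
import numpy as np

def fuse_depth_maps(depths, variances):
    """Per-pixel inverse-variance weighted fusion of registered depth maps.

    depths, variances: arrays of shape (K, H, W); NaN marks missing estimates.
    Returns the fused depth map and its per-pixel variance.
    """
    valid = ~np.isnan(depths)
    w = np.where(valid, 1.0 / variances, 0.0)        # inverse-variance weights
    w_sum = w.sum(axis=0)
    safe = np.maximum(w_sum, 1e-12)                  # avoid division by zero
    fused = np.where(w_sum > 0, np.nansum(w * depths, axis=0) / safe, np.nan)
    fused_var = np.where(w_sum > 0, 1.0 / safe, np.nan)
    return fused, fused_var
```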
Image and Vision Computing, 2010
This paper adds to the abundant visual tracking literature with two main contributions. First, we demonstrate the benefit of using graphics processing units (GPUs) to support efficient implementations of computer vision algorithms; second, we introduce the use of point-based 3D models as a shape prior for real-time 3D tracking with a monocular camera.
IEEE Transactions on Visualization and Computer Graphics, 2011
We present a method that is able to track several 3D objects simultaneously, robustly, and accurately in real time. While many applications need to consider more than one object in practice, the existing methods for single object tracking do not scale well with the number of objects, and a proper way to deal with several objects is required. Our method combines object detection and tracking: frame-to-frame tracking is less computationally demanding but is prone to fail, while detection is more robust but slower. We show how to combine them to take advantage of both approaches, and we demonstrate our method on several real sequences.
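The detection/tracking combination admits a simple control-flow sketch. The loop below is a generic illustration, not the paper's exact scheme; the tracker and detector interfaces and the inlier threshold are hypothetical.

```python
MIN_INLIERS = 20   # hypothetical threshold on surviving correspondences

def process_frame(frame, pose, tracker, detector):
    """Return the new pose estimate, or None if the object is lost.

    `tracker` and `detector` are hypothetical interfaces: tracker.track does
    fast frame-to-frame tracking that may drift or fail; detector.detect does
    slower keypoint-based detection that needs no pose prior.
    """
    if pose is not None:
        new_pose, n_inliers = tracker.track(frame, pose)
        if n_inliers >= MIN_INLIERS:        # tracking still healthy: keep it
            return new_pose
    detection = detector.detect(frame)      # robust fallback / (re)initialization
    return detection.pose if detection is not None else None
```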
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014
We consider the problem of tracking multiple interacting objects in 3D, using RGBD input and by considering a hypothesize-and-test approach. Due to their interaction, objects to be tracked are expected to occlude each other in the field of view of the camera observing them. A naive approach would be to employ a Set of Independent Trackers (SIT) and to assign one tracker to each object. This approach scales well with the number of objects but fails as occlusions become stronger due to their disjoint consideration. The solution representing the current state of the art employs a single Joint Tracker (JT) that accounts for all objects simultaneously. This directly resolves ambiguities due to occlusions but has a computational complexity that grows geometrically with the number of tracked objects. We propose a middle ground, namely an Ensemble of Collaborative Trackers (ECT), that combines best traits from both worlds to deliver a practical and accurate solution to the multi-object 3D tracking problem. We present quantitative and qualitative experiments with several synthetic and real world sequences of diverse complexity. Experiments demonstrate that ECT manages to track far more complex scenes than JT at a computational time that is only slightly larger than that of SIT.
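The scaling argument can be made concrete with a back-of-the-envelope count. Under the simplifying assumption, ours rather than the paper's, that each tracker evaluates a fixed budget of H pose hypotheses per object, the evaluated-hypothesis counts diverge quickly:

```python
# Toy illustration (our own simplification) of how the number of evaluated
# hypotheses scales with the number of tracked objects N.
H = 64                       # hypothetical per-object hypothesis budget
for N in (1, 2, 3, 4):
    sit = N * H              # independent trackers: linear in N
    jt = H ** N              # joint tracker over the joint pose space: geometric in N
    print(f"N={N}: SIT evaluates {sit:>10,} hypotheses, JT {jt:>12,}")
```

ECT aims at roughly the first curve's cost while keeping much of the joint tracker's robustness, by letting per-object trackers exchange information instead of searching the joint space.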
Lecture Notes in Computer Science, 1998
We present some recent progress in designing and implementing two interactive image-based 3D modeling systems. The first system constructs 3D models from a collection of panoramic image mosaics. A panoramic mosaic consists of a set of images taken around the same viewpoint, and a camera matrix associated with each input image. The user first interactively specifies features such as points, lines, and planes. Our system recovers the camera pose for each mosaic from known line directions and reference points. It then constructs the 3D model using all available geometrical constraints. The second system extracts structure from stereo by representing the scene as a collection of approximately planar layers. The user first interactively segments the images into corresponding planar regions. Our system recovers a composite mosaic for each layer, estimates the plane equation for the layer, and optionally recovers the camera locations as well as out-of-plane displacements. By taking advantage of known scene regularities, our interactive systems avoid difficult feature correspondence problems that occur in traditional automatic modeling systems. They also shift the interactive high-level structural model specification stage to precede (or intermix with) the 3D geometry recovery. They are thus able to extract accurate wire frame and texture-mapped 3D models from multiple image sequences.
A system for the automatic reconstruction of real world objects from multiple uncalibrated camera views is presented. The camera position and orientation for all views, the 3-D shape of the rigid object as well as associated color information are recovered from the image sequence. The system proceeds in four steps. First, the internal camera parameters describing the imaging geometry of the camera are calibrated using a reference object. Second, an initial 3-D description of the object is computed from two views. This model information is then used in a third step to estimate the camera positions for all available views using a novel linear 3-D motion and shape estimation algorithm. The main feature of this third step is the simultaneous estimation of 3-D camera motion parameters and object shape refinement with respect to the initial 3-D model. The initial 3-D shape model exhibits only a few degrees of freedom and the object shape refinement is defined as flexible deformation of the initial shape model. Our formulation of the shape deformation allows the object texture to slide on the surface, which differs from traditional flexible body modeling. This novel combined shape and motion estimation using sliding texture considerably improves the calibration data of the individual views in comparison to fixed-shape model-based camera motion estimation. Since the shape model used for model-based camera motion estimation is approximate only, a volumetric 3-D reconstruction process is initiated in the fourth step that combines the information from all views simultaneously. The recovered object consists of a set of voxels with associated color information that describe even fine structures and details of the object. New views of the object can be rendered from the recovered 3-D model, which has potential applications in virtual reality or multimedia systems and the emerging field of video coding using 3-D scene models.
Pattern Recognition Letters, 1998
A method of tracking multiple objects of known geometry using multiple cameras is proposed. Our approach differs from the previous approaches in that the object geometry is tightly integrated into the tracking process. The major contribution is threefold: Firstly, multiple cameras are used to improve the accuracy of the estimated posture parameters. Additional formalism required by considering multiple images is nicely integrated into the tracking model, and is handled effectively. Secondly, the feature tracking is facilitated by integrating the measurement and dynamic models into the matching process, thereby improving the accuracy and robustness of the feature correspondence. Thirdly, ambiguities that may arise in the course of the feature matching are resolved by the statistical analysis and the visibility test. The entire process from the image sequence to the posture parameters has been completely automated into a single, seamless process, and has been extensively tested on synthetic and real images.
In this paper we present a framework for the estimation of the pose of an object in 3D space: from the detection and subsequent recognition from a 3D point-cloud, to tracking in the 2D camera plane. The detection process proposes a way to remove redundant features, which leads to significant computational savings without affecting identification performance. The tracking process introduces a method that is less sensitive to outliers and is able to perform in soft real-time. We present preliminary results that illustrate the effectiveness of the approach both in terms of accuracy and computational speed.
International Journal of Advance Research, Ideas and Innovations in Technology
The dimensional analysis of an object from an image removes much of the burden on the user compared with the traditional measuring-tape method. The recovered dimensions also make it easier to reconstruct a 3D model of the real object, although that step is not used in the current implementation. Dimensional analysis can also be helpful in online shopping, where the user is not available for a fitting; a 3D model replaces the fitting stage. Once the dimensions of an object's surface are found, it is easy to calculate surface areas, and from surface areas, volume, although the calculation of volume requires more than one dimension of the object. In this paper, an approach is used that relies on a reference object whose real-world dimensions are already known. The whole process is divided into three tasks: object detection using the SURF algorithm, dimensional analysis of the 2D object using a pixels-per-metric ratio, given a reference object in the same plane, and 3D reconstruction using the Structure-from-Motion algorithm.
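The pixels-per-metric idea in the second task can be sketched in a few lines. The snippet below is a simplified, hypothetical version: it assumes a binarized image in which the leftmost contour is the reference object of known width, and it uses plain OpenCV contour calls rather than the paper's full SURF-based detection.

```python
import cv2

REFERENCE_WIDTH_MM = 24.0        # hypothetical reference object, e.g. a coin

def measure_widths(binary_image):
    """Return the widths, in millimetres, of all objects in a binary image,
    scaled by a reference object assumed to be the leftmost contour."""
    contours, _ = cv2.findContours(binary_image, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    # Sort left to right so the reference object comes first.
    contours = sorted(contours, key=lambda c: cv2.boundingRect(c)[0])
    # Use the longer side of each rotated bounding rectangle as the width.
    widths_px = [max(cv2.minAreaRect(c)[1]) for c in contours]
    pixels_per_mm = widths_px[0] / REFERENCE_WIDTH_MM
    return [w / pixels_per_mm for w in widths_px]
```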
IEEJ Transactions on Electronics, Information and Systems, 2013
In this paper we propose a method for detecting 3D keypoints in a 3D point cloud for robust real-time camera tracking. Assuming that there are a number of images corresponding to the 3D point cloud, we define a 3D keypoint as a point that has corresponding 2D keypoints in many images. These 3D keypoints are expected to appear with high probability as 2D keypoints in newly taken query images. For 3D-2D matching, we embed 2D feature descriptors into the 3D keypoints. Experimental results with 3D point clouds of indoor and outdoor scenes show that the extracted 3D keypoints can be used for matching with 2D keypoints in query images.
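The selection criterion has a very small core, sketched below with our own naming: count, for each 3D point, in how many registered images it appears as a 2D keypoint, and keep the points seen often enough.

```python
from collections import Counter

def select_3d_keypoints(observations, min_views=10):
    """observations: iterable of (point3d_id, image_id) correspondences.
    A 3D point qualifies as a keypoint if it is observed as a 2D keypoint
    in at least `min_views` images (threshold is our own choice)."""
    views_per_point = Counter(pid for pid, _ in observations)
    return [pid for pid, n in views_per_point.items() if n >= min_views]
```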
EURASIP Journal on Image and Video Processing, 2017
Accurate 3D measuring systems have thrived in the past few years. Most are based on laser scanners, because laser scanners can acquire 3D information directly and precisely in real time. However, compared to conventional cameras, such equipment is usually expensive and not commonly available to consumers; moreover, laser scanners easily interfere with other sensors of the same type. Computer-vision-based 3D measuring techniques, on the other hand, use stereo matching to recover the cameras' relative position and then estimate the 3D location of points in the image. Because such systems need this additional estimation of 3D information, real-time implementations often rely on heavy parallelism, which prevents deployment on mobile devices. Inspired by structure-from-motion systems, we propose a system that reconstructs sparse feature points into a 3D point cloud from a monocular video sequence so as to achieve higher computational efficiency. The system keeps tracking all detected feature points and computes both the number of these feature points and their moving distances. We use only the keyframes to estimate the current position of the camera, in order to reduce the computational load and the noise interference on the system. Furthermore, to avoid duplicate 3D points, the system reconstructs a 2D point only when the point shifts out of the camera's view boundary. In our experiments, we show that our system can be implemented on tablets and achieves state-of-the-art accuracy with a denser point cloud at high speed.
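A minimal version of the keyframe test implied above, with hypothetical thresholds of our own choosing, could look like this: a new keyframe is triggered either because too few features are still tracked or because the tracked features have moved far enough to provide fresh parallax.

```python
import numpy as np

MIN_TRACKED = 100        # hypothetical: below this, tracking is getting thin
MIN_MEAN_SHIFT = 30.0    # hypothetical: pixels of motion worth a new keyframe

def is_new_keyframe(pts_now, pts_at_last_kf, n_tracked):
    """pts_now, pts_at_last_kf: (N, 2) arrays of corresponding image points."""
    mean_shift = np.mean(np.linalg.norm(pts_now - pts_at_last_kf, axis=1))
    return n_tracked < MIN_TRACKED or mean_shift > MIN_MEAN_SHIFT
```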
Real-Time Imaging, 1999
A model-based tracking algorithm, such as those of Worrall [1], Gennery [2], Koller [3], Harris [4], Schneiderman [5], and Lowe, consists of the following repeated stages for each frame of the captured image sequence: (1) matching between a known projected model and the image features;
Proceedings 1999 IEEE/RSJ International Conference on Intelligent Robots and Systems. Human and Environment Friendly Robots with High Intelligence and Emotional Quotients (Cat. No.99CH36289), 1999
Joao P. Barreto, Paulo Peixoto, Jorge Batista, and Helder Araujo, 1999 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS'99): Human and Environment Friendly Robots with High Intelligence and Emotional Quotients, pp. 210-215, 1999.
2008 Canadian Conference on Computer and Robot Vision, 2008
Methods for reconstructing photorealistic 3D graphics models from images or video are appealing applications of computer vision. Such methods rely on good input image data, but the lack of user feedback during image acquisition often leads to incomplete or poorly sampled reconstruction results. We describe a video-based system that constructs and visualizes a coarse graphics model in real time and automatically saves a set of images appropriate for later offline dense reconstruction. Visualization of the model during image acquisition allows the operator to interactively verify that an adequate set of input images has been collected for the modeling task, while automatic image selection keeps storage requirements to a minimum. Our implementation uses real-time monocular SLAM to compute and continuously extend a 3D model, and augments this with keyframe selection for storage, surface modeling, and online rendering of the current structure textured from a selection of keyframes. This rendering gives an immediate and intuitive view both of the geometry and of whether suitable texture viewpoints have already been captured.
2007 IEEE Instrumentation & Measurement Technology Conference IMTC 2007, 2007
In this paper, an algorithm is presented for processing visual data to obtain relevant information that is afterwards used to track the different moving objects in complex indoor environments. In autonomous robot applications, visual detection of the obstacles in a dynamic environment from a mobile platform is a complicated task, and the robustness of this process is fundamental to tracking and navigation reliability. The solution described in this document is based on a stereo-vision system, so that 3D information on each object's position in the robot's local environment is extracted directly from the cameras. In the proposed application, all objects in the robot's local environment, both dynamic and static, except the structure of the environment itself, are considered obstacles. This requires distinguishing building elements (ceiling, walls, columns, and so on) from the remaining items in the robot's surroundings, so a classification step is developed alongside the detection task. The obtained data can also be used to implement a partial reconstruction of the environmental structure surrounding the robot. All these algorithms are explained in detail in the following paragraphs, and visual results are included at the end of the paper.
The use of visual sensors may have high impact in applications where it is required to measure the pose (position and orientation) and the visual features of objects moving in unstructured environments. In robotics, the measurements provided by video cameras can be directly used to perform closed-loop control of the robot end-effector pose. In this chapter the problem of real-time estimation of the position and orientation of a moving object using a fixed stereo camera system is considered. An approach based on the Extended Kalman Filter (EKF), combined with a 3D representation of the object's geometry based on Binary Space Partition (BSP) trees, is illustrated. The performance of the proposed visual tracking algorithm is experimentally tested in the case of an object moving in the visible space of a fixed stereo camera system.
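For reference, the EKF machinery at the core of such a tracker is compact. The sketch below is a generic predict/update cycle in NumPy, not the chapter's specific BSP-tree-based formulation; the motion and measurement models f and h (with Jacobians F and H) would encode the object dynamics and the stereo camera projection.

```python
import numpy as np

def ekf_step(x, P, z, f, F, h, H, Q, R):
    """One EKF cycle. f/h are the motion and measurement models, F/H their
    Jacobians evaluated at the current estimate, Q/R the noise covariances."""
    # Predict: propagate the state and covariance through the motion model.
    x_pred = f(x)
    Fx = F(x)
    P_pred = Fx @ P @ Fx.T + Q
    # Update: correct the prediction with the (stereo) measurement z.
    Hx = H(x_pred)
    y = z - h(x_pred)                        # innovation
    S = Hx @ P_pred @ Hx.T + R               # innovation covariance
    K = P_pred @ Hx.T @ np.linalg.inv(S)     # Kalman gain
    x_new = x_pred + K @ y
    P_new = (np.eye(len(x)) - K @ Hx) @ P_pred
    return x_new, P_new
```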