2011, IEEE Transactions on Pattern Analysis and Machine Intelligence
A novel model-based approach to 3D hand tracking from monocular video is presented. The 3D hand pose, the hand texture and the illuminant are dynamically estimated through minimization of an objective function. Derived from an inverse problem formulation, the objective function enables explicit use of temporal texture continuity and shading information, while handling important self-occlusions and time-varying illumination. The minimization is done efficiently using a quasi-Newton method, for which we provide a rigorous derivation of the objective function gradient. Particular attention is given to terms related to the change of visibility near self-occlusion boundaries that are neglected in existing formulations. To this end we introduce new occlusion forces and show that using all gradient terms greatly improves the performance of the method. Qualitative and quantitative experimental results demonstrate the potential of the approach.
2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008
A novel model-based approach to 3D hand tracking from monocular video is presented. The 3D hand pose, the hand texture and the illuminant are dynamically estimated through minimization of an objective function. Derived from an inverse problem formulation, the objective function enables explicit use of texture temporal continuity and shading information, while handling important self-occlusions and time-varying illumination. The minimization is done efficiently using a quasi-Newton method, for which we propose a rigorous derivation of the objective function gradient. Particular attention is given to terms related to the change of visibility near self-occlusion boundaries that are neglected in existing formulations. In doing so we introduce new occlusion forces and show that using all gradient terms greatly improves the performance of the method. Experimental results demonstrate the potential of the formulation.
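The two abstracts above hinge on minimizing an image-residual objective with an analytically derived gradient. As a rough illustration only — the one-parameter `render` function and all values below are hypothetical, not the authors' hand model or objective — here is a toy sketch of fitting a parameter to observed pixel values by gradient descent on a sum-of-squared-residuals energy:

```python
# Toy sketch (not the paper's formulation): minimize
# E(theta) = sum_i (I_obs[i] - render(theta)[i])^2 using its analytic gradient,
# a stand-in for the quasi-Newton step described in the abstract.

def render(theta, xs):
    # Hypothetical 1-parameter "renderer": a brightness ramp scaled by theta.
    return [theta * x for x in xs]

def objective_and_grad(theta, xs, observed):
    pred = render(theta, xs)
    residuals = [o - p for o, p in zip(observed, pred)]
    energy = sum(r * r for r in residuals)
    # dE/dtheta = sum_i -2 * r_i * x_i  (chain rule through the renderer)
    grad = sum(-2.0 * r * x for r, x in zip(residuals, xs))
    return energy, grad

def minimize(theta0, xs, observed, lr=0.01, steps=200):
    theta = theta0
    for _ in range(steps):
        _, g = objective_and_grad(theta, xs, observed)
        theta -= lr * g
    return theta

xs = [0.1, 0.5, 1.0, 2.0]
observed = render(0.8, xs)          # synthetic "image" generated at theta = 0.8
theta_hat = minimize(0.0, xs, observed)
```

A real implementation would replace the toy renderer with the textured, self-occluding hand model, where the occlusion-boundary gradient terms the paper derives become essential.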
Advances in Intelligent Systems and Computing, 2015
Recently, model-based approaches have produced very promising results for the problem of 3D hand tracking. The current state-of-the-art method recovers the 3D position, orientation and 20-DOF articulation of a human hand from markerless visual observations obtained by an RGB-D sensor. Hand pose estimation is formulated as an optimization problem, seeking the hand model parameters that minimize an objective function quantifying the discrepancy between the appearance of hand hypotheses and the actual hand observation. The design of such a function is a complicated process that requires a lot of prior experience with the problem. In this paper we automate the definition of the objective function in such optimization problems. First, a set of relevant candidate image features is computed. Then, given synthetic data sets with ground truth information, regression analysis is used to combine these features into an objective function that seeks to maximize optimization performance. Extensive experiments study the performance of the proposed approach based on various dataset generation strategies and feature selection techniques.
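The core idea above — regressing a weighted combination of image features against ground-truth pose error — can be sketched with ordinary least squares. This is a minimal illustration under invented data; the feature names are hypothetical and the paper's actual feature set and regression machinery are richer:

```python
# Hedged sketch: given ground-truth pose errors for synthetic hand hypotheses,
# learn weights that combine two image features into an objective function.
# Solved by ordinary least squares via 2x2 normal equations.

def fit_weights(features, errors):
    a11 = sum(f[0] * f[0] for f in features)
    a12 = sum(f[0] * f[1] for f in features)
    a22 = sum(f[1] * f[1] for f in features)
    b1 = sum(f[0] * e for f, e in zip(features, errors))
    b2 = sum(f[1] * e for f, e in zip(features, errors))
    det = a11 * a22 - a12 * a12
    w1 = (a22 * b1 - a12 * b2) / det
    w2 = (a11 * b2 - a12 * b1) / det
    return w1, w2

# Synthetic training set: (edge_distance, depth_mismatch) -> pose error,
# generated from hidden "true" weights 0.5 and 1.5.
feats = [(1.0, 0.0), (0.0, 1.0), (2.0, 1.0), (1.0, 2.0)]
errs = [0.5 * f[0] + 1.5 * f[1] for f in feats]
w = fit_weights(feats, errs)
```

Once fitted, the learned objective `w[0]*edge_distance + w[1]*depth_mismatch` would stand in for a hand-designed discrepancy function inside the optimizer.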
Workshop on Motion and Video Computing, 2002. Proceedings., 2002
We present a model-based approach to the integration of multiple cues for tracking high-degree-of-freedom articulated motions. We then apply it to the problem of hand tracking using a single camera sequence. Hand tracking is particularly challenging because of occlusions, shading variations, and the high dimensionality of the motion. The novelty of our approach is in the combination of multiple sources of information which come from edges, optical flow and shading. In particular, we introduce into deformable model theory a generalized version of the gradient-based optical flow constraint that includes shading flow, i.e., the variation of the shading of the object as it rotates with respect to the light source. This constraint unifies the shading and the optical flow constraints (it simplifies to each one of them when the other is not present). Our use of cue information from the entirety of the hand enables us to track its complex articulated motion in the presence of shading changes. Given the model-based formulation, we use shading when the optical flow constraint is violated due to significant shading changes in a region. We use a forward recursive dynamic model to track the motion in response to 3D data-derived forces applied to the model. The hand is modeled as a base link (palm) with five linked chains (fingers), while the allowable motion of the fingers is controlled by recursive dynamics constraints. Model driving forces are generated from edges, optical flow and shading. The effectiveness of our approach is demonstrated with experiments on a number of different hand motions with shading changes, rotations and occlusions of significant parts of the hand.
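The gradient-based optical flow constraint that this abstract generalizes is, in its basic form, Ix·u + It = 0 solved in least squares over a patch. A toy 1D numeric illustration (synthetic ramp signal, not the paper's generalized shading-flow version) is:

```python
# Toy illustration of the basic gradient-based optical flow constraint
# Ix*u + It = 0, solved in least squares over a 1D patch. The shading-flow
# generalization in the paper adds an illumination term; this sketch omits it.

def flow_1d(frame0, frame1):
    # Least-squares u minimizing sum_i (Ix_i * u + It_i)^2 over interior pixels.
    num, den = 0.0, 0.0
    for i in range(1, len(frame0) - 1):
        ix = (frame0[i + 1] - frame0[i - 1]) / 2.0  # central spatial gradient
        it = frame1[i] - frame0[i]                  # temporal gradient
        num += -ix * it
        den += ix * ix
    return num / den

f0 = [0.0, 1.0, 2.0, 3.0, 4.0]       # linear intensity ramp
f1 = [x - 0.5 for x in f0]           # the ramp shifted right by 0.5 pixel
u = flow_1d(f0, f1)
```

When shading changes violate brightness constancy, the residual of this equation is exactly what the paper redirects into shading-based forces.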
2003
We present a model-based approach to the integration of multiple cues for tracking high-degree-of-freedom articulated motions and for model refinement. We then apply it to the problem of hand tracking using a single camera sequence. Hand tracking is particularly challenging because of occlusions, shading variations, and the high dimensionality of the motion. The novelty of our approach is in the combination of multiple sources of information, which come from edges, optical flow and shading, in order to refine the model during tracking. We first use a previously formulated generalized version of the gradient-based optical flow constraint that includes shading flow, i.e., the variation of the shading of the object as it rotates with respect to the light source. Using this model we track the hand's complex articulated motion in the presence of shading changes. We use a forward recursive dynamic model to track the motion in response to data-derived 3D forces applied to the model. However, due to an inaccurate initial shape, the generalized optical flow constraint is violated. In this paper we use the error in the generalized optical flow equation to compute generalized forces that correct the model shape at each step. The effectiveness of our approach is demonstrated with experiments on a number of different hand motions with shading changes, rotations and occlusions of significant parts of the hand.
IEEE ICCV 2011, 2011
Due to occlusions, the estimation of the full pose of a human hand interacting with an object is much more challenging than pose recovery of a hand observed in isolation. In this work we formulate an optimization problem whose solution is the 26-DOF hand pose together with the pose and model parameters of the manipulated object. Optimization seeks the joint hand-object model that (a) best explains the incompleteness of observations resulting from occlusions due to hand-object interaction and (b) is physically plausible in the sense that the hand does not share the same physical space with the object. The proposed method is the first that efficiently solves the continuous, full-DOF, joint hand-object tracking problem based solely on markerless multicamera input. Additionally, it is the first to demonstrate how hand-object interaction can be exploited as a context that facilitates hand pose estimation, instead of being considered as a complicating factor. Extensive quantitative and qualitative experiments with simulated and real world image sequences, as well as a comparative evaluation with a state-of-the-art method for pose estimation of isolated hands, support the above findings.
Lecture Notes in Computer Science, 2013
Discriminative techniques are good for hand part detection, but they fail under sensor noise and high inter-finger occlusion. Additionally, these techniques do not incorporate any kinematic or temporal constraints. Even though model-based descriptive (for example, Markov Random Field) or generative (for example, Hidden Markov Model) techniques utilize kinematic and temporal constraints well, they are computationally expensive and rarely recover from tracking failure. This paper presents a unified framework for 3D hand tracking, utilizing the best of both methodologies. Hand joints are detected using a regression forest, which uses an efficient voting technique for joint location prediction. The voting distributions are multimodal in nature; hence, rather than using the highest-scoring mode of the voting distribution for each joint separately, we fit the five highest-scoring modes of each joint on a tree-structured Markovian model along with a kinematic prior and temporal information. Experimentally, we observed that relying on the discriminative technique (i.e., joint detection) produces better results. We therefore efficiently incorporate this observation in our framework by conditioning the 50% lowest-scoring joint modes on the remaining high-scoring joint modes. This strategy reduces the computational cost and produces good results for 3D hand tracking on RGB-D data.
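Combining per-joint candidate modes under a tree-structured Markov model, as described above, is exactly the setting where dynamic programming gives an exact solution. A minimal chain-structured sketch (toy costs, not the paper's forest votes; the 1D "positions" and smoothness weight are invented) is:

```python
# Minimal sketch: combine per-joint candidate modes on a chain-structured
# model with a pairwise kinematic cost, solved exactly by Viterbi dynamic
# programming. All numbers below are illustrative, not forest outputs.

def viterbi(unary, positions, smooth=1.0):
    # unary[j][m]: detection cost of mode m for joint j.
    # positions[j][m]: 1D location of that mode; the pairwise term penalizes
    # large jumps between neighbouring joints (a crude kinematic prior).
    n = len(unary)
    cost = [list(unary[0])]
    back = []
    for j in range(1, n):
        row, bp = [], []
        for m in range(len(unary[j])):
            best, arg = min(
                (cost[-1][k] + smooth * abs(positions[j][m] - positions[j - 1][k]), k)
                for k in range(len(unary[j - 1]))
            )
            row.append(best + unary[j][m])
            bp.append(arg)
        cost.append(row)
        back.append(bp)
    m = min(range(len(cost[-1])), key=lambda i: cost[-1][i])
    path = [m]
    for bp in reversed(back):
        path.append(bp[path[-1]])
    return list(reversed(path))

# Joint 1's detector prefers its far-away mode, but the kinematic prior
# overrides it in favour of the spatially coherent chain.
modes = viterbi(
    unary=[[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]],
    positions=[[0.0, 5.0], [0.1, 5.0], [0.2, 5.0]],
)
```

The example shows the behaviour the abstract relies on: a noisy per-joint detection is rejected when it is kinematically implausible relative to its neighbours.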
Actes du Colloque Scientifique …, 1999
We address the issue of 3D hand gesture analysis by monoscopic vision without body markers. A 3D articulated model is registered with image sequences. We compare several registration evaluation functions (edge distance, non-overlapping surface) and optimisation methods (Levenberg-Marquardt, downhill simplex and Powell). Biomechanical constraints are integrated into the minimisation algorithm to constrain registration to realistic postures. Results on image sequences are presented. Potential applications include hand gesture acquisition and human-machine interfaces.
Procedings of the British Machine Vision Conference 2006, 2006
In this paper, we propose a novel model-based approach to recover 3D hand pose from 2D images through a compact articulated 3D hand model whose parameters are inferred in a Bayesian manner. To this end, we propose generative models for hand and background pixels leading to a log-likelihood objective function which aims at enclosing hand-like pixels within the silhouette of the projected 3D model while excluding background-like pixels. Segmentation and hand pose estimation are unified through the minimization of a single likelihood function, which is novel and improves overall robustness. We derive the gradient in the hand parameter space of such an area-based objective function, which is new and allows a faster convergence rate than gradient-free methods. Furthermore, we propose a new constrained variable-metric gradient descent to speed up convergence, and finally the so-called smart particle filter is used to improve robustness through multiple hypotheses and to exploit temporal coherence. Very promising experimental results demonstrate the potential of our approach.
ERCIM News, 2013
In this paper we first describe how we have constructed a 3D deformable Point Distribution Model of the human hand, capturing training data semi-automatically from volume images via a physically-based model. We then show how we have attempted to use this model in tracking an unmarked hand moving with 6 degrees of freedom (plus deformation) in real time using a single video camera. In the course of this we show how to improve on a weighted least-squares pose parameter approximation at little computational cost. We note the successes and shortcomings of our system and discuss how it might be improved.
BMVC 2011, 2011
We present a novel solution to the problem of recovering and tracking the 3D position, orientation and full articulation of a human hand from markerless visual observations obtained by a Kinect sensor. We treat this as an optimization problem, seeking the hand model parameters that minimize the discrepancy between the appearance and 3D structure of hypothesized instances of a hand model and actual hand observations. This optimization problem is effectively solved using a variant of Particle Swarm Optimization (PSO). The proposed method does not require special markers and/or a complex image acquisition setup. Being model-based, it provides continuous solutions to the problem of tracking hand articulations. Extensive experiments with a prototype GPU-based implementation of the proposed method demonstrate that accurate and robust 3D tracking of hand articulations can be achieved in near real-time (15 Hz).
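A bare-bones PSO loop captures the optimization machinery the abstract names. The sketch below minimizes a toy discrepancy (squared distance to a hidden "ground-truth pose"), standing in for the render-and-compare scoring of actual hand hypotheses; the coefficients are the standard constriction-style values, not the paper's tuned settings:

```python
import random

# Bare-bones Particle Swarm Optimization sketch. Each particle is a candidate
# parameter vector; the swarm minimizes a discrepancy score. The objective
# here is a toy stand-in, not a rendered hand-model comparison.

def pso(objective, dim, n_particles=20, iters=100, lo=-5.0, hi=5.0, seed=0):
    rng = random.Random(seed)
    xs = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    pbest = [list(x) for x in xs]
    pbest_f = [objective(x) for x in xs]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = list(pbest[g]), pbest_f[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vs[i][d] = (0.72 * vs[i][d]                      # inertia
                            + 1.5 * r1 * (pbest[i][d] - xs[i][d])  # cognitive
                            + 1.5 * r2 * (gbest[d] - xs[i][d]))    # social
                xs[i][d] += vs[i][d]
            f = objective(xs[i])
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = list(xs[i]), f
                if f < gbest_f:
                    gbest, gbest_f = list(xs[i]), f
    return gbest, gbest_f

# Toy "discrepancy": squared distance to a hidden ground-truth pose vector.
truth = [1.0, -2.0, 0.5]
best, best_f = pso(lambda x: sum((a - b) ** 2 for a, b in zip(x, truth)), 3)
```

In the actual tracker the objective evaluation is the expensive part (rendering a hand hypothesis and comparing to the RGB-D observation), which is why the paper parallelizes it on a GPU.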
Model-based methods for tracking an articulated hand in a video sequence can be divided into two categories. The first, stochastic methods, uses stochastic filters such as Kalman or particle filters. The second, deterministic methods, defines a dissimilarity function to measure how well the hand model is aligned with the hand images of a video sequence. This dissimilarity function is then minimized to achieve hand tracking. Two well-known problems are related to the minimization algorithms. The first is that of local minima. The second is the computing time required to reach the solution. These problems are compounded by the large number of degrees of freedom (DOF) of the hand (around 26). The choice of the function to be minimized and that of the minimization process can be an answer to these problems. In this paper two major contributions are presented. The first defines a new dissimilarity function, which gives better results for hand tracking than other well-known functions like the directed Chamfer or Hausdorff distances. The second contribution proposes a minimization process that operates in two steps. The first step provides the global parameters of the hand, i.e., the position and orientation of the palm, whereas the second gives the local parameters, i.e., the finger joint angles. Operating in two stages, the proposed two-step algorithm reduces the complexity of the minimization problem. Indeed, it seems more robust to local minima than a one-step algorithm and improves the computing time needed to get the desired solution.
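The two-step idea — fit the global palm parameters first, then the finger joints — can be illustrated on a toy separable objective. Everything here (the quadratic "dissimilarity", the grid search, the parameter ranges) is invented for illustration; the paper's dissimilarity function operates on images:

```python
# Sketch of two-step minimization: step 1 fits the global parameter (palm
# placement) with fingers held fixed; step 2 fits the local parameter
# (a joint angle) with the palm frozen. Toy quadratic objective.

def grid_min(f, lo, hi, steps=1000):
    # Crude 1D minimizer by exhaustive grid search.
    best_x, best_f = lo, f(lo)
    for i in range(1, steps + 1):
        x = lo + (hi - lo) * i / steps
        fx = f(x)
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x

def dissimilarity(global_p, local_p):
    # Toy stand-in for the image-model dissimilarity; minimum at (2.0, 1.0).
    return (global_p - 2.0) ** 2 + (local_p - 1.0) ** 2

g0 = grid_min(lambda g: dissimilarity(g, 0.0), -5.0, 5.0)   # step 1: palm
l0 = grid_min(lambda l: dissimilarity(g0, l), -3.14, 3.14)  # step 2: fingers
```

Splitting the search this way replaces one high-dimensional minimization with two much smaller ones, which is the complexity reduction the abstract claims.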
Procedings of the British Machine Vision Conference 2017, 2017
We present a method for 3D hand tracking that exploits spatial constraints in the form of end-effector (fingertip) locations. The method follows a generative, hypothesize-and-test approach and uses a hierarchical particle filter to track the hand. In contrast to state-of-the-art methods that consider spatial constraints in a soft manner, the proposed approach enforces constraints during the hand pose hypothesis generation phase by sampling in the Reachable Distance Space (RDS). This sampling produces hypotheses that respect both the hand's dynamics and the end-effector locations. The data likelihood term is calculated by measuring the discrepancy between the rendered 3D model and the available observations. Experimental results on challenging, ground-truth-annotated sequences containing severe hand occlusions demonstrate that the proposed approach outperforms the state of the art in hand tracking accuracy.
The analysis and the understanding of object manipulation scenarios based on computer vision techniques can be greatly facilitated if we can gain access to the full articulation of the manipulating hands and the 3D pose of the manipulated objects. Currently, there exist methods for tracking hands in interaction with objects whose 3D models are known. There are also methods that can reconstruct 3D models of objects that are partially observable in each frame of a sequence. However, to the best of our knowledge, no method can track hands in interaction with unknown objects. In this paper we propose such a method. Experimental results show that hand tracking can be achieved with an accuracy that is comparable to the one obtained by methods that assume knowledge of the object models. Additionally, as a by-product, the proposed method delivers accurate 3D models of the manipulated objects.
Lecture Notes in Computer Science, 2015
Research in vision-based 3D hand tracking targets primarily the scenario in which a bare hand performs unconstrained motion in front of a camera system. Nevertheless, in several important application domains, augmenting the hand with color information so as to facilitate the tracking process constitutes an acceptable alternative. With this observation in mind, in this work we propose a modification of a state of the art method [12] for markerless 3D hand tracking, that takes advantage of the richer observations resulting from a colored glove. We do so by modifying the 3D hand model employed in the aforementioned hypothesize-and-test method as well as the objective function that is minimized in its optimization step. Quantitative and qualitative results obtained from a comparative evaluation of the baseline method to the proposed approach confirm that the latter achieves a remarkable increase in tracking accuracy and robustness and, at the same time, reduces drastically the associated computational costs.
ACM Transactions on Graphics, 2016
Fully articulated hand tracking promises to enable fundamentally new interactions with virtual and augmented worlds, but the limited accuracy and efficiency of current systems has prevented widespread adoption. Today's dominant paradigm uses machine learning for initialization and recovery followed by iterative model-fitting optimization to achieve a detailed pose fit. We follow this paradigm, but make several changes to the model-fitting, namely using: (1) a more discriminative objective function; (2) a smooth-surface model that provides gradients for non-linear optimization; and (3) joint optimization over both the model pose and the correspondences between observed data points and the model surface. While each of these changes may actually increase the cost per fitting iteration, we find a compensating decrease in the number of iterations. Further, the wide basin of convergence means that fewer starting points are needed for successful model fitting. Our system runs in real-time.
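Point (3) above generalizes the classic ICP alternation: re-assign each data point to its closest model point, then re-fit the pose in closed form. A toy 1D, translation-only sketch (synthetic points, not the paper's smooth-surface model, which optimizes pose and correspondences jointly rather than alternating) is:

```python
# Toy ICP-style alternation in 1D: step 1 re-assigns correspondences
# (closest transformed model point per datum); step 2 re-fits the
# translation-only "pose" in closed form (mean residual).

def icp_1d(model, data, iters=10):
    t = 0.0
    for _ in range(iters):
        # Step 1: correspondence = closest transformed model point per datum.
        matched = [min(model, key=lambda m: abs((m + t) - d)) for d in data]
        # Step 2: closed-form translation update.
        t += sum(d - (m + t) for m, d in zip(matched, data)) / len(data)
    return t

model = [0.0, 1.0, 2.0]
data = [m + 0.4 for m in model]   # the model shifted by 0.4
t_hat = icp_1d(model, data)
```

The paper's contribution is precisely to avoid this hard alternation: treating correspondences as continuous optimization variables over a smooth surface widens the basin of convergence compared to the discrete re-assignment shown here.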
2010
We present a real-time hand tracking technique by using a cloth glove with color markers placed at moveable hand bone positions and two different, fixed-position webcams. After calibrating the webcams in a preprocessing step, linear triangulation reconstructs, for each match of the color blobs tracked in both images, the corresponding point in three-dimensional space. Then the necessary transformations are computed to manipulate a three-dimensional virtual hand mesh according to the reconstructed hand ...
Lecture Notes in Computer Science, 2013
Hand pose estimation is an important task in areas such as human-computer interaction (HCI), sign language recognition and robotics. Due to the high variability in hand appearance and the many degrees of freedom (DoFs) of the hand, hand pose estimation and tracking are very challenging, and different sources of data and methods are used to solve this problem. In this paper, we propose a method for model-based full-DoF hand pose estimation from a single RGB-D image. The main advantage of the proposed method is that no prior manual initialization is required and only very general assumptions about the hand pose are made. Therefore, this method can be used for hand pose estimation from a single RGB-D image, as an initialization step for subsequent tracking, or for tracking recovery.
Proceedings of the 16th International Conference on PErvasive Technologies Related to Assistive Environments
We present a method for simultaneous 3D hand shape and pose estimation on a single RGB image frame. Specifically, our method fits the MANO 3D hand model to 2D hand keypoints. Fitting is achieved based on a novel 2D objective function that exploits anatomical joint limits, combined with shape regularization on the MANO hand model, jointly optimizing the 3D shape and pose of the hand in a single frame. In a series of quantitative experiments on well-established datasets annotated with ground truth, we show that it is possible to obtain reconstructions that are competitive and, in some cases, superior to existing 3D hand pose estimation approaches.
Pattern Analysis and …, 2006