Model-based methods for tracking an articulated hand in a video sequence generally use a cost function to compare the hand pose with a parametric three-dimensional (3D) hand model. This comparison allows the hand model parameters to be adapted, making it possible to reproduce hand gestures. Many proposed cost functions exploit either silhouette or edge features. Unfortunately, these functions cannot handle the tracking of complex hand motion. This paper presents a new depth-based function to track complex hand motion such as hand opening and closing. Our proposed function compares 3D point clouds derived from depth maps. Each hand point cloud is compared with several point clouds corresponding to different model poses in order to obtain the model pose closest to the hand pose. To reduce the computational burden, we propose to compute a volume of voxels from a hand point cloud, where each voxel is characterized by its distance to that cloud. When a model point cloud is placed inside this volume of voxels, its distance to the hand point cloud can be computed quickly. Compared with other well-known functions such as the directed Hausdorff distance (Huttenlocher et al., 1993), our proposed function is better adapted to the hand tracking problem and faster than the Hausdorff function.
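The precomputed voxel volume described above can be sketched as follows. This is a minimal, brute-force illustration, not the paper's implementation: the grid layout, function names and the use of the mean voxel distance as the pose score are all assumptions.

```python
import numpy as np

def build_distance_volume(hand_pts, grid_min, voxel, shape):
    """Precompute, for each voxel centre, the distance to the nearest
    point of the hand cloud (brute force; grid layout is hypothetical)."""
    idx = np.indices(shape).reshape(3, -1).T           # all voxel indices
    centres = grid_min + (idx + 0.5) * voxel           # voxel centres in 3D
    d = np.linalg.norm(centres[:, None, :] - hand_pts[None, :, :], axis=2)
    return d.min(axis=1).reshape(shape)

def cloud_to_hand_distance(volume, model_pts, grid_min, voxel):
    """Score a model pose by averaging the precomputed voxel distances
    at the voxels occupied by the model point cloud (fast lookups only)."""
    idx = ((model_pts - grid_min) / voxel).astype(int)
    idx = np.clip(idx, 0, np.array(volume.shape) - 1)
    return volume[idx[:, 0], idx[:, 1], idx[:, 2]].mean()
```

Once the volume is built, scoring each candidate model pose reduces to indexing, which is why the lookup is cheap compared with recomputing nearest-neighbour distances per pose.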
2016 23rd International Conference on Pattern Recognition (ICPR), 2016
In this paper, we propose two new approaches using the Convolutional Neural Network (CNN) and the Recurrent Neural Network (RNN) for tracking 3D hand poses. The first approach is a detection-based algorithm, while the second is a data-driven method. Our first contribution is a new tracking-by-detection strategy extending the CNN-based single-frame detection method to a multiple-frame tracking approach by taking prediction history into account using an RNN. Our second contribution is the use of an RNN to simulate the fitting of a 3D model to the input data. It helps to relax the need for a carefully designed fitting function and optimization algorithm. With such strategies, we show that our tracking frameworks can automatically correct failed detections made in previous frames due to occlusions. Our proposed method is evaluated on two public hand datasets, i.e. NYU and ICVL, and compared against other recent hand tracking methods. Experimental results show that our approaches achieve state-of-the-art accuracy and efficiency in the challenging problem of 3D hand pose estimation.
Advances in Visual Computing, 2015
Real-time hand articulation tracking is important for many applications such as interacting with virtual/augmented reality devices. However, most existing algorithms rely heavily on expensive and power-hungry GPUs to achieve real-time processing. Consequently, these systems are inappropriate for mobile and wearable devices. In this paper, we propose an efficient hand tracking system which does not require high-performance GPUs. In our system, we track hand articulations by minimizing the discrepancy between the depth map from the sensor and a computer-generated hand model. We also re-initialize the hand pose at each frame using finger detection and classification. Our contributions are: (a) an adaptive hand model that accommodates different users' hand shapes without generating a personalized hand model; (b) an improved, highly efficient re-initialization for robust tracking and automatic initialization; (c) hierarchical random sampling of pixels from each depth map to improve tracking accuracy while limiting the required computation. To the best of our knowledge, this is the first system that achieves both automatic hand model adjustment and real-time tracking without using GPUs.
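The hierarchical sampling idea in contribution (c) can be sketched as stratified sampling over depth-map blocks, so samples cover the whole hand instead of clustering in one region. This is a minimal sketch under assumed parameters (block size, samples per block), not the authors' exact scheme.

```python
import numpy as np

def hierarchical_sample(depth, block=16, per_block=4, rng=None):
    """Stratified pixel sampling sketch: split the depth map into blocks
    and draw a few valid (non-zero) pixels from each block, limiting the
    total pixel count fed to the model-fitting cost."""
    gen = np.random.default_rng(rng)
    h, w = depth.shape
    samples = []
    for y in range(0, h, block):
        for x in range(0, w, block):
            tile = depth[y:y + block, x:x + block]
            ys, xs = np.nonzero(tile > 0)          # valid depth pixels only
            if len(ys) == 0:
                continue
            pick = gen.choice(len(ys), size=min(per_block, len(ys)),
                              replace=False)
            samples.extend((y + ys[i], x + xs[i]) for i in pick)
    return samples
```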
ERCIM News, 2013
Lecture Notes in Computer Science, 2013
Discriminative techniques are good for hand part detection, but they fail under sensor noise and high inter-finger occlusion. Additionally, these techniques do not incorporate any kinematic or temporal constraints. Even though model-based descriptive (for example, Markov Random Field) or generative (for example, Hidden Markov Model) techniques exploit kinematic and temporal constraints well, they are computationally expensive and hardly recover from tracking failure. This paper presents a unified framework for 3D hand tracking that combines the best of both methodologies. Hand joints are detected using a regression forest, which uses an efficient voting technique for joint location prediction. The voting distributions are multimodal in nature; hence, rather than using only the highest-scoring mode of the voting distribution for each joint separately, we fit the five highest-scoring modes of each joint on a tree-structured Markovian model along with a kinematic prior and temporal information. Experimentally, we observed that relying on the discriminative technique (i.e., joint detection) produces better results. We therefore efficiently incorporate this observation into our framework by conditioning the 50% lowest-scoring joint modes on the remaining high-scoring joint modes. This strategy reduces the computational cost and produces good results for 3D hand tracking on RGB-D data.
2012
This paper describes a method that, given an input image of a person signing a gesture in a cluttered scene, locates the gesturing arm, automatically detects and segments the hand, and finally creates a ranked list of possible shape classes, 3D pose orientations and full hand configuration parameters. The clutter-tolerant hand segmentation algorithm is based on depth data from a single image captured with a commercially available depth sensor, namely the Kinect™.
Actes du Colloque Scientifique …, 1999
We address the issue of 3D hand gesture analysis by monoscopic vision without body markers. A 3D articulated model is registered with image sequences. We compare several registration evaluation functions (edge distance, non-overlapping surface) and optimisation methods (Levenberg-Marquardt, downhill simplex and Powell). Biomechanical constraints are integrated into the minimisation algorithm to constrain registration to realistic postures. Results on image sequences are presented. Potential applications include hand gesture acquisition and human-machine interfaces.
Hand gestures are an important type of natural language used in many research areas such as human-computer interaction and computer vision. Hand gesture recognition requires the prior determination of the hand position through detection and tracking. One of the most efficient strategies for hand tracking is to use 2D visual information such as color and shape. However, visual-sensor-based hand tracking methods are very sensitive when tracking is performed under variable lighting conditions. Also, as hand movements are made in 3D space, the recognition performance of hand gestures using 2D information is inherently limited. In this article, we propose a novel real-time 3D hand tracking method in depth space using a 3D depth sensor and a Kalman filter. We detect hand candidates using motion clusters and a predefined wave motion, and track hand locations using the Kalman filter. To verify the effectiveness of the proposed method, we compare its performance with that of a visual-based method. Experimental results show that the proposed method outperforms the visual-based method.
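A Kalman filter for a 3D hand location is typically built on a constant-velocity motion model. The sketch below shows one predict/update cycle in that spirit; the state layout and all noise covariances are assumptions, not values from the article.

```python
import numpy as np

# Constant-velocity model: state is [x, y, z, vx, vy, vz].
dt = 1.0
F = np.eye(6); F[:3, 3:] = dt * np.eye(3)       # motion model
H = np.hstack([np.eye(3), np.zeros((3, 3))])    # we observe position only
Q = 1e-3 * np.eye(6)                            # process noise (assumed)
R = 1e-2 * np.eye(3)                            # measurement noise (assumed)

def kalman_step(x, P, z):
    """One predict/update cycle given a 3D hand detection z."""
    x = F @ x; P = F @ P @ F.T + Q              # predict
    S = H @ P @ H.T + R                         # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)              # Kalman gain
    x = x + K @ (z - H @ x)                     # correct with the detection
    P = (np.eye(6) - K @ H) @ P
    return x, P
```

Feeding noisy detections through `kalman_step` yields a smoothed trajectory plus a velocity estimate that supports prediction when the detector momentarily fails.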
Proceedings, International Conference on Image Analysis and Recognition, 2012
Tracking hands and estimating their trajectories is useful in a number of tasks, including sign language recognition and human-computer interaction. Hands are extremely difficult objects to track: their deformability, frequent self-occlusions and motion blur cause appearance variations too great for most standard object trackers to handle robustly. In this paper, the 3D motion field of a scene (known as the scene flow, in contrast to optical flow, which is its projection onto the image plane) is estimated using a recently proposed algorithm inspired by particle filtering. Unlike previous techniques, this scene flow algorithm does not introduce blurring across discontinuities, making it far more suitable for object segmentation and tracking. Additionally, the algorithm operates several orders of magnitude faster than previous scene flow estimation systems, enabling the use of scene flow in real-time and near-real-time applications. A novel approach to trajectory estimation is then introduced, based on clustering the estimated scene flow field in both the space and velocity dimensions. This allows estimation of object motions in the true 3D scene, rather than the traditional approach of estimating 2D image-plane motions. By working in scene space rather than the image plane, the constant-velocity assumption commonly used in the prediction stage of trackers is far more valid, and the resulting motion estimate is richer, providing information on out-of-plane motions. To evaluate the performance of the system, 3D trajectories are estimated on a multi-view sign-language dataset and compared to a traditional high-accuracy 2D system, with excellent results.
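Clustering a flow field jointly in space and velocity can be sketched with a plain k-means over 6D vectors. This is an illustrative stand-in, not the paper's algorithm: the farthest-point initialisation and the space/velocity weighting are assumptions.

```python
import numpy as np

def cluster_flow(points, velocities, k=2, iters=20, vel_weight=1.0):
    """Naive k-means over joint [position, velocity] vectors, sketching
    how a scene-flow field separates into per-object motion clusters."""
    feats = np.hstack([points, vel_weight * velocities])   # 6D features
    # deterministic farthest-point initialisation of the k centres
    centres = [feats[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(feats - c, axis=1) for c in centres],
                   axis=0)
        centres.append(feats[np.argmax(d)])
    centres = np.array(centres)
    for _ in range(iters):
        labels = np.argmin(
            np.linalg.norm(feats[:, None] - centres[None], axis=2), axis=1)
        for j in range(k):
            if np.any(labels == j):                # skip empty clusters
                centres[j] = feats[labels == j].mean(axis=0)
    return labels
```

Because velocity is part of the feature vector, two spatially close objects moving in opposite directions still fall into different clusters.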
BMVC 2011, 2011
We present a novel solution to the problem of recovering and tracking the 3D position, orientation and full articulation of a human hand from markerless visual observations obtained by a Kinect sensor. We treat this as an optimization problem, seeking the hand model parameters that minimize the discrepancy between the appearance and 3D structure of hypothesized instances of a hand model and actual hand observations. This optimization problem is effectively solved using a variant of Particle Swarm Optimization (PSO). The proposed method does not require special markers and/or a complex image acquisition setup. Being model based, it provides continuous solutions to the problem of tracking hand articulations. Extensive experiments with a prototype GPU-based implementation of the proposed method demonstrate that accurate and robust 3D tracking of hand articulations can be achieved in near real-time (15Hz).
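The PSO machinery underlying this family of trackers can be sketched generically: particles (pose hypotheses) are drawn toward their personal best and the swarm's global best. The coefficients below are standard textbook values, and the toy objective stands in for the appearance-discrepancy function; neither comes from the paper.

```python
import numpy as np

def pso(objective, dim, n_particles=30, iters=60, bounds=(-1.0, 1.0), rng=0):
    """Minimal PSO sketch: minimise `objective` over a dim-dimensional
    hypothesis space (e.g. hand pose parameters)."""
    gen = np.random.default_rng(rng)
    lo, hi = bounds
    x = gen.uniform(lo, hi, (n_particles, dim))    # particle positions
    v = np.zeros((n_particles, dim))               # particle velocities
    pbest = x.copy()
    pbest_val = np.array([objective(p) for p in x])
    g = pbest[np.argmin(pbest_val)].copy()         # global best so far
    w, c1, c2 = 0.72, 1.5, 1.5                     # standard coefficients
    for _ in range(iters):
        r1, r2 = gen.random((2, n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)
        vals = np.array([objective(p) for p in x])
        better = vals < pbest_val
        pbest[better], pbest_val[better] = x[better], vals[better]
        g = pbest[np.argmin(pbest_val)].copy()
    return g, pbest_val.min()
```

In the tracker, `objective` would render a hand hypothesis and score it against the observation; here any black-box function works.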
Proceedings of the 16th International Conference on Advanced Robotics (ICAR 2013)
Hand pose estimation is the task of deriving a hand's articulation from sensory input, here depth images in particular. A novel approach casts pose estimation as an optimization problem: a high-dimensional hypothesis space is constructed from a hand model, in which particle swarms search for the best pose hypothesis. We propose various additions to this approach. Our extended hand model includes anatomical constraints of hand motion by applying principal component analysis (PCA). This allows us to treat pose estimation as a problem with variable dimensionality. The most important benefit becomes visible once our PCA-enhanced model is combined with biased particle swarms. Several experiments show that the accuracy and performance of pose estimation improve significantly.
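The PCA constraint can be sketched as fitting a low-dimensional linear basis to training joint-angle vectors and searching in coefficient space. This is a generic illustration under assumed names; the paper's actual model and component counts are not reproduced here.

```python
import numpy as np

def fit_pose_pca(poses, n_components):
    """PCA sketch of anatomical pose constraints: rows of `poses` are
    joint-angle vectors; return the mean pose and the top principal
    directions, so the swarm can search a lower-dimensional space."""
    mean = poses.mean(axis=0)
    u, s, vt = np.linalg.svd(poses - mean, full_matrices=False)
    return mean, vt[:n_components]                 # basis rows

def decode(mean, basis, coeffs):
    """Map low-dimensional coefficients back to a full joint-angle pose."""
    return mean + coeffs @ basis
```

Searching over `coeffs` instead of raw joint angles both shrinks the hypothesis space and keeps generated poses near anatomically plausible configurations.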
2004
A method is proposed to track the full hand motion from 3D points reconstructed using a stereoscopic set of cameras. This approach combines the advantages of methods that use 2D motion (e.g. optical flow) and those that use a 3D reconstruction at each time frame to capture the hand motion. Matching either contours or a 3D reconstruction against a 3D hand model is usually very difficult due to self-occlusions and the locally cylindrical structure of each phalanx in the model, but our use of 3D point trajectories constrains the motion and overcomes these problems. Our tracking procedure uses both the 3D point matches between two time frames and a smooth surface model of the hand, built with implicit surfaces. We used animation techniques to faithfully represent the skin motion, especially near joints. Robustness is obtained by using an EM version of the ICP algorithm for matching points between consecutive frames, and the tracked points are then registered to the surface of the hand model. Results are presented on a stereoscopic sequence of a moving hand and are evaluated using a side view of the sequence.
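One iteration of point matching plus rigid alignment can be sketched with hard-assignment ICP, a simplification of the EM-ICP used above (EM-ICP replaces the hard nearest-neighbour assignment with soft correspondence weights). The closed-form alignment is the standard SVD/Procrustes solution.

```python
import numpy as np

def icp_step(src, dst):
    """One hard-assignment ICP step: match each source point to its
    nearest destination point, then solve the best rigid transform in
    closed form and apply it to the source cloud."""
    # nearest-neighbour correspondences (brute force)
    d = np.linalg.norm(src[:, None] - dst[None], axis=2)
    matched = dst[np.argmin(d, axis=1)]
    # closed-form rigid alignment of src onto matched (Procrustes)
    mu_s, mu_m = src.mean(0), matched.mean(0)
    u, _, vt = np.linalg.svd((src - mu_s).T @ (matched - mu_m))
    R = vt.T @ u.T
    if np.linalg.det(R) < 0:                       # avoid reflections
        vt[-1] *= -1
        R = vt.T @ u.T
    t = mu_m - R @ mu_s
    return src @ R.T + t
```

Iterating this step until correspondences stabilise yields the usual ICP loop; the EM variant instead averages over all correspondences weighted by their likelihood, which is what buys the robustness mentioned above.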
International Journal of Image, Graphics and Signal Processing, 2018
Human-Computer Interaction (HCI) is one of the most interesting and challenging research topics in the computer vision community. Among different HCI methods, hand gesture is the natural way of human-computer interaction and has been the focus of many researchers. It allows humans to use their hand movements to interact with machines easily and conveniently. With the advent of depth sensors, many new techniques have been developed and have achieved a great deal. In this work, we propose a set of features extracted from depth maps for dynamic hand gesture recognition. We extract HOG2 to represent the shape and appearance of the hand in a gesture. Moreover, to capture the movement of the hands, we propose a new feature named HOF2, which is extracted based on an optical flow algorithm. These spatio-temporal descriptors are easy to comprehend and implement but perform very well in multi-class classification. They also have a low computational cost, which makes them suitable for real-time recognition systems. Furthermore, we apply Robust PCA to reduce the features' dimensionality in order to build robust and compact gesture descriptors. Results are evaluated with a cross-validation scheme using an SVM classifier, showing good outcomes on the challenging MSR Hand Gestures Dataset and VIVA Challenge Dataset, with 95.51% and 55.95% accuracy, respectively.
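The HOG2 idea, histograms of gradients computed first per frame and then again over the stacked per-frame histograms, can be sketched in a heavily simplified single-cell form. The cell structure, bin count and normalisation of the real descriptor are omitted; everything below is an assumption for illustration.

```python
import numpy as np

def orientation_histogram(img, bins=9):
    """Coarse gradient-orientation histogram (a single-cell stand-in
    for a full HOG descriptor), L2-normalised."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)        # unsigned orientation
    hist, _ = np.histogram(ang, bins=bins, range=(0, np.pi), weights=mag)
    n = np.linalg.norm(hist)
    return hist / n if n > 0 else hist

def hog2(frames, bins=9):
    """HOG2 sketch: stack per-frame histograms into a time-by-bins
    matrix, then histogram the gradients of that matrix to capture how
    the shape statistics evolve over the gesture."""
    per_frame = np.stack([orientation_histogram(f, bins) for f in frames])
    return orientation_histogram(per_frame, bins)
```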
Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, 2015
Figure: We present a new system for tracking the detailed motion of a user's hand using only a commodity depth camera. Our system can accurately reconstruct the complex articulated pose of the hand, whilst being robust to tracking failure, and supports flexible setups such as tracking at large distances and over-the-shoulder camera placement.
2017 IEEE International Conference on Computer Vision Workshops (ICCVW), 2017
We present a novel solution to the problem of 3D tracking of the articulated motion of human hand(s), possibly in interaction with other objects. The vast majority of contemporary relevant work capitalizes on depth information provided by RGBD cameras. In this work, we show that accurate and efficient 3D hand tracking is possible, even in the case of RGB stereo. A straightforward approach for solving the problem based on such input would be to first recover depth and then apply a state-of-the-art depth-based 3D hand tracking method. Unfortunately, this does not work well in practice because the stereo-based, dense 3D reconstruction of hands is far less accurate than the one obtained by RGBD cameras. Our approach bypasses 3D reconstruction and follows a completely different route: 3D hand tracking is formulated as an optimization problem whose solution is the hand configuration that maximizes the color consistency between the two views of the hand. We demonstrate the applicability of our method for real-time tracking of a single hand, of a hand manipulating an object and of two interacting hands. The method has been evaluated quantitatively on standard datasets and in comparison to relevant, state-of-the-art RGBD-based approaches. The obtained results demonstrate that the proposed stereo-based method performs on par with its RGBD-based competitors and, in some cases, even outperforms them.
IEEE/CAA Journal of Automatica Sinica, 2021
Hand gesture recognition is a popular topic in computer vision and makes human-computer interaction more flexible and convenient. The representation of hand gestures is critical for recognition. In this paper, we propose a new method to measure the similarity between hand gestures and exploit it for hand gesture recognition. The depth maps of hand gestures captured via Kinect sensors are used in our method, where the 3D hand shapes can be segmented from the cluttered backgrounds. To extract the pattern of salient 3D shape features, we propose a new descriptor, 3D Shape Context, for 3D hand gesture representation. The 3D Shape Context information of each 3D point is obtained at multiple scales, because both local shape context and global shape distribution are necessary for recognition. The descriptions of all the 3D points construct the hand gesture representation, and hand gesture recognition is performed via a dynamic time warping algorithm. Extensive experiments are conducted on multiple benchmark datasets. The experimental results verify that the proposed method is robust to noise, articulated variations, and rigid transformations. Our method outperforms state-of-the-art methods in terms of both accuracy and efficiency.
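The dynamic time warping comparison mentioned above follows the classic recurrence: align two descriptor sequences by allowing elastic stretching along the time axis. This is the textbook algorithm, shown with an assumed Euclidean frame distance; the descriptor itself (3D Shape Context) is not reproduced here.

```python
import numpy as np

def dtw(a, b, dist=lambda x, y: np.linalg.norm(x - y)):
    """Classic dynamic-time-warping distance between two sequences of
    per-frame descriptors, via the standard O(n*m) dynamic program."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = dist(a[i - 1], b[j - 1])           # local frame distance
            D[i, j] = c + min(D[i - 1, j],         # insertion
                              D[i, j - 1],         # deletion
                              D[i - 1, j - 1])     # match
    return D[n, m]
```

Because DTW absorbs differences in execution speed, two performances of the same gesture at different tempos score near zero while different gestures do not.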
Proceedings of the 12th International Conference on Informatics in Control, Automation and Robotics, 2015
In this paper we propose a recognition technique for 3D dynamic gestures in human-robot interaction (HRI) based on depth information provided by the Kinect sensor. The body is tracked using the skeleton algorithm provided by the Kinect SDK. The main idea of this work is to compute the angles of the upper-body joints that are active when a gesture is executed. The variations of these angles are used as inputs to Hidden Markov Models (HMMs) in order to recognize the dynamic gestures. Results demonstrate the robustness of our method against environmental conditions such as illumination changes and scene complexity, thanks to using depth information only.
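The joint-angle feature can be sketched as the angle at a joint between its two adjacent bones, computed from three skeleton positions. This is a generic geometric helper, not code from the paper.

```python
import numpy as np

def joint_angle(parent, joint, child):
    """Angle (degrees) at `joint` between the bone vectors joint->parent
    and joint->child, the kind of upper-body feature fed to an HMM."""
    u = np.asarray(parent, float) - np.asarray(joint, float)
    v = np.asarray(child, float) - np.asarray(joint, float)
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
```

For example, an elbow angle would use the shoulder, elbow and wrist positions from the tracked skeleton; a straight arm gives 180 degrees.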
Workshop on Synthetic-Natural Hybrid Coding and …, 1999
We address the issue of 3D hand gesture modelling given only one camera input and without body markers. A 3D articulated model of the hand is first adjusted to the user's hand morphology with respect to anthropometric constraints. Then it is registered with image sequences by minimising an error function. Several functions (edge distances, non-overlapping surface) and optimisation methods (Levenberg-Marquardt, downhill simplex and Powell) are compared. Biomechanical constraints are integrated into the minimisation algorithm to force registration to realistic postures. Results on hand gesture image sequences are finally presented. Potential target applications include SNHC coding of human movements, virtual character animation, human-machine interaction and sign language recognition.
Advances in Intelligent Systems and Computing, 2015
Recently, model-based approaches have produced very promising results for the problem of 3D hand tracking. The current state-of-the-art method recovers the 3D position, orientation and 20-DOF articulation of a human hand from markerless visual observations obtained by an RGB-D sensor. Hand pose estimation is formulated as an optimization problem, seeking the hand model parameters that minimize an objective function quantifying the discrepancy between the appearance of hand hypotheses and the actual hand observation. The design of such a function is a complicated process that requires a lot of prior experience with the problem. In this paper we automate the definition of the objective function in such optimization problems. First, a set of relevant candidate image features is computed. Then, given synthetic datasets with ground truth information, regression analysis is used to combine these features into an objective function that seeks to maximize optimization performance. Extensive experiments study the performance of the proposed approach based on various dataset generation strategies and feature selection techniques.
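The regression step can be sketched in its simplest linear form: learn weights that combine candidate discrepancy features into a single score approximating the true pose error on synthetic, ground-truthed data. The paper explores richer regression and feature-selection variants; the least-squares fit and function names below are illustrative assumptions.

```python
import numpy as np

def learn_objective(features, gt_errors):
    """Least-squares sketch: fit weights (plus a bias) so the weighted
    combination of image-discrepancy features predicts the ground-truth
    pose error, then return the learned objective function."""
    X = np.hstack([features, np.ones((len(features), 1))])  # bias column
    w, *_ = np.linalg.lstsq(X, gt_errors, rcond=None)
    return lambda f: np.hstack([f, 1.0]) @ w
```

The returned callable can then be plugged into the optimizer in place of a hand-designed objective.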
2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018
In this paper, we strive to answer two questions: What is the current state of 3D hand pose estimation from depth images? And what are the next challenges that need to be tackled? Following the successful Hands In the Million Challenge (HIM2017), we investigate the top 10 state-of-the-art methods on three tasks: single-frame 3D pose estimation, 3D hand tracking, and hand pose estimation during object interaction. We analyze the performance of different CNN structures with regard to hand shape, joint visibility, viewpoint and articulation distributions. Our findings include: (1) isolated 3D hand pose estimation achieves low mean errors (10 mm) in the viewpoint range of [70, 120] degrees, but it is far from being solved for extreme viewpoints; (2) 3D volumetric representations outperform 2D CNNs, better capturing the spatial structure of the depth data; (3) discriminative methods still generalize poorly to unseen hand shapes; (4) while joint occlusions pose a challenge for most methods, explicit modeling of structure constraints can significantly narrow the gap between errors on visible and occluded joints.
In this paper we first describe how we have constructed a 3D deformable Point Distribution Model of the human hand, capturing training data semi-automatically from volume images via a physically-based model. We then show how we have attempted to use this model in tracking an unmarked hand moving with 6 degrees of freedom (plus deformation) in real time using a single video camera. In the course of this we show how to improve on a weighted least-squares pose parameter approximation at little computational cost. We note the successes and shortcomings of our system and discuss how it might be improved.