Few-shot learning, namely recognizing novel categories from very few training examples, is a challenging area of machine learning research. Traditional deep learning methods require massive amounts of training data to tune their huge number of parameters, which is often impractical and prone to overfitting. In this work, we build on the well-known prototypical networks for few-shot learning to improve their performance. Our contributions include (1) a new embedding structure that encodes relative spatial relationships between features by applying a capsule network; (2) a new triplet loss designed to enhance the semantic feature embedding so that similar samples are close to each other while dissimilar samples are farther apart; and (3) an effective nonparametric classifier, termed attentive prototypes, in place of the simple prototypes used in current few-shot learning. The proposed attentive prototype aggregates all of the instances in a support class, weighted by their importance as defined by the reconstruction error for a given query. The reconstruction error allows the classification posterior probability to be estimated, which corresponds to the classification confidence score. Extensive experiments on three benchmark datasets demonstrate that our approach is effective for the few-shot classification task.
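The attentive-prototype classifier and the triplet loss can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the per-instance importance weights are assumed to be a softmax over the negative squared distance between the query and each support embedding, the reconstruction error is taken as the squared distance between the query and the weighted prototype, and the helper names (`attentive_prototype`, `classify`, `triplet_loss`) and the temperature `tau` are hypothetical.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attentive_prototype(query, support, tau=1.0):
    """Query-specific prototype: weight each support embedding by an assumed
    softmax over its negative squared distance to the query, then average."""
    weights = softmax([-sq_dist(query, s) / tau for s in support])
    return [sum(w * s[d] for w, s in zip(weights, support)) for d in range(len(query))]

def classify(query, support_sets, tau=1.0):
    """Return (label, posterior); the posterior is a softmax over the negative
    reconstruction error ||query - prototype||^2 of each class."""
    labels = list(support_sets)
    errors = [sq_dist(query, attentive_prototype(query, support_sets[c], tau))
              for c in labels]
    posterior = softmax([-e for e in errors])
    best = max(range(len(labels)), key=lambda i: posterior[i])
    return labels[best], dict(zip(labels, posterior))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge triplet loss: the positive should be closer to the anchor than
    the negative by at least `margin` (in squared distance)."""
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)
```

For example, with `support = {"cat": [[0.0, 0.0], [0.2, 0.1]], "dog": [[3.0, 3.0], [2.8, 3.1]]}`, the call `classify([0.1, 0.0], support)` assigns the query to `"cat"`, and the returned posterior doubles as a confidence score.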
7th International Conference on Automatic Face and Gesture Recognition (FGR06)
Face recognition can be considered a one-class classification problem, and associative memory (AM) based approaches have proven efficient in previous studies. In this paper, a face recognition scheme based on Kernel Associative Memory (KAM) with a multiscale Gabor transform is proposed. In our method, the face images of each person are first decomposed into multiscale representations by a quasi-complete Gabor transform, which are then modelled by kernel associative memories. The pyramidal multiscale Gabor wavelet transform not only provides a very efficient implementation of the Gabor transform in the spatial domain, but also permits fast reconstruction. In the testing phase, a query face image is also represented by a Gabor multiresolution pyramid, and the recalled results from the different KAM models corresponding to even Gabor channels are then simply added together to provide a reconstruction. The recognition scheme was thoroughly tested on several benchmark face datasets, including the AR, UMIST, JAFFE and Yale A faces. The experimental results demonstrate strong robustness in recognizing faces under different conditions, particularly pose variations, occlusions and expression changes.
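For readers unfamiliar with Gabor decompositions, the sketch below builds a small multiscale, multi-orientation bank of real Gabor kernels. It uses the standard Gabor formula (a Gaussian envelope times a cosine carrier) and a common bandwidth choice of sigma of about 0.56 times the wavelength; it is not the quasi-complete Gabor pyramid used in the paper, and the function names and default parameters are assumptions.

```python
import math

def gabor_kernel(size, wavelength, theta, sigma, gamma=0.5, psi=0.0):
    """Real Gabor kernel (Gaussian envelope times cosine carrier), size x size."""
    if size % 2 == 0:
        raise ValueError("size must be odd so the kernel has a centre pixel")
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the carrier oscillates along orientation theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength + psi))
        kernel.append(row)
    return kernel

def gabor_bank(wavelengths=(4.0, 8.0), n_orientations=4, size=15):
    """Multiscale bank keyed by (wavelength, orientation index)."""
    bank = {}
    for lam in wavelengths:
        for k in range(n_orientations):
            theta = k * math.pi / n_orientations
            # sigma of about 0.56 * wavelength gives roughly a one-octave bandwidth.
            bank[(lam, k)] = gabor_kernel(size, lam, theta, sigma=0.56 * lam)
    return bank
```

Convolving a face image with every kernel in the bank yields one response channel per (scale, orientation) pair; in a scheme like the one above, each channel would then be modelled separately.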
2018 24th International Conference on Pattern Recognition (ICPR), 2018
Image captioning is a significant task in artificial intelligence that connects computer vision and natural language processing. With the rapid development of deep learning, the sequence-to-sequence model with attention has become one of the main approaches to image captioning. Nevertheless, the current framework has a significant issue: the exposure bias problem of Maximum Likelihood Estimation (MLE) in the sequence model. To address this problem, we use generative adversarial networks (GANs) for image captioning, which compensate for the exposure bias of MLE and can also generate more realistic captions. GANs, however, cannot be directly applied to discrete tasks such as language generation, due to the discrete nature of the data. Hence, we use a reinforcement learning (RL) technique to estimate the gradients for the network. In addition, a Monte Carlo roll-out sampling method is used to obtain intermediate rewards during language generation. Experimental results on the COCO dataset validate the improvement contributed by each component of the proposed model, and the overall effectiveness is also evaluated.
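The Monte Carlo roll-out used to obtain intermediate rewards can be illustrated as follows: each prefix of a generated caption is completed several times by a roll-out policy, and the discriminator scores of the completed sequences are averaged into a reward for the last token of that prefix. This is a minimal sketch; `rollout_policy` and `discriminator` are hypothetical stand-ins for the trained generator and discriminator, and `toy_policy` and `toy_discriminator` exist only to make the code runnable.

```python
import random

def rollout_reward(prefix, max_len, n_rollouts, rollout_policy, discriminator, rng):
    """Average discriminator score over n_rollouts completions of `prefix`."""
    if len(prefix) >= max_len:
        return discriminator(prefix)  # complete sequence: score it directly
    total = 0.0
    for _ in range(n_rollouts):
        seq = list(prefix)
        # Sample tokens until the sequence ends or reaches max_len.
        while len(seq) < max_len and seq[-1] != "<eos>":
            seq.append(rollout_policy(seq, rng))
        total += discriminator(seq)
    return total / n_rollouts

def token_rewards(caption, n_rollouts, rollout_policy, discriminator, seed=0):
    """One intermediate reward per token t, from roll-outs of caption[:t + 1]."""
    rng = random.Random(seed)
    max_len = len(caption)
    return [rollout_reward(caption[:t + 1], max_len, n_rollouts,
                           rollout_policy, discriminator, rng)
            for t in range(len(caption))]

# Hypothetical toy components, for illustration only.
VOCAB = ["a", "dog", "runs", "<eos>"]

def toy_policy(seq, rng):
    return rng.choice(VOCAB)

def toy_discriminator(seq):
    return seq.count("dog") / len(seq)
```

In policy-gradient training, these per-token rewards would weight the log-probabilities of the corresponding generated tokens.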
2019 IEEE International Conference on Image Processing (ICIP), 2019
With the rapid development of facial recognition, near-infrared (NIR) face recognition, which is less sensitive to illumination levels, has attracted increasing attention. Unfortunately, directly applying a face recognition model trained on visible light (VIS) data to NIR face data does not produce satisfactory performance, owing to the domain bias between NIR and VIS images. To address this, we created the Outdoor NIR-VIS Face (ONVF) database and the Indoor NIR Face (INF) database to increase the number of available near-infrared facial images. In this paper, we propose an efficient NIR face recognition method consisting of face detection and alignment, NIR-VIS image translation and face embedding. The NIR-VIS image translation model transforms near-infrared facial images into corresponding VIS images whilst retaining sufficient identity information for existing VIS face recognition models to perform recognition. Extensive experiments on the INF dataset and the CSIST database demonstrate that the proposed method yields consistent and competitive performance for near-infrared face recognition.
Automatically generating descriptions of an image, i.e., image captioning, is an important and fundamental topic in artificial intelligence that bridges the gap between computer vision and natural language processing. Building on successful deep learning models, especially CNNs and Long Short-Term Memory networks (LSTMs) with attention mechanisms, we propose a hierarchical attention model that utilizes both global CNN features and local object features for more effective feature representation and reasoning in image captioning. A generative adversarial network (GAN), together with a reinforcement learning (RL) algorithm, is applied to solve the exposure bias problem in RNN-based supervised training for language problems. In addition, through the automatic measurement of the consistency between the generated
Vehicle re-identification (re-ID), namely finding exactly the same vehicle among a large number of vehicle images, remains a great challenge in computer vision. Most existing vehicle re-ID approaches follow a fully supervised learning methodology, which requires sufficient labeled training data. However, the high cost of data labeling limits their scalability to realistic applications. In this paper, we adopt a Generative Adversarial Network (GAN) to generate unlabeled samples and enlarge the training set. A semi-supervised learning scheme with Convolutional Neural Networks (CNNs) is proposed accordingly, which assigns a uniform label distribution to the unlabeled images to regularize the supervised model and improve the performance of the vehicle re-ID system. In addition, an improved re-ranking method based on Jaccard distance and k-reciprocal nearest neighbors is proposed to optimize the initial rank list. Extensive experiments on the benchmark datasets VeRi-776, VehicleID and VehicleReID demonstrate that the proposed method outperforms state-of-the-art approaches for vehicle re-ID.
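The two ingredients can be sketched in a simplified form. `jaccard_rerank` computes k-reciprocal nearest-neighbour sets from a pairwise distance matrix and blends the resulting Jaccard distance with the original distance; the published k-reciprocal re-ranking method also expands these sets and weights them with a Gaussian kernel, which this sketch omits. `uniform_label_loss` is the cross-entropy between a classifier's softmax output and a uniform target, one way to express the uniform label distribution assigned to GAN-generated images. The function names and the values of `k` and `lam` are illustrative assumptions.

```python
import math

def k_nearest(dist, i, k):
    """Indices of the k nearest neighbours of item i (including i itself)."""
    return set(sorted(range(len(dist)), key=lambda j: dist[i][j])[:k])

def k_reciprocal(dist, i, k):
    """R(i, k): neighbours j of i that also count i among their k nearest."""
    return {j for j in k_nearest(dist, i, k) if i in k_nearest(dist, j, k)}

def jaccard_rerank(dist, k=2, lam=0.3):
    """Final distance = (1 - lam) * Jaccard distance + lam * original distance."""
    n = len(dist)
    recip = [k_reciprocal(dist, i, k) for i in range(n)]
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            union = len(recip[i] | recip[j])
            d_jac = 1.0 - len(recip[i] & recip[j]) / union if union else 1.0
            out[i][j] = (1 - lam) * d_jac + lam * dist[i][j]
    return out

def uniform_label_loss(probs):
    """Cross-entropy against a uniform target over the K identities,
    for generated images that carry no real label."""
    return -sum(math.log(p) for p in probs) / len(probs)
```

Items that are mutual neighbours share their k-reciprocal sets, so their Jaccard distance drops to zero, while a gallery image that is close to the query only in one direction is pushed down the rank list.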
Proceedings of 2001 International Symposium on Intelligent Multimedia, Video and Speech Processing. ISIMP 2001 (IEEE Cat. No.01EX489)
In this paper, we propose a face recognition scheme based on an auto-associative memory (AM) model. Two kinds of AM models are compared, namely pseudo-inverse memory and the Radial Basis Function (RBF) network, and we found that RBF-based associative memory is much more efficient. To capture the substantial facial features and reduce computational complexity, we use the wavelet transform (WT) to decompose face images and choose the lowest-resolution subband coefficients for face representation. Results indicate that the modular scheme yields accurate recognition on the widely used XM2VTS face database and the Olivetti Research Laboratory (ORL) face database.
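The abstract does not name the wavelet, so the sketch below assumes the Haar wavelet, whose low-low (approximation) coefficient for each 2x2 block is (a + b + c + d) / 2 in the orthonormal normalization. Repeating the step keeps only the lowest-resolution coefficients, shrinking the image by a factor of four in area per level; `haar_ll` and `lowest_subband` are hypothetical names.

```python
def haar_ll(image):
    """One level of the 2-D orthonormal Haar transform, keeping only the
    low-low (approximation) subband: each 2x2 block -> (a + b + c + d) / 2."""
    h, w = len(image), len(image[0])
    if h % 2 or w % 2:
        raise ValueError("image height and width must be even")
    return [[(image[r][c] + image[r][c + 1] + image[r + 1][c] + image[r + 1][c + 1]) / 2.0
             for c in range(0, w, 2)]
            for r in range(0, h, 2)]

def lowest_subband(image, levels):
    """Apply haar_ll `levels` times and flatten the result into a feature vector."""
    for _ in range(levels):
        image = haar_ll(image)
    return [v for row in image for v in row]
```

For a 112 x 92 ORL image, for instance, three levels would reduce 10,304 pixels to a 14 x 11.5 grid, which is not integral; in practice images would be cropped or padded so that each level's dimensions stay even.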
International Conference on Computational Intelligence for Modelling, Control and Automation and International Conference on Intelligent Agents, Web Technologies and Internet Commerce (CIMCA-IAWTIC'06), 2005
Based on the Kernel Principal Component Regression (KPCR) recently proposed in the literature, a new kernel auto-associator (KAA) model is proposed for classification and novelty detection. For the face recognition problem, the KAA model can efficiently characterize each subject, thus offering good recognition performance. Stemming from principal component regression (PCR), a simple technique using principal
Proceedings. Fourteenth International Conference on Pattern Recognition (Cat. No.98EX170)
This paper describes a modular classification system for handwritten digit recognition based on the elastic net model. We use ten separate elastic nets to capture different features in the ten classes of handwritten digits and represent an input sample from the activations in each net by population decoding. Compared with traditional neural-network-based discriminant classifiers, our scheme features fast training and high recognition accuracy.
Proceedings. Fourteenth International Conference on Pattern Recognition (Cat. No.98EX170)
This paper proposes a vector quantization (VQ) technique to solve the problem of handwritten signature verification. A neural 'gas' model is trained on handwritten signature samples to establish a reference set for each registered person. A test sample is then compared with all the prototypes in the reference sets, and the system outputs the label of the writer of the word. Several different feature extraction methods are compared, and good results have been obtained with the VQ technique.
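The VQ scheme can be sketched with a small neural gas implementation: every prototype moves toward each training sample by a step that decays exponentially with its distance rank, and a test sample is labelled with the writer who owns the nearest prototype. This is an illustrative sketch, not the paper's code; the annealing schedules for the step size `eps` and neighbourhood range `lam`, the number of prototypes per writer and all function names are assumptions, and raw 2-D points stand in for extracted signature features.

```python
import math
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_neural_gas(samples, n_protos, epochs=20, eps=(0.5, 0.01),
                     lam=(2.0, 0.1), seed=0):
    """Neural gas: rank all prototypes by distance to each sample and move the
    rank-r prototype toward it with strength step * exp(-r / spread)."""
    rng = random.Random(seed)
    protos = [list(s) for s in rng.sample(samples, n_protos)]
    t, t_max = 0, epochs * len(samples)
    for _ in range(epochs):
        order = list(samples)
        rng.shuffle(order)
        for x in order:
            frac = t / t_max
            # Exponentially anneal the step size and the neighbourhood range.
            step = eps[0] * (eps[1] / eps[0]) ** frac
            spread = lam[0] * (lam[1] / lam[0]) ** frac
            ranked = sorted(range(n_protos), key=lambda i: sq_dist(protos[i], x))
            for rank, i in enumerate(ranked):
                h = step * math.exp(-rank / spread)
                protos[i] = [w + h * (xi - w) for w, xi in zip(protos[i], x)]
            t += 1
    return protos

def build_references(samples_by_writer, n_protos=2):
    """One reference set of prototypes per registered writer."""
    return {w: train_neural_gas(s, n_protos) for w, s in samples_by_writer.items()}

def identify(x, references):
    """Label of the writer whose reference set holds the nearest prototype."""
    best_writer, best_d = None, float("inf")
    for writer, protos in references.items():
        for p in protos:
            d = sq_dist(p, x)
            if d < best_d:
                best_writer, best_d = writer, d
    return best_writer
```

For verification rather than identification, the nearest-prototype distance within the claimed writer's reference set could instead be compared against a threshold.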
Mixtures of local principal component analysis (PCA) models have attracted attention due to a number of benefits over global PCA. The performance of a mixture model usually depends on the data partition and the local linear fitting. In this paper, we propose a mixture model with optimal data partition and robust local fitting. Data partition is realized by a soft competition algorithm called neural 'gas', and robust local linear fitting is approached by a nonlinear extension of the PCA learning algorithm. Based on this mixture model, we describe a modular classification scheme for handwritten digit recognition, in which each module or network models the manifold of one of the ten digit classes. Experiments demonstrate a very high recognition rate.
LISSOM (Laterally Interconnected Synergetically Self-Organizing Map) is a biologically motivated self-organizing neural network for the simultaneous development of topographic maps and lateral interactions in the visual cortex. However, the simple Hebbian mechanism for afferent connections requires a redundant dimension to be added to the input, and normalization is necessary. Another shortcoming of LISSOM is that several parameters must be chosen before it can be used as a model of topographic map formation. To solve these problems, we propose applying the least mean-square error reconstruction (LMSER) learning rule as an alternative to the simple Hebbian rule for the afferent connections. Experiments demonstrate that the improved LISSOM model exhibits the essential properties of topographic maps.
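For reference, the linear form of the LMSER rule follows from gradient descent on the reconstruction error ||x - W^T W x||^2 of a layer y = W x, giving the update dW = lr * (y e^T + (W e) x^T) with e = x - W^T y. The sketch below applies it to a single linear layer, not to LISSOM's afferent weights, and the learning rate, `demo_data` generator and function names are assumptions. Unlike a plain Hebbian update, the rule tends to keep the weight norm bounded without an explicit normalization step.

```python
import random

def lmser_step(W, x, lr):
    """One LMSER update for y = W x (W is m x n, x has length n):
    e = x - W^T y,  W <- W + lr * (y e^T + (W e) x^T)."""
    m, n = len(W), len(x)
    y = [sum(W[i][j] * x[j] for j in range(n)) for i in range(m)]
    recon = [sum(W[i][j] * y[i] for i in range(m)) for j in range(n)]  # W^T y
    e = [x[j] - recon[j] for j in range(n)]
    We = [sum(W[i][j] * e[j] for j in range(n)) for i in range(m)]
    return [[W[i][j] + lr * (y[i] * e[j] + We[i] * x[j]) for j in range(n)]
            for i in range(m)]

def train_lmser(W, data, lr=0.01):
    for x in data:
        W = lmser_step(W, x, lr)
    return W

def demo_data(n=5000, seed=0):
    """Hypothetical toy inputs: large variance along axis 0, small along axis 1."""
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1), 0.05 * rng.uniform(-1, 1)] for _ in range(n)]
```

Training a 1 x 2 layer with `train_lmser([[0.5, 0.5]], demo_data())` drives the weight vector toward the dominant input direction with roughly unit length.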
Face recognition is an important research area with many potential applications, such as biometric security. Among various techniques, the eigenface method based on principal component analysis (PCA) of face images has been widely used. In traditional eigenface methods, PCA is used to obtain the eigenvectors of the covariance matrix of a training set of face images, and recognition is achieved by template matching on the vectors obtained by projecting new faces onto a small number of eigenfaces. To avoid the time-consuming step of recomputing eigenfaces when new faces are added, we use a set of modules to generate a PCA-based face representation for each subject instead of applying PCA to the entire set of face images. The localized nature of the representation makes the system easy to maintain and tolerant of local changes in facial characteristics. Results indicate that the modular scheme yields accurate recognition on the widely used Olivetti Research Laboratory (ORL) face database.
2006 IEEE International Conference on Video and Signal Based Surveillance, 2006
Associative memory models have been studied as an efficient paradigm for visual information processing. In this paper, we investigate a new hetero-associative memory (HAM) model based on the Kernel Partial Least Squares (KPLS) regression recently proposed in the literature. Stemming from Partial Least Squares (PLS) regression, a class of techniques for modeling relations between blocks of observed variables by means of latent variables, kernelized PLS methods have proved efficient in treating nonlinear data. By establishing the relations between sets of observed variables, KPLS can provide a general-purpose HAM model, which constructs nonlinear regression models in possibly high-dimensional feature spaces and uses a few latent factors to account for most of the variation in the response. For the face recognition problem, we apply the HAM model to efficiently characterize each subject by relating possible variations of a face image to a given face, using a modular structure that assigns an independent model to each subject. Several benchmark face databases have been used to test the performance.
Principal component analysis (PCA) is a popular tool in multivariate statistics and pattern recognition. Recently, some mixture models of local principal component analysis have attracted attention due to a number of benefits over global PCA. In this paper, we propose a mixture model that concurrently performs global data partition and local linear PCA. The partition is optimal or near optimal and is realized by a soft competition algorithm called 'neural gas'. The local PCA type representation is approximated by a neural learning algorithm in a nonlinear autoencoder network, which is built on a generalization of the least-squares reconstruction problem that leads to standard PCA. Such a local PCA type representation has a number of numerical advantages, for example faster convergence and insensitivity to local minima. Based on this mixture model, we describe a modular classification scheme to solve the problem of handwritten digit recognition. We use 10 networks (modules) to capture different features in the 10 classes of handwritten digits, with each network being a mixture model of local PCA type representations. When a test digit is presented to all the modules, each module provides a reconstructed pattern by a prescribed principle, and the system outputs the class label by comparing the reconstruction errors from the 10 networks. Compared with some traditional neural-network-based classifiers, our scheme converges faster and recognizes with higher accuracy. For a relatively small size of each module, the classification accuracy reaches 98.6% on the training set and 97.8% on the testing set.
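The decision rule (assign the class whose module reconstructs the input with the smallest error) can be sketched with NumPy. For brevity, each module here is a single global PCA subspace fitted by SVD rather than a neural-gas mixture of local PCA units, so this is a simplified stand-in for the paper's model; the class name `SubspaceClassifier` and `n_components` are hypothetical. Because each class is fitted independently, a new class can be added without refitting the others.

```python
import numpy as np

class SubspaceClassifier:
    """One PCA subspace per class; predict the class whose subspace
    reconstructs the input with the smallest squared error."""

    def __init__(self, n_components):
        self.n_components = n_components
        self.models = {}  # label -> (mean, basis with orthonormal rows)

    def fit_class(self, label, X):
        X = np.asarray(X, dtype=float)
        mean = X.mean(axis=0)
        # Rows of vt are the principal directions of the centred data.
        _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
        self.models[label] = (mean, vt[: self.n_components])
        return self

    def reconstruction_error(self, label, x):
        mean, basis = self.models[label]
        centred = np.asarray(x, dtype=float) - mean
        recon = basis.T @ (basis @ centred)  # project onto the class subspace
        return float(np.sum((centred - recon) ** 2))

    def predict(self, x):
        return min(self.models, key=lambda c: self.reconstruction_error(c, x))
```

With ten modules, one per digit class, `predict` mirrors the comparison of reconstruction errors across the 10 networks.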
2014 7th International Congress on Image and Signal Processing, 2014
This paper presents a novel system for vision-based driving posture recognition. The driving posture dataset was prepared with a side-mounted camera looking at the driver's left profile. After pre-processing for illumination variations, eight action classes of constitutive components of driving activities were segmented, including normal driving, operating a cell phone, eating and smoking. A global grid-based representation for the action sequence is emphasized, which features two consecutive steps. Step 1 generates a motion descriptive shape based on a motion frequency image (MFI), and step 2 applies the pyramid histogram of oriented gradients (PHOG) for more discriminative characterization. A three-level hierarchical classification system is designed to overcome the difficulty posed by some overlapping classes. Four commonly applied classifiers, namely k-nearest neighbor (KNN), random forest (RF), support vector machine (SVM) and multilayer perceptron (MLP), are evaluated at each level. The overall classification accuracy of the proposed classification system is over 87.2% for the eight classes of driving actions.
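A rough sketch of the two-step representation follows. The abstract does not define the motion frequency image precisely, so it is assumed here to be, per pixel, the fraction of consecutive frame pairs whose intensity difference exceeds a threshold. The PHOG part concatenates magnitude-weighted, unsigned orientation histograms over 1x1, 2x2, ..., 2^L x 2^L grids and L1-normalizes the result. Function names, the threshold, the bin count and the number of pyramid levels are assumptions.

```python
import math

def motion_frequency_image(frames, threshold=10):
    """Per-pixel fraction of consecutive frame pairs whose absolute intensity
    change exceeds `threshold` (frames: equal-size 2-D grey images)."""
    h, w = len(frames[0]), len(frames[0][0])
    counts = [[0] * w for _ in range(h)]
    for prev, cur in zip(frames, frames[1:]):
        for r in range(h):
            for c in range(w):
                if abs(cur[r][c] - prev[r][c]) > threshold:
                    counts[r][c] += 1
    n_pairs = max(len(frames) - 1, 1)
    return [[v / n_pairs for v in row] for row in counts]

def phog(image, n_bins=8, levels=2):
    """Concatenated orientation histograms over a spatial pyramid, L1-normalised."""
    h, w = len(image), len(image[0])
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            # Central-difference gradients; orientation folded into [0, 180).
            gx = image[r][c + 1] - image[r][c - 1]
            gy = image[r + 1][c] - image[r - 1][c]
            mag[r][c] = math.hypot(gx, gy)
            ang[r][c] = math.degrees(math.atan2(gy, gx)) % 180.0
    feature = []
    for level in range(levels + 1):
        cells = 2 ** level
        for i in range(cells):
            for j in range(cells):
                hist = [0.0] * n_bins
                for r in range(i * h // cells, (i + 1) * h // cells):
                    for c in range(j * w // cells, (j + 1) * w // cells):
                        b = min(int(ang[r][c] / (180.0 / n_bins)), n_bins - 1)
                        hist[b] += mag[r][c]
                feature.extend(hist)
    total = sum(feature)
    return [v / total for v in feature] if total > 0 else feature
```

With 8 bins and 2 pyramid levels, the descriptor has 8 x (1 + 4 + 16) = 168 entries, which would then be passed to classifiers such as KNN, RF, SVM or MLP.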
Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004., 2004
Autoassociation is an important topic in concept learning, since the learned concept of a particular class can be used to distinguish that class from the others. For nonlinear autoassociation, this paper presents a new model referred to as the kernel autoassociator. Using the kernel feature space as a potential nonlinear manifold, the model formulates autoassociation as a special reconstruction problem from the kernel feature space to the input space. Two methods are developed to solve the problem. We evaluate the autoassociator on artificial data and apply it to handwritten digit recognition and multiview face recognition, yielding positive experimental results.
Face retrieval has received much attention in recent years. This paper comparatively studies five feature description methods for face representation: Local Binary Patterns (LBP), Gabor features, Gray Level Co-occurrence Matrices (GLCM), the Pyramid Histogram of Oriented Gradients (PHOG) and the Curvelet Transform (CT). The high dimensionality of the extracted features is addressed by employing a manifold learning method called Spectral Regression (SR). A fusion scheme is proposed that aggregates the distance metrics. Experiments show that the dimension-reduced features are more efficient and that the fusion scheme offers much enhanced performance, with a rank-1 accuracy of 98% on the AR faces and 92% on the FERET faces.
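The abstract does not specify how the distance metrics are aggregated; one simple possibility, sketched below, is to min-max normalize each descriptor's distances to the gallery and sum them, then measure rank-1 accuracy by checking whether the nearest gallery face has the query's identity. The function names and the equal default weights are assumptions, and the numbers used to exercise the code are toy values, not results from the paper.

```python
def min_max(values):
    """Rescale a list of distances to [0, 1] (all zeros if they are constant)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

def fused_distances(distance_lists, weights=None):
    """distance_lists: one list per descriptor (e.g. LBP, Gabor, PHOG), each
    holding the query's distance to every gallery face."""
    weights = weights or [1.0] * len(distance_lists)
    normed = [min_max(d) for d in distance_lists]
    n = len(distance_lists[0])
    return [sum(w * d[g] for w, d in zip(weights, normed)) for g in range(n)]

def rank1_accuracy(queries, gallery_labels):
    """queries: list of (true_label, distance_lists) pairs."""
    hits = 0
    for true_label, dists in queries:
        fused = fused_distances(dists)
        best = min(range(len(fused)), key=fused.__getitem__)
        hits += gallery_labels[best] == true_label
    return hits / len(queries)
```

Normalizing first matters because raw distances from different descriptors can differ by orders of magnitude; without it, the descriptor with the largest scale would dominate the sum.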
International Journal of Artificial Intelligence and Soft Computing, 2011
This paper presents an approach to offline signature verification and identification. Two image descriptors are studied: the Pyramid Histogram of Oriented Gradients (PHOG) and a direction feature proposed in the literature. Compared with many previously proposed signature feature extraction approaches, PHOG has advantages in extracting discriminative information from handwritten signature images. The significance of the classification framework is stressed. On the benchmark Grupo de Procesado Digital de Señales (GPDS) database, satisfactory performance was obtained from several classifiers. Among the classifiers compared, SVM is clearly superior, giving a False Rejection Rate (FRR) of 2.5% and a False Acceptance Rate (FAR) of 2% for skilled forgeries, which compares favourably with the latest published results on the same dataset. This substantiates the superiority of the proposed method. The related problem of offline signature recognition is also investigated with the same approach, achieving an accuracy of 99% on the GPDS data with SVM classification.
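The two error rates can be computed as follows at a given decision threshold. This sketch assumes that each signature receives a distance-like score (lower means more likely genuine) and that a signature is accepted when its score is at or below the threshold; the function names and scores are illustrative.

```python
def far_frr(genuine_scores, forgery_scores, threshold):
    """Accept when score <= threshold.
    FAR = fraction of forgeries accepted; FRR = fraction of genuine rejected."""
    far = sum(s <= threshold for s in forgery_scores) / len(forgery_scores)
    frr = sum(s > threshold for s in genuine_scores) / len(genuine_scores)
    return far, frr

def sweep(genuine_scores, forgery_scores, thresholds):
    """(threshold, FAR, FRR) for each candidate threshold."""
    return [(t, *far_frr(genuine_scores, forgery_scores, t)) for t in thresholds]
```

Raising the threshold trades a lower FRR for a higher FAR, so reported FAR/FRR pairs always correspond to one chosen operating point on this curve.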
Few-shot learning, namely recognizing novel categories with a very small amount of training examp... more Few-shot learning, namely recognizing novel categories with a very small amount of training examples, is a challenging area of machine learning research. Traditional deep learning method requires massive training data to tune the huge number of parameters, which is often impractical and prone to over-fitting. In this work, we further work on the well-known few-shot learning method known as prototypical networks for better performance. Our contributions include (1) a new embedding structure to encode relative spatial relationships between features by applying capsule network; (2) a new triplet loss designated to enhance the semantic feature embedding where similar samples are close to each other while dissimilar samples are farther apart; and (3) an effective nonparametric classifier termed attentive prototypes in place of the simple prototypes in current few-shot learning. The proposed attentive prototype aggregates all of the instances in a support class which are weighted by their importance defined by the reconstruction error for a given query. The reconstruction error allows the classification posterior probability to be estimated, which corresponds to the classification confidence score. Extensive experiments on three benchmark datasets demonstrate that our approach is effective for the few-shot classification task.
7th International Conference on Automatic Face and Gesture Recognition (FGR06)
Face recognition can be considered as a one-class classification problem and associative memory (... more Face recognition can be considered as a one-class classification problem and associative memory (AM) based approaches have been proven efficient in previous studies. In this paper, a Kernel Associative Memory (KAM) based face recognition scheme with a Multiscale Gabor transform, is proposed. In our method, face images of each person are first decomposed into their multiscale representations by a quasicomplete Gabor transform, which are then modelled by erne el Associative Memories. The pyramidal multiscale Gabor wavelet transform not only provides a very efficient implementation of Gabor transform in spatial domain, but also permits a fast reconstruction. In the testing phase, a query face image is also represented by a Gabor multiresolution pyramid and the recalled results from different KAM models corresponding to even Gabor channels are then simply added together to provide a reconstruction. The recognition scheme was thoroughly tested using several benchmark face datasets, including the AR faces, UMIST faces, JAFFE faces and Yale A faces. The experiment results have demonstrated strong robustness in recognizing faces under different conditions, particularly the poses alterations, varying occulusions and expression changes.
2018 24th International Conference on Pattern Recognition (ICPR), 2018
Image captioning is a significant task in artificial intelligence which connects computer vision ... more Image captioning is a significant task in artificial intelligence which connects computer vision and natural language processing. With the rapid development of deep learning, the sequence to sequence model with attention, has become one of the main approaches for the task of image captioning. Nevertheless, a significant issue exists in the current framework: the exposure bias problem of Maximum Likelihood Estimation (MLE) in the sequence model. To address this problem, we use generative adversarial networks (GANs) for image captioning, which compensates for the exposure bias problem of MLE and also can generate more realistic captions. GANs, however, cannot be directly applied to a discrete task, like language processing, due to the discontinuity of the data. Hence, we use a reinforcement learning (RL) technique to estimate the gradients for the network. Also, to obtain the intermediate rewards during the process of language generation, a Monte Carlo roll-out sampling method is utilized. Experimental results on the COCO dataset validate the improved effect from each ingredient of the proposed model. The overall effectiveness is also evaluated.
2019 IEEE International Conference on Image Processing (ICIP), 2019
With the rapid development of facial recognition, the research field of near infrared (NIR) face ... more With the rapid development of facial recognition, the research field of near infrared (NIR) face recognition, which is less sensitive to illumination levels, has attracted increased attention. Unfortunately, directly applying the face recognition model trained using visual light (VIS) data to NIR face data does not produce a satisfactory performance. This is due to the domain bias between the NIR image and the VIS image. To this end, we created the Outdoor NIR-VIS Face (ONVF) database and Indoor NIR Face (INF) database to increase the number of near infrared facial images. In this paper, we propose an efficient NIR face recognition method, which consists of face detection and alignment, NIR-VIS image translation and face embedding. The NIR-VIS image conversion model is capable of transforming near-infrared facial images into their corresponding VIS images whilst maintaining sufficient identity information to enable existing VIS facial recognition models to perform recognition. Extensive experiments using the INF Dataset and the CSIST Database have demonstrated that the proposed method yields a consistent and competitive performance for near infrared face recognition.
Automatically generating the descriptions of an image, i.e., image captioning, is an important an... more Automatically generating the descriptions of an image, i.e., image captioning, is an important and fundamental topic in artificial intelligence, which bridges the gap between computer vision and natural language processing. Based on the successful deep learning models, especially the CNN model and Long Short Term Memories (LSTMs) with attention mechanism, we propose a hierarchical attention model by utilizing both of the global CNN features and the local object features for more effective feature representation and reasoning in image captioning. The generative adversarial network (GAN), together with a reinforcement learning (RL) algorithm, is applied to solve the exposure bias problem in RNN-based supervised training for language problems. In addition, through the automatic measurement of the consistency between the generated
Vehicle re-identification (re-ID), namely, finding exactly the same vehicle from a large number o... more Vehicle re-identification (re-ID), namely, finding exactly the same vehicle from a large number of vehicle images, remains a great challenge in computer vision. Most existing vehicle re-ID approaches follow a fully-supervised learning methodology, in which sufficient labeled training data is required. However, this limits their scalability to realistic applications, due to the high cost of data labeling. In this paper, we adopted a Generative Adversarial Network (GAN) to generate unlabeled samples and enlarge the training set. A semi-supervised learning scheme with the Convolutional Neural Networks (CNN) was proposed accordingly, which assigns a uniform label distribution to the unlabeled images to regularize the supervised model and improve the performance of the vehicle re-ID system. Besides, an improved re-ranking method based on Jaccard distance and k-reciprocal nearest neighbors is proposed to optimize the initial rank list. Extensive experiments over the benchmark datasets VeRi-776, VehicleID and VehicleReID have demonstrated that the proposed method outperforms the state-of-the-art approaches for vehicle re-ID.
Proceedings of 2001 International Symposium on Intelligent Multimedia, Video and Speech Processing. ISIMP 2001 (IEEE Cat. No.01EX489)
In this paper, we proposed a face recognition scheme based on auto-associative memory (AM) model.... more In this paper, we proposed a face recognition scheme based on auto-associative memory (AM) model. Two kind of AM models are compared, namely, pseudo-inverse memory and Radial Basis Function (RBF) network, and we found that RBF based associative memory is much more e cient. To capture the substantial facial features and reduce computational complexity, we proposed to use wavelet transform (WT) to decompose face images and choose the lowest resolution subband coe cients for face representation. Results indicate that the modular scheme yield accurate recognition on the widely used XM2VTS face d a t a b ase and Olivetti Research Laboratory (ORL) face database.
International Conference on Computational Intelligence for Modelling, Control and Automation and International Conference on Intelligent Agents, Web Technologies and Internet Commerce (CIMCA-IAWTIC'06), 2005
Based on the Kernel Principal Component Regression (KPCR) recently proposed in the literature, a ... more Based on the Kernel Principal Component Regression (KPCR) recently proposed in the literature, a new kernel auto-associator (KAA) model is proposed for classification and novelty detection. For face recognition problem, KAA model can efficiently characterize each subject thus offering a good recognition performance. Steming from the Principal component regression (PCR), a simple technique using principal
Proceedings. Fourteenth International Conference on Pattern Recognition (Cat. No.98EX170)
This paper describes a modular class$cation system for handwritten digit recognition based on the... more This paper describes a modular class$cation system for handwritten digit recognition based on the elastic net model. We use ten separate elastic nets to capture different features in the ten classes of handwritten digits and represent an input sample from the activations in each net by population decoding. Compared with traditional neural networks based discriminant classifiers, our scheme features fast training and high recognition accuracy.
Proceedings. Fourteenth International Conference on Pattern Recognition (Cat. No.98EX170)
This paper propose a vector quantization (VQ) technique to solve the problem of handwritten signa... more This paper propose a vector quantization (VQ) technique to solve the problem of handwritten signature verijication. A neural 'gas' model is trained to establish a reference set for each registered person with handwritten signature samples. Then a test sample is compared with all the prototypes in the reference set and the system outputs the label of the writer of the word. Several difSerent feature extraction methods are compared and good results have been obtained by the VQ technique.
Mixture of local principal component analysis (PCA) has attracted attention due to a number of be... more Mixture of local principal component analysis (PCA) has attracted attention due to a number of benefits over global PCA. The performance of a mixture model usually depends on the data partition and local linear fitting. In this paper, we propose a mixture model which has the properties of optimal data partition and robust local fitting. Data partition is realized by a soft competition algorithm called neural 'gas' and robust local linear fitting is approached by a nonlinear extension of PCA learning algorithm. Based on this mixture model, we describe a modular classification scheme for handwritten digit recognition, in which each module or network models the manifold of one of ten digit classes. Experiments demonstrate a very high recognition rate.
LISSOM (Laterally Interconnected Synergetically Self-Organizing Map) is a biologically motivated self-organizing neural network for the simultaneous development of topographic maps and lateral interactions in the visual cortex. However, its simple Hebbian mechanism for afferent connections requires a redundant dimension to be added to the input, and it also requires normalization. Another shortcoming of LISSOM is that several parameters must be chosen before it can be used as a model of topographic map formation. To solve these problems, we propose applying the least mean-square error reconstruction (LMSER) learning rule to the afferent connections in place of the simple Hebbian rule. Experiments demonstrate that the improved LISSOM model exhibits the essential properties of topographic maps.
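To make the contrast with the Hebbian rule concrete, consider a linear layer with weight matrix $W$, output $\mathbf{y} = W\mathbf{x}$ and reconstruction $W^{\top}\mathbf{y}$. Gradient descent on the squared reconstruction error then gives the update below. This is a sketch of the general LMSER idea under that linear assumption. It does not show how the paper combines the rule with LISSOM's lateral connections or nonlinearity.

```latex
J(W) = \tfrac{1}{2}\,\lVert \mathbf{x} - W^{\top} W \mathbf{x} \rVert^{2},
\qquad \mathbf{e} = \mathbf{x} - W^{\top}\mathbf{y},
\qquad
\Delta W = -\eta\,\frac{\partial J}{\partial W}
         = \eta\left(\mathbf{y}\,\mathbf{e}^{\top} + W\mathbf{e}\,\mathbf{x}^{\top}\right).
```

Dropping the second term leaves Oja's subspace rule, $\Delta W = \eta\,\mathbf{y}\,\mathbf{e}^{\top}$. In both cases the plain Hebbian product $\mathbf{y}\,\mathbf{x}^{\top}$ is replaced by a term built from the reconstruction error $\mathbf{e}$. This subtractive term is what keeps the weights bounded without an explicit normalization step.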
Face recognition is an important research area with many potential applications, such as biometric security. Among the various techniques, the eigenface method, which applies principal component analysis (PCA) to face images, has been widely used. In traditional eigenface methods, PCA is used to obtain the eigenvectors of the covariance matrix of a training set of face images. Recognition is then achieved by template matching on the vectors obtained by projecting new faces onto a small number of eigenfaces. To avoid the time-consuming step of recomputing the eigenfaces whenever new faces are added, we use a set of modules, each generating a PCA-based representation for one subject, instead of applying PCA to the entire set of face images. The localized nature of this representation makes the system easy to maintain and tolerant of changes in local facial characteristics. Results show that the modular scheme yields accurate recognition on the widely used Olivetti Research Laboratory (ORL) face database.
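The per-subject module idea can be illustrated with a small sketch. Each subject gets its own mean and a few principal components, and a query is assigned to the subject whose subspace reconstructs it with the smallest squared error. This is a hypothetical illustration, not the paper's code. The power-iteration PCA, the number of components `k`, the helper names, and the toy 3-D "face vectors" are all assumptions.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def top_components(rows, k=1, iters=200, seed=0):
    """Mean and leading k covariance eigenvectors via power iteration with deflation."""
    rng = random.Random(seed)
    n = len(rows)
    mu = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(r, mu)] for r in rows]
    comps = []
    for _ in range(k):
        v = [rng.gauss(0, 1) for _ in mu]
        for _ in range(iters):
            # Multiply by the (unscaled) covariance: w = sum_i (r_i . v) r_i.
            w = [0.0] * len(v)
            for r in centered:
                s = dot(r, v)
                w = [wi + s * ri for wi, ri in zip(w, r)]
            # Deflate: remove components already found (Gram-Schmidt).
            for c in comps:
                p = dot(w, c)
                w = [wi - p * ci for wi, ci in zip(w, c)]
            norm = math.sqrt(dot(w, w)) or 1.0
            v = [wi / norm for wi in w]
        comps.append(v)
    return mu, comps


def recon_error(x, module):
    """Squared residual of x after projection onto a module's principal subspace."""
    mu, comps = module
    d = [xi - m for xi, m in zip(x, mu)]
    for c in comps:
        p = dot(d, c)
        d = [di - p * ci for di, ci in zip(d, c)]
    return dot(d, d)


def classify(x, modules):
    """Pick the subject whose module reconstructs x best."""
    return min(modules, key=lambda label: recon_error(x, modules[label]))


# Hypothetical training vectors: each subject varies along its own direction.
modules = {
    "s1": top_components([(float(t), float(t), 0.0) for t in range(-2, 3)]),
    "s2": top_components([(float(t), float(-t), 5.0) for t in range(-2, 3)]),
}
print(classify((3.0, 3.0, 0.2), modules))   # expected: s1
print(classify((1.0, -1.0, 4.9), modules))  # expected: s2

# A new subject only needs its own module; s1 and s2 are not recomputed.
modules["s3"] = top_components([(10.0, float(t), float(t)) for t in range(-2, 3)])
print(classify((10.0, 1.5, 1.4), modules))  # expected: s3
```

The last three lines show the maintenance benefit the abstract describes. Enrolling a subject adds one independent module and leaves the existing ones untouched.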
2006 IEEE International Conference on Video and Signal Based Surveillance, 2006
Associative memory models have been studied as an efficient paradigm for visual information processing. In this paper, we investigate a new hetero-associative memory (HAM) model based on Kernel Partial Least Squares (KPLS) regression, which was recently proposed in the literature. Partial Least Squares (PLS) regression is a class of techniques for modeling relations between blocks of observed variables by means of latent variables, and kernelized PLS methods derived from it have proved efficient in treating nonlinear data. By establishing the relations between sets of observed variables, KPLS can provide a general-purpose HAM model. Such a model constructs nonlinear regression models in possibly high-dimensional feature spaces and uses a few latent factors to account for most of the variation in the response. For face recognition, we apply the HAM model to efficiently characterize each subject by relating possible variations of a face image to a given face. The scheme uses a modular structure in which an independent model is assigned to each subject. Several benchmark face databases have been used to test the performance.
Principal component analysis (PCA) is a popular tool in multivariate statistics and pattern recognition. Recently, mixture models of local principal component analysis have attracted attention because of a number of benefits over global PCA. In this paper, we propose a mixture model that concurrently performs global data partition and local linear PCA. The partition is optimal or near optimal and is realized by a soft competition algorithm called 'neural gas'. The local PCA-type representation is approximated by a neural learning algorithm in a nonlinear autoencoder network. This network is built on a generalization of the least-squares reconstruction problem that leads to standard PCA. Such a local PCA-type representation has a number of numerical advantages, for example faster convergence and insensitivity to local minima. Based on this mixture model, we describe a modular classification scheme for handwritten digit recognition. We use 10 networks (modules) to capture different features in the 10 classes of handwritten digits, with each network being a mixture model of local PCA-type representations. When a test digit is presented to all the modules, each module provides a reconstructed pattern according to a prescribed principle. The system then outputs the class label by comparing the reconstruction errors from the 10 networks. Compared with some traditional neural-network-based classifiers, our scheme converges faster and recognizes digits with higher accuracy. With a relatively small size for each module, the classification accuracy reaches 98.6% on the training set and 97.8% on the test set.
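The decision step described above can be written compactly as follows. Module $k$ (one per digit class) contains local units $j$, each with a neural-gas center $\mathbf{m}_{kj}$ and a $d \times q$ basis $W_{kj}$ with orthonormal columns. Two parts of this formulation are assumptions. First, the "prescribed principle" is assumed to pick the local unit with the smallest residual. Second, the paper's nonlinear autoencoder is approximated here by a linear projection.

```latex
\hat{\mathbf{x}}_{kj} = \mathbf{m}_{kj} + W_{kj} W_{kj}^{\top}\left(\mathbf{x} - \mathbf{m}_{kj}\right),
\qquad W_{kj}^{\top} W_{kj} = I,
\\[4pt]
\hat{\mathbf{x}}_{k} = \hat{\mathbf{x}}_{k j^{*}},
\qquad j^{*} = \arg\min_{j} \lVert \mathbf{x} - \hat{\mathbf{x}}_{kj} \rVert^{2},
\\[4pt]
\hat{c} = \arg\min_{k \in \{0,\dots,9\}} \lVert \mathbf{x} - \hat{\mathbf{x}}_{k} \rVert^{2}.
```

Unlike a single global subspace per class, each class manifold is covered piecewise by several local linear patches. This is the benefit over global PCA that the abstract refers to.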
2014 7th International Congress on Image and Signal Processing, 2014
This paper presents a novel system for vision-based driving posture recognition. The driving posture dataset was collected with a side-mounted camera viewing the driver's left profile. After pre-processing to compensate for illumination variations, eight action classes that make up driving activities were segmented, including normal driving, operating a cell phone, eating and smoking. The approach emphasizes a global grid-based representation of the action sequence, built in two consecutive steps. Step 1 generates a motion-descriptive shape based on a motion frequency image (MFI), and step 2 applies the pyramid histogram of oriented gradients (PHOG) for a more discriminative characterization. A three-level hierarchical classification system is designed to overcome the difficulty posed by some overlapping classes. Four commonly used classifiers are evaluated at each level: k-nearest neighbor (KNN), random forest (RF), support vector machine (SVM) and multilayer perceptron (MLP). The proposed classification system achieves an overall accuracy of over 87.2% for the eight classes of driving actions.
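Step 1 above can be pictured with a small sketch of a motion frequency image. Here each pixel records how often its intensity changes between consecutive frames. The exact MFI definition used in the paper may differ, for example in frame differencing, normalization or grey-level scaling. The per-pixel fraction, the `threshold` value and the helper name `motion_frequency_image` are assumptions. The PHOG and hierarchical classification stages are not shown.

```python
def motion_frequency_image(frames, threshold=15):
    """Per-pixel fraction of consecutive frame pairs whose intensity change exceeds threshold.

    frames: list of equally sized 2-D grey-level images given as lists of rows.
    """
    if len(frames) < 2:
        raise ValueError("need at least two frames")
    h, w = len(frames[0]), len(frames[0][0])
    counts = [[0] * w for _ in range(h)]
    for prev, cur in zip(frames, frames[1:]):
        for y in range(h):
            for x in range(w):
                # Count a "motion event" where the frame difference is large enough.
                if abs(cur[y][x] - prev[y][x]) > threshold:
                    counts[y][x] += 1
    pairs = len(frames) - 1
    return [[c / pairs for c in row] for row in counts]


# Three hypothetical 2x3 frames: pixel (0,0) changes twice, pixel (1,2) once.
frames = [
    [[0, 0, 0], [0, 0, 0]],
    [[100, 0, 0], [0, 0, 50]],
    [[0, 0, 0], [0, 0, 50]],
]
mfi = motion_frequency_image(frames)
print(mfi)  # [[1.0, 0.0, 0.0], [0.0, 0.0, 0.5]]
```

Frequently moving regions, such as a hand raised to the ear, appear bright in the resulting image. A gradient-based descriptor such as PHOG can then summarize the shape of that image.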
Proceedings of the 17th International Conference on Pattern Recognition (ICPR 2004), 2004
Autoassociation is an important topic in concept learning, and the learned concept of a particular class can be used to distinguish that class from the others. For nonlinear autoassociation, this paper presents a new model referred to as the kernel autoassociator. Using the kernel feature space as a potential nonlinear manifold, the model formulates autoassociation as a special reconstruction problem from the kernel feature space to the input space. Two methods are developed to solve the problem. We evaluate the autoassociator on artificial data and apply it to handwritten digit recognition and multiview face recognition, with positive experimental results.
Face retrieval has received much attention in recent years. This paper comparatively studies five feature description methods for face representation: Local Binary Patterns (LBP), Gabor features, Gray Level Co-occurrence Matrices (GLCM), Pyramid Histogram of Oriented Gradients (PHOG) and the Curvelet Transform (CT). The high dimensionality of the extracted features is addressed with a manifold learning method called Spectral Regression (SR). A fusion scheme that aggregates the distance metrics is also proposed. Experiments show that the dimension-reduced features are more efficient and that the fusion scheme offers much better performance. The fused system achieves a rank-1 accuracy of 98% on the AR faces and 92% on the FERET faces.
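One plausible way to "aggregate the distance metrics" is sketched below. Each descriptor's distances from a probe to the gallery are min-max normalized, the normalized distances are summed, and the probe is assigned the label of the closest gallery face. The normalization, equal weighting, helper names and toy distances are assumptions, not the paper's scheme. The SR dimensionality-reduction step is omitted.

```python
def min_max(ds):
    """Rescale one descriptor's distances to [0, 1] so descriptors are comparable."""
    lo, hi = min(ds), max(ds)
    if hi == lo:
        return [0.0 for _ in ds]
    return [(d - lo) / (hi - lo) for d in ds]


def fused_distances(per_feature):
    """per_feature: one distance list per descriptor, all over the same gallery order."""
    normed = [min_max(ds) for ds in per_feature]
    return [sum(col) for col in zip(*normed)]


def rank1_accuracy(probes, gallery_labels):
    """probes: list of (true_label, per_feature_distances) pairs."""
    hits = 0
    for label, per_feature in probes:
        fused = fused_distances(per_feature)
        best = min(range(len(fused)), key=fused.__getitem__)
        hits += gallery_labels[best] == label
    return hits / len(probes)


# Hypothetical distances from two descriptors (e.g. LBP and Gabor) to a 3-face gallery.
gallery_labels = ["alice", "bob", "carol"]
probes = [
    ("alice", [[0.2, 0.9, 0.5], [30.0, 10.0, 40.0]]),  # Gabor alone would pick bob
    ("carol", [[0.8, 0.7, 0.1], [25.0, 20.0, 5.0]]),
]
print(rank1_accuracy(probes, gallery_labels))  # 1.0
```

Normalization matters here because raw descriptor distances live on very different scales. Without it, the descriptor with the largest numbers would dominate the sum.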
International Journal of Artificial Intelligence and Soft Computing, 2011
This paper presents an approach to offline signature verification and identification. Two image descriptors are studied: the Pyramid Histogram of Oriented Gradients (PHOG) and a direction feature proposed in the literature. Compared with many previously proposed signature feature extraction approaches, PHOG has advantages in extracting discriminative information from handwritten signature images. The importance of the classification framework is also stressed. On the benchmark Grupo de Procesado Digital de Señales (GPDS) database, satisfactory performance was obtained from several classifiers. Among the classifiers compared, SVM is clearly superior. It gives a False Rejection Rate (FRR) of 2.5% and a False Acceptance Rate (FAR) of 2% for skilled forgeries, which compares favorably with the latest published results on the same dataset. This substantiates the superiority of the proposed method. The related problem of offline signature recognition is also investigated with the same approach, achieving an accuracy of 99% on the GPDS data with SVM classification.
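The FRR and FAR figures quoted above are the two error rates of a verification decision at a fixed threshold. The sketch below shows how they are computed. It assumes dissimilarity scores, where lower means more likely genuine. The scores and threshold are hypothetical, not GPDS results, and the helper name `frr_far` is illustrative.

```python
def frr_far(genuine_scores, forgery_scores, threshold):
    """Return (FRR, FAR) for dissimilarity scores; a signature is accepted if score <= threshold."""
    # False rejection: a genuine signature scored above the threshold.
    false_rejects = sum(s > threshold for s in genuine_scores)
    # False acceptance: a forgery scored at or below the threshold.
    false_accepts = sum(s <= threshold for s in forgery_scores)
    return false_rejects / len(genuine_scores), false_accepts / len(forgery_scores)


genuine = [0.1, 0.2, 0.3, 0.9]           # hypothetical genuine-signature scores
forgeries = [0.4, 0.8, 0.95, 1.2, 1.5]   # hypothetical skilled-forgery scores
print(frr_far(genuine, forgeries, 0.5))  # (0.25, 0.2)
```

Raising the threshold lowers FRR at the cost of a higher FAR, and lowering it does the reverse. Both rates should therefore be reported together, as the abstract does.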
Papers by Bailing Zhang