Papers by Mehrtash Harandi
2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015

Science Engineering Faculty, Mar 7, 2013
In the field of face recognition, Sparse Representation (SR) has received considerable attention during the past few years. Most of the relevant literature focuses on holistic descriptors in closed-set identification applications. The underlying assumption in SR-based methods is that each class in the gallery has sufficient samples and the query lies on the subspace spanned by the gallery of the same class. Unfortunately, such an assumption is easily violated in the more challenging face verification scenario, where an algorithm is required to determine if two faces (where one or both have not been seen before) belong to the same person. In this paper, we first discuss why previous attempts with SR might not be applicable to verification problems. We then propose an alternative approach to face verification via SR. Specifically, we propose to use explicit SR encoding on local image patches rather than the entire face. The obtained sparse signals are pooled via averaging to form multiple region descriptors, which are then concatenated to form an overall face descriptor. Due to the deliberate loss of spatial relations within each region (caused by averaging), the resulting descriptor is robust to misalignment and various image deformations. Within the proposed framework, we evaluate several SR encoding techniques: l1-minimisation, Sparse Autoencoder Neural Network (SANN), and an implicit probabilistic technique based on Gaussian Mixture Models. Thorough experiments on AR, FERET, exYaleB, BANCA and ChokePoint datasets show that the proposed local SR approach obtains considerably better and more robust performance than several previous state-of-the-art holistic SR methods, in both verification and closed-set identification problems. The experiments also show that l1-minimisation based encoding has a considerably higher computational cost than the other techniques, but leads to higher recognition rates.
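The patch-wise pipeline described above (encode local patches sparsely, average-pool per region, concatenate regions) can be sketched as follows. This is an illustrative toy with made-up dimensions and a simple ISTA solver as a stand-in for the paper's l1-minimisation encoder, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def ista(D, x, lam=0.1, n_iter=200):
    """l1-regularised sparse coding of x over dictionary D via ISTA:
    solves min_a 0.5*||x - D a||^2 + lam*||a||_1."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        g = a - D.T @ (D @ a - x) / L      # gradient step
        a = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # soft-threshold
    return a

# toy setup: 16-dim patches, a dictionary of 32 atoms, a face split into 4 regions
D = rng.standard_normal((16, 32))
D /= np.linalg.norm(D, axis=0)             # unit-norm atoms

def region_descriptor(patches):
    """Encode every patch in a region, then average-pool the sparse codes;
    the averaging deliberately discards spatial layout within the region."""
    codes = np.stack([ista(D, p) for p in patches])
    return codes.mean(axis=0)

regions = [rng.standard_normal((5, 16)) for _ in range(4)]  # 5 patches/region
face_descriptor = np.concatenate([region_descriptor(r) for r in regions])
print(face_descriptor.shape)               # (128,) = 4 regions x 32 atoms
```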
2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015

Vectors of Locally Aggregated Descriptors (VLAD) have emerged as powerful image/video representations that compete with or even outperform state-of-the-art approaches on many challenging visual recognition tasks. In this paper, we address two fundamental limitations of VLAD: its requirement for the local descriptors to have vector form and its restriction to linear classifiers due to its high-dimensionality. To this end, we introduce a kernelized version of VLAD. This not only lets us inherently exploit more sophisticated classification schemes, but also enables us to efficiently aggregate non-vector descriptors (e.g., tensors) in the VLAD framework. Furthermore, we propose three approximate formulations that allow us to accelerate the coding process while still benefiting from the properties of kernel VLAD. Our experiments demonstrate the effectiveness of our approach at handling manifold-valued data, such as covariance descriptors, on several classification tasks. Our results also evidence the benefits of our nonlinear VLAD descriptors against the linear ones in Euclidean space using several standard benchmark datasets.
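For reference, the standard (linear, Euclidean) VLAD baseline that the paper kernelizes can be sketched as below; the data and codebook here are random placeholders, and the normalisation steps follow common practice rather than any specific formulation in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def vlad(descriptors, centres):
    """Linear VLAD: per-centre sums of residuals to the nearest codebook
    centre, followed by power- and l2-normalisation."""
    assign = np.argmin(
        ((descriptors[:, None, :] - centres[None, :, :]) ** 2).sum(-1), axis=1)
    K, d = centres.shape
    v = np.zeros((K, d))
    for k in range(K):
        sel = descriptors[assign == k]
        if len(sel):
            v[k] = (sel - centres[k]).sum(axis=0)   # aggregate residuals
    v = v.ravel()
    v = np.sign(v) * np.sqrt(np.abs(v))             # power normalisation
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

X = rng.standard_normal((100, 8))    # 100 local 8-dim descriptors
C = rng.standard_normal((4, 8))      # K = 4 codebook centres
v = vlad(X, C)
print(v.shape)                       # (32,): K * d dimensional representation
```

The kernel version replaces the explicit residuals with kernel evaluations, which is what allows non-vector (e.g., manifold-valued) descriptors to be aggregated.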

State-of-the-art image-set matching techniques typically implicitly model each image-set with a Gaussian distribution. Here, we propose to go beyond these representations and model image-sets as probability distribution functions (PDFs) using kernel density estimators. To compare and match image-sets, we exploit Csiszár f-divergences, which bear strong connections to the geodesic distance defined on the space of PDFs, i.e., the statistical manifold. Furthermore, we introduce valid positive definite kernels on the statistical manifolds, which let us make use of more powerful classification schemes to match image-sets. Finally, we introduce a supervised dimensionality reduction technique that learns a latent space where f-divergences reflect the class labels of the data. Our experiments on diverse problems, such as video-based face recognition and dynamic texture classification, evidence the benefits of our approach over the state-of-the-art image-set matching methods.
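The core comparison step (kernel density estimates compared with a Csiszár f-divergence) can be illustrated in one dimension. This is a minimal sketch under our own toy assumptions, using the squared Hellinger distance (f(t) = (√t − 1)²) approximated on a grid, not the paper's exact pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

def kde(samples, grid, h=0.3):
    """Gaussian kernel density estimate evaluated on a grid."""
    diff = (grid[:, None] - samples[None, :]) / h
    return np.exp(-0.5 * diff ** 2).sum(axis=1) / (len(samples) * h * np.sqrt(2 * np.pi))

def hellinger2(p, q, dx):
    """Squared Hellinger distance, a Csiszar f-divergence, via grid quadrature."""
    return 0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2) * dx

grid = np.linspace(-6, 6, 1200)
dx = grid[1] - grid[0]
a = kde(rng.normal(0.0, 1.0, 300), grid)
b = kde(rng.normal(0.0, 1.0, 300), grid)   # same underlying distribution as a
c = kde(rng.normal(3.0, 1.0, 300), grid)   # shifted distribution

# matching image-sets (same source) are closer than mismatched ones
print(hellinger2(a, b, dx) < hellinger2(a, c, dx))  # True
```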

In this paper, we tackle the problem of unsupervised domain adaptation for classification. In the unsupervised scenario where no labeled samples from the target domain are provided, a popular approach consists in transforming the data such that the source and target distributions become similar. To compare the two distributions, existing approaches make use of the Maximum Mean Discrepancy (MMD). However, this does not exploit the fact that probability distributions lie on a Riemannian manifold. Here, we propose to make better use of the structure of this manifold and rely on the distance on the manifold to compare the source and target distributions. In this framework, we introduce a sample selection method and a subspace-based method for unsupervised domain adaptation, and show that both these manifold-based techniques outperform the corresponding approaches based on the MMD. Furthermore, we show that our subspace-based approach yields state-of-the-art results on a standard obj...
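The MMD baseline that this work argues against is straightforward to compute. The following is a generic sketch (biased estimator, RBF kernel, toy Gaussian data), not the paper's manifold-based alternative:

```python
import numpy as np

rng = np.random.default_rng(0)

def mmd2(X, Y, gamma=0.5):
    """Biased estimate of squared Maximum Mean Discrepancy with an RBF kernel:
    mean k(X,X) + mean k(Y,Y) - 2 mean k(X,Y)."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

src = rng.normal(0.0, 1.0, (200, 3))       # "source domain" samples
tgt_near = rng.normal(0.0, 1.0, (200, 3))  # target matching the source
tgt_far = rng.normal(2.0, 1.0, (200, 3))   # target shifted away from it

print(mmd2(src, tgt_near) < mmd2(src, tgt_far))  # True: shift raises MMD
```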

International Journal of Computer Vision, 2015
Sparsity-based representations have recently led to notable results in various visual recognition tasks. In a separate line of research, Riemannian manifolds have been shown useful for dealing with features and models that do not lie in Euclidean spaces. With the aim of building a bridge between the two realms, we address the problem of sparse coding and dictionary learning over the space of linear subspaces, which form Riemannian structures known as Grassmann manifolds. To this end, we propose to embed Grassmann manifolds into the space of symmetric matrices by an isometric mapping. This in turn enables us to extend two sparse coding schemes to Grassmann manifolds. Furthermore, we propose closed-form solutions for learning a Grassmann dictionary, atom by atom. Lastly, to handle non-linearity in data, we extend the proposed Grassmann sparse coding and dictionary learning algorithms through embedding into Hilbert spaces.
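The embedding of a subspace into the space of symmetric matrices is commonly realised by the projection mapping span(U) ↦ UUᵀ, which is basis-invariant and induces a distance between subspaces. A minimal sketch of this idea (our own toy construction, not the paper's full coding scheme):

```python
import numpy as np

rng = np.random.default_rng(0)

def proj(U):
    """Embed the subspace span(U) (U with orthonormal columns) into the
    space of symmetric matrices as the projection matrix P = U U^T."""
    return U @ U.T

def grassmann_dist(U1, U2):
    """Projection (chordal) Frobenius distance induced by the embedding."""
    return np.linalg.norm(proj(U1) - proj(U2), 'fro') / np.sqrt(2)

def orth(A):
    Q, _ = np.linalg.qr(A)
    return Q

U = orth(rng.standard_normal((10, 3)))     # a 3-dim subspace of R^10
R = orth(rng.standard_normal((3, 3)))      # change of basis within the subspace
V = orth(rng.standard_normal((10, 3)))     # a different subspace

print(np.allclose(proj(U), proj(U @ R)))   # True: embedding is basis-invariant
print(grassmann_dist(U, V) > 1e-6)         # True: distinct subspaces separate
```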
2015 IEEE Winter Conference on Applications of Computer Vision, 2015

IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015
In this paper, we develop an approach to exploiting kernel methods with manifold-valued data. In many computer vision problems, the data can be naturally represented as points on a Riemannian manifold. Due to the non-Euclidean geometry of Riemannian manifolds, usual Euclidean computer vision and machine learning algorithms yield inferior results on such data. In this paper, we define Gaussian radial basis function (RBF)-based positive definite kernels on manifolds that permit us to embed a given manifold with a corresponding metric in a high dimensional reproducing kernel Hilbert space. These kernels make it possible to utilize algorithms developed for linear spaces on nonlinear manifold-valued data. Since the Gaussian RBF defined with any given metric is not always positive definite, we present a unified framework for analyzing the positive definiteness of the Gaussian RBF on a generic metric space. We then use the proposed framework to identify positive definite kernels on two specific manifolds commonly encountered in computer vision: the Riemannian manifold of symmetric positive definite matrices and the Grassmann manifold, i.e., the Riemannian manifold of linear subspaces of a Euclidean space. We show that many popular algorithms designed for Euclidean spaces, such as support vector machines, discriminant analysis and principal component analysis can be generalized to Riemannian manifolds with the help of such positive definite Gaussian kernels.
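As an illustration of a Gaussian RBF on the SPD manifold, the sketch below uses the log-Euclidean distance, one metric for which the resulting kernel is known to be positive definite; the random SPD matrices and the bandwidth are our own toy choices:

```python
import numpy as np

def spd_log(A):
    """Matrix logarithm of an SPD matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.log(w)) @ V.T

def rbf_kernel_spd(mats, gamma=0.5):
    """Gaussian RBF Gram matrix built on the log-Euclidean distance
    between SPD matrices: k(A,B) = exp(-gamma * ||log A - log B||_F^2)."""
    logs = [spd_log(M) for M in mats]
    n = len(mats)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            d = np.linalg.norm(logs[i] - logs[j], 'fro')
            K[i, j] = np.exp(-gamma * d ** 2)
    return K

rng = np.random.default_rng(0)

def random_spd(d=4):
    M = rng.standard_normal((d, d))
    return M @ M.T + d * np.eye(d)

K = rbf_kernel_spd([random_spd() for _ in range(5)])
print(bool(np.all(np.linalg.eigvalsh(K) > -1e-10)))  # True: the Gram matrix is PSD
```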

Pattern Recognition, 2015
Integrated analysis of spatial and temporal domains is considered to overcome some of the challenging computer vision problems such as ‘Dynamic Scene Understanding’ and ‘Action Recognition’. In visual tracking, ‘Spatiotemporal Oriented Energy’ (SOE) features are successfully applied to locate the object in cluttered scenes under varying illumination. In contrast to previous studies, this paper introduces SOE features for occlusion modeling and novelty detection in tracking. To this end, we propose a Bayesian state machine that exploits SOE information to analyze occlusion and identify the target status in the course of tracking. The proposed approach can be seamlessly merged with a generic tracking system to prevent template corruption (for example when the target is occluded). Comparative evaluations show that the proposed approach could significantly improve the performance of a generic tracking system in challenging occlusion situations.

Symmetric Positive Definite (SPD) matrices in the form of region covariances are considered rich descriptors for images and videos. Recent studies suggest that exploiting the Riemannian geometry of the SPD manifolds could lead to improved performances for vision applications. For tasks involving processing large-scale and dynamic data in computer vision, the underlying model is required to progressively and efficiently adapt itself to the new and unseen observations. Motivated by these requirements, this paper studies the problem of online dictionary learning on the SPD manifolds. We make use of the Stein divergence to recast the problem of online dictionary learning on the manifolds to a problem in Reproducing Kernel Hilbert Spaces, for which we develop efficient algorithms by taking into account the geometric structure of the SPD manifolds. To the best of our knowledge, our work is the first study that provides a solution for online dictionary learning on the SPD manifolds. Empirical results on both a large-scale image classification task and dynamic video processing tasks validate the superior performance of our approach as compared to several state-of-the-art algorithms.
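The symmetric Stein (S-)divergence used here has a simple closed form, S(A,B) = log det((A+B)/2) − ½ log det(AB). A small sketch with random SPD matrices (our own toy data) verifying its basic properties:

```python
import numpy as np

def stein_divergence(A, B):
    """Symmetric Stein (S-)divergence between SPD matrices:
    S(A,B) = log det((A+B)/2) - 0.5 * (log det A + log det B)."""
    _, ld_mid = np.linalg.slogdet((A + B) / 2)
    _, ld_a = np.linalg.slogdet(A)
    _, ld_b = np.linalg.slogdet(B)
    return ld_mid - 0.5 * (ld_a + ld_b)

rng = np.random.default_rng(0)

def random_spd(d=4):
    M = rng.standard_normal((d, d))
    return M @ M.T + d * np.eye(d)

A, B = random_spd(), random_spd()
print(abs(stein_divergence(A, A)) < 1e-10)                         # True: S(A,A) = 0
print(stein_divergence(A, B) >= 0)                                 # True: non-negative
print(bool(np.isclose(stein_divergence(A, B), stein_divergence(B, A))))  # True: symmetric
```

Exponentiating the negated divergence, exp(−β S(A,B)), gives the Stein kernel that underlies the RKHS reformulation.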

A robust visual tracking system requires the object model to handle occlusions, deformations, as well as variations in pose and illumination. To handle such challenges, in this paper we propose a tracking approach where the object is modelled as a continuously updated set of affine subspaces, with each subspace constructed from the object's appearance over several consecutive frames. Furthermore, during the search for the object's location in a new frame, we propose to represent the candidate regions also as affine subspaces, by including the immediate tracking history over several frames. Distances between affine subspaces from the object model and candidate regions are obtained by exploiting the non-Euclidean geometry of Grassmann manifolds. Quantitative evaluations on challenging image sequences indicate that the proposed approach obtains considerably better performance than several recent state-of-the-art methods such as Tracking-Learning-Detection and Multiple Instance Learning Tracking.

IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014
Low-dimensional representations are key to the success of many video classification algorithms. However, the commonly-used dimensionality reduction techniques fail to account for the fact that only part of the signal is shared across all the videos in one class. As a consequence, the resulting representations contain instance-specific information, which introduces noise in the classification process. In this paper, we introduce non-linear stationary subspace analysis: a method that overcomes this issue by explicitly separating the stationary parts of the video signal (i.e., the parts shared across all videos in one class), from its non-stationary parts (i.e., the parts specific to individual videos). Our method also encourages the new representation to be discriminative, thus accounting for the underlying classification problem. We demonstrate the effectiveness of our approach on dynamic texture recognition, scene classification and action recognition.

2012 19th IEEE International Conference on Image Processing, 2012
For covariance-based image descriptors, taking into account the curvature of the corresponding feature space has been shown to improve discrimination performance. This is often done through representing the descriptors as points on Riemannian manifolds, with the discrimination accomplished on a tangent space. However, such treatment is restrictive as distances between arbitrary points on the tangent space do not represent true geodesic distances, and hence do not represent the manifold structure accurately. In this paper we propose a general discriminative model based on the combination of several tangent spaces, in order to preserve more details of the structure. The model can be used as a weak learner in a boosting-based pedestrian detection framework. Experiments on the challenging INRIA and DaimlerChrysler datasets show that the proposed model leads to considerably higher performance than methods based on histograms of oriented gradients as well as previous Riemannian-based techniques.
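The tangent-space construction underlying such methods is the Riemannian log map, which flattens the SPD manifold around a chosen pole. A minimal sketch under the affine-invariant metric (our own toy data, not the paper's boosting framework):

```python
import numpy as np

def spd_power(A, p):
    """Matrix power of an SPD matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * w ** p) @ V.T

def log_map(P, X):
    """Riemannian log map on the SPD manifold (affine-invariant metric):
    maps X to a symmetric matrix in the tangent space at the pole P."""
    s, si = spd_power(P, 0.5), spd_power(P, -0.5)
    M = si @ X @ si
    w, V = np.linalg.eigh(M)
    return s @ ((V * np.log(w)) @ V.T) @ s

rng = np.random.default_rng(0)

def random_spd(d=3):
    M = rng.standard_normal((d, d))
    return M @ M.T + d * np.eye(d)

P, X = random_spd(), random_spd()
T = log_map(P, X)
print(np.allclose(T, T.T))                           # True: tangent vectors are symmetric
print(np.allclose(log_map(P, P), np.zeros_like(P)))  # True: the pole maps to the origin
```

Combining several tangent spaces, as the paper proposes, amounts to evaluating `log_map` at several poles and letting a boosted classifier pick among the resulting flattenings.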
2012 19th IEEE International Conference on Image Processing, 2012
With the aim of improving the clustering of data (such as image sequences) lying on Grassmann manifolds, we propose to embed the manifolds into Reproducing Kernel Hilbert Spaces. To this end, we define a measure of cluster distortion and embed the manifolds such that the distortion is minimised. We show that the optimal solution is a generalised eigenvalue problem that can be solved very efficiently. Experiments on several clustering tasks (including human action clustering) show that in comparison to the recent intrinsic Grassmann k-means algorithm, the proposed approach obtains notable improvements in clustering accuracy, while also being several orders of magnitude faster.
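Generalised eigenvalue problems of the form A w = λ B w are indeed cheap to solve with standard linear algebra routines. The sketch below uses hypothetical stand-ins for the two matrices that such a distortion objective would produce (a symmetric A and a positive definite B), not the paper's actual objective:

```python
import numpy as np
from scipy.linalg import eigh   # solves the generalised problem A w = lam B w

rng = np.random.default_rng(0)

# hypothetical stand-in matrices: A symmetric, B symmetric positive definite
A = rng.standard_normal((6, 6)); A = A + A.T
M = rng.standard_normal((6, 6)); B = M @ M.T + 6 * np.eye(6)

vals, vecs = eigh(A, B)            # eigenvalues in ascending order
w = vecs[:, -1]                    # eigenvector of the largest eigenvalue
ratio = (w @ A @ w) / (w @ B @ w)  # the Rayleigh quotient it maximises
print(bool(np.isclose(ratio, vals[-1])))  # True
```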
2013 IEEE Workshop on Applications of Computer Vision (WACV), 2013

Lecture Notes in Computer Science, 2012
Recent advances suggest that a wide range of computer vision problems can be addressed more appropriately by considering non-Euclidean geometry. This paper tackles the problem of sparse coding and dictionary learning in the space of symmetric positive definite matrices, which form a Riemannian manifold. With the aid of the recently introduced Stein kernel (related to a symmetric version of Bregman matrix divergence), we propose to perform sparse coding by embedding Riemannian manifolds into reproducing kernel Hilbert spaces. This leads to a convex and kernel version of the Lasso problem, which can be solved efficiently. We furthermore propose an algorithm for learning a Riemannian dictionary (used for sparse coding), closely tied to the Stein kernel. Experiments on several classification tasks (face recognition, texture classification, person reidentification) show that the proposed sparse coding approach achieves notable improvements in discrimination accuracy, in comparison to state-of-the-art methods such as tensor sparse coding, Riemannian locality preserving projection, and symmetry-driven accumulation of local features.

2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013
Existing multi-model approaches for image set classification extract local models by clustering each image set individually only once, with fixed clusters used for matching with other image sets. However, this may result in the two closest clusters representing different characteristics of an object, due to different undesirable environmental conditions (such as variations in illumination and pose). To address this problem, we propose to constrain the clustering of each query image set by forcing the clusters to have resemblance to the clusters in the gallery image sets. We first define a Frobenius norm distance between subspaces over Grassmann manifolds based on reconstruction error. We then extract local linear subspaces from a gallery image set via sparse representation. For each local linear subspace, we adaptively construct the corresponding closest subspace from the samples of a probe image set by joint sparse representation. We show that by minimising the sparse representation reconstruction error, we approach the nearest point on a Grassmann manifold. Experiments on Honda, ETH-80 and Cambridge-Gesture datasets show that the proposed method consistently outperforms several other recent techniques, such as Affine Hull based Image Set Distance (AHISD), Sparse Approximated Nearest Points (SANP) and Manifold Discriminant Analysis (MDA).

IEEE Transactions on Neural Networks and Learning Systems, 2015
This paper introduces sparse coding and dictionary learning for Symmetric Positive Definite (SPD) matrices, which are often used in machine learning, computer vision and related areas. Unlike traditional sparse coding schemes that work in vector spaces, in this paper we discuss how SPD matrices can be described by sparse combination of dictionary atoms, where the atoms are also SPD matrices. We propose to seek sparse coding by embedding the space of SPD matrices into Hilbert spaces through two types of Bregman matrix divergences. This not only leads to an efficient way of performing sparse coding, but also an online and iterative scheme for dictionary learning. We apply the proposed methods to several computer vision tasks where images are represented by region covariance matrices. Our proposed algorithms outperform state-of-the-art methods on a wide range of classification tasks, including face recognition, action recognition, material classification and texture categorization.