Academia.edu no longer supports Internet Explorer.
To browse Academia.edu and the wider internet faster and more securely, please take a few seconds to upgrade your browser.
2019, Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
We present a reformulation of the distance metric learning problem as a penalized optimization problem, with a penalty term corresponding to the von Neumann entropy of the distance metric. This formulation leads to a mapping to statistical mechanics such that the metric learning optimization problem becomes equivalent to free energy minimization. Correspondingly, our approach leads to an analytical solution of the optimization problem based on the Boltzmann distribution. The mapping established in this work suggests new approaches for dimensionality reduction and provides insights into determination of optimal parameters for the penalty term. Furthermore, we demonstrate that the metric projects the data onto direction of maximum dissimilarity with optimal and tunable separation between classes and thus the transformation can be used for high dimensional data visualization, classification, and clustering tasks. We benchmark our method against previous distance learning methods and provide an efficient implementation in an R package
2020
The success of many machine learning algorithms (e.g. the nearest neighborhood classification and k-means clustering) depends on the representation of the data as elements in a metric space. Learning an appropriate distance metric from data is usually superior to the default Euclidean distance. In this paper, we revisit the original model proposed by Xing et al. [24] and propose a general formulation of learning a Mahalanobis distance from data. We prove that this novel formulation is equivalent to a convex optimization problem over the spectrahedron. Then, a gradientbased optimization algorithm is proposed to obtain the optimal solution which only needs the computation of the largest eigenvalue of a matrix per iteration. Finally, experiments on various UCI datasets and a benchmark face verification dataset called Labeled Faces in the Wild (LFW) demonstrate that the proposed method compares competitively to those state-of-the-art methods.
2023
Dimensionality reduction techniques are crucial for modern data analysis applications due to the large complexity of the datasets found in several domains of science. In order to find relevant patterns in this vast amount of data, a feature extraction step is often required. Linear approaches were the first class of methods applied to reduce dimensionality of data. However, they assume that the extracted features lie in an Euclidean space, which is not a reasonable assuption in many real problems. In 2000, with advances on kernel methods and computational resources, the research field manifold learning started to emerge with the pioneering ISOMAP algorithm, showing a much more realistic model for dimensionality reduction for metric learning. This paper presents a review of unsupervised metric learning techniques ranging from Principal Component Analysis (PCA) to modern manifold learning algorithms, such as t-SNE, Local Tangent Space Aligment, Diffusion Maps, among others, with a special attention to the mathematical background, but at the same time trying to elucidate the intuition behind each method. This year we celebrate the 20th anniversary of this remarkable field which had profound impact in the study of unsupervised metric learning methods, in the sense that these methods can learn a distance function that geometrically is better suited to represent a similarity measure between a pair of objects in a given dataset. Basically, our goal with this review is to provide a complete guide to researchers interested in initiating at the field.
2000
Many machine learning algorithms, such as K Nearest Neighbor (KNN), heavily rely on the distance metric for the input data patterns. Distance Metric learning is to learn a distance metric for the input space of data from a given collection of pair of similar/dissimilar points that preserves the distance relation among the training data. In recent years, many studies have demonstrated, both empirically and theoretically, that a learned metric can significantly improve the performance in classification, clustering and retrieval tasks. This paper surveys the field of distance metric learning from a principle perspective, and includes a broad selection of recent work. In particular, distance metric learning is reviewed under different learning conditions: supervised learning versus unsupervised learning, learning in a global sense versus in a local sense; and the distance matrix based on linear kernel versus nonlinear kernel. In addition, this paper discusses a number of techniques that is central to distance metric learning, including convex programming, positive semi-definite programming, kernel learning, dimension reduction, K Nearest Neighbor, large margin classification, and graph-based approaches.
2012 IEEE International Conference on Multimedia and Expo, 2012
We introduce a new method for face recognition using a versatile probabilistic model known as Restricted Boltzmann Machine (RBM). In particular, we propose to regularise the standard data likelihood learning with an informationtheoretic distance metric defined on intra-personal images. This results in an effective face representation which captures the regularities in the face space and minimises the intra-personal variations. In addition, our method allows easy incorporation of multiple feature sets with controllable level of sparsity. Our experiments on a high variation dataset show that the proposed method is competitive against other metric learning rivals. We also investigated the RBM method under a variety of settings, including fusing facial parts and utilising localised feature detectors under varying resolutions. In particular, the accuracy is boosted from 71.8% with the standard whole-face pixels to 99.2% with combination of facial parts, localised feature extractors and appropriate resolutions.
Pattern Recognition, 2010
The problem of clustering with side information has received much recent attention and metric learning has been considered as a powerful approach to this problem. Until now, various metric learning methods have been proposed for semi-supervised clustering. Although some of the existing methods can use both positive (must-link) and negative (cannot-link) constraints, they are usually limited to learning a linear transformation (i.e., finding a global Mahalanobis metric). In this paper, we propose a framework for learning linear and non-linear transformations efficiently. We use both positive and negative constraints and also the intrinsic topological structure of data. We formulate our metric learning method as an appropriate optimization problem and find the global optimum of this problem. The proposed non-linear method can be considered as an efficient kernel learning method that yields an explicit non-linear transformation and thus shows out-of-sample generalization ability. Experimental results on synthetic and real-world data sets show the effectiveness of our metric learning method for semi-supervised clustering tasks.
Many algorithms rely critically on being given a good metric over their inputs. For instance, data can often be clustered in many "plausible" ways, and if a clustering algorithm such as K-means initially fails to find one that is meaningful to a user, the only recourse may be for the user to manually tweak the metric until sufficiently good clusters are found. For these and other applications requiring good metrics, it is desirable that we provide a more systematic way for users to indicate what they consider "similar." For instance, we may ask them to provide examples. In this paper, we present an algorithm that, given examples of similar (and, if desired, dissimilar) pairs of points in ¢ ¤ £ , learns a distance metric over ¢ ¥ £ that respects these relationships. Our method is based on posing metric learning as a convex optimization problem, which allows us to give efficient, local-optima-free algorithms. We also demonstrate empirically that the learned metrics can be used to significantly improve clustering performance.
National Conference on Artificial Intelligence, 2008
There has been much recent attention to the problem of learning an appropriate distance metric, using class labels or other side information. Some proposed algorithms are iterative and computationally expensive. In this paper, we show how to solve one of these methods with a closed-form solution, rather than using semidefinite programming. We provide a new problem setup in which the algorithm performs better or as well as some standard methods, but without the computational complexity. Furthermore, we show a strong relationship between these methods and the Fisher Discriminant Analysis.
2003
In this paper, we propose a feature weighting method that works in both the input space and the kernel-induced feature space. It assumes only the availability of similarity (dissimilarity) information, and the number of parameters in the transformation does not depend on the number of features. Besides feature weighting, it can also be regarded as performing nonparametric kernel adaptation. Experimental results on both toy and real-world datasets show promising results.
Neural Networks, 2005. …, 2005
Abstract-Defining a good distance measure between patterns is of crucial importance in many classification andclustering algorithms. Recently, relevant component analysis (RCA) is proposed which offers a simple yet powerful method to learn this distance metric. However, it is ...
2012
We propose a general information-theoretic approach called SERAPH (SEmi-supervised metRic leArning Paradigm with Hyper-sparsity) for metric learning that does not rely upon the manifold assumption. Given the probability parameterized by a Mahalanobis distance, we maximize the entropy of that probability on labeled data and minimize it on unlabeled data following entropy regularization, which allows the supervised and unsupervised parts to be integrated in a natural and meaningful way. Furthermore, SERAPH is regularized by encouraging a low-rank projection induced from the metric. The optimization of SER-APH is solved efficiently and stably by an EMlike scheme with the analytical E-Step and convex M-Step. Experiments demonstrate that SER-APH compares favorably with many well-known global and local metric learning methods.
National Conference on Artificial Intelligence, 2006
Learning application-specific distance metrics from labeled data is critical for both statistical classification and information retrieval. Most of the earlier work in this area has focused on finding metrics that simultaneously optimize compactness and separability in a global sense. Specifically, such distance metrics attempt to keep all of the data points in each class close together while ensuring that data points from different classes are separated. However, particularly when classes exhibit multimodal data distributions, these goals conflict and thus cannot be simultaneously satisfied. This paper proposes a Local Distance Metric (LDM) that aims to optimize local compactness and local separability. We present an efficient algorithm that employs eigenvector analysis and bound optimization to learn the LDM from training data in a probabilistic framework. We demonstrate that LDM achieves significant improvements in both classification and retrieval accuracy compared to global distance learning and kernel-based KNN.
ArXiv, 2021
Distance metric learning algorithms aim to appropriately measure similarities and distances between data points. In the context of clustering, metric learning is typically applied with the assist of side-information provided by experts, most commonly expressed in the form of cannot-link and must-link constraints. In this setting, distance metric learning algorithms move closer pairs of data points involved in must-link constraints, while pairs of points involved in cannot-link constraints are moved away from each other. For these algorithms to be effective, it is important to use a distance metric that matches the expert knowledge, beliefs, and expectations, and the transformations made to stick to the side-information should preserve geometrical properties of the dataset. Also, it is interesting to filter the constraints provided by the experts to keep only the most useful and reject those that can harm the clustering process. To address these issues, we propose to exploit the dual...
Research Square (Research Square), 2024
Clustering does not assume the knowledge of the class labels, defining an unsupervised learning paradigm relevant for many pattern recognition and machine learning problems. One of the limitations of clustering is that most algorithms rely heavily in a distance function used to compute a dissimilarity measure between the samples. The usual choice is the Euclidean distance, which is known to have a poor discriminant power in high-dimensional spaces and also is quite sensitive to the presence of outliers. In this paper, we propose to investigate how unsupervised metric learning via the curvature-based ISOMAP algorithm (K-ISOMAP) can influence the clustering performance by means of quantitative evaluation metrics. The computation of the local curvature in the dimensionality reduction process allows the incorporation of an adaptive and intrinsic distance metric function into clustering, making it more aware of the geometry of the underlying data manifold. Computational experiments with real-world datasets indicate that the use of K-ISOMAP prior to a clustering algorithm may produce superior Rand, Calinski-Harabasz and Fowlkes-Mallows indices, in comparison to raw and regular ISOMAP data, suggesting that learning a suitable metric can be a relevant pre-processing step before clustering.
In this paper, we introduce two novel metric learning algorithms, ��2-LMNN and GB-LMNN, which are explicitly designed to be non-linear and easy-to-use. The two approaches achieve this goal in fundamentally different ways: ��2-LMNN inherits the computational benefits of a linear mapping from linear metric learning, but uses a non-linear ��2-distance to explicitly capture similarities within histogram data sets; GB-LMNN applies gradient-boosting to learn non-linear mappings directly in function space and takes advantage of this ...
ACM Transactions on Intelligent Systems and Technology, 2020
We propose a novel deep metric learning method. Differently from many works on this area, we defined a novel latent space obtained through an autoencoder. The new space, namely S-space, is divided into different regions that describe the positions where pairs of objects are similar/dissimilar. We locate makers to identify these regions. We estimate the similarities between objects through a kernel-based t-student distribution to measure the markers' distance and the new data representation. In our approach, we simultaneously estimate the markers' position in the S-space and represent the objects in the same space. Moreover, we propose a new regularization function to avoid similar markers to collapse altogether. We present evidences that our proposal can represent complex spaces, for instance, when groups of similar objects are located in disjoint regions. We compare our proposal to 9 different distance metric learning approaches (four of them are based on deep-learning) on 28 real-world heterogeneous datasets. According to the four quantitative metrics used, our method overcomes all the nine strategies from the literature.
2003
We address the problem of learning distance metrics using side-information in the form of groups of "similar" points. We propose to use the RCA algorithm, which is a simple and efficient algorithm for learning a full ranked Mahalanobis metric . We first show that RCA obtains the solution to an interesting optimization problem, founded on an information theoretic basis. If the Mahalanobis matrix is allowed to be singular, we show that Fisher's linear discriminant followed by RCA is the optimal dimensionality reduction algorithm under the same criterion. We then show how this optimization problem is related to the criterion optimized by another recent algorithm for metric learning , which uses the same kind of side information. We empirically demonstrate that learning a distance metric using the RCA algorithm significantly improves clustering performance, similarly to the alternative algorithm. Since the RCA algorithm is much more efficient and cost effective than the alternative, as it only uses closed form expressions of the data, it seems like a preferable choice for the learning of full rank Mahalanobis distances.
2007
Distance metric learning and nonlinear dimensionality reduction are two interesting and active topics in recent years. However, the connection between them is not thoroughly studied yet. In this paper, a transductive framework of distance metric learning is proposed and its close connection with many nonlinear spectral dimensionality reduction methods is elaborated. Furthermore, we prove a representer theorem for our framework, linking it with function estimation in an RKHS, and making it possible for generalization to unseen test samples. In our framework, it suffices to solve a sparse eigenvalue problem, thus datasets with 10 5 samples can be handled. Finally, experiment results on synthetic data, several UCI databases and the MNIST handwritten digit database are shown.
2015
Distance metric learning has proven to be very successful in various problem domains. Most techniques learn a global metric in the form of a n × n symmetric positive semidefinite (PSD) Mahalanobis distance matrix, which has O(n 2) unknowns. The PSD constraint makes solving the metric learning problem even harder making it computationally intractable for high dimensions. In this work, we propose a flexible formulation that can employ different regularization functions, while implicitly maintaining the positive semidefiniteness constraint. We achieve this by eigendecomposition of the rank p Mahalanobis distance matrix followed by a joint optimization on the Stiefel manifold S n,p and the positive orthant R p +. The resulting nonconvex optimization problem is solved by employing an alternating strategy. We use a recently proposed projection free approach for efficient optimization over the Stiefel manifold. Even though the problem is nonconvex, we empirically show competitive classification accuracy on UCI and USPS digits datasets.
Most of the current metric learning methods are proposed for point-to-point distance (PPD) based classification. In many computer vision tasks, however, we need to measure the point-to-set distance (PSD) and even set-to-set distance (SSD) for classification. In this paper, we extend the PPD based Mahalanobis distance metric learning to PSD and SSD based ones, namely point-to-set distance metric learning (PSDML) and set-to-set distance metric learning (SSDML), and solve them under a unified optimization framework. First, we generate positive and negative sample pairs by computing the PSD and SSD between training samples. Then, we characterize each sample pair by its covariance matrix, and propose a covariance kernel based discriminative function. Finally, we tackle the PSDML and SSDML problems by using standard support vector machine solvers, making the metric learning very efficient for multiclass visual classification tasks. Experiments on gender classification, digit recognition, object categorization and face recognition show that the proposed metric learning methods can effectively enhance the performance of PSD and SSD based classification.
Loading Preview
Sorry, preview is currently unavailable. You can download the paper by clicking the button above.