2006, Pattern Recognition
Multi-way partitioning of an undirected weighted graph, where pairwise similarities are assigned as edge weights, provides an important tool for data clustering, but it is an NP-hard problem. Spectral relaxation is a popular way of relaxing the problem, leading to spectral clustering, where the clustering is performed via the eigendecomposition of the (normalized) graph Laplacian. Semidefinite relaxation, on the other hand, is an alternative way of relaxing a combinatorial optimization problem, leading to a convex optimization problem. In this paper we employ a semidefinite programming (SDP) approach to graph equipartitioning for clustering, where sufficient conditions for strong duality hold. The method is referred to as semidefinite spectral clustering; the clustering is based on the eigendecomposition of the optimal feasible matrix computed by SDP. Numerical experiments with several data sets demonstrate the useful behavior of our semidefinite spectral clustering compared to existing spectral clustering methods.
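As a hedged illustration of this approach, the following sketch relaxes 2-way equipartitioning with the cvxpy modeling library and clusters via the eigendecomposition of the optimal feasible matrix; the paper's exact formulation, solver, and rounding are not reproduced here, only the generic relaxation idea.

```python
import numpy as np
import cvxpy as cp

def sdp_bipartition(W):
    """SDP relaxation of 2-way graph equipartitioning (a minimal sketch,
    not the paper's exact formulation). X relaxes the rank-one matrix
    x x^T for a balanced cluster indicator x in {-1, +1}^n."""
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W            # graph Laplacian
    X = cp.Variable((n, n), PSD=True)
    constraints = [cp.diag(X) == 1,           # relaxes x_i^2 = 1
                   cp.sum(X) == 0]            # relaxes (1^T x)^2 = 0: balance
    cp.Problem(cp.Minimize(cp.trace(L @ X)), constraints).solve()
    # Cluster from the eigendecomposition of the optimal feasible matrix.
    _, vecs = np.linalg.eigh(X.value)
    return np.sign(vecs[:, -1])               # signs of the leading eigenvector
```

The objective uses the identity cut(S, S̄) = ¼ xᵀLx for a ±1 indicator, so minimizing trace(LX) over the relaxed feasible set lower-bounds the balanced cut.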
Lecture Notes in Computer Science, 2002
A semidefinite program (SDP) is an optimization problem over n × n symmetric matrices in which a linear function of the entries is to be minimized subject to linear equality constraints and the condition that the unknown matrix is positive semidefinite. Standard techniques for solving SDPs require O(n^3) operations per iteration. We introduce subspace algorithms that greatly reduce the cost of solving large-scale SDPs. We apply these algorithms to SDP approximations of graph partitioning problems. We numerically compare our new algorithm with a standard semidefinite programming algorithm and show that our subspace algorithm performs better.
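The source of the cost reduction can be sketched generically (this is an assumption-laden illustration of the subspace idea, not the authors' algorithm): restricting the n × n variable to X = V Y Vᵀ leaves only a k × k PSD variable per solve.

```python
import numpy as np
import cvxpy as cp

def subspace_restricted_sdp(L, V):
    """Generic subspace restriction of an SDP (illustrative sketch only).
    Substituting X = V Y V^T with a thin basis V (n x k) leaves a k x k
    PSD variable Y, which is where the cost reduction comes from.
    Assumes V is chosen so the restricted problem stays feasible
    (e.g., rows of V with unit norm make Y = I feasible)."""
    k = V.shape[1]
    Y = cp.Variable((k, k), PSD=True)          # X = V Y V^T is then PSD too
    X = V @ Y @ V.T
    cp.Problem(cp.Minimize(cp.trace(L @ X)), [cp.diag(X) == 1]).solve()
    return V @ Y.value @ V.T
```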
Studies in big data, 2017
This chapter discusses clustering methods based on similarities between pairs of objects. Such knowledge does not require that the objects be embedded in a metric space; instead, this local knowledge supports a graphical representation displaying relationships among the objects of a given data set. The problem of data clustering then transforms into the problem of graph partitioning, and this partitioning is obtained by analysing eigenvectors of the graph Laplacian, a basic tool of spectral graph theory. We explain how various forms of the graph Laplacian are used in various graph partitioning criteria, and how these translate into particular algorithms. There is a strong and fascinating relationship between the graph Laplacian and random walks on a graph. In particular, it allows us to formulate a number of other clustering criteria and other data clustering algorithms. We briefly review these problems. It should be noted that the eigenvectors deliver a so-called spectral representation of the data items. Unfortunately, this representation is fixed for a given data set, and adding or deleting items destroys it. We therefore discuss recently invented methods of out-of-sample spectral clustering that overcome this disadvantage. Although spectral methods are successful in extracting non-convex groups in data, forming the graph Laplacian is memory-consuming and computing its eigenvectors is time-consuming. We thus discuss various local methods in which only the relevant parts of the graph are considered. Moreover, we mention a number of methods allowing fast, approximate computation of the eigenvectors.
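For concreteness, here is a minimal sketch of the Laplacian variants such chapters refer to, together with the random-walk connection (a symmetric affinity W with no isolated vertices is assumed):

```python
import numpy as np

def graph_laplacians(W):
    """The three standard graph Laplacians used in spectral clustering
    (minimal sketch; assumes symmetric W, no isolated vertices)."""
    d = W.sum(axis=1)
    L = np.diag(d) - W                          # unnormalized Laplacian
    D_isqrt = np.diag(1.0 / np.sqrt(d))
    L_sym = D_isqrt @ L @ D_isqrt               # symmetric normalized
    L_rw = np.diag(1.0 / d) @ L                 # random-walk normalized
    # Connection to random walks: with transition matrix P = D^{-1} W,
    # L_rw = I - P, so the two operators share eigenvectors.
    return L, L_sym, L_rw
```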
Journal of Computer Science and Cybernetics, 2015
Cluster analysis is an unsupervised technique for grouping related objects without considering their label or class. Objects belonging to the same cluster are relatively more homogeneous than those in other clusters. Cluster analysis is applied in areas such as gene expression analysis, galaxy formation, natural language processing, and image segmentation. The clustering problem can be formulated as a graph cut problem in which a suitable objective function has to be optimized. This study uses different graph clustering formulations based on graph cut and partitioning problems. A special class of graph clustering algorithms, known as spectral clustering algorithms, is used for the study. Two widely used spectral clustering algorithms are applied to illustrate solutions to these problems. These algorithms are generally based on the eigendecomposition of Laplacian matrices of either weighted or unweighted graphs.
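For reference, the standard objectives behind these graph cut formulations, stated in their usual form (conventions on a factor of 1/2 vary across texts):

```latex
\operatorname{cut}(A,\bar{A}) = \sum_{i \in A,\, j \in \bar{A}} w_{ij}, \qquad
\operatorname{RatioCut}(A_1,\dots,A_k) = \sum_{\ell=1}^{k} \frac{\operatorname{cut}(A_\ell,\bar{A}_\ell)}{|A_\ell|}, \qquad
\operatorname{NCut}(A_1,\dots,A_k) = \sum_{\ell=1}^{k} \frac{\operatorname{cut}(A_\ell,\bar{A}_\ell)}{\operatorname{vol}(A_\ell)}.
```

Relaxing RatioCut leads to eigenvectors of the unnormalized Laplacian, while relaxing NCut leads to those of the normalized Laplacian.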
2007
Spectral clustering is a powerful technique in data analysis that has found increasing support and application in many areas. This report is geared to give an introduction to its methods, presenting the most common algorithms and discussing the advantages and disadvantages of each, rather than endorsing one of them as the best, because, arguably, there is no black-box algorithm that performs equally well on every data set.
Proceedings of the AAAI Conference on Artificial Intelligence
Spectral clustering has found extensive use in many areas. Most traditional spectral clustering algorithms work in three separate steps: similarity graph construction, continuous label learning, and discretization of the learned labels by k-means clustering. This common practice has two potential flaws, which may lead to severe information loss and performance degradation. First, the predefined similarity graph might not be optimal for subsequent clustering; it is well accepted that the similarity graph strongly affects the clustering results. To this end, we propose to learn similarity information automatically from the data while simultaneously enforcing the constraint that the similarity matrix has exactly c connected components if there are c clusters. Second, the discrete solution may deviate from the spectral solution, since the k-means method is well known to be sensitive to the initialization of cluster centers. In this work, we transform the candidate solution into a new one that better approximates the discrete one.
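The connected-components constraint rests on a standard spectral fact, sketched below for a symmetric affinity: the number of connected components equals the multiplicity of the Laplacian's zero eigenvalue.

```python
import numpy as np

def num_connected_components(W, tol=1e-10):
    """Count connected components of the graph of a symmetric affinity W
    via the multiplicity of the Laplacian's zero eigenvalue (the fact
    behind the 'exactly c connected components' constraint)."""
    L = np.diag(W.sum(axis=1)) - W
    return int(np.sum(np.linalg.eigvalsh(L) < tol))
```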
2017
Spectral clustering is often used to partition a data set into a specified number of clusters. Both the unweighted and the vertex-weighted approaches use eigenvectors of the Laplacian matrix of a graph. Our focus is on using vertex-weighted methods to refine the clustering of observations. The coefficients of a Fiedler vector are used to partition the vertices of a given graph into two clusters. A vertex is classified as unassociated if its Fiedler coefficient is close to zero compared to the largest Fiedler coefficient of the graph. We propose a vertex-weighted spectral clustering algorithm that incorporates a vector of weights for each vertex of a given graph to form a vertex-weighted graph. The proposed algorithm predicts the association of data points, while unweighted clustering provides no such association. Finally, we implement both algorithms on several data sets to show that the proposed algorithm works in general.
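A minimal sketch of the Fiedler-vector step with an unassociated flag follows; the specific threshold rule and its default value are assumptions here, not the paper's calibration.

```python
import numpy as np

def fiedler_bipartition(W, unassoc_ratio=0.1):
    """Two-way partition from the signs of the Fiedler vector, flagging
    vertices with near-zero coefficients as unassociated (sketch;
    assumes a connected graph, and the threshold is an assumption)."""
    L = np.diag(W.sum(axis=1)) - W
    _, vecs = np.linalg.eigh(L)
    f = vecs[:, 1]                               # Fiedler vector
    labels = (f >= 0).astype(int)
    unassociated = np.abs(f) < unassoc_ratio * np.abs(f).max()
    return labels, unassociated
```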
Mathematics
Spectral techniques are often used to partition the set of vertices of a graph, or to form clusters. They are based on the Laplacian matrix. These techniques make it easy to integrate weights on the edges. In this work, we introduce a p-Laplacian, or a generalized Laplacian matrix with potential, which also allows us to take into account weights on the vertices. These vertex weights are independent of the edge weights. In this way, we can cluster according to the importance of vertices, assigning more weight to some vertices than to others, rather than considering only the number of vertices. We also provide some bounds, similar to those of Cheeger, for the value of the minimal cut cost with weights at the vertices, as a function of the first non-zero eigenvalue of the p-Laplacian (an analog of the Fiedler eigenvalue).
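One common way vertex weights can enter a spectral partitioner is via a generalized eigenproblem, sketched below under that assumption (it need not match the paper's exact operator):

```python
import numpy as np
from scipy.linalg import eigh

def vertex_weighted_fiedler(W, m):
    """Solve the generalized eigenproblem L v = lambda M v with
    M = diag(m), m_i > 0 (an assumed, common way to weight vertices).
    Vertices with larger m_i then count for more in the partition
    balance than mere vertex counts."""
    L = np.diag(W.sum(axis=1)) - W
    vals, vecs = eigh(L, np.diag(m))             # generalized eigenproblem
    return vals[1], vecs[:, 1]                   # analogue of the Fiedler pair
```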
Lecture Notes in Computer Science, 2014
In this paper we propose a new method for choosing the number of clusters and the most appropriate eigenvectors, which together yield the optimal clustering. To accomplish this task we suggest carefully examining properties of the adjacency matrix eigenvectors: their weak localization as well as the signs of their values. The algorithm has only one parameter: the number of mutual neighbors. We compare our method to several clustering solutions using different types of datasets. The experiments demonstrate that our method outperforms many other clustering algorithms in most cases.
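For context, the baseline that eigenvector-based criteria of this kind refine is the standard eigengap heuristic, sketched here (a swapped-in standard technique, not the authors' method):

```python
import numpy as np

def eigengap_num_clusters(W, k_max=10):
    """Standard eigengap heuristic: choose k at the largest gap among
    the smallest normalized-Laplacian eigenvalues (assumes positive
    degrees; shown only as the usual baseline)."""
    d = W.sum(axis=1)
    D_isqrt = np.diag(1.0 / np.sqrt(d))
    L_sym = np.eye(len(d)) - D_isqrt @ W @ D_isqrt
    vals = np.linalg.eigvalsh(L_sym)[:k_max + 1]
    return int(np.argmax(np.diff(vals))) + 1     # gap after the k-th eigenvalue
```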
International Joint Conference on Computational Intelligence, 2011
In this paper, we propose a semi-supervised spectral clustering method able to integrate some limited supervisory information. This prior knowledge consists of pairwise constraints indicating whether a pair of objects belongs to the same cluster (must-link constraints) or not (cannot-link constraints). The spectral clustering then aims at optimizing a cost function built as a classical multiple normalized cut measure, modified to penalize violations of these constraints. We show the relevance of the proposed method on an illustrative dataset and some UCI benchmarks, addressing both two-class and multi-class problems. In all examples, a comparison with other semi-supervised clustering algorithms using pairwise constraints is provided.
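A much simpler baseline for injecting such pairwise constraints, direct affinity editing, is sketched below for contrast; the paper's method instead penalizes constraint violations inside the normalized cut objective.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def constrained_affinity(W, must_link, cannot_link):
    """Baseline constraint injection by affinity editing (not the
    paper's penalized cost): force maximal similarity on must-link
    pairs and zero similarity on cannot-link pairs."""
    A = W.copy()
    for i, j in must_link:
        A[i, j] = A[j, i] = W.max()
    for i, j in cannot_link:
        A[i, j] = A[j, i] = 0.0
    return A

# Usage with a precomputed affinity (W, ml, cl are hypothetical inputs):
# labels = SpectralClustering(n_clusters=3, affinity="precomputed") \
#              .fit_predict(constrained_affinity(W, ml, cl))
```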
In recent years, spectral clustering has become one of the most popular modern clustering algorithms. It is simple to implement, can be solved efficiently by standard linear algebra software, and very often outperforms traditional clustering algorithms such as the k-means algorithm. At first glance spectral clustering appears slightly mysterious, and it is not obvious why it works at all and what it really does. The goal of this tutorial is to give some intuition on those questions. We describe different graph Laplacians and their basic properties, present the most common spectral clustering algorithms, and derive those algorithms from scratch by several different approaches. Advantages and disadvantages of the different spectral clustering algorithms are discussed.
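The most common recipe such tutorials derive can be sketched as follows (normalized spectral clustering with row normalization; a symmetric affinity and positive degrees are assumed):

```python
import numpy as np
from sklearn.cluster import KMeans

def normalized_spectral_clustering(W, k):
    """Common normalized spectral clustering recipe (a sketch): embed
    with the bottom-k eigenvectors of the symmetric normalized
    Laplacian, row-normalize, then run k-means in the embedding."""
    d = W.sum(axis=1)
    D_isqrt = np.diag(1.0 / np.sqrt(d))
    L_sym = np.eye(len(d)) - D_isqrt @ W @ D_isqrt
    _, vecs = np.linalg.eigh(L_sym)
    U = vecs[:, :k]                                    # bottom-k eigenvectors
    U = U / np.linalg.norm(U, axis=1, keepdims=True)   # row normalization
    return KMeans(n_clusters=k, n_init=10).fit_predict(U)
```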
ArXiv, 2020
This article considers spectral community detection in the regime of sparse networks with heterogeneous degree distributions, for which we devise an algorithm to efficiently retrieve communities. Specifically, we demonstrate that a conveniently parametrized form of regularized Laplacian matrix can be used to perform spectral clustering in sparse networks without suffering from their degree heterogeneity. In addition, we exhibit important connections between this proposed matrix and the now popular non-backtracking matrix, the Bethe-Hessian matrix, as well as the standard Laplacian matrix. Interestingly, as opposed to competing methods, our proposed improved parametrization inherently accounts for the hardness of the classification problem. These findings are summarized in the form of an algorithm capable of both estimating the number of communities and achieving high-quality community reconstruction.
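A generic sketch of degree-regularized spectral embedding conveys the idea (the paper's parametrization of the regularizer differs in detail):

```python
import numpy as np

def regularized_spectral_embedding(A, k, tau=None):
    """Degree-regularized spectral embedding for sparse graphs (generic
    sketch). Adding tau to the degrees damps the otherwise dominant
    influence of low-degree vertices."""
    d = A.sum(axis=1)
    tau = d.mean() if tau is None else tau        # a common default choice
    D_isqrt = np.diag(1.0 / np.sqrt(d + tau))
    L_tau = D_isqrt @ A @ D_isqrt                 # regularized adjacency
    _, vecs = np.linalg.eigh(L_tau)
    return vecs[:, -k:]                           # top-k eigvecs; cluster rows
```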
Spectral clustering is a fundamental technique in the field of data mining and information processing. Most existing spectral clustering algorithms integrate dimensionality reduction into the clustering process, assisted by manifold learning in the original space. However, the manifold in the reduced-dimensional subspace is likely to exhibit altered properties compared with the original space, so applying manifold information obtained from the original space to the clustering process in a low-dimensional subspace is prone to inferior performance. To address this issue, we propose a novel convex algorithm that mines the manifold structure in the low-dimensional subspace. In addition, our unified learning process makes the manifold learning particularly tailored to the clustering task. Compared with other related methods, the proposed algorithm yields a more structured clustering result. To validate the efficacy of the proposed algorithm, we perform extensive experiments on several benchmark datasets in comparison with some state-of-the-art clustering approaches. The experimental results demonstrate that the proposed algorithm has quite promising clustering performance.
2011
Spectral clustering is a flexible clustering methodology that is applicable to a variety of data types and has the particular virtue that it makes few assumptions on cluster shapes. It has become popular in a variety of application areas, particularly in computational vision and bioinformatics. The approach appears, however, to be particularly sensitive to irrelevant and noisy dimensions in the data. We thus introduce an approach that automatically learns the relevant dimensions and spectral clustering simultaneously. We pursue an augmented form of spectral clustering in which an explicit projection operator is incorporated in the relaxed optimization functional. We optimize this functional over both the projection and the spectral embedding. Experiments on simulated and real data show that this approach yields significant improvements in the performance of spectral clustering.
Mathematical Programming, 2014
The graph partition problem is the problem of partitioning the vertex set of a graph into a fixed number of sets of given sizes such that the sum of weights of edges joining different sets is optimized. In this paper we simplify a known matrix-lifting semidefinite programming relaxation of the graph partition problem for several classes of graphs and also show how to aggregate additional triangle and independent set constraints for graphs with symmetry. We present an eigenvalue bound for the graph partition problem of a strongly regular graph, extending a similar result for the equipartition problem. We also derive a linear programming bound of the graph partition problem for certain Johnson and Kneser graphs. Using what we call the Laplacian algebra of a graph, we derive an eigenvalue bound for the graph partition problem that is the first known closed form bound that is applicable to any graph, thereby extending a well-known result in spectral graph theory. Finally, we strengthen a known semidefinite programming relaxation of a specific quadratic assignment problem and the above-mentioned matrix-lifting semidefinite programming relaxation by adding two constraints that correspond to assigning two vertices of the graph to different parts of the partition. This strengthening performs well on highly symmetric graphs when other relaxations provide weak or trivial bounds.
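The simplest instance of such spectral bounds, for the 2-equipartition case (the paper's closed-form bound is more general), follows from writing the cut as a quadratic form in the ±1 indicator:

```latex
\operatorname{cut}(S,\bar{S}) \;=\; \tfrac{1}{4}\, x^{\top} L x
\quad \text{for } x \in \{-1,+1\}^{n},\ \mathbf{1}^{\top} x = 0,
\qquad\Longrightarrow\qquad
\min_{|S| = n/2} \operatorname{cut}(S,\bar{S}) \;\ge\; \frac{n}{4}\, \lambda_{2}(L),
```

where λ₂(L) is the second-smallest Laplacian eigenvalue; the inequality follows from the Courant-Fischer theorem on the subspace orthogonal to the all-ones vector, using ‖x‖² = n.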
2005
Consistency is a key property of statistical algorithms when the data is drawn from some underlying probability distribution. Surprisingly, despite decades of work, little is known about the consistency of most clustering algorithms. In this paper we investigate the consistency of a popular family of spectral clustering algorithms, which cluster the data with the help of eigenvectors of graph Laplacian matrices. We show that one of the two major classes of spectral clustering (normalized clustering) converges under some very general conditions, while the other (unnormalized) is consistent only under strong additional assumptions, which, as we demonstrate, are not always satisfied by real data. We conclude that our analysis provides strong evidence for the superiority of normalized spectral clustering in practical applications. We believe that the methods used in our analysis will provide a basis for future exploration of Laplacian-based methods in a statistical setting.
Lecture Notes in Computer Science, 2006
In this paper, we analyze the second eigenvector technique of spectral partitioning on the planted partition random graph model, by constructing a recursive algorithm using the second eigenvectors in order to learn the planted partitions. The correctness of our algorithm is not based on the ratio-cut interpretation of the second eigenvector, but exploits instead the stability of the eigenvector subspace. As a result, we get an improved cluster separation bound in terms of dependence on the maximum variance. We also extend our results for a clustering problem in the case of sparse graphs.
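A toy version of this setting is easy to reproduce (illustrative only; the paper's recursive algorithm and its analysis are more involved):

```python
import numpy as np

def planted_partition_demo(n=200, p=0.3, q=0.05, seed=0):
    """Second-eigenvector recovery on a two-block planted partition.
    Within-block edge probability p, between-block probability q,
    with p > q. Returns the fraction of correctly recovered labels."""
    rng = np.random.default_rng(seed)
    labels = np.repeat([0, 1], n // 2)
    P = np.where(labels[:, None] == labels[None, :], p, q)
    A = np.triu(rng.random((n, n)) < P, 1).astype(float)
    A += A.T                                     # symmetric, no self-loops
    _, vecs = np.linalg.eigh(A)
    guess = (vecs[:, -2] > 0).astype(int)        # second-largest eigenvector
    return max((guess == labels).mean(), (guess != labels).mean())
```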
Pattern Recognition Letters, 2014
Kernel learning is one of the most important recent approaches to constrained clustering. Many kernel learning methods have been introduced for clustering when side information in the form of pairwise constraints is available. However, almost all existing methods either learn a whole kernel matrix or learn only a limited number of parameters. Although non-parametric methods that learn a whole kernel matrix can find clusters of arbitrary structure, they are computationally expensive and feasible only on small data sets. In this paper, we propose a kernel learning method that offers flexibility in the number of variables, between these two extremes of degrees of freedom. The proposed method uses a spectral embedding to learn a square matrix whose number of rows equals the number of dimensions in the embedded space. The proposed method therefore scales much better than methods that learn a whole kernel matrix. Experimental results on synthetic and real-world data sets show that the performance of the proposed method is generally close to that of learning a whole kernel matrix, while its time cost is far lower.
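The parametrization behind the scalability claim can be sketched directly (the learning objective itself is not reproduced here):

```python
import numpy as np

def embedded_kernel(E, A):
    """With an n x d spectral embedding E, the kernel K = E A E^T is
    positive semidefinite whenever the d x d matrix A is, so only d^2
    values are learned instead of n^2 (a sketch of the parametrization;
    E could be, e.g., bottom-d Laplacian eigenvectors)."""
    return E @ A @ E.T
```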
Pattern Recognition Letters, 2016
Detection of data structures in spectral clustering approaches becomes a difficult task when dealing with complex distributions. Moreover, real user prior knowledge about the influence of the free parameters is needed when building the graph. Here, we introduce a graph pruning approach, termed Kernel Alignment based Graph Pruning (KAGP), within a spectral clustering framework that enhances both the local and global data consistencies for a given input similarity. KAGP reveals hidden data structures by finding relevant pairwise relationships among samples, and it estimates the loss of information during the pruning process in terms of a kernel alignment-based cost function. In addition, we encode the sample similarities using a compactly supported kernel function, yielding a sparse data representation that supports spectral clustering techniques. The attained results show that KAGP enhances clustering performance in most cases. In addition, KAGP avoids the need for comprehensive user knowledge regarding the influence of its free parameters.
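A compactly supported kernel of the kind referred to can be sketched as follows (the specific kernel form here, a truncated Epanechnikov-style function, is an assumption):

```python
import numpy as np
from scipy.spatial.distance import cdist

def compact_kernel_affinity(X, radius):
    """Compactly supported kernel affinity (sketch): similarities vanish
    exactly beyond the support radius, so the affinity matrix is sparse
    by construction rather than by thresholding."""
    D = cdist(X, X)
    K = np.maximum(1.0 - (D / radius) ** 2, 0.0)   # zero outside the radius
    np.fill_diagonal(K, 0.0)                       # no self-similarity
    return K
```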
Bioinformatics (Oxford, England), 2018
Single-cell RNA-sequencing (scRNA-seq) technology can generate genome-wide expression data at the single-cell level. One important objective in scRNA-seq analysis is to cluster cells, where each cluster consists of cells belonging to the same cell type, based on gene expression patterns. We introduce a novel spectral clustering framework that imposes sparse structures on a target matrix. Specifically, we utilize multiple doubly stochastic similarity matrices to learn a similarity matrix, motivated by the observation that each similarity matrix can be a different informative representation of the data. We impose a sparse structure on the target matrix and then shrink pairwise differences of its rows, motivated by the fact that the target matrix should have these structures in the ideal case. We solve the proposed non-convex problem iteratively using the ADMM algorithm and show its convergence. We evaluate the performance of the proposed clustering method.
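One standard way to obtain a doubly stochastic similarity matrix is Sinkhorn-Knopp normalization, sketched below; the paper combines several such matrices, and this shows only the normalization step.

```python
import numpy as np

def sinkhorn_doubly_stochastic(K, n_iter=100):
    """Sinkhorn-Knopp normalization: alternately rescale rows and
    columns of a nonnegative similarity matrix until it is (nearly)
    doubly stochastic. Assumes a strictly positive K so the iteration
    converges."""
    S = K.astype(float).copy()
    for _ in range(n_iter):
        S /= S.sum(axis=1, keepdims=True)     # make rows sum to one
        S /= S.sum(axis=0, keepdims=True)     # make columns sum to one
    return S
```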