2013, Lecture Notes in Computer Science
Consensus clustering methodologies combine a set of partitions in the clustering ensemble to provide a consensus partition. One drawback of the standard combination algorithms is that all the partitions of the ensemble carry the same weight in the aggregation process. By differentiating among the partitions, the quality of the consensus could be improved. In this paper we propose a novel formulation that seeks a median partition for the clustering ensemble process based on the evidence accumulation framework, but includes a weighting mechanism that differentiates the importance of the partitions of the ensemble in order to become more robust to noisy ensembles. Experiments on both synthetic and real benchmark data show the effectiveness of the proposed approach.
Machine Learning, 2013
Clustering ensemble methods produce a consensus partition of a set of data points by combining the results of a collection of base clustering algorithms. In the evidence accumulation clustering (EAC) paradigm, the clustering ensemble is transformed into a pairwise co-association matrix, thus avoiding the label correspondence problem, which is intrinsic to other clustering ensemble schemes. In this paper, we propose a consensus clustering approach based on the EAC paradigm, which is not limited to crisp partitions and fully ex
Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods, 2013
Ensemble clustering methods derive a consensus partition of a set of objects starting from the results of a collection of base clustering algorithms forming the ensemble. Each partition in the ensemble provides a set of pairwise observations of the co-occurrence of objects in the same cluster. The evidence accumulation clustering paradigm uses these co-occurrence statistics to derive a similarity matrix, referred to as the co-association matrix, which is fed to a pairwise similarity clustering algorithm to obtain a final consensus clustering. The advantage of this solution is the avoidance of the label correspondence problem, which affects other ensemble clustering schemes. In this paper we derive a principled approach for the extraction of a consensus clustering from the observations encoded in the co-association matrix. We introduce a probabilistic model for the co-association matrix parameterized by the unknown assignments of objects to clusters, which are in turn estimated using a maximum likelihood approach. Additionally, we propose a novel algorithm to carry out the parameter estimation with convergence guarantees towards a local solution. Experiments on both synthetic and real benchmark data show the effectiveness of the proposed approach.
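The co-association construction described in the abstracts above can be sketched in a few lines. This is a minimal, illustrative implementation (function and variable names are our own, not from any of the cited papers): each entry C[i][j] accumulates the fraction of ensemble partitions that place objects i and j in the same cluster, sidestepping any label correspondence.

```python
def co_association(ensemble, n):
    """Evidence-accumulation co-association matrix.

    ensemble: list of partitions, each a list of n cluster labels.
    Returns an n x n matrix C with C[i][j] = fraction of partitions
    that place objects i and j in the same cluster.
    """
    m = len(ensemble)
    C = [[0.0] * n for _ in range(n)]
    for labels in ensemble:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    C[i][j] += 1.0 / m
    return C

# Two toy partitions of four objects; labels need not correspond
# across partitions, only within-partition co-occurrence matters.
ensemble = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
]
C = co_association(ensemble, 4)
# Objects 0 and 1 co-occur in both partitions -> C[0][1] == 1.0;
# objects 1 and 2 co-occur in one of the two -> C[1][2] == 0.5.
```

The resulting matrix is then treated as a pairwise similarity and handed to any similarity-based clustering algorithm to extract the consensus partition.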
2008 Proceedings of the Tenth Workshop on Algorithm Engineering and Experiments (ALENEX), 2008
Consensus clustering is the problem of reconciling clustering information about the same data set coming from different sources or from different runs of the same algorithm. Cast as an optimization problem, consensus clustering is known as median partition, and has been shown to be NP-complete. A number of heuristics have been proposed as approximate solutions, some with performance guarantees. In practice, the problem is apparently easy to approximate, but guidance is necessary as to which heuristic to use depending on the number of elements and clusterings given. We have implemented a number of heuristics for the consensus clustering problem, and here we compare their performance, independent of data size, in terms of efficacy and efficiency, on both simulated and real data sets. We find that, based on the underlying algorithms and their behavior in practice, the heuristics can be categorized into two distinct groups, with ramifications as to which one to use in a given situation, and that a hybrid solution is the best bet in general. We have also developed a refined consensus clustering heuristic for the occasions when the given clusterings may be too disparate, and their consensus may not be representative of any one of them, and we show that in practice the refined consensus clusterings can be much superior to the general consensus clustering.
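The median-partition objective referenced above can be made concrete with a pair-counting disagreement distance. The sketch below (our own illustrative names; the cited heuristics are not reproduced here) counts the object pairs on which two partitions disagree, and sums that distance over the ensemble; the median partition is the candidate minimizing this sum.

```python
from itertools import combinations

def disagreement(p, q):
    """Number of object pairs on which partitions p and q disagree:
    the pair is together in one partition and apart in the other."""
    return sum(
        (p[i] == p[j]) != (q[i] == q[j])
        for i, j in combinations(range(len(p)), 2)
    )

def total_distance(candidate, ensemble):
    """Median-partition objective: total disagreement of a candidate
    partition with every partition in the ensemble."""
    return sum(disagreement(candidate, q) for q in ensemble)

# Partitions [0,0,1] and [0,1,1] disagree on pairs (0,1) and (1,2).
```

Finding the exact minimizer of this objective over all partitions is the NP-complete problem the abstract refers to; the heuristics it compares are approximate minimizers.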
Expert Systems with Applications, 2020
Anais do 10. Congresso Brasileiro de Inteligência Computacional, 2016
Consensus clustering has emerged as a method for improving quality and robustness in clustering by optimally combining the results of different clustering processes. In the last few years, several approaches have been proposed. In this paper, we propose a new method for arriving at a consensus clustering. We assign a confidence score to each partition in the ensemble and compute a weighted co-association for all pairs of data objects. In order to derive the consensus clustering from the co-association matrix, we use the cross-association technique to group the rows and columns simultaneously. The objective is to derive as many clusters of homogeneous blocks as possible. The set of non-zero blocks is taken as the resulting partition. The use of the cross-association technique captures the transitive relationship. We show empirically that, for the benchmark datasets, our technique yields better consensus clusterings than other known algorithms.
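The confidence-weighted co-association step described above can be sketched as follows. This is an illustrative variant under our own naming (the paper's actual confidence scoring and cross-association steps are not reproduced): each partition contributes to pair co-occurrence in proportion to its normalized confidence weight.

```python
def weighted_co_association(ensemble, weights, n):
    """Confidence-weighted co-association matrix.

    ensemble: list of partitions (lists of n cluster labels).
    weights:  one confidence score per partition.
    C[i][j] is the weighted fraction of partitions placing
    objects i and j in the same cluster.
    """
    total = sum(weights)
    C = [[0.0] * n for _ in range(n)]
    for labels, w in zip(ensemble, weights):
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    C[i][j] += w / total
    return C

# A high-confidence partition (weight 3.0) dominates a low-confidence
# one (weight 1.0) in the aggregated pairwise evidence.
C = weighted_co_association([[0, 0, 1], [0, 1, 1]], [3.0, 1.0], 3)
```

With uniform weights this reduces to the standard evidence-accumulation co-association matrix.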
Lecture Notes in Computer Science, 2010
Work on clustering combination has shown that clustering combination methods typically outperform single runs of clustering algorithms. While there is much work reported in the literature on validating data partitions produced by traditional clustering algorithms, little has been done to validate data partitions produced by clustering combination methods. We propose to assess the quality of a consensus partition using a pattern pairwise similarity induced from the set of data partitions that constitutes the clustering ensemble. A new validity index based on the likelihood of the data set given a data partition, and three modified versions of well-known clustering validity indices, are proposed. The validity measures on the original, clustering ensemble, and similarity spaces are analysed and compared based on experimental results on several synthetic and real data sets.
Pattern Recognition, 2010
Voting-based consensus clustering refers to a distinct class of consensus methods in which the cluster label mismatch problem is explicitly addressed. The voting problem is defined as the problem of finding the optimal relabeling of a given partition with respect to a reference partition. It is commonly formulated as a weighted bipartite matching problem. In this paper, we present a more general formulation of the voting problem as a regression problem with multiple-response and multiple-input variables. We show that a recently introduced cumulative voting scheme is a special case corresponding to a linear regression method. We use a randomized ensemble generation technique, where an overproduced number of clusters is randomly selected for each ensemble partition. We apply an information theoretic algorithm for extracting the consensus clustering from the aggregated ensemble representation and for estimating the number of clusters. We apply it in conjunction with bipartite matching and cumulative voting. We present empirical evidence showing substantial improvements in clustering accuracy, stability, and estimation of the true number of clusters based on cumulative voting. The improvements are achieved in comparison to consensus algorithms based on bipartite matching, which perform very poorly with the chosen ensemble generation technique, and also to other recent consensus algorithms.
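The voting problem formulated above, finding the optimal relabeling of a partition with respect to a reference, can be illustrated with a brute-force sketch. For small label sets we simply enumerate label permutations (real implementations solve the weighted bipartite matching with the Hungarian algorithm instead; names here are our own, not from the paper).

```python
from itertools import permutations

def best_relabeling(partition, reference):
    """Voting problem, solved exhaustively for small label sets:
    find the one-to-one label permutation that maximizes agreement
    with the reference partition, and return the relabeled partition."""
    labels = sorted(set(partition))
    best, best_score = None, -1
    for perm in permutations(labels):
        mapping = dict(zip(labels, perm))
        score = sum(mapping[p] == r for p, r in zip(partition, reference))
        if score > best_score:
            best, best_score = [mapping[p] for p in partition], score
    return best

# Partition [1, 1, 0, 0] matches reference [0, 0, 1, 1] exactly
# once the two labels are swapped.
relabeled = best_relabeling([1, 1, 0, 0], [0, 0, 1, 1])
```

The regression view in the abstract generalizes this one-to-one mapping to a soft, multiple-response mapping, of which cumulative voting is the linear special case.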
International Journal of Electrical and Computer Engineering (IJECE), 2018
Data analysis plays a prominent role in interpreting various phenomena. Data mining is the process of extracting useful knowledge from extensive data; building on classical statistical models, the data can be exploited beyond mere storage and management. Cluster analysis, a primary investigation conducted with little or no prior knowledge, spans research and development across a wide variety of communities. Cluster ensembles combine individual solutions obtained from different clusterings to produce a final, higher-quality clustering, which is required in a wide range of applications. The method arises in the context of increasing robustness, scalability and accuracy. This paper gives a brief overview of the generation methods and consensus functions used in cluster ensembles, and surveys the various techniques and cluster ensemble methods.
Proceedings - Fourth IEEE International Conference on Data Mining, ICDM 2004, 2004
In the combination of multiple partitions, one is usually interested in deriving a consensus solution with a quality better than that of the given partitions. Several recent studies have empirically demonstrated improved accuracy of clustering ensembles on a number of artificial and real-world data sets. Unlike certain multiple supervised classifier systems, convergence properties of unsupervised clustering ensembles remain unknown for conventional combination schemes. In this paper we present formal arguments on the effectiveness of cluster ensembles from two perspectives. The first is based on a stochastic partition generation model related to re-labeling and a consensus function with plurality voting. The second is to study the property of the "mean" partition of an ensemble with respect to a metric on the space of all possible partitions. In both cases, the consensus solution can be shown to converge to a true underlying clustering solution as the number of partitions in the ensemble increases. This paper provides a rigorous justification for the use of cluster ensembles.
Proceedings of the 2006 SIAM International Conference on Data Mining, 2006
Cluster ensembles offer a solution to challenges inherent to clustering arising from its ill-posed nature. Cluster ensembles can provide robust and stable solutions by leveraging the consensus across multiple clustering results, while averaging out emergent spurious structures that arise due to the various biases to which each participating algorithm is tuned. In this paper, we address the problem of combining multiple weighted clusters which belong to different subspaces of the input space. We leverage the diversity of the input clusterings in order to generate a consensus partition that is superior to the participating ones. Since we are dealing with weighted clusters, our consensus function makes use of the weight vectors associated with the clusters. The experimental results show that our ensemble technique is capable of producing a partition that is as good as or better than the best individual clustering.
Clustering ensembles combine multiple partitions produced by different clustering algorithms into a single clustering solution. Clustering ensembles are used to improve the robustness, stability, and accuracy of unsupervised classification solutions. A major problem in clustering ensembles is the consensus function. Consensus functions in clustering ensembles include hypergraph partitioning, mutual information, co-association-based functions, the voting approach and the finite mixture model. The characteristics of clustering ensemble algorithms, such as computational complexity, robustness, simplicity and accuracy, have been compared on different datasets in previous techniques.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000
2006
Ensembles of clustering methods have recently been shown to perform better than conventional clustering methods. One drawback of ensembles is that their computational requirements can be very large, and hence they may not be suitable for large data sets. The paper presents an ensemble of leaders clustering methods in which the entire ensemble requires only a single scan of the data set. Further, the component leaders complement each other while deriving individual partitions. A heuristic-based consensus method to combine the individual partitions is presented and compared with a well-known consensus method called co-association-based consensus. Experimentally, the proposed methods are shown to perform well.
International journal of electronics engineering and application, 2021
Clustering Ensemble, also referred to as Consensus Clustering, is a tool for enhancing the reliability and stability of data clustering by aggregating the base clusterings obtained by different clustering algorithms in the input ensemble. This study introduces a novel ensemble selection strategy for establishing consensus clustering. Our strategy avoids looking at the entire population of base clusterings in the ensemble, establishing a quality consensus by carefully selecting a few base clusterings. The experimental results reveal that the suggested method's consensus clustering surpasses some other well-known clustering ensemble methods in terms of clustering accuracy for diverse data sets.
Applied Artificial Intelligence, 2012
Proceedings of the International Conference on Knowledge Discovery and Information Retrieval, 2009
In order to combine multiple data partitions into a more robust data partition, several approaches to produce the cluster ensemble and various consensus functions have been proposed. This range of possibilities in combining multiple data partitions raises a new problem: which of the existing approaches, to produce the cluster ensemble's data partitions and to combine these partitions, best fits a given data set. In this paper, we address the cluster ensemble selection problem. We propose a new measure to select the best consensus data partition, among a variety of consensus partitions, based on a notion of average cluster consistency between each data partition belonging to the cluster ensemble and a given consensus partition. We compared the proposed measure with other measures for cluster ensemble selection, using nine different data sets, and the experimental results show that the consensus partitions selected by our approach were usually of better quality than those selected by the other measures used in our experiments.
2012
In this paper, a new paradigm of clustering is proposed, which is based on a new Binarization of Consensus Partition Matrix (Bi-CoPaM) technique. This method exploits the results of multiple clustering experiments over the same dataset to generate one fuzzy consensus partition. The proposed tunable techniques to binarize this partition reflect the biological reality in that they allow some genes to be assigned to multiple clusters and others not to be assigned at all. The proposed method has the ability to show the relative ...
2009
Clustering ensembles combine multiple partitions generated by different clustering algorithms into a single clustering solution. Clustering ensembles have emerged as a prominent method for improving the robustness, stability and accuracy of unsupervised classification solutions. So far, many contributions have been made towards finding consensus clusterings. One of the major problems in clustering ensembles is the consensus function. In this paper, we first introduce clustering ensembles, the representation of multiple partitions, the associated challenges, and a taxonomy of combination algorithms. Secondly, we describe consensus functions in clustering ensembles, including hypergraph partitioning, the voting approach, mutual information, co-association-based functions and the finite mixture model, and then explain their advantages, disadvantages and computational complexity. Finally, we compare the characteristics of clustering ensemble algorithms, such as computational complexity, robustness, simplicity and accuracy, on different datasets across previous techniques.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008
Over the past few years, there has been a renewed interest in the consensus clustering problem. Several new methods have been proposed for finding a consensus partition for a set of n data objects that optimally summarizes an ensemble. In this paper, we propose new consensus clustering algorithms with linear computational complexity in n. We consider clusterings generated with a random number of clusters, which we describe by categorical random variables. We introduce the idea of cumulative voting as a solution for the problem of cluster label alignment, where unlike the common one-to-one voting scheme, a probabilistic mapping is computed. We seek a first summary of the ensemble that minimizes the average squared distance between the mapped partitions and the optimal representation of the ensemble, where the selection criterion of the reference clustering is defined based on maximizing the information content as measured by the entropy. We describe cumulative vote weighting schemes and corresponding algorithms to compute an empirical probability distribution summarizing the ensemble. Given the arbitrary number of clusters of the input partitions, we formulate the problem of extracting the optimal consensus as that of finding a compressed summary of the estimated distribution that preserves the maximum relevant information. An efficient solution is obtained using an agglomerative algorithm that minimizes the average generalized Jensen-Shannon divergence within the cluster. The empirical study demonstrates significant gains in accuracy and superior performance compared to several recent consensus clustering algorithms.
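The cumulative-voting idea above, a probabilistic mapping rather than a one-to-one relabeling, can be sketched as follows. This is an illustrative reconstruction under our own naming, not the paper's implementation: each cluster of a partition is mapped to an empirical distribution over the reference clusters, and every object votes with its cluster's distribution.

```python
def cumulative_vote(partition, reference, k_ref):
    """Soft (probabilistic) mapping of a partition onto a reference.

    partition: list of n cluster labels.
    reference: list of n reference labels in range(k_ref).
    Returns an n x k_ref matrix of soft votes: each row is the
    distribution of the object's cluster over the reference clusters.
    """
    clusters = sorted(set(partition))
    size = {k: partition.count(k) for k in clusters}
    # W[k][l]: fraction of cluster k's objects falling in reference cluster l
    W = {k: [0.0] * k_ref for k in clusters}
    for p, r in zip(partition, reference):
        W[p][r] += 1.0 / size[p]
    return [W[p][:] for p in partition]

# Cluster 0 of [0, 0, 1] straddles both reference clusters of
# [0, 1, 1], so its two objects each vote [0.5, 0.5]; cluster 1
# lies entirely in reference cluster 1 and votes [0.0, 1.0].
votes = cumulative_vote([0, 0, 1], [0, 1, 1], 2)
```

Averaging such vote matrices over all ensemble partitions yields the empirical distribution summarizing the ensemble, from which the consensus is then extracted.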
World Journal Of Advanced Research and Reviews, 2022
Voting-based consensus clustering is a subset of consensus techniques that explicitly addresses the cluster label mismatch issue. Finding the best relabeling of a given partition in relation to a reference partition is known as the voting problem; it is frequently formulated as a weighted bipartite matching problem. We propose a more generic formulation of the voting problem as a regression problem with multiple-response and multiple-input variables in this work. We demonstrate that a recently developed cumulative voting scheme is a special case corresponding to a linear regression technique. We employ a randomised ensemble creation method in which an excess of clusters is randomly chosen for each ensemble partition. In order to extract the consensus clustering from the combined ensemble representation and to estimate the number of clusters, we use an information-theoretic approach, applied together with bipartite matching and cumulative voting. We provide empirical evidence demonstrating substantial enhancements in clustering stability, estimation of the true number of clusters, and clustering accuracy based on cumulative voting. The gains are made in comparison to recent consensus algorithms as well as to bipartite-matching-based consensus algorithms, which struggle with the selected ensemble generation technique.