2012, Applied Artificial Intelligence
The paper discusses consensus functions applied to cluster ensembles, addressing significant challenges faced in clustering techniques such as scalability, robustness, and sensitivity to noise. By combining multiple clustering results using consensus functions, the study aims to enhance performance and reliability in clustering outcomes, demonstrating the effectiveness of various consensus methods through experimental evaluation across different datasets.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 2011
Cluster ensembles combine multiple clusterings of a set of objects into a single consolidated clustering, often referred to as the consensus solution. Consensus clustering can be used to generate more robust and stable clustering results compared to a single clustering approach, perform distributed computing under privacy or sharing constraints, or reuse existing knowledge. This paper describes a variety of algorithms that have been proposed to address the cluster ensemble problem, organizing them in conceptual categories that bring out the common threads and lessons learnt while simultaneously highlighting unique features of individual approaches.
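The ensemble-generation step underlying all of these methods can be sketched in a few lines. This is a minimal illustration assuming scikit-learn, with toy data and parameters chosen only for the example:

```python
# Sketch: generating a cluster ensemble from repeated k-means runs.
# Assumes scikit-learn; data and parameters are illustrative only.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=200, centers=3, random_state=0)

# Each ensemble member is one partition: a label vector over the 200 points.
ensemble = [
    KMeans(n_clusters=3, n_init=1, random_state=seed).fit_predict(X)
    for seed in range(10)
]

# Labels are arbitrary per run: cluster "0" in one partition need not match
# cluster "0" in another, which is why consensus functions must either
# relabel (voting) or use label-free representations (co-association).
print(len(ensemble), ensemble[0].shape)
```

A consensus function then consolidates these ten label vectors into a single partition.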
Proceedings - Fourth IEEE International Conference on Data Mining, ICDM 2004, 2004
In combining multiple partitions, one is usually interested in deriving a consensus solution with a quality better than that of the given partitions. Several recent studies have empirically demonstrated improved accuracy of clustering ensembles on a number of artificial and real-world data sets. Unlike certain multiple supervised classifier systems, convergence properties of unsupervised clustering ensembles remain unknown for conventional combination schemes. In this paper we present formal arguments on the effectiveness of cluster ensembles from two perspectives. The first is based on a stochastic partition generation model related to re-labeling and a consensus function with plurality voting. The second is to study the property of the "mean" partition of an ensemble with respect to a metric on the space of all possible partitions. In both cases, the consensus solution can be shown to converge to a true underlying clustering solution as the number of partitions in the ensemble increases. This paper provides a rigorous justification for the use of cluster ensembles.
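The plurality-voting consensus mentioned above reduces, once all partitions share a common labeling, to a per-object majority vote. A minimal sketch with toy label vectors (relabeling assumed already done):

```python
# Plurality voting on already-relabeled partitions: each object receives
# the label most ensemble members agree on. Toy label vectors only.
import numpy as np

relabeled = np.array([
    [0, 0, 1, 1, 1],   # rows: ensemble members, columns: objects
    [0, 0, 0, 1, 1],
    [0, 1, 1, 1, 1],
])

# Per-object majority vote across the ensemble members.
consensus = np.array([
    np.bincount(relabeled[:, j]).argmax()
    for j in range(relabeled.shape[1])
])
print(consensus)   # → [0 0 1 1 1]
```

As the number of members grows, each object's vote is increasingly dominated by the underlying structure, which is the intuition behind the convergence argument.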
2009
Clustering ensembles combine multiple partitions generated by different clustering algorithms into a single clustering solution. They have emerged as a prominent method for improving the robustness, stability, and accuracy of unsupervised classification solutions. Many contributions have been made toward finding consensus clusterings, and one of the major problems in clustering ensembles is the choice of consensus function. In this paper, we first introduce clustering ensembles, the representation of multiple partitions, their challenges, and a taxonomy of combination algorithms. Second, we describe consensus functions in clustering ensembles, including hypergraph partitioning, the voting approach, mutual information, co-association-based functions, and the finite mixture model, and explain their advantages, disadvantages, and computational complexity. Finally, we compare the characteristics of clustering ensemble algorithms, such as computational complexity, robustness, simplicity, and accuracy, on the different datasets used in previous techniques.
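Of the consensus-function families listed above, the co-association approach is perhaps the simplest to sketch: the (i, j) entry of the co-association matrix is the fraction of ensemble partitions that place objects i and j in the same cluster, and hierarchical clustering of its complement yields the consensus. A toy illustration, assuming SciPy for the hierarchical step:

```python
# Co-association consensus function on three toy partitions of six objects.
# Cluster label names are arbitrary per partition, which this method ignores.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

partitions = np.array([
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],
    [2, 2, 2, 0, 0, 1],
])
n = partitions.shape[1]

# Co-association matrix: pairwise same-cluster agreement, averaged.
C = np.zeros((n, n))
for p in partitions:
    C += (p[:, None] == p[None, :]).astype(float)
C /= len(partitions)

# Treat 1 - C as a distance and cut the dendrogram at k = 2 clusters.
D = squareform(1.0 - C, checks=False)
consensus = fcluster(linkage(D, method="average"), t=2, criterion="maxclust")
print(consensus)
```

Because the matrix is built only from pairwise agreements, no relabeling step is needed, at the cost of O(n^2) storage.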
Clustering is fundamental to understanding the structure of data. In the past decade the cluster ensemble problem has been introduced, which combines a set of partitions (an ensemble) of the data to obtain a single consensus solution that outperforms all the ensemble members. However, there is disagreement about which ensemble characteristics yield good performance: some authors have suggested that highly different partitions within the ensemble are beneficial for the final performance, whereas others have stated that medium diversity among them is better. While there are several measures to quantify diversity, a better method to analyze ensemble characteristics is necessary. This paper introduces a new ensemble generation strategy and a method to make slight changes in its structure. Experimental results on six datasets suggest that this is an important step towards a more systematic approach to analyze the impact of ensemble characteristics on the overall consensus performance.
Cluster ensembles offer a solution to challenges inherent to clustering arising from its ill-posed nature. Cluster ensembles can provide robust and stable solutions by leveraging the consensus across multiple clustering results, while averaging out emergent spurious structures that arise due to the various biases to which each participating algorithm is tuned. In this article, we address the problem of combining multiple weighted clusters that belong to different subspaces of the input space. We leverage the diversity of the input clusterings in order to generate a consensus partition that is superior to the participating ones. Since we are dealing with weighted clusters, our consensus functions make use of the weight vectors associated with the clusters. We demonstrate the effectiveness of our techniques by running experiments with several real datasets, including high-dimensional text data. Furthermore, we investigate in depth the issue of diversity and accuracy for our ensemble methods. Our analysis and experimental results show that the proposed techniques are capable of producing a partition that is as good as or better than the best individual clustering.
International Journal of …, 2010
World Journal Of Advanced Research and Reviews, 2022
Voting-based consensus clustering is a class of consensus techniques that explicitly addresses the cluster label mismatch problem. The voting problem is that of finding the optimal relabeling of a given partition with respect to a reference partition, and it is frequently formulated as a weighted bipartite matching problem. In this work we propose a more general formulation of the voting problem as a regression problem with multiple-response and multiple-input variables, and show that a recently developed cumulative voting scheme is a special case corresponding to a linear regression method. We employ a randomized ensemble generation method in which an overproduced number of clusters is randomly selected for each ensemble partition. To extract the consensus clustering from the aggregated ensemble representation and to estimate the number of clusters, we apply an information-theoretic approach, used in conjunction with bipartite matching and cumulative voting. We provide empirical evidence demonstrating substantial improvements in clustering stability, estimation of the true number of clusters, and clustering accuracy based on cumulative voting. The gains are achieved in comparison with recent consensus algorithms as well as with bipartite-matching-based consensus algorithms, which struggle with the chosen ensemble generation technique.
2012
Clustering ensembles are one of the most recent advances in unsupervised learning. They aim to combine the clustering results obtained using different algorithms, or from different runs of the same clustering algorithm on the same data set, by means of a consensus function; the efficiency and accuracy of this method have been demonstrated in many works in the literature.
International Conference on Information Fusion, 2006
Cluster ensembles are deemed to be a robust and accurate alternative to single clustering runs. 24 methods for designing cluster ensembles are compared here using 24 data sets, both artificial and real. Adjusted rand index and classification accuracy are used as accuracy criteria with respect to a known partition assumed to be the "true" one. The data sets are randomly
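The adjusted Rand index used above as an accuracy criterion against a known "true" partition can be computed directly; a minimal sketch assuming scikit-learn, with toy labelings:

```python
# Adjusted Rand index between a known partition and candidate clusterings.
# ARI is invariant to label names and is 1.0 for identical groupings.
from sklearn.metrics import adjusted_rand_score

truth = [0, 0, 0, 1, 1, 1]
found = [1, 1, 0, 0, 0, 0]          # different grouping, permuted labels

perfect = adjusted_rand_score(truth, [1, 1, 1, 0, 0, 0])
partial = adjusted_rand_score(truth, found)
print(perfect, partial)             # perfect agreement scores 1.0
```

Unlike raw classification accuracy, ARI needs no relabeling step, which makes it convenient for scoring consensus partitions.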
Pattern Recognition, 2010
Voting-based consensus clustering refers to a distinct class of consensus methods in which the cluster label mismatch problem is explicitly addressed. The voting problem is defined as the problem of finding the optimal relabeling of a given partition with respect to a reference partition. It is commonly formulated as a weighted bipartite matching problem. In this paper, we present a more general formulation of the voting problem as a regression problem with multiple-response and multiple-input variables. We show that a recently introduced cumulative voting scheme is a special case corresponding to a linear regression method. We use a randomized ensemble generation technique, where an overproduced number of clusters is randomly selected for each ensemble partition. We apply an information theoretic algorithm for extracting the consensus clustering from the aggregated ensemble representation and for estimating the number of clusters. We apply it in conjunction with bipartite matching and cumulative voting. We present empirical evidence showing substantial improvements in clustering accuracy, stability, and estimation of the true number of clusters based on cumulative voting. The improvements are achieved in comparison to consensus algorithms based on bipartite matching, which perform very poorly with the chosen ensemble generation technique, and also to other recent consensus algorithms.
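The bipartite-matching formulation of the relabeling step described above can be sketched with the Hungarian algorithm as implemented in SciPy. The label vectors here are toy examples; real voting schemes repeat this step for every ensemble member against the reference:

```python
# Optimal relabeling of a partition against a reference partition via
# weighted bipartite matching (Hungarian algorithm, scipy).
import numpy as np
from scipy.optimize import linear_sum_assignment

reference = np.array([0, 0, 1, 1, 2, 2])
partition = np.array([2, 2, 0, 0, 1, 1])   # same grouping, permuted labels

k = 3
# Contingency table: overlap[i, j] = number of points with reference
# label i and partition label j.
overlap = np.zeros((k, k), dtype=int)
for r, p in zip(reference, partition):
    overlap[r, p] += 1

# Maximize total overlap = minimize its negation.
row, col = linear_sum_assignment(-overlap)
mapping = {c: r for r, c in zip(row, col)}
relabeled = np.array([mapping[p] for p in partition])
print(relabeled)   # → [0 0 1 1 2 2], matching the reference
```

With the relabelings in hand, plurality or cumulative voting can then be applied across the aligned partitions.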
Clustering ensembles combine multiple partitions produced by different clustering algorithms into a single clustering solution, and are used to improve the robustness, stability, and accuracy of unsupervised classification solutions. The major problem of clustering ensembles is the consensus function. Consensus functions in clustering ensembles include hypergraph partitioning, mutual information, co-association-based functions, the voting approach, and the finite mixture model. The characteristics of clustering ensemble algorithms are computational complexity, robustness, simplicity, and accuracy on the different datasets used in previous techniques.
International Journal of Electrical and Computer Engineering (IJECE), 2018
Data analysis plays a prominent role in interpreting various phenomena. Data mining is the process of extracting useful knowledge from extensive data, allowing the data to be exploited beyond classical statistical models of storage and management. Cluster analysis, a primary means of investigation with little or no prior knowledge, has seen research and development across a wide variety of communities. Cluster ensembles are a melange of individual solutions obtained from different clusterings, combined to produce the final high-quality clustering required in a wide range of applications; the method arises in the context of increasing robustness, scalability, and accuracy. This paper gives a brief overview of the generation methods and consensus functions included in cluster ensembles, surveying and analyzing the various techniques and cluster ensemble methods.
Proceedings of the 2006 SIAM International Conference on Data Mining, 2006
Cluster ensembles offer a solution to challenges inherent to clustering arising from its ill-posed nature. Cluster ensembles can provide robust and stable solutions by leveraging the consensus across multiple clustering results, while averaging out emergent spurious structures that arise due to the various biases to which each participating algorithm is tuned. In this paper, we address the problem of combining multiple weighted clusters which belong to different subspaces of the input space. We leverage the diversity of the input clusterings in order to generate a consensus partition that is superior to the participating ones. Since we are dealing with weighted clusters, our consensus function makes use of the weight vectors associated with the clusters. The experimental results show that our ensemble technique is capable of producing a partition that is as good as or better than the best individual clustering.
Clustering is fundamental to understanding the structure of data. In the past decade the cluster ensemble problem has been introduced, which combines a set of partitions (an ensemble) of the data to obtain a single consensus solution that outperforms all the ensemble members. Although disagreement among ensemble partitions (diversity) has been found to be fundamental for success, the literature has arrived at conflicting conclusions: some authors suggest that high diversity is beneficial for the final performance, whereas others have indicated that medium diversity is better. While there are several options for measuring diversity, there is no method to control it. This paper introduces a new ensemble generation strategy and a method to smoothly change the ensemble diversity. Experimental results on three datasets suggest that this is an important step towards a more systematic approach to analyzing the impact of ensemble diversity on the overall consensus performance.
Pattern Recognition Letters, 2010
In recent years, cluster ensembles have been successfully used to tackle well-known drawbacks of individual clustering algorithms. Beyond the expected improvement provided by the averaging effect of many clustering algorithms (a clustering committee) aiming at the same goal, some interesting experimental results also show that even committees of completely random partitions may lead to a useful consensus. Another powerful finding in cluster ensemble research is that the blind criterion Averaged Normalized Mutual Information appears to track the actual misclassification ratio whenever labels are given to the actual clusters. In this work, we study what is behind these interesting results and the blind criterion, and we use what we learn from this study to propose a new point of view for the analysis and design of clustering committees. The usefulness of this new perspective is illustrated through experimental results.
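The Averaged Normalized Mutual Information criterion referred to above scores a candidate consensus partition by its mean NMI against every member of the ensemble, with no ground-truth labels needed. A sketch assuming scikit-learn, with toy partitions:

```python
# ANMI: mean normalized mutual information between a candidate consensus
# and each ensemble partition. Toy partitions; assumes scikit-learn.
import numpy as np
from sklearn.metrics import normalized_mutual_info_score

ensemble = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],
]

def anmi(candidate, ensemble):
    """Average NMI of a candidate consensus against all ensemble members."""
    return float(np.mean(
        [normalized_mutual_info_score(candidate, p) for p in ensemble]
    ))

good = anmi([0, 0, 0, 1, 1, 1], ensemble)   # agrees with most members
bad = anmi([0, 1, 0, 1, 0, 1], ensemble)    # ignores the shared structure
print(good, bad)
```

A candidate that reflects the structure shared by the ensemble scores higher, which is why ANMI can stand in for accuracy when true labels are unavailable.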
Proceedings of the International Conference on Knowledge Discovery and Information Retrieval, 2009
In order to combine multiple data partitions into a more robust data partition, several approaches to produce the cluster ensemble and various consensus functions have been proposed. This range of possibilities raises a new problem: which of the existing approaches, both for producing the cluster ensemble's data partitions and for combining them, best fits a given data set. In this paper, we address the cluster ensemble selection problem. We propose a new measure to select the best consensus data partition, among a variety of consensus partitions, based on a notion of average cluster consistency between each data partition belonging to the cluster ensemble and a given consensus partition. We compared the proposed measure with other measures for cluster ensemble selection using 9 different data sets, and the experimental results show that the consensus partitions selected by our approach were usually of better quality than those selected by the other measures used in our experiments.
Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004., 2004
Clustering ensembles combine multiple partitions of the given data into a single clustering solution of better quality. Inspired by the success of supervised boosting algorithms, we devise an adaptive scheme for integration of multiple non-independent clusterings. Individual partitions in the ensemble are sequentially generated by clustering specially selected subsamples of the given data set. The sampling probability for each data point dynamically depends on the consistency of its previous assignments in the ensemble. New subsamples are drawn to increasingly focus on the problematic regions of the input feature space. A measure of a data point's clustering consistency is defined to guide this adaptation. An empirical study compares the performance of adaptive and regular clustering ensembles using different consensus functions on a number of data sets. Experimental results demonstrate improved accuracy for some clustering structures.
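The adaptive sampling idea described above can be sketched as follows. The consistency measure here is a simple stand-in (frequency of each point's modal label across previous, already-relabeled members), not the paper's exact definition; points with low consistency mark the "problematic regions" and get higher sampling probability:

```python
# Adaptive subsampling sketch: inconsistent points are drawn more often.
# The consistency measure below is illustrative only.
import numpy as np

rng = np.random.default_rng(0)

# Previous (relabeled) assignments: rows = partitions, cols = points.
history = np.array([
    [0, 0, 1, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 1, 1, 0],
])

# Consistency = frequency of each point's most common label so far.
consistency = np.array([
    np.bincount(history[:, j]).max() / history.shape[0]
    for j in range(history.shape[1])
])
weights = 1.0 - consistency + 1e-6          # avoid an all-zero vector
probs = weights / weights.sum()

# Next subsample concentrates on the inconsistently clustered points.
subsample = rng.choice(history.shape[1], size=4, replace=True, p=probs)
print(consistency, subsample)
```

Each new partition is then fit on such a subsample, so the ensemble progressively focuses on the regions where its members disagree.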
Progress in Pattern …, 2008
International Journal of Learning Management Systems, 2013
A new criterion for cluster validation is proposed in this paper, and based on it a clustering ensemble framework is developed. The main idea behind the framework is to extract the clusters that are most stable in terms of the defined criterion. Employing this new cluster validation criterion, the obtained ensemble is evaluated on some well-known and standard data sets. The empirical studies show promising results for the ensemble obtained using the proposed criterion compared with the ensemble obtained using the standard cluster validation criterion.