2013, Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods
Ensemble clustering methods derive a consensus partition of a set of objects starting from the results of a collection of base clustering algorithms forming the ensemble. Each partition in the ensemble provides a set of pairwise observations of the co-occurrence of objects in the same cluster. The evidence accumulation clustering paradigm uses these co-occurrence statistics to derive a similarity matrix, referred to as the co-association matrix, which is fed to a pairwise similarity clustering algorithm to obtain a final consensus clustering. The advantage of this solution is the avoidance of the label correspondence problem, which affects other ensemble clustering schemes. In this paper, we derive a principled approach for the extraction of a consensus clustering from the observations encoded in the co-association matrix. We introduce a probabilistic model for the co-association matrix parameterized by the unknown assignments of objects to clusters, which are in turn estimated using a maximum likelihood approach. Additionally, we propose a novel algorithm to carry out the parameter estimation with convergence guarantees towards a local solution. Experiments on both synthetic and real benchmark data show the effectiveness of the proposed approach.
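The co-association construction described above can be sketched in a few lines. This is a minimal illustration, not the paper's maximum-likelihood estimator: the toy ensemble, the function name, and the use of off-the-shelf average-linkage clustering on the co-association dissimilarity are all assumptions made for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def co_association(partitions):
    """Build the co-association matrix from an ensemble of labelings.

    partitions: list of 1-D integer label arrays, one per base clustering.
    Entry (i, j) is the fraction of partitions placing objects i and j
    in the same cluster; labels need not correspond across partitions.
    """
    n = len(partitions[0])
    C = np.zeros((n, n))
    for labels in partitions:
        labels = np.asarray(labels)
        C += (labels[:, None] == labels[None, :]).astype(float)
    return C / len(partitions)

# Toy ensemble over 6 objects (label values are arbitrary per partition):
ensemble = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 2, 2],
    [0, 0, 0, 2, 2, 2],
]
C = co_association(ensemble)

# A simple consensus: average-linkage on the dissimilarity 1 - C,
# cut into two clusters (condensed upper-triangle form for scipy).
Z = linkage(1 - C[np.triu_indices_from(C, k=1)], method="average")
consensus = fcluster(Z, t=2, criterion="maxclust")
```

Because only co-occurrence counts are used, the arbitrary label values of each base partition never need to be matched against one another.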
Machine Learning, 2013
Clustering ensemble methods produce a consensus partition of a set of data points by combining the results of a collection of base clustering algorithms. In the evidence accumulation clustering (EAC) paradigm, the clustering ensemble is transformed into a pairwise co-association matrix, thus avoiding the label correspondence problem, which is intrinsic to other clustering ensemble schemes. In this paper, we propose a consensus clustering approach based on the EAC paradigm, which is not limited to crisp partitions and fully ex…
IEEE transactions on systems, man, and cybernetics, 2021
Ensemble clustering has been a popular research topic in data mining and machine learning. Despite its significant progress in recent years, there are still two challenging issues in the current ensemble clustering research. First, most of the existing algorithms tend to investigate the ensemble information at the object-level, yet often lack the ability to explore the rich information at higher levels of granularity. Second, they mostly focus on the direct connections (e.g., direct intersection or pairwise co-occurrence) in the multiple base clusterings, but generally neglect the multi-scale indirect relationship hidden in them. To address these two issues, this paper presents a novel ensemble clustering approach based on fast propagation of cluster-wise similarities via random walks. We first construct a cluster similarity graph with the base clusters treated as graph nodes and the cluster-wise Jaccard coefficient exploited to compute the initial edge weights. Upon the constructed graph, a transition probability matrix is defined, based on which the random walk process is conducted to propagate the graph structural information. Specifically, by investigating the propagating trajectories starting from different nodes, a new cluster-wise similarity matrix can be derived by considering the trajectory relationship. Then, the newly obtained cluster-wise similarity matrix is mapped from the cluster-level to the object-level to achieve an enhanced co-association (ECA) matrix, which is able to simultaneously capture the object-wise co-occurrence relationship as well as the multi-scale cluster-wise relationship in ensembles. Finally, two novel consensus functions are proposed to obtain the consensus clustering result. Extensive experiments on a variety of real-world datasets have demonstrated the effectiveness and efficiency of our approach.
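The cluster-level construction described above (Jaccard-weighted cluster graph, transition matrix, random-walk propagation) can be sketched as follows. This is a simplified illustration under stated assumptions: the tiny ensemble, the fixed walk length, and the use of trajectory inner products as the refined similarity are choices made for the example, and the mapping back to an object-level ECA matrix and the two consensus functions are omitted.

```python
import numpy as np

def jaccard(a, b):
    """Jaccard coefficient between two clusters given as sets of objects."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

# Hypothetical base clusters pooled from two partitions of 6 objects;
# each cluster is a set of object indices and becomes a graph node.
base_clusters = [
    {0, 1, 2}, {3, 4, 5},      # clusters of partition 1
    {0, 1}, {2, 3}, {4, 5},    # clusters of partition 2
]

m = len(base_clusters)
W = np.array([[jaccard(base_clusters[i], base_clusters[j])
               for j in range(m)] for i in range(m)])
np.fill_diagonal(W, 0.0)  # initial edge weights, no self-loops

# Row-normalize into a transition probability matrix and run a
# t-step random walk to propagate the graph structural information.
P = W / W.sum(axis=1, keepdims=True)
t = 3
Pt = np.linalg.matrix_power(P, t)

# Refined cluster-wise similarity from the overlap of the walk
# trajectories starting at each pair of nodes.
S = Pt @ Pt.T
```

Two clusters that never intersect directly can still obtain a nonzero similarity in `S` through shared walk neighborhoods, which is the multi-scale indirect relationship the abstract refers to.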
Lecture Notes in Computer Science, 2013
Consensus clustering methodologies combine the set of partitions in a clustering ensemble into a consensus partition. One drawback of the standard combination algorithms is that all the partitions of the ensemble have the same weight in the aggregation process. By differentiating among the partitions, the quality of the consensus could be improved. In this paper we propose a novel formulation that seeks a median partition for the clustering ensemble process based on the evidence accumulation framework, but includes a weighting mechanism that differentiates the importance of the partitions in the ensemble, making the method more robust to noisy ensembles. Experiments on both synthetic and real benchmark data show the effectiveness of the proposed approach.
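The weighting mechanism can be illustrated as a weighted variant of the co-association matrix: each base partition contributes in proportion to its weight rather than equally. The weight values and the toy ensemble below are illustrative assumptions; the paper's median-partition formulation for choosing the weights is not reproduced.

```python
import numpy as np

def weighted_co_association(partitions, weights):
    """Weighted co-association matrix.

    partitions: list of 1-D integer label arrays, one per base clustering.
    weights: one nonnegative quality score per partition; normalized so
    that entry (i, j) remains a weighted fraction of co-occurrences.
    """
    n = len(partitions[0])
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    C = np.zeros((n, n))
    for labels, w in zip(partitions, weights):
        labels = np.asarray(labels)
        C += w * (labels[:, None] == labels[None, :])
    return C

ensemble = [[0, 0, 1, 1], [0, 1, 0, 1], [0, 0, 1, 1]]
# The second partition is treated as noisy and down-weighted:
C = weighted_co_association(ensemble, weights=[1.0, 0.2, 1.0])
```

With uniform weights this reduces to the standard evidence accumulation matrix, so the weighting is a strict generalization.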
2006
Ensembles of clustering methods have recently been shown to perform better than conventional clustering methods. One drawback of ensembles is that their computational requirements can be very large, making them unsuitable for large data sets. The paper presents an ensemble of leaders clustering methods in which the entire ensemble requires only a single scan of the data set. Further, the component leaders complement each other while deriving individual partitions. A heuristic-based consensus method to combine the individual partitions is presented and compared with a well-known consensus method, co-association based consensus. Experimentally, the proposed methods are shown to perform well.
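The single-scan property comes from the classical leader clustering algorithm, which the ensemble builds on: each point either joins the first existing leader within a distance threshold or becomes a new leader. A minimal sketch follows; the threshold, the 1-D toy data, and the function names are illustrative assumptions, and the paper's ensemble runs several such leader clusterings (e.g. with different thresholds) in the same scan.

```python
def leaders(points, threshold, dist):
    """Single-scan leader clustering.

    Each point is assigned to the first leader within `threshold`;
    otherwise it becomes a new leader. One pass over the data suffices.
    """
    leader_list, labels = [], []
    for p in points:
        for idx, l in enumerate(leader_list):
            if dist(p, l) <= threshold:
                labels.append(idx)
                break
        else:
            leader_list.append(p)       # p starts a new cluster
            labels.append(len(leader_list) - 1)
    return leader_list, labels

points = [0.0, 0.1, 0.2, 5.0, 5.1, 10.0]
lead, labels = leaders(points, threshold=0.5, dist=lambda a, b: abs(a - b))
# lead == [0.0, 5.0, 10.0]; labels == [0, 0, 0, 1, 1, 2]
```

Note that the result depends on presentation order, which is the usual trade-off for the O(n·k) single-scan cost.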
Anais do 10. Congresso Brasileiro de Inteligência Computacional, 2016
Consensus clustering has emerged as a method of improving quality and robustness in clustering by optimally combining the results of different clustering processes. In the last few years, several approaches have been proposed. In this paper, we propose a new method of arriving at a consensus clustering. We assign a confidence score to each partition in the ensemble and compute a weighted co-association value for all pairs of data objects. In order to derive the consensus clustering from the co-association matrix, we use the cross-association technique to group the rows and columns simultaneously. The objective is to derive as many clusters of homogeneous blocks as possible. The set of non-zero blocks is taken as the resulting partition. The use of the cross-association technique captures the transitive relationship. We show empirically that, for the benchmark datasets, our technique yields better consensus clusterings than other known algorithms.
2009
Clustering ensembles combine multiple partitions generated by different clustering algorithms into a single clustering solution. They have emerged as a prominent method for improving the robustness, stability, and accuracy of unsupervised classification solutions. So far, many contributions have been made toward finding consensus clusterings. One of the major problems in clustering ensembles is the choice of consensus function. In this paper, we first introduce clustering ensembles, the representation of multiple partitions, and their challenges, and present a taxonomy of combination algorithms. Second, we describe consensus functions in clustering ensembles, including hypergraph partitioning, voting approaches, mutual information, co-association based functions, and finite mixture models, and then explain their advantages, disadvantages, and computational complexity. Finally, we compare the characteristics of clustering ensemble algorithms, such as computational complexity, robustness, simplicity, and accuracy, on the different datasets used in previous techniques.
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 2011
Cluster ensembles combine multiple clusterings of a set of objects into a single consolidated clustering, often referred to as the consensus solution. Consensus clustering can be used to generate more robust and stable clustering results compared to a single clustering approach, perform distributed computing under privacy or sharing constraints, or reuse existing knowledge. This paper describes a variety of algorithms that have been proposed to address the cluster ensemble problem, organizing them in conceptual categories that bring out the common threads and lessons learnt while simultaneously highlighting unique features of individual approaches.
Clustering ensembles combine multiple partitions, produced by different clustering algorithms, into a single clustering solution. They are used to improve the robustness, stability, and accuracy of unsupervised classification solutions. The major problem in clustering ensembles is the consensus function. Consensus functions in clustering ensembles include hypergraph partitioning, mutual information, co-association based functions, voting approaches, and finite mixture models. The characteristics of clustering ensemble algorithms are computational complexity, robustness, simplicity, and accuracy on the different datasets used in previous techniques.
International Journal of Electrical and Computer Engineering (IJECE), 2018
Data analysis plays a prominent role in interpreting various phenomena. Data mining is the process of extracting useful knowledge from extensive data. Building on classical statistical models, the data can be exploited beyond its mere storage and management. Cluster analysis, a primary means of investigation when little or no prior knowledge is available, spans research and development across a wide variety of communities. Cluster ensembles are a melange of individual solutions obtained from different clusterings, combined to produce the final high-quality clustering required in wider applications. The method arose from the need for increased robustness, scalability, and accuracy. This paper gives a brief overview of the generation methods and consensus functions involved in cluster ensembles. The survey analyzes the various techniques and cluster ensemble methods.
2004
This paper presents a probabilistic model for combining cluster ensembles utilizing information theoretic measures. Starting from a co-association matrix which summarizes the ensemble, we extract a set of association distributions, which are modelled as discrete probability distributions of the object labels, conditional on each data object. The key objectives are, first, to model the associations of neighboring data objects, and second, to allow for the manipulation of the defined probability distributions using statistical and information theoretic means. A Jensen-Shannon Divergence based Clustering Combination (JSDCC) method is proposed. The method selects cluster prototypes from the set of association distributions based on entropy maximization and maximization of the generalized JS divergence among the selected prototypes. The method proceeds by grouping association distributions by minimizing their JS divergences to the selected prototypes. By aggregating the grouped association distributions, we can represent empirical cluster conditional probability distributions of the object labels, for each of the combined clusters. Finally, data objects are assigned to their most likely clusters, and their cluster assignment probabilities are estimated. Experiments are performed to assess the presented method and compare its performance with other alternative co-association based methods.
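The central quantity in the JSDCC method above is the Jensen-Shannon divergence between discrete association distributions. A minimal sketch of that computation follows; the function names and the two example distributions over three cluster labels are assumptions for illustration, and the prototype-selection and grouping steps of the method are not reproduced.

```python
import numpy as np

def kl(p, q):
    """Kullback-Leibler divergence (base-2 logs), skipping zero terms of p."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions.

    Symmetric, and with base-2 logs bounded in [0, 1]; it is the average
    KL divergence of p and q to their mixture m = (p + q) / 2.
    """
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Two association distributions over 3 cluster labels:
a = [0.8, 0.1, 0.1]
b = [0.1, 0.1, 0.8]
d = js_divergence(a, b)
```

Unlike raw KL divergence, the JS divergence is symmetric and always finite, which is what makes it usable as a pairwise dissimilarity between association distributions.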