2010, International Journal of …
2009
Clustering ensembles combine multiple partitions generated by different clustering algorithms into a single clustering solution. They have emerged as a prominent method for improving the robustness, stability and accuracy of unsupervised classification solutions. Many contributions have so far been made toward finding a consensus clustering, and the design of the consensus function remains one of the major problems in clustering ensembles. In this paper we first introduce clustering ensembles, the representation of multiple partitions and its challenges, and present a taxonomy of combination algorithms. Second, we describe consensus functions in clustering ensembles, including hypergraph partitioning, the voting approach, mutual information, co-association based functions and the finite mixture model, and then explain their advantages, disadvantages and computational complexity. Finally, we compare the characteristics of clustering ensemble algorithms, such as computational complexity, robustness, simplicity and accuracy, on the different datasets used in previous techniques.
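Among the consensus functions listed above, the co-association approach is the simplest to illustrate. The following is a minimal sketch in plain Python, not any paper's implementation; the function names (`build_coassociation`, `consensus_by_threshold`) and the single-link-style thresholding are assumptions for illustration:

```python
def build_coassociation(partitions):
    """Fraction of base partitions in which each pair of points shares a cluster."""
    n = len(partitions[0])
    m = len(partitions)
    co = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    co[i][j] += 1.0 / m
    return co

def consensus_by_threshold(co, tau=0.5):
    """Single-link style consensus: connect points co-clustered more often than tau."""
    n = len(co)
    label = [-1] * n
    cur = 0
    for i in range(n):
        if label[i] != -1:
            continue
        stack, label[i] = [i], cur
        while stack:
            u = stack.pop()
            for v in range(n):
                if label[v] == -1 and co[u][v] > tau:
                    label[v] = cur
                    stack.append(v)
        cur += 1
    return label

# Three base partitions of six points; the labels themselves need no alignment,
# because co-association only asks whether two points share a cluster.
partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
]
print(consensus_by_threshold(build_coassociation(partitions)))  # → [0, 0, 0, 1, 1, 1]
```

Note that the second base partition disagrees about point 2, yet the majority evidence in the co-association matrix still recovers the two natural groups.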
International Journal of Learning Management Systems, 2013
A new criterion for cluster validation is proposed in this paper, and based on it a clustering ensemble framework is introduced. The main idea behind the framework is to extract the most stable clusters in terms of the defined criterion. Employing this new cluster validation criterion, the obtained ensemble is evaluated on several well-known and standard data sets. The empirical studies show promising results for the ensemble obtained using the proposed criterion compared with the ensemble obtained using the standard cluster validation criterion.
International Journal of Electrical and Computer Engineering (IJECE), 2024
Ensemble learning stands out as a widely embraced technique in machine learning. This research explores the application of ensemble learning, including ensemble clustering, to enhance the precision of cluster analysis for datasets with multiple attributes and unclear correlations. Employing a majority voting-based ensemble clustering approach, specific techniques such as k-means clustering, affinity propagation, mean shift, BIRCH clustering, and others are applied to defined datasets, leading to improved clustering results. The study involves a comprehensive comparative analysis, contrasting ensemble clustering outcomes with those of individual techniques. The process of improving cluster identification accuracy encompasses data collection, pre-processing to exclude irrelevant elements, and the application of standard clustering algorithms. The task includes defining the optimal number of groups before comparing clustering models. Additionally, a combined model is constructed by merging BIRCH clustering and mean shift clustering, leveraging their advantages to enhance overall clustering strength and accuracy. This research contributes to advancing ensemble learning and ensemble clustering methodologies, offering improved accuracy, and uncovering hidden patterns in complex datasets.
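A majority-voting ensemble such as the one described above needs the base clusterings' arbitrary label IDs aligned before votes can be counted. The sketch below uses a greedy overlap-based relabeling against a reference partition; the alignment strategy and the names (`align_labels`, `majority_vote`) are assumptions for illustration, not the study's actual procedure:

```python
from collections import Counter

def align_labels(reference, labels):
    """Greedily relabel `labels` so each cluster maps to its best-overlapping
    reference cluster (a lightweight stand-in for Hungarian matching)."""
    overlap = Counter((l, r) for l, r in zip(labels, reference))
    mapping, used = {}, set()
    for (l, r), _ in overlap.most_common():
        if l not in mapping and r not in used:
            mapping[l] = r
            used.add(r)
    return [mapping.get(l, l) for l in labels]

def majority_vote(partitions):
    """Align every partition to the first one, then take a per-point majority."""
    ref = partitions[0]
    aligned = [ref] + [align_labels(ref, p) for p in partitions[1:]]
    return [Counter(col).most_common(1)[0][0] for col in zip(*aligned)]

# Base clusterings from three hypothetical algorithms; the second uses flipped IDs.
print(majority_vote([[0, 0, 1, 1], [1, 1, 0, 0], [0, 1, 1, 1]]))  # → [0, 0, 1, 1]
```

In practice the base partitions would come from k-means, BIRCH, mean shift and the like; only their label vectors are needed here.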
2007
Previous clustering ensemble algorithms usually use a consensus function to obtain a final partition from the outputs of the initial clusterings. In this paper, we propose a new clustering ensemble method that generates a new feature space from the initial clustering outputs. Multiple runs of an initial clustering algorithm such as k-means generate a feature space that is significantly better than the pure or normalized feature space; therefore, running a simple clustering algorithm on the generated feature space yields a final partition significantly better than one obtained from the pure data. In this method, we use a modification of k-means for the initial clustering runs, named "Intelligent k-means", which is defined especially for clustering ensembles. The results of the proposed method are presented using both simple k-means and Intelligent k-means. Fast convergence and appropriate behavior are the most interesting properties of the proposed method. Experimental results on real data sets show the effectiveness of the proposed method.
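The feature-space construction described above can be sketched simply: one-hot encode each base partition's labels and concatenate the encodings, so each point becomes a binary vector recording all of its cluster memberships. This is an illustrative sketch under that reading of the abstract; the helper name `label_feature_space` is an assumption:

```python
def label_feature_space(partitions):
    """Turn a list of base label vectors into a new feature matrix:
    one one-hot block per base partition, concatenated per point."""
    n = len(partitions[0])
    features = []
    for idx in range(n):
        row = []
        for labels in partitions:
            k = max(labels) + 1          # number of clusters in this base partition
            one_hot = [0] * k
            one_hot[labels[idx]] = 1
            row.extend(one_hot)
        features.append(row)
    return features

# Two base k-means runs over four points; a final clustering algorithm
# would then be run on these rows instead of the raw data.
print(label_feature_space([[0, 0, 1, 1], [1, 0, 0, 1]]))
```

Points that are frequently co-clustered end up with similar rows, which is why a simple final clusterer can do well in this space.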
Clustering is fundamental to understand the structure of data. In the past decade the cluster ensemble problem has been introduced, which combines a set of partitions (an ensemble) of the data to obtain a single consensus solution that outperforms all the ensemble members. However, there is disagreement about which are the best ensemble characteristics to obtain a good performance: some authors have suggested that highly different partitions within the ensemble are beneficial for the final performance, whereas others have stated that medium diversity among them is better. While there are several measures to quantify the diversity, a better method to analyze the best ensemble characteristics is necessary. This paper introduces a new ensemble generation strategy and a method to make slight changes in its structure. Experimental results on six datasets suggest that this is an important step towards a more systematic approach to analyze the impact of the ensemble characteristics on the overall consensus performance.
2006
Ensembles of clustering methods have recently been shown to perform better than conventional clustering methods. One drawback of an ensemble is that its computational requirements can be very large, and hence it may not be suitable for large data sets. The paper presents an ensemble of leaders clustering methods in which the entire ensemble requires only a single scan of the data set. Further, the component leaders complement each other while deriving individual partitions. A heuristic-based consensus method to combine the individual partitions is presented and compared with a well-known consensus method called co-association based consensus. Experimentally, the proposed methods are shown to perform well.
International Conference on Information Fusion, 2006
Cluster ensembles are deemed to be a robust and accurate alternative to single clustering runs. 24 methods for designing cluster ensembles are compared here using 24 data sets, both artificial and real. The adjusted Rand index and classification accuracy are used as accuracy criteria with respect to a known partition assumed to be the "true" one. The data sets are randomly
International Journal of Electrical and Computer Engineering (IJECE), 2018
Data analysis plays a prominent role in interpreting various phenomena. Data mining is the process of discovering useful knowledge from extensive data; based upon classical statistical prototypes, the data can be exploited beyond its storage and management. Cluster analysis, a primary investigation tool applied with little or no prior knowledge, spans research and development across a wide variety of communities. Cluster ensembles blend individual solutions obtained from different clusterings to produce a final high-quality clustering, which is required in a wide range of applications; the approach arose in the pursuit of increased robustness, scalability and accuracy. This paper gives a brief overview of the generation methods and consensus functions included in a cluster ensemble, surveying and analyzing the various techniques and cluster ensemble methods.
Pattern Analysis and Applications, 2017
Clustering, as a major task in data mining, is responsible for discovering hidden patterns in unlabeled datasets, and finding the best clustering is considered one of the most challenging problems in data mining. Due to the problem's complexity and the weaknesses of primary clustering algorithms, a large part of research has been directed toward ensemble clustering methods. Ensemble clustering aggregates a pool of base clusterings and produces an output clustering, also named the consensus clustering, which is usually better than the output clusterings of the basic clustering algorithms. However, lack of quality in the base clusterings makes their consensus clustering weak. Despite some research on selecting a subset of high-quality base clusterings based on a clustering assessment metric, cluster-level selection has generally been ignored. In this paper, a new clustering ensemble framework is proposed based on cluster-level weighting. The amount of certainty that the ensemble has about a cluster, computed from the degree to which the ensemble agrees on that cluster, is considered the reliability of that cluster. The final ensemble is then created by selecting the best clusters and assigning each selected cluster a weight based on its reliability. The paper further proposes a cluster-level weighted co-association matrix in place of the traditional co-association matrix, and two consensus functions are introduced and used to produce the consensus partition. Experimentally, the proposed framework substantially outperforms the state-of-the-art clustering ensemble methods.
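The cluster-level weighting idea can be sketched as a small variation on the plain co-association matrix: each partition's vote for a pair of points is scaled by the reliability weight of the cluster containing them. This is an illustrative sketch, not the paper's implementation; the per-cluster weight dictionaries and the name `weighted_coassociation` are assumptions:

```python
def weighted_coassociation(partitions, cluster_weights):
    """Co-association matrix where each partition's vote for a pair (i, j)
    is scaled by the reliability weight of the cluster containing the pair.

    cluster_weights[p][c] is the (hypothetical) reliability of cluster c
    in partition p, in [0, 1]."""
    n = len(partitions[0])
    m = len(partitions)
    co = [[0.0] * n for _ in range(n)]
    for labels, weights in zip(partitions, cluster_weights):
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    co[i][j] += weights[labels[i]] / m
    return co

# Two identical base partitions, but cluster 1 is judged only half as reliable:
# pairs inside it accumulate weaker evidence than pairs inside cluster 0.
co = weighted_coassociation(
    [[0, 0, 1, 1], [0, 0, 1, 1]],
    [{0: 1.0, 1: 0.5}, {0: 1.0, 1: 0.5}],
)
print(co[0][1], co[2][3])  # → 1.0 0.5
```

A consensus function (e.g. thresholding or hierarchical clustering) would then be applied to this weighted matrix exactly as to the traditional one.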
Proceedings of the 16th Conference on Computer Science and Intelligence Systems, 2021
Clustering ensembles combine multiple partitions, produced by different clustering algorithms, into a single clustering solution, and are used to improve the robustness, stability, and accuracy of unsupervised classification solutions. The major problem in clustering ensembles is the consensus function. Consensus functions in clustering ensembles include hypergraph partitioning, mutual information, co-association based functions, the voting approach and the finite mixture model. The characteristics of clustering ensemble algorithms compared here are computational complexity, robustness, simplicity and accuracy on the datasets used in previous techniques.
Journal of Intelligent & Fuzzy Systems, 2020
During the last decade, ensemble clustering has been the subject of much research in data mining. In ensemble clustering, several basic partitions are first generated and a function is then used for clustering aggregation, in order to create a final partition that is as similar as possible to all of the basic partitions. Ensemble clustering has been proposed to enhance the efficiency, strength, reliability, and stability of clustering. A common slogan concerning ensemble clustering techniques is that "a model combining several poorer models is better than a single stronger model". In this paper, an ensemble clustering method is proposed using the basic k-means clustering method as its base clustering algorithm; the study also raises the diversity of the consensus by adopting certain measures. Although our clustering ensemble approach has the strengths of k-means, such as its efficacy and low complexity, it avoids the drawbacks from which k-means suffers, such as its difficulty in detecting clusters that are not uniformly distributed or not circular in shape. In the empirical studies, we test the proposed ensemble clustering algorithm as well as other up-to-date cluster ensembles on different data sets. Based on the experimental results, our cluster ensemble method is stronger than the recent competitor cluster ensemble algorithms and compares favorably with the most recent clustering methods available.
Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004., 2004
Clustering ensembles combine multiple partitions of the given data into a single clustering solution of better quality. Inspired by the success of supervised boosting algorithms, we devise an adaptive scheme for integration of multiple non-independent clusterings. Individual partitions in the ensemble are sequentially generated by clustering specially selected subsamples of the given data set. The sampling probability for each data point dynamically depends on the consistency of its previous assignments in the ensemble. New subsamples are drawn to increasingly focus on the problematic regions of the input feature space. A measure of a data point's clustering consistency is defined to guide this adaptation. An empirical study compares the performance of adaptive and regular clustering ensembles using different consensus functions on a number of data sets. Experimental results demonstrate improved accuracy for some clustering structures.
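The adaptive resampling step described above can be sketched as a mapping from per-point consistency scores to sampling probabilities, so that points whose past assignments were inconsistent are drawn more often. The paper defines its own consistency measure; this proportional-to-inconsistency mapping, the `floor` parameter, and the name `sampling_probabilities` are assumptions for illustration:

```python
def sampling_probabilities(consistency, floor=0.05):
    """Map per-point clustering-consistency scores in [0, 1] to sampling
    probabilities. Inconsistent points (low scores) get higher probability;
    the floor keeps even perfectly consistent points reachable."""
    weights = [max(1.0 - c, floor) for c in consistency]
    total = sum(weights)
    return [w / total for w in weights]

# Point 0 has always been clustered the same way, point 2 never:
# the next subsample will focus on the problematic region around point 2.
print(sampling_probabilities([1.0, 0.5, 0.0]))
```

Each new ensemble member would then be built by clustering a subsample drawn with these probabilities, and the consistency scores updated from its assignments.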
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 2011
Cluster ensembles combine multiple clusterings of a set of objects into a single consolidated clustering, often referred to as the consensus solution. Consensus clustering can be used to generate more robust and stable clustering results compared to a single clustering approach, perform distributed computing under privacy or sharing constraints, or reuse existing knowledge. This paper describes a variety of algorithms that have been proposed to address the cluster ensemble problem, organizing them in conceptual categories that bring out the common threads and lessons learnt while simultaneously highlighting unique features of individual approaches.
Machine Learning, 2013
Cluster ensembles or consensus clusterings have been shown to be better than any standard clustering algorithm at improving accuracy and robustness across various sets of data. This meta-learning formalism also helps users to overcome the dilemma of selecting an appropriate technique and the parameters for that technique. Since its inception, different research areas have emerged with the common purpose of enhancing the effectiveness and applicability of cluster ensembles. These include the selection of ensemble members, the imputation of missing values, and the summarization of ensemble members. In particular, this paper is set to provide a review of the different matrix refinement approaches that have recently been proposed in the literature for summarizing the information of multiple clusterings. With various benchmark datasets and quality measures, a comparative study of these novel techniques is carried out to provide empirical findings from which a practical guideline can be drawn.
2010
Looking back on the past decade of research on clustering algorithms, we witness two major and apparent trends: 1) the already vast number of existing clustering algorithms is continuously broadened, and 2) clustering algorithms in general are becoming more and more adapted to specific application domains with very particular assumptions. As a result, algorithms have grown complicated and/or very scenario-dependent, which has made clustering a hardly accessible domain for non-expert users. This is an especially critical development since, due to increasing data gathering, the need for analysis techniques like clustering emerges in many application domains. In this paper, we oppose the current focus on specialization by proposing our vision of a usable, guided and universally applicable clustering process. In detail, we describe our already conducted work and present our future research directions.
2010
Part I investigates the concept of ensemble clustering. It presents a comprehensive review of the state of the art in ensemble clustering. It follows by discussing the impact of the ensemble variability on the final consensual result. Visualization of ensemble variability based on ...
Computer Science Review, 2018
Cluster ensembles have been shown to be better than any standard clustering algorithm at improving accuracy and robustness across different data collections. This meta-learning formalism also helps users to overcome the dilemma of selecting an appropriate technique and the corresponding parameters, given a set of data to be investigated. Almost two decades after the first publication of a kind, the method has proven effective for many problem domains, especially microarray data analysis and its downstreaming applications. Recently, it has been greatly extended both in terms of theoretical modelling and deployment to problem solving. The survey attempts to match this emerging attention with the provision of fundamental basis and theoretical details of state-of-the-art methods found in the present literature. It yields the ranges of ensemble generation strategies, summarization and representation of ensemble members, as well as the topic of consensus clustering. This review also includes different applications and extensions of cluster ensemble, with several research issues and challenges being highlighted.
Combination of multiple clusterings is an important task in the area of unsupervised learning. Inspired by the success of supervised bagging algorithms, we propose a resampling scheme for the integration of multiple independent clusterings. Individual partitions in the ensemble are sequentially generated by clustering specially selected subsamples of the given data set. In this paper, we compare the efficacy of both subsampling (sampling without replacement) and bootstrap (sampling with replacement) techniques in conjunction with several fusion algorithms. The empirical study shows that a meaningful consensus partition for an entire set of data points emerges from multiple clusterings of subsamples of small size. The purpose of this paper is to show that small subsamples generally suffice to represent the structure of the entire data set in the framework of clustering ensembles. Subsamples of small size can reduce the computational cost and measurement complexity of many unsupervised data mining tasks.
Mathematical and Computational Applications, 2011
Ensemble clustering is a promising approach that combines the results of multiple clustering algorithms to obtain a consensus partition by merging different partitions based upon well-defined rules. In this study, we use an ensemble clustering approach to merge the results of five different clustering algorithms that are sometimes used in bioinformatics applications. The ensemble clustering result is tested on microarray data sets and compared with the results of the individual algorithms. An external cluster validation index, the adjusted Rand index (C-rand), and two internal cluster validation indices, silhouette and modularity, are used for comparison purposes.