2021, Computers, Materials & Continua
In order to improve the performance and robustness of clustering, clustering ensemble techniques generate and aggregate a number of primary clusterings. Fuzzy clustering ensemble approaches attempt to improve the performance of fuzzy clustering tasks. However, cluster (or clustering) reliability has received little attention in these approaches, which makes them weak in dealing with low-quality base clustering methods. In this paper, we utilize cluster unreliability estimation and a local weighting strategy to propose a new fuzzy clustering ensemble method that introduces Reliability Based weighted co-association matrix Fuzzy C-Means (RBFCM), Reliability Based Graph Partitioning (RBGP) and Reliability Based Hyper Clustering (RBHC) as three new fuzzy clustering consensus functions. Our fuzzy clustering ensemble approach works based on fuzzy cluster unreliability estimation. Cluster unreliability is estimated according to an entropic criterion using the cluster labels in the entire ensemble. To do so, a new metric is defined to estimate fuzzy cluster unreliability; then, the reliability value of any cluster is determined using a Reliability Driven Cluster Indicator (RDCI). The time complexities of RBHC and RBGP grow linearly.
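As a rough illustration of the entropic unreliability idea described in this abstract, the sketch below measures how consistently one cluster's members stay together across the other partitions of the ensemble. The function name, the use of Shannon entropy in bits, and the averaging over partitions are my own assumptions, not the paper's exact metric:

```python
import numpy as np

def cluster_unreliability(member_idx, ensemble_labels):
    """Entropy-based unreliability of one cluster (illustrative sketch).

    member_idx: indices of the points belonging to the cluster under evaluation.
    ensemble_labels: list of 1-D label arrays, one per base clustering.
    """
    entropies = []
    for labels in ensemble_labels:
        sub = labels[member_idx]                 # where the members land here
        _, counts = np.unique(sub, return_counts=True)
        p = counts / counts.sum()
        entropies.append(float(-(p * np.log2(p)).sum()))  # entropy of the split
    # 0 means the members are never split apart; larger means less reliable
    return float(np.mean(entropies))
```

A reliability indicator in the spirit of the RDCI could then be derived as a decreasing function of this value, e.g. `exp(-unreliability)`.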
Applied Intelligence, 2018
Pattern Analysis and Applications, 2017
Clustering, a major task in data mining, is responsible for discovering hidden patterns in unlabeled datasets, and finding the best clustering is considered one of the most challenging problems in data mining. Due to the problem's complexity and the weaknesses of primary clustering algorithms, a large part of research has been directed toward ensemble clustering methods. Ensemble clustering aggregates a pool of base clusterings and produces an output clustering, also named the consensus clustering, which is usually better than the outputs of the individual base clustering algorithms. However, a lack of quality in the base clusterings weakens their consensus. Although some research has addressed selecting a subset of high-quality base clusterings using a clustering assessment metric, cluster-level selection has always been ignored. In this paper, a new clustering ensemble framework is proposed based on cluster-level weighting. The certainty the given ensemble has about a cluster is considered the reliability of that cluster, and it is computed from the accretion amount of that cluster by the ensemble. The best clusters are then selected and each is assigned a weight based on its reliability to create the final ensemble. After that, the paper proposes a cluster-level weighted co-association matrix in place of the traditional co-association matrix. Two consensus functions are then introduced and used to produce the consensus partition. Experimentally, the proposed framework clearly outperforms the state-of-the-art clustering ensemble methods.
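A cluster-level weighted co-association matrix of the kind this abstract describes can be sketched as follows. The function name and the dict-of-weights interface are my own illustrative choices; setting a cluster's weight to 0 corresponds to deselecting it:

```python
import numpy as np

def weighted_co_association(base_labels, cluster_weights):
    """Cluster-level weighted co-association matrix (illustrative sketch).

    base_labels: list of 1-D label arrays, one per base clustering.
    cluster_weights: list of dicts mapping cluster label -> weight
        (e.g. a reliability score; weight 0 effectively removes the cluster).
    """
    n = len(base_labels[0])
    co = np.zeros((n, n))
    for labels, weights in zip(base_labels, cluster_weights):
        for c, w in weights.items():
            idx = np.flatnonzero(labels == c)
            co[np.ix_(idx, idx)] += w   # every co-clustered pair accrues w
    return co / len(base_labels)
```

The resulting matrix can feed any similarity-based consensus function, e.g. average-linkage hierarchical clustering on `1 - co`.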
Karbala International Journal of Modern Science, 2015
The data mining literature offers several clustering techniques, but even an effective clustering technique can produce unreliable results, placing its efficacy under scrutiny. Here, the proposal is an integrated framework that ensures the reliability of the class labels assigned to a dataset whose class labels are unknown. The model uses PSO-k-means, k-medoids, c-means and Expectation Maximization for data clustering, and integrates their results through a majority-voting cluster ensemble technique to enhance reliability. The reliable outcomes serve as the training set for the classification process through a Bayesian classifier, Multi-Layer Perceptron, Support Vector Machine and decision tree. The class labels predicted by the majority of classifiers through the bagging classifier ensemble method are included with the training set and, in combination, designated as the set with known class labels. Heterogeneous datasets with unknown class labels but a known number of classes, after being treated through this model, can be assigned class labels for a significant portion of the data with acceptable reliability. The evaluation procedure follows the Dunn, Davies-Bouldin and modified Goodman-Kruskal indexing techniques for internal validation, along with probabilistic measures such as Normalized Mutual Information, Normalized Variation of Information and the Adjusted Rand Index, which are appropriate measures of goodness-of-fit and robustness of the final clusters. The predictive capacity of the model is also validated through probabilistic measures and external indexing techniques such as the Purity measure, Rand Index and F-measure.
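Majority voting over several clusterings, as used in this framework, first requires aligning the arbitrary cluster labels of each partition to a common reference. A minimal sketch (brute-force label matching, fine for a small number of clusters; the function names are my own):

```python
from itertools import permutations
import numpy as np

def align_labels(reference, labels):
    """Relabel `labels` to maximize agreement with `reference`
    (brute-force over label permutations; suitable for small k)."""
    k = int(max(reference.max(), labels.max())) + 1
    best, best_agree = labels, -1
    for perm in permutations(range(k)):
        relabeled = np.array([perm[l] for l in labels])
        agree = int((relabeled == reference).sum())
        if agree > best_agree:
            best, best_agree = relabeled, agree
    return best

def majority_vote(partitions):
    """Consensus clustering by per-point majority vote after alignment."""
    ref = partitions[0]
    aligned = np.stack([ref] + [align_labels(ref, p) for p in partitions[1:]])
    return np.array([np.bincount(col).argmax() for col in aligned.T])
```

For larger numbers of clusters, the permutation search is usually replaced by Hungarian (optimal assignment) matching.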
Cluster analysis is an important exploratory tool that reveals underlying structures in data and organizes them into clusters (groups) based on their similarities. The fuzzy approach to the clustering problem involves the concept of partial memberships of the instances in the clusters, increasing the flexibility and enhancing the semantics of the generated clusters. Several fuzzy clustering algorithms have been devised, such as fuzzy c-means (FCM), Gustafson-Kessel, Gath-Geva and kernel-based FCM. Although these algorithms have a myriad of successful applications, each of them has stability drawbacks related to several factors, including the shape and density of clusters, the presence of noise or outliers, and the choices of algorithm parameters and cluster center initialization. In this paper we provide a heterogeneous cluster ensemble approach to improve the stability of fuzzy cluster analysis. The key idea of our methodology is the application of different fuzzy clustering algorithms to the datasets to obtain multiple partitions, which are later fused into the final consensus matrix. Finally, we experimentally evaluate and compare the accuracy of this methodology.
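The FCM algorithm mentioned here alternates between a weighted center update and the standard membership update. A plain, self-contained sketch (the fuzzifier m, iteration count and random initialization are conventional defaults, not choices from this paper):

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, iters=100, seed=0):
    """Plain fuzzy c-means. X: (n, d) data; c: number of clusters;
    m > 1: fuzzifier. Returns (centers, membership matrix U of shape (c, n))."""
    rng = np.random.default_rng(seed)
    U = rng.random((c, len(X)))
    U /= U.sum(axis=0)                          # memberships sum to 1 per point
    for _ in range(iters):
        W = U ** m
        centers = (W @ X) / W.sum(axis=1, keepdims=True)
        # distances from every center to every point, shape (c, n)
        d = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2) + 1e-12
        U = 1.0 / (d ** (2.0 / (m - 1.0)))      # standard FCM membership update
        U /= U.sum(axis=0)
    return centers, U
```

The sensitivity to initialization that this abstract points out is visible here: a different `seed` can change which local optimum the alternation converges to.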
Fuzzy clustering and cluster ensembles are important subjects in data mining. In recent years, fuzzy clustering algorithms have been growing rapidly, but fuzzy clustering ensemble techniques have not grown as much, and most of them have been created by converting a crisp consensus function into a fuzzy version. In this paper, a graph-based fuzzy cluster ensemble method is introduced. The proposed approach uses membership matrices obtained from multiple fuzzy partitions produced by various fuzzy methods, and then creates a fuzzy co-association matrix for each partition, whose entries represent the degree of correlation between the related data points. All of these matrices are then summarized in another matrix, called the strength matrix, and the final result is obtained by an iterative decreasing process until the desired number of clusters is reached. Several UCI datasets are used for the evaluation of the proposed methods. The experiments show the proposed approach can be more effective than base clustering algorithms such as FCM, K-means and spectral clustering, and, in comparison with various cluster ensemble methods, the proposed methods yield more reliable results with lower error rates.
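One common way to turn a membership matrix into the kind of fuzzy co-association matrix described above is the inner product of membership columns: two points are similar to the extent they share membership mass across clusters. This is one possible choice of entry formula, not necessarily the paper's exact definition:

```python
import numpy as np

def fuzzy_co_association(memberships):
    """Average fuzzy co-association over an ensemble (illustrative sketch).

    memberships: list of (c, n) membership matrices whose columns sum to 1.
    Per partition, similarity(i, j) = sum_k u[k, i] * u[k, j].
    """
    n = memberships[0].shape[1]
    co = np.zeros((n, n))
    for U in memberships:
        co += U.T @ U        # inner product of membership columns
    return co / len(memberships)
```

Summing such per-partition matrices corresponds to the "strength matrix" aggregation step the abstract describes.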
Many stability measures, such as Normalized Mutual Information (NMI), have been proposed to validate a set of partitionings. It is quite possible that a set of partitionings containing one or more high-quality clusters is still adjudged of low quality by a stability measure and, as a result, is completely neglected. Inspired by evaluation approaches that measure the efficacy of a set of partitionings, researchers have tried to define new measures for evaluating a single cluster. Thus far, the measures defined for assessing a cluster are mostly based on the well-known NMI measure. The drawback of this commonly used approach is discussed in this paper, after which a new asymmetric criterion, called the Alizadeh-Parvin-Moshki-Minaei criterion (APMM), is proposed to assess the association between a cluster and a set of partitionings. We show that the APMM criterion overcomes the deficiency in the conventional NMI measure. We also propose a clustering ensemble framework that incorporates the APMM's capabilities in order to find the best performing clusters. The framework uses Average APMM (AAPMM) as a fitness measure to select a number of clusters instead of using all of the results. Any cluster that satisfies a predefined threshold of this measure is selected to participate in an elite ensemble. To combine the chosen clusters, a co-association matrix-based consensus function (by which the resultant partitionings are obtained) is used. Because Evidence Accumulation Clustering (EAC) cannot appropriately derive the co-association matrix from a subset of clusters, a new EAC-based method, called Extended EAC (EEAC), is employed to construct the co-association matrix from the chosen subset of clusters. Empirical studies show that our proposed approach outperforms other cluster ensemble approaches.
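Building a co-association matrix from only a selected subset of clusters, as EEAC must, raises a normalization question: plain EAC divides pair counts by the ensemble size, which is no longer meaningful once clusters are dropped. A sketch of one plausible normalization (pair co-occurrences divided by the number of selected clusters containing either point); this is an illustrative choice, not necessarily EEAC's exact formula:

```python
import numpy as np

def co_association_from_clusters(selected_clusters, n):
    """Co-association matrix from a chosen subset of clusters (sketch).

    selected_clusters: list of index arrays, each listing the members of
    one selected (elite) cluster. n: total number of data points.
    """
    both = np.zeros((n, n))     # times i and j appear in the same cluster
    either = np.zeros((n, n))   # times a selected cluster contains i or j
    for idx in selected_clusters:
        mask = np.zeros(n, dtype=bool)
        mask[idx] = True
        both[np.ix_(idx, idx)] += 1.0
        either += (mask[:, None] | mask[None, :]).astype(float)
    return both / np.maximum(either, 1.0)   # pairs never seen stay at 0
```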
Studies in Computational Intelligence, 2008
Ensemble clustering is a novel research field that extends to unsupervised learning the approach originally developed for classification and supervised learning problems. In particular, ensemble clustering methods have been developed to improve the robustness and accuracy of clustering algorithms, as well as the ability to capture the structure of complex data. In many clustering applications an example may belong to multiple clusters, and the introduction of fuzzy set theory concepts can provide the flexibility needed to model the uncertainty underlying real data in several application domains. In this paper, we propose an unsupervised fuzzy ensemble clustering approach that combines the flexibility of fuzzy sets with the robustness of ensemble methods. Our algorithmic scheme can generate different ensemble clustering algorithms that obtain the final consensus clustering in both crisp and fuzzy formats.
Intelligent Data Analysis, 2014
Many stability measures, such as Normalized Mutual Information (NMI), have been proposed to validate a set of partitionings. It is quite possible that a set of partitionings containing one or more high-quality clusters is still adjudged of low quality by a stability measure and, as a result, is completely neglected. Inspired by evaluation approaches that measure the efficacy of a set of partitionings, researchers have tried to define new measures for evaluating a single cluster. Thus far, the measures defined for assessing a cluster are entirely based on the well-known NMI measure. The drawback of this commonly used approach is discussed in this paper, after which a new asymmetric criterion, called the Alizadeh-Parvin-Moshki-Minaei criterion (APMM), is proposed to assess the association between a cluster and a set of partitionings. The APMM criterion overcomes the deficiency in the conventional NMI measure. We also propose a clustering ensemble framework that incorporates the APMM's capabilities in order to find the best performing clusters. The framework uses Average APMM (AAPMM) as a fitness measure to select a number of clusters instead of using all of the results. Any cluster that satisfies a predefined threshold of this measure is selected to participate in an elite ensemble. To combine the chosen clusters, a co-association matrix-based consensus function (by which the resultant partitionings are obtained) is used. Because Evidence Accumulation Clustering (EAC) cannot derive the co-association matrix from a subset of clusters, a new EAC-based method, called Extended EAC (EEAC), is employed to construct the co-association matrix from the chosen subset of clusters. Empirical studies show that our proposed approach outperforms other cluster ensemble approaches.
Fuzzy Sets and Systems, 2012
Consensus clustering, i.e. the task of combining the outcomes of several clustering systems into a single partition, has lately attracted the attention of researchers in the unsupervised classification field, as it allows the creation of clustering committees that can be applied with multiple interesting purposes, such as knowledge reuse or distributed clustering. However, little attention has been paid to the development of algorithms, known as consensus functions, especially designed for consolidating the outcomes of multiple fuzzy (or soft) clustering systems into a single fuzzy partition, despite the fact that fuzzy clustering is far more informative than its crisp counterpart, as it provides information regarding the degree of association between objects and clusters that can be helpful for deriving richer descriptive data models. For this reason, this paper presents a set of fuzzy consensus functions capable of creating soft consensus partitions by fusing a collection of fuzzy clusterings. Our proposals base the clustering combination on a cluster disambiguation process followed by the application of positional and confidence voting techniques. The modular design of these algorithms makes it possible to sequence their constituting steps in different manners, which allows deriving versions of the proposed consensus functions optimized from a computational standpoint. The proposed consensus functions have been evaluated in terms of the quality of the consensus partitions they deliver and in terms of their running time on multiple benchmark data sets. A comparison against several representative state-of-the-art consensus functions reveals that our proposals constitute an appealing alternative for conducting fuzzy consensus clustering, as they are capable of yielding high quality consensus partitions at a low computational cost.
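The confidence-voting step mentioned in this abstract can be reduced, in its simplest form, to averaging the membership matrices once the disambiguation step has aligned their clusters. A minimal sketch under that assumption (the paper's actual voting schemes are richer than a plain average):

```python
import numpy as np

def confidence_vote(memberships):
    """Fuse aligned fuzzy partitions by averaging memberships (sketch).

    memberships: list of (c, n) membership matrices with clusters already
    aligned across partitions; columns of the result sum to 1.
    """
    fused = np.mean(memberships, axis=0)
    return fused / fused.sum(axis=0, keepdims=True)
```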
Clustering ensembles combine multiple partitions, produced by different clustering algorithms, into a single clustering solution. Clustering ensembles are used to improve the robustness, stability and accuracy of unsupervised classification solutions. The major problem in clustering ensembles is the consensus function; consensus functions in clustering ensembles include hypergraph partitioning, mutual information, co-association-based functions, the voting approach and finite mixture models. The characteristics by which clustering ensemble algorithms have been compared in previous work are computational complexity, robustness, simplicity and accuracy on different datasets.