2007, Lecture Notes in Computer Science
Because clustering is an unsupervised procedure, clustering results need to be judged by external criteria called validity indices. These indices play an important role in determining the number of clusters in a given dataset. A general approach for determining this number is to select the optimal value of a certain cluster validity index. Most existing indices give good results for data sets with well-separated clusters, but usually fail for complex data sets, for example, data sets with overlapping clusters. In this paper, we propose a new approach for clustering quality evaluation that combines fuzzy logic with Formal Concept Analysis based on concept lattices. We define a formal quality index that includes the separation degree and the overlapping rate.
2007 IEEE International Fuzzy Systems Conference, 2007
The purpose of this paper is to construct structural information from the original data, in which the results of fuzzy clustering can be displayed and interpreted. We use a technique based on Fuzzy Formal Concept Analysis (FFCA) for visual data mining and for interpreting fuzzy clustering results. The visual interpretation and the navigation in the fuzzy lattice provided useful insights into the overlap between clusters and their relationships.
2013
Clustering can be defined as the process of grouping physical or abstract objects into classes of similar objects. It is an unsupervised learning problem of organizing unlabeled objects into natural groups in such a way that objects in the same group are more similar to each other than to objects in different groups. Conventional clustering algorithms cannot handle the uncertainty that exists in real-life data. Fuzzy clustering handles incompleteness and vagueness in the data set efficiently. The goodness of a clustering is measured in terms of cluster validity indices: the clustering results are validated repeatedly for different cluster partitions in order to determine the optimal number of clusters. Fuzzy clustering has been widely applied in a variety of areas, and fuzzy cluster validation plays a very important role in it. Fuzzy clustering has been evaluated using various cluster validity indices. But primary indices have used...
2009
This paper presents a new approach to finding the optimal number of clusters of a fuzzy partition. It is based on a fuzzy modeling approach which combines measures of clusters' separation and overlap. These measures are based on triangular norms and a discrete Sugeno integral. Results on artificial and real data sets demonstrate its efficiency compared to indexes from the literature.
Computers, Materials & Continua, 2022
Unsupervised clustering and clustering validity are essential instruments of data analytics. Although clustering is performed under uncertainty, validity indices do not deliver any quantitative evaluation of the uncertainties in the suggested partitionings. Validity measures may also be biased towards the underlying clustering method, and neglecting a confidence requirement may result in over-partitioning. In the absence of an error estimate or a confidence parameter, probable clustering errors are forwarded to the later stages of the system, whereas an uncertainty margin for the projected labeling can be very useful in many applications such as machine learning. Herein, the validity issue is approached through estimation of the uncertainty, and a novel low-complexity index is proposed for fuzzy clustering. It involves only uni-dimensional membership weights, regardless of the data dimension, stipulates no specific distribution, and is independent of the underlying similarity measure. Comprehensive tests and comparisons showed that it can reliably estimate the optimum number of partitions under different data distributions, while being more robust to over-partitioning. Also, in a comparative correlation analysis between true clustering error rates and some known internal validity indices, the suggested index exhibited the strongest correlations. This relationship was also shown to be stable in additional statistical acceptance tests. Thus the provided relative uncertainty measure can also be used as a probable error estimate in clustering. Moreover, it is the only known method that can exclusively identify data points in doubt, and it is adjustable according to the required confidence level.
IEEE Transactions on Fuzzy Systems, 2011
Since a clustering algorithm can produce as many partitions as desired, one needs to assess their quality in order to select the partition that best represents the structure in the data, if there is any. This is the rationale for the cluster validity problem and for validity indices. This paper presents a cluster validity index that helps to find the optimal number of clusters from partitions generated by a fuzzy clustering algorithm such as Fuzzy C-Means (FCM) or its derivatives. Given a fuzzy partition, this new index uses a measure of multiple-cluster overlap and a separation measure for each data point, both based on an aggregation operation on membership degrees. Experimental results on artificial and benchmark data sets are given to demonstrate the performance of the proposed index as compared to traditional and recent indices.
Two well-known drawbacks of fuzzy clustering are the need to fix the number of clusters in advance and the random initialization of cluster centers. Because the quality of the final fuzzy clusters depends heavily on the initial choice of the number of clusters and on the initialization of the cluster centers, it is necessary to apply a validity index that measures the compactness and separability of the final clusters and to run the clustering algorithm several times. We propose a new fuzzy C-means algorithm in which a validity index based on the concepts of maximum fuzzy energy and minimum fuzzy entropy is applied to find the optimal number of clusters and the initial cluster centers, in order to obtain good clustering quality without increasing time consumption. We test our algorithm on UCI machine learning classification datasets, comparing the results with those obtained by using well-known validity indices and variations of FCM using optimization a...
15th International Conference on Advanced Computing and Communications (ADCOM 2007), 2007
Identification of the correct number of clusters and of the corresponding partitioning are two important considerations in clustering. In this paper, a new fuzzy quantization-dequantization criterion is used to propose a cluster validity index named the Fuzzy Vector Quantization based validity index (FVQ index). This index identifies how well the formed cluster centers represent the particular data set. In general, most existing validity indices try to optimize the total variance of the partitioning, which is a measure of the compactness of the clusters so formed. Here, a new kind of error function, which reflects how well the formed cluster centers represent the whole data set, is used as the goodness of the obtained partitioning. This error function decreases monotonically as the number of clusters increases. The minimum separation between two cluster centers is used to normalize the error function. The well-known genetic algorithm based K-means clustering algorithm (GAK-means) is used as the underlying partitioning technique. The number of clusters is varied from 2 to √N, where N is the total number of data points in the data set, and the values of the proposed validity index are recorded. The minimum value of the FVQ index over these √N − 1 partitions corresponds to the appropriate partitioning and number of partitions as indicated by the validity index. Results on five artificially generated and three real-life data sets show the effectiveness of the proposed validity index. For the purpose of comparison, the cluster numbers identified by a well-known cluster validity index, the XB-index, for the above-mentioned eight data sets are also reported.
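The sweep over cluster counts from 2 to √N described above can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the FVQ formula, so the sketch uses the Xie-Beni (XB) index that the paper compares against, in its commonly cited form (fuzzy within-cluster scatter divided by N times the minimum squared distance between centers). The function names `candidate_cluster_counts`, `xie_beni` and `best_cluster_count` are hypothetical, and the clustering step itself (GAK-means in the paper) is left out.

```python
import math


def candidate_cluster_counts(n_points):
    """Cluster counts 2..floor(sqrt(N)), i.e. floor(sqrt(N)) - 1 candidates."""
    return range(2, math.isqrt(n_points) + 1)


def xie_beni(data, centers, memberships, m=2.0):
    """Xie-Beni index (assumed common form); lower means a better partition.

    data:        list of points (tuples of floats)
    centers:     list of cluster centers (tuples of floats)
    memberships: memberships[i][j] = degree of point j in cluster i
    m:           fuzzifier exponent (2.0 is the usual choice)
    """
    # Fuzzy within-cluster scatter (compactness term).
    scatter = sum(
        (memberships[i][j] ** m) * math.dist(x, v) ** 2
        for i, v in enumerate(centers)
        for j, x in enumerate(data)
    )
    # Smallest squared distance between two distinct centers (separation term).
    min_sep = min(
        math.dist(centers[i], centers[k]) ** 2
        for i in range(len(centers))
        for k in range(i + 1, len(centers))
    )
    return scatter / (len(data) * min_sep)


def best_cluster_count(index_by_count):
    """Return the cluster count whose to-be-minimized index value is smallest."""
    return min(index_by_count, key=index_by_count.get)
```

In use, one would cluster the data once for each value returned by `candidate_cluster_counts(len(data))`, store the index value per count in a dictionary, and pass that dictionary to `best_cluster_count`.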
Pattern Recognition Letters, 2008
We introduce two new criteria for validating the results of a recent clustering algorithm, improved fuzzy clustering (IFC), which is used to find patterns in regression-type and classification-type datasets, separately. The IFC algorithm calculates membership values that are used as additional predictors to form fuzzy decision functions for each cluster. The proposed validity criteria are based on the ratio of compactness to separability of clusters. The optimum compactness of a cluster is represented by the average distances between every object and the cluster centers, and by the total estimation error of their fuzzy decision functions. The separability is based on a conditional ratio between the similarities of cluster representatives and the similarities of the fuzzy decision surfaces of each cluster. The performance of the proposed validity criteria is compared to that of other structurally similar cluster validity indexes using datasets from different domains. The results indicate that the new cluster validity functions are useful criteria when selecting parameters of IFC models.
2008
Clustering is one of the most important tasks in pattern recognition. Most partitional clustering algorithms generate a partition that represents the structure of the data as closely as possible. In this paper, we address the problem of finding the optimal number of clusters from data. This can be done by introducing an index which evaluates the validity of the generated fuzzy c-partition. We propose to use a criterion based on the fuzzy combination of membership values which quantifies the l-order overlap and the intercluster separation of a given pattern.
2006
Since clustering is an unsupervised method and there is no a priori indication of the actual number of clusters present in a data set, some kind of validation of clustering results is needed. In this paper, we propose a new cluster validity index for fuzzy clustering algorithms. This validation includes two levels. The first operates during the clustering process to identify the worst cluster and delete it. The second is a validity function for evaluating the set of resulting partitions.
Pattern Recognition, 2004
In this article, a cluster validity index and its fuzzification are described, which can provide a measure of goodness of clustering on different partitions of a data set. The maximum value of this index, called the PBM-index, across the hierarchy provides the best partitioning. The index is defined as a product of three factors, maximization of which ensures the formation of a small number of compact clusters with large separation between at least two clusters. We have used both the k-means and the expectation maximization algorithms as underlying crisp clustering techniques. For fuzzy clustering, we have utilized the well-known fuzzy c-means algorithm. Results demonstrating the superiority of the PBM-index in appropriately determining the number of clusters, as compared to three other well-known measures, the Davies-Bouldin index, Dunn's index and the Xie-Beni index, are provided for several artificial and real-life data sets.
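To make the "product of three factors" concrete, here is a minimal sketch of a crisp PBM-style index. The abstract does not state the formula, so the form used here, ((1/K) · (E_1/E_K) · D_K)², is an assumption taken from how the index is commonly cited: 1/K favours few clusters, E_1/E_K favours compact clusters, and D_K is the largest distance between two cluster centers. The helper names `centroid` and `pbm_index` are hypothetical, and cluster centers are simply taken as the means of the labelled points.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of points (tuples of floats)."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))


def pbm_index(data, labels):
    """Crisp PBM index, assumed form ((1/K) * (E_1/E_K) * D_K) ** 2.

    Larger values indicate a better partition. Raises ZeroDivisionError if
    every point coincides with its cluster center (E_K == 0).
    """
    # Group the points by their cluster label.
    clusters = {}
    for x, lab in zip(data, labels):
        clusters.setdefault(lab, []).append(x)
    k = len(clusters)
    if k < 2:
        raise ValueError("PBM needs at least two non-empty clusters")
    centers = [centroid(pts) for pts in clusters.values()]
    # E_1: scatter of all points around the center of the whole data set.
    grand = centroid(data)
    e_1 = sum(math.dist(x, grand) for x in data)
    # E_K: scatter of each point around its own cluster center.
    e_k = sum(
        math.dist(x, c)
        for pts, c in zip(clusters.values(), centers)
        for x in pts
    )
    # D_K: largest distance between any two cluster centers.
    d_k = max(
        math.dist(centers[i], centers[j])
        for i in range(k)
        for j in range(i + 1, k)
    )
    return ((1.0 / k) * (e_1 / e_k) * d_k) ** 2
```

For the four one-dimensional points 0, 1, 10, 11, the natural split {0, 1} and {10, 11} scores far higher under this sketch than the interleaved split {0, 10} and {1, 11}, which matches the abstract's statement that the maximum value marks the best partitioning.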
Pattern Recognition, 1999
Fuzzy cluster-validity criteria aim to evaluate the quality of fuzzy c-partitions produced by fuzzy clustering algorithms. Many functions have been proposed. Some methods use only the properties of fuzzy membership degrees to evaluate partitions. Other techniques combine the properties of membership degrees and the structure of the data. In this paper, a new heuristic method based on the combination of two functions is presented. The search for a good clustering is measured by a fuzzy compactness-separation ratio. The first function calculates this ratio by considering geometrical properties and membership degrees of the data. The second function evaluates it by using only the properties of membership degrees. Four numerical examples are used to illustrate its use as a validity functional. Its effectiveness is compared to that of some existing cluster-validity criteria.
Pattern Recognition Letters, 2006
Cluster validation is a major issue in cluster analysis. Many existing validity indices do not perform well when clusters overlap or there is significant variation in their covariance structure. The contribution of this paper is twofold. First, we propose a new validity index for fuzzy clustering. Second, we present a new approach for the objective evaluation of validity indices and clustering algorithms. Our validity index makes use of the covariance structure of clusters, while the evaluation approach utilizes a new concept of overlap rate that gives a formal measure of the difficulty of distinguishing between overlapping clusters. We have carried out experimental studies using data sets containing clusters of different shapes and densities and various overlap rates, in order to show how validity indices behave when clusters become less and less separable. Finally, the effectiveness of the new validity index is also demonstrated on a number of real-life data sets.
International Journal of Computer Engineering in Research Trends, 2018
This paper presents a comparative study of clustering methods and of developments made at various times. Clustering is defined as unsupervised learning in which objects are grouped on the basis of some similarity inherent among them. There are different methods for clustering objects, such as hierarchical, partitional, grid-based, density-based and model-based methods. Many algorithms exist that can solve the problem of clustering, but most of them are very sensitive to their input parameters. Therefore it is essential to evaluate the result of the clustering algorithm. It is difficult to define whether a clustering result is acceptable or not; thus several clustering validity techniques and indices have been developed. Cluster validity indices are used for measuring the goodness of a clustering result compared with others created by other clustering algorithms, or by the same algorithm with different parameter values. The results of a clustering algorithm on the same data set can vary, as the input parameters can strongly modify the behaviour and execution of the algorithm. The intention of this paper is to describe the clustering process, with an overview of different clustering methods and an analysis of clustering validity indices.
2012
Traditional quality indexes (Inertia, DB,…) are known to be method-dependent indexes that fail to properly estimate the quality of the clustering in several cases, such as that of complex data like textual data. We thus propose an alternative approach for clustering quality evaluation based on unsupervised measures of Recall, Precision and F-measure, exploiting the descriptors of the data associated with the obtained clusters. Two categories of index are proposed: Macro and Micro indexes.
Clustering attempts to discover significant groups present in a data set. It is an unsupervised process. It is difficult to define when a clustering result is acceptable. Thus, several clustering validity indices are developed to evaluate the quality of clustering algorithms results. In this paper, we propose to improve the quality of a clustering algorithm called "CLUSTER" by using a validity index. CLUSTER is an automatic clustering technique. It is able to identify situations where data do not have any natural clusters. However, CLUSTER has some drawbacks. In several cases, CLUSTER generates small and not well-separated clusters. The extension of CLUSTER with validity indices overcomes these drawbacks. We propose four extensions of CLUSTER with four validity indices Dunn, DunnRNG, DB, and DB*. These extensions provide an adequate number of clusters. The experimental results on real data show that these algorithms improve the clustering quality of CLUSTER. In part...
IEEE Access, 2017
Clustering is an important problem, which has been applied in many research areas. However, there is a large variety of clustering algorithms, and results can differ considerably depending on the choice of algorithm and input parameters, so it is important to evaluate clustering quality and to find the optimal clustering algorithm. Various clustering validity indices have been proposed against this background. Traditional clustering validity indices can be divided into two categories: internal and external. The former are mostly based on compactness and separation of data points, measured by the distance between clusters' centroids, ignoring the shape and density of clusters. The latter need external information, which is unavailable in most cases. In this paper, we propose a new clustering validity index for both fuzzy and hard clustering algorithms. Our new index uses pairwise pattern information from a certain number of interrelated clustering results, focusing more on logical reasoning than on geometrical features. The proposed index overcomes some shortcomings of traditional indices. Experiments show that the proposed index performs better than traditional indices on artificial and real datasets. Furthermore, we applied the proposed method to two existing problems in the telecommunication field. One is to cluster serving GPRS support nodes in the city of Chongqing based on service characteristics; the other is to analyze users' preferences. Index terms: pairwise pattern, clustering validity, clustering analysis, fuzzy c-means.
2007
To measure the fuzziness of fuzzy sets, this paper introduces a distance-based measure and a fuzzy-entropy-based measure. These measures are then generalized to measure the fuzziness of a fuzzy partition, namely the partition fuzziness. According to the relationship between the validity of a fuzzy partition and its partition fuzziness, a family of cluster validity functions is proposed based on the modified partition fuzziness. The new cluster validity functions overcome the tendency of the traditional partition fuzziness to increase with the cluster number, which provides an effective analysis methodology for fuzzy cluster validity. The experimental results on different test data sets illustrate the effectiveness, reliability, sensitivity and applicability of the proposed cluster validity functions.
HAL (Le Centre pour la Communication Scientifique Directe), 2006
The performance of any clustering algorithm depends critically on the number of clusters that are initialized. A practitioner might not know, a priori, the number of partitions into which the data should be divided; to address this issue, many cluster validity indices have been proposed for finding the optimal number of partitions. In this paper, we propose a new "Graded Distance index" (GD_index) for computing the optimal number of fuzzy clusters for a given data set. The efficiency of this index is compared with that of well-known existing indices and tested on several data sets. It is observed that the "GD_index" is able to correctly compute the optimal number of partitions in most of the data sets that are tested.
Clustering (or cluster analysis) has been used widely in pattern recognition, image processing, and data analysis. It aims to organize a collection of data items into c clusters, such that items within a cluster are more similar to each other than they are to items in other clusters. The number of clusters c is the most important parameter, in the sense that the remaining parameters have less influence on the resulting partition. Several methods, called validity indices, have been proposed to determine the best number of clusters. This paper presents a new validity index for fuzzy clustering called the Modified Partition Coefficient And Exponential Separation (MPCAES) index. The efficiency of the proposed MPCAES index is compared with that of several popular validity indexes. Further insight into these indexes is gained through a series of numerical comparisons and on the real Iris data set.
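As background for the "Modified Partition Coefficient" in the index's name, the sketch below computes the classical partition coefficient (PC) of a fuzzy membership matrix and its commonly used normalized variant. This is not the MPCAES index, whose formula the abstract does not give. The normalization 1 − c/(c − 1) · (1 − PC) is the form usually cited as the modified partition coefficient and is assumed here, and both function names are hypothetical.

```python
def partition_coefficient(memberships):
    """Partition coefficient: mean over points of the sum of squared memberships.

    memberships[i][j] is the degree of point j in cluster i; each column is
    assumed to sum to 1. The value ranges from 1/c (fully fuzzy) to 1 (crisp).
    """
    n_points = len(memberships[0])
    return sum(u * u for row in memberships for u in row) / n_points


def modified_partition_coefficient(memberships):
    """Normalized PC (assumed form), rescaled to 0 (fully fuzzy) .. 1 (crisp).

    The rescaling removes PC's dependence on the number of clusters c.
    """
    c = len(memberships)
    if c < 2:
        raise ValueError("at least two clusters are required")
    pc = partition_coefficient(memberships)
    return 1.0 - (c / (c - 1)) * (1.0 - pc)
```

A crisp partition gives 1 for both quantities, while memberships of 0.5 everywhere with two clusters give PC = 0.5 and a normalized value of 0.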