Clustering (or cluster analysis) has been used widely in pattern recognition, image processing, and data analysis. It aims to organize a collection of data items into c clusters, such that items within a cluster are more similar to each other than they are to items in other clusters. The number of clusters c is the most important parameter, in the sense that the remaining parameters have less influence on the resulting partition. To determine the best number of clusters, several measures, called validity indexes, have been proposed. This paper presents a new validity index for fuzzy clustering called the Modified Partition Coefficient And Exponential Separation (MPCAES) index. The efficiency of the proposed MPCAES index is compared with that of several popular validity indexes. Further insight into these indexes is gained from a series of numerical comparisons and from the real Iris data set.
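The abstract does not give the MPCAES formula, so the following is background only. It is a minimal sketch of two classical membership-only validity indexes from the literature: Bezdek's partition coefficient and Dave's modified partition coefficient. Whether and how MPCAES builds on them is not stated here. The function names and the row-per-cluster layout of the membership matrix `u` are assumptions made for illustration.

```python
def partition_coefficient(u):
    """Bezdek's partition coefficient: mean of the squared memberships.

    u is a list of c rows (one per cluster); u[i][k] is the membership
    of data item k in cluster i, and each column is assumed to sum to 1.
    """
    n = len(u[0])
    return sum(x * x for row in u for x in row) / n


def modified_partition_coefficient(u):
    """Dave's rescaling of the partition coefficient to the range [0, 1].

    The rescaling removes the tendency of the raw coefficient to
    decrease as the number of clusters c grows. Requires c >= 2.
    """
    c = len(u)
    return 1.0 - c / (c - 1) * (1.0 - partition_coefficient(u))


# Illustrative inputs (assumed, not from the paper): a crisp partition
# and a maximally fuzzy one, both with c = 2 clusters and 4 items.
crisp = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
fuzzy = [[0.5, 0.5, 0.5, 0.5], [0.5, 0.5, 0.5, 0.5]]
crisp_score = modified_partition_coefficient(crisp)
fuzzy_score = modified_partition_coefficient(fuzzy)
```

For both indexes a larger value indicates a crisper partition, so the number of clusters would be chosen by maximizing the index over candidate values of c.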
IEEE Transactions on Fuzzy Systems, 2011
Since a clustering algorithm can produce as many partitions as desired, one needs to assess their quality in order to select the partition that most represents the structure in the data if there is any. This is the rationale for the cluster validity problem and indices. This paper presents a cluster validity index that helps to find the optimal number of clusters of data from partitions generated by a fuzzy clustering algorithm such as the Fuzzy C-Means (FCM) or its derivatives. Given a fuzzy partition, this new index uses a measure of multiple clusters overlap and a separation measure for each data point, both based on an aggregation operation of membership degrees. Experimental results on artificial and benchmark data sets are given to demonstrate the performance of the proposed index as compared to traditional and recent indices.
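To make the input of such an index concrete, here is a minimal sketch of the standard Fuzzy C-Means iteration, which alternates a weighted-mean center update with a membership update. It is not the index proposed in the paper, whose aggregation operators are not given in the abstract. The function name `fcm`, its parameters (fuzzifier `m`, `iters`, `tol`, `seed`), and the toy data are assumptions made for illustration.

```python
import math
import random


def fcm(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Return (centers, u); u[i][k] is the membership of point k in cluster i."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, normalized so each point's degrees sum to 1.
    u = [[rng.random() + 1e-9 for _ in range(n)] for _ in range(c)]
    for k in range(n):
        s = sum(u[i][k] for i in range(c))
        for i in range(c):
            u[i][k] /= s
    centers = []
    for _ in range(iters):
        # Center update: mean of the points weighted by u^m.
        centers = []
        for i in range(c):
            w = [u[i][k] ** m for k in range(n)]
            total = sum(w)
            centers.append([sum(w[k] * points[k][d] for k in range(n)) / total
                            for d in range(dim)])
        # Membership update from the relative distances to the centers.
        change = 0.0
        for k in range(n):
            # Clamp distances away from zero to avoid division by zero.
            dist = [max(math.dist(points[k], centers[i]), 1e-12) for i in range(c)]
            for i in range(c):
                new = 1.0 / sum((dist[i] / dist[j]) ** (2.0 / (m - 1.0))
                                for j in range(c))
                change = max(change, abs(new - u[i][k]))
                u[i][k] = new
        if change < tol:  # stop once the memberships have stabilized
            break
    return centers, u


# Toy data (assumed): two well-separated groups of three points each.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, u = fcm(data, c=2)
```

A validity index is then evaluated on the returned `(centers, u)` pair for each candidate number of clusters.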
Pattern Recognition, 1999
Fuzzy cluster-validity criteria evaluate the quality of fuzzy c-partitions produced by fuzzy clustering algorithms. Many functions have been proposed. Some methods use only the properties of fuzzy membership degrees to evaluate partitions. Other techniques combine the properties of membership degrees and the structure of the data. In this paper, a new heuristic method based on the combination of two functions is proposed. The quality of a clustering is measured by a fuzzy compactness-separation ratio. The first function calculates this ratio by considering geometrical properties and membership degrees of the data. The second function evaluates it by using only the properties of membership degrees. Four numerical examples are used to illustrate its use as a validity functional. Its effectiveness is compared to that of some existing cluster-validity criteria.
2009
This paper presents a new approach to find the optimal number of clusters of a fuzzy partition. It is based on a fuzzy modeling approach which combines measures of clusters' separation and overlap. These measures are based on triangular norms and a discrete Sugeno integral. Results on artificial and real data sets demonstrate its efficiency compared to indexes from the literature.
Two well-known drawbacks of fuzzy clustering are the need to fix the number of clusters in advance and the random initialization of cluster centers. Because the quality of the final fuzzy clusters depends heavily on both choices, it is necessary to apply a validity index that measures the compactness and separability of the final clusters and to run the clustering algorithm several times. We propose a new fuzzy C-means algorithm in which a validity index based on the concepts of maximum fuzzy energy and minimum fuzzy entropy is used to find the optimal number of clusters and the initial cluster centers, in order to obtain good clustering quality without increasing time consumption. We test our algorithm on UCI machine learning classification datasets, comparing the results with those obtained using well-known validity indices and variations of FCM using optimization a...
HAL (Le Centre pour la Communication Scientifique Directe), 2006
The performance of any clustering algorithm depends critically on the number of clusters with which it is initialized. A practitioner might not know, a priori, the number of partitions into which the data should be divided; to address this issue, many cluster validity indices have been proposed for finding the optimal number of partitions. In this paper, we propose a new "Graded Distance index" (GD_index) for computing the optimal number of fuzzy clusters for a given data set. The efficiency of this index is compared with that of well-known existing indices and tested on several data sets. It is observed that the GD_index correctly computes the optimal number of partitions for most of the data sets tested.
2008
Clustering is one of the most important tasks in pattern recognition. Most partitional clustering algorithms generate a partition that represents the structure of the data as closely as possible. In this paper, we address the problem of finding the optimal number of clusters from the data. This can be done by introducing an index which evaluates the validity of the generated fuzzy c-partition. We propose to use a criterion based on the fuzzy combination of membership values, which quantifies the l-order overlap and the intercluster separation of a given pattern.
Pattern Recognition Letters, 2008
We introduce two new criteria for validating the results of a recent clustering algorithm, improved fuzzy clustering (IFC), used to find patterns in regression-type and classification-type datasets, respectively. The IFC algorithm calculates membership values that are used as additional predictors to form fuzzy decision functions for each cluster. The proposed validity criteria are based on the ratio of compactness to separability of clusters. The optimum compactness of a cluster is represented by the average distances between every object and the cluster centers, and by the total estimation error of their fuzzy decision functions. The separability is based on a conditional ratio between the similarities between cluster representatives and the similarities between the fuzzy decision surfaces of each cluster. The performance of the proposed validity criteria is compared to that of other structurally similar cluster validity indexes using datasets from different domains. The results indicate that the new cluster validity functions are useful criteria for selecting the parameters of IFC models.
Pattern Recognition, 2004
In this article, a cluster validity index and its fuzzification is described, which can provide a measure of goodness of clustering on different partitions of a data set. The maximum value of this index, called the PBM-index, across the hierarchy provides the best partitioning. The index is defined as a product of three factors, maximization of which ensures the formation of a small number of compact clusters with large separation between at least two clusters. We have used both the k-means and the expectation maximization algorithms as underlying crisp clustering techniques. For fuzzy clustering, we have utilized the well-known fuzzy c-means algorithm. Results demonstrating the superiority of the PBM-index in appropriately determining the number of clusters, as compared to three other well-known measures, the Davies-Bouldin index, Dunn's index and the Xie-Beni index, are provided for several artificial and real-life data sets.
Cluster validity indexes have been used to evaluate the fitness of partitions produced by clustering algorithms. This paper presents a new validity index for fuzzy clustering called the inter-cluster and intra-cluster separation (IC2S) index. To this end, we propose a disparity function which combines the intra-cluster and inter-cluster separation existing between the clusters. The results of a comparative study show that the proposed IC2S index is highly capable of producing a good estimate of the number of clusters. This performance is achieved by taking into consideration the existing disparity between clusters. To assess the new validity index, two data sets (Fisher's IRIS and the Butterfly data set) were used, and the results show that IC2S outperforms other clustering validity indexes for fuzzy c-means.
15th International Conference on Advanced Computing and Communications (ADCOM 2007), 2007
Identification of the correct number of clusters and the corresponding partitioning are two important considerations in clustering. In this paper, a new fuzzy quantization-dequantization criterion is used to propose a cluster validity index named the Fuzzy Vector Quantization based validity index (FVQ index). This index identifies how well the formed cluster centers represent a particular data set. In general, most of the existing validity indices try to optimize the total variance of the partitioning, which is a measure of the compactness of the clusters so formed. Here, a new kind of error function, which reflects how well the formed cluster centers represent the whole data set, is used as the goodness of the obtained partitioning. This error function decreases monotonically as the number of clusters increases. The minimum separation between two cluster centers is used to normalize the error function. The well-known genetic algorithm based K-means clustering algorithm (GAK-means) is used as the underlying partitioning technique. The number of clusters is varied from 2 to √N, where N is the total number of data points in the data set, and the values of the proposed validity index are recorded. The minimum value of the FVQ index over these √N − 1 partitions corresponds to the appropriate partitioning and number of partitions. Results on five artificially generated and three real-life data sets show the effectiveness of the proposed validity index. For the purpose of comparison, the cluster numbers identified by a well-known cluster validity index, the XB-index, for the above-mentioned eight data sets are also reported.
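The selection procedure described above (evaluate an index for every number of clusters from 2 to √N and keep the minimum) can be sketched as follows. The FVQ error function is not given in the abstract, so this sketch substitutes the Xie-Beni (XB) index, the comparison index named above, which likewise normalizes a scatter term by the minimum separation between two cluster centers. The names `xie_beni` and `pick_number_of_clusters`, and the `cluster(points, c)` callable standing in for the partitioning technique, are hypothetical.

```python
import math


def xie_beni(points, centers, u, m=2.0):
    """Xie-Beni index: fuzzy within-cluster scatter divided by N times the
    smallest squared distance between two centers (lower is better)."""
    n, c = len(points), len(centers)
    scatter = sum(u[i][k] ** m * math.dist(points[k], centers[i]) ** 2
                  for i in range(c) for k in range(n))
    min_sep = min(math.dist(centers[i], centers[j]) ** 2
                  for i in range(c) for j in range(i + 1, c))
    return scatter / (n * min_sep)


def pick_number_of_clusters(points, cluster):
    """Try c = 2 .. floor(sqrt(N)) and keep the c with the smallest index.

    `cluster(points, c)` is a hypothetical callable returning (centers, u),
    with u[i][k] the membership of point k in cluster i. Needs N >= 4 so
    that at least one candidate value of c exists.
    """
    c_max = math.isqrt(len(points))
    scores = {}
    for c in range(2, c_max + 1):
        centers, u = cluster(points, c)
        scores[c] = xie_beni(points, centers, u)
    best_c = min(scores, key=scores.get)
    return best_c, scores
```

Passing the partitioning routine as a callable keeps the sweep independent of the clustering algorithm, so the same loop would work with K-means, a genetic variant, or fuzzy c-means.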