2013
6 pages
Clustering can be defined as the process of grouping physical or abstract objects into classes of similar objects. It is an unsupervised learning problem: unlabeled objects are organized into natural groups such that objects in the same group are more similar to one another than to objects in different groups. Conventional clustering algorithms cannot handle the uncertainty that exists in real-life data. Fuzzy clustering handles incompleteness and vagueness in the data set efficiently. The goodness of a clustering is measured by cluster validity indices: the clustering results are validated repeatedly for different cluster partitions in order to determine the optimal number of clusters. Fuzzy clustering has been widely applied in a variety of areas, and fuzzy cluster validation plays a very important role in it. Fuzzy clustering has been evaluated using various cluster validity indices. But primary indices have used...
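The validation loop described above (cluster for several candidate cluster counts, then score each fuzzy partition) can be sketched as follows. This is a minimal illustration, not the paper's method: the fuzzy c-means routine is a bare-bones textbook version, the score is Bezdek's partition coefficient (chosen here only as a simple example of a validity index), and the function names, the fuzzifier `m=2.0`, and the synthetic data are all assumptions.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Bare-bones fuzzy c-means. Returns (centers, U) with U of shape (c, n)."""
    rng = np.random.default_rng(seed)
    U = rng.random((c, X.shape[0]))
    U /= U.sum(axis=0)                       # each point's memberships sum to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W @ X) / W.sum(axis=1, keepdims=True)
        # Distance from every center to every point, shape (c, n).
        d = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2)
        d = np.fmax(d, 1e-12)                # avoid division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

def partition_coefficient(U):
    """Bezdek's partition coefficient: 1/c for a uniform partition, 1 for a crisp one."""
    return float((U ** 2).sum() / U.shape[1])

def sweep_partition_coefficient(X, c_values):
    """Run FCM once per candidate cluster count and score each partition."""
    return {c: partition_coefficient(fuzzy_c_means(X, c)[1]) for c in c_values}

if __name__ == "__main__":
    # Hypothetical data: three Gaussian blobs.
    rng = np.random.default_rng(42)
    X = np.vstack([rng.normal(loc, 0.5, size=(50, 2))
                   for loc in ([0, 0], [10, 0], [0, 10])])
    scores = sweep_partition_coefficient(X, range(2, 7))
    for c, pc in scores.items():
        print(c, round(pc, 3))
    print("candidate with highest PC:", max(scores, key=scores.get))
```

Because fuzzy c-means depends on its random initialization, a practical sweep would usually repeat each run with several seeds before comparing scores.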
Computers, Materials & Continua, 2022
Unsupervised clustering and clustering validity are used as essential instruments of data analytics. Despite clustering being realized under uncertainty, validity indices do not deliver any quantitative evaluation of the uncertainties in the suggested partitionings. Also, validity measures may be biased towards the underlying clustering method. Moreover, neglecting a confidence requirement may result in over-partitioning. In the absence of an error estimate or a confidence parameter, probable clustering errors are forwarded to the later stages of the system. By contrast, having an uncertainty margin for the projected labeling can be very useful for many applications such as machine learning. Herein, the validity issue was approached through estimation of the uncertainty, and a novel low-complexity index was proposed for fuzzy clustering. It involves only uni-dimensional membership weights, regardless of the data dimension, stipulates no specific distribution, and is independent of the underlying similarity measure. Extensive tests and comparisons showed that it can reliably estimate the optimum number of partitions under different data distributions and that it is more robust to over-partitioning. Also, in the comparative correlation analysis between true clustering error rates and some known internal validity indices, the suggested index exhibited the strongest correlations. This relationship was also shown to be stable through additional statistical acceptance tests. Thus the provided relative uncertainty measure can be used as a probable error estimate in the clustering as well. Besides, it is the only method known that can exclusively identify data points in dubiety and is adjustable according to the required confidence level.
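The abstract does not give the formula of its index, so the following is only a hypothetical illustration of the general idea of working from membership weights alone: a point is treated as doubtful when its largest membership falls below a chosen confidence level. The function names and the threshold rule are assumptions made for this sketch, not the authors' index.

```python
import numpy as np

def flag_uncertain_points(U, confidence=0.8):
    """Indices of points whose largest membership is below `confidence`.

    U has shape (c, n): one row per cluster, one column per data point.
    The threshold rule is an assumption for illustration only.
    """
    top_membership = U.max(axis=0)
    return np.flatnonzero(top_membership < confidence)

def uncertain_fraction(U, confidence=0.8):
    """Share of points flagged as doubtful at the given confidence level."""
    return flag_uncertain_points(U, confidence).size / U.shape[1]

if __name__ == "__main__":
    # Hypothetical memberships for three points in two clusters.
    U = np.array([[0.95, 0.55, 0.10],
                  [0.05, 0.45, 0.90]])
    print(flag_uncertain_points(U, confidence=0.8))
    print(uncertain_fraction(U, confidence=0.8))
```

Note that this uses only the membership matrix, so its cost does not depend on the dimension of the data, which is the property the abstract emphasizes.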
Two well-known drawbacks of fuzzy clustering are the need to assign the number of clusters in advance and the random initialization of cluster centers. The quality of the final fuzzy clusters depends heavily on the initial choice of the number of clusters and on the initialization of the cluster centers, so it is necessary to apply a validity index to measure the compactness and separability of the final clusters and to run the clustering algorithm several times. We propose a new fuzzy C-means algorithm in which a validity index based on the concepts of maximum fuzzy energy and minimum fuzzy entropy is applied to find the optimal number of clusters and the initial cluster centers, in order to obtain good clustering quality without increasing time consumption. We test our algorithm on UCI machine learning classification datasets, comparing the results with those obtained by using well-known validity indices and variations of FCM using optimization a...
Pattern Recognition, 2004
In this article, a cluster validity index and its fuzzification is described, which can provide a measure of goodness of clustering on different partitions of a data set. The maximum value of this index, called the PBM-index, across the hierarchy provides the best partitioning. The index is defined as a product of three factors, maximization of which ensures the formation of a small number of compact clusters with large separation between at least two clusters. We have used both the k-means and the expectation maximization algorithms as underlying crisp clustering techniques. For fuzzy clustering, we have utilized the well-known fuzzy c-means algorithm. Results demonstrating the superiority of the PBM-index in appropriately determining the number of clusters, as compared to three other well-known measures, the Davies-Bouldin index, Dunn's index and the Xie-Beni index, are provided for several artificial and real-life data sets.
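The three-factor product can be sketched as below. The abstract does not spell out the factors, so this follows the form in which the PBM-index is commonly stated: PBM(K) = ((1/K) · (E_1/E_K) · D_K)^2, where E_K is the membership-weighted sum of point-to-center distances, E_1 is the same sum for a single cluster centered at the data mean, and D_K is the largest distance between two centers. The function name and the optional exponent `m` on the memberships are assumptions of this sketch.

```python
import numpy as np

def pbm_index(X, centers, U, m=1.0):
    """PBM-style index; larger values indicate a better partition.

    X: (n, d) data, centers: (K, d), U: (K, n) memberships (crisp or fuzzy).
    m is an assumed exponent on the memberships (m=1 leaves them unchanged).
    """
    K = centers.shape[0]
    # Distance from every center to every point, shape (K, n).
    d = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2)
    E_K = float(((U ** m) * d).sum())               # within-cluster scatter
    E_1 = float(np.linalg.norm(X - X.mean(axis=0), axis=1).sum())
    center_gaps = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    D_K = float(center_gaps.max())                  # largest center separation
    return ((1.0 / K) * (E_1 / E_K) * D_K) ** 2

if __name__ == "__main__":
    # Hypothetical data: two tight pairs of points far apart along the x-axis.
    X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
    good_U = np.array([[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]])
    good_centers = np.array([[0.0, 1.0], [10.0, 1.0]])
    bad_U = np.array([[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]])
    bad_centers = np.array([[5.0, 0.0], [5.0, 2.0]])
    print(pbm_index(X, good_centers, good_U))
    print(pbm_index(X, bad_centers, bad_U))
```

The 1/K factor penalizes many clusters, E_1/E_K rewards compactness, and D_K rewards separation, which matches the abstract's description of "a small number of compact clusters with large separation between at least two clusters".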
IEEE Transactions on Fuzzy Systems, 2011
Since a clustering algorithm can produce as many partitions as desired, one needs to assess their quality in order to select the partition that most represents the structure in the data if there is any. This is the rationale for the cluster validity problem and indices. This paper presents a cluster validity index that helps to find the optimal number of clusters of data from partitions generated by a fuzzy clustering algorithm such as the Fuzzy C-Means (FCM) or its derivatives. Given a fuzzy partition, this new index uses a measure of multiple clusters overlap and a separation measure for each data point, both based on an aggregation operation of membership degrees. Experimental results on artificial and benchmark data sets are given to demonstrate the performance of the proposed index as compared to traditional and recent indices.
15th International Conference on Advanced Computing and Communications (ADCOM 2007), 2007
Identification of the correct number of clusters and the corresponding partitioning are two important considerations in clustering. In this paper, a new fuzzy quantization-dequantization criterion is used to propose a cluster validity index named the Fuzzy Vector Quantization based validity index (FVQ index). This index identifies how well the formed cluster centers represent that particular data set. In general, most of the existing validity indices try to optimize the total variance of the partitioning, which is a measure of compactness of the clusters so formed. Here a new kind of error function, which reflects how well the formed cluster centers represent the whole data set, is used as the goodness of the obtained partitioning. This error function is monotonically decreasing with increase in the number of clusters. Minimum separation between two cluster centers is used here to normalize the error function. The well-known genetic algorithm based K-means clustering algorithm (GAK-means) is used as the underlying partitioning technique. The number of clusters is varied from 2 to √N, where N is the total number of data points present in the data set, and the values of the proposed validity index are recorded. The minimum value of the FVQ index over these √N − 1 partitions corresponds to the appropriate partitioning and the number of partitions as indicated by the validity index. Results on five artificially generated and three real-life data sets show the effectiveness of the proposed validity index. For the purpose of comparison, the cluster numbers identified by a well-known cluster validity index, the XB-index, for the above-mentioned eight data sets are also reported.
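The FVQ error function itself is not given in the abstract, so the sketch below illustrates only two pieces that are: the candidate range of 2 to √N clusters, and the comparison index, here written in the usual form of the Xie-Beni (XB) index, which also divides a compactness term by the minimum separation between centers and is minimized. The function names and the rounding of √N down to an integer are assumptions.

```python
import math
import numpy as np

def candidate_cluster_counts(n_points):
    """Cluster counts 2, 3, ..., floor(sqrt(N)), i.e. floor(sqrt(N)) - 1 candidates."""
    return range(2, math.isqrt(n_points) + 1)

def xie_beni_index(X, centers, U, m=2.0):
    """Xie-Beni index; smaller values indicate a better partition.

    X: (n, d) data, centers: (K, d), U: (K, n) memberships.
    """
    n = X.shape[0]
    # Squared distance from every center to every point, shape (K, n).
    d2 = ((X[None, :, :] - centers[:, None, :]) ** 2).sum(axis=2)
    compactness = float(((U ** m) * d2).sum())
    gaps2 = ((centers[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(gaps2, np.inf)          # ignore a center's distance to itself
    min_separation = float(gaps2.min())
    return compactness / (n * min_separation)

if __name__ == "__main__":
    print(list(candidate_cluster_counts(100)))
    # Hypothetical data: two tight pairs of points far apart along the x-axis.
    X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
    U = np.array([[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]])
    centers = np.array([[0.0, 1.0], [10.0, 1.0]])
    print(xie_beni_index(X, centers, U))
```

In a full sweep, one would cluster the data once per value in `candidate_cluster_counts(N)` and keep the partition with the smallest index value.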
Pattern Recognition Letters, 2008
We introduce two new criteria for validating results obtained from a recent novel clustering algorithm, improved fuzzy clustering (IFC), to be used to find patterns in regression and classification type datasets, separately. The IFC algorithm calculates membership values that are used as additional predictors to form fuzzy decision functions for each cluster. The proposed validity criteria are based on the ratio of compactness to separability of clusters. The optimum compactness of a cluster is represented by the average distances between every object and the cluster centers, and the total estimation error from their fuzzy decision functions. The separability is based on a conditional ratio between the similarities between cluster representatives and the similarities between fuzzy decision surfaces of each cluster. The performance of the proposed validity criteria is compared to other structurally similar cluster validity indexes using datasets from different domains. The results indicate that the new cluster validity functions are useful criteria when selecting parameters of IFC models.
2009
This paper presents a new approach to find the optimal number of clusters of a fuzzy partition. It is based on a fuzzy modeling approach which combines measures of clusters' separation and overlap. These measures are based on triangular norms and a discrete Sugeno integral. Results on artificial and real data sets prove its efficiency compared to indexes from the literature.
Pattern Recognition, 1999
Fuzzy cluster-validity criteria evaluate the quality of fuzzy c-partitions produced by fuzzy clustering algorithms. Many functions have been proposed. Some methods use only the properties of fuzzy membership degrees to evaluate partitions. Other techniques combine the properties of membership degrees and the structure of the data. In this paper, a new heuristic method based on the combination of two functions is presented. The quality of a clustering is measured by a fuzzy compactness-separation ratio. The first function calculates this ratio by considering geometrical properties and membership degrees of the data. The second function evaluates it by using only the properties of membership degrees. Four numerical examples are used to illustrate its use as a validity functional. Its effectiveness is compared to some existing cluster-validity criteria.
Lecture Notes in Computer Science, 2007
Because clustering is an unsupervised procedure, clustering results need to be judged by external criteria called validity indices. These indices play an important role in determining the number of clusters in a given dataset. A general approach for determining this number is to select the optimal value of a certain cluster validity index. Most existing indices give good results for data sets with well-separated clusters, but usually fail for complex data sets, for example, data sets with overlapping clusters. In this paper, we propose a new approach for clustering quality evaluation that combines fuzzy logic with Formal Concept Analysis based on the concept lattice. We define a formal quality index including the separation degree and the overlapping rate.
2007
To measure the fuzziness of fuzzy sets, this paper introduces a distance-based measure and a fuzzy entropy-based measure. These measures are then generalized to measure the fuzziness of a fuzzy partition, namely the partition fuzziness. According to the relationship between the validity of a fuzzy partition and its partition fuzziness, a family of cluster validity functions is proposed based on the modified partition fuzziness. The new cluster validity functions overcome the tendency of the traditional partition fuzziness to increase with the cluster number, which provides an effective analysis methodology for fuzzy cluster validity. The experimental results with different testing data sets illustrate the effectiveness, reliability, sensitivity and applicability of the proposed cluster validity functions.
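The increasing tendency mentioned above can be seen with the classical partition entropy, PE = −(1/n) Σ_k Σ_j u_kj ln u_kj, whose upper bound ln c grows with the number of clusters c. The sketch below computes PE together with a simple rescaling by ln c. The rescaling is only an assumed illustration of how such a trend might be offset; it is not the modified partition fuzziness proposed in the paper, whose definition is not given in the abstract.

```python
import numpy as np

def partition_entropy(U):
    """Classical partition entropy: 0 for a crisp partition, ln(c) for a uniform one."""
    safe_U = np.clip(U, 1e-12, 1.0)          # avoid log(0); 0 * log(tiny) stays 0
    return float(-(U * np.log(safe_U)).sum() / U.shape[1])

def scaled_partition_entropy(U):
    """Partition entropy divided by its upper bound ln(c) (assumed illustration)."""
    return partition_entropy(U) / np.log(U.shape[0])

if __name__ == "__main__":
    # Uniform memberships: the raw entropy grows with c, the scaled one does not.
    for c in (2, 4, 8):
        U = np.full((c, 10), 1.0 / c)
        print(c, partition_entropy(U), scaled_partition_entropy(U))
```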