1994, Pattern Recognition Letters
In this paper, a new non-iterative clustering method is proposed. It consists of two passes. In the first pass, the mean distance from each object to its nearest neighbor is estimated; based on this distance, noise points lying far from all other objects are identified and removed. In the second pass, the mean distance from the remaining objects to their nearest neighbors is computed, and based on it all the intrinsic clusters are found. The proposed method is non-iterative and determines the number of clusters automatically. Experimental results also show that, for many complicated object distributions, the partition generated by the proposed method is more reasonable than that of the well-known c-means algorithm.
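The two-pass idea in this abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation; the multipliers `noise_factor` and `link_factor` and the flood-fill linking step are my own assumptions about how the mean nearest-neighbor distance might be turned into a noise threshold and a cluster criterion.

```python
import numpy as np

def remove_noise_then_cluster(points, noise_factor=2.0, link_factor=2.0):
    """Two-pass sketch: (1) drop points whose nearest-neighbor distance is far
    above the mean NN distance; (2) recompute the mean NN distance on the
    remaining points and link any pair closer than a multiple of it."""
    def nn(pts):
        d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
        np.fill_diagonal(d, np.inf)
        return d.min(axis=1), d

    nnd, _ = nn(points)
    keep = nnd <= noise_factor * nnd.mean()        # pass 1: remove far-away noise
    clean = points[keep]

    nnd2, dmat = nn(clean)
    thresh = link_factor * nnd2.mean()             # pass 2: linking threshold
    labels = -np.ones(len(clean), dtype=int)
    cid = 0
    for i in range(len(clean)):                    # flood-fill connected components
        if labels[i] >= 0:
            continue
        stack, labels[i] = [i], cid
        while stack:
            j = stack.pop()
            for t in np.where(dmat[j] <= thresh)[0]:
                if labels[t] < 0:
                    labels[t] = cid
                    stack.append(t)
        cid += 1
    return keep, labels
```

On a toy set with two tight groups and one distant point, the distant point is stripped in the first pass and the two groups emerge in the second.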
2006
In this paper, we propose a new non-parametric clustering method based on local shrinking. Each data point is moved a certain distance toward a cluster center, with the direction and step size of each movement determined by the median of its K nearest neighbors. This process is repeated until a pre-defined convergence criterion is satisfied. The optimal value of K is chosen by optimizing index functions that measure the strength of the clusters. The number of clusters and the final partition are determined automatically, without any input parameter other than the stopping rule for convergence. Our performance studies show that the algorithm converges quickly and achieves high accuracy.
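A minimal sketch of the local-shrinking step described above, assuming (as the abstract suggests) that each point moves to the coordinate-wise median of its K nearest neighbors; the convergence test on total movement is an assumed stopping rule, and the automatic choice of K via cluster-strength indices is omitted.

```python
import numpy as np

def shrink_step(points, k):
    """Move every point to the coordinate-wise median of its k nearest
    neighbors (the point itself is included via its zero self-distance)."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    idx = np.argsort(d, axis=1)[:, :k]
    return np.median(points[idx], axis=1)

def local_shrinking(points, k=5, tol=1e-6, max_iter=50):
    """Repeat the shrinking step until total movement falls below tol;
    points belonging to one cluster collapse toward a common center."""
    pts = points.astype(float).copy()
    for _ in range(max_iter):
        new = shrink_step(pts, k)
        if np.linalg.norm(new - pts) < tol:
            break
        pts = new
    return pts
```

After shrinking, cluster membership can be read off by grouping the collapsed points.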
2012
A novel clustering algorithm, CSHARP, is presented for finding clusters of arbitrary shapes and densities in high-dimensional feature spaces. It can be considered a variation of the Shared Nearest Neighbor (SNN) algorithm, in which each data point votes for the points in its k-nearest neighborhood. Sets of points sharing a common mutual nearest neighbor are treated as dense regions, or blocks; these blocks are the seeds from which clusters grow. CSHARP is therefore a block-to-block rather than a point-to-point clustering technique. Much of its advantage comes from two facts: noise points and outliers correspond to small blocks, and homogeneous blocks overlap strongly. The proposed technique is less likely to merge clusters of different densities or different homogeneity. The algorithm has been applied to a variety of low- and high-dimensional data sets with superior results over existing techniques such as DBSCAN, K-means, Chameleon, Mitosis and spectral clustering; the quality of its results, as well as its time complexity, ranks it at the front of these techniques.
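The shared-nearest-neighbor notion that CSHARP builds on can be made concrete. This sketch computes only the basic SNN similarity matrix (how many of their k nearest neighbors two points share), not CSHARP's block construction or merging steps.

```python
import numpy as np

def snn_similarity(X, k=5):
    """Shared-nearest-neighbor similarity: entry (i, j) counts how many of
    the k nearest neighbors points i and j have in common."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    knn = np.argsort(d, axis=1)[:, 1:k + 1]        # skip self at position 0
    sets = [set(row) for row in knn]
    n = len(X)
    S = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(n):
            S[i, j] = len(sets[i] & sets[j])
    return S
```

Points inside one dense region share most of their neighbor lists, while points from different regions share none, which is what makes the measure useful for density-varying data.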
Neural Computing and Applications, 2016
Organizing data into sensible groups is called 'data clustering.' It is an open research problem in many scientific fields, and neither a universal solution nor an absolute strategy for its evaluation exists in the literature. In this context, this paper makes three contributions. (1) A new method for finding 'natural groupings,' or clusters, in a data set is presented. For this, a new term, 'vicinity,' is coined; vicinity captures the idea of density together with the spatial distribution of data points in feature space, and this new notion has the potential to separate clusters of various types. In summary, the approach presented here is non-convex admissive (the convex hulls of the clusters found can intersect, which is desirable for non-convex clusters), cluster-proportion and omission admissive (duplicating a cluster an arbitrary number of times, or deleting a cluster, does not alter the other clusters' boundaries), scale covariant, and consistent (shrinking within-cluster distances and enlarging between-cluster distances does not affect the clustering results), but not rich (it does not generate exhaustive partitions of the data) and density invariant. (2) A strategy for automatically setting the tunable parameters of the proposed Vicinity-Based Cluster Detection (VBCD) algorithm is presented. (3) A new internal evaluation index, the Space-Density Index (SDI), applicable to results produced by any clustering method, is also presented. Experimental results reveal that VBCD captures the idea of 'natural groupings' better than existing approaches, and that the SDI evaluation scheme provides better judgments than earlier internal cluster-validity indices.
International Journal of Computer Applications
The main goal of data mining is to extract information from a large data set and transform it into an understandable form for further use. Clustering is important in data analysis and data-mining applications: it is the task of grouping a set of objects so that objects in the same group are more similar to one another than to those in other groups (clusters). Rapid retrieval of relevant data from databases has always been a significant issue, and many techniques have been developed for this purpose; among them, clustering is one of the key techniques. The process of extracting important knowledge from a large quantity of data is called learning, which can be classified into supervised and unsupervised learning; clustering is a kind of unsupervised learning. This paper describes the general working behavior of clustering approaches, the methodologies they follow, and the parameters that affect their performance, and provides a review of clustering and its different techniques in data mining.
Image segmentation plays an important role in image analysis and is useful in many applications, such as medical imaging, face recognition, crop-disease detection, and the detection of geographical objects in maps. Segmentation can be performed by clustering, and clustering methods divide into crisp and fuzzy approaches. FCM is a well-known fuzzy clustering method used to improve segmentation results, but it does not work properly on noisy images or nonlinearly separable data. To overcome this drawback, the KFCM method can be used: a Gaussian kernel function maps the data into a high-dimensional feature space in which it becomes linearly separable, and FCM is then applied there. KFCM improves the segmentation of noisy images and raises the accuracy rate, but it does not take neighboring pixels into account. The NMKFCM method incorporates neighborhood-pixel information into the objective function and further improves the segmentation result. The proposed algorithm is more effective and efficient than other fuzzy clustering algorithms and performs better on both noisy and noise-free images; for noisy images, the required number of clusters is found automatically with the help of a hill-climbing algorithm.

I. Introduction

Image segmentation is a major topic in image-processing research and a critical, essential component of any image-analysis system. It is the process of partitioning an image into segments (sets of pixels), where each segment consists of similar pixels grouped by properties such as intensity, color, tone, or texture. The goal of image segmentation is to simplify or change the representation of an image into something more meaningful and easier to analyze. Image segmentation is commonly performed using four approaches: clustering, thresholding, region extraction, and edge detection.
Image segmentation plays a crucial role in many applications, such as medical imaging, pattern recognition, machine vision, computer vision, video surveillance, geographical-object detection, image analysis, and crop-disease detection. Clustering is one approach to image segmentation: it partitions a given set of unlabeled objects into a number of clusters such that similar objects are allocated to the same cluster [1]. Clustering follows two main approaches, crisp clustering and fuzzy clustering [2]. Crisp clustering finds hard boundaries between clusters, so each object belongs to exactly one cluster; fuzzy clustering offers a better solution for this problem, since an object may belong to more than one cluster. The Fuzzy C-Means (FCM) algorithm is the most widely used fuzzy clustering technique for image segmentation. It was first introduced by Dunn and later extended by Bezdek [3][1]. FCM allows one object to belong to two or more clusters, introducing fuzziness through a degree of membership for every object that ranges between 0 and 1 [4]. The aim of FCM is to minimize the value of an objective function while partitioning the data set into n clusters. FCM gives more accurate results than HCM on noise-free images, but it does not work properly on noisy images and fails on nonlinearly separable data; to overcome this drawback, the Kernel Fuzzy C-Means (KFCM) algorithm is used. A kernel function maps the data from the original low-dimensional space into a high-dimensional feature space in which nonlinearly separable data become linearly separable [5]. KFCM, however, is not adequate for images corrupted by impulse noise, because it does not consider neighboring pixels. The proposed Novel Kernel Fuzzy C-Means (NMKFCM) algorithm incorporates a neighborhood term into the objective function and improves on the results of KFCM and FCM for both noisy and noise-free images [6].
NMKFCM is thus a very useful method for image segmentation.

II. Clustering Algorithm

Definition: Let F be the set of all pixels and P(·) be a uniformity (homogeneity) predicate defined on groups of connected pixels. A segmentation is a partitioning of F into connected regions S1, S2, ..., Sn such that the union of all Si is F and Si ∩ Sj = ∅ when i ≠ j. The uniformity predicate P(Si) is true for every region Si.
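For reference, the standard fuzzy c-means iteration that FCM, KFCM and NMKFCM all build on can be sketched as follows. This is plain Bezdek-style FCM with the Euclidean distance, not the kernel or neighborhood-enhanced variants discussed above; the fuzzifier m = 2, the random membership initialization, and the convergence tolerance are conventional choices, not taken from these papers.

```python
import numpy as np

def fcm(X, c=2, m=2.0, n_iter=300, tol=1e-5, seed=0):
    """Minimal fuzzy c-means: alternate center and membership updates.
    U[i, j] is the degree (in [0, 1]) to which point i belongs to cluster j;
    each row of U sums to 1."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)              # random fuzzy partition
    centers = None
    for _ in range(n_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]   # weighted means
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1) + 1e-12
        ratio = d[:, :, None] / d[:, None, :]      # ratio[i, j, k] = d_ij / d_ik
        new_U = 1.0 / (ratio ** (2.0 / (m - 1))).sum(axis=2)
        if np.abs(new_U - U).max() < tol:
            U = new_U
            break
        U = new_U
    return U, centers
```

Hard labels, when needed, are obtained by taking the cluster of maximum membership for each point.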
Journal of Applied Statistics, 2017
In this paper, we present an algorithm for clustering based on univariate kernel density estimation, named ClusterKDE. It is an iterative procedure in which, at each step, a new cluster is obtained by minimizing a smooth kernel function. Although our applications use the univariate Gaussian kernel, any smooth kernel function can be used. The proposed algorithm has the advantage of not requiring the number of clusters a priori. Furthermore, ClusterKDE is very simple, easy to implement, and well defined, and it stops in a finite number of steps; that is, it always converges, independently of the initial point. We illustrate our findings with numerical experiments obtained by implementing the algorithm in Matlab and applying it to practical problems. The results indicate that ClusterKDE is competitive and fast when compared with the well-known clusterdata and k-means algorithms used by Matlab for clustering data.
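The general idea of clustering from a univariate kernel density can be illustrated briefly. This sketch is not the ClusterKDE procedure itself (which extracts clusters by iteratively minimizing a smooth kernel function); it simply evaluates a Gaussian KDE on a grid and cuts the data at local minima of the estimated density, with the bandwidth and grid size as assumed parameters.

```python
import numpy as np

def kde_clusters_1d(x, bandwidth=0.5, grid_size=512):
    """Evaluate an (unnormalized) Gaussian KDE on a grid and split the data
    at interior local minima of the estimated density."""
    x = np.asarray(x, dtype=float)
    grid = np.linspace(x.min() - 3 * bandwidth, x.max() + 3 * bandwidth, grid_size)
    dens = np.exp(-0.5 * ((grid[:, None] - x[None, :]) / bandwidth) ** 2).sum(axis=1)
    interior_min = (dens[1:-1] < dens[:-2]) & (dens[1:-1] < dens[2:])
    cuts = grid[1:-1][interior_min]       # density valleys = cluster boundaries
    return np.searchsorted(cuts, x)       # label = number of cuts below the point
```

Like ClusterKDE, this density view needs no prior number of clusters: each valley in the density contributes one boundary.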
International Journal of Modern Education and Computer Science
Clustering is the technique of finding useful patterns in a data set by effectively grouping similar data items. It is an intense research area with many algorithms currently available, but in practice most algorithms do not deal very efficiently with noise. Most real-world data are prone to noise for many reasons, and most algorithms, even those that claim to handle noise, can detect only large deviations as noise. In this paper, we present a data-clustering method named SIDNAC, which can efficiently detect clusters of arbitrary shapes and is almost immune to noise, a much-desired feature in clustering applications. Another important feature of the algorithm is that it does not require a priori knowledge of the number of clusters, something that is seldom available.
2006
Density-based clustering algorithms are attractive for the task of class identification in spatial databases. In many cases, however, clusters of very different local densities exist in different regions of the data space, so DBSCAN [Ester, M. et al. (1996). A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In E. Simoudis, J. Han, & U. M. Fayyad (Eds.), Proc. 2nd Int. Conf. on Knowledge Discovery and Data Mining (pp. 226-231). Portland, OR: AAAI.], which uses a global density parameter, is not suitable. As an improvement, OPTICS [Ankerst, M. et al. (1999). OPTICS: Ordering Points To Identify the Clustering Structure. In A. Delis, C. Faloutsos, & S. Ghandeharizadeh (Eds.), Proc. ACM SIGMOD Int. Conf. on Management of Data (pp. 49-60). Philadelphia, PA: ACM.] creates an augmented ordering of the database representing its density-based clustering structure; however, it only generates clusters whose local density exceeds some threshold, rather than clusters of similar local density, and it does not produce an explicit clustering of the data set. Furthermore, the parameters required by almost all well-known clustering algorithms are hard to determine yet have a significant influence on the clustering result. In this paper, a new clustering algorithm, LDBSCAN, relying on a local-density-based notion of clusters, is proposed to solve these problems. Moreover, its parameters are easy to pick, and in contrast to other density-based clustering algorithms it takes advantage of LOF [Breunig, M. M. et al. (2000). LOF: Identifying Density-Based Local Outliers. In W. Chen, J. F. Naughton, & P. A. Bernstein (Eds.), Proc. ACM SIGMOD Int. Conf. on Management of Data (pp. 93-104). Dallas, TX: ACM.] to detect noise. The proposed algorithm has potential applications in business intelligence and enterprise information systems.
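For contrast with LDBSCAN's local-density notion, a textbook DBSCAN with a single global density setting (the limitation the abstract points out) looks like this; `eps` and `min_pts` are the usual global parameters, and the implementation is a generic sketch rather than any of the cited systems.

```python
import numpy as np

def dbscan(X, eps, min_pts):
    """Textbook DBSCAN with one global density setting (eps, min_pts).
    Returns a label per point; -1 marks noise."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    neighbors = [np.where(row <= eps)[0] for row in d]   # includes the point itself
    core = np.array([len(nb) >= min_pts for nb in neighbors])
    labels = np.full(len(X), -1)
    cid = 0
    for i in range(len(X)):
        if labels[i] != -1 or not core[i]:
            continue
        stack, labels[i] = [i], cid                      # start a new cluster
        while stack:
            j = stack.pop()
            for t in neighbors[j]:
                if labels[t] == -1:
                    labels[t] = cid
                    if core[t]:                          # only core points expand
                        stack.append(t)
        cid += 1
    return labels
```

Because `eps` is global, two clusters whose densities differ widely cannot both be captured well by one setting, which is exactly the situation LDBSCAN's local densities address.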
2015
Abstract: Image segmentation plays an important role in image analysis; it is one of the first and most important tasks in image analysis and computer vision. The proposed system presents a variation of the fuzzy c-means algorithm that provides image clustering. Based on the Mercer kernel, the kernel fuzzy c-means clustering algorithm (KFCM) is derived from the fuzzy c-means (FCM) algorithm; KFCM provides image clustering and improves accuracy significantly compared with classical fuzzy c-means algorithms. The proposed system exploits the advantages of KFCM and also incorporates local spatial information and gray-level information in a novel fuzzy way. The new algorithm is called the Generalized Spatial Kernel-based Fuzzy C-Means (GSKFCM) algorithm. The major characteristic of GSKFCM is the use of a fuzzy local (both spatial and gray-level) similarity measure, aiming to guarantee noise insensitivity and image-detail preservation as well as it is...
IEEE Transactions on Fuzzy Systems, 2020
Indonesian Journal of Electrical Engineering and Computer Science
Clustering is one of the most popular and widely used data-mining techniques, owing to its usefulness and the wide variety of its real-world applications. Defining the number of clusters required is application-dependent, which means that the number of clusters k is an input to the whole clustering process. The proposed approach offers a solution for estimating the optimum number of clusters. It is based on iterative K-means clustering under three different criteria: centroid convergence, the total distance between the objects and their cluster centroids, and the number of migrated objects, which together can be used to ensure better clustering accuracy and performance. A total of 20000 records available on the internet were used to test the approach. The results showed good improvements in clustering accuracy and algorithm performance over the other techniques, where centroid convergence represents a maj...
The k-means clustering algorithm is a heuristic that partitions a data set into k clusters by minimizing the sum of squared distances within each cluster. It has, however, a number of weaknesses: first, it requires prior knowledge of the number of clusters k; second, it is sensitive to initialization, which leads to arbitrary solutions. This paper presents a new approach to k-means clustering that provides a solution for the initial selection of cluster centroids and a dynamic approach based on the silhouette validity index. Instead of running the algorithm for different values of k, the user needs to give only an initial value k0 as input, and the algorithm itself determines the right number of clusters for a given data set. The algorithm was implemented in MATLAB R2009b, and its results are compared with the original k-means algorithm and other modified k-means clustering algorithms. The experimental results demonstrate that the proposed scheme improves the initial center selection and the overall computation time.
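The silhouette-based selection of k mentioned above can be sketched with plain Lloyd's k-means plus the average silhouette width s(i) = (b - a) / max(a, b). This is a generic illustration of the heuristic, not the paper's dynamic scheme; the restart count and the scanned k range are assumptions.

```python
import numpy as np

def kmeans(X, k, n_iter=100, n_init=5, seed=0):
    """Plain Lloyd's k-means with a few random restarts; keeps the run
    with the lowest within-cluster sum of squares."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_init):
        centers = X[rng.choice(len(X), k, replace=False)]
        for _ in range(n_iter):
            labels = np.linalg.norm(X[:, None, :] - centers[None, :, :],
                                    axis=-1).argmin(axis=1)
            new = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                            else centers[j] for j in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        labels = np.linalg.norm(X[:, None, :] - centers[None, :, :],
                                axis=-1).argmin(axis=1)
        inertia = (np.linalg.norm(X - centers[labels], axis=1) ** 2).sum()
        if best is None or inertia < best[0]:
            best = (inertia, labels, centers)
    return best[1], best[2]

def mean_silhouette(X, labels):
    """Average silhouette width s(i) = (b - a) / max(a, b), where a is the
    mean distance to i's own cluster and b to the nearest other cluster."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    n, s = len(X), np.zeros(len(X))
    for i in range(n):
        same = (labels == labels[i]) & (np.arange(n) != i)
        if not same.any():
            continue                      # singleton cluster: s(i) = 0
        a = d[i, same].mean()
        b = min(d[i, labels == c].mean() for c in set(labels) if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return s.mean()

def best_k(X, k_range):
    """Return the k whose k-means partition maximizes the mean silhouette."""
    return max(k_range, key=lambda k: mean_silhouette(X, kmeans(X, k)[0]))
```

On well-separated data the mean silhouette peaks at the true number of clusters, which is the property validity-index-driven k selection relies on.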
Pattern Recognition, 2009
In this paper, we present a fast k-means clustering algorithm (FKMCUCD) that uses the displacements of cluster centers to reject unlikely candidates for a data point. The computing time of our proposed algorithm increases linearly with the data dimension d, whereas the computational complexity of the major available kd-tree-based algorithms increases exponentially with d. Theoretical analysis shows that our method can reduce the computational complexity of full search by a factor SF that is independent of the vector dimension. The experimental results show that, compared with full search, our proposed method reduces computational complexity by a factor of 1.37-4.39 on data sets from six real images. Compared with the filtering algorithm, which is among the best available algorithms for k-means clustering, our algorithm can effectively reduce the computing time. It is noted that our proposed algorithm generates the same clusters as those produced by hard k-means clustering. The superiority of our method is more remarkable when a larger data set with higher dimension is used.
Proceedings of the Seventh IEEE International Conference on Computer Vision, 1999
In cluster-based segmentation, pixels are mapped into various feature spaces, whereupon they are subjected to a grouping algorithm. In this paper we develop a robust and versatile non-parametric clustering algorithm that is able to handle the unbalanced and irregular clusters encountered in such segmentation applications. The strength of our approach lies in the definition and use of two cluster-validity indices that are independent of the cluster topology. By combining them, an excellent clustering can be identified, and experiments confirm that the associated clusters do indeed correspond to perceptually salient image regions.
IAEME PUBLICATION, 2014
Clustering is a method of unsupervised learning and a common technique for statistical data analysis used in many fields, including machine learning, data mining, pattern recognition, image analysis and bioinformatics. Hierarchical clustering cannot represent distinct clusters with similar expression patterns. To address the high time complexity of existing hierarchical K-means algorithms and their sensitivity to noise, a hierarchical K-means clustering algorithm based on silhouette and entropy (HKSE) is put forward. The optimal number of clusters is determined by computing the average improved silhouette of the data set, so that the time complexity can be reduced. We also develop an algorithm for calculating the overlap rate, show how it can be used to handle cluster merging in a hierarchical approach to clustering, and obtain the optimal number of clusters automatically. Finally, experimental results demonstrate the effectiveness of the overlap-rate measure and of the new hierarchical clustering algorithm.
International Journal of Fuzzy Systems, 2019
The fuzzy c-means (FCM) clustering algorithm is an unsupervised learning method that has been widely applied to cluster unlabeled data automatically rather than manually, but it is sensitive to noisy observations because of its inappropriate treatment of noise in the data. In this paper, a novel method that handles noise intelligently, building on the existing FCM approach and called adaptive-FCM, is proposed together with its extended version (adaptive-REFCM), which incorporates relative entropy. Adaptive-FCM, relying on an inventive integration of the adaptive norm, benefits from a robust overall structure. Adaptive-REFCM further integrates the properties of relative entropy and normalized distance to preserve the global details of the data set. Several experiments are carried out, including noisy and noise-free University of California Irvine (UCI) clustering experiments and image-segmentation experiments. The results show that adaptive-REFCM exhibits better noise robustness and adaptive adjustment in comparison with relevant state-of-the-art FCM methods.
Computer programs in biomedicine, 1972
A computer program for nonparametric cluster synthesis, using similarity rather than maximum likelihood as the basis for class membership, is presented. The algorithm utilizes recursive computations to develop a hierarchy or tree of nested clusters. The major components of the program are: (1) a (dis)similarity function.
Journal of Computers, 2013
Clustering, an important unsupervised learning technique, is widely used to discover the inherent structure of a given data set. Because clustering is application-dependent, researchers use different models to define clustering problems. Heuristic clustering algorithms are an efficient way to deal with clustering problems posed as combinatorial-optimization models, but sensitivity to initialization is an unavoidable issue, and over the past decades many methods have been proposed to deal with it. In this paper, by contrast, we exploit initialization sensitivity to design a new clustering algorithm. We first run K-means, a widely used heuristic clustering algorithm, on the data set multiple times to generate several clustering results; second, we propose a structure named Local Accumulative Knowledge (LAKE) to capture the information the clustering results have in common; third, we execute the single-linkage algorithm on LAKE to generate a rough clustering result; finally, we assign the remaining data objects to the corresponding clusters. Experimental results on synthetic and real-world data sets demonstrate the superiority of the proposed approach in terms of clustering-quality measures.
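The accumulate-then-link strategy of LAKE can be illustrated with a standard co-association ensemble, which is in the same spirit but not the paper's exact structure: repeated k-means runs vote on whether two points belong together, and pairs whose co-association rate passes a threshold are linked, single-linkage style, into final clusters. The run count and the 0.5 threshold are assumptions.

```python
import numpy as np

def lloyd(X, k, rng, n_iter=50):
    """One k-means run from a random initialization (centers drawn from X)."""
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        labels = np.linalg.norm(X[:, None, :] - centers[None, :, :],
                                axis=-1).argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def coassociation_clustering(X, k, runs=10, threshold=0.5, seed=0):
    """Run k-means several times, count how often each pair of points shares
    a cluster, then link pairs whose co-association rate exceeds threshold."""
    rng = np.random.default_rng(seed)
    n = len(X)
    votes = np.zeros((n, n))
    for _ in range(runs):                 # accumulate shared-cluster votes
        labels = lloyd(X, k, rng)
        votes += labels[:, None] == labels[None, :]
    votes /= runs
    final = -np.ones(n, dtype=int)
    cid = 0
    for i in range(n):                    # connected components of the vote graph
        if final[i] >= 0:
            continue
        stack, final[i] = [i], cid
        while stack:
            j = stack.pop()
            for t in np.where(votes[j] >= threshold)[0]:
                if final[t] < 0:
                    final[t] = cid
                    stack.append(t)
        cid += 1
    return final
```

Pairs that co-occur across most runs form stable blocks even when individual runs are initialization-sensitive, which is the common-information idea LAKE formalizes.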
In this paper, a new level-based (hierarchical) approach to the fuzzy clustering problem for spatial data is proposed. In this approach, each point of the initial set is handled as a fuzzy point in multidimensional space. The fuzzy point conical form, fuzzy -neighbor points and fuzzy -joint points are defined, and their properties are explored. In classical fuzzy clustering, fuzziness usually means the possibility of each element belonging to different classes with different positive degrees from [0,1]; in this study, the fuzziness of clustering is instead understood as the level of detail at which the properties of the classified elements are investigated. In this setting, a new Fuzzy Joint Points (FJP) method that is robust to noise is proposed. An algorithm for the FJP method is developed, some of its properties are explored, and a sufficient condition for recognizing a hidden optimal cluster structure is proven. The main advantage of the FJP algorithm is that it combines the determination of initial clusters, cluster validity and direct clustering, which are the fundamental stages of a clustering process. The method makes it possible to handle fuzzy properties at various level-degrees of detail and to recognize individual outlier elements as independent classes. It could be important in biological, medical, geographical, mapping, and similar problems.
IEEE Transactions on Image Processing, 2000
This paper presents a variation of the fuzzy c-means (FCM) algorithm that provides image clustering. The proposed algorithm incorporates local spatial information and gray-level information in a novel fuzzy way. The new algorithm is called Fuzzy Local Information C-Means (FLICM). FLICM overcomes the disadvantages of the known fuzzy c-means algorithms while enhancing clustering performance. Its major characteristic is the use of a fuzzy local (both spatial and gray-level) similarity measure, aiming to guarantee noise insensitivity and image-detail preservation. Furthermore, the proposed algorithm is fully free of the empirically adjusted parameters (a, λg, λs, etc.) incorporated into all other fuzzy c-means algorithms proposed in the literature. Experiments performed on synthetic and real-world images show that the FLICM algorithm is effective and efficient, providing robustness on noisy images.