2003
A resampling scheme for clustering with similarity to bootstrap aggregation (bagging) is presented. Bagging is used to improve the quality of path-based clustering, a data clustering method that can extract elongated structures from data in a noise-robust way. The results of an agglomerative optimization method are influenced by small fluctuations of the input data. To increase the reliability of clustering solutions, a stochastic resampling method is developed to infer consensus clusters. A related reliability measure allows us to estimate the number of clusters, based on the stability of an optimized cluster solution under resampling. The quality of path-based clustering with resampling is evaluated on a large image dataset of human segmentations.
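The resampling idea sketched in this abstract can be illustrated with a small example: repeatedly cluster bootstrap replicates, record how often each pair of points lands in the same cluster (a co-association matrix), and merge pairs that co-cluster in a majority of runs. This is a minimal sketch under assumed simplifications (1-D data, plain k-means, a majority-vote threshold), not the paper's path-based algorithm; all function names are hypothetical.

```python
import random

def kmeans_1d(points, k=2, iters=20, seed=0):
    """Plain Lloyd's algorithm on 1-D data; returns a label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: abs(p - centers[j])) for p in points]
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centers[j] = sum(members) / len(members)
    return labels

def consensus_clusters(data, n_boot=50, k=2, seed=0):
    """Bagged clustering sketch: build a co-association matrix over bootstrap
    replicates, then link pairs that co-cluster in a majority of the runs
    in which both points were drawn."""
    rng = random.Random(seed)
    n = len(data)
    together = [[0] * n for _ in range(n)]
    seen = [[0] * n for _ in range(n)]
    for b in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        labels = kmeans_1d([data[i] for i in idx], k, seed=b)
        present = {}
        for pos, i in enumerate(idx):
            present.setdefault(i, labels[pos])
        items = sorted(present)
        for a in range(len(items)):
            for c in range(a + 1, len(items)):
                i, j = items[a], items[c]  # i < j since items is sorted
                seen[i][j] += 1
                if present[i] == present[j]:
                    together[i][j] += 1
    # consensus: union pairs that co-cluster in >50% of their shared runs
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for i in range(n):
        for j in range(i + 1, n):
            if seen[i][j] and together[i][j] / seen[i][j] > 0.5:
                parent[find(i)] = find(j)
    return [find(i) for i in range(n)]

data = [0.0, 0.1, 0.2, 0.15, 10.0, 10.1, 10.2, 9.9]
labels = consensus_clusters(data)
```

On this toy input the two well-separated groups survive the resampling, while any single noisy partition could have split them differently.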
Marketing Letters, 2010
Segmentation results derived using cluster analysis depend on (1) the structure of the data and (2) algorithm parameters. Typically, neither the data structure is assessed in advance of clustering, nor is the sensitivity of the analysis to changes in algorithm parameters examined. We propose a benchmarking framework based on bootstrapping techniques that accounts for sample and algorithm randomness. This provides much-needed guidance, both to data analysts and to users of clustering solutions, regarding the choice of the final clusters from computations which are exploratory in nature.
Combination of multiple clusterings is an important task in the area of unsupervised learning. Inspired by the success of supervised bagging algorithms, we propose a resampling scheme for integration of multiple independent clusterings. Individual partitions in the ensemble are sequentially generated by clustering specially selected subsamples of the given data set. In this paper, we compare the efficacy of both subsampling (sampling without replacement) and bootstrap (with replacement) techniques in conjunction with several fusion algorithms. The empirical study shows that a meaningful consensus partition for an entire set of data points emerges from multiple clusterings of subsamples of small size. The purpose of this paper is to show that small subsamples generally suffice to represent the structure of the entire data set in the framework of clustering ensembles. Subsamples of small size can reduce computational cost and measurement complexity for many unsupervised data mining tasks.
Pattern Recognition, 2014
Contrary to most of the existing 3D shape clustering methods, in which all the objects in a dataset must be classified in clusters, in this paper we tackle an incomplete but reliable unsupervised clustering solution. The central idea lies in obtaining coherent 3D shape groups using a consensus between different similarity measures which are defined in a common 3D shape representation framework. Our goal, therefore, is to extract some consistent groups of objects, considering the incomplete classification, if this occurs, as a natural result. The Weighted Cone Curvature (WCC) is defined as an overall feature which synthesizes a set of curvature levels on the nodes of a standard triangular mesh representation. The WCC concept is used to define a master descriptor called an RC-Image on which up to eight similarity measures are defined. A hierarchical clustering process is then carried out for all the measures and evaluated by means of a clustering confidence measure. Finally, a consensus between the best measures is achieved to provide a coherent group of objects. The proposed clustering approach has been tested on a set of mesh models belonging to a wide variety of free-shape objects, yielding promising results. The results of our experiments demonstrate that both the 3D shape descriptor used and the clustering strategy proposed might be useful for future developments in the unsupervised grouping field.
Computer Vision and Image Understanding, 2004
The goal of this communication is to suggest an alternative implementation of the k-way Ncut approach for image segmentation. We believe that our implementation alleviates a problem associated with the Ncut algorithm for some types of images: its tendency to partition regions that are nearly uniform with respect to the segmentation parameter. Previous implementations have used the k-means algorithm to cluster the data in the eigenspace of the affinity matrix. In the k-means based implementations, the number of clusters is estimated by minimizing a function that represents the quality of the results produced by each possible value of k. Our proposed approach uses the clustering algorithm of Koontz and Fukunaga in which k is automatically selected as clusters are formed (in a single iteration). We show comparison results obtained with the two different approaches to non-parametric clustering. The Ncut generated oversegmentations are further suppressed by a grouping stage-also Ncut based-in our implementation. The affinity matrix for the grouping stage uses similarity based on the mean values of the segments.
2003
Perceptual grouping organizes image parts in clusters based on psychophysically plausible similarity measures. We propose a novel grouping method in this paper, which stresses connectedness of image elements via mediating elements rather than favoring high mutual similarity. This grouping principle yields superior clustering results when objects are distributed on low-dimensional extended manifolds in a feature space, and not as local point clouds. In addition to extracting connected structures, objects are singled out as outliers when they are too far away from any cluster structure. The objective function for this perceptual organization principle is optimized by a fast agglomerative algorithm. We report on perceptual organization experiments where small edge elements are grouped to smooth curves. The generality of the method is emphasized by results from grouping textured images with texture gradients in an unsupervised fashion.
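The "connectedness via mediating elements" principle corresponds to a minimax path distance: the cost of a path is its largest edge, and the effective dissimilarity between two objects is the cheapest such path, so a chain of nearby mediating points pulls its endpoints together. A short sketch (assumed 1-D data and a Floyd-Warshall-style update; this is an illustration of the distance, not the authors' agglomerative optimizer):

```python
def pathbased_distance(points):
    """All-pairs minimax path distance: the cost of a path is its largest
    edge, and the distance between two points is the cheapest such path."""
    n = len(points)
    d = [[abs(points[i] - points[j]) for j in range(n)] for i in range(n)]
    # Floyd-Warshall variant: combine edges along a path with max, not +
    for k in range(n):
        for i in range(n):
            for j in range(n):
                d[i][j] = min(d[i][j], max(d[i][k], d[k][j]))
    return d

# an elongated chain of points plus one far outlier
pts = [0.0, 1.0, 2.0, 3.0, 4.0, 20.0]
d = pathbased_distance(pts)
```

Here the chain endpoints 0.0 and 4.0 end up at path-based distance 1.0 (the largest gap along the chain) even though their direct distance is 4.0, while the outlier at 20.0 stays far from everything, which is exactly the behavior the abstract describes for elongated manifolds and outliers.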
Clustering ensembles combine multiple partitions of data into a single clustering solution of better quality. Inspired by the success of supervised bagging and boosting algorithms, we propose non-adaptive and adaptive resampling schemes for the integration of multiple independent and dependent clusterings. We investigate the effectiveness of bagging techniques, comparing the efficacy of sampling with and without replacement, in conjunction with several consensus algorithms. In our adaptive approach, individual partitions in the ensemble are sequentially generated by clustering specially selected subsamples of the given dataset.
1987
We define a method to estimate the number of clusters in a data set E, using the bootstrap technique. This approach involves the generation of several "fake" data sets by sampling patterns with replacement in E (bootstrapping). For each number, K, of clusters, a measure of stability of the K-cluster partitions over the bootstrap samples is used to characterize the significance of the K-cluster partition for the original data set. The value of K which provides the most stable partitions is the estimate of the number of clusters in E. The performance of this new technique is demonstrated on both synthetic and real data, and is applied to the segmentation of range images.
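The stability criterion described above can be sketched as follows: for each candidate K, fit clusters on bootstrap replicates, relabel the full data set with the fitted centers, and score the average pairwise agreement between the resulting partitions; the most stable K is the estimate (restricted to K >= 2, since K = 1 is trivially stable). A minimal sketch under assumed simplifications (1-D k-means, a Rand-style agreement score); all names are illustrative:

```python
import random

def kmeans_1d(points, k, seed):
    """Plain Lloyd's algorithm on 1-D data; returns a label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(25):
        labels = [min(range(k), key=lambda j: abs(p - centers[j])) for p in points]
        for j in range(k):
            m = [p for p, l in zip(points, labels) if l == j]
            if m:
                centers[j] = sum(m) / len(m)
    return labels

def pair_agreement(la, lb):
    """Fraction of point pairs on which two labelings agree (same cluster
    in both, or different in both): a simple Rand-style index."""
    n, same = len(la), 0
    for i in range(n):
        for j in range(i + 1, n):
            if (la[i] == la[j]) == (lb[i] == lb[j]):
                same += 1
    return same / (n * (n - 1) // 2)

def stability(data, k, n_boot=20, seed=0):
    """Cluster bootstrap replicates, relabel the ORIGINAL points with the
    fitted centers, and score how consistently they are partitioned."""
    rng = random.Random(seed)
    runs = []
    for b in range(n_boot):
        sample = [data[rng.randrange(len(data))] for _ in range(len(data))]
        labels = kmeans_1d(sample, k, seed=b)
        centers = [sum(p for p, l in zip(sample, labels) if l == j) /
                   max(1, sum(1 for l in labels if l == j)) for j in range(k)]
        runs.append([min(range(k), key=lambda j: abs(p - centers[j]))
                     for p in data])
    scores = [pair_agreement(runs[a], runs[b])
              for a in range(n_boot) for b in range(a + 1, n_boot)]
    return sum(scores) / len(scores)

data = [0.0, 0.2, 0.4, 0.3, 10.0, 10.2, 10.4, 9.8]
best_k = max([2, 3, 4], key=lambda k: stability(data, k))
```

With two tight, well-separated groups, K = 2 partitions barely change across replicates, while K = 3 and K = 4 split one group at varying places and score lower.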
2002
The concept of cluster stability is introduced as a means for assessing the validity of data partitionings found by clustering algorithms. It allows us to explicitly quantify the quality of a clustering solution, without being dependent on external information. The principle of maximizing the cluster stability can be interpreted as choosing the most self-consistent data partitioning. We present an empirical estimator for the theoretically derived stability index, based on imitating independent sample sets by way of resampling. Experiments on both toy examples and real-world problems effectively demonstrate that the proposed validation principle is highly suited for model selection.
International Journal of Imaging Systems and Technology, 2009
We propose an approach for data clustering based on optimum-path forest. The samples are taken as nodes of a graph, whose arcs are defined by an adjacency relation. The nodes are weighted by their probability density function (pdf) values, and a connectivity function is maximized, such that each maximum of the pdf becomes the root of an optimum-path tree (cluster), composed of samples "more strongly connected" to that maximum than to any other root. We discuss the advantages over other pdf-based approaches and present extensions to large datasets, with results for interactive image segmentation and for fast, accurate, and automatic brain tissue classification in magnetic resonance (MR) images. We also include experimental comparisons with other clustering approaches. [Figure 4 caption: The boxes show a 1D pdf q with four maxima. The map V (white) indicates the removal of two irrelevant domes (gray) when h(t) = q(t) - 2. The 1D optimum-path forest P (vectors) shows the influence zones of the two remaining maxima.]
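The idea of pdf maxima rooting one tree each can be conveyed by a deliberately simplified mode-seeking sketch (not the authors' optimum-path forest algorithm): estimate a density per sample, then attach every sample to its nearest higher-density neighbor within a radius, so each local density maximum becomes the root of one tree and its tree is the cluster. The radius value and function names here are illustrative assumptions.

```python
def mode_seek(points, radius=1.5):
    """Simplified sketch in the spirit of pdf-based clustering: each sample
    points to its nearest higher-density neighbor within `radius`; samples
    with no denser neighbor are roots (density maxima), and following the
    parent pointers labels every sample with its root."""
    n = len(points)
    # crude density estimate: number of neighbors within the radius
    density = [sum(1 for q in points if abs(p - q) <= radius) for p in points]
    parent = list(range(n))
    for i in range(n):
        # candidates: strictly "denser" neighbors (index breaks ties)
        candidates = [j for j in range(n)
                      if abs(points[i] - points[j]) <= radius
                      and (density[j], j) > (density[i], i)]
        if candidates:
            parent[i] = min(candidates, key=lambda j: abs(points[i] - points[j]))
    def root(i):
        while parent[i] != i:
            i = parent[i]
        return i
    return [root(i) for i in range(n)]

pts = [0.0, 0.5, 1.0, 1.5, 8.0, 8.5, 9.0]
labels = mode_seek(pts)
```

Each of the two dense regions contributes exactly one root, so the labeling recovers two clusters without the number of clusters being specified in advance, which mirrors the role of pdf maxima in the abstract.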
Neural Computing and Applications, 2016
Organizing data into sensible groups is called 'data clustering.' It is an open research problem in various scientific fields. Neither a universal solution nor an absolute strategy for its evaluation exists in the literature. In this context, this paper makes the following three contributions: (1) A new method for finding 'natural groupings' or clusters in a data set is presented. For this, a new term, 'vicinity,' is coined. Vicinity captures the idea of density together with the spatial distribution of data points in feature space. This new notion has the potential to separate various types of clusters. In summary, the approach presented here is non-convex admissive (i.e., convex hulls of the clusters found can intersect, which is desirable for non-convex clusters), cluster-proportion and omission admissive (i.e., duplicating a cluster an arbitrary number of times or deleting a cluster does not alter the other clusters' boundaries), scale covariant, and consistent (shrinking within-cluster distances and enlarging between-cluster distances does not affect the clustering results), but not rich (it does not generate exhaustive partitions of the data), and it is density invariant. (2) A strategy for automatic detection of the various tunable parameters in the proposed 'Vicinity Based Cluster Detection' (VBCD) algorithm is presented. (3) A new internal evaluation index, called the 'Space-Density Index' (SDI), for clustered results (produced by any method) is also presented. Experimental results reveal that VBCD captures the idea of 'natural groupings' better than existing approaches. Also, the SDI evaluation scheme provides better judgment than earlier internal cluster validity indices.