2012, Journal of Biomedical Science and Engineering
A new method (the Contrast statistic) for estimating the number of clusters in a data set is proposed. The technique uses the output of a self-organising map clustering algorithm, comparing how the "Contrast" value depends on the number of clusters with the dependency expected under a uniform distribution. A simulation study shows that the Contrast statistic can be used successfully both when the variables describing an object in multi-dimensional space are independent (ideal objects) and when they are dependent (real biological objects).
International Journal of Neural Systems, 1999
Determining the structure of data without prior knowledge of the number of clusters or any information about their composition is a problem of interest in many fields, such as image analysis, astrophysics, biology, etc. Partitioning a set of n patterns in a p-dimensional feature space must be done such that those in a given cluster are more similar to each other than to the rest. As there are approximately [Formula: see text] possible ways of partitioning the patterns among K clusters, finding the best solution is very hard when n is large. The search space increases further when the number of partitions is not known a priori. Although the self-organizing feature map (SOM) can be used to visualize clusters, the automation of knowledge discovery by SOM is a difficult task. This paper proposes region-based image processing methods to post-process the U-matrix obtained after the unsupervised learning performed by SOM. Mathematical morphology is applied to identify regions of neurons that are simi...
Self-Organizing Maps, 2010
Menemui Matematik (Discovering Mathematics), 2016
The self-organizing map is among the most widely accepted algorithms in unsupervised learning for cluster analysis. It is an important tool used to map high-dimensional data sets onto a low-dimensional discrete lattice of neurons, a feature exploited for clustering and classifying data. Clustering is the process of grouping data elements into classes or clusters so that items in each class or cluster are as similar to each other as possible. In this paper, we present an overview of the self-organizing map, its architecture, its applications, and its training algorithm. Computer simulations have been analyzed based on samples of data for clustering problems.
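As a concrete illustration of the training algorithm the overview describes, here is a minimal SOM sketch in NumPy; the grid size, learning-rate decay, and Gaussian neighborhood schedule are illustrative assumptions, not settings from the paper:

```python
import numpy as np

def train_som(data, grid_shape=(5, 5), n_iters=500, lr0=0.5, sigma0=2.0, seed=0):
    """Minimal SOM sketch: neuron weights on a 2-D lattice are pulled
    toward inputs, with a Gaussian neighborhood that shrinks over time."""
    rng = np.random.default_rng(seed)
    rows, cols = grid_shape
    # lattice coordinates of each neuron, shape (rows*cols, 2)
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], float)
    weights = rng.random((rows * cols, data.shape[1]))
    for t in range(n_iters):
        x = data[rng.integers(len(data))]                 # random training sample
        bmu = np.argmin(np.linalg.norm(weights - x, axis=1))  # best-matching unit
        frac = t / n_iters
        lr = lr0 * (1 - frac)                             # linearly decaying rate
        sigma = sigma0 * (1 - frac) + 1e-3                # shrinking neighborhood
        d2 = np.sum((coords - coords[bmu]) ** 2, axis=1)  # lattice distance to BMU
        h = np.exp(-d2 / (2 * sigma ** 2))                # Gaussian neighborhood
        weights += lr * h[:, None] * (x - weights)        # pull toward the input
    return weights.reshape(rows, cols, -1)
```

The key property this sketch preserves is topology preservation: neurons near the best-matching unit on the lattice are updated together, so neighboring neurons end up representing nearby regions of input space.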
Abstract—The self-organizing map (SOM) is an excellent tool in the exploratory phase of data mining. It projects the input space onto prototypes of a low-dimensional regular grid that can be effectively utilized to visualize and explore properties of the data. When the number of SOM units is large, similar units need to be grouped, i.e., clustered, to facilitate quantitative analysis of the map and the data. In this paper, different approaches to clustering of the SOM are considered. In particular, the use of hierarchical agglomerative clustering and partitive clustering using k-means are investigated. The two-stage procedure—first using the SOM to produce the prototypes that are then clustered in the second stage—is found to perform well when compared with direct clustering of the data, and to reduce the computation time.
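The two-stage procedure discussed above (prototypes first, then clustering of the prototypes) can be sketched roughly as follows. For brevity, plain k-means stands in for the SOM at the prototype stage; that substitution, and all parameter values, are assumptions of this sketch rather than the paper's exact setup:

```python
import numpy as np

def kmeans(points, k, n_iters=50, seed=0):
    """Lloyd's k-means with farthest-point initialization for stability."""
    rng = np.random.default_rng(seed)
    centers = [points[rng.integers(len(points))]]
    for _ in range(k - 1):  # each new center: the point farthest from all centers
        d = np.min([((points - c) ** 2).sum(-1) for c in centers], axis=0)
        centers.append(points[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iters):
        labels = np.argmin(((points[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(0)
    labels = np.argmin(((points[:, None] - centers[None]) ** 2).sum(-1), axis=1)
    return centers, labels

def two_stage_cluster(data, n_prototypes=20, k=2, seed=0):
    """Stage 1: compress the data to a small set of prototypes (a trained
    SOM codebook in the paper; plain k-means here as a stand-in).
    Stage 2: cluster the prototypes, then label each data point by the
    cluster of its nearest prototype."""
    protos, _ = kmeans(data, n_prototypes, seed=seed)
    _, proto_labels = kmeans(protos, k, seed=seed)
    nearest = np.argmin(((data[:, None] - protos[None]) ** 2).sum(-1), axis=1)
    return proto_labels[nearest]
```

The computational saving comes from stage 2 operating on a handful of prototypes rather than on all n data points.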
2012
We introduce a novel approach for determination of the number of clusters from unlabeled data sets. We investigate a new method called Extended Support Vector Machine (ESVM) along with the existing Dark Block Extraction (DBE), which is based on an existing algorithm for Visual Assessment of Cluster Tendency (VAT) of a data set, using several common image and signal processing techniques. Its basic steps include: 1) generating a VAT image of an input dissimilarity matrix; 2) performing image segmentation on the VAT image to obtain a binary image, followed by directional morphological filtering; 3) applying a distance transform to the filtered binary image and projecting the pixel values onto the main diagonal axis of the image to form a projection signal; 4) smoothing the projection signal, computing its first-order derivative and then detecting major peaks and valleys in the resulting signal to decide the number of clusters; and 5) applying the K-Means algorithm to the major peaks. We also implement the Cluster Count Extraction ...
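Step 1 of the pipeline above, generating a VAT image, rests on a Prim-style reordering of the dissimilarity matrix so that mutually close objects become adjacent. A minimal sketch of that reordering (the function name `vat_order` is ours, not from the paper):

```python
import numpy as np

def vat_order(D):
    """VAT reordering of a dissimilarity matrix via a Prim-like
    minimum-spanning-tree traversal: start from an endpoint of the
    largest dissimilarity, then repeatedly append the unvisited object
    closest to the visited set. Dark diagonal blocks in the reordered
    matrix D[order][:, order] then hint at cluster structure."""
    n = len(D)
    order = [np.unravel_index(np.argmax(D), D.shape)[0]]
    remaining = set(range(n)) - {order[0]}
    while remaining:
        rem = sorted(remaining)
        sub = D[np.ix_(order, rem)]             # distances visited -> unvisited
        j = rem[np.argmin(sub.min(axis=0))]     # closest unvisited object
        order.append(j)
        remaining.remove(j)
    return order
```

The reordered image itself is just the matrix `D[np.ix_(order, order)]` rendered as grayscale; the DBE steps 2–5 then operate on that image.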
2005
In this paper, we propose a new clustering method consisting of automated "flood-fill segmentation" of the U*-matrix of a Self-Organizing Map after training. Using several artificial datasets as a benchmark, we find that the clustering results of our U*F method are good over a wide range of critical dataset types. Furthermore, comparison to standard clustering algorithms (K-means, single-linkage and Ward) directly applied on the same datasets shows that each of the latter performs very badly on at least one kind of dataset, contrary to our U*F clustering method: while not always the best, U*F clustering has the great advantage of exhibiting consistently good results. Another advantage of U*F is that the computation cost of the SOM segmentation phase is negligible, contrary to other SOM-based clustering approaches which apply O(n² log n) standard clustering algorithms to the SOM prototypes. Finally, it should be emphasized that U*F clustering does not require a priori knowledge on the nu...
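The "flood-fill segmentation" idea can be illustrated with a generic flood fill over a thresholded U-matrix-like grid. The fixed threshold and 4-connectivity here are simplifying assumptions; the U*F paper's actual segmentation criterion is more involved:

```python
from collections import deque

def flood_fill_labels(umatrix, threshold):
    """Label connected low-distance regions of a U-matrix-like grid.
    Cells below `threshold` are cluster interiors; cells at or above it
    act as walls between clusters. Returns (label grid, region count)."""
    rows, cols = len(umatrix), len(umatrix[0])
    labels = [[0] * cols for _ in range(rows)]
    current = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] == 0 and umatrix[r][c] < threshold:
                current += 1                      # start a new region
                q = deque([(r, c)])
                labels[r][c] = current
                while q:                          # BFS flood fill
                    y, x = q.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and labels[ny][nx] == 0
                                and umatrix[ny][nx] < threshold):
                            labels[ny][nx] = current
                            q.append((ny, nx))
    return labels, current
```

Because this pass visits each map node a constant number of times, its cost is linear in the number of SOM units, which is why the segmentation phase is cheap compared with running a standard clustering algorithm on the prototypes.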
2016
Nowadays clustering is applied in many different fields of study. Many methods have been proposed, but the most widely used is the K-means algorithm. Neural networks have also been used for clustering, and the most popular neural network method for clustering is the Self-Organizing Map (SOM). Both methods have recently become among the most popular and powerful. Many scholars try to employ and compare the performance of both methods, and many papers have attempted to reveal which one outperforms the other. However, there is still no definitive answer: different scholars reach different conclusions. In this study, SOM and K-means are compared using three popular data sets. Percent misclassified and output visualization graphs (separately and simultaneously with PCA) are presented to verify the comparison results.
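The "percent misclassified" measure used in such comparisons has a subtlety: cluster labels are arbitrary, so they must first be matched against the true classes. A brute-force sketch (the helper name is ours, and enumerating permutations is only practical for a handful of clusters):

```python
import numpy as np
from itertools import permutations

def percent_misclassified(true_labels, cluster_labels):
    """Score a clustering against ground truth under the best possible
    one-to-one relabeling of the clusters, and return the error as a
    percentage. Brute force over label permutations: fine for small k."""
    true_labels = np.asarray(true_labels)
    cluster_labels = np.asarray(cluster_labels)
    ks = np.unique(cluster_labels)
    best = 0.0
    for perm in permutations(np.unique(true_labels), len(ks)):
        mapping = dict(zip(ks, perm))            # cluster id -> class id
        mapped = np.array([mapping[c] for c in cluster_labels])
        best = max(best, float(np.mean(mapped == true_labels)))
    return 100 * (1 - best)
```

For larger numbers of clusters, the optimal matching is usually computed with the Hungarian algorithm instead of exhaustive permutation.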
IEEE Transactions on Knowledge and Data Engineering, 2010
Visual methods have been widely studied and used in data cluster analysis. Given a pairwise dissimilarity matrix D of a set of n objects, visual methods such as the VAT algorithm generally represent D as an n × n image I(D) where the objects are reordered to reveal hidden cluster structure as dark blocks along the diagonal of the image. A major limitation of such methods is their inability to highlight cluster structure when D contains highly complex clusters. This paper addresses this limitation by proposing a Spectral VAT algorithm, where D is mapped to D′ in a graph embedding space and then reordered to D̃′ using the VAT algorithm. A strategy for automatic determination of the number of clusters in I(D̃′) is then proposed, as well as a visual method for cluster formation from I(D̃′) based on the difference between diagonal blocks and off-diagonal blocks. A sampling-based extended scheme is also proposed to enable visual cluster analysis for large data sets. Extensive experimental results on several synthetic and real-world data sets validate our algorithms. Index Terms: clustering, VAT, cluster tendency, spectral embedding, out-of-sample extension.
1 INTRODUCTION. A general question in the data mining community is how to organize observed data into meaningful structures (or taxonomies). As a tool of exploratory data analysis [36], cluster analysis aims at grouping objects of a similar kind into their respective categories. Given a data set O comprising n objects {o_1, o_2, ..., o_n} (e.g., fish, flowers, beers, etc.), (crisp) clustering partitions the data into c groups C_1, C_2, ..., C_c, so that C_i ∩ C_j = ∅ if i ≠ j and C_1 ∪ C_2 ∪ ... ∪ C_c = O. There have been a large number of data clustering algorithms in the recent literature [24].
In general, clustering of unlabeled data poses three major problems: 1) assessing cluster tendency, i.e., how many clusters to seek, or what is the value of c? 2) partitioning the data into c groups; and 3) validating the c clusters discovered. Given "only" a pairwise dissimilarity matrix D ∈ R^(n×n) representing a data set of n objects (i.e., the original object data is not necessarily available), this paper addresses the first two problems, i.e., determining the number of clusters c prior to clustering and partitioning the data into c clusters. Most clustering algorithms require the number of clusters c as an input parameter, so the quality of the resulting clusters is largely dependent on the estimation of
The VAT algorithm is a visual method for determining the possible number of clusters in, or the cluster tendency of, a set of objects. The improved VAT (iVAT) algorithm uses a graph-theoretic distance transform to improve the effectiveness of the VAT algorithm for "tough" cases where VAT fails to accurately show the cluster tendency. In this paper we present an efficient formulation of the iVAT algorithm which reduces the computational complexity of the iVAT algorithm from O(N^3) to O(N^2). We also prove a direct relationship between the VAT image and the iVAT image produced by our efficient formulation. We conclude with three examples displaying clustering tendencies in three of the Karypis data sets that illustrate the improvement offered by the iVAT transformation. We also provide a comparison of iVAT images to those produced by the Reverse Cuthill-McKee (RCM) algorithm; our examples suggest that iVAT is superior to the RCM method of display.
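The graph-theoretic distance transform underlying iVAT replaces each dissimilarity with the minimax path distance: the smallest possible "largest hop" over all paths between two objects. A straightforward O(N^3) sketch is below; the paper's contribution is the efficient O(N^2) formulation via the VAT ordering, which this sketch does not reproduce:

```python
import numpy as np

def minimax_distance(D):
    """Minimax path distance transform (sketch of the iVAT idea):
    Dp[i, j] is the minimum over all paths from i to j of the largest
    single-hop dissimilarity on the path. Computed with a
    Floyd-Warshall-style relaxation, replacing (+, min) with (max, min)."""
    Dp = D.astype(float).copy()
    n = len(D)
    for k in range(n):
        # best path through k: the larger of the two legs, if that improves Dp
        Dp = np.minimum(Dp, np.maximum(Dp[:, k:k + 1], Dp[k:k + 1, :]))
    return Dp
```

The effect is that objects connected by a chain of short hops end up with a small transformed distance even if their direct dissimilarity is large, which is exactly what makes elongated or irregular clusters show up as dark blocks.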
2003
The Self-Organizing Map (SOM) is a special kind of unsupervised neural network. A SOM consists of a regular, usually two-dimensional, grid of neurons. During training, the SOM forms an elastic net that folds onto the "cloud" formed by the input data. Thus, SOM can be interpreted as a topology-preserving mapping from input space onto the two-dimensional grid of neurons. In data mining, SOM has been used as a clustering technique. As there are several important issues concerning data mining clustering techniques, some experiments have been done with the goal of discovering the relation between SOM and those issues. This paper discusses SOM, the experiments, and the analytical results of how SOM has, in some ways, provided good solutions to several of the issues.
Data mining is generally the process of examining data from different perspectives and summarizing it into valuable information. There are a number of data mining software packages for analysing data. They allow users to examine the data from various angles, categorize it, and summarize the relationships identified.
Studies in health technology and informatics, 2002
This work deals with multidimensional data analysis, specifically cluster analysis, applied to a very well known dataset, the Wisconsin Breast Cancer dataset. After the introduction of the topics of the paper, the concept of cluster analysis is briefly explained and different methods of cluster analysis are compared. Further, the Kohonen model of self-organizing maps is briefly described, together with an example and an explanation of how cluster analysis can be performed using the maps. After describing the data set and the methodology used for the analysis, we present the findings using textual as well as visual descriptions, and conclude that the approach is a useful complement for assessing multidimensional data and that this dataset has been overused for automated decision benchmarking purposes without a thorough analysis of the data it contains.
This paper proposes a maximum clustering similarity (MCS) method for determining the number of clusters in a data set by studying the behavior of similarity indices comparing two (of several) clustering methods. The similarity between the two clusterings is calculated at the same number of clusters, using the indices of Rand (R), Fowlkes and Mallows (FM), and Kulczynski (K) each corrected for chance agreement. The number of clusters at which the index attains its maximum is a candidate for the optimal number of clusters. The proposed method is applied to simulated bivariate normal data, and further extended for use in circular data. Its performance is compared to the criteria discussed in Tibshirani, Walther, and Hastie (2001). The proposed method is not based on any distributional or data assumption which makes it widely applicable to any type of data that can be clustered using at least two clustering algorithms.
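The chance-corrected Rand index that MCS relies on can be computed directly from the contingency table of two labelings. A sketch of the Hubert-Arabie adjusted form (the FM and K indices mentioned above are corrected analogously):

```python
import numpy as np
from math import comb

def adjusted_rand(a, b):
    """Rand index corrected for chance agreement (Hubert-Arabie adjusted
    Rand index): 1.0 for identical partitions, ~0 for random agreement."""
    a, b = np.asarray(a), np.asarray(b)
    ka, kb = np.unique(a), np.unique(b)
    n = len(a)
    # contingency table: counts of objects shared by each pair of clusters
    table = np.array([[np.sum((a == i) & (b == j)) for j in kb] for i in ka])
    sum_ij = sum(comb(int(x), 2) for x in table.ravel())   # pairs together in both
    sum_a = sum(comb(int(x), 2) for x in table.sum(1))     # pairs together in a
    sum_b = sum(comb(int(x), 2) for x in table.sum(0))     # pairs together in b
    expected = sum_a * sum_b / comb(n, 2)                  # chance-agreement term
    max_index = (sum_a + sum_b) / 2
    return (sum_ij - expected) / (max_index - expected)
```

In the MCS setting, two different clustering algorithms are run for each candidate number of clusters k, this index is computed between their partitions, and the k at which the index peaks is a candidate for the optimal number of clusters.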
Lecture Notes in Computer Science, 2005
This paper presents an innovative, adaptive variant of Kohonen's self-organizing maps called ASOM, which is an unsupervised clustering method that adaptively decides on the best architecture for the self-organizing map. Like traditional SOMs, this clustering technique also provides useful information about the relationship between the resulting clusters. Applications of the resulting software to clustering biological data are discussed in detail.
Psychometrika, 1985
A Monte Carlo evaluation of 30 procedures for determining the number of clusters was conducted on artificial data sets which contained either 2, 3, 4, or 5 distinct nonoverlapping clusters. To provide a variety of clustering solutions, the data sets were analyzed by four hierarchical clustering methods. External criterion measures indicated excellent recovery of the true cluster structure by the methods at the correct hierarchy level. Thus, the clustering present in the data was quite strong. The simulation results for the stopping rules revealed a wide range in their ability to determine the correct number of clusters in the data. Several procedures worked fairly well, whereas others performed rather poorly. Thus, the latter group of rules would appear to have little validity, particularly for data sets containing distinct clusters. Applied researchers are urged to select one or more of the better criteria. However, users are cautioned that the performance of some of the criteria may be data dependent.
Computers in Biology and Medicine, 2007
Cluster analysis is one of the crucial steps in gene expression pattern (GEP) analysis. It leads to the discovery or identification of temporal patterns and coexpressed genes. GEP analysis involves highly dimensional multivariate data which demand appropriate tools. A good alternative for grouping many multidimensional objects is the self-organizing map (SOM), an unsupervised neural network algorithm able to find relationships among data; SOM groups the data and maps them topologically. However, it may be difficult to identify clusters with the usual visualization tools for SOM. We propose a simple algorithm to identify and visualize clusters in SOM (the RP-Q method). The RP is a new node-adaptive attribute that moves in a two-dimensional virtual space, imitating the movement of the codebook vectors of the SOM net in the input space. The Q statistic evaluates the SOM structure, providing an estimate of the number of clusters underlying the data set. The SOM-RP-Q algorithm permits the visualization of clusters in the SOM and their node patterns. The algorithm was evaluated on several simulated and real GEP data sets. Results show that the proposed algorithm successfully displays the underlying cluster structure directly from the SOM and is robust to different net sizes.
2008
Cluster analysis is the name given to a diverse collection of techniques that can be used to classify objects (e.g. individuals, quadrats, species etc). While Kohonen's Self-Organizing Feature Map (SOFM) or Self-Organizing Map (SOM) networks have been successfully applied as a classification tool to various problem domains, including speech recognition, image data compression, image or character recognition, robot control and medical diagnosis, their potential as a robust substitute for clustering analysis remains relatively unresearched. SOM networks combine competitive learning with dimensionality reduction by smoothing the clusters with respect to an a priori grid, and provide a powerful tool for data visualization. In this paper, SOM is used for creating a toroidal mapping of a two-dimensional lattice to perform cluster analysis on results of a chemical analysis of wines produced in the same region in Italy but derived from three different cultivators, referred to as the "wine recognition data" located in the University of California-Irvine database. The results are encouraging, and it is believed that SOM would make an appealing and powerful decision-support system tool for clustering tasks and for data visualization.
2004 IEEE International Joint Conference on Neural Networks (IEEE Cat. No.04CH37541)
The Self-Organizing Map (SOM) has emerged as one of the popular choices for clustering data; however, when it comes to point density accuracy of codebooks or reliability and interpretability of the map, the SOM leaves much to be desired. In this paper, we compare the newly developed K-Means Hierarchical (KMH) clustering algorithm to the SOM. We also introduce a new initialization scheme for K-means that improves codebook placement, and propose a novel visualization scheme that combines Principal Component Analysis (PCA) and the Minimal Spanning Tree (MST) in an arrangement that ensures reliability of the visualization, unlike the SOM. A practical application of the algorithm is demonstrated on a challenging Bioinformatics problem.
Information Sciences, 2004
The Self-Organizing Map (SOM) is a powerful tool in the exploratory phase of data mining. It is capable of projecting high-dimensional data onto a regular, usually 2-dimensional, grid of neurons with good neighborhood preservation between the two spaces. However, due to the dimensional conflict, neighborhood preservation cannot always lead to perfect topology preservation. In this paper, we establish an Expanding SOM (ESOM) to better preserve the topology between the two spaces. Besides the neighborhood relationship, our ESOM can detect and preserve an ordering relationship using an expanding mechanism. The computational complexity of the ESOM is comparable with that of the SOM. Our experimental results demonstrate that the ESOM constructs better mappings than the classic SOM, especially in terms of topological error. Furthermore, clustering results generated by the ESOM are more accurate than those obtained by the SOM.