2010, Self-Organizing Maps
This work introduces a new clustering algorithm based on Self-Organizing Maps (SOM) that simultaneously learns the structure of data and its segmentation. The proposed Density-based Simultaneous Two-Level SOM (DS2L-SOM) effectively addresses the challenge of determining an appropriate number of clusters without prior knowledge by leveraging both density and distance in the clustering process. The algorithm demonstrates superior performance in clustering complex and irregular datasets, making it a robust tool for unsupervised learning.
Pattern Recognition, 2009
Clustering is an important unsupervised learning technique widely used to discover the inherent structure of a given data set. Some existing clustering algorithms use a single prototype to represent each cluster, which may not adequately model clusters of arbitrary shape and size and hence limits clustering performance on complex data structures. This paper proposes a clustering algorithm that represents each cluster by multiple prototypes. Squared-error clustering is used to produce a number of prototypes that locate the regions of high density, because of its low computational cost and yet good performance. A separation measure is proposed to evaluate how well two prototypes are separated. Multiple prototypes with small separations are grouped into a given number of clusters by an agglomerative method. New prototypes are iteratively added to improve poor cluster separations. As a result, the proposed algorithm can discover clusters of complex structure with robustness to initial settings. Experimental results on both synthetic and real data sets demonstrate the effectiveness of the proposed clustering algorithm.
Abstract: The self-organizing map (SOM) is an excellent tool in the exploratory phase of data mining. It projects the input space onto prototypes of a low-dimensional regular grid that can be effectively utilized to visualize and explore properties of the data. When the number of SOM units is large, similar units need to be grouped, i.e., clustered, to facilitate quantitative analysis of the map and the data. In this paper, different approaches to clustering of the SOM are considered. In particular, the use of hierarchical agglomerative clustering and partitive clustering using k-means are investigated. The two-stage procedure, in which the SOM is first used to produce the prototypes that are then clustered in the second stage, is found to perform well compared with direct clustering of the data and to reduce the computation time.
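The two-stage procedure described above can be sketched in a few lines of NumPy. This is an illustrative sketch only: the grid size, decay schedules, blob data and k are assumptions of the example, not settings from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: three well-separated Gaussian blobs in 2-D (assumed example data).
data = np.vstack([rng.normal(c, 0.1, size=(100, 2))
                  for c in [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]])

# --- Stage 1: train a small SOM (minimal sketch, not an optimized library) ---
grid_w = grid_h = 6
coords = np.array([(i, j) for i in range(grid_w) for j in range(grid_h)], float)
weights = rng.uniform(data.min(), data.max(), size=(grid_w * grid_h, 2))

n_iter = 2000
for t in range(n_iter):
    x = data[rng.integers(len(data))]
    bmu = np.argmin(((weights - x) ** 2).sum(axis=1))   # best-matching unit
    frac = t / n_iter
    sigma = 3.0 * (1 - frac) + 0.5                      # shrinking neighborhood
    lr = 0.5 * (1 - frac) + 0.01                        # decaying learning rate
    d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)      # grid distance to BMU
    h = np.exp(-d2 / (2 * sigma ** 2))                  # Gaussian neighborhood
    weights += lr * h[:, None] * (x - weights)

# --- Stage 2: cluster the SOM prototypes with plain Lloyd's k-means ---
k = 3
centers = weights[rng.choice(len(weights), k, replace=False)]
for _ in range(50):
    labels = np.argmin(((weights[:, None] - centers) ** 2).sum(-1), axis=1)
    centers = np.array([weights[labels == m].mean(axis=0)
                        if np.any(labels == m) else centers[m]
                        for m in range(k)])

# Each data point inherits the cluster of its best-matching prototype.
data_labels = labels[np.argmin(((data[:, None] - weights) ** 2).sum(-1), axis=1)]
```

The computational saving comes from clustering 36 prototypes instead of 300 data points in stage 2, which is the point the abstract makes about reduced computation time.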
2005
In this paper, we propose a new clustering method consisting of automated flood-fill segmentation of the U*-matrix of a Self-Organizing Map after training. Using several artificial datasets as a benchmark, we find that the clustering results of our U*F method are good over a wide range of critical dataset types. Furthermore, comparison with standard clustering algorithms (k-means, single-linkage and Ward) applied directly to the same datasets shows that each of the latter performs very badly on at least one kind of dataset, contrary to our U*F clustering method: while not always the best, U*F clustering has the great advantage of exhibiting consistently good results. Another advantage of U*F is that the computational cost of the SOM segmentation phase is negligible, contrary to other SOM-based clustering approaches which apply O(n² log n) standard clustering algorithms to the SOM prototypes. Finally, it should be emphasized that U*F clustering does not require a priori knowledge of the number of clusters.
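The flood-fill idea can be illustrated on a toy distance matrix: label connected regions of low-valued cells (cluster interiors) and leave high-valued ridge cells unlabeled. This is a generic sketch, not the U*F algorithm itself; in particular the fixed threshold used here is an assumption of the example, whereas U*F derives its segmentation from the U*-matrix.

```python
import numpy as np
from collections import deque

def flood_fill_segments(U, threshold):
    """Label 4-connected regions of cells whose value lies below the
    threshold; ridge cells (high values) keep label 0."""
    h, w = U.shape
    labels = np.zeros((h, w), dtype=int)
    next_label = 0
    for si in range(h):
        for sj in range(w):
            if U[si, sj] < threshold and labels[si, sj] == 0:
                next_label += 1
                labels[si, sj] = next_label
                q = deque([(si, sj)])
                while q:                      # breadth-first flood fill
                    i, j = q.popleft()
                    for ni, nj in [(i-1, j), (i+1, j), (i, j-1), (i, j+1)]:
                        if (0 <= ni < h and 0 <= nj < w
                                and U[ni, nj] < threshold
                                and labels[ni, nj] == 0):
                            labels[ni, nj] = next_label
                            q.append((ni, nj))
    return labels, next_label

# Toy matrix: two low-valued basins separated by a high-valued ridge column.
U = np.full((4, 5), 0.1)
U[:, 2] = 1.0
labels, n = flood_fill_segments(U, threshold=0.5)
print(n)  # → 2
```

Each labeled region then corresponds to one cluster of map units, and data points are assigned via their best-matching unit.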
2018
Self-Organizing Maps (SOM) are very powerful tools for data mining, in particular for visualizing the distribution of the data in very high-dimensional data sets. Moreover, the 2D map produced by SOM can be used for unsupervised partitioning of the original data set into categories, provided that this map is somehow adequately segmented into clusters. This is usually done either manually by visual inspection, or by applying a classical clustering technique (such as agglomerative clustering) to the set of prototypes corresponding to the map. In this paper, we present a new approach for the segmentation of Self-Organizing Maps after training, which is both very simple and efficient. Our algorithm is based on a post-processing of the U-matrix (the matrix of distances between adjacent map units), which is directly derived from an elementary image-processing technique. It is shown on some simulated data sets that our partitioning algorithm appears to give very good results in terms of segmentation quality. Preliminary results on a real data set also seem to indicate that our algorithm can produce meaningful clusters on real data.
International Journal of Neural Systems, 1999
Determining the structure of data without prior knowledge of the number of clusters or any information about their composition is a problem of interest in many fields, such as image analysis, astrophysics and biology. Partitioning a set of n patterns in a p-dimensional feature space must be done such that those in a given cluster are more similar to each other than to the rest. As there are approximately K^n/K! possible ways of partitioning the patterns among K clusters, finding the best solution is very hard when n is large. The search space increases further when the number of partitions is not known a priori. Although the self-organizing feature map (SOM) can be used to visualize clusters, the automation of knowledge discovery by SOM is a difficult task. This paper proposes region-based image processing methods to post-process the U-matrix obtained after the unsupervised learning performed by the SOM. Mathematical morphology is applied to identify regions of neurons that are similar.
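As a concrete illustration of the U-matrix that this kind of post-processing operates on, here is a minimal NumPy sketch: each unit's U-value is the mean distance from its weight vector to those of its grid neighbors. The 4-neighbor connectivity and the toy weight grid are assumptions of the example, not the paper's setup.

```python
import numpy as np

def u_matrix(weights):
    """Mean Euclidean distance from each SOM unit to its 4-connected grid
    neighbors; high values mark cluster borders, low values cluster interiors."""
    h, w, _ = weights.shape
    U = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            dists = []
            for di, dj in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
                ni, nj = i + di, j + dj
                if 0 <= ni < h and 0 <= nj < w:
                    dists.append(np.linalg.norm(weights[i, j] - weights[ni, nj]))
            U[i, j] = np.mean(dists)
    return U

# Toy weight grid: left half of the map near (0,0), right half near (1,1),
# so the U-matrix ridge should appear between the middle columns.
W = np.zeros((4, 6, 2))
W[:, 3:] = 1.0
U = u_matrix(W)
print(U[0, 2], U[0, 0])  # border unit has a larger U-value than interior unit
```

Region-based methods such as the morphological processing the abstract mentions then operate on this matrix as if it were a grayscale image.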
Clustering is a technique for grouping similar data objects into one group and dissimilar data objects into other groups. Clustering, or data grouping, is a key technique of data mining. It is an unsupervised learning task in which one seeks to identify a finite set of categories, termed clusters, to describe the data. Grouping data into clusters aims to maximize the intra-class similarity and minimize the inter-class similarity. Clustering techniques can be categorized into partitioning methods, hierarchical methods, density-based methods and grid-based methods. This paper aims to provide a brief overview of various clustering algorithms and their complexities.
WSEAS Transactions on Information Science …, 2004
Unsupervised learning (clustering) deals with instances that have not been pre-classified in any way and so do not have a class attribute associated with them. The purpose of applying clustering algorithms is to discover useful but unknown classes of items. Unsupervised learning is an approach to learning in which instances are automatically placed into meaningful groups based on their similarity. This paper introduces the fundamental concepts of unsupervised learning and surveys recent clustering algorithms. Moreover, recent advances in unsupervised learning, such as ensembles of clustering algorithms and distributed clustering, are described.
2010
This work presents a neural network model for the clustering analysis of data based on Self-Organizing Maps (SOM). The model evolves during the training stage towards a hierarchical structure according to the input requirements. The hierarchical structure serves as a specialization tool that provides refinements of the classification process. The structure behaves like a single map with different resolutions depending on the region to analyze. The benefits and performance of the algorithm are discussed in application to the Iris dataset, a classical example for pattern recognition.
Journal of Neuroscience Methods, 2005
Cluster analysis is an important tool for classifying data. Established techniques include k-means and k-median cluster analysis. However, these methods require the user to provide a priori estimates of the number of clusters and their approximate locations in the parameter space. Often these estimates can be made based on some prior understanding of the nature of the data. Alternatively, the user makes these estimates based on visualization of the data; however, the latter is problematic in data sets with large numbers of dimensions. Presented here is an algorithm that can automatically provide these estimates, without human intervention, based on the inherent structure of the data set, and that is not limited by the number of dimensions.
International Journal of Applied Research (ISSN Print: 2394-7500 ISSN Online: 2394-5869), 2021
There is a need to scrutinize and retrieve information from data in today's world. Clustering is an analytical technique that involves dividing data into groups of similar objects. Each group, called a cluster, is formed from objects that have affinities within the cluster but differ significantly from objects in other groups. Partition and hierarchical clustering are the two main types of clustering techniques; this paper examines and compares hierarchical clustering algorithms. Hierarchical clustering is a cluster analysis technique that aims to create a hierarchy of clusters: a set of simple (flat) clustering methods arranged in a tree structure, which create clusters by recursively partitioning the entities in a top-down or bottom-up manner. The algorithms are described and analysed in terms of factors such as dataset size, data set type, number of clusters formed, consistency, accuracy, and efficiency. The intent of discussing the various implementations of hierarchical clustering algorithms is to help new researchers and beginners understand how they function, so that they can come up with new approaches and innovations for improvement.
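The bottom-up variant the abstract describes can be sketched as naive single-linkage agglomerative clustering: start with one cluster per point and repeatedly merge the two closest clusters until the desired count remains. The Euclidean metric, single-linkage criterion and O(n³) merge loop are assumptions of this illustration.

```python
import numpy as np

def single_linkage(points, k):
    """Naive bottom-up agglomerative clustering with single linkage:
    the distance between two clusters is the distance between their
    closest members. O(n^3) -- for illustration only."""
    clusters = [[i] for i in range(len(points))]
    D = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
    while len(clusters) > k:
        best, best_d = (0, 1), np.inf
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = D[np.ix_(clusters[a], clusters[b])].min()  # single linkage
                if d < best_d:
                    best_d, best = d, (a, b)
        a, b = best
        clusters[a] += clusters.pop(b)       # merge the two closest clusters
    return clusters

pts = np.array([[0, 0], [0.1, 0], [5, 5], [5.1, 5], [10, 0], [10, 0.1]])
print(sorted(sorted(c) for c in single_linkage(pts, 3)))
# → [[0, 1], [2, 3], [4, 5]]
```

Recording the sequence of merges (instead of stopping at k) yields the full dendrogram, i.e., the hierarchy of clusters the abstract refers to.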
2005
The extraction of meaningful information from large collections of data is a fundamental issue in science. To this end, clustering algorithms are typically employed to identify groups (clusters) of similar objects. A critical issue for any clustering algorithm is determining the number of clusters present in a dataset. In this contribution we present a clustering algorithm that, in addition to partitioning the data into clusters, approximates the number of clusters during its execution. We further present modifications of this algorithm for different distributed environments and for dynamic databases. Finally, we present a modification of the algorithm that exploits the fractal dimension of the data to partition the dataset.
Menemui Matematik (Discovering Mathematics), 2016
The self-organizing map is among the most widely used algorithms for cluster analysis in unsupervised learning. It is an important tool for mapping high-dimensional data sets onto a low-dimensional discrete lattice of neurons, a feature that is used for clustering and classifying data. Clustering is the process of grouping data elements into classes or clusters so that items in each class or cluster are as similar to each other as possible. In this paper, we present an overview of the self-organizing map, its architecture, its applications and its training algorithm. Computer simulations have been analyzed based on samples of data for clustering problems.
International Journal of Computer Applications
The main goal of data mining is to extract information from a large data set and transform it into an understandable form for further use. Clustering is vital in data analysis and data mining applications: it is the task of grouping a set of objects so that objects in the same group are more similar to one another than to those in other groups (clusters). Rapid retrieval of relevant data from databases has always been a significant issue, and several techniques have been developed for this purpose; among them, clustering is one of the key techniques. Learning, the process of extracting useful knowledge from a large quantity of data, can be classified as supervised or unsupervised, and clustering is a kind of unsupervised data mining technique. This paper reviews clustering and its different techniques in data mining, describing their general working behavior, the methodologies they follow, and the parameters that affect their performance.
IOSR Journal of Engineering, 2012
Clustering is a common technique for statistical data analysis, used in many fields including machine learning, data mining, pattern recognition, image analysis and bioinformatics. Clustering is the process of grouping similar objects into different groups or, more precisely, the partitioning of a data set into subsets, so that the data in each subset are similar according to some defined distance measure. This paper covers clustering algorithms, their benefits and their applications, and concludes by discussing some limitations.
IJCSMC, 2019
Machine learning algorithms are broadly classified into supervised, unsupervised and semi-supervised learning algorithms. Supervised learning algorithms are divided into classification and regression techniques, whereas unsupervised learning algorithms are divided into clustering and dimensionality reduction. This paper deals with the evaluation of clustering techniques under unsupervised learning. Clustering is the process of grouping data with similar properties into a single group. Several clustering techniques are available, such as partitional clustering, hierarchical clustering, fuzzy clustering, density-based clustering, and model-based clustering. This paper focuses on the analysis and evaluation of K-means clustering from the partitional methods and divisive clustering from the hierarchical methods. The results of the evaluation show that K-means clustering holds up better for large datasets and also takes less time than hierarchical clustering.
Clustering is a branch of multivariate analysis that is used to create groups of data. While there are currently a variety of techniques that are used for creating clusters, many require defining additional information, including the actual number of clusters, before they can be carried out. The case study of this research presents a novel neural network that is capable of creating groups by using a combination of hierarchical clustering and self-organizing maps, without requiring the number of existing clusters to be specified beforehand.
Pattern Recognition Letters, 1994
In this paper, a new non-iterative clustering method is proposed. It consists of two passes. In the first pass, the mean distance from each object to its nearest neighbor is estimated; based on this distance, noise points lying far from the objects are identified and removed. In the second pass, the mean distance from the remaining objects to their nearest neighbors is computed; based on this distance, all the intrinsic clusters are then found. The proposed method is non-iterative and can automatically determine the number of clusters. Experimental results also show that the partition generated by the proposed method is more reasonable than that of the well-known c-means algorithm for many complicated object distributions.
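A rough NumPy sketch of the two-pass idea: estimate the mean nearest-neighbor distance to filter noise, then re-estimate it on the survivors and treat points connected by short edges as one cluster. The noise cutoff (2×) and connection factor (3×) are assumptions of this illustration, not the paper's exact rules.

```python
import numpy as np

def two_pass_nn_clusters(points):
    """Pass 1: drop points whose nearest-neighbor (NN) distance greatly
    exceeds the mean NN distance (noise).  Pass 2: recompute the mean NN
    distance on the survivors and link points closer than a small multiple
    of it; connected components are the clusters."""
    D = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
    np.fill_diagonal(D, np.inf)
    nn = D.min(axis=1)
    keep = nn <= 2.0 * nn.mean()              # pass 1: remove distant noise
    P = points[keep]
    D = np.linalg.norm(P[:, None] - P[None, :], axis=-1)
    np.fill_diagonal(D, np.inf)
    thresh = 3.0 * D.min(axis=1).mean()       # pass 2: linking threshold
    parent = list(range(len(P)))              # union-find over short edges
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i in range(len(P)):
        for j in range(i + 1, len(P)):
            if D[i, j] <= thresh:
                parent[find(i)] = find(j)
    n_clusters = len({find(i) for i in range(len(P))})
    return n_clusters, keep

# Two tight blobs plus one far-away outlier (assumed example data).
pts = np.vstack([np.random.default_rng(1).normal(c, 0.05, size=(30, 2))
                 for c in [(0.0, 0.0), (3.0, 3.0)]] + [np.array([[10.0, 10.0]])])
n_clusters, kept = two_pass_nn_clusters(pts)
print(n_clusters, kept.sum())
```

Because both the noise cutoff and the linking threshold are derived from the data itself, the method needs neither the number of clusters nor any initialization, which is the contrast the abstract draws with c-means.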