2011, Machine Learning
We present a bioinspired algorithm which performs dimensionality reduction on datasets for visual exploration, under the assumption that they have a clustered structure. We formulate a decision-making strategy based on foraging theory, where a software agent is viewed as an animal, a discrete space as the foraging landscape, and objects representing points from the dataset as nutrients or prey items. We apply this algorithm to artificial and real databases, and show how a multi-agent system addresses the problem of mapping high-dimensional data into a two-dimensional space.
Lecture Notes in Computer Science, 2012
We explore the notion of agent-based data mining and visualization as a means for exploring large, multi-dimensional data sets. In Reynolds' classic flocking algorithm (1987), individuals move in a 2-dimensional space and emulate the behavior of a flock of birds (or "boids", as Reynolds refers to them). Each individual in the simulated flock exhibits specific behaviors that dictate how it moves and how it interacts with other boids in its "neighborhood". We are interested in using this approach as a way of visualizing large multi-dimensional data sets. In particular, we are focused on data sets in which records contain time-tagged information about people (e.g., a student in an educational data set or a patient in a medical records data set). We present a system in which individuals in the data set are represented as agents, or "data boids". The flocking exhibited by our boids is driven not by observation and emulation of creatures in nature, but rather by features inherent in the data set. The visualization quickly shows separation of data boids into clusters, where members are attracted to each other by common feature values.
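The feature-driven attraction described above can be illustrated with a very small sketch. It implements only a cohesion rule (boids drift toward peers with similar feature values) and assumes a single scalar feature per record; the names `cohesion_step` and `spread`, the threshold, and the toy data are hypothetical and are not taken from the paper.

```python
import math
import random

def cohesion_step(positions, features, threshold=0.5, rate=0.1):
    """Move every boid a fraction `rate` toward the centroid of its similar peers."""
    updated = []
    for i, (x, y) in enumerate(positions):
        # Peers are the other boids whose feature value is close to this one's.
        peers = [positions[j] for j in range(len(positions))
                 if j != i and abs(features[j] - features[i]) < threshold]
        if not peers:
            updated.append((x, y))
            continue
        cx = sum(p[0] for p in peers) / len(peers)
        cy = sum(p[1] for p in peers) / len(peers)
        updated.append((x + rate * (cx - x), y + rate * (cy - y)))
    return updated

def spread(points):
    """Largest pairwise distance within a group of 2D points."""
    return max(math.dist(a, b) for a in points for b in points)

rng = random.Random(0)
features = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]   # two groups of similar records
positions = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in features]
start = list(positions)
for _ in range(50):
    positions = cohesion_step(positions, features)
```

After the loop, each group of similar records has collapsed toward its own centroid, which is the kind of visual separation the abstract refers to. A fuller flocking model would also include separation and alignment rules.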
2009 IEEE Symposium on Visual Analytics Science and Technology, 2009
In this paper, we discuss dimension reduction methods for 2D visualization of high dimensional clustered data. We propose a two-stage framework for visualizing such data based on dimension reduction methods. In the first stage, we obtain the reduced dimensional data by applying a supervised dimension reduction method such as linear discriminant analysis which preserves the original cluster structure in terms of its criteria. The resulting optimal reduced dimension depends on the optimization criteria and is often larger than 2. In the second stage, the dimension is further reduced to 2 for visualization purposes by another dimension reduction method such as principal component analysis. The role of the second stage is to minimize the loss of information due to reducing the dimension all the way to 2. Using this framework, we propose several two-stage methods, and present their theoretical characteristics as well as experimental comparisons on both artificial and real-world text data sets.
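The two stages described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes NumPy is available, uses classical Fisher LDA for the first stage and PCA for the second, and the function names `lda_reduce` and `pca_reduce` and the synthetic data are hypothetical.

```python
import numpy as np

def lda_reduce(X, labels):
    """Stage 1: project onto the leading discriminant directions (k - 1 for k clusters)."""
    classes = np.unique(labels)
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))  # within-cluster scatter
    Sb = np.zeros((d, d))  # between-cluster scatter
    for c in classes:
        Xc = X[labels == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_all)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # Directions maximizing between-cluster relative to within-cluster scatter.
    evals, evecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(-evals.real)[: len(classes) - 1]
    return X @ evecs[:, order].real

def pca_reduce(Z, dim=2):
    """Stage 2: keep the `dim` directions of largest variance."""
    Zc = Z - Z.mean(axis=0)
    _, _, Vt = np.linalg.svd(Zc, full_matrices=False)
    return Zc @ Vt[:dim].T

# Synthetic data: 4 clusters of 30 points each in 10 dimensions.
rng = np.random.default_rng(0)
centers = rng.normal(scale=5.0, size=(4, 10))
X = np.vstack([c + rng.normal(size=(30, 10)) for c in centers])
labels = np.repeat(np.arange(4), 30)

Z = lda_reduce(X, labels)   # 3 dimensions, since there are 4 clusters
Y = pca_reduce(Z, dim=2)    # final 2D coordinates for plotting
```

With four clusters the first stage yields three dimensions, which is the case the abstract mentions where the optimal reduced dimension exceeds 2 and a second reduction is needed.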
Proceedings of the Fifth Conference: Neural …, 2000
Interactive exploratory data analysis can be realised by using dimensionality reduction techniques integrated in data visualization software. This work presents an adaptation of one multidimensional scaling algorithm to provide it with generalization capability, allowing the display of new data on an existing mapping. The ensuing relative mapping is used to help the understanding of classification results.
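One way to picture displaying new data on an existing mapping is to hold the mapped points fixed and position only the new item so that its 2D distances to them match its original distances. The sketch below does this with plain gradient descent on a squared-error stress; it is an assumption-laden illustration, not the adaptation proposed in the paper, and `place_new_point`, the step size, and the toy data are hypothetical.

```python
import math

def place_new_point(new_vec, data, layout, steps=500, lr=0.05):
    """Position one new item in an existing 2D layout, leaving the layout fixed."""
    # Target distances come from the original (high-dimensional) space.
    target = [math.dist(new_vec, v) for v in data]
    # Start from the centroid of the existing layout.
    x = sum(p[0] for p in layout) / len(layout)
    y = sum(p[1] for p in layout) / len(layout)
    for _ in range(steps):
        gx = gy = 0.0
        for (px, py), t in zip(layout, target):
            dx, dy = x - px, y - py
            dist = math.hypot(dx, dy) or 1e-9   # avoid division by zero
            coef = 2.0 * (dist - t) / dist      # gradient of (dist - t)^2
            gx += coef * dx
            gy += coef * dy
        x -= lr * gx / len(layout)
        y -= lr * gy / len(layout)
    return x, y

# Toy check: the data are already 2D and the layout reproduces them exactly,
# so the new point should land at its true coordinates.
data = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (4.0, 4.0)]
layout = list(data)
new_x, new_y = place_new_point((1.0, 3.0), data, layout)
```

Because the existing points never move, previously displayed items keep their positions when new data arrive, which is what makes the mapping usable for inspecting classification results.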
This paper describes Cluster Sculptor, a novel interactive clustering system that allows a user to iteratively update the cluster labels of a data set, and an associated low-dimensional projection. The system is fed by clustering results computed in a high-dimensional space, and uses a 2D projection both as a support for overlaying the cluster labels and as a means of engaging user interaction. By easily interacting with elements directly in the visualization, the user can inject his or her domain knowledge progressively, crafting an updated 2D projection and the associated clustering structure that combine his or her preferences and the manifolds underlying the data. Via interactive controls, the distribution of the data in the 2D space can be used to amend the cluster labels, or reciprocally, the 2D projection can be updated so as to emphasize the current clusters. The 2D projection updates follow a smooth physical metaphor that gives the user insight into the process. Updates can be interrupted at any time for further data inspection or to modify the input preferences. The interest of the system is demonstrated by detailed experimental scenarios on three real data sets.
2007 IEEE Symposium on Visual Analytics Science and Technology, 2007
Cluster analysis (CA) is a powerful strategy for the exploration of high-dimensional data in the absence of a-priori hypotheses or data classification models, and the results of CA can then be used to form such models. But even though formal models and classification rules may not exist in these data exploration scenarios, domain scientists and experts generally have a vast amount of non-compiled knowledge and intuition that they can bring to bear in this effort. In CA, there are various popular mechanisms to generate the clusters; however, the results of their unsupervised deployment rarely agree fully with this expert knowledge and intuition. To this end, our paper describes a comprehensive and intuitive framework to aid scientists in the derivation of classification hierarchies in CA, using k-means as the overall clustering engine, but allowing them to tune its parameters interactively based on a non-distorted compact visual presentation of the inherent characteristics of the data in high-dimensional space. These include cluster geometry, composition, spatial relations to neighbors, and others. In essence, we provide all the tools necessary for a high-dimensional activity we call cluster sculpting, and the evolving hierarchy can then be viewed in a space-efficient radial dendrogram. We demonstrate our system in the context of the mining and classification of a large collection of millions of data items of aerosol mass spectra, but our framework readily applies to any high-dimensional CA scenario.
2011
The exponential growth of data is producing very large databases that hold terabytes of information. The growing number of data dimensions and data objects presents tremendous challenges for effective data analysis and data exploration methods and tools. One solution commonly proposed is the use of a condensed description of the properties and structure of data. Thus, it becomes crucial to have visualization tools capable of representing the data structure, not from the data themselves, but from these condensed descriptions. The purpose of the work described in this paper is to develop a synergistic visualization of data and knowledge and to integrate it into the knowledge discovery process. We propose here a method of describing data from enriched and segmented prototypes using a clustering algorithm. We then introduce a visualization tool that can enhance the structure within and between groups in data. We show, using some artificial and real databases, the relevance of the proposed method.
2002
Information Visualization is commonly recognized as a useful method for understanding sophistication in large datasets. In this paper, we introduce an efficient and flexible clustering approach that combines visual clustering and fast disk labelling for very large datasets. This paper has three contributions. First, we propose a framework, Vista, that incorporates information visualization methods into the clustering process in order to enhance the understanding of the intermediate clustering results and allow the user to revise the clustering results before the disk labelling phase. Second, we introduce a fast and flexible disk-labelling algorithm, ClusterMap, which utilizes the visual clustering result to improve the overall performance of clustering on very large datasets. Third, we develop a visualization model that maps a multidimensional dataset to a 2D visualization while preserving or partially preserving clusters. Experiments show that Vista combined with ClusterMap is faster and has a lower error rate than existing algorithms for very large datasets. It is also flexible because the "cluster map" can be easily adjusted to meet application-specific clustering requirements.
2003
Traditional visualization techniques for multidimensional data sets, such as parallel coordinates, glyphs, and scatterplot matrices, do not scale well to high numbers of dimensions. A common approach to solving this problem is dimensionality reduction. Existing dimensionality reduction techniques usually generate lower dimensional spaces that have little intuitive meaning to users and allow little user interaction. In this paper we propose a new approach to handling high dimensional data, named Visual Hierarchical Dimension Reduction (VHDR), that addresses these drawbacks. VHDR not only generates lower dimensional spaces that are meaningful to users, but also allows user interactions in most steps of the process. In VHDR, dimensions are grouped into a hierarchy, and lower dimensional spaces are constructed using clusters of the hierarchy. We have implemented the VHDR approach into XmdvTool, and extended several traditional multidimensional visualization methods to convey dimension cluster characteristics when visualizing the data set in lower dimensional spaces. Our case study of applying VHDR to a real data set supports our belief that this approach is effective in supporting the exploration of high dimensional data sets.
Lecture Notes in Computer Science, 2004
Visualization techniques play an important role in the KDD process for data analysis and mining. However, a single image does not always successfully convey the information inherent in high-dimensional, very large databases. In this paper we introduce VSIS (Visual Set of Information Segments), an interactive tool to visually explore multidimensional, very large, numerical data. Within supervised learning, our proposal approaches the problem of classification by searching for meaningful intervals belonging to the most relevant attributes. These intervals are displayed as multi-colored bars in which the degree of impurity with respect to the class membership can be easily perceived. Such bars can be re-explored interactively with new values of user-defined parameters. A case study of applying VSIS to some UCI repository data sets shows the usefulness of our tool in supporting the exploration of multidimensional and very large data.
2001
In several real-life data mining applications, data resides in very high (1000) dimensional space, where both clustering techniques developed for low dimensional spaces (k-means, BIRCH, CLARANS, CURE, DBScan etc.) as well as visualization methods such as parallel coordinates or projective visualizations, are rendered ineffective. This paper proposes a relationship-based approach to clustering that alleviates both problems, side-stepping the "curse of dimensionality" issue by working in a suitable similarity space instead of the original high-dimensional attribute space. The similarity measure used can be tailored to satisfy business criteria such as obtaining user clusters representing comparable amounts of revenue. The clustering algorithm is used to reorder the data points so that the resulting (rearranged) similarity matrix can be readily visualized in two dimensions, with clusters showing up as bands. While such visualization is not novel, the two key contributions of our method are: (i) it leads to clusters of (approximately) equal importance, and (ii) related clusters show up adjacent to one another, further facilitating the visualization of results. Both properties arise from the efficient and scalable top-down graph-partitioning approach used for clustering in similarity space. The visualization is very helpful for assessing and improving clustering. For example, actionable recommendations for splitting or merging of clusters can be easily derived, and it also guides the user towards the right number of clusters. Results are presented on a real retail industry data-set of several thousand customers and products, as well as on clustering of web document collections.
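The reordering step described above can be illustrated in a few lines. The sketch assumes cluster labels are already available (the paper obtains them by graph partitioning, which is not reproduced here); `reorder_by_cluster` and the toy similarity matrix are hypothetical.

```python
def reorder_by_cluster(sim, labels):
    """Permute rows and columns of a similarity matrix so that members of the
    same cluster become contiguous; returns the permutation and the new matrix."""
    order = sorted(range(len(labels)), key=lambda i: labels[i])  # stable sort
    reordered = [[sim[i][j] for j in order] for i in order]
    return order, reordered

# Toy similarity matrix: items 0 and 2 are alike, items 1 and 3 are alike.
sim = [
    [1.0, 0.1, 0.9, 0.2],
    [0.1, 1.0, 0.2, 0.8],
    [0.9, 0.2, 1.0, 0.1],
    [0.2, 0.8, 0.1, 1.0],
]
labels = [0, 1, 0, 1]
order, reordered = reorder_by_cluster(sim, labels)
```

Rendered as a heat map, the rearranged matrix shows high-similarity blocks along the diagonal, one per cluster, while off-diagonal blocks show how strongly different clusters relate to each other.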
2014 IEEE Conference on Visual Analytics Science and Technology (VAST), 2014
International Journal of Science and Research (IJSR), 2017
Journal of Advanced Computational Intelligence and Intelligent Informatics, 2005
International Journal of Advanced Computer Science and Applications, 2016
IEEE Transactions on Visualization and Computer Graphics, 2017
Database and Expert Systems Applications, 2002
2013 17th International Conference on Information Visualisation, 2013
Multimedia Tools and Applications, 2010