Academia.edu no longer supports Internet Explorer.
To browse Academia.edu and the wider internet faster and more securely, please take a few seconds to upgrade your browser.
2011, Machine Learning
…
20 pages
1 file
We present a bioinspired algorithm which performs dimensionality reduction on datasets for visual exploration, under the assumption that they have a clustered structure. We formulate a decision-making strategy based on foraging theory, where a software agent is viewed as an animal, a discrete space as the foraging landscape, and objects representing points from the dataset as nutrients or prey items.
2009 IEEE Symposium on Visual Analytics Science and Technology, 2009
In this paper, we discuss dimension reduction methods for 2D visualization of high dimensional clustered data. We propose a twostage framework for visualizing such data based on dimension reduction methods. In the first stage, we obtain the reduced dimensional data by applying a supervised dimension reduction method such as linear discriminant analysis which preserves the original cluster structure in terms of its criteria. The resulting optimal reduced dimension depends on the optimization criteria and is often larger than 2. In the second stage, the dimension is further reduced to 2 for visualization purposes by another dimension reduction method such as principal component analysis. The role of the second-stage is to minimize the loss of information due to reducing the dimension all the way to 2. Using this framework, we propose several two-stage methods, and present their theoretical characteristics as well as experimental comparisons on both artificial and real-world text data sets.
2020
Dimensionality reduction is often used as an initial step in data exploration, either as preprocessing for classification or regression or for visualization. Most dimensionality reduction techniques to date are unsupervised; they do not take class labels into account (e.g., PCA, MDS, t-SNE, Isomap). Such methods require large amounts of data and are often sensitive to noise that may obfuscate important patterns in the data. Various attempts at supervised dimensionality reduction methods that take into account auxiliary annotations (e.g., class labels) have been successfully implemented with goals of increased classification accuracy or improved data visualization. Many of these supervised techniques incorporate labels in the loss function in the form of similarity or dissimilarity matrices, thereby creating over-emphasized separation between class clusters, which does not realistically represent the local and global relationships in the data. In addition, these approaches are often ...
This paper describes Cluster Sculptor, a novel interactive clustering system that allows a user to iteratively update the cluster labels of a data set, and an associated low-dimensional projection. The system is fed by clustering results computed in a high-dimensional space, and uses a 2D projection, both as support for overlaying the cluster labels, and engaging user interaction. By easily interacting with elements directly in the visualization, the user can inject his or her domain knowledge progressively, crafting an updated 2D projection and the associated clustering structure that combine his or her preferences and the manifolds underlying the data. Via interactive controls, the distribution of the data in the 2D space can be used to amend the cluster labels, or reciprocally, the 2D projection can be updated so as to emphasize the current clusters. The 2D projection updates follow a smooth physical metaphor, that gives insight of the process to the user. Updates can be interrupted any time, for further data inspection, or modifying the input preferences. The interest of the system is demonstrated by detailed experimental scenarios on three real data sets.
Lecture Notes in Computer Science, 2012
We explore the notion of agent-based data mining and visualization as a means for exploring large, multi-dimensional data sets. In Reynolds' classic flocking algorithm (1987), individuals move in a 2-dimensional space and emulate the behavior of a flock of birds (or "boids", as Reynolds refers to them). Each individual in the simulated flock exhibits specific behaviors that dictate how it moves and how it interacts with other boids in its "neighborhood". We are interested in using this approach as a way of visualizing large multi-dimensional data sets. In particular, we are focused on data sets in which records contain time-tagged information about people (e.g., a student in an educational data set or a patient in a medical records data set). We present a system in which individuals in the data set are represented as agents, or "data boids". The flocking exhibited by our boids is driven not by observation and emulation of creatures in nature, but rather by features inherent in the data set. The visualization quickly shows separation of data boids into clusters, where members are attracted to each other by common feature values.
2003
Traditional visualization techniques for multidimensional data sets, such as parallel coordinates, glyphs, and scatterplot matrices, do not scale well to high numbers of dimensions. A common approach to solving this problem is dimensionality reduction. Existing dimensionality reduction techniques usually generate lower dimensional spaces that have little intuitive meaning to users and allow little user interaction. In this paper we propose a new approach to handling high dimensional data, named Visual Hierarchical Dimension Reduction (VHDR), that addresses these drawbacks. VHDR not only generates lower dimensional spaces that are meaningful to users, but also allows user interactions in most steps of the process. In VHDR, dimensions are grouped into a hierarchy, and lower dimensional spaces are constructed using clusters of the hierarchy. We have implemented the VHDR approach into XmdvTool, and extended several traditional multidimensional visualization methods to convey dimension cluster characteristics when visualizing the data set in lower dimensional spaces. Our case study of applying VHDR to a real data set supports our belief that this approach is effective in supporting the exploration of high dimensional data sets.
2007 IEEE Symposium on Visual Analytics Science and Technology, 2007
Cluster analysis (CA) is a powerful strategy for the exploration of high-dimensional data in the absence of a-priori hypotheses or data classification models, and the results of CA can then be used to form such models. But even though formal models and classification rules may not exist in these data exploration scenarios, domain scientists and experts generally have a vast amount of non-compiled knowledge and intuition that they can bring to bear in this effort. In CA, there are various popular mechanisms to generate the clusters, however, the results from their nonsupervised deployment rarely fully agree with this expert knowledge and intuition. To this end, our paper describes a comprehensive and intuitive framework to aid scientists in the derivation of classification hierarchies in CA, using k-means as the overall clustering engine, but allowing them to tune its parameters interactively based on a non-distorted compact visual presentation of the inherent characteristics of the data in highdimensional space. These include cluster geometry, composition, spatial relations to neighbors, and others. In essence, we provide all the tools necessary for a high-dimensional activity we call cluster sculpting, and the evolving hierarchy can then be viewed in a space-efficient radial dendrogram. We demonstrate our system in the context of the mining and classification of a large collection of millions of data items of aerosol mass spectra, but our framework readily applies to any high-dimensional CA scenario.
2004
Haiku is a data mining system which combines the best properties of human and machine discovery. An self organising visualisation system is coupled with a genetic algorithm to provide an interactive, flexible system. Visualisation of data allows the human visual system to identify areas of interest, such as clusters, outliers or trends. A genetic algorithm based machine learning algorithm can then be used to explain the patterns identified visually. The explanations (in rule form) can be biased to be short or long; contain all the characteristics of a cluster or just those needed to predict membership; or concentrate on accuracy or on coverage of the data. This paper describes both the visualisation system and the machine learning component, with a focus on the interactive nature of the data mining process, and provides case studies to demonstrate the capabilities of the system.
Clustering is a technique commonly used in scientific research. The task of clustering inevitably involves human participation -The clustering is not finished when the computer/algorithm finishes but the user has evaluated, understood and accepted the patterns. This defines a human involved "clusteringanalysis/evaluation" iteration. Instead of neglecting this human involvement, we provide a visual framework (VISTA) with all power of algorithmic approaches (since their result can be visualized), and in addition we allow the user to steer/monitor/refine the clustering process with domain knowledge. The visual-rendering result also provides a precise pattern for fast post-processing.
IEEE Transactions on Visualization and Computer Graphics, 2000
Fig. 1. Hierarchical clustering results on a synthetic point dataset (the black dots) are shown as a heatmap. Colored levels of the hierarchy (with darker red colors corresponding to higher levels) highlight those cluster constellations that were found interesting by our heuristic, aiding the user in selecting appropriate algorithmic settings.
IEEE Transactions on Visualization and Computer Graphics, 2017
Dimension reduction algorithms and clustering algorithms are both frequently used techniques in visual analytics. Both families of algorithms assist analysts in performing related tasks regarding the similarity of observations and finding groups in datasets. Though initially used independently, recent works have incorporated algorithms from each family into the same visualization systems. However, these algorithmic combinations are often ad hoc or disconnected, working independently and in parallel rather than integrating some degree of interdependence. A number of design decisions must be addressed when employing dimension reduction and clustering algorithms concurrently in a visualization system, including the selection of each algorithm, the order in which they are processed, and how to present and interact with the resulting projection. This paper contributes an overview of combining dimension reduction and clustering into a visualization system, discussing the challenges inherent in developing a visualization system that makes use of both families of algorithms.
Loading Preview
Sorry, preview is currently unavailable. You can download the paper by clicking the button above.
2014 IEEE Conference on Visual Analytics Science and Technology (VAST), 2014
International Journal of Science and Research (IJSR), 2017
Lecture Notes in Computer Science, 2004
Proceedings of the Fifth Conference: Neural …, 2000
2013 17th International Conference on Information Visualisation, 2013
Proceedings of the 12th international conference on Intelligent user interfaces - IUI '07, 2007