This paper presents a new framework for multivariate data analysis, based on graph theory, using intersection graphs. We have named this approach DAIG (Data Analysis with Intersection Graphs). This new framework represents data vectors as paths on a graph, which has a number of advantages over the classical table representation of data. To do so, each node represents an atom of information, i.e. a pair of a variable and a value, associated with the set of observations for which that pair occurs. An edge exists between a pair of nodes whenever the intersection of their respective sets is not empty. We show that this representation of data as an intersection graph allows an easy and intuitive geometric interpretation of data observations, groups of observations, and results of multivariate data analysis techniques such as biplots, principal components, cluster analysis, or multidimensional scaling. These will appear as paths on the graph, relating variables, values and observations. This approach allows for a compact and memory-efficient representation of data that contains many missing values or multi-valued attributes. The basic principles and advantages of this approach are presented with an example of its application to a simple toy problem. The main features of this methodology are illustrated with the aid of software specifically developed for this purpose.
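The construction described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' software: the function names `build_intersection_graph` and `path_of`, the dict-based record format, and the toy data are all hypothetical. Each node is a (variable, value) pair holding the set of observation ids in which it occurs, and an edge is created whenever two such sets intersect.

```python
from itertools import combinations

def build_intersection_graph(observations):
    """Return (nodes, edges) for a list of dict records.

    nodes maps (variable, value) -> set of observation ids.
    edges maps (node_a, node_b) -> non-empty set of shared observation ids.
    """
    nodes = {}
    for obs_id, record in enumerate(observations):
        for variable, value in record.items():
            if value is None:
                continue  # a missing value adds no membership to any node
            # A multi-valued attribute contributes one node per value.
            values = value if isinstance(value, (list, set, tuple)) else [value]
            for v in values:
                nodes.setdefault((variable, v), set()).add(obs_id)
    edges = {}
    for a, b in combinations(sorted(nodes, key=repr), 2):
        shared = nodes[a] & nodes[b]
        if shared:  # edge exists only when the intersection is not empty
            edges[(a, b)] = shared
    return nodes, edges

def path_of(nodes, obs_id):
    """Nodes visited by one observation, i.e. its path on the graph."""
    return sorted((n for n, members in nodes.items() if obs_id in members), key=repr)

# Hypothetical toy data with one missing value and one multi-valued attribute.
observations = [
    {"color": "red", "size": "small"},
    {"color": "red", "size": "large"},
    {"color": "blue", "size": None},
    {"color": "blue", "size": ["small", "large"]},
]
nodes, edges = build_intersection_graph(observations)
print(path_of(nodes, 0))
```

In this sketch the missing value in the third record costs no storage at all, and the multi-valued record simply belongs to two `size` nodes, which is one way to read the abstract's claim about compactness.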
Institute of Mathematical Statistics Lecture Notes - Monograph Series, 2000
In this paper we explore the relationship between multivariate data analysis and techniques for graph drawing or graph layout. Although both classes of techniques were created for quite different purposes, we find many common principles and implementations. We start with a discussion of the data analysis techniques, in particular multiple correspondence analysis, multidimensional scaling, parallel coordinate plotting, and seriation. We then discuss parallels in the graph layout literature.
[Figure 1. The multivariable graph of a toy example. Axis label: categories of second variable.]
Footnote 1: A bipartite graph is a 2-layered graph, where edges only go from one layer to the other layer.
Methods of Information in Medicine, 2014
Objectives: Graphical displays can make data more understandable; however, large graphs can challenge human comprehension. We have previously described a filtering method to provide high-level summary views of large data sets. In this paper we demonstrate our method for setting and selecting thresholds to limit graph size while retaining important information by applying it to large single and paired data sets, taken from patient and bibliographic databases.
2008
This paper presents a new way of representing a contingency table as a graph. A brief introduction to contingency tables is given, and it is shown that an intersection graph can be constructed that contains the same information. It is also shown that this representation of contingency tables can be used for p-way contingency tables (with p > 2). In this latter case, contingency tables become difficult to interpret, and the intuitive reasoning which is easy for 2-way tables is lost. We show that a graph-based representation of contingency tables is useful for p-way tables for p >> 2. A simple example is given that shows the potential of this method and how to plot that graph using parallel coordinate graphs or radar graphs.
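One way to see why an intersection graph can carry the same information as a contingency table: if each node stores the set of observations sharing a (variable, value) pair, then a cell count is the size of the intersection of the corresponding sets, for any number of variables p. The sketch below assumes this reading; the helper names `node_sets` and `cell_count`, the variable names, and the records are hypothetical and are not taken from the paper.

```python
from collections import Counter

def node_sets(records, variables):
    """Map each (variable, value) node to the set of observation ids it covers."""
    sets = {}
    for obs_id, row in enumerate(records):
        for variable, value in zip(variables, row):
            sets.setdefault((variable, value), set()).add(obs_id)
    return sets

def cell_count(sets, cell):
    """Count of one table cell, given as a sequence of (variable, value) nodes."""
    members = [sets.get(node, set()) for node in cell]
    return len(set.intersection(*members))

# Hypothetical 3-way data: smoker x sex x exercise.
variables = ("smoker", "sex", "exercise")
records = [
    ("yes", "m", "low"),
    ("yes", "m", "low"),
    ("yes", "f", "high"),
    ("no", "f", "high"),
    ("no", "m", "high"),
]
sets = node_sets(records, variables)

# 2-way cell (smoker=yes, sex=m) and 3-way cell (smoker=yes, sex=m, exercise=low).
two_way = cell_count(sets, [("smoker", "yes"), ("sex", "m")])
three_way = cell_count(sets, [("smoker", "yes"), ("sex", "m"), ("exercise", "low")])

# Cross-check against a direct tally of the full 3-way table.
direct = Counter(records)
print(two_way, three_way, direct[("yes", "m", "low")])
```

The same `cell_count` call serves 2-way and p-way cells alike, which is the property that makes the graph form attractive when p grows.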
Computational Statistics, 2001
In this paper the problem of visualizing categorical multivariate data sets is considered. By representing the data as the adjacency matrix of an appropriately defined bipartite graph, the problem is transformed to one of graph drawing. A general graph drawing framework is introduced, the corresponding mathematical problem defined and an algorithmic approach for solving the necessary optimization problem discussed. The new approach is illustrated through several examples.
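As a rough illustration of the representation step only (not the drawing or optimization step), categorical data can be coded as an object-by-category indicator matrix G, and the bipartite graph's adjacency matrix then has the block form [[0, G], [G^T, 0]]. The sketch below assumes this standard coding; the function names and the toy rows are hypothetical and do not come from the paper.

```python
def indicator_matrix(rows):
    """Return (G, categories): G[i][k] = 1 if object i falls in category k.

    A category is identified by (variable index, value).
    """
    categories = sorted({(j, v) for row in rows for j, v in enumerate(row)})
    column = {c: k for k, c in enumerate(categories)}
    G = [[0] * len(categories) for _ in rows]
    for i, row in enumerate(rows):
        for j, v in enumerate(row):
            G[i][column[(j, v)]] = 1
    return G, categories

def bipartite_adjacency(G):
    """Adjacency matrix with objects first, then categories: [[0, G], [G^T, 0]]."""
    n, m = len(G), len(G[0])
    A = [[0] * (n + m) for _ in range(n + m)]
    for i in range(n):
        for k in range(m):
            if G[i][k]:
                A[i][n + k] = 1  # object -> category
                A[n + k][i] = 1  # category -> object (undirected graph)
    return A

# Hypothetical toy data: two categorical variables, three objects.
rows = [("red", "small"), ("red", "large"), ("blue", "small")]
G, categories = indicator_matrix(rows)
A = bipartite_adjacency(G)
for line in A:
    print(line)
```

Edges only ever link an object to a category, so the two diagonal blocks stay zero and every object row sums to the number of variables.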
HAL (Le Centre pour la Communication Scientifique Directe), 2017
Many computing applications involve dealing with network data, for example, social networks, communications and computing networks, and epidemiological networks, among others. These applications are usually based on multivariate graphs, i.e., graphs in which items and relationships have multiple attributes. Most of the visualization techniques described in the literature for dealing with multivariate graphs focus either on problems associated with the visualization of topology or on problems associated with the visualization of the items' attributes. Integrating these two components (topology and multiple attributes) in a single visualization is a challenge because connections must be represented and attributes mapped simultaneously, which can generate overlapping elements. Usual strategies to overcome this legibility problem include filtering and aggregation, which make possible a simplified representation with reduced size and density that provides a general view. However, this simplification may reduce the amount of information being displayed, while in several applications the graph details still need to be represented to make in-depth data analysis possible. In view of that, we propose ClusterVis, a visualization technique aimed at exploring the attributes of nodes pertaining to sub-graphs, which are obtained either from clustering algorithms or from user-defined criteria. The technique allows comparing attributes of nodes while keeping the representation of the relationships among them. The technique was implemented within a visualization framework and evaluated by potential users. CCS Concepts •Human-centered Computing ➝ Visualization application domains ➝ Information Visualization.
1986
Graphical techniques for displaying, examining, and analyzing multivariable observations are discussed. Graphical methods that reveal important features of data serve to complement and illuminate formal statistical inferences. Recently developed graphical displays having practical value for applied work with high-dimensional data are emphasized. Star plots, faces, and trees are examples of such methods. The strengths and weaknesses of these and other techniques for dealing with data from applied situations will be treated and compared.
Dynamic graphics for …, 1988
Orion I is a graphic system used to study applications of computer graphics, especially interactive motion graphics, in statistics. Orion I is the newest of a family of "Prim" systems whose most striking common feature is the use of real-time motion graphics to display three-dimensional scatterplots. Orion I differs from earlier Prim systems through the use of modern and relatively inexpensive raster graphics and microprocessor technology. It also delivers more computing power to its user; Orion I can perform more sophisticated real-time computations than were possible on previous such systems. We demonstrate some of Orion I's capabilities in our film: Exploring Data with Orion I.
Encyclopedia of Database Systems, 2009
The large volume of data available in many domains and the need to analyze the data to extract useful information from it have led to a need for visualization techniques that convey information about the data at a glance. Visual inspection is useful in providing fast and abstract information about datasets to guide researchers in choosing a suitable approach to process the data. Recently, there have been notable advances in graph visualization; however, visualizing sets still needs more attention. In this paper a method is proposed to visualize overlapping sets so that the underlying hierarchy and relations of the sets can be easily understood by visual inspection. This approach utilizes the graph representation of the sets to aid the drawing process. Using the spectral decomposition of the graph derived from the sets, we developed algorithms to compute the best coordinates for the items of the sets and plot them on the Euclidean plane. The method has been tested on both real and synthetic datasets to investigate its performance.
SpringerBriefs in Computer Science, 2013
ABSTRACT This work presents a data visualization technique that combines graph-based topology representation and dimensionality reduction methods to visualize the intrinsic data structure in a low-dimensional vector space. The application of graphs in clustering and visualization has several advantages. A graph of important edges (where edges characterize relations and weights represent similarities or distances) provides a compact representation of the entire complex data set. This text describes clustering and visualization methods that are able to utilize information hidden in these graphs, based on the synergistic combination of clustering, graph-theory, neural networks, data visualization, dimensionality reduction, fuzzy methods, and topology learning. The work contains numerous examples to aid in the understanding and implementation of the proposed algorithms, supported by a MATLAB toolbox available at an associated website.
We discuss the problem of extending data mining approaches to cases in which data points arise in the form of individual graphs. Being able to find the intrinsic low-dimensionality in ensembles of graphs can be useful in a variety of modeling contexts, especially when coarse-graining the detailed graph information is of interest. One of the main challenges in mining graph data is the definition of a suitable pairwise similarity metric in the space of graphs. We explore two practical solutions to solving this problem: one based on finding subgraph densities, and one using spectral information. The approach is illustrated on three test data sets (ensembles of graphs); two of these are obtained from standard graph generating algorithms, while the graphs in the third example are sampled as dynamic snapshots from an evolving network simulation.
Bull Int Statist Inst, 1999
Genomics, 2014
Advances in science and technology have resulted in an exponential growth of multivariate (or multidimensional) datasets which are being generated from various research areas, especially in the domain of biological sciences. Visualization and analysis of such data (with the objective of uncovering the hidden patterns therein) is an important and challenging task. We present a tool, called Igloo-Plot, for efficient visualization of multidimensional datasets. The tool addresses some of the key limitations of contemporary multivariate visualization and analysis tools. The visualization layout facilitates easy identification not only of clusters of data points having similar feature compositions, but also of the 'marker features' specific to each of these clusters. The applicability of the various functionalities implemented herein is demonstrated using several well-studied multidimensional datasets. Igloo-Plot is expected to be a valuable resource for researchers working in multivariate data mining studies.
ArXiv, 2017
Graph layout is the process of creating a visual representation of a graph through a node-link diagram. Node-attribute graphs have additional data stored on the nodes which describe certain properties of the nodes called attributes. Typical force-directed representations often produce hairball-like structures that neither aid in understanding the graph's topology nor the relationship to its attributes. The aim of this research was to investigate the use of node-attributes for graph layout in order to improve the analysis process and to give further insight into the graph over purely topological layouts. In this article we present graphTPP, a graph based extension to targeted projection pursuit (TPP) --- an interactive, linear, dimension reduction technique --- as a method for graph layout and subsequent further analysis. TPP allows users to control the projection and is optimised for clustering. Three case studies were conducted in the areas of influence graphs, network security...
2007 6th International Asia-Pacific Symposium on Visualization, 2007
In this paper, we introduce a new method, GraphScape, to visualize multivariate networks, i.e., graphs with multivariate data associated with their nodes. GraphScape adopts a landscape metaphor, with the network structure displayed on a 2D plane and the surface height in the third dimension representing a node attribute. More than one attribute can be visualized simultaneously by using multiple surfaces. In addition, GraphScape can be easily combined with existing methods to further increase the total number of attributes visualized. One of the major goals of GraphScape is to reveal multivariate graph clustering, which is based on both network structure and node attributes. This is achieved by a new layout algorithm and an innovative way of constructing the attribute surface, which also allows visual clustering at different scales through interaction. A simplified attribute surface model is also proposed to reduce the computation required when visualizing large networks. GraphScape is applied to networks of three different sizes (20, 100, and 1500) to demonstrate its effectiveness.
Abstract. Visualization techniques are especially relevant to multidimensional data, the analysis of which is limited by human perception abilities. The paper presents a hybrid method of multidimensional data analysis. The main goal was to test the efficiency of the method in the context of real-life medical data. A short survey of issues and techniques concerned with data visualization is also included.
Indonesian Journal of Electrical Engineering and Computer Science, 2019
The tremendous growth of big data has made the data visualization process more complex and challenging, and the volume of data is expected to keep increasing. With such massive and complex data, it is getting harder for data analysts to interpret or read the data in order to gain new knowledge or information. Therefore, it is important to visualize these data using different techniques. However, many issues remain in data visualization techniques, and they make data visualization a big challenge for the data analyst. The most common issue in data visualization techniques is overlapping. This paper reviews the overlapping issues in multidimensional and network data visualization techniques. The existing solutions are also reviewed and discussed in terms of their advantages and disadvantages. This paper summarizes the advantages of the solutions to the overlapping issues before discussing their drawbacks. This paper suggests the color-based approach, relocation, and reduction of data sets to solve the overlapping issues.
2019
Graphs are irregular structures which naturally account for data integrity; however, traditional approaches have been established outside Signal Processing, and largely focus on analyzing the underlying graphs rather than signals on graphs. Given the rapidly increasing availability of multisensor and multinode measurements, likely recorded on irregular or ad-hoc grids, it would be extremely advantageous to analyze such structured data as graph signals and thus benefit from the ability of graphs to incorporate spatial awareness of the sensing locations, sensor importance, and local versus global sensor association. The aim of this lecture note is therefore to establish a common language between graph signals, defined on irregular signal domains, and some of the most fundamental paradigms in DSP, such as spectral analysis of multichannel signals, system transfer function, digital filter design, parameter estimation, and optimal filter design. This is achieved through a physically mean...
2007 IEEE Symposium on Visual Analytics Science and Technology, 2007
Visualization systems traditionally focus on graphical representation of information. They tend not to provide integrated analytical services that could aid users in tackling complex knowledge discovery tasks. Users' exploration in such environments is usually impeded by several problems: 1) valuable information is hard to discover when too much data is visualized on the screen; 2) users have to manage and organize their discoveries off line, because no systematic discovery management mechanism exists; 3) their discoveries based on visual exploration alone may lack accuracy; and 4) they have no convenient access to the important knowledge learned by other users. To tackle these problems, it has been recognized that analytical tools must be introduced into visualization systems. In this paper, we present a novel analysis-guided exploration system, called the Nugget Management System (NMS). It leverages the collaborative effort of human comprehensibility and machine computations to facilitate users' visual exploration processes. Specifically, NMS first extracts the valuable information (nuggets) hidden in datasets based on the interests of users. Given that similar nuggets may be re-discovered by different users, NMS consolidates the nugget candidate set by clustering based on their semantic similarity. To solve the problem of inaccurate discoveries, localized data mining techniques are applied to refine the nuggets to best represent the captured patterns in datasets. Lastly, the resulting well-organized nugget pool is used to guide users' exploration. To evaluate the effectiveness of NMS, we integrated NMS into XmdvTool, a freeware multivariate visualization system. User studies were performed to compare the users' efficiency and accuracy in finishing tasks on real datasets, with and without the help of NMS. Our user studies confirmed the effectiveness of NMS.