
HIVE PLOTS

LINEAR LAYOUT FOR VISUALIZATION OF NETWORKS

DRAWING NETWORKS RATIONALLY
OR
THE END OF HAIRBALLS

MARTIN KRZYWINSKI

Talks and software – http://mkweb.bcgsc.ca/linnet.


Originally presented at Genome Informatics 2010, Hinxton UK (September 17).

technical brief
version 1 / 6 Oct 2010

Linear Layout for Visualization of Networks: End of Hairballs / M Krzywinski


WHAT ARE WE HOPING TO ACHIEVE?

Exploring data sets and communicating your findings are two different activities. Typically, the same
visualization approach does not suit both.
EXPLORATORY VISUALIZATIONS ARE TOO COMPLEX TO COMMUNICATE.
COMMUNICATIVE VISUALIZATIONS CANNOT BE CREATED UNTIL DATA IS EXPLORED.

The EVEREST PowerWall at Oak Ridge National Laboratory, in Tennessee, is a computer visualization facility. EVEREST stands for Exploratory Visualization Environment for Research in Science and Technology. The 9-meter-wide, 2.4-meter-tall screen can display 35 million pixels of information and is now being used as a tool to model climate change.
http://spectrum.ieee.org/energy/nuclear/slideshow-a-nuclear-family-vacation/0

Comparison of US budget spending by department (left) and related media coverage (right).
http://www.pitchinteractive.com/usbudget/

4 Nov 2010
EXPLORING VS COMMUNICATING

EXPLORE
data driven
hypothesis generating
discover patterns and themes
be thorough by applying variety of approaches
EXPLORATORY VISUALIZATIONS ARE USEFUL TO THE EXPERT, WHO KNOWS THE DATA, CAN DISTINGUISH SIGNAL FROM NOISE, AND IDENTIFY INTERESTING PATTERNS.

PROCESS AND INTEGRATE
identify minimal set of visual elements to communicate your findings
identify appropriate scale at which differences are shown

COMMUNICATE
theme driven
communicate patterns and themes
be specific in guiding the reader towards your conclusion – the reader is exploring
DESIGNED TO EMPHASIZE CONCLUSIONS, THESE VISUALIZATIONS MUST COMMUNICATE THE STRENGTH AND SIGNIFICANCE OF THE EFFECT. OPTIONALLY, OTHER DATA CAN BE INTEGRATED TO PROVIDE CONTEXT, WHILE PRESERVING CLARITY.

CONSTRAIN AMBIGUITY BY IDENTIFYING MEANINGFUL SCALE AND SIGNIFICANCE

EMERGING PATTERNS

Showing the entire data set presents the opportunity to identify emergent patterns – characteristics that are
very difficult to express analytically but trivially identified visually.

Nope, no emergent pattern here. Yup, there’s a pattern here.


Consider how long it took you to identify the form and compare this to how
long it would take to create a general program to do the same.

CAPTURING IMAGINATION

Network data sets are natural inputs for information art. What is information art? A visualization which is
beautiful, engaging and compels us to look deeper – it does not need to be informative.

Yawn. Rockin’.

HAIRBALLS

Both are visualizations of a complex system.

Miller W, Drautz DI, Ratan A, Pusey B, Qi J, Lesk AM, Tomsho LP, Packard MD, Zhao F, Sher A et al. 2008. Sequencing the nuclear genome of the extinct woolly mammoth. Nature 456(7220): 387-390.

Figure 2 and caption quote from Rual et al., Nature 437(7062):1173-8.

SYSTEM BEHIND THE HAIRBALL

Sometimes a better visualization exists if the question “what does it look like” makes sense (mammoth). If the
question does not make sense, a different question must be asked: what is important?

A good visualization of the mammoth.
http://wild-facts.com/2010/09/22/wild-fact-718-the-snow-plow-wooly-mammoth/

An interaction network is not a tangible physical system like the mammoth and its visualization must be approached by first asking: what questions would I like to answer and what properties would I like to communicate?
Rual et al., Nature 437(7062):1173-8.
HOW DO WE NAVIGATE A VISUALIZATION?

A silly question – bear with me. When looking at a scatter plot you know exactly how to identify the meaning of a single data point.

USE LANDMARKS – AXES

The axes provide a reference system.

A COMMON REFERENCE SYSTEM PERMITS COMPARISONS

Scatter plots can be easily compared – such as in this matrix – because they have a common coordinate system. Furthermore, the position of
any one point on a plot is directly (and solely) decided by meaningful properties (x,y coordinate).
Buess et al. Genome Biology 2007 8:R191 doi:10.1186/gb-2007-8-9-r191
WHAT IS DIFFERENT? DATA OR COORDINATE SYSTEM?

Our ability to detect linearity is invariant under rotation, but the coordinate system change affects our
interpretation of the proportionality in the linear relationship.

Zhao et al. BMC Genomics 2002 3:31 doi:10.1186/1471-2164-3-31

Modified figure. Each scatter plot was arbitrarily rotated.
Zhao et al. BMC Genomics 2002 3:31 doi:10.1186/1471-2164-3-31

GOOD LUCK!

This figure illustrates the challenge posed by an inadequate coordinate system. The data points are too few and too loosely distributed for us
to identify meaningful patterns. We’re not very good at identifying loose clusters of patterns against a background of apparent randomness.
Schiøtz et al. Virology Journal 2009 6:91 doi:10.1186/1743-422X-6-91
LET’S TALK ABOUT NETWORKS

HAIRBALLS ARE IRRATIONAL – THERE IS NO MEANINGFUL COORDINATE SYSTEM

6726 human protein-protein interactions. Rendered with Cytoscape (force directed layout).
Rual et al., Nature 437(7062):1173-8.

SIGNIFICANCE OF SPATIAL PROXIMITY IS ENTANGLED WITH LAYOUT

6726 human protein-protein interactions. Rendered with Cytoscape (force directed layout).
Rual et al., Nature 437(7062):1173-8.

DIFFERENT LAYOUTS – SAME NETWORK?

Subset of the human protein-protein interaction network rendered by Cytoscape using a variety of layouts.
Rual et al., Nature 437(7062):1173-8.

SAME LAYOUT – DIFFERENT NETWORK?

Subset of the human protein-protein interaction network rendered by Cytoscape. Each visualization uses the same layout (spring embedded),
using the previous as a starting point.
Rual et al., Nature 437(7062):1173-8.
SAME HAIRBALL – CAN YOU TELL?

Subset of the human protein-protein interaction network rendered by Cytoscape (edge weighted spring embedded). Each panel shows the
same visualization, but with a random rotation and flip.
Rual et al., Nature 437(7062):1173-8.
VISUALIZATION APOLOGETICS

“The apparent banding pattern of the yellow nodes is an artefact of the graph layout algorithm (Supplementary Data). Importantly, the layout
algorithm was not informed by type of supporting evidence and therefore does not explain the evident separation of blue and red edges.”
Figure 2 and caption quote from Rual et al., Nature 437(7062):1173-8
VISUAL CONFUSION

disambiguating layout algorithm from data is impossible
/ a separation of blue and red edges
/ columns of nodes

Authors acknowledge that both effects are artefacts of the layout algorithm.

This suggests the following
/ what are we seeing?
/ what are we supposed to see?
/ is there a better layout algorithm?

Figure 2 and caption quote from Rual et al., Nature 437(7062):1173-8.

WHAT ARE WE HOPING TO SEE? WHAT IS IMPORTANT?

How are these networks different?


Network visualizations from SNAP (http://snap.stanford.edu/data/index.html). For full size poster see
http://mkweb.bcgsc.ca/linnet/img/network-communities.png
CONVENTIONAL NETWORK VISUALIZATION – PROBLEMS

IMPENETRABLE COMPLEXITY
rapidly grow in visual complexity
become visually impenetrable
DEPICTIONS OF LARGE NETWORKS EXCEED RESOLUTION OF OUTPUT AND VISUAL PERCEPTION

DATA SUBORDINATE TO LAYOUT
hairball’s form is determined by the layout algorithm
node and edge metadata are subordinate to layout
important characteristics of the network cease to drive the visualization and cannot be evaluated
HAIRBALL VISUALIZATIONS DO NOT CLEARLY REFLECT ASPECTS OF INTEREST

COMPARISON IMPOSSIBLE
layout algorithm is a major influence of the final visualization
similar networks may have different layouts
DIFFERENCES BETWEEN TWO HAIRBALLS DO NOT NECESSARILY REFLECT A DIFFERENCE IN THE DATA, NOR CLEARLY CAPTURE THE EXTENT OF THE DIFFERENCE

THE SOLUTION – CONCEPT

the linear network layout addresses the shortcomings of the conventional layout
/ nodes are constrained to linear axes
/ edges are drawn as curves between nodes

In the linear network layout, nodes are constrained to linear axes. Edges are drawn as curves between connected nodes.
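
The two rules above can be sketched in a few lines of Python. This is a minimal illustration, not the linnet implementation: it assumes the axes radiate from a common centre at fixed angles, that a node's place on an axis is given as a fraction between 0 and 1, and that an edge is a quadratic Bezier curve pulled towards the centre. The names `node_xy`, `edge_curve`, `inner_radius` and `control` are hypothetical.

```python
import math

def node_xy(axis_angle_deg, position, inner_radius=0.1, axis_length=1.0):
    """Cartesian coordinates of a node on a radial axis.

    position is a fraction in [0, 1], measured outward from the centre
    (assumed convention, not taken from the slides).
    """
    r = inner_radius + position * axis_length
    theta = math.radians(axis_angle_deg)
    return (r * math.cos(theta), r * math.sin(theta))

def edge_curve(p0, p1, control=(0.0, 0.0), steps=16):
    """Sample a quadratic Bezier curve from p0 to p1.

    B(t) = (1-t)^2 * p0 + 2(1-t)t * control + t^2 * p1
    """
    points = []
    for i in range(steps + 1):
        t = i / steps
        a, b, c = (1 - t) ** 2, 2 * (1 - t) * t, t ** 2
        points.append((a * p0[0] + b * control[0] + c * p1[0],
                       a * p0[1] + b * control[1] + c * p1[1]))
    return points

# Two nodes on axes 120 degrees apart (assumed three-axis arrangement)
start = node_xy(90, 0.5)
end = node_xy(210, 0.25)
curve = edge_curve(start, end)
```

Because a node's coordinates depend only on its axis and its position along that axis, the same node lands in the same place every time the plot is drawn.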

THE SOLUTION – MAPPING

placement of nodes in the linear layout is informed by connectivity and/or annotation
/ layout is controlled solely by meaningful properties
/ interpretation of the visualization is easy, because the layout rules are based on data properties
/ direct comparison between networks is possible
/ describing how the layout was obtained uses meaningful language (i.e. based on data properties not aesthetics)

Nodes are mapped and positioned on axes based on structural characteristics and/or annotations. The mappings are meant to be informed by properties of interest, creating a layout that directly illustrates meaningful aspects of the data set.
LAYOUT BASED ON STRUCTURE AND FUNCTION

NODE TO AXIS
node type (source, sink, both)
node annotation class (e.g. gene classification)
AXES CATEGORIZE NODES (NOMINAL SCALE)

AXIS NODE POSITION
absolute or rank ordered node connectivity
neighbour connectivity
annotation property (e.g. expression level)
NODE POSITION ENCODES LOCAL STRUCTURE (ORDINAL OR INTERVAL SCALE)

COLOR AND SHAPE
edge color and transparency controlled by edge weight
glyphs or color codes at node positions classify nodes or layer additional data

SCALE, ORIENTATION AND SEGMENTATION

axis subdivision, scale and orientation can be adjusted to add texture and reveal patterns
/ axis length can be absolute (e.g. number of nodes on axis), or normalized
/ an axis can be further divided into segments to further classify nodes (e.g. expression state)
/ individual axes or segments can be reversed or scaled

Each axis may have modified length, orientation, scale and segmentation.

APPLICATION

Yan et al. [1] compare the E. coli gene regulatory network to the Linux kernel function call network. The linear layout method presented here greatly facilitates the visual assessment of differences between these networks.

/ the networks are directional (geneA regulates geneB or functionA calls functionB)
/ nodes are classified based on in/out degree
    out only (source) – regulator
    in/out – manager
    in only (sink) – workhorse
/ node-to-axis mapping uses this node classification

[1] Yan KK, Fang G, Bhardwaj N, Alexander RP, Gerstein M. 2010. Comparing genomes to computer operating systems in terms of the topology and evolution of their regulatory control networks. Proc Natl Acad Sci U S A 107(20): 9186-9191.

Nodes in the network are classified as regulators (out only), managers (in/out) and workhorses (in only). The E. coli network is bottom-heavy (many workhorses), whereas Linux is top-heavy (many regulators and managers).
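
The in/out-degree classification described above can be sketched as follows. This is an illustrative sketch only: the edge list is made up, and the function name `classify_nodes` is hypothetical rather than part of any published tool.

```python
def classify_nodes(edges):
    """Classify nodes of a directed network by in/out degree.

    edges is a list of (source, target) pairs. Returns a dict mapping
    each node to 'regulator' (out only), 'manager' (in and out) or
    'workhorse' (in only).
    """
    in_deg, out_deg = {}, {}
    for src, dst in edges:
        out_deg[src] = out_deg.get(src, 0) + 1
        in_deg[dst] = in_deg.get(dst, 0) + 1

    roles = {}
    for node in set(in_deg) | set(out_deg):
        if in_deg.get(node, 0) == 0:
            roles[node] = "regulator"
        elif out_deg.get(node, 0) == 0:
            roles[node] = "workhorse"
        else:
            roles[node] = "manager"
    return roles

# Toy directed network (made-up data): a -> b -> c and a -> c
example_roles = classify_nodes([("a", "b"), ("b", "c"), ("a", "c")])
```

Each role then names the axis on which the node is placed.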
CONVENTIONAL COMPARISON

Conventional layouts are not helpful in determining the structure of the E. coli (left) and Linux (right) networks. Even though the networks are vastly different, every property except network size is opaque.

LINEAR LAYOUT

Nodes are assigned to axes based on connectivity. Node position is based on rank order of the number of
edges at a node (degree). Axis length is proportional to the number of nodes on the axis.

E. coli (6x magnification) / The length of the workhorse (green) axis demonstrates an over-representation in this category. Very few regulator-manager (red-yellow) connections exist. Workhorse connectivity is uniform.

Linux / Large number of regulator-manager connections. Small number of workhorse nodes have very high connectivity (increased density of edges at end of workhorse axis). Approximately 1/3 of the regulators (red) have high connectivity to about 5% of the managers (orange), as evidenced by the converging edge density between the two axes.
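
A minimal sketch of the two layout rules on this slide: node position from the rank order of degree, and axis length proportional to the number of nodes on the axis. The helper names, the tie-breaking by node name and the `unit` parameter are assumptions made for illustration.

```python
def rank_positions(degrees):
    """Map each node to a position in [0, 1] by rank order of its degree.

    Ties are broken by node name (an assumption; the slides do not say
    how ties are handled).
    """
    ordered = sorted(degrees, key=lambda n: (degrees[n], n))
    if len(ordered) == 1:
        return {ordered[0]: 0.0}
    return {n: i / (len(ordered) - 1) for i, n in enumerate(ordered)}

def axis_lengths(axis_members, unit=1.0):
    """Axis length proportional to the number of nodes on the axis."""
    return {axis: unit * len(nodes) for axis, nodes in axis_members.items()}

# Made-up degrees for three nodes on one axis
positions = rank_positions({"x": 5, "y": 1, "z": 3})
lengths = axis_lengths({"regulator": ["a"], "workhorse": ["c", "d", "e"]}, unit=2.0)
```

With rank ordering, nodes are evenly spaced along the axis regardless of how skewed the degree distribution is.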
NORMALIZED AXIS LENGTH

The layout method is the same as in the previous slide, but here axis length is normalized to decouple node
category size from connectivity patterns. This view allows direct comparison based on node category fractions.

E. coli / Small number of managers are highly connected.

Linux / Heavily connected workhorses are more clearly evidenced when axis length is normalized.

STRUCTURAL ANNOTATION

Layering structural information is easily done using color. Here, links to the most connected node in each group
(i.e. most connected regulator, manager, workhorse) are colored by the node’s axis color, demonstrating
neighbour connectivity around a node category’s most connected member.

E. coli / The most connected regulator (red) primarily connects to workhorses, but also 5 distinct managers. The most connected manager connects to workhorses.

Linux / Unlike E. coli, here the most connected manager is connected to regulators. Note the regular banding pattern in the links, suggesting substructure.

SEGMENTATION

Yan et al. further classified each node as either non-persistent or persistent. This is shown here by splitting each
axis into two segments that correspond to these two classifications.

E. coli / Relatively few nodes are classified as persistent (outer segments on each axis).

Linux / Each axis contains a near-equal mix of node types. Note that the most connected workhorse is non-persistent (inner segment), whereas the most connected manager is persistent (outer segment).
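
Segmentation amounts to grouping nodes by a second property within each axis. The sketch below assumes the node roles and the set of persistent nodes are already known; the data and the name `segment_axes` are hypothetical. Following the captions, persistent nodes go to the outer segment and non-persistent nodes to the inner one.

```python
def segment_axes(roles, persistent):
    """Group nodes by (axis, segment).

    roles maps node -> axis name; persistent is the set of nodes
    classified as persistent. Persistent nodes are placed on the outer
    segment, all others on the inner segment.
    """
    segments = {}
    for node, axis in roles.items():
        segment = "outer" if node in persistent else "inner"
        segments.setdefault((axis, segment), []).append(node)
    return segments

# Made-up example: four nodes, two of them persistent
example_segments = segment_axes(
    {"a": "regulator", "b": "manager", "c": "workhorse", "d": "workhorse"},
    persistent={"b", "d"},
)
```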

INTRA-AXIS CONNECTIONS

Managers (in/out nodes) can connect to other managers. These intra-axis links were previously not shown, but
can be revealed by cloning the manager axis and displaying manager-manager connections between the
cloned axes. The network is directional, with the edge direction clockwise between the two axes.

E. coli / The manager-manager connections are largely composed of the most connected manager (its high degree is due to out edges) connecting to other managers. This suggests a cascade.

Linux / The most connected manager has a large number of in edges (it’s found on the second of the cloned axes, clockwise) and its connectivity to other managers is exclusive to persistent managers.
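
One way to picture the axis cloning is as a routing step: edges whose endpoints both sit on the cloned axis are drawn from the first clone to the second, so that edge direction runs clockwise. The sketch below shows only that step, under the assumption that node roles are already assigned; the clone labels `manager_1` and `manager_2` and the function name are hypothetical.

```python
def split_intra_axis(edges, roles, axis="manager"):
    """Separate intra-axis edges and route them between two axis clones.

    Returns (inter, intra). inter keeps the (source, target) pairs that
    connect different axes. intra lists ((clone_1, source), (clone_2, target))
    pairs, so the edge source sits on the first clone and the target on
    the second.
    """
    inter, intra = [], []
    for src, dst in edges:
        if roles[src] == axis and roles[dst] == axis:
            intra.append(((f"{axis}_1", src), (f"{axis}_2", dst)))
        else:
            inter.append((src, dst))
    return inter, intra

# Made-up network: regulator a, managers m1 and m2, workhorse w
example_roles = {"a": "regulator", "m1": "manager", "m2": "manager", "w": "workhorse"}
inter, intra = split_intra_axis([("a", "m1"), ("m1", "m2"), ("m2", "w")], example_roles)
```

A node with many in edges from other managers therefore shows up as a dense fan on the second clone, which is how the Linux caption reads the plot.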

APPLICATION – ABSOLUTE CONNECTIVITY

Here, node position is based on absolute degree of a node (number of edges). Axis length is therefore
proportional to the maximum node degree within a node group.

E. coli (3.5x magnification) / The distribution of node degrees becomes evident, with the highest connectivity seen in managers.

Linux / Intra-manager (yellow) edges reveal that a large number of managers with low degree connect to managers with a high degree. From the manager-workhorse links, it is clear that only low degree managers connect to workhorses, whereas high degree managers connect to regulators.
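
With absolute connectivity the mapping is direct: a node's distance along its axis is its degree, and the axis extends to the largest degree in the group. A minimal sketch, with made-up degrees and a hypothetical `scale` factor that converts degree into drawing units:

```python
def absolute_layout(degrees_by_axis, scale=1.0):
    """Position nodes by absolute degree.

    degrees_by_axis maps axis name -> {node: degree}. For each axis the
    result holds the axis length (proportional to the maximum degree on
    that axis) and each node's distance along the axis.
    """
    layout = {}
    for axis, degrees in degrees_by_axis.items():
        layout[axis] = {
            "length": scale * max(degrees.values()),
            "positions": {node: scale * d for node, d in degrees.items()},
        }
    return layout

# Made-up degrees for two axes
example_layout = absolute_layout(
    {"manager": {"m1": 4, "m2": 1}, "workhorse": {"w1": 2}},
    scale=0.5,
)
```

Nodes with equal degree share a position, which is why links meet the axis at discrete points when the range of degrees is small.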

AXIS MAGNIFICATION

Detail can be revealed by magnifying an axis, or individual segments. When the range of node degrees is small,
links connect axes at discrete positions.

E. coli / Workhorse axis is magnified 25x.

Linux / Regulator axis is magnified 25x.

VISUAL COMPARISON REVEALS DIFFERENCES

Each view reveals different aspects of the two networks, and contrasts distinct differences. Unlike the hairballs,
each view is different for the two networks.

E. coli Linux

CLASSIFYING PROTEIN INTERACTIONS

Rual et al., Nature 437(7062):1173-8.

CLASSIFYING PROTEINS AND INTERACTIONS

Rual et al., Nature 437(7062):1173-8.

HIVE PLOT SETUP

1. confirmed interactions are mostly for proteins with few interactions
2. confirmed self-interaction for most connected mixed protein (TRAF2)
3. more undetected interactions to unconfirmed proteins than unconfirmed interactions to undetected proteins
4. fewer confirmed interactions between unconfirmed than undetected proteins
5. most connected unconfirmed protein has no confirmed or undetected links
6. most connected undetected protein has an unconfirmed link

data from Rual et al., Nature 437(7062):1173-8.
APPLICATION – LAYERED NETWORKS

Suppose you have a network composed of three distinct edge groups. These could be thought of as layers of connectivity, with each layer
describing a different type of relationship.

APPLICATION – LAYERED NETWORKS

A matrix of linear layouts can reveal how connectivity layers correlate. For each plot, the connectivity data used to (a) map nodes to axes and determine node position and (b) draw links are not necessarily the same.
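
The matrix can be thought of as every pairing of a layout layer with a link layer. The sketch below assumes each layer is an edge list, that node position comes from the node's degree in the layout layer, and that panels are indexed by (layout layer, link layer); the layer names, data and function names are made up for illustration.

```python
from itertools import product

def degree(edges):
    """Undirected degree of each node in an edge list."""
    deg = {}
    for a, b in edges:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    return deg

def plot_matrix(layers):
    """Build one panel per (layout layer, link layer) pair.

    Node placement is driven by the layout layer, while the links that
    are drawn come from the link layer.
    """
    panels = {}
    for layout_layer, link_layer in product(layers, repeat=2):
        panels[(layout_layer, link_layer)] = {
            "node_degree": degree(layers[layout_layer]),
            "links": list(layers[link_layer]),
        }
    return panels

# Three made-up connectivity layers over the same nodes
layers = {
    "A": [("n1", "n2")],
    "B": [("n2", "n3"), ("n1", "n3")],
    "C": [("n1", "n2"), ("n2", "n3")],
}
panels = plot_matrix(layers)
```

Panels on the diagonal use the same layer for both roles; off-diagonal panels show how the links of one layer fall on a layout shaped by another.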

APPLICATION – STACKED PLOTS

the linear network view can be used to compose stacked bar plots, ideally suited for comparing multiple ratios
/ edges are drawn as ribbons, with different edge lengths
/ nodes become data values

Application of the network layout to stacked bar plots. The plots are wrapped circularly, creating a comparison loop.

VISUALIZING ASSEMBLY QUALITY

[Figure: segments labeled A to H]

Pairwise comparison between bases in reads, assembly and reference. For example: (A) 20% of reads are unassembled, (B) 30% of reads are
unaligned to reference, (C) 2% of reference has no read coverage, (D) 15% of reference has no contig coverage, (E) 60% of reference is
constructed by contigs <200kb, (F) there are no contigs >200kb, (G) 20% of contigs are unaligned to reference, (H) 80% of contig bases
assembled at k=27.
USING THE SOFTWARE

search GIN for “linnet”
/ use local installation
/ download software from web page

mkweb.bcgsc.ca/linnet

ACKNOWLEDGEMENTS

BC CANCER AGENCY
Cydney Nielsen
Shaun Jackman
Rod Docking
Anthony Fejes
Dan Fornika
Jenny Qian
Katayoon Kasaian
Olena Morozova
Inanc Birol
Steven Jones
Marco Marra

MASARYK UNIVERSITY
Martin Lysak

The genius of Gene Roddenberry allowed him to predict a future in which hairballs run amok. In this episode of Star Trek, The Trouble with Tribbles, engineer Scott consults with Kirk and Spock about the hairball crisis. Note the tribble in Kirk’s cup and those stuck to the walls. It isn’t clear how tribbles, which have no legs, can adhere to a vertical surface.
Star Trek Episode 44, 2nd Season, 29 Dec 1967

