0% found this document useful (0 votes)
31 views16 pages

Entropy Directed Graphs

This paper presents novel methods for measuring structural complexity in directed graphs, addressing a gap in existing literature that primarily focuses on undirected graphs. The authors extend von Neumann entropy and Estrada's heterogeneity index to directed graphs, providing simplified forms based on vertex in-degree and out-degree statistics. The study includes an analysis of various network types, demonstrating the applicability of the proposed measures in characterizing network structures.

Uploaded by

hunjayr60
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
31 views16 pages

Entropy Directed Graphs

This paper presents novel methods for measuring structural complexity in directed graphs, addressing a gap in existing literature that primarily focuses on undirected graphs. The authors extend von Neumann entropy and Estrada's heterogeneity index to directed graphs, providing simplified forms based on vertex in-degree and out-degree statistics. The study includes an analysis of various network types, demonstrating the applicability of the proposed measures in characterizing network structures.

Uploaded by

hunjayr60
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Entropy and Heterogeneity Measures

for Directed Graphs

Cheng Ye1 , Richard C. Wilson1 , César H. Comin2 , Luciano da F. Costa2 ,


and Edwin R. Hancock1,
1
Department of Computer Science, University of York,
York, YO10 5GH, UK
{cy666,[Link],[Link]}@[Link]
2
Institute of Physics at São Carlos, University of São Paulo,
P.O. Box 369, São Carlos, São Paulo, 13560-970, Brazil
{appdnails,ldfcosta}@[Link]

Abstract. In this paper, we aim to develop novel methods for measuring


the structural complexity for directed graphs. Although there are many
existing alternative measures for quantifying the structural properties of
undirected graphs, there are relatively few corresponding measures for
directed graphs. To fill this gap in the literature, we explore a number of
alternative techniques that are applicable to directed graphs. We com-
mence by using Chung’s generalisation of the Laplacian of a directed
graph to extend the computation of von Neumann entropy from undi-
rected to directed graphs. We provide a simplified form of the entropy
which can be expressed in terms of simple vertex in-degree and out-
degree statistics. Moreover, we find approximate forms of the von Neu-
mann entropy that apply to both weakly and strongly directed graphs,
and that can be used to characterize network structure. Next we ex-
plore how to extend Estrada’s heterogeneity index from undirected to
directed graphs. Our measure is motivated by the simplified von Neu-
mann entropy, and involves measuring the heterogeneity of differences in
in-degrees and out-degrees. Finally, we perform an analysis which reveals
a novel linear relationship between heterogeneity and resistance distance
(commute time) statistics for undirected graphs. This means that the
larger the difference between the average commute time and shortest re-
turn path length between pairs of vertices, the greater the heterogeneity
index. Based on this observation together with the definition of commute
time on a directed graph, we define an analogous heterogeneity measure
for directed graphs. We illustrate the usefulness of the measures defined
in this paper for datasets describing Erdos-Renyi, ’small-world’, ’scale-
free’ graphs, Protein-Protein Interaction (PPI) networks and evolving
networks.

Keywords: directed graph, structural complexity, von Neumann en-


tropy, heterogeneity index.


Edwin R. Hancock is supported by a Royal Society Wolfson Research Merit Award.

E. Hancock and M. Pelillo (Eds.): SIMBAD 2013, LNCS 7953, pp. 219–234, 2013.

c Springer-Verlag Berlin Heidelberg 2013
220 C. Ye et al.

1 Introduction
Recently there has been considerable interest in analyzing the properties of com-
plex networks since they play a significant role in modelling large-scale systems
in biology, physics and the social sciences. In fact, complex networks provide con-
venient models for complex systems. However, to render such models tractable,
it is essential to have to hand methods for characterizing their salient properties.
As Costa and Rodrigues [7] stated, complex networks are graphs whose connec-
tivity properties deviate from those of regular graphs, which can be defined as
a process of being ’simple’, and the complexity of a network can be understood
as the complement of simplicity. Structural complexity is therefore perhaps the
most important property of a complex network. In order to analyze the features
of a complex network it is imperative that computationally efficient measures are
to hand that can be used to represent and quantify the structural complexity.
In this context graph theoretic methods are often used since they provide
effective tools for characterizing network structure together with the intrinsic
complexity. This approach has lead to the design of several practical methods for
characterizing the global and local structure of undirected networks. However,
there is relatively little work aimed at characterizing directed network struc-
ture. One of the reasons for this is that the graph theory underpinning directed
networks is less developed than that for undirected networks.
The aim in this paper is to explore whether a number of different charac-
terizations developed for undirected graphs can be extended to the domain of
directed graphs, using some recent results from spectral graph theory.

1.1 Related Literature


Recently, Amancio et al. [1] have shown that labyrinths can be modelled as
complex networks and studied in terms of the concept of absorption time, defined
as the time it takes for a random walk from an internal node to an output node,
to classify networks’ metrics. Moreover, Estrada [10] has proposed an index that
can be used to quantify the heterogeneity characteristics of undirected graphs.
This index depends on vertex degree statistics and graph size. The lower bound
of this quantity is zero, which occurs for a regular graph (i.e. all the vertices
have the same degree). The upper bound is equal to one, which is obtained for
a star graph (i.e. there exists a central vertex and all other vertices connect and
only connect to it).
Working in the domain of structural pattern recognition, Xiao et al. [19] have
explored how the heat kernel trace can be used as a means to characterize the
structural complexity of graphs. To do this, they first consider the zeta function
associated with the Laplacian eigenvalues and use the derivative of zeta function
at origin as a characterization for distinguishing different types of graphs. Ren
et al. [15] have developed a novel method to characterize unweighted graphs by
using the polynomial coefficients determined by the Ihara zeta function. To do
this, they construct a pattern vector of Ihara coefficients, and successfully use this
to cluster unweighted graphs. Furthermore, they extend their work by applying
Entropy and Heterogeneity Measures for Directed Graphs 221

Ihara coefficients from unweighted graphs to edge-weighted graphs, which is


achieved by establishing the Perron-Frobenius operator with the assistance of a
reduced Bartholdi zeta function.
Escolano et al. [8] have used the concept of thermodynamic depth to measure
the complexity of networks. They first define the polytopal complexity of a graph
and then introduce a phase-transition principle which links this complexity to
the heat flow, and thus obtain a complexity measure referred as flow complexity.
Recently, Han et al. [12] have developed simplified expressions of von Neumann
entropy on undirected graphs. To do this, they replace the Shannon entropy by
its quadratic counterpart, investigate how to simplify and approximate the cal-
culation of von Neumann entropy. They also explore the relationship among the
heterogeneity index, commute time and the von Neumann entropy, and introduce
a graph complexity measure based on thermodynamic depth.
The above provides a brief survey of recent work on the structural complex-
ity of undirected graphs. However, in the real world, directed graphs are also
common as many networks can be modelled with them. For example, the World
Wide Web is a directed network in which vertices represent web pages while
edges are the hyperlinks between pages.
Turning our attention to directed graphs, Riis [16] has extended the concept of
entropy to directed graphs, using the definitions of guessing number and shortest
index code. He shows that the entropy is the same as the guessing number and
can be bounded by the graph size and shortest index code size. Berwanger et
al. [4] have proposed a new parameter for the complexity of infinite directed
graphs by measuring to what extent the cycles in graphs are intertwined. This
index is defined according to the definitions of tree width, directed tree width
and hypertree width and a similar ’robber-and-cops’ game. Recently Escolano
et al. [9] have extended the concept of heat diffusion thermodynamic depth for
undirected networks to directed networks and thus obtain a measure to quantify
the complexity of structural patterns encoded by directed graphs.

1.2 Paper Outline


One natural way of capturing the structure of directed networks is to use statis-
tics that capture the balance of in-degree and out-degree at vertices. In this pa-
per we commence from Passerini and Severini’s work [13], which interprets the
normalized Laplacian as a density matrix for an undirected graph, and this in
turn allows the graph to be characterized in terms of the von Neumann entropy
associated with the density matrix. We extend this work to directed graphs,
using Chung’s [6] definition of the normalized Laplacian on a directed graph.
According to this definition, the directed normalized Laplacian is Hermitian,
so Passerini and Severini’s interpretation still holds for the domain of directed
graphs. The von Neumann entropy is essentially the Shannon entropy associ-
ated with the normalized Laplacian eigenvalues. If we approximate the Shannon
entropy by its quadratic counterpart, then the von Neumann entropy can be
simplified. The resulting expression depends on the in-degree and out-degree of
pairs of vertices connected by edges.
222 C. Ye et al.

To simplify this resulting expression a step further, we consider graphs that


are either weakly or strongly directed, i.e. those in which there are large or small
proportions of bidirectional edges, and develop corresponding approximations of
the von Neumann entropy.
Finally, we explore how Estrada’s heterogeneity index can be extended from
undirected to directed graphs. Our study of von Neumann entropy suggests a
statistic determined by the in-degree and out-degree for nodes connected by
a directed edge. We show that the resulting heterogeneity index is linked to
the difference between the elements of the normalized adjacency matrix (as a
local measure of connectivity) and the average commute time between nodes (or
resistance distance) as a more global measure of connectivity structure.
The outline of this paper is as follows. In Sect.2, we develop the simplified
forms of von Neumann entropy of directed graphs, and in Sect.3, we introduce the
heterogeneity index and commute time on directed graphs and then investigate
their correlation. In Sect.4, we analyze our theoretical result by undertaking
experiment on network datasets and finally we conclude this paper with an
evaluation of our contribution and suggestions for future work.

2 Von Neumann Entropy of Directed Graphs

In this section, we propose novel methods for characterizing the complexity of


directed graphs. The first method is based on extending the definition of von
Neumann entropy from undirected to directed graphs. To do this we commence
from Chung’s definition of the Laplacian for directed graphs. This leads to an
expression for the von Neumann entropy in terms of the in-degree and out-degree
statistics of vertices. We then provide approximations for the von Neumann
entropy for both strongly directed graphs where there are few bidirectional edges
and weakly directed graphs where there are few edges that are not bidirectional.

2.1 Laplacian of Directed Graphs

Suppose G(V, E) is a directed graph with vertex set V and edge set E ⊆ V × V ,
then the structure of this graph can be represented by a |V | × |V | adjacency
matrix A as follows (where |V | is the number of vertices in the graph)

1 if (i, j) ∈ E
Aij = (1)
0 otherwise.
The in-degree and out-degree of vertex i are

|V | |V |
 
din
i = Aji , dout
i = Aij . (2)
j=1 j=1

With these ingredients, the transition matrix P for the directed graph G is
defined as
Entropy and Heterogeneity Measures for Directed Graphs 223


Aij
dout
if (i, j) ∈ E
Pij = i (3)
0 otherwise.
According to the Perron-Frobenius Theorem, for a strongly connected directed
graph, the transition matrix P has a unique left eigenvector φ with φ(i) > 0, ∀i
which satisfies φP = ρφ where ρ denotes the eigenvalue of P . The theorem also
implies that if P is aperiodic, the eigenvalues of P have absolute values smaller
than the leading eigenvalue ρ = 1. Thus any random walk on a directed graph
will converge to a unique stationary distribution if the graph satisfies the prop-
|V |
erties of strong connection and aperiodicity. We normalize φ s.t. i=1 φ(i) = 1,
this normalized vector corresponds to the unique stationary distribution. There-
fore, the probability of a random walk is at vertex i is the  sum of all incoming
probabilities of vertices j satisfying (j, i) ∈ E, i.e. φ(i) = j,(j,i)∈E φ(j)Pji , then
we can obtain the following approximate equation

φ(i) din
≈ iin . (4)
φ(j) dj
As stated in Chung [6], if we let Φ = diag(φ(1), φ(2), ...), then the normalized
Laplacian matrix of a directed graph can be defined as

Φ1/2 P Φ−1/2 + Φ−1/2 P T Φ1/2


L̃ = I − . (5)
2
Clearly, the normalized matrix is Hermitian matrix, i.e. L̃ = L̃T where L̃T
denotes the conjugated transpose of L̃.

2.2 Von Neumann Entropy of Undirected Graphs


Having defined the prerequisites, we now show how the concept of von Neumann
entropy can be extended from undirected to directed graphs. Passerini and Sev-
erini [13] have argued that the normalized Laplacian can be interpreted as the
density matrix of an undirected graph, and hence the associated von Neumann
entropy of the graph is the Shannon entropy associated with the normalized
Laplacian eigenvalues, i.e.
|V |
 λ̃i λ̃i
HVUN = − ln (6)
i=1
|V | |V |
where λ̃i , i = 1, ..., |V | are the eigenvalues of the normalized Laplacian matrix.
Commencing from their definition, Han et al. [12] have shown that for an
|V |
 λ̃i λ̃i
undirected graph G(V, E), the Shannon entropy HS = − U
ln can be
i=1
|V | |V |
|V |
 λ̃i λ̃i
approximated by the quadratic entropy = U
HQ (1 − ). As a result the
i=1
|V | |V |
von Neumann entropy of undirected graphs can be approximated by
224 C. Ye et al.

T r[L̃] T r[L̃2 ]
HVUN = − . (7)
|V | |V |2
For undirected graphs, the traces appearing in the above expression can be
approximated by degree statistics, with the result that
1 1  1
HVUN = 1 − − . (8)
|V | |V | 2 di dj
(i,j)∈E

2.3 Von Neumann Entropy of Directed Graphs


To extend the analysis of Han et al. [12] to directed graphs, we commence from
(7) and repeat the computation of traces for the case of a directed graph. This
is not a straightforward task, and requires that we distinguish between the in-
degree and out-degree of vertices. To commence, we turn to Chung’s expression
for the normalized Laplacian of directed graphs and write

Φ1/2 P Φ−1/2 + Φ−1/2 P T Φ1/2


T r[L̃] = T r[I − ]
2
1 1
= T r[I] − T r[Φ1/2 P Φ−1/2 ] − T r[Φ−1/2 P T Φ1/2 ]. (9)
2 2
Since the trace is invariant under cyclic permutations, i.e. T r[ABC] = T r[BCA]
= T r[CAB], we have

1 1
T r[L̃] = T r[I] − T r[P Φ−1/2 Φ1/2 ] − T r[P T Φ1/2 Φ−1/2 ]
2 2
1 1
= T r[I] − T r[P ] − T r[P T ]. (10)
2 2
The diagonal elements of the transition matrix P are all zeros, hence we obtain

T r[L̃] = T r[I] = |V |, (11)


which is exactly the same as in the case of undirected graphs.
Next we turn our attention to T r[L̃2 ],
T r[L̃2 ] = T r[I 2 − (Φ1/2 P Φ−1/2 + Φ−1/2 P T Φ1/2 ) +
1 1/2 −1/2 1/2 −1/2 1/2 −1/2 −1/2 T 1/2
(Φ PΦ Φ PΦ +Φ PΦ Φ P Φ +
4
Φ−1/2 P T Φ1/2 Φ1/2 P Φ−1/2 + Φ−1/2 P T Φ1/2 Φ−1/2 P T Φ1/2 )]
2 T 1 2 −1 T T −1 T2
= T r[I ] − T r[P ] − T r[P ] + (T r[P ] + T r[P Φ P Φ] + T r[P ΦP Φ ] + T r[P ])
4
1 2 −1 T
= |V | + (T r[P ] + T r[P Φ P Φ]), (12)
2

which is different to the result obtained in the case of undirected graphs.


Entropy and Heterogeneity Measures for Directed Graphs 225

To continue the development we first divide the edge set E into two dis-
joint subsets E1 and E2 , where E1 = {(i, j)|(i, j) ∈ E and (j, i) ∈
/E}, E2 =
{(i,j)|(i, j) ∈ E and (j, i) ∈ E} that satisfy the conditions E1 E2 = E,
E1 E2 = ∅. Then according to the definition of the transition matrix, we
find

|V | |V |
   1
T r[P 2 ] = Pij Pji = . (13)
dout dout
i=1 j=1 (i,j)∈E2 i j

Using the fact that Φ = diag(φ(1), (2), ...) we have


|V | |V |
  φ(i)  φ(i)
T r[P Φ−1 P T Φ] = Pij2 = 2 . (14)
i=1 j=1
φ(j)
(i,j)∈E
φ(j)dout
i

φ(i) din
Using (4), i.e. φ(j) ≈ i
din
, we can approximate the von Neumann entropy of a
j
directed graph in terms of the in-degree and out-degree of the vertices as follows

   
1 1 1 din 1
HVDN = 1 − − + i
− ,
|V | 2|V |2 dout
i dj
out
dj dout
in 2
dout dout
(i,j)∈E i (i,j)∈E1 i j
(15)
or equivalently,
  
1 1 din 1
HVDN =1− − i
2 + out dout . (16)
|V | 2|V |2 in
dj di out di j
(i,j)∈E (i,j)∈E 2

We can simplify this expression a step further according to the relative sizes of
the sets E1 and E2 .
For weakly directed graphs, |E1 |  |E2 |, i.e. few of the edges are not bidi-
rectional, and we can ignore the summation over E1 in (15). Re-writing the
remaining terms in curly braces in terms of a common denominator and then
dividing numerator and denominator by douti dj
out
we obtain

din din
j
1 1  i
i
dout
+ dout
j
HVWND =1− − . (17)
|V | 2|V |2 dout
i dj
in
(i,j)∈E

The first term 1 − |V1 | tends to unity as the graph size becomes large and the
remaining term is normalized by 2|V |2 . In its second term above, the numerator
is given in terms of the sum of the ratios of in-degree and out-degree at the two
vertices. Since the directed edges cannot commence at a sink (a node of zero out-
2
degree), the ratios do not become infinite. Replacing dout
i in the denominator by
din
i dout
i , we obtain the following expression that approximates the von Neumann
entropy for weakly directed graphs
226 C. Ye et al.

1 1   1 1
HVWND =1− − + in out . (18)
|V | 2|V | 2 out
di djin di dj
(i,j)∈E

On the other hand, for strongly directed graphs, there are few bidirectional edges,
i.e. |E2 |  |E1 |, and we can ignore the summation over E2 in (16), giving the
approximate entropy for strongly directed graphs

1 1   1
HVSD = 1 − − . (19)
N
|V | 2|V |2 dout
i dj
in
(i,j)∈E

Both the weakly and strongly directed forms of the von Neumann entropy (HVWND
and HVSDN ) contain two terms. The first is the graph size while the second one
depends on the in-degree and out-degree statistics of each pair of vertices linked
by an edge. Moreover, the computational complexity of these expressions is
quadratic in the graph size.
There are a number of points to note concerning the development above. First,
the normalized Laplacian matrix of directed graphs denoted by L̃ in (5) satisfies
Passerini and Severini’s conditions [13] for the density matrix. Moreover, we
have shown that L̃ is Hermitian matrix, so its eigenvalues are all real. Hence
theoretically, the density matrix interpretation of Passerini and Severini [13] can
be extended to directed graphs. Second, when the out-degree and in-degree are
the same at a vertex, then the von Neumann entropy for directed and undirected
graphs are identical.

3 Heterogeneity Index and Commute Time

In this section, we present an index which quantifies the heterogeneous properties


of directed graphs. We introduce the definitions of hitting time and commute
time and describe how to compute them, then explore that on undirected graphs,
there exists a relationship between heterogeneity index and commute time, and
show that the similar relationship also applies to the directed graphs.

3.1 Heterogeneity Index of Directed Graphs

Following Estrada [10], in order to compute a heterogeneity index for directed


graphs, we first require a local index to measure the irregularity associated with
a single edge (i, j) ∈ E. Estrada [10] uses the following quantity to measure the
variation in degree
U
σij = [f (di ) − f (dj )]2 (20)
where f (d) is a function of the vertex degree. To extend this measure to directed
graphs, we measure the difference in out-degrees and in-degrees and write

i ) − f (dj )] .
D
σij = [f (dout in 2
(21)
Entropy and Heterogeneity Measures for Directed Graphs 227

This local heterogeneity measure takes on a value zero when the out-degree of the
starting vertex is the same as the in-degree of the end vertex. On the other hand,
the index should become larger when the difference of both degrees increases,
thus we can select f (d) = d−1/2 . The local heterogeneity index associated with
the irregularity of the edge (i, j) ∈ E in a directed graph is given by
 2
1 1
D
σij = − . (22)
dout
i din
j

To compute the global heterogeneity index of a directed graph we sum the local
measure over all the edges in the graph to obtain
  1 1
2   1 1  1
D
ρ (G) = − = out + in −2 .
out
di djin di dj di din
out
(i,j)∈E (i,j)∈E (i,j)∈E j

(23)
The heterogeneity index should take on a minimal value when the graph is
regular, i.e. all the vertices have the same in-degree and out-degree. It is maximal
when the graph is a star graph, i.e. there exists a central vertex such that all the
other vertices connect and only connect to it. We calculate the lower and upper
bounds of ρD (G) according to these constraints. For a regular directed graph,
suppose all the vertices have the same in-degree and out-degree d0 , then
 1 1  1
ρD (G) = + −2 = 0.
d0 d0 d0
(i,j)∈E (i,j)∈E

On the other hand, for a star graph, suppose that the central vertex has out-
degree (in-degree) |V | − 1 and all the other vertices have in-degree (out-degree)
1. Then,

|V | |V | 
 1  1 |V |(|V | − 2 |V | − 1) 
ρD (G) = ( +1)−2  = ≈ |V |−2 |V | − 1.
i=1
|V | − 1 i=1 |V | − 1 |V | − 1

We hence have the following lower and upper bounds for the heterogeneity index

  1 1 2
0 ≤ ρD (G) = out + in − ≤ |V | − 2 |V | − 1. (24)
di dj dout in
(i,j)∈E i dj

Therefore we can define the normalized heterogeneity index of directed graphs


as

1   1 1 2
out + in −
D
ρ̃ (G) = (25)
|V | − 2 |V | − 1 di dj
(i,j)∈E dout din i j

This index is zero for regular directed graphs, one for star graphs, i.e. 0 ≤
ρ̃D (G) ≤ 1.
228 C. Ye et al.

3.2 Commute Time of Directed Graphs


To take our development one step further, we establish a relationship between
the heterogeneity index and the commute time (or resistance distance) between
nodes in a graph. To this end we commence by introducing the definitions of
hitting time and commute time on directed graphs. The hitting time QD ij is the
expected number of steps for a random walk to reach vertex j for the first time,
D
starting from vertex i. The commute time Cij is the sum of QD D
ij and Qji , i.e.
D D D
Cij = Qij + Qji , is the expected number of steps of a random walk starting at
vertex i, visits j for the first time and then returns to vertex i.
Our expressions for both the hitting time and commute time are from Boley
et al. [5]. We first introduce the definition of fundamental matrix Z which has
elements ∞

Zij = (Pijt − φ(j)), 1 ≤ i, j ≤ |V | (26)
t=0

or in matrix form,


Z= (P t − 1φ) (27)
t=0

where P is the transition matrix, 1 = (1, ..., 1)T and φ is the stationary distri-
bution.
The formulae for hitting time and commute time are
Zjj − Zij Zjj − Zij Zii − Zji
QD
ij = , D
Cij = QD D
ij + Qji = + . (28)
φ(j) φ(j) φ(i)

3.3 Relationship between Heterogeneity Index and Commute Time


According to Estrada [10], the normalized heterogeneity index of undirected
graph has the following form

1  1 1 2
ρ̃U (G) = + − . (29)
|V | − 2 |V | − 1 (i,j)∈E i
d dj di dj

Recently, von Luxburg et al. [17] have shown that if the graph size is large
enough, then the hitting time and commute time can be approximated by the
resistance distance whichtakes on a simple form in terms of the vertex degree.
U
In particular, Cij ≈ vol d1i + d1j where vol is the volume of graph defined
|V |
by vol = i=1 di . As a result the first term appearing in the expression for
Estrada’s heterogeneity index can be expressed in terms of commute time.
To take this development one step further, we note that the normalized ad-
jacency matrix for an undirected graph is given by à = D−1/2 AD−1/2 where
D is the diagonal matrix of vertex degrees. The normalized adjacency matrix
has elements Ãij = √ 1 , if (i, j) ∈ E. As a result, in the heterogeneity index
di dj
Entropy and Heterogeneity Measures for Directed Graphs 229

U
Cij
formula, if we make the substitutions 1
di + 1
dj = vol and √ 1 = Ãij we obtain
di dj
the approximation

1   CijU
ρ̃ (G) ≈
U
− 2Ãij . (30)
|V | − 2 |V | − 1 (i,j)∈E vol

To extend this result to directed graphs, we note that

  1 1  Cij
D
+ ≈ (31)
dout
i din
j vol
(i,j)∈E (i,j)∈E

|V | out |V | in


where vol = i=1 di = i=1 di . If we denote by Dout , Din the diagonal
matrices of vertex out-degrees and in-degrees respectively, then the normalized
−1/2 −1/2
adjacency matrix for a directed graph is ÃD = Dout ADin with elements
A˜D ij = √ out
1
in
, if (i, j) ∈ E.
di dj
Hence, we obtain the following relationship between the heterogeneity index
and commute time on directed graphs as

1   CijD
ρ̃D (G) ≈ − 2A˜D ij . (32)
|V | − 2 |V | − 1 (i,j)∈E vol

Thus we have shown that this relationship between heterogeneity index and
commute time applies not only to undirected graphs but also to directed graphs.
Hence for both directed and undirected graphs, if the heterogeneity index
is chosen in an appropriate way then there are two observations that can be
drawn from this analysis. First, the heterogeneity index is proportional to the
average commute time over pairs of nodes connected by an edge. Second, the
heterogeneity index is greatest when the difference between the commute time
and the twice the normalized adjacency matrix element is greatest. Hence, the
heterogeneity index will be smallest for regular graphs and greatest for trees or
star graphs.

4 Experiments and Evaluations

We have suggested several novel methods to measure the structural complexity


of directed graphs. In this section, we aim to evaluate these methods on network
data and give empirical analysis of their properties. First we examine both the
weakly and strongly directed forms of von Neumann entropy, and compare their
performance. Next, we explore whether our theoretically derived relationship
between the heterogeneity index and commute time holds for both undirected
and directed graphs.
230 C. Ye et al.

4.1 The Datasets


Before undertaking our experiments, we first give a brief overview of the datasets
used. The first dataset contains 150 undirected graphs in which the graph
size varies from 50 to 100 nodes. Of this sample, 50 graphs are generated us-
ing the Erdos-Renyi model, which is considered as the most classical random
graph model. An additional 50 graphs are generated according to the ’small-
world’ model, which was introduced by Watts and Strogatz [18]. The remaining
50 graphs are generated using the ’scale-free’ model, which was developed by
Barabasi and Albert [3]. The second dataset contains Protein-Protein Interac-
tion (PPI) networks extracted from Franceschini et al. [11]. These graphs repre-
sent the interaction relationships between histidine kinase in different species of
bacteria. The third dataset consists of 10 evolving directed networks. Each net-
work commences from a fully connected network of size 5, and evolves gradually
with new connections being established proportionally to the current dynamical
activity of each vertex (preferential attachment). This dataset is generated using
the model developed by Antiqueira et al. [2].

4.2 Entropy for Weakly and Strongly Directed Graphs


Equations (18) and (19) give the simplified forms of the von Neumann entropy for
weakly and strongly directed graphs. We calculate them according to these two
equations respectively and compare their behaviours with a reference entropy,
i.e. the approximate von Neumann entropy generated using (15) (or equivalently,
(16)), on the weakly and strongly directed networks in the third dataset.

1 1

0.95 approximate von Neumann entropy


approximate von Neumann entropy
weakly directed form 0.95
strongly directed form

0.9
0.9
entropy

entropy

0.85

0.85
0.8

0.8
0.75

0.7 0.75
0 20 40 60 80 100 0 20 40 60 80 100
time time

(a) (b)

Fig. 1. Entropy for weakly & strongly directed graphs

We see in both Fig.1(a) and Fig.1(b), as the network evolves, both the simpli-
fied form and the reference entropy increase approximately monotonically until
a plateaux value of unity is reached. Moreover, it is worth noting that the dif-
ference between these two quantities is negligible, thus we conclude that for
Entropy and Heterogeneity Measures for Directed Graphs 231

weakly (strongly) directed graphs, the approximate von Neumann entropy and
the simplified weakly (strongly) directed form we suggested are approximately
equivalent.
We then explore whether the von Neumann entropy can be used to distinguish
different types of graph. To this end we create directed versions of the Erdos-
Renyi, ’small-world’ and ’scale-free’ graphs by deleting at random elements from
the adjacency matrix. This has the effect of creating directed edges. In this
analysis we consider the quantity
  
1 1 1 din
J = HV N − (1 −
D
)= + i
|V | 2|V |2 out
di djout
din dout2
j i
(i,j)∈E (i,j)∈E 2

which removes some of the size dependence of the entropy.

6 5
Erdos−Renyi Erdos−Renyi
small−world 4.5 small−world
scale−free scale−free
5
4

3.5
4
3
frequency

frequency

3 2.5

2
2
1.5

1
1
0.5

0 0
0 1 2 3 4 5 6 7 8 0 1 2 3 4 5
approximate von Neumann entropy −3
x 10 approximate von Neumann entropy −3
x 10

(a) (b)

Fig. 2. Directed/Undirected graph characterization using von Neumann Entropy

In the left-hand of Fig.2, the plot shows superimposed histograms of J for each
of the three types of directed graph. The main feature to note is that the Erdos-
Renyi graphs are well separated from the ’small-world’ and ’scale-free’ graphs.
Moreover, the ’scale-free’ and ’small-world’ networks although overlapped are
reasonably well separated. The right-hand panel of Fig.2 repeats this analysis
for the undirected versions of the three types of graph, using the original form of
the von Neumann entropy suggested by Han et al. [12]. Here there is significantly
more overlap, and the different types of network can not be easily separated,
especially for the ’small-world’ and ’scale-free’ networks.

4.3 Heterogeneity Index and Commute Time

We have shown theoretically that the heterogeneity index has a linear depen-
dance on the the commute time for both undirected and directed graphs. In
this subsection we aim to confirm these results empirically. In Fig.3 we plot the
232 C. Ye et al.

heterogeneity index versus commute time for different types of graphs. Here the
commute time of undirected graphs is calculated precisely using the graph spec-
tral formula used by Qiu and Hancock [14]. Figure 3(a) shows the result for the
Erdos-Renyi, ’small-world’ and ’scale-free’ graphs (shown in different colours).
All three types of graphs follow a linear trend (i.e. they satisfy our theoretical
prediction), but populate different parts of the ’heterogeneity-commute time’
space. The second plot is for the protein-protein interaction networks. Although
there are some outliers, most of the data falls in a linear regression curve. In fact,
these outliers represent the graphs with particularly small graph size (e.g. 6 or
8), which is too small compared with others, thus these graphs do not perform
the similar relation as other graphs do. Then we turn our attention to Fig.3(c),
which is the plot of heterogeneity index versus average commute time for the
directed graphs in the evolving sequence. The commute time here is computed
according to (28). For the tightly clustered points in the upper right-hand cor-
ner of the plot, there is again a clear linear relationship, which confirms our
theoretical prediction in (32).
Finally we explore the performance of directed graph characterization us-
ing the heterogeneity index. The histogram of the directed graph heterogeneity
index is shown in Fig.4. In the histogram the ’scale-free’ graphs are perfectly

0.3 0.3 0.4

Erdos−Renyi
0.2 0.2
commute time & normalized adjacency matrix

commute time & normalized adjacency matrix

commute time & normalized adjacency matrix

0.25 small−world
scale−free
0.1 0
0.2
0 −0.2

0.15 −0.1 −0.4

0.1 −0.2 −0.6

−0.3 −0.8
0.05
−0.4 −1

0
−0.5 −1.2

−0.05 −0.6 −1.4


0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.05 0.1 0.15 0.2 0.25 0.3 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4
heterogeneity index heterogeneity index heterogeneity index

(a) (b) (c)

Fig. 3. Relationship between Heterogeneity Index and Commute Time on undi-


rected/directed graphs

Erdos−Renyi
6 small−world
scale−free

5
frequency

0
0 0.1 0.2 0.3 0.4 0.5
heterogeneity index

Fig. 4. Directed graph characterization using Heterogeneity Index


Entropy and Heterogeneity Measures for Directed Graphs 233

separated from the Erdos-Renyi and ’small-world’ graphs. The result is not un-
expected since for ’scale-free’ graphs, the difference in the vertex in-degrees and
out-degrees is particularly large, and the heterogeneity index of such graphs is
greater than that for other types of graphs.

5 Conclusion
In this paper, motivated by the aim of developing novel and effective methods
for quantifying the structural complexity of directed graphs, first we have devel-
oped approximations of the von Neumann entropy for both strongly and weakly
directed graphs. They both depend on the vertex in-degree and out-degree statis-
tics. Our approximations are based on using Chung’s definition of normalized
Laplacian matrix of directed graphs and simplifying the calculation via replac-
ing the Shannon entropy by the quadratic entropy. Next, following the idea of
developing the heterogeneity index for undirected graphs proposed by Estrada
[10], we construct a similar measure which quantifies the heterogeneous proper-
ties of directed graphs. Moreover, concerning the commute time (or resistance
distance), we have found that on an undirected graph, the heterogeneity index
has a particular relation with it. Extending this correlation to directed graphs,
we have discovered that they also exhibit a similar behaviour, which shows that
the heterogeneity index can be approximated by the commute time and the nor-
malized adjacency matrix. Then, in order to evaluate these methods and analyze
their properties, we have undertaken some experiments on both undirected and
directed network data and the experimental outcomes have demonstrated the
effectiveness of our methods. In the future, our work can be extended by intro-
ducing more approaches to improving the measures we proposed in this paper
for representing the structural complexity for directed graphs, and developing
more novel indices which can reflect a directed graph’s structure based on the
entropy and heterogeneity index.

Acknowledgements. The authors are grateful to FAPESP (12/50986-7) and


the University of York for financial support.

References
1. Amancio, D.R., Oliveira Jr., O.N., Costa, [Link] F.: On the Concepts of Complex
Networks to Quantify the Difficulty in Finding the Way Out of Labyrinths. Physica
A 390, 4673–4683 (2011)
2. Antiqueira, L., Rodrigues, F.A., Costa, [Link] F.: Modeling Connectivity in Terms
of Network Activity. Journal of Statistical Mechanics: Theory and Experi-
ment 0905.4706 (2009)
3. Barabasi, A.L., Albert, R.: Emergence of Scaling in Random Networks. Science 286,
509–512 (1999)
4. Berwanger, D., Gradel, E., Kaiser, L., Rabinovich, R.: Entanglement and the Com-
plexity of Directed Graphs. Theoretical Computer Science 463, 2–25 (2012)
234 C. Ye et al.

5. Boley, D., Ranjan, G., Zhang, Z.: Commute Times for a Directed Graph Using an
Asymmetric Laplacian. Linear Algebra and Its Applications 435, 224–242 (2011)
6. Chung, F.: Laplacians and the Cheeger Inequailty for Directed Graphs. Annals of
Combinatorics 9, 1–19 (2005)
7. Costa, [Link] F., Rodrigues, F.A.: Seeking for Simplicity in Complex Networks.
Europhysics Letters 85, 48001 (2009)
8. Escolano, F., Hancock, E.R., Lozano, M.A.: Heat Diffusion: Thermodynamic Depth
Complexity of Networks. Physical Review E 85, 036206 (2012)
9. Escolano, F., Bonev, B., Hancock, E.R.: Heat Flow-Thermodynamic Depth Com-
plexity in Directed Networks. In: Gimel’farb, G., Hancock, E., Imiya, A., Kuijper,
A., Kudo, M., Omachi, S., Windeatt, T., Yamada, K. (eds.) SSPR&SPR 2012.
LNCS, vol. 7626, pp. 190–198. Springer, Heidelberg (2012)
10. Estrada, E.: Quantifying Network Heterogeneity. Physical Review E 82, 066102
(2010)
11. Franceschini, A., Szklarczyk, D., Frankild, S., Kuhn, M., Simonovic, M., Roth,
A., Lin, J., Minguez, P., Bork, P., von Mering, C., Jensen, L.J.: STRING v9.1:
protein-protein interaction networks, with increased coverage and integration. Nu-
cleic Acids Res. 41, D808–D815 (2013)
12. Han, L., Escolano, F., Hancock, E.R., Wilson, R.C.: Graph Characterizations from
Von Neumann Entropy. Pattern Recognition Letters 33, 1958–1967 (2012)
13. Passerini, F., Severini, S.: The Von Neumann Entropy of Networks. International
Journal of Agent Technologies and Systems, 58–67 (2008)
14. Qiu, H., Hancock, E.R.: Clustering and Embedding Using Commute Times. IEEE
Transactions on Pattern Analysis and Machine Intelligence 29, 1873–1890 (2007)
15. Ren, P., Wilson, R.C., Hancock, E.R.: Graph Characterization via Ihara Coeffi-
cients. IEEE Transactions on Neural Networks 22, 233–245 (2011)
16. Riis, S.: Graph Entropy, Network Coding and Guessing Games. The Computing
Research Repository 0711.4175 (2007)
17. von Luxburg, U., Radl, A., Hein, M.: Hitting and Commute Times in Large Graphs
are often Misleading. Data Structures and Algorithms 1003.1266 (2010)
18. Watts, D.J., Strogatz, S.H.: Collective Dynamics of ’Small-World’ Networks. Na-
ture 393, 440–442 (1998)
19. Xiao, B., Hancock, E.R., Wilson, R.C.: Graph Characteristics from the Heat Kernel
Trace. Pattern Recognition 42, 2589–2606 (2009)

You might also like