Entropy and Graphs
by
A thesis
presented to the University of Waterloo
in fulfillment of the
thesis requirement for the degree of
Master of Mathematics
in
Combinatorics and Optimization
I hereby declare that I am the sole author of this thesis. This is a true copy of the thesis,
including any required final revisions, as accepted by my examiners.
Abstract
The entropy of a graph is a functional depending both on the graph itself and on
a probability distribution on its vertex set. This graph functional originated from the
problem of source coding in information theory and was introduced by J. Körner in 1973.
Although the notion of graph entropy has its roots in information theory, it was proved to
be closely related to some classical and frequently studied graph theoretic concepts. For
example, it provides an equivalent definition for a graph to be perfect and it can also be
applied to obtain lower bounds in graph covering problems.
In this thesis, we review and investigate three equivalent definitions of graph entropy
and its basic properties. Minimum entropy colouring of a graph was proposed by N. Alon
in 1996. We study minimum entropy colouring and its relation to graph entropy. We also
discuss the relationship between the entropy and the fractional chromatic number of a
graph which was already established in the literature.
A graph G is called symmetric with respect to a functional FG (P ) defined on the set of
all the probability distributions on its vertex set if the distribution P ∗ maximizing FG (P ) is
uniform on V (G). Using the combinatorial definition of the entropy of a graph in terms of
its vertex packing polytope and the relationship between the graph entropy and fractional
chromatic number, we prove that vertex transitive graphs are symmetric with respect to
graph entropy. Furthermore, we show that a bipartite graph is symmetric with respect to
graph entropy if and only if it has a perfect matching. As a generalization of this result,
we characterize some classes of symmetric perfect graphs with respect to graph entropy.
Finally, we prove that the line graph of every bridgeless cubic graph is symmetric with
respect to graph entropy.
Acknowledgements
I would like to thank my advisor Chris Godsil for his guidance and support throughout
my graduate studies in the Department of Combinatorics and Optimization. I am also grateful
to the Department of Combinatorics and Optimization for providing me with a motivating
academic environment.
Table of Contents

List of Figures

1 Introduction

3 Graph Entropy
3.1 Entropy of a Convex Corner
3.2 Entropy of a Graph
3.3 Graph Entropy and Information Theory
3.4 Basic Properties of Graph Entropy
3.5 Entropy of Some Special Graphs
3.6 Graph Entropy and Fractional Chromatic Number
3.7 Probability Density Generators
3.8 Additivity and Sub-additivity
3.9 Perfect Graphs and Graph Entropy

4 Chromatic Entropy
4.1 Minimum Entropy Coloring
4.2 Entropy Comparisons
4.3 Number of Colours and Brooks' Theorem
4.4 Grundy Colouring and Minimum Entropy Colouring
4.5 Minimum Entropy Colouring and Kneser Graphs
4.6 Further Results

5 Symmetric Graphs
5.1 Symmetric Bipartite Graphs
5.2 Symmetric Perfect Graphs
5.3 Symmetric Line Graphs

6 Future Work
6.1 Normal Graphs
6.2 Lovász ϑ Function and Graph Entropy

Appendix A

Bibliography
List of Figures
Chapter 1
Introduction
Given two graphs F and G on the same vertex set V, their union F ∪ G is the graph on V defined by

V (F ∪ G) = V,
E (F ∪ G) = E (F ) ∪ E (G) .
The most important property of the entropy of a graph is that it is sub-additive with
respect to the union of graphs. This leads to the application of graph entropy for graph
covering problem as well as the problem of perfect hashing.
The graph covering problem can be described as follows. Given a graph G and a family
of graphs G where each graph Gi ∈ G has the same vertex set as G, we want to cover the
edge set of G with the minimum number of graphs from G. Using the sub-additivity of
graph entropy one can obtain lower bounds on this number.
Graph entropy was used in a paper by Fredman and Komlós to bound the minimum number of perfect hash functions of a given range that hash all k-element subsets of a set of a given size (see Fredman and Komlós [14]).
As another application of graph entropy, Kahn and Kim in [18] proposed a sorting
algorithm based on the entropy of an appropriate comparability graph.
In 1990, I. Csiszár, J. Körner, L. Lovász, K. Marton, and G. Simonyi characterized
minimal pairs of convex corners which generate a probability density p = (p1 , · · · , pk ) in
a k-dimensional space. Their study led to another definition of the graph entropy in terms
of the vertex packing polytope of the graph. They also gave another characterization of a
perfect graph using the sub-additivity property of graph entropy.
The sub-additivity property of the graph entropy was further studied in J. Körner [20],
J. Körner and G. Longo [22], J. Körner et al. [23], and J. Körner and K. Marton [24].
Their studies led to the notion of a class of graphs called normal graphs.
A set A consisting of some subsets of the vertices of a graph G is a covering, if every
vertex of G is contained in an element of A.
A graph G is called a normal graph, if it admits two coverings C and S such that every
element C of C induces a clique and every element S of S induces a co-clique, and the
intersection of any element of C and any element of S is nonempty, i.e.,

C ∩ S ≠ ∅, ∀C ∈ C, S ∈ S.
It turns out that one can consider normal graphs as a generalization of perfect graphs,
since every perfect graph is a normal graph (see J. Körner [20] and C. De Simone and J.
Körner [15]).
N. Alon and A. Orlitsky studied the problem of source coding in information
theory using the minimum entropy colouring of the characteristic graph associated with a
given information source. They investigated the relationship between the minimum entropy
colouring of a graph and the graph entropy (see N. Alon and A. Orlitsky [1]).
This thesis is organized as follows. In Chapter 2, we define the entropy of a random
variable. We also briefly investigate the application of entropy in counting problems. In
Chapter 3, we define the entropy of a graph. Let V P (G) be the vertex packing polytope of a given graph G, which is the convex hull of the characteristic vectors of its independent sets. Let |V (G)| = n and let P be a probability density on V (G). Then the entropy of G with respect to the probability density P is defined as

Hk (G, P ) = min_{a ∈ V P (G)} Σ_{i=1}^{n} pi log(1/ai ).
This is the definition of graph entropy which we work with throughout this thesis; it was given by I. Csiszár et al. in [9]. However, the original definition of graph entropy was
given by J. Körner [19] in the context of the source coding problem in information theory and is as follows.
Let G(n) be the n-th conormal power of the given graph G, with vertex set V (G(n) ) = V n and edge set

E (n) = {(x, y) ∈ V n × V n : ∃i : (xi , yi ) ∈ E}.

Furthermore, for ε > 0, let

T_ε^{(n)} = {U ⊆ V n : P n (U ) ≥ 1 − ε}.

Then J. Körner [19] defined the graph entropy as

H(G, P ) = lim_{n→∞} min_{U ∈ T_ε^{(n)}} (1/n) log χ(G(n) [U ]).    (1.1)
It is shown by I. Csiszár et al. in [9] that the above two definitions are equal. We also investigate the basic properties of graph entropy and explain the relationship between graph entropy, perfect graphs, and the fractional chromatic number of a graph.
Chapter 4 is devoted to minimum entropy colouring of a given graph and its connection to graph entropy. G. Simonyi in [36] showed that the maximum of the entropy of a given graph over the probability densities on its vertex set is equal to the logarithm of its fractional chromatic number. We call a graph symmetric with respect to graph entropy if the uniform density maximizes its entropy. We show that vertex transitive graphs are symmetric. In Chapter 5,
we study some other classes of graphs which are symmetric with respect to graph entropy.
Our main results are the following theorems.
Theorem. Let G be a bipartite graph with parts A and B, and no isolated vertices. Then
the uniform probability distribution U over the vertices of G maximizes Hk (G, P ) if and only
if G has a perfect matching.
A. Schrijver [34] calls a graph G a k-graph if it is k-regular and its fractional edge-colouring number χ′f (G) is equal to k. We show that
Theorem. Let G be a k-graph with k ≥ 3. Then the line graph of G is symmetric with
respect to graph entropy.
As a corollary to this result we show that the line graph of every bridgeless cubic graph
is symmetric with respect to graph entropy.
Chapter 2
In this chapter, we explain some probabilistic preliminaries such as the notions of prob-
ability spaces, random variables and the entropy of a random variable. Furthermore, we
give some applications of entropy methods in counting problems.
2.1 Probability Spaces

Let Ω be a finite set and let p : Ω → [0, 1] be a function with Σ_{ω∈Ω} p(ω) = 1. For every event A ∈ F, the probability of the event A is denoted by P (A), which is

P (A) = Σ_{ω∈A} p(ω).
Note that members of F are called measurable sets in measure theory; they are also called
events in a probability space.
On a finite set Ω, there is a natural probability measure P , called the (discrete) uniform measure on 2^Ω, which assigns probability 1/|Ω| to the singleton {ω} for each ω in Ω. Coin tossing gives us examples with |Ω| = 2^n , n = 1, 2, · · · . Another classical example is a fair die, a perfect cube which is thrown at random so that each of the six faces, marked with the integers 1 to 6, has equal probability 1/6 of coming up.
Probability spaces become more interesting when random variables are defined on them.
Let (S, S) be a measurable space. A function
X : Ω → S,
is called a measurable map from (Ω, F) to (S, S) if

X −1 (B) = {ω : X(ω) ∈ B} ∈ F, ∀B ∈ S.
If (S, S) = (R, R), the real valued function X defined on Ω is a random variable.
For a discrete probability space Ω any function X : Ω → R is a random variable. The
indicator function 1A (ω) of a set A ∈ F, defined as

1A (ω) = { 1, ω ∈ A;  0, ω ∉ A, }    (2.1)
is an example of a random variable. If X is a random variable, then X induces a probability measure on R, called its distribution, by setting
µ(A) = P (X ∈ A)
for sets A. Using the notation introduced above, the right-hand side can be written as
P (X −1 (A)). In words, we pull A ⊆ R back to X −1 (A) ∈ F and then take P of that set.
For a comprehensive study of probability spaces see R. M. Dudley [10] and Rick Durrett
[11].
In this thesis, we consider discrete random variables. Let X be a discrete random
variable with alphabet X and probability density function pX (x) = Pr{X = x}, x ∈ X .
For the sake of convenience, we use p(x) instead of pX (x). Thus, p(x) and p(y) refer to two
different random variables and are in fact different probability density functions, pX (x)
and pY (y), respectively.
2.2 Entropy of a Random Variable
Let X be a random variable with probability density p(x). We denote the expectation by E. The expected value of the random variable X is written

E (X) = Σ_{x∈X} x p(x),

and for a function g(·), the expected value of the random variable g(X) is written

Ep (g(X)) = Σ_{x∈X} g(x)p(x),

or more simply as E (g(X)) when the probability density function is understood from the context.

Let X be a random variable which is drawn according to the probability density function p(x). The entropy of X, denoted H(X), is defined as the expected value of the random variable log(1/p(X)); therefore, we have

H(X) = − Σ_{x∈X} p(x) log p(x).
The log is to the base 2 and entropy is expressed in bits. Furthermore, 0 log 0 = 0. Since 0 ≤ p(x) ≤ 1, we have log(1/p(x)) ≥ 0, which implies that H(X) ≥ 0. Let us recall our coin toss example where the coin is not necessarily fair. That is, denoting the event head by H and the event tail by T, let P (H) = p and P (T ) = 1 − p. Then the corresponding random variable X is defined as X(H) = 1 and X(T ) = 0. That is, we have

X = { 1, with probability Pr{X = 1} = p;  0, with probability Pr{X = 0} = 1 − p. }

Then,

H(X) = −p log p − (1 − p) log(1 − p).

Note that the maximum of H(X) is equal to 1, which is attained when p = 1/2. Thus, the entropy of a fair coin toss, i.e., P (H) = P (T ) = 1/2, is 1 bit. More generally, for any random variable X,

H(X) ≤ log |X |,    (2.2)

with equality if and only if X is uniformly distributed.
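To make the numbers concrete, here is a minimal Python sketch (our illustration, not part of the original development) that evaluates H(X) for a biased coin and confirms that the maximum of 1 bit is attained at p = 1/2; the function name entropy is ours.

import numpy as np

def entropy(p):
    """Shannon entropy (base 2) of a discrete distribution, with 0 log 0 = 0."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return float(-(nz * np.log2(nz)).sum())

# Entropy of a biased coin, H(p) = -p log p - (1-p) log(1-p):
for p in [0.1, 0.25, 0.5, 0.9]:
    print(p, entropy([p, 1 - p]))
# The maximum, 1 bit, is attained at the fair coin p = 1/2,
# matching H(X) <= log |X| with equality iff X is uniform.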
The joint entropy H(X, Y ) of a pair of discrete random variables (X, Y ) with a joint
probability density function p(x, y) is defined as
H (X, Y ) = − Σ_{x∈X} Σ_{y∈Y} p(x, y) log p(x, y).
We can also describe the conditional entropy H (Y |X) in terms of the conditional expectation of a random variable as follows:

H (Y |X) = Σ_{x∈X} p(x)H (Y |X = x)
         = − Σ_{x∈X} p(x) Σ_{y∈Y} p(y|x) log p(y|x)
         = − Σ_{x∈X} Σ_{y∈Y} p(x, y) log p(y|x)
         = −E log p(Y |X).
The following theorem is proved by T. Cover and J. Thomas in [8] pages 17 and 18.
2.2.1 Theorem. Let X, Y , and Z be random variables with joint probability distribution
p(x, y, z). Then we have
H (X, Y ) = H (X) + H (Y |X) ,
H (X, Y |Z) = H (X|Z) + H (Y |X, Z) .
Furthermore, letting f (.) be any function (see T. Cover and J. Thomas [8] pages 34
and 35), we have
0 ≤ H(X|Y ) ≤ H(X|f (Y )) ≤ H(X).
2.3 Relative Entropy and Mutual Information
Let X be a random variable and consider two different probability density functions p(x)
and q(x) for X. The relative entropy D(p||q) is a measure of the distance between two
distributions p(x) and q(x). The relative entropy or Kullback-Leibler distance between two
probability densities p(x) and q(x) is defined as
D(p||q) = Σ_{x∈X} p(x) log ( p(x)/q(x) ) ,    (2.4)
The mutual information I(X; Y ) between two random variables X and Y with joint probability density p(x, y) is defined as

I(X; Y ) = Σ_{x∈X} Σ_{y∈Y} p(x, y) log ( p(x, y) / (p(x)p(y)) ) = D(p(x, y)||p(x)p(y)).
It is proved in T. Cover and J. Thomas [8], on pages 28 and 29, that D(p||q) ≥ 0, with equality if and only if p = q, and consequently I(X; Y ) ≥ 0.
2.4.1 Lemma. (Shearer’s Lemma). Suppose n distinct points in R3 have n1 distinct projections on the XY -plane, n2 distinct projections on the XZ-plane, and n3 distinct projections on the Y Z-plane. Then, n^2 ≤ n1 n2 n3 .
Proof. Let P = (A, B, C) be one of the n points picked at random with the uniform distribution, and let P1 = (A, B), P2 = (A, C), and P3 = (B, C) be its three projections. Then we have

H (P ) = H (A) + H (B|A) + H (C|A, B).    (2.6)

Furthermore,

H (P1 ) = H (A) + H (B|A) ,
H (P2 ) = H (A) + H (C|A) ,
H (P3 ) = H (B) + H (C|B) .

Adding both sides of these equations and considering (2.6), we have 2H (P ) ≤ H (P1 ) + H (P2 ) + H (P3 ). Now, noting that H (P ) = log n and H (Pi ) ≤ log ni , the lemma is proved.
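As a sanity check of the lemma, the following short Python sketch compares n^2 with n1 n2 n3 for a randomly generated point set; the point set and all names here are our own illustration.

import random

def shearer_check(points):
    """Compare n^2 with n1*n2*n3 for a finite 3D point set (Lemma 2.4.1)."""
    pts = set(points)
    n = len(pts)
    n1 = len({(x, y) for x, y, z in pts})  # XY projections
    n2 = len({(x, z) for x, y, z in pts})  # XZ projections
    n3 = len({(y, z) for x, y, z in pts})  # YZ projections
    return n * n, n1 * n2 * n3

random.seed(0)
pts = [(random.randrange(4), random.randrange(4), random.randrange(4))
       for _ in range(30)]
lhs, rhs = shearer_check(pts)
print(lhs, "<=", rhs, lhs <= rhs)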
As another application of the entropy method, we can give an upper bound on the
number of the perfect matchings of a bipartite graph (see J. Radhakrishnan [33]).
2.4.2 Theorem. (Brégman’s Theorem). Let G be a bipartite graph with parts V1 and V2
such that |V1 | = |V2 | = n. Let d(v) denote the degree of a vertex v in G. Then, the number
of perfect matchings in G is at most

Π_{v∈V1} (d(v)!)^{1/d(v)} .
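Before the proof, a quick numerical illustration may help. The following Python sketch brute-forces the number of perfect matchings of two small bipartite graphs and compares it with Brégman's bound; the graphs chosen (K_{3,3} and a 6-cycle) are our own examples, not the thesis's.

from itertools import permutations
from math import factorial

def perfect_matchings(adj):
    """Count perfect matchings of a bipartite graph; adj[i] holds the
    neighbours of left vertex i among right vertices 0..n-1 (brute force)."""
    n = len(adj)
    return sum(all(p[i] in adj[i] for i in range(n))
               for p in permutations(range(n)))

def bregman_bound(adj):
    """Bregman's bound: product over left vertices of (d(v)!)^(1/d(v))."""
    b = 1.0
    for nbrs in adj:
        d = len(nbrs)
        b *= factorial(d) ** (1.0 / d)
    return b

k33 = [{0, 1, 2}] * 3           # complete bipartite K_{3,3}
c6  = [{0, 1}, {1, 2}, {2, 0}]  # a 6-cycle viewed as a bipartite graph
for g in (k33, c6):
    print(perfect_matchings(g), "<=", bregman_bound(g))
# K_{3,3}: 6 <= 6.0 (the bound is tight); C6: 2 <= 2.828...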
Proof. Let X be the set of perfect matchings of G. Let X be a random variable corre-
sponding to the elements of X with uniform density. Then
H(X) = log |X |.
The following remark is useful in our discussion. Let Y be any random variable with the
set of possible values Y. First note that the conditional entropy H (Y |X) is obtained using
(2.3). Let Yx denote the set of possible values for the random variable Y given x ∈ X ,
that is
Yx = {y ∈ Y : P (Y = y|X = x) > 0}.
We partition the set X into sets X1 , X2 , · · · , Xr such that for i = 1, 2, · · · , r and all x ∈ Xi ,
we have
|Yx | = i. (2.7)
Letting Yx be a random variable taking its value on the set Yx with uniform density, and
noting equations (2.2) and (2.7) for all x ∈ Xi we have
log |X | = H(X) = H (X(v1 )) + H (X(v2 )|X(v1 )) + · · · + H (X(vn )|X(v1 ), · · · , X(vn−1 )) .    (2.11)

For any permutation τ of {1, · · · , n}, the chain rule likewise gives

H(X) = H (X(τ (1))) + H (X(τ (2))|X(τ (1))) + · · · + H (X(τ (n))|X(τ (1)), · · · , X(τ (n − 1))) ,

and averaging over a uniformly random permutation τ ,

H(X) = Eτ ( H (X(τ (1))) + H (X(τ (2))|X(τ (1))) + · · · + H (X(τ (n))|X(τ (1)), · · · , X(τ (n − 1))) ) .
For a fixed τ , fix v ∈ V1 and let k = τ −1 (v). Then we let Yv,τ be the set of vertices u in
V2 which are adjacent to vertex v ∈ V1 and
u∈
/ {x (τ (1)) , x (τ (2)) , · · · , x (τ (k − 1))}
Letting N (v) be the set of neighbours of v ∈ V1 in V2 , we have
Letting d(v) be the degree of vertex v, and letting |Yv,τ | be a random variable taking its value in {1, · · · , d(v)}, we have
Chapter 3
Graph Entropy
In this chapter, we introduce and study the entropy of a graph which was defined in [19] by
J. Körner in 1973. We present several equivalent definitions of this parameter. However,
we will focus mostly on the combinatorial definition which is going to be the main theme
of this thesis.
3.1 Entropy of a Convex Corner

3.1.1 Remark. Note that the function a ↦ − Σ_{i=1}^{k} pi log ai in the definition of the entropy of a convex corner is convex, tends to infinity at the boundary of the non-negative orthant, and tends monotonically to −∞ along the rays from the origin.
Consider the convex corner S := {x ≥ 0 : Σ_i xi ≤ 1}, which is called the unit corner. The
following lemma relates the entropy of a random variable defined in the previous chapter
to the entropy of the unit corner.
3.1.1 Lemma. The entropy HS (P ) of a probability density P with respect to the unit corner S is just the regular (Shannon) entropy H (P ) = − Σ_i pi log pi .
Thus the above minimum is attained by a probability density vector s. More precisely, we
have
HS (p) = D(p||s) + H(p).
Noting that D(p||s) ≥ 0 and D(p||s) = 0 if and only if s = p, we get
HS (p) = H(p).
There is another way to obtain the entropy of a convex corner. Consider the mapping Λ : int R^n_+ → R^n defined by

(Λ(a))_i := log(1/ai ), i = 1, · · · , n.
It is easy to see using the concavity of the log function that if A is a convex corner,
then Λ(A) is a closed, convex, full-dimensional set, which is up-monotone, i.e., a ∈ Λ(A)
and a′ ≥ a imply a′ ∈ Λ(A). Moreover, HA (P ) is the minimum of the linear objective function Σ_i pi xi over Λ(A). Now we have the following lemma (see [9]).
3.1.2 Lemma. (I. Csiszár, J. Körner, L. Lovász, K. Marton, and G. Simonyi). For two
convex corners A, C ⊆ Rk+ , we have HA (P ) ≥ HC (P ) for all P if and only if A ⊆ C.
Proof. The “if” part is obvious. Assume that HC (P ) ≤ HA (P ) for all P . As remarked
above, we have
HA (P ) = min{P T x : x ∈ Λ(A)},
and hence it follows that we must have Λ(A) ⊆ Λ(C). This clearly implies A ⊆ C.
Then we have the following corollary.
3.1.3 Corollary. We have 0 ≤ HA (P ) ≤ H(P ) for every probability distribution P if and
only if A contains the unit corner and is contained in the unit cube.
3.2 Entropy of a Graph
Let G be a graph on vertex set V (G) = {1, · · · , n}, let P = (p1 , · · · , pn ) be a probability
density on V (G), and let V P (G) denote the vertex packing polytope of G. The entropy of
G with respect to P is then defined as
Hk (G, P ) = min_{a ∈ V P (G)} Σ_{i=1}^{n} pi log(1/ai ).
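The definition above is a convex minimization over the polytope V P (G), so it can be evaluated numerically. The following Python sketch (our illustration, assuming SciPy is available; the function name graph_entropy is ours) parameterizes a ∈ V P (G) by convex weights over the independent sets and minimizes Σ pi log(1/ai ); for the 5-cycle with the uniform density it returns approximately log2 (5/2) ≈ 1.3219, in agreement with Corollary 3.6.2 below.

import numpy as np
from itertools import combinations
from scipy.optimize import minimize

def graph_entropy(n, edges, p):
    """H_k(G,P) = min_{a in VP(G)} sum_i p_i log2(1/a_i), computed by
    writing a = B @ lam as a convex combination of independent sets."""
    E = {frozenset(e) for e in edges}
    ind_sets = [S for r in range(1, n + 1) for S in combinations(range(n), r)
                if not any(e <= set(S) for e in E)]
    B = np.zeros((n, len(ind_sets)))
    for j, S in enumerate(ind_sets):
        B[list(S), j] = 1.0
    p = np.asarray(p, float)

    def f(lam):
        a = B @ lam
        return -(p * np.log2(np.clip(a, 1e-12, None))).sum()

    lam0 = np.full(len(ind_sets), 1.0 / len(ind_sets))
    res = minimize(f, lam0, method="SLSQP",
                   bounds=[(0, 1)] * len(ind_sets),
                   constraints=[{"type": "eq", "fun": lambda l: l.sum() - 1}])
    return res.fun

c5 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
print(graph_entropy(5, c5, [0.2] * 5))  # ~ log2(5/2) = 1.3219...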
Let G = (V, E) be a graph with vertex set V and edge set E. Let V n be the set of sequences
of length n from V . Then the graph G(n) = (V n , E (n) ) is the n-th conormal power of G. Two distinct vertices x and y of G(n) are adjacent in G(n) if there is some i ∈ {1, · · · , n} such that xi and yi are adjacent in G, that is,

E (n) = {(x, y) ∈ V n × V n : ∃i : (xi , yi ) ∈ E}.
Let X and Y be two discrete random variables taking their values on some (possibly
different) finite sets and consider the random variable formed by the pair (X, Y ).
Now let X denote a random variable taking its values on the vertex set of G and Y be a
random variable taking its values on the independent sets of G. Having a fixed distribution
P over the vertices, the set of feasible joint distributions Q consists of the joint distributions
Q of X and Y such that

Σ_{y∈Y} Q(X, Y = y) = P (X).
Example. Consider the 5-cycle C5 with vertex set

V (C5 ) = {x1 , x2 , x3 , x4 , x5 },
and let Y denote the set of independent sets of G. Let P be the uniform distribution over
the vertices of G, i.e.,
P (X = xi ) = 1/5, ∀i ∈ {1, · · · , 5}.
Noting that each vertex of C5 lies in two maximal independent sets, we define the joint
distribution Q as
Q(X = x, Y = y) = { 1/10, if y is maximal and x ∈ y;  0, otherwise. }    (3.2)
Proof. First, we show that Hk (G, P ) = H ′ (G, P ). Let X be a random variable taking its
values on the vertices of G with probability density P = (p1 , · · · , pn ). Furthermore, let Y
be the random variable associated with the independent sets of G and F(G) be the family
of independent sets of G. Let q be the conditional distribution of Y which achieves the
minimum in (3.3) and r be the corresponding distribution of Y . Then we have
H ′ (G, P ) = I (X; Y ) = − Σ_i pi Σ_{F ∈F (G): i∈F} q (F |i) log ( r(F ) / q(F |i) ) .
Define a by ai := Σ_{F ∈F (G): i∈F} r(F ), and note that a ∈ V P (G). Hence,

H ′ (G, P ) ≥ − Σ_i pi log ai ,

and consequently,

H ′ (G, P ) ≥ Hk (G, P ) .
Now we prove the reverse inequality. Let a ∈ V P (G). Then there is a probability density s on F (G) such that

ai = Σ_{F ∈F (G): i∈F} s(F ).
Thus,

− Σ_{i,F} pi q(F |i) log r(F ) ≤ − Σ_{i,F} pi q(F |i) log s(F ),

and therefore,

H ′ (G, P ) ≤ Σ_{i,F} pi q(F |i) log ( q(F |i) / s(F ) ) = − Σ_i pi log ai .
3.2.2 Lemma. (J. Körner). For every graph G we have H ′ (G, P ) = H (G, P ).
[Figure 3.1: the 5-cycle C5 with vertices x1 , · · · , x5 .]
3.3 Graph Entropy and Information Theory

Consider an information source with alphabet X = {x1 , x2 , x3 , x4 , x5 }, and let G be its characteristic graph, with V (G) = X . Furthermore, two vertices of G are adjacent if and only if the corresponding elements of X are distinguishable. As an example, one can think of the 5-cycle of Figure 3.1 as the
characteristic graph of an information source X . In the source coding problem, our goal is to label the vertices of the characteristic graph with a minimum number of labels so that we can recover the elements of a given alphabet in a unique way. This means that we should colour the vertices of the graph properly with a minimum number of colours. More precisely,
one way of encoding the elements of the source alphabet X in Figure 3.1 is
{x1 , x3 } → red,
{x2 , x4 } → blue,
{x5 } → green.
(3.5)
Now, let X be a random variable taking its values from X with the following probability
density
P (X = xi ) = pi , ∀i ∈ {1, · · · , 5}.
Now consider the graph G(n) , and let ε > 0. Then, neglecting vertices of G(n) having a total probability less than ε, the encoding of the vertices of G(n) essentially becomes the colouring of a sufficiently large subgraph of G(n) . Therefore, the minimum number of codewords is

min_{U ∈ T_ε^{(n)}} χ(G(n) [U ]).
Taking logarithm of the above quantity, normalizing it by n, and making n very large,
we get the minimum number of required information bits which is the same as the graph
entropy of G. The characteristic graph of a regular source where distinct elements of the
source alphabet are distinguishable is a complete graph. We will see in section 3.5 that
the entropy of a complete graph is the same as the entropy of a random variable.
3.4 Basic Properties of Graph Entropy

3.4.1 Lemma. Let F and G be graphs on the same vertex set with E(F ) ⊆ E(G), and let P be a probability density on their common vertex set. Then Hk (F, P ) ≤ Hk (G, P ).

Proof. For graphs F and G mentioned above, we have V P (G) ⊆ V P (F ). This immediately
implies the statement by the definition of graph entropy.
The sub-additivity was first recognized by Körner in [21] and he proved the following
lemma.
3.4.2 Lemma. (J. Körner). Let F and G be two graphs on the same vertex set V and
F ∪ G denote the graph on V with edge set E(F ) ∪ E(G). For any fixed probability density
P we have
Hk (F ∪ G, P ) ≤ Hk (F, P ) + Hk (G, P ) .
Proof. Let a ∈ V P (F ) and b ∈ V P (G) be the vectors achieving the minima in the
definition of graph entropy for Hk (F, P ) and Hk (G, P ), respectively. Notice the vector
a ◦ b = (a1 b1 , a2 b2 , · · · , an bn ) is in V P (F ∪ G), simply because the intersection of a stable
set of F with a stable set of G is always a stable set in F ∪ G. Hence, we have
[Figure 3.2: (a) a 5-cycle G with vertices u1 , · · · , u5 ; (b) a triangle F with vertices v1 , v2 , v3 ; (c) the graph Gu1 ←F .]
Hk (F, P ) + Hk (G, P ) = Σ_{i=1}^{n} pi log(1/ai ) + Σ_{i=1}^{n} pi log(1/bi )
                       = Σ_{i=1}^{n} pi log(1/(ai bi ))
                       ≥ Hk (F ∪ G, P ) .
The notion of substitution is defined as follows. Let F and G be two vertex disjoint
graphs and let v be a vertex of G. By substituting F for v we mean deleting v and joining every vertex of F to those vertices of G which were adjacent to v. We denote the resulting graph by Gv←F . We extend this concept also to distributions. If we are given
a probability distribution P on V (G) and a probability distribution Q on V (F ) then by
Pv←Q we denote the distribution on V (Gv←F ) given by Pv←Q (x) = P (x) if x ∈ V (G) \ v
and Pv←Q (x) = P (v)Q(x) if x ∈ V (F ). This operation is illustrated in Figure 3.2.
Now we state the following lemma, whose proof can be found in J. Körner et al. [23].
3.4.3 Lemma. (J. Körner, G. Simonyi, and Zs. Tuza). Let F and G be two vertex disjoint
graphs, v a vertex of G, while P and Q are probability distributions on V (G) and V (F ),
respectively. Then we have

Hk (Gv←F , Pv←Q ) = Hk (G, P ) + P (v)Hk (F, Q) .
Notice that the entropy of an empty graph (a graph with no edges) is always zero
(regardless of the distribution on its vertices). Noting this fact, we have the following
corollary as a consequence of Lemma 3.4.3.
3.4.4 Corollary. Let the connected components of the graph G be the subgraphs Gi and P
be a probability distribution on V (G). Set Pi (x) := P (x)/P (V (Gi )) for x ∈ V (Gi ). Then

Hk (G, P ) = Σ_i P (V (Gi )) Hk (Gi , Pi ) .
Proof. Consider the empty graph on as many vertices as the number of connected compo-
nents of G. Let a distribution be given on its vertices by P (V (Gi )) being the probability
of the vertex corresponding to the ith component of G. Now substituting each vertex by
the component it belongs to and applying Lemma 3.4.3 the statement follows.
3.5 Entropy of Some Special Graphs

3.5.1 Lemma. For the complete graph Kn on n vertices and any probability density P ,

Hk (Kn , P ) = H(P ).
Proof. By the definition of graph entropy, Hk (Kn , P ) has the form Σ_{i=1}^{n} pi log(1/qi ), where q = (q1 , · · · , qn ) ranges over V P (Kn ), i.e., over all probability densities on V (Kn ).
Proof. The statement follows from Lemma 3.4.3 and substituting stable sets of size
m1 , m2 , · · · , mk for the vertices of Kk .
A special case of the above lemma is that the entropy of a complete bipartite graph whose two colour classes carry equal probability is equal to 1. Now, let G be a bipartite graph with colour classes A and B. For a set D ⊆ A, let N (D) denote the set of neighbours of D in B, that is, the subset of the vertices in B which are adjacent to a vertex in D.
Given a distribution P on V (G), we write

P (D) = Σ_{i∈D} pi , ∀D ⊆ V (G).

3.5.3 Theorem. Let G be a bipartite graph with colour classes A and B, and let P be a probability distribution on V (G). If for every D ⊆ A,

P (D)/P (A) ≤ P (N (D))/P (B),

then there exists a partition A = D1 ∪ · · · ∪ Dk and a partition B = U1 ∪ · · · ∪ Uk such that

Hk (G, P ) = Σ_{i=1}^{k} P (Di ∪ Ui ) h ( P (Di ) / P (Di ∪ Ui ) ) ,

where h(t) = −t log t − (1 − t) log(1 − t) denotes the binary entropy function.
Proof. Let us assume the condition in the theorem statement holds. Then, using the max-flow min-cut theorem (see A. Schrijver [34], page 150), we show that there exists a probability density Q on the edges of G such that for all vertices v ∈ A,

Σ_{e∈E(G): v∈e} Q(e) = p(v)/P (A),    (3.6)

and, similarly, for all vertices u ∈ B, Σ_{e∈E(G): u∈e} Q(e) = p(u)/P (B).
We define a digraph D′ by

V (D′ ) = V (G) ∪ {s, t},

joining vertices s and t to all vertices in parts A and B, respectively. The edges between A and B are exactly the same as the edges of G. Furthermore, we orient the edges from s toward A, from A toward B, and from B to t. We define a capacity function c : E(D′ ) → R+ as

c(e) = { p(v)/P (A), e = (s, v), v ∈ A;  1, e = (v, u), v ∈ A and u ∈ B;  p(u)/P (B), e = (u, t), u ∈ B. }    (3.7)
By the definition of c, we note that the maximum st-flow is at most 1. Now, by showing
that the minimum capacity of an st-cut is at least 1, we are done.
Let δ(U ) be an st-cut for some subset U = {s} ∪ A′ ∪ B ′ of V (D′ ) with A′ ⊆ A and B ′ ⊆ B. If N (A′ ) ⊈ B ′ , then c (δ(U )) ≥ 1, since δ(U ) contains an edge of capacity 1. So suppose that N (A′ ) ⊆ B ′ .
Then, using the assumption

P (A′ )/P (A) ≤ P (N (A′ ))/P (B),

we get

c (δ(U )) ≥ P (B ′ )/P (B) + P (A \ A′ )/P (A)
          ≥ P (A′ )/P (A) + P (A \ A′ )/P (A) = 1.    (3.8)
Now, we define the vector b ∈ R^{|V (G)|}_+ as follows:

bv := { p(v)/P (A), v ∈ A;  p(v)/P (B), v ∈ B. }

Then, using (3.6), we have b ∈ V P (G). Thus,

Hk (G, P ) ≤ Σ_{v∈V (G)} p(v) log(1/bv ) = H(P ) − h (P (A)) .
is maximal. Now consider the subgraph induced on (A \ D1 ) ∪ (B \ N (D1 )) and for i = 2, · · · , k let

Di ⊆ A \ ∪_{j=1}^{i−1} Dj

be chosen such that

( P (Di ) / P (A \ ∪_{j=1}^{i−1} Dj ) ) · ( P (B \ ∪_{j=1}^{i−1} N (Dj )) / P (N (Di )) )

is maximal. Let us set

Ui = N (Di ) \ N (D1 ∪ · · · ∪ Di−1 ), for i = 1, · · · , k.
Consider the independent sets J0 , · · · , Jk of the following form:

J0 = B, J1 = D1 ∪ (B \ U1 ), · · · , Ji = D1 ∪ · · · ∪ Di ∪ (B \ (U1 ∪ · · · ∪ Ui )), · · · , Jk = A.
Set

α(J0 ) = P (U1 ) / P (U1 ∪ D1 ),
α(Ji ) = P (Ui+1 ) / P (Ui+1 ∪ Di+1 ) − P (Ui ) / P (Ui ∪ Di ), for i = 1, · · · , k − 1,
α(Jk ) = 1 − P (Uk ) / P (Uk ∪ Dk ).
Note that by the choice of Di ’s, all α(Ji )’s are non-negative and add up to one. This
implies that the vector a ∈ R^{|V (G)|}_+ defined as

aj = Σ_{r : j∈Jr} α(Jr ), ∀j ∈ V (G),

is in V P (G). Furthermore,

aj = { P (Di ) / P (Di ∪ Ui ), j ∈ Di ;  P (Ui ) / P (Di ∪ Ui ), j ∈ Ui . }
By the choice of the Dj ’s and using the same max-flow min-cut argument we had, there
exists a probability density Qi on the edges of G[Di ∪ Ui ] such that

b′j = Σ_{e∈E(G[Di ∪Ui ]): j∈e} Qi (e) = pj / P (Di ), ∀j ∈ Di ,
b′j = Σ_{e∈E(G[Di ∪Ui ]): j∈e} Qi (e) = pj / P (Ui ), ∀j ∈ Ui .
Now we define the probability density Q on the edges of G as follows:

Q(e) = { P (Di ∪ Ui )Qi (e), e ∈ E (G[Di ∪ Ui ]) ;  0, otherwise. }

The corresponding vector b ∈ V P (G) is given by

the sub-additivity of graph entropy is violated. Now, it can be verified that Hk (G, P ) is equal to the expression stated in the theorem.
3.6 Graph Entropy and Fractional Chromatic Number

3.6.1 Lemma. (G. Simonyi [36]). For every graph G,

max_P Hk (G, P ) = log χf (G).

Proof. Note that for every graph G we have (1/χf (G), · · · , 1/χf (G)) ∈ V P (G). Thus for every probability density P , we have

Hk (G, P ) ≤ log χf (G).

Now, from the definition of the fractional chromatic number we deduce that graph G has an induced subgraph G′ with χf (G′ ) = χf (G) = χf such that for all y ∈ V P (G′ ),

y ≥ (1/χf , · · · , 1/χf ) implies y = (1/χf , · · · , 1/χf ).
Now, by the above remark from I. Csiszár et al. [9], there exists a probability density P ′ on V (G′ ) such that Hk (G′ , P ′ ) = log χf . Extending P ′ to a probability distribution P as

pi = { p′i , i ∈ V (G′ );  0, i ∈ V (G) \ V (G′ ), }    (3.11)

the lemma is proved.
Now there is a natural question of uniqueness of the probability density which is a max-
imizer in the above lemma. Using the above lemma we compute the fractional chromatic
number of a vertex transitive graph in the following corollary.
3.6.2 Corollary. Let G be a vertex transitive graph with |V (G)| = n, and let α(G) denote the maximum size of a coclique of G. Then

χf (G) = n/α(G).
Proof. First note that since G is a vertex transitive graph, there exists a family of cocliques
S1 , · · · , Sb of size α(G) that cover the vertex set of G, i.e., V (G) uniformly. That is each
vertex of G lies in exactly r of these cocliques, for some constant r. Thus we have
bα(G) = nr, (3.12)
Now, we define a fractional coloring f as follows:

fi = { 1/r, i ∈ {1, · · · , b};  0, otherwise. }    (3.13)

Thus, from the definition of the fractional chromatic number of a graph, (3.12), and (3.13), we have

log χf (G) ≤ log Σ_i fi = log(b/r) = log(n/α(G)).    (3.14)
Now suppose that the probability density u on the vertex set V (G) is uniform, and let B be the 01-matrix whose columns are the characteristic vectors of the independent sets in G. Then

V P (G) = {x ∈ R^n_+ : x = Bλ, Σ_i λi = 1, λi ≥ 0, ∀i}.

Now, using the Karush-Kuhn-Tucker conditions for our convex optimization problem (see S. Boyd and L. Vandenberghe [4]), we get the optimality conditions (3.15). Then, considering the co-clique cover {S1 , · · · , Sb } above with |Si | = α(G) for all i, one can verify that λ∗ defined as

λ∗i = { α(G)/(nr), i ∈ {1, · · · , b};  0, otherwise, }    (3.16)

is an optimum solution to our minimization problem, since setting γi = 0 for i ∈ S \ {1, · · · , b} along with λ∗ gives a solution to (3.15). Substituting λ∗ into g(λ) yields

Hk (G, U ) = log(n/α(G)).
Using (3.14) and Lemma 3.6.1, the corollary is proved.
The above corollary implies that the uniform probability density is a maximizer for
Hk (G, P ) for a vertex transitive graph. We will give another proof of this fact at the end
of the next chapter using chromatic entropy.
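Corollary 3.6.2 can also be checked computationally. The sketch below (illustrative Python of our own; the LP formulation of χf over independent sets is standard, but the code and the name frac_chromatic are ours) computes χf (G) as a linear program and evaluates log2 χf (C5 ) = log2 (5/2) ≈ 1.3219, the value of Hk (C5 , U ) for this vertex transitive graph.

import numpy as np
from itertools import combinations
from scipy.optimize import linprog

def frac_chromatic(n, edges):
    """chi_f(G) via the LP  min sum_S f_S  s.t.  sum_{S : v in S} f_S >= 1,
    f >= 0, where S ranges over the independent sets of G."""
    E = [set(e) for e in edges]
    ind = [set(S) for r in range(1, n + 1) for S in combinations(range(n), r)
           if not any(e <= set(S) for e in E)]
    A = np.array([[-1.0 if v in S else 0.0 for S in ind] for v in range(n)])
    res = linprog(c=np.ones(len(ind)), A_ub=A, b_ub=-np.ones(n),
                  bounds=(0, None), method="highs")
    return res.fun

c5 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
print(frac_chromatic(5, c5))            # 2.5
print(np.log2(frac_chromatic(5, c5)))   # 1.3219... = H_k(C5, U)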
We have also the following corollary.
3.6.3 Corollary. For any graph G and probability density P , we have

Hk (G, P ) ≤ log χ(G).

Equality holds if χ(G) = χf (G) and P maximizes the left-hand side above.
Note that (3.1), Lemma 3.2.1, Lemma 3.2.2, and the sub-multiplicative nature of the chromatic number also yield the above corollary.
3.7 Probability Density Generators

For vectors a, b ∈ R^k_+, their Schur product a ◦ b is defined by

(a ◦ b)i = ai bi , i = 1, · · · , k,

and for sets A, B ⊆ R^k_+,

A ◦ B = {a ◦ b : a ∈ A, b ∈ B}.

We say a pair of sets A, B ⊆ R^k_+ is a generating pair if every probability density vector p ∈ R^k_+ can be represented as the Schur product of an element of A and an element of B, i.e.,

p = a ◦ b, a ∈ A, b ∈ B.
In this section we characterize a pair of generating convex corners. First, we recall the
definition of the antiblocker of a convex corner (see D. R. Fulkerson [16]). The antiblocker
of a convex corner A is defined as

A∗ := {b ∈ R^n_+ : b^T a ≤ 1, ∀a ∈ A},
3.7.1 Lemma. (I. Csiszár et al.). Let A, B ⊆ R^n_+ be convex corners and p ∈ R^n_+ a
probability density. Then
(i) If p = a ◦ b for some a ∈ A and b ∈ B, then
H(p) ≥ HA (p) + HB (p),
with equality if and only if a and b achieve HA (p) and HB (p).
(ii) If B ⊆ A∗ then
H(p) ≤ HA (p) + HB (p),

with equality if and only if p = a ◦ b for some a ∈ A and b ∈ B.
3.8 Additivity and Sub-additivity
If a ∈ R^k_+ and b ∈ R^l_+, then their Kronecker product a ⊗ b ∈ R^{kl}_+ is defined by

(a ⊗ b)ij = ai bj , i = 1, · · · , k, j = 1, · · · , l.

Note that if p and q are probability distributions, then p ⊗ q is the usual product distribution. If k = l, then also the Schur product a ◦ b ∈ R^k_+ is defined by

(a ◦ b)i = ai bi , i = 1, · · · , k.
Let A ⊆ R^k_+ and B ⊆ R^l_+ be convex corners. Their Kronecker product A ⊗ B ⊆ R^{kl}_+ is the convex corner spanned by the Kronecker products a ⊗ b such that a ∈ A and b ∈ B. The Schur product A ◦ B of the convex corners A, B ⊆ R^k_+ is the convex corner in that same space spanned by the vectors a ◦ b such that a ∈ A and b ∈ B. Thus

A ◦ B = Convex Hull of {a ◦ b : a ∈ A, b ∈ B} .
I. Csiszár et al. proved the following lemma and theorem in [9].

3.8.1 Lemma. (I. Csiszár et al.). Let A, B ⊆ R^k_+ be convex corners. The pair (A, B) is
an antiblocking pair if and only if
H(p) = HA (p) + HB (p)
for every probability distribution p ∈ Rk+ .
3.8.2 Theorem. (I. Csiszár et al.). Let A ⊆ R^k_+ and B ⊆ R^l_+ be convex corners, and p ∈ R^k_+, q ∈ R^l_+ probability distributions. Then, we have

HA⊗B (p ⊗ q) = HA (p) + HB (q) = H(A∗ ⊗B∗ )∗ (p ⊗ q).

Furthermore, for convex corners A, B ⊆ R^k_+ and a probability distribution p ∈ R^k_+, we have

HA◦B (p) ≤ HA (p) + HB (p).
Hence HA⊗B (p ⊗ q) ≤ HA (p) + HB (q). By Lemma 3.8.1,

H (p ⊗ q) = HA⊗B (p ⊗ q) + H(A⊗B)∗ (p ⊗ q) ,

and consequently,
3.9 Perfect Graphs and Graph Entropy

We defined the vertex packing polytope V P (G) of a graph in the previous sections.
Here, we need another important notion from graph theory, i.e., the fractional vertex packing polytope of a graph G, defined as

F V P (G) = {b ∈ R^{|V |}_+ : Σ_{i∈K} bi ≤ 1 for all cliques K of G}.

It is easy to see that, similar to V P (G), the fractional vertex packing polytope F V P (G) is also a convex corner, and V P (G) ⊆ F V P (G) for every graph G. Equality holds here if and only if the graph is perfect (see V. Chvátal [7] and D. R. Fulkerson [16]). Also note that

F V P (G) = (V P (Ḡ))∗ .
3.9.1 Lemma. (I. Csiszár et al.). Let S = {x ≥ 0 : Σ_i xi ≤ 1}. Then we have

S ⊆ V P (G) ◦ F V P (Ḡ).

Furthermore,

V P (G) ◦ V P (Ḡ) ⊆ F V P (G) ◦ F V P (Ḡ).
The following theorem was first conjectured by Körner and Marton in [25] and was proved by I. Csiszár et al. in [9].
3.9.3 Theorem. (I. Csiszár and et. al.). A graph is strongly splitting if and only if it is
perfect.
Proof. By Lemmas 3.1.2 and 3.9.2, G is strongly splitting if and only if V P (G) = F V P (G).
This is equivalent to the perfectness of G.
Let G = (V, E) be a graph with vertex set V and edge set E. The graph
G[n] = (V n , E [n] )
is the n-th normal power where V n is the set of sequences of length n from V , and two
distinct vertices x and y are adjacent in G[n] if all of their entries are adjacent or equal in G, that is,

E [n] = {(x, y) ∈ V n × V n : x ≠ y, ∀i : (xi , yi ) ∈ E or xi = yi }.
The π-entropy of a graph G = (V, E) with respect to the probability density P on V is defined as

Hπ (G, P ) = lim_{ε→0} lim_{n→∞} min_{U ⊆V n , P n (U )≥1−ε} (1/n) log χ(G[n] [U ]).

Note that G[n] is the complement of (Ḡ)(n) .
The following theorem is proved in G. Simonyi [36].
3.9.4 Theorem. (G. Simonyi). If G = (V, E) is perfect, then Hπ (G, P ) = Hk (G, P ).
Chapter 4
Chromatic Entropy
In this chapter, we investigate minimum entropy colouring of the vertex set of a proba-
bilistic graph (G, P ) which was previously studied by N. Alon and A. Orlitsky [1]. The
minimum number of colours χH (G, P ) required in a minimum entropy colouring of V (G) was studied by J. Cardinal et al. in [5] and [6]. We state their results and further investigate χH (G, P ).
4.1 Minimum Entropy Coloring

Given a proper colouring π of G with colour classes C1 , · · · , Ck , the entropy of π with respect to a probability density P on V (G) is

H (π) = Σ_{i=1}^{k} p (Ci ) log (1/p (Ci )) ,
The chromatic entropy of a probabilistic graph (G, P ) is defined as

Hχ (G, P ) := min{H(π) : π is a proper colouring of G}.
For the second probabilistic 5-cycle, the chromatic entropy is attained by choosing the colour classes as {1, 3}, {2, 5}, and {4}; its chromatic entropy is then the entropy of the corresponding colour-class distribution.
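The distribution used in the example above is not recoverable here, but for small graphs the definition can be evaluated by brute force. The following illustrative Python sketch (ours; the name chromatic_entropy is an assumption) enumerates proper colourings of C5 and, for the uniform density, returns Hχ (C5 , U ) ≈ 1.5219, attained by colour classes of sizes 2, 2, 1.

import numpy as np
from itertools import product

def chromatic_entropy(n, edges, p, max_colours=None):
    """H_chi(G,P): brute-force minimum of H(colour-class distribution)
    over all proper colourings with at most max_colours colours."""
    k = max_colours or n
    best = float("inf")
    for col in product(range(k), repeat=n):
        if any(col[u] == col[v] for u, v in edges):
            continue  # not a proper colouring
        classes = {}
        for v, c in enumerate(col):
            classes[c] = classes.get(c, 0.0) + p[v]
        w = np.array([x for x in classes.values() if x > 0])
        best = min(best, float(-(w * np.log2(w)).sum()))
    return best

c5 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
print(chromatic_entropy(5, c5, [0.2] * 5, max_colours=3))  # ~1.5219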
4.2 Entropy Comparisons

The source coding problem is the problem of representing a random variable by a sequence
of bits such that the expected length of the representation is minimized.
N. Alon and A. Orlitsky [1] considered a source coding problem in which a sender wants to transmit an information source to a receiver who has access to data related to the intended
information source. Motivated by this problem, they considered the OR product of graphs,
as we stated in the previous chapter. We recall this graph product here.
Let G1 , · · · , Gn be graphs with vertex sets V1 , · · · , Vn . The OR product of G1 , · · · , Gn is the graph ∨_{i=1}^{n} Gi whose vertex set is V1 × · · · × Vn and where two distinct vertices (v1 , · · · , vn ) and (v1′ , · · · , vn′ ) are adjacent if for some i ∈ {1, · · · , n} with vi ≠ vi′ , vi is adjacent to vi′ in Gi . The n-fold OR product of G with itself is denoted by G∨n .
N. Alon and A. Orlitsky [1] proved the following lemma which relates chromatic entropy
to graph entropy.
4.2.1 Lemma. (N. Alon and A. Orlitsky). lim_{n→∞} (1/n) Hχ (G∨n , P (n) ) = Hk (G, P ).
Let Ω(G) be the collection of cliques of a graph G. The clique entropy of a probabilistic
graph (G, P ) is
Hω (G, P ) := max{H(X|Z ′ ) : X ∈ Z ′ ∈ Ω(G)}.

That is, for every vertex x we choose a conditional probability distribution p(z ′ |x) ranging over the cliques containing x. This determines a joint probability distribution of X and a random variable Z ′ ranging over all cliques containing X. Then, the clique entropy is the maximal conditional entropy of X given Z ′ .
Example. The only cliques of an empty graph are singletons. Thus for an empty graph, we have Z ′ = {X}, which implies Hω (G, P ) = 0.
On the other hand, for a complete graph, we can take Z ′ to be the set of all vertices. Thus, for a probabilistic complete graph (G, P ), we have Hω (G, P ) = H(X).
For a 5-cycle, every clique is either a singleton or an edge. Thus, for a probabilistic 5-cycle
with uniform distribution over the vertices, we have
Hω (G, P ) ≤ 1.
Now, if for every x we let Z ′ be uniformly distributed over the two edges containing x, then by symmetry we get H(X|Z ′ ) = 1, which implies Hω (G, P ) = 1.
N. Alon and A. Orlitsky, and J. Cardinal et al., proved the following lemmas in [1] and [6], respectively.
4.2.2 Lemma. (N. Alon and A. Orlitsky). Let U be the uniform distribution over the
vertices V (G) of a probabilistic graph (G, U ) and α(G) be the independence number of the
graph G. Then,
Hχ (G, U ) ≥ log ( |V (G)| / α(G) ) .
4.2.3 Lemma. (N. Alon and A. Orlitsky). For every probabilistic graph (G, P ),

Hk (G, P ) ≤ Hχ (G, P ).
4.2.4 Lemma. (J. Cardinal et al.). For every probabilistic graph (G, P ), we have

Hχ (G, P ) ≥ log (1/α(G, P )) .

Here α(G, P ) denotes the maximum weight P (S) of an independent set S of (G, P ).
It may seem that a non-uniform distribution decreases the chromatic entropy Hχ (G, P ), but the following example shows that this is not true. Let us consider the 7-star with deg(v1 ) = 7 and deg(vi ) = 1 for i ∈ {2, · · · , 8}. If p(v1 ) = 0.5 and p(vi ) = 1/14 for i ∈ {2, · · · , 8}, then Hχ (G, P ) = H(0.5, 0.5) = 1, while if p(vi ) = 1/8 for i ∈ {1, · · · , 8}, then Hχ (G, P ) = H(1/8, 7/8) ≤ H(0.5, 0.5) = 1.
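The two values in this example come from the two-class colouring {v1 } and {v2 , · · · , v8 }, the only proper 2-colouring of the star; the following small Python sketch (ours) evaluates the colour-class entropy under both distributions.

import numpy as np

def H(ps):
    """Base-2 entropy of a probability vector, ignoring zero entries."""
    ps = np.array([x for x in ps if x > 0], float)
    return float(-(ps * np.log2(ps)).sum())

# 7-star: centre v1, leaves v2..v8; colour classes {centre}, {leaves}.
p_skewed  = [0.5] + [1 / 14] * 7   # p(v1) = 1/2, each leaf 1/14
p_uniform = [1 / 8] * 8
for p in (p_skewed, p_uniform):
    print(H([p[0], sum(p[1:])]))
# skewed: 1.0 bit; uniform: H(1/8, 7/8) ~ 0.5436 < 1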
4.3 Number of Colours and Brooks’ Theorem

4.3.1 Theorem. (J. Cardinal et al.). Any minimum entropy colouring of a graph G equipped with a probability distribution on its vertices is a Grundy colouring. Moreover, for any Grundy colouring φ of G, there exists a probability mass function P over V (G) such that φ is the unique minimum entropy colouring of (G, P ).
We now consider upper bounds on χH (G, P ) in terms of the maximum valency of G, i.e., ∆(G). The following theorems were proved in J. Cardinal et al. [5] and J. Cardinal et al. [6].
4.3.2 Theorem. (J. Cardinal et al.). For any probabilistic graph (G, P ), we have
χH (G, P ) ≤ ∆(G) + 1.
4.3.3 Theorem. (Brooks’ Theorem for Probabilistic Graphs). If G is a connected graph different from a complete graph or an odd cycle, and U is the uniform distribution on its vertices, then χH (G, U ) ≤ ∆(G).
4.4 Grundy Colouring and Minimum Entropy Colouring

4.4.2 Lemma. For every probabilistic graph (G, P ), we have
Proof. Due to Theorem 4.3.1 any minimum entropy colouring of a graph G equipped
with a probability distribution on its vertices is a Grundy colouring, and for any Grundy
colouring φ of G, there exists a probability distribution P over V (G) such that φ is the
unique minimum entropy colouring of (G, P ).
4.4.3 Corollary. max_P χH (G, P ) = χo (G, P ).
Note that every greedy colouring of the vertices of a graph is a Grundy colouring.
It is worth mentioning that the chromatic number of a vertex transitive graph is not necessarily achieved by a Grundy colouring. Let G be a 6-cycle. Consider the first Grundy colouring
of G with colour classes {v1 , v4 }, {v3 , v6 }, and {v2 , v5 } and the second Grundy colouring
with colour classes {v1 , v3 , v5 } and {v2 , v4 , v6 }.
It may seem that every Grundy colouring of a probabilistic graph is a minimum en-
tropy colouring, but the following example shows that is not true. Consider a proba-
bility distribution for the 6-cycle in the above example as p(v1 ) = p(v3 ) = 0.4 and
p(v2 ) = p(v4 ) = p(v5 ) = p(v6 ) = 0.05. Then, denoting the first Grundy colouring in the
example above by cA and the second one by cB , we have H(cA ) = 0.44 and H(cB ) = 0.25
which are not equal.
4.4.1 Remark. Let (G, P ) and (G0 , P 0 ) be two probabilistic graphs, and φ : G → G0 a
homomorphism from G to G0 such that for every v 0 ∈ V (G0 ), we have
p′ (v ′ ) = Σ_{v∈φ−1 (v′ )} p(v).
4.5 Minimum Entropy Colouring and Kneser Graphs
In this section, we study the minimum entropy colouring of a Kneser graph Kv:r and prove
the following Theorem.
4.5.1 Theorem. Let (Kv:r , U ) be a probabilistic Kneser graph with uniform distribution U
over its vertices and v ≥ 2r. Then, the minimum number of colours in a minimum entropy colouring of (Kv:r , U ), i.e., χH (Kv:r , U ), is equal to the chromatic number of Kv:r , i.e., χ(Kv:r ). Furthermore, the chromatic entropy of (Kv:r , U ) is
Hχ (Kv:r , U ) = (1/χf (Kv:r )) log χf (Kv:r )
    + Σ_{0≤i≤v−1−2r} ( 1 / ( χf (Kv:r ) Π_{j=0}^{i} χf (Kv−j−1:v−r−j ) ) ) log ( χf (Kv:r ) Π_{j=0}^{i} χf (Kv−j−1:v−r−j ) ) .
Before proving the above theorem, we explain some preliminaries and a lemma which
were previously given in J. Cardinal et al. [6].
Consider a probabilistic graph (G, P ). Let S be a subset of the vertices of G, i.e.,
S ⊆ V (G).
Then P (S) denotes

P (S) := Σ_{x∈S} p(x).
Note that a colouring of V (G) is a map φ from the vertex set V (G) of G to the set of
positive integers N, that is
φ : V (G) → N.
Then φ−1 (i) denotes the set of vertices coloured with colour i. Let ci be the probability
of the i-th colour class. Hence, letting X be a random vertex with distribution P ranging
over the vertices of G, we get
ci = P (φ−1 (i)) = P (φ(X) = i).
The colour sequence of φ with respect to P is the infinite vector c = (ci ).
Let (G, P ) be a probabilistic graph. A sequence c is said to be colour-feasible if there
exists a colouring φ of V (G) having c as colour sequence. We consider nonincreasing colour
sequences, that is, colour sequences c such that
ci ≥ ci+1 , ∀ i.
Note that colour sequences define discrete probability distributions on N. Then the entropy
of a colour sequence c of a colouring φ, i.e., H(c), is

H(c) = − Σ_{i∈N} ci log ci .
4.6 Further Results

Recall from Lemma 3.6.1 that

max_P Hk (G, P ) = log χf (G) .    (4.1)
In this section, we prove the following theorem for vertex transitive graphs using chro-
matic entropy. Recall that we gave another proof of the following theorem using the
structure of vertex transitive graphs and convex optimization techniques in previous chap-
ter.
4.6.1 Theorem. Let G be a vertex transitive graph. Then the uniform distribution over
vertices of G maximizes Hk (G, P ). That is Hk (G, U ) = log χf (G).
Proof. First note that for a vertex transitive graph G, we have χf (G) = |V (G)|/α(G), and the n-fold OR product G∨n of a vertex transitive graph G is also vertex transitive. Now, from Lemma 4.2.2, Lemma 4.2.4, and equation (4.1), we have

Hk (G∨n , U ) ≤ log χf (G∨n ) ≤ Hχ (G∨n , U ) ,    (4.2)

and log χf (G) = lim_{n→∞} (1/n) log χ(G∨n ). Hence, applying Lemma 4.2.1 to equation (4.2) and using the squeeze theorem, we get

Hk (G, U ) = log χf (G) = lim_{n→∞} (1/n) log χ(G∨n ) = lim_{n→∞} (1/n) Hχ (G∨n , U ) .    (4.3)
The following example shows that the converse of the above theorem is not true. Con-
sider G = C4 ∪C6 , with vertex sets V (C4 ) = {v1 , v2 , v3 , v4 } and V (C6 ) = {v5 , v6 , v7 , v8 , v9 , v10 },
and parts A = {v1 , v3 , v5 , v7 , v9 }, B = {v2 , v4 , v6 , v8 , v10 }. Clearly, G is not a vertex transitive graph; however, using Theorem 3.5.3, one can see that the uniform distribution U = (1/10, · · · , 1/10) gives the maximum graph entropy, which is 1.
4.6.1 Remark. Note that the maximizer probability distribution of the graph entropy is
not unique. Consider C4 with vertex set V (C4 ) = {v1 , v2 , v3 , v4 } with parts A = {v1 , v3 }
and B = {v2 , v4 }. Using Theorem 3.5.3, probability distributions P1 = (1/4, 1/4, 1/4, 1/4) and P2 = (1/8, 1/4, 3/8, 1/4) both give the maximum graph entropy, which is 1.
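One can verify the remark numerically. Since the maximal independent sets of C4 are exactly its two parts, V P (C4 ) reduces (up to down-monotone slack) to the segment between their characteristic vectors, so the minimization is one-dimensional; the following illustrative Python sketch (ours; the name hk_c4 is an assumption) evaluates Hk (C4 , P ) for both distributions.

import numpy as np
from scipy.optimize import minimize

# C4 with parts A = {v1, v3}, B = {v2, v4}: its maximal independent sets
# are A and B, so the minimizing a lies on conv{(1,0,1,0), (0,1,0,1)}.
def hk_c4(p):
    p = np.asarray(p, float)
    def f(t):  # a = t*1_A + (1-t)*1_B, componentwise
        a = np.array([t[0], 1 - t[0], t[0], 1 - t[0]])
        return -(p * np.log2(np.clip(a, 1e-12, None))).sum()
    return minimize(f, [0.5], bounds=[(0, 1)]).fun

print(hk_c4([1/4, 1/4, 1/4, 1/4]))  # 1.0
print(hk_c4([1/8, 1/4, 3/8, 1/4]))  # 1.0 as well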
Now note that we can describe the chromatic entropy of a graph in terms of the graph entropy of a complete graph as

Hχ (G, P ) = min_φ Hk (Km , Pφ ),

where φ ranges over the proper colourings of G with m colour classes and Pφ is the induced distribution on the colour classes.
A graph G is called symmetric with respect to a functional FG (P ) defined on the set of all
the probability distributions on its vertex set if the distribution P ∗ maximizing FG (P ) is
uniform on V (G). We study this concept in more detail in the next chapter.
Chapter 5
Symmetric Graphs
A graph G with distribution P on its vertices is called symmetric with respect to graph en-
tropy Hk (G, P ) if the uniform probability distribution on its vertices maximizes Hk (G, P ).
In this chapter we characterize different classes of graphs which are symmetric with respect
to graph entropy.
5.1 Symmetric Bipartite Graphs

5.1.1 Theorem. Let G be a bipartite graph with parts A and B, and no isolated vertices. Then the uniform probability distribution U over the vertices of G maximizes Hk (G, P ) if and only if G has a perfect matching.

Proof. Suppose G has a perfect matching; then |A| = |B|, and by Hall’s theorem we have
have
|D| ≤ |N (D)|, ∀D ⊆ A.
Therefore, under the uniform distribution,

p(D)/p(A) ≤ p(N (D))/p(B), ∀D ⊆ A,

so the condition of Theorem 3.5.3 is satisfied. Noting that Hk (G, P ) ≤ log χf (G) for all P , and that log χf (G) = 1 for a bipartite graph with at least one edge, the assertion holds.
Now suppose that G has no perfect matching; then we show that Hk (G, U ) < 1. First, note that from König’s theorem, a bipartite graph G = (V, E) has a perfect matching if and only if each vertex cover has size at least (1/2)|V |. This implies that if a bipartite graph G does not have a perfect matching, then G has a stable set of size greater than |V |/2.
Furthermore, as mentioned in [34], the stable set polytope of a graph G is determined by the following inequalities if and only if G is bipartite:
0 ≤ xv ≤ 1, ∀v ∈ V (G),
xu + xv ≤ 1, ∀e = uv ∈ E(G).
We show that max_{x ∈ stable set polytope} Π_{v∈V} xv > 2^{−|V |} . Let S denote a stable set in G with |S| > |V |/2. We define a vector x such that xv = |S|/|V | if v ∈ S and xv = 1 − |S|/|V | otherwise. Since |S| > |V |/2, x is feasible. Letting t := |S|/|V |, we have
5.2 Symmetric Perfect Graphs
Let G = (V, E) be a graph. Recall that the fractional vertex packing polytope of G, i.e., F V P (G), is defined as

F V P (G) := {x ∈ R^{|V |}_+ : Σ_{v∈K} xv ≤ 1 for all cliques K of G}.
Note that F V P (G) is a convex corner and for every graph G, V P (G) ⊆ F V P (G). The
following theorem was previously proved in [7] and [16].
5.2.1 Theorem. A graph G is perfect if and only if V P (G) = F V P (G).
The following theorem, which is called the weak perfect graph theorem, is useful in the following discussion. This theorem was proved by Lovász in [27] and [28] and is as follows.
5.2.2 Theorem. A graph G is perfect if and only if its complement is perfect.
Now, we prove the following theorem, which is a generalization of our result on bipartite graphs that are symmetric with respect to graph entropy.
5.2.3 Theorem. Let G = (V, E) be a perfect graph and P be a probability distribution on
V (G). Then G is symmetric with respect to graph entropy Hk (G, P ) if and only if G can
be covered by its cliques of maximum size.
Proof. First, suppose that V (G) is covered by pairwise disjoint maximum cliques Q1 , · · · , Qm of G, and let T be the spanning subgraph of G whose edge set is the union of the edge sets of these cliques. Having V (T ) = V (G) and E(T ) ⊆ E(G), we get Hk (T, P ) ≤ Hk (G, P ) for every
distribution P . Using Lemma 3.6.1, this implies
X
Hk (T, P ) = P (Qi )Hk (Qi , Pi ) ≤ log χf (G), ∀P, (5.1)
i
Noting that G is a perfect graph, that complete graphs are symmetric with respect to graph entropy, that χf (Qi ) = χf (G) = ω(G) = χ(G) for all i ∈ [m], and (5.1), we conclude that the uniform distribution maximizes Hk (G, P ).
Now, suppose that G is symmetric with respect to graph entropy. We prove that G can
be covered by its maximum-sized cliques. Suppose this is not true. We show that G is not
symmetric with respect to Hk (G, P ).
Denoting the minimum clique cover number of G by γ(G) and the maximum independent set number of G by α(G), from the perfection of G and the weak perfect graph theorem, we get γ(G) = α(G). Then, using this fact, our assumption implies that G has an independent set S with |S| > |V (G)|/ω(G).
We define a vector x such that xv = |S|/|V | if v ∈ S and xv = (1 − |S|/|V |)/(ω − 1) if v ∈ V (G) \ S. Then, we can see that x ∈ F V P (G) = V P (G). Let t := |S|/|V |. Then, noting that t > 1/ω,
Hk (G, U ) ≤ − (1/|V |) Σ_{v∈V} log xv
          = − (1/|V |) ( Σ_{v∈S} log xv + Σ_{v∈V \S} log xv )
          = − (1/|V |) ( |S| log t + (|V | − |S|) log ((1 − t)/(ω − 1)) )
          = −t log t − (1 − t) log ((1 − t)/(ω − 1))
          = −t log t − (ω − 1) · ((1 − t)/(ω − 1)) log ((1 − t)/(ω − 1)) < log ω(G),

where the last expression is the entropy of the probability vector (t, (1 − t)/(ω − 1), · · · , (1 − t)/(ω − 1)) with ω entries; its maximum log ω is attained only at the uniform vector, i.e., only at t = 1/ω, and here t > 1/ω, so the inequality is strict.
Proof. Let G = L(H) for some graph H. Then either H is bipartite or H is regular. If H is bipartite, then G is perfect (see [40]) and, because of Theorem 5.2.3, we are done. So suppose that H is not bipartite. Then each clique of size k in G corresponds to a vertex v in V (H) together with the edges incident to v in H, and vice versa. That is because any such clique in G contains a triangle, and there is only one way of extending that triangle to the whole clique, which corresponds to the edges incident with the corresponding vertex in H. This implies that the covering cliques in G give an independent set in H which is also a vertex cover of H. Hence H is a bipartite graph, and hence G is perfect. Then, due to Theorem 5.2.3, the theorem is proved.
5.3 Symmetric Line Graphs

Let G1 = (V1 , E1 ) be a graph with m edges. The matching polytope M P (G1 ) of G1 is determined by the following inequalities (see A. Schrijver [34]):

xe ≥ 0, ∀e ∈ E(G1 ),
x(δ(v)) ≤ 1, ∀v ∈ V (G1 ),    (5.2)
x (E[U ]) ≤ ⌊|U |/2⌋, ∀U ⊆ V (G1 ) with |U | odd.

Let M denote the family of all matchings in G1 , and for every matching M ∈ M let the characteristic vector bM ∈ R^m_+ be given by

(bM )e = { 1, e ∈ M ;  0, e ∉ M. }    (5.3)

Then the fractional edge-colouring number χ′f (G1 ) of G1 is defined as

χ′f (G1 ) := min{ Σ_{M ∈M} λM : λ ∈ R^M_+ , Σ_{M ∈M} λM bM = 1 },

where 1 denotes the all-ones vector.
If we restrict the λM to be integers, then the above definition gives rise to the edge-colouring number of G1 , i.e., χ′ (G1 ). Thus

χ′f (G1 ) ≤ χ′ (G1 ).

As an example, taking G1 to be the Petersen graph, we have χ′f (G1 ) = 3, while χ′ (G1 ) = 4.
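The Petersen values can be recomputed from Edmonds' formula, stated as Theorem 5.3.1 below; the following Python sketch (our illustration; the function name frac_edge_chromatic is an assumption) evaluates the formula by brute force over odd vertex subsets.

from itertools import combinations

# Petersen graph: outer 5-cycle 0..4, inner pentagram 5..9, spokes i--i+5.
V = list(range(10))
E = {frozenset((i, (i + 1) % 5)) for i in range(5)} \
  | {frozenset((5 + i, 5 + (i + 2) % 5)) for i in range(5)} \
  | {frozenset((i, i + 5)) for i in range(5)}

def frac_edge_chromatic(V, E, max_degree):
    """Edmonds' formula: max(Delta, max over odd |U| >= 3 of
    |E[U]| / floor(|U|/2))."""
    best = float(max_degree)
    for r in range(3, len(V) + 1, 2):
        for U in combinations(V, r):
            s = set(U)
            inside = sum(1 for e in E if e <= s)
            best = max(best, inside / (r // 2))
    return best

print(frac_edge_chromatic(V, E, 3))  # 3.0, so chi'_f(Petersen) = 3 < 4 = chi'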
5.3.1 Remark. Note that every matching in G1 corresponds to an independent set in G2
and every independent set in G2 corresponds to a matching in G1 . Note that the fractional
edge-colouring number of G1 , i.e., χ′f (G1 ), is equal to the fractional chromatic number of G2 , i.e., χf (G2 ). Thus

χ′f (G1 ) = χf (G2 ).
Furthermore, note that the vertex packing polytope V P (G2 ) of G2 is the matching polytope
M P (G1 ) of G1 (see L. Lovász and M. D. Plummer [30]). That is
V P (G2 ) = M P (G1 ).
The following theorem, which was proved by Edmonds, gives a characterization of the fractional edge-colouring number χ′f (G1 ) of a graph G1 (see A. Schrijver [34]).
5.3.1 Theorem. Let ∆(G1 ) denote the maximum degree of G1 . Then the fractional edge-colouring number of G1 is obtained as

χ′f (G1 ) = max{ ∆(G1 ), max_{U ⊆V, |U |≥3, |U | odd} |E[U ]| / ⌊|U |/2⌋ }.
The following theorem introduces a class of symmetric line graphs with respect to graph
entropy. The main tool in the proof of the following theorem is the Karush-Kuhn-Tucker (KKT) optimality conditions in convex optimization (see S. Boyd and L. Vandenberghe [4]).
5.3.3 Theorem. Let G1 be a k-graph with k ≥ 3. Then the line graph G2 = L(G1 ) is
symmetric with respect to graph entropy.
Let λv , γU ≥ 0 be the Lagrange multipliers corresponding to the inequalities x(δ(v)) ≤ 1 and x (E[U ]) ≤ ⌊|U |/2⌋ in the description of the matching polytope M P (G1 ) in (5.2), for all v ∈ V (G1 ) and for all U ⊆ V (G1 ) with |U | odd and |U | ≥ 3, respectively. From our discussion in Remark 3.1.1, the Lagrange multipliers corresponding to the inequalities xe ≥ 0 are all zero. Set

g(x) = − Σ_{e∈E(G1 )} pe log xe .

Using the KKT conditions (see S. Boyd and L. Vandenberghe [4]), the vector x∗ minimizes g(x) if and only if it satisfies ∂L/∂x∗e = 0, that is,

− pe /x∗e + (λu + λv ) + Σ_{U ⊆V : e⊆U, |U | odd, |U |≥3} γU = 0, for e = {u, v}.    (5.5)
Then using Lemma 3.6.1 and the assumption χf (G2 ) = k the theorem is proved.
It is well known that cubic graphs have many interesting structural properties. For example, it can be checked that every edge in a bridgeless cubic graph is in a perfect matching. Furthermore, L. Lovász and M. D. Plummer [30] conjectured that every bridgeless cubic graph has exponentially many perfect matchings. This conjecture was recently proved by L. Esperet et al. [13]. Now we have the following interesting statement for every bridgeless cubic graph.
5.3.4 Corollary. The line graph of every cubic bridgeless graph G1 = (V1 , E1 ) is symmetric
with respect to graph entropy.
Proof. We may assume that G1 is connected. Let U ⊆ V1 and let U1 ⊆ U consist of vertices
v such that δ(v) ∩ δ(U ) = ∅. Then, using the handshaking lemma for G1 [U ], we have

3|U | = 2|E(G1 [U ])| + |δ(U )|,

and consequently,

3|U | ≡ |δ(U )| (mod 2).

Assuming |U | is odd and noting that G1 is bridgeless, we have |δ(U )| ≥ 3.
[Figure 5.1: a bridgeless cubic graph on vertices v1 , · · · , v10 .]
The graph in Figure 5.1 is neither vertex transitive nor covered by disjoint copies of its maximum size cliques. However, it is symmetric with respect to graph entropy by Corollary 5.3.4.
Figure 5.2 shows a cubic graph with a bridge. The fractional edge chromatic number of this graph is 3.5, while the entropy of its line graph under the uniform distribution is 1.75712; since log2 3.5 = 1.8074 > 1.75712, its line graph is not symmetric with respect to graph entropy, and we conclude that Corollary 5.3.4 does not hold for cubic graphs with bridges.
Chapter 6
Future Work
In this chapter we explain two possible research directions related to the entropy of graphs
discussed in previous chapters. Since these directions are related to a superclass of perfect graphs called normal graphs and to the Lovász ϑ function, we explain the corresponding terminology and results in the sequel.
6.1 Normal Graphs

Recall that a graph G is normal if it admits a covering C by cliques and a covering S by stable sets such that

C ∩ S ≠ ∅, ∀C ∈ C, S ∈ S.

A graph G is weakly splitting if there exists a nowhere-zero probability distribution P on its vertex set which makes inequality (6.1) an equality. The following lemma was proved in J. Körner et al. [23].
6.1.1 Lemma. (J. Körner, G. Simonyi, and Zs. Tuza) A graph G is weakly splitting if
and only if it is normal.
Furthermore, we call (G, P ) strongly splitting if inequality (6.1) becomes an equality for every probability distribution P . The following lemma was proved in I. Csiszár et al. [9].

6.1.2 Lemma. (I. Csiszár et al.). For a probabilistic graph (G, P ), we have H(P ) = Hk (G, P ) + Hk (Ḡ, P ) if and only if HV P (G) (P ) = HF V P (G) (P ).
Using Lemmas 6.1.1, 6.1.2, and 6.1.3, we conclude that every perfect graph is also a
normal graph. This fact was previously proved in J. Körner [20]. It is shown in [32] that
the line graph of a cubic graph is normal. Furthermore, it is shown in J. Körner [20] that
every odd cycle of length at least nine is normal. Smaller odd cycles are either perfect
like a triangle or not perfect nor normal like C5 and C7 . If we require that every induced
subgraph of a normal graph to be normal, we obtain the notion of hereditary normality.
The following conjecture was proposed in C. De Simone and J. Körner [15].

6.1.4 Conjecture. (Normal Graph Conjecture) A graph is hereditarily normal if and only if neither the graph nor its complement contains C5 or C7 as an induced subgraph.
A circulant C_n^k is the graph with vertex set {1, · · · , n} in which two vertices i ≠ j are adjacent if and only if

    i − j ≡ ±r (mod n)  for some r ∈ {1, . . . , k};

that is, C_n^k is the k-th power of the cycle C_n. We assume k ≥ 1 and n ≥ 2(k + 1) to avoid the cases where C_n^k is an independent set or a clique. The following theorem was proved in L. E. Trotter, Jr. [37].
6.1.5 Theorem. (L. E. Trotter, Jr.) The circulant C_{n′}^{k′} is an induced subgraph of C_n^k if and only if

    n′(k + 1)/(k′ + 1) ≤ n ≤ n′k/k′.
Note that C_{n′}^{k′} ⊂ C_n^k implies k′ < k and n′ < n. In particular, the following lemma was proved in A. K. Wagler [39].
6.1.6 Lemma. (A. K. Wagler)

(i) C5 ⊆ C_n^k if and only if 5(k + 1)/2 ≤ n ≤ 5k.

(ii) C7 ⊆ C_n^k if and only if 7(k + 1)/2 ≤ n ≤ 7k.

(iii) C_7^2 ⊆ C_n^k if and only if 7(k + 1)/3 ≤ n ≤ 7k/2.
Using the above theorem and lemma, A. K. Wagler [39] proved the Normal Graph Conjecture for the circulants C_n^k.
One direction for future research is investigating the Normal Graph Conjecture for
general circulants and Cayley graphs.
Given a probabilistic graph (G, P), K. Marton [31] introduced a functional λ(G, P) which is analogous to Lovász's bound ϑ(G) on the Shannon capacity of graphs. As with ϑ(G), the probabilistic functional λ(G, P) is based on the concept of an orthonormal representation of a graph, which we recall here.
Let U = {u_i : i ∈ V(G)} be a set of unit vectors of a common dimension d such that u_i and u_j are orthogonal whenever i and j are distinct non-adjacent vertices of G, and let c be a unit vector of dimension d. Then the system (U, c) is called an orthonormal representation of the graph G with handle c.
Letting T(G) denote the set of all orthonormal representations with a handle for the graph G, L. Lovász [29] defined

    ϑ(G) = min_{(U,c)∈T(G)} max_{i∈V(G)} 1/(u_i, c)².
It is then shown in L. Lovász [29] that the zero-error Shannon capacity C(G) is bounded above by ϑ(G) as

    C(G) ≤ log ϑ(G).
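For illustration, ϑ(G) can be evaluated numerically through its standard semidefinite programming formulation, which is equivalent to the definition above; the following sketch (assuming the cvxpy package with an SDP solver) recovers Lovász's value ϑ(C5) = √5.

    import numpy as np
    import cvxpy as cp

    # Standard SDP for the Lovasz theta number: maximize the total sum of the
    # entries of a PSD matrix X with trace 1 whose entries vanish on edges.
    n = 5
    edges = [(i, (i + 1) % n) for i in range(n)]       # the 5-cycle C5
    X = cp.Variable((n, n), symmetric=True)
    constraints = [X >> 0, cp.trace(X) == 1]
    constraints += [X[i, j] == 0 for i, j in edges]
    theta = cp.Problem(cp.Maximize(cp.sum(X)), constraints).solve()
    print(theta, np.sqrt(5))                           # both approximately 2.2361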
A probabilistic version of ϑ(G), denoted by λ(G, P), is defined in K. Marton [31] as

    λ(G, P) := min_{(U,c)∈T(G)} ∑_{i∈V(G)} P_i log( 1/(u_i, c)² ).
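As an example, evaluating this objective for C5 with uniform P at Lovász's umbrella representation certifies the bound λ(C5, P) ≤ (1/2) log2 5 = log2 ϑ(C5). The sketch below (illustrative only; vertex i is mapped to rib 2i mod 5 so that non-adjacent vertices receive orthogonal vectors, and base-2 logarithms are used) evaluates λ at this single representation rather than minimizing over all of T(G).

    import numpy as np

    # Lovasz umbrella for C5: five unit vectors around the handle c = e3 at
    # angles 4*pi*j/5, with (u_i, c)^2 = 1/sqrt(5) for every vertex i.
    cos2 = 1 / np.sqrt(5)                        # squared inner product with the handle
    s, h = np.sqrt(1 - cos2), np.sqrt(cos2)
    rib = lambda i: 4 * np.pi * ((2 * i) % 5) / 5
    u = {i: np.array([s * np.cos(rib(i)), s * np.sin(rib(i)), h]) for i in range(5)}
    c = np.array([0.0, 0.0, 1.0])

    # check the orthonormal-representation property on the non-edges of C5
    for i in range(5):
        for j in range(i + 1, 5):
            if (j - i) % 5 in (2, 3):            # i and j non-adjacent in C5
                assert abs(u[i] @ u[j]) < 1e-9

    lam = sum(0.2 * np.log2(1 / (u[i] @ c) ** 2) for i in range(5))
    print(lam, 0.5 * np.log2(5))                 # both equal about 1.1610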
The following theorem, which relates λ(G, P) to Hk(G, P), was proved in K. Marton [31].

6.2.2 Theorem. (K. Marton) For any probabilistic graph (G, P),

    λ(G, P) ≤ Hk(G, P).
K. Marton [31] also related λ(G, P) to ϑ(G) (see (6.3)). Thus, in view of (6.2), (6.3), and the above discussion, it seems interesting to investigate the relationship between graph homomorphisms and graph entropy, which may in turn lead to a relationship between graph homomorphisms and graph covering problems.
Appendix A
Proof. Let κ(G) be the minimum number of maximal independent sets covering the vertices of G. Then κ(G) ≤ χ(G), since the colour classes of any proper colouring of V(G) can be extended to maximal independent sets. On the other hand, consider a covering system S = {S1, · · · , S_{κ(G)}} consisting of a minimum number of maximal independent sets. We define a colouring c of the vertices of G by

    c(v) = min{ i : v ∈ S_i }.

The proposed colouring is a proper colouring of V(G) in which each colour class corresponds to a maximal independent set in our covering system S. That is, κ(G) ≥ χ(G).
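The first direction of the argument can be illustrated computationally. The sketch below (illustrative only, using networkx and a greedy colouring in place of an optimal one) extends the colour classes of a proper colouring of the Petersen graph to maximal independent sets that cover V(G).

    import networkx as nx

    # Extend each colour class of a proper colouring to a maximal independent
    # set; the resulting sets cover V(G), so kappa(G) <= (number of colours).
    G = nx.petersen_graph()
    colouring = nx.greedy_color(G)                   # vertex -> colour
    classes = {}
    for v, col in colouring.items():
        classes.setdefault(col, set()).add(v)

    cover = []
    for S in classes.values():
        S = set(S)
        for v in G.nodes:                            # greedy maximal extension
            if v not in S and all(u not in S for u in G[v]):
                S.add(v)
        cover.append(S)
    print(len(cover), set().union(*cover) == set(G.nodes))   # e.g. 3 True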
Let X be a finite set and let P be a probability density on its elements. Let K be a constant. Then a sequence x ∈ X^n is called P-typical if for every y ∈ X, the number of occurrences N(y|x) of the element y in x satisfies

    |N(y|x) − np(y)| ≤ K√(np(y)).
A.2. Lemma. Let T^n(P) be the set of the P-typical n-sequences. Then:

(iii) The number of typical sequences N(n) is bounded as

    2^{ nH(P) − C√n } ≤ N(n) ≤ 2^{ nH(P) + C√n },

for some constant C > 0 depending on |X| and K, and independent of n and P.
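For a binary source the count in part (iii) can be checked directly for small n. The sketch below (illustrative only, with the arbitrary choice K = 1) compares the exact number of P-typical sequences with 2^{nH(P)}.

    import itertools
    import math

    # Count the P-typical n-sequences of a binary source and compare with
    # 2^{nH(P)}; the two agree up to a factor of 2^{O(sqrt(n))}.
    p = {0: 0.3, 1: 0.7}
    H = -sum(q * math.log2(q) for q in p.values())
    K, n = 1.0, 16

    def typical(x):
        return all(abs(x.count(y) - n * p[y]) <= K * math.sqrt(n * p[y])
                   for y in p)

    N = sum(typical(x) for x in itertools.product(p, repeat=n))
    print(N, 2 ** (n * H))      # e.g. 14756 versus roughly 1.8e4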
Having X defined as above, let (G, P) be a probabilistic graph with vertex set V(G) = X. We define the relation e by x e y if and only if x = y or {x, y} ∈ E(G). If e determines an equivalence relation on the vertex set V(G), then the graph G is the union of pairwise disjoint cliques. Let H(P|e) denote the conditional entropy given the equivalence classes of e, i.e.,

    H(P|e) = ∑_{x∈X} p(x) log( ( ∑_{y: x e y} p(y) ) / p(x) ).
Let A denote the collection of equivalence classes under e, and let P_e be the probability density on the elements A of A given by

    p_e(A) = ∑_{x∈A} p(x).
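As a quick numerical check (illustrative values only), the definition of H(P|e) agrees with H(P) − H(P_e) when e partitions X into cliques:

    import math

    # Verify H(P|e) = H(P) - H(P_e) for a toy partition of X = {0,1,2,3}
    # into the two equivalence classes {0,1} and {2,3}.
    p = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}
    classes = [{0, 1}, {2, 3}]

    H = lambda q: -sum(t * math.log2(t) for t in q if t > 0)
    Hp = H(p.values())
    Hpe = H([sum(p[x] for x in A) for A in classes])

    # direct evaluation from the definition of H(P|e)
    Hcond = sum(p[x] * math.log2(sum(p[y] for y in A) / p[x])
                for A in classes for x in A)
    print(Hcond, Hp - Hpe)      # both approximately 0.9651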
Then we have the following lemma (see V. Anantharam [2] and J. Körner [19]).
A.3. Lemma. (V. Anantharam) The number of P-typical n-sequences in a P_e-typical n-sequence of equivalence classes is bounded below by 2^{ nH(P|P_e) − C√n } and bounded above by 2^{ nH(P|P_e) + C√n }, for some constant C > 0.
Proof. Let A = (A1, · · · , An) be a P_e-typical n-sequence, that is, for each A ∈ A,

    |N(A|A) − np_e(A)| ≤ K√(np_e(A)).    (6.4)

Then for all A ∈ A, we have

    np_e(A) ≤ max( 4K², 2N(A|A) ).    (6.5)

Indeed, suppose np_e(A) ≥ 4K². Then np_e(A) ≥ 2K√(np_e(A)) and therefore

    N(A|A) ≥ np_e(A) − K√(np_e(A)) ≥ np_e(A)/2.
Let x = (x1, · · · , xn) be a P-typical n-sequence in A, i.e.,

    x_i ∈ A_i,  1 ≤ i ≤ n.

From the P-typicality of x, we have

    |N(x|x) − np(x)| ≤ K√(np(x)).
Now we prove that for each A ∈ A, the restriction of x to those coordinates having A_i = A is { p(x)/p_e(A) : x ∈ A }-typical. For x ∈ A, we have

    | N(x|x) − (p(x)/p_e(A)) N(A|A) |
      ≤ | N(x|x) − np(x) | + | np(x) − (p(x)/p_e(A)) N(A|A) |
      ≤ K√(np(x)) + (p(x)/p_e(A)) K√(np_e(A))
      = K ( √(p(x)/p_e(A)) + p(x)/p_e(A) ) √(np_e(A)).
Using (6.5), and noting that N(A|A) ≥ 1 and p(x)/p_e(A) ≤ √( p(x)/p_e(A) ), we get

    | N(x|x) − (p(x)/p_e(A)) N(A|A) |
      ≤ K ( √(p(x)/p_e(A)) + p(x)/p_e(A) ) √( max(4K², 2N(A|A)) )
      ≤ K ( √(p(x)/p_e(A)) + p(x)/p_e(A) ) √( max(4K²/N(A|A), 2) ) · √(N(A|A))
      ≤ max(2K², √2·K) ( √(p(x)/p_e(A)) + p(x)/p_e(A) ) √(N(A|A))
      ≤ 2 max(2K², √2·K) √( N(A|A)·p(x)/p_e(A) ).    (6.6)
Now, letting H(P|e = A) denote ∑_{x∈A} (p(x)/p_e(A)) log( p_e(A)/p(x) ), we give the following lower and upper bounds on the number of P-typical n-sequences x in A. Let C > 0 be a constant depending on K and |X| as in Lemma A.2. Then, using Lemma A.2 and (6.4), we get the following upper bound on the number of P-typical n-sequences x in A:

    ∏_{A∈A} 2^{ N(A|A) H(P|e=A) + C√(N(A|A)) }
      = 2^{ n ∑_{A∈A} (N(A|A)/n) H(P|e=A) + C ∑_{A∈A} √(N(A|A)) }
      ≤ 2^{ n ∑_{A∈A} ( p_e(A) + K√(p_e(A)/n) ) H(P|e=A) + C|X|√n }
      = 2^{ n H(P|P_e) + K√n ∑_{A∈A} √(p_e(A)) H(P|e=A) + C|X|√n }
      ≤ 2^{ n H(P|P_e) + √n ( C|X| + K ∑_{A∈A} log|A| ) }
      ≤ 2^{ n H(P|P_e) + √n ( C|X| + K|X| ) }.

Now, setting C1 = C|X| + K|X|, the number of P-typical n-sequences x in A is upper bounded by

    2^{ n H(P|P_e) + C1√n },
for sufficiently large n. Let λ > 0 be a positive number. First, we show that

    M(n, ε) ≥ 2^{ n(H′(G,P)−λ) }.

Consider G^(n)[U] for some U ∈ T_ε^(n). Using Lemma A.2, for any δ > 0 there is a K > 0 such that for any sufficiently large n we have

    P(T^n(P)) ≥ 1 − δ.
First, note that

    1 − δ − ε ≤ P(U ∩ T^n(P)).    (6.7)

Now we estimate the chromatic number of G^(n)[U ∩ T^n(P)]. Let S^n denote the family of the maximal independent sets of G^(n). Note that every colour class in a minimum colouring of a graph can be enlarged to a maximal independent set. Thus,

    P(U ∩ T^n(P)) ≤ χ( G^(n)[U ∩ T^n(P)] ) · max_{S∈S^n} P(S ∩ T^n(P)).    (6.8)

Furthermore, we have

    max_{S∈S^n} P(S ∩ T^n(P)) ≤ max_{x∈T^n(P)} p(x) · max_{S∈S^n} |S ∩ T^n(P)|.    (6.9)
Then

    N(R|S) − na(R) = ∑_{y∈R} ( N(y, R|x, S) − nq(y, R) ),
and probability density Q for the graph Γ, the number of Q-typical n-sequences in each a-typical equivalence class A, which is a maximal independent set of G, lies in the interval

    [ 2^{ nH(Q|a) − K2√n }, 2^{ nH(Q|a) + K2√n } ].    (6.13)
Noting that every pair (y, R) may occur zero, one, . . . , or n times in the n-sequence (x, S), and that for a given y, knowing N(y, R|x, S) for all R uniquely determines N(y|x), there are at most (n + 1)^{|V(Γ)|} different auxiliary densities of the type given by (6.10). Now we bound max_{S∈S^n} |S ∩ T^n(P)| as follows. Note that S ∩ T^n(P) is the set of P-typical n-sequences which are contained in a given maximal independent set S in G^(n). Then, letting Q be the set of feasible joint distributions for (X, S), define T^n(S, Q) as in (6.10) for all S ∈ S^n and all Q ∈ Q. From (6.10), for all S ∈ S^n and for all x in S ∩ T^n(P) there is some Q ∈ Q such that x ∈ T^n(S, Q). Therefore, for all S ∈ S^n, we get

    |S ∩ T^n(P)| ≤ | ∪_{Q∈Q} T^n(S, Q) |
      ≤ ∑_{Q∈Q} |T^n(S, Q)|
      ≤ |Q| · max_{Q∈Q} |T^n(S, Q)|,
Then, using (6.13), we obtain

    max_{S∈S^n} |S ∩ T^n(P)| ≤ (n + 1)^{|V(Γ)|} · 2^{ n·max_{Q′∈Q} H(Q′|a) + K2√n }.    (6.14)
Further,

    ∑_{R: y∈R} q(y, R) = ( p(y)/N(y|x) ) ∑_{R: y∈R} N(y, R|x, S) = p(y).
Combining the preceding estimates, we get

    1 − λ − ε ≤ χ( G^(n)[U ∩ T^n(P)] ) · exp2( n( max_{Q′∈Q} H(Q′|a) − H(P) ) + K2√n + |V(Γ)| log2(n + 1) ),

and consequently,

    χ( G^(n)[U ∩ T^n(P)] ) ≥ (1 − λ − ε) · exp2( n( H(P) − max_{Q′∈Q} H(Q′|a) ) − K2√n − |V(Γ)| log2(n + 1) ).    (6.16)
Note that

    H(P) − max_{Q′∈Q} H(Q′|a) = min_{Q′∈Q} ∑_{x,S} q′(x, S) log2( q′(x, S) / ( p(x)·q′(S) ) ) = min_{Q′∈Q} I(Q′).
Now, considering

    χ( G^(n)[U] ) ≥ χ( G^(n)[U ∩ T^n(P)] ),

and using (6.16), for every U ∈ T_ε^(n) we get

    χ( G^(n)[U] ) ≥ (1 − λ − ε) · exp2( nH′(G, P) − K2√n − |V(Γ)| log2(n + 1) ).
Thus,

    (1/n) log2 min_{U∈T_ε^(n)} χ( G^(n)[U] ) ≥ (1/n) log2(1 − λ − ε) + H′(G, P) − K2/√n − (|V(Γ)|/n) log2(n + 1).
Therefore, we get

    lim inf_{n→∞} (1/n) log2 M(n, ε) ≥ H′(G, P).    (6.17)
Now we show that for every 0 < ε < 1 and δ > 0 and sufficiently large n, there exists a subgraph G^(n)[U] of G^(n), for some U ⊆ V(G^(n)), such that

    χ( G^(n)[U] ) ≤ 2^{ n(H′(G,P)+δ) }.
Let Q* be the joint density on the vertices and independent sets of G which minimizes the mutual information I(Q*), that is,

    I(Q*) = H′(G, P).

Let L be a fixed parameter. For a family of L maximal independent sets, not necessarily distinct and not necessarily covering, we define the corresponding probability density Q*_L as follows. We assume that the L maximal independent sets of a given system of maximal independent sets are chosen independently. Thus,

    Q*_L(S1, S2, · · · , SL) = ∏_{j=1}^{L} Q*(Sj).
Now consider a fixed n. Let G^(n) be the n-th conormal power of the graph G. Consider systems of maximal independent sets consisting of L maximal independent sets, each in the form of an n-sequence of maximal independent sets. We call such a system an L-system. For each L-system (S1, S2, · · · , SL), let U(S1, S2, · · · , SL) be the set of all vertices of V(G^(n)) which are not covered by the L-system (S1, S2, · · · , SL). For a given L, we show that the expected value of P(U(S1, S2, · · · , SL)) is less than ε. This implies that there exists at least one system S1, · · · , SL covering a subgraph of G^(n) with probability greater than or equal to 1 − ε. For an L-system chosen with probability Q*_L, let Q*_{L,x} be the probability that a given n-sequence x is not covered by the L-system, that is,

    Q*_{L,x} = Q*_L( { (S1, · · · , SL) : x ∈ U(S1, · · · , SL) } ) = ∑_{(S1,··· ,SL) : x∈U(S1,··· ,SL)} Q*_L(S1, · · · , SL).
Then we have

    E( P(U(S1, · · · , SL)) ) = ∑_{S1,··· ,SL} Q*_L(S1, · · · , SL) · P(U(S1, · · · , SL))
      = ∑_{(S1,··· ,SL)} Q*_L(S1, · · · , SL) ∑_{x∈U(S1,··· ,SL)} P(x)
      = ∑_{x∈X^n} P(x) ∑_{(S1,··· ,SL) : x∈U(S1,··· ,SL)} Q*_L(S1, · · · , SL)
      = ∑_{x∈X^n} P(x)·Q*_{L,x}.    (6.18)
For a given ε with 0 < ε < 1, by Lemma A.2 there exists a set of typical sequences with total probability greater than or equal to 1 − ε/2. Then we can write the right-hand side of the above equation as

    ∑_{x∈X^n} P(x)·Q*_{L,x} = ∑_{x∈T^n(P)} P(x)·Q*_{L,x} + ∑_{x∉T^n(P)} P(x)·Q*_{L,x}.    (6.19)
The second term in (6.19) is upper bounded by P( X^n ∖ T^n(P) ), which is less than ε/2. We give an upper bound for the first term and show that for L = 2^{ n(H′(G,P)+δ) } it tends to 0 as n → ∞. Now,

    ∑_{x∈T^n(P)} P(x)·Q*_{L,x} ≤ P(T^n(P)) · max_{x∈T^n(P)} Q*_{L,x} ≤ max_{x∈T^n(P)} Q*_{L,x}.
If an n-sequence x is not covered by an L-system, then x is not covered by any element of this system. Letting S_x be the set of maximal independent sets covering the n-sequence x, we have

    max_{x∈T^n(P)} Q*_{L,x} = max_{x∈T^n(P)} ( 1 − Q*(S_x) )^L.    (6.20)
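The identity (6.20), together with the expectation computed in (6.18), can be simulated in a toy case. The sketch below (illustrative only: blocklength n = 1, G = C5, P uniform, and Q taken uniform over the five maximal independent sets purely for simplicity, which need not be the minimizing Q*) matches the closed form (1 − 2/5)^L, since each vertex of C5 lies in exactly two of its five maximal independent sets.

    import random

    # Monte Carlo estimate of E[P(U(S1,...,SL))] for G = C5 at blocklength 1.
    V = range(5)
    mis = [frozenset({i, (i + 2) % 5}) for i in V]   # maximal independent sets of C5
    L, trials = 10, 20000

    total = 0.0
    for _ in range(trials):
        system = [random.choice(mis) for _ in range(L)]       # an L-system
        covered = set().union(*system)
        total += sum(1 / 5 for v in V if v not in covered)    # uncovered mass
    print(total / trials, (1 - 2 / 5) ** L)                   # both about 0.006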
We obtain a lower bound for Q*(S_x) by counting the Q*-typical n-sequences of maximal independent sets covering x ∈ T^n(P). This number is greater than or equal to the number of Q*-typical sequences (y, B) with first coordinate equal to x. Equality of the first coordinates of the ordered pairs in V(Γ) is an equivalence relation p on the set V(Γ). Thus, using Lemma A.3, the number of the Q*-typical n-sequences of maximal independent sets is bounded from below by

    2^{ nH(Q*|p) − K3√n }.    (6.21)
Let K4 be a constant independent of n and the density a(Q*). Then, applying Lemma A.2 to S and the marginal distribution a(Q*) of Q* over the maximal independent sets, we obtain the following lower bound on the Q*-probability of the a(Q*)-typical n-sequences of maximal independent sets:

    Q* ≥ 2^{ −( nH(a(Q*)) + K4√n ) }.    (6.22)

Combining (6.20), (6.21), and (6.22), we get

    max_{x∈T^n(P)} Q*_{L,x} ≤ ( 1 − exp2( −( nH(a(Q*)) + K4√n ) + nH(Q*|p) − K3√n ) )^L.    (6.23)
Therefore,

    max_{x∈T^n(P)} Q*_{L,x} ≤ ( 1 − 2^{ −( nH′(G,P) + K5√n ) } )^L,  for some constant K5.
Then, using the inequality (1 − x)^L ≤ exp2(−Lx), the above inequality becomes

    max_{x∈T^n(P)} Q*_{L,x} ≤ exp2( −L·2^{ −( nH′(G,P) + K5√n ) } ).    (6.24)
Setting L = 2^{ n(H′(G,P)+δ) }, (6.24) becomes

    max_{x∈T^n(P)} Q*_{L,x} ≤ exp2( −( 2^{ n(H′(G,P)+δ) } − 1 )·2^{ −( nH′(G,P) + K5√n ) } )
      ≤ exp2( −2^{ nδ − K6√n } ).    (6.25)
Consequently,

    min_{U⊆V(G^(n)), U∈T_ε^(n)} χ( G^(n)[U] ) ≤ 2^{ n(H′(G,P)+δ) },  for every δ > 0.
Bibliography
[1] Noga Alon, and Alon Orlitsky, “Source Coding and Graph Entropies,” IEEE Trans.
on Information Theory, Vol. 42, No. 5, pp. 1329-1339, September 1996.
[3] C. Berge, Graphs and Hypergraphs, 2nd edition, North Holland, Amsterdam, 1976.
[4] Stephen Boyd, and Lieven Vandenberghe, Convex Optimization, Cambridge University Press, New York, 2004 (seventh printing with corrections, 2009).
[5] Jean Cardinal, Samuel Fiorini, and G. Van Assche, “On minimum entropy graph colorings,” in Proc. of IEEE Int. Symposium on Information Theory, p. 43, 2004.
[6] Jean Cardinal, Samuel Fiorini, and Gwenaël Joret, “Minimum Entropy Coloring,” Journal of Combinatorial Optimization, Vol. 16, No. 4, pp. 361-377, 2008.
[7] V. Chvátal, “On certain polytopes associated with graphs”, J. Comb. Theory B, 18,
(1975), 138-154.
[8] Thomas M. Cover, and Joy A. Thomas, Elements of Information Theory, Second Edition, Wiley-Interscience, 2006.
[9] I. Csiszár, J. Körner, L. Lovász, K. Marton, and G. Simonyi, “Entropy splitting for antiblocking corners and perfect graphs,” Combinatorica, Vol. 10, pp. 27-40, 1990.
[10] R. M. Dudley, Real Analysis and Probability, First published by Wadsworth, Inc. in
1989, Cambridge University Press Edition published in 2002.
[11] Rick Durrett, Probability: Theory and Examples, Fourth Edition, Cambridge University Press, 2010.
[12] P. Erdős, S. T. Hedetniemi, R. C. Laskar, and G. C. E. Prins, “On the equality of the partial Grundy and upper ochromatic numbers of graphs,” Discrete Math. 272(1): 53-64, in honor of Frank Harary, 2003.
[13] Louis Esperet, František Kardoš, Andrew D. King, Daniel Král', and Serguei Norine, “Exponentially many perfect matchings in cubic graphs”, Advances in Mathematics 227 (4): 1646-1664, 2011.
[14] M. Fredman, and J. Komlós, “On the size of separating systems and perfect hash
functions”, SIAM J. Algebraic Discrete Methods, 5 (1984), pp. 61-68.
[15] C. De Simone and J. Körner, “On the odd cycles of normal graphs”, Proceedings of
the Third International Conference on Graphs and Optimization, GO-III (Leukerbad,
1998), Discrete Appl. Math. 94 (1999), no. 1 - 3, 161-169.
[17] Chris Godsil, and Gordon Royle, Algebraic Graph Theory, Graduate Texts in
Mathematics-Springer, 2001.
[18] J. Kahn and J. H. Kim, “Entropy and Sorting”, Proc. 24th Annual ACM Symposium on the Theory of Computing (1992), pp. 178-187.
[19] J. Körner, “Coding of an information source having ambiguous alphabet and the
entropy of graphs,” in Transactions of the 6th Prague Conference on Information
Theory, Academia, Prague, (1973), 411-425.
[20] J. Körner, “An extension of the class of perfect graphs”, Studia Sci. Math. Hung., 8
(1973), 405-409.
[22] J. Körner and G. Longo, “Two-step encoding of finite memoryless sources”, IEEE
Trans. Information Theory, 19 (1973), 778-782.
[23] J. Körner, G. Simonyi, and Zs. Tuza, “Perfect couples of graphs,” in Combinatorica,
12 (1992), 179-192.
[24] J. Körner, and K. Marton, “Graphs that split entropies”, SIAM J. Discrete Math.,
1 (1988), 71-79.
[25] J. Körner, and K. Marton, “New bounds for perfect hashing via information theory”,
European J. of Combinatorics, 9 (1988), 523-530.
[26] L. Lovász, “Perfect graphs”, in More Selected Topics in Graph Theory, (L. W.
Beineke, R. J. Wilson, eds), Academic Press, New York-London, 1983, 55-87.
[27] L. Lovász, “Normal hypergraphs and the perfect graph conjecture”, Discrete Mathematics 2 (3), 1972, 253-267.
[29] L. Lovász, “On the Shannon Capacity of a Graph”, in IEEE Trans. on Information
Theory, 25 (1979), pp. 1-7.
[31] Katalin Marton, “On the Shannon Capacity of Probabilistic Graphs”, in Journal of
Combinatorial Theory Series B, 57, pp. 183-195 (1993).
[32] Zsolt Patakfalvi, “Line-graphs of cubic graphs are normal”, in Discrete Mathematics
308 (2008), pp. 2351-2365.
[36] G. Simonyi, “Perfect graphs and graph entropy. An updated survey,” Perfect Graphs,
John Wiley and Sons (2001) pp. 293-328.
[37] L. E. Trotter, jr., “A Class of Facet Producing Graphs for Vertex Packing Polyhedra”,
in Discrete Math. 12 (1975) pp. 373-388.
[38] Daniel Ullman and Edward Scheinerman, Fractional Graph Theory: A Rational Approach to the Theory of Graphs, Wiley-Interscience, 1997.
[39] Annegret K. Wagler, “The Normal Graph Conjecture is true for Circulants”, in Graph Theory in Paris, Trends in Mathematics, Birkhäuser, 2007, pp. 365-374.
[40] Douglas B. West, Introduction to Graph Theory, Second Edition, Prentice-Hall, Inc.,
2001.