EE/Stats 376A: Information theory Winter 2017
Lecture 15 — March 2
Lecturer: David Tse Scribe: Vihan Lakshman, Burak B, Aykan O, Tung-Yu W, David Z
15.1 Outline
• Differential Entropy, Divergence, and Mutual Information
• Entropy Maximization
• Capacity of Gaussian Channels
15.2 Recap - Differential Entropy
Last lecture, we introduced a notion of entropy in the setting of continuous random variables
known as differential entropy. For a continuous random variable Y , the differential entropy
of Y is defined as
h(Y) \triangleq E\left[\log \frac{1}{f(Y)}\right]
and, for another continuous random variable X, we defined the conditional differential en-
tropy as
h(Y|X) \triangleq E\left[\log \frac{1}{f(Y|X)}\right].
Recall that, unlike their discrete counterparts, h(Y) and h(Y|X) carry units: they change under a rescaling of the variable (we will use the identity h(aX) = h(X) + \log|a| from last lecture below). Let's now check other interesting properties.
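Before doing so, here is a minimal numerical sketch of the units remark above (Python, with illustrative numbers, not part of the original notes): for a Uniform(0, a) random variable, h = log a, so expressing the very same length in centimeters rather than meters shifts the differential entropy by log 100, whereas a discrete entropy would be unaffected by such relabeling.

    import numpy as np

    def h_uniform_bits(a):
        """Differential entropy (in bits) of Uniform(0, a): h = log2(a)."""
        return np.log2(a)

    # The same physical length, expressed in two different units.
    h_meters = h_uniform_bits(1.0)         # uniform on [0, 1] meter
    h_centimeters = h_uniform_bits(100.0)  # the identical interval in centimeters

    print(h_meters)                        # 0.0 bits
    print(h_centimeters)                   # ~6.64 bits
    print(h_centimeters - h_meters)        # log2(100): the shift due to rescaling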
15.2.1 Mutual Information: Label Invariance
Theorem 1. For a nonzero constant a, I(aX; Y) = I(X; Y).
Proof. Recall from last time,
I(aX; Y ) = h(aX) − h(aX|Y ).
Furthermore, we showed last time that h(aX) = h(X) + \log|a|. Therefore,
I(aX; Y ) = h(X) + log |a| − h(aX|Y ).
To proceed from here, we need to get a handle on the term h(aX|Y ). It will help to reason
by analogy and remember that in the discrete case we have
H(X|Y) = \sum_y H(X|Y = y)\, p(y).
For continuous random variables, recall that sums are replaced by integrals and probabilities
by density functions. Making these modifications, we arrive at the following definition for
conditional differential entropy:
h(X|Y) = \int_y h(X|Y = y)\, f(y)\, dy.
By the same reasoning through which we showed h(aX) = h(X) + log |a| (see the previous
lecture) we can conclude that h(aX|Y = y) = h(X|Y = y) + log |a|. Therefore,
\int_y h(aX|Y = y)\, f(y)\, dy = \int_y \bigl(h(X|Y = y) + \log|a|\bigr) f(y)\, dy = h(X|Y) + \log|a|,

that is, h(aX|Y) = h(X|Y) + \log|a|.
Plugging these quantities into the definition of mutual information we have
I(aX; Y ) = h(aX) − h(aX|Y )
= h(X) + log |a| − h(X|Y ) − log |a|
= h(X) − h(X|Y ) = I(X; Y ).
As an exercise, the reader is encouraged to consider whether the above result holds for
any one-to-one function on X or Y . Why would such a result be important? Remember
that the capacity of a channel can be described in terms of mutual information. Thus,
the aforementioned result would imply that the capacity remains the same if we perform
lossless pre- or post-processing on the data we communicate.
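As a quick numerical check of Theorem 1 in a special case (a Monte Carlo sketch, not part of the proof): for jointly Gaussian X and Y with correlation ρ, the mutual information has the closed form I(X;Y) = −½ log(1 − ρ²), and scaling X by any nonzero a leaves |ρ| unchanged. The sample size and the constants rho and a below are arbitrary.

    import numpy as np

    rng = np.random.default_rng(0)
    n, rho = 200_000, 0.6

    # Jointly Gaussian (X, Y) with correlation rho.
    x = rng.standard_normal(n)
    y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(n)

    def gaussian_mi_bits(u, v):
        """I(U;V) = -1/2 log2(1 - corr^2), exact for jointly Gaussian pairs."""
        r = np.corrcoef(u, v)[0, 1]
        return -0.5 * np.log2(1 - r**2)

    a = -3.7  # any nonzero constant
    print(gaussian_mi_bits(x, y))      # ~0.32 bits
    print(gaussian_mi_bits(a * x, y))  # the same value: scaling X changes nothing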
15.3 Properties
15.3.1 Chain Rule for Differential Entropy
Let’s continue our exploration of differential entropy and ask another very natural question:
Does the chain rule hold? That is, does h(X1 , X2 ) = h(X1 ) + h(X2 |X1 )? As it turns out,
this result is true and the proof is nearly identical to our derivation of the original chain rule
for discrete random variables:
h(X_1, X_2) = E\left[\log \frac{1}{f(X_1, X_2)}\right]
            = E\left[\log \frac{1}{f(X_1)\, f(X_2|X_1)}\right]
            = E\left[\log \frac{1}{f(X_1)}\right] + E\left[\log \frac{1}{f(X_2|X_1)}\right]
            = h(X_1) + h(X_2|X_1).
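The chain rule can be verified numerically in the Gaussian case, where every term has a closed form. The sketch below (illustrative parameter values, not part of the notes) compares h(X_1, X_2) for a bivariate Gaussian with h(X_1) + h(X_2|X_1), using Var(X_2|X_1) = σ_2²(1 − ρ²).

    import numpy as np

    def h_gauss_bits(var):
        """Differential entropy (bits) of a Gaussian with the given variance."""
        return 0.5 * np.log2(2 * np.pi * np.e * var)

    s1, s2, rho = 2.0, 0.5, 0.8              # illustrative standard deviations and correlation
    cov = np.array([[s1**2,      rho*s1*s2],
                    [rho*s1*s2,  s2**2    ]])

    # Joint entropy of a bivariate Gaussian: 1/2 log2((2*pi*e)^2 det(cov)).
    h_joint = 0.5 * np.log2((2 * np.pi * np.e) ** 2 * np.linalg.det(cov))

    # Chain rule pieces: h(X1) plus h(X2|X1), with Var(X2|X1) = s2^2 (1 - rho^2).
    h_x1 = h_gauss_bits(s1**2)
    h_x2_given_x1 = h_gauss_bits(s2**2 * (1 - rho**2))

    print(h_joint, h_x1 + h_x2_given_x1)     # the two quantities coincide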
15.3.2 Differential Entropy and Divergence
What about divergence? As a refresher, we defined the divergence between two discrete
distributions p and q as
D(p\|q) \triangleq E\left[\log \frac{p(X)}{q(X)}\right] \quad \text{for } X \sim p.
A very natural idea is to define divergence the same way in the continuous case, substituting
density functions into the appropriate places:
D(f\|g) \triangleq E\left[\log \frac{f(X)}{g(X)}\right] \quad \text{for } X \sim f.
One of the most important properties of relative entropy in the discrete case is non-negativity,
which we leveraged a number of times in proving other bounds. Thus, we would like to know
if the same result holds for relative differential entropy, and the answer turns out to be
affirmative.
Theorem 2 (Cover & Thomas, page 252).
D(f ||g) ≥ 0.
Proof. The proof will be very similar to the discrete version we saw in an earlier lecture. Let
f and g be two probability density functions and let S denote the support of f . Then,
D(f\|g) = E\left[-\log \frac{g(X)}{f(X)}\right]
\geq -\log E\left[\frac{g(X)}{f(X)}\right] \qquad \text{(Jensen's inequality, since } -\log \text{ is convex)}
= -\log \int_S f(x)\, \frac{g(x)}{f(x)}\, dx
= -\log \int_S g(x)\, dx
\geq -\log 1 = 0,

where the last inequality holds because \int_S g(x)\, dx \leq 1.
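To illustrate Theorem 2 numerically (a sketch with arbitrarily chosen Gaussians, using SciPy, not part of the original proof), one can estimate D(f||g) by Monte Carlo and compare it with the known closed form for a pair of Gaussian densities; both values are nonnegative, as the theorem guarantees.

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)
    n = 500_000

    # Two densities: f = N(0, 1) and g = N(1, 2^2).
    f = norm(loc=0.0, scale=1.0)
    g = norm(loc=1.0, scale=2.0)

    # Monte Carlo estimate of D(f||g) = E_f[log f(X)/g(X)]  (in nats).
    x = f.rvs(size=n, random_state=rng)
    d_mc = np.mean(f.logpdf(x) - g.logpdf(x))

    # Closed form for two Gaussians, as a cross-check:
    # log(s_g/s_f) + (s_f^2 + (m_f - m_g)^2) / (2 s_g^2) - 1/2.
    d_exact = np.log(2.0 / 1.0) + (1.0 + 1.0) / (2 * 4.0) - 0.5

    print(d_mc, d_exact)   # both roughly 0.443, and nonnegative as promised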
Corollary 1. For continuous random variables X and Y , I(X; Y ) ≥ 0.
Proof. Recall that we can write mutual information as a divergence between a joint density
function and the product of the marginal densities. Therefore,
I(X;Y) = D\bigl(f_{X,Y}(x, y) \,\|\, f_X(x) \cdot f_Y(y)\bigr) \geq 0

by Theorem 2.
Corollary 2. For continuous random variables X and Y, h(X|Y) \leq h(X).
Proof. Since I(X;Y) = h(X) - h(X|Y) \geq 0 by Corollary 1, we have h(X|Y) \leq h(X). Note that h(X|Y) = h(X) if and only if X and Y are independent.
15.3.3 Concavity of Differential Entropy
Theorem 3. h(X) is concave in the distribution of X.
Proof. Let X_1 and X_2 be continuous random variables with densities f_1 and f_2. Now, let's define an auxiliary random variable Y \sim \mathrm{Bern}(\lambda), independent of (X_1, X_2), and let

X = \begin{cases} X_1 & \text{if } Y = 1 \\ X_2 & \text{if } Y = 0, \end{cases}

so that X has the mixture density \lambda f_1 + (1 - \lambda) f_2. Thus, by Corollary 2, we have

h(X) \geq h(X|Y)
     = h(X|Y = 1)\, P(Y = 1) + h(X|Y = 0)\, P(Y = 0)
     = \lambda h(X_1) + (1 - \lambda) h(X_2).
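Concavity can also be checked numerically. The sketch below (illustrative components and mixing weight, using SciPy for quadrature; not part of the notes) computes the differential entropy of a two-component Gaussian mixture by numerical integration and compares it with the convex combination of the component entropies; the mixture entropy is the larger quantity, as the theorem asserts.

    import numpy as np
    from scipy.stats import norm
    from scipy.integrate import quad

    lam = 0.3
    f1, f2 = norm(-2.0, 1.0), norm(3.0, 2.0)     # two component densities

    def mix_pdf(x):
        return lam * f1.pdf(x) + (1 - lam) * f2.pdf(x)

    # h(mixture) = -integral of p(x) log p(x) dx, by numerical quadrature (nats).
    h_mix, _ = quad(lambda x: -mix_pdf(x) * np.log(mix_pdf(x)), -30, 30, limit=200)

    # Closed-form component entropies: h(N(m, s^2)) = 1/2 ln(2*pi*e*s^2).
    h1 = 0.5 * np.log(2 * np.pi * np.e * 1.0**2)
    h2 = 0.5 * np.log(2 * np.pi * np.e * 2.0**2)

    print(h_mix, lam * h1 + (1 - lam) * h2)      # h_mix is the larger of the two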
15.4 Entropy Maximization: Discrete Case
We now transition to entropy maximization. In this section, we review a result that we have
seen before regarding the discrete distribution that maximizes entropy. Let X be a discrete
random variable on the support set {1, 2, . . . , k}. Amongst all discrete distributions p of X
on {1, 2, . . . , k}, which one has maximum entropy?
As we’ve seen, the answer is the discrete uniform distribution U (1, 2, . . . , k). In a previous
homework, we showed this result using concavity and label invariance. In this section, we
will present another argument using the definition of entropy directly – a line of reasoning
we will use again later in this lecture when finding the entropy maximizing distribution in
the continuous case.
Theorem 4. For a given finite alphabet, the uniform distribution achieves the maximum entropy.
Proof. Let U denote the discrete uniform distribution on {1, 2, . . . , k} as defined above and
let p denote an arbitrary discrete distribution on X. By the non-negativity of relative
entropy, we note that
D(p||U ) ≥ 0.
Thus,
D(p\|U) = \sum_x p(x) \log \frac{p(x)}{U(x)} = -H(X) - \sum_x p(x) \log U(x).
Now, note that U(x) = \frac{1}{k} for all x \in \{1, 2, \ldots, k\}. Thus, \log U(x) is in fact a constant,
so we can pull it out of the above summation and write
D(p\|U) = -H(X) - \log U(x) \sum_x p(x).
Since p is a probability distribution, we observe that \sum_x p(x) = 1. Therefore, we can substitute
any other expression equal to 1 in place of \sum_x p(x). The key insight is to substitute in \sum_x U(x),
which is also equal to 1 since U is also a probability distribution. Applying this substitution
gives us

D(p\|U) = -H(X) - \log U(x) \sum_x U(x) = -H(X) - \sum_x U(x) \log U(x) = -H(X) + H(U)
where we define H(U ) to be the entropy of a uniformly distributed random variable. Up to
this point, we have shown that
D(p||U ) = −H(X) + H(U )
and using the fact that D(p||U ) ≥ 0 we can conclude that
H(U ) ≥ H(X).
Since our choice of a probability distribution on X was arbitrary, this completes the proof.
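A quick empirical illustration of Theorem 4 (a small Python sketch; the alphabet size and the randomly drawn distributions are arbitrary): the entropy of the uniform distribution on k symbols is log k bits, and every other distribution falls at or below it.

    import numpy as np

    rng = np.random.default_rng(0)
    k = 8

    def entropy_bits(p):
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    # Entropy of the uniform distribution on {1, ..., k} is log2(k).
    h_uniform = entropy_bits(np.full(k, 1.0 / k))

    # Entropies of a few randomly drawn distributions on the same alphabet.
    h_random = [entropy_bits(p) for p in rng.dirichlet(np.ones(k), size=5)]

    print(h_uniform)   # log2(8) = 3 bits
    print(h_random)    # every value is at most 3 bits, as Theorem 4 states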
15.5 Entropy Maximization: Continuous Case
Let’s now ask the same question for the continuous setting. That is, we would like to solve
the optimization problem
\max_f h(X)
for a continuous random variable X. After a bit of thought, one realizes that this problem
is not very well formulated since the differential entropy can become arbitrarily large. In
particular, if we take the density function f to be uniform on longer and longer intervals,
the differential entropy will approach ∞. To remedy this issue, we introduce an additional
constraint. Intuitively, the problem we ran into with our first attempt at formulating the
optimization problem is that we could keep “spreading out” a uniform distribution over
longer and longer intervals, which thereby increases the differential entropy. We observe that
as we stretch a uniform distribution over longer intervals, the variance of the distribution
increases as well. Thus, if we introduce a constraint that could control the variance of the
density function, then we might be able to circumvent the problem. To that end, we will
introduce a second moment constraint. Since the second moment E[X 2 ] is intimately tied to
variance, this will help us rectify our earlier issue. Our new optimization problem is now
\max_f h(X) \quad \text{subject to} \quad E[X^2] = \alpha
for some constant α. Now, we are ready to state and prove the main result of this section.
Theorem 5. The Gaussian distribution achieves maximum differential entropy subject to
the second moment constraint.
Proof. We'll follow a similar outline to our proof that the uniform distribution achieves
maximum entropy in the discrete case. As we did previously, let's start with divergence. Let
φ(x) denote the Gaussian density with mean 0 and variance α, where α is the value fixed by
our second moment constraint; that is,

\phi(x) = \frac{1}{\sqrt{2\pi\alpha}}\, e^{-x^2/(2\alpha)}.

Let f denote an arbitrary density function on X satisfying E[X^2] = \alpha. The divergence is then
D(f\|\phi) = E\left[\log \frac{f(X)}{\phi(X)}\right], \quad X \sim f.
Using the definition of differential entropy this becomes
D(f\|\phi) = -h(X) + E\left[\log \frac{1}{\phi(X)}\right] = -h(X) + E\left[\log\left(\sqrt{2\pi\alpha}\, e^{X^2/(2\alpha)}\right)\right].
Simplifying the second term of the above sum gives us
D(f\|\phi) = -h(X) + \log\sqrt{2\pi\alpha} + \frac{\log e}{2\alpha}\, E\left[X^2\right].
Observe that the expectation E[X^2] appearing above is pinned down by our second moment
constraint: it equals α for every feasible f. Using the same trick we employed in the discrete
case proof, we can therefore change the distribution without changing this term, as long as
we preserve the second moment. Thus, we will replace f by the Gaussian density with the
same second moment. To make this substitution explicit, let X_G denote a Gaussian random
variable with mean zero and variance α, whose density is exactly φ. With this substitution,
observe that
E\left[\log \frac{1}{\phi(X)}\right] = E\left[\log \frac{1}{\phi(X_G)}\right] = h(X_G).
Therefore, we see that

D(f\|\phi) = -h(X) + h(X_G) \geq 0

by the non-negativity of divergence, and hence

h(X_G) \geq h(X).
Remark: Also note that the Gaussian distribution is essentially the unique differential en-
tropy optimizer. Slightly more formally, if f maximizes the differential entropy subject to the
second moment constraint, then f is equal to the Gaussian distribution almost everywhere.
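To see Theorem 5 in action numerically (a sketch using standard closed forms; the value of alpha is arbitrary and the snippet is not part of the notes), one can compare three zero-mean distributions with the same second moment α: the Gaussian, the Laplace, and the uniform. The Gaussian has the largest differential entropy.

    import numpy as np

    alpha = 1.0   # common second moment E[X^2]; all three distributions are zero mean

    # Gaussian N(0, alpha): h = 1/2 ln(2*pi*e*alpha)   (nats).
    h_gauss = 0.5 * np.log(2 * np.pi * np.e * alpha)

    # Laplace with variance alpha has scale b with 2 b^2 = alpha; h = 1 + ln(2b).
    b = np.sqrt(alpha / 2)
    h_laplace = 1 + np.log(2 * b)

    # Uniform on [-c, c] with variance alpha has c = sqrt(3*alpha); h = ln(2c).
    c = np.sqrt(3 * alpha)
    h_uniform = np.log(2 * c)

    print(h_gauss, h_laplace, h_uniform)   # ~1.419 > ~1.347 > ~1.242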
15.6 The Gaussian Channel Revisited
With our understanding of maximum entropy distributions, let’s turn our attention back to
the communication problem and conclude this lecture by proving the result stated during
the previous class that the capacity of a Gaussian channel is C = \frac{1}{2}\log\left(1 + \frac{P}{\sigma^2}\right).
Recall the setup of the Gaussian channel. For each input X_i, the channel outputs Y_i = X_i + Z_i,
where the noise Z_i \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2) is independent of X. Furthermore, we impose a power
constraint for block length n:

\frac{1}{n}\, E\left[\sum_{i=1}^{n} X_i^2\right] \leq P.

The key insight we now make is that this power constraint is nothing more than the second moment constraint that we worked with
in the previous section. If we formulate the channel capacity as an optimization problem as
we have done before, we have
\max_{f_X} I(X; Y) \quad \text{subject to} \quad E[X^2] = P.
Expanding the definition of mutual information, we have
I(X; Y ) = h(Y ) − h(Y |X) = h(Y ) − h(Z)
where the last equality holds because h(Y|X) = h(X + Z|X) = h(Z|X) = h(Z), using the independence of Z and X. Thus, we can
rewrite our optimization problem as
\max_{f_X} h(Y) \quad \text{subject to} \quad E[X^2] = P,
which we can rewrite as
\max_{f_X} h(X + Z) \quad \text{subject to} \quad E[(X + Z)^2] = P + \sigma^2,

where we used E[(X + Z)^2] = E[X^2] + E[Z^2] = P + \sigma^2, since Z has zero mean and is independent of X.
From the result of the previous section, we know that the maximum entropy is achieved when
X + Z is normally distributed. Since the sum of two independent Gaussians is Gaussian, we can
achieve this by setting X \sim N(0, P) (recall that Z is Gaussian by assumption). As before,
let X_G denote the random variable X under this Gaussian distribution. Applying this result
to the mutual information between X and Y, we have
to the mutual information between X and Y we have
I(X_G; Y) = h(X_G + Z) - h(Z) = \frac{1}{2}\log\bigl(2\pi e\,(P + \sigma^2)\bigr) - \frac{1}{2}\log\bigl(2\pi e\,\sigma^2\bigr)
which simplifies to
\frac{1}{2}\log\left(1 + \frac{P}{\sigma^2}\right).
Therefore, the capacity of the Gaussian channel is
C = \max_{f_X:\, E[X^2] \leq P} I(X; Y) = \frac{1}{2}\log\left(1 + \frac{P}{\sigma^2}\right)
as we stated last lecture.
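For reference, the capacity formula is easy to evaluate numerically. The following minimal sketch (arbitrarily chosen signal-to-noise ratios; the helper name is illustrative, not standard) prints C for a few values of P/σ².

    import numpy as np

    def gaussian_capacity_bits(P, sigma2):
        """C = 1/2 log2(1 + P / sigma^2), in bits per channel use."""
        return 0.5 * np.log2(1 + P / sigma2)

    # Capacity at a few signal-to-noise ratios P / sigma^2.
    for P, sigma2 in [(1.0, 1.0), (10.0, 1.0), (100.0, 1.0)]:
        print(P / sigma2, gaussian_capacity_bits(P, sigma2))
    # SNR 1 -> 0.5 bits/use, SNR 10 -> ~1.73, SNR 100 -> ~3.33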