0% found this document useful (0 votes)

63 views54 pages

Informations Channels For Coding Theory

Chapter 2 discusses information channels, which are essential for transmitting information through various media. It covers the concepts of source coding and channel coding, the impact of noise on data transmission, and introduces the definitions and properties of information channels, including binary symmetric and binary erasure channels. The chapter also explains mutual information and how to calculate output probabilities based on input characteristics and channel matrices.

Uploaded by

zak zak

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

63 views54 pages

Informations Channels For Coding Theory

Uploaded by

zak zak

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

Chapter 2

Information Channels

2.1 What Are Information Channels?

Information does not only have to be used or stored, it also has to be transmitted. In
communication systems a transmitter converts, or encodes, a message to a form suit-
able for transmission through a communications medium, be it a fibre optic channel,
satellite link or radio signal through space. The receiver then detects the transmitted
signal and decodes it back to the original message. The encoding and decoding oper-
ations will be discussed in the following chapters. Although we will be dealing with
digital data, physical transmission and storage has to deal with analog signals and
media. Thus digital data has to be modulated in some way prior to transmission and
storage, and detected by the receiver to reproduce the original digital sequence. Dis-
cussion of modern analog and digital communication systems is beyond the scope
of this book and interested readers should refer to the standard textbooks in the area,
for example [7].
As we will see, two forms of coding will be necessary. Source coding is required to
efficiently store and transmit the information generated by a general-purpose source
(e.g., ASCII text, images, audio, etc.) on the storage or transmission medium in use
(e.g., computer disk or digital data networks, both requiring conversion to binary
data). Channel coding is required to mitigate the effects of noise that may corrupt
data during storage and transmission through physical systems. What is noise? Noise
can be defined as any unwanted signal or effect in addition to the desired signal. As
such there can be many causes of noise: interference from other signals, thermal and
spurious effects generated by the electronic device being used to store or transmit the
signal, environmental disturbances, etc.
In Chapter 1 we were concerned with defining the information content of a source.
In this chapter we will be concerned with the equally important problem of defining
or measuring the information carrying capacity of a channel. Intuitively a channel
can carry no more information than is being pushed through the channel itself, that
is, the entropy of the source or message being transmitted through the channel. But
as we will see the presence of noise reduces the information carrying capacity of
the channel and leads us to define another important quantity (alongside entropy)
called mutual information. As we view the channel from an information theoretic

51
52 Fundamentals of Information Theory and Coding Design

point of view not surprisingly the concept of mutual information has uses beyond the
information carrying capacity of a channel (as does entropy beyond measuring the
information content of a source).
The following assumptions will apply in our modelling of an information channel:

Stationary The statistical nature of the channel and noise do not change with time

Memoryless The behaviour of the channel and the effect of the noise at time t will
not depend on the behaviour of the channel or the effect of noise at any previ-
ous time

We now formally deﬁne a mathematical structure for an information channel.

DEFINITION 2.1 Information Channel An information channel is a triple

, where is the input alphabet, is the output alphabet and is the
set of channel probabilities. is a discrete set of
symbols (where is the size of the input alphabet) , and
is a discrete set of symbols. The transmission behaviour of the channel is
described by the probabilities in ,
where is the probability that the output symbol will be received if the
input symbol is transmitted.

NOTE The input alphabet represents the symbols transmitted into the channel and
the output alphabet represents the symbols received from the channel. The deﬁnition
of the channel implies that the input and output symbols may be different. In real-
ity, one would expect that the received symbols are the same as those transmitted.
However the effect of noise may introduce “new” symbols and thus we use different
input and output alphabets to cater for such cases. For more general applications the
channel models the matching of the input symbols to prescribed output symbols or
classes which are usually different. In statistical applications the input and output
symbols arise from two random variables and the channel models the joint relationship
between the variables.

The conditional probabilities that describe an information channel can be represented

conveniently using a matrix representation:

½ ½ ¾ ½ ½
½ ¾ ¾ ¾ ¾

(2.1)
.. .. .. ..
. . . .
½ ¾
Information Channels 53

where is the channel matrix and for notational convenience we may sometimes
rewrite this as:
½½ ½

.. .. .. .. (2.2)
. . . .

where we have deﬁned .

A graphical representation of an information channel is given in Figure 2.1.

¾

¾

..
.
..
.
..
.

FIGURE 2.1
Graphical representation of an information channel.

The channel matrix exhibits the following properties and structure:

Each row of contains the probabilities of all possible outputs from the same
input to the channel
Each column of contains the probabilities of all possible inputs to a partic-
ular output from the channel
If we transmit the symbol we must receive an output symbol with probabil-
ity 1, that is:

(2.3)

that is, the probability terms in each row must sum to 1.

EXAMPLE 2.1
Consider a binary source and channel with input alphabet and output alphabet
.
54 Fundamentals of Information Theory and Coding Design

Noiseless: If the channel is noiseless there will be no error in transmission,the channel
matrix is given by

and the channel is

1.0
0 0
A B
1 1
1.0

A typical input-output sequence from this channel could be:

input: 0 1 1 0 0 1 0 1 1 0
output: 0 1 1 0 0 1 0 1 1 0

Noisy: Say the channel is noisy and introducesa bit inversion 1% of the time, then
the channel matrix is given by

and the channel is

0.99
0 0

0.01
A B
0.01

1 1
0.99

A typical input-output sequence from this channel could be:

input: 0 1 1 0 0 1 0 1 1 0
output: 0 1 1 0 0 1 1 1 1 0

2.2 BSC and BEC Channels

In digital communication systems the input to the channel will be the binary digits
0,1 and this set will be the input alphabet and, ideally, also be the output alphabet.
Furthermore, the effect of noise will not depend on the transmission pattern, that is,
Information Channels 55

the channel is assumed memoryless. Two possible scenarios on the effect of noise
are possible.
Ideally if there is no noise a transmitted 0 is detected by the receiver as a 0, and a
transmitted 1 is detected by the receiver as a 1. However in the presence of noise the
receiver may produce a different result.
The most common effect of noise is to force the detector to detect the wrong bit (bit
inversion), that is, a 0 is detected as a 1, and a 1 is detected as a 0. In this case the
information channel that arises is called a binary symmetric channel or BSC where
is the probability of error (also called
bit error probability, bit error rate (BER), or “crossover” probability) and the output
alphabet is also the set of binary digits . The parameter fully deﬁnes the be-
haviour of the channel. The BSC is an important channel for digital communication
systems as noise present in physical transmission media (ﬁbre optic cable, copper
wire, etc.) typically causes bit inversion errors in the receiver.

A BSC has channel matrix
where and is depicted in Figure

2.2.

p
0 0

q
A B
q

1 1
p

FIGURE 2.2
Binary symmetric channel.

Another effect that noise (or more usually, loss of signal) may have is to prevent
the receiver from deciding whether the symbol was a 0 or a 1. In this case the
output alphabet includes an additional symbol, , called the “erasure” symbol that
denotes a bit that was not able to be detected. Thus for binary input , the output
alphabet consists of the three symbols, . This information channel is called
a binary erasure channel or BEC where
is the probability of error (also called the “erasure” probability). Strictly speaking
a BEC does not model the effect of bit inversion; thus a transmitted bit is either
received correctly (with probability ) or is received as an “erasure” (with
probability ). A BEC is becoming an increasingly important model for wireless
mobile and satellite communication channels, which suffer mainly from dropouts
and loss of signal leading to the receiver failing to detect any signal.
56 Fundamentals of Information Theory and Coding Design

A BEC has channel matrix

and is depicted in Figure 2.3.

p
0 0
q
A ? B
q
1 1
p

FIGURE 2.3
Binary erasure channel.

2.3 Mutual Information

To fully specify the behaviour of an information channel it is necessary to specify
the characteristics of the input as well as the channel matrix. We will assume that
the input characteristics are described by a probability distribution over the input
alphabet, with denoting the probability of symbol being input to the chan-
nel. Then a channel will be fully speciﬁed if the input source probabilities,
½ , and channel probabilities,
, are
given.
If a channel is fully speciﬁed then the output probabilities,
, can be calculated by:

(2.4)

The probabilities are termed the forward probabilities where forward in-
dicates that the direction of channel use is with input symbol being transmitted
and output symbol being received (i.e., then or given ). We can simi-
larly deﬁne the backward probabilities as indicating the channel is running
backwards: output symbol occurs ﬁrst followed by input symbol (i.e., then
, or given ). The backward probabilities can be calculated by application of
Bayes’ Theorem as follows:

(2.5)
Information Channels 57

where is the joint probability of and .

EXAMPLE 2.2
Consider the binary information channel fully speciﬁed by:

(2.6)

which is usually represented diagrammatically as:

3/4
2/3 0 0

1/8
A B
1/4

1/3 1 1
7/8

The output probabilities are calculated as follows:

and the backward probabilities by:

Conceptually we can characterise the probabilities as a priori if they provide the

probability assignment before the channel is used (without any knowledge), and as a
posteriori if the probability assignment is provided after the channel is used (given
knowledge of the channel response). Speciﬁcally:
58 Fundamentals of Information Theory and Coding Design

a priori probability of output symbol if we do not know which input symbol

was sent

a posteriori probability of output symbol if we know that input symbol

was sent
a priori probability of input symbol if we do not know which output symbol
was received

a posteriori probability of input symbol if we know that output symbol

was received

We can similarly refer to the a priori entropy of A:

(2.7)
¾

as the average uncertainty we have about the input before the channel output is ob-
served and the a posteriori entropy of A given :

(2.8)
¾

as the average uncertainty we have about the input after the channel output is
observed.
How does our average uncertainty about the input change after observing the output
of the channel? Intuitively, we expect our uncertainty to be reduced as the channel
output provides us with knowledge and knowledge reduces uncertainty. However, as
we will see in the following example the output can sometimes increase our uncer-
tainty (i.e., be more of a hindrance than a help!).

EXAMPLE 2.3
Consider the binary information channel from Example 2.2:

3/4
2/3 0 0

1/8
A B
1/4

1/3 1 1
7/8
Information Channels 59

What is our uncertainty of the input that is transmitted through the channel before we
observe an output from the channel? This is provided by the entropy based on the
given a priori input probabilities, and , yielding the a
priori entropy of A, .
What is our uncertainty of the input that is transmitted through the channel after we
observe an output from the channel? Say we observe an output of , then the
a posteriori input probabilities, and
,
yield an a posteriori entropy of A, Thus we reduce our
uncertainty once we observe an output of . But what if we observe an output of
? Then the a posteriori input probabilities become and
and the a posteriori entropy of A, . Our
uncertainty is in fact increased! The reason is that our high expectation of an input
of 0, from the a priori probability , is negated by receiving an output
of 1. Thus is closer to equi-probable than and
this increases the a posteriori entropy.
Notwithstanding the fact that even though
we can show that if we average across all possible outputs the channel will
indeed reduce our uncertainty, that is:

and thus .

The average of the a posterior entropies of A, , calculated in Example 2.3 is

sometimes referred to as the equivocation of A with respect to B where equivocation
is used to refer to the fact that measures the amount of uncertainty or equiv-
ocation we have about the input when observing the output . Together with the
a priori entropy of , , we can now establish a measure of how well a channel
transmits information from the input to the output. To derive this quantity consider
the following interpretations:

average uncertainty (or surprise) of the input to the channel before observing
the channel output;

average uncertainty (or equivocation) of the input to the channel after ob-
serving the channel output;

reduction in the average uncertainty of the input to the channel

provided or resolved by the channel.
60 Fundamentals of Information Theory and Coding Design

DEFINITION 2.2 Mutual Information For input alphabet and output alpha-
bet the term
(2.9)
is the mutual information between and

The mutual information, , indicates the information about , , that is

provided by the channel minus the degradation from the equivocation or uncertainty,
. The can be construed as a measure of the “noise” in the channel
since the noise directly contributes to the amount of uncertainty we have about the
channel input, , given the channel output .
Consider the following cases:

noisefree If this implies which means the channel

is able to provide all the information there is about the input, i.e., . This
is the best the channel will ever be able to do.
noisy If but then the channel is noisy and the input
information, , is reduced by the noise, , so that the channel is
only able to provide amount of information about
the input.
ambiguous If the amount of noise totally masks the contribution
of the channel and the channel provides information about the
input. In other words the channel is useless and is no better than if the channel
was not there at all and the outputs were produced independently of the inputs!

An alternative expression to Equation 2.9 for the mutual information can be derived
as follows:

¾ ¾ ¾

¾ ¾ ¾ ¾
(2.10)
¾ ¾
Using Equation 2.5 the mutual information can be expressed more compactly:
RESULT 2.1 Alternative expressions for Mutual Information

¾ ¾ ¾ ¾
(2.11)
Information Channels 61

2.3.1 Importance of Mutual Information

The mutual information has been deﬁned in the context of measuring the information
carrying capacity of communication channels. However the concept of mutual infor-
mation has had far-reaching effects on solving difﬁcult estimation and data analysis
problems in biomedical applications [8], image processing and signal processing. In
these applications the key in using mutual information is that it provides a measure
of the independence between two random variables or distributions.
In image processing [12] and speech recognition [9] the use of the maximum mutual
information or MMI between the observed data and available models has yielded
powerful strategies for training the models based on the data in a discriminative
fashion. In signal processing for communication systems the idea of minimising the
mutual information between the vector components for separating mutually interfer-
ing signals [3] has led to the creation of a new area of research for signal separation
based on the idea of independent component analysis or ICA .

2.3.2 Properties of the Mutual Information

From Example 2.3 we saw that for speciﬁc values of the output alphabet, , either
or but when we averaged over the output
alphabet then . This implies that for this case. But
what can we say about for other cases? We restate the following result from
Chapter 1:

(2.12)

for two sources of size with symbol probabilities, and , respectively, and
equality only if for . Let and
and Then from Equations 2.11 and 2.12 we get:

¾ ¾
(2.13)
¾ ¾
That is:

RESULT 2.2
The mutual information is a non-negative quantity:

(2.14)

with if and only if , i.e., the input and

output alphabets are statistically independent.
62 Fundamentals of Information Theory and Coding Design

The expression for provided by Equation 2.11 is symmetric in the variables

and . Thus by exchanging and we get the following result:

(2.15)

or
(2.16)

NOTE The information the channel provides about upon observing is the
same as the information the channel provides about upon noting was sent.

The term is sometimes referred to as the equivocation of B with respect to

A and measures the uncertainty or equivocation we have about the output given
the input .

RESULT 2.3
From the fact that it can be stated that, in general and on
average, uncertainty is decreased when we know something, that is:

(2.17)

Other expressions and relationships between the entropies, , the equiv-
ocation, , the joint entropy, and the mutual information,
, can be derived. All these relations can be summarised by the
Venn diagram of Figure 2.4. From Figure 2.4 additional expressions involving the

FIGURE 2.4
Relationship between all entropy and mutual information expressions.
Information Channels 63

joint entropy can be derived:

(2.18)

The relation can be stated conceptually as:

“The total uncertainty in both and is the sum of the uncertainties in

and minus the information provided by the channel ”
whereas the relation becomes:
“The total uncertainty in both and is the sum of the uncertainty in
plus the remaining uncertainty in after we are given .”

REMARK 2.1 Given the input probabilities,

, and channel matrix, , only
three quantities need to be calculated directly from the individual probabilities to
completely determine the Venn diagram of Figure 2.4 and the remaining three can be
derived from the existing entropy expressions.

EXAMPLE 2.4
From Example 2.3 we calculated:

and from Example 2.2 we can calculate:

The quantities , and completely determine the Venn diagram
and the remaining quantities can be derived by:

64 Fundamentals of Information Theory and Coding Design

The mutual information of a BSC can be expressed algebraically by deﬁning:

and hence:

Then the output probabilities are given by:

A sketch of the BSC showing both input and output probabilities is given in Figure
2.5.

p
0 0
q
A B
q
1 1
p

FIGURE 2.5
Mutual information of a BSC.

The expression for is:

(2.19)

The simpliﬁed expression for is:

¾ ¾

(2.20)

Information Channels 65

and the mutual information is: .

EXAMPLE 2.5
What is the mutual information of a BSC for the following cases?

1. : The channel operates in an ambiguous manner since the errors

are as likely as no errors, the output symbols are equally likely, no
matter what is happening at the input (since ), the equivocation is
and the mutual information is .

2. : The channel is noisefree,

½ ½,

and the mutual information is

3. : The source exhibits maximum entropy with , the output

entropy also exhibits maximum entropy with (since ) and
½
the mutual information is given by . ½

4. : The source contains no information since , the output

entropy, , is the same as the channel uncertainty, , and the mutual
information is since there is no information to transmit.

The mutual information of a BEC can be similarly expressed algebraically. Adopting

the same notation used for the BSC the output probabilities of the BEC are given by:

A sketch of the BEC showing both input and output probabilities is given in Figure
2.6.
The expression for is:

(2.21)

The simpliﬁed expression for reduces to the same expression as Equation

2.20:

(2.22)

and the mutual information is: .

66 Fundamentals of Information Theory and Coding Design
p
0 0
q
A ? B
q
1 1
p

FIGURE 2.6
Mutual information of a BEC.

2.4 Noiseless and Deterministic Channels

We have already discussed the BSC and BEC structures as important channel models
for describing modern digital communication systems. We have also loosely referred
to channels as being noisefree, noisy and ambiguous. We now formally deﬁne noise-
less channels as those channels that are not subject to the effects of noise (i.e., no
uncertainty of the input that was transmitted given the output that was received). We
also deﬁne a dual class of channels called deterministic channels for which we can
determine the output that will be received given the input that was transmitted.

2.4.1 Noiseless Channels

A noiseless channel will have either the same or possibly more output symbols than
input symbols and is such that there is no noise, ambiguity, or uncertainty of which
input caused the output.

DEFINITION 2.3 Noiseless Channel A channel in which there are at least

as many output symbols as input symbols, but in which each of the output symbols
can be produced by the occurrence only of a particular one of the input symbols
is called a noiseless channel. The channel matrix of a noiseless channel has the
property that there is one, and only one, non-zero element in each column.

EXAMPLE 2.6
The following channel with 6 outputs and 3 inputs is noiseless because we know, with
certainty 1, which input symbol, ½ , was transmitted given knowledge of
the received output symbol, .
Information Channels 67

1/3

2/3

1/4
1/2

1/4

1

The corresponding channel matrix is given by:

and we note that there is only one non-zero element in each column.

What can we say about the mutual information through a noiseless channel? Let
be the received output. Then, from the deﬁnition of a noiseless channel, we know
which input, say , was transmitted and we know this with certainty 1. That is,
for and hence for all other . The equivocation
becomes:

¾
¾

¾
¾

since:

and hence ¾

½

.
The mutual information is then given by the following result.
68 Fundamentals of Information Theory and Coding Design

RESULT 2.4
The mutual information for noiseless channels is given by:

(2.23)

That is, the amount of information provided by the channel is the same as the
information sent through the channel.

2.4.2 Deterministic Channels

A deterministic channel will have either the same or possibly more input symbols
than output symbols and is such that we can determine which output symbol will be
received when a particular input symbol is transmitted.

DEFINITION 2.4 Deterministic Channel A channel in which there are at least

as many input symbols as output symbols, but in which each of the input symbols
is capable of producing only one of the output symbols is called a deterministic
channel. The channel matrix of a deterministic channel has the property that there
is one, and only one, non-zero element in each row, and since the entries along each
row must sum to 1, that non-zero element is equal to 1.

EXAMPLE 2.7

The following channel with 3 outputs and 6 inputs is deterministic because we know,
with certainty 1, which output symbol, ½ , will be received given knowledge
of the transmitted input symbol, .

1
1

1

1

1

1
Information Channels 69

The corresponding channel matrix is given by:

and we note that there is only one nonzero element in each row and that the element
is 1.

What is the mutual information of a deterministic channel? Let be the transmitted

input symbol. Then, from the deﬁnition of a deterministic channel, we know that
the received output will be, say, and we know this with certainty 1. That is,
for and hence for all other . The equivocation
then becomes:

¾
¾

¾
¾

since ¾ ½

.
The mutual information is given by the following result.

RESULT 2.5
The mutual information for deterministic channels is given by:

(2.24)

That is, the amount of information provided by the channel is the same as the
information produced by the channel output.

2.5 Cascaded Channels

In most typical cases of information transmission and storage the data are passed
through a cascade of different channels rather than through just the one channel.
70 Fundamentals of Information Theory and Coding Design

One example of where this happens is in modern data communication systems where
data can be sent through different physical transmission media links (e.g., copper
wire, optical ﬁbre) and wireless media links (e.g., satellite, radio) from transmitter
to receiver. Each of these links can be modelled as an independent channel and
the complete transmission path as a cascade of such channels. What happens to
the information when passed through a cascade of channels as compared to a single
channel only?
Intuitively we would expect additive loss of information arising from the cumulative
effects of uncertainty (or equivocation) from each channel in the cascade. Only for
the case of a noiseless channel would we expect no additional loss of information in
passing data through that channel. We verify these and other results when, without
loss of generality, we derive the mutual information of a cascade of two channels and
compare this with the mutual information through the ﬁrst channel.
Consider a pair of channels in cascade as shown in Figure 2.7. In Figure 2.7 the

A B C
Channel AB Channel BC

r− symbol s− symbol t− symbol

alphabet alphabet alphabet

FIGURE 2.7
Cascade of two channels.

output of channel is connected to the input of channel . Thus the symbol

alphabet of size is both the output from channel and the input to channel
. Say the input symbol is transmitted through channel and this produces
as the output from channel . Then forms the input to channel which,
in turn, produces as the output from channel . The output depends solely
on , not on . Thus we can deﬁne a cascade of channels as occurring when the
following condition holds:
(2.25)

Similarly we can also state that the following will also be true for a cascade of chan-
nels:
(2.26)

The problem of interest is comparing the mutual information through channel

only, , with the mutual information through the cascade of channel with
Information Channels 71

channel , that is, . To this end we ﬁrst show that

as follows:

¾ ¾

¾ ¾
(2.27)
¾ ¾ ¾

Equation 2.26 gives , noting that ,

and given that ½ with equality when we can now state:

¾ ¾ ¾

¾ ¾ ¾ ¾

(2.28)
¾ ¾

such that .

with equality when
Since and we have the
following result.

RESULT 2.6
For the cascade of channel with channel it is true that:

(2.29)
That is, channels tend to leak information and the amount of information out of a
cascade can be no greater (and is usually less) than the information from the ﬁrst
channel.

CLAIM 2.1
If channel is noiseless then

PROOF For noiseless channel if then this implies that .

For the cascade of channel with the condition for when
72 Fundamentals of Information Theory and Coding Design

is From Bayes’ Theorem we can show that:

P (2.30)
¾
For a cascade we know that and since when
, and otherwise, for noiseless channel this gives the required result
that:
(2.31)
and hence .

The converse of Claim 2.1, however, is not true since, surprisingly, there may exist
particular channel combinations and input distributions which give rise to
even if channel is not noiseless. The following example illustrates this
point.

EXAMPLE 2.8
Consider the cascade of channel :
1/2

1/2 1/3
A B
1/3

1/3

with channel :
3/4
1/4
1
B C
1/4
3/4

The corresponding channel matrices for and are:

Obviously channel is not noiseless. Nevertheless it is true that:

by virtue of the fact that the channel matrix for the cascaded channel :

is identical to the channel matrix for channel !

2.6 Additivity of Mutual Information

In the previous section it was shown that when information channels are cascaded
there is a tendency for information loss, unless the channels are noiseless. Of par-
ticular interest to communication engineers is the problem of how to reduce infor-
mation loss, especially when confronted with transmission through a noisy channel.
The practical outcome of this is the development of channel codes for reliable trans-
mission (channel codes are discussed in Chapters 5 to 9). From a purely information
theoretic point of view channel coding represents a form of additivity of mutual infor-
mation. Additivity is achieved when we consider the average information provided
by the channel about a single input symbol upon observing a succession of output
symbols. Such a multiplicity of outputs can occur spatially or temporally. Spatial
multiplicity occurs when the same input is transmitted simultaneously through more
than one channel, thereby producing as many outputs as there are channels. Temporal
multiplicity occurs when the same input is repeatedly transmitted through the same
channel, thereby producing as many outputs as repeat transmissions. Both cases
minimise information loss by exploiting redundancy, either extra channel space or
extra time to repeat the same input. We now develop an expression for the mutual
information for the special case of two channel outputs (original plus repeat) and
show that there is a gain of mutual information over using the single original channel
output.
Consider the channel system shown in Figure 2.8 where is the input alphabet of
size , is the original output alphabet of size and is the repeat
output alphabet of size . The output can be construed as the output from
74 Fundamentals of Information Theory and Coding Design

s− symbol
alphabet

B
A
Channel

r− symbol
t− symbol
alphabet
alphabet

FIGURE 2.8
Channel system with two outputs.

the ﬁrst channel (if there are two physical channels) or ﬁrst transmission (if there
is one physical channel) and output can be construed as the output of the second
channel or second (repeat) transmission.
To develop an expression for the mutual information we extend our notion of a priori
and a posteriori probabilities and entropies as discussed in Section 2.3 to include the
contribution of a second output as follows:

: a priori probability of input symbol if we do not know which output sym-

bol was received

: a posteriori probability of input symbol if we know that output symbol

was received
:
a posteriori probability of input symbol if we know that both output
symbols, and , were received.

Thus we can deﬁne the following a posteriori entropy:

(2.32)
¾

the equivocation of with respect to and by:

(2.33)
¾ ¾

Information Channels 75

and the amount of information channels and provide about that is the
mutual information of and , by:

(2.34)

What can we say about , the amount of information

channel provides about , in comparison to , the amount of informa-
tion both channels and provide about ? Expand Equation 2.34 as follows:

(2.35)

where is the amount of information channel additionally provides

about after using channel . It can be shown that:
I (2.36)

Thus we have the following result.

RESULT 2.7
For a channel with input and dual outputs and it is true that:

(2.37)

That is, dual use of a channel provides more information than the single use of a
channel.

EXAMPLE 2.9
Consider a BSC where the input symbol, , is transmitted through the channel twice.
Let represent the output from the original transmission and let be the output
from the repeat transmission. Since the same channel is used:

For simplicity we assume that From Example 2.5 we

have that for :

76 Fundamentals of Information Theory and Coding Design

What is the expression for and how does it compare to ? A direct

extension of Equation 2.11 yields the following expression for :

(2.38)

¾ ¾ ¾

from which we can state that:

where we note that since the repeat output does not depend on the
original output. We list the contribution of each term in the expression of Equation
2.38 as shown by Table 2.1.

Table 2.1 Individual terms of

expression
Type

½ ¾ X
¾ ½ ¾ ¾
½ ¾ ¾ X
¾
½
¾ Z
½
¾ Z
½
¾ Z
½
¾ Z

½ ¾ Y
¾ ½ ¾ ¾

½ ¾ ¾ Y
¾

The Type column assigns a class label to each term. The type X terms represent the
case of no error in both the original or repeat outputs and these positively reinforce
the information provided by the dual use of the channel. Collecting the type X terms
we get:
½ ¾ ¾
¾ ¾ ¾
½ ½
¾ ¾ ¾ ¾
¾ ¾

The type Y terms represent the case of complete error in both the original and repeat
outputs and these negatively reinforce the information provided by the dual use of the
channel. Collecting the type Y terms we get:
½ ¾ ¾
¾ ¾ ¾
½ ½
¾ ¾ ¾ ¾
¾ ¾
Information Channels 77

The type Z terms, however, indicate contradictory original and repeat outputs which
effectively negate any information that the dual use of the channel may provide.
Collecting the type Z terms we see that these make no contribution to the mutual
½
information:

½¾
¾

Putting it all together we get the ﬁnal expression for :

¾ ¾
¾
¾
¾
¾ ¾ ¾

By plotting and for different values of we see from Figure
2.9 that . It is interesting to note the conditions for equality,

1
I(A;B)
0.9 I(A;BC)
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0
0 0.2 0.4 0.6 0.8 1
q (error)

FIGURE 2.9
Comparison of with .

. This happens when:

¯ , there is 100% no error, so

¯ , there is 100% bit reversal, so
¯ , there is total ambiguity, so
78 Fundamentals of Information Theory and Coding Design

2.7 Channel Capacity: Maximum Mutual Information

Consider an information channel with input alphabet , output alphabet and chan-
nel matrix with conditional channel probabilities . The mutual infor-
mation:
(2.39)
¾ ¾
which, if we now assume the logarithm is base 2, indicates the amount of information
the channel is able to carry in bits per symbol transmitted. The maximum amount
of information carrying capacity for the channel is , the amount of information
that is being transmitted through the channel. But this is reduced by , which
is an indication of the amount of “noise” present in the channel.
The expression for mutual information depends not only on the channel probabil-
ities, , which uniquely identify a channel, but also on how the channel is
used, the input or source probability assignment, . As such cannot
be used to provide a unique and comparative measure of the information carrying
capacity of a channel since it depends on how the channel is used. One solution is
to ensure that the same probability assignment (or input distribution) is used in cal-
culating the mutual information for different channels. The questions then is: which
probability assignment should be used? Obviously we can’t use an input distribu-
tion with since that means for whatever channel we use!
Intuitively an input distribution with maximum information content (i.e., maximum
) makes sense. Although this allows us to compare different channels the com-
parison will not be fair since another input distribution may permit certain channels
to exhibit a higher value of mutual information.
The usual solution is to allow the input distribution to vary for each channel and to
determine the input distribution that produces the maximum mutual information for
that channel. That is we attempt to calculate the maximum amount of information a
channel can carry in any single use (or source assignment) of that channel, and we
refer to this measure as the capacity of the channel.

DEFINITION 2.5 Channel Capacity The maximum average mutual informa-

tion, , in any single use of a channel deﬁnes the channel capacity. Mathe-
matically, the channel capacity, , is deﬁned as:

(2.40)

that is, the maximum mutual information over all possible input probability assign-
ments, .

The channel capacity has the following properties:

Information Channels 79

1. since
2. since and
when considering the expression
, and when considering the
expression

The calculation of involves maximisation of over indepen-

dent variables (the input probabilities, ) subject to the two
constraints:

1.
2.

This constrained maximisation of a non-linear function is not a trivial task. Methods

that can be used include:

standard constrained maximisation techniques like the method of Lagrangian

multipliers
gradient search algorithms
derivation for special cases (e.g., weakly symmetric channels)
the iterative algorithms developed by Arimoto [1] and Blahut [2]

2.7.1 Channel Capacity of a BSC

For a BSC, where is given by Equation 2.19

and is given by Equation 2.20. We want to examine how the mutual in-
formation varies with different uses of the same channel. For the same channel the
channel probabilities, and , remain constant. However, for different uses the input
probability, , varies from 0 to 1. From Figure 2.10 the maximum
mutual information occurs at .
the mutual information expression simpliﬁes to:
For

(2.41)

Since this represents the maximum possible mutual information we can then state:

(2.42)

How does the channel capacity vary for different error probabilities, ? From Figure
2.11 the following observations can be made:
80 Fundamentals of Information Theory and Coding Design

0.8

0.6
I(A;B)
0.4

0.2

0
0 0.2 0.4 0.6 0.8 1
w

FIGURE 2.10
Variation of with different source assignments for a typical BSC.

When or , that is, no error or 100% bit inversion, the BSC channel will
provide its maximum capacity of 1 bit

When the BSC channel is totally ambiguous or useless and has a
capacity of 0 bits

2.7.2 Channel Capacity of a BEC

For a BEC, where is given by Equation 2.21

and is given by Equation 2.22. The expression can be further simpliﬁed to:

(2.43)

from which we can immediately see that:

´µ

´µ
(2.44)

since we know that for the binary input alphabet, , that ´µ
½
occurs when ¾ . The following observations
can be made:
Information Channels 81

0.8

0.6
C
0.4

0.2

0
0 0.2 0.4 0.6 0.8 1
q (error)

FIGURE 2.11
Variation of channel capacity of a BSC with .

when , that is, no erasures, the BEC channel will provide its maximum
capacity of 1 bit

when , the BEC channel will be producing only erasures and have a
capacity of 0 bits

2.7.3 Channel Capacity of Weakly Symmetric Channels

The calculation of the channel capacity, , is generally quite involved since it rep-
resents a problem in constrained optimisation of a non-linear function. However
there is a special class of channels where the derivation of the channel capacity can
be stated explicitly. Both symmetric and weakly symmetric channels are examples
of this special class. We now provide a formal deﬁnition of symmetric and weakly
symmetric channels.

DEFINITION 2.6 (Weakly) Symmetric Channels A channel is said to be sym-

metric if the rows and columns of the channel matrix are permutations of each other.

A channel is said to be weakly symmetric if the rows of the channel matrix are per-
mutations of each other and the column sums, ¾ , are equal.
Obviously a symmetric channel will also have equal column sums since columns
are a permutation of each other.
82 Fundamentals of Information Theory and Coding Design

Consider the weakly symmetric channel with mutual information:

¾ ¾

(2.45)
¾

where ¾ ´½ µ and the summation involves terms in

the th row of the channel matrix. Since each row is a permutation of, say, the ﬁrst
row, then ½ ¾ where .
This means that

since
. Since is bounded above, then the upper bound of
would be the channel capacity if it can be achieved by an appro-
priate input distribution. Let ½ then

½

. Since the channel is weakly symmetric then the column
sums, , are all equal, , and we have that , that is, the output prob-
abilities are equal. Since we know that maximum entropy is achieved with equal
symbol probabilities then it follows that when ½ . We
have established that:

THEOREM 2.1
For a symmetric or weakly symmetric channel , the channel capacity can be
stated explicitly as:
(2.46)
where ´ µ can be calculated for any row .
½

Channel capacity is achieved when the inputs are uniformly distributed, that is,
½ .

EXAMPLE 2.10
Consider the BSC channel matrix:

The BSC is a symmetric channel and we can use Theorem 2.1 to explicitly derive the
channel capacity as follows:

Information Channels 83

when ½
¾, which is the same expression that was derived in Section 2.7.1.

Now consider a channel with the following channel matrix:

Obviously the channel is not symmetric, but since the second row is a permutation of
the ﬁrst row and since the column sums are equal then the channel is weakly symmetric
and the channel capacity can be explicitly stated as:

when ½
¾.

2.8 Continuous Channels and Gaussian Channels

We extend our analysis of information channels to the case of continuous valued in-
put and output alphabets and to the most important class of continuous channel, the
Gaussian channel. In digital communication systems noise analysis at the most basic
level requires consideration of continuous valued random variables rather than dis-
crete quantities. Thus the Gaussian channel represents the most fundamental form
of all types of communication channel systems and is used to provide meaning-
ful insights and theoretical results on the information carrying capacity of channels.
The BSC and BEC models, on the other hand, can be considered as high-level de-
scriptions of the practical implementations and operations observed in most digital
communication systems.

When considering the continuous case our discrete-valued symbols and discrete
probability assignments are replaced by continuous-valued random variables, ,
with associated probability density functions, .
84 Fundamentals of Information Theory and Coding Design

DEFINITION 2.7 Mutual Information of two random variables Let and

be two random variables with joint density and marginal densities
and . Then the mutual information between and is deﬁned as:
(2.47)

where is the differential entropy:

(2.48)

and is the conditional differential entropy:

(2.49)

The mutual information,

, provides a measure of the amount of information
that can be carried by the continuous channel
. Let us now consider the Gaus-
sian Channel shown in Figure 2.12 where
¾ is a zero-mean Gaussian
¾ . Let
random variable with variance be the value of the input to the channel at

time , and let be the value of the output from the channel at time .

+
FIGURE 2.12
The Gaussian channel.

The output of the channel is given by:

(2.50)

That is, the channel output is perturbed by additive white Gaussian noise (AWGN).

We assume that the channel is band-limited to Hz. Two immediate implications

of this are that:
Information Channels 85

1. Assume the AWGN has a power spectral density of . Then the noise
¾
variance, or average power, is band-limited and given by
Ê
¾
¾
2. To faithfully reproduce any signals transmitted through the channel the signals
must be transmitted at a rate not exceeding the Nyquist rate of samples
per second

We further assume that the channel is power-limited to . That is:

¾ ¾ (2.51)

This band-limited, power-limited Gaussian channel just described is not only of the-
oretical importance in the ﬁeld of information theory but of practical importance to
communication engineers since it provides a fundamental model for many modern
communication channels, including wireless radio, satellite and ﬁbre optic links.

2.9 Information Capacity Theorem

In Section 2.7 we deﬁned the channel capacity as the maximum of the mutual in-
formation over all possible input distributions. Of importance to communication en-
gineers is the channel capacity of a band-limited, power-limited Gaussian channel.
This is given by the following maximisation problem:

¾

¾
(2.52)

We now provide the details of deriving an important and well-known closed-form

expression for for Gaussian channels. The result is the Information Capacity
Theorem which gives the capacity of a Gaussian communication channel in terms
of the two main parameters that confront communication engineers when designing
such systems: the signal-to-noise ratio and the available bandwidth of the system.

CLAIM 2.2
If and is uncorrelated with then:

(2.53)
86 Fundamentals of Information Theory and Coding Design

PROOF We note from Bayes’ Theorem that . Also

since and since and are uncorrelated we have that
. Using these in the expression for gives:

(2.54)

From Chapter 1 we stated that for a random variable the differential entropy
Gaussian
attains the maximum value of ; so for the Gaussian random variable, ,
we know that: ¾
(2.55)

For random variable where and are uncorrelated we have that:

¾ ¾ ¾
(2.56)

with the maximum achieved when is a Gaussian random variable. Thus:

¾ ¾

¾

¾ ¾ ¾

¾

¾ (2.57)

¾ then will also be a

If is chosen to be a Gaussian random variable with
Gaussian random variable and the maximum mutual information or channel capacity
will be achieved:

¾ (2.58)

Since the channel is also band-limited to Hz then there can be no more than
¾ . This provides the ﬁnal form of the
symbols transmitted per second and
Information Channels 87

channel capacity, stated as Shannon’s most famous Information Capacity Theorem

[10], which is also known as the Shannon-Hartley Law in recognition of the early
work by Hartley [6].

RESULT 2.8 Information Capacity Theorem

The information capacity of a continuous channel of bandwidth Hz, perturbed

by AWGN of power spectral density and bandlimited also to Hz, is given
by:

(2.59)

where is the average transmitted power and is the signal-to-noise ratio
or SNR.

Equation 2.59 provides the theoretical capacity or upper bound on the bits per second
that can be transmitted for error-free transmission through a channel for a given

transmitted power, , and channel bandwidth, , in the presence of AWGN noise

with power spectral density, . Thus the information capacity theorem deﬁnes a
fundamental limit that confronts communication engineers on the rate for error-free
transmission through a power-limited, band-limited Gaussian channel.

EXAMPLE 2.11

What is the minimum signal-to-noise ratio that is needed to support a 56k modem?

A 56k modem requires a channel capacity of 56,000 bits per second. We assume a
telephone bandwidth of Hz. From Equation 2.59 we have:

or:

Thus a SNR of at least 48 dB is required to support running a 56k modem. In

real telephone channels other factors such as crosstalk, co-channel interference, and
echoes also need to be taken into account.
88 Fundamentals of Information Theory and Coding Design

2.10 Rate Distortion Theory

Consider a discrete, memoryless source with alphabet

and associated probabilities
. Source coding is usually re-
quired to efﬁciently transmit or store messages from the source in the appropriate
representation for the storage or communication medium. For example, with dig-
ital communication channels and storage, source symbols need to be encoded as
a binary representation. Furthermore, source symbols transmitted through a com-
munication channel may be output in a different symbol representation. Thus the
source symbols will appear as symbols from a representation or code word alpha-
bet
. If there are no losses or distortions in the coding or

transmission there is perfect representation and can be recovered fully from its

representation . But in the following situations:

lossiness in the source coding where the code alphabet and permitted code
words do not allow exact representation of the source symbols and the decod-
ing is subject to errors,

insufﬁcient redundancy in the channel code such that the rate of information
is greater than the channel capacity,

the representation is not perfect and there are unavoidable errors or distortions in the

representation of the source symbol by the representation symbol .
Rate distortion theory, first developed by Shannon [11], deals with the minimum mu-
tual information (equivalent to the information or code rate) that the channel must
possess, for the given source symbol probability distributions, to ensure that the av-
erage distortion is guaranteed not to exceed a specified threshold . To derive this
value we first define what we mean by an information or code rate (this is for the
general case; in Chapter 5 we define the code rate for the specific case that applies
with channel codes).

DEFINITION 2.8 Code Rate (General Case) Assume one of possible source

messages is represented as a code word of length . Let be the average

number of bits transmitted with each source message, then the code rate is deﬁned

as:
(2.60)

For the case of equally likely source messages, . For the general

case, is the entropy of the source message.
Information Channels 89

DEFINITION 2.9 Distortion Measures The single-letter distortion measure,

, is deﬁned as the measure of the cost incurred in representing the source
symbol by the representation symbol . The average distortion, , is deﬁned as
the average of over all possible source symbol and representation symbol
combinations:

(2.61)

where the are the channel or transition probabilities.

Consider all possible conditional probability assignments for for the given
source and representation alphabets. An assignment is deemed to be -admissible if
and only if the average distortion, , is less than or equal to some acceptable or spec-
iﬁed threshold, . The set of all -admissible conditional probability assignments
is denoted by:
(2.62)

For each -admissible conditional probability assignment we have an associated

mutual information, or information rate, given by:

(2.63)

where are the representation alphabet probabilities.
DEFINITION 2.10 Rate Distortion Function For given and ﬁxed source
probability distribution the minimum information rate we
require for the given average distortion is given by the rate distortion function:

(2.64)

¾
which is derived by ﬁnding the -admissible conditional probability assignment
that minimises the mutual information subject to the constraints:

(2.65)

Intuitively, we expect that tolerating a larger distortion permits the use of lower in-
formation rates (as is the case of achieving low bit rate transmission by lossy com-
pression) and conversely that by increasing the information rate lower distortion is
possible.
The calculation of from Equation 2.64 involves a minimisation of a non-linear
function over an unknown but constrained set of probabilities, which is a very difﬁ-
cult problem analytically.
90 Fundamentals of Information Theory and Coding Design

EXAMPLE 2.12
Consider a binary source with symbols ½ ¾ that are detected at the
receiver as the ternary representation alphabet with symbols ½ ¾ ¿ .
This representation covers the following three cases:

½ is detected as ½ and ¾ is detected as ¿ , that is, there is

no error in representation.

½ is detected as ¾ and ¾ is detected as ¾ , that is, the

receiver fails to detect the symbol and there is an erasure.

½ is detected as ¿ and ¾ is detected as ½ , that is, the

receiver detects the symbol incorrectly and there is a bit inversion.

The corresponding channel is shown in Figure 2.13 where are the

transition probabilities and we are given that ½ ¾ .

½½ 0 ½
½ 0
½¾
½¿
¾½ ? ¾

¾¾
¾ 1
¾¿ 1 ¿

FIGURE 2.13
Graphical representation of information channel.

We deﬁne a distortion measure for this channel as follows:

½ ½ ¾ ¿ to indicate there is no distortion (or distortion of )

when there is no error in the representation.

½ ¾ ¾ ¾ to indicate there is a distortion or cost of when

there is an erasure.

½ ¿ ¾ ½ to indicate there is a distortion of where there is

a bit inversion.
Information Channels 91

0 0
½ 0
1
3
?
3
1
1
0 1

FIGURE 2.14
Graphical representation of distortion measure.

Note that a higher distortion of is attributed to bit inversions compared to a lower

distortion of when there is an erasure. The set of distortion measures are represented
graphically in Figure 2.14.
Assume the following conditional probability assignment for the channel:

(2.66)

We calculate the representation alphabet probabilities: , ,

, and the mutual information (from Equation 2.63 ) is .
The average distortion is then:

Consider the derivation of , that is, the rate distortion function for .
The conditional probability assignment of Equation 2.66 is -admissible and provides
an information rate of . Is there another probability assignment
with and information rate ? Intuitively one expects the
minimum information rate to occur at the maximum allowable distortion of .
To investigate this we consider the derivation of the corresponding BSC and BEC
equivalents of Figure 2.13 such that . The following conditional probability
assignments:

92 Fundamentals of Information Theory and Coding Design

both yield the same average distortion of . For the BSC equivalent the mutual
information is calculated as and for the BEC equivalent we have
. Thus the BSC equivalent is the channel with lowest information
rate for the same level of average distortion. To ﬁnd we need to consider
all -admissible conditional probability assignments and ﬁnd the one providing the
minimum value for the mutual information.

EXAMPLE 2.13

An important example of rate distortion analysis is for the case of analog to digital con-
version when the analog source signal has to be represented as a digital source signal.
An important component of this conversion is the quantisation of the continuous-
valued analog signal to a discrete-valued digital sample.
Consider source symbols generated from a discrete-time, memoryless Gaussian source
with zero mean and variance ¾ . Let denote the value of the source symbol or sam-
ple generated by this source. Although the source symbol is discrete in time it is
continuous-valued and a discrete-valued (quantised) representation of is needed for
storage and transmission through digital media. Let be a symbol from the discrete-
valued representation alphabet that is used to represent the continuous-valued .
For example can be the set of non-negative integers and is the value
of rounded to the nearest integer. It should be noted that, strictly speaking, a finite
representation is needed since digital data are stored in a finite number of bits. This is
usually achieved in practice by assuming a limited dynamic range for and limiting
the representation alphabet to a finite set of non-negative integers. Thus we refer to
as the quantised version of and the mapping of to as the quantisation operation.
The most intuitive distortion measure between and is a measure of the error in
representing by and the most widely used choice is the squared error distortion:

¾ (2.67)

It can be shown with the appropriate derivation (see [4, 5]) that the rate distortion
function for the quantisation of a Gaussian source with variance ¾ with a squared
error distortion is given by:
¾

½
¾ ¾
(2.68)
¾
Information Channels 93

2.10.1 Properties of
By considering the minimum and maximum permissible values for and the
corresponding distortion threshold, , a more intuitive understanding of the be-
haviour of the rate distortion function, , is possible. Obviously the minimum
value of is 0, implying that there is no minimum rate of information, but what
does this mean and what does the corresponding distortion threshold imply? Intu-
itively the maximum value of should be and this should happen when
(perfect reconstruction), but is this really the case? We answer these questions
by stating and proving the following result.

RESULT 2.9
The rate distortion function is a monotonically decreasing function of , limited in
range by:
(2.69)
where indicates that the upper bound occurs for the minimum
possible value of the distortion, , and indicates that there is a
maximum permissible value such that for .

A typical plot of as a function of for the case when is shown in

Figure 2.15.

0
FIGURE 2.15
Sketch of typical function .

We can prove that by considering the minimum permissi-

ble value for the distortion . Since this is equivalent to ﬁnding the smallest
possible . That is what we want to ﬁnd the conditional probability assignment that
94 Fundamentals of Information Theory and Coding Design

minimises:

(2.70)

The minimum occurs by considering, for each , the value of that minimises
and setting for these values, with all other conditional prob-
abilities set to zero. For each deﬁne as the minimum
distortion for and , where , as the representation
symbol that yields the minimum distortion. We set with all other
conditional probabilities for being set to zero. Hence:

(2.71)

In typical applications is the representation alphabet symbol that uniquely

identiﬁes the source symbol and in such cases the above conditional probabil-
ity assignment implies that (perfect reconstruction since there is no
uncertainty) and hence Thus we have that .
Furthermore we typically assign a distortion of for the pair and this
means that ; thus .
Derivation of the maximum permissible relies on the observation that this
condition occurs when . That is, and thus
, the representation symbols are independent of the source symbols and there
is no information conveyed by the channel. The average distortion measure for a
channel with is given by:

(2.72)

Then since we are looking for such that for , then
is given by the minimum value of Equation 2.72 over all possible probability
assignments for . Deﬁne . The mini-
ofEquation
mum 2.72 occurs when for that such that the expression

is smallest, and for all other . This
gives:

(2.73)

Equation 2.73 implies that if we are happy to tolerate an average distortion that is as
much as then there is a choice of representation that is independent of
such that .
Information Channels 95

EXAMPLE 2.14
Consider the channel from Example 2.12. The following conditional probability
assignment:
½

obviously implies that . The resulting average distortion is:

However this may not be since there may be other conditional probability
assignments for which that provide a lower distortion. Indeed using
Equation 2.73 gives:

Thus and this occurs with the following conditional probability assign-
ment:

Interestingly this represents the condition where, no matter what input is transmitted
through the channel, the output will always be with an average cost of 1. Thus
if we can tolerate an average distortion of 1 or more we might as well not bother!

EXAMPLE 2.15

Consider the rate distortion function given by Equation 2.68 in Example 2.13 for the
quantisation of a Gaussian source with squared error distortion. Since the source
is continuous-valued and is discrete-valued then for the rate
distortion function must be inﬁnite since no amount of information will ever be able
to reconstruct from with no errors, and the quantisation process, no matter what
discrete-valued representation is used, will always involve a loss of information. That
is, from Equation 2.68:

It should be noted that this does not contradict Result 2.9 since that result implicitly
assumed that both and were discrete-valued.
96 Fundamentals of Information Theory and Coding Design

For the case of zero rate distortion Equation 2.68 gives:

¾
This result can be conﬁrmed intuitively by observing that if no information is provided
(i.e., the receiver or decoder does not have access to ) the best estimate for is its
mean value from which the average squared error will, of course, be the variance, ¾ .

2.11 Exercises
1. A binary channel correctly transmits a 0 (as a 0) twice as many times as trans-
mitting it incorrectly (as a 1) and correctly transmits a 1 (as a 1) three times
more often then transmitting it incorrectly (as a 0). The input to the channel
can be assumed equiprobable.

(a) What is the channel matrix ? Sketch the channel.

(b) Calculate the output probabilities, .
(c) Calculate the backward channel probabilities, .
2. Secret agent 101 communicates with her source of information by phone, un-
fortunately in a foreign language over a noisy connection. Agent 101 asks
questions requiring only yes and no answers from the source. Due to the noise
and language barrier, Agent 101 hears and interprets the answer correctly only
75% of the time, she fails to understand the answer 10% of the time and she
misinterprets the answer 15% of the time. Before asking the question, Agent
101 expects the answer yes 80% of the time.

(a) Sketch the communication channel that exists between Agent 101 and
her source.
(b) Before hearing the answer to the question what is Agent 101’s average
uncertainty about the answer?
(c) Agent 101 interprets the answer over the phone as no. What is her av-
erage uncertainty about the answer? Is she is more uncertain or less
uncertain about the answer given her interpretation of what she heard?
Explain!
(d) Agent 101 now interprets the answer over the phone as yes. What is her
average uncertainty about the answer? Is she is more uncertain or less
uncertain about the answer given her interpretation of what she heard?
Explain and compare this with the previous case.
Information Channels 97

3. Calculate the equivocation of A with respect to B,

, for the com-
munication channel of Qu. 2. Now calculate the mutual information of the
channel as
. What can you say about the chan-
nel given your answers for Qu. 2(c) and 2(d)? Now verify that

by calculating and
.
*4. A friend of yours has just seen your exam results and has telephoned to tell
you whether you have passed or failed. Alas the telephone connection is so
bad that if your friend says “pass” you mistake that for “fail” 3 out of 10 times
and if your friend says “fail” you mistake that for “pass” 1 out of 10 times.
Before talking to your friend you were 60% conﬁdent that you had passed the
exam. How conﬁdent are you of having passed the exam if you heard your
friend say you have passed?
5. Which is better “erasure” or “crossover”?

(a) Consider a ﬁbre-optic communication channel with crossover probabil-

ity of ¾ and a wireless mobile channel with erasure probability of
¾ . Calculate the mutual information assuming equiprobable inputs
for both types of communication channels. Which system provides more
information for the same bit error rate?
*(b) Let us examine the problem another way. Consider a ﬁbre-optic com-

munication channel with crossover probability of and a wireless

mobile channel with erasure probability of . Assume equiproba-
ble inputs and express the mutual information
for the ﬁbre optic

channel as a function of and for the wireless mobile channel as a

function of . Calculate and for the same mutual infor-

mation of . Which channel can get away with a higher bit error rate
and still provide the same amount of mutual information?

6. Given , and ﬁnd ,

and .
7. Consider the following binary channel:

3/5
4/7 0 0

A B

3/7 1 1
2/3

Calculate as painlessly

as possible.
98 Fundamentals of Information Theory and Coding Design

8. Here are some quick questions:

(a) Prove that if a BSC is shorted (i.e., all the outputs are grounded to 0) then
the channel provides no information about the input.
(b) Consider the statement: “Surely if you know the information of the
source, , and the information provided by the channel, ,
you will know the information of the output,
?” Prove whether
this statement is true or not. If not, under what conditions, if any, would
it be true?
(c) Conceptually state (in plain English) what
means.

9. Sketch a sample channel with inputs and outputs, ﬁnd the expression for the
mutual information, the channel capacity and the input probability distribution
to achieve capacity, for the following cases:

(a) a noiseless, non-deterministic channel

(b) a deterministic, non-noiseless channel
(c) a noiseless, deterministic channel

*10. It was established that the minimum value of is 0, that is .

What is the maximum value of ?
11. Show that the mutual information expression for a BEC can be simpliﬁed to:

12. Consider the following errors-and-erasure channel:

p
0 0
1−p−q
q

A ? B

q
1−p−q
1 1
p

Find all the values of and for which the above channel is:
Information Channels 99

(a) totally ambiguous (i.e., ),

(b) noiseless,
(c) deterministic.

13. Channel has channel matrix:

and is connected to channel with matrix:

The ternary input to the channel system has the following statistics:
, , and . The output of the channel is and
the output of channel is .

(a) Calculate , and .

(b) Calculate , and .
(c) What can you say about channel ? Explain!

14. The input to a BEC is repeated as shown below:

p
0 0
q B (from original transmission)
A ?
q C (from repeat transmission)
1 1
p

Given equiprobable binary inputs derive the expression for and

show that for:

(a)

(b)

15. Agent 01 contacts two of his sources by email for some straight “yes” and
“no” answers. The ﬁrst source he contacts is known to be unreliable and to
give wrong answers about 30% of the time. Hence Agent 01 contacts his
second source to ask for the same information. Unfortunately, the second
source insists on using a non-standard email encoding and Agent 01 ﬁnds that
only 60% of the answers are intelligible.

(a) What is the average uncertainty Agent 01 has about the input given the
answers he receives from his first source? Hence what is the mutual
information of the first source?
(b) What is the average uncertainty Agent 01 has about the input given the
answers he receives from both the first and second source? Hence what
is the mutual information from both sources?

*16. In order to improve utilisation of a BSC, special input and output electron-
ics are designed so that the input to the BSC is sent twice and the output is
interpreted as follows:

¯ if two 0’s are received the channel outputs a single 0

¯ if two 1’s are received the channel outputs a single 1
¯ if either 01 or 10 is received the channel outputs a single ?

Derive the expression for the mutual information through this augmented BSC
assuming equiprobable inputs and compare this with the mutual information
of a standard BSC. Either analytically or numerically show whether this aug-
mented BSC is superior to the standard BSC. What price is paid for this “im-
proved” performance?
17. The mutual information between two random variables, and , is deﬁned
by Equation 2.47. Explain how
provides a measure of the amount of
independence between and .
*18. A digital transmission channel consists of a terrestrial ﬁbre-optic link with a
measured cross-over (bit error) probability of 0.1 followed by a satellite link
with an erasure probability of 0.2. No prior statistics regarding the source
of information being transmitted through the channel are available. What is
the average amount of information that can be resolved by the channel? In
order to improve the reliability of the channel it is proposed that the same data
being transmitted through the channel also be transmitted through a cheaper
and less reliable terrestrial/marine copper channel with cross-over probability
of 0.3. What is the average amount of information that can now be transmitted
through the combined channel system? Compare with your previous answer.
What is the cost of this improvement?
19. Consider the following errors-and-erasure channel:
Information Channels 101
p
0 0
1−p−q
q

A ? B

q
1−p−q
1 1
p

Under what conditions will the channel be weakly symmetric? What is the
expression for the channel capacity when the channel is weakly symmetric?

*20. Consider the following errors-and-erasure channel:

0.7
0 0
0.1
0.2

A ? B

0.3
0.1
1 1
0.6

Express as a function of . Hence derive the channel

capacity by ﬁnding the value of that maximises . You can do this
numerically, graphically or analytically.

*21. Consider the following channel:

102 Fundamentals of Information Theory and Coding Design
0 0

A 1 1 B

1/2
1/2
2

Let ½ and ; thus .

Calculate the mutual information for the following cases:

(a)
(b) ,

Now derive an algebraic expression for the mutual information as a function

of and and graphically, numerically or otherwise try to find the condition
for channel capacity. Explain your findings!
22. A proposed monochrome television picture standard consists of pixels
per frame, each occupying one of 16 grayscale levels with equal probability.
Calculate the minimum bandwidth required to support the transmission of 40
frames per second when the signal-to-noise ratio is 25 dB. Given that for effi-
cient spectrum usage the bandwidth should not exceed 10 MHz what do you
think happened to this standard?
23. You are asked to consider the design of a cable modem utilising a broadband
communications network. One of the requirements is the ability to support
full duplex 10 Mbps data communications over a standard television channel
bandwidth of 5.5 MHz. What is the minimum signal-to-noise ratio that is
required to support this facility?
24. Consider a BSC with channel matrix:

with input and output .
Define the
following single-letter distortion measure:

Assuming equiprobable inputs derive the expression for the average distortion,
. Hence what is the expression for the rate-distortion function as a
function of ?
Information Channels 103

25. For Qu. 24, what is and ?

26. For Qu. 24, what is
and what is a value of that yields ?
*27. Consider the following errors-and-erasure channel:

p
0 0
1−p−q
q

A ? B

q
1−p−q
1 1
p

with input ½ ¾ and output ½ ¾ ¿ . Deﬁne the
following single-letter distortion measure:

Assuming equiprobable inputs derive the expression for the average distortion,
. State the constrained minimisation problem that you need to solve to derive
the rate-distortion function,
. Can you solve it?

28. For Qu. 27 what is and what are values of and that yield ?

2.12 References
[1] S. Arimoto, An algorithm for calculating the capacity of an arbitrary discrete
memoryless channel, IEEE Trans. Inform. Theory, IT-18, 14-20, 1972.
[2] R. Blahut, Computation of channel capacity and rate distortion functions, IEEE
Trans. Inform. Theory, IT-18, 460-473, 1972.
[3] J.-F. Cardoso, Blind signal separation: statistical principles, Proceedings of the
IEEE, 86(10), 2009-2025, 1998.
104 Fundamentals of Information Theory and Coding Design

[4] T.M. Cover and J.A. Thomas, Elements of Information Theory, John Wiley &
Sons, New York, 1991.
[5] R.G. Gallager, Information Theory and Reliable Communication, John Wiley
& Sons, New York, 1968.
[6] T.V.L. Hartley, Transmission of information, Bell System Tech. J., 7, 535-563,
1928.
[7] S. Haykin, Communication Systems, John Wiley & Sons, New York, 4th ed.,
2001.
[8] J. Jeong, J.C. Gore, and B.S. Peterson, Mutual information analysis of the
EEG in patients with Alzheimer’s disease, Clinical Neurophysiology, 112(5),
827-835, 2001.
[9] Y. Normandin, R. Cardin, and R. De Mori, High-performance connected digit
recognition using maximum mutual information, IEEE Trans. Speech and Au-
dio Processing, 2(2), 299-311, 1994.
[10] C.E. Shannon, A mathematical theory of communication, Bell System Tech. J.,
vol. 28, pg 379-423, 623-656, 1948.
[11] C.E. Shannon, Coding theorems for a discrete source with a ﬁdelity criterion,
IRE Nat. Conv. Record, Part 4, 142-163, 1959.
[12] P. Viola, and W.M. Wells, Alignment by maximization of mutual information,
Int. J. of Comput. Vision, 24(2), 137-154, 1997.

CSC310: Information Channels Overview
No ratings yet
CSC310: Information Channels Overview
15 pages
Channel Coding in Noisy Channels
No ratings yet
Channel Coding in Noisy Channels
24 pages
Understanding Channel Coding in TE361
No ratings yet
Understanding Channel Coding in TE361
65 pages
1 Merged
No ratings yet
1 Merged
182 pages
Communication Channels: Course Name: Information Theory and Coding Course Code: 19ECE312 Semester: 6 Sem - ECE
No ratings yet
Communication Channels: Course Name: Information Theory and Coding Course Code: 19ECE312 Semester: 6 Sem - ECE
171 pages
Channel Capacity in Information Theory
No ratings yet
Channel Capacity in Information Theory
4 pages
Information Theory: Mohamed Hamada
No ratings yet
Information Theory: Mohamed Hamada
24 pages
ECS452 2014 Premidterm
No ratings yet
ECS452 2014 Premidterm
56 pages
Shannon's Coding Theorems: Saint Mary's College of California
No ratings yet
Shannon's Coding Theorems: Saint Mary's College of California
23 pages
Tc-502 Information Theory
No ratings yet
Tc-502 Information Theory
28 pages
Channel Encoding in Wireless Networks
No ratings yet
Channel Encoding in Wireless Networks
5 pages
Fundamentals of Digital Communications and Data Transmission
No ratings yet
Fundamentals of Digital Communications and Data Transmission
114 pages
Chapter 1
No ratings yet
Chapter 1
22 pages
Mathematics in Communications: Introduction To Coding: F F Tusubira, PHD, Muipe, Miee, Reng, Ceng
No ratings yet
Mathematics in Communications: Introduction To Coding: F F Tusubira, PHD, Muipe, Miee, Reng, Ceng
8 pages
EEE 309 Communication Systems I: Dr. Md. Forkan Uddin Associate Professor Dept. of EEE, BUET, Dhaka 1205
No ratings yet
EEE 309 Communication Systems I: Dr. Md. Forkan Uddin Associate Professor Dept. of EEE, BUET, Dhaka 1205
55 pages
Stats
No ratings yet
Stats
14 pages
Channel Capacity Explained
No ratings yet
Channel Capacity Explained
51 pages
Chapter 4
No ratings yet
Chapter 4
89 pages
Chapte-2 Information Theory and Coding
No ratings yet
Chapte-2 Information Theory and Coding
68 pages
Discrete Memoryless Channels Guide
No ratings yet
Discrete Memoryless Channels Guide
22 pages
Information Theory Basics
No ratings yet
Information Theory Basics
36 pages
Information Theory1
No ratings yet
Information Theory1
5 pages
01-Syllabus and Intro
No ratings yet
01-Syllabus and Intro
21 pages
Information Sources in Communication Systems
No ratings yet
Information Sources in Communication Systems
15 pages
ITC Unit 1
No ratings yet
ITC Unit 1
132 pages
L9,10, L11 - Module 3 Channel Models and Capacity
No ratings yet
L9,10, L11 - Module 3 Channel Models and Capacity
40 pages
Generic Digital Communication System
No ratings yet
Generic Digital Communication System
17 pages
Notes Digital Communication Lecture 1 - 4
100% (2)
Notes Digital Communication Lecture 1 - 4
63 pages
Notes Digital Communication Lecture 1 - 3
No ratings yet
Notes Digital Communication Lecture 1 - 3
63 pages
1 Informatics Theory
No ratings yet
1 Informatics Theory
67 pages
Channel Encoding
No ratings yet
Channel Encoding
11 pages
Information Theory and Coding - Chapter 5
No ratings yet
Information Theory and Coding - Chapter 5
41 pages
Digital Communication Chapter 3
No ratings yet
Digital Communication Chapter 3
37 pages
תקשורת ספרתית- הרצאה 1 - הקדמה
No ratings yet
תקשורת ספרתית- הרצאה 1 - הקדמה
34 pages
Digital Communication System Basics
No ratings yet
Digital Communication System Basics
4 pages
Lecture Note v1.2
No ratings yet
Lecture Note v1.2
107 pages
Channel Capacity and Models
No ratings yet
Channel Capacity and Models
30 pages
Dcom Easy Solution PDF
No ratings yet
Dcom Easy Solution PDF
97 pages
Discrete Memoryless Sources Explained
No ratings yet
Discrete Memoryless Sources Explained
37 pages
Notes Digital Communication Lecture 1
0% (1)
Notes Digital Communication Lecture 1
61 pages
Dgital Communication Lectures
No ratings yet
Dgital Communication Lectures
202 pages
Wireless Systems for Engineers
No ratings yet
Wireless Systems for Engineers
109 pages
Communication Theory II - Lecture 8
No ratings yet
Communication Theory II - Lecture 8
35 pages
Chapter 3 FEC
No ratings yet
Chapter 3 FEC
10 pages
Digital Communication System Overview
No ratings yet
Digital Communication System Overview
46 pages
Unit III Acs
No ratings yet
Unit III Acs
91 pages
Chapter #1 - Basic of Communication System
No ratings yet
Chapter #1 - Basic of Communication System
77 pages
Channel (Communications)
No ratings yet
Channel (Communications)
5 pages
T4 NoiseAndMutualInformation
No ratings yet
T4 NoiseAndMutualInformation
8 pages
Digital Com theory-Compiled-P.Nagaraju PDF
No ratings yet
Digital Com theory-Compiled-P.Nagaraju PDF
59 pages
Understanding Communication Channels
No ratings yet
Understanding Communication Channels
12 pages
Chapter One - Part 2
No ratings yet
Chapter One - Part 2
17 pages
Introductory Concept: Presented by
No ratings yet
Introductory Concept: Presented by
17 pages
Comm... System CH2-Lec1
No ratings yet
Comm... System CH2-Lec1
36 pages
Using The Surface Ground Penetrating Radar Method For Subsurface Investigation
No ratings yet
Using The Surface Ground Penetrating Radar Method For Subsurface Investigation
17 pages
AENB MAA 32T32R 128AE B40 240W Datasheet
No ratings yet
AENB MAA 32T32R 128AE B40 240W Datasheet
7 pages
Radio
No ratings yet
Radio
1 page
YC73 Stage Keyboard Owner's Manual
No ratings yet
YC73 Stage Keyboard Owner's Manual
76 pages
Pioneer DEH-2900MP Manual en
No ratings yet
Pioneer DEH-2900MP Manual en
20 pages
Photodiode, Phototransistor - Working and Application Circuits - Homemade Circuit Projects
No ratings yet
Photodiode, Phototransistor - Working and Application Circuits - Homemade Circuit Projects
11 pages
Features Description: Ltc3415 7A, Polyphase Synchronous Step-Down Regulator
No ratings yet
Features Description: Ltc3415 7A, Polyphase Synchronous Step-Down Regulator
28 pages
Jeppview For Windows: General Information General Information
100% (2)
Jeppview For Windows: General Information General Information
13 pages
6 Design of U-Slot Rectangular Patch Antenna For Wireless LAN at 2.45GHz-Libre1
No ratings yet
6 Design of U-Slot Rectangular Patch Antenna For Wireless LAN at 2.45GHz-Libre1
4 pages
FURIA2 UPV - EHU Robotiker IEEEVTCSpring09 MIMO
No ratings yet
FURIA2 UPV - EHU Robotiker IEEEVTCSpring09 MIMO
7 pages
How To Calibrate VEGA LT PDF
No ratings yet
How To Calibrate VEGA LT PDF
4 pages
Audi A4 From 1998 Dash Pinouts
No ratings yet
Audi A4 From 1998 Dash Pinouts
4 pages
Radio Reader Essays in The Cultural History of Radio 1st Edition Michele Hilmes PDF Download
100% (4)
Radio Reader Essays in The Cultural History of Radio 1st Edition Michele Hilmes PDF Download
54 pages
Olympus Digital Voice Recorder WS650S - WS750M - WS760M - ENGLISH - E02
No ratings yet
Olympus Digital Voice Recorder WS650S - WS750M - WS760M - ENGLISH - E02
92 pages
BCA Data Communication - Version1
No ratings yet
BCA Data Communication - Version1
31 pages
Sir Seewoosagur Ramgoolam Airport Data
No ratings yet
Sir Seewoosagur Ramgoolam Airport Data
22 pages
FrSky DHT Module Instruction Manual
No ratings yet
FrSky DHT Module Instruction Manual
2 pages
U2270b Em4095
No ratings yet
U2270b Em4095
6 pages
Introduction To Telecommunication
No ratings yet
Introduction To Telecommunication
30 pages
Roaming Number Provisioning in MSC
No ratings yet
Roaming Number Provisioning in MSC
43 pages
Ceiling Mount Pir Intrusion Detector: Description
No ratings yet
Ceiling Mount Pir Intrusion Detector: Description
2 pages
John Denver's Senate Statement on Censorship
No ratings yet
John Denver's Senate Statement on Censorship
4 pages
Lab 9 Exercise - Sampling Theorem - 2024
No ratings yet
Lab 9 Exercise - Sampling Theorem - 2024
4 pages
FURUNO FR-2105/2105-B Radar Manual
No ratings yet
FURUNO FR-2105/2105-B Radar Manual
207 pages
Ir 2 Mockexam 240306110537
No ratings yet
Ir 2 Mockexam 240306110537
23 pages
Mock Interview Answers Unistring
No ratings yet
Mock Interview Answers Unistring
2 pages
Megalodon Damp Ceiling Fan: Español P. 19
No ratings yet
Megalodon Damp Ceiling Fan: Español P. 19
36 pages
Coaxial Cables for Mobile Applications
No ratings yet
Coaxial Cables for Mobile Applications
6 pages
Pilot's Visual Approach Guide
No ratings yet
Pilot's Visual Approach Guide
1 page
VRX932LAExplodedView
0% (1)
VRX932LAExplodedView
4 pages

Informations Channels For Coding Theory

Uploaded by

Informations Channels For Coding Theory

Uploaded by

Chapter 2

2.1 What Are Information Channels?

We now formally deﬁne a mathematical structure for an information channel.

DEFINITION 2.1 Information Channel An information channel is a triple

The conditional probabilities that describe an information channel can be represented

A graphical representation of an information channel is given in Figure 2.1.

The channel matrix exhibits the following properties and structure:

A typical input-output sequence from this channel could be:

A typical input-output sequence from this channel could be:

2.2 BSC and BEC Channels

2.3 Mutual Information

where is the joint probability of and .

The output probabilities are calculated as follows:

Conceptually we can characterise the probabilities as a priori if they provide the

a priori probability of output symbol if we do not know which input symbol

a posteriori probability of output symbol if we know that input symbol

a posteriori probability of input symbol if we know that output symbol

We can similarly refer to the a priori entropy of A:

and thus   .

The average of the a posterior entropies of A,  , calculated in Example 2.3 is

 reduction in the average uncertainty of the input to the channel

The mutual information, , indicates the information about , , that is

noisefree If this implies which means the channel

2.3.1 Importance of Mutual Information

2.3.2 Properties of the Mutual Information

   (2.12)

    

with   if and only if     , i.e., the input and

The expression for provided by Equation 2.11 is symmetric in the variables

The term  is sometimes referred to as the equivocation of B with respect to

joint entropy can be derived:

The relation can be stated conceptually as:

“The total uncertainty in both and   is the sum of the uncertainties in

REMARK 2.1 Given the input probabilities, 

The mutual information of a BSC can be expressed algebraically by deﬁning:

The expression for   is:

The simpliﬁed expression for    is:

and the mutual information is: .

1.  : The channel operates in an ambiguous manner since the errors

2. : The channel is noisefree,  

and the mutual information is

3.  : The source exhibits maximum entropy with , the output

4.  : The source contains no information since , the output

The mutual information of a BEC can be similarly expressed algebraically. Adopting

The simpliﬁed expression for reduces to the same expression as Equation

and the mutual information is: .

2.4 Noiseless and Deterministic Channels

2.4.1 Noiseless Channels

DEFINITION 2.3 Noiseless Channel A channel in which there are at least

The corresponding channel matrix is given by:

2.4.2 Deterministic Channels

DEFINITION 2.4 Deterministic Channel A channel in which there are at least

The corresponding channel matrix is given by:

What is the mutual information of a deterministic channel? Let be the transmitted

2.5 Cascaded Channels

r− symbol s− symbol t− symbol

output of channel is connected to the input of channel . Thus the symbol

The problem of interest is comparing the mutual information through channel

channel , that is, . To this end we ﬁrst show that

     

Equation 2.26 gives      , noting that          ,

   such that    .

PROOF For noiseless channel if    then this implies that   .

is  From Bayes’ Theorem we can show that:

which produces the cascaded channel :

The corresponding channel matrices for and are:

2.6 Additivity of Mutual Information

: a priori probability of input symbol if we do not know which output sym-

 : a posteriori probability of input symbol if we know that output symbol

Thus we can deﬁne the following a posteriori entropy:

the equivocation of with respect to and by:

What can we say about   , the amount of information

where is the amount of information channel additionally provides

Thus we have the following result.

     (2.37)

For simplicity we assume that       From Example 2.5 we

and thus .

The average of the a posterior entropies of A, , calculated in Example 2.3 is

reduction in the average uncertainty of the input to the channel

(2.12)

with if and only if , i.e., the input and

The term is sometimes referred to as the equivocation of B with respect to

“The total uncertainty in both and is the sum of the uncertainties in

REMARK 2.1 Given the input probabilities,

The expression for is:

The simpliﬁed expression for is:

1. : The channel operates in an ambiguous manner since the errors

2. : The channel is noisefree,

3. : The source exhibits maximum entropy with , the output

4. : The source contains no information since , the output

Equation 2.26 gives , noting that ,

such that .

PROOF For noiseless channel if then this implies that .

is From Bayes’ Theorem we can show that:

which produces the cascaded channel :

: a posteriori probability of input symbol if we know that output symbol

What can we say about , the amount of information

(2.37)

For simplicity we assume that From Example 2.5 we

Putting it all together we get the ﬁnal expression for :

. This happens when:

¯ , there is 100% no error, so

The calculation of involves maximisation of over indepen-

standard constrained maximisation techniques like the method of Lagrangian

For a BSC, where is given by Equation 2.19

For a BEC, where is given by Equation 2.21

where ¾ ´½ µ and the summation involves terms in

(2.48)

and is the conditional differential entropy:

The mutual information,

We further assume that the channel is power-limited to . That is:

For random variable where and are uncorrelated we have that:

¾ then will also be a

For each -admissible conditional probability assignment we have an associated

The corresponding channel is shown in Figure 2.13 where are the

½ ½ ¾ ¿ to indicate there is no distortion (or distortion of )

½ ¾ ¾ ¾ to indicate there is a distortion or cost of when

½ ¿ ¾ ½ to indicate there is a distortion of where there is

We calculate the representation alphabet probabilities: , ,

(2.71)

(a) What is the channel matrix ? Sketch the channel.

6. Given , and ﬁnd ,

Calculate as painlessly