Web Coding Book
1 Department of Computer Science and Engineering, University at Buffalo, SUNY. Work supported by NSF CAREER grant CCF-0844796.
Foreword
This book is based on lecture notes from coding theory courses taught by Venkatesan Gu-
ruswami at the University of Washington and CMU; by Atri Rudra at the University at Buffalo, SUNY;
and by Madhu Sudan at Harvard and MIT.
This version is dated January 31, 2022. For the latest version, please go to
http://www.cse.buffalo.edu/faculty/atri/courses/coding-theory/book/
The material in this book is supported in part by the National Science Foundation under CA-
REER grant CCF-0844796. Any opinions, findings and conclusions or recommendations expressed
in this material are those of the author(s) and do not necessarily reflect the views of the National
Science Foundation (NSF).
Contents
I The Basics 17
1 The Fundamental Question 19
1.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.2 Some definitions and codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.3 Error correction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
1.4 Distance of a code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
1.5 Hamming Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
1.6 Hamming Bound . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
1.7 Generalized Hamming Bound . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
1.8 Family of codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
1.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
1.10 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
II The Combinatorics 79
4 What Can and Cannot Be Done-I 81
4.1 Asymptotic Version of the Hamming Bound . . . . . . . . . . . . . . . . . . . . . . . 81
4.2 Gilbert-Varshamov Bound . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
4.3 Singleton Bound . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
4.4 Plotkin Bound . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
4.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
4.6 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
7 Bridging the Gap Between Shannon and Hamming: List Decoding 133
7.1 Hamming versus Shannon: part II . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
7.2 List Decoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
7.3 Johnson Bound . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
7.4 List-Decoding Capacity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
7.5 List Decoding from Random Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
7.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
7.7 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
9.4 The general case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
9.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
9.6 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
14.4 The Outer Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
14.5 Discussion and Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261
20.2 Avoiding Hash Collisions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353
20.3 Almost Universal Hash Function Families and Codes . . . . . . . . . . . . . . . . . . 356
20.4 Data Possession Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 357
20.5 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 361
C.7 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 445
List of Figures
1.1 Decoding for Akash English, one gets “I need little little (trail)mix." . . . . . . . . . 19
1.2 Coding process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
1.3 Bad example for unique decoding. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
1.4 Illustration for proof of Hamming Bound . . . . . . . . . . . . . . . . . . . . . . . . . 34
12.1 The 2 × 2 Basic Polarizing Transform. Included in red are the conditional entropies of the variables.
12.2 The n × n Basic Polarizing Transform defined as P n (Z) = P n (U, V) = (P n/2 (U + V), P n/2 (V)).
12.3 Block structure of the Basic Polarizing Transform. Circled are a block at the 2nd level and two 2nd level blocks.
16.1 The recursive construction of C k . The final code C̃ k is also shown. . . . . . . . . . . 288
21.1 The minutiae are unordered and form a set, not a vector. . . . . . . . . . . . . . . . 365
22.1 Pick a subset S (not necessarily contiguous). Then pick a column j that is not present in S.
22.2 Construction of the final matrix M C ∗ from M C out and M C in from Example 22.4.3.
E.1 Relationship between entropy, joint entropy, conditional entropy, and mutual information for two random variables.
List of Tables
3.1 Uniform distribution over F_2^{2×2} along with values of four random variables. . . . . 64
10.1 Strongly explicit binary codes that we have seen so far. . . . . . . . . . . . . . . . . . 173
List of Algorithms
34 Decompression Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 358
35 Decompression Algorithm Using List Decoding . . . . . . . . . . . . . . . . . . . . . 360
36 UNLOCK 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 367
37 LOCK 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 368
38 UNLOCK 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 369
39 Decoder for Separable Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 376
40 Naive Decoder for Disjunct Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . 378
41 Initialization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 383
42 Update . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 383
43 Report Heavy Items . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 384
44 Simple Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 431
45 Sampling algorithm for GapHamming . . . . . . . . . . . . . . . . . . . . . . . . . . 435
46 An average-case algorithm for GapHamming . . . . . . . . . . . . . . . . . . . . . . 436
47 Exponential time algorithm for MaxLinearEq . . . . . . . . . . . . . . . . . . . . . 437
48 Reduction from MaxCut to MaxLinearEq . . . . . . . . . . . . . . . . . . . . . . . 440
49 Root-Find (Fq , f ) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 464
50 Linear-Root-Find (Fq , g ) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 464
Part I
The Basics
Chapter 1

The Fundamental Question
1.1 Overview
Communication is a fundamental need of our modern lives. In fact, communication is some-
thing that humans have been doing for a long time. For simplicity, let us restrict ourselves to
English. It is quite remarkable that different people speaking English can be understood pretty
well, even if, e.g., the speaker has an accent. This is because English has some built-in redun-
dancy, which allows for “errors" to be tolerated. We will pick an example from one of the au-
thor’s experiences conversing with his two year old son, Akash. When Akash started to speak
his own version of English, which we will dub “Akash English," we got examples such as the one
illustrated below:
Figure 1.1: Decoding for Akash English, one gets “I need little little (trail)mix."
With some practice Akash’s parents were able to “decode" what Akash really meant. In fact,
Akash could communicate even if he did not say an entire word properly and gobbled up part(s)
of word(s).
The above example shows that having redundancy in a language allows for communication
even in the presence of (small amounts of) differences and errors. Of course in our modern
digital world, all kinds of entities communicate (and most of the entities do not communicate
in English or any natural language for that matter). Errors are also present in the digital world,
so these digital communications also use redundancy.
Error-correcting codes (henceforth, just codes) are clever ways of representing data so that
one can recover the original information even if parts of it are corrupted. The basic idea is to
judiciously introduce redundancy so that the original information can be recovered even when
parts of the (redundant) data have been corrupted.
For example, when packets are transmitted over the Internet, some of the packets get cor-
rupted or dropped. Packet drops are resolved by the TCP layer by a combination of sequence
numbers and ACKs. To deal with data corruption, multiple layers of the TCP/IP stack use a form
of error correction called CRC Checksum [102]. From a theoretical point of view, the checksum
is a terrible code (for that matter so is English). However, on the Internet, the current dominant
mode of operation is to detect errors and if errors have occurred, then ask for retransmission.
This is the reason why the use of checksum has been hugely successful in the Internet. However,
there are other communication applications where re-transmission is not an option. Codes are
used when transmitting data over the telephone line or via cell phones. They are also used in
deep space communication and in satellite broadcast (for example, TV signals are transmitted
via satellite). Indeed, asking the Mars Rover to re-send an image just because it got corrupted
during transmission is not an option–this is the reason that for such applications, the codes
used have always been very sophisticated.
Codes also have applications in areas not directly related to communication. In particu-
lar, in the applications above, we want to communicate over space. Codes can also be used to
communicate over time. For example, codes are used heavily in data storage. CDs and DVDs
work fine even in presence of scratches precisely because they use codes. Codes are used in Re-
dundant Array of Inexpensive Disks (RAID) [21] and error correcting memory [20]. Sometimes,
in the Blue Screen of Death displayed by Microsoft Windows family of operating systems, you
might see a line saying something along the lines of “parity check failed"–this happens when
the code used in the error-correcting memory cannot recover from error(s). Also, certain con-
sumers of memory, e.g. banks, do not want to suffer from even one bit flipping (this e.g. could
mean someone’s bank balance either got halved or doubled–neither of which is welcome1).
Codes are also deployed in other applications such as paper bar codes; for example, the bar
code used by UPS called MaxiCode [19]. Unlike the Internet example, in all of these applica-
tions, there is no scope for “re-transmission."
In this book, we will mainly think of codes in the communication scenario. In this frame-
work, there is a sender who wants to send (say) k message symbols over a noisy channel. The
1 This is a bit tongue-in-cheek: in real life banks have more mechanisms to prevent one bit flip from wreaking havoc.
sender first encodes the k message symbols into n symbols (called a codeword) and then sends
it over the channel. The receiver gets a received word consisting of n symbols. The receiver then
tries to decode and recover the original k message symbols. Thus, encoding is the process of
adding redundancy and decoding is the process of removing errors.
Unless mentioned otherwise, in this book we will make the following assumption:
The sender and the receiver only communicate via the channel.a In other words, other than
some setup information about the code, the sender and the receiver do not have any other
information exchange (other than of course what was transmitted over the channel). In
particular, no message is more likely to be transmitted than another.
a The scenario where the sender and receiver have a “side-channel" is an interesting topic that has been studied but is outside the scope of this book.
The fundamental question that will occupy our attention for almost the entire book is the
tradeoff between the amount of redundancy used and the number of errors that can be cor-
rected by a code. In particular, we would like to understand:
Question 1.1.1. How much redundancy do we need to correct a given amount of errors? (We
would like to correct as many errors as possible with as little redundancy as possible.)
Intuitively, maximizing error correction and minimizing redundancy are contradictory goals:
a code with higher redundancy should be able to tolerate a larger number of errors. By the end of
this chapter, we will see a formalization of this question.
Once we determine the optimal tradeoff, we will be interested in achieving this optimal
tradeoff with codes that come equipped with efficient encoding and decoding. (A DVD player
that tells its consumer that it will recover from a scratch on a DVD by tomorrow is not exactly
going to be a best-seller.) In this book, we will primarily define efficient algorithms to be ones
that run in polynomial time.2
1.2 Some definitions and codes

Definition 1.2.1 (Code). A code of block length n over an alphabet Σ is a subset of Σn . Typically,
we will use q to denote |Σ|.3
Remark 1.2.2. We note that the ambient space Σn can be viewed as a set of sequences, vectors or
functions. In other words, we can think of a vector (v 1 , . . . , v n ) ∈ Σn as just the sequence v 1 , . . . , v n
2 We are not claiming that this is the correct notion of efficiency in practice. However, we believe that it is a good definition as the “first cut"– quadratic or cubic time algorithms are definitely more desirable than exponential time algorithms: see Section C.4 for more on this.
3 Note that q need not be a constant and can depend on n: we’ll see codes in this book where this is true.
(in order) or a vector tuple (v 1 , . . . , v n ) or as the function f : [n] → Σ such that f (i ) = v i . Sequences
assume least structure on Σ and hence are most generic. Vectors work well when Σ has some struc-
ture (and in particular is what is known as a field, which we will see in the next chapter). Functional
representation will be convenient when the set of coordinates has structure (e.g., [n] may come
from a finite field of size n). For now, however, the exact representation does not matter and the
reader can work with representation as sequences.
We will also frequently use the following alternate way of looking at a code. Given a code
C ⊆ Σn , with |C | = M , we will think of C as a mapping of the following form:
C : [M ] → Σn .
In the above, we have used the notation [M ] for any integer M ≥ 1 to denote the set {1, 2, . . . , M }.
We will also need the notion of dimension of a code.
Definition 1.2.3 (Dimension of a code). Given a code C ⊆ Σn , its dimension is defined as

k = logq |C |.
Let us begin by looking at two specific codes. Both codes are defined over Σ = {0, 1} (also
known as binary codes). In both cases |C | = 2^4 = 16 and we will think of each of the 16 messages as a
4 bit vector.
We first look at the so-called parity code, which we will denote by C ⊕ . Given a message
(x 1 , x 2 , x 3 , x 4 ) ∈ {0, 1}4 , its corresponding codeword is given by
C ⊕ (x 1 , x 2 , x 3 , x 4 ) = (x 1 , x 2 , x 3 , x 4 , x 1 ⊕ x 2 ⊕ x 3 ⊕ x 4 ),
where the ⊕ denotes the EXOR (also known as the XOR or Exclusive-OR) operator. In other
words, the parity code appends the parity of the message bits (or takes the remainder of the
sum of the message bits when divided by 2) at the end of the message. Note that such a code
uses the minimum amount of non-zero redundancy.
The second code we will look at is the so-called repetition code. This is a very natural code
(and perhaps the first code one might think of). The idea is to repeat every message bit a fixed
number of times. For example, we repeat each of the 4 message bits 3 times and we use C 3,r ep
to denote this code.
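To make the two encodings concrete, here is a small Python sketch (not from the book; the function names are illustrative only) of the maps C ⊕ and C 3,r ep :

    def encode_parity(msg):
        # C_plus: append the parity (XOR) of the 4 message bits.
        assert len(msg) == 4 and all(b in (0, 1) for b in msg)
        return list(msg) + [msg[0] ^ msg[1] ^ msg[2] ^ msg[3]]

    def encode_repetition(msg, reps=3):
        # C_{3,rep}: repeat every message bit `reps` times.
        return [b for b in msg for _ in range(reps)]

    print(encode_parity([1, 0, 1, 1]))      # [1, 0, 1, 1, 1]
    print(encode_repetition([1, 0, 1, 1]))  # [1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1]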
Let us now try to look at the tradeoff between the amount of redundancy and the number of
errors each of these codes can correct. Even before we begin to answer the question, we need
to define how we are going to measure the amount of redundancy. One natural way to define
redundancy for a code with dimension k and block length n is by their difference n − k. By this
definition, the parity code uses the least amount of redundancy. However, one “pitfall" of such
a definition is that it does not distinguish between a code with k = 100 and n = 102 and another
code with dimension and block length 2 and 4, respectively. Intuitively, the latter code is using
more redundancy. This motivates the following notion of measuring redundancy.
Definition 1.2.4 (Rate of a code). The rate of a code with dimension k and block length n is given
by
R = k/n.
Note that the higher the rate, the lesser the amount of redundancy in the code. Also note that
as k ≤ n, R ≤ 1.4 Intuitively, the rate of a code is the average amount of real information in each
of the n symbols transmitted over the channel. So, in some sense, rate captures the complement
of redundancy. However, for historical reasons, we will deal with the rate R (instead of the more
obvious 1−R) as our notion of redundancy. Given the above definition, C ⊕ and C 3,r ep have rates
of 4/5 and 1/3, respectively. As was to be expected, the parity code has a higher rate than the repetition code.
We have formalized the notion of redundancy as the rate of a code as well as other param-
eters of a code. However, to formalize Question 1.1.1, we still need to formally define what it
means to correct errors. We do so next.
1.3 Error correction

Next we move to error correction. Intuitively, we can correct a received word if we can re-
cover the transmitted codeword (or equivalently the corresponding message). This “reverse"
process is called decoding.
The definition of a decoding function by itself does not give anything interesting. What we
really need from a decoding function is that it recovers the transmitted message. To understand
this notion, we first need to understand what is the nature of errors that we aim to tackle. In
particular, if a transmitter transmits u ∈ Σn and the receiver receives v ∈ Σn , how do we quantify
the amount of “error” that has happened during this transmission? While multiple notions are
possible, the most central one, and the one we will focus on for most of this book is based on
“Hamming distance”, a notion of distance that captures how close are two given sequences u
and v.
Definition 1.3.3 (Hamming distance). Given two vectors u, v ∈ Σn the Hamming distance be-
tween u and v, denoted by ∆(u, v), is the number of positions in which u and v differ.
4 Further, in this book, we will always consider the case k > 0 and n < ∞ and hence, we can also assume that R > 0.
The Hamming distance is a distance in a very formal mathematical sense: see Exercise 1.5.
Note that the definition of Hamming distance just depends on the number of differences and
not the nature of the difference. For example, consider the vectors u = 00000 and v = 10001.
One can see that their Hamming distance is ∆(u, v) = 2. Now consider the vector w = 01010.
Note that even though v ≠ w, we have that the Hamming distance ∆(u, w) = 2.
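The Hamming distance is also easy to compute in code; the following one-liner (an illustrative sketch, not part of the book) can be used to check the example above:

    def hamming_distance(u, v):
        # Number of positions in which the equal-length vectors u and v differ.
        assert len(u) == len(v)
        return sum(1 for a, b in zip(u, v) if a != b)

    print(hamming_distance("00000", "10001"))  # 2
    print(hamming_distance("00000", "01010"))  # 2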
To return to the quantification of errors, from now on we will say that if u is transmitted and
v is received then ∆(u, v) errors occurred during transmission. This allows us to quantify the
performance of an encoding function, or equivalently the underlying code, as we do next.
Definition 1.3.4 (t -Error Channel). An n-symbol t -Error Channel over the alphabet Σ is a func-
tion Ch : Σn → Σn that satisfies ∆(v, Ch(v)) ≤ t for every v ∈ Σn .
Definition 1.3.5 (Error Correcting Code). Let C ⊆ Σn be a code and let t ≥ 1 be an integer. C
is said to be a t -error-correcting code if there exists a decoding function D such that for every
message m ∈ [|C |] and every t -error channel Ch we have D (Ch(C (m))) = m.
Thus a t -error-correcting code is one where there is a decoding function that corrects any
pattern of t errors.
Figure 1.3 illustrates how the definitions we have examined so far interact.
m ↦ C (m) −→ Channel Ch −→ v = Ch(C (m)) ↦ m
We will also very briefly look at a weaker form of error recovery called error detection.
Definition 1.3.6 (Error detection code). Let C ⊆ Σn be a code and let t ≥ 1 be an integer. C is said
to be a t -error-detecting code if there exists a detecting procedure D such that for every message m
and every received vector v ∈ Σn satisfying ∆(C (m), v) ≤ t , it holds that D outputs a 1 if v = C (m)
and 0 otherwise.
Thus a t -error-detecting code is one where if the transmission has at least one error and at
most t errors, then the decoding function detects the error (by outputting 0). Note that a t -error
correcting code is also a t -error detecting code (but not necessarily the other way round): see
Exercise 1.1. Although error detection might seem like a weak error recovery model, it is useful
in settings where the receiver can ask the sender to re-send the message. For example, error
detection is used quite heavily in the Internet.
Finally we also consider a more benign model of errors referred to as “erasures” where a
symbol is merely (and explicitly) omitted from the transmission (as opposed to being replaced by
some other symbol). To define this model we use a special symbol “?” that is not a member of
the alphabet Σ.
Definition 1.3.7 (t -Erasure Channel). An n-symbol t -Erasure Channel over the alphabet Σ is a
function Ch : Σn → (Σ ∪ {?})n that satisfies ∆(v, Ch(v)) ≤ t for every v ∈ Σn (where both arguments
to ∆(·, ·) are viewed as elements of (Σ ∪ {?})n ) and for every i ∈ [n] such that vi ≠ Ch(v)i we have
Ch(v)i =?.
A coordinate i such that Ch(v)i =? is called an erasure. We may now define erasure correct-
ing codes analogously to error-correcting codes.
Definition 1.3.8 (Erasure Correcting Code). Let C ⊆ Σn be a code and let t ≥ 1 be an integer. C
is said to be a t -erasure-correcting code if there exists a decoding function D such that for every
message m ∈ [|C |] and every t -erasure channel Ch we have D (Ch(C (m))) = m.
With the above definitions in place, we are now ready to look at the error correcting capa-
bilities of the codes we looked at in the previous section.
C ⊕ : q = 2, k = 4, n = 5, R = 4/5.
We will start with the repetition code. To study its error-correcting capabilities, we will con-
sider the following natural decoding function. Given a received word y ∈ {0, 1}12 , divide it up
into four consecutive blocks (y 1 , y 2 , y 3 , y 4 ) where every block consists of three bits. Then, for
every block y i (1 ≤ i ≤ 4), output the majority bit as the message bit. We claim this decoding
function can correct any error pattern with at most 1 error. (See Exercise 1.2.) For example, if a
block of 010 is received, since there are two 0’s we know the original message bit was 0. In other
words, we have argued the following:

Proposition 1.3.9. C 3,r ep is a 1-error-correcting code.
However, it is not too hard to see that C 3,r ep cannot correct two errors. For example, if both
of the errors happen in the same block and a block in the received word is 010, then the original
block in the codeword could have been either 111 or 000. Therefore in this case, no decoder can
successfully recover the transmitted message.5
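The majority-vote decoder described above can be sketched as follows (illustrative code, not from the book); note that on a block such as 010 it outputs 0, which is exactly the wrong answer when two errors hit a block whose original bit was 1:

    def decode_repetition(received, reps=3):
        # Majority vote within each block of `reps` consecutive bits.
        assert len(received) % reps == 0
        return [1 if sum(received[i:i + reps]) > reps // 2 else 0
                for i in range(0, len(received), reps)]

    # One error per block is corrected, e.g. the first block 010 decodes to 0:
    print(decode_repetition([0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]))  # [0, 0, 1, 1]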
Thus, we have pin-pointed the error-correcting capabilities of the C 3,r ep code: it can cor-
rect one error, but not two or more. However, note that the argument assumed that the error
positions can be located arbitrarily. In other words, we are assuming that the channel noise
behaves arbitrarily (subject to a bound on the total number of errors). Obviously, we can model
the noise differently. We now briefly digress to look at this issue in slightly more detail.
5 Recall we are assuming that the decoder has no side information about the transmitted message.
Digression: Channel Noise. As was mentioned above, until now we have been assuming the
following noise model, which was first studied by Hamming:
Any error pattern can occur during transmission as long as the total number of er-
rors is bounded. Note that this means that the location as well as the nature6 of the
errors is arbitrary.
We will frequently refer to Hamming’s model as the Adversarial Noise Model. It is important
to note that the atomic unit of error is a symbol from the alphabet. So for example, if the error
pattern is (1, 0, 1, 0, 0, 0) and we consider the alphabet to be {0, 1}, then the pattern has two errors.
However, if our alphabet is {0, 1}3 (i.e. we think of the vector above as ((1, 0, 1), (0, 0, 0)), with
(0, 0, 0) corresponding to the zero element in {0, 1}3 ), then the pattern has only one error. Thus,
by increasing the alphabet size we can also change the adversarial noise model. As the book
progresses, we will see how error correction over a larger alphabet is easier than error correction
over a smaller alphabet.
However, the above is not the only way to model noise. For example, we could also have
the following error model:
First note that, for the channel model above, no more than four errors can occur when a code-
word in C 3,r ep is transmitted. (Recall that in C 3,r ep , each of the four bits is repeated three times.)
Second, note that the decoding function that takes the majority vote of each block can suc-
cessfully recover the transmitted codeword for any error pattern, while in the worst-case noise
model it could only correct at most one error. This channel model is admittedly contrived, but
it illustrates the point that the error-correcting capabilities of a code (and a decoding function)
are crucially dependent on the noise model.
A popular alternate noise model is to model the channel as a stochastic process. As a con-
crete example, let us briefly mention the binary symmetric channel with crossover probability
0 ≤ p ≤ 1, denoted by BSCp , which was first studied by Shannon. In this model, when a (binary)
codeword is transferred through the channel, every bit flips independently with probability p.
Note that the two noise models proposed by Hamming and Shannon are in some sense two
extremes: Hamming’s model assumes no knowledge about the channel (except that a bound
on the total number of errors is known7 ) while Shannon’s noise model assumes complete knowl-
edge about how noise is produced. In this book, we will consider only these two extreme noise
models. In real life, the situation often is somewhere in between.
For real life applications, modeling the noise model correctly is an extremely important
task, as we can tailor our codes to the noise model at hand. However, in this book we will
not study this aspect of designing codes at all, and will instead mostly consider the worst-case
noise model. Intuitively, if one can communicate over the worst-case noise model, then one
6 For binary codes, there is only one kind of error: a bit flip. However, for codes over a larger alphabet, say {0, 1, 2}, 0 being converted to a 1 and 0 being converted into a 2 are both errors, but are different kinds of errors.
7 A bound on the total number of errors is necessary; otherwise, error correction would be impossible: see Exercise 1.3.
could use the same code to communicate over nearly every other noise model with the same
amount of noise.
We now return to C ⊕ and examine its error-correcting capabilities in the worst-case noise
model. We claim that C ⊕ cannot correct even one error. Suppose y = 10000 is the received word.
Then we know that an error has occurred, but we do not know which bit was flipped. This is
because the two codewords u = 00000 and v = 10001 differ from the received word y in exactly
one bit. As we are assuming that the receiver has no side information about the transmitted
codeword, no decoder can know what the transmitted codeword was.
Thus, from an error-correction point of view, C ⊕ is a terrible code (as it cannot correct even
1 error). However, we will now see that C ⊕ can detect one error. Consider Algorithm 1.

Algorithm 1 Error Detection for the Parity Code C ⊕
INPUT: Received word y = (y 1 , y 2 , y 3 , y 4 , y 5 ) ∈ {0, 1}5
OUTPUT: 1 if no error is detected and 0 otherwise

1: b ← y 1 ⊕ y 2 ⊕ y 3 ⊕ y 4 ⊕ y 5
2: RETURN 1 ⊕ b ⊲ If there is no error, then b = 0 and hence we need to "flip" the bit for the answer
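A direct translation of Algorithm 1 into Python (a sketch; following Definition 1.3.6, it returns 1 when no error is detected and 0 otherwise) looks as follows:

    def detect_parity(y):
        # Error detector for the parity code C_plus on 5-bit received words.
        assert len(y) == 5
        b = y[0] ^ y[1] ^ y[2] ^ y[3] ^ y[4]  # b = 0 exactly when y has even parity
        return 1 ^ b

    print(detect_parity([0, 0, 0, 0, 0]))  # 1 (a codeword of C_plus)
    print(detect_parity([1, 0, 0, 0, 0]))  # 0 (an error is detected)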
Proposition 1.3.10. The parity code C ⊕ can detect an odd number of errors.
Let us now revisit the example that showed that one cannot correct one error using C ⊕ .
Recall, we considered two codewords in C ⊕ , u = 00000 and v = 10001 (which are codewords
corresponding to messages 0000 and 1000, respectively). Now consider the scenarios in which
u and v are each transmitted and a single error occurs resulting in the received word r = 10000.
Thus, given the received word r and the fact that at most one error can occur, the decoder has
no way of knowing whether the original transmitted codeword was u or v. Looking back at the
example, it is clear that the decoder is “confused" because the two codewords u and v do not
differ in many positions. This notion is formalized in the next section.
1.4 Distance of a code

Definition 1.4.1 (Minimum distance). Let C ⊆ Σn . The minimum distance (or just distance) of
C , denoted ∆(C ), is defined to be

∆(C ) = min_{c1 ≠ c2 ∈ C} ∆(c1 , c2 ).
In other words, ∆(C ) is the minimum distance between two distinct codewords in C . It is
easy to check that the repetition code C 3,r ep has distance 3. Indeed, any two distinct messages
will differ in at least one of the message bits. After encoding, the difference in one message bit
will translate into a difference of three bits in the corresponding codewords. We now claim that
the distance of C ⊕ is 2. This is a consequence of the following observations. If two messages
m1 and m2 differ in at least two places then ∆(C ⊕ (m1 ),C ⊕ (m2 )) ≥ 2 (even if we just ignored the
parity bits). If two messages differ in exactly one place then the parity bits in the correspond-
ing codewords are different which implies a Hamming distance of 2 between the codewords.
Thus, C ⊕ has smaller distance than C 3,r ep and can correct fewer errors than C 3,r ep .
This suggests that a larger distance implies greater error-correcting capabilities. The next result
formalizes this intuition. Recall the milder notion of corruption called "erasures" from Definition 1.3.7; as we will see, the minimum distance exactly captures both the ability to recover from errors and the ability to recover from erasures.

Proposition 1.4.2. The following statements are equivalent for a code C :

1. C has minimum distance d ≥ 2.

2. If d is odd, C can correct (d − 1)/2 errors.

3. C can detect d − 1 errors.

4. C can correct d − 1 erasures.
Remark 1.4.3. Property (2) above for even d is slightly different. In this case, one can correct up
to d2 − 1 errors but cannot correct d2 errors. (See Exercise 1.6.)
Before we prove Proposition 1.4.2, let us apply it to the codes C ⊕ and C 3,r ep which have
distances of 2 and 3 respectively. Proposition 1.4.2 implies the following facts that we have
already proved:

• C 3,r ep can correct 1 error.

• C ⊕ can detect 1 error.
The proof of Proposition 1.4.2 will need the following decoding function. Maximum like-
lihood decoding (MLD) is a well-studied decoding method for error correcting codes, which
outputs the codeword closest to the received word in Hamming distance (with ties broken arbi-
trarily). More formally, the MLD function denoted by D M LD : Σn → C is defined as follows. For
every y ∈ Σn ,
D MLD (y) = arg min_{c∈C } ∆(c, y).
Algorithm 2 is a naive implementation of the MLD.
Algorithm 2 Naive Maximum Likelihood Decoder
INPUT: Received word y ∈ Σn
OUTPUT: D MLD (y)
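Since the body of Algorithm 2 is simply an exhaustive search over C , one possible rendering in Python is the following sketch (it assumes the code is given as an explicit list of codewords, which is exactly what makes the decoder naive):

    def mld(received, codewords):
        # Return the codeword closest to `received` in Hamming distance
        # (ties broken by the order in which the codewords are listed).
        def dist(u, v):
            return sum(1 for a, b in zip(u, v) if a != b)
        return min(codewords, key=lambda c: dist(c, received))

    # With the two codewords 000 and 111, the received word 010 decodes to 000:
    print(mld("010", ["000", "111"]))  # 000

This runs in time proportional to |C | · n, which for a code of dimension k is exponential in k.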
Proof of Proposition 1.4.2 We will complete the proof in two steps. First, we will show that if
property 1 is satisfied then so are properties 2,3 and 4. Then we show that if property 1 is not
satisfied then none of properties 2,3 or 4 hold.
1. implies 2. Assume C has distance d . We first prove 2 (for this case assume that d = 2t + 1).
We now need to show that there exists a decoding function such that for all error patterns with
at most t errors it always outputs the transmitted message. We claim that the MLD function
has this property. Assume this is not so and let c1 be the transmitted codeword and let y be the
received word. Note that
∆(y, c1 ) ≤ t . (1.1)
As we have assumed that MLD does not work, D MLD (y) = c2 ≠ c1 . Note that by the definition of
MLD,
∆(y, c2 ) ≤ ∆(y, c1 ). (1.2)
Consider the following set of inequalities:

∆(c2 , c1 ) ≤ ∆(c2 , y) + ∆(y, c1 )      (1.3)
          ≤ 2∆(y, c1 )                  (1.4)
          ≤ 2t                          (1.5)
          = d − 1,                      (1.6)
where (1.3) follows from the triangle inequality (see Exercise 1.5), (1.4) follows from (1.2) and
(1.5) follows from (1.1). (1.6) implies that the distance of C is at most d − 1, which is a contra-
diction.
1. implies 3. We now show that property 3 holds, that is, we need to describe an algorithm
that can successfully detect whether errors have occurred during transmission (as long as the
total number of errors is bounded by d − 1). Consider the following error detection algorithm:
check if the received word y = c for some c ∈ C (this can be done via an exhaustive check). If
no errors occurred during transmission, y = c1 , where c1 was the transmitted codeword and the
algorithm above will accept (as it should). On the other hand if 1 ≤ ∆(y, c1 ) ≤ d − 1, then by the
fact that the distance of C is d , y 6∈ C and hence the algorithm rejects, as required.
1. implies 4. Finally, we prove that property 4 holds. Let y ∈ (Σ ∪ {?})n be the received word.
First we claim that there is a unique c = (c 1 , . . . , c n ) ∈ C that agrees with y (i.e. y i = c i for ev-
ery i such that y i ≠ ?). (For the sake of contradiction, assume that this is not true, i.e. there
exist two distinct codewords c1 , c2 ∈ C such that both c1 and c2 agree with y in the unerased
positions. Note that this implies that c1 and c2 agree in the positions i such that y i ≠ ?. Thus,
∆(c1 , c2 ) ≤ |{i |y i = ?}| ≤ d − 1, which contradicts the assumption that C has distance d .) Given
the uniqueness of the codeword c ∈ C that agrees with y in the unerased positions, an algorithm
to find c is as follows: go through all the codewords in C and output the desired codeword.
¬1. implies ¬2. For the other direction of the proof, assume that property 1 does not hold,
that is, C has distance d − 1. We now show that property 2 cannot hold: i.e., for every decoding
function there exists a transmitted codeword c1 and a received word y (where ∆(y, c1 ) ≤ (d −1)/2)
such that the decoding function cannot output c1 . Let c1 ≠ c2 ∈ C be codewords such that
∆(c1 , c2 ) = d − 1 (such a pair exists as C has distance d − 1). Now consider a vector y such that
∆(y, c1 ) = ∆(y, c2 ) = (d − 1)/2. Such a y exists as d is odd and by the choice of c1 and c2 . Below is
an illustration of such a y (matching color implies that the vectors agree on those positions):
[Illustration: c1 , c2 and y are split into three blocks of lengths n − d + 1, (d − 1)/2 and (d − 1)/2; y agrees with both c1 and c2 on the first block, with c1 on the second block, and with c2 on the third block.]
Now, since y could have been generated if either of c1 or c2 were the transmitted codeword,
no decoding function can work in this case.8
¬1. implies ¬3. For the remainder of the proof, assume that the transmitted word is c1 and
there exists another codeword c2 such that ∆(c2 , c1 ) = d − 1. To see why property 3 is not true,
let y = c2 . In this case, either the error detecting algorithm detects no error, or it declares an
error when c2 is the transmitted codeword and no error takes place during transmission.
8 Note that this argument is just a generalization of the argument that C ⊕ cannot correct 1 error.
¬1. implies ¬4. We finally argue that property 4 does not hold. Let y be the received word in
which the positions that are erased are exactly those where c1 and c2 differ. Thus, given y both
c1 and c2 could have been the transmitted codeword, and no algorithm for correcting (at most
d − 1) erasures can work in this case. ■
Proposition 1.4.2 implies that Question 1.1.1 can be reframed as
Question 1.4.1. What is the largest rate R that a code with distance d can have?
We have seen that the repetition code C 3,r ep has distance 3 and rate 1/3. A natural follow-up
question (which is a special case of Question 1.4.1) is to ask
Question 1.4.2. Can we have a code with distance 3 and rate R > 1/3 ?
1.5 Hamming Code

The following code, which we will show answers Question 1.4.2 in the affirmative, was proposed by Hamming. It maps a message (x 1 , x 2 , x 3 , x 4 ) ∈ {0, 1}4 to the codeword

C H (x 1 , x 2 , x 3 , x 4 ) = (x 1 , x 2 , x 3 , x 4 , x 2 ⊕ x 3 ⊕ x 4 , x 1 ⊕ x 3 ⊕ x 4 , x 1 ⊕ x 2 ⊕ x 4 ).
C H : q = 2, k = 4, n = 7, R = 4/7.
We will show shortly that C H has a distance of 3. We would like to point out that we could
have picked the three parities differently. The reason we mention the three particular parities
above is due to historical reasons. We leave it as an exercise to define an alternate set of parities
such that the resulting code still has a distance of 3: see Exercise 1.9.
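For concreteness, here is an illustrative Python sketch (not from the book) of the encoding map of C H together with a brute-force computation of its distance over all 16 codewords; it confirms the distance of 3 established below:

    from itertools import product

    def encode_hamming(x):
        x1, x2, x3, x4 = x
        return (x1, x2, x3, x4, x2 ^ x3 ^ x4, x1 ^ x3 ^ x4, x1 ^ x2 ^ x4)

    def min_distance(code):
        # Minimum Hamming distance over all pairs of distinct codewords.
        return min(sum(a != b for a, b in zip(c1, c2))
                   for i, c1 in enumerate(code) for c2 in code[i + 1:])

    codewords = [encode_hamming(x) for x in product((0, 1), repeat=4)]
    print(min_distance(codewords))  # 3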
Before we move on to determining the distance of C H , we will need another definition.
Definition 1.5.1 (Hamming Weight). Let q ≥ 2. Given any vector v ∈ {0, 1, 2, . . . , q − 1}n , its Ham-
ming weight, denoted by w t (v), is the number of non-zero symbols in v.
Proposition 1.5.2. The code C H has a distance of 3.

Proof. We will prove the claimed property by using two properties of C H :

min_{c∈C H , c≠0} w t (c) = 3,      (1.7)

and

min_{c∈C H , c≠0} w t (c) = min_{c1 ≠c2 ∈C H } ∆(c1 , c2 ).      (1.8)
The proof of (1.7) follows from a case analysis on the Hamming weight of the message bits. Let
us use x = (x 1 , x 2 , x 3 , x 4 ) to denote the message vector.
• Case 0: If w t (x) = 0, then C H (x) = 0, which means we do not have to consider this code-
word.

• Case 1: If w t (x) = 1, then at least two of the three parity check bits are 1 (see Exercise 1.10), so w t (C H (x)) ≥ 1 + 2 = 3.

• Case 2: If w t (x) = 2, then at least one of the three parity check bits is 1 (see Exercise 1.11), so w t (C H (x)) ≥ 2 + 1 = 3.

• Case 3: If w t (x) ≥ 3, then w t (C H (x)) ≥ 3, since the message bits themselves appear in C H (x).

Thus, we can conclude that min_{c∈C H , c≠0} w t (c) ≥ 3. Further, note that w t (C H (1, 0, 0, 0)) = 3, which
along with the lower bound that we just obtained proves (1.7).
We now turn to the proof of (1.8). For the rest of the proof, let x = (x 1 , x 2 , x 3 , x 4 ) and y =
(y 1 , y 2 , y 3 , y 4 ) denote the two distinct messages. Using associativity and commutativity of the ⊕
operator, we obtain that
C H (x) +C H (y) = C H (x + y),
where the “+" operator is just the bit-wise ⊕ of the operand vectors9 . Further, it is easy to verify
that for two vectors u, v ∈ {0, 1}n , ∆(u, v) = w t (u + v) (see Exercise 1.12). Thus, we have
min_{c1 ≠c2 ∈C H } ∆(c1 , c2 ) = min_{x≠y∈{0,1}4 } w t (C H (x) +C H (y)) = min_{x≠0∈{0,1}4 } w t (C H (x)),

where the second equality follows from the observation that {x + y|x ≠ y ∈ {0, 1}n } = {x ∈ {0, 1}n |x ≠
0}. Recall that w t (C H (x)) = 0 if and only if x = 0 and this completes the proof of (1.8). Combining
(1.7) and (1.8), we conclude that C H has a distance of 3.
The second part of the proof could also be shown in the following manner. It can be verified
easily that the Hamming code is the set {x · G H |x ∈ {0, 1}4 }, where G H is the following matrix
9 E.g. (0, 1, 1, 0) + (1, 1, 1, 0) = (1, 0, 0, 0).
(where we think of x as a row vector).10

        1 0 0 0 0 1 1
G H =   0 1 0 0 1 0 1
        0 0 1 0 1 1 0
        0 0 0 1 1 1 1
In fact, any binary code (of dimension k and block length n) that is generated11 by a k × n
matrix is called a binary linear code. (Both C ⊕ and C 3,r ep are binary linear codes: see Exer-
cise 1.13.) This implies the following simple fact.
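The generator-matrix view can be checked numerically; the sketch below (illustrative, not from the book) computes the product x · G H over F2 and reproduces the parity-based description of C H :

    G_H = [
        [1, 0, 0, 0, 0, 1, 1],
        [0, 1, 0, 0, 1, 0, 1],
        [0, 0, 1, 0, 1, 1, 0],
        [0, 0, 0, 1, 1, 1, 1],
    ]

    def encode_by_matrix(x, G=G_H):
        # Vector-matrix product over F_2: each output bit is a parity of message bits.
        return tuple(sum(x[i] * G[i][j] for i in range(len(x))) % 2
                     for j in range(len(G[0])))

    print(encode_by_matrix((1, 0, 0, 0)))  # (1, 0, 0, 0, 0, 1, 1), i.e. C_H(1, 0, 0, 0)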
Lemma 1.5.3. For any binary linear code C and any two messages x and y, C (x)+C (y) = C (x+y).
Proof. For any binary linear code, we have a generator matrix G. The following sequence of
equalities (which follow from the distributivity and associativity properties of the Boolean EXOR
and AND operators) proves the lemma.
C (x) +C (y) = x · G + y · G
= (x + y) · G
= C (x + y)
We stress that in the lemma above, x and y need not be distinct. Note that due to the fact that
b ⊕b = 0 for every b ∈ {0, 1}, x+x = 0, which along with the lemma above implies that C (0) = 0.12
We can infer the following result from the above lemma and the arguments used to prove (1.8)
in the proof of Proposition 1.5.2.
Proposition 1.5.4. For any binary linear code, its minimum distance is equal to the minimum
Hamming weight of any non-zero codeword.
Thus, we have seen that C H has distance d = 3 and rate R = 4/7 while C 3,r ep has distance d = 3
and rate R = 1/3. Thus, the Hamming code is provably better than the repetition code (in terms
of the tradeoff between rate and distance) and thus, answers Question 1.4.2 in the affirmative.
The next natural question is
Question 1.5.1. Can we have a distance 3 code with a rate higher than that of C H ?
1.6 Hamming Bound
Now we switch gears to present our first tradeoff between redundancy (in the form of the di-
mension of a code) and its error-correction capability (in the form of its distance). In particular,
we will first prove a special case of the so-called Hamming bound for a distance of 3.
We begin with another definition.

Definition 1.6.1 (Hamming Ball). For any vector x ∈ [q]n and integer e ≥ 0,

B (x, e) = {y ∈ [q]n | ∆(x, y) ≤ e}.
Next, we prove an upper bound on the dimension of every code with distance 3.
Theorem 1.6.2 (Hamming bound for d = 3). Every binary code with block length n, dimension
k, distance d = 3 satisfies
k ≤ n − log2 (n + 1).
Proof. Given any two codewords, c1 ≠ c2 ∈ C , the following is true (as C has distance13 3):

B (c1 , 1) ∩ B (c2 , 1) = ∅.      (1.9)

See Figure 1.4 for an illustration.

Figure 1.4: Hamming balls of radius 1 are disjoint. The figure is technically not correct: the balls
above are actually balls in the Euclidean space, which is easier to visualize than the Hamming
space.

We also claim that for every x ∈ {0, 1}n ,

|B (x, 1)| = n + 1.      (1.10)

(See Exercise 1.16.)
Now consider the union of all Hamming balls centered around some codeword. Obviously, their
union is a subset of {0, 1}n . In other words,
|∪_{c∈C } B (c, 1)| ≤ 2^n .      (1.11)
On the other hand, since the balls B (c, 1) for c ∈ C are pairwise disjoint by (1.9), we have

|∪_{c∈C } B (c, 1)| = Σ_{c∈C } |B (c, 1)| = |C | · (n + 1)      (1.12)
                    = 2^k · (n + 1),      (1.13)

where (1.12) follows from (1.10) and (1.13) from the fact that C has dimension k. Combining (1.13)
and (1.11), we get
2^k (n + 1) ≤ 2^n ,
or equivalently
2^k ≤ 2^n / (n + 1).
Taking log2 of both sides we get the desired bound:
k ≤ n − log2 (n + 1).
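The bound is easy to evaluate numerically; a small sketch (illustrative) shows, for instance, that k = 4 is the largest dimension allowed for n = 7, so the Hamming code meets the bound with equality:

    import math

    def hamming_bound_d3(n):
        # Largest (possibly fractional) dimension allowed by the d = 3 Hamming bound.
        return n - math.log2(n + 1)

    print(hamming_bound_d3(7))   # 4.0
    print(hamming_bound_d3(15))  # 11.0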
Thus, Theorem 1.6.2 shows that for n = 7, C H has the largest possible dimension for any
binary code of block length 7 and distance 3 (as for n = 7, n − log2 (n + 1) = 4). In particular,
it also answers Question 1.5.1 for n = 7 in the negative. Next, we will present the general form of
Hamming bound.
1.7 Generalized Hamming Bound

Definition 1.7.1. A code C ⊆ Σn with dimension k and distance d will be called an (n, k, d )Σ code.
We will also refer to it as an (n, k, d )|Σ| code.
Theorem 1.7.2 (Hamming Bound for any d ). For every (n, k, d )q code
k ≤ n − logq ( Σ_{i=0}^{⌊(d−1)/2⌋} (n choose i) (q − 1)^i ).
Proof. The proof is a straightforward generalization of the proof of Theorem 1.6.2. For nota-
tional convenience, let e = ⌊(d − 1)/2⌋. Given any two codewords, c1 ≠ c2 ∈ C , the following is true
(as C has distance14 d ):
B (c1 , e) ∩ B (c2 , e) = ∅.      (1.14)
We claim that for all x ∈ [q]n ,

|B (x, e)| = Σ_{i=0}^{e} (n choose i) (q − 1)^i .      (1.15)
Indeed, any vector in B (x, e) must differ from x in exactly i positions for some 0 ≤ i ≤ e. In the
summation, (n choose i) is the number of ways of choosing the differing i positions and in each
such position, a vector can differ from x in q − 1 ways.
Now consider the union of all Hamming balls centered around some codeword. Obviously,
their union is a subset of [q]n . In other words,
|∪_{c∈C } B (c, e)| ≤ q^n .      (1.16)
On the other hand, since the balls B (c, e) for c ∈ C are pairwise disjoint by (1.14), we have

|∪_{c∈C } B (c, e)| = Σ_{c∈C } |B (c, e)| = q^k · Σ_{i=0}^{e} (n choose i) (q − 1)^i ,      (1.17)

where (1.17) follows from (1.15) and the fact that C has dimension k. Combining (1.17) and
(1.16) and taking logq of both sides we will get the desired bound:
k ≤ n − logq ( Σ_{i=0}^{e} (n choose i) (q − 1)^i ).
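The general bound can likewise be evaluated directly; the sketch below (illustrative, with e = ⌊(d − 1)/2⌋ as in the proof) computes the right-hand side of Theorem 1.7.2 for given n, d and q:

    import math
    from math import comb

    def hamming_bound(n, d, q):
        # Upper bound on the dimension k from the generalized Hamming bound.
        e = (d - 1) // 2
        volume = sum(comb(n, i) * (q - 1) ** i for i in range(e + 1))
        return n - math.log(volume, q)

    print(hamming_bound(7, 3, 2))   # 4.0  (met with equality by C_H)
    print(hamming_bound(15, 3, 2))  # 11.0 (met by the next Hamming code)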
Note that the Hamming bound gives a partial answer to Question 1.4.1. In particular, any
code of distance d can have rate R at most
1 − logq ( Σ_{i=0}^{e} (n choose i) (q − 1)^i ) / n.
Definition 1.7.3. Codes that meet the Hamming bound are called perfect codes.
14 Assume that y ∈ B (c1 , e) ∩ B (c2 , e), that is ∆(y, c1 ) ≤ e and ∆(y, c2 ) ≤ e. Thus, by the triangle inequality, ∆(c1 , c2 ) ≤ 2e ≤ d − 1, which is a contradiction.
Intuitively, a perfect code leads to the following perfect "packing": if one constructs Ham-
ming balls of radius ⌊(d − 1)/2⌋ around all the codewords, then we would cover the entire ambient
space, i.e. every possible vector will lie in one of these Hamming balls.
One example of a perfect code is the (7, 4, 3)2 Hamming code that we have seen in this chapter
(so is the family of general Hamming codes that we will see in the next chapter). A natural
question to ask is if
Question 1.7.1. Other than the Hamming codes, are there any other perfect (binary) codes?
1.8 Family of codes

Definition 1.8.1 (Code families, Rate and Distance). Let q ≥ 2. Let {n i }i ≥1 be an increasing se-
quence of block lengths and suppose there exist sequences {k i }i ≥1 and {d i }i ≥1 such that for all
i ≥ 1 there exists an (n i , k i , d i )q code C i . Then the sequence C = {C i }i ≥1 is a family of codes. The
rate of C is defined as

R(C ) = lim_{i→∞} k i /n i ,

and the relative distance of C is defined as

δ(C ) = lim_{i→∞} d i /n i .
For instance, we will shortly see that the Hamming code of Section 1.5 can be extended to an
entire family of codes C H = {C i }i ∈Z+ , with C i being an (n i , k i , d i )2 code with n i = 2^i − 1, k i =
2^i − i − 1 and d i = 3. Thus,

R(C H ) = lim_{i→∞} (1 − i /(2^i − 1)) = 1,

and

δ(C H ) = lim_{i→∞} 3/(2^i − 1) = 0.
15 In all codes we will study these limits will exist, but of course it is possible to construct families of codes where the limits do not exist.
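These limits can also be seen numerically; the following sketch (illustrative) prints the rate and relative distance of the first few members of the Hamming family:

    for i in range(3, 11):
        n = 2 ** i - 1
        k = 2 ** i - i - 1
        print(i, round(k / n, 4), round(3 / n, 4))
    # As i grows, the rate k/n approaches 1 while the relative distance 3/n approaches 0.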
A significant focus of this text from now on will be on families of codes. This is necessary
as we will study the asymptotic behavior of algorithms on codes, which does not make sense
for a fixed code. For example, when we say that a decoding algorithm for a code C takes O(n 2 )
time, we would be implicitly assuming that C is a family of codes and that the algorithm has
an O(n 2 ) running time when the block length is large enough. From now on, unless mentioned
otherwise, whenever we talk about a code, we will be implicitly assuming that we are talking
about a family of codes.
Given that we can only formally talk about asymptotic run time of algorithms, we now also
state our formal notion of efficient algorithms:
We will call an algorithm related to a code of block length n efficient if it runs in time
polynomial in n.
For all the specific codes that we will study in this book, the corresponding family of codes
will be a “family" in a more natural sense. By this we mean that all the specific codes in a family
of codes will be the “same" code except with different parameters. A bit more formally, we will
consider families {C i }i ≥1 , where given only the ‘index’ i , one can compute a sufficient descrip-
tion of C i efficiently.16
Finally, the definition of a family of codes allows us to present the final version of the big
motivating question for the book. The last formal version of the main question we considered
was Question 1.4.1, where we were interested in the tradeoff of rate R and distance d . The
comparison was somewhat unfair because R was a ratio while d was an integer. A more appro-
priate comparison should be between rate R and the relative distance δ. Further, we would be
interested in tackling the main motivating question for families of codes, which results in the
following final version:
Question 1.8.1. What is the optimal tradeoff between R(C ) and δ(C ) that can be achieved by
some code family C ?
A natural special case of Question 1.8.1 is whether the rate and relative distance of a family
of codes can be simultaneously positive. We formulate this special case as a separate question
below.
Question 1.8.2. Does there exist a family of codes C such that R(C ) > 0 and δ(C ) > 0 hold
simultaneously?
16 We stress that this is not always going to be the case. In particular, we will consider "random" codes where this efficient constructibility will not be true.
Codes that have the above property are called asymptotically good. For the curious reader,
we will present many asymptotically good codes in the rest of this book, though a priori the
existence of these is not immediate.
1.9 Exercises
Exercise 1.1. Show that any t -error correcting code is also t -error detecting but not necessarily
the other way around.
Exercise 1.3. Show that for every integer n, there is no code with block length n that can handle
an arbitrary number of errors.
Exercise 1.5. Show that the Hamming distance ∆(·, ·) satisfies the following properties of a metric. For every x, y, z ∈ Σn (writing d (·, ·) for ∆(·, ·)):

1. d (x, y) ≥ 0.

2. d (x, y) = 0 if and only if x = y.

3. d (x, y) = d (y, x).

4. d (x, z) ≤ d (x, y) + d (y, z). (This property is called the triangle inequality.)
Exercise 1.6. Let C be a code with distance d for even d . Then argue that C can correct up to
d /2 − 1 many errors but cannot correct d /2 errors. Using this or otherwise, argue that if a code C
is t -error correctable then it either has a distance of 2t + 1 or 2t + 2.
Exercise 1.7. In this exercise, we will see that one can convert arbitrary codes into codes with
slightly different parameters:
1. Let C be an (n, k, d )2 code with d odd. Then it can be converted into an (n + 1, k, d + 1)2
code.
Note: Other than the parameters of the code C , you should not assume anything else about the
code. Also your conversion should work for every n, k, d ≥ 1.
Exercise 1.8. In this problem we will consider a noise model that has both errors and erasures. In
particular, let C be an (n, k, d )Σ code. As usual a codeword c ∈ C is transmitted over the channel
and the received word is a vector y ∈ (Σ ∪ {?})n , where as before a ? denotes an erasure. We will use
s to denote the number of erasures in y and e to denote the number of (non-erasure) errors that
occurred during transmission. To decode such a vector means to output a codeword c ∈ C such
that the number of positions where c disagree with y in the n − s non-erased positions is at most
e. For the rest of the problem assume that
2e + s < d . (1.18)
1. Argue that the output of the decoder for any C under (1.18) is unique.
2. Let C be a binary code (but not necessarily linear). Assume that there exists a decoder D
that can correct from < d /2 many errors in T (n) time. Then under (1.18) one can perform
decoding in time O(T (n)).
Exercise 1.9. Define codes other than C H with k = 4, n = 7 and d = 3.
Hint: Refer to the proof of Proposition 1.5.2 to figure out the properties needed from the three parities.
Exercise 1.10. Argue that if w t (x) = 1 then at least two parity check bits in (x 2 ⊕ x 3 ⊕ x 4 , x 1 ⊕ x 2 ⊕
x 4 , x 1 ⊕ x 3 ⊕ x 4 ) are 1.
Exercise 1.11. Argue that if w t (x) = 2 then at least one parity check bit in (x 2 ⊕ x 3 ⊕ x 4 , x 1 ⊕ x 2 ⊕
x 4 , x 1 ⊕ x 3 ⊕ x 4 ) is 1.
Exercise 1.12. Prove that for any u, v ∈ {0, 1}n , ∆(u, v) = w t (u + v).
Exercise 1.13. Argue that C ⊕ and C 3,r ep are binary linear codes.
Exercise 1.14. Let G be a generator matrix of an (n, k, d )2 binary linear code. Then G has at least
kd ones in it.
Exercise 1.15. Argue that in any binary linear code, either all codewords begin with a 0 or
exactly half of the codewords begin with a 0.
Exercise 1.16. Prove (1.10).
Exercise 1.17. Show that there is no binary code with block length 4 that achieves the Hamming
bound.
Exercise 1.18. (∗) There are n people in a room, each of whom is given a black/white hat chosen
uniformly at random (and independent of the choices of all other people). Each person can see
the hat color of all other people, but not their own. Each person is asked if (s)he wishes to guess
their own hat color. They can either guess, or abstain. Each person makes their choice without
knowledge of what the other people are doing. They either win collectively, or lose collectively.
They win if all the people who don’t abstain guess their hat color correctly and at least one person
does not abstain. They lose if all people abstain, or if some person guesses their color incorrectly.
Your goal below is to come up with a strategy that will allow the n people to win with pretty high
probability. We begin with a simple warmup:
(a) Argue that the n people can win with probability at least 1/2.
Next we will see how one can really bump up the probability of success with some careful mod-
eling, and some knowledge of Hamming codes. (Below we are assuming knowledge of the general
Hamming code (see Section 2.4). If you do not want to skip ahead, you can assume that n = 7 in
the last part of this problem.)
(b) Let's say that a directed graph G is a subgraph of the n-dimensional hypercube if its vertex
set is {0, 1}n and if u → v is an edge in G, then u and v differ in at most one coordinate.
Let K (G) be the number of vertices of G with in-degree at least one, and out-degree zero.
Show that the probability of winning the hat problem equals the maximum, over directed
subgraphs G of the n-dimensional hypercube, of K (G)/2^n .
(c) Using the fact that the out-degree of any vertex is at most n, show that K (G)/2^n is at most
n/(n + 1) for any directed subgraph G of the n-dimensional hypercube.
(d) Show that if n = 2^r − 1, then there exists a directed subgraph G of the n-dimensional hyper-
cube with K (G)/2^n = n/(n + 1).
Hint: This is where the Hamming code comes in.
Chapter 2
One motivation for the topic of this chapter is the following question: How can we represent
a code? Or more specifically, how many bits does it take to describe a code C : [q]k −→ [q]n ?
In general, a code C : [q]k −→ [q]n can be stored using nq k symbols from [q] (n symbols for
each of the q k codewords) or nq k log q bits. For constant rate codes, this is exponential space,
which is prohibitive even for modest values of k like k = 100. A natural question is whether we
can do better. Intuitively to facilitate a succinct representation the code must have some extra
structure. It turns out that one broad class of codes that do possess extra structure than general
codes, is what are called linear codes. We have already seen binary linear codes in Section 1.5,
that is: C ⊆ {0, 1}n is a linear code if for all c1 , c2 ∈ C , c1 + c2 ∈ C , where the “+" denotes bit-
wise XOR. In this chapter, we will see more general linear codes. We will see that they not only
offer enough structure to get succinct representations, but they also possess several other nice
properties.
To define general linear codes, we first need to introduce general finite fields and vector
spaces over such fields and we do so first before returning to codes.
Definition 2.1.1 (Group). A group G = (S, ◦) is given by a set of elements S together with a binary operation ◦ : S × S → S that satisfies the following properties:

• ASSOCIATIVITY: For every a, b, c ∈ S, we have (a ◦ b) ◦ c = a ◦ (b ◦ c).

• IDENTITY: There exists a special element e ∈ S such that for every a ∈ S we have
a ◦ e = e ◦ a = a.

• INVERSE : For every a ∈ S, there exists its unique inverse a −1 such that a ◦ a −1 = a −1 ◦ a = e.
If G = (S, ◦) satisfies all the properties except the existence of inverses then G is called a monoid.
We say G is commutative if for every a, b ∈ S, a ◦ b = b ◦ a.
We often use the same letter to denote the group (or other algebraic structures) and the set
of elements.
We now turn to the definition of a field. Informally speaking, a field is a set of elements on
which one can do addition, subtraction, multiplication and division and still stay in the set.
Definition 2.1.2. A field F is given by a triple (S, +, ·), where S is the set of elements and +, · are
functions S × S → S with the following properties:

• (S, +) is a commutative group with the (additive) identity element denoted by 0.

• (S \ {0}, ·) is a commutative group with the (multiplicative) identity element denoted by 1.

• DISTRIBUTIVITY: For every a, b, c ∈ S, we have a · (b + c) = a · b + a · c.
Again we typically use the same letter to denote the field and its set of elements. We also
use −a to denote the additive inverse of a ∈ F and a −1 to denote the multiplicative inverse of
a ∈ F \ {0}.
With the usual semantics for + and ·, R (the set of real numbers) is a field, but Z (the set of integers)
is not a field as division of two integers results in a rational number that need not be an inte-
ger (the set of rational numbers itself is a field though: see Exercise 2.1). In this course, we will
exclusively deal with finite fields. As the name suggests these are fields with a finite set of ele-
ments. (We will overload notation and denote the size of a field |F| = |S|.) The following is a well
known result.
Theorem 2.1.3 (Size of Finite Fields). Every finite field has size p s for some prime p and integer
s ≥ 1. Conversely for every prime p and integer s ≥ 1 there exists a field F of size p s .
One example of a finite field that we have seen is the field with S = {0, 1}, which we will
denote by F2 (we have seen this field in the context of binary linear codes). For F2 , addition
is the XOR operation, while multiplication is the AND operation. The additive inverse of an
element in F2 is the number itself while the multiplicative inverse of 1 is 1 itself.
Let p be a prime number. Then the integers modulo p form a field, denoted by Fp (and also
by Zp ), where the addition and multiplication are carried out modulo p. For example, consider
F7 , where the elements are {0, 1, 2, 3, 4, 5, 6}. We have (4 + 3) mod 7 = 0 and 4 · 4 mod 7 = 2.
Further, the additive inverse of 4 is 3 as (3 + 4) mod 7 = 0 and the multiplicative inverse of 4 is 2
as 4 · 2 mod 7 = 1.
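These computations are easy to reproduce; the short sketch below (illustrative) checks the F7 examples and finds the inverses by brute force:

    p = 7
    print((4 + 3) % p, (4 * 4) % p)  # 0 2
    print(next(b for b in range(p) if (4 + b) % p == 0))     # 3, the additive inverse of 4
    print(next(b for b in range(1, p) if (4 * b) % p == 1))  # 2, the multiplicative inverse of 4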
More formally, we prove the following result.
Lemma 2.1.4. Let p be a prime. Then Fp = ({0, 1, . . . , p − 1}, +p , ·p ) is a field, where +p and ·p are
addition and multiplication modulo p.
Proof. The properties of associativity, commutativity, distributivity and identities hold for in-
tegers and hence, they hold for Fp . The closure property follows since both the “addition" and
“multiplication" are done modulo p, which implies that for any a, b ∈ {0, . . . , p −1}, a +p b, a ·p b ∈
{0, . . . , p −1}. Thus, to complete the proof, we need to prove the existence of unique additive and
multiplicative inverses.
Fix an arbitrary a ∈ {0, . . . , p − 1}. Then we claim that its unique additive inverse is b = (p − a) mod p. It is
easy to check that (a + b) mod p = 0. Next we argue that this is the unique additive inverse.
To see this, note that a + 0, a + 1, . . . , a + (p − 1) are p consecutive integers and
thus exactly one of them is divisible by p, which happens precisely for b = (p − a) mod p, as desired.
Now fix an a ∈ {1, . . . , p − 1}. Next we argue for the existence of a unique multiplicative in-
verse a^{−1}. Consider the set of numbers T = {a ·_p b | b ∈ {1, . . . , p − 1}}. We claim that all these
numbers are distinct. To see this, note that if this is not the case, then there exist b_1 ≠ b_2 ∈
{1, . . . , p − 1} such that a · b_1 = a · b_2 mod p, which in turn implies that a · (b_1 − b_2) = 0 mod p,
i.e., p divides a · (b_1 − b_2). Since p is prime, p must divide either a or b_1 − b_2. However, a and
|b_1 − b_2| are both non-zero and at most p − 1, so neither is divisible by p, which is a contradiction.
Thus, we have argued that |T| = p − 1 and since
each number in T is in [p − 1], we have that T = [p − 1]. Thus, we can conclude that there exists
a unique element b such that a · b = 1 mod p and thus, b is the required a^{−1}.
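To see the lemma in action, here is a minimal Python sketch (not from the book; the function names are ours) that finds additive and multiplicative inverses in F_p by brute force and checks them for p = 7.

def additive_inverse(a, p):
    return (p - a) % p

def multiplicative_inverse(a, p):
    # Brute-force search, mirroring the counting argument in the proof above.
    for b in range(1, p):
        if (a * b) % p == 1:
            return b
    raise ValueError("no inverse found")

p = 7
for a in range(1, p):
    inv = multiplicative_inverse(a, p)
    assert (a * inv) % p == 1 and (a + additive_inverse(a, p)) % p == 0
    print("In F_%d: -%d = %d and %d^-1 = %d" % (p, a, additive_inverse(a, p), a, inv))
# For p = 7 this reproduces the examples in the text: -4 = 3 and 4^-1 = 2.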
One might think that there could be different finite fields with the same number of elements.
However, this is not the case:
Theorem 2.1.5. For every prime power q there is a unique finite field with q elements (up to
isomorphism1 ).
The most common vector space we will focus on is Fn with + representing coordinatewise
addition in F and a · u representing the coordinatewise scaling of u by a.
We are finally ready to define the notion of linear subspaces of Fn .
Definition 2.2.2 (Linear Subspace). A non-empty subset S ⊆ Fn is a linear subspace if the follow-
ing properties hold:
1
An isomorphism φ : S → S ′ is a bijective map (such that F = (S, +, ·) and F′ = (S ′ , ⊕, ◦) are fields) where for every
a 1 , a 2 ∈ S, we have φ(a 1 + a 2 ) = φ(a 1 ) ⊕ φ(a 2 ) and φ(a 1 · a 2 ) = φ(a 1 ) ◦ φ(a 2 ).
1. For every x, y ∈ S, x + y ∈ S, where the addition is vector addition over F (that is, done
component-wise over F).

2. For every a ∈ F and x ∈ S, a · x ∈ S, where a · x denotes component-wise scaling of x by a.

For example, the following is a linear subspace of F_5^3:
S 1 = {(0, 0, 0), (1, 1, 1), (2, 2, 2), (3, 3, 3), (4, 4, 4)}. (2.1)
Note that for example (1, 1, 1) + (3, 3, 3) = (4, 4, 4) ∈ S 1 and 2 · (4, 4, 4) = (3, 3, 3) ∈ S 1 as required
by the definition. Here is another somewhat less trivial example of a linear subspace over F_3^3:
S_2 = {(0, 0, 0), (1, 0, 1), (2, 0, 2), (0, 1, 1), (0, 2, 2), (1, 1, 2), (1, 2, 0), (2, 1, 0), (2, 2, 1)}. (2.2)
Note that (1, 0, 1) + (0, 2, 2) = (1, 2, 0) ∈ S 2 and 2 · (2, 0, 2) = (1, 0, 1) ∈ S 2 as required.
Remark 2.2.3. Note that the second property implies that 0 is contained in every linear subspace.
Further for any subspace over F2 , the second property is redundant: see Exercise 2.4.
Before we state some properties of linear subspaces, we state some relevant definitions.
Definition 2.2.4 (Span). Given a set of vectors B = {v_1, . . . , v_ℓ}, the span of B is the set of vectors

{ \sum_{i=1}^{ℓ} a_i · v_i | a_i ∈ F_q for every i ∈ [ℓ] }.
Definition 2.2.5 (Linear (in)dependence of vectors). We say that v_1, v_2, . . . , v_k are linearly inde-
pendent if for every 1 ≤ i ≤ k and for every (k − 1)-tuple (a_1, a_2, . . . , a_{i−1}, a_{i+1}, . . . , a_k) ∈ F_q^{k−1},

v_i ≠ a_1 v_1 + . . . + a_{i−1} v_{i−1} + a_{i+1} v_{i+1} + . . . + a_k v_k.

In other words, v_i is not in the span of the set {v_1, . . . , v_{i−1}, v_{i+1}, . . . , v_k} for every 1 ≤ i ≤ k. We say
that v_1, v_2, . . . , v_k are linearly dependent if they are not linearly independent.
For example, the vectors (1, 0, 1), (0, 1, 1) ∈ S_2 are linearly independent.
One can define the row (column) rank of a matrix as the maximum number of linearly in-
dependent rows (columns). However, it is a well-known theorem that the row rank of a matrix
is the same as its column rank. For example, the matrix below over F_3 has full rank (see Exer-
cise 2.5):

G_2 = [ 1 0 1 ]
      [ 0 1 1 ]                                                        (2.3)
Any linear subspace satisfies the following properties (the full proof can be found in any
standard linear algebra textbook).
Theorem 2.2.7. If S ⊆ F_q^n is a linear subspace then

1. |S| = q^k for some integer k ≥ 0. (The parameter k is called the dimension of S.)
2. There exists at least one set of vectors v_1, ..., v_k ∈ S, called basis elements, such that every x ∈ S
can be expressed as x = a_1 v_1 + a_2 v_2 + ... + a_k v_k where a_i ∈ F_q for 1 ≤ i ≤ k. In other words,
there exists a full rank k × n matrix G (also known as a generator matrix) with entries from
F_q such that every x ∈ S can be written as x = (a_1, a_2, ..., a_k) · G where

G = [ ←− v_1 −→ ]
    [ ←− v_2 −→ ]
    [     ...    ]
    [ ←− v_k −→ ]
3. There exists a full rank (n − k) × n matrix H (called a parity check matrix) such that for
every x ∈ S, H xT = 0.
Proof Sketch.
Property 1. We begin with the proof of the first property. For the sake of contradiction, let
us assume that q k < |S| < q k+1 , for some k ≥ 0. Iteratively, we will construct a set of linearly
independent vectors B ⊆ S such that |B | ≥ k + 1. Note that by the definition of a linear subspace
the span of B should be contained in S. However, this is a contradiction as the size of the span
of B is at least2 q k+1 > |S|.
To complete the proof, we show how to construct the set B in a greedy fashion. In the first
step pick v1 to be any non-zero vector in S and set B ← {v1 } (we can find such a vector as |S| >
q k ≥ 1). Now say after the step t (for some t ≤ k), |B | = t . Now the size of the span of the current
B is q t ≤ q k < |S|. Thus there exists a vector vt +1 ∈ S \ B that is linearly independent of vectors
in B . Set B ← B ∪ {vt +1 }. Thus, we can continue building B until |B | = k + 1, as desired.
Property 2. We first note that we can pick B = {v1 , . . . , vk } to be any set of k linearly indepen-
dent vectors– this just follows from the argument above for Property 1. This is because the
span of B is contained in S. However, since |S| = q k and the span of B has q k vectors, the two
have to be the same.
Property 3. Property 3 above follows from another fact that every linear subspace S has a null
space N ⊆ Fnq such that for every x ∈ S and y ∈ N , 〈x, y〉 = 0. Further, it is known that N itself is
a linear subspace of dimension n − k. (The claim that N is also a linear subspace follows from
the following two facts: for every x, y, z ∈ Fnq , (i) 〈x, y + z〉 = 〈x, y〉 + 〈x, z〉 and (ii) for any a ∈ Fq ,
〈x, ay〉 = a ·〈x, y〉.) In other words, there exists a generator matrix H for it. This matrix H is called
the parity check matrix of S.
2
See Exercise 2.7.
Property 4. See Exercise 2.8.
As examples, the linear subspace S 1 in (2.1) has as one of its generator matrices
G_1 = ( 1 1 1 ).
Further, the linear subspace S 2 in (2.2) has G 2 as one of its generator matrices and has the fol-
lowing as one of its parity check matrices
H_2 = ( 1 1 2 ).
Lemma 2.2.8. Given matrix G of dimension k × n that is a generator matrix of subspace S 1 and
matrix H of dimension (n −k)×n that is a parity check matrix of subspace S 2 such that G H T = 0,
then S 1 = S 2 .
Proof. We first prove that S_1 ⊆ S_2. Given any c ∈ S_1, there exists x ∈ F_q^k such that c = xG. Then,

H · c^T = H · (xG)^T = H G^T x^T = (G H^T)^T x^T = 0,

which implies that c ∈ S_2. Thus S_1 ⊆ S_2 and, since S_1 and S_2 have the same dimension, S_1 = S_2.^3
Definition 2.3.1 (Linear Codes). Let q be a prime power (i.e. q = p s for some prime p and integer
s ≥ 1). C ⊆ F_q^n is a linear code if it is a linear subspace of F_q^n. If C has dimension k and distance d
then it will be referred to as an [n, k, d ]q or just an [n, k]q code.
Theorem 2.2.7 now gives two alternate characterizations of an [n, k]q linear code C : C is
generated by a k × n generator matrix G. Alternately C is characterized by a (n − k) × n parity
check matrix H . Since these are important concepts for us, we define these formally below
before giving examples and consequences.
3
If not, S_1 ⊂ S_2, which implies that |S_2| ≥ |S_1| + 1. The latter is not possible if both S_1 and S_2 have the same
dimension.
Definition 2.3.2 (Generator and Parity Check Matrices). If C is an [n, k]_q linear code then there
exists a matrix G ∈ F_q^{k×n} of rank k (called a generator matrix of C) satisfying

C = {x · G | x ∈ F_q^k}.

Further, there exists a matrix H ∈ F_q^{(n−k)×n} of rank n − k (called a parity check matrix of C) satisfying

C = {y ∈ F_q^n | H · y^T = 0}.
Note that we require G and H to have full row rank (i.e., the rows of G are linearly indepen-
dent and the same holds for H). Sometimes we will consider matrices M ∈ F_q^{m×n} that are not of
full row rank. These can still be used to generate a code C = {x · M | x ∈ F_q^m}, though the code C
will not be an [n, m]_q code. We will still refer to C as the code generated by M in such a case,
though the phrase “generator matrix” will be reserved for full rank matrices.
Note that neither the generator matrix nor the parity check matrix are unique for a given
code. However their dimensions are. We give examples of these matrices for the case of the
[7, 4, 3]2 Hamming code below.
• The [7, 4, 3]2 Hamming code has the following generator matrix:
G = [ 1 0 0 0 0 1 1 ]
    [ 0 1 0 0 1 0 1 ]
    [ 0 0 1 0 1 1 0 ]
    [ 0 0 0 1 1 1 1 ]
• The following matrix is a parity check matrix of the [7, 4, 3]2 Hamming code:
H = [ 0 0 0 1 1 1 1 ]
    [ 0 1 1 0 0 1 1 ]
    [ 1 0 1 0 1 0 1 ]
Indeed, it can be easily verified that G · H T = 0. Then Lemma 2.2.8 proves that H is indeed
a parity check matrix of the [7, 4, 3]2 Hamming code.
We now look at some consequences of the above characterizations of an [n, k]q linear code
C . We started this chapter with a quest for succinct representation of a code. Note that both the
generator matrix and the parity check matrix can be represented using O(n 2 ) symbols from Fq
(which is much smaller than the exponential representation of a general code). More precisely
we have the following (see also Exercise 2.10):
Proposition 2.3.3. Any [n, k]q linear code can be represented with min(nk, n(n − k)) symbols
from Fq .
There is an encoding algorithm for C that runs in O(n 2 ) (in particular O(kn)) time– given a
message m ∈ Fkq , the corresponding codeword C (m) = m · G, where G is the generator matrix of
C . (See Exercise 2.11.)
Proposition 2.3.4. For any [n, k]q linear code, given its generator matrix, encoding can be done
with O(nk) operations over Fq .
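As an illustration, the following Python sketch (not from the book; the function name is ours) implements this O(kn) encoder for the [7, 4, 3]_2 Hamming generator matrix given earlier.

G = [
    [1, 0, 0, 0, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 1, 1],
]

def encode(m, G, q=2):
    # Codeword = m * G; one inner product per column, so O(kn) operations over F_q.
    n = len(G[0])
    return [sum(m[i] * G[i][j] for i in range(len(m))) % q for j in range(n)]

print(encode([1, 0, 1, 1], G))  # [1, 0, 1, 1, 0, 1, 0]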
There is an error-detecting algorithm for C that runs in O(n 2 ). This is a big improvement
over the naive brute force exponential time algorithm (that goes through all possible codewords
c ∈ C and checks if y = c). (See Exercise 2.12.)
Proposition 2.3.5. For any [n, k]q linear code, given its parity check matrix, error detection can
be performed in O(n(n − k)) operations over Fq .
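Similarly, here is a small sketch (again ours, not from the book) of the error detector: it simply checks H · y^T = 0 row by row, using the parity check matrix of the [7, 4, 3]_2 Hamming code from above.

H = [
    [0, 0, 0, 1, 1, 1, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0, 1],
]

def is_codeword(y, H, q=2):
    # O(n(n - k)) operations: one inner product per row of H.
    return all(sum(h * v for h, v in zip(row, y)) % q == 0 for row in H)

c = [1, 0, 1, 1, 0, 1, 0]      # a codeword of the [7, 4, 3]_2 Hamming code
assert is_codeword(c, H)
c[2] ^= 1                      # flip one bit
assert not is_codeword(c, H)   # the error is detected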
Proposition 2.3.6. For any [n, k, d]_q linear code C, its distance satisfies

d = min_{c ∈ C, c ≠ 0} wt(c).
Proof. To show that d is the same as the minimum weight we show that d is no more than the
minimum weight and d is no less than the minimum weight.
First, we show that d is no more than the minimum weight. We can see this by considering
∆(0, c′ ) where c′ is the non-zero codeword in C with minimum weight; its distance from 0 is
equal to its weight. Thus, we have d ≤ w t (c′ ), as desired.
Now, to show that d is no less than the minimum weight, consider c1 6= c2 ∈ C such that
∆(c1 , c2 ) = d . Note that c1 − c2 ∈ C (this is because −c2 = −1 · c2 ∈ C , where −1 is the additive
inverse of 1 in F_q and c_1 − c_2 = c_1 + (−c_2), which is in C by the definition of linear codes). Now
note that w t (c1 − c2 ) = ∆(c1 , c2 ) = d , since the non-zero symbols in c1 − c2 occur exactly in the
positions where the two codewords differ. Further, since c1 6= c2 , c1 − c2 6= 0, which implies that
the minimum Hamming weight of any non-zero codeword in C is at most d .
Next, we look at another property implied by the parity check matrix of a linear code.
Proposition 2.3.7. For every [n, k, d ]q code C with parity check matrix H , d equals the size of the
smallest subset of columns of H that are linearly dependent.
Proof. By Proposition 2.3.6, we need to show that the minimum weight of a non-zero codeword
in C is the minimum number of linearly dependent columns. Let t be the minimum number of
linearly dependent columns in H . To prove the claim we will show that t ≤ d and t ≥ d .
For the first direction, let c ≠ 0 ∈ C be a codeword with wt(c) = d. Now note that, by the
definition of the parity check matrix, H · c^T = 0. Working through the matrix multiplication, this
gives us that \sum_{i=1}^{n} c_i H^i = 0, where H^i denotes the i-th column of

H = ( H^1 H^2 · · · H^i · · · H^n )

and c = (c_1, . . . , c_n). We can ignore the columns for which the corresponding entry c_i is zero; for the
sum to be zero, the columns H^i with c_i ≠ 0 must therefore be linearly dependent. This
means that d ≥ t, as the columns corresponding to non-zero entries in c are one instance of
linearly dependent columns.
For the other direction, consider a minimum-size set of columns of H, say H^{i_1}, H^{i_2}, . . . , H^{i_t}, that
is linearly dependent. This implies that there exist non-zero elements c′_{i_1}, . . . , c′_{i_t} ∈ F_q such
that c′_{i_1} H^{i_1} + . . . + c′_{i_t} H^{i_t} = 0. (Note that all the c′_{i_j} are non-zero, as no set of fewer than t columns
is linearly dependent.) Now extend c′_{i_1}, . . . , c′_{i_t} to the vector c′ by setting c′_j = 0 for j ∉ {i_1, . . . , i_t}.
Note that we have H · (c′)^T = 0 and thus c′ ∈ C. This in turn implies that d ≤ wt(c′) = t
(where recall t is the minimum number of linearly dependent columns in H).
In other words, the general [2^r − 1, 2^r − r − 1]_2 Hamming code is the code

{c ∈ {0, 1}^{2^r − 1} | H_r · c^T = 0}.
and the resulting code was a [7, 4, 3]2 code.
Next we argue that the above Hamming code has distance 3 (in Proposition 1.5.2, we argued
this for r = 3).
Now note that under the Hamming bound for d = 3 (Theorem 1.6.2), k ≤ n − log_2(n + 1), so
for n = 2^r − 1, k ≤ 2^r − r − 1. Hence, the Hamming code is a perfect code. (See Definition 1.7.3.)
In Question 1.7.1, we asked which codes are perfect codes. Interestingly, the only perfect
binary codes are the following:
• The trivial [n, 1, n]2 codes for odd n (which have 0n and 1n as the only codewords): see
Exercise 2.22.
Algorithm 3 Naive Decoder for Hamming Code
I NPUT: Received word y
O UTPUT: c if ∆(y, c) ≤ 1 else Fail
1: IF y ∈ C H ,r THEN
2: RETURN y
3: FOR i = 1 . . . n DO
4: y′ ← y + ei ⊲ ei is the i th standard basis vector
5: IF y′ ∈ C H ,r THEN
6: RETURN y′
7: RETURN Fail
w t (z) ≤ t ) and check if y − z is in the code or not. Algorithm 4 presents the formal algorithm
(where C is an [n, k, 2t + 1]q code).
1: FOR i = 0 . . . t DO
2: FOR S ⊆ [n] such that |S| = i DO
3: FOR z ∈ Fnq such that w t (zS ) = w t (z) = i DO
4: IF y − z ∈ C THEN
5: RETURN y − z
6: RETURN Fail
The number of error patterns z considered by Algorithm 4 is^5 \sum_{i=0}^{t} \binom{n}{i} (q − 1)^i ≤ O((nq)^t).
Further by Proposition 2.3.5, Step 4 can be performed with O(n 2 ) operations over Fq . Thus, Al-
gorithm 4 runs with O(n t +2 q t ) operations over Fq , which for q being a small polynomial in n,
is n O(t ) operations. In other words, the algorithm will have polynomial running time for codes
with constant distance (though the running time would not be practical even for moderate val-
ues of t ).
However, it turns out that for Hamming codes there exists a decoding algorithm with an
O(n 2 ) running time. To see this, first note that if the received word y has no errors, then Hr ·yT =
0. If not, then y = c + ei , where c ∈ C and ei is the unit vector with the only nonzero element at
the i-th position. Thus, if H_r^i stands for the i-th column of H_r,

H_r · y^T = H_r · c^T + H_r · e_i^T = H_r · e_i^T = H_r^i,

where the second equality follows as H_r · c^T = 0, which in turn follows from the fact that c ∈ C.
In other words, Hr · yT gives the location of the error. This leads to Algorithm 5.
5
Recall (1.15).
Algorithm 5 Efficient Decoder for Hamming Code
I NPUT: Received word y
O UTPUT: c if ∆(y, c) ≤ 1 else Fail
1: b ← Hr · yT .
2: Let i ∈ [n] be the number whose binary representation is b
3: IF y − ei ∈ C H THEN
4: RETURN y − ei
5: RETURN Fail
Note that H_r is an r × n matrix where n = 2^r − 1 and thus r = Θ(log n). This implies that Step 1 in
Algorithm 5, which is a matrix-vector multiplication, can be done in time O(n log n). By a similar
argument and by Proposition 2.3.5, Step 3 can be performed in O(n log n) time, and therefore
Algorithm 5 overall runs in O(n log n) time. Thus,
Theorem 2.5.1. The [n = 2^r − 1, 2^r − r − 1, 3]_2 Hamming code is 1-error correctable. Furthermore,
decoding can be performed in time O(n log n).
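The following Python sketch (not from the book) implements Algorithm 5 for r = 3; as the algorithm assumes, it corrects at most one error.

H = [
    [0, 0, 0, 1, 1, 1, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0, 1],
]

def decode(y):
    # Column j of H is the binary representation of j, so the syndrome H * y^T,
    # read as a binary number, is the position of the (single) error, or 0 if none.
    s = [sum(h * v for h, v in zip(row, y)) % 2 for row in H]
    i = 4 * s[0] + 2 * s[1] + s[2]
    if i == 0:
        return y
    corrected = list(y)
    corrected[i - 1] ^= 1
    return corrected

c = [1, 0, 1, 1, 0, 1, 0]
y = list(c); y[4] ^= 1         # introduce one error at position 5
assert decode(y) == c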
Definition 2.6.1 (Dual of a code). Let H be a parity check matrix of a code C , then the code
generated by H is called the dual of C . The dual of a code C is denoted by C ⊥ .
It is obvious from the definition that if C is an [n, k]q code, then C ⊥ is an [n, n −k]q code. Ap-
plying duality to the Hamming codes and a close relative, we get two families of codes described
below.
Definition 2.6.2 (Simplex and Hadamard Codes). For positive integer r, the Simplex code C_{Sim,r}
is the code generated by H_r. (Equivalently, C_{Sim,r} = C_{H,r}^⊥.) For positive integer r, the Hadamard
code C_{Had,r} is the [2^r, r]_2 code generated by the r × 2^r matrix H_r′ obtained by adding the all-zeros
column to H_r.
We claim that C_{Sim,r} and C_{Had,r} are [2^r − 1, r, 2^{r−1}]_2 and [2^r, r, 2^{r−1}]_2 codes respectively. The
claimed block length and dimension follow from the definition of the codes, while the distance
follows from the following result.
Proof. We first show the result for C_{Had,r}. In fact, we will show something stronger: every non-
zero codeword in C_{Had,r} has weight exactly equal to 2^{r−1} (the claimed distance follows from
Proposition 2.3.6). Consider a message x ≠ 0. Let its i-th entry be x_i = 1. x is encoded as

c = (x_1, x_2, . . . , x_r) · (H_r^0, H_r^1, . . . , H_r^{2^r − 1}),

where H_r^j is the binary representation of 0 ≤ j ≤ 2^r − 1 (that is, the columns range over all the vectors in
{0, 1}^r). Further note that the j-th bit of the codeword c is 〈x, H_r^j〉. Group all the columns of the
generator matrix into pairs (u, v) such that v = u + ei (i.e. v and u are the same except in the i th
position). Notice that this partitions all the columns into 2^{r−1} disjoint pairs. Then,

〈x, v〉 = 〈x, u + e_i〉 = 〈x, u〉 + 〈x, e_i〉 = 〈x, u〉 + x_i = 〈x, u〉 + 1.

Thus we have that exactly one of 〈x, v〉 and 〈x, u〉 is 1. As the choice of the pair (u, v) was arbitrary,
we have proved that for any non-zero codeword c ∈ C_{Had,r}, wt(c) = 2^{r−1}.
For the simplex code, we observe that all codewords of C_{Had,r} are obtained by padding a 0 to
the beginning of the codewords in C_{Sim,r}, which implies that all non-zero codewords in C_{Sim,r}
also have a weight of 2^{r−1}, which completes the proof.
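The following small sketch (not from the book; the helper function is ours) verifies the weight claim exhaustively for a few values of r.

from itertools import product

def hadamard_codeword(x, r):
    # The coordinate indexed by j in {0,1}^r is the inner product <x, j> over F_2.
    return [sum(a * b for a, b in zip(x, j)) % 2 for j in product([0, 1], repeat=r)]

for r in [2, 3, 4]:
    for x in product([0, 1], repeat=r):
        if any(x):
            assert sum(hadamard_codeword(x, r)) == 2 ** (r - 1)
    print("r = %d: every non-zero codeword has weight %d" % (r, 2 ** (r - 1)))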
We remark that the family of Hamming codes has rate approaching 1 and relative distance approaching 0,
while the families of Simplex/Hadamard codes have rate approaching 0 and relative distance 1/2.
Thus neither gives a positive answer to Question 1.8.2 and so the quest for an asymptotically
good code remains ongoing for now (and we will get to these in future chapters).
2.7 Exercises
Exercise 2.1. Prove that the set of rationals (i.e. the set of reals of the form a/b, where both a and
b ≠ 0 are integers), denoted by Q, is a field.
Exercise 2.2. Let q be a prime power. Let x ∈ Fq such that x 6∈ {0, 1}. Then prove that for any
n ≤ q − 1:
\sum_{i=0}^{n} x^i = (x^{n+1} − 1)/(x − 1).
Exercise 2.3. The main aim of this exercise is to prove the following identity that is true for any
α ∈ Fq :
α^q = α.                                                               (2.4)
To make progress towards the above we will prove a sequence of properties of groups. A group G
is a pair (S, ◦) where ◦ : S × S → S is an operator that is commutative^6 and under which the elements of S
are closed. Further, there is a special element ι ∈ S that is the identity element and every
element a ∈ S has an inverse element b ∈ S such that a ◦ b = ι. Note that a finite field Fq consists
of an additive group with the + operator (and 0 as additive identity) and a multiplicative group
on the non-zero elements of Fq (which is also denoted by F∗q ) with the · operator (and 1 as the
multiplicative identity).7
6
Technically, G is an abelian group.
7
Recall Definition 2.1.2.
For the rest of the problem let G = (S, ·) be a multiplicative group with |G| = m. Prove the
following statements.
1. For any β ∈ G, let o(β) be the smallest integer o such that β^o = 1. Prove that such an o ≤ m
always exists. Further, argue that T = {1, β, . . . , β^{o−1}} also forms a group. (T, ·) is called a
sub-group of G and o(β) is called the order of β.
g T = {g · β|β ∈ T }.
Prove that if h^{−1} · g ∈ T then g T = hT, and g T ∩ hT = ∅ otherwise. Further argue that these
cosets partition the group G into disjoint sets.
4. Using the above results or otherwise, argue that for any β ∈ G, we have

β^m = 1.
5. Prove (2.4).
Exercise 2.4. Prove that for q = 2, the second condition in Definition 2.2.2 is implied by the first
condition.
Exercise 2.6. In this problem we will look at the problem of solving a system of linear equations
over Fq . That is, one needs to solve for unknowns x 1 , . . . , x n given the following m linear equations
(where a i , j , b i ∈ Fq for 1 ≤ i ≤ m and 1 ≤ j ≤ n):
1. (Warm-up) Convince yourself that the above problem can be stated as A · xT = bT , where A
is an m × n matrix over Fq , x ∈ Fnq and b ∈ Fm
q .
2. (Upper Triangular Matrix) Assume n = m and that A is upper triangular, i.e. all diagonal
elements (a i ,i ) are non-zero and all lower triangular elements (a i , j , i > j ) are 0. Then
present an O(n 2 ) time8 algorithm to compute the unknown vector x.
8
For this problem, any basic operation over Fq takes unit time.
3. (Gaussian Elimination) Assume that A has full rank (or equivalently a rank of n.)
(a) Prove that the following algorithm due to Gauss converts A into an upper triangular
matrix. By permuting the columns if necessary make sure that a 1,1 6= 0. (Why can one
assume w.l.o.g. that this can be done?) Multiply each row 1 < i ≤ n by a_{1,1}/a_{i,1} and then
subtract a_{1,j} from the (i, j)-th entry for 1 ≤ j ≤ n. Recurse with the same algorithm on the
(n −1)×(n −1) matrix A ′ obtained by removing the first row and column from A. (Stop
when n = 1.)
(b) What happens if A does not have full rank? Show how one can modify the algorithm
above to either upper triangulate a matrix or report that it does not have full rank.
(Convince yourself that your modification works.)
(c) Call a system of equations A · xT = bT consistent if there exists a solution to x ∈ Fnq .
Show that there exists an O(n 3 ) algorithm that finds the solution if the system of equa-
tions is consistent and A has full rank (and report “fail" otherwise).
4. (m < n case) Assume that A has full rank, i.e. has a rank of m. In this scenario either the
system of equations is inconsistent or there are q n−m solutions to x. Modify the algorithm
from above to design an O(m 2 n) time algorithm to output the solutions (or report that the
system is inconsistent).
• Note that in case the system is consistent there will be q n−m solutions, which might be
much bigger than O(m 2 n). Show that this is not a problem as one can represent the
solutions as system of linear equations. (I.e. one can have n − m “free" variables and
m “bound" variables.)
5. (m > n case) Assume that A has full rank, i.e. a rank of n. In this scenario either the system
of equations is inconsistent or there is a unique solution to x. Modify the algorithm from
above to design an O(m 2 n) time algorithm to output the solution (or report that the system
is inconsistent).
6. (Non-full rank case) Give an O(m 2 n) algorithm for the general case, i.e. the m × n matrix
A need not have full rank. (The algorithm should either report that the system of equations
is inconsistent or output the solution(s) to x.)
Exercise 2.7. Prove that the span of k linearly independent vectors over Fq has size exactly q k .
Exercise 2.8. Let G and H be a generator and parity check matrix of the same linear code of
dimension k and block length n. Then G · H T = 0.
Exercise 2.9. Let C be an [n, k]q linear code with a generator matrix with no all zeros columns.
Then for every position i ∈ [n] and α ∈ Fq , the number of codewords c ∈ C such that c i = α is
exactly q k−1 .
Exercise 2.11. Prove Proposition 2.3.4.
Exercise 2.13. A set of vector S ⊆ Fnq is called t -wise independent if for every set of positions I with
|I | = t , the set S projected to I has each of the vectors in Ftq appear the same number of times. (In
other words, if one picks a vector (s 1 , . . . , s n ) from S at random then any of the t random variables
are uniformly and independently random over Fq ).
Prove that any linear code C whose dual C ⊥ has distance d ⊥ is (d ⊥ − 1)-wise independent.
Exercise 2.14. A set of vectors S ⊆ Fk2 is called ε-biased sample space if the following property
holds. Pick a vector X = (x 1 , . . . , x k ) uniformly at random from S. Then X has bias at most ε, that
is, for every I ⊆ [k],
| Pr[ \sum_{i∈I} x_i = 0 ] − Pr[ \sum_{i∈I} x_i = 1 ] | ≤ ε.
2. Let C be an [n, k]_2 code such that all non-zero codewords have Hamming weight in the
range [(1/2 − γ)n, (1/2 + γ)n] for some constant 0 < γ < 1/2. Then there exists an ε-biased
space of size n^{O(γ^{−1}·log(1/ε))}.
Exercise 2.15. Let C be an [n, k, d ]q code. Let y = (y 1 , . . . , y n ) ∈ (Fq ∪{?})n be a received word9 such
that y i =? for at most d − 1 values of i . Present an O(n 3 ) time algorithm that outputs a codeword
c = (c 1 , . . . , c n ) ∈ C that agrees with y in all un-erased positions (i.e., c i = y i if y i 6=?) or states that
no such c exists. (Recall that if such a c exists then it is unique.)
Exercise 2.16. In the chapter, we did not talk about how to obtain the parity check matrix of a
linear code from its generator matrix. In this problem, we will look at this “conversion" procedure.
(a) Prove that any generator matrix G of an [n, k]q code C (recall that G is a k × n matrix) can
be converted into another equivalent generator matrix of the form G′ = [Ik |A], where Ik is
the k × k identity matrix and A is some k × (n − k) matrix. By “equivalent," we mean that
the code generated by G′ has a linear bijective map to C .
Note that the code generated by G′ has the message symbols as its first k symbols in the cor-
responding codeword. Such codes are called systematic codes. In other words, every linear
code can be converted into a systematic code. Systematic codes are popular in practice as
they allow for immediate access to the message symbols.
9
A ? denotes an erasure.
(b) Given an k × n generator matrix of the form [Ik |A], give a corresponding (n − k) × n parity
check matrix. Briefly justify why your construction of the parity check matrix is correct.
Hint: Try to think of a parity check matrix that can be decomposed into two submatrices: one will be closely
related to A and the other will be an identity matrix, though the latter might not be a k × k matrix).
(c) Use part (b) to present a generator matrix for the [2r − 1, 2r − r − 1, 3]2 Hamming code.
Exercise 2.17. So far in this book we have seen that one can modify one code to get another code
with interesting properties (for example, the construction of the Hadamard code from the Simplex
code from Section 2.6 and Exercise 1.7). In this problem you will need to come up with more ways
of constructing new codes from existing ones.
Prove the following statements (recall that the notation (n, k, d )q code is used for general codes
with q k codewords where k need not be an integer, whereas the notation [n, k, d ]q code stands for
a linear code of dimension k):
1. If there exists an (n, k, d )2m code, then there also exists an (nm, km, d ′ ≥ d )2 code.
2. If there exists an [n, k, d ]2m code, then there also exists an [nm, km, d ′ ≥ d ]2 code.
3. If there exists an [n, k, d ]q code, then there also exists an [n − d , k − 1, d ′ ≥ ⌈d /q⌉]q code.
4. If there exists an [n, k, δn]_q code, then for every m ≥ 1, there also exists an
(n^m, k/m, (1 − (1 − δ)^m) · n^m)_{q^m} code.

5. If there exists an [n, k, δn]_2 code, then for every odd m ≥ 1, there also exists an
[n^m, k, (1/2) · (1 − (1 − 2δ)^m) · n^m]_2 code.
Note: In all the parts, the only things that you can assume about the original code are only the
parameters given by its definition– nothing else!
Exercise 2.18. Let C 1 be an [n, k 1 , d 1 ]q code and C 2 be an [n, k 2 , d 2 ]q code. Then define a new
code as follows:
C 1 ⊖C 2 = {(c1 , c1 + c2 )|c1 ∈ C 1 , c2 ∈ C 2 }.
Next we will prove interesting properties of this operations on codes:
1. If G i is the generator matrix for C i for i ∈ [2], what is a generator matrix for C 1 ⊖C 2 ?
2. Argue that C_1 ⊖ C_2 is a [2n, k_1 + k_2, d]_q code, where d := min(2d_1, d_2).
3. Assume there exists algorithms Ai for code C i for i ∈ [2] such that: (i) A1 can decode from
e errors and s erasures such that 2e + s < d 1 and (ii) A2 can decode from ⌊(d 2 − 1)/2⌋ errors.
Then argue that one can correct ⌊(d − 1)/2⌋ errors for C 1 ⊖C 2 .
Hint: Given a received word (y1 , y2 ) ∈ Fnq × Fnq , first apply A2 on y2 − y1 . Then create an intermediate received
word for A1 .
4. We will now consider a recursive construction of a binary linear code that uses the ⊖ oper-
ator. For integers 0 ≤ r ≤ m, we define the code C (r, m) as follows:
• C (r, r ) = Fr2 and C (0, r ) is the code with only two codewords: the all ones and all zeroes
vector in Fr2 .
• For 1 < r < m, C (r, m) = C (r, m − 1) ⊖C (r − 1, m − 1).
Exercise 2.20. In Section 2.4 we considered the binary Hamming code. In this problem we will
consider the more general q-ary Hamming code. In particular, let q be a prime power and r ≥ 1
be an integer. Define the following r × n matrix H q,r , where each column is an non-zero vector
from Frq such that the first non-zero entry is 1. For example,
H_{3,2} = [ 0 1 1 1 ]
          [ 1 0 1 2 ]
In this problem we will derive the parameters of the code. Define the generalized Hamming code
C H ,r,q to be the linear code whose parity check matrix is H q,r . Argue that
1. The block length of C_{H,r,q} is n = (q^r − 1)/(q − 1).
Exercise 2.21. Design the best 6-ary code (family) with distance 3 that you can.
Exercise 2.22. Prove that the [n, 1, n]2 code for odd n (i.e. the code with the all zeros and all ones
vector as it only two codewords) attains the Hamming bound (Theorem 1.7.2).
Exercise 2.23. Let C be an [n, k]q code with generator matrix G. Then given a codeword c ∈ C
one can compute the corresponding message in time O(kn 2 ).
Exercise 2.24. Given a c ∈ C H ,r , one can compute the corresponding message in time O(n).
Exercise 2.25. Let C be an (n, k)_q code. Prove that if C can be decoded from e errors in time T(n),
then it can be decoded from e + c errors in time O((nq)^c · T(n)).
Exercise 2.26. Show that the bound of kd of the number of ones in the generator matrix of any
binary linear code (see Exercise 1.14) cannot be improved for every code.
Exercise 2.27. Let C be a linear code. Then prove that (C^⊥)^⊥ = C.
Exercise 2.28. Note that for any linear code C, the codeword 0 is in both C and C^⊥. Show that
there exists a linear code C such that it shares a non-zero codeword with C ⊥ .
Exercise 2.29. We go into a bit of diversion and look at how finite fields are different from infinite
fields (e.g. R). Most of the properties of linear subspaces that we have used for linear codes (e.g.
notion of dimension, the existence of generator and parity check matrices, notion of duals) also
hold for linear subspaces over R.10 One trivial property that holds for linear subspaces over finite
fields that does not hold over R is that a linear subspace over F_q with dimension k has size q^k
(though this is a trivial consequence of the fact that F_q is a finite field while R is an infinite field). Next, we
consider a more subtle distinction.
Let S ⊆ R^n be a linear subspace over R and let S^⊥ be the dual of S. Then show that
S ∩ S ⊥ = {0} .
By contrast, linear subspaces over finite fields can have non-trivial intersection with their duals
(see e.g. Exercise 2.28).
Exercise 2.31. A linear code C is called self dual if C = C ⊥ . Show that for
Exercise 2.32. Given a code C a puncturing of C is another code C ′ where the same set of positions
are dropped in all codewords of C . More precisely, if C ⊆ Σn and the set of punctured positions is
P ⊆ [n], then the punctured code is {(c i )i 6∈P |(c 1 , . . . , c n ) ∈ C }.
Prove that a linear code with no repetitions (i.e. there are no two positions i ≠ j such that for
every codeword c ∈ C, c_i = c_j) is a puncturing of the Hadamard code. Hence, the Hadamard code is
the “longest” linear code that does not repeat.
10
A linear subspace S ⊆ Rn is the same as in Definition 2.2.2 where all occurrences of the finite field Fq is replaced
by R.
Exercise 2.33. In this problem we will consider the long code. For the definition, we will use
the functional way of looking at the ambient space as mentioned in Remark 1.2.2. A long code
of dimension k is a binary code in which the codeword corresponding to x ∈ F_2^k is the function
f : {0, 1}^{2^k} → {0, 1} defined as follows: for any m = (m_α)_{α ∈ F_2^k} ∈ {0, 1}^{2^k}, we have f(m) = m_x. Derive the
parameters of the long code.
Finally, argue that the long code is the code with the longest block length such that the code-
words do not have a repeated coordinate (i.e. there does not exists i 6= j such that for every code-
word c, c i = c j ). (Contrast this with the property of Hadamard code above.)
Chapter 3
In the chapters to come we will explore questions of the form: “Given n, k, d and q does an
(n, k, d )q code exist?” To answer such questions we will apply the “probabilistic method” — the
method that demonstrates the existence of an object with a given property by showing that a
randomly chosen object has the property with positive probability. To elaborate on this sen-
tence we need to introduce the basic language and tools of probability theory which we do in
Section 3.1.
We then introduce the probabilistic method in Section 3.2. We even apply the method to
answer a very simple question:

Question 3.0.1. Does there exist a [2, 2, 1]_2 code?

We note that the answer to the above question is trivially yes: just pick the generator matrix
to be the 2 × 2 identity matrix. But our proof will have the advantage of generalizing to broader
settings, though we save the generalizations for later chapters.
Finally in Section 3.3 we introduce the “entropy function” which turns out to be central in
the understanding of limits of codes (both existence and non-existence).
G = [r_1; r_2]   U(G)    V_00   V_01   V_10   V_11
[00; 00]         1/16     0      0      0      0
[00; 01]         1/16     0      1      0      1
[00; 10]         1/16     0      1      0      1
[00; 11]         1/16     0      2      0      2
[01; 00]         1/16     0      0      1      1
[01; 01]         1/16     0      1      1      0
[01; 10]         1/16     0      1      1      2
[01; 11]         1/16     0      2      1      1
[10; 00]         1/16     0      0      1      1
[10; 01]         1/16     0      1      1      2
[10; 10]         1/16     0      1      1      0
[10; 11]         1/16     0      2      1      1
[11; 00]         1/16     0      0      2      2
[11; 01]         1/16     0      1      2      1
[11; 10]         1/16     0      1      2      1
[11; 11]         1/16     0      2      2      0

Table 3.1: Uniform distribution over F_2^{2×2} along with values of four random variables (each matrix G is written by listing its two rows, and V_m(G) = wt(m · G)).
where [0, 1] is shorthand for the interval of all real numbers between 0 and 1.
An event E is a predicate over the domain D, i.e. it maps every element of D to “true” or “false”.
Equivalently an event is a subset of the domain D, i.e., those elements that are mapped to true.
We switch between “logical” or ”set-theoretic” notation to denote combinations of events. So
the disjunction of events E1 and E2 may be denoted E1 ∨ E2 or E1 ∪ E2 . Similarly the conjunction
of E1 and E2 may be denoted E1 ∧ E2 or E1 ∩ E2 ; and the negation of E1 may be denote ¬E1 or E1 .
In this book, we will primarily deal with the following special distribution:
For example, consider the domain D = F22×2 , i.e. the set of all 2 × 2 matrices over F2 . (Note
that each such matrix is a generator matrix of some [2, 2]2 code.) The first two columns of Ta-
ble 3.1 list the elements of this D along with the corresponding probabilities for the uniform
distribution.
Typically, we will be interested in a real-valued function defined on D and how it behaves
under a probability distribution defined over D. This is captured by the notion of a random
variable1 :
1
We note that the literature on probability theory allows for more general random variables, but for our purposes
we restrict only to real-valued ones.
Definition 3.1.2 (Random Variable). Let D be a finite domain and I ⊂ R be a finite2 subset. Let p
be a probability distribution defined over D. A random variable is a function:
V : D → I.
E [1E ] = Pr [E is true] .
Next, we state a simple yet useful property of expectation of a sum of random variables:
2
In general, I need not be finite. However, for this book this definition suffices.
Proposition 3.1.4 (Linearity of Expectation). Given random variables V1 , . . . ,Vm defined over the
same domain D and with the same probability distribution p, we have

E[ \sum_{i=1}^{m} V_i ] = \sum_{i=1}^{m} E[V_i].
In the equalities above, (3.4) and (3.7) follow from the definition of expectation of a random
variable. (3.5) follows from the definition of V and (3.6) follows by switching the order of the
two summations.
As an example, we have
E[ 1_{V_01 = 0} + 1_{V_10 = 0} + 1_{V_11 = 0} ] = 3/4.                 (3.8)
Frequently, we will need to deal with the probability of the union of events. We will use the
following result to upper bound such probabilities:
Proposition 3.1.5 (Union Bound). Given m binary random variables A_1, . . . , A_m, we have

Pr[ (A_1 ∨ A_2 ∨ · · · ∨ A_m) = 1 ] ≤ \sum_{i=1}^{m} Pr[A_i = 1].
In the above, (3.9) and (3.11) follow from the definition of S i . (3.10) follows from the fact that
some of the x ∈ ∪i S i get counted more than once.
We remark that the union bound is tight when the events are disjoint. (In other words, using
the notation in the proof above, when S i ∩ S j = ; for every i 6= j .)
As an example, let A 1 = 1V01 =0 , A 2 = 1V10 =0 and A 3 = 1V11 =0 . Note that in this case the event
A 1 ∨ A 2 ∨ A 3 is the same as the event that there exists a non-zero m ∈ {0, 1}2 such that w t (m·G) =
0. Thus, the union bound implies (that under the uniform distribution over F22×2 )
Pr[ there exists an m ∈ {0, 1}^2 \ {(0, 0)} such that wt(mG) = 0 ] ≤ 3/4.      (3.12)
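The following sketch (not from the book) computes the exact probability on the left-hand side of (3.12) by enumerating all sixteen matrices; it comes out to 10/16, consistent with the union bound of 3/4.

from itertools import product

bad = 0
for entries in product([0, 1], repeat=4):
    G = [entries[:2], entries[2:]]
    # Check whether some non-zero message m has wt(m * G) = 0.
    if any(sum(sum(m[i] * G[i][j] for i in range(2)) % 2 for j in range(2)) == 0
           for m in [(0, 1), (1, 0), (1, 1)]):
        bad += 1

print(bad / 16)   # 0.625, which is indeed at most the union bound of 3/4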
Finally, we present two bounds on the probability of a random variable deviating signifi-
cantly from its expectation. The first bound holds for any random variable:
Lemma 3.1.6 (Markov Bound). Let V be a non-negative random variable. Then for any t > 0,

Pr[V ≥ t] ≤ E[V] / t.

In particular, for any a ≥ 1,

Pr[V ≥ a · E[V]] ≤ 1/a.
Proof. The second bound follows from the first bound by substituting t = a · E[V ]. Thus, to
complete the proof, we argue the first bound. Consider the following sequence of relations:
E[V] = \sum_{i ∈ [0,t)} i · Pr[V = i] + \sum_{i ∈ [t,∞)} i · Pr[V = i]          (3.13)
     ≥ \sum_{i ≥ t} i · Pr[V = i]                                               (3.14)
     ≥ t · \sum_{i ≥ t} Pr[V = i]                                               (3.15)
     = t · Pr[V ≥ t].                                                           (3.16)

In the above relations, (3.13) follows from the definition of expectation of a random variable and
the fact that V is non-negative. (3.14) follows as we have dropped some non-negative terms. (3.15)
follows by noting that in the summands i ≥ t . (3.16) follows from the definition of Pr[V ≥ t ].
The proof is complete by noting that (3.16) implies the claimed bound.
The second bound works only for sums of independent random variables. We begin by
defining independent random variables:
Definition 3.1.7 (Independence). Two random variables A and B are called independent if for
every a and b in the ranges of A and B respectively, we have
Pr[A = a ∧ B = b] = Pr[A = a] · Pr[B = b].
For example, for the uniform distribution in Table 3.1, let A denote the bit G 0,0 and B denote
the bit G 0,1 . It can be verified that these two random variables are independent. In fact, it can be
verified that all the random variables corresponding to the four bits in G are independent random
variables. (We’ll come to a related comment shortly.)
Another related concept that we will use is that of probability of an event happening condi-
tioned on another event happening:
Definition 3.1.8 (Conditional Probability). Given two events A and B defined over the same do-
main and probability distribution, we define the probability of A conditioned on B as
Pr[A | B] = Pr[A and B] / Pr[B].
For example, note that

Pr[ 1_{V_01 = 1} | G_{0,0} = 0 ] = (4/16) / (1/2) = 1/2.
The above definition implies that two events A and B are independent if and only if Pr[A] =
Pr[A|B ]. We will also use the following result later on in the book (see Exercise 3.2):
Lemma 3.1.9. For any two events A and B defined on the same domain and the probability
distribution:
Pr[A] = Pr[A|B ] · Pr[B ] + Pr[A|¬B ] · Pr[¬B ].
Next, we state a deviation bound that asserts that the sum of independent random variables
takes values close to its expectation with high probability. We only state it for sums of binary
random variables, which is the form that will be needed in the book. We refer to this bound as
the “Chernoff bound” though we note that this is part of a larger body of work and the biblio-
graphic notes give more details.
Theorem 3.1.10 (Chernoff Bound). Let X_1, . . . , X_m be independent binary random variables and
define X = \sum_i X_i. Then the multiplicative Chernoff bound states that for 0 < ε ≤ 1,

Pr[ |X − E(X)| > ε E(X) ] < 2e^{−ε² E(X)/3}.
We omit the proof, which can be found in any standard textbook on randomized algorithms.
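As a sanity check, here is a small simulation sketch (not from the book; all parameter choices are ours) comparing an empirical deviation probability for fair coin flips with the bound above.

import math
import random

m, eps, trials = 500, 0.1, 5000
mean = m * 0.5                                  # E[X] for fair coin flips
random.seed(0)

deviations = sum(
    abs(sum(random.random() < 0.5 for _ in range(m)) - mean) > eps * mean
    for _ in range(trials)
)

print(deviations / trials)                      # empirical deviation probability ...
print(2 * math.exp(-eps ** 2 * mean / 3))       # ... is well below the Chernoff bound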
Finally, we present an alternate view of uniform distribution over product spaces and then
use that view to prove a result that we will use later in the book. Given probability distributions
p 1 and p 2 over domains D1 and D2 respectively, we define the product distribution p 1 × p 2 over
D1 × D2 as follows: every element (x, y) ∈ D1 × D2 under p 1 × p 2 is picked by choosing x from
D1 according to p 1 and y is picked independently from D2 under p 2 . This leads to the following
observation (see Exercise 3.4).
Lemma 3.1.11. For any m ≥ 1, the distribution U D1 ×D2 ×···×Dm is identical3 to the distribution
U D1 × U D2 × · · · × U Dm .
For example, the uniform distribution in Table 3.1 can be described equivalently as follows:
pick each of the four bits in G independently and uniformly at random from {0, 1}.
We conclude this section by proving the following result:
3
We say two distributions p 1 and p 2 on D are identical if for every x ∈ D, p 1 (x) = p 2 (x).
Lemma 3.1.12. Given a non-zero vector m ∈ Fkq and a uniformly random k × n matrix G over Fq ,
the vector m · G is uniformly distributed over Fnq .
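The following sketch (not from the book) verifies the lemma exhaustively for q = 2, k = 2 and n = 3: for a fixed non-zero m, the map G → m · G hits every vector in F_2^3 equally often.

from itertools import product
from collections import Counter

k, n = 2, 3
m = (1, 1)                      # any fixed non-zero message works

counts = Counter()
for entries in product([0, 1], repeat=k * n):
    G = [entries[i * n:(i + 1) * n] for i in range(k)]
    mG = tuple(sum(m[i] * G[i][j] for i in range(k)) % 2 for j in range(n))
    counts[mG] += 1

# Every vector in {0,1}^3 appears the same number of times (2^(kn) / 2^n = 8).
assert len(counts) == 2 ** n and len(set(counts.values())) == 1
print(counts)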
Note that the above inequality proves the existence of C with property P .
As an example consider Question 3.0.1. To answer this in the affirmative, we note that the
set of all [2, 2]2 linear codes is covered by the set of all 2 × 2 matrices over F2 . Then, we let D be
the uniform distribution over F22×2 . Then by Proposition 2.3.6 and (3.12), we get that
Pr_{U_{F_2^{2×2}}}[ there is no [2, 2, 1]_2 code ] ≤ 3/4 < 1,
which by the probabilistic method answers the Question 3.0.1 in the affirmative.
For the more general case, when we apply the probabilistic method, the typical approach
will be to define (sub-)properties P 1 , . . . , P m such that P = P 1 ∧P 2 ∧P 3 . . .∧P m and show that for
every 1 ≤ i ≤ m:
Pr[ C doesn’t have property P_i ] = Pr[ ¬P_i ] < 1/m.
Finally, by the union bound, the above will prove that^4 Pr[ C doesn’t have property P ] < 1, as
desired.
As an example, an alternate way to answer Question 3.0.1 in the affirmative is the following.
Define P 1 = 1V01 ≥1 , P 2 = 1V10 ≥1 and P 3 = 1V11 ≥1 . (Note that we want a [2, 2]2 code that satisfies
P 1 ∧ P 2 ∧ P 3 .) Then, by (3.1), (3.2) and (3.3), we have for i ∈ [3],
Pr[ C doesn’t have property P_i ] = Pr[ ¬P_i ] = 1/4 < 1/3,
as desired.
Finally, we mention a special case of the general probabilistic method that we outlined
above. In particular, let P denote the property that the randomly chosen C satisfies f (C ) ≤ b.
Then we claim (see Exercise 3.5) that E[ f (C )] ≤ b implies that Pr[C has property P ] > 0. Note
that this implies that E[ f (C )] ≤ b implies that there exists a code C such that f (C ) ≤ b.
Figure 3.1: A plot of H q (x) for q = 2, 3 and 4. The maximum value of 1 is achieved at x = 1−1/q.
Definition 3.3.2 (Volume of a Hamming Ball). Let q ≥ 2 and n ≥ r ≥ 1 be integers. Then the
volume of a Hamming ball of radius r is given by
à !
X r n
V ol q (r, n) = |B q (0, r )| = (q − 1)i .
i =0 i
The choice of 0 as the center for the Hamming ball above was arbitrary: since the volume
of a Hamming ball is independent of its center (as is evident from the last equality above), we
could have picked any point as the center.
We will prove the following result:

Proposition 3.3.3. Let q ≥ 2 be an integer and let 0 ≤ p ≤ 1 − 1/q. Then:
(i) Vol_q(pn, n) ≤ q^{H_q(p)·n}; and
(ii) Vol_q(pn, n) ≥ q^{H_q(p)·n − o(n)}.
Proof. We start with the proof of (i). Consider the following sequence of relations:

1 = (p + (1 − p))^n
  = \sum_{i=0}^{n} \binom{n}{i} p^i (1 − p)^{n−i}                                           (3.17)
  = \sum_{i=0}^{pn} \binom{n}{i} p^i (1 − p)^{n−i} + \sum_{i=pn+1}^{n} \binom{n}{i} p^i (1 − p)^{n−i}
  ≥ \sum_{i=0}^{pn} \binom{n}{i} p^i (1 − p)^{n−i}                                          (3.18)
  = \sum_{i=0}^{pn} \binom{n}{i} (q − 1)^i (p/(q − 1))^i (1 − p)^{n−i}
  = \sum_{i=0}^{pn} \binom{n}{i} (q − 1)^i (1 − p)^n (p/((q − 1)(1 − p)))^i
  ≥ \sum_{i=0}^{pn} \binom{n}{i} (q − 1)^i (1 − p)^n (p/((q − 1)(1 − p)))^{pn}              (3.19)
  = \sum_{i=0}^{pn} \binom{n}{i} (q − 1)^i (p/(q − 1))^{pn} (1 − p)^{(1−p)n}                (3.20)
  ≥ Vol_q(pn, n) · q^{−H_q(p)n}.                                                            (3.21)
In the above, (3.17) follows from the binomial expansion. (3.18) follows by dropping the second
sum and (3.19) follows from the fact that p/((q − 1)(1 − p)) ≤ 1 (as^5 p ≤ 1 − 1/q). The rest of the steps except
(3.21) follow from rearranging the terms. (3.21) follows as q^{−H_q(p)n} = (p/(q − 1))^{pn} (1 − p)^{(1−p)n}.
Finally, (3.21) implies that

1 ≥ Vol_q(pn, n) · q^{−H_q(p)n},

which proves (i).
We now turn to the proof of part (ii). For this part, we will need Stirling’s approximation for
n! (Lemma B.1.2).
By Stirling’s approximation, we have the following inequality:

\binom{n}{pn} = n! / ((pn)! ((1 − p)n)!)
             > (n/e)^n / ((pn/e)^{pn} ((1 − p)n/e)^{(1−p)n}) · (1/\sqrt{2πp(1 − p)n}) · e^{λ_1(n) − λ_2(pn) − λ_2((1−p)n)}
             = (1/(p^{pn} (1 − p)^{(1−p)n})) · ℓ(n),                                        (3.22)

where ℓ(n) = e^{λ_1(n) − λ_2(pn) − λ_2((1−p)n)} / \sqrt{2πp(1 − p)n}.
Now consider the following sequence of relations that complete the proof:
à !
n
V ol q (pn, n) ≥ (q − 1)pn (3.23)
pn
(q − 1)pn
> · ℓ(n) (3.24)
p pn (1 − p)(1−p)n
≥ q Hq (p)n−o(n) . (3.25)
5
Indeed, note that p/((q − 1)(1 − p)) ≤ 1 is true if p/(1 − p) ≤ q − 1, which in turn is true if p ≤ (q − 1)/q, where the last step follows
from Lemma B.2.1.
In the above (3.23) follows by only looking at one term. (3.24) follows from (3.22) while (3.25)
follows from the definition of H q (·) and the fact that for large enough n, ℓ(n) is q −o(n) .
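The following sketch (not from the book; the helper functions are ours) checks Proposition 3.3.3 numerically for one parameter setting.

from math import comb, log

def H_q(x, q):
    if x == 0:
        return 0.0
    return x * log(q - 1, q) - x * log(x, q) - (1 - x) * log(1 - x, q)

def vol_q(r, n, q):
    return sum(comb(n, i) * (q - 1) ** i for i in range(r + 1))

q, n, p = 2, 100, 0.3
print(vol_q(int(p * n), n, q) <= q ** (H_q(p, q) * n))        # True, as in part (i)
print(log(vol_q(int(p * n), n, q), q) / n, H_q(p, q))          # close, as part (ii) predicts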
Next, we consider how the q-ary entropy function behaves for various ranges of its parame-
ters.
Proposition 3.3.4. For small enough ε, 1 − H_q(ρ) ≥ 1 − ρ − ε for every 0 < ρ ≤ 1 − 1/q if and only
if q is 2^{Ω(1/ε)}.
We will also be interested in how H q (x) behaves for fixed x and increasing q:
6
The last equality follows from the fact that by Lemma B.2.2, for 0 < x < 1, ln(1 − x) = −O(x).
Lemma 3.3.5. Let q ≥ 2 be an integer and let 0 ≤ ρ ≤ 1 − 1/q, then for any real m ≥ 1 such that
q^{m−1} ≥ (1 + 1/(q − 1))^{q−1},                                                            (3.26)

we have

H_q(ρ) ≥ H_{q^m}(ρ).
Proof. Note that H q (0) = H q m (0) = 0. Thus, for the rest of the proof we will assume that ρ ∈
(0, 1 − 1/q].
As observed in the proof of Proposition 3.3.4, we have
H_q(ρ) = ρ · (log(q − 1)/log q) + H(ρ) · (1/log q).
Thus, we have

(1/ρ) · m log q · (H_q(ρ) − H_{q^m}(ρ))
    = log((q − 1)^m) − log(q^m − 1) + (m − 1) · H(ρ)/ρ
    ≥ log((q − 1)^m) − log(q^m − 1) + (m − 1) · H(1 − 1/q)/(1 − 1/q)                        (3.27)
    = log((q − 1)^m) − log(q^m − 1) + (m − 1) · ( log(q/(q − 1)) + (log q)/(q − 1) )
    = log( ((q − 1)^m/(q^m − 1)) · (q/(q − 1))^{m−1} · q^{(m−1)/(q−1)} )
    = log( ((q − 1) · q^{m−1} · q^{(m−1)/(q−1)}) / (q^m − 1) )
    ≥ 0.                                                                                    (3.28)

In the above, (3.27) follows from the fact that H(ρ)/ρ is decreasing^7 in ρ and that ρ ≤ 1 − 1/q.
(3.28) follows from the claim that

(q − 1) · q^{(m−1)/(q−1)} ≥ q,

which is equivalent to (3.26).
Since (1 + 1/x)^x ≤ e (by Lemma B.2.5), we also have that (3.26) is satisfied for every m ≥ 1 + 1/ln q.
Further, we note that (3.26) is satisfied for every m ≥ 2 (for any q ≥ 3), which leads to the
following (also see Exercise 3.6):
Corollary 3.3.6. Let q ≥ 3 be an integer and let 0 ≤ ρ ≤ 1 − 1/q, then for any m ≥ 2, we have
H q (ρ) ≥ H q m (ρ).
Next, we look at the entropy function when its input is very close to 1.
Proposition 3.3.7. For every integer q ≥ 2 there exists a constant c_q > 0 such that for every small enough ε > 0,

H_q(1 − 1/q − ε) ≤ 1 − c_q · ε².

Proof. The intuition behind the proof is the following. Since the derivative of H_q(x) is zero at
x = 1 − 1/q, in the Taylor expansion of H q (1 − 1/q − ε) the ε term will vanish. We will now make
this intuition more concrete. We will think of q as fixed and 1/ε as growing. In particular, we
will assume that ε < 1/q. Consider the following equalities:
H_q(1 − 1/q − ε)
  = −(1 − 1/q − ε) log_q((1 − 1/q − ε)/(q − 1)) − (1/q + ε) log_q(1/q + ε)
  = −log_q( (1/q)(1 − εq/(q − 1)) ) + (1/q + ε) log_q( (1 − εq/(q − 1))/(1 + εq) )
  = 1 − (1/ln q)[ ln(1 − εq/(q − 1)) − (1/q + ε) ln( (1 − εq/(q − 1))/(1 + εq) ) ]
  = 1 + o(ε²) − (1/ln q)[ −εq/(q − 1) − ε²q²/(2(q − 1)²)
        − (1/q + ε)( −εq/(q − 1) − ε²q²/(2(q − 1)²) − εq + ε²q²/2 ) ]                        (3.29)
  = 1 + o(ε²) − (1/ln q)[ −εq/(q − 1) − ε²q²/(2(q − 1)²)
        − (1/q + ε)( −εq²/(q − 1) + ε²q³(q − 2)/(2(q − 1)²) ) ]
  = 1 + o(ε²) − (1/ln q)[ −ε²q²/(2(q − 1)²) + ε²q²/(q − 1) − ε²q²(q − 2)/(2(q − 1)²) ]       (3.30)
  = 1 − ε²q²/(2 ln q (q − 1)) + o(ε²)
  ≤ 1 − ε²q²/(4 ln q (q − 1)).                                                               (3.31)
(3.29) follows from the fact that for |x| < 1, ln(1 + x) = x − x 2 /2 + x 3 /3 − . . . (Lemma B.2.2) and
by collecting the ε3 and smaller terms in o(ε2 ). (3.30) follows by rearranging the terms and by
absorbing the ε3 terms in o(ε2 ). The last step is true assuming ε is small enough.
Next, we look at the entropy function when its input is very close to 0.
Proof. By definition,

H_q(ε) = ε log_q(q − 1) + ε log_q(1/ε) + (1 − ε) log_q(1/(1 − ε)).                          (3.32)

Further, by Lemma B.2.2, (1 − ε) log_q(1/(1 − ε)) ≤ 2ε/ln q for small enough ε. Thus, this implies
that

H_q(ε) ≤ ((2 + ln(q − 1))/ln q) · ε + (1/ln q) · ε ln(1/ε).                                 (3.33)

(3.32) and (3.33) prove the claimed bound.
We will also work with the inverse of the q-ary entropy function. Note that H q (·) on the
domain [0, 1 − 1/q] is a bijective map into [0, 1]. Thus, we define H q−1 (y) = x such that H q (x) = y
and 0 ≤ x ≤ 1 − 1/q. Finally, we will need the following lower bound:
Lemma 3.3.9. For every 0 ≤ y ≤ 1 − 1/q and for every small enough ε > 0,
Proof. It is easy to check that H q−1 (y) is a strictly increasing convex function when y ∈ [0, 1]. This
implies that the derivative of H q−1 (y) increases with y. In particular, (H q−1 )′ (1) ≥ (H q−1 )′ (y) for
H q−1 (y)−H q−1 (y−δ)
every 0 ≤ y ≤ 1. In other words, for every 0 < y ≤ 1, and (small enough) δ > 0, δ ≤
H q−1 (1)−H q−1 (1−δ)
δ
Proposition 3.3.7 along with the facts that H q−1 (1) = 1−1/q
. and H q−1 is increasing
completes the proof if one picks c q′ = max(1, 1/c q ) and δ = ε2 /c q′ .
3.4 Exercises
Exercise 3.1. Prove Lemma 3.1.3.
Exercise 3.3. In this exercise, we will see a common use of the Chernoff bound (Theorem 3.1.10).
Say we are trying to determine an (unknown) value x ∈ F to which we have access via a ran-
domized algorithm A that on (random) input r ∈ {0, 1}^m outputs an estimate A(r) of x
such that

Pr_r[ A(r) = x ] ≥ 1/2 + γ,

for some 0 < γ < 1/2. Then show that for any t ≥ 1, with O(t/γ²) calls to A one can determine x with
probability at least 1 − e^{−t}.
Hint: Call A with independent random bits and take majority of the answer and then use the Chernoff bound.
Exercise 3.5. Let P denote the property that the randomly chosen C satisfies f (C ) ≤ b. Then
E[ f (C )] ≤ b implies that Pr[C has property P ] > 0.
Exercise 3.6. Show that for any Q ≥ q ≥ 2 and ρ ≤ 1 − 1/q, we have HQ (ρ) ≤ H q (ρ).
Part II
The Combinatorics
Chapter 4
In this chapter, we will try to tackle Question 1.8.1. We will approach this trade-off in the fol-
lowing way:
If we fix the relative distance of the code to be δ, what is the best rate R that we can
achieve?
While we will not be able to pin down the exact optimal relationship between R and δ, we will
start establishing some limits. Note that an upper bound on R is a negative result in that it
establishes that codes with certain parameters do not exist. Similarly, a lower bound on R is a
positive result.
In this chapter, we will consider only one positive result, i.e. a lower bound on R called the
Gilbert-Varshamov bound in Section 4.2. In Section 4.1, we recall a negative result that we have
already seen– Hamming bound and state its asymptotic version to obtain an upper bound on
R. We will consider two other upper bounds: the Singleton bound (Section 4.3), which gives
a tight upper bound for large enough alphabets (but not binary codes) and the Plotkin bound
(Section 4.4).
81
Taking logarithms to base q of both sides above, and dividing by n, yields that the second term
on the right-hand side of the inequality above is lower bounded by H_q(δ/2) − o(1), and thus we
can get an asymptotic implication from Theorem 1.7.2. For a q-ary code C of rate R, relative
distance δ and block length n, we have:

R ≤ 1 − H_q(δ/2) + o(1),
where the o(1) term tends to 0 as n → ∞. Thus for an infinite family of codes C , taking limits as
n → ∞, we get the following asymptotic Hamming bound.
Proposition 4.1.1 (Asymptotic Hamming Bound). Let C be an infinite family of q-ary codes with
rate R and relative distance δ. Then we have:
R ≤ 1 − H_q(δ/2).
Figure 4.1 gives a pictorial description of the asymptotic Hamming bound for binary codes.
Figure 4.1: The Hamming and GV bounds for binary codes. Note that any point below the GV
bound is achievable by some code while no point above the Hamming bound is achievable by
any code. In this part of the book we would like to push the GV bound as much up as possible
while at the same time try and push down the Hamming bound as much as possible.
Theorem 4.2.1 (Gilbert-Varshamov Bound). Let q ≥ 2. For every 0 ≤ δ < 1 − 1/q there exists a
(linear) code with rate R ≥ 1 − H q (δ) and relative distance δ. Furthermore, for every 0 ≤ ε ≤
1− H q (δ) and integer n, if the generator matrix of a code of block length n and rate 1− H q (δ)−ε is
picked uniformly at random, then the code has relative distance at least δ with probability strictly
greater than 1 − q −εn .
The bound is generally referred to as the GV bound. For a pictorial description of the GV
bound for binary codes, see Figure 4.1. We will present the proofs for general codes and linear
codes in Sections 4.2.1 and 4.2.2 respectively.
Note that the first part of the theorem follows from the second part by setting ε = 0. In what
follows we first prove the existence of a non-linear code of rate 1 − H q (δ) and relative distance
at least δ. Later we show how to get a linear code, and with high probability (when ε > 0).
1: C ← ∅
2: WHILE there exists a v ∈ [q]n such that ∆(v, c) ≥ d for every c ∈ C DO
3: Add v to C
4: RETURN C
We claim that Algorithm 6 terminates and the C that it outputs has distance d . The latter
is true by step 2, which makes sure that in Step 3 we never add a vector c that will make the
distance of C fall below d . For the former claim, note that, if we cannot add v at some point, we
cannot add it later. Indeed, since we only add vectors to C , if a vector v ∈ [q]n is ruled out in a
certain iteration of Step 2 because ∆(c, v) < d , then in all future iterations, we have ∆(v, c) < d
and thus, this v will never be added in Step 3 in any future iteration.
The running time of Algorithm 6 is q O(n) . To see this, note that Step 2 in the worst-case could
be repeated for every vector in [q]n , that is at most q n times. In a naive implementation, for each
iteration, we cycle through all vectors in [q]n and for each vector v ∈ [q]n , iterate through all (at
most q n ) vectors c ∈ C to check whether ∆(c, v) < d . If no such c exists, then we add v to C .
Otherwise, we move to the next v. However, note that we can do slightly better– since we know
that once a v is “rejected" in an iteration, it’ll keep on being rejected in the future iterations, we
can fix up an ordering of vectors in [q]n and for each vector v in this order, check whether it can
Figure 4.2: An illustration of Gilbert’s greedy algorithm (Algorithm 6) for the first five iterations.
be added to C or not. If so, we add v to C , else we move to the next vector in the order. This
algorithm has time complexity O(nq 2n ), which is still q O(n) .
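Here is a short Python sketch (not from the book) of Algorithm 6 for tiny parameters; the fixed iteration order plays the role of the ordering discussed above, and the running time is exponential exactly as described.

from itertools import product

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

def gilbert_greedy(n, q, d):
    C = []
    for v in product(range(q), repeat=n):       # a fixed ordering of [q]^n
        if all(hamming(v, c) >= d for c in C):
            C.append(v)
    return C

C = gilbert_greedy(n=7, q=2, d=3)
print(len(C))   # at least q^n / Vol_q(d-1, n) = 128/29 (about 4.4), as Lemma 4.2.2 guarantees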
Further, we claim that after termination of Algorithm 6
\bigcup_{c ∈ C} B(c, d − 1) = [q]^n.
This is because if the above is not true, then there exists a vector v ∈ [q]n \C , such that ∆(v, c) ≥ d
and hence v can be added to C . However, this contradicts the fact that Algorithm 6 has termi-
nated. Therefore,

| \bigcup_{c ∈ C} B(c, d − 1) | = q^n.                                                      (4.1)
It is not too hard to see that
\sum_{c ∈ C} |B(c, d − 1)| ≥ | \bigcup_{c ∈ C} B(c, d − 1) |,
which, together with (4.1), implies that |C| · Vol_q(d − 1, n) ≥ q^n, i.e.
    |C| ≥ q^n / Vol_q(d − 1, n) ≥ q^{n(1 − H_q(δ))},    (4.2)
as desired. In the above, (4.2) follows from the fact that
    Vol_q(d − 1, n) ≤ Vol_q(δn, n) ≤ q^{n·H_q(δ)},    (4.3)
where the second inequality follows from the upper bound on the volume of a Hamming ball in
Proposition 3.3.3.
We thus conclude that for every q, n and δ there exists a code of dimension at least n(1 − H_q(δ)).
We state this formally as a lemma below.
Lemma 4.2.2. For every pair of positive integers n, q and real δ ∈ [0, 1], there exists an (n, k, δn)_q
code satisfying
    q^k ≥ q^n / Vol_q(δn − 1, n).
In particular, for every positive integer q and real δ ∈ [0, 1] there exists an infinite family of
q-ary codes C of rate R and distance δ satisfying R ≥ 1 − H_q(δ).
It is worth noting that the code from Algorithm 6 is not guaranteed to have any special struc-
ture. In particular, even storing the code can take exponential space. We have seen in Proposi-
tion 2.3.3 that linear codes have a much more succinct representation. Thus, a natural question
is:
Question 4.2.1. Do linear codes achieve the R ≥ 1 − H q (δ) tradeoff that the greedy construc-
tion achieves?
Proof of Theorem 4.2.1. By Proposition 2.3.6, we are done if we can show that there exists a k × n
matrix G of full rank (for k = (1 − H_q(δ) − ε)n) such that every non-zero codeword mG, for
m ∈ F_q^k \ {0}, has Hamming weight at least d = δn. (Note that this weight condition in particular
implies mG ≠ 0 for every non-zero m, so such a G automatically has full rank.)
We will prove the existence of such a G by the probabilistic method. Pick a random linear code
by picking a random k × n matrix G where each of the kn entries is chosen uniformly and indepen-
dently at random from F_q. Fix m ∈ F_q^k \ {0}. Recall that by Lemma 3.1.12, for a random G, mG is
a uniformly random vector from F_q^n. Thus, for every non-zero vector m, we have
    Pr_G[wt(mG) < d] = Vol_q(d − 1, n) / q^n
                     ≤ q^{n·H_q(δ)} / q^n    (4.4)
                     ≤ q^{−k} · q^{−εn},    (4.5)
where (4.4) follows from (4.3) and (4.5) uses k ≤ n(1 − H_q(δ) − ε). There are q^k − 1 non-zero vec-
tors m and taking the union over all such vectors and applying the union bound (Lemma 3.1.5)
we have
    Pr_G[there exists a non-zero m with wt(mG) < d] ≤ (q^k − 1) · q^{−k} · q^{−εn} < q^{−εn}.
In other words, with probability strictly greater than 1 − q^{−εn} the code generated by G has
relative distance at least δ, as claimed.
Discussion. We now digress a bit to stress some aspects of the GV bound and its proof. First,
note that the proof by the probabilistic method shows something stronger than just the exis-
tence of a code, but rather gives a high probability result. Furthermore, as pointed out explicitly
for the non-linear setting in Lemma 4.2.2, the result gives a lower bound not only in the asymp-
totic case but also one for every choice of n and k. The proof above gives a similar non-asymptotic
bound in the linear setting as well.
Note that we can also pick a random linear code by picking a random (n − k) × n parity check
matrix. This also leads to an alternate proof of the GV bound: see Exercise 4.1.
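For very small parameters one can also simulate the probabilistic argument directly. The sketch below (names ours) draws random binary generator matrices, computes the exact minimum distance by brute force, and compares it against a target distance d chosen so that Vol_2(d − 1, n) ≤ 2^{n−k}, in the spirit of the counting bounds above. It is illustrative only, not an efficient construction.

import random
from itertools import product
from math import comb

def vol2(r, n):
    # Volume of a Hamming ball of radius r in {0,1}^n.
    return sum(comb(n, i) for i in range(r + 1))

def random_generator_matrix(k, n):
    return [[random.randrange(2) for _ in range(n)] for _ in range(k)]

def min_distance(G, k, n):
    # Minimum weight over all non-zero codewords mG (arithmetic over F_2).
    best = n
    for m in product(range(2), repeat=k):
        if not any(m):
            continue
        codeword = [sum(m[i] * G[i][j] for i in range(k)) % 2 for j in range(n)]
        best = min(best, sum(codeword))
    return best

if __name__ == "__main__":
    n, k = 12, 4
    # Largest d with Vol_2(d - 1, n) <= 2^(n - k): the counting arguments above
    # suggest a code of rate k/n with this distance should exist.
    d_target = max(d for d in range(1, n + 1) if vol2(d - 1, n) <= 2 ** (n - k))
    trials = 200
    hits = sum(min_distance(random_generator_matrix(k, n), k, n) >= d_target
               for _ in range(trials))
    print(f"target distance {d_target}; achieved by {hits}/{trials} random [12,4]_2 codes")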
Finally, we note that Theorem 4.2.1 requires δ < 1 − 1/q. An inspection of Gilbert's and Var-
shamov's proofs shows that the only reason the proofs required δ ≤ 1 − 1/q is that this is needed
for the volume bound Vol_q(δn, n) ≤ q^{H_q(δ)·n} (recall Proposition 3.3.3) to hold.
Figure 4.3: Construction of a new code in the proof of the Singleton bound.
It is natural to wonder if the above is just an artifact of the proof or if better codes exist.
This leads to the following question:
Question 4.2.2. Does there exist a code with R > 0 and δ > 1 − 1/q?
4.3 Singleton Bound
Theorem 4.3.1 (Singleton Bound). For every (n, k, d)_q code,
    k ≤ n − d + 1.
Proof. Let c_1, c_2, . . . , c_M be the codewords of an (n, k, d)_q code C. Note that we need to show
M ≤ q^{n−d+1}. To this end, we define c′_i to be the prefix of the codeword c_i of length n − d + 1 for
every i ∈ [M]. See Figure 4.3 for a pictorial description.
We now claim that for every i ≠ j, c′_i ≠ c′_j. For the sake of contradiction, assume that there
exists an i ≠ j such that c′_i = c′_j. Notice that this implies that c_i and c_j agree in all of the first
n − d + 1 positions, which in turn implies that Δ(c_i, c_j) ≤ d − 1. This contradicts the fact that C
has distance d.
Figure 4.4: The Hamming, GV and Singleton bound for binary codes.
Thus, M is the number of (distinct) prefixes of codewords in C of length n − d + 1, which
implies that M ≤ q^{n−d+1}, as desired.
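The prefix argument can be checked mechanically on a small example. The sketch below (names ours) computes the distance of a toy code, projects each codeword onto its first n − d + 1 coordinates, and verifies that the projections are all distinct, which is exactly what forces |C| ≤ q^{n−d+1}.

from itertools import product

def hamming_distance(u, v):
    return sum(a != b for a, b in zip(u, v))

def check_prefix_injective(code, n):
    # Distance-d codewords must have distinct prefixes of length n - d + 1.
    d = min(hamming_distance(u, v) for u in code for v in code if u != v)
    prefixes = {c[: n - d + 1] for c in code}
    return len(prefixes) == len(code), d

if __name__ == "__main__":
    # The [3, 2, 2]_2 parity-check code: all even-weight vectors of length 3.
    parity_code = [v for v in product(range(2), repeat=3) if sum(v) % 2 == 0]
    ok, d = check_prefix_injective(parity_code, n=3)
    print(ok, d)  # True, 2 -- and indeed |C| = 4 <= 2^(3 - 2 + 1) = 4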
Note that the asymptotic version of the Singleton bound states that k/n ≤ 1 − d /n + 1/n. In
other words,
R ≤ 1 − δ + o(1).
Figure 4.4 presents a pictorial description of the asymptotic version of the Singleton bound.
It is worth noting that the bound is independent of the alphabet size. As is evident from Fig-
ure 4.4, the Singleton bound is worse than the Hamming bound for binary codes. However, this
bound is better for larger alphabet sizes. In fact, we will look at a family of codes called Reed-
Solomon codes in Chapter 5 that meets the Singleton bound. However, the alphabet size of the
Reed-Solomon codes increases with the block length n. Thus, a natural follow-up question is
the following:
Question 4.3.1. Given a fixed q ≥ 2, does there exist a q-ary code that meets the Singleton
bound?
4.4 Plotkin Bound
In this section, we will study the Plotkin bound, which will answer Questions 4.2.2 and 4.3.1.
We start by stating the bound.
Theorem 4.4.1 (Plotkin bound). The following holds for any code C ⊆ [q]^n with distance d:
1. If d = (1 − 1/q)·n, then |C| ≤ 2qn.
2. If d > (1 − 1/q)·n, then |C| ≤ qd / (qd − (q − 1)n).
Note that the Plotkin bound (Theorem 4.4.1) implies that a code with relative distance δ ≥
1 − 1/q must necessarily have R = 0, which answers Question 4.2.2 in the negative.
Before we prove Theorem 4.4.1, we make a few remarks. We first note that the upper bound
in the first part of Theorem 4.4.1 can be improved to 2n for q = 2. (See Exercise 4.12.) Second, it
can be shown that this bound is tight. (See Exercise 4.13.) Third, the statement of Theorem 4.4.1
gives a trade-off only for relative distance greater than 1 − 1/q. However, as the following corol-
lary shows, the result can be extended to work for 0 ≤ δ ≤ 1 − 1/q. (See Figure 4.5 for an illustra-
tion for binary codes.)
    |C_x| ≤ qd / (qd − (q − 1)n′) ≤ qd,    (4.6)
where the second inequality follows from the fact that qd − (q − 1)n′ is an integer.
Note that by the definition of C_x:
    |C| = Σ_{x ∈ [q]^{n−n′}} |C_x|,
¹ If for some x, c_1 ≠ c_2 ∈ C_x with Δ(c_1, c_2) < d, then Δ((x, c_1), (x, c_2)) < d, which implies that the distance of C is less
than d (as by definition of C_x, both (x, c_1), (x, c_2) ∈ C), which in turn is a contradiction.
Figure 4.5: The current bounds on the rate R vs. relative distance δ for binary codes. The GV
bound is a lower bound on R while the other three bounds are upper bounds on R.
In other words, R ≤ 1 − (q/(q − 1))·δ + o(1), as desired.
Note that Corollary 4.4.2 implies that for any q-ary code of rate R and relative distance δ
(where q is a constant independent of the block length of the code), R < 1 − δ. In other words,
this answers Question 4.3.1 in the negative.
Let us pause for a bit at this point and recollect the bounds on R versus δ that we have proved
till now. Figure 4.5 depicts all the bounds we have seen till now (for q = 2). The GV bound is
the best known lower bound at the time of writing of this book. Better upper bounds are known
and we will see one such trade-off (called the Elias-Bassalygo bound) in Section 8.1.
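The curves in Figures 4.4 and 4.5 can be reproduced numerically from the asymptotic forms discussed in this chapter. Below is a small sketch (names ours) that evaluates, for q = 2, the Hamming bound R ≤ 1 − H(δ/2), the GV bound R ≥ 1 − H(δ), the Singleton bound R ≤ 1 − δ, and the Plotkin-type bound R ≤ 1 − 2δ.

from math import log2

def h2(x):
    # Binary entropy function H(x).
    if x in (0.0, 1.0):
        return 0.0
    return -x * log2(x) - (1 - x) * log2(1 - x)

def binary_bounds(delta):
    # Asymptotic rate bounds at relative distance delta for q = 2.
    return {
        "Hamming (upper)": 1 - h2(delta / 2),
        "GV (lower)": 1 - h2(delta) if delta <= 0.5 else 0.0,
        "Singleton (upper)": 1 - delta,
        "Plotkin (upper)": max(0.0, 1 - 2 * delta),
    }

if __name__ == "__main__":
    for delta in (0.1, 0.25, 0.4):
        print(delta, binary_bounds(delta))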
Now, we turn to the proof of Theorem 4.4.1, for which we will need two more lemmas. The
first lemma deals with vectors over real spaces. We quickly recap the necessary definitions.
Consider a vector v in R^n, that is, a tuple of n real numbers. This vector has (Euclidean) norm
‖v‖ = √(v_1^2 + v_2^2 + · · · + v_n^2), and is a unit vector if and only if its norm is 1. The inner product of
two vectors, u and v, is 〈u, v〉 = Σ_i u_i · v_i. The following lemma gives a bound on the number of
vectors that can exist such that every pair is at an obtuse angle with each other.
Lemma 4.4.3. Let v_1, . . . , v_m ∈ R^N be non-zero vectors.
1. If 〈v_i, v_j〉 ≤ 0 for all i ≠ j, then m ≤ 2N.
2. Let the v_i be unit vectors for 1 ≤ i ≤ m. Further, if 〈v_i, v_j〉 ≤ −ε < 0 for all i ≠ j, then m ≤ 1 + 1/ε.²
(Item 1 is tight: see Exercise 4.14.) The proof of the Plotkin bound will need the existence of a
map from codewords to real vectors with certain properties, which the next lemma guarantees.
Lemma 4.4.4 (Mapping Lemma). Let C ⊆ [q]^n. Then there exists a function f : C → R^{nq} such
that
1. For every c ∈ C, f(c) is a unit vector, i.e. ‖f(c)‖ = 1.
2. For every c_1, c_2 ∈ C, 〈f(c_1), f(c_2)〉 = 1 − (q/(q − 1)) · (Δ(c_1, c_2)/n).
We defer the proofs of the lemmas above to the end of the section. We are now in a position
to prove Theorem 4.4.1.
Proof of Theorem 4.4.1. Let C = {c_1, c_2, . . . , c_m}, and let f be the map guaranteed by Lemma 4.4.4.
For all i ≠ j,
    〈f(c_i), f(c_j)〉 = 1 − (q/(q − 1)) · (Δ(c_i, c_j)/n) ≤ 1 − (q/(q − 1)) · (d/n).
For the first part, d = (1 − 1/q)·n implies that
    〈f(c_i), f(c_j)〉 ≤ 0
for all i ≠ j, so the first part of Lemma 4.4.3 (applied with N = nq) gives m ≤ 2nq, as desired.
For the second part, d > (1 − 1/q)·n implies that
    〈f(c_i), f(c_j)〉 ≤ 1 − (q/(q − 1)) · (d/n) = −(qd − (q − 1)n)/((q − 1)n)
and, since ε := (qd − (q − 1)n)/((q − 1)n) > 0, we can apply the second part of Lemma 4.4.3. Thus, m ≤ 1 +
(q − 1)n/(qd − (q − 1)n) = qd/(qd − (q − 1)n), as desired. ✷
² Note that since v_i and v_j are both unit vectors, 〈v_i, v_j〉 is the cosine of the angle between them.
Proof of Lemma 4.4.3. We begin with a proof of the first result. The proof is by induction on
N. Note that in the base case of N = 0, there are no non-zero vectors and hence m = 0, which satisfies
the claimed inequality m ≤ 2N.
In the general case, we have m ≥ 1 non-zero vectors v_1, . . . , v_m ∈ R^N such that for every i ≠ j,
    〈v_i, v_j〉 ≤ 0.    (4.7)
Since rotating all the vectors by the same amount does not change the sign of the inner
product (nor does scaling any of the vectors), w.l.o.g. we can assume that v_m = 〈1, 0, . . . , 0〉. For
1 ≤ i ≤ m − 1, denote the vectors as v_i = 〈α_i, y_i〉, for some α_i ∈ R and y_i ∈ R^{N−1}. Now, for any
i ≤ m − 1, 〈v_m, v_i〉 = 1·α_i + 0 = α_i. However, note that (4.7) implies that 〈v_m, v_i〉 ≤ 0, which in turn
implies that
    α_i ≤ 0.    (4.8)
Next, we claim that at most one of y1 , . . . , ym−1 can be the all zeroes vector, 0. If not, assume
w.l.o.g., that y1 = y2 = 0. This in turn implies that
〈v1 , v2 〉 = α1 · α2 + 〈y1 , y2 〉
= α1 · α2 + 0
= α1 · α2
> 0,
where the last inequality follows from the subsequent argument. As v1 = 〈α1 , 0〉 and v2 = 〈α2 , 0〉
are non-zero, this implies that α1 , α2 6= 0. (4.8) then implies that α1 , α2 < 0. However, 〈v1 , v2 〉 > 0
contradicts (4.7).
Thus, w.l.o.g., assume that y_1, . . . , y_{m−2} are all non-zero vectors. Further, note that for every
i ≠ j ∈ [m − 2], 〈y_i, y_j〉 = 〈v_i, v_j〉 − α_i · α_j ≤ 〈v_i, v_j〉 ≤ 0. Thus, we have reduced the problem on m
vectors of dimension N to an equivalent problem on m − 2 vectors of dimension N − 1.
Continuing this process, each reduction of the dimension by one loses at most two of the vectors
in the set. Induction then implies that m ≤ 2N, as desired.
We now move on to the proof of the second part. Define z = v_1 + . . . + v_m. Now consider the
following sequence of relationships:
    ‖z‖² = Σ_{i=1}^{m} ‖v_i‖² + 2·Σ_{i<j} 〈v_i, v_j〉 ≤ m + 2·(m choose 2)·(−ε) = m(1 − εm + ε).
The inequality follows from the facts that each v_i is a unit vector and the assumption that for
every i ≠ j, 〈v_i, v_j〉 ≤ −ε. As ‖z‖² ≥ 0,
    m(1 − εm + ε) ≥ 0,
or
    εm ≤ 1 + ε.
Thus, we have m ≤ 1 + 1/ε, as desired. ✷
Finally, we prove Lemma 4.4.4.
Proof of Lemma 4.4.4. We begin by defining a map φ : [q] → R^q with certain properties. Then
we apply φ to all the coordinates of a codeword to define the map f : C → R^{nq} that satisfies the
claimed properties. We now fill in the details.
Define φ : [q] → R^q as follows. For every i ∈ [q], we define
    φ(i) = 〈1/q, 1/q, . . . , −(q − 1)/q, . . . , 1/q〉,
where the entry −(q − 1)/q appears in the i-th position. That is, all but the i-th position in
φ(i) ∈ R^q have a value of 1/q and the i-th position has value −(q − 1)/q.
Next, we record two properties of φ that follow immediately from its definition. For every
i ∈ [q],
    ‖φ(i)‖² = (q − 1)/q² + (q − 1)²/q² = (q − 1)/q.    (4.9)
Also for every i ≠ j ∈ [q],
    〈φ(i), φ(j)〉 = (q − 2)/q² − 2(q − 1)/q² = −1/q.    (4.10)
We are now ready to define our final map f : C → R^{nq}. For every c = (c_1, . . . , c_n) ∈ C, define
    f(c) = √(q/(n(q − 1))) · (φ(c_1), φ(c_2), . . . , φ(c_n)).
(The multiplicative factor √(q/(n(q − 1))) is to ensure that f(c) for any c ∈ C is a unit vector.)
To complete the proof, we will show that f satisfies the claimed properties. We begin with
condition 1. Note that
    ‖f(c)‖² = (q/((q − 1)n)) · Σ_{i=1}^{n} ‖φ(c_i)‖² = 1,
where the first equality follows from the definition of f and the second equality follows from
(4.9).
We now turn to the second condition. For notational convenience, define c_1 = (x_1, . . . , x_n)
and c_2 = (y_1, . . . , y_n). Consider the following sequence of relations:
    〈f(c_1), f(c_2)〉 = [Σ_{ℓ: x_ℓ ≠ y_ℓ} 〈φ(x_ℓ), φ(y_ℓ)〉 + Σ_{ℓ: x_ℓ = y_ℓ} 〈φ(x_ℓ), φ(y_ℓ)〉] · (q/(n(q − 1)))
                    = [Σ_{ℓ: x_ℓ ≠ y_ℓ} (−1/q) + Σ_{ℓ: x_ℓ = y_ℓ} (q − 1)/q] · (q/(n(q − 1)))    (4.11)
                    = [Δ(c_1, c_2)·(−1/q) + (n − Δ(c_1, c_2))·((q − 1)/q)] · (q/(n(q − 1)))    (4.12)
                    = (q/(n(q − 1))) · [((q − 1)/q)·n − Δ(c_1, c_2)]
                    = 1 − (q/(q − 1)) · (Δ(c_1, c_2)/n),
as desired. In the above, (4.11) is obtained using (4.10) and (4.9) while (4.12) follows from the
definition of the Hamming distance. ✷
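Both properties of f are easy to confirm numerically. The sketch below (names ours) implements φ and f for a small q and checks, on random words, that f(c) is a unit vector and that 〈f(c_1), f(c_2)〉 = 1 − (q/(q − 1))·Δ(c_1, c_2)/n.

import math
import random

def phi(i, q):
    # phi(i): all coordinates 1/q except coordinate i, which is -(q-1)/q.
    v = [1.0 / q] * q
    v[i] = -(q - 1.0) / q
    return v

def f(c, q):
    # Map a word c in [q]^n to a unit vector in R^(nq).
    n = len(c)
    scale = math.sqrt(q / (n * (q - 1.0)))
    return [scale * x for sym in c for x in phi(sym, q)]

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

if __name__ == "__main__":
    q, n = 3, 8
    c1 = [random.randrange(q) for _ in range(n)]
    c2 = [random.randrange(q) for _ in range(n)]
    delta = sum(a != b for a, b in zip(c1, c2))
    print(abs(inner(f(c1, q), f(c1, q)) - 1.0) < 1e-9)        # unit norm
    expected = 1 - (q / (q - 1.0)) * delta / n
    print(abs(inner(f(c1, q), f(c2, q)) - expected) < 1e-9)   # inner product formula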
4.5 Exercises
Exercise 4.1. Pick a (n − k) × n matrix H over Fq at random. Show that with high probability the
code whose parity check matrix is H achieves the GV bound.
Exercise 4.2. Recall the definition of an ε-biased space from Exercise 2.14. Show that there exists
an ε-biased space of size O(k/ε²).
Exercise 4.3. Argue that a random linear code as well as its dual both lie on the corresponding
GV bound.
Exercise 4.4. In Section 4.2.2, we saw that a random linear code meets the GV bound. It is natural
to ask the same question for general random codes. (By a random (n, k)_q code, we mean the follow-
ing: for each of the q^k messages, pick a random vector from [q]^n. Further, the choices for the
codewords are independent.) We will do so in this problem.
1. Prove that a random q-ary code with rate R > 0 with high probability has relative distance
δ ≥ H_q^{−1}(1 − 2R − ε). Note that this is worse than the bound for random linear codes in
Theorem 4.2.1.
2. Prove that with high probability the relative distance of a random q-ary code of rate R is at
most H_q^{−1}(1 − 2R) + ε. In other words, general random codes are worse than random linear
codes in terms of their distance.
Hint: Use Chebyshev’s inequality.
Exercise 4.5. We saw that Algorithm 6 can compute an (n, k)q code on the GV bound in time
q O(n) . Now the construction for linear codes is a randomized construction and it is natural to ask
94
how quickly can we compute an [n, k]q code that meets the GV bound. In this problem, we will
see that this can also be done in q O(n) deterministic time, though the deterministic algorithm is
not that straight-forward anymore.
1. Argue that Theorem 4.2.1 gives a q O(kn) time algorithm that constructs an [n, k]q code on
the GV bound. (Thus, the goal of this problem is to “shave" off a factor of k from the expo-
nent.)
2. A k × n Toeplitz matrix A = {A_{i,j}}_{i=1,j=1}^{k,n} satisfies the property that A_{i,j} = A_{i−1,j−1}. In other
words, any diagonal has the same value. For example, the following is a 4 × 6 Toeplitz
matrix:
    1 2 3 4 5 6
    7 1 2 3 4 5
    8 7 1 2 3 4
    9 8 7 1 2 3
Prove that if T is a uniformly random k × n Toeplitz matrix over F_q, then for every non-zero
m ∈ F_q^k, the vector m · T is distributed uniformly over F_q^n.
Hint: Write down the expression for the value at each of the n positions in the vector m · T in terms of the
values in the first row and column of T. Think of the values in the first row and column as variables. Then
divide these variables into two sets (this “division" will depend on m), say S and S̄. Then argue the following:
for every fixed y ∈ F_q^n and for every fixed assignment to the variables in S̄, there is a unique assignment to the
variables in S such that mT = y.
3. Briefly argue why the claim in part 2 implies that a random code defined by picking its
generator matrix as a random Toeplitz matrix with high probability lies on the GV bound.
4. Conclude that an [n, k]q code on the GV bound can be constructed in time q O(k+n) .
Exercise 4.6. Show that one can construct the parity check matrix of an [n, k]q code that lies on
the GV bound in time q O(n) .
Exercise 4.7. So far in Exercises 4.5 and 4.6, we have seen two constructions of [n, k]q code on the
GV bound that can be constructed in q O(n) time. For constant rate codes, at the time of writing of
this book, this is the fastest known construction of any code that meets the GV bound. For k = o(n),
there is a better construction known, which we explore in this exercise.
We begin with some notation. For the rest of the exercise we will target a distance of d = δn.
Given a message m ∈ F_q^k and an [n, k]_q code C, define the indicator variable:
    W_m(C) = 1 if wt(C(m)) < d, and W_m(C) = 0 otherwise.
Further, define
    D(C) = Σ_{m ∈ F_q^k \ {0}} W_m(C).
We will also use D(G) and Wm (G) to denote the variables above for the code C generated by G.
Given a k × n matrix M, we will use M_i to denote the i-th column of M and M_{≤i} to denote
the column submatrix of M that contains the first i columns. Finally below we will use G to
denote a uniformly random k ×n generator matrix and G to denote a specific instantiation of the
generator matrix. We will arrive at the final construction in a sequence of steps. In what follows
define k < (1 − H q (δ))n for large enough n.
1. Argue that C has a distance d if and only if D(C ) < 1.
4. We are now ready to define the algorithm to compute the final generator matrix G: see
Algorithm 7. Prove that Algorithm 7 outputs a matrix G such that the linear code generated
by G is an [n, k, δn]_q code. Conclude that this code lies on the GV bound.
5. Finally, we will analyze the run time of Algorithm 7. Argue that Step 2 can be implemented
in poly(n, q^k) time. Conclude that Algorithm 7 can be implemented in time poly(n, q^k).
Hint: It might be useful to maintain a data structure that keeps track of one number for every non-zero m ∈ Fkq
throughout the run of Algorithm 7.
Exercise 4.8. In this problem we will derive the GV bound using a graph-theoretic proof, which
is actually equivalent to the greedy proof we saw in Section 4.2.1. Let 1 ≤ d ≤ n and q ≥ 1 be
integers. Now consider the graph G n,d ,q = (V, E ), where the vertex set is the set of all vectors in
[q]n . Given two vertices u 6= v ∈ [q]n , we have the edge (u, v) ∈ E if and only if ∆(u, v) < d . An
independent set of a graph G = (V, E ) is a subset I ⊆ V such that for every u 6= v ∈ I , we have that
(u, v) is not an edge. We now consider the following sub-problems:
1. Argue that any independent set C of G n,d ,q is a q-ary code of distance d .
2. The degree of a vertex in a graph G is the number of edges incident on that vertex. Let Δ be
the maximum degree of any vertex in G = (V, E). Then argue that G has an independent set
of size at least |V|/(Δ + 1).
Exercise 4.9. In this problem we will improve slightly on the GV bound using a more sophisti-
cated graph-theoretic proof. Let G n,d ,q and N and ∆ be as in the previous exercise (Exercise 4.8).
So far we used the fact that G n,d ,q has many vertices and small degree to prove it has a large in-
dependent set, and thus to prove there is a large code of minimum distance d . In this exercise we
will see how a better result can be obtained by counting the number of “triangles” in the graph. A
triangle in a graph G = (V, E ) is a set {u, v, w} ⊂ V of three vertices such that all three vertices are
adjacent, i.e., (u, v), (v, w), (w, u) ∈ E. For simplicity we will focus on the case where q = 2 and
d = n/5, and consider the limit as n → ∞.
1. Prove that a graph on N vertices of maximum degree ∆ has at most O(N ∆2 ) triangles.
Hint: Fix u and let e count the number of coordinates where at least one of v or w disagree
with u. Prove that e is at most 3d /2.
3. Simplify the expression in the case where d = n/5 to show that the number of triangles in
G n,n/5,2 is O(N · ∆2−η ) for some η > 0.
4. A famous result in the “probabilistic method” shows (and you don’t have to prove this) that
if a graph on N vertices of maximum degree Δ has at most O(N·Δ^{2−η}) triangles, then it has
an independent set of size Ω((N/Δ)·log Δ). Use this result to conclude that there is a binary code
of block length n and distance n/5 of size Ω(n·2^n / (n choose n/5)). (Note that this improves over the
GV bound by an Ω(n) factor.)
Exercise 4.10. Use part 2 from Exercise 1.7 to prove the Singleton bound.
Exercise 4.11. Let C be an (n, k, d )q code. Then prove that fixing any n −d +1 positions uniquely
determines the corresponding codeword.
Exercise 4.12. Let C be a binary code of block length n and distance n/2. Then |C | ≤ 2n. (Note
that this is a factor 2 better than part 1 in Theorem 4.4.1.)
Exercise 4.13. Prove that the bound in Exercise 4.12 is tight, i.e. there exist binary codes C with
block length n and distance n/2 such that |C| = 2n.
Exercise 4.14. Prove that part 1 of Lemma 4.4.3 is tight.
Exercise 4.15. In this exercise we will prove the Plotkin bound (at least part 2 of Theorem 4.4.1)
via a purely combinatorial proof. Given an (n, k, d)_q code C with d > (1 − 1/q)·n, define
    S = Σ_{c_1 ≠ c_2 ∈ C} Δ(c_1, c_2).
For the rest of the problem think of C as an |C| × n matrix where each row corresponds to a
codeword in C. Now consider the following:
1. Looking at the contribution of each column in the matrix above, argue that
    S ≤ (1 − 1/q) · n·|C|².
2. Looking at the contribution of the rows in the matrix above, argue that
    S ≥ |C|·(|C| − 1) · d.
Exercise 4.17. Use Exercise 4.16 to prove part 2 of Theorem 4.4.1 for linear codes.
Exercise 4.18. Use Exercise 4.16 to prove Theorem 4.3.1 for linear codes.
Chapter 5
Reed-Solomon Codes
In this chapter, we will study the Reed-Solomon codes. Reed-Solomon codes have been studied
a lot in coding theory. These codes are optimal in the sense that they meet the Singleton bound
(Theorem 4.3.1). We would like to emphasize that these codes meet the Singleton bound not
just asymptotically in terms of rate and relative distance but also in terms of the dimension,
block length and distance. As if this were not enough, Reed-Solomon codes turn out to be more
versatile: they have many applications outside of coding theory. (We will see some applications
later in the book.)
These codes are defined in terms of univariate polynomials (i.e. polynomials in one un-
known/variable) with coefficients from a finite field Fq . It turns out that polynomials over Fp ,
for prime p, also help us define finite fields Fp s , for s > 1. To kill two birds with one stone1 , we
first do a quick review of polynomials over finite fields. Then we will define and study some
properties of Reed-Solomon codes.
Definition 5.1.2. For F(X) = Σ_{i=0}^{d} f_i X^i with f_d ≠ 0, we call d the degree of F(X). We denote the
degree of the polynomial F(X) by deg(F).
Addition:
    F(X) + G(X) = Σ_{i=0}^{max(deg(F), deg(G))} (f_i + g_i)·X^i,
where the addition on the coefficients is done over F_q. For example, over F_2, X + (1 + X) =
X·(1 + 1) + 1·(0 + 1) = 1 (recall that over F_2, 1 + 1 = 0).²
Multiplication:
    F(X) · G(X) = Σ_{i=0}^{deg(F)+deg(G)} (Σ_{j=0}^{min(i, deg(F))} f_j · g_{i−j})·X^i,
where all the operations on the coefficients are over F_q. For example, over F_2, X·(1 + X) =
X + X²; (1 + X)² = 1 + 2X + X² = 1 + X², where the latter equality follows since 2 ≡ 0 mod 2.
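These coefficient-wise rules translate directly into code. Below is a minimal Python sketch over a prime field F_p (function names ours), with a polynomial stored as its list of coefficients f_0, f_1, . . . ; the two F_2 examples above are reproduced at the bottom.

def poly_add(f, g, p):
    # Add two polynomials (coefficient lists, lowest degree first) over F_p.
    out = [0] * max(len(f), len(g))
    for i, fi in enumerate(f):
        out[i] = (out[i] + fi) % p
    for i, gi in enumerate(g):
        out[i] = (out[i] + gi) % p
    return out

def poly_mul(f, g, p):
    # Multiply two polynomials over F_p.
    out = [0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] = (out[i + j] + fi * gj) % p
    return out

if __name__ == "__main__":
    # Over F_2: X + (1 + X) = 1 and (1 + X)^2 = 1 + X^2.
    X = [0, 1]
    one_plus_X = [1, 1]
    print(poly_add(X, one_plus_X, 2))           # [1, 0], i.e. the constant 1
    print(poly_mul(one_plus_X, one_plus_X, 2))  # [1, 0, 1], i.e. 1 + X^2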
Theorem 5.1.5. Let E (X ) be an irreducible polynomial with degree at least 2 over Fp , p prime.
Then the set of polynomials in Fp [X ] modulo E (X ), denoted by Fp [X ]/E (X ), is a field.
² This will be a good time to remember that operations over a finite field are much different from operations over
integers/reals. For example, over reals/integers X + (X + 1) = 2X + 1.
The proof of the theorem above is similar to the proof of Lemma 2.1.4, so we only sketch the
proof here. In particular, we will explicitly state the basic tenets of Fp [X ]/E (X ).
• Elements are polynomials in F_p[X] of degree at most s − 1. Note that there are p^s such
polynomials.
• The additive identity is the zero polynomial, and the additive inverse of any element F (X )
is −F (X ).
• The multiplicative identity is the constant polynomial 1. It can be shown that for every
element F (X ), there exists a unique multiplicative inverse (F (X ))−1 .
For example, for p = 2 and E (X ) = 1+X +X 2 , F2 [X ]/(1+X +X 2 ) has as its elements {0, 1, X , 1+
X }. The additive inverse of any element in F2 [X ]/(1 + X + X 2 ) is the element itself while the
multiplicative inverses of 1, X and 1 + X in F2 [X ]/(1 + X + X 2 ) are 1, 1 + X and X respectively.
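This small field can be tabulated explicitly. The sketch below (names ours) multiplies polynomials over F_2 and reduces the products modulo E(X) = 1 + X + X², from which the multiplication table and the inverses listed above can be read off.

def poly_mul(f, g, p):
    out = [0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] = (out[i + j] + fi * gj) % p
    return out

def poly_mod(f, e, p):
    # Remainder of f modulo e over F_p (p prime, leading coefficient of e non-zero).
    f = f[:]
    while len(f) >= len(e) and any(f):
        if f[-1] == 0:
            f.pop()
            continue
        shift = len(f) - len(e)
        coef = f[-1] * pow(e[-1], p - 2, p) % p  # inverse of the leading coefficient
        for i, ei in enumerate(e):
            f[shift + i] = (f[shift + i] - coef * ei) % p
        f.pop()
    return f

if __name__ == "__main__":
    p, E = 2, [1, 1, 1]  # E(X) = 1 + X + X^2, irreducible over F_2
    elements = {"0": [0], "1": [1], "X": [0, 1], "1+X": [1, 1]}
    # Multiplication table of F_2[X]/(E(X)); e.g. X * (1+X) reduces to 1,
    # confirming that X and 1+X are each other's multiplicative inverses.
    for a_name, a in elements.items():
        for b_name, b in elements.items():
            print(a_name, "*", b_name, "=", poly_mod(poly_mul(a, b, p), E, p))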
A natural question to ask is if irreducible polynomials exist for every degree. Indeed, they
do:
Theorem 5.1.6. For every prime power q and every integer s ≥ 1, there exists a monic irreducible
polynomial of degree s over F_q. In fact, a uniformly random monic polynomial of degree s over F_q
is irreducible with probability Ω(1/s).
Algorithm 8 Generating Irreducible Polynomial
INPUT: Prime power q and an integer s > 1
OUTPUT: A monic irreducible polynomial of degree s over F_q
1: b ← 0
2: WHILE b = 0 DO
3:    F(X) ← X^s + Σ_{i=0}^{s−1} f_i X^i, where each f_i is chosen uniformly at random from F_q.
4:    IF gcd(F(X), X^{q^s} − X) = F(X) THEN
5:        b ← 1.
6: RETURN F(X)
Corollary 5.1.7. There is a Las Vegas algorithm to generate an irreducible polynomial of degree s
over any Fq in expected time poly(s, log q).
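For small parameters the existence claim can be checked by brute force. The sketch below (names ours) lists the monic irreducible polynomials of a given degree over F_2 by trial division over all lower-degree monic factors; note that this is a simple sanity check, not the gcd-based test used in Algorithm 8.

from itertools import product

def poly_mul(f, g):
    # Polynomial multiplication over F_2 (coefficients 0/1, XOR accumulates mod 2).
    out = [0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] ^= fi & gj
    return out

def monic_polys(degree):
    # All monic polynomials of the given degree over F_2 (low-degree coefficients first).
    for low in product(range(2), repeat=degree):
        yield list(low) + [1]

def is_irreducible(f):
    # Trial division: f (monic, degree >= 1) has no monic factorization g * h with deg >= 1.
    d = len(f) - 1
    for dg in range(1, d // 2 + 1):
        for g in monic_polys(dg):
            for h in monic_polys(d - dg):
                if poly_mul(g, h) == f:
                    return False
    return True

if __name__ == "__main__":
    s = 3
    for f in monic_polys(s):
        if is_irreducible(f):
            print(f)  # e.g. [1, 1, 0, 1] is 1 + X + X^3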
Now recall that Theorem 2.1.5 states that for every prime power p^s, there is a unique field F_{p^s}.
This along with Theorems 5.1.5 and 5.1.6 imply that:
Corollary 5.1.8. The field Fp s is Fp [X ]/E (X ), where E (X ) is an irreducible polynomial of degree
s.
For example, the first row below lists all the codewords in the [3, 2]_3 Reed-Solomon code
where the evaluation points are F_3 (and the codewords are ordered by the corresponding mes-
sages from F_3^2 in lexicographic order, where for clarity the second row shows the polynomial
f_m(X) for the corresponding m ∈ F_3^2 in gray):
(0,0,0), (1,1,1), (2,2,2), (0,1,2), (1,2,0), (2,0,1), (0,2,1), (1,0,2), (2,1,0)
0, 1, 2, X, X+1, X+2, 2X, 2X+1, 2X+2
Notice that by definition, the entries in {α_1, ..., α_n} are distinct and thus, we must have n ≤ q.
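The codeword table above can be regenerated in a few lines. In the sketch below (names ours) each polynomial m_0 + m_1·X is evaluated at the points 0, 1, 2 over F_3; the messages are iterated so that the polynomials appear in the same order as in the table, and the minimum distance comes out to n − k + 1 = 2.

def rs_encode(coeffs, alphas, q):
    # Evaluate f(X) = coeffs[0] + coeffs[1] X + ... at each point in alphas, over F_q.
    return tuple(sum(c * pow(a, i, q) for i, c in enumerate(coeffs)) % q for a in alphas)

if __name__ == "__main__":
    q = 3
    alphas = [0, 1, 2]  # evaluation points: all of F_3
    codewords = []
    for m1 in range(q):       # coefficient of X
        for m0 in range(q):   # constant coefficient
            codewords.append(rs_encode((m0, m1), alphas, q))
    print(codewords)
    dist = min(sum(a != b for a, b in zip(u, v))
               for u in codewords for v in codewords if u != v)
    print("minimum distance:", dist)  # expect n - k + 1 = 2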
We now turn to some properties of Reed-Solomon codes.
Claim 5.2.2. RS is an [n, k]_q linear code.
Proof. The proof follows from the fact that if a ∈ F_q and f(X), g(X) ∈ F_q[X] are polynomials of
degree ≤ k −1, then a f (X ) and f (X )+ g (X ) are also polynomials of degree ≤ k −1. In particular,
let messages m1 and m2 be mapped to f m1 (X ) and f m2 (X ) where f m1 (X ), f m2 (X ) ∈ Fq [X ] are
polynomials of degree at most k − 1 and because of the mapping defined in (5.1), it is easy to
verify that:
f m1 (X ) + f m2 (X ) = f m1 +m2 (X ),
and
a f m1 (X ) = f am1 (X ).
In other words,
RS(m1 ) + RS(m2 ) = RS(m1 + m2 )
aRS(m1 ) = RS(am1 ).
Therefore RS is an [n, k]_q linear code.
Claim 5.2.3. RS is a [n, k, n − k + 1]q code. That is, it matches the Singleton bound.
The claim on the distance follows from the fact that every non-zero polynomial of degree
k − 1 over Fq [X ] has at most k − 1 (not necessarily distinct) roots, which we prove first (see
Proposition 5.2.4 below). This implies that if two polynomials agree on more than k − 1 places
then they must be the same polynomial. Note that this implies that two distinct polynomials of degree
at most k − 1, when evaluated at the same n (distinct) points, must differ in at least n − (k − 1) = n − k + 1
positions, which is what we want.
Proposition 5.2.4 (“Degree Mantra"). A nonzero polynomial f(X) of degree t over a field F_q has
at most t roots in F_q.
Proof. We will prove the theorem by induction on t . If t = 0, we are done. Now, consider f (X )
of degree t > 0. Let α ∈ Fq be a root such that f (α) = 0. If no such root α exists, we are done. If
there is a root α, then we can write
f (X ) = (X − α)g (X )
where deg(g ) = deg( f ) − 1 (i.e. X − α divides f (X )). Note that g (X ) is non-zero since f (X ) is
non-zero. This is because by the fundamental rule of division of polynomials:
f (X ) = (X − α)g (X ) + R(X )
where deg(R) ≤ 0 (as the degree cannot be negative this in turn implies that deg(R) = 0) and
since f (α) = 0,
f (α) = 0 + R(α),
which implies that R(α) = 0. Since R(X ) has degree zero (i.e. it is a constant polynomial), this
implies that R(X ) ≡ 0.
Finally, as g (X ) is non-zero and has degree t − 1, by induction, g (X ) has at most t − 1 roots,
which implies that f (X ) has at most t roots.
Proof of Claim 5.2.3. We start by proving the claim on the distance. Fix arbitrary m1 6= m2 ∈
Fkq . Note that f m1 (X ), f m2 (X ) ∈ Fq [X ] are distinct polynomials of degree at most k − 1 since
m1 6= m2 ∈ Fkq . Then f m1 (X ) − f m2 (X ) 6= 0 also has degree at most k − 1. Note that w t (RS(m2 ) −
RS(m1 )) = ∆(RS(m1 ), RS(m2 )). The weight of RS(m2 )−RS(m1 ) is n minus the number of zeroes
in RS(m2 ) − RS(m1 ), which is equal to n minus the number of roots that f m1 (X ) − f m2 (X ) has
among {α_1, ..., α_n}. That is,
    Δ(RS(m_1), RS(m_2)) = wt(RS(m_2) − RS(m_1)) ≥ n − (k − 1) = n − k + 1,
as desired. (The claim on the dimension follows since the distance bound above in particular shows
that distinct messages are mapped to distinct codewords, so the code has q^k codewords.)
Finally, we note that the generator matrix of RS (with the monomial basis 1, X, . . . , X^{k−1} and
evaluation points α_1, . . . , α_n) is the k × n matrix whose (i, j)-th entry, for 0 ≤ i ≤ k − 1 and j ∈ [n],
is α_j^i:
    1            1            · · ·   1            · · ·   1
    α_1          α_2          · · ·   α_j          · · ·   α_n
    α_1^2        α_2^2        · · ·   α_j^2        · · ·   α_n^2
    ⋮            ⋮                    ⋮                    ⋮
    α_1^i        α_2^i        · · ·   α_j^i        · · ·   α_n^i
    ⋮            ⋮                    ⋮                    ⋮
    α_1^{k−1}    α_2^{k−1}    · · ·   α_j^{k−1}    · · ·   α_n^{k−1}
The class of codes that match the Singleton bound have their own name, which we define
and study next.
Definition 5.3.1 (MDS codes). An (n, k, d)_q code is called a Maximum Distance Separable (MDS)
code if d = n − k + 1.
Definition 5.3.2. For any subset of indices S ⊆ [n] of size exactly k and a code C ⊆ Σn , C S is the
set of all codewords in C projected onto the indices in S.
MDS codes have the following nice property that we shall prove for the special case of Reed-
Solomon codes first and subsequently for the general case as well.
Proposition 5.3.3. Let C ⊆ Σ^n be an MDS code of integral dimension k. Then for all S ⊆ [n] such
that |S| = k, we have C_S = Σ^k.
Before proving Proposition 5.3.3 in its full generality, we present its proof for the special case of
Reed-Solomon codes.
Consider any S ⊆ [n] of size k and fix an arbitrary v = (v_1, . . . , v_k) ∈ F_q^k; we need to show that
there exists a codeword c ∈ RS (assume that the RS code evaluates polynomials of degree at
most k − 1 over {α_1, . . . , α_n} ⊆ F_q) such that c_S = v. Consider a generic degree k − 1 polynomial
    F(X) = Σ_{i=0}^{k−1} f_i X^i.
Thus, we need to show that there exists F(X) such that F(α_i) = v_i for all i ∈ S, where |S| = k.
For notational simplicity, assume that S = [k]. We think of f i ’s as unknowns in the equations
that arise out of the relations F (αi ) = v i . Thus, we need to show that there is a solution to the
following system of linear equations:
    (f_0, f_1, . . . , f_{k−1}) · V = (v_1, v_2, . . . , v_k),
where V is the k × k matrix whose (i, j)-th entry, for 0 ≤ i ≤ k − 1 and j ∈ [k], is α_j^i:
    1            1            · · ·   1
    α_1          α_2          · · ·   α_k
    α_1^2        α_2^2        · · ·   α_k^2
    ⋮            ⋮                    ⋮
    α_1^{k−1}    α_2^{k−1}    · · ·   α_k^{k−1}
The above constraint matrix is a Vandermonde matrix and is known to have full rank (see Ex-
ercise 5.7). Hence, by Exercise 2.6, there always exists a unique solution for (f_0, . . . , f_{k−1}). This
completes the proof for Reed-Solomon codes.
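For Reed-Solomon codes this argument is constructive: solving the Vandermonde system is just polynomial interpolation. The sketch below (names ours) uses Lagrange interpolation over a prime field to find the unique polynomial of degree at most k − 1 taking prescribed values on S, and checks it on the [3, 2]_3 example code above.

def interpolate(points, values, q):
    # Coefficients of the unique degree <= k-1 polynomial over F_q (q prime)
    # with f(points[i]) = values[i] for all i.
    k = len(points)
    coeffs = [0] * k
    for i in range(k):
        # Lagrange basis L_i(X) = prod_{j != i} (X - points[j]) / (points[i] - points[j]).
        basis = [1]
        denom = 1
        for j in range(k):
            if j == i:
                continue
            new = [0] * (len(basis) + 1)
            for t, b in enumerate(basis):
                new[t] = (new[t] - points[j] * b) % q
                new[t + 1] = (new[t + 1] + b) % q
            basis = new
            denom = denom * (points[i] - points[j]) % q
        scale = values[i] * pow(denom, q - 2, q) % q
        for t, b in enumerate(basis):
            coeffs[t] = (coeffs[t] + scale * b) % q
    return coeffs

if __name__ == "__main__":
    q = 3
    # Find a codeword of the [3, 2]_3 RS code whose projection onto the first
    # two evaluation points (0 and 1) equals v = (2, 0).
    coeffs = interpolate(points=[0, 1], values=[2, 0], q=q)
    print(coeffs)                                                # [2, 1], i.e. f(X) = 2 + X
    print([(coeffs[0] + coeffs[1] * a) % q for a in [0, 1, 2]])  # full codeword (2, 0, 1)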
Next, we prove the property for the general case.
Proof of Proposition 5.3.3. Consider a |C | × n matrix where each row represents a codeword
in C . Hence, there are |C | = |Σ|k rows in the matrix. The number of columns is equal to the
block length n of the code. Since C is Maximum Distance Separable, its distance d = n − k + 1.
Let S ⊆ [n] be of size exactly k. It is easy to see that for any c_i ≠ c_j ∈ C, the corresponding
projections (c_i)_S and (c_j)_S ∈ C_S are not the same, as otherwise the two codewords would agree on
the k positions in S and hence Δ(c_i, c_j) ≤ n − k = d − 1, which is not possible
as the minimum distance of the code C is d . Therefore, every codeword in C gets mapped to a
distinct codeword in C S . As a result, |C S | = |C | = |Σ|k . As C S ⊆ Σk , this implies that C S = Σk , as
desired. ✷
Proposition 5.3.3 implies an important property in pseudorandomness: see Exercise 5.8 for
more.
5.4 Exercises
Exercise 5.1. Prove that X 2 + X + 1 is the unique irreducible polynomial of degree two over F2 .
Exercise 5.2. Argue that any function f : Fq → Fq is equivalent to a polynomial P (X ) ∈ Fq [X ] of
degree at most q − 1: that is, for every α ∈ Fq
f (α) = P (α).
Exercise 5.3. For any [n, k]q Reed-Solomon code, exhibit two codewords that are at Hamming
distance exactly n − k + 1.
Exercise 5.4. Let RS_{F_q}[n, k] denote the Reed-Solomon code over F_q where the set of evaluation
points is F_q (i.e. n = q). Prove that
    (RS_{F_q}[n, k])^⊥ = RS_{F_q}[n, n − k],
that is, the dual of these Reed-Solomon codes are Reed-Solomon codes themselves. Conclude that
Reed-Solomon codes contain self-dual codes (see Exercise 2.31 for a definition).
Exercise 5.5. Since Reed-Solomon codes are linear codes, by Proposition 2.3.5, one can do error
detection for Reed-Solomon codes in quadratic time. In this problem, we will see that one can
design even more efficient error detection algorithm for Reed-Solomon codes. In particular, we
will consider data streaming algorithms (see Section 22.5 for more motivation on this class of al-
gorithms). A data stream algorithm makes a sequential pass on the input, uses poly-logarithmic
space and spends only poly-logarithmic time on each location in the input. In this problem we
show that there exists a randomized data stream algorithm to solve the error detection problem
for Reed-Solomon codes.
2. Given [q, k]q Reed-Solomon code C (i.e. with the evaluation points being Fq ), present a
data stream algorithm for error detection of C with O(log q) space and polylog(q) time per
position of the received word. The algorithm should work correctly with probability at least
2/3. You should assume that the data stream algorithm has access to the values of k and q
(and knows that C has Fq as its evaluation points).
Hint: Part 1 and Exercise 5.4 should be helpful.
Exercise 5.6. We have defined Reed-Solomon in this chapter and Hadamard codes in Section 2.6.
In this problem we will prove that certain alternate definitions also suffice.
1. Consider the Reed-Solomon code over a field Fq and block length n = q − 1 defined as
2. Recall that the [2^r, r, 2^{r−1}]_2 Hadamard code is generated by the r × 2^r matrix whose i-th (for
0 ≤ i ≤ 2^r − 1) column is the binary representation of i. Briefly argue that the Hadamard
⁶ This means that F_q^* = {1, α, . . . , α^{n−1}}. Further, α^n = 1.
codeword for the message (m 1 , m 2 , . . . , m r ) ∈ {0, 1}r is the evaluation of the (multivariate)
polynomial m 1 X 1 + m 2 X 2 + · · · + m r X r (where X 1 , . . . , X r are the r variables) over all the
possible assignments to the variables (X 1 , . . . , X r ) from {0, 1}r .
Using the definition of Hadamard codes above (re)prove the fact that the code has distance
2r −1 .
Exercise 5.7. Prove that the k × k Vandermonde matrix (where the (i, j)-th entry is α_j^i) has full
rank (where α_1, . . . , α_k are distinct).
Exercise 5.8. A set S ⊆ Fnq is said to be a t -wise independent source (for some 1 ≤ t ≤ n) if given
a uniformly random sample (X 1 , . . . , X n ) from S, the n random variables are t -wise independent:
i.e. any subset of t variables are uniformly independent random variables over Fq . We will explore
properties of these objects in this exercise.
1. Argue that the definition of t -wise independent source is equivalent to the definition in
Exercise 2.13.
2. Argue that for any k ≥ 1, any [n, k]q code C is a 1-wise independent source.
3. Prove that any [n, k]q MDS code is a k-wise independent source.
4. Using part 3 or otherwise, prove that there exists a k-wise independent source over F_2 of
size at most (2n)^k. Conclude that k(log_2 n + 1) uniformly random and independent bits
are enough to compute n random bits that are k-wise independent. Improve the bound
slightly to show that k(log_2 n − log_2 log_2 n + O(1)) random bits are enough to generate a k-
wise independent source over F_2.
5. For 0 < p ≤ 1/2, we say the n binary random variables X_1, . . . , X_n are p-biased and t-
wise independent if any t of the random variables are independent and Pr[X_i = 1] = p
for every i ∈ [n]. For the rest of the problem, let p be a power of 1/2. Then show that
any t·log_2(1/p)-wise independent random variables can be converted into t-wise inde-
pendent p-biased random variables. Conclude that one can construct such sources with
t·log_2(1/p)·(1 + log_2(n·log_2(1/p))) uniformly random bits. Then improve this bound to t·(1 +
max(log_2(1/p), log_2 n)) uniformly random bits.
Exercise 5.9. In many applications, errors occur in “bursts", i.e. all the error locations are con-
tained in a contiguous region (think of a scratch on a DVD or disk). In this problem we will see
how one can use Reed-Solomon codes to correct bursty errors.
An error vector e ∈ {0, 1}^n is called a t-single burst error pattern if all the non-zero bits in e
occur in the range [i, i + t − 1] for some 1 ≤ i ≤ n − t + 1. Further, a vector e ∈ {0, 1}^n is called an
(s, t)-burst error pattern if it is the union of at most s t-single burst error patterns (i.e. all non-zero
bits in e are contained in one of at most s contiguous ranges in [n]).
We call a binary code C ⊆ {0, 1}n to be (s, t )-burst error correcting if one can uniquely decode
from any (s, t )-burst error pattern. More precisely, given an (s, t )-burst error pattern e and any
codeword c ∈ C , the only codeword c′ ∈ C such that (c + e) − c′ is an (s, t )-burst error pattern
satisfies c′ = c.
1. Argue that if C is (st)-error correcting (in the sense of Definition 1.3.5), then it is also (s, t)-
burst error correcting. Conclude that for any ε > 0, there exists a code with rate Ω(ε²) and
block length n that is (s, t)-burst error correcting for any s, t such that s·t ≤ (1/4 − ε)·n.
2. Argue that for any rate R > 0 and for large enough n, there exist (s, t)-burst error correcting
codes of rate R as long as s·t ≤ ((1 − R − ε)/2)·n and t ≥ Ω(log n / ε). In particular, one can correct
from a 1/2 − ε fraction of burst errors (as long as each burst is “long enough") with rate Ω(ε)
(compare this with item 1).
Hint: Use Reed-Solomon codes.
Exercise 5.10. In this problem we will look at a very important class of codes called BCH codes⁷.
Let F = F_{2^m}. Consider the binary code C_BCH defined as RS_F[n, k, n − k + 1] ∩ F_2^n.
1. Prove that C_BCH is a binary linear code of distance at least d = n − k + 1 and dimension at
least n − (d − 1)·log_2(n + 1).
Hint: Use the characterization (5.2) of the Reed-Solomon code from Exercise 5.6.
2. Prove a better lower bound of n − ⌈(d − 1)/2⌉·log_2(n + 1) on the dimension of C_BCH.
Hint: Try to find redundant checks among the “natural” parity checks defining C_BCH.
3. For d = 3, C_BCH is the same as another code we have seen. What is that code?
4. For constant d (and growing n), prove that C_BCH has nearly optimal dimension for dis-
tance d, in that the dimension cannot be n − t·log_2(n + 1) for t < (d − 1)/2.
Exercise 5.11. Show that for 1 ≤ k ≤ n, ⌈k/2⌉·log_2(n + 1) random bits are enough to compute n bits
that are k-wise independent. Note that this is an improvement of almost a factor of 2 over the
bound from Exercise 5.8 part 4 (and this new bound is known to be optimal).
Hint: Use Exercises 2.13 and 5.10.
Exercise 5.12. In this exercise, we continue in the theme of Exercise 5.10 and look at the inter-
section of a Reed-Solomon code with Fn2 to get a binary code. Let F = F2m . Fix positive integers
d , n with (d − 1)m < n < 2m , and a set S = {α1 , α2 , . . . , αn } of n distinct nonzero elements of F. For
a vector v = (v 1 , . . . , v n ) ∈ (F∗ )n of n not necessarily distinct nonzero elements from F, define the
Generalized Reed-Solomon code GRSS,v,d as follows:
2. Argue that GRS_{S,v,d} ∩ F_2^n is a binary linear code of rate at least 1 − (d − 1)m/n.
⁷ The acronym BCH stands for Bose-Chaudhuri-Hocquenghem, the discoverers of this family of codes.
3. Let c ∈ F_2^n be a nonzero binary vector. Prove that (for every choice of d, S) there are at most
(2^m − 1)^{n−d+1} choices of the vector v for which c ∈ GRS_{S,v,d}.
4. Using the above, prove that if the integer D satisfies Vol_2(n, D − 1) < (2^m − 1)^{d−1} (where
Vol_2(n, D − 1) = Σ_{i=0}^{D−1} (n choose i)), then there exists a vector v ∈ (F^*)^n such that the minimum dis-
tance of the binary code GRS_{S,v,d} ∩ F_2^n is at least D.
5. Using parts 2 and 4 above (or otherwise), argue that the family of codes GRSS,v,d ∩ Fn2 con-
tains binary linear codes that meet the Gilbert-Varshamov bound.
Exercise 5.13. In this exercise we will show that the dual of a GRS code is a GRS itself with differ-
ent parameters. First, we state the obvious definition of GRS codes over a general finite field Fq (as
opposed to the definition over fields of characteristic two in Exercise 5.12). In particular, define
the code GRSS,v,d ,q as follows:
Exercise 5.14. In Exercise 2.16, we saw that any linear code can be converted in to a systematic
code. In other words, there is a map to convert Reed-Solomon codes into a systematic one. In
this exercise the goal is to come up with an explicit encoding function that results in a systematic
Reed-Solomon code.
In particular, given the set of evaluation points α1 , . . . , αn , design an explicit map f from Fkq
to a polynomial of degree at most k −1 such that the following holds. For every message m ∈ Fkq , if
the corresponding polynomial is f_m(X), then the vector (f_m(α_i))_{i∈[n]} has the message m appear
in the corresponding codeword (say in its first k positions). Further, argue that this map results in
an [n, k, n − k + 1]q code.
Exercise 5.15. In this problem, we will consider the number-theoretic counterpart of Reed-Solomon
codes. Let 1 ≤ k < n be integers and let p_1 < p_2 < · · · < p_n be n distinct primes. Denote K = Π_{i=1}^{k} p_i
and N = Π_{i=1}^{n} p_i. The notation Z_M stands for the integers modulo M, i.e., the set {0, 1, . . . , M − 1}. Con-
sider the Chinese Remainder code defined by the encoding map E : ZK → Zp 1 × Zp 2 × · · · × Zp n
defined by:
E (m) = (m mod p 1 , m mod p 2 , · · · , m mod p n ) .
(Note that this is not a code in the usual sense we have been studying since the symbols at different
positions belong to different alphabets. Still notions such as distance of this code make sense and
are studied in the question below.)
Suppose that m_1 ≠ m_2. For 1 ≤ i ≤ n, define the indicator variable b_i = 1 if E(m_1)_i ≠ E(m_2)_i
and b_i = 0 otherwise. Prove that Π_{i=1}^{n} p_i^{b_i} > N/K.
Use the above to deduce that when m 1 6= m 2 , the encodings E (m 1 ) and E (m 2 ) differ in at least
n − k + 1 locations.
Exercise 5.16. In this problem, we will consider derivatives over a finite field Fq . Unlike the
case of derivatives over reals, derivatives over finite fields do not have any physical interpretation
but as we shall see shortly, the notion of derivatives over finite fields is still a useful concept. In
particular, given a polynomial f(X) = Σ_{i=0}^{t} f_i X^i over F_q, we define its derivative as
    f′(X) = Σ_{i=0}^{t−1} (i + 1) · f_{i+1} · X^i.
Further, we will denote by f (i ) (X ), the result of applying the derivative on f i times. In this prob-
lem, we record some useful facts about derivatives.
1. Define R(X, Z) = f(X + Z) = Σ_{i=0}^{t} r_i(X) · Z^i. Then for any j ≥ 1,
    f^{(j)}(X) = j! · r_j(X).
3. Let j ≤ char(Fq ). Further, assume that for every 0 ≤ i < j , f (i ) (α) = 0 for some α ∈ Fq . Then
prove that (X − α) j divides f (X ).
4. Finally, we will prove the following generalization of the degree mantra (Proposition 5.2.4).
Let f(X) be a non-zero polynomial of degree t and m ≤ char(F_q). Then there exist at most
⌊t/m⌋ distinct elements α ∈ F_q such that f^{(j)}(α) = 0 for every 0 ≤ j < m.
Exercise 5.17. In this exercise, we will consider a code that is related to Reed-Solomon codes and
uses derivatives from Exercise 5.16. These codes are called derivative codes.
Let m ≥ 1 be an integer parameter and consider parameters k < char(Fq ) and n such that
m < k < nm. Then the derivative code with parameters (n, k, m) is defined as follow. Consider
any message m ∈ Fkq and let f m (X ) be the message polynomial as defined for the Reed-Solomon
code. Let α1 , . . . , αn ∈ Fq be distinct elements. Then the codeword for m is given by
    f_m(α_1)          f_m(α_2)          · · ·   f_m(α_n)
    f_m^{(1)}(α_1)    f_m^{(1)}(α_2)    · · ·   f_m^{(1)}(α_n)
    ⋮                 ⋮                        ⋮