FUNDAMENTALS
OF ALGORITHMICS
Gilles Brassard and Paul Bratley
PRENTICE HALL
Englewood Cliffs, New Jersey 07632
Library of Congress Cataloging-in-Publication Data
BRASSARD, GILLES
Fundamentals of Algorithmics / Gilles Brassard and Paul Bratley
p. cm.
Includes bibliographical references and index.
ISBN 0-13-335068-1
1. Algorithms. I. Bratley, Paul. II. Title
QA9.58.B73 1996    95-45581
511'.8-dc20
The author and publisher of this book have used their best efforts in preparing this book. These
efforts include the development, research, and testing of the theories and formulas to determine
their effectiveness. The author and publisher shall not be liable in any event for incidental or
consequential damages in connection with, or arising out of, the furnishing, performance, or use
of these formulas.
All rights reserved. No part of this book may be reproduced, in any form or by any means, with-
out permission in writing from the publisher.
ISBN 0-13-335068-1
Prentice-Hall International (UK) Limited, London
Prentice-Hall of Australia Pty. Limited, Sydney
Prentice-Hall Canada Inc., Toronto
Prentice-Hall Hispanoamericana, S.A., Mexico
Prentice-Hall of India Private Limited, New Delhi
Prentice-Hall of Japan, Inc., Tokyo
Simon & Schuster Asia Pte. Ltd., Singapore
Editora Prentice-Hall do Brasil, Ltda., Rio de Janeiro
To our parents
Contents
PREFACE xv
* 1 PRELIMINARIES
1.1 Introduction 1
1.2 What is an algorithm? 1
1.3 Notation for programs 6
1.4 Mathematical notation 7
1.4.1 Propositional calculus 7
1.4.2 Set theory 8
1.4.3 Integers, reals and intervals 8
1.4.4 Functions and relations 9
1.4.5 Quantifiers 10
1.4.6 Sums and products 11
1.4.7 Miscellaneous 12
1.5 Proof technique 1 - Contradiction 13
1.6 Proof technique 2 - Mathematical induction 16
1.6.1 The principle of mathematical induction 18
1.6.2 A horse of a different colour 23
1.6.3 Generalized mathematical induction 24
1.6.4 Constructive induction 27
1.7 Some reminders 31
1.7.1 Limits 31
1.7.2 Simple series 34
1.7.3 Basic combinatorics 38
1.7.4 Elementary probability 41
1.8 Problems 48
1.9 References and further reading 55
* 2 ELEMENTARY ALGORITHMICS 57
2.1 Introduction 57
2.2 Problems and instances 58
2.3 The efficiency of algorithms 59
2.4 Average and worst-case analyses 61
2.5 What is an elementary operation? 64
2.6 Why look for efficiency? 66
2.7 Some examples 67
2.7.1 Calculating determinants 68
2.7.2 Sorting 68
2.7.3 Multiplication of large integers 70
2.7.4 Calculating the greatest common divisor 71
2.7.5 Calculating the Fibonacci sequence 72
2.7.6 Fourier transforms 73
2.8 When is an algorithm specified? 74
2.9 Problems 74
2.10 References and further reading 78
* 3 ASYMPTOTIC NOTATION 79
3.1 Introduction 79
3.2 A notation for "the order of" 79
3.3 Other asymptotic notation 85
3.4 Conditional asymptotic notation 88
3.5 Asymptotic notation with several parameters 91
3.6 Operations on asymptotic notation 91
3.7 Problems 92
3.8 References and further reading 97
* 4 ANALYSIS OF ALGORITHMS 98
4.1 Introduction 98
4.2 Analysing control structures 98
4.2.1 Sequencing 98
4.2.2 "For" loops 99
4.2.3 Recursive calls 101
4.2.4 "While" and "repeat" loops 102
* 6 GREEDY ALGORITHMS 187
* 7 DIVIDE-AND-CONQUER 219
7.1 Introduction: Multiplying large integers 219
7.2 The general template 223
7.3 Binary search 226
7.4 Sorting 228
7.4.1 Sorting by merging 228
7.4.2 Quicksort 231
7.5 Finding the median 237
7.6 Matrix multiplication 242
7.7 Exponentiation 243
7.8 Putting it all together: Introduction to cryptography 247
7.9 Problems 250
7.10 References and further reading 257
REFERENCES 501
INDEX 517
Preface
the beginning of this preface, is nowadays more valid than ever, because the faster
your computing equipment, the more you stand to gain from efficient algorithms.
Our book is not a programming manual. Still less is it a "cookbook" containing
a long catalogue of programs ready to be used directly on a machine to solve certain
specific problems, but giving at best a vague idea of the principles involved in their
design. On the contrary, it deals with algorithmics: the systematic study of the
design and analysis of algorithms. The aim of our book is to give readers some
basic tools needed to develop their own algorithms, in whatever field of application
they may be required.
We concentrate on the fundamental techniques used to design and analyse
efficient algorithms. These techniques include greedy algorithms, divide-and-
conquer, dynamic programming, graph techniques, probabilistic algorithms and
parallel algorithms. Each technique is first presented in full generality. Thereafter
it is illustrated by concrete examples of algorithms taken from such different ap-
plications as optimization, linear algebra, cryptography, computational number
theory, graph theory, operations research, artificial intelligence, and so on. We pay
special attention to integrating the design of algorithms with the analysis of their
efficiency. Although our approach is rigorous, we do not neglect the needs of
practitioners: besides illustrating the design techniques employed, most of the
algorithms presented also have real-life applications.
To profit fully from this book, you should have some previous programming
experience. However, we use no particular programming language, nor are the
examples for any particular machine. This and the general, fundamental treatment
of the material ensure that the ideas presented here will not lose their relevance.
On the other hand, you should not expect to be able to use directly the algorithms
we give: you will always be obliged to make the necessary effort to transcribe them
into some appropriate programming language. The use of Pascal or a similarly
structured language will help reduce this effort to the minimum necessary.
Our book is intended as a textbook for an undergraduate course in algorithmics.
Some 500 problems are provided to help the teacher find homework assignments.
The first chapter includes most of the required mathematical preliminaries. In par-
ticular, it features a detailed discussion of mathematical induction, a basic skill too
often neglected in undergraduate computer science education. From time to time
a passage requires more advanced mathematical knowledge, but such passages
can be skipped on the first reading with no loss of continuity. Our book can also
be used for independent study: anyone who needs to write better, more efficient
algorithms can benefit from it.
To capture the students' attention from the outset, it is particularly effective to
begin the first lecture with a discussion of several algorithms for a familiar task such
as integer multiplication. James A. Foster, who used preliminary versions of this
book at the University of Idaho, described his experience in the following terms:
"My first lecture began with a discussion of 'how do you multiply two numbers'.
This led to what constitutes the size of the input, and to an analysis of the classical
algorithm. I then showed multiplication a la russe, with which they were soon
taken. We then discussed the divide-and-conquer algorithm (Section 7.1). All of
this was done informally, but at the end of the class (a single lecture, mind you) they
Preliminaries
1.1 Introduction
In this book we shall be talking about algorithms and about algorithmics. This
introductory chapter begins by defining what we mean by these two words. We
illustrate this informal discussion by showing several ways to do a straightforward
multiplication. Even such an everyday task has hidden depths! We also take the
opportunity to explain why we think that the study of algorithms is both useful
and interesting.
Next we explain the notation we shall use throughout the book for describing
algorithms. The rest of the chapter consists essentially of reminders of things we
expect the reader to have seen already elsewhere. After a brief review of some
standard mathematical notation, we recall two useful proof techniques: proof by
contradiction and proof by mathematical induction. Next we list some results
concerning limits, the sums of series, elementary combinatorics and probability.
A reader familiar with these topics should read Sections 1.2 and 1.3, then sim-
ply glance through the rest of the chapter, skipping material that is already known.
Special attention should be paid to Section 1.6.4. Those whose basic maths and com-
puter science are rusty should at least read through the main results we present
to refresh their memories. Our presentation is succinct and informal, and is not
intended to take the place of courses on elementary analysis, calculus or program-
ming. Most of the results we give are needed later in the book; conversely, we try in
later chapters not to use results that go beyond the basics catalogued in this chapter.
In the first twelve chapters of this book, unless the context clearly indicates
the contrary, we assume that an algorithm is a set of rules for calculating the cor-
rect answer to some problem. Chapter 13, on the other hand, deals entirely with
approximate algorithms and heuristics.
Algorithmics can now be defined simply as the study of algorithms. When we
set out to solve a problem, there may be a choice of algorithms available. In this
case it is important to decide which one to use. Depending on our priorities and on
the limits of the equipment available to us, we may want to choose the algorithm
that takes the least time, or that uses least storage, or that is easiest to program, and
so on. The answer can depend on many factors, such as the numbers involved, the
way the problem is presented, or the speed and storage capacity of the available
computing equipment. It may be that none of the available algorithms is entirely
suitable so that we have to design a new algorithm of our own. Algorithmics is
the science that lets us evaluate the effect of these various external factors on the
available algorithms so that we can choose the one that best suits our particular
circumstances; it is also the science that tells us how to design a new algorithm for
a particular task.
Take elementary arithmetic as an example. Suppose you have to multiply two
positive integers using only pencil and paper. If you were raised in North America,
the chances are that you will multiply the multiplicand successively by each figure
of the multiplier taken from right to left, that you will write these intermediate
results one beneath the other shifting each line one place left, and that finally you
will add all these rows to obtain your answer. Thus to multiply 981 by 1234 you
would produce an arrangement like that of Figure 1.1(a). If, on the other hand, you
went to school in England, you would be more likely to work from left to right,
producing the arrangement shown in Figure 1.1(b).
981 981
1234 1234
3924 981
2943 1962
1962 2943
981 3924
1210554 1210554
(a) (b)
Figure 1.1. Multiplication (a) American (b) English
These two algorithms for multiplication are very similar: so similar, in fact, that
we shall refer to them as the "classic" multiplication algorithm, without worrying
precisely which one we mean. A third, different algorithm for doing the same thing
is illustrated in Figure 1.2.
Write the multiplicand and the multiplier side by side. Make two columns,
one under each operand, by repeating the following rule until the number in the
left-hand column is 1: divide the number in the left-hand column by 2, ignoring
any fractions, and double the number in the right-hand column by adding it to
itself. Next cross out each row where the number in the left-hand column is even,
and finally add up the numbers that remain in the right-hand column. The figure
illustrates how to multiply 981 by 1234. The answer obtained is 1210554.
Now to multiply 0981 by 1234 we first multiply the left half of the multiplicand
(09) by the left half of the multiplier (12), and write the result (108) shifted left as
many places as there are figures in the multiplier: four, in our example. Next we
multiply the left half of the multiplicand (09) by the right half of the multiplier
(34), and write the result (306) shifted left by half as many places as there are
figures in the multiplier: two, in this case. Thirdly we multiply the right half of the
multiplicand (81) by the left half of the multiplier (12), and write the result (972)
also shifted left by half as many places as there are figures in the multiplier; and
fourthly we multiply the right half of the multiplicand (81) by the right half of the
multiplier (34) and write the result (2754), not shifted at all. Finally we add up the
four intermediate results as shown in Figure 1.3 to obtain the answer 1210554.
If you have followed the working of the algorithm so far, you will have seen that we
have reduced the multiplication of two four-figure numbers to four multiplications
of two-figure numbers (09 x 12, 09 x 34, 81 x 12 and 81 x 34) together with a
certain number of shifts and a final addition. The trick is to observe that each
of these multiplications of two-figure numbers can be carried out in exactly the
same way, except that each multiplication of two-figure numbers requires four
multiplications of one-figure numbers, some shifts, and an addition. For instance,
Figure 1.4 shows how to multiply 09 by 12. We calculate 0 x 1 = 0, shifted left
two places; 0 x 2 = 0, shifted left one place; 9 x 1 = 9, shifted left one place; and
9 x 2 = 18, not shifted. Finally we add these intermediate results to obtain the
answer 108. Using these ideas the whole of our calculation can be carried out in
such a way that the multiplications involve only one-figure operands. (Although
we described Figure 1.3 before Figure 1.4, this was only to simplify the presentation.
Of course we have to do the four multiplications of two-figure numbers first, since
we use the values thus calculated when we do the multiplication of the four-figure
numbers.)
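The procedure just described translates readily into code. The sketch below is ours, not the book's (which deliberately avoids committing to a programming language): Python is used for convenience, the function name multiply and the explicit digits argument are our own, and both operands are assumed to have the same number of figures, a power of 2, as in the example.

    def multiply(x, y, digits):
        # Divide-and-conquer multiplication of two numbers of `digits` figures
        # (digits a power of 2), using four half-size sub-multiplications, as in
        # the 0981 x 1234 example worked out above.
        if digits == 1:
            return x * y                              # one-figure operands: multiply directly
        half = digits // 2
        shift = 10 ** half
        xl, xr = divmod(x, shift)                     # left and right halves of the multiplicand
        yl, yr = divmod(y, shift)                     # left and right halves of the multiplier
        return (multiply(xl, yl, half) * shift * shift   # shifted left `digits` places
                + multiply(xl, yr, half) * shift         # shifted left `digits`/2 places
                + multiply(xr, yl, half) * shift         # shifted left `digits`/2 places
                + multiply(xr, yr, half))                # not shifted at all

    print(multiply(981, 1234, 4))                     # 1210554, as in Figure 1.3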
This unusual algorithm is an example of the technique called "divide-and-
conquer", which we shall study in Chapter 7. If you think it unlikely that it could
outperform the classic algorithm, you are perfectly right. However we shall see in
Chapter 7 that it is possible to reduce the multiplication of two large numbers to
three, and not four, multiplications of numbers roughly half the size, together with
a certain number of shifts and additions. (If you are stimulated by challenges, try
to figure out how to do this!) With this improvement, the divide-and-conquer mul-
tiplication algorithm runs faster on a computer than any of the preceding methods,
provided the numbers to be multiplied are sufficiently large. (Still faster methods
are known for very large operands.) It is not absolutely necessary for the length
of the operands to be a power of two, nor that they have the same length. Prob-
lem 1.6 shows one case where the algorithm can be useful in practice, even when
the operands are relatively small, and even when we use four submultiplications
instead of three.
The point of all these examples is that, even in an area as simple as elemen-
tary arithmetic, there may be several algorithms available to us for carrying out
the necessary operations. One may appeal by its familiarity, a second because of
the elementary nature of the intermediate calculations involved, or a third by its
speed on a machine. It is by making a more formal study of the properties of
the algorithms-by using algorithmics, in other words-that we can make a wise
choice of the technique to use in any given situation. As we shall see, a good choice
can save both money and time; in some cases, it can make all the difference between
success and failure when solving a large, hard problem. The aim of our book is to
teach you how to make such choices.
function russe(m, n)
    result ← 0
    repeat
        if m is odd then result ← result + n
        m ← m ÷ 2        {integer division: discard any fraction}
        n ← n + n
    until m = 0
    return result
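As the preface points out, the book's algorithms must be transcribed into a real programming language before they can be run. Purely as an illustration, here is one possible transcription of russe into Python; only the name russe comes from the pseudocode above, the rest is our own rendering.

    def russe(m, n):
        # Multiplication a la russe: halve m and double n repeatedly, adding to the
        # result those values of n that stand opposite an odd value of m.
        result = 0
        while m >= 1:
            if m % 2 == 1:          # this row is not crossed out
                result += n
            m //= 2                 # halve, ignoring any fraction
            n += n                  # double
        return result

    assert russe(981, 1234) == 1210554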
1.4 Mathematical notation
This section reviews some mathematical notation that we shall use throughout the
book. Our review is succinct, as we expect the reader to be familiar with most
of it. Nevertheless, you are encouraged to read it at least summarily because we
introduce most of the symbols that will be used, and some of them (such as [i..j],
∀^∞, ∃^∞, lg, ⌊x⌋, ÷ and ℝ⁺) are not universally accepted.
An interval is a set of real numbers lying between two bounds. Let a and b be
two real numbers such that a ≤ b. The open interval (a, b) denotes
    {x ∈ ℝ | a < x < b},
and the semi-open intervals are
    (a, b] = {x ∈ ℝ | a < x ≤ b}
and
    [a, b) = {x ∈ ℝ | a ≤ x < b}.
Moreover, a = −∞ and b = +∞ are allowed with their obvious meaning provided
they fall on the open side of an interval.
An integer interval is a set of integers lying between two bounds. Let i and j
be two integers such that i ≤ j + 1. The integer interval [i..j] denotes
    {n ∈ ℤ | i ≤ n ≤ j}.
which is true of the odd integers and false of the even integers. There is also a
natural interpretation of Boolean formulas in terms of predicates. For instance,
one can define a predicate P: {true, false}³ → {true, false} by
1.4.5 Quantifiers
The symbols ∀ and ∃ are pronounced "for all" and "there exists", respectively.
To illustrate this, consider an arbitrary set X and a property P on X. We write
(∀x ∈ X) [P(x)] to mean "every x in X has property P". Similarly,
    (∃x ∈ X) [P(x)]
means "there exists at least one x in X that has property P". Finally, we write
(∃!x ∈ X) [P(x)] to mean "there exists exactly one x in X that has property P".
If X is the empty set, (∀x ∈ X) [P(x)] is always vacuously true - try to find a
counterexample if you disagree! - whereas (∃x ∈ X) [P(x)] is always trivially false.
Consider the following three concrete examples.
    (∀n ∈ ℕ) [∑ᵢ₌₁ⁿ i = n(n + 1)/2]
    (∃!n ∈ ℕ) [n ≥ 1 and ∑ᵢ₌₁ⁿ i = n²]
    (∃m, n ∈ ℕ) [m > 1, n > 1 and mn = 12573]
These examples state that the well-known formula for the sum of the first n integers
is always valid (see Section 1.7.2), that this sum is also equal to n2 for exactly one
positive value of n, and that 12573 is a composite integer, respectively.
An alternation of quantifiers may be used in a single expression. For instance,
    (∀n ∈ ℕ) (∃m ∈ ℕ) [m > n]
says that for every natural number, there exists another natural number larger
still. When using alternation of quantifiers, the order in which the quantifiers are
presented is important. For instance, the statement (∃m ∈ ℕ) (∀n ∈ ℕ) [m > n]
is obviously false: it would mean that there is an integer m that is larger than every
natural number (including m itself!).
Provided the set X is infinite, it is sometimes useful to say that not only is
there an x ∈ X such that property P(x) holds, but that there are infinitely many
of them. The appropriate quantifier in this case is ∃^∞. For instance, (∃^∞ n ∈ ℕ)
[n is prime]. Note that ∃^∞ is stronger than ∃ but weaker than ∀. Another useful
quantifier, stronger than ∃^∞ but still weaker than ∀, is ∀^∞, which is used when
a property holds in all cases except possibly for a finite number of exceptions.
For instance, (∀^∞ n ∈ ℕ) [if n is prime, then n is odd] means that prime numbers
are always odd, except possibly for a finite number of exceptions (in this case there
is exactly one exception: 2 is both prime and even).
When we are interested in properties of the natural numbers, there is an equiv-
alent definition for these quantifiers, and it is often better to think of them accord-
ingly. A property P of the natural numbers holds infinitely often if, no matter how
large m is, there is an n > m such that P(n) holds. Similarly, property P holds on
all natural numbers except possibly for a finite number of exceptions if there is an
integer m such that P(n) holds for all integers n > m. In the latter case, we say
that "property P holdsfor all sufficiently large integers". Formally,
whereas
The duality principle for quantifiers says that "it is not the case that property P
holds for all x E X if and only if there exists at least one x E X for which property
P does not hold". In other words,
Similarly,
1.4.6 Sums and products
The notation
     ∑   f(i)
    P(i)
denotes the sum of f (i) for all the integers i such that P(i) holds. This sum may
not be well-defined if it involves an infinite number of integers. We may also use
a mixed notation, such as
     n
     ∑   f(i)
    i=1
    P(i)
which denotes the sum of the values taken by f on those integers between 1 and n
for which property P holds. If there are no such integers, the sum is 0. For example,
     10
      ∑  i = 1 + 3 + 5 + 7 + 9 = 25.
     i=1
    i odd
pronounced "the product of f(i) as i goes from 1 to n". In the case n - 0, the
product is defined to be 1. This notation is generalized in the same way as the sum
notation.
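To make the conventions concrete, here is a small Python illustration (ours, not the book's) of a conditional sum and of the empty sum and product; Python's sum and math.prod happen to use the same neutral elements, 0 and 1.

    from math import prod

    # The sum of the odd integers between 1 and 10, as in the example above.
    assert sum(i for i in range(1, 11) if i % 2 == 1) == 25

    # An empty sum is 0 and an empty product is 1.
    assert sum(i for i in range(1, 1)) == 0
    assert prod(i for i in range(1, 1)) == 1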
1.4.7 Miscellaneous
If b ≠ 1 and x are strictly positive real numbers, then log_b x, pronounced "the
logarithm of x in base b", is defined as the unique real number y such that b^y = x.
For instance, log₁₀ 1000 = 3. Note that although b and x must be positive, there
is no such restriction on y. For instance, log₁₀ 0.001 = −3. When the base b is not
specified, we take it to be e = 2.7182818..., the base of the so-called natural loga-
rithm. (Some authors take the base to be 10 when it is not specified and denote the
natural logarithm by "In".) In algorithmics, the base most often used for logarithms
is 2, which deserves a notation of its own: lg x is short for log2 x. Although we
assume that the reader is familiar with logarithms, let us recall the most important
logarithmic identities:
    log_a(xy) = log_a x + log_a y,
    log_a(x^y) = y log_a x,   and
    log_b x = log_a x / log_a b.
Remember too that log log n is the logarithm of the logarithm of n, but log² n is
the square of the logarithm of n.
If x is a real number, ⌊x⌋ denotes the largest integer that is not larger than x,
called the floor of x. For instance, ⌊3½⌋ = 3. When x is positive, ⌊x⌋ is the integer
you obtain by discarding the fractional part of x if there is one. When x is negative
and not itself an integer, however, ⌊x⌋ is smaller than this by 1. For instance,
⌊−3½⌋ = −4. Similarly, we define the ceiling of x, denoted by ⌈x⌉, as the smallest
integer that is not smaller than x. Note that x − 1 < ⌊x⌋ ≤ x ≤ ⌈x⌉ < x + 1
for all x.
If m ≥ 0 and n > 0 are integers, m/n denotes as always the result of dividing
m by n, which is not necessarily an integer. For instance, 7/2 = 3½. We denote the
integer quotient by the symbol "÷", so that 7 ÷ 2 = 3. Formally, m ÷ n = ⌊m/n⌋. We also
use mod to denote the "modulo" operator defined by
    m mod n = m − n × (m ÷ n).
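The following throwaway Python check, not part of the text, verifies these conventions on the examples just given; for m ≥ 0 and n > 0, Python's // and % operators agree with ÷ and mod as defined here.

    import math

    # Floor and ceiling, including the negative case discussed above.
    assert math.floor(3.5) == 3 and math.floor(-3.5) == -4
    assert math.ceil(3.5) == 4

    # Integer quotient and modulo for m >= 0 and n > 0.
    m, n = 7, 2
    assert m // n == math.floor(m / n) == 3          # 7 "div" 2 = 3
    assert m % n == m - n * (m // n) == 1            # 7 mod 2 = 1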
Theorem 1.5.1 There are infinitely many prime numbers.
Proof Let P denote the set of all prime numbers. Assume for a contradiction that P is a
finite set. The set P is not empty since it contains at least the integer 2. Since P is
finite and nonempty, it makes sense to multiply all its elements. Let x denote that
product, and let y denote x + 1. Consider the smallest integer d that is larger than
1 and that is a divisor of y. Such an integer certainly exists since y is larger than 1
and we do not require that d be different from y. First note that d itself is prime, for
otherwise any proper divisor of d would also divide y and be smaller than d, which
would contradict the definition of d. (Did you notice that the previous sentence is
itself a proof by contradiction, nested in the larger scheme of things?) Therefore,
according to our assumption that P contains each and every prime, d belongs to P.
This shows that d is also a divisor of x since x is the product of a collection of
integers including d. We have reached the conclusion that d exactly divides both
x and y. But recall that y = x + 1. Therefore, we have obtained an integer d larger
than 1 that divides two consecutive integers x and y. This is clearly impossible: if
indeed d divides x, then the division of y by d will necessarily leave 1 as remainder.
The inescapable conclusion is that the original assumption was equally impossible.
But the original assumption was that the set P of all primes is finite, and therefore
its impossibility establishes that the set P is in fact infinite. ∎
For the constructively-minded reader (which every algorithmicist should be
at heart!), this proof of Euclid's can be turned into an algorithm - albeit not a very
efficient one-capable of finding a new prime given any finite set of primes.
function Newprime(P)
    x ← the product of the elements of P;  y ← x + 1
    d ← 1
    repeat d ← d + 1 until d divides y
    return d
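For illustration only, the same procedure can be written in Python as follows; the lower-case name and the use of math.prod are our own choices, and the book's pseudocode above remains the reference.

    from math import prod

    def newprime(P):
        # Given a nonempty finite set P of primes, return a prime that is not in P,
        # following Euclid's argument.
        y = prod(P) + 1              # one more than the product of the elements of P
        d = 2
        while y % d != 0:            # smallest divisor of y larger than 1
            d += 1
        return d                     # necessarily prime, and necessarily outside P

    print(newprime({2, 3, 5, 7}))    # 2*3*5*7 + 1 = 211, which is prime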
Euclid's proof establishes that the value returned by Newprime(P) is a prime
number that does not belong to P. But who needs Euclid when writing an algorithm
for this task? What about the following, much simpler algorithm?
prime. Naturally, this situation cannot occur because there is no such thing as "the
largest prime", but Euclid's proof is needed to establish this. In sum, DumpEuclid
does work, but the proof of its termination is not immediate. In contrast, the fact
that Newprime always terminates is obvious (in the worst case it will terminate
when d reaches the value y), but the fact that it returns a new prime requires
proof.
We have just seen that it is sometimes possible to turn a mathematical proof
into an algorithm. Unfortunately, this is not always the case when the proof is by
contradiction. We illustrate this with an elegant example.
Theorem 1.5.2 There exist two irrational numbers x and y such that x^y is
rational.
Proof Assume for a contradiction that x^y is always irrational when x and y are both
irrational. It is well known that √2 is irrational, so under this assumption z = √2^√2
is irrational, and applying the assumption once more, so is z^√2. But
    z^√2 = (√2^√2)^√2 = √2^(√2 × √2) = (√2)² = 2.
We have arrived at the conclusion that 2 is irrational, which is clearly false. We must
therefore conclude that our assumption was false: it must be possible to obtain a
rational number when raising an irrational to an irrational power. U
Now, how would you turn this proof into an algorithm? Clearly, the algorithm's
purpose should be to exhibit two irrationals x and y such that x^y is rational.
At first, you may be tempted to say that the algorithm should simply output x = z
(as defined in the proof) and y = √2 since it was proven above that z is irrational
and that z^√2 = 2. But beware! The "proof" that z is irrational depends on the false
assumption that we started with, and therefore this proof is not valid. (It is only
the proof that is not valid. It is in fact true that z is irrational, but this is difficult
to establish.) We must always be careful not to use later an intermediate result
"proved" in the middle of a proof by contradiction.
There is no direct way to extract the required pair (x, y) from the proof of the
theorem. The best you can do is to extract two pairs and claim with confidence
that one of them does the trick-but you will not know which. Such a proof is
called nonconstructive and is not unusual among indirect proofs. Although some
mathematicians do not accept nonconstructive proofs, most see them as perfectly
valid. In any case, we shall as much as possible refrain from using them in the
context of algorithmics.
(not counting the solution obtained by multiplying each of these numbers by 2).
Note that 422481⁴ is a 23-figure number.
Pell's equation provides an even more extreme case of compelling but incorrect
inductive reasoning. Consider the polynomial p(n) = 991n² + 1. The question is
whether there is a positive integer n such that p(n) is a perfect square. If you try
various values for n, you will find it increasingly tempting to assume inductively
that the answer is negative. But in fact a perfect square can be obtained with this
polynomial: the smallest solution is obtained when
many prime numbers (Theorem 1.5.1) and that multiplication a la russe is a correct
algorithm (Theorem 1.6.4) can be proved in a rigorous deductive manner, without
any need for experimental data. Inductive reasonings are to be banned from math-
ematics. Right? Wrong! In reality, mathematics is often very much an experimental
science. It is not unusual that a mathematician will discover a mathematical truth
by considering several special cases and inferring from them by induction a general
rule that seems plausible. For instance, if I notice that
    1³ = 1 = 1²
    1³ + 2³ = 9 = 3²
    1³ + 2³ + 3³ = 36 = 6²
    1³ + 2³ + 3³ + 4³ = 100 = 10²
    1³ + 2³ + 3³ + 4³ + 5³ = 225 = 15²,
I may begin to suspect that the sum of the cubes of the first n positive integers is
always a perfect square. It turns out in this case that inductive reasoning yields a
correct law. If I am even more perceptive, I may realize that this sum of cubes is
precisely the square of the sum of the first n positive integers; see Problem 1.21.
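A few lines of Python (ours, not the book's) gather the same kind of inductive evidence for many more values of n; of course, as the next paragraph stresses, no amount of such evidence amounts to a proof.

    # Check that the sum of the first n cubes equals the square of the sum of
    # the first n positive integers, for n up to 50.
    for n in range(1, 51):
        cubes = sum(i**3 for i in range(1, n + 1))
        total = sum(range(1, n + 1))
        assert cubes == total**2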
However, no matter how compelling the evidence becomes when more and
more values of n are tried, a general rule of this sort cannot be asserted on the
basis of inductive evidence only. The difference between mathematics and the in-
herently experimental sciences is that once a general mathematical law has been
discovered by induction, we may hope to prove it rigorously by applying the deduc-
tive approach. Nevertheless, induction has its place in the mathematical process.
Otherwise, how could you hope to prove rigorously a theorem whose statement
has not even been formulated? To sum up, induction is necessary for formulating
conjectures and deduction is equally necessary for proving them or sometimes dis-
proving them. Neither technique can take the place of the other. Deduction alone
is sufficient for "dead" or frozen mathematics, such as Euclid's Elements (perhaps
history's highest monument to deductive mathematics, although much of its mate-
rial was no doubt discovered by inductive reasoning). But induction is required to
keep mathematics alive. As Pólya once said, "mathematics presented with rigor is
a systematic deductive science but mathematics in the making is an experimental
inductive science".
Finally, the punch line of this digression: one of the most useful deductive
techniques available in mathematics has the misfortune to be called mathematical
induction. This terminology is confusing, but we must live with it.
function sq(n)
    if n = 0 then return 0
    else return 2n + sq(n − 1) − 1
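A direct Python transcription of this pseudocode (our own, for experimentation only) lets us test the claim that follows on many values of n, although testing is of course no substitute for the proof given below.

    def sq(n):
        # Transcription of the pseudocode above.
        if n == 0:
            return 0
        return 2 * n + sq(n - 1) - 1

    assert all(sq(n) == n * n for n in range(200))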
By induction, it seems obvious that sq(n)= n2 for all n > 0, but how could this
be proved rigorously? Is it even true? Let us say that the algorithm succeeds on
integer n whenever sq(n) = n², and that it fails otherwise.
Consider any integer n > 1 and assume for the moment that the algorithm
succeeds on n − 1. By definition of the algorithm, sq(n) = 2n + sq(n − 1) − 1. By our
assumption sq(n − 1) = (n − 1)². Therefore
    sq(n) = 2n + (n − 1)² − 1 = 2n + (n² − 2n + 1) − 1 = n².
What have we achieved? We have proved that the algorithm must succeed on n
whenever it succeeds on n -1, provided n > 1. In addition, it clearly succeeds
on n = 0.
The principle of mathematical induction, described below, allows us to infer from
the above that the algorithm succeeds on all n > 0. There are two ways of under-
standing why this conclusion follows: constructively and by contradiction. Con-
sider any positive integer m on which you wish to prove that the algorithm suc-
ceeds. For the sake of argument, assume that m ≥ 5 (smaller values can be proved
easily). We know already that the algorithm succeeds on 4. From the general rule
that it must succeed on n whenever it succeeds on n - 1 for n > 1, we infer that
it also succeeds on 5. Applying this rule again shows that the algorithm succeeds
on 6 as well. Since it succeeds on 6, it must also succeed on 7, and so on. This
reasoning continues as many times as necessary to arrive at the conclusion that the
algorithm succeeds on m - 1. Finally, since it succeeds on m -1, it must succeed
on m as well. It is clear that we could carry out this reasoning explicitly-with no
need for "and so on"-for any fixed positive value of m.
If we prefer a single proof that works for all n > 0 and that does not contain
and-so-on's, we must accept the axiom of the least integer, which says that every
nonempty set of positive integers contains a smallest element; see Problem 1.24.
The axiom allows us to use this smallest number as a foundation from which to
prove theorems.
Now, to prove the correctness of the algorithm, assume for a contradiction that
there exists at least one positive integer on which the algorithm fails. Let n stand
for the smallest such integer, which exists by the axiom of the least integer. Firstly,
n must be greater than or equal to 5 since we have already verified that sq(i) = i²
when i = 1, 2, 3 or 4. Secondly, the algorithm must succeed on n - 1 for otherwise
n would not be the smallest positive integer on which it fails. But this implies
by our general rule that the algorithm also succeeds on n, which contradicts our
assumption about the choice of n. Therefore such an n cannot exist, which means
that the algorithm succeeds on every positive integer. Since we also know that the
algorithm succeeds on 0, we conclude that sq(n) = n² for all integers n ≥ 0.
We now spell out a simple version of the principle of mathematical induction,
which is sufficient in many cases. A more powerful version of the principle is
given in Section 1.6.3. Consider any property P of the integers. For instance, P(n)
could be "sq(n)= n2 If, or "the sum of the cubes of the first n integers is equal to
the square of the sum of those integers", or "n3 < 2n " The first two properties
20 Preliminaries Chapter 1
hold for every n > 0, whereas the third holds provided n > 10. Consider also an
integer a, known as the basis. If P(a) holds, and if P(n) follows from P(n − 1) whenever n > a,
then property P(n) holds for all integers n ≥ a. Using this principle, we could
assert that sq(n) = n² for all n ≥ 0, immediately after showing that sq(0) = 0 = 0²
and that sq(n) = n² whenever sq(n − 1) = (n − 1)² and n ≥ 1.
Our first example of mathematical induction showed how it can be used to
prove rigorously the correctness of an algorithm. As a second example, let us see
how proofs by mathematical induction can sometimes be turned into algorithms.
This example is also instructive as it makes explicit the proper way to write a proof
by mathematical induction. The discussion that follows stresses the important
points common to all such proofs.
Consider the following tiling problem. You are given a board divided into
equal squares. There are m squares in each row and m squares in each column,
where m is a power of 2. One arbitrary square of the board is distinguished as
special; see Figure 1.5(a).
You are also given a supply of tiles, each of which looks like a 2 x 2 board with one
square removed, as illustrated in Figure 1.5(b). Your puzzle is to cover the board
with these tiles so that each square is covered exactly once, with the exception of
the special square, which is not covered at all. Such a covering is called a tiling.
Figure 1.5(d) gives a solution to the instance given in Figure 1.5(a).
Proof The proof is by mathematical induction on the integer n such that m = 2ⁿ.
• Basis: The case n = 0 is trivially satisfied. Here m = 1, and the 1 x 1 "board"
is a single square, which is necessarily special. Such a board is tiled by doing
nothing! (If you do not like this argument, check the next simplest case: if n = 1,
then m = 2 and any 2 x 2 board from which you remove one square looks
exactly like a tile by definition.)
• Induction step: Consider any n ≥ 1. Let m = 2ⁿ. Assume the induction hypoth-
esis that the theorem is true for 2ⁿ⁻¹ × 2ⁿ⁻¹ boards. Consider an m × m board,
containing one arbitrarily placed special square. Divide the board into 4 equal
sub-boards by halving it horizontally and vertically. The original special square
now belongs to exactly one of the sub-boards. Place one tile in the middle of the
original board so as to cover exactly one square of each of the other three sub-
boards; see Figure 1.5(c). Call each of the three squares thus covered "special"
for the corresponding sub-board. We are left with four 2ⁿ⁻¹ × 2ⁿ⁻¹ sub-boards,
each containing one special square. By our induction hypothesis, each of these
sub-boards can be tiled. The final solution is obtained by combining the tilings
of the sub-boards together with the tile placed in the middle of the original
board.
Since the theorem is true when m = 2⁰, and since its truth for m = 2ⁿ follows
from its assumed truth for m = 2ⁿ⁻¹ for all n ≥ 1, it follows from the principle of
mathematical induction that the theorem is true for all m provided m is a power
of 2. ∎
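The induction step is effectively an algorithm, and it can be turned into a recursive tiling procedure. The Python sketch below is our own reading of the proof, not code from the book: the board is an m × m array, the special square is left marked 0, and every other square receives the number of the tile that covers it.

    from itertools import count

    def tile_board(m, sr, sc):
        # Tile an m x m board (m a power of 2), leaving only square (sr, sc) uncovered.
        board = [[0] * m for _ in range(m)]
        labels = count(1)                          # successive tile numbers

        def tile(top, left, size, r_spec, c_spec):
            if size == 1:
                return                             # a 1 x 1 board is its own special square
            t = next(labels)                       # the tile placed at the centre of this board
            half = size // 2
            sub_boards = []
            for dr in (0, 1):
                for dc in (0, 1):
                    r0, c0 = top + dr * half, left + dc * half
                    if r0 <= r_spec < r0 + half and c0 <= c_spec < c0 + half:
                        sub_boards.append((r0, c0, r_spec, c_spec))   # holds the true special square
                    else:
                        # cover this sub-board's square nearest the centre with tile t
                        r = r0 + (half - 1 if dr == 0 else 0)
                        c = c0 + (half - 1 if dc == 0 else 0)
                        board[r][c] = t
                        sub_boards.append((r0, c0, r, c))
            for r0, c0, r, c in sub_boards:        # tile the four half-size sub-boards
                tile(r0, c0, half, r, c)

        tile(0, 0, m, sr, sc)
        return board

    for row in tile_board(8, 2, 5):
        print(row)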
The basis step is followed by the induction step, which is usually more sub-
stantial. This should start with "consider any n > a" (or equivalently "consider
any n ≥ a + 1"). It should continue with an explicit statement of the induction hy-
pothesis, which essentially states that we assume P(n -1) to hold. At that point,
it remains to prove that we can infer that P(n) holds assuming the induction hy-
pothesis. Finally, an additional sentence such as the one at the end of the proof
of Theorem 1.6.1 can be inserted to conclude the reasoning, but this is generally
unnecessary.
Concerning the induction hypothesis, it is important to understand that we as-
sume that P(n − 1) holds on a provisional basis; we do not really know that it holds
until the theorem has been proved. In other words, the point of the induction step
is to prove that the truth of P(n) would follow logically from that of P(n -1), re-
gardless of whether or not P (n -1) actually holds. If in fact P (n -1) does not hold,
the induction step does not allow us to conclude anything about the truth of P (n).
For instance, consider the statement "n³ < 2ⁿ", which we shall denote P(n).
For positive integer n, it is easy to show that n³ < 2 × (n − 1)³ if and only if n ≥ 5.
Consider any n ≥ 5 and provisionally assume that P(n − 1) holds. Now
    n³ < 2 × (n − 1)³        because n ≥ 5
       < 2 × 2ⁿ⁻¹            by the assumption that P(n − 1) holds
       = 2ⁿ.
Thus we see that P(n) follows logically from P(n − 1) whenever n ≥ 5. Never-
theless P(4) does not hold (it would say 4³ < 2⁴, which is 64 < 16) and therefore
nothing can be inferred concerning the truth of P(5). By trial and error, we find
however that P(10) does hold (10³ = 1000 < 2¹⁰ = 1024). Therefore, it is legitimate
to infer that P(11) holds as well, and from the truth of P(11) it follows that P(12)
holds also, and so on. By the principle of mathematical induction, since P(10)
holds and since P(n) follows from P(n − 1) whenever n ≥ 5, we conclude that
n³ < 2ⁿ is true for all n ≥ 10. It is instructive to note that P(n) holds also for n = 0
and n = 1, but that we cannot use these points as the basis of the mathematical
induction because the induction step does not apply for such small values of n.
It may happen that the property to be proved is not concerned with the set of
all integers not smaller than a given basis. Our tiling puzzle, for instance, concerns
only the set of integers that are powers of 2. Sometimes, the property does not
concern integers at all. For instance, it is not unusual in algorithmics to wish to
prove a property of graphs. (It could even be said that our tiling problem is not
really concerned with integers, but rather with boards and tiles, but that would be
hairsplitting.) In such cases, if simple mathematical induction is to be used, the
property to be proved should first be transformed into a property of the set of all
integers not smaller than some basis point. (An alternative approach is given in
Section 1.6.3.) In our tiling example, we proved that P(m) holds for all powers
of 2 by proving that Q(n) holds for all n ≥ 0, where Q(n) is equivalent to P(2ⁿ).
When this transformation is necessary, it is customary to begin the proof (as we
did) with the words "The proof is by mathematical induction on such-and-such a
parameter". Thus we find proofs on the number of nodes in a graph, on the length
of a character string, on the depth of a tree, and so on.
Proof We shall prove that any set of horses contains only horses of a single colour. In par-
ticular, this will be true of the set of all horses. Let H be an arbitrary set of horses.
Let us prove by mathematical induction on the number n of horses in H that they
are all the same colour.
o Basis: The case n = 0 is trivially true: if there are no horses in H, then surely
they are all the same colour! (If you do not like this argument, check the next
simplest case: if n = 1, then there is only one horse in H, and again it is
vacuously clear that "they" are "all" the same colour.)
o Induction step: Consider any number n of horses in H. Call these horses
h₁, h₂, ..., hₙ. Assume the induction hypothesis that any set of n − 1 horses
contains only horses of a single colour (but of course the horses in one set
could a priori be a different colour from the horses in another). Let H₁ be the
set obtained by removing horse h₁ from H, and let H₂ be defined similarly;
see Figure 1.6.
    H₁:  h₂ h₃ h₄ h₅
    H₂:  h₁ h₃ h₄ h₅
Figure 1.6. Horses of the same colour (n = 5)
There are n − 1 horses in each of these two new sets. Therefore, the induction
hypothesis applies to them. In particular, all the horses in H₁ are of a single
colour, say c₁, and all the horses in H₂ are also of a single (possibly differ-
ent) colour, say c₂. But is it really possible for colour c₁ to be different from
colour c₂? Surely not, since horse hₙ belongs to both sets and therefore both
c₁ and c₂ must be the colour of that horse! Since all the horses in H belong
to either H₁ or H₂ (or both), we conclude that they are all the same colour
c = c₁ = c₂. This completes the induction step and the proof by mathematical
induction. ∎
Before you continue, figure out the fallacy in the above "proof". If you think
the problem is that our induction hypothesis ("any set of n - 1 horses must contain
only horses of a single colour") was absurd, think again!
Solution: The problem is that "hₙ belongs to both sets" is not true for n = 2
since h₂ does not belong to H₂! Our reasoning was impeccable for the basis cases
n = 0 and n = 1. Moreover, it is true that our theorem follows for sets of n horses
assuming that it is true for n − 1, but only when n ≥ 3. We can go from 2 to 3,
from 3 to 4, and so on, but not from 1 to 2. Since the basis cases contain only 0
and 1, and since we are not allowed to go from 1 to 2, the induction step cannot get
started. This small missing link in the proof is enough to invalidate it completely.
We encountered a similar situation when we proved that n³ < 2ⁿ: the induction
step did not apply for n < 5, and thus the fact that the statement is true for n = 0
and n = 1 was irrelevant. The important difference was that n³ < 2ⁿ is true for
n = 10, and therefore also for all larger values of n.
Proof The proof is by generalized mathematical induction. In this case, there is no need
for a basis.
o Induction step: Consider any composite integer n ≥ 4. (Note that 4 is the small-
est positive composite integer, hence it would make no sense to consider smaller
values of n.) Assume the induction hypothesis that any positive composite in-
teger smaller than n can be expressed as a product of prime numbers. (In the
In either case, this completes the proof of the induction step and thus of the
theorem. ∎
Until now, the induction hypothesis was always concerned with a finite set
of instances (exactly one for simple mathematical induction, usually many but
sometimes none for generalized mathematical induction). In our final example
of proof by generalized mathematical induction, the induction hypothesis covers
an infinity of cases even when proving the induction step on a finite instance!
This time, we shall prove that multiplication a la russe correctly multiplies any
pair of positive integers. The key observation is that the tableau produced when
multiplying 490 by 2468 is almost identical to Figure 1.2, which was used to multiply
981 by 1234. The only differences are that the first line is missing when multiplying
490 by 2468 and that consequently the term 1234 found in that first line is not added
into the final result; see Figure 1.7. What is the relationship between instances
(981, 1234) and (490, 2468)? Of course, it is that 490 = 981 ÷ 2 and 2468 = 2 × 1234.
defined below. The second example shows how the technique can be useful in the
analysis of algorithms.
The sequence named for Fibonacci, an Italian mathematician of the twelfth
century, is traditionally introduced in terms of rabbits (although this time not out
of a hat). This is how Fibonacci himself introduced it in his Liber abaci, published
in 1202. Suppose that every month a breeding pair of rabbits produce a pair of
offspring. The offspring will in their turn start breeding two months later, and
so on. Thus if you buy a pair of baby rabbits in month 1, you will still have just
one pair in month 2. In month 3 they will start breeding, so you now have two
pairs; in month 4 they will produce a second pair of offspring, so you now have
three pairs; in month 5 both they and their first pair of offspring will produce baby
rabbits, so you now have five pairs; and so on. If no rabbits ever die, the number
of pairs you have each month will be given by the terms of the Fibonacci sequence,
defined more formally by the following recurrence:
    f₀ = 0;  f₁ = 1;  and
    fₙ = fₙ₋₁ + fₙ₋₂   for n ≥ 2.
The sequence begins 0, 1, 1, 2, 3, 5, 8, 13, 21, 34 ... It has numerous applications
in computer science, in mathematics, and in the theory of games. De Moivre
obtained the following formula, which is easy to prove by mathematical induction
(see Problem 1.27):
    fₙ = [φⁿ − (−1/φ)ⁿ] / √5,
where φ = (1 + √5)/2 is the golden ratio. Since 0 < φ⁻¹ < 1, the term (−1/φ)ⁿ can
be neglected when n is large. Hence the value of fₙ is roughly φⁿ/√5, which is
exponential in n.
But where does de Moivre's formula come from? In Section 4.7 we shall see a
general technique for solving Fibonacci-like recurrences. In the meantime, assume
you do not know any such techniques, nor do you know de Moivre's formula,
yet you would like to have an idea of the behaviour of the Fibonacci sequence.
If you compute the sequence for a while, you soon discover that it grows quite
rapidly (f₁₀₀ is a 21-figure number). Thus, the conjecture "the Fibonacci sequence
grows exponentially fast" is reasonable. How would you prove it? The difficulty is
that this conjecture is too vague to be proved directly by mathematical induction:
remember that it is often easier to prove a stronger theorem than a weaker one.
Let us therefore guess that there exists a real number x > 1 such that fₙ ≥ xⁿ for
each sufficiently large integer n. (This statement could not possibly be true for
every positive integer n since it obviously fails when n ≤ 2.) In symbols,
    Conjecture: (∃x > 1) (∀^∞ n ∈ ℕ) [fₙ ≥ xⁿ].
There are two unknowns in the theorem we wish to prove: the value of x and
the precise meaning of "for each sufficiently large". Let us not worry about the latter
for the time being. Let P_x(n) stand for "fₙ ≥ xⁿ". Consider any sufficiently large
integer n. The approach by constructive induction consists of asking ourselves for
which values of x P_x(n) follows from the partially specified induction hypothesis that
P_x(m) holds for each integer m that is less than n but that is still sufficiently large.
Using the definition of the Fibonacci sequence and this hypothesis, and provided
n - 1 and n - 2 are also "sufficiently large",
    fₙ = fₙ₋₁ + fₙ₋₂ ≥ xⁿ⁻¹ + xⁿ⁻² = (x⁻¹ + x⁻²) xⁿ.
To conclude that fₙ ≥ xⁿ, we need x⁻¹ + x⁻² ≥ 1, or equivalently x² − x − 1 ≤ 0.
By elementary algebra, since we are only interested in the case x > 1, solving this
quadratic equation implies that 1 < x ≤ φ = (1 + √5)/2.
We have established that P_x(n) follows from P_x(n − 1) and P_x(n − 2) provided
1 < x ≤ φ. This corresponds to proving the induction step in a proof by mathe-
matical induction. To apply the principle of mathematical induction and conclude
that the Fibonacci sequence grows exponentially fast, we must also take care of the
basis. In this case, because the truth of P_x(n) depends only on that of P_x(n − 1)
and P_x(n − 2), it is sufficient to verify that property P_x holds on two consecutive
positive integers to assert that it holds from that point on.
It turns out that there are no integers n such that fₙ ≥ φⁿ. However, finding
two consecutive integers on which property P_x holds is easy for any x strictly
smaller than φ. For instance, both P_x(11) and P_x(12) hold when x = 3/2. Therefore,
fₙ ≥ (3/2)ⁿ for all n ≥ 11. This completes the proof that the Fibonacci sequence grows
at least exponentially. The same process can be used to prove that it grows no faster
than exponentially: fₙ ≤ yⁿ for every positive integer n provided y ≥ φ. Here
again, the condition on y is not God-given: it is obtained by constructive induction
when trying to find constraints on y that make the induction step go through.
Putting those observations together, we conclude that fₙ grows exponentially;
more precisely, it grows like a power of a number close to φ. The remarkable thing
is that we can reach this conclusion with no need for an explicit formula such as
de Moivre's.
Our second example of constructive induction concerns the analysis of the
obvious algorithm for computing the Fibonacci sequence.
function Fibonacci(n)
    if n < 2 then return n
    else return Fibonacci(n − 1) + Fibonacci(n − 2)    (*)
Let g(n) stand for the number of times instruction (*) is performed when
Fibonacci(n) is called (counting the instructions performed in recursive calls). This
function is interesting because g(n) gives a bound on the time required by a call
on Fibonacci(n).
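If you want to watch g(n) grow, the recursive algorithm can be instrumented to count how often the marked instruction is executed. The snippet below is a Python sketch of ours, not the book's code.

    def fibonacci(n, counter):
        # The recursive algorithm above, with a counter for instruction (*).
        if n < 2:
            return n
        counter[0] += 1                     # instruction (*) executed once at this level
        return fibonacci(n - 1, counter) + fibonacci(n - 2, counter)

    for n in range(10):
        counter = [0]
        fibonacci(n, counter)
        print(n, counter[0])                # g(n) for n = 0..9: 0 0 1 2 4 7 12 20 33 54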
Clearly, g(0) = g(1) = 0. When n ≥ 2, instruction (*) is executed once at the
top level, and g(n - 1) and g(n - 2) times by the first and second recursive calls,
respectively. Therefore, g(n) = g(n − 1) + g(n − 2) + 1 whenever n ≥ 2.
This formula is similar to the recurrence that defines the Fibonacci sequence
itself. It is therefore reasonable to conjecture the existence of positive real constants
a and b such that afₙ ≤ g(n) ≤ bfₙ for each sufficiently large integer n. Using
constructive induction, it is straightforward to find that afₙ ≤ g(n) holds for each
sufficiently large n provided it holds on two consecutive integers, regardless of the
value of a. For instance, taking a = 1, fₙ ≤ g(n) holds for all n ≥ 2.
However when we try to prove the other part of our conjecture, namely that
there exists a b such that g(n) ≤ bfₙ for each sufficiently large n, we run into
trouble. To see what happens, let P_b(n) stand for "g(n) ≤ bfₙ", and consider
any sufficiently large integer n (to be made precise later). We wish to determine
conditions on the value of b that make P_b(n) follow from the hypothesis that
P_b(m) holds for each sufficiently large m < n. Using the definition of the Fibonacci
sequence and this partially specified induction hypothesis, and provided n - 1 and
n - 2 are also sufficiently large,
    g(n) = g(n − 1) + g(n − 2) + 1 ≤ bfₙ₋₁ + bfₙ₋₂ + 1 = bfₙ + 1,
where the last equality comes from fₙ = fₙ₋₁ + fₙ₋₂. Thus, we infer that
g(n) ≤ bfₙ + 1, but not that g(n) ≤ bfₙ. Regardless of the value of b, we can-
not make the induction step work!
Does this mean the original conjecture was false, or merely that construc-
tive induction is powerless to prove it? The answer is: neither. The trick is to
use constructive induction to prove there exist positive real constants b and c
such that g(n) ≤ bfₙ − c for each sufficiently large n. This may seem odd, since
g(n) ≤ bfₙ − c is a stronger statement than g(n) ≤ bfₙ, which we were unable to
prove. We may hope for success, however, on the ground that if the statement to
be proved is stronger, then so too is the induction hypothesis it allows us to use;
see the end of Section 1.6.1.
Consider any sufficiently large integer n. We must determine for which values
of b and c the truth of g(n) ≤ bfₙ − c follows from the partially specified induc-
tion hypothesis that g(m) ≤ bfₘ − c for each sufficiently large m < n. Using the
definition of the Fibonacci sequence and this hypothesis, and provided n − 1 and
n − 2 are also sufficiently large,
    g(n) = g(n − 1) + g(n − 2) + 1
         ≤ (bfₙ₋₁ − c) + (bfₙ₋₂ − c) + 1 = bfₙ − 2c + 1.
To conclude that g(n) ≤ bfₙ − c, it suffices that −2c + 1 ≤ −c, or equivalently
that c ≥ 1. We have thus established that the truth of our conjecture on any given
integer n follows from its assumed truth on the two previous integers provided
c ≥ 1, regardless of the value of b. Before we can claim the desired theorem, we
still need to determine values of b and c that make it work on two consecutive
integers. For instance, b = 2 and c = 1 make it work on n = 1 and n = 2, and
therefore g(n) ≤ 2fₙ − 1 for all n ≥ 1.
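The two bounds obtained by constructive induction are easy to check numerically from the recurrences alone; the short Python check below is ours, not the book's.

    # Verify f(n) <= g(n) <= 2*f(n) - 1 for 2 <= n < 40, computing both
    # sequences iteratively from their recurrences.
    f = [0, 1]
    g = [0, 0]
    for n in range(2, 40):
        f.append(f[-1] + f[-2])             # f(n) = f(n-1) + f(n-2)
        g.append(g[-1] + g[-2] + 1)         # g(n) = g(n-1) + g(n-2) + 1
    for n in range(2, 40):
        assert f[n] <= g[n] <= 2 * f[n] - 1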
The key idea of strengthening the incompletely specified statement to be proved
when constructive induction fails may again appear to be produced like a rabbit
out of a hat. Nevertheless, this idea comes very naturally with experience. To gain
such experience, work Problems 1.31 and 1.33. Unlike the Fibonacci examples,
which could have been handled easily by the techniques of Section 4.7, the cases
tackled in these problems are best handled by constructive induction.
1.7 Some reminders
1.7.1 Limits
Let f(n) be any function of n. We say that f (n) tends to a limit a as n tends to
infinity if f(n) is nearly equal to a when n is large. The following formal definition
makes this notion more precise.
In other words, however small the positive number δ, we can find a threshold n₀(δ)
corresponding to δ, such that f(n) differs from a by less than δ for all values of n
greater than or equal to n₀(δ). When f(n) tends to a limit a as n tends to infinity,
we write
    lim f(n) = a.
    n→∞
Once again this means that we can find a threshold n₀(A) corresponding to A,
such that f(n) is greater than A for all values of n greater than or equal to n₀(A).
We write
    lim f(n) = +∞.
    n→∞
A similar definition takes care of functions such as −n² that take increasingly large
negative values as n tends to infinity. Such functions are said to tend to minus
infinity.
Finally, when f(n) does not tend to a limit, nor to +∞ nor to −∞, we say that
f(n) oscillates as n tends to infinity. If it is possible to find a positive constant K such
that −K < f(n) < K for all values of n, then we say that f(n) oscillates finitely;
otherwise f(n) oscillates infinitely. For example, the function (−1)ⁿ oscillates
finitely; the function (−1)ⁿ n oscillates infinitely.
The following propositions state some general properties of limits.
Proposition 1.7.3 If two functions f(n) and g(n) tend to limits a and b respec-
tively as n tends to infinity, then f(n) + g(n) tends to the limit a + b.
Proposition 1.7.4 If two functions f(n) and g(n) tend to limits a and b respec-
tively as n tends to infinity, then f(n)g(n) tends to the limit ab.
Both these propositions may be extended to the sum or product of any finite number
of functions of n. An important particular case of Proposition 1.7.4 is when g (n)
is constant. The proposition then states that if the limit of f(n) is a, then the
limit of cf (n) is ca, where c is any constant. It is perfectly possible for either
f(n)+g(n) or f(n)g(n) to tend to a limit even though neither f(n) nor g(n)
does so; see Problem 1.34. Finally the following proposition deals with division.
Proposition 1.7.5 If two functions f(n) and g(n) tend to limits a and b respec-
tively as n tends to infinity, and b is not zero, then f(n)/g(n) tends to the limit
a/b.
These propositions, although simple, are surprisingly powerful. For instance, sup-
pose we want to know the behaviour as n tends to infinity of the most general
rational function of n, namely
    S(n) = (a₀n^p + a₁n^(p−1) + ··· + a_p) / (b₀n^q + b₁n^(q−1) + ··· + b_q),
where neither a₀ nor b₀ is zero. Writing S(n) in the form
    S(n) = n^(p−q) { (a₀ + a₁/n + ··· + a_p/n^p) / (b₀ + b₁/n + ··· + b_q/n^q) }
and applying the above propositions, it is easy to see that the function in braces
tends to the limit a₀/b₀ as n tends to infinity. Furthermore n^(p−q) tends to the limit 0
if p < q; n^(p−q) = 1 and therefore n^(p−q) tends to the limit 1 if p = q; and n^(p−q) tends
to infinity if p > q. Hence
lim S(n)= O when p < q,
n-.0
lim S(n)= ao/bo when p = q,
and S(n) tends to plus or minus infinity when p > q, depending on the sign of
ao/ bo.
or alternatively that both these limits are infinite. Suppose further that the domains
of f and g can be extended to some real interval [n₀, +∞) in such a way that (a) the
corresponding new functions f̂ and ĝ are differentiable on this interval, and also
that (b) ĝ′(x), the derivative of ĝ(x), is never zero for x ∈ [n₀, +∞); then
    lim f(n)/g(n) = lim f̂′(x)/ĝ′(x).
    n→∞             x→∞
For a simple example, suppose f(n) = log n and g(n) = n^a, where a > 0 is an
arbitrary positive constant. Now both f(n) and g(n) tend to infinity as n tends
to infinity, so we cannot use Proposition 1.7.5. However if we extend f(n) to
f̂(x) = log x and g(n) to ĝ(x) = x^a, de l'Hôpital's rule allows us to conclude that
    lim (log n)/n^a = lim (1/x)/(a x^(a−1)) = lim 1/(a x^a) = 0.
    n→∞              x→∞                     x→∞
Proposition 1.7.8 If two functions f(n) and g(n) tend to limits a and b respec-
tively as n tends to infinity, and if f(n) ≤ g(n) for all sufficiently large n, then
a ≤ b.
It is often convenient to change the notation slightly, and to write this equation in
the form
s_n = u_1 + u_2 + ··· + u_n,
or simply
s_n = Σ_{i=1}^{n} u_i,
which we read as "the sum of u_i as i goes from 1 to n". Now if s_n tends to a limit s
when n tends to infinity, we have
s = lim_{n→∞} Σ_{i=1}^{n} u_i,
which we write as
s = Σ_{i=1}^{∞} u_i,
or as
s = u_1 + u_2 + ···
where the dots show that the series is continued indefinitely. In this case we say
that the series is convergent, and we call s the sum of the series.
If on the other hand s_n does not tend to a limit, but tends to +∞ or to −∞,
then we say that the series diverges, to +∞ or −∞ as the case may be. Finally if s_n
does not tend to a limit, nor to +∞ or −∞, then we say that the series oscillates
(finitely or infinitely, as the case may be). It is evident that a series with no negative
terms must either converge, or else diverge to +∞: it cannot oscillate.
Two particularly simple kinds of series are arithmetic series and geometric series.
In an arithmetic series the difference between successive terms is constant, so we
may represent the first n terms of the series as
a, a + d, a + 2d, ..., a + (n − 1)d,
where a, the first term, and d, the difference between successive terms, are suitable
constants. In a geometric series, the ratio of successive terms is constant, so that
here the first n terms of the series may be represented as
a, ar, ar², ..., ar^{n-1}.
Proposition 1.7.9 (Arithmetic series) Let s_n be the sum of the first n terms of
the arithmetic series a, a + d, a + 2d, ... Then s_n = an + n(n − 1)d/2.
The series diverges unless a = d = 0, in which case s_n = 0 for all n. The proposition
is easily proved. First write out the sum, once in the order given and once with the
terms reversed, and add the two versions term by term to obtain
2s_n = (2a + (n − 1)d) + (2a + (n − 1)d) + ··· + (2a + (n − 1)d),
where there are n equal terms on the right. The result follows immediately.
Proposition 1.7.10 (Geometric series) Let s_n be the sum of the first n terms of
the geometric series a, ar, ar², ... Then s_n = a(1 − r^n)/(1 − r), except in the
special case in which r = 1, when s_n = an.
To prove this, write
s_n = a(1 + r + r² + ··· + r^{n-1}),
so that
r s_n = a(r + r² + r³ + ··· + r^n).
Subtracting the second equation from the first, we obtain immediately
(1 − r)s_n = a(1 − r^n).
In the general case (that is, when r ≠ 1) the sum s_n of a geometric series tends to a
limit if and only if r^n does so, that is, if and only if −1 < r < 1, in which case the
sum of the infinite series is a/(1 − r).
A similar technique can be used to obtain a useful result concerning yet another
series. If we write
s_n = r + 2r² + 3r³ + ··· + (n − 1)r^{n-1},
we have that
r s_n = r² + 2r³ + 3r⁴ + ··· + (n − 1)r^n.
Subtracting the second equation from the first, we obtain
(1 − r)s_n = r + r² + r³ + ··· + r^{n-1} − (n − 1)r^n
           = r(1 + r + r² + ··· + r^{n-2}) − (n − 1)r^n
           = r(1 − r^{n-1})/(1 − r) − (n − 1)r^n.
The right-hand side tends to a limit as n tends to infinity if −1 < r < 1 (use
Proposition 1.7.6), giving us the following result: when −1 < r < 1, the series
r + 2r² + 3r³ + ··· converges, and its sum is r/(1 − r)².
Using a similar technique one can also show that
Σ_{i=1}^{n} i(i + 1)···(i + k) = n(n + 1)···(n + k + 1)/(k + 2).
Since i(i + 1)···(i + r − 1) = i^r + p(i), where p(i) is a polynomial of degree not more
than r − 1, it follows that
Σ_{i=1}^{n} i^r = n(n + 1)···(n + r)/(r + 1) + p'(n)
              = n^{r+1}/(r + 1) + p''(n),
where p'(n) and p''(n) are polynomials of degree not more than r. We leave the reader
to fill in the details of the argument.
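Readers who would like a quick numerical confirmation of these results can run the following Python sketch (the particular values of r, n and k are arbitrary); it checks the limit of r + 2r² + 3r³ + ··· against r/(1 − r)² and the closed form for Σ_{i=1}^{n} i(i + 1)···(i + k):

# 1) For -1 < r < 1, the partial sums of r + 2r^2 + 3r^3 + ... approach r/(1-r)^2.
r = 0.3
partial = sum(i * r**i for i in range(1, 200))
print(partial, r / (1 - r) ** 2)

# 2) sum_{i=1}^{n} i(i+1)...(i+k) = n(n+1)...(n+k+1)/(k+2)
def rising(start, terms):
    # product start * (start+1) * ... * (start+terms-1)
    prod = 1
    for j in range(terms):
        prod *= start + j
    return prod

n, k = 10, 3
lhs = sum(rising(i, k + 1) for i in range(1, n + 1))   # i(i+1)...(i+k) has k+1 factors
rhs = rising(n, k + 2) // (k + 2)                      # n(n+1)...(n+k+1) has k+2 factors
print(lhs, rhs)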
Finally we consider briefly series of the form 1^{-r}, 2^{-r}, 3^{-r}, ..., where r is a positive
integer. When r = 1 we obtain the series 1, 1/2, 1/3, ... known as the harmonic
series. It is easy to show that this series diverges. The following argument gives
us a better idea of its behaviour.
Consider Figure 1.8. The area under the "staircase" gives H_n, the sum of the first
n terms of the harmonic series; the area under the lower curve, y = 1/(x + 1), is less
than this sum, while the area under the upper curve, which is y = 1 for x < 1 and y = 1/x
thereafter, is greater than or equal to the sum. Hence
∫_0^n dx/(x + 1) ≤ H_n ≤ 1 + ∫_1^n dx/x,
that is, log(n + 1) ≤ H_n ≤ 1 + log n. In fact H_n is approximately log n + γ,
where γ ≈ 0.57721... is Euler's constant, but the proof of this is beyond the scope
of our book.
Figure 1.8. Summing the harmonic series
It is easy to show that series of the form 1, 1/2^r, 1/3^r, ... with r > 1 are all con-
vergent, and that the sum of such a series is less than r/(r − 1); see Problem 1.39.
For example
lim_{n→∞} (1 + 1/2² + 1/3² + ··· + 1/n²) = π²/6 ≈ 1.64493...
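Both facts are easy to observe numerically. The following Python fragment (the values of n are chosen arbitrarily) compares H_n with log(n + 1), 1 + log n and log n + γ, and watches the partial sums of Σ 1/i² approach π²/6:

import math

gamma = 0.5772156649015329   # Euler's constant
for n in [10, 1000, 100000]:
    h = sum(1 / i for i in range(1, n + 1))
    print(n, math.log(n + 1), h, 1 + math.log(n), math.log(n) + gamma)

s = sum(1 / i**2 for i in range(1, 10**6))
print(s, math.pi**2 / 6)   # both close to 1.64493...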
suppose they are labelled a, b, and so on, with each object having a distinct label.
From now on we shall simply refer to a, for example, when we mean "the object
labelled a".
Our first definition concerns the number of ways we can arrange these n objects
in order.
For example, if we have four objects a, b, c and d, we can arrange them in order
in 24 different ways:
abcd abdc acbd acdb adbc adcb
bacd badc bcad bcda bdac bdca
cabd cadb cbad cbda cdab cdba
dabc dacb dbac dbca dcab dcba
The first object in the permutation may be chosen in n ways; once the first object
has been chosen, the second may be chosen in n − 1 different ways; once the first
and second have been chosen, the third may be chosen in n − 2 different ways, and
so on. There are two possibilities when we choose the last but one object, and there is
only one way of choosing the last object. The total number of permutations of n
objects is therefore
n(n − 1)(n − 2) ··· 2 · 1 = n!
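For instance, the 24 orderings of four objects listed above can be generated mechanically; this small Python example uses the standard itertools module:

from itertools import permutations
from math import factorial

objects = "abcd"
perms = ["".join(p) for p in permutations(objects)]
print(len(perms), factorial(len(objects)))   # 24 24
print(perms[:6])                             # abcd abdc acbd acdb adbc adcb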
Next we consider the number of ways of choosing a certain number of these
objects, without regard to the order in which we make our choices.
For example, if we have five objects a, b, c, d and e, we can choose three of them
in 10 different ways if order is not taken into account:
abc abd abe acd ace
ade bcd bce bde cde
A choice such as eba does not appear in this list, since it is the same as abe when the
order of the objects is disregarded.
When we make our choice of r objects from among n, there are n ways of
making the first choice. When the first object has been chosen, there remain n − 1
ways of choosing the second, and so on. When we choose the last of the r objects
we want, there remain n − r + 1 possibilities. Hence there are
n(n − 1)(n − 2)···(n − r + 1)
ways of choosing r objects from n when order is taken into account. However when
we do not take order into account, we can permute the r chosen objects any way
we like, and it still counts as the same combination. In the example above, for
instance, the six ordered choices abc, acb, bac, bca, cab and cba all count as the same
combination. Since there are r! ways of permuting r objects, the number of ways
of choosing r objects from n when order is not taken into account is
(n r) = n(n − 1)(n − 2)···(n − r + 1) / r!        (1.1)
Several alternative notations are used for the number of combinations of n objects
taken r at a time: among others, you may encounter nCr and C_n^r. This accounts
for the common, but illogical, habit of writing (n r) but reading this symbol aloud
as "n C r".
When r > n, Equation 1.1 gives (n r) = 0, which is sensible: there are no ways of
choosing more than n objects from n. It is convenient to take (n 0) = 1 (there is just
one way of not choosing any objects), and when r < 0, which has no combinatorial
meaning, we define (n r) = 0. When 0 ≤ r ≤ n, Equation 1.1 can conveniently be
written in the form
(n r) = n! / (r!(n − r)!).
A simple argument allows us to obtain an important relation. Pick any one of
the n objects, and say that this object is "special". Now when we choose r objects
from the n available, we can distinguish those choices that include the special
object, and those that do not. For instance, if we are to choose three objects among
a, b, c, d and e, and the special object is b, then the choices that include the special
object are abc, abd, abe, bcd, bce and bde, while those that do not include the special
object are acd, ace, ade and cde. To make a selection of the first kind, we can first
choose the special object (since the order of our choices is not important), and then
complete our selection by picking r − 1 objects among the n − 1 that are left; this
can be done in (n−1 r−1) ways. To make a selection of the second kind, we must choose
our r objects among the n − 1 that are not special; this can be done in (n−1 r) ways.
Since every selection of r objects from among the n available is of one kind or the
other, we must have
(n r) = (n−1 r−1) + (n−1 r).
n\r 0 1 2 3 4 5
0 1
1 1 1
2 1 2 1
3 1 3 3 1
4 1 4 6 4 1
5 1 5 10 10 5 1
Figure 1.9. Combinations of n objects taken r at a time
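The relation just derived is exactly the rule used to build Figure 1.9 row by row: each entry is the sum of the entry immediately above and the entry above and to the left. The Python sketch below builds the first few rows with this rule, checks them against a library function, and (anticipating the identities discussed next) verifies that row n sums to 2^n:

from math import comb

rows = [[1]]
for n in range(1, 6):
    prev = rows[-1] + [0]                     # pad so prev[r] is always defined
    rows.append([1] + [prev[r - 1] + prev[r] for r in range(1, n + 1)])

for n, row in enumerate(rows):
    assert row == [comb(n, r) for r in range(n + 1)]
    assert sum(row) == 2 ** n
    print(row)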
Theorem 1.7.20 (Binomial theorem)
(1 + x)^n = 1 + (n 1)x + (n 2)x² + ··· + (n n−1)x^{n-1} + x^n.
Using this theorem it is easy to obtain interesting results concerning the binomial
coefficients. For example, on setting x = 1 we obtain immediately
(n 0) + (n 1) + (n 2) + ··· + (n n) = 2^n.
In combinatorial terms, the sum on the left is the number of ways of choosing an
arbitrary number of objects (including 0 objects) from n when order is not taken
into account. Since there are 2 possibilities for each of the n objects (we can take
it or leave it), there are 2^n ways this can be done. Similarly, on setting x = −1 in
Theorem 1.7.20 we find
Σ_{r odd} (n r) = Σ_{r even} (n r).
passing a given point, the sample space is S = {0, 1,2, ...}. For the random ex-
periment that consists of measuring the response time of a computer system, the
sample space is S = {t | t ≥ 0}.
A sample space can be finite or infinite, and it can be discrete or continuous.
A sample space is said to be discrete if the number of sample points is finite, or
if they can be labelled 1, 2, 3, and so on using the positive integers; otherwise the
sample space is continuous. In the examples above, S is finite and therefore discrete
for the random experiment of throwing a dice; S is infinite but discrete for the
experiment of counting cars, for the possible outcomes can be made to correspond
to the positive integers; and S is continuous for the experiment of measuring a
response time. In this book we shall be concerned almost entirely with random
experiments whose sample space is finite, and therefore discrete.
An event is now defined as a collection of sample points, that is, as a subset
of the sample space. An event A is said to occur if the random experiment is
performed and the observed outcome is an element of the set A. For example,
when we throw a dice, the event described by the statement "The number shown
on the dice is odd" corresponds to the subset A = {1,3, 5} of the sample space.
When we count cars, the event described by "The observed number of cars is less
than 20" corresponds to the subset A = {0, 1, 2, ..., 18, 19}, and so on. Informally,
we use the word "event" to refer either to the statement describing it, or to the
corresponding subset of the sample space. In particular, the entire sample space
S is an event called the universal event, and the empty set ∅ is an event called the
impossible event.
Since a sample space S is a set and an event A is a subset of S, we can form new
events by the usual operations of set theory. Thus to any event A there corresponds
an event Ā consisting of all the sample points of S that are not in A. Clearly Ā is the
event "A does not occur". Similarly the event A ∪ B corresponds to the statement
"Either A or B occurs", while the event A ∩ B corresponds to "Both A and B occur".
Two events are said to be mutually exclusive if A ∩ B = ∅.
Finally a probability measure is a function that assigns a numerical value Pr[A]
to every event A of the sample space. Obviously the probability of an event is
supposed to measure in some sense the relative likelihood that the event will occur
if the underlying random experiment is performed. The philosophical bases of
the notion of probability are controversial (What precisely does it mean to say that
the probability of rain tomorrow is 0.25?), but there is agreement that any sensible
probability measure must satisfy the following three axioms:
1. Pr[A] ≥ 0 for any event A;
2. Pr[S] = 1; and
3. Pr[A_1 ∪ A_2 ∪ ··· ∪ A_n] = Pr[A_1] + Pr[A_2] + ··· + Pr[A_n]
for any finite collection A_1, A_2, ..., A_n of mutually exclusive events. (A modified
form of axiom 3 is necessary if the sample space is infinite, but that need not concern
us here.) The axioms lead to a number of consequences, among which are
4. Pr[Ā] = 1 − Pr[A] for any event A; and
5. Pr[A u B] = Pr[A] + Pr[B] - Pr[A n B] for any events A and B.
The basic procedure for solving problems in probability can now be outlined:
first, identify the sample space S; second, assign probabilities to the elements in S;
third, identify the events of interest; and finally, compute the desired probabilities.
For instance, suppose we want to know the probability that a random number
generator will produce a value that is prime. We have first to identify the sample
space. Suppose we know that the generator can produce any integer value between
0 and 9999 inclusive. The sample space (that is, the set of possible outcomes) is
therefore {0, 1, 2, ..., 9999}. Next, we must assign probabilities to the elements
of S. If the generator produces each possible elementary event with equal prob-
ability, it follows that Pr[0] = Pr[1] = ··· = Pr[9999] = 1/10000. Third, the inter-
esting event is "the generator produces a prime", which corresponds to the subset
A = {2, 3, 5, ..., 9967, 9973}. Finally the interesting probability is Pr[A], which can
be computed as Σ_{e∈A} Pr[e], where the sum is over the elementary events that com-
pose A. Since the probability of each elementary event is 1/10000, and there are
1229 elementary events in A, we find Pr[A] = 0.1229.
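The count of 1229 primes can be reproduced with a sieve of Eratosthenes; here is a minimal Python version (the variable names are of course arbitrary):

N = 10000
is_prime = [False, False] + [True] * (N - 2)   # flags for 0..9999
for p in range(2, int(N ** 0.5) + 1):
    if is_prime[p]:
        for m in range(p * p, N, p):
            is_prime[m] = False

count = sum(is_prime)
print(count, count / N)   # 1229 0.1229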
So far we have assumed that all we know about the outcome of some random
experiment is that it must correspond to some sample point in the sample space S.
However it is often useful to calculate the probability that an event A occurs when
it is known that the outcome of the experiment is contained in some subset B of
the sample space. For example, we might wish to calculate the probability that the
random number generator of the previous paragraph has produced a prime when
we know that it has produced an odd number. The symbol for this probability is
Pr[A|B], called the conditional probability of A given B. Obviously this conditional
probability is only interesting when Pr[B] ≠ 0.
In essence, the extra information tells us that the outcome of the experiment
lies in a new sample space, namely B. Since the sum of the probabilities of the
elementary events in the sample space must be 1, we scale up the original values
to fulfill this condition. Furthermore we are only interested now in that part of A
that lies in B, namely A n B. Thus the conditional probability is given by
Pr[A|B] = Pr[A ∩ B] / Pr[B].
For our example, A is the event "the generator produces a prime" and B is the
event "the generator produces an odd number". Now B includes 5000 elementary
events, so Pr[B] = 0.5. There are 1228 odd primes less than 10000 (only the even
prime 2 drops out), so Pr[A ∩ B] = 0.1228. Hence the probability that the generator
has produced a prime, given that it has produced an odd number, is
Pr[A|B] = Pr[A ∩ B]/Pr[B] = 0.1228/0.5 = 0.2456.
Two events A and B are said to be independent if Pr[A ∩ B] = Pr[A]Pr[B]. In that case
Pr[A|B] = Pr[A]Pr[B]/Pr[B] = Pr[A], and hence the knowledge that event B has occurred
does not change the probability that event A will occur. In fact this condition can be
used as an alternative definition of independence.
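The conditional probability just computed can also be checked directly by counting, reusing the same kind of sieve (a small illustration only, not an efficient way to organize the computation):

N = 10000
is_prime = [False, False] + [True] * (N - 2)
for p in range(2, int(N ** 0.5) + 1):
    if is_prime[p]:
        for m in range(p * p, N, p):
            is_prime[m] = False

odd = [k for k in range(N) if k % 2 == 1]
odd_primes = [k for k in odd if is_prime[k]]
print(len(odd_primes), len(odd), len(odd_primes) / len(odd))   # 1228 5000 0.2456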
In Chapter 10 we shall be looking at probabilistic algorithms for determining
whether or not a given integer is prime. In this context the following approximations
are useful. Suppose the sample space S contains a large number n of consecutive
integers, and that the probability of each of these outcomes is the same, namely
1/ n. For example, S might be the set {1, 2, . . ., n }. Let Di be the event "the outcome
is divisible by i". Since the number of outcomes in S that are divisible by i is ap-
proximately n/i, we have that Pr[D_i] ≈ (n/i) × (1/n) = 1/i. Furthermore, if p and
q are two different primes, the events Dp and Dq are approximately independent,
so that
Pr[D_p ∩ D_q] ≈ Pr[D_p]Pr[D_q] ≈ 1/pq.
Of course this may not be true if either p or q is not prime: clearly the events D2 and
D4 are not independent; also when S contains only a small number of elements,
the approximations do not work very well, as Problem 1.46 illustrates. Now the
Prime Number Theorem (whose proof is well beyond the scope of this book) tells
us that the number of primes less than n is approximately n/log n, so that if S is
indeed {1, 2, ..., n}, with equal probability for each outcome, and A is the event
"the outcome is prime", we have that Pr[A] ≈ 1/log n.
Consider for instance the following problem. A random number generator
produces the value 12262409. We would like to know whether this number is prime,
but we don't have the computing power to compute the answer in a deterministic
way. (Of course this is unrealistic for such a small example.) So what can we say
with the help of a little probability theory?
The first thing to note is that a question such as "What is the probability that
12262409 is prime?" is meaningless. No random experiment is involved here, so we
have no sample space to talk about, and we cannot even begin to assign probabilities
to outcomes. On the other hand, a question such as "What is the probability that
our random number generator has produced a prime?" is meaningful and can be
answered quite easily.
As before, the first step is to identify our sample space, and the second to
assign probabilities to the elements of S. Suppose we know that the genera-
tor produces each of the values from 0 to 99999999 with equal probability: then
S = {0, 1, ..., 99999999}, and the probability of each elementary event in S is 1/10^8.
The Prime Number Theorem tells us that approximately 10^8 / log 10^8 ≈ 5.43 × 10^6
elements of S are prime. (The correct value is actually 5761455, but the approxima-
tion is good enough for our purposes.) So if A is the event "our generator produces
a prime", we have that Pr[A] ≈ 0.0543.
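The approximations used here (divisibility probabilities of roughly 1/p and 1/pq, and a prime density of roughly 1/log n) are easy to test empirically. The Python sketch below works on a smaller sample space S = {1, ..., 100000}, whose size is arbitrary:

import math

n = 10**5
p, q = 3, 7
both = sum(1 for k in range(1, n + 1) if k % p == 0 and k % q == 0)
print(both / n, 1 / (p * q))                  # about 0.0476 in both cases

sieve = [False, False] + [True] * (n - 1)     # primality flags for 0..n
for d in range(2, int(n ** 0.5) + 1):
    if sieve[d]:
        for m in range(d * d, n + 1, d):
            sieve[m] = False
print(sum(sieve) / n, 1 / math.log(n))        # about 0.096 versus about 0.087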
Now let the event Dp be "our generator produces a number that is divisible
by the prime p". Since Pr[D_p] ≈ 1/p, the probability of the complementary event
"our generator produces a number that is not divisible by the prime p" is Pr[D̄_p] ≈
1 − 1/p. Suppose we test 12262409 by trying to divide it by 2. The attempt fails, but
now we have some additional information, and we can ask "What is the probability
that our generator produces a prime, given that it produces a number not divisible
by 2?" In symbols,
Pr[A|D̄_2] = Pr[A ∩ D̄_2]/Pr[D̄_2]
≈ 2 Pr[A] ≈ 0.109.
Here Pr[A ∩ D̄_2] is essentially the same as Pr[A] because all the primes save one
are not divisible by 2. If we next try dividing 12262409 by 3, the attempt again fails,
so now we can ask "What is the probability that our generator produces a prime,
given that it produces a number divisible neither by 2 nor by 3?" This probability
is
Pr[A | D̄_2 ∩ D̄_3] = Pr[A ∩ D̄_2 ∩ D̄_3] / Pr[D̄_2 ∩ D̄_3]
≈ Pr[A] / (Pr[D̄_2]Pr[D̄_3])
= 3 Pr[A] ≈ 0.163.
Continuing in this way, each successive failure to divide 12262409 by a new prime
allows us to ask a more precise question, and to be a little more confident that our
generator has indeed produced a prime. Suppose, however, that at some stage we
try to divide 12262409 by 3121. Now the trial division succeeds, so our next question
would be "What is the probability that our generator has produced a prime, given
that it has produced a number divisible by none of 2, 3, .. ., but divisible by 3121?"
The answer is of course 0: once we have found a divisor of the number produced
by the generator, we are sure it is not prime. Symbolically, this answer is obtained
because
Pr[A ∩ D̄_2 ∩ ··· ∩ D_3121] = 0.
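The same chain of reasoning can be mimicked exactly by brute force on a smaller sample space. In the sketch below the range 0 to 99999999 of the example is replaced by {1, ..., 100000} so that the counting stays fast; each pass conditions on "not divisible by p" and recomputes the probability that the surviving number is prime:

n = 10**5
sieve = [False, False] + [True] * (n - 1)
for d in range(2, int(n ** 0.5) + 1):
    if sieve[d]:
        for m in range(d * d, n + 1, d):
            sieve[m] = False

survivors = list(range(1, n + 1))
print("initially: Pr[prime] =", sum(sieve[k] for k in survivors) / len(survivors))
for p in [2, 3, 5, 7, 11]:
    survivors = [k for k in survivors if k % p != 0]   # condition on the new information
    primes_left = sum(sieve[k] for k in survivors)
    print("not divisible by", p, ": Pr[prime] =", round(primes_left / len(survivors), 3))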
Notice that we cannot start this process of calculating a new probability after
each trial division if we are unable to estimate Pr[A], the unconditional probability
that our generator has produced a prime. In the example we did this using the
Prime Number Theorem plus our knowledge that the generator produces each
value from 0 to 99999999 with equal probability. Suppose on the other hand we are
presented with the number 12262409 and simply told that it has been selected for the
purposes of the example. Then the first question to ask is "What is the probability
that a number selected in some unspecified way for the purposes of an example
is prime?" Clearly this is impossible to answer. The sample space of the random
experiment consisting of choosing a number to serve as an example is unknown,
as are the probabilities associated with the elementary events in this sample space.
We can therefore make no meaningful statement about the probabilities associated
with a number selected in this way.
In many random experiments we are interested less in the outcome itself, than
in some number associated with this outcome. In a horse-race, for instance, we may
be less interested in the name of the winning horse than in the amount we win or
lose on the race. This idea is captured by the notion of a random variable. Formally,
a random variable is a function (and not a variable at all, despite its name) that
assigns a real number to each sample point of some sample space S.
If X is a random variable defined on a sample space S, and x is a real number,
we define the event A_x to be the subset of S consisting of all the sample points to
which the random variable X assigns the value x. Thus
A_x = {s ∈ S | X(s) = x}.
The notation X = x is a convenient way to denote the event A_x. Hence we can
write
Pr[X = x] = Pr[A_x] = Σ_{s∈S : X(s)=x} Pr[s].
If we define p(x) by
p(x) = Pr[X = x],
then p(x) is a new function associated with the random variable X, called the
probability mass function of X. We define the expectation E[X] of X (also called the
mean or the average) by
E[X] = Σ_x x p(x),
where the sum is over the values x that the random variable X can take. For example,
suppose that five horses are entered in a race, and that Figure 1.10 shows our estimate
of each one's probability of winning.
Probability    Winnings
0.10 50
0.05 100
0.25 -30
0.50 -30
0.10 15
Figure 1.10. Probability of each outcome
(Remember that the probabilities must sum to 1.) We have made a number of
wagers on the race so that, depending on the outcome, we will win or lose some
money. The amount we win or lose is also shown in Figure 1.10. This amount is a
function of the outcome, that is, a random variable. Call this random variable W;
then the table shows that W(Ariel) = 50, W(Bonbon) = 100, and so on. The random
variable W can take the values -30, 15, 50, or 100, and for instance
p(−30) = Pr[W = −30] = Σ_{s∈S : W(s)=−30} Pr[s] = 0.25 + 0.50 = 0.75.
Similarly p(15) = Pr[W = 15] = 0.10, p(50) = Pr[W = 50] = 0.10 and p(100) =
Pr[W = 100] = 0.05. Our expected winnings can be calculated as
E[W] = Σ_x x p(x)
= −30 p(−30) + 15 p(15) + 50 p(50) + 100 p(100)
= -11.
Once we have obtained E[X], we can calculate a second useful measure called
the variance of X, denoted by Var[X] or σ²_X. This is defined by
Var[X] = E[(X − E[X])²] = Σ_x (x − E[X])² p(x).
In words, it is the expected value of the square of the difference between X and its
expectation E[X]. The standard deviation of X, denoted by σ_X, is the square root of
the variance. For the horse-race example above, we have
Var[W] = 0.75(−30 + 11)² + 0.10(15 + 11)² + 0.10(50 + 11)² + 0.05(100 + 11)² = 1326.5,
so the standard deviation of W is σ_W = √1326.5 ≈ 36.4.
Suppose now that a random experiment is repeated n times independently, and that
x_i is the value of the random variable X observed on the i-th repetition. Under suitably
general conditions, the Central Limit Theorem suggests that when n is large, the
average observed value of xi will have a distribution that is approximately normal
with mean E[X] and variance Var[X]/n. To take advantage of this, all we need is
a table of the normal distribution. These are widely available. Such a table tells
us, among other things, that a normal deviate lies between plus or minus 1.960
standard deviations of its mean 95% of the time; 99% of the time it lies within plus
or minus 2.576 standard deviations of its mean.
For example, suppose that by some magic the horse-race described above
could be run 50 times under identical conditions. Suppose we win w_i on the
i-th repetition, and let our average winnings be w̄ = Σ_{i=1}^{50} w_i / 50. Then we can
expect that w̄ will be approximately E[W] = −11. The variance of w̄ will be
Var[W]/50 = 1326.5/50 = 26.53, and its standard deviation will be the square
root of this, or 5.15, while the Central Limit Theorem tells us that the distribution
of w̄ will be approximately normal. Therefore 95% of the time we can expect our
average winnings w̄ to lie between −11 − 1.960 × 5.15 and −11 + 1.960 × 5.15, that
is between −21.1 and −0.9. A similar calculation shows that 99% of the time w̄
will lie between −24.3 and +2.3; see Problem 1.47.
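All these figures are easy to reproduce by simulation. The following Python sketch recomputes E[W] and Var[W] from Figure 1.10, then simulates many blocks of 50 races and checks that roughly 95% of the block averages fall in the interval just derived:

import random, statistics

outcomes = [(0.10, 50), (0.05, 100), (0.25, -30), (0.50, -30), (0.10, 15)]
probs = [p for p, w in outcomes]
wins = [w for p, w in outcomes]

mean = sum(p * w for p, w in outcomes)                 # E[W] = -11
var = sum(p * (w - mean) ** 2 for p, w in outcomes)    # Var[W] = 1326.5
print(mean, var, (var / 50) ** 0.5)                    # -11 1326.5 5.15...

random.seed(1)
averages = [statistics.mean(random.choices(wins, probs, k=50)) for _ in range(10000)]
inside = sum(1 for a in averages if -21.1 <= a <= -0.9)
print(inside / len(averages))                          # close to 0.95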
It is usually safe to use the Central Limit Theorem when n is greater than about
25 or so.
1.8 Problems
Problem 1.1. The word "algebra" is also connected with the mathematician
al-Khowarizmi, who gave his name to algorithms. What is the connection?
Problem 1.2. Easter Sunday is in principle the first Sunday after the first full moon
after the spring equinox. Is this rule sufficiently precise to be called an algorithm?
Justify your answer.
Problem 1.3. While executing an algorithm by hand, you have to make a random
choice. To help you, you have a fair coin, which gives the values heads and tails
with equal probability, and a fair dice, which gives each of the values 1 to 6 with
equal probability. You are required to choose each of the values red, yellow and blue
with equal probability. Give at least three different ways of doing this.
Repeat the problem with five colours instead of three.
Problem 1.4. Is it possible that there exists an algorithm for playing a perfect game
of billiards? Justify your answer.
Problem 1.5. Use multiplication a la russe to multiply (a) 63 by 123, and (b) 64
by 123.
Problem 1.6. Find a pocket calculator accurate to at least eight figures, that is,
which can multiply a four-figure number by a four-figure number and get the
correct eight-figure answer. You are required to multiply 31415975 by 8182818.
Show how the divide-and-conquer multiplication algorithm of Section 1.2 can be
used to reduce the problem to a small number of calculations that you can do on
your calculator, followed by a simple paper-and-pencil addition. Carry out the
calculation. Hint: Don't do it recursively!
Problem 1.7. You are required to multiply two numbers given in Roman figures.
For instance, XIX times XXXIV is DCXLVI. You may not use a method that in-
volves translating the numbers into Arabic notation, multiplying them, and then
translating them back again. Devise an algorithm for this problem.
Hint: Find ways to translate back and forth between true Roman notation and
something similar that does not involve any subtractions. For instance, XIX might
become XVIIII in this "pseudo-Roman" notation. Next find easy ways to double,
halve and add figures in pseudo-Roman notation. Finally adapt multiplication a
la russe to complete the problem.
Problem 1.8. As in Problem 1.6, suppose you have available a pocket calculator
that can multiply a four-figure number by a four-figure number and get the cor-
rect eight-figure answer. Devise an algorithm for multiplying two large numbers
based on the classic algorithm, but using blocks of four figures at a time instead
of just one. (If you like, think of it as doing your calculation in base 10000 arith-
metic.) For instance, when multiplying 1234567 by 9876543, you might obtain the
arrangement shown in Figure 1.11.
0123 4567
0987 6543
0080 7777 1881
0012 1851 7629
0012 1932 5406 1881
Figure 1.11. Multiplication in base 10000
Here the first line of the calculation is obtained, from right to left, as
4567 × 6543 = 29881881 (that is, a result of 1881 and a carry of 2988), followed by 0123 × 6543 + 2988 = 807777
(that is, a result of 7777 and a carry of 0080). The second line of the calculation is
obtained similarly, and the final result is found by adding the columns. All the
necessary arithmetic can be done with your calculator.
Use your algorithm to multiply 31415975 by 8182818. Check that your answer is
the same as the one you found in Problem 1.6.
Problem 1.9. Figure 1.12 shows yet another method of multiplying two positive
integers, sometimes called Arabic multiplication.
In the figure, as before, 981 is multiplied by 1234. To use this method, draw a
rectangle with as many columns as there are figures in the multiplicand (here 3) and
as many rows as there are figures in the multiplier (here 4). Write the multiplicand
above the columns, and the multiplier down the right-hand side of the rectangle.
Draw diagonal lines as shown. Next fill in each cell of the rectangle with the product
of the figure at the top of the column and the figure at the right-hand end of the
row. The tens figure of the result (which may be 0) goes above the diagonal line,
Figure 1.12. Arabic multiplication of 981 by 1234
and the units figure below it. Finally add the diagonals of the rectangle starting at
the bottom right. Here the first diagonal gives 4; the second gives 2 + 0 + 3 = 5; the
third gives 6 + 3 + 4 + 0 + 2 = 15, so write down 5 and carry 1; and so on. Now the
result 1210554 can be read off down the left-hand side and along the bottom of the
rectangle.
Once again, use this algorithm to multiply 31415975 by 8182818, checking your
result against the answers to previous problems.
Problem 1.10. Are the two sets X = {1, 2, 3} and Y = {2, 1, 3} equal?
Problem 1.11. Which of the following sets are finite: ∅, {∅}, ℕ, {ℕ}? What is
the cardinality of those among the above sets that are finite?
Problem 1.12. For which values of Boolean variables p, q and r is the Boolean
formula (p ∧ q) ∨ (¬q ∧ r) true?
Problem 1.16. Prove that x − 1 < ⌊x⌋ ≤ x ≤ ⌈x⌉ < x + 1 for every real number x.
Problem 1.17. An alternative proof of Theorem 1.5.1, to the effect that there are
infinitely many primes, begins as follows. Assume for a contradiction that the
set of prime numbers is finite. Let p be the largest prime. Consider x = p! and
y = x + 1. Your problem is to complete the proof from here and to distill from your
proof an algorithm Biggerprime(p) that finds a prime larger than p. The proof of
termination for your algorithm must be obvious, as well as the fact that it returns
a value larger than p.
Problem 1.18. Modify the proof of Theorem 1.5.1 to prove that there are infinitely
many primes of the form 4k -1, where k is an integer.
Hint: Define x as in the proof of Theorem 1.5.1, but then set y = 4x - 1 rather than
y = x + 1. Even though y itself may not be prime and the smallest integer d larger
than 1 that divides y may not be of the form 4k - 1, prove by contradictionthat y
has at least one prime divisor of the required form.
It is also true that there are infinitely many primes of the form 4k + 1, but this is
more involved to prove. Where does your reasoning for the case 4k - 1 break down
in trying to use the same idea to prove the case 4k + 1?
Problem 1.19. Let n be a positive integer. Draw a circle and mark n points regu-
larly spaced around the circumference. Now, draw a chord inside the circle between
each pair of these points. In the case n = 1, there are no pairs of points and thus
no chords are drawn; see Figure 1.13.
Finally, denote by c(n) the number of sections thus carved inside the circle. You
should find that c(1) = 1, c(2) = 2, c (3) = 4 and c(4) = 8. By induction, what do you
think the general formula for c(n) is? Determine c(5) by drawing and counting.
Was your inductively found formula correct? Try again with c(6). What if you
allow the points to be spaced irregularly? (Optional and much harder: determine
the correct formula for c(n), and prove that it is correct.)
Problem 1.20. Why do you think that mathematical induction received this name
even though it really is a deductive technique?
Problem 1.21. Prove by mathematical induction that the sum of the cubes of the
first n positive integers is equal to the square of the sum of these integers.
Problem 1.22. Following Problem 1.21, prove that the sum of the cubes of the first
n positive integers is equal to the square of the sum of these integers, but now
use Proposition 1.7.13 rather than mathematical induction. (Of course, Proposi-
tion 1.7.13 was proved by mathematical induction, too!)
Problem 1.23. Determine by induction all the positive integer values of n for
which n³ > 2ⁿ. Prove your claim by mathematical induction.
Problem 1.24. The axiom of the least integer says that every nonempty set of positive
integers contains a smallest element. (This is not true in general for sets of positive
real numbers-consider for instance the set of all reals strictly between 0 and 1.)
Using this axiom, give a rigorous proof by contradiction that the simple principle
of mathematical induction is valid. More precisely, consider any integer a and
any property P of the integers such that P(a) holds, and such that P(n) holds
whenever P(n -1) holds for any integer n > a. Assume furthermore that it is not
the case that P(n) holds for all n > a. Use the axiom of the least integer to derive
a contradiction.
Problem 1.25. Problem 1.24 asked you to prove the validity of the principle of
mathematical induction from the axiom of the least integer. In fact the principle
and the axiom are equivalent: prove the axiom of the least integer by mathematical
induction!
Hint: As a first step, prove that any nonempty finite set of positive integers contains
a smallest element by mathematical induction on the number of elements in the
set. Note that your proof would hold just as well for any finite set of real numbers,
which shows clearly that it does not apply directly to infinite sets. To generalize
the result to infinite sets of positive integers, consider any such set X. Let m be any
element in X (we do not need an axiom for this: any infinite set contains at least one
element by definition). Let Y be the set of elements of X that are not larger than m.
Show that Y is a nonempty finite set of positive integers, and therefore your proof
by mathematical induction applies to conclude that Y contains a smallest element,
say n. Finish the proof by arguing that n is also the smallest element in X.
Problem 1.26. Give a rigorous proof that the generalized principle of mathematical
induction is valid.
Hint: Prove it by simple mathematical induction.
Problem 1.27. Recall that the Fibonacci sequence is defined as
f_0 = 0, f_1 = 1, and
f_n = f_{n-1} + f_{n-2} for n ≥ 2.
Prove by generalized mathematical induction that
f_n = [φ^n − (−1/φ)^n] / √5
for each positive integer n, where φ = (1 + √5)/2 is the golden ratio.
Consider also the function t(n) defined by t(1) = a and t(n) = bn + 2t(n ÷ 2) for
n ≥ 2, where a and b are positive constants. Using constructive induction, prove the
existence of a real positive constant c such that t(n) ≤ cn log n for each sufficiently
large integer n.
Problem 1.34. Give two functions f(n) and g(n) such that neither f(n) nor g(n)
tends to a limit as n tends to infinity, but both f(n) + g(n) and f(n)/g(n) do tend
to a limit.
Problem 1.35. Determine the behaviour as n tends to infinity of f(n) = n^r x^n,
where r is any positive integer.
Problem 1.36. Use de l'Hôpital's rule to find the limit as n tends to infinity of
(log log n)^a / log n, where a > 0 is an arbitrary positive constant.
Problem 1.38. Give a simple proof, not using integrals, that the harmonic series
diverges.
Problem 1.39. Use a technique similar to the one illustrated in Figure 1.8 to show
that for r > 1 the sum
s_n = 1 + 1/2^r + 1/3^r + ··· + 1/n^r
tends to a limit less than r/(r − 1) as n tends to infinity.
Show further that the error we make if we approximate the sum of the whole series
by the sum of the first n terms is less than f (n + 1).
Problem 1.41. Show that (n r) ≤ 2^{n-1} for n > 0 and 0 < r < n.
Problem 1.45. Show that two mutually exclusive events are not independent ex-
cept in the trivial case that at least one of them has probability zero.
Problem 1.46. Consider a random experiment whose sample space S is {1, 2,3,
4, 5} . Let A and B be the events "the outcome is divisible by 2" and "the outcome is
divisible by 3", respectively. What are Pr[A], Pr[B] and Pr[A n B]? Are the events
A and B independent?
Problem 1.47. Show that for the horse-race of Section 1.7.4, 99% of the time our
expected winnings w̄ averaged over 50 races lie between −24.3 and +2.3.
1.9 References and further reading
Leonardo Pisano (c. 1170-c. 1240), or Leonardo Fibonacci, was the first great
western mathematician of the Middle Ages. There is a brief account of his life and
an excerpt from Liber abaci in Calinger (1982). The proof that
lim_{n→∞} (1 + 1/2² + 1/3² + ··· + 1/n²) = π²/6
is due to Euler; see Eves (1983) or Scharlau and Opolka (1985). These references
also give the history of the binomial theorem.
Chapter 2
Elementary Algorithmics
2.1 Introduction
In this chapter we begin our detailed study of algorithms. First we define some
terms: we shall see that a problem, such as multiplying two positive integers, will
normally have many-usually infinitely many-instances, such as multiplying the
particular integers 981 and 1234. An algorithm must work correctly on every in-
stance of the problem it claims to solve.
Next we explain what we mean by the efficiency of an algorithm, and discuss
different ways for choosing the most efficient algorithm to solve a problem when
several competing techniques are available. We shall see that it is crucial to know
how the efficiency of an algorithm changes as the problem instances get bigger
and therefore (usually) harder to solve. We also distinguish between the average
efficiency of an algorithm when it is used on many instances of a problem and its
efficiency in the worst possible case. The pessimistic, worst-case estimate is often
appropriate when we have to be sure of solving a problem in a limited amount of
time, for instance.
Once we have defined what we mean by efficiency, we can begin to investigate
the methods used to analyse algorithms. Our line of attack is to try to count the
number of elementary operations, such as additions and multiplications, that an
algorithm performs. However we shall see that even such commonplace operations
as these are not straightforward: both addition and multiplication get slower as the
size of their operands increases. We also try to convey some notion of the practical
difference between a good and a bad algorithm in terms of computing time.
One topic we shall not cover, either in this chapter or elsewhere, is how to prove
rigorously that the programs we use to represent algorithms are correct. Such an
approach requires a formal definition of programming language semantics well
beyond what we consider necessary; an adequate treatment of this subject would
deserve a book to itself. For our purposes we shall be content to rely on informal
proofs using common-sense arguments.
The chapter concludes with a number of examples of algorithms from different
areas, some good and some poor, to show how the principles we put forward apply
in practice.
than its predecessor only when they are both used on large instances, this last point
is particularly important.
It is also possible to analyse algorithms using a hybrid approach, where the form
of the function describing the algorithm's efficiency is determined theoretically, and
then any required numerical parameters are determined empirically for a particular
program and machine, usually by some kind of regression. Using this approach we
can predict the time an actual implementation will take to solve an instance much
larger than those used in the tests. Beware however of making such an extrapolation
solely on the basis of a small number of empirical tests, ignoring all theoretical
considerations. Predictions made without theoretical support are likely to be very
imprecise, if not plain wrong.
If we want to measure the amount of storage an algorithm uses as a function
of the size of the instances, there is a natural unit available to us, namely the bit.
Regardless of the machine being used, the notion of one bit of storage is well
defined. If on the other hand, as is more often the case, we want to measure the
efficiency of an algorithm in terms of the time it takes to arrive at an answer, there
is no such obvious choice. Clearly there can be no question of expressing this
efficiency in seconds, say, since we do not have a standard computer to which all
measurements might refer.
An answer to this problem is given by the principle of invariance, which states
that two different implementations of the same algorithm will not differ in efficiency
by more than some multiplicative constant. If this constant happens to be 5, for
example, then we know that, if the first implementation takes 1 second to solve
instances of a particular size, then the second implementation (maybe on a different
machine, or written in a different programming language) will not take longer than
5 seconds to solve the same instances. More precisely, if two implementations of
the same algorithm take t_1(n) and t_2(n) seconds, respectively, to solve an instance
of size n, then there always exist positive constants c and d such that t_1(n) ≤ c t_2(n)
and t_2(n) ≤ d t_1(n) whenever n is sufficiently large. In other words, the running
time of either implementation is bounded by a constant multiple of the running
time of the other; the choice of which implementation we call the first, and which
we call the second, is irrelevant. The condition that n be sufficiently large is not
really necessary: see the "threshold rule" in Section 3.2. However by including it
we can often find smaller constants c and d than would otherwise be the case. This
is useful if we are trying to calculate good bounds on the running time of one
implementation when we know the running time of the other.
This principle is not something we can prove: it simply states a fact that can be
confirmed by observation. Moreover it has very wide application. The principle
remains true whatever the computer used to implement an algorithm (provided it is
of conventional design), regardless of the programming language and the compiler
employed, and regardless even of the skill of the programmer (provided he or she
does not actually modify the algorithm!). Thus a change of machine may allow us
to solve a problem 10 times or 100 times faster, giving an increase in speed by a
constant factor. A change of algorithm, on the other hand-and only a change of
algorithm-may give us an improvement that gets more and more marked as the
size of the instances increases.
Returning to the question of the unit to be used to express the theoretical ef-
ficiency of an algorithm, the principle of invariance allows us to decide that there
will be no such unit. Instead, we only express the time taken by an algorithm to
within a multiplicative constant. We say that an algorithm for some problem takes
a time in the order of t (n), for a given function t, if there exist a positive constant c
and an implementation of the algorithm capable of solving every instance of size n
in not more than c t (n) seconds. (For numerical problems, as we remarked earlier,
n may sometimes be the value rather than the size of the instance.)
The use of seconds in this definition is obviously arbitrary: we only need to
change the constant to bound the time by at(n) years or bt(n) microseconds.
By the principle of invariance, if any one implementation of the algorithm has the
required property, then so do all the others, although the multiplicative constant
may change from one implementation to another. In the following chapter we
give a more rigorous treatment of this important concept known as the asymptotic
notation. It will be clear from the formal definition why we say "in the order of"
rather than the more usual "of the order of."
Certain orders occur so frequently that it is worth giving them a name. For ex-
ample, suppose the time taken by an algorithm to solve an instance of size n is
never more than cn seconds, where c is some suitable constant. Then we say that
the algorithm takes a time in the order of n, or more simply that it takes linear
time. In this case we also talk about a linear algorithm. If an algorithm never takes
more than cn² seconds to solve an instance of size n, then we say it takes time in
the order of n², or quadratic time, and we call it a quadratic algorithm. Similarly an
algorithm is cubic, polynomial or exponential if it takes a time in the order of n³, n^k
or c^n, respectively, where k and c are appropriate constants. Section 2.6 illustrates
the important differences between these orders of magnitude.
Do not fall into the trap of completely forgetting the hidden constants, as the
multiplicative constants used in these definitions are often called. We commonly
ignore the exact values of these constants and assume that they are all of about
the same order of magnitude. This lets us say, for instance, that a linear algorithm
is faster than a quadratic one without worrying whether our statement is true in
every case. Nevertheless it is sometimes necessary to be more careful.
Consider, for example, two algorithms whose implementations on a given ma-
chine take respectively n² days and n³ seconds to solve an instance of size n. It is
only on instances requiring more than 20 million years to solve that the quadratic
algorithm outperforms the cubic algorithm! (See Problem 2.7.) From a theoretical
point of view, the former is asymptotically better than the latter; that is, its perfor-
mance is better on all sufficiently large instances. From a practical point of view,
however, we will certainly prefer the cubic algorithm. Although the quadratic al-
gorithm may be asymptotically better, its hidden constant is so large as to rule it
out of consideration for normal-sized instances.
procedure insert(T[1..n])
    for i ← 2 to n do
        x ← T[i]; j ← i − 1
        while j > 0 and x < T[j] do
            T[j + 1] ← T[j]
            j ← j − 1
        T[j + 1] ← x

procedure select(T[1..n])
    for i ← 1 to n − 1 do
        minj ← i; minx ← T[i]
        for j ← i + 1 to n do
            if T[j] < minx then minj ← j
                                minx ← T[j]
        T[minj] ← T[i]
        T[i] ← minx
Simulate the operation of these two algorithms on a few small arrays to make
sure you understand how they work. The main loop in insertion sorting looks suc-
cessively at each element of the array from the second to the n-th, and inserts it in
the appropriate place among its predecessors in the array. Selection sorting works
by picking out the smallest element in the array and bringing it to the beginning;
then it picks out the next smallest, and puts it in the second position in the array;
and so on.
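For readers who prefer to experiment on a computer, here is a direct Python transcription of the two procedures (using 0-indexed lists rather than the 1-indexed arrays of the pseudocode):

def insert_sort(t):
    # insertion sorting: place each element among its sorted predecessors
    for i in range(1, len(t)):
        x = t[i]
        j = i - 1
        while j >= 0 and x < t[j]:
            t[j + 1] = t[j]
            j -= 1
        t[j + 1] = x

def select_sort(t):
    # selection sorting: repeatedly bring the smallest remaining element forward
    for i in range(len(t) - 1):
        minj, minx = i, t[i]
        for j in range(i + 1, len(t)):
            if t[j] < minx:
                minj, minx = j, t[j]
        t[minj] = t[i]
        t[i] = minx

u = [3, 1, 4, 1, 5, 9, 2, 6]
v = u[:]
insert_sort(u); select_sort(v)
print(u, v)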
Let U and V be two arrays of n elements, such that U is already sorted in
ascending order, whereas V is sorted in descending order. Problem 2.9 shows that
both these algorithms take more time on V than on U. In fact, array V represents the
worst possible case for these two algorithms: no array of n elements requires more
work. Nonetheless, the time required by the selection sort algorithm is not very
sensitive to the original order of the array to be sorted: the test "if T[j] < minx" is
executed exactly the same number of times in every case. The variation in execution
time is only due to the number of times the assignments in the then part of this test
are executed. When we programmed this algorithm and tested it on a machine,
we found that the time required to sort a given number of elements did not vary
by more than 15% whatever the initial order of the elements to be sorted. As we
will show in Section 4.4, the time required by select(T) is quadratic, regardless of
the initial order of the elements.
The situation is different if we compare the times taken by the insertion sort
algorithm on the same two arrays. Because the condition controlling the while
loop is always false at the outset, insert(U) is very fast, taking linear time. On the
other hand, insert(V) takes quadratic time because the while loop is executed i - 1
times for each value of i; see Section 4.4 again. The variation in time between
these two instances is therefore considerable. Moreover, this variation increases
with the number of elements to be sorted. When we implemented the insertion
sort algorithm, we found that it took less than one-fifth of a second to sort an
array of 5000 elements already in ascending order, whereas it took three and a half
minutes-that is, a thousand times longer-to sort an array with the same number
of elements, this time initially in descending order.
If such large variations can occur, how can we talk about the time taken by
an algorithm solely in terms of the size of the instance to be solved? We usually
consider the worst case of the algorithm, that is, for each size of instance we only
consider those on which the algorithm requires the most time. This is why we said
in the preceding section that an algorithm must be able to solve every instance of
size n in not more than c t (n) seconds, for an appropriate constant c that depends
on the implementation, if it is to run in a time in the order of t (n): we implicitly
had the worst case in mind.
Worst-case analysis is appropriate for an algorithm whose response time is
critical. For example, if it is a question of controlling a nuclear power plant, it is
crucial to know an upper limit on the system's response time, regardless of the
particular instance to be solved. On the other hand, if an algorithm is to be used
many times on many different instances, it may be more important to know the
average execution time on instances of size n. We saw that the time taken by the
insertion sort algorithm varies between the order of n and the order of n². If we
can calculate the average time taken by the algorithm on the n! different ways
of initially ordering n distinct elements, we shall have an idea of the likely time
taken to sort an array initially in random order. We shall see in Section 4.5 that if
the n! initial permutations are equally likely, then this average time is also in the
order of n². Insertion sorting thus takes quadratic time both on the average and in
the worst case, although for some instances it can be much faster. In Section 7.4.2
we shall see another sorting algorithm that also takes quadratic time in the worst
case, but that requires only a time in the order of n log n on the average. Even
though this algorithm has a bad worst case-quadratic performance is slow for a
sorting algorithm-it is probably the fastest algorithm known on the average for
an in-place sorting method, that is, one that does not require additional storage.
It is usually harder to analyse the average behaviour of an algorithm than to
analyse its behaviour in the worst case. Furthermore, such an analysis of average
behaviour can be misleading if in fact the instances to be solved are not chosen
randomly when the algorithm is used in practice. For example, we stated above that
insertion sorting takes quadratic time on the average when all the n! possible initial
arrangements of the elements are equally probable. However in many applications
this condition may be unrealistic. If a sorting program is used to update a file, for
instance, it might mostly be asked to sort arrays whose elements are already nearly
in order, with just a few new elements out of place. In this case its average behaviour
on randomly chosen instances will be a poor guide to its real performance.
A useful analysis of the average behaviour of an algorithm therefore requires
some a priori knowledge of the distribution of the instances to be solved. This
is normally an unrealistic requirement. Especially when an algorithm is used as
an internal procedure in some more complex algorithm, it may be impractical
to estimate which instances it is most likely to encounter, and which will only
occur rarely. In Section 10.7, however, we shall see how this difficulty can be
circumvented for certain algorithms, and their behaviour made independent of
the specific instances to be solved.
In what follows we shall only be concerned with worst-case analyses unless
stated otherwise.
x ← T[1]
for i ← 2 to n do
    if T[i] < x then x ← T[i].
The example at the beginning of this section suggested that we can consider
addition and multiplication to be unit cost operations, since it assumed that the
time required for these operations could be bounded by a constant. In theory,
however, these operations are not elementary since the time needed to execute
them increases with the length of the operands. In practice, on the other hand, it
may be sensible to consider them as elementary operations so long as the operands
concerned are of a reasonable size in the instances we expect to encounter. Two
examples will illustrate what we mean.
function Sum(n)
    {Calculates the sum of the integers from 1 to n}
    sum ← 0
    for i ← 1 to n do sum ← sum + i
    return sum

function Fibonacci(n)
    {Calculates the n-th term of the Fibonacci sequence;
     see Section 1.6.4}
    i ← 1; j ← 0
    for k ← 1 to n do j ← i + j
                       i ← j − i
    return j
In the algorithm called Sum the value of sum stays reasonable for all the in-
stances that the algorithm can realistically be expected to meet in practice. If we
are using a machine with 32-bit words, all the additions can be executed directly
provided n is no greater than 65 535. In theory, however, the algorithm should
work for all possible values of n. No real machine can in fact execute these ad-
ditions at unit cost if n is chosen sufficiently large. The analysis of the algorithm
must therefore depend on its intended domain of application.
The situation is different in the case of Fibonacci. Here it suffices to take n = 47
to have the last addition "j ← i + j" cause arithmetic overflow on a 32-bit machine.
To hold the result corresponding to n = 65 535 we would need 45 496 bits, or more
than 1420 computer words. It is therefore not realistic, as a practical matter, to
consider that these additions can be carried out at unit cost. Rather, we must
attribute to them a cost proportional to the length of the operands concerned.
In Section 4.2.2 this algorithm is shown to take quadratic time, even though at
first glance its execution time appears to be linear.
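The growth described here is easy to observe directly; the following Python fragment computes the sequence with the same iterative scheme and reports how large the results get (Python integers do not overflow, so the point is only to measure the size):

def fib(n):
    i, j = 1, 0
    for _ in range(n):
        j = i + j
        i = j - i
    return j

print(fib(47), fib(47) > 2**31 - 1)   # 2971215073, larger than a signed 32-bit word
print(fib(65535).bit_length())        # 45496 bits, as mentioned above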
In the case of multiplication it may still be reasonable to consider this an ele-
mentary operation for sufficiently small operands. However it is easier to produce
large operands by repeated multiplication than by addition, so it is even more im-
portant to ensure that arithmetic operations do not overflow. Furthermore, when
the operands do start to get large, the time required to perform an addition grows
linearly with the size of the operands, but the time required to perform a multipli-
cation is believed to grow faster than this.
A similar problem can arise when we analyse algorithms involving real num-
bers if the required precision increases with the size of the instances to be solved.
One typical example of this phenomenon is the use of de Moivre's formula (see Prob-
lem 1.27) to calculate values in the Fibonacci sequence. This formula tells us
that f_n, the n-th term in the sequence, is approximately equal to φ^n/√5, where
φ = (1 + √5)/2 is the golden ratio. The approximation is good enough that we
can in principle obtain the exact value of fn by simply taking the nearest integer;
see Problem 2.23. However we saw above that 45 496 bits are required to represent
f65535 accurately. This means that we would have to calculate the approximation
with the same degree of accuracy to obtain the exact answer. Ordinary single
or double precision floating-point arithmetic, using one or two computer words,
would certainly not be accurate enough. In most practical situations, however,
the use of single or double precision floating-point arithmetic proves satisfactory,
despite the inevitable loss of precision. When this is so, it is reasonable to count
such arithmetic operations at unit cost.
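The loss of precision is easy to demonstrate. The small Python experiment below compares round(φ^n/√5), computed in ordinary double precision, with the exact value of f_n; the shortcut gives exact answers for small n and then starts to drift as n grows:

from math import sqrt

phi = (1 + sqrt(5)) / 2

def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

for n in [10, 40, 70, 75, 80, 90]:
    approx = round(phi**n / sqrt(5))
    exact = fib(n)
    print(n, approx == exact, exact - approx)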
To sum up, even deciding whether an instruction as apparently innocuous as
"j ← i + j" can be considered as elementary or not calls for the use of judgement.
an instance of size 10 will take you 10 seconds, and an instance of size 20 will still
require between one and two minutes. But now an instance of size 30 can be solved
in four and a half minutes, and in one day you can solve instances whose size is
greater than 200; with one year's computation you can almost reach size 1500. This
is illustrated by Figure 2.1.
Not only does the new algorithm offer a much greater improvement than the pur-
chase of new hardware, it will also, supposing you are able to afford both, make
such a purchase much more profitable. If you can use both your new algorithm
and a machine one hundred times faster than the old one, then you will be able to
solve instances four or five times bigger than with the new algorithm alone, in the
same length of time-the exact factor is 100. Compare this to the situation with
the old algorithm, where you could add 7 to the size of the instance; here you can
multiply the size of the instance by four or five. Nevertheless the new algorithm
should not be used uncritically on all instances of the problem, in particular on
rather small ones. We saw above that on the original machine the new algorithm
takes 10 seconds to solve an instance of size 10, which is one hundred times slower
than the old algorithm. The new algorithm is faster only for instances of size 20 or
greater. Naturally, it is possible to combine the two algorithms into a third one that
looks at the size of the instance to be solved before deciding which method to use.
2.7.2 Sorting
The sorting problem is of major importance in computer science, and in particular
in algorithmics. We are required to arrange in order a collection of n objects on
which a total orderingis defined. By this we mean that when we compare any two
objects in the collection, we know which should come first. For many kinds of
objects, this requirement is trivial: obviously 123 comes before 456 in numerical
order, and 1 August 1991 comes before 25 December 1995 in calendar order, so that
both integers and dates are totally ordered. For other, equally common objects,
though, defining a total order may not be so easy. For example, how do you
order two complex numbers? Does "general" come before or after "General" in
alphabetical order, or are they the same word (see Problem 2.16)? Neither of these
questions has an obvious answer, but until an answer is found the corresponding
objects cannot be sorted.
Sorting problems are often found inside more complex algorithms. We have
already seen two standard sorting algorithms in Section 2.4: insertion sorting and
selection sorting. Both these algorithms, as we saw, take quadratic time both in the
worst case and on the average. Although they are excellent when n is small, other
sorting algorithms are more efficient when n is large. Among others, we might
use Williams's heapsort algorithm (see Section 5.7), mergesort (see Section 7.4.1), or
Hoare's quicksort algorithm (see Section 7.4.2). All these algorithms take a time in
the order of n log n on the average; the first two take a time in the same order even
in the worst case.
To have a clearer idea of the practical difference between a time in the order of
n2 and a time in the order of n log n, we programmed the insertion sort algorithm
and quicksort on our local machine. The difference in efficiency between the two
algorithms is marginal when the number of elements to be sorted is small. Quicksort
is already almost twice as fast as insertion when sorting 50 elements, and three times
as fast when sorting 100 elements. To sort 1000 elements, insertion takes more than
three seconds, whereas quicksort requires less than one-fifth of a second. When we
have 5000 elements to sort, the inefficiency of insertion sorting becomes still more
pronounced: one and a half minutes are needed on average, compared to little more
than one second for quicksort. In 30 seconds, quicksort can handle 100 000 elements;
we estimate it would take nine and a half hours to do the same job using insertion
sorting.
We shall see in Chapter 12 that no sorting algorithm that proceeds by comparing
the elements to be sorted can be faster than the order of n log n, so that in this sense
heapsort, mergesort and quicksort are as fast as an algorithm can be (although quicksort
has a bad worst case). Of course their actual running times depend on the hidden
multiplicative constants in the definition of "the order of." Other, faster sorting
algorithms can be found in special cases, however. Suppose for instance that the
elements to be sorted are integers known to lie between 1 and 10 000. Then the
following algorithm can be used.
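The pseudocode for this algorithm does not survive in this transcription. The following Python sketch is a minimal rendering of the intended pigeon-hole sort, consistent with the emptying loop analysed in Section 4.3; the bound of 10 000 and the names U and T are assumptions.

    def pigeonhole_sort(T, bound=10_000):
        """Sort a list T of integers known to lie between 1 and 'bound'."""
        U = [0] * (bound + 1)          # U[k] counts how many times k appears in T
        for x in T:
            U[x] += 1
        i = 0
        for k in range(1, bound + 1):  # empty the pigeon-holes in increasing order
            while U[k] != 0:
                i += 1
                T[i - 1] = k           # Python lists are 0-indexed, hence i - 1
                U[k] -= 1
        return T

    print(pigeonhole_sort([3, 1, 4, 1, 5, 9, 2, 6]))   # [1, 1, 2, 3, 4, 5, 6, 9]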
Parallel sorting methods, which use many processors to carry out several com-
parisons simultaneously, allow us to go faster still. An example of a parallel sorting
algorithm is outlined in Chapter 11.
algorithm is some three times more efficient than the classic algorithm: they take
about 15 seconds and 40 seconds, respectively. The gain in efficiency continues to
increase as the size of the operands goes up.
More sophisticated algorithms exist, the fastest at present taking a time in the
order of n log n log log n to multiply two integers of size n. However these more
sophisticated algorithms are largely of theoretical interest; the hidden constants
involved are such that they only become competitive for much larger operands.
For "small" instances involving operands with only a few thousand decimal digits,
they are considerably slower than the algorithms mentioned above.
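For concreteness, here is a hedged Python sketch of a divide-and-conquer multiplication in the spirit of the algorithm of Section 7.1 (a Karatsuba-style split into three half-size products); it is an illustration, not the book's exact formulation, and Python's built-in multiplication is used only in the base case and to check the result.

    def dc_multiply(x, y):
        """Divide-and-conquer multiplication of nonnegative integers:
        three recursive half-size products instead of four."""
        if x < 10 or y < 10:
            return x * y
        half = max(len(str(x)), len(str(y))) // 2
        p = 10 ** half
        xh, xl = divmod(x, p)          # split each operand into high and low parts
        yh, yl = divmod(y, p)
        a = dc_multiply(xh, yh)
        b = dc_multiply(xl, yl)
        c = dc_multiply(xh + xl, yh + yl) - a - b   # the two cross terms together
        return a * p * p + c * p + b

    print(dc_multiply(981, 1234) == 981 * 1234)     # True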
function gcd(m, n)
    i ← min(m, n) + 1
    repeat i ← i − 1 until i divides both m and n exactly
    return i
The time taken by this algorithm is in the order of the difference between the
smaller of the two arguments and their greatest common divisor. When m and n
are of similar size and coprime, it therefore takes a time in the order of n. (Notice
that this is the value of the operand, not its size.)
A classic algorithm for calculating gcd ( m, n) consists of first factorizing m and
n, and then taking the product of the prime factors common to m and n, each prime
factor being raised to the lower of its powers in the two arguments. For example,
to calculate gcd(120,700) we first factorize 120 = 23 x 3 x 5 and 700 = 22 x 52 x 7.
The common factors of 120 and 700 are therefore 2 and 5, and their lower powers
are 2 and 1, respectively. The greatest common divisor of 120 and 700 is therefore
22 x 5 = 20.
Even though this algorithm is better than the one given previously, it requires
us to factorize m and n, an operation nobody knows how to do efficiently when
m and n are large; see Section 10.7.4. In fact there exists a much more efficient
algorithm for calculating greatest common divisors, known as Euclid's algorithm,
even though it can be traced back well before the time of the ancient Greeks.
function Euclid(m, n)
    while m > 0 do
        t ← m
        m ← n mod m
        n ← t
    return n
If we consider the arithmetic operations involved to have unit cost, this algo-
rithm takes a time in the order of the logarithm of its arguments-that is, in the
order of their size-even in the worst case; see Section 4.4. To be historically ex-
act, Euclid's original algorithm works using successive subtractions rather than by
calculating a modulo. In this form it is more than 3500 years old.
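Both forms of Euclid's algorithm are easy to transcribe into Python and to check against the worked example gcd(120, 700) = 20; the function names below are ours.

    def euclid(m, n):
        """Euclid's algorithm using the modulo operation."""
        while m > 0:
            m, n = n % m, m
        return n

    def euclid_subtract(m, n):
        """The historical form: repeated subtraction instead of modulo."""
        while m != n:
            if m > n:
                m, n = n, m          # keep m <= n
            n -= m
        return n

    print(euclid(120, 700), euclid_subtract(120, 700))   # 20 20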
f0 = 0; f1 = 1 and
fn = fn−1 + fn−2 for n ≥ 2.
As we saw, the sequence begins 0, 1, 1, 2, 3, 5, 8, 13, 21, 34 ... We also saw
de Moivre's formula
    fn = (1/√5) [φ^n − (−φ)^(−n)],
where φ = (1 + √5)/2 is the golden ratio, and we pointed out that the term (−φ)^(−n)
can be neglected when n is large. Hence the value of fn is in the order of φ^n, and
therefore the size of fn is in the order of n. However, de Moivre's formula is of little
immediate help in calculating fn exactly, since the larger n becomes, the greater is
the degree of precision required in the values of √5 and φ; see Section 2.5. On our
local machine, a single-precision computation produces an error for the first time
when calculating f66.
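The following sketch (an illustration, not taken from the text) reproduces this phenomenon with Python's double-precision floats: it compares the exact Fibonacci values, computed with unbounded integers, against the rounded value of φ^n/√5 and reports the first n at which they disagree. With 64-bit floats the failure typically occurs in the low seventies; the figure of f66 quoted above refers to single precision.

    import math

    def fib_exact(n):
        i, j = 1, 0
        for _ in range(n):
            i, j = j, i + j          # exact, using Python's unbounded integers
        return j

    def fib_rounded(n):
        phi = (1 + math.sqrt(5)) / 2
        return round(phi ** n / math.sqrt(5))   # de Moivre's formula, double precision

    n = 1
    while fib_rounded(n) == fib_exact(n):
        n += 1
    print("first disagreement at n =", n)       # typically in the low seventies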
The recursive algorithm obtained directly from the definition of the Fibonacci
sequence was given in Section 1.6.4 under the name Fibonacci.
function Fibrec(n)
if n < 2 then return n
else return Fibrec(n - 1)+Fibrec(n- 2)
This algorithm is very inefficient because it recalculates the same values many
times. For instance, to calculate Fibrec(5) we need the values of Fibrec(4) and
Fibrec(3); but Fibrec(4) also calls for the calculation of Fibrec(3). It is simple to check
that Fibrec(3) will be calculated twice, Fibrec(2) three times, Fibrec(1) five times,
and Fibrec(0) three times. (The numbers of calls of Fibrec(5), Fibrec(4), ..., Fibrec(1)
are thus 1, 1, 2, 3 and 5 respectively. It is no coincidence that this is the beginning
of the Fibonacci sequence; see Problem 2.24.)
In fact, the time required to calculate fn using this algorithm is in the order of
the value of fn itself, that is, in the order of φ^n. To see this, note that the recursive
calls only stop when Fibrec returns a value 0 or 1. Adding these intermediate results
to obtain the final result fn must take at least fn operations, and hence the complete
algorithm certainly takes a number of elementary operations at least in the order
of fn. This was proven formally by constructive induction in Section 1.6.4, together
with a proof that the number of operations required is not more than the order of
fn, provided the additions are counted at unit cost. The case when additions are
not counted at unit cost yields the same conclusion, as the more precise analysis
given in Section 4.2.3 shows.
To avoid wastefully calculating the same values over and over, it is natural to
proceed as in Section 2.5, where a different algorithm Fibonacci was introduced,
which we rename Fibiter for comparison with Fibrec.
function Fibiter(n)
    i ← 1; j ← 0
    for k ← 1 to n do j ← i + j
                      i ← j − i
    return j
This second algorithm takes a time in the order of n, assuming we count each
addition as an elementary operation. Figure 2.2, which shows some computation
times we observed in practice, illustrates the difference. To avoid the problems
caused by ever-longer operands, the computations reported in this figure were
carried out modulo 107, which is to say that we only computed the seven least
significant figures of the answer. The times for Fibrec when n > 50 were estimated
using the hybrid approach.
n         10         20         30         50         100
Fibrec    8 msec     1 sec      2 min      21 days    10^9 years
Fibiter   1/6 msec   1/3 msec   1/2 msec   3/4 msec   1 1/2 msec
Figure 2.2. Comparison of modulo 10^7 Fibonacci algorithms
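The experiment behind Figure 2.2 is easy to repeat in outline. The sketch below (ours, not the book's) implements both algorithms with the additions carried out modulo 10^7, as in the figure; absolute timings will of course depend on your machine, and Fibrec is only practical for small n.

    from time import perf_counter

    MOD = 10 ** 7                       # keep only the seven least significant figures

    def fibrec(n):
        if n < 2:
            return n
        return (fibrec(n - 1) + fibrec(n - 2)) % MOD

    def fibiter(n):
        i, j = 1, 0
        for _ in range(n):
            i, j = j % MOD, (i + j) % MOD
        return j

    for n in (10, 20, 30):
        start = perf_counter()
        r = fibrec(n)
        print(n, r, fibiter(n), f"{perf_counter() - start:.4f} s for Fibrec")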
classic algorithm took more than 26 minutes of computation, the "new" algorithm
was able to perform the same task in less than two and a half seconds.
Ironically it turned out that an efficient algorithm had already been published
in 1942 by Danielson and Lanczos, and all the necessary theoretical groundwork
for Danielson and Lanczos's algorithm had been published by Runge and Konig
in 1924. And if that were not sufficient, Gauss describes a similar algorithm in a
paper written around 1805 and published posthumously in 1866!
2.9 Problems
Problem 2.1. Find a more practical algorithm for calculating the date of Easter
than the one given in Problem 1.2. What will be the date of Easter in the year 2000?
What is the domain of definition of your algorithm?
Problem 2.2. In the game of chess, your opponent's pieces and moves are all
visible. We say that chess is a game with complete information. In games such as
bridge or poker, however, you do not know how the cards have been dealt. Does
this make it impossible to define an algorithm for playing good bridge or poker?
Would such an algorithm necessarily be probabilistic?
Problem 2.3. What about backgammon? Here you can see your opponent's pieces and moves,
but you cannot predict how the dice will fall on future throws. Does this alter the
situation?
Problem 2.4. Using the technique called "virtual memory", it is possible to free
a programmer from most worries about the actual size of the storage available on
his machine. Does this mean that the quantity of storage used by an algorithm is
never of interest in practice? Justify your answer.
Problem 2.5. Suppose you measure the performance of a program, perhaps using
some kind of run-time trace, and then you optimize the heavily-used parts of the
code. However, you are careful not to change the underlying algorithm. Would
you expect to obtain (a) a gain in efficiency by a constant factor, whatever the
problem being solved, or (b) a gain in efficiency that gets proportionally greater as
the problem size increases? Justify your answer.
Problem 2.6. A sorting algorithm takes 1 second to sort 1000 items on your local
machine. How long would you expect it to take to sort 10 000 items (a) if you
believe that the algorithm takes a time roughly proportional to n2 , and (b) if you
believe that the algorithm takes a time roughly proportional to n log n?
Problem 2.7. Two algorithms take n2 days and n3 seconds respectively to solve an
instance of size n. Show that it is only on instances requiring more than 20 million
years to solve that the quadratic algorithm outperforms the cubic algorithm.
Problem 2.8. Two algorithms take n2 days and 2n seconds respectively to solve
an instance of size n. What is the size of the smallest instance on which the former
algorithm outperforms the latter? Approximately how long does such an instance
take to solve?
Problem 2.9. Simulate both the insertion sort and the selection sort algorithms of
Section 2.4 on the following two arrays: U = [1, 2,3,4,5,6] and V = [6,5,4,3,2,1].
Does insertion sorting run faster on the array U or the array V? And selection
sorting? Justify your answers.
Problem 2.10. Suppose you try to "sort" an array W = [1,1,1,1,1,1] all of whose
elements are equal, using (a) insertion sorting and (b) selection sorting. How does
this compare to sorting the arrays U and V of the previous problem?
Problem 2.11. You are required to sort a file containing integers between 0 and
999 999. You cannot afford to use one million pigeon-holes, so you decide instead
to use one thousand pigeon-holes numbered from 0 to 999. You begin the sort by
putting each integer into the pigeon-hole corresponding to its first three figures.
Next you use insertion sorting one thousand times to sort the contents of each
pigeon-hole separately, and finally you empty the pigeon-holes in order to obtain
a completely sorted sequence.
Would you expect this technique to be faster, slower or the same as simply using
insertion sorting on the whole sequence (a) on the average, and (b) in the worst
case? Justify your answers.
Problem 2.12. Is it reasonable, as a practical matter, to consider division as an
elementary operation (a) always, (b) sometimes, or (c) never? Justify your answer.
If you think it necessary, you may treat the division of integers and the division of
real numbers separately.
Problem 2.13. Suppose n is an integer variable in a program you are writing.
Consider the instruction x ← sin(n), where n may be supposed to be in radians.
As a practical matter, would you regard the execution of this instruction as an
elementary operation (a) always, (b) sometimes, or (c) never? Justify your answer.
What about the instruction x ← sin(nπ)?
Problem 2.14. In Section 2.5, we saw that Wilson's theorem could be used to
test any number for primality in constant time if factorials and tests for integer
divisibility were counted at unit cost, regardless of the size of the numbers involved.
Clearly this would be unreasonable.
Use Wilson's theorem together with Newton's binomial theorem to design an algo-
rithm capable of deciding in a time in the order of log n whether or not an integer
n is prime, provided additions, multiplications and tests for integer divisibility are
counted at unit cost, but factorials and exponentials are not. The point of this exer-
cise is not to provide a useful algorithm, but to demonstrate that it is unreasonable
to consider multiplications as elementary operations in general.
Problem 2.17. Show that pigeon-hole sorting takes a time in the order of n to sort
n elements that are within bounds.
Problem 2.18. In Section 2.7.3 we said that the analysis of algorithms for large
integers is not affected by the choice of a measure for the size of the operands: the
number of computer words needed, or the length of their representation in decimal
or binary will do equally well. Show that this remark would in fact be false were
we considering exponential-time algorithms.
Problem 2.19. How much time is required to add or subtract two large integers
of size m and n respectively? Sketch the appropriate algorithm.
Problem 2.20. How much time is required to multiply two large integers of size
m and n, respectively, using multiplication a la russe (a) if the smaller operand is
in the left-hand column, and (b) if the larger operand is in the left-hand column?
Of course you should not take addition, doubling and halving to be elementary
operations in this problem.
Problem 2.21. How much time is required to multiply two large integers of size
m and n, respectively, using multiplication round a rectangle (see Problem 1.9).
Problem 2.22. Calculate gcd(606, 979) (a) by factorizing 606 and 979, and pick-
ing out the common factors to the appropriate power, and (b) using Euclid's
algorithm.
Problem 2.23. Use de Moivre's formula for fn to show that fn is the nearest
integer to φ^n/√5 for all n ≥ 1.
Problem 2.24. Show that when calculating fn using Fibrec from Section 2.7.5, there
are in all fn−i+1 calls of Fibrec(i) for i = 1, 2, ..., n, and fn−1 calls of Fibrec(0).
Problem 2.25. Let g(n) be the number of ways to write a string of n zeros and
ones so that there are never two successive zeros. For example, when n = 1 the
possible strings are 0 and 1, so g(1)= 2; when n = 2 the possible strings are 01, 11
and 10, so g(2)= 3; when n = 3 the possible strings are 010, 011, 101, 110 and 111,
so g(3)= 5. Show that g(n)= fn+2.
Asymptotic Notation
3.1 Introduction
An important aspect of this book concerns determining the efficiency of algorithms.
In Section 2.3, we saw that this may, for instance, help us choose among several
competing algorithms. Recall that we wish to determine mathematically the quan-
tity of resources needed by an algorithm as a function of the size (or occasionally
of the value) of the instances considered. Because there is no standard computer to
which all measurements of computing time might refer, we saw also in Section 2.3
that we shall be content to express the time taken by an algorithm to within a mul-
tiplicative constant. To this end, we now introduce formally the asymptotic notation
used throughout the book. In addition, this notation permits substantial simplifi-
cations even when we are interested in measuring something more tangible than
computing time, such as the number of times a given instruction in a program is
executed.
This notation is called "asymptotic" because it deals with the behaviour of func-
tions in the limit, that is for sufficiently large values of its parameter. Accordingly,
arguments based on asymptotic notation may fail to have practical value when
the parameter takes "real-life" values. Nevertheless, the teachings of asymptotic
notation are usually of significant relevance. This is because, as a rule of thumb, an
asymptotically superior algorithm is very often (albeit not always) preferable even
on instances of moderate size.
    t(n) ≤ (42 + 16/113) n² = (42 + 16/113) f(n).
Even though it is natural to use the set-theoretic symbol "∈" to denote the
fact that n² is in the order of n³, as in "n² ∈ O(n³)", be warned that the traditional
notation for this is n² = O(n³). Therefore, do not be surprised if you find these so-
called "one-way equalities" (for one would never write O(n³) = n²) in other books
or scientific papers. Those using one-way equalities say that n² is of (or sometimes
on) the order of n³, or simply n² is O(n³). Another significant difference you may
encounter in the definition of the O notation is that some writers allow O(f(n))
to include functions from the natural numbers to the entire set of reals-including
negative reals-and define a function to be in O(f(n)) if its absolute value is in what
we call O(f(n)).
For convenience, we allow ourselves to misuse the notation from time to time
(as well as other notation introduced later in this chapter). For instance, we may
say that t(n) is in the order of f(n) even if t(n) is negative or undefined for a
finite number of values of n. Similarly, we may talk about the order of f(n) even
if f(n) is negative or undefined for a finite number of values of n. We shall say
that t(n) ∈ O(f(n)) if there is a positive real constant c and integer threshold n₀
such that both t(n) and f(n) are well-defined and 0 ≤ t(n) ≤ cf(n) whenever
n ≥ n₀, regardless of what happens to these functions when n < n₀. For example,
it is allowable to talk about the order of n/log n, even though this function is not
defined when n = 0 or n = 1, and it is correct to write n³ − 3n² − n − 8 ∈ O(n³)
even though n³ − 3n² − n − 8 < 0 when n ≤ 3.
The threshold n₀ is often useful to simplify arguments, but it is never neces-
sary when we consider strictly positive functions. Let f, t : ℕ → ℝ⁺ be two func-
tions from the natural numbers to the strictly positive reals. The threshold rule
states that t(n) ∈ O(f(n)) if and only if there exists a positive real constant c such
that t(n) ≤ cf(n) for each natural number n. One direction of the rule is obvious
since any property that is true for each natural number is also true for each suffi-
ciently large integer (simply take n₀ = 0 as the threshold). Assume for the other
direction that t(n) ∈ O(f(n)). Let c and n₀ be the relevant constants such that
t(n) ≤ cf(n) whenever n ≥ n₀. Assume n₀ > 0 since otherwise there is nothing
to prove. Let b = max{t(n)/f(n) | 0 ≤ n < n₀} be the largest value taken by the
ratio of t and f on natural numbers smaller than n₀ (this definition makes sense
precisely because f(n) cannot be zero and n₀ > 0). By definition of the maxi-
mum, b ≥ t(n)/f(n), and therefore t(n) ≤ bf(n), whenever 0 ≤ n < n₀. We al-
ready know that t(n) ≤ cf(n) whenever n ≥ n₀. Therefore t(n) ≤ af(n) for each
natural number n, as we had to prove, provided we choose a at least as large as
both b and c, such as a = max(b, c). The threshold rule can be generalized to say
that if t(n) ∈ O(f(n)) and if f(n) is strictly positive for all n ≥ n₀ for some n₀,
then this n₀ can be used as the threshold for the O notation: there exists a positive
real constant c such that t(n) ≤ cf(n) for all n ≥ n₀.
A useful tool for proving that one function is in the order of another is the
maximum rule. Let f, g : ℕ → ℝ≥0 be two arbitrary functions from the natural
numbers to the nonnegative reals. The maximum rule says that O(f(n)+g(n)) =
O(max(f(n), g(n))). More specifically, let p, q : ℕ → ℝ≥0 be defined for each nat-
ural number n by p(n) = f(n)+g(n) and q(n) = max(f(n), g(n)), and consider
an arbitrary function t : ℕ → ℝ≥0. The maximum rule says that t(n) ∈ O(p(n)) if
and only if t(n) ∈ O(q(n)). This rule generalizes to any finite constant number
of functions. Before proving it, we illustrate a natural interpretation of this rule.
Consider an algorithm that proceeds in three steps: initialization, processing and
finalization. Assume for the sake of argument that these steps take time in O(n²),
O(n³) and O(n log n), respectively. It is therefore clear (see Section 4.2) that the
complete algorithm takes a time in O(n² + n³ + n log n). Although it would not
be hard to prove directly that this is the same as O(n³), it is immediate from the
maximum rule.
    O(n² + n³ + n log n) = O(max(n², n³, n log n)) = O(n³)
In other words, even though the time taken by an algorithm is logically the sum of
the times taken by its disjoint parts, it is in the order of the time taken by its most
time-consuming part, provided the number of parts is a constant, independent of
input size.
We now prove the maximum rule for the case of two functions. The general
case of any fixed number of functions is left as an exercise. Observe that
    f(n) + g(n) = min(f(n), g(n)) + max(f(n), g(n))
and
    0 ≤ min(f(n), g(n)) ≤ max(f(n), g(n)).
It follows that
where the second line is obtained from the maximum rule. This does not use the
maximum rule properly since the function −5n² is negative. Nevertheless, the
following reasoning is correct:
Even though n³ log n − 5n² is negative and 36 is larger than 11n³ log n for small
values of n, all is well since this does not occur whenever n is sufficiently large.
Another useful observation is that it is usually unnecessary to specify the base of
a logarithm inside an asymptotic notation. This is because log_a n = log_a b × log_b n
for all positive reals a, b and n such that neither a nor b is equal to 1. The point
is that log_a b is a positive constant when a and b are constants larger than 1.
Therefore log_a n and log_b n differ only by a constant multiplicative factor. From
this, it is elementary to prove that O(log_a n) = O(log_b n), which we usually sim-
plify as O(log n). This observation also applies to more complicated functions,
such as positive polynomials involving n and log n, and ratios of such polyno-
mials. For instance, O(n lg n) is the same as the more usual O(n log n), and
O(n² / (log₃ n √(n lg n))) is the same as O((n/log n)^1.5). However, the base of
the logarithm cannot be ignored when it is smaller than 1, when it is not a con-
stant, as in O(log_n n) ≠ O(log n), or when the logarithm is in the exponent, as in
O(2^lg n) ≠ O(2^log n).
It is easy to prove that the notation "∈ O" is reflexive and transitive. In
other words, f(n) ∈ O(f(n)) for any function f : ℕ → ℝ≥0, and it follows
from f(n) ∈ O(g(n)) and g(n) ∈ O(h(n)) that f(n) ∈ O(h(n)) for any functions
f, g, h : ℕ → ℝ≥0; see Problems 3.9 and 3.10. As a result, this notation provides a
way to define a partial order on functions and consequently on the relative efficiency
of different algorithms to solve a given problem; see Problem 3.21. However, the
induced order is not total as there exist functions f, g : ℕ → ℝ≥0 such that neither
f(n) ∈ O(g(n)) nor g(n) ∈ O(f(n)); see Problem 3.11.
We have seen several examples of functions t (n) and f (n) for which it is easy
to prove that t(n)E 0(f (n)). For this, it suffices to exhibit the appropriate con-
stants c and no and show that the desired relation holds. How could we go about
proving that a given function t(n) is not in the order of another function f(n)? The
simplest way is to use a proof by contradiction. The following example makes the
case. Let t(n) = n³/1000 and f(n) = 1000n². If you try several values of n smaller
than one million, you find that t(n) ≤ f(n), which may lead you to believe by in-
duction that t(n) ∈ O(f(n)), taking 1 as the multiplicative constant. If you attempt
to prove this, however, you are likely to end up with the following proof by contra-
diction that it is false. To prove that t(n) ∉ O(f(n)), assume for a contradiction that
t(n) ∈ O(f(n)). Using the generalized threshold rule, this implies the existence of
a positive real constant c such that t(n) ≤ cf(n) for all n ≥ 1. But t(n) ≤ cf(n)
means n³/1000 ≤ 1000cn², which implies that n ≤ 10⁶c. In other words, assuming
t(n) ∈ O(f(n)) implies that every positive integer n is smaller than some fixed
constant, which is clearly false.
The most powerful and versatile tool both to prove that some functions are in
the order of others and to prove the opposite is the limit rule, which states that,
given arbitrary functions f and g : ℕ → ℝ≥0,
    1. if lim_{n→∞} f(n)/g(n) ∈ ℝ⁺ then f(n) ∈ O(g(n)) and g(n) ∈ O(f(n));
    2. if lim_{n→∞} f(n)/g(n) = 0 then f(n) ∈ O(g(n)) but g(n) ∉ O(f(n)); and
    3. if lim_{n→∞} f(n)/g(n) = +∞ then f(n) ∉ O(g(n)) but g(n) ∈ O(f(n)).
We illustrate the use of this rule before proving it. Consider the two functions
f(n) = log n and g(n) = √n. We wish to determine the relative order of these
functions. Since both f(n) and g(n) tend to infinity as n tends to infinity, we use
de l'Hôpital's rule to compute
    lim_{n→∞} f(n)/g(n) = lim_{n→∞} (log n)/√n = lim_{n→∞} (1/n)/(1/(2√n)) = lim_{n→∞} 2/√n = 0.
Now the limit rule immediately shows that log n ∈ O(√n) whereas √n ∉ O(log n).
In other words, √n grows asymptotically faster than log n. We now prove the limit
rule.
    g(n) ≤ cf(n),
and therefore 1/c ≤ f(n)/g(n), for all sufficiently large n. Since
    lim_{n→∞} f(n)/g(n)
exists by assumption (it equals 0) and lim_{n→∞} 1/c = 1/c exists as well,
Proposition 1.7.8 applies to conclude that 1/c ≤ lim_{n→∞} f(n)/g(n) = 0, which
is a contradiction since c > 0. We have contradicted the assumption that
g(n) ∈ O(f(n)) and therefore established as required that g(n) ∉ O(f(n)).
3. Assume lim_{n→∞} f(n)/g(n) = +∞. This implies that lim_{n→∞} g(n)/f(n) = 0,
and therefore the previous case applies mutatis mutandis with f(n) and g(n)
interchanged.
The converse of the limit rule does not necessarily apply: it is not always the case
that lim_{n→∞} f(n)/g(n) ∈ ℝ⁺ when f(n) ∈ O(g(n)) and g(n) ∈ O(f(n)). Although it
does follow that the limit is strictly positive if it exists, the problem is that it may
not exist. Consider for instance f(n) = n and g(n) = 2^⌊lg n⌋. It is easy to see that
g(n) ≤ f(n) < 2g(n) for all n ≥ 1, and thus f(n) and g(n) are each in the order
of the other. However, it is equally easy to see that f(n)/g(n) oscillates between
1 and 2, and thus the limit of that ratio does not exist.
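A quick numeric check, added here as an illustration, makes the oscillation visible: the ratio n / 2^⌊lg n⌋ equals 1 at every power of 2 and approaches 2 just below the next one, so it has no limit.

    from math import floor, log2

    def g(n):
        return 2 ** floor(log2(n))     # largest power of 2 not exceeding n

    for n in [4, 6, 7, 8, 12, 15, 16, 24, 31, 32]:
        print(n, n / g(n))             # oscillates between 1 and just under 2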
Consider again two functions f, t : ℕ → ℝ≥0 from the natural numbers to the
nonnegative reals. We say that t(n) is in Omega of f(n), denoted t(n) ∈ Ω(f(n)),
if t(n) is bounded below by a positive real multiple of f(n) for all sufficiently
large n. Mathematically, this means that there exist a positive real constant d and
an integer threshold n₀ such that t(n) ≥ df(n) whenever n ≥ n₀.
It is easy to see the duality rule: t(n) ∈ Ω(f(n)) if and only if f(n) ∈ O(t(n))
because t(n) ≥ df(n) if and only if f(n) ≤ (1/d) t(n). You may therefore question the
usefulness of introducing a notation for Ω when the O notation seems to have the
same expressive power. The reason is that it is more natural to say that a given
algorithm takes a time in Ω(n²) than to say the mathematically equivalent but
clumsy "n² is in O of the time taken by the algorithm".
Thanks to the duality rule, we know from the previous section that √n ∈
Ω(log n) whereas log n ∉ Ω(√n), among many examples. More importantly,
the duality rule can be used in the obvious way to turn the limit rule, the max-
imum rule and the threshold rule into rules about the Ω notation.
Despite strong similarity between the 0 and the Q notation, there is one aspect
in which their duality fails. Recall that we are most often interested in the worst-
case performance of algorithms. Therefore, when we say that an implementation
of an algorithm takes t(n) microseconds, we mean that t(n) is the maximum time
taken by the implementation on all instances of size n. Let f (n) be such that
t(n)E 0(f(n)). This means that there exists a real positive constant c such that
t(n) < cf(n) for all sufficiently large n. Because no instance of size n can take
more time than the maximum time taken by instances of that size, it follows that
the implementation takes a time bounded by cf (n) microseconds on all sufficiently
large instances. Assuming only a finite number of instances of each size exist, there
can thus be only a finite number of instances, all of size less than the threshold, on
which the implementation takes a time greater than cf (n) microseconds. Assum-
ing f (n) is never zero, these can all be taken care of by using a bigger multiplicative
constant, as in the proof of the threshold rule.
In contrast, let us also assume that t(n) ∈ Ω(f(n)). Again, this means that there
exists a real positive constant d such that t(n) ≥ df(n) for all sufficiently large n.
But because t (n) denotes the worst-case behaviour of the implementation, we may
infer only that, for each sufficiently large n, there exists at least one instance of size n
such that the implementation takes at least df (n) microseconds on that instance.
This does not rule out the possibility of much faster behaviour on other instances
of the same size. Thus, there may exist infinitely many instances on which the
implementation takes less than df(n) microseconds. Insertion sort provides a
typical example of this behaviour. We saw in Section 2.4 that it takes quadratic
time in the worst case, yet there are infinitely many instances on which it runs in
linear time. We are therefore entitled to claim that its worst-case running time is
both in O(n²) and in Ω(n²). Yet the first claim says that every sufficiently large
instance can be sorted in quadratic time, whereas the second merely says that at
least one instance of each sufficiently large size genuinely requires quadratic time:
the algorithm may be much faster on other instances of the same size.
Some authors define the Q notation in a way that is subtly but importantly
different. They say that t(n) ∈ Ω(f(n)) if there exists a real positive constant d
such that t(n) ≥ df(n) for an infinite number of values of n, whereas we require
the relation to hold for all but finitely many values of n. With this definition,
an algorithm that takes a time in Ω(f(n)) in the worst case is such that there
are infinitely many instances on which it takes at least df(n) microseconds for the
appropriate real positive constant d. This corresponds more closely to our intuitive
idea of what a lower bound on the performance of an algorithm should look like.
It is more natural than what we mean by "taking a time in Ω(f(n))". Nevertheless,
we prefer our definition because it is easier to work with. In particular, the modified
definition of Q is not transitive and the duality rule breaks down.
In this book, we use the Q notation mainly to give lower bounds on the running
time (or other resources) of algorithms. However, this notation is often used to give
lower bounds on the intrinsic difficulty of solving problems. For instance, we shall
see in Section 12.2.1 that any algorithm that successfully sorts n elements must take
a time in Ω(n log n), provided the only operation carried out on the elements to be
sorted is to compare them pairwise to determine whether they are equal and, if not,
which is the greater. As a result, we say that the problem of sorting by comparisons
has running time complexity in Ω(n log n). It is in general much harder to determine
the complexity of a problem than to determine a lower bound on the running time
of a given algorithm that solves it. We elaborate on this topic in Chapter 12.
The threshold rule and the maximum rule, which we formulated in the context
of the O notation, apply mutatis mutandis to the Θ notation. More interestingly,
for the Θ notation the limit rule is reformulated as follows. Consider arbitrary
functions f and g : ℕ → ℝ≥0.
    1. if lim_{n→∞} f(n)/g(n) ∈ ℝ⁺ then f(n) ∈ Θ(g(n));
    2. if lim_{n→∞} f(n)/g(n) = 0 then f(n) ∈ O(g(n)) but f(n) ∉ Θ(g(n)); and
    3. if lim_{n→∞} f(n)/g(n) = +∞ then f(n) ∈ Ω(g(n)) but f(n) ∉ Θ(g(n)).
As an exercise in manipulating asymptotic notation, let us now prove a useful fact:
    ∑_{i=1}^{n} i^k ∈ Θ(n^(k+1)),
for any fixed integer k > 0, where the left-hand side summation is considered as
a function of n. Of course, this is immediate from Proposition 1.7.16, but it is
instructive to prove it directly.
The "O" direction is easy to prove. For this, simply notice that i^k ≤ n^k when-
ever 1 ≤ i ≤ n. Therefore ∑_{i=1}^{n} i^k ≤ n × n^k = n^(k+1) for all n ≥ 1, which proves
∑_{i=1}^{n} i^k ∈ O(n^(k+1)) using 1 as the multiplicative constant.
To prove the "Ω" direction, notice that i^k ≥ (n/2)^k whenever i ≥ ⌈n/2⌉ and
that the number of integers between ⌈n/2⌉ and n inclusive is greater than n/2.
Therefore, provided n ≥ 1 (which implies that ⌈n/2⌉ ≥ 1),
    ∑_{i=1}^{n} i^k ≥ ∑_{i=⌈n/2⌉}^{n} i^k ≥ (n/2) × (n/2)^k = n^(k+1)/2^(k+1).
    t(n) = a if n = 1, and t(n) = 4t(⌈n/2⌉) + bn otherwise.            (3.2)
We study techniques for solving recurrences in Section 4.7, but unfortunately Equa-
tion 3.2 cannot be handled directly by those techniques because the ceiling func-
tion ⌈n/2⌉ is troublesome. Nevertheless, our recurrence is easy to solve provided
we consider only the case when n is a power of 2: in this case ⌈n/2⌉ = n/2 and the
ceiling vanishes. The techniques of Section 4.7 yield
    t(n) = (a + b)n² − bn
provided n is a power of 2. Since the lower-order term "bn" can be neglected, it
follows that t(n) is in the exact order of n², still provided n is a power of 2. This
is denoted by t(n) ∈ Θ(n² | n is a power of 2).
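The claimed solution can be checked numerically. The sketch below, with the arbitrary assumed values a = 3 and b = 5, computes t(n) directly from recurrence 3.2 and compares it with (a + b)n² − bn at powers of 2.

    from functools import lru_cache
    from math import ceil

    a, b = 3, 5                         # arbitrary positive constants, assumed for the test

    @lru_cache(maxsize=None)
    def t(n):
        if n == 1:
            return a
        return 4 * t(ceil(n / 2)) + b * n   # recurrence 3.2

    for k in range(8):
        n = 2 ** k
        print(n, t(n), (a + b) * n * n - b * n)   # the two columns agree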
More generally, let f, t : ℕ → ℝ≥0 be two functions from the natural numbers
to the nonnegative reals, and let P : ℕ → {true, false} be a property of the integers.
We say that t(n) is in O(f(n) | P(n)) if t(n) is bounded above by a positive
real multiple of f(n) for all sufficiently large n such that P(n) holds. Formally,
O(f(n) | P(n)) is defined as
    {t : ℕ → ℝ≥0 | (∃c ∈ ℝ⁺)(∃n₀ ∈ ℕ)(∀n ≥ n₀)[P(n) ⇒ t(n) ≤ cf(n)]}.
The sets Ω(f(n) | P(n)) and Θ(f(n) | P(n)) are defined similarly. Abusing nota-
tion in a familiar way, we write t(n) ∈ O(f(n) | P(n)) even if t(n) and f(n) are
negative or undefined on an arbitrary-perhaps infinite-number of values of n
on which P(n) does not hold.
Conditional asymptotic notation is more than a mere notational convenience:
its main interest is that it can generally be eliminated once it has been used to
facilitate the analysis of an algorithm. For this, we need a few definitions. A func-
tion f : ℕ → ℝ≥0 is eventually nondecreasing if there exists an integer threshold n₀
such that f(n) ≤ f(n + 1) for all n ≥ n₀. This implies by mathematical induction
that f(n) ≤ f(m) whenever m ≥ n ≥ n₀. Let b ≥ 2 be any integer. Function f is
b-smooth if, in addition to being eventually nondecreasing, it satisfies the condition
f(bn) ∈ O(f(n)). In other words, there must exist a constant c (depending on b)
such that f(bn) ≤ cf(n) for all n ≥ n₀. (There is no loss of generality in using the
same threshold n₀ for both purposes.) A function is smooth if it is b-smooth for
every integer b ≥ 2.
Most functions you are likely to encounter in the analysis of algorithms are
smooth, such as log n, n log n, n², or any polynomial whose leading coefficient
is positive. However, functions that grow too fast, such as n^(lg n), 2^n or n!, are not
smooth because the ratio f(2n)/f(n) is unbounded. For example, if f(n) = 2^n, then
f(2n) = 4^n = 2^n f(n), so that the ratio f(2n)/f(n) = 2^n is unbounded.
from Equation 3.2, whereas it is harder to carry out the analysis of t(n) when n is
not a power of 2. The smoothness rule allows us to infer directly from Equation 3.3
that t(n) ∈ Θ(n²), provided we verify that n² is a smooth function and t(n) is
eventually nondecreasing. The first condition is immediate since n² is obviously
nondecreasing and (2n)² = 4n². The second is easily demonstrated by mathemat-
ical induction from Equation 3.2; see Problem 3.28. Thus the use of conditional
asymptotic notation as a stepping stone yields the final result that t(n) ∈ Θ(n²)
unconditionally.
We now prove the smoothness rule. Let f(n) be a smooth function and let t(n)
be an eventually nondecreasing function such that t(n) ∈ Θ(f(n) | n is a power
of b) for some integer b ≥ 2. Let n₀ be the largest of the thresholds implied by the
above conditions: f(m) ≤ f(m + 1), t(m) ≤ t(m + 1) and f(bm) ≤ cf(m) when-
ever m ≥ n₀, and df(m) ≤ t(m) ≤ af(m) whenever m ≥ n₀ is a power of b, for
appropriate constants a, c and d. For any positive integer n, let n⁻ denote the largest
power of b not larger than n (formally, n⁻ = b^⌊log_b n⌋) and let n⁺ = bn⁻. By definition,
n/b < n⁻ ≤ n < n⁺ and n⁺ is a power of b. Consider any n ≥ max(1, bn₀).
    t(n) ≤ t(n⁺) ≤ af(n⁺) = af(bn⁻) ≤ acf(n⁻) ≤ acf(n)
This equation uses successively the facts that t is eventually nondecreasing (and
n⁺ ≥ n ≥ n₀), t(m) is in the order of f(m) when m is a power of b (and n⁺ is a
power of b no smaller than n₀), n⁺ = bn⁻, f is b-smooth (and n⁻ ≥ n/b ≥ n₀), and
f is eventually nondecreasing (and n ≥ n⁻ ≥ n₀). This proves that t(n) ≤ acf(n)
for all values of n ≥ max(1, bn₀), and therefore t(n) ∈ O(f(n)). The proof that
t(n) ∈ Ω(f(n)) is similar.
(There is no need for two distinct thresholds for m and n in ∀m, n ∈ ℕ.) Gener-
alization to more than two parameters, of the conditional asymptotic notation and
of the Ω and Θ notation, is done in the same spirit.
Asymptotic notation with several parameters is similar to what we have seen
so far, except for one essential difference: the threshold rule is no longer valid.
Indeed, the threshold is sometimes indispensable. This is explained by the fact that
while there are never more than a finite number of nonnegative integers smaller
than any given threshold, there are in general an infinite number of pairs (m, n)
of nonnegative integers such that at least one of m or n is below the threshold;
see Problem 3.32. For this reason, O(f(m, n)) may make sense even if f(m, n) is
negative or undefined on an infinite set of points, provided all these points can be
ruled out by an appropriate choice of threshold.
More formally, if op is any binary operator and if X and Y are sets of functions
from ℕ into ℝ≥0, in particular sets described by asymptotic notation, then "X op Y"
denotes the set of functions that can be obtained by choosing one function from
X and one function from Y, and by "oping" them together pointwise. In keeping
with the spirit of asymptotic notation, we only require the resulting function to be
the correct oped value beyond some threshold. Formally, X op Y denotes
3.7 Problems
Problem 3.1. Consider an implementation of an algorithm that takes a time that
is bounded above by the unlikely function
to solve an instance of size n. Find the simplest possible function f : ℕ → ℝ≥0 such
that the algorithm takes a time in the order of f (n).
Problem 3.2. Consider two algorithms A and B that take time in O(n²) and O(n³),
respectively, to solve the same problem. If other resources such as storage and
programming time are of no concern, is it necessarily the case that algorithm A is
always preferable to algorithm B? Justify your answer.
Problem 3.3. Consider two algorithms A and B that take time in O(n²) and O(n³),
respectively. Could there exist an implementation of algorithm B that would be
more efficient (in terms of computing time) than an implementation of algorithm
A on all instances? Justify your answer.
Problem 3.5. Which of the following statements are true? Prove your answers.
1. n² ∈ O(n³)
2. n² ∈ Ω(n)
3. 2^n ∈ Θ(2^(n+1))
4. n! ∈ O((n + 1)!)
Problem 3.7. In contrast with Problem 3.6, prove that 2^f(n) ∈ O(2^n) does not
necessarily follow from f(n) ∈ O(n).
Problem 3.8. Consider an algorithm that takes a time in O(n^lg 3) to solve instances
of size n. Is it correct to say that it takes a time in O(n^1.59)? In Ω(n^1.59)? In Θ(n^1.59)?
Justify your answers. (Note: lg 3 ≈ 1.58496...)
Problem 3.9. Prove that the O notation is reflexive: f(n) ∈ O(f(n)) for any func-
tion f : ℕ → ℝ≥0.
Problem 3.11. Prove that the ordering on functions induced by the O notation
is not total: give explicitly two functions f, g : ℕ → ℝ≥0 such that f(n) ∉ O(g(n))
and g(n) ∉ O(f(n)). Prove your answer.
Problem 3.12. Prove that the Ω notation is reflexive and transitive: for any func-
tions f, g, h : ℕ → ℝ≥0,
1. f(n) ∈ Ω(f(n))
2. if f(n) ∈ Ω(g(n)) and g(n) ∈ Ω(h(n)) then f(n) ∈ Ω(h(n)).
Rather than proving this directly (which would be easier!), use the duality rule and
the results of Problems 3.9 and 3.10.
Problem 3.13. As we explained, some authors give a different definition for the
Ω notation. For the purpose of this problem, let us say that t(n) ∈ Ω(f(n)) if there
exists a real positive constant d such that t(n) ≥ df(n) for an infinite number of
values of n. Formally,
Prove that this notation is not transitive. Specifically, give an explicit example of
three functions f, g, h : ℕ → ℝ≥0 such that f(n) ∈ Ω(g(n)) and g(n) ∈ Ω(h(n)), yet
f(n) ∉ Ω(h(n)).
Problem 3.14. Let f (n)= n2 . Find the error in the following "proof" by mathe-
matical induction that f (n) E 0 (n).
• Basis: The case n = 1 is trivially satisfied since f(1) = 1 ≤ cn, where c = 1.
• Induction step: Consider any n > 1. Assume by the induction hypothesis the
existence of a positive constant c such that f(n − 1) ≤ c(n − 1).
    f(n) = n² = (n − 1)² + 2n − 1 = f(n − 1) + 2n − 1
         ≤ c(n − 1) + 2n − 1 = (c + 2)n − c − 1 ≤ (c + 2)n
Problem 3.15. Find the error in the following "proof" that O(n) = O(n²). Let
f(n) = n², g(n) = n and h(n) = g(n) − f(n). It is clear that h(n) ≤ g(n) ≤ f(n)
for all n ≥ 0. Therefore, f(n) = max(f(n), h(n)). Using the maximum rule, we
conclude
Problem 3.16. Prove by mathematical induction that the maximum rule can be
applied to more than two functions. More precisely, let k be an integer and let
f1, f2, ..., fk be functions from ℕ to ℝ≥0. Define g(n) = max(f1(n), f2(n), ...,
fk(n)) and h(n) = f1(n) + f2(n) + ··· + fk(n) for all n ≥ 0. Prove that O(g(n)) =
O(h(n)).
Problem 3.17. Find the error in the following proof that O(n) = O(n²).
    O(n) = O(max(n, n, ..., n)) = O(n + n + ··· + n) = O(n²),
where there are n copies of n inside both the maximum and the sum, and the middle
equality comes from the generalized maximum rule that you proved in Problem 3.16.
Problem 3.18. Prove that the Θ notation is reflexive, symmetric and transitive: for
any functions f, g, h : ℕ → ℝ≥0,
1. f(n) ∈ Θ(f(n))
2. if f(n) ∈ Θ(g(n)) then g(n) ∈ Θ(f(n))
3. if f(n) ∈ Θ(g(n)) and g(n) ∈ Θ(h(n)) then f(n) ∈ Θ(h(n)).
Problem 3.19. For any functions f, g : ℕ → ℝ≥0, prove that
Problem 3.23. We saw at the end of Section 3.3 that ∑_{i=1}^{n} i^k ∈ Ω(n^(k+1)) for any fixed
integer k > 0 because ∑_{i=1}^{n} i^k ≥ n^(k+1)/2^(k+1) for all n. Use the idea behind Figure 1.8
in Section 1.7.2 to derive a tighter constant for the inequality: find a constant d
(depending on k) much larger than 1/2^(k+1) such that ∑_{i=1}^{n} i^k ≥ dn^(k+1) holds for
all n. Do not use Proposition 1.7.16.
Problem 3.24. Prove that log(n!) ∈ Θ(n log n). Do not use Stirling's formula.
Hint: Mimic the proof that ∑_{i=1}^{n} i^k ∈ Θ(n^(k+1)) given at the end of Section 3.3. Resist
the temptation to improve your reasoning along the lines of Problem 3.23.
Problem 3.27. Consider any b-smooth function f : ℕ → ℝ≥0. Let c and n₀ be
constants such that f(bn) ≤ cf(n) for all n ≥ n₀. Consider any positive integer i.
Prove by mathematical induction that f(b^i n) ≤ c^i f(n) for all n ≥ n₀.
Problem 3.35. Find a function f : ℕ → ℝ≥0 that is bounded above by some poly-
nomial, yet f(n) ∉ n^O(1).
Analysis of Algorithms
4.1 Introduction
The primary purpose of this book is to teach you how to design your own efficient
algorithms. However, if you are faced with several different algorithms to solve
the same problem, you have to decide which one is best suited for your applica-
tion. An essential tool for this purpose is the analysis of algorithms. Only after you
have determined the efficiency of the various algorithms will you be able to make a
well-informed decision. But there is no magic formula for analysing the efficiency
of algorithms. It is largely a matter of judgement, intuition and experience. Nev-
ertheless, there are some basic techniques that are often useful, such as knowing
how to deal with control structures and recurrence equations. This chapter covers
the most commonly used techniques and illustrates them with examples. More
examples are found throughout the book.
4.2.1 Sequencing
Let P1 and P2 be two fragments of an algorithm. They may be single instructions
or complicated subalgorithms. Let t1 and t2 be the times taken by P1 and P2,
respectively. These times may depend on various parameters, such as the instance
size. The sequencing rule says that the time required to compute "P1; P2", that
is first P1 and then P2, is simply t1 + t2. By the maximum rule, this time is in
Θ(max(t1, t2)).
Despite its simplicity, applying this rule is sometimes less obvious than it may
appear. For example, it could happen that one of the parameters that control t2
depends on the result of the computation performed by P1. Thus, the analysis of
"P1; P2" cannot always be performed by considering P1 and P2 independently.
Here and throughout the book, we adopt the convention that when m = 0 this is
not an error; it simply means that the controlled statement P(i) is not executed
at all. Suppose this loop is part of a larger algorithm, working on an instance
of size n. (Be careful not to confuse m and n.) The easiest case is when the
time taken by P(i) does not actually depend on i, although it could depend on
the instance size or, more generally, on the instance itself. Let t denote the time
required to compute P(i). In this case, the obvious analysis of the loop is that P (i)
is performed m times, each time at a cost of t, and thus the total time required by
the loop is simply ℓ = mt. Although this approach is usually adequate, there is a
potential pitfall: we did not take account of the time needed for loop control. After
all, our for loop is shorthand for something like the following while loop.
i ← 1
while i ≤ m do
    P(i)
    i ← i + 1
In most situations, it is reasonable to count at unit cost the test i < m, the in-
structions i ← 1 and i ← i + 1, and the sequencing operations (go to) implicit in
the while loop. Let c be an upper bound on the time required by each of these
operations. The time ℓ taken by the loop is thus bounded above by
    ℓ ≤ c                    for i ← 1
      + (m + 1)c             for the tests i ≤ m
      + mt                   for the executions of P(i)
      + mc                   for the executions of i ← i + 1
      + mc                   for the sequencing operations
      ≤ (t + 3c)m + 2c.
Moreover this time is clearly bounded below by mt. If c is negligible compared
to t, our previous estimate that ℓ is roughly equal to mt was therefore justified,
except for one crucial case: ℓ ≈ mt is completely wrong when m = 0 (it is even
worse if m is negative!). We shall see in Section 4.3 that neglecting the time required
for loop control can lead to serious errors in such circumstances.
Resist the temptation to say that the time taken by the loop is in Θ(mt) on
the pretext that the Θ notation is only asked to be effective beyond some threshold
such as m > 1. The problem with this argument is that if we are in fact analysing
the entire algorithm rather than simply the for loop, the threshold implied by the
0 notation concerns n, the instance size, rather than m, the number of times we go
round the loop, and m = 0 could happen for arbitrarily large values of n. On the
other hand, provided t is bounded below by some constant (which is always the case
in practice), and provided there exists a threshold no such that m >1 whenever
n ≥ n₀, Problem 4.3 asks you to show that ℓ is indeed in Θ(mt) when ℓ, m and t
are considered as functions of n.
The analysis of for loops is more interesting when the time t (i) required for
P(i) varies as a function of i. (In general, the time required for P(i) could depend
not only on i but also on the instance size n or even on the instance itself.) If we
neglect the time taken by the loop control, which is usually adequate provided
m > 1, the same for loop
for i - 1 to m do P(i)
takes a time given not by a multiplication but rather by a sum: it is ∑_{i=1}^{m} t(i).
The techniques of Section 1.7.2 are often useful to transform such sums into simpler
asymptotic notation.
We illustrate the analysis of for loops with a simple algorithm for computing
the Fibonacci sequence that we evaluated empirically in Section 2.7.5. We repeat
the algorithm below.
function Fibiter(n)
    i ← 1; j ← 0
    for k ← 1 to n do j ← i + j
                      i ← j − i
    return j
If we count all arithmetic operations at unit cost, the instructions inside the for loop
take constant time. Let the time taken by these instructions be bounded above by
some constant c. Not taking loop control into account, the time taken by the for loop
is bounded above by n times this constant: nc. Since the instructions before and
after the loop take negligible time, we conclude that the algorithm takes a time in
O(n). Similar reasoning yields that this time is also in Ω(n), hence it is in Θ(n).
We saw however in Section 2.5 that it is not reasonable to count the additions
involved in the computation of the Fibonacci sequence at unit cost unless n is very
small. Therefore, we should take account of the fact that an instruction as simple
as "j ← i + j" is increasingly expensive each time round the loop. It is easy to
program long-integer additions and subtractions so that the time needed to add
or subtract two integers is in the exact order of the number of figures in the larger
operand. To determine the time taken by the k-th trip round the loop, we need
to know the length of the integers involved. Problem 4.4 asks you to prove by
mathematical induction that the values of i and j at the end of the k-th iteration
are fk−1 and fk, respectively. This is precisely why the algorithm works: it returns
the value of j at the end of the n-th iteration, which is therefore fn, as required.
Moreover, we saw in Section 2.7.5 that de Moivre's formula tells us that the size of
fk is in Θ(k). Therefore, the k-th iteration takes a time in Θ(k − 1) + Θ(k), which is the
same as Θ(k); see Problem 3.34. Let c be a constant such that this time is bounded
above by ck for all k > 1. If we neglect the time required for the loop control and
for the instructions before and after the loop, we conclude that the time taken by
the algorithm is bounded above by
    ∑_{k=1}^{n} ck = c ∑_{k=1}^{n} k = c n(n + 1)/2 ∈ O(n²).
Similar reasoning yields that this time is in Ω(n²), and therefore it is in Θ(n²).
Thus it makes a crucial difference in the analysis of Fibiter whether or not we count
arithmetic operations at unit cost.
The analysis of for loops that start at a value other than 1 or proceed by larger
steps should be obvious at this point. Consider the following loop for example.
    for i ← 5 to m step 2 do P(i)
Here, P(i) is executed ⌊(m − 5) ÷ 2⌋ + 1 times provided m ≥ 3. (For a for loop to
make sense, the endpoint should always be at least as large as the starting point
minus the step.)
function Fibrec(n)
if n < 2 then return n
else return Fibrec(n - 1)+Fibrec(n- 2)
Let T(n) be the time taken by a call on Fibrec(n). If n < 2, the algorithm simply
returns n, which takes some constant time a. Otherwise, most of the work is spent
in the two recursive calls, which take time T(n -1) and T(n - 2), respectively.
Moreover, one addition involving fn−1 and fn−2 (which are the values returned
by the recursive calls) must be performed, as well as the control of the recursion
and the test "if n < 2". Let h(n) stand for the work involved in this addition and
control, that is the time required by a call on Fibrec(n) ignoring the time spent inside
the two recursive calls. By definition of T(n) and h(n) we obtain the following
recurrence.
    T(n) = a                               if n < 2
    T(n) = T(n − 1) + T(n − 2) + h(n)      otherwise                    (4.1)
If we count the additions at unit cost, h(n) is bounded by a constant and the
recurrence equation for T (n) is very similar to that already encountered for g (n)
in Section 1.6.4. Constructive induction applies equally well to reach the same
conclusion: T(n) ∈ O(fn). However, it is easier in this case to apply the techniques
of Section 4.7 to solve recurrence 4.1. Similar reasoning shows that T(n) ∈ Ω(fn),
hence T(n) ∈ Θ(fn). Using de Moivre's formula, we conclude that Fibrec(n) takes
a time exponential in n. This is double exponential in the size of the instance since
the value of n is exponential in the size of n.
If we do not count the additions at unit cost, h(n) is no longer bounded by
a constant. Instead h(n) is dominated by the time required for the addition of
fn−1 and fn−2 for sufficiently large n. We have already seen that this addition
takes a time in the exact order of n. Therefore h(n) ∈ Θ(n). The techniques of
Section 4.7 apply again to solve recurrence 4.1. Surprisingly, the result is the same
regardless of whether h(n) is constant or linear: it is still the case that T(n) ∈ Θ(fn).
In conclusion, Fibrec(n) takes a time exponential in n whether or not we count
additions at unit cost! The only difference lies in the multiplicative constant hidden
in the Θ notation.
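This somewhat surprising conclusion can be checked numerically. The sketch below assumes recurrence 4.1 has the form T(n) = a for n < 2 and T(n) = T(n − 1) + T(n − 2) + h(n) otherwise (with a = 1 chosen arbitrarily) and prints the ratio T(n)/fn for h(n) = 1 and for h(n) = n; in both cases the ratio settles towards a constant, so T(n) stays in the order of fn.

    def fib(n):
        i, j = 1, 0
        for _ in range(n):
            i, j = j, i + j
        return j

    def T(n, h, a=1):
        """Solve T(n) = a for n < 2, T(n) = T(n-1) + T(n-2) + h(n) otherwise."""
        if n < 2:
            return a
        prev2, prev1 = a, a             # T(0), T(1)
        for k in range(2, n + 1):
            prev2, prev1 = prev1, prev1 + prev2 + h(k)
        return prev1

    for n in (10, 20, 30, 40):
        print(n, T(n, lambda k: 1) / fib(n), T(n, lambda k: k) / fib(n))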
the lower half. We obtain the following algorithm. (A slightly better algorithm is
given in Section 7.3; see Problem 7.11.)
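The pseudocode itself is not reproduced in this transcription. The Python sketch below is one plausible reading of it, chosen so that it matches the analysis that follows (three cases x < T[k], x = T[k] and x > T[k], and an interval length d = j − i + 1 that is at least halved on each iteration); it assumes x occurs in the sorted array and is not necessarily the book's exact formulation.

    def binary_search(T, x):
        """Return an index i such that T[i] = x, assuming x occurs in the sorted list T."""
        i, j = 0, len(T) - 1            # 0-indexed; the text's d is j - i + 1
        while i < j:
            k = (i + j) // 2
            if x < T[k]:
                j = k - 1               # search the lower half
            elif x > T[k]:
                i = k + 1               # search the upper half
            else:
                i = j = k               # found: collapse the interval
        return i

    T = [1, 3, 5, 7, 9, 11]
    print(binary_search(T, 7))          # 3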
If instead x > T[k], then the values of i and j after the iteration are
i′ = ⌊(i + j) ÷ 2⌋ + 1 and j′ = j.
Therefore,
    d′ = j′ − i′ + 1 = j − ⌊(i + j) ÷ 2⌋ ≤ j − (i + j − 1)/2 = (j − i + 1)/2 = d/2.
Finally, if x = T[k], then i and j are set to the same value and thus d′ = 1; but d was
at least 2 since otherwise the loop would not have been reentered. We conclude
that d′ ≤ d/2 whichever case happens, which means that the value of d is at least
halved each time round the loop. Since we stop when d ≤ 1, the process must
eventually stop, but how much time does it take?
To determine an upper bound on the running time of binary search, let d_ℓ
denote the value of j − i + 1 at the end of the ℓ-th trip round the loop for ℓ ≥ 1 and
let d_0 = n. Since d_{ℓ−1} is the value of j − i + 1 before starting the ℓ-th iteration, we
have proved that d_ℓ ≤ d_{ℓ−1}/2 for all ℓ ≥ 1. It follows immediately by mathematical
induction that d_ℓ ≤ n/2^ℓ. But the loop terminates when d ≤ 1, which happens at
the latest when ℓ = ⌈lg n⌉. We conclude that the loop is entered at most ⌈lg n⌉
times. Since each trip round the loop takes constant time, binary search takes a
time in O(log n) in the worst case.
i ← 0
for k ← 1 to s do
    while U[k] ≠ 0 do
        i ← i + 1
        T[i] ← k
        U[k] ← U[k] − 1
To analyse the time required by this process, we use "U [ k]" to denote the value
originally stored in U[k] since all these values are set to 0 during the process. It is
tempting to choose any of the instructions in the inner loop as a barometer. For each
value of k, these instructions are executed U[k] times. The total number of times
they are executed is therefore ∑_{k=1}^{s} U[k]. But this sum is equal to n, the number
of integers to sort, since the sum of the number of times that each element appears
gives the total number of elements. If indeed these instructions could serve as a
barometer, we would conclude that this process takes a time in the exact order
of n. A simple example is sufficient to convince us that this is not necessarily
the case. Suppose U[k]= 1 when k is a perfect square and U[k]= 0 otherwise.
This would correspond to sorting an array T containing exactly once each perfect
square between 1 and n², using s = n² pigeon-holes. In this case, the process clearly
takes a time in Ω(n²) since the outer loop is executed s times. Therefore, it cannot
be that the time taken is in Θ(n). This proves that the choice of the instructions in
the inner loop as a barometer was incorrect. The problem arises because we can
only neglect the time spent initializing and controlling loops provided we make
sure to include something even if the loop is executed zero times.
The correct and detailed analysis of the process is as follows. Let a be the
time needed for the test U[k] ≠ 0 each time round the inner loop and let b be
the time taken by one execution of the instructions in the inner loop, including
the implicit sequencing operation to go back to the test at the beginning of the
loop. To execute the inner loop completely for a given value of k takes a time
t_k = (1 + U[k])a + U[k]b, where we add 1 to U[k] before multiplying by a to take
account of the fact that the test is performed each time round the loop and one
more time to determine that the loop has been completed. The crucial thing is
that this time is not zero even when U[k] = 0. The complete process takes a time c + ∑_{k=1}^{s}(d + t_k), where c and d are new constants to take account of the time needed to initialize and control the outer loop, respectively. When simplified, this expression yields c + (a + d)s + (a + b)n. We conclude that the process takes a time in Θ(n + s). Thus the time depends on two independent parameters n and s; it cannot be expressed as a function of just one of them.
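To see the Θ(n + s) behaviour concretely, the following Python sketch (an illustration, not the book's code) performs the distribution phase above and counts evaluations of the inner-loop test; the count comes out as exactly n + s, one successful test per element written plus one failing test per pigeon-hole.

    def distribute(U):
        # U[k-1] holds the number of occurrences of the value k, for k = 1..s
        T, tests = [], 0
        for k, count in enumerate(U, start=1):
            remaining = count
            while True:
                tests += 1            # one evaluation of the test "U[k] != 0"
                if remaining == 0:
                    break
                T.append(k)
                remaining -= 1
        return T, tests

    T, tests = distribute([0, 2, 0, 1, 3])    # n = 6 elements, s = 5 pigeon-holes
    print(T, tests)                           # [2, 2, 4, 5, 5, 5] 11, and 11 = n + s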
Selection sort
Selection sorting, encountered in Section 2.4, provides a good example of the anal-
ysis of nested loops.
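For concreteness, here is a Python sketch of a selection sort along the lines analysed here (the name minx is carried over from the discussion; the rest of the formulation is illustrative):

    def selection_sort(T):
        n = len(T)
        for i in range(n - 1):                # outer loop over positions
            minj, minx = i, T[i]
            for j in range(i + 1, n):         # inner loop: find the minimum of T[i..n-1]
                if T[j] < minx:               # the test used as barometer below
                    minj, minx = j, T[j]
            T[i], T[minj] = T[minj], T[i]     # put the minimum in place
        return T

    print(selection_sort([3, 1, 4, 1, 5, 9, 2, 6]))   # [1, 1, 2, 3, 4, 5, 6, 9]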
Although the time spent by each trip round the inner loop is not constant (it takes longer when T[j] < minx), this time is bounded above by some constant c (that takes the loop control into account). For each value of i, the instructions in the inner loop are executed n − (i + 1) + 1 = n − i times, and therefore the time taken by the inner loop is t(i) ≤ (n − i)c. The time taken for the i-th trip round the outer
loop is bounded above by b + t (i) for an appropriate constant b that takes account
of the elementary operations before and after the inner loop and of the loop control
for the outer loop. Therefore, the total time spent by the algorithm is bounded
above by
    ∑_{i=1}^{n−1} (b + (n − i)c) = (b + cn)(n − 1) − c ∑_{i=1}^{n−1} i
                                 = (1/2) c n² + (b − c/2) n − b,

which is in O(n²). Similar reasoning shows that this time is also in Ω(n²) in all cases, and therefore selection sort takes a time in Θ(n²) to sort n items.
The above argument can be simplified, obviating the need to introduce explicit
constants such as b and c, once we are comfortable with the notion of a barometer
instruction. Here, it is natural to take the innermost test "if T[j]< minx" as a
barometer and count the exact number of times it is executed. This is a good
measure of the total running time of the algorithm because none of the loops can
be executed zero times (in which case loop control could have been more time-
consuming than our barometer). The number of times that the test is executed is
easily seen to be
    ∑_{i=1}^{n−1} (n − i) = ∑_{k=1}^{n−1} k = n(n − 1)/2.

Thus the number of times the barometer instruction is executed is in Θ(n²), which
automatically gives the running time of the algorithm itself.
Insertion sort
We encountered insertion sorting also in Section 2.4; recall that its inner while loop shifts the larger elements one position to the right and ends by placing the element being inserted with the assignment T[j + 1] ← x.
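A Python sketch of the corresponding insertion sort (illustrative, not the book's code); the while-loop test below is the barometer chosen in the analysis:

    def insertion_sort(T):
        for i in range(1, len(T)):
            x, j = T[i], i - 1
            while j >= 0 and x < T[j]:    # the barometer test
                T[j + 1] = T[j]           # shift larger elements one place right
                j -= 1
            T[j + 1] = x                  # insert x in its place
        return T

    print(insertion_sort([5, 2, 4, 6, 1, 3]))   # [1, 2, 3, 4, 5, 6]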
Unlike selection sorting, we saw in Section 2.4 that the time taken to sort n items
by insertion depends significantly on the original order of the elements. Here,
we analyse this algorithm in the worst case; the average-case analysis is given in
Section 4.5. To analyse the running time of this algorithm, we choose as barometer
the number of times the while loop condition (j > 0 and x < T[j]) is tested.
Suppose for a moment that i is fixed. Let x = T[i], as in the algorithm.
The worst case arises when x is less than T[j] for every j between 1 and i − 1, since in this case we have to compare x to T[i − 1], T[i − 2], ..., T[1] before we
leave the while loop because j = 0. Thus the while loop test is performed i times
in the worst case. This worst case happens for every value of i from 2 to n when the array is initially sorted into descending order. The barometer test is thus performed ∑_{i=2}^{n} i = n(n + 1)/2 − 1 times in total, which is in Θ(n²). This shows that insertion sort also takes a time in Θ(n²) to sort n items in the worst case.
Euclid's algorithm
Recall from Section 2.7.4 that Euclid's algorithm is used to compute the greatest
common divisor of two integers.
    function Euclid(m, n)
        while m > 0 do
            t ← m
            m ← n mod m
            n ← t
        return n
The analysis of this loop is slightly more subtle than those we have seen so far
because clearly measurable progress occurs not every time round the loop but
rather every other time. To see this, we first show that for any two integers m and
n such that n > m, it is always true that n mod m < n/2.
• If m > n/2, then 1 ≤ n/m < 2, and so n ÷ m = 1, which implies that n mod m = n − m(n ÷ m) = n − m < n − n/2 = n/2.
• If m ≤ n/2, then n mod m < m ≤ n/2.
Assume without loss of generality that n ≥ m since otherwise the first trip round the loop swaps m and n (because n mod m = n when n < m). This condition
is preserved each time round the loop because n mod m is never larger than m.
If we assume that arithmetic operations are elementary, which is reasonable in
many situations, the total time taken by the algorithm is in the exact order of the
number of trips round the loop. Let us determine an upper bound on this number
as a function of n. Consider what happens to m and n after we go round the
loop twice, assuming the algorithm does not stop earlier. Let mo and no denote the
original value of the parameters. After the first trip round the loop, m becomes
no mod mo. After the second trip round the loop, n takes up that value. By the
observation above, n has become smaller than no/2. In other words, the value of
n is at least halved after going round the loop twice. By then it is still the case that
n > m and therefore the same reasoning applies again: if the algorithm has not
stopped earlier, two additional trips round the loop will make the value of n at
least twice as small again. With some experience, the conclusion is now immediate:
the loop is entered at most roughly 2 lg n times.
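A quick Python illustration (not from the book) counts the trips round the loop and compares the count with 2 lg n:

    from math import log2

    def euclid_trips(m, n):
        # return gcd(m, n) and the number of trips round the while loop
        trips = 0
        while m > 0:
            m, n = n % m, m
            trips += 1
        return n, trips

    for m, n in [(21, 34), (1234, 987654), (3, 10**9)]:
        g, trips = euclid_trips(m, n)
        print(m, n, g, trips, "<=", 2 * log2(n))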
Formally, it is best to complete the analysis of Euclid's algorithm by treating the while loop as if it were a recursive algorithm. Let t(ℓ) be the maximum number of times the algorithm goes round the loop on inputs m and n when m ≤ n ≤ ℓ. If n < 2, we go round the loop either zero times (if m = 0) or one time. Otherwise, either we go round the loop less than twice (if m = 0 or m divides n exactly), or at least twice. In the latter case, the value of n is at least halved, and thus becomes at most ℓ ÷ 2, and that of m becomes no larger than the new value of n. Therefore it takes no more than t(ℓ ÷ 2) additional trips round the loop to complete the calculation. This yields a recurrence whose solution, by the techniques of Section 4.7, confirms that the number of trips round the loop is in O(log n).
To analyse the execution time of the recursive algorithm for the Towers of Hanoi, we use the instruction write as a barometer. Let t(m) denote the number of times it is executed on a call with m rings. By inspection of the algorithm, we obtain the following recurrence:
    t(m) = 0                 if m = 0        (4.3)
           2t(m − 1) + 1    otherwise,

from which the technique of Section 4.7 yields t(m) = 2^m − 1; see Example 4.7.6. Since the number of executions of the write instruction is a good measure of the time taken by the algorithm, we conclude that it takes a time in Θ(2^n) to solve
the problem with n rings. In fact, it can be proved that the problem with n rings cannot be solved in fewer than 2^n − 1 moves and therefore this algorithm is optimal if one insists on printing the entire sequence of necessary moves.
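A standard recursive formulation, sketched in Python for illustration; the print call plays the role of the write instruction used as barometer (the peg-naming convention is an assumption of the sketch):

    def hanoi(m, src, dst, spare):
        # print the moves needed to transfer m rings from peg src to peg dst
        if m == 0:
            return
        hanoi(m - 1, src, spare, dst)
        print(src, "->", dst)            # the barometer instruction
        hanoi(m - 1, spare, dst, src)

    hanoi(3, 1, 3, 2)                    # prints 2**3 - 1 = 7 moves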
Computing determinants
Yet another example of analysis of recursion concerns the recursive algorithm for
calculating a determinant. Recall that the determinant of an n x n matrix can be
computed from the determinants of n smaller (n - 1) x (n - 1) matrices obtained
by deleting the first row and some column of the original matrix. Once the n sub-
determinants are calculated, the determinant of the original matrix is obtained very
quickly. In addition to the recursive calls, the dominant operation needed consists of creating the n submatrices whose determinants are to be calculated. This takes a time in Θ(n³) if it is implemented in a straightforward way, but a time in Θ(n) suffices if pointers are used instead of copying elements. Therefore, the total time t(n) needed to calculate the determinant of an n × n matrix by the recursive algorithm is given by the recurrence t(n) = n t(n − 1) + h(n) for n ≥ 2, where h(n) ∈ Θ(n). This recurrence cannot be handled by the techniques of Section 4.7. However we saw in Problem 1.31 that constructive induction applies to conclude that t(n) ∈ Θ(n!), which shows that this algorithm is very inefficient. The same conclusion holds if we are less clever and need h(n) ∈ Θ(n³). Recall from Section 2.7.1 that the determinant can be calculated in a time in Θ(n³) by Gauss-Jordan elimination, and even
faster by another recursive algorithm of the divide-and-conquer family. More work
is needed to analyse these algorithms if the time taken for arithmetic operations is
taken into account.
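As an illustration (not the book's code), here is a Python sketch of the recursive expansion along the first row; building the n submatrices in each call is the work that h(n) accounts for:

    def det(M):
        # determinant of a square matrix M (list of rows) by recursive expansion
        n = len(M)
        if n == 1:
            return M[0][0]
        total = 0
        for col in range(n):
            sub = [row[:col] + row[col + 1:] for row in M[1:]]   # delete row 1 and column col
            sign = 1 if col % 2 == 0 else -1
            total += sign * M[0][col] * det(sub)
        return total

    print(det([[1, 2], [3, 4]]))   # -2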
Consider now the average-case behaviour of insertion sort, assuming that the n items to be sorted are distinct and that every initial order is equally likely. For a given i, let the partial rank k of T[i] be the position that x = T[i] would occupy if T[1 . . i] were sorted. We choose again as barometer the number of times the while loop condition (j > 0 and x < T[j]) is tested. By definition of partial rank, and since T[1 . . i − 1] is sorted, this test is performed exactly i − k + 1 times.
Because each value of k between 1 and i has probability 1/i of occurring, the
average number of times the barometer test is performed for any given value of
i is

    c_i = (1/i) ∑_{k=1}^{i} (i − k + 1) = (i + 1)/2.
These events are independent for different values of i. The total average number
of times the barometer test is performed when sorting n items is therefore
    ∑_{i=2}^{n} c_i = ∑_{i=2}^{n} (i + 1)/2 = (n − 1)(n + 4)/4.
Consider an algorithm that contains the loop

    for i ← 1 to n do P

If P takes a time in O(log n) in the worst case, it is correct to conclude that the loop takes a time in O(n log n), but it may be that it is much faster even in the worst case. This could happen if P cannot take a long time (Ω(log n)) unless it has been called
many times previously, each time at small cost. It could be for instance that P takes
constant time on the average, in which case the entire loop would be performed in
linear time.
The meaning of "on the average" here is entirely different from what we en-
countered in Section 4.5. Rather than taking the average over all possible inputs,
which requires an assumption on the probability distribution of instances, we take
the average over successive calls. Here the times taken by the various calls are
highly dependent, whereas we implicitly assumed in Section 4.5 that each call was
independent from the others. To prevent confusion, we shall say in this context
that each call on P takes amortized constant time rather than saying that it takes
constant time on the average.
Saying that a process takes amortized constant time means that there exists
a constant c (depending only on the implementation) such that for any positive
n and any sequence of n calls on the process, the total time for those calls is
bounded above by cn. Therefore, excessive time is allowed for one call only if
very short times have been registered previously, not merely if further calls would
go quickly. Indeed, if a call were allowed to be expensive on the ground that it
prepares for much quicker later calls, the expense would be wasted should that
call be the final one.
Consider for instance the time needed to get a cup of coffee in a common coffee
room. Most of the time, you simply walk into the room, grab the pot, and pour
coffee into your cup. Perhaps you spill a few drops on the table. Once in a while,
however, you have the bad luck to find the pot empty, and you must start a fresh
brew, which is considerably more time-consuming. While you are at it, you may
as well clean up the table. Thus, the algorithm for getting a cup of coffee takes
substantial time in the worst case, yet it is quick in the amortized sense because a
long time is required only after several cups have been obtained quickly. (For this
analogy to work properly, we must assume somewhat unrealistically that the pot
is full when the first person walks in; otherwise the very first cup would consume
too much time.)
A classic example of this behaviour in computer science concerns storage allo-
cation with occasional need for "garbage collection". A simpler example concerns
updating a binary counter. Suppose we wish to manage the counter as an array
of bits representing the value of the counter in binary notation: array C[1 . . m] represents ∑_{j=1}^{m} 2^{m−j} C[j]. For instance, array [0, 1, 1, 0, 1, 1] represents 27. Since such a counter can only count up to 2^m − 1, we shall assume that we are happy to count modulo 2^m. (Alternatively, we could produce an error message in case of overflow.) Here is the algorithm for adding 1 to the counter.
    procedure count(C[1 . . m])
        {This procedure assumes m ≥ 1
         and C[j] ∈ {0, 1} for each 1 ≤ j ≤ m}
        j ← m + 1
        repeat
            j ← j − 1
            C[j] ← 1 − C[j]
        until C[j] = 1 or j = 1
Called on our example [0, 1, 1, 0, 1, 1], the array becomes [0, 1, 1, 0, 1, 0] the first time round the loop, [0, 1, 1, 0, 0, 0] the second time, and [0, 1, 1, 1, 0, 0] the third time (which indeed represents the value 28 in binary); the loop then terminates with
j = 4 since C[4] is now equal to 1. Clearly, the algorithm's worst case occurs when
C[j] = 1 for all j, in which case it goes round the loop m times. Therefore, n calls on count starting from an all-zero array take total time in O(nm). But do they take a time in Θ(nm)? The answer is negative, as we are about to show that count takes constant amortized time. This implies that our n calls on count collectively take a time in O(n), with a hidden constant that does not depend on m. In particular, counting from 0 to n = 2^m − 1 can be achieved in a time linear in n, whereas careless worst-case analysis of count would yield the correct but pessimistic conclusion that it takes a time in O(n log n).
There are two main techniques to establish amortized analysis results: the
potential function approach and the accounting trick. Both techniques apply best
to analyse the number of times a barometer instruction is executed.
Potential functions
Suppose the process to be analysed modifies a database and its efficiency each time
it is called depends on the current state of that database. We associate with the
database a notion of "cleanliness", known as the potential function of the database.
Calls on the process are allowed to take more time than average provided they
clean up the database. Conversely, quick calls are allowed to mess it up. This is
precisely what happens in the coffee room! The analogy holds even further: the
faster you fill up your cup, the more likely you will spill coffee, which in turn means
that it will take longer when the time comes to clean up. Similarly, the faster the
process goes when it goes fast, the more it messes up the database, which in turn
requires more time when cleaning up becomes necessary.
Formally, we introduce an integer-valued potential function Φ of the state of the database. Larger values of Φ correspond to dirtier states. Let Φ_0 be the value of Φ on the initial state; it represents our standard of cleanliness. Let Φ_i be the value of Φ on the database after the i-th call on the process, and let t_i be the time needed by that call (or the number of times the barometer instruction is performed). We define the amortized time taken by that call as

    t̂_i = t_i + Φ_i − Φ_{i−1}.

Thus, t̂_i is the actual time required to carry out the i-th call on the process plus the increase in potential caused by that call. It is sometimes better to think of it as the actual time minus the decrease in potential, as this shows that operations that clean up the database will be allowed to run longer without incurring a penalty in terms of their amortized time.
Let T_n denote the total time required for the first n calls on the process, and denote the total amortized time by T̂_n. Then

    T̂_n = ∑_{i=1}^{n} t̂_i = ∑_{i=1}^{n} (t_i + Φ_i − Φ_{i−1})
        = ∑_{i=1}^{n} t_i + ∑_{i=1}^{n} (Φ_i − Φ_{i−1})
        = T_n + Φ_n − Φ_0,

since the last sum telescopes. Therefore

    T_n = T̂_n − (Φ_n − Φ_0).
The significance of this is that T_n ≤ T̂_n holds for all n provided Φ_n never becomes smaller than Φ_0. In other words, the total amortized time is always an upper bound on the total actual time needed to perform a sequence of operations, as long as the database is never allowed to become "cleaner" than it was initially. (This shows
that overcleaning can be harmful!) This approach is interesting when the actual
time varies significantly from one call to the next, whereas the amortized time is
nearly invariant. For instance, a sequence of operations takes linear time when the
amortized time per operation is constant, regardless of the actual time required for
each operation.
The challenge in applying this technique is to figure out the proper potential
function. We illustrate this with our example of the binary counter. A call on count is increasingly expensive as the rightmost zero in the counter is farther to the left. Therefore the potential function that immediately comes to mind would be m minus the largest j such that C[j] = 0. It turns out, however, that this choice of potential function is not appropriate because a single operation can mess up the counter arbitrarily (adding 1 to the counter representing 2^k − 2 causes this potential function to jump from 0 to k). Fortunately, a simpler potential function works well: define Φ(C) as the number of bits equal to 1 in C. Clearly, our condition that the potential never be allowed to decrease below the initial potential holds since the initial potential is zero.
What is the amortized cost of adding 1 to the counter, in terms of the number
of times we go round the loop? There are three cases to consider.
• If the counter represents an even integer, we go round the loop once only as we
flip the rightmost bit from 0 to 1. As a result, there is one more bit set to 1 than
there was before. Therefore, the actual cost is 1 trip round the loop, and the
increase in potential is also 1. By definition, the amortized cost of the operation is 1 + 1 = 2.
• If all the bits in the counter are equal to 1, we go round the loop m times, flipping
all those bits to 0. As a result, the potential drops from m to 0. Therefore, the
amortized cost is m - m = 0.
• In all other cases, each time we go round the loop we decrease the potential by
1 since we flip a bit from 1 to 0, except for the last trip round the loop when we
increase the potential by 1 since we flip a bit from 0 to 1. Thus, if we go round
the loop k times, we decrease the potential k - 1 times and we increase it once,
for a net decrease of k - 2. Therefore, the amortized cost is k - (k - 2)= 2.
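A small Python check (illustrative, not from the book) of the potential argument: each increment's actual cost, measured in bits flipped, plus the change in potential (the number of 1 bits) never exceeds 2.

    def increment(C):
        # add 1 modulo 2**m to the counter C, a list of m bits; return bits flipped
        j, flips = len(C), 0
        while True:
            j -= 1
            C[j] = 1 - C[j]
            flips += 1
            if C[j] == 1 or j == 0:
                return flips

    C, phi = [0] * 6, 0                       # potential = number of 1 bits, initially 0
    for _ in range(40):
        cost = increment(C)
        new_phi = sum(C)
        assert cost + (new_phi - phi) <= 2    # amortized cost is at most 2
        phi = new_phi
    print("every increment had amortized cost at most 2")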
Consider the recurrence T(n) = 1 if n = 1 and T(n) = 3T(n ÷ 2) + n otherwise (Equation 4.4), which we now try to solve by intelligent guesswork. One of the first lessons experience will teach you if you try solving recurrences is that discontinuous functions such as the floor function (implicit in n ÷ 2) are hard to analyse. Our first step is to replace n ÷ 2 with the better-behaved "n/2" with a suitable restriction on the set of values of n that we consider initially. It is tempting to restrict n to being even since in that case n ÷ 2 = n/2, but recursively dividing an even number by 2 may produce an odd number larger than 1. Therefore, it is a better idea to restrict n to being an exact power of 2. Once this special case
is handled, the general case follows painlessly in asymptotic notation from the
smoothness rule of Section 3.4.
First, we tabulate the value of the recurrence on the first few powers of 2.
n 1 2 4 8 16 32
T(n) 1 5 19 65 211 665
Each term in this table but the first is computed from the previous term. For in-
stance, T(16)= 3 x T(8)+16 = 3 x 65 + 16 = 211. But is this table useful? There is
certainly no obvious pattern in this sequence! What regularity is there to look for?
The solution becomes apparent if we keep more "history" about the value of T(n). Instead of writing T(2) = 5, it is more useful to write T(2) = 3 × 1 + 2. Then,

    T(4) = 3 × T(2) + 4 = 3 × (3 × 1 + 2) + 4 = 3² × 1 + 3 × 2 + 2².

We continue in this way, writing n as an explicit power of 2.

    n      T(n)
    1      1
    2      3 × 1 + 2
    2²     3² × 1 + 3 × 2 + 2²
    2³     3³ × 1 + 3² × 2 + 3 × 2² + 2³
    2⁴     3⁴ × 1 + 3³ × 2 + 3² × 2² + 3 × 2³ + 2⁴
    2⁵     3⁵ × 1 + 3⁴ × 2 + 3³ × 2² + 3² × 2³ + 3 × 2⁴ + 2⁵
The pattern is now obvious:

    T(2^k) = 3^k × 2^0 + 3^{k−1} × 2^1 + ⋯ + 3^0 × 2^k = ∑_{i=0}^{k} 3^{k−i} 2^i.        (4.5)

It is easy to check this formula against our earlier tabulation. By induction (not mathematical induction), we are now convinced that Equation 4.5 is correct.
To guess a closed form, it helps to tabulate T(n) together with some closely related functions.

    n           1    2    4    8    16    32
    T(n) − 2n  −1    1   11   49   179   601
    T(n) − n    0    3   15   57   195   633
    T(n)        1    5   19   65   211   665
    T(n) + n    2    7   23   73   227   697
    T(n) + 2n   3    9   27   81   243   729

The last row is nothing other than the successive powers of 3: T(n) + 2n = 3^{1+lg n} = 3 n^{lg 3} when n is a power of 2, and therefore

    T(n) = 3 n^{lg 3} − 2n.        (4.6)
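A short Python check (not part of the book) of Equation 4.6 on further powers of 2, under the initial condition T(1) = 1 used in the tables:

    from math import isclose, log2

    def T(n):                                  # T(1) = 1, T(n) = 3*T(n//2) + n
        return 1 if n == 1 else 3 * T(n // 2) + n

    for n in [1, 2, 4, 8, 16, 32, 64, 1024]:
        assert isclose(T(n), 3 * n ** log2(3) - 2 * n, rel_tol=1e-9)
    print("T(n) = 3*n**lg3 - 2*n on all powers of 2 tested")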
Consider recurrences of the form

    a_0 t_n + a_1 t_{n−1} + ⋯ + a_k t_{n−k} = 0,        (4.7)

where the t_i are the values we are looking for. In addition to Equation 4.7, the values of t_i on k values of i (usually 0 ≤ i ≤ k − 1 or 1 ≤ i ≤ k) are needed to determine the sequence. These initial conditions will be considered later. Until then, Equation 4.7 typically has infinitely many solutions. This recurrence is
• linear because it does not contain terms of the form t_{n−i} t_{n−j}, t²_{n−i}, and so on;
• homogeneous because the linear combination of the t_{n−i} is equal to zero; and
• with constant coefficients because the a_i are constants.
Consider for instance our now familiar recurrence for the Fibonacci sequence,

    f_n = f_{n−1} + f_{n−2}.

This recurrence easily fits the mould of Equation 4.7 after obvious rewriting:

    f_n − f_{n−1} − f_{n−2} = 0.

Note also that any linear combination of solutions of Equation 4.7 is again a solution: if both f_n and g_n satisfy it, then so does t_n = c f_n + d g_n for any constants c and d, since the left-hand side of Equation 4.7 evaluated on t_n is c × 0 + d × 0 = 0.
The characteristic polynomial of the recurrence is

    p(x) = a_0 x^k + a_1 x^{k−1} + ⋯ + a_k.

By the fundamental theorem of algebra, it can be factorized as

    p(x) = ∏_{i=1}^{k} (x − r_i),

where the r_i may be complex numbers. Moreover, these r_i are the only solutions of the equation p(x) = 0.
Consider any root r_i of the characteristic polynomial. Since p(r_i) = 0 it follows that x = r_i is a solution to the characteristic equation and therefore r_i^n is a solution to the recurrence. Since any linear combination of solutions is also a solution, we conclude that

    t_n = ∑_{i=1}^{k} c_i r_i^n        (4.8)

satisfies the recurrence for any choice of constants c_1, c_2, ..., c_k. The remarkable fact, which we do not prove here, is that Equation 4.7 has only solutions of this form provided all the r_i are distinct. In this case, the k constants can be determined from k initial conditions by solving a system of k linear equations in k unknowns.
Example 4.7.1. (Fibonacci) Consider the recurrence

    f_n = n                      if n = 0 or n = 1
          f_{n−1} + f_{n−2}     otherwise.

First we rewrite this recurrence to fit the mould of Equation 4.7:

    f_n − f_{n−1} − f_{n−2} = 0.

The characteristic polynomial is x² − x − 1, whose roots are r_1 = (1 + √5)/2 and r_2 = (1 − √5)/2. The general solution is therefore of the form

    f_n = c_1 r_1^n + c_2 r_2^n.        (4.9)
It remains to use the initial conditions to determine the constants c_1 and c_2. When n = 0, Equation 4.9 yields f_0 = c_1 + c_2. But we know that f_0 = 0, and therefore c_2 = −c_1. When n = 1, Equation 4.9 yields f_1 = c_1 r_1 + c_2 r_2 = c_1 (r_1 − r_2) = √5 c_1. But f_1 = 1, so

    c_1 = 1/√5  and  c_2 = −1/√5.

Thus

    f_n = (1/√5) [ ((1 + √5)/2)^n − ((1 − √5)/2)^n ],
which is de Moivre's famous formula for the Fibonacci sequence. Notice how much easier the technique of the characteristic equation is than the approach by constructive induction that we saw in Section 1.6.4. It is also more precise, since all we were able to discover with constructive induction was that f_n grows exponentially, roughly like a power of a number close to φ = (1 + √5)/2; now we have an exact formula. □
If it surprises you that the solution of a recurrence with integer coefficients and
initial conditions involves irrational numbers, try Problem 4.31 for an even bigger
surprise!
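A short Python check (illustrative) of de Moivre's formula against the recurrence itself:

    from math import sqrt

    def fib(n):                                # straight from the recurrence
        a, b = 0, 1
        for _ in range(n):
            a, b = b, a + b
        return a

    phi, psi = (1 + sqrt(5)) / 2, (1 - sqrt(5)) / 2
    for n in range(20):
        assert fib(n) == round((phi ** n - psi ** n) / sqrt(5))
    print("de Moivre's formula agrees with the recurrence for n < 20")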
Example 4.7.2. Consider the recurrence

    t_n = 0                      if n = 0
          5                      if n = 1
          3t_{n−1} + 4t_{n−2}   otherwise.

First we rewrite the recurrence:

    t_n − 3t_{n−1} − 4t_{n−2} = 0.

The characteristic polynomial is x² − 3x − 4 = (x + 1)(x − 4), whose roots are r_1 = −1 and r_2 = 4. The general solution is therefore of the form t_n = c_1 (−1)^n + c_2 4^n. The initial conditions give c_1 + c_2 = 0 and −c_1 + 4c_2 = 5, from which c_1 = −1 and c_2 = 1. We conclude that

    t_n = 4^n − (−1)^n.        □
The situation becomes slightly more complicated when the characteristic poly-
nomial has multiple roots, that is when the k roots are not all distinct. It is still true
that Equation 4.8 satisfies the recurrence for any values of the constants ci, but this
is no longer the most general solution. To find other solutions, let p(x) be the characteristic polynomial of our recurrence and let r be a multiple root. By definition of multiple roots, there exists a polynomial q(x) of degree k − 2 such that p(x) = (x − r)² q(x). By considering suitable auxiliary polynomials u_n(x) and v_n(x) = x u′_n(x), where u′_n(x) denotes the derivative of u_n(x) with respect to x, and applying the rule for the derivative of a product, one can show that not only r^n but also n r^n satisfies the recurrence. More generally, if r is a root of multiplicity m, then r^n, n r^n, n² r^n, ..., n^{m−1} r^n are all solutions of Equation 4.7. It follows that if the characteristic polynomial has ℓ distinct roots r_1, ..., r_ℓ of multiplicities m_1, ..., m_ℓ respectively, then
    t_n = ∑_{i=1}^{ℓ} ∑_{j=0}^{m_i − 1} c_{ij} n^j r_i^n
is the general solution to Equation 4.7. Again, the constants c_{ij}, 1 ≤ i ≤ ℓ and 0 ≤ j ≤ m_i − 1, are to be determined by the k initial conditions. There are k such constants because ∑_{i=1}^{ℓ} m_i = k (the sum of the multiplicities of the distinct roots is equal to the total number of roots). For simplicity, we shall normally label the constants c_1, c_2, ..., c_k rather than using two indexes.
Consider the recurrence

    t_n = n                                    if n = 0, 1 or 2
          5t_{n−1} − 8t_{n−2} + 4t_{n−3}     otherwise.

The characteristic polynomial is

    x³ − 5x² + 8x − 4 = (x − 1)(x − 2)²,

so all solutions are of the form t_n = c_1 1^n + c_2 2^n + c_3 n 2^n. The initial conditions give c_1 = −2, c_2 = 2 and c_3 = −1/2, and therefore

    t_n = 2^{n+1} − n 2^{n−1} − 2.
Consider now recurrences of the form

    a_0 t_n + a_1 t_{n−1} + ⋯ + a_k t_{n−k} = b^n p(n).        (4.10)

The left-hand side is the same as before, but on the right-hand side we have b^n p(n), where
• b is a constant; and
• p(n) is a polynomial in n of degree d.
Consider for instance the recurrence

    t_n − 2t_{n−1} = 3^n.        (4.11)

The manipulation sketched in Examples 4.7.4 and 4.7.5 reduces such a recurrence to a homogeneous one whose characteristic polynomial contains the factors (x − 2) and (x − 3), so its general solution includes a term of the form c_2 3^n.
Looking back on Examples 4.7.4 and 4.7.5, we see that part of the characteristic
polynomial comes from the left-hand side of Equation 4.10 and the rest from the
right-hand side. The part that comes from the left-hand side is exactly as if the
equation had been homogeneous: (x - 2) for both examples. The part that comes
from the right-hand side is a result of our manipulation.
Generalizing, we can show that to solve Equation 4.10 it is sufficient to use the
following characteristic polynomial.
    (a_0 x^k + a_1 x^{k−1} + ⋯ + a_k)(x − b)^{d+1}
(Recall that d is the degree of polynomial p (n).) Once this polynomial is obtained,
proceed as in the homogeneous case, except that some of the equations needed to
determine the constants are obtained not from the initial conditions but from the
recurrence itself.
Example 4.7.6. The number of movements of a ring required in the Towers of
Hanoi problem (see Section 4.4) is given by Equation 4.3.
    t(m) = 0                 if m = 0
           2t(m − 1) + 1    otherwise
This can be written as
    t(m) − 2t(m − 1) = 1,        (4.18)
which is of the form of Equation 4.10 with b = 1 and p(n)= 1, a polynomial of
degree 0. The characteristic polynomial is therefore
(x - 2)(x - 1)
where the factor (x - 2) comes from the left-hand side of Equation 4.18 and the
factor (x -1) from its right-hand side. The roots of this polynomial are 1 and 2,
both of multiplicity 1, so all solutions of this recurrence are of the form
    t(m) = c_1 1^m + c_2 2^m.        (4.19)

We need two initial conditions to determine the constants c_1 and c_2. We know that t(0) = 0; to find the second initial condition we use the recurrence itself to calculate

    t(1) = 2t(0) + 1 = 1.

This gives us two linear equations in the unknown constants:

    c_1 + c_2 = 0        (m = 0)
    c_1 + 2c_2 = 1       (m = 1)

From this, we obtain the solution c_1 = −1 and c_2 = 1 and therefore

    t(m) = 2^m − 1.
If all we want is to determine the exact order of t(m), there is no need to calculate the constants in Equation 4.19. This time we do not even need to substitute Equation 4.19 into the original recurrence. Knowing that t(m) = c_1 + c_2 2^m is sufficient to conclude that c_2 > 0 and thus t(m) ∈ Θ(2^m). For this, note that t(m), the number of movements of a ring required, is certainly neither negative nor a constant since clearly t(m) ≥ m. □
Consider the recurrence

    t_n = 2t_{n−1} + n.

This is of the form of Equation 4.10 with b = 1 and p(n) = n, a polynomial of degree 1, so the characteristic polynomial is

    (x − 2)(x − 1)²

with roots 2 (multiplicity 1) and 1 (multiplicity 2). All solutions are therefore of the form

    t_n = c_1 2^n + c_2 1^n + c_3 n 1^n.        (4.20)

Provided t_0 ≥ 0, and therefore t_n ≥ 0 for all n, we conclude immediately that t_n ∈ O(2^n). Further analysis is required to assert that t_n ∈ Θ(2^n).
If we substitute Equation 4.20 into the original recurrence, we obtain

    n = t_n − 2t_{n−1} = (c_1 2^n + c_2 + c_3 n) − 2(c_1 2^{n−1} + c_2 + c_3 (n − 1)) = (2c_3 − c_2) − c_3 n,

from which we read directly that 2c_3 − c_2 = 0 and −c_3 = 1, regardless of the initial condition. This implies that c_3 = −1 and c_2 = −2. At first we are disappointed because it is c_1 that is relevant to determining the exact order of t_n as given by Equation 4.20, and we obtained the other two constants instead. However, those constants turn Equation 4.20 into

    t_n = c_1 2^n − n − 2.        (4.21)
Provided t_0 ≥ 0, and therefore t_n ≥ 0 for all n, Equation 4.21 implies that c_1 must be strictly positive. Therefore, we are entitled to conclude that t_n ∈ Θ(2^n) with no need to solve explicitly for c_1. Of course, c_1 can now be obtained easily from the initial condition if so desired.
Alternatively, all three constants can be determined as functions of t_0 by setting up and solving the appropriate system of linear equations obtained from Equation 4.20 and the values of t_1 and t_2 computed from the original recurrence. □
By now, you may be convinced that, for all practical purposes, there is no
need to worry about the constants: the exact order of tn can always be read off
directly from the general solution. Wrong! Or perhaps you think that the constants
obtained by the simpler technique of substituting the general solution into the
original recurrence are always sufficient to determine its exact order. Wrong again!
Consider the following example.
    t_n = 1                    if n = 0
          4t_{n−1} − 2^n     otherwise

This can be rewritten as

    t_n − 4t_{n−1} = −2^n,

which is of the form of Equation 4.10 with b = 2 and p(n) = −1, a polynomial of degree 0. The characteristic polynomial is thus

    (x − 4)(x − 2)

with roots 4 and 2, both of multiplicity 1. All solutions are therefore of the form

    t_n = c_1 4^n + c_2 2^n.        (4.22)
You may be tempted to assert without further ado that t_n ∈ Θ(4^n) since that is clearly the dominant term in Equation 4.22. If you are in less of a hurry, you may wish to substitute Equation 4.22 into the original recurrence to see what comes out:

    −2^n = t_n − 4t_{n−1} = c_1 4^n + c_2 2^n − 4(c_1 4^{n−1} + c_2 2^{n−1}) = −c_2 2^n,

and therefore c_2 = 1 regardless of the initial condition. This time, however, knowing c_2 does not settle the exact order of t_n: that depends on c_1, and hence on the initial condition. With t_0 = 1, Equation 4.22 gives c_1 + c_2 = 1 and thus c_1 = 0, so that in fact t_n = 2^n ∈ Θ(2^n): the hasty conclusion that t_n ∈ Θ(4^n) was wrong.
A change of variable sometimes converts a new recurrence into one of the forms already handled. Consider the recurrence

    T(n) = 1                  if n = 1
           3T(n/2) + n       if n is a power of 2, n > 1.

To transform this into a form that we know how to solve, we replace n by 2^i. This is achieved by introducing a new recurrence t_i, defined by t_i = T(2^i). This transformation is useful because n/2 becomes 2^i/2 = 2^{i−1}. In other words, our original recurrence in which T(n) is defined as a function of T(n/2) gives way to one in which t_i is defined as a function of t_{i−1}, precisely the type of recurrence we have learned to solve:

    t_i = T(2^i) = 3T(2^{i−1}) + 2^i = 3t_{i−1} + 2^i.

Once rewritten as

    t_i − 3t_{i−1} = 2^i,

this recurrence is of the form of Equation 4.10. The characteristic polynomial is

    (x − 3)(x − 2),

so all solutions are of the form t_i = c_1 3^i + c_2 2^i. We use the fact that T(2^i) = t_i, and thus T(n) = t_{lg n} when n = 2^i, to obtain

    T(n) = c_1 3^{lg n} + c_2 2^{lg n} = c_1 n^{lg 3} + c_2 n.        (4.25)
However, we need to show that c_1 is strictly positive before we can assert something about the exact order of T(n).
We are now familiar with two techniques to determine the constants. For the
sake of the exercise, let us apply each of them to this situation. The more direct
approach, which does not always provide the desired information, is to substitute
the solution provided by Equation 4.25 into the original recurrence. Noting that (1/2)^{lg 3} = 1/3, this yields

    n = T(n) − 3T(n/2)
      = (c_1 n^{lg 3} + c_2 n) − 3(c_1 (n/2)^{lg 3} + c_2 (n/2))
      = −c_2 n/2,
and therefore c_2 = −2. Even though we did not obtain the value of c_1, which is the more relevant constant, we are nevertheless in a position to assert that it must be strictly positive, for otherwise Equation 4.25 would falsely imply that T(n) is
negative. The fact that

    T(n) ∈ Θ(n^{lg 3} | n is a power of 2)        (4.26)

is thus established. Of course, the value of c_1 would now be easy to obtain from
Equation 4.25, the fact that c2 = -2, and the initial condition T(1) = 1, but this is not
necessary if we are satisfied to solve the recurrence in asymptotic notation. More-
over we have learned that Equation 4.26 holds regardless of the initial condition,
provided T(n) is positive.
The alternative approach consists of setting up two linear equations in the two
unknowns cl and c2. It is guaranteed to yield the value of both constants. For this,
we need the value of T (n) on two points. We already know that T (1) = 1. To obtain
another point, we use the recurrence itself: T(2)= 3T(1 ) +2 = 5. Substituting n = 1
and n = 2 in Equation 4.25 yields the following system:

    c_1 + c_2 = 1        (n = 1)
    3c_1 + 2c_2 = 5      (n = 2)

Solving these equations, we obtain c_1 = 3 and c_2 = −2. Therefore

    T(n) = 3 n^{lg 3} − 2n

when n is a power of 2, in perfect agreement with Equation 4.6.
The same change of variable handles related divide-and-conquer recurrences. Consider next

    T(n) = 4T(n/2) + n²

for n > 1 a power of 2, with T(1) a positive constant. Setting t_i = T(2^i) yields

    t_i − 4t_{i−1} = 4^i.

The characteristic polynomial is (x − 4)² and hence all solutions are of the form

    t_i = c_1 4^i + c_2 i 4^i,

that is T(n) = c_1 n² + c_2 n² lg n. Substituting this into the original recurrence gives n² = T(n) − 4T(n/2) = c_2 n², so c_2 = 1 and T(n) ∈ Θ(n² log n | n is a power of 2).
Consider finally

    T(n) = 2T(n/2) + n lg n,

again for n a power of 2. The change of variable gives t_i − 2t_{i−1} = i 2^i, whose characteristic polynomial is (x − 2)(x − 2)² = (x − 2)³. It follows that T(n) = c_1 n + c_2 n lg n + c_3 n lg² n, and analysis along the same lines shows that T(n) ∈ Θ(n log² n | n is a power of 2).
Remark: In the preceding examples, the recurrence given for T(n) only applies
when n is a power of 2. It is therefore inevitable that the solution obtained should
be in conditional asymptotic notation. In each case, however, it is sufficient to add
the condition that T(n) is eventually nondecreasing to be able to conclude that
the asymptotic results obtained apply unconditionally for all values of n. This
follows from the smoothness rule (Section 3.4) since the functions n^{lg 3}, n² log n and n log² n are smooth.
Example 4.7.13. We are now ready to solve one of the most important recurrences for algorithmic purposes. This recurrence is particularly useful for the analysis of divide-and-conquer algorithms, as we shall see in Chapter 7. The constants n_0 ≥ 1, ℓ ≥ 1, b ≥ 2 and k ≥ 0 are integers, whereas c is a strictly positive real number. Let T: ℕ → ℝ⁺ be an eventually nondecreasing function such that

    T(n) = ℓ T(n/b) + c n^k        (4.29)

when n/n_0 is an exact power of b, that is when n ∈ {b n_0, b² n_0, b³ n_0, ...}. This time, the appropriate change of variable is n = b^i n_0. Setting t_i = T(b^i n_0) gives t_i − ℓ t_{i−1} = c n_0^k (b^k)^i. The right-hand side is of the required form a^i p(i) where p(i) = c n_0^k is a constant polynomial (of degree 0) and a = b^k. Thus, the characteristic polynomial is (x − ℓ)(x − b^k), whose roots are ℓ and b^k. From this, it is tempting (but false in general!) to conclude that all solutions are of the form

    t_i = c_1 ℓ^i + c_2 (b^k)^i.        (4.30)

To write this in terms of T(n), note that i = log_b(n/n_0) when n is of the proper form, and thus d^i = (n/n_0)^{log_b d} for arbitrary positive values of d. Therefore,

    T(n) = c_3 n^{log_b ℓ} + c_4 n^k        (4.31)
for appropriate new constants c_3 and c_4. To learn about these constants, we substitute Equation 4.31 into the original recurrence; the usual manipulation shows that c_4 = c b^k/(b^k − ℓ) whenever ℓ ≠ b^k.
• If ℓ < b^k then c_4 > 0 and log_b ℓ < k, so the term c_4 n^k dominates Equation 4.31. Since n^k is a smooth function and T(n) is eventually nondecreasing, T(n) ∈ Θ(n^k).
• If ℓ > b^k then c_4 < 0 and log_b ℓ > k. The fact that c_4 is negative implies that c_3 is positive, for otherwise Equation 4.31 would imply that T(n) is negative, contrary to the specification that T: ℕ → ℝ⁺. Therefore the term c_3 n^{log_b ℓ} dominates Equation 4.31. Furthermore n^{log_b ℓ} is a smooth function and T(n) is eventually nondecreasing. Therefore T(n) ∈ Θ(n^{log_b ℓ}).
• If ℓ = b^k, however, we are in trouble because the formula for c_4 involves a division by zero! What went wrong is that in this case the characteristic polynomial has a single root of multiplicity 2 rather than two distinct roots. Therefore Equation 4.30 does not provide the general solution to the recurrence. Rather, the general solution in this case is t_i = c_5 (b^k)^i + c_6 i (b^k)^i, which translates into

    T(n) = c_7 n^k + c_8 n^k log_b n        (4.32)

for appropriate constants c_7 and c_8. Substituting this into the original recur-
rence, our usual manipulation yields a surprisingly simple c_8 = c. Therefore, c n^k log_b n is the dominant term in Equation 4.32 because c was assumed to be strictly positive at the beginning of this example. Since n^k log n is smooth and T(n) is eventually nondecreasing, we conclude that T(n) ∈ Θ(n^k log n).
Putting it all together,

    T(n) ∈ Θ(n^k)              if ℓ < b^k
           Θ(n^k log n)       if ℓ = b^k        (4.33)
           Θ(n^{log_b ℓ})     if ℓ > b^k.
Problem 4.44 gives a generalization of this example. □
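Equation 4.33 is applied constantly in Chapter 7; the following small Python helper (an illustration, with hypothetical names) reports which of the three cases governs T(n) = ℓT(n/b) + cn^k for given ℓ, b and k:

    from math import log

    def divide_and_conquer_order(ell, b, k):
        # asymptotic order of T(n) = ell*T(n/b) + c*n**k, per Equation 4.33
        if ell < b ** k:
            return f"Theta(n^{k})"
        if ell == b ** k:
            return f"Theta(n^{k} log n)"
        return f"Theta(n^{log(ell, b):.3f})"       # exponent is log_b(ell)

    print(divide_and_conquer_order(3, 2, 1))   # Theta(n^1.585): third case of Equation 4.33
    print(divide_and_conquer_order(2, 2, 1))   # Theta(n^1 log n)
    print(divide_and_conquer_order(4, 2, 3))   # Theta(n^3)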
Remark: It often happens in the analysis of algorithms that we derive a recurrence
in the form of an inequality. For instance, we may get

    T(n) ≤ ℓ T(n/b) + c n^k

when n/n_0 is an exact power of b, instead of Equation 4.29. What can we say
about the asymptotic behaviour of such a recurrence? First note that we do not
have enough information to determine the exact order of T(n) because we are given
only an upper bound on its value. (For all we know, it could be that T(n)= 1 for
all n.) The best we can do in this case is to analyse the recurrence in terms of
the O notation. For this we introduce an auxiliary recurrence patterned after the original but defined in terms of an equation (not an inequation). In this case

    T̂(n) = T(n_0)                    if n = n_0
           ℓ T̂(n/b) + c n^k         if n/n_0 is a power of b, n > n_0.
This new recurrence falls under the scope of Example 4.7.13, except that we have no evidence that T̂(n) is eventually nondecreasing. Therefore Equation 4.33 holds for T̂(n), provided we use conditional asymptotic notation to restrict n/n_0 to being a power of b. Now, it is easy to prove by mathematical induction that T(n) ≤ T̂(n) for all n ≥ n_0 such that n/n_0 is a power of b. But clearly an asymptotic upper bound on T̂(n) is then also an upper bound on T(n), and therefore, conditionally on n/n_0 being a power of b,

    T(n) ∈ O(n^k)              if ℓ < b^k
           O(n^k log n)       if ℓ = b^k
           O(n^{log_b ℓ})     if ℓ > b^k.
We shall study further recurrences involving inequalities in Section 4.7.6.
So far, the changes of variable we have used have all been of the same logarith-
mic nature. Rather different changes of variable are sometimes useful. We illustrate
this with one example that comes from the analysis of the divide-and-conquer al-
gorithm for multiplying large integers (see Section 7.1).
Example 4.7.14. Consider an eventually nondecreasing function T(n) such that

    T(n) ≤ 3T(⌈n/2⌉ + 1) + cn        (4.34)

for all sufficiently large n, where c is some positive real constant. As explained in the remark following the previous example, we have to be content here to analyse the recurrence in terms of the O notation rather than the Θ notation.
Let n_0 ≥ 1 be large enough that T(m) ≥ T(n) for all m ≥ n ≥ n_0/2 and Equation 4.34 holds for all n ≥ n_0. Consider any even n ≥ n_0. First observe that ⌈(n + 2)/2⌉ + 1 = n/2 + 2. Now make a change of variable by introducing a new function T̄ such that T̄(n) = T(n + 2) for all n. Consider again any even n ≥ n_0: then T̄(n) = T(n + 2) ≤ 3T(⌈(n + 2)/2⌉ + 1) + c(n + 2) = 3T(n/2 + 2) + c(n + 2) = 3T̄(n/2) + c(n + 2). In particular,

    T̄(n) ≤ 3T̄(n/2) + dn        for n ≥ n_0

when n/n_0 is a power of 2, where d = 2c. This is a special case of the recurrence
analysed in the remark following Example 4.7.13, with ℓ = 3, b = 2 and k = 1. Since ℓ > b^k, we obtain T̄(n) ∈ O(n^{lg 3}). Finally, we use one last time the fact that T(n) is eventually nondecreasing: T(n) ≤ T(n + 2) = T̄(n) for any sufficiently large n. Therefore any asymptotic upper bound on T̄(n) applies equally to T(n), which concludes the proof that T(n) ∈ O(n^{lg 3}). □
The changes of variable seen so far transform the domain of the recurrence. For a recurrence such as T(n) = n T²(n/2), with T(1) = 1/3 say, it is instead the range that needs transforming. For n a power of 2, the change of variable n = 2^i gives

    t_i = T(2^i) = 2^i T²(2^{i−1}) = 2^i t²_{i−1}.

At first glance, none of the techniques we have seen applies to this recurrence since it is not linear; furthermore the coefficient 2^i is not a constant. To transform the range, we create yet another recurrence by using u_i to denote lg t_i:

    u_i = lg t_i = i + 2 lg t_{i−1} = i + 2u_{i−1}.
The characteristic polynomial is (x − 2)(x − 1)², so

    u_i = c_1 2^i + c_2 1^i + c_3 i 1^i.

Substituting this into the recurrence gives

    i = u_i − 2u_{i−1} = c_1 2^i + c_2 + c_3 i − 2(c_1 2^{i−1} + c_2 + c_3 (i − 1)) = (2c_3 − c_2) − c_3 i,
and thus c_3 = −1 and c_2 = 2c_3 = −2. Therefore, the general solution for u_i, if the initial condition is not taken into account, is u_i = c_1 2^i − i − 2. This gives us the general solution for t_i and T(n):

    t_i = 2^{u_i} = 2^{c_1 2^i − i − 2}
    T(n) = 2^{c_1 n − lg n − 2} = 2^{c_1 n}/(4n).

With the initial condition T(1) = 1/3 we obtain c_1 = 2 − lg 3, and finally T(n) = 2^{2n}/(4n 3^n) when n is a power of 2.
Consider now a function T(n) that satisfies

    T(n) = 4T(n ÷ 2) + f(n)        (4.36)

when n is sufficiently large, where all we know about f(n) is that it is in the exact order of n, and we know nothing specific about the initial condition that defines T(n) except that it is positive for all n. Such an equation is called an asymptotic recurrence. Fortunately, the asymptotic solution of a recurrence such as Equation 4.36 is virtually always identical to that of the simpler Equation 4.35. The general tech-
nique to solve an asymptotic recurrence is to "sandwich" the function it defines
between two recurrences of the simpler type. When both simpler recurrences have
the same asymptotic solution, the asymptotic recurrence must have the same solu-
tion as well. We illustrate this with our example.
For arbitrary positive real constants a and b, define the recurrence

    T_{a,b}(n) = a                              if n = 1
                 4T_{a,b}(n ÷ 2) + bn          otherwise,

for all n ≥ 1; when n is a power of 2 its solution is T_{a,b}(n) = (a + b)n² − bn. The idea is to find positive constants r, s, u and v such that

    T_{r,s}(n) ≤ T(n) ≤ T_{u,v}(n)  for all n ≥ 1.        (4.37)

Since both T_{r,s}(n) and T_{u,v}(n) are in the exact order of n², it will follow that T(n) ∈ Θ(n²) as well. One interesting benefit of this approach is that T_{a,b}(n) is nondecreasing for any fixed positive values of a and b, which makes it possible to apply the smoothness rule. On the other hand, this rule could not have been invoked to simplify directly the analysis of T(n) by restricting our attention to the case when n is a power of 2, because Equation 4.36 does not provide enough information to be able to assert that T(n) is nondecreasing.
We still have to prove the existence of r, s, u and v. For this, note that T_{a,a}(n) = a T_{1,1}(n) for all a and T_{a,b}(n) ≤ T_{a′,b′}(n) when a ≤ a′ and b ≤ b′ (two more easy proofs by mathematical induction). Let c and d be positive real constants and let n_0 be an integer sufficiently large that cn ≤ f(n) ≤ dn for all n > n_0. It then suffices to choose r and s small enough and u and v large enough, with s ≤ c and v ≥ d, that

    T(n) ≥ r T_{1,1}(n) = T_{r,r}(n) ≥ T_{r,s}(n)

and

    T(n) ≤ u T_{1,1}(n) = T_{u,u}(n) ≤ T_{u,v}(n)

for all n ≤ n_0. This forms the basis of the proof by mathematical induction that Equation 4.37 holds. For the induction step, consider an arbitrary n > n_0 and
assume by the induction hypothesis that T_{r,s}(m) ≤ T(m) ≤ T_{u,v}(m) for all m < n. Then,

    T(n) = 4T(n ÷ 2) + f(n)
         ≤ 4T(n ÷ 2) + dn
         ≤ 4T_{u,v}(n ÷ 2) + dn        by the induction hypothesis
         ≤ 4T_{u,v}(n ÷ 2) + vn        since d ≤ v
         = T_{u,v}(n).

The proof that T_{r,s}(n) ≤ T(n) is similar, which completes the argument.
Example 4.7.16. The important class of recurrences solved in Example 4.7.13 can be generalized in a similar way. Consider a function T: ℕ → ℝ⁺ such that

    T(n) = ℓ T(n ÷ b) + f(n)

for all sufficiently large n, where ℓ ≥ 1 and b ≥ 2 are constants, and f(n) ∈ Θ(n^k) for some k ≥ 0. For arbitrary positive real constants x and y, define

    T_{x,y}(n) = x                           if n = 1
                 ℓ T_{x,y}(n ÷ b) + y n^k   if n > 1.

Sandwiching T between two such recurrences exactly as in the previous example shows that the conclusion of Equation 4.33 applies here as well: T(n) is in Θ(n^k), Θ(n^k log n) or Θ(n^{log_b ℓ}) according to whether ℓ is smaller than, equal to or larger than b^k, and this time unconditionally.
4.8 Problems
Problem 4.1. How much time does the following "algorithm" require as a function
of n?
    ℓ ← 0
    for i ← 1 to n do
        for j ← 1 to n² do
            for k ← 1 to n³ do
                ℓ ← ℓ + 1

Express your answer in Θ notation in the simplest possible form. You may consider that each individual instruction (including loop control) is elementary.
Problem 4.2. How much time does the following "algorithm" require as a function
of n?
    ℓ ← 0
    for i ← 1 to n do
        for j ← 1 to i do
            for k ← j to n do
                ℓ ← ℓ + 1

Express your answer in Θ notation in the simplest possible form. You may consider
that each individual instruction (including loop control) is elementary.
Problem 4.3. Consider the loop

    for i ← 1 to m do P

which is part of a larger algorithm that works on an instance of size n. Let t be the time needed for each execution of P, which we assume independent of i for the sake of this problem (but t could depend on n). Prove that this loop takes a time in Θ(mt) provided t is bounded below by some constant and provided there exists a threshold n_0 such that m ≥ 1 whenever n ≥ n_0. (Recall that we saw in Section 4.2.2
that the desired conclusion would not hold without those restrictions.)
Problem 4.4. Prove by mathematical induction that the values of i and j at the end of the k-th iteration of Fibiter in Section 4.2.2 are f_{k−1} and f_k, respectively, where f_n denotes the n-th Fibonacci number.
Problem 4.5. Consider the following algorithm for computing binomial coefficients.

    function C(n, k)
        if k = 0 or k = n then return 1
        else return C(n − 1, k − 1) + C(n − 1, k)

Analyse the time taken by this algorithm under the (unreasonable) assumption that the addition C(n − 1, k − 1) + C(n − 1, k) can be carried out in constant time once both C(n − 1, k − 1) and C(n − 1, k) have been obtained recursively. Let t(n) denote the worst time that a call on C(n, k) may take for all possible values of k, 0 ≤ k ≤ n. Express t(n) in the simplest possible form in Θ notation.
Problem 4.6. Consider the following "algorithm".
    procedure DC(n)
        if n < 1 then return
        for i ← 1 to 8 do DC(n ÷ 2)
        for i ← 1 to n³ do dummy ← 0
Write an asymptotic recurrence equation that gives the time T(n) taken by a call
of DC(n). Use the result of Example 4.7.16 to determine the exact order of T(n) in
the simplest possible form. Do not reinvent the wheel here: apply Example 4.7.16
directly. The complete solution of this problem should not take more than 2 or
3 lines!
Note: This is how easy it is to analyse the running time of most divide-and-conquer
algorithms; see Chapter 7.
Problem 4.7. Rework Problem 4.6 if the constant 8 that appears in the middle line
of algorithm DC is replaced by 9.
Problem 4.8. Rework Problem 4.6 if the constant 8 that appears in the middle line
of algorithm DC is replaced by 7.
Problem 4.9. Consider the following "algorithm".
    procedure waste(n)
        for i ← 1 to n do
            for j ← 1 to i do
                write i, j, n
        if n > 0 then
            for i ← 1 to 4 do
                waste(n ÷ 2)
Let T(n) stand for the number of lines of output generated by a call of waste(n).
Provide a recurrence equation for T(n) and use the result of Example 4.7.16 to
determine the exact order of T(n) in the simplest possible form. (We are not asking
you to solve the recurrence exactly.)
Problem 4.10. Prove by mathematical induction that if d_0 = n and d_ℓ ≤ d_{ℓ−1}/2 for all ℓ ≥ 1, then d_ℓ ≤ n/2^ℓ for all ℓ ≥ 0. (This is relevant to the analysis of the time taken by binary search; see Section 4.2.4.)
Problem 4.11. Consider the following recurrence for n ≥ 1.

    T(n) = c                  if n = 1
           T(n ÷ 2) + b      otherwise

Use the technique of the characteristic equation to solve this recurrence when n is a power of 2. Prove by mathematical induction that T(n) is an eventually nondecreasing function. Use the smoothness rule to show that T(n) ∈ Θ(log n). Finally, conclude that t(n) ∈ O(log n), where t(n) is given by Equation 4.2. Can we conclude from Equation 4.2 that t(n) ∈ Θ(log n)? Why or why not?
Problem 4.12. Prove that the initialization phase of pigeon-hole sorting (Sec-
tion 2.7.2) takes a time in O(n + s).
Problem 4.13. We saw in Section 4.2.4 that binary search can find an item in a
sorted array of size n in a time in O(logn). Prove that in the worst case a time in
Ω(log n) is required. On the other hand, what is the time required in the best case?
Problem 4.14. How much time does insertion sorting take to sort n distinct items
in the best case? State your answer in asymptotic notation.
Problem 4.15. We saw in Section 4.4 that a good barometer for the worst-case
analysis of insertion sorting is the number of times the while loop condition is
tested. Show that this barometer is also appropriate if we are concerned with the
best-case behaviour of the algorithm (see Problem 4.14).
Problem 4.16. Prove that Euclid's algorithm performs least well on inputs of any
given size when computing the greatest common divisor of two consecutive num-
bers from the Fibonacci sequence.
Problem 4.17. Give a nonrecursive algorithm to solve the Towers of Hanoi prob-
lem (see Section 4.4). It is cheating simply to rewrite the recursive algorithm using
an explicit stack to simulate the recursive calls!
Problem 4.18. Prove that the Towers of Hanoi problem with n rings cannot be
solved with fewer than 2n - 1 movements of rings.
Problem 4.19. Give a procedure similar to algorithm count from Section 4.6 to increase an m-bit binary counter. This time, however, the counter should remain all ones instead of cycling back to zero when an overflow occurs. In other words, if the current value represented by the counter is v, the new value after a call on your algorithm should be min(v + 1, 2^m − 1). Give the amortized analysis of your
algorithm. It should still take constant amortized time for each call.
Problem 4.20. Prove Equation 4.6 from Section 4.7.1 by mathematical induction
when n is a power of 2. Prove also by mathematical induction that the function
T (n) defined by Equation 4.4 is nondecreasing (for all n, not just when n is a power
of 2).
Problem 4.21. Consider arbitrary positive real constants a and b. Use intelligent
guesswork to solve the following recurrence when n > 1.
    t(n) = a                     if n = 1
           n t(n − 1) + bn      otherwise.

You are allowed a term of the form ∑ 1/i! in your solution. Prove your answer by mathematical induction. Express t(n) in Θ notation in the simplest possible form. What is the value of lim_{n→∞} t(n)/n! as a function of a and b? (Note: ∑_{i=1}^{∞} 1/i! = e − 1, where e = 2.7182818... is the base of the natural logarithm.) Although we determined the asymptotic behaviour of this recurrence in
Problem 1.31 using constructive induction, note that this time we have obtained a
more precise formula for t(n). In particular, we have obtained the limit of t(n)/n!
as n tends to infinity. Recall that this problem is relevant to the analysis of the
recursive algorithm for calculating determinants.
Problem 4.22. Solve the recurrence of Example 4.7.2 by intelligent guesswork.
Resist the temptation to "cheat" by looking at the solution before working out this
problem!
Problem 4.23. Prove that Equation 4.7 (Section 4.7.2) has only solutions of the
form
    t_n = ∑_{i=1}^{k} c_i r_i^n

provided the roots r_1, r_2, ..., r_k of the characteristic polynomial are distinct.
Problem 4.24. Complete the solution of Example 4.7.7 by determining the value
of cl as function of to.
Problem 4.25. Complete the solution of Example 4.7.11 by determining the value
of cl as function of to.
Problem 4.26. Complete the solution of Example 4.7.12 by determining the value
of cl as function of to.
Problem 4.27. Complete the solution of Example 4.7.13 by showing that c_8 = c.
Problem 4.28. Complete the remark that follows Example 4.7.13 by proving by mathematical induction that T(n) ≤ T̂(n) for all n ≥ n_0 such that n/n_0 is a power of b.
Problem 4.29. Solve the following recurrence exactly.

    t_n = n                            if n = 0 or n = 1
          5t_{n−1} − 6t_{n−2}         otherwise
    t_n = n                            if n = 0 or n = 1
          2t_{n−1} − 2t_{n−2}         otherwise

Prove that t_n = 2^{n/2} sin(nπ/4), not by mathematical induction but by using the technique of the characteristic equation.
    t_n = n                                    if n = 0, 1, 2 or 3
          t_{n−1} + t_{n−3} − t_{n−4}         otherwise
    t_n = n + 1                                        if n = 0 or n = 1
          3t_{n−1} − 2t_{n−2} + 3 × 2^{n−2}           otherwise
Problem 4.34. Solve the following recurrence exactly.

    T(n) = a                                if n = 0 or n = 1
           T(n − 1) + T(n − 2) + c         otherwise

Express your answer as simply as possible using the Θ notation and the golden ratio φ = (1 + √5)/2. Note that this is Recurrence 4.1 from Section 4.2.3 if h(n) = c,
which represents the time taken by a call on Fibrec(n) if we count the additions at
unit cost.
Problem 4.35. Solve the following recurrence exactly.
    T(n) = a                                 if n = 0 or n = 1
           T(n − 1) + T(n − 2) + cn         otherwise

Express your answer as simply as possible using the Θ notation and the golden ratio φ = (1 + √5)/2. Note that this is Recurrence 4.1 from Section 4.2.3 if h(n) = cn,
which represents the time taken by a call on Fibrec(n) if we do not count the
additions at unit cost. Compare your answer to that of Problem 4.34, in which
additions were counted at unit cost.
Problem 4.36. Solve the following recurrence exactly for n a power of 2.
    T(n) = 1                    if n = 1
           4T(n/2) + n         otherwise

Solve the following recurrence exactly for n a power of 2.

    T(n) = 1                    if n = 1
           2T(n/2) + lg n      otherwise
Problem 4.39. Solve the following recurrence exactly for n of the form 2^{2^k}.

    T(n) = 1                     if n = 2
           2T(√n) + lg n        otherwise
[n if n=Oorn 1
T(n)==
T l2T2(n -1)+VT2(n - 2)+n otherwise
    T(n) = 1                                   if n = 1
           3/2                                 if n = 2
           2T(n/2) − T(n/4) − 1/n             otherwise

    t_n = 0                     if n = 0
          1/(4 − t_{n−1})      otherwise
Problem 4.43. Solve the following recurrence exactly as a function of the initial
conditions a and b.
    T(n) = a                                if n = 0
           b                                if n = 1
           (1 + T(n − 1))/T(n − 2)         otherwise
    T(n) = ∑_{i=0}^{log_b(n/n_0)} ℓ^i f(n/b^i).

Here p stands for log_b ℓ and ε for an arbitrary strictly positive constant:

    T(n) ∈ Θ(n^p)                        if f(n) ∈ O(n^p/(log n)^{1+ε})
           Θ(f(n) log n log log n)      if f(n) ∈ Θ(n^p/log n)
           Θ(f(n) log n)                if f(n) ∈ Θ(n^p (log n)^{ε−1})
           Θ(f(n))                      if f(n) ∈ Θ(n^{p+ε}).
3. As a special case of the first alternative, T(n) ∈ Θ(n^p) whenever f(n) ∈ O(n^r) for some real constant r < p.
4. The first alternative can be generalized to include cases such as
Some Data Structures

The use of well-chosen data structures is often a crucial factor in the design of
efficient algorithms. Nevertheless, this book is not intended to be a manual on
data structures. We assume the reader already has a good working knowledge of
such basic notions as arrays, records, and the various structured data types ob-
tained using pointers. We also suppose that the mathematical concepts of directed
and undirected graphs are reasonably familiar, and that the reader knows how to
represent such objects efficiently on a computer.
The chapter begins with a brief review of the more important aspects of these
elementary data structures. The review includes a summary of their essential
properties from the point of view of algorithmics. For this reason even readers
who know the basic material well should skim the first few sections. The last three
sections of the chapter introduce the less elementary notions of heaps and disjoint
sets. Chosen because they will be used in subsequent chapters, these structures
also offer interesting examples of the analysis of algorithms. Most readers will
probably need to read the sections concerning these less familiar data structures
quite thoroughly.
Consider the declaration

    tab: array [1 . . 50] of integers

Here tab is an array of 50 integers indexed from 1 to 50; tab[1] refers to the first item of the array, tab[50] to the last. It is natural to think of the items as being arranged from left to right, so we may also refer to tab[1] as the left-hand item, and so on.
We often omit to specify the type of the items in the array when the context makes
this obvious.
From the point of view that interests us in this book, the essential property of
an array is that we can calculate the address of any given item in constant time.
For example, if we know that the array tab above starts at address 5000, and that
integer variables occupy 4 bytes of storage each, then the address of the item with
index k is easily seen to be 4996 + 4k. Even if we think it worthwhile to check that
k does indeed lie between 1 and 50, the time required to compute the address can
still be bounded by a constant. It follows that the time required to read the value of a single item, or to change such a value, is in Θ(1): in other words, we can treat
such operations as elementary.
On the other hand, any operation that involves all the items of an array will
tend to take longer as the size of the array increases. Suppose we are dealing with
an array of some variable size n; that is, the array consists of n items. Then an
operation such as initializing every item, or finding the largest item, usually takes
a time proportional to the number of items to be examined. In other words, such operations take a time in Θ(n). Another common situation is where we want to
keep the values of successive items of the array in order-numerical, alphabetic,
or whatever. Now whenever we decide to insert a new value we have to open up a
space in the correct position, either by copying all the higher values one position to
the right, or else by copying all the lower values one position to the left. Whichever
tactic we adopt (and even if we sometimes do one thing, sometimes the other), in
the worst case we may have to shift n/2 items. Similarly, deleting an element may require us to move all or most of the remaining items in the array. Again, therefore, such operations take a time in Θ(n).
A one-dimensional array provides an efficient way to implement the data struc-
ture called a stack. Here items are added to the structure, and subsequently re-
moved, on a last-in-first-out (LIFO) basis. The situation can be represented using an
array called stack, say, whose index runs from 1 to the maximum required size of
the stack, and whose items are of the required type, along with a counter. To set
the stack empty, the counter is given the value zero; to add an item to the stack,
the counter is incremented, and then the new item is written into stack[counter];
to remove an item, the value of stack[counter] is read out, and then the counter is
decremented. Tests can be added to ensure that no more items are placed on the
stack than were provided for, and that items are not removed from an empty stack.
Adding an item to a stack is usually called a push operation, while removing one
is called pop.
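As a concrete illustration (not the book's code), a bounded stack kept in an array with a counter, along the lines just described:

    class ArrayStack:
        def __init__(self, max_size):
            self.items = [None] * max_size   # space allocated once, at the outset
            self.counter = 0                 # counter == 0 means the stack is empty

        def push(self, x):
            if self.counter == len(self.items):
                raise OverflowError("stack is full")
            self.items[self.counter] = x
            self.counter += 1

        def pop(self):
            if self.counter == 0:
                raise IndexError("stack is empty")
            self.counter -= 1
            return self.items[self.counter]

    s = ArrayStack(3)
    s.push(1); s.push(2)
    print(s.pop(), s.pop())    # 2 1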
The data structure called a queue can also be implemented quite efficiently in a one-dimensional array. Here items are added and removed on a first-in-first-out
(FIFO) basis; see Problem 5.2. Adding an item is called an enqueue operation, while
removing one is called dequeue. For both stacks and queues, one disadvantage
of using an implementation in an array is that space usually has to be allocated at
the outset for the maximum number of items envisaged; if ever this space is not
sufficient, it is difficult to add more, while if too much space is allocated, waste
results.
The items of an array can be of any fixed-length type: this is so that the address
of any particular item can be easily calculated. The index is almost always an integer.
However, other so-called ordinal types can be used. For instance, an array whose index type is the range 'a' . . 'z' is one possible way of declaring an array of 26 values, indexed by the letters from 'a' to 'z'. It is not permissible to index an array using real numbers, nor do we allow an
array to be indexed by structures such as strings or sets. If such things are allowed,
accessing an array item can no longer be considered an elementary operation.
However a more general data structure called an associative table, described in
Section 5.6, does allow such indexes.
The examples given so far have all involved one-dimensional arrays, that is,
arrays whose items are accessed using just one index. Arrays with two or more
indexes can be declared in a similar way. For instance,

    matrix: array [1 . . 20, 1 . . 20] of complex

is one possible way to declare an array containing 400 items of type complex. A ref-
erence to any particular item, such as matrix[5, 7], now requires two indexes. The
essential point remains, however, that we can calculate the address of any given
item in constant time, so reading or modifying its value can be taken to be an
elementary operation. Obviously if both dimensions of a two-dimensional array
depend on some parameter n, as in the case of an n x n matrix, then operations
such as initializing every item of the array, or finding the largest item, now take a
time in Θ(n²).
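A minimal illustration in C, assuming an n x n matrix of reals and a function name of our own choosing:

/* initmatrix: set every item of an n x n matrix to zero; the nested
   loops perform n * n constant-time assignments, so the whole
   operation takes a time in Theta(n^2) */
void initmatrix(int n, double m[n][n])
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            m[i][j] = 0.0;
}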
We stated above that the time needed to initialize all the items of an array of size
n is in Θ(n). Suppose however we do not need to initialize each item, but simply to
know whether it has been initialized or not, and if so to obtain its value. Provided
we are willing to use rather more space, the technique called virtual initialization
allows us to avoid the time spent setting all the entries in the array. Suppose the
array to be virtually initialized is T[1..n]. Then we also need two auxiliary arrays
of integers the same size as T, and an integer counter. Call these auxiliary arrays
a[1..n] and b[1..n], say, and the counter ctr. At the outset we simply set ctr
to zero, leaving the arrays a, b and T holding whatever values they happen to
contain.
Subsequently ctr tells us how many elements of T have been initialized, while
the values a[1] to a[ctr] tell us which these elements are: a[1] points to the element
initialized first, a[2] to the element initialized second, and so on; see Figure 5.1,
where three items of the array T have been initialized. Furthermore, if T[i] was
the k-th element to be initialized, then b[i]= k. Thus the values in a point to T
and the values in b point to a, as in the figure.
To test whether T[i] has been assigned a value, we first check that 1 ≤ b[i] ≤ ctr.
If not, we can be sure that T[i] has not been initialized. Otherwise we are not sure
whether T[i] has been initialized or not: it could be just an accident that b[i] has a
plausible value. However if T[i] really has been assigned a value, then it was the
b[i]-th element of the array to be initialized. We can check this by testing whether
[Figure 5.1. A virtually initialized array: three items of the array T of size 8 have been assigned values; a[1..ctr] points to those items in the order they were initialized, the array b points back into a, and ctr = 3.]
a[b[i]] = i. Since the first ctr values of the array a certainly have been initialized,
and 1 ≤ b[i] ≤ ctr, it cannot be an accident if a[b[i]] = i, so this test is conclusive:
if it is satisfied, then T[i] has been initialized, and if not, not.
To assign a value to T[i] for the first time, increment the counter ctr, set a[ctr]
to i, set b[i] to ctr, and load the required value into T[i]. On subsequent assign-
ments to T[i], of course, none of this is necessary. Problem 5.3 invites you to fill in
the details. The extra space needed to use this technique is a constant multiple of
the size of T.
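One way of filling in these details is sketched below in C; the arrays T, a and b and the counter ctr are as above, while the names N, hasvalue and store are ours. (In C, global arrays happen to start out zero-filled, but the test below does not rely on this.)

#define N 100                  /* size of the array to be virtually initialized */

int T[N + 1];                  /* T[1..N]; its contents are never set in advance */
int a[N + 1], b[N + 1];        /* the two auxiliary arrays */
int ctr = 0;                   /* how many items of T have been initialized */

/* hasvalue: has T[i] been assigned a value yet? */
int hasvalue(int i)
{
    return 1 <= b[i] && b[i] <= ctr && a[b[i]] == i;
}

/* store: assign the value v to T[i], recording the initialization
   the first time it happens */
void store(int i, int v)
{
    if (!hasvalue(i)) {        /* first assignment to T[i] */
        ctr++;
        a[ctr] = i;
        b[i] = ctr;
    }
    T[i] = v;
}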
5.2 Records and pointers
Whereas the items of an array must all be of the same type, a record is a data structure
comprising a fixed number of items, often called fields, that may be of different types.
If class is declared, say, to be an array of 50 items whose type is a record person with
fields name, age, and so on, then the array class holds 50 records. The attributes of the
seventh member of the class are referred to as class[7].name, class[7].age, and so on.
As with arrays,
provided a record holds only fixed-length items, the address of any particular item
can be calculated in constant time, so consulting or modifying the value of a field
may be considered as an elementary operation.
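In C, a record corresponds to a struct. A sketch of the situation just described might look as follows; the field types and the name setage are our own assumptions.

struct person {
    char name[40];             /* a fixed-length field (the length is arbitrary) */
    int  age;
};

struct person class[51];       /* class[1..50], as in the text */

/* setage: the address of class[i].age is computed in constant time,
   so reading or writing it is an elementary operation */
void setage(int i, int years)
{
    class[i].age = years;
}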
A declaration such as boss: ↑ person, in which the arrow precedes the type name,
says that boss is a pointer to a record whose type is person. Such a record can be
created dynamically when it is needed.
Now boss↑ (note that here the arrow follows the name) means "the record that boss
points to". To refer to the fields of this record, we use boss↑.name, boss↑.age, and
so on. If a pointer has the special value nil, then it is not currently pointing to any
record.
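The C analogue of these operations uses malloc, the -> operator and NULL in place of the arrow notation and nil; the function hire and the values passed to it are purely illustrative.

#include <stdlib.h>
#include <string.h>

struct person {
    char name[40];
    int  age;
};

struct person *boss = NULL;    /* NULL plays the role of nil */

/* hire: create the record dynamically and fill in its fields;
   boss->name corresponds to boss^.name in the notation above */
void hire(const char *name, int age)
{
    boss = malloc(sizeof *boss);
    if (boss == NULL)
        return;                /* allocation failed */
    strncpy(boss->name, name, sizeof boss->name - 1);
    boss->name[sizeof boss->name - 1] = '\0';
    boss->age = age;
}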
5.3 Lists
A list is a collection of items of information arranged in a certain order. Unlike
arrays and records, the number of items in a list is generally not fixed, nor is it
usually bounded in advance. The corresponding data structure should allow us
to determine, for example, which is the first item in the structure, which is the
last, and which are the predecessor and the successor (if they exist) of any given
item. On a machine, the storage corresponding to any given item of information
is often called a node. Besides the information in question, a node may contain one
or more pointers. Such a structure is frequently represented graphically by boxes
and arrows, as in Figure 5.2. The information attached to a node is shown inside
the corresponding box, and the arrows show links from a node to its successor.
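In C, each box of such a figure might be represented as follows; the type of the information and the names node, next and insertafter are our own assumptions.

#include <stdlib.h>

struct node {
    int value;                 /* the information attached to the node */
    struct node *next;         /* link to the successor; NULL marks the end */
};

/* insertafter: splice a new node holding v in just after the node p,
   adjusting one pointer rather than shifting any items */
void insertafter(struct node *p, int v)
{
    struct node *fresh = malloc(sizeof *fresh);
    if (fresh == NULL)
        return;                /* out of memory */
    fresh->value = v;
    fresh->next = p->next;
    p->next = fresh;
}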
If a list is implemented using an array called value together with a counter,
the items of the list occupy the slots value[1] to value[counter], and the order of the
items is the same as the order of their indexes in the array. Using this implemen-
tation, we can find the first and the last items of the list rapidly, as we can the
predecessor and the successor of a given item. On the other hand, as we saw in
Section 5.1, inserting a new item or deleting one of the existing items requires a
worst-case number of operations in the order of the current size of the list. It was
noted there, however, that this implementation is particularly efficient for the im-
portant structure known as the stack; and a stack can be considered as a kind of
list.
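A short C sketch of the costly operation mentioned above, inserting a new item into an array-implemented list, might look like this; the array value and the counter come from the description above, while MAXLENGTH, insert and pos are ours.

#define MAXLENGTH 100

int value[MAXLENGTH + 1];      /* value[1..counter] holds the list */
int counter = 0;

/* insert: make x the pos-th item of the list; all the higher items are
   copied one position to the right, so in the worst case the time taken
   is in the order of the current size of the list */
int insert(int pos, int x)
{
    if (counter == MAXLENGTH || pos < 1 || pos > counter + 1)
        return 0;              /* no room, or position out of range */
    for (int i = counter; i >= pos; i--)
        value[i + 1] = value[i];
    value[pos] = x;
    counter++;
    return 1;
}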