
ALGO

Alexandre Duret-Lutz
December 1, 2020

These are incomplete¹ lecture notes for the ‘‘ALGO’’ course taught
to ING1 students at EPITA. This course studies the complexity of
algorithms, and introduces students to several mathematical tools
useful to analyze algorithms and derive complexity bounds.

¹ The latest version can be retrieved from
https://www.lrde.epita.fr/~adl/ens/algo/algo.pdf.
Please email me any correction to [email protected].

Contents

Mathematical Background 4
Read this before the second lecture.

Two ∑ Notations for Sums 4


Logarithms 5
Floor ⌊x⌋ and Ceiling ⌈x⌉ of x 6
Simple Combinatorics 7
Triangular Numbers 8
Tetrahedral Numbers 9
Pyramidal Numbers 10
Sum of an Arithmetic Progression 11
Sum of a Geometric Progression 12
(Sections or paragraphs introduced with the 9 mark contain more advanced
material that is not strictly necessary to understand the rest of the text.
You may want to skip them on first read, and decide later if you want to
read more.)
9 Catalan Numbers 13
9 Bounding Sums with Integrals 14
9 Summing Using the Reciprocal 15
9 Finite Calculus 16
Binary Trees 17
9 Comparison of Two Uniformly Distributed Values 18
9 Probabilities over Sequences of Random Numbers 19
Computing Complexities for Algorithms 20
SelectionSort 21
InsertionSort 22
Average-Case Analysis 23
BinarySearch 24
Definitions for Big-Θ, Big-O, and Big-Ω Notations 25
Properties of Big-Θ, Big-O, and Big-Ω Notations 26
Usage of Big-Θ, Big-O, and Big-Ω Notations 27
A Bestiary of Common Complexity Functions 28
Merging two Sorted Sub-Arrays 29

MergeSort 30
Exploring Recurrence Equations 31
9 Solving Recurrence Equations by Differentiation 32
Master Theorem for Recurrence Equations 33
Establishing Upper Bounds by Mathematical Induction 34
When Mathematical Induction on Recurrence Fails 35
More Examples of Complexities 36
Nearly Complete Binary Trees 36
Heaps 36
Heapify and BuildHeap 37
The Complexity of Heapify 38
The Complexity of BuildHeap 39
HeapSort 40
Partition 41
QuickSort 42
Worst and Best Cases for QuickSort 43
Average Complexity of QuickSort 44
QuickSort Optimizations 45
IntroSort, or How to Avoid QuickSort’s Worst Case 46
QuickSelect 47
Average complexity of QuickSelect 48
Linear Selection 49
Space Complexity 50
In-place Algorithms 50
More Sort-Related Topics 51
Stable Sorting Techniques 51
Worst Case for Comparison Sort 52
Counting Sort 53
Radix Sort (LSD) 54
Radix Sort (MSD) 54
Further Reading 55

F Notation f

Although this document is written in English, it targets French students. As such, it mixes conventions
from different origins. For instance, I prefer to write ⊆ and ⊂ (for the analogy with ≤ and <) rather than
the ⊂ and ⊊ convention commonly used in France.

N             the set of natural numbers, including 0. N = {0, 1, 2, . . .}
N+            the set of natural numbers, excluding 0. N+ = {1, 2, 3, . . .}
Z             the set of integers
R             the set of real numbers
A ⊆ B         A is a subset of (and possibly equal to) B
A ⊂ B         A is a proper subset of B
loga x        the logarithm of x in base a                                          (page 5)
ln x          loge x, the natural logarithm                                          (page 5)
⌊x⌋           the floor function, largest integer less than or equal to x            (page 6)
⌈x⌉           the ceiling function, smallest integer greater than or equal to x      (page 6)
x^n̲           the nth falling power of x: x^n̲ = x(x − 1)(x − 2) · · · (x − n + 1)    (page 7)
(n choose k)  a binomial coefficient: there are (n choose k) = n^k̲/k! ways to        (page 7)
              choose k items out of n
O(f(n))       the set of functions that, up to some multiplicative factor, are       (pages 25–26)
              dominated by f(n) asymptotically
Ω(f(n))       the set of functions that, up to some multiplicative factor,           (pages 25–26)
              dominate f(n) asymptotically
Θ(f(n))       the set of functions that, up to some multiplicative factors,          (pages 25–26)
              dominate and are dominated by f(n) asymptotically
f(n) ∼ g(n)   f(n) is asymptotically equivalent to g(n), i.e., lim_{n→∞} f(n)/g(n) = 1   (page 26)
iff           if and only if
positive      (strictly) greater than zero (x > 0)
negative      (strictly) less than zero (x < 0)
non-positive  less than or equal to zero (x ≤ 0)
non-negative  greater than or equal to zero (x ≥ 0)

Mathematical Background

There are a couple of mathematical notions we need to review before
we turn our attention to algorithms. I expect you to be already famil-
iar with many of those, but you might learn a few tricks along the
way.

Two ∑ Notations for Sums


A sum such as a0 + a1 + · · · + an−1 is more compactly written

    ∑_{k=0}^{n−1} a_k     or, using a more general form,     ∑_{0≤k<n} a_k.

The latter form has a couple of advantages over the former one.

• As this example demonstrates, we can use the semi-open interval²
  0 ≤ k < n instead of the more tedious 0 ≤ k ≤ n − 1.

  ² As programmers you should learn to love semi-open intervals. Think
  for instance about how begin() and end() are used in the C++ standard
  library. If the interval was closed (i.e., if it included the value pointed to
  by end()) you would not be able to specify an empty range.

• The sum ∑_{a≤k≤b} k clearly evaluates to 0 when b < a since there is
  nothing to sum. This is not so obvious³ with ∑_{k=a}^{b} k.

  ³ But true nonetheless. I.e., it is wrong to write ∑_{k=a}^{b} k = ∑_{k=b}^{a} k
  because the ∑ notation is about going from a to b using an increment of 1.

• The general form supports the addition of more constraints. The
  sum of all odd numbers below 1000 can be expressed as

    ∑_{1≤k<1000, k odd} k     or less intuitively     ∑_{k=0}^{499} (2k + 1).     (1)

• The general form makes variable substitutions much less error-
  prone. Let us look at the sum of all odd numbers from (1) and see
  how we can derive the right-hand expression starting from the
  left-hand one.

  Since k should be odd, let us replace all occurrences of k by 2k + 1:

    ∑_{1≤k<1000, k odd} k = ∑_{1≤2k+1<1000, 2k+1 odd} (2k + 1)

  As 2k + 1 is always odd, the constraint is now superfluous:

    ∑_{1≤2k+1<1000, 2k+1 odd} (2k + 1) = ∑_{1≤2k+1<1000} (2k + 1)

  We can simplify 1 ≤ 2k + 1 < 1000 by subtracting 1 from all sides,
  and then halving them:

    ∑_{1≤2k+1<1000} (2k + 1) = ∑_{0≤k<499.5} (2k + 1)

  Now since k is an integer, changing 0 ≤ k < 499.5 into the equiva-
  lent 0 ≤ k ≤ 499 gives us the right-hand expression of (1).
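If you want to double-check the two expressions of (1), a couple of lines of
Python are enough. This is a sanity check I added, not part of the notes:

    # Equation (1): sum of all odd numbers below 1000, written with the
    # constrained form and with the substituted form.
    general_form = sum(k for k in range(1, 1000) if k % 2 == 1)
    substituted = sum(2 * k + 1 for k in range(0, 500))   # 0 <= k <= 499
    assert general_form == substituted == 250000
    print(general_form)  # 250000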

Logarithms

Figure 1: Various logarithms (log2 x, ln x, log10 x) and their reciprocal
functions (2^x, e^x, 10^x). The figure is restricted to positive values because
negative values will never occur in the analysis of algorithms.

The logarithm in base a, i.e., the function x ↦ loga x, is the reciprocal
function of x ↦ a^x. Figure 1 shows a few examples.

It is common to write ln x = loge x for the natural logarithm⁴, i.e.,
the logarithm in base e. But this natural logarithm will have almost
no use to us. When analyzing algorithms, we will usually encounter
loga x for various integer values of a, and most often a = 2.

⁴ Trivia: x ↦ ln x is sometimes called Napierian logarithm after John Napier
(known as Neper in France), despite the fact that the function he defined
in 1614 was different. The natural logarithm was introduced by Nicolaus
Mercator in 1668 using the series expansion of ln(1 + x). Finally around
1730 Leonhard Euler defined the functions e^x = lim_{n→∞}(1 + x/n)^n and
ln x = lim_{n→∞} n(x^{1/n} − 1) and proved that they are the reciprocal of
each other.

There is a simple algorithm for computing a logarithm in any
base, using only elementary operations. This algorithm is also a
perfect exercise to practice logarithms.

Let us compute log10(1385) up to two decimal places. Because
x ↦ log10 x is the reciprocal function of x ↦ 10^x we know that
log10(1000) = log10(10³) = 3 and log10(10000) = log10(10⁴) = 4.
Furthermore, since 1000 < 1385 < 10000 and log10 is an increasing
function, it follows that 3 < log10(1385) < 4. We are therefore
looking for two digits a and b such that

    log10(1385) = 3.ab . . .                                   (2)

To find a, we should subtract 3 from both sides, multiply everything
by 10 and rework the left-hand side as a log10:

    log10(1385) − 3 = 0.ab . . .
    log10(1385) − log10(10³) = 0.ab . . .
    log10(1385/1000) = 0.ab . . .
    log10(1.385) = 0.ab . . .                                  (3)
    10 log10(1.385) = a.b . . .
    log10(1.385¹⁰) = a.b . . .                                 (4)
    log10(25.9715419 . . .) = a.b . . .

Since 10¹ < 25.9715419 . . . < 10² we conclude that a = 1. Did
you notice what happened between (2) and (3)? When we have
log10 x = y, removing k from y is equivalent to shifting the decimal
point by k places in x.⁵ Also, looking at (3) and (4), multiplying y
by 10 is equivalent to raising x to its 10th power.⁶ We can now use a
similar procedure to find b:

    log10(25.9715419 . . .) = 1.b . . .
    log10(2.59715419 . . .) = 0.b . . .
    log10(2.59715419 . . .¹⁰) = b. . . .
    log10(13962.955 . . .) = b. . . .

Since 10⁴ < 13962.955 . . . < 10⁵ we conclude that b = 4 and we have
just computed that log10(1385) ≈ 3.14.

⁵ This is because we are working with a base-10 logarithm.
⁶ This is independent of the base of the logarithm: this 10 is the base in
which we represent numbers on the right-hand side.

You can adjust this algorithm to compute a logarithm in any
base. Using paper and pen, the only difficult step is to compute x¹⁰.⁷
However, unless you plan to compute a lot of decimal places, you do
not necessarily need a very precise result.

⁷ You can compute x¹⁰ using only 4 multiplications. Can you see how?
Hint: x⁴ requires 2 multiplications.
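The digit-by-digit procedure is easy to mechanize. Below is a small Python
sketch of it that I added for illustration (the name digits_of_log10 is mine);
it repeatedly strips the integer part and raises the remainder to the 10th
power, exactly as in the hand computation above:

    def digits_of_log10(x, ndigits=2):
        """Compute log10(x) digit by digit, as in the hand computation."""
        # Integer part: the largest e with 10**e <= x.
        e = 0
        while 10 ** (e + 1) <= x:
            e += 1
        digits = str(e) + "."
        x /= 10 ** e          # now 1 <= x < 10, so log10(x) = 0.ab...
        for _ in range(ndigits):
            x = x ** 10       # brings the next digit in front of the decimal point
            d = 0
            while 10 ** (d + 1) <= x:
                d += 1
            digits += str(d)
            x /= 10 ** d
        return digits

    print(digits_of_log10(1385))  # '3.14'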

Floor ⌊x⌋ and Ceiling ⌈x⌉ of x

Figure 2: The functions ⌊x⌋ and ⌈x⌉.

Given a real number x, the notation ⌊x⌋ denotes the largest integer
less than or equal to x. Conversely, ⌈x⌉ denotes the smallest integer
greater than or equal to x. Figure 2 illustrates both functions, called
respectively floor and ceiling.⁸ For instance ⌊π⌋ = 3 and ⌈π⌉ = 4. These two functions
have no effect on integers: ⌊12⌋ = ⌈12⌉ = 12. In fact for any real
number x we have:

    ⌊x⌋ ≤ ⌈x⌉
    ⌊x⌋ = ⌈x⌉ iff x ∈ Z
    1 + ⌊x⌋ = ⌈x⌉ iff x ∉ Z

⁸ The C standard library has two functions floor() and ceil() that
round a double accordingly.

For any n ∈ Z and any x ∈ R the following properties hold⁹:

    ⌊x⌋ < n ⟺ x < n
    ⌈x⌉ ≤ n ⟺ x ≤ n
    n < ⌈x⌉ ⟺ n < x
    n ≤ ⌊x⌋ ⟺ n ≤ x
    ⌊x⌋ = n ⟺ x − 1 < n ≤ x ⟺ n ≤ x < n + 1
    ⌈x⌉ = n ⟺ x ≤ n < x + 1 ⟺ n − 1 < x ≤ n

⁹ Exercise: demonstrate ⌊−x⌋ = −⌈x⌉ using these properties.

For any n ∈ N we have n = ⌊n/2⌋ + ⌈n/2⌉. We can prove
this equation by considering the parity of n. If n is even, ⌊n/2⌋ =
⌈n/2⌉ = n/2 and the equation holds trivially. If n is odd, then
⌊n/2⌋ = n/2 − 1/2 and ⌈n/2⌉ = n/2 + 1/2 so the sum is indeed n.

Rounding to the nearest integer can be done with ⌊x + 0.5⌋ or
⌈x − 0.5⌉ depending on how you want to round half-integers¹⁰.

¹⁰ I.e., values of the form n + 0.5 with n ∈ N. Should they be rounded down
to n, or up to n + 1?

Now let us nest these rounding notations. It should be easy to see
that ⌈⌈x⌉⌉ = ⌊⌈x⌉⌋ = ⌈x⌉ and ⌈⌊x⌋⌉ = ⌊⌊x⌋⌋ = ⌊x⌋, i.e., only the
innermost rounding function matters.

Furthermore, for any n ∈ N+, m ∈ N+ and x ∈ R we have¹¹:

    ⌊⌊x/n⌋/m⌋ = ⌊x/nm⌋
    ⌈⌈x/n⌉/m⌉ = ⌈x/nm⌉

¹¹ Note that these two equations are only good when n and m are integers.
For instance ⌊⌊10/.3⌋/.3⌋ = 111 but ⌊10/.3/.3⌋ = 110.

The floor notation should be used any time we want to represent
an integer division, for instance as in Figure 3.

Figure 3:
    int avg(int a, int b)
    {
        return (a + b) / 2;
    }
If we ignore overflows, this function computes ⌊(a + b)/2⌋ because
dividing an int by another int will always round the result towards zero.

When rounding logarithms you should know the following identity:

    ⌈log2(n + 1)⌉ = ⌊log2(n)⌋ + 1

To prove that, rewrite n as 2^m + p where m ∈ N and 0 ≤ p < 2^m. Then:

    ⌊log2(n)⌋ + 1 = ⌊log2(2^m (1 + p/2^m))⌋ + 1 = m + ⌊log2(1 + p/2^m)⌋ + 1
                  = m + 0 + 1 = m + 1            (since 1 ≤ 1 + p/2^m < 2), and

    ⌈log2(n + 1)⌉ = ⌈log2(2^m (1 + (p + 1)/2^m))⌉ = m + ⌈log2(1 + (p + 1)/2^m)⌉
                  = m + 1                        (since 1 < 1 + (p + 1)/2^m ≤ 2).
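A quick numerical check of this identity and of n = ⌊n/2⌋ + ⌈n/2⌉, added
here as an illustration (floating point is fine for this range of n):

    import math

    for n in range(1, 10000):
        assert math.ceil(math.log2(n + 1)) == math.floor(math.log2(n)) + 1
        assert n == math.floor(n / 2) + math.ceil(n / 2)
    print("all checks passed")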

Simple Combinatorics

• Assume you have a set S of n different letters. How many differ-
ent words¹² of length k can we build using only letters from S,
assuming we can use the same letter multiple times? There are
n possible letters for each of the k positions in the word, so the
number of choices is n × n × n × · · · × n (k terms) = n^k. See Figure 4.

¹² By word, we mean just any sequence of letters, not necessarily a
meaningful word in some dictionary.

Figure 4: Over the alphabet {a, b, c} there are 3¹ ways to build a 1-letter
word, 3² ways to build a 2-letter word, and 3³ ways to build a 3-letter
word. There is only 3⁰ = 1 way to build the empty word (denoted ε).

• What if we are only allowed to use each letter of S at most once?
Then after we have selected the first letter among the n available,
we are left with only n − 1 choices for the second letter, n − 2 for
the third letter, etc. The number of words of length k ≤ n we can
build without repeated letters is therefore

    n × (n − 1) × (n − 2) × · · · × (n − k + 1)   (k terms)   = n^k̲ = n!/(n − k)!

See Figure 5 for an example. The notation n^k̲, with an underlined
exponent, is the kth falling power of n: it works like a power except
that its argument is decremented by one after each product.¹³ We
can define the falling power recursively as n^0̲ = 1 and, for k > 0,
n^k̲ = n × (n − 1)^(k−1)̲. In particular we have n^n̲ = n!.

¹³ When both n and k are natural numbers such that k ≤ n, we have
n^k̲ = n!/(n − k)!. However, the falling power can be used even when
n is a complex number, or when k is larger than n, two cases that are not
supported by the expression using factorials.

Figure 5: Without repeating letters there are only 3^1̲ = 3 ways to build a
1-letter word, 3^2̲ = 3 × 2 ways to build a 2-letter word, and 3^3̲ = 3 × 2 × 1
ways to build a 3-letter word.

• Let us now build subsets of S that contain k letters. We could pro-
ceed as we did for building words of length k with unique letters:
choosing the first letter among n, then the second among n − 1,
etc. We can actually associate each word to a set. For instance,
the word ab would correspond to the set {a, b}, the word bc to
{b, c}. The problem is that this correspondence is not a one-to-one
mapping: the word ba would also be mapped to the set {a, b}
since sets are not ordered. For a given set with k letters, there are
k^k̲ = k! different words. So the number of subsets of size k built
from a set of size n is equal to the number of k-letter words we
can build without repeating letters from n letters, divided by the
k! ways to order these k letters:

    (n choose k) = n^k̲ / k! = n! / ((n − k)! k!)

The number (n choose k), pronounced ‘n choose k’, is called binomial co-
efficient because it is the coefficient of x^k y^(n−k) in the polynomial
expansion of the nth power of the binomial x + y:

    (x + y)^n = ∑_{k=0}^{n} (n choose k) x^k y^(n−k)

Figure 6: When the words of Figure 5 are converted to sets, the tree
collapses into a lattice: ∅; {a}, {b}, {c}; {a, b}, {a, c}, {b, c}; {a, b, c}.

• What is the total number of subsets of S (of any size)? To build
one subset, we iterate over each letter of S and decide whether
we take it or not. We have 2 possibilities for each of the n letters,
that makes 2^n different subsets. On the other hand, this number
of subsets is also the sum of all subsets of different sizes, as com-
puted in the previous paragraph. So we have

    ∑_{k=0}^{n} (n choose k) = 2^n

as illustrated by Figure 7.

Figure 7: The sum of each line of Pascal’s triangle is a power of 2.
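The three counting formulas above are easy to confirm by brute force over
a small alphabet; this snippet is an illustration I added, not part of the notes:

    from itertools import product, permutations, combinations
    from math import comb

    S = "abc"
    n, k = len(S), 2

    def falling_power(n, k):
        """n^k̲ = n (n-1) ... (n-k+1), defined recursively as in the text."""
        return 1 if k == 0 else n * falling_power(n - 1, k - 1)

    assert len(list(product(S, repeat=k))) == n ** k             # words, repetitions allowed
    assert len(list(permutations(S, k))) == falling_power(n, k)  # words, no repetition
    assert len(list(combinations(S, k))) == comb(n, k)           # subsets of size k
    assert sum(comb(n, j) for j in range(n + 1)) == 2 ** n       # all subsets
    print("all counts match")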

Triangular Numbers

Figure 8: The first triangular numbers: A0 = 0, A1 = 1, A2 = 3, A3 = 6,
A4 = 10, A5 = 15, A6 = 21.

The numbers An = 0 + 1 + 2 + · · · + n = ∑_{k=0}^{n} k are called triangular
numbers because they can be represented as in Figure 8.

The equality An = n(n + 1)/2 can be demonstrated in several ways.

• By induction¹⁴. You probably already did it when you learned
induction. The proof is based on the fact that An = An−1 + n.

¹⁴ Induction is of no use to you if you do not already know the solution. If
this was a new problem for which you suspected (maybe after looking at a
few values) that An = n(n + 1)/2, then induction would be a way to prove
that your intuition is correct.

• By summing twice: once forward, and once backward.¹⁵

     An = 0 + 1 + 2 + · · · + (n − 1) + n
     An = n + (n − 1) + (n − 2) + · · · + 1 + 0
    2An = n + n + n + · · · + n + n

Since there are n + 1 terms on the right-hand side of the last line,
we find that 2An = n(n + 1).
Figure 9 shows a graphical version of this demonstration.

¹⁵ Gauss reportedly found this trick while he was a child.

Figure 9: Another way to see that 2An = n(n + 1): there are An dots
of each color arranged in an n by n + 1 rectangle.

• The previous demonstration is easily performed using the ∑
notation as well:

    2An = ( ∑_{0≤k≤n} k ) + ( ∑_{0≤k≤n} k )

Replace k by n − k in the second sum:

    2An = ( ∑_{0≤k≤n} k ) + ( ∑_{0≤n−k≤n} (n − k) )

Simplify the constraint of the second sum:

    2An = ( ∑_{0≤k≤n} k ) + ( ∑_{−n≤−k≤0} (n − k) )
    2An = ( ∑_{0≤k≤n} k ) + ( ∑_{0≤k≤n} (n − k) )

Finally merge the two sums:

    2An = ∑_{0≤k≤n} (k + n − k) = ∑_{0≤k≤n} n = n(n + 1)

• As seen on page 7, there are (n+1 choose 2) subsets of size 2 in {0, 1, . . . , n}.
Let {x, y} be such a subset, and assume x < y. Let us count all
subsets existing for the different values of x. If x = 0, there are n
possible values for y; if x = 1 we have n − 1 possible values for y;
etc. If x = n there is no value available for y. The sum of all these
n + (n − 1) + · · · + 0 just happens to be An. So we have

    An = (n+1 choose 2) = (n + 1)^2̲ / 2! = (n + 1)n / 2.

Figure 10 should therefore not be a surprise.¹⁶

Figure 10: Triangular numbers form a diagonal of Pascal’s triangle.

¹⁶ By convention (n choose k) = 0 when k > n or k < 0 (i.e., outside of
Pascal’s triangle) so our (n+1 choose 2) is also valid for A0.
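A two-line check of the closed formula and of its binomial form, added for
illustration:

    from math import comb

    for n in range(200):
        assert sum(range(n + 1)) == n * (n + 1) // 2 == comb(n + 1, 2)
    print("A_n = n(n+1)/2 = C(n+1, 2) verified for n < 200")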

Tetrahedral Numbers

Figure 11: The first tetrahedral numbers: B0 = 0, B1 = 1, B2 = 4,
B3 = 10, B4 = 20, B5 = 35, B6 = 56.

What happens when we sum all consecutive triangular numbers?

    Bn = A0 + A1 + · · · + An = ∑_{j=0}^{n} Aj = ∑_{j=0}^{n} ∑_{k=0}^{j} k

We get tetrahedral numbers, so called because stacking the triangles of
Figure 8 gives you a triangular pyramid as shown in Figure 11.
The closed formula is Bn = n(n + 1)(n + 2)/6 and there are again
a couple of ways to prove it.¹⁷

¹⁷ Do not confuse this formula with the Cn = n(n + 1)(2n + 1)/6 from page 10.

• Induction is still a possible option. The key step is that
Bn = Bn−1 + An = (n − 1)n(n + 1)/6 + n(n + 1)/2 = n(n + 1)(n + 2)/6.

• Note that the above formula (n − 1)n(n + 1)/6 + n(n + 1)/2 = n(n + 1)(n + 2)/6 is
simply a long way to write (n+1 choose 3) + (n+1 choose 2) = (n+2 choose 3). You may find it
easier to remember that Bn = (n+2 choose 3), forming another diagonal of
Pascal’s triangle (Figure 12).

Figure 12: Tetrahedral numbers form another diagonal of Pascal’s triangle:
∑ 1 = n, ∑∑ 1 = An, ∑∑∑ 1 = Bn. (Note that these sums implicitly start at
1, not 0 like in the rest of the page; do you see why it matters in this picture?)

Since each diagonal of Pascal’s triangle is made of the partial sums
of the previous diagonal, you should find it very easy to guess a
formula for the sum of consecutive tetrahedral numbers:

    ∑_{k=0}^{n} k = (n+1 choose 2),   ∑_{j=0}^{n} ∑_{k=0}^{j} k = (n+2 choose 3),   ∑_{i=0}^{n} ∑_{j=0}^{i} ∑_{k=0}^{j} k = (n+3 choose 4).

• The above two points require you to know (or suspect) that Bn =
n(n + 1)(n + 2)/6 or Bn = (n+2 choose 3) in order to prove it by induction.
How can we find a closed formula for Bn if we do not know
that? Looking at how balls are stacked in 3D in Figure 11, we
can assume that Bn should represent some volume, i.e., a cu-
bic polynomial. Or if you prefer a more mathematical view: Aj
is a quadratic polynomial, and Bn, as the sum of n of these terms,
should be expressible as a cubic polynomial. So we guess Bn =
an³ + bn² + cn + d and we just need to evaluate this for a couple
of values of n to find a, b, c, and d. Evaluating B0 = 0 tells us that
d = 0. From B1 = 1, B2 = 4, and B3 = 10 we get:

    { a + b + c = 1,  8a + 4b + 2c = 4,  27a + 9b + 3c = 10 }
    hence { c = 1 − a − b,  6a + 2b = 2,  24a + 6b = 7 }
    hence { c = 1 − a − b,  b = 1 − 3a,  6a + 6 = 7 }
    hence { a = 1/6,  b = 3/6,  c = 2/6 }

Thus we have found that (n³ + 3n² + 2n)/6, which happens to be equal to
n(n + 1)(n + 2)/6, is a polynomial that will work for n = 0, 1, 2, 3, and
we can prove by induction that it is correct for any n ∈ N.
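The guess-a-polynomial step can also be delegated to numpy; this sketch is
an added illustration (the notes solve the small system by hand):

    import numpy as np

    # Guess B_n = a n^3 + b n^2 + c n + d and fit it on the first few values.
    ns = np.array([0, 1, 2, 3])
    Bs = np.array([0, 1, 4, 10])            # B_0 .. B_3, read off Figure 11
    coeffs = np.polyfit(ns, Bs, deg=3)      # [a, b, c, d]
    print(coeffs)                           # approximately [1/6, 1/2, 1/3, 0]

    # The fitted polynomial matches n(n+1)(n+2)/6 on many more values.
    for n in range(50):
        assert round(np.polyval(coeffs, n)) == n * (n + 1) * (n + 2) // 6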

Pyramidal Numbers

Figure 13: The first pyramidal numbers: C0 = 0, C1 = 1, C2 = 5,
C3 = 14, C4 = 30, C5 = 55, C6 = 91.

The numbers Cn = 0² + 1² + 2² + · · · + n² = ∑_{k=0}^{n} k² are called
pyramidal numbers because they represent the number of spheres
stacked in a pyramid with a square base, as shown in Figure 13.
Unlike previous numbers, we will not give the closed formula di-
rectly. It seems remembering the formula is hard for many students,
so maybe it is best to learn three ways to rediscover it.

• Since this is a sum of n squares, and Figure 13 gives a 3D inter-
pretation, we can, as we did on page 9 for tetrahedral numbers,
assume that Cn is a cubic polynomial an³ + bn² + cn + d and use
the first values of Cn to find its coefficients. From C0 = 0, we learn
that d = 0. Using C1 = 1, C2 = 5, and C3 = 14, we get:

    { a + b + c = 1,  8a + 4b + 2c = 5,  27a + 9b + 3c = 14 }
    whose solution is { a = 2/6,  b = 3/6,  c = 1/6 }

Hence our polynomial is (2n³ + 3n² + n)/6 and without too much effort¹⁸
we can factorize it as n(n + 1)(2n + 1)/6.

¹⁸ Because two of the three roots are easy to find: 0 and −1.

By construction this formula is correct from C0 to C3. If we as-
sume that Cn−1 = (n − 1)n(2n − 1)/6, then Cn = Cn−1 + n² =
(n(2n² − 3n + 1) + 6n²)/6 = n(2n² + 3n + 1)/6 = n(n + 1)(2n + 1)/6.
Hence by induction our formula is correct for all n ∈ N.

• Let us compute S = ∑_{i=0}^{n} ((i + 1)³ − i³) in two ways. First, we
separate it in two sums which almost cancel out each other¹⁹:

    S = ∑_{i=0}^{n} (i + 1)³ − ∑_{i=0}^{n} i³ = ∑_{i=1}^{n+1} i³ − ∑_{i=1}^{n} i³ = (n + 1)³    (5)

¹⁹ Watch out for the indices in these two sums! The first sum is changed by
replacing i by i − 1 and rewriting the range 0 ≤ i − 1 ≤ n into 1 ≤ i ≤ n + 1.
In the second sum we just omit the first term, because it is equal to 0.

In a second approach, we develop the summand and express the
result as a sum of triangular (page 8) and pyramidal numbers:

    S = ∑_{i=0}^{n} (3i² + 3i + 1) = 3Cn + 3An + n + 1    (6)

Since (5) and (6) are two expressions for S, we get that 3Cn +
3An + n + 1 = (n + 1)³. Knowing a formula for An, we get
3Cn = (n + 1)((n + 1)² − (3/2)n − 1) hence Cn = n(n + 1)(2n + 1)/6.

• Consider each square used in the layers of a pyramid in Figure 13,
and split them into two triangles by the diagonal. One triangle
(the larger one, highlighted in Figure 14) includes the diagonal,
and the other does not. The sum of the larger triangles of all
layers of Cn is the tetrahedral number Bn (page 9) while the sum
of all smaller triangles is Bn−1. Hence

    Cn = Bn + Bn−1 = (n+2 choose 3) + (n+1 choose 3)
       = n(n + 1)(n + 2)/6 + (n − 1)n(n + 1)/6 = n(n + 1)(2n + 1)/6

Figure 14: A pyramidal number is the sum of two consecutive tetrahedral
numbers: C6 = B6 + B5.

Sum of an Arithmetic Progression

When analyzing algorithms, it often happens that the number of op-
erations performed in a loop is a linear function of the loop counter.
Then, the sum of all performed operations has the following form,
for some values of a and b:

    Dn = a + (a + b) + (a + 2b) + · · · + (a + nb) = ∑_{k=0}^{n} (a + kb)

Triangular numbers (page 8) are a special case of this sum with
a = 0 and b = 1. In the general case we can rewrite Dn using An:

    Dn = ( ∑_{k=0}^{n} a ) + ( ∑_{k=0}^{n} kb ) = a(n + 1) + b ∑_{k=0}^{n} k = a(n + 1) + bAn
       = a(n + 1) + bn(n + 1)/2 = (2a + nb)(n + 1)/2

But the same result is in fact even easier to obtain using Gauss’
trick of summing forward and backward:
     Dn = a + (a + b) + (a + 2b) + · · · + (a + nb)
     Dn = (a + nb) + (a + (n − 1)b) + (a + (n − 2)b) + · · · + a
    2Dn = (2a + nb) + (2a + nb) + (2a + nb) + · · · + (2a + nb)

Hence 2Dn = (2a + nb)(n + 1). Figure 15 gives an example with
a = 1 and b = 2.

Figure 15: The sum On = ∑_{k=0}^{n} (2k + 1) of the first n + 1 odd numbers
is such that 2On = (2n + 2)(n + 1), hence On = (n + 1)².

The above trick has a huge advantage over expressing Dn using
An: it can be generalized very easily to any partial sum of an arith-
metic progression. For instance, let us assume you want to sum all
the terms 3 + 5i for 100 ≤ i ≤ 1000. Calling S the result, you would
write

     S = 503 + 508 + 513 + · · · + 5003
     S = 5003 + 4998 + 4993 + · · · + 503
    2S = 5506 + 5506 + 5506 + · · · + 5506

The number of terms²⁰ in these sums is 901 since we go from i = 100
to i = 1000. Therefore 2S = 5506 × 901 and S = 2480453.

²⁰ Always be cautious when calculating the length of an interval: it is a
frequent source of off-by-one errors.

For any a, b, and v ≤ w, we have

    ∑_{k=v}^{w} (a + kb) = (2a + (v + w)b)(w − v + 1) / 2.

You might find the above formula easier to remember as

    ((a + vb) + (a + wb)) × (w − v + 1) / 2,

that is: the sum of the first and last terms, multiplied by half the
number of terms.²¹

²¹ But do also remember that this is only valid for arithmetic progressions.
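The worked example (the terms 3 + 5i for 100 ≤ i ≤ 1000) is easy to
double-check; this snippet is an added illustration:

    # Sum of the arithmetic progression 3 + 5i for 100 <= i <= 1000,
    # compared against the closed formula (first + last) * nterms / 2.
    brute = sum(3 + 5 * i for i in range(100, 1001))
    first, last, nterms = 3 + 5 * 100, 3 + 5 * 1000, 1001 - 100
    assert brute == (first + last) * nterms // 2 == 2480453
    print(brute)  # 2480453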

Sum of a Geometric Progression

Consider the sum of the terms of a geometric progression of ratio r:

    En = 1 + r + r² + · · · + rⁿ = ∑_{k=0}^{n} r^k

An easy way to find a closed formula for this sum is to notice that
En and rEn have many terms in common:

     En = 1 + r + r² + · · · + rⁿ
    rEn =     r + r² + · · · + rⁿ + r^(n+1)

hence En − rEn = 1 − r^(n+1), and assuming r ≠ 1,

    En = (1 − r^(n+1)) / (1 − r)

The formula to remember is therefore:

    For any r ≠ 1,   ∑_{k=0}^{n} r^k = (1 − r^(n+1)) / (1 − r)        (7)

• When r = 2, we have ∑_{k=0}^{n} 2^k = (1 − 2^(n+1))/(1 − 2) = 2^(n+1) − 1,
a formula that should be known by any programmer. It gives, for instance,
the number of nodes in a complete binary tree of height n (see Figure 16). A
binary number (111 . . . 1)₂ that has all its n bits set to 1 represents
the value ∑_{k=0}^{n−1} 2^k = 2ⁿ − 1. In particular 2⁸ − 1 is the maximum
value you can represent with an unsigned char variable, since
this type uses 8 bits.

Figure 16: A complete binary tree of height 3 has 2⁰ + 2¹ + 2² + 2³ = 2⁴ − 1 = 15 nodes.

9 We had to assume r ≠ 1 because of the division by 1 − r, but the
limit²² of (1 − r^(n+1))/(1 − r) when r tends to 1 is actually what we expect:

    ∑_{k=0}^{n} 1^k = ∑_{k=0}^{n} 1 = n + 1
    lim_{r→1} (1 − r^(n+1))/(1 − r) = lim_{r→1} (−(n + 1)rⁿ)/(−1) = n + 1

²² Limits on this page are computed using L’Hôpital’s rule: if lim_{x→c} f(x) =
lim_{x→c} g(x) = 0 and lim_{x→c} f′(x)/g′(x) exists, then
lim_{x→c} f(x)/g(x) = lim_{x→c} f′(x)/g′(x).

9 Equation (7) can be used to rediscover the formula for Triangular
Numbers (page 8). To transform ∑ r^k into ∑ k, we differentiate
∑ r^k with respect to r, giving ∑ k r^(k−1), and then we set r = 1. Of
course we must do these operations on both sides of (7), and we
have to take a limit for r → 1 on the right:

    d/dr ∑_{k=0}^{n} r^k = d/dr (1 − r^(n+1))/(1 − r)

    ∑_{k=1}^{n} k r^(k−1) = (−(n + 1)rⁿ(1 − r) + (1 − r^(n+1))) / (1 − r)²

    ∑_{k=1}^{n} k = lim_{r→1} (n r^(n+1) − (n + 1)rⁿ + 1) / (1 − r)²
              = lim_{r→1} ((n + 1)n rⁿ − (n + 1)n r^(n−1)) / (2(r − 1))
              = lim_{r→1} ((n + 1)n r^(n−1)(r − 1)) / (2(r − 1)) = (n + 1)n / 2

Similarly d/dr (r · d/dr ∑ r^k) = ∑ k² r^(k−1), so by setting r = 1 we get the
formula for the Pyramidal Numbers (page 10).²³

²³ Doing so is left as an exercise to the reader. If you survive the double
differentiation and the computation of the limit, and obtain the expected
n(n + 1)(2n + 1)/6, treat yourself with a well-earned lollipop.
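A short check of equation (7) and of the r = 2 special case, added for
illustration (exact rational arithmetic avoids rounding issues):

    from fractions import Fraction

    for r in (Fraction(2), Fraction(3), Fraction(1, 2), Fraction(-4)):
        for n in range(10):
            assert sum(r**k for k in range(n + 1)) == (1 - r**(n + 1)) / (1 - r)
    for n in range(20):
        assert sum(2**k for k in range(n + 1)) == 2**(n + 1) - 1
    print("equation (7) verified")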

9 Catalan Numbers

A Dyck word of length 2n is a string built using n opening paren-
theses and n closing parentheses, in such a way that a closing
parenthesis always matches an opening one. For instance w1 =
‘((()()))(())’ is a Dyck word, but w2 = ‘(()))(())()(’ is not.
Let Pn be the number of Dyck words of length 2n. This integer
sequence (Table 1) is known as the Catalan numbers²⁴.

Table 1: Number of Dyck words for various n, a.k.a. Catalan numbers.
    n    Pn     Dyck words
    0    1      ε (empty word)
    1    1      ()
    2    2      ()(), (())
    3    5      ()()(), ()(()), (())(), (()()), ((()))
    4    14     …
    5    42     …
    6    132    …

²⁴ Named after Eugène Charles Catalan (1814–1894).

A string with n opening parentheses and n closing parentheses
can be interpreted as a path on a square grid (Figure 17). Starting
from the lower left corner and interpreting the letters ‘(’ and ‘)’
respectively as up and right, we necessarily reach the upper right
corner. The number of paths that join the two corners using only n
up and n right movements is (2n choose n): from the total of 2n movements
we simply have to choose n which will be the ups. (Or if you prefer
working with words: in a string of 2n characters we have to choose n
positions among the 2n positions available to put the ‘(’ letters.)

Figure 17: The words w1 = ‘((()()))(())’ and w2 = ‘(()))(())()(’
interpreted as paths on a grid. The letter ‘(’ is up, while ‘)’ is right. Dyck
words correspond to paths that stay above the diagonal.

Not all these (2n choose n) paths correspond to Dyck words, only those that
stay above the diagonal. To count the number of paths that do not
correspond to Dyck words, let us consider the first segment of the
path that goes below the diagonal, and flip all up and right move-
ments afterwards (Figure 18). This is a reversible operation that can
only be done on paths that do not represent a Dyck word. Since the
resulting path has only n − 1 up movements, there are (2n choose n−1) words of
length 2n that are not Dyck words. We have established that

    Pn = (2n choose n) − (2n choose n−1)                                  (8)

which we can simplify:

    Pn = (2n)(2n − 1) · · · (n + 2)(n + 1) / n!  −  (2n)(2n − 1) · · · (n + 2) / (n − 1)!
       = ((2n)(2n − 1) · · · (n + 2)(n + 1) / n!) (1 − n/(n + 1))

    Pn = (1/(n + 1)) (2n choose n)                                        (9)

Figure 18: Flipping all ups and rights that occur after the first segment
below the diagonal transforms a path with n ups and n rights into a path
with n − 1 ups and n + 1 rights.

Note that (8) tells us that Pn is an integer even if it is not that obvi-
ous from (9).
Catalan numbers have a vast number of applications.²⁵ For in-
stance the number of full²⁶ binary trees with n internal nodes is Pn.
To see that, make a depth-first traversal of some full binary tree and
write ‘(’ each time you get down a left edge, and ‘)’ each time you
get down a right edge (Figure 19 below).

²⁵ And also many different proofs.
²⁶ A binary tree is full if all its internal nodes have degree 2.

Figure 19: The P3 = 5 full binary trees with 3 internal nodes and their
relation to Dyck words of length 6: ((())), (()()), (())(), ()(()), ()()().
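Formulas (8) and (9) agree with a brute-force count of Dyck words. The
following check is an added illustration; is_dyck and dyck_count are helper
names I introduce here, not from the notes:

    from itertools import combinations
    from math import comb

    def is_dyck(word):
        """A word over '()' is a Dyck word if no prefix has more ')' than '('."""
        depth = 0
        for c in word:
            depth += 1 if c == '(' else -1
            if depth < 0:
                return False
        return depth == 0

    def dyck_count(n):
        """Brute force: place the n '(' among the 2n positions and test each word."""
        count = 0
        for opens in combinations(range(2 * n), n):
            word = [')'] * (2 * n)
            for i in opens:
                word[i] = '('
            count += is_dyck(word)
        return count

    for n in range(8):
        not_dyck = comb(2 * n, n - 1) if n > 0 else 0   # convention: C(2n, -1) = 0
        assert dyck_count(n) == comb(2 * n, n) - not_dyck == comb(2 * n, n) // (n + 1)
    print([comb(2 * n, n) // (n + 1) for n in range(8)])  # [1, 1, 2, 5, 14, 42, 132, 429]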

9 Bounding Sums with Integrals

The technique presented on this page justifies (and generalizes)
the intuition we used on pages 9 and 10 that the sum of n quadratic
terms should be a cubic polynomial.

For more generality, let us consider the sum f(0) + f(1) + · · · +
f(n) where f is some monotonically increasing function. Showing
these terms under the graph of f as in Figures 20 and 21 we have

    ∫_{−1}^{n} f(k) dk ≤ ∑_{k=0}^{n} f(k) ≤ ∫_{0}^{n+1} f(k) dk

Figure 20: When f(i) is interpreted as an area between i and i + 1, we have
f(0) + · · · + f(n) ≤ ∫_0^{n+1} f(k) dk.
Figure 21: If f(i) is interpreted as an area between i − 1 and i, we have
f(0) + · · · + f(n) ≥ ∫_{−1}^{n} f(k) dk.

Note that the length of the two integration intervals is equal to the
number of terms in the sum.²⁷

²⁷ Using a semi-open interval for the sum, we can rewrite these inequalities
using the same bounds for the sum and integrals:
∫_0^{n+1} f(k − 1) dk ≤ ∑_{0≤k<n+1} f(k) ≤ ∫_0^{n+1} f(k) dk

These inequalities come in handy to bound a sum that we do
not know how to simplify. For instance, let us pretend that we do
not know how to compute triangular numbers (page 8). We simply
rewrite the above inequalities with f(k) = k:

    ∫_{−1}^{n} k dk ≤ ∑_{k=0}^{n} k ≤ ∫_{0}^{n+1} k dk

Since the antiderivative²⁸ of k is k²/2 we get:

    [k²/2]_{−1}^{n} ≤ ∑_{k=0}^{n} k ≤ [k²/2]_{0}^{n+1}

    (n² − 1)/2 ≤ ∑_{k=0}^{n} k ≤ (n + 1)²/2

²⁸ a.k.a. primitive

We do not have an exact value for this sum, but from these bounds
we can at least derive some asymptotic equivalence²⁹:

    ∑_{k=0}^{n} k ∼ n²/2

²⁹ f ∼ g iff lim_{n→∞} f(n)/g(n) = 1

A complexity we will encounter later is log2(n!). Do you think
that using log2(n!) operations to sort n values is efficient? It is hard
to tell if you have no idea how fast log2(n!) grows. Luckily, we can
rewrite log2(n!) as a sum:

    log2(n!) = log2( ∏_{k=1}^{n} k ) = ∑_{k=1}^{n} log2(k) = ∑_{k=2}^{n} log2(k)

and then we simply apply the bound-by-integral technique³⁰:

    ∫_{1}^{n} log2(k) dk ≤ ∑_{k=2}^{n} log2(k) ≤ ∫_{2}^{n+1} log2(k) dk

    [k log2(k/e)]_{1}^{n} ≤ log2(n!) ≤ [k log2(k/e)]_{2}^{n+1}

    n log2 n − n log2(e) + log2(e) ≤ log2(n!) ≤ (n + 1) log2(n + 1) − (n + 1) log2(e) − 2 log2(2/e)

³⁰ If you learned that the antiderivative of ln(x) is x ln(x) − x, just erase it
from your memory, and use the freed space to store a formula that will work
for all bases instead: the antiderivative of loga(x) is x loga(x/e).

From that we easily conclude log2(n!) ∼ n log2 n. A sorting algo-
rithm that performs in the order of n log2 n operations is actually
pretty good.³¹

³¹ On page 52 we will demonstrate that any sorting algorithm that uses
comparisons to order values requires at least log2(n!) comparisons in the
worst case.

For a more precise tie between sums and integrals, look up the
Euler-Maclaurin formula in your preferred encyclopedia.
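To get a feel for how tight these bounds are, here is a short added check
comparing log2(n!) with its two integral bounds and with n log2 n:

    import math

    def log2_factorial(n):
        return sum(math.log2(k) for k in range(2, n + 1))

    for n in (10, 100, 1000, 10000):
        lo = n * math.log2(n) - (n - 1) * math.log2(math.e)
        hi = ((n + 1) * math.log2(n + 1) - (n + 1) * math.log2(math.e)
              - 2 * math.log2(2 / math.e))
        exact = log2_factorial(n)
        print(n, round(lo, 1), round(exact, 1), round(hi, 1),
              round(exact / (n * math.log2(n)), 3))
    # The last column slowly tends to 1, illustrating log2(n!) ~ n*log2(n).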

9 Summing Using the Reciprocal

Here is a nifty trick to deal with sums such as ∑_i ⌊log2 i⌋. Let us
consider the following sum, which we will encounter later.

    Fn = ∑_{i=1}^{n} (⌊log2 i⌋ + 1) = n + ∑_{i=1}^{n} ⌊log2 i⌋

The trick, pictured on Figure 22, is to express the sum³² of a strictly
increasing function f using the sum of its reciprocal f⁻¹.

³² This trick can be applied to integrals as well.

Figure 22: The total area covered by those two sums is a rectangle, and we
have ∑_{k=1}^{16} ⌊log2 k⌋ = 17 × 4 − ∑_{k=1}^{4} 2^k.

Generalizing this figure for any n, we have

    ∑_{k=1}^{n} ⌊log2 k⌋ = (n + 1)⌊log2 n⌋ − ∑_{k=1}^{⌊log2 n⌋} 2^k
                       = (n + 1)⌊log2 n⌋ + 1 − ∑_{k=0}^{⌊log2 n⌋} 2^k
                       = (n + 1)⌊log2 n⌋ + 2 − 2^(⌊log2 n⌋+1)

Finally, Fn = n + ∑_{i=1}^{n} ⌊log2 i⌋ = n + (n + 1)⌊log2 n⌋ + 2 − 2^(⌊log2 n⌋+1).

Why do we care about such a function? Because ⌊log2(n)⌋ + 1 is
the number of bits required to represent the number n in binary, and
many algorithms have a run time proportional to that. Fn is the sum
of the bits needed to represent each number between 1 and n (see
Figure 23).

Figure 23: The first 16 positive integers with their binary representation.
The ⌊log2(i)⌋ + 1 bits needed to represent the number i are highlighted. If
you omit the last column (a total of n bits), the two colored areas are the
same as in Figure 22.

For instance, running a recursive implementation of Bina-
rySearch (page 24) on an array of length n involves at most
2 + ⌊log2(n)⌋ calls (the first one plus the recursive ones) to Bi-
narySearch.

Now let us assume that you are doing a binary search to insert
a new element into a sorted array, and that you do this in a loop,
so that each binary search is applied to an array that has one entry
more than the previous one. The total number of calls to Binary-
Search (including recursive calls) will therefore have the form

    ∑_{k=i}^{j} (2 + ⌊log2 k⌋)

where i and j depend on the initial size of the array and the number
of iterations (i.e., binary search + insertion) performed.
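The closed form of Fn is easy to validate numerically. This is an added
check; ⌊log2 i⌋ is computed with int.bit_length to avoid floating point:

    # F_n = sum of the number of bits of 1..n; check the closed form above.
    def F(n):
        return sum(i.bit_length() for i in range(1, n + 1))  # bit_length(i) = floor(log2 i) + 1

    def F_closed(n):
        L = n.bit_length() - 1                                # floor(log2 n)
        return n + (n + 1) * L + 2 - 2 ** (L + 1)

    assert all(F(n) == F_closed(n) for n in range(1, 5000))
    print(F(16), F_closed(16))  # 54 54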

9 Finite Calculus

For those of you curious to learn new tricks, here is something called
Finite Calculus. This page should by no means be understood as a
reference on this subject; instead, consider it as a teaser³³.

³³ For a more serious presentation of Finite Calculus, I suggest you start with
Finite Calculus: A Tutorial for Solving Nasty Sums, by David Gleich.

The idea is that a sum of k^α̲ over a half-open interval behaves like
the integral of k^α over the same interval. So summing these falling
powers³⁴ should be very natural (at least if you remember how to
integrate). Compare the following equations:

    ∑_{0≤k<n} 1 = n                          ∫_0^n 1 dk = n
    ∑_{0≤k<n} k = n^2̲ / 2                    ∫_0^n k dk = n²/2
    ∑_{0≤k<n} k^2̲ = n^3̲ / 3                  ∫_0^n k² dk = n³/3
    ∑_{0≤k<n} k^α̲ = n^(α+1)̲ / (α + 1)        ∫_0^n k^α dk = n^(α+1) / (α + 1)

³⁴ The falling power was defined on page 7 as k^0̲ = 1 and k^α̲ = k (k − 1)^(α−1)̲.

These work on non-zero based intervals as you would expect from
an integral. For instance

    ∑_{i≤k<j} k^α̲ = [k^(α+1)̲ / (α + 1)]_i^j = (j^(α+1)̲ − i^(α+1)̲) / (α + 1)

Following these rules, we can for instance compute the tetrahedral
numbers of page 9 very easily (just remember to use semi-open
intervals³⁵):

    Bn = ∑_{0≤j<n+1} ∑_{0≤k<j+1} k = ∑_{0≤j<n+1} (j + 1)^2̲ / 2
       = (n + 2)^3̲ / 6 − 1^3̲ / 6 = (n + 2)^3̲ / 6      (since 1^3̲ = 1 × 0 × (−1) = 0)

³⁵ They have to be closed on the left side, and open on the right side.

If we look at functions other than falling powers, the analogy
between sum and integral does not always exist. For instance, it
would be tempting to see the sum of 2^x as the analogue of the
integral of e^x:

    ∑_{i≤k<j} 2^k = 2^j − 2^i                ∫_i^j e^k dk = e^j − e^i

but the sum and integral of x^k do not actually exhibit that much
similarity:

    ∑_{i≤k<j} x^k = (x^j − x^i) / (x − 1)    ∫_i^j x^k dk = (x^j − x^i) / ln x

Binary Trees

Let us define a binary tree recursively as follows: a binary tree is
either the empty tree ∅, or a pair of binary trees (L, R) where L is
called the left child, while R is the right child.

As Figure 24 illustrates, a binary tree can be represented as a
graph where each pair (L, R) is represented by a node connected
to new nodes created for each of its non-empty children. These
graphs are traditionally drawn going down, with the left and right
children located on the corresponding side below their parent node.
With this drawing convention the shape of the graph is enough to
uniquely identify a binary tree, so we can forgo the mathematical
notations and work only with pictures such as Figure 25.

Figure 24: A graphical representation of the binary tree
(((∅, ∅), ∅), ((∅, (∅, ∅)), (∅, ∅))).
Figure 25: A nicer representation of the same binary tree. This tree is not
full because it has two internal nodes of degree 1.

A node that has only empty children (i.e., a node labeled by
(∅, ∅)) is called a leaf node. The other nodes are called internal
nodes. These two sets of nodes are shown with two colors on Fig-
ures 24 and 25, but this coloration is purely cosmetic. The topmost
node is called the root of the tree³⁶. The degree of a node is the num-
ber of non-empty children: the leaves are the nodes with degree 0,
while internal nodes have degree 1 or 2.

³⁶ In Figure 25 the root is an internal node. Can you build a tree where the
root is a leaf?

A full binary tree is a binary tree where all internal nodes have
degree 2. The binary tree of Figure 25 is not full, while the one of
Figure 26 is. Figure 19 on page 13 shows all possible full binary trees
with 3 internal nodes.

The depth of a node is the number of edges between this node and
the root. The root itself has depth 0; its children have depth 1; its
grandchildren have depth 2; etc. The height of a tree is the maximum
depth of its nodes.³⁷

Figure 26: A full binary tree: each internal node has two non-empty
children. The height of this tree is 3.
³⁷ We could also write ‘‘the maximum depth of its leaves’’, because for any
internal node there exists a deeper leaf.

You should be able to prove the following properties³⁸ by yourself:
• A binary tree with n nodes has n − 1 edges.³⁹
• A binary tree with ℓ leaves has exactly ℓ − 1 internal nodes of
degree 2.
• A binary tree of height h has at most 2^h leaves.
• The height of a binary tree with ℓ > 0 leaves is at least ⌈log2 ℓ⌉.
• The number of nodes of a binary tree of height h is at most 2^(h+1) − 1.
• The height of a binary tree with n > 0 nodes is at least ⌈log2(n + 1) − 1⌉,
which we have shown on page 6 to be equal to ⌊log2 n⌋.

³⁸ All of these are for binary trees, they do not assume the binary tree to be full.
³⁹ Hint: what does every node but the root have?

A full binary tree of height h is balanced if the depth of each leaf
is either h or h − 1. For instance, the full binary tree of Figure 26
is balanced because all its leaves have depth 2 or 3. A balanced full
binary tree of height h necessarily has ∑_{k=0}^{h−1} 2^k = 2^h − 1 nodes
of depth h − 1 or smaller and between 1 and 2^h nodes of depth h.
So if we write n the number of nodes, we have 2^h ≤ n < 2^(h+1),
hence h ≤ log2(n) < h + 1 and because h has to be an integer:
h = ⌊log2(n)⌋. The height of a balanced full binary tree of n nodes is
therefore always ⌊log2(n)⌋.

9 Comparison of Two Uniformly Distributed Values

Let us assume we have two dice x and y, returning uniformly-
distributed integers such that 20 ≤ x ≤ 40 and 30 ≤ y ≤ 50. We
want to compute the probability that x ≥ y. Maybe this is part of a
role-playing game that we are developing, and this comparison will
decide the outcome of a battle. Being able to figure out this probability
will allow us to fine-tune the ranges of x and y to balance the game.

We can represent the different values that x and y may take in two
dimensions, as on Figure 27, and then it is just a matter of counting
the dots:

    P(x ≥ y) = ( ∑_{y=30}^{50} ∑_{x=y}^{40} 1 ) / ( ∑_{y=30}^{50} ∑_{x=20}^{40} 1 )

Figure 27: When x and y are integers, the probability P(x ≥ y) is the
number of highlighted dots divided by the total number of dots.

The constraint y ≤ x ≤ 40 on the inner sum of the numerator implies
that the outer sum should be restricted to y ≤ 40:

    = ( ∑_{y=30}^{40} ∑_{x=y}^{40} 1 ) / ( ∑_{y=30}^{50} ∑_{x=20}^{40} 1 )
    = ( ∑_{y=30}^{40} (41 − y) ) / ( ∑_{y=30}^{50} 21 )
    = ((12 × 11)/2) / (21 × 21) = 66/441

If we replace our two random integers x and y by uniformly gener-
ated real values, then P(x ≥ y) can be computed in a similar way, but
using a ratio of areas:

    P(x ≥ y) = ( ∫_{30}^{40} ∫_{y}^{40} dx dy ) / ( ∫_{30}^{50} ∫_{20}^{40} dx dy )
             = ( ∫_{30}^{40} (40 − y) dy ) / ( ∫_{30}^{50} 20 dy )
             = [40y − y²/2]_{30}^{40} / (20 × 20) = 50/400

Figure 28: When x and y are real numbers, the probability P(x ≥ y) is the
highlighted area divided by the total area.

This result matches the observation on Figure 28 that the colored
triangle covers 1/8 of the full area.

F Notes f

1. In the integer (or discrete) case P(x ≥ y) > P(x > y) whereas
when using real (or continuous) values, P(x ≥ y) = P(x > y).
In the latter case x and y can take an infinite number of values, so
P(x = y) = 0.

2. When writing the sum, we could keep ∑_{y=30}^{50} ∑_{x=y}^{40} 1 without
restricting it to ∑_{y=30}^{40} ∑_{x=y}^{40} 1. These two sums are equal
because the bounds of the sum must be increasing (for y = 45 we have
∑_{x=45}^{40} 1 = 0). However integrals are signed beasts: we have
∫_a^b f(x) dx = −∫_b^a f(x) dx, so we need to use the correct bounds
from the start.

3. These computations can work with more than two variables.⁴⁰
For instance if we have 3 uniformly distributed integer variables
0 ≤ xi ≤ 20 for i ∈ {1, 2, 3}, and wish to compute the probability
that x1 ≤ x2 ≤ x3, we can do it in a similar way:

    P(x1 ≤ x2 ≤ x3) = ( ∑_{x3=0}^{20} ∑_{x2=0}^{x3} ∑_{x1=0}^{x2} 1 ) / ( ∑_{x3=0}^{20} ∑_{x2=0}^{20} ∑_{x1=0}^{20} 1 ) = 1771/9261

⁴⁰ Page … uses computations similar to those in a continuous context.
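Both the discrete ratio 66/441 and the continuous ratio 1/8 are easy to
confirm; this snippet is an added check (the continuous case is estimated
by Monte Carlo, with an arbitrary sample count N):

    from fractions import Fraction
    import random

    # Discrete case: enumerate all (x, y) pairs.
    hits = sum(1 for x in range(20, 41) for y in range(30, 51) if x >= y)
    print(Fraction(hits, 21 * 21))         # 66/441, printed reduced as 22/147

    # Continuous case: Monte Carlo estimate of P(x >= y) = 1/8.
    N = 1_000_000
    hits = sum(random.uniform(20, 40) >= random.uniform(30, 50) for _ in range(N))
    print(hits / N)                        # close to 0.125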

9 Probabilities over Sequences of Random Numbers

Consider a long sequence of random numbers, uniformly distributed
between 0 and 1. Let us denote this sequence by σ = s1, s2, . . . , sn
where 0 ≤ si ≤ 1 for all i. We say that si, si+1, . . . , sj is a sorted subse-
quence if si ≤ si+1 ≤ · · · ≤ sj. A sorted subsequence is maximal if it is
not included in a larger sorted subsequence. We want to answer the
following questions for large n:

1. What is the expected average number of maximal sorted se-
quences in σ?

2. Given an index i chosen randomly, what is the expected average
length of the maximal sorted sequence that contains si?⁴¹

⁴¹ Note that this is different from the average length of the maximal
sorted sequences of σ. For instance the sequence 0.2, 0.4, 0.6, 0.3 contains
two maximal sorted subsequences of sizes 3 and 1. The average length of
those sequences is (3 + 1)/2 = 2, but the average length for a maximal
sequence containing some random index i is (3 + 3 + 3 + 1)/4 = 2.5.

F Answer to Question 1 f

Let us compute the probability⁴² that some element si is the
start of a sorted sequence (and therefore also the start of a maximal
sorted sequence). For i > 1, such an element satisfies si−1 > si.

    P(si−1 > si) = ( ∫_0^1 ∫_{si}^1 dsi−1 dsi ) / ( ∫_0^1 ∫_0^1 dsi−1 dsi )
                 = ∫_0^1 (1 − si) dsi = [si − si²/2]_0^1 = 1/2

⁴² Please read page 18 if you feel uncomfortable with the upcoming integrals.

Now s1 is necessarily the start of a sorted subsequence, and a
new sorted subsequence will start with probability 1/2 at the n − 1
remaining positions. Therefore the expected number of maximal
sorted subsequences is 1 + (n − 1)/2.

F Answer to Question 2 f

Let Pm denote the probability Pm = P(s1 ≤ s2 ≤ · · · ≤ sm ∧ sm >
sm+1) that s1 starts a maximal subsequence of length m.

    Pm = ( ∫_0^1 ∫_{sm+1}^1 ∫_0^{sm} · · · ∫_0^{s3} ∫_0^{s2} ds1 ds2 . . . dsm−1 dsm dsm+1 )
         / ( ∫_0^1 ∫_0^1 ∫_0^1 · · · ∫_0^1 ∫_0^1 ds1 ds2 . . . dsm−1 dsm dsm+1 )
       = ∫_0^1 ∫_{sm+1}^1 (sm)^(m−1)/(m − 1)! dsm dsm+1
       = ∫_0^1 (1 − (sm+1)^m)/m! dsm+1
       = 1/m! − 1/(m + 1)!

The expected length ℓ = ∑_{m=1}^{∞} m Pm of the first maximal subse-
quence when n → ∞ is

    ℓ = ∑_{m=1}^{∞} ( m/m! − (m + 1 − 1)/(m + 1)! )
      = ∑_{m=1}^{∞} ( 1/(m − 1)! − 1/m! + 1/(m + 1)! )
      = e − (e − 1) + (e − 2) = e − 1 ≈ 1.72

If we pick some random element si in an infinitely long sequence,
the probability that it belongs to the first maximal increasing se-
quence is null. However ℓ is the average length of the maximal
increasing suffix of si (si si+1 si+2 . . .), and symmetrically, it is also
the average length of its increasing prefix (. . . si−2 si−1 si). Therefore,
the average size of the maximal increasing sequence containing a
randomly chosen element is 2ℓ − 1 ≈ 2.44.

To double-check these answers, see the experiment of Figures 29–30.

Figure 29: A small Python program testing how our theoretical results
match the reality.

    from random import uniform
    from math import expm1

    N = 10000000
    x = [uniform(0, 1) for _ in range(N)]
    lengths = []
    curlen = 0
    preval = -1
    for val in x:
        if preval <= val:
            curlen = curlen + 1
        else:
            lengths.append(curlen)
            curlen = 1
        preval = val
    lengths.append(curlen)
    assert sum(lengths) == N

    print("Theory:   ", 1 + (N - 1) / 2)
    print("Practice: ", len(lengths))
    print("Theory:   ", 2 * expm1(1) - 1)
    ell = sum(y * y for y in lengths) / N
    print("Practice: ", ell)

Figure 30: Example output for the program of Figure 29.

    Theory:    5000000.5
    Practice:  4999813
    Theory:    2.43656365691809
    Practice:  2.4363526

Computing Complexities for Algorithms

The time complexity of an algorithm, often noted T(n), is a func-
tion that indicates how long the algorithm will take to process an
input of size n. Similarly, the space complexity S(n) measures the ex-
tra memory that the algorithm requires. These two definitions are
vague⁴³ on purpose, as there are different ways to compute and
express these quantities, depending on how we plan to use them:

⁴³ What would be the units of n, T(n), and S(n)?

• A complexity can be given as a precise formula. For instance, we
might say that SelectionSort requires n(n − 1)/2 comparisons
and n − 1 swaps to sort an array of size n. If these are the most fre-
quent operations in the algorithm, this suggests that T(n) can be
expressed as a quadratic polynomial T(n) = an² + bn + c. Know-
ing this, we can compute the coefficients a, b, and c for a given
implementation⁴⁴ of SelectionSort by measuring its run time
on a few arrays of different sizes.

⁴⁴ Two implementations of the same algorithm are likely to have different
coefficients.

If the implementations of two sorting algorithms A1 and A2 have
time complexities T1(n) and T2(n) as shown on Figure 31, we
can see that A2 is better for n < 8, and A1 for n > 8.

Figure 31: T1(n) = 10n² + 14n + 316 and T2(n) = 2n³ + 5n + 4.

• A complexity can be given using Landau’s notation, to give an
idea of its order. For instance, we would say that SelectionSort
has time complexity T(n) = Θ(n²). This means that when n tends
to ∞, T(n) behaves like n² up to some multiplicative factor.

This notation simplifies the derivations of complexities because it
is only concerned about the asymptotic behavior of the algorithm.
For instance, a Θ(n log n) algorithm will be more efficient than a
Θ(n²) algorithm for large values of n. However, it tells us nothing
about the behavior for small values of n.

With this notation 10n² + 2n and 2n² + 10n would both be written
Θ(n²). Because of those hidden constants, we can hardly compare
two algorithms that have the same order of complexity.

• In computational complexity theory, problems (not algorithms)⁴⁵
are classified according to their difficulty. For instance, the class
PTIME (often abbreviated P) is the set of all problems that can be
solved by some algorithm in polynomial time on a deterministic
Turing machine. The class EXPTIME contains problems that can
be solved in exponential time.⁴⁶ These classes are broad: PTIME
doesn’t distinguish between linear or cubic complexities, and
EXPTIME does not distinguish between 2ⁿ and 10ⁿ, although
these differences certainly do matter to us as programmers.

⁴⁵ Sorting an array is a problem that can be solved by many different algorithms.
⁴⁶ Obviously PTIME ⊂ EXPTIME.

In this lecture, we shall focus only on how to derive complexities of
the first two kinds.

SelectionSort

This is probably the simplest sorting algorithm to study.

Given an array A containing n values to sort in increasing order,
SelectionSort maintains the loop invariant depicted by Figure 32:
at any iteration i, the values in the range A[0..i − 1] are the smallest
i values of A in increasing order (i.e., this part is sorted and will not
be changed), while the values in A[i..n − 1] are unsorted and larger
than all the values in A[0..i − 1].

Figure 32: Loop invariant for SelectionSort: A[0..i − 1] holds the i
smallest values, sorted; A[i..n − 1] holds the n − i largest values, unsorted.

When i = 0 the array is completely unsorted, and when i = n the
array is fully sorted. Actually we can stop after i = n − 2 because
after this iteration the only ‘‘unsorted’’ value, A[n − 1], is necessarily
the largest value in the array, so it is already at the right place.

To increase i while maintaining this invariant, all we need is to
exchange A[i] with the minimum value of A[i..n − 1]. This gives us
the following algorithm (illustrated by Figure 33):

    SelectionSort(A, n)                        (executions)
    1  for i ← 0 to n − 2 do                    n − 1
    2      min ← i                              n − 1
    3      for j ← i + 1 to n − 1 do            (n − 1)n/2
    4          if A[j] < A[min]                 (n − 1)n/2
    5              min ← j                      ≤ (n − 1)n/2
    6      A[min] ↔ A[i]                        n − 1

Figure 33: The colored arrows show the exchanges performed on line 6
by SelectionSort. Exactly n − 1 swaps are needed to sort an array of n
elements.

Because line 1 iterates from 0 to n − 2, we can easily tell that lines
1, 2, and 6 will be executed n − 1 times. The other three lines are
involved in two nested loops: for a given i the loop on line 3 will
make (n − 1) + 1 − (i + 1) = n − i − 1 iterations. We have to sum this
for all i using for instance the formula from page 11:

    ∑_{i=0}^{n−2} (n − i − 1) = ((n − 1) + 1)(n − 1)/2 = (n − 1)n/2

Finally line 5 should have an execution count that is at most (n −
1)n/2 since it is only executed if the previous comparison succeeds.

SelectionSort performs (n − 1)n/2 comparisons (line 4) and
n − 1 exchanges (line 6). Since the execution counts of all the other
lines are also expressed using these quantities, we should be able to
approximate⁴⁷ the total number of operations performed (or even
the time to execute the algorithm) as a linear combination of these
two quantities: a polynomial of the form an² + bn + c.

⁴⁷ This is not exactly true, because line 5 may not be proportional to any
of those. However, because line 5 is simple and executes less often than
the comparisons, it should have little influence in practice.

We can now predict the behavior of a SelectionSort imple-
mentation after measuring it on a few arrays of different sizes. For
instance, if an implementation gives the following timings:

    size:  1,000     2,000     5,000     10,000    20,000    50,000    100,000   200,000
    time:  0.001166  0.004356  0.018224  0.052226  0.173569  0.921581  3.678394  14.70667

we can use a least-square regression to fit these points to the follow-
ing quadratic polynomial as shown on Figure 34:

    3.676 · 10⁻¹⁰ × n²  −  4.422 · 10⁻⁸ × n  +  9.810 · 10⁻³
         (a)                  (b)                 (c)

Figure 34: Using numpy to find a quadratic polynomial that is a best fit
(in a least-square sense) to our data set, and then predict the run time of
our implementation on an input of size 2,000,000.

    >>> import numpy as np
    >>> x=[1000,2000,5000,10000,
    ...    20000,50000,100000,200000]
    >>> y=[0.001166,0.004356,
    ...    0.018224,0.052226,0.173569,
    ...    0.921581,3.678394,14.70667]
    >>> p = np.polyfit(x, y, deg=2)
    >>> print(p)
    [ 3.67619503e-10 -4.42217128e-08
      9.81022492e-03]
    >>> np.polyval(p, 2000000)/60
    24.506656289555412

Now we can estimate we need 24.5 minutes to sort 2,000,000 values.
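A direct transcription of the pseudo-code into Python, instrumented to
count comparisons and swaps, matches the analysis above. This is an
added illustration (the name selection_sort is mine):

    def selection_sort(A):
        """Sort A in place; return (number of comparisons, number of swaps)."""
        n = len(A)
        comparisons = swaps = 0
        for i in range(n - 1):              # i = 0 .. n-2
            smallest = i
            for j in range(i + 1, n):       # j = i+1 .. n-1
                comparisons += 1
                if A[j] < A[smallest]:
                    smallest = j
            A[smallest], A[i] = A[i], A[smallest]
            swaps += 1
        return comparisons, swaps

    import random
    A = [random.randrange(100) for _ in range(50)]
    print(selection_sort(A))                # ((n-1)n/2, n-1) = (1225, 49) for n = 50
    assert A == sorted(A)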

InsertionSort

N Figure 35: Loop invariant for InsertionSort: A[0..i − 1] holds sorted values, A[i..n − 1] holds unsorted values.

While InsertionSort's loop invariant (Figure 35) looks similar to the invariant of SelectionSort (page 21), it is actually more relaxed: there is no requirement for all the sorted values in A[0..i − 1] to be smaller than the unsorted values in A[i..n − 1]. We can start with i = 1 because a sub-array of size 1 is always sorted. To increase i, it is necessary to insert A[i] at the correct position in the sorted range, shifting some values to the right to make some room. The array will be sorted when we reach i = n, i.e., after the iteration for i = n − 1.

There are actually a couple of ways to implement the shift-to-insert procedure.48 The pseudo-code below (illustrated by Figure 36) scans the sorted values from right to left, shifting right all values greater than the one we want to insert (stored in the variable key), until it finds a smaller value or the start of the array.

48 Another option worth investigating is to locate the position to insert with a binary search, and then shift all values at once using memmove() or equivalent.

InsertionSort(A, n)                          (executions)
1   for i ← 1 to n − 1 do                    n − 1
2       key ← A[i]                           n − 1
3       j ← i − 1                            n − 1
4       while j ≥ 0 and A[j] > key do        ∑_i (t_i + 1)
5           A[j + 1] ← A[j]                  ∑_i t_i
6           j ← j − 1                        ∑_i t_i
7       A[j + 1] ← key                       n − 1

N Figure 36: Running InsertionSort on an example. For each iteration the purple arrows represent the assignments on lines 2 and 7, while the blue arrows are those from line 5.

Lines 1, 2, 3, and 7 are obviously always executed n − 1 times. However, we are not able to give a precise count for lines 4, 5, and 6. If we let t_i denote the number of iterations of the while loop for a given i, then we can write that lines 5 and 6 are both executed ∑_{i=1}^{n−1} t_i times. Similarly line 4 is executed ∑_{i=1}^{n−1} (t_i + 1) times because the condition has to be evaluated one more time before deciding to exit the while loop.

Our problem is that the actual value of t_i depends on the contents of the array A to sort, so we cannot compute a precise number of operations that is independent of A. Instead let us look at some extreme cases: what are the best and worst scenarios?

• The best case is when lines 5 and 6 are never executed.49 In that case, t_i = 0 for all i, and line 4 is executed ∑_{i=1}^{n−1} (t_i + 1) = n − 1 times. The entire algorithm therefore executes a number of operations that is proportional to n − 1, i.e., it is a linear function.

49 For this to occur, key (which contains A[i]) must always be larger than or equal to A[i − 1], i.e., A must already be sorted.

• The worst case is when t_i is maximal for each iteration.50 In that case the while loop executes its body for all values between j = i − 1 and j = 0, i.e., it performs t_i = i iterations for a given i. The number of executions of lines 5 and 6 is therefore ∑_{i=1}^{n−1} i = (n − 1)n/2 while line 4 runs ∑_{i=1}^{n−1} (i + 1) = (n + 2)(n − 1)/2 times. In this scenario, the total number of operations is a quadratic polynomial.

50 This happens when key is smaller than all values in A[0..i − 1] and the while loop stops when j < 0.

51 Can you guess how InsertionSort behaves on the average? See page 23.

We conclude that InsertionSort is quadratic in the worst case, and linear in the best case.51
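The pseudo-code translates almost directly into C. Here is one possible sketch on an int array (the function name is ours; the index j is offset by one compared to the pseudo-code so that it can remain unsigned):

void insertion_sort(int A[], unsigned n)
{
  for (unsigned i = 1; i < n; ++i)
    {
      int key = A[i];              // value to insert into the sorted range A[0..i-1]
      unsigned j = i;
      while (j > 0 && A[j - 1] > key)   // shift right all values greater than key
        {
          A[j] = A[j - 1];
          --j;
        }
      A[j] = key;                  // insert key at its correct position
    }
}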
algo 23

Average-Case Analysis

Knowing the worst and best case complexities of some algorithm is important, but it does not really tell us how it behaves usually. This is where an average analysis can be helpful: if possible we would like to consider all possible inputs of size n, compute the number of operations performed on each of them, and average the result. This procedure is not really practical, because we are usually not able (or willing) to compute a complexity for each individual case. Therefore, we resort to statistics and probabilities, making some hypotheses on the distribution of inputs. Instead of averaging on all possible inputs, we will also usually consider only different possible shapes of inputs, maybe with different probabilities.

Let us consider InsertionSort from page 22 again, and assume for simplicity that all the values in the array A are different. Although the two arrays of Figure 37 are different, from the point of view of the sorting algorithm they correspond to the same input order and they can be sorted with the exact same operations.

N Figure 37: Two different arrays that have the same input order will be handled similarly by the sorting algorithm.

So instead of averaging InsertionSort over all inputs of size n, we will only consider all possible input orders. Each order can be given as a permutation π = (π₁, π₂, . . . , πₙ) of {1, 2, . . . , n}, and there are n! such permutations possible. For instance, the input order of the two arrays in Figure 37 corresponds to the permutation (2, 7, 1, 4, 6, 5, 8, 3).

Given a permutation π, we say that (i, j) is an inversion if i < j and πᵢ > πⱼ. For instance, the permutation (2, 7, 1, 4, 6, 5, 8, 3) has 11 inversions: (1, 3), (2, 3), (2, 4), (2, 5), (2, 6), (2, 8), (4, 8), (5, 6), (5, 8), (6, 8), and (7, 8). Note that the sorted permutation (1, 2, 3, 4, 5, 6, 7, 8) contains no inversion, while the reverse permutation (8, 7, 6, 5, 4, 3, 2, 1) contains n(n − 1)/2 inversions52. At every iteration i of InsertionSort, when tᵢ values are shifted right to insert A[i] to their left, exactly tᵢ inversions are canceled. The total number of executions of line 5 of InsertionSort, i.e., ∑_{i=1}^{n−1} tᵢ, is therefore equal to the number of inversions in the input array.53

52 This is the maximum number of inversions. Indeed, every inversion is a pair (i, j) satisfying i < j. There are n(n − 1) pairs, and half of them satisfy i < j.

53 We counted 11 inversions for (2, 7, 1, 4, 6, 5, 8, 3), and you can check that there are indeed 11 blue arrows on Figure 36 (page 22).

Back to our average-case analysis. To count how many times line 5 will be executed on average54 we only need to know the average number of inversions in a permutation of size n. For each permutation (π₁, . . . , πᵢ, . . . , πⱼ, . . . , πₙ) that contains the inversion (i, j), there exists a permutation (π₁, . . . , πⱼ, . . . , πᵢ, . . . , πₙ) that does not contain this inversion. This one-to-one mapping means that each inversion (i, j) has exactly 1/2 chance to occur in a random permutation.55 The expected number of inversions in a random permutation is therefore (1/2) · n(n − 1)/2, that is the number of possible inversions multiplied by their probability to occur.

54 We should write this as E[∑_{i=1}^{n−1} tᵢ], that is, the expected sum of all tᵢs.

55 Because half of all the n! existing permutations have the inversion (i, j), and the remaining half does not.

We conclude that on the average case, lines 5 and 6 are executed (1/2) · n(n − 1)/2 = n(n − 1)/4 times56, which is just half of our worst case scenario. The average number of operations of InsertionSort is therefore a quadratic function.

56 The average number of executions of line 7 is left as an exercise to the reader.
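Such counts are easy to check on small examples. The following brute-force C sketch (the function name is ours) counts the inversions of the permutation of Figure 37; counting one shift per execution of line 5 in the earlier insertion_sort sketch should yield the same number:

#include <stdio.h>

/* number of pairs (i,j) with i < j and A[i] > A[j] */
static unsigned count_inversions(const int A[], unsigned n)
{
  unsigned inv = 0;
  for (unsigned i = 0; i < n; ++i)
    for (unsigned j = i + 1; j < n; ++j)
      if (A[i] > A[j])
        ++inv;
  return inv;
}

int main(void)
{
  int A[] = { 2, 7, 1, 4, 6, 5, 8, 3 };
  printf("%u\n", count_inversions(A, 8));   // prints 11
}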
algo 24

BinarySearch

So far we have studied two iterative algorithms, but we should also know how to deal with recursive algorithms. As a very simple example, let us consider BinarySearch. This takes a sorted array A[b..e − 1], a value v, and returns the index where v is in A, or where it should be in case it is not.

BinarySearch(A, b, e, v)
1    if b < e then
2        m ← ⌊(b + e)/2⌋
3        if v = A[m] then
4            return m
5        else
6            if v < A[m] then
7                return BinarySearch(A, b, m, v)
8            else
9                return BinarySearch(A, m + 1, e, v)
10   else
11       return b

N Figure 38: Recursive calls to BinarySearch showing the evolution of b and e (and the calculated m) when searching for the value 7 in the array (2, 4, 5, 6, 8, 9, 10, 11): BinarySearch(A, 0, 8, 7), then BinarySearch(A, 0, 4, 7), then BinarySearch(A, 3, 4, 7), then BinarySearch(A, 4, 4, 7), which returns 4.

The algorithm first checks whether the middle value A[m] is equal to v, otherwise it looks for v recursively in either A[b..m − 1] or A[m + 1..e − 1].

Let us write n = e − b for the size of the array, and S(n) for the number of calls (including recursive calls) to BinarySearch needed to locate a value in the worst case. Clearly S(0) = 1 because calling BinarySearch with b = e will return immediately. For n ≥ 1, the worst-case scenario is when the value is never found, and the recursion always occurs on the larger of the two halves. Since one value has been removed, these halves have length ⌊(n − 1)/2⌋ and ⌈(n − 1)/2⌉ = ⌊n/2⌋, and the latter is the larger one. Therefore, in the worst case, the number of calls to BinarySearch satisfies

S(n) = 1                  when n = 0,
S(n) = 1 + S(⌊n/2⌋)       when n ≥ 1.

N Figure 39: Straightforward, recursive implementation of S(n).
unsigned s(unsigned n)
{
  if (n == 0) return 1;
  return 1 + s(n/2);
}

N Figure 40: Iterative implementation of S(n).
unsigned s(unsigned n)
{
  unsigned res = 1;
  while (n != 0)
    {
      ++res;
      n /= 2; // or n >>= 1
    }
  return res;
}

You can actually solve this recursive equation (i.e., find a formula for S(n)) as if you had to replace a recursive implementation of this function (Figure 39) by an iterative version. Every time S is called recursively, its argument is divided by two, and 1 is added to the result. We could do this in a simple loop, as in Figure 40. So S(n) is equal to 1 plus the number of times we need to perform an integer division of n by 2 to reach 0. This integer division is similar to a right shift by one bit, so S(n) is equal to 1 plus the number of bits needed to represent n.57 In other words:

S(n) = 2 + ⌊log₂(n)⌋ if n ≥ 1, and S(0) = 1

57 The number (110010)₂ = 2⁵ + 2⁴ + 2¹ needs 6 bits, because its left-most 1-bit is at position 5 (counting from 0). We have 2⁵ ≤ m < 2⁶ hence 5 ≤ log₂(m) < 6. More generally, the position of the left-most 1-bit in the binary representation of any non-negative integer m is ⌊log₂(m)⌋, and since bits are numbered from 0, the number of bits needed to represent m in base 2 is 1 + ⌊log₂(m)⌋.

From this formula, we have a pretty good idea of the behavior of BinarySearch. Since the number of operations performed during each of these calls can be bounded by some constant c, the run time of BinarySearch cannot exceed c × S(n) in the worst-case scenario.58

58 Later, using notation introduced on page 25, we shall write that BinarySearch is a O(log n) algorithm for this reason.
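In C, the algorithm could be written as follows (a sketch on an int array, with a function name of our choosing; like the pseudo-code it returns the index where v is, or where it would have to be inserted):

unsigned binary_search(const int A[], unsigned b, unsigned e, int v)
{
  if (b < e)
    {
      unsigned m = b + (e - b) / 2;  // same as (b+e)/2, but avoids overflow
      if (v == A[m])
        return m;
      if (v < A[m])
        return binary_search(A, b, m, v);
      return binary_search(A, m + 1, e, v);
    }
  return b;
}
// On the array of Figure 38: binary_search(A, 0, 8, 7) == 4.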
algo 25

Definitions for Big-Θ, Big-O, and Big-Ω Notations59

59 Those are sometimes called Landau's notations, although what Landau really invented was the small o notation. For some history about the notations, read ''Big Omicron and Big Theta and Big Omega'' by D. Knuth.

When we computed the number of operations performed by SelectionSort (page 21) we concluded its run time should be a polynomial of the form an² + bn + c, and after running some experiments we even actually computed the values of a, b, and c. Of course these coefficients will be different if the same code is compiled differently, or executed on a different computer. However, the shape of the function an² + bn + c is independent of these implementation details: the run time of SelectionSort has to be a second-order polynomial. Most importantly, when n tends towards ∞ the most important term in this function will be an² and the bn + c part will be negligible. We like to remember SelectionSort as a quadratic algorithm, because n² is the dominant term in its complexity function.

The Θ, O, and Ω notations help making calculations using these dominant terms without bothering with all the implementation-related constants like a, b, and c.

• f(n) ∈ Θ(g(n)) expresses the fact that f(n)'s asymptotic behavior60 is comparable to g(n), up to some multiplicative factor. For instance an² + bn + c ∈ Θ(n²). We say that SelectionSort's complexity is Θ(n²).

  The formal definition of f(n) ∈ Θ(g(n)) states that there must exist two positive constants c₁ and c₂ so that f(n) is bounded below by c₁g(n) and bounded above by c₂g(n) for large values of n. This is illustrated by Figure 41.

  Θ(g(n)) = { f(n) | ∃c₁ > 0, ∃c₂ > 0, ∃n₀ ∈ N, ∀n ≥ n₀, 0 ≤ c₁g(n) ≤ f(n) ≤ c₂g(n) }

60 i.e. when n → ∞

N Figure 41: f(n) ∈ Θ(g(n)): after some n₀ the function f(n) is bounded by two copies of g(n) with different scale factors c₁ and c₂.

• f(n) ∈ O(g(n)) expresses the fact that f(n)'s asymptotic behavior is dominated by g(n), up to some multiplicative factor. For instance, InsertionSort's complexity61 can range from linear to quadratic depending on its input, so we can say it is in O(n²), meaning its order is at most quadratic.

  O(g(n)) can be defined as the set of all functions whose magnitude is bounded above by cg(n) for some c > 0 and large n:

  O(g(n)) = { f(n) | ∃c > 0, ∃n₀ ∈ N, ∀n ≥ n₀, |f(n)| ≤ cg(n) }

61 cf. page 22

N Figure 42: f(n) ∈ O(g(n)): after some n₀ the function |f(n)| is bounded above by cg(n) for some constant c.

• f(n) ∈ Ω(g(n)) expresses the fact that f(n)'s asymptotic behavior dominates g(n), up to some multiplicative factor. For instance, InsertionSort's complexity is in Ω(n) since it is at least linear but may be larger.

  Ω(g(n)) can be defined as the set of all functions bounded below by cg(n) for some c > 0 and large n:

  Ω(g(n)) = { f(n) | ∃c > 0, ∃n₀ ∈ N, ∀n ≥ n₀, 0 ≤ cg(n) ≤ f(n) }

N Figure 43: f(n) ∈ Ω(g(n)): after some n₀ the function f(n) is bounded below by cg(n) for some constant c.
These definitions imply that Θ( g(n)) = O( g(n)) ∩ Ω( g(n)).
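For instance, to check that 3n² + 2n ∈ Θ(n²) directly from the definition, we can exhibit the constants: for all n ≥ 1 we have 2n ≤ 2n², hence

3n² ≤ 3n² + 2n ≤ 5n²   for all n ≥ 1,

so c₁ = 3, c₂ = 5, and n₀ = 1 witness that 3n² + 2n ∈ Θ(n²).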
algo 26

Properties of Big-Θ, Big-O, and Big-Ω Notations

Although Θ(g(n)), O(g(n)), and Ω(g(n)) are defined as sets of functions, we often abuse the notation to mean one function in this set. For instance, we would write Θ(n) + Θ(n²) = Θ(n²), which we can read as ''any linear function added to any quadratic function is a quadratic function''62, although a more rigorous way to write this would be { f(n) + g(n) | f(n) ∈ Θ(n), g(n) ∈ Θ(n²) } ⊆ Θ(n²).

62 Note that this equality really goes one way only: in this context the notation ''='' works like the word ''is'' in English. For instance, ''Θ(n) = O(n²)'' means that any function in Θ(n) is in O(n²), but the reverse does not hold.

With the above convention in mind, we have the following simplifications, where f(n) and g(n) are positive functions63 and λ > 0 is a positive constant:

63 Since we are concerned with a number of operations performed by some algorithm, we will (almost) always have positive functions, and they will usually be increasing.

λ = Θ(1)                                      λ = O(1)
f(n) = Θ(f(n))                                f(n) = O(f(n))
Θ(f(n)) + Θ(g(n)) = Θ(f(n) + g(n))            O(f(n)) + O(g(n)) = O(f(n) + g(n))
Θ(f(n) + g(n)) = Θ(max(f(n), g(n)))           O(f(n) + g(n)) = O(max(f(n), g(n)))
Θ(f(n)) · Θ(g(n)) = Θ(f(n) · g(n))            O(f(n)) · O(g(n)) = O(f(n) · g(n))
Θ(λ f(n)) = Θ(f(n))                           O(λ f(n)) = O(f(n))

These equalities, which can be proved64 from the definitions of Θ and O given on page 25, hold for Ω as well. Following these rules we have that 4n² + 3n + 1 = Θ(4n² + 3n + 1) = Θ(4n²) = Θ(n²), and we can generalize this to any polynomial: aₖn^k + aₖ₋₁n^(k−1) + · · · + a₁n + a₀ = Θ(n^k).

64 Do not trust me, try it.

Things get a little fancier when we combine Θ, O and Ω. For instance, we have Θ(n²) + O(n²) = Θ(n²) because the sum of a quadratic function with a function that is at most quadratic will always be quadratic, and we have Θ(n²) + Ω(n²) = Ω(n²) because the sum of a quadratic function with a function that is at least quadratic will be at least quadratic.

When lim_{n→∞} f(n)/g(n) = ℓ exists, we can use its value to decide whether f(n) belongs to Θ(g(n)), O(g(n)), or Ω(g(n)):

if lim_{n→∞} f(n)/g(n) = c > 0   then f(n) = Θ(g(n))
if lim_{n→∞} f(n)/g(n) = 0       then f(n) = O(g(n)) and f(n) ≠ Θ(g(n))
if lim_{n→∞} f(n)/g(n) = ∞       then f(n) = Ω(g(n)) and f(n) ≠ Θ(g(n))

Note that lim_{n→∞} f(n)/g(n) = 0 is the definition of f(n) = o(g(n)). We actually have o(g(n)) ⊂ O(g(n)) \ Θ(g(n)). Similarly, people occasionally write f(n) = ω(g(n)) when lim_{n→∞} f(n)/g(n) = ∞, so that we have f(n) = o(g(n)) ⟺ g(n) = ω(f(n)) just like we have f(n) = O(g(n)) ⟺ g(n) = Ω(f(n)).

See Figure 44 for a Venn diagram showing how these different sets relate to each other.

N Figure 44: Relation between o(g(n)), O(g(n)), Θ(g(n)), Ω(g(n)), and ω(g(n)). If the limit ℓ = lim_{n→∞} f(n)/g(n) exists, f(n) belongs to one of the round classes (ℓ = 0 for o(g(n)), 0 < ℓ < ∞ for Θ(g(n)), ℓ = ∞ for ω(g(n))).
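As an illustration of these limit rules: lim_{n→∞} (n log₂ n)/n² = lim_{n→∞} (log₂ n)/n = 0, so n log₂ n = O(n²) but n log₂ n ≠ Θ(n²); conversely, lim_{n→∞} (5n² + n)/n² = 5 > 0, so 5n² + n = Θ(n²).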
Exercises. 1. Show that 1 + sin(n) + n is in Θ(n). 2. Show that for any a and any b > 0, the function (n + a)^b is in Θ(n^b). 3. Show that n + n sin(n) is in O(n) but is not in Θ(n). 4. Show that 2n + n sin(n) is in Θ(n). 5. Prove Θ(log_i n) = Θ(log_j n) for any i > 1 and j > 1.
algo 27

Usage of Big-Θ, Big-O, and Big-Ω Notations

Let us consider again SelectionSort65 and show how to annotate it with these notations to derive its complexity.

65 cf. page 21

SelectionSort(A, n)
1   for i ← 0 to n − 2 do              Θ(n)
2       min ← i                        Θ(n)
3       for j ← i + 1 to n − 1 do      Θ(n²)
4           if A[j] < A[min]           Θ(n²)
5               min ← j                O(n²)
6       A[min] ↔ A[i]                  Θ(n)
                                       Θ(n²)

For each line, we essentially make the same computations as before: we know that lines 1, 2 and 6 are executed n − 1 times, which is a linear function, so we simply write Θ(n). Also, we know that lines 3 and 4 will be executed ∑_{i=0}^{n−2} (n − i − 1) times, but we need not compute this sum precisely. Summing a linear function between a constant and n is like integrating66 a linear function between a constant and n: it will give a quadratic function, so we simply write Θ(n²). Finally, line 5 can be executed as many times as line 4, but it could be executed less, so we write O(n²) to indicate that this is an upper bound. Now the complexity of the SelectionSort is simply the sum of the complexity of all its lines: Θ(n) + Θ(n) + Θ(n²) + Θ(n²) + O(n²) + Θ(n) = Θ(n²). We write that SelectionSort runs in Θ(n²), or that its time complexity67 is Θ(n²). We shall often write T(n) = Θ(n²) instead of the time complexity is Θ(n²).

66 cf. page 14

67 When people say just complexity they usually mean time complexity, i.e., a class of functions like Θ(n²) or O(n³), into which that function giving the run time of the algorithm for an input of size n (or equivalently the number of operations performed) belongs. Another complexity that can be studied is the space complexity: how much extra space does the algorithm require to process an input of size n? SelectionSort only needs a constant amount of additional memory (for the variables i, j, and min) regardless of n, so its space complexity is S(n) = Θ(1).

We can use similar annotations on InsertionSort68 and conclude that its complexity is O(n²):

68 cf. page 22

InsertionSort(A, n)
1   for i ← 1 to n − 1 do                  Θ(n)
2       key ← A[i]                         Θ(n)
3       j ← i − 1                          Θ(n)
4       while j ≥ 0 and A[j] > key do      O(n²)
5           A[j + 1] ← A[j]                O(n²)
6           j ← j − 1                      O(n²)
7       A[j + 1] ← key                     Θ(n)
                                           O(n²)

Such annotations can also be used with recursive algorithms (such as our presentation of BinarySearch), but they produce a recursive equation that the complexity function must satisfy, and we will explain how to deal with those later.69

69 Starting on page 30.
algo 28

A Bestiary of Common Complexity Functions

We will often compare algorithms with different time complexities, saying, for instance, that a Θ(n²) algorithm is better than a Θ(n³) algorithm.70 To visualize how far apart different complexity functions are, consider Table 2 at the bottom of this page. It assumes we have a computer that can execute 3 × 10⁹ operations per second71 and considers many complexity functions we will encounter later. This table assumes a precise count of operations, like n, not a complexity class like Θ(n), so just keep in mind that an algorithm with complexity Θ(n) should have a run time more or less proportional to what the table gives in the n column.

70 Note that as soon as we use the Θ, O, or Ω notations, we are only discussing the asymptotic complexity, i.e., when n → ∞. It would be wrong to assume that a Θ(n²) algorithm is always better than a Θ(n³) algorithm, especially for small values of n. See for instance Figure 31 on page 20.

71 If we assume that one operation is executed in one CPU cycle, we can think of it as a 3GHz computer.

Here are some algorithms that illustrate each complexity class:

Θ(n) is the cost of computing the minimum or maximum value in an array of size n. It is also the worst-case complexity of searching a value in an unsorted array.72

72 Because it is Θ(n) in the worst case, we would write that the search of a value in an unsorted array can be implemented by a O(n) algorithm.

Θ(log n) is the worst-case complexity of searching a value in a sorted array using BinarySearch.73 It is also the worst-case complexity of searching a value in a balanced binary search tree.

73 Likewise, we would write that BinarySearch is a O(log n) algorithm. Note that we do not specify the base of the log when writing O(log n), Θ(log n), or Ω(log n) because all logarithm functions are equal up to a constant factor.

Θ(n log n) is the typical complexity of a good sorting algorithm that relies on comparisons to sort values.74

74 e.g., MergeSort, page 30.

Θ(n²) is the complexity of SelectionSort75 on an array of size n, or the complexity of adding two matrices of size n × n.

75 cf. page 21

Θ(n³) is the complexity for the naive76 algorithm to multiply two matrices of size n × n. You probably do not want to use it to multiply two 100 000 × 100 000 matrices.

76 The one that implements c_ij = ∑_k a_ik b_kj as a triple loop.

Θ(n^log₂(7)) is the complexity of multiplying two n × n matrices using Strassen's algorithm77. Note that log₂(7) ≈ 2.81 so even if the difference between 3 and 2.81 is small, you can appreciate the difference between n³ and n^2.81.

77 a clever way to recursively express such a product using 7 products of sub-matrices of size n/2 × n/2

Θ(2ⁿ) arises naturally in many problems that enumerate all subsets of n elements. For instance, the determinization of a n-state finite automaton is an O(2ⁿ) algorithm, because it constructs an automaton that contains 2ⁿ states in the worst case.
H Table 2: An algorithm that requires f(n) CPU cycles to process an input of size n will execute in f(n)/(3 × 10⁹) seconds on a 3GHz CPU. This table shows run times for different f and n.

input                        number f(n) of operations to perform
size n     log₂ n    n         n log₂ n   n²         n^log₂(7)    n³            2ⁿ
10^1       1.1 ns    3.3 ns    11.1 ns    33.3 ns    0.2 µs       0.3 µs        0.3 µs
10^2       2.2 ns    33.3 ns   0.2 µs     3.3 µs     0.1 ms       0.3 ms        1.3 × 10^13 y
10^3       3.3 ns    0.3 µs    3.3 µs     0.3 ms     88.1 ms      0.3 s         1.1 × 10^284 y
10^4       4.4 ns    3.3 µs    44.2 µs    33.3 ms    56.5 s       5.5 min       6.3 × 10^3002 y
10^5       5.5 ns    33.3 µs   0.5 ms     3.3 s      10.1 h       3.8 d
10^6       6.6 ns    0.3 ms    6.6 ms     5.5 min    0.7 y        10.6 y
10^7       7.8 ns    3.3 ms    77.5 ms    9.3 h      473.8 y      10 570.0 y
10^8       8.9 ns    33.3 ms   0.9 s      38.6 d     30 402.1 y
10^9       10.0 ns   0.3 s     10.0 s     10.6 y
10^10      11.0 ns   3.3 s     1.8 min    1057.0 y
algo 29

Merging two Sorted Sub-Arrays

The Merge procedure will be used on the next page to build MergeSort, a better sorting algorithm than what we have seen so far. Merge takes an array A and three indices i, j, and k, such that the values in the sub-array A[i..j − 1] are sorted (in increasing order), and the values in A[j..k − 1] are also sorted. The goal is to reorganize all these values so that A[i..k − 1] is sorted (Figure 45).

N Figure 45: Merge(A, i, j, k) takes two consecutive sorted sub-arrays A[i..j − 1] and A[j..k − 1] and reorders the entire range.

Merge(A, i, j, k)
1    ℓ ← i                                         Θ(1)
2    r ← j                                         Θ(1)
3    for b ← i to k − 1 do                         Θ(n)  for n = k − i
4        if r = k or (ℓ < j and A[ℓ] ≤ A[r])       Θ(n)
5            B[b] ← A[ℓ]                           O(n)
6            ℓ ← ℓ + 1                             O(n)
7        else
8            B[b] ← A[r]                           O(n)
9            r ← r + 1                             O(n)
10   A[i..k − 1] ← B[i..k − 1]                     Θ(n)
                                                   Θ(n)

The procedure works in two steps. First, lines 1–9, a temporary array B is filled with the sorted values, then, on line 10, the part of A that we had to sort is overwritten with the contents of B. This array B is supposed to be at least as large as A.

The actual merging, in lines 1–9, is done using three indices: ℓ (for left) points to the smallest unused value of A[i..j − 1], r (for right) points to the smallest unused value of A[j..k − 1], and b points to the current entry of B to fill. B is simply filled from left to right, with the smallest value between A[ℓ] and A[r]. Figure 46 shows an example with the various involved indices.

N Figure 46: Merge on an example, after the third iteration of its main loop. The arrows show previous executions of lines 5 or 8.

Of course at some point the values of one of the two sub-arrays will all be used: then either ℓ will reach j, or r will reach k. In these cases, the extra conditions on line 4 ensure that the remaining values will always be taken from the other sub-array.

If we use n = k − i to denote the size of the range to sort, the complexity of Merge is quite straightforward to establish. The loop on line 3 performs exactly n iterations, so lines 3 and 4 both account for Θ(n) operations. Lines 5, 6, 8, and 9 taken individually are each executed at most n times78, so we write O(n). Finally line 10 is a trap: it is actually copying n values from B to A, so it has to perform Θ(n) operations.

78 In fact lines 5 and 6 are necessarily executed j − i times, while lines 8 and 9 are executed exactly k − j times, so taken together these two groups of lines are executed n times. We could therefore lump all these four lines into a big Θ(n) but it would not change our result.

The total complexity is Θ(n): merging two sorted sub-arrays can be done in linear time.
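In C, the procedure could be sketched as follows (the function name is ours; as in the pseudo-code, it assumes a scratch array B at least as large as A is available):

void merge(int A[], int B[], unsigned i, unsigned j, unsigned k)
{
  unsigned l = i;                    // next unused value of A[i..j-1]
  unsigned r = j;                    // next unused value of A[j..k-1]
  for (unsigned b = i; b < k; ++b)
    if (r == k || (l < j && A[l] <= A[r]))
      B[b] = A[l++];
    else
      B[b] = A[r++];
  for (unsigned b = i; b < k; ++b)   // copy the merged range back into A
    A[b] = B[b];
}
// On the array of Figure 46, merge(A, B, 0, 4, 8) applied to
// {1,3,4,7,2,5,6,8} produces {1,2,3,4,5,6,7,8}.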
algo 30

MergeSort

''Divide and conquer algorithms''79 are designed around the following idea: when faced with a complex problem, try to divide the problem in smaller sub-problems that are easier to solve (this is the divide step), and once these sub-problems are solved, use their solutions to construct a solution to the large problem (the conquer step). The division into smaller problems is usually done recursively until the problems are so small that their solutions are obvious.

79 We will discuss this class of algorithms in more details later.

The MergeSort algorithm follows this idea: when given an unsorted array of size n > 1, it divides it into two unsorted arrays of size n/2 and recursively sorts those.80 Once the two halves are sorted, the complete sorted array is built using the Merge procedure described on page 29. Of course the recursive calls to sort the arrays of size n/2 will probably divide the arrays into two arrays of size n/4. Eventually the recursion will stop on arrays of size 1: those are already sorted!

80 Obviously this is a problem when n is odd, since the size of an array must be an integer. So in practice we have one sub-array of size ⌊n/2⌋ and the other of size n − ⌊n/2⌋ = ⌈n/2⌉.

Here is the pseudo-code for MergeSort. We assume that A, the array to be sorted between indices i (included) and k (excluded), will be modified in place. Figure 47 illustrates it.

MergeSort(A, i, k)                T(1)    T(n), n > 1
1   if k − i > 1                  Θ(1)    Θ(1)
2       j ← ⌊(i + k)/2⌋                   Θ(1)
3       MergeSort(A, i, j)                T(⌊n/2⌋)
4       MergeSort(A, j, k)                T(⌈n/2⌉)
5       Merge(A, i, j, k)                 Θ(n)

N Figure 47: Running MergeSort on an example. Each arrow represents one call to MergeSort on the unsorted array above the arrow, and producing the sorted array at the bottom of the arrow. The two recursive calls are pictured on the sides of the arrow.

Let n = k − i be the size of the array to sort, and let T(n) denote the time complexity of MergeSort. By looking at the pseudo-code, we can see that when n = 1, only the first line is executed in constant time, so T(1) = Θ(1). When n > 1, the first two lines cost Θ(1); then we have two recursive calls, one on an array of size ⌊n/2⌋, and the other on an array of size ⌈n/2⌉, those cost T(⌊n/2⌋) + T(⌈n/2⌉) operations; and finally we call Merge on an array of size n, which we know costs Θ(n). The Θ(n) of line 5 dominates the Θ(1) of lines 1 and 2, so the complexity T(n) is a function that satisfies

T(n) = Θ(1)                               if n = 1
T(n) = T(⌈n/2⌉) + T(⌊n/2⌋) + Θ(n)         else

From these constraints, we can find what complexity class T(n) belongs to. Can you guess the solution here? We will see different ways to solve this type of equations on the following pages.

Note that in practice we also have T(2) = Θ(1) and T(3) = Θ(1) because the number of operations needed to process a fixed-size input can always be bounded by a constant. So we usually write

T(n) = T(⌈n/2⌉) + T(⌊n/2⌋) + Θ(n)

without mentioning that T(n) = Θ(1).
algo 31

Exploring Recurrence Equations

Let us first consider recurrence equations that do not involve the Θ, O, Ω notations. For instance, let M(n) denote the number of times line 4 of Merge (page 29) is executed while running MergeSort (page 30) on an array of length n. Since each call to Merge on a sub-array of length n executes line 4 exactly n times, we have:

M(n) = 0                                 if n = 1
M(n) = M(⌈n/2⌉) + M(⌊n/2⌋) + n           for n ≥ 2

At first, the mix of ⌈·⌉ and ⌊·⌋ might look intimidating. One can wonder if it would not be easier to solve equations such as

Mfloor(n) = 2·Mfloor(⌊n/2⌋) + n   with Mfloor(1) = 0
or Mceil(n) = 2·Mceil(⌈n/2⌉) + n  with Mceil(1) = 0

We can write a small program (Figure 48) to compute the first values of these functions and plot them (Figure 49 on this page, and Table 3 on the next page). What can we make from this plot? First, we obviously have Mfloor(n) ≤ M(n) ≤ Mceil(n) and this is easy to prove from our definitions. Then, these three functions coincide on values of n that are powers of 2: this should not be a surprise as ⌈·⌉ and ⌊·⌋ are useless in this case. If n = 2^m, solving any of these equations amounts to solving:

M(2^m) = 2·M(2^(m−1)) + 2^m   if m ≥ 1,  and M(2^0) = 0

Dividing everything by 2^m, we have M(2^m)/2^m = M(2^(m−1))/2^(m−1) + 1, and we can iterate this definition until we reach M(2^0):

M(2^m)/2^m = M(2^(m−1))/2^(m−1) + 1 = M(2^(m−2))/2^(m−2) + 2 = M(2^(m−3))/2^(m−3) + 3 = · · · = M(2^0)/2^0 + m = m

So M(2^m) = m·2^m and since m = log₂ n it follows that M(n) = n log₂ n, but only if n is a power of two. How far is n log₂ n from M(n)? After writing another small program, we can plot Figure 50: M(n) appears closer to n log₂ n than Mfloor and Mceil are. From the same figure, we also easily see (this is not a proof) that all three functions satisfy ½ n log₂ n ≤ M(n) ≤ 2n log₂ n, which means that they are all in Θ(n log n).

N Figure 48: Computing M(n), Mfloor(n), and Mceil(n) to draw Figure 49.
#include <stdio.h>

unsigned m(unsigned n)
{
  if (n == 1) return 0;
  return m(n / 2) + m(n - n/2) + n;
}

unsigned m_floor(unsigned n)
{
  if (n == 1) return 0;
  return 2 * m_floor(n / 2) + n;
}

unsigned m_ceil(unsigned n)
{
  if (n == 1) return 0;
  return 2 * m_ceil(n - n / 2) + n;
}

int main()
{
  for (unsigned n = 1; n <= 256; ++n)
    printf("%5u %5u %5u %5u\n",
           n, m_floor(n),
           m(n), m_ceil(n));
}

N Figure 49: Plot of M(n), Mfloor(n), and Mceil(n), as computed in Figure 48.

N Figure 50: The ratio between the three M functions, and n log₂ n.

We will see later81 that as long as all we want is a complexity class (such as Θ(n log n)), we can usually ignore the ⌈·⌉ or ⌊·⌋ functions in this type of recurrence equations. However, if we need an exact solution these ⌈·⌉ or ⌊·⌋ functions do matter. Figure 49 leaves no doubt about that. On the next page, we show how to compute an exact solution for M(n).

81 cf. page 33
algo 32

9 Solving Recurrence Equations by Differentiation

Let us consider the recurrence M from the previous page:

M(n) = 0                                 if n = 1
M(n) = M(⌈n/2⌉) + M(⌊n/2⌋) + n           for n ≥ 2

N Table 3: The first values of Mfloor(n), M(n), Mceil(n), as defined on page 31.
 n   Mfloor(n)   M(n)   Mceil(n)
 1       0         0        0
 2       2         2        2
 3       3         5        7
 4       8         8        8
 5       9        12       19
 6      12        16       20
 7      13        20       23
 8      24        24       24
 9      25        29       47
10      28        34       48
11      29        39       51
12      36        44       52
13      37        49       59
14      40        54       60
15      41        59       63
16      64        64       64

We will solve this equation by calculating U(n) = M(n + 1) − M(n) and then realizing that U(n) satisfies a recurrence equation we have already seen previously.

Notice first that ⌈n/2⌉ = ⌊(n + 1)/2⌋, so we can rewrite M(n) using only ⌊·⌋:

M(n) = M(⌊(n + 1)/2⌋) + M(⌊n/2⌋) + n
M(n + 1) = M(⌊(n + 2)/2⌋) + M(⌊(n + 1)/2⌋) + n + 1
M(n + 1) − M(n) = M(⌊(n + 2)/2⌋) − M(⌊n/2⌋) + 1
M(n + 1) − M(n) = M(⌊n/2⌋ + 1) − M(⌊n/2⌋) + 1

Now if we let U(n) = M(n + 1) − M(n) we have

U(n) = 2                     if n = 1
U(n) = U(⌊n/2⌋) + 1          if n ≥ 2

Do you recognize this equation? For any n ≥ 1 this definition of U(n) is the same as the definition of S(n) from page 24, so we can conclude that U(n) = 2 + ⌊log₂ n⌋.

Now, since we have U(n) = M(n + 1) − M(n), it follows that

M(n) = M(1) + U(1) + U(2) + U(3) + · · · + U(n − 1)
M(n) = 0 + ∑_{1≤i<n} U(i)
M(n) = ∑_{1≤i<n} (2 + ⌊log₂ i⌋)
M(n) = (n − 1) + ∑_{1≤i<n} (1 + ⌊log₂ i⌋)

And this is a sum we studied on page 15:

∑_{i=1}^{n−1} (1 + ⌊log₂ i⌋) = n + 1 + n⌊log₂(n − 1)⌋ − 2^(⌊log₂(n−1)⌋+1)

So we can conclude82 that

M(n) = 0                                                   if n = 1
M(n) = 2n + n⌊log₂(n − 1)⌋ − 2^(⌊log₂(n−1)⌋+1)             if n ≥ 2

82 It is very easy to forget a ⌊·⌋ or a little −1 somewhere while making this kind of calculation. To detect such mistakes, I usually evaluate both formulas (here, the definition of M(n) at the top of the page, and the one at the bottom) on a handful of values, and check that they are the same (here, the values should be those given by Table 3).
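Following that advice, a few lines of C are enough to compare the recursive definition of M(n) with the closed formula (a quick sketch; log2floor is a helper of our own computing ⌊log₂ n⌋):

#include <assert.h>
#include <stdio.h>

static unsigned m(unsigned n)          // recursive definition of M(n)
{
  if (n == 1) return 0;
  return m(n / 2) + m(n - n / 2) + n;
}

static unsigned log2floor(unsigned n)  // ⌊log₂(n)⌋ for n ≥ 1
{
  unsigned l = 0;
  while (n >>= 1)
    ++l;
  return l;
}

static unsigned m_closed(unsigned n)   // closed formula, valid for n ≥ 2
{
  unsigned k = log2floor(n - 1);
  return 2 * n + n * k - (2u << k);    // 2n + n⌊log₂(n−1)⌋ − 2^(⌊log₂(n−1)⌋+1)
}

int main(void)
{
  for (unsigned n = 2; n <= 256; ++n)
    assert(m(n) == m_closed(n));       // matches Table 3 for small n
  puts("both definitions agree");
}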
algo 33

Master Theorem for Recurrence Equations

The following theorem can be used to solve many (but not all) recurrence equations derived from recursive algorithms, typically those obtained from divide-and-conquer algorithms.83

83 Divide-and-conquer algorithms will typically perform a recursive calls on sub-problems of size n/b, and use f(n) operations to split the problem and merge the sub-solutions.

Theorem 1 (Master Theorem). Consider a recurrence equation such as

T(n) = Θ(1)                           for n < n₀
T(n) = a·T(n/b + O(1)) + f(n)         for n ≥ n₀

with a ≥ 1, b > 1, and n₀ > 0.
1. If f(n) = O(n^((log_b a)−ε)) for some ε > 0, then T(n) = Θ(n^(log_b a)).
2. If f(n) = Θ(n^(log_b a)), then T(n) = Θ(n^(log_b a) log n).
3. If f(n) = Ω(n^((log_b a)+ε)) for some ε > 0, and if a·f(n/b) ≤ c·f(n) for some c < 1 and all large values of n, then T(n) = Θ(f(n)).
4. In other cases84, the theorem does not apply.

84 These are the cases where ε or c cannot be found. For instance, if you consider T(n) = 2T(n/2) + n log₂ n, you can show that n log₂ n = Ω(n¹) but you cannot find any ε > 0 such that n log₂ n = Ω(n^(1+ε)), so the theorem does not apply.

Figures 51–53 illustrate this theorem by picturing the work f(n) performed by each recursive call as a rectangle.

N Figure 51: If T(n) = 2T(n/3) + Θ(n) we are in the third case of the theorem: the work performed by recursive calls diminishes exponentially fast, so only the initial Θ(n) matters.

N Figure 52: If T(n) = 3T(n/3) + Θ(n) we are in the second case of the theorem: the total work performed at each level of the recursion is the same, so the complexity Θ(n) has to be multiplied by Θ(log n).

N Figure 53: If T(n) = 4T(n/3) + Θ(n) we are in the first case of the theorem: the total work performed at each level increases exponentially, and the work performed on the last level, Θ(n^(log₃ 4)), dominates everything else.

Examples:

• T(n) = T(⌈n/2⌉) + T(⌊n/2⌋) + Θ(n). This is the recursive equation for the complexity of MergeSort, as established on page 30. We can rewrite it as85: T(n) = 2T(n/2 + O(1)) + Θ(n).

85 Note how the n/b + O(1) in the theorem accommodates any terms like n/b, or ⌊n/b⌋, or even ⌈(n+5)/b⌉. This is great news: no need to worry about integer parts anymore!

  So we have a = b = 2 and f(n) = Θ(n). We compute n^(log_b a) = n^(log₂ 2) = n¹, and we now have to check which case of the theorem applies. Is f(n) in O(n^(1−ε)) (case 1), in Θ(n¹) (case 2), or in Ω(n^(1+ε)) (case 3)? Obviously we are in the second case since f(n) = Θ(n) = Θ(n¹). We therefore conclude immediately that MergeSort has complexity T(n) = Θ(n log n).

• T(n) = T(⌊n/2⌋) + Θ(1). This is the worst case complexity of BinarySearch (page 24). We have a = 1, b = 2, and f(n) = Θ(1). n^(log₂ 1) = n⁰ = 1. Again in case 2, we conclude that T(n) = Θ(n⁰ log n) = Θ(log n) for the worst case of BinarySearch.86

86 So we can say that BinarySearch is O(log n) in general.

• T(n) = √n + 3T(n/4). We have b = 4, a = 3, and log₄ 3 ≈ 0.792. We have √n = n^(1/2) = O(n^((log₄ 3)−ε)) if we take for instance ε = 0.2. So this is the first case of the theorem, and T(n) = Θ(n^(log₄ 3)).

• T(n) = n² + 3T(n/4). Same constants, different f(n). This time, n² = Ω(n^((log₄ 3)+ε)) if we take for instance ε = 1. Furthermore, the function f(n) = n² verifies 3·f(n/4) ≤ c·n² if we take for instance c = 1/2, so T(n) = Θ(n²).
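Two more recurrences you can try the theorem on: T(n) = 2T(n/2) + Θ(1) gives a = b = 2 and f(n) = Θ(1) = O(n^(1−ε)) (take ε = 1/2), so case 1 applies and T(n) = Θ(n); while T(n) = 2T(n/2) + Θ(n²) gives f(n) = Θ(n²) = Ω(n^(1+ε)) (take ε = 1) with 2·f(n/2) = n²/2 ≤ ½n², so case 3 applies and T(n) = Θ(n²).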
algo 34

Establishing Upper Bounds by Mathematical Induction

Here we review a way to prove that T(n) = O(f(n)) when you have some recursive equation for T(n) but do not want to (or cannot) use the master theorem. However, like all inductive proofs, you need to know87 the solution (i.e., f(n)) before you can start the proof.

87 Guessing the solution is also an option: sometimes the recurrence equation looks like some equation you have already solved, or some equation that you would know how to solve, and it seems legitimate to estimate that the solution should be similar. You would then use mathematical induction to confirm your guess.

To prove that T(n) = O(f(n)), we need to show that there exists some constant c > 0 and some n₀ ∈ N such that for values of n larger than n₀ we have |T(n)| ≤ c·f(n).88 Note that once we have a constant c that works, we can decide to use a larger c and the proof will still work. In fact, the ability to pick a c large enough is often necessary to complete the proof.

88 This is the definition of T(n) = O(f(n)), as seen on page 25.

Ideally, we make our inductive proof as follows:
1. We write our inductive hypothesis89, Hₙ: |T(n)| ≤ c·f(n).
2. We show that Hₙ₀ holds, for some n₀ and c we supply.
3. We show that Hₙ holds for any n > n₀, if we assume Hₙ₀, Hₙ₀₊₁, …, and Hₙ₋₁.
4. We conclude that T(n) = O(f(n)).

89 When it is clear that T(n) is positive, as in the examples that will follow, the absolute value can be omitted.

Sometimes we fail at step 3 and we may have to revise our definition of f(n) before giving up.

Example: Assume we have

T(n) = Θ(1)                  if n = 1
T(n) = T(⌊n/2⌋) + Θ(1)       if n > 1

and we want to prove that T(n) = O(log n).

Our inductive hypothesis Hₙ is that T(n) ≤ c·log₂ n. Clearly we cannot prove H₁ because T(1) is a constant necessarily larger than c·log₂ 1 = 0. However T(2) = T(1) + Θ(1) = Θ(1) is a constant as well, so we can choose c large enough so that T(2) ≤ c·log₂ 2 = c, and H₂ holds. Similarly, T(3) = T(2) + Θ(1) = Θ(1) is also a constant, so clearly H₃ holds if we keep c larger than these two constants.

Now for any n ≥ 4 let us assume that H₂, H₃, . . . , Hₙ₋₁ hold and let us prove Hₙ. By definition of T(n) we have

T(n) = T(⌊n/2⌋) + Θ(1)

Since n ≥ 4 we have ⌊n/2⌋ ≥ 2, so we can use hypothesis H⌊n/2⌋:

T(n) ≤ c·log₂⌊n/2⌋ + Θ(1) ≤ c·log₂(n/2) + Θ(1)
T(n) ≤ c·log₂ n − (c − Θ(1))

Now we argue that since we can pick c as large as we want, we can make sure that c − Θ(1) is positive. Therefore

T(n) ≤ c·log₂ n

We have proved Hₙ, and by mathematical induction we conclude that T(n) = O(log n).90

90 Exercise: Using the same recursive definition of T(n), adapt this method to demonstrate that T(n) = Ω(log n).

It would be nice if it was always that easy!
algo 35

When Mathematical Induction on Recurrence Fails

The example from the previous page is a case where the proof goes well. This is not always the case and sometimes we have to adapt our induction hypothesis in order to complete the proof. Consider the recurrence

T(n) = 1                     if n = 1
T(n) = 2T(⌊n/2⌋) + 1         if n > 1

From our experience, we guess that T(n) = O(n), so let's try to prove the hypothesis Hₙ: T(n) ≤ cn for some constant c. Clearly H₁ is true. So for any n > 1, let us assume that H₁, . . . , Hₙ₋₁ hold, and use that to prove Hₙ:

T(n) = 2T(⌊n/2⌋) + 1

Since n > 1, we have 1 ≤ ⌊n/2⌋ < n and can apply H⌊n/2⌋:

T(n) ≤ 2c⌊n/2⌋ + 1
T(n) ≤ 2c·n/2 + 1
T(n) ≤ cn + 1

Unfortunately, this last equation does not imply Hₙ, so we cannot conclude our inductive proof.

The trick is to realize that we are just off by a constant, i.e., something negligible in front of the O(n) bound we are trying to establish. In order to get rid of this constant, we can introduce one in our hypothesis. Let us attempt the same proof with H′ₙ: T(n) ≤ cn − 1. Hypothesis H′₁ still holds for a c large enough. Again, for any n > 1, let us assume that H′₁, . . . , H′ₙ₋₁ hold, and use that to prove H′ₙ:

T(n) = 2T(⌊n/2⌋) + 1

We apply H′⌊n/2⌋:

T(n) ≤ 2(c⌊n/2⌋ − 1) + 1
T(n) ≤ 2c⌊n/2⌋ − 1
T(n) ≤ 2c·n/2 − 1
T(n) ≤ cn − 1

Now, this is exactly H′ₙ, so by mathematical induction we have proved that T(n) ≤ cn − 1 holds for all n ≥ 1. This of course implies that T(n) ≤ cn for all n ≥ 1 and hence that T(n) = O(n).
algo 36

More Examples of Complexities

Let us apply the techniques we learned so far to different algorithms and operations on data structures. The presentation of those algorithms and data structures, which you should probably already know, is just a pretext to practice the computation of complexities.

The next sorting algorithm we study is HeapSort. It uses a data structure called heap, which is a nearly complete binary tree (see below) with some additional constraints.

Nearly Complete Binary Trees

A nearly complete binary tree is a binary tree in which all levels, except possibly the last, are fully filled, and furthermore, the nodes from the last level are filled from left to right. Figure 54 gives an example.

N Figure 54: A nearly complete binary tree has all its levels complete (each level at depth h is complete with 2^h nodes), except maybe the last one where all nodes are flush left (nodes may only be missing on the right of the last level).

A nearly complete binary tree with n nodes can be efficiently represented as an array of n elements, as illustrated by Figure 55: the array is simply filled by reading the values of the tree one level after the other, i.e., from top to bottom and from left to right. The requirement that a nearly complete binary tree can only have missing nodes at the end of its last level stems from this array-based representation: we do not want any hole in the array.

N Figure 55: A nearly complete binary tree storing integers, and its representation as an array of integers: 4 2 8 7 3 4 0 7 9 4 (indices 0 to 9).

This array-based representation is very space efficient since it does not need to store any pointer to parent and children. A node can be referred to by its index i in the array, and the index of its parent and children can be computed from i. Assuming the number of nodes (i.e., the size of the array) is known to be n, we have the following formulas91:

LeftChild(i) = 2i + 1        if 2i + 1 < n
RightChild(i) = 2i + 2       if 2i + 2 < n
Parent(i) = ⌊(i − 1)/2⌋      if i > 0

91 The formulas are different if array indices start at 1 instead of 0.

Furthermore, if a nearly complete binary tree has n nodes, we know it has exactly ⌊n/2⌋ internal nodes and ⌈n/2⌉ leaves. These leaves are necessarily stored at positions ⌊n/2⌋ to n − 1 in the array. This fact will be used in BuildHeap92 to work on all subtrees but the leaves.

92 cf. p. 37

Heaps

A max-heap is a nearly complete binary tree storing elements in an order that satisfies the following heap constraint: the value of any node must be greater than (or equal to) that of its children. A min-heap can be defined similarly (each node has a value less than that of its children), but we will only focus on max-heaps from now on.

N Figure 56: A (max-)heap storing the same set of values as in Fig. 55: 9 7 8 7 4 4 0 2 4 3.

For instance, the heap of Figure 56 was built from the nearly complete binary tree of Figure 55 by applying the algorithm BuildHeap.

Max-heaps have the important property that the maximum value can always be found at its root. This can be used for sorting.
algo 37

Heapify and BuildHeap

The Heapify function is the main building block for the BuildHeap algorithm. Let A be an array of size m storing a nearly complete binary tree. Heapify takes the index i of a node whose left and right children are already known to be subtrees that satisfy the heap property, and it rearranges the values of i and its children so that the subtree rooted in i has the heap property. These conditions are illustrated by Figure 57.

N Figure 57: Pre- and post-conditions of Heapify. The input is a node i whose children subtrees are already known to satisfy the heap property. In the output the entire subtree rooted in i satisfies the heap property. This implies that A[i] in the output should be equal to max(A[i], A[ℓ], A[r]) in the input.

Note that if the left child ℓ of i satisfies the heap property, its value A[ℓ] is necessarily the maximum of the left subtree. Similarly, A[r] is the maximum of the right subtree. If A[i] is already greater than A[ℓ] and A[r], then the subtree rooted in i already satisfies the heap property. Otherwise, two of these three values have to be swapped: bringing the maximum at the top, and possibly destroying the heap property of one of the children (but this can be fixed recursively).

Heapify(A, i, m)
1   ℓ ← LeftChild(i)                 Θ(1)
2   r ← RightChild(i)                Θ(1)
3   if ℓ < m and A[ℓ] > A[i]         Θ(1)
4       g ← ℓ                        O(1)
5   else
6       g ← i                        O(1)
7   if r < m and A[r] > A[g]         Θ(1)
8       g ← r                        O(1)
9   if g ≠ i                         Θ(1)
10      A[i] ↔ A[g]                  O(1)
11      Heapify(A, g, m)             ?

N Figure 58: Execution of Heapify(A, 1, 11) on an example. States colored in blue are roots of subtrees with the heap property. When running Heapify(A, 1, 11), A[1] is swapped with A[4] on line 10; the subtree of 4 is then corrected by calling Heapify(A, 4, 11) recursively.

Figure 58 illustrates this algorithm on an example. Using Heapify to turn a complete binary tree into a heap is now quite easy: notice that all leaves already satisfy the heap property, so all we need is to call Heapify on the internal nodes, in a bottom-up way. Remember that the first leaf is at position ⌊n/2⌋ in the array, so the last internal node is just before.

BuildHeap(A, n)
1   for i from ⌊n/2⌋ − 1 down to 0:     Θ(n)
2       Heapify(A, i, n)                ?

Figure 59 runs BuildHeap on the nearly complete binary tree used as example on the previous page.

N Figure 59: Running BuildHeap on the nearly complete binary tree from Fig. 55 produces the heap of Fig. 56.
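In C, both procedures could be sketched as follows (the function names are ours; the child indices are computed with the formulas of page 36):

#include <stdio.h>

static void swap(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Restore the heap property of the subtree rooted at i,
   assuming both children of i are already roots of heaps. */
static void heapify(int A[], unsigned i, unsigned m)
{
  unsigned l = 2 * i + 1;          // LeftChild(i)
  unsigned r = 2 * i + 2;          // RightChild(i)
  unsigned g = i;                  // index of the greatest of the three values
  if (l < m && A[l] > A[g])
    g = l;
  if (r < m && A[r] > A[g])
    g = r;
  if (g != i)
    {
      swap(&A[i], &A[g]);
      heapify(A, g, m);            // the swap may have broken the child's heap
    }
}

static void build_heap(int A[], unsigned n)
{
  for (unsigned i = n / 2; i-- > 0; )   // from ⌊n/2⌋ − 1 down to 0
    heapify(A, i, n);
}

int main(void)
{
  int A[] = { 4, 2, 8, 7, 3, 4, 0, 7, 9, 4 };   // the tree of Fig. 55
  build_heap(A, 10);
  for (unsigned i = 0; i < 10; ++i)
    printf("%d ", A[i]);           // prints the heap of Fig. 56: 9 7 8 7 4 4 0 2 4 3
  printf("\n");
}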
algo 38

The Complexity of Heapify

Page 37 presents Heapify and BuildHeap, but does not give their complexity.

Heapify contains different execution branches. The most efficient scenario is obviously when g = i on line 9, because then no recursion occurs. In this case, Heapify executes in constant time.

For the recursive case, it is instructive to consider different ways to measure the size of the input.

• Heapify(A, i, m) will only work on nodes that belong to the subtree rooted in i. So we could use T_H(s) to denote the time complexity of Heapify on a subtree of s nodes. When Heapify recurses into one of the two children of i, how many nodes are left in the worst case? To answer that, look at Figure 60: because the last level of a heap is not necessarily full, the left subtree can actually have twice the number of nodes of the right one. The left subtree can have up to ⌈(2s − 1)/3⌉ = 2s/3 + O(1) nodes. We therefore have the following recurrence:

  T_H(s) ≤ Θ(1)                            if s = 1
  T_H(s) ≤ T_H(2s/3 + O(1)) + Θ(1)         if s > 1

N Figure 60: Worst case for the recursion of Heapify: the left subtree has slightly more than twice the number of nodes of the right subtree. If s = 3x + 2, the left subtree has 2x + 1 = (2s − 1)/3 nodes.

  This is not exactly the form of the Master theorem93 because of the inequality. However, we can use the Master theorem to find that U(s) = U(2s/3 + O(1)) + Θ(1) has for solution U(s) = Θ(log s), and from that we conclude:

  T_H(s) ≤ U(s) = Θ(log s)   hence   T_H(s) = O(log s).

93 cf. p. 33

• Another option is to express the complexity T_H(h) of Heapify working on a subtree of height h. Each recursive call reduces the height by one, so we have T_H(h) ≤ T_H(h − 1) + Θ(1) until we handle a leaf with T_H(0) = Θ(1). By iterating this definition, we easily find that T_H(h) = O(h):

  T_H(h) ≤ T_H(h − 1) + Θ(1)
  T_H(h) ≤ T_H(h − 2) + Θ(1) + Θ(1)
  ...
  T_H(h) ≤ T_H(0) + Θ(1) + · · · + Θ(1)    (h terms)
  T_H(h) ≤ (h + 1)Θ(1)
  T_H(h) ≤ Θ(h)
  T_H(h) = O(h)

Note that these two results, T_H(s) = O(log s) and T_H(h) = O(h), are compatible because h = Θ(log s) for complete binary trees.94 We will use both expressions for T_H on the next page, to compute the complexity of BuildHeap.

94 Exercise: Prove that any complete binary tree of s nodes has a height of exactly h = ⌊log₂ s⌋.
algo 39

The Complexity of BuildHeap

BuildHeap(A, n)
1   for i from ⌊n/2⌋ − 1 down to 0:     Θ(n)
2       Heapify(A, i, n)                ?

Having established the complexity of Heapify on page 38, we only need to answer one question before we can give the complexity T_BH(n) of running BuildHeap: ''what is the cost of line 2?''

• We can consider that in the worst case, Heapify runs on a subtree of n nodes. This is the case when called with i = 0, and the Heapify call then costs T_H(n) = O(log n). It costs less in the other iterations, but O(log n) already gives an upper bound anyway. Since there are ⌊n/2⌋ iterations, the total complexity can be expressed as follows:

  T_BH(n) = Θ(n) + ⌊n/2⌋ · O(log n)      (line 1 + line 2)
  T_BH(n) = Θ(n) + Θ(n) · O(log n)
  T_BH(n) = O(n log n)

  However, that is a crude upper bound, because we considered that all calls to Heapify cost as much as the last one.

• In practice, Heapify is called on many small subtrees where it has constant cost. For instance, on all subtrees of height 1, Heapify costs T_H(1) = Θ(1). A more precise evaluation of line 2 would therefore account for the different sizes of each subtree considered. Let S(h, n) be the number of subtrees of height h in a heap of size n. We can express the complexity of BuildHeap as:

  T_BH(n) = Θ(n) + ∑_{h=1}^{⌊log n⌋} S(h, n) · T_H(h)      (10)

  Indeed: we have S(h, n) subtrees of height h, the call to Heapify costs T_H(h) for each of them, and we are running Heapify on all subtrees with heights ranging from 1 (the nodes just above the leaves) to ⌊log n⌋ (for the root95).

95 See the remark 94 on p. 38.

N Figure 61: The number of subtrees of height h in a complete binary tree of n nodes without missing nodes on the last level can be expressed as the number of nodes at depth d = ⌊log₂ n⌋ − h, that is 2^(⌊log₂ n⌋−h). This value is smaller or equal to 2^(log₂(n)−h) = n/2^h. Now if the binary tree is nearly complete (i.e., it has missing nodes), n/2^h is still an upper bound of the number of subtrees with height h. So we conclude that S(h, n) ≤ n/2^h.

Finding an exact formula for S(h, n) is tricky, but we can establish the upper bound S(h, n) ≤ n/2^h as shown in Figure 61. From that we have:

∑_{h=1}^{⌊log n⌋} S(h, n) · T_H(h)  ≤  ∑_{h=1}^{⌊log n⌋} n·O(h)/2^h  =  n · O(∑_{h=1}^{⌊log n⌋} h/2^h)  =  O(n)

The trick is to recognize the sum as the start of a series that converges96, so it can be reduced to O(1). Plugging this in equation (10), we get:

T_BH(n) = Θ(n) + O(n) = Θ(n)

A complexity that is both lower (n versus n log n) and more precise (Θ versus O) than our first attempt!

96 Start from ∑_{k=0}^{∞} r^k = 1/(1 − r), which can be established for any |r| < 1 from eq. 7 p. 12. Differentiate both sides w.r.t. r, then multiply by r to obtain ∑_{k=0}^{∞} k·r^k = r/(1 − r)². In our case we have r = 1/2 and ∑_{k=0}^{∞} k·(1/2)^k converges to 2.
algo 40

HeapSort

Sorting an array in ascending order using a max-heap is easy: once the heap has been built, its topmost value (i.e., the first value of the array) is the maximum. This maximum should therefore be moved to the end of the array. If we do that with an exchange, and now consider only the first n − 1 values to be part of the tree, we are in the situation depicted on Figure 62: calling Heapify on the root of this (restricted) tree is all we need to sift up its maximum value. This can be iterated to sort the entire array: each iteration places one new value at its correct place, and reorders the remaining heap.

N Figure 62: Progression of HeapSort, starting from the entire heap: after BuildHeap, after A[0] ↔ A[9], after Heapify(A, 0, 9), after A[0] ↔ A[8], after Heapify(A, 0, 8), and so on until the array is sorted.

HeapSort(A, n)
1   BuildHeap(A, n)                  Θ(n)
2   for i from n − 1 down to 1       Θ(n)
3       A[0] ↔ A[i]                  Θ(n)
4       Heapify(A, 0, i)             O(n log n)?

The complexity of the first three lines of HeapSort should be quite obvious: we know the cost of BuildHeap from page 39, and line 3 is a constant-time operation repeated n − 1 times. That leaves us with the cost of line 4.

The first call to Heapify is done on an array of size n − 1, so its cost should be T_H(n − 1) = O(log(n − 1)) = O(log n) according to what we established on page 38. The following iterations will call Heapify on smaller arrays, so we can still use O(log n) as an upper bound, and claim that the sum of all these calls will cost (n − 1)·O(log n) = O(n log n).

It would be legitimate to ask whether we could get a better complexity bound by being more precise when summing the costs of the different calls to Heapify, like we did for BuildHeap on page 39. Here the total work performed by all iterations of line 4 is

∑_{i=1}^{n−1} T_H(i) = ∑_{i=1}^{n−1} O(log i) = O(∑_{i=1}^{n−1} log i) = O(log((n − 1)!))      (11)

Stirling's formula is a powerful tool to simplify expressions involving factorials, if you can remember it. We have

n! ∼ √(2πn) · (n/e)^n    hence    log₂(n!) ∼ n log₂ n.

(For another way to obtain the equivalence on the right, see page 14.) We can therefore return to equation (11) and simplify it:

∑_{i=1}^{n−1} T_H(i) = O((n − 1) log(n − 1)) = O(n log n)

Unfortunately, this result is not better than our original approximation. We conclude that HeapSort(A, n) runs in O(n log n).

Can you explain the fundamental difference between the loops of BuildHeap and HeapSort? Why is one O(n) and the other O(n log n)?
algo 41

0 ` r n
Partition
··· unsorted ···
The Partition algorithm is a building block for QuickSort. Par-
tition reorders a given range of elements in an array, such that all p ← Partition( A, `, r )

the elements in the left-hand side of the range are smaller than those 0 ` p r n
in the right-hand side as pictured by Figure 63. The resulting range ··· unsorted ≤ unsorted ···
does not need to be sorted.97
N Figure 63: Overview of the Pa rti-
One way to implement Partition is to choose a value, let’s say tion algorithm. The range A[`..r − 1]
x ← A[`] and use it as a threshold to decide whether an element is reordered so that any value in
A[`..p − 1] is less than or equal to any
A[v] can belong to the left-hand part (if A[v] ≤ x) or to the right- value in A[ p..r − 1]. The value p should
hand part (if A[v] ≥ x).98 The following implementation of the be such that ` < p < r, ensuring that
reordering is often described as the ‘‘collapse the walls’’ technique. each part is non-empty. Note that the
two parts may have different lengths.
The walls are in fact two indices i and j starting at both ends of the
range, and moving towards each other, exchanging values along the 97
Sorting A[`..r − 1] would be one way
way. to implement Partition( A, `, r ), but it
would be less efficient.
Partition( A, `, r )
98
Note that elements equal to x can go
1 x ← A[`] Θ (1)
to either side; this is on purpose.
2 i ← ` − 1; j ← r Θ (1)
3 repeat forever O( n )
` r

// find a value that can go to the right-hand side 

 4 2 8 7 3 4 0 7 9 4
4 do i ← i + 1 until A[i ] ≥ x 
Θ(n) i
4 x
j
// find a value that can go to the left-hand side 


5 do j ← j − 1 until A[ j] ≤ x

4 2 8 7 3 4 0 7 9 4
// swap the two values unless the walls collapsed i j
6 if j ≤ i O( n )
4 2 8 7 3 4 0 7 9 4
7 return i + (i = `) Θ (1)
i j
8 A [i ] ↔ A [ j ] O( n )
4 2 0 7 3 4 8 7 9 4
The ‘‘repeat forever’’ loop might look daunting, but since lines 4 and 5 necessarily update i and j at each iteration of the main loop, it is guaranteed that eventually j ≤ i and the algorithm will terminate.

What is less obvious is that there are exactly two ways in which the algorithm may terminate. Either i = j (in this case A[i] = x), or i = j + 1 as in Figure 64. It is not possible for i to be larger than j + 1, because all the values to the left of i are less than or equal to x, so the loop decrementing j will stop as soon as it passes i.

N Figure 64: Execution of Partition on an example. In this case, the index returned is i, and the algorithm has (by chance!) reordered the range in two equal partitions.

The algorithm assumes that the range contains at least two values (r − ℓ ≥ 2). To argue that the returned value p satisfies ℓ < p < r, consider what it would take for this to be violated: to have p = ℓ, line 4 should be executed only once, which means that line 5 will execute until j = i = ℓ. However, in this case line 7 will return i + 1, not ℓ.

Finally, the Θ(n) complexity of Partition should be obvious once we realize that, because of the ‘‘collapsing walls’’ strategy, the sum of the executions of lines 4 and 5 is at least n + 1 (if we end with i = j) and at most n + 2 (if we end with i = j + 1).
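To make the ‘‘collapse the walls’’ technique concrete, here is one possible C translation of the above pseudocode for an array of int (this sketch and its naming are mine, not part of the original pseudocode):

    /* Reorder A[l..r-1] so that every value in A[l..p-1] is less than or
       equal to every value in A[p..r-1], and return p.  Assumes r - l >= 2. */
    int partition(int A[], int l, int r)
    {
        int x = A[l];                  /* threshold value, as in line 1 */
        int i = l - 1;
        int j = r;
        for (;;) {
            do ++i; while (A[i] < x);  /* line 4 */
            do --j; while (A[j] > x);  /* line 5 */
            if (j <= i)
                return i + (i == l);   /* line 7: never return l itself */
            int t = A[i]; A[i] = A[j]; A[j] = t;   /* line 8: swap */
        }
    }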

QuickSort

QuickSort sorts an array by calling Partition and then recursively sorting the two parts it created, until it reaches a range of length 1 (which does not need to be sorted).

QuickSort(A, ℓ, r)                        TQS(1)   TQS(n) for n = r − ℓ > 1
1  if r − ℓ > 1                           Θ(1)     Θ(1)
2    p ← Partition(A, ℓ, r)                        Θ(n)
3    QuickSort(A, ℓ, p)                            TQS(L)? for L = p − ℓ
4    QuickSort(A, p, r)                            TQS(n − L)?

Figure 65 shows the effects of the different calls to Partition occurring while sorting an example array with QuickSort.

N Figure 65: Effect of the successive calls to Partition(A, ℓ, r) during the recursion of QuickSort(A, 0, 10). The pairs displayed on the side give the values of ℓ and r passed to Partition.

The proof that QuickSort actually sorts the array can be done by induction on the length of the considered range. The induction hypothesis Hn is ‘‘for any range [ℓ..r − 1] of length r − ℓ = n, calling QuickSort(A, ℓ, r) will sort all the elements in A[ℓ..r − 1]’’.

Clearly H1 is true, because a range of length 1 is already sorted and QuickSort does not modify the array in this case. Consider some arbitrary n > 1, and assume that Hi is true for all i < n. Running QuickSort on a range of length n > 1 will execute lines 2–4:
• The result of line 2 is that all values in the range A[ℓ..p − 1] are smaller than or equal to all values in the range A[p..r − 1].
• Furthermore, we have ℓ < p < r, which implies that the two ranges [ℓ..p − 1] and [p..r − 1] have lengths smaller than n, and by hypothesis we can therefore state that after running lines 3 and 4, the values in A[ℓ..p − 1] and A[p..r − 1] are sorted.
Combining these two points, it follows that A[ℓ..r − 1] is sorted after executing lines 2–4.
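As with Partition, the recursion translates directly into C; this short sketch reuses the partition() function shown on page 41 (again an illustration, not the only possible implementation):

    void quicksort(int A[], int l, int r)
    {
        if (r - l > 1) {                   /* ranges of length 0 or 1 are sorted */
            int p = partition(A, l, r);    /* l < p < r */
            quicksort(A, l, p);
            quicksort(A, p, r);
        }
    }

    /* sorting a whole array of n values: quicksort(A, 0, n); */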

Evaluating the complexity of QuickSort is less easy, because the recursive calls on lines 3 and 4 are not necessarily done on ranges of equal length. The Partition function could return any p that satisfies ℓ < p < r. So if the size of the input range is n = r − ℓ, then after calling Partition, the left part may have a length L = p − ℓ anywhere between 1 and n − 1, and the right part would have the remaining length n − L (Fig. 66).

N Figure 66: Let us assume that Partition always puts L elements in the left part, and n − L elements in the right one.

It would therefore be tempting to express the complexity as the solution of

    TQS(n) = Θ(1)                            if n = 1
    TQS(n) = Θ(n) + TQS(L) + TQS(n − L)      if n > 1

Unfortunately that is incorrect, because the above assumes that L would have the same value in every recursive call: i.e., Partition would always produce a left part of size L. Clearly that is not true. However, solving this equation can give us some clues about the possible behaviors of QuickSort.

Worst and Best Cases for QuickSort


Page 42 ended with a recursive expression of the complexity TQS (n)
of sorting an array of size n, but that equation assumed that the
Partition always created a left part of length L. Let us evaluate
scenarios with different values of L.
• The case where L is always equal to 1 occurs when running QuickSort on a sorted array. In this case TQS(L) = Θ(1), and the recursive equation can be simplified to

      TQS(n) = Θ(1)                     if n = 1
      TQS(n) = Θ(n) + TQS(n − 1)        if n > 1

  Solving TQS(n) = Θ(n) + TQS(n − 1) iteratively99, we find that

      TQS(n) = Θ(n) + TQS(n − 1)
             = Θ(n) + Θ(n − 1) + TQS(n − 2)
             = Θ(n) + Θ(n − 1) + Θ(n − 2) + TQS(n − 3)
             = Θ(n) + Θ(n − 1) + Θ(n − 2) + · · · + Θ(1)
             = ∑_{i=1}^{n} Θ(i) = Θ(∑_{i=1}^{n} i) = Θ(n(n + 1)/2) = Θ(n²)

99 Be careful when doing this type of iterative computation. It would be tempting to simplify Θ(n) + Θ(n − 1) as Θ(n) (this is true), then simplify Θ(n) + Θ(n − 2) as Θ(n) (this is true as well), and continue until we obtain that TQS(n) = Θ(n) (which is incorrect). What did we do wrong? The number of terms we summed is not constant: we can only perform these reductions a constant number of times. If you are unsure, it is better to replace Θ(n) by some representative function of the class, like cn, and solve F(n) = cn + F(n − 1) instead. Then you have TQS(n) = Θ(F(n)).

  So QuickSort needs Θ(n²) operations to sort a sorted array...100 Note that the result is the same if L is replaced by any constant.

100 This is bad, and it also implies that this implementation of QuickSort behaves badly for ‘‘nearly sorted’’ arrays. We discuss some mitigating techniques on page 45.

• Another interesting case would be when L = ⌊n/2⌋, i.e., when Partition always cuts the range in its middle. Then we have TQS(n) = Θ(n) + TQS(⌊n/2⌋) + TQS(⌈n/2⌉). This is the same equation as for MergeSort (page 30), so we know the solution is TQS(n) = Θ(n log n).

• What if L = n/10? For this 10%-90% scenario the equation is TQS(n) = Θ(n) + TQS(⌊n/10⌋) + TQS(⌈9n/10⌉).

  Figure 67 shows the shape of the recursion tree: each node is labeled by the length of the array passed to Partition. The shortest branch of the tree is the left one, where the range is always divided by 10: the height of this branch is log10 n. The longest branch is the right one, with height log10/9 n, since the range is (slowly) divided by 10/9 at each recursive call. The work performed by Partition is proportional to the value displayed on each node of this tree, therefore the total cost of QuickSort is proportional to the sum of all the nodes of this tree. The sum of each of the first log10 n levels of this tree is necessarily n, so these levels sum to n log10 n. But the algorithm processes more than that. The total for each remaining level is less than n, so the sum of the whole tree is less than n log10/9 n. We therefore have

      Θ(n log10 n) ≤ TQS(n) ≤ Θ(n log10/9 n)   hence   TQS(n) = Θ(n log n).

  The same result holds if L = n/10000 or any other ratio.101

N Figure 67: Shape of the tree of the recursive calls to QuickSort in a scenario where Partition always makes a 10%-90% split.

101 The difference between the bad cases and the good cases discussed on this page is whether L is constant or whether it is proportional to n. The actual constant or ratio does not affect the resulting complexity class.

Average Complexity of QuickSort

Let us start again from the equation102

    TQS(n) = Θ(n) + TQS(L) + TQS(n − L).

102 The fact that TQS(1) = Θ(1) is implicit here, but it implies that later down the page we also have F(1) = c and Y(1) = c.

On page 43, we considered some arbitrary (but fixed) expressions for L to establish the complexity of QuickSort in some particular scenarios. However, in practice, L may take a different value in each recursive call. All we know is that 0 < L < n, because Partition guarantees that the left and right sides cannot be empty.

To derive an average complexity, assume that L is a random variable taking its value uniformly in {1, 2, ..., n − 1}. We can therefore compute T̂QS, the average complexity of QuickSort, by averaging the complexities we would obtain for each of these n − 1 different values, recursively:

    T̂QS(n) = (1/(n − 1)) ∑_{L=1}^{n−1} (Θ(n) + T̂QS(L) + T̂QS(n − L))

    T̂QS(n) = Θ(n) + (1/(n − 1)) (∑_{L=1}^{n−1} T̂QS(L) + ∑_{L=1}^{n−1} T̂QS(n − L))

    T̂QS(n) = Θ(n) + (2/(n − 1)) ∑_{L=1}^{n−1} T̂QS(L)

To avoid any errors103, let's replace Θ(n) by some representative function cn. The new function F(n) is such that T̂QS(n) = Θ(F(n)):

    F(n) = cn + (2/(n − 1)) ∑_{L=1}^{n−1} F(L)

103 cf. remark 99 on p. 43

To get rid of the sum, we first multiply both sides by n − 1 to get rid
of the non-constant factor in front of the sum, and then subtract the
same expression for F (n − 1):

    (n − 1)F(n) = (n − 1)cn + 2 ∑_{L=1}^{n−1} F(L)

    (n − 2)F(n − 1) = (n − 2)c(n − 1) + 2 ∑_{L=1}^{n−2} F(L)

    (n − 1)F(n) − (n − 2)F(n − 1) = 2c(n − 1) + 2F(n − 1)

    (n − 1)F(n) = 2c(n − 1) + nF(n − 1)
Let's divide both sides by n(n − 1) and then set Y(n) = F(n)/n:

    F(n)/n = 2c/n + F(n − 1)/(n − 1)

    Y(n) = 2c/n + Y(n − 1) = 2c ∑_{i=1}^{n} 1/i

From this harmonic series104, we conclude that Y(n) = Θ(log n), hence F(n) = Θ(n log n). The average complexity of QuickSort is therefore T̂QS(n) = Θ(n log n).

104 The fact that ∑_{i=1}^{n} 1/i = Θ(log n) can be derived from Euler's formula (∑_{i=1}^{n} 1/i = ln n + γ + o(1)), or easily proven by bounding the sum with integrals as done on page 14:

    ∫_{2}^{n+1} (1/i) di ≤ ∑_{i=2}^{n} 1/i ≤ ∫_{1}^{n} (1/i) di
    ln(n + 1) − ln(2) ≤ ∑_{i=2}^{n} 1/i ≤ ln(n)
    Θ(log n) ≤ ∑_{i=2}^{n} 1/i ≤ Θ(log n)
    Θ(log n) ≤ ∑_{i=1}^{n} 1/i ≤ Θ(log n)

QuickSort Optimizations
Typical QuickSort optimizations include:
• Selecting a different pivot value in the Partition procedure from page 41. The ideal value would be the median of the range, as it would ensure equal sizes for both sides. However, the median is not really easy to compute without sorting the range already.105 The usual strategy is to pick the median of the three values A[ℓ], A[r − 1], and A[⌊(r + ℓ)/2⌋]. Line 1 of Partition is replaced by x ← MedianOf3(A[ℓ], A[r − 1], A[⌊(r + ℓ)/2⌋]). With this change, QuickSort deals nicely with nearly-sorted arrays.106

105 It is possible to find the median of an array with only Θ(n) operations using an algorithm sometimes called ‘‘median of medians’’ (cf. page 49). However this would be very inconvenient here: firstly the constant hidden behind the Θ(n) notation is quite large, and secondly this algorithm is itself based on a recursive procedure similar to QuickSort.

106 Input arrays that trigger the worst-case Θ(n²) complexity still exist (see Fig. 68 page 46), but are harder to come by.

• The last recursive call to QuickSort is a tail call, so it can be optimized as a loop. Compare these equivalent implementations:

    QuickSort(A, ℓ, r)                 QuickSort(A, ℓ, r)
    1  if r − ℓ > 1                    1  while r − ℓ > 1
    2    p ← Partition(A, ℓ, r)        2    p ← Partition(A, ℓ, r)
    3    QuickSort(A, ℓ, p)            3    QuickSort(A, ℓ, p)
    4    QuickSort(A, p, r)            4    ℓ ← p

  Any decent compiler would already do this kind of tail call elimination automatically: it saves memory, because the values of the local variables would otherwise have to be saved on the stack before each call. However, what the compiler cannot guess is that the order of the two recursive calls to QuickSort does not matter: we can actually choose which of the two calls should be turned into a loop. Here, we want to always recurse on the smaller part, to keep the recursion as shallow as possible.

    QuickSort(A, ℓ, r)
    1  while r − ℓ > 1
    2    p ← Partition(A, ℓ, r)
    3    if p − ℓ ≤ r − p
    4      QuickSort(A, ℓ, p)
    5      ℓ ← p
    6    else
    7      QuickSort(A, p, r)
    8      r ← p

  While this does not change the time complexity of the algorithm, it changes its memory complexity107. Indeed the memory complexity was O(n) in our first implementation of QuickSort, because the recursion could be n-deep in the worst case; it is now O(log n) because there is no way to recurse on a sub-array larger than n/2.

107 I.e., the amount of additional memory an algorithm requires to process its input (this includes the stack in the case of recursive algorithms).

• Use InsertionSort when the array has a small length (like 10


values — the precise bound has to be found empirically). Even
if InsertionSort has O(n2 ) complexity, it usually performs a
lot better than QuickSort for small-sized input, because it does
not have all the overhead of running Partition and making
recursive calls.
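The three optimizations above can be combined. Here is a possible C sketch (the cutoff value of 10 and the helper names are my own arbitrary choices; the cutoff should be tuned empirically):

    /* QuickSort with median-of-3 pivot, recursion on the smaller part
       only, and an InsertionSort finish for small ranges. */
    static int median_of_3(int a, int b, int c)
    {
        if (a < b) { if (b < c) return b; return a < c ? c : a; }
        else       { if (a < c) return a; return b < c ? c : b; }
    }

    void quicksort_opt(int A[], int l, int r)
    {
        while (r - l > 10) {
            /* partition A[l..r-1] as on page 41, but around a
               median-of-3 threshold instead of A[l] */
            int x = median_of_3(A[l], A[(l + r) / 2], A[r - 1]);
            int i = l - 1, j = r;
            for (;;) {
                do ++i; while (A[i] < x);
                do --j; while (A[j] > x);
                if (j <= i) break;
                int t = A[i]; A[i] = A[j]; A[j] = t;
            }
            int p = i + (i == l);
            if (p - l <= r - p) {        /* recurse on the smaller part, */
                quicksort_opt(A, l, p);  /* loop on the larger one       */
                l = p;
            } else {
                quicksort_opt(A, p, r);
                r = p;
            }
        }
        for (int k = l + 1; k < r; ++k) {   /* InsertionSort on the small rest */
            int key = A[k], m = k - 1;
            while (m >= l && A[m] > key) { A[m + 1] = A[m]; --m; }
            A[m + 1] = key;
        }
    }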

IntroSort, or How to Avoid QuickSort's Worst Case

Even with the usual optimizations described on page 45, it is possible to construct an input that triggers QuickSort's worst case. Figure 68 shows an example where the pivot selected by MedianOf3 is the second smallest value, so that Partition does a single swap and creates a left part with L = 2 elements, and a right part with n − 2 elements. The left part can be sorted recursively in constant time, but when sorting the right part recursively, a similar (2, n − 2) partition is produced again. This unfortunate situation occurs until half of the array is sorted: if we sum just the costs of running Partition on the largest parts up to this point we have Θ(n) + Θ(n − 2) + Θ(n − 4) + · · · + Θ(n/2) = Θ(n²), so the worst case of QuickSort is necessarily reached. Any input where the recursion depth108 of QuickSort is linear will lead to the worst-case complexity of Θ(n²). On the contrary, if the recursion depth is in O(log n), then we obtain a best-case complexity of Θ(n log n).

N Figure 68: A worst case for QuickSort when the pivot is the median of A[ℓ], A[r − 1] and A[⌊(r + ℓ)/2⌋].

108 This should be understood without the tail call optimization discussed on page 45.

An easy way to avoid the worst case of QuickSort is to not use QuickSort, and resort instead to HeapSort or MergeSort, which ensure a Θ(n log n) worst case. However practice shows that QuickSort is much faster on average, so it is usually preferred.

The idea of IntroSort109 is to execute a variant of QuickSort that keeps track of the depth of the recursive calls in order to ensure it is sub-linear. If that depth exceeds c⌊log2 n⌋ for some constant c, then IntroSort stops the recursion and sorts the current sub-array with HeapSort instead. Doing so ensures that the entire array will be sorted in Θ(n log n), either by the variant of QuickSort, or with the help of HeapSort in the difficult cases. The constant c can be chosen arbitrarily, but it should not be too small, to give QuickSort a chance to finish and to limit the number of calls to HeapSort. In practice c = 2 is often used.

109 IntroSort was invented by David Musser, one of the original authors (with Alexander Stepanov) of the C++ Standard Template Library (STL). It is now the default sorting algorithm behind most (if not all) std::sort() implementations, because the STL requires std::sort() to run in Θ(n log n).

The pseudo-code below includes the tail call optimization from page 45. It could be combined with InsertionSort as well.110

110 Here we assume that HeapSort(A, ℓ, r) executes HeapSort on the sub-array A[ℓ..r − 1]. This does not exactly match the prototype of page 40.

IntroSort(A, ℓ, r)
1  IntroSortRec(A, ℓ, r, 2⌊log2(r − ℓ)⌋)
of page 40.
IntroSortRec( A, `, r, depth_limit)
1 while r − ` > 1
2 if depth_limit = 0
3 HeapSort( A, `, r )
4 return
5 else
6 depth_limit ← depth_limit − 1
7 p ← Partition( A, `, r )
8 if p − ` ≤ r − p
9 IntroSortRec( A, `, p, depth_limit)
10 `←p
11 else
12 IntroSortRec( A, p, r, depth_limit)
13 r←p
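For instance, for an array of n = 1 000 000 values, the initial depth limit is 2⌊log2 n⌋ = 2 × 19 = 38: IntroSortRec tolerates 38 levels of recursion before giving up and finishing the offending sub-array with HeapSort, while a well-behaved run of QuickSort on such an input only needs about log2 n ≈ 20 levels.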

QuickSelect

Given an array A of n values, the value of rank k is the kth smallest value of A. An algorithm that takes A and k as input and returns the rank-k value is called a selection algorithm.111 In what follows, we assume 0 ≤ k < n. The value of rank k = 0 is the minimal value, and the value of rank n − 1 is the maximal value.

111 One easy (and inefficient) way to build a selection algorithm is to first sort A in increasing order, and then pick its kth value. What sorting algorithm would you use, and what would be the complexity of that selection algorithm?

QuickSelect can be seen as a modification of QuickSort that implements a selection without sorting the entire array. The trick is that after Partition has run, we know (from the sizes of the two parts) in which part we need to search our value recursively.

QuickSelect(A, n, k)
1  return QuickSelectRec(A, 0, n, k)

QuickSelectRec(A, ℓ, r, k)                           T(1)   T(n)
1  if r − ℓ = 1 then return A[ℓ]                     Θ(1)   Θ(1)
2  p ← Partition(A, ℓ, r)                                   Θ(n)
3  L ← p − ℓ                                                Θ(1)
4  if k < L                                                 Θ(1)
5    then return QuickSelectRec(A, ℓ, p, k)                 T(L)
6    else return QuickSelectRec(A, p, r, k − L)             T(n − L)

N Figure 69: An example execution of QuickSelect(A, 10, 6). The figure shows the state of the array before the next recursive call to QuickSelectRec.

Figure 69 shows an example.

The complexity differs from QuickSort, because at most one of the two possible recursive calls on lines 5–6 may be executed. In the worst case, we have to assume that Partition performed poorly (L = 1), and that we always have to recurse into the largest part of n − 1 elements. This gives us the following recursive equation:

    T(1) = Θ(1),
    T(n) = T(n − 1) + Θ(n)   for n > 1

We conclude that the worst case complexity is T (n) = Θ(n2 ).


Let us now consider a case where Partition behaves ideally,
always splitting the array of n elements in two equal halves. The
equation becomes:

T (1) = Θ (1),
T (n) = T (n/2) + Θ(n) for n > 1

The Master Theorem (p. 33) tells us the solution to this equation is T(n) = Θ(n). While this is not the best-case scenario112, this seems like a favorable case. If we imagine a similar scenario where we recurse into 90% of the array (or any other ratio), the complexity will remain linear.

112 Exercise: Devise a scenario where calling QuickSelect on an array of size n runs in Θ(1).
On page 48 we will prove that the average complexity of QuickSelect is actually Θ(n), despite the quadratic worst case.113

Do you believe it is possible to write a selection algorithm that would always run in Θ(n)? Jump to page 49 for an answer.

113 Exercise: Using the idea presented on page 46, write another selection algorithm, called IntroSelect, with a worst case of Θ(n log n).
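Since the recursion of lines 5–6 is a tail call, QuickSelect is easily written as a loop. A possible C sketch, reusing the partition() function of page 41 (the rank k is interpreted relative to the current range, exactly as in the pseudocode):

    /* Return the value of rank k (0 <= k < n) of the array A of n ints. */
    int quickselect(int A[], int n, int k)
    {
        int l = 0, r = n;
        while (r - l > 1) {
            int p = partition(A, l, r);
            if (k < p - l) {
                r = p;           /* the rank-k value is in the left part */
            } else {
                k -= p - l;      /* it is in the right part, with a smaller rank */
                l = p;
            }
        }
        return A[l];
    }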

Average complexity of QuickSelect

To study the average complexity T̂(n) of QuickSelect, we consider (as we did on page 44) that the value of L can be anything between 1 and n − 1 with equal probability, and we average over all these possibilities. The problem in our case is that we do not know if the algorithm will recurse in a part of size L or n − L, so let us compute an upper bound of T̂ by assuming we recurse into the largest of these two parts.

    T̂(1) = Θ(1),
    T̂(n) ≤ (1/(n − 1)) ∑_{i=1}^{n−1} T̂(max(i, n − i)) + Θ(n)   if n > 1

We can simplify this a bit because max(i, n − i) = i if i ≥ ⌈n/2⌉, and max(i, n − i) = n − i if i ≤ ⌊n/2⌋. If n is odd, all the terms between T̂(⌈n/2⌉) and T̂(n − 1) appear twice in the sum. If n is even, T̂(⌊n/2⌋) additionally occurs once, but it is OK to count it twice since we are establishing an upper bound.

    T̂(n) ≤ (2/(n − 1)) ∑_{i=⌊n/2⌋}^{n−1} T̂(i) + Θ(n)

It is not clear how to simplify this; however, from the last scenario of page 47, it seems likely that T̂(n) = Θ(n). Since we are working on an upper bound, let us prove that T̂(n) = O(n) by induction114.

114 cf. pp. 34–35

Our hypothesis H(n) is that T̂(n) ≤ cn for some c we can pick arbitrarily large. Clearly H(1) is true, because T̂(1) = Θ(1), so we can find some c larger than this Θ(1) constant. Now let us assume that H(i) holds for 1 ≤ i < n and let us establish H(n):115
    T(n) ≤ (2/(n − 1)) ∑_{i=⌊n/2⌋}^{n−1} ci + Θ(n) ≤ (2c/(n − 1)) · (3n²/8) + Θ(n)

    T(n) ≤ (3cn(n − 1) + 3cn)/(4(n − 1)) + Θ(n) ≤ 3cn/4 + (3c(n − 1) + 3c)/(4(n − 1)) + Θ(n)

    T(n) ≤ cn − cn/4 + 3c/4 + 3c/(4(n − 1)) + Θ(n) ≤ cn + (Θ(n) + 3c/2 − cn/4)   for n ≥ 2

We can take c large enough so that cn/4 dominates Θ(n) + 3c/2. Then

    T(n) ≤ cn

Conclusion: H(n) holds for all n ≥ 1 and therefore T̂(n) = O(n).

However, since QuickSelect has to call Partition for n > 1, it necessarily performs at least Θ(n) operations, so we can claim that its average complexity is in fact T̂(n) = Θ(n).

115 In the first line, the sum over ci can be removed by showing that ∑_{i=⌊n/2⌋}^{n−1} i ≤ 3n²/8. Indeed,

    ∑_{i=⌊n/2⌋}^{n−1} i = (n − 1 + ⌊n/2⌋)(n − ⌊n/2⌋)/2 = (n² − n + ⌊n/2⌋ − ⌊n/2⌋²)/2

if n is even:

    = (4n² − 4n + 2n − n²)/8 = (3n² − 2n)/8 ≤ 3n²/8

if n is odd:

    = (4n² − 4n + 2(n − 1) − (n − 1)²)/8 = (3n² − 3)/8 ≤ 3n²/8
8 8

Linear Selection
While QuickSelect has an average complexity of Θ(n), it still has
a Θ(n2 ) worst case when we arrange the data so that Partition
behaves badly (for instance using an arrangement similar to Fig. 68
on page 46). However if we could make sure that Partition would
always allow us to eliminate a number of values that is a fraction of
n (as opposed to a constant), then the selection would be linear.
This is actually the key of the LinearSelect algorithm116: at any step of the recursion, it removes a quarter of the values. LinearSelect can be seen as a variant of QuickSelect where Partition is changed to select its pivot as the median of medians (steps 1–3).

116 This algorithm is also known as ‘‘median of medians’’ for reasons that will soon be obvious.
LinearSelect(A, n, k)                                                         T(n)
1  consider the n input values as ⌈n/5⌉ groups of 5 values (with maybe        Θ(n)
   the last group having less than 5 values) and compute the median
   of each group
2  compute the median x of all those ⌈n/5⌉ medians                            T(⌈n/5⌉)
3  use x as the pivot of Partition (i.e., replacing line 1 of the             Θ(n)
   algorithm on page 41) to reorganize A
4  recursively call LinearSelect on one of the two parts (depending           ≤ T(3n/4)
   on k), as done in QuickSelect.

N Figure 70: The circles represent an array of 34 values organized in 7 groups (6 groups of 5 values, and one group of 4 values). We interpret u → v as meaning u ≤ v. Inside each group (i.e., each column), values have been moved so that the median is on the center line, values above it are greater, and values below it are lesser. Similarly, columns have been ordered such that the median x of all medians has three greater medians to its left, and three lesser medians to its right. We can see that there are more than n/4 values that are greater than x (the top left quarter), and more than n/4 values that are lesser than x. Therefore, if x is used as a pivot in Partition, both parts are guaranteed to have more than n/4 values.

Computing the median of 5 values can be done in constant time, so repeating this operation ⌈n/5⌉ times in step 1 requires Θ(n) operations. Computing the median of these ⌈n/5⌉ medians, however, cannot be done in constant time since the number of values depends on n; but it corresponds to the value of rank ⌊⌈n/5⌉/2⌋ among these ⌈n/5⌉ values, so we can compute it by calling LinearSelect recursively on an array of size ⌈n/5⌉. Hence, assuming that running LinearSelect on n values takes T(n) operations, step 2 needs T(⌈n/5⌉) operations. Calling Partition costs Θ(n), as seen on page 41, but thanks to our pivot, we are sure that each of the two parts contains at least 25% of the array (see Fig. 70), so in the worst case, the recursive call of step 4 will cost T(3n/4). Thus, we have:

    T(1) = Θ(1)
    T(n) ≤ Θ(n) + T(⌈n/5⌉) + T(3n/4)
Let us prove the following induction hypothesis117: Hn: ‘‘T(n) ≤ cn’’.

117 See pp. 34–35 for the technique. One way to guess a probable solution for this equation is to consider that T(⌈n/5⌉) seems much smaller than T(3n/4). So as a first guess, we neglect it and solve T(n) ≤ Θ(n) + T(3n/4). In this simplified case, we find using the master theorem that T(n) = O(n). Then we use induction to verify if this solution is still correct for the complete problem.

Clearly H1 holds, because T(1) = Θ(1) and we can find c large enough to dominate that Θ(1). Let us assume that H1, H2, …, Hn−1 hold, and inject this knowledge into our recursive inequality:

    T(n) ≤ Θ(n) + c⌈n/5⌉ + 3cn/4 ≤ Θ(n) + c(n + 4)/5 + 3cn/4

    T(n) ≤ Θ(n) + 19cn/20 + 4c/5 = cn + Θ(n) + 4c/5 − cn/20

We can choose c large enough so that cn/20 dominates Θ(n) + 4c/5. Then T(n) ≤ cn, which means that Hn holds for any n ≥ 1, and this allows us to conclude that T(n) = O(n).

We can strengthen this result to T(n) = Θ(n) because of step 3.
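Step 1 relies on the fact that the median of a group of at most 5 values can be found in constant time. A possible C helper (a sketch of mine, using a tiny insertion sort; taking the upper median for the possibly shorter last group is an arbitrary convention):

    /* Median of the group g[0..len-1], with 1 <= len <= 5. */
    int median_of_group(const int g[], int len)
    {
        int t[5];
        for (int i = 0; i < len; ++i) {           /* insertion sort into t[] */
            int v = g[i], j = i;
            while (j > 0 && t[j - 1] > v) { t[j] = t[j - 1]; --j; }
            t[j] = v;
        }
        return t[len / 2];        /* middle element of the sorted copy */
    }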

Space Complexity
As mentioned on page 20 the space complexity of an algorithm,
often noted S(n), is the amount of additional memory required to
process an input of size n.
The space complexity can be computed using the same mathe-
matical tools we used for computing time complexities. We simply
sum the sizes of all local variables, and if there are function calls
(even recursive calls), we add the maximum space complexities of
all these calls.
For a simple iterative algorithm like SelectionSort (p. 21) or InsertionSort (p. 22), there are only a couple of local variables used to store values or indices, so S(n) = Θ(1).118

118 Here, and in all our examples, we are making the practical assumption that values and indices are integers stored in a fixed amount of memory. Some people who are interested in bit-level complexity may have different expectations. For instance they could say that if we want to work with arrays that are arbitrarily large, we need Θ(log n) bits to store an index. Similarly, if we want to be able to store n distinct values in the input array, it needs Θ(n log n) bits.

A recursive algorithm like BinarySearch (p. 24) has to store one local variable m per recursive call. Its space complexity is the solution of S(n) ≤ Θ(1) + S(n/2), which is S(n) = O(log n). However, if we account for the fact that BinarySearch is tail recursive and assume the compiler will perform tail call elimination, changing the recursion into a simple loop, then the space complexity drops to Θ(1).

Similarly, the space complexity of HeapSort (p. 40) satisfies S(n) ≤ S(2n/3 + O(1)) + Θ(1), and this can be solved as for TH(n) on page 38: S(n) = O(log n).
For an algorithm like MergeSort (p. 30), we can assume that
the array B of size Θ(n) used in Merge (p. 29) is allocated once
before the recursion, and the space requirement of the recursion is
Θ(log n). We conclude that S(n) = Θ(n).
Finally, the space complexity of QuickSort depends on how it has been implemented. If we use the implementation described on pages 41–42, each call to QuickSort requires Θ(1) local variables (call to Partition included), but we can have n − 1 recursive calls in the worst case. Therefore S(n) = O(n). However, if the trick of page 45 is used to only recurse on the smaller of the two sides of the partition, then the recursion depth is at most log2 n and the space complexity drops to S(n) = O(log n).

In-place Algorithms
An algorithm is in-place if its space complexity belongs to O(log n). In
other words, the additional memory it requires to process an input
of size n is at most logarithmic in n.
This definition is a bit more practical than what we would intu-
itively allow: limiting the memory consumption to O(log n) instead
of O(1) allows some recursive algorithms like HeapSort to qualify
as in-place while still disallowing algorithms that would use recur-
sion to allocate enough local variables to duplicate the input on the
stack.

More Sort-Related Topics

Stable Sorting Techniques


While we have illustrated all sorting algorithms over arrays of in-
tegers, in real-life it is often the case that we want to sort records of
values according to some key. The only change required to the algo-
rithms is that instead of comparing array elements (A[i ] < A[ j]) we
compare some fields of these elements (e.g., A[i ].grade < A[ j].grade).
Let us call this field the sort key.
input output 1 output 2 output 3 output 4
name grade name grade name grade name grade name grade
Robert 10 John 5 Michael 5 John 5 Michael 5
William 15 Michael 5 John 5 Michael 5 John 5
James 10 Robert 10 Robert 10 James 10 James 10
John 5 James 10 James 10 Robert 10 Robert 10
Michael 5 William 15 William 15 William 15 William 15
N Figure 71: A table listing students and their grades. Sorting this table in ascending-grade order may return any of the four displayed outputs. For instance SelectionSort produces output 3, InsertionSort and MergeSort both produce output 1, HeapSort produces output 4, and the output of QuickSort depends on how the pivot is selected in Partition.

A sorting algorithm is stable if it preserves the order of elements with equal keys. An unstable sorting algorithm may turn the input of Fig. 71 into any of the four possible outputs. A stable sort will necessarily produce output 1: John and Michael have equal grades so their relative order has to be preserved; likewise for Robert and James.

The algorithms InsertionSort and MergeSort, as presented on pages 22 and 29–30, are stable.119 The following sorts are unstable: SelectionSort, HeapSort, QuickSort.

119 However some very subtle changes to these algorithms will lose the stable property. In InsertionSort, change A[i] > key into A[i] ≥ key and the algorithm will still sort the array, but in an unstable way. Changing A[ℓ] ≤ A[r] into A[ℓ] < A[r] in Merge will have a similar effect on MergeSort.

Stable algorithms are mostly useful when records are sorted multiple times, one key at a time. For instance to obtain output 3 of Fig. 71, where people with equal grades are sorted alphabetically, we could proceed in two steps as illustrated by Fig. 72.
    name     grade        name     grade        name     grade
    Robert    10          James     10          John       5
    William   15   (1)    John       5   (2)    Michael    5
    James     10   −→     Michael    5   −→     James     10
    John       5          Robert    10          Robert    10
    Michael    5          William   15          William   15

J Figure 72: (1) Sort by names, then (2) use a stable sort to order by grades. In this case, we would get the same output using a single sorting procedure by modifying the comparison function to order by grades and then by names in case of equal grades; however there are situations (e.g., using a cheap spreadsheet) where we have less control on the comparison function.

Finally, we can turn any unstable sort into a stable one by appending the input order to the key (Fig. 73), or equivalently, by adding a new column with the input order and using it for tie breaking.

H Figure 73: Making any sort stable: (1) modify the key to include the input order, (2) sort against the new key, (3) revert to old keys.

    name     grade        name     grade'        name     grade'        name     grade
    Robert    10          Robert    1001         John       504         John       5
    William   15   (1)    William   1502   (2)   Michael    505   (3)   Michael    5
    James     10   −→     James     1003   −→    Robert    1001   −→    Robert    10
    John       5          John       504         James     1003         James     10
    Michael    5          Michael    505         William   1502         William   15
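This decoration trick is also handy with a library sort whose stability is not guaranteed, such as C's qsort(). A possible sketch (the struct and helper names are mine):

    #include <stdlib.h>

    /* qsort() is not required to be stable, so we record the input position
       of each record and use it to break ties between equal grades. */
    struct student { const char *name; int grade; int input_order; };

    static int by_grade_then_input_order(const void *pa, const void *pb)
    {
        const struct student *a = pa, *b = pb;
        if (a->grade != b->grade)
            return a->grade < b->grade ? -1 : 1;
        return a->input_order - b->input_order;    /* preserve input order */
    }

    void stable_sort_by_grade(struct student s[], size_t n)
    {
        for (size_t i = 0; i < n; ++i)
            s[i].input_order = (int)i;
        qsort(s, n, sizeof s[0], by_grade_then_input_order);
    }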

Worst Case for Comparison Sort

Let us build a decision tree tracing all the comparisons performed by SelectionSort (Figure 74) on the array [a, b, c]. Each branch of this tree (Figure 75) corresponds to a possible execution of the algorithm. Each internal node corresponds to a comparison of the values to sort (line 4 of Figure 74): the first two lines of comparisons correspond to the first iteration with i = 0, after which the smallest element of the array is swapped with the first element, and the third line of the tree corresponds to iteration i = 1 which orders the last two elements of the array. The state of the array is displayed above the first comparison of each iteration.

N Figure 74: SelectionSort, for reference. See page 21 for more details. We focus on the comparison of line 4.

SelectionSort(A, n)
1  for i ← 0 to n − 2 do
2    min ← i
3    for j ← i + 1 to n − 1 do
4      if A[j] < A[min]
5        min ← j
6    A[min] ↔ A[i]

J Figure 75: Decision tree representing comparisons performed on line 4 of SelectionSort while running on [a, b, c]. Left children, with dotted lines, correspond to cases where the comparison fails, while right children, with plain lines, are cases where it is successful. The root comparison (iteration i = 0) is b < a, followed by c < a or c < b; the third line of comparisons (iteration i = 1) is c < b, a < b, c < a, or a < b; the seven leaves are [a, b, c], [a, c, b], [c, b, a], [c, a, b], [b, a, c], [b, c, a], and [c, b, a].

Leaves correspond to the final order produced by the algorithm.


Note that some of the comparisons performed can be superfluous for
two reasons:

• Either its result is forced. For instance the rightmost comparison,


a < b, can only fail because we know that b < a succeeded.

• Or the resulting effect would make no difference. For instance if


b < a is false, and c < a is true, the array is rewritten as [c, b, a]
at the end of the first iteration. The second iteration will then test
whether a < b and return either [c, b, a] or [c, a, b]. However if
both b < a and a < b are false, it means that a = b, so it would be
OK to just skip this comparison and always return [c, a, b].

Such a decision tree can be built for an array of any size, and for any comparison sort, i.e., any sorting procedure that compares the elements to sort.120 For some algorithms, the branches may not all have the same length. As branches correspond to possible executions of the algorithm, the height of the tree gives the number of comparisons necessary in the worst case.

120 Can you think of sorting procedures that do not?
If we construct such a decision tree for an array of n elements, the number of leaves has to be at least n! to account for all possible orderings of those elements (and possibly more since we may have duplicates). As this is a binary tree, the number of comparisons necessary to sort an array in the worst case is at least log2(n!).

We can therefore conclude that the worst case of any comparison sort requires Ω(log(n!)) = Ω(n log n) operations.121 Please read the previous sentence carefully. This is a lower bound for a worst case.

121 See page 14 for a demonstration that log2(n!) = Θ(n log n).
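For instance, with n = 3 the decision tree must have at least 3! = 6 leaves, so its height is at least ⌈log2 6⌉ = 3: no comparison sort can always sort three values with only two comparisons. SelectionSort, whose branches in Figure 75 all have length 3, matches this bound exactly for n = 3.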

Counting Sort
Let us study a first non-comparative sort. That's right: Counting-
Sort is able to sort an array without comparing its elements. This
algorithm is quite limited: it can only sort non-negative integers that
span a small range. But it can be used as a building block to sort
more complex values, as we shall see later. Also, this is a stable sort,
and that will prove important to build upon.
Figure 76 illustrates how CountingSort works on an example.
J Figure 76: CountingSort steps: (1) build a histogram of all values, (2) build a cumulative histogram, where index i gives the number of values less than or equal to i, and (3) use those values to construct the sorted array.

The values to sort are used as indices in an array that counts


their occurrences. Once the number of each value is known, we can
compute the ranges of the output array where each different output
value needs to be copied.
Assuming the input array A contains n integers satisfying
0 ≤ A[i ] < k for 0 ≤ i < n, the algorithm can be written as fol-
lows:122

122 For simplicity, this algorithm is presented for the case where the input array contains only the integers to sort. In reality those integers are likely to be attached to some auxiliary data to copy as well. For instance we might be sorting a list of students according to some grade. So to generalize the algorithm, C[A[i]] should be understood as C[key(A[i])] where 0 ≤ key(A[i]) < k is the key used to sort A[i].

CountingSort(A, n, k)
1   for i ← 0 to k − 1 do                     Θ(k)
2     C[i] ← 0                                Θ(k)
    // step (1): count the occurrences
3   for i ← 0 to n − 1 do                     Θ(n)
4     C[A[i]] ← C[A[i]] + 1                   Θ(n)
    // step (2): cumulate
5   for i ← 1 to k − 1 do                     Θ(k)
6     C[i] ← C[i] + C[i − 1]                  Θ(k)
    // step (3): copy each value to its final place
7   for i ← n − 1 down to 0 do                Θ(n)
8     C[A[i]] ← C[A[i]] − 1                   Θ(n)
9     B[C[A[i]]] ← A[i]                       Θ(n)
10  return B
                                              Θ(n + k)
The steps of Figure 76 are highlighted on the left. After step (2),
C [ x ] contains the number of occurence of values ≤ x, so when scan-
ning the array from right to left123 , we know that the first x we en- 123
It is also possible to do it from left-
counter should go at position C [ x − 1] and we can then decrement to-right, but preserving stability will
require some tweaks to lines 5–9.
C [ x ] for the next occurrence of x (these two operations are done in
the opposite order, on lines 8–9).
The time complexity is easily seen to be Θ(n + k) and this can be
reduced to Θ(n) whenever it is known that k = O(n). In practice, k
is usually a small constant like 10 or 256 to keep the histogram short.
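A direct C translation of the pseudocode could look as follows (a sketch: it allocates the histogram and the output array itself, and leaves error handling aside):

    #include <stdlib.h>

    /* Sort the n integers of A, all in [0, k), into a newly allocated
       array that is returned.  Runs in Θ(n + k) time and space. */
    int *counting_sort(const int A[], int n, int k)
    {
        int *C = calloc(k, sizeof *C);          /* histogram, all zeros */
        int *B = malloc(n * sizeof *B);         /* output array */
        for (int i = 0; i < n; ++i)             /* (1) count */
            ++C[A[i]];
        for (int i = 1; i < k; ++i)             /* (2) cumulate */
            C[i] += C[i - 1];
        for (int i = n - 1; i >= 0; --i)        /* (3) copy, preserving stability */
            B[--C[A[i]]] = A[i];
        free(C);
        return B;
    }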

Radix Sort (LSD)


RadixSort leverages the stability of CountingSort124 to sort arrays with larger keys in multiple passes. A simple illustration is to sort 5-digit numbers (e.g., French postal codes). Using a single pass of CountingSort would require a histogram with k = 100000 entries, probably not an issue for a computer, but definitely not something a human would do. A RadixSort approach would be to sort those postal codes digit by digit, using multiple CountingSort passes, each with k = 10 entries. For this we first sort according to the last digit, then according to the one before, etc., until the first digit (Figure 77). Since CountingSort is stable, everything to the right of the considered digit is sorted after each pass.

124 Page 53.

    75014   94270   94200   69002   73014   69002
    94270   94200   69002   75014   94110   73014
    73014   94110   94110   73014   94200   75014
    94200 → 94250 → 75014 → 94110 → 94250 → 94110
    94110   69002   73014   94200   94270   94200
    94250   75014   94250   94250   75014   94250
    69002   73014   94270   94270   69002   94270

J Figure 77: Sorting a list of 5-digit numbers, one digit at a time. Each of the five arrows represents an application of CountingSort using the key highlighted to its left. This iterative implementation of radix sort starts from the least significant digit (LSD).
Clearly if we have n 5-digit numbers to sort, each of these 5 passes takes Θ(n) time. Therefore the entire process is linear. But this does not necessarily have to be done in base 10. Sorting a list of 32-bit integers, we could apply RadixSort in base 256 to process 8 bits at a time, and require only four passes.
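Here is a possible C sketch of that base-256 LSD RadixSort for 32-bit unsigned integers: four stable counting passes, one per byte, from the least significant byte to the most significant one (in this sketch the temporary buffer is supplied by the caller):

    #include <stdint.h>
    #include <string.h>

    /* Sort the n values of A in place, using B as a temporary buffer of
       the same size.  Four stable CountingSort passes, one per byte. */
    void radix_sort_lsd(uint32_t A[], uint32_t B[], int n)
    {
        for (int shift = 0; shift < 32; shift += 8) {
            int count[256] = {0};
            for (int i = 0; i < n; ++i)                   /* (1) count */
                ++count[(A[i] >> shift) & 0xFF];
            for (int b = 1; b < 256; ++b)                 /* (2) cumulate */
                count[b] += count[b - 1];
            for (int i = n - 1; i >= 0; --i)              /* (3) stable copy */
                B[--count[(A[i] >> shift) & 0xFF]] = A[i];
            memcpy(A, B, n * sizeof A[0]);                /* next pass reads A again */
        }
    }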

Radix Sort (MSD)


Instead of working from the least significant digit up to the most significant one, it is also possible to work the other way around. In that case the first pass should split the input into k different ranges sharing the same key, and sort those ranges recursively using the next key (Figure 78).

J Figure 78: Recursive RadixSort starting from the most significant digit (MSD). Each application of CountingSort is used to split the values range into k output ranges. Ranges that have more than one value are sorted recursively using the next digit as key.

If the recursive calls are laid out as in Figure 78, where the recursive calls in each ‘‘layer’’ sort their range according to the same digit, it is clear that the number of values that need to be moved can be less than in the LSD version (because recursion stops on ranges of size 1), and that the total amount of work is still Θ(n) because of the first pass.

Further Reading

We recommend the following books (ordered by relevance):

• Introduction to Algorithms (Third Edition) by Thomas H. Cormen,


Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. The
MIT Press, 2009.
This book covers most of the topics touched in our lecture. Focus
on chapters 1–4, 6–13, and 15. This book also has chapters on
graphs that intersect with the lecture on graph theory you will get
next semester.

• Concrete Mathematics: A Foundation for Computer Science (Sec-


ond Edition) by Ronald L. Graham, Donald E. Knuth, and Oren
Patashnik. Addison-Wesley, 1994.
An introduction to mathematical tools useful to the computer
scientist, and presented very nicely.

• Advanced Data Structures by Peter Brass. Cambridge University


Press, 2008.
This book presents a wide range of data structures. It is well
illustrated, and it gives actual C code for implementing each data
structure.

• Analysis of Algorithms (Second Edition) by Robert Sedgewick and


Philippe Flajolet. Addison-Wesley, 2013.
This book focuses on the mathematical tools needed for studying the complexity of algorithms, but it goes very fast into powerful techniques (such as generating functions) that are beyond the scope of the current lecture. The first two chapters contain material discussed in this lecture. In particular, our illustration of the master theorem (page 33) is inspired from this book.
