Elementary Algorithms

Larry LIU Xinyu

Version: 0.6180339887498949
Email: [email protected]
Contents

I Preface
  0.1 Why?
  0.2 The smallest free ID problem, the power of algorithms
    0.2.1 Improvement 1
    0.2.2 Improvement 2, Divide and Conquer
    0.2.3 Expressiveness vs. Performance
  0.3 The number puzzle, power of data structure
    0.3.1 The brute-force solution
    0.3.2 Improvement 1
    0.3.3 Improvement 2
  0.4 Notes and short summary
  0.5 Structure of the contents

II Trees

4 AVL tree
  4.1 Introduction
    4.1.1 How to measure the balance of a tree?
  4.2 Definition of AVL tree
  4.3 Insertion
    4.3.1 Balancing adjustment
    4.3.2 Pattern Matching
  4.4 Deletion
  4.5 Imperative AVL tree algorithm ⋆
  4.6 Chapter note

6 B-Trees
  6.1 Introduction
  6.2 Insertion
    6.2.1 Splitting
  6.3 Deletion
    6.3.1 Merge before delete method
    6.3.2 Delete and fix method
  6.4 Searching
  6.5 Notes and short summary

8 From grape to the world cup, the evolution of selection sort
  8.1 Introduction
  8.2 Finding the minimum
    8.2.1 Labeling
    8.2.2 Grouping
    8.2.3 Performance of the basic selection sorting
  8.3 Minor Improvement
    8.3.1 Parameterize the comparator
    8.3.2 Trivial fine tune
    8.3.3 Cock-tail sort
  8.4 Major improvement
    8.4.1 Tournament knock out
    8.4.2 Final improvement by using heap sort
  8.5 Short summary

13 Searching
  13.1 Introduction
  13.2 Sequence search
    13.2.1 Divide and conquer search
    13.2.2 Information reuse
  13.3 Solution searching
    13.3.1 DFS and BFS
    13.3.2 Search the optimal solution
  13.4 Short summary

VI Appendix

A Lists
  A.1 Introduction
  A.2 List Definition
    A.2.1 Empty list
    A.2.2 Access the element and the sub-list
  A.3 Basic list manipulation
    A.3.1 Construction
    A.3.2 Empty testing and length calculating
    A.3.3 Indexing
    A.3.4 Access the last element
    A.3.5 Reverse indexing
    A.3.6 Mutating
    A.3.7 Sum and product
    A.3.8 Maximum and minimum
  A.4 Transformation
    A.4.1 Mapping and for-each
    A.4.2 Reverse
  A.5 Extract sub-lists
    A.5.1 take, drop, and split-at
    A.5.2 Breaking and grouping
  A.6 Folding
    A.6.1 Folding from right
    A.6.2 Folding from left
    A.6.3 Folding in practice
  A.7 Searching and matching
    A.7.1 Existence testing
    A.7.2 Looking up
    A.7.3 Finding and filtering
    A.7.4 Matching
  A.8 Zipping and unzipping
  A.9 Notes and short summary
Preface
0.1 Why?
‘Are algorithms useful?’ Some programmers say that they seldom use any
serious data structures or algorithms in real work such as commercial application
development. Even when they need some of them, they have already been
provided by libraries. For example, the C++ standard template library (STL)
provides sort and selection algorithms as well as the vector, queue, and set data
structures. It seems that knowing about how to use the library as a tool is quite
enough.
Instead of answering this question directly, I would like to say algorithms
and data structures are critical in solving ‘interesting problems’, the usefulness
of the problem set aside.
Let’s start with two problems that looks like they can be solved in a brute-
force way even by a fresh programmer.
0.2 The smallest free ID problem, the power of algorithms

Given a list of the ID numbers that are currently in use, how can you find the
smallest free ID, that is, the smallest non-negative integer that doesn't appear
in the list?
It seems the solution is quite easy even without any serious algorithms.
1: function Min-Free(A)
2:   x ← 0
3:   loop
4:     if x ∉ A then
5:       return x
6:     else
7:       x ← x + 1

Where the ∉ test is realized like below.

1: function ‘∉’(x, X)
2:   for i ← 1 to |X| do
3:     if x = X[i] then
4:       return False
5:   return True
Some languages provide handy tools which wrap this linear time process. For
example in Python, this algorithm can be directly translated as the following.
def brute_force(lst):
    i = 0
    while True:
        if i not in lst:
            return i
        i = i + 1
It seems this problem is trivial. However, there will be millions of IDs in a
large system. The speed of this solution is poor in such cases, for it takes O(n²)
time, where n is the length of the ID list. On my computer (2 cores, 2.10 GHz,
with 2G RAM), a C program using this solution takes an average of 5.4 seconds
to search for the minimum free number among 100,000 IDs, and it takes more
than 8 minutes to handle a million numbers.
0.2.1 Improvement 1

The key idea to improve the solution is based on the fact that for a series of n
numbers x1, x2, ..., xn, if there are free numbers, some of the xi are outside the
range [0, n); otherwise the list is exactly a permutation of 0, 1, ..., n − 1 and n
should be returned as the minimum free number.

0.2.2 Improvement 2, Divide and Conquer

Another way to exploit this fact is divide and conquer: split the range at the
middle point. If the lower half of the range is fully occupied, the minimum free
number must be in the upper half, searched from ⌊n/2⌋ + 1 as the lower bound.
So the algorithm is of the form minfree(A, l, u), where l is the lower bound and
u is the upper bound index of the element.
Note that there is a trivial case: if the number list is empty, we merely
return the lower bound as the result.
This divide and conquer solution can be formally expressed as a function:

search(A, l, u) =
  l : A = ϕ
  search(A′′, m + 1, u) : |A′| = m − l + 1
  search(A′, l, m) : otherwise

where

  m = ⌊(l + u)/2⌋
  A′ = {x | x ∈ A ∧ x ≤ m}
  A′′ = {x | x ∈ A ∧ x > m}
It is obvious that this algorithm doesn't need any extra space (except the stack
used for book-keeping the recursion; as we'll see below, this can be eliminated
either by tail recursion optimization, for instance with gcc -O2, or by manually
changing the recursion to iteration). Each call performs O(|A|) comparisons to
build A′ and A′′. After that the problem scale halves. So the time needed for
this algorithm is T(n) = T(n/2) + O(n), which reduces to O(n). Another way to
analyze the performance is by observing that the first call takes O(n) to build
A′ and A′′, the second call takes O(n/2), the third O(n/4), and so on. The total
time is O(n + n/2 + n/4 + ...) = O(2n) = O(n).
In functional programming languages such as Haskell, partitioning a list is
already provided in the basic library, and this algorithm can be translated as
the following.

import Data.List

bsearch xs l u | xs == [] = l
               | length as == m - l + 1 = bsearch bs (m+1) u
               | otherwise = bsearch as l m
    where
        m = (l + u) `div` 2
        (as, bs) = partition (<= m) xs
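To solve the original problem we also need a top-level wrapper, presumably
something like minFree xs = bsearch xs 0 (length xs - 1) (the wrapper name is
ours), starting from 0 as the lower bound and the whole list as the candidates.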
Since the function is tail recursive (most functional language compilers exploit
this automatically), we can eliminate the recursion by replacing it with iteration.
The resulting iterative program uses a ‘quick-sort’ like approach to re-arrange
the array so that all the elements before left are less than or equal to m, while
those between left and right are greater than m. This is shown in figure 1.
Figure 1: Divide the array: all x[i] ≤ m where 0 ≤ i < left, while all x[i] > m
where left ≤ i < right. The rest of the elements are unknown.
This program is fast and it doesn't need extra stack space. However, compared
to the previous Haskell program, it's hard to read and the expressiveness is
decreased. We have to balance performance and expressiveness.
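To make the idea concrete, here is a sketch of the iterative, partition-based
approach in Python (our illustration, not the book's C listing; for clarity it
slices the candidate list, while the C version re-arranges the array in place):

def min_free(xs):
    l, u = 0, len(xs) - 1         # current search range [l, u]
    while xs:
        m = (l + u) // 2
        left = 0
        # partition: move every element <= m before index `left`
        for right in range(len(xs)):
            if xs[right] <= m:
                xs[left], xs[right] = xs[right], xs[left]
                left = left + 1
        if left == m - l + 1:
            xs = xs[left:]        # lower half [l, m] is full, search above m
            l = m + 1
        else:
            xs = xs[:left]        # there is a hole in [l, m]
            u = m
    return l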
0.3.2 Improvement 1

Analysis of the above algorithm shows that modular and division calculations
are very expensive [2], and they are executed many times in the loops. Instead
of checking whether a number contains only 2, 3, or 5 as factors, an alternative
solution is to construct such numbers from these factors.
The insert function takes O(|Q|) time to find the proper position and insert the
element; if the element already exists in the queue, it just returns.
A rough estimate tells us that the length of the queue increases in proportion
to n (each time we extract one element and push at most 3 new ones, so the
increase ratio is ≤ 2), and the total running time is O(1 + 2 + 3 + ... + n) = O(n²).
Figure 3 plots the number of queue accesses against n; it is a quadratic curve,
which reflects the O(n²) performance.
The C program based on this algorithm takes only 0.016s to get the right
answer, 859963392, which is 2500 times faster than the brute-force solution.
Improvement 1 can also be considered in a recursive way. Suppose X is the
infinite series, in increasing order, of all numbers which only contain factors
of 2, 3, or 5. The following relationship holds (we reconstruct it here from the
merge program below): X consists of 1 together with the merge of 2X, 3X and
5X, where cX = {cx | x ∈ X}:

X = {1} ∪ merge({2x | x ∈ X}, merge({3x | x ∈ X}, {5x | x ∈ X}))

where merge combines two sorted series, dropping duplicated elements:
merge [] l = l
merge l [] = l
merge (x:xs) (y:ys) | x < y = x : merge xs (y:ys)
                    | x == y = x : merge xs ys
                    | otherwise = y : merge (x:xs) ys
Denoting this lazy infinite list as ns, evaluating its 1500th element in GHCi
gives the answer:

>ns !! (1500-1)
859963392
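The same recursive relationship can also be realized eagerly; a Python sketch
(ours, not the book's program) keeps one growing list and three index pointers
into it:

def nth_235(n):
    xs = [1]
    i2 = i3 = i5 = 0      # next elements of X to be multiplied by 2, 3, 5
    while len(xs) < n:
        x = min(xs[i2] * 2, xs[i3] * 3, xs[i5] * 5)
        xs.append(x)
        # advance every pointer that produced x, so duplicates are dropped
        if x == xs[i2] * 2: i2 = i2 + 1
        if x == xs[i3] * 3: i3 = i3 + 1
        if x == xs[i5] * 5: i5 = i5 + 1
    return xs[n - 1]

Calling nth_235(1500) returns 859963392 as expected.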
0.3.3 Improvement 2

Although the above solution is much faster than the brute-force one, it still has
some drawbacks. It produces many duplicated numbers, which are eventually
dropped when examining the queue. Secondly, it performs a linear scan and
insertion to keep the order of all elements in the queue, which degrades the
ENQUEUE operation from O(1) to O(|Q|).
If we use three queues instead of only one, we can improve the solution one
step further. Denote these queues as Q2, Q3, and Q5, and initialize them as
Q2 = {2}, Q3 = {3} and Q5 = {5}. Each time, we dequeue as x the smallest
element among the heads of Q2, Q3, and Q5, and update the queues as the
pseudocode below shows. We repeatedly extract the smallest element this way
until we reach the n-th one. The algorithm based on this idea is implemented
as below.
1: function Get-Number(n)
2: if n = 1 then
3: return 1
4: else
5: Q2 ← {2}
6: Q3 ← {3}
7: Q5 ← {5}
8: while n > 1 do
9: x ← min(Head(Q2 ), Head(Q3 ), Head(Q5 ))
10: if x = Head(Q2 ) then
11: Dequeue(Q2 )
12: Enqueue(Q2 , 2x)
13: Enqueue(Q3 , 3x)
14: Enqueue(Q5 , 5x)
15: else if x = Head(Q3 ) then
16: Dequeue(Q3 )
17: Enqueue(Q3 , 3x)
18: Enqueue(Q5 , 5x)
19: else
20: Dequeue(Q5 )
21: Enqueue(Q5 , 5x)
22: n←n−1
23: return x
This algorithm loops n times. Within each loop, it extracts one head element
from the three queues, which takes constant time, then appends one to three
new elements at the end of the queues, which is bounded by constant time too.
So the total time of the algorithm is bound to O(n). The C++ program
translated from this algorithm, shown below, takes less than 1 µs to produce
the 1500th number, 859963392.
#include <queue>
#include <algorithm>
using namespace std;

unsigned long get_number(int n){
    if(n == 1) return 1;
    queue<unsigned long> Q2, Q3, Q5;
    Q2.push(2); Q3.push(3); Q5.push(5);
    unsigned long x = 1;
    while(--n > 0){
        x = min(min(Q2.front(), Q3.front()), Q5.front());
        if(x == Q2.front()){
            Q2.pop();
            Q2.push(x*2); Q3.push(x*3); Q5.push(x*5);
        } else if(x == Q3.front()){
            Q3.pop();
            Q3.push(x*3); Q5.push(x*5);
        } else {
            Q5.pop();
            Q5.push(x*5);
        }
    }
    return x;
}
Invoking ‘last (takeN 1500)’ on the corresponding Haskell program will generate
the correct answer, 859963392.
Part II

Trees
Chapter 1

Binary search tree, the ‘hello world’ data structure

1.1 Introduction
Arrays or lists are typically considered the ‘hello world’ data structures. How-
ever, we’ll see they are not actually particularly easy to implement. In some
procedural settings, arrays are the most elementary data structures, and it is
possible to implement linked lists using arrays (see section 10.3 in [2]). On the
other hand, in some functional settings, linked lists are the elementary building
blocks used to create arrays and other data structures.
Considering these factors, we start with Binary Search Trees (or BST) as the
‘hello world’ data structure using an interesting problem Jon Bentley mentioned
in ‘Programming Pearls’ [2]. The problem is to count the number of times each
word occurs in a large text. One solution in C++ is below:
int main(int, char**){
    map<string, int> dict;
    string s;
    while(cin >> s)
        ++dict[s];
    map<string, int>::iterator it = dict.begin();
    for(; it != dict.end(); ++it)
        cout << it->first << ": " << it->second << "\n";
}
We can run it over a text file through UNIX shell pipes to produce the result.
The quick answer reflects the power of BSTs. We'll introduce how to implement
BSTs in this section and show how to balance them in a later section.
Before we dive into BSTs, let's first introduce the more general binary tree.
Binary trees are recursively defined; BSTs are just one type of binary tree.
A binary tree is usually defined in the following way: a binary tree is either
an empty node, or a node containing a key together with a left and a right
child, each of which is again a binary tree. Figure 1.1 shows an example
binary tree.

Figure 1.1: An example binary tree.

A binary search tree is a binary tree which, if not empty, additionally satisfies
the following ordering constraints at every node:

• all the values in the left child tree are less than the value of this node;
• the value of this node is less than any values in its right child tree.
Figure 1.2 shows an example of a BST. Comparing it with Figure 1.1, we can
see the difference in how keys are ordered between them.

Figure 1.2: An example binary search tree.
The node first contains a field for the key, which can be augmented with
satellite data. The next two fields contain pointers to the left and right children,
respectively. To make backtracking to ancestors easy, a parent field is sometimes
provided as well.
In this section, we'll ignore the satellite data for the sake of simplifying
the illustrations. Based on this layout, the node of a BST can be defined in a
procedural language such as C++:
template<class T>
struct node{
    node(T x): key(x), left(0), right(0), parent(0){}
    ~node(){
        delete left;
        delete right;
    }
    node* left;
    node* right;
    node* parent; // optional, it's helpful for succ and pred
    T key;
};
Figure 1.4: Binary search tree node layout on top of the linked list. ‘left’
and ‘right’ are either empty or BST nodes composed in the same way.
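For the Python sketches later in this chapter, an equivalent node layout can be
written as below (a helper of ours, mirroring the C++ struct):

class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.parent = None    # optional, helpful for succ and pred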
1.3 Insertion

To insert a key k (sometimes along with a value in practice) to a BST T, we
can use the following algorithm:

• if the tree is empty, construct a leaf node with key = k;
• if k is less than the key of the root node, insert it in the left child;
• if k is greater than the key of the root, insert it in the right child.

The exception to the above is when k is equal to the key of the root node,
meaning it already exists in the BST, and we can either overwrite the data, or
just do nothing. To simplify things, this case has been skipped in this section.
This algorithm is described recursively. Its simplicity is why we consider
the BST structure the ‘hello world’ data structure. Formally, the algorithm can
be represented with a recursive mathematical function:
insert(T, k) =
  node(ϕ, k, ϕ) : T = ϕ
  node(insert(Tl, k), k′, Tr) : k < k′    (1.1)
  node(Tl, k′, insert(Tr, k)) : otherwise

Where Tl is the left child, Tr is the right child, and k′ is the key when T
isn't empty.
The node function creates a new node given the left subtree, a key and a
right subtree as parameters. ϕ means NIL or empty.
Translating the above functions directly to Haskell yields the following pro-
gram:
insert Empty k = Node Empty k Empty
insert (Node l x r) k | k < x = Node (insert l k) x r
| otherwise = Node l x (insert r k)
This program utilizes the pattern matching features provided by the language.
However, even in functional settings without this feature (e.g. Scheme/Lisp),
the program is still expressive:
(define (insert tree x)
(cond ((null? tree) (list '() x '()))
((< x (key tree))
(make-tree (insert (left tree) x)
(key tree)
(right tree)))
((> x (key tree))
(make-tree (left tree)
(key tree)
(insert (right tree) x)))))
This algorithm can be expressed imperatively using iteration, completely
free of recursion:
1: function Insert(T, k)
2:   root ← T
3:   x ← Create-Leaf(k)
4:   parent ← NIL
5:   while T ≠ NIL do
6:     parent ← T
7:     if k < Key(T) then
8:       T ← Left(T)
9:     else
10:      T ← Right(T)
11:  Parent(x) ← parent
12:  if parent = NIL then ▷ tree T is empty
13:    return x
14:  else if k < Key(parent) then
15:    Left(parent) ← x
16:  else
17:    Right(parent) ← x
18:  return root
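As an illustration, this pseudocode maps almost line by line to Python (a
sketch of ours, using the Node layout shown earlier):

def insert(t, key):
    root, parent = t, None
    x = Node(key)                 # create the leaf to be inserted
    while t is not None:          # descend to the insertion point
        parent = t
        t = t.left if key < t.key else t.right
    x.parent = parent
    if parent is None:            # the tree was empty
        return x
    if key < parent.key:
        parent.left = x
    else:
        parent.right = x
    return root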
1.4 Traversing
Traversing means visiting every element one-by-one in a BST. There are 3 ways
to traverse a binary tree: a pre-order tree walk, an in-order tree walk and a
post-order tree walk. The names of these traversal methods highlight the order
in which we visit the root of a BST.
• pre-order traversal: visit the key, then the left child, finally the right child;
• in-order traversal: visit the left child, then the key, finally the right child;
• post-order traversal: visit the left child, then the right child, finally the
key.
The in-order walk of a BST outputs the elements in increasing order. The
definition of a BST ensures this interesting property, while the proof of this fact
is left as an exercise to the reader.
The in-order tree walk algorithm can be described as:
• traverse the left child by in-order walk, then access the key, finally traverse
the right child by in-order walk.
Formally, the walk that creates a transformed tree can be defined as a map
function:

map(f, T) =
  ϕ : T = ϕ    (1.2)
  node(Tl′, k′, Tr′) : otherwise

Where

  Tl′ = map(f, Tl)
  Tr′ = map(f, Tr)
  k′ = f(k)

And Tl, Tr and k are the children and key when the tree isn't empty.
If we only need to access the keys without creating the transformed tree, we
can realize this algorithm in a procedural way, like the below C++ program.
template<class T, class F>
void in_order_walk(node<T>* t, F f){
    if(t){
        in_order_walk(t->left, f);
        f(t->key);
        in_order_walk(t->right, f);
    }
}
Walking the tree in-order and collecting the keys converts the tree to a sorted
list:

toList(T) =
  ϕ : T = ϕ    (1.3)
  toList(Tl) ∪ {k} ∪ toList(Tr) : otherwise
For the readers who are not familiar with folding from left, this function can
also be defined recursively as the following:

fromList(X) =
  ϕ : X = ϕ
  insert(fromList({x2, x3, ..., xn}), x1) : otherwise

We'll use folding intensively, as well as function composition and partial
evaluation (the latter is also known as the ‘Curried form’, to memorialize the
mathematician and logician Haskell Curry); please refer to the appendix of this
book or [6], [7] and [8] for more information.
Exercise 1.1
• Given the in-order traverse result and pre-order traverse result, can you re-
construct the tree from these result and figure out the post-order traversing
result?
– Pre-order result: 1, 2, 4, 3, 5, 6;
– In-order result: 4, 2, 1, 5, 3, 6;
– Post-order result: ?
• Prove that the in-order walk outputs the elements stored in a binary search
tree in increasing order.
• Can you analyze the performance of tree sort with big-O notation?
1.5.1 Looking up

According to the definition of the binary search tree, searching for a key can be
realized as follows: if the tree is empty, the key doesn't exist; if the key equals
the key of the root, we are done; if it is less than the root's key, we recursively
look it up in the left child, otherwise in the right child.
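A sketch of this lookup in Python (an iterative version of ours, using the Node
layout above):

def lookup(t, key):
    # descend left or right by comparing keys, as the BST ordering dictates
    while t is not None and t.key != key:
        t = t.left if key < t.key else t.right
    return t    # the node holding key, or None if it doesn't exist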
When finding the successor of an element x, which is the smallest element y
satisfying y > x, there are two cases. If the node with value x has a non-NIL
right child, the minimum element of the right child is the answer. For example,
in Figure 1.5, to find the successor of 8, we search its right sub-tree for the
minimum element, which yields 9 as the result. If node x doesn't have a right
child, we need to back-track to find the closest ancestor whose left child is also
an ancestor of x. In Figure 1.5, since 2 doesn't have a right sub-tree, we go
back to its parent 1. However, node 1 doesn't have a left child, so we go back
again and reach node 3: the left child of 3 is also an ancestor of 2, thus 3 is
the successor of node 2.
Figure 1.5: The successor of 8 is the minimum element of its right sub-tree,
9. To find the successor of 2, we go up to its parent 1, but 1 doesn't have a
left child, so we go up again and find 3. Because 3's left child is also an
ancestor of 2, 3 is the result.
1: function Succ(x)
2:   if Right(x) ≠ NIL then
3:     return Min(Right(x))
4:   else
5:     p ← Parent(x)
6:     while p ≠ NIL and x = Right(p) do
7:       x ← p
8:       p ← Parent(p)
9:     return p

Finding the predecessor is symmetric: if x has a non-NIL left child, the
predecessor is the maximum element of that child; otherwise we back-track while
x is the left child of its parent.

1: function Pred(x)
2:   if Left(x) ≠ NIL then
3:     return Max(Left(x))
4:   else
5:     p ← Parent(x)
6:     while p ≠ NIL and x = Left(p) do
7:       x ← p
8:       p ← Parent(p)
9:     return p
Below are the Python programs based on these algorithms. They are changed
a bit in the while loop conditions.

def succ(x):
    if x.right is not None: return tree_min(x.right)
    p = x.parent
    while p is not None and p.left != x:
        x = p
        p = p.parent
    return p

def pred(x):
    if x.left is not None: return tree_max(x.left)
    p = x.parent
    while p is not None and p.right != x:
        x = p
        p = p.parent
    return p
Exercise 1.2
• Can you figure out how to iterate a tree as a generic container by using
Pred/Succ? What’s the performance of such traversing process in terms
of big-O?
• A reader discussed traversing all elements inside a range [a, b]. In C++,
the algorithm looks like the below code:

for_each (m.lower_bound(12), m.upper_bound(26), f);

Can you provide a purely functional solution for this problem?
1.6 Deletion

Deletion is another ‘imperative only’ topic for the binary search tree. This is
because deletion mutates the tree, while in purely functional settings we don't
modify the tree after building it in most applications.
However, one method of deleting an element from a binary search tree in a
purely functional way is shown in this section. It actually reconstructs the tree
rather than modifying it.
Deletion is the most complex operation for the binary search tree. This is
because we must keep the BST property: for any node, all keys in the left
sub-tree are less than the key of this node, which is in turn less than any key
in the right sub-tree. Deleting a node can break this property.
In this section, different from the algorithm described in [2], a simpler one
from the SGI STL implementation is used [4].
To delete a node x from a tree:
• if x is a leaf or has only one non-NIL child, splice x out directly;
• otherwise (x has two children), use the minimum element of its right sub-tree
to replace x, and splice the original minimum element out.
The simplicity comes from the fact that the minimum element is stored in a
node of the right sub-tree which has no non-NIL left child. It thus ends up in
the trivial case: the node can be directly spliced out from the tree.
Figure 1.6, 1.7, and 1.8 illustrate these different cases when deleting a node
from the tree.
Figure 1.6: Delete a leaf node (both children are NIL).

Figure 1.7: Delete a node which has only one non-NIL child.

Figure 1.8: Delete a node with two non-NIL children: replace its key with
min(R), then recursively delete min(R) from R.
Based on this idea, the deletion can be defined as the below function.

delete(T, x) =
  ϕ : T = ϕ
  node(delete(Tl, x), k, Tr) : x < k
  node(Tl, k, delete(Tr, x)) : x > k    (1.9)
  Tr : x = k ∧ Tl = ϕ
  Tl : x = k ∧ Tr = ϕ
  node(Tl, y, delete(Tr, y)) : otherwise

Where

  Tl = left(T), Tr = right(T), k = key(T), y = min(Tr)
Translating the function to Haskell yields the below program.
delete Empty _ = Empty
delete (Node l k r) x | x < k = (Node (delete l x) k r)
| x > k = (Node l k (delete r x))
-- x == k
| isEmpty l = r
| isEmpty r = l
| otherwise = (Node l k' (delete r k'))
where k' = min r
Function isEmpty tests whether a tree is empty (ϕ). Note that the algorithm
first performs a search to locate the node where the element needs to be
deleted, and after that it executes the deletion. This algorithm takes O(h) time,
where h is the height of the tree.
It's also possible to pass the node rather than the element to the algorithm
for deletion; then the search is no longer needed.
The imperative algorithm is more complex because it needs to set the parent
pointers properly. The function returns the root of the resulting tree.
1: function Delete(T, x)
2:   r ← T
3:   x′ ← x ▷ save x
4:   p ← Parent(x)
5:   if Left(x) = NIL then
6:     x ← Right(x)
7:   else if Right(x) = NIL then
8:     x ← Left(x)
9:   else ▷ both children are non-NIL
10:    y ← Min(Right(x))
11:    Key(x) ← Key(y)
12:    copy other satellite data from y to x
13:    if Parent(y) ≠ x then ▷ y hasn't a left sub-tree
14:      Left(Parent(y)) ← Right(y)
15:    else ▷ y is the root of the right child of x
16:      Right(x) ← Right(y)
17:    if Right(y) ≠ NIL then
18:      Parent(Right(y)) ← Parent(y)
19:    Remove y
20:    return r
21:  if x ≠ NIL then
22:    Parent(x) ← p
23:  if p = NIL then ▷ we are removing the root of the tree
24:    r ← x
25:  else
26:    if Left(p) = x′ then
27:      Left(p) ← x
28:    else
29:      Right(p) ← x
30:  Remove x′
31:  return r
Here we assume the node to be deleted is not empty (otherwise we simply
return the original tree). The algorithm first records the root of the tree and
copies pointers to x and its parent.
If either of the children is empty, the algorithm just splices x out. If x has
two non-NIL children, we first locate the minimum y of its right child, replace
the key of x with y's, copy the satellite data as well, and then splice y out.
Note that there is a special case where y is the root of x's right sub-tree.
Finally we need to reset the stored parent if the original x had at most one
non-NIL child. If the parent pointer we copied before is empty, it means that
we are deleting the root node, so we need to return the new root. After the
parent is set properly, we finally remove the old x from memory.
The corresponding Python program for the deletion algorithm is given below.
Because Python provides GC, we needn't explicitly remove the node from
memory.
def tree_delete(t, x):
if x is None:
return t
[root, old_x, parent] = [t, x, x.parent]
if x.left is None:
x = x.right
elif x.right is None:
x = x.left
else:
y = tree_min(x.right)
x.key = y.key
if y.parent != x:
y.parent.left = y.right
else:
x.right = y.right
if y.right is not None:
y.right.parent = y.parent
return root
if x is not None:
x.parent = parent
if parent is None:
root = x
else:
if parent.left == old_x:
parent.left = x
else:
parent.right = x
return root
Exercise 1.3
• There is a symmetric solution for deleting a node which has two non-NIL
children: replace the element by splicing the maximum element out of the
left sub-tree. Write a program to implement this solution.
[7] http://en.wikipedia.org/wiki/Function_composition
[8] http://en.wikipedia.org/wiki/Partial_application
[9] Miran Lipovaca. “Learn You a Haskell for Great Good! A Beginner’s
Guide”. the last chapter. No Starch Press; 1 edition April 2011, 400 pp.
ISBN: 978-1-59327-283-8
Chapter 2

The evolution of insertion sort
2.1 Introduction

In the previous chapter, we introduced the ‘hello world’ data structure, the
binary search tree. In this chapter, we explain insertion sort, which can be
thought of as the ‘hello world’ sorting algorithm. It's straightforward, but its
performance is not as good as that of some divide and conquer sorting
approaches, such as quick sort and merge sort. Thus insertion sort is seldom
used as the generic sorting utility in modern software libraries. We'll analyze
why it is slow, and try to improve it bit by bit until we reach the best bound
of comparison-based sorting algorithms, O(n lg n), by evolving it into tree sort.
Finally we'll show the connection between the ‘hello world’ data structure and
the ‘hello world’ sorting algorithm.
The idea of insertion sort can be vividly illustrated by a real-life poker
game [2]. Suppose the cards are shuffled, and a player starts taking cards one
by one. At any time, all cards in the player's hand are well sorted. When the
player gets a new card, he inserts it in the proper position according to the
order of points. Figure 2.1 shows this insertion example.
Based on this idea, the algorithm of insertion sort can be directly given as
the following.
function Sort(A)
X←ϕ
for each x ∈ A do
Insert(X, x)
return X
It’s easy to express this process with folding, which we mentioned in the
chapter of binary search tree.
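For instance, a sketch in Python using reduce (ours; the insert helper stands
for the ordered insertion developed below):

from functools import reduce

def insert(xs, x):
    # insert x into the already sorted list xs, keeping it sorted
    i = 0
    while i < len(xs) and xs[i] < x:
        i = i + 1
    xs.insert(i, x)
    return xs

def isort(xs):
    # folding: start from an empty result and insert the elements one by one
    return reduce(insert, xs, [])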
Note that in the above algorithm, we store the sorted result in X, so this
isn't in-place sorting. It's easy to change it into an in-place algorithm.
Denote the sequence as A = {a1, a2, ..., an}.

function Sort(A)
  for i ← 2 to |A| do
    insert ai into the sorted sequence {a′1, a′2, ..., a′i−1}

At any time, when we process the i-th element, all elements before i have
already been sorted. We continuously insert the current element until all the
unsorted data is consumed. This idea is illustrated in figure 2.2.

Figure 2.2: The left part is sorted data; elements are continuously inserted
into the sorted part.
2.2 Insertion

We haven't answered the question of how to realize insertion, however. It's a
puzzle how humans locate the proper position so quickly.
For a computer, the obvious option is to perform a scan. We can either scan
from left to right or vice versa. However, if the sequence is stored in a plain
array, it's necessary to scan from right to left.
function Sort(A)
  for i ← 2 to |A| do ▷ insert A[i] into the sorted sequence A[1...i−1]
    x ← A[i]
    j ← i − 1
    while j > 0 ∧ x < A[j] do
      A[j + 1] ← A[j]
      j ← j − 1
    A[j + 1] ← x
One may think scanning from left to right is more natural. However, it isn't
as effective as the above algorithm for a plain array. The reason is that it's
expensive to insert an element at an arbitrary position in an array. As an array
stores elements continuously, if we want to insert a new element x at position
i, we must shift all elements after i, including i + 1, i + 2, ..., one position to
the right. Only after that is the cell at position i free, and we can put x in
it. This is illustrated in figure 2.3.

Figure 2.3: Inserting x at position i requires shifting the elements A[i],
A[i+1], ..., A[n] one cell to the right.

If the length of the array is n, this indicates we need to examine the first
i elements, then perform n − i + 1 moves, and then insert x into the i-th cell.
So insertion from left to right needs to traverse the whole array anyway, while
if we scan from right to left, we examine at most i elements and perform the
same amount of moves.
Translating the above algorithm to Python yields the following code.

def isort(xs):
    n = len(xs)
    for i in range(1, n):
        x = xs[i]
        j = i - 1
        while j >= 0 and x < xs[j]:
            xs[j+1] = xs[j]
            j = j - 1
        xs[j+1] = x
Some other equivalent programs can be found, for instance the following
ANSI C program; however, this version isn't as efficient as the pseudocode.

void isort(Key* xs, int n){
    int i, j;
    for(i = 1; i < n; ++i)
        for(j = i-1; j >= 0 && xs[j+1] < xs[j]; --j)
            swap(xs, j, j+1);
}
This is because the swapping function, which exchanges two elements,
typically uses a temporary variable like the following:

void swap(Key* xs, int i, int j){
    Key temp = xs[i];
    xs[i] = xs[j];
    xs[j] = temp;
}

So the ANSI C program presented above performs 3m assignments, where m
is the number of inner loops, while the pseudocode as well as the Python
program use a shift operation instead of swapping, performing only m + 2
assignments.
We can also provide an Insert() function explicitly and call it from the
general insertion sort algorithm in the previous section. We skip the detailed
realization here and leave it as an exercise.
All the insertion algorithms are bound to O(n), where n is the length of the
sequence, no matter whether they scan from left or from right. Thus the overall
performance of insertion sort is quadratic, O(n²).
Exercise 2.1
• Provide an explicit insertion function and call it from the general insertion
sort algorithm. Please realize it in both a procedural and a functional way.
2.3 Improvement 1
Let’s go back to the question, that why human being can find the proper position
for insertion so quickly. We have shown a solution based on scan. Note the fact
that at any time, all cards at hands have been well sorted, another possible
solution is to use binary search to find that location.
We’ll explain the search algorithms in other dedicated chapter. Binary search
is just briefly introduced for illustration purpose here.
The algorithm will be changed to call a binary search procedure.
function Sort(A)
for i ← 2 to |A| do
x ← A[i]
p ← Binary-Search(A[1...i − 1], x)
for j ← i down to p do
A[j] ← A[j − 1]
A[p] ← x
Instead of scanning the elements one by one, binary search utilizes the
information that all elements in the array slice {A1, ..., Ai−1} are sorted.
Let's assume the order is monotonically increasing. To find a position j that
satisfies Aj−1 ≤ x ≤ Aj, we can first examine the middle element, for example
A⌊i/2⌋. If x is less than it, we next recursively perform binary search in the
first half of the sequence; otherwise, we only need to search the second half.
Each time we halve the number of elements to be examined, so this search
process takes O(lg n) time to locate the insertion position.
function Binary-Search(A, x)
l←1
u ← 1 + |A|
while l < u do
m ← ⌊(l + u)/2⌋
if A[m] = x then
return m ▷ Find a duplicated element
else if A[m] < x then
l ←m+1
else
u←m
return l
The improved insertion sort algorithm is still bound to O(n²). Compared to
the previous section, where we used O(n²) comparisons and O(n²) moves, with
binary search we use only O(n lg n) comparisons, but still O(n²) moves.
The Python program for this algorithm is given below.
def isort(xs):
n = len(xs)
for i in range(1, n):
x = xs[i]
p = binary_search(xs[:i], x)
for j in range(i, p, -1):
xs[j] = xs[j-1]
xs[p] = x
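The binary_search helper is not listed above; a sketch consistent with the
pseudocode (0-based indices, ours) might be:

def binary_search(xs, x):
    l, u = 0, len(xs)            # search within [l, u)
    while l < u:
        m = (l + u) // 2
        if xs[m] == x:
            return m             # found a duplicated element
        elif xs[m] < x:
            l = m + 1
        else:
            u = m
    return l                     # the position where x should be inserted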
Exercise 2.2
Write the binary search in a recursive manner. You needn't use a purely
functional programming language.
2.4 Improvement 2

Although we improved the number of comparisons to O(n lg n) in the previous
section, the number of moves is still O(n²). The reason why the movement takes
so long is that the sequence is stored in a plain array. An array is by nature
a continuous-layout data structure, so the insertion operation is expensive.
This hints that we can use a linked-list setting to represent the sequence; it
improves the insertion operation from O(n) to constant O(1) time once the
position is located:
insert(A, x) =
  {x} : A = ϕ
  {x} ∪ A : x < a1    (2.3)
  {a1} ∪ insert({a2, a3, ..., an}, x) : otherwise
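Only a fragment of the book's C listing for the linked-list version appears in
this extraction; as an illustration, the same idea can be sketched in Python
(the Node class with key/next fields is our assumption):

class Node:
    def __init__(self, key, next=None):
        self.key, self.next = key, next

def insert(head, x):
    # insert x into the sorted linked list starting at head, keeping order
    if head is None or x < head.key:
        return Node(x, head)          # x becomes the new head
    p = head
    while p.next is not None and p.next.key < x:
        p = p.next
    p.next = Node(x, p.next)          # splicing in is O(1) once located
    return head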
2.5 Final improvement by binary search tree

We must use binary search: this is the only way to improve the comparison
time to O(lg n). On the other hand, we must change the data structure, because
we can't achieve constant-time insertion at a position with a plain array.
This reminds us of our ‘hello world’ data structure, the binary search tree.
It naturally supports binary search by its definition, and at the same time we
can insert a new node into a binary search tree in O(1) constant time once we
have found the location.
So the algorithm changes to this.
So the algorithm changes to this.
function Sort(A)
T ←ϕ
for each x ∈ A do
T ← Insert-Tree(T, x)
return To-List(T )
Where Insert-Tree() and To-List() are described in previous chapter
about binary search tree.
As we have analyzed for the binary search tree, the performance of tree sort
is bound to O(n lg n), which is the lower bound of comparison-based sorting [3].
[1] http://en.wikipedia.org/wiki/Bubble_sort
[2] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford
Stein. “Introduction to Algorithms, Second Edition”. ISBN:0262032937.
The MIT Press. 2001
[3] Donald E. Knuth. “The Art of Computer Programming, Volume 3: Sorting
and Searching (2nd Edition)”. Addison-Wesley Professional; 2 edition (May
4, 1998) ISBN-10: 0201896850 ISBN-13: 978-0201896855
Chapter 3

Red-black tree, not so complex as it was thought
3.1 Introduction

3.1.1 Exploit the binary search tree

In the previous chapter, we showed the power of using the binary search tree
as a dictionary to count the occurrence of every word in a book.
One may come up with the idea of feeding a yellow page book (a telephone
number contact list) to a binary search tree, and using it to look up the phone
number of a contact.
Modifying the word occurrence counting program a bit yields the following
code.
int main(int, char**){
    ifstream f("yp.txt");
    map<string, string> dict;
    string name, phone;
    while(f >> name && f >> phone)
        dict[name] = phone;
    for(;;){
        cout << "\nname: ";
        cin >> name;
        if(dict.find(name) == dict.end())
            cout << "not found";
        else
            cout << "phone: " << dict[name];
    }
}
This program works well. However, if you replace the STL map with the
binary search tree introduced in the previous chapter, the performance will be
bad, especially when you search names such as Zara, Zed, or Zulu.
This is because the content of a yellow page book is typically listed in
lexicographic order, which means the name list is in increasing order. If we
try to insert a sequence of numbers 1, 2, 3, ..., n into a binary search tree,
we will get a tree like the one in Figure 3.1.

Figure 3.1: A binary search tree degenerates to a linked list when fed an
ordered sequence.
Exercise 3.1

• For a very big yellow page list, one may want to speed up the dictionary
building process with two concurrent tasks (threads or processes): one task
reads the name-phone pairs from the head of the list, while the other reads
from the tail. The building terminates when the two tasks meet at the middle
of the list. What will the binary search tree look like after building? What
if you split the list into more than two parts and use more tasks?
• Can you find any more cases that exploit a binary search tree? Please
consider the unbalanced trees shown in figure 3.2.
Figure 3.2: Some unbalanced trees built from special input sequences.
In a later chapter about binary heaps, we'll show another interesting tree,
the splay tree, which can gradually adjust the tree to make it more and more
balanced.

Tree rotation is a set of operations that can transform the tree structure
without changing the in-order traverse result. It is based on the fact that for
a specified ordering, there are multiple binary search trees corresponding to
it. Figure 3.3 shows the tree rotation: for a binary search tree on the left
side, left rotation transforms it to the tree on the right, and right rotation
does the inverse transformation.

Figure 3.3: Tree rotation. ‘Rotate-left’ transforms the tree from the left side
to the right side, and ‘rotate-right’ does the inverse transformation.
Although tree rotation can be realized in a procedural way, there exists a
simple functional definition using pattern matching. Denote the non-empty tree
as T = (Tl, k, Tr), where k is the key and Tl, Tr are the left and right
sub-trees.

rotate-l(T) =
  ((a, X, b), Y, c) : T = (a, X, (b, Y, c))    (3.1)
  T : otherwise

rotate-r(T) =
  (a, X, (b, Y, c)) : T = ((a, X, b), Y, c)    (3.2)
  T : otherwise
To perform tree rotation imperatively, we need set all fields of the node as
the following.
1: function Left-Rotate(T, x)
2: p ← Parent(x)
3: y ← Right(x) ▷ Assume y ≠ NIL
4: a ← Left(x)
5: b ← Left(y)
6: c ← Right(y)
7: Replace(x, y)
8: Set-Children(x, a, b)
9: Set-Children(y, x, c)
10: if p = NIL then
11: T ←y
12: return T
1: function Set-Children(x, L, R)
2:   Set-Left(x, L)
3:   Set-Right(x, R)

4: function Set-Left(x, y)
5:   Left(x) ← y
6:   if y ≠ NIL then Parent(y) ← x

7: function Set-Right(x, y)
8:   Right(x) ← y
9:   if y ≠ NIL then Parent(y) ← x
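Over the functional tuple representation (l, k, r) used above, the two
rotations can be sketched compactly in Python (ours; None stands for the empty
tree ϕ):

def rotate_left(t):
    # (a, x, (b, y, c)) ==> ((a, x, b), y, c); otherwise keep t unchanged
    if t is not None and t[2] is not None:
        a, x, (b, y, c) = t
        return ((a, x, b), y, c)
    return t

def rotate_right(t):
    # ((a, x, b), y, c) ==> (a, x, (b, y, c)); otherwise keep t unchanged
    if t is not None and t[0] is not None:
        (a, x, b), y, c = t
        return (a, x, (b, y, c))
    return t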
Most of the content in this chapter is based on Chris Okasaki’s work in [2].
3.2 Definition of red-black tree

A red-black tree is a binary search tree in which every node is colored either
red or black, and which satisfies the following five properties:

1. Every node is either red or black.
2. The root is black.
3. Every leaf (NIL) is black.
4. If a node is red, then both of its children are black.
5. For every node, all paths from it to descendant leaves contain the same
number of black nodes.

(The red-black tree is one of the equivalent forms of the 2-3-4 tree; see the
B-tree chapter about 2-3-4 trees. That is to say, for any 2-3-4 tree, there is
at least one red-black tree with the same data order.)

Figure 3.4: An example red-black tree, with the NIL leaves shown.

As all NIL nodes are black, people often omit them when drawing a red-black
tree. Figure 3.5 gives the corresponding tree that hides all the NIL nodes.
Figure 3.5: The red-black tree with all NIL nodes hidden.
All read operations, such as search and finding the min/max, are the same as
in the ordinary binary search tree; the insertion and deletion operations are
special for the red-black tree.
Many implementations of set or map containers are based on red-black trees.
One example is the C++ Standard Template Library (STL) [4].
For the data layout, the only change is that color information needs to be
augmented to the binary search tree node. This can be represented as a data
field, like the below C++ example.
enum Color {Red, Black};
Exercise 3.2
• Can you prove that a red-black tree with n nodes has height at most
2 lg(n + 1)?
3.3 Insertion

The tree may become unbalanced if a new node is inserted with the method we
used for the binary search tree. In order to maintain the red-black properties,
we need to do some fixing after insertion.
When inserting a new key, we can always insert it as a red node. As long as
the newly inserted node isn't the root of the tree, we keep all the properties
except the 4th one, as the insertion may bring two adjacent red nodes.
There are both functional and procedural fixing methods. One is intuitive but
has some overhead; the other is a bit complex but has higher performance. In
this chapter, we focus on the functional approach to show how easily a
red-black tree insertion algorithm can be realized. The traditional procedural
method will be given for comparison purposes.
As described by Chris Okasaki, there are 4 cases in total which violate
property 4. All of them have two adjacent red nodes; however, they have a
uniform structure after fixing [2], as shown in figure 3.6.
Note that this transformation moves the redness one level up. During the
bottom-up recursive fixing, the last step may make the root node red. According
to property 2, the root is always black, thus we finally need to revert the
root color to black.
Observing that the 4 cases and the fixed result have strong patterns, the
fixing function can be defined using the method we mentioned for tree rotation.
Denote the color of a node as C; it has two values: black B and red R. A
non-empty tree can be represented as T = (C, Tl, k, Tr).
balance(T) =
  (R, (B, A, x, B), y, (B, C, z, D)) : match(T)    (3.3)
  T : otherwise

where function match() tests if a tree matches one of the 4 possible patterns
as the following:

match(T) =   T = (B, (R, (R, A, x, B), y, C), z, D)
           ∨ T = (B, (R, A, x, (R, B, y, C)), z, D)
           ∨ T = (B, A, x, (R, B, y, (R, C, z, D)))
           ∨ T = (B, A, x, (R, (R, B, y, C), z, D))
With function balance(T) defined, we can modify the binary search tree
insertion function to make it work for the red-black tree:

insert(T, k) = makeBlack(ins(T, k))    (3.4)

where

ins(T, k) =
  (R, ϕ, k, ϕ) : T = ϕ
  balance((ins(Tl, k), k′, Tr)) : k < k′    (3.5)
  balance((Tl, k′, ins(Tr, k))) : otherwise
If the tree is empty, then a new red node with k as the key is created;
otherwise, denoting the children and the key as Tl, Tr, and k′, we compare k
and k′ and recursively insert k into a child. Function balance is called after
that, and the root is finally re-colored black.

Figure 3.6: The 4 cases of two adjacent red nodes, and the uniform structure
they are all transformed into.
Summarizing the above functions and using the pattern matching features of
the language, we come to the following Haskell program.

insert t x = makeBlack $ ins t where
    ins Empty = Node R Empty x Empty
    ins (Node color l k r)
        | x < k = balance color (ins l) k r
        | otherwise = balance color l k (ins r)
    makeBlack (Node _ l k r) = Node B l k r

Note that the balance function is changed a bit from the original definition:
instead of passing the tree, we pass the color, the left child, the key and the
right child to it. This saves a pair of ‘boxing’ and ‘un-boxing’ operations.
This program doesn't handle the case of duplicated keys: we can either
overwrite the key or drop the duplicated one. Another option is to augment the
data with a linked list ([2], pp. 269).
Figure 3.7 shows two red-black trees built by feeding the list 11, 2, 14, 1,
7, 15, 5, 8, 4 and the list 1, 2, ..., 8. The tree stays well balanced even if
we input an ordered list.

Figure 3.7: Red-black trees built from the two lists; the second input is
ordered, yet the result is balanced.
The insertion algorithm takes O(lg n) time to insert a key into a red-black
tree of n nodes.
3.4 Deletion

Recall the deletion section of the binary search tree. Deletion is ‘imperative
only’ for the red-black tree as well: in many cases, the tree is built just
once, and then lookups are performed frequently [3].
The purpose of this section is to show that red-black tree deletion is
possible in purely functional settings, although it actually rebuilds the tree,
because trees are read-only in terms of purely functional data structures. In
the real world, it's up to the user (i.e. the programmer) to adopt the proper
solution. One option is to mark the node to be deleted with a flag, and later
rebuild the tree when the number of deleted nodes exceeds 50%.
Deletion is more complex than insertion in both functional and imperative
settings, as there are more cases to fix. Deletion may also violate the
red-black tree properties, so we need to fix the tree after the normal deletion
as described for the binary search tree.
The problem only happens when we delete a black node, because this violates
the last property of the red-black tree: the number of black nodes in the paths
through the deleted node decreases, so not all paths contain the same number of
black nodes any more.
When deleting a black node, we can resume the last red-black property by
introducing a ‘doubly-black’ concept ([2], pp. 290). It means that although the
node is deleted, its blackness is kept by storing it in the parent node. If the
parent node is red, it turns black; however, if it's already black, it turns
‘doubly-black’.
In order to express the ‘doubly-black’ node, the definition needs some
modification accordingly.

data Color = R | B | BB -- BB: doubly black, for deletion
data RBTree a = Empty | BBEmpty -- doubly black empty
              | Node Color (RBTree a) a (RBTree a)

When deleting a node, we first perform the same deletion as in the binary
search tree. After that, if the node spliced out is black, we need to fix the
tree to keep the red-black properties. The delete function is defined as the
following.
delete(T, k) = blackenRoot(del(T, k))    (3.7)

where

del(T, k) =
  ϕ : T = ϕ
  fixBlack²((C, del(Tl, k), k′, Tr)) : k < k′
  fixBlack²((C, Tl, k′, del(Tr, k))) : k > k′    (3.8)
  mkBlk(Tr) if C = B, otherwise Tr : k = k′ ∧ Tl = ϕ
  mkBlk(Tl) if C = B, otherwise Tl : k = k′ ∧ Tr = ϕ
  fixBlack²((C, Tl, k′′, del(Tr, k′′))) : otherwise

with k′′ = min(Tr), the minimum element of the right sub-tree.
The real deletion happens inside function del. For the trivial case, where the
tree is empty, the result is ϕ. If the key to be deleted is less than the key
of the current node, we recursively perform deletion on the left sub-tree; if
it is bigger, we recursively delete from the right sub-tree. Because deletion
may bring doubly-blackness, we need to fix it afterwards.
If the key to be deleted is equal to the key of the current node, we need to
splice it out. If one of its children is empty, we just replace the node by
the other one and preserve the blackness of this node; otherwise we cut and
paste the minimum element k′′ = min(Tr) from the right sub-tree.
Function delete just forces the result tree of del to have a black root. This
is realized by function blackenRoot.
blackenRoot(T) =
  ϕ : T = ϕ    (3.9)
  (B, Tl, k, Tr) : otherwise
The blackenRoot(T) function is almost the same as the makeBlack(T) function
defined for insertion, except for the case of the empty tree. This case is only
needed for deletion, because insertion can't result in an empty tree, while
deletion may.
Function mkBlk is defined to preserve the blackness of a node. If the node to
be spliced isn't black, this function isn't applied; otherwise it turns a red
node black and turns a black node doubly-black. This function also marks an
empty tree ϕ as the doubly-black empty Φ.
mkBlk(T) =
  Φ : T = ϕ
  (B, Tl, k, Tr) : C = R    (3.10)
  (B², Tl, k, Tr) : C = B
  T : otherwise

where B² denotes the doubly-black color.
Summarizing the above functions yields the following Haskell program.
delete t x = blackenRoot(del t x) where
del Empty _ = Empty
del (Node color l k r) x
| x < k = fixDB color (del l x) k r
| x > k = fixDB color l k (del r x)
-- x == k, delete this node
| isEmpty l = if color==B then makeBlack r else r
| isEmpty r = if color==B then makeBlack l else l
| otherwise = fixDB color l k' (del r k') where k'= min r
The final attack on the red-black tree deletion algorithm is to realize the
fixBlack² function. The purpose of this function is to eliminate the
‘doubly-black’ colored node by rotation and color changing. There are three
cases, and in every case the doubly-black node can be either a normal node or
the doubly-black empty node Φ. Let's examine these cases one by one.
The first case is that the sibling of the doubly-black node is black and has
one red child. In this situation, we can fix the doubly-blackness with one
rotation. Actually there are 4 different sub-cases, and all of them can be
transformed to one uniform pattern, as shown in figure 3.8.

Figure 3.8: Fix the doubly-black by rotation; the sibling of the doubly-black
node is black, and it has one red child.
fixBlack²(T) =
  (C, (B, mkBlk(A), x, B), y, (B, C, z, D)) : p1.1    (3.11)
  (C, (B, A, x, B), y, (B, C, z, mkBlk(D))) : p1.2

where p1.1 and p1.2 each represent 2 patterns as the following:

p1.1: T = (C, A, x, (B, (R, B, y, C), z, D)) ∧ color(A) = B²
    ∨ T = (C, A, x, (B, B, y, (R, C, z, D))) ∧ color(A) = B²

p1.2: T = (C, (B, A, x, (R, B, y, C)), z, D) ∧ color(D) = B²
    ∨ T = (C, (B, (R, A, x, B), y, C), z, D) ∧ color(D) = B²
If the doubly-black node is the doubly-black empty node Φ, it can be changed
back to a normal empty node after the above operation. We can add the
doubly-black empty node handling on top of (3.11):

fixBlack²(T) =
  (C, (B, mkBlk(A), x, B), y, (B, C, z, D)) : p1.1
  (C, (B, ϕ, x, B), y, (B, C, z, D)) : p1.1′    (3.12)
  (C, (B, A, x, B), y, (B, C, z, mkBlk(D))) : p1.2
  (C, (B, A, x, B), y, (B, C, z, ϕ)) : p1.2′

where patterns p1.1′ and p1.2′ are defined as below:

p1.1′: T = (C, Φ, x, (B, (R, B, y, C), z, D))
     ∨ T = (C, Φ, x, (B, B, y, (R, C, z, D)))

p1.2′: T = (C, (B, A, x, (R, B, y, C)), z, Φ)
     ∨ T = (C, (B, (R, A, x, B), y, C), z, Φ)
The remaining cases, where the sibling of the doubly-black node and its two
children are all black, propagate the blackness up (see figure 3.10): the
sibling is changed to red, and mkBlk pushes the blackness to the parent. They
are handled by further alternatives of fixBlack² (the ‘...’ rows stand for the
clauses given previously):

fixBlack²(T) =
  ... : ...
  mkBlk((C, mkBlk(A), x, (R, B, y, C))) : p3.1    (3.14)
  mkBlk((C, (R, A, x, B), y, mkBlk(C))) : p3.2
  ... : ...

As before, the doubly-black empty node Φ needs two extra patterns:

fixBlack²(T) =
  ... : ...
  mkBlk((C, mkBlk(A), x, (R, B, y, C))) : p2.1
  mkBlk((C, ϕ, x, (R, B, y, C))) : p2.1′    (3.15)
  mkBlk((C, (R, A, x, B), y, mkBlk(C))) : p2.2
  mkBlk((C, (R, A, x, B), y, ϕ)) : p2.2′
  ... : ...
Figure 3.10: Propagate the blackness up. (a) The color of x can be either
black or red. (b) If x was red, it becomes black; otherwise it becomes
doubly-black. (c) The color of y can be either black or red. (d) If y was red,
it becomes black; otherwise it becomes doubly-black.
The deletion algorithm takes O(lg n) time to delete a key from a red-black
tree with n nodes.
3.5 Imperative red-black tree algorithm ⋆

1: function Insert(T, k)
2:   root ← T
3:   x ← Create-Leaf(k)
4:   Color(x) ← RED
5:   parent ← NIL
6:   while T ≠ NIL do
7:     parent ← T
8:     if k < Key(T) then
9:       T ← Left(T)
10:    else
11:      T ← Right(T)
12:  Parent(x) ← parent
13:  if parent = NIL then ▷ tree T is empty
14:    return x
15:  else if k < Key(parent) then
16:    Left(parent) ← x
17:  else
18:    Right(parent) ← x
19:  return Insert-Fix(root, x)
The only difference from the binary search tree insertion algorithm is that
we set the color of the new node as red, and perform fixing before return. Below
is the example Python program.
def rb_insert(t, key):
root = t
x = Node(key)
parent = None
while(t):
parent = t
if(key < t.key):
t = t.left
else:
t = t.right
if parent is None: #tree is empty
root = x
elif key < parent.key:
parent.set_left(x)
else:
parent.set_right(x)
return rb_insert_fix(root, x)
There are 3 base cases for fixing, and if we take the left-right symmetry
into consideration, there are 6 cases in total. Among them, two cases can be
merged, because they both have an uncle node in red color: we can toggle the
parent and uncle colors to black and set the grandparent color to red. With
this merging, the fixing algorithm can be realized as the following.
1: function Insert-Fix(T, x)
2: while Parent(x) ≠ NIL ∧ Color(Parent(x)) = RED do
3: if Color(Uncle(x)) = RED then ▷ Case 1, x’s uncle is red
4: Color(Parent(x)) ← BLACK
5: Color(Grand-Parent(x)) ← RED
6: Color(Uncle(x)) ← BLACK
7: x ← Grand-Parent(x)
8: else ▷ x’s uncle is black
9: if Parent(x) = Left(Grand-Parent(x)) then
10: if x = Right(Parent(x)) then ▷ Case 2, x is a right child
11: x ← Parent(x)
12: T ← Left-Rotate(T, x)
▷ Case 3, x is a left child
13:         Color(Parent(x)) ← BLACK
14:         Color(Grand-Parent(x)) ← RED
15:         T ← Right-Rotate(T, Grand-Parent(x))
16:       else ▷ the symmetric cases: the parent is a right child
17:         if x = Left(Parent(x)) then ▷ Case 2 (symmetric)
18:           x ← Parent(x)
19:           T ← Right-Rotate(T, x)
▷ Case 3 (symmetric)
20:         Color(Parent(x)) ← BLACK
21:         Color(Grand-Parent(x)) ← RED
22:         T ← Left-Rotate(T, Grand-Parent(x))
23:   Color(T) ← BLACK
24:   return T
Figure 3.11 shows the results of feeding the same series of keys to the above
Python insertion program. Comparing them with figure 3.7, one can tell the
difference clearly.
Figure 3.11: Red-black trees built by the imperative insertion algorithm from
the same input lists as figure 3.7.
Chapter 4

AVL tree
4.1 Introduction
4.1.1 How to measure the balance of a tree?
Besides the red-black tree, are there any other intuitive self-balancing
binary search trees? In order to measure how balanced a binary search tree is,
one idea is to compare the heights of the right and left sub-trees. If they
differ a lot, the tree isn't well balanced. Let's denote the height difference
between the two children as

δ(T) = |Tr| − |Tl|    (4.1)

where |T| means the height of tree T, and Tl, Tr are its left and right
sub-trees. δ(T) is called the balance factor of the node.

4.2 Definition of AVL tree

An AVL tree is a binary search tree in which, for every node,

|δ(T)| ≤ 1    (4.2)

The absolute value of the balance factor is less than or equal to 1, which
means there are only three valid values: −1, 0 and 1. Figure 4.1 shows an
example AVL tree.

Figure 4.1: An example AVL tree.

Why can the AVL tree keep itself balanced? In other words, can this
definition ensure that the height of the tree is O(lg n), where n is the number
of nodes in the tree? Let's prove this fact.
For an AVL tree of height h, the number of nodes varies. It can have at most
2^h − 1 nodes (a complete binary tree). We are interested in how many nodes it
has at least. Let's denote the minimum number of nodes for an AVL tree of
height h as N(h). For the trivial cases we obviously have:

• N(0) = 0 (the empty tree);
• N(1) = 1 (a tree with a single node).
What's the situation in the common case N(h)? Figure 4.2 shows an AVL tree T
of height h. It contains three parts: the root node and the two sub-trees Tl
and Tr. We have the following fact:

h = max(|Tl|, |Tr|) + 1

We immediately know that one child must have height h − 1. According to the
definition of the AVL tree, we have ||Tl| − |Tr|| ≤ 1, which leads to the fact
that the height of the other sub-tree can't be lower than h − 2. The total
number of nodes of T is the number of nodes in both children plus 1 (for the
root node), so we can claim that

N(h) = N(h − 1) + N(h − 2) + 1    (4.4)

Figure 4.2: An AVL tree of height h. The height of one sub-tree is h − 1, the
other is no less than h − 2.
4.2. DEFINITION OF AVL TREE 83
This recurrence reminds us of the Fibonacci series. In fact, defining N'(h) = N(h) + 1 transforms (4.4) into the Fibonacci recurrence:

N'(h) = N'(h − 1) + N'(h − 2)    (4.5)
Lemma 4.2.1. Let N(h) be the minimum number of nodes for an AVL tree of
height h, and N'(h) = N(h) + 1, then

N'(h) ≥ ϕ^h    (4.6)

where ϕ = (√5 + 1)/2 is the golden ratio.
Proof. For the trivial cases, we have

• h = 0: N'(0) = 1 ≥ ϕ^0 = 1
• h = 1: N'(1) = 2 ≥ ϕ^1 = 1.618...

For the induction case, suppose N'(h) ≥ ϕ^h. Then

N'(h + 1) = N'(h) + N'(h − 1)    {Fibonacci}
          ≥ ϕ^h + ϕ^(h−1)
          = ϕ^(h−1)(ϕ + 1)       {ϕ + 1 = ϕ^2 = (3 + √5)/2}
          = ϕ^(h+1)

From this lemma, n + 1 ≥ N'(h) ≥ ϕ^h, thus h ≤ log_ϕ(n + 1) = O(lg n): the AVL tree is balanced. As a sanity check, the minimum sizes N'(h) = 1, 2, 3, 5, 8, 13, ... are exactly the Fibonacci numbers.
4.3 Insertion
Inserting a new element to the tree may violate the AVL tree property, so
that the absolute value of δ exceeds 1. To resume it, one option is to do tree
rotation according to the different insertion cases. Most implementations are
based on this approach.
Another way is to use a pattern matching method similar to the one Okasaki
used in his red-black tree implementation [2]. Inspired by this idea, it is
possible to provide a simple and intuitive solution.
When inserting a new key to the AVL tree, the balance factor of the root may
change in the range [−1, 1]², and the height may increase by at most one. We
need to recursively use this information to update the δ values in the upper
level nodes. We can define the result of the insertion algorithm as a pair
(T', ∆H), where T' is the new tree and ∆H is the increment of height. Denote
function first(pair) as returning the first element in a pair. We can modify the
binary search tree insertion algorithm as the following to handle AVL tree:

insert(T, k) = first(ins(T, k))    (4.8)

where ins(T, k) returns the new tree together with the height increment:

ins(T, k) = ((ϕ, k, ϕ, 0), 1)                : T = ϕ
            tree(ins(Tl, k), k', (Tr, 0), ∆) : k < k'
            tree((Tl, 0), k', ins(Tr, k), ∆) : otherwise    (4.9)

with

Tl = left(T)
Tr = right(T)
k' = key(T)
∆ = δ(T)
When we insert a new key k to an AVL tree T, if the tree is empty, we create
a leaf with k as the key, set the balance factor to 0, and the height is increased
by one.
If T isn't empty, we need to compare k with the key k'. If k is less than the
key, we recursively insert it to the left child, otherwise we insert it to the right.
² Note that it doesn't mean δ is in range [−1, 1]; the change of δ is in this range.
As the result of the recursive insertion is a pair like (Tl', ∆Hl), we need
to do balance adjustment and update the increment of height. Function tree()
is defined to deal with this task. It takes 4 parameters: (Tl', ∆Hl), k',
(Tr', ∆Hr), and ∆. The result of this function is (T', ∆H), where T' is the new
tree after adjustment, and ∆H is the new increment of height, defined as below.
∆H = |T ′ | − |T | (4.10)
This can be further deduced into four cases:

∆H = |T'| − |T|
   = 1 + max(|Tr'|, |Tl'|) − (1 + max(|Tr|, |Tl|))
   = max(|Tr'|, |Tl'|) − max(|Tr|, |Tl|)
   = ∆Hr     : ∆ ≥ 0 ∧ ∆' ≥ 0
     ∆ + ∆Hr : ∆ ≤ 0 ∧ ∆' ≥ 0
     ∆Hl − ∆ : ∆ ≥ 0 ∧ ∆' ≤ 0
     ∆Hl     : otherwise    (4.11)
The proof of this equation is given in Appendix C.
The next problem is to determine the new balance factor ∆′ before perform-
ing balance adjustment. According to the definition of AVL tree, the balance
factor is the height difference of the right and left sub trees. We have the
following fact.
∆' = |Tr'| − |Tl'|
   = |Tr| + ∆Hr − (|Tl| + ∆Hl)
   = |Tr| − |Tl| + ∆Hr − ∆Hl
   = ∆ + ∆Hr − ∆Hl    (4.12)
With all these changes in height and balance factor, we can define the
tree() function mentioned in (4.9). Before moving into the details of balance
adjustment, let's translate the above equations to example Haskell programs.
First is the insert function.
First is the insert function.
insert::(Ord a)⇒AVLTree a → a → AVLTree a
insert t x = fst $ ins t where
ins Empty = (Br Empty x Empty 0, 1)
ins (Br l k r d)
| x<k = tree (ins l) k (r, 0) d
| x == k = (Br l k r d, 0)
| otherwise = tree (l, 0) k (ins r) d
Here we also handle duplicated keys (when the key already exists) by
overwriting.
tree::(AVLTree a, Int) → a → (AVLTree a, Int) → Int → (AVLTree a, Int)
tree (l, dl) k (r, dr) d = balance (Br l k r d', delta) where
d' = d + dr - dl
delta = deltaH d d' dl dr
After balance adjustment, the balance factors of the pivot nodes x, y, z change as below.

Left-left lean:
δ'(x) = δ(x)
δ'(y) = 0
δ'(z) = 0    (4.14)

Right-right lean:
δ'(x) = 0
δ'(y) = 0
δ'(z) = δ(z)    (4.15)

Right-left and left-right lean:
δ'(x) = −1 : δ(y) = 1
         0 : otherwise
δ'(y) = 0
δ'(z) =  1 : δ(y) = −1
         0 : otherwise    (4.16)
[Figure 4.3: the four cases of balance adjustment. Left-left lean (δ(z) = −2, δ(x) = −1), right-right lean (δ(x) = 2, δ(y) = 1), right-left lean and left-right lean are all rotated to the same balanced tree: y on top, with children x (over A, B) and z (over C, D), and δ'(y) = 0.]
balance(T, ∆H) =
  (((A, x, B, δ(x)), y, (C, z, D, 0), 0), ∆H − 1)      : Pll(T)
  (((A, x, B, 0), y, (C, z, D, δ(z)), 0), ∆H − 1)      : Prr(T)
  (((A, x, B, δ'(x)), y, (C, z, D, δ'(z)), 0), ∆H − 1) : Prl(T) ∨ Plr(T)
  (T, ∆H)                                              : otherwise
(4.17)
Where Pll(T), Prr(T), Prl(T), and Plr(T) mean the pattern of tree T is
left-left, right-right, right-left, and left-right lean respectively. δ'(x)
and δ'(z) are defined in (C.16). The four patterns can be tested by pattern
matching on the node structure accordingly.
Verification
To verify that a tree is an AVL tree, we need to verify two things: first,
that it's a binary search tree; second, that it satisfies the AVL tree property.
In order to test if a binary tree satisfies the AVL tree property, we can
examine the height difference between the two sub-trees recursively till the
leaves.
avl?(T) = True : T = ϕ
          avl?(Tl) ∧ avl?(Tr) ∧ ||Tr| − |Tl|| ≤ 1 : otherwise    (4.19)
Where the height can also be calculated recursively.
|T| = 0 : T = ϕ
      1 + max(|Tr|, |Tl|) : otherwise    (4.20)
An example program can follow these two equations directly.
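A minimal Python sketch of the verification, assuming a node is None (empty) or has left and right fields:

# Verify the AVL property following (4.19) and (4.20).
def height(t):
    if t is None:
        return 0
    return 1 + max(height(t.left), height(t.right))

def is_avl(t):
    if t is None:
        return True
    return is_avl(t.left) and is_avl(t.right) and \
        abs(height(t.right) - height(t.left)) <= 1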
Exercise 4.1
Write a program to verify if a tree is the AVL tree. Please consider both
functional and imperative approaches.
4.4 Deletion
As we mentioned before, deletion is not a major problem in purely func-
tional settings. As the tree is read only, the typical use case is to perform
looking up after building.
For purely functional deletion, it actually re-builds the tree as we showed in
the chapter of red-black tree. We put the AVL tree deletion algorithm in
Appendix C.
4.5 Imperative AVL tree algorithm ⋆

Note that after insertion, the balance factor δ may change because the height
of the tree can grow. Inserting on the right side can increase δ by 1, while
inserting on the left side can decrease it. By the end of this algorithm, we need
to perform bottom-up fixing from node x towards the root.
We can translate the pseudo code to Python example program3 .
def avl_insert(t, key):
    root = t
    x = Node(key)
    parent = None
    while(t):
        parent = t
        if(key < t.key):
            t = t.left
        else:
            t = t.right
    if parent is None: # tree is empty
        root = x
    elif key < parent.key:
        parent.set_left(x)
    else:
        parent.set_right(x)
    return avl_insert_fix(root, x)
This is a top-down algorithm. It searches the tree from the root down to the
proper position and inserts the new key as a leaf. By the end of this algorithm,
it calls the fixing function with the root and the newly inserted node.
Note that we reuse the same methods set_left() and set_right() as
defined in the chapter of red-black tree.
In order to resume the AVL tree property, we first check if the new node is
inserted on the left or right. If it is on the left, the balance factor δ decreases;
otherwise it increases. If we denote the new value as δ', there are 3 cases
between δ and δ'.
• If |δ| = 1 and |δ'| = 0, it means the new node makes the tree perfectly
balanced; the height of the parent node doesn't change, and the algorithm
terminates.
• If |δ| = 0 and |δ'| = 1, it means either the left or the right sub-tree
increased its height. We need to go on checking the upper level of the tree.
• If |δ| = 1 and |δ'| = 2, it means the AVL tree property is violated due to
the new insertion. We need to perform rotation to fix it.
1: function AVL-Insert-Fix(T, x)
2: while Parent(x) ̸= NIL do
3: δ ← δ(Parent(x))
4: if x = Left(Parent(x)) then
5: δ′ ← δ − 1
6: else
7: δ′ ← δ + 1
8: δ(Parent(x)) ← δ ′
³ C and C++ source code are available along with this book.
9: P ← Parent(x)
10: L ← Left(P)
11: R ← Right(P)
12: if |δ| = 1 and |δ ′ | = 0 then ▷ Height doesn’t change, terminates.
13: return T
14: else if |δ| = 0 and |δ ′ | = 1 then ▷ Go on bottom-up updating.
15: x←P
16: else if |δ| = 1 and |δ ′ | = 2 then
17: if δ ′ = 2 then
18: if δ(R) = 1 then ▷ Right-right case
19: δ(P ) ← 0 ▷ By (C.5)
20: δ(R) ← 0
21: T ← Left-Rotate(T, P )
22: if δ(R) = −1 then ▷ Right-left case
23: δy ← δ(Left(R)) ▷ By (C.16)
24: if δy = 1 then
25: δ(P ) ← −1
26: else
27: δ(P ) ← 0
28: δ(Left(R)) ← 0
29: if δy = −1 then
30: δ(R) ← 1
31: else
32: δ(R) ← 0
33: T ← Right-Rotate(T, R)
34: T ← Left-Rotate(T, P )
35: if δ ′ = −2 then
36: if δ(L) = −1 then ▷ Left-left case
37: δ(P ) ← 0
38: δ(L) ← 0
39: T ← Right-Rotate(T, P )
40: else ▷ Left-Right case
41: δy ← δ(Right(L))
42: if δy = 1 then
43: δ(L) ← −1
44: else
45: δ(L) ← 0
46: δ(Right(L)) ← 0
47: if δy = −1 then
48: δ(P ) ← 1
49: else
50: δ(P ) ← 0
51: T ← Left-Rotate(T, L)
52: T ← Right-Rotate(T, P )
53: break
54: return T
As the rotation operation doesn't update the balance factor δ, we need to
update it for the impacted nodes. Among the four cases, the right-right case
and the left-left case need only one rotation, while the right-left case and the
left-right case need two rotations.
The corresponding example Python program mirrors the pseudo code above.
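A sketch of the bottom-up fixing, assuming a Node type with parent, left, right and delta (balance factor) fields, and left_rotate(t, x) / right_rotate(t, x) as defined in the chapter of red-black tree:

def avl_insert_fix(t, x):
    while x.parent is not None:
        d = x.parent.delta
        d1 = d - 1 if x is x.parent.left else d + 1
        x.parent.delta = d1
        p, l, r = x.parent, x.parent.left, x.parent.right
        if abs(d) == 1 and abs(d1) == 0:
            return t                 # height doesn't change, terminate
        elif abs(d) == 0 and abs(d1) == 1:
            x = p                    # go on bottom-up updating
        elif abs(d) == 1 and abs(d1) == 2:
            if d1 == 2:
                if r.delta == 1:     # right-right case
                    p.delta = r.delta = 0
                    t = left_rotate(t, p)
                else:                # right-left case
                    dy = r.left.delta
                    p.delta = -1 if dy == 1 else 0
                    r.left.delta = 0
                    r.delta = 1 if dy == -1 else 0
                    t = right_rotate(t, r)
                    t = left_rotate(t, p)
            else:                    # d1 == -2
                if l.delta == -1:    # left-left case
                    p.delta = l.delta = 0
                    t = right_rotate(t, p)
                else:                # left-right case
                    dy = l.right.delta
                    l.delta = -1 if dy == 1 else 0
                    l.right.delta = 0
                    p.delta = 1 if dy == -1 else 0
                    t = left_rotate(t, l)
                    t = right_rotate(t, p)
            break
    return t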
Chapter 5

Radix tree, Trie and Prefix Tree
5.1 Introduction
The binary trees introduced so far store information in nodes. Edges can also
be used to store information. Radix trees, including the trie and the prefix tree,
are important data structures for information retrieval and manipulation. They
were developed in the 1960s, and are widely used in compiler design [2] and in
bio-informatics, such as DNA pattern matching [3].
[Figure 5.1: a radix tree storing the bit strings 1011, 10, 011, 100, and 0.]
Figure 5.1 shows a radix tree ([2] pp. 269). It contains the bit strings 1011,
10, 011, 100 and 0. When searching for a key k = (b0 b1 ... bn)2, we take the
first bit b0 (MSB from left) and check if it is 0 or 1: if it is 0, we turn left;
if it is 1, we turn right. Then we take the second bit and repeat this search
till we either reach a leaf node or finish all n bits.
The radix tree needn't store keys in the nodes at all. The information is
represented by the edges. The nodes marked with keys in the above figure are
for illustration purpose only.
Another idea is to represent the key as an integer instead of a string, because
an integer can be stored in binary format to save space. The operations are
also fast, as we can use bit-wise manipulation in most programming
environments.
5.2 Integer Trie

5.2.1 Definition

One approach is to treat all the prefix zeros as effective bits. Suppose the
integer is represented with 32 bits. If we want to insert key 1, it ends up with
a tree of 32 levels: there are 31 nodes, each with only a left sub-tree, and the
last node has only a right sub-tree. This is very inefficient in terms of space.
Okasaki shows a method to solve this problem in [2]. Instead of using big-
endian integers, we can use little-endian integers to represent keys. Thus
decimal 1 is represented as binary 1. When we insert it to the empty binary
trie, the result is a trie with a root and a right leaf; there is only 1 level.
Decimal 2 is represented as 01, and decimal 3 is (11)2 in little-endian binary
format. There is no need to add any prefix 0; the position in the trie is
uniquely determined.
5.2.2 Insertion
Because the definition of the integer trie is recursive, it's straightforward to
define the insertion algorithm recursively. If the lowest bit is 0, the key to be
inserted is even, and we recursively insert it to the left sub-tree; otherwise, if
the lowest bit is 1, the key is odd, and the recursive insertion is applied to the
right. We then divide the key by 2 to get rid of the lowest bit. For trie T,
denote the left and right sub-trees as Tl and Tr respectively. Thus
T = (Tl, v', Tr), where v' is the optional satellite data. If T is empty, then
Tl, Tr and v' are defined as empty as well.
insert(T, k, v) = (Tl, v, Tr)                    : k = 0
                  (insert(Tl, k/2, v), v', Tr)   : even(k)
                  (Tl, v', insert(Tr, ⌊k/2⌋, v)) : otherwise    (5.1)
If the key to be inserted already exists, this algorithm just overwrites the
previous stored data. It can be replaced with other alternatives, such as to store
the data in a linked-list.
Figure 5.3 shows an example trie. It’s generated by inserting the key-value
pairs {1 → a, 4 → b, 5 → c, 9 → d} to the empty trie.
The following Haskell example program implements the insertion algorithm.
insert t 0 x = Branch (left t) (Just x) (right t)
insert t k x
| even k = Branch (insert (left t) (k `div` 2) x) (value t) (right t)
| otherwise = Branch (left t) (value t) (insert (right t) (k `div` 2) x)
left (Branch l _ _) = l
left Empty = Empty
right (Branch _ _ r) = r
right Empty = Empty
[Figure 5.3: the little-endian integer trie for {1 → a, 4 → b, 5 → c, 9 → d}.]
value (Branch _ v _) = v
value Empty = Nothing
The corresponding imperative program uses bit-wise operations to test whether
a number is even or odd, and shifts the bits to the right for division.
def insert(t, key, value = None):
    if t is None:
        t = IntTrie()
    p = t
    while key != 0:
        if key & 1 == 0:
            if p.left is None:
                p.left = IntTrie()
            p = p.left
        else:
            if p.right is None:
                p.right = IntTrie()
            p = p.right
        key = key >> 1 # key / 2
    p.value = value
    return t
For a given integer k with m bits in binary, the insertion algorithm goes
down m levels. The performance is bound to O(m) time.
5.2.3 Look up
To look up key k in the little-endian integer binary trie: if the trie is empty,
the look up fails; if k = 0, we return the data d stored in the current node; if
the lowest bit is 0, we recursively look up the left sub-tree; otherwise we look
up the right sub-tree.
lookup(T, k) = ϕ : T = ϕ
               d : k = 0
               lookup(Tl, k/2) : even(k)
               lookup(Tr, ⌊k/2⌋) : otherwise    (5.2)
The following Haskell example program implements the recursive look up
algorithm.
search Empty k = Nothing
search t 0 = value t
search t k = if even k then search (left t) (k `div` 2)
else search (right t) (k `div` 2)
The look up algorithm can also be realized imperatively. We examine each
bit of k from the lowest one. We go left if the bit is 0, otherwise, go right. The
looking up completes when all bits are consumed.
1: function Lookup(T, k)
2: while k ̸= 0 ∧ T ̸=NIL do
3: if Even?(k) then
4: T ← Left(T )
5: else
6: T ← Right(T )
7: k ← ⌊k/2⌋
8: if T ̸= NIL then
9: return Data(T )
10: else
11: return not found
Below example Python program implements the looking up algorithm.

def lookup(t, key):
    while t is not None and key != 0:
        if key & 1 == 0:
            t = t.left
        else:
            t = t.right
        key = key >> 1
    return None if t is None else t.value
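A quick check of the trie, assuming IntTrie is a node class whose left, right and value fields default to None:

# Build the example trie and look the keys up again.
t = None
for k, v in [(1, 'a'), (4, 'b'), (5, 'c'), (9, 'd')]:
    t = insert(t, k, v)
assert lookup(t, 4) == 'b'   # 4 = (100)2: go left, left, then right
assert lookup(t, 7) is None  # 7 is not stored in the trie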
Figure 5.4: Little-endian integer tree for the map {1 → a, 4 → b, 5 → c, 9 → d}.
From this figure, we can see that the key of a branch node is the longest
common prefix of its descendant trees; they branch out at a certain bit. The
integer tree saves a lot of space compared to the trie.
5.3 Integer Prefix Tree

Different from the integer trie, padding zero bits don't cause issues in the
big-endian integer tree. All zero bits before the MSB are omitted to save space.
Okasaki lists some significant advantages of the big-endian integer tree in [2].
5.3.1 Definition
Integer prefix tree is a special binary tree. It is either empty or a node. There
are two different types of node:
• A leaf contains integer key and optional satellite data;
• Or a branch node with the left and right sub-trees. The two children share
the longest common prefix bits for their keys. For the left child, the
next bit in the key is zero, while it’s one for the right child.
The following Haskell example code defines the integer tree accordingly.

type Key = Int
type Prefix = Int
type Mask = Int

data IntTree a = Empty
               | Leaf Key a
               | Branch Prefix Mask (IntTree a) (IntTree a)
def isleaf(self):
    return self.left is None and self.right is None
5.3.2 Insertion
When inserting a key, if the tree is empty, we create a leaf node, as shown in
figure 5.5.
104 CHAPTER 5. RADIX TREE, TRIE AND PREFIX TREE
Figure 5.5: Left: the empty tree; Right: after inserting key 12.
If the tree is a singleton leaf node x, we create a new leaf y and put the key
and the value into it. After that, we create a new branch node and set x and y
as its two sub-trees. In order to determine whether y should be on the left or
right, we need to find the longest common prefix of x and y. For example, if
key(x) is 12 ((1100)2 in binary) and key(y) is 15 ((1111)2 in binary), then
the longest common prefix is (11oo)2, where o denotes the bits we don't care
about. We can use another integer to mask those bits. In this case, the mask
number is 4 (100 in binary). The next bit after the longest common prefix
represents 2^1. This bit is 0 in key(x), while it is 1 in key(y). We should set
x as the left sub-tree and y as the right sub-tree. Figure 5.6 shows this example.
prefix=1100
12
mask=100
0 1
12 15
Figure 5.6: Left: A tree with a singleton leaf 12; Right: After insert key 15.
In case the tree is neither empty nor a singleton leaf, we first check if the
key to be inserted matches the longest common prefix recorded in the root,
then recursively insert the key to the left or right sub-tree according to the
next bit after the common prefix. For example, when inserting key 14 ((1110)2
in binary) to the result tree in figure 5.6, since the common prefix is (11oo)2
and the next bit (the bit of 2^1) is 1, we recursively insert to the right
sub-tree.
If the key to be inserted doesn’t match the longest common prefix in the
root, we need branch a new leaf out. Figure 5.7 shows these two different cases.
For a given key k and value v, denote (k, v) as the leaf node. For branch
node, denote it in form of (p, m, Tl , Tr ), where p is the longest common prefix,
m is the mask, Tl and Tr are the left and right sub-trees. Summarize the above
cases, the insertion algorithm can be defined as below.
insert(T, k, v) =
  (k, v)                       : T = ϕ ∨ T = (k, v')
  join(k, (k, v), k', T)       : T = (k', v')
  (p, m, insert(Tl, k, v), Tr) : T = (p, m, Tl, Tr), match(k, p, m), zero(k, m)
  (p, m, Tl, insert(Tr, k, v)) : T = (p, m, Tl, Tr), match(k, p, m), ¬zero(k, m)
  join(k, (k, v), p, T)        : T = (p, m, Tl, Tr), ¬match(k, p, m)
(5.3)
The first clause deals with the edge cases, if T is empty, the result is a leaf
node. If T is a leaf node with the same key, we overwrite the previous value.
[Figure 5.7: Left: inserting key 14 ((1110)2) matches the prefix 1100 with mask 100; it is recursively inserted to the right sub-tree, creating a branch with prefix 1110 and mask 10 over leaves 14 and 15. Right: inserting key 5 doesn't match the prefix; a new branch with prefix 0 and mask 10000 is created, with leaf 5 and the original tree as its children.]
The second clause handles the case that T is a leaf node with a different key.
Here we branch out another leaf, extract the longest common prefix, and
determine which leaf should be set as the left sub-tree. Function
join(k1, T1, k2, T2) does this work; we'll define it later.
The third clause deals with the case that T is a branch node, the longest
common prefix matches the key to be inserted, and the next bit to the common
prefix is zero. Here we need recursively insert to the left sub-tree.
The fourth clause handles the similar case as the third one, except that the
next bit to the common prefix is one, but not zero. We need recursively insert
to the right sub-tree.
The last clause is for the case that the key to be inserted doesn’t match the
longest common prefix in the branch. We need branch out a new leaf by calling
the join function.
We need to define function match(k, p, m) to test if the key k has the same
prefix p above the masked bits m. For example, suppose the prefix stored in a
branch node is (pn pn−1 ... pi ... p0)2 in binary, key k is (kn kn−1 ... ki ... k0)2
in binary, and the mask is (100...0)2 = 2^i. They match if and only if pj = kj
for every j with i ≤ j ≤ n.
One solution to realize match is to test if mask(k, m) = p is satisfied, where
mask(x, m) = x & ¬(m − 1): we perform bitwise-not of m − 1, then perform
bitwise-and with x.
Function zero(k, m) tests that the next bit of the common prefix is zero. With
the help of the mask m, we can shift m one bit to the right, then perform
bitwise-and with the key:

zero(k, m) = (k & (m >> 1) = 0)    (5.4)
join(p1, T1, p2, T2) = (p, m, T1, T2) : zero(p1, m)
                       (p, m, T2, T1) : ¬zero(p1, m)
where (p, m) = LCP(p1, p2)    (5.5)
In order to calculate the longest common prefix of p1 and p2, we can first
compute their bitwise exclusive-or, then count the number of bits in the result,
and generate a mask m = 2^|xor(p1, p2)|, where |x| is the number of bits in x.
The longest common prefix p can be given by masking the bits with m for either
p1 or p2:

p = mask(p1, m)    (5.6)
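As a concrete illustration, here is a Python sketch of these bit helpers (the names maskbits, zero, match and lcp are ours):

def maskbits(x, m):
    return x & ~(m - 1)          # clear the bits covered by mask m

def match(key, prefix, m):
    return maskbits(key, m) == prefix

def zero(x, m):
    return x & (m >> 1) == 0     # test the bit next to the common prefix

def lcp(p1, p2):
    m = 1
    while m <= (p1 ^ p2):        # m = 2^(number of bits of xor(p1, p2))
        m = m << 1
    return (maskbits(p1, m), m)  # the longest common prefix and the mask

For example, lcp(12, 15) gives (0b1100, 4), matching the prefix (11oo)2 and mask 100 in the text.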
The following Haskell example program implements the insertion algorithm.
import Data.Bits
insert t k x
  = case t of
      Empty → Leaf k x
      Leaf k' x' → if k == k' then Leaf k x
                    else join k (Leaf k x) k' t
      Branch p m l r
        | match k p m → if zero k m then Branch p m (insert l k x) r
                                     else Branch p m l (insert r k x)
        | otherwise → join k (Leaf k x) p t
match k p m = (mask k m) == p
In the corresponding Python program, the branching step sets the two sub-trees according to the next bit:

if zero(t1.prefix, t.mask):
    t.left, t.right = t1, t2
else:
    t.left, t.right = t2, t1
return t
Figure 5.8 shows the example integer tree created with the insertion algo-
rithm.
[Figure 5.8: the integer prefix tree built by inserting 1 → x, 4 → y, 5 → z: the root branch has prefix 0 and mask 8; leaf 1:x is on one side, and a branch with prefix 100 and mask 2 holds leaves 4:y and 5:z.]
5.3.3 Look up
If the integer tree T is empty, or it is a singleton leaf whose key differs from
the one we are looking up, the result is empty; if the key in the leaf equals, we
are done. If T is a branch node, we check if the common prefix matches the
subject key, and recursively look up the sub-tree according to the next bit. If
the common prefix doesn't match the key, the look up fails.
lookup(T, k) =
  ϕ             : T = ϕ ∨ (T = (k', v), k' ≠ k)
  v             : T = (k', v), k' = k
  lookup(Tl, k) : T = (p, m, Tl, Tr), match(k, p, m), zero(k, m)
  lookup(Tr, k) : T = (p, m, Tl, Tr), match(k, p, m), ¬zero(k, m)
  ϕ             : otherwise
(5.7)
The following Haskell example program implements this recursive look up
algorithm.
search t k
= case t of
Empty → Nothing
Leaf k' x → if k == k' then Just x else Nothing
Branch p m l r
| match k p m → if zero k m then search l k
else search r k
| otherwise → Nothing
The look up algorithm can also be realized imperatively. Consider the
property of the integer prefix tree: when looking up a key, if it has a common
prefix with the root, we check the next bit. If this bit is zero, we recursively
look up the left sub-tree; otherwise we look up the right sub-tree. When we
arrive at a leaf node, we check if the key of the leaf equals the one we are
looking up.
1: function Look-Up(T, k)
2: if T = NIL then
3: return N IL ▷ Not found
4: while T is not leaf, and Match(k, Prefix(T ), Mask(T )) do
5: if Zero?(k, Mask(T )) then
6: T ← Left(T )
7: else
8: T ← Right(T )
9: if T is leaf, and Key(T ) = k then
10: return Data(T )
11: else
12: return N IL ▷ Not found
Below Python example program implements the looking up algorithm.
def lookup(t, key):
    while t is not None and (not t.isleaf()) and t.match(key):
        if zero(key, t.mask):
            t = t.left
        else:
            t = t.right
    if t is not None and t.isleaf() and t.key == key:
        return t.value
    return None
5.4 Alphabetic Trie

5.4.1 Definition
It's not enough to just use left and right sub-trees to represent alphabetic
keys. Taking English for example, there are 26 letters. If we don't care about
the case, one solution is to limit the number of branches (children) to 26. Some
simplified implementations define the trie with an array of 26 sub-trees. This
is illustrated in Figure 5.9.
Not all of the 26 branches contain data. For instance, in Figure 5.9, the root
only has three non-empty branches, representing the letters 'a', 'b', and 'z';
other branches, such as the one for letter 'c', are all empty. We will not show
empty branches hereafter.
Figure 5.9: A trie with 26 branches, containing keys 'a', 'an', 'another', 'bool',
'boy' and 'zoo'.
When dealing with case sensitive problems, or handling languages other than
English, there can be more letters. We can use the collection data structures,
like Hash map to define the trie.
An alphabetic trie is either empty or a node. A node is either a leaf without
children, or a branch with multiple sub-trees, each bound to a character. Both
leaf and branch can contain optional satellite data. The following Haskell
code shows the example definition.
data Trie a = Trie { value :: Maybe a
, children :: [(Char, Trie a)]}
Below ANSI C example code defines the alphabetic trie. For illustration
purpose, it limits the character set to lower case English letters, from ’a’ to ’z’.
struct Trie {
struct Trie∗ children[26];
void∗ data;
};
5.4.2 Insertion
When inserting to the trie, denote the key to be inserted as K = k1 k2 ... kn,
where ki is the i-th character; K' is the rest of the characters except k1, and
v' is the data to be inserted. The trie is in the form T = (v, C), where v is
the data stored in the trie, and C = {(c1, T1), (c2, T2), ..., (cm, Tm)} is the
collection of sub-trees: it associates each character ci with the corresponding
sub-tree Ti. C is empty for a leaf node.
insert(T, K, v') = (v', C)                 : K = ϕ
                   (v, ins(C, k1, K', v')) : otherwise    (5.8)
If the key is empty, the previous value v is overwritten with v'. Otherwise,
we need to check the children and perform recursive insertion. This is realized
in function ins(C, k1, K', v'). It examines the (character, sub-tree) pairs in C
one by one. Let C' be the rest of the pairs except for the first one. This
function can be defined as below.
be defined as below.
ins(C, k1, K', v') = {(k1, insert((ϕ, ϕ), K', v'))}     : C = ϕ
                     {(k1, insert(T1, K', v'))} ∪ C'    : k1 = c1
                     {(c1, T1)} ∪ ins(C', k1, K', v')   : otherwise    (5.9)
To realize the insertion imperatively, we start from the root and pick the
characters one by one from the string. For each character, we examine which
child sub-tree represents that character. If the corresponding child is empty,
a new node is created. After that, we pick the next character and repeat this
process. After consuming all the characters, we store the value bound to the
key in the node we arrived at.
1: function Insert(T, k, v)
2: if T = NIL then
3: T ← Empty-Node
4: p←T
5: for each c in k do
6: if Children(p)[c] = NIL then
7: Children(p)[c] ← Empty-Node
8: p ← Children(p)[c]
9: Data(p) ← v
10: return T
The following example ANSI C program implements the insertion algorithm.
struct Trie∗ insert(struct Trie∗ t, const char∗ key, void∗ value) {
int c;
struct Trie ∗p;
if(!t)
t = create_node();
for (p = t; ∗key; ++key, p = p→children[c]) {
c = ∗key - 'a';
if (!p→children[c])
p→children[c] = create_node();
}
p→data = value;
return t;
}
Where function create_node creates new empty node, with all children
initialized to empty.
struct Trie∗ create_node() {
struct Trie∗ t = (struct Trie∗) malloc(sizeof(struct Trie));
int i;
for (i = 0; i < 26; ++i)
t→children[i] = NULL;
t→data = NULL;
return t;
}
5.4.3 Look up
When looking up a key, we start from the first character. If it is bound to
some sub-tree, we recursively search the rest of the characters in that child
sub-tree. Denote the trie as T = (v, C), and the key being looked up as
K = k1 k2 ... kn if it isn't empty. The first character in the key is k1, and
the rest of the characters are represented as K'.
lookup(T, K) = v              : K = ϕ
               ϕ              : find(C, k1) = ϕ
               lookup(T', K') : find(C, k1) = T'    (5.10)
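A Python sketch following equation (5.10), assuming a hypothetical node class with a children dict (character → sub-trie) and a value field, mirroring the Haskell definition above:

def trie_lookup(t, key):
    if t is None:
        return None
    for c in key:
        if c not in t.children:
            return None   # find(C, k1) = ϕ: the look up fails
        t = t.children[c]
    return t.value        # K = ϕ: return the stored data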
Exercise 5.1
5.5 Alphabetic Prefix Tree

5.5.1 Definition
An alphabetic prefix tree is a special prefix tree where each node contains
multiple branches, and all sub-trees of a node share the longest common prefix
string stored in it. As a result, no node has only one child, because that would
conflict with the longest common prefix property.
If we turn the trie shown in figure 5.9 into a prefix tree by compressing all
nodes that have only one child, we get the prefix tree in figure 5.10.
Figure 5.10: A prefix tree, with keys: 'a', 'an', 'another', 'bool', 'boy' and 'zoo'.
We can modify the alphabetic trie and adapt it to prefix tree. The tree is
either empty, or a node in form T = (v, C). Where v is the optional satellite
data; C = {(s1 , T1 ), (s2 , T2 ), ..., (sn , Tn )} represents the sub-trees. It is a list of
pairs. Each pair contains a string si , and a sub-tree Ti the string is bound to.
The following Haskell example code defines prefix tree accordingly.
data PrefixTree k v = PrefixTree { value :: Maybe v
, children :: [([k], PrefixTree k v)]}
Below Python example program reuses the trie definition to define the prefix
tree.

class PrefixTree:
    def __init__(self, value = None):
        self.value = value
        self.subtrees = {}
5.5.2 Insertion
When insert a key s, if the prefix tree is empty, we create a leaf node as shown
in figure 5.11 (a). Otherwise, we examine the sub-trees to see if there’s some
tree Ti bound to the string si , and there exists common prefix between si and
s. In such case, we need branch out a new leaf Tj . To do this, we firstly create
a new internal branch node, bind it with the common prefix; then set Ti and Tj
as the two children sub-trees of this node. Ti and Tj share the common prefix.
This is shown in figure 5.11 (b). There are two special cases. s can be the prefix
of si as shown in figure 5.11 (c). Similarly, si can be the prefix of s as shown in
figure 5.11 (d).
For prefix tree T = (v, C), function insert(T, k, v') inserts key k and value
v' to the tree. If k is empty, it overwrites v with v'; otherwise, it calls ins
on the children sub-trees:
ins(C, k, v') = {(k, (v', ϕ))}                 : C = ϕ
                {(k, (v', C_T1))} ∪ C'         : k1 = k
                {branch(k, v', k1, T1)} ∪ C'   : match(k1, k)
                {(k1, T1)} ∪ ins(C', k, v')    : otherwise    (5.13)
The first clause deals with the edge case of empty children. A leaf node
bound to k, containing v ′ is returned as the only sub-tree. The second clause
overwrites the previous value with v ′ if there is some child bound to the same
key. CT1 represents the children of sub-tree T1 . The third clause branches out
a new leaf if the first child matches the key k. The last clause goes on checking
the rest sub-trees.
We define two keys A and B matching if they have non-empty common
prefix.
match(A, B) = A ̸= ϕ ∧ B ̸= ϕ ∧ a1 = b1 (5.14)
Where a1 and b1 are the first characters in A and B if they are not empty.
Function branch(k1, v, k2, T2) takes two keys, a value and a tree. It extracts
the longest common prefix k = lcp(k1, k2), and assigns the differing parts to
k1' = k1 − k and k2' = k2 − k. The algorithm first handles the edge cases that
either k1 is a prefix of k2 or k2 is a prefix of k1. For the former, it creates a
new node containing v, binds this node to k, and sets (k2', T2) as the only
child sub-tree; for the latter, it recursively inserts k1' and v to T2. Otherwise,
the algorithm creates a branch node, binds it to the longest common prefix k,
and sets the two children sub-trees for it: one sub-tree is (k2', T2), the other
is a leaf node containing v, bound to k1'.

Figure 5.11: (a) Insert key 'boy' into the empty prefix tree; the result is a
leaf. (b) Insert key 'bool'; a new branch with common prefix 'bo' is created.
(c) Insert key 'an' with value y into a node with prefix 'another'. (d) Insert
'another' into the node with prefix 'an'; we recursively insert key 'other' to
the child.
branch(k1, v, k2, T2) = (k, (v, {(k2', T2)}))                 : k = k1
                        (k, insert(T2, k1', v))               : k = k2
                        (k, (ϕ, {(k1', (v, ϕ)), (k2', T2)}))  : otherwise
(5.15)
Where
k = lcp(k1, k2)
k1' = k1 − k
k2' = k2 − k
Function lcp(A, B) keeps taking the same characters from A and B one by
one. Denote a1 and b1 as the first characters in A and B if they are not empty,
and A', B' as the rest characters.

lcp(A, B) = ϕ                   : A = ϕ ∨ B = ϕ ∨ a1 ≠ b1
            {a1} ∪ lcp(A', B')  : a1 = b1    (5.16)
The following Haskell example program implements the prefix tree insertion
algorithm.
import Data.List (isPrefixOf)
5.5.3 Look up
When looking up a key, we can't examine the characters one by one as in the
trie any more. Starting from the root, we need to search among the children
sub-trees to see if any one is bound to a prefix of the key. If there is such a
sub-tree, we remove the prefix from the key, and recursively look up the
updated key in this child sub-tree. The look up fails if there's no sub-tree
bound to any prefix of the key. For prefix tree T = (v, C), we search among
its children sub-trees C.
5.6 Applications of trie and prefix tree

Trie and prefix tree can be used to realize applications such as e-dictionaries and word auto-completion, as in the following figures.

Figure 5.12: E-dictionary. All candidates starting with what the user inputs are
listed.

Figure 5.13: A search engine. All candidates starting with what the user inputs
are listed.
A dictionary stores key-value pairs, where the key is an English word or
phrase, and the value is the meaning described in text.
We can store all the words and their meanings in a trie, but it consumes too
much space, especially when there are a huge amount of items. We'll use the
prefix tree to realize the e-dictionary.
When the user wants to look up the word 'a', the dictionary does not only
return the meaning of 'a', but also provides a list of candidates starting with
'a', including 'abandon', 'about', 'accent', 'adam', ... Of course all these
words are stored in the prefix tree.
If there are too many candidates, we can display only the top 10 candidates,
and allow the user to browse for more.
To define this algorithm: if the string we are looking for is empty, we expand
all children sub-trees until getting n candidates; otherwise we recursively
examine the children to find the one whose prefix matches this string.
In programming environments supporting lazy evaluation, an intuitive
solution is to lazily expand all candidates, and take the first n on demand.
Denote the prefix tree in form T = (v, C); the below function enumerates all
items starting with key k.
findAll(T, k) = enum(C)             : k = ϕ, v = ϕ
                {(ϕ, v)} ∪ enum(C)  : k = ϕ, v ≠ ϕ
                find(C, k)          : k ≠ ϕ    (5.19)
The first two clauses deal with the edge cases that the key is empty: all the
children sub-trees are enumerated, except for those with empty values. The
last clause finds the child sub-tree that matches k.
For non-empty children sub-trees, C = {(k1 , T1 ), (k2 , T2 ), ..., (km , Tm )}, de-
note the rest pairs except for the first one as C ′ . The enumeration algorithm
can be defined as below.
enum(C) = ϕ                                          : C = ϕ
          mapAppend(k1, findAll(T1, ϕ)) ∪ enum(C')   : otherwise    (5.20)
Where mapAppend(k, L) = {(k + ki, vi) | (ki, vi) ∈ L}. It concatenates the
prefix k in front of every key-value pair in list L². Function enum can also be
defined with the concept of concatMap (also called flatMap)³.
find(C, k) = ϕ                                     : C = ϕ
             mapAppend(k1, findAll(T1, ϕ))         : k ⊏ k1
             mapAppend(k1, findAll(T1, k − k1))    : k1 ⊏ k
             find(C', k)                           : otherwise    (5.22)
Below example Haskell program implements the e-dictionary application
according to the above equations.
import Control.Arrow (first)
Imperatively, function Expand(T, prefix, n) picks at most n sub-trees sharing
the same prefix in T. It is realized as a BFS (Breadth-First Search) traverse;
14.3.1 in the chapter of searching explains BFS in detail.
1: function Expand(pref ix, T, n)
2: R←ϕ
3: Q ← {(pref ix, T )}
4: while |R| < n ∧ Q is not empty do
5: (k, T ) ← Pop(Q)
6: if Data(T ) ̸= NIL then
7: R ← R ∪ {(k, Data(T ) )}
8: for ∀(ki , Ti ) ∈ Children(T ) in sorted order do
9: Push(Q, (k + ki , Ti ))
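A Python sketch of Expand, assuming the PrefixTree class defined earlier (a value field and a subtrees dict):

from collections import deque

# BFS-expand at most n (key, value) candidates sharing the prefix.
def expand(prefix, t, n):
    res = []
    q = deque([(prefix, t)])
    while len(res) < n and q:
        k, t = q.popleft()
        if t.value is not None:
            res.append((k, t.value))
        for ki, ti in sorted(t.subtrees.items()):
            q.append((k + ki, ti))
    return res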
There are typically two methods to input words or phrases with an ITU-T key
pad. The first is the multi-tap method: press a key repeatedly to cycle through
the letters printed on it. The second is predictive: if the user wants to enter
the word 'home', for example, he can press the keys in the below sequence.
• Press keys '4', '6', '6', '3'; the word 'home' appears on top of the candidate list;
• Press key '*' to change to another candidate word; the next word 'gone'
appears;
• ...
Comparing the two methods, the second one is much easier for the user. The
only overhead is the need to store a dictionary of candidate words.
The second method is known as the 'T9' input method, or predictive input
method [6], [7]. The abbreviation 'T9' stands for 'textonym': it starts with
'T' and has 9 characters. T9 input can also be realized with the prefix tree.
In order to provide candidate words, a dictionary must be prepared in ad-
vance. Prefix tree can be used to store the dictionary. The commercial T9
implementations typically use multiple layers indexed dictionary in both file
system and cache. The realization shown here is for illustration purpose only.
Firstly, we need to define the T9 mapping from digit to candidate characters:

M_T9 = {1 → ",.", 2 → "abc", 3 → "def", 4 → "ghi", 5 → "jkl",
        6 → "mno", 7 → "pqrs", 8 → "tuv", 9 → "wxyz"}

The reversed mapping, from character to digit, is:

M_T9⁻¹ = concat({{c → d | c ∈ S} | (d → S) ∈ M_T9})    (5.24)

Given a sequence of characters, we can convert it to a sequence of digits by
looking up M_T9⁻¹:

digits(S) = {M_T9⁻¹[c] | c ∈ S}    (5.25)
When input digits D = d1 d2 ...dn , we define the T9 lookup algorithm as
below.
findT9(T, D) = {ϕ}                              : D = ϕ
               concatMap(find, prefixes(T))     : otherwise    (5.26)
Where T is the prefix tree built from a set of words and phrases; it is the
dictionary we look up. If the input D is empty, the result is an empty string.
Otherwise, it looks up the sub-trees that match the input, and concatenates
the results together.
To enumerate the matched sub-trees, we examine all the children sub-trees
C_T: for every pair (ki, Ti), we first convert string ki to a digit sequence
di, then compare di and D. If either one is a prefix of the other, this pair is
selected as a candidate for further search.
mapT9 = Map.fromList [('1', ",."), ('2', "abc"), ('3', "def"), ('4', "ghi"),
('5', "jkl"), ('6', "mno"), ('7', "pqrs"), ('8', "tuv"),
('9', "wxyz")]
def digits(w):
    return ''.join([T9RMAP[c] for c in w])
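A Python sketch of the T9 look up over the prefix tree, assuming the PrefixTree class (with a subtrees dict) and the digits() translation above (find_t9 is our name):

def find_t9(t, ds):
    if ds == '':
        return ['']
    res = []
    for k, sub in t.subtrees.items():
        ks = digits(k)
        if ds.startswith(ks):       # consume k, search deeper
            res += [k + w for w in find_t9(sub, ds[len(ks):])]
        elif ks.startswith(ds):     # the input ends inside k
            res.append(k[:len(ds)])
    return res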
Exercise 5.2
5.7 Summary
In this chapter, we start from the integer based trie and prefix tree. The map
data structure based on integer tree plays the important role in Compiler im-
plementation. Alphabetic trie and prefix tree are natural extensions. They can
manipulate text information. We demonstrate how to realize the predictive e-
dictionary and T9 input method with prefix tree, although these examples are
different from the commercial implementations. Other data structure, suffix
tree, has close relationship with trie and prefix tree. Suffix tree is introduced in
Appendix D.
Bibliography
[2] Chris Okasaki and Andrew Gill. “Fast Mergeable Integer Maps”. Workshop
on ML, September 1998, pages 77-86, http://www.cse.ogi.edu/~andy/
pub/finite.htm
[3] D.R. Morrison, “PATRICIA – Practical Algorithm To Retrieve Information
Coded In Alphanumeric”, Journal of the ACM, 15(4), October 1968, pages
514-534.
[4] Suffix Tree, Wikipedia. http://en.wikipedia.org/wiki/Suffix_tree
[5] Trie, Wikipedia. http://en.wikipedia.org/wiki/Trie
[6] T9 (predictive text), Wikipedia. http://en.wikipedia.org/wiki/T9_
(predictive_text)
[7] Predictive text, Wikipedia. http://en.wikipedia.org/wiki/
Predictive_text
Chapter 6
B-Trees
6.1 Introduction
B-Tree is important data structure. It is widely used in modern file systems.
Some are implemented based on B+ tree, which is extended from B-tree. B-tree
is also widely used in database systems.
Some textbooks introduce B-tree with the the problem of how to access a
large block of data on magnetic disks or secondary storage devices[2]. It is
also helpful to understand B-tree as a generalization of balanced binary search
tree[2].
Referring to Figure 6.1, it is easy to find the differences and similarities of
B-tree with regard to binary search tree.

[Figure 6.1: an example B-tree; the root holds keys C, G, P, T, W, and the leaves hold the remaining letters A to Z.]
Let's recall the definition of binary search tree: it is either empty, or a node
containing 3 parts, a value, a left child and a right child, where both children
are also binary search trees; all values in the left child are not greater than
the value of this node, and the value of this node is not greater than any value
in the right child.
For a non-empty binary tree (L, k, R), where L, R and k are the left child,
right child, and the key, and function Key(T) accesses the key of tree T, the
constraint can be represented as:

∀x ∈ L, y ∈ R : Key(x) ≤ k ≤ Key(y)    (6.1)
A B-tree node contains n keys k1, k2, ..., kn and n + 1 children c1, c2, ...,
cn+1. The keys and children in a node satisfy the following order constraints:
• Keys are stored in non-decreasing order, that is k1 ≤ k2 ≤ ... ≤ kn;
• For each ki, all elements stored in child ci are not greater than ki, while
ki is not greater than any value stored in child ci+1.
The constraints can be represented as in equation (6.2) as well.
Thus we have the following inequality between the height and the number of
keys:

h ≤ log_t((n + 1)/2)    (6.4)

For example, with minimum degree t = 2 and n = 1023 keys, the height is no
more than log₂ 512 = 9.
This is the reason why the B-tree is balanced. The simplest B-tree is the
so-called 2-3-4 tree, where t = 2: every node except the root has 2, 3, or 4
children. Essentially, the red-black tree can be mapped to the 2-3-4 tree.
The following Python code shows an example B-tree definition. It explicitly
passes the minimum degree t when creating a node.

class BTree:
    def __init__(self, t):
        self.t = t
        self.keys = []
        self.children = []
B-tree nodes commonly have satellite data as well. We ignore satellite data
for illustration purpose.
In this chapter, we will first introduce how to generate a B-tree by insertion.
Two different methods will be explained: one is the classic method as in [2],
where we split the node before insertion if it's full; the other is the modify-fix
approach, which is quite similar to the red-black tree solution [3] [2]. We will
then explain how to delete keys from the B-tree and how to look up a key.
6.2 Insertion
B-tree can be created by inserting keys repeatedly. The basic idea is similar
to the binary search tree. When inserting key x, from the tree root, we examine
all the keys in the node to find a position where all the keys on the left are
less than x, while all the keys on the right are greater than x.¹ If the current
node is a leaf node and it is not full (there are fewer than 2t − 1 keys in this
node), x is inserted at this position. Otherwise, the position points to a child
node, and we need to recursively insert x to it.
Figure 6.3 shows one example. The B-tree illustrated is a 2-3-4 tree. When
inserting key x = 22: because it's greater than the root key 20, the right child
containing keys 26, 38, 45 is examined next; since 22 < 26, the first child
containing keys 21 and 25 is examined. This is a leaf node, and it is not full,
so key 22 is inserted into this node.
However, if there are already 2t − 1 keys in the leaf, the new key x can't be
inserted, because the node is 'full'. Trying to insert key 18 to the above
example B-tree meets this problem. There are 2 methods to solve it.
6.2.1 Splitting
Split before insertion
If the node is full, one method to solve the problem is to split the node
before insertion.
1 This is a strong constraint. In fact, only less-than and equality testing is necessary. The
[Figure 6.3: (a) Insert key 22 to the 2-3-4 tree: 22 > 20, go to the right child; 22 < 26, go to the first child. (b) The leaf containing 21 and 25 is not full, so key 22 is inserted into it.]
For a full node with 2t − 1 keys, it can be divided into 3 parts as shown in
Figure 6.4: the left part contains the first t − 1 keys and t children; the
right part contains the last t − 1 keys and t children. Both the left part and
the right part are valid B-tree nodes. The middle part is the t-th key. We can
push it up to the parent node (if the current node is the root, then this key,
with the two split children, becomes the new root).
For node x, denote K(x) as keys, C(x) as children. The i-th key as ki (x),
the j-th child as cj (x). Below algorithm describes how to split the i-th child for
a given node.
1: procedure Split-Child(node, i)
2: x ← ci (node)
3: y ← CREATE-NODE
4: Insert(K(node), i, kt (x))
5: Insert(C(node), i + 1, y)
6: K(y) ← {kt+1 (x), kt+2 (x), ..., k2t−1 (x)}
7: K(x) ← {k1 (x), k2 (x), ..., kt−1 (x)}
8: if y is not leaf then
9: C(y) ← {ct+1 (x), ct+2 (x), ..., c2t (x)}
10: C(x) ← {c1 (x), c2 (x), ..., ct (x)}
The following example Python program implements this child splitting al-
gorithm.
def split_child(node, i):
    t = node.t
    x = node.children[i]
    y = BTree(t)
    node.keys.insert(i, x.keys[t-1])
    node.children.insert(i+1, y)
    y.keys = x.keys[t:]
    x.keys = x.keys[:t-1]
    if not is_leaf(x):
        y.children = x.children[t:]
        x.children = x.children[:t]
[Figure 6.5: splitting the full child before insertion: (a) t = 2; (b) t = 3.]
def is_full(node):
    return len(node.keys) ≥ 2 ∗ node.t - 1
For an array-based collection, appending at the tail is much more efficient
than inserting at another position, because the latter takes O(n) time if the
length of the collection is n. The ordered_insert program first appends the
new element at the end of the existing collection, then iterates from the last
element to the first one, checking if the current two adjacent elements are
ordered; if not, the two elements are swapped.
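A minimal Python sketch of ordered_insert as described (operating on a plain list):

# Append x at the tail, then swap backwards until ordered again.
def ordered_insert(lst, x):
    lst.append(x)
    i = len(lst) - 1
    while i > 0 and lst[i] < lst[i-1]:
        lst[i], lst[i-1] = lst[i-1], lst[i]
        i = i - 1
    return lst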
Function ins(T, k) traverses the B-tree T from the root to find a proper
position where key k can be inserted. After that, function fix is applied to
resume the B-tree properties. Denote the B-tree in the form T = (K, C, t),
where K represents the keys, C represents the children, and t is the minimum
degree.
Below is the Haskell definition of B-tree.
data BTree a = Node{ keys :: [a]
, children :: [BTree a]
, degree :: Int} deriving (Eq)
There are two cases when realizing the ins(T, k) function: if the tree T is a
leaf, k is inserted into the keys; otherwise, if T is a branch node, we need to
recursively insert k to the proper child.
Figure 6.6 shows the branch case. The algorithm first locates the position:
for some key ki, if the new key k to be inserted satisfies ki−1 < k < ki, then
we need to recursively insert k to child ci. This position divides the node into
3 parts: the left part, the child ci, and the right part.

[Figure 6.6: locating the child ci with ki−1 < k < ki, and recursively inserting k to it.]
ins(T, k) = (K' ∪ {k} ∪ K'', ϕ, t)                   : C = ϕ, (K', K'') = divide(K, k)
            make((K', C1), ins(c, k), (K'', C2'))    : (C1, C2) = split(|K'|, C)
(6.6)
The first clause deals with the leaf case. Function divide(K, k) divide keys
into two parts, all keys in the first part are not greater than k, and all rest keys
are not less than k.
K = K ′ ∪ K ′′ ∧ ∀k ′ ∈ K ′ , k ′′ ∈ K ′′ ⇒ k ′ ≤ k ≤ k ′′
The second clause handles the branch case. Function split(n, C) splits the
children into two parts, C1 and C2: C1 contains the first n children, and C2
contains the rest. Among C2, the first child is denoted as c, and the others are
represented as C2'.
Here the key k needs to be recursively inserted into child c. Function make
takes 3 parameters: the first and the third are pairs of keys and children; the
second parameter is a child node. It examines whether a B-tree node made from
these keys and children violates the minimum degree constraint, and performs
fixing if necessary.
make((K', C'), c, (K'', C'')) = fixFull((K', C'), c, (K'', C''))   : full(c)
                                (K' ∪ K'', C' ∪ {c} ∪ C'', t)      : otherwise
(6.7)
Where function full(c) tests if the child c is full. Function fixFull splits
the child c, and forms a new B-tree node with the pushed-up key.
fix(T) = c                     : T = (ϕ, {c}, t)
         ({k'}, {c1, c2}, t)   : full(T), (c1, k', c2) = split(T)
         T                     : otherwise    (6.9)
[Figure 6.7: example B-trees built by repeated insertion with the modify-fix approach, for t = 2 and t = 3.]
6.3 Deletion
Deleting a key from a B-tree may violate its balance properties: except for
the root, a node shouldn't contain too few keys, i.e., fewer than t − 1, where
t is the minimum degree.
Similar to the approaches for insertion, we can either do some preparation
so that the node from where the key being deleted contains enough keys; or do
some fixing after the deletion if the node has too few keys.
• Case 2a: if the child y preceding k contains enough keys (at least t), we
replace k in node x with k', which is the predecessor of k in child y, and
recursively remove k' from y. The predecessor of k can be easily located as
the last key of child y. This is shown in figure 6.8.
• Case 2b: if y doesn't contain enough keys, while the child z following k
contains at least t keys, we replace k in node x with k'', which is the
successor of k in child z, and recursively remove k'' from z. The successor of
k can be easily located as the first key of child z. This sub-case is
illustrated in figure 6.9.
• Case 2c: otherwise, if neither y nor z contains enough keys, we can merge
y, k and z into one new node, so that this new node contains 2t − 1 keys.
After that, we can recursively do the removing. Note that after the merge, if
the current node doesn't contain any keys, it means k was the only key in x,
and y and z were its only two children; we need to shrink the tree height by
one.
• Case 3a: we check the two siblings of ci, namely ci−1 and ci+1. If either
one contains enough keys (at least t), we move one key from x down to ci, and
move one key from that sibling up to x; we also need to move the corresponding
child from the sibling to ci. This operation makes ci contain enough keys for
deletion; we can then try to delete k from ci recursively. Figure 6.11
illustrates this case.
• Case 3b: in case neither of the two siblings contains enough keys, we merge
ci, a key from x, and either one of the siblings into a new node, then do the
deletion on this new node.
Procedure Merge-Children(T, i) merges the i-th child, the i-th key, and the
(i+1)-th child of node T into a new child, and removes the i-th key and the
(i+1)-th child from T after merging.
With these functions defined, the B-tree deletion algorithm can be given by
realizing the above 3 cases.
1: function Delete(T, k)
2: i←1
3: while i ≤ |K(T )| do
4: if k = ki (T ) then
5: if T is leaf then ▷ case 1
6: Remove(K(T ), k)
7: else ▷ case 2
8: if Can-Del(ci (T )) then ▷ case 2a
9: ki (T ) ← Last-Key(ci (T ))
10: Delete(ci (T ), ki (T ))
11: else if Can-Del(ci+1 (T )) then ▷ case 2b
12: ki (T ) ← First-Key(ci+1 (T ))
13: Delete(ci+1 (T ), ki (T ))
14: else ▷ case 2c
15: Merge-Children(T, i)
16: Delete(ci (T ), k)
17: if K(T ) = N IL then
18: T ← ci (T ) ▷ Shrinks height
19: return T
20: else if k < ki (T ) then
21: Break
22: else
23: i←i+1
[Figures 6.12–6.15: step-by-step deletion examples on a B-tree storing letters A–Z.] (a) After deleting key 'D': case 3b, and the height is shrunk. (b) After deleting key 'B': case 3a, borrow from the right sibling. (c) After deleting key 'U': case 3a, borrow from the left sibling.
            tr.keys.remove(key)
        else: # case 2 in CLRS
            if tr.children[i-1].can_remove(): # case 2a
                key = tr.replace_key(i-1, tr.children[i-1].keys[-1])
                B_tree_delete(tr.children[i-1], key)
            elif tr.children[i].can_remove(): # case 2b
                key = tr.replace_key(i-1, tr.children[i].keys[0])
                B_tree_delete(tr.children[i], key)
            else: # case 2c
                tr.merge_children(i-1)
                B_tree_delete(tr.children[i-1], key)
                if tr.keys == []: # tree shrinks in height
                    tr = tr.children[i-1]
        return tr
    elif key > tr.keys[i-1]:
        break
    else:
        i = i - 1
# case 3
if tr.leaf:
    return tr # key doesn't exist at all
if not tr.children[i].can_remove():
    if i > 0 and tr.children[i-1].can_remove(): # case 3a, left sibling
        tr.children[i].keys.insert(0, tr.keys[i-1])
        tr.keys[i-1] = tr.children[i-1].keys.pop()
        if not tr.children[i].leaf:
            tr.children[i].children.insert(0, tr.children[i-1].children.pop())
    elif i < len(tr.children) and tr.children[i+1].can_remove(): # case 3a, right sibling
        tr.children[i].keys.append(tr.keys[i])
        tr.keys[i] = tr.children[i+1].keys.pop(0)
        if not tr.children[i].leaf:
            tr.children[i].children.append(tr.children[i+1].children.pop(0))
    else: # case 3b
        if i > 0:
            tr.merge_children(i-1)
        else:
            tr.merge_children(i)
B_tree_delete(tr.children[i], key)
if tr.keys == []: # tree shrinks in height
    tr = tr.children[0]
return tr
Figure 6.16: Delete a key from a branch node. Removing ki breaks the node
into 2 parts. Merging these 2 parts is a recursive process. When the two parts
are leaves, the merging terminates.
Figure 6.17: After delete key k from node ci , denote the result as c′i . The fixing
makes a new node from the left part, c′i and the right part.
Figure 6.18: Borrow a key-child pair from left part and un-split to a new child.
Denote the B-tree as T = (K, C, t), where K and C are keys and children.
The del(T, k) function deletes key k from the tree.
del(T, k) = (delete(K, k), ϕ, t)                       : C = ϕ
            merge((K1, C1, t), (K2, C2, t))            : ki = k
            make((K1', C1'), del(c, k), (K2', C2'))    : k ∉ K    (6.11)
If k ∉ K, we need to locate a child c, and further delete k from it.
The recursive merge function is defined as the following. When merging two
trees T1 = (K1, C1, t) and T2 = (K2, C2, t), if both are leaves, we create a
new leaf by concatenating the keys. Otherwise, the last child in C1 and the
first child in C2 are recursively merged, and we call the make function to form
the new tree. When C1 and C2 are not empty, denote the last child of C1 as
c1,m and the rest as C1'; the first child of C2 as c2,1 and the rest as C2'.
Below equation defines the merge:
merge(T1, T2) = (K1 ∪ K2, ϕ, t)                                  : C1 = C2 = ϕ
                make((K1, C1'), merge(c1,m, c2,1), (K2, C2'))    : otherwise
(6.12)
The make function defined above only handles the case that a node contains
too many keys due to insertion. When deleting a key, a node may instead
contain too few keys; we need to test and fix this situation as well.
make((K', C'), c, (K'', C'')) = fixFull((K', C'), c, (K'', C''))  : full(c)
                                fixLow((K', C'), c, (K'', C''))   : low(c)
                                (K' ∪ K'', C' ∪ {c} ∪ C'', t)     : otherwise
(6.13)
Where low(T) checks if there are too few keys, i.e., fewer than t − 1. Function
fixLow(Pl, c, Pr) takes three arguments: the left pair of keys and children, a
child node, and the right pair of keys and children. If the left part isn't
empty, we borrow a pair of key and child, and do un-splitting to make the child
contain enough keys, then recursively call make; if the right part isn't empty,
we borrow a pair from the right; and if both sides are empty, we return the
child node as the result. In this case, the height of the tree shrinks.
Denote the left part Pl = (Kl , Cl ). If Kl isn’t empty, the last key and child
are represented as kl,m and cl,m respectively. The rest keys and children become
Kl′ and Cl′ ; Similarly, the right part is denoted as Pr = (Kr , Cr ). If Kr isn’t
empty, the first key and child are represented as kr,1 , and cr,1 . The rest keys
and children are Kr′ and Cr′ . Below equation gives the definition of f ixLow.
fixLow(Pl, c, Pr) = make((Kl', Cl'), unsplit(cl,m, kl,m, c), (Kr, Cr))  : Kl ≠ ϕ
                    make((Kl, Cl), unsplit(c, kr,1, cr,1), (Kr', Cr'))  : Kr ≠ ϕ
                    c                                                   : otherwise
(6.14)
Function unsplit(T1, k, T2) is the inverse operation of splitting. It forms a
new B-tree node from two smaller nodes and a key:
unsplit(T1, k, T2) = (K1 ∪ {k} ∪ K2, C1 ∪ C2, t).
fixLow (ks'@(_:_), cs') c (ks'', cs'') = make (init ks', init cs')
(unsplit (last cs') (last ks') c)
(ks'', cs'')
fixLow (ks', cs') c (ks''@(_:_), cs'') = make (ks', cs')
(unsplit c (head ks'') (head cs''))
(tail ks'', tail cs'')
fixLow _ c _ = c
When deleting the same keys from the B-tree with the two approaches (merge
before delete, and delete then fix), the results may differ. However, both
satisfy the B-tree properties, so they are both valid.
[Figure: the B-trees after deleting a series of keys with the delete-and-fix method; each row shows the tree after one more deletion.]
6.4 Searching
Searching in B-tree can be considered as a generalization of searching in the
binary search tree. In the binary tree, there are only 2 directions to go, left
or right; in the B-tree, there are multiple directions.
1: function Search(T, k)
2: loop
3: i←1
4: while i ≤ |K(T )| ∧ k > ki (T ) do
5: i←i+1
6: if i ≤ |K(T )| ∧ k = ki (T ) then
7: return (T, i)
8: if T is leaf then
9: return N IL ▷ k doesn’t exist
10: else
11: T ← ci (T )
Starting from the root, this program examines each key one by one from the
smallest to the biggest. In case it finds a matched key, it returns the current
node and the index of this key. Otherwise, if it finds a position i such that
ki < k < ki+1, the program next searches the child node ci+1 for the key. If it
traverses to some leaf node and fails to find the key, the empty value is
returned to indicate that this key doesn't exist in the tree.
The following example Python program implements the search algorithm.
def B_tree_search(tr, key):
    while True:
        for i in range(len(tr.keys)):
            if key ≤ tr.keys[i]:
                break
        if key == tr.keys[i]:
            return (tr, i)
        if tr.leaf:
            return None
        else:
            if key > tr.keys[-1]:
                i = i + 1
            tr = tr.children[i]
The search algorithm can also be realized by recursion. When searching for key k in B-tree T = (K, C, t), we partition the keys with k:
K1 = {k′ | k′ < k}
K2 = {k′ | k ≤ k′}
Thus K1 contains all the keys less than k, and K2 holds the rest. If the first element in K2 is equal to k, we have found the key. Otherwise, we recursively search for the key in the child c|K1|+1.
search(T, k) = { (T, |K1| + 1) : k ∈ K2
              { ϕ : C = ϕ
              { search(c|K1|+1, k) : otherwise
(6.16)
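A hedged Haskell sketch of this recursive search, using span to partition the keys with k (the record shape of the B-tree node here is an assumption about the chapter's definition):

data BTree a = Node { keys :: [a], children :: [BTree a] }

search :: Ord a => BTree a -> a -> Maybe (BTree a, Int)
search t k
    | not (null ks2) && head ks2 == k = Just (t, length ks1 + 1) -- found at position |K1| + 1
    | null (children t)               = Nothing                  -- reached a leaf, k doesn't exist
    | otherwise                       = search (children t !! length ks1) k -- descend into c(|K1|+1)
  where (ks1, ks2) = span (< k) (keys t)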
Exercise 6.1
• When inserting a key, we need to find a position where all keys on the left are less than it, while all the others on the right are greater than it. Modify the algorithm so that the elements stored in the B-tree only need to support the less-than and equality tests.
• We assume the element being inserted doesn't exist in the tree. Modify the algorithm so that duplicated elements can be stored in a linked list.
• Eliminate the recursion in the imperative B-tree insertion algorithm.
Part III

Heaps
Chapter 7
Binary Heaps
7.1 Introduction
Heaps are among the most widely used data structures, applied to practical problems such as sorting, prioritized scheduling, and implementing graph algorithms, to name a few [2].
Most popular implementations of heaps use a kind of implicit binary heap by array, as described in [2]; examples include the C++/STL heap and Python heapq. The most efficient heap sort algorithm is also realized with a binary heap, as proposed by R. W. Floyd [3] [5].
However, heaps can be general and realized with a variety of other data structures besides the array. In this chapter, explicit binary trees are used, leading to Leftist heaps, Skew heaps, and Splay heaps, which are suitable for purely functional implementation, as shown by Okasaki [6].
A heap is a data structure that satisfies the following heap property.
• The top operation always returns the minimum (maximum) element;
• The pop operation removes the top element while keeping the heap property, so that the new top element is still the minimum (maximum) one;
• Inserting a new element should keep the heap property, so that the new top is still the minimum (maximum) element;
• Other operations, including merge etc., should all keep the heap property.
This is a kind of recursive definition; it doesn't limit the underlying data structure.
We call a heap with the minimum element on top a min-heap; if the top keeps the maximum element, we call it a max-heap.
return the root as the result. For the pop operation, we can remove the root and rebuild the tree from the children.
A heap implemented with a binary tree is called a binary heap. This chapter explains three different realizations of the binary heap.
7.2 Implicit binary heap by array

7.2.1 Definition
The first realization is the implicit binary tree. Consider the problem of representing a complete binary tree with an array (for example, try to represent a complete binary tree in a programming language that doesn't support structure or record data types, so that only arrays can be used). One solution is to pack all elements from the top level (root) down to the bottom level (leaves).
Figure 7.1 shows a complete binary tree and its corresponding array representation.
(Figure 7.1: a complete binary tree with root 16 and its array representation {16, 14, 10, 8, 7, 9, 3, 2, 4, 1}.)
This mapping between tree and array can be defined by the following functions (the array index starts from 1).
1: function Parent(i)
2: return ⌊i/2⌋
3: function Left(i)
4: return 2i
5: function Right(i)
6: return 2i + 1
For a given tree node represented as the i-th element of the array, since the tree is complete, we can easily find its parent node at the ⌊i/2⌋-th element, its left child at index 2i, and its right child at 2i + 1. If the index of a child exceeds the length of the array, the node does not have that child (it is a leaf, for example).
In a real implementation, this mapping can be calculated quickly with bit-wise operations, as in the following example ANSI C code. Note that the array index starts from zero in C-like languages.
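A minimal sketch of such macros for 0-based indexing (the names LEFT and RIGHT match the Heapify code below, while PARENT is an assumption):

#define PARENT(i) ((((i) + 1) >> 1) - 1) /* 0-based form of floor(i / 2) */
#define LEFT(i)   (((i) << 1) + 1)       /* 0-based form of 2i */
#define RIGHT(i)  (((i) + 1) << 1)       /* 0-based form of 2i + 1 */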
7.2.2 Heapify
The most important thing for a heap algorithm is to maintain the heap property: the top element should be the minimum (maximum) one.
For the implicit binary heap by array, this means that for a given node, represented at the i-th index, we can develop a method to check whether both its children are not less than the parent. In case there is a violation, we swap the parent and child recursively [2]. Note that here we assume both sub-trees are valid heaps.
The algorithm below shows the iterative solution to enforce the min-heap property from a given index of the array.
1: function Heapify(A, i)
2: n ← |A|
3: loop
4: l ← Left(i)
5: r ← Right(i)
6: smallest ← i
7: if l ≤ n ∧ A[l] < A[i] then
8: smallest ← l
9: if r ≤ n ∧ A[r] < A[smallest] then
10: smallest ← r
11: if smallest ̸= i then
12: Exchange A[i] ↔ A[smallest]
13: i ← smallest
14: else
15: return
For array A and the given index i, none of its children should be less than A[i]. In case there is a violation, we pick the smallest element as A[i] and swap the previous A[i] down to the child. The algorithm traverses the tree top-down to fix the heap property until it either reaches a leaf or there is no violation.
The Heapify algorithm takes O(lg n) time, where n is the number of elements, because the number of loop iterations is proportional to the height of the complete binary tree.
When implementing this algorithm, the comparison method can be passed as a parameter, so that both min-heap and max-heap can be supported. The following ANSI C example code uses this approach.
typedef int (*Less)(Key, Key);
int less(Key x, Key y) { return x < y; }
int notless(Key x, Key y) { return !less(x, y); }

void heapify(Key* a, int i, int n, Less lt) {
    int l, r, m;
    while (1) {
        l = LEFT(i);
        r = RIGHT(i);
        m = i;
        if (l < n && lt(a[l], a[i]))
            m = l;
        if (r < n && lt(a[r], a[m]))
            m = r;
        if (m != i) {
            swap(a, i, m);
            i = m;
        } else
            break;
    }
}
Figure 7.2 illustrates the steps when Heapify processes the array {16, 4, 10, 14, 7, 9, 3, 2, 8, 1} from the second index. The array changes to {16, 14, 10, 8, 7, 9, 3, 2, 4, 1}, a max-heap.
The cost of fixing the heap bottom-up when building it is bounded by the sum

S = n(1/4 + 2/8 + 3/16 + ...) (7.1)

which converges to a constant times n, so building is bound to O(n) time.
(Figure 7.2: Heapify example. (a) Step 1: 14 is the biggest element among 4, 14, and 7; swap 4 with the left child. (b) Step 2: 8 is the biggest element among 2, 4, and 8; swap 4 with the right child. The result is the max-heap {16, 14, 10, 8, 7, 9, 3, 2, 4, 1}.)
Figures 7.3, 7.4 and 7.5 show the steps of building a max-heap from the array {4, 1, 3, 2, 16, 9, 10, 14, 8, 7}. The node in black is the one Heapify is being applied to; the nodes in gray are swapped in order to keep the heap property.
Heap Pop
Pop is more complex than accessing the top, because the heap property has to be maintained after the top element is removed.
The solution is to apply the Heapify algorithm after the root is removed.
One simple but slow method based on this idea looks like the following.
1: function Pop-Slow(A)
2: x ← Top(A)
3: Remove(A, 1)
4: if A isn't empty then
5: Heapify(A, 1)
6: return x
Removing the first element of an array takes linear time, so this method is bound to O(n). The efficient approach is to exchange the first element with the last one, shrink the array by one, then apply Heapify from the first index; this bounds pop to O(lg n).
(Figure 7.3: build a heap from the arbitrary array {4, 1, 3, 2, 16, 9, 10, 14, 8, 7}. (b) Step 1: the array is mapped to a binary tree; the first branch node, 16, is examined. (c) Step 2: 16 is the largest element in the current sub-tree; next is to check the node with value 2. Gray nodes are changed in each step; the black node will be processed in the next step.)
(Figure 7.4: build a heap from the arbitrary array, continued. (a) Step 3: 14 is the largest value in the sub-tree; swap 14 and 2; next is to check the node with value 3. (b) Step 4: 10 is the largest value in the sub-tree; swap 10 and 3; next is to check the node with value 1.)
(Figure 7.5: build a heap from the arbitrary array, continued. (a) Step 5: 16 is the largest value in the current sub-tree; swap 16 and 1 first, then similarly swap 1 and 7; next is to check the root node with value 4. (b) Step 6: swap 4 and 16, then swap 4 and 14, and then swap 4 and 8; the whole build process finishes.)
Decrease key
A heap can be used to implement a priority queue, so it is important to support key modification. One typical operation is to increase the priority of a task so that it can be performed earlier.
Here we present the decrease-key operation for a min-heap; the corresponding operation is increase-key for a max-heap. Figures 7.6 and 7.7 illustrate such a case for a max-heap: the key of the 9-th node is increased from 4 to 15.
(Figures 7.6 and 7.7: (a) the original max-heap; (b) the key of the 9-th node is modified to 15, which is greater than its parent; (c) 15 is swapped with its parent 8.)
Once a key is decreased in a min-heap, the node may conflict with the heap property: its key may become less than that of some ancestor. In order to maintain the invariant, the following auxiliary Heap-Fix algorithm is defined to resume the heap property.
(Figure 7.7, continued: since 15 is greater than its parent 14, they are swapped; after that, because 15 is less than 16, the process terminates.)
This algorithm repeatedly compares the keys of the parent node and the current node, and swaps them if the parent contains the bigger key. This process is performed from the current node towards the root, until the parent node holds the smaller key.
With this auxiliary algorithm, decrease key can be realized as below.
1: function Decrease-Key(A, i, k)
2: if k < A[i] then
3: A[i] ← k
4: Heap-Fix(A, i)
This algorithm is only triggered when the new key is less than the original one. The performance is bound to O(lg n). The example ANSI C program below implements the algorithm.
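A minimal sketch of such a program, reusing the Less comparator and the swap helper from the Heapify code, together with the PARENT macro sketched earlier (the function names heap_fix and heap_decrease_key are assumptions):

void heap_fix(Key* a, int i, Less lt) {
    /* bubble the i-th element up while it precedes its parent */
    while (i > 0 && lt(a[i], a[PARENT(i)])) {
        swap(a, i, PARENT(i));
        i = PARENT(i);
    }
}

void heap_decrease_key(Key* a, int i, Key k, Less lt) {
    if (lt(k, a[i])) { /* only triggered when the key actually decreases */
        a[i] = k;
        heap_fix(a, i, lt);
    }
}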
Insertion
Insertion can be implemented by using Decrease-Key [2]. A new node with ∞ as its key is created; according to the min-heap property, it should be the last element in the underlying array. After that, the key is decreased to the value to be inserted, and Decrease-Key is called to fix any violation of the heap property.
Alternatively, we can reuse Heap-Fix to implement insertion. The new key is directly appended at the end of the array, and Heap-Fix is applied to this new node.
1: function Heap-Push(A, k)
2: Append(A, k)
3: Heap-Fix(A, |A|)
The following example Python program implements the heap insertion algo-
rithm.
def heap_insert(x, key, less_p = MIN_HEAP):
i = len(x)
x.append(key)
heap_fix(x, i, less_p)
After exchanging the head (the maximum) of the max-heap with the tail element of the array, the new top may violate the heap property. We can shrink the heap size by one and perform Heapify to resume the heap property. This process is repeated till there is only one element left in the heap.
1: function Heap-Sort(A)
2: Build-Max-Heap(A)
3: while |A| > 1 do
4: Exchange A[1] ↔ A[|A|]
5: |A| ← |A| − 1
6: Heapify(A, 1)
This is an in-place algorithm; it doesn't need any extra space to hold the result.
The following ANSI C example code implements this algorithm.
void heap_sort(Key∗ a, int n) {
build_heap(a, n, notless);
while(n > 1) {
swap(a, 0, --n);
heapify(a, 0, n, notless);
}
}
Exercise 7.1
• For the same reason, can we perform Heapify from left to right k times to realize an in-place top-k algorithm, as in the ANSI C code below?
int tops(int k, Key∗ a, int n, Less lt) {
build_heap(a, n, lt);
for (k = MIN(k, n) - 1; k; --k)
heapify(++a, 0, --n, lt);
return k;
}
7.3 Leftist heap and Skew heap, the explicit binary heaps
(Figure 7.8: a binary tree with root k and children L and R; all elements in the children are not less than k.)
If k is the top element, all elements in the left and right children are not less than k in a min-heap. After k is popped, only the left and right children remain; they have to be merged into a new tree. Since the heap property should be maintained after merging, the new root must still be the smallest element.
Because both the left and right children are binary trees conforming to the heap property, the two trivial cases can be defined immediately.
merge(H1, H2) = { H2 : H1 = ϕ
               { H1 : H2 = ϕ
               { ? : otherwise
Where ϕ means the empty heap.
If neither the left nor the right child is empty, since both satisfy the heap property, their top elements are the respective minimums. We can compare the two roots, and select the smaller one as the new root of the merged heap.
For instance, let L = (A, x, B) and R = (A′, y, B′), where A, A′, B, and B′ are all sub-trees. If x < y, x will be the new root. We can either keep A and recursively merge B and R, or keep B and merge A and R, so the new heap can be one of the following.
• (merge(A, R), x, B)
• (A, x, merge(B, R))
Both are correct. One simplified solution is to always merge into the right sub-tree. The Leftist tree provides a systematic approach based on this idea.
7.3.1 Definition
The heap implemented by a Leftist tree is called a Leftist heap. The Leftist tree was first introduced by C. A. Crane in 1972 [6].
Rank (S-value)
In a Leftist tree, a rank value (or S-value) is defined for each node. Rank is the distance to the nearest external node, where an external node is the NIL concept extended from the leaf node.
For example, in figure 7.9 the rank of NIL is defined as 0. Consider the root node 4: the nearest external node is the child of node 8, so the rank of the root is 2. Because node 6 and node 8 both contain only NIL children, their rank values are 1. Although node 5 has a non-NIL left child, its right child is NIL, so its rank value, the minimum distance to NIL, is still 1.
(Figure 7.9: the rank values of a Leftist tree; the root 4 has rank 2, nodes 5, 6, and 8 have rank 1, and NIL has rank 0.)
Leftist property
With rank defined, we can create a strategy for merging.
• Every time when merging, we always merge into the right child; denote the rank of the new right sub-tree as rr;
• Compare the ranks of the left and right children; if the rank of the left sub-tree is rl and rl < rr, we swap the left and right children.
We call this the 'Leftist property'. In general, a Leftist tree always has the shortest path to some external node on the right.
A Leftist tree tends to be very unbalanced; however, it ensures an important property, as specified in the following theorem.
Theorem 7.3.1. If a Leftist tree T contains n internal nodes, the path from
root to the rightmost external node contains at most ⌊log(n + 1)⌋ nodes.
We skip the proof here; readers can refer to [7] and [1] for more information. With this theorem, algorithms that operate along this path are all bound to O(lg n).
We can reuse the binary tree definition, augmented with a rank field, to define the Leftist tree, for example in the form (r, k, L, R) for the non-empty case. The Haskell code below defines the Leftist tree.
data LHeap a = E -- Empty
| Node Int a (LHeap a) (LHeap a) -- rank, element, left, right
For the empty tree, the rank is defined as zero; otherwise it's the value of the augmented field. A rank(H) function covers both cases.

rank(H) = { 0 : H = ϕ
          { r : otherwise, H = (r, k, L, R)
(7.3)
Here is the example Haskell rank function.
rank E = 0
rank (Node r _ _ _) = r
7.3.2 Merge
In order to realize merge, we first develop an auxiliary algorithm that compares the ranks and swaps the children if necessary.
mk(k, A, B) = { (rA + 1, k, B, A) : rA < rB
             { (rB + 1, k, A, B) : otherwise
(7.4)
This function takes three arguments: a key and two sub-trees A and B. If the rank of A is smaller, it builds a bigger tree with B as the left child and A as the right child, and takes rA + 1 as the rank of the new tree. Otherwise, if B holds the smaller rank, A is set as the left child and B becomes the right; the resulting rank is rB + 1.
The rank needs to be increased by one because a new key is added on top of the tree, which increases the distance to the nearest external node by one.
Denote the keys and the left and right children of H1 and H2 as k1, L1, R1 and k2, L2, R2 respectively. The merge(H1, H2) function can then be completed by using this auxiliary tool as below.
merge(H1, H2) = { H2 : H1 = ϕ
               { H1 : H2 = ϕ
               { mk(k1, L1, merge(R1, H2)) : k1 < k2
               { mk(k2, L2, merge(H1, R2)) : otherwise
(7.5)
The merge function is always recursively called on the right side, and the Leftist property is maintained. These facts ensure the performance is bound to O(lg n).
The following Haskell example code implements the merge program.
merge E h = h
merge h E = h
merge h1@(Node _ x l r) h2@(Node _ y l' r') =
    if x < y then mk x l (merge r h2) else mk y l' (merge h1 r')
  where mk k a b = if rank a < rank b then Node (rank a + 1) k b a
                   else Node (rank b + 1) k a b
For the top operation, with H = (r, k, L, R), we simply return the root key.
top(H) = k (7.6)
For the pop operation, firstly the top element is removed, then the left and right children are merged into a new heap.
Insertion
To insert a new element, one solution is to create a single leaf node with the
element, and then merge this leaf node to the existing Leftist tree.
(Figure 7.10: a Leftist tree built from the list {9, 4, 16, 7, 10, 2, 14, 3, 8, 1}.)
Figure 7.10 shows one example Leftist tree built in this way.
The following example Haskell code gives reference implementation for the
Leftist tree operations.
insert h x = merge (Node 1 x E E) h
findMin (Node _ x _ _) = x
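The pop program isn't shown here; under the same LHeap type, a minimal sketch simply merges the two children after discarding the root:

deleteMin (Node _ _ l r) = merge l r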
With these operations defined, heap sort can be realized by repeatedly popping the minimum:

heapSort(H) = { ϕ : H = ϕ
             { {top(H)} ∪ heapSort(pop(H)) : otherwise
(7.11)
(Figure 7.11: a Leftist tree may become heavily unbalanced for some inputs.)
The Skew heap (or self-adjusting heap) simplifies the Leftist heap realization and intends to solve the balance issue [9] [10].
When constructing the Leftist heap, we swap the left and right children during merging if the rank on the left side is less than that on the right. This compare-and-swap strategy doesn't work well when a sub-tree has only one child, because in that case the rank of the sub-tree is always 1 no matter how big it is. A 'brute-force' approach is to swap the left and right children every time we merge. This idea leads to the Skew heap.
Merge
The merge algorithm is simple. When merging two non-empty Skew trees, we compare the roots and pick the smaller one as the new root; the other tree, containing the bigger element, is merged onto one sub-tree, and finally the two children are swapped. Denote H1 = (k1, L1, R1) and H2 = (k2, L2, R2) if they are not empty. If k1 < k2, for instance, we select k1 as the new root. We can either merge H2 into L1, or merge H2 into R1; without loss of generality, let's merge into R1. After swapping the two children, the final result is (k1, merge(R1, H2), L1). Taking the edge cases into account, the merge algorithm is defined as the following.
merge(H1, H2) = { H1 : H2 = ϕ
               { H2 : H1 = ϕ
               { (k1, merge(R1, H2), L1) : k1 < k2
               { (k2, merge(H1, R2), L2) : otherwise
(7.12)
All the rest of the operations, including insert, top and pop, are realized the same as in the Leftist heap by using merge, except that we don't need the rank any more.
Translating the above algorithm into Haskell yields the following example
program.
merge E h = h
merge h E = h
merge h1@(Node x l r) h2@(Node y l' r') =
if x < y then Node x (merge r h2) l
else Node y (merge h1 r') l'
findMin (Node x _ _) = x
Different from the Leftist heap, if we feed an ordered list to a Skew heap, it builds a fairly balanced binary tree, as illustrated in figure 7.12.
(Figure 7.12: the Skew tree is still balanced even when the input is the ordered list {1, 2, ..., 10}.)
7.4 Splay heap

7.4.1 Definition
The Splay tree uses a cache-like approach: it keeps rotating the currently accessed node close to the top, so that it can be accessed quickly next time. It defines this kind of operation as 'splaying'. For an unbalanced binary search tree, after several splay operations the tree tends to become more and more balanced. Most basic operations of the Splay tree run in amortized O(lg n) time. The Splay tree was invented by Daniel Dominic Sleator and Robert Endre Tarjan in 1985 [11] [12].
Splaying
There are two methods to do splaying. The first one needs to deal with many different cases, but can be implemented fairly easily with pattern matching. The second one has a uniform form, but its implementation is complex.
Denote the node currently being accessed as X, its parent node as P, and its grandparent node as G (if they exist). There are 3 steps for splaying, and each step contains 2 symmetric cases; for illustration purposes, only one case is shown for each step.
• Zig-zig step. As shown in figure 7.13, in this case, X and P are children on the same side of G, either both on the left or both on the right. By rotating 2 times, X becomes the new root.
• Zig-zag step. As shown in figure 7.14, in this case, X and P are children
on different sides. X is on the left, P is on the right. Or X is on the right,
P is on the left. After rotation, X becomes the new root, P and G are
siblings.
• Zig step. As shown in figure 7.15, in this case, P is the root, we rotate the
tree, so that X becomes new root. This is the last step in splay operation.
Although there are 6 different cases, they can be handled in environments that support pattern matching. Denote the non-empty binary tree in the form T = (L, k, R).
(Figure 7.13: zig-zig. (a) X and P are both left children or both right children; (b) X becomes the new root after rotating 2 times.)
(Figure 7.14: zig-zag. (a) X and P are children on different sides; (b) X becomes the new root, P and G become siblings.)
(Figure 7.15: zig. (a) P is the root; (b) rotate the tree to make X the new root.)
When accessing key Y in tree T, the splay operation can be defined as below.

splay(T, Y) = { (a, X, (b, P, (c, G, d))) : T = (((a, X, b), P, c), G, d), X = Y
             { (((a, G, b), P, c), X, d) : T = (a, G, (b, P, (c, X, d))), X = Y
             { ((a, P, b), X, (c, G, d)) : T = ((a, P, (b, X, c)), G, d), X = Y
             { ((a, G, b), X, (c, P, d)) : T = (a, G, ((b, X, c), P, d)), X = Y
             { (a, X, (b, P, c)) : T = ((a, X, b), P, c), X = Y
             { ((a, P, b), X, c) : T = (a, P, (b, X, c)), X = Y
             { T : otherwise
(7.13)
The first two clauses handle the 'zig-zig' cases; the next two clauses handle the 'zig-zag' cases; the last two clauses handle the 'zig' cases. The tree isn't changed in all other situations.
The following Haskell program implements this splay function.
data STree a = E -- Empty
| Node (STree a) a (STree a) -- left, key, right
-- zig-zig
splay t@(Node (Node (Node a x b) p c) g d) y =
if x == y then Node a x (Node b p (Node c g d)) else t
splay t@(Node a g (Node b p (Node c x d))) y =
if x == y then Node (Node (Node a g b) p c) x d else t
-- zig-zag
splay t@(Node (Node a p (Node b x c)) g d) y =
if x == y then Node (Node a p b) x (Node c g d) else t
splay t@(Node a g (Node (Node b x c) p d)) y =
if x == y then Node (Node a g b) x (Node c p d) else t
-- zig
splay t@(Node (Node a x b) p c) y = if x == y then Node a x (Node b p c) else t
splay t@(Node a p (Node b x c)) y = if x == y then Node (Node a p b) x c else t
-- otherwise
splay t _ = t
With the splay operation defined, every time we insert a new key, we call the splay function to adjust the tree. If the tree is empty, the result is a leaf; otherwise we compare the key with the root: if it is less than the root, we recursively insert it into the left child and perform splaying after that; otherwise the key is inserted into the right child.
insert(T, x) = { (ϕ, x, ϕ) : T = ϕ
              { splay((insert(L, x), k, R), x) : T = (L, k, R), x < k
              { splay((L, k, insert(R, x)), x) : otherwise
(7.14)
The following Haskell program implements this insertion algorithm.
insert E y = Node E y E
insert (Node l x r) y
| x>y = splay (Node (insert l y) x r) y
| otherwise = splay (Node l x (insert r y)) y
Figure 7.16 shows the result of using this function. It inserts the ordered elements {1, 2, ..., 10} one by one into the empty tree. With a normal binary search tree this would build a very poor result that degrades to a linked list; the splay method creates a more balanced result.
(Figure 7.16: the splay tree built by inserting {1, 2, ..., 10} in order; splaying keeps it reasonably balanced.)
Okasaki found a simple rule for splaying [6]: whenever we follow two left branches, or two right branches, continuously, we rotate the two nodes.
Based on this rule, splaying can be realized as follows. When we access a node for a key x (during inserting, looking up, or deleting a node), if we traverse two left branches or two right branches, we partition the tree into two parts L and R, where L contains all nodes smaller than x, and R contains all the rest. We can then create a new tree (for instance in insertion) with x as the root, L as the left child, and R as the right child.
The partition process is recursive, because it will splay its children as well.
partition(T, p) = { (ϕ, ϕ) : T = ϕ
                 { (T, ϕ) : T = (L, k, R), k < p, R = ϕ
                 { (((L, k, L′), k′, A), B) : T = (L, k, (L′, k′, R′)), k < p, k′ < p, (A, B) = partition(R′, p)
                 { ((L, k, A), (B, k′, R′)) : T = (L, k, (L′, k′, R′)), k < p ≤ k′, (A, B) = partition(L′, p)
                 { (ϕ, T) : T = (L, k, R), p ≤ k, L = ϕ
                 { (A, (B, k′, (R′, k, R))) : T = ((L′, k′, R′), k, R), p ≤ k, p ≤ k′, (A, B) = partition(L′, p)
                 { ((L′, k′, A), (B, k, R)) : T = ((L′, k′, R′), k, R), k′ ≤ p ≤ k, (A, B) = partition(R′, p)
(7.15)
Function partition(T, p) takes a tree T and a pivot p as arguments. The first clause is the edge case: the partition result for the empty tree is a pair of empty left and right trees. Otherwise, denote the tree as (L, k, R); we need to compare the pivot p with the root k. If k < p, there are two sub-cases. One is the trivial case that R is empty: by the property of the binary search tree, all elements are less than p, so the result pair is (T, ϕ). For the other case, R = (L′, k′, R′), we further compare k′ with the pivot p. If k′ < p is also true, we recursively partition R′ with the pivot; all the elements less than p in R′ are held in tree A, and the rest in tree B. The result pair is then composed of two trees: one is ((L, k, L′), k′, A), and the other is B. If the key of the right sub-tree is not less than the pivot, we recursively partition L′ with the pivot to give the intermediate pair (A, B); the final pair of trees is composed of (L, k, A) and (B, k′, R′). There are symmetric cases for p ≤ k; they are handled in the last three clauses.
Translating the above algorithm into Haskell yields the following partition
program.
partition E _ = (E, E)
partition t@(Node l x r) y
    | x < y =
        case r of
          E → (t, E)
          Node l' x' r' →
              if x' < y then
                  let (small, big) = partition r' y in
                  (Node (Node l x l') x' small, big)
              else
                  let (small, big) = partition l' y in
                  (Node l x small, Node big x' r')
    | otherwise =
        case l of
          E → (E, t)
          Node l' x' r' →
              if y < x' then
                  let (small, big) = partition l' y in
                  (small, Node big x' (Node r' x r))
              else
                  let (small, big) = partition r' y in
                  (Node l' x' small, Node big x r)
Alternatively, insertion can be realized with the partition algorithm. When inserting a new element k into the splay heap T, we first partition the heap into two trees L and R, where L contains all nodes smaller than k, and R contains the rest. We then construct a new node with k as the root and L, R as the children.
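A minimal sketch of this partition-based insertion (named insert' here, an assumption, to avoid clashing with the pattern-matching version above):

insert' t k = Node small k big where (small, big) = partition t k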
For pop, the minimum of a splay heap is at the leftmost node. We keep traversing the left children until reaching a node with an empty left sub-tree, remove it, and rotate along the way, as the deleteMin program below does.
deleteMin (Node E x r) = r
deleteMin (Node (Node E x' r') x r) = Node r' x r
deleteMin (Node (Node l' x' r') x r) = Node (deleteMin l') x' (Node r' x r)
Merge
Merge is another basic operation for heaps, as it is widely used in graph algorithms. By using the partition algorithm, merge can be realized in O(lg n) time.
When merging two splay trees, for the non-trivial case, we take the root of the first tree as the new root, then partition the second tree with this new root as the pivot. After that we recursively merge the children of the first tree with the partition results. This algorithm is defined as the following.
merge(T1, T2) = { T2 : T1 = ϕ
               { (merge(L, A), k, merge(R, B)) : T1 = (L, k, R), (A, B) = partition(T2, k)
(7.19)
If the first heap is empty, the result is the second heap. Otherwise, denote the first splay heap as (L, k, R); we partition T2 with k as the pivot to yield (A, B), where A contains all the elements in T2 less than k, and B holds the rest. We then recursively merge A with L, and B with R, as the new children of T1.
Translating the definition to Haskell gives the following example program.
merge E t = t
merge (Node l x r) t = Node (merge l l') x (merge r r')
where (l', r') = partition t x
It’s very natural to extend the concept from binary tree to k-ary (k-way)
tree, which leads to other useful heaps such as Binomial heap, Fibonacci heap
and pairing heap. They are introduced in the following chapters.
Exercise 7.2
• Realize the imperative Leftist heap, Skew heap, and Splay heap.
Bibliography
[12] Sleator, Daniel D.; Tarjan, Robert E. (1985). "Self-Adjusting Binary Search Trees". Journal of the ACM 32(3): 652-686. doi:10.1145/3828.3835
[13] NIST, "binary heap". http://xw2k.nist.gov/dads//HTML/binaryheap.html
Chapter 8

From grape to the world cup, the evolution of selection sort
8.1 Introduction
We have introduced the 'hello world' sorting algorithm, insertion sort. In this short chapter, we explain another straightforward sorting method, selection sort. The basic version of selection sort doesn't perform as well as the divide and conquer methods, e.g. quick sort and merge sort. We'll use the same approaches as in the chapter on insertion sort to analyze why it's slow, and try to improve it through various attempts, until we reach the best bound of comparison-based sorting, O(n lg n), by evolving to heap sort.
The idea of selection sort can be illustrated by a real-life story. Consider a kid eating a bunch of grapes. There are two types of children according to my observation: one is the optimistic type, where the kid always eats the biggest grape he or she can find; the other is pessimistic, always eating the smallest one.
The first type of kid eats the grapes in an order whose sizes decrease monotonically, while the other eats in increasing order. The kid in fact sorts the grapes by size, and the method used here is selection sort.
Based on this idea, the algorithm of selection sort can be directly described as the following.
In order to sort a series of elements:
• The trivial case: if the series is empty, then we are done; the result is also empty;
• Otherwise, we find the smallest element, and append it to the tail of the result.
Note that this algorithm sorts the elements in increasing order; it's easy to sort in decreasing order by picking the biggest element instead. We'll introduce passing a comparator as a parameter later on.
This idea can be formalized as the following equation.

sort(A) = { ϕ : A = ϕ
          { {m} ∪ sort(A′) : otherwise

where
m = min(A)
A′ = A − {m}
We don’t limit the data structure of the collection here. Typically, A is an
array in imperative environment, and a list (singly linked-list particularly) in
functional environment, and it can even be other data struture which will be
introduced later.
The algorithm can also be given in imperative manner.
function Sort(A)
X←ϕ
while A ̸= ϕ do
x ← Min(A)
A ← Del(A, x)
X ← Append(X, x)
return X
Figure 8.2 depicts the process of this algorithm.
(Figure 8.2: the left part is sorted data; continuously pick the minimum element in the rest and append it to the result.)
We just translate the very original idea of 'eating grapes' line by line without considering any expense of time and space. This realization stores the result in an extra collection; alternatively, the selected minimum can be put at the right position within the same collection, as figure 8.3 shows.
(Figure 8.3: the left part is sorted data; continuously pick the minimum element in the rest and put it to the right position.)

8.2 Finding the minimum
We haven’t completely realized the selection sort, because we take the operation
of finding the minimum (or the maximum) element as a black box. It’s a puzzle
how does a kid locate the biggest or the smallest grape. And this is an interesting
topic for computer algorithms.
The easiest but not so fast way to find the minimum in a collection is to perform a scan. There are several ways to interpret this scan process. Consider that we want to pick the biggest grape. We start from any grape, compare it with another one, and pick the bigger; then we take the next grape and compare it with the one selected so far, pick the bigger and continue this take-and-compare process, until there aren't any grapes left to compare.
It's easy to get lost in real practice if we don't mark which grapes have been compared. There are two ways to solve this problem, which suit different data structures respectively.
8.2.1 Labeling
Method 1 is to label each grape with a number {1, 2, ..., n}, and systematically perform the comparison in the order of this sequence of labels. We first compare grape number 1 and grape number 2 and pick the bigger one; then we take grape number 3 and do the comparison, ... We repeat this process until we arrive at grape number n. This is quite suitable for elements stored in an array.
function Min(A)
m ← A[1]
for i ← 2 to |A| do
if A[i] < m then
m ← A[i]
return m
With Min defined, we can complete the basic version of selection sort (the naive version, without any optimization in terms of time and space).
However, this algorithm returns the value of the minimum element instead of its location (the label of the grape), which needs a bit of tweaking for the in-place version. Some languages, such as ISO C++, support returning a reference as the result, so that the swap can be achieved directly, as below.
template<typename T>
T& min(T∗ from, T∗ to) {
T∗ m;
for (m = from++; from != to; ++from)
if (∗from < ∗m)
m = from;
return ∗m;
}
template<typename T>
void ssort(T∗ xs, int n) {
for (int i = 0; i < n; ++i)
std::swap(xs[i], min(xs+i, xs+n));
}
The following example Python program implements the in-place version, where min_at returns the position of the minimum of xs[i:n].
def ssort(xs):
    n = len(xs)
    for i in range(n):
        m = min_at(xs, i, n)
        (xs[i], xs[m]) = (xs[m], xs[i])
    return xs
8.2.2 Grouping
Another method is to group all grapes into two parts: the group we have examined, and the rest we haven't. We denote these two groups as A and B, and all the elements (grapes) as L. At the beginning, we haven't examined any grapes at all, so A is empty (ϕ), and B contains all grapes. We select two arbitrary grapes from B, compare them, and put the loser (the smaller one, for example) into A. After that, we repeat this process by continuously picking arbitrary grapes from B and comparing them with the winner of the previous round, until B becomes empty. At this point, the final winner is the minimum element, and A turns out to be L − {min(L)}, which can be used for the next round of minimum finding.
There is an invariant in this method: at any time, we have L = A ∪ {m} ∪ B, where m is the winner held so far.
This approach doesn't need the collection of grapes to be indexed (as with the labeling in method 1). It's suitable for any traversable data structure, including linked lists etc. Suppose b1 is an arbitrary element in B if B isn't empty, and B′ is the rest of the elements with b1 removed; this method can be formalized as the auxiliary function below.
min′(A, m, B) = { (m, A) : B = ϕ
               { min′(A ∪ {m}, b1, B′) : b1 < m
               { min′(A ∪ {b1}, m, B′) : otherwise
(8.2)
In order to pick the minimum element, we call this auxiliary function by passing an empty A, and use an arbitrary element (for instance, the first one) to initialize m:

min(L) = min′(ϕ, l1, L′) (8.3)
extractMin(L) = { (l1, ϕ) : |L| = 1
               { (l1, L′) : l1 < m, (m, L′′) = extractMin(L′)
               { (m, {l1} ∪ L′′) : otherwise
(8.4)
If L is a singleton, the minimum is the only element it contains. Otherwise, denote l1 as the first element in L, and L′ as the rest elements except for l1, that is L′ = {l2, l3, ...}. The algorithm recursively finds the minimum element in L′, which yields the intermediate result (m, L′′): m is the minimum element in L′, and L′′ contains all the rest elements except for m. Comparing l1 with m, we can determine which of them is the final minimum.
The following Haskell program implements this version of selection sort.
sort [] = []
sort xs = x : sort xs' where
(x, xs') = extractMin xs
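extractMin isn't listed here; a minimal sketch following equation (8.4):

extractMin [x] = (x, [])
extractMin (x:xs) = if x < m then (x, xs) else (m, x:xs')
    where (m, xs') = extractMin xs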
Exercise 8.1
• Implement the basic imperative selection sort algorithm (the non-in-place version) in your favorite programming language. Compare it with the in-place version, and analyze the time and space effectiveness.
The sorting algorithm, parameterized with a comparator c, can be defined as below.

sort(c, L) = { ϕ : L = ϕ
             { {m} ∪ sort(c, L′′) : otherwise, (m, L′′) = extract(c, L)
(8.5)
And the algorithm extract(c, L) is defined as below.

extract(c, L) = { (l1, ϕ) : |L| = 1
               { (l1, L′) : c(l1, m), (m, L′′) = extract(c, L′)
               { (m, {l1} ∪ L′′) : ¬c(l1, m)
(8.6)
Where c is a comparator function: it takes two elements, compares them, and tells which one precedes the other. Passing the 'less than' operator (<) turns this algorithm into the version introduced in the previous section.
Some environments require passing a total ordering comparator, which returns one of 'less than', 'equal', and 'greater than'. We don't need such a strong condition here; c only tests whether 'less than' is satisfied. However, as the minimum requirement, the comparator should meet the strict weak ordering conditions [16]:
• Irreflexivity: for all x, it's not the case that x < x;
• Asymmetry: for all x and y, if x < y, then it's not the case that y < x;
• Transitivity: for all x, y and z, if x < y and y < z, then x < z.
Note that both ssort and extract-min are inner functions, so the 'less than' comparator ltp? is available to them. Passing '<' to this function yields the normal sorting in ascending order:
(sel-sort-by < '(3 1 2 4 5 10 9))
;Value 16: (1 2 3 4 5 9 10)
We need not find the minimum if there is only one element in the list. This indicates that the outer loop can iterate to n − 1 instead of n.
Another place we can fine-tune is that we needn't swap the elements if the i-th minimum is already A[i]. The algorithm can be modified accordingly as below:
procedure Sort(A)
  for i ← 1 to |A| − 1 do
    m ← i
    for j ← i + 1 to |A| do
      if A[j] < A[m] then
        m ← j
    if m ≠ i then
      Exchange A[i] ↔ A[m]
Definitely, these modifications won't affect the performance in terms of big-O.
(Figure 8.4: select the maximum every time and put it to the end.)
This version reveals the fact that selecting the maximum element can sort the elements in ascending order as well. What's more, we can find both the minimum and the maximum elements in one pass of traversing, putting the minimum at the first location and the maximum at the last position. This approach speeds up the sorting slightly (it halves the number of outer loop iterations). This method is called 'cock-tail sort'.
procedure Sort(A)
  for i ← 1 to ⌊|A|/2⌋ do
    min ← i
    max ← |A| + 1 − i
    if A[max] < A[min] then
      Exchange A[min] ↔ A[max]
    for j ← i + 1 to |A| − i do
      if A[j] < A[min] then
        min ← j
      if A[max] < A[j] then
        max ← j
    Exchange A[i] ↔ A[min]
    Exchange A[|A| + 1 − i] ↔ A[max]
This algorithm can be illustrated as in figure 8.5: at any time, the leftmost and rightmost parts contain the sorted elements so far, with the smaller sorted ones on the left and the bigger sorted ones on the right. The algorithm scans the unsorted range, locates both the minimum and the maximum positions, then puts them at the head and the tail of the unsorted range by swapping.
(Figure 8.5: select both the minimum and maximum in one pass, and put them to the proper positions: ...sorted small ones... x ...max...min... y ...sorted big ones...)
Note that it's necessary to swap the leftmost and rightmost elements before the inner loop if they are not in the correct order. This is because we scan the range excluding these two elements. Another method is to initialize the first element of the unsorted range as both the maximum and minimum before the inner loop. However, since we need two swapping operations after the scan, it's possible that the first swap moves the maximum or the minimum away from the position we just found, which would make the second swap malfunction. How to solve this problem is left as an exercise to the reader.
The following Python example program implements this cock-tail sort algorithm.
def cocktail_sort(xs):
    n = len(xs)
    for i in range(n // 2):
        (mi, ma) = (i, n - 1 - i)
        if xs[ma] < xs[mi]:
            (xs[mi], xs[ma]) = (xs[ma], xs[mi])
        for j in range(i + 1, n - 1 - i):
            if xs[j] < xs[mi]:
                mi = j
            if xs[ma] < xs[j]:
                ma = j
        (xs[i], xs[mi]) = (xs[mi], xs[i])
        (xs[n - 1 - i], xs[ma]) = (xs[ma], xs[n - 1 - i])
    return xs
• The trivial edge case: if the list is empty, or there is only one element in the list, the sorted result is obviously the original list;
• Otherwise, we select the minimum and the maximum, put them in the head and tail positions, then recursively sort the rest of the elements.
sort′(A, L, B) = { A ∪ L ∪ B : L = ϕ ∨ |L| = 1
                { sort′(A ∪ {lmin}, L′′, {lmax} ∪ B) : otherwise
(8.9)
Where lmin, lmax and L′′ are defined as before. We start sorting by passing empty A and B: sort(L) = sort′(ϕ, L, ϕ).
Besides the edge cases, observe that the appending operation only happens on A ∪ {lmin}, while lmax is only linked to the head of B. This appending occurs in every recursive call. To eliminate it, we can store A in reverse order, so that lmin can be 'cons'ed to the head instead of appended. Denote cons(x, L) = {x} ∪ L and append(L, x) = L ∪ {x}; we have the equation below.

sort′(A, L, B) = { reverse(A) ∪ B : L = ϕ
                { reverse({l1} ∪ A) ∪ B : |L| = 1
                { sort′({lmin} ∪ A, L′′, {lmax} ∪ B) : otherwise
(8.11)
This algorithm can be implemented by Haskell as below.
csort' xs = cocktail [] xs [] where
cocktail as [] bs = reverse as ++ bs
cocktail as [x] bs = reverse (x:as) ++ bs
cocktail as xs bs = let (mi, xs', ma) = extractMinMax xs
in cocktail (mi:as) xs' (ma:bs)
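extractMinMax isn't listed here; a minimal sketch that scans once, keeping the current minimum, the rest, and the current maximum (it assumes at least two elements, which the [] and [x] clauses above guarantee):

extractMinMax (x:y:rest) = foldr step (min x y, [], max x y) rest
    where step e (mi, acc, ma)
              | e < mi    = (e, mi:acc, ma) -- new minimum; the old one joins the rest
              | ma < e    = (mi, ma:acc, e) -- new maximum
              | otherwise = (mi, e:acc, ma)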
Exercise 8.2
• Realize the imperative basic selection sort algorithm, which can take a comparator as a parameter. Please try both a dynamically typed language and a statically typed language. How can the type of the comparator be annotated as generally as possible in a statically typed language?
• Implement Knuth's version of selection sort in your favorite programming language.
• An alternative way to realize cock-tail sort is to assume the i-th element is both the minimum and the maximum; after the inner loop, the minimum and maximum are found, and we can then swap the minimum to the i-th position and the maximum to position |A| + 1 − i. Implement this solution in your favorite imperative language. Please note that there are several special edge cases that should be handled correctly.
teams are quite strong, one of them must be knocked out early. It's quite possible that the team which loses that game could beat all the other teams except for the champion. Figure 8.6 illustrates such a case.
(Figure 8.6: the tournament tree built from {7, 6, 15, 16, 8, 4, 13, 3, 5, 10, 9, 1, 12, 2, 11, 14}; the champion is 16.)
Imagine that every team has a number: the bigger the number, the stronger the team. Suppose that the stronger team always beats the team with the smaller number; although this is not true in the real world, this simplification is fair enough for us to develop the tournament knock-out solution. The maximum number, which represents the champion, is 16. Definitely, the team with number 14 isn't the second best according to our rules; it should be 15, which was knocked out in the first round of comparison.
The key question here is to find an effective way to locate the second maximum number in this tournament tree. After that, we apply the same method to select the third, the fourth, ..., to accomplish the selection-based sort.
One idea is to assign the champion a very small number (for instance, −∞), so that it won't be selected next time, and the second best one becomes the new champion. However, suppose there are 2^m teams for some natural number m; it still takes 2^(m−1) + 2^(m−2) + ... + 2 + 1 = 2^m − 1 comparisons to determine the new champion, which is as slow as the first time.
Actually, we needn’t perform a bottom-up comparison at all since the tour-
nament tree stores plenty of ordering information. Observe that, the second
best team must be beaten by the champion at sometime, or it will be the final
winner. So we can track the path from the root of the tournament tree to the
leaf of the champion, examine all the teams along with this path to find the
second best team.
In figure 8.6, this path is marked in gray color, the elements to be examined
are {14, 13, 7, 15}. Based on this idea, we refine the algorithm like below.
1. Build a tournament tree from the elements to be sorted, so that the cham-
pion (the maximum) becomes the root;
2. Extract the root from the tree, perform a top-down pass and replace the
maximum with −∞;
3. Perform a bottom-up back-track along the path, determine the new cham-
pion and make it as the new root;
4. Repeat step 2 until all elements have been extracted.
Figure 8.7, 8.8, and 8.9 show the steps of applying this strategy.
(Figure 8.7: after the champion 16 is replaced with −∞ and the path is back-tracked, 15 becomes the new champion. Figure 8.8: after 15 is extracted, 14 becomes the champion. Figure 8.9: after 14 is extracted, 13 becomes the champion.)
We can reuse the binary tree definition given in the first chapter of this book to represent the tournament tree. In order to back-track from a leaf to the root, every node should hold a reference to its parent (the concept of a pointer in environments such as ANSI C):
struct Node {
Key key;
struct Node ∗left, ∗right, ∗parent;
};
This algorithm first takes O(n) time to build the tournament tree, then performs n pops to extract the maximum elements left in the tree so far. Since each pop operation is bound to O(lg n), the total performance of tournament knock-out sorting is O(n lg n).
For the purely functional representation, a binary tree is either empty or a branch node containing a key, a left sub-tree and a right sub-tree, where both children are again binary trees.
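A minimal sketch of such a definition (the constructor names Empty and Br match the code below; keys will be wrapped with the explicit infinity introduced shortly):

data Tr a = Empty | Br (Tr a) a (Tr a) -- left, key, right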
We’ve use hard coded big negative number to represents −∞. However, this
solution is ad-hoc, and it forces all elements to be sorted are greater than this
pre-defined magic number. Some programming environments support algebraic
type, so that we can define negative infinity explicitly. For instance, the below
Haskell program setups the concept of infinity 2 .
data Infinite a = NegInf | Only a | Inf deriving (Eq, Ord)
From now on, we switch back to using the min() function to determine the winner, so that the tournament selects the minimum instead of the maximum as the champion.
Denote by key(T) the function that returns the key of the tree rooted at T. Function wrap(x) wraps the element x into a leaf node. Function tree(l, k, r) creates a branch node, with k as the key, and l and r as the two children respectively.
The knock-out process can be represented as comparing two trees, picking the smaller key as the new key, and setting these two trees as children:
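The definition itself was elided here; a minimal Haskell sketch (the name branch is an assumption):

branch t1 t2 = Br t1 (min (key t1) (key t2)) t2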
(Footnote 2, continued: ...to derive the default, correct comparing behavior of Ord. It's possible to specify the detailed order by making the type an instance of Ord; however, this is a language-specific feature out of the scope of this book. Please refer to other textbooks about Haskell.)
Note that this algorithm also handles another special case: the list to be sorted is empty, for which the result is obviously empty.
Denote T = {T1, T2, ...} if there are at least two trees, and let T′ represent the remaining trees after removing the first two. The function pair(T) is defined as the following.
pair(T) = { {branch(T1, T2)} ∪ pair(T′) : |T| ≥ 2
          { T : otherwise
(8.15)
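The building program was elided; a minimal sketch that repeatedly pairs the trees until a single one remains (the names pair, build, and fromList are assumptions, and the input list is assumed non-empty):

pair (t1:t2:ts) = branch t1 t2 : pair ts
pair ts = ts

fromList = build . map wrap
    where build [t] = t
          build ts  = build (pair ts)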
The champion is removed by a top-down pass that replaces it with ∞ and updates the keys along the path:

pop(T) = { tree(ϕ, ∞, ϕ) : L = ϕ ∧ R = ϕ
         { tree(L′, min(key(L′), key(R)), R) : K = key(L), L′ = pop(L)
         { tree(L, min(key(L), key(R′)), R′) : K = key(R), R′ = pop(R)
(8.16)
It’s straightforward to translate this algorithm into example Haskell code.
pop (Br Empty _ Empty) = Br Empty Inf Empty
pop (Br l k r) | k == key l = let l' = pop l in Br l' (min (key l') (key r)) r
| k == key r = let r' = pop r in Br l (min (key l) (key r')) r'
Note that this algorithm only removes the current champion without returning it, so it's necessary to define a function to get the champion at the root node.
sort′(T) = { ϕ : T = ϕ ∨ key(T) = ∞
           { {top(T)} ∪ sort′(pop(T)) : otherwise
(8.19)
The rest of the Haskell code is given below to complete the implementation. The auxiliary functions only, key, and wrap, with explicit infinity support, are listed as follows; top can be defined as the composition only . key.
only (Only x) = x
key (Br _ k _ ) = k
wrap x = Br Empty (Only x) Empty
top = only . key
Exercise 8.3
• Why can our tournament tree knock-out sort algorithm handle duplicated elements (elements with the same value)? We say a sorting algorithm is stable if it keeps the original order of elements with the same value. Is the tournament tree knock-out sorting stable?
• Compare the tournament tree knock-out sort algorithm and the binary tree sort algorithm; analyze their efficiency in both time and space.
• Compare the heap sort algorithm and the binary tree sort algorithm, and do the same analysis for them.
The final sorting structure described in equation 8.19 can easily be unified into a more general form, if we treat a tree whose root holds infinity as the empty tree:

sort′(T) = { ϕ : T = ϕ
           { {top(T)} ∪ sort′(pop(T)) : otherwise
(8.20)
This is exactly the same as the heap sort equation given in the previous chapter. A heap always keeps the minimum (or the maximum) on top, and provides a fast pop operation. The binary heap by implicit array encodes the tree structure in the array indices, so no extra space is allocated except for the n array cells; the functional heaps, such as the Leftist heap and Splay heap, allocate n nodes as well. We'll introduce more heaps in the next chapter, which perform well in many aspects.
Chapter 9

Binomial heap, Fibonacci heap, and pairing heap
9.1 Introduction
In the previous chapter, we mentioned that heaps can be generalized and implemented with a variety of data structures. However, only binary heaps have been the focus so far, whether by explicit binary trees or implicit arrays.
It's quite natural to extend the binary tree to a K-ary [1] tree. In this chapter, we first show Binomial heaps, which actually consist of a forest of K-ary trees. Binomial heaps bound all operations to O(lg n) time, while keeping finding the minimum element at O(1).
If we delay some operations in Binomial heaps by using a lazy strategy, they turn into Fibonacci heaps.
All the binary heaps we have shown perform no less than O(lg n) time for merging; we'll show it's possible to improve this to O(1) with the Fibonacci heap, which is quite helpful for graph algorithms. Actually, the Fibonacci heap achieves a good amortized O(1) bound for almost all operations, leaving only the heap pop at O(lg n).
Finally, we'll introduce the pairing heap. It has the best performance in practice, although the proof of this is still a conjecture for the time being.
Binomial tree
In order to explain why the tree is named 'binomial', let's review the famous Pascal's triangle (also known as the Jia Xian's triangle, to memorize the Chinese mathematician Jia Xian (1010-1070)) [4].
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
...
In each row, the numbers are all binomial coefficients. There are many ways to generate a series of binomial coefficients; one of them is by recursive composition. Binomial trees, as well, can be defined in this way, as the following.
• A binomial tree of rank 0, B0, contains only one node;
• A binomial tree of rank n is formed by linking two rank n−1 binomial trees, where one becomes the leftmost child of the other.
(Figure 9.1: (a) a B0 tree; (b) a Bn tree formed from two Bn−1 trees.)
With this recursive definition, it's easy to draw the binomial trees of rank 0, 1, 2, ..., as shown in figure 9.2.
Observing the binomial trees reveals some interesting properties. For each rank-n binomial tree, if we count the number of nodes in each row, we find the binomial coefficients.
For instance, for the rank-4 binomial tree, there is 1 node as the root; in the second level next to the root, there are 4 nodes; in the 3rd level there are 6 nodes; in the 4th level there are 4 nodes; and at the 5th level there is 1 node.
(Figure 9.2: binomial trees of rank 0, 1, 2, 3, and (e) the B4 tree; each node is labeled with the rank of the sub-tree it roots.)
These counts are exactly 1, 4, 6, 4, 1, which is the 5th row in Pascal's triangle. That's why we call it a binomial tree.
Another interesting property is that the total number of nodes of a binomial tree with rank n is 2^n. This can be proved either by the binomial theorem or directly from the recursive definition.
Binomial heap
With the binomial tree defined, we can introduce the definition of the binomial heap. A binomial heap is a set of binomial trees (a forest of binomial trees) that satisfies the following properties.
• Each binomial tree in the heap conforms to the heap property: the key of a node is equal to or greater than the key of its parent. Here the heap is actually a min-heap; for a max-heap, it changes to 'equal to or less than'. In this chapter, we only discuss the min-heap; the max-heap can be equally applied by changing the comparison condition.
• There is at most one binomial tree of any rank r. In other words, no two binomial trees have the same rank.
This definition leads to an important result: for a binomial heap containing n elements, if converting n to binary format yields a0, a1, a2, ..., am, where a0 is the LSB and am the MSB, then for each 0 ≤ i ≤ m, if ai = 0 there is no binomial tree of rank i, and if ai = 1 there must be a binomial tree of rank i.
For example, if a binomial heap contains 5 elements, as 5 is '(LSB)101(MSB)', there are 2 binomial trees in this heap: one of rank 0, the other of rank 2.
Figure 9.3 shows a binomial heap with 19 nodes; as 19 is '(LSB)11001(MSB)' in binary format, there is a B0 tree, a B1 tree and a B4 tree.
(Figure 9.3: a binomial heap with 19 nodes, consisting of a B0 tree (18), a B1 tree rooted at 3, and a B4 tree rooted at 6.)
Data layout
There are two ways to define K-ary trees imperatively. One is the 'left-child, right-sibling' approach [2]. It is compatible with the typical binary tree structure: each node has two fields, a left field and a right field. We use the left field to point to the first child of the node, and the right field to point to its sibling. All siblings are represented as a singly linked list. Figure 9.4 shows an example tree represented in this way.
(Figure 9.4: a node R whose children C1, C2, ..., Cn are chained through the sibling pointers.)
The other way is to use a library-defined collection container, such as an array or a list, to represent all children of a node.
Since the rank of a tree plays a very important role, we also define it as a field.
For the 'left-child, right-sibling' method, we define the binomial tree as the following.
class BinomialTree:
def __init__(self, x = None):
self.rank = 0
self.key = x
self.parent = None
self.child = None
self.sibling = None
When initializing a tree with a key, we create a leaf node, set its rank to zero, and set all other fields to NIL.
It is quite natural to utilize a pre-defined list to represent multiple children, as below.
class BinomialTree:
def __init__(self, x = None):
self.rank = 0
self.key = x
self.parent = None
self.children = []
For purely functional settings, such as in Haskell language, binomial tree are
defined as the following.
data BiTree a = Node { rank :: Int
, root :: a
, children :: [BiTree a]}
A binomial heap is then defined as a list of binomial trees (a forest) with ranks in monotonically increasing order; as another implicit constraint, no two binomial trees have the same rank.
type BiHeap a = [BiTree a]
When linking two binomial trees of the same rank, the tree whose root holds the bigger key becomes the first child of the other:

link(T1, T2) = { node(r + 1, x, {T2} ∪ C1) : x < y
              { node(r + 1, y, {T1} ∪ C2) : otherwise

where
x = Key(T1), y = Key(T2)
r = Rank(T1) = Rank(T2)
C1 = Children(T1), C2 = Children(T2)

(Figure 9.5: if x < y, y is linked as the first child of x.)
Note that the link operation is bound to O(1) time if ∪ is a constant-time operation. It's easy to translate the link function into the following Haskell program.
link t1@(Node r x c1) t2@(Node _ y c2) =
if x<y then Node (r+1) x (t2:c1)
else Node (r+1) y (t1:c2)
It’s possible to realize the link operation in imperative way. If we use ‘left
child, right sibling’ approach, we just link the tree which has the bigger key to
the left side of the other’s key, and link the children of it to the right side as
sibling. Figure 9.6 shows the result of one case.
1: function Link(T1 , T2 )
2: if Key(T2 ) < Key(T1 ) then
3: Exchange T1 ↔ T2
4: Sibling(T2 ) ← Child(T1 )
5: Child(T1 ) ← T2
6: Parent(T2 ) ← T1
7: Rank(T1 ) ← Rank(T1 ) + 1
8: return T1
(Figure 9.6: suppose x < y; y is linked to the left (child) side of x, and the original children of x become the right-side siblings of y.)
Exercise 9.1
Implement the tree-linking program in your favorite language with the left-child, right-sibling method.
(Footnote: the C and C++ programs are also available along with this book.)
insertT(H, T) = { {T} : H = ϕ
               { {T} ∪ H : Rank(T) < Rank(T1)
               { insertT(H′, link(T, T1)) : otherwise
(9.2)
where
H′ = {T2, T3, ..., Tn}
The idea is as follows. For the empty heap, the new tree becomes the only element, creating a singleton forest. Otherwise, we compare the rank of the new tree with that of the first tree in the forest: if they are the same, we link the two trees together and recursively insert the linked result (a tree whose rank is increased by one) into the rest of the forest; if they are not the same, then since the precondition constrains the rank of the new tree to be no bigger than that of any tree in the forest, it must be the smallest, so we put the new tree in front of all the other trees.
From the binomial properties mentioned above, there are at most O(lg n) binomial trees in the forest, where n is the total number of nodes. Thus the function insertT performs at most O(lg n) linkings, each a constant-time operation, so the performance of insertT is O(lg n).3
The corresponding Haskell program is given below.
insertTree [] t = [t]
insertTree ts@(t':ts') t = if rank t < rank t' then t:ts
else insertTree ts' (link t t')
With this auxiliary function, it's easy to realize the insertion: we wrap the new element as the only leaf of a tree, then insert this tree into the binomial heap, as sketched below.
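In Haskell, for instance, this wrapping is a one-liner (a sketch using the BiTree constructor defined above):

insert :: Ord a => BiHeap a -> a -> BiHeap a
insert h x = insertTree h (Node 0 x [])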
Since wrapping an element as a singleton tree takes O(1) time and the real work is done in insertT, the performance of binomial heap insertion is bound to O(lg n).
The insertion algorithm can also be realized with an imperative approach. With the ‘left-child, right-sibling’ representation, we continuously link the first tree in the heap with the new tree to be inserted as long as they have the same rank; after that, the linked list of the remaining trees is attached as the sibling of the result, which becomes the new first tree.
If a container is used to manage the children of a node, the same idea applies. Function Pop removes the first tree T1 = H[0] from the forest, and function Head-Insert inserts a new tree before all the other trees in the heap, so that it becomes the first element of the forest.
With either variant of tree insertion defined, realizing the binomial heap insertion is trivial.
Exercise 9.2
Write the insertion program in your favorite imperative programming lan-
guage by using the ‘left-child, right-sibling’ approach.
merge(H1, H2) = { H1 : H2 = ϕ
               { H2 : H1 = ϕ
               { {T1} ∪ merge(H1', H2) : Rank(T1) < Rank(T1')    (9.4)
               { {T1'} ∪ merge(H1, H2') : Rank(T1) > Rank(T1')
               { insertT(merge(H1', H2'), link(T1, T1')) : otherwise

where T1 and T1' are the first trees in H1 and H2, and H1' and H2' are the remaining trees: H1 = {T1} ∪ H1', H2 = {T1'} ∪ H2'.
To analyze the performance of merge, suppose there are m1 trees in H1 and m2 trees in H2. There are at most m1 + m2 trees in the merged result. If no two trees have the same rank, the merge operation is bound to O(m1 + m2) time. If linking is needed for trees of the same rank, insertT performs at most O(m1 + m2) time. Considering that m1 = 1 + ⌊lg n1⌋ and m2 = 1 + ⌊lg n2⌋, where n1, n2 are the numbers of nodes in each heap, and that ⌊lg n1⌋ + ⌊lg n2⌋ ≤ 2⌊lg n⌋, where n = n1 + n2 is the total number of nodes, the final performance of merging is O(lg n).
Translating this algorithm to Haskell yields the following program.
merge ts1 [] = ts1
merge [] ts2 = ts2
merge ts1@(t1:ts1') ts2@(t2:ts2')
| rank t1 < rank t2 = t1:(merge ts1' ts2)
| rank t1 > rank t2 = t2:(merge ts1 ts2')
| otherwise = insertTree (merge ts1' ts2') (link t1 t2)
Figure 9.7: Merging two binomial heaps. (a) If the two first trees have different ranks, put the one with the smaller rank in front and merge the rest. (b) If two trees have the same rank, link them into a new tree, and recursively insert it into the merge result of the rest.
In the Append-Tree algorithm, the rank of the new tree to be appended cannot be less than the rank of any tree already in the result heap, according to our merge strategy; however, it may equal the rank of the last tree in the result heap. This can happen when the last tree appended is itself the result of a linking, which increases the rank by one. In that case, we must link the new tree with the last tree. In the algorithm below, suppose function Last(H) refers to the last tree in a heap, and Append(H, T) appends a new tree at the end of a forest.
1: function Append-Tree(H, T )
2: if H ̸= ϕ∧ Rank(T ) = Rank(Last(H)) then
3: Last(H) ← Link(T , Last(H))
4: else
5: Append(H, T )
Function Append-Trees repeatedly calls this function to append all trees of one heap to the other.
1: function Append-Trees(H1 , H2 )
2: for each T ∈ H2 do
3: H1 ← Append-Tree(H1 , T )
The following Python program translates the Append-Tree algorithm.
def append_tree(ts, t):
if ts != [] and ts[-1].rank == t.rank:
ts[-1] = link(ts[-1], t)
else:
ts.append(t)
return ts
Exercise 9.3
The program given above uses a container to manage sub-trees. Implement
the merge algorithm in your favorite imperative programming language with
‘left-child, right-sibling’ approach.
Pop
Among the forest which forms the binomial heap, each binomial tree conforms to the heap property: the root contains the minimum element of that tree. However, the order relationship among these roots can be arbitrary. To find the minimum element of the heap, we can select the smallest of these roots. Since there are O(lg n) binomial trees, this approach takes O(lg n) time.
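A direct Haskell sketch of this selection (assuming a non-empty heap):

findMin :: Ord a => BiHeap a -> a
findMin = minimum . map root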
However, after we locate the minimum element (also known as the top element of the heap), we need to remove it and restore the binomial properties to accomplish the heap-pop operation. Suppose the forest forming the binomial heap consists of trees Bi, Bj, ..., Bp, ..., Bm, where Bk is a binomial tree of rank k, and the minimum element is the root of Bp. If we delete it, there will be p children left, which are all binomial trees of ranks p − 1, p − 2, ..., 0.
One tool at hand is the O(lg n) merge function we have defined. A possible approach is to reverse the p children, so that their ranks become monotonically increasing, forming a binomial heap Hp. The rest of the trees is still a binomial heap, which we denote H' = H − Bp. Merging Hp and H' gives the final result of pop. Figure 9.8 illustrates this idea.
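Putting the pieces together, a sketch of the pop operation in Haskell (using the extractMin function defined below):

deleteMin :: Ord a => BiHeap a -> BiHeap a
deleteMin h = merge (reverse (children t)) ts
  where (t, ts) = extractMin h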
To realize this algorithm, we first need to define an auxiliary function which extracts from the forest the tree containing the minimum element at its root.
extractMin(H) = { (T, ϕ) : H is a singleton {T}
               { (T1, H') : Root(T1) < Root(T')    (9.5)
               { (T', {T1} ∪ H'') : otherwise

where
  H = {T1} ∪ H' and (T', H'') = extractMin(H')
The result of this function is a tuple: the first part is the tree with the minimum element at its root; the second part is the rest of the trees after removing the first part from the forest.
This function examines each tree in the forest, and is thus bound to O(lg n) time.
The corresponding Haskell program is given below.
extractMin [t] = (t, [])
extractMin (t:ts) = if root t < root t' then (t, ts)
else (t', t:ts')
where
(t', ts') = extractMin ts
heapSort(H) = { ϕ : H = ϕ
             { {findMin(H)} ∪ heapSort(deleteMin(H)) : otherwise    (9.8)
Translated to Haskell, a sketch of this program (using the findMin and deleteMin functions given above) is as follows.
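heapSort :: Ord a => [a] -> [a]
heapSort = hsort . fromList
  where
    hsort [] = []
    hsort h = findMin h : hsort (deleteMin h)

-- build a binomial heap by folding insert over a list
fromList :: Ord a => [a] -> BiHeap a
fromList = foldl insert []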
Function fromList can be defined by folding, as in the sketch above. Heap sort can also be expressed in a procedural way; please refer to the previous chapter about binary heaps for details.
Exercise 9.4
• Write a program to return the minimum element of a binomial heap in your favorite imperative programming language, using the ‘left-child, right-sibling’ approach.
9.3 Fibonacci Heaps
As we have shown, insertion and merge are bound to O(lg n) time; these bounds hold for the worst case. For insertion, the amortized performance is O(1). We skip the proof of this fact.
9.3.1 Definition
A Fibonacci heap is essentially a lazily evaluated binomial heap. Note that this doesn't mean that implementing a binomial heap in a lazy evaluation setting, for instance Haskell, automatically yields a Fibonacci heap. However, a lazy evaluation setting does help the realization; for example, [5] presents an elegant implementation.
The Fibonacci heap has excellent theoretical performance: all operations except pop are bound to amortized O(1) time. In this section, we'll give an algorithm different from the one in some popular textbooks [2]. Most of the ideas presented here are based on Okasaki's work [6].
Let's review and compare the performance of the binomial heap and the Fibonacci heap (more precisely, the performance goals of the Fibonacci heap).
The Fibonacci heap is either empty, or a forest of binomial trees in which the tree containing the minimum element is explicitly stored in a special position.
data FibHeap a = E | FH { size :: Int
, minTree :: BiTree a
, trees :: [BiTree a]}
For convenience, we also add a size field recording how many elements there are in the heap.
The data layout can also be defined imperatively, as in the following ANSI C code.
struct node {
    Key key;
    struct node *next, *prev, *parent, *children;
    int degree; /* also known as rank */
    int mark;
};

struct FibHeap {
    struct node* roots;
    struct node* minTr;
    int n; /* number of nodes */
};
For generality, Key can be a customized type; we use integer for illustration purposes.
typedef int Key;
In this chapter, for the imperative setting we use a circular doubly linked list to realize the Fibonacci heap, as described in [2]; it makes many operations easy and fast. Note that two extra fields are added: degree, also known as rank, is the number of children of a node; the flag mark is used only in the decrease-key operation, and will be explained in detail in a later section.
Insertion can be realized by merging the heap with a singleton heap built from the new element. Note that function FibHeap() accepts three parameters: a size value, which is 1 for this one-leaf tree; a special tree which contains the minimum element as its root; and a list of the other binomial trees in the forest. The meaning of function node() is the same as before: it creates a binomial tree from a rank, an element, and a list of children.
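As a sketch in Haskell (using the merge function given below and a rank-0 singleton node):

insert :: Ord a => FibHeap a -> a -> FibHeap a
insert h x = merge h (FH 1 (Node 0 x []) [])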
Insertion can also be realized directly in the imperative setting, by appending the new node to the forest and updating the record of the tree which contains the minimum element.
1: function Insert(H, k)
2: x ← Singleton(k) ▷ Wrap x to a node
3: append x to root list of H
4: if Tmin (H) = N IL ∨ k < Key(Tmin (H)) then
5: Tmin (H) ← x
6: n(H) ← n(H)+1
where function Tmin() returns the tree which contains the minimum element at its root.
The following C source snippet is a translation for this algorithm.
struct FibHeap* insert_node(struct FibHeap* h, struct node* x) {
    h = add_tree(h, x);
    if (h->minTr == NULL || x->key < h->minTr->key)
        h->minTr = x;
    h->n++;
    return h;
}
Exercise 9.5
merge(H1, H2) = { H1 : H2 = ϕ
               { H2 : H1 = ϕ
               { FibHeap(s1 + s2, T1min, {T2min} ∪ T1 ∪ T2) : root(T1min) < root(T2min)    (9.10)
               { FibHeap(s1 + s2, T2min, {T1min} ∪ T1 ∪ T2) : otherwise
where s1 and s2 are the sizes of H1 and H2; T1min and T2min are the special trees with the minimum element at root in H1 and H2 respectively; T1 = {T11, T12, ...} is the forest of all other binomial trees in H1, and T2 has the same meaning for H2. Function root(T) returns the root element of a binomial tree.
Note that as long as the ∪ operation takes constant time, this merge algorithm is bound to O(1) time. The following Haskell program is a translation of this algorithm.
merge h E = h
merge E h = h
merge h1@(FH sz1 minTr1 ts1) h2@(FH sz2 minTr2 ts2)
| root minTr1 < root minTr2 = FH (sz1+sz2) minTr1 (minTr2:ts2++ts1)
| otherwise = FH (sz1+sz2) minTr2 (minTr1:ts1++ts2)
Exercise 9.6
Implement the circular doubly linked list concatenation function in your
favorite imperative programming language.
The consolidation process starts from an empty result list L = ϕ. Each time it processes an element x, it first checks whether the first element in L is equal to x; if so, it adds them together (which yields 2x), and repeatedly checks whether 2x is equal to the next element in L. This process doesn't stop until either the element to be melded is not equal to the head element in the rest of the list, or the list becomes empty. Table 9.1 illustrates the process of consolidating the number sequence {2, 1, 1, 4, 8, 1, 1, 2, 4}. Column one lists the numbers ‘scanned’ one by one; column two shows the intermediate result, where the newly scanned number is compared with the first number in the result list and equal numbers are enclosed in a pair of parentheses; the last column is the result of the meld, which is used as the input to the next step.
The Haskell program can be given accordingly.
consolidate = foldl meld [] where
meld [] x = [x]
meld (x':xs) x | x == x' = meld xs (x+x')
| x < x' = x:x':xs
| otherwise = x': meld xs x
We'll analyze the performance of consolidation as part of the pop operation in a later section.
Tree consolidation is very similar to this algorithm, except that it works on ranks: we only need to modify the meld() function a bit, so that it compares ranks and links trees instead of adding numbers.
meld(L, x) = { {x} : L = ϕ
            { meld(L', link(x, x1)) : rank(x) = rank(x1)    (9.14)
            { {x} ∪ L : rank(x) < rank(x1)
            { {x1} ∪ meld(L', x) : otherwise

where L = {x1, x2, ...} and L' = {x2, x3, ...}
The final consolidate Haskell program changes to the below version.
consolidate = foldl meld [] where
meld [] t = [t]
meld (t':ts) t | rank t == rank t' = meld ts (link t t')
| rank t < rank t' = t:t':ts
| otherwise = t' : meld ts t
Figures 9.9 and 9.10 show the steps of consolidation when processing a Fibonacci heap containing trees of different ranks. Comparing them with table 9.1 reveals the similarity.
Figures 9.9 and 9.10: Steps of consolidating a Fibonacci heap. Step 1, 2; Step 3: ‘d’ is first linked to ‘c’, then the result is repeatedly linked to ‘a’; Step 4; Step 7, 8: ‘r’ is first linked to ‘q’, then ‘s’ is linked to ‘q’.
After we merge all the binomial trees of a Fibonacci heap, including the special tree recording the minimum element at root, the heap becomes a binomial heap, and we lose the special tree that gave us the ability to return the top element in O(1) time. An O(lg n) search is then necessary to restore the special tree; we can reuse the function extractMin() defined for the binomial heap.
It's time to give the final pop function for the Fibonacci heap, as all the sub-problems have been solved. Let Tmin denote the special tree recording the minimum element at root; T denote the forest of all the other trees; s the size of the heap; and let function children() return all sub-trees of a binomial tree except the root.
deleteMin(H) = { ϕ : T = ϕ ∧ children(Tmin) = ϕ
              { FibHeap(s − 1, T'min, T') : otherwise    (9.15)

where
  (T'min, T') = extractMin(consolidate(children(Tmin) ∪ T))
Translating to Haskell yields the program below.
deleteMin (FH _ (Node _ x []) []) = E
deleteMin h@(FH sz minTr ts) = FH (sz-1) minTr' ts' where
(minTr', ts') = extractMin $ consolidate (children minTr ++ ts)
The main part of the imperative realization is similar: we cut all children of Tmin and append them to the root list, then perform consolidation to merge all trees of the same rank until every rank is unique.
1: function Delete-Min(H)
2: x ← Tmin (H)
3: if x ̸= N IL then
4: for each y ∈ Children(x) do
5: append y to root list of H
6: Parent(y) ← N IL
7: remove x from root list of H
8: n(H) ← n(H) - 1
9: Consolidate(H)
10: return x
The algorithm Consolidate utilizes an auxiliary array A to do the merge job: A[i] stores the tree with rank (degree) i. While traversing the root list, if we meet another tree of rank i, we link the two into a new tree of rank i + 1, clear A[i], and check whether A[i + 1] is occupied, performing further linking if necessary. After all roots have been traversed, array A holds the resulting trees, and we re-construct the heap from it.
1: function Consolidate(H)
2: D ← Max-Degree(n(H))
3: for i ← 0 to D do
4: A[i] ← N IL
5: for each x ∈ root list of H do
6: remove x from root list of H
7: d ← Degree(x)
8: while A[d] ̸= N IL do
9: y ← A[d]
10: x ← Link(x, y)
11: A[d] ← N IL
12: d←d+1
13: A[d] ← x
14: Tmin (H) ← N IL ▷ root list is NIL at the time
15: for i ← 0 to D do
16: if A[i] ̸= N IL then
17: append A[i] to root list of H.
18: if Tmin = N IL∨ Key(A[i]) < Key(Tmin (H)) then
19: Tmin (H) ← A[i]
The only unclear sub-algorithm is Max-Degree, which determines an upper bound of the degree of any node in a Fibonacci heap. We'll delay its realization to the last sub-section.
Feeding the Fibonacci heap shown in figure 9.9 to the above algorithm, figures 9.11, 9.12 and 9.13 show the resulting trees stored in the auxiliary array A at every step.
Figures 9.11–9.13: The resulting trees stored in the auxiliary array A. Step 5; Step 6; Step 7, 8: since A[0] ≠ NIL, ‘r’ is first linked to ‘q’ and the new tree is stored in A[1] (A[0] is cleared); then ‘s’ is linked to ‘q’ and the result is stored in A[2] (A[1] is cleared).
void consolidate(struct FibHeap* h) {
    /* The opening lines here are an assumed reconstruction: max_degree
       (realized in the last sub-section) bounds the degree, and a[i]
       holds the unique tree of degree i found so far. */
    int i, d, D = max_degree(h->n);
    struct node *x, *y;
    struct node** a = (struct node**) malloc(sizeof(struct node*) * (D + 1));
    for (i = 0; i <= D; ++i)
        a[i] = NULL;
    while (h->roots) {
        x = h->roots;
        h->roots = remove_node(h->roots, x);
        d = x->degree;
        while (a[d]) {
            y = a[d]; /* another node has the same degree as x */
            x = link(x, y);
            a[d++] = NULL;
        }
        a[d] = x;
    }
    h->minTr = h->roots = NULL;
    for (i = 0; i <= D; ++i)
        if (a[i]) {
            h->roots = append(h->roots, a[i]);
            if (h->minTr == NULL || a[i]->key < h->minTr->key)
                h->minTr = a[i];
        }
    free(a);
}
Exercise 9.7
Implement the remove function for circular doubly linked list in your favorite
imperative programming language.
As an analogy from physics, lifting an object of mass M to height h stores the potential energy

E = M · g · h

Suppose some complex process moves the object up and down, and the object finally stops at height h'. If there is friction resistance Wf, the total work done by the process is

W = M · g · (h' − h) + Wf
If only the insertion, merge, and pop functions are applied to a Fibonacci heap, all trees remain binomial trees, and it is easy to see that the upper limit D(n) is O(lg n) (consider the extreme case where all nodes are in one binomial tree).
However, we'll show in the next sub-section that there is an operation which can violate the binomial-tree assumption.
Exercise 9.8
Why is the tree consolidation time proportional to the number of trees it processes?
Figure 9.15: x < y, cut tree x from its parent, and add x to root list.
If a node loses its second child, it is immediately cut from its parent and added to the root list.
The final Decrease-Key algorithm is given as below.
1: function Decrease-Key(H, x, k)
2: Key(x) ← k
3: p ← Parent(x)
4: if p ̸= N IL ∧ k < Key(p) then
5: Cut(H, x)
6: Cascading-Cut(H, p)
7: if k < Key(Tmin (H)) then
8: Tmin (H) ← x
Function Cascading-Cut uses the mark to determine whether the node is losing its second child: a node is marked after it loses its first child, and the mark is cleared in the Cut function.
1: function Cut(H, x)
2: p ← Parent(x)
3: remove x from p
4: Degree(p) ← Degree(p) - 1
5: add x to root list of H
6: Parent(x) ← N IL
7: Mark(x) ← F ALSE
During the cascading cut process, if x is marked, it has already lost one child; we then recursively perform cut and cascading cut on its parents until reaching the root.
1: function Cascading-Cut(H, x)
2: p ← Parent(x)
3: if p ̸= N IL then
4: if Mark(x) = F ALSE then
5: Mark(x) ← T RU E
6: else
7: Cut(H, x)
8: Cascading-Cut(H, p)
The corresponding ANSI C decrease-key program is given as follows.
void decrease_key(struct FibHeap* h, struct node* x, Key k) {
    struct node* p = x->parent;
    x->key = k;
    if (p && k < p->key) {
        cut(h, x);
        cascading_cut(h, p);
    }
    if (k < h->minTr->key)
        h->minTr = x;
}
Exercise 9.9
Prove that Decrease-Key algorithm is amortized O(1) time.
Consider any node x with degree(x) = k, and let y1, y2, ..., yk be its children in the order in which they were linked to x. When yi was linked, both trees had the same rank, so at that time

degree(yi) = degree(x) = i − 1

After that, node yi can lose at most one child (due to the decrease-key operation); otherwise it would have been immediately cut off and appended to the root list upon losing its second child. Thus we conclude

degree(yi) ≥ i − 2

for any i = 2, 3, ..., k.
Let sk be the minimum possible size of a node x with degree(x) = k. For the trivial cases, s0 = 1, s1 = 2, and we have

|x| ≥ sk = 2 + ∑_{i=2}^{k} s_{degree(yi)} ≥ 2 + ∑_{i=2}^{k} s_{i−2}
We next show that sk ≥ F_{k+2}. This can be proved by induction. For the trivial cases, we have s0 = 1 ≥ F2 = 1 and s1 = 2 ≥ F3 = 2. For the induction case k ≥ 2, we have

|x| ≥ sk ≥ 2 + ∑_{i=2}^{k} s_{i−2} ≥ 2 + ∑_{i=2}^{k} F_i = 1 + ∑_{i=0}^{k} F_i

where the last step uses F0 = 0 and F1 = 1. Together with the following identity of Fibonacci numbers, which can itself be proved by induction, this gives sk ≥ F_{k+2}:

F_{k+2} = 1 + ∑_{i=0}^{k} F_i    (9.19)
• Trivial case: F2 = 1 + F0 = 1.
• Induction case:

F_{k+2} = F_{k+1} + F_k = (1 + ∑_{i=0}^{k−1} F_i) + F_k = 1 + ∑_{i=0}^{k} F_i
n ≥ |x| ≥ F_{k+2}    (9.20)
Recall the result from the AVL tree chapter that F_{k+2} ≥ ϕ^k, where ϕ = (1 + √5)/2 is the golden ratio. It follows that k ≤ log_ϕ n, which also shows that the pop operation is an amortized O(lg n) algorithm. Based on this result, we can define the function MaxDegree as follows.
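A sketch in Haskell, taking 1 + ⌊log_ϕ n⌋ as a safe upper bound:

maxDegree :: Int -> Int
maxDegree n = 1 + floor (logBase phi (fromIntegral n :: Double))
  where phi = (1 + sqrt 5) / 2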
9.4 Pairing Heaps

9.4.1 Definition
Both binomial heaps and Fibonacci heaps are realized as forests, while a pairing heap is essentially a single K-ary tree. The minimum element is stored at the root; all other elements are stored in sub-trees.
The following Haskell program defines the pairing heap.
data PHeap a = E | Node a [PHeap a]
This is a recursive definition: a pairing heap is either empty or a K-ary tree consisting of a root node and a list of sub-trees.
Pairing heaps can also be defined in procedural languages, for example ANSI C as below. For illustration purposes, all heaps mentioned later are min-heaps, and we assume the type of the key is integer.4 We use the same linked-list based left-child, right-sibling approach (a.k.a. the binary tree representation [2]).
typedef int Key;

struct node {
    Key key;
    struct node *next, *children, *parent;
};
Note that the parent field only makes sense for the decrease-key operation, which will be explained later on; we can omit it for the time being.
There are two cases when merging two pairing heaps:
• Trivial case: one heap is empty; we simply return the other heap as the result;
• Otherwise, we compare the root elements of the two heaps, and make the heap with the bigger root element a new child of the other.
Let H1 and H2 denote the two heaps, and let x and y be the root elements of H1 and H2 respectively. Function Children() returns the children of a K-ary tree; function Node() constructs a K-ary tree from a root element and a list of children.
merge(H1, H2) = { H1 : H2 = ϕ
               { H2 : H1 = ϕ
               { Node(x, {H2} ∪ Children(H1)) : x < y    (9.22)
               { Node(y, {H1} ∪ Children(H2)) : otherwise

where
  x = Root(H1), y = Root(H2)
Obviously, this merging algorithm is bound to O(1) time.5 The merge equation can be translated to the following Haskell program.
merge h E = h
merge E h = h
merge h1@(Node x hs1) h2@(Node y hs2) =
if x < y then Node x (h2:hs1) else Node y (h1:hs2)
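Since the minimum element sits at the root, insertion and peeking at the top element follow directly from merge; a minimal sketch:

insert :: Ord a => PHeap a -> a -> PHeap a
insert h x = merge (Node x []) h

top :: PHeap a -> a -- assumes a non-empty heap
top (Node x _) = x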
Merge can also be realized imperatively. With the left-child, right-sibling approach, we simply link the heap (in fact a K-ary tree) with the larger root key as the first child of the other. This is a constant-time operation, as described below.
1: function Merge(H1 , H2 )
2: if H1 = NIL then
3: return H2
4: if H2 = NIL then
5: return H1
6: if Key(H2 ) < Key(H1 ) then
7: Exchange(H1 ↔ H2 )
8: Insert H2 in front of Children(H1 )
9: Parent(H2 ) ← H1
10: return H1
Note that we also update the parent field accordingly. An ANSI C example program is given below.
struct node* merge(struct node* h1, struct node* h2) {
    if (h1 == NULL)
        return h2;
    if (h2 == NULL)
        return h1;
    /* exchange so h1 holds the smaller key (see the Merge pseudocode above) */
    if (h2->key < h1->key) { struct node* t = h1; h1 = h2; h2 = t; }
    h2->next = h1->children; /* h2 becomes the first child of h1 */
    h1->children = h2;
    h2->parent = h1;
    h1->next = NULL; /* the result is a standalone tree */
    return h1;
}
5 Assume ∪ is constant time operation, this is true for linked-list settings, including ’cons’
Exercise 9.10
Implement the program of removing a node from the children of its parent in your favorite imperative programming language. Consider how we can ensure that the overall performance of decrease-key is O(1) time. Is the left-child, right-sibling approach enough?
Figure 9.16: Remove the root element, and merge the children in pairs. (a) A pairing heap before popping. (b) After root element 2 is removed, there are 9 sub-trees left. (c) Merge every two trees in a pair; note that there is an odd number of trees, so the last one needn't be merged.
Figure 9.17: Merge the paired results from right to left. (a) Merge the tree with root 9 and the tree with root 6. (b) Merge the tree with root 7 into the result. (c) Merge the tree with root 3 into the result. (d) Merge the tree with root 4 into the result.
mergePairs(A) = { ϕ : A = ϕ
               { T1 : A = {T1}    (9.25)
               { merge(merge(T1, T2), mergePairs(A')) : otherwise

where
  A' = {T3, T4, ..., Tm} is the rest of the children without the first two trees.
The corresponding Haskell program for popping is given as follows.
deleteMin (Node _ hs) = mergePairs hs where
mergePairs [] = E
mergePairs [h] = h
mergePairs (h1:h2:hs) = merge (merge h1 h2) (mergePairs hs)
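As a quick usage check, one can build a pairing heap by folding the insert function above over a list; repeatedly deleting the minimum then yields a heap sort. A sketch (fromList and toSortedList are hypothetical helper names):

fromList :: Ord a => [a] -> PHeap a
fromList = foldl insert E

toSortedList :: Ord a => PHeap a -> [a]
toSortedList E = []
toSortedList h@(Node x _) = x : toSortedList (deleteMin h)
-- e.g. toSortedList (fromList [5, 4, 3, 12, 7]) == [3, 4, 5, 7, 12]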
The popping operation can also be explained in the following procedural
algorithm.
1: function Pop(H)
2: L ← N IL
3: for every 2 trees Tx , Ty ∈ Children(H) from left to right do
4: Extract x, and y from Children(H)
5: T ← Merge(Tx , Ty )
6: Insert T at the beginning of L
7: H ← Children(H) ▷ H is either N IL or one tree.
8: for ∀T ∈ L from left to right do
9: H ← Merge(H, T )
10: return H
Note that L is initialized as an empty linked list. The algorithm iterates over the children of the K-ary tree two at a time, from left to right, merges each pair, and inserts the result at the beginning of L. Because we insert at the front end, when we traverse L later on we actually process the pairs from right to left. There may be an odd number of sub-trees in H; in that case one tree is left over after the pair-merging, and we handle it by starting the right-to-left merging from this leftover tree.
Below is the ANSI C program for this algorithm.
struct node* pop(struct node* h) {
    struct node *x, *y, *lst = NULL;
    while ((x = h->children) != NULL) {
        if ((h->children = y = x->next) != NULL)
            h->children = h->children->next;
        lst = push_front(lst, merge(x, y));
    }
    x = NULL;
    while ((y = lst) != NULL) {
        lst = lst->next;
        x = merge(x, y);
    }
    free(h);
    return x;
}
Exercise 9.11
Write a program to insert a tree at the beginning of a linked-list in your
favorite imperative programming language.
Delete a node
We didn't mention delete for the binomial heap or the Fibonacci heap. Deletion can be realized by first decreasing the key to minus infinity (−∞), then performing pop. In this section, we present another solution for deleting a node.
The algorithm defines the function delete(H, x), where x is a node in a pairing heap H.6
If x is the root, we can just perform a pop operation. Otherwise, we cut x from H, perform a pop on x, and then merge the pop result back into H. This can be described as the following.
delete(H, x) = { pop(H) : x is the root of H
              { merge(cut(H, x), pop(x)) : otherwise    (9.26)
Exercise 9.12
The Fibonacci heap algorithm presented here differs from the one given in some of the latest textbooks. We also presented the pairing heap, which is easy to realize and has good performance in practice.
The elementary tree-based data structures have now all been introduced in this book. There are still many tree-based data structures which we cannot cover here; we encourage the reader to refer to other textbooks about them. From the next chapter, we'll introduce generic sequence data structures, arrays and queues.
Bibliography
Part IV
Chapter 10

Queue, not so simple as it was thought
10.1 Introduction
It seems that queues are relatively simple: a queue provides FIFO (first-in, first-out) data manipulation. There are many options to realize a queue, including singly linked-lists, doubly linked-lists, circular buffers, etc. However, we'll show that it's not so easy to realize a queue in a purely functional setting if it must satisfy the abstract queue properties.
In this chapter, we'll present several different approaches to implementing a queue. A queue is a FIFO data structure satisfying the following performance constraints:
• an element can be added to the tail of the queue in O(1) constant time;
• an element can be removed from the head of the queue in O(1) constant time.
These two properties must be satisfied, and it's common to add some extra goals, such as dynamic memory allocation, etc.
Of course such an abstract queue interface can be implemented trivially with a doubly linked list, but this is an overkill solution; we can even implement an imperative queue with a singly linked-list or a plain array. Our main question here, however, is how to realize a purely functional queue as well.
We'll first review the typical queue solutions realized with a singly linked-list and with a circular buffer in the first section, then give a simple and straightforward functional solution in the second section. While its performance is ensured in terms of amortized constant time, we need to find a real-time (worst-case) solution for some special cases; such solutions are described in the third and fourth sections. Finally, we'll show a very simple real-time queue which depends on lazy evaluation.
Most of the functional contents are based on Chris Okasaki's great work in [6]; there are more than 16 different types of purely functional queue given in that material.
struct Node {
    Key key;
    struct Node* next;
};

struct Queue {
    struct Node *head, *tail;
};
Figure 10.1 illustrates an empty queue: both head and tail point to the sentinel NIL node.
Figure 10.1: The empty queue, both head and tail point to sentinel node.
Note the difference between Dequeue and Head: Head only retrieves the next element in FIFO order without removing it, while Dequeue performs the removal.
In some programming languages, such as Haskell, and most object-oriented
languages, the above abstract queue interface can be ensured by some definition.
For example, the following Haskell code specifies the abstract queue.
class Queue q where
    empty :: q a
    isEmpty :: q a -> Bool
    push :: q a -> a -> q a -- also named 'snoc', append, or push_back
    pop :: q a -> q a -- also named 'tail' or pop_front
    front :: q a -> a -- also named 'head'
To ensure constant-time Enqueue and Dequeue, we add new elements to the tail and remove elements from the head.2
function Enqueue(Q, x)
p ← Create-New-Node
Key(p) ← x
Next(p) ← N IL
Next(Tail(Q)) ← p
Tail(Q) ← p
Note that, as we use a sentinel node, there is at least one node, the sentinel, in the queue. That's why we needn't check the validity of the tail before appending the newly created node p to it.
function Dequeue(Q)
x ← Head(Q)
Next(Head(Q)) ← Next(x)
if x = Tail(Q) then ▷ Q gets empty
Tail(Q) ← Head(Q)
return Key(x)
As we always put the sentinel node in front of all the other nodes, function
Head actually returns the next node to the sentinel.
Figure 10.2 illustrates the Enqueue and Dequeue processes with the sentinel node. Translating the pseudo-code to an ANSI C program yields the code below.
struct Queue* enqueue(struct Queue* q, Key x) {
    struct Node* p = (struct Node*) malloc(sizeof(struct Node));
    p->key = x;
    p->next = NULL;
    q->tail->next = p;
    q->tail = p;
    return q;
}
Figure 10.2: The Enqueue and Dequeue operations with the sentinel node.
Key dequeue(struct Queue* q) {
    struct Node* p = q->head->next; /* the node next to the sentinel */
    Key x = p->key;
    q->head->next = p->next;
    if (q->tail == p) q->tail = q->head; /* the queue becomes empty */
    free(p);
    return x;
}
This solution is simple and robust. It's easy to extend it even to concurrent environments (e.g. multicores): we can assign one lock to the head and another to the tail, and the sentinel helps prevent deadlock in the empty case [1] [2].
Exercise 10.1
When initializing the queue, we are explicitly asked to provide the maximum size as a parameter.
struct QueueBuf* createQ(int max) {
    struct QueueBuf* q = (struct QueueBuf*) malloc(sizeof(struct QueueBuf));
    q->buf = (Key*) malloc(sizeof(Key) * max);
    q->size = max;
    q->head = q->cnt = 0;
    return q;
}
With the counter variable, we can compare it with zero and with the capacity to test whether the queue is empty or full.
function Empty?(Q)
return Count(Q) = 0
Exercise 10.2
The circular buffer is allocated with a maximum size parameter. Can we test whether the queue is empty or full with only the head and tail pointers? Note that the head can be either before or after the tail.
Figure 10.5: DeQueue and EnQueue can't both be performed in constant O(1) time on a singly linked-list: either EnQueue is O(1) and DeQueue is O(n), or EnQueue is O(n) and DeQueue is O(1).
Nor can we add a pointer to record the tail position of the list, as we did in the imperative settings such as the ANSI C program, because of the immutable nature of purely functional data.
Chris Okasaki mentioned a simple and straightforward functional solution
in [6]. The idea is to maintain two linked-lists as a queue, and concatenate these
two lists in a tail-to-tail manner. The shape of the queue looks like a horseshoe
magnet as shown in figure 10.6.
With this setup, we push new elements to the head of the rear list, which is ensured to be O(1) constant time; on the other hand, we pop elements from the head of the front list, which is also O(1) constant time. Thus the abstract queue properties can be satisfied.
The definition of such paired-list queue can be expressed in the following
Haskell code.
type Queue a = ([a], [a])
Figure 10.6: A queue with front and rear list shapes like a horseshoe magnet.
Suppose functions front(Q) and rear(Q) return the front and rear lists in this setup, and Queue(F, R) creates a paired-list queue from the two lists F and R. The EnQueue (push) and DeQueue (pop) operations can then be easily realized on top of a balance step which keeps the front list non-empty whenever the queue contains elements.
balance(F, R) = { Queue(reverse(R), ϕ) : F = ϕ
              { Queue(F, R) : otherwise    (10.3)
Thus if the front list isn't empty, we do nothing; when the front list becomes empty, we use the reversed rear list as the new front list, and the new rear list is empty. The enqueue and dequeue algorithms are updated accordingly; summing up the above and translating to Haskell yields the following program.
balance :: Queue a → Queue a
balance ([], r) = (reverse r, [])
balance q = q
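With balance in place, push and pop only touch the list heads; a sketch (pop and front assume a non-empty queue):

push :: Queue a -> a -> Queue a
push (f, r) x = balance (f, x : r)

pop :: Queue a -> Queue a
pop (_ : f, r) = balance (f, r)

front :: Queue a -> a
front (x : _, _) = x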
Although we only touch the heads of the front and rear lists, the overall performance can't always be kept at O(1); it is in fact amortized O(1). This is because the reverse operation takes time proportional to the length of the rear list, and is thus bound to O(n) time, where n = |R|. We leave the proof of the amortized performance as an exercise to the reader.
Figure 10.7: A queue with front and rear arrays shapes like a horseshoe magnet.
We can define such a paired-array queue with the following Python code.3
class Queue:
def __init__(self):
self.front = []
self.rear = []
3 Legacy Basic code is not presented here. We actually use lists rather than arrays in Python to illustrate the idea. ANSI C and ISO C++ programs are provided along with this chapter; they work in a purely array-based manner.
def is_empty(q):
return q.front == [] and q.rear == []
The corresponding Push() and Pop() algorithms only manipulate the tails of the arrays.
function Push(Q, x)
Append(Rear(Q), x)
Here we assume that the Append() algorithm appends element x to the end of the array and handles the necessary memory allocation, etc. There are actually multiple memory-handling approaches: besides dynamic re-allocation, we can for example initialize the array with enough space and just report an error if it gets full.
function Pop(Q)
if Front(Q) = ϕ then
Front(Q) ← Reverse(Rear(Q))
Rear(Q) ← ϕ
n ← Length(Front(Q))
x ← Front(Q)[n]
Length(Front(Q)) ← n − 1
return x
For simplification and illustration purposes, the array isn't shrunk explicitly after elements are removed, so testing whether the front array is empty (ϕ) can be realized as checking whether its length is zero. We omit these details here.
The enqueue and dequeue algorithms can be translated to Python programs
straightforwardly.
def push(q, x):
q.rear.append(x)
def pop(q):
if q.front == []:
q.rear.reverse()
(q.front, q.rear) = (q.rear, [])
return q.front.pop()
Exercise 10.3