
Elementary Algorithms

Larry LIU Xinyu

July 25, 2014

Version: 0.6180339887498949
Email: liuxinyu95@gmail.com

Contents

Part I: Preface
  0.1 Why?
  0.2 The smallest free ID problem, the power of algorithms
    0.2.1 Improvement 1
    0.2.2 Improvement 2, Divide and Conquer
    0.2.3 Expressiveness vs. Performance
  0.3 The number puzzle, power of data structure
    0.3.1 The brute-force solution
    0.3.2 Improvement 1
    0.3.3 Improvement 2
  0.4 Notes and short summary
  0.5 Structure of the contents
  0.6 Appendix

Part II: Trees
  1 Binary search tree, the hello world data structure
    1.1 Introduction
    1.2 Data Layout
    1.3 Insertion
    1.4 Traversing
    1.5 Querying a binary search tree
      1.5.1 Looking up
      1.5.2 Minimum and maximum
      1.5.3 Successor and predecessor
    1.6 Deletion
    1.7 Randomly build binary search tree
    1.8 Appendix
  2 The evolution of insertion sort
    2.1 Introduction
    2.2 Insertion
    2.3 Improvement 1
    2.4 Improvement 2
    2.5 Final improvement by binary search tree
    2.6 Short summary
  3 Red-black tree, not so complex as it was thought
    3.1 Introduction
      3.1.1 Exploit the binary search tree
      3.1.2 How to ensure the balance of the tree
      3.1.3 Tree rotation
    3.2 Definition of red-black tree
    3.3 Insertion
    3.4 Deletion
    3.5 Imperative red-black tree algorithm *
    3.6 More words
  4 AVL tree
    4.1 Introduction
      4.1.1 How to measure the balance of a tree?
    4.2 Definition of AVL tree
    4.3 Insertion
      4.3.1 Balancing adjustment
      4.3.2 Pattern Matching
    4.4 Deletion
    4.5 Imperative AVL tree algorithm *
    4.6 Chapter note
  5 Trie and Patricia
    5.1 Introduction
    5.2 Integer Trie
      5.2.1 Definition of integer Trie
      5.2.2 Insertion
      5.2.3 Look up
    5.3 Integer Patricia
      5.3.1 Definition
      5.3.2 Insertion
      5.3.3 Look up
    5.4 Alphabetic Trie
      5.4.1 Definition
      5.4.2 Insertion
      5.4.3 Look up
    5.5 Alphabetic Patricia
      5.5.1 Definition
      5.5.2 Insertion
      5.5.3 Look up
    5.6 Trie and Patricia applications
      5.6.1 E-dictionary and word auto-completion
      5.6.2 T9 input method
    5.7 Summary
  6 Suffix Tree
    6.1 Introduction
    6.2 Suffix trie
      6.2.1 Node transfer and suffix link
      6.2.2 On-line construction
    6.3 Suffix Tree
      6.3.1 On-line construction
    6.4 Suffix tree applications
      6.4.1 String/Pattern searching
      6.4.2 Find the longest repeated sub-string
      6.4.3 Find the longest common sub-string
      6.4.4 Find the longest palindrome
      6.4.5 Others
    6.5 Notes and short summary
  7 B-Trees
    7.1 Introduction
    7.2 Insertion
      7.2.1 Splitting
    7.3 Deletion
      7.3.1 Merge before delete method
      7.3.2 Delete and fix method
    7.4 Searching
    7.5 Notes and short summary

Part III: Heaps
  8 Binary Heaps
    8.1 Introduction
    8.2 Implicit binary heap by array
      8.2.1 Definition
      8.2.2 Heapify
      8.2.3 Build a heap
      8.2.4 Basic heap operations
      8.2.5 Heap sort
    8.3 Leftist heap and Skew heap, the explicit binary heaps
      8.3.1 Definition
      8.3.2 Merge
      8.3.3 Basic heap operations
      8.3.4 Heap sort by Leftist Heap
      8.3.5 Skew heaps
    8.4 Splay heap
      8.4.1 Definition
      8.4.2 Heap sort
    8.5 Notes and short summary
  9 From grape to the world cup, the evolution of selection sort
    9.1 Introduction
    9.2 Finding the minimum
      9.2.1 Labeling
      9.2.2 Grouping
      9.2.3 Performance of the basic selection sorting
    9.3 Minor Improvement
      9.3.1 Parameterize the comparator
      9.3.2 Trivial fine tune
      9.3.3 Cock-tail sort
    9.4 Major improvement
      9.4.1 Tournament knock out
      9.4.2 Final improvement by using heap sort
    9.5 Short summary
  10 Binomial heap, Fibonacci heap, and pairing heap
    10.1 Introduction
    10.2 Binomial Heaps
      10.2.1 Definition
      10.2.2 Basic heap operations
    10.3 Fibonacci Heaps
      10.3.1 Definition
      10.3.2 Basic heap operations
      10.3.3 Running time of pop
      10.3.4 Decreasing key
      10.3.5 The name of Fibonacci Heap
    10.4 Pairing Heaps
      10.4.1 Definition
      10.4.2 Basic heap operations
    10.5 Notes and short summary

Part IV: Queues and Sequences
  11 Queue, not so simple as it was thought
    11.1 Introduction
    11.2 Queue by linked-list and circular buffer
      11.2.1 Singly linked-list solution
      11.2.2 Circular buffer solution
    11.3 Purely functional solution
      11.3.1 Paired-list queue
      11.3.2 Paired-array queue - a symmetric implementation
    11.4 A small improvement, Balanced Queue
    11.5 One more step improvement, Real-time Queue
    11.6 Lazy real-time queue
    11.7 Notes and short summary
  12 Sequences, The last brick
    12.1 Introduction
    12.2 Binary random access list
      12.2.1 Review of plain-array and list
      12.2.2 Represent sequence by trees
      12.2.3 Insertion to the head of the sequence
    12.3 Numeric representation for binary random access list
      12.3.1 Imperative binary access list
    12.4 Imperative paired-array list
      12.4.1 Definition
      12.4.2 Insertion and appending
      12.4.3 Random access
      12.4.4 Removing and balancing
    12.5 Concatenate-able list
    12.6 Finger tree
      12.6.1 Definition
      12.6.2 Insert element to the head of sequence
      12.6.3 Remove element from the head of sequence
      12.6.4 Handling the ill-formed finger tree when removing
      12.6.5 Append element to the tail of the sequence
      12.6.6 Remove element from the tail of the sequence
      12.6.7 Concatenate
      12.6.8 Random access of finger tree
    12.7 Notes and short summary

Part V: Sorting and Searching
  13 Divide and conquer, Quick sort vs. Merge sort
    13.1 Introduction
    13.2 Quick sort
      13.2.1 Basic version
      13.2.2 Strict weak ordering
      13.2.3 Partition
      13.2.4 Minor improvement in functional partition
    13.3 Performance analysis for quick sort
      13.3.1 Average case analysis *
    13.4 Engineering Improvement
      13.4.1 Engineering solution to duplicated elements
    13.5 Engineering solution to the worst case
    13.6 Other engineering practice
    13.7 Side words
    13.8 Merge sort
      13.8.1 Basic version
    13.9 In-place merge sort
      13.9.1 Naive in-place merge
      13.9.2 In-place working area
      13.9.3 In-place merge sort vs. linked-list merge sort
    13.10 Nature merge sort
    13.11 Bottom-up merge sort
    13.12 Parallelism
    13.13 Short summary
  14 Searching
    14.1 Introduction
    14.2 Sequence search
      14.2.1 Divide and conquer search
      14.2.2 Information reuse
    14.3 Solution searching
      14.3.1 DFS and BFS
      14.3.2 Search the optimal solution
    14.4 Short summary

Part VI: Appendix
  A Lists
    A.1 Introduction
    A.2 List Definition
      A.2.1 Empty list
      A.2.2 Access the element and the sub list
    A.3 Basic list manipulation
      A.3.1 Construction
      A.3.2 Empty testing and length calculating
      A.3.3 Indexing
      A.3.4 Access the last element
      A.3.5 Reverse indexing
      A.3.6 Mutating
      A.3.7 Sum and product
      A.3.8 Maximum and minimum
    A.4 Transformation
      A.4.1 Mapping and for-each
      A.4.2 Reverse
    A.5 Extract sub-lists
      A.5.1 take, drop, and split-at
      A.5.2 Breaking and grouping
    A.6 Folding
      A.6.1 Folding from right
      A.6.2 Folding from left
      A.6.3 Folding in practice
    A.7 Searching and matching
      A.7.1 Existence testing
      A.7.2 Looking up
      A.7.3 Finding and filtering
      A.7.4 Matching
    A.8 Zipping and unzipping
    A.9 Notes and short summary
  GNU Free Documentation License
    1. APPLICABILITY AND DEFINITIONS
    2. VERBATIM COPYING
    3. COPYING IN QUANTITY
    4. MODIFICATIONS
    5. COMBINING DOCUMENTS
    6. COLLECTIONS OF DOCUMENTS
    7. AGGREGATION WITH INDEPENDENT WORKS
    8. TRANSLATION
    9. TERMINATION
    10. FUTURE REVISIONS OF THIS LICENSE
    11. RELICENSING
    ADDENDUM: How to use this License for your documents

Part I

Preface

0.1 Why?

Are algorithms useful? Some programmers say that they seldom use any serious data structures or algorithms in real work such as commercial application development. Even when they need some of them, the algorithms have already been provided by libraries. For example, the C++ standard template library (STL) provides sort and selection algorithms as well as the vector, queue, and set data structures. It seems that knowing how to use the library as a tool is quite enough.
Instead of answering this question directly, I would like to say that algorithms and data structures are critical in solving interesting problems, the usefulness of the problem set aside.
Let's start with two problems that look like they can be solved in a brute-force way even by a fresh programmer.

0.2 The smallest free ID problem, the power of algorithms

This problem is discussed in Chapter 1 of Richard Bird's book [1]. It's common for applications and systems to use IDs (identifiers) to manage objects and entities. At any time, some IDs are in use, and some of them are available. When a client tries to acquire a new ID, we want to always allocate it the smallest available one. Suppose IDs are non-negative integers and all IDs in use are kept in a list (or an array) which is not ordered. For example:

    [18, 4, 8, 9, 16, 1, 14, 7, 19, 3, 0, 5, 2, 11, 6]

How can you find the smallest free ID, which is 10, from the list?
It seems the solution is quite easy even without any serious algorithms.

1: function Min-Free(A)
2:   x ← 0
3:   loop
4:     if x ∉ A then
5:       return x
6:     else
7:       x ← x + 1

where the ∉ test is realized as below.

1: function ∉(x, X)
2:   for i ← 1 to |X| do
3:     if x = X[i] then
4:       return False
5:   return True
Some languages provide handy tools which wrap this linear-time process. For example, in Python this algorithm can be directly translated as the following.

def brute_force(lst):
    i = 0
    while True:
        if i not in lst:
            return i
        i = i + 1

It seems this problem is trivial. However, there will be millions of IDs in a large system. The speed of this solution is poor in such a case, for it takes O(n^2) time, where n is the length of the ID list. On my computer (2 cores, 2.10 GHz, with 2G RAM), a C program using this solution takes an average of 5.4 seconds to search for the minimum free number among 100,000 IDs¹, and it takes more than 8 minutes to handle a million numbers.

0.2.1 Improvement 1

The key idea to improve the solution is based on the fact that for a series of n numbers x1, x2, ..., xn, if there are free numbers, some of the xi are outside the range [0, n); otherwise the list is exactly a permutation of 0, 1, ..., n - 1 and n should be returned as the minimum free number. It means that max(xi) ≥ n - 1, and we have the following fact:

    minfree(x1, x2, ..., xn) ≤ n        (1)

One solution is to use an array of n + 1 flags to mark whether a number in the range [0, n] is free.
1: function Min-Free(A)
2:   F ← [False, False, ..., False] where |F| = n + 1
3:   for ∀x ∈ A do
4:     if x < n then
5:       F[x] ← True
6:   for i ∈ [0, n] do
7:     if F[i] = False then
8:       return i
Line 2 initializes a flag array with all False values; this takes O(n) time. Then the algorithm scans all numbers in A and marks the corresponding flag True if the value is less than n. This step also takes O(n) time. Finally, the algorithm performs a linear search to find the first flag with a False value. So the total performance of this algorithm is O(n). Note that we use n + 1 flags instead of n flags to cover the special case that sorted(A) = [0, 1, 2, ..., n - 1].
Although the algorithm only takes O(n) time, it needs extra O(n) space to store the flags.
This solution is much faster than the brute force one. On my computer,
the relevant Python program takes an average of 0.02 second when dealing with
100,000 numbers.
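The Python program referred to here is not printed in the text; the following is a minimal sketch of the flag-array idea (the function name min_free and the list-based interface are our assumptions):

def min_free(lst):
    n = len(lst)
    flags = [False] * (n + 1)  # n + 1 flags cover the case [0, 1, ..., n - 1]
    for x in lst:
        if x < n:
            flags[x] = True    # mark x as used
    for i in range(n + 1):
        if not flags[i]:
            return i           # the first unmarked number is the answer

# min_free([18, 4, 8, 9, 16, 1, 14, 7, 19, 3, 0, 5, 2, 11, 6]) == 10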
We haven't fine-tuned this algorithm yet. Observe that each time we have to allocate memory to create an array of n + 1 flags, and release the memory when finished. Memory allocation and release are very expensive and cost us a lot of processing time.
There are two ways in which we can improve on this solution. One is to allocate the flags array in advance and reuse it for all calls of our function to find the smallest free number. The other is to use bit-wise flags instead of a flag array. The following is the C program based on these two minor improvements.
¹ All programs can be downloaded along with this series of posts.


#define N 1000000 // 1 million
#define WORD_LENGTH (sizeof(int) * 8)

void setbit(unsigned int* bits, unsigned int i) {
    bits[i / WORD_LENGTH] |= 1 << (i % WORD_LENGTH);
}

int testbit(unsigned int* bits, unsigned int i) {
    return bits[i / WORD_LENGTH] & (1 << (i % WORD_LENGTH));
}

unsigned int bits[N / WORD_LENGTH + 1];

int min_free(int* xs, int n) {
    int i, len = N / WORD_LENGTH + 1;
    for (i = 0; i < len; ++i)
        bits[i] = 0;
    for (i = 0; i < n; ++i)
        if (xs[i] < n)
            setbit(bits, xs[i]);
    for (i = 0; i <= n; ++i)
        if (!testbit(bits, i))
            return i;
}
This C program can handle 1,000,000 (1 million) IDs in just 0.023 second on my computer.
The last for-loop can be further improved as seen below, but this is just minor fine-tuning.

for (i = 0; ; ++i)
    if (bits[i] != ~0)
        for (j = 0; ; ++j)
            if (!testbit(bits, i * WORD_LENGTH + j))
                return i * WORD_LENGTH + j;

0.2.2 Improvement 2, Divide and Conquer

Although the above improvement is much faster, it costs O(n) extra space to keep the flags. If n is a huge number, this means a huge amount of space is wasted.
The typical divide and conquer strategy is to break the problem into smaller ones, and solve these to get the final answer.
We can put all numbers xi ≤ ⌊n/2⌋ into a sub-list A' and put all the others into a second sub-list A''. Based on formula (1), if the length of A' is exactly ⌊n/2⌋ + 1, this means the first half of the numbers is "full", which indicates that the minimum free number must be in A'', so we recursively search the shorter list A''. Otherwise, the minimum free number is located in A', which again leads to a smaller problem.
When we search the minimum free number in A'', the conditions change a little bit: we are not searching for the smallest free number starting from 0, but from ⌊n/2⌋ + 1 as the lower bound. So the algorithm is something like minfree(A, l, u), where l is the lower bound and u is the upper bound index of the element.
Note the trivial case: if the number list is empty, we merely return the lower bound as the result.
This divide and conquer solution can be formally expressed as a function:

    minfree(A) = search(A, 0, |A| - 1)

    search(A, l, u) =
        | l                     : A = ∅
        | search(A'', m + 1, u) : |A'| = m - l + 1
        | search(A', l, m)      : otherwise

where

    m = ⌊(l + u)/2⌋
    A' = {x ∈ A | x ≤ m}
    A'' = {x ∈ A | x > m}

It is obvious that this algorithm doesn't need any extra space². Each call performs O(|A|) comparisons to build A' and A''. After that the problem scale halves. So the time needed for this algorithm is T(n) = T(n/2) + O(n), which reduces to O(n). Another way to analyze the performance is to observe that the first call takes O(n) to build A' and A'', the second call takes O(n/2), the third O(n/4), and so on. The total time is O(n + n/2 + n/4 + ...) = O(2n) = O(n).
In functional programming languages such as Haskell, partitioning a list is already provided in the basic library, and this algorithm can be translated as the following.

import Data.List

minFree xs = bsearch xs 0 (length xs - 1)

bsearch xs l u | xs == [] = l
               | length as == m - l + 1 = bsearch bs (m+1) u
               | otherwise = bsearch as l m
    where
      m = (l + u) `div` 2
      (as, bs) = partition (<= m) xs
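For readers who prefer Python, here is a sketch of the same divide and conquer algorithm (a direct, unoptimized rendering of the formula above; the names are ours):

def min_free(xs):
    return bsearch(xs, 0, len(xs) - 1)

def bsearch(xs, l, u):
    if xs == []:
        return l
    m = (l + u) // 2
    smaller = [x for x in xs if x <= m]   # A'
    bigger = [x for x in xs if x > m]     # A''
    if len(smaller) == m - l + 1:         # the first half is full
        return bsearch(bigger, m + 1, u)
    else:
        return bsearch(smaller, l, m)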

0.2.3 Expressiveness vs. Performance

Imperative language programmers may be concerned about the performance of this kind of implementation. For instance, in this minimum free ID problem, the number of recursive calls is O(lg n), which means the stack size consumed is O(lg n). It's not free in terms of space. If we want to avoid that, we can eliminate the recursion by replacing it with iteration³, which yields the following C program.

² Procedural programmers may note that it actually takes O(lg n) stack space for book-keeping. As we'll see later, this can be eliminated either by tail recursion optimization, for instance gcc -O2, or by manually changing the recursion to iteration.

int min_free(int* xs, int n) {
    int l = 0;
    int u = n - 1;
    while (n) {
        int m = (l + u) / 2;
        int right, left = 0;
        for (right = 0; right < n; ++right)
            if (xs[right] <= m) {
                swap(xs[left], xs[right]);
                ++left;
            }
        if (left == m - l + 1) {
            xs = xs + left;
            n = n - left;
            l = m + 1;
        } else {
            n = left;
            u = m;
        }
    }
    return l;
}

This program uses a quick-sort-like approach to re-arrange the array so that all the elements before left are less than or equal to m, while those between left and right are greater than m. This is shown in Figure 1.

Figure 1: Divide the array: x[i] ≤ m for all 0 ≤ i < left, and x[i] > m for all left ≤ i < right; the remaining elements are unknown.

This program is fast and it doesn't need extra stack space. However, compared to the previous Haskell program, it's hard to read and the expressiveness is decreased. We have to balance performance and expressiveness.

³ This is done automatically in most functional languages, since our function is in tail recursive form, which lends itself perfectly to this transformation.

0.3 The number puzzle, power of data structure

If the first problem, finding the minimum free number, is somewhat useful in practice, this problem is a pure one for fun. The puzzle is to find the 1,500th number which contains only the factors 2, 3, or 5. The first 3 such numbers are of course 2, 3, and 5. Number 60 = 2^2 * 3^1 * 5^1 is the 25th such number. Number 21 = 2^0 * 3^1 * 7^1 isn't a valid number because it contains a factor 7. The first 10 such numbers are listed as the following:

    2, 3, 4, 5, 6, 8, 9, 10, 12, 15

If we consider 1 = 2^0 * 3^0 * 5^0, then 1 is also a valid number and it is the first one.

0.3.1 The brute-force solution

It seems the solution is quite easy without needing any serious algorithms. We can check all numbers from 1, extracting all factors of 2, 3, and 5 to see if the remaining part is 1.

1: function Get-Number(n)
2:   x ← 1
3:   i ← 0
4:   loop
5:     if Valid?(x) then
6:       i ← i + 1
7:       if i = n then
8:         return x
9:     x ← x + 1

10: function Valid?(x)
11:   while x mod 2 = 0 do
12:     x ← x/2
13:   while x mod 3 = 0 do
14:     x ← x/3
15:   while x mod 5 = 0 do
16:     x ← x/5
17:   if x = 1 then
18:     return True
19:   else
20:     return False

This brute-force algorithm works for most small n. However, to find the 1,500th number (which is 859963392), the C program based on this algorithm takes 40.39 seconds on my computer. I had to kill the program after 10 minutes when I increased n to 15,000.
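A direct Python sketch of this brute-force algorithm might look like the following (our own rendering, for illustration only):

def valid(x):
    for p in (2, 3, 5):
        while x % p == 0:
            x //= p            # strip all factors p
    return x == 1

def get_number(n):
    x, i = 1, 0
    while True:
        if valid(x):
            i += 1
            if i == n:
                return x
        x += 1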

0.3.2 Improvement 1

Analysis of the above algorithm shows that modular and division calculations are very expensive [2], and they are executed many times in loops. Instead of checking whether a number contains only 2, 3, or 5 as factors, an alternative solution is to construct such numbers from these factors.

We start from 1 and multiply it by 2, 3, or 5 to generate the rest of the numbers. The problem becomes how to generate the candidate numbers in order. One handy way is to utilize the queue data structure.
A queue is used to push elements at one end and pop them at the other end, so the element pushed first is also popped first. This property is called FIFO (First-In-First-Out).
The idea is to push 1 as the only element to the queue. Then we pop an element, multiply it by 2, 3, and 5, to get 3 new elements, and push them back to the queue in order. Note that a new element may already exist in the queue; in such a case we just drop it. A new element may also be smaller than others in the queue, so we must insert it at the correct position. Figure 2 illustrates this idea.
Figure 2: First 4 steps of constructing numbers with a queue.
1. The queue is initialized with 1 as the only element;
2. New elements 2, 3, and 5 are pushed back;
3. New elements 4, 6, and 10 are pushed back in order;
4. New elements 9 and 15 are pushed back; element 6 already exists.

This algorithm is shown as the following.

1: function Get-Number(n)
2:   Q ← NIL
3:   Enqueue(Q, 1)
4:   while n > 0 do
5:     x ← Dequeue(Q)
6:     Unique-Enqueue(Q, 2x)
7:     Unique-Enqueue(Q, 3x)
8:     Unique-Enqueue(Q, 5x)
9:     n ← n - 1
10:  return x

11: function Unique-Enqueue(Q, x)
12:   i ← 0
13:   while i < |Q| ∧ Q[i] < x do
14:     i ← i + 1
15:   if i < |Q| ∧ x = Q[i] then
16:     return
17:   Insert(Q, i, x)

The insert function takes O(|Q|) time to find the proper position and insert the element; if the element already exists, it just returns.
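A Python sketch of this ordered, duplicate-free queue construction follows (we use the standard bisect module to locate the insertion position; this rendering is ours):

from bisect import bisect_left

def get_number(n):
    q = [1]
    x = None
    while n > 0:
        x = q.pop(0)                      # dequeue the smallest element
        for m in (2 * x, 3 * x, 5 * x):
            i = bisect_left(q, m)
            if i == len(q) or q[i] != m:  # unique enqueue, keeping order
                q.insert(i, m)
        n -= 1
    return x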
A rough estimation tells us that the length of the queue increases in proportion to n (each time we extract one element and push at most 3 new ones, an increase ratio of about 2), so the total running time is O(1 + 2 + 3 + ... + n) = O(n^2).
Figure 3 plots the number of queue accesses against n; it is a quadratic curve, which reflects the O(n^2) performance.

Figure 3: Queue access count vs. n.

The C program based on this algorithm takes only 0.016 s to get the right answer 859963392, which is 2500 times faster than the brute-force solution.
Improvement 1 can also be considered in a recursive way. Suppose X is the infinite series of all numbers which contain only factors of 2, 3, or 5. The following formula shows an interesting relationship:

    X = {1} ∪ {2x : x ∈ X} ∪ {3x : x ∈ X} ∪ {5x : x ∈ X}        (2)

where ∪ is defined in a special form, so that all elements are stored in order and are unique. Suppose X = {x1, x2, x3, ...}, Y = {y1, y2, y3, ...}, X' = {x2, x3, ...} and Y' = {y2, y3, ...}. We have

    X ∪ Y =
        | X             : Y = ∅
        | Y             : X = ∅
        | {x1, X' ∪ Y}  : x1 < y1
        | {x1, X' ∪ Y'} : x1 = y1
        | {y1, X ∪ Y'}  : x1 > y1
In a functional programming language such as Haskell, which supports lazy evaluation, the above infinite series can be translated into the following program.

ns = 1 : merge (map (*2) ns) (merge (map (*3) ns) (map (*5) ns))

merge [] l = l
merge l [] = l
merge (x:xs) (y:ys) | x < y = x : merge xs (y:ys)
                    | x == y = x : merge xs ys
                    | otherwise = y : merge (x:xs) ys

By evaluating ns !! (n-1), we can get the 1,500th number as below.

>ns !! (1500-1)
859963392
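Python has no lazy evaluation by default, but a similar effect can be sketched with a generator. The version below replaces the three-way merge with a min-heap plus a "seen" set, which is a different realization of the same series (this variant is ours, not the author's):

import heapq
from itertools import islice

def ns():
    q, seen = [1], {1}
    while True:
        x = heapq.heappop(q)    # the smallest candidate so far
        yield x
        for m in (2 * x, 3 * x, 5 * x):
            if m not in seen:   # drop duplicates before they enter the heap
                seen.add(m)
                heapq.heappush(q, m)

# next(islice(ns(), 1499, None)) == 859963392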

0.3.3 Improvement 2

Considering the above solution, although it is much faster than the brute-force one, it still has some drawbacks. It produces many duplicated numbers, which are eventually dropped when examining the queue. Secondly, it does a linear scan and insertion to keep the order of all elements in the queue, which degrades the enqueue operation from O(1) to O(|Q|).
If we use three queues instead of only one, we can improve the solution one step further. Denote these queues as Q2, Q3, and Q5, and initialize them as Q2 = {2}, Q3 = {3} and Q5 = {5}. Each time we dequeue the smallest element x from among Q2, Q3, and Q5, and do the following test:

- If x comes from Q2, we enqueue 2x, 3x, and 5x back to Q2, Q3, and Q5 respectively;
- If x comes from Q3, we only need to enqueue 3x to Q3 and 5x to Q5; we needn't enqueue 2x to Q2, because 2x already exists in Q3;
- If x comes from Q5, we only need to enqueue 5x to Q5; there is no need to enqueue 2x and 3x to Q2 and Q3, because they are already in the queues.

We repeatedly enqueue the smallest one until we find the n-th element. The algorithm based on this idea is implemented as below.

1: function Get-Number(n)
2:   if n = 1 then
3:     return 1
4:   else
5:     Q2 ← {2}
6:     Q3 ← {3}
7:     Q5 ← {5}
8:     while n > 1 do
9:       x ← min(Head(Q2), Head(Q3), Head(Q5))
10:      if x = Head(Q2) then
11:        Dequeue(Q2)
12:        Enqueue(Q2, 2x)
13:        Enqueue(Q3, 3x)
14:        Enqueue(Q5, 5x)
15:      else if x = Head(Q3) then
16:        Dequeue(Q3)
17:        Enqueue(Q3, 3x)
18:        Enqueue(Q5, 5x)
19:      else
20:        Dequeue(Q5)
21:        Enqueue(Q5, 5x)
22:      n ← n - 1
23:    return x

Figure 4: First 4 steps of constructing numbers with Q2, Q3, and Q5.
1. The queues are initialized with 2, 3, 5 as the only elements;
2. New elements 4, 6, and 10 are pushed back;
3. New elements 9 and 15 are pushed back;
4. New elements 8, 12, and 20 are pushed back;
5. New element 25 is pushed back.

This algorithm loops n times. Within each loop it extracts one head element from the three queues, which takes constant time, and then appends one to three new elements to the ends of the queues, which is bounded by constant time too. So the total time of the algorithm is bound to O(n). The C++ program translated from this algorithm, shown below, takes less than 1 μs to produce the 1,500th number, 859963392.

typedef unsigned long Integer;

Integer get_number(int n) {
    if (n == 1)
        return 1;
    queue<Integer> Q2, Q3, Q5;
    Q2.push(2);
    Q3.push(3);
    Q5.push(5);
    Integer x;
    while (n-- > 1) {
        x = min(min(Q2.front(), Q3.front()), Q5.front());
        if (x == Q2.front()) {
            Q2.pop();
            Q2.push(x * 2);
            Q3.push(x * 3);
            Q5.push(x * 5);
        } else if (x == Q3.front()) {
            Q3.pop();
            Q3.push(x * 3);
            Q5.push(x * 5);
        } else {
            Q5.pop();
            Q5.push(x * 5);
        }
    }
    return x;
}
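The same three-queue algorithm can also be sketched in Python with collections.deque (our rendering):

from collections import deque

def get_number(n):
    if n == 1:
        return 1
    q2, q3, q5 = deque([2]), deque([3]), deque([5])
    x = None
    while n > 1:
        x = min(q2[0], q3[0], q5[0])   # the smallest of the three heads
        if x == q2[0]:
            q2.popleft()
            q2.append(2 * x)
            q3.append(3 * x)
            q5.append(5 * x)
        elif x == q3[0]:
            q3.popleft()
            q3.append(3 * x)
            q5.append(5 * x)
        else:
            q5.popleft()
            q5.append(5 * x)
        n -= 1
    return x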

This solution can also be implemented in a functional way. We define a function take(n), which returns the first n numbers containing only factors 2, 3, or 5.

    take(n) = f(n, {1}, {2}, {3}, {5})

where

    f(n, X, Q2, Q3, Q5) =
        | X                                : n = 1
        | f(n - 1, X ∪ {x}, Q2', Q3', Q5') : otherwise

    x = min(Q2₁, Q3₁, Q5₁)

    (Q2', Q3', Q5') =
        | ({Q2₂, Q2₃, ...} ∪ {2x}, Q3 ∪ {3x}, Q5 ∪ {5x}) : x = Q2₁
        | (Q2, {Q3₂, Q3₃, ...} ∪ {3x}, Q5 ∪ {5x})        : x = Q3₁
        | (Q2, Q3, {Q5₂, Q5₃, ...} ∪ {5x})               : x = Q5₁

Here Q2₁ denotes the head of Q2 and {Q2₂, Q2₃, ...} denotes its tail; the same convention applies to Q3 and Q5.

These functional definitions can be realized in Haskell as the following.

ks 1 xs _ = xs
ks n xs (q2, q3, q5) = ks (n-1) (xs ++ [x]) update
    where
      x = minimum $ map head [q2, q3, q5]
      update | x == head q2 = ((tail q2) ++ [x*2], q3 ++ [x*3], q5 ++ [x*5])
             | x == head q3 = (q2, (tail q3) ++ [x*3], q5 ++ [x*5])
             | otherwise = (q2, q3, (tail q5) ++ [x*5])

takeN n = ks n [1] ([2], [3], [5])

Invoking last (takeN 1500) generates the correct answer, 859963392.

0.4 Notes and short summary

Reviewing the two puzzles, we find that in both cases the brute-force solutions are weak. In the first problem, the brute-force solution is quite poor at dealing with long ID lists, while in the second problem it doesn't work at all.
The first problem shows the power of algorithms, while the second tells why data structures are important. There are plenty of interesting problems which were hard to solve before computers were invented. With the aid of computers and programming, we are able to find answers in quite a different way. Compared with what we learned in mathematics courses at school, we haven't been taught methods like these.
While there are already a lot of wonderful books about algorithms, data structures, and math, few of them compare the procedural solutions with the functional ones. From the above discussion, it can be seen that functional solutions are sometimes very expressive and close to what we are familiar with in mathematics.
This series of posts focuses on providing both imperative and functional algorithms and data structures. Many functional data structures can be referenced from Okasaki's book [6], while the imperative ones can be found in classic textbooks [2] or even on Wikipedia. Multiple programming languages, including C, C++, Python, Haskell, and Scheme/Lisp, will be used. In order to make it easy to read for programmers with different backgrounds, pseudo-code and mathematical functions are the regular descriptions of each post.
The author is NOT a native English speaker; the reason why this book is only available in English for the time being is that the contents are still changing frequently. Any feedback, comments, or criticisms are welcome.

0.5 Structure of the contents

In the following series of posts, I'll first introduce elementary data structures before algorithms, because many algorithms require knowledge of data structures as a prerequisite.


The "hello world" data structure, the binary search tree, is the first topic. Then we introduce how to solve the balance problem of binary search trees. After that, I'll show other interesting trees: Trie, Patricia, and suffix trees are useful in text manipulation, while B-trees are commonly used in file system and database implementation.
The second part of data structures is about heaps. We'll provide a general heap definition and introduce binary heaps implemented with arrays and with explicit binary trees. Then we'll extend to K-ary heaps, including binomial heaps, Fibonacci heaps, and pairing heaps.
Arrays and queues are typically considered among the easiest data structures; however, we'll show in the third part how difficult they can be to implement.
As elementary sorting algorithms, we'll introduce insertion sort, quick sort, merge sort, etc., in both imperative and functional ways.
The final part is about searching: besides element searching, we'll also show string matching algorithms such as KMP.
All the posts are provided under the GNU FDL (Free Documentation License), and the programs are under the GNU GPL.

0.6 Appendix

All programs provided along with this book are free for download at: [Link]


Bibliography

[1] Richard Bird. Pearls of Functional Algorithm Design. Cambridge University Press; 1st edition (November 1, 2010). ISBN-10: 0521513383
[2] Jon Bentley. Programming Pearls (2nd Edition). Addison-Wesley Professional; 2nd edition (October 7, 1999). ISBN-13: 978-0201657883
[3] Chris Okasaki. Purely Functional Data Structures. Cambridge University Press (July 1, 1999). ISBN-13: 978-0521663502
[4] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms, Second Edition. The MIT Press, 2001. ISBN: 0262032937


Part II

Trees

Chapter 1

Binary search tree, the hello world data structure

1.1 Introduction

Arrays or lists are typically considered the "hello world" data structures. However, we'll see they are not actually particularly easy to implement. In some procedural settings, arrays are the most elementary data structures, and it is possible to implement linked lists using arrays (see section 10.3 in [2]). On the other hand, in some functional settings, linked lists are the elementary building blocks used to create arrays and other data structures.
Considering these factors, we start with binary search trees (BSTs) as the "hello world" data structure, using an interesting problem Jon Bentley mentioned in Programming Pearls [2]. The problem is to count the number of times each word occurs in a large text. One solution in C++ is below:

int main(int, char**) {
    map<string, int> dict;
    string s;
    while (cin >> s)
        ++dict[s];
    map<string, int>::iterator it = dict.begin();
    for (; it != dict.end(); ++it)
        cout << it->first << ": " << it->second << "\n";
}

And we can run it to produce the result using the following UNIX commands¹.

$ g++ wordcount.cpp -o wordcount
$ cat bbe.txt | ./wordcount > wc.txt
The map provided in the standard template library is a kind of balanced BST with augmented data. Here we use the words in the text as the keys and the number of occurrences as the augmented data. This program is fast, and it reflects the power of BSTs. We'll introduce how to implement BSTs in this section and show how to balance them in a later section.

¹ This is not a UNIX-only command; in Windows it can be achieved by: type bbe.txt | wordcount.exe > wc.txt

Before we dive into BSTs, let's first introduce the more general binary tree. Binary trees are recursively defined; BSTs are just one type of binary tree. A binary tree is usually defined in the following way.

A binary tree is
- either an empty node;
- or a node containing 3 parts: a value, a left child which is a binary tree, and a right child which is also a binary tree.

Figure 1.1 shows this concept and an example binary tree.
Figure 1.1: Binary tree concept (a) and an example binary tree (b).


A BST is a binary tree where the following applies to each node:
all the values in left child tree are less than the value of this node;
the value of this node is less than any values in its right child tree.
Figure 1.2 shows an example of binary search tree. Comparing with Figure
1.1, we can see the differences in how keys are ordered between them.

1.2. DATA LAYOUT

33
4

16

10

14

Figure 1.2: A Binary search tree example.

1.2 Data Layout

Based on the recursive definition of BSTs, we can draw the data layout in a procedural setting with pointers, as in Figure 1.3.
The node contains a key field, which can be augmented with satellite data, a field holding a pointer to the left child, and a field pointing to the right child. In order to back-track to an ancestor easily, a parent field can be provided as well. In this post, we'll ignore the satellite data for simple illustration purposes. Based on this layout, the node of a binary search tree can be defined in a procedural language, such as C++, as the following.
template<class T>
struct node {
    node(T x) : key(x), left(0), right(0), parent(0) {}
    ~node() {
        delete left;
        delete right;
    }
    node* left;
    node* right;
    node* parent; // parent is optional; it's helpful for succ/pred
    T key;
};

There is another setting, for instance in Scheme/Lisp languages, where the elementary data structure is the linked list. Figure 1.4 shows how a binary search tree node can be built on top of linked lists.
Figure 1.3: Layout of nodes with parent field.

Figure 1.4: Binary search tree node layout on top of linked list, where left... and right... are either empty or binary search tree nodes composed in the same way.


Because in a pure functional setting it's hard to use pointers to back-track to ancestors (and typically there is no need to back-track, since we can provide top-down solutions recursively), there is no parent field in such a layout.
For simplicity, we'll skip the detailed layout in the future and only focus on the logical layout of data structures. For example, below is the definition of a binary search tree node in Haskell.

data Tree a = Empty
            | Node (Tree a) a (Tree a)

1.3 Insertion

To insert a key k (possibly along with a value in practice) into a binary search tree T, we can follow a quite straightforward way:
- If the tree is empty, construct a leaf node with key = k;
- If k is less than the key of the root node, insert it into the left child;
- If k is greater than the key of the root, insert it into the right child.

There is an exceptional case: if k is equal to the key of the root, the key already exists. We can either overwrite the data, or just do nothing. For simplicity, this case is skipped in this post.
The algorithm is described recursively. It is so simple that this is why we consider the binary search tree a "hello world" data structure. Formally, the algorithm can be represented with a recursive function.

insert(T, k) =
    | node(∅, k, ∅)              : T = ∅
    | node(insert(L, k), Key, R) : k < Key
    | node(L, Key, insert(R, k)) : otherwise        (1.1)

where

    L = left(T), R = right(T), Key = key(T)

The node function creates a new node from a given left sub-tree, a key, and a right sub-tree. ∅ means NIL or Empty. The functions left, right, and key are accessors which get the left sub-tree, the right sub-tree, and the key of a node.
Translating the above function directly to Haskell yields the following program.

insert :: (Ord a) => Tree a -> a -> Tree a
insert Empty k = Node Empty k Empty
insert (Node l x r) k | k < x = Node (insert l k) x r
                      | otherwise = Node l x (insert r k)

This program utilizes the pattern matching features provided by the language. However, even in functional settings without this feature, for instance Scheme/Lisp, the program is still expressive.



(define (insert tree x)
  (cond ((null? tree) (list '() x '()))
        ((< x (key tree))
         (make-tree (insert (left tree) x)
                    (key tree)
                    (right tree)))
        ((> x (key tree))
         (make-tree (left tree)
                    (key tree)
                    (insert (right tree) x)))))

It is possible to turn the algorithm completely into an imperative way, without recursion.

1: function Insert(T, k)
2:   root ← T
3:   x ← Create-Leaf(k)
4:   parent ← NIL
5:   while T ≠ NIL do
6:     parent ← T
7:     if k < Key(T) then
8:       T ← Left(T)
9:     else
10:      T ← Right(T)
11:  Parent(x) ← parent
12:  if parent = NIL then            ▷ tree T is empty
13:    return x
14:  else if k < Key(parent) then
15:    Left(parent) ← x
16:  else
17:    Right(parent) ← x
18:  return root

19: function Create-Leaf(k)
20:   x ← Empty-Node
21:   Key(x) ← k
22:   Left(x) ← NIL
23:   Right(x) ← NIL
24:   Parent(x) ← NIL
25:   return x

Compared with the functional algorithm, this one is obviously more complex, although it is fast and can handle very deep trees. A complete C++ program and a Python program are available along with this post for reference.
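The Python program is not printed here; an iterative sketch of the same procedure (our reconstruction, with a minimal Node class) could look like this:

class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None

def insert(t, key):
    root, parent = t, None
    x = Node(key)
    while t is not None:        # walk down to find the insertion point
        parent = t
        t = t.left if key < t.key else t.right
    x.parent = parent
    if parent is None:          # the tree was empty
        return x
    if key < parent.key:
        parent.left = x
    else:
        parent.right = x
    return root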

1.4 Traversing

Traversing means visiting every element one by one in a binary search tree. There are 3 ways to traverse a binary tree: pre-order tree walk, in-order tree walk, and post-order tree walk. The names of these traversing methods highlight the order in which we visit the root of a binary search tree.


Since there are three parts in a tree (the left child; the root, which contains the key and satellite data; and the right child), if we denote them as (left, current, right), the three traversing methods are defined as the following:
- pre-order traverse: visit current, then left, finally right;
- in-order traverse: visit left, then current, finally right;
- post-order traverse: visit left, then right, finally current.

Note that each visiting operation is recursive, and the position in which current is visited determines the name of the traversing method.
For the binary search tree shown in Figure 1.2, below are the three different traversing results:
- pre-order traverse result: 4, 3, 1, 2, 8, 7, 16, 10, 9, 14;
- in-order traverse result: 1, 2, 3, 4, 7, 8, 9, 10, 14, 16;
- post-order traverse result: 2, 1, 3, 7, 9, 14, 10, 16, 8, 4.

It can be seen that the in-order walk of a binary search tree outputs the elements in increasing order, which is particularly helpful. The definition of the binary search tree ensures this interesting property; the proof of this fact is left as an exercise of this post.
The in-order tree walk algorithm can be described as the following:
- If the tree is empty, just return;
- Otherwise, traverse the left child by in-order walk, then access the key, finally traverse the right child by in-order walk.
Translating the above description yields a generic map function:

map(f, T) =
    | ∅                : T = ∅
    | node(l', k', r') : otherwise        (1.2)

where

    l' = map(f, left(T))
    r' = map(f, right(T))
    k' = f(key(T))
If we only need to access the key without creating the transformed tree, we can realize this algorithm in a procedural way, like the below C++ program.

template<class T, class F>
void in_order_walk(node<T>* t, F f) {
    if (t) {
        in_order_walk(t->left, f);
        f(t->value);
        in_order_walk(t->right, f);
    }
}



The function takes a parameter f; it can be a real function or a function object. The program applies f to each node by in-order tree walk.
We can simplify this algorithm one more step to define a function which turns a binary search tree into a sorted list by in-order traversing:

toList(T) =
    | ∅                                             : T = ∅
    | toList(left(T)) ∪ {key(T)} ∪ toList(right(T)) : otherwise        (1.3)

Below is the Haskell program based on this definition.

toList :: (Ord a) => Tree a -> [a]
toList Empty = []
toList (Node l x r) = toList l ++ [x] ++ toList r

This provides us with a method to sort a list of elements: first build a binary search tree from the list, then output the tree by in-order traversing. This method is called "tree sort". Let's denote the list X = {x1, x2, x3, ..., xn}.

    sort(X) = toList(fromList(X))        (1.4)

And we can write it in function composition form:

    sort = toList . fromList

where the function fromList repeatedly inserts every element into a binary search tree:

    fromList(X) = foldL(insert, ∅, X)        (1.5)

It can also be written in partial application form:

    fromList = foldL insert ∅

For readers who are not familiar with folding from the left, this function can also be defined recursively as the following:

fromList(X) =
    | ∅                                       : X = ∅
    | insert(fromList({x2, x3, ..., xn}), x1) : otherwise

We'll make intensive use of the folding function as well as function composition and partial evaluation in the future; please refer to the appendix of this book or [6], [7], and [8] for more information.
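As a quick illustration of tree sort, here is a small Python sketch assembled from the pieces above (insert and to_list mirror the functional definitions; the tuple representation and names are ours):

def insert(t, k):
    # t is None (empty) or a tuple (left, key, right)
    if t is None:
        return (None, k, None)
    l, key, r = t
    if k < key:
        return (insert(l, k), key, r)
    return (l, key, insert(r, k))

def to_list(t):
    if t is None:
        return []
    l, key, r = t
    return to_list(l) + [key] + to_list(r)

def tree_sort(xs):
    t = None
    for x in xs:
        t = insert(t, x)
    return to_list(t)

# tree_sort([4, 8, 3, 1, 7, 16, 2, 10, 9, 14]) == [1, 2, 3, 4, 7, 8, 9, 10, 14, 16]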

Exercise 1.1

- Given the in-order traverse result and the pre-order traverse result, can you re-construct the tree and figure out the post-order traversing result? Pre-order result: 1, 2, 4, 3, 5, 6; in-order result: 4, 2, 1, 5, 3, 6; post-order result: ?
- Write a program in your favorite language to re-construct the binary tree from the pre-order and in-order results.
- Prove why the in-order walk outputs the elements stored in a binary search tree in increasing order.
- Can you analyze the performance of tree sort with big-O notation?

1.5 Querying a binary search tree

There are three types of queries for a binary search tree: searching for a key in the tree, finding the minimum or maximum element, and finding the predecessor or successor of an element.

1.5.1 Looking up

According to the definition of the binary search tree, searching for a key in a tree can be realized as the following:
- If the tree is empty, the search fails;
- If the key of the root is equal to the value being sought, the search succeeds, and the root is returned as the result;
- If the value is less than the key of the root, search in the left child;
- Otherwise, which means the value is greater than the key of the root, search in the right child.
This algorithm can be described with a recursive function as below.

lookup(T, x) =
    | ∅                   : T = ∅
    | T                   : key(T) = x
    | lookup(left(T), x)  : x < key(T)
    | lookup(right(T), x) : otherwise        (1.6)

In a real application, we may return the satellite data instead of the node as the search result. This algorithm is simple and straightforward. Here is a translation into a Haskell program.

lookup :: (Ord a) => Tree a -> a -> Tree a
lookup Empty _ = Empty
lookup t@(Node l k r) x | k == x = t
                        | x < k = lookup l x
                        | otherwise = lookup r x

If the binary search tree is well balanced, which means that almost all nodes have both non-NIL left and right children, then for N elements the search algorithm takes O(lg N) time. This is not a formal definition of balance; we'll show it in a later post about red-black trees. If the tree is poorly balanced, the worst case takes O(N) time to search for a key. If we denote the height of the tree as h, we can uniformly express the performance of the algorithm as O(h).
The search algorithm can also be realized without recursion, in a procedural manner.

1: function Search(T, x)
2:   while T ≠ NIL ∧ Key(T) ≠ x do
3:     if x < Key(T) then
4:       T ← Left(T)
5:     else
6:       T ← Right(T)
7:   return T

Below is the C++ program based on this algorithm.

template<class T>
node<T>* search(node<T>* t, T x) {
    while (t && t->key != x) {
        if (x < t->key) t = t->left;
        else t = t->right;
    }
    return t;
}

1.5.2 Minimum and maximum

Minimum and maximum can be implemented using the property of the binary search tree: smaller keys are always in the left child, and greater keys are in the right.
For the minimum, we keep traversing the left sub-tree until it is empty, while for the maximum we traverse the right.

min(T) =
    | key(T)       : left(T) = ∅
    | min(left(T)) : otherwise        (1.7)

max(T) =
    | key(T)        : right(T) = ∅
    | max(right(T)) : otherwise        (1.8)

Both functions are bound to O(h) time, where h is the height of the tree. For a balanced binary search tree, min/max are bound to O(lg N) time, while they are O(N) in the worst case.
We skip translating them to programs; it's also possible to implement them in a pure procedural way without recursion.

1.5.3 Successor and predecessor

The last kind of query, finding the successor or predecessor of an element, is useful when a tree is treated as a generic container and traversed with an iterator. It is relatively easier to implement if the parent of a node can be accessed directly.
It seems that a functional solution is hard to find, because there is no pointer-like field linking to the parent node. One solution is to leave "breadcrumbs" when we visit the tree, and use this information to back-track or even re-construct the whole tree. Such a data structure, which contains both the tree and the breadcrumbs, is called a zipper; please refer to [9] for details.
However, if we consider the original purpose of providing succ/pred functions, namely traversing all of the binary search tree's elements one by one as a generic container, we realize that they don't make significant sense in functional settings, because we can traverse the tree in increasing order with the map function we defined previously.


We'll meet many problems in this book that are only valid in imperative settings, and are not meaningful problems in functional settings at all. One good example is how to delete an element in a red-black tree[3].
In this section, we'll only present the imperative algorithms for finding the successor and predecessor in a binary search tree.
When finding the successor of element x, which is the smallest element y that satisfies y > x, there are two cases. If the node with value x has a non-NIL right child, the minimum element in the right child is the answer. For example, in Figure 1.2, to find the successor of 8, we search its right sub-tree for the minimum one, which yields 9 as the result. If node x doesn't have a right child, we need to back-track to find the closest ancestor whose left child is also an ancestor of x. In Figure 1.2, since 2 doesn't have a right sub-tree, we go back to its parent 1. Node 1 doesn't have a left child, so 2 is its right child and we go back again, reaching node 3. Node 1, the left child of 3, is also an ancestor of 2; thus, 3 is the successor of node 2.
Based on this description, the algorithm can be given as the following.
1: function Succ(x)
2:   if Right(x) ≠ NIL then
3:     return Min(Right(x))
4:   else
5:     p ← Parent(x)
6:     while p ≠ NIL and x = Right(p) do
7:       x ← p
8:       p ← Parent(p)
9:     return p
The predecessor case is quite similar to the successor algorithm; they are symmetric to each other.
1: function Pred(x)
2:   if Left(x) ≠ NIL then
3:     return Max(Left(x))
4:   else
5:     p ← Parent(x)
6:     while p ≠ NIL and x = Left(p) do
7:       x ← p
8:       p ← Parent(p)
9:     return p
Below are the Python programs based on these algorithms. Their while loop conditions are changed a bit: they test the opposite child instead.
def succ(x):
    if x.right is not None: return tree_min(x.right)
    p = x.parent
    while p is not None and p.left != x:
        x = p
        p = p.parent
    return p

def pred(x):
    if x.left is not None: return tree_max(x.left)
    p = x.parent
    while p is not None and p.right != x:
        x = p
        p = p.parent
    return p

Exercise 1.2
Can you figure out how to iterate a tree as a generic container by using pred()/succ()? What's the performance of such a traversing process in terms of big-O?
A reader discussed traversing all elements inside a range [a, b]. In C++, the algorithm looks like the below code:
for_each(m.lower_bound(12), m.upper_bound(26), f);
Can you provide a purely functional solution for this problem?

1.6 Deletion

Deletion is another imperative-only topic for binary search tree. This is because deletion mutates the tree, while in purely functional settings we don't modify the tree after building it in most applications.
However, one method of deleting an element from a binary search tree in a purely functional way is shown in this section. It actually reconstructs the tree rather than modifying it.
Deletion is the most complex operation for binary search tree. This is because we must keep the BST property: for any node, all keys in the left sub-tree are less than the key of this node, and this key is less than all keys in the right sub-tree. Deleting a node can break this property.
In this chapter, different from the algorithm described in [2], a simpler one from the SGI STL implementation is used[6].
To delete a node x from a tree:
If x has no child or only one child, splice x out;
Otherwise (x has two children), use the minimum element of its right sub-tree to replace x, and splice the original minimum element out.
The simplicity comes from the fact that the minimum element is stored in a node in the right sub-tree which can't have two non-NIL children. It ends up in the trivial case: that node can be directly spliced out from the tree.
Figures 1.5, 1.6, and 1.7 illustrate the different cases when deleting a node from the tree.
Based on this idea, the deletion can be defined as the below function.

$$delete(T, x) = \begin{cases}
\phi & : T = \phi \\
node(delete(L, x), K, R) & : x < K \\
node(L, K, delete(R, x)) & : x > K \\
R & : x = K \land L = \phi \\
L & : x = K \land R = \phi \\
node(L, y, delete(R, y)) & : otherwise
\end{cases} \tag{1.9}$$


Figure 1.5: x can be spliced out.

Figure 1.6: Delete a node which has only one non-NIL child. In (a)(b), x is spliced out and replaced by its left child; in (c)(d), x is spliced out and replaced by its right child.

Figure 1.7: Delete a node which has both children. x is replaced by splicing the minimum element out of its right child.

where L = left(T), R = right(T), K = key(T), and y = min(R).
Translating the function to Haskell yields the below program.

delete :: (Ord a) => Tree a -> a -> Tree a
delete Empty _ = Empty
delete (Node l k r) x | x < k = Node (delete l x) k r
                      | x > k = Node l k (delete r x)
                      -- x == k
                      | isEmpty l = r
                      | isEmpty r = l
                      | otherwise = Node l k' (delete r k')
                          where k' = min r

Function isEmpty tests if a tree is empty (φ). Note that the algorithm first performs a search to locate the node where the element is to be deleted, and after that it executes the deletion. This algorithm takes O(h) time, where h is the height of the tree.
It's also possible to pass the node, instead of the element, to the algorithm for deletion; then the searching is no longer needed.
The imperative algorithm is more complex because it needs to set the parent properly. The function returns the root of the result tree.
1: function Delete(T, x)
2:   root ← T
3:   x' ← x                            ▷ save x
4:   parent ← Parent(x)
5:   if Left(x) = NIL then
6:     x ← Right(x)
7:   else if Right(x) = NIL then
8:     x ← Left(x)
9:   else                              ▷ both children are non-NIL
10:    y ← Min(Right(x))
11:    Key(x) ← Key(y)
12:    Copy other satellite data from y to x
13:    if Parent(y) ≠ x then           ▷ y hasn't a left sub-tree
14:      Left(Parent(y)) ← Right(y)
15:    else                            ▷ y is the root of x's right sub-tree
16:      Right(x) ← Right(y)
17:    Remove y
18:    return root
19:  if x ≠ NIL then
20:    Parent(x) ← parent
21:  if parent = NIL then              ▷ we are removing the root of the tree
22:    root ← x
23:  else
24:    if Left(parent) = x' then
25:      Left(parent) ← x
26:    else
27:      Right(parent) ← x
28:  Remove x'
29:  return root
Here we assume the node to be deleted is not empty (otherwise we can simply return the original tree). The algorithm first records the root of the tree, and copies pointers to x and its parent.
If either of the children is empty, the algorithm just splices x out. If x has two non-NIL children, we first locate the minimum y of the right child, replace the key of x with y's, copy the satellite data as well, and then splice y out. Note that there is a special case where y is the root node of x's right sub-tree.
Finally, we need to reset the stored parent if the original x has only one non-NIL child. If the parent pointer we copied before is empty, it means that we are deleting the root node, so we need to return the new root. After the parent is set properly, we finally remove the old x from memory.
The corresponding Python program for the deletion algorithm is given below. Because Python provides GC, we needn't explicitly remove the node from memory.

def tree_delete(t, x):
    if x is None:
        return t
    [root, old_x, parent] = [t, x, x.parent]
    if x.left is None:
        x = x.right
    elif x.right is None:
        x = x.left
    else:
        y = tree_min(x.right)
        x.key = y.key
        if y.parent != x:
            y.parent.left = y.right
        else:
            x.right = y.right
        return root
    if x is not None:
        x.parent = parent
    if parent is None:
        root = x
    else:
        if parent.left == old_x:
            parent.left = x
        else:
            parent.right = x
    return root

Because the procedure seeks the minimum element, it runs in O(h) time on a tree of height h.

Exercise 1.3
There is a symmetric solution for deleting a node which has two non-NIL children: replace the element by splicing the maximum one out of the left sub-tree. Write a program to implement this solution.

1.7 Randomly build binary search tree

All operations given in this chapter are bound to O(h) time for a tree of height h. The height affects the performance a lot. For a very unbalanced tree, h tends to be O(N), which leads to the worst case, while for a balanced tree, h is close to O(lg N) and we gain good performance.
How to make the binary search tree balanced will be discussed in the next chapter. However, there exists a simple way: the binary search tree can be randomly built, as described in [2]. Random building helps to avoid (or decrease the possibility of) unbalanced binary trees. The idea is that before building the tree, we call a random process to shuffle the elements.
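As a minimal Python sketch of the idea (tree_insert is a placeholder name standing for whichever insertion routine is used; it is passed in as a parameter here):

import random

def random_build(xs, tree_insert):
    xs = list(xs)
    random.shuffle(xs)       # shuffle first, then insert one by one
    t = None
    for x in xs:
        t = tree_insert(t, x)
    return t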

Exercise 1.4
Write a program that randomly builds a binary search tree.

1.8 Appendix

All programs are provided along with this book, free for downloading. We provide C, C++, Python, Haskell, and Scheme/Lisp programs as examples.

Bibliography
[1] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein. Introduction to Algorithms, Second Edition. ISBN: 0262032937. The MIT Press. 2001
[2] Jon Bentley. Programming Pearls (2nd Edition). Addison-Wesley Professional; 2nd edition (October 7, 1999). ISBN-13: 978-0201657883
[3] Chris Okasaki. Ten Years of Purely Functional Data Structures. [Link]
[4] SGI. Standard Template Library Programmer's Guide. [Link]
[5] Wikipedia. Binary search tree. https://en.wikipedia.org/wiki/Binary_search_tree
[6] SGI STL implementation. [Link]
[7] Wikipedia. Function composition. https://en.wikipedia.org/wiki/Function_composition
[8] Wikipedia. Partial application. https://en.wikipedia.org/wiki/Partial_application
[9] Miran Lipovaca. Learn You a Haskell for Great Good! A Beginner's Guide. The last chapter. No Starch Press; 1st edition, April 2011, 400 pp. ISBN: 978-1-59327-283-8

Chapter 2

The evolution of insertion sort
2.1 Introduction

In the previous chapter, we introduced the 'hello world' data structure, the binary search tree. In this chapter, we explain insertion sort, which can be thought of as the 'hello world' sorting algorithm¹. It's straightforward, but the performance is not as good as that of divide and conquer sorting approaches, such as quick sort and merge sort. Thus insertion sort is seldom used as the generic sorting utility in modern software libraries. We'll analyze why it is slow, and try to improve it bit by bit until we reach the best bound of comparison based sorting algorithms, O(n lg n), by evolving it into tree sort. We finally show the connection between the 'hello world' data structure and the 'hello world' sorting algorithm.
The idea of insertion sort can be vividly illustrated by a real life poker game[2]. Suppose the cards are shuffled, and a player takes cards one by one. At any time, all cards in the player's hand are well sorted. When the player gets a new card, he inserts it in the proper position according to the order of points. Figure 2.1 shows this insertion example.
Based on this idea, the algorithm of insertion sort can be directly given as the following.
function Sort(A)
  X ← φ
  for each x ∈ A do
    Insert(X, x)
  return X
It's easy to express this process with folding, which we mentioned in the chapter about binary search tree.

$$sort = foldL \ insert \ \phi \tag{2.1}$$

¹ Some readers may argue that Bubble sort is the easiest sorting algorithm. Bubble sort isn't covered in this book, as we don't think it's a valuable algorithm[1].

Figure 2.1: Insert card 8 to the proper position in a deck.

Note that in the above algorithm, we store the sorted result in X, so this isn't in-place sorting. It's easy to change it to an in-place algorithm.
function Sort(A)
  for i ← 2 to Length(A) do
    insert Ai into the sorted sequence {A'1, A'2, ..., A'i-1}
At any time, when we process the i-th element, all elements before i have already been sorted. We continuously insert the current element until all the unsorted data is consumed. This idea is illustrated in figure 2.2.

Figure 2.2: The left part is sorted data; elements are continuously inserted into the sorted part.
We can find a recursive concept in this definition. Thus it can be expressed as the following.

$$sort(A) = \begin{cases} \phi & : A = \phi \\ insert(sort(\{A_2, A_3, ...\}), A_1) & : otherwise \end{cases} \tag{2.2}$$

2.2 Insertion

We haven't answered the question of how to realize insertion, however. It's a puzzle how humans locate the proper position so quickly.
For a computer, an obvious option is to perform a scan. We can either scan from left to right or vice versa. However, if the sequence is stored in a plain array, it's necessary to scan from right to left.
function Sort(A)
  for i ← 2 to Length(A) do        ▷ Insert A[i] into sorted sequence A[1...i-1]
    x ← A[i]
    j ← i - 1
    while j > 0 ∧ x < A[j] do
      A[j + 1] ← A[j]
      j ← j - 1
    A[j + 1] ← x
One may think scanning from left to right is natural. However, it isn't as effective as the above algorithm for a plain array. The reason is that it's expensive to insert an element at an arbitrary position in an array, because an array stores elements continuously. If we want to insert a new element x at position i, we must shift all elements after i, including i + 1, i + 2, ..., one position to the right. After that, the cell at position i is empty, and we can put x in it. This is illustrated in figure 2.3.

x
insert
A[1]

A[2]

...

A[i-1]

A[i]

A[i+1]

A[i+2]

...

A[n-1]

Figure 2.3: Insert x to array A at position i.


If the length of array is n, this indicates we need examine the first i elements,
then perform n i + 1 moves, and then insert x to the i-th cell. So insertion
from left to right need traverse the whole array anyway. While if we scan from
right to left, we totally examine the last j = n i + 1 elements, and perform
the same amount of moves. If j is small (e.g. less than n/2), there is possibility
to perform less operations than scan from left to right.
Translating the above algorithm to Python yields the following code.

def isort(xs):
    n = len(xs)
    for i in range(1, n):
        x = xs[i]
        j = i - 1
        while j >= 0 and x < xs[j]:
            xs[j+1] = xs[j]
            j = j - 1
        xs[j+1] = x

Some other equivalent programs can be found, for instance the following ANSI C program. However, this version isn't as effective as the pseudo code.

void isort(Key* xs, int n){
    int i, j;
    for(i = 1; i < n; ++i)
        for(j = i - 1; j >= 0 && xs[j+1] < xs[j]; --j)
            swap(xs, j, j+1);
}

This is because the swapping function, which exchanges two elements, typically uses a temporary variable like the following:

void swap(Key* xs, int i, int j){
    Key temp = xs[i];
    xs[i] = xs[j];
    xs[j] = temp;
}

So the ANSI C program presented above performs 3m assignments, where m is the number of inner loop iterations, while the pseudo code as well as the Python program use a shift operation instead of swapping, performing only m + 2 assignments.
We can also provide the Insert() function explicitly, and call it from the general insertion sort algorithm of the previous section. We skip the detailed realization here and leave it as an exercise.
All the insertion algorithms are bound to O(n), where n is the length of the sequence, no matter the differences among them, such as scanning from the left or from the right. Thus the overall performance of insertion sort is quadratic, O(n²).

Exercise 2.1
Provide an explicit insertion function, and call it with the general insertion sort algorithm. Please realize it in both a procedural way and a functional way.

2.3 Improvement 1

Let's go back to the question of why human beings can find the proper position for insertion so quickly. We have shown a solution based on scanning. Note the fact that at any time, all cards at hand are well sorted; another possible solution is to use binary search to find that location.
We'll explain the search algorithms in another dedicated chapter; binary search is just briefly introduced here for illustration purposes.
The algorithm is changed to call a binary search procedure.
function Sort(A)
  for i ← 2 to Length(A) do
    x ← A[i]
    p ← Binary-Search(A[1...i-1], x)
    for j ← i down to p do
      A[j] ← A[j - 1]
    A[p] ← x
Instead of scanning the elements one by one, binary search utilizes the information that all elements in the array slice {A1, ..., Ai-1} are sorted. Let's assume the order is monotonically increasing. To find a position j that satisfies Aj-1 ≤ x ≤ Aj, we can first examine the middle element, for example A⌊i/2⌋. If x is less than it, we recursively perform binary search in the first half of the sequence; otherwise, we only need to search the last half.
Every time, we halve the elements to be examined; this search process runs in O(lg n) time to locate the insertion position.
function Binary-Search(A, x)
  l ← 1
  u ← 1 + Length(A)
  while l < u do
    m ← ⌊(l + u)/2⌋
    if A[m] = x then
      return m                  ▷ Found a duplicated element
    else if A[m] < x then
      l ← m + 1
    else
      u ← m
  return l
The improved insertion sort algorithm is still bound to O(n²). Compared to the previous section, where we used O(n²) comparisons and O(n²) moves, with binary search we use only O(n lg n) comparisons, but still O(n²) moves.
The Python program for this algorithm is given below.

def isort(xs):
    n = len(xs)
    for i in range(1, n):
        x = xs[i]
        p = binary_search(xs[:i], x)
        for j in range(i, p, -1):
            xs[j] = xs[j-1]
        xs[p] = x

def binary_search(xs, x):
    l = 0
    u = len(xs)
    while l < u:
        m = (l + u) // 2
        if xs[m] == x:
            return m
        elif xs[m] < x:
            l = m + 1
        else:
            u = m
    return l

Exercise 2.2
Write the binary search in a recursive manner. You needn't use a purely functional programming language.

2.4 Improvement 2

Although we improved the comparison time to O(n lg n) in the previous section, the number of moves is still O(n²). The reason movement takes so long is that the sequence is stored in a plain array. The nature of an array is a continuous data layout, so the insertion operation is expensive. This hints that we can use a linked-list setting to represent the sequence; it improves the insertion operation from O(n) to constant time O(1).

$$insert(A, x) = \begin{cases}
\{x\} & : A = \phi \\
\{x\} \cup A & : x < A_1 \\
\{A_1\} \cup insert(\{A_2, A_3, ..., A_n\}, x) & : otherwise
\end{cases} \tag{2.3}$$

Translating the algorithm to Haskell yields the below program.

insert :: (Ord a) => [a] -> a -> [a]
insert [] x = [x]
insert (y:ys) x = if x < y then x:y:ys else y : insert ys x

And we can complete the two versions of the insertion sort program based on the first two equations in this chapter.
isort [] = []
isort (x:xs) = insert (isort xs) x

Or we can represent the recursion with folding.


isort = foldl insert []

The linked-list solution can also be described imperatively. Suppose function Key(x) returns the value of the element stored in node x, and Next(x) accesses the next node in the linked-list.
function Insert(L, x)
  p ← NIL
  H ← L
  while L ≠ NIL ∧ Key(L) < Key(x) do
    p ← L
    L ← Next(L)
  Next(x) ← L
  if p = NIL then
    H ← x
  else
    Next(p) ← x
  return H
For example in ANSI C, the linked-list can be defined as the following.

struct node{
    Key key;
    struct node* next;
};

Thus the insert function can be given as below.

struct node* insert(struct node* lst, struct node* x){
    struct node *p, *head;
    p = NULL;
    for(head = lst; lst && x->key > lst->key; lst = lst->next)
        p = lst;
    x->next = lst;
    if(!p)
        return x;
    p->next = x;
    return head;
}

Instead of using an explicit linked-list, such as a pointer or reference based structure, a linked-list can also be realized with an extra index array. For any array element Ai, Next[i] stores the index of the element that follows Ai; it means A[Next[i]] is the next element after Ai.
The insertion algorithm based on this solution is given below.
function Insert(A, Next, i)
  j ← ⊥
  while Next[j] ≠ NIL ∧ A[Next[j]] < A[i] do
    j ← Next[j]
  Next[i] ← Next[j]
  Next[j] ← i
Here ⊥ means the head of the Next table. The corresponding Python program for this algorithm is given as the following.
def isort(xs):
    n = len(xs)
    next = [-1] * (n + 1)
    for i in range(n):
        insert(xs, next, i)
    return next

def insert(xs, next, i):
    j = -1
    while next[j] != -1 and xs[next[j]] < xs[i]:
        j = next[j]
    next[j], next[i] = i, next[j]

Although we changed the insertion operation to constant time by using a linked-list, we still have to traverse the linked-list to find the position, which results in O(n²) comparisons. This is because a linked-list, unlike an array, doesn't support random access, which means we can't use binary search with the linked-list setting.

Exercise 2.3
Complete the insertion sort by using the linked-list insertion function in your favorite imperative programming language.
The index based linked-list returns the sequence of rearranged indices as the result. Write a program to re-order the original array of elements from this result.

2.5 Final improvement by binary search tree

It seems that we have driven into a corner. We must improve both the comparison and the insertion at the same time, or we will end up with O(n²) performance.
We must use binary search; this is the only way to improve the comparison time to O(lg n). On the other hand, we must change the data structure, because we can't achieve constant time insertion at a position with a plain array.
This reminds us of our 'hello world' data structure, the binary search tree. It naturally supports binary search by its definition. At the same time, we can insert a new leaf into a binary search tree in O(1) constant time if we have already found the location.
So the algorithm changes to this.
function Sort(A)
  T ← φ
  for each x ∈ A do
    T ← Insert-Tree(T, x)
  return To-List(T)
where Insert-Tree() and To-List() are described in the previous chapter about binary search tree.
As we have analyzed for binary search tree, the performance of tree sort is bound to O(n lg n), which is the lower limit of comparison based sort[3].
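As a self-contained Python sketch of the same idea (the names Node, insert and to_list are illustrative stand-ins for the binary search tree routines of the previous chapter):

class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(t, x):
    # standard binary search tree insertion
    if t is None:
        return Node(x)
    if x < t.key:
        t.left = insert(t.left, x)
    else:
        t.right = insert(t.right, x)
    return t

def to_list(t):
    # in-order walk yields the keys in sorted order
    return [] if t is None else to_list(t.left) + [t.key] + to_list(t.right)

def tree_sort(xs):
    t = None
    for x in xs:
        t = insert(t, x)
    return to_list(t)

print(tree_sort([5, 2, 8, 1, 9, 3]))   # [1, 2, 3, 5, 8, 9]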

2.6 Short summary

In this chapter, we presented the evolution process of insertion sort. Insertion sort is well explained in most textbooks as the first sorting algorithm. It has a simple and straightforward idea, but the performance is quadratic. Some textbooks stop here, but we wanted to show that there exist ways to improve it from different points of view. We first tried to save comparison time by using binary search, and then tried to save the insertion operation by changing the data structure to a linked-list. Finally, we combined these two ideas and evolved insertion sort into tree sort.

Bibliography
[1] Wikipedia. Bubble sort. https://en.wikipedia.org/wiki/Bubble_sort
[2] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein. Introduction to Algorithms, Second Edition. ISBN: 0262032937. The MIT Press. 2001
[3] Donald E. Knuth. The Art of Computer Programming, Volume 3: Sorting and Searching (2nd Edition). Addison-Wesley Professional; 2nd edition (May 4, 1998). ISBN-10: 0201896850, ISBN-13: 978-0201896855

Chapter 3

Red-black tree, not so complex as it was thought
3.1 Introduction

3.1.1 Exploit the binary search tree

We have shown the power of binary search tree by using it to count the occurrence of every word in Bible. The idea is to use binary search tree as a dictionary
for counting.
One may come to the idea that to feed a yellow page book 1 to a binary
search tree, and use it to look up the phone number for a contact.
By modifying a bit of the program for word occurrence counting yields the
following code.
int main(int, char**){
    ifstream f("...");          /* the yellow page input file */
    map<string, string> dict;
    string name, phone;
    while(f >> name && f >> phone)
        dict[name] = phone;
    for(;;){
        cout << "\nname: ";
        cin >> name;
        if(dict.find(name) == dict.end())
            cout << "not found";
        else
            cout << "phone: " << dict[name];
    }
}

This program works well. However, if you replace the STL map with the binary search tree mentioned previously, the performance will be bad, especially when you search for names such as Zara, Zed, or Zulu.
This is because the content of a yellow page book is typically listed in lexicographic order, which means the name list is in increasing order. If we try to insert a sequence of numbers 1, 2, 3, ..., n into a binary search tree, we will get a tree like the one in Figure 3.1.

¹ A name-phone number contact list book
Figure 3.1: unbalanced tree


This is an extremely unbalanced binary search tree. A look-up performs in O(h) time for a tree of height h. In the balanced case, we benefit from the binary search tree with O(lg N) search time, but in this extreme case the search time degrades to O(N); it's no better than a plain linked-list.

Exercise 3.1
For a very big yellow page list, one may want to speed up the dictionary building process with two concurrent tasks (threads or processes): one task reads the name-phone pairs from the head of the list, while the other reads from the tail. Building terminates when the two tasks meet at the middle of the list. What will the binary search tree look like after building? What if you split the list into more than two parts and use more tasks?
Can you find any more cases that exploit a binary search tree? Please consider the unbalanced trees shown in figure 3.2.

3.1.2 How to ensure the balance of the tree

In order to avoid such cases, we can shuffle the input sequence with a randomized algorithm, as described in Section 12.4 of [2]. However, this method doesn't always work; for example, when the input is fed interactively by the user, the tree needs to be built/updated after each input.
People have found many solutions to keep a binary search tree balanced. Many of them rely on rotation operations on the binary search tree. Rotation operations change the tree structure while maintaining the ordering of the elements; thus they either improve or keep the balance property of the binary search tree.

Figure 3.2: Some unbalanced trees

In this chapter, we'll first introduce the red-black tree, which is one of the most popular and widely used self-adjusting balanced binary search trees. In the next chapter, we'll introduce the AVL tree, which is another intuitive solution. In a later chapter about binary heaps, we'll show another interesting tree called the splay tree, which gradually adjusts the tree to make it more and more balanced.

3.1.3 Tree rotation

Figure 3.3: Tree rotation. rotate-left transforms the tree from the left side to the right side, and rotate-right does the inverse transformation.
Tree rotation is a special operation that transforms the tree structure without changing the in-order traversal result. It is based on the fact that, for a specified ordering, there are multiple corresponding binary search trees. Figure 3.3 shows the tree rotations: for a binary search tree on the left side, left rotation transforms it to the tree on the right, and right rotation does the inverse transformation.
Although tree rotation can be realized in a procedural way, there exists a quite simple functional description if using pattern matching.

$$rotateL(T) = \begin{cases}
node(node(a, X, b), Y, c) & : pattern(T) = node(a, X, node(b, Y, c)) \\
T & : otherwise
\end{cases} \tag{3.1}$$

$$rotateR(T) = \begin{cases}
node(a, X, node(b, Y, c)) & : pattern(T) = node(node(a, X, b), Y, c) \\
T & : otherwise
\end{cases} \tag{3.2}$$
However, the pseudo code dealing with this imperatively has to set all fields accordingly.
1: function Left-Rotate(T, x)
2:   p ← Parent(x)
3:   y ← Right(x)              ▷ Assume y ≠ NIL
4:   a ← Left(x)
5:   b ← Left(y)
6:   c ← Right(y)
7:   Replace(x, y)
8:   Set-Children(x, a, b)
9:   Set-Children(y, x, c)
10:  if p = NIL then
11:    T ← y
12:  return T

13: function Right-Rotate(T, y)
14:  p ← Parent(y)
15:  x ← Left(y)               ▷ Assume x ≠ NIL
16:  a ← Left(x)
17:  b ← Right(x)
18:  c ← Right(y)
19:  Replace(y, x)
20:  Set-Children(y, b, c)
21:  Set-Children(x, a, y)
22:  if p = NIL then
23:    T ← x
24:  return T

25: function Set-Left(x, y)
26:  Left(x) ← y
27:  if y ≠ NIL then Parent(y) ← x

28: function Set-Right(x, y)
29:  Right(x) ← y
30:  if y ≠ NIL then Parent(y) ← x

31: function Set-Children(x, L, R)
32:  Set-Left(x, L)
33:  Set-Right(x, R)

34: function Replace(x, y)
35:  if Parent(x) = NIL then
36:    if y ≠ NIL then Parent(y) ← NIL
37:  else if Left(Parent(x)) = x then Set-Left(Parent(x), y)
38:  else Set-Right(Parent(x), y)
39:  Parent(x) ← NIL

Comparing these pseudo codes with the pattern matching functions, the former focus on the state changes of the structure, while the latter focus on the rotation process. As the title of this chapter indicates, the red-black tree needn't be as complex as it was thought. Most traditional algorithm textbooks use the classic procedural way to teach the red-black tree; there are several cases to deal with, and all of them need careful manipulation of the node fields. However, by switching the mind to functional settings, things get intuitive and simple, although there is some performance overhead.
Most of the content in this chapter is based on Chris Okasaki's work in [2].

3.2 Definition of red-black tree

A red-black tree is a type of self-balancing binary search tree[3].² By using color changing and rotation, the red-black tree provides a very simple and straightforward way to keep the tree balanced.
For a binary search tree, we can augment the nodes with a color field; a node can be colored either red or black. We call a binary search tree a red-black tree if it satisfies the following 5 properties[2].
1. Every node is either red or black.
2. The root is black.
3. Every leaf (NIL) is black.
4. If a node is red, then both its children are black.
5. For each node, all paths from the node to descendant leaves contain the same number of black nodes.
Why can these 5 properties ensure that the red-black tree is well balanced? Because they imply a key characteristic: the longest path from the root to a leaf can't be more than twice as long as the shortest path.
Note the 4th property, which means there can't be two adjacent red nodes. So the shortest path contains only black nodes; any path longer than the shortest one has interleaving red nodes. According to property 5, all paths have the same number of black nodes, which finally ensures that no path is more than twice as long as any other[3]. Figure 3.4 shows an example red-black tree.
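The five properties are easy to machine-check. Below is an illustrative Python sketch (not from the book's source) that validates them over nodes carrying color, left and right fields, with RED/BLACK constants as in the imperative Python code later in this chapter:

RED, BLACK = 0, 1

def black_height(t):
    # returns the black height, raising AssertionError on any violation
    if t is None:
        return 1              # property 3: every NIL leaf counts as black
    if t.color == RED:        # property 4: a red node has black children
        assert t.left is None or t.left.color == BLACK
        assert t.right is None or t.right.color == BLACK
    lh = black_height(t.left)
    rh = black_height(t.right)
    assert lh == rh           # property 5: same black count on all paths
    return lh + (1 if t.color == BLACK else 0)

def is_red_black(t):
    assert t is None or t.color == BLACK   # property 2: the root is black
    black_height(t)       # property 1 holds because color has two values
    return True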
Figure 3.4: An example red-black tree


2 Red-black tree is one of the equivalent form of 2-3-4 tree (see chapter B-tree about 2-3-4
tree). That is to say, for any 2-3-4 tree, there is at least one red-black tree has the same data
order.


All read-only operations, such as search and min/max, are the same as in the binary search tree; only insertion and deletion are special.
As we have shown in the word occurrence example, many implementations of set or map containers are based on the red-black tree. One example is the C++ Standard Library (STL)[6].
As mentioned previously, the only change in the data layout is the color information augmented to the binary search tree. This can be represented as a data field in imperative languages such as C++, like below.

enum Color {Red, Black};

template <class T>
struct node{
    Color color;
    T key;
    node* left;
    node* right;
    node* parent;
};

In functional settings, we can add the color information in the constructors; below is the Haskell example of a red-black tree definition.
below is the Haskell example of red-black tree definition.
data Color = R | B
data RBTree a = Empty
| Node Color (RBTree a) a (RBTree a)

Exercise 3.2
Can you prove that a red-black tree with n nodes has height at most
2 lg(n + 1)?

3.3 Insertion

Inserting a new node, as described for the binary search tree, may make the tree unbalanced. The red-black properties have to be maintained, so we need to do some fixing by transforming the tree after insertion.
When we insert a new key, one good practice is to always insert it as a red node. As long as the newly inserted node isn't the root of the tree, we can keep all properties except the 4th one: the insertion may bring two adjacent red nodes.
Functional and procedural implementations have different fixing methods. One is intuitive but has some overhead; the other is a bit complex but has higher performance. Most textbooks about algorithms introduce the latter. In this chapter, we focus on the former to show how easily a red-black tree insertion algorithm can be realized. The traditional procedural method will be given only for comparison purposes.
As described by Chris Okasaki, there are in total 4 cases which violate property 4. All of them have 2 adjacent red nodes, and they can be fixed to a uniform form[2], as shown in figure 3.5.
Note that this transformation moves the redness one level up, so the fixing is a bottom-up recursion whose last step may make the root node red. According to property 2, the root must always be black; thus we need a final step to revert the root color to black.
Figure 3.5: 4 cases for balancing a red-black tree after insertion
Observing that the 4 cases and the fixed result have strong pattern features, the fixing function can be defined by using a method similar to the one mentioned for tree rotation. To avoid overly long formulas, we abbreviate Color as C, Black as B, and Red as R.


$$balance(T) = \begin{cases}
node(R, node(B, A, x, B), y, node(B, C, z, D)) & : match(T) \\
T & : otherwise
\end{cases} \tag{3.3}$$

where function node() constructs a red-black tree node from 4 parameters: the color, the left child, the key and the right child. Function match() tests if a tree is one of the following 4 possible patterns:

pattern(T) = node(B, node(R, node(R, A, x, B), y, C), z, D), or
pattern(T) = node(B, node(R, A, x, node(R, B, y, C)), z, D), or
pattern(T) = node(B, A, x, node(R, B, y, node(R, C, z, D))), or
pattern(T) = node(B, A, x, node(R, node(R, B, y, C), z, D))
With function balance() defined, we can modify the previous binary search tree insertion functions to make them work for the red-black tree.
$$insert(T, k) = makeBlack(ins(T, k)) \tag{3.4}$$

where

$$ins(T, k) = \begin{cases}
node(R, \phi, k, \phi) & : T = \phi \\
balance(node(C, ins(L, k), Key, R)) & : k < Key \\
balance(node(C, L, Key, ins(R, k))) & : otherwise
\end{cases} \tag{3.5}$$

L, R, Key and C represent the left child, right child, key and color of the tree: L = left(T), R = right(T), Key = key(T), C = color(T).
Function makeBlack() is defined as the following; it forces the color of a non-empty tree to be black.

$$makeBlack(T) = node(B, L, Key, R) \tag{3.6}$$

Summarizing the above functions and using language-supported pattern matching features, we come to the following Haskell program.

insert :: (Ord a) => RBTree a -> a -> RBTree a
insert t x = makeBlack $ ins t where
    ins Empty = Node R Empty x Empty
    ins (Node color l k r)
        | x < k     = balance color (ins l) k r
        | otherwise = balance color l k (ins r) --[3]
    makeBlack (Node _ l k r) = Node B l k r

balance :: Color -> RBTree a -> a -> RBTree a -> RBTree a
balance B (Node R (Node R a x b) y c) z d =
    Node R (Node B a x b) y (Node B c z d)
balance B (Node R a x (Node R b y c)) z d =
    Node R (Node B a x b) y (Node B c z d)
balance B a x (Node R b y (Node R c z d)) =
    Node R (Node B a x b) y (Node B c z d)
balance B a x (Node R (Node R b y c) z d) =
    Node R (Node B a x b) y (Node B c z d)
balance color l k r = Node color l k r

Note that the balance function is changed a bit from the original definition. Instead of passing the tree, we pass the color, the left child, the key and the right child to it. This saves a pair of boxing and un-boxing operations.
This program doesn't handle the case of inserting a duplicated key. However, it is possible to handle it either by overwriting or by skipping. Another option is to augment the data with a linked-list[2].
Figure 3.6 shows two red-black trees built by feeding the lists 11, 2, 14, 1, 7, 15, 5, 8, 4 and 1, 2, ..., 8.
Figure 3.6: Insertion results generated by the Haskell algorithm


This algorithm shows great simplicity by summarizing the uniform feature of the four different unbalanced cases. It is expressive compared with the traditional tree rotation approach: even in programming languages which don't support pattern matching, the algorithm can still be implemented by checking the patterns manually. A Scheme/Lisp program available along with this book can be referenced as an example.
The insertion algorithm takes O(lg N) time to insert a key into a red-black tree which has N nodes.
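For example, here is an illustrative Python fragment (not the book's Scheme/Lisp program) that checks the four patterns manually; Node(color, left, key, right) and the colors R, B mirror the Haskell definition:

R, B = 'R', 'B'

class Node:
    def __init__(self, color, left, key, right):
        self.color, self.left, self.key, self.right = color, left, key, right

def is_red(t):
    return t is not None and t.color == R

def balance(color, l, k, r):
    if color == B and is_red(l) and is_red(l.left):      # left-left case
        return Node(R, Node(B, l.left.left, l.left.key, l.left.right),
                    l.key, Node(B, l.right, k, r))
    if color == B and is_red(l) and is_red(l.right):     # left-right case
        return Node(R, Node(B, l.left, l.key, l.right.left),
                    l.right.key, Node(B, l.right.right, k, r))
    if color == B and is_red(r) and is_red(r.right):     # right-right case
        return Node(R, Node(B, l, k, r.left), r.key,
                    Node(B, r.right.left, r.right.key, r.right.right))
    if color == B and is_red(r) and is_red(r.left):      # right-left case
        return Node(R, Node(B, l, k, r.left.left), r.left.key,
                    Node(B, r.left.right, r.key, r.right))
    return Node(color, l, k, r)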

Exercise 3.3
Write a program in an imperative language, such as C, C++ or Python, to realize the same algorithm in this section. Note that, because there is no language-supported pattern matching, you need to test the 4 different cases manually.

3.4 Deletion

Recall the deletion section for the binary search tree. Deletion is imperative-only for the red-black tree as well. In typical practice, one often builds the tree just once, and performs looking up frequently after that. Okasaki explained why he didn't provide red-black tree deletion in his work [3]; one reason is that deletions are much messier than insertions.
The purpose of this section is just to show that red-black tree deletion is possible in purely functional settings, although it actually rebuilds the tree, because trees are read-only in terms of purely functional data structures. In the real world, it's up to the user (or actually the programmer) to adopt the proper solution. One option is to mark the node to be deleted with a flag, and perform a tree rebuild when the number of deleted nodes exceeds 50% of the total number of nodes.
Not only in functional settings, but even in imperative settings, deletion is more complex than insertion: we face more cases to fix. Deletion may also violate the red-black tree properties, so we need to fix them after the normal deletion as described for the binary search tree.
The deletion algorithm in this book is based on a handout [5]. A problem only happens when we try to delete a black node: it violates the 5th property of the red-black tree, because the number of black nodes on the paths through the deleted node decreases, so the black-height is no longer uniform.
When deleting a black node, we can resume red-black property 5 by introducing a 'doubly-black' concept[2]. It means that although the node is deleted, its blackness is kept, by storing it in the parent node. If the parent node is red, it turns black; however, if it was already black, it turns 'doubly-black'.
In order to express the doubly-black node, the definition needs some modification accordingly.
data Color = R | B | BB -- BB: doubly black for deletion
data RBTree a = Empty | BBEmpty -- doubly black empty
| Node Color (RBTree a) a (RBTree a)

When deleting a node, we first perform the same deletion algorithm as for the binary search tree, mentioned in the previous chapter. After that, if the node to be sliced out is black, we need to fix the tree to keep the red-black properties. Let's denote the empty tree as φ; a non-empty tree can be decomposed to node(C, L, Key, R), giving its color, left sub-tree, key and right sub-tree. The delete function is defined as the following.

$$delete(T, k) = blackenRoot(del(T, k)) \tag{3.7}$$

where

$$del(T, k) = \begin{cases}
\phi & : T = \phi \\
fixBlack^2(node(C, del(L, k), Key, R)) & : k < Key \\
fixBlack^2(node(C, L, Key, del(R, k))) & : k > Key \\
\begin{cases} mkBlk(R) & : C = B \\ R & : otherwise \end{cases} & : L = \phi \\
\begin{cases} mkBlk(L) & : C = B \\ L & : otherwise \end{cases} & : R = \phi \\
fixBlack^2(node(C, L, k', del(R, k'))) & : otherwise
\end{cases} \tag{3.8}$$

where k' = min(R).

The real deletion happens inside function del. For the trivial case, where the tree is empty, the deletion result is φ. If the key to be deleted is less than the key of the current node, we recursively perform deletion on the left sub-tree; if it is bigger than the key of the current node, then we recursively delete the key from the right sub-tree. Because these steps may bring doubly-blackness, we need to fix it.
If the key to be deleted is equal to the key of the current node, we need to splice it out. If one of its children is empty, we just replace the node by the other one and reserve the blackness of this node; otherwise we cut and paste the minimum element k' = min(R) from the right sub-tree.
Function delete just forces the result tree of del to have a black root. This is realized by function blackenRoot.

$$blackenRoot(T) = \begin{cases} \phi & : T = \phi \\ node(B, L, Key, R) & : otherwise \end{cases} \tag{3.9}$$
Compared with the makeBlack function, which is defined in the red-black tree insertion section, they are almost the same, except for the case of the empty tree. This case is only valid for deletion: insertion can't result in an empty tree, while deletion can.
Function mkBlk is defined to reserve the blackness of a node. If the node to be sliced isn't black, this function won't be applied; otherwise, it turns a red node to black and a black node to doubly-black. This function also marks an empty tree as doubly-black empty.

: T =

node(B, L, Key, R) : C = R
mkBlk(T ) =
(3.10)
node(B 2 , L, Key, R) : C = B

T : otherwise
where means doubly-black empty node and B2 is the doubly-black color.
Summarizing the above functions yields the following Haskell program.
delete :: (Ord a) => RBTree a -> a -> RBTree a
delete t x = blackenRoot (del t x) where
    del Empty _ = Empty
    del (Node color l k r) x
        | x < k = fixDB color (del l x) k r
        | x > k = fixDB color l k (del r x)
        -- x == k, delete this node
        | isEmpty l = if color == B then makeBlack r else r
        | isEmpty r = if color == B then makeBlack l else l
        | otherwise = fixDB color l k' (del r k') where k' = min r
    blackenRoot (Node _ l k r) = Node B l k r
    blackenRoot _ = Empty

makeBlack :: RBTree a -> RBTree a
makeBlack (Node B l k r) = Node BB l k r -- doubly black
makeBlack (Node _ l k r) = Node B l k r
makeBlack Empty = BBEmpty
makeBlack t = t

The final attack on the red-black tree deletion algorithm is to realize the fixBlack² function. The purpose of this function is to eliminate the doubly-black colored node by rotation and color changing.
Let's solve the doubly-black empty node first. For any node, if one of its children is doubly-black empty and the other child is non-empty, we can safely replace the doubly-black empty with a normal empty node.
As in figure 3.7, if we are going to delete node 4 from the tree (only part of the tree is shown instead of the whole tree), the program will use a doubly-black empty node to replace node 4. In the figure, the doubly-black node is shown as a black circle with 2 edges. It means that node 5 has a doubly-black empty left child and a non-empty right child (a leaf node with key 6). In such a case we can safely change the doubly-black empty to a normal empty node, which won't violate any red-black properties.
Figure 3.7: One child is doubly-black empty, the other child is non-empty. (a) Delete 4 from the tree. (b) After 4 is sliced off, it is doubly-black empty. (c) We can safely change it to normal NIL.
On the other hand, if a node has a doubly-black empty child and the other child is empty, we have to push the doubly-blackness up one level. For example, in figure 3.8, if we want to delete node 1 from the tree, the program will use a doubly-black empty node to replace 1. Then node 2 has a doubly-black empty left child and an empty right child. In such a case we must mark node 2 as doubly-black after changing its left child back to empty.

Figure 3.8: One child is doubly-black empty, the other child is empty. (a) Delete 1 from the tree. (b) After 1 is sliced off, it is doubly-black empty. (c) We must push the doubly-blackness up to node 2.

Based on the above analysis, in order to fix the doubly-black empty node, we define the function partially like the following.

$$fixBlack^2(T) = \begin{cases}
node(B^2, \phi, Key, \phi) & : (L = \Phi \land R = \phi) \lor (L = \phi \land R = \Phi) \\
node(C, \phi, Key, R) & : L = \Phi \land R \neq \phi \\
node(C, L, Key, \phi) & : R = \Phi \land L \neq \phi \\
... & : ...
\end{cases} \tag{3.11}$$

After dealing with the doubly-black empty node, we need to fix the case where the sibling of the doubly-black node is black and has one red child. In this situation, we can fix the doubly-blackness with one rotation. Actually there are 4 different sub-cases, and all of them can be transformed into one uniform pattern. They are shown in figure 3.9. These cases are described in [2] as case 3 and case 4.

Figure 3.9: Fix the doubly-black by rotation; the sibling of the doubly-black node is black, and it has one red child.


The handling of these 4 sub-cases can be defined on top of formula 3.11.

$$fixBlack^2(T) = \begin{cases}
... & : ... \\
node(C, node(B, mkBlk(A), x, B), y, node(B, C, z, D)) & : p_{1.1} \\
node(C, node(B, A, x, B), y, node(B, C, z, mkBlk(D))) & : p_{1.2} \\
... & : ...
\end{cases} \tag{3.12}$$

where p1.1 and p1.2 each represent 2 patterns, as the following.

$$p_{1.1} : \begin{cases} node(C, A, x, node(B, node(R, B, y, C), z, D)) \land Color(A) = B^2 \\ node(C, A, x, node(B, B, y, node(R, C, z, D))) \land Color(A) = B^2 \end{cases}$$

$$p_{1.2} : \begin{cases} node(C, node(B, A, x, node(R, B, y, C)), z, D) \land Color(D) = B^2 \\ node(C, node(B, node(R, A, x, B), y, C), z, D) \land Color(D) = B^2 \end{cases}$$


Besides the above cases, there is another one where not only is the sibling of the doubly-black node black, but its two children are also black. We can change the color of the sibling node to red, resume the doubly-black node to black, and propagate the doubly-blackness one level up to the parent node, as shown in figure 3.10. Note that there are two symmetric sub-cases. This case is described in [2] as case 2.
We go on adding this fixing after formula 3.12.

$$fixBlack^2(T) = \begin{cases}
... & : ... \\
mkBlk(node(C, mkBlk(A), x, node(R, B, y, C))) & : p_{2.1} \\
mkBlk(node(C, node(R, A, x, B), y, mkBlk(C))) & : p_{2.2} \\
... & : ...
\end{cases} \tag{3.13}$$

where p2.1 and p2.2 are two patterns as below.

$$p_{2.1} : node(C, A, x, node(B, B, y, C)) \land Color(A) = B^2 \land Color(B) = Color(C) = B$$

$$p_{2.2} : node(C, node(B, A, x, B), y, C) \land Color(C) = B^2 \land Color(A) = Color(B) = B$$
There is a final case left, where the sibling of the doubly-black node is red. We can do a rotation to change this case to pattern p1.1 or p1.2; figure 3.11 shows it.
We can finish formula 3.13 with 3.14.

$$fixBlack^2(T) = \begin{cases}
... & : ... \\
fixBlack^2(node(B, fixBlack^2(node(R, A, x, B)), y, C)) & : p_{3.1} \\
fixBlack^2(node(B, A, x, fixBlack^2(node(R, B, y, C)))) & : p_{3.2} \\
T & : otherwise
\end{cases} \tag{3.14}$$

Figure 3.10: Propagate the blackness up. (a) Color of x can be either black or red. (b) If x was red, then it becomes black; otherwise, it becomes doubly-black. (c) Color of y can be either black or red. (d) If y was red, then it becomes black; otherwise, it becomes doubly-black.

Figure 3.11: The sibling of the doubly-black node is red.


where p3.1 and p3.2 are two patterns as the following.

$$p_{3.1} = \{Color(T) = B \land Color(L) = B^2 \land Color(R) = R\}$$

$$p_{3.2} = \{Color(T) = B \land Color(L) = R \land Color(R) = B^2\}$$

These two cases are described in [2] as case 1.
Fixing the doubly-black node with all the above cases is a recursive function. There are two termination conditions. One involves patterns p1.1 and p1.2: the doubly-black node is eliminated. The other cases may continuously propagate the doubly-blackness from the bottom to the top, up to the root. Finally, the algorithm marks the root node as black anyway, and the doubly-blackness is removed.
Putting formulas 3.11, 3.12, 3.13, and 3.14 together, we can write the final Haskell program.
fixDB :: Color -> RBTree a -> a -> RBTree a -> RBTree a
fixDB color BBEmpty k Empty = Node BB Empty k Empty
fixDB color BBEmpty k r = Node color Empty k r
fixDB color Empty k BBEmpty = Node BB Empty k Empty
fixDB color l k BBEmpty = Node color l k Empty
-- the sibling is black, and it has one red child
fixDB color a@(Node BB _ _ _) x (Node B (Node R b y c) z d) =
Node color (Node B (makeBlack a) x b) y (Node B c z d)
fixDB color a@(Node BB _ _ _) x (Node B b y (Node R c z d)) =
Node color (Node B (makeBlack a) x b) y (Node B c z d)
fixDB color (Node B a x (Node R b y c)) z d@(Node BB _ _ _) =
Node color (Node B a x b) y (Node B c z (makeBlack d))
fixDB color (Node B (Node R a x b) y c) z d@(Node BB _ _ _) =
Node color (Node B a x b) y (Node B c z (makeBlack d))
-- the sibling and its 2 children are all black, propagate the blackness up
fixDB color a@(Node BB _ _ _) x (Node B b@(Node B _ _ _) y c@(Node B _ _ _))
= makeBlack (Node color (makeBlack a) x (Node R b y c))
fixDB color (Node B a@(Node B _ _ _) x b@(Node B _ _ _)) y c@(Node BB _ _ _)
= makeBlack (Node color (Node R a x b) y (makeBlack c))
-- the sibling is red
fixDB B a@(Node BB _ _ _) x (Node R b y c) = fixDB B (fixDB R a x b) y c
fixDB B (Node R a x b) y c@(Node BB _ _ _) = fixDB B a x (fixDB R b y c)
-- otherwise
fixDB color l k r = Node color l k r

The deletion algorithm takes O(lg N ) time to delete a key from a red-black
tree with N nodes.

Exercise 3.4

As we mentioned in this section, deletion can be implemented by just marking the node as deleted without actually removing it. Once the number of marked nodes exceeds 50% of the total node number, a tree rebuild is performed. Try to implement this method in your favorite programming language.

3.5 Imperative red-black tree algorithm ⋆

We have almost finished all the content of this chapter. By induction over the patterns, we can implement the red-black tree in a simple way, compared to the imperative tree rotation solution. However, we should show the imperative counterpart for completeness.
For insertion, the basic idea is to use an algorithm similar to the one described for the binary search tree, and then fix the balance problem by rotation before returning the final result.
1: function Insert(T, k)
2:   root ← T
3:   x ← Create-Leaf(k)
4:   Color(x) ← RED
5:   parent ← NIL
6:   while T ≠ NIL do
7:     parent ← T
8:     if k < Key(T) then
9:       T ← Left(T)
10:    else
11:      T ← Right(T)
12:  Parent(x) ← parent
13:  if parent = NIL then               ▷ tree T is empty
14:    return x
15:  else if k < Key(parent) then
16:    Left(parent) ← x
17:  else
18:    Right(parent) ← x
19:  return Insert-Fix(root, x)
The only difference from the binary search tree insertion algorithm is that we set the color of the new node to red, and perform fixing before returning. It is easy to translate the pseudo code to a real imperative programming language, for instance Python.³
def rb_insert(t, key):
    root = t
    x = Node(key)
    parent = None
    while t:
        parent = t
        if key < t.key:
            t = t.left
        else:
            t = t.right
    if parent is None:  # tree is empty
        root = x
    elif key < parent.key:
        parent.set_left(x)
    else:
        parent.set_right(x)
    return rb_insert_fix(root, x)
³ C and C++ source codes are available along with this book.

There are 3 base cases for fixing; if we take the left-right symmetry into consideration, there are 6 cases in total. Among them, two cases can be merged, because both have the uncle node in red color: we can toggle the parent color and uncle color to black, and set the grandparent color to red. With this merging, the fixing algorithm can be realized as the following.
1: function Insert-Fix(T, x)
2:   while Parent(x) ≠ NIL and Color(Parent(x)) = RED do
3:     if Color(Uncle(x)) = RED then       ▷ Case 1, x's uncle is red
4:       Color(Parent(x)) ← BLACK
5:       Color(Grand-Parent(x)) ← RED
6:       Color(Uncle(x)) ← BLACK
7:       x ← Grand-Parent(x)
8:     else                                ▷ x's uncle is black
9:       if Parent(x) = Left(Grand-Parent(x)) then
10:        if x = Right(Parent(x)) then    ▷ Case 2, x is a right child
11:          x ← Parent(x)
12:          T ← Left-Rotate(T, x)
13:        Color(Parent(x)) ← BLACK        ▷ Case 3, x is a left child
14:        Color(Grand-Parent(x)) ← RED
15:        T ← Right-Rotate(T, Grand-Parent(x))
16:      else
17:        if x = Left(Parent(x)) then     ▷ Case 2, symmetric
18:          x ← Parent(x)
19:          T ← Right-Rotate(T, x)
20:        Color(Parent(x)) ← BLACK        ▷ Case 3, symmetric
21:        Color(Grand-Parent(x)) ← RED
22:        T ← Left-Rotate(T, Grand-Parent(x))
23:  Color(T) ← BLACK
24:  return T
This program takes O(lg N) time to insert a new key into the red-black tree. Comparing this pseudo code with the balance function defined in the previous section, we can see the difference. They differ not only in terms of simplicity, but also in logic: even if we feed the same series of keys to the two algorithms, they may build different red-black trees. There is a bit of performance overhead in the pattern matching algorithm. Okasaki discussed the difference in detail in his paper[2].
Translating the above algorithm to Python yields the below program.
# Fix the red->red violation
def rb_insert_fix(t, x):
    while x.parent and x.parent.color == RED:
        if x.uncle().color == RED:
            # case 1: ((a:R x:R b) y:B c:R) => ((a:R x:B b) y:R c:B)
            set_color([x.parent, x.grandparent(), x.uncle()],
                      [BLACK, RED, BLACK])
            x = x.grandparent()
        else:
            if x.parent == x.grandparent().left:
                if x == x.parent.right:
                    # case 2: ((a x:R b:R) y:B c) => case 3
                    x = x.parent
                    t = left_rotate(t, x)
                # case 3: ((a:R x:R b) y:B c) => (a:R x:B (b y:R c))
                set_color([x.parent, x.grandparent()], [BLACK, RED])
                t = right_rotate(t, x.grandparent())
            else:
                if x == x.parent.left:
                    # case 2: (a x:B (b:R y:R c)) => case 3
                    x = x.parent
                    t = right_rotate(t, x)
                # case 3: (a x:B (b y:R c:R)) => ((a x:R b) y:B c:R)
                set_color([x.parent, x.grandparent()], [BLACK, RED])
                t = left_rotate(t, x.grandparent())
    t.color = BLACK
    return t

Figure 3.12 shows the results of feeding the same series of keys to the above Python insertion program. Comparing them with figure 3.6, one can tell the difference clearly.
Figure 3.12: Red-black trees created by the imperative algorithm.


We skip the red-black tree delete algorithm in imperative settings, because
it is even more complex than the insertion. The implementation of deleting is
left as an exercise of this chapter.

Exercise 3.5
Implement the red-black tree deletion algorithm in your favorite imperative programming language. You can refer to [2] for algorithm details.

3.6 More words

The red-black tree is the most popular implementation of balanced binary search trees. Another one is the AVL tree, which we'll introduce in the next chapter. The red-black tree can be a good starting point for more data structures: if we extend the number of children from 2 to K and keep the balance as well, it leads to the B-tree; if we store the data along the edges rather than inside the nodes, it leads to Tries. However, the handling of multiple cases and the long programs tend to make newcomers think the red-black tree is complex.
Okasaki's work helps to make the red-black tree much easier to understand. There are many implementations in other programming languages done in that manner [7]. It also inspired me to find pattern matching solutions for the Splay tree, the AVL tree, etc.

Bibliography
[1] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein. Introduction to Algorithms, Second Edition. ISBN: 0262032937. The MIT Press. 2001
[2] Chris Okasaki. FUNCTIONAL PEARLS: Red-Black Trees in a Functional Setting. J. Functional Programming. 1998
[3] Chris Okasaki. Ten Years of Purely Functional Data Structures. [Link]
[4] Wikipedia. Red-black tree. https://en.wikipedia.org/wiki/Red-black_tree
[5] Lyn Turbak. Red-Black Trees. CS231 course handout, Nov. 2, 2001.
[6] SGI STL. [Link]
[7] Wikipedia. Pattern matching. https://en.wikipedia.org/wiki/Pattern_matching

Chapter 4

AVL tree
4.1 Introduction

4.1.1 How to measure the balance of a tree?

Besides the red-black tree, are there any other intuitive solutions for self-balancing binary search trees? In order to measure how balanced a binary search tree is, one idea is to compare the heights of the left sub-tree and right sub-tree. If they differ a lot, the tree isn't well balanced. Let's denote the height difference between the two children as below:

$$\delta(T) = |L| - |R| \tag{4.1}$$

where |T| means the height of tree T, and L, R denote the left sub-tree and right sub-tree.
If δ(T) = 0, the tree is definitely balanced. For example, a complete binary tree has N = 2^h - 1 nodes for height h; there are no empty branches except at the leaves. Another trivial case is the empty tree: δ(φ) = 0. The smaller the absolute value of δ(T), the more balanced the tree is.
We define δ(T) as the balance factor of a binary search tree.

4.2 Definition of AVL tree

An AVL tree is a special binary search tree in which all sub-trees satisfy the following criterion:

$$|\delta(T)| \leq 1 \tag{4.2}$$

The absolute value of the balance factor is less than or equal to 1, which means there are only three valid values: -1, 0 and 1. Figure 4.1 shows an example AVL tree.
Why can the AVL tree keep itself balanced? In other words, can this definition ensure the height of the tree is O(lg N), where N is the number of nodes in the tree? Let's prove this fact.
For an AVL tree of height h, the number of nodes varies. It can have at most 2^h - 1 nodes, for a complete binary tree. We are interested in how many nodes there are at least.
83

84

CHAPTER 4. AVL TREE


4

10

Figure 4.1: An example AVL tree


many nodes there are at least. Lets denote the minimum number of nodes for
height h AVL tree as N (h). Its obvious for the trivial cases as below.
For empty tree, h = 0, N (0) = 0;
For a singleton root, h = 1, N (1) = 1;
Whats the situation for common case N (h)? Figure 4.2 shows an AVL tree
T of height h. It contains three part, the root node, and two sub trees A, B.
We have the following fact.
h = max(height(L), height(R)) + 1

(4.3)

We immediately know that, there must be one child has height h 1. Lets
say height(A) = h 1. According to the definition of AVL tree, we have.
|height(A) height(B)| 1. This leads to the fact that the height of other
tree B cant be lower than h 2, So the total number of the nodes of T is the
number of nodes in tree A, and B plus 1 (for the root node). We exclaim that.
N (h) = N (h 1) + N (h 2) + 1

(4.4)

This recursion reminds us the famous Fibonacci series. Actually we can


transform it to Fibonacci series by defining N 0 (h) = N (h) + 1. So equation 4.4
changes to.
N 0 (h) = N 0 (h 1) + N 0 (h 2)

(4.5)

Lemma 4.2.1. Let N (h) be the minimum number of nodes for an AVL tree
with height h. and N 0 (h) = N (h) + 1, then
N 0 (h) h
Where =

5+1
2

is the golden ratio.

Proof. For the trivial case, we have

(4.6)

4.2. DEFINITION OF AVL TREE

85
k

h-1

h-2

Figure 4.2: An AVL tree with height h, one of the sub-tree with height h 1,
the other is h 2
h = 0, N 0 (0) = 1 0 = 1
h = 1, N 0 (1) = 2 1 = 1.618...
For the induction case, suppose N 0 (h) h .
N 0 (h + 1)

= N 0 (h) + N 0 (h 1) {F ibonacci}
h + h1
= h1 ( + 1)
{ + 1 = 2 =
h+1
=

5+3
2 }

From Lemma 4.2.1, we immediately get


h log (N + 1) = log (2) lg(N + 1) 1.44 lg(N + 1)

(4.7)

It tells that the height of AVL tree is proportion to O(lg N ), which means
that AVL tree is balanced.
During the basic mutable tree operations such as insertion and deletion, if
the balance factor changes to any invalid value, some fixing has to be performed
to resume || within 1. Most implementations utilize tree rotations. In this
chapter, well show the pattern matching solution which is inspired by Okasakis
red-black tree solution[2]. Because of this modify-fixing approach, AVL tree is
also a kind of self-balancing binary search tree. For comparison purpose, well
also show the procedural algorithms.
Of course we can compute the value recursively, another option is to store
the balance factor inside each nodes, and update them when we modify the tree.
The latter one avoid computing the same value every time.
Based on this idea, we can add one data field to the original binary search
tree as the following C++ code example 1 .
template <class T>
struct node{
int delta;
T key;
node left;
1 Some

implementations store the height of a tree instead of as in [5]

86

CHAPTER 4. AVL TREE

node right;
node parent;
};

In purely functional setting, some implementation use different constructor


to store the information. for example in [1], there are 4 constructors, E, N,
P, Z defined. E for empty tree, N for tree with negative 1 balance factor, P for
tree with positive 1 balance factor and Z for zero case.
In this chapter, well explicitly store the balance factor inside the node.
data AVLTree a = Empty
| Br (AVLTree a) a (AVLTree a) Int

The immutable operations, including looking up, finding the maximum and
minimum elements are all same as the binary search tree. Well skip them and
focus on the mutable operations.

4.3

Insertion

Insert a new element to an AVL tree may violate the AVL tree property that the
absolute value exceeds 1. To resume it, one option is to do the tree rotation
according to the different insertion cases. Most implementation is based on this
approach
Another way is to use the similar pattern matching method mentioned by
Okasaki in his red-black tree implementation [2]. Inspired by this idea, it is
possible to provide a simple and intuitive solution.
When insert a new key to the AVL tree, the balance factor of the root may
changes in range [1, 1], and the height may increase at most by one, which
we need recursively use this information to update the value in upper level
nodes. We can define the result of the insertion algorithm as a pair of data
(T 0 , H). Where T 0 is the new tree and H is the increment of height. Lets
denote function f irst(pair) which can return the first element in a pair. We
can modify the binary search tree insertion algorithm as the following to handle
AVL tree.
insert(T, k) = f irst(ins(T, k))

(4.8)

where

ins(T, k) =

(node(, k, , 0), 1) :
(tree(ins(L, k), Key, (R, 0)), ) :

(tree((L, 0), Key, ins(R, k)), ) :

T =
k < Key
otherwise

(4.9)

L, R, Key, represent the left child, right child, the key and the balance
factor of a tree.
L = lef t(T )
R = right(T )
Key = key(T )
= (T )

4.3. INSERTION

87

When we insert a new key k to a AVL tree T , if the tree is empty, we just
need create a leaf node with k, set the balance factor as 0, and the height is
increased by one. This is the trivial case. Function node() is defined to build a
tree by taking a left sub-tree, a right sub-tree, a key and a balance factor.
If T isnt empty, we need compare the Key with k. If k is less than the key,
we recursively insert it to the left child, otherwise we insert it into the right
child.
As we defined above, the result of the recursive insertion is a pair like
(L0 , Hl ), we need do balancing adjustment as well as updating the increment
of height. Function tree() is defined to dealing with this task. It takes 4 parameters as (L0 , Hl ), Key, (R0 , Hr ), and . The result of this function is
defined as (T 0 , H), where T 0 is the new tree after adjustment, and H is the
new increment of height which is defined as
H = |T 0 | |T |

(4.10)

This can be further detailed deduced in 4 cases.


H

= |T 0 | |T |
= 1 + max(|R0 |, |L0 |) (1 + max(|R|, |L|))
0
= max(|R
|, |L0 |) max(|R|, |L|)

Hr : 0 0 0

+ Hr : 0 0 0
=
Hl : 0 0 0

Hl : otherwise

(4.11)

To prove this equation, note the fact that the height cant increase both in
left and right with only one insertion.
These 4 cases can be explained from the definition of balance factor definition
that it equal to the difference from the right sub tree and left sub tree.
If 0 and 0 0, it means that the height of right sub tree isnt less
than the height of left sub tree both before insertion and after insertion.
In this case, the increment in height of the tree is only contributed from
the right sub tree, which is Hr .
If 0, which means the height of left sub tree isnt less than the height
of right sub tree before, and it becomes 0 0, which means that the
height of right sub tree increases due to insertion, and the left side keeps
same (|L0 | = |L|). So the increment in height is
H

= max(|R0 |, |L0 |) max(|R|, |L|)


= |R0 | |L0 |
= |R| + Hr |L|
= + Hr

{ 0 0 0}
{|L| = |L0 |}

For the case 0 0 0, Similar as the second one, we can get.


H

= max(|R0 |, |L0 |) max(|R|, |L|)


= |L0 | |R|
= |L| + Hl |R|
= Hl

{ 0 0 0}

88

CHAPTER 4. AVL TREE


For the last case, the both and 0 is no bigger than zero, which means
the height left sub tree is always greater than or equal to the right sub
tree, so the increment in height is only contributed from the right sub
tree, which is Hl .

The next problem in front of us is how to determine the new balancing factor
value 0 before performing balancing adjustment. According to the definition
of AVL tree, the balancing factor is the height of right sub tree minus the height
of right sub tree. We have the following facts.
0

= |R0 | |L0 |
= |R| + Hr (|L| + Hl )
= |R| |L| + Hr Hl
= + Hr Hl

(4.12)

With all these changes in height and balancing factor get clear, its possible
to define the tree() function mentioned in (4.9).
tree((L0 , Hl ), Key, (R0 , Hr ), ) = balance(node(L0 , Key, R0 , 0 ), H)
(4.13)
Before we moving into details of balancing adjustment, lets translate the
above equations to real programs in Haskell.
First is the insert function.
insert::(Ord a)AVLTree a a AVLTree a
insert t x = fst $ ins t where
ins Empty = (Br Empty x Empty 0, 1)
ins (Br l k r d)
| x<k
= tree (ins l) k (r, 0) d
| x == k
= (Br l k r d, 0)
| otherwise = tree (l, 0) k (ins r) d

Here we also handle the case that inserting a duplicated key (which means
the key has already existed.) as just overwriting.
tree::(AVLTree a, Int) a (AVLTree a, Int) Int (AVLTree a, Int)
tree (l, dl) k (r, dr) d = balance (Br l k r d, delta) where
d = d + dr - dl
delta = deltaH d d dl dr

And the definition of height increment is as below.


deltaH :: Int Int
deltaH d d dl dr
| d 0 && d
| d 0 && d
| d 0 && d
| otherwise =

4.3.1

Int Int Int


0 = dr
0 = d+dr
0 = dl - d
dl

Balancing adjustment

As the pattern matching approach is adopted in doing re-balancing. We need


consider what kind of patterns violate the AVL tree property.

4.3. INSERTION

89

Figure 4.3 shows the 4 cases which need fix. For all these 4 cases the balancing factors are either -2, or +2 which exceed the range of [1, 1]. After
balancing adjustment, this factor turns to be 0, which means the height of left
sub tree is equal to the right sub tree.

(z) = 2

(x) = 2

(y) = 1
y

@
B

(x) = 1

@
I
@

(x) = 2

@
A

(z) = 1

@
R
@
x

(z) = 2

(y) = 1

0 (y) = 0

Figure 4.3: 4 cases for balancing a AVL tree after insertion


We call these four cases left-left lean, right-right lean, right-left lean, and leftright lean cases in clock-wise direction from top-left. We denote the balancing
factor before fixing as (x), (y), and (z), while after fixing, they changes to
0 (x), 0 (y), and 0 (z) respectively.
Well next prove that, after fixing, we have (y) = 0 for all four cases, and
well provide the result values of 0 (x) and 0 (z).

90

CHAPTER 4. AVL TREE

Left-left lean case


As the structure of sub tree x doesnt change due to fixing, we immediately get
0 (x) = (x).
Since (y) = 1 and (z) = 2, we have
(y) = |C| |x| = 1 |C| = |x| 1
(z) = |D| |y| = 2 |D| = |y| 2

(4.14)

= |D| |C|
{F rom(4.14)}
= |y| 2 (|x| 1)
= |y| |x| 1
{x is child of y |y| |x| = 1}
=0

(4.15)

After fixing.
0 (z)

For 0 (y), we have the following fact after fixing.


0 (y) = |z| |x|
= 1 + max(|C|, |D|) |x| {By (4.15), we have|C| = |D|}
= 1 + |C| |x|
{By (4.14)}
= 1 + |x| 1 |x|
=0

(4.16)

Summarize the above results, the left-left lean case adjust the balancing
factors as the following.
0 (x) = (x)
0 (y) = 0
0 (z) = 0

(4.17)

Right-right lean case


Since right-right case is symmetric to left-left case, we can easily achieve the
result balancing factors as
0 (x) = 0
0 (y) = 0
0 (z) = (z)

(4.18)

Right-left lean case


First lets consider 0 (x). After balance fixing, we have.
0 (x) = |B| |A|

(4.19)

Before fixing, if we calculate the height of z, we can get.


|z| = 1 + max(|y|, |D|)
= 1 + |y|
= 2 + max(|B|, |C|)

{(z) = 1 |y| > |D|}


(4.20)

4.3. INSERTION

91

While since (x) = 2, we can deduce that.


(x) = 2 |z| |A| = 2
2 + max(|B|, |C|) |A| = 2
max(|B|, |C|) |A| = 0

{By (4.20)}
(4.21)

If (y) = 1, which means |C| |B| = 1, it means


max(|B|, |C|) = |C| = |B| + 1

(4.22)

Take this into (4.21) yields


|B| + 1 |A| = 0 |B| |A| = 1 {By (4.19) }
0 (x) = 1

(4.23)

If (y) 6= 1, it means max(|B|, |C|) = |B|, taking this into (4.21), yields.
|B| |A| = 0 {By (4.19)}
0 (x) = 0

(4.24)

Summarize these 2 cases, we get relationship of 0 (x) and (y) as the following.
0 (x) =

1 : (y) = 1
0 : otherwise

(4.25)

For 0 (z) according to definition, it is equal to.


0 (z) = |D| |C|
{(z) = 1 = |D| |y|}
= |y| |C| 1
{|y| = 1 + max(|B|, |C|)}
= max(|B|, |C|) |C|

(4.26)

If (y) = 1, then we have |C| |B| = 1, so max(|B|, |C|) = |B| = |C| + 1.


Takes this into (4.26), we get 0 (z) = 1.
If (y) 6= 1, then max(|B|, |C|) = |C|, we get 0 (z) = 0.
Combined these two cases, the relationship between 0 (z) and (y) is as
below.
0 (z) =

1 :
0 :

(y) = 1
otherwise

(4.27)

Finally, for 0 (y), we deduce it like below.


0 (y) = |z| |x|
= max(|C|, |D|) max(|A|, |B|)

(4.28)

There are three cases.


If (y) = 0, it means |B| = |C|, and according to (4.25) and (4.27), we
have 0 (x) = 0 |A| = |B|, and 0 (z) = 0 |C| = |D|, these lead to
0 (y) = 0.

92

CHAPTER 4. AVL TREE


If (y) = 1, From (4.27), we have 0 (z) = 0 |C| = |D|.
0 (y) = max(|C|, |D|) max(|A|, |B|)
= |C| max(|A|, |B|)
= |C| (|B| + 1)
=0

{|C| = |D|}
{From (4.25): 0 (x) = 1 |B| |A| = 1}
{(y) = 1 |C| |B| = 1}

If (y) = 1, From (4.25), we have 0 (x) = 0 |A| = |B|.


0 (y) = max(|C|, |D|) max(|A|, |B|)
= max(|C|, |D|) |B|
= |C| + 1 |B|
=0

{|A| = |B|}
{From (4.27): |D| |C| = 1}
{(y) = 1 |C| |B| = 1}

Both three cases lead to the same result that 0 (y) = 0.


Collect all the above results, we get the new balancing factors after fixing as
the following.

1 : (y) = 1
0 (x) =
0 : otherwise
0 (y) = 
0
(4.29)
1 : (y) = 1
0
(z) =
0 : otherwise
Left-right lean case
Left-right lean case is symmetric to the Right-left lean case. By using the similar
deduction, we can find the new balancing factors are identical to the result in
(4.29).

4.3.2

Pattern Matching

All the problems have been solved and its time to define the final pattern
matching fixing function.

(node(node(A, x, B, (x)), y, node(C, z, D, 0), 0), 0) :


(node(node(A, x, B, 0), y, node(C, z, D, (z)), 0), 0) :
balance(T, H) =
(node(node(A,
x, B, 0 (x)), y, node(C, z, D, 0 (z)), 0), 0) :

(T, H) :
(4.30)
Where Pll (T ) means the pattern of tree T is left-left lean respectively. 0 (x)
and delta0 (z) are defined in (4.29). The four patterns are tested as below.
Pll (T ) = node(node(node(A, x, B, (x)), y, C, 1), z, D, 2)
Prr (T ) = node(A, x, node(B, y, node(C, z, D, (z)), 1), 2)
Prl (T ) = node(node(A, x, node(B, y, C, (y)), 1), z, D, 2)
Plr (T ) = node(A, x, node(node(B, y, C, (y)), z, D, 1), 2)

(4.31)

Translating the above function definition to Haskell yields a simple and intuitive program.

Pll (T )
Prr (T )
Prl (T ) Plr (T )
otherwise

4.3. INSERTION

93

balance :: (AVLTree a, Int) (AVLTree a, Int)


balance (Br (Br (Br a x b dx) y c (-1)) z d (-2),
(Br (Br a x b dx) y (Br c z d 0) 0, 0)
balance (Br a x (Br b y (Br c z d dz)
1)
2,
(Br (Br a x b 0) y (Br c z d dz) 0, 0)
balance (Br (Br a x (Br b y c dy)
1) z d (-2),
(Br (Br a x b dx) y (Br c z d dz) 0, 0)
dx = if dy == 1 then -1 else 0
dz = if dy == -1 then 1 else 0
balance (Br a x (Br (Br b y c dy) z d (-1))
2,
(Br (Br a x b dx) y (Br c z d dz) 0, 0)
dx = if dy == 1 then -1 else 0
dz = if dy == -1 then 1 else 0
balance (t, d) = (t, d)

_) =
_) =
_) =
where

_) =
where

The insertion algorithm takes time proportion to the height of the tree, and
according to the result we proved above, its performance is O(lg N ) where N is
the number of elements stored in the AVL tree.
Verification
One can easily create a function to verify a tree is AVL tree. Actually we need
verify two things, first, its a binary search tree; second, it satisfies AVL tree
property.
We left the first verification problem as an exercise to the reader.
In order to test if a binary tree satisfies AVL tree property, we can test
the difference in height between its two children, and recursively test that both
children conform to AVL property until we arrive at an empty leaf.

avl?(T ) =

T rue
avl?(L) avl?(R) ||R| |L|| 1

: T =
: otherwise

(4.32)

And the height of a AVL tree can also be calculate from the definition.

|T | =

0 : T =
1 + max(|R|, |L|) : otherwise

(4.33)

The corresponding Haskell program is given as the following.


isAVL :: (AVLTree a) Bool
isAVL Empty = True
isAVL (Br l _ r d) = and [isAVL l, isAVL r, abs (height r - height l) 1]
height :: (AVLTree a) Int
height Empty = 0
height (Br l _ r _) = 1 + max (height l) (height r)

Exercise 4.1
Write a program to verify a binary tree is a binary search tree in your
favorite programming language. If you choose to use an imperative language,
please consider realize this program without recursion.

94

CHAPTER 4. AVL TREE

4.4

Deletion

As we mentioned before, deletion doesnt make significant sense in purely functional settings. As the tree is read only, its typically performs frequently looking
up after build.
Even if we implement deletion, its actually re-building the tree as we presented in chapter of red-black tree. We left the deletion of AVL tree as an
exercise to the reader.

Exercise 4.2

Take red-black tree deletion algorithm as an example, write the AVL tree
deletion program in purely functional approach in your favorite programming language.
Write the deletion algorithm in imperative approach in your favorite programming language.

4.5

Imperative AVL tree algorithm ?

We almost finished all the content in this chapter about AVL tree. However, it
necessary to show the traditional insert-and-rotate approach as the comparator
to pattern matching algorithm.
Similar as the imperative red-black tree algorithm, the strategy is first to do
the insertion as same as for binary search tree, then fix the balance problem by
rotation and return the final result.
1: function Insert(T, k)
2:
root T
3:
x Create-Leaf(k)
4:
(x) 0
5:
parent N IL
6:
while T 6= N IL do
7:
parent T
8:
if k < Key(T ) then
9:
T Left(T )
10:
else
11:
T Right(T )
12:
Parent(x) parent
13:
if parent = N IL then
. tree T is empty
14:
return x
15:
else if k < Key(parent) then
16:
Left(parent) x
17:
else
18:
Right(parent) x
19:
return AVL-Insert-Fix(root, x)
Note that after insertion, the height of the tree may increase, so that the
balancing factor may also change, insert on right side will increase by 1,
while insert on left side will decrease it. By the end of this algorithm, we need
perform bottom-up fixing from node x towards root.

4.5. IMPERATIVE AVL TREE ALGORITHM ?

95

We can translate the pseudo code to real programming language, such as


Python 2 .
def avl_insert(t, key):
root = t
x = Node(key)
parent = None
while(t):
parent = t
if(key < [Link]):
t = [Link]
else:
t = [Link]
if parent is None: #tree is empty
root = x
elif key < [Link]:
parent.set_left(x)
else:
parent.set_right(x)
return avl_insert_fix(root, x)

This is a top-down algorithm search the tree from root down to the proper
position and insert the new key as a leaf. By the end of this algorithm, it calls
fixing procedure, by passing the root and the new node inserted.
Note that we reuse the same methods of set left() and set right() as we
defined in chapter of red-black tree.
In order to resume the AVL tree balance property by fixing, we first determine if the new node is inserted on left hand or right hand. If it is on left, the
balancing factor decreases, otherwise it increases. If we denote the new value
as 0 , there are 3 cases of the relationship between and 0 .
If || = 1 and | 0 | = 0, this means adding the new node makes the tree
perfectly balanced, the height of the parent node doesnt change, the algorithm can be terminated.
If || = 0 and | 0 | = 1, it means that either the height of left sub tree or
right sub tree increases, we need go on check the upper level of the tree.
If || = 1 and | 0 | = 2, it means the AVL tree property is violated due to
the new insertion. We need perform rotation to fix it.
1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:

function AVL-Insert-Fix(T, x)
while Parent(x) 6= N IL do
(Parent(x))
if x = Left(Parent(x)) then
0 1
else
0 + 1
(Parent(x)) 0
P Parent(x)
L Left(x)
R Right(x)

2C

and C++ source code are available along with this book

96
12:
13:
14:
15:
16:
17:
18:
19:
20:
21:
22:
23:
24:
25:
26:
27:
28:
29:
30:
31:
32:
33:
34:
35:
36:
37:
38:
39:
40:
41:
42:
43:
44:
45:
46:
47:
48:
49:
50:
51:
52:
53:
54:

CHAPTER 4. AVL TREE


if || = 1 and | 0 | = 0 then
. Height doesnt change, terminates.
return T
else if || = 0 and | 0 | = 1 then
. Go on bottom-up updating.
xP
else if || = 1 and | 0 | = 2 then
if 0 = 2 then
if (R) = 1 then
. Right-right case
(P ) 0
. By (4.18)
(R) 0
T Left-Rotate(T, P )
if (R) = 1 then
. Right-left case
y (Left(R))
. By (4.29)
if y = 1 then
(P ) 1
else
(P ) 0
(Left(R)) 0
if y = 1 then
(R) 1
else
(R) 0
T Right-Rotate(T, R)
T Left-Rotate(T, P )
if 0 = 2 then
if (L) = 1 then
. Left-left case
(P ) 0
(L) 0
Right-Rotate(T, P )
else
. Left-Right case
y (Right(L))
if y = 1 then
(L) 1
else
(L) 0
(Right(L)) 0
if y = 1 then
(P ) 1
else
(P ) 0
Left-Rotate(T, L)
Right-Rotate(T, P )
break
return T

Here we reuse the rotation algorithms mentioned in red-black tree chapter.


Rotation operation doesnt update balancing factor at all, However, since
rotation changes (actually improves) the balance situation we should update
these factors. Here we refer the results from above section. Among the four
cases, right-right case and left-left case only need one rotation, while right-left

4.5. IMPERATIVE AVL TREE ALGORITHM ?


case and left-right case need two rotations.
The relative python program is shown as the following.
def avl_insert_fix(t, x):
while [Link] is not None:
d2 = d1 = [Link]
if x == [Link]:
d2 = d2 - 1
else:
d2 = d2 + 1
[Link] = d2
(p, l, r) = ([Link], [Link], [Link])
if abs(d1) == 1 and abs(d2) == 0:
return t
elif abs(d1) == 0 and abs(d2) == 1:
x = [Link]
elif abs(d1)==1 and abs(d2) == 2:
if d2 == 2:
if [Link] == 1: # Right-right case
[Link] = 0
[Link] = 0
t = left_rotate(t, p)
if [Link] == -1: # Right-Left case
dy = [Link]
if dy == 1:
[Link] = -1
else:
[Link] = 0
[Link] = 0
if dy == -1:
[Link] = 1
else:
[Link] = 0
t = right_rotate(t, r)
t = left_rotate(t, p)
if d2 == -2:
if [Link] == -1: # Left-left case
[Link] = 0
[Link] = 0
t = right_rotate(t, p)
if [Link] == 1: # Left-right case
dy = [Link]
if dy == 1:
[Link] = -1
else:
[Link] = 0
[Link] = 0
if dy == -1:
[Link] = 1
else:
[Link] = 0
t = left_rotate(t, l)
t = right_rotate(t, p)
break
return t

97

98

CHAPTER 4. AVL TREE

We skip the AVL tree deletion algorithm and left this as an exercise to the
reader.

4.6

Chapter note

AVL tree was invented in 1962 by Adelson-Velskii and Landis[3], [4]. The name
AVL tree comes from the two inventors name. Its earlier than red-black tree.
Its very common to compare AVL tree and red-black tree, both are selfbalancing binary search trees, and for all the major operations, they both consume O(lg N ) time. From the result of (4.7), AVL tree is more rigidly balanced
hence they are faster than red-black tree in looking up intensive applications
[3]. However, red-black trees could perform better in frequently insertion and
removal cases.
Many popular self-balancing binary search tree libraries are implemented on
top of red-black tree such as STL etc. However, AVL tree provides an intuitive
and effective solution to the balance problem as well.
After this chapter, well extend the tree data structure from storing data in
node to storing information on edges, which leads to Trie and Patrica, etc. If
we extend the number of children from two to more, we can get B-tree. These
data structures will be introduced next.

Bibliography
[1] [Link] [Link]
[2] Chris Okasaki. FUNCTIONAL PEARLS Red-Black Trees in a Functional
Setting. J. Functional Programming. 1998
[3] Wikipedia. AVL tree. [Link] tree
[4] Guy Cousinear, Michel Mauny. The Functional Approach to Programming. Cambridge University Press; English Ed edition (October 29, 1998).
ISBN-13: 978-0521576819
[5] Pavel Grafov. Implementation of an
[Link]

99

AVL

tree

in

Python.

100

Trie and Patricia

Chapter 5

Trie and Patricia


5.1

Introduction

The binary trees introduced so far store information in nodes. Its possible to
store the information in edges. Trie and Patricia are important data structures
in information retrieving and manipulating. They were invented in 1960s. And
are widely used in compiler design[2], and bio-information area, such as DNA
pattern matching [3].

0
1

10

011

100

1011

Figure 5.1: Radix tree.


Figure 5.1 shows a radix tree[2]. It contains the strings of bit 1011, 10, 011,
100 and 0. When searching a key k = (b0 b1 ...bn )2 , we take the first bit b0 (MSB
101

102

CHAPTER 5. TRIE AND PATRICIA

from left), check if it is 0 or 1, if it is 0, we turn left; and turn right for 1. Then
we take the second bit and repeat this search till either meet a leaf or finish all
n bits.
The radix tree neednt store keys in node at all. The information is represented by edges. The nodes marked with keys in the above figure are only for
illustration purpose.
It is very natural to come to the idea is it possible to represent key in
integer instead of string? Because integer can be written in binary format,
such approach can save spaces. Another advantage is that the speed is fast
because we can use bit-wise manipulation in most programming environment.

5.2

Integer Trie

The data structure shown in figure 5.1 is called as binary trie. Trie is invented
by Edward Fredkin. It comes from retrieval, pronounce as /tri:/ by the
inventor, while it is pronounced /trai/ try by other authors [5]. Trie is also
called prefix tree or radix tree. A binary trie is a special binary tree in which
the placement of each key is controlled by its bits, each 0 means go left and
each 1 means go right[2].
Because integers can be represented in binary format, it is possible to store
integer keys rather than 0, 1 strings. When insert an integer as the new key to
the trie, we change it to binary form, then examine the first bit, if it is 0, we
recursively insert the rest bits to the left sub tree; otherwise if it is 1, we insert
into the right sub tree.
There is a problem when treat the key as integer. Consider a binary trie
shown in figure 5.2. If represented in 0, 1 strings, all the three keys are different.
But they are identical when turn into integers. Where should we insert decimal
3, for example, to the trie?
One approach is to treat all the prefix zero as effective bits. Suppose the
integer is represented with 32-bits, If we want to insert key 1, it ends up with
a tree of 32 levels. There are 31 nodes, each only has the left sub tree. the last
node only has the right sub tree. It is very inefficient in terms of space.
Okasaki shows a method to solve this problem in [2]. Instead of using bigendian integer, we can use the little-endian integer to represent key. Thus
decimal integer 1 is represented as binary 1. Insert it to the empty binary trie,
the result is a trie with a root and a right leaf. There is only 1 level. decimal 2
is represented as 01, and decimal 3 is (11)2 in little-endian binary format. There
is no need to add any prefix 0, the position in the trie is uniquely determined.

5.2.1

Definition of integer Trie

In order to define the little-endian binary trie, we can reuse the structure of
binary tree. A binary trie node is either empty, or a branch node. The branch
node contains a left child, a right node, and optional value as satellite data. The
left sub tree is encoded as 0 and the right sub tree is encoded as 1.
The following example Haskell code defines the trie algebraic data type.
data IntTrie a = Empty
| Branch (IntTrie a) (Maybe a) (IntTrie a)

5.2. INTEGER TRIE

103

11

011

0011

Figure 5.2: a big-endian trie


Below is another example definition in Python.
class IntTrie:
def __init__(self):
[Link] = [Link] = None
[Link] = None

5.2.2

Insertion

Since the key is little-endian integer, when insert a key, we take the bit one by
one from the right most (LSB). If it is 0, we go to the left, otherwise go to the
right for 1. If the child is empty, we need create a new node, and repeat this to
the last bit (MSB) of the key.
1: function Insert(T, k, v)
2:
if T = NIL then
3:
T Empty-Node
4:
pT
5:
while k 6= 0 do
6:
if Even?(k) then
7:
if Left(p) = NIL then
8:
Left(p) Empty-Node
9:
p Left(p)
10:
else
11:
if Right(p) = NIL then
12:
Right(p) Empty-Node
13:
p Right(p)

104

CHAPTER 5. TRIE AND PATRICIA

k bk/2c
15:
Data(p) v
16:
return T
This algorithm takes 3 arguments, a Trie T , a key k, and the satellite date v.
The following Python example code implements the insertion algorithm. The
satellite data is optional, it is empty by default.
14:

def trie_insert(t, key, value = None):


if t is None:
t = IntTrie()
p=t
while key != 0:
if key & 1 == 0:
if [Link] is None:
[Link] = IntTrie()
p = [Link]
else:
if [Link] is None:
[Link] = IntTrie()
p = [Link]
key = key>>1
[Link] = value
return t

Figure 5.2 shows a trie which is created by inserting pairs of key and value
{1 a, 4 b, 5 c, 9 d} to the empty trie.

1
1:a

4:b

1
5:c

1
9:d

Figure 5.3: An little-endian integer binary trie for the map {1 a, 4 b, 5


c, 9 d}.
Because the definition of the integer trie is recursive, its nature to define
the insertion algorithm recursively. If the LSB is 0, it means that the key to
be inserted is even, we recursively insert to the left child, we can divide the

5.2. INTEGER TRIE

105

key by 2 to get rid of the LSB. If the LSB is 1, the key is odd number, the
recursive insertion is applied to the right child. For trie T , denote the left and
right children as Tl and Tr respectively. Thus T = (Tl , d, Tr ), where d is the
optional satellite data. if T is empty, Tl , Tr and d are defined as empty as well.

insert(T, k, v) =

(Tl , v, Tr ) : k = 0
(insert(Tl , k/2, v), d, Tr ) : even(k)

(Tl , d, insert(Tr , bk/2c, v)) : otherwise

(5.1)

If the key to be inserted already exists, this algorithm just overwrites the
previous stored data. It can be replaced with other alternatives, such as storing
data as with linked-list etc.
The following Haskell example program implements the insertion algorithm.
insert t 0 x = Branch (left t) (Just x) (right t)
insert t k x | even k = Branch (insert (left t) (k div 2) x) (value t) (right t)
| otherwise = Branch (left t) (value t) (insert (right t) (k div 2) x)
left (Branch l _ _) = l
left Empty = Empty
right (Branch _ _ r) = r
right Empty = Empty
value (Branch _ v _) = v
value Empty = Nothing

For a given integer k with m bits in binary, the insertion algorithm reuses
m levels. The performance is bound to O(m) time.

5.2.3

Look up

To look up key k in the little-endian integer binary trie. We take each bit of
k from the left (LSB), then go left if this bit is 0, otherwise, we go right. The
looking up completes when all bits are consumed.
1: function Lookup(T, k)
2:
while x 6= 0 T 6=NIL do
3:
if Even?(x) then
4:
T Left(T )
5:
else
6:
T Right(T )
7:
k bk/2c
8:
if T 6= NIL then
9:
return Data(T )
10:
else
11:
return not found
Below Python example code uses bit-wise operation to implements the looking up algorithm.
def lookup(t, key):
while key != 0 and (t is not None):
if key & 1 == 0:

106

CHAPTER 5. TRIE AND PATRICIA


t = [Link]
else:
t = [Link]
key = key>>1
if t is not None:
return [Link]
else:
return None

Looking up can also be define in recursive manner. If the tree is empty, the
looking up fails; If k = 0, the satellite data is the result to be found; If the last
bit is 0, we recursively look up the left child; otherwise, we look up the right
child.

: T =

d : k=0
lookup(T, k) =
(5.2)
lookup(T
,
k/2)
: even(k)

lookup(Tr , bk/2c) : otherwise


The following Haskell example program implements the recursive look up
algorithm.
search Empty k = Nothing
search t 0 = value t
search t k = if even k then search (left t) (k div 2)
else search (right t) (k div 2)

The looking up algorithm is bound to O(m) time, where m is the number of


bits for a given key.

5.3

Integer Patricia

Trie has some drawbacks. It wastes a lot of spaces. Note in figure 5.2, only
leafs store the real data. Typically, the integer binary trie contains many nodes
only have one child. One improvement idea is to compress the chained nodes
together. Patricia is such a data structure invented by Donald R. Morrison
in 1968. Patricia means practical algorithm to retrieve information coded in
alphanumeric[3]. It is another kind of prefix tree.
Okasaki gives implementation of integer Patricia in [2]. If we merge the
chained nodes which have only one child together in figure 5.3, We can get a
Patricia as shown in figure 5.4.
From this figure, we can find that the key for the sibling nodes is the longest
common prefix for them. They branches out at certain bit. Patricia saves a lot
of spaces compare to trie.
Different from integer trie, using the big-endian integer in Patricia doesnt
cause the padding zero problem mentioned in section 5.2. All zero bits before
MSB are omitted to save space. Okasaki lists some significant advantages of
big-endian Patricia[2].

5.3.1

Definition

Integer Patricia tree is a special kind of binary tree. It is either empty or is a


node. There are two different types of node.

5.3. INTEGER PATRICIA

107

001
4:b

1
1:a
0

01
9:d

1
5:c

Figure 5.4: Little endian Patricia for the map {1 a, 4 b, 5 c, 9 d}.


It can be a leaf contains integer key and optional satellite data;
or a branch node, contains the left and the right children. The two children
shares the longest common prefix bits for their keys. For the left child,
the next bit of the key is zero, while the next bit is one for the right child.
The following Haskell example code defines Patricia accordingly.
type Key = Int
type Prefix = Int
type Mask = Int
data IntTree a = Empty
| Leaf Key a
| Branch Prefix Mask (IntTree a) (IntTree a)

In order to tell from which bit the left and right children differ, a mask is
recorded in the branch node. Typically, a mask is power of 2, like 2n for some
non-negative integer n, all bits being lower than n dont belong to the common
prefix.
The following Python example code defines Patricia as well as some auxiliary
functions.
class IntTree:
def __init__(self, key = None, value = None):
[Link] = key
[Link] = value
[Link] = [Link] = None
[Link] = [Link] = None
def set_children(self, l, r):
[Link] = l
[Link] = r
def replace_child(self, x, y):
if [Link] == x:
[Link] = y

108

CHAPTER 5. TRIE AND PATRICIA


else:
[Link] = y
def is_leaf(self):
return [Link] is None and [Link] is None
def get_prefix(self):
if [Link] is None:
return [Link]
else:
return [Link]

5.3.2

Insertion

When insert a key, if the tree is empty, we can just create a leaf node with the
given key and satellite data, as shown in figure 5.5).

NIL

12

Figure 5.5: Left: the empty tree; Right: After insert key 12.
If the tree is just a singleton leaf node x, we can create a new leaf y, put the
key and data into it. After that, we need create a new branch node, and set x
and y as the two children. In order to determine if the y should be the left or
right node, we need find the longest common prefix of x and y. For example
if key(x) is 12 ((1100)2 in binary), key(y) is 15 ((1111)2 in binary), then the
longest common prefix is (11oo)2 . Where o denotes the bits we dont care about.
We can use another integer to mask those bits. In this case, the mask number
is 4 (100 in binary). The next bit after the longest common prefix presents 21 .
This bit is 0 in key(x), while it is 1 in key(y). We should set x as the left child
and y as the right child. Figure 5.6 shows this example.
In case the tree is neither empty, nor a singleton leaf, we need firstly check if
the key to be inserted matches the longest common prefix recorded in the root.
Then recursively insert the key to the left or the right child according to the
next bit of the common prefix. For example, if insert key 14 ((1110)2 in binary)
to the result tree in figure 5.6, since the common prefix is (11oo)2 , and the next
bit (the bit of 21 ) is 1, we need recursively insert to the right child.
If the key to be inserted doesnt match the longest common prefix stored in
the root, we need branch a new leaf out. Figure 5.7 shows these two different
cases.
For a given key k and value v, denote (k, v) is the leaf node. For branch
node, denote it in form of (p, m, Tl , Tr ), where p is the longest common prefix,
m is the mask, Tl and Tr are the left and right children. Summarize the above

5.3. INTEGER PATRICIA

109

prefix=1100
mask=100

12

12

15

Figure 5.6: Left: A tree with singleton leaf 12; Right: After insert key 15.

prefix=1100
mask=100

prefix=1100
mask=100

12

15

12

prefix=1110
mask=10

14

15

(a) Insert key 14. It matches the longest common


prefix (1100)2 ; 14 is then recursively inserted to the
right sub tree.

prefix=1100
mask=100

12

prefix=0
mask=10000

15

prefix=1110
mask=10

12

15

(b) Insert key 5. It doesnt match the longest common prefix


(1100)2 , a new leaf is branched out.

Figure 5.7: Insert key to branch node.

110

CHAPTER 5. TRIE AND PATRICIA

cases, the insertion algorithm can be defined as the following.

= T = (k, v 0 )
= (k 0 , v 0 )
= (p, m, Tl , Tr ), match(k, p, m), zero(k, m)
insert(T, k, v) =

= (p, m, Tl , Tr ), match(k, p, m), zero(k, m)

= (p, m, Tl , Tr ), match(k, p, m)
(5.3)
The first clause deals with the edge cases, that either T is empty or it is a
leaf node with the same key. The algorithm overwrites the previous value for
the later case.
The second clause handles the case that T is a leaf node, but with different
key. Here we need branch out another new leaf. We need extract the longest
common prefix, and determine which leaf should be set as the left, and which
should be set as the right child. Function join(k1 , T1 , K2 , T2 ) does this work.
Well define it later.
The third clause deals with the case that T is a branch node, the longest
common prefix matches the key to be inserted, and the next bit to the common
prefix is zero. Here we need recursively insert to the left child.
The fourth clause handles the similar case as the third clause, except that
the next bit to the common prefix is one, but not zero. We need recursively
insert to the right child.
The last clause is for the case that the key to be inserted doesnt match the
longest common prefix stored in the branch. We need branch out a new leaf by
calling the join function.
We need define function match(k, p, m) to test if the key k, has the same
prefix p above the masked bits m. For example, suppose the prefix stored in
a branch node is (pn pn1 ...pi ...p0 )2 in binary, key k is (kn kn1 ...ki ...k0 )2 in
binary, and the mask is (100...0)2 = 2i . They match if and only if pj = kj for
all i j n.
One solution to realize match is to test if mask(k, m) = p is satisfied. Where
mask(x, m) = m 1&x, that we perform bitwise-not of m 1, then perform
bitwise-and with x.
Function zero(k, m) test the next bit of the common prefix is zero. With the
help of the mask m, we can shift m one bit to the right, then perform bitwise
and with the key.
(k, v)
join(k, (k, v), k 0 , T )
(p, m, insert(Tl , k, v), Tr )
(p, m, Tl , insert(Tr , k, v))
join(k, (k, v), p, T )

:
:
:
:
:

T
T
T
T
T

zero(k, m) = x&shif tr (m, 1)

(5.4)

If the mask m = (100..0)2 = 2i , k = (kn kn1 ...ki 1...k0 )2 , because the bit
next to ki is 1, zero(k, m) returns false value; if k = (kn kn1 ...ki 0...k0 )2 , then
the result is true.
Function join(p1 , T1 , p2 , T2 ) takes two different prefixes and trees. It extracts
the longest common prefix of p1 and p2 , create a new branch node, and set T1
and T2 as the two children.

join(p1 , T1 , p2 , T2 ) =

(p, m, T1 , T2 ) :
(p, m, T2 , T1 ) :

zero(p1, m), (p, m) = LCP (p1 , p2 )


zero(p1, m)
(5.5)

5.3. INTEGER PATRICIA

111

In order to calculate the longest common prefix of p1 and p2 , we can firstly


compute bitwise exclusive-or for them, then count the number of bits in this
result, and generate a mask m = 2|xor(p1 ,p2 )| . The longest common prefix p can
be given by masking the bits with m for either p1 or p2 .
p = mask(p1 , m)

(5.6)

The following Haskell example code implements the insertion algorithm.


import [Link]
insert t k x
= case t of
Empty Leaf k x
Leaf k x if k==k then Leaf k x
else join k (Leaf k x) k t -- t@(Leaf k x)
Branch p m l r
| match k p m if zero k m
then Branch p m (insert l k x) r
else Branch p m l (insert r k x)
| otherwise join k (Leaf k x) p t -- t@(Branch p m l r)
join p1 t1 p2 t2 = if zero p1 m then Branch p m t1 t2
else Branch p m t2 t1
where
(p, m) = lcp p1 p2
lcp :: Prefix Prefix (Prefix, Mask)
lcp p1 p2 = (p, m) where
m = bit (highestBit (p1 xor p2))
p = mask p1 m
highestBit x = if x == 0 then 0 else 1 + highestBit (shiftR x 1)
mask x m = (x &. complement (m-1)) -- complement means bit-wise not.
zero x m = x &. (shiftR m 1) == 0
match k p m = (mask k m) == p

1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:

The insertion algorithm can also be realized imperatively.


function Insert(T, k, v)
if T = NIL then
T Create-Leaf(k, v)
return T
yT
p NIL
while y is not leaf, and Match(k, Prefix(y), Mask(y)) do
py
if Zero?(k, Mask(y)) then
y Left(y)
else
y Right(y)

112
13:
14:
15:
16:
17:
18:
19:
20:
21:
22:
23:
24:

CHAPTER 5. TRIE AND PATRICIA


if y is leaf, and k = Key(y) then
Data(y) v
else
z Branch(y, Create-Leaf(k, v))
if p = NIL then
T z
else
if Left(p) = y then
Left(p) z
else
Right(p) z
return T

Function Branch(T1 , T2 ) does the similar job as what join is defined. It


creates a new branch node, extracts the longest common prefix, and sets T1 and
T2 as the two children.
1: function Branch(T1 , T2 )
2:
T Empty-Node
3:
( Prefix(T ), Mask(T ) ) LCP(Prefix(T1 ), Prefix(T2 ))
4:
if Zero?(Prefix(T1 ), Mask(T )) then
5:
Left(T ) T1
6:
Right(T ) T2
7:
else
8:
Left(T ) T2
9:
Right(T ) T1
10:
return T
The following Python example program implements the insertion algorithm.
def insert(t, key, value = None):
if t is None:
t = IntTree(key, value)
return t
node = t
parent = None
while(True):
if match(key, node):
parent = node
if zero(key, [Link]):
node = [Link]
else:
node = [Link]
else:
if node.is_leaf() and key == [Link]:
[Link] = value
else:
new_node = branch(node, IntTree(key, value))
if parent is None:
t = new_node
else:
parent.replace_child(node, new_node)
break

5.3. INTEGER PATRICIA

113

return t

The auxiliary functions, match, branch, lcp etc. are given as below.
def maskbit(x, mask):
return x & (~(mask-1))
def match(key, tree):
return (not tree.is_leaf()) and maskbit(key, [Link]) == [Link]
def zero(x, mask):
return x & (mask>>1) == 0
def lcp(p1, p2):
diff = (p1 ^ p2)
mask=1
while(diff!=0):
diff>>=1
mask< 1
return (maskbit(p1, mask), mask)
def branch(t1, t2):
t = IntTree()
([Link], [Link]) = lcp(t1.get_prefix(), t2.get_prefix())
if zero(t1.get_prefix(), [Link]):
t.set_children(t1, t2)
else:
t.set_children(t2, t1)
return t

Figure 5.8 shows the example Patricia created with the insertion algorithm.

prefix=0
mask=8

1:x

prefix=100
mask=2

0
4:y

1
5:z

Figure 5.8: Insert map 1 x, 4 y, 5 z into the big-endian integer Patricia


tree.

114

5.3.3

CHAPTER 5. TRIE AND PATRICIA

Look up

Consider the property of integer Patricia tree. When look up a key, if it has
common prefix with the root, then we check the next bit. If this bit is zero, we
recursively look up the left child; otherwise if the bit is one, we next look up
the right child.
When reach a leaf node, we can directly check if the key of the leaf is equal
to what we are looking up. This algorithm can be described with the following
pseudo code.
1: function Look-Up(T, k)
2:
if T = NIL then
3:
return N IL
. Not found
4:
while T is not leaf, and Match(k, Prefix(T ), Mask(T )) do
5:
if Zero?(k, Mask(T )) then
6:
T Left(T )
7:
else
8:
T Right(T )
9:
if T is leaf, and Key(T ) = k then
10:
return Data(T )
11:
else
12:
return N IL
. Not found
Below Python example program implements the looking up algorithm.
def lookup(t, key):
if t is None:
return None
while (not t.is_leaf()) and match(key, t):
if zero(key, [Link]):
t = [Link]
else:
t = [Link]
if t.is_leaf() and [Link] == key:
return [Link]
else:
return None

The looking up algorithm can also be realized in recursive approach. If the


Patricia tree T is empty, or its a singleton leaf with different key from what we
are looking up, the result is empty to indicate not found error; If the tree is a
singleton leaf, and the key of this leaf is equal to what we are looking up, we
are done. Otherwise, T is a branch node, we need check if the common prefix
matches the key to be looked up, and recursively look up the child according
to the next bit. If the common prefix doesnt match the key, it means the key
doesnt exist in the tree. We can return empty result to indicate not found
error.

lookup(T, k) =

v
lookup(Tl , k)
lookup(Tr , k)

:
:
:
:
:

T = (T = (k 0 , v), k 0 6= k)
T = (k 0 , v), k 0 = k
T = (p, m, Tl , Tr ), match(k, p, m), zero(k, m)
T = (p, m, Tl , Tr ), match(k, p, m), zero(k, m)
otherwise
(5.7)

5.4. ALPHABETIC TRIE

115

The following Haskell example program implements this recursive looking


up algorithm.
search t k
= case t of
Empty Nothing
Leaf k x if k==k then Just x else Nothing
Branch p m l r
| match k p m if zero k m then search l k
else search r k
| otherwise Nothing

5.4

Alphabetic Trie

Integer based Trie and Patricia Tree can be a good start point. Such technical
plays important role in Compiler implementation. Okasaki pointed that the
widely used Glasgow Haskell Compiler, GHC, utilized the similar implementation for several years before 1998 [2].
If we extend the key from integer to alphabetic value, Trie and Patricia tree
can be very powerful in solving textual manipulation problems.

5.4.1

Definition

Its not enough to just use the left and right children to represent alphabetic
keys. Using English for example, there are 26 letters and each can be lower or
upper case. If we dont care about the case, one solution is to limit the number
of branches (children) to 26. Some simplified ANSI C implementations of Trie
are defined by using the array of 26 letters. This can be illustrated as in Figure
5.9.
Not all branch nodes contain data. For instance, in Figure 5.9, the root
only has three non-empty branches representing letter a, b, and z. Other
branches such as for letter c, are all empty. We dont show empty branches in
the rest of this chapter.
If deal with case sensitive problems, or handle languages other than English,
there can be more letters than 26. The problem of dynamic size of sub branches
can be solved by using some collection data structures. Such as Hash table or
map.
A alphabetic trie is either empty or a node. There are two types of node.
A leaf node dont has any sub trees;
A branch node contains multiple sub trees. Each sub tree is bound to a
character.
Both leaf and branch may contain optional satellite data. The following
Haskell code shows the example definition.
data Trie a = Trie { value :: Maybe a
, children :: [(Char, Trie a)]}
empty = Trie Nothing []

116

CHAPTER 5. TRIE AND PATRICIA

nil

...

an

boy

zoo

l
bool

r
another

Figure 5.9: A Trie with 26 branches, containing key a, an, another, bool,
boy and zoo.

5.4. ALPHABETIC TRIE

117

Below ANSI C code defines the alphabetic trie. For illustration purpose
only, we limit the character set to lower case English letters, from a to z.
struct Trie {
struct Trie children[26];
void data;
};

5.4.2

Insertion

When insert string as key, start from the root, we pick the character one by
one from the string, examine which child represents the character. If the corresponding child is empty, a new empty node is created. After that, the next
character is used to select the proper grand child.
We repeat this process for all the characters, and finally store the optional
satellite data in the node we arrived at.
Below pseudo code describes the insertion algorithm.
1: function Insert(T, k, v)
2:
if T = NIL then
3:
T Empty-Node
4:
pT
5:
for each c in k do
6:
if Children(p)[c] = NIL then
7:
Children(p)[c] Empty-Node
8:
p Children(p)[c]
9:
Data(p) v
10:
return T
The following example ANSI C program implements the insertion algorithm.
struct Trie insert(struct Trie t, const char key, void value){
int c;
struct Trie p;
if(!t)
t = create_node();
for (p = t; key; ++key, p = pchildren[c]) {
c = key - a;
if (!pchildren[c])
pchildren[c] = create_node();
}
pdata = value;
return t;
}

Where function create node creates new empty node, with all children initialized to empty.
struct Trie create_node(){
struct Trie t = (struct Trie) malloc(sizeof(struct Trie));
int i;
for (i=0; i<26; ++i)
tchildren[i] = NULL;
tdata = NULL;
return t;

118

CHAPTER 5. TRIE AND PATRICIA

The insertion can also be realized in recursive way. Denote the key to be
inserted as K = k1 k2 ...kn , where ki is the i-th character. K 0 is the rest of
characters except k1 . v 0 is the satellite data to be inserted. The trie is in form
T = (v, C), where v is the satellite data, C = {(c1 , T1 ), (c2 , T2 ), ..., (cm , Tm )} is
the map of children. It maps from character ci to sub-tree Ti . if T is empty,
then C is also empty.

(v 0 , C) : K =
(5.8)
insert(T, K, v 0 ) =
(v, ins(C, k1 , K 0 , v 0 )) : otherwise.
If the key is empty, the previous value v is overwritten with v 0 . Otherwise,
we need check the children and perform recursive insertion. This is realized in
function ins(C, k1 , K 0 , v 0 ). It examines key-sub tree pairs in C one by one. Let
C 0 be the rest of pairs except for the first one. This function can be defined as
below.

{(k1 , insert(, K 0 , v 0 ))} : C =


{k1 , insert(T1 , K 0 , v 0 )} C 0 : k1 = c1
ins(C, k1 , K , v ) =

{(c1 , T1 )} ins(C 0 , k1 , K 0 , v 0 ) : otherwise


0

(5.9)

If C is empty, we create a pair, mapping from character k1 to a new empty


tree, and recursively insert the rest characters. Otherwise, the algorithm locates
the child which is mapped from k1 for further insertion.
The following Haskell example program implements the insertion algorithm.
insert t []
x=
insert t (k:ks) x =
ins [] k ks x =
ins (p:ps) k ks

5.4.3

Trie (Just x) (children t)


Trie (value t) (ins (children t) k ks x) where
[(k, (insert empty ks x))]
x = if fst p == k
then (k, insert (snd p) ks x):ps
else p:(ins ps k ks x)

Look up

To look up a key, we also extract the character from the key one by one. For each
character, we search among the children to see if there is a branch match this
character. If there is no such a child, the look up process terminates immediately
to indicate the not found error. When we reach the last character of the key,
The data stored in the current node is what we are looking up.
1: function Look-Up(T, key)
2:
if T = NIL then
3:
return not found
4:
for each c in key do
5:
if Children(T )[c] = NIL then
6:
return not found
7:
T Children(T )[c]
8:
return Data(T )
Below ANSI C program implements the look up algorithm. It returns NULL
to indicate not found error.

5.5. ALPHABETIC PATRICIA

119

void lookup(struct Trie t, const char key) {


while (key && t && tchildren[key - a])
t = tchildren[key++ - a];
return (key | | !t) ? NULL : tdata;
}

The look up algorithm can also be realized in recursive manner. When look
up a key, we start from the first character. If it is bound to some child, we
then recursively search the rest characters in that child. Denote the trie as
T = (v, C), the key being searched as K = k1 k2 ...kn if it isnt empty. The first
character in the key is k1 , and the rest characters are denoted as K 0 .

lookup(T, K) =

v : K=
: f ind(C, k1 ) =

lookup(T 0 , K 0 ) : f ind(C, k1 ) = T 0

(5.10)

Where function f ind(C, k) examine the pairs of key-child one by one to


check if any child is bound to character k. If the list of pairs C is empty,
the result is empty to indicate non-existence of such a child. Otherwise, let
C = {(k1 , T1 ), (k2 , T2 ), ..., (km , Tm )}, the first sub tree T1 is bound to k1 ; the
rest of pairs are represented as C 0 . Below equation defines the f ind function.

: C=

T1 : k1 = k
(5.11)
f ind(C, k) =

f ind(C 0 , k) : otherwise
The following Haskell example program implements the trie looking up algorithm. It uses the find function provided in standard library[5].
find t [] = value t
find t (k:ks) = case lookup k (children t) of
Nothing Nothing
Just t find t ks

Exercise 5.1
Develop imperative trie by using collection data structure to manage multiple sub trees in alphabetic trie.

5.5

Alphabetic Patricia

Similar to integer trie, alphabetic trie is not memory efficient. We can use the
same method to compress alphabetic trie to Patricia.

5.5.1

Definition

Alphabetic Patricia tree is a special prefix tree, each node contains multiple
branches. All children of a node share the longest common prefix string. As the
result, there is no node with only one child, because it conflicts with the longest
common prefix property.
If we turn the trie shown in figure 5.9 into Patricia by compressing all nodes
which have only one child. we can get a Patricia prefix tree as in figure 5.10.

120

CHAPTER 5. TRIE AND PATRICIA

bo

zoo

n
an

zoo

ol
bool

y
boy

other
another

Figure 5.10: A Patricia prefix tree, with keys: a, an, another, bool, boy
and zoo.
We can modify the definition of alphabetic trie a bit to adapt it to Patricia.
The Patricia is either empty, or a node in form T = (v, C). Where v is the
optional satellite data; C = {(s1 , T1 ), (s2 , T2 ), ..., (sn , Tn )} is a list of pairs.
Each pair contains a string si , which is bound to a sub tree Ti .
The following Haskell example code defines Patricia accordingly.
type Key = String
data Patricia a = Patricia { value :: Maybe a
, children :: [(Key, Patricia a)]}
empty = Patricia Nothing []

Below Python code reuses the definition for trie to define Patricia.
class Patricia:
def __init__(self, value = None):
[Link] = value
[Link] = {}

5.5.2

Insertion

When insert a key, s, if the Patricia is empty, we create a leaf node as shown
in figure 5.11 (a). Otherwise, we need check the children. If there is some sub
tree Ti bound to the string si , and there exists common prefix between si and
s, we need branch out a new leaf Tj . The method is to create a new internal
branch node, bind it with the common prefix. Then set Ti as one child of this
branch, and Tj as the other child. Ti and Tj share the common prefix. This is
shown in figure 5.11 (b). However, there are two special cases, because s may
be the prefix of si as shown in figure 5.11 (c). And si may be the prefix of s as
in figure 5.11 (d).
The insertion algorithm can be described as below.
1: function Insert(T, k, v)
2:
if T = NIL then

5.5. ALPHABETIC PATRICIA

121

bo
NIL

boy

ol

(a) Insert key boy into the empty Patri-(b) Insert key bool. A new branch
cia, the result is a leaf.
with common prefix bo is created.

another

an

p1 p2

...

other
x
p1 p2

...

(c) Insert key an with value y into x with prefix another.

another

insert

an p1

...

an p1

...

insert

other

(d) Insert another, into the node with prefix an. We recursively insert key other to the child.

Figure 5.11: Patricia insertion

122

CHAPTER 5. TRIE AND PATRICIA

3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
13:
14:
15:
16:
17:
18:
19:
20:
21:

}
22:
23:
24:
25:
26:
27:

T Empty-Node
pT
loop
match FALSE
for each (si , Ti ) Children(p) do
if k = si then
Value(p) v
return T
c LCP(k, si )
k1 k c
k2 si c
if c 6= NIL then
match TRUE
if k2 = NIL then
. si is prefix of k
p Ti
k k1
break
else
. Branch out a new leaf
Children(p) Children(p) { (c, Branch(k1 , v, k2 , Ti ))
Delete(Children(p), (si , Ti ))
return T
if match then
. Add a new leaf
Children(p) Children(p) { (k, Create-Leaf(v)) }
return T
return T

In the above algorithm, LCP function finds the longest common prefix of
two given strings, for example, string bool and boy has the longest common
prefix bo. The subtraction symbol - for strings gives the different part of two
strings. For example bool - bo = ol. Branch function creates a branch
node and updates keys accordingly.
The longest common prefix can be extracted by checking the characters in
the two strings one by one till there are two characters dont match.
1: function LCP(A, B)
2:
i1
3:
while i |A| i |B| A[i] = B[i] do
4:
ii+1
5:
return A[1...i 1]
There are two cases when branch out a new leaf. Branch(s1 , T1 , s2 , T2 )
takes two different keys and two trees. If s1 is empty, we are dealing the case
such as insert key an into a child bound to string another. We set T2 as the
child of T1 . Otherwise, we create a new branch node and set T1 and T2 as the
two children.
1: function Branch(s1 , T1 , s2 , T2 )
2:
if s1 = then
3:
Children(T1 ) Children(T1 ) {(s2 , T2 )}
4:
return T1
5:
T Empty-Node

5.5. ALPHABETIC PATRICIA

123

Children(T ) {(s1 , T1 ), (s2 , T2 )}


return T
The following example Python program implements the Patricia insertion
algorithm.
6:
7:

def insert(t, key, value = None):


if t is None:
t = Patricia()
node = t
while True:
match = False
for k, tr in [Link]():
if key == k: # just overwrite
[Link] = value
return t
(prefix, k1, k2) = lcp(key, k)
if prefix != "":
match = True
if k2 == "":
# example: insert "another" into "an", go on traversing
node = tr
key = k1
break
else: #branch out a new leaf
[Link][prefix] = branch(k1, Patricia(value), k2, tr)
del [Link][k]
return t
if not match: # add a new leaf
[Link][key] = Patricia(value)
return t
return t

Where the functions to find the longest common prefix, and branch out are
implemented as below.
# returns (p, s1', s2'), where p is the lcp, s1' = s1 - p, s2' = s2 - p
def lcp(s1, s2):
    j = 0
    while j < len(s1) and j < len(s2) and s1[j] == s2[j]:
        j += 1
    return (s1[0:j], s1[j:], s2[j:])

def branch(key1, tree1, key2, tree2):
    if key1 == "":
        # example: insert "an" into "another"
        tree1.children[key2] = tree2
        return tree1
    t = Patricia()
    t.children[key1] = tree1
    t.children[key2] = tree2
    return t
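As a quick sanity check, here is a hypothetical usage sketch of the routines above (it assumes the Patricia node class defined earlier in this chapter, with a value field and a children dict):

# build a small Patricia; the shape mirrors figure 5.11
t = None
for w in ["bool", "boy", "an", "another"]:
    t = insert(t, w, w.upper())
# The root now holds a "bo" branch (with children "ol" and "y") and an
# "an" branch (with an "other" child).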

The insertion can also be realized recursively. Starting from the root, the program checks all the children to find whether any node matches the key. Matching means they have a common prefix. For duplicated keys, the program overwrites the previous value. There are also alternative solutions to handle duplicated keys, such as using a linked list, etc. If no child matches the key, the program creates a new leaf and adds it to the children.
For Patricia T = (v, C), function insert(T, k, v′) inserts key k and value v′ to the tree.
insert(T, k, v′) = (v, ins(C, k, v′))    (5.12)

This function calls another internal function ins(C, k, v′). If the children C are empty, a new leaf is created; otherwise the children are examined one by one. Denote C = {(k1, T1), (k2, T2), ..., (kn, Tn)}, and let C′ hold all the prefix-sub-tree pairs except for the first one.

ins(C, k, v′) =
  {(k, (v′, ∅))} : C = ∅
  {(k, (v′, C_T1))} ∪ C′ : k1 = k
  {branch(k, v′, k1, T1)} ∪ C′ : match(k1, k)
  {(k1, T1)} ∪ ins(C′, k, v′) : otherwise
    (5.13)

The first clause deals with the edge case of empty children: a leaf node containing v′, bound to k, is returned as the only child. The second clause overwrites the previous value with v′ if there is some child bound to the same key; note that C_T1 means the children of sub-tree T1. The third clause branches out a new leaf if the first child matches the key k. The last clause goes on checking the rest of the children.
We define two keys A and B as matching if they have a non-empty common prefix:

match(A, B) = A ≠ ∅ ∧ B ≠ ∅ ∧ a1 = b1    (5.14)

Where a1 and b1 are the first characters in A and B if they are not empty.
Function branch(k1, v, k2, T2) takes two keys, a value and a tree. Extract the longest common prefix k = lcp(k1, k2), and denote the different parts as k1′ = k1 − k and k2′ = k2 − k. The algorithm firstly handles the edge cases that either k1 is the prefix of k2 or k2 is the prefix of k1. For the former case, it creates a new node containing v, binds this node to k, and sets (k2′, T2) as the only child; for the latter case, it recursively inserts k1′ and v to T2. Otherwise, the algorithm creates a branch node, binds it to the longest common prefix k, and sets two children for it: one child is (k2′, T2), the other is a leaf node containing v, bound to k1′.

(k, (v, {(k20 , T2 )})) : k = k1


(k, insert(T2 , k10 , v)) : k = k2
branch(k1 , v, k2 , T2 ) =

(k, (, {(k10 , (v, )), (k20 , T2 )}) : otherwise


(5.15)
And function lcp(A, B) keeps taking the same characters from A and B one by one. Denote a1 and b1 as the first characters in A and B if they are not empty, and A′ and B′ as the rest parts excluding the first characters.

lcp(A, B) =
  ∅ : A = ∅ ∨ B = ∅ ∨ a1 ≠ b1
  {a1} ∪ lcp(A′, B′) : a1 = b1
    (5.16)

The following Haskell example program implements the Patricia insertion algorithm.
insert t k x = Patricia (value t) (ins (children t) k x) where
    ins []     k x = [(k, Patricia (Just x) [])]
    ins (p:ps) k x
        | (fst p) == k
            = (k, Patricia (Just x) (children (snd p))):ps -- overwrite
        | match (fst p) k
            = (branch k x (fst p) (snd p)):ps
        | otherwise
            = p:(ins ps k x)

match x y = x /= [] && y /= [] && head x == head y

branch k1 x k2 t2
    | k1 == k
        -- ex: insert "an" into "another"
        = (k, Patricia (Just x) [(k2', t2)])
    | k2 == k
        -- ex: insert "another" into "an"
        = (k, insert t2 k1' x)
    | otherwise = (k, Patricia Nothing [(k1', leaf x), (k2', t2)])
    where
        k = lcp k1 k2
        k1' = drop (length k) k1
        k2' = drop (length k) k2

lcp [] _ = []
lcp _ [] = []
lcp (x:xs) (y:ys) = if x == y then x:(lcp xs ys) else []

5.5.3 Look up

When looking up a key, we can't examine the characters one by one as in trie. Starting from the root, we need to search among the children to see if any one is bound to a prefix of the key. If there is such a child, we update the key by removing the prefix part, and recursively look up the updated key in this child. If there aren't any children bound to any prefix of the key, the looking up fails.
1: function Look-Up(T, k)
2:   if T = NIL then
3:     return not found
4:   repeat
5:     match ← FALSE
6:     for ∀(ki, Ti) ∈ Children(T) do
7:       if k = ki then
8:         return Data(Ti)
9:       if ki is prefix of k then
10:        match ← TRUE
11:        k ← k − ki
12:        T ← Ti
13:        break
14:  until ¬match
15:  return not found

The below Python example program implements the looking up algorithm. It reuses the lcp(s1, s2) function defined previously to test if a string is the prefix of the other.
def lookup(t, key):
    if t is None:
        return None
    while True:
        match = False
        for k, tr in t.children.items():
            if k == key:
                return tr.value
            (prefix, k1, k2) = lcp(key, k)
            if prefix != "" and k2 == "":
                match = True
                key = k1
                t = tr
                break
        if not match:
            return None
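A short usage sketch (assuming the insert function defined above): only complete keys are stored, so a bare prefix fails to look up.

t = None
for w in ["an", "another", "bool", "boy"]:
    t = insert(t, w, w.upper())
print(lookup(t, "another")) # 'ANOTHER'
print(lookup(t, "ano"))     # None: "ano" is not a stored key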

This algorithm can also be realized recursively. For Patricia in the form T = (v, C), it calls another function to search among the children C.

lookup(T, k) = find(C, k)    (5.17)

If C is empty, the looking up fails; otherwise, for C = {(k1, T1), (k2, T2), ..., (kn, Tn)}, we firstly examine whether k is the prefix of k1; if not, we recursively check the rest pairs, denoted as C′.

find(C, k) =
  not found : C = ∅
  v_T1 : k = k1
  lookup(T1, k − k1) : k1 ⊏ k
  find(C′, k) : otherwise
    (5.18)

Where A ⊏ B means string A is a prefix of B. find mutually calls lookup if some child is bound to a string which is a prefix of the key.
The below Haskell example program implements the looking up algorithm.
import qualified Data.List

find t k = find' (children t) k where
    find' [] _ = Nothing
    find' (p:ps) k
        | (fst p) == k = value (snd p)
        | (fst p) `Data.List.isPrefixOf` k = find (snd p) (diff (fst p) k)
        | otherwise = find' ps k
    diff k1 k2 = drop (length (lcp k1 k2)) k2

5.6 Trie and Patricia applications

Trie and Patricia can be used to solve some interesting problems. The integer-based prefix tree is used in compiler implementation. Many everyday software applications have interesting features which can be realized with trie or Patricia. In this section, some applications are given as examples, including the e-dictionary, word auto-completion, and the T9 input method. Commercial implementations typically do not adopt trie or Patricia directly; the solutions demonstrated here are for illustration purpose only.

5.6.1 E-dictionary and word auto-completion

Figure 5.12 shows a screenshot of an English-Chinese e-dictionary. In order to provide a good user experience, the dictionary searches its word library and lists all candidate words and phrases similar to what the user has entered.

Figure 5.12: E-dictionary. All candidates starting with what the user input are listed.
An e-dictionary typically contains hundreds of thousands of words. It's very expensive to perform a whole-word search. Commercial software adopts complex approaches, including caching, indexing, etc., to speed up this process.
Similar to the e-dictionary, figure 5.13 shows a popular Internet search engine. When the user inputs something, it provides a candidate list, with all items starting with what the user has entered. These candidates are shown in order of popularity: the more people search for a word, the higher it appears in the list.
In both cases, the software provides a kind of word auto-completion mechanism. In some modern IDEs, the editor can even help users to auto-complete program code.
Let's see how to implement the e-dictionary with trie or Patricia. To simplify the problem, we assume the dictionary only supports English to English information.

Figure 5.13: A search engine. All candidates starting with what the user input are listed.

A dictionary stores key-value pairs: the keys are English words or phrases, and the values are the meanings described in English sentences.
We can store all the words and their meanings in a trie, but it isn't space effective, especially when there is a huge amount of items. We'll use Patricia to realize the e-dictionary.
When the user wants to look up the word a, the dictionary does not only return the meaning of a, but also provides a list of candidate words which all start with a, including abandon, about, accent, adam, ... Of course all these words are stored in the Patricia.
If there are too many candidates, one solution is to display only the top 10 words, and let the user browse for more.
The following algorithm reuses the looking up defined for Patricia. When it finds a node bound to a string which is a prefix of what we are looking for, it expands all its children until getting n candidates.
1: function Look-Up(T, k, n)
2:   if T = NIL then
3:     return ∅
4:   prefix ← NIL
5:   repeat
6:     match ← FALSE
7:     for ∀(ki, Ti) ∈ Children(T) do
8:       if k is prefix of ki then
9:         return Expand(Ti, prefix + ki, n)
10:      if ki is prefix of k then
11:        match ← TRUE
12:        k ← k − ki
13:        T ← Ti
14:        prefix ← prefix + ki
15:        break
16:  until ¬match
17:  return ∅

Where the function Expand(T, prefix, n) picks n sub-trees from T, all sharing the same prefix. It is realized as a BFS (Breadth-First Search) traversal. The chapter about searching explains BFS in detail.
1: function Expand(T, prefix, n)
2:   R ← ∅
3:   Q ← {(prefix, T)}
4:   while |R| < n ∧ |Q| > 0 do
5:     (k, T) ← Pop(Q)
6:     if Data(T) ≠ NIL then
7:       R ← R ∪ {(k, Data(T))}
8:     for ∀(ki, Ti) ∈ Children(T) do
9:       Push(Q, (k + ki, Ti))
10:  return R

The following example Python program implements the e-dictionary application. When testing whether a string is a prefix of another, it uses the find function provided in the standard string library.
import string

def patricia_lookup(t, key, n):
    if t is None:
        return None
    prefix = ""
    while True:
        match = False
        for k, tr in t.children.items():
            if string.find(k, key) == 0: # key is prefix of k
                return expand(prefix + k, tr, n)
            if string.find(key, k) == 0:
                match = True
                key = key[len(k):]
                t = tr
                prefix += k
                break
        if not match:
            return None

def expand(prefix, t, n):
    res = []
    q = [(prefix, t)]
    while len(res) < n and len(q) > 0:
        (s, p) = q.pop(0)
        if p.value is not None:
            res.append((s, p.value))
        for k, tr in p.children.items():
            q.append((s + k, tr))
    return res
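The fragment below is a hypothetical usage sketch of the two functions above, reusing insert from section 5.5 and the Python 2-era string helpers the chapter uses (the word meanings are made up for illustration):

t = None
for w, m in [("a", "art. one"), ("an", "art. one or any"),
             ("another", "pron. one more"), ("abandon", "v. give up")]:
    t = insert(t, w, m)
print(patricia_lookup(t, "a", 3))
# three (word, meaning) candidates, all starting with "a"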

This algorithm can also be implemented recursively. If the string we are looking for is empty, we expand all children until getting n candidates; otherwise we recursively examine the children to find one whose prefix equals this string.
In programming environments supporting lazy evaluation, an intuitive solution is to expand all candidates, and take the first n on demand. Denote the Patricia prefix tree in the form T = (v, C); the below function enumerates all items starting with key k.

findAll(T, k) =
  enum(C) : k = ∅, v = ∅
  {(∅, v)} ∪ enum(C) : k = ∅, v ≠ ∅
  find(C, k) : k ≠ ∅
    (5.19)

The first two clauses deal with the edge case that the key is empty: all the children are enumerated, and the value of the current node is included only when it is not empty. The last clause finds the child matching k.
For non-empty children C = {(k1, T1), (k2, T2), ..., (km, Tm)}, denote the rest of the pairs, excluding the first one, as C′. The enumeration algorithm can be defined as below.

enum(C) =
  ∅ : C = ∅
  mapAppend(k1, findAll(T1, ∅)) ∪ enum(C′) : otherwise
    (5.20)

Where mapAppend(k, L) = {(k + ki, vi) | (ki, vi) ∈ L}. It concatenates the prefix k in front of every key in the key-value pairs of list L.
Function find(C, k) is defined as the following. For empty children, the result is empty as well; otherwise, it examines the first child T1, which is bound to string k1. If k1 equals k, it calls mapAppend to add the prefix to the keys of all the children under T1; if k1 is a prefix of k, the algorithm recursively finds all children starting with k − k1; on the other hand, if k is a prefix of k1, all children under T1 are valid results. Otherwise, the algorithm bypasses the first child and goes on searching the rest of the children.

find(C, k) =
  ∅ : C = ∅
  mapAppend(k, findAll(T1, ∅)) : k1 = k
  mapAppend(k1, findAll(T1, k − k1)) : k1 ⊏ k
  findAll(T1, ∅) : k ⊏ k1
  find(C′, k) : otherwise
    (5.21)

Below example Haskell program implements the e-dictionary application according to the above equations.
findAll :: Patricia a -> Key -> [(Key, a)]
findAll t [] =
    case value t of
        Nothing -> enum $ children t
        Just x  -> ("", x):(enum $ children t)
    where
        enum [] = []
        enum (p:ps) = (mapAppend (fst p) (findAll (snd p) [])) ++ (enum ps)
findAll t k = find (children t) k where
    find [] _ = []
    find (p:ps) k
        | (fst p) == k
            = mapAppend k (findAll (snd p) [])
        | (fst p) `Data.List.isPrefixOf` k
            = mapAppend (fst p) (findAll (snd p) (k `diff` (fst p)))
        | k `Data.List.isPrefixOf` (fst p)
            = findAll (snd p) []
        | otherwise = find ps k
    diff x y = drop (length y) x

mapAppend s lst = map (\p -> (s ++ (fst p), snd p)) lst

In a lazy evaluation environment, the top n candidates can be obtained by take(n, findAll(T, k)). Appendix A gives the detailed definition of the take function.

5.6.2 T9 input method

Most mobile phones around the year 2000 were equipped with a keypad. Users had quite a different experience from a PC when editing a short message or email, because the mobile-phone keypad, the so-called ITU-T keypad, has much fewer keys than a PC. Figure 5.14 shows one example.

Figure 5.14: an ITU-T keypad for mobile phone.
There are typically two methods to input an English word or phrase with the ITU-T keypad. For instance, if the user wants to enter the word home, he can press the keys in the below sequence:

Press key 4 twice to enter the letter h;
Press key 6 three times to enter the letter o;
Press key 6 once to enter the letter m;
Press key 3 twice to enter the letter e.

Another, much quicker way is to just press the following keys:

Press keys 4, 6, 6, 3; the word home appears on top of the candidate list;
Press key * to change the candidate word, so the word good appears;
Press key * again to change to another candidate; the next word gone appears;
...

Comparing these two methods, we can see the second one is much easier for the end user. The only overhead is to store a dictionary of candidate words.
Method 2 is called the T9 input method, or predictive input method [6], [7]. The abbreviation T9 stands for 'Text on 9 keys'. T9 input can also be realized with trie or Patricia.
In order to provide candidate words to the user, a dictionary must be prepared in advance. Trie or Patricia can be used to store the dictionary. Commercial T9 implementations typically use complex indexing dictionaries in both the file system and cache. The realization shown here is for illustration purpose only.
Firstly, we need to define the T9 mapping, which maps from a digit to its candidate characters:

M_T9 = {2 → abc, 3 → def, 4 → ghi, 5 → jkl, 6 → mno, 7 → pqrs, 8 → tuv, 9 → wxyz}    (5.22)

With this mapping, M_T9[i] returns the corresponding characters for digit i.
Suppose the user inputs digits D = d1 d2 ... dn. If D isn't empty, denote the rest of the digits, excluding d1, as D′. The below pseudo code shows how to realize T9 with trie.
1: function Look-Up-T9(T, D)
2:   Q ← {(∅, D, T)}
3:   R ← ∅
4:   while Q ≠ ∅ do
5:     (prefix, D, T) ← Pop(Q)
6:     for each c in M_T9[d1] do
7:       if c ∈ Children(T) then
8:         if D′ = ∅ then
9:           R ← R ∪ {prefix + c}
10:        else
11:          Push(Q, (prefix + c, D′, Children(T)[c]))
12:  return R
Where prefix + c means appending the character c to the end of the string prefix. Again, this algorithm performs BFS with a queue Q. The queue is initialized with a tuple (prefix, D, T), containing an empty prefix, the digit sequence to be searched, and the trie. It keeps picking tuples from the queue as long as the queue isn't empty, then gets the candidate characters for the first digit to be processed via the T9 map. For each character c, if there is a sub-tree bound to it, we create a new tuple, updating the prefix by appending c, using the rest of the digits as the new D, and using that sub-tree. This new tuple is pushed back to the queue for further searching. If all the digits are processed, it means a candidate word is found; we put this word into the result list R.
The following example program in Python implements this T9 search with trie.

T9MAP = {'2':"abc", '3':"def", '4':"ghi", '5':"jkl",
         '6':"mno", '7':"pqrs", '8':"tuv", '9':"wxyz"}

def trie_lookup_t9(t, key):
    if t is None or key == "":
        return None
    q = [("", key, t)]
    res = []
    while len(q) > 0:
        (prefix, k, t) = q.pop(0)
        i = k[0]
        if not i in T9MAP:
            return None # invalid input
        for c in T9MAP[i]:
            if c in t.children:
                if k[1:] == "":
                    res.append((prefix + c, t.children[c].value))
                else:
                    q.append((prefix + c, k[1:], t.children[c]))
    return res

Because trie is not space effective, we can modify the above algorithm into a Patricia solution. As long as the queue isn't empty, the algorithm pops a tuple. This time, we examine all the prefix-sub-tree pairs. For every pair (ki, Ti), we convert the alphabetic prefix ki back to a digit sequence D′ by looking up the T9 map. If D′ exactly matches the digits the user input, we find a candidate word; otherwise, if the digit sequence is a prefix of what the user inputs, the program creates a new tuple, updates the prefix, the digits to be processed, and the sub-tree, then puts the tuple back to the queue for further search.
1: function Look-Up-T9(T, D)
2:   Q ← {(∅, D, T)}
3:   R ← ∅
4:   while Q ≠ ∅ do
5:     (prefix, D, T) ← Pop(Q)
6:     for each (ki, Ti) ∈ Children(T) do
7:       D′ ← Convert-T9(ki)
8:       if D′ ⊏ D then                       ▷ D′ is prefix of D
9:         if D′ = D then
10:          R ← R ∪ {prefix + ki}
11:        else
12:          Push(Q, (prefix + ki, D − D′, Ti))
13:  return R
Function Convert-T9(K) converts each character in K back to a digit.

1: function Convert-T9(K)
2:   D ← ∅
3:   for each c ∈ K do
4:     for each (d → S) ∈ M_T9 do
5:       if c ∈ S then
6:         D ← D ∪ {d}
7:         break
8:   return D
The following example Python program implements the T9 input method with Patricia.

def patricia_lookup_t9(t, key):
    if t is None or key == "":
        return None
    q = [("", key, t)]
    res = []
    while len(q) > 0:
        (prefix, key, t) = q.pop(0)
        for k, tr in t.children.items():
            digits = toT9(k)
            if string.find(key, digits) == 0: # digits is prefix of key
                if key == digits:
                    res.append((prefix + k, tr.value))
                else:
                    q.append((prefix + k, key[len(k):], tr))
    return res
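A hypothetical test run (assuming the Patricia insert from section 5.5): all four words map to the key sequence 4663.

t = None
for w in ["home", "good", "gone", "hood"]:
    t = insert(t, w, w)
print([w for w, _ in patricia_lookup_t9(t, "4663")])
# candidates: gone, good, hood, home (order follows the traversal)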

The T9 input method can also be realized recursively. Let's first define the trie solution. The algorithm takes two arguments: a trie storing all the candidate words, and a sequence of digits input by the user. If the sequence is empty, the result is a list containing the empty string; otherwise, it looks up C to find those children which are bound to the first digit d1 according to the T9 map.

findT9(T, D) =
  {∅} : D = ∅
  fold(f, ∅, lookupT9(d1, C)) : otherwise
    (5.23)

Where folding is defined in Appendix A. Function f takes two arguments: an intermediate list of candidates, which is initialized empty, and a pair (c, T′), where c is a candidate character to which sub-tree T′ is bound. It appends the character c to all the candidate words, and concatenates this to the result list.

f(L, (c, T′)) = mapAppend(c, findT9(T′, D′)) ∪ L    (5.24)

Note this mapAppend function is a bit different from the previous one defined in the e-dictionary application: the first argument is a character, not a string.
Function lookupT9(d, C) checks all the possible characters mapped to digit d. If a character is bound to some child in C, it is recorded as one candidate.

lookupT9(d, C) = fold(g, ∅, M_T9[d])    (5.25)

Where

g(L, k) =
  L : find(C, k) = ∅
  {(k, T′)} ∪ L : find(C, k) = T′
    (5.26)

The below Haskell example program implements the T9 look up algorithm with trie.
mapT9 = [('2', "abc"), ('3', "def"), ('4', "ghi"), ('5', "jkl"),
         ('6', "mno"), ('7', "pqrs"), ('8', "tuv"), ('9', "wxyz")]

findT9 t [] = [("", value t)]
findT9 t (k:ks) = foldl f [] (lookupT9 k (children t))
    where
        f lst (c, tr) = (mapAppend c (findT9 tr ks)) ++ lst

lookupT9 c children = case lookup c mapT9 of
    Nothing -> []
    Just s  -> foldl f [] s where
        f lst x = case lookup x children of
            Nothing -> lst
            Just t  -> (x, t):lst

mapAppend x lst = map (\p -> (x:(fst p), snd p)) lst

There are a few modifications when changing the realization from trie to Patricia. Firstly, a sub-tree is bound to a prefix string, not to a single character.

findT9(T, D) =
  {∅} : D = ∅
  fold(f, ∅, findPrefixT9(D, C)) : otherwise
    (5.27)

The list for folding is given by calling the function findPrefixT9(D, C). And f is also modified to reflect this change: it appends the candidate prefix D′ in front of every result output by the recursive search, and then accumulates the words.

f(L, (D′, T′)) = mapAppend(D′, findT9(T′, D − D′)) ∪ L    (5.28)

Function findPrefixT9(D, C) examines all the children. For every pair (ki, Ti), if converting ki back to digits yields a prefix of D, then this pair is selected as a candidate.

findPrefixT9(D, C) = {(ki, Ti) | (ki, Ti) ∈ C, convertT9(ki) ⊏ D}    (5.29)

Function convertT9(K) converts every alphabetic character in K back to digits according to the T9 map.

convertT9(K) = {d | c ∈ K, (d → S) ∈ M_T9 ∧ c ∈ S}    (5.30)

The following example Haskell program implements the T9 input algorithm with Patricia.

findT9 t [] = [("", value t)]
findT9 t k = foldl f [] (findPrefixT9 k (children t))
    where
        f lst (s, tr) = (mapAppend s (findT9 tr (k `diff` s))) ++ lst
        diff x y = drop (length y) x

findPrefixT9 s lst = filter f lst where
    f (k, _) = (toT9 k) `isPrefixOf` s

toT9 = map (\c -> head $ [d | (d, s) <- mapT9, c `elem` s])

Exercise 5.2

For the T9 input, compare the results of the algorithms realized with trie and Patricia; the sequences are different. Why does this happen? How can we modify the algorithm so that they output the candidates in the same sequence?

5.7 Summary

In this chapter, we started from the integer-based trie and Patricia. The map data structure based on integer Patricia plays an important role in compiler implementation. Alphabetic trie and Patricia are natural extensions, and can be used to manipulate text information. As examples, the predictive e-dictionary and the T9 input method were realized with trie or Patricia. Although these examples differ from the real implementations in commercial software, they show simple approaches to solve some problems. Another important data structure, the suffix tree, is closely related to them. Suffix tree is introduced in the next chapter.

Bibliography

[1] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein. Introduction to Algorithms, Second Edition. Problem 12-1. ISBN: 0262032937. The MIT Press. 2001

[2] Chris Okasaki and Andrew Gill. Fast Mergeable Integer Maps. Workshop on ML, September 1998, pages 77-86.

[3] D.R. Morrison. PATRICIA - Practical Algorithm To Retrieve Information Coded In Alphanumeric. Journal of the ACM, 15(4), October 1968, pages 514-534.

[4] Suffix Tree, Wikipedia.

[5] Trie, Wikipedia.

[6] T9 (predictive text), Wikipedia.

[7] Predictive text, Wikipedia.

Chapter 6

Suffix Tree

6.1 Introduction

Suffix tree is an important data structure. It can be used to realize many important string operations particularly fast[3]. It is also widely used in bioinformatics, such as in DNA pattern matching[4]. Weiner introduced the suffix tree in 1973[2]. The latest on-line construction algorithm was found in 1995[1].
The suffix tree for a string S is a special Patricia. Each edge is labeled with some sub-string of S. Each suffix of S corresponds to exactly one path from the root to a leaf. Figure 6.1 shows the suffix tree for the English word banana.

Figure 6.1: The suffix tree for banana, with root edges labeled anana, banana and nana.

All suffixes, banana, anana, nana, ana, na, a, can be found in the above tree. Among them the first 3 suffixes are explicitly shown; the others are implicitly represented. The reason why ana, na, a, and the empty string are not shown is that they are prefixes of the others. In order to show all suffixes explicitly, we can append a special pad terminal symbol which doesn't occur anywhere else in the string. Such a terminator is typically denoted as $. By this means, no suffix is a prefix of another.
Although the suffix tree for banana is simple, the suffix tree for bananas, as shown in figure 6.2, is quite different.
To create the suffix tree for a given string, we can utilize the insertion algorithm explained in the previous chapter for Patricia.

1: function Suffix-Tree(S)
2:   T ← NIL
3:   for i ← 1 to |S| do
4:     T ← Patricia-Insert(T, Right(S, i))
5:   return T
Figure 6.2: The suffix tree for bananas.


For a non-empty string S = s1 s2 ... si ... sn of length n = |S|, the function Right(S, i) = si si+1 ... sn. It extracts the sub-string of S from the i-th character to the last one. This straightforward algorithm can also be defined as below.

suffixT(S) = fold(insertPatricia, ∅, suffixes(S))    (6.1)

Where function suffixes(S) gives all the suffixes of string S. If the string is empty, the result is one empty string; otherwise, S itself is one suffix, and the others can be given by recursively calling suffixes(S′), where S′ is given by dropping the first character from S.

suffixes(S) =
  {∅} : S = ∅
  {S} ∪ suffixes(S′) : otherwise
    (6.2)

This solution constructs the suffix tree in O(n²) time for a string of length n. It inserts n suffixes to the tree in total, and each insertion takes time proportional to the length of the suffix. The efficiency isn't good enough.
In this chapter, we firstly explain a fast on-line suffix trie construction solution using the suffix link concept. Because trie isn't space efficient, we next introduce a linear-time on-line suffix tree construction algorithm found by Ukkonen, and show how to solve some interesting string manipulation problems with suffix tree.
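Before moving on, equation (6.2) is easy to cross-check with a throwaway Python sketch (the name suffixes is ours):

def suffixes(s):
    # all non-empty suffixes of s, plus the empty string
    return [s[i:] for i in range(len(s))] + [""]

print(suffixes("cacao")) # ['cacao', 'acao', 'cao', 'ao', 'o', '']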

6.2 Suffix trie

Just like the relationship between trie and Patricia, the suffix trie has a much simpler structure than the suffix tree. Figure 6.3 shows the suffix trie for banana.
Comparing with figure 6.1, we can see the difference between the suffix tree and the suffix trie. Instead of representing a word, every edge in a suffix trie only represents a character; thus the suffix trie needs much more space. If we pack all nodes which have only one child, the suffix trie is turned into a suffix tree.
We can reuse the trie definition for the suffix trie. Each node is bound to a character, and contains multiple sub-trees as children. A child can be referred to by the bound character.

Figure 6.3: Suffix trie for banana.

6.2.1 Node transfer and suffix link

For a string S of length n, define Si = s1 s2 ... si. It is the prefix containing the first i characters.
In a suffix trie, each node represents a suffix string. For example in figure 6.4, node X represents the suffix a; by adding character c, node X transfers to Y, which represents the suffix ac. We say node X transfers to Y with the edge of character c[1]:

Y ← Children(X)[c]

We also say that node X has a c-child Y. The below Python expression reflects this concept:

y = x.children[c]

If node A in a suffix trie represents suffix si si+1 ... sn, and node B represents suffix si+1 si+2 ... sn, we say node B represents the suffix of node A. We can create a link from A to B. This link is defined as the suffix link of node A[1]. Suffix links are drawn in dotted style. In figure 6.4, the suffix link of node A points to node B, and the suffix link of node B points to node C.
The suffix link is valid for all nodes except the root. We can add a suffix link field to the trie definition. The below Python example code shows this update.
class STrie:
    def __init__(self, suffix=None):
        self.children = {}
        self.suffix = suffix

Figure 6.4: Suffix trie for string cacao. Node X represents a, node Y represents ac; X transfers to Y with character c.

Table 6.1: suffixes of Si

s1 s2 s3 ... si
s2 s3 ... si
...
si−1 si
si
∅ (the empty string)

6.2.2 On-line construction

For string S, suppose we have constructed the suffix trie for the i-th prefix Si = s1 s2 ... si. Denote it as SuffixTrie(Si). Let's consider how to obtain SuffixTrie(Si+1) from SuffixTrie(Si).
If we list all suffixes corresponding to SuffixTrie(Si), from the longest (which is Si) to the shortest (which is empty), we get table 6.1. There are i + 1 suffixes in total.
One solution is to append the character si+1 to every suffix in this table, and then add another empty string. This idea can be realized by adding a new child for every node in the trie, binding all these new children with edges of character si+1.
However, some nodes in SuffixTrie(Si) may have the si+1-child already. For example, in figure 6.5, nodes X and Y correspond to the suffixes cac and ac respectively. They don't have the a-child. But node Z, which represents suffix c, has the a-child already.

Algorithm 1 Update SuffixTrie(Si) to SuffixTrie(Si+1), initial version.
1: for ∀T ∈ SuffixTrie(Si) do
2:   Children(T)[si+1] ← Create-Empty-Node

Figure 6.5: Suffix trie of cac (a) and caca (b).

When appending si+1 to SuffixTrie(Si) (in this example si+1 is the character a), we need to create new nodes for X and Y, but we needn't do this for Z.
If we check the nodes one by one according to table 6.1, we can stop immediately when we meet a node which has the si+1-child. This is because if node X in SuffixTrie(Si) has the si+1-child, then according to the definition of the suffix link, any suffix node X′ of X in SuffixTrie(Si) must also have the si+1-child. In other words, let c = si+1; if wc is a sub-string of Si, then every suffix of wc is also a sub-string of Si[1]. The only exception is the root, which represents the empty string ∅.
According to this fact, we can refine algorithm 1 to the following.

Algorithm 2 Update SuffixTrie(Si) to SuffixTrie(Si+1), second version.
1: for each T ∈ SuffixTrie(Si) in descending order of suffix length do
2:   if Children(T)[si+1] = NIL then
3:     Children(T)[si+1] ← Create-Empty-Node
4:   else
5:     break

The next question is how to iterate all nodes in descending order of suffix length. Define the top of a suffix trie as the deepest leaf node. This definition ensures the top represents the longest suffix. Along the suffix link from the top to the next node, the length of the suffix decreases by one. This fact tells us that we can traverse the suffix trie from the top to the root by using the suffix links, and the order of such a traversal is exactly what we want. Finally, there is a special suffix trie for the empty string, SuffixTrie(NIL); we define the top to equal the root in this case.
function Insert(top, c)
  if top = NIL then                          ▷ The trie is empty
    top ← Create-Empty-Node
  T ← top
  T′ ← Create-Empty-Node                     ▷ dummy init value
  while T ≠ NIL ∧ Children(T)[c] = NIL do
    Children(T)[c] ← Create-Empty-Node
    Suffix-Link(T′) ← Children(T)[c]
    T′ ← Children(T)[c]
    T ← Suffix-Link(T)
  if T ≠ NIL then
    Suffix-Link(T′) ← Children(T)[c]
  return Children(top)[c]                    ▷ returns the new top
Function Insert updates SuffixTrie(Si) to SuffixTrie(Si+1). It takes two arguments: one is the top of SuffixTrie(Si), the other is the character si+1. If the top is NIL, it means the tree is empty, so there is no root; the algorithm creates a root node in this case. A sentinel empty node T′ is created, which keeps tracking the previously created new node. In the main loop, the algorithm checks every node one by one along the suffix links. If the node hasn't the si+1-child, it creates a new node and binds the edge to character si+1. The algorithm repeatedly goes up along the suffix links until it either arrives at the root, or finds a node which has the si+1-child already. After the loop, if the node isn't NIL, it means we stopped at a node which has the si+1-child; the last suffix link then points to that child. Finally, the new top position is returned, so that we can further insert other characters into the suffix trie.
For a given string S, the suffix trie can be built by repeatedly calling the Insert function.

1: function Suffix-Trie(S)
2:   t ← NIL
3:   for i ← 1 to |S| do
4:     t ← Insert(t, si)
5:   return t
This algorithm returns the top of the suffix trie, not the root. In order to access the root, we can traverse along the suffix links.

1: function Root(T)
2:   while Suffix-Link(T) ≠ NIL do
3:     T ← Suffix-Link(T)
4:   return T
Figure 6.6 shows the steps when constructing the suffix trie for cacao. Only the last layer of suffix links are shown.
For the Insert algorithm, the computation time is proportional to the size of the suffix trie. In the worst case, the suffix trie is built in O(n²) time, where n = |S|. One example is S = aⁿbⁿ, where there are n characters of a and n characters of b.
The following example Python program implements the suffix trie construction algorithm.
def suffix_trie(str):
    t = None
    for c in str:
        t = insert(t, c)
    return root(t)

def insert(top, c):
    if top is None:
        top = STrie()
    node = top
    new_node = STrie() # dummy init value
    while (node is not None) and (c not in node.children):
        new_node.suffix = node.children[c] = STrie(node)
        new_node = node.children[c]
        node = node.suffix
    if node is not None:
        new_node.suffix = node.children[c]
    return top.children[c] # update top

def root(node):
    while node.suffix is not None:
        node = node.suffix
    return node
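A minimal check of the construction above: every sub-string of the text is reachable by walking single-character edges from the root.

t = suffix_trie("cacao")
node = t
for c in "cac":
    node = node.children[c] # would raise KeyError if "cac" were absent
# node is now the suffix-trie position for "cac", node X in figure 6.5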

Figure 6.6: Construct the suffix trie for cacao in 6 steps: (a) empty, (b) c, (c) ca, (d) cac, (e) caca, (f) cacao. Only the last layer of suffix links are shown in dotted arrows.

6.3 Suffix Tree

Suffix trie isn't space efficient, and the construction time is quadratic. If we don't care about the speed, we can build the suffix trie first and then compress it to a suffix tree[6]. Ukkonen found a linear-time on-line suffix tree construction algorithm in 1995.

6.3.1 On-line construction

Active point and end point

The suffix trie construction algorithm shows a very important fact about what happens when SuffixTrie(Si) updates to SuffixTrie(Si+1). Let's review the last two steps in figure 6.6.
There are two different types of update:

1. All leaves are appended with a new node for si+1;
2. Some non-leaf nodes are branched out with a new node for si+1.

The first type of update is trivial, because for all newly coming characters, we need to do this work anyway. Ukkonen defines such a leaf as an 'open' node.
The second type of update is important. We need to figure out which internal nodes need to branch out. We only focus on these nodes and apply the update.
Ukkonen defines the path along the suffix links from the top to the end as the 'boundary path'. Denote the nodes in the boundary path as n1, n2, ..., nj, ..., nk. These nodes start from the leaf node (the first one is the top position); suppose that after the j-th node, they are not leaves any longer. We need to repeatedly branch out from this point till the k-th node.
Ukkonen defines the first non-leaf node nj as the 'active point' and the last node nk as the 'end point'. The end point can be the root.
Reference pair

Figure 6.7: Suffix tree of bananas. X transfers to Y with sub-string na.

Figure 6.7 shows the suffix tree of the English word bananas. Node X represents the suffix a. By adding the sub-string na, node X transfers to node Y, which represents the suffix ana. In other words, we can represent Y with a pair of a node and a sub-string, like (X, w), where w = na. Ukkonen defines such a pair as a 'reference pair'. Not only explicit nodes, but also implicit positions in the suffix tree can be represented with reference pairs. For example, (X, n) represents a position which is not an explicit node. By using reference pairs, we can represent every position in a suffix tree.
In order to save space, for string S, all sub-strings can be represented as a pair of indices (l, r), where l is the left index and r is the right index of the characters of the sub-string. For instance, if S = bananas, and the index starts from 1, the sub-string na can be represented with the pair (3, 4). As the result, there is only one copy of the complete string, and every position in the suffix tree is defined as (node, (l, r)). This is the final form of the reference pair.
With reference pairs, node transfer for the suffix tree can be defined as the following:

Children(X)[sl] ← ((l, r), Y)  ⟺  Y ← (X, (l, r))

If character sl = c, we say that node X has a c-child, and this child is Y. Y can be transferred from X with the sub-string (l, r). Each node can have at most one c-child.
Canonical reference pair

It's obvious that one position in a suffix tree may have multiple reference pairs. For example, node Y in figure 6.7 can be denoted either as (X, (3, 4)) or as (root, (2, 4)). If we define the empty string as ∅ = (i, i − 1), Y can also be represented as (Y, ∅).
The canonical reference pair is the one whose node is closest to the position. Specially, in case the position is an explicit node, the canonical reference pair is (node, ∅); so (Y, ∅) is the canonical reference pair of node Y.
The below algorithm converts a reference pair (node, (l, r)) to the canonical reference pair (node′, (l′, r)). Note that since r doesn't change, the algorithm only returns (node′, l′) as the result.
Algorithm 3 Convert a reference pair to the canonical reference pair
1: function Canonize(node, (l, r))
2:   if node = NIL then
3:     if (l, r) = ∅ then
4:       return (NIL, l)
5:     else
6:       return Canonize(root, (l + 1, r))
7:   while l ≤ r do                           ▷ (l, r) isn't empty
8:     ((l′, r′), node′) ← Children(node)[sl]
9:     if r − l ≥ r′ − l′ then
10:      l ← l + r′ − l′ + 1                  ▷ Remove |(l′, r′)| chars from (l, r)
11:      node ← node′
12:    else
13:      break
14:  return (node, l)
If the passed-in node parameter is NIL, it means a very special case. The function is called like the following:

Canonize(Suffix-Link(root), (l, r))

Because the suffix link of the root points to NIL, the result should be (root, (l + 1, r)) if (l, r) is not ∅; otherwise, (NIL, ∅) is returned to indicate a terminal position.
We explain this special case in detail in later sections.
The algorithm

In section 6.3.1 we mentioned that all updating to leaves is trivial, because we only need to append the newly coming character to the leaf. With reference pairs, this means that when updating SuffixTree(Si) to SuffixTree(Si+1), all reference pairs in the form (node, (l, i)) are leaves; they will change to (node, (l, i + 1)) next time. Ukkonen defines a leaf in the form (node, (l, ∞)), where ∞ means 'open to grow'. We can skip all leaves until the suffix tree is completely constructed; after that, we can change all ∞ to the length of the string.
So the main algorithm only cares about positions from the active point to the end point. However, how do we find the active point and the end point?
When the suffix tree construction starts, there is only a root node; there aren't any branches or leaves. The active point should be (root, ∅), or (root, (1, 0)) (the string index starts from 1).
For the end point, it is a position where we can finish updating SuffixTree(Si). According to the suffix trie algorithm, we know it should be a position which has the si+1-child already. Because a position in the suffix trie may not be an explicit node in the suffix tree, if (node, (l, r)) is the end point, there are two cases:

1. (l, r) = ∅. It means the node itself is the end point. This node has the si+1-child, which means Children(node)[si+1] ≠ NIL;
2. Otherwise, l ≤ r, and the end point is an implicit position. It must satisfy si+1 = s(l′+|(l,r)|), where Children(node)[sl] = ((l′, r′), node′) and |(l, r)| means the length of the sub-string (l, r), which equals r − l + 1. This is illustrated in figure 6.8. We can also say that (node, (l, r)) has a si+1-child implicitly.

Figure 6.8: Implicit end point

150

CHAPTER 6. SUFFIX TREE

Ukkonen found a very important fact: if (node, (l, i)) is the end point of SuffixTree(Si), then (node, (l, i + 1)) is the active point of SuffixTree(Si+1).
This is because if (node, (l, i)) is the end point of SuffixTree(Si), it must have a si+1-child (either explicitly or implicitly). If this end point represents the suffix sk sk+1 ... si, it is the longest suffix in SuffixTree(Si) which satisfies that sk sk+1 ... si si+1 is a sub-string of Si. Considering Si+1, sk sk+1 ... si si+1 must occur at least twice in Si+1, so position (node, (l, i + 1)) is the active point of SuffixTree(Si+1). Figure 6.9 illustrates this fact.

Figure 6.9: End point in SuffixTree(Si) and active point in SuffixTree(Si+1).

Summarizing the above facts, Ukkonen's on-line construction algorithm can be given as the following.
1: function Update(node, (l, i))
2:   prev ← Create-Empty-Node                 ▷ Initialized as sentinel
3:   loop                                     ▷ Traverse along the suffix links
4:     (finish, node′) ← End-Point-Branch?(node, (l, i − 1), si)
5:     if finish then
6:       break
7:     Children(node′)[si] ← ((i, ∞), Create-Empty-Node)
8:     Suffix-Link(prev) ← node′
9:     prev ← node′
10:    (node, l) ← Canonize(Suffix-Link(node), (l, i − 1))
11:  Suffix-Link(prev) ← node
12:  return (node, l)                         ▷ The end point
This algorithm takes a reference pair (node, (l, i)) as its argument; note that position (node, (l, i − 1)) is the active point for SuffixTree(Si−1). Then we start a loop, which goes along the suffix links until the current position (node, (l, i − 1)) is the end point. Otherwise, function End-Point-Branch? returns a position from which the new leaf branches out.
The End-Point-Branch? algorithm is realized as below.
function End-Point-Branch?(node, (l, r), c)
  if (l, r) = ∅ then
    if node = NIL then
      return (TRUE, root)
    else
      return (Children(node)[c] ≠ NIL, node)
  else
    ((l′, r′), node′) ← Children(node)[sl]
    pos ← l′ + |(l, r)|
    if spos = c then
      return (TRUE, node)
    else
      p ← Create-Empty-Node
      Children(node)[sl′] ← ((l′, pos − 1), p)
      Children(p)[spos] ← ((pos, r′), node′)
      return (FALSE, p)
If the position is (root, ∅), it means we have arrived at the root. It's definitely the end point, so we can finish this round of updating. If the position is in the form (node, ∅), the reference pair represents an explicit node; we can examine whether this node already has the c-child, where c = si. If not, we need to branch out a leaf.
Otherwise, the position (node, (l, r)) points to an implicit node. We need to find the exact position next to it to see if there is a c-child. If yes, we meet an end point, and the updating loop can be finished; else, we turn the position into an explicit node, and return it for further branching.
We can finalize Ukkonen's algorithm as below.

1: function Suffix-Tree(S)
2:   root ← Create-Empty-Node
3:   node ← root, l ← 0
4:   for i ← 1 to |S| do
5:     (node, l) ← Update(node, (l, i))
6:     (node, l) ← Canonize(node, (l, i))
7:   return root
Figure 6.10 shows the steps when constructing the suffix tree for the string cacao. Note that we needn't set suffix links for leaf nodes; only branch nodes need suffix links.
The following example Python program implements Ukkonen's algorithm. First is the node definition.

class Node:
    def __init__(self, suffix=None):
        self.children = {} # c:(word, Node), where word = (l, r)
        self.suffix = suffix

Because there is only one copy of the complete string, all sub-strings are represented as (left, right) index pairs, and the leaves are open pairs in the form (left, ∞). The suffix tree is defined like below.

Figure 6.10: Construct the suffix tree for cacao in 6 steps: (a) empty, (b) c, (c) ca, (d) cac, (e) caca, (f) cacao. Only the last layer of suffix links are shown in dotted arrows.

class STree:
    def __init__(self, s):
        self.str = s
        self.infinity = len(s) + 1000
        self.root = Node()

The infinity is defined as the length of the string plus a big number. Some auxiliary functions are defined:

def substr(str, str_ref):
    (l, r) = str_ref
    return str[l:r+1]

def length(str_ref):
    (l, r) = str_ref
    return r - l + 1

The main entry for Ukkonen's algorithm is implemented as the following.

def suffix_tree(str):
    t = STree(str)
    node = t.root # init active point is (root, Empty)
    l = 0
    for i in range(len(str)):
        (node, l) = update(t, node, (l, i))
        (node, l) = canonize(t, node, (l, i))
    return t

def update(t, node, str_ref):
    (l, i) = str_ref
    c = t.str[i] # current char
    prev = Node() # dummy init
    while True:
        (finish, p) = branch(t, node, (l, i-1), c)
        if finish:
            break
        p.children[c] = ((i, t.infinity), Node())
        prev.suffix = p
        prev = p
        (node, l) = canonize(t, node.suffix, (l, i-1))
    prev.suffix = node
    return (node, l)

def branch(t, node, str_ref, c):
    (l, r) = str_ref
    if length(str_ref) <= 0: # (node, empty)
        if node is None: # _|_
            return (True, t.root)
        else:
            return ((c in node.children), node)
    else:
        ((l1, r1), node1) = node.children[t.str[l]]
        pos = l1 + length(str_ref)
        if t.str[pos] == c:
            return (True, node)
        else:
            branch_node = Node()
            node.children[t.str[l1]] = ((l1, pos-1), branch_node)
            branch_node.children[t.str[pos]] = ((pos, r1), node1)
            return (False, branch_node)

def canonize(t, node, str_ref):
    (l, r) = str_ref
    if node is None:
        if length(str_ref) <= 0:
            return (None, l)
        else:
            return canonize(t, t.root, (l+1, r))
    while l <= r: # str_ref is not empty
        ((l1, r1), child) = node.children[t.str[l]]
        if r - l >= r1 - l1:
            l += r1 - l1 + 1
            node = child
        else:
            break
    return (node, l)
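A small usage sketch of the program above; the edge labels are recovered from the (l, r) pairs with substr:

t = suffix_tree("cacao")
for c, (str_ref, _) in t.root.children.items():
    print(c, substr(t.str, str_ref)) # root edge labels, cf. figure 6.10 (f)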

Functional suffix tree construction

Giegerich and Kurtz found that Ukkonen's algorithm can be transformed to McCreight's algorithm[7]. The three suffix tree construction algorithms found by Weiner, McCreight, and Ukkonen are all bound to O(n) time. Giegerich and Kurtz conjectured that any sequential suffix tree construction method not based on suffix links, active suffixes, etc., fails to meet the O(n) criterion.
There is an implementation in PLT/Scheme[10] based on Ukkonen's algorithm; however, it updates suffix links during the processing, which is not purely functional.
A lazy suffix tree construction method is discussed in [8]. This method was contributed to Haskell Hackage by Bryan O'Sullivan[9]. The method depends on the lazy evaluation property: the tree won't be constructed until it is traversed. However, it can't ensure O(n) performance if the programming environment or language doesn't support lazy evaluation.
The following Haskell program defines the suffix tree. A suffix tree is either a leaf, or a branch containing multiple sub-trees. Each sub-tree is bound to a string.
data Tr = Lf | Br [(String, Tr)] deriving (Eq)
type EdgeFunc = [String] -> (String, [String])

The edge function extracts a common prefix from a list of strings. The prefix returned by the edge function may not be the longest one; the empty string is also allowed. The exact behavior can be customized with different edge functions:

build(edge, X)

This defines a generic radix tree building function. It takes an edge function and a set of strings. X can be all suffixes of a string, so that we get a suffix trie or a suffix tree. We'll also explain later that X can be all prefixes, which leads to the normal prefix trie or Patricia.
Suppose all the strings are built from a character set Σ. When building the tree, if the string is empty, X only contains one empty string as well, and the resulting tree is an empty leaf. Otherwise, we examine every character in Σ, group the strings in X by their initial characters, and then apply the edge function to these groups.

branch({({c} p,
build(edge, X) ==

leaf
build(edge, X 0 ))|
c ,
G {group(X, c)},
(p, X 0 ) {edge(G)}})

: X = {}
: otherwise

(6.3)
The algorithm categorizes all suffixes into groups by their first letter, and removes the first letter from each element in every group. For example, the suffixes {acac, cac, ac, c} are categorized into the groups {(a, [cac, c]), (c, [ac, ∅])}.

group(X, c) = {C′ | {c1} ∪ C′ ∈ X, c1 = c}    (6.4)

Function group enumerates all suffixes in X. For each one, denote the first character as c1 and the rest of the characters as C′; if c1 is the same as the given character c, then C′ is collected.
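A throwaway Python sketch of group (the name is ours), reproducing the example above:

def group(xs, c):
    # keep the strings starting with c, with that first character removed
    return [s[1:] for s in xs if s != "" and s[0] == c]

print(group(["acac", "cac", "ac", "c"], 'c')) # ['ac', '']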
The below example Haskell program implements the generic radix tree building algorithm.

alpha = ['a'..'z'] ++ ['A'..'Z']

lazyTree :: EdgeFunc -> [String] -> Tr
lazyTree edge = build where
    build [[]] = Lf
    build ss = Br [(a:prefix, build ss') |
                       a <- alpha,
                       xs@(x:_) <- [[cs | c:cs <- ss, c == a]],
                       (prefix, ss') <- [edge xs]]

Different edge functions produce different radix trees. Since the edge function extracts a common prefix from a set of strings, the simplest one constantly uses the empty string as the common prefix. This edge function builds a trie.

edgeTrie(X) = (∅, X)    (6.5)

We can also realize an edge function that extracts the longest common prefix. Such an edge function builds a Patricia. Denote the strings as X = {x1, x2, ..., xn}; for each string xi, let the initial character be ci, and the rest of the characters in xi be Wi. If there is only one string in X, the longest common prefix is definitely this string. If there are two strings starting with different initial characters, the longest common prefix is empty. Otherwise, it means all the strings share the same initial character; this character definitely belongs to the longest common prefix, so we can remove it from all strings and recursively call the edge function.

edgeTree(X) =
  (x1, {∅}) : X = {x1}
  (∅, X) : |X| > 1, ∃xi ∈ X, ci ≠ c1
  ({c1} ∪ p, Y) : (p, Y) = edgeTree({Wi | xi ∈ X})
    (6.6)

Here are some examples for the edgeTree function:

edgeTree({an, another, and}) = (an, {∅, other, d})
edgeTree({bool, foo, bar}) = (∅, {bool, foo, bar})
The following example Haskell program implements this edge function.

edgeTree :: EdgeFunc
edgeTree [s] = (s, [[]])
edgeTree awss@((a:w):ss) | null [c | c:_ <- ss, a /= c] = (a:prefix, ss')
                         | otherwise = ("", awss)
    where (prefix, ss') = edgeTree (w:[u | _:u <- ss])
edgeTree ss = ("", ss)

For any given string, we can build the suffix trie and the suffix tree by feeding suffixes to these two edge functions:

suffixTrie(S) = build(edgeTrie, suffixes(S))    (6.7)

suffixTree(S) = build(edgeTree, suffixes(S))    (6.8)

Because build(edge, X) is generic, it can be used to build other radix trees, such as the normal prefix trie and Patricia:

trie(S) = build(edgeTrie, prefixes(S))    (6.9)

tree(S) = build(edgeTree, prefixes(S))    (6.10)

6.4 Suffix tree applications

Suffix tree can help to solve many string and DNA pattern manipulation problems particularly fast.

6.4.1 String/Pattern searching

There are plenty of string searching algorithms, such as the famous KMP (Knuth-Morris-Pratt) algorithm, which is introduced in the chapter about search. Suffix tree can perform at the same level as KMP[11]: the string searching is bound to O(m) time, where m is the length of the sub-string to be searched. However, O(n) time is required to build the suffix tree in advance, where n is the length of the text[12].
Not only sub-string searching, but also pattern matching, including regular expression matching, can be solved with the suffix tree. Ukkonen summarizes this kind of problem as sub-string motifs: for a string S, SuffixTree(S) gives the complete occurrence counts of all sub-string motifs of S in O(n) time, although S may have O(n²) sub-strings.
Find the number of sub-string occurrences

Every internal node in SuffixTree(S) corresponds to a sub-string that occurs multiple times in S. If this sub-string occurs k times in S, then there are k sub-trees in total under this node[13].
1: function Lookup-Pattern(T, s)
2:   loop
3:     match ← FALSE
4:     for ∀(si, Ti) ∈ Values(Children(T)) do
5:       if s ⊏ si then
6:         return Max(|Children(Ti)|, 1)
7:       else if si ⊏ s then
8:         match ← TRUE
9:         T ← Ti
10:        s ← s − si
11:        break
12:    if ¬match then
13:      return 0
To look up a sub-string pattern s in a text w, we build the suffix tree T from the text. Starting from the root, we iterate over all children. For every string reference pair si and sub-tree Ti, we check whether s is a prefix of si. If yes, the number of sub-trees in Ti is returned as the result. There is a special case where Ti is a leaf without any children: we need to return 1, not zero; this is why we use the maximum function. Otherwise, if si is a prefix of s, then we remove the si part from s, and recursively look up in Ti.
The following Python program implements this algorithm.

def lookup_pattern(t, s):
    node = t.root
    while True:
        match = False
        for _, (str_ref, tr) in node.children.items():
            edge = substr(t.str, str_ref)
            if string.find(edge, s) == 0: # s isPrefixOf edge
                return max(len(tr.children), 1)
            elif string.find(s, edge) == 0: # edge isPrefixOf s
                match = True
                node = tr
                s = s[len(edge):]
                break
        if not match:
            return 0
    return 0 # not found
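For example, with the program above (and the same Python 2-era string helpers used throughout this chapter):

t = suffix_tree("banana$")
print(lookup_pattern(t, "an")) # 2: "an" occurs twice in "banana"
print(lookup_pattern(t, "x"))  # 0: no occurrence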

This algorithm can also be realized in a recursive way. For the non-leaf suffix tree T, denote the children as C = {(s1, T1), (s2, T2), ...}. We search the sub-string among the children.

lookup_pattern(T, s) = find(C, s)    (6.11)

If the children C are empty, it means the sub-string doesn't occur at all. Otherwise, we examine the first pair (s1, T1): if s is a prefix of s1, then the number of sub-trees in T1 is the result; if s1 is a prefix of s, we remove s1 from s and recursively look it up in T1; otherwise, we go on to examine the rest of the children, denoted as C′.

find(C, s) =
  0 : C = ∅
  max(1, |C_T1|) : s ⊏ s1
  lookup_pattern(T1, s − s1) : s1 ⊏ s
  find(C′, s) : otherwise
    (6.12)

The following Haskell example code implements this algorithm.


lookupPattern (Br lst) ptn = find lst where
    find [] = 0
    find ((s, t):xs)
        | ptn `isPrefixOf` s = numberOfBranch t
        | s `isPrefixOf` ptn = lookupPattern t (drop (length s) ptn)
        | otherwise = find xs
    numberOfBranch (Br ys) = length ys
    numberOfBranch _ = 1

findPattern s ptn = lookupPattern (suffixTree $ s ++ "$") ptn

We always append a special terminator to the string (the $ in the above program), so that no suffix can be the prefix of another[3].
The suffix tree also supports searching patterns like a**n; we skip it here. Readers can refer to [13] and [14] for details.

6.4.2 Find the longest repeated sub-string

After adding a special terminator character to the string S, the longest repeated sub-string can be found by searching the deepest branch nodes in the suffix tree.
Consider the example suffix tree shown in figure 6.11.

Figure 6.11: The suffix tree for mississippi$.


There are three branch nodes, A, B, and C with depth 3. However, A
represents the longest repeated sub-string issi. B and C represent for si,
ssi, they are shorter than A.
This example tells us that the depth of the branch node should be measured by the number of characters traversed from the root. But not the number
of explicit branch nodes.
To find the longest repeated sub-string, we can perform BFS in the suffix
tree.
1: function Longest-Repeated-Substring(T)
2:   Q ← (NIL, Root(T))
3:   R ← NIL
4:   while Q is not empty do
5:     (s, T) ← Pop(Q)
6:     for each ((l, r), T′) ∈ Children(T) do
7:       if T′ is not a leaf then
8:         s′ ← Concatenate(s, (l, r))
9:         Push(Q, (s′, T′))
10:        R ← Update(R, s′)
11:  return R
This algorithm initializes a queue with a pair of an empty string and the root, then repeatedly examines the candidates in the queue.
For each node, the algorithm examines the children one by one. If a child is a branch node, it is pushed back to the queue for further search, and the sub-string represented by this child is treated as a candidate for the longest repeated sub-string.
Function Update(R, s′) updates the longest repeated sub-string candidates. If multiple candidates have the same length, they are all kept in a result list.

6.4. SUFFIX TREE APPLICATIONS

159

function Update(L, s)
if L = NIL |l1 | < |s| then
return l {s}
4:
if |l1 | = |s| then
5:
return Append(L, s)
6:
return L
The above algorithm can be implemented in Python as the following example
program.
1:
2:
3:

def lrs(t):
    queue = [("", t.root)]
    res = []
    while len(queue) > 0:
        (s, node) = queue.pop(0)
        for _, (str_ref, tr) in node.children.items():
            if len(tr.children) > 0: # tr is a branch node
                s1 = s + substr(t.str, str_ref)
                queue.append((s1, tr))
                res = update_max(res, s1)
    return res

def update_max(lst, x):
    if lst == [] or len(lst[0]) < len(x):
        return [x]
    if len(lst[0]) == len(x):
        return lst + [x]
    return lst
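For example, applied to the tree of figure 6.11:

t = suffix_tree("mississippi$")
print(lrs(t)) # ['issi'], the longest repeated sub-string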

Searching the deepest branch can also be realized recursively. If the tree is just a leaf node, the empty string is returned; else the algorithm tries to find the longest repeated sub-string from the children.

LRS(T) =
  ∅ : leaf(T)
  longest({si ∪ LRS(Ti) | (si, Ti) ∈ C, ¬leaf(Ti)}) : otherwise
    (6.13)

The following Haskell example program implements the longest repeated sub-string algorithm.

import Data.Function (on)
import Data.List (maximumBy)

isLeaf Lf = True
isLeaf _  = False

lrs :: Tr -> String
lrs Lf = ""
lrs (Br lst) = find $ filter (not . isLeaf . snd) lst where
    find [] = ""
    find ((s, t):xs) = maximumBy (compare `on` length) [s ++ (lrs t), find xs]

6.4.3 Find the longest common sub-string

The longest common sub-string of two strings can also be found quickly with a
suffix tree. The solution is to build a generalized suffix tree. If the two
strings are denoted as txt1 and txt2, the generalized suffix tree is
SuffixTree(txt1 $1 txt2 $2), where $1 is a special terminator character for
txt1, and $2 ≠ $1 is another special terminator character for txt2.
The longest common sub-string is indicated by the deepest branch node with
two forks, corresponding to both ...$1... and ...$2 (without $1). The
definition of the deepest node is the same as the one for the longest repeated
sub-string: it is measured by the number of characters traversed from the root.
If a node has ...$1... under it, the node must represent a sub-string of
txt1, as $1 is the terminator of txt1. On the other hand, since it also has
...$2 (without $1), this node must represent a sub-string of txt2 too. Because
it is the deepest node satisfying this criterion, it represents the longest
common sub-string.
Again, we can use BFS (breadth-first search) to find the longest common
sub-string.
1: function Longest-Common-Substring(T)
2:     Q ← {(NIL, Root(T))}
3:     R ← NIL
4:     while Q is not empty do
5:         (s, T) ← Pop(Q)
6:         if Match-Fork(T) then
7:             R ← Update(R, s)
8:         for each ((l, r), T′) ∈ Children(T) do
9:             if T′ is not leaf then
10:                s′ ← Concatenate(s, (l, r))
11:                Push(Q, (s′, T′))
12:    return R
Most parts are the same as in the longest repeated sub-string searching
algorithm. The function Match-Fork checks if the children satisfy the common
sub-string criteria.
1: function Match-Fork(T)
2:     if |Children(T)| = 2 then
3:         {(s1, T1), (s2, T2)} ← Children(T)
4:         return T1 is leaf ∧ T2 is leaf ∧ Xor($1 ∈ s1, $1 ∈ s2)
5:     return FALSE
In this function, it checks that the two children are both leaves, and that
one contains $1 while the other doesn't. Note that every leaf always contains
the final terminator $2 according to the definition of the suffix tree; only
the suffixes starting within txt1 contain $1.
The following Python program implements the longest common sub-string
algorithm.
def lcs(t):
    queue = [("", t.root)]
    res = []
    while len(queue) > 0:
        (s, node) = queue.pop(0)
        if match_fork(t, node):
            res = update_max(res, s)
        for _, (str_ref, tr) in node.children.items():
            if len(tr.children) > 0:
                s1 = s + t.substr(str_ref)
                queue.append((s1, tr))
    return res

def is_leaf(node):
    return node.children == {}

def match_fork(t, node):
    if len(node.children) == 2:
        [(_, (str_ref1, tr1)), (_, (str_ref2, tr2))] = list(node.children.items())
        return is_leaf(tr1) and is_leaf(tr2) and \
            (t.substr(str_ref1).find('#') != -1) != \
            (t.substr(str_ref2).find('#') != -1)
    return False
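A small usage sketch under the same assumption (suffix_tree as a hypothetical
builder, with '#' playing the role of $1):

t = suffix_tree("ababa#baby$")
print(lcs(t))   # expected output: ['bab'], the longest common
                # sub-string of "ababa" and "baby"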

The longest common sub-string finding algorithm can also be realized
recursively. If the suffix tree T is a leaf, the result is empty; otherwise,
we examine all children in T. For those satisfying the matching criteria, the
sub-strings are collected as candidates; for those that don't match, we
recursively search for the common sub-string among the children. The longest
candidate is selected as the final result.

LCS(T) = \begin{cases}
\phi & : leaf(T) \\
longest(\{ s_i \mid (s_i, T_i) \in C, match(T_i) \} \cup
        \{ s_i \cup LCS(T_i) \mid (s_i, T_i) \in C, \lnot match(T_i) \}) & : \text{otherwise}
\end{cases}   (6.14)

The following Haskell example program implements the longest common
sub-string algorithm.

lcs Lf = []
lcs (Br lst) = find $ filter (not . isLeaf . snd) lst where
    find [] = []
    find ((s, t):xs) = maxBy (compare `on` length)
                           (if match t
                            then s : find xs
                            else map (s++) (lcs t) ++ find xs)

match (Br [(s1, Lf), (s2, Lf)]) = ("#" `isInfixOf` s1) /= ("#" `isInfixOf` s2)
match _ = False

6.4.4 Find the longest palindrome

A palindrome is a string S such that S = reverse(S). For example, "level",
"rotator", and "civic" are all palindromes.
The longest palindrome in a string s1 s2 ... sn can be found in O(n) time with
a suffix tree. The solution benefits from the longest common sub-string
algorithm.
For string S, if a sub-string w is a palindrome, then it must be a sub-string
of reverse(S) too. For instance, "issi" is a palindrome and a sub-string of
"mississippi"; when the string is reversed to "ippississim", "issi" is still a
sub-string.
Based on this fact, we can find the longest palindrome by searching for the
longest common sub-string of S and reverse(S).
palindrome(S) = LCS(suffixTree(S \cup reverse(S)))   (6.15)

The following Haskell example program finds the longest palindrome.

longestPalindromes s = lcs $ suffixTree (s++"#"++(reverse s)++"$")
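For instance, longestPalindromes "mississippi" should yield ["ississi"]:
"ississi" reads the same in both directions and is the longest common
sub-string of "mississippi" and its reverse.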

6.4.5 Others

Suffix trees can also be used for data compression, such as the
Burrows-Wheeler transform, LZW compression (LZSS), etc. [3]

6.5 Notes and short summary

Suffix trees were first introduced by Weiner in 1973 [2]. In 1976, McCreight
greatly simplified the construction algorithm; McCreight constructs the suffix
tree from right to left. In 1995, Ukkonen gave the first on-line construction
algorithm, working from left to right. All three algorithms are linear time
(O(n)), and some research shows the relationship among these 3 algorithms [7].

Bibliography

[1] Esko Ukkonen. "On-line construction of suffix trees". Algorithmica 14 (3):
    249-260. doi:10.1007/BF01206331. [Link]
[2] Weiner, P. "Linear pattern matching algorithms". 14th Annual IEEE
    Symposium on Switching and Automata Theory, pp. 1-11.
    doi:10.1109/SWAT.1973.13
[3] Suffix Tree, Wikipedia. [Link]
[4] Esko Ukkonen. "Suffix tree and suffix array techniques for pattern analysis
    in strings". [Link]
[5] Trie, Wikipedia. [Link]
[6] Suffix Tree (Java). [Link]
[7] Robert Giegerich and Stefan Kurtz. "From Ukkonen to McCreight and Weiner:
    A Unifying View of Linear-Time Suffix Tree Construction". Science of
    Computer Programming 25(2-3):187-218, 1995. [Link]
[8] Robert Giegerich and Stefan Kurtz. "A Comparison of Imperative and Purely
    Functional Suffix Tree Constructions". Algorithmica 19 (3): 331-353.
    doi:10.1007/PL00009177. [Link]
[9] Bryan O'Sullivan. "suffixtree: Efficient, lazy suffix tree implementation".
    [Link]
[10] Danny. [Link]
[11] Zhang Shaojie. Lecture of Suffix Trees. [Link]
[12] Lloyd Allison. Suffix Trees. [Link]
[13] Esko Ukkonen. "Suffix tree and suffix array techniques for pattern
    analysis in strings". [Link]
[14] Esko Ukkonen. "Approximate string-matching over suffix trees". Proc. CPM
    93. Lecture Notes in Computer Science 684, pp. 228-242, Springer 1993.
    [Link]

Chapter 7

B-Trees

7.1 Introduction

The B-tree is an important data structure. It is widely used in modern file
systems; some are implemented based on the B+ tree, which is an extension of
the B-tree. The B-tree is also widely used in database systems.
Some textbooks introduce the B-tree with the problem of how to access a
large block of data on magnetic disks or secondary storage devices [2]. It is
also helpful to understand the B-tree as a generalization of the balanced
binary search tree [2].
Referring to figure 7.1, it is easy to find the differences and similarities
of the B-tree with respect to the binary search tree.
Figure 7.1: Example B-Tree

Recall the definition of the binary search tree. A binary search tree is

• either an empty node;
• or a node containing 3 parts: a value, a left child, and a right child.
  Both children are also binary search trees.

The binary search tree satisfies the constraint that:

• all the values in the left child are not greater than the value of this
  node;
• the value of this node is not greater than any value in the right child.

For a non-empty binary tree (L, k, R), where L and R are the left and right
children and k is the key, and where function Key(T) accesses the key of tree
T, the constraint can be represented as follows.

\forall x \in L, \forall y \in R \Rightarrow Key(x) \le k \le Key(y)   (7.1)

If we extend this definition to allow multiple keys and children, we get the
B-tree definition.

A B-tree

• is either empty;
• or contains n keys and n + 1 children, each child also being a B-tree. We
  denote these keys and children as k1, k2, ..., kn and c1, c2, ..., cn, cn+1.
Figure 7.2 illustrates a B-Tree node.
Figure 7.2: A B-Tree node


The keys and children in a node satisfy the following order constraints.
Keys are stored in non-decreasing order. that k1 k2 ... kn ;
for each ki , all elements stored in child ci are not greater than ki , while
ki is not greater than any values stored in child ci+1 .
The constraints can be represented as in equation (7.2) as well.
xi ci , i = 0, 1, ..., n, x1 k1 x2 k2 ... xn kn xn+1

(7.2)
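For example, a node with keys (16, 38) has three children: every key in the
first child is at most 16, every key in the second child lies between 16 and
38, and every key in the third child is at least 38.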

Finally, after adding some constraints to make the tree balanced, we get the
complete B-tree definition:

• All leaves have the same depth;
• We define an integral number, t, as the minimum degree of the B-tree;
• each node can have at most 2t − 1 keys;
• each node can have at least t − 1 keys, except the root.

Consider a B-tree holding n keys, with minimum degree t ≥ 2 and height h. All
the nodes have at least t − 1 keys except the root, which contains at least 1
key. There are at least 2 nodes at depth 1, at least 2t nodes at depth 2, at
least 2t^2 nodes at depth 3, ..., and finally at least 2t^{h-1} nodes at depth
h. Multiplying all nodes except the root by t − 1, the total number of keys
satisfies the following inequality.

n \ge 1 + (t-1)(2 + 2t + 2t^2 + ... + 2t^{h-1})
  = 1 + 2(t-1) \sum_{k=0}^{h-1} t^k
  = 1 + 2(t-1) \frac{t^h - 1}{t-1}
  = 2t^h - 1   (7.3)

Thus we have the inequality between the height and the number of keys.

h \le \log_t \frac{n+1}{2}   (7.4)

This is the reason why the B-tree is balanced. The simplest B-tree is the
so-called 2-3-4 tree, where t = 2: every node except the root contains 2, 3,
or 4 children (that is, 1 to 3 keys). The red-black tree can essentially be
mapped to a 2-3-4 tree.
The following Python code shows an example B-tree definition. It explicitly
passes t when creating a node.
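As a worked example of this bound: for t = 10 and n = 1,000,000 keys, equation
(7.4) gives h ≤ log10(1000001/2) ≈ 5.7, so the height is at most 5; a handful
of levels suffice for a million keys.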

class BTree:
    def __init__(self, t):
        self.t = t
        self.keys = []
        self.children = []

B-tree nodes commonly have satellite data as well. We ignore satellite data
for illustration purposes.
In this chapter, we first introduce how to generate a B-tree by insertion.
Two different methods will be explained: one is the classic method as in [2],
where we split the node before insertion if it is full; the other is the
modify-then-fix approach, which is quite similar to the red-black tree
solution [3] [2]. We will next explain how to delete a key from a B-tree and
how to look up a key.

7.2 Insertion

A B-tree can be created by inserting keys repeatedly. The basic idea is
similar to the binary search tree. When inserting key x, from the tree root we
examine all the keys in the node to find a position where all the keys on the
left are less than x, while all the keys on the right are greater than x. If
the current node is a leaf node and it is not full (there are fewer than
2t − 1 keys in this node), x is inserted at this position. Otherwise, the
position points to a child node, and we recursively insert x into it.
Figure 7.3 shows one example. The B-tree illustrated is a 2-3-4 tree. When
inserting key x = 22: because it is greater than the root key 20, the right
child containing keys 26, 38, 45 is examined next; since 22 < 26, the first
child containing keys 21 and 25 is examined. This is a leaf node and it is not
full, so key 22 is inserted into this node.
However, if there are already 2t − 1 keys in the leaf, the new key x can't be
inserted, because this node is full. Trying to insert key 18 into the above
example B-tree meets this problem. There are 2 methods to solve it.

7.2.1 Splitting

Split before insertion

If the node is full, one method to solve the problem is to split the node
before insertion.
For a node with 2t − 1 keys, it can be divided into 3 parts as shown in figure
7.4: the left part contains the first t − 1 keys and the first t children; the
right part contains the last t − 1 keys and the last t children.

168

CHAPTER 7. B-TREES
20

11

12

26

15

16

17

21

25

30

38

31

45

37

40

42

46

47

50

(a) Insert key 22 to the 2-3-4 tree. 22 > 20, go to the right child; 22 < 26 go to the first child.

20

11

12

26

15

16

17

21

22

25

30

38

31

37

45

40

42

46

(b) 21 < 22 < 25, and the leaf isnt full.

Figure 7.3: Insertion is similar to binary search tree.

Both the left part and the right part are valid B-tree nodes. The middle part
is the t-th key. We can push it up to the parent node (if the current node is
the root, then this key, together with the two split parts as children,
becomes the new root).
For node x, denote K(x) as its keys and C(x) as its children, the i-th key as
ki(x), and the j-th child as cj(x). The algorithm below describes how to split
the i-th child of a given node.
1: procedure Split-Child(node, i)
2:     x ← ci(node)
3:     y ← Create-Node
4:     Insert(K(node), i, kt(x))
5:     Insert(C(node), i + 1, y)
6:     K(y) ← {kt+1(x), kt+2(x), ..., k2t−1(x)}
7:     K(x) ← {k1(x), k2(x), ..., kt−1(x)}
8:     if y is not leaf then
9:         C(y) ← {ct+1(x), ct+2(x), ..., c2t(x)}
10:        C(x) ← {c1(x), c2(x), ..., ct(x)}
The following example Python program implements this child splitting algorithm.
def split_child(node, i):
    t = node.t
    x = node.children[i]
    y = BTree(t)
    node.keys.insert(i, x.keys[t-1])
    node.children.insert(i+1, y)
    y.keys = x.keys[t:]
    x.keys = x.keys[:t-1]
    if not is_leaf(x):
        y.children = x.children[t:]
        x.children = x.children[:t]

Figure 7.4: Split node. (a) Before split. (b) After split.

Where function is_leaf tests if a node is a leaf.

def is_leaf(t):
    return t.children == []

After splitting, a key is pushed up to the parent node. It is quite possible
that the parent node is already full, so this pushing violates the B-tree
property.
In order to solve this problem, we can check from the root along the
insertion path down to the leaf. If any node on this path is full, the
splitting is applied. Since the parent of this node has already been examined,
it is ensured that there are fewer than 2t − 1 keys in the parent; pushing up
one key won't make the parent full. This approach needs only a single pass
down the tree, without any back-tracking.
If the root needs splitting, a new node is created as the new root. There are
no keys in this newly created root, and the previous root is set as its only
child. After that, splitting is performed top-down, and we can finally insert
the new key.
1: function Insert(T, k)
2:     r ← T
3:     if r is full then                ▷ root is full
4:         s ← Create-Node
5:         C(s) ← {r}
6:         Split-Child(s, 1)
7:         r ← s
8:     return Insert-Nonfull(r, k)

Where algorithm Insert-Nonfull assumes the node passed in is not full.

If it is a leaf node, the new key is inserted at the proper position based on
the order; otherwise, the algorithm finds a proper child node into which the
new key will be inserted. If this child is full, splitting is performed first.
1: function Insert-Nonfull(T, k)
2:     if T is leaf then
3:         i ← 1
4:         while i ≤ |K(T)| ∧ k > ki(T) do
5:             i ← i + 1
6:         Insert(K(T), i, k)
7:     else
8:         i ← |K(T)|
9:         while i > 1 ∧ k < ki(T) do
10:            i ← i − 1
11:        if ci(T) is full then
12:            Split-Child(T, i)
13:            if k > ki(T) then
14:                i ← i + 1
15:        Insert-Nonfull(ci(T), k)
16:    return T
This algorithm is recursive. In B-trees, the minimum degree t is typically
chosen relative to the magnetic disk structure; even a small depth can support
a huge amount of data (with t = 10, up to 10 billion items can be stored in a
B-tree of height 10). The recursion can also be eliminated; this is left as an
exercise to the reader.
Figure 7.5 shows the results of continuously inserting the keys G, M, P, X,
A, C, D, E, J, K, N, O, R, S, T, U, V, Y, Z into an empty tree. The first
result is the 2-3-4 tree (t = 2). The second result shows how it varies when
t = 3.
Figure 7.5: Insertion results. (a) The 2-3-4 tree (t = 2). (b) t = 3.


The following example Python program implements this algorithm.

def insert(tr, key):
    root = tr
    if is_full(root):
        s = BTree(root.t)
        s.children.insert(0, root)
        split_child(s, 0)
        root = s
    return insert_nonfull(root, key)

And the insertion into a non-full node is implemented as follows.


def insert_nonfull(tr, key):
    if is_leaf(tr):
        ordered_insert(tr.keys, key)
    else:
        i = len(tr.keys)
        while i > 0 and key < tr.keys[i-1]:
            i = i - 1
        if is_full(tr.children[i]):
            split_child(tr, i)
            if key > tr.keys[i]:
                i = i + 1
        insert_nonfull(tr.children[i], key)
    return tr

Where function ordered_insert is used to insert an element into an ordered
list, and function is_full tests if a node contains 2t − 1 keys.

def ordered_insert(lst, x):
    i = len(lst)
    lst.append(x)
    while i > 0 and lst[i] < lst[i-1]:
        (lst[i-1], lst[i]) = (lst[i], lst[i-1])
        i = i - 1

def is_full(node):
    return len(node.keys) >= 2 * node.t - 1

For an array-based collection, appending at the tail is much more effective
than inserting at another position, because the latter takes O(n) time if the
length of the collection is n. The ordered_insert program first appends the
new element at the end of the existing collection, then iterates from the last
element to the first, checking if the two neighboring elements are ordered. If
not, these two elements are swapped.
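As a usage sketch of the definitions above, the key sequence from figure 7.5
can be inserted one by one (single characters serve as keys here):

t = BTree(2)                        # a 2-3-4 tree
for k in "GMPXACDEJKNORSTUVYZ":
    t = insert(t, k)
print(t.keys)                       # keys of the resulting root node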
Insert then fixing

In functional settings, B-tree insertion can be realized in a way similar to
the red-black tree. When inserting a key into a red-black tree, it is first
inserted as in a normal binary search tree, then recursive fixing is performed
to restore the balance of the tree. A B-tree can be viewed as an extension of
the binary search tree, where each node contains multiple keys and children.
We can first insert the key without considering whether the node is full, then
perform fixing to satisfy the minimum degree constraint.
insert(T, k) = fix(ins(T, k))   (7.5)


Function ins(T, k) traverses the B-tree T from the root to find a proper
position where key k can be inserted. After that, function fix is applied to
restore the B-tree properties. Denote the B-tree in the form T = (K, C, t),
where K represents the keys, C represents the children, and t is the minimum
degree.
Below is the Haskell definition of the B-tree.

data BTree a = Node{ keys :: [a]
                   , children :: [BTree a]
                   , degree :: Int} deriving (Eq)

The insertion function can be provided based on this definition.


insert tr x = fixRoot $ ins tr x

There are two cases when realizing the ins(T, k) function. If the tree T is a
leaf, k is inserted into the keys; otherwise, if T is a branch node, we need
to recursively insert k into the proper child.
Figure 7.6 shows the branch case. The algorithm first locates the position:
for a certain key ki, if the new key k to be inserted satisfies ki−1 < k < ki,
then we need to recursively insert k into child ci.
This position divides the node into 3 parts: the left part, the child ci, and
the right part.
k, K[i-1]<