Elementary Algorithms

Larry LIU Xinyu
Version: 0.6180339887498949
Email: [email protected]

August 25, 2018

Contents

I Preface 5
0.1 Why? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
0.2 The smallest free ID problem, the power of algorithms . . . . . . 7
0.2.1 Improvement 1 . . . . . . . . . . . . . . . . . . . . . . . . 8
0.2.2 Improvement 2, Divide and Conquer . . . . . . . . . . . . 9
0.2.3 Expressiveness vs. Performance . . . . . . . . . . . . . . . 10
0.3 The number puzzle, power of data structure . . . . . . . . . . . . 12
0.3.1 The brute-force solution . . . . . . . . . . . . . . . . . . . 12
0.3.2 Improvement 1 . . . . . . . . . . . . . . . . . . . . . . . . 12
0.3.3 Improvement 2 . . . . . . . . . . . . . . . . . . . . . . . . 15
0.4 Notes and short summary . . . . . . . . . . . . . . . . . . . . . . 17
0.5 Structure of the contents . . . . . . . . . . . . . . . . . . . . . . . 18

II Trees 21

1 Binary search tree, the ‘hello world’ data structure 23


1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
1.2 Data Layout . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
1.3 Insertion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
1.4 Traversing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
1.5 Querying a binary search tree . . . . . . . . . . . . . . . . . . . . 31
1.5.1 Looking up . . . . . . . . . . . . . . . . . . . . . . . . . . 31
1.5.2 Minimum and maximum . . . . . . . . . . . . . . . . . . . 32
1.5.3 Successor and predecessor . . . . . . . . . . . . . . . . . . 32
1.6 Deletion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
1.7 Randomly build binary search tree . . . . . . . . . . . . . . . . . 38

2 The evolution of insertion sort 43


2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
2.2 Insertion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.3 Improvement 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
2.4 Improvement 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.5 Final improvement by binary search tree . . . . . . . . . . . . . . 49
2.6 Short summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50


3 Red-black tree, not so complex as it was thought 53


3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.1.1 Exploit the binary search tree . . . . . . . . . . . . . . . . 53
3.1.2 How to ensure the balance of the tree . . . . . . . . . . . 54
3.1.3 Tree rotation . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.2 Definition of red-black tree . . . . . . . . . . . . . . . . . . . . . 58
3.3 Insertion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.4 Deletion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
3.4.1 The sibling of the doubly black node is black, and it has
one red child . . . . . . . . . . . . . . . . . . . . . . . . . 65
3.4.2 The sibling of the doubly-black node is red . . . . . . . . 66
3.4.3 The sibling of the doubly-black node, and its two children
are all black . . . . . . . . . . . . . . . . . . . . . . . . . . 67
3.5 Imperative red-black tree algorithm ⋆ . . . . . . . . . . . . . . . 69
3.6 More words . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

4 AVL tree 75
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.1.1 How to measure the balance of a tree? . . . . . . . . . . . 75
4.2 Definition of AVL tree . . . . . . . . . . . . . . . . . . . . . . . . 75
4.3 Insertion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
4.3.1 Balancing adjustment . . . . . . . . . . . . . . . . . . . . 80
4.3.2 Pattern Matching . . . . . . . . . . . . . . . . . . . . . . . 82
4.4 Deletion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
4.5 Imperative AVL tree algorithm ⋆ . . . . . . . . . . . . . . . . . . 83
4.6 Chapter note . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

5 Radix tree, Trie and Prefix Tree 91


5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
5.2 Integer Trie . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
5.2.1 Definition of integer Trie . . . . . . . . . . . . . . . . . . . 93
5.2.2 Insertion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
5.2.3 Look up . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
5.3 Integer prefix tree . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
5.3.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
5.3.2 Insertion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
5.3.3 Look up . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
5.4 Alphabetic Trie . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
5.4.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
5.4.2 Insertion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
5.4.3 Look up . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
5.5 Alphabetic prefix tree . . . . . . . . . . . . . . . . . . . . . . . . 109
5.5.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
5.5.2 Insertion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
5.5.3 Look up . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
5.6 Applications of trie and prefix tree . . . . . . . . . . . . . . . . . 116
5.6.1 E-dictionary and word auto-completion . . . . . . . . . . 116
5.6.2 T9 input method . . . . . . . . . . . . . . . . . . . . . . . 121
5.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124

6 B-Trees 127
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
6.2 Insertion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
6.2.1 Splitting . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
6.3 Deletion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
6.3.1 Merge before delete method . . . . . . . . . . . . . . . . . 136
6.3.2 Delete and fix method . . . . . . . . . . . . . . . . . . . . 144
6.4 Searching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
6.5 Notes and short summary . . . . . . . . . . . . . . . . . . . . . . 151

III Heaps 155

7 Binary Heaps 157


7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
7.2 Implicit binary heap by array . . . . . . . . . . . . . . . . . . . . 157
7.2.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
7.2.2 Heapify . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
7.2.3 Build a heap . . . . . . . . . . . . . . . . . . . . . . . . . 160
7.2.4 Basic heap operations . . . . . . . . . . . . . . . . . . . . 162
7.2.5 Heap sort . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
7.3 Leftist heap and Skew heap, the explicit binary heaps . . . . . . 171
7.3.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
7.3.2 Merge . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
7.3.3 Basic heap operations . . . . . . . . . . . . . . . . . . . . 174
7.3.4 Heap sort by Leftist Heap . . . . . . . . . . . . . . . . . . 175
7.3.5 Skew heaps . . . . . . . . . . . . . . . . . . . . . . . . . . 176
7.4 Splay heap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
7.4.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
7.4.2 Heap sort . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
7.5 Notes and short summary . . . . . . . . . . . . . . . . . . . . . . 184

8 From grape to the world cup, the evolution of selection sort 189
8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
8.2 Finding the minimum . . . . . . . . . . . . . . . . . . . . . . . . 191
8.2.1 Labeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
8.2.2 Grouping . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
8.2.3 performance of the basic selection sorting . . . . . . . . . 194
8.3 Minor Improvement . . . . . . . . . . . . . . . . . . . . . . . . . 195
8.3.1 Parameterize the comparator . . . . . . . . . . . . . . . . 195
8.3.2 Trivial fine tune . . . . . . . . . . . . . . . . . . . . . . . 196
8.3.3 Cock-tail sort . . . . . . . . . . . . . . . . . . . . . . . . . 197
8.4 Major improvement . . . . . . . . . . . . . . . . . . . . . . . . . . 201
8.4.1 Tournament knock out . . . . . . . . . . . . . . . . . . . . 201
8.4.2 Final improvement by using heap sort . . . . . . . . . . . 209
8.5 Short summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210

9 Binomial heap, Fibonacci heap, and pairing heap 213


9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
9.2 Binomial Heaps . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
9.2.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
9.2.2 Basic heap operations . . . . . . . . . . . . . . . . . . . . 218
9.3 Fibonacci Heaps . . . . . . . . . . . . . . . . . . . . . . . . . . . 228
9.3.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . 228
9.3.2 Basic heap operations . . . . . . . . . . . . . . . . . . . . 230
9.3.3 Running time of pop . . . . . . . . . . . . . . . . . . . . . 240
9.3.4 Decreasing key . . . . . . . . . . . . . . . . . . . . . . . . 241
9.3.5 The name of Fibonacci Heap . . . . . . . . . . . . . . . . 243
9.4 Pairing Heaps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246
9.4.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . 246
9.4.2 Basic heap operations . . . . . . . . . . . . . . . . . . . . 246
9.5 Notes and short summary . . . . . . . . . . . . . . . . . . . . . . 252

IV Queues and Sequences 257


10 Queue, not so simple as it was thought 259
10.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
10.2 Queue by linked-list and circular buffer . . . . . . . . . . . . . . 260
10.2.1 Singly linked-list solution . . . . . . . . . . . . . . . . . . 260
10.2.2 Circular buffer solution . . . . . . . . . . . . . . . . . . . 263
10.3 Purely functional solution . . . . . . . . . . . . . . . . . . . . . . 266
10.3.1 Paired-list queue . . . . . . . . . . . . . . . . . . . . . . . 266
10.3.2 Paired-array queue - a symmetric implementation . . . . 269
10.4 A small improvement, Balanced Queue . . . . . . . . . . . . . . . 270
10.5 One more step improvement, Real-time Queue . . . . . . . . . . 272
10.6 Lazy real-time queue . . . . . . . . . . . . . . . . . . . . . . . . . 279
10.7 Notes and short summary . . . . . . . . . . . . . . . . . . . . . . 282

11 Sequences, The last brick 285


11.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
11.2 Binary random access list . . . . . . . . . . . . . . . . . . . . . . 286
11.2.1 Review of plain-array and list . . . . . . . . . . . . . . . . 286
11.2.2 Represent sequence by trees . . . . . . . . . . . . . . . . . 286
11.2.3 Insertion to the head of the sequence . . . . . . . . . . . . 288
11.3 Numeric representation for binary random access list . . . . . . . 293
11.3.1 Imperative binary random access list . . . . . . . . . . . . 296
11.4 Imperative paired-array list . . . . . . . . . . . . . . . . . . . . . 299
11.4.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . 299
11.4.2 Insertion and appending . . . . . . . . . . . . . . . . . . . 300
11.4.3 random access . . . . . . . . . . . . . . . . . . . . . . . . 300
11.4.4 removing and balancing . . . . . . . . . . . . . . . . . . . 301
11.5 Concatenate-able list . . . . . . . . . . . . . . . . . . . . . . . . . 303
11.6 Finger tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 306
11.6.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . 307
11.6.2 Insert element to the head of sequence . . . . . . . . . . . 309
11.6.3 Remove element from the head of sequence . . . . . . . . 312

11.6.4 Handling the ill-formed finger tree when removing . . . . 313


11.6.5 append element to the tail of the sequence . . . . . . . . . 318
11.6.6 remove element from the tail of the sequence . . . . . . . 319
11.6.7 concatenate . . . . . . . . . . . . . . . . . . . . . . . . . . 320
11.6.8 Random access of finger tree . . . . . . . . . . . . . . . . 325
11.7 Notes and short summary . . . . . . . . . . . . . . . . . . . . . . 337

V Sorting and Searching 341


12 Divide and conquer, Quick sort vs. Merge sort 343
12.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343
12.2 Quick sort . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343
12.2.1 Basic version . . . . . . . . . . . . . . . . . . . . . . . . . 344
12.2.2 Strict weak ordering . . . . . . . . . . . . . . . . . . . . . 345
12.2.3 Partition . . . . . . . . . . . . . . . . . . . . . . . . . . . 346
12.2.4 Minor improvement in functional partition . . . . . . . . 349
12.3 Performance analysis for quick sort . . . . . . . . . . . . . . . . . 351
12.3.1 Average case analysis ⋆ . . . . . . . . . . . . . . . . . . . 352
12.4 Engineering Improvement . . . . . . . . . . . . . . . . . . . . . . 355
12.4.1 Engineering solution to duplicated elements . . . . . . . . 355
12.5 Engineering solution to the worst case . . . . . . . . . . . . . . . 362
12.6 Other engineering practice . . . . . . . . . . . . . . . . . . . . . . 366
12.7 Side words . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 367
12.8 Merge sort . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 368
12.8.1 Basic version . . . . . . . . . . . . . . . . . . . . . . . . . 368
12.9 In-place merge sort . . . . . . . . . . . . . . . . . . . . . . . . . . 375
12.9.1 Naive in-place merge . . . . . . . . . . . . . . . . . . . . . 376
12.9.2 in-place working area . . . . . . . . . . . . . . . . . . . . 377
12.9.3 In-place merge sort vs. linked-list merge sort . . . . . . . 381
12.10Nature merge sort . . . . . . . . . . . . . . . . . . . . . . . . . . 383
12.11Bottom-up merge sort . . . . . . . . . . . . . . . . . . . . . . . . 389
12.12Parallelism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 391
12.13Short summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . 392

13 Searching 397
13.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 397
13.2 Sequence search . . . . . . . . . . . . . . . . . . . . . . . . . . . . 397
13.2.1 Divide and conquer search . . . . . . . . . . . . . . . . . . 398
13.2.2 Information reuse . . . . . . . . . . . . . . . . . . . . . . . 418
13.3 Solution searching . . . . . . . . . . . . . . . . . . . . . . . . . . 446
13.3.1 DFS and BFS . . . . . . . . . . . . . . . . . . . . . . . . . 446
13.3.2 Search the optimal solution . . . . . . . . . . . . . . . . . 483
13.4 Short summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . 512

VI Appendix 515
Appendices

A Lists 517
A.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 517
A.2 List Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . 517
A.2.1 Empty list . . . . . . . . . . . . . . . . . . . . . . . . . . . 518
A.2.2 Access the element and the sub list . . . . . . . . . . . . . 518
A.3 Basic list manipulation . . . . . . . . . . . . . . . . . . . . . . . . 519
A.3.1 Construction . . . . . . . . . . . . . . . . . . . . . . . . . 519
A.3.2 Empty testing and length calculating . . . . . . . . . . . . 520
A.3.3 indexing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 521
A.3.4 Access the last element . . . . . . . . . . . . . . . . . . . 522
A.3.5 Reverse indexing . . . . . . . . . . . . . . . . . . . . . . . 523
A.3.6 Mutating . . . . . . . . . . . . . . . . . . . . . . . . . . . 525
A.3.7 sum and product . . . . . . . . . . . . . . . . . . . . . . . 535
A.3.8 maximum and minimum . . . . . . . . . . . . . . . . . . . 539
A.4 Transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 542
A.4.1 mapping and for-each . . . . . . . . . . . . . . . . . . . . 543
A.4.2 reverse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 549
A.5 Extract sub-lists . . . . . . . . . . . . . . . . . . . . . . . . . . . 551
A.5.1 take, drop, and split-at . . . . . . . . . . . . . . . . . . . 551
A.5.2 breaking and grouping . . . . . . . . . . . . . . . . . . . . 553
A.6 Folding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 558
A.6.1 folding from right . . . . . . . . . . . . . . . . . . . . . . . 558
A.6.2 folding from left . . . . . . . . . . . . . . . . . . . . . . . 560
A.6.3 folding in practice . . . . . . . . . . . . . . . . . . . . . . 563
A.7 Searching and matching . . . . . . . . . . . . . . . . . . . . . . . 564
A.7.1 Existence testing . . . . . . . . . . . . . . . . . . . . . . . 564
A.7.2 Looking up . . . . . . . . . . . . . . . . . . . . . . . . . . 565
A.7.3 finding and filtering . . . . . . . . . . . . . . . . . . . . . 565
A.7.4 Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . 568
A.8 zipping and unzipping . . . . . . . . . . . . . . . . . . . . . . . . 570
A.9 Notes and short summary . . . . . . . . . . . . . . . . . . . . . . 573

B The imperative red-black tree deletion algorithm 577


B.1 Doubly Black node . . . . . . . . . . . . . . . . . . . . . . . . . . 577
B.1.1 The doubly black node has a black sibling, and one of its
nephew is red. . . . . . . . . . . . . . . . . . . . . . . . . 578
B.1.2 The sibling of the doubly black node is red. . . . . . . . . 580
B.1.3 The sibling of the doubly black node, and both nephews
are black. . . . . . . . . . . . . . . . . . . . . . . . . . . . 581

C AVL tree - proofs and deletion algorithm 587


C.1 Height increment after insertion . . . . . . . . . . . . . . . . . . . 587
C.2 Proof to the balance adjustment after insertion . . . . . . . . . . 588
C.3 Deletion algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 592
C.3.1 Functional deletion . . . . . . . . . . . . . . . . . . . . . . 592
C.3.2 Imperative deletion . . . . . . . . . . . . . . . . . . . . . . 594

D Suffix Tree 599


D.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 599
D.2 Suffix trie . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 600
D.2.1 Node transfer and suffix link . . . . . . . . . . . . . . . . 601
D.2.2 On-line construction . . . . . . . . . . . . . . . . . . . . . 602
D.3 Suffix Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 606
D.3.1 On-line construction . . . . . . . . . . . . . . . . . . . . . 606
D.4 Suffix tree applications . . . . . . . . . . . . . . . . . . . . . . . . 615
D.4.1 String/Pattern searching . . . . . . . . . . . . . . . . . . . 615
D.4.2 Find the longest repeated sub-string . . . . . . . . . . . . 617
D.4.3 Find the longest common sub-string . . . . . . . . . . . . 619
D.4.4 Find the longest palindrome . . . . . . . . . . . . . . . . . 621
D.4.5 Others . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 621
D.5 Notes and short summary . . . . . . . . . . . . . . . . . . . . . . 621

GNU Free Documentation License 625


1. APPLICABILITY AND DEFINITIONS . . . . . . . . . . . . . . . 625
2. VERBATIM COPYING . . . . . . . . . . . . . . . . . . . . . . . . 627
3. COPYING IN QUANTITY . . . . . . . . . . . . . . . . . . . . . . 627
4. MODIFICATIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . 628
5. COMBINING DOCUMENTS . . . . . . . . . . . . . . . . . . . . . 629
6. COLLECTIONS OF DOCUMENTS . . . . . . . . . . . . . . . . . 630
7. AGGREGATION WITH INDEPENDENT WORKS . . . . . . . . 630
8. TRANSLATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . 630
9. TERMINATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . 631
10. FUTURE REVISIONS OF THIS LICENSE . . . . . . . . . . . . 631
11. RELICENSING . . . . . . . . . . . . . . . . . . . . . . . . . . . . 631
ADDENDUM: How to use this License for your documents . . . . . . 632
Part I

Preface


0.1 Why?
‘Are algorithms useful?’. Some programmers say that they seldom use any
serious data structures or algorithms in real work such as commercial application
development. Even when they need some of them, they have already been
provided by libraries. For example, the C++ standard template library (STL)
provides sort and selection algorithms as well as the vector, queue, and set data
structures. It seems that knowing about how to use the library as a tool is quite
enough.
Instead of answering this question directly, I would like to say that algorithms
and data structures are critical for solving ‘interesting problems’, whether or
not those problems themselves are of practical use.
Let’s start with two problems that look like they can be solved in a brute-force
way even by a fresh programmer.

0.2 The smallest free ID problem, the power of algorithms
This problem is discussed in Chapter 1 of Richard Bird’s book [1]. It’s common
for applications and systems to use IDs (identifiers) to manage objects and
entities. At any time, some IDs are in use, while others are available. When
some client tries to acquire a new ID, we want to always allocate it the smallest
available one. Suppose IDs are non-negative integers and all IDs in use are kept
in a list (or an array) which is not ordered. For example:

[18, 4, 8, 9, 16, 1, 14, 7, 19, 3, 0, 5, 2, 11, 6]

How can you find the smallest free ID, which is 10, from the list?
It seems the solution is quite easy even without any serious algorithms.
1: function Min-Free(A)
2:   x ← 0
3:   loop
4:     if x ∉ A then
5:       return x
6:     else
7:       x ← x + 1

Where the ∉ test is realized like below.

1: function ∉(x, X)
2:   for i ← 1 to |X| do
3:     if x = X[i] then
4:       return False
5:   return True
Some languages provide handy tools which wrap this linear time process. For
example in Python, this algorithm can be directly translated as the following.
def brute_force(lst):
    i = 0
    while True:
        if i not in lst:
            return i
        i = i + 1
It seems this problem is trivial. However, there will be millions of IDs in a
large system. This solution performs poorly in such cases, for it takes O(n²)
time, where n is the length of the ID list. On my computer (2 cores, 2.10 GHz,
with 2G RAM), a C program using this solution takes an average of 5.4 seconds
to search for the minimum free number among 100,000 IDs¹, and it takes more
than 8 minutes to handle a million numbers.

0.2.1 Improvement 1
The key idea to improve the solution is based on the fact that for a series of n
numbers x₁, x₂, ..., xₙ, if there are free numbers, some of the xᵢ are outside the
range [0, n); otherwise the list is exactly a permutation of 0, 1, ..., n − 1 and n
should be returned as the minimum free number. We have the following fact.

minfree(x₁, x₂, ..., xₙ) ≤ n        (1)


One solution is to use an array of n + 1 flags to mark whether a number in
range [0, n] is free.
1: function Min-Free(A)
2:   F ← [False, False, ..., False] where |F| = n + 1
3:   for ∀x ∈ A do
4:     if x < n then
5:       F[x] ← True
6:   for i ← [0, n] do
7:     if F[i] = False then
8:       return i
Line 2 initializes a flag array with all False values. This takes O(n) time. Then
the algorithm scans all numbers in A and sets the corresponding flag to True if
the value is less than n. This step also takes O(n) time. Finally, the algorithm
performs a linear time search to find the first flag with a False value. So the
total performance of this algorithm is O(n). Note that we use n + 1 flags instead
of n flags to cover the special case that sorted(A) = [0, 1, 2, ..., n − 1].
Although the algorithm only takes O(n) time, it needs extra O(n) space to
store the flags.
This solution is much faster than the brute force one. On my computer,
the corresponding Python program takes an average of 0.02 seconds when dealing
with 100,000 numbers.
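
For reference, here is a minimal functional sketch of the same flag-array idea
(this is not the Python program mentioned above; it assumes Data.Array and
the name minFreeFlags is only illustrative):

import Data.Array

-- Mark every value below n in a flag array, then return the first
-- index in [0, n] whose flag is still False.
minFreeFlags :: [Int] -> Int
minFreeFlags xs = head [i | (i, used) <- assocs flags, not used]
  where
    n     = length xs
    flags = accumArray (||) False (0, n) [(x, True) | x <- xs, x < n]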
We haven’t fine tuned this algorithm yet. Observe that each time we have
to allocate memory to create an array of n + 1 flags, and release the memory
when finished. Memory allocation and release are expensive operations that cost
us a lot of processing time.
There are two ways in which we can improve on this solution. One is to
allocate the flags array in advance and reuse it for all the calls of our function to
find the smallest free number. The other is to use bit-wise flags instead of a flag
array. The following is the C program based on these two minor improvements.
¹ All programs can be downloaded along with this series of posts.

#define N 1000000    // 1 million
#define WORD_LENGTH (sizeof(int) * 8)

void setbit(unsigned int* bits, unsigned int i) {
    bits[i / WORD_LENGTH] |= 1 << (i % WORD_LENGTH);
}

int testbit(unsigned int* bits, unsigned int i) {
    return bits[i / WORD_LENGTH] & (1 << (i % WORD_LENGTH));
}

unsigned int bits[N / WORD_LENGTH + 1];

int min_free(int* xs, int n) {
    int i, len = N / WORD_LENGTH + 1;
    for (i = 0; i < len; ++i)
        bits[i] = 0;
    for (i = 0; i < n; ++i)
        if (xs[i] < n)
            setbit(bits, xs[i]);
    for (i = 0; i <= n; ++i)
        if (!testbit(bits, i))
            return i;
}
This C program can handle 1,000,000 (1 million) IDs in just 0.023 seconds
on my computer.
The last for-loop can be further improved as seen below, but this is just minor
fine-tuning.
for (i = 0; ; ++i)
    if (~bits[i] != 0)
        for (j = 0; ; ++j)
            if (!testbit(bits, i * WORD_LENGTH + j))
                return i * WORD_LENGTH + j;

0.2.2 Improvement 2, Divide and Conquer


Although the above improvement is much faster, it costs O(n) extra space to
keep a check list. If n is a huge number, this means a huge amount of space is
wasted.
The typical divide and conquer strategy is to break the problem into some
smaller ones, and solve these to get the final answer.
We can put all numbers xi ≤ ⌊n/2⌋ as a sub-list A′ and put all the others
as a second sub-list A′′ . Based on formula 1 if the length of A′ is exactly ⌊n/2⌋,
this means the first half of numbers are ‘full’, which indicates that the minimum
free number must be in A′′ and so we’ll need to recursively seek in the shorter
list A′′ . Otherwise, it means the minimum free number is located in A′ , which
again leads to a smaller problem.
When we search for the minimum free number in A′′, the conditions change
a little bit: we are not searching for the smallest free number starting from 0,
but actually from ⌊n/2⌋ + 1 as the lower bound. So the algorithm becomes
minfree(A, l, u), where l is the lower bound and u is the upper bound index of
the element.
Note that there is a trivial case: if the number list is empty, we merely
return the lower bound as the result.
This divide and conquer solution can be formally expressed as a function:

minfree(A) = search(A, 0, |A| − 1)

search(A, l, u) = l                        : A = ϕ
                  search(A′′, m + 1, u)    : |A′| = m − l + 1
                  search(A′, l, m)         : otherwise

where

m = ⌊(l + u)/2⌋
A′ = {∀x ∈ A ∧ x ≤ m}
A′′ = {∀x ∈ A ∧ x > m}
It is obvious that this algorithm doesn’t need any extra space (a procedural
programmer may note that it actually takes O(lg n) stack space for bookkeeping;
as we’ll see later, this can be eliminated either by tail recursion optimization,
for instance gcc -O2, or by manually changing the recursion to iteration). Each
call performs O(|A|) comparisons to build A′ and A′′. After that the problem
scale halves, so the time needed for this algorithm is T(n) = T(n/2) + O(n),
which reduces to O(n). Another way to analyze the performance is by observing
that the first call takes O(n) to build A′ and A′′, the second call takes O(n/2),
O(n/4) for the third, and so on. The total time is
O(n + n/2 + n/4 + ...) = O(2n) = O(n).
In functional programming languages such as Haskell, partitioning a list is
already provided in the standard library, and this algorithm can be translated
as follows.
import Data.List

minFree xs = bsearch xs 0 (length xs - 1)

bsearch xs l u | xs == []               = l
               | length as == m - l + 1 = bsearch bs (m+1) u
               | otherwise              = bsearch as l m
    where
        m = (l + u) `div` 2
        (as, bs) = partition (<= m) xs

0.2.3 Expressiveness vs. Performance


Imperative language programmers may be concerned about the performance of
this kind of implementation. For instance, in this minimum free ID problem, the
number of recursive calls is in O(lg n), which means the stack space consumed
is also in O(lg n). It’s not free in terms of space. But if we want to avoid that,
we can eliminate the recursion by replacing it with an iteration (most functional
languages do this automatically, since our function is in tail recursive form,
which lends itself perfectly to this transformation). This yields the following
C program.

int min_free(int* xs, int n){
    int l = 0;
    int u = n - 1;
    while(n){
        int m = (l + u) / 2;
        int right, left = 0;
        for(right = 0; right < n; ++right)
            if(xs[right] <= m){
                swap(xs[left], xs[right]);
                ++left;
            }
        if(left == m - l + 1){
            xs = xs + left;
            n = n - left;
            l = m + 1;
        }
        else{
            n = left;
            u = m;
        }
    }
    return l;
}

This program uses a ‘quick-sort’ like approach to re-arrange the array so that
all the elements before left are less than or equal to m, while those between
left and right are greater than m. This is shown in figure 1.

Figure 1: Divide the array: all x[i] ≤ m where 0 ≤ i < left, and all x[i] > m
where left ≤ i < right. The remaining elements are unknown.

This program is fast and it doesn’t need extra stack space. However, compared
to the previous Haskell program, it is harder to read and less expressive. We
have to balance performance and expressiveness.



0.3 The number puzzle, power of data structure


If the first problem, to find the minimum free number, is somewhat useful in
practice, this problem is a ‘pure’ one for fun. The puzzle is to find the 1,500th
number which only contains the factors 2, 3 or 5. The numbers 2, 3, and 5 are
of course themselves valid. Number 60 = 2²3¹5¹ is valid, however it is the 25th
such number. Number 21 = 2⁰3¹7¹ isn’t a valid number because it contains a
factor 7. The first 10 such numbers are listed as follows:
2, 3, 4, 5, 6, 8, 9, 10, 12, 15
If we consider 1 = 2⁰3⁰5⁰, then 1 is also a valid number, and it is the first
one.

0.3.1 The brute-force solution


It seems the solution is quite easy without needing any serious algorithms: we
can check numbers one by one starting from 1, extracting all factors of 2, 3 and
5 to see whether the remaining part equals 1.
1: function Get-Number(n)
2: x←1
3: i←0
4: loop
5: if Valid?(x) then
6: i←i+1
7: if i = n then
8: return x
9: x←x+1

10: function Valid?(x)


11: while x mod 2 = 0 do
12: x ← x/2
13: while x mod 3 = 0 do
14: x ← x/3
15: while x mod 5 = 0 do
16: x ← x/5
17: if x = 1 then
18: return T rue
19: else
20: return F alse
This ‘brute-force’ algorithm works for small n. However, to find the
1500th number (which is 859963392), the C program based on this algorithm
takes 40.39 seconds on my computer. I had to kill the program after 10 minutes
when I increased n to 15,000.
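
For completeness, a direct translation of this brute-force idea can be written in
a few lines of Haskell (a sketch only; the C program mentioned above is not
listed in the text, and the names valid and getNumber are illustrative):

-- Keep dividing out the factors 2, 3 and 5; the number is valid
-- when nothing but 1 remains.
valid :: Integer -> Bool
valid x = strip 5 (strip 3 (strip 2 x)) == 1
  where strip d n | n `mod` d == 0 = strip d (n `div` d)
                  | otherwise      = n

-- The n-th valid number (counting 1 as the first), checked one by one.
-- As noted above, this is far too slow for n = 1500.
getNumber :: Int -> Integer
getNumber n = [x | x <- [1..], valid x] !! (n - 1)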

0.3.2 Improvement 1
Analysis of the above algorithm shows that modulo and division calculations
are very expensive [2], and they are executed many times in loops. Instead of
checking whether a number contains only 2, 3, or 5 as factors, an alternative
solution is to construct such numbers from these factors.

We start from 1 and multiply it by 2, 3, or 5 to generate the rest of the numbers.
The problem becomes how to generate the candidate numbers in order. One
handy way is to utilize the queue data structure.
A queue allows us to push elements at one end and pop them at the other
end, so the element pushed first is also popped first. This property is called
FIFO (First-In-First-Out).
The idea is to push 1 as the only element to the queue, then we pop an
element, multiply it by 2, 3, and 5, to get 3 new elements. We then push them
back to the queue in order. Note that a new element may already exist in the
queue; in such a case, we just drop it. A new element may also be smaller than
others in the queue, so we must insert it at the correct position. Figure 2
illustrates this idea.

Figure 2: First 4 steps of constructing numbers with a queue.


1. Queue is initialized with 1 as the only element;
2. New elements 2, 3, and 5 are pushed back;
3. New elements 4, 6, and 10, are pushed back in order;
4. New elements 9 and 15 are pushed back, element 6 already exists.

This algorithm is shown as the following.


1: function Get-Number(n)
2: Q ← N IL
3: Enqueue(Q, 1)
4: while n > 0 do
5: x ← Dequeue(Q)
6: Unique-Enqueue(Q, 2x)
7: Unique-Enqueue(Q, 3x)
8: Unique-Enqueue(Q, 5x)
9: n←n−1
10: return x

11: function Unique-Enqueue(Q, x)


12: i←0
13: while i < |Q| ∧ Q[i] < x do
14: i←i+1
15: if i < |Q| ∧ x = Q[i] then
16: return
17: Insert(Q, i, x)

The insert function takes O(|Q|) time to find the proper position and insert
the element. If the element already exists, it just returns.
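
To make this ordered, duplicate-free insertion concrete, here is a small Haskell
sketch of the same idea on a plain list (an illustration only; the pseudo code
above targets an array-backed queue, and the name uniqueEnqueue is ours):

-- Insert x into an already sorted queue, skipping duplicates.
uniqueEnqueue :: Ord a => a -> [a] -> [a]
uniqueEnqueue x []                   = [x]
uniqueEnqueue x q@(y:ys) | x < y     = x : q
                         | x == y    = q
                         | otherwise = y : uniqueEnqueue x ys

For example, inserting the elements 2, 3, 5, 6, 6, 4 one by one produces the
sorted, duplicate-free queue [2,3,4,5,6].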
A rough estimation tells us that the length of the queue increases in proportion
to n (each time we extract one element and push at most 3 new ones, so the
growth ratio is ≤ 2), and the total running time is
O(1 + 2 + 3 + ... + n) = O(n²).
Figure 3 shows the number of queue accesses against n. It is a quadratic curve,
which reflects the O(n²) performance.

Figure 3: Queue access count v.s. n.

The C program based on this algorithm takes only 0.016s to get the right
answer 859963392, which is about 2,500 times faster than the brute force solution.
Improvement 1 can also be considered in a recursive way. Suppose X is the
infinite series of all numbers which only contain factors of 2, 3, or 5. The
following formula shows an interesting relationship.

X = {1} ∪ {2x : ∀x ∈ X} ∪ {3x : ∀x ∈ X} ∪ {5x : ∀x ∈ X} (2)


Where we define ∪ in a special form so that all elements are stored in order as
well as unique to each other. Suppose that X = {x₁, x₂, x₃, ...},
Y = {y₁, y₂, y₃, ...}, X′ = {x₂, x₃, ...} and Y′ = {y₂, y₃, ...}. We have

X ∪ Y = X                  : Y = ϕ
        Y                  : X = ϕ
        {x₁, X′ ∪ Y}       : x₁ < y₁
        {x₁, X′ ∪ Y′}      : x₁ = y₁
        {y₁, X ∪ Y′}       : x₁ > y₁
In a functional programming language such as Haskell, which supports lazy
evaluation, this infinite series definition can be translated into the following
program.
ns = 1 : merge (map (*2) ns) (merge (map (*3) ns) (map (*5) ns))

merge [] l = l
merge l [] = l
merge (x:xs) (y:ys) | x < y     = x : merge xs (y:ys)
                    | x == y    = x : merge xs ys
                    | otherwise = y : merge (x:xs) ys

By evaluating ns !! (n-1), we can get the 1500th number as below.

>ns !! (1500-1)
859963392

0.3.3 Improvement 2
Although the above solution is much faster than the brute-force one, it still has
some drawbacks. First, it produces many duplicated numbers, which are
eventually dropped when examining the queue. Secondly, it does a linear scan
and insertion to keep the elements of the queue ordered, which degrades the
ENQUEUE operation from O(1) to O(|Q|).
If we use three queues instead of only one, we can improve the solution a step
further. Denote these queues as Q2, Q3, and Q5, and initialize them as
Q2 = {2}, Q3 = {3} and Q5 = {5}. Each time, we DEQUEUE the smallest
element x from among the heads of Q2, Q3, and Q5, and do the following test:

• If x comes from Q2, we ENQUEUE 2x, 3x, and 5x back to Q2, Q3, and
Q5 respectively;
• If x comes from Q3, we only need to ENQUEUE 3x to Q3 and 5x to Q5.
We needn’t ENQUEUE 2x to Q2, because 2x already exists in Q3;
• If x comes from Q5, we only need to ENQUEUE 5x to Q5; there is no need
to ENQUEUE 2x or 3x to Q2 or Q3, because they are already in the
queues;

We repeat this process, extracting the smallest head each time, until we reach
the n-th element. The algorithm based on this idea is implemented as below.
1: function Get-Number(n)
2: if n = 1 then
3: return 1
4: else
5: Q2 ← {2}
6: Q3 ← {3}
7: Q5 ← {5}
8: while n > 1 do
9: x ← min(Head(Q2 ), Head(Q3 ), Head(Q5 ))
10: if x = Head(Q2 ) then
11: Dequeue(Q2 )
12: Enqueue(Q2 , 2x)
13: Enqueue(Q3 , 3x)
14: Enqueue(Q5 , 5x)
15: else if x = Head(Q3 ) then
16: Dequeue(Q3 )
17: Enqueue(Q3 , 3x)
18: Enqueue(Q5 , 5x)
19: else
20: Dequeue(Q5 )
21: Enqueue(Q5 , 5x)
22: n←n−1
23: return x

Figure 4: First 4 steps of constructing numbers with Q2, Q3, and Q5.


1. Queues are initialized with 2, 3, 5 as the only element;
2. New elements 4, 6, and 10 are pushed back;
3. New elements 9, and 15, are pushed back;
4. New elements 8, 12, and 20 are pushed back;
5. New element 25 is pushed back.

This algorithm loops n times. Within each loop, it extracts one head element
from the three queues, which takes constant time. Then it appends one to three
new elements at the end of the queues, which is also bounded by constant time.
So the total time of the algorithm is bound to O(n). The C++ program shown
below, translated from this algorithm, takes less than 1 µs to produce the
1500th number, 859963392.

typedef unsigned long Integer;

Integer get_number(int n){
    if(n == 1)
        return 1;
    queue<Integer> Q2, Q3, Q5;
    Q2.push(2);
    Q3.push(3);
    Q5.push(5);
    Integer x;
    while(n-- > 1){
        x = min(min(Q2.front(), Q3.front()), Q5.front());
        if(x == Q2.front()){
            Q2.pop();
            Q2.push(x*2);
            Q3.push(x*3);
            Q5.push(x*5);
        }
        else if(x == Q3.front()){
            Q3.pop();
            Q3.push(x*3);
            Q5.push(x*5);
        }
        else{
            Q5.pop();
            Q5.push(x*5);
        }
    }
    return x;
}

This solution can also be implemented in a functional way. We define a function
take(n), which returns the first n numbers containing only the factors 2, 3,
or 5.

take(n) = f(n, {1}, {2}, {3}, {5})

Where

f(n, X, Q2, Q3, Q5) = X                                    : n = 1
                      f(n − 1, X ∪ {x}, Q2′, Q3′, Q5′)     : otherwise

x = min(head(Q2), head(Q3), head(Q5))

(Q2′, Q3′, Q5′) = (tail(Q2) ∪ {2x}, Q3 ∪ {3x}, Q5 ∪ {5x})  : x = head(Q2)
                  (Q2, tail(Q3) ∪ {3x}, Q5 ∪ {5x})         : x = head(Q3)
                  (Q2, Q3, tail(Q5) ∪ {5x})                : x = head(Q5)
And this functional definition can be realized in Haskell as follows.
ks 1 xs _ = xs
ks n xs (q2, q3, q5) = ks (n-1) (xs ++ [x]) update
  where
    x = minimum $ map head [q2, q3, q5]
    update | x == head q2 = ((tail q2) ++ [x*2], q3 ++ [x*3], q5 ++ [x*5])
           | x == head q3 = (q2, (tail q3) ++ [x*3], q5 ++ [x*5])
           | otherwise    = (q2, q3, (tail q5) ++ [x*5])

takeN n = ks n [1] ([2], [3], [5])

Invoking ‘last (takeN 1500)’ will generate the correct answer 859963392.

0.4 Notes and short summary


Reviewing the two puzzles, we find that in both cases the brute-force solutions
are weak. In the first problem, the brute-force approach deals quite poorly with
long ID lists, while in the second problem, it doesn’t work at all.
The first problem shows the power of algorithms, while the second problem
tells us why data structures are important. There are plenty of interesting
problems which were hard to solve before computers were invented. With the
aid of computers and programming, we are able to find answers in quite a
different way.

Compared to what we learned in mathematics courses at school, we were never
taught methods like these.
While there are already a lot of wonderful books about algorithms, data
structures and math, few of them provide a comparison between the procedural
solution and the functional solution. From the above discussion, it can be seen
that a functional solution is sometimes very expressive, and often close to what
we are familiar with from mathematics.
This series of posts focuses on providing both imperative and functional
algorithms and data structures. Many of the functional data structures can be
found in Okasaki’s book [6], while the imperative ones can be found in classic
textbooks [2] or even on Wikipedia. Multiple programming languages, including
C, C++, Python, Haskell, and Scheme/Lisp, will be used. In order to make the
material easy to read by programmers with different backgrounds, pseudo code
and mathematical functions are the regular descriptions in each post.
The author is NOT a native English speaker; the reason why this book is only
available in English for the time being is that the contents are still changing
frequently. Any feedback, comments, or criticism is welcome.

0.5 Structure of the contents


In the following series of posts, I’ll first introduce elementary data structures
before algorithms, because many algorithms need knowledge of data structures
as a prerequisite.
The ‘hello world’ data structure, the binary search tree, is the first topic. Then
we introduce how to solve the balance problem of binary search trees. After
that, I’ll show other interesting trees: tries and prefix trees are useful in text
manipulation, while B-trees are commonly used in file systems and database
implementations.
The second part on data structures is about heaps. We’ll provide a general heap
definition and introduce binary heaps by array and by explicit binary trees.
Then we’ll extend to k-ary heaps, including binomial heaps, Fibonacci heaps,
and pairing heaps.
Arrays and queues are typically considered among the easiest data structures;
however, we’ll show how difficult they are to implement in the third part.
As for elementary sorting algorithms, we’ll introduce insertion sort, quick sort,
merge sort, etc., in both imperative and functional ways.
The final part is about searching; besides element searching, we’ll also show
string matching algorithms such as KMP.
Bibliography

[1] Richard Bird. “Pearls of Functional Algorithm Design”. Cambridge University
    Press; 1st edition (November 1, 2010). ISBN-10: 0521513383

[2] Jon Bentley. “Programming Pearls (2nd Edition)”. Addison-Wesley
    Professional; 2nd edition (October 7, 1999). ISBN-13: 978-0201657883

[3] Chris Okasaki. “Purely Functional Data Structures”. Cambridge University
    Press (July 1, 1999). ISBN-13: 978-0521663502

[4] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford
    Stein. “Introduction to Algorithms, Second Edition”. The MIT Press, 2001.
    ISBN: 0262032937

Part II

Trees

Chapter 1

Binary search tree, the ‘hello world’ data structure

1.1 Introduction
Arrays or lists are typically considered the ‘hello world’ data structures. How-
ever, we’ll see they are not actually particularly easy to implement. In some
procedural settings, arrays are the most elementary data structures, and it is
possible to implement linked lists using arrays (see section 10.3 in [2]). On the
other hand, in some functional settings, linked lists are the elementary building
blocks used to create arrays and other data structures.
Considering these factors, we start with Binary Search Trees (or BST) as the
‘hello world’ data structure using an interesting problem Jon Bentley mentioned
in ‘Programming Pearls’ [2]. The problem is to count the number of times each
word occurs in a large text. One solution in C++ is below:
int main(int, char**){
    map<string, int> dict;
    string s;
    while(cin >> s)
        ++dict[s];
    map<string, int>::iterator it = dict.begin();
    for(; it != dict.end(); ++it)
        cout << it->first << ": " << it->second << "\n";
}

And we can run it to produce the result using the following UNIX commands
(this isn’t UNIX-specific; on Windows the same can be achieved with
type bbe.txt | wordcount.exe > wc.txt):

$ g++ wordcount.cpp -o wordcount


$ cat bbe.txt | ./wordcount > wc.txt

The map provided in the standard template library is a kind of balanced
BST with augmented data. Here we use the words in the text as the keys and
the number of occurrences as the augmented data. This program is fast, and
it reflects the power of BSTs. We’ll introduce how to implement BSTs in this
section and show how to balance them in a later section.
Before we dive into BSTs, let’s first introduce the more general binary tree.
Binary trees are recursively defined. BSTs are just one type of binary tree.
A binary tree is usually defined in the following way.
A binary tree is

• either an empty node;

• or a node containing 3 parts: a value, a left child which is a binary tree,
and a right child which is also a binary tree.

Figure 1.1 shows this concept and an example binary tree.

Figure 1.1: Binary tree concept and an example. (a) Concept of a binary tree;
(b) an example binary tree.

A BST is a binary tree where the following applies to each node:

• all the values in left child tree are less than the value of this node;

• the value of this node is less than any values in its right child tree.

Figure 1.2 shows an example of a BST. Comparing with Figure 1.1, we can
see the differences in how keys are ordered between them.

1.2 Data Layout


Based on the recursive definition of BSTs, we can draw the data layout in a
procedural setting with pointers as in Figure 1.3.

Figure 1.2: An example of a BST.

The node first contains a field for the key, which can be augmented with
satellite data. The next two fields contain pointers to the left and right children,
respectively. To make backtracking to ancestors easy, a parent field is sometimes
provided as well.
In this section, we’ll ignore the satellite data for the sake of simplifying
the illustrations. Based on this layout, the node of BST can be defined in a
procedural language, such as C++:
template<class T>
struct node{
node(T x):key(x), left(0), right(0), parent(0){}
~node(){
delete left;
delete right;
}

node∗ left;
node∗ right;
node∗ parent; //Optional, it's helpful for succ and pred
T key;
};

There is another setting: for instance, in Scheme/Lisp languages, the
elementary data structure is a linked list. Figure 1.4 shows how a BST node can be

tary data structure is a linked list. Figure 1.4 shows how a BST node can be
built on top of linked list.
In more functional settings, it’s hard to use pointers for backtracking (and
typically, there is no need for backtracking, since there are usually top-down
recursive solutions), and so the ‘parent’ field has been omitted in that layout.
To simplify things, we’ll skip the detailed layouts in the future and only focus
on the logic layouts of data structures. For example, below is the definition of

Figure 1.3: Layout of nodes with parent field.

Figure 1.4: Binary search tree node layout on top of a linked list, where ‘left...’
and ‘right...’ are either empty or BST nodes composed in the same way.

a BST node in Haskell:


data Tree a = Empty
| Node (Tree a) a (Tree a)

1.3 Insertion
To insert a key k (sometimes along with a value in practice) to a BST T , we
can use the following algorithm:

• If the tree is empty, construct a leaf node with key = k;


• If k is less than the key of root node, insert it in the left child;
• If k is greater than the key of root node, insert it in the right child.

The exception to the above is when k is equal to the key of the root node,
meaning it already exists in the BST, and we can either overwrite the data, or
just do nothing. To simplify things, this case has been skipped in this section.
This algorithm is described recursively. Its simplicity is why we consider
the BST structure the ‘hello world’ data structure. Formally, the algorithm can
be represented with a recursive mathematical function:

insert(T, k) = node(ϕ, k, ϕ)                     : T = ϕ
               node(insert(Tl, k), k′, Tr)       : k < k′          (1.1)
               node(Tl, k′, insert(Tr, k))       : otherwise
Where Tl is the left child, Tr is the right child, and k ′ is the key when T
isn’t empty.
The node function creates a new node given the left subtree, a key and a
right subtree as parameters. ϕ means NIL or empty.
Translating the above functions directly to Haskell yields the following pro-
gram:
insert Empty k = Node Empty k Empty
insert (Node l x r) k | k < x     = Node (insert l k) x r
                      | otherwise = Node l x (insert r k)
This program utilizes the pattern matching features provided by the language.
However, even in functional settings without this feature (e.g. Scheme/Lisp),
the program is still expressive:
(define (insert tree x)
(cond ((null? tree) (list '() x '()))
((< x (key tree))
(make-tree (insert (left tree) x)
(key tree)
(right tree)))
((> x (key tree))
(make-tree (left tree)
(key tree)
(insert (right tree) x)))))
This algorithm can be expressed imperatively using iteration, completely
free of recursion:

1: function Insert(T, k)
2: root ← T
3: x ← Create-Leaf(k)
4: parent ← N IL
5: while T ̸= N IL do
6: parent ← T
7: if k < Key(T ) then
8: T ← Left(T )
9: else
10: T ← Right(T )
11: Parent(x) ← parent
12: if parent = N IL then ▷ tree T is empty
13: return x
14: else if k < Key(parent) then
15: Left(parent) ← x
16: else
17: Right(parent) ← x
18: return root

19: function Create-Leaf(k)


20: x ← Empty-Node
21: Key(x) ← k
22: Left(x) ← N IL
23: Right(x) ← N IL
24: Parent(x) ← N IL
25: return x
While more complex than the functional algorithm, it is still fast, even when
presented with very deep trees. Complete C++ and python programs are avail-
able along with this section for reference.

1.4 Traversing
Traversing means visiting every element one-by-one in a BST. There are 3 ways
to traverse a binary tree: a pre-order tree walk, an in-order tree walk and a
post-order tree walk. The names of these traversal methods highlight the order
in which we visit the root of a BST.

• pre-order traversal: visit the key, then the left child, finally the right child;

• in-order traversal: visit the left child, then the key, finally the right child;

• post-order traversal: visit the left child, then the right child, finally the
key.

Note that each ‘visiting’ operation is recursive. As mentioned before, we see


that the order in which the key is visited determines the name of the traversal
method.
For the BST shown in figure 1.2, below are the three different traversal
results.

• pre-order traversal results: 4, 3, 1, 2, 8, 7, 16, 10, 9, 14;

• in-order traversal results: 1, 2, 3, 4, 7, 8, 9, 10, 14, 16;

• post-order traversal results: 2, 1, 3, 7, 9, 14, 10, 16, 8, 4.

The in-order walk of a BST outputs the elements in increasing order. The
definition of a BST ensures this interesting property, while the proof of this fact
is left as an exercise to the reader.
The in-order tree walk algorithm can be described as:

• If the tree is empty, just return;

• traverse the left child by in-order walk, then access the key, finally traverse
the right child by in-order walk.

Translating the above description yields a generic map function:


map(f, T) = ϕ                     : T = ϕ
            node(Tl′, k′, Tr′)    : otherwise          (1.2)

Where

Tl′ = map(f, Tl)
Tr′ = map(f, Tr)
k′ = f(k)

And Tl , Tr and k are the children and key when the tree isn’t empty.
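
A direct Haskell rendering of this definition is only a few lines (a sketch; the
function is named mapT here, rather than map, to avoid clashing with the
Prelude):

-- Apply f to every key while keeping the shape of the tree.
mapT :: (a -> b) -> Tree a -> Tree b
mapT _ Empty        = Empty
mapT f (Node l k r) = Node (mapT f l) (f k) (mapT f r)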
If we only need to access the keys without creating a transformed tree, we can
realize this algorithm in a procedural way like the below C++ program.
template<class T, class F>
void in_order_walk(node<T>∗ t, F f){
if(t){
in_order_walk(t→left, f);
f(t→value);
in_order_walk(t→right, f);
}
}

The function takes a parameter f, which can be a real function or a function
object; the program applies f to each node during the in-order tree walk.
We can simplify this algorithm one step further to define a function which
turns a BST into a sorted list by in-order traversal.

toList(T) = ϕ                                    : T = ϕ
            toList(Tl) ∪ {k} ∪ toList(Tr)        : otherwise          (1.3)

Below is the Haskell program based on this definition.


toList Empty = []
toList (Node l x r) = toList l ++ [x] ++ toList r

This provides us with a method to sort a list of elements: we can first build a
BST from the list, then output the tree by in-order traversal. This method is
called ‘tree sort’. Let’s denote the list X = {x₁, x₂, x₃, ..., xₙ}.

sort(X) = toList(fromList(X))          (1.4)

And we can write it in function composition form.

sort = toList . fromList

Where the function fromList repeatedly inserts every element into an empty
BST.

fromList(X) = foldL(insert, ϕ, X)          (1.5)

It can also be written in partial application form² like below.

fromList = foldL insert ϕ

For readers who are not familiar with folding from the left, this function can
also be defined recursively as follows.

fromList(X) = ϕ                                            : X = ϕ
              insert(fromList({x₂, x₃, ..., xₙ}), x₁)       : otherwise

We’ll make intensive use of folding, as well as function composition and partial
evaluation, later on; please refer to the appendix of this book or to [6], [7]
and [8] for more information.
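
Putting the pieces together, a minimal Haskell sketch of tree sort could read as
follows (assuming the insert and toList defined earlier; foldl plays the role of
foldL, and the name treeSort avoids clashing with the standard sort):

-- Build a BST by folding insert over the input, then flatten it in order.
fromList :: Ord a => [a] -> Tree a
fromList = foldl insert Empty

treeSort :: Ord a => [a] -> [a]
treeSort = toList . fromList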

Exercise 1.1

• Given the in-order traversal result and pre-order traversal result, can you
re-construct the tree from these results and figure out the post-order traversal
result?

– Pre-order result: 1, 2, 4, 3, 5, 6;
– In-order result: 4, 2, 1, 5, 3, 6;
– Post-order result: ?

• Write a program in your favorite language to re-construct the binary tree
from the pre-order result and in-order result.

• Prove that the in-order walk outputs the elements stored in a binary search
tree in increasing order.

• Can you analyze the performance of tree sort with big-O notation?

² Also known as ‘Curried form’, in memory of the mathematician and logician
Haskell Curry.

1.5 Querying a binary search tree


There are three types of queries for a binary search tree: searching for a key in
the tree, finding the minimum or maximum element in the tree, and finding the
predecessor or successor of an element in the tree.

1.5.1 Looking up
According to the definition of the binary search tree, searching for a key in a
tree can be realized as follows.

• If the tree is empty, the search fails;

• If the key of the root is equal to the value to be found, the search succeeds,
and the root is returned as the result;
• If the value is less than the key of the root, search in the left child;
• Otherwise, which means the value is greater than the key of the root, search
in the right child.

This algorithm can be described with a recursive function as below.

lookup(T, x) = ϕ                  : T = ϕ
               T                  : k = x
               lookup(Tl, x)      : x < k          (1.6)
               lookup(Tr, x)      : otherwise
Where Tl , Tr and k are the children and key when T isn’t empty. In the real
application, we may return the satellite data instead of the node as the search
result. This algorithm is simple and straightforward. Here is a translation into
a Haskell program.
lookup Empty _ = Empty
lookup t@(Node l k r) x | k == x    = t
                        | x < k     = lookup l x
                        | otherwise = lookup r x
If the BST is well balanced, which means that almost all nodes have both
non-NIL left and right children, the search algorithm takes O(lg n) time for n
elements. This is not a formal definition of balance; we’ll give one in a later
post about red-black trees. If the tree is poorly balanced, the worst case takes
O(n) time to search for a key. If we denote the height of the tree as h, we can
express the performance of the algorithm uniformly as O(h).
The search algorithm can also be realized without using recursion in a pro-
cedural manner.
1: function Search(T, x)
2: while T ̸= N IL∧ Key(T ) ̸= x do
3: if x < Key(T ) then
4: T ← Left(T )
5: else
6: T ← Right(T )
7: return T

Below is the C++ program based on this algorithm.


template<class T>
node<T>∗ search(node<T>∗ t, T x){
while(t && t→key!=x){
if(x < t→key) t=t→left;
else t=t→right;
}
return t;
}

1.5.2 Minimum and maximum


Minimum and maximum can be implemented using the property of the binary
search tree: lesser keys are always in the left child, and greater keys are in the
right. For the minimum, we keep traversing the left sub-tree until it is empty,
while for the maximum, we traverse the right.
{
k : Tl = ϕ
min(T ) = (1.7)
min(Tl ) : otherwise
{
k : Tr = ϕ
max(T ) = (1.8)
max(Tr ) : otherwise
Both functions bound to O(h) time, where h is the height of the tree. For
the balanced BST, min/max are bound to O(lg n) time, while they are O(n) in
the worst cases.
We skip translating them to programs, It’s also possible to implement them
in pure procedural way without using recursion.
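Nevertheless, a direct Haskell rendering is only a few lines. The sketch below assumes the
same Tree data type (Node left key right) used earlier in this chapter; both functions are
intentionally partial, as the minimum and maximum of an empty tree are undefined.

treeMin, treeMax :: Tree a -> a
treeMin (Node Empty k _) = k          -- no left child: the key is the minimum
treeMin (Node l _ _)     = treeMin l  -- otherwise keep going left
treeMax (Node _ k Empty) = k          -- no right child: the key is the maximum
treeMax (Node _ _ r)     = treeMax r  -- otherwise keep going right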

1.5.3 Successor and predecessor


The last kind of query is to find the successor or predecessor of an element. It
is useful when a tree is treated as a generic container and traversed with an iterator.
We need to access the parent of a node to make the implementation simple.
It seems hard to find a functional solution, because there is no pointer-like
field linking to the parent node³. One solution is to leave 'breadcrumbs' when we
visit the tree, and use this information to back-track or even re-construct the
whole tree. Such a data structure, which contains both the tree and the 'breadcrumbs',
is called a zipper. Please refer to [9] for details.
However, if we consider the original purpose of providing succ/pred functions,
'to traverse all the BST elements one by one' as a generic container, we realize
that they don't make much sense in functional settings, because we can
traverse the tree in increasing order with the map function we defined previously.
We'll meet many problems in this book that are only valid in
imperative settings, and are not meaningful problems in functional settings
at all. One good example is how to delete an element from a red-black tree[3].
In this section, we'll only present the imperative algorithms for finding the
successor and predecessor in a BST.
³ There is ref in ML and OCaml, but we only consider the purely functional settings.

When finding the successor of element x, which is the smallest y that
satisfies y > x, there are two cases. If the node with value x has a non-NIL right
child, the minimum element in the right child is the answer. For example, in Figure
1.5, in order to find the successor of 8, we search its right sub-tree for the
minimum one, which yields 9 as the result. If node x doesn't have a right
child, we need to back-track to find the closest ancestor whose left child is also an
ancestor of x. In Figure 1.5, since 2 doesn't have a right sub-tree, we go back to its
parent 1. However, node 1 doesn't have a left child, so we go back again and reach
node 3. The left child of 3 is also an ancestor of 2, thus 3 is the successor of
node 2.

Figure 1.5: The successor of 8 is the minimum element in its right sub-tree, 9. In order
to find the successor of 2, we go up to its parent 1, but 1 doesn't have a left child, so we go
up again and find 3. Because the left child of 3 is also an ancestor of 2, 3 is the result.

Based on this description, the algorithm can be given as the following.


1: function Succ(x)
2: if Right(x) ̸= N IL then
3: return Min(Right(x))
4: else
5: p ← Parent(x)
6: while p ̸= N IL and x = Right(p) do
7: x←p
8: p ← Parent(p)
9: return p
If x doesn't have a successor, this algorithm returns NIL. The predecessor case
is quite similar to the successor algorithm; they are symmetric to each other.
1: function Pred(x)
2: if Left(x) ̸= N IL then
3: return Max(Left(x))
4: else
5: p ← Parent(x)
6: while p ̸= N IL and x = Left(p) do
7: x←p
8: p ← Parent(p)
9: return p
Below are the Python programs based on these algorithms. The while loop conditions
are changed a bit.
def succ(x):
    if x.right is not None:
        return tree_min(x.right)
    p = x.parent
    while p is not None and p.left != x:
        x = p
        p = p.parent
    return p

def pred(x):
    if x.left is not None:
        return tree_max(x.left)
    p = x.parent
    while p is not None and p.right != x:
        x = p
        p = p.parent
    return p

Exercise 1.2

• Can you figure out how to iterate over a tree as a generic container by using
Pred/Succ? What's the performance of such a traversal in terms of big-O?

• A reader asked about traversing all elements inside a range [a, b]. In
C++, the algorithm looks like the below code:

for_each (m.lower_bound(12), m.upper_bound(26), f);

Can you provide a purely functional solution for this problem?

1.6 Deletion
Deletion is another 'imperative only' topic for the binary search tree. This is because
deletion mutates the tree, while in purely functional settings, we don't modify
the tree after building it in most applications.
However, one method of deleting an element from a binary search tree in a purely
functional way is shown in this section. It actually reconstructs the tree rather than
modifying it.
Deletion is the most complex operation for the binary search tree. This is because
we must keep the BST property: for any node, all keys in the left sub-tree are
less than the key of this node, and the key of this node is less than any key in the right
sub-tree. Deleting a node can break this property.
In this chapter, instead of the algorithm described in [2], a simpler one from
the SGI STL implementation is used [4].
To delete a node x from a tree:

• If x has no child or only one child, splice x out;

• Otherwise (x has two children), use minimum element of its right sub tree
to replace x, and splice the original minimum element out.

The simplicity comes from the fact that the minimum element is stored in
a node in the right sub-tree which can't have two non-NIL children, so it ends up
in the trivial case: the node can be directly spliced out from the tree.
Figure 1.6, 1.7, and 1.8 illustrate these different cases when deleting a node
from the tree.

Figure 1.6: x can be spliced out.

(a) Before deleting x. (b) After deleting x: x is spliced out and replaced by its left child.
(c) Before deleting x. (d) After deleting x: x is spliced out and replaced by its right child.

Figure 1.7: Delete a node which has only one non-NIL child.

(a) Before deleting x. (b) After deleting x: x is replaced by splicing the minimum element
out of its right child.

Figure 1.8: Delete a node which has both children.

Based on this idea, the deletion can be defined as the below function.

delete(T, x) = \begin{cases}
ϕ & : T = ϕ \\
node(delete(T_l, x), k, T_r) & : x < k \\
node(T_l, k, delete(T_r, x)) & : x > k \\
T_r & : x = k ∧ T_l = ϕ \\
T_l & : x = k ∧ T_r = ϕ \\
node(T_l, y, delete(T_r, y)) & : otherwise
\end{cases}    (1.9)

Where
T_l = left(T), T_r = right(T), k = key(T), y = min(T_r)
Translating the function to Haskell yields the below program.
delete Empty _ = Empty
delete (Node l k r) x | x < k = Node (delete l x) k r
                      | x > k = Node l k (delete r x)
                      -- x == k
                      | isEmpty l = r
                      | isEmpty r = l
                      | otherwise = Node l k' (delete r k')
    where k' = min r

Function isEmpty tests if a tree is empty (ϕ). Note that the algorithm
first performs a search to locate the node where the element needs to be deleted,
and after that it executes the deletion. This algorithm takes O(h) time where h is
the height of the tree.

It's also possible to pass the node, rather than the element, to the algorithm for
deletion; then the searching is no longer needed.
The imperative algorithm is more complex because it needs to set the parent
properly. The function will return the root of the resulting tree.
1: function Delete(T, x)
2: r←T
3: x′ ← x ▷ save x
4: p ← Parent(x)
5: if Left(x) = N IL then
6: x ← Right(x)
7: else if Right(x) = N IL then
8: x ← Left(x)
9: else ▷ both children are non-NIL
10: y ← Min(Right(x))
11: Key(x) ← Key(y)
12: Copy other satellite data from y to x
13: if Parent(y) ̸= x then ▷ y hasn’t left sub tree
14: Left(Parent(y)) ← Right(y)
15: else ▷ y is the root of right child of x
16: Right(x) ← Right(y)
17: if Right(y) ̸= N IL then
18: Parent(Right(y)) ← Parent(y)
19: Remove y
20: return r
21: if x ̸= N IL then
22: Parent(x) ← p
23: if p = N IL then ▷ We are removing the root of the tree
24: r←x
25: else
26: if Left(p) = x′ then
27: Left(p) ← x
28: else
29: Right(p) ← x
30: Remove x′
31: return r
Here we assume the node to be deleted is not empty (otherwise we can simply
return the original tree). The algorithm first records the root of the tree, a copy of the
pointer to x, and its parent.
If either of the children is empty, the algorithm just splices x out. If it has
two non-NIL children, we first locate the minimum node y of the right child, replace the
key of x with y's, copy the satellite data as well, then splice y out. Note that there
is a special case where y is the root of x's right sub-tree.
Finally, we need to reset the stored parent if the original x has at most one non-NIL
child. If the parent pointer we copied before is empty, it means that we are
deleting the root node, so we need to return the new root. After the parent is set
properly, we finally remove the old x from memory.
The corresponding Python program for the deletion algorithm is given below. Because
Python provides GC, we needn't explicitly remove the node from memory.
def tree_delete(t, x):
    if x is None:
        return t
    [root, old_x, parent] = [t, x, x.parent]
    if x.left is None:
        x = x.right
    elif x.right is None:
        x = x.left
    else:
        y = tree_min(x.right)
        x.key = y.key
        if y.parent != x:
            y.parent.left = y.right
        else:
            x.right = y.right
        if y.right is not None:
            y.right.parent = y.parent
        return root
    if x is not None:
        x.parent = parent
    if parent is None:
        root = x
    else:
        if parent.left == old_x:
            parent.left = x
        else:
            parent.right = x
    return root

Because the procedure seeks the minimum element of the right sub-tree, it runs in O(h)
time on a tree of height h.

Exercise 1.3

• There is a symmetric solution for deleting a node which has two non-NIL
children: replace the element by splicing the maximum one out of the
left sub-tree. Write a program to implement this solution.

1.7 Randomly build binary search tree

All the operations given in this chapter are bound to O(h) time for a
tree of height h. The height affects the performance a lot. For a very unbalanced
tree, h tends to be O(n), which leads to the worst case; while for a balanced tree,
h is close to O(lg n), and we gain good performance.
How to make the binary search tree balanced will be discussed in the next chapter.
However, there exists a simple way. A binary search tree can be randomly built as
described in [2]. Random building can help to decrease the possibility of getting an
unbalanced binary tree. The idea is that before building the tree, we can call
a random process to shuffle the elements.
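A minimal Haskell sketch of this idea follows; it assumes the Tree type and fromList function
from earlier in this chapter, and the shuffling is done by tagging every element with a random
key and sorting on those keys (the helper names shuffle and randomBuild are illustrative, not
from the book).

import Data.List (sortOn)
import System.Random (randomRIO)

-- attach a random Double to each element, sort by it, then drop the tags
shuffle :: [a] -> IO [a]
shuffle xs = do
    ks <- mapM (const (randomRIO (0, 1 :: Double))) xs
    return (map snd (sortOn fst (zip ks xs)))

-- shuffle the input first, then build the tree as usual
randomBuild :: Ord a => [a] -> IO (Tree a)
randomBuild xs = fmap fromList (shuffle xs)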

Exercise 1.4

• Write a randomly building process for binary search tree.


Bibliography

[1] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein. "Introduction to Algorithms, Second Edition". ISBN: 0262032937. The MIT Press. 2001

[2] Jon Bentley. "Programming Pearls (2nd Edition)". Addison-Wesley Professional; 2 edition (October 7, 1999). ISBN-13: 978-0201657883

[3] Chris Okasaki. "Ten Years of Purely Functional Data Structures". http://okasaki.blogspot.com/2008/02/ten-years-of-purely-functional-data.html

[4] SGI. "Standard Template Library Programmer's Guide". http://www.sgi.com/tech/stl/

[5] http://en.literateprograms.org/Category:Binary_search_tree

[6] http://en.wikipedia.org/wiki/Foldl

[7] http://en.wikipedia.org/wiki/Function_composition

[8] http://en.wikipedia.org/wiki/Partial_application

[9] Miran Lipovaca. "Learn You a Haskell for Great Good! A Beginner's Guide", the last chapter. No Starch Press; 1 edition, April 2011, 400 pp. ISBN: 978-1-59327-283-8
Chapter 2

The evolution of insertion sort

2.1 Introduction
In the previous chapter, we introduced the 'hello world' data structure, the binary
search tree. In this chapter, we explain insertion sort, which can be thought of as
the 'hello world' sorting algorithm¹. It's straightforward, but its performance
is not as good as some divide and conquer sorting approaches, such as quick
sort and merge sort. Thus insertion sort is seldom used as a generic sorting utility
in modern software libraries. We'll analyze why it is slow, and
try to improve it bit by bit until we reach the best bound of comparison based
sorting algorithms, O(n lg n), by evolving it into tree sort. We finally show the
connection between the 'hello world' data structure and the 'hello world' sorting
algorithm.
The idea of insertion sort can be vividly illustrated by a real life poker game[2].
Suppose the cards are shuffled, and a player starts taking cards one by one.
At any time, all cards in the player's hand are well sorted. When the player
gets a new card, he inserts it in the proper position according to the order of points.
Figure 2.1 shows this insertion example.
Based on this idea, the algorithm of insertion sort can be directly given as
the following.
function Sort(A)
X←ϕ
for each x ∈ A do
Insert(X, x)
return X
It's easy to express this process with folding, which we mentioned in the
chapter about the binary search tree.

sort = foldL insert ϕ    (2.1)


¹ Some readers may argue that 'bubble sort' is the easiest sorting algorithm. Bubble sort isn't
covered in this book as we don't think it's a valuable algorithm[1].


Figure 2.1: Insert card 8 to proper position in a deck.

Note that in the above algorithm, we store the sorted result in X, so this
isn't in-place sorting. It's easy to change it to an in-place algorithm. Denote the
sequence as A = {a_1, a_2, ..., a_n}.
function Sort(A)
for i ← 2 to |A| do
insert a_i into the sorted sequence {a'_1, a'_2, ..., a'_{i−1}}
At any time, when we process the i-th element, all elements before i have
already been sorted. We continuously insert the current element until all the
unsorted data is consumed. This idea is illustrated in figure 2.2.

Figure 2.2: The left part is sorted data; continuously insert elements into the sorted part.

We can find a recursive concept in this definition. Thus it can be
expressed as the following.

sort(A) = \begin{cases}
ϕ & : A = ϕ \\
insert(sort(\{a_2, a_3, ...\}), a_1) & : otherwise
\end{cases}    (2.2)

2.2 Insertion
We haven't answered the question of how to realize insertion, however. It's
a puzzle how a human locates the proper position so quickly.
For a computer, the obvious option is to perform a scan. We can either scan
from left to right or vice versa. However, if the sequence is stored in a plain array,
it's better to scan from right to left.

function Sort(A)
for i ← 2 to |A| do ▷ Insert A[i] to sorted sequence A[1...i − 1]
x ← A[i]
j ←i−1
while j > 0 ∧ x < A[j] do
A[j + 1] ← A[j]
j ←j−1
A[j + 1] ← x
One may think scanning from left to right is more natural. However, it isn't as effective
as the above algorithm for a plain array. The reason is that it's expensive to insert an
element at an arbitrary position in an array. As an array stores elements contiguously,
if we want to insert a new element x at position i, we must shift all elements after
i, including i + 1, i + 2, ..., one position to the right. After that the cell at position i
is empty, and we can put x in it. This is illustrated in figure 2.3.

Figure 2.3: Insert x into array A at position i.

If the length of the array is n, this indicates we need to examine the first i elements,
then perform n − i + 1 moves, and then insert x into the i-th cell. So insertion
from left to right needs to traverse the whole array anyway, while if we scan from
right to left, we examine at most i elements and perform the same amount of
moves.
Translating the above algorithm to Python yields the following code.
def isort(xs):
    n = len(xs)
    for i in range(1, n):
        x = xs[i]
        j = i - 1
        while j >= 0 and x < xs[j]:
            xs[j+1] = xs[j]
            j = j - 1
        xs[j+1] = x

Some equivalent programs can also be found, for instance the following
ANSI C program. However, this version isn't as effective as the pseudo code.
void isort(Key∗ xs, int n){
    int i, j;
    for(i=1; i<n; ++i)
        for(j=i-1; j≥0 && xs[j+1] < xs[j]; --j)
            swap(xs, j, j+1);
}

This is because the swapping function, which exchanges two elements,
typically uses a temporary variable like the following:
void swap(Key∗ xs, int i, int j){
    Key temp = xs[i];
    xs[i] = xs[j];
    xs[j] = temp;
}
So the ANSI C program presented above performs 3m assignments, where
m is the number of inner loop iterations, while the pseudo code as well as the Python
program use a shift operation instead of swapping, which needs only m + 2 assignments.
We can also provide the Insert() function explicitly, and call it from the general
insertion sort algorithm in the previous section. We skip the detailed realization here
and leave it as an exercise.
All the insertion algorithms are bound to O(n) time, where n is the length of
the sequence, no matter whether they scan from the left or from the right.
Thus the overall performance of insertion sort is quadratic, O(n²).

Exercise 2.1
• Provide an explicit insertion function, and call it from the general insertion sort
algorithm. Please realize it in both a procedural way and a functional way.

2.3 Improvement 1
Let's go back to the question of why human beings can find the proper position
for insertion so quickly. We have shown a solution based on scanning. Note the fact
that at any time, all cards at hand are well sorted; another possible
solution is to use binary search to find that location.
We'll explain search algorithms in a dedicated chapter. Binary search
is just briefly introduced here for illustration purposes.
The algorithm is changed to call a binary search procedure.
The algorithm will be changed to call a binary search procedure.
function Sort(A)
for i ← 2 to |A| do
x ← A[i]
p ← Binary-Search(A[1...i − 1], x)
for j ← i down to p + 1 do
A[j] ← A[j − 1]
A[p] ← x
Instead of scanning elements one by one, binary search utilizes the information
that all elements in the array slice {A_1, ..., A_{i−1}} are sorted. Let's assume the
order is monotonically increasing. To find the position j that satisfies A_{j−1} ≤
x ≤ A_j, we can first examine the middle element, for example A_{⌊i/2⌋}. If x is
less than it, we next recursively perform binary search in the first half of
the sequence; otherwise, we only need to search the last half.
Every time, we halve the elements to be examined; this search process runs in
O(lg n) time to locate the insertion position.

function Binary-Search(A, x)
l←1
u ← 1 + |A|
while l < u do
m ← ⌊(l + u)/2⌋
if A[m] = x then
return m ▷ Find a duplicated element
else if A[m] < x then
l ←m+1
else
u←m
return l
The improved insertion sort algorithm is still bound to O(n²). Compared with the
previous section, where we used O(n²) comparisons and O(n²) moves, with
binary search we use only O(n lg n) comparisons, but still O(n²) moves.
The Python program for this algorithm is given below.
def isort(xs):
    n = len(xs)
    for i in range(1, n):
        x = xs[i]
        p = binary_search(xs[:i], x)
        for j in range(i, p, -1):
            xs[j] = xs[j-1]
        xs[p] = x

def binary_search(xs, x):
    l = 0
    u = len(xs)
    while l < u:
        m = (l + u) // 2  # integer division
        if xs[m] == x:
            return m
        elif xs[m] < x:
            l = m + 1
        else:
            u = m
    return l

Exercise 2.2
Write the binary search in a recursive manner. You needn't use a purely
functional programming language.

2.4 Improvement 2
Although we improved the comparison time to O(n lg n) in the previous section, the number
of moves is still O(n²). The reason why the movement takes so long is
that the sequence is stored in a plain array. The nature of an array is a continuous
layout, so the insertion operation is expensive. This hints that we can use a
linked-list setting to represent the sequence. It improves the insertion operation
from O(n) to constant time O(1).

insert(A, x) = \begin{cases}
\{x\} & : A = ϕ \\
\{x\} ∪ A & : x < a_1 \\
\{a_1\} ∪ insert(\{a_2, a_3, ..., a_n\}, x) & : otherwise
\end{cases}    (2.3)

Translating the algorithm to Haskell yields the below program.


insert [] x = [x]
insert (y:ys) x = if x < y then x:y:ys else y:insert ys x
And we can complete the two versions of insertion sort program based on
the first two equations in this chapter.
isort [] = []
isort (x:xs) = insert (isort xs) x
Or we can represent the recursion with folding.
isort = foldl insert []
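As a quick sanity check, here is a small usage sketch built on the insert and isort
definitions above (the values shown in the comments are what these expressions evaluate to):

-- usage sketch for the functional insertion sort defined above
test1 :: [Int]
test1 = isort [5, 1, 4, 2, 3]        -- [1,2,3,4,5]

test2 :: String
test2 = foldl insert [] "insertion"  -- "eiinnorst"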
The linked-list solution can also be described imperatively. Suppose the
function Key(x) returns the value of the element stored in node x, and Next(x)
accesses the next node in the linked-list.
function Insert(L, x)
p ← NIL
H←L
while L ̸= NIL ∧ Key(L) < Key(x) do
p←L
L ← Next(L)
Next(x) ← L
if p = NIL then
H←x
else
Next(p) ← x
return H
For example in ANSI C, the linked-list can be defined as the following.
struct node{
Key key;
struct node∗ next;
};
Thus the insert function can be given as below.
struct node∗ insert(struct node∗ lst, struct node∗ x){
struct node ∗p, ∗head;
p = NULL;
for(head = lst; lst && x→key > lst→key; lst = lst→next)
p = lst;
x→next = lst;
if(!p)
return x;
p→next = x;
2.5. FINAL IMPROVEMENT BY BINARY SEARCH TREE 55

return head;
}

Instead of using an explicit linked-list, such as a pointer or reference based
structure, the linked-list can also be realized with an extra index array. For any
array element A[i], Next[i] stores the index of the element that follows A[i]; that
is, A[Next[i]] is the next element after A[i].
The insertion algorithm based on this solution is given below.
The insertion algorithm based on this solution is given like below.
function Insert(A, N ext, i)
j ←⊥
while Next[j] ≠ NIL ∧ A[Next[j]] < A[i] do
j ← Next[j]
Next[i] ← Next[j]
Next[j] ← i
Here ⊥ means the head of the Next table. The corresponding Python program
for this algorithm is given as follows.
def isort(xs):
    n = len(xs)
    next = [-1] * (n + 1)
    for i in range(n):
        insert(xs, next, i)
    return next

def insert(xs, next, i):
    j = -1
    while next[j] != -1 and xs[next[j]] < xs[i]:
        j = next[j]
    next[j], next[i] = i, next[j]

Although we changed the insertion operation to constant time by using a linked-list,
we still have to traverse the linked-list to find the position, which
results in O(n²) comparisons. This is because a linked-list, unlike an array, doesn't
support random access, which means we can't use binary search in the linked-list
setting.

Exercise 2.3

• Complete the insertion sort using the linked-list insertion function in your
favorite imperative programming language.

• The index based linked-list returns the sequence of rearranged indices as the
result. Write a program to re-order the original array of elements from
this result.

2.5 Final improvement by binary search tree


It seems that we have driven into a corner. We must improve both the comparison
and the insertion at the same time, or we will end up with O(n²) performance.
We must use binary search; this is the only way to improve the comparison
time to O(lg n). On the other hand, we must change the data structure, because
we can't achieve constant time insertion at a position with a plain array.
This reminds us of our 'hello world' data structure, the binary search tree. It
naturally supports binary search by its definition. At the same time, we can
insert a new node into a binary search tree in O(1) constant time if we have already
found the location.
So the algorithm changes to this.
function Sort(A)
T ←ϕ
for each x ∈ A do
T ← Insert-Tree(T, x)
return To-List(T )
Where Insert-Tree() and To-List() are described in the previous chapter
about the binary search tree.
As we have analyzed for the binary search tree, the performance of tree sort is
bound to O(n lg n), which is the lower limit of comparison based sorting[3].
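A compact Haskell sketch of the whole chain is shown below. It re-defines a plain
binary search tree locally so the snippet stays self-contained; the names insert, toList
and treeSort are illustrative stand-ins for the chapter 1 functions Insert-Tree and To-List.

data Tree a = Empty | Node (Tree a) a (Tree a)

-- binary search tree insertion, as in chapter 1
insert :: Ord a => Tree a -> a -> Tree a
insert Empty x = Node Empty x Empty
insert (Node l k r) x
    | x < k     = Node (insert l x) k r
    | otherwise = Node l k (insert r x)

-- in-order traversal yields the elements in sorted order
toList :: Tree a -> [a]
toList Empty        = []
toList (Node l k r) = toList l ++ [k] ++ toList r

treeSort :: Ord a => [a] -> [a]
treeSort = toList . foldl insert Empty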

2.6 Short summary

In this chapter, we presented the evolution process of insertion sort. Insertion sort
is well explained in most textbooks as the first sorting algorithm. It has a simple
and straightforward idea, but its performance is quadratic. Some textbooks
stop here, but we wanted to show that there exist ways to improve it from different
points of view. We first tried to save comparison time by using binary search,
and then tried to save the insertion operation by changing the data structure to a
linked-list. Finally, we combined these two ideas and evolved insertion sort into
tree sort.
Bibliography

[1] http://en.wikipedia.org/wiki/Bubble_sort

[2] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein. "Introduction to Algorithms, Second Edition". ISBN: 0262032937. The MIT Press. 2001

[3] Donald E. Knuth. "The Art of Computer Programming, Volume 3: Sorting and Searching (2nd Edition)". Addison-Wesley Professional; 2 edition (May 4, 1998). ISBN-10: 0201896850, ISBN-13: 978-0201896855

Chapter 3

Red-black tree, not so complex as it was thought

3.1 Introduction
3.1.1 Exploit the binary search tree
We showed the power of using a binary search tree as a dictionary to count the
occurrence of every word in a book in the previous chapter.
One may come up with the idea of feeding a yellow pages book¹ to a binary search
tree, and using it to look up the phone number of a contact.
Modifying the word occurrence counting program a bit yields the following code.
int main(int, char∗∗ ){
    ifstream f("yp.txt");
    map<string, string> dict;
    string name, phone;
    while(f>>name && f>>phone)
        dict[name]=phone;
    for(;;){
        cout<<"\nname: ";
        cin>>name;
        if(dict.find(name)==dict.end())
            cout<<"not found";
        else
            cout<<"phone: "<<dict[name];
    }
}

This program works well. However, if you replace the STL map with the
binary search tree introduced in the previous chapter, the performance will be poor,
especially when you search for names such as Zara, Zed, or Zulu.
This is because the content of the yellow pages is typically listed in lexicographic
order, which means the name list is in increasing order. If we try to insert a
¹ A telephone number contact list book


sequence of numbers 1, 2, 3, ..., n into a binary search tree, we will get a tree like
the one in Figure 3.1.

Figure 3.1: unbalanced tree

This is an extremely unbalanced binary search tree. Looking up performs in
O(h) for a tree of height h. In the balanced case, we benefit from the binary search
tree with O(lg n) search time. But in this extreme case, the search time degrades
to O(n). It's no better than an ordinary linked-list.

Exercise 3.1

• For a very big yellow pages list, one may want to speed up the dictionary
building process with two concurrent tasks (threads or processes). One task
reads the name-phone pairs from the head of the list, while the other one
reads from the tail. The building terminates when the two tasks meet at
the middle of the list. What will the binary search tree look like after
building? What if you split the list into more than two parts and use more tasks?

• Can you find any more cases that exploit a binary search tree? Please
consider the unbalanced trees shown in figure 3.2.

3.1.2 How to ensure the balance of the tree

In order to avoid such cases, we can shuffle the input sequence with a randomized
algorithm, as described in Section 12.4 of [2]. However, this method doesn't
always work; for example, the input may be fed interactively by a user, and the tree
needs to be built and updated online.
There are many solutions to make a binary search tree balanced.
Many of them rely on rotation operations on the binary search tree. Rotation
operations change the tree structure while maintaining the ordering of the elements,
thus they can be used to improve the balance of the binary search tree.
In this chapter, we'll first introduce the red-black tree. It is one of the
most popular and widely used self-adjusting balanced binary search trees. In the
next chapter, we'll introduce another intuitive solution, the AVL tree.

Figure 3.2: Some unbalanced trees



In a later chapter about binary heaps, we'll show another interesting tree called the splay tree,
which can gradually adjust the tree to make it more and more balanced.

3.1.3 Tree rotation

Figure 3.3: Tree rotation. 'Rotate-left' transforms the tree from the left side to the right
side, and 'rotate-right' does the inverse transformation.

Tree rotation is a set of operations that transform the tree structure
without changing the in-order traversal result. It is based on the fact that for
a specified ordering, there are multiple binary search trees corresponding to it.
Figure 3.3 shows the tree rotations. For the binary search tree on the left side, left
rotation transforms it to the tree on the right, and right rotation does the inverse
transformation.
Although tree rotation can be realized in a procedural way, there exists a simple
functional definition using pattern matching. Denote the non-empty tree as
T = (T_l, k, T_r), where k is the key, and T_l, T_r are the left and right sub-trees.
rotate_l(T) = \begin{cases}
((a, X, b), Y, c) & : T = (a, X, (b, Y, c)) \\
T & : otherwise
\end{cases}    (3.1)

rotate_r(T) = \begin{cases}
(a, X, (b, Y, c)) & : T = ((a, X, b), Y, c) \\
T & : otherwise
\end{cases}    (3.2)
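The same definitions can be written almost verbatim in Haskell. The sketch below uses
the plain binary tree type from the previous chapter (Node left key right); it is an
illustrative fragment, not the book's red-black tree code.

data Tree a = Empty | Node (Tree a) a (Tree a)

-- rotate-left: (a, x, (b, y, c)) becomes ((a, x, b), y, c)
rotateL :: Tree a -> Tree a
rotateL (Node a x (Node b y c)) = Node (Node a x b) y c
rotateL t = t

-- rotate-right: ((a, x, b), y, c) becomes (a, x, (b, y, c))
rotateR :: Tree a -> Tree a
rotateR (Node (Node a x b) y c) = Node a x (Node b y c)
rotateR t = t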
To perform tree rotation imperatively, we need to set all the fields of the nodes as
follows.
1: function Left-Rotate(T, x)
2: p ← Parent(x)
3: y ← Right(x) ▷ Assume y ̸= NIL
4: a ← Left(x)
5: b ← Left(y)
6: c ← Right(y)
7: Replace(x, y)
8: Set-Children(x, a, b)
9: Set-Children(y, x, c)
10: if p = NIL then
11: T ←y
12: return T

13: function Right-Rotate(T, y)


14: p ← Parent(y)
15: x ← Left(y) ▷ Assume x ̸= NIL
16: a ← Left(x)
17: b ← Right(x)
18: c ← Right(y)
19: Replace(y, x)
20: Set-Children(y, b, c)
21: Set-Children(x, a, y)
22: if p = NIL then
23: T ←x
24: return T

Where procedure Replace(x, y), uses y to replace x.


1: function Replace(x, y)
2: if Parent(x) = NIL then
3: if y ̸= NIL then Parent(y) ← NIL
4: else if Left(Parent(x)) = x then
5: Set-Left(Parent(x), y)
6: else
7: Set-Right(Parent(x), y)
8: Parent(x) ← NIL

Procedure Set-Children assigns a pair of sub-trees as the left and right


children of a given node.
1: function Set-Children(x, L, R)
2: Set-Left(x, L)
3: Set-Right(x, R)

4: function Set-Left(x, y)
5: Left(x) ← y
6: if y ̸= NIL then Parent(y) ← x

7: function Set-Right(x, y)
8: Right(x) ← y
9: if y ̸= NIL then Parent(y) ← x

Comparing the imperative operations with the pattern matching functions, we
can find that the latter focus on the structural change, while the former focus on the
rotation process. As the title of this chapter indicates, the red-black tree needn't
be as complex as it was thought. Most traditional algorithm text books use the
classic procedural treatment of the red-black tree: when inserting or deleting keys,
there are multiple cases with a series of node manipulations. On the other hand,
in functional settings, the algorithm turns out to be intuitive and simple, although
there is some performance overhead.

Most of the content in this chapter is based on Chris Okasaki’s work in [2].

3.2 Definition of red-black tree

A red-black tree is a type of self-balancing binary search tree[4]². By using color
changing and rotation, the red-black tree provides a very simple and straightforward
way to keep the tree balanced.
For a binary search tree, we can augment the nodes with a color field; a node
can be colored either red or black. We call a binary search tree a red-black tree
if it satisfies the following 5 properties ([2] pp273).

1. Every node is either red or black.
2. The root is black.
3. Every leaf (NIL) is black.
4. If a node is red, then both its children are black.
5. For each node, all paths from the node to descendant leaves contain the
same number of black nodes.

Why can these 5 properties ensure the red-black tree is well balanced? Because
they have a key characteristic: the longest path from the root to a leaf can't be
more than twice as long as the shortest path.
Consider the 4-th property. It means there can't be two adjacent red nodes,
so the shortest path contains only black nodes, and any path that is longer than the
shortest one has interleaved red nodes. According to property 5, all paths have the
same number of black nodes; this finally ensures there can't be any path that is
twice as long as another[4]. Figure 3.4 shows a red-black tree example.

Figure 3.4: A red-black tree

As all NIL nodes are black, people often omit them when drawing a red-black
tree. Figure 3.5 gives the corresponding tree with all the NIL nodes hidden.
² The red-black tree is one of the equivalent forms of the 2-3-4 tree (see the chapter about B-trees
for 2-3-4 trees). That is to say, for any 2-3-4 tree, there is at least one red-black tree with the same
data order.

Figure 3.5: The red-black tree with all NIL nodes hidden.

All read-only operations, such as searching and finding the min/max, are the same as
for the binary search tree. Only insertion and deletion are special for the red-black tree.
Many implementations of set or map containers are based on the red-black tree.
One example is the C++ Standard Template Library (STL)[4].
For the data layout, the only change is that the color information needs to be
augmented to the binary search tree. This can be represented as a data field, like
the below C++ example.
enum Color {Red, Black};

template <class T>


struct node{
Color color;
T key;
node∗ left;
node∗ right;
node∗ parent;
};

In functional settings, we can add the color information in constructors,


below is the Haskell example of red-black tree definition.
data Color = R | B
data RBTree a = Empty
              | Node Color (RBTree a) a (RBTree a)

Exercise 3.2

• Can you prove that a red-black tree with n nodes has height at most
2 lg(n + 1)?

3.3 Insertion
The tree may become unbalanced if a new node is inserted with the method we used
for the binary search tree. In order to maintain the red-black properties, we
need to do some fixing after insertion.
When inserting a new key, we can always insert it as a red node. As long as the
newly inserted node isn't the root of the tree, we can keep all properties except
the 4-th one, as the insertion may bring two adjacent red nodes.
There are both functional and procedural fixing methods. One is intuitive
but has some overhead; the other is a bit complex but has higher performance. In
this chapter, we focus on the functional approach to show how easily a red-black
tree insertion algorithm can be realized. The traditional procedural method will
be given for comparison purposes.
As described by Chris Okasaki, there are in total 4 cases which violate property
4. All of them have 2 adjacent red nodes. However, they have a uniform
structure after fixing[2], as shown in figure 3.6.
Note that this transformation moves the redness one level up. During the
bottom-up recursive fixing, the last step may make the root node red. According
to property 2, the root is always black, thus we finally need to revert the root
color to black.
Observing that the 4 cases and the fixed result have strong patterns, the
fixing function can be defined using a method similar to the one we used in tree
rotation. Denote the color of a node as C; it has two values: black B and red R.
A non-empty tree can be represented as T = (C, T_l, k, T_r).

balance(T) = \begin{cases}
(R, (B, A, x, B), y, (B, C, z, D)) & : match(T) \\
T & : otherwise
\end{cases}    (3.3)

where function match() tests if a tree matches one of the 4 possible patterns
as the following.

match(T) : T = (B, (R, (R, A, x, B), y, C), z, D) ∨
           T = (B, (R, A, x, (R, B, y, C)), z, D) ∨
           T = (B, A, x, (R, B, y, (R, C, z, D))) ∨
           T = (B, A, x, (R, (R, B, y, C), z, D))

With function balance(T) defined, we can modify the binary search tree
insertion function to make it work for the red-black tree.

insert(T, k) = makeBlack(ins(T, k))    (3.4)

where

ins(T, k) = \begin{cases}
(R, ϕ, k, ϕ) & : T = ϕ \\
balance((C, ins(T_l, k), k', T_r)) & : k < k' \\
balance((C, T_l, k', ins(T_r, k))) & : otherwise
\end{cases}    (3.5)

If the tree is empty, then a new red node with k as the key is created;
otherwise, denote the color, the children, and the key as C, T_l, T_r, and k'; we compare k with

Figure 3.6: 4 cases for balancing a red-black tree after insertion



k' and recursively insert k into the corresponding child. The balance function is called
after that, and the root is finally re-colored black.

makeBlack(T) = (B, T_l, k, T_r)    (3.6)

Summarizing the above functions and using the language's pattern matching
features, we come to the following Haskell program.
insert t x = makeBlack $ ins t where
    ins Empty = Node R Empty x Empty
    ins (Node color l k r)
        | x < k = balance color (ins l) k r
        | otherwise = balance color l k (ins r)
    makeBlack (Node _ l k r) = Node B l k r

balance B (Node R (Node R a x b) y c) z d =
    Node R (Node B a x b) y (Node B c z d)
balance B (Node R a x (Node R b y c)) z d =
    Node R (Node B a x b) y (Node B c z d)
balance B a x (Node R b y (Node R c z d)) =
    Node R (Node B a x b) y (Node B c z d)
balance B a x (Node R (Node R b y c) z d) =
    Node R (Node B a x b) y (Node B c z d)
balance color l k r = Node color l k r

Note that the 'balance' function is changed a bit from the original definition.
Instead of passing the tree, we pass the color, the left child, the key, and the
right child to it. This saves a pair of 'boxing' and 'un-boxing' operations.
This program doesn't handle the case of duplicated keys. We can either
overwrite the key or drop the duplicate. Another option is to augment the
data with a linked list ([2], pp269).
Figure 3.7 shows two red-black trees built by feeding the lists 11, 2, 14, 1, 7,
15, 5, 8, 4 and 1, 2, ..., 8. The trees are well balanced even if we input an ordered
list.
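The two trees can be reproduced with a one-line helper that folds the insert function
defined above over each key list (a usage sketch; fromList is an assumed helper name, not
part of the chapter's code).

fromList :: Ord a => [a] -> RBTree a
fromList = foldl insert Empty

t1, t2 :: RBTree Int
t1 = fromList [11, 2, 14, 1, 7, 15, 5, 8, 4]
t2 = fromList [1 .. 8]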

Figure 3.7: Insertion results generated from two sequences of keys.

This algorithm shows great simplicity by summarizing the uniform pattern
from the four different unbalanced cases. It is so expressive compared to the traditional
tree rotation approach that even in programming languages which don't support
pattern matching, the algorithm can still be implemented by checking the patterns
manually. A Scheme/Lisp program available along with this book can be
used as a reference example.

The insertion algorithm takes O(lg n) time to insert a key to a red-black tree
which has n nodes.

Exercise 3.3

• Write a program in an imperative language, such as C, C++ or Python,
to realize the same algorithm in this section. Note that, because there is
no language supported pattern matching, you need to test the 4 different
cases manually.

3.4 Deletion
Recall the deletion section for the binary search tree. Deletion is 'imperative only'
for the red-black tree as well. In many cases, the tree is built just one time,
and then looked up frequently[3].
The purpose of this section is to show that red-black tree deletion is possible
in purely functional settings, although it actually rebuilds the tree, because trees
are read-only in terms of purely functional data structures³. In the real world, it's
up to the user (i.e. the programmer) to adopt the proper solution. One option
is to mark the node to be deleted with a flag, and later rebuild the tree when the
number of deleted nodes exceeds 50%.
Deletion is more complex than insertion in both functional and imperative
settings, as there are more cases to fix. Deletion may also violate the red-black
tree properties, so we need to fix the tree after the normal deletion described for the
binary search tree.
The problem only happens when we delete a black node, because that
violates the last property of the red-black tree: the number of black nodes in the
path decreases, so not all the paths contain the same number of black nodes.
When deleting a black node, we can restore the last red-black property by
introducing a 'doubly-black' concept ([2], pp290). It means that although
the node is deleted, its blackness is kept by storing it in the parent node. If
the parent node is red, it turns black; however, if it's already black, it turns
'doubly-black'.
In order to express the 'doubly-black' node, the definition needs some modification
accordingly.
data Color = R | B | BB -- BB: doubly black for deletion
data RBTree a = Empty | BBEmpty -- doubly black empty
              | Node Color (RBTree a) a (RBTree a)

When deleting a node, we first perform the same deletion as for the binary search tree.
After that, if the node to be spliced out is black, we need to fix the tree
to keep the red-black properties. The delete function is defined as the following.

delete(T, k) = blackenRoot(del(T, k)) (3.7)


³ Actually, the common part of the tree is reused. Most functional programming environments
support this persistent feature.



where

del(T, k) = \begin{cases}
ϕ & : T = ϕ \\
fixBlack²((C, del(T_l, k), k', T_r)) & : k < k' \\
fixBlack²((C, T_l, k', del(T_r, k))) & : k > k' \\
mkBlk(T_r) \text{ if } C = B, \text{ otherwise } T_r & : k = k' ∧ T_l = ϕ \\
mkBlk(T_l) \text{ if } C = B, \text{ otherwise } T_l & : k = k' ∧ T_r = ϕ \\
fixBlack²((C, T_l, k'', del(T_r, k''))) & : otherwise
\end{cases}    (3.8)

where k'' = min(T_r).

The real deletion happens inside function del. For the trivial case, where the
tree is empty, the deletion result is ϕ. If the key to be deleted is less than the
key of the current node, we recursively perform deletion on the left sub-tree; if
it is bigger than the key of the current node, we recursively delete the key
from the right sub-tree. Because this may introduce doubly-blackness, we need to fix
it.
If the key to be deleted is equal to the key of the current node, we need to
splice it out. If one of its children is empty, we just replace the node by the
other one and preserve the blackness of this node. Otherwise we cut and paste the
minimum element k'' = min(T_r) from the right sub-tree.
Function delete just forces the result tree of del to have a black root. This
is realized by function blackenRoot.

blackenRoot(T) = \begin{cases}
ϕ & : T = ϕ \\
(B, T_l, k, T_r) & : otherwise
\end{cases}    (3.9)
The blackenRoot(T) function is almost the same as the makeBlack(T) function
defined for insertion, except for the case of the empty tree. This is only valid in
deletion, because insertion can't result in an empty tree, while deletion can.
Function mkBlk is defined to preserve the blackness of a node. If the node
to be spliced isn't black, this function won't change it; otherwise, it turns a red
node black and turns a black node doubly-black. This function also marks
an empty tree ϕ as the doubly-black empty tree Φ.

mkBlk(T) = \begin{cases}
Φ & : T = ϕ \\
(B, T_l, k, T_r) & : C = R \\
(B², T_l, k, T_r) & : C = B \\
T & : otherwise
\end{cases}    (3.10)

where B² denotes the doubly-black color.
Summarizing the above functions yields the following Haskell program.
delete t x = blackenRoot(del t x) where
    del Empty _ = Empty
    del (Node color l k r) x
        | x < k = fixDB color (del l x) k r
        | x > k = fixDB color l k (del r x)
        -- x == k, delete this node
        | isEmpty l = if color==B then makeBlack r else r
        | isEmpty r = if color==B then makeBlack l else l
        | otherwise = fixDB color l k' (del r k') where k'= min r

    blackenRoot (Node _ l k r) = Node B l k r
    blackenRoot _ = Empty

    makeBlack (Node B l k r) = Node BB l k r -- doubly black
    makeBlack (Node _ l k r) = Node B l k r
    makeBlack Empty = BBEmpty
    makeBlack t = t

The final step of the red-black tree deletion algorithm is to realize the fixBlack²
function. The purpose of this function is to eliminate the 'doubly-black'
colored node by rotation and color changing. There are three cases. In
every case, the doubly-black node can either be a normal node, or the doubly-black
empty node Φ. Let's examine these three cases one by one.

3.4.1 The sibling of the doubly-black node is black, and it has one red child

In this situation, we can fix the doubly-blackness with one rotation. There are actually
4 different sub-cases, all of which can be transformed to one uniform
pattern. They are shown in figure 3.8.

Figure 3.8: Fix the doubly black by rotation, the sibling of the doubly-black
node is black, and it has one red child.

The handling of these 4 sub-cases can be realized with pattern matching.

fixBlack²(T) = \begin{cases}
(C, (B, mkBlk(A), x, B), y, (B, C, z, D)) & : p1.1 \\
(C, (B, A, x, B), y, (B, C, z, mkBlk(D))) & : p1.2
\end{cases}    (3.11)

where p1.1 and p1.2 each represent 2 patterns as the following.

p1.1 : T = (C, A, x, (B, (R, B, y, C), z, D)) ∧ color(A) = B²
       ∨ T = (C, A, x, (B, B, y, (R, C, z, D))) ∧ color(A) = B²

p1.2 : T = (C, (B, A, x, (R, B, y, C)), z, D) ∧ color(D) = B²
       ∨ T = (C, (B, (R, A, x, B), y, C), z, D) ∧ color(D) = B²
If the doubly-black node is the doubly-black empty node Φ, it can be changed
back to a normal empty node after the above operation. We can add the doubly-black
empty node handling on top of (3.11).

fixBlack²(T) = \begin{cases}
(C, (B, mkBlk(A), x, B), y, (B, C, z, D)) & : p1.1 \\
(C, (B, ϕ, x, B), y, (B, C, z, D)) & : p1.1' \\
(C, (B, A, x, B), y, (B, C, z, mkBlk(D))) & : p1.2 \\
(C, (B, A, x, B), y, (B, C, z, ϕ)) & : p1.2'
\end{cases}    (3.12)

Where patterns p1.1' and p1.2' are defined as below:

p1.1' : T = (C, Φ, x, (B, (R, B, y, C), z, D))
        ∨ T = (C, Φ, x, (B, B, y, (R, C, z, D)))

p1.2' : T = (C, (B, A, x, (R, B, y, C)), z, Φ)
        ∨ T = (C, (B, (R, A, x, B), y, C), z, Φ)

3.4.2 The sibling of the doubly-black node is red


In this case, we can rotate the tree to transform it into pattern p1.1 or p1.2. Figure 3.9
shows this.
We can add this case on top of (3.12) to gain (3.13).

fixBlack²(T) = \begin{cases}
... & : ... \\
fixBlack²((B, fixBlack²((R, A, x, B)), y, C)) & : p2.1 \\
fixBlack²((B, A, x, fixBlack²((R, B, y, C)))) & : p2.2 \\
T & : otherwise
\end{cases}    (3.13)

where p2.1 and p2.2 are two patterns as the following; in p2.1 the tree has the form
T = (B, A, x, (R, B, y, C)), and in p2.2 it has the form T = (B, (R, A, x, B), y, C).

p2.1 : {color(T) = B ∧ color(T_l) = B² ∧ color(T_r) = R}

p2.2 : {color(T) = B ∧ color(T_l) = R ∧ color(T_r) = B²}



Figure 3.9: The sibling of the doubly-black node is red.

3.4.3 The sibling of the doubly-black node, and its two children, are all black

In this case, we can change the color of the sibling node to red, turn the doubly-black
node black, and propagate the doubly-blackness one level up to the
parent node, as shown in figure 3.10. There are two symmetric sub-cases.
We go on adding this fixing on top of (3.13), which gives (3.14).

fixBlack²(T) = \begin{cases}
... & : ... \\
mkBlk((C, mkBlk(A), x, (R, B, y, C))) & : p3.1 \\
mkBlk((C, (R, A, x, B), y, mkBlk(C))) & : p3.2 \\
... & : ...
\end{cases}    (3.14)

where p3.1 and p3.2 are two patterns as below.

p3.1 : {T = (C, A, x, (B, B, y, C)) ∧ color(A) = B² ∧ color(B) = color(C) = B}

p3.2 : {T = (C, (B, A, x, B), y, C) ∧ color(C) = B² ∧ color(A) = color(B) = B}
If the doubly-black node is the doubly-black empty node Φ, it can be changed
back to a normal empty node after re-coloring. We add the doubly-black empty
node handling to (3.14) as below.

fixBlack²(T) = \begin{cases}
... & : ... \\
mkBlk((C, mkBlk(A), x, (R, B, y, C))) & : p3.1 \\
mkBlk((C, ϕ, x, (R, B, y, C))) & : p3.1' \\
mkBlk((C, (R, A, x, B), y, mkBlk(C))) & : p3.2 \\
mkBlk((C, (R, A, x, B), y, ϕ)) & : p3.2' \\
... & : ...
\end{cases}    (3.15)

(a) Color of x can be either black or red. (b) If x was red, it becomes black; otherwise, it becomes doubly-black.
(c) Color of y can be either black or red. (d) If y was red, it becomes black; otherwise, it becomes doubly-black.

Figure 3.10: Propagate the blackness up.

Where patterns p3.1' and p3.2' are defined as the following.

p3.1' : {T = (C, Φ, x, (B, B, y, C)) ∧ color(B) = color(C) = B}

p3.2' : {T = (C, (B, A, x, B), y, Φ) ∧ color(A) = color(B) = B}
Fixing the doubly-black node with all of the above cases is a recursive
function. There are two termination conditions: one is the patterns p1.1 and
p1.2, where the doubly-black node is eliminated; the other cases may continuously
propagate the doubly-blackness from the bottom up to the root. Finally the
algorithm marks the root node black anyway, so the doubly-blackness will be
removed.
Putting formulas (3.12), (3.13), and (3.15) together, we can write the final Haskell
program.
-- the sibling is black, and it has one red child
fixDB color a@(Node BB _ _ _) x (Node B (Node R b y c) z d)
    = Node color (Node B (makeBlack a) x b) y (Node B c z d)
fixDB color BBEmpty x (Node B (Node R b y c) z d)
    = Node color (Node B Empty x b) y (Node B c z d)
fixDB color a@(Node BB _ _ _) x (Node B b y (Node R c z d))
    = Node color (Node B (makeBlack a) x b) y (Node B c z d)
fixDB color BBEmpty x (Node B b y (Node R c z d))
    = Node color (Node B Empty x b) y (Node B c z d)
fixDB color (Node B a x (Node R b y c)) z d@(Node BB _ _ _)
    = Node color (Node B a x b) y (Node B c z (makeBlack d))
fixDB color (Node B a x (Node R b y c)) z BBEmpty
    = Node color (Node B a x b) y (Node B c z Empty)
fixDB color (Node B (Node R a x b) y c) z d@(Node BB _ _ _)
    = Node color (Node B a x b) y (Node B c z (makeBlack d))
fixDB color (Node B (Node R a x b) y c) z BBEmpty
    = Node color (Node B a x b) y (Node B c z Empty)
-- the sibling is red
fixDB B a@(Node BB _ _ _) x (Node R b y c) = fixDB B (fixDB R a x b) y c
fixDB B a@BBEmpty x (Node R b y c) = fixDB B (fixDB R a x b) y c
fixDB B (Node R a x b) y c@(Node BB _ _ _) = fixDB B a x (fixDB R b y c)
fixDB B (Node R a x b) y c@BBEmpty = fixDB B a x (fixDB R b y c)
-- the sibling and its 2 children are all black, propagate the blackness up
fixDB color a@(Node BB _ _ _) x (Node B b y c) = makeBlack (Node color (makeBlack a) x (Node R b y c))
fixDB color BBEmpty x (Node B b y c) = makeBlack (Node color Empty x (Node R b y c))
fixDB color (Node B a x b) y c@(Node BB _ _ _) = makeBlack (Node color (Node R a x b) y (makeBlack c))
fixDB color (Node B a x b) y BBEmpty = makeBlack (Node color (Node R a x b) y Empty)
-- otherwise
fixDB color l k r = Node color l k r

The deletion algorithm takes O(lg n) time to delete a key from a red-black
tree with n nodes.

Exercise 3.4

• As we mentioned in this section, deletion can be implemented by just
marking the node as deleted without actually removing it. Once the number
of marked nodes exceeds 50%, a tree re-build is performed. Try to
implement this method in your favorite programming language.

• Why don't we need to enclose mkBlk with a call to fixBlack² explicitly in the
definition of del(T, k)?

3.5 Imperative red-black tree algorithm ⋆

We have almost finished all the content of this chapter. By summarizing the patterns, we
can implement the red-black tree in a simple way compared to the imperative tree
rotation solution. However, we should show the imperative counterpart for completeness.
For insertion, the basic idea is to use an algorithm similar to the one described for the
binary search tree, and then fix the balance problems by rotation and return
the final result.
1: function Insert(T, k)
2: root ← T
3: x ← Create-Leaf(k)
4: Color(x) ← RED
5: p ← NIL
6: while T ̸= NIL do
7: p←T
8: if k < Key(T ) then
9: T ← Left(T )
10: else
11: T ← Right(T )
12: Parent(x) ← p
13: if p = NIL then ▷ tree T is empty
14: return x
15: else if k < Key(p) then
16: Left(p) ← x
17: else
18: Right(p) ← x
19: return Insert-Fix(root, x)
The only difference from the binary search tree insertion algorithm is that
we set the color of the new node to red, and perform fixing before returning. Below
is the example Python program.
def rb_insert(t, key):
    root = t
    x = Node(key)
    parent = None
    while(t):
        parent = t
        if(key < t.key):
            t = t.left
        else:
            t = t.right
    if parent is None:  # tree is empty
        root = x
    elif key < parent.key:
        parent.set_left(x)
    else:
        parent.set_right(x)
    return rb_insert_fix(root, x)

There are 3 base cases for fixing, and if we take the left-right symmetry
into consideration, there are 6 cases in total. Among them, two cases can be
merged together: they both have the uncle node colored red, and we can toggle
the parent color and uncle color to black and set the grandparent color to red.
With this merging, the fixing algorithm can be realized as the following.
1: function Insert-Fix(T, x)
2: while Parent(x) ̸= NIL ∧ Color(Parent(x)) = RED do
3: if Color(Uncle(x)) = RED then ▷ Case 1, x’s uncle is red
4: Color(Parent(x)) ← BLACK
5: Color(Grand-Parent(x)) ← RED
6: Color(Uncle(x)) ← BLACK
7: x ← Grand-Parent(x)
8: else ▷ x’s uncle is black
9: if Parent(x) = Left(Grand-Parent(x)) then
10: if x = Right(Parent(x)) then ▷ Case 2, x is a right child
11: x ← Parent(x)
12: T ← Left-Rotate(T, x)
▷ Case 3, x is a left child
13: Color(Parent(x)) ← BLACK


14: Color(Grand-Parent(x)) ← RED
15: T ← Right-Rotate(T , Grand-Parent(x))
16: else
17: if x = Left(Parent(x)) then ▷ Case 2, Symmetric
18: x ← Parent(x)
19: T ← Right-Rotate(T, x)
▷ Case 3, Symmetric
20: Color(Parent(x)) ← BLACK
21: Color(Grand-Parent(x)) ← RED
22: T ← Left-Rotate(T , Grand-Parent(x))
23: Color(T ) ← BLACK
24: return T
This program takes O(lg n) time to insert a new key into the red-black tree.
Comparing this pseudo code with the balance function defined in the previous
section, we can see the difference. They differ not only in terms of simplicity,
but also in logic. Even if we feed the same series of keys to the two algorithms,
they may build different red-black trees. There is a bit of performance overhead
in the pattern matching algorithm. Okasaki discussed the difference in
detail in his paper[2].
Translating the above algorithm to Python yields the below program.
# Fix the red→red violation
def rb_insert_fix(t, x):
    while(x.parent and x.parent.color == RED):
        if x.uncle().color == RED:
            # case 1: ((a:R x:R b) y:B c:R) ==> ((a:R x:B b) y:R c:B)
            set_color([x.parent, x.grandparent(), x.uncle()],
                      [BLACK, RED, BLACK])
            x = x.grandparent()
        else:
            if x.parent == x.grandparent().left:
                if x == x.parent.right:
                    # case 2: ((a x:R b:R) y:B c) ==> case 3
                    x = x.parent
                    t = left_rotate(t, x)
                # case 3: ((a:R x:R b) y:B c) ==> (a:R x:B (b y:R c))
                set_color([x.parent, x.grandparent()], [BLACK, RED])
                t = right_rotate(t, x.grandparent())
            else:
                if x == x.parent.left:
                    # case 2': (a x:B (b:R y:R c)) ==> case 3'
                    x = x.parent
                    t = right_rotate(t, x)
                # case 3': (a x:B (b y:R c:R)) ==> ((a x:R b) y:B c:R)
                set_color([x.parent, x.grandparent()], [BLACK, RED])
                t = left_rotate(t, x.grandparent())
    t.color = BLACK
    return t

Figure 3.11 shows the results of feeding the same series of keys to the above
Python insertion program. Comparing them with figure 3.7, one can tell the
difference clearly.

Figure 3.11: Red-black trees created by the imperative algorithm.

We put the imperative red-black tree deletion algorithm in Appendix B, because it is
more complex than the insertion.

3.6 More words

The red-black tree is the most popular implementation of the balanced binary search
tree. Another one is the AVL tree, which we'll introduce in the next chapter. The red-black
tree can be a good starting point for more data structures. If we extend the
number of children from 2 to k, and keep the balance as well, it leads to the B-tree;
if we store the data along the edges but not inside the nodes, it leads to the Trie.
However, the handling of multiple cases and the long programs tend to make
newcomers think the red-black tree is complex.
Okasaki's work helps to make the red-black tree much easier to understand.
There are many implementations in other programming languages in that manner
[5]. It also inspired me to find pattern matching solutions for the splay tree
and AVL tree, etc.
Bibliography

[1] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein. "Introduction to Algorithms, Second Edition". ISBN: 0262032937. The MIT Press. 2001

[2] Chris Okasaki. "FUNCTIONAL PEARLS: Red-Black Trees in a Functional Setting". J. Functional Programming. 1998

[3] Chris Okasaki. "Ten Years of Purely Functional Data Structures". http://okasaki.blogspot.com/2008/02/ten-years-of-purely-functional-data.html

[4] Wikipedia. "Red-black tree". http://en.wikipedia.org/wiki/Red-black_tree

[5] Pattern matching. http://rosettacode.org/wiki/Pattern_matching

Chapter 4

AVL tree

4.1 Introduction
4.1.1 How to measure the balance of a tree?
Besides the red-black tree, are there any other intuitive self-balancing binary search
trees? In order to measure how balanced a binary search tree is, one idea is
to compare the heights of the right and left sub-trees. If they differ a lot, the
tree isn't well balanced. Let's denote the height difference between the two
children as below

δ(T ) = |Tr | − |Tl | (4.1)


Where |T | means the height of tree T , and Tl , Tr are the left and right
sub-trees.
If δ(T ) = 0 for every node, the tree is definitely balanced. For example,
a complete binary tree has n = 2^h − 1 nodes for height h. There are no empty
branches except at the leaves. Another trivial case is the empty tree: δ(ϕ) = 0. The
smaller the absolute value of δ(T ), the more balanced the tree is.
We define δ(T ) as the balance factor of a binary search tree.
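To make the definition concrete, the balance factor can be computed recursively from the heights of the two children. The following Python sketch assumes a plain binary tree node with left and right fields; it is only an illustration, since the rest of this chapter stores δ inside the node instead of recomputing it.

# Compute the height and the balance factor of a plain binary tree.
# None denotes the empty tree; nodes are assumed to have left and right fields.
def height(t):
    if t is None:
        return 0
    return 1 + max(height(t.left), height(t.right))

def delta(t):
    if t is None:
        return 0
    return height(t.right) - height(t.left)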

4.2 Definition of AVL tree


The AVL tree is a special binary search tree, in which all sub-trees satisfy the
following criteria.

|δ(T )| ≤ 1 (4.2)
The absolute value of the balance factor is less than or equal to 1, which means
there are only three valid values: -1, 0 and 1. Figure 4.1 shows an example AVL
tree.
Why can the AVL tree keep itself balanced? In other words, can this definition
ensure that the height of the tree is O(lg n), where n is the number of nodes
in the tree? Let's prove this fact.
For an AVL tree of height h, the number of nodes varies. It can have at
most 2^h − 1 nodes, in the case of a complete binary tree. We are interested in how


Figure 4.1: AVL tree example

many nodes there are at least. Let's denote the minimum number of nodes for
an AVL tree of height h as N (h). Obviously, we have the following results for the
trivial cases.

• For empty tree, h = 0, N (0) = 0;

• For a singleton leaf tree, h = 1, N (1) = 1;

What’s the situation for the common case N (h)? Figure 4.2 shows an AVL
tree T of height h. It contains three parts, the root node, and two sub trees Tl ,
Tr . We have the following fact.

h = max(|Tl |, |Tr |) + 1 (4.3)

We immediately know that there must be one child with height h − 1. According
to the definition of the AVL tree, we have ||Tl | − |Tr || ≤ 1. This leads to the
fact that the height of the other sub-tree can't be lower than h − 2. So the total number
of nodes of T is the number of nodes in both children plus 1 (for the root
node). We claim that

N (h) = N (h − 1) + N (h − 2) + 1 (4.4)

Figure 4.2: An AVL tree of height h. The height of one sub-tree is h − 1, the
other is no less than h − 2

This recursion reminds us of the famous Fibonacci series. Actually we can
transform it into a Fibonacci-like series by defining N ′ (h) = N (h) + 1. Then equation
(4.4) changes to

N ′ (h) = N ′ (h − 1) + N ′ (h − 2) (4.5)
Lemma 4.2.1. Let N (h) be the minimum number of nodes for an AVL tree of
height h, and N ′ (h) = N (h) + 1, then

N ′ (h) ≥ ϕ^h     (4.6)

Where ϕ = (√5 + 1)/2 is the golden ratio.

Proof. For the trivial cases, we have

• h = 0, N ′ (0) = 1 ≥ ϕ^0 = 1
• h = 1, N ′ (1) = 2 ≥ ϕ^1 = 1.618...

For the induction case, suppose N ′ (h) ≥ ϕ^h .

N ′ (h + 1) = N ′ (h) + N ′ (h − 1)    {Fibonacci}
           ≥ ϕ^h + ϕ^(h−1)
           = ϕ^(h−1) (ϕ + 1)          {ϕ + 1 = ϕ^2 = (√5 + 3)/2}
           = ϕ^(h+1)

From Lemma 4.2.1, we immediately get

h ≤ logϕ (n + 1) = logϕ 2 · lg(n + 1) ≈ 1.44 lg(n + 1) (4.7)


It tells us that the height of an AVL tree is bounded by O(lg n), which means
the AVL tree is balanced.
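As a quick sanity check of this bound, we can compute N(h) from the recurrence (4.4) and compare h against 1.44 lg(n + 1). The following Python sketch is only for experimentation and is not part of the AVL tree implementation.

import math

# Minimum number of nodes of an AVL tree of height h,
# computed from N(h) = N(h-1) + N(h-2) + 1 with N(0) = 0, N(1) = 1.
def min_nodes(h):
    a, b = 0, 1
    for _ in range(h):
        a, b = b, a + b + 1
    return a

for h in range(1, 11):
    n = min_nodes(h)
    print(h, n, 1.44 * math.log(n + 1, 2))  # h stays below about 1.44 lg(n+1)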
For the mutating operations such as insertion and deletion, if the balance
factor changes to an invalid value, some fixing has to be performed to restore
|δ| to within 1. Most implementations utilize tree rotations. In this chapter, we'll
show the pattern matching solution which is inspired by Okasaki's red-black
tree solution[2]. Because of this 'modify-fix' approach, the AVL tree is also a kind
of self-balancing binary search tree. For comparison purposes, we'll also show
the procedural algorithms.
Of course we can compute the δ value recursively; another option is to store
the balance factor inside each node, and update it when we modify the tree.
The latter avoids computing the same value every time.
Based on this idea, we can add one extra data field δ to the binary search
tree definition. The following C++ example code reflects this change 1 .
template <class T>
struct node {
int delta;
T key;
node∗ left;
node∗ right;
node∗ parent;
};
1 Some implementations store the height of a tree instead of δ as in [5]

In a purely functional setting, some implementations use different constructors
to store the δ information. For example in [1], there are 4 constructors defined: E, N, P,
and Z. E stands for the empty tree, N for a tree with balance factor -1, P for a tree
with balance factor +1, and Z for the zero case.
In this chapter, we'll explicitly store the balance factor inside the node.
data AVLTree a = Empty
| Br (AVLTree a) a (AVLTree a) Int
The immutable operations, including looking up and finding the maximum and
minimum elements, are all the same as for the binary search tree. We'll skip them and
focus on the mutating operations.

4.3 Insertion
Inserting a new element into the tree may violate the AVL tree property, so that the
absolute value of δ exceeds 1. To restore it, one option is to do tree rotations
according to the different insertion cases. Most implementations are based on this
approach.
Another way is to use a pattern matching method similar to the one mentioned by
Okasaki in his red-black tree implementation [2]. Inspired by this idea, it is
possible to provide a simple and intuitive solution.
When we insert a new key to the AVL tree, the balance factor of the root may
change within the range [−1, 1]2 , and the height may increase by at most one. We
need to use this information recursively to update the δ values in the upper level nodes.
Thus we define the result of the insertion algorithm as a pair (T ′ , ∆H),
where T ′ is the new tree and ∆H is the increment of height. Let's denote by
first(pair) the function that returns the first element in a pair. We can modify the
binary search tree insertion algorithm as follows to handle the AVL tree.

insert(T, k) = first(ins(T, k)) (4.8)


where

ins(T, k) = ((ϕ, k, ϕ, 0), 1)                  : T = ϕ
            tree(ins(Tl , k), k ′ , (Tr , 0), ∆) : k < k ′
            tree((Tl , 0), k ′ , ins(Tr , k), ∆) : otherwise     (4.9)
Tl , Tr , k ′ , ∆ represent the left child, right child, the key and the balance
factor of a tree.

Tl = left(T )
Tr = right(T )
k ′ = key(T )
∆ = δ(T )
When we insert a new key k to an AVL tree T , if the tree is empty, we create
a leaf with k as the key, set the balance factor to 0, and the height is increased
by one.
If T isn't empty, we need to compare the key k ′ with k. If k is less than the
key, we recursively insert it to the left child, otherwise we insert it to the right.
2 Note that this doesn't mean δ is in the range [−1, 1]; it is the change of δ that is in this range.

As the result of the recursive insertion is a pair like (Tl′ , ∆Hl ), we need
to do the balance adjustment and update the increment of height. Function tree()
is defined to deal with this task. It takes 4 parameters: (Tl′ , ∆Hl ), k ′ ,
(Tr′ , ∆Hr ), and ∆. The result of this function is (T ′ , ∆H), where T ′
is the new tree after adjustment, and ∆H is the new increment of height. It is
defined as below.

∆H = |T ′ | − |T | (4.10)
This can be further deduced into 4 cases.

∆H = |T ′ | − |T |
    = 1 + max(|Tr′ |, |Tl′ |) − (1 + max(|Tr |, |Tl |))
    = max(|Tr′ |, |Tl′ |) − max(|Tr |, |Tl |)
    = ∆Hr        : ∆ ≥ 0 ∧ ∆′ ≥ 0
      ∆ + ∆Hr    : ∆ ≤ 0 ∧ ∆′ ≥ 0
      ∆Hl − ∆    : ∆ ≥ 0 ∧ ∆′ ≤ 0
      ∆Hl        : otherwise          (4.11)
The proof of this equation can be found in Appendix C.
The next problem is to determine the new balance factor ∆′ before performing
the balance adjustment. According to the definition of the AVL tree, the balance
factor is the height difference of the right and left sub-trees. We have the
following fact.

∆′ = |Tr′ | − |Tl′ |
   = |Tr | + ∆Hr − (|Tl | + ∆Hl )
   = |Tr | − |Tl | + ∆Hr − ∆Hl
   = ∆ + ∆Hr − ∆Hl                   (4.12)
With all these changes in height and the balance factor, we can define the
tree() function mentioned in (4.9).

tree((Tl′ , ∆Hl ), k, (Tr′ , ∆Hr ), ∆) = balance((Tl′ , k, Tr′ , ∆′ ), ∆H) (4.13)

Before moving into the details of the balance adjustment, let's translate the above
equations to an example Haskell program.
First is the insert function.
insert::(Ord a)⇒AVLTree a → a → AVLTree a
insert t x = fst $ ins t where
ins Empty = (Br Empty x Empty 0, 1)
ins (Br l k r d)
| x<k = tree (ins l) k (r, 0) d
| x == k = (Br l k r d, 0)
| otherwise = tree (l, 0) k (ins r) d
Here we also handle duplicated keys (the key already exists) by
overwriting.
tree::(AVLTree a, Int) → a → (AVLTree a, Int) → Int → (AVLTree a, Int)
tree (l, dl) k (r, dr) d = balance (Br l k r d', delta) where
d' = d + dr - dl
delta = deltaH d d' dl dr

And the definition of height increment is as below.

deltaH :: Int → Int → Int → Int → Int


deltaH d d' dl dr
| d ≥0 && d' ≥0 = dr
| d ≤0 && d' ≥0 = d+dr
| d ≥0 && d' ≤0 = dl - d
| otherwise = dl

4.3.1 Balancing adjustment


As the pattern matching approach is adopted for re-balancing, we need
to consider what kinds of patterns violate the AVL tree property.
Figure 4.3 shows the 4 cases which need fixing. For all these 4 cases the
balance factors are either -2 or +2, which exceeds the range [−1, 1]. After the
balancing adjustment, this factor turns to 0, which means the height of the left
sub-tree becomes equal to that of the right sub-tree.
We call these four cases left-left lean, right-right lean, right-left lean, and left-
right lean, in clockwise direction from top-left. We denote the balance
factors before fixing as δ(x), δ(y), and δ(z); after fixing, they change to
δ ′ (x), δ ′ (y), and δ ′ (z) respectively.
After fixing, we have δ ′ (y) = 0 for all four cases. The resulting values of δ ′ (x)
and δ ′ (z) are given below. The proofs are provided in Appendix C.

Left-left lean

δ ′ (x) = δ(x)
δ ′ (y) = 0 (4.14)
δ ′ (z) = 0

Right-right lean

δ ′ (x) = 0
δ ′ (y) = 0 (4.15)
δ ′ (z) = δ(z)

Right-left lean and Left-right lean

δ ′ (x) = −1 : δ(y) = 1
          0  : otherwise
δ ′ (y) = 0
δ ′ (z) = 1  : δ(y) = −1
          0  : otherwise           (4.16)

Figure 4.3: 4 cases for balancing an AVL tree after insertion



4.3.2 Pattern Matching


The pattern matching fixing function can be given as the following.


balance(T, ∆H) =
  (((A, x, B, δ(x)), y, (C, z, D, 0), 0), ∆H − 1)        : Pll (T )
  (((A, x, B, 0), y, (C, z, D, δ(z)), 0), ∆H − 1)        : Prr (T )
  (((A, x, B, δ ′ (x)), y, (C, z, D, δ ′ (z)), 0), ∆H − 1) : Prl (T ) ∨ Plr (T )
  (T, ∆H)                                                 : otherwise
  (4.17)
Where Pll (T ) means the pattern of tree T is left-left lean, and so on. δ ′ (x)
and δ ′ (z) are defined in (C.16). The four patterns are tested as below.

Pll (T ) : T = (((A, x, B, δ(x)), y, C, −1), z, D, −2)
Prr (T ) : T = (A, x, (B, y, (C, z, D, δ(z)), 1), 2)
Prl (T ) : T = ((A, x, (B, y, C, δ(y)), 1), z, D, −2)
Plr (T ) : T = (A, x, ((B, y, C, δ(y)), z, D, −1), 2)      (4.18)
Translating the above function definition to Haskell yields a simple and in-
tuitive program.
balance (Br (Br (Br a x b dx) y c (-1)) z d (-2), _) =
(Br (Br a x b dx) y (Br c z d 0) 0, 0)
balance (Br a x (Br b y (Br c z d dz) 1) 2, _) =
(Br (Br a x b 0) y (Br c z d dz) 0, 0)
balance (Br (Br a x (Br b y c dy) 1) z d (-2), _) =
(Br (Br a x b dx') y (Br c z d dz') 0, 0) where
dx' = if dy == 1 then -1 else 0
dz' = if dy == -1 then 1 else 0
balance (Br a x (Br (Br b y c dy) z d (-1)) 2, _) =
(Br (Br a x b dx') y (Br c z d dz') 0, 0) where
dx' = if dy == 1 then -1 else 0
dz' = if dy == -1 then 1 else 0
balance (t, d) = (t, d)
The insertion algorithm takes time proportional to the height of the tree. As
the AVL tree is balanced according to (4.7), its performance is O(lg n), where n is the
number of elements stored in the AVL tree.

Verification
To verify that a tree is an AVL tree, we need to check two things: first, that it is a binary
search tree; second, that it satisfies the AVL tree property.
In order to test if a binary tree satisfies the AVL tree property, we can examine
the height difference between the two sub-trees recursively down to the leaves.

avl?(T ) = True                                         : T = ϕ
           avl?(Tl ) ∧ avl?(Tr ) ∧ ||Tr | − |Tl || ≤ 1  : otherwise     (4.19)
Where the height can also be calculated recursively.
|T | = 0                        : T = ϕ
       1 + max(|Tr |, |Tl |)    : otherwise     (4.20)
The corresponding Haskell example program is given as the following.

isAVL :: (AVLTree a) → Bool


isAVL Empty = True
isAVL (Br l _ r d) = and [isAVL l, isAVL r, abs (height r - height l) ≤ 1]

height :: (AVLTree a) → Int


height Empty = 0
height (Br l _ r _) = 1 + max (height l) (height r)
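The Haskell program above only checks the balance property. As noted at the beginning of this subsection, a complete verification should also confirm the binary search tree ordering. A possible imperative sketch in Python, assuming nodes with key, left and right fields, is given below.

# Verify both the BST ordering and the AVL balance property.
# None denotes the empty tree; nodes are assumed to have key, left, right fields.
def height(t):
    return 0 if t is None else 1 + max(height(t.left), height(t.right))

def is_avl(t, lo=None, hi=None):
    if t is None:
        return True
    if (lo is not None and t.key <= lo) or (hi is not None and t.key >= hi):
        return False  # violates the binary search tree ordering
    if abs(height(t.right) - height(t.left)) > 1:
        return False  # violates the AVL balance property
    return is_avl(t.left, lo, t.key) and is_avl(t.right, t.key, hi)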

Exercise 4.1
Write a program to verify whether a tree is an AVL tree. Please consider both
the functional and the imperative approaches.

4.4 Deletion
As we mentioned before, deletion is not a major problem in purely functional
settings. As the tree is read-only, the typical use case is to build the tree once
and then perform lookups.
A purely functional deletion actually re-builds the tree, as we showed in the
chapter on the red-black tree. We put the AVL tree deletion algorithm in Appendix
C.

4.5 Imperative AVL tree algorithm ⋆


This section shows the traditional insert-and-rotate approach to realize the AVL
tree insertion algorithm.
Similar to the red-black tree algorithm, the strategy is to first do the binary
search tree insertion, then fix the balance by rotations and return the final result.
1: function Insert(T, k)
2: root ← T
3: x ← Create-Leaf(k)
4: δ(x) ← 0
5: parent ← NIL
6: while T ̸= NIL do
7: parent ← T
8: if k < Key(T ) then
9: T ← Left(T )
10: else
11: T ← Right(T )
12: Parent(x) ← parent
13: if parent = NIL then ▷ tree T is empty
14: return x
15: else if k < Key(parent) then
16: Left(parent) ← x
17: else
18: Right(parent) ← x
19: return AVL-Insert-Fix(root, x)

Note that after insertion, the balance factor δ may change because the height
of the tree can grow. Inserting on the right side can increase δ by 1, while inserting
on the left side can decrease it. At the end of this algorithm, we need to perform
bottom-up fixing from node x towards the root.
We can translate the pseudo code to the following Python example program3 .
def avl_insert(t, key):
root = t
x = Node(key)
parent = None
while(t):
parent = t
if(key < t.key):
t = t.left
else:
t = t.right
if parent is None: #tree is empty
root = x
elif key < parent.key:
parent.set_left(x)
else:
parent.set_right(x)
return avl_insert_fix(root, x)
This is a top-down algorithm. It searches the tree from the root down to the
proper position and inserts the new key as a leaf. At the end, it calls the fixing
function with the root and the newly inserted node.
Note that we reuse the same methods set_left() and set_right() as
defined in the chapter on the red-black tree.
In order to restore the AVL tree property, we first check whether the new node was
inserted on the left or on the right. If it is on the left, the balance factor δ decreases,
otherwise it increases. If we denote the new value as δ ′ , there are 3 cases for δ and
δ′ .

• If |δ| = 1 and |δ ′ | = 0, it means the new node makes the tree perfectly
balanced; the height of the parent node doesn't change, and the algorithm can
terminate.
• If |δ| = 0 and |δ ′ | = 1, it means either the left or the right sub-tree has
increased its height. We need to go on checking the upper levels of the tree.
• If |δ| = 1 and |δ ′ | = 2, it means the AVL tree property is violated due to
the new insertion. We need to perform rotations to fix it.

1: function AVL-Insert-Fix(T, x)
2: while Parent(x) ̸= NIL do
3: δ ← δ(Parent(x))
4: if x = Left(Parent(x)) then
5: δ′ ← δ − 1
6: else
7: δ′ ← δ + 1
8: δ(Parent(x)) ← δ ′
3C and C++ source code are available along with this book

9: P ← Parent(x)
10: L ← Left(x)
11: R ← Right(x)
12: if |δ| = 1 and |δ ′ | = 0 then ▷ Height doesn’t change, terminates.
13: return T
14: else if |δ| = 0 and |δ ′ | = 1 then ▷ Go on bottom-up updating.
15: x←P
16: else if |δ| = 1 and |δ ′ | = 2 then
17: if δ ′ = 2 then
18: if δ(R) = 1 then ▷ Right-right case
19: δ(P ) ← 0 ▷ By (C.5)
20: δ(R) ← 0
21: T ← Left-Rotate(T, P )
22: if δ(R) = −1 then ▷ Right-left case
23: δy ← δ(Left(R)) ▷ By (C.16)
24: if δy = 1 then
25: δ(P ) ← −1
26: else
27: δ(P ) ← 0
28: δ(Left(R)) ← 0
29: if δy = −1 then
30: δ(R) ← 1
31: else
32: δ(R) ← 0
33: T ← Right-Rotate(T, R)
34: T ← Left-Rotate(T, P )
35: if δ ′ = −2 then
36: if δ(L) = −1 then ▷ Left-left case
37: δ(P ) ← 0
38: δ(L) ← 0
39: Right-Rotate(T, P )
40: else ▷ Left-Right case
41: δy ← δ(Right(L))
42: if δy = 1 then
43: δ(L) ← −1
44: else
45: δ(L) ← 0
46: δ(Right(L)) ← 0
47: if δy = −1 then
48: δ(P ) ← 1
49: else
50: δ(P ) ← 0
51: Left-Rotate(T, L)
52: Right-Rotate(T, P )
53: break
54: return T
As the rotation operation doesn't update the balance factor δ, we need to update it
for the impacted nodes. Among the four cases, the right-right case and the left-left
case need only one rotation, while the right-left case and the left-right case need
two rotations.
The corresponding example Python program is as follows.

def avl_insert_fix(t, x):


while x.parent is not None:
d2 = d1 = x.parent.delta
if x == x.parent.left:
d2 = d2 - 1
else:
d2 = d2 + 1
x.parent.delta = d2
(p, l, r) = (x.parent, x.parent.left, x.parent.right)
if abs(d1) == 1 and abs(d2) == 0:
return t
elif abs(d1) == 0 and abs(d2) == 1:
x = x.parent
elif abs(d1)==1 and abs(d2) == 2:
if d2 == 2:
if r.delta == 1: # Right-right case
p.delta = 0
r.delta = 0
t = left_rotate(t, p)
if r.delta == -1: # Right-Left case
dy = r.left.delta
if dy == 1:
p.delta = -1
else:
p.delta = 0
r.left.delta = 0
if dy == -1:
r.delta = 1
else:
r.delta = 0
t = right_rotate(t, r)
t = left_rotate(t, p)
if d2 == -2:
if l.delta == -1: # Left-left case
p.delta = 0
l.delta = 0
t = right_rotate(t, p)
if l.delta == 1: # Left-right case
dy = l.right.delta
if dy == 1:
l.delta = -1
else:
l.delta = 0
l.right.delta = 0
if dy == -1:
p.delta = 1
else:
p.delta = 0
t = left_rotate(t, l)
t = right_rotate(t, p)
break
return t
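As a usage sketch (an assumption, since the Node class, set_left/set_right and the rotation routines from the red-black tree chapter must be available), an AVL tree can be built by repeated insertion and then traversed in order:

# Hypothetical usage of avl_insert; an in-order traversal of the result
# should produce the keys in sorted order.
def to_list(t):
    return [] if t is None else to_list(t.left) + [t.key] + to_list(t.right)

t = None
for key in [5, 2, 8, 1, 3, 6, 9, 7, 10]:
    t = avl_insert(t, key)

print(to_list(t))  # expect [1, 2, 3, 5, 6, 7, 8, 9, 10]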

We put the AVL tree deletion algorithm in Appendix C for reference.

4.6 Chapter note


The AVL tree was invented in 1962 by Adelson-Velskii and Landis[3], [4]. The name
AVL comes from the two inventors' names. It predates the red-black tree.
It's very common to compare the AVL tree and the red-black tree: both are self-
balancing binary search trees, and all the major operations take O(lg n) time for
both. From the result of (4.7), the AVL tree is more rigidly balanced,
hence it is faster than the red-black tree in lookup-intensive applications
[3]. However, red-black trees could perform better in cases with frequent insertion and
removal.
Many popular self-balancing binary search tree libraries, such as the STL, are
implemented on top of the red-black tree. However, the AVL tree provides an intuitive
and effective solution to the balance problem as well.
After this chapter, we'll extend the tree data structure from storing data in the
nodes to storing information on the edges. That leads to Radix trees. If we extend the
number of children from two to more, we get the B-tree. These data structures
will be introduced in the next chapters.
Bibliography

[1] Data.Tree.AVL. http://hackage.haskell.org/packages/archive/AvlTree/4.2/doc/html/Data-Tree-AVL.html

[2] Chris Okasaki. "FUNCTIONAL PEARLS Red-Black Trees in a Functional Setting". J. Functional Programming. 1998

[3] Wikipedia. "AVL tree". http://en.wikipedia.org/wiki/AVL_tree

[4] Guy Cousinear, Michel Mauny. "The Functional Approach to Programming". Cambridge University Press; English Ed edition (October 29, 1998). ISBN-13: 978-0521576819

[5] Pavel Grafov. "Implementation of an AVL tree in Python". http://github.com/pgrafov/python-avl-tree

Chapter 5

Radix tree, Trie and Prefix Tree

5.1 Introduction
The binary trees introduced so far store information in nodes. Edges can also
be used to store information. Radix trees, including the Trie and the prefix tree, are
important data structures for information retrieval and manipulation. They were
invented in the 1960s, and are widely used in compiler design[2] and in bioinformatics,
such as DNA pattern matching [3].

Figure 5.1: Radix tree.

Figure 5.1 shows a radix tree ([2] pp. 269). It contains the bit strings 1011,
10, 011, 100 and 0. When searching for a key k = (b0 b1 ...bn )2 , we take the first bit
b0 (MSB from the left) and check whether it is 0 or 1. If it is 0, we turn left; if it is 1,
we turn right. Then we take the second bit and repeat this search till we either meet
a leaf node or finish all the n bits.
The radix tree needn't store keys in the nodes at all. The information is
represented by the edges. The nodes marked with keys in the above figure are only for
illustration purposes.
Another idea is to represent the key as an integer instead of a string, because an
integer can be stored in binary format to save space. The speed is also fast, as we can
use bit-wise operations in most programming environments.

5.2 Integer Trie


The data structure shown in figure 5.1 is often called a binary trie. The trie was
invented by Edward Fredkin. The name comes from "retrieval", pronounced /'tri:/
by the inventor, while it is pronounced /'trai/, "try", by other authors [5]. The trie
is also called a prefix tree. A binary trie is a special binary tree in which the
placement of each key is controlled by its bits: each 0 means 'go left' and each
1 means 'go right' [2].
Because an integer can be represented in binary format, we can use it instead
of a 0, 1 string. When inserting a new integer to the trie, we change it to binary
form, then examine the first bit; if it is 0, we recursively insert the rest of the bits into
the left sub-tree; otherwise, if it is 1, we insert into the right sub-tree.
There is a problem when treating the key as an integer. Consider the binary trie
shown in figure 5.2. If represented as 0, 1 strings, all the three keys are different,
although they are equal as integers. Where should we insert decimal 3 into this trie?

Figure 5.2: A big-endian trie.

One approach is to treat all the prefix zeros as effective bits. Suppose the
integer is represented with 32 bits. If we want to insert key 1, it ends up with
a tree of 32 levels. There are 31 nodes, each with only a left sub-tree; the last
node only has a right sub-tree. It is very inefficient in terms of space.
Okasaki shows a method to solve this problem in [2]. Instead of using the big-
endian integer, we can use the little-endian integer to represent the key. Thus
decimal integer 1 is represented as binary 1. When we insert it into the empty binary
trie, the result is a trie with a root and a right leaf; there is only 1 level.
Decimal 2 is represented as 01, and decimal 3 is (11)2 in little-endian binary
format. There is no need to add any prefix 0, the position in the trie is uniquely
determined.
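The little-endian treatment simply consumes the key from its lowest bit upward. The small sketch below only illustrates this bit order; it is not part of the trie definition.

# List the bits of a key in little-endian order, lowest bit first.
def little_endian_bits(key):
    bits = []
    while key != 0:
        bits.append(key & 1)
        key >>= 1
    return bits

print(little_endian_bits(1))  # [1]
print(little_endian_bits(2))  # [0, 1]
print(little_endian_bits(3))  # [1, 1]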

5.2.1 Definition of integer Trie


We can use the binary tree structure to define the little-endian binary trie. A
binary trie node is either empty, or a branch. The branch node contains a left
child, a right child, and an optional value as the satellite data. The left sub-tree is
encoded as 0 and the right sub-tree is encoded as 1.
The following example Haskell code defines the integer trie as algebraic data
type.
data IntTrie a = Empty
| Branch (IntTrie a) (Maybe a) (IntTrie a)
The below Python example provides the corresponding imperative definition.
class IntTrie:
def __init__(self):
self.left = self.right = None
self.value = None

5.2.2 Insertion
Because the definition of the integer trie is recursive, it's straightforward to define
the insertion algorithm recursively. If the lowest bit is 0, the key to be inserted
is even, and we recursively insert it to the left sub-tree; otherwise, if the lowest bit
is 1, the key is odd, and the recursive insertion is applied to the right. We then
divide the key by 2 to get rid of the lowest bit. For a trie T , denote the left and
right sub-trees as Tl and Tr respectively. Thus T = (Tl , v ′ , Tr ), where v ′ is the
optional satellite data. If T is empty, then Tl , Tr and v ′ are defined as empty
as well.

insert(T, k, v) = (Tl , v, Tr )                        : k = 0
                  (insert(Tl , k/2, v), v ′ , Tr )      : even(k)
                  (Tl , v ′ , insert(Tr , ⌊k/2⌋, v))    : otherwise     (5.1)

If the key to be inserted already exists, this algorithm just overwrites the
previously stored data. This can be replaced with other alternatives, such as storing
the data in a linked list.
Figure 5.3 shows an example trie. It is generated by inserting the key-value
pairs {1 → a, 4 → b, 5 → c, 9 → d} into the empty trie.
The following Haskell example program implements the insertion algorithm.
insert t 0 x = Branch (left t) (Just x) (right t)
insert t k x
| even k = Branch (insert (left t) (k `div` 2) x) (value t) (right t)
| otherwise = Branch (left t) (value t) (insert (right t) (k `div` 2) x)

left (Branch l _ _) = l
left Empty = Empty

right (Branch _ _ r) = r
Figure 5.3: A little-endian integer binary trie for the map {1 → a, 4 → b, 5 → c, 9 → d}.

right Empty = Empty

value (Branch _ v _) = v
value Empty = Nothing

We can also define the insertion algorithm imperatively. As the key is
stored as a little-endian integer, when inserting a new key, we extract its bits one by
one from the lowest one. If a bit is 0, we go to the left; otherwise, for 1, we go to
the right. If the sub-tree is empty, we need to create a new node. We repeat this
up to the last bit of the key.
1: function Insert(T, k, v)
2: if T = NIL then
3: T ← Empty-Node
4: p←T
5: while k ̸= 0 do
6: if Even?(k) then
7: if Left(p) = NIL then
8: Left(p) ← Empty-Node
9: p ← Left(p)
10: else
11: if Right(p) = NIL then
12: Right(p) ← Empty-Node
13: p ← Right(p)
14: k ← ⌊k/2⌋
15: Data(p) ← v
16: return T
This algorithm takes 3 arguments, a Trie T , a key k, and the satellite data
v. The following example Python program implements the insertion algorithm.

It uses bit-wise operations to test whether a number is even or odd, and shifts
the bits to the right to divide by 2.
def insert(t, key, value = None):
if t is None:
t = IntTrie()
p=t
while key != 0:
if key & 1 == 0:
if p.left is None:
p.left = IntTrie()
p = p.left
else:
if p.right is None:
p.right = IntTrie()
p = p.right
key = key >> 1 # key / 2
p.value = value
return t
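As a usage sketch, the trie of figure 5.3 can be built with this insert function; the path taken for each key follows its little-endian bits (this example is an illustration, not part of the original text).

# Build the trie of figure 5.3 from the map {1: 'a', 4: 'b', 5: 'c', 9: 'd'}.
t = None
for key, value in [(1, 'a'), (4, 'b'), (5, 'c'), (9, 'd')]:
    t = insert(t, key, value)

# Key 4 is (100) in binary; its little-endian path is left, left, right.
print(t.left.left.right.value)  # 'b'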
For a given integer k with m bits in binary, the insertion algorithm goes down
m levels. The performance is bound to O(m) time.

5.2.3 Look up
To look up a key k in the little-endian integer binary trie: if the trie is empty, the
lookup fails; if k = 0, we return the data stored in the current node; if
the lowest bit is 0, we recursively look up the left sub-tree; otherwise we look up
the right sub-tree.

lookup(T, k) = ϕ                       : T = ϕ
               d                       : k = 0
               lookup(Tl , k/2)        : even(k)
               lookup(Tr , ⌊k/2⌋)      : otherwise     (5.2)
The following Haskell example program implements the recursive look up
algorithm.
search Empty k = Nothing
search t 0 = value t
search t k = if even k then search (left t) (k `div` 2)
else search (right t) (k `div` 2)
The lookup algorithm can also be realized imperatively. We examine each
bit of k from the lowest one. We go left if the bit is 0, otherwise we go right. The
lookup completes when all bits are consumed.
1: function Lookup(T, k)
2: while k ̸= 0 ∧ T ̸=NIL do
3: if Even?(k) then
4: T ← Left(T )
5: else
6: T ← Right(T )
7: k ← ⌊k/2⌋
8: if T ̸= NIL then
9: return Data(T )
10: else
11: return not found
The following example Python program implements the lookup algorithm.
def lookup(t, key):
while t is not None and key != 0:
if key & 1 == 0:
t = t.left
else:
t = t.right
key = key >> 1
return None if t is None else t.value

The lookup algorithm is bound to O(m) time, where m is the number of
bits of the key.

5.3 Integer prefix tree


The trie has some drawbacks. It occupies a lot of space. As shown in figure 5.3,
the real data is mostly stored in the leaves, and it's very common that an integer binary
trie contains many nodes which only have one child. One idea is to compress such
chained nodes into one. The integer prefix tree is such a data structure, invented by
Donald R. Morrison in 1968, who named it 'Patricia', which stands for Practical
Algorithm To Retrieve Information Coded In Alphanumeric[3]. It is another
kind of prefix tree. We call it the integer tree in this book.
Okasaki provided an implementation of the integer tree in [2]. If we merge the
chained nodes which have only one child together in figure 5.3, we get an
integer tree as shown in figure 5.4.

Figure 5.4: Little endian integer tree for the map {1 → a, 4 → b, 5 → c, 9 → d}.

From this figure, we can see that the key of a branch node is the longest common
prefix of its descendant trees; the two sub-trees branch out at a certain bit. The integer
tree saves a lot of space compared to the trie.
Different from the integer trie, padding bits of zero don't cause issues with the
big-endian integer tree. All zero bits before the MSB are omitted to save space.

Okasaki lists some significant advantages of the big-endian integer tree in [2].

5.3.1 Definition
The integer prefix tree is a special binary tree. It is either empty or a node. There
are two different types of node:
• A leaf contains an integer key and optional satellite data;
• Or a branch node with left and right sub-trees. The two children share
the longest common prefix bits of their keys. For the left child, the
next bit of the key is zero, while it's one for the right child.
The following Haskell example code defines integer tree accordingly.
type Key = Int
type Prefix = Int
type Mask = Int

data IntTree a = Empty


| Leaf Key a
| Branch Prefix Mask (IntTree a) (IntTree a)
In the branch node, we use a mask number to tell from which bit the sub-
trees differ. The mask is a power of 2, that is 2^n for some non-negative integer
n; all bits lower than bit n don't belong to the common prefix.
The following example Python code defines the integer tree with some auxiliary
functions.
class IntTree:
def __init__(self, key = 0, value = None):
self.key = key
self.value = value
self.prefix = key
self.mask = 1
self.left = self.right = None

def isleaf(self):
return self.left is None and self.right is None

def replace(self, x, y):


if self.left == x:
self.left = y
else:
self.right = y

def match(self, k):


return maskbit(k, self.mask) == self.prefix
Where match tests if the prefix stored in the node is the same as the given key
above the mask bit. It's explained in the next section.

5.3.2 Insertion
When inserting a key, if the tree is empty, we create a leaf node, as shown in figure
5.5.

Figure 5.5: Left: the empty tree; Right: After insert key 12.

If the tree is a singleton leaf node x, we create a new leaf y and put the key and
the value into it. After that, we need to create a new branch node and set x and y
as the two sub-trees. In order to determine whether y should be on the left or the right,
we need to find the longest common prefix of x and y. For example if key(x) is 12
((1100)2 in binary) and key(y) is 15 ((1111)2 in binary), then the longest common
prefix is (11oo)2 , where o denotes the bits we don't care about. We can use
another integer to mask those bits. In this case, the mask number is 4 (100 in
binary). The next bit after the longest common prefix represents 2^1. This bit is
0 in key(x), while it is 1 in key(y). We set x as the left sub-tree and y
as the right sub-tree. Figure 5.6 shows this example.
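The prefix and mask in this example can be computed directly with bit operations. The following sketch mirrors the lcp and maskbit functions defined later in this section; it is only a worked illustration of the 12/15 example.

# Worked example for keys 12 (1100) and 15 (1111): compute the longest
# common prefix and the mask, in the same way as lcp/maskbit below.
def common_prefix(p1, p2):
    diff = p1 ^ p2
    mask = 1
    while diff != 0:
        diff >>= 1
        mask <<= 1
    return (p1 & ~(mask - 1), mask)

print(common_prefix(12, 15))  # (12, 4): prefix (1100)2, mask (100)2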

Figure 5.6: Left: A tree with a singleton leaf 12; Right: After insert key 15.

In case the tree is neither empty nor a singleton leaf, we first check whether the
key to be inserted matches the longest common prefix recorded in the root, then
recursively insert the key to the left or right sub-tree according to the next bit after
the common prefix. For example, if we insert key 14 ((1110)2 in binary) into
the resulting tree in figure 5.6, since the common prefix is (11oo)2 and the next
bit (the bit of 2^1) is 1, we recursively insert into the right sub-tree.
If the key to be inserted doesn't match the longest common prefix in the
root, we need to branch out a new leaf. Figure 5.7 shows these two different cases.
For a given key k and value v, denote the leaf node as (k, v). For a branch
node, denote it in the form (p, m, Tl , Tr ), where p is the longest common prefix,
m is the mask, and Tl and Tr are the left and right sub-trees. Summarizing the above
cases, the insertion algorithm can be defined as below.

insert(T, k, v) =
  (k, v)                          : T = ϕ ∨ T = (k, v ′ )
  join(k, (k, v), k ′ , T )       : T = (k ′ , v ′ )
  (p, m, insert(Tl , k, v), Tr )  : T = (p, m, Tl , Tr ), match(k, p, m), zero(k, m)
  (p, m, Tl , insert(Tr , k, v))  : T = (p, m, Tl , Tr ), match(k, p, m), ¬zero(k, m)
  join(k, (k, v), p, T )          : T = (p, m, Tl , Tr ), ¬match(k, p, m)
  (5.3)
The first clause deals with the edge cases: if T is empty, the result is a leaf
node; if T is a leaf node with the same key, we overwrite the previous value.

(a) Insert key 14. It matches the longest common prefix (1100)2 ; 14 is then recursively inserted to the right sub-tree.
(b) Insert key 5. It doesn't match the longest common prefix (1100)2 , a new leaf is branched out.

Figure 5.7: Insert a key into the branch node.



The second clause handles the case where T is a leaf node, but with a different
key. Here we branch out another leaf, extract the longest common
prefix, and determine which leaf should be set as the left sub-tree. Function
join(k1 , T1 , k2 , T2 ) does this work. We'll define it later.
The third clause deals with the case where T is a branch node, the longest
common prefix matches the key to be inserted, and the next bit after the common
prefix is zero. Here we recursively insert into the left sub-tree.
The fourth clause handles a similar case to the third one, except that the
next bit after the common prefix is one instead of zero. We recursively insert
into the right sub-tree.
The last clause is for the case where the key to be inserted doesn't match the
longest common prefix in the branch. We branch out a new leaf by calling
the join function.
We need to define the function match(k, p, m) to test if the key k has the same
prefix p above the masked bits m. For example, suppose the prefix stored in a
branch node is (pn pn−1 ...pi ...p0 )2 in binary, the key k is (kn kn−1 ...ki ...k0 )2 in binary,
and the mask is (100...0)2 = 2^i. They match if and only if pj = kj for all j
with i ≤ j ≤ n.
One solution to realize match is to test whether mask(k, m) = p, where
mask(x, m) = ¬(m − 1) & x, that is, we perform bitwise-not of m − 1, then perform
bitwise-and with x.
Function zero(k, m) tests whether the bit next to the common prefix is zero. With
the help of the mask m, we shift m one bit to the right, then perform
bitwise-and with the key.

zero(k, m) = (k & shiftr(m, 1) = 0) (5.4)


If the mask m = (100..0)2 = 2i , k = (kn kn−1 ...ki 1...k0 )2 , because the bit
next to ki is 1, zero(k, m) returns false value; if k = (kn kn−1 ...ki 0...k0 )2 , then
the result is true.
Function join(p1 , T1 , p2 , T2 ) takes two different prefixes and trees. It extracts
the longest common prefix of p1 and p2 , creates a new branch node, and sets T1
and T2 as the two sub-trees.

join(p1 , T1 , p2 , T2 ) = (p, m, T1 , T2 ) : zero(p1 , m)
                          (p, m, T2 , T1 ) : ¬zero(p1 , m)
where (p, m) = LCP (p1 , p2 )
(5.5)
In order to calculate the longest common prefix of p1 and p2 , we can first
compute their bitwise exclusive-or, then count the number of bits in this
result, and generate a mask m = 2^|xor(p1 , p2 )|. The longest common prefix p can
be given by masking the bits of either p1 or p2 with m.

p = mask(p1 , m) (5.6)
The following Haskell example program implements the insertion algorithm.
import Data.Bits

insert t k x
= case t of
Empty → Leaf k x
Leaf k' x' → if k==k' then Leaf k x


else join k (Leaf k x) k' t -- t@(Leaf k' x')
Branch p m l r
| match k p m → if zero k m
then Branch p m (insert l k x) r
else Branch p m l (insert r k x)
| otherwise → join k (Leaf k x) p t -- t@(Branch p m l r)

join p1 t1 p2 t2 = if zero p1 m then Branch p m t1 t2


else Branch p m t2 t1
where
(p, m) = lcp p1 p2

lcp :: Prefix → Prefix → (Prefix, Mask)


lcp p1 p2 = (p, m) where
m = bit (highestBit (p1 `xor` p2))
p = mask p1 m

highestBit x = if x == 0 then 0 else 1 + highestBit (shiftR x 1)

mask x m = (x .&. complement (m-1)) -- complement means bit-wise not.

zero x m = x .&. (shiftR m 1) == 0

match k p m = (mask k m) == p

The insertion algorithm can also be realized imperatively.


1: function Insert(T, k, v)
2: if T = NIL then
3: T ← Create-Leaf(k, v)
4: return T
5: y←T
6: p ← NIL
7: while y is not leaf, and Match(k, Prefix(y), Mask(y)) do
8: p←y
9: if Zero?(k, Mask(y)) then
10: y ← Left(y)
11: else
12: y ← Right(y)
13: if y is leaf, and k = Key(y) then
14: Data(y) ← v
15: else
16: z ← Branch(y, Create-Leaf(k, v))
17: if p = NIL then
18: T ←z
19: else
20: if Left(p) = y then
21: Left(p) ← z
22: else
23: Right(p) ← z
24: return T

Function Branch(T1 , T2 ) does a similar job as join. It creates a new branch node,
extracts the longest common prefix, and sets T1 and T2 as the two sub-trees.
1: function Branch(T1 , T2 )
2: T ← Empty-Node
3: ( Prefix(T ), Mask(T ) ) ← LCP(Prefix(T1 ), Prefix(T2 ))
4: if Zero?(Prefix(T1 ), Mask(T )) then
5: Left(T ) ← T1
6: Right(T ) ← T2
7: else
8: Left(T ) ← T2
9: Right(T ) ← T1
10: return T
The following Python example program implements the insertion algorithm.
def insert(t, key, value):
if t is None:
return IntTree(key, value)
node = t
parent = None
while (not node.isleaf()) and node.match(key):
parent = node
if zero(key, node.mask):
node = node.left
else:
node = node.right
if node.isleaf() and key == node.key:
node.value = value
else:
p = branch(node, IntTree(key, value))
if parent is None:
return p
parent.replace(node, p)
return t
The auxiliary functions, branch, lcp etc. are given as below.
def maskbit(x, mask):
return x & (~(mask - 1))

def zero(x, mask):


return x & (mask >> 1) == 0

def lcp(p1, p2):


diff = p1 ^ p2
mask = 1
while diff != 0:
diff >>= 1
mask <<= 1
return (maskbit(p1, mask), mask)

def branch(t1, t2):


t = IntTree()
(t.prefix, t.mask) = lcp(t1.prefix, t2.prefix)
if zero(t1.prefix, t.mask):
t.left, t.right = t1, t2
else:
t.left, t.right = t2, t1
return t
Figure 5.8 shows the example integer tree created with the insertion algo-
rithm.
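As a usage sketch (an illustration only, not part of the original text), the tree can be built from the same map and inspected; the root prefix and mask, and the key-value pairs collected from the leaves, follow from the functions above.

# Build the integer prefix tree from the map {1: 'x', 4: 'y', 5: 'z'}.
t = None
for key, value in [(1, 'x'), (4, 'y'), (5, 'z')]:
    t = insert(t, key, value)

def to_assoc(t):
    # Collect the (key, value) pairs stored in the leaves, left to right.
    if t is None:
        return []
    if t.isleaf():
        return [(t.key, t.value)]
    return to_assoc(t.left) + to_assoc(t.right)

print(t.prefix, t.mask)  # 0 8
print(to_assoc(t))       # [(1, 'x'), (4, 'y'), (5, 'z')]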

Figure 5.8: Insert the map 1 → x, 4 → y, 5 → z into the big-endian integer prefix tree.

5.3.3 Look up
If the integer tree T is empty, or it is a singleton leaf whose key is different
from the one we are looking up, the result is empty. If the key in the leaf is equal,
we are done. If T is a branch node, we check whether the common prefix matches
the key, and recursively look up the corresponding sub-tree according to the next bit.
If the common prefix doesn't match the key, the lookup fails.

lookup(T, k) =
  ϕ               : T = ϕ ∨ (T = (k ′ , v), k ′ ̸= k)
  v               : T = (k ′ , v), k ′ = k
  lookup(Tl , k)  : T = (p, m, Tl , Tr ), match(k, p, m), zero(k, m)
  lookup(Tr , k)  : T = (p, m, Tl , Tr ), match(k, p, m), ¬zero(k, m)
  ϕ               : otherwise
  (5.7)
The following Haskell example program implements this recursive lookup
algorithm.
search t k
= case t of
Empty → Nothing
Leaf k' x → if k == k' then Just x else Nothing
Branch p m l r
| match k p m → if zero k m then search l k
else search r k
| otherwise → Nothing

The lookup algorithm can also be realized imperatively. Consider the property
of the integer prefix tree: when looking up a key, if it has a common prefix with
the root, we check the next bit. If this bit is zero, we recursively look
up the left sub-tree; otherwise, we look up the right sub-tree.
When we arrive at a leaf node, we check whether the key of the leaf equals
the one we are looking up.
1: function Look-Up(T, k)
2: if T = NIL then
3: return N IL ▷ Not found
4: while T is not leaf, and Match(k, Prefix(T ), Mask(T )) do
5: if Zero?(k, Mask(T )) then
6: T ← Left(T )
7: else
8: T ← Right(T )
9: if T is leaf, and Key(T ) = k then
10: return Data(T )
11: else
12: return N IL ▷ Not found
The following Python example program implements the lookup algorithm.
def lookup(t, key):
while t is not None and (not t.isleaf()) and t.match(key):
if zero(key, t.mask):
t = t.left
else:
t = t.right
if t is not None and t.isleaf() and t.key == key:
return t.value
return None

5.4 Alphabetic Trie


The integer based trie and tree can be a good starting point. The Glasgow Haskell
Compiler (GHC) utilized a similar integer tree implementation for several
years before 1998[2].
If we extend the key from integer to alphabetic values, the Trie and the integer tree
can be very powerful in solving textual manipulation problems.

5.4.1 Definition
It’s not enough to just use the left and right sub-trees to represent alphabetic
keys. Taking English for example, there are 26 letters. If we don’t care about
the case, one solution is to limit the number of branches (children) to 26. Some
simplified implementation defines the trie with the array of 26 letters. This can
be illustrated as in Figure 5.9.
Not all the 26 branches contain data. For instance, in Figure 5.9, the root
only has three non-empty branches representing letter ’a’, ’b’, and ’z’. Other
branches such as for letter ’c’, are all empty. We will not show empty branch in
the future.

Figure 5.9: A trie with 26 branches, containing the keys 'a', 'an', 'another', 'bool', 'boy' and 'zoo'.

When dealing with case-sensitive problems, or handling languages other than
English, there can be more letters. We can use a collection data structure,
such as a hash map, to define the trie.
Alphabetic trie is either empty or a node. There are two types of node.

• A leaf node does not have any sub-trees;

• A branch node contains multiple sub-trees. Each sub-tree is bound to a


character.

Both leaf and branch can contain optional satellite data. The following
Haskell code shows the example definition.
data Trie a = Trie { value :: Maybe a
, children :: [(Char, Trie a)]}

empty = Trie Nothing []

The following ANSI C example code defines the alphabetic trie. For illustration
purposes, it limits the character set to lower case English letters, from 'a' to 'z'.
struct Trie {
struct Trie∗ children[26];
void∗ data;
};
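As mentioned above, a collection such as a hash map can replace the fixed 26-slot array. One possible Python sketch is given below; the class name and fields are illustrative assumptions, and the rest of this section keeps the array/list based definitions.

# A sketch of an alphabetic trie node whose children are kept in a dictionary
# instead of a fixed array of 26 slots; illustrative only.
class MapTrie:
    def __init__(self):
        self.children = {}  # maps a character to a child MapTrie
        self.value = None

def map_insert(t, key, value):
    if t is None:
        t = MapTrie()
    p = t
    for c in key:
        if c not in p.children:
            p.children[c] = MapTrie()
        p = p.children[c]
    p.value = value
    return t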

5.4.2 Insertion
When inserting into the trie, denote the key to be inserted as K = k1 k2 ...kn , where
ki is the i-th character; K ′ is the rest of the characters except k1 , and v ′ is the data to
be inserted. The trie is in the form T = (v, C), where v is the data stored in the trie,
and C = {(c1 , T1 ), (c2 , T2 ), ..., (cm , Tm )} is the collection of sub-trees. It associates
a character ci with the corresponding sub-tree Ti . C is empty for a leaf node.
insert(T, K, v ′ ) = (v ′ , C)                      : K = ϕ
                     (v, ins(C, k1 , K ′ , v ′ ))   : otherwise     (5.8)
If the key is empty, the previous value v is overwritten with v ′ . Otherwise,
we need to check the children and perform recursive insertion. This is realized in
the function ins(C, k1 , K ′ , v ′ ). It examines the (character, sub-tree) pairs in C one
by one. Let C ′ be the rest of the pairs except for the first one. This function can
be defined as below.

ins(C, k1 , K ′ , v ′ ) = {(k1 , insert((ϕ, ϕ), K ′ , v ′ ))}     : C = ϕ
                          {(k1 , insert(T1 , K ′ , v ′ ))} ∪ C ′   : k1 = c1
                          {(c1 , T1 )} ∪ ins(C ′ , k1 , K ′ , v ′ ) : otherwise     (5.9)

If C is empty, we create a pair mapping the character k1 to a new empty
tree (it is not ϕ, but a node with an empty value and an empty sub-tree list), and
recursively insert the rest of the characters. Otherwise, the algorithm locates the child
which is mapped from k1 for further insertion.
The following Haskell example program implements the insertion algorithm.

insert t [] x = Trie (Just x) (children t)
insert t (k:ks) x = Trie (value t) (ins (children t) k ks x) where
ins [] k ks x = [(k, (insert empty ks x))]
ins (p:ps) k ks x = if fst p == k
then (k, insert (snd p) ks x):ps
else p:(ins ps k ks x)

To realize the insertion imperatively, starting from the root, we pick the
characters one by one from the string. For each character, we examine which
child sub-tree represents that character. If the corresponding child is empty,
a new node is created. After that, we pick the next character and repeat this
process.
After consuming all the characters, we store the value bound to the key
in the node we arrived at.
1: function Insert(T, k, v)
2: if T = NIL then
3: T ← Empty-Node
4: p←T
5: for each c in k do
6: if Children(p)[c] = NIL then
7: Children(p)[c] ← Empty-Node
8: p ← Children(p)[c]
9: Data(p) ← v
10: return T
The following example ANSI C program implements the insertion algorithm.
struct Trie∗ insert(struct Trie∗ t, const char∗ key, void∗ value) {
int c;
struct Trie ∗p;
if(!t)
t = create_node();
for (p = t; ∗key; ++key, p = p→children[c]) {
c = ∗key - 'a';
if (!p→children[c])
p→children[c] = create_node();
}
p→data = value;
return t;
}

Where the function create_node creates a new empty node, with all children
initialized to empty.
struct Trie∗ create_node() {
struct Trie∗ t = (struct Trie∗) malloc(sizeof(struct Trie));
int i;
for (i = 0; i < 26; ++i)
t→children[i] = NULL;
t→data = NULL;
return t;
}

5.4.3 Look up
When looking up a key, we start from the first character; if it is bound to some
sub-tree, we recursively search for the rest of the characters in that child sub-tree.
Denote the trie as T = (v, C), and the key being looked up as K = k1 k2 ...kn if it
isn't empty. The first character in the key is k1 , and the rest of the characters are
represented as K ′ .

lookup(T, K) = v                     : K = ϕ
               ϕ                     : find(C, k1 ) = ϕ
               lookup(T ′ , K ′ )    : find(C, k1 ) = T ′     (5.10)

Where the function find(C, k) examines the character-tree pairs one by one to
check if any child sub-tree is bound to the character k. If the list of pairs C is empty,
then the subject key does not exist. Otherwise, let C = {(k1 , T1 ), (k2 , T2 ), ..., (km , Tm )},
where the first sub-tree T1 is bound to k1 and the rest of the pairs are represented as C ′ . We
consume the pairs one by one to locate the sub-tree for further search. The below
equation defines the find function.

find(C, k) = ϕ              : C = ϕ
             T1             : k1 = k
             find(C ′ , k)  : otherwise     (5.11)
The following Haskell example program implements the trie lookup algorithm.
It uses the lookup function provided in the standard library.
find t [] = value t
find t (k:ks) = case lookup k (children t) of
Nothing → Nothing
Just t' → find t' ks

To realize the lookup algorithm imperatively, we extract the characters from
the key one by one. For each character, we search among the sub-trees to see if
there is a branch that matches this character. If there is no such child, the lookup
process terminates, indicating that the key does not exist. When we arrive at
the last character of the key, the data stored in the current node is the result.
1: function Look-Up(T, key)
2: if T = NIL then
3: return not found
4: for each c in key do
5: if Children(T )[c] = NIL then
6: return not found
7: T ← Children(T )[c]
8: return Data(T )
The following ANSI C example program implements the lookup algorithm. It
returns NULL if the key does not exist.
void∗ lookup(struct Trie∗ t, const char∗ key) {
while (∗key && t && t→children[∗key - 'a'])
t = t→children[∗key++ - 'a'];
return (∗key || !t) ? NULL : t→data;
}

Exercise 5.1

• Use a collection data structure to manage sub-trees in the imperative


alphabetic trie. How does the collection impact the performance?

5.5 Alphabetic prefix tree


Similar to integer trie, alphabetic trie is not memory efficient. We can use the
same approach to compress alphabetic trie to prefix tree.

5.5.1 Definition
The alphabetic prefix tree is a special prefix tree in which each node contains multiple
branches. All sub-trees of a node share the longest common prefix string. As
a result, no node has only one child, because that would conflict with the
longest common prefix property.
If we turn the trie shown in figure 5.9 into a prefix tree by compressing all
nodes which have only one child, we get the prefix tree in figure 5.10.

Figure 5.10: A prefix tree, with keys: ’a’, ’an’, ’another’, ’bool’, ’boy’ and ’zoo’.

We can modify the alphabetic trie and adapt it to the prefix tree. The tree is
either empty, or a node in the form T = (v, C), where v is the optional satellite
data and C = {(s1 , T1 ), (s2 , T2 ), ..., (sn , Tn )} represents the sub-trees. It is a list of
pairs; each pair contains a string si and the sub-tree Ti that the string is bound to.
The following Haskell example code defines prefix tree accordingly.
data PrefixTree k v = PrefixTree { value :: Maybe v
, children :: [([k], PrefixTree k v)]}

empty = PrefixTree Nothing []

leaf x = PrefixTree (Just x) []

The following Python example program reuses the trie definition to define the
prefix tree.

class PrefixTree:
def __init__(self, value = None):
self.value = value
self.subtrees = {}

5.5.2 Insertion
When inserting a key s, if the prefix tree is empty, we create a leaf node, as shown
in figure 5.11 (a). Otherwise, we examine the sub-trees to see if there is some
tree Ti bound to a string si such that there exists a common prefix between si and
s. In that case, we need to branch out a new leaf Tj . To do this, we first create
a new internal branch node and bind it to the common prefix; then we set Ti and Tj
as the two children sub-trees of this node. Ti and Tj share the common prefix.
This is shown in figure 5.11 (b). There are two special cases: s can be a prefix
of si , as shown in figure 5.11 (c); similarly, si can be a prefix of s, as shown in
figure 5.11 (d).
For a prefix tree T = (v, C), the function insert(T, k, v ′ ) inserts key k and value
v ′ into the tree.

insert(T, k, v ′ ) = (v, ins(C, k, v ′ )) (5.12)


This function calls another function ins(C, k, v ′ ). If the collection of children sub-trees C
is empty, a new leaf is created; otherwise we examine the sub-trees one by one.
Denote C = {(k1 , T1 ), (k2 , T2 ), ..., (kn , Tn )}, and let C ′ hold all the (prefix, sub-tree)
pairs except for the first one. The ins function can be defined as the following.

ins(C, k, v ′ ) = {(k, (v ′ , ϕ))}                       : C = ϕ
                  {(k, (v ′ , CT1 ))} ∪ C ′              : k1 = k
                  {branch(k, v ′ , k1 , T1 )} ∪ C ′      : match(k1 , k)
                  {(k1 , T1 )} ∪ ins(C ′ , k, v ′ )      : otherwise     (5.13)

The first clause deals with the edge case of empty children: a leaf node
bound to k, containing v ′ , is returned as the only sub-tree. The second clause
overwrites the previous value with v ′ if there is some child bound to the same
key; CT1 represents the children of the sub-tree T1 . The third clause branches out
a new leaf if the first child matches the key k. The last clause goes on checking
the rest of the sub-trees.
We define two keys A and B as matching if they have a non-empty common
prefix.

match(A, B) = A ̸= ϕ ∧ B ̸= ϕ ∧ a1 = b1 (5.14)
Where a1 and b1 are the first characters in A and B if they are not empty.
Function branch(k1 , v, k2 , T2 ) takes two keys, a value and a tree. It extracts
the longest common prefix k = lcp(k1 , k2 ), and assigns the different parts to
k1′ = k1 − k and k2′ = k2 − k. The algorithm first handles the edge cases in which
either k1 is a prefix of k2 or k2 is a prefix of k1 . For the former, it creates
a new node containing v, binds this node to k, and sets (k2′ , T2 ) as the only child
sub-tree; for the latter, it recursively inserts k1′ and v into T2 . Otherwise, the
algorithm creates a branch node, binds it to the longest common prefix k, and

(a) Insert key 'boy' into the empty prefix tree, the result is a leaf.
(b) Insert key 'bool'. A new branch with common prefix 'bo' is created.
(c) Insert key 'an' with value y into x with prefix 'another'.
(d) Insert 'another' into the node with prefix 'an'. We recursively insert key 'other' to the child.

Figure 5.11: Prefix tree insertion



sets the two children sub-trees for it. One sub-tree is (k2′ , T2 ), the other is a leaf
node containing v, bound to k1′ .

branch(k1 , v, k2 , T2 ) = (k, (v, {(k2′ , T2 )}))                       : k = k1
                           (k, insert(T2 , k1′ , v))                     : k = k2
                           (k, (ϕ, {(k1′ , (v, ϕ)), (k2′ , T2 )}))       : otherwise     (5.15)

Where

k = lcp(k1 , k2 )
k1′ = k1 − k
k2′ = k2 − k
Function lcp(A, B) keeps taking the same characters from A and B one by
one. Denote a1 and b1 as the first characters in A and B if they are not empty.
A′ and B ′ are the rest characters.

lcp(A, B) = ϕ                           : A = ϕ ∨ B = ϕ ∨ a1 ̸= b1
            {a1 } ∪ lcp(A′ , B ′ )      : a1 = b1     (5.16)

The following Haskell example program implements the prefix tree insertion
algorithm.
import Data.List (isPrefixOf)

insert :: Eq k ⇒ PrefixTree k v → [k] → v → PrefixTree k v


insert t ks x = PrefixTree (value t) (ins (children t) ks x) where
ins [] ks x = [(ks, leaf x)]
ins (p@(ks', t') : ps) ks x
| ks' == ks
= (ks, PrefixTree (Just x) (children t')) : ps -- overwrite
| match ks' ks
= (branch ks x ks' t') : ps
| otherwise
= p : (ins ps ks x)

match x y = x /= [] && y /= [] && head x == head y

branch :: Eq k ⇒ [k] → v → [k] → PrefixTree k v → ([k], PrefixTree k v)


branch ks1 x ks2 t2
| ks1 == ks
-- ex: insert "an" into "another"
= (ks, PrefixTree (Just x) [(ks2', t2)])
| ks2 == ks
-- ex: insert "another" into "an"
= (ks, insert t2 ks1' x)
| otherwise = (ks, PrefixTree Nothing [(ks1', leaf x), (ks2', t2)])
where
ks = lcp ks1 ks2
m = length ks
ks1' = drop m ks1
ks2' = drop m ks2
lcp :: Eq k ⇒ [k] → [k] → [k]


lcp [] _ = []
lcp _ [] = []
lcp (x:xs) (y:ys) = if x==y then x : (lcp xs ys) else []

The insertion algorithm can be realized imperatively as below.


1: function Insert(T, k, v)
2: if T = NIL then
3: T ← Empty-Node
4: p←T
5: loop
6: match ← FALSE
7: for each (si , Ti ) ∈ Children(p) do
8: if k = si then
9: Value(p) ← v
10: return T
11: c ← LCP(k, si )
12: k1 ← k − c
13: k2 ← si − c
14: if c ̸= NIL then
15: match ← TRUE
16: if k2 = NIL then ▷ si is prefix of k
17: p ← Ti
18: k ← k1
19: break
20: else ▷ Branch out a new leaf
21: Add(Children(p), (c, Branch(k1 , Leaf(v), k2 , Ti )))
22: Delete(Children(p), (si , Ti ))
23: return T
24: if ¬match then ▷ Add a new leaf
25: Add(Children(p), (k, Leaf(v)))
26: break
27: return T
In this algorithm, function LCP finds the longest common prefix of two
strings. For example, strings ‘bool’ and ‘boy’ have the longest common prefix
‘bo’. The subtraction symbol ’-’ for strings gives the different part of two strings.
For example ‘bool’ - ‘bo’ = ‘ol’. Function Branch creates a branch node and
updates the keys accordingly.
The longest common prefix can be extracted character by character from
the two strings until a mismatch is found.
1: function LCP(A, B)
2: i←1
3: while i ≤ |A| ∧ i ≤ |B| ∧ A[i] = B[i] do
4: i←i+1
5: return A[1...i − 1]
There are two cases when branching out a new leaf. Branch(s1 , T1 , s2 , T2 )
takes two different keys and two trees. If s1 is empty, we are dealing with the case
of inserting, for instance, key ‘an’ into a child bound to string ‘another’. We set T2 as the
child sub-tree of T1 . Otherwise, we create a new branch node and set T1 and

T2 as the two children.


1: function Branch(s1 , T1 , s2 , T2 )
2: if s1 = ϕ then
3: Add(Children(T1 ), (s2 , T2 ))
4: return T1
5: T ← Empty-Node
6: Children(T ) ← {(s1 , T1 ), (s2 , T2 )}
7: return T
The following example Python program implements the prefix tree insertion
algorithm.
def insert(t, key, value):
    if t is None:
        t = PrefixTree()
    node = t
    while True:
        match = False
        for k, tr in node.subtrees.items():
            if key == k:  # overwrite the value of the matching child
                tr.value = value
                return t
            prefix, k1, k2 = lcp(key, k)
            if prefix != "":
                match = True
                if k2 == "":
                    # e.g.: insert "another" into "an", go on traversing
                    node = tr
                    key = k1
                    break
                else:  # branch out a new leaf
                    node.subtrees[prefix] = branch(k1, PrefixTree(value), k2, tr)
                    del node.subtrees[k]
                    return t
        if not match:  # add a new leaf
            node.subtrees[key] = PrefixTree(value)
            break
    return t

Where the lcp and branch functions are implemented as below.


def lcp(s1, s2):
    j = 0
    while j < len(s1) and j < len(s2) and s1[j] == s2[j]:
        j += 1
    return (s1[0:j], s1[j:], s2[j:])
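
For a quick check, the following values follow directly from the definition of lcp:

# lcp returns (common prefix, rest of s1, rest of s2)
print(lcp("bool", "boy"))      # ('bo', 'ol', 'y')
print(lcp("an", "another"))    # ('an', '', 'other')
print(lcp("bool", "foo"))      # ('', 'bool', 'foo')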

def branch(key1, tree1, key2, tree2):
    if key1 == "":
        # example: insert "an" into "another"
        tree1.subtrees[key2] = tree2
        return tree1
    t = PrefixTree()
    t.subtrees[key1] = tree1
    t.subtrees[key2] = tree2
    return t
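
The PrefixTree class itself is defined earlier in this chapter. For reference, a minimal
sketch that is sufficient to run the listings in this section might look like the following
(the attribute names value and subtrees are the ones assumed by the code above):

class PrefixTree:
    def __init__(self, value=None):
        self.value = value     # value bound to this node, None if absent
        self.subtrees = {}     # maps key fragment (string) -> PrefixTree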

5.5.3 Look up
When looking up a key, we can’t examine the characters one by one as in the trie any
more. Starting from the root, we need to search among the children sub-trees to see
if any one is bound to some prefix of the key. If there is such a sub-tree, we
remove the prefix from the key, and recursively look up the updated key in this
child sub-tree. The look up fails if there is no sub-tree bound to any prefix of
the key.
For prefix tree T = (v, C), we search among its children sub-tree C.

lookup(T, k) = f ind(C, k) (5.17)


If C is empty, the lookup fails; otherwise, for C = {(k1, T1), (k2, T2), ..., (kn, Tn)},
we first check whether k1 equals k or is a prefix of k; if not, we recursively check
the rest pairs, denoted as C′.

find(C, k) = | ϕ                      : C = ϕ
             | vT1                    : k = k1
             | lookup(T1, k − k1)     : k1 ⊏ k
             | find(C′, k)            : otherwise
                                                        (5.18)

Where vT1 is the value stored in sub-tree T1, and A ⊏ B means string A is a prefix of B.
find mutually calls lookup if a child is bound to some prefix of the key.
Below Haskell example program implements the looking up algorithm.
find :: Eq k ⇒ PrefixTree k v → [k] → Maybe v
find t = find' (children t) where
find' [] _ = Nothing
find' (p@(ks', t') : ps) ks
| ks' == ks = value t'
| ks' `isPrefixOf` ks = find t' (diff ks ks')
| otherwise = find' ps ks
diff ks1 ks2 = drop (length (lcp ks1 ks2)) ks1

The look up algorithm can also be realized imperatively.


1: function Look-Up(T, k)
2: if T = NIL then
3: return not found
4: repeat
5: match ← FALSE
6: for ∀(ki , Ti ) ∈ Children(T ) do
7: if k = ki then
8: return Data(Ti )
9: if ki is prefix of k then
10: match ← TRUE
11: k ← k − ki
12: T ← Ti
13: break
14: until ¬match
15: return not found

Below Python example program implements the looking up algorithm. It


reuses the lcp(s1, s2) function defined previously to test if a string is the
prefix of the other.
def lookup(t, key):
    if t is None:
        return None
    while True:
        match = False
        for k, tr in t.subtrees.items():
            if k == key:
                return tr.value
            prefix, k1, k2 = lcp(key, k)
            if prefix != "" and k2 == "":
                match = True
                key = k1
                t = tr
                break
        if not match:
            break
    return None
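
Putting insert and lookup together, a small usage sketch (the keys and values are
arbitrary examples):

t = None
for word, meaning in [("a", "the first letter"), ("an", "article"),
                      ("another", "one more"), ("bool", "boolean"),
                      ("boy", "male child")]:
    t = insert(t, word, meaning)

print(lookup(t, "another"))   # 'one more'
print(lookup(t, "bo"))        # None, 'bo' is only an internal branch prefix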

5.6 Applications of trie and prefix tree


Trie and prefix tree can be used to solve many interesting problems. Integer
based prefix trees are used in compiler implementation. Some daily used software
applications have many interesting features which can be realized with a trie or
prefix tree. In this section, we give some examples, including the e-dictionary,
word auto-completion, and the T9 input method. Different from commercial
implementations, the solutions demonstrated here are for illustration purposes
only.

5.6.1 E-dictionary and word auto-completion


Figure 5.12 shows a screen shot of an e-dictionary. When the user enters characters,
the dictionary searches its word library, then lists the candidate words and
phrases starting with what the user has input.
An e-dictionary typically contains hundreds of thousands of words. It’s very
expensive to perform a complete search. Commercial software adopts complex
approaches, including caching, indexing etc. to speed up this process.
Similar to the e-dictionary, figure 5.13 shows a popular Internet search engine.
When the user inputs something, it provides a candidate list, with all items starting
with what the user has entered1 . The candidates are shown in order
of popularity: the more people search for an item, the higher its position in the list.
In both cases, the software provides a kind of word auto-completion mech-
anism. Some editors can also help programmers to auto-complete code.
Let’s see how to implement the e-dictionary with a prefix tree. To simplify the
problem, we assume the dictionary only supports English - English information.
1 It’s more complex than just matching the prefix, including spell checking, auto

correction, key word extraction and recommendation etc.


Figure 5.12: E-dictionary. All candidates starting with what the user input are
listed.

Figure 5.13: A search engine. All candidates starting with what the user input are
listed.

A dictionary stores key-value pairs; the key is an English word or phrase, and the
value is its meaning described in text.
We could store all the words and their meanings in a trie, but it consumes too
much space, especially when there is a huge number of items. We’ll use a prefix
tree to realize the e-dictionary.
When the user wants to look up the word ’a’, the dictionary does not only return the
meaning of ’a’, but also provides a list of candidates starting with ’a’, including
’abandon’, ’about’, ’accent’, ’adam’, ... Of course all these words are stored in
the prefix tree.
If there are too many candidates, we can limit the display to the top 10
candidates, and allow the user to browse more.
To define this algorithm: if the string we are looking for is empty, we ex-
pand all children sub-trees until getting n candidates. Otherwise we recursively
examine the children to find one which is bound to a prefix of this string.
In programming environments supporting lazy evaluation, an intuitive so-
lution is to lazily expand all candidates, and take the first n on demand. Denoting
the prefix tree in the form T = (v, C), the function below enumerates all items starting
with key k.


findAll(T, k) = | enum(C)                : k = ϕ, v = ϕ
                | {(ϕ, v)} ∪ enum(C)     : k = ϕ, v ≠ ϕ
                | find(C, k)             : k ≠ ϕ
                                                        (5.19)

The first two clauses deal with the edge cases where the key is empty: all the
children sub-trees are enumerated, and the value of the current node is included
when it is not empty. The last clause finds the child sub-tree matching k.
For non-empty children sub-trees C = {(k1, T1), (k2, T2), ..., (km, Tm)}, de-
note the rest pairs except for the first one as C′. The enumeration algorithm
can be defined as below.

enum(C) = | ϕ                                          : C = ϕ
          | mapAppend(k1, findAll(T1, ϕ)) ∪ enum(C′)   : otherwise
                                                        (5.20)

Where mapAppend(k, L) = {(k + ki, vi) | (ki, vi) ∈ L}. It concatenates the
prefix k in front of the key of every key-value pair in list L2 .
Function enum can also be defined with the concept of concatMap (also called
flatMap)3 .

enum(C) = concatMap(λ(k, T) . mapAppend(k, findAll(T, ϕ)), C)        (5.21)

Function find(C, k) is defined as follows. For empty children, the
result is empty as well; otherwise, it examines the first child sub-tree T1, which
is bound to string k1. If k equals k1 or k is a prefix of k1, it calls mapAppend
to concatenate the prefix k1 in front of the key of every child sub-tree under
T1; if k1 is a prefix of k, the algorithm recursively finds all children sub-trees starting
with k − k1; otherwise, the algorithm by-passes the first child sub-tree and goes
on searching the rest sub-trees.

find(C, k) = | ϕ                                   : C = ϕ
             | mapAppend(k1, findAll(T1, ϕ))       : k ⊏ k1
             | mapAppend(k1, findAll(T1, k − k1))  : k1 ⊏ k
             | find(C′, k)                         : otherwise
                                                        (5.22)

2 The concept here is to map on the first component of each pair. In some environments, like Haskell,
mapAppend can be expressed as map (first (k ++)) by using the arrow combinators.
3 Literally, it first maps on each element, then concatenates the results together.
It’s typically realized with ’build-foldr’ fusion to eliminate the intermediate list.
Below example Haskell program implements the e-dictionary application ac-
cording to the above equations.
import Control.Arrow (first)

get n t k = take n $ findAll t k

findAll :: Eq k ⇒ PrefixTree k v → [k] → [([k], v)]


findAll (PrefixTree Nothing cs) [] = enum cs
findAll (PrefixTree (Just x) cs) [] = ([], x) : enum cs
findAll (PrefixTree _ cs) k = find' cs k
where
find' [] _ = []
find' ((k', t') : ps) k
| k `isPrefixOf` k'
= map (first (k' ++)) (findAll t' [])
| k' `isPrefixOf` k
= map (first (k' ++)) (findAll t' $ drop (length k') k)
| otherwise = find' ps k

enum :: Eq k ⇒ [([k], PrefixTree k v)] → [([k], v)]


enum = concatMap (λ(k, t) → map (first (k ++)) (findAll t []))
In a lazy evaluation environment, the top n candidates can be obtained with
take(n, findAll(T, k)). Appendix A has the detailed definition of the take function.
We can also realize this algorithm imperatively. The following algorithm
reuses the look up defined for the prefix tree. When it finds a node bound to a
prefix of what we are looking for, it expands all its children sub-trees until getting
n candidates.
1: function Look-Up(T, k, n)
2: if T = NIL then
3: return ϕ
4: pref ix ← NIL
5: repeat
6: match ← FALSE
7: for ∀(ki , Ti ) ∈ Children(T ) do
8: if k is prefix of ki then
9: return Expand(pref ix + ki , Ti , n)
10: if ki is prefix of k then
11: match ← TRUE
12: k ← k − ki
13: T ← Ti
14: pref ix ← pref ix + ki

15: break
16: until ¬match
17: return ϕ

Where function Expand(pref ix, T, n) picks at most n key-value pairs from the
sub-tree T; they all share the same prefix. It is realized as a BFS (Breadth-First-Search)
traversal. Section 14.3.1 in the chapter about searching explains BFS in detail.
1: function Expand(pref ix, T, n)
2: R←ϕ
3: Q ← {(pref ix, T )}
4: while |R| < n ∧ Q is not empty do
5: (k, T ) ← Pop(Q)
6: if Data(T ) ̸= NIL then
7: R ← R ∪ {(k, Data(T ) )}
8: for ∀(ki , Ti ) ∈ Children(T ) in sorted order do
9: Push(Q, (k + ki , Ti ))

The following example Python program implements the e-dictionary appli-


cation. When testing if a string is prefix of another one, it uses the find function
provided in standard string library.

import string  # Python 2 string module, provides string.find(s, sub)

def lookup(t, key, n):
    if t is None:
        return []
    prefix = ""
    while True:
        match = False
        for k, tr in t.subtrees.items():
            if string.find(k, key) == 0:  # key is prefix of k
                return expand(prefix + k, tr, n)
            if string.find(key, k) == 0:  # k is prefix of key
                match = True
                key = key[len(k):]
                t = tr
                prefix += k
                break
        if not match:
            break
    return []

def expand(prefix, t, n):
    res = []
    q = [(prefix, t)]
    while len(res) < n and q:
        (s, p) = q.pop(0)
        if p.value is not None:
            res.append((s, p.value))
        for k, tr in sorted(p.subtrees.items()):
            q.append((s + k, tr))
    return res
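
As a quick illustrative run (the word list below is made up for the example), we can
build a small dictionary and ask for the top candidates of a prefix:

words = {"a": "the first letter", "an": "article", "another": "one more",
         "abandon": "to give up", "about": "concerning", "boy": "male child"}
t = None
for w, meaning in words.items():
    t = insert(t, w, meaning)

# all candidates sharing prefix 'a', at most 5 of them, in BFS order
print(lookup(t, "a", 5))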

5.6.2 T9 input method


When people edit text on a mobile phone, the experience is quite different.
This is because the so-called ITU-T keypad has far fewer keys than a PC keyboard,
as shown in figure 5.14.

Figure 5.14: The ITU-T keypad for mobile phone.

There are typically two methods to input words or phrases with the ITU-T key
pad. If the user wants to enter the word ‘home’ for example, he can press the keys in
the sequence below.

• Press key ’4’ twice to enter the letter ’h’;

• Press key ’6’ three times to enter the letter ’o’;

• Press key ’6’ to enter the letter ’m’;

• Press key ’3’ twice to enter the letter ’e’;

Another much quicker way is to just press the following keys.

• Press key ’4’, ’6’, ’6’, ’3’, word ‘home’ appears on top of the candidate list;

• Press key ’*’ to change a candidate word, so word ‘good’ appears;

• Press key ’*’ again to change another candidate word, next word ‘gone’
appears;

• ...

Comparing the two methods, the second one is much easier for the user. The
only overhead is the need to store a dictionary of candidate words.
The second method is known as the ‘T9’ input method, or predictive input
method [6], [7]. The abbreviation ’T9’ stands for ’text on 9 keys’. T9 input
can also be realized with a prefix tree.
In order to provide candidate words, a dictionary must be prepared in ad-
vance. A prefix tree can be used to store the dictionary. Commercial T9
implementations typically use multi-layer indexed dictionaries in both the file
system and cache. The realization shown here is for illustration purposes only.

First, we need to define the T9 mapping, which maps a digit to its candidate
characters.

MT9 = { 2 → abc, 3 → def, 4 → ghi,
        5 → jkl, 6 → mno, 7 → pqrs,
        8 → tuv, 9 → wxyz }                             (5.23)

With this mapping, MT9[i] returns the corresponding characters for digit i.
We can also define the reversed mapping from a character back to a digit.

MT9−1 = concat({{c → d | c ∈ S} | (d → S) ∈ MT9})        (5.24)

Given a sequence of characters, we can convert it to a sequence of digits by
looking up MT9−1.

digits(S) = {MT9−1[c] | c ∈ S}                           (5.25)
When the input digits are D = d1d2...dn, we define the T9 lookup algorithm as
below.

findT9(T, D) = | {ϕ}                              : D = ϕ
               | concatMap(find, prefixes(T))     : otherwise
                                                        (5.26)

Where T is the prefix tree built from a set of words and phrases; it serves as the
dictionary we look up. If the input D is empty, the result contains only the empty
string. Otherwise, it looks up the sub-trees that match the input, and concatenates
the results together.
To enumerate the matched sub-trees, we examine all the children sub-trees
CT . For every pair (ki, Ti), we first convert string ki to the digit sequence di, then
compare di and D. If either one is a prefix of the other, then this pair is
selected as a candidate for further search.

prefixes(T) = {(ki, Ti) | (ki, Ti) ∈ CT , di = digits(ki), di ⊏ D ∨ D ⊏ di}   (5.27)

Function find takes a matched prefix S, and a sub-tree T′ to look up further.
It removes the digits covered by S from D to obtain a new input D′ = D − S (which
is empty when S already covers all of D), searches T′ with D′, then puts S back in
front of every recursive search result.

find(S, T′) = {take(n, S + si) | si ∈ findT9(T′, D − S)}               (5.28)

Where n = |D| is the length of the input digits. Function take(n, L) takes
the first n elements from the list L. If the length of the list is less than n, all
the elements are taken.
The following Haskell example program implements the T9 look up algorithm
with prefix tree.
import qualified Data.Map as Map

mapT9 = Map.fromList [('1', ",."), ('2', "abc"), ('3', "def"), ('4', "ghi"),
('5', "jkl"), ('6', "mno"), ('7', "pqrs"), ('8', "tuv"),
('9', "wxyz")]

rmapT9 = Map.fromList $ concatMap (λ(d, s) → [(c, d) | c ← s]) $ Map.toList mapT9

digits = map (λc → Map.findWithDefault '#' c rmapT9)

findT9 :: PrefixTree Char v → String → [String]
findT9 t [] = [""]
findT9 t k = concatMap find prefixes
  where
    n = length k
    find (s, t') = map (take n ◦ (s++)) $ findT9 t' (k `diff` s)
    diff x y = drop (length y) x
    prefixes = [(s, t') | (s, t') ← children t, let ds = digits s in
                          ds `isPrefixOf` k || k `isPrefixOf` ds]

To realize this algorithm imperatively, we can perform a BFS search with a
queue Q. The queue stores tuples (pref ix, D, T). Every tuple records the
prefix string matched so far, the rest of the digits to be searched,
and the sub-tree we are going to search. The queue is initialized with the empty
prefix, the whole digit sequence, and the prefix tree root. The algorithm keeps
picking tuples from the queue until it’s empty. For every tuple popped from
the queue, we extract the tree from the tuple, then examine its children sub-
trees. For each sub-tree Ti, we convert the corresponding prefix string ki
to digits D′ by looking up the reversed T9 map. If D is a prefix of D′, it’s a
valid candidate: we concatenate ki after the prefix in the tuple, and record this
string in the result. If D′ is a prefix of D, we need to search this sub-tree further.
To do this, we create a new tuple consisting of the new prefix ending with ki, the rest
of the digits D − D′, and the sub-tree, and push this tuple back to the queue.
1: function Look-Up-T9(T, D)
2: R←ϕ
3: if T = NIL or D = ϕ then
4: return R
5: n ← |D|
6: Q ← {(ϕ, D, T )}
7: while Q ̸= ϕ do
8: (pref ix, D, T ) ← Pop(Q)
9: for ∀(ki , Ti ) ∈ Children(T ) do
10: D′ ← Digits(ki )
11: if D ⊏ D′ then ▷ D is a prefix of D′ , ki covers the rest of the input
12: R ← R ∪ { Take (n, pref ix + ki )} ▷ limit the length to n
13: else if D′ ⊏ D then ▷ ki consumes a prefix of the input, search deeper
14: Push(Q, (pref ix + ki , D − D′ , Ti ))
15: return R
Function Digits(S) converts string S to sequence of digits.
1: function Digits(S)
2: D←ϕ
3: for each c ∈ S do
4: D ← D ∪ {MT9−1 [c]}
5: return D
The following example Python program implements the T9 input method
with prefix tree.

import string  # Python 2 string module, provides string.find(s, sub)

T9MAP = {'2': "abc", '3': "def", '4': "ghi", '5': "jkl",
         '6': "mno", '7': "pqrs", '8': "tuv", '9': "wxyz"}

T9RMAP = dict([(c, d) for d, cs in T9MAP.items() for c in cs])

def digits(w):
    return ''.join([T9RMAP[c] for c in w])

def lookup_t9(t, key):
    if t is None or key == "":
        return []
    res = []
    n = len(key)
    q = [("", key, t)]
    while q:
        prefix, key, t = q.pop(0)
        for k, tr in t.subtrees.items():
            ds = digits(k)
            if string.find(ds, key) == 0:    # key is prefix of ds
                res.append((prefix + k)[:n])
            elif string.find(key, ds) == 0:  # ds is prefix of key
                q.append((prefix + k, key[len(k):], tr))
    return res
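
A small usage sketch (the word list is made up; the stored values are not used by
lookup_t9, so any placeholder works, and insert from section 5.5.2 is reused to
build the dictionary):

t = None
for w in ["home", "good", "gone", "hood", "a", "an"]:
    t = insert(t, w, w)

print(lookup_t9(t, "4663"))   # 'home', 'good', 'gone', 'hood' in some order
print(lookup_t9(t, "2"))      # ['a'] -- 'an' would need the input '26'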

Exercise 5.2

• Realize the e-dictionary and T9 lookup with trie.


• For the alphabetic prefix tree look up algorithms that return multiple
results, how to ensure the result is in lexicographic order? What is the
performance?
• How to realize the e-dictionary and T9 look up without lazy evaluation?

5.7 Summary
In this chapter, we start from the integer based trie and prefix tree. The map
data structure based on integer trees plays an important role in compiler im-
plementation. Alphabetic trie and prefix tree are natural extensions that can
manipulate text information. We demonstrated how to realize the predictive e-
dictionary and the T9 input method with a prefix tree, although these examples are
different from the commercial implementations. Another data structure, the suffix
tree, is closely related to trie and prefix tree. Suffix tree is introduced in
Appendix D.
Bibliography

[1] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein. “Introduction to Algorithms, Second Edition”. Problem 12-1. ISBN: 0262032937. The MIT Press. 2001

[2] Chris Okasaki and Andrew Gill. “Fast Mergeable Integer Maps”. Workshop on ML, September 1998, pages 77-86. http://www.cse.ogi.edu/~andy/pub/finite.htm

[3] D.R. Morrison, “PATRICIA – Practical Algorithm To Retrieve Information Coded In Alphanumeric”, Journal of the ACM, 15(4), October 1968, pages 514-534.

[4] Suffix Tree, Wikipedia. http://en.wikipedia.org/wiki/Suffix_tree

[5] Trie, Wikipedia. http://en.wikipedia.org/wiki/Trie

[6] T9 (predictive text), Wikipedia. http://en.wikipedia.org/wiki/T9_(predictive_text)

[7] Predictive text, Wikipedia. http://en.wikipedia.org/wiki/Predictive_text

Chapter 6

B-Trees

6.1 Introduction
B-tree is an important data structure. It is widely used in modern file systems;
some are implemented based on the B+ tree, which is extended from the B-tree. B-trees
are also widely used in database systems.
Some textbooks introduce B-tree with the problem of how to access a
large block of data on magnetic disks or secondary storage devices [2]. It is
also helpful to understand B-tree as a generalization of the balanced binary search
tree [2].
Referring to Figure 6.1, it is easy to see the differences and similarities of a
B-tree with respect to a binary search tree.

C G P T W

A B D E F H I J K N O Q R S U V X Y Z

Figure 6.1: Example B-Tree

Remind the definition of binary search tree. A binary search tree is

• either an empty node;

• or a node contains 3 parts, a value, a left child and a right child. Both
children are also binary search trees.

The binary search tree satisfies the constraint that.

• all the values in the left child are not greater than the value of this
node;

• the value of this node is not greater than any values on the right child.


For non-empty binary tree (L, k, R), where L, R and k are the left, right chil-
dren, and the key. Function Key(T ) accesses the key of tree T . The constraint
can be represented as the following.

∀x ∈ L, ∀y ∈ R ⇒ Key(x) ≤ k ≤ Key(y) (6.1)


If we extend this definition to allow multiple keys and children, we get the
B-tree definition.
A B-tree
• is either empty;
• or contains n keys, and n + 1 children, each child is also a B-Tree, we
denote these keys and children as k1 , k2 , ..., kn and c1 , c2 , ..., cn , cn+1 .
Figure 6.2 illustrates a B-Tree node.

C[1] K[1] C[2] K[2] ... C[n] K[n] C[n+1]

Figure 6.2: A B-Tree node

The keys and children in a node satisfy the following order constraints.
• Keys are stored in non-decreasing order, that is k1 ≤ k2 ≤ ... ≤ kn;
• for each ki, all elements stored in child ci are not greater than ki, while
ki is not greater than any value stored in child ci+1.
The constraints can be represented as in equation (6.2) as well.

∀xi ∈ ci, i = 1, 2, ..., n + 1 ⇒ x1 ≤ k1 ≤ x2 ≤ k2 ≤ ... ≤ xn ≤ kn ≤ xn+1      (6.2)


Finally, after adding some constraints to make the tree balanced, we get the
complete B-tree definition.
• All leaves have the same depth;
• We define integral number, t, as the minimum degree of B-tree;
– each node can have at most 2t − 1 keys;
– each node can have at least t − 1 keys, except the root;
Consider a B-tree holding n keys, with minimum degree t ≥ 2 and height h.
All the nodes have at least t − 1 keys except the root, which contains at
least 1 key. There are at least 2 nodes at depth 1, at least 2t nodes at depth 2,
at least 2t^2 nodes at depth 3, ..., and finally at least 2t^(h−1) nodes at depth
h. Multiplying the number of nodes at every depth by t − 1 (and adding the root key),
the total number of keys satisfies the following inequality.

n ≥ 1 + (t − 1)(2 + 2t + 2t^2 + ... + 2t^(h−1))
  = 1 + 2(t − 1) Σ_{k=0..h−1} t^k
  = 1 + 2(t − 1) (t^h − 1)/(t − 1)
  = 2t^h − 1                                             (6.3)

Thus we have the inequality between the height and the number of keys.

h ≤ log_t((n + 1)/2)                                     (6.4)
This is the reason why the B-tree is balanced. The simplest B-tree is the so-called
2-3-4 tree, where t = 2: every node except the root has 2, 3, or 4 children
(that is, 1 to 3 keys). The red-black tree can essentially be mapped to a 2-3-4 tree.
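
As a rough sense of scale (a back-of-the-envelope check of inequality (6.4) added here
for illustration, not part of any particular implementation):

from math import log

def max_height(n, t):
    """Upper bound on the height given by inequality (6.4)."""
    return log((n + 1) / 2.0, t)

print(max_height(10**9, 2))      # ~ 28.9: a tiny minimum degree gives height ~29
print(max_height(10**9, 1024))   # ~ 2.9: a large minimum degree keeps the tree very flat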
The following Python code shows an example B-tree definition. It explicitly
passes t when creating a node.
class BTree:
    def __init__(self, t):
        self.t = t
        self.keys = []
        self.children = []

B-tree nodes commonly have satellite data as well. We ignore satellite data
for illustration purposes.
In this chapter, we will first introduce how to generate a B-tree by insertion.
Two different methods will be explained. One is the classic method as in [2],
where we split the node before insertion if it is full; the other is the modify-then-fix
approach, which is quite similar to the red-black tree solution [3] [2]. We will
then explain how to delete a key from a B-tree and how to look up a key.

6.2 Insertion
A B-tree can be created by inserting keys repeatedly. The basic idea is similar to
the binary search tree. When inserting key x, starting from the tree root, we examine all
the keys in the node to find a position where all the keys on the left are less
than x, while all the keys on the right are greater than x1 . If the current node
is a leaf node, and it is not full (there are fewer than 2t − 1 keys in this node), x
is inserted at this position. Otherwise, the position points to a child node, and
we recursively insert x into it.
Figure 6.3 shows one example. The B-tree illustrated is a 2-3-4 tree. When
inserting key x = 22: because it’s greater than the root key 20, the right child containing
keys 26, 38, 45 is examined next; since 22 < 26, its first child containing keys 21
and 25 is examined. This is a leaf node, and it is not full, so key 22 is inserted
into this node.
However, if there are already 2t − 1 keys in the leaf, the new key x can’t be inserted,
because this node is ‘full’. Trying to insert key 18 into the above example
B-tree meets this problem. There are 2 methods to solve it.

6.2.1 Splitting
Split before insertion
If the node is full, one method to solve the problem is to split to node before
insertion.
1 This is a strong constraint. In fact, only less-than and equality testing is necessary. The

later exercise address this point.



20

4 11 26 38 45

1 2 5 8 9 12 15 16 17 21 25 30 31 37 40 42 46 47 50

(a) Insert key 22 to the 2-3-4 tree. 22 > 20, go to the right child; 22 < 26 go
to the first child.

20

4 11 26 38 45

1 2 5 8 9 12 15 16 17 21 22 25 30 31 37 40 42 46 47 50

(b) 21 < 22 < 25, and the leaf isn’t full.

Figure 6.3: Insertion is similar to binary search tree.

For a full node with 2t − 1 keys, it can be divided into 3 parts as shown in Figure
6.4. The left part contains the first t − 1 keys and t children. The right part
contains the last t − 1 keys and t children. Both the left part and the right part are
valid B-tree nodes. The middle part is the t-th key. We can push it up to the
parent node (if the current node is the root, then this key, together with the two children,
becomes the new root).
For node x, denote K(x) as its keys and C(x) as its children, the i-th key as ki(x),
and the j-th child as cj(x). The algorithm below describes how to split the i-th child of
a given node.
1: procedure Split-Child(node, i)
2: x ← ci (node)
3: y ← CREATE-NODE
4: Insert(K(node), i, kt (x))
5: Insert(C(node), i + 1, y)
6: K(y) ← {kt+1 (x), kt+2 (x), ..., k2t−1 (x)}
7: K(x) ← {k1 (x), k2 (x), ..., kt−1 (x)}
8: if y is not leaf then
9: C(y) ← {ct+1 (x), ct+2 (x), ..., c2t (x)}
10: C(x) ← {c1 (x), c2 (x), ..., ct (x)}
The following example Python program implements this child splitting al-
gorithm.
def split_child(node, i):
    t = node.t
    x = node.children[i]
    y = BTree(t)
    node.keys.insert(i, x.keys[t-1])
    node.children.insert(i+1, y)
    y.keys = x.keys[t:]
    x.keys = x.keys[:t-1]
    if not is_leaf(x):
        y.children = x.children[t:]
        x.children = x.children[:t]

Figure 6.4: Split node. (a) Before split: keys K[1] ... K[2t−1] with children C[1] ... C[2t].
(b) After split: key K[t] is pushed up; the left node keeps K[1] ... K[t−1] with C[1] ... C[t],
and the right node keeps K[t+1] ... K[2t−1] with C[t+1] ... C[2t].

Where function is_leaf tests if a node is a leaf.

def is_leaf(t):
    return t.children == []
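
A tiny, hand-constructed example of split_child (the node contents are made up to
show the effect, assuming minimum degree t = 2):

parent = BTree(2)
parent.keys = [10]
full, right = BTree(2), BTree(2)
full.keys = [1, 2, 3]          # 2t - 1 = 3 keys, i.e. full
right.keys = [20, 30]
parent.children = [full, right]

split_child(parent, 0)
print(parent.keys)                          # [2, 10]
print([c.keys for c in parent.children])    # [[1], [3], [20, 30]]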

After splitting, a key is pushed up to its parent node. It is quite possible
that the parent node is already full, and this pushing violates the B-tree
property.
In order to solve this problem, we can check from the root along the path
of insertion down to the leaf. If any node in this path is full,
splitting is applied. Since the parent of this node has already been examined, it is
ensured that there are fewer than 2t − 1 keys in the parent, so pushing up one key
won’t make the parent full. This approach only needs one single pass down
the tree without any back-tracking.
If the root needs splitting, a new node is created as the new root. There are
no keys in this newly created root, and the previous root is set as its only child.
After that, splitting is performed top-down, and we can finally insert the new key.
1: function Insert(T, k)
2: r←T
3: if r is full then ▷ root is full
4: s ← CREATE-NODE
5: C(s) ← {r}
6: Split-Child(s, 1)
7: r←s
8: return Insert-Nonfull(r, k)

Where algorithm Insert-Nonfull assumes the node passed in is not full.


If it is a leaf node, the new key is inserted to the proper position based on the
order; Otherwise, the algorithm finds a proper child node to which the new key
will be inserted. If this child is full, splitting will be performed.
1: function Insert-Nonfull(T, k)
2: if T is leaf then
3: i←1
4: while i ≤ |K(T )| ∧ k > ki (T ) do
5: i←i+1
6: Insert(K(T ), i, k)
7: else
8: i ← |K(T )|
9: while i > 1 ∧ k < ki (T ) do
10: i←i−1
11: if ci (T ) is full then
12: Split-Child(T, i)
13: if k > ki (T ) then
14: i←i+1
15: Insert-Nonfull(ci (T ), k)
16: return T
This algorithm is recursive. In a B-tree, the minimum degree t is typically
chosen according to the magnetic disk structure. Even a small depth can support a huge
amount of data (with t = 10, a B-tree of height 10 holds at least 2 · 10^10 − 1 keys, about
20 billion). The recursion can also be eliminated; this is left as an exercise to
the reader.
Figure 6.5 shows the result of continuously inserting keys G, M, P, X, A, C,
D, E, J, K, N, O, R, S, T, U, V, Y, Z to the empty tree. The first result is the
2-3-4 tree (t = 2). The second result shows how it varies when t = 3.

E P

C M S U X

A D G J K N O R T V Y Z

(a) 2-3-4 tree.

D M P T

A C E G J K N O R S U V X Y Z

(b) t = 3

Figure 6.5: Insertion result

Below example Python program implements this algorithm.



def insert(tr, key):
    root = tr
    if is_full(root):
        s = BTree(root.t)
        s.children.insert(0, root)
        split_child(s, 0)
        root = s
    return insert_nonfull(root, key)
And the insertion to non-full node is implemented as the following.
def insert_nonfull(tr, key):
    if is_leaf(tr):
        ordered_insert(tr.keys, key)
    else:
        i = len(tr.keys)
        while i > 0 and key < tr.keys[i-1]:
            i = i - 1
        if is_full(tr.children[i]):
            split_child(tr, i)
            if key > tr.keys[i]:
                i = i + 1
        insert_nonfull(tr.children[i], key)
    return tr
Where function ordered_insert is used to insert an element to an ordered
list. Function is_full tests if a node contains 2t − 1 keys.
def ordered_insert(lst, x):
    i = len(lst)
    lst.append(x)
    while i > 0 and lst[i] < lst[i-1]:
        (lst[i-1], lst[i]) = (lst[i], lst[i-1])
        i = i - 1

def is_full(node):
    return len(node.keys) >= 2 * node.t - 1
For an array based collection, appending at the tail is much more efficient
than inserting at other positions, because the latter takes O(n) time if the length
of the collection is n. The ordered_insert program first appends the new
element at the end of the existing collection, then iterates from the last element
toward the first one, and checks if the current two adjacent elements are
ordered. If not, these two elements are swapped.
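
For a quick check, we can rebuild a tree like the ones in figure 6.5 by inserting the same
key sequence. The small to_str helper below is not part of the original program; it is
only added here to print the result:

def to_str(tr):
    # render a node as (child key child key ...) for a quick visual check
    if is_leaf(tr):
        return "(" + " ".join(tr.keys) + ")"
    s = "("
    for i in range(len(tr.keys)):
        s += to_str(tr.children[i]) + " " + tr.keys[i] + " "
    return s + to_str(tr.children[-1]) + ")"

t = BTree(2)                      # a 2-3-4 tree
for k in "GMPXACDEJKNORSTUVYZ":
    t = insert(t, k)
print(to_str(t))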

Insert then fixing


In functional settings, B-tree insertion can be realized in a way similar to red-
black tree. When insert a key to red-black tree, it is firstly inserted as in the
normal binary search tree, then recursive fixing is performed to resume the
balance of the tree. B-tree can be viewed as extension to the binary search tree,
that each node contains multiple keys and children. We can firstly insert the
key without considering if the node is full. Then perform fixing to satisfy the
minimum degree constraint.

insert(T, k) = f ix(ins(T, k)) (6.5)



Function ins(T, k) traverse the B-tree T from root to find a proper position
where key k can be inserted. After that, function f ix is applied to resume the
B-tree properties. Denote B-tree in a form of T = (K, C, t), where K represents
keys, C represents children, and t is the minimum degree.
Below is the Haskell definition of B-tree.
data BTree a = Node { keys :: [a]
                    , children :: [BTree a]
                    , degree :: Int } deriving (Eq)

The insertion function can be provided based on this definition.


insert tr x = fixRoot $ ins tr x

There are two cases when realize ins(T, k) function. If the tree T is leaf, k
is inserted to the keys; Otherwise if T is the branch node, we need recursively
insert k to the proper child.
Figure 6.6 shows the branch case. The algorithm first locates the position.
for certain key ki , if the new key k to be inserted satisfy ki−1 < k < ki , Then
we need recursively insert k to child ci .
This position divides the node into 3 parts, the left part, the child ci and
the right part.

k, K[i-1]<k<K[i]

insert to

K[1] K[2] ... K[i-1] K[i] ... K[n]

C[1] C[2] ... C[i-1] C[i] C[i+1] ... C[n] C[n+1]

(a) Locate the child to insert.

K[1] K[2] ... K[i-1] k, K[i-1]<k<K[i] K[i] K[i+1] ... K[n]

recursive insert

C[1] C[2] ... C[i-1] C[i] C[i+1] ... C[n+1]

(b) Recursive insert.

Figure 6.6: Insert a key to a branch node

ins(T, k) = | (K′ ∪ {k} ∪ K′′, ϕ, t)                     : C = ϕ, (K′, K′′) = divide(K, k)
            | make((K′, C1), ins(c, k), (K′′, C2′))      : C ≠ ϕ, (C1, C2) = split(|K′|, C)
                                                        (6.6)
The first clause deals with the leaf case. Function divide(K, k) divide keys
into two parts, all keys in the first part are not greater than k, and all rest keys
are not less than k.

K = K ′ ∪ K ′′ ∧ ∀k ′ ∈ K ′ , k ′′ ∈ K ′′ ⇒ k ′ ≤ k ≤ k ′′

The second clause handle the branch case. Function split(n, C) splits chil-
dren in two parts, C1 and C2 . C1 contains the first n children; and C2 contains
the rest. Among C2 , the first child is denoted as c, and others are represented
as C2′ .
Here the key k need be recursively inserted into child c. Function make
takes 3 parameter. The first and the third are pairs of key and children; the
second parameter is a child node. It examines if a B-tree node made from these
keys and children violates the minimum degree constraint and performs fixing
if necessary.

make((K′, C′), c, (K′′, C′′)) = | fixFull((K′, C′), c, (K′′, C′′))   : full(c)
                                | (K′ ∪ K′′, C′ ∪ {c} ∪ C′′, t)      : otherwise
                                                        (6.7)
Where function full(c) tests if the child c is full. Function fixFull splits
the child c, and forms a new B-tree node with the pushed-up key.

f ixF ull((K ′ , C ′ ), c, (K ′′ , C ′′ )) = (K ′ ∪ {k ′ } ∪ K ′′ , C ′ ∪ {c1 , c2 } ∪ C ′′ , t) (6.8)

Where (c1 , k ′ , c2 ) = split(c). During splitting, the first t − 1 keys and t


children are extract to one new child, the last t − 1 keys and t children form
another child. The t-th key k ′ is pushed up.
With all the above functions defined, we can realize f ix(T ) to complete the
functional B-tree insertion algorithm. It firstly checks if the root contains too
many keys. If it exceeds the limit, splitting will be applied. The split result will
be used to make a new node, so the total height of the tree increases by one.


fix(T) = | c                     : T = (ϕ, {c}, t)
         | ({k′}, {c1, c2}, t)   : full(T), (c1, k′, c2) = split(T)
         | T                     : otherwise
                                                        (6.9)

The following Haskell example code implements the B-tree insertion.


import qualified Data.List as L

ins (Node ks [] t) x = Node (L.insert x ks) [] t
ins (Node ks cs t) x = make (ks', cs') (ins c x) (ks'', cs'')
  where
    (ks', ks'') = L.partition (<x) ks
    (cs', (c:cs'')) = L.splitAt (length ks') cs

fixRoot (Node [] [tr] _) = tr -- shrink height
fixRoot tr = if full tr then Node [k] [c1, c2] (degree tr)
             else tr
  where
    (c1, k, c2) = split tr

make (ks', cs') c (ks'', cs'')
  | full c = fixFull (ks', cs') c (ks'', cs'')
  | otherwise = Node (ks' ++ ks'') (cs' ++ [c] ++ cs'') (degree c)

fixFull (ks', cs') c (ks'', cs'') = Node (ks' ++ [k] ++ ks'')
                                         (cs' ++ [c1, c2] ++ cs'') (degree c)
  where
    (c1, k, c2) = split c

full tr = (length $ keys tr) > 2 * (degree tr) - 1


Figure 6.7 shows the varies of results of building B-trees by continuously
inserting keys ”GMPXACDEJKNORSTUVYZ”.

E O

C M R T V

A D G J K N P S U X Y Z

(a) Insert result of a 2-3-4 tree.

G M P T

A C D E J K N O R S U V X Y Z

(b) Insert result of a B-tree with t = 3

Figure 6.7: Insert then fixing results

Comparing with the imperative insertion result shown in figure 6.5, we can
find that the trees are different. However, both are valid, because all B-tree
properties are satisfied.

6.3 Deletion
Deleting a key from a B-tree may violate the balance properties: except for the root, a
node shouldn’t contain fewer than t − 1 keys, where t is the minimum
degree.
Similar to the approaches for insertion, we can either do some preparation
so that the node from which the key is being deleted contains enough keys, or do
some fixing after the deletion if the node has too few keys.

6.3.1 Merge before delete method

We start from the easiest case. If the key k to be deleted can be located in
node x, and x is a leaf node, we can directly remove k from x. If x is the root
(the only node of the tree), we needn’t worry that there are too few keys after
deletion. This case is named case 1 later.
In most cases, we start from the root, along a path, to locate the
node containing k. If k can be located in an internal node x, there are three sub-
cases.

• Case 2a, If the child y preceding k contains enough keys (at least t), we
replace k in node x with k′, which is the predecessor of k in child y, and
recursively remove k′ from y.
The predecessor of k can be easily located as the last key of child y.
This is shown in figure 6.8.

Figure 6.8: Replace and delete from predecessor.

• Case 2b, If y doesn’t contain enough keys, while the child z following k
contains at least t keys, we replace k in node x with k′′, which is the
successor of k in child z, and recursively remove k′′ from z.
The successor of k can be easily located as the first key of child z.
This sub-case is illustrated in figure 6.9.

• Case 2c, Otherwise, if neither y nor z contains enough keys, we can merge
y, k and z into one new node, so that this new node contains 2t − 1 keys.
After that, we can recursively do the removing.
Note that after merging, if the current node doesn’t contain any keys, which
means k was the only key in x and y and z were its only two children, we
need to shrink the tree height by one.

Figure 6.10 illustrates this sub-case.


The last case states that, if k can’t be located in node x, the algorithm needs to
find a child node ci in x, such that the sub-tree ci contains k. Before the deletion
is recursively applied in ci, we need to make sure that there are at least t keys in
ci. If there are not enough keys, the following adjustment is performed.

Figure 6.9: Replace and delete from successor.

Figure 6.10: Merge and delete.

• Case 3a, We check the two siblings of ci, which are ci−1 and ci+1. If either
one contains enough keys (at least t keys), we move one key from x down
to ci, and move one key from the sibling up to x. We also need to move the
corresponding child from the sibling to ci.
This operation makes ci contain enough keys for deletion. We can next
try to delete k from ci recursively.
Figure 6.11 illustrates this case.

Figure 6.11: Borrow from the right sibling.

• Case 3b, In case neither of the two siblings contains enough keys, we
then merge ci, a key from x, and either one of the siblings into a new node,
and then do the deletion on this new node.

Figure 6.12 shows this case.


Before defining the B-tree delete algorithm, we need to provide some auxiliary
functions. Function Can-Del tests if a node contains enough keys for deletion.
1: function Can-Del(T )
2: return |K(T )| ≥ t
Procedure Merge-Children(T, i) merges child ci (T ), key ki (T ), and child
ci+1 (T ) into one big node.
1: procedure Merge-Children(T, i) ▷ Merge ci (T ), ki (T ), and ci+1 (T )
2: x ← ci (T )
3: y ← ci+1 (T )
4: K(x) ← K(x) ∪ {ki (T )} ∪ K(y)
5: C(x) ← C(x) ∪ C(y)
6: Remove-At(K(T ), i)
7: Remove-At(C(T ), i + 1)

Figure 6.12: Merge ci , k, and ci+1 to a new node.

Procedure Merge-Children merges the i-th child, the i-th key, and i + 1-
th child of node T into a new child, and remove the i-th key and i + 1-th child
from T after merging.
With these functions defined, the B-tree deletion algorithm can be given by
realizing the above 3 cases.
1: function Delete(T, k)
2: i←1
3: while i ≤ |K(T )| do
4: if k = ki (T ) then
5: if T is leaf then ▷ case 1
6: Remove(K(T ), k)
7: else ▷ case 2
8: if Can-Del(ci (T )) then ▷ case 2a
9: ki (T ) ← Last-Key(ci (T ))
10: Delete(ci (T ), ki (T ))
11: else if Can-Del(ci+1 (T )) then ▷ case 2b
12: ki (T ) ← First-Key(ci+1 (T ))
13: Delete(ci+1 (T ), ki (T ))
14: else ▷ case 2c
15: Merge-Children(T, i)
16: Delete(ci (T ), k)
17: if K(T ) = N IL then
18: T ← ci (T ) ▷ Shrinks height
19: return T
20: else if k < ki (T ) then

21: Break
22: else
23: i←i+1

24: if T is leaf then


25: return T ▷ k doesn’t exist in T .
26: if ¬ Can-Del(ci (T )) then ▷ case 3
27: if i > 1∧ Can-Del(ci−1 (T )) then ▷ case 3a: left sibling
28: Insert(K(ci (T )), ki−1 (T ))
29: ki−1 (T ) ← Pop-Back(K(ci−1 (T )))
30: if ci (T ) isn’t leaf then
31: c ← Pop-Back(C(ci−1 (T )))
32: Insert(C(ci (T )), c)
33: else if i < |C(T )| ∧ Can-Del(ci+1 (T )) then ▷ case 3a: right sibling
34: Append(K(ci (T )), ki (T ))
35: ki (T ) ← Pop-Front(K(ci+1 (T )))
36: if ci (T ) isn’t leaf then
37: c ← Pop-Front(C(ci+1 (T )))
38: Append(C(ci (T )), c)
39: else ▷ case 3b
40: if i > 1 then
41: Merge-Children(T, i − 1)
42: i←i−1 ▷ the merged node is now the (i − 1)-th child
43: else
44: Merge-Children(T, i)
45: Delete(ci (T ), k) ▷ recursive delete
46: if K(T ) = N IL then ▷ Shrinks height
47: T ← c1 (T )
48: return T
Figure 6.13, 6.14, and 6.15 show the deleting process step by step. The nodes
modified are shaded.
The following example Python program implements the B-tree deletion al-
gorithm.
def can_remove(tr):
    return len(tr.keys) >= tr.t

def replace_key(tr, i, k):
    tr.keys[i] = k
    return k

def merge_children(tr, i):
    tr.children[i].keys += [tr.keys[i]] + tr.children[i+1].keys
    tr.children[i].children += tr.children[i+1].children
    tr.keys.pop(i)
    tr.children.pop(i+1)

def B_tree_delete(tr, key):
    i = len(tr.keys)
    while i > 0:
        if key == tr.keys[i-1]:
            if is_leaf(tr):  # case 1 in CLRS
                tr.keys.remove(key)
            else:  # case 2 in CLRS
                if can_remove(tr.children[i-1]):  # case 2a
                    key = replace_key(tr, i-1, tr.children[i-1].keys[-1])
                    B_tree_delete(tr.children[i-1], key)
                elif can_remove(tr.children[i]):  # case 2b
                    key = replace_key(tr, i-1, tr.children[i].keys[0])
                    B_tree_delete(tr.children[i], key)
                else:  # case 2c
                    merge_children(tr, i-1)
                    B_tree_delete(tr.children[i-1], key)
                    if tr.keys == []:  # tree shrinks in height
                        tr = tr.children[i-1]
            return tr
        elif key > tr.keys[i-1]:
            break
        else:
            i = i - 1
    # case 3
    if is_leaf(tr):
        return tr  # key doesn't exist at all
    if not can_remove(tr.children[i]):
        if i > 0 and can_remove(tr.children[i-1]):  # case 3a: left sibling
            tr.children[i].keys.insert(0, tr.keys[i-1])
            tr.keys[i-1] = tr.children[i-1].keys.pop()
            if not is_leaf(tr.children[i]):
                tr.children[i].children.insert(0, tr.children[i-1].children.pop())
        elif i < len(tr.children) - 1 and can_remove(tr.children[i+1]):  # case 3a: right sibling
            tr.children[i].keys.append(tr.keys[i])
            tr.keys[i] = tr.children[i+1].keys.pop(0)
            if not is_leaf(tr.children[i]):
                tr.children[i].children.append(tr.children[i+1].children.pop(0))
        else:  # case 3b
            if i > 0:
                merge_children(tr, i-1)
                i = i - 1  # the merged node is now at position i-1
            else:
                merge_children(tr, i)
    B_tree_delete(tr.children[i], key)
    if tr.keys == []:  # tree shrinks in height
        tr = tr.children[0]
    return tr

Figure 6.13: Result of B-tree deleting (1). (a) A B-tree before deleting. (b) After deleting key ’F’, case 1.

Figure 6.14: Result of B-tree deleting program (2). (a) After deleting key ’M’, case 2a. (b) After deleting key ’G’, case 2c.

Figure 6.15: Result of B-tree deleting program (3). (a) After deleting key ’D’, case 3b, and the height is shrunk. (b) After deleting key ’B’, case 3a, borrow from the right sibling. (c) After deleting key ’U’, case 3a, borrow from the left sibling.
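
A short usage sketch (the keys are arbitrary examples; the exact shape of the resulting
tree depends on the insertion order and the insertion variant used, so only the remaining
root keys are printed here):

t = BTree(2)
for k in "GMPXACDEJKNORSTUVYZ":
    t = insert(t, k)

for k in "EGAMU":                 # delete a handful of keys
    t = B_tree_delete(t, k)
print(t.keys)                     # the remaining tree is still a valid 2-3-4 tree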

6.3.2 Delete and fix method


The merge-and-delete algorithm is a bit complex: there are several cases, and
in each case there are sub-cases to deal with.
Another approach to designing the deletion algorithm is to perform fixing after
deletion. It is similar to the insert-then-fix strategy.

delete(T, k) = fix(del(T, k))                            (6.10)

When deleting a key from a B-tree, we first locate the node containing this key.
We traverse from the root toward the leaves until we find this key in some
node.
If this node is a leaf, we can remove the key, and then examine whether the deletion
leaves the node with too few keys to satisfy the B-tree balance properties.
If it is a branch node, removing the key breaks the node into two parts. We
need to merge them together. The merging is a recursive process, which is shown
in figure 6.16.
When merging, if the two nodes are not leaves, we merge the keys to-
gether, and recursively merge the last child of the left part and the first child
of the right part into one new node. Otherwise, if they are leaves, we merely put
all keys together.
Up to now, the deletion is performed in a straightforward way. However, deleting
decreases the number of keys of a node, and it may result in violating the B-tree
balance properties. The solution is to perform fixing along the path traversed
from the root.
During the recursive deletion, the branch node is broken into 3 parts. The
left part contains all keys less than k, i.e. k1, k2, ..., ki−1, and children
c1, c2, ..., ci−1; the right part contains all keys greater than k, i.e. ki, ki+1, ..., kn,
and children ci+1, ci+2, ..., cn+1. Then key k is recursively deleted from child ci.
Denote the result as c′i. We need to make a new node from these
3 parts, as shown in figure 6.17.
At this point, we need to examine whether c′i contains enough keys. If there
are too few keys (fewer than t − 1, in contrast to t in the merge-and-delete
approach), we can either borrow a key-child pair from the left or the right part,
and do the inverse operation of splitting. Figure 6.18 shows an example of borrowing
from the left part.
If both the left part and the right part are empty, we can simply push c′i up.

Figure 6.16: Delete a key from a branch node. Removing ki breaks the node
into 2 parts. Merging these 2 parts is a recursive process. When the two parts
are leaves, the merging terminates.

Figure 6.17: After delete key k from node ci , denote the result as c′i . The fixing
makes a new node from the left part, c′i and the right part.

Figure 6.18: Borrow a key-child pair from left part and un-split to a new child.

Denote the B-tree as T = (K, C, t), where K and C are keys and children.
The del(T, k) function deletes key k from the tree.


 (delete(K, k), ϕ, t) : C=ϕ
del(T, k) = merge((K1 , C1 , t), (K2 , C2 , t)) : ki = k (6.11)

make((K1′ , C1′ ), del(c, k), (K2′ , C2′ )) : k∈/K

If children C = ϕ is empty, T is leaf. k is deleted from keys directly. Other-


wise, T is internal node. If k ∈ K, removing it separates the keys and children
in two parts (K1 , C1 ) and (K2 , C2 ). They will be recursively merged.

K1 = {k1 , k2 , ..., ki−1 }


K2 = {ki+1 , ki+2 , ..., km }
C1 = {c1 , c2 , ..., ci }
C2 = {ci+1 , ci+2 , ..., cm+1 }

If k ∈
/ K, we need locate a child c, and further delete k from it.

(K1′ , K2′ ) = ({k ′ |k ′ ∈ K, k ′ < k}, {k ′ |k ′ ∈ K, k < k ′ })


(C1′ , {c} ∪ C2′ ) = splitAt(|K1′ |, C)

The recursive merge function is defined as follows. When merging two
trees T1 = (K1, C1, t) and T2 = (K2, C2, t), if both are leaves, we create a new
leaf by concatenating the keys. Otherwise, the last child in C1 and the first
child in C2 are recursively merged, and we call the make function to form the new
tree. When C1 and C2 are not empty, denote the last child of C1 as c1,m and the
rest as C1′; the first child of C2 as c2,1 and the rest as C2′. The equation below defines
the merge function.

merge(T1, T2) = | (K1 ∪ K2, ϕ, t)                                  : C1 = C2 = ϕ
                | make((K1, C1′), merge(c1,m, c2,1), (K2, C2′))    : otherwise
                                                        (6.12)
The make function defined above only handles the case where a node contains
too many keys due to insertion. A deletion may cause a node to contain
too few keys; we need to test and fix this situation as well.

make((K′, C′), c, (K′′, C′′)) = | fixFull((K′, C′), c, (K′′, C′′))   : full(c)
                                | fixLow((K′, C′), c, (K′′, C′′))    : low(c)
                                | (K′ ∪ K′′, C′ ∪ {c} ∪ C′′, t)      : otherwise
                                                        (6.13)
Where low(T) checks whether the node contains fewer than t − 1 keys. Function
fixLow(Pl, c, Pr) takes three arguments: the left pair of keys and children, a
child node, and the right pair of keys and children. If the left part isn’t empty, we
borrow a pair of key-child, and do un-splitting to make the child contain enough
keys, then recursively call make; If the right part isn’t empty, we borrow a pair
from the right; and if both sides are empty, we return the child node as result.
In this case, the height of the tree shrinks.
Denote the left part Pl = (Kl , Cl ). If Kl isn’t empty, the last key and child
are represented as kl,m and cl,m respectively. The rest keys and children become
Kl′ and Cl′ ; Similarly, the right part is denoted as Pr = (Kr , Cr ). If Kr isn’t
empty, the first key and child are represented as kr,1 , and cr,1 . The rest keys
and children are Kr′ and Cr′ . Below equation gives the definition of f ixLow.


 make((Kl′ , Cl′ ), unsplit(cl,m , kl,m , c), (Kr , Cr )) : Kl ̸= ϕ
f ixLow(Pl , c, Pr ) = make((Kr , Cr ), unsplit(c, kr,1 , cr,1 ), (Kr′ , Cr′ )) : Kr ≠ ϕ

c : otherwise
(6.14)
Function unsplit(T1 , k, T2 ) is the inverse operation to splitting. It forms a
new B-tree nodes from two small nodes and a key.

unsplit(T1 , k, T2 ) = (K1 ∪ {k} ∪ K2 , C1 ∪ C2 , t) (6.15)


The following example Haskell program implements the B-tree deletion al-
gorithm.
import qualified Data.List as L

delete tr x = fixRoot $ del tr x

del :: (Ord a) ⇒ BTree a → a → BTree a
del (Node ks [] t) x = Node (L.delete x ks) [] t
del (Node ks cs t) x =
  case L.elemIndex x ks of
    Just i  → merge (Node (take i ks) (take (i+1) cs) t)
                    (Node (drop (i+1) ks) (drop (i+1) cs) t)
    Nothing → make (ks', cs') (del c x) (ks'', cs'')
  where
    (ks', ks'') = L.partition (<x) ks
    (cs', (c:cs'')) = L.splitAt (length ks') cs

merge (Node ks [] t) (Node ks' [] _) = Node (ks ++ ks') [] t
merge (Node ks cs t) (Node ks' cs' _) = make (ks, init cs)
                                             (merge (last cs) (head cs'))
                                             (ks', tail cs')

make (ks', cs') c (ks'', cs'')
  | full c = fixFull (ks', cs') c (ks'', cs'')
  | low c = fixLow (ks', cs') c (ks'', cs'')
  | otherwise = Node (ks' ++ ks'') (cs' ++ [c] ++ cs'') (degree c)

low tr = (length $ keys tr) < (degree tr) - 1

fixLow (ks'@(_:_), cs') c (ks'', cs'') = make (init ks', init cs')
                                              (unsplit (last cs') (last ks') c)
                                              (ks'', cs'')
fixLow (ks', cs') c (ks''@(_:_), cs'') = make (ks', cs')
                                              (unsplit c (head ks'') (head cs''))
                                              (tail ks'', tail cs'')
fixLow _ c _ = c

unsplit c1 k c2 = Node ((keys c1) ++ [k] ++ (keys c2))
                       ((children c1) ++ (children c2)) (degree c1)

When deleting the same keys from the B-tree with the delete-then-fixing approach,
the results are different. However, both satisfy the B-tree properties, so both
are valid.

C G P T W

A B D E F H I J K N O Q R S U V X Y Z

(a) B-tree before deleting

C G P T W

A B D F H I J K N O Q R S U V X Y Z

(b) After delete key ’E’.

Figure 6.19: Result of delete-then-fixing (1)



C H P T W

A B D F I J K N O Q R S U V X Y Z

(a) After delete key ’G’,

H M P T W

B C D F I J K N O Q R S U V X Y Z

(b) After delete key ’A’.

Figure 6.20: Result of delete-then-fixing (2)

H P T W

B C D F I J K N O Q R S U V X Y Z

(a) After delete key ’M’.

H P W

B C D F I J K N O Q R S T V X Y Z

(b) After delete key ’U’.

Figure 6.21: Result of delete-then-fixing (3)



6.4 Searching
Searching in a B-tree can be considered as a generalized tree search extended
from the binary search tree.
When searching in a binary tree, there are only 2 different directions: the
left and the right. However, there are multiple directions in a B-tree.
1: function Search(T, k)
2: loop
3: i←1
4: while i ≤ |K(T )| ∧ k > ki (T ) do
5: i←i+1
6: if i ≤ |K(T )| ∧ k = ki (T ) then
7: return (T, i)
8: if T is leaf then
9: return N IL ▷ k doesn’t exist
10: else
11: T ← ci (T )
Starts from the root, this program examines each key one by one from the
smallest to the biggest. In case it finds the matched key, it returns the current
node and the index of this key. Otherwise, if it finds the position i that ki <
k < ki+1 , the program will next search the child node ci+1 for the key. If it
traverses to some leaf node, and fails to find the key, the empty value is returned
to indicate that this key doesn’t exist in the tree.
The following example Python program implements the search algorithm.
def B_tree_search(tr, key):
    while True:
        for i in range(len(tr.keys)):
            if key <= tr.keys[i]:
                break
        if key == tr.keys[i]:
            return (tr, i)
        if is_leaf(tr):
            return None
        else:
            if key > tr.keys[-1]:
                i = i + 1
            tr = tr.children[i]
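
A usage sketch, building a tree with the insertion routine from section 6.2 and then
searching it (the key set is an arbitrary example):

t = BTree(2)
for k in "GMPXACDEJKNORSTUVYZ":
    t = insert(t, k)

node, i = B_tree_search(t, "K")
print(node.keys[i])              # 'K'
print(B_tree_search(t, "W"))     # None, 'W' was never inserted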

The search algorithm can also be realized by recursion. When searching for key k
in B-tree T = (K, C, t), we partition the keys with k.

K1 = {k′ | k′ < k}
K2 = {k′ | k ≤ k′}

Thus K1 contains all the keys less than k, and K2 holds the rest. If the first
element in K2 is equal to k, we have found the key. Otherwise, we recursively search
for the key in child c|K1|+1.

search(T, k) = | (T, |K1| + 1)        : k ∈ K2
               | ϕ                    : C = ϕ
               | search(c|K1|+1, k)   : otherwise
                                                        (6.16)

Below example Haskell program implements this algorithm.


search :: (Ord a) ⇒ BTree a → a → Maybe (BTree a, Int)
search tr@(Node ks cs _) k
  | matchFirst k $ drop len ks = Just (tr, len)
  | otherwise = if null cs then Nothing
                else search (cs !! len) k
  where
    matchFirst x (y:_) = x == y
    matchFirst x _ = False
    len = length $ filter (<k) ks

6.5 Notes and short summary


In this chapter, we explained the B-tree data structure as a kind of extension
of the binary search tree. The background knowledge of magnetic disk access is
skipped; the reader can refer to [2] for details. For the three main operations, insertion,
deletion, and searching, both imperative and functional algorithms are given.
They traverse from the root to the leaf. All three operations run in
time proportional to the height of the tree. Because the B-tree always maintains the
balance properties, the performance is bounded by O(lg n) time, where
n is the number of keys in the B-tree.

Exercise 6.1

• When insert a key, we need find a position, where all keys on the left are
less than it, while all the others on the right are greater than it. Modify
the algorithm so that the elements stored in B-tree only need support
less-than and equality test.

• We assume the element being inserted doesn’t exist in the tree. Modify
the algorithm so that duplicated elements can be stored in a linked-list.
• Eliminate the recursion in imperative B-tree insertion algorithm.
Bibliography

[1] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein. “Introduction to Algorithms, Second Edition”. The MIT Press, 2001. ISBN: 0262032937.

[2] B-tree, Wikipedia. http://en.wikipedia.org/wiki/B-tree

[3] Chris Okasaki. “FUNCTIONAL PEARLS Red-Black Trees in a Functional Setting”. J. Functional Programming. 1998

Part III

Heaps

Chapter 7

Binary Heaps

7.1 Introduction
Heaps are one of the most widely used data structures, used to solve practical
problems such as sorting, prioritized scheduling and implementing graph
algorithms, to name a few [2].
Most popular implementations of heaps use a kind of implicit binary heap
with arrays, which is described in [2]. Examples include the C++/STL heap and
Python heapq. The most efficient heap sort algorithm is also realized with a
binary heap, as proposed by R. W. Floyd [3] [5].
However, heaps can be general and realized with various other data struc-
tures besides the array. In this chapter, an explicit binary tree is used. It leads to
Leftist heaps, Skew heaps, and Splay heaps, which are suitable for purely func-
tional implementation, as shown by Okasaki [6].
A heap is a data structure that satisfies the following heap property.
• The top operation always returns the minimum (maximum) element;
• The pop operation removes the top element from the heap while the heap
property is kept, so that the new top element is still the minimum
(maximum) one;
• Inserting a new element into the heap should keep the heap property, so that the
new top is still the minimum (maximum) element;
• Other operations, including merge etc., should all keep the heap property.
This is a kind of recursive definition, and it doesn't limit the underlying
data structure.
We call a heap with the minimum element on top a min-heap; if
the top keeps the maximum element, we call it a max-heap.

7.2 Implicit binary heap by array


Considering the heap definition in previous section, one option to implement
heap is by using trees. A straightforward solution is to store the minimum
(maximum) element in the root of the tree, so for ‘top’ operation, we simply


return the root as the result. And for ‘pop’ operation, we can remove the root
and rebuild the tree from the children.
If binary tree is used to implement the heap, we can call it binary heap. This
chapter explains three different realizations for binary heap.

7.2.1 Definition
The first one is the implicit binary tree. Consider the problem of how to represent
a complete binary tree with an array. (For example, try to represent a complete
binary tree in a programming language that doesn't support structure or record
data types, where only arrays can be used.) One solution is to pack all elements from
the top level (root) down to the bottom level (leaves).
Figure 7.1 shows a complete binary tree and its corresponding array repre-
sentation.

(The tree holds 16; 14, 10; 8, 7, 9, 3; 2, 4, 1 level by level, and the corresponding array is 16 14 10 8 7 9 3 2 4 1.)

Figure 7.1: Mapping between a complete binary tree and array

This mapping between tree and array can be defined as the following equa-
tions (The array index starts from 1).
1: function Parent(i)
2: return ⌊ 2i ⌋

3: function Left(i)
4: return 2i

5: function Right(i)
6: return 2i + 1
For a given tree node which is represented as the i-th element of the array,
since the tree is complete, we can easily find its parent node as the ⌊i/2⌋-th
element; Its left child with index of 2i and right child of 2i + 1. If the index of
the child exceeds the length of the array, it means this node does not have such
a child (leaf for example).
In real implementation, this mapping can be calculated fast with bit-wise
operation like the following example ANSI C code. Note that, the array index
starts from zero in C like languages.

#define PARENT(i) ((((i) + 1) >> 1) - 1)

#define LEFT(i) (((i) << 1) + 1)

#define RIGHT(i) (((i) + 1) << 1)

7.2.2 Heapify
The most important thing for a heap algorithm is to maintain the heap property:
the top element should be the minimum (maximum) one.
For the implicit binary heap by array, it means that for a given node, represented
by the i-th index, we can develop a method to check whether both of its two
children are not less than the parent. In case there is a violation, we swap
the parent and the child recursively [2]. Note that here we assume both
sub-trees are already valid heaps.
The algorithm below shows the iterative solution to enforce the min-heap
property from a given index of the array.
1: function Heapify(A, i)
2: n ← |A|
3: loop
4: l ← Left(i)
5: r ← Right(i)
6: smallest ← i
7: if l < n ∧ A[l] < A[i] then
8: smallest ← l
9: if r < n ∧ A[r] < A[smallest] then
10: smallest ← r
11: if smallest ̸= i then
12: Exchange A[i] ↔ A[smallest]
13: i ← smallest
14: else
15: return
For array A and the given index i, neither of its children should be less than
A[i]. In case of violation, we pick the smallest element among A[i] and its children,
put it at position i, and swap the previous A[i] down to that child. The algorithm traverses the tree top-down to fix the
heap property until it either reaches a leaf or there is no more violation.
The Heapify algorithm takes O(lg n) time, where n is the number of elements.
This is because the number of loop iterations is proportional to the height of the complete
binary tree.
When implementing this algorithm, the comparison method can be passed as
a parameter, so that both min-heap and max-heap can be supported. The
following ANSI C example code uses this approach.
typedef int (*Less)(Key, Key);
int less(Key x, Key y) { return x < y; }
int notless(Key x, Key y) { return !less(x, y); }

void heapify(Key* a, int i, int n, Less lt) {
    int l, r, m;
    while (1) {
        l = LEFT(i);
        r = RIGHT(i);
        m = i;
        if (l < n && lt(a[l], a[i]))
            m = l;
        if (r < n && lt(a[r], a[m]))
            m = r;
        if (m != i) {
            swap(a, i, m);
            i = m;
        } else
            break;
    }
}
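For comparison, the same sift-down procedure can be sketched in Python. This is only a sketch, not code from the original text: the names MIN_HEAP, MAX_HEAP and heapify are assumptions chosen to match the signatures used by the Python examples later in this chapter (e.g. build_heap(x, less_p) and heap_pop(x, less_p)).

def MIN_HEAP(a, b):            # comparator for a min-heap
    return a < b

def MAX_HEAP(a, b):            # comparator for a max-heap
    return b < a

def heapify(x, i, less_p = MIN_HEAP):
    # sift the element at index i down until both children satisfy the heap property
    n = len(x)
    while True:
        l, r, m = 2 * i + 1, 2 * i + 2, i    # 0-based left/right child indices
        if l < n and less_p(x[l], x[m]):
            m = l
        if r < n and less_p(x[r], x[m]):
            m = r
        if m == i:
            break
        x[i], x[m] = x[m], x[i]
        i = m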

Figure 7.2 illustrates the steps when Heapify processing the array {16, 4, 10, 14, 7, 9, 3, 2, 8, 1}
from the second index. The array changes to {16, 14, 10, 8, 7, 9, 3, 2, 4, 1} as a
max-heap.

7.2.3 Build a heap


With the Heapify algorithm defined, it is easy to build a heap from an arbitrary
array. Observe that the numbers of nodes in a complete binary tree for each
level form a list like below:
1, 2, 4, 8, ..., 2^i, ...
The only exception is the last level. Since the tree may not be full (note that a
complete binary tree isn't necessarily a full binary tree), the last level contains at
most 2^(p−1) nodes, where 2^p + 1 ≤ n and n is the length of the array.
The Heapify algorithm doesn't have any effect on a leaf node, so we can skip
applying Heapify to all leaves. In other words, all leaf nodes already
satisfy the heap property. We only need to start checking and maintaining the heap
property from the last branch node, whose index is no
greater than ⌊n/2⌋.
Based on this fact, we can build a heap with the following algorithm. (As-
sume the heap is min-heap).
1: function Build-Heap(A)
2: n ← |A|
3: for i ← ⌊n/2⌋ down to 1 do
4: Heapify(A, i)
Although the complexity of Heapify is O(lg n), the running time of Build-Heap
is not bounded by O(n lg n) but by O(n); it is a linear time algorithm. This
can be deduced as follows.
The heap is built by skipping all leaves. Given n nodes, there are at most
n/4 nodes being compared and moved down 1 time; at most n/8 nodes being
compared and moved down 2 times; at most n/16 nodes being compared and
moved down 3 times, ... Thus the upper bound of the total comparison and moving
time is:

S = n(1/4 + 2/8 + 3/16 + ...)          (7.1)

(a) Step 1, 14 is the biggest element among 4, 14, and 7. Swap 4 with the left child;
(b) Step 2, 8 is the biggest element among 2, 4, and 8. Swap 4 with the right child;
(c) 4 is a leaf node. It has no children. The process terminates.

Figure 7.2: Heapify example, a max-heap case.



Multiplying both sides by 2, we have:

2S = n(1/2 + 2/4 + 3/8 + ...)          (7.2)

Subtracting equation (7.1) from (7.2):

S = n(1/2 + 1/4 + 1/8 + ...) = n
Below ANSI C example program implements this heap building function.
void build_heap(Key* a, int n, Less lt) {
    int i;
    for (i = (n-1) >> 1; i >= 0; --i)
        heapify(a, i, n, lt);
}
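A Python counterpart can be sketched the same way. Again this is an assumption written to match the build_heap(x, less_p) calls that appear in the later Python examples; it relies on the heapify sketch given earlier.

def build_heap(x, less_p = MIN_HEAP):
    n = len(x)
    for i in reversed(range(n // 2)):   # from the last branch node down to the root
        heapify(x, i, less_p)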

Figure 7.3, 7.4 and 7.5 show the steps when building a max-heap from array
{4, 1, 3, 2, 16, 9, 10, 14, 8, 7}. The node in black color is the one where Heapify
being applied, the nodes in gray color are swapped in order to keep the heap
property.

7.2.4 Basic heap operations


The generic definition of the heap (not necessarily the binary heap) demands us
to provide basic operations for accessing and modifying data.
The most important operations include accessing the top element (finding the
minimum or maximum one), popping the top element from the heap, finding
the top k elements, decreasing a key (for a min-heap; it is increasing a key for a
max-heap), and insertion.
For the binary tree, most of these operations are bound to O(lg n) in the worst case;
some of them, such as top, take O(1) constant time.

Access the top element


For the binary tree realization, the root stores the minimum (maximum)
value; it is the first element in the array.
1: function Top(A)
2: return A[1]
This operation is trivial. It takes O(1) time. Here we skip the error handling
for empty case. If the heap is empty, one option is to raise an error.

Heap Pop
Pop operation is more complex than accessing the top, because the heap prop-
erty has to be maintained after the top element is removed.
The solution is to apply Heapify algorithm to the next element after the
root is removed.
One simple but slow method based on this idea looks like the following.
1: function Pop-Slow(A)
2: x ← Top(A)
3: Remove(A, 1)

(a) An array in arbitrary order, {4, 1, 3, 2, 16, 9, 10, 14, 8, 7}, before the heap building process;
(b) Step 1, the array is mapped to a binary tree. The first branch node, which is 16, is examined;
(c) Step 2, 16 is the largest element in the current sub tree; next is to check the node with value 2;

Figure 7.3: Build a heap from the arbitrary array. Gray nodes are changed in
each step, black node will be processed next step.

(a) Step 3, 14 is the largest value in the sub-tree, swap 14 and 2; next is to check the node with value 3;
(b) Step 4, 10 is the largest value in the sub-tree, swap 10 and 3; next is to check the node with value 1;

Figure 7.4: Build a heap from the arbitrary array. Gray nodes are changed in
each step, black node will be processed next step.

(a) Step 5, 16 is the largest value in the current sub-tree, swap 16 and 1 first; then similarly, swap 1 and 7; next is to check the root node with value 4;
(b) Step 6, swap 4 and 16, then swap 4 and 14, and then swap 4 and 8; the whole build process finishes.

Figure 7.5: Build a heap from the arbitrary array. Gray nodes are changed in
each step, black node will be processed next step.

4: if A is not empty then


5: Heapify(A, 1)
6: return x
This algorithm first records the top element in x, then it removes the first
element from the array, so the size of the array is reduced by one. After that, if the
array isn't empty, Heapify is applied to the new array from the first element
(which was previously the second one).
Removing the first element from an array takes O(n) time, where n is the length
of the array, because we need to shift all the rest of the elements one by one. This
bottleneck slows the whole algorithm down to linear time.
In order to solve this problem, one alternative is to swap the first element
with the last one in the array, then shrink the array size by one.
1: function Pop(A)
2: x ← Top(A)
3: n ← Heap-Size(A)
4: Exchange A[1] ↔ A[n]
5: Remove(A, n)
6: if A is not empty then
7: Heapify(A, 1)
8: return x
Removing the last element from the array takes only constant O(1) time, and
Heapify is bound to O(lg n). Thus the whole algorithm performs in O(lg n)
time. The following example ANSI C program implements this algorithm1 .
Key pop(Key* a, int n, Less lt) {
    swap(a, 0, --n);
    heapify(a, 0, n, lt);
    return a[n];
}
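The corresponding Python sketch is given below. It is an assumed helper matching the heap_pop(x, less_p) signature used by the top-k and heap sort examples in this chapter, and it reuses the heapify sketch from earlier; unlike the C version above, it really removes the last cell of the list.

def heap_pop(x, less_p = MIN_HEAP):
    top, last = x[0], x.pop()      # record the top, remove the last cell in O(1)
    if x:                          # move the last element to the root and fix the heap
        x[0] = last
        heapify(x, 0, less_p)
    return top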

Find the top k elements


With pop defined, it is easy to find the top k elements from array. we can build
a max-heap from the array, then perform pop operation k times.
1: function Top-k(A, k)
2: R←ϕ
3: Build-Heap(A)
4: for i ← 1 to Min(k, |A|) do
5: Append(R, Pop(A))
6: return R
If k is greater than the length of the array, we need return the whole array
as the result. That’s why it calls the Min function to determine the number of
loops.
Below example Python program implements the top-k algorithm.
def top_k(x, k, less_p = MIN_HEAP):
    build_heap(x, less_p)
    return [heap_pop(x, less_p) for _ in range(min(k, len(x)))]
1 This program does not actually remove the last element, it reuse the last cell to store the

popped result

Decrease key
A heap can be used to implement a priority queue, so it is important to support key
modification in the heap. One typical operation is to increase the priority of a task
so that it can be performed earlier.
Here we present the decrease key operation for a min-heap. The corresponding
operation is increase key for a max-heap. Figures 7.6 and 7.7 illustrate such a
case for a max-heap. The key of the 9-th node is increased from 4 to 15.

(a) The 9-th node with key 4 will be modified;
(b) The key is modified to 15, which is greater than its parent;
(c) According to the max-heap property, 8 and 15 are swapped.

Figure 7.6: Example process when increasing a key in a max-heap.

Once a key is decreased in a min-heap, the node may conflict with
the heap property: the key may become less than that of some ancestor. In order to
maintain the invariant, the following auxiliary algorithm is defined to resume

(a) Since 15 is greater than its parent 14, they are swapped. After that, because 15 is less than 16, the process terminates.

Figure 7.7: Example process when increasing a key in a max-heap.

the heap property.


1: function Heap-Fix(A, i)
2: while i > 1 ∧ A[i] < A[ Parent(i) ] do
3: Exchange A[i] ↔ A[ Parent(i) ]
4: i ← Parent(i)

This algorithm repeatedly compares the keys of the parent node and the current
node, and swaps them if the current node holds the smaller key. This process
is performed from the current node towards the root, and stops as soon as the
parent node holds the smaller (or equal) key.
With this auxiliary algorithm, decrease key can be realized as below.
1: function Decrease-Key(A, i, k)
2: if k < A[i] then
3: A[i] ← k
4: Heap-Fix(A, i)

This algorithm is only triggered when the new key is less than the original
key. The performance is bound to O(lg n). Below example ANSI C program
implements the algorithm.

void heap_fix(Key* a, int i, Less lt) {
    while (i > 0 && lt(a[i], a[PARENT(i)])) {
        swap(a, i, PARENT(i));
        i = PARENT(i);
    }
}

void decrease_key(Key* a, int i, Key k, Less lt) {
    if (lt(k, a[i])) {
        a[i] = k;
        heap_fix(a, i, lt);
    }
}
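A Python sketch of the same two functions follows. These are assumed helpers; heap_fix(x, i, less_p) matches the call made by the heap_insert example below.

def heap_fix(x, i, less_p = MIN_HEAP):
    # move the element at index i up while it precedes its parent
    while i > 0 and less_p(x[i], x[(i - 1) // 2]):
        x[i], x[(i - 1) // 2] = x[(i - 1) // 2], x[i]
        i = (i - 1) // 2

def decrease_key(x, i, key, less_p = MIN_HEAP):
    if less_p(key, x[i]):
        x[i] = key
        heap_fix(x, i, less_p)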

Insertion
Insertion can be implemented by using Decrease-Key [2]. A new node with
∞ as its key is created. According to the min-heap property, it should be the last
element in the underlying array. After that, the key is decreased to the value
to be inserted, and Decrease-Key is called to fix any violation of the heap
property.
Alternatively, we can reuse Heap-Fix to implement insertion. The new key
is directly appended at the end of the array, and Heap-Fix is applied to
this new node.
1: function Heap-Push(A, k)
2: Append(A, k)
3: Heap-Fix(A, |A|)
The following example Python program implements the heap insertion algo-
rithm.
def heap_insert(x, key, less_p = MIN_HEAP):
    i = len(x)
    x.append(key)
    heap_fix(x, i, less_p)

7.2.5 Heap sort


Heap sort is an interesting application of the heap. According to the heap property,
the minimum (maximum) element can be easily accessed from the top of the heap. A
straightforward way to sort a list of values is to build a heap from them, then
continuously pop the smallest element till the heap is empty.
The algorithm based on this idea can be defined like below.
1: function Heap-Sort(A)
2: R←ϕ
3: Build-Heap(A)
4: while A ̸= ϕ do
5: Append(R, Heap-Pop(A))
6: return R
The following Python example program implements this definition.
def heap_sort(x, less_p = MIN_HEAP):
    res = []
    build_heap(x, less_p)
    while x != []:
        res.append(heap_pop(x, less_p))
    return res

When sorting n elements, Build-Heap is bounded by O(n). Since pop is
O(lg n) and it is called n times, the overall sorting takes O(n lg n) time to
run. Because we use another list to hold the result, the space requirement is
O(n).
Robert W. Floyd found a fast implementation of heap sort. The idea is to
build a max-heap instead of a min-heap, so the first element is the biggest one.
Then the biggest element is swapped with the last element in the array, so that
it is in the right position after sorting. As the last element becomes the new

top, it may violate the heap property. We can shrink the heap size by one and
perform Heapify to resume the heap property. This process is repeated till
there is only one element left in the heap.
1: function Heap-Sort(A)
2: Build-Max-Heap(A)
3: while |A| > 1 do
4: Exchange A[1] ↔ A[n]
5: |A| ← |A| − 1
6: Heapify(A, 1)
This is an in-place algorithm; it doesn't need any extra space to hold the result.
The following ANSI C example code implements this algorithm.
void heap_sort(Key* a, int n) {
    build_heap(a, n, notless);
    while(n > 1) {
        swap(a, 0, --n);
        heapify(a, 0, n, notless);
    }
}
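Floyd's in-place strategy can also be sketched in Python. This is a sketch only (not the book's code); it carries its own bounded sift-down instead of reusing the heapify sketch above, because the heap size shrinks while the list length stays the same.

def heap_sort_in_place(x):
    def sift_down(i, n):                  # like heapify, but bounded by n
        while True:
            l, r, m = 2 * i + 1, 2 * i + 2, i
            if l < n and x[l] > x[m]:     # max-heap: the bigger child wins
                m = l
            if r < n and x[r] > x[m]:
                m = r
            if m == i:
                return
            x[i], x[m] = x[m], x[i]
            i = m

    n = len(x)
    for i in reversed(range(n // 2)):     # build a max-heap
        sift_down(i, n)
    for j in reversed(range(1, n)):       # move the current maximum to the end
        x[0], x[j] = x[j], x[0]
        sift_down(0, j)
    return x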

Exercise 7.1

• Somebody considers an alternative way to realize in-place heap sort. Take
sorting the array in ascending order as an example: the first step is to build the
array into a minimum heap A, not the maximum heap of Floyd's
method. After that, the first element a1 is in the correct place. Next, treat
the rest {a2, a3, ..., an} as a new heap, and perform Heapify on them from
a2 for these n − 1 elements. Repeating this advance-and-Heapify step
from left to right would sort the array. The following example ANSI C
code illustrates this idea. Is this solution correct? If yes, prove it; if not,
why?
void heap_sort(Key* a, int n) {
    build_heap(a, n, less);
    while(--n)
        heapify(++a, 0, n, less);
}

• For the same reason, can we perform Heapify from left to right k
times to realize an in-place top-k algorithm like the ANSI C code below?
int tops(int k, Key* a, int n, Less lt) {
    build_heap(a, n, lt);
    for (k = MIN(k, n) - 1; k; --k)
        heapify(++a, 0, --n, lt);
    return k;
}

7.3 Leftist heap and Skew heap, the explicit binary heaps

Instead of using the implicit binary tree by array, it is natural to ask why we
can't use an explicit binary tree to realize the heap.
There are some problems that must be solved if we turn to the explicit binary tree
as the underlying data structure.
The first problem is about the Heap-Pop or Delete-Min operation. Consider
a binary tree represented in the form of left, key, and right as (L, k, R),
as shown in figure 7.8.

Figure 7.8: A binary tree; all elements in the children are not less than k.

If k is the top element, all elements in the left and right children are not less than
k in a min-heap. After k is popped, only the left and right children are left; they
have to be merged into a new tree. Since the heap property should be maintained
after the merge, the new root is still the smallest element.
Because both the left and right children are binary trees conforming to the heap
property, the two trivial cases can be defined immediately.

merge(H1, H2) = H2 : H1 = ϕ
                H1 : H2 = ϕ
                ?  : otherwise
Where ϕ means empty heap.
If neither the left nor the right child is empty, then since both satisfy the heap
property, their roots are the minimum elements of the respective sub-trees. We can compare these
two roots and select the smaller one as the new root of the merged heap.
For instance, let L = (A, x, B) and R = (A′, y, B′), where A, A′, B, and B′
are all sub-trees. If x < y, x will be the new root. We can either keep A and
recursively merge B and R; or keep B and merge A and R, so the new heap
can be one of the following.

• (merge(A, R), x, B)

• (A, x, merge(B, R))



Both are correct. One simplified solution is to always merge onto the right sub-tree.
The Leftist tree provides a systematic approach based on this idea.

7.3.1 Definition
The heap implemented with the Leftist tree is called the Leftist heap. The Leftist tree was first
introduced by C. A. Crane in 1972 [6].

Rank (S-value)
In the Leftist tree, a rank value (or S-value) is defined for each node. The rank is the
distance to the nearest external node, where an external node is the NIL concept
extended from the leaf node.
For example, in figure 7.9, the rank of NIL is defined as 0. Consider the root
node 4: the nearest external node is the child of node 8, so the rank of root
node 4 is 2. Because node 6 and node 8 both contain only NIL children, their rank
values are 1. Although node 5 has a non-NIL left child, its right
child is NIL, so its rank value, which is the minimum distance to NIL, is still 1.

Figure 7.9: rank(4) = 2, rank(6) = rank(8) = rank(5) = 1.

Leftist property
With the rank defined, we can create a strategy for merging:

• Every time when merging, we always merge onto the right child; denote the
rank of the new right sub-tree as rr;
• Compare the ranks of the left and right children; if the rank of the left sub-tree
is rl and rl < rr, we swap the left and the right children.

We call this the 'Leftist property'. In general, a Leftist tree always has the
shortest path to some external node on the right.
The Leftist tree tends to be very unbalanced. However, it ensures an important
property, as specified in the following theorem.
Theorem 7.3.1. If a Leftist tree T contains n internal nodes, the path from
root to the rightmost external node contains at most ⌊log(n + 1)⌋ nodes.

We skip the proof here; readers can refer to [7] and [1] for more information.
With this theorem, algorithms that operate along this path are all bounded by O(lg n).
We can reuse the binary tree definition, augmented with a rank field, to
define the Leftist tree, for example in the form (r, k, L, R) for the non-empty case.
Below Haskell code defines the Leftist tree.
data LHeap a = E -- Empty
| Node Int a (LHeap a) (LHeap a) -- rank, element, left, right

For empty tree, the rank is defined as zero. Otherwise, it’s the value of the
augmented field. A rank(H) function can be given to cover both cases.
rank(H) = 0 : H = ϕ
          r : otherwise, H = (r, k, L, R)          (7.3)
Here is the example Haskell rank function.
rank E = 0
rank (Node r _ _ _) = r

In the rest of this section, we denote rank(H) as rH

7.3.2 Merge
In order to realize 'merge', we need to develop an auxiliary algorithm to compare
the ranks and swap the children if necessary.
mk(k, A, B) = (rA + 1, k, B, A) : rA < rB
              (rB + 1, k, A, B) : otherwise          (7.4)
This function takes three arguments: a key and two sub-trees A and B. If
the rank of A is smaller, it builds a bigger tree with B as the left child and A
as the right child, and sets the rank of the new tree to rA + 1.
Otherwise, if B holds the smaller rank, then A is set as the left child and B
becomes the right one; the resulting rank is rB + 1.
The reason the rank needs to be increased by one is that a new key
is added on top of the tree, which makes the path to the nearest external node one node longer.
Denote the key, the left and right children for H1 and H2 as k1 , L1 , R1 , and
k2 , L2 , R2 respectively. The merge(H1 , H2 ) function can be completed by using
this auxiliary tool as below



merge(H1, H2) = H2 : H1 = ϕ
                H1 : H2 = ϕ
                mk(k1, L1, merge(R1, H2)) : k1 < k2          (7.5)
                mk(k2, L2, merge(H1, R2)) : otherwise

The merge function is always recursively called on the right side, and the
Leftist property is maintained. These facts ensure the performance being bound
to O(lg n).
The following Haskell example code implements the merge program.
merge E h = h
merge h E = h
merge h1@(Node _ x l r) h2@(Node _ y l' r') =
    if x < y then makeNode x l (merge r h2)
    else makeNode y l' (merge h1 r')

makeNode x a b = if rank a < rank b then Node (rank a + 1) x b a
                 else Node (rank b + 1) x a b
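The same merge and mk logic can also be written on an explicit node record in Python. The following sketch uses ad hoc names (LNode, rank, make), it is not code from the original text, and it mirrors the definitions above:

class LNode:
    def __init__(self, rank, key, left, right):
        self.rank, self.key, self.left, self.right = rank, key, left, right

def rank(h):
    return 0 if h is None else h.rank

def make(key, a, b):
    # keep the sub-tree with the larger rank on the left
    if rank(a) < rank(b):
        return LNode(rank(a) + 1, key, b, a)
    return LNode(rank(b) + 1, key, a, b)

def merge(h1, h2):
    if h1 is None: return h2
    if h2 is None: return h1
    if h1.key < h2.key:
        return make(h1.key, h1.left, merge(h1.right, h2))
    return make(h2.key, h2.left, merge(h1, h2.right))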

Merge operation in implicit binary heap by array


The implicit binary heap by array performs very fast in most cases, and it fits modern
computers with cache technology well. However, its merge algorithm is bounded by
O(n) time. The typical realization is to concatenate the two arrays together and
build a heap from the result [13].
1: function Merge-Heap(A, B)
2: C ← Concat(A, B)
3: Build-Heap(C)
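A direct Python sketch of this O(n) merge follows (the function name is ad hoc, and it uses the build_heap sketched earlier):

def merge_heap(a, b, less_p = MIN_HEAP):
    c = a + b              # concatenate the two arrays, O(n)
    build_heap(c, less_p)
    return c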

7.3.3 Basic heap operations


Most of the basic heap operations can be implemented with merge algorithm
defined above.

Top and pop


Because the smallest element is always held in the root, it's trivial to find the
minimum value; it's a constant O(1) operation. The equation below extracts the root
from a non-empty heap H = (r, k, L, R). The error handling for the empty case is
skipped here.

top(H) = k (7.6)
For the pop operation, the top element is removed first, then the left and right
children are merged into a new heap.

pop(H) = merge(L, R)          (7.7)

Because it calls merge directly, the pop operation on the Leftist heap is bounded
by O(lg n).

Insertion
To insert a new element, one solution is to create a single-node tree containing the
element, and then merge it into the existing Leftist tree.

insert(H, k) = merge(H, (1, k, ϕ, ϕ))          (7.8)

It is an O(lg n) algorithm since insertion also calls merge directly.
There is a convenient way to build a Leftist heap from a list: we
continuously insert the elements one by one into the empty heap. This can be
realized by folding.

build(L) = f old(insert, ϕ, L) (7.9)



Figure 7.10: A Leftist tree built from list {9, 4, 16, 7, 10, 2, 14, 3, 8, 1}.

Figure 7.10 shows one example Leftist tree built in this way.
The following example Haskell code gives reference implementation for the
Leftist tree operations.
insert h x = merge (Node 1 x E E) h

findMin (Node _ x _ _) = x

deleteMin (Node _ _ l r) = merge l r

fromList = foldl insert E

7.3.4 Heap sort by Leftist Heap


With all the basic operations defined, it’s straightforward to implement heap
sort. We can firstly turn the list into a Leftist heap, then continuously extract
the minimum element from it.

sort(L) = heapSort(build(L)) (7.10)

heapSort(H) = ϕ : H = ϕ
              {top(H)} ∪ heapSort(pop(H)) : otherwise          (7.11)

Because pop is a logarithmic operation and it is recursively called n times, this
algorithm takes O(n lg n) time in total. The following Haskell example program
implements heap sort with the Leftist tree.
heapSort = hsort ◦ fromList where
hsort E = []
hsort h = (findMin h):(hsort $ deleteMin h)

7.3.5 Skew heaps


Leftist heap leads to quite unbalanced structure sometimes. Figure 7.11 shows
one example. The Leftist tree is built by folding on list {16, 14, 10, 8, 7, 9, 3, 2, 4, 1}.

Figure 7.11: A very unbalanced Leftist tree built from list {16, 14, 10, 8, 7, 9, 3, 2, 4, 1}.

The Skew heap (or self-adjusting heap) simplifies the Leftist heap realization and
intends to solve the balance issue [9] [10].
When constructing the Leftist heap, we swap the left and right children during
merge if the rank on the left side is less than that on the right side. This compare-and-
swap strategy doesn't work well when either sub-tree has only one child, because
in such a case the rank of the sub-tree is always 1 no matter how big it is. A
'brute-force' approach is to swap the left and right children every time we
merge. This idea leads to the Skew heap.

Definition of Skew heap


The Skew heap is the heap realized with the Skew tree. The Skew tree is a special binary
tree: the minimum element is stored in the root, and every sub-tree is also a Skew tree.
It doesn't need to keep the rank (or S-value) field. We can reuse the binary tree
definition for the Skew heap; the tree is either empty, or in a pre-order form
(k, L, R). The Haskell code below defines the Skew heap like this.
data SHeap a = E -- Empty
| Node a (SHeap a) (SHeap a) -- element, left, right

Merge
The merge algorithm turns out to be very simple. When merging two non-empty Skew
trees, we compare the roots and pick the smaller one as the new root; then the
other tree, which contains the bigger element, is merged onto one of the sub-trees; finally, the
two children are swapped. Denote H1 = (k1, L1, R1) and H2 = (k2, L2, R2) if
they are not empty. If k1 < k2, for instance, select k1 as the new root. We
can either merge H2 into L1, or merge H2 into R1. Without loss of generality,
let's merge into R1. After swapping the two children, the final result is
(k1, merge(R1, H2), L1). Taking the edge cases into account, the merge algorithm is
defined as the following.
defined as the following.



merge(H1, H2) = H1 : H2 = ϕ
                H2 : H1 = ϕ
                (k1, merge(R1, H2), L1) : k1 < k2          (7.12)
                (k2, merge(H1, R2), L2) : otherwise

All the rest of the operations, including insert, top, and pop, are realized the same
way as in the Leftist heap by using merge, except that we don't need the rank any more.
Translating the above algorithm into Haskell yields the following example
program.

merge E h = h
merge h E = h
merge h1@(Node x l r) h2@(Node y l' r') =
if x < y then Node x (merge r h2) l
else Node y (merge h1 r') l'

insert h x = merge (Node x E E) h

findMin (Node x _ _) = x

deleteMin (Node _ l r) = merge l r

Different from the Leftist heap, if we feed ordered list to Skew heap, it can
build a fairly balanced binary tree as illustrated in figure 7.12.

Figure 7.12: The Skew tree is still balanced even when the input is an ordered list
{1, 2, ..., 10}.

7.4 Splay heap


The Leftist heap and Skew heap show that it's quite possible to realize the
heap data structure with an explicit binary tree. The Skew heap gives one method to
solve the tree balance problem. The Splay heap, on the other hand, uses another
method to keep the tree balanced.
The binary trees used in the Leftist heap and Skew heap are not binary search
trees (BST). If we turn the underlying data structure into a binary search tree,
the minimum (or maximum) element is not at the root any more; it takes O(lg n) time
to find it.
A binary search tree becomes inefficient if it isn't well balanced; most operations
degrade to O(n) in the worst case. Although the red-black tree can be
used to realize a binary heap, it's overkill. The Splay tree provides a lightweight
implementation with an acceptable dynamic balancing result.

7.4.1 Definition
The Splay tree uses a cache-like approach: it keeps rotating the currently accessed node
close to the top, so that the node can be accessed fast next time. It defines
this kind of operation as "splaying". For an unbalanced binary search tree, after
several splay operations, the tree tends to become more and more balanced. Most
basic operations of the Splay tree perform in amortized O(lg n) time. The Splay tree
was invented by Daniel Dominic Sleator and Robert Endre Tarjan in 1985 [11]
[12].

Splaying
There are two methods to do splaying. The first one needs to deal with many
different cases, but can be implemented fairly easily with pattern matching. The
second one has a uniform form, but its implementation is complex.
Denote the node currently being accessed as X, its parent node as P, and
its grandparent node as G (if they exist). There are 3 steps for splaying. Each
step contains 2 symmetric cases; for illustration purposes, only one case is shown
for each step.

• Zig-zig step. As shown in figure 7.13, in this case, X and P are children
on the same side of G, either both on left or right. By rotating 2 times,
X becomes the new root.

• Zig-zag step. As shown in figure 7.14, in this case, X and P are children
on different sides. X is on the left, P is on the right. Or X is on the right,
P is on the left. After rotation, X becomes the new root, P and G are
siblings.

• Zig step. As shown in figure 7.15, in this case, P is the root, we rotate the
tree, so that X becomes new root. This is the last step in splay operation.

Although there are 6 different cases, they can be handled in environments that
support pattern matching. Denote the non-empty binary tree in the form T =

(a) X and P are both left children or both right children; (b) X becomes the new root after rotating 2 times.

Figure 7.13: Zig-zig case.

(a) X and P are children on different sides; (b) X becomes the new root, P and G are siblings.

Figure 7.14: Zig-zag case.

(a) P is the root; (b) Rotate the tree to make X the new root.

Figure 7.15: Zig case.



(L, k, R). When accessing key Y in tree T, the splay operation can be defined as
below.



splay(T, X) = (a, X, (b, P, (c, G, d)))   : T = (((a, X, b), P, c), G, d), X = Y
              (((a, G, b), P, c), X, d)   : T = (a, G, (b, P, (c, X, d))), X = Y
              ((a, P, b), X, (c, G, d))   : T = ((a, P, (b, X, c)), G, d), X = Y
              ((a, G, b), X, (c, P, d))   : T = (a, G, ((b, X, c), P, d)), X = Y
              (a, X, (b, P, c))           : T = ((a, X, b), P, c), X = Y
              ((a, P, b), X, c)           : T = (a, P, (b, X, c)), X = Y
              T                           : otherwise
                                                                        (7.13)
The first two clauses handle the 'zig-zig' cases; the next two clauses handle
the 'zig-zag' cases; the last two clauses handle the zig cases. The tree isn't
changed in all other situations.
The following Haskell program implements this splay function.
data STree a = E -- Empty
| Node (STree a) a (STree a) -- left, key, right

-- zig-zig
splay t@(Node (Node (Node a x b) p c) g d) y =
if x == y then Node a x (Node b p (Node c g d)) else t
splay t@(Node a g (Node b p (Node c x d))) y =
if x == y then Node (Node (Node a g b) p c) x d else t
-- zig-zag
splay t@(Node (Node a p (Node b x c)) g d) y =
if x == y then Node (Node a p b) x (Node c g d) else t
splay t@(Node a g (Node (Node b x c) p d)) y =
if x == y then Node (Node a g b) x (Node c p d) else t
-- zig
splay t@(Node (Node a x b) p c) y = if x == y then Node a x (Node b p c) else t
splay t@(Node a p (Node b x c)) y = if x == y then Node (Node a p b) x c else t
-- otherwise
splay t _ = t

With the splay operation defined, every time we insert a new key, we call
the splay function to adjust the tree. If the tree is empty, the result is a leaf;
otherwise we compare the key with the root: if it is less than the root, we
recursively insert it into the left child and perform splaying after that; otherwise the
key is inserted into the right child.


insert(T, x) = (ϕ, x, ϕ)                         : T = ϕ
               splay((insert(L, x), k, R), x)    : T = (L, k, R), x < k          (7.14)
               splay((L, k, insert(R, x)), x)    : otherwise
The following Haskell program implements this insertion algorithm.
insert E y = Node E y E
insert (Node l x r) y
| x>y = splay (Node (insert l y) x r) y
| otherwise = splay (Node l x (insert r y)) y

Figure 7.16 shows the result of using this function. It inserts the ordered
elements {1, 2, ..., 10} one by one into the empty tree. With a normal binary search
tree this would build a very poor result which degrades into a linked list. The
splay method creates a more balanced result.

Figure 7.16: Splaying helps improve the balance.

Okasaki found a simple rule for Splaying [6]. Whenever we follow two left
branches, or two right branches continuously, we rotate the two nodes.

Based on this rule, splaying can be realized in such a way. When we access
node for a key x (can be during the process of inserting a node, or looking up a
node, or deleting a node), if we traverse two left branches or two right branches,
we partition the tree in two parts L and R, where L contains all nodes smaller
than x, and R contains all the rest. We can then create a new tree (for instance
in insertion), with x as the root, L as the left child, and R being the right child.

The partition process is recursive, because it will splay its children as well.



partition(T, p) =
    (ϕ, ϕ)                     : T = ϕ
    (T, ϕ)                     : T = (L, k, R), k < p, R = ϕ
    (((L, k, L′), k′, A), B)   : T = (L, k, (L′, k′, R′)), k < p, k′ < p, (A, B) = partition(R′, p)
    ((L, k, A), (B, k′, R′))   : T = (L, k, (L′, k′, R′)), k < p ≤ k′, (A, B) = partition(L′, p)
    (ϕ, T)                     : T = (L, k, R), p ≤ k, L = ϕ
    (A, (B, k′, (R′, k, R)))   : T = ((L′, k′, R′), k, R), p ≤ k, p ≤ k′, (A, B) = partition(L′, p)
    ((L′, k′, A), (B, k, R))   : T = ((L′, k′, R′), k, R), k′ ≤ p ≤ k, (A, B) = partition(R′, p)
                                                                                        (7.15)
Function partition(T, p) takes a tree T and a pivot p as arguments. The
first clause is the edge case: the partition result for the empty tree is a pair of empty left
and right trees. Otherwise, denote the tree as (L, k, R). We need to compare the
pivot p and the root k. If k < p, there are two sub-cases. One is the trivial case that
R is empty; according to the property of the binary search tree, all elements in T are
less than p, so the result pair is (T, ϕ). For the other case, R = (L′, k′, R′), we
need to further compare k′ with the pivot p. If k′ < p is also true, we recursively
partition R′ with the pivot: all the elements less than p in R′ are held in tree A,
and the rest are in tree B. The result pair is then composed of two trees: one is
((L, k, L′), k′, A); the other is B. If the key of the right sub-tree is not less than
the pivot, we recursively partition L′ with the pivot to give the intermediate
pair (A, B); the final pair of trees is composed of (L, k, A) and (B, k′, R′).
There are symmetric cases for p ≤ k; they are handled in the last three clauses.
Translating the above algorithm into Haskell yields the following partition
program.
partition E _ = (E, E)
partition t@(Node l x r) y
    | x < y =
        case r of
          E → (t, E)
          Node l' x' r' →
              if x' < y then
                  let (small, big) = partition r' y in
                  (Node (Node l x l') x' small, big)
              else
                  let (small, big) = partition l' y in
                  (Node l x small, Node big x' r')
    | otherwise =
        case l of
          E → (E, t)
          Node l' x' r' →
              if y < x' then
                  let (small, big) = partition l' y in
                  (small, Node big x' (Node r' x r))
              else
                  let (small, big) = partition r' y in
                  (Node l' x' small, Node big x r)
Alternatively, insertion can be realized with the partition algorithm. When
inserting a new element k into the splay heap T, we first partition the heap
into two trees, L and R, where L contains all nodes smaller than k, and R
contains the rest. We then construct a new node, with k as the root and L, R
as the children.

insert(T, k) = (L, k, R), (L, R) = partition(T, k) (7.16)


The corresponding Haskell example program is as the following.
insert t x = Node small x big where (small, big) = partition t x
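For comparison, the partition-based insertion can also be sketched in Python on a plain node record. This is a sketch only (the Node class and names are ad hoc, not the book's code); it follows the partition cases given above.

class Node:
    def __init__(self, left, key, right):
        self.left, self.key, self.right = left, key, right

def partition(t, pivot):
    # split t into (small, big): keys less than pivot go to small, the rest to big
    if t is None:
        return (None, None)
    l, k, r = t.left, t.key, t.right
    if k < pivot:
        if r is None:
            return (t, None)
        if r.key < pivot:
            small, big = partition(r.right, pivot)
            return (Node(Node(l, k, r.left), r.key, small), big)
        small, big = partition(r.left, pivot)
        return (Node(l, k, small), Node(big, r.key, r.right))
    else:
        if l is None:
            return (None, t)
        if l.key < pivot:
            small, big = partition(l.right, pivot)
            return (Node(l.left, l.key, small), Node(big, k, r))
        small, big = partition(l.left, pivot)
        return (small, Node(big, l.key, Node(l.right, k, r)))

def insert(t, key):
    small, big = partition(t, key)
    return Node(small, key, big)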

Top and pop


Since the splay tree is just a special binary search tree, the minimum element is
stored in the leftmost node. We keep traversing the left child to realize the
top operation. Denote the non-empty tree T = (L, k, R); the top(T) function
can be defined as below.

top(T) = k      : L = ϕ
         top(L) : otherwise          (7.17)
This is exactly the min(T) algorithm for the binary search tree.
For the pop operation, the algorithm needs to remove the minimum element from
the tree. Whenever two left nodes are traversed, the splaying operation
should be performed.

pop(T) = R                          : T = (ϕ, k, R)
         (R′, k, R)                 : T = ((ϕ, k′, R′), k, R)          (7.18)
         (pop(L′), k′, (R′, k, R))  : T = ((L′, k′, R′), k, R)
Note that the third clause performs splaying without explicitly calling the
partition function; it utilizes the property of the binary search tree directly.
Both the top and pop algorithms are bounded by O(lg n) time because the
splay tree is balanced.
The following Haskell example programs implement the top and pop opera-
tions.
findMin (Node E x _) = x
findMin (Node l x _) = findMin l

deleteMin (Node E x r) = r
deleteMin (Node (Node E x' r') x r) = Node r' x r
deleteMin (Node (Node l' x' r') x r) = Node (deleteMin l') x' (Node r' x r)

Merge
Merge is another basic operation for heaps, as it is widely used in graph
algorithms. By using the partition algorithm, merge can be realized in O(lg n)
time.
When merging two splay trees, for the non-trivial case, we take the root of
the first tree as the new root, then partition the second tree with this new root
as the pivot. After that we recursively merge the children of the first tree with the
partition result. This algorithm is defined as the following.

merge(T1, T2) = T2 : T1 = ϕ
                (merge(L, A), k, merge(R, B)) : T1 = (L, k, R), (A, B) = partition(T2, k)
                                                                                  (7.19)
If the first heap is empty, the result is definitely the second heap. Otherwise,
denote the first splay heap as (L, k, R), we partition T2 with k as the pivot to
yield (A, B), where A contains all the elements in T2 which are less than k, and
B holds the rest. We next recursively merge A with L; and merge B with R as
the new children for T1 .
Translating the definition to Haskell gives the following example program.
merge E t = t
merge (Node l x r) t = Node (merge l l') x (merge r r')
where (l', r') = partition t x

7.4.2 Heap sort


Since the internal implementation of the Splay heap is completely transparent
to the heap interface, the heap sort algorithm can be reused. This means the
heap sort algorithm is generic no matter what the underlying data structure
is.

7.5 Notes and short summary


In this chapter, we define the binary heap more generally: as long as the heap
property is maintained, any binary representation of data structures can be used
to implement the binary heap.
This definition isn't limited to the popular array-based binary heap, but
also extends to the explicit binary heaps including the Leftist heap, Skew heap, and
Splay heap. The array-based binary heap is particularly convenient for the
imperative implementation because it intensively uses random index access, which
can be mapped to a complete binary tree. It's hard to find a direct functional
counterpart in this way.
However, by using an explicit binary tree, a functional implementation can be
achieved. Most of them have O(lg n) worst-case performance, and some of them
even reach O(1) amortized time. Okasaki in [6] shows a detailed analysis of these
data structures.
In this chapter, only purely functional realizations of the Leftist heap, Skew
heap, and Splay heap are explained; they can all be realized in imperative
approaches as well.

It’s very natural to extend the concept from binary tree to k-ary (k-way)
tree, which leads to other useful heaps such as Binomial heap, Fibonacci heap
and pairing heap. They are introduced in the following chapters.

Exercise 7.2

• Realize the imperative Leftist heap, Skew heap, and Splay heap.
Bibliography

[1] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein. “Introduction to Algorithms, Second Edition”. The MIT Press, 2001. ISBN: 0262032937.

[2] Heap (data structure), Wikipedia. http://en.wikipedia.org/wiki/Heap_(data_structure)

[3] Heapsort, Wikipedia. http://en.wikipedia.org/wiki/Heapsort

[4] Chris Okasaki. “Purely Functional Data Structures”. Cambridge university press, (July 1, 1999), ISBN-13: 978-0521663502

[5] Sorting algorithms/Heapsort. Rosetta Code. http://rosettacode.org/wiki/Sorting_algorithms/Heapsort

[6] Leftist Tree, Wikipedia. http://en.wikipedia.org/wiki/Leftist_tree

[7] Bruno R. Preiss. Data Structures and Algorithms with Object-Oriented Design Patterns in Java. http://www.brpreiss.com/books/opus5/index.html

[8] Donald E. Knuth. “The Art of Computer Programming. Volume 3: Sorting and Searching”. Addison-Wesley Professional; 2nd Edition (October 15, 1998). ISBN-13: 978-0201485417. Section 5.2.3 and 6.2.3

[9] Skew heap, Wikipedia. http://en.wikipedia.org/wiki/Skew_heap

[10] Sleator, Daniel Dominic; Tarjan, Robert Endre. “Self-adjusting heaps”. SIAM Journal on Computing 15(1):52-69. doi:10.1137/0215004 ISSN 00975397 (1986)

[11] Splay tree, Wikipedia. http://en.wikipedia.org/wiki/Splay_tree

[12] Sleator, Daniel D.; Tarjan, Robert E. (1985), “Self-Adjusting Binary Search Trees”, Journal of the ACM 32(3):652 - 686, doi: 10.1145/3828.3835

[13] NIST, “binary heap”. http://xw2k.nist.gov/dads//HTML/binaryheap.html

Chapter 8

From grape to the world cup, the evolution of selection sort

8.1 Introduction
We have introduced the 'hello world' sorting algorithm, insertion sort. In this
short chapter, we explain another straightforward sorting method, selection sort.
The basic version of selection sort doesn't perform as well as the divide and
conquer methods, e.g. quick sort and merge sort. We'll use the same approach
as in the chapter on insertion sort to analyze why it's slow, and try to
improve it with a variety of attempts until we reach the best bound of comparison-based
sorting, O(n lg n), by evolving to heap sort.
The idea of selection sort can be illustrated by a real-life story. Consider
a kid eating a bunch of grapes. There are two types of children according to
my observation. One is the optimistic type: the kid always eats the biggest
grape he/she can find; the other is pessimistic: he/she always eats the
smallest one.
The first type of kid actually eats the grapes in an order of monotonically decreasing
size, while the other eats in increasing order. The kid in fact sorts the grapes
by size, and the method used here is selection sort.
Based on this idea, the algorithm of selection sort can be directly described
as the following.
In order to sort a series of elements:

• The trivial case: if the series is empty, then we are done; the result is also
empty;

• Otherwise, we find the smallest element, append it to the tail of the
result, and continue sorting the rest in the same way.

Note that this algorithm sorts the elements in increasing order; it's easy to
sort in decreasing order by picking the biggest element instead. We'll introduce
passing a comparator as a parameter later on.


Figure 8.1: Always picking the smallest grape.

This description can be formalized into an equation:

sort(A) = ϕ              : A = ϕ
          {m} ∪ sort(A′) : otherwise          (8.1)

Where m is the minimum element of the collection A, and A′ is all the rest of the
elements except m:

m = min(A)
A′ = A − {m}

We don't limit the data structure of the collection here. Typically, A is an
array in an imperative environment, and a list (a singly linked list particularly) in
a functional environment; it can even be some other data structure, which will be
introduced later.
The algorithm can also be given in imperative manner.
function Sort(A)
X←ϕ
while A ̸= ϕ do
x ← Min(A)
A ← Del(A, x)
X ← Append(X, x)
return X
Figure 8.2 depicts the process of this algorithm.

pick

... sorted elements ... min ... unsorted elements ...

Figure 8.2: The left part is sorted data, continuously pick the minimum element
in the rest and append it to the result.

We just translated the very original idea of 'eating grapes' line by line without
considering any expense of time and space. This realization stores the result in
X, and when a selected element is appended to X, we delete the same element
from A. This indicates that we can change it into an 'in-place' sort to reuse the
space in A.
The idea is to store the minimum element in the first cell of A (we use the
term 'cell' if A is an array, and say 'node' if A is a list); then store the second
minimum element in the next cell, then the third cell, ...
One solution to realize this sorting strategy is swapping. When we select
the i-th minimum element, we swap it with the element in the i-th cell:
function Sort(A)
for i ← 1 to |A| do
m ← Min(A[i...])
Exchange A[i] ↔ m
Denote A = {a1, a2, ..., an}. At any time, when we process the i-th element,
all elements before i, namely {a1, a2, ..., ai−1}, have already been sorted. We locate
the minimum element among {ai, ai+1, ..., an} and exchange it with ai, so
that the i-th cell contains the right value. The process is repeated
until we arrive at the last element.
This idea can be illustrated by figure 8.3.

insert

... sorted elements ... x ... unsorted elements ...

Figure 8.3: The left part is sorted data, continuously pick the minimum element
in the rest and put it to the right position.

8.2 Finding the minimum

We haven't completely realized the selection sort yet, because we have taken the operation
of finding the minimum (or the maximum) element as a black box. It's a puzzle
how a kid locates the biggest or the smallest grape, and this is an interesting
topic for computer algorithms.
The easiest but not so fast way to find the minimum in a collection is to
perform a scan. There are several ways to interpret this scan process. Consider
that we want to pick the biggest grape. We start from any grape, compare
it with another one, and pick the bigger one; then we take the next grape and
compare it with the one we have selected so far, pick the bigger one and go on this
take-and-compare process, until there aren't any grapes we haven't compared.
It's easy to get lost in real practice if we don't mark which grape has been
compared. There are two ways to solve this problem, which are suitable for
different data structures respectively.

8.2.1 Labeling
Method 1 is to label each grape with a number: {1, 2, ..., n}, and systematically
perform the comparison in the order of this sequence of labels. We
first compare grape number 1 and grape number 2, pick the bigger one; then we
take grape number 3, and do the comparison, ... We repeat this process until we
arrive at grape number n. This is quite suitable for elements stored in an array.
function Min(A)
m ← A[1]
for i ← 2 to |A| do
if A[i] < m then
m ← A[i]
return m
With Min defined, we can complete the basic version of selection sort (the
naive version without any optimization in terms of time and space).
However, this algorithm returns the value of the minimum element instead
of its location (or the label of the grape), which needs a bit of tweaking for the
in-place version. Some languages, such as ISO C++, support returning a
reference as the result, so that the swap can be achieved directly as below.
template<typename T>
T& min(T* from, T* to) {
    T* m;
    for (m = from++; from != to; ++from)
        if (*from < *m)
            m = from;
    return *m;
}

template<typename T>
void ssort(T* xs, int n) {
    for (int i = 0; i < n; ++i)
        std::swap(xs[i], min(xs+i, xs+n));
}

In environments without reference semantics, the solution is to return the


location of the minimum element instead of the value:
function Min-At(A)
m ← First-Index(A)
for i ← m + 1 to |A| do
if A[i] < A[m] then
m←i
return m
Note that since we pass A[i...] to Min-At as the argument, we assume the
first element A[i] as the smallest one, and examine all elements A[i + 1], A[i +
2], ... one by one. Function First-Index() is used to retrieve i from the input
parameter.
The following Python example program, for example, completes the basic
in-place selection sort algorithm based on this idea. It explicitly passes the
range information to the function of finding the minimum location.
def ssort(xs):
    n = len(xs)
    for i in range(n):
        m = min_at(xs, i, n)
        (xs[i], xs[m]) = (xs[m], xs[i])
    return xs

def min_at(xs, i, n):
    m = i
    for j in range(i+1, n):
        if xs[j] < xs[m]:
            m = j
    return m

8.2.2 Grouping
Another method is to group all grapes into two parts: the group we have examined,
and the rest we haven't. We denote these two groups as A and B, and all the
elements (grapes) as L. At the beginning, we haven't examined any grapes at
all, thus A is empty (ϕ), and B contains all grapes. We select two arbitrary
grapes from B, compare them, and put the loser (the bigger one, for example, when
looking for the minimum) into A. After that, we repeat this process by continuously picking arbitrary grapes
from B and comparing them with the winner of the previous round until B becomes
empty. At this point, the final winner is the minimum element, and A
turns out to be L − {min(L)}, which can be used for finding the minimum next time.
There is an invariant of this method: at any time, we have L = A ∪
{m} ∪ B, where m is the winner held so far.
This approach doesn't need the collection of grapes to be indexed (as labeled
in method 1). It's suitable for any traversable data structure, including
the linked list etc. Suppose b1 is an arbitrary element in B if B isn't empty, and B′
is the rest of the elements with b1 removed; this method can be formalized as
the auxiliary function below.

 (m, A) : B = ϕ
min′ (A, m, B) = min′ (A ∪ {m}, b1 , B ′ ) : b1 < m (8.2)

min′ (A ∪ {b1 }, m, B ′ ) : otherwise
In order to pick the minimum element, we call this auxiliary function by
passing an empty A, and use an arbitrary element (for instance, the first one)
to initialize m:

extractM in(L) = min′ (ϕ, l1 , L′ ) (8.3)


Where L′ is all the elements in L except for the first one l1. The algorithm
extractMin not only finds the minimum element, but also returns the
updated collection which doesn't contain this minimum any more. Combining this
minimum-extracting algorithm with the basic selection sort definition, we can create
a complete functional sorting program, for example as in this Haskell code snippet.
sort [] = []
sort xs = x : sort xs' where
(x, xs') = extractMin xs

extractMin (x:xs) = min' [] x xs where


min' ys m [] = (m, ys)
min' ys m (x:xs) = if m < x then min' (x:ys) m xs else min' (m:ys) x xs
The first line handles the trivial edge case: the sorting result for the empty list is
obviously empty. The second clause ensures that there is at least one
element; that's why the extractMin function doesn't need other pattern matching.
One may think the second clause of the min' function should be written like
below:
min' ys m (x:xs) = if m < x then min' (ys ++ [x]) m xs
                   else min' (ys ++ [m]) x xs
Otherwise it will produce the updated list in reverse order. Actually, it's necessary
to use 'cons' instead of appending here. This is because appending is a linear
operation proportional to the length of part A, while 'cons' is a constant
O(1) time operation. In fact, we needn't keep the relative order of the list to
be sorted, as it will be rearranged anyway during sorting.
It's quite possible to keep the relative order during sorting¹, while ensuring
the performance of finding the minimum element doesn't degrade to quadratic. The
following equation defines a solution.


extractMin(L) = (l1, ϕ)          : |L| = 1
                (l1, L′)         : l1 < m, (m, L′′) = extractMin(L′)          (8.4)
                (m, {l1} ∪ L′′)  : otherwise
If L is a singleton, the minimum is the only element it contains. Otherwise,
denote l1 as the first element in L, and let L′ contain the rest of the elements except for
l1, that is L′ = {l2, l3, ...}. The algorithm recursively finds the minimum element
in L′, which yields the intermediate result (m, L′′), where m is the minimum
element in L′, and L′′ contains all the rest of the elements except for m. Comparing l1
with m, we can determine which of them is the final minimum result.
The following Haskell program implements this version of selection sort.
sort [] = []
sort xs = x : sort xs' where
(x, xs') = extractMin xs

extractMin [x] = (x, [])


extractMin (x:xs) = if x < m then (x, xs) else (m, x:xs') where
(m, xs') = extractMin xs
Note that only the 'cons' operation is used; we don't need appending at all because
the algorithm actually examines the list from right to left. However, it's not
free, as this program needs to book-keep the context (via the call stack typically).
The relative order is ensured by the nature of recursion. Please refer to the
appendix about tail recursion for a detailed discussion.
8.2.3 Performance of the basic selection sort

Both the labeling method and the grouping method need to examine all the ele-
ments to pick the minimum in every round; and in total we pick the minimum

element n times. Thus the performance is around n + (n−1) + (n−2) + ... + 1
comparisons, which is n(n+1)/2. Selection sort is a quadratic algorithm bound to
O(n²) time.
Compared to the insertion sort, which we introduced previously, selection sort
performs the same in its best case, worst case and average case. Insertion
sort, however, performs well in its best case (when the list is reverse ordered and
stored in a linked-list) as O(n), while its worst case performance is O(n²).
In the next sections, we'll examine why selection sort performs poorly, and
try to improve it step by step.

Exercise 8.1

• Implement the basic imperative selection sort algorithm (the non-in-place
version) in your favorite programming language. Compare it with the in-
place version, and analyze the time and space efficiency.

8.3 Minor Improvement


8.3.1 Parameterize the comparator
Before any improvement in terms of performance, let’s make the selection sort
algorithm general enough to handle different sorting criteria.
We've seen two opposite examples so far: one may need to sort the elements
in ascending order or in descending order. For the former case, we repeatedly
find the minimum, while for the latter, we find the maximum instead.
They are just two special cases. In real world practice, one may want to sort
things by various criteria, e.g. by size, weight, age, ...
One solution to handle them all is to pass the criterion as a compare
function to the basic selection sort algorithm. For example:

sort(c, L) = ϕ                    : L = ϕ
             {m} ∪ sort(c, L′′)   : otherwise, (m, L′′) = extract(c, L)        (8.5)
And the algorithm extract(c, L) is defined as below.


extract(c, L) = (l1, ϕ)            : |L| = 1
                (l1, L′)           : c(l1, m), (m, L′′) = extract(c, L′)
                (m, {l1} ∪ L′′)    : ¬c(l1, m)        (8.6)
Where c is a comparator function; it takes two elements, compares them and
tells which one precedes the other. Passing the 'less than'
operator (<) turns this algorithm into the version we introduced in the previous
section.
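To make this concrete, here is a small Haskell sketch (our own illustration, not one of the book's example programs) that renders equations (8.5) and (8.6) directly; the name sortBy' and the comparator parameter lt are hypothetical:
sortBy' :: (a → a → Bool) → [a] → [a]
sortBy' _  []  = []
sortBy' lt xs  = m : sortBy' lt xs' where
    (m, xs') = extractBy xs
    extractBy [x]    = (x, [])
    extractBy (y:ys) = let (m', ys') = extractBy ys in
                       if lt y m' then (y, ys) else (m', y : ys')
For example, sortBy' (<) [3, 1, 2] gives [1, 2, 3], while passing a comparator such as (\(a, _) (b, _) → a < b) sorts a list of pairs by their first components.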
Some environments require passing a total ordering comparator, which
returns one of 'less than', 'equal', and 'greater than'. We don't need such a
strong condition here; c only tests whether 'less than' is satisfied. However, as the
minimum requirement, the comparator should meet the strict weak ordering as
follows [16]:

• Irreflexivity: for all x, it's not the case that x < x;

• Asymmetry: for all x and y, if x < y, then it's not the case that y < x;

• Transitivity: for all x, y, and z, if x < y and y < z, then x < z.

The following Scheme/Lisp program translates this generic selection sort
algorithm. The reason we choose Scheme/Lisp here is that lexical
scope removes the need to pass the 'less than' comparator to every function
call.
(define (sel-sort-by ltp? lst)
  (define (ssort lst)
    (if (null? lst)
        lst
        (let ((p (extract-min lst)))
          (cons (car p) (ssort (cdr p))))))
  (define (extract-min lst)
    (if (null? (cdr lst))
        lst
        (let ((p (extract-min (cdr lst))))
          (if (ltp? (car lst) (car p))
              lst
              (cons (car p) (cons (car lst) (cdr p)))))))
  (ssort lst))

Note that, both ssort and extract-min are inner functions, so that the
‘less than’ comparator ltp? is available to them. Passing ‘<’ to this function
yields the normal sorting in ascending order:
(sel-sort-by < '(3 1 2 4 5 10 9))
;Value 16: (1 2 3 4 5 9 10)

It's possible to pass various comparators to the imperative selection sort as well.
This is left as an exercise to the reader.
For the sake of brevity, we only consider sorting elements in ascending order
in the rest of this chapter, and we won't pass a comparator as a parameter unless
it's necessary.

8.3.2 Trivial fine tune


The basic in-place imperative selection sort algorithm iterates over all elements,
and picks the minimum by traversing as well. It can be written in a compact
way, by inlining the minimum finding part as an inner loop.
procedure Sort(A)
    for i ← 1 to |A| do
        m ← i
        for j ← i + 1 to |A| do
            if A[j] < A[m] then
                m ← j
        Exchange A[i] ↔ A[m]
Observe that, when we are sorting n elements, after the first n − 1 minimums
are selected, the only one left is definitely the n-th (the largest), so that

we need NOT find the minimum if there is only one element in the list. This
indicates that the outer loop can iterate to n − 1 instead of n.
Another place we can fine tune is that we needn't swap the elements if the
i-th minimum one is already A[i]. The algorithm can be modified accordingly as
below:
procedure Sort(A)
    for i ← 1 to |A| − 1 do
        m ← i
        for j ← i + 1 to |A| do
            if A[j] < A[m] then
                m ← j
        if m ̸= i then
            Exchange A[i] ↔ A[m]
Definitely, these modifications won't affect the performance in terms of big-
O.

8.3.3 Cock-tail sort


Knuth gave an alternative realization of selection sort in [1]. Instead of selecting
the minimum each time, we can select the maximum element and put it at the
last position. This method can be illustrated by the following algorithm.
procedure Sort'(A)
    for i ← |A| down-to 2 do
        m ← i
        for j ← 1 to i − 1 do
            if A[m] < A[j] then
                m ← j
        Exchange A[i] ↔ A[m]
As shown in figure 8.4, at any time, the elements on the right most side are
sorted. The algorithm scans all the unsorted ones and locates the maximum. Then,
it puts the maximum at the tail of the unsorted range by swapping.

Figure 8.4: Select the maximum every time and put it to the end.

This version reveals the fact that selecting the maximum element can sort
the elements in ascending order as well. What's more, we can find both the
minimum and the maximum elements in one pass of traversing, putting the
minimum at the first location and the maximum at the last position.
This approach can speed up the sorting slightly (it halves the number of outer
loop iterations). This method is called 'cock-tail sort'.
procedure Sort(A)
    for i ← 1 to ⌊|A|/2⌋ do
        min ← i
        max ← |A| + 1 − i
        if A[max] < A[min] then
            Exchange A[min] ↔ A[max]
        for j ← i + 1 to |A| − i do
            if A[j] < A[min] then
                min ← j
            if A[max] < A[j] then
                max ← j
        Exchange A[i] ↔ A[min]
        Exchange A[|A| + 1 − i] ↔ A[max]
This algorithm is illustrated in figure 8.5: at any time, the left most
and right most parts contain the elements sorted so far; the smaller sorted ones
are on the left, while the bigger sorted ones are on the right. The algorithm scans
the unsorted range, locates both the minimum and the maximum positions,
then puts them to the head and the tail of the unsorted range by
swapping.

Figure 8.5: Select both the minimum and maximum in one pass, and put them
to the proper positions.

Note that it's necessary to swap the left most and right most elements before
the inner loop if they are not in the correct order. This is because we scan the range
excluding these two elements. Another method is to initialize the first element of
the unsorted range as both the maximum and minimum before the inner loop.
However, since we need two swapping operations after the scan, it's possible
that the first swap moves the maximum or the minimum away from the position
we just found, which makes the second swap malfunction. How to solve
this problem is left as an exercise to the reader.
The following Python example program implements this cock-tail sort algo-
rithm.
def cocktail_sort(xs):
    n = len(xs)
    for i in range(n // 2):
        (mi, ma) = (i, n - 1 - i)
        if xs[ma] < xs[mi]:
            (xs[mi], xs[ma]) = (xs[ma], xs[mi])
        for j in range(i + 1, n - 1 - i):
            if xs[j] < xs[mi]:
                mi = j
            if xs[ma] < xs[j]:
                ma = j
        (xs[i], xs[mi]) = (xs[mi], xs[i])
        (xs[n - 1 - i], xs[ma]) = (xs[ma], xs[n - 1 - i])
    return xs

It's possible to realize cock-tail sort in a functional approach as well. An
intuitive recursive description can be given like this:

• Trivial edge case: if the list is empty, or there is only one element in the
list, the sorted result is obviously the original list;

• Otherwise, we select the minimum and the maximum, put them in the
head and tail positions, then recursively sort the rest of the elements.

This algorithm description can be formalized by the following equation.


sort(L) = L                              : |L| ≤ 1
          {lmin} ∪ sort(L′′) ∪ {lmax}    : otherwise        (8.7)
Where the minimum and the maximum are extracted from L by a function
select(L).

(lmin , L′′ , lmax ) = select(L)


Note that the minimum is actually linked to the front of the recursive sort
result. Its semantics is a constant O(1) time 'cons' (refer to the appendix of this
book for details), while the maximum is appended to the tail, which is typically
an expensive linear O(n) time operation. We'll optimize it later.
Function select(L) scans the whole list to find both the minimum and the
maximum. It can be defined as below:


select(L) = (min(l1, l2), ϕ, max(l1, l2))   : L = {l1, l2}
            (l1, {lmin} ∪ L′′, lmax)        : l1 < lmin
            (lmin, {lmax} ∪ L′′, l1)        : lmax < l1
            (lmin, {l1} ∪ L′′, lmax)        : otherwise        (8.8)
Where (lmin, L′′, lmax) = select(L′) and L′ is the rest of the list except for
the first element l1. If there are only two elements in the list, we pick the
smaller as the minimum and the bigger as the maximum. After extracting them,
the list becomes empty. This is the trivial edge case. Otherwise, we take the first
element l1 out, then recursively perform selection on the rest of the list. After
that, we compare whether l1 is less than the minimum or greater than the maximum
candidate, so that we can finalize the result.
Note that for all the cases, there is no appending operation to form the result.
However, since selection must scan all the elements to determine the minimum
and the maximum, it is bound to O(n) linear time.
The complete example Haskell program is given as the following.
csort [] = []
csort [x] = [x]
csort xs = mi : csort xs' ++ [ma] where
    (mi, xs', ma) = extractMinMax xs

extractMinMax [x, y] = (min x y, [], max x y)
extractMinMax (x:xs) | x < mi = (x, mi:xs', ma)
                     | ma < x = (mi, ma:xs', x)
                     | otherwise = (mi, x:xs', ma)
    where (mi, xs', ma) = extractMinMax xs
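As a quick sanity check (this example is ours, not from the book's text), loading csort and extractMinMax in GHCi, one would expect something like:
csort [3, 1, 4, 1, 5, 9, 2, 6]
-- expected: [1,1,2,3,4,5,6,9]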

We mentioned that the appending operation is expensive in this intuitive
version. It can be improved in two steps. The first step is
to convert the cock-tail sort into a tail-recursive form. Denote the sorted small ones
as A, and the sorted big ones as B in figure 8.5. We use A and B as accumulators.
The new cock-tail sort is defined as the following.

sort′(A, L, B) = A ∪ L ∪ B                             : L = ϕ ∨ |L| = 1
                 sort′(A ∪ {lmin}, L′′, {lmax} ∪ B)    : otherwise        (8.9)
Where lmin, lmax and L′′ are defined the same as before. And we start sorting
by passing empty A and B: sort(L) = sort′(ϕ, L, ϕ).
Besides the edge case, observe that the appending operation only happens
on A ∪ {lmin}, while lmax is only linked to the head of B. This appending
occurs in every recursive call. To eliminate it, we can store A in reverse order
as ←A, so that lmin can be 'cons'ed to the head instead of appended. Denote
cons(x, L) = {x} ∪ L and append(L, x) = L ∪ {x}; we have the below equation.

append(L, x) = reverse(cons(x, reverse(L)))
             = reverse(cons(x, ←L))        (8.10)


Finally, we perform a reverse to turn ←A back to A. Based on this idea, the
algorithm can be improved one step further as the following.


sort′(A, L, B) = reverse(A) ∪ B                            : L = ϕ
                 reverse({l1} ∪ A) ∪ B                     : |L| = 1
                 sort′({lmin} ∪ A, L′′, {lmax} ∪ B)        : otherwise        (8.11)
This algorithm can be implemented by Haskell as below.
csort' xs = cocktail [] xs [] where
    cocktail as [] bs = reverse as ++ bs
    cocktail as [x] bs = reverse (x:as) ++ bs
    cocktail as xs bs = let (mi, xs', ma) = extractMinMax xs
                        in cocktail (mi:as) xs' (ma:bs)

Exercise 8.2

• Realize the imperative basic selection sort algorithm, which can take a
comparator as a parameter. Please try both a dynamically typed language and
a statically typed language. How can the type of the comparator be annotated as
generally as possible in a statically typed language?
• Implement Knuth's version of selection sort in your favorite programming
language.
• An alternative way to realize cock-tail sort is to assume the i-th element is both
the minimum and the maximum; after the inner loop, the minimum and
maximum are found, then we can swap the minimum to the i-th
position, and the maximum to position |A| + 1 − i. Implement this solution
in your favorite imperative language. Please note that there are several
special edge cases that should be handled correctly:

– A = {max, min, ...};


– A = {..., max, min};
– A = {max, ..., min}.
Please don’t refer to the example source code along with this chapter
before you try to solve this problem.
• Realize the function select(L) by folding.

8.4 Major improvement


Although cock-tail sort halves the number of outer loop iterations, the performance is still
bound to quadratic time. It means that the method we have developed so far handles
big data poorly compared to other divide and conquer sorting solutions.
To improve selection based sort essentially, we must analyze where the
bottleneck is. In order to sort the elements by comparison, we must examine all
the elements for ordering. Thus the outer loop of selection sort is necessary.
However, must it scan all the elements every time to select the minimum? Note
that when we pick the smallest one the first time, we actually traverse the
whole collection, so that we partially know which ones are relatively big, and which ones
are relatively small.
The problem is that, when we select the subsequent minimum elements, instead
of re-using the ordering information we obtained previously, we drop it all,
and blindly start a new traversal.
So the key point to improve selection based sort is to re-use the previous
result. There are several approaches; we'll adopt an intuitive idea inspired by
the football match in this chapter.

8.4.1 Tournament knock out


The football world cup is held every four years. There are 32 teams from
different continents playing in the final tournament. Before 1982, there were 16 teams
competing in the tournament finals [4].
For simplification purposes, let's go back to 1978 and imagine a way to de-
termine the champion: in the first round, the teams are grouped into 8 pairs
to play the games; after that, there will be 8 winners, and 8 teams will be out.
Then in the second round, these 8 teams are grouped into 4 pairs. This time
there will be 4 winners after the second round of games; then the top 4 teams
are divided into 2 pairs, so that there will be only two teams left for the final
game.
The champion is determined after a total of 4 rounds of games, and there
are actually 8 + 4 + 2 + 1 = 15 games. Now we have the world cup champion;
however, the world cup won't finish at this stage, we need to determine
which is the silver medal team.
Readers may argue: isn't the team beaten by the champion in the fi-
nal game the second best? This is true according to the real world cup rule.
However, it isn't fair enough in some sense.
We often hear about the so-called 'group of death'. Let's suppose that
the Brazilian team is grouped with the German team at the very beginning. Although both
teams are quite strong, one of them must be knocked out. It's quite possible
that the team which loses that game could beat all the other teams except for the
champion. Figure 8.6 illustrates such a case.

(The tournament tree has leaves 7 6 15 16 8 4 13 3 5 10 9 1 12 2 11 14; each internal
node holds the winner of its two children, with 16 at the root.)

Figure 8.6: The element 15 is knocked out in the first round.

Imagine that every team has a number. The bigger the number, the stronger
the team. Suppose that the stronger team always beats the team with the smaller
number, although this is not true in the real world. But this simplification is fair
enough for us to develop the tournament knock out solution. The maximum
number, which represents the champion, is 16. Definitely, the team with number 14
isn't the second best according to our rules. It should be 15, which is knocked
out in the first round of comparison.
The key question here is to find an efficient way to locate the second max-
imum number in this tournament tree. After that, what we need is to apply
the same method to select the third, the fourth, ..., to accomplish the selection
based sort.
One idea is to assign the champion a very small number (for instance, −∞),
so that it won't be selected next time, and the second best one becomes the
new champion. However, suppose there are 2^m teams for some natural number
m, it still takes 2^(m−1) + 2^(m−2) + ... + 2 + 1 = 2^m − 1 comparisons to determine
the new champion, which is as slow as the first time.
Actually, we needn't perform a full bottom-up comparison at all, since the tour-
nament tree stores plenty of ordering information. Observe that the second
best team must have been beaten by the champion at some time, or it would be the final
winner. So we can track the path from the root of the tournament tree to the
leaf of the champion, and examine all the teams along this path to find the
second best team.
In figure 8.6, this path is marked in gray color, the elements to be examined
are {14, 13, 7, 15}. Based on this idea, we refine the algorithm like below.

1. Build a tournament tree from the elements to be sorted, so that the cham-
pion (the maximum) becomes the root;
2. Extract the root from the tree, perform a top-down pass and replace the
maximum with −∞;
3. Perform a bottom-up back-track along the path, determine the new cham-
pion and make it as the new root;
4. Repeat step 2 until all elements have been extracted.

Figure 8.7, 8.8, and 8.9 show the steps of applying this strategy.

Figure 8.7: Extract 16, replace it with −∞, 15 sifts up to root.

Figure 8.8: Extract 15, replace it with −∞, 14 sifts up to root.

Figure 8.9: Extract 14, replace it with −∞, 13 sifts up to root.



We can reuse the binary tree definition given in the first chapter of this
book to represent tournament tree. In order to back-track from leaf to the root,
every node should hold a reference to its parent (concept of pointer in some
environment such as ANSI C):
struct Node {
Key key;
struct Node ∗left, ∗right, ∗parent;
};

To build a tournament tree from a list of elements (suppose the number of
elements is 2^m for some m), we can first wrap each element as a leaf, so that
we obtain a list of binary trees. We take every two trees from this list, compare
their keys, and form a new binary tree with the bigger key as the root; the two
trees are set as the left and right children of this new binary tree. Repeating this
operation builds a new list of trees. The height of each tree is increased by 1.
Note that the size of the tree list halves after such a pass, so we can keep
reducing the list until there is only one tree left. This tree is the final
tournament tree.
function Build-Tree(A)
    T ← ϕ
    for each x ∈ A do
        t ← Create-Node
        Key(t) ← x
        Append(T, t)
    while |T| > 1 do
        T′ ← ϕ
        for every t1, t2 ∈ T do
            t ← Create-Node
            Key(t) ← Max(Key(t1), Key(t2))
            Left(t) ← t1
            Right(t) ← t2
            Parent(t1) ← t
            Parent(t2) ← t
            Append(T′, t)
        T ← T′
    return T[1]
Suppose the length of the list A is n. This algorithm first traverses the list
to build the leaves, which takes linear O(n) time. Then it repeatedly compares pairs,
which loops about n + n/2 + n/4 + ... + 2 ≈ 2n times in total. So the total performance
is bound to O(n) time.
The following ANSI C program implements this tournament tree building
algorithm.
struct Node∗ build(const Key∗ xs, int n) {
    int i;
    struct Node ∗t, ∗∗ts = (struct Node∗∗) malloc(sizeof(struct Node∗) ∗ n);
    for (i = 0; i < n; ++i)
        ts[i] = leaf(xs[i]);
    for (; n > 1; n /= 2)
        for (i = 0; i < n; i += 2)
            ts[i/2] = branch(max(ts[i]→key, ts[i+1]→key), ts[i], ts[i+1]);
    t = ts[0];
    free(ts);
    return t;
}
The type of the key can be defined elsewhere, for example:
typedef int Key;
Function leaf(x) creates a leaf node with value x as the key, and sets all its
fields, left, right and parent, to NIL. Function branch(key, left, right)
creates a branch node, and links the newly created node as the parent of its two
children if they are not empty. For the sake of brevity, we skip their details.
They are left as an exercise to the reader, and the complete program can be
downloaded along with this book.
Some programming environments, such as Python, provide a tool to iterate over
every two elements at a time, for example:
for x, y in zip(∗[iter(ts)]∗2):
We skip such language specific features; readers can refer to the Python ex-
ample program along with this book for details.
When the maximum element is extracted from the tournament tree, we
replace it with −∞ repeatedly along the path from the root down to the leaf
it came from. Next, we back-track to the root through the parent field, and determine the
new maximum element.
function Extract-Max(T)
    m ← Key(T)
    Key(T) ← −∞
    while ¬ Leaf?(T) do                  ▷ The top down pass
        if Key(Left(T)) = m then
            T ← Left(T)
        else
            T ← Right(T)
        Key(T) ← −∞
    while Parent(T) ̸= ϕ do              ▷ The bottom up pass
        T ← Parent(T)
        Key(T) ← Max(Key(Left(T)), Key(Right(T)))
    return m
This algorithm returns the extracted maximum element, and modifies the
tournament tree in-place. Because we can't represent −∞ in a real program with
a machine word of limited length, one approach is to define a relatively big negative number,
which is less than all the elements in the tournament tree. For example, supposing
all the elements are greater than -65535, we can define negative infinity as below:
#define N_INF -65535
We can implement this algorithm as the following ANSI C example program.
Key pop(struct Node∗ t) {
    Key x = t→key;
    t→key = N_INF;
    while (!isleaf(t)) {
        t = t→left→key == x ? t→left : t→right;
        t→key = N_INF;
    }
    while (t→parent) {
        t = t→parent;
        t→key = max(t→left→key, t→right→key);
    }
    return x;
}

The behavior of Extract-Max is quite similar to the pop operation of
some data structures, such as the queue and the heap, thus we name it pop in this
code snippet.
Algorithm Extract-Max processes the tree in two passes, one top-down,
then one bottom-up, along the path on which the 'champion team wins the world cup'.
Because the tournament tree is well balanced, the length of this path, which is
the height of the tree, is bound to O(lg n), where n is the number of elements
to be sorted (which equals the number of leaves). Thus the performance
of this algorithm is O(lg n).
It's possible to realize the tournament knock out sort now. We build a
tournament tree from the elements to be sorted, then continuously extract the
maximum. If we want to sort in monotonically increasing order, we put the first
extracted one at the right most position, then place the further extracted elements one
by one towards the left; otherwise, if we want to sort in decreasing order, we can just append
the extracted elements to the result. Below is the algorithm that sorts elements in
ascending order.
procedure Sort(A)
    T ← Build-Tree(A)
    for i ← |A| down to 1 do
        A[i] ← Extract-Max(T)
Translating it to ANSI C example program is straightforward.
void tsort(Key∗ xs, int n) {
struct Node∗ t = build(xs, n);
while(n)
xs[--n] = pop(t);
release(t);
}

This algorithm first takes O(n) time to build the tournament tree, then
performs n pops to repeatedly select the maximum of the elements left in the tree. Since
each pop operation is bound to O(lg n), the total performance of tourna-
ment knock out sorting is O(n lg n).

Refine the tournament knock out


It's possible to design the tournament knock out algorithm in a purely functional
approach. We'll see that the two passes of the pop operation (first top-down, replacing the cham-
pion with −∞; then bottom-up, determining the new champion)
can be combined in a recursive manner, so that we no longer need the parent field.
We can re-use the functional binary tree definition as in the following ex-
ample Haskell code.

data Tr a = Empty | Br (Tr a) a (Tr a)

Thus a binary tree is either empty or a branch node containing a key, a left
sub tree and a right sub tree. Both children are again binary trees.
We've used a hard coded big negative number to represent −∞. However, this
solution is ad-hoc, and it forces all elements to be sorted to be greater than this
pre-defined magic number. Some programming environments support algebraic
data types, so that we can define negative infinity explicitly. For instance, the below
Haskell program sets up the concept of infinity².
data Infinite a = NegInf | Only a | Inf deriving (Eq, Ord)

From now on, we switch back to using the min() function to determine the
winner, so that the tournament selects the minimum instead of the maximum
as the champion.
Denote by key(T) the function that returns the key of the tree rooted at T. Function
wrap(x) wraps the element x into a leaf node. Function tree(l, k, r) creates a
branch node, with k as the key, and l and r as the two children respectively.
The knock out process can be represented as comparing two trees, picking
the smaller key as the new key, and setting these two trees as children:

branch(T1 , T2 ) = tree(T1 , min(key(T1 ), key(T2 )), T2 ) (8.12)


This can be implemented in Haskell word by word:
branch t1 t2 = Br t1 (min (key t1) (key t2)) t2

There is a limitation in our tournament sorting algorithm so far. It only
accepts a collection of elements whose size is 2^m, or we can't build a complete
binary tree. This can actually be solved in the tree building process. Recall
that we pick two trees every time, compare them and pick the winner. This is perfect
if there is always an even number of trees. Consider the case in a football match
where one team is absent for some reason (a severe flight delay or whatever), so that
one team is left without a challenger. One option is to make this team the
winner, so that it will attend the further games. We can use a similar
approach here.
To build the tournament tree from a list of elements, we wrap every element
into a leaf, then start the building process.

build(L) = build′ ({wrap(x)|x ∈ L}) (8.13)


The build′(T) function terminates when there is only one tree left in T, which
is the champion. This is the trivial edge case. Otherwise, it groups every two
trees into a pair to determine the winners. When there is an odd number of trees,
it just makes the last tree the winner to attend the next level of the tournament,
and recursively repeats the building process.
build′(T) = T                   : |T| ≤ 1
            build′(pair(T))     : otherwise        (8.14)
² The order of the definitions of 'NegInf', the regular number, and 'Inf' is significant if we want
to derive the default, correct comparison behavior of 'Ord'. Alternatively, it's possible to specify the
detailed order by making the type an instance of 'Ord' manually. However, this is a language specific feature
which is out of the scope of this book. Please refer to other textbooks about Haskell.

Note that this algorithm actually handles another special case, that the list
to be sorted is empty. The result is obviously empty.
Denote T = {T1, T2, ...} if there are at least two trees, and let T′ represent the
remaining trees after removing the first two. Function pair(T) is defined as the following.

pair(T) = {branch(T1, T2)} ∪ pair(T′)    : |T| ≥ 2
          T                              : otherwise        (8.15)

The complete tournament tree building algorithm can be implemented as


the below example Haskell program.
fromList :: (Ord a) ⇒ [a] → Tr (Infinite a)
fromList = build ◦ (map wrap) where
    build [] = Empty
    build [t] = t
    build ts = build $ pair ts
    pair (t1:t2:ts) = (branch t1 t2) : pair ts
    pair ts = ts
When extracting the champion (the minimum) from the tournament tree,
we need to examine whether the left sub-tree or the right one has the same key
as the root, and recursively extract from that tree until we arrive at the leaf node.
Denote the left sub-tree of T as L, the right sub-tree as R, and K as its key. We
can define this popping algorithm as the following.


pop(T) = tree(ϕ, ∞, ϕ)                           : L = ϕ ∧ R = ϕ
         tree(L′, min(key(L′), key(R)), R)       : K = key(L), L′ = pop(L)
         tree(L, min(key(L), key(R′)), R′)       : K = key(R), R′ = pop(R)        (8.16)
It’s straightforward to translate this algorithm into example Haskell code.
pop (Br Empty _ Empty) = Br Empty Inf Empty
pop (Br l k r) | k == key l = let l' = pop l in Br l' (min (key l') (key r)) r
               | k == key r = let r' = pop r in Br l (min (key l) (key r')) r'
Note that this algorithm only removes the current champion without return-
ing it. So it’s necessary to define a function to get the champion at the root
node.

top(T ) = key(T ) (8.17)


With these functions defined, tournament knock out sorting can be formal-
ized by using them.

sort(L) = sort′ (build(L)) (8.18)



Where sort′(T) continuously pops the minimum element to form the sorted
result list.

sort′(T) = ϕ                           : T = ϕ ∨ key(T) = ∞
           {top(T)} ∪ sort′(pop(T))    : otherwise        (8.19)

The rest of the Haskell code is given below to complete the implementation.

top = only ◦ key

tsort :: (Ord a) ⇒ [a] → [a]
tsort = sort' ◦ fromList where
    sort' Empty = []
    sort' (Br _ Inf _) = []
    sort' t = (top t) : (sort' $ pop t)

And the auxiliary functions only, key, and wrap, which provide the explicit in-
finity support, are listed as the following.
only (Only x) = x
key (Br _ k _ ) = k
wrap x = Br Empty (Only x) Empty
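As a quick check (this example is ours, not from the book's text), loading all the definitions of this section together in GHCi, one would expect something like:
tsort [7, 6, 15, 16, 8, 4, 13, 3, 5, 10, 9, 1, 12, 2, 11, 14]
-- expected: [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]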

Exercise 8.3

• Implement the helper functions leaf(), branch(), max(), isleaf(), and
release() to complete the imperative tournament tree program.

• Implement the imperative tournament tree in a programming language
that supports GC (garbage collection).

• Why can our tournament tree knock out sort algorithm handle duplicated
elements (elements with the same value)? We say a sorting algorithm is stable if
it keeps the original order of elements with the same value. Is the tournament
tree knock out sort stable?

• Design an imperative tournament tree knock out sort algorithm, which


satisfies the following:

– Can handle arbitrary number of elements;


– Without using hard coded negative infinity, so that it can take ele-
ments with any value.

• Compare the tournament tree knock out sort algorithm and binary tree
sort algorithm, analyze efficiency both in time and space.

• Compare the heap sort algorithm and binary tree sort algorithm, and do
same analysis for them.

8.4.2 Final improvement by using heap sort


We managed to improve the performance of selection based sorting to O(n lg n)
by using tournament knock out. This is the limit of comparison based sort
according to [1]. However, there is still room for improvement. After sorting,
there remains a complete binary tree with all leaves and branches holding useless
infinite values. This isn't space efficient at all. Can we release the nodes when
popping?
Another observation is that if there are n elements to be sorted, we actually
allocate about 2n tree nodes: n for the leaves and n for the branches. Is there any
better way to halve the space usage?

The final sorting structure described in equation 8.19 can easily be unified into
a more general form if we treat a tree whose root holds infinity as the key as an
empty tree:

sort′(T) = ϕ                           : T = ϕ
           {top(T)} ∪ sort′(pop(T))    : otherwise        (8.20)
This is exactly the same as the heap sort we gave in the previous chapter.
A heap always keeps the minimum (or the maximum) on the top, and provides
a fast pop operation. The binary heap by implicit array encodes the tree structure
in the array indices, so there isn't any extra space allocated except for the n array
cells. The functional heaps, such as the leftist heap and the splay heap, allocate n nodes
as well. We'll introduce more heaps in the next chapter which perform well in many
aspects.

8.5 Short summary


In this chapter, we presented the evolution process of selection based sort. Selection
sort is easy and commonly used as an example to teach students about nested
looping. It has a simple and straightforward structure, but its performance is
quadratic. In this chapter, we saw that there exist ways to improve it, not
only by some fine tuning, but also by fundamentally changing the data structure,
which leads to tournament knock out and heap sort.
Bibliography

[1] Donald E. Knuth. “The Art of Computer Programming, Volume 3: Sorting


and Searching (2nd Edition)”. Addison-Wesley Professional; 2 edition (May
4, 1998) ISBN-10: 0201896850 ISBN-13: 978-0201896855

[2] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford


Stein. “Introduction to Algorithms, Second Edition”. ISBN:0262032937.
The MIT Press. 2001
[3] Wikipedia. “Strict weak order”. http://en.wikipedia.org/wiki/Strict_weak_order
[4] Wikipedia. “FIFA world cup”. http://en.wikipedia.org/wiki/FIFA_World_Cup

Chapter 9

Binomial heap, Fibonacci heap, and pairing heap

9.1 Introduction
In the previous chapter, we mentioned that heaps can be generalized and imple-
mented with various data structures. However, we have only focused on binary heaps
so far, whether by explicit binary trees or by implicit arrays.
It's quite natural to extend the binary tree to a K-ary [1] tree. In this chapter,
we first show binomial heaps, which actually consist of a forest of K-ary trees.
Binomial heaps bound the performance of all operations to O(lg n), as well as
keeping the finding of the minimum element at O(1) time.
If we delay some operations in binomial heaps by using a lazy strategy, they
turn into Fibonacci heaps.
All the binary heaps we have shown take no less than O(lg n) time for merg-
ing; we'll show that it's possible to improve this to O(1) with the Fibonacci heap, which
is quite helpful for graph algorithms. Actually, the Fibonacci heap achieves a good
amortized time bound of O(1) for almost all operations, leaving only the heap pop at
O(lg n).
Finally, we'll introduce the pairing heap. It has the best performance
in practice, although the proof of this is still a conjecture for the time being.

9.2 Binomial Heaps


9.2.1 Definition
The binomial heap is more complex than most of the binary heaps. However, it has
excellent merge performance, which is bound to O(lg n) time. A binomial heap
consists of a list of binomial trees.

Binomial tree
In order to explain why the tree is named 'binomial', let's review the
famous Pascal's triangle (also known as the Jia Xian triangle, in memory of the
Chinese mathematician Jia Xian (1010-1070)) [4].


1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
...

In each row, the numbers are all binomial coefficients. There are many
ways to generate a series of binomial coefficients. One of them is by
recursive composition. Binomial trees, as well, can be defined in this way as the
following.

• A binomial tree of rank 0 has only one node, the root;

• A binomial tree of rank n consists of two binomial trees of rank n − 1.
Among these 2 sub trees, the one with the bigger root element is linked as
the leftmost child of the other.

We denote a binomial tree of rank 0 as B0 , and the binomial tree of rank n


as Bn .
Figure 9.1 shows a B0 tree and how to link 2 Bn−1 trees to a Bn tree.

(a) A B0 tree. (b) Linking 2 Bn−1 trees yields a Bn tree.

Figure 9.1: Recursive definition of binomial trees

With this recursive definition, it is easy to draw the forms of binomial trees of
rank 0, 1, 2, ..., as shown in figure 9.2.
Observing the binomial trees reveals some interesting properties. For a
binomial tree of rank n, if we count the number of nodes in each level, we
find that they are the binomial coefficients.
For instance, for the rank 4 binomial tree, there is 1 node as the root; in the
second level next to the root, there are 4 nodes; in the 3rd level, there are 6 nodes;
in the 4th level, there are 4 nodes; and in the 5th level, there is 1 node. They

(a) B0 tree; (b) B1 tree; (c) B2 tree; (d) B3 tree; (e) B4 tree.

Figure 9.2: Forms of binomial trees with rank = 0, 1, 2, 3, 4, ...



are exactly 1, 4, 6, 4, 1, which is the 5th row in Pascal's triangle. That's why
we call it a binomial tree.
Another interesting property is that the total number of nodes of a binomial
tree with rank n is 2^n. This can be proved either by the binomial theorem or
directly from the recursive definition.

Binomial heap

With the binomial tree defined, we can introduce the definition of the binomial heap.
A binomial heap is a set of binomial trees (or a forest of binomial trees) that
satisfies the following properties.

• Each binomial tree in the heap conforms to the heap property, that the key
of a node is equal to or greater than the key of its parent. Here the heap is
actually a min-heap; for a max-heap, it changes to 'equal to or less than'. In this
chapter, we only discuss the min-heap, and the max-heap can be handled equally
by changing the comparison condition.

• There is at most one binomial tree of rank r. In other words,
no two binomial trees have the same rank.

This definition leads to an important result: for a binomial heap containing
n elements, if converting n to binary format yields a0, a1, a2, ..., am, where a0
is the LSB and am is the MSB, then for each 0 ≤ i ≤ m, if ai = 0 there is no
binomial tree of rank i, and if ai = 1 there must be a binomial tree of rank i.
For example, if a binomial heap contains 5 elements, as 5 is '(LSB)101(MSB)',
then there are 2 binomial trees in this heap: one tree of rank 0, and the other of
rank 2.
Figure 9.3 shows a binomial heap which has 19 nodes; as 19 is '(LSB)11001(MSB)'
in binary format, there is a B0 tree, a B1 tree and a B4 tree.

Figure 9.3: A binomial heap with 19 elements



Data layout
There are two ways to define K-ary trees imperatively. One is by using the 'left-
child, right-sibling' approach [2]. It is compatible with the typical binary tree
structure. Each node has two fields, a left field and a right field. We use the
left field to point to the first child of this node, and use the right field to point to
the sibling of this node. All siblings are represented as a singly
linked list. Figure 9.4 shows an example tree represented in this way.

Figure 9.4: Example tree represented in the 'left-child, right-sibling' way. R is the
root node; it has no sibling, so its right field points to NIL. C1, C2, ..., Cn
are children of R. C1 is linked from the left field of R, and the other siblings of C1 are
linked one next to the other on the right side of C1. C1′, C2′, ..., Cm′ are children of
C1.

The other way is to use a library defined collection container, such as an array
or a list, to represent all children of a node.
Since the rank of a tree plays a very important role, we also define it as a
field.
For the 'left-child, right-sibling' method, we define the binomial tree as the
following.¹
class BinomialTree:
    def __init__(self, x = None):
        self.rank = 0
        self.key = x
        self.parent = None
        self.child = None
        self.sibling = None

When initializing a tree with a key, we create a leaf node, set its rank to zero,
and set all other fields to NIL.
It is quite natural to utilize a pre-defined list to represent multiple children,
as below.
class BinomialTree:
    def __init__(self, x = None):
        self.rank = 0
        self.key = x
        self.parent = None
        self.children = []

¹ C programs are also provided along with this book.



For purely functional settings, such as in the Haskell language, binomial trees are
defined as the following.
data BiTree a = Node { rank :: Int
                     , root :: a
                     , children :: [BiTree a]}
The binomial heap is defined as a list of binomial trees (a forest) with
ranks in monotonically increasing order. And as another implicit constraint,
no two binomial trees have the same rank.
type BiHeap a = [BiTree a]

9.2.2 Basic heap operations


Linking trees
Before diving into the basic heap operations such as pop and insert, we'll first
realize how to link two binomial trees with the same rank into a bigger one. Accord-
ing to the definition of the binomial tree and the heap property that the root always
contains the minimum key, we first compare the two root values, select the
smaller one as the new root, and insert the other tree as the first child in front
of all other children. Suppose functions Key(T), Children(T), and Rank(T)
access the key, children and rank of a binomial tree respectively.
link(T1, T2) = node(r + 1, x, {T2} ∪ C1)    : x < y
               node(r + 1, y, {T1} ∪ C2)    : otherwise        (9.1)
Where

x = Key(T1 )
y = Key(T2 )
r = Rank(T1 ) = Rank(T2 )
C1 = Children(T1 )
C2 = Children(T2 )

Figure 9.5: Suppose x < y, insert y as the first child of x.

Note that the link operation is bound to O(1) time if the ∪ is a constant
time operation. It’s easy to translate the link function to Haskell program as
the following.
link t1@(Node r x c1) t2@(Node _ y c2) =
    if x < y then Node (r+1) x (t2:c1)
    else Node (r+1) y (t1:c2)

It's possible to realize the link operation in an imperative way. If we use the 'left-
child, right-sibling' approach, we just link the tree which has the bigger key to
the left side (as the first child) of the other, and link the other's original children to
its right side as siblings. Figure 9.6 shows the result of one case.
1: function Link(T1 , T2 )
2: if Key(T2 ) < Key(T1 ) then
3: Exchange T1 ↔ T2
4: Sibling(T2 ) ← Child(T1 )
5: Child(T1 ) ← T2
6: Parent(T2 ) ← T1
7: Rank(T1 ) ← Rank(T1 ) + 1
8: return T1

Figure 9.6: Suppose x < y, link y to the left side of x and link the original
children of x to the right side of y.

And if we use a container to manage all children of a node, the algorithm is


like below.
1: function Link’(T1 , T2 )
2: if Key(T2 ) < Key(T1 ) then
3: Exchange T1 ↔ T2
4: Parent(T2 ) ← T1
5: Insert-Before(Children(T1 ), T2 )
6: Rank(T1 ) ← Rank(T1 ) + 1
7: return T1
It's easy to translate both algorithms into real programs. Here we only show
the Python program for Link' for illustration purposes².
def link(t1, t2):
    if t2.key < t1.key:
        (t1, t2) = (t2, t1)
    t2.parent = t1
    t1.children.insert(0, t2)
    t1.rank = t1.rank + 1
    return t1

Exercise 9.1
Implement the tree-linking program in your favorite language with left-child,
right-sibling method.
2 The C and C++ programs are also available along with this book

We mentioned that linking is a constant time algorithm, and it is true when using
the left-child, right-sibling approach. However, if we use a container to manage the
children, the performance depends on the concrete implementation of the con-
tainer. If it is a plain array, the linking time will be proportional to the number
of children. In this chapter, we assume the time is constant. This is true if the
container is implemented as a linked-list.

Insert a new element to the heap (push)


As the ranks of the binomial trees in a forest are monotonically increasing, by using
the link function defined above, it's possible to define an auxiliary function, so
that we can insert a new tree, with rank no bigger than the smallest one, into the
heap (which is actually a forest).
Denoting the non-empty heap as H = {T1, T2, ..., Tn}, we define


insertT(H, T) = {T}                           : H = ϕ
                {T} ∪ H                       : Rank(T) < Rank(T1)
                insertT(H′, link(T, T1))      : otherwise        (9.2)
where

H ′ = {T2 , T3 , ..., Tn }
The idea is that for the empty heap, we set the new tree as the only element
to create a singleton forest; otherwise, we compare the ranks of the new tree
and the first tree in the forest. If they are the same, we link them together, and
recursively insert the linked result (a tree with rank increased by one) into the
rest of the forest; if they are not the same, since the pre-condition constrains the
rank of the new tree, it must be the smallest, so we put this new tree in front of
all the other trees in the forest.
From the binomial properties mentioned above, there are at most O(lg n)
binomial trees in the forest, where n is the total number of nodes. Thus function
insertT performs at most O(lg n) linkings, which are all constant time
operations. So the performance of insertT is O(lg n).³
The relative Haskell program is given as below.
insertTree [] t = [t]
insertTree ts@(t':ts') t = if rank t < rank t' then t:ts
                           else insertTree ts' (link t t')

With this auxiliary function, it’s easy to realize the insertion. We can wrap
the new element to be inserted as the only leaf of a tree, then insert this tree to
the binomial heap.

insert(H, x) = insertT (H, node(0, x, ϕ)) (9.3)


And we can build a heap from a series of elements by folding.
For example, the following Haskell defines a helper function 'fromList'.
³ There is an interesting observation when comparing this operation with adding two binary
numbers, which leads to the topic of numeric representation [6].



fromList = foldl insert []

Since wrapping an element as a singleton tree takes O(1) time, the real work
is done in insertT; the performance of binomial heap insertion is bound to
O(lg n).
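As a small illustration of the binary-representation property mentioned in the definition (this sketch is ours, not from the book's source; insert is a direct rendering of equation (9.3), while rank, Node, insertTree and fromList are the definitions above):
insert :: (Ord a) ⇒ BiHeap a → a → BiHeap a
insert h x = insertTree h (Node 0 x [])

ranks :: (Ord a) ⇒ [a] → [Int]
ranks = map rank ◦ fromList
-- ranks [1..5]  gives [0, 2]     (5  is '101'   in binary, LSB first)
-- ranks [1..19] gives [0, 1, 4]  (19 is '11001' in binary, LSB first)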
The insertion algorithm can also be realized with imperative approach.

Algorithm 1 Insert a tree with ’left-child-right-sibling’ method.


1: function Insert-Tree(H, T )
2: while H ̸= ϕ∧ Rank(Head(H)) = Rank(T ) do
3: (T1 , H) ← Extract-Head(H)
4: T ← Link(T, T1 )
5: Sibling(T ) ← H
6: return T

Algorithm 1 continuously links the first tree in the heap with the new tree
to be inserted while they have the same rank. After that, it sets the linked-list of
the remaining trees as the sibling, and returns the new linked-list.
If a container is used to manage the children of a node, the algorithm can be
given as in Algorithm 2.

Algorithm 2 Insert a tree with children managed by a container.


1: function Insert-Tree’(H, T )
2: while H ̸= ϕ∧ Rank(H[0]) = Rank(T ) do
3: T1 ← Pop(H)
4: T ← Link(T, T1 )
5: Head-Insert(H, T )
6: return H

In this algorithm, function Pop removes the first tree T1 = H[0] from the
forest, and function Head-Insert inserts a new tree before any other trees in
the heap, so that it becomes the first element in the forest.
With either Insert-Tree or Insert-Tree' defined, realizing the binomial
heap insertion is trivial.

Algorithm 3 Imperative insert algorithm


1: function Insert(H, x)
2: return Insert-Tree(H, Node(0, x, ϕ))

The following Python program implements the insert algorithm by using a
container to manage sub-trees. The 'left-child, right-sibling' program is left as
an exercise.
def insert_tree(ts, t):
    while ts != [] and t.rank == ts[0].rank:
        t = link(t, ts.pop(0))
    ts.insert(0, t)
    return ts

def insert(h, x):
    return insert_tree(h, BinomialTree(x))

Exercise 9.2
Write the insertion program in your favorite imperative programming lan-
guage by using the ‘left-child, right-sibling’ approach.

Merge two heaps


When merging two binomial heaps, we actually try to merge two forests of bino-
mial trees. According to the definition, there can't be two trees with the same
rank, and the ranks are in monotonically increasing order. Our strategy is very
similar to merge sort: in every iteration, we take the first tree from each
forest, compare their ranks, and move the one with the smaller rank to the result heap; if the
ranks are equal, we perform linking to get a new tree, and recursively insert
this new tree into the result of merging the rest of the trees.
Figure 9.7 illustrates the idea of this algorithm. This method is different
from the one given in [2].
We can formalize this idea with a function. For non-empty cases, we denote
the two heaps as H1 = {T1 , T2 , ...} and H2 = {T1′ , T2′ , ...}. Let H1′ = {T2 , T3 , ...}
and H2′ = {T2′ , T3′ , ...}.



merge(H1, H2) = H1                                              : H2 = ϕ
                H2                                              : H1 = ϕ
                {T1} ∪ merge(H1′, H2)                           : Rank(T1) < Rank(T1′)
                {T1′} ∪ merge(H1, H2′)                          : Rank(T1) > Rank(T1′)
                insertT(merge(H1′, H2′), link(T1, T1′))         : otherwise        (9.4)
To analyze the performance of merge, suppose there are m1 trees in H1,
and m2 trees in H2. There are at most m1 + m2 trees in the merged result.
If no two trees have the same rank, the merge operation is bound to
O(m1 + m2). If linking is needed for trees with the same rank, insertT
performs in at most O(m1 + m2) time. Considering the fact that m1 ≤ 1 + ⌊lg n1⌋
and m2 ≤ 1 + ⌊lg n2⌋, where n1, n2 are the numbers of nodes in each heap, and that
⌊lg n1⌋ + ⌊lg n2⌋ ≤ 2⌊lg n⌋, where n = n1 + n2 is the total number of nodes, the
final performance of merging is O(lg n).
Translating this algorithm to Haskell yields the following program.
merge ts1 [] = ts1
merge [] ts2 = ts2
merge ts1@(t1:ts1') ts2@(t2:ts2')
    | rank t1 < rank t2 = t1 : (merge ts1' ts2)
    | rank t1 > rank t2 = t2 : (merge ts1 ts2')
    | otherwise = insertTree (merge ts1' ts2') (link t1 t2)

The merge algorithm can also be described in an imperative way as shown in Algo-
rithm 4.
Since both heaps contain binomial trees with ranks in monotonically increas-
ing order, in each iteration we pick the tree with the smallest rank and append it to
the result heap. If both trees have the same rank, we perform linking first. Consider

(a) Pick the tree with the smaller rank to the result.

(b) If two trees have the same rank, link them into a new tree, and recursively insert
it into the merge result of the rest.

Figure 9.7: Merge two heaps.



Algorithm 4 imperative merge two binomial heaps


1: function Merge(H1 , H2 )
2: if H1 = ϕ then
3: return H2
4: if H2 = ϕ then
5: return H1
6: H←ϕ
7: while H1 ̸= ϕ ∧ H2 ̸= ϕ do
8: T ←ϕ
9: if Rank(H1 ) < Rank(H2 ) then
10: (T, H1 ) ← Extract-Head(H1 )
11: else if Rank(H2 ) < Rank(H1 ) then
12: (T, H2 ) ← Extract-Head(H2 )
13: else ▷ Equal rank
14: (T1 , H1 ) ← Extract-Head(H1 )
15: (T2 , H2 ) ← Extract-Head(H2 )
16: T ← Link(T1 , T2 )
17: Append-Tree(H, T )
18: if H1 ̸= ϕ then
19: Append-Trees(H, H1 )
20: if H2 ̸= ϕ then
21: Append-Trees(H, H2 )
22: return H

the Append-Tree algorithm. The rank of the new tree to be appended can't
be less than that of any other tree in the result heap according to our merge strategy;
however, it might be equal to the rank of the last tree in the result heap. This
can happen if the last tree appended was the result of linking, which increases
the rank by one. In this case, we must link the new tree to be inserted with the
last tree. In the below algorithm, suppose function Last(H) refers to the last tree
in a heap, and Append(H, T) just appends a new tree at the end of a forest.
1: function Append-Tree(H, T )
2: if H ̸= ϕ∧ Rank(T ) = Rank(Last(H)) then
3: Last(H) ← Link(T , Last(H))
4: else
5: Append(H, T )
Function Append-Trees repeatedly call this function, so that it can append
all trees in a heap to the other heap.
1: function Append-Trees(H1 , H2 )
2: for each T ∈ H2 do
3: H1 ← Append-Tree(H1 , T )
The following Python program translates the merge algorithm.
from functools import reduce  # needed on Python 3; reduce is built-in on Python 2

def append_tree(ts, t):
    if ts != [] and ts[-1].rank == t.rank:
        ts[-1] = link(ts[-1], t)
    else:
        ts.append(t)
    return ts

def append_trees(ts1, ts2):
    return reduce(append_tree, ts2, ts1)

def merge(ts1, ts2):
    if ts1 == []:
        return ts2
    if ts2 == []:
        return ts1
    ts = []
    while ts1 != [] and ts2 != []:
        t = None
        if ts1[0].rank < ts2[0].rank:
            t = ts1.pop(0)
        elif ts2[0].rank < ts1[0].rank:
            t = ts2.pop(0)
        else:
            t = link(ts1.pop(0), ts2.pop(0))
        ts = append_tree(ts, t)
    ts = append_trees(ts, ts1)
    ts = append_trees(ts, ts2)
    return ts

Exercise 9.3
The program given above uses a container to manage sub-trees. Implement
the merge algorithm in your favorite imperative programming language with
‘left-child, right-sibling’ approach.

Pop

In the forest which forms the binomial heap, each binomial tree conforms
to the heap property, so the root contains the minimum element of that tree.
However, the order relationship among these roots can be arbitrary. To find the
minimum element in the heap, we can select the smallest of these roots.
Since there are O(lg n) binomial trees, this approach takes O(lg n) time.
However, after we locate the minimum element (which is also known as the
top element of a heap), we need to remove it from the heap and keep the binomial
properties to accomplish the heap-pop operation. Suppose the forest forming the bino-
mial heap consists of trees Bi, Bj, ..., Bp, ..., Bm, where Bk is a binomial tree of
rank k, and the minimum element is the root of Bp. If we delete it, there will
be p children left, which are all binomial trees with ranks p − 1, p − 2, ..., 0.
One tool at hand is the O(lg n) merge function we have defined. A possible
approach is to reverse the p children, so that their ranks change to monotonically
increasing order and they form a binomial heap Hp. The rest of the trees is still a
binomial heap; we represent it as H′ = H − Bp. Merging Hp and H′ gives the
final result of pop. Figure 9.8 illustrates this idea.
In order to realize this algorithm, we first need to define an auxiliary function,

Figure 9.8: Pop the minimum element from a binomial heap.

which can extract the tree that contains the minimum element at its root from the forest.


extractMin(H) = (T, ϕ)              : H is a singleton {T}
                (T1, H′)            : Root(T1) < Root(T′)
                (T′, {T1} ∪ H′′)    : otherwise        (9.5)

where

H = {T1 , T2 , ...} for the non-empty forest case;


H ′ = {T2 , T3 , ...} is the forest without the first tree;
(T′, H′′) = extractMin(H′)

The result of this function is a tuple. The first part is the tree which has the
minimum element at its root; the second part is the rest of the trees after removing
the first part from the forest.
This function examines each of the trees in the forest, thus it is bound to O(lg n)
time.
The relative Haskell program can be give respectively.
extractMin [t] = (t, [])
extractMin (t:ts) = if root t < root t' then (t, ts)
                    else (t', t:ts')
    where
        (t', ts') = extractMin ts

With this function defined, returning the minimum element is trivial.

findMin = root ◦ fst ◦ extractMin
Of course, it's possible to just traverse the forest and pick the minimum root
without removing the tree for this purpose. The below imperative algorithm describes
it with the 'left-child, right-sibling' approach.
1: function Find-Minimum(H)
2: T ← Head(H)
3: min ← ∞
4: while T ̸= ϕ do
5: if Key(T )< min then
6: min ← Key(T )
7: T ← Sibling(T )
8: return min
If we manage the children with collection containers instead, the linked-list
traversal is abstracted to finding the minimum element in the list. The
following Python program shows this situation.
def find_min(ts):
    min_t = min(ts, key=lambda t: t.key)
    return min_t.key
Next we define the function to delete the minimum element from the heap
by using extractMin.

deleteMin(H) = merge(reverse(Children(T)), H′)        (9.6)

where

(T, H′) = extractMin(H)


Translate the formula to Haskell program is trivial and we’ll skip it.
To realize the algorithm in procedural way takes extra efforts including list
reversing etc. We left these details as exercise to the reader. The following
pseudo code illustrate the imperative pop algorithm
1: function Extract-Min(H)
2: (Tmin , H) ← Extract-Min-Tree(H)
3: H ← Merge(H, Reverse(Children(Tmin )))
4: return (Key(Tmin ), H)
With the pop operation defined, we can realize heap sort by creating a binomial
heap from a series of numbers, then keep popping the smallest number from the
heap till it becomes empty.

sort(xs) = heapSort(f romList(xs)) (9.7)


And the real work is done in function heapSort.

{
ϕ : H=ϕ
heapSort(H) =
{f indM in(H)} ∪ heapSort(deleteM in(H)) : otherwise
(9.8)
Translate to Haskell yields the following program.
234CHAPTER 9. BINOMIAL HEAP, FIBONACCI HEAP, AND PAIRING HEAP

heapSort = hsort ◦ fromList where


hsort [] = []
hsort h = (findMin h):(hsort $ deleteMin h)

Function fromList can be defined by folding. Heap sort can also be expressed
in procedural way respectively. Please refer to previous chapter about binary
heap for detail.

Exercise 9.4

• Write the program to return the minimum element from a binomial heap
in your favorite imperative programming language with ’left-child, right-
sibling’ approach.

• Realize the Extract-Min-Tree() Algorithm.

• For ’left-child, right-sibling’ approach, reversing all children of a tree is


actually reversing a single-direct linked-list. Write a program to reverse
such linked-list in your favorite imperative programming language.

More words about binomial heap

As what we have shown that insertion and merge are bound to O(lg n) time.
The results are all ensure for the worst case. The amortized performance are
O(1). We skip the proof for this fact.

9.3 Fibonacci Heaps


It’s interesting that why the name is given as ‘Fibonacci heap’. In fact, there is
no direct connection from the structure design to Fibonacci series. The inventors
of ‘Fibonacci heap’, Michael L. Fredman and Robert E. Tarjan, utilized the
property of Fibonacci series to prove the performance time bound, so they
decided to use Fibonacci to name this data structure.[2]

9.3.1 Definition
Fibonacci heap is essentially a lazy evaluated binomial heap. Note that, it
doesn’t mean implementing binomial heap in lazy evaluation settings, for in-
stance Haskell, brings Fibonacci heap automatically. However, lazy evaluation
setting does help in realization. For example in [5], presents a elegant imple-
mentation.
Fibonacci heap has excellent performance theoretically. All operations ex-
cept for pop are bound to amortized O(1) time. In this section, we’ll give an
algorithm different from some popular textbook[2]. Most of the ideas present
here are based on Okasaki’s work[6].
Let’s review and compare the performance of binomial heap and Fibonacci
heap (more precisely, the performance goal of Fibonacci heap).
9.3. FIBONACCI HEAPS 235

operation Binomial heap Fibonacci heap


insertion O(lg n) O(1)
merge O(lg n) O(1)
top O(lg n) O(1)
pop O(lg n) amortized O(lg n)
Consider where is the bottleneck of inserting a new element x to binomial
heap. We actually wrap x as a singleton leaf and insert this tree into the heap
which is actually a forest.
During this operation, we inserted the tree in monotonically increasing order
of rank, and once the rank is equal, recursively linking and inserting will happen,
which lead to the O(lg n) time.
As the lazy strategy, we can postpone the ordered-rank insertion and merging
operations. On the contrary, we just put the singleton leaf to the forest. The
problem is that when we try to find the minimum element, for example the top
operation, the performance will be bad, because we need check all trees in the
forest, and there aren’t only O(lg n) trees.
In order to locate the top element in constant time, we must remember where
is the tree contains the minimum element as root.
Based on this idea, we can reuse the definition of binomial tree and give the
definition of Fibonacci heap as the following Haskell program for example.
data BiTree a = Node { rank :: Int
, root :: a
, children :: [BiTree a]}

The Fibonacci heap is either empty or a forest of binomial trees with the
minimum element stored in a special one explicitly.
data FibHeap a = E | FH { size :: Int
, minTree :: BiTree a
, trees :: [BiTree a]}

For convenient purpose, we also add a size field to record how many elements
are there in a heap.
The data layout can also be defined in imperative way as the following ANSI
C code.
struct node{
Key key;
struct node ∗next, ∗prev, ∗parent, ∗children;
int degree; /∗ As known as rank ∗/
int mark;
};

struct FibHeap{
struct node ∗roots;
struct node ∗minTr;
int n; /∗ number of nodes ∗/
};

For generality, Key can be a customized type, we use integer for illustration
purpose.
typedef int Key;
236CHAPTER 9. BINOMIAL HEAP, FIBONACCI HEAP, AND PAIRING HEAP

In this chapter, we use the circular doubly linked-list for imperative settings
to realize the Fibonacci Heap as described in [2]. It makes many operations easy
and fast. Note that, there are two extra fields added. A degree, also known as
rank for a node is the number of children of this node; Flag mark is used only
in decreasing key operation. It will be explained in detail in later section.

9.3.2 Basic heap operations


As we mentioned that Fibonacci heap is essentially binomial heap implemented
in a lazy evaluation strategy, we’ll reuse many algorithms defined for binomial
heap.

Insert a new element to the heap


Recall the insertion algorithm of binomial tree. It can be treated as a special
case of merge operation, that one heap contains only a singleton tree.

insert(H, x) = merge(H, singleton(x)) (9.9)

where singleton is an auxiliary function to wrap an element to a one-leaf-tree.

singleton(x) = F ibHeap(1, node(1, x, ϕ), ϕ)

Note that function F ibHeap() accepts three parameters, a size value, which
is 1 for this one-leaf-tree, a special tree which contains the minimum element as
root, and a list of other binomial trees in the forest. The meaning of function
node() is as same as before, that it creates a binomial tree from a rank, an
element, and a list of children.
Insertion can also be realized directly by appending the new node to the
forest and updated the record of the tree which contains the minimum element.
1: function Insert(H, k)
2: x ← Singleton(k) ▷ Wrap x to a node
3: append x to root list of H
4: if Tmin (H) = N IL ∨ k < Key(Tmin (H)) then
5: Tmin (H) ← x
6: n(H) ← n(H)+1
Where function Tmin () returns the tree which contains the minimum element
at root.
The following C source snippet is a translation for this algorithm.
struct FibHeap∗ insert_node(struct FibHeap∗ h, struct node∗ x){
h = add_tree(h, x);
if(h→minTr == NULL | | x→key < h→minTr→key)
h→minTr = x;
h→n++;
return h;
}

Exercise 9.5
9.3. FIBONACCI HEAPS 237

Implement the insert algorithm in your favorite imperative programming


language completely. This is also an exercise to circular doubly linked list ma-
nipulation.

Merge two heaps


Different with the merging algorithm of binomial heap, we post-pone the linking
operations later. The idea is to just put all binomial trees from each heap
together, and choose one special tree which record the minimum element for
the result heap.



 H1 : H2 = ϕ

H2 : H1 = ϕ
merge(H1 , H2 ) =

 F ibHeap(s1 + s2 , T 1min , {T2min } ∪ T1 ∪ T2) : root(T1min ) < root(T2min )

F ibHeap(s1 + s2 , T2min , {T1min } ∪ T1 ∪ T2 ) : otherwise
(9.10)
where s1 and s2 are the size of H1 and H2 ; T1min and T2min are the spe-
cial trees with minimum element as root in H1 and H2 respectively; T1 =
{T11 , T12 , ...} is a forest contains all other binomial trees in H1 ; while T2 has
the same meaning as T1 except that it represents the forest in H2 . Function
root(T ) return the root element of a binomial tree.
Note that as long as the ∪ operation takes constant time, these merge al-
gorithm is bound to O(1). The following Haskell program is the translation of
this algorithm.
merge h E=h
merge E h=h
merge h1@(FH sz1 minTr1 ts1) h2@(FH sz2 minTr2 ts2)
| root minTr1 < root minTr2 = FH (sz1+sz2) minTr1 (minTr2:ts2++ts1)
| otherwise = FH (sz1+sz2) minTr2 (minTr1:ts1++ts2)

Merge algorithm can also be realized imperatively by concatenating the root


lists of the two heaps.
1: function Merge(H1 , H2 )
2: H←Φ
3: Root(H) ← Concat(Root(H1 ), Root(H2 ))
4: if Key(Tmin (H1 )) < Key(Tmin (H2 )) then
5: Tmin (H) ← Tmin (H1 )
6: else
7: Tmin (H) ← Tmin (H2 )
n(H) = n(H1 ) + n(H2 )
8: return H
This function assumes neither H1 , nor H2 is empty. And it’s easy to add
handling to these special cases as the following ANSI C program.
struct FibHeap∗ merge(struct FibHeap∗ h1, struct FibHeap∗ h2){
struct FibHeap∗ h;
if(is_empty(h1))
return h2;
if(is_empty(h2))
return h1;
h = empty();
238CHAPTER 9. BINOMIAL HEAP, FIBONACCI HEAP, AND PAIRING HEAP

h→roots = concat(h1→roots, h2→roots);


if(h1→minTr→key < h2→minTr→key)
h→minTr = h1→minTr;
else
h→minTr = h2→minTr;
h→n = h1→n + h2→n;
free(h1);
free(h2);
return h;
}
With merge function defined, the O(1) insertion algorithm is realized as
well. And we can also give the O(1) time top function as below.

top(H) = root(Tmin ) (9.11)

Exercise 9.6
Implement the circular doubly linked list concatenation function in your
favorite imperative programming language.

Extract the minimum element from the heap (pop)


The pop operation is the most complex one in Fibonacci heap. Since we post-
pone the tree consolidation in merge algorithm. We have to compensate it
somewhere. Pop is the only place left as we have defined, insert, merge, top
already.
There is an elegant procedural algorithm to do the tree consolidation by
using an auxiliary array[2]. We’ll show it later in imperative approach section.
In order to realize the purely functional consolidation algorithm, let’s first
consider a similar number puzzle.
Given a list of numbers, such as {2, 1, 1, 4, 8, 1, 1, 2, 4}, we want to add any
two values if they are same. And repeat this procedure till all numbers are
unique. The result of the example list should be {8, 16} for instance.
One solution to this problem will as the following.

consolidate(L) = f old(meld, ϕ, L) (9.12)


Where f old() function is defined to iterate all elements from a list, applying
a specified function to the intermediate result and each element. it is sometimes
called as reducing. Please refer to Appendix A and the chapter of binary search
tree for it.
L = {x1 , x2 , ..., xn }, denotes a list of numbers; and we’ll use L′ = {x2 , x3 , ..., xn }
to represent the rest of the list with the first element removed. Function meld()
is defined as below.


 {x} : L = ϕ

meld(L′ , x + x1 ) : x = x1
meld(L, x) = (9.13)

 {x} ∪ L : x < x1
 ′
{x1 } ∪ meld(L , x) : otherwise
The consolidate() function works as the follows. It maintains an ordered
result list L, contains only unique numbers, which is initialized from an empty
9.3. FIBONACCI HEAPS 239

Table 9.1: Steps of consolidate numbers

number intermediate result result


2 2 2
1 1, 2 1, 2
1 (1+1), 2 4
4 (4+4) 8
8 (8+8) 16
1 1, 16 1, 16
1 (1+1), 16 2, 16
2 (2+2), 16 4, 16
4 (4+4), 16 8, 16

list ϕ. Each time it process an element x, it firstly check if the first element in L
is equal to x, if so, it will add them together (which yields 2x), and repeatedly
check if 2x is equal to the next element in L. This process won’t stop until either
the element to be melt is not equal to the head element in the rest of the list, or
the list becomes empty. Table 9.1 illustrates the process of consolidating num-
ber sequence {2, 1, 1, 4, 8, 1, 1, 2, 4}. Column one lists the number ’scanned’ one
by one; Column two shows the intermediate result, typically the new scanned
number is compared with the first number in result list. If they are equal, they
are enclosed in a pair of parentheses; The last column is the result of meld, and
it will be used as the input to next step processing.
The Haskell program can be give accordingly.
consolidate = foldl meld [] where
meld [] x = [x]
meld (x':xs) x | x == x' = meld xs (x+x')
| x < x' = x:x':xs
| otherwise = x': meld xs x
We’ll analyze the performance of consolidation as a part of pop operation in
later section.
The tree consolidation is very similar to this algorithm except it performs
based on rank. The only thing we need to do is to modify meld() function a
bit, so that it compare on ranks and do linking instead of adding.


 {x} : L=ϕ

meld(L′ , link(x, x1 )) : rank(x) = rank(x1 )
meld(L, x) = (9.14)

 {x} ∪ L : rank(x) < rank(x1 )

{x1 } ∪ meld(L′ , x) : otherwise
The final consolidate Haskell program changes to the below version.
consolidate = foldl meld [] where
meld [] t = [t]
meld (t':ts) t | rank t == rank t' = meld ts (link t t')
| rank t < rank t' = t:t':ts
| otherwise = t' : meld ts t
Figure 9.9 and 9.10 show the steps of consolidation when processing a Fi-
bonacci Heap contains different ranks of trees. Comparing with table 9.1 reveals
the similarity.
240CHAPTER 9. BINOMIAL HEAP, FIBONACCI HEAP, AND PAIRING HEAP

a c d e i q r s u

b f g j k m t v w

h l n o x

(a) Before consolidation

a b c e

c a b c d f g

b d h

(b) Step 1, 2 (c) Step 3, ’d’ is firstly linked to ’c’, (d) Step 4
then repeatedly linked to ’a’.

Figure 9.9: Steps of consolidation


9.3. FIBONACCI HEAPS 241

a q a

b c e i b c e i

d f g j k m d f g j k m

h l n o h l n o

p p

(a) Step 5 (b) Step 6

q a

r s b c e i

t d f g j k m

h l n o

(c) Step 7, 8, ’r’ is firstly linked to ’q’, then ’s’ is linked to ’q’.

Figure 9.10: Steps of consolidation


242CHAPTER 9. BINOMIAL HEAP, FIBONACCI HEAP, AND PAIRING HEAP

After we merge all binomial trees, including the special tree record for the
minimum element in root, in a Fibonacci heap, the heap becomes a Binomial
heap. And we lost the special tree, which gives us the ability to return the top
element in O(1) time.
It’s necessary to perform a O(lg n) time search to resume the special tree.
We can reuse the function extractM in() defined for Binomial heap.
It’s time to give the final pop function for Fibonacci heap as all the sub
problems have been solved. Let Tmin denote the special tree in the heap to
record the minimum element in root; T denote the forest contains all the other
trees except for the special tree, s represents the size of a heap, and function
children() returns all sub trees except the root of a binomial tree.

{
ϕ : T = ϕ ∧ children(Tmin ) = ϕ
deleteM in(H) = ′
F ibHeap(s − 1, Tmin , T′ ) : otherwise
(9.15)
Where

(Tmin , T′ ) = extractM in(consolidate(children(Tmin ) ∪ T))
Translate to Haskell yields the below program.
deleteMin (FH _ (Node _ x []) []) = E
deleteMin h@(FH sz minTr ts) = FH (sz-1) minTr' ts' where
(minTr', ts') = extractMin $ consolidate (children minTr ++ ts)
The main part of the imperative realization is similar. We cut all children
of Tmin and append them to root list, then perform consolidation to merge all
trees with the same rank until all trees are unique in term of rank.
1: function Delete-Min(H)
2: x ← Tmin (H)
3: if x ̸= N IL then
4: for each y ∈ Children(x) do
5: append y to root list of H
6: Parent(y) ← N IL
7: remove x from root list of H
8: n(H) ← n(H) - 1
9: Consolidate(H)
10: return x
Algorithm Consolidate utilizes an auxiliary array A to do the merge job.
Array A[i] is defined to store the tree with rank (degree) i. During the traverse
of root list, if we meet another tree of rank i, we link them together to get a
new tree of rank i + 1. Next we clean A[i], and check if A[i + 1] is empty and
perform further linking if necessary. After we finish traversing all roots, array
A stores all result trees and we can re-construct the heap from it.
1: function Consolidate(H)
2: D ← Max-Degree(n(H))
3: for i ← 0 to D do
4: A[i] ← N IL
5: for each x ∈ root list of H do
6: remove x from root list of H
9.3. FIBONACCI HEAPS 243

7: d ← Degree(x)
8: while A[d] ̸= N IL do
9: y ← A[d]
10: x ← Link(x, y)
11: A[d] ← N IL
12: d←d+1
13: A[d] ← x
14: Tmin (H) ← N IL ▷ root list is NIL at the time
15: for i ← 0 to D do
16: if A[i] ̸= N IL then
17: append A[i] to root list of H.
18: if Tmin = N IL∨ Key(A[i]) < Key(Tmin (H)) then
19: Tmin (H) ← A[i]
The only unclear sub algorithm is Max-Degree, which can determine the
upper bound of the degree of any node in a Fibonacci Heap. We’ll delay the
realization of it to the last sub section.
Feed a Fibonacci Heap shown in Figure 9.9 to the above algorithm, Figure
9.11, 9.12 and 9.13 show the result trees stored in auxiliary array A in every
steps.

A[0] A[1] A[2] A[3] A[4]

A[0] A[1] A[2] A[3] A[4] a

A[0] A[1] A[2] A[3] A[4] a b c e

c a b c d f g

b d h

(a) Step 1, 2 (b) Step 3, Since A0 ̸= N IL, (c) Step 4


’d’ is firstly linked to ’c’, and
clear A0 to N IL. Again, as
A1 ̸= N IL, ’c’ is linked to ’a’
and the new tree is stored in
A2 .

Figure 9.11: Steps of consolidation

Translate the above algorithm to ANSI C yields the below program.


void consolidate(struct FibHeap∗ h){
if(!h→roots)
return;
int D = max_degree(h→n)+1;
struct node ∗x, ∗y;
struct node∗∗ a = (struct node∗∗)malloc(sizeof(struct node∗)∗(D+1));
244CHAPTER 9. BINOMIAL HEAP, FIBONACCI HEAP, AND PAIRING HEAP

A[0] A[1] A[2] A[3] A[4]

b c e i

d f g j k m

h l n o

(a) Step 5

A[0] A[1] A[2] A[3] A[4]

q a

b c e i

d f g j k m

h l n o

(b) Step 6

Figure 9.12: Steps of consolidation


9.3. FIBONACCI HEAPS 245

A[0] A[1] A[2] A[3] A[4]

q a

r s b c e i

t d f g j k m

h l n o

(a) Step 7, 8, Since A0 ̸= N IL, ’r’ is firstly linked to ’q’, and the new
tree is stored in A1 (A0 is cleared); then ’s’ is linked to ’q’, and stored
in A2 (A1 is cleared).

Figure 9.13: Steps of consolidation

int i, d;
for(i=0; i≤D; ++i)
a[i] = NULL;
while(h→roots){
x = h→roots;
h→roots = remove_node(h→roots, x);
d= x→degree;
while(a[d]){
y = a[d]; /∗ Another node has the same degree as x ∗/
x = link(x, y);
a[d++] = NULL;
}
a[d] = x;
}
h→minTr = h→roots = NULL;
for(i=0; i≤D; ++i)
if(a[i]){
h→roots = append(h→roots, a[i]);
if(h→minTr == NULL | | a[i]→key < h→minTr→key)
h→minTr = a[i];
}
free(a);
}

Exercise 9.7
Implement the remove function for circular doubly linked list in your favorite
imperative programming language.
246CHAPTER 9. BINOMIAL HEAP, FIBONACCI HEAP, AND PAIRING HEAP

9.3.3 Running time of pop


In order to analyze the amortize performance of pop, we adopt potential method.
Reader can refer to [2] for a formal definition. In this chapter, we only give a
intuitive illustration.
Remind the gravity potential energy, which is defined as

E =M ·g·h

Suppose there is a complex process, which moves the object with mass M up
and down, and finally the object stop at height h′ . And if there exists friction
resistance Wf , We say the process works the following power.

W = M · g · (h′ − h) + Wf

Figure 9.14: Gravity potential energy.

Figure 9.14 illustrated this concept.


We treat the Fibonacci heap pop operation in a similar way, in order to
evaluate the cost, we firstly define the potential Φ(H) before extract the mini-
mum element. This potential is accumulated by insertion and merge operations
executed so far. And after tree consolidation and we get the result H ′ , we then
calculate the new potential Φ(H ′ ). The difference between Φ(H ′ ) and Φ(H) plus
the contribution of consolidate algorithm indicates the amortized performance
of pop.
For pop operation analysis, the potential can be defined as

Φ(H) = t(H) (9.16)


Where t(H) is the number of trees in Fibonacci heap forest. We have t(H) =
1 + length(T) for any non-empty heap.
For the n-nodes Fibonacci heap, suppose there is an upper bound of ranks
for all trees as D(n). After consolidation, it ensures that the number of trees in
the heap forest is at most D(n) + 1.
9.3. FIBONACCI HEAPS 247

Before consolidation, we actually did another important thing, which also


contribute to running time, we removed the root of the minimum tree, and
concatenate all children left to the forest. So consolidate operation at most
processes D(n) + t(H) − 1 trees.
Summarize all the above factors, we deduce the amortized cost as below.

T = Tconsolidation + Φ(H ′ ) − Φ(H)


= O(D(n) + t(H) − 1) + (D(n) + 1) − t(H) (9.17)
= O(D(n))

If only insertion, merge, and pop function are applied to Fibonacci heap.
We ensure that all trees are binomial trees. It is easy to estimate the upper
limit D(n) is O(lg n). (Suppose the extreme case, that all nodes are in only one
Binomial tree).
However, we’ll show in next sub section that, there is operation can violate
the binomial tree assumption.

Exercise 9.8
Why the tree consolidation time is proportion to the number of trees it
processed?

9.3.4 Decreasing key


There is a special heap operation left. It only makes sense for imperative set-
tings. It’s about decreasing key of a certain node. Decreasing key plays impor-
tant role in some Graphic algorithms such as Minimum Spanning tree algorithm
and Dijkstra’s algorithm [2]. In that case we hope the decreasing key takes O(1)
amortized time.
However, we can’t define a function like Decrease(H, k, k ′ ), which first lo-
cates a node with key k, then decrease k to k ′ by replacement, and then resume
the heap properties. This is because the time for locating phase is bound to
O(n) time, since we don’t have a pointer to the target node.
In imperative setting, we can define the algorithm as Decrease-Key(H, x, k).
Here x is a node in heap H, which we want to decrease its key to k. We needn’t
perform a search, as we have x at hand. It’s possible to give an amortized O(1)
solution.
When we decreased the key of a node, if it’s not a root, this operation may
violate the property Binomial tree that the key of parent is less than all keys of
children. So we need to compare the decreased key with the parent node, and if
this case happens, we can cut this node and append it to the root list. (Remind
the recursive swapping solution for binary heap which leads to O(lg n))
Figure 9.15 illustrates this situation. After decreasing key of node x, it is
less than y, we cut x off its parent y, and ’past’ the whole tree rooted at x to
root list.
Although we recover the property of that parent is less than all children, the
tree isn’t any longer a Binomial tree after it losses some sub tree. If a tree losses
too many of its children because of cutting, we can’t ensure the performance of
merge-able heap operations. Fibonacci Heap adds another constraints to avoid
such problem:
248CHAPTER 9. BINOMIAL HEAP, FIBONACCI HEAP, AND PAIRING HEAP

x ... r

... y ...

@
@

...

Figure 9.15: x < y, cut tree x from its parent, and add x to root list.

If a node losses its second child, it is immediately cut from parent, and added
to root list
The final Decrease-Key algorithm is given as below.
1: function Decrease-Key(H, x, k)
2: Key(x) ← k
3: p ← Parent(x)
4: if p ̸= N IL ∧ k < Key(p) then
5: Cut(H, x)
6: Cascading-Cut(H, p)
7: if k < Key(Tmin (H)) then
8: Tmin (H) ← x
Where function Cascading-Cut uses the mark to determine if the node is
losing the second child. the node is marked after it losses the first child. And
the mark is cleared in Cut function.
1: function Cut(H, x)
2: p ← Parent(x)
3: remove x from p
4: Degree(p) ← Degree(p) - 1
5: add x to root list of H
6: Parent(x) ← N IL
7: Mark(x) ← F ALSE
During cascading cut process, if x is marked, which means it has already
lost one child. We recursively performs cut and cascading cut on its parent till
reach to root.
1: function Cascading-Cut(H, x)
2: p ← Parent(x)
3: if p ̸= N IL then
4: if Mark(x) = F ALSE then
9.3. FIBONACCI HEAPS 249

5: Mark(x) ← T RU E
6: else
7: Cut(H, x)
8: Cascading-Cut(H, p)
The relevant ANSI C decreasing key program is given as the following.
void decrease_key(struct FibHeap∗ h, struct node∗ x, Key k){
struct node∗ p = x→parent;
x→key = k;
if(p && k < p→key){
cut(h, x);
cascading_cut(h, p);
}
if(k < h→minTr→key)
h→minTr = x;
}

void cut(struct FibHeap∗ h, struct node∗ x){


struct node∗ p = x→parent;
p→children = remove_node(p→children, x);
p→degree--;
h→roots = append(h→roots, x);
x→parent = NULL;
x→mark = 0;
}

void cascading_cut(struct FibHeap∗ h, struct node∗ x){


struct node∗ p = x→parent;
if(p){
if(!x→mark)
x→mark = 1;
else{
cut(h, x);
cascading_cut(h, p);
}
}
}

Exercise 9.9
Prove that Decrease-Key algorithm is amortized O(1) time.

9.3.5 The name of Fibonacci Heap


It’s time to reveal the reason why the data structure is named as ’Fibonacci
Heap’.
There is only one undefined algorithm so far, Max-Degree(n). Which can
determine the upper bound of degree for any node in a n nodes Fibonacci Heap.
We’ll give the proof by using Fibonacci series and finally realize Max-Degree
algorithm.
Lemma 9.3.1. For any node x in a Fibonacci Heap, denote k = degree(x),
and |x| = size(x), then
|x| ≥ Fk+2 (9.18)
250CHAPTER 9. BINOMIAL HEAP, FIBONACCI HEAP, AND PAIRING HEAP

Where Fk is Fibonacci series defined as the following.



 0 : k=0
Fk = 1 : k=1

Fk−1 + Fk−2 : k ≥ 2
Proof. Consider all k children of node x, we denote them as y1 , y2 , ..., yk in the
order of time when they were linked to x. Where y1 is the oldest, and yk is the
youngest.
Obviously, |yi | ≥ 0. When we link yi to x, children y1 , y2 , ..., yi−1 have
already been there. And algorithm Link only links nodes with the same degree.
Which indicates at that time, we have

degree(yi ) = degree(x) = i − 1
After that, node yi can at most lost 1 child, (due to the decreasing key
operation) otherwise, if it will be immediately cut off and append to root list
after the second child loss. Thus we conclude

degree(yi ) ≥ i − 2
For any i = 2, 3, ..., k.
Let sk be the minimum possible size of node x, where degree(x) = k. For
trivial cases, s0 = 1, s1 = 2, and we have

|x| ≥ sk

k
= 2+ sdegree(yi )
i=2

k
≥ 2+ si−2
i=2

We next show that sk > Fk+2 . This can be proved by induction. For trivial
cases, we have s0 = 1 ≥ F2 = 1, and s1 = 2 ≥ F3 = 2. For induction case k ≥ 2.
We have

|x| ≥ sk

k
≥ 2+ si−2
i=2

k
≥ 2+ Fi
i=2

k
= 1+ Fi
i=0

At this point, we need prove that


k
Fk+2 = 1 + Fi (9.19)
i=0
9.3. FIBONACCI HEAPS 251

This can also be proved by using induction:

• Trivial case, F2 = 1 + F0 = 2

• Induction case,

Fk+2 = Fk+1 + Fk

k−1
= 1+ Fi + Fk
i=0

k
= 1+ Fi
i=0

Summarize all above we have the final result.

n ≥ |x| ≥ Fk + 2 (9.20)


Recall the result of AVL tree, that Fk ≥ ϕk , where ϕ = 1+2 5 is the golden
ratio. We also proved that pop operation is amortized O(lg n) algorithm.
Based on this result. We can define Function M axDegree as the following.

M axDegree(n) = 1 + ⌊logϕ n⌋ (9.21)

The imperative Max-Degree algorithm can also be realized by using Fi-


bonacci sequences.
1: function Max-Degree(n)
2: F0 ← 0
3: F1 ← 1
4: k←2
5: repeat
6: Fk ← Fk1 + Fk2
7: k ←k+1
8: until Fk < n
9: return k − 2
Translate the algorithm to ANSI C given the following program.
int max_degree(int n){
int k, F;
int F2 = 0;
int F1 = 1;
for(F=F1+F2, k=2; F<n; ++k){
F2 = F1;
F1 = F;
F = F1 + F2;
}
return k-2;
}
252CHAPTER 9. BINOMIAL HEAP, FIBONACCI HEAP, AND PAIRING HEAP

9.4 Pairing Heaps


Although Fibonacci Heaps provide excellent performance theoretically, it is
complex to realize. People find that the constant behind the big-O is big.
Actually, Fibonacci Heap is more significant in theory than in practice.
In this section, we’ll introduce another solution, Pairing heap, which is one of
the best heaps ever known in terms of performance. Most operations including
insertion, finding minimum element (top), merging are all bounds to O(1) time,
while deleting minimum element (pop) is conjectured to amortized O(lg n) time
[7] [6]. Note that this is still a conjecture for 15 years by the time I write this
chapter. Nobody has been proven it although there are much experimental data
support the O(lg n) amortized result.
Besides that, pairing heap is simple. There exist both elegant imperative
and functional implementations.

9.4.1 Definition
Both Binomial Heaps and Fibonacci Heaps are realized with forest. While a
pairing heaps is essentially a K-ary tree. The minimum element is stored at
root. All other elements are stored in sub trees.
The following Haskell program defines pairing heap.
data PHeap a = E | Node a [PHeap a]
This is a recursive definition, that a pairing heap is either empty or a K-ary
tree, which is consist of a root node, and a list of sub trees.
Pairing heap can also be defined in procedural languages, for example ANSI
C as below. For illustration purpose, all heaps we mentioned later are minimum-
heap, and we assume the type of key is integer 4 . We use same linked-list based
left-child, right-sibling approach (aka, binary tree representation[2]).
typedef int Key;

struct node{
Key key;
struct node ∗next, ∗children, ∗parent;
};
Note that the parent field does only make sense for decreasing key operation,
which will be explained later on. we can omit it for the time being.

9.4.2 Basic heap operations


In this section, we first give the merging operation for pairing heap, which can be
used to realize insertion. Merging, insertion, and finding the minimum element
are relative trivial compare to the extracting minimum element operation.

Merge, insert, and find the minimum element (top)


The idea of merging is similar to the linking algorithm we shown previously for
Binomial heap. When we merge two pairing heaps, there are two cases.
4 We can parametrize the key type with C++ template, but this is beyond our scope, please

refer to the example programs along with this book


9.4. PAIRING HEAPS 253

• Trivial case, one heap is empty, we simply return the other heap as the
result;
• Otherwise, we compare the root element of the two heaps, make the heap
with bigger root element as a new children of the other.

Let H1 , and H2 denote the two heaps, x and y be the root element of H1
and H2 respectively. Function Children() returns the children of a K-ary tree.
Function N ode() can construct a K-ary tree from a root element and a list of
children.



 H1 : H2 = ϕ

H2 : H1 = ϕ
merge(H1 , H2 ) = (9.22)

 N ode(x, {H2 } ∪ Children(H1 )) : x<y

N ode(y, {H1 } ∪ Children(H2 )) : otherwise

Where
x = Root(H1 )
y = Root(H2 )
It’s obviously that merging algorithm is bound to O(1) time 5 . The merge
equation can be translated to the following Haskell program.
merge h E = h
merge E h = h
merge h1@(Node x hs1) h2@(Node y hs2) =
if x < y then Node x (h2:hs1) else Node y (h1:hs2)
Merge can also be realized imperatively. With left-child, right sibling ap-
proach, we can just link the heap, which is in fact a K-ary tree, with larger key
as the first new child of the other. This is constant time operation as described
below.
1: function Merge(H1 , H2 )
2: if H1 = NIL then
3: return H2
4: if H2 = NIL then
5: return H1
6: if Key(H2 ) < Key(H1 ) then
7: Exchange(H1 ↔ H2 )
8: Insert H2 in front of Children(H1 )
9: Parent(H2 ) ← H1
10: return H1
Note that we also update the parent field accordingly. The ANSI C example
program is given as the following.
struct node∗ merge(struct node∗ h1, struct node∗ h2) {
if (h1 == NULL)
return h2;
if (h2 == NULL)
return h1;
5 Assume ∪ is constant time operation, this is true for linked-list settings, including ’cons’

like operation in functional programming languages.


254CHAPTER 9. BINOMIAL HEAP, FIBONACCI HEAP, AND PAIRING HEAP

if (h2→key < h1→key)


swap(&h1, &h2);
h2→next = h1→children;
h1→children = h2;
h2→parent = h1;
h1→next = NULL; /∗Break previous link if any∗/
return h1;
}

Where function swap() is defined in a similar way as Fibonacci Heap.


With merge defined, insertion can be realized as same as Fibonacci Heap in
Equation 9.9. Definitely it’s O(1) time operation. As the minimum element is
always stored in root, finding it is trivial.

top(H) = Root(H) (9.23)


Same as the other two above operations, it’s bound to O(1) time.

Decrease key of a node


There is another operation, to decrease key of a given node, which only makes
sense in imperative settings as we explained in Fibonacci Heap section.
The solution is simple, that we can cut the node with the new smaller key
from it’s parent along with all its children. Then merge it again to the heap.
The only special case is that if the given node is the root, then we can directly
set the new key without doing anything else.
The following algorithm describes this procedure for a given node x, with
new key k.
1: function Decrease-Key(H, x, k)
2: Key(x) ← k
3: if Parent(x) ̸= NIL then
4: Remove x from Children(Parent(x)) Parent(x) ← NIL
5: return Merge(H, x)
6: return H
The following ANSI C program translates this algorithm.
struct node∗ decrease_key(struct node∗ h, struct node∗ x, Key key) {
x→key = key; /∗ Assume key ≤ x→key ∗/
if(x→parent) {
x→parent→children = remove_node(x→parent→children, x);
x→parent = NULL;
return merge(h, x);
}
return h;
}

Exercise 9.10
Implement the program of removing a node from the children of its parent
in your favorite imperative programming language. Consider how can we ensure
the overall performance of decreasing key is O(1) time? Is left-child, right sibling
approach enough?
9.4. PAIRING HEAPS 255

Delete the minimum element from the heap (pop)


Since the minimum element is always stored at root, after delete it during pop-
ping, the rest things left are all sub-trees. These trees can be merged to one big
tree.

pop(H) = mergeP airs(Children(H)) (9.24)


Pairing Heap uses a special approach that it merges every two sub-trees
from left to right in pair. Then merge these paired results from right to left
which forms a final result tree. The name of ‘Pairing Heap’ comes from the
characteristic of this pair-merging.
Figure 9.16 and 9.17 illustrate the procedure of pair-merging.

5 4 3 12 7 10 11 6 9

15 13 8 17 14

16

(a) A pairing heap before pop.

5 4 3 12 7 10 11 6 9

15 13 8 17 14

16

(b) After root element 2 being removed, there are 9 sub-trees left.

4 3 7 6 9

5 13 12 8 10 11 7 14

15 16

(c) Merge every two trees in pair, note that there are odd
number trees, so the last one needn’t merge.

Figure 9.16: Remove the root element, and merge children in pairs.
256CHAPTER 9. BINOMIAL HEAP, FIBONACCI HEAP, AND PAIRING HEAP

6 6

9 11 7 9 11

7 14 10 14

16 16

(a) Merge tree with 9, and tree with root 6. (b) Merge tree with root 7 to the result.

3 3

6 12 8 4 6 12 8

7 9 11 5 13 7 9 11

10 14 15 10 14

16 16

(c) Merge tree with root 3 to the result. (d) Merge tree with root 4 to the result.

Figure 9.17: Steps of merge from right to left.


9.4. PAIRING HEAPS 257

The recursive pair-merging solution is quite similar to the bottom up merge


sort[6]. Denote the children of a pairing heap as A, which is a list of trees of
{T1 , T2 , T3 , ..., Tm } for example. The mergeP airs() function can be given as
below.


 Φ : A=Φ
mergeP airs(A) = T1 : A = {T1 }

merge(merge(T1 , T2 ), mergeP airs(A′ )) : otherwise
(9.25)
where

A′ = {T3 , T4 , ..., Tm }
is the rest of the children without the first two trees.
The relative Haskell program of popping is given as the following.
deleteMin (Node _ hs) = mergePairs hs where
mergePairs [] = E
mergePairs [h] = h
mergePairs (h1:h2:hs) = merge (merge h1 h2) (mergePairs hs)
The popping operation can also be explained in the following procedural
algorithm.
1: function Pop(H)
2: L ← N IL
3: for every 2 trees Tx , Ty ∈ Children(H) from left to right do
4: Extract x, and y from Children(H)
5: T ← Merge(Tx , Ty )
6: Insert T at the beginning of L
7: H ← Children(H) ▷ H is either N IL or one tree.
8: for ∀T ∈ L from left to right do
9: H ← Merge(H, T )
10: return H
Note that L is initialized as an empty linked-list, then the algorithm iterates
every two trees in pair in the children of the K-ary tree, from left to right, and
performs merging, the result is inserted at the beginning of L. Because we insert
to front end, so when we traverse L later on, we actually process from right to
left. There may be odd number of sub-trees in H, in that case, it will leave one
tree after pair-merging. We handle it by start the right to left merging from
this left tree.
Below is the ANSI C program to this algorithm.
struct node∗ pop(struct node∗ h) {
struct node ∗x, ∗y, ∗lst = NULL;
while ((x = h→children) != NULL) {
if ((h→children = y = x→next) != NULL)
h→children = h→children→next;
lst = push_front(lst, merge(x, y));
}
x = NULL;
while((y = lst) != NULL) {
lst = lst→next;
258CHAPTER 9. BINOMIAL HEAP, FIBONACCI HEAP, AND PAIRING HEAP

x = merge(x, y);
}
free(h);
return x;
}

The pairing heap pop operation is conjectured to be amortized O(lg n) time


[7].

Exercise 9.11
Write a program to insert a tree at the beginning of a linked-list in your
favorite imperative programming language.

Delete a node
We didn’t mention delete in Binomial heap or Fibonacci Heap. Deletion can be
realized by first decreasing key to minus infinity (−∞), then performing pop.
In this section, we present another solution for delete node.
The algorithm is to define the function delete(H, x), where x is a node in a
pairing heap H 6 .
If x is root, we can just perform a pop operation. Otherwise, we can cut x
from H, perform a pop on x, and then merge the pop result back to H. This
can be described as the following.

{
pop(H) : x is root of H
delete(H, x) = (9.26)
merge(cut(H, x), pop(x)) : otherwise

As delete algorithm uses pop, the performance is conjectured to be amortized


O(lg n) time.

Exercise 9.12

• Write procedural pseudo code for delete algorithm.

• Write the delete operation in your favorite imperative programming lan-


guage

• Consider how to realize delete in purely functional setting.

9.5 Notes and short summary


In this chapter, we extend the heap implementation from binary tree to more
generic approach. Binomial heap and Fibonacci heap use Forest of K-ary trees
as under ground data structure, while Pairing heap use a K-ary tree to represent
heap. It’s a good point to post pone some expensive operation, so that the over
all amortized performance is ensured. Although Fibonacci Heap gives good
performance in theory, the implementation is a bit complex. It was removed in
6 Here the semantic of x is a reference to a node.
9.5. NOTES AND SHORT SUMMARY 259

some latest textbooks. We also present pairing heap, which is easy to realize
and have good performance in practice.
The elementary tree based data structures are all introduced in this book.
There are still many tree based data structures which we can’t covers them all
and skip here. We encourage the reader to refer to other textbooks about them.
From next chapter, we’ll introduce generic sequence data structures, array and
queue.
260CHAPTER 9. BINOMIAL HEAP, FIBONACCI HEAP, AND PAIRING HEAP
Bibliography

[1] K-ary tree, Wikipedia. http://en.wikipedia.org/wiki/K-ary_tree


[2] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford
Stein. “Introduction to Algorithms, Second Edition”. The MIT Press, 2001.
ISBN: 0262032937.
[3] Chris Okasaki. “Purely Functional Data Structures”. Cambridge university
press, (July 1, 1999), ISBN-13: 978-0521663502
[4] Wikipedia, “Pascal’s triangle”. http://en.wikipedia.org/wiki/Pascal’s_triangle

[5] Hackage. “An alternate implementation of a priority queue based on


a Fibonacci heap.”, http://hackage.haskell.org/packages/archive/pqueue-
mtl/1.0.7/doc/html/src/Data-Queue-FibQueue.html
[6] Chris Okasaki. “Fibonacci Heaps.” http://darcs.haskell.org/nofib/gc/fibheaps/orig

[7] Michael L. Fredman, Robert Sedgewick, Daniel D. Sleator, and Robert E.


Tarjan. “The Pairing Heap: A New Form of Self-Adjusting Heap” Algo-
rithmica (1986) 1: 111-129.

261
262 BIBLIOGRAPHY
Part IV

Queues and Sequences

263
Chapter 10

Queue, not so simple as it


was thought

10.1 Introduction
It seems that queues are relative simple. A queue provides FIFO (first-in, first-
out) data manipulation support. There are many options to realize queue in-
cludes singly linked-list, doubly linked-list, circular buffer etc. However, we’ll
show that it’s not so easy to realize queue in purely functional settings if it must
satisfy abstract queue properties.
In this chapter, we’ll present several different approaches to implement
queue. A queue is a FIFO data structure satisfies the following performance
constraints.

• Element can be added to the tail of the queue in O(1) constant time;

• Element can be removed from the head of the queue in O(1) constant
time.

These two properties must be satisfied. And it’s common to add some extra
goals, such as dynamic memory allocation etc.
Of course such abstract queue interface can be implemented with doubly-
linked list trivially. But this is a overkill solution. We can even implement
imperative queue with singly linked-list or plain array. However, our main
question here is about how to realize a purely functional queue as well?
We’ll first review the typical queue solution which is realized by singly linked-
list and circular buffer in first section; Then we give a simple and straightforward
functional solution in the second section. While the performance is ensured in
terms of amortized constant time, we need find real-time solution (or worst-case
solution) for some special case. Such solution will be described in the third
and the fourth section. Finally, we’ll show a very simple real-time queue which
depends on lazy evaluation.
Most of the functional contents are based on Chris, Okasaki’s great work in
[6]. There are more than 16 different types of purely functional queue given in
that material.

265
266 CHAPTER 10. QUEUE, NOT SO SIMPLE AS IT WAS THOUGHT

10.2 Queue by linked-list and circular buffer


10.2.1 Singly linked-list solution
Queue can be implemented with singly linked-list. It’s easy to add and remove
element at the front end of a linked-list in O(1) time. However, in order to
keep the FIFO order, if we execute one operation on head, we must perform the
inverse operation on tail.
In order to operate on tail, for plain singly linked-list, we must traverse the
whole list before adding or removing. Traversing is bound to O(n) time, where
n is the length of the list. This doesn’t match the abstract queue properties.
The solution is to use an extra record to store the tail of the linked-list. A
sentinel is often used to simplify the boundary handling. The following ANSI
C 1 code defines a queue realized by singly linked-list.
typedef int Key;

struct Node{
Key key;
struct Node∗ next;
};

struct Queue{
struct Node ∗head, ∗tail;
};

Figure 10.1 illustrates an empty list. Both head and tail point to the sentinel
NIL node.

head tail

Figure 10.1: The empty queue, both head and tail point to sentinel node.

We summarize the abstract queue interface as the following.


function Empty ▷ Create an empty queue
function Empty?(Q) ▷ Test if Q is empty
function Enqueue(Q, x) ▷ Add a new element x to queue Q
function Dequeue(Q) ▷ Remove element from queue Q
function Head(Q) ▷ get the next element in queue Q in FIFO order
1 It’s possible to parameterize the type of the key with C++ template. ANSI C is used here

for illustration purpose.


10.2. QUEUE BY LINKED-LIST AND CIRCULAR BUFFER 267

Note the difference between Dequeue and Head. Head only retrieve next
element in FIFO order without removing it, while Dequeue performs removing.
In some programming languages, such as Haskell, and most object-oriented
languages, the above abstract queue interface can be ensured by some definition.
For example, the following Haskell code specifies the abstract queue.
class Queue q where
empty :: q a
isEmpty :: q a → Bool
push :: q a → a → q a -- Or named as 'snoc', append, pushλ_back
pop :: q a → q a -- Or named as 'tail', popλ_front
front :: q a → a -- Or named as 'head'

To ensure the constant time Enqueue and Dequeue, we add new element
to head and remove element from tail.2
function Enqueue(Q, x)
p ← Create-New-Node
Key(p) ← x
Next(p) ← N IL
Next(Tail(Q)) ← p
Tail(Q) ← p
Note that, as we use the sentinel node, there are at least one node, the
sentinel in the queue. That’s why we needn’t check the validation of of the tail
before we append the new created node p to it.
function Dequeue(Q)
x ← Head(Q)
Next(Head(Q)) ← Next(x)
if x = Tail(Q) then ▷ Q gets empty
Tail(Q) ← Head(Q)
return Key(x)
As we always put the sentinel node in front of all the other nodes, function
Head actually returns the next node to the sentinel.
Figure 10.2 illustrates Enqueue and Dequeue process with sentinel node.
Translating the pseudo code to ANSI C program yields the below code.
struct Queue∗ enqueue(struct Queue∗ q, Key x) {
struct Node∗ p = (struct Node∗)malloc(sizeof(struct Node));
p→key = x;
p→next = NULL;
q→tail→next = p;
q→tail = p;
return q;
}

Key dequeue(struct Queue∗ q) {


struct Node∗ p = head(q); /∗gets the node next to sentinel∗/
Key x = key(p);
q→head→next = p→next;
if(q→tail == p)
2 It’s possible to add new element to the tail, while remove element from head, but the

operations are more complex than this approach.


268 CHAPTER 10. QUEUE, NOT SO SIMPLE AS IT WAS THOUGHT

head tail x NIL

Enqueue

Sentinel a ... e NIL

(a) Before Enqueue x to queue

head tail

Sentinel a ... e x NIL

(b) After Enqueue x to queue

head tail

Sentinel a b ... e NIL

Dequeue

(c) Before Dequeue

head tail

Sentinel b ... e NIL

(d) After Dequeue

Figure 10.2: Enqueue and Dequeue to linked-list queue.


10.2. QUEUE BY LINKED-LIST AND CIRCULAR BUFFER 269

q→tail = q→head;
free(p);
return x;
}

This solution is simple and robust. It’s easy to extend this solution even to
the concurrent environment (e.g. multicores). We can assign a lock to the head
and use another lock to the tail. The sentinel helps us from being dead-locked
due to the empty case [1] [2].

Exercise 10.1

• Realize the Empty? and Head algorithms for linked-list queue.

• Implement the singly linked-list queue in your favorite imperative pro-


gramming language. Note that you need provide functions to initialize
and destroy the queue.

10.2.2 Circular buffer solution


Another typical solution to realize queue is to use plain array as a circular buffer
(also known as ring buffer). Oppose to linked-list, array support appending to
the tail in constant O(1) time if there are still spaces. Of course we need re-
allocate spaces if the array is fully occupied. However, Array performs poor
in O(n) time when removing element from head and packing the space. This
is because we need shift all rest elements one cell ahead. The idea of circular
buffer is to reuse the free cells before the first valid element after we remove
elements from head.
The idea of circular buffer can be described in figure 10.3 and 10.4.
If we set a maximum size of the buffer instead of dynamically allocate mem-
ories, the queue can be defined with the below ANSI C code.
struct QueueBuf{
Key∗ buf;
int head, cnt, size;
};

When initialize the queue, we are explicitly asked to provide the maximum
size as the parameter.
struct QueueBuf∗ createQ(int max){
struct QueueBuf∗ q = (struct QueueBuf∗)malloc(sizeof(struct QueueBuf));
q→buf = (Key∗)malloc(sizeof(Key)∗max);
q→size = max;
q→head = q→cnt = 0;
return q;
}

With the counter variable, we can compare it with zero and the capacity to
test if the queue is empty or full.
function Empty?(Q)
return Count(Q) = 0
270 CHAPTER 10. QUEUE, NOT SO SIMPLE AS IT WAS THOUGHT

head tail boundary

a[0] a[1] ... a[i] ...

(a) Continuously add some elements.

head tail boundary

... a[j] ... a[i] ...

(b) After remove some elements from head,


there are free cells.

head tail boundary

... a[j] ... a[i]

(c) Go on adding elements till the boundary of


the array.

tail head boundary

a[0] ... a[j] ...

(d) The next element is added to the first


free cell on head.

tail head boundary

a[0] a[1] ... a[j-1] a[j] ...

(e) All cells are occupied. The queue is full.

Figure 10.3: A queue is realized with ring buffer.


10.2. QUEUE BY LINKED-LIST AND CIRCULAR BUFFER 271

Figure 10.4: The circular buffer.

To realize Enqueue and Dequeue, an easy way is to calculate the modular


of index as the following.
function Enqueue(Q, x)
if ¬ Full?(Q) then
Count(Q) ← Count(Q) + 1
tail ← (Head(Q) + Count(Q)) mod Size(Q)
Buffer(Q)[tail] ← x
function Head(Q)
if ¬ Empty?(Q) then
return Buffer(Q)[Head(Q)]
function Dequeue(Q)
if ¬ Empty?(Q) then
Head(Q) ← (Head(Q) + 1) mod Size(Q)
Count(Q) ← Count(Q) - 1
However, modular is expensive and slow depends on some settings, so one
may replace it by some adjustment. For example as in the below ANSI C
program.

void enQ(struct QueueBuf∗ q, Key x){


if(!fullQ(q)){
q→buf[offset(q→head + q→cnt, q→size)] = x;
q→cnt++;
}
}

Key headQ(struct Queue∗ q) {


return q→buf[q→head]; //������Ϊ�¼»¿£
}
272 CHAPTER 10. QUEUE, NOT SO SIMPLE AS IT WAS THOUGHT

Key deQ(struct QueueBuf∗ q){


Key x = headQ(q);
q→head = offset(++q→head, q→size);
q→cnt--;
return x;
}

Exercise 10.2
The circular buffer is allocated with a maximum size parameter. Can we
test the queue is empty or full with only head and tail pointers? Note that the
head can be either before or after the tail.

10.3 Purely functional solution


10.3.1 Paired-list queue
We can’t just use a list to implement queue, or we can’t satisfy abstract queue
properties. This is because singly linked-list, which is the back-end data struc-
ture in most functional settings, performs well on head in constant O(1) time,
while it performs in linear O(n) time on tail, where n is the length of the list.
Either dequeue or enqueue will perform proportion to the number of elements
stored in the list as shown in figure 10.5.

EnQueue O(1) x[n] x[n-1] ... x[2] x[1] NIL DeQueue O(n)

(a) DeQueue performs poorly.

EnQueue O(n) x[n] x[n-1] ... x[2] x[1] NIL DeQueue O(1)

(b) EnQueue performs poorly.

Figure 10.5: DeQueue and EnQueue can’t perform both in constant O(1)
time with a list.

We neither can add a pointer to record the tail position of the list as what
we have done in the imperative settings like in the ANSI C program, because
of the nature of purely functional.
Chris Okasaki mentioned a simple and straightforward functional solution
in [6]. The idea is to maintain two linked-lists as a queue, and concatenate these
two lists in a tail-to-tail manner. The shape of the queue looks like a horseshoe
magnet as shown in figure 10.6.
With this setup, we push new element to the head of the rear list, which is
ensure to be O(1) constant time; on the other hand, we pop element from the
head of the front list, which is also O(1) constant time. So that the abstract
queue properties can be satisfied.
The definition of such paired-list queue can be expressed in the following
Haskell code.
type Queue a = ([a], [a])

empty = ([], [])


10.3. PURELY FUNCTIONAL SOLUTION 273

(a) a horseshoe magnet.

front

DeQueue O(1) x[n] x[n-1 ... x[2] x[1] NIL

EnQueue O(1) y[m] y[m-1] ... y[2] y[1] NIL

rear

(b) concatenate two lists tail-to-tail.

Figure 10.6: A queue with front and rear list shapes like a horseshoe magnet.
274 CHAPTER 10. QUEUE, NOT SO SIMPLE AS IT WAS THOUGHT

Suppose function f ront(Q) and rear(Q) return the front and rear list in
such setup, and Queue(F, R) create a paired-list queue from two lists F and R.
The EnQueue (push) and DeQueue (pop) operations can be easily realized
based on this setup.

push(Q, x) = Queue(f ront(Q), {x} ∪ rear(Q)) (10.1)

pop(Q) = Queue(tail(f ront(Q)), rear(Q)) (10.2)

where if a list X = {x1 , x2 , ..., xn }, function tail(X) = {x2 , x3 , ..., xn } returns


the rest of the list without the first element.
However, we must next solve the problem that after several pop operations,
the front list becomes empty, while there are still elements in rear list. One
method is to rebuild the queue by reversing the rear list, and use it to replace
front list.
Hence a balance operation will be execute after popping. Let’s denote the
front and rear list of a queue Q as F = f ront(Q), and R = f ear(Q).

{
Queue(reverse(R), ϕ) : F = ϕ
balance(F, R) = (10.3)
Q : otherwise

Thus if front list isn’t empty, we do nothing, while when the front list be-
comes empty, we use the reversed rear list as the new front list, and the new
rear list is empty.
The new enqueue and dequeue algorithms are updated as below.

push(Q, x) = balance(F, {x} ∪ R) (10.4)

pop(Q) = balance(tail(F ), R) (10.5)

Sum up the above algorithms and translate them to Haskell yields the fol-
lowing program.
balance :: Queue a → Queue a
balance ([], r) = (reverse r, [])
balance q = q

push :: Queue a → a → Queue a


push (f, r) x = balance (f, x:r)

pop :: Queue a → Queue a


pop ([], _) = error "Empty"
pop (_:f, r) = balance (f, r)

Although we only touch the heads of front list and rear list, the overall
performance can’t be kept always as O(1). Actually, the performance of this
algorithm is amortized O(1). This is because the reverse operation takes time
proportion to the length of the rear list. it’s bound O(n) time, where N = |R|.
We left the prove of amortized performance as an exercise to the reader.
10.3. PURELY FUNCTIONAL SOLUTION 275

10.3.2 Paired-array queue - a symmetric implementation


There is an interesting implementation which is symmetric to the paired-list
queue. In some old programming languages, such as legacy version of BASIC,
There is array supported, but there is no pointers, nor records to represent
linked-list. Although we can use another array to store indexes so that we
can represent linked-list with implicit array, there is another option to realized
amortized O(1) queue.
Compare the performance of array and linked-list. Below table reveals some
facts (Suppose both contain n elements).
operation Array Linked-list
insert on head O(n) O(1)
insert on tail O(1) O(n)
remove on head O(n) O(1)
remove on tail O(1) O(n)
Note that linked-list performs in constant time on head, but in linear time
on tail; while array performs in constant time on tail (suppose there is enough
memory spaces, and omit the memory reallocation for simplification), but in
linear time on head. This is because we need do shifting when prepare or
eliminate an empty cell in array. (see chapter ’the evolution of insertion sort’
for detail.)
The above table shows an interesting characteristic, that we can exploit it
and provide a solution mimic to the paired-list queue: We concatenate two
arrays, head-to-head, to make a horseshoe shape queue like in figure 10.7.

front array

x[1] x[2] ... x[n-1] x[n] EnQueue O(1)

y[1] y[2] ... y[m-1] y[m] DeQueue O(1)

rear array

(a) a horseshoe magnet. (b) concatenate two arrays head-to-head.

Figure 10.7: A queue with front and rear arrays shapes like a horseshoe magnet.

3
We can define such paired-array queue like the following Python code
class Queue:
def __init__(self):
self.front = []
self.rear = []

3 Legacy Basic code is not presented here. And we actually use list but not array in Python

to illustrate the idea. ANSI C and ISO C++ programs are provides along with this chapter,
they show more in a purely array manner.
276 CHAPTER 10. QUEUE, NOT SO SIMPLE AS IT WAS THOUGHT

def is_empty(q):
return q.front == [] and q.rear == []

The relative Push() and Pop() algorithm only manipulate on the tail of the
arrays.
function Push(Q, x)
Append(Rear(Q), x)
Here we assume that the Append() algorithm append element x to the end
of the array, and handle the necessary memory allocation etc. Actually, there
are multiple memory handling approaches. For example, besides the dynamic
re-allocation, we can initialize the array with enough space, and just report error
if it’s full.
function Pop(Q)
if Front(Q) = ϕ then
Front(Q) ← Reverse(Rear(Q))
Rear(Q) ← ϕ
n ← Length(Front(Q))
x ← Front(Q)[n]
Length(Front(Q)) ← n − 1
return x
For simplification and pure illustration purpose, the array isn’t shrunk ex-
plicitly after elements removed. So test if front array is empty (ϕ) can be realized
as check if the length of the array is zero. We omit all these details here.
The enqueue and dequeue algorithms can be translated to Python programs
straightforwardly.
def push(q, x):
q.rear.append(x)

def pop(q):
if q.front == []:
q.rear.reverse()
(q.front, q.rear) = (q.rear, [])
return q.front.pop()

Similar to the paired-list queue, the performance is amortized O(1) because


the reverse procedure takes linear time.

Exercise 10.3

• Prove that the amortized performance of