Algorithm Design


Cornell University

Boston San Francisco New York


London Toronto Sydney Tokyo Singapore Madrid
Mexico City Munich Paris Cape Town Hong Kong Montreal
Acquisitions Editor: Matt Goldstein
Project Editor: Maite Suarez-Rivas
Production Supervisor: Marilyn Lloyd
Marketing Manager: Michelle Brown
Marketing Coordinator: Jake Zavracky
Project Management: Windfall Software
Composition: Windfall Software, using ZzTEX
Copyeditor: Carol Leyba
Technical Illustration: Dartmouth Publishing
Proofreader: Jennifer McClain
Indexer: Ted Laux
Cover Design: Joyce Cosentino Wells
Cover Photo: © 2005 Tim Laman / National Geographic. A pair of weaverbirds work together on their nest in Africa.
Prepress and Manufacturing: Caroline Fell
Printer: Courier Westford

Access the latest information about Addison-Wesley titles from our World Wide Web site: http://www.aw-bc.com/computing

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and Addison-Wesley was aware of a trademark claim, the designations have been printed in initial caps or all caps.

The programs and applications presented in this book have been included for their instructional value. They have been tested with care, but are not guaranteed for any particular purpose. The publisher does not offer any warranties or representations, nor does it accept any liabilities with respect to the programs or applications.

Library of Congress Cataloging-in-Publication Data
Kleinberg, Jon.
Algorithm design / Jon Kleinberg, Éva Tardos.--1st ed.
p. cm.
Includes bibliographical references and index.
ISBN 0-321-29535-8 (alk. paper)
1. Computer algorithms. 2. Data structures (Computer science) I. Tardos, Éva.
II. Title.
QA76.9.A43K54 2005
005.1--dc22    2005000401

Copyright © 2006 by Pearson Education, Inc.

For information on obtaining permission for use of material in this work, please submit a written request to Pearson Education, Inc., Rights and Contract Department, 75 Arlington Street, Suite 300, Boston, MA 02116 or fax your request to (617) 848-7047.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or any other media embodiments now known or hereafter to become known, without the prior written permission of the publisher. Printed in the United States of America.

ISBN 0-321-29535-8
2 3 4 5 6 7 8 9 10-CRW-08 07 06 05

About the Authors

Jon Kleinberg is a professor of Computer Science at Cornell University. He received his Ph.D. from M.I.T. in 1996. He is the recipient of an NSF Career Award, an ONR Young Investigator Award, an IBM Outstanding Innovation Award, the National Academy of Sciences Award for Initiatives in Research, research fellowships from the Packard and Sloan Foundations, and teaching awards from the Cornell Engineering College and Computer Science Department. Kleinberg's research is centered around algorithms, particularly those concerned with the structure of networks and information, and with applications to information science, optimization, data mining, and computational biology. His work on network analysis using hubs and authorities helped form the foundation for the current generation of Internet search engines.

Éva Tardos is a professor of Computer Science at Cornell University. She received her Ph.D. from Eötvös University in Budapest, Hungary in 1984. She is a member of the American Academy of Arts and Sciences, and an ACM Fellow; she is the recipient of an NSF Presidential Young Investigator Award, the Fulkerson Prize, research fellowships from the Guggenheim, Packard, and Sloan Foundations, and teaching awards from the Cornell Engineering College and Computer Science Department. Tardos's research interests are focused on the design and analysis of algorithms for problems on graphs or networks. She is most known for her work on network-flow algorithms and approximation algorithms for network problems. Her recent work focuses on algorithmic game theory, an emerging area concerned with designing systems and algorithms for selfish users.
Contents

About the Authors v
Preface

1 Introduction: Some Representative Problems
1.1 A First Problem: Stable Matching 1
Solved Exercises 19
Exercises
Notes and Further Reading 28

2 Basics of Algorithm Analysis 29
2.1 Computational Tractability 29
2.2 Asymptotic Order of Growth 35
2.3 Implementing the Stable Matching Algorithm Using Lists and Arrays 42
2.4 A Survey of Common Running Times 47
2.5 57
Solved Exercises 65
Exercises 67
Notes and Further Reading 70

3 Graphs 73
3.1 Basic Definitions and Applications 73
3.2 Graph Connectivity and Graph Traversal 78
3.3 Implementing Graph Traversal Using Queues and Stacks 87
3.4 Testing Bipartiteness: An Application of Breadth-First Search 94
3.5 Connectivity in Directed Graphs 97
3.6 Directed Acyclic Graphs and Topological Ordering 99
Solved Exercises 104
Exercises 107
Notes and Further Reading 112

4 Greedy Algorithms
4.1 Interval Scheduling: The Greedy Algorithm Stays Ahead 116
4.2 Scheduling to Minimize Lateness: An Exchange Argument 125
4.3 Optimal Caching: A More Complex Exchange Argument 131
4.4 Shortest Paths in a Graph 137
4.5 The Minimum Spanning Tree Problem 142
4.6 Implementing Kruskal's Algorithm: The Union-Find Data Structure 151
4.7 Clustering 157
4.8 Huffman Codes and Data Compression 161
*4.9 Minimum-Cost Arborescences: A Multi-Phase Greedy Algorithm 177
Solved Exercises 183
Exercises 188
Notes and Further Reading 205

5 Divide and Conquer 209
5.1 A First Recurrence: The Mergesort Algorithm 210
5.2 Further Recurrence Relations 214
5.3 Counting Inversions 221
5.4 Finding the Closest Pair of Points 225
5.5 Integer Multiplication 231
5.6 234
Solved Exercises 242
Exercises 246
Notes and Further Reading 249

6 Dynamic Programming 251
6.1 Weighted Interval Scheduling: A Recursive Procedure 252
6.2 Principles of Dynamic Programming: Memoization or Iteration over Subproblems 258
6.3 Segmented Least Squares: Multi-way Choices 261
6.4 Subset Sums and Knapsacks: Adding a Variable 266
6.5 RNA Secondary Structure: Dynamic Programming over Intervals 272
6.6 Sequence Alignment 278
6.7 Sequence Alignment in Linear Space via Divide and Conquer 284
6.8 Shortest Paths in a Graph 290
6.9 297
*6.10 Negative Cycles in a Graph 301
Solved Exercises 307
Exercises 312
Notes and Further Reading 335

7 Network Flow 337
7.1 The Maximum-Flow Problem and the Ford-Fulkerson Algorithm 338
7.2 Maximum Flows and Minimum Cuts in a Network 346
7.3 Choosing Good Augmenting Paths 352
*7.4 The Preflow-Push Maximum-Flow Algorithm 357
7.5 A First Application: The Bipartite Matching Problem 367
7.6 373
7.7 378
7.8 Survey Design 384
7.9 Airline Scheduling 387
7.10 Image Segmentation 391
7.11
7.12 Baseball Elimination 400
*7.13 A Further Direction: Adding Costs to the Matching Problem 404
Solved Exercises 411
Exercises 415
Notes and Further Reading 448

8 NP and Computational Intractability 451
8.1 Polynomial-Time Reductions 452
8.2 Reductions via "Gadgets": The Satisfiability Problem 459
8.3 Efficient Certification and the Definition of NP 463
8.4 NP-Complete Problems 466
8.5 Sequencing Problems 473
8.6 Partitioning Problems 481
8.7 Graph Coloring 485
8.8 Numerical Problems 490
8.9 Co-NP and the Asymmetry of NP 495
8.10 A Partial Taxonomy of Hard Problems 497
Solved Exercises 500
Exercises 505
Notes and Further Reading 529

9 PSPACE: A Class of Problems beyond NP 531
9.1 PSPACE 531
9.2 Some Hard Problems in PSPACE 533
9.3 Solving Quantified Problems and Games in Polynomial Space 536
9.4 Solving the Planning Problem in Polynomial Space 538
9.5 543
Solved Exercises 547
Exercises 550
Notes and Further Reading 551

10 Extending the Limits of Tractability 553
10.1 Finding Small Vertex Covers 554
10.2 Solving NP-Hard Problems on Trees 558
10.3 Coloring a Set of Circular Arcs 563
*10.4 Tree Decompositions of Graphs 572
*10.5 584
Solved Exercises 591
Exercises 594
Notes and Further Reading 598

11 Approximation Algorithms 599
11.1 Greedy Algorithms and Bounds on the Optimum: A Load Balancing Problem 600
11.2 606
11.3 Set Cover: A General Greedy Heuristic 612
11.4 The Pricing Method: Vertex Cover 618
11.5 Maximization via the Pricing Method: The Disjoint Paths Problem 624
11.6 Linear Programming and Rounding: An Application to Vertex Cover 630
*11.7 Load Balancing Revisited: A More Advanced LP Application 637
11.8 Arbitrarily Good Approximations: The Knapsack Problem 644
Solved Exercises 649
Exercises 651
Notes and Further Reading 659

12 Local Search 661
12.1 The Landscape of an Optimization Problem 662
12.2 The Metropolis Algorithm and Simulated Annealing 666
12.3 An Application of Local Search to Hopfield Neural Networks
12.4 676
12.5 Choosing a Neighbor Relation 679
12.6 Classification via Local Search 681
12.7 690
Solved Exercises 700
Exercises 702
Notes and Further Reading 705

13 Randomized Algorithms 707
13.1 A First Application: Contention Resolution 708
13.2 Finding the Global Minimum Cut 714
13.3 Random Variables and Their Expectations 719
13.4 A Randomized Approximation Algorithm for MAX 3-SAT 724
13.5 Randomized Divide and Conquer: Median-Finding and Quicksort 727
13.6 Hashing: A Randomized Implementation of Dictionaries 734
13.7 Finding the Closest Pair of Points: A Randomized Approach 741
13.8 Randomized Caching 750
13.9 Chernoff Bounds 758
13.10 Load Balancing 760
13.11 Packet Routing 762
13.12 Background: Some Basic Probability Definitions 769
Solved Exercises 776
Exercises 782
Notes and Further Reading 793

Epilogue: Algorithms That Run Forever 795
References 805
Index 815

* The star indicates an optional section. (See the Preface for more information about the relationships among the chapters and sections.)

Preface
Algorithmic ideas are pervasive, and their reach is apparent in examples both
within computer science and beyond. Some of the major shifts in Internet
routing standards can be viewed as debates over the deficiencies of one
shortest-path algorithm and the relative advantages of another. The basic
notions used by biologists to express similarities among genes and genomes
have algorithmic definitions. The concerns voiced by economists over the
feasibility of combinatorial auctions in practice are rooted partly in the fact that
these auctions contain computationally intractable search problems as special
cases. And algorithmic notions aren’t just restricted to well-known and long-
standing problems; one sees the reflections of these ideas on a regular basis,
in novel issues arising across a wide range of areas. The scientist from Yahoo!
who told us over lunch one day about their system for serving ads to users was
describing a set of issues that, deep down, could be modeled as a network flow
problem. So was the former student, now a management consultant working
on staffing protocols for large hospitals, whom we happened to meet on a trip
to New York City.
The point is not simply that algorithms have many applications. The
deeper issue is that the subject of algorithms is a powerful lens through which
to view the field of computer science in general. Algorithmic problems form
the heart of computer science, but they rarely arrive as cleanly packaged,
mathematically precise questions. Rather, they tend to come bundled together
with lots of messy, application-specific detail, some of it essential, some of it
extraneous. As a result, the algorithmic enterprise consists of two fundamental
components: the task of getting to the mathematically clean core of a problem,
and then the task of identifying the appropriate algorithm design techniques,
based on the structure of the problem. These two components interact: the
more comfortable one is with the full array of possible design techniques,
the more one starts to recognize the clean formulations that lie within messy
problems out in the world. At their most effective, then, algorithmic ideas do not just provide solutions to well-posed problems; they form the language that lets you cleanly express the underlying questions.

The goal of our book is to convey this approach to algorithms, as a design process that begins with problems arising across the full range of computing applications, builds on an understanding of algorithm design techniques, and results in the development of efficient solutions to these problems. We seek to explore the role of algorithmic ideas in computer science generally, and relate these ideas to the range of precisely formulated problems for which we can design and analyze algorithms. In other words, what are the underlying issues that motivate these problems, and how did we choose these particular ways of formulating them? How did we recognize which design principles were appropriate in different situations?

In keeping with this, our goal is to offer advice on how to identify clean algorithmic problem formulations in complex issues from different areas of computing and, from this, how to design efficient algorithms for the resulting problems. Sophisticated algorithms are often best understood by reconstructing the sequence of ideas--including false starts and dead ends--that led from simpler initial approaches to the eventual solution. The result is a style of exposition that does not take the most direct route from problem statement to algorithm, but we feel it better reflects the way that we and our colleagues genuinely think about these questions.

Overview

The book is intended for students who have completed a programming-based two-semester introductory computer science sequence (the standard "CS1/CS2" courses) in which they have written programs that implement basic algorithms, manipulate discrete structures such as trees and graphs, and apply basic data structures such as arrays, lists, queues, and stacks. Since the interface between CS1/CS2 and a first algorithms course is not entirely standard, we begin the book with self-contained coverage of topics that at some institutions are familiar to students from CS1/CS2, but which at other institutions are included in the syllabi of the first algorithms course. This material can thus be treated either as a review or as new material; by including it, we hope the book can be used in a broader array of courses, and with more flexibility in the prerequisite knowledge that is assumed.

In keeping with the approach outlined above, we develop the basic algorithm design techniques by drawing on problems from across many areas of computer science and related fields. To mention a few representative examples here, we include fairly detailed discussions of applications from systems and networks (caching, switching, interdomain routing on the Internet), artificial intelligence (planning, game playing, Hopfield networks), computer vision (image segmentation), data mining (change-point detection, clustering), operations research (airline scheduling), and computational biology (sequence alignment, RNA secondary structure).

The notion of computational intractability, and NP-completeness in particular, plays a large role in the book. This is consistent with how we think about the overall process of algorithm design. Some of the time, an interesting problem arising in an application area will be amenable to an efficient solution, and some of the time it will be provably NP-complete; in order to fully address a new algorithmic problem, one should be able to explore both of these options with equal familiarity. Since so many natural problems in computer science are NP-complete, the development of methods to deal with intractable problems has become a crucial issue in the study of algorithms, and our book heavily reflects this theme. The discovery that a problem is NP-complete should not be taken as the end of the story, but as an invitation to begin looking for approximation algorithms, heuristic local search techniques, or tractable special cases. We include extensive coverage of each of these three approaches.

Problems and Solved Exercises

An important feature of the book is the collection of problems. Across all chapters, the book includes over 200 problems, almost all of them developed and class-tested in homework or exams as part of our teaching of the course at Cornell. We view the problems as a crucial component of the book, and they are structured in keeping with our overall approach to the material. Most of them consist of extended verbal descriptions of a problem arising in an application area in computer science or elsewhere out in the world, and part of the problem is to practice what we discuss in the text: setting up the necessary notation and formalization, designing an algorithm, and then analyzing it and proving it correct. (We view a complete answer to one of these problems as consisting of all these components: a fully explained algorithm, an analysis of the running time, and a proof of correctness.) The ideas for these problems come in large part from discussions we have had over the years with people working in different areas, and in some cases they serve the dual purpose of recording an interesting (though manageable) application of algorithms that we haven't seen written down anywhere else.

To help with the process of working on these problems, we include in each chapter a section entitled "Solved Exercises," where we take one or more problems and describe how to go about formulating a solution. The discussion devoted to each solved exercise is therefore significantly longer than what would be needed simply to write a complete, correct solution (in other words,
significantly longer than what it would take to receive full credit if these were being assigned as homework problems). Rather, as with the rest of the text, the discussions in these sections should be viewed as trying to give a sense of the larger process by which one might think about problems of this type, culminating in the specification of a precise solution.

It is worth mentioning two points concerning the use of these problems as homework in a course. First, the problems are sequenced roughly in order of increasing difficulty, but this is only an approximate guide and we advise against placing too much weight on it: since the bulk of the problems were designed as homework for our undergraduate class, large subsets of the problems in each chapter are really closely comparable in terms of difficulty. Second, aside from the lowest-numbered ones, the problems are designed to involve some investment of time, both to relate the problem description to the algorithmic techniques in the chapter, and then to actually design the necessary algorithm. In our undergraduate class, we have tended to assign roughly three of these problems per week.

Pedagogical Features and Supplements

In addition to the problems and solved exercises, the book has a number of further pedagogical features, as well as additional supplements to facilitate its use for teaching.

As noted earlier, a large number of the sections in the book are devoted to the formulation of an algorithmic problem--including its background and underlying motivation--and the design and analysis of an algorithm for this problem. To reflect this style, these sections are consistently structured around a sequence of subsections: "The Problem," where the problem is described and a precise formulation is worked out; "Designing the Algorithm," where the appropriate design technique is employed to develop an algorithm; and "Analyzing the Algorithm," which proves properties of the algorithm and analyzes its efficiency. These subsections are highlighted in the text with an icon depicting a feather. In cases where extensions to the problem or further analysis of the algorithm is pursued, there are additional subsections devoted to these issues. The goal of this structure is to offer a relatively uniform style of presentation that moves from the initial discussion of a problem arising in a computing application through to the detailed analysis of a method to solve it.

A number of supplements are available in support of the book itself. An instructor's manual works through all the problems, providing full solutions to each. A set of lecture slides, developed by Kevin Wayne of Princeton University, is also available; these slides follow the order of the book's sections and can thus be used as the foundation for lectures in a course based on the book. These files are available at www.aw.com. For instructions on obtaining a professor login and password, search the site for either "Kleinberg" or "Tardos" or contact your local Addison-Wesley representative.

Finally, we would appreciate receiving feedback on the book. In particular, as in any book of this length, there are undoubtedly errors that have remained in the final version. Comments and reports of errors can be sent to us by e-mail, at the address [email protected]; please include the word "feedback" in the subject line of the message.

Chapter-by-Chapter Synopsis

Chapter 1 starts by introducing some representative algorithmic problems. We begin immediately with the Stable Matching Problem, since we feel it sets up the basic issues in algorithm design more concretely and more elegantly than any abstract discussion could: stable matching is motivated by a natural though complex real-world issue, from which one can abstract an interesting problem statement and a surprisingly effective algorithm to solve this problem. The remainder of Chapter 1 discusses a list of five "representative problems" that foreshadow topics from the remainder of the course. These five problems are interrelated in the sense that they are all variations and/or special cases of the Independent Set Problem; but one is solvable by a greedy algorithm, one by dynamic programming, one by network flow, one (the Independent Set Problem itself) is NP-complete, and one is PSPACE-complete. The fact that closely related problems can vary greatly in complexity is an important theme of the book, and these five problems serve as milestones that reappear as the book progresses.

Chapters 2 and 3 cover the interface to the CS1/CS2 course sequence mentioned earlier. Chapter 2 introduces the key mathematical definitions and notations used for analyzing algorithms, as well as the motivating principles behind them. It begins with an informal overview of what it means for a problem to be computationally tractable, together with the concept of polynomial time as a formal notion of efficiency. It then discusses growth rates of functions and asymptotic analysis more formally, and offers a guide to commonly occurring functions in algorithm analysis, together with standard applications in which they arise. Chapter 3 covers the basic definitions and algorithmic primitives needed for working with graphs, which are central to so many of the problems in the book. A number of basic graph algorithms are often implemented by students late in the CS1/CS2 course sequence, but it is valuable to present the material here in a broader algorithm design context. In particular, we discuss basic graph definitions, graph traversal techniques such as breadth-first search and depth-first search, and directed graph concepts including strong connectivity and topological ordering.
Chapters 2 and 3 also present many of the basic data structures that will be used for implementing algorithms throughout the book; more advanced data structures are presented in subsequent chapters. Our approach to data structures is to introduce them as they are needed for the implementation of the algorithms being developed in the book. Thus, although many of the data structures covered here will be familiar to students from the CS1/CS2 sequence, our focus is on these data structures in the broader context of algorithm design and analysis.

Chapters 4 through 7 cover four major algorithm design techniques: greedy algorithms, divide and conquer, dynamic programming, and network flow. With greedy algorithms, the challenge is to recognize when they work and when they don't; our coverage of this topic is centered around a way of classifying the kinds of arguments used to prove greedy algorithms correct. This chapter concludes with some of the main applications of greedy algorithms, for shortest paths, undirected and directed spanning trees, clustering, and compression. For divide and conquer, we begin with a discussion of strategies for solving recurrence relations as bounds on running times; we then show how familiarity with these recurrences can guide the design of algorithms that improve over straightforward approaches to a number of basic problems, including the comparison of rankings, the computation of closest pairs of points in the plane, and the Fast Fourier Transform. Next we develop dynamic programming by starting with the recursive intuition behind it, and subsequently building up more and more expressive recurrence formulations through applications in which they naturally arise. This chapter concludes with extended discussions of the dynamic programming approach to two fundamental problems: sequence alignment, with applications in computational biology; and shortest paths in graphs, with connections to Internet routing protocols. Finally, we cover algorithms for network flow problems, devoting much of our focus in this chapter to discussing a large array of different flow applications. To the extent that network flow is covered in algorithms courses, students are often left without an appreciation for the wide range of problems to which it can be applied; we try to do justice to its versatility by presenting applications to load balancing, scheduling, image segmentation, and a number of other problems.

Chapters 8 and 9 cover computational intractability. We devote most of our attention to NP-completeness, organizing the basic NP-complete problems thematically to help students recognize candidates for reductions when they encounter new problems. We build up to some fairly complex proofs of NP-completeness, with guidance on how one goes about constructing a difficult reduction. We also consider types of computational hardness beyond NP-completeness, particularly through the topic of PSPACE-completeness. We find this is a valuable way to emphasize that intractability doesn't end at NP-completeness, and PSPACE-completeness also forms the underpinning for some central notions from artificial intelligence--planning and game playing--that would otherwise not find a place in the algorithmic landscape we are surveying.

Chapters 10 through 12 cover three major techniques for dealing with computationally intractable problems: identification of structured special cases, approximation algorithms, and local search heuristics. Our chapter on tractable special cases emphasizes that instances of NP-complete problems arising in practice may not be nearly as hard as worst-case instances, because they often contain some structure that can be exploited in the design of an efficient algorithm. We illustrate how NP-complete problems are often efficiently solvable when restricted to tree-structured inputs, and we conclude with an extended discussion of tree decompositions of graphs. While this topic is more suitable for a graduate course than for an undergraduate one, it is a technique with considerable practical utility for which it is hard to find an existing accessible reference for students. Our chapter on approximation algorithms discusses both the process of designing effective algorithms and the task of understanding the optimal solution well enough to obtain good bounds on it. As design techniques for approximation algorithms, we focus on greedy algorithms, linear programming, and a third method we refer to as "pricing," which incorporates ideas from each of the first two. Finally, we discuss local search heuristics, including the Metropolis algorithm and simulated annealing. This topic is often missing from undergraduate algorithms courses, because very little is known in the way of provable guarantees for these algorithms; however, given their widespread use in practice, we feel it is valuable for students to know something about them, and we also include some cases in which guarantees can be proved.

Chapter 13 covers the use of randomization in the design of algorithms. This is a topic on which several nice graduate-level books have been written. Our goal here is to provide a more compact introduction to some of the ways in which students can apply randomized techniques using the kind of background in probability one typically gains from an undergraduate discrete math course.

Use of the Book

The book is primarily designed for use in a first undergraduate course on algorithms, but it can also be used as the basis for an introductory graduate course.

When we use the book at the undergraduate level, we spend roughly one lecture per numbered section; in cases where there is more than one
lecture's worth of material in a section (for example, when a section provides further applications as additional examples), we treat this extra material as a supplement that students can read about outside of lecture. We skip the starred sections; while these sections contain important topics, they are less central to the development of the subject, and in some cases they are harder as well. We also tend to skip one or two other sections per chapter in the first half of the book (for example, we tend to skip Sections 4.3, 4.7-4.8, 5.5-5.6, 6.5, 7.6, and 7.11). We cover roughly half of each of Chapters 11-13.

This last point is worth emphasizing: rather than viewing the later chapters as "advanced," and hence off-limits to undergraduate algorithms courses, we have designed them with the goal that the first few sections of each should be accessible to an undergraduate audience. Our own undergraduate course involves material from all these chapters, as we feel that all of these topics have an important place at the undergraduate level.

Finally, we treat Chapters 2 and 3 primarily as a review of material from earlier courses; but, as discussed above, the use of these two chapters depends heavily on the relationship of each specific course to its prerequisites.

The resulting syllabus looks roughly as follows: Chapter 1; Chapters 4-8 (excluding 4.3, 4.7-4.9, 5.5-5.6, 6.5, 6.10, 7.4, 7.6, 7.11, and 7.13); Chapter 9 (briefly); Chapter 10, Sections 10.1 and 10.2; Chapter 11, Sections 11.1, 11.2, 11.6, and 11.8; Chapter 12, Sections 12.1-12.3; and Chapter 13, Sections 13.1-13.5.

The book also naturally supports an introductory graduate course on algorithms. Our view of such a course is that it should introduce students destined for research in all different areas to the important current themes in algorithm design. Here we find the emphasis on formulating problems to be useful as well, since students will soon be trying to define their own research problems in many different subfields. For this type of course, we cover the later topics in Chapters 4 and 6 (Sections 4.5-4.9 and 6.5-6.10), cover all of Chapter 7 (moving more rapidly through the early sections), quickly cover NP-completeness in Chapter 8 (since many beginning graduate students will have seen this topic as undergraduates), and then spend the remainder of the time on Chapters 10-13. Although our focus in an introductory graduate course is on the more advanced sections, we find it useful for the students to have the full book to consult for reviewing or filling in background knowledge, given the range of different undergraduate backgrounds among the students in such a course.

Finally, the book can be used to support self-study by graduate students, researchers, or computer professionals who want to get a sense for how they might be able to use particular algorithm design techniques in the context of their own work. A number of graduate students and colleagues have used portions of the book in this way.

Acknowledgments

This book grew out of the sequence of algorithms courses that we have taught at Cornell. These courses have grown, as the field has grown, over a number of years, and they reflect the influence of the Cornell faculty who helped to shape them during this time, including Juris Hartmanis, Monika Henzinger, John Hopcroft, Dexter Kozen, Ronitt Rubinfeld, and Sam Toueg. More generally, we would like to thank all our colleagues at Cornell for countless discussions both on the material here and on broader issues about the nature of the field.

The course staffs we've had in teaching the subject have been tremendously helpful in the formulation of this material. We thank our undergraduate and graduate teaching assistants, Siddharth Alexander, Rie Ando, Elliot Anshelevich, Lars Backstrom, Steve Baker, Ralph Benzinger, John Bicket, Doug Burdick, Mike Connor, Vladimir Dizhoor, Shaddin Doghmi, Alexander Druyan, Bowei Du, Sasha Evfimievski, Ariful Gan~.,_ Vadim Grinshpun, Ara Hayrapetyan, Chris Jeuell, Igor Kats, Omar Khan, Mikhail Kobyakov, Alexei Kopylov, Brian Kulis, Amit Kumar, Yeongwee Lee, Henry Lin, Ashwin Machanavajjhala, Ayan Mandal, Bill McCloskey, Leonid Meyerguz, Evan Moran, Niranjan Nagarajan, Tina Nolte, Travis Ortogero, Martin Pál, Jon Peress, Matt Piotrowski, Joe Polastre, Mike Priscott, Xin Qi, Venu Ramasubramanian, Aditya Rao, David Richardson, Brian Sabino, Rachit Siamwalla, Sebastian Silgardo, Alex Slivkins, Chaitanya Swamy, Perry Tam, Nadya Travinin, Sergei Vassilvitskii, Matthew Wachs, Tom Wexler, Shan-Leung Maverick Woo, Justin Yang, and Misha Zatsman. Many of them have provided valuable insights, suggestions, and comments on the text. We also thank all the students in these classes who have provided comments and feedback on early drafts of the book over the years.

For the past several years, the development of the book has benefited greatly from the feedback and advice of colleagues who have used prepublication drafts for teaching. Anna Karlin fearlessly adopted a draft as her course textbook at the University of Washington when it was still in an early stage of development; she was followed by a number of people who have used it either as a course textbook or as a resource for teaching: Paul Beame, Allan Borodin, Devdatt Dubhashi, David Kempe, Gene Kleinberg, Dexter Kozen, Amit Kumar, Mike Molloy, Yuval Rabani, Tim Roughgarden, Alexa Sharp, Shanghua Teng, Aravind Srinivasan, Dieter van Melkebeek, Kevin Wayne, Tom Wexler, and
Sue Whitesides. We deeply appreciate their input and advice, which has informed many of our revisions to the content. We would like to additionally thank Kevin Wayne for producing supplementary material associated with the book, which promises to greatly extend its utility to future instructors.

In a number of other cases, our approach to particular topics in the book reflects the influence of specific colleagues. Many of these contributions have undoubtedly escaped our notice, but we especially thank Yuri Boykov, Ron Elber, Dan Huttenlocher, Bobby Kleinberg, Evie Kleinberg, Lillian Lee, David McAllester, Mark Newman, Prabhakar Raghavan, Bart Selman, David Shmoys, Steve Strogatz, Olga Veksler, Duncan Watts, and Ramin Zabih.

It has been a pleasure working with Addison Wesley over the past year. First and foremost, we thank Matt Goldstein for all his advice and guidance in this process, and for helping us to synthesize a vast amount of review material into a concrete plan that improved the book. Our early conversations about the book with Susan Hartman were extremely valuable as well. We thank Matt and Susan, together with Michelle Brown, Marilyn Lloyd, Patty Mahtani, and Maite Suarez-Rivas at Addison Wesley, and Paul Anagnostopoulos and Jacqui Scarlott at Windfall Software, for all their work on the editing, production, and management of the project. We further thank Paul and Jacqui for their expert composition of the book. We thank Joyce Wells for the cover design, Nancy Murphy of Dartmouth Publishing for her work on the figures, Ted Laux for the indexing, and Carol Leyba and Jennifer McClain for the copyediting and proofreading.

We thank Anselm Blumer (Tufts University), Richard Chang (University of Maryland, Baltimore County), Kevin Compton (University of Michigan), Diane Cook (University of Texas, Arlington), Sariel Har-Peled (University of Illinois, Urbana-Champaign), Sanjeev Khanna (University of Pennsylvania), Philip Klein (Brown University), David Matthias (Ohio State University), Adam Meyerson (UCLA), Michael Mitzenmacher (Harvard University), Stephan Olariu (Old Dominion University), Mohan Paturi (UC San Diego), Edgar Ramos (University of Illinois, Urbana-Champaign), Sanjay Ranka (University of Florida, Gainesville), Leon Reznik (Rochester Institute of Technology), Subhash Suri (UC Santa Barbara), Dieter van Melkebeek (University of Wisconsin, Madison), and Bulent Yener (Rensselaer Polytechnic Institute) who generously contributed their time to provide detailed and thoughtful reviews of the manuscript; their comments led to numerous improvements, both large and small, in the final version of the text.

Finally, we thank our families--Lillian and Alice, and David, Rebecca, and Amy. We appreciate their support, patience, and many other contributions more than we can express in any acknowledgments here.

This book was begun amid the irrational exuberance of the late nineties, when the arc of computing technology seemed, to many of us, briefly to pass through a place traditionally occupied by celebrities and other inhabitants of the pop-cultural firmament. (It was probably just in our imaginations.) Now, several years after the hype and stock prices have come back to earth, one can appreciate that in some ways computer science was forever changed by this period, and in other ways it has remained the same: the driving excitement that has characterized the field since its early days is as strong and enticing as ever, the public's fascination with information technology is still vibrant, and the reach of computing continues to extend into new disciplines. And so to all students of the subject, drawn to it for so many different reasons, we hope you find this book an enjoyable and useful guide wherever your computational pursuits may take you.

Jon Kleinberg
Éva Tardos
Ithaca, 2005

1 Introduction: Some Representative Problems

1.1 A First Problem: Stable Matching
As an opening topic, we look at an algorithmic problem that nicely illustrates
many of the themes we will be emphasizing. It is motivated by some very
natural and practical concerns, and from these we formulate a clean and
simple statement of a problem. The algorithm to solve the problem is very
clean as well, and most of our work will be spent in proving that it is correct
and giving an acceptable bound on the amount of time it takes to terminate
with an answer. The problem itself--the Stable Matching Problem--has several
origins.

The Problem
The Stable Matching Problem originated, in part, in 1962, when David Gale
and Lloyd Shapley, two mathematical economists, asked the question: Could
one design a college admissions process, or a job recruiting process, that was
self-enforcing? What did they mean by this?
To set up the question, let’s first think informally about the kind of situation
that might arise as a group of friends, all juniors in college majoring in
computer science, begin applying to companies for summer internships. The
crux of the application process is the interplay between two different types
of parties: companies (the employers) and students (the applicants). Each
applicant has a preference ordering on companies, and each company--once
the applications come in--forms a preference ordering on its applicants. Based
on these preferences, companies extend offers to some of their applicants,
applicants choose which of their offers to accept, and people begin heading
off to their summer internships.

Gale and Shapley considered the sorts of things that could start going wrong with this process, in the absence of any mechanism to enforce the status quo. Suppose, for example, that your friend Raj has just accepted a summer job at the large telecommunications company CluNet. A few days later, the small start-up company WebExodus, which had been dragging its feet on making a few final decisions, calls up Raj and offers him a summer job as well. Now, Raj actually prefers WebExodus to CluNet--won over perhaps by the laid-back, anything-can-happen atmosphere--and so this new development may well cause him to retract his acceptance of the CluNet offer and go to WebExodus instead. Suddenly down one summer intern, CluNet offers a job to one of its wait-listed applicants, who promptly retracts his previous acceptance of an offer from the software giant Babelsoft, and the situation begins to spiral out of control.

Things look just as bad, if not worse, from the other direction. Suppose that Raj's friend Chelsea, destined to go to Babelsoft but having just heard Raj's story, calls up the people at WebExodus and says, "You know, I'd really rather spend the summer with you guys than at Babelsoft." They find this very easy to believe; and furthermore, on looking at Chelsea's application, they realize that they would have rather hired her than some other student who actually is scheduled to spend the summer at WebExodus. In this case, if WebExodus were a slightly less scrupulous company, it might well find some way to retract its offer to this other student and hire Chelsea instead.

Situations like this can rapidly generate a lot of chaos, and many people--both applicants and employers--can end up unhappy with the process as well as the outcome. What has gone wrong? One basic problem is that the process is not self-enforcing--if people are allowed to act in their self-interest, then it risks breaking down.

We might well prefer the following, more stable situation, in which self-interest itself prevents offers from being retracted and redirected. Consider another student, who has arranged to spend the summer at CluNet but calls up WebExodus and reveals that he, too, would rather work for them. But in this case, based on the offers already accepted, they are able to reply, "No, it turns out that we prefer each of the students we've accepted to you, so we're afraid there's nothing we can do." Or consider an employer, earnestly following up with its top applicants who went elsewhere, being told by each of them, "No, I'm happy where I am." In such a case, all the outcomes are stable--there are no further outside deals that can be made.

So this is the question Gale and Shapley asked: Given a set of preferences among employers and applicants, can we assign applicants to employers so that for every employer E, and every applicant A who is not scheduled to work for E, at least one of the following two things is the case?

(i) E prefers every one of its accepted applicants to A; or
(ii) A prefers her current situation over working for employer E.

If this holds, the outcome is stable: individual self-interest will prevent any applicant/employer deal from being made behind the scenes.

Gale and Shapley proceeded to develop a striking algorithmic solution to this problem, which we will discuss presently. Before doing this, let's note that this is not the only origin of the Stable Matching Problem. It turns out that for a decade before the work of Gale and Shapley, unbeknownst to them, the National Resident Matching Program had been using a very similar procedure, with the same underlying motivation, to match residents to hospitals. Indeed, this system, with relatively little change, is still in use today.

This is one testament to the problem's fundamental appeal. And from the point of view of this book, it provides us with a nice first domain in which to reason about some basic combinatorial definitions and the algorithms that build on them.

Formulating the Problem To get at the essence of this concept, it helps to make the problem as clean as possible. The world of companies and applicants contains some distracting asymmetries. Each applicant is looking for a single company, but each company is looking for many applicants; moreover, there may be more (or, as is sometimes the case, fewer) applicants than there are available slots for summer jobs. Finally, each applicant does not typically apply to every company.

It is useful, at least initially, to eliminate these complications and arrive at a more "bare-bones" version of the problem: each of n applicants applies to each of n companies, and each company wants to accept a single applicant. We will see that doing this preserves the fundamental issues inherent in the problem; in particular, our solution to this simplified version will extend directly to the more general case as well.

Following Gale and Shapley, we observe that this special case can be viewed as the problem of devising a system by which each of n men and n women can end up getting married: our problem naturally has the analogue of two "genders"--the applicants and the companies--and in the case we are considering, everyone is seeking to be paired with exactly one individual of the opposite gender.[1]

[1] Gale and Shapley considered the same-sex Stable Matching Problem as well, where there is only a single gender. This is motivated by related applications, but it turns out to be fairly different at a technical level. Given the applicant-employer application we're considering here, we'll be focusing on the version with two genders.

So consider a set M = {m1, ..., mn} of n men, and a set W = {w1, ..., wn} of n women. Let M x W denote the set of all possible ordered pairs of the form (m, w), where m ∈ M and w ∈ W. A matching S is a set of ordered pairs, each from M x W, with the property that each member of M and each member of W appears in at most one pair in S. A perfect matching S' is a matching with the property that each member of M and each member of W appears in exactly one pair in S'.

Matchings and perfect matchings are objects that will recur frequently throughout the book; they arise naturally in modeling a wide range of algorithmic problems. In the present situation, a perfect matching corresponds simply to a way of pairing off the men with the women, in such a way that everyone ends up married to somebody, and nobody is married to more than one person--there is neither singlehood nor polygamy.

Now we can add the notion of preferences to this setting. Each man m ∈ M ranks all the women; we will say that m prefers w to w' if m ranks w higher than w'. We will refer to the ordered ranking of m as his preference list. We will not allow ties in the ranking. Each woman, analogously, ranks all the men.

Given a perfect matching S, what can go wrong? Guided by our initial motivation in terms of employers and applicants, we should be worried about the following situation: There are two pairs (m, w) and (m', w') in S (as depicted in Figure 1.1) with the property that m prefers w' to w, and w' prefers m to m'. In this case, there's nothing to stop m and w' from abandoning their current partners and heading off together; the set of marriages is not self-enforcing. We'll say that such a pair (m, w') is an instability with respect to S: (m, w') does not belong to S, but each of m and w' prefers the other to their partner in S.

[Figure 1.1: Perfect matching S with instability (m, w'). An instability: m and w' each prefer the other to their current partners.]

Our goal, then, is a set of marriages with no instabilities. We'll say that a matching S is stable if (i) it is perfect, and (ii) there is no instability with respect to S. Two questions spring immediately to mind:

Does there exist a stable matching for every set of preference lists?
Given a set of preference lists, can we efficiently construct a stable matching if there is one?

Some Examples To illustrate these definitions, consider the following two very simple instances of the Stable Matching Problem.

First, suppose we have a set of two men, {m, m'}, and a set of two women, {w, w'}. The preference lists are as follows:

m prefers w to w'.
m' prefers w to w'.
w prefers m to m'.
w' prefers m to m'.

If we think about this set of preference lists intuitively, it represents complete agreement: the men agree on the order of the women, and the women agree on the order of the men. There is a unique stable matching here, consisting of the pairs (m, w) and (m', w'). The other perfect matching, consisting of the pairs (m', w) and (m, w'), would not be a stable matching, because the pair (m, w) would form an instability with respect to this matching. (Both m and w would want to leave their respective partners and pair up.)

Next, here's an example where things are a bit more intricate. Suppose the preferences are

m prefers w to w'.
m' prefers w' to w.
w prefers m' to m.
w' prefers m to m'.

What's going on in this case? The two men's preferences mesh perfectly with each other (they rank different women first), and the two women's preferences likewise mesh perfectly with each other. But the men's preferences clash completely with the women's preferences.

In this second example, there are two different stable matchings. The matching consisting of the pairs (m, w) and (m', w') is stable, because both men are as happy as possible, so neither would leave their matched partner. But the matching consisting of the pairs (m', w) and (m, w') is also stable, for the complementary reason that both women are as happy as possible. This is an important point to remember as we go forward--it's possible for an instance to have more than one stable matching.
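
Because these definitions are completely concrete, they are easy to check mechanically. The short Python sketch below is ours, not part of the text: it encodes the preference lists of the second example as dictionaries (an arbitrary representational choice, with hypothetical helper names such as is_stable) and tests whether a candidate perfect matching contains an instability.

```python
# Illustrative sketch (not from the text): checking the definition of stability.
# Preference lists for the second example above; higher-ranked choices come first.
men_prefs = {"m": ["w", "w'"], "m'": ["w'", "w"]}
women_prefs = {"w": ["m'", "m"], "w'": ["m", "m'"]}

def prefers(prefs, person, a, b):
    """True if `person` ranks a higher than b on their preference list."""
    ranking = prefs[person]
    return ranking.index(a) < ranking.index(b)

def is_stable(matching, men_prefs, women_prefs):
    """A perfect matching (dict man -> woman) is stable if no pair (m, w')
    outside the matching has both m preferring w' to his partner and
    w' preferring m to her partner."""
    partner_of_woman = {w: m for m, w in matching.items()}
    for m, w in matching.items():
        for w_other in men_prefs[m]:
            if w_other == w:
                continue
            m_other = partner_of_woman[w_other]
            if (prefers(men_prefs, m, w_other, w) and
                    prefers(women_prefs, w_other, m, m_other)):
                return False  # (m, w_other) is an instability
    return True

print(is_stable({"m": "w", "m'": "w'"}, men_prefs, women_prefs))   # True
print(is_stable({"m": "w'", "m'": "w"}, men_prefs, women_prefs))   # True
```

Running this on the two perfect matchings of the second example confirms that both are stable, in line with the discussion above.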

Designing the Algorithm

We now show that there exists a stable matching for every set of preference lists among the men and women. Moreover, our means of showing this will also answer the second question that we asked above: we will give an efficient algorithm that takes the preference lists and constructs a stable matching.

Let us consider some of the basic ideas that motivate the algorithm.

Initially, everyone is unmarried. Suppose an unmarried man m chooses the woman w who ranks highest on his preference list and proposes to her. Can we declare immediately that (m, w) will be one of the pairs in our final stable matching? Not necessarily: at some point in the future, a man m' whom w prefers may propose to her. On the other hand, it would be dangerous for w to reject m right away; she may never receive a proposal from someone she ranks as highly as m. So a natural idea would be to have the pair (m, w) enter an intermediate state--engagement.

Suppose we are now at a state in which some men and women are free--not engaged--and some are engaged. The next step could look like this. An arbitrary free man m chooses the highest-ranked woman w to whom he has not yet proposed, and he proposes to her. If w is also free, then m and w become engaged. Otherwise, w is already engaged to some other man m'. In this case, she determines which of m or m' ranks higher on her preference list; this man becomes engaged to w and the other becomes free.

Finally, the algorithm will terminate when no one is free; at this moment, all engagements are declared final, and the resulting perfect matching is returned.

Here is a concrete description of the Gale-Shapley algorithm, with Figure 1.2 depicting a state of the algorithm.

[Figure 1.2: An intermediate state of the G-S algorithm when a free man m is proposing to a woman w. Woman w will become engaged to m if she prefers him to m'.]

Initially all m ∈ M and w ∈ W are free
While there is a man m who is free and hasn't proposed to every woman
    Choose such a man m
    Let w be the highest-ranked woman in m's preference list
        to whom m has not yet proposed
    If w is free then
        (m, w) become engaged
    Else w is currently engaged to m'
        If w prefers m' to m then
            m remains free
        Else w prefers m to m'
            (m, w) become engaged
            m' becomes free
        Endif
    Endif
Endwhile
Return the set S of engaged pairs
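
For readers who like to experiment, here is a compact Python rendering of the procedure above. It is our own sketch, not the book's implementation: the dictionary-based data structures, the list used to hold free men, and the function name gale_shapley are illustrative choices, and complete preference lists on both sides are assumed.

```python
# A sketch of the proposal procedure described above (our own rendering;
# the data structures are one convenient choice, not the only one).
def gale_shapley(men_prefs, women_prefs):
    """men_prefs / women_prefs: dict mapping each person to their preference
    list, best choice first. Returns a dict {man: woman} forming a stable
    matching, with the men proposing."""
    # rank[w][m] = position of m on w's list, so each comparison is a lookup
    rank = {w: {m: i for i, m in enumerate(prefs)}
            for w, prefs in women_prefs.items()}
    next_proposal = {m: 0 for m in men_prefs}   # index of next woman to try
    partner = {}                                # current fiance of each woman
    free_men = list(men_prefs)                  # every man starts out free

    while free_men:
        m = free_men.pop()
        w = men_prefs[m][next_proposal[m]]      # highest-ranked not yet proposed to
        next_proposal[m] += 1
        if w not in partner:                    # w is free: (m, w) become engaged
            partner[w] = m
        elif rank[w][m] < rank[w][partner[w]]:  # w prefers m to her current partner
            free_men.append(partner[w])         # the old partner becomes free
            partner[w] = m
        else:                                   # w rejects m; he remains free
            free_men.append(m)
    return {m: w for w, m in partner.items()}
```

Section 2.3 of the text returns to the question of implementing the Stable Matching Algorithm efficiently using lists and arrays; the sketch here is only meant to make the steps of the pseudocode tangible.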

An intriguing thing is that, although the G-S algorithm is quite simple to state, it is not immediately obvious that it returns a stable matching, or even a perfect matching. We proceed to prove this now, through a sequence of intermediate facts.

Analyzing the Algorithm

First consider the view of a woman w during the execution of the algorithm. For a while, no one has proposed to her, and she is free. Then a man m may propose to her, and she becomes engaged. As time goes on, she may receive additional proposals, accepting those that increase the rank of her partner. So we discover the following.

(1.1) w remains engaged from the point at which she receives her first proposal; and the sequence of partners to which she is engaged gets better and better (in terms of her preference list).

The view of a man m during the execution of the algorithm is rather different. He is free until he proposes to the highest-ranked woman on his list; at this point he may or may not become engaged. As time goes on, he may alternate between being free and being engaged; however, the following property does hold.

(1.2) The sequence of women to whom m proposes gets worse and worse (in terms of his preference list).

Now we show that the algorithm terminates, and give a bound on the maximum number of iterations needed for termination.

(1.3) The G-S algorithm terminates after at most n^2 iterations of the While loop.

Proof. A useful strategy for upper-bounding the running time of an algorithm, as we are trying to do here, is to find a measure of progress. Namely, we seek some precise way of saying that each step taken by the algorithm brings it closer to termination.

In the case of the present algorithm, each iteration consists of some man proposing (for the only time) to a woman he has never proposed to before. So if we let P(t) denote the set of pairs (m, w) such that m has proposed to w by the end of iteration t, we see that for all t, the size of P(t + 1) is strictly greater than the size of P(t). But there are only n^2 possible pairs of men and women in total, so the value of P(.) can increase at most n^2 times over the course of the algorithm. It follows that there can be at most n^2 iterations. ∎

Two points are worth noting about the previous fact and its proof. First, there are executions of the algorithm (with certain preference lists) that can involve close to n^2 iterations, so this analysis is not far from the best possible. Second, there are many quantities that would not have worked well as a progress measure for the algorithm, since they need not strictly increase in each iteration. For example, the number of free individuals could remain constant from one iteration to the next, as could the number of engaged pairs. Thus, these quantities could not be used directly in giving an upper bound on the maximum possible number of iterations, in the style of the previous paragraph.

Let us now establish that the set S returned at the termination of the algorithm is in fact a perfect matching. Why is this not immediately obvious? Essentially, we have to show that no man can "fall off" the end of his preference list; the only way for the While loop to exit is for there to be no free man. In this case, the set of engaged couples would indeed be a perfect matching. So the main thing we need to show is the following.

(1.4) If m is free at some point in the execution of the algorithm, then there is a woman to whom he has not yet proposed.

Proof. Suppose there comes a point when m is free but has already proposed to every woman. Then by (1.1), each of the n women is engaged at this point in time. Since the set of engaged pairs forms a matching, there must also be n engaged men at this point in time. But there are only n men total, and m is not engaged, so this is a contradiction. ∎

(1.5) The set S returned at termination is a perfect matching.

Proof. The set of engaged pairs always forms a matching. Let us suppose that the algorithm terminates with a free man m. At termination, it must be the case that m had already proposed to every woman, for otherwise the While loop would not have exited. But this contradicts (1.4), which says that there cannot be a free man who has proposed to every woman. ∎

Finally, we prove the main property of the algorithm--namely, that it results in a stable matching.

(1.6) Consider an execution of the G-S algorithm that returns a set of pairs S. The set S is a stable matching.

Proof. We have already seen, in (1.5), that S is a perfect matching. Thus, to prove S is a stable matching, we will assume that there is an instability with respect to S and obtain a contradiction. As defined earlier, such an instability would involve two pairs, (m, w) and (m', w'), in S with the properties that

o m prefers w' to w, and
o w' prefers m to m'.

In the execution of the algorithm that produced S, m's last proposal was, by definition, to w. Now we ask: Did m propose to w' at some earlier point in this execution? If he didn't, then w must occur higher on m's preference list than w', contradicting our assumption that m prefers w' to w. If he did, then he was rejected by w' in favor of some other man m'', whom w' prefers to m. m' is the final partner of w', so either m'' = m' or, by (1.1), w' prefers her final partner m' to m''; either way this contradicts our assumption that w' prefers m to m'.

It follows that S is a stable matching. ∎

Extensions

We began by defining the notion of a stable matching; we have just proven that the G-S algorithm actually constructs one. We now consider some further questions about the behavior of the G-S algorithm and its relation to the properties of different stable matchings.

To begin with, recall that we saw an example earlier in which there could be multiple stable matchings. To recap, the preference lists in this example were as follows:

m prefers w to w'.
m' prefers w' to w.
w prefers m' to m.
w' prefers m to m'.

Now, in any execution of the Gale-Shapley algorithm, m will become engaged to w, m' will become engaged to w' (perhaps in the other order), and things will stop there. Thus, the other stable matching, consisting of the pairs (m', w) and (m, w'), is not attainable from an execution of the G-S algorithm in which the men propose. On the other hand, it would be reached if we ran a version of the algorithm in which the women propose. And in larger examples, with more than two people on each side, we can have an even larger collection of possible stable matchings, many of them not achievable by any natural algorithm.

This example shows a certain "unfairness" in the G-S algorithm, favoring men. If the men's preferences mesh perfectly (they all list different women as their first choice), then in all runs of the G-S algorithm all men end up matched with their first choice, independent of the preferences of the women. If the women's preferences clash completely with the men's preferences (as was the case in this example), then the resulting stable matching is as bad as possible for the women. So this simple set of preference lists compactly summarizes a world in which someone is destined to end up unhappy: women are unhappy if men propose, and men are unhappy if women propose.
To begin With, our example reinforces the point that the G-S algorithm our question above by showing that the order of proposals in the G-S algorithm
is actually underspecified: as long as there is a free man, we are allowed to has absolutely no effect on the final outcome.
choose any flee man to make the next proposal. Different choices specify Despite all this, the proof is not so difficult.
different executions of the algprithm; this is why, to be careful, we stated (1.6)
as "Consider an execution of the G-S algorithm that returns a set of pairs S," Proof. Let us suppose, by way of contradiction, that some execution g of the
instead of "Consider the set S returned by the G-S algorithm." G-S algorithm results in a matching S in which some man is paired with a
Thus, we encounter another very natural question: Do all executions of woman who is not his best valid partner. Since men propose in decreasing
the G-S algorithm yield the same matching? This is a genre of question that order of preference, this means that some man is rejected by a valid partner
arises in many settings in computer science: we have an algorithm that runs during the execution g of the algorithm. So consider the first moment during
asynchronously, with different independent components performing actions the execution g in which some man, say m, is rejected by a valid partner iv.
that can be interleaved in complex ways, and we want to know how much Again, since men propose in decreasing order of preference, and since this is
variability this asynchrony causes in the final outcome. To consider a very the first time such a rejection has occurred, it must be that iv is m’s best valid
different kind of example, the independent components may not be men and partner best(m).
women but electronic components activating parts of an airplane wing; the The reiection of m by iv may have happened either because m proposed
effect of asynchrony in their behavior can be a big deal. and was turned down in favor of iv’s existing engagement, or because iv broke
In the present context, we will see that the answer to our question is her engagement to m in favor of a better proposal. But either way, at this
moment iv forms or continues an engagement with a man m’ whom she prefers
surprisingly clean: all executions of the G-S algorithm yield the same matching.
We proceed to prove this now. to m.
All Executions Yield the Same Matching There are a number of possible Since iv is a valid parmer of m, there exists a stable matching S’ containing
ways to prove a statement such as this, many of which would result in quite the pair (m, iv). Now we ask: Who is m’ paired with in this matching? Suppose
complicated arguments. It turns out that the easiest and most informative ap- it is a woman iv’ ~= iv.
proach for us will be to uniquely characterize the matching that is obtained and Since the rejection of m by iv was the first rejection of a man by a valid
then show that al! executions result in the matching with this characterization. partner in the execution ~, it must be that m’ had not been rejected by any valid
What is the characterization? We’ll show that each man ends up with the parmer at the point in ~ when he became engaged to iv. Since he proposed in
"best possible partner" in a concrete sense. (Recall that this is true if all men decreasing order of preference, and since iv’ is clearly a valid parmer of m’, it
prefer different women.) First, we will say that a woman iv is a valid partner must be that m’ prefers iv to iv’. But we have already seen that iv prefers m’
of a man m if there is a stable matching that contains the pair (m, iv). We will to m, for in execution ~ she rejected m in favor of m’. Since (m’, iv) S’, it
say that iv is the best valid partner of m if iv is a valid parmer of m, and no follows that (m’, iv) is an instability in S’.
woman whom m ranks higher than iv is a valid partner of his. We will use This contradicts our claim that S’ is stable and hence contradicts our initial
best(m) to denote the best valid partner of m. assumption. []
Now, let S* denote the set of pairs {(m, best(m)) : m ~ M}. We will prove
the folloWing fact. So for the men, the G-S algorithm is ideal. Unfortunately, the same cannot
be said for the women. For a woman w, we say that m is a valid partner if
(1.7) Every execution of the C--S algorithm results in the set S*: there is a stable matching that contains the pair (m, w). We say that m is the
ivorst valid partner of iv if m is a valid partner of w, and no man whom iv
This statement is surprising at a number of levels. First of all, as defined, ranks lower than m is a valid partner of hers.
there is no reason to believe that S* is a matching at all, let alone a stable (1.8) In the stable matching S*, each woman is paired ivith her ivorst valid
matching. After all, why couldn’t it happen that two men have the same best partner.
valid partner? Second, the result shows that the G-S algorithm gives the best
possible outcome for every man simultaneously; there is no stable matching Proof. Suppose there were a pair (m, iv) in S* such that m is not the worst
in which any of the men could have hoped to do better. And finally, it answers valid partner of iv. Then there is a stable matching S’ in which iv is paired
with a man m’ whom she likes less than m. In S’, m is paired with a woman science courses, we’ll be introducing them in a fair amount of depth in
w’ ~ w; since w is the best valid partner of m, and w’ is a valid partner of m, Chapter 3; due to their enormous expressive power, we’ll also be using them
we see that m prefers w to w’. extensively throughout the book. For the discussion here, it’s enough to think
of a graph G as simply a way of encoding pairwise relationships among a set
But from this it follows that (m, w) is an instability in S’, contradicting the
of objects. Thus, G consists of a pair of sets (V, E)--a collection V of nodes
claim that S’ is stable and hence contradicting our initial assumption. []
and a collection E of edges, each of which "joins" two of the nodes. We thus
represent an edge e ~ E as a two-element subset of V: e = (u, u) for some (a)
Thus, we find that our simple example above, in which the men’s pref-
u, u ~ V, where we call u and u the ends of e. We typica!ly draw graphs as in
erences clashed with the women’s, hinted at a very general phenomenon: for
Figure 1.3, with each node as a small circle and each edge as a line segment
any input, the side that does the proposing in the G-S algorithm ends up with
joining its two ends.
the best possible stable matching (from their perspective), while the side that
does not do the proposing correspondingly ends up with the worst possible Let’s now turn to a discussion of the five representative problems.
stable matching.
Interval Scheduling
1.2 Five Representative Problems Consider the following very simple scheduling problem. You have a resource-- Figure 1.3 Each of (a) and
it may be a lecture room, a supercompnter, or an electron microscope--and (b) depicts a graph on four
The Stable Matching Problem provides us with a rich example of the process of nodes.
many people request to use the resource for periods of time. A request takes
algorithm design. For many problems, this process involves a few significant,
the form: Can I reserve the resource starting at time s, until time f? We will
steps: formulating the problem with enough mathematical precision that we
assume that the resource can be used by at most one person at a time. A
can ask a concrete question and start thinking about algorithms to solve
scheduler wants to accept a subset of these requests, rejecting al! others, so
it; designing an algorithm for the problem; and analyzing the algorithm by
that the accepted requests do not overlap in time. The goal is to maximize the
proving it is correct and giving a bound on the running time so as to establish
number of requests accepted.
the algorithm’s efficiency.
More formally, there will be n requests labeled 1 ..... n, with each request
This high-level strategy is carried out in practice with the help of a few
fundamental design techniques, which are very useful in assessing the inherent i specifying a start time si and a finish time fi. Naturally, we have si < fi for all
i. Two requests i andj are compatible if the requested intervals do not overlap:
complexity of a problem and in formulating an algorithm to solve it. As in any
that is, either request i is for an earlier time interval than request j (fi <
area, becoming familiar with these design techniques is a gradual process; but
with experience one can start recognizing problems as belonging to identifiable or request i is for a later time than request j (1~ _< si). We’ll say more generally
that a subset A of requests is compatible if all pairs of requests i,j ~ A, i ~=j are
genres and appreciating how subtle changes in the statement of a problem can
compatible. The goal is to select a compatible subset of requests of maximum
have an enormous effect on its computational difficulty.
possible size.
To get this discussion started, then, it helps to pick out a few representa-
We illustrate an instance of this Interval Scheduling Problem in Figure 1.4.
tive milestones that we’ll be encountering in our study of algorithms: cleanly
formulated problems, all resembling one another at a general level, but differ- Note that there is a single compatible set of size 4, and this is the largest
compatible set.
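To make these definitions concrete, here is a small Python sketch (our own
illustration, not code from the text): it tests whether two requests are compatible
and finds a largest compatible subset by trying all subsets, which is far too slow
in general but matches the problem statement exactly.

from itertools import combinations

def compatible(a, b):
    # Requests are (start, finish) pairs; a and b are compatible if one
    # finishes before the other starts.
    return a[1] <= b[0] or b[1] <= a[0]

def all_compatible(subset):
    return all(compatible(x, y) for x, y in combinations(subset, 2))

def max_compatible_brute_force(requests):
    # Enumerate every subset, largest first (exponential in n).
    for k in range(len(requests), 0, -1):
        for subset in combinations(requests, k):
            if all_compatible(subset):
                return list(subset)
    return []

# A made-up three-request instance whose largest compatible set has size 2.
requests = [(0, 3), (2, 5), (4, 7)]
print(max_compatible_brute_force(requests))   # [(0, 3), (4, 7)]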
ing greatly in their difficulty and in the kinds of approaches that one brings
to bear on them. The first three will be solvable efficiently by a sequence of
increasingly subtle algorithmic techniques; the fourth marks a major turning
point in our discussion, serving as an example of a problem believed to be un-
solvable by any efficient algorithm; and the fifth hints at a class of problems
believed to be harder stil!.
The problems are self-contained and are al! motivated by computing
applications. To talk about some of them, though, it will help to use the
terminology of graphs. While graphs are a common topic in earlier computer Figure 1.4 An instance of the Interval Scheduling Problem.
We will see shortly that this problem can be solved by a very natural and Y in such a way that every edge has one end in X and the other end in Y.
algorithm that orders the set of requests according to a certain heuristic and A bipartite graph is pictured in Figure 1.5; often, when we want to emphasize
then "greedily" processes them in one pass, selecting as large a compatible a graph’s "bipartiteness," we will draw it this way, with the nodes in X and
subset as it can. This will be .typical of a class of greedy algorithms that we Y in two parallel columns. But notice, for example, that the two graphs in
will consider for various problems--myopic rules that process the input one Figure 1.3 are also bipartite.
piece at a time with no apparent look-ahead. When a greedy algorithm can be Now, in the problem of finding a stable matching, matchings were built
shown to find an optimal solution for al! instances of a problem, it’s often fairly from pairs of men and women. In the case of bipartite graphs, the edges are
surprising. We typically learn something about the structure of the underlying pairs of nodes, so we say that a matching in a graph G = (V, E) is a set of edges
problem from the fact that such a simple approach can be optimal. M _c E with the property that each node appears in at most one edge of M.
M is a perfect matching if every node appears in exactly one edge of M. Figure 1.5 A bipartite graph.
Weighted Interval Scheduling To see that this does capture the same notion we encountered in the Stable
In the Interval Scheduling Problem, we sohght to maximize the number of Matching Problem, consider a bipartite graph G’ with a set X of n men, a set Y
requests that could be accommodated simultaneously. Now, suppose more of n women, and an edge from every node in X to every node in Y. Then the
generally that each request interval i has an associated value, or weight, matchings and perfect matchings in G’ are precisely the matchings and perfect
vi > O; we could picture this as the amount of money we will make from matchings among the set of men and women.
the ith individual if we schedule his or her request. Our goal will be to find a
compatible subset of intervals of maximum total value. In the Stable Matching Problem, we added preferences to this picture. Here,
we do not consider preferences; but the nature of the problem in arbitrary
The case in which vi = I for each i is simply the basic Interval Scheduling bipartite graphs adds a different source of complexity: there is not necessarily
Problem; but the appearance of arbitrary values changes the nature of the an edge from every x ~ X to every y ~ Y, so the set of possible matchings has
maximization problem quite a bit. Consider, for example, that if v1 exceeds quite a complicated structure. In other words, it is as though only certain pairs
the sum of all other vi, then the optimal solution must include interval 1 of men and women are willing to be paired off, and we want to figure out
regardless of the configuration of the fi~l set of intervals. So any algorithm how to pair off many people in a way that is consistent with this. Consider,
for this problem must be very sensitive to the values, and yet degenerate to a for example, the bipartite graph G in Figure 1.5: there are many matchings in
method for solving (unweighted) interval scheduling when all the values are G, but there is only one perfect matching. (Do you see it?)
equal to 1.
Matchings in bipartite graphs can model situations in which objects are
There appears to be no simple greedy rule that walks through the intervals being assigned to other objects. Thus, the nodes in X can represent jobs, the
one at a time, making the correct decision in the presence of arbitrary values. nodes in Y can represent machines, and an edge (x~, y]) can indicate that
Instead, we employ a technique, dynamic programming, that builds up the machine y] is capable of processing job xi. A perfect matching is then a way
optimal value over all possible solutions in a compact, tabular way that leads of assigning each job to a machine that can process it, with the property that
to a very efficient algorithm. each machine is assigned exactly one job. In the spring, computer science
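The dynamic-programming technique itself is developed later in the book; as a
rough preview only, the following standard sketch (our own illustration, with an
assumed input format) computes the optimal total value for intervals sorted by
finish time.

from bisect import bisect_right

def max_weight_schedule(intervals):
    # intervals: list of (start, finish, value) triples.
    # opt[i] = best total value achievable using only the first i intervals
    # once they are sorted by finish time.
    intervals = sorted(intervals, key=lambda t: t[1])
    finishes = [f for _, f, _ in intervals]
    n = len(intervals)
    opt = [0] * (n + 1)
    for i in range(1, n + 1):
        s, f, v = intervals[i - 1]
        # p = number of earlier intervals that finish by time s, i.e., those
        # compatible with interval i.
        p = bisect_right(finishes, s, 0, i - 1)
        opt[i] = max(opt[i - 1], v + opt[p])   # leave interval i out, or take it
    return opt[n]

print(max_weight_schedule([(0, 3, 2), (2, 5, 4), (4, 7, 4)]))   # 6
# With every value equal to 1 this reduces to the unweighted problem.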
departments across the country are often seen pondering a bipartite graph in
Bipal~te Matching which X is the set of professors in the department, Y is the set of offered
When we considered the Stable Matching Problem, we defined a matching to courses, and an edge (xi, yj) indicates that professor x~ is capable of teaching
be a set of ordered pairs of men and women with the property that each man course y]. A perfect matching in this graph consists of an assignment of each
and each woman belong to at most one of the ordered pairs. We then defined professor to a course that he or she can teach, in such a way that every course
a perfect matching to be a matching in which every man and every woman is covered.
belong to some pair. Thus the Bipartite Matching Problem is the following: Given an arbitrary
We can express these concepts more generally in terms of graphs, and in bipartite graph G, find a matching of maximum size. If IXI = I YI = n, then there
order to do this it is useful to define the notion of a bipartite graph. We say that is a perfect matching if and only if the maximum matching has size n. We will
a graph G ---- (V, E) is bipa~te if its node set V can be partitioned into sets X find that the algorithmic techniques discussed earlier do not seem adequate
for providing an efficient algorithm for this problem. There is, however, a very Given the generality of the Independent Set Problem, an efficient algorithm
elegant and efficient algorithm to find a maximum matching; it inductively to solve it would be quite impressive. It would have to implicitly contain
builds up larger and larger matchings, selectively backtracking along the way. algorithms for Interval Scheduling, Bipartite Matching, and a host of other
This process is called augmeritation, and it forms the central component in a natural optimization problems.
large class of efficiently solvable problems called network flow problems. The current status of Independent Set is this: no efficient algorithm is
known for the problem, and it is conjectured that no such algorithm exists.
The obvious brute-force algorithm would try all subsets of the nodes, checking
Independent Set
each to see if it is independent, and then recording the largest one encountered.
Now let’s talk about an extremely general problem, which includes most of It is possible that this is close to the best we can do on this problem. We will
these earlier problems as special cases. Given a graph G = (V, E), we say see later in the book that Independent Set is one of a large class of problems
a set of nodes S ⊆ V is independent if no two nodes in S are joined by an that are termed NP-complete. No efficient algorithm is known for any of them;
edge. The Independent Set Problem is, then, the following: Given G, find an but they are all equivalent in the sense that a solution to any one of them
independent set that is as large as possible. For example, the maximum size of would imply, in a precise sense, a solution to all of them.
Figure 1.6 A graph whose
an independent set in the graph in Figure 1.6 is four, achieved by the.four-node
Here’s a natural question: Is there anything good we can say about the
largest independent set has independent set {1, 4, 5, 6}.
size 4.
complexity of the Independent Set Problem? One positive thing is the following:
The Independent Set Problem encodes any situation in which you are If we have a graph G on 1,000 nodes, and we want to convince you that it
trying to choose from among a collection of objects and there are pairwise contains an independent set S of size 100, then it’s quite easy. We simply
conflicts among some of the objects. Say you have n friends, and some pairs show you the graph G, circle the nodes of S in red, and let you check that
of them don’t get along. How large a group of your friends can you invite to no two of them are joined by an edge. So there really seems to be a great
dinner if you don’t want any interpersonal tensions? This is simply the largest difference in difficulty between checking that something is a large independent
independent set in the graph whose nodes are your friends, with an edge set and actually finding a large independent set. This may look like a very basic
between each conflicting pair. observation--and it is--but it turns out to be crucial in understanding this class
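The following sketch (our own illustration) spells out the obvious brute-force
approach described in the text: try every subset of nodes and keep the largest
independent one. It is exponential in the number of nodes and is shown only to
make the definitions concrete.

from itertools import combinations

def is_independent(nodes, edges):
    # A set of nodes is independent if no edge joins two of its members.
    return not any(u in nodes and v in nodes for u, v in edges)

def largest_independent_set(all_nodes, edges):
    # Try every subset, largest first; exponential in the number of nodes.
    for k in range(len(all_nodes), 0, -1):
        for subset in combinations(all_nodes, k):
            if is_independent(set(subset), edges):
                return set(subset)
    return set()

# Dinner-party reading: nodes are friends, edges join pairs that don't get
# along (a made-up instance for illustration).
friends = [1, 2, 3, 4]
conflicts = [(1, 2), (2, 3)]
print(largest_independent_set(friends, conflicts))   # {1, 3, 4}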
Interval Scheduling and Bipartite Matching can both be encoded as special of problems. Furthermore, as we’ll see next, it’s possible for a problem to be
cases of the Independent Set Problem. For Interval Scheduling, define a graph so hard that there isn’t even an easy way to "check" solutions in this sense.
G = (V, E) in which the nodes are the intervals and there is an edge between
each pair of them that overlap; the independent sets in G are then just the Competitive Facility Location
compatible subsets of intervals. Encoding Bipartite Matching as a special case
Finally, we come to our fifth problem, which is based on the following two-
of Independent Set is a little trickier to see. Given a bipartite graph G’ = (V’, E’),
player game. Consider two large companies that operate caf6 franchises across
the objects being chosen are edges, and the conflicts arise between two edges
the country--let’s call them JavaPlanet and Queequeg’s Coffee--and they are
that share an end. (These, indeed, are the pairs of edges that cannot belong
currently competing for market share in a geographic area. First JavaPlanet
to a common matching.) So we define a graph G = (V, E) in which the node
opens a franchise; then Queequeg’s Coffee opens a franchise; then JavaPlanet;
set V is equal to the edge set E’ of G’. We define an edge between each pair
then Queequeg’s; and so on. Suppose they must deal with zoning regulations
of elements in V that correspond to edges of G’ with a common end. We can
that require no two franchises be located too close together, and each is trying
now check that the independent sets of G are precisely the matchings of G’.
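The following small sketch (our own illustration, with made-up labels) carries out
this edges-to-nodes transformation explicitly, which may make the correspondence
easier to see.

def conflict_graph(bipartite_edges):
    # Each edge of G' becomes a node of G; two such nodes are joined exactly
    # when the corresponding edges of G' share an end.
    nodes = list(bipartite_edges)
    edges = []
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            (x1, y1), (x2, y2) = nodes[i], nodes[j]
            if x1 == x2 or y1 == y2:
                edges.append((nodes[i], nodes[j]))
    return nodes, edges

# A tiny bipartite graph G' on jobs {x1, x2} and machines {y1, y2}.
G_prime = [("x1", "y1"), ("x1", "y2"), ("x2", "y2")]
nodes, edges = conflict_graph(G_prime)
# edges: the pair of G'-edges sharing x1 and the pair sharing y2;
# an independent set among 'nodes' here is exactly a matching in G'.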
to make its locations as convenient as possible. Who will win?
While it is not complicated to check this, it takes a little concentration to deal
with this type of "edges-to-nodes, nodes-to-edges" transformation.2 Let’s make the rules of this "game" more concrete. The geographic region
in question is divided into n zones, labeled 1, 2 ..... n. Each zone i has a

2 For those who are curious, we note that not every instance of the Independent Set Problem can arise
Interval Scheduling, and the graph in Figure 1.3(b) cannot arise as the "conflict graph" in an instance
in this way from Interval Scheduling or from Bipartite Matching; the full Independent Set Problem
of Bipartite Matching.
really is more general. The graph in Figure 1.3(a) cannot arise as the "conflict graph" in an instance of
Solved Exercises
Figure 1.7 An instance of the Competitive FaciBt3, Location Problem. Solved Exercise 1
Consider a town with n men and n women seeking to get married to one
another. Each man has a preference list that ranks all the women, and each
woman has a preference list that ranks all the men.
value bi, which is the revenue obtained by either of the companies if it opens
The set of all 2n people is divided into two categories: good people and
a franchise there. Finally, certain pairs of zones (i,]) are adjacent, and local
bad people. Suppose that for some number k, 1 < k < n - 1, there are k good
zoning laws prevent two adjacent zones from each containing a franchise,
men and k good women; thus there are n - k bad men and n - k bad women.
regardless of which company owns them. (They also prevent two franchises
from being opened in the same zone.) We model these conflicts via a graph Everyone would rather marry any good person than any bad person.
G= (V,E), where V is the set of zones, .and ~(i,]) is an edge in E if the Formally, each preference list has the property that it ranks each good person
zones i and ] are adiacent. ~The zoning requirement then says that the full of the opposite gender higher than each bad person of the opposite gender: its
set of franchises opened must form an independent set in G. first k entries are the good people (of the opposite gender) in some order, and
its next n - k are the bad people (of the opposite gender) in some order.
Thus our game consists of two players, P1 and P2, alternately selecting
nodes in G, with P1 moving first. At all times, the set of all selected nodes Show that in every stable matching, every good man is married to a good
must form an independent set in G. Suppose that player P2 has a target bound woman.
B, and we want to know: is there a strategy for P2 so that no matter how P1 Solution A natural way to get started thinking about this problem is to
plays, P2 will be able to select a set of nodes with a total value of at least B? assume the claim is false and try to work toward obtaining a contradiction.
We will call this an instance of the Competitive Facility Location Problem. What would it mean for the claim to be false? There would exist some stable
Consider, for example, the instance pictured in Figure 1.7, and suppose matching M in which a good man m was married to a bad woman w.
that P2’s target bound is B = 20. Then P2 does have a winning strategy. On the Now, let’s consider what the other pairs in M look like. There are k good
other hand, if B = 25, then P2 does not. men and k good women. Could it be the case that every good woman is married
One can work this out by looking at the figure for a while; but it requires to a good man in this matching M? No: one of the good men (namely, m) is
some amount of case-checking of the form, "If P~ goes here, then P2 will go already married to a bad woman, and that leaves only k - ! other good men.
there; but if P~ goes over there, then P2 will go here .... "And this appears to So even if all of them were married to good women, that would still leave some
be intrinsic to the problem: not only is it compntafionally difficult to determine good woman who is married to a bad man.
whether P2 has a winning strategy; on a reasonably sized graph, it would even Let w’ be such a good woman, who is married to a bad man. It is now
be hard for us to convince you that P2 has a winning strategy. There does not easy to identify an instability in M: consider the pair (m, w’). Each is good,
seem to be a short proof we could present; rather, we’d have to lead you on a but is married to a bad partner. Thus, each of m and w’ prefers the other to
lengthy case-by-case analysis of the set of possible moves. their current partner, and hence (m, w’) is an instability. This contradicts our
This is in contrast to the Independent Set Problem, where we believe that assumption that M is stable, and hence concludes the proof.
finding a large solution is hard but checking a proposed large solution is easy.
This contrast can be formalized in the class of PSPACE-complete problems, of Solved Exercise 2
which Competitive Facility Location is an example.. PSPACE-complete prob- We can think about a generalization of the Stable Matching Problem in which
lems are believed to be strictly harder than NP-complete problems, and this certain man-woman pairs are explicitly forbidden. In the case of employers and
conjectured lack of short "proofs" for their solutions is one indication of this applicants, we could imagine that certain applicants simply lack the necessary
greater hardness. The notion of PSPACE-completeness turns out to capture a qualifications or certifications, and so they cannot be employed at certain
large collection of problems involving game-playing and planning; many of companies, however desirable they may seem. Using the analogy to marriage
these are fundamental issues in the area of artificial intelligence. between men and women, we have a set M of n men, a set W of n women,
and a set F ⊆ M × W of pairs who are simply not allowed to get married. Each
man m ranks all the women w for which (m, w) ∉ F, and each woman w' ranks
all the men m' for which (m', w') ∉ F.
    In this more general setting, we say that a matching S is stable if it does
not exhibit any of the following types of instability.

(i) There are two pairs (m, w) and (m', w') in S with the property that
    (m, w') ∉ F, m prefers w' to w, and w' prefers m to m'. (The usual kind
    of instability.)
(ii) There is a pair (m, w) ∈ S, and a man m', so that m' is not part of any
    pair in the matching, (m', w) ∉ F, and w prefers m' to m. (A single man
    is more desirable and not forbidden.)
(iii) There is a pair (m, w) ∈ S, and a woman w', so that w' is not part of
    any pair in the matching, (m, w') ∉ F, and m prefers w' to w. (A single
    woman is more desirable and not forbidden.)
(iv) There is a man m and a woman w, neither of whom is part of any pair
    in the matching, so that (m, w) ∉ F. (There are two single people with
    nothing preventing them from getting married to each other.)

Note that under these more general definitions, a stable matching need not be
a perfect matching.
    Now we can ask: For every set of preference lists and every set of forbidden
pairs, is there always a stable matching? Resolve this question by doing one of
the following two things: (a) give an algorithm that, for any set of preference
lists and forbidden pairs, produces a stable matching; or (b) give an example
of a set of preference lists and forbidden pairs for which there is no stable
matching.

Solution The Gale-Shapley algorithm is remarkably robust to variations on
the Stable Matching Problem. So, if you're faced with a new variation of the
problem and can't find a counterexample to stability, it's often a good idea to
check whether a direct adaptation of the G-S algorithm will in fact produce
stable matchings.
    That turns out to be the case here. We will show that there is always a
stable matching, even in this more general model with forbidden pairs, and
we will do this by adapting the G-S algorithm. To do this, let's consider why
the original G-S algorithm can't be used directly. The difficulty, of course, is
that the G-S algorithm doesn't know anything about forbidden pairs, and so
the condition in the While loop,

  While there is a man m who is free and hasn't proposed to
        every woman,

won't work: we don't want m to propose to a woman w for which the pair
(m, w) is forbidden.
    Thus, let's consider a variation of the G-S algorithm in which we make
only one change: we modify the While loop to say,

  While there is a man m who is free and hasn't proposed to
        every woman w for which (m, w) ∉ F.

Here is the algorithm in full.

  Initially all m ∈ M and w ∈ W are free
  While there is a man m who is free and hasn't proposed to
        every woman w for which (m, w) ∉ F
    Choose such a man m
    Let w be the highest-ranked woman in m's preference list
        to which m has not yet proposed
    If w is free then
      (m, w) become engaged
    Else w is currently engaged to m'
      If w prefers m' to m then
        m remains free
      Else w prefers m to m'
        (m, w) become engaged
        m' becomes free
      Endif
    Endif
  Endwhile
  Return the set S of engaged pairs

We now prove that this yields a stable matching, under our new definition
of stability.
    To begin with, facts (1.1), (1.2), and (1.3) from the text remain true (in
particular, the algorithm will terminate in at most n^2 iterations). Also, we
don't have to worry about establishing that the resulting matching S is perfect
(indeed, it may not be). We also notice an additional pair of facts. If m is
a man who is not part of a pair in S, then m must have proposed to every
nonforbidden woman; and if w is a woman who is not part of a pair in S, then
it must be that no man ever proposed to w.
    Finally, we need only show

(1.9) There is no instability with respect to the returned matching S.
Proof. Our general definition of instability has four parts: This means that we Suppose we have two television networks, whom we’ll call A and ~B.
have to make sure that none of the four bad things happens. There are n prime-time programming slots, and each network has n TV
shows. Each network wants to devise a schedule--an assignment of each
First, suppose there is an instability of type (i), consisting of pairs (m, w)
show to a distinct slot--so as to attract as much market share as possible.
and (m’, w’) in S with the property that (m, w’) ~ F, m prefers w’ to w, and w’
prefers m to m’. It follows that m must have proposed to w’; so w’ rejected rn, Here is the way we determine how well the two networks perform
and thus she prefers her final partner to m--a contradiction. relative to each other, given their schedules. Each show has a fixed rating,
which is based on the number of people who watched it last year; we’ll
Next, suppose there is an instability of type (ii), consisting of a pair
assume that no two shows have exactly the same rating. A network wins a
(m, w) ~ S, and a man m’, so that m’ is not part of any pair in the matching,
given time slot if the show that it schedules for the time slot has a larger
(m’, w) ~ F, and w prefers m’ to m. Then m’ must have proposed to w and
rating than the show the other network schedules for that time slot. The
been rejected; again, it follows that w prefers her final partner to
goal of each network is to win as many time slots as possible.
contradiction.
Suppose in the opening week of the fall season, Network A reveals a
Third, suppose there is an instability of type (iii), consisting of a pair
schedule S and Network ~B reveals a schedule T. On the basis of this pair
(m, w) ~ S, and a woman w’, so that w’ is not part of any. pair in the matching,
of schedules, each network wins certain time slots, according to the rule
(m, w’) ~ F, and rn prefers w’ to w. Then no man proposed to w’ at all;
above. We’ll say that the pair of schedules (S, T) is stable if neither network
in particular, m never proposed to w’, and so he must prefer w to
can unilaterally change its own schedule and win more time slots. That
contradiction.
is, there is no schedule S’ such that Network ~t wins more slots with the
Finally, suppose there is an instability of type (iv), consisting of a man pair (S’, T) than it did with the pair (S, T); and symmetrically, there is no
m and a woman w, neither of which is part of any pair in the matching, schedule T’ such that Network ~B wins more slots with the pair (S, T’) than
so that (m, w) ~ F. But for ra to be single, he must have proposed to every it did with the pair (S, T).
nonforbidden woman; in particular, he must have proposed tow, which means
The analogue of Gale and Shapley’s question for this kind of stability
she would no longer be single--a contradiction. []
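For concreteness, here is a hedged Python sketch of this adapted algorithm,
written in the same style as the earlier Gale-Shapley sketch; the input format
is our own choice, and the code is meant only to mirror the pseudocode above.

def gale_shapley_with_forbidden(men_prefs, women_prefs, forbidden):
    # Identical to the earlier sketch, except that a man only proposes to
    # women w with (m, w) not in the forbidden set, and the loop ends once
    # every free man has exhausted his list of nonforbidden women.
    n = len(men_prefs)
    rank = [{m: i for i, m in enumerate(p)} for p in women_prefs]
    allowed = [[w for w in men_prefs[m] if (m, w) not in forbidden]
               for m in range(n)]
    next_prop = [0] * n
    partner_of_woman = [None] * n
    free_men = list(range(n))
    while free_men:
        m = free_men.pop()
        if next_prop[m] >= len(allowed[m]):
            continue                      # m stays single: nonforbidden list exhausted
        w = allowed[m][next_prop[m]]
        next_prop[m] += 1
        current = partner_of_woman[w]
        if current is None:
            partner_of_woman[w] = m
        elif rank[w][m] < rank[w][current]:
            partner_of_woman[w] = m
            free_men.append(current)
        else:
            free_men.append(m)
    return {m: w for w, m in enumerate(partner_of_woman) if m is not None}

# Tiny illustration: with both of man 0's pairs forbidden he stays single,
# so the returned matching need not be perfect.
print(gale_shapley_with_forbidden([[0, 1], [0, 1]], [[0, 1], [0, 1]],
                                  {(0, 0), (0, 1)}))   # {1: 0}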
is the following: For every set of TV shows and ratings, is there always
a stable pair of schedules? Resolve this question by doing one of the
Exercises following two things:
(a) give an algorithm that, for any set of TV shows and associated
Decide whether you think the following statement is true or false. If it is ratings, produces a stable pair of schedules; or
true, give a short explanation. If it is false, give a counterexample.
(b) give an example of a set of TV shows and associated ratings for
True or false? In every instance of the Stable Matching Problem, there is a which there is no stable pair of schedules.
stable matching containing a pair (m, w) such that m is ranked first on the
preference list of w and w is ranked first on the preference list of m.
Gale and Shapley published their paper on the Stable Matching Problem
Decide whether you think the following statement is true or false. If it is in 1962; but a version of their algorithm had already been in use for
true, give a short explanation. If it is false, give a cotmterexample. ten years by the National Resident Matching Program, for the problem of
assigning medical residents to hospitals.
True or false? Consider an instance of the Stable Matching Problem in which
Basically, the situation was the following. There were m hospitals,
there exists a man m and a woman w such that m is ranked first on the
each with a certain number of available positions for hiring residents.
preference list of w and w is ranked first on the preference list of m. Then in
There were n medical students graduating in a given year, each interested
every stable matching S for this instance, the pair (m, w) belongs to S.
in joining one of the hospitals. Each hospital had a ranking of the students
3. There are many other settings in which we can ask questions related in order of preference, and each student had a ranking of the hospitals
to some type of "stability" principle. Here’s one, involx4ng competition in order of preference. We will assume that there were more students
between two enterprises. graduating than there were slots available in the m hospitals.
strong instability? Either give an example of a set of men and women
The interest, naturally, was in finding a way of assigning each student
with preference lists for which every perfect matching has a strong
to at most one hospital, in such a way that all available positions in all
instability; or give an algorithm that is guaranteed to find a perfect
hospitals were filled. (Since we are assuming a surplus of students, there
matching with no strong instability.
would be some students who do not get assigned to any hospital.)
(b) A weak instability in a perfect matching S consists of a man m and
We say that an assignment of students to hospitals is stable ff neither
a woman tv, such that their partners in S are tv’ and m’, respectively,
of the following situations arises. and one of the following holds:
¯ First type of instability: There are students s and s’, and a hospital h, - m prefers u~ to ui, and tv either prefers m to m’ or is indifferent
so that be~veen these two choices; or
- s is assigned to h, and u~ prefers m to m’, and m either prefers u~ to u3’ or is indifferent
- s’ is assigned to no hospital, and between these two choices.
- h prefers s’ to s. In other words, the pairing between m and tv is either preferred
° Second type of instability: There are students s and s~, and hospitals by both, or preferred by one while the other is indifferent. Does
t~ and h’, so that there always exist a perfect matching with no weak instability? Either
- s is assigned to h, and give an example of a set of men and women with preference lists
s’ is assigned to tff, and for which every perfect matching has a weak instability; or give an
- t~ prefers s’ to s, and algorithm that is guaranteed to find a perfect matching with no weak
- s’ prefers tt to h’. instability.
So we basically have the Stable Matching Problem, except that (i)
6. Peripatetic Shipping Lines, inc., is a shipping company that owns n ships
hospitals generally want more than one resident, and (ii) there is a surplus and provides service to n ports. Each of its ships has a schedule that says,
of medical students. for each day of the month, which of the ports it’s currently visiting, or
Show that there is always a stable assignment of students to hospi- whether it’s out at sea. (You can assume the "month" here has m days,
tals, and give an algorithm to find one. for some m > n.) Each ship visits each port for exactly one day during the
month. For safety reasons, PSL Inc. has the following strict requirement:
The Stable Matching Problem, as discussed in the text, assumes that all (t) No two ships can be in the same port on the same day.
men and women have a fully ordered list of preferences. In this problem
The company wants to perform maintenance on all the ships this
we will consider a version of the problem in which men and women can be
month, via the following scheme. They want to truncate each ship’s
indifferent between certain options. As before we have a set M of n men
schedule: for each ship Sg, there will be some day when it arrives in its
and a set W of n women. Assume each man and each woman ranks the scheduled port and simply remains there for the rest of the month (for
members of the opposite gender, but now we allow ties in the ranking.
maintenance). This means that S~ will not visit the remaining ports on
For example (with n = 4), a woman could say that ml is ranked in first
its schedule (if any) that month, but this is okay. So the truncation of
place; second place is a tie between mz and m3 (she has no preference
S~’s schedule will simply consist of its original schedule up to a certain
between them); and m4 is in last place. We will say that tv prefers m to m’
specified day on which it is in a port P; the remainder of the truncated
if m is ranked higher than m’ on her preference list (they are not tied).
schedule simply has it remain in port P.
With indifferences in the ranldngs, there could be two natural notions
Now the company’s question to you is the following: Given the sched-
for stability. And for each, we can ask about the existence of stable
ule for each ship, find a truncation of each so that condition (t) continues
matchings, as follows. to hold: no two ships are ever in the same port on the same day.
(a) A strong instability in a perfect matching S consists of a man m and
Show that such a set of truncations can always be found, and give an
a woman tv, such that each of m and tv prefers the other to their
algorithm to find them.
partner in S. Does there always exist a perfect matching with no
Example. Suppose we have two ships and two ports, and the "month" has
four days. Suppose the first ship’s schedule is
Junction Output 1
port P1; at sea; port P2~ at sea (meets Input 2
,, ", ~Junction before Input 1)
and the second ship’s schedule is
at sea; port Pff at sea; port P2 -
Then the (only) way to choose truncations would be to have the first ship
Output 2
remain in port Pz starting on day 3, and have the second ship remain in (meets Input 2
Junction [Junction
port P1 starting on day 2. before Input 1)

Some of your friends are working for CluNet, a builder of large commu-
nication networks, and they, are looking at algorithms for switching in a
particular type of input/output crossbar.
Here is the setup. There are n input wires and rt output wires, each Input 1 Input 2
directed from a source to a terminus. Each input wire meets each output (meets Output 2 (meets Output 1
before Output 1) before Output 2)
;~e in exactly one distinct point, at a special piece of hardware called
a junction box. Points on the ~e are naturally ordered in the direction Figure 1.8 An example with two input wires and two output wires. Input 1 has its
from source to terminus; for two distinct points x and y on the same junction with Output 2 upstream from its junction with Output 1; Input 2 has its
wire, we say, that x is upstream from y if x is closer to the source than junction with Output 1 upstream from its junction with Output 2. A valid solution is
to switch the data stream of Input 1 onto Output 2, and the data stream of Input 2
y, and otherwise we say, x is downstream from y. The order in which one onto Output 1. On the other hand, if the stream of Input 1 were switched onto Output
input wire meets the output ~es is not necessarily the same as the order 1, and the stream of Input 2 were switched onto Output 2, then both streams would
in which another input wire meets the output wires. (And similarly for pass through the junction box at the meeting of Input 1 and Output 2--and this is not
allowed.
the orders in which output wires meet input wires.) Figure !.8 gives an
example of such a collection of input and output wires.
Now, here’s the switching component of this situation. Each input
~e is carrying a distinct data stream, and this data stream must be For this problem, we will explore the issue of truthfulness in the Stable
switched onto one of the output wqres. If the stream of Input i is switched Matching Problem and specifically in the Gale-Shapley algorithm. The
onto Output j, at junction box B, then this stream passes through all basic question is: Can a man or a woman end up better off by lying about
his or her preferences? More concretely, we suppose each participant has
junction boxes upstream from B on input i, then through B, then through
a true preference order. Now consider a woman w. Suppose w prefers man
all junction boxes downstream from B on Output j. It does not matter
m to m’, but both m and m’ are low on her list of preferences. Can it be the
;vhich input data stream gets switched onto which output wire, but
each input data stream must be switched onto a different output wire. case that by switching the order of m and ra’ on her list of preferences (i.e.,
by falsely claiming that she prefers m’ to m) and nmning the algorithm
Furthermore--and this is the trick3, constraint--no two data streams can
with this false preference list, w will end up with a man m" that she truly
pass through the same junction box following the switching operation.
prefers to both m and m’? (We can ask the same question for men, but
Finally, here’s the problem. Show that for any specified pattern in will focus on the case of women for purposes of this question.)
which the input wires and output wires meet each other (each pair meet-
ing exactly once), a valid switching of the data streams can always be Resolve this question by doing one of the following two things:
found--one in which each input data stream is switched onto a different (a) Give a proof that, for any set of preference lists, switching the
output, and no two of the resulting streams pass through the same junc- order of a pair on the list cannot improve a woman’s partner in the Gale-
tion box. Additionally, give an algorithm to find such a valid switching. Shapley algorithm; or
(b) Give an example of a set of preference lists for which there is


a switch that would improve the partner of a woman who switched
preferences.

Notes and Further Reading


The Stable Matching Problem was first defined and analyzed by Gale and
Shapley (1962); according to David Gale, their motivation for the problem
came from a story they had recently read in the New Yorker about the intricacies
of the college admissions process (Gale, 2001). Stable matching has grown
into an area of study in its own right, covered in books by Gusfield and Irving
(1989) and Knuth (1997c). Gusfield and Irving also provide a nice survey of
the "parallel" history of the Stable Matching Problem as a technique invented
for matching applicants with employers in medicine and other professions.
    As discussed in the chapter, our five representative problems will be
central to the book's discussions, respectively, of greedy algorithms, dynamic
programming, network flow, NP-completeness, and PSPACE-completeness.
We will discuss the problems in these contexts later in the book.

Basics of Algorithm Analysis

Analyzing algorithms involves thinking about how their resource requirements--
the amount of time and space they use--will scale with increasing
input size. We begin this chapter by talking about how to put this notion on a
concrete footing, as making it concrete opens the door to a rich understanding
of computational tractability. Having done this, we develop the mathematical
machinery needed to talk about the way in which different functions scale
with increasing input size, making precise what it means for one function to
grow faster than another.
We then develop running-time bounds for some basic algorithms, begin-
ning with an implementation of the Gale-Shapley algorithm from Chapter 1
and continuing to a survey of many different running times and certain char-
acteristic types of algorithms that achieve these running times. In some cases,
obtaining a good running-time bound relies on the use of more sophisticated
data structures, and we conclude this chapter with a very useful example of
such a data structure: priority queues and their implementation using heaps.

2.1 Computational Tractability


A major focus of this book is to find efficient algorithms for computational
problems. At this level of generality, our topic seems to ,encompass the whole
of computer science; so what is specific to our approach here?
First, we will txy to identify broad themes and design principles in the
development of algorithms. We will look for paradigmatic problems and ap-
proaches that illustrate, with a minimum of irrelevant detail, the basic ap-
proaches to designing efficient algorithms. At the same time, it would be
pointless to pursue these design principles in a vacuum, so the problems and
approaches we consider are drawn from fundamental issues that arise through- So what we could ask for is a concrete definition of efficiency that is
out computer science, and a general study of algorithms turns out to serve as platform-independent, instance-independent, and of predictive value with
a nice survey of computationa~ ideas that arise in many areas. respect to increasing input sizes. Before focusing on any specific consequences
of this claim, we can at least explore its implicit, high-level suggestion: that
Another property shared by many of the problems we study is their
we need to take a more mathematical view of the situation.
fundamentally discrete nature. That is, like the Stable Matching Problem, they
will involve an implicit search over a large set of combinatorial possibilities; We can use the Stable Matching Problem as an example to guide us. The
and the goal will be to efficiently find a solution that satisfies certain clearly input has a natural "size" parameter N; we could take this to be the total size of
delineated conditions. the representation of all preference lists, since this is what any algorithm for the
problem wi!l receive as input. N is closely related to the other natural parameter
As we seek to understand the general notion of computational efficiency, in this problem: n, the number of men and the number of women. Since there
we will focus primarily on efficiency in running time: we want algorithms that
are 2n preference lists, each of length n, we can view N = 2n2, suppressing
run quickly. But it is important that algorithms be efficient in their use of other
more fine-grained details of how the data is represented. In considering the
resources as well. In particular, the amount of space (or memory) used by an
problem, we will seek to describe an algorithm at a high level, and then analyze
algorithm is an issue that will also arise at a number of points in the book, and
its running time mathematically as a function of this input size N.
we will see techniques for reducing the amount of space needed to perform a
computation.

Worst-Case Running Times and Brute-Force Search


Some Initial Attempts at Defining Efficiency To begin with, we will focus on analyzing the worst-case running time: we will
The first major question we need to answer is the following: How should we look for a bound on the largest possible running time the algorithm could have
turn the fuzzy notion of an "efficient" algorithm into something more concrete? over all inputs of a given size N, and see how this scales with N. The focus on
A first attempt at a working definition of efficiency is the following. worst-case performance initially seems quite draconian: what if an algorithm
performs well on most instances and just has a few pathological inputs on
Proposed Definition of Efficiency (1): An algorithm is efficient if, when which it is very slow? This certainly is an issue in some cases, but in general
implemented, it runs quickly on real input instances. the worst-case analysis of an algorithm has been found to do a reasonable job
Let’s spend a little time considering this definition. At a certain leve!, it’s hard of capturing its efficiency in practice. Moreover, once we have decided to go
to argue with: one of the goals at the bedrock of our study of algorithms is the route of mathematical analysis, it is hard to find an effective alternative to
solving real problems quickly. And indeed, there is a significant area of research worst-case analysis. Average-case analysis--the obvious appealing alternative,
devoted to the careful implementation and profiling of different algorithms for in which one studies the performance of an algorithm averaged over "random"
discrete computational problems. instances--can sometimes provide considerable insight, but very often it can
also become a quagmire. As we observed earlier, it’s very hard to express the
But there are some crucial things missing from this definition, even if our full range of input instances that arise in practice, and so attempts to study an
main goal is to solve real problem instances quickly on real computers. The algorithm’s performance on "random" input instances can quickly devolve into
first is the omission of where, and how well, we implement an algorithm. Even
debates over how a random input should be generated: the same algorithm
bad algorithms can run quickly when applied to small test cases on extremely can perform very well on one class of random inputs and very poorly on
fast processors; even good algorithms can run slowly when they are coded
another. After all, real inputs to an algorithm are generally not being produced
sloppily. Also, what is a "real" input instance? We don’t know the ful! range of from a random distribution, and so average-case analysis risks telling us more
input instances that will be encountered in practice, and some input instances
about the means by which the random inputs were generated than about the
can be much harder than others. Finally, this proposed definition above does algorithm itself.
not consider how well, or badly, an algorithm may scale as problem sizes grow
to unexpected levels. A common situation is that two very different algorithms So in general we will think about the worst-case analysis of an algorithm’s
will perform comparably on inputs of size 100; multiply the input size tenfold, running time. But what is a reasonable analytical benchmark that can tell us
and one will sti!l run quickly while the other consumes a huge amount of time. whether a running-time bound is impressive or weak? A first simple guide
is by comparison with brute-force search over the search space of possible a consensus began to emerge on how to quantify the notion of a "reasonable"
solutions. running time. Search spaces for natural combinatorial problems tend to grow
exponentially in the size N of the input; if the input size increases by one, the
Let’s return to the example of the Stable Matching Problem. Even when
number of possibilities increases multiplicatively. We’d like a good algorithm
the size of a Stable Matching input instance is relatively small, the search
for such a problem to have a better scaling property: when the input size
space it defines is enormous (there are n! possible perfect matchings between
increases by a constant factor--say, a factor of 2--the algorithm should only
n men and n women), and we need to find a matching that is stable. The
slow down by some constant factor C.
natural "brute-force" algorithm for this problem would plow through all perfect
matchings by enumeration, checking each to see if it is stable. The surprising Arithmetically, we can formulate this scaling behavior as follows. Suppose
punchline, in a sense, to our solution of the Stable Matching Problem is that we an algorithm has the following property: There are absolute constants c > 0
needed to spend time proportional only to N in finding a stable matching from and d > 0 so that on every input instance of size N, its running time is
amgng this stupendously large space of possibilities. This was a conclusion we bounded by cNd primitive computational steps. (In other words, its running
reached at an analytical level. We did not implement the algorithm and try it time is at most proportional to Nd.) For now, we will remain deliberately
out on sample preference lists; we reasoned about it mathematically. Yet, at the vague on what we mean by the notion of a "primitive computational step"-
same time, our analysis indicated how the algorithm could be implemented in but it can be easily formalized in a model where each step corresponds to
practice and gave fairly conclusive evidence that it would be a big improvement a single assembly-language instruction on a standard processor, or one line
over exhaustive enumeration. of a standard programming language such as C or Java. In any case, if this
This will be a common theme in most of the problems we study: a compact’ running-time bound holds, for some c and d, then we say that the algorithm
representation, implicitly specifying a giant search space. For most of these has a polynomial running time, or that it is a polynomial-time algorithm. Note
problems, there will be an obvious brute-force solution: try all possibilities that any polynomial-time bound has the scaling property we’re looking for. If
and see if any one of them works. Not only is this approach almost always too the input size increases from N to 2N, the bound on the running time increases
slow to be useful, it is an intellectual cop-out; it provides us with absolutely from cN^d to c(2N)^d = c · 2^d N^d, which is a slow-down by a factor of 2^d. Since d is
no insight into the structure of the problem we are studying. And so if there a constant, so is 2^d; of course, as one might expect, lower-degree polynomials
is a common thread in the algorithms we emphasize in this book, it would be exhibit better scaling behavior than higher-degree polynomials.
the following alternative definition of efficiency. From this notion, and the intuition expressed above, emerges our third
attempt at a working definition of efficiency.
Proposed Definition of Efficiency (2): An algorithm is efficient if it achieves
qualitatively better worst-case performance, at an analytical level, than
Proposed Definition of Efficiency (3): An algorithm is efficient if it has a
brute-force search.
polynomial running time.
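The scaling property behind this definition is easy to check numerically. The small Python sketch below is our own illustration, not part of the text; the constants c and d are arbitrary choices. It evaluates a hypothetical bound c·N^d and shows that doubling N always multiplies the bound by exactly 2^d (here 2^3 = 8), independent of N.

   # Sketch: a polynomial bound c*N**d slows down by the constant factor 2**d
   # when the input size doubles, no matter how large N already is.
   def poly_bound(N, c=1.0, d=3):
       return c * N ** d

   for N in (100, 1_000, 10_000):
       print(N, poly_bound(2 * N) / poly_bound(N))   # prints 8.0 every time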
This will turn out to be a very usefu! working definition for us. Algorithms
that improve substantially on brute-force search nearly always contain a
Where our previous definition seemed overly vague, this one seems much
valuable heuristic idea that makes them work; and they tell us something
too prescriptive. Wouldn’t an algorithm with running time proportional to
about the intrinsic structure, and computational tractability, of the underlying
n^100--and hence polynomial--be hopelessly inefficient? Wouldn’t we be rel-
problem itself.
atively pleased with a nonpolynomial running time of n^(1+.02(log n))? The an-
But if there is a problem with our second working definition, it is vague- swers are, of course, "yes" and "yes." And indeed, however, much one may
ness. What do we mean by "qualitatively better performance?" This suggests try to abstractly motivate the definition of efficiency in terms of polynomial
that we consider the actual running time of algorithms more carefully, and try time, a primary justification for it is this: It really works. Problems for which
to quantify what a reasonable running time would be. polynomial-time algorithms exist almost invariably turn out to have algorithms
with running times proportional to very moderately growing polynomials like
Polynomial Time as a Definition of Efficiency n, n log n, n2, or n3. Conversely, problems for which no polynomial-time al-
When people first began analyzing discrete algorithms mathematicalfy--a gorithm is known tend to be very difficult in practice. There are certainly
thread of research that began gathering momentum through the 1960s-- exceptions to this principle in both directions: there are cases, for example, in

Table 2.1 The running times (rounded up) of different algorithms on inputs of
increasing size, for a processor performing a million high-level instructions per second.
In cases where the running time exceeds 10^25 years, we simply record the algorithm as
taking a very long time.

                 n         n log2 n   n^2       n^3           1.5^n          2^n          n!
  n = 10         < 1 sec   < 1 sec    < 1 sec   < 1 sec       < 1 sec        < 1 sec      4 sec
  n = 30         < 1 sec   < 1 sec    < 1 sec   < 1 sec       < 1 sec        18 min       10^25 years
  n = 50         < 1 sec   < 1 sec    < 1 sec   < 1 sec       11 min         36 years     very long
  n = 100        < 1 sec   < 1 sec    < 1 sec   1 sec         12,892 years   10^17 years  very long
  n = 1,000      < 1 sec   < 1 sec    1 sec     18 min        very long      very long    very long
  n = 10,000     < 1 sec   < 1 sec    2 min     12 days       very long      very long    very long
  n = 100,000    < 1 sec   2 sec      3 hours   32 years      very long      very long    very long
  n = 1,000,000  1 sec     20 sec     12 days   31,710 years  very long      very long    very long

previous definitions were completely subjective, and hence limited the extent
to which we could discuss certain issues in concrete terms.
   In particular, the first of our definitions, which was tied to the specific
implementation of an algorithm, turned efficiency into a moving target: as
processor speeds increase, more and more algorithms fall under this notion of
efficiency. Our definition in terms of polynomial time is much more an absolute
notion; it is closely connected with the idea that each problem has an intrinsic
level of computational tractability: some admit efficient solutions, and others
do not.

2.2 Asymptotic Order of Growth
Our discussion of computational tractability has turned out to be intrinsically
based on our ability to express the notion that an algorithm’s worst-case
running time on inputs of size n grows at a rate that is at most proportiona! to
some function f(n). The function f(n) then becomes a bound on the rtmning
time of the algorithm. We now discuss a framework for talking about this
which an algorithm with exponential worst-case behavior generally runs well concept.
on the kinds of instances that arise in practice; and there are also cases where We will mainly express algorithms in the pseudo-code style that we used
the best polynomia!-time algorithm for a problem is completely impractical for the Gale-Shapley algorithm. At times we will need to become more formal,
due to large constants or a high exponent on the polynomial bound. All this but this style Of specifying algorithms will be completely adequate for most
serves to reinforce the point that our emphasis on worst-case, polynomial-time purposes. When we provide a bound on the running time of an algorithm,
bounds is only an abstraction of practical situations. But overwhelmingly, the we will generally be counting the number of such pseudo-code steps that
concrete mathematical definition of polynomial time has turned out to corre- are executed; in this context, one step wil! consist of assigning a value to a
spond surprisingly wel! in practice to what we observe about the efficiency of variable, looking up an entry in an array, following a pointer, or performing
algorithms, and the tractability of problems, in tea! life. an arithmetic operation on a fixed-size integer.
One further reason why the mathematical formalism and the empirical When we seek to say something about the running time of an algorithm on
evidence seem to line up well in the case of polynomial-time solvability is that inputs of size n, one thing we could aim for would be a very concrete statement
the gulf between the growth rates of polynomial and exponential functions such as, "On any input of size n, the algorithm runs for at most 1.62n2 +
is enormous. Suppose, for example, that we have a processor that executes 3.5n + 8 steps." This may be an interesting statement in some contexts, but as
a million high-level instructions per second, and we have algorithms with a general goal there are several things wrong with it. First, getting such a precise
running-time bounds of n, n log2 n, n2, n3, 1.5n, 2n, and n!. In Table 2.1, bound may be an exhausting activity, and more detail than we wanted anyway.
we show the running times of these algorithms (in seconds, minutes, days, Second, because our ultimate goal is to identify broad classes of algorithms that
or years) for inputs of size n = 10, 50, 50,100, 1,000, 10,000,100,000, and have similar behavior, we’d actually like to classify running times at a coarser
1,000,000. level of granularity so that similarities among different algorithms, and among
There is a final, fundamental benefit to making our definition of efficiency different problems, show up more clearly. And finally, extremely detailed
so specific: it becomes negatable. It becomes possible to express the notion statements about the number of steps an algorithm executes are often--in
that there is no efficient algorithm for a particular problem. In a sense, being a strong sense--meaningless. As just discussed, we will generally be counting
able to do this is a prerequisite for turning our study of algorithms into steps in a pseudo-code specification of an algorithm that resembles a high-
good science, for it allows us to ask about the existence or nonexistence level programming language. Each one of these steps will typically unfold
of efficient algorithms as a well-defined question. In contrast, both of our into some fixed number of primitive steps when the program is compiled into
an intermediate representation, and then into some further number of steps depending on the particular architecture being used to do the computing. So the most we can safely say is that as we look at different levels of computational abstraction, the notion of a "step" may grow or shrink by a constant factor--for example, if it takes 25 low-level machine instructions to perform one operation in our high-level language, then our algorithm that took at most 1.62n^2 + 3.5n + 8 steps can also be viewed as taking 40.5n^2 + 87.5n + 200 steps when we analyze it at a level that is closer to the actual hardware.

O, Ω, and Θ
For all these reasons, we want to express the growth rate of running times and other functions in a way that is insensitive to constant factors and low-order terms. In other words, we’d like to be able to take a running time like the one we discussed above, 1.62n^2 + 3.5n + 8, and say that it grows like n^2, up to constant factors. We now discuss a precise way to do this.

Asymptotic Upper Bounds Let T(n) be a function--say, the worst-case running time of a certain algorithm on an input of size n. (We will assume that all the functions we talk about here take nonnegative values.) Given another function f(n), we say that T(n) is O(f(n)) (read as "T(n) is order f(n)") if, for sufficiently large n, the function T(n) is bounded above by a constant multiple of f(n). We will also sometimes write this as T(n) = O(f(n)). More precisely, T(n) is O(f(n)) if there exist constants c > 0 and n₀ ≥ 0 so that for all n ≥ n₀, we have T(n) ≤ c · f(n). In this case, we will say that T is asymptotically upper-bounded by f. It is important to note that this definition requires a constant c to exist that works for all n; in particular, c cannot depend on n.

As an example of how this definition lets us express upper bounds on running times, consider an algorithm whose running time (as in the earlier discussion) has the form T(n) = pn^2 + qn + r for positive constants p, q, and r. We’d like to claim that any such function is O(n^2). To see why, we notice that for all n ≥ 1, we have qn ≤ qn^2, and r ≤ rn^2. So we can write

   T(n) = pn^2 + qn + r ≤ pn^2 + qn^2 + rn^2 = (p + q + r)n^2

for all n ≥ 1. This inequality is exactly what the definition of O(·) requires: T(n) ≤ cn^2, where c = p + q + r.

Note that O(·) expresses only an upper bound, not the exact growth rate of the function. For example, just as we claimed that the function T(n) = pn^2 + qn + r is O(n^2), it’s also correct to say that it’s O(n^3). Indeed, we just argued that T(n) ≤ (p + q + r)n^2, and since we also have n^2 ≤ n^3, we can conclude that T(n) ≤ (p + q + r)n^3 as the definition of O(n^3) requires. The fact that a function can have many upper bounds is not just a trick of the notation; it shows up in the analysis of running times as well. There are cases where an algorithm has been proved to have running time O(n^3); some years pass, people analyze the same algorithm more carefully, and they show that in fact its running time is O(n^2). There was nothing wrong with the first result; it was a correct upper bound. It’s simply that it wasn’t the "tightest" possible running time.

Asymptotic Lower Bounds There is a complementary notation for lower bounds. Often when we analyze an algorithm--say we have just proven that its worst-case running time T(n) is O(n^2)--we want to show that this upper bound is the best one possible. To do this, we want to express the notion that for arbitrarily large input sizes n, the function T(n) is at least a constant multiple of some specific function f(n). (In this example, f(n) happens to be n^2.) Thus, we say that T(n) is Ω(f(n)) (also written T(n) = Ω(f(n))) if there exist constants ε > 0 and n₀ ≥ 0 so that for all n ≥ n₀, we have T(n) ≥ ε · f(n). By analogy with O(·) notation, we will refer to T in this case as being asymptotically lower-bounded by f. Again, note that the constant ε must be fixed, independent of n.

This definition works just like O(·), except that we are bounding the function T(n) from below, rather than from above. For example, returning to the function T(n) = pn^2 + qn + r, where p, q, and r are positive constants, let’s claim that T(n) = Ω(n^2). Whereas establishing the upper bound involved "inflating" the terms in T(n) until it looked like a constant times n^2, now we need to do the opposite: we need to reduce the size of T(n) until it looks like a constant times n^2. It is not hard to do this; for all n ≥ 0, we have

   T(n) = pn^2 + qn + r ≥ pn^2,

which meets what is required by the definition of Ω(·) with ε = p > 0.

Just as we discussed the notion of "tighter" and "weaker" upper bounds, the same issue arises for lower bounds. For example, it is correct to say that our function T(n) = pn^2 + qn + r is Ω(n), since T(n) ≥ pn^2 ≥ pn.

Asymptotically Tight Bounds If we can show that a running time T(n) is both O(f(n)) and also Ω(f(n)), then in a natural sense we’ve found the "right" bound: T(n) grows exactly like f(n) to within a constant factor. This, for example, is the conclusion we can draw from the fact that T(n) = pn^2 + qn + r is both O(n^2) and Ω(n^2).

There is a notation to express this: if a function T(n) is both O(f(n)) and Ω(f(n)), we say that T(n) is Θ(f(n)). In this case, we say that f(n) is an asymptotically tight bound for T(n). So, for example, our analysis above shows that T(n) = pn^2 + qn + r is Θ(n^2).

Asymptotically tight bounds on worst-case running times are nice things to find, since they characterize the worst-case performance of an algorithm
Proof. We’ll prove part (a) of this claim; the proof of part (b) is very similar.
precisely up to constant factors. And as the definition of ®(-) shows, one can
obtain such bounds by closing the gap between an upper bound and a lower For (a), we’re given that for some constants c and n0, we have f(n) <_ cg(n)
bound. For example, sometimes you will read a (slightly informally phrased) for all n >_ n0. Also, for some (potentially different) constants c’ and n~, we
sentence such as "An upper bound of O(n3) has been shown on the worst-case have g(n) <_ c’h(n) for all n _> n~. So consider any number n that is at least as
running time of the algorithm, but there is no example known on which the large as both no and n~. We have f(n) < cg(n) < cc’h(n), and so f(n) < cc’h(n)
algorithm runs for more than f2 (n2) steps." This is implicitly an invitation to for all n > max(no, n~). This latter inequality is exactly what is required for
search for an asymptotically tight bound on the algorithm’s worst-case running showing that f = O(h). ,,
time.
Sometimes one can also obtain an asymptotically tight bound directly by Combining parts (a) and (b) of (2.2), we can obtain a similar result
computing a limit as n goes to infinity. Essentially, if the ratio of functions for asymptotically tight bounds. Suppose we know that [ = ®(g) and that
f(n) and g(n) converges to a positive constant as n goes to infinity, then g = ®(h). Then since [ = O(g) and g = O(h), we know from part (a) that
[ = O(h); since [ = S2(g) and g = S2(h), we know from part (b) that [ =
f(n) = ®(g(n)).
It follows that [ = ® (h). Thus we have shown
(2.1) Let f and g be two functions such that

   lim_{n→∞} f(n)/g(n)

exists and is equal to some number c > 0. Then f(n) = Θ(g(n)).

(2.3) If f = Θ(g) and g = Θ(h), then f = Θ(h).

Sums of Functions It is also useful to have results that quantify the effect of
adding two functions. First, if we have an asymptotic upper bound that applies
to each of two functions f and g, then it applies to their sum.
Proof. We will use the fact that the limit exists and is positive to show that
(2.4) Suppose that f and g are two functions such that for some other function
f(n) = O(g(n)) and f(n) = Ω(g(n)), as required by the definition of Θ(·). h, we have f = O(h) and g = O(h). Then f + g = O(h).
Since
Proof. We’re given that for some constants c and no, we have f(n) <_ Ch(n)
lira f(n)
n-+oo g(n) = c > 0, for all n > no. Also, for some (potentially different) constants c’ and no,
we have g(n) < c’h(n) for all n > no. ’ So consider any number n that is at
it follows from the definition of a limit that there is some no beyond which the least as large as both no and no.’ We have f(n) + g(n) <_ch(n) + c’h(n). Thus
ratio is always between ½c and 2c. Thus, f(n) < 2cg(n) for all n >_ no, which f(n) + g(n) <_ (c + c’)h(n) for all n _> max(no, n~), which is exactly what is
implies that f(n) = O(g(n)); and f(n) ≥ ½c · g(n) for all n ≥ n₀, which implies required for showing that f + g = O(h). ■
that f(n) = Ω(g(n)). ■
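For concreteness, the limit criterion in (2.1) can be checked mechanically for simple functions. The sketch below is ours, not the book’s, and uses the sympy library (an assumption on our part; any symbolic package would do) to confirm that (3n^2 + 5n + 7)/n^2 tends to the positive constant 3, so 3n^2 + 5n + 7 = Θ(n^2).

   # Illustration of (2.1): if f(n)/g(n) tends to a positive constant, then f = Theta(g).
   import sympy as sp

   n = sp.symbols('n', positive=True)
   f = 3 * n**2 + 5 * n + 7
   g = n**2
   print(sp.limit(f / g, n, sp.oo))   # 3, a positive constant, hence f(n) = Theta(n**2)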
There is a generalization of this to sums of a fixed constant number of
Properties of Asymptotic Growth Rates functions k, where k may be larger than two. The result can be stated precisely
Having seen the definitions of O, S2, and O, it is useful to explore some of their as follows; we omit the proof, since it is essenti!lly the same as the proof of
basic properties. (2.4), adapted to sums consisting of k terms rather than just two.
Transitivity A first property is transitivity: if a function f is asymptotically (2.5) Let k be a fixed constant, and let f_1, f_2, ..., f_k and h be functions such
upper-bounded by a function g, and if g in turn is asymptotically upper- that f_i = O(h) for all i. Then f_1 + f_2 + ... + f_k = O(h).
bounded by a function h, then f is asymptotically upper-bounded by h. A
similar property holds for lower bounds. We write this more precisely as There is also a consequence of (2.4) that covers the following kind of
follows. situation. It frequently happens that we’re analyzing an algorithm with two
high-level parts, and it is easy to show that one of the two parts is slower
(2.2)
algorithm is asymptotically comparable to the running time of the slow part.
(a) If f = O(g) and g = O(h), then f = O(h). Since the overall running time is a sum of two functions (the running times of
(b) If f = Ω(g) and g = Ω(h), then f = Ω(h).
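As a purely numerical illustration of these kinds of bounds (our own sketch, with arbitrarily chosen constants p, q, r), the fragment below checks that the witnesses c = p + q + r and n₀ = 1 from the earlier example really do bound T(n) = pn^2 + qn + r from above, and that ε = p bounds it from below.

   # Checking explicit witnesses for O(n**2) and Omega(n**2) over a range of n.
   p, q, r = 1.5, 3.0, 8.0
   T = lambda n: p * n * n + q * n + r
   c, eps, n0 = p + q + r, p, 1
   assert all(eps * n * n <= T(n) <= c * n * n for n in range(n0, 10_000))
   print("bounds hold for n in [1, 10000)")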

This is a good point at which to discuss the relationship between these


the two parts), results on asymptotic bounds for sums of functions are directly
types of asymptotic bounds and the notion of polynomial time, which we
relevant.
arrived at in the previous section as a way to formalize the more elusive concept
(2.6) Suppose that f and g are two functions (taking nonnegative values) of efficiency. Using O(·) notation, it’s easy to formally define polynomial time:
such that g = O(f). Then f + g = Θ(f). In other words, f is an asymptotically a polynomial-time algorithm is one whose running time T(n) is O(n^d) for some
tight bound for the combined function f + g. constant d, where d is independent of the input size.
So algorithms with running-time bounds like O(n2) and O(n3) are
Proof. Clearly f + g = Ω(f), since for all n ≥ 0, we have f(n) + g(n) ≥ f(n).
polynomial-time algorithms. But it’s important to realize that an algorithm
So to complete the proof, we need to show that f + g = O(f).
But this is a direct consequence of (2.4): we’re given the fact that g = O(f), can be polynomial time even if its running time is not written as n raised
and also f = O(f) holds for any function, so by (2.4) we have f + g = O(f). ■ to some integer power. To begin with, a number of algorithms have running
times of the form O(nx) for some number x that is not an integer. For example,
in Chapter 5 we will see an algorithm whose running time is O(n^1.59); we will
This result also extends to the sum of any fixed, constant number of also see exponents less than 1, as in bounds like Θ(√n) = O(n^(1/2)).
functions: the most rapidly growing among the functions is an asymptotically To take another common kind of example, we will see many algorithms
tight bound for the sum. whose running times have the form O(n log n). Such algorithms are also
polynomial time: as we will see next, log n < n for all n > 1, and hence
Asymptotic Bounds for Some Common Functions n log n < n2 for all n > 1. In other words, if an algorithm has nmning time
There are a number of functions that come up repeatedly in the analysis of O(n log n), then it also has running time O(n2), and so it is a polynomial-time
algorithms, and it is useful to consider the asymptotic properties of some of algorithm.
the most basic of these: polynomials, logarithms, and exponentials.
Logarithms Recall that logo n is the number x such that bx = n. One way
Polynomials Recall that a polynomial is-a function that can be written in
to get an approximate sense of how fast logb n grows is to note that, if we
the form f(n) = a_0 + a_1 n + a_2 n^2 + ... + a_d n^d for some integer constant d > 0,
round it down to the nearest integer, it is one less than the number of digits
where the final coefficient aa is nonzero. This value d is called the degree of the
in the base-b representation of the number n. (Thus, for example, 1 + log2 n,
polynomial. For example, the functions of the form pn2 + qn + r (with p ~ 0)
rounded down, is the number of bits needed to represent n.)
that we considered earlier are polynomials of degree 2.
So logarithms are very slowly growing functions. In particular, for every
A basic fact about polynomials is that their asymptotic rate of growth is
base b, the function logo n is asymptotically bounded by every function of the
determined by their "high-order term"--the one that determines the degree.
form nx, even for (noninteger) values of x arbitrary close to 0.
We state this more formally in the following claim. Since we are concerned here
only with functions that take nonnegative values, we will restrict our attention
(2.8) For every b > 1 and every x > 0, we have log_b n = O(n^x).
to polynomials for which the high-order term has a positive coefficient aa > O.
(2.7) Let f be a polynomial of degree d, in which the coefficient aa is positive. One can directly translate between logarithms of different bases using the
Then f = O(nd). following fundamental identity:

Proof. We write f = a_0 + a_1 n + a_2 n^2 + ... + a_d n^d, where a_d > 0. The upper
   log_a n = log_b n / log_b a
bound is a direct application of (2.5). First, notice that coefficients a_j for j < d
may be negative, but in any case we have a_j n^j ≤ |a_j| n^d for all n ≥ 1. Thus each
This equation explains why you’ll often notice people writing bounds like
term in the polynomial is O(n^d). Since f is a sum of a constant number of
O(log n) without indicating the base of the logarithm. This is not sloppy
functions, each of which is O(n^d), it follows from (2.5) that f is O(n^d). ■
usage: the identity above says that log_a n = (1/log_b a) · log_b n, so the point is that
One can also show that under the conditions of (2.7), we have f = Ω(n^d), log_a n = Θ(log_b n), and the base of the logarithm is not important when writing
bounds using asymptotic notation.
and hence it follows that in fact f = Θ(n^d).
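To see these landmarks numerically, here is a tiny Python comparison of our own (the particular values of n and the base 1.5 are arbitrary choices): for a few input sizes it prints log2 n, a fractional power of n, a polynomial, and a modest exponential, showing how quickly they separate.

   import math

   for n in (10, 100, 1_000):
       print(n, math.log2(n), n ** 0.5, n ** 2, 1.5 ** n)
   # log2 n stays tiny, n**0.5 and n**2 grow polynomially, and 1.5**n explodes.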
an algorithm expressed in a high-level fashion--as we expressed the Gale-
Exponentials Exponential functions are functions of the form f(n) = r^n for
Shapley Stable Matching algorithm in Chapter 1, for example--one doesn’t
some constant base r. Here we will be concerned with the case in which r > 1,
have to actually program, compile, and execute it, but one does have to think
which results in a very fast-growing function. about how the data will be represented and manipulated in an implementation
In particular, where polynomials raise rt to a fixed exponent, exponentials of the algorithm, so as to bound the number of computational steps it takes.
raise a fixed number to n as a power; this leads to much faster rates of growth.
The implementation of basic algorithms using data structures is something
One way to summarize the relationship between polynomials and exponentials
that you probably have had some experience with. In this book, data structures
is as follows. will be covered in the context of implementing specific algorithms, and so we
(2.9) For every r > 1 and every d > 0, we have n^d = O(r^n).
we are developing. To get this process started, we consider an implementation
In particular, every exponential grows faster thari every polynomial. And as of the Gale-Shapley Stable Matching algorithm; we showed earlier that the
we saw in Table 2.1, when you plug in actual values of rt, the differences in algorithm terminates in at most rt2 iterations, and our implementation here
growth rates are really quite impressive. provides a corresponding worst-case running time of O(n2), counting actual
Just as people write O(log rt) without specifying the base, you’ll also see computational steps rather than simply the total number of iterations. To get
people write "The running time of this algorithm is exponential," without such a bound for the Stable Matching algorithm, we will only need to use two
specifying which exponential function they have in mind. Unlike the liberal of the simplest data structures: lists and arrays. Thus, our implementation also
use of log n, which is iustified by ignoring constant factors, this generic use of provides a good chance to review the use of these basic data structures as well.
the term "exponential" is somewhat sloppy. In particular, for different bases In the Stable Matching Problem, each man and each woman has a ranking
r > s > 1, it is never the case that rn = ® (sn). Indeed, this would require that of all members of the opposite gender. The very first question we need to
for some constant c > 0, we would have rn _< csn for all sufficiently large ft. discuss is how such a ranking wil! be represented. Further, the algorithm
But rearranging this inequality would give (r/s)n < c for all sufficiently large maintains a matching and will need to know at each step which men and
ft. Since r > s, the expression (r/s)n is. tending to infinity with rt, and so it women are free, and who is matched with whom. In order to implement the
cannot possibly remain bounded by a fixed constant c. algorithm, we need to decide which data structures we will use for all these
So asymptotically speaking, exponential functions are all different. Still, things.
it’s usually clear what people intend when they inexactly write "The running An important issue to note here is that the choice of data structure is up
time of this algorithm is exponential"--they typically mean that the running to the algorithm designer; for each algorithm we will choose data structures
time grows at least as fast as some exponential function, and all exponentials that make it efficient and easy to implement. In some cases, this may involve
grow so fast that we can effectively dismiss this algorithm without working out preprocessing the input to convert it from its given input representation into a
flLrther details of the exact running time. This is not entirely fair. Occasionally data structure that is more appropriate for the problem being solved.
there’s more going on with an exponential algorithm than first appears, as
we’!l see, for example, in Chapter 10; but as we argued in the first section of
this chapter, it’s a reasonable rule of thumb. Arrays and Lists
Taken together, then, logarithms, polynomials, and exponentials serve as To start our discussion we wi!l focus on a single list, such as the list of women
useful landmarks in the range of possible functions that you encounter when in order of preference by a single man. Maybe the simplest way to keep a list
analyzing running times. Logarithms grow more slowly than polynomials, and of rt elements is to use an array A of length n, and have A[i] be the ith element
polynomials grow more slowly than exponentials. of the list. Such an array is simple to implement in essentially all standard
programming languages, and it has the following properties.
2.3 Implementing the Stable Matching Algorithm We can answer a query of the form "What is the ith element on the list?"
Using Lists and Arrays in O(1) time, by a direct access to the value A[i].
We’ve now seen a general approach for expressing bounds on the running If we want to determine whether a particular element e belongs to the
time of an algorithm. In order to asymptotically analyze the running time of list (i.e., whether it is equal to A[i] for some i), we need to check the

elements one by one in O(n) time, assuming we don’t know anything
about the order in which the elements appear in A.
If the array elements are sorted in some clear way (either numerically
or alphabetically), then we can determine whether an element e belongs
to the list in O(log n) time using binary search; we will not need to use
binary search for any part of our stable matching implementation, but
we will have more to say about it in the next section.

[Figure 2.1 A schematic representation of a doubly linked list, showing the deletion of an element e. The two panels are labeled "Before deleting e" and "After deleting e," with the element e spliced out in the second panel.]

An array is less good for dynamically maintaining a list of elements that
changes over time, such as the list of free men in the Stable Matching algorithm;
since men go from being free to engaged, and potentially back again, a list of
free men needs to grow and shrink during the execution of the algorithm. It
is generally cumbersome to frequently add or delete elements to a list that is
maintained as an array.
essentially the reverse of deletion, and indeed one can see this operation
An alternate, and often preferable, way to maintain such a dynamic set at work by reading Figure 2.1 from bottom to top.
of elements is via a linked list. In a linked list, the elements are sequenced
together by having each element point to the next in the list. Thus, for each Inserting or deleting e at the beginning of the list involves updating the First
element v on the list, we need to maintain a pointer to the next element; we pointer, rather than updating the record of the element before e.
set this pointer to nail if i is the last element. We also have a pointer First While lists are good for maintaining a dynamically changing set, they also
that points to the first element. By starting at First and repeatedly following have disadvantages. Unlike arrays, we cannot find the ith element of the list in
pointers to the next element until we reach null, we can thus traverse the entire 0(1) time: to find the ith element, we have to follow the Next pointers starting
contents of the list in time proportional tO its length. from the beginning of the list, which takes a total of O(i) time.
A generic way to implement such a linked list, when the set of possible Given the relative advantages and disadvantages of arrays and lists, it may
elements may not be fixed in advance, is to allocate a record e for each element happen that we receive the input to a problem in one of the two formats ,and
that we want to include in the list. Such a record would contain a field e.val want to convert it into the other. As discussed earlier, such preprocessing is
that contains the value of the element, and a field e.Next that contains a often useful; and in this case, it is easy to convert between the array and
pointer to the next element in the list. We can create a doubly linked list, which list representations in O(n) time. This allows us to freely choose the data
is traversable in both directions, by also having a field e.Prev that contains structure that suits the algorithm better and not be constrained by the way
a pointer to the previous element in the list. (e.Prev = null if e is the first the information is given as input.
element.) We also include a pointer Last, analogous to First, that points to
the last element in the list. A schematic illustration of part of such a list is
Implementing the Stable Matching Algorithm
shown in the first line of Figure 2.1.
Next we will use arrays and linked lists to implement the Stable Matching algo-
A doubly linked list can be modified as follows.
rithm from Chapter 1. We have already shown that the algorithm terminates in
at most n2 iterations, and this provides a type of upper bound on the running
o Deletion. To delete the element e from a doubly linked list, we can just
"splice it out" by having the previous element, referenced by e.Prev, and time. However, if we actually want to implement the G-S algorithm so that it
the next element, referenced by e.Igext, point directly to each other. The runs in time proportional to n2, we need to be able to implement each iteration
in constant time. We discuss how to do this now.
deletion operation is illustrated in Figure 2.1.
o Insertion. To insert element e between elements d and f in a list, we For simplicity, assume that the set of men and women are both {1 ..... n}.
"splice it in" by updating d.Igext and/.Prey to point to e, and the Next To ensure this, we can order the men and women (say, alphabetically), and
and Prey pointers of e to point to d and f, respectively. This operation is associate number i with the ith man mi or ith women wi in this order. This

assumption (or notation) allows us to define an array indexed by all men Maybe the trickiest question is how to maintain women’s preferences to
or all women. We need to have a preference list for each man and for each keep step (4) efficient. Consider a step of the algorithm, when man m proposes
woman. To do this we will haye two arrays, one for women’s preference lists to a woman w. Assume w is already engaged, and her current partner is
and one for the men’s preference lists; we will use ManPref[m, i] to denote m’ = Current[w]. We would like to decide in O(1) time if woman w prefers m
the ith woman on man m’s preference list, and similarly WomanPref[w, i] to or m’. Keeping the women’s preferences in an array WomanPref, analogous to
be the ith man on the preference list of woman w. Note that the amount of the one we used for men, does not work, as we would need to walk through
space needed to give the preferences for all 2rt individuals is O(rt2), as each w’s list one by one, taking O(n) time to find m and rn’ on the list. While O(rt)
is still polynomial, we can do a lot better if we build an auxiliary data structure
person has a list of length n.
at the beginning.
We need to consider each step of the algorithm and understand what data
structure allows us to implement it efficiently. Essentially, we need to be able At the start of the algorithm, we create an n. x n array Ranking, where
Ranking[w, m] contains the rank of man m in the sorted order of w’s prefer-
to do each of four things in constant time.
ences. By a single pass through w’s preference list, we can create this array in
linear time for each woman, for a total initial time investment proportional to
1. We need to be able to identify a free man. rt2. Then, to decide which of m or m’ is preferred by w, we simply compare
2. We need, for a man m, to be able to identify the highest-ranked woman the values Ranking[w, rrt] and Ranking[w, rrt’].
to whom he has not yet proposed. This allows us to execute step (4) in constant time, and hence we have
3. For a woman w, we need to decide if w is currently engaged, and if she everything we need to obtain the desired running time.
is, we need to identify her current partner.
¯ 4. For a woman w and two men m and m’, we need to be able to decide, (2.10) The data structures described above allow us to implernentthe G-S
again in constant time, which of m or m’ is preferred by w. algorithm in O(n2) time.

First, consider selecting a free man. We will do this b_y maintaining the set
of flee men as a linked list. When we need to select a flee man, we take the
first man m on this list. We delete m from the list if he becomes engaged, and 2.4 A Survey of Common Running Times
possibly insert a different man rn’, if some other man m’ becomes free. In this When trying to analyze a new algorithm, it helps to have a rough sense of
case, m’ can be inserted at the front of the list, again in constant time. the "landscape" of different running times. Indeed, there are styles of analysis
Next, consider a man m. We need to identify the highest-ranked woman that recur frequently, and so when one sees running-time bounds like O(n),
to whom he has not yet proposed. To do this we will need to maintain an extra O(n log n), and O(n2) appearing over and over, it’s often for one of a very
array Next that indicates for each man m the position of the next woman he small number of distinct reasons. Learning to recognize these common styles
wil! propose to on his list. We initialize Next [m] = 1 for al! men m. If a man m of analysis is a long-term goal. To get things under way, we offer the following
needs to propose to a woman, he’ll propose to w = ManPref[m,Next [re]I, and survey of common running-time bounds and some of the typical .approaches
once he prdposes to w, we increment the value of Next[m] by one, regardless that lead to them.
of whether or not w accepts the proposal. Earlier we discussed the notion that most problems have a natural "search
Now assume man m proposes to woman w; we need to be able to ~denfify space"--the set of all possible solutions--and we noted that a unifying theme
the man m’ that w is engaged to (if there is such a man). We can do this by in algorithm design is the search for algorithms whose performance is more
maintaining an array Current of length n, where Current[w] is the woman efficient than a brute-force enumeration of this search space. In approaching a
w’s current partner m’. We set Current [w] to a special null symbol when we new problem, then, it often helps to think about two kinds of bounds: one on
need to indicate that woman w is not currently engaged; at the start of the the running time you hope to achieve, and the other on the size of the problem’s
algorithm, Current[w] is initialized to this null symbol for all women w. natural search space (and hence on the running time of a brute-force algorithm
To sum up, the data structures we have set up thus far can implement the for the problem). The discussion of running times in this section will begin in
rhany cases with an analysis of the brute-force algorithm, since it is a useful
operations (1)-(3) in O(1) time each.
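As a sketch of how these pieces fit together, the Python fragment below is our own rendering (it assumes 0-based indexing and uses lists of lists in place of the arrays named above). It builds the free-man list, the Next and Current arrays, and the Ranking array; with them, "does w prefer m to m_prime?" becomes a single comparison.

   from collections import deque

   def make_gs_structures(n, WomanPref):
       free_men = deque(range(n))        # free men, playing the role of the linked list
       Next = [0] * n                    # Next[m]: position of the next woman m will propose to
       Current = [None] * n              # Current[w]: w's partner, or None if w is free
       Ranking = [[0] * n for _ in range(n)]
       for w in range(n):                # one O(n) pass per woman: O(n**2) total setup
           for rank, m in enumerate(WomanPref[w]):
               Ranking[w][m] = rank
       return free_men, Next, Current, Ranking

   # Constant-time test used in step (4): w prefers m to m_prime exactly when
   #     Ranking[w][m] < Ranking[w][m_prime]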

way to get one’s bearings with respect to a problem; the task of improving on qn that is also arranged in ascending
such algorithms will be our goal in most of the book. order. For example, merging the lists 2, 3, 11, 19 and 4, 9, 16, 25 results in the
output 2, 3, 4, 9, 11, 16, 19, 25.
To do this, we could just throw the two lists together, ignore the fact that
Linear Time
they’re separately arranged in ascending order, and run a sorting algorithm.
An algorithm that runs in O(n), or linear, time has a very natural property: But this clearly seems wasteful; we’d like to make use of the existing order in
its running time is at most a constant factor times the size of the input. One the input. One way to think about designing a better algorithm is to imagine
basic way to get an algorithm with this running time is to process the input performing the merging of the two lists by hand: suppose you’re given two
in a single pass, spending a constant amount of time on each item of input piles of numbered cards, each arranged in ascending order, and you’d like to
encountered. Other algorithms achieve a linear time bound for more subtle produce a single ordered pile containing all the cards. If you look at the top
reasons. To illustrate some of the ideas here, we c6nsider two simple linear- card on each stack, you know that the smaller of these two should go first on
time algorithms as examples. the output pile; so you could remove this card, place it on the output, and now
iterate on what’s left.
   In other words, we have the following algorithm.

      To merge sorted lists A = a1, ..., an and B = b1, ..., bn:
      Maintain a Current pointer into each list, initialized to
         point to the front elements
      While both lists are nonempty:
         Let ai and bj be the elements pointed to by the Current pointer
         Append the smaller of these two to the output list
         Advance the Current pointer in the list from which the
            smaller element was selected
      EndWhile
      Once one list is empty, append the remainder of the other list
         to the output

Computing the Maximum Computing the maximum of n numbers, for example, can be performed in the basic "one-pass" style. Suppose the numbers are provided as input in either a list or an array. We process the numbers a1, a2, ..., an in order, keeping a running estimate of the maximum as we go. Each time we encounter a number ai, we check whether ai is larger than our current estimate, and if so we update the estimate to ai.

      max = a1
      For i = 2 to n
         If ai > max then
            set max = ai
         Endif
      Endfor

In this way, we do constant work per element, for a total running time of O(n).
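The same merging procedure is easy to write directly in Python; this is our own rendering of the pseudo-code above, not code from the text.

   def merge(A, B):
       output, i, j = [], 0, 0
       while i < len(A) and j < len(B):      # both lists nonempty
           if A[i] <= B[j]:                  # append the smaller front element
               output.append(A[i]); i += 1
           else:
               output.append(B[j]); j += 1
       output.extend(A[i:])                  # one list empty: append the rest of the other
       output.extend(B[j:])
       return output

   print(merge([2, 3, 11, 19], [4, 9, 16, 25]))   # [2, 3, 4, 9, 11, 16, 19, 25]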
Sometimes the constraints of an application force this kind of one-pass See Figure 2.2 for a picture of this process.
algorithm on you--for example, an algorithm running on a high-speed switch
on the Internet may see a stream of packets flying past it, and it can try
computing anything it wants to as this stream passes by, but it can only perform
a constant amount of computational work on each packet, and it can’t save
the stream so as to make subsequent scans through it. Two different subareas
of algorithms, online algorithms and data stream algorithms, have developed
to study this model of computation.

Merging Two Sorted Lists Often, an algorithm has a running time of O(n),
but the reason is more complex. We now describe an algorithm for merging
two sorted lists that stretches the one-pass style of design just a little, but still
has a linear running time.

[Figure 2.2 To merge sorted lists A and B, we repeatedly extract the smaller item from the front of the two lists and append it to the output.]

Suppose we are given two lists of n numbers each, a1, a2, ..., an and b1, b2, ...,
bn, and each is already arranged in ascending order. We’d like to merge these into a single sorted list.

Now, to show a linear-time bound, one is tempted to describe an argument One also frequently encounters O(n log n) as a running time simply be-
like what worked for the maximum-finding algorithm: "We do constant work cause there are many algorithms whose most expensive step is to sort the
per element, for a total running time of O(n)." But it is actually not true that input. For example, suppose we are given a set of n time-stamps xl, x2 ..... xn
we do only constant work per element. Suppose that n is an even number, and on which copies of a file arrived at a server, and we’d like to find the largest
interval of time between the first and last of these time-stamps during which
consider the lists A = 1, 3, 5, ..., 2n − 1 and B = n, n + 2, n + 4, ..., 3n − 2. interval of time between the first and last of these time-stamps during which
The number b1 at the front of list B will sit at the front of the list for n/2 no copy of the file arrived. A simple solution to this problem is to first sort the
iterations while elements from A are repeatedly being selected, and hence time-stamps x1, x2, ..., xn and then process them in sorted order, determining
it will be involved in Ω(n) comparisons. Now, it is true that each element the sizes of the gaps between each number and its successor in ascending
can be involved in at most O(n) comparisons (at worst, it is compared with order. The largest of these gaps is the desired subinterval. Note that this algo-
each element in the other list), and if we sum this over all elements we get rithm requires O(rt log n) time to sort the numbers, and then it spends constant
a running-time bound of O(n2). This is a correct boflnd, but we can show work on each number in ascending order. In other words, the remainder of the
something much stronger. algorithm after sorting follows the basic recipe for linear time that we discussed
earlier.
The better way to argue is to bound the number of iterations of the While
loop by an "accounting" scheme. Suppose we charge the cost of each iteration
to the element that is selected and added to the output list. An element can Quadratic Time
be charged only once, since at the moment it is first charged, it is added Here’s a basic problem: suppose you are given n points in the plane, each
to the output and never seen again by the algorithm. But there are only 2n specified by (x, y) coordinates, and you’d like to find the pair of points that
elements total, and the cost of each iteration is accounted for by a charge to are closest together. The natural brute-force algorithm for this problem would,
some element, so there can be at most 2n iterations. Each iteration involves a enumerate all pairs of points, compute the distance between each pair, and
constant amount of work, so the total running time is O(n), as desired. then choose the pair for which this distance is smallest.
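The time-stamp example mentioned in this section follows the same recipe; the short sketch below is our own (the sample values are made up): it sorts the time-stamps in O(n log n) time and then spends constant work on each adjacent pair.

   def largest_gap(timestamps):
       xs = sorted(timestamps)                         # O(n log n)
       return max(b - a for a, b in zip(xs, xs[1:]))   # linear scan over successors

   print(largest_gap([7.0, 1.0, 4.5, 6.0]))   # 3.5, the gap between 1.0 and 4.5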
While this merging algorithm iterated through its input lists in order, the What is the running time of this algorithm? The number of pairs of points
"interleaved" way in which it processed the lists necessitated a slightly subtle is (~)_ n(n-1)2 , and since this quantity is bounded by ½n2, it is O(n2). More
running-time analysis. In Chapter 3 we will see linear-time algorithms for crudely, the number of pairs is O(n2) because we multiply the number of
graphs that have an even more complex flow of control: they spend a constant ways of choosing the first member of the pair (at most n) by the number
amount of time on each node and edge in the underlying graph, but the order of ways of choosing the second member of the pair (also at most n). The
in which they process the nodes and edges depends on the structure of the distance between points (xi, yi) and (xj, yj) can be computed by the formula
graph. ( (x~ - x/)2 + (y~ - yj)2 in constant time, so the overall running time is O(rt2).
This example illustrates a very common way in which a rtmning time of O(n2)
arises: performing a search over all pairs of input items and spending constant
O(rt log n) Time time per pair.
O(n log n) is also a very common running time, and in Chapter 5 we will Quadratic time also arises naturally from a pair of nested loops: An algo-
see one of the main reasons for its prevalence: it is the running time of any rithm consists of a !oop with O(n) iterations, and each iteration of the loop
algorithm that splits its input into two equa!-sized pieces, solves each piece launches an internal loop that takes O(n) time. Multiplying these two factors
recursively, and then combines the two solutions in linear time. of n together gives the running time.
Sorting is perhaps the most well-known example of a problem that can be The brute-force algorithm for finding the closest pair of points can be
solved this way. Specifically, the Mergesort algorithm divides the set of input written in an equivalent way with two nested loops:
numbers into two equal-sized pieces, sorts each half recursively, and then
merges the two sorted halves into a single sorted output list. We have just
For each input point (xi, yi)
seen that the merging can be done in linear time; and Chapter 5 will discuss
how to analyze the recursion so as to get a bound of O(n log n) on the overall For each other input point (xj, yj)
      Compute distance d = √((xi − xj)^2 + (yi − yj)^2)
running time.

If d is less than the current minimum, update minimum to d Report that Si and Sj are disjoint
Endfor Endif
Endfor Endfor
Endfor
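Written out in Python, the brute-force method looks as follows; this is our own rendering, not the faster algorithms of Chapters 5 and 13, and the sample points are made up.

   from math import dist, inf

   def closest_pair(points):
       best, best_pair = inf, None
       for i, p in enumerate(points):
           for q in points[i + 1:]:          # O(n**2) pairs in total
               d = dist(p, q)                # constant time per pair
               if d < best:
                   best, best_pair = d, (p, q)
       return best, best_pair

   print(closest_pair([(0, 0), (3, 4), (1, 1)]))   # (1.414..., ((0, 0), (1, 1)))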
Note how the "inner" loop, over (xj, yj), has O(n) iterations, each taking
constant time; and the "outer" loop, over (xi, yi), has O(n) iterations, each Each of the sets has maximum size O(n), so the innermost loop takes time
invoking the inner loop once. O(n). Looping over the sets S] involves O(n) iterations around this innermos~
It’s important to notice that the algorithm we’ve been discussing for the loop; and looping over the sets Si involves O(n) iterations around this. Multi-
Closest-Pair Problem really is just the brute-force approach: the natural search plying these three factors of n together, we get the running time of O(n3).
space for this problem has size O(n2), and _we’re simply enumerating it. At For this problem, there are algorithms that improve on O(n3) running
first, one feels there is a certain inevitability about thi~ quadratic algorithm-- time, but they are quite complicated. Furthermore, it is not clear whether
we have to measure all the distances, don’t we?--but in fact this is an illusion. the improved algorithms for this problem are practical on inputs of reasonable
In Chapter 5 we describe a very clever algorithm that finds the closest pair of size.
points in the plane in only O(n log n) time, and in Chapter 13 we show how
randomization can be used to reduce the running time to O(n).
O(nk) Time
In the same way that we obtained a running time of O(n2) by performing brute-
Cubic Time
force search over all pairs formed from a set of n items, we obtain a running
More elaborate sets of nested loops often lead to algorithms that run in time of O(nk) for any constant k when we search over all subsets of size k.
O(n3) time. Consider, for example, the following problem. We are given n sets
S1, S2, ..., Sn, each of which is a subset of {1, 2, ..., n}, and we would like
which we discussed in Chapter 1. Recall that a set of nodes is independent
to know whether some pair of these sets is disjoint--in other words, has no
if no two are joined by an edge. Suppose, in particular, that for some fixed
elements in common.
constant k, we would like to know if a given n-node input graph G has an
What is the running time needed to solve this problem? Let’s suppose that independent set of size k. The natural brute-force aigorithm for this problem
each set Si is represented in such a way that the elements of Si can be listed in would enumerate all subsets of k nodes, and for each subset S it would check
constant time per element, and we can also check in constanttime whether a whether there is an edge joining any two members of S. That is,
given number p belongs to Si. The following is a direct way to approach the
problem.
For each subset S of k nodes
Check whether S constitutes an independent set
   If S is an independent set then
      Stop and declare success
   Endif
Endfor

For each pair of sets Si and Sj
   Determine whether Si and Sj have an element in common
Endfor
This is a concrete algorithm, but to reason about its running time it helps to
If no k-node independent set was found then
open it up (at least conceptually) into three nested loops.
Declare failure
Endif
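A direct Python version of this brute-force search is sketched below (it is ours, and it assumes the graph is given as a node list and an edge list); it makes the roughly n^k-subset outer loop and the O(k^2) inner check explicit.

   from itertools import combinations

   def has_independent_set(nodes, edges, k):
       edge_set = {frozenset(e) for e in edges}
       for S in combinations(nodes, k):                  # about n**k / k! subsets
           if all(frozenset(pair) not in edge_set
                  for pair in combinations(S, 2)):       # O(k**2) pair checks
               return True                               # success: S is independent
       return False                                      # no k-node independent set found

   print(has_independent_set([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4)], 2))   # True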
For each set Si
   For each other set Sj
      For each element p of Si
         Determine whether p also belongs to Sj
      Endfor
      If no element of Si belongs to Sj then

To understand the running time of this algorithm, we need to consider two
quantities. First, the total number of k-element subsets in an n-element set is

   (n choose k) = n(n − 1)(n − 2) ⋯ (n − k + 1) / (k(k − 1)(k − 2) ⋯ (2)(1)) ≤ n^k / k!
of subsets of an n-element set is 2^n, and so the outer loop in this algorithm
Since we are treating k as a constant, this quantity is O(n^k). Thus, the outer will run for 2^n iterations as it tries all these subsets. Inside the loop, we are
loop in the algorithm above will run for O(n~) iterations as it tries all k-node
checking all pairs from a set S that can be as large as n nodes, so each iteration
subsets of the n nodes of the graph. of the !oop takes at most O(n2) time. Multiplying these two together, we get-a
Inside this loop, we need to test whether a given set S of k nodes constitutes rulming time of O(n22n).
an independent set. The definition of an independent set tells us that we need
Thus see that 2n arises naturally as a running time for a search algorithm
to check, for each pair of nodes, whether there is an edge joining them. Hence
that must consider all subsets. In the case of Independent Set, something
this is a search over pairs, like we saw earlier in the discussion of quadratic
at least nearly this inefficient appears to be necessary; but it’s important
time; it requires looking at (~2), that is, o(k2), pairs and spending constant time
to ke~p in mind that 2n is the size of the search space for many problems,
on each. and for many of them we wil! be able to find highly efficient polynomial-
Thus the total running time is O(k2n~). Since we are treating k as a constant time algorithms. For example, a brute-force search algorithm for the Interval
here, and since constants can be dropped in O(-) notation, we can write this Scheduling Problem that we saw in Chapter 1 would look very similar to the
running time as O(nk). algorithm above: try all subsets of intervals, and find the largest subset that has
Independent Set is a principal example of a problem believed to be compu- no overlaps. But in the case of the Interval Scheduling Problem, as opposed
tationally hard, and in particular it is believed that no algorithm to find k-node to the Independent Set Problem, we will see (in Chapter 4) how to find an
independent sets in arbitrary graphs can avoid having some dependence on k optimal solution in O(n log n) time. This is a recurring kind of dichotomy in
in the exponent. However, as we will discuss in Chapter 10 in the context of the study of algorithms: two algorithms can have very similar-looking search
a related problem, even once we’ve conceded that brute-force search over k- spaces, but in one case you’re able to bypass the brute-force search algorithm,
element subsets is necessary, there can be different ways of going about this and in the other you aren’t.
that lead to significant differences in the efficiency of the computation. The function n! grows even more rapidly than 2n, so it’s even more
menacing as a bound on the performance of an algorithm. Search spaces of
Beyond Polynomial Time size n! tend to arise for one of two reasons. First, n! is the number of ways to
The previous example of the Independent Set Problem starts us rapidly down match up n items with n other items--for example, it is the number of possible
the path toward running times that grow faster than any polynomial. In perfect matchings of n men with n women in an instance of the Stable Matching
particular, two kinds of bounds that coine up very frequently are 2n and Problem. To see this, note that there are n choices for how we can match up
the first man; having eliminated this option, there are n - 1 choices for how we
and we now discuss why this is so.
can match up the second man; having eliminated these two options, there are
Suppose, for example, that we are given a graph and want to find an n - 2 choices for how we can match up the third man; and so forth. Multiplying
independent set of maximum size (rather than testing for the existence of one all these choices out, we get n(n - 1)(n - 2) -- ¯ (2)(1) = n!
with a given number of nodes). Again, people don’t know of algorithms that
improve significantly on brute-force search, which in this case would look as Despite this enormous set of possible solutions, we were able to solve
the Stable Matching Problem in O(n2) iterations of the proposal algorithm.
fol!ows.
In Chapter 7, we will see a similar phenomenon for the Bipartite Matching
Problem we discussed earlier; if there are n nodes on each side of the given
For each subset S of nodes bipartite graph, there can be up to n! ways of pairing them up. However, by
Check whether S constitutes ~n independent set a fairly subtle search algorithm, we will be able to find the largest bipartite
If g is a larger independent set than the largest seen so far then
matching in O(n3) time.
~ecord the size of S as the current maximum
The function n! also arises in problems where the search space consists
Endif
of all ways to arrange n items in order. A basic problem in this genre is the
Endfor
Traveling Salesman Problem: given a set of n cities, with distances between
all pairs, what is the shortest tour that visits all cities? We assume that the
This is very much like the brute-force algorithm for k-node independent sets, salesman starts and ends at the first city, so the crux of the problem is the
except that now we are iterating over all subsets of the graph. The total number
56 Chapter 2 Basics of Algorithm Analysis 2.5 A More Complex Data Structure: Priority Queues
57

The function n! also arises in problems where the search space consists
of all ways to arrange n items in order. A basic problem in this genre is the
Traveling Salesman Problem: given a set of n cities, with distances between
all pairs, what is the shortest tour that visits all cities? We assume that the
salesman starts and ends at the first city, so the crux of the problem is the
implicit search over all orders of the remaining n - 1 cities, leading to a search
space of size (n - 1)!. In Chapter 8, we will see that Traveling Salesman
is another problem that, like Independent Set, belongs to the class of NP-complete
problems and is believed to have no efficient solution.

Sublinear Time

Finally, there are cases where one encounters running times that are asymptotically
smaller than linear. Since it takes linear time just to read the input,
these situations tend to arise in a model of computation where the input can be
"queried" indirectly rather than read completely, and the goal is to minimize
the amount of querying that must be done.

Perhaps the best-known example of this is the binary search algorithm.
Given a sorted array A of n numbers, we'd like to determine whether a given
number p belongs to the array. We could do this by reading the entire array,
but we'd like to do it much more efficiently, taking advantage of the fact that
the array is sorted, by carefully probing particular entries. In particular, we
probe the middle entry of A and get its value, say it is q, and we compare q
to p. If q = p, we're done. If q > p, then in order for p to belong to the array
A, it must lie in the lower half of A; so we ignore the upper half of A from
now on and recursively apply this search in the lower half. Finally, if q < p,
then we apply the analogous reasoning and recursively search in the upper
half of A.

The point is that in each step, there's a region of A where p might possibly
be; and we're shrinking the size of this region by a factor of two with every
probe. So how large is the "active" region of A after k probes? It starts at size
n, so after k probes it has size at most (1/2)^k n.

Given this, how long will it take for the size of the active region to be
reduced to a constant? We need k to be large enough so that (1/2)^k = O(1/n),
and to do this we can choose k = log2 n. Thus, when k = log2 n, the size of
the active region has been reduced to a constant, at which point the recursion
bottoms out and we can search the remainder of the array directly in constant
time.

So the running time of binary search is O(log n), because of this successive
shrinking of the search region. In general, O(log n) arises as a time bound
whenever we're dealing with an algorithm that does a constant amount of
work in order to throw away a constant fraction of the input. The crucial fact
is that O(log n) such iterations suffice to shrink the input down to constant
size, at which point the problem can generally be solved directly.
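The probing strategy just described is easy to write down. The following Python sketch (ours, not part of the text) returns True exactly when p occurs in the sorted array A, using one comparison per halving of the active region.

def binary_search(A, p):
    """Return True if p occurs in the sorted list A, using O(log n) probes."""
    lo, hi = 0, len(A) - 1
    while lo <= hi:
        mid = (lo + hi) // 2      # probe the middle entry
        q = A[mid]
        if q == p:
            return True
        elif q > p:
            hi = mid - 1          # p can only lie in the lower half
        else:
            lo = mid + 1          # p can only lie in the upper half
    return False

print(binary_search([2, 3, 5, 7, 11, 13], 7))   # True
print(binary_search([2, 3, 5, 7, 11, 13], 8))   # False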
2.5 A More Complex Data Structure: Priority Queues

Our primary goal in this book was expressed at the outset of the chapter:
we seek algorithms that improve qualitatively on brute-force search, and in
general we use polynomial-time solvability as the concrete formulation of
this. Typically, achieving a polynomial-time solution to a nontrivial problem
is not something that depends on fine-grained implementation details; rather,
the difference between exponential and polynomial is based on overcoming
higher-level obstacles. Once one has an efficient algorithm to solve a problem,
however, it is often possible to achieve further improvements in running time
by being careful with the implementation details, and sometimes by using
more complex data structures.

Some complex data structures are essentially tailored for use in a single
kind of algorithm, while others are more generally applicable. In this section,
we describe one of the most broadly useful sophisticated data structures,
the priority queue. Priority queues will be useful when we describe how to
implement some of the graph algorithms developed later in the book. For our
purposes here, it is a useful illustration of the analysis of a data structure that,
unlike lists and arrays, must perform some nontrivial processing each time it
is invoked.

The Problem

In the implementation of the Stable Matching algorithm in Section 2.3, we
discussed the need to maintain a dynamically changing set S (such as the set
of all free men in that case). In such situations, we want to be able to add
elements to and delete elements from the set S, and we want to be able to
select an element from S when the algorithm calls for it. A priority queue is
designed for applications in which elements have a priority value, or key, and
each time we need to select an element from S, we want to take the one with
highest priority.

A priority queue is a data structure that maintains a set of elements S,
where each element v in S has an associated value key(v) that denotes the
priority of element v; smaller keys represent higher priorities. Priority queues
support the addition and deletion of elements from the set, and also the
selection of the element with smallest key. Our implementation of priority
queues will also support some additional operations that we summarize at the
end of the section.

A motivating application for priority queues, and one that is useful to keep
in mind when considering their general function, is the problem of managing
real-time events such as the scheduling of processes on a computer. Each
process has a priority, or urgency, but processes do not arrive in order of
their priorities. Rather, we have a current set of active processes, and we want
to be able to extract the one with the currently highest priority and run it.
We can maintain the set of processes in a priority queue, with the key of a
process representing its priority value. Scheduling the highest-priority process
corresponds to selecting the element with minimum key from the priority
queue; concurrent with this, we will also be inserting new processes as they
arrive, according to their priority values.

How efficiently do we hope to be able to execute the operations in a priority
queue? We will show how to implement a priority queue containing at most
n elements at any time so that elements can be added and deleted, and the
element with minimum key selected, in O(log n) time per operation.

Before discussing the implementation, let us point out a very basic application
of priority queues that highlights why O(log n) time per operation is
essentially the "right" bound to aim for.

(2.11) A sequence of O(n) priority queue operations can be used to sort a set
of n numbers.

Proof. Set up a priority queue H, and insert each number into H with its value
as a key. Then extract the smallest number one by one until all numbers have
been extracted; this way, the numbers will come out of the priority queue in
sorted order.

Thus, with a priority queue that can perform insertion and the extraction
of minima in O(log n) per operation, we can sort n numbers in O(n log n)
time. It is known that, in a comparison-based model of computation (when
each operation accesses the input only by comparing a pair of numbers),
the time needed to sort must be at least proportional to n log n, so (2.11)
highlights a sense in which O(log n) time per operation is the best we can
hope for. We should note that the situation is a bit more complicated than
this: implementations of priority queues more sophisticated than the one we
present here can improve the running time needed for certain operations, and
add extra functionality. But (2.11) shows that any sequence of priority queue
operations that results in the sorting of n numbers must take time at least
proportional to n log n in total.
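A minimal sketch of the sorting procedure behind (2.11), here using Python's built-in binary-heap module heapq in place of the heap implementation developed below:

import heapq

def pq_sort(numbers):
    """Sort by inserting everything into a priority queue and then
    repeatedly extracting the minimum: O(n log n) total."""
    H = []
    for x in numbers:
        heapq.heappush(H, x)                             # insert, O(log n)
    return [heapq.heappop(H) for _ in range(len(H))]     # extract-min, n times

print(pq_sort([5, 1, 4, 2, 3]))   # [1, 2, 3, 4, 5]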
A Data Structure for Implementing a Priority Queue

We will use a data structure called a heap to implement a priority queue.
Before we discuss the structure of heaps, we should consider what happens
with some simpler, more natural approaches to implementing the functions
of a priority queue. We could just have the elements in a list, and separately
have a pointer labeled Min to the one with minimum key. This makes adding
new elements easy, but extraction of the minimum hard. Specifically, finding
the minimum is quick (we just consult the Min pointer), but after removing
this minimum element, we need to update the Min pointer to be ready for the
next operation, and this would require a scan of all elements in O(n) time to
find the new minimum.

This complication suggests that we should perhaps maintain the elements
in the sorted order of the keys. This makes it easy to extract the element with
smallest key, but now how do we add a new element to our set? Should we
have the elements in an array, or a linked list? Suppose we want to add s
with key value key(s). If the set S is maintained as a sorted array, we can use
binary search to find the array position where s should be inserted in O(log n)
time, but to insert s in the array, we would have to move all later elements
one position to the right. This would take O(n) time. On the other hand, if we
maintain the set as a sorted doubly linked list, we could insert it in O(1) time
into any position, but the doubly linked list would not support binary search,
and hence we may need up to O(n) time to find the position where s should
be inserted.

The Definition of a Heap  So in all these simple approaches, at least one of
the operations can take up to O(n) time, much more than the O(log n) per
operation that we're hoping for. This is where heaps come in. The heap data
structure combines the benefits of a sorted array and list for purposes of this
application. Conceptually, we think of a heap as a balanced binary tree as
shown on the left of Figure 2.3. The tree will have a root, and each node can
have up to two children, a left and a right child. The keys in such a binary tree
are said to be in heap order if the key of any element is at least as large as the
key of the element at its parent node in the tree. In other words,

  Heap order: For every element v, at a node i, the element w at i's parent
  satisfies key(w) <= key(v).

In Figure 2.3 the numbers in the nodes are the keys of the corresponding
elements.
Figure 2.3  Values in a heap shown as a binary tree on the left, and represented as an
array on the right. The arrows show the children for the top three nodes in the tree.

Before we discuss how to work with a heap, we need to consider what data
structure should be used to represent it. We can use pointers: each node at the
heap could keep the element it stores, its key, and three pointers pointing to
the two children and the parent of the heap node. We can avoid using pointers,
however, if a bound N is known in advance on the total number of elements
that will ever be in the heap at any one time. Such heaps can be maintained
in an array H indexed by i = 1, ..., N. We will think of the heap nodes as
corresponding to the positions in this array. H[1] is the root, and for any node
at position i, the children are the nodes at positions leftChild(i) = 2i and
rightChild(i) = 2i + 1. So the two children of the root are at positions 2 and
3, and the parent of a node at position i is at position parent(i) = floor(i/2). If
the heap has n <= N elements at some time, we will use the first n positions
of the array to store the n heap elements, and use length(H) to denote the
number of elements in H. This representation keeps the heap balanced at all
times. See the right-hand side of Figure 2.3 for the array representation of the
heap on the left-hand side.
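The index arithmetic can be spelled out directly. In the Python sketch below (ours), we keep the 1-based convention of the text by leaving slot 0 of the list unused, and we load the heap array shown in Figure 2.3.

def parent(i):
    return i // 2            # floor(i/2)

def left_child(i):
    return 2 * i

def right_child(i):
    return 2 * i + 1

# The heap of Figure 2.3, stored 1-based by padding index 0.
H = [None, 1, 2, 5, 10, 3, 7, 11, 15, 17, 20, 9, 15, 8, 16]
print(H[1], H[left_child(1)], H[right_child(1)])   # 1 2 5: the root and its children
print(H[parent(5)])                                # 2: the parent of position 5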
Implementing the Heap Operations

The heap element with smallest key is at the root, so it takes O(1) time to
identify the minimal element. How do we add or delete heap elements? First
consider adding a new heap element v, and assume that our heap H has n < N
elements so far. Now it will have n + 1 elements. To start with, we can add the
new element v to the final position i = n + 1, by setting H[i] = v. Unfortunately,
this does not maintain the heap property, as the key of element v may be
smaller than the key of its parent. So we now have something that is almost a
heap, except for a small "damaged" part where v was pasted on at the end.

We will use the procedure Heapify-up to fix our heap. Let j = parent(i) =
floor(i/2) be the parent of the node i, and assume H[j] = w. If key[v] < key[w],
then we will simply swap the positions of v and w. This will fix the heap
property at position i, but the resulting structure will possibly fail to satisfy
the heap property at position j; in other words, the site of the "damage" has
moved upward from i to j. We thus call the process recursively from position
j = parent(i) to continue fixing the heap by pushing the damaged part upward.
Figure 2.4 shows the first two steps of the process after an insertion.

Figure 2.4  The Heapify-up process. Key 3 (at position 16) is too small (on the left).
After swapping keys 3 and 11, the heap violation moves one step closer to the root of
the tree (on the right).

  Heapify-up(H, i):
    If i > 1 then
      let j = parent(i) = floor(i/2)
      If key[H[i]] < key[H[j]] then
        swap the array entries H[i] and H[j]
        Heapify-up(H, j)
      Endif
    Endif

To see why Heapify-up works, eventually restoring the heap order, it
helps to understand more fully the structure of our slightly damaged heap in
the middle of this process. Assume that H is an array, and v is the element in
position i. We say that H is almost a heap with the key of H[i] too small, if there
is a value α ≥ key(v) such that raising the value of key(v) to α would make
the resulting array satisfy the heap property. (In other words, element v in H[i]
is too small, but raising it to α would fix the problem.) One important point
to note is that if H is almost a heap with the key of the root (i.e., H[1]) too
small, then in fact it is a heap. To see why this is true, consider that if raising
the value of H[1] to α would make H a heap, then the value of H[1] must
also be smaller than both its children, and hence it already has the heap-order
property.
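For concreteness, here is the same procedure in executable form: a Python sketch of the pseudocode above, written iteratively rather than recursively. The function names and the 1-based layout with an unused slot 0 are our own conventions, not the book's.

def heapify_up(H, i):
    """Restore heap order when H is almost a heap with the key of H[i] too small.
    Follows the path from position i toward the root: O(log i) swaps."""
    while i > 1:
        j = i // 2                      # j = parent(i)
        if H[i] < H[j]:
            H[i], H[j] = H[j], H[i]     # swap with the parent
            i = j                       # the "damage" has moved up to position j
        else:
            break

def insert(H, v):
    """Add v at the last position and fix the heap property upward."""
    H.append(v)
    heapify_up(H, len(H) - 1)

H = [None, 1, 2, 5, 10, 3]    # a small heap, stored 1-based
insert(H, 0)
print(H[1])                   # 0: the new minimum has risen to the root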

(2.12) The procedure Heapify-up(H, i) fixes the heap property in O(log i)
time, assuming that the array H is almost a heap with the key of H[i] too small.
Using Heapify-up we can insert a new element in a heap of n elements in
O(log n) time.

Proof. We prove the statement by induction on i. If i = 1 there is nothing to
prove, since we have already argued that in this case H is actually a heap.
Now consider the case in which i > 1: Let v = H[i], j = parent(i), w = H[j],
and β = key(w). Swapping elements v and w takes O(1) time. We claim that
after the swap, the array H is either a heap or almost a heap with the key of
H[j] (which now holds v) too small. This is true, as setting the key value at
node j to β would make H a heap.

So by the induction hypothesis, applying Heapify-up(j) recursively will
produce a heap as required. The process follows the tree-path from position i
to the root, so it takes O(log i) time.

To insert a new element in a heap, we first add it as the last element. If the
new element has a very large key value, then the array is a heap. Otherwise,
it is almost a heap with the key value of the new element too small. We use
Heapify-up to fix the heap property.

Now consider deleting an element. Many applications of priority queues
don't require the deletion of arbitrary elements, but only the extraction of
the minimum. In a heap, this corresponds to identifying the key at the root
(which will be the minimum) and then deleting it; we will refer to this operation
as ExtractMin(H). Here we will implement a more general operation
Delete(H, i), which will delete the element in position i. Assume the heap
currently has n elements. After deleting the element H[i], the heap will have
only n - 1 elements; and not only is the heap-order property violated, there
is actually a "hole" at position i, since H[i] is now empty. So as a first step,
to patch the hole in H, we move the element w in position n to position i.
After doing this, H at least has the property that its n - 1 elements are in the
first n - 1 positions, as required, but we may well still not have the heap-order
property.

However, the only place in the heap where the order might be violated is
position i, as the key of element w may be either too small or too big for the
position i. If the key is too small (that is, the violation of the heap property is
between node i and its parent), then we can use Heapify-up(i) to reestablish
the heap order. On the other hand, if key[w] is too big, the heap property
may be violated between i and one or both of its children. In this case, we will
use a procedure called Heapify-down, closely analogous to Heapify-up, that
swaps the element at position i with one of its children and proceeds down
the tree recursively. Figure 2.5 shows the first steps of this process.

Figure 2.5  The Heapify-down process. Key 21 (at position 3) is too big (on the left).
After swapping keys 21 and 7, the heap violation moves one step closer to the bottom
of the tree (on the right).

  Heapify-down(H, i):
    Let n = length(H)
    If 2i > n then
      Terminate with H unchanged
    Else if 2i < n then
      Let left = 2i and right = 2i + 1
      Let j be the index that minimizes key[H[left]] and key[H[right]]
    Else if 2i = n then
      Let j = 2i
    Endif
    If key[H[j]] < key[H[i]] then
      swap the array entries H[i] and H[j]
      Heapify-down(H, j)
    Endif

Assume that H is an array and w is the element in position i. We say that
H is almost a heap with the key of H[i] too big, if there is a value α ≤ key(w)
such that lowering the value of key(w) to α would make the resulting array
satisfy the heap property. Note that if H[i] corresponds to a leaf in the heap
(i.e., it has no children), and H is almost a heap with H[i] too big, then in fact
H is a heap. Indeed, if lowering the value in H[i] would make H a heap, then
H[i] is already larger than its parent and hence it already has the heap-order
property.

(2.13) The procedure Heapify-down(H, i) fixes the heap property in O(log n)
time, assuming that H is almost a heap with the key value of H[i] too big. Using
Heapify-up or Heapify-down we can delete an element from a heap of n
elements in O(log n) time.

Proof. We prove that the process fixes the heap by reverse induction on the
value i. Let n be the number of elements in the heap. If 2i > n, then, as we
just argued above, H is a heap and hence there is nothing to prove. Otherwise,
let j be the child of i with smaller key value, and let w = H[j]. Swapping the
array elements w and v takes O(1) time. We claim that the resulting array is
either a heap or almost a heap with H[j] = v too big. This is true as setting
key(v) = key(w) would make H a heap. Now j ≥ 2i, so by the induction
hypothesis, the recursive call to Heapify-down fixes the heap property.

The algorithm repeatedly swaps the element originally at position i down,
following a tree-path, so in O(log n) iterations the process results in a heap.

To use the process to remove an element v = H[i] from the heap, we replace
H[i] with the last element in the array, H[n] = w. If the resulting array is not a
heap, it is almost a heap with the key value of H[i] either too small or too big.
We use Heapify-up or Heapify-down to fix the heap property in O(log n)
time.
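A matching Python sketch of Heapify-down (again ours, with the same 1-based layout as before); combined with heapify_up from the earlier sketch it yields Delete and ExtractMin as described in the surrounding text.

def heapify_down(H, i):
    """Restore heap order when H is almost a heap with the key of H[i] too big."""
    n = len(H) - 1                       # number of elements (index 0 unused)
    while 2 * i <= n:
        left, right = 2 * i, 2 * i + 1
        j = left
        if right <= n and H[right] < H[left]:
            j = right                    # the child with the smaller key
        if H[j] < H[i]:
            H[i], H[j] = H[j], H[i]
            i = j
        else:
            break

def delete(H, i):
    """Delete the element in position i, patching the hole with the last element."""
    H[i] = H[-1]
    H.pop()
    if i < len(H):                       # if the position still exists,
        heapify_up(H, i)                 # one of these two calls is a no-op
        heapify_down(H, i)

def extract_min(H):
    root = H[1]
    delete(H, 1)
    return root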

Implementing Priority Queues with Heaps

The heap data structure with the Heapify-down and Heapify-up operations
can efficiently implement a priority queue that is constrained to hold at most
N elements at any point in time. Here we summarize the operations we will
use.

- StartHeap(N) returns an empty heap H that is set up to store at most N
  elements. This operation takes O(N) time, as it involves initializing the
  array that will hold the heap.
- Insert(H, v) inserts the item v into heap H. If the heap currently has n
  elements, this takes O(log n) time.
- FindMin(H) identifies the minimum element in the heap H but does not
  remove it. This takes O(1) time.
- Delete(H, i) deletes the element in heap position i. This is implemented
  in O(log n) time for heaps that have n elements.
- ExtractMin(H) identifies and deletes an element with minimum key
  value from a heap. This is a combination of the preceding two operations,
  and so it takes O(log n) time.

There is a second class of operations in which we want to operate on
elements by name, rather than by their position in the heap. For example, in
a number of graph algorithms that use heaps, the heap elements are nodes of
the graph with key values that are computed during the algorithm. At various
points in these algorithms, we want to operate on a particular node, regardless
of where it happens to be in the heap.

To be able to access given elements of the priority queue efficiently, we
simply maintain an additional array Position that stores the current position
of each element (each node) in the heap. We can now implement the following
further operations.

- To delete the element v, we apply Delete(H, Position[v]). Maintaining
  this array does not increase the overall running time, and so we can
  delete an element v from a heap with n nodes in O(log n) time.
- An additional operation that is used by some algorithms is ChangeKey(H, v, α),
  which changes the key value of element v to key(v) = α. To
  implement this operation in O(log n) time, we first need to be able to
  identify the position of element v in the array, which we do by using
  the array Position. Once we have identified the position of element v,
  we change the key and then apply Heapify-up or Heapify-down as
  appropriate.
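To illustrate how the Position array ties these operations together, here is a compact Python sketch of a heap-based priority queue addressed by element name. The class and method names are our own; it is an illustration of the scheme described above, not the book's reference implementation.

class HeapPQ:
    """Heap-based priority queue storing (key, element) pairs plus a position map."""

    def __init__(self):
        self.H = [None]            # 1-based array of (key, element) pairs
        self.pos = {}              # element -> its current position in H

    def _swap(self, i, j):
        self.H[i], self.H[j] = self.H[j], self.H[i]
        self.pos[self.H[i][1]] = i
        self.pos[self.H[j][1]] = j

    def _up(self, i):
        while i > 1 and self.H[i][0] < self.H[i // 2][0]:
            self._swap(i, i // 2)
            i //= 2

    def _down(self, i):
        n = len(self.H) - 1
        while 2 * i <= n:
            j = 2 * i
            if j + 1 <= n and self.H[j + 1][0] < self.H[j][0]:
                j += 1                      # child with the smaller key
            if self.H[j][0] < self.H[i][0]:
                self._swap(i, j)
                i = j
            else:
                break

    def insert(self, v, key):
        self.H.append((key, v))
        self.pos[v] = len(self.H) - 1
        self._up(self.pos[v])

    def find_min(self):
        return self.H[1][1]

    def delete(self, v):
        i = self.pos.pop(v)
        last = self.H.pop()
        if i < len(self.H):                 # patch the hole with the last element
            self.H[i] = last
            self.pos[last[1]] = i
            self._up(i)
            self._down(i)

    def extract_min(self):
        v = self.find_min()
        self.delete(v)
        return v

    def change_key(self, v, key):
        i = self.pos[v]
        self.H[i] = (key, v)
        self._up(i)
        self._down(i)

pq = HeapPQ()
for name, key in [("b", 5), ("a", 2), ("c", 9)]:
    pq.insert(name, key)
pq.change_key("c", 1)
print(pq.extract_min())   # 'c'
print(pq.extract_min())   # 'a'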
Solved Exercises

Solved Exercise 1

Take the following list of functions and arrange them in ascending order of
growth rate. That is, if function g(n) immediately follows function f(n) in
your list, then it should be the case that f(n) is O(g(n)).

  f1(n) = 10^n
  f2(n) = n^(1/3)
  f3(n) = n^n
  f4(n) = log2 n
  f5(n) = 2^(sqrt(log2 n))

Solution  We can deal with functions f1, f2, and f4 very easily, since they
belong to the basic families of exponentials, polynomials, and logarithms.
In particular, by (2.8), we have f4(n) = O(f2(n)); and by (2.9), we have
f2(n) = O(f1(n)).

Now, the function f3 isn't so hard to deal with. It starts out smaller than
10^n, but once n ≥ 10, then clearly 10^n ≤ n^n. This is exactly what we need for
the definition of O(.) notation: for all n ≥ 10, we have 10^n ≤ c·n^n, where in this
case c = 1, and so 10^n = O(n^n).

Finally, we come to function f5, which is admittedly kind of strange-looking.
A useful rule of thumb in such situations is to try taking logarithms
to see whether this makes things clearer. In this case, log2 f5(n) = sqrt(log2 n) =
(log2 n)^(1/2). What do the logarithms of the other functions look like? log f4(n) =
log2 log2 n, while log f2(n) = (1/3) log2 n. All of these can be viewed as functions
of log2 n, and so using the notation z = log2 n, we can write

  log f2(n) = (1/3) z
  log f4(n) = log2 z
  log f5(n) = z^(1/2)

Now it's easier to see what's going on. First, for z ≥ 16, we have log2 z ≤
z^(1/2). But the condition z ≥ 16 is the same as n ≥ 2^16 = 65,536; thus once
n ≥ 2^16 we have log f4(n) ≤ log f5(n), and so f4(n) ≤ f5(n). Thus we can write
f4(n) = O(f5(n)). Similarly we have z^(1/2) ≤ (1/3) z once z ≥ 9; in other words,
once n ≥ 2^9 = 512. For n above this bound we have log f5(n) ≤ log f2(n) and
hence f5(n) ≤ f2(n), and so we can write f5(n) = O(f2(n)). Essentially, we
have discovered that 2^(sqrt(log2 n)) is a function whose growth rate lies somewhere
between that of logarithms and polynomials.

Since we have sandwiched f5 between f4 and f2, this finishes the task of
putting the functions in order.
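As a quick numerical sanity check (ours, not part of the solution), one can compare the logarithms of the five functions at a moderately large n, exactly the change of variables used above, and watch the ordering f4, f5, f2, f1, f3 emerge.

import math

def logs(n):
    """log base 2 of each function, using the substitution z = log2 n."""
    z = math.log2(n)
    return {
        "f4 = log2 n":          math.log2(z),
        "f5 = 2^sqrt(log2 n)":  math.sqrt(z),
        "f2 = n^(1/3)":         z / 3,
        "f1 = 10^n":            n * math.log2(10),
        "f3 = n^n":             n * z,
    }

for name, value in logs(2 ** 20).items():
    print(f"{name:22s} log2 f(n) = {value:.1f}")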
Solved Exercise 2

Let f and g be two functions that take nonnegative values, and suppose that
f = O(g). Show that g = Ω(f).

Solution  This exercise is a way to formalize the intuition that O(.) and Ω(.)
are in a sense opposites. It is, in fact, not difficult to prove; it is just a matter
of unwinding the definitions.

We're given that, for some constants c and n0, we have f(n) ≤ c g(n) for
all n ≥ n0. Dividing both sides by c, we can conclude that g(n) ≥ (1/c) f(n) for
all n ≥ n0. But this is exactly what is required to show that g = Ω(f): we have
established that g(n) is at least a constant multiple of f(n) (where the constant
is 1/c), for all sufficiently large n (at least n0).

Exercises

1. Suppose you have algorithms with the five running times listed below.
   (Assume these are the exact running times.) How much slower do each of
   these algorithms get when you (a) double the input size, or (b) increase
   the input size by one?
   (a) n^2
   (b) n^3
   (c) 100n^2
   (d) n log n
   (e) 2^n

2. Suppose you have algorithms with the six running times listed below.
   (Assume these are the exact number of operations performed as a function
   of the input size n.) Suppose you have a computer that can perform
   10^10 operations per second, and you need to compute a result in at most
   an hour of computation. For each of the algorithms, what is the largest
   input size n for which you would be able to get the result within an hour?
   (a) n^2
   (b) n^3
   (c) 100n^2
   (d) n log n
   (e) 2^n
   (f) 2^(2^n)

3. Take the following list of functions and arrange them in ascending order
   of growth rate. That is, if function g(n) immediately follows function f(n)
   in your list, then it should be the case that f(n) is O(g(n)).
   f1(n) = n^2.5
   f2(n) = sqrt(2n)
   f3(n) = n + 10
   f4(n) = 10^n
   f5(n) = 100^n
   f6(n) = n^2 log n

4. Take the following list of functions and arrange them in ascending order
   of growth rate. That is, if function g(n) immediately follows function f(n)
   in your list, then it should be the case that f(n) is O(g(n)).

   g1(n) = 2^(sqrt(log n))
   g2(n) = 2^n
   g3(n) = n (log n)^3
   g4(n) = n^(4/3)
   g5(n) = n^(log n)
   g6(n) = 2^(2^n)
   g7(n) = 2^(n^2)

5. Assume you have functions f and g such that f(n) is O(g(n)). For each of
   the following statements, decide whether you think it is true or false and
   give a proof or counterexample.
   (a) log2 f(n) is O(log2 g(n)).
   (b) 2^(f(n)) is O(2^(g(n))).
   (c) f(n)^2 is O(g(n)^2).

6. Consider the following basic problem. You're given an array A consisting
   of n numbers A[1], A[2], ..., A[n]. You'd like to output a two-dimensional
   n-by-n array B in which B[i,j] (for i < j) contains the sum of array entries
   A[i] through A[j]; that is, the sum A[i] + A[i+1] + ... + A[j]. (The value of
   array entry B[i,j] is left unspecified whenever i ≥ j, so it doesn't matter
   what is output for these values.)
   Here's a simple algorithm to solve this problem.

     For i = 1, 2, ..., n
       For j = i+1, i+2, ..., n
         Add up array entries A[i] through A[j]
         Store the result in B[i,j]
       Endfor
     Endfor

   (a) For some function f that you should choose, give a bound of the
       form O(f(n)) on the running time of this algorithm on an input of
       size n (i.e., a bound on the number of operations performed by the
       algorithm).
   (b) For this same function f, show that the running time of the algorithm
       on an input of size n is also Ω(f(n)). (This shows an asymptotically
       tight bound of Θ(f(n)) on the running time.)
   (c) Although the algorithm you analyzed in parts (a) and (b) is the most
       natural way to solve the problem (after all, it just iterates through
       the relevant entries of the array B, filling in a value for each), it
       contains some highly unnecessary sources of inefficiency. Give a
       different algorithm to solve this problem, with an asymptotically
       better running time. In other words, you should design an algorithm
       with running time O(g(n)), where lim_{n -> infinity} g(n)/f(n) = 0.

7. There's a class of folk songs and holiday songs in which each verse
   consists of the previous verse, with one extra line added on. "The Twelve
   Days of Christmas" has this property; for example, when you get to the
   fifth verse, you sing about the five golden rings and then, reprising the
   lines from the fourth verse, also cover the four calling birds, the three
   French hens, the two turtle doves, and of course the partridge in the pear
   tree. The Aramaic song "Had gadya" from the Passover Haggadah works
   like this as well, as do many other songs.
   These songs tend to last a long time, despite having relatively short
   scripts. In particular, you can convey the words plus instructions for one
   of these songs by specifying just the new line that is added in each verse,
   without having to write out all the previous lines each time. (So the phrase
   "five golden rings" only has to be written once, even though it will appear
   in verses five and onward.)
   There's something asymptotic that can be analyzed here. Suppose,
   for concreteness, that each line has a length that is bounded by a constant
   c, and suppose that the song, when sung out loud, runs for n words total.
   Show how to encode such a song using a script that has length f(n), for
   a function f(n) that grows as slowly as possible.

8. You're doing some stress-testing on various models of glass jars to
   determine the height from which they can be dropped and still not break.
   The setup for this experiment, on a particular type of jar, is as follows.
   You have a ladder with n rungs, and you want to find the highest rung
   from which you can drop a copy of the jar and not have it break. We call
   this the highest safe rung.
   It might be natural to try binary search: drop a jar from the middle
   rung, see if it breaks, and then recursively try from rung n/4 or 3n/4
   depending on the outcome. But this has the drawback that you could
   break a lot of jars in finding the answer.
   If your primary goal were to conserve jars, on the other hand, you
   could try the following strategy. Start by dropping a jar from the first
   rung, then the second rung, and so forth, climbing one higher each time
   until the jar breaks. In this way, you only need a single jar; at the moment
   it breaks, you have the correct answer, but you may have to drop it n
   times (rather than log n as in the binary search solution).
   So here is the trade-off: it seems you can perform fewer drops if
   you're willing to break more jars. To understand better how this trade-off
   works at a quantitative level, let's consider how to run this experiment
   given a fixed "budget" of k ≥ 1 jars. In other words, you have to determine
   the correct answer (the highest safe rung) and can use at most k jars in
   doing so.
   (a) Suppose you are given a budget of k = 2 jars. Describe a strategy for
       finding the highest safe rung that requires you to drop a jar at most
       f(n) times, for some function f(n) that grows slower than linearly. (In
       other words, it should be the case that lim_{n -> infinity} f(n)/n = 0.)
   (b) Now suppose you have a budget of k > 2 jars, for some given k.
       Describe a strategy for finding the highest safe rung using at most
       k jars. If fk(n) denotes the number of times you need to drop a jar
       according to your strategy, then the functions f1, f2, f3, ... should have
       the property that each grows asymptotically slower than the previous
       one: lim_{n -> infinity} fk(n)/f_{k-1}(n) = 0 for each k.

Notes and Further Reading


Polynomial-time solvability emerged as a formal notion of efficiency by a
gradual process, motivated by the work of a number of researchers includ-
ing Cobham, Rabin, Edmonds, Hartmanis, and Stearns. The survey by Sipser
(1992) provides both a historical and technical perspective on these develop-
ments. Similarly, the use of asymptotic order of growth notation to bound the
running time of algorithms--as opposed to working out exact formulas with
leading coefficients and lower-order terms--is a modeling decision that was
quite non-obvious at the time it was introduced; Tarjan’s Turing Award lecture
(1987) offers an interesting perspective on the early thinking of researchers
including Hopcroft, Tarjan, and others on this issue. Further discussion of
asymptotic notation and the growth of basic functions can be found in Knuth
(1997a).
The implementation of priority queues using heaps, and the application to
sorting, is generally credited to Williams (1964) and Floyd (1964). The priority
queue is an example of a nontrivial data structure with many applications; in
later chapters we will discuss other data structures as they become useful for
the implementation of particular algorithms. We will consider the Union-Find
data structure in Chapter 4 for implementing an algorithm to find minimum-cost
spanning trees, and we will discuss randomized hashing in Chapter 13.
A number of other data structures are discussed in the book by Tarjan (1983).
The LEDA library (Library of Efficient Datatypes and Algorithms) of Mehlhorn
and Näher (1999) offers an extensive library of data structures useful in
combinatorial and geometric applications.

Notes on the Exercises  Exercise 8 is based on a problem we learned from
Sam Toueg.
3
Graphs

Our focus in this book is on problems with a discrete flavor. Just as continuous
mathematics is concerned with certain basic structures such as real numbers,
vectors, and matrices, discrete mathematics has developed basic combinatorial
structures that lie at the heart of the subject. One of the most fundamental and
expressive of these is the graph.
The more one works with graphs, the more one tends to see them ev-
erywhere. Thus, we begin by introducing the basic definitions surrounding
graphs, and list a spectrum of different algorithmic settings where graphs arise
naturally. We then discuss some basic algorithmic primitives for graphs, be-
ginning with the problem of connectivity and developing some fundamental
graph search techniques.

3.1 Basic Definitions and Applications


Recall from Chapter 1 that a graph G is simply a way of encoding pairwise
relationships among a set of objects: it consists of a collection V of nodes
and a collection E of edges, each of which "joins" two of the nodes. We thus
represent an edge e ∈ E as a two-element subset of V: e = {u, v} for some
u, v ∈ V, where we call u and v the ends of e.

Edges in a graph indicate a symmetric relationship between their ends.
Often we want to encode asymmetric relationships, and for this we use the
closely related notion of a directed graph. A directed graph G' consists of a set
of nodes V and a set of directed edges E'. Each e' ∈ E' is an ordered pair (u, v);
in other words, the roles of u and v are not interchangeable, and we call u the
tail of the edge and v the head. We will also say that edge e' leaves node u and
enters node v.
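Before going on, it may help to see how these definitions translate into a concrete representation. The dictionary-of-sets encoding in the Python sketch below is one simple choice of ours; a data structure for representing graphs as input to an algorithm is developed later in the chapter.

def make_undirected(nodes, edges):
    """Adjacency sets for an undirected graph: each edge {u, v} is recorded both ways."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def make_directed(nodes, edges):
    """Adjacency sets for a directed graph: edge (u, v) leaves u and enters v."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
    return adj

G = make_undirected([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4), (4, 1)])
print(G[1])   # {2, 4}: the nodes sharing an edge with node 1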
When we want to emphasize that the graph we are considering is not
directed, we will call it an undirected graph; by default, however, the term
"graph" will mean an undirected graph. It is also worth mentioning two
warnings in our use of graph terminology. First, although an edge e in an
undirected graph should properly be written as a set of nodes {u, v}, one will
more often see it written (even in this book) in the notation used for ordered
pairs: e = (u, v). Second, a node in a graph is also frequently called a vertex;
in this context, the two words have exactly the same meaning.

Examples of Graphs  Graphs are very simple to define: we just take a collection
of things and join some of them by edges. But at this level of abstraction,
it's hard to appreciate the typical kinds of situations in which they arise. Thus,
we propose the following list of specific contexts in which graphs serve as
important models. The list covers a lot of ground, and it's not important to
remember everything on it; rather, it will provide us with a lot of useful examples
against which to check the basic definitions and algorithmic problems
that we'll be encountering later in the chapter. Also, in going through the list,
it's useful to digest the meaning of the nodes and the meaning of the edges in
the context of the application. In some cases the nodes and edges both correspond
to physical objects in the real world, in others the nodes are real objects
while the edges are virtual, and in still others both nodes and edges are pure
abstractions.

1. Transportation networks. The map of routes served by an airline carrier
   naturally forms a graph: the nodes are airports, and there is an edge from
   u to v if there is a nonstop flight that departs from u and arrives at v.
   Described this way, the graph is directed; but in practice when there is an
   edge (u, v), there is almost always an edge (v, u), so we would not lose
   much by treating the airline route map as an undirected graph with edges
   joining pairs of airports that have nonstop flights each way. Looking at
   such a graph (you can generally find them depicted in the backs of in-flight
   airline magazines), we'd quickly notice a few things: there are often
   a small number of hubs with a very large number of incident edges; and
   it's possible to get between any two nodes in the graph via a very small
   number of intermediate stops.
   Other transportation networks can be modeled in a similar way. For
   example, we could take a rail network and have a node for each terminal,
   and an edge joining u and v if there's a section of railway track that
   goes between them without stopping at any intermediate terminal. The
   standard depiction of the subway map in a major city is a drawing of
   such a graph.

2. Communication networks. A collection of computers connected via a
   communication network can be naturally modeled as a graph in a few
   different ways. First, we could have a node for each computer and
   an edge joining u and v if there is a direct physical link connecting
   them. Alternatively, for studying the large-scale structure of the Internet,
   people often define a node to be the set of all machines controlled by
   a single Internet service provider, with an edge joining u and v if there
   is a direct peering relationship between them; roughly, an agreement
   to exchange data under the standard BGP protocol that governs global
   Internet routing. Note that this latter network is more "virtual" than
   the former, since the links indicate a formal agreement in addition to
   a physical connection.
   In studying wireless networks, one typically defines a graph where
   the nodes are computing devices situated at locations in physical space,
   and there is an edge from u to v if v is close enough to u to receive a signal
   from it. Note that it's often useful to view such a graph as directed, since
   it may be the case that v can hear u's signal but u cannot hear v's signal
   (if, for example, u has a stronger transmitter). These graphs are also
   interesting from a geometric perspective, since they roughly correspond
   to putting down points in the plane and then joining pairs that are close
   together.

3. Information networks. The World Wide Web can be naturally viewed as a
   directed graph, in which nodes correspond to Web pages and there is an
   edge from u to v if u has a hyperlink to v. The directedness of the graph
   is crucial here; many pages, for example, link to popular news sites,
   but these sites clearly do not reciprocate all these links. The structure of
   all these hyperlinks can be used by algorithms to try inferring the most
   important pages on the Web, a technique employed by most current
   search engines.
   The hypertextual structure of the Web is anticipated by a number of
   information networks that predate the Internet by many decades. These
   include the network of cross-references among articles in an encyclopedia
   or other reference work, and the network of bibliographic citations
   among scientific papers.

4. Social networks. Given any collection of people who interact (the employees
   of a company, the students in a high school, or the residents of
   a small town), we can define a network whose nodes are people, with
   an edge joining u and v if they are friends with one another. We could
   have the edges mean a number of different things instead of friendship:
   the undirected edge (u, v) could mean that u and v have had a romantic
   relationship or a financial relationship; the directed edge (u, v) could
   mean that u seeks advice from v, or that u lists v in his or her e-mail
   address book. One can also imagine bipartite social networks based on a
   notion of affiliation: given a set X of people and a set Y of organizations,
   we could define an edge between u ∈ X and v ∈ Y if person u belongs to
   organization v.
   Networks such as this are used extensively by sociologists to study
   the dynamics of interaction among people. They can be used to identify
   the most "influential" people in a company or organization, to model
   trust relationships in a financial or political setting, and to track the
   spread of fads, rumors, jokes, diseases, and e-mail viruses.

5. Dependency networks. It is natural to define directed graphs that capture
   the interdependencies among a collection of objects. For example, given
   the list of courses offered by a college or university, we could have a
   node for each course and an edge from u to v if u is a prerequisite for v.
   Given a list of functions or modules in a large software system, we could
   have a node for each function and an edge from u to v if u invokes v by a
   function call. Or given a set of species in an ecosystem, we could define
   a graph (a food web) in which the nodes are the different species and
   there is an edge from u to v if u consumes v.

This is far from a complete list, too far to even begin tabulating its
omissions. It is meant simply to suggest some examples that are useful to
keep in mind when we start thinking about graphs in an algorithmic context.

Paths and Connectivity  One of the fundamental operations in a graph is
that of traversing a sequence of nodes connected by edges. In the examples
just listed, such a traversal could correspond to a user browsing Web pages by
following hyperlinks; a rumor passing by word of mouth from you to someone
halfway around the world; or an airline passenger traveling from San Francisco
to Rome on a sequence of flights.

With this notion in mind, we define a path in an undirected graph
G = (V, E) to be a sequence P of nodes v1, v2, ..., v_{k-1}, v_k with the property
that each consecutive pair v_i, v_{i+1} is joined by an edge in G. P is often called
a path from v1 to v_k, or a v1-v_k path. For example, the nodes 4, 2, 1, 7, 8 form
a path in Figure 3.1. A path is called simple if all its vertices are distinct from
one another. A cycle is a path v1, v2, ..., v_{k-1}, v_k in which k > 2, the first k - 1
nodes are all distinct, and v1 = v_k; in other words, the sequence of nodes
"cycles back" to where it began. All of these definitions carry over naturally
to directed graphs, with the following change: in a directed path or cycle,
each pair of consecutive nodes has the property that (v_i, v_{i+1}) is an edge. In
other words, the sequence of nodes in the path or cycle must respect the
directionality of edges.

We say that an undirected graph is connected if, for every pair of nodes u
and v, there is a path from u to v. Choosing how to define connectivity of a
directed graph is a bit more subtle, since it's possible for u to have a path to
v while v has no path to u. We say that a directed graph is strongly connected
if, for every two nodes u and v, there is a path from u to v and a path from v
to u.

In addition to simply knowing about the existence of a path between some
pair of nodes u and v, we may also want to know whether there is a short path.
Thus we define the distance between two nodes u and v to be the minimum
number of edges in a u-v path. (We can designate some symbol like ∞ to
denote the distance between nodes that are not connected by a path.) The
term distance here comes from imagining G as representing a communication
or transportation network; if we want to get from u to v, we may well want a
route with as few "hops" as possible.

Figure 3.1  Two drawings of the same tree. On the right, the tree is rooted at node 1.

Trees  We say that an undirected graph is a tree if it is connected and does not
contain a cycle. For example, the two graphs pictured in Figure 3.1 are trees.
In a strong sense, trees are the simplest kind of connected graph: deleting any
edge from a tree will disconnect it.

For thinking about the structure of a tree T, it is useful to root it at a
particular node r. Physically, this is the operation of grabbing T at the node r
and letting the rest of it hang downward under the force of gravity, like a
mobile. More precisely, we "orient" each edge of T away from r; for each other
node v, we declare the parent of v to be the node u that directly precedes v
on its path from r; we declare w to be a child of v if v is the parent of w. More
generally, we say that w is a descendant of v (or v is an ancestor of w) if v lies
on the path from the root to w; and we say that a node x is a leaf if it has no
descendants. Thus, for example, the two pictures in Figure 3.1 correspond to
the same tree T (the same pairs of nodes are joined by edges), but the drawing
on the right represents the result of rooting T at node 1.
Rooted trees are fundamental objects in computer science, because they
encode the notion of a hierarchy. For example, we can imagine the rooted tree
in Figure 3.1 as corresponding to the organizational structure of a tiny nine-person
company; employees 3 and 4 report to employee 2; employees 2, 5,
and 7 report to employee 1; and so on. Many Web sites are organized according
to a tree-like structure, to facilitate navigation. A typical computer science
department's Web site will have an entry page as the root; the People page is
a child of this entry page (as is the Courses page); pages entitled Faculty and
Students are children of the People page; individual professors' home pages
are children of the Faculty page; and so on.

For our purposes here, rooting a tree T can make certain questions about T
conceptually easy to answer. For example, given a tree T on n nodes, how many
edges does it have? Each node other than the root has a single edge leading
"upward" to its parent; and conversely, each edge leads upward from precisely
one non-root node. Thus we have very easily proved the following fact.

(3.1) Every n-node tree has exactly n - 1 edges.

In fact, the following stronger statement is true, although we do not prove
it here.

(3.2) Let G be an undirected graph on n nodes. Any two of the following
statements implies the third.
(i) G is connected.
(ii) G does not contain a cycle.
(iii) G has n - 1 edges.

We now turn to the role of trees in the fundamental algorithmic idea of
graph traversal.
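Statement (3.2) also gives a cheap way to recognize a tree: check connectivity and count edges. The Python sketch below (ours, not the book's) does exactly that, reusing the make_undirected helper from the earlier sketch; the traversal it uses to test connectivity is of the kind developed in the next section.

def is_tree(nodes, edges, adj):
    """A graph is a tree iff it is connected and has exactly n - 1 edges, by (3.1)/(3.2)."""
    nodes = list(nodes)
    if not nodes:
        return True
    # Reach everything we can from an arbitrary start node.
    reached = {nodes[0]}
    frontier = [nodes[0]]
    while frontier:
        u = frontier.pop()
        for v in adj[u]:
            if v not in reached:
                reached.add(v)
                frontier.append(v)
    return len(reached) == len(nodes) and len(edges) == len(nodes) - 1

nodes = [1, 2, 3, 4, 5]
edges = [(1, 2), (1, 3), (2, 4), (2, 5)]
print(is_tree(nodes, edges, make_undirected(nodes, edges)))                        # True
more = edges + [(4, 5)]
print(is_tree(nodes, more, make_undirected(nodes, more)))                          # False: it now contains a cycle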
3.2 Graph Connectivity and Graph Traversal

Having built up some fundamental notions regarding graphs, we turn to a very
basic algorithmic question: node-to-node connectivity. Suppose we are given a
graph G = (V, E) and two particular nodes s and t. We'd like to find an efficient
algorithm that answers the question: Is there a path from s to t in G? We will
call this the problem of determining s-t connectivity.

For very small graphs, this question can often be answered easily by visual
inspection. But for large graphs, it can take some work to search for a path.
Indeed, the s-t Connectivity Problem could also be called the Maze-Solving
Problem. If we imagine G as a maze with a room corresponding to each node,
and a hallway corresponding to each edge that joins nodes (rooms) together,
then the problem is to start in a room s and find your way to another designated
room t. How efficient an algorithm can we design for this task?

In this section, we describe two natural algorithms for this problem at a
high level: breadth-first search (BFS) and depth-first search (DFS). In the next
section we discuss how to implement each of these efficiently, building on a
data structure for representing a graph as the input to an algorithm.

Figure 3.2  In this graph, node 1 has paths to nodes 2 through 8, but not to nodes 9
through 13.

Breadth-First Search

Perhaps the simplest algorithm for determining s-t connectivity is breadth-first
search (BFS), in which we explore outward from s in all possible directions,
adding nodes one "layer" at a time. Thus we start with s and include all nodes
that are joined by an edge to s; this is the first layer of the search. We then
include all additional nodes that are joined by an edge to any node in the first
layer; this is the second layer. We continue in this way until no new nodes
are encountered.

In the example of Figure 3.2, starting with node 1 as s, the first layer of
the search would consist of nodes 2 and 3, the second layer would consist of
nodes 4, 5, 7, and 8, and the third layer would consist just of node 6. At this
point the search would stop, since there are no further nodes that could be
added (and in particular, note that nodes 9 through 13 are never reached by
the search).

As this example reinforces, there is a natural physical interpretation to the
algorithm. Essentially, we start at s and "flood" the graph with an expanding
wave that grows to visit all nodes that it can reach. The layer containing a
node represents the point in time at which the node is reached.

We can define the layers L1, L2, L3, ... constructed by the BFS algorithm
more precisely as follows.
- Layer L1 consists of all nodes that are neighbors of s. (For notational
  reasons, we will sometimes use layer L0 to denote the set consisting just
  of s.)
- Assuming that we have defined layers L1, ..., Lj, then layer L_{j+1} consists
  of all nodes that do not belong to an earlier layer and that have an edge
  to a node in layer Lj.

Recalling our definition of the distance between two nodes as the minimum
number of edges on a path joining them, we see that layer L1 is the set of all
nodes at distance 1 from s, and more generally layer Lj is the set of all nodes
at distance exactly j from s. A node fails to appear in any of the layers if and
only if there is no path to it. Thus, BFS is not only determining the nodes that s
can reach, it is also computing shortest paths to them. We sum this up in the
following fact.

(3.3) For each j ≥ 1, layer Lj produced by BFS consists of all nodes at distance
exactly j from s. There is a path from s to t if and only if t appears in some
layer.

A further property of breadth-first search is that it produces, in a very
natural way, a tree T rooted at s on the set of nodes reachable from s.
Specifically, for each such node v (other than s), consider the moment when
v is first "discovered" by the BFS algorithm; this happens when some node u
in layer Lj is being examined, and we find that it has an edge to the previously
unseen node v. At this moment, we add the edge (u, v) to the tree T: u
becomes the parent of v, representing the fact that u is "responsible" for
completing the path to v. We call the tree T that is produced in this way a
breadth-first search tree.

Figure 3.3 depicts the construction of a BFS tree rooted at node 1 for the
graph in Figure 3.2. The solid edges are the edges of T; the dotted edges are
edges of G that do not belong to T. The execution of BFS that produces this
tree can be described as follows.

Figure 3.3  The construction of a breadth-first search tree T for the graph in Figure 3.2,
with (a), (b), and (c) depicting the successive layers that are added. The solid edges are
the edges of T; the dotted edges are in the connected component of G containing node
1, but do not belong to T.

(a) Starting from node 1, layer L1 consists of the nodes {2, 3}.
(b) Layer L2 is then grown by considering the nodes in layer L1 in order (say,
    first 2, then 3). Thus we discover nodes 4 and 5 as soon as we look at 2,
    so 2 becomes their parent. When we consider node 2, we also discover
    an edge to 3, but this isn't added to the BFS tree, since we already know
    about node 3.
    We first discover nodes 7 and 8 when we look at node 3. On the other
    hand, the edge from 3 to 5 is another edge of G that does not end up in
    the BFS tree, because by the time we look at this edge out of node 3, we
    already know about node 5.
(c) We then consider the nodes in layer L2 in order, but the only new node
    discovered when we look through L2 is node 6, which is added to layer
    L3. Note that the edges (4, 5) and (7, 8) don't get added to the BFS tree,
    because they don't result in the discovery of new nodes.
(d) No new nodes are discovered when node 6 is examined, so nothing is put
    in layer L4, and the algorithm terminates. The full BFS tree is depicted
    in Figure 3.3(c).

We notice that as we ran BFS on this graph, the nontree edges all either
connected nodes in the same layer, or connected nodes in adjacent layers. We
now prove that this is a property of BFS trees in general.

(3.4) Let T be a breadth-first search tree, let x and y be nodes in T belonging
to layers Li and Lj respectively, and let (x, y) be an edge of G. Then i and j differ
by at most 1.

Proof. Suppose by way of contradiction that i and j differed by more than 1;
in particular, suppose i < j - 1. Now consider the point in the BFS algorithm
when the edges incident to x were being examined. Since x belongs to layer
Li, the only nodes discovered from x belong to layers L_{i+1} and earlier; hence,
if y is a neighbor of x, then it should have been discovered by this point at the
latest and hence should belong to layer L_{i+1} or earlier.
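Here is a Python sketch (ours) of BFS exactly as described: it grows the layers L0, L1, L2, ... and records, for each newly discovered node, the tree edge to its parent. The edge list below is our reading of the component of node 1 in Figure 3.2 as recounted in the text; the attachment of node 6 below node 5 is a guess, since the figure itself is not reproduced here.

def bfs_layers(adj, s):
    """Return (layers, parent): layers[j] is L_j, and parent gives the BFS-tree edges."""
    layers = [[s]]
    parent = {s: None}
    while layers[-1]:
        next_layer = []
        for u in layers[-1]:
            for v in adj[u]:
                if v not in parent:          # the first time v is discovered
                    parent[v] = u            # (u, v) becomes a tree edge
                    next_layer.append(v)
        layers.append(next_layer)
    return layers[:-1], parent

edges = [(1, 2), (1, 3), (2, 3), (2, 4), (2, 5), (3, 5), (3, 7), (3, 8),
         (4, 5), (7, 8), (5, 6)]
adj = {v: set() for v in range(1, 9)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

layers, parent = bfs_layers(adj, 1)
print(layers)    # [[1], [2, 3], [4, 5, 7, 8], [6]] (order within a layer may vary)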
Figure 3.4  When growing the connected component containing s, we look for nodes
like v that have not yet been visited.

Exploring a Connected Component

The set of nodes discovered by the BFS algorithm is precisely those reachable
from the starting node s. We will refer to this set R as the connected component
of G containing s; and once we know the connected component containing s,
we can simply check whether t belongs to it so as to answer the question of
s-t connectivity.

Now, if one thinks about it, it's clear that BFS is just one possible way to
produce this component. At a more general level, we can build the component
R by "exploring" G in any order, starting from s. To start off, we define R = {s}.
Then at any point in time, if we find an edge (u, v) where u ∈ R and v ∉ R, we
can add v to R. Indeed, if there is a path P from s to u, then there is a path
from s to v obtained by first following P and then following the edge (u, v).
Figure 3.4 illustrates this basic step in growing the component R.

Suppose we continue growing the set R until there are no more edges
leading out of R; in other words, we run the following algorithm.

  R will consist of nodes to which s has a path
  Initially R = {s}
  While there is an edge (u, v) where u ∈ R and v ∉ R
    Add v to R
  Endwhile

Here is the key property of this algorithm.

(3.5) The set R produced at the end of the algorithm is precisely the connected
component of G containing s.

Proof. We have already argued that for any node v ∈ R, there is a path from s
to v.

Now, consider a node w ∉ R, and suppose by way of contradiction that
there is an s-w path P in G. Since s ∈ R but w ∉ R, there must be a first node v
on P that does not belong to R; and this node v is not equal to s. Thus there is
a node u immediately preceding v on P, so (u, v) is an edge. Moreover, since v
is the first node on P that does not belong to R, we must have u ∈ R. It follows
that (u, v) is an edge where u ∈ R and v ∉ R; this contradicts the stopping rule
for the algorithm.

For any node t in the component R, observe that it is easy to recover the
actual path from s to t along the lines of the argument above: we simply record,
for each node v, the edge (u, v) that was considered in the iteration in which
v was added to R. Then, by tracing these edges backward from t, we proceed
through a sequence of nodes that were added in earlier and earlier iterations,
eventually reaching s; this defines an s-t path.
Figure 3.4 illustrates this basic step in growing the component R.
Depth-First Search
Suppose we continue growing the set R until there are no more edges
leading out of R; in other words, we run the following algorithm. Another natural method to find the nodes reachable from s is the approach you
might take if the graph G were truly a maze of interconnected rooms and you
were walking around in it. You’d start from s and try the first edge leading out
R will consist of nodes to which s has a path of it, to a node u. You’d then follow the first edge leading out of u, and continue
Initially R = {s} in this way until you reached a "dead end"--a node for which you had already
While there is ~u edge (u,u) where uER and explored all its neighbors. You’d then backtrack until you got to a node with
Add u to R an unexplored neighbor, and resume from there. We call this algorithm depth-
Endwhile first search (DFS), since it explores G by going as deeply’ as possible and only
retreating when necessary.
Here is the key property of this algorithm. DFS is also a particular implementation of the generic component-growing
algorithm that we introduced earlier. It is most easily described in recursive
(3 !5)SetR prod~ded at the end of the aIgori&m is ~re~isely the ~b;~ctea form: we can invoke DFS from any starting point but maintain global knowl-
cOmpone~ Of G edge of which nodes have already been explored.

  DFS(u):
    Mark u as "Explored" and add u to R
    For each edge (u, v) incident to u
      If v is not marked "Explored" then
        Recursively invoke DFS(v)
      Endif
    Endfor

To apply this to s-t connectivity, we simply declare all nodes initially to be not explored, and invoke DFS(s).
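To make the recursive procedure concrete, the following is a small Python sketch (an illustration rather than code from the text); it assumes the graph is given as a dictionary adj mapping each node to a list of its neighbors, and the function names are made up for this example.

  def dfs(adj, s):
      # Return the set R of nodes reachable from s, following the recursive DFS above.
      explored = set()          # global knowledge of which nodes are already explored

      def visit(u):
          explored.add(u)       # mark u as "Explored" and add u to R
          for v in adj[u]:      # for each edge (u, v) incident to u
              if v not in explored:
                  visit(v)      # recursively invoke DFS(v)

      visit(s)
      return explored

  # Example: the component of node 1 in a small undirected graph.
  adj = {1: [2, 3], 2: [1, 3], 3: [1, 2, 5], 4: [5], 5: [3, 4], 6: []}
  print(dfs(adj, 1))            # {1, 2, 3, 4, 5}

For very large graphs one would either raise Python's recursion limit or use the explicit stack-based version of DFS described later in this chapter.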
There are some fundamental similarities and some fundamental differ-
ences between DFS and BFS. The similarities are based on the fact that they
both build the connected component containing s, and we will see in the next
section that they achieve qualitatively similar levels of efficiency.
While DFS ultimately visits exactly the same set of nodes as BFS, it typically
does so in a very different order; it probes its way down long paths, potentially
getting very far from s, before backing up to try nearer unexplored nodes. We
can see a reflection of this difference in the fact that, like BFS, the DFS algorithm
yields a natural rooted tree T on the component containing s, but the tree will
generally have a very different structure. We make s the root of the tree T,
and make u the parent of v when u is responsible for the discovery of v. That
is, whenever DFS(v) is invoked directly during the call to DFS(u), we add the
edge (u, v) to T. The resulting tree is called a depth-first search tree of the
component R.
Figure 3.5 The construction of a depth-first search tree T for the graph in Figure 3.2, with (a) through (g) depicting the nodes as they are discovered in sequence. The solid edges are the edges of T; the dotted edges are edges of G that do not belong to T.

Figure 3.5 depicts the construction of a DFS tree rooted at node 1 for the graph in Figure 3.2. The solid edges are the edges of T; the dotted edges are edges of G that do not belong to T. The execution of DFS begins by building a path on nodes 1, 2, 3, 5, 4. The execution reaches a dead end at 4, since there are no new nodes to find, and so it "backs up" to 5, finds node 6, backs up again to 3, and finds nodes 7 and 8. At this point there are no new nodes to find in the connected component, so all the pending recursive DFS calls terminate, one by one, and the execution comes to an end. The full DFS tree is depicted in Figure 3.5(g).

This example suggests the characteristic way in which DFS trees look different from BFS trees. Rather than having root-to-leaf paths that are as short as possible, they tend to be quite narrow and deep. However, as in the case of BFS, we can say something quite strong about the way in which nontree edges of G must be arranged relative to the edges of a DFS tree T: as in the figure, nontree edges can only connect ancestors of T to descendants.

To establish this, we first observe the following property of the DFS algorithm and the tree that it produces.

(3.6) For a given recursive call DFS(u), all nodes that are marked "Explored" between the invocation and end of this recursive call are descendants of u in T.

Using (3.6), we prove

(3.7) Let T be a depth-first search tree, let x and y be nodes in T, and let (x, y) be an edge of G that is not an edge of T. Then one of x or y is an ancestor of the other.
Proof. Suppose that (x, y) is an edge of G that is not an edge of T, and suppose without loss of generality that x is reached first by the DFS algorithm. When the edge (x, y) is examined during the execution of DFS(x), it is not added to T because y is marked "Explored." Since y was not marked "Explored" when DFS(x) was first invoked, it is a node that was discovered between the invocation and end of the recursive call DFS(x). It follows from (3.6) that y is a descendant of x. ∎

The Set of All Connected Components
So far we have been talking about the connected component containing a particular node s. But there is a connected component associated with each node in the graph. What is the relationship between these components?

In fact, this relationship is highly structured and is expressed in the following claim.

(3.8) For any two nodes s and t in a graph, their connected components are either identical or disjoint.

This is a statement that is very clear intuitively, if one looks at a graph like the example in Figure 3.2. The graph is divided into multiple pieces with no edges between them; the largest piece is the connected component of nodes 1 through 8, the medium piece is the connected component of nodes 11, 12, and 13, and the smallest piece is the connected component of nodes 9 and 10. To prove the statement in general, we just need to show how to define these "pieces" precisely for an arbitrary graph.

Proof. Consider any two nodes s and t in a graph G with the property that there is a path between s and t. We claim that the connected components containing s and t are the same set. Indeed, for any node v in the component of s, the node v must also be reachable from t by a path: we can just walk from t to s, and then on from s to v. The same reasoning works with the roles of s and t reversed, and so a node is in the component of one if and only if it is in the component of the other.

On the other hand, if there is no path between s and t, then there cannot be a node v that is in the connected component of each. For if there were such a node v, then we could walk from s to v and then on to t, constructing a path between s and t. Thus, if there is no path between s and t, then their connected components are disjoint. ∎

This proof suggests a natural algorithm for producing all the connected components of a graph, by growing them one component at a time. We start with an arbitrary node s, and we use BFS (or DFS) to generate its connected component. We then find a node v (if any) that was not visited by the search from s, and iterate, using BFS starting from v, to generate its connected component--which, by (3.8), will be disjoint from the component of s. We continue in this way until all nodes have been visited.

3.3 Implementing Graph Traversal Using Queues and Stacks
So far we have been discussing basic algorithmic primitives for working with graphs without mentioning any implementation details. Here we discuss how to use lists and arrays to represent graphs, and we discuss the trade-offs between the different representations. Then we use these data structures to implement the graph traversal algorithms breadth-first search (BFS) and depth-first search (DFS) efficiently. We will see that BFS and DFS differ essentially only in that one uses a queue and the other uses a stack, two simple data structures that we will describe later in this section.

Representing Graphs
There are two basic ways to represent graphs: by an adjacency matrix and by an adjacency list representation. Throughout the book we will use the adjacency list representation. We start, however, by reviewing both of these representations and discussing the trade-offs between them.

A graph G = (V, E) has two natural input parameters, the number of nodes |V|, and the number of edges |E|. We will use n = |V| and m = |E| to denote these, respectively. Running times will be given in terms of both of these two parameters. As usual, we will aim for polynomial running times, and lower-degree polynomials are better. However, with two parameters in the running time, the comparison is not always so clear. Is O(m^2) or O(n^3) a better running time? This depends on what the relation is between n and m. With at most one edge between any pair of nodes, the number of edges m can be at most (n choose 2) ≤ n^2. On the other hand, in many applications the graphs of interest are connected, and by (3.1), connected graphs must have at least m ≥ n - 1 edges. But these comparisons do not always tell us which of two running times (such as m^2 and n^3) are better, so we will tend to keep the running times in terms of both of these parameters. In this section we aim to implement the basic graph search algorithms in time O(m + n). We will refer to this as linear time, since it takes O(m + n) time simply to read the input. Note that when we work with connected graphs, a running time of O(m + n) is the same as O(m), since m ≥ n - 1.

Consider a graph G = (V, E) with n nodes, and assume the set of nodes is V = {1, ..., n}. The simplest way to represent a graph is by an adjacency
matrix, which is an n x n matrix A where A[u, v] is equal to 1 if the graph contains the edge (u, v) and 0 otherwise. If the graph is undirected, the matrix A is symmetric, with A[u, v] = A[v, u] for all nodes u, v ∈ V. The adjacency matrix representation allows us to check in O(1) time if a given edge (u, v) is present in the graph. However, the representation has two basic disadvantages.

o The representation takes Θ(n^2) space. When the graph has many fewer edges than n^2, more compact representations are possible.
o Many graph algorithms need to examine all edges incident to a given node v. In the adjacency matrix representation, doing this involves considering all other nodes w, and checking the matrix entry A[v, w] to see whether the edge (v, w) is present--and this takes Θ(n) time. In the worst case, v may have Θ(n) incident edges, in which case checking all these edges will take Θ(n) time regardless of the representation. But many graphs in practice have significantly fewer edges incident to most nodes, and so it would be good to be able to find all these incident edges more efficiently.

The representation of graphs used throughout the book is the adjacency list, which works better for sparse graphs--that is, those with many fewer than n^2 edges. In the adjacency list representation there is a record for each node v, containing a list of the nodes to which v has edges. To be precise, we have an array Adj, where Adj[v] is a record containing a list of all nodes adjacent to node v. For an undirected graph G = (V, E), each edge e = (v, w) ∈ E occurs on two adjacency lists: node w appears on the list for node v, and node v appears on the list for node w.

Let's compare the adjacency matrix and adjacency list representations. First consider the space required by the representation. An adjacency matrix requires O(n^2) space, since it uses an n x n matrix. In contrast, we claim that the adjacency list representation requires only O(m + n) space. Here is why. First, we need an array of pointers of length n to set up the lists in Adj, and then we need space for all the lists. Now, the lengths of these lists may differ from node to node, but we argued in the previous paragraph that overall, each edge e = (v, w) appears in exactly two of the lists: the one for v and the one for w. Thus the total length of all lists is 2m = O(m).

Another (essentially equivalent) way to justify this bound is as follows. We define the degree nv of a node v to be the number of incident edges it has. The length of the list at Adj[v] is nv, so the total length over all nodes is O(Σ_{v∈V} nv). Now, the sum of the degrees in a graph is a quantity that often comes up in the analysis of graph algorithms, so it is useful to work out what this sum is.

(3.9) Σ_{v∈V} nv = 2m.

Proof. Each edge e = (u, w) contributes exactly twice to this sum: once in the quantity nu and once in the quantity nw. Since the sum is the total of the contributions of each edge, it is 2m. ∎

We sum up the comparison between adjacency matrices and adjacency lists as follows.

(3.10) The adjacency matrix representation of a graph requires O(n^2) space, while the adjacency list representation requires only O(m + n) space.

Since we have already argued that m ≤ n^2, the bound O(m + n) is never worse than O(n^2); and it is much better when the underlying graph is sparse, with m much smaller than n^2.

Now we consider the ease of accessing the information stored in these two different representations. Recall that in an adjacency matrix we can check in O(1) time if a particular edge (u, v) is present in the graph. In the adjacency list representation, this can take time proportional to the degree O(nu): we have to follow the pointers on u's adjacency list to see if node v occurs on the list. On the other hand, if the algorithm is currently looking at a node u, it can read the list of neighbors in constant time per neighbor.

In view of this, the adjacency list is a natural representation for exploring graphs. If the algorithm is currently looking at a node u, it can read this list of neighbors in constant time per neighbor; move to a neighbor v once it encounters it on this list in constant time; and then be ready to read the list associated with node v. The list representation thus corresponds to a physical notion of "exploring" the graph, in which you learn the neighbors of a node u once you arrive at u, and can read them off in constant time per neighbor.
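As a small illustration of the adjacency list representation, here is a Python sketch (the helper name is made up for this example); it assumes the nodes are numbered 1 through n and that each undirected edge is given once as a pair.

  def build_adjacency_list(n, edges):
      # Each undirected edge (v, w) is recorded on both lists, as described above.
      adj = {v: [] for v in range(1, n + 1)}
      for v, w in edges:
          adj[v].append(w)      # w appears on the list for v
          adj[w].append(v)      # v appears on the list for w
      return adj

  edges = [(1, 2), (1, 3), (2, 3), (3, 5), (4, 5)]
  adj = build_adjacency_list(5, edges)
  degrees = {v: len(adj[v]) for v in adj}
  # The total length of all lists is 2m, as in (3.9).
  assert sum(degrees.values()) == 2 * len(edges)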
Queues and Stacks
Many algorithms have an inner step in which they need to process a set of elements, such as the set of all edges adjacent to a node in a graph, the set of visited nodes in BFS and DFS, or the set of all free men in the Stable Matching algorithm. For this purpose, it is natural to maintain the set of elements to be considered in a linked list, as we have done for maintaining the set of free men in the Stable Matching algorithm.

One important issue that arises is the order in which to consider the elements in such a list. In the Stable Matching algorithm, the order in which we considered the free men did not affect the outcome, although this required a fairly subtle proof to verify. In many other algorithms, such as DFS and BFS, the order in which elements are considered is crucial.

Two of the simplest and most natural options are to maintain a set of elements as either a queue or a stack. A queue is a set from which we extract elements in first-in, first-out (FIFO) order: we select elements in the same order in which they were added. A stack is a set from which we extract elements in last-in, first-out (LIFO) order: each time we select an element, we choose the one that was added most recently. Both queues and stacks can be easily implemented via a doubly linked list. In both cases, we always select the first element on our list; the difference is in where we insert a new element. In a queue a new element is added to the end of the list as the last element, while in a stack a new element is placed in the first position on the list. Recall that a doubly linked list has explicit First and Last pointers to the beginning and end, respectively, so each of these insertions can be done in constant time.

Next we will discuss how to implement the search algorithms of the previous section in linear time. We will see that BFS can be thought of as using a queue to select which node to consider next, while DFS is effectively using a stack.

Implementing Breadth-First Search
The adjacency list data structure is ideal for implementing breadth-first search. The algorithm examines the edges leaving a given node one by one. When we are scanning the edges leaving u and come to an edge (u, v), we need to know whether or not node v has been previously discovered by the search. To make this simple, we maintain an array Discovered of length n and set Discovered[v] = true as soon as our search first sees v. The algorithm, as described in the previous section, constructs layers of nodes L1, L2, ..., where Li is the set of nodes at distance i from the source s. To maintain the nodes in a layer Li, we have a list L[i] for each i = 0, 1, 2, ...
  BFS(s):
    Set Discovered[s] = true and Discovered[v] = false for all other v
    Initialize L[0] to consist of the single element s
    Set the layer counter i = 0
    Set the current BFS tree T = ∅
    While L[i] is not empty
      Initialize an empty list L[i+1]
      For each node u ∈ L[i]
        Consider each edge (u, v) incident to u
        If Discovered[v] = false then
          Set Discovered[v] = true
          Add edge (u, v) to the tree T
          Add v to the list L[i+1]
        Endif
      Endfor
      Increment the layer counter i by one
    Endwhile

In this implementation it does not matter whether we manage each list L[i] as a queue or a stack, since the algorithm is allowed to consider the nodes in a layer Li in any order.

(3.11) The above implementation of the BFS algorithm runs in time O(m + n) (i.e., linear in the input size), if the graph is given by the adjacency list representation.

Proof. As a first step, it is easy to bound the running time of the algorithm by O(n^2) (a weaker bound than our claimed O(m + n)). To see this, note that there are at most n lists L[i] that we need to set up, so this takes O(n) time. Now we need to consider the nodes u on these lists. Each node occurs on at most one list, so the For loop runs at most n times over all iterations of the While loop. When we consider a node u, we need to look through all edges (u, v) incident to u. There can be at most n such edges, and we spend O(1) time considering each edge. So the total time spent on one iteration of the For loop is at most O(n). We've thus concluded that there are at most n iterations of the For loop, and that each iteration takes at most O(n) time, so the total time is at most O(n^2).

To get the improved O(m + n) time bound, we need to observe that the For loop processing a node u can take less than O(n) time if u has only a few neighbors. As before, let nu denote the degree of node u, the number of edges incident to u. Now, the time spent in the For loop considering edges incident to node u is O(nu), so the total over all nodes is O(Σ_{u∈V} nu). Recall from (3.9) that Σ_{u∈V} nu = 2m, and so the total time spent considering edges over the whole algorithm is O(m). We need O(n) additional time to set up lists and manage the array Discovered. So the total time spent is O(m + n) as claimed. ∎

We described the algorithm using up to n separate lists L[i] for each layer Li. Instead of all these distinct lists, we can implement the algorithm using a single list L that we maintain as a queue. In this way, the algorithm processes nodes in the order they are first discovered: each time a node is discovered, it is added to the end of the queue, and the algorithm always processes the edges out of the node that is currently first in the queue.
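The layered implementation above translates directly into Python. The following sketch is illustrative rather than canonical; it assumes the graph is given as an adjacency dictionary, and the example graph is the eight-node component used in the BFS example earlier in this chapter.

  def bfs(adj, s):
      # Layered BFS, following the pseudocode above.
      # Returns the list of layers and the set of BFS tree edges.
      discovered = {v: False for v in adj}
      discovered[s] = True
      layers = [[s]]                      # L[0] = {s}
      tree = []                           # edges (u, v) of the BFS tree T
      i = 0
      while layers[i]:                    # While L[i] is not empty
          next_layer = []
          for u in layers[i]:
              for v in adj[u]:            # consider each edge (u, v) incident to u
                  if not discovered[v]:
                      discovered[v] = True
                      tree.append((u, v))
                      next_layer.append(v)
          layers.append(next_layer)
          i += 1
      return layers[:-1], tree            # drop the final empty layer

  adj = {1: [2, 3], 2: [1, 3, 4, 5], 3: [1, 2, 5, 7, 8],
         4: [2, 5], 5: [2, 3, 4, 6], 6: [5], 7: [3, 8], 8: [3, 7]}
  layers, tree = bfs(adj, 1)
  print(layers)   # [[1], [2, 3], [4, 5, 7, 8], [6]]

Each node is placed on exactly one layer and each adjacency list is scanned once, which is the same accounting that gives the O(m + n) bound in (3.11).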

If we maintain the discovered nodes in this order, then all nodes in layer Li will appear in the queue ahead of all nodes in layer Li+1, for i = 0, 1, 2, ... Thus, all nodes in layer Li will be considered in a contiguous sequence, followed by all nodes in layer Li+1, and so forth. Hence this implementation in terms of a single queue will produce the same result as the BFS implementation above.

Implementing Depth-First Search
We now consider the depth-first search algorithm. In the previous section we presented DFS as a recursive procedure, which is a natural way to specify it. However, it can also be viewed as almost identical to BFS, with the difference that it maintains the nodes to be processed in a stack, rather than in a queue. Essentially, the recursive structure of DFS can be viewed as pushing nodes onto a stack for later processing, while moving on to more freshly discovered nodes. We now show how to implement DFS by maintaining this stack of nodes to be processed explicitly.

In both BFS and DFS, there is a distinction between the act of discovering a node v--the first time it is seen, when the algorithm finds an edge leading to v--and the act of exploring a node v, when all the incident edges to v are scanned, resulting in the potential discovery of further nodes. The difference between BFS and DFS lies in the way in which discovery and exploration are interleaved.

In BFS, once we started to explore a node u in layer Li, we added all its newly discovered neighbors to the next layer Li+1, and we deferred actually exploring these neighbors until we got to the processing of layer Li+1. In contrast, DFS is more impulsive: when it explores a node u, it scans the neighbors of u until it finds the first not-yet-explored node v (if any), and then it immediately shifts attention to exploring v.

To implement the exploration strategy of DFS, we first add all of the nodes adjacent to u to our list of nodes to be considered, but after doing this we proceed to explore a new neighbor v of u. As we explore v, in turn, we add the neighbors of v to the list we're maintaining, but we do so in stack order, so that these neighbors will be explored before we return to explore the other neighbors of u. We only come back to other nodes adjacent to u when there are no other nodes left.

In addition, we use an array Explored analogous to the Discovered array we used for BFS. The difference is that we only set Explored[v] to be true when we scan v's incident edges (when the DFS search is at v), while BFS sets Discovered[v] to true as soon as v is first discovered. The implementation in full looks as follows.

  DFS(s):
    Initialize S to be a stack with one element s
    While S is not empty
      Take a node u from S
      If Explored[u] = false then
        Set Explored[u] = true
        For each edge (u, v) incident to u
          Add v to the stack S
        Endfor
      Endif
    Endwhile

There is one final wrinkle to mention. Depth-first search is underspecified, since the adjacency list of a node being explored can be processed in any order. Note that the above algorithm, because it pushes all adjacent nodes onto the stack before considering any of them, in fact processes each adjacency list in the reverse order relative to the recursive version of DFS in the previous section.

(3.12) The above algorithm implements DFS, in the sense that it visits the nodes in exactly the same order as the recursive DFS procedure in the previous section (except that each adjacency list is processed in reverse order).

If we want the algorithm to also find the DFS tree, we need to have each node u on the stack S maintain the node that "caused" u to get added to the stack. This can be easily done by using an array parent and setting parent[v] = u when we add node v to the stack due to edge (u, v). When we mark a node u ≠ s as Explored, we also can add the edge (u, parent[u]) to the tree T. Note that a node v may be in the stack S multiple times, as it can be adjacent to multiple nodes u that we explore, and each such node adds a copy of v to the stack S. However, we will only use one of these copies to explore node v, the copy that we add last. As a result, it suffices to maintain one value parent[v] for each node v by simply overwriting the value parent[v] every time we add a new copy of v to the stack S.

The main step in the algorithm is to add and delete nodes to and from the stack S, which takes O(1) time. Thus, to bound the running time, we need to bound the number of these operations. To count the number of stack operations, it suffices to count the number of nodes added to S, as each node needs to be added once for every time it can be deleted from S.

How many elements ever get added to S? As before, let nv denote the degree of node v. Node v will be added to the stack S every time one of its nv adjacent nodes is explored, so the total number of nodes added to S is at most Σ_{v∈V} nv = 2m. This proves the desired O(m + n) bound on the running time of DFS.

(3.13) The above implementation of the DFS algorithm runs in time O(m + n) (i.e., linear in the input size), if the graph is given by the adjacency list representation.
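A direct Python rendering of the stack-based procedure, including the parent array used to recover the DFS tree, might look as follows. This is an illustrative sketch, not code from the text; the graph is assumed to be an adjacency dictionary adj whose keys are all the nodes.

  def dfs_stack(adj, s):
      # Stack-based DFS following the pseudocode above; returns the edges of the DFS tree T.
      explored = {v: False for v in adj}
      parent = {}
      stack = [s]                      # S is initialized to a stack with one element s
      tree = []
      while stack:
          u = stack.pop()              # take a node u from S
          if not explored[u]:
              explored[u] = True
              if u != s:
                  tree.append((parent[u], u))   # the edge (parent[u], u) joins T
              for v in adj[u]:         # for each edge (u, v) incident to u
                  parent[v] = u        # overwrite parent[v]; only the copy added last is used
                  stack.append(v)      # add v to the stack S

      return tree

As in the analysis above, each node v is pushed at most nv times, so the number of stack operations is at most 2m and the running time is O(m + n).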
Finding the Set of All Connected Components
In the previous section we talked about how one can use BFS (or DFS) to find all connected components of a graph. We start with an arbitrary node s, and we use BFS (or DFS) to generate its connected component. We then find a node v (if any) that was not visited by the search from s and iterate, using BFS (or DFS) starting from v to generate its connected component--which, by (3.8), will be disjoint from the component of s. We continue in this way until all nodes have been visited.

Although we earlier expressed the running time of BFS and DFS as O(m + n), where m and n are the total number of edges and nodes in the graph, both BFS and DFS in fact spend work only on edges and nodes in the connected component containing the starting node. (They never see any of the other nodes or edges.) Thus the above algorithm, although it may run BFS or DFS a number of times, only spends a constant amount of work on a given edge or node in the iteration when the connected component it belongs to is under consideration. Hence the overall running time of this algorithm is still O(m + n).
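One possible Python sketch of this component-by-component search is given below (names are illustrative; the graph is an adjacency dictionary, and BFS is used for each search).

  from collections import deque

  def connected_components(adj):
      # Repeatedly start a BFS from a node not yet visited, as described above.
      visited = set()
      components = []
      for s in adj:
          if s in visited:
              continue
          comp = []
          queue = deque([s])
          visited.add(s)
          while queue:
              u = queue.popleft()
              comp.append(u)
              for v in adj[u]:
                  if v not in visited:
                      visited.add(v)
                      queue.append(v)
          components.append(comp)
      return components

Each node and edge is handled only in the iteration that builds its own component, which is why the total time remains O(m + n).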
3.4 Testing Bipartiteness: An Application of Breadth-First Search
Recall the definition of a bipartite graph: it is one where the node set V can be partitioned into sets X and Y in such a way that every edge has one end in X and the other end in Y. To make the discussion a little smoother, we can imagine that the nodes in the set X are colored red, and the nodes in the set Y are colored blue. With this imagery, we can say a graph is bipartite if it is possible to color its nodes red and blue so that every edge has one red end and one blue end.

The Problem
In the earlier chapters, we saw examples of bipartite graphs. Here we start by asking: What are some natural examples of a nonbipartite graph, one where no such partition of V is possible?

Clearly a triangle is not bipartite, since we can color one node red, another one blue, and then we can't do anything with the third node. More generally, consider a cycle C of odd length, with nodes numbered 1, 2, 3, ..., 2k, 2k + 1. If we color node 1 red, then we must color node 2 blue, and then we must color node 3 red, and so on--coloring odd-numbered nodes red and even-numbered nodes blue. But then we must color node 2k + 1 red, and it has an edge to node 1, which is also red. This demonstrates that there's no way to partition C into red and blue nodes as required. More generally, if a graph G simply contains an odd cycle, then we can apply the same argument; thus we have established the following.

(3.14) If a graph G is bipartite, then it cannot contain an odd cycle.

It is easy to recognize that a graph is bipartite when appropriate sets X and Y (i.e., red and blue nodes) have actually been identified for us; and in many settings where bipartite graphs arise, this is natural. But suppose we encounter a graph G with no annotation provided for us, and we'd like to determine for ourselves whether it is bipartite--that is, whether there exists a partition into red and blue nodes, as required. How difficult is this? We see from (3.14) that an odd cycle is one simple "obstacle" to a graph's being bipartite. Are there other, more complex obstacles to bipartiteness?

Designing the Algorithm
In fact, there is a very simple procedure to test for bipartiteness, and its analysis can be used to show that odd cycles are the only obstacle. First we assume the graph G is connected, since otherwise we can first compute its connected components and analyze each of them separately. Next we pick any node s ∈ V and color it red; there is no loss in doing this, since s must receive some color. It follows that all the neighbors of s must be colored blue, so we do this. It then follows that all the neighbors of these nodes must be colored red, their neighbors must be colored blue, and so on, until the whole graph is colored. At this point, either we have a valid red/blue coloring of G, in which every edge has ends of opposite colors, or there is some edge with ends of the same color. In this latter case, it seems clear that there's nothing we could have done: G simply is not bipartite. We now want to argue this point precisely and also work out an efficient way to perform the coloring.

The first thing to notice is that the coloring procedure we have just described is essentially identical to the description of BFS: we move outward from s, coloring nodes as soon as we first encounter them. Indeed, another way to describe the coloring algorithm is as follows: we perform BFS, coloring
s red, all of layer L1 blue, all of layer L2 red, and so on, coloring odd-numbered layers blue and even-numbered layers red.

We can implement this on top of BFS, by simply taking the implementation of BFS and adding an extra array Color over the nodes. Whenever we get to a step in BFS where we are adding a node v to a list L[i + 1], we assign Color[v] = red if i + 1 is an even number, and Color[v] = blue if i + 1 is an odd number. At the end of this procedure, we simply scan all the edges and determine whether there is any edge for which both ends received the same color. Thus, the total running time for the coloring algorithm is O(m + n), just as it is for BFS.
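A possible Python sketch of this coloring procedure follows (illustrative names; it assumes G is connected and given as an adjacency dictionary, and uses a queue-based BFS so that layer parity is tracked implicitly through the colors).

  from collections import deque

  def bfs_two_coloring(adj, s):
      # Color the component of s via BFS, then scan all edges for a monochromatic one.
      # If the graph is not connected, run this on each connected component.
      color = {s: "red"}                         # s lies in layer L0, an even-numbered layer
      queue = deque([s])
      while queue:
          u = queue.popleft()
          for v in adj[u]:
              if v not in color:                 # v is discovered now, one layer further out
                  color[v] = "blue" if color[u] == "red" else "red"
                  queue.append(v)
      ok = all(color[u] != color[v] for u in color for v in adj[u])
      return ok, color

The function returns the coloring either way, so that when ok is False an offending same-color edge (and hence, by the analysis that follows, an odd cycle) can be located.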
Analyzing the Algorithm
We now prove a claim that shows this algorithm correctly determines whether G is bipartite, and it also shows that we can find an odd cycle in G whenever it is not bipartite.

(3.15) Let G be a connected graph, and let L1, L2, ... be the layers produced by BFS starting at node s. Then exactly one of the following two things must hold.

(i) There is no edge of G joining two nodes of the same layer. In this case G is a bipartite graph in which the nodes in even-numbered layers can be colored red, and the nodes in odd-numbered layers can be colored blue.
(ii) There is an edge of G joining two nodes of the same layer. In this case, G contains an odd-length cycle, and so it cannot be bipartite.

Proof. First consider case (i), where we suppose that there is no edge joining two nodes of the same layer. By (3.4), we know that every edge of G joins nodes either in the same layer or in adjacent layers. Our assumption for case (i) is precisely that the first of these two alternatives never happens, so this means that every edge joins two nodes in adjacent layers. But our coloring procedure gives nodes in adjacent layers the opposite colors, and so every edge has ends with opposite colors. Thus this coloring establishes that G is bipartite.

Now suppose we are in case (ii); why must G contain an odd cycle? We are told that G contains an edge joining two nodes of the same layer. Suppose this is the edge e = (x, y), with x, y ∈ Lj. Also, for notational reasons, recall that L0 ("layer 0") is the set consisting of just s. Now consider the BFS tree T produced by our algorithm, and let z be the node whose layer number is as large as possible, subject to the condition that z is an ancestor of both x and y in T; for obvious reasons, we can call z the lowest common ancestor of x and y. Suppose z ∈ Li, where i < j. We now have the situation pictured in Figure 3.6. We consider the cycle C defined by following the z-x path in T, then the edge e, and then the y-z path in T. The length of this cycle is (j - i) + 1 + (j - i), adding the length of its three parts separately; this is equal to 2(j - i) + 1, which is an odd number. ∎

Figure 3.6 If two nodes x and y in the same layer are joined by an edge, then the cycle through x, y, and their lowest common ancestor z has odd length, demonstrating that the graph cannot be bipartite.

3.5 Connectivity in Directed Graphs
Thus far, we have been looking at problems on undirected graphs; we now consider the extent to which these ideas carry over to the case of directed graphs.

Recall that in a directed graph, the edge (u, v) has a direction: it goes from u to v. In this way, the relationship between u and v is asymmetric, and this has qualitative effects on the structure of the resulting graph. In Section 3.1, for example, we discussed the World Wide Web as an instance of a large, complex directed graph whose nodes are pages and whose edges are hyperlinks. The act of browsing the Web is based on following a sequence of edges in this directed graph; and the directionality is crucial, since it's not generally possible to browse "backwards" by following hyperlinks in the reverse direction.

At the same time, a number of basic definitions and algorithms have natural analogues in the directed case. This includes the adjacency list representation and graph search algorithms such as BFS and DFS. We now discuss these in turn.

Representing Directed Graphs
In order to represent a directed graph for purposes of designing algorithms, we use a version of the adjacency list representation that we employed for undirected graphs. Now, instead of each node having a single list of neighbors, each node has two lists associated with it: one list consists of nodes to which it has edges, and a second list consists of nodes from which it has edges. Thus an algorithm that is currently looking at a node u can read off the nodes reachable by going one step forward on a directed edge, as well as the nodes that would be reachable if one went one step in the reverse direction on an edge from u.

The Graph Search Algorithms
Breadth-first search and depth-first search are almost the same in directed graphs as they are in undirected graphs. We will focus here on BFS. We start at a node s, define a first layer of nodes to consist of all those to which s has an edge, define a second layer to consist of all additional nodes to which these first-layer nodes have an edge, and so forth. In this way, we discover nodes layer by layer as they are reached in this outward search from s, and the nodes in layer j are precisely those for which the shortest path from s has exactly j edges. As in the undirected case, this algorithm performs at most constant work for each node and edge, resulting in a running time of O(m + n).
It is important to understand what this directed version of BFS is computing. In directed graphs, it is possible for a node s to have a path to a node t even though t has no path to s; and what directed BFS is computing is the set of all nodes t with the property that s has a path to t. Such nodes may or may not have paths back to s.

There is a natural analogue of depth-first search as well, which also runs in linear time and computes the same set of nodes. It is again a recursive procedure that tries to explore as deeply as possible, in this case only following edges according to their inherent direction. Thus, when DFS is at a node u, it recursively launches a depth-first search, in order, for each node to which u has an edge.

Suppose that, for a given node s, we wanted the set of nodes with paths to s, rather than the set of nodes to which s has paths. An easy way to do this would be to define a new directed graph, Grev, that we obtain from G simply by reversing the direction of every edge. We could then run BFS or DFS in Grev; a node has a path from s in Grev if and only if it has a path to s in G.

Strong Connectivity
Recall that a directed graph is strongly connected if, for every two nodes u and v, there is a path from u to v and a path from v to u. It's worth also formulating some terminology for the property at the heart of this definition; let's say that two nodes u and v in a directed graph are mutually reachable if there is a path from u to v and also a path from v to u. (So a graph is strongly connected if every pair of nodes is mutually reachable.)

Mutual reachability has a number of nice properties, many of them stemming from the following simple fact.

(3.16) If u and v are mutually reachable, and v and w are mutually reachable, then u and w are mutually reachable.

Proof. To construct a path from u to w, we first go from u to v (along the path guaranteed by the mutual reachability of u and v), and then on from v to w (along the path guaranteed by the mutual reachability of v and w). To construct a path from w to u, we just reverse this reasoning: we first go from w to v (along the path guaranteed by the mutual reachability of v and w), and then on from v to u (along the path guaranteed by the mutual reachability of u and v). ∎

There is a simple linear-time algorithm to test if a directed graph is strongly connected, implicitly based on (3.16). We pick any node s and run BFS in G starting from s. We then also run BFS starting from s in Grev. Now, if one of these two searches fails to reach every node, then clearly G is not strongly connected. But suppose we find that s has a path to every node, and that every node has a path to s. Then s and v are mutually reachable for every v, and so it follows that every two nodes u and v are mutually reachable: s and u are mutually reachable, and s and v are mutually reachable, so by (3.16) we also have that u and v are mutually reachable.

By analogy with connected components in an undirected graph, we can define the strong component containing a node s in a directed graph to be the set of all v such that s and v are mutually reachable. If one thinks about it, the algorithm in the previous paragraph is really computing the strong component containing s: we run BFS starting from s both in G and in Grev; the set of nodes reached by both searches is the set of nodes with paths to and from s, and hence this set is the strong component containing s.
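The two-searches test can be sketched in Python as follows (illustrative names only; the directed graph is given as an adjacency dictionary mapping each node to the nodes it has edges to, with every node present as a key).

  from collections import deque

  def reachable(adj, s):
      # Nodes reachable from s by directed BFS.
      seen = {s}
      queue = deque([s])
      while queue:
          u = queue.popleft()
          for v in adj[u]:
              if v not in seen:
                  seen.add(v)
                  queue.append(v)
      return seen

  def strong_component(adj, s):
      # Strong component of s: nodes reachable from s in both G and Grev.
      rev = {v: [] for v in adj}
      for u in adj:
          for v in adj[u]:
              rev[v].append(u)             # reverse every edge to obtain Grev
      return reachable(adj, s) & reachable(rev, s)

  def is_strongly_connected(adj):
      s = next(iter(adj))                  # any starting node works, by (3.16)
      return len(strong_component(adj, s)) == len(adj)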
There are further similarities between the notion of connected components in undirected graphs and strong components in directed graphs. Recall that connected components naturally partitioned the graph, since any two were either identical or disjoint. Strong components have this property as well, and for essentially the same reason, based on (3.16).

(3.17) For any two nodes s and t in a directed graph, their strong components are either identical or disjoint.

Proof. Consider any two nodes s and t that are mutually reachable; we claim that the strong components containing s and t are identical. Indeed, for any node v, if s and v are mutually reachable, then by (3.16), t and v are mutually reachable as well. Similarly, if t and v are mutually reachable, then again by (3.16), s and v are mutually reachable.

On the other hand, if s and t are not mutually reachable, then there cannot be a node v that is in the strong component of each. For if there were such a node v, then s and v would be mutually reachable, and v and t would be mutually reachable, so from (3.16) it would follow that s and t were mutually reachable. ∎

In fact, although we will not discuss the details of this here, with more work it is possible to compute the strong components for all nodes in a total time of O(m + n).

3.6 Directed Acyclic Graphs and Topological Ordering
If an undirected graph has no cycles, then it has an extremely simple structure: each of its connected components is a tree. But it is possible for a directed graph to have no (directed) cycles and still have a very rich structure. For example, such graphs can have a large number of edges: if we start with the node
set {1, 2, ..., n} and include an edge (i, j) whenever i < j, then the resulting directed graph has (n choose 2) edges but no cycles.

If a directed graph has no cycles, we call it--naturally enough--a directed acyclic graph, or a DAG for short. (The term DAG is typically pronounced as a word, not spelled out as an acronym.) In Figure 3.7(a) we see an example of a DAG, although it may take some checking to convince oneself that it really has no directed cycles.

The Problem
DAGs are a very common structure in computer science, because many kinds of dependency networks of the type we discussed in Section 3.1 are acyclic. Thus DAGs can be used to encode precedence relations or dependencies in a natural way. Suppose we have a set of tasks labeled {1, 2, ..., n} that need to be performed, and there are dependencies among them stipulating, for certain pairs i and j, that i must be performed before j. For example, the tasks may be courses, with prerequisite requirements stating that certain courses must be taken before others. Or the tasks may correspond to a pipeline of computing jobs, with assertions that the output of job i is used in determining the input to job j, and hence job i must be done before job j.

We can represent such an interdependent set of tasks by introducing a node for each task, and a directed edge (i, j) whenever i must be done before j. If the precedence relation is to be at all meaningful, the resulting graph G must be a DAG. Indeed, if it contained a cycle C, there would be no way to do any of the tasks in C: since each task in C cannot begin until some other one completes, no task in C could ever be done, since none could be done first.

Figure 3.7 (a) A directed acyclic graph. (b) The same DAG with a topological ordering, specified by the labels on each node. (c) A different drawing of the same DAG, arranged so as to emphasize the topological ordering. (In a topological ordering, all edges point from left to right.)

Let's continue a little further with this picture of DAGs as precedence relations. Given a set of tasks with dependencies, it would be natural to seek a valid order in which the tasks could be performed, so that all dependencies are respected. Specifically, for a directed graph G, we say that a topological ordering of G is an ordering of its nodes as v1, v2, ..., vn so that for every edge (vi, vj), we have i < j. In other words, all edges point "forward" in the ordering. A topological ordering on tasks provides an order in which they can be safely performed; when we come to the task vj, all the tasks that are required to precede it have already been done. In Figure 3.7(b) we've labeled the nodes of the DAG from part (a) with a topological ordering; note that each edge indeed goes from a lower-indexed node to a higher-indexed node.

In fact, we can view a topological ordering of G as providing an immediate "proof" that G has no cycles, via the following.

(3.18) If G has a topological ordering, then G is a DAG.

Proof. Suppose, by way of contradiction, that G has a topological ordering v1, v2, ..., vn, and also has a cycle C. Let vi be the lowest-indexed node on C, and let vj be the node on C just before vi--thus (vj, vi) is an edge. But by our choice of i, we have j > i, which contradicts the assumption that v1, v2, ..., vn was a topological ordering. ∎

The proof of acyclicity that a topological ordering provides can be very useful, even visually. In Figure 3.7(c), we have drawn the same graph as in (a) and (b), but with the nodes laid out in the topological ordering. It is immediately clear that the graph in (c) is a DAG since each edge goes from left to right.
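The defining property is easy to check directly. The following Python sketch (with a made-up example edge list, used only for illustration) simply verifies that every edge points forward in a proposed ordering.

  def is_topological_ordering(order, edges):
      # Check the defining property: every edge (u, v) must point forward in the ordering.
      position = {v: i for i, v in enumerate(order)}
      return all(position[u] < position[v] for u, v in edges)

  edges = [(1, 2), (1, 3), (2, 4), (3, 4), (4, 5)]        # a small hypothetical DAG
  print(is_topological_ordering([1, 2, 3, 4, 5], edges))  # True
  print(is_topological_ordering([1, 4, 2, 3, 5], edges))  # False: the edge (2, 4) points backward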
Computing a Topological Ordering  The main question we consider here is the converse of (3.18): Does every DAG have a topological ordering, and if so, how do we find one efficiently? A method to do this for every DAG would be very useful: it would show that for any precedence relation on a set of tasks without cycles, there is an efficiently computable order in which to perform the tasks.

Designing and Analyzing the Algorithm
In fact, the converse of (3.18) does hold, and we establish this via an efficient algorithm to compute a topological ordering. The key to this lies in finding a way to get started: which node do we put at the beginning of the topological ordering? Such a node v1 would need to have no incoming edges, since any such incoming edge would violate the defining property of the topological

ordering, that all edges point forward. Thus, we need to prove the following fact.

(3.19) In every DAG G, there is a node v with no incoming edges.

Proof. Let G be a directed graph in which every node has at least one incoming edge. We show how to find a cycle in G; this will prove the claim. We pick any node v, and begin following edges backward from v: since v has at least one incoming edge (u, v), we can walk backward to u; then, since u has at least one incoming edge (x, u), we can walk backward to x; and so on. We can continue this process indefinitely, since every node we encounter has an incoming edge. But after n + 1 steps, we will have visited some node w twice. If we let C denote the sequence of nodes encountered between successive visits to w, then clearly C forms a cycle. ∎

In fact, the existence of such a node v is all we need to produce a topological ordering of G by induction. Specifically, let us claim by induction that every DAG has a topological ordering. This is clearly true for DAGs on one or two nodes. Now suppose it is true for DAGs with up to some number of nodes n. Then, given a DAG G on n + 1 nodes, we find a node v with no incoming edges, as guaranteed by (3.19). We place v first in the topological ordering; this is safe, since all edges out of v will point forward. Now G - {v} is a DAG, since deleting v cannot create any cycles that weren't there previously. Also, G - {v} has n nodes, so we can apply the induction hypothesis to obtain a topological ordering of G - {v}. We append the nodes of G - {v} in this order after v; this is an ordering of G in which all edges point forward, and hence it is a topological ordering.

Thus we have proved the desired converse of (3.18).

(3.20) If G is a DAG, then G has a topological ordering.

The inductive proof contains the following algorithm to compute a topological ordering of G.

  To compute a topological ordering of G:
    Find a node v with no incoming edges and order it first
    Delete v from G
    Recursively compute a topological ordering of G - {v}
      and append this order after v

In Figure 3.8 we show the sequence of node deletions that occurs when this algorithm is applied to the graph in Figure 3.7. The shaded nodes in each iteration are those with no incoming edges; the crucial point, which is what (3.19) guarantees, is that when we apply this algorithm to a DAG, there will always be at least one such node available to delete.

Figure 3.8 Starting from the graph in Figure 3.7, nodes are deleted one by one so as to be added to a topological ordering. The shaded nodes are those with no incoming edges; note that there is always at least one such node at every stage of the algorithm's execution.

To bound the running time of this algorithm, we note that identifying a node v with no incoming edges, and deleting it from G, can be done in O(n) time. Since the algorithm runs for n iterations, the total running time is O(n^2).

This is not a bad running time; and if G is very dense, containing Θ(n^2) edges, then it is linear in the size of the input. But we may well want something better when the number of edges m is much less than n^2. In such a case, a running time of O(m + n) could be a significant improvement over Θ(n^2).

In fact, we can achieve a running time of O(m + n) using the same high-level algorithm--iteratively deleting nodes with no incoming edges. We simply have to be more efficient in finding these nodes, and we do this as follows.

We declare a node to be "active" if it has not yet been deleted by the algorithm, and we explicitly maintain two things:

(a) for each node w, the number of incoming edges that w has from active nodes; and
(b) the set S of all active nodes in G that have no incoming edges from other active nodes.
At the start, all nodes are active, so we can initialize (a) and (b) with a single pass through the nodes and edges. Then, each iteration consists of selecting a node u from the set S and deleting it. After deleting u, we go through all nodes w to which u had an edge, and subtract one from the number of active incoming edges that we are maintaining for w. If this causes the number of active incoming edges to w to drop to zero, then we add w to the set S. Proceeding in this way, we keep track of nodes that are eligible for deletion at all times, while spending constant work per edge over the course of the whole algorithm.
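Putting (a) and (b) together gives the O(m + n) procedure. One possible Python sketch (illustrative names; the DAG is given as an adjacency dictionary with every node as a key) is the following.

  from collections import deque

  def topological_order(adj):
      # Iteratively "delete" nodes with no incoming edges, maintaining
      # (a) active in-degree counts and (b) the set S of eligible nodes.
      indegree = {v: 0 for v in adj}
      for u in adj:
          for v in adj[u]:
              indegree[v] += 1                           # (a) incoming edges from active nodes
      S = deque(v for v in adj if indegree[v] == 0)      # (b) active nodes with no incoming edges
      order = []
      while S:
          v = S.popleft()
          order.append(v)                                # place v next in the ordering
          for w in adj[v]:
              indegree[w] -= 1
              if indegree[w] == 0:
                  S.append(w)
      if len(order) < len(adj):
          raise ValueError("graph has a cycle; no topological ordering exists")
      return order

  print(topological_order({1: [2, 3], 2: [4], 3: [4], 4: []}))   # for example, [1, 2, 3, 4]

Initializing the in-degrees takes one pass over the nodes and edges, and each edge is examined exactly once when its tail is deleted, which is where the O(m + n) bound comes from.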
Solved Exercises

Solved Exercise 1
Consider the directed acyclic graph G in Figure 3.9. How many topological orderings does it have?

Figure 3.9 How many topological orderings does this graph have?

Solution  Recall that a topological ordering of G is an ordering of the nodes as v1, v2, ..., vn so that all edges point "forward": for every edge (vi, vj), we have i < j.

So one way to answer this question would be to write down all 5 · 4 · 3 · 2 · 1 = 120 possible orderings and check whether each is a topological ordering. But this would take a while.

Instead, we think about this as follows. As we saw in the text (or reasoning directly from the definition), the first node in a topological ordering must be one that has no edge coming into it. Analogously, the last node must be one that has no edge leaving it. Thus, in every topological ordering of G, the node a must come first and the node e must come last.

Now we have to figure how the nodes b, c, and d can be arranged in the middle of the ordering. The edge (c, d) enforces the requirement that c must come before d; but b can be placed anywhere relative to these two: before both, between c and d, or after both. This exhausts all the possibilities, and so we conclude that there are three possible topological orderings:
a, b, c, d, e
a, c, b, d, e
a, c, d, b, e

Solved Exercise 2
Some friends of yours are working on techniques for coordinating groups of mobile robots. Each robot has a radio transmitter that it uses to communicate with a base station, and your friends find that if the robots get too close to one another, then there are problems with interference among the transmitters. So a natural problem arises: how to plan the motion of the robots in such a way that each robot gets to its intended destination, but in the process the robots don't come close enough together to cause interference problems.

We can model this problem abstractly as follows. Suppose that we have an undirected graph G = (V, E), representing the floor plan of a building, and there are two robots initially located at nodes a and b in the graph. The robot at node a wants to travel to node c along a path in G, and the robot at node b wants to travel to node d. This is accomplished by means of a schedule: at each time step, the schedule specifies that one of the robots moves across a single edge, from one node to a neighboring node; at the end of the schedule, the robot from node a should be sitting on c, and the robot from b should be sitting on d.

A schedule is interference-free if there is no point at which the two robots occupy nodes that are at a distance ≤ r from one another in the graph, for a given parameter r. We'll assume that the two starting nodes a and b are at a distance greater than r, and so are the two ending nodes c and d.

Give a polynomial-time algorithm that decides whether there exists an interference-free schedule by which each robot can get to its destination.

Solution  This is a problem of the following general flavor. We have a set of possible configurations for the robots, where we define a configuration to be a choice of location for each one. We are trying to get from a given starting configuration (a, b) to a given ending configuration (c, d), subject to constraints on how we can move between configurations (we can only change one robot's location to a neighboring node), and also subject to constraints on which configurations are "legal."

This problem can be tricky to think about if we view things at the level of the underlying graph G: for a given configuration of the robots--that is, the current location of each one--it's not clear what rule we should be using to decide how to move one of the robots next. So instead we apply an idea that can be very useful for situations in which we're trying to perform this type of search. We observe that our problem looks a lot like a path-finding problem, not in the original graph G but in the space of all possible configurations.

Let us define the following (larger) graph H. The node set of H is the set of all possible configurations of the robots; that is, H consists of all possible pairs of nodes in G. We join two nodes of H by an edge if they represent configurations that could be consecutive in a schedule; that is, (u, v) and (u', v') will be joined by an edge in H if one of the pairs u, u' or v, v' are equal, and the other pair corresponds to an edge in G.
We can already observe that paths in H from (a, b) to (c, d) correspond to schedules for the robots: such a path consists precisely of a sequence of configurations in which, at each step, one robot crosses a single edge in G. However, we have not yet encoded the notion that the schedule should be interference-free.

To do this, we simply delete from H all nodes that correspond to configurations in which there would be interference. Thus we define H' to be the graph obtained from H by deleting all nodes (u, v) for which the distance between u and v in G is at most r.

The full algorithm is then as follows. We construct the graph H', and then run the connectivity algorithm from the text to determine whether there is a path from (a, b) to (c, d). The correctness of the algorithm follows from the fact that paths in H' correspond to schedules, and the nodes in H' correspond precisely to the configurations in which there is no interference.

Finally, we need to consider the running time. Let n denote the number of nodes in G, and m denote the number of edges in G. We'll analyze the running time by doing three things: (1) bounding the size of H' (which will in general be larger than G), (2) bounding the time it takes to construct H', and (3) bounding the time it takes to search for a path from (a, b) to (c, d) in H'.

1. First, then, let's consider the size of H'. H' has at most n^2 nodes, since its nodes correspond to pairs of nodes in G. Now, how many edges does H' have? A node (u, v) will have edges to (u', v) for each neighbor u' of u in G, and to (u, v') for each neighbor v' of v in G. A simple upper bound says that there can be at most n choices for (u', v), and at most n choices for (u, v'), so there are at most 2n edges incident to each node of H'. Summing over the (at most) n^2 nodes of H', we have O(n^3) edges. (We can actually give a better bound of O(mn) on the number of edges in H', by using the bound (3.9) we proved in Section 3.3 on the sum of the degrees in a graph. We'll leave this as a further exercise.)

2. Now we bound the time needed to construct H'. We first build H by enumerating all pairs of nodes in G in time O(n^2), and constructing edges using the definition above in time O(n) per node, for a total of O(n^3). Now we need to figure out which nodes to delete from H so as to produce H'. We can do this as follows. For each node u in G, we run a breadth-first search from u and identify all nodes v within distance r of u. We list all these pairs (u, v) and delete them from H. Each breadth-first search runs in O(m + n) time, and we run one from each node, so this step takes O(n(m + n)) time in total.

3. Now we have H', and so we just need to decide whether there is a path from (a, b) to (c, d). This can be done using the connectivity algorithm from the text in time that is linear in the number of nodes and edges of H'. Since H' has O(n^2) nodes and O(n^3) edges, this final step takes polynomial time as well.
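A Python sketch of this construction follows. It is illustrative only: rather than materializing H' as an explicit data structure, it explores the legal configurations directly, which amounts to running BFS on H'. The graph is an adjacency dictionary, and pairwise distances are obtained by running BFS from every node.

  from collections import deque

  def bfs_distances(adj, s):
      dist = {s: 0}
      queue = deque([s])
      while queue:
          u = queue.popleft()
          for v in adj[u]:
              if v not in dist:
                  dist[v] = dist[u] + 1
                  queue.append(v)
      return dist

  def interference_free_schedule_exists(adj, a, b, c, d, r):
      # A configuration (u, v) is legal when u and v are more than r apart in G;
      # its neighbors in H' move one robot across one edge of G.
      dist_from = {u: bfs_distances(adj, u) for u in adj}   # n BFS runs give all needed distances

      def legal(u, v):
          return dist_from[u].get(v, float("inf")) > r

      start, goal = (a, b), (c, d)
      if not (legal(a, b) and legal(c, d)):
          return False
      seen = {start}
      queue = deque([start])
      while queue:
          u, v = queue.popleft()
          if (u, v) == goal:
              return True
          for nxt in [(u2, v) for u2 in adj[u]] + [(u, v2) for v2 in adj[v]]:
              if nxt not in seen and legal(*nxt):
                  seen.add(nxt)
                  queue.append(nxt)
      return False

Exploring H' implicitly in this way visits the same set of configurations as the explicit construction, and it stays within the polynomial bounds derived above.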
We can already observe that paths in H from (a,/)) to (c, d) correspond
from (a, b) to (c, d). This can be done using the connectivity algorithm
to schedules for the robots: such a path consists precisely of a sequence of from the text in time that is linear in the number of nodes and edges
configurations in which, at each step, one robot crosses a single edge in G. of H’. Since H’ has O(n2) nodes and O(n~) edges, this final step takes
However, we have not yet encoded the notion that the schedule should be polynomial time as well.
interference-free.
To do this, we simply delete from H all nodes that correspond to configura-
tions in which there would be interference. Thus we define H~ to be the graph Exercises
obtained from H by deleting all nodes (u, v) for which the distance between
u and v in G is at most r. 1. Considhr the directed acyclic graph G in Figure 3.10. How many topolog- Figure 3.10 How many topo-
The full algorithm is then as follows. We construct the graph H’, and then ical orderings does it have? logical orderings does this
graph have?
run the connectiviW algorithm from the text to determine whether there is a Give an algorithm to detect whether a given undirected graph contains
path from (a, b) to (c, d). The correctness of the algorithm follows from the
a cycle. If the graph contains a cycle, then your algorithm should output
fact that paths in H’ correspond to schedules, and the nodes in H’ correspond
one. (It should not output all cycles in the graph, just one of them.) The
precisely to the configurations in which there is no interference.
running time of your algorithm should be O(m + n) for a graph with n
Finally, we need to consider the running time. Let n denote the number nodes and m edges.
of nodes in G, and m denote the number of edges in G. We’ll analyze the
running time by doing three things: (1) bounding the size of H’ (which will in 3. The algorithm described in Section 3.6 for computing a topological order-
general be larger than G), (2) bounding the time it takes to construct H’, and ing of a DAG repeatediy finds a node with no incoming edges and deletes
(3) bounding the time it takes to search for a path from (a, b) to (c, d) in H. it. This will eventually produce a topological ordering, provided that the
¯ input graph really is a DAG.
1. First, then, let’s consider the size of H’. H’ has at most nz nodes, since But suppose that we’re given an arbitrary graph that may or may not
its nodes correspond to pairs of nodes in G. Now, how many edges does be a DAG. Extend the topological ordering algorithm so that, given an
H’ have? A node (u, v) will have edges to (u’, v) for each neighbor u’ input directed graph G, it outputs one of two things: (a) a topological
of u in G, and to (u, v’) for each neighbor v’ of v in G. A simple upper ordering, thus establishing that a is a DAG; or (b) a cycle in G, thus
bound says that there can be at most n choices for (u’, u), and at most n establishing that a is not a DAG. The nmning time of your algorithm
choices for (u, v’), so there are at most 2n edges incident to each node should be O(m + n) for a directed graph with n nodes and m edges.
of H’. Summing over the (at most) n2 nodes of H’, we have O(n3) edges.
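The construction above translates directly into code. Here is a minimal sketch of the configuration-graph idea for two robots, assuming G is given as an adjacency-list dictionary; the function names and the parameters r, start, and goal are placeholders introduced for this illustration, not anything fixed by the exercise.

# Sketch: decide whether an interference-free schedule exists, by doing
# BFS over the configuration graph H' rather than over G itself.
# Assumes G is an adjacency-list dict; r, start, and goal are parameters.
from collections import deque

def within_distance(G, u, r):
    # Return the set of nodes at distance <= r from u (bounded BFS in G).
    dist = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if dist[x] == r:
            continue
        for y in G[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return set(dist)

def schedule_exists(G, start, goal, r):
    # A configuration (u, v) is legal if v is farther than r from u in G.
    def legal(u, v):
        return v not in within_distance(G, u, r)

    if not (legal(*start) and legal(*goal)):
        return False
    seen = {start}
    queue = deque([start])
    while queue:
        u, v = queue.popleft()
        if (u, v) == goal:
            return True
        # Neighbors in H': move exactly one robot along an edge of G.
        for conf in [(u2, v) for u2 in G[u]] + [(u, v2) for v2 in G[v]]:
            if conf not in seen and legal(*conf):
                seen.add(conf)
                queue.append(conf)
    return False

For efficiency one would precompute, with one breadth-first search per node, the set of pairs within distance r (as in the running-time analysis above) instead of recomputing it inside legal; the sketch favors brevity over that optimization.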
Exercises

1. Consider the directed acyclic graph G in Figure 3.10. How many topological orderings does it have?

Figure 3.10 How many topological orderings does this graph have?

2. Give an algorithm to detect whether a given undirected graph contains a cycle. If the graph contains a cycle, then your algorithm should output one. (It should not output all cycles in the graph, just one of them.) The running time of your algorithm should be O(m + n) for a graph with n nodes and m edges.

3. The algorithm described in Section 3.6 for computing a topological ordering of a DAG repeatedly finds a node with no incoming edges and deletes it. This will eventually produce a topological ordering, provided that the input graph really is a DAG.

But suppose that we're given an arbitrary graph that may or may not be a DAG. Extend the topological ordering algorithm so that, given an input directed graph G, it outputs one of two things: (a) a topological ordering, thus establishing that G is a DAG; or (b) a cycle in G, thus establishing that G is not a DAG. The running time of your algorithm should be O(m + n) for a directed graph with n nodes and m edges.

4. Inspired by the example of that great Cornellian, Vladimir Nabokov, some of your friends have become amateur lepidopterists (they study butterflies). Often when they return from a trip with specimens of butterflies, it is very difficult for them to tell how many distinct species they've caught--thanks to the fact that many species look very similar to one another.

One day they return with n butterflies, and they believe that each belongs to one of two different species, which we'll call A and B for purposes of this discussion. They'd like to divide the n specimens into two groups--those that belong to A and those that belong to B--but it's very hard for them to directly label any one specimen. So they decide to adopt the following approach.
For each pair of specimens i and j, they study them carefully side by side. If they're confident enough in their judgment, then they label the pair (i, j) either "same" (meaning they believe them both to come from the same species) or "different" (meaning they believe them to come from different species). They also have the option of rendering no judgment on a given pair, in which case we'll call the pair ambiguous.

So now they have the collection of n specimens, as well as a collection of m judgments (either "same" or "different") for the pairs that were not declared to be ambiguous. They'd like to know if this data is consistent with the idea that each butterfly is from one of species A or B. So more concretely, we'll declare the m judgments to be consistent if it is possible to label each specimen either A or B in such a way that for each pair (i, j) labeled "same," it is the case that i and j have the same label; and for each pair (i, j) labeled "different," it is the case that i and j have different labels. They're in the middle of tediously working out whether their judgments are consistent, when one of them realizes that you probably have an algorithm that would answer this question right away.

Give an algorithm with running time O(m + n) that determines whether the m judgments are consistent.

5. A binary tree is a rooted tree in which each node has at most two children. Show by induction that in any binary tree the number of nodes with two children is exactly one less than the number of leaves.

6. We have a connected graph G = (V, E), and a specific vertex u ∈ V. Suppose we compute a depth-first search tree rooted at u, and obtain a tree T that includes all nodes of G. Suppose we then compute a breadth-first search tree rooted at u, and obtain the same tree T. Prove that G = T. (In other words, if T is both a depth-first search tree and a breadth-first search tree rooted at u, then G cannot contain any edges that do not belong to T.)

7. Some friends of yours work on wireless networks, and they're currently studying the properties of a network of n mobile devices. As the devices move around (actually, as their human owners move around), they define a graph at any point in time as follows: there is a node representing each of the n devices, and there is an edge between device i and device j if the physical locations of i and j are no more than 500 meters apart. (If so, we say that i and j are "in range" of each other.)

They'd like it to be the case that the network of devices is connected at all times, and so they've constrained the motion of the devices to satisfy the following property: at all times, each device i is within 500 meters of at least n/2 of the other devices. (We'll assume n is an even number.) What they'd like to know is: Does this property by itself guarantee that the network will remain connected?

Here's a concrete way to formulate the question as a claim about graphs.

Claim: Let G be a graph on n nodes, where n is an even number. If every node of G has degree at least n/2, then G is connected.

Decide whether you think the claim is true or false, and give a proof of either the claim or its negation.

8. A number of stories in the press about the structure of the Internet and the Web have focused on some version of the following question: How far apart are typical nodes in these networks? If you read these stories carefully, you find that many of them are confused about the difference between the diameter of a network and the average distance in a network; they often jump back and forth between these concepts as though they're the same thing.

As in the text, we say that the distance between two nodes u and v in a graph G = (V, E) is the minimum number of edges in a path joining them; we'll denote this by dist(u, v). We say that the diameter of G is the maximum distance between any pair of nodes; and we'll denote this quantity by diam(G).

Let's define a related quantity, which we'll call the average pairwise distance in G (denoted apd(G)). We define apd(G) to be the average, over all (n choose 2) sets of two distinct nodes u and v, of the distance between u and v. That is,

apd(G) = ( Σ dist(u, v) ) / (n choose 2), where the sum ranges over all sets {u, v} of two distinct nodes.

Here's a simple example to convince yourself that there are graphs G for which diam(G) ≠ apd(G). Let G be a graph with three nodes u, v, w, and with the two edges {u, v} and {v, w}. Then

diam(G) = dist(u, w) = 2,

while

apd(G) = [dist(u, v) + dist(u, w) + dist(v, w)]/3 = 4/3.
Of course, these two numbers aren't all that far apart in the case of this three-node graph, and so it's natural to ask whether there's always a close relation between them. Here's a claim that tries to make this precise.

Claim: There exists a positive natural number c so that for all connected graphs G, it is the case that diam(G) / apd(G) ≤ c.

Decide whether you think the claim is true or false, and give a proof of either the claim or its negation.

9. There's a natural intuition that two nodes that are far apart in a communication network--separated by many hops--have a more tenuous connection than two nodes that are close together. There are a number of algorithmic results that are based to some extent on different ways of making this notion precise. Here's one that involves the susceptibility of paths to the deletion of nodes.

Suppose that an n-node undirected graph G = (V, E) contains two nodes s and t such that the distance between s and t is strictly greater than n/2. Show that there must exist some node v, not equal to either s or t, such that deleting v from G destroys all s-t paths. (In other words, the graph obtained from G by deleting v contains no path from s to t.) Give an algorithm with running time O(m + n) to find such a node v.

10. A number of art museums around the country have been featuring work by an artist named Mark Lombardi (1951-2000), consisting of a set of intricately rendered graphs. Building on a great deal of research, these graphs encode the relationships among people involved in major political scandals over the past several decades: the nodes correspond to participants, and each edge indicates some type of relationship between a pair of participants. And so, if you peer closely enough at the drawings, you can trace out ominous-looking paths from a high-ranking U.S. government official, to a former business partner, to a bank in Switzerland, to a shadowy arms dealer.

Such pictures form striking examples of social networks, which, as we discussed in Section 3.1, have nodes representing people and organizations, and edges representing relationships of various kinds. And the short paths that abound in these networks have attracted considerable attention recently, as people ponder what they mean. In the case of Mark Lombardi's graphs, they hint at the short set of steps that can carry you from the reputable to the disreputable.

Of course, a single, spurious short path between nodes v and w in such a network may be more coincidental than anything else; a large number of short paths between v and w can be much more convincing. So in addition to the problem of computing a single shortest v-w path in a graph G, social networks researchers have looked at the problem of determining the number of shortest v-w paths.

This turns out to be a problem that can be solved efficiently. Suppose we are given an undirected graph G = (V, E), and we identify two nodes v and w in G. Give an algorithm that computes the number of shortest v-w paths in G. (The algorithm should not list all the paths; just the number suffices.) The running time of your algorithm should be O(m + n) for a graph with n nodes and m edges.

11. You're helping some security analysts monitor a collection of networked computers, tracking the spread of an online virus. There are n computers in the system, labeled C1, C2, ..., Cn, and as input you're given a collection of trace data indicating the times at which pairs of computers communicated. Thus the data is a sequence of ordered triples (Ci, Cj, tk); such a triple indicates that Ci and Cj exchanged bits at time tk. There are m triples total.

We'll assume that the triples are presented to you in sorted order of time. For purposes of simplicity, we'll assume that each pair of computers communicates at most once during the interval you're observing.

The security analysts you're working with would like to be able to answer questions of the following form: If the virus was inserted into computer Ca at time x, could it possibly have infected computer Cb by time y? The mechanics of infection are simple: if an infected computer Ci communicates with an uninfected computer Cj at time tk (in other words, if one of the triples (Ci, Cj, tk) or (Cj, Ci, tk) appears in the trace data), then computer Cj becomes infected as well, starting at time tk. Infection can thus spread from one machine to another across a sequence of communications, provided that no step in this sequence involves a move backward in time. Thus, for example, if Ci is infected by time tk, and the trace data contains triples (Ci, Cj, tk) and (Cj, Cq, tr), where tk ≤ tr, then Cq will become infected via Cj. (Note that it is okay for tk to be equal to tr; this would mean that Cj had open connections to both Ci and Cq at the same time, and so a virus could move from Ci to Cq.)

For example, suppose n = 4, the trace data consists of the triples

(C1, C2, 4), (C2, C4, 8), (C3, C4, 8), (C1, C4, 12),
and the virus was inserted into computer C1 at time 2. Then C3 would be infected at time 8 by a sequence of three steps: first C2 becomes infected at time 4, then C4 gets the virus from C2 at time 8, and then C3 gets the virus from C4 at time 8. On the other hand, if the trace data were

(C2, C3, 8), (C1, C4, 12), (C1, C2, 14),

and again the virus was inserted into computer C1 at time 2, then C3 would not become infected during the period of observation: although C2 becomes infected at time 14, we see that C3 only communicates with C2 before C2 was infected. There is no sequence of communications moving forward in time by which the virus could get from C1 to C3 in this second example.

Design an algorithm that answers questions of this type: given a collection of trace data, the algorithm should decide whether a virus introduced at computer Ca at time x could have infected computer Cb by time y. The algorithm should run in time O(m + n).

12. You're helping a group of ethnographers analyze some oral history data they've collected by interviewing members of a village to learn about the lives of people who've lived there over the past two hundred years.

From these interviews, they've learned about a set of n people (all of them now deceased), whom we'll denote P1, P2, ..., Pn. They've also collected facts about when these people lived relative to one another. Each fact has one of the following two forms:

o For some i and j, person Pi died before person Pj was born; or
o for some i and j, the life spans of Pi and Pj overlapped at least partially.

Naturally, they're not sure that all these facts are correct; memories are not so good, and a lot of this was passed down by word of mouth. So what they'd like you to determine is whether the data they've collected is at least internally consistent, in the sense that there could have existed a set of people for which all the facts they've learned simultaneously hold.

Give an efficient algorithm to do this: either it should produce proposed dates of birth and death for each of the n people so that all the facts hold true, or it should report (correctly) that no such dates can exist--that is, the facts collected by the ethnographers are not internally consistent.

Notes and Further Reading


The theory of graphs is a large topic, encompassing both algorithmic and nonalgorithmic issues. It is generally considered to have begun with a paper by Euler (1736), grown through interest in graph representations of maps and chemical compounds in the nineteenth century, and emerged as a systematic area of study in the twentieth century, first as a branch of mathematics and later also through its applications to computer science. The books by Berge (1976), Bollobas (1998), and Diestel (2000) provide substantial further coverage of graph theory. Recently, extensive data has become available for studying large networks that arise in the physical, biological, and social sciences, and there has been interest in understanding properties of networks that span all these different domains. The books by Barabasi (2002) and Watts (2002) discuss this emerging area of research, with presentations aimed at a general audience.

The basic graph traversal techniques covered in this chapter have numerous applications. We will see a number of these in subsequent chapters, and we refer the reader to the book by Tarjan (1983) for further results.

Notes on the Exercises Exercise 12 is based on a result of Martin Golumbic and Ron Shamir.
Greedy Algorithms

In Wall Street, that iconic movie of the 1980s, Michael Douglas gets up in
front of a room full of stockholders and proclaims, "Greed... is good. Greed
is right. Greed works." In this chapter, we’ll be taking a much more understated
perspective as we investigate the pros and cons of short-sighted greed in the
design of algorithms. Indeed, our aim is to approach a number of different
computational problems with a recurring set of questions: Is greed good? Does
greed work?
It is hard, if not impossible, to define precisely what is meant by a greedy
algorithm. An algorithm is greedy if it builds up a solution in small steps,
choosing a decision at each step myopically to optimize some underlying
criterion. One can often design many different greedy algorithms for the same
problem, each one locally, incrementally optimizing some different measure
on its way to a solution.
When a greedy algorithm succeeds in solving a nontrivial problem opti-
mally, it typically implies something interesting and useful about the structure
of the problem itself; there is a local decision rule that one can use to con-
struct optimal solutions. And as we’ll see later, in Chapter 11, the same is true
of problems in which a greedy algorithm can produce a solution that is guar-
anteed to be close to optimal, even if it does not achieve the precise optimum.
These are the kinds of issues we’ll be dealing with in this chapter. It’s easy to
invent greedy algorithms for almost any problem; finding cases in which they
work well, and proving that they work well, is the interesting challenge.
The first two sections of this chapter will develop two basic methods for
proving that a greedy algorithm produces an optimal solution to a problem.
One can view the first approach as establishing that the greedy algorithm stays
ahead. By this we mean that if one measures the greedy algorithm’s progress
in a step-by-step fashion, one sees that it does better than any other algorithm at each step; it then follows that it produces an optimal solution. The second approach is known as an exchange argument, and it is more general: one considers any possible solution to the problem and gradually transforms it into the solution found by the greedy algorithm without hurting its quality. Again, it will follow that the greedy algorithm must have found a solution that is at least as good as any other solution.

Following our introduction of these two styles of analysis, we focus on several of the most well-known applications of greedy algorithms: shortest paths in a graph, the Minimum Spanning Tree Problem, and the construction of Huffman codes for performing data compression. They each provide nice examples of our analysis techniques. We also explore an interesting relationship between minimum spanning trees and the long-studied problem of clustering. Finally, we consider a more complex application, the Minimum-Cost Arborescence Problem, which further extends our notion of what a greedy algorithm is.

4.1 Interval Scheduling: The Greedy Algorithm Stays Ahead

Let's recall the Interval Scheduling Problem, which was the first of the five representative problems we considered in Chapter 1. We have a set of requests {1, 2, ..., n}; the ith request corresponds to an interval of time starting at s(i) and finishing at f(i). (Note that we are slightly changing the notation from Section 1.2, where we used s_i rather than s(i) and f_i rather than f(i). This change of notation will make things easier to talk about in the proofs.) We'll say that a subset of the requests is compatible if no two of them overlap in time, and our goal is to accept as large a compatible subset as possible. Compatible sets of maximum size will be called optimal.

Designing a Greedy Algorithm

Using the Interval Scheduling Problem, we can make our discussion of greedy algorithms much more concrete. The basic idea in a greedy algorithm for interval scheduling is to use a simple rule to select a first request i_1. Once a request i_1 is accepted, we reject all requests that are not compatible with i_1. We then select the next request i_2 to be accepted, and again reject all requests that are not compatible with i_2. We continue in this fashion until we run out of requests. The challenge in designing a good greedy algorithm is in deciding which simple rule to use for the selection--and there are many natural rules for this problem that do not give good solutions.

Let's try to think of some of the most natural rules and see how they work.

o The most obvious rule might be to always select the available request that starts earliest--that is, the one with minimal start time s(i). This way our resource starts being used as quickly as possible.
This method does not yield an optimal solution. If the earliest request i is for a very long interval, then by accepting request i we may have to reject a lot of requests for shorter time intervals. Since our goal is to satisfy as many requests as possible, we will end up with a suboptimal solution. In a really bad case--say, when the finish time f(i) is the maximum among all requests--the accepted request i keeps our resource occupied for the whole time. In this case our greedy method would accept a single request, while the optimal solution could accept many. Such a situation is depicted in Figure 4.1(a).

o This might suggest that we should start out by accepting the request that requires the smallest interval of time--namely, the request for which f(i) - s(i) is as small as possible. As it turns out, this is a somewhat better rule than the previous one, but it still can produce a suboptimal schedule. For example, in Figure 4.1(b), accepting the short interval in the middle would prevent us from accepting the other two, which form an optimal solution.

Figure 4.1 Some instances of the Interval Scheduling Problem on which natural greedy algorithms fail to find the optimal solution. In (a), it does not work to select the interval that starts earliest; in (b), it does not work to select the shortest interval; and in (c), it does not work to select the interval with the fewest conflicts.
o In the previous greedy rule, our problem was that the second request competes with both the first and the third--that is, accepting this request made us reject two other requests. We could design a greedy algorithm that is based on this idea: for each request, we count the number of other requests that are not compatible, and accept the request that has the fewest number of noncompatible requests. (In other words, we select the interval with the fewest "conflicts.") This greedy choice would lead to the optimum solution in the previous example. In fact, it is quite a bit harder to design a bad example for this rule; but it can be done, and we've drawn an example in Figure 4.1(c). The unique optimal solution in this example is to accept the four requests in the top row. The greedy method suggested here accepts the middle request in the second row and thereby ensures a solution of size no greater than three.

A greedy rule that does lead to the optimal solution is based on a fourth idea: we should accept first the request that finishes first, that is, the request i for which f(i) is as small as possible. This is also quite a natural idea: we ensure that our resource becomes free as soon as possible while still satisfying one request. In this way we can maximize the time left to satisfy other requests.

Let us state the algorithm a bit more formally. We will use R to denote the set of requests that we have neither accepted nor rejected yet, and use A to denote the set of accepted requests. For an example of how the algorithm runs, see Figure 4.2.

Initially let R be the set of all requests, and let A be empty
While R is not yet empty
  Choose a request i ∈ R that has the smallest finishing time
  Add request i to A
  Delete all requests from R that are not compatible with request i
EndWhile
Return the set A as the set of accepted requests

Analyzing the Algorithm

While this greedy method is quite natural, it is certainly not obvious that it returns an optimal set of intervals. Indeed, it would only be sensible to reserve judgment on its optimality: the ideas that led to the previous nonoptimal versions of the greedy method also seemed promising at first.

As a start, we can immediately declare that the intervals in the set A returned by the algorithm are all compatible.

(4.1) A is a compatible set of requests.

Figure 4.2 Sample run of the Interval Scheduling Algorithm. At each step the selected intervals are darker lines, and the intervals deleted at the corresponding step are indicated with dashed lines.

What we need to show is that this solution is optimal. So, for purposes of comparison, let O be an optimal set of intervals. Ideally one might want to show that A = O, but this is too much to ask: there may be many optimal solutions, and at best A is equal to a single one of them. So instead we will simply show that |A| = |O|, that is, that A contains the same number of intervals as O and hence is also an optimal solution.

The idea underlying the proof, as we suggested initially, will be to find a sense in which our greedy algorithm "stays ahead" of this solution O. We will compare the partial solutions that the greedy algorithm constructs to initial segments of the solution O, and show that the greedy algorithm is doing better in a step-by-step fashion.

We introduce some notation to help with this proof. Let i_1, ..., i_k be the set of requests in A in the order they were added to A. Note that |A| = k. Similarly, let the set of requests in O be denoted by j_1, ..., j_m. Our goal is to prove that k = m. Assume that the requests in O are also ordered in the natural left-to-right order of the corresponding intervals, that is, in the order of the start and finish points. Note that the requests in O are compatible, which implies that the start points have the same order as the finish points.

Figure 4.3 The inductive step in the proof that the greedy algorithm stays ahead. Can the greedy algorithm's interval really finish later?

Our intuition for the greedy method came from wanting our resource to become free again as soon as possible after satisfying the first request. And indeed, our greedy rule guarantees that f(i_1) ≤ f(j_1). This is the sense in which we want to show that our greedy rule "stays ahead"--that each of its intervals finishes at least as soon as the corresponding interval in the set O. Thus we now prove that for each r ≥ 1, the rth accepted request in the algorithm's schedule finishes no later than the rth request in the optimal schedule.

(4.2) For all indices r ≤ k we have f(i_r) ≤ f(j_r).

Proof. We will prove this statement by induction. For r = 1 the statement is clearly true: the algorithm starts by selecting the request i_1 with minimum finish time.

Now let r > 1. We will assume as our induction hypothesis that the statement is true for r - 1, and we will try to prove it for r. As shown in Figure 4.3, the induction hypothesis lets us assume that f(i_{r-1}) ≤ f(j_{r-1}). In order for the algorithm's rth interval not to finish earlier as well, it would need to "fall behind" as shown. But there's a simple reason why this could not happen: rather than choose a later-finishing interval, the greedy algorithm always has the option (at worst) of choosing j_r and thus fulfilling the induction step.

We can make this argument precise as follows. We know (since O consists of compatible intervals) that f(j_{r-1}) ≤ s(j_r). Combining this with the induction hypothesis f(i_{r-1}) ≤ f(j_{r-1}), we get f(i_{r-1}) ≤ s(j_r). Thus the interval j_r is in the set R of available intervals at the time when the greedy algorithm selects i_r. The greedy algorithm selects the available interval with smallest finish time; since interval j_r is one of these available intervals, we have f(i_r) ≤ f(j_r). This completes the induction step. ∎

Thus we have formalized the sense in which the greedy algorithm is remaining ahead of O: for each r, the rth interval it selects finishes at least as soon as the rth interval in O. We now see why this implies the optimality of the greedy algorithm's set A.

(4.3) The greedy algorithm returns an optimal set A.

Proof. We will prove the statement by contradiction. If A is not optimal, then an optimal set O must have more requests, that is, we must have m > k. Applying (4.2) with r = k, we get that f(i_k) ≤ f(j_k). Since m > k, there is a request j_{k+1} in O. This request starts after request j_k ends, and hence after i_k ends. So after deleting all requests that are not compatible with requests i_1, ..., i_k, the set of possible requests R still contains j_{k+1}. But the greedy algorithm stops with request i_k, and it is only supposed to stop when R is empty--a contradiction. ∎

Implementation and Running Time We can make our algorithm run in time O(n log n) as follows. We begin by sorting the n requests in order of finishing time and labeling them in this order; that is, we will assume that f(i) ≤ f(j) when i < j. This takes time O(n log n). In an additional O(n) time, we construct an array S[1 ... n] with the property that S[i] contains the value s(i).

We now select requests by processing the intervals in order of increasing f(i). We always select the first interval; we then iterate through the intervals in order until reaching the first interval j for which s(j) ≥ f(1); we then select this one as well. More generally, if the most recent interval we've selected ends at time f, we continue iterating through subsequent intervals until we reach the first j for which s(j) ≥ f. In this way, we implement the greedy algorithm analyzed above in one pass through the intervals, spending constant time per interval. Thus this part of the algorithm takes time O(n).
need to "fall behind" as shown. But there’s a simple reason why this could Extensions
not happen: rather than choose a later-finishing interval, the greedy algorithm
The Interval Scheduling Problem we considered here is a quite simple schedul-
always has the option (at worst) of choosing jr and thus fulfilling the induction
ing problem. There are many further complications that could arise in practical
step.
settings. The following point out issues that we will see later in the book in
We can make this argument precise as follows. We know (since (9 consists various forms.
of compatible intervals) that f(Jr-1) -< s(Jr). Combining this with the induction
hypothesis f(ir_1) < f(jr-1), we get f(ir_1) < s(Jr). Thus the interval Jr is in the In defining the problem, we assumed that all requests were known to
set R of available intervals at the time when the greedy algorithm selects the scheduling algorithm when it was choosing the compatible subset.
The greedy algorithm selects the available interval with smallest finish time; It would also be natural, of course, to think about the version of the
since interval Jr is one of these available intervals, we have f(ir) < f(Jr). This problem in which the scheduler needs to make decisions about accepting
completes the induction step. z or rejecting certain requests before knowing about the full set of requests.
Customers (requestors) may well be impatient, and they may give up
Thus we have formalized the sense in which the greedy algorithm is and leave if the scheduler waits too long to gather information about all
remaining ahead of (9: for each r, the rth interval it selects finishes at least other requests. An active area of research is concerned with such on-
as soon as the rth interval in (9. We now see why this implies the optimality line algorithms, which must make decisions as time proceeds, without
of the greedy algorithm’s set A. knowledge of future input.

o Our goal was to maximize the number of satisfied requests. But we could picture a situation in which each request has a different value to us. For example, each request i could also have a value v_i (the amount gained by satisfying request i), and the goal would be to maximize our income: the sum of the values of all satisfied requests. This leads to the Weighted Interval Scheduling Problem, the second of the representative problems we described in Chapter 1.

There are many other variants and combinations that can arise. We now discuss one of these further variants in more detail, since it forms another case in which a greedy algorithm can be used to produce an optimal solution.

A Related Problem: Scheduling All Intervals

The Problem In the Interval Scheduling Problem, there is a single resource and many requests in the form of time intervals, so we must choose which requests to accept and which to reject. A related problem arises if we have many identical resources available and we wish to schedule all the requests using as few resources as possible. Because the goal here is to partition all intervals across multiple resources, we will refer to this as the Interval Partitioning Problem. (The problem is also referred to as the Interval Coloring Problem; the terminology arises from thinking of the different resources as having distinct colors--all the intervals assigned to a particular resource are given the corresponding color.)

For example, suppose that each request corresponds to a lecture that needs to be scheduled in a classroom for a particular interval of time. We wish to satisfy all these requests, using as few classrooms as possible. The classrooms at our disposal are thus the multiple resources, and the basic constraint is that any two lectures that overlap in time must be scheduled in different classrooms. Equivalently, the interval requests could be jobs that need to be processed for a specific period of time, and the resources are machines capable of handling these jobs. Much later in the book, in Chapter 10, we will see a different application of this problem in which the intervals are routing requests that need to be allocated bandwidth on a fiber-optic cable.

As an illustration of the problem, consider the sample instance in Figure 4.4(a). The requests in this example can all be scheduled using three resources; this is indicated in Figure 4.4(b), where the requests are rearranged into three rows, each containing a set of nonoverlapping intervals. In general, one can imagine a solution using k resources as a rearrangement of the requests into k rows of nonoverlapping intervals: the first row contains all the intervals assigned to the first resource, the second row contains all those assigned to the second resource, and so forth.

Figure 4.4 (a) An instance of the Interval Partitioning Problem with ten intervals (a through j). (b) A solution in which all intervals are scheduled using three resources: each row represents a set of intervals that can all be scheduled on a single resource.

Now, is there any hope of using just two resources in this sample instance? Clearly the answer is no. We need at least three resources since, for example, intervals a, b, and c all pass over a common point on the time-line, and hence they all need to be scheduled on different resources. In fact, one can make this last argument in general for any instance of Interval Partitioning. Suppose we define the depth of a set of intervals to be the maximum number that pass over any single point on the time-line. Then we claim

(4.4) In any instance of Interval Partitioning, the number of resources needed is at least the depth of the set of intervals.

Proof. Suppose a set of intervals has depth d, and let I_1, ..., I_d all pass over a common point on the time-line. Then each of these intervals must be scheduled on a different resource, so the whole instance needs at least d resources. ∎
into three rows, each containing a set of nonoverlapping intervals. In general, We now consider two questions, which turn out to be closely related.
one can imagine a solution using k resources as a rearrangement of the requests First, can we design an efficient algorithm that schedules all intervals using
into k rows of nonoverlapping intervals: the first row contains all the intervals the minimum possible number of resources? Second, is there always a schedule
using a number of resources that is equal to the depth? In effect, a positive
answer to this second question would say that the only obstacles to partitioning
intervals are purely local--a set of intervals all piled over the same point. It’s
1 The problem is also referred to as the Interval Coloring Problem; the terminology arises from
not immediately clear that there couldn’t exist other, "long-range" obstacles
thinking of the different resources as having distinct colors--al~ the intervals assigned to a particular
that push the number of required resources even higher.
resource are given the corresponding color.
We now design a simple greedy algorithm that schedules all intervals using a number of resources equal to the depth. This immediately implies the optimality of the algorithm: in view of (4.4), no solution could use a number of resources that is smaller than the depth. The analysis of our algorithm will therefore illustrate another general approach to proving optimality: one finds a simple, "structural" bound asserting that every possible solution must have at least a certain value, and then one shows that the algorithm under consideration always achieves this bound.

Designing the Algorithm Let d be the depth of the set of intervals; we show how to assign a label to each interval, where the labels come from the set of numbers {1, 2, ..., d}, and the assignment has the property that overlapping intervals are labeled with different numbers. This gives the desired solution, since we can interpret each number as the name of a resource, and the label of each interval as the name of the resource to which it is assigned.

The algorithm we use for this is a simple one-pass greedy strategy that orders intervals by their starting times. We go through the intervals in this order, and try to assign to each interval we encounter a label that hasn't already been assigned to any previous interval that overlaps it. Specifically, we have the following description.

Sort the intervals by their start times, breaking ties arbitrarily
Let I_1, I_2, ..., I_n denote the intervals in this order
For j = 1, 2, 3, ..., n
  For each interval I_i that precedes I_j in sorted order and overlaps it
    Exclude the label of I_i
  Endfor
  If there is any label from {1, 2, ..., d} that has not been excluded then
    Assign a nonexcluded label to I_j
  Else
    Leave I_j unlabeled
  Endif
Endfor

Analyzing the Algorithm We claim the following.

(4.5) If we use the greedy algorithm above, every interval will be assigned a label, and no two overlapping intervals will receive the same label.

Proof. First let's argue that no interval ends up unlabeled. Consider one of the intervals I_j, and suppose there are t intervals earlier in the sorted order that overlap it. These t intervals, together with I_j, form a set of t + 1 intervals that all pass over a common point on the time-line (namely, the start time of I_j), and so t + 1 ≤ d. Thus t ≤ d - 1. It follows that at least one of the d labels is not excluded by this set of t intervals, and so there is a label that can be assigned to I_j.

Next we claim that no two overlapping intervals are assigned the same label. Indeed, consider any two intervals I and I' that overlap, and suppose I precedes I' in the sorted order. Then when I' is considered by the algorithm, I is in the set of intervals whose labels are excluded from consideration; consequently, the algorithm will not assign to I' the label that it used for I. ∎

The algorithm and its analysis are very simple. Essentially, if you have d labels at your disposal, then as you sweep through the intervals from left to right, assigning an available label to each interval you encounter, you can never reach a point where all the labels are currently in use.

Since our algorithm is using d labels, we can use (4.4) to conclude that it is, in fact, always using the minimum possible number of labels. We sum this up as follows.

(4.6) The greedy algorithm above schedules every interval on a resource, using a number of resources equal to the depth of the set of intervals. This is the optimal number of resources needed.
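One compact way to realize this one-pass labeling is sketched below. It is an illustrative variant, not the pseudocode above verbatim: it assumes intervals are given as (start, finish) pairs that overlap only when their interiors intersect, and it picks a particular nonexcluded label--the label of some interval that has already finished--using a min-heap.

# Sketch: greedy interval partitioning (coloring). Intervals are treated as
# half-open [start, finish), so an interval may reuse the resource of one
# that finishes exactly when it starts. Uses a min-heap keyed on finish time
# to find, in O(log n), an already-used label that is free again.
import heapq

def partition_intervals(intervals):
    # intervals: list of (start, finish) pairs; returns one label per interval,
    # in the order the intervals were given.
    order = sorted(range(len(intervals)), key=lambda i: intervals[i][0])
    active = []          # heap of (finish_time, label) for intervals still running
    labels = [None] * len(intervals)
    next_label = 1
    for i in order:
        start, finish = intervals[i]
        if active and active[0][0] <= start:
            _, label = heapq.heappop(active)   # reuse a label that is free again
        else:
            label = next_label                 # all labels used so far are occupied
            next_label += 1
        labels[i] = label
        heapq.heappush(active, (finish, label))
    return labels

# Example: three mutually overlapping intervals force three labels (depth 3).
print(partition_intervals([(1, 4), (2, 6), (3, 5), (7, 9)]))  # prints [1, 2, 3, 1]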
4.2 Scheduling to Minimize Lateness: An Exchange Argument

We now discuss a scheduling problem related to the one with which we began the chapter. Despite the similarities in the problem formulation and in the greedy algorithm to solve it, the proof that this algorithm is optimal will require a more sophisticated kind of analysis.

The Problem

Consider again a situation in which we have a single resource and a set of n requests to use the resource for an interval of time. Assume that the resource is available starting at time s. In contrast to the previous problem, however, each request is now more flexible. Instead of a start time and finish time, the request i has a deadline d_i, and it requires a contiguous time interval of length t_i, but it is willing to be scheduled at any time before the deadline. Each accepted request must be assigned an interval of time of length t_i, and different requests must be assigned nonoverlapping intervals.

There are many objective functions we might seek to optimize when faced with this situation, and some are computationally much more difficult than
others. Here we consider a very natural goal that can be optimized by a greedy algorithm. Suppose that we plan to satisfy each request, but we are allowed to let certain requests run late. Thus, beginning at our overall start time s, we will assign each request i an interval of time of length t_i; let us denote this interval by [s(i), f(i)], with f(i) = s(i) + t_i. Unlike the previous problem, then, the algorithm must actually determine a start time (and hence a finish time) for each interval.

We say that a request i is late if it misses the deadline, that is, if f(i) > d_i. The lateness of such a request i is defined to be l_i = f(i) - d_i. We will say that l_i = 0 if request i is not late. The goal in our new optimization problem will be to schedule all requests, using nonoverlapping intervals, so as to minimize the maximum lateness, L = max_i l_i. This problem arises naturally when scheduling jobs that need to use a single machine, and so we will refer to our requests as jobs.

Figure 4.5 shows a sample instance of this problem, consisting of three jobs: the first has length t_1 = 1 and deadline d_1 = 2; the second has t_2 = 2 and d_2 = 4; and the third has t_3 = 3 and d_3 = 6. It is not hard to check that scheduling the jobs in the order 1, 2, 3 incurs a maximum lateness of 0.

Figure 4.5 A sample instance of scheduling to minimize lateness. Job 1: length 1, deadline 2; Job 2: length 2, deadline 4; Job 3: length 3, deadline 6. Solution: Job 1 done at time 1, Job 2 done at time 1 + 2 = 3, Job 3 done at time 1 + 2 + 3 = 6.

Designing the Algorithm

What would a greedy algorithm for this problem look like? There are several natural greedy approaches in which we look at the data (t_i, d_i) about the jobs and use this to order them according to some simple rule.

o One approach would be to schedule the jobs in order of increasing length t_i, so as to get the short jobs out of the way quickly. This immediately looks too simplistic, since it completely ignores the deadlines of the jobs. And indeed, consider a two-job instance where the first job has t_1 = 1 and d_1 = 100, while the second job has t_2 = 10 and d_2 = 10. Then the second job has to be started right away if we want to achieve lateness L = 0, and scheduling the second job first is indeed the optimal solution.

o The previous example suggests that we should be concerned about jobs whose available slack time d_i - t_i is very small--they're the ones that need to be started with minimal delay. So a more natural greedy algorithm would be to sort jobs in order of increasing slack d_i - t_i.
Unfortunately, this greedy rule fails as well. Consider a two-job instance where the first job has t_1 = 1 and d_1 = 2, while the second job has t_2 = 10 and d_2 = 10. Sorting by increasing slack would place the second job first in the schedule, and the first job would incur a lateness of 9. (It finishes at time 11, nine units beyond its deadline.) On the other hand, if we schedule the first job first, then it finishes on time and the second job incurs a lateness of only 1.

There is, however, an equally basic greedy algorithm that always produces an optimal solution. We simply sort the jobs in increasing order of their deadlines d_i, and schedule them in this order. (This rule is often called Earliest Deadline First.) There is an intuitive basis to this rule: we should make sure that jobs with earlier deadlines get completed earlier. At the same time, it's a little hard to believe that this algorithm always produces optimal solutions--specifically because it never looks at the lengths of the jobs. Earlier we were skeptical of the approach that sorted by length on the grounds that it threw away half the input data (i.e., the deadlines); but now we're considering a solution that throws away the other half of the data. Nevertheless, Earliest Deadline First does produce optimal solutions, and we will now prove this.

First we specify some notation that will be useful in talking about the algorithm. By renaming the jobs if necessary, we can assume that the jobs are labeled in the order of their deadlines, that is, we have

d_1 ≤ ... ≤ d_n.

We will simply schedule all jobs in this order. Again, let s be the start time for all jobs. Job 1 will start at time s = s(1) and end at time f(1) = s(1) + t_1; Job 2 will start at time s(2) = f(1) and end at time f(2) = s(2) + t_2; and so forth. We will use f to denote the finishing time of the last scheduled job. We write this algorithm here.

Order the jobs in order of their deadlines
Assume for simplicity of notation that d_1 ≤ ... ≤ d_n
Initially, f = s
Consider the jobs i = 1, ..., n in this order
  Assign job i to the time interval from s(i) = f to f(i) = f + t_i
  Let f = f + t_i
End
Return the set of scheduled intervals [s(i), f(i)] for i = 1, ..., n
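For concreteness, here is a small Python sketch of Earliest Deadline First; the function name and the representation of jobs as (length, deadline) pairs are choices made for this illustration.

# Sketch: Earliest Deadline First. Sort jobs by deadline, schedule them
# back to back starting at time s, and report the maximum lateness.
def schedule_edf(jobs, s=0):
    # jobs: list of (length, deadline) pairs
    schedule = []
    f = s
    for length, deadline in sorted(jobs, key=lambda job: job[1]):
        start, finish = f, f + length
        schedule.append((start, finish, deadline))
        f = finish
    max_lateness = max(max(0, finish - deadline) for _, finish, deadline in schedule)
    return schedule, max_lateness

# The three-job instance from Figure 4.5: maximum lateness 0.
_, L = schedule_edf([(1, 2), (2, 4), (3, 6)])
print(L)  # prints 0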
Analyzing the Algorithm

To reason about the optimality of the algorithm, we first observe that the schedule it produces has no "gaps"--times when the machine is not working yet there are jobs left. The time that passes during a gap will be called idle time: there is work to be done, yet for some reason the machine is sitting idle. Not only does the schedule A produced by our algorithm have no idle time; it is also very easy to see that there is an optimal schedule with this property. We do not write down a proof for this.

(4.7) There is an optimal schedule with no idle time.

Now, how can we prove that our schedule A is optimal, that is, its maximum lateness L is as small as possible? As in previous analyses, we will start by considering an optimal schedule O. Our plan here is to gradually modify O, preserving its optimality at each step, but eventually transforming it into a schedule that is identical to the schedule A found by the greedy algorithm. We refer to this type of analysis as an exchange argument, and we will see that it is a powerful way to think about greedy algorithms in general.

We first try characterizing schedules in the following way. We say that a schedule A' has an inversion if a job i with deadline d_i is scheduled before another job j with earlier deadline d_j < d_i. Notice that, by definition, the schedule A produced by our algorithm has no inversions. If there are jobs with identical deadlines then there can be many different schedules with no inversions. However, we can show that all these schedules have the same maximum lateness L.

(4.8) All schedules with no inversions and no idle time have the same maximum lateness.

Proof. If two different schedules have neither inversions nor idle time, then they might not produce exactly the same order of jobs, but they can only differ in the order in which jobs with identical deadlines are scheduled. Consider such a deadline d. In both schedules, the jobs with deadline d are all scheduled consecutively (after all jobs with earlier deadlines and before all jobs with later deadlines). Among the jobs with deadline d, the last one has the greatest lateness, and this lateness does not depend on the order of the jobs. ∎

The main step in showing the optimality of our algorithm is to establish that there is an optimal schedule that has no inversions and no idle time. To do this, we will start with any optimal schedule having no idle time; we will then convert it into a schedule with no inversions without increasing its maximum lateness. Thus the resulting schedule after this conversion will be optimal as well.

(4.9) There is an optimal schedule that has no inversions and no idle time.

Proof. By (4.7), there is an optimal schedule O with no idle time. The proof will consist of a sequence of statements. The first of these is simple to establish.

(a) If O has an inversion, then there is a pair of jobs i and j such that j is scheduled immediately after i and has d_j < d_i.

Indeed, consider an inversion in which a job a is scheduled sometime before a job b, and d_a > d_b. If we advance in the scheduled order of jobs from a to b one at a time, there has to come a point at which the deadline we see decreases for the first time. This corresponds to a pair of consecutive jobs that form an inversion.

Now suppose O has at least one inversion, and by (a), let i and j be a pair of inverted requests that are consecutive in the scheduled order. We will decrease the number of inversions in O by swapping the requests i and j in the schedule O. The pair (i, j) formed an inversion in O, this inversion is eliminated by the swap, and no new inversions are created. Thus we have

(b) After swapping i and j we get a schedule with one less inversion.

The hardest part of this proof is to argue that the inverted schedule is also optimal.

(c) The new swapped schedule has a maximum lateness no larger than that of O.

It is clear that if we can prove (c), then we are done. The initial schedule O can have at most (n choose 2) inversions (if all pairs are inverted), and hence after at most (n choose 2) swaps we get an optimal schedule with no inversions. So we now conclude by proving (c), showing that by swapping a pair of consecutive, inverted jobs, we do not increase the maximum lateness L of the schedule. ∎

Proof of (c). We invent some notation to describe the schedule O: assume that each request r is scheduled for the time interval [s(r), f(r)] and has lateness l'_r. Let L' = max_r l'_r denote the maximum lateness of this schedule.

Figure 4.6 The effect of swapping two consecutive, inverted jobs. Only the finishing times of i and j are affected by the swap.

Let Ō denote the swapped schedule; we will use s̄(r), f̄(r), l̄_r, and L̄ to denote the corresponding quantities in the swapped schedule.

Now recall our two adjacent, inverted jobs i and j. The situation is roughly as pictured in Figure 4.6. The finishing time of j before the swap is exactly equal to the finishing time of i after the swap. Thus all jobs other than jobs i and j finish at the same time in the two schedules. Moreover, job j will get finished earlier in the new schedule, and hence the swap does not increase the lateness of job j.

Thus the only thing to worry about is job i: its lateness may have been increased, and what if this actually raises the maximum lateness of the whole schedule? After the swap, job i finishes at time f(j), when job j was finished in the schedule O. If job i is late in this new schedule, its lateness is l̄_i = f̄(i) - d_i = f(j) - d_i. But the crucial point is that i cannot be more late in the schedule Ō than j was in the schedule O. Specifically, our assumption d_i > d_j implies that

l̄_i = f(j) - d_i < f(j) - d_j = l'_j.

Since the lateness of the schedule O was L' ≥ l'_j > l̄_i, this shows that the swap does not increase the maximum lateness of the schedule. ∎

The optimality of our greedy algorithm now follows immediately.

(4.10) The schedule A produced by the greedy algorithm has optimal maximum lateness L.

Proof. Statement (4.9) proves that an optimal schedule with no inversions exists. Now by (4.8) all schedules with no inversions have the same maximum lateness, and so the schedule obtained by the greedy algorithm is optimal. ∎

Extensions

There are many possible generalizations of this scheduling problem. For example, we assumed that all jobs were available to start at the common start time s. A natural, but harder, version of this problem would contain requests i that, in addition to the deadline d_i and the requested time t_i, would also have an earliest possible starting time r_i. This earliest possible starting time is usually referred to as the release time. Problems with release times arise naturally in scheduling problems where requests can take the form: Can I reserve the room for a two-hour lecture, sometime between 1 P.M. and 5 P.M.? Our proof that the greedy algorithm finds an optimal solution relied crucially on the fact that all jobs were available at the common start time s. (Do you see where?) Unfortunately, as we will see later in the book, in Chapter 8, this more general version of the problem is much more difficult to solve optimally.

4.3 Optimal Caching: A More Complex Exchange Argument

We now consider a problem that involves processing a sequence of requests of a different form, and we develop an algorithm whose analysis requires a more subtle use of the exchange argument. The problem is that of cache maintenance.

The Problem

To motivate caching, consider the following situation. You're working on a long research paper, and your draconian library will only allow you to have eight books checked out at once. You know that you'll probably need more than this over the course of working on the paper, but at any point in time, you'd like to have ready access to the eight books that are most relevant at that time. How should you decide which books to check out, and when should you return some in exchange for others, to minimize the number of times you have to exchange a book at the library?

This is precisely the problem that arises when dealing with a memory hierarchy: There is a small amount of data that can be accessed very quickly,
Chapter 4 Greedy Algorithms 133
132
Thus, on a particular sequence of memory references, a cache main-
and a large amount of data that requires more time to access; and you must
tenance algorithm determines an eviction schedule--specifying which items
decide which pieces of data to have close at hand. should be evicted from the cache at which points in the sequence--and t_his
Memory hierarchies have been a ubiquitous feature of computers since determines the contents of the cache and the number of misses over time. Let’s
very early in their history. To begin with, data in the main memory of a consider an example of this process.
processor can be accessed much more quickly than the data on its hard disk;
but the disk has much more storage capaciW. Thus, it is important to keep Suppose we have three items [a, b, c], the cache size is k = 2, and we
the most regularly used pieces o~ data in main memory, and go to disk as are presented with the sequence
infrequently as possible. The same phenomenon, qualitatively, occurs with
a,b,c,b,c,a,b.
on-chip caches in modern processors. These can be accessed in a few cycles,
and so data can be retrieved from cache much more quickly than it can be Suppose that the cache initially contains the items a and b. Then on the
retrieved from main memory. This is another level of hierarchy: smal! caches third item in the sequence, we could evict a so as to bring in c; and
have faster access time than main memory, which in turn is smaller and faster on the sixth item we could evict c so as to bring in a; we thereby incur
to access than disk. And one can see extensions of this hierarchy in many two cache misses over the whole sequence. After thinking about it, one
other settings. When one uses a Web browser, the disk often acts as a cache concludes that any eviction schedule for this sequence must include at
for frequently visited Web pages, since going to disk is stil! much faster than least two cache misses.
downloading something over the Internet.
Under real operating conditions, cache maintenance algorithms must
Caching is a general term for the process of storing a small amount of dat~ process memory references dl, d2 .... without knowledge of what’s coming
in a fast memory so as to reduce the amount of time spent interacting with a in the future; but for purposes of evaluating the quality of these algorithms,
slow memory. In the previous examples, the on-chip cache reduces the need systems researchers very early on sought to understand the nature of the
to fetch data from main memory, the main memory acts as a cache for the optimal solution to the caching problem. Given a fifll sequence S of memory
disk, and the disk acts as a cache for the Internet. (Much as your desk acts as references, what is the eviction schedule that incurs as few cache misses as
a cache for the campus library, and the assorted facts you’re able to remember possible?
without looMng them up constitute a cache for the books on your desk.)
For caching to be as effective as possible, it should generally be the case ~ Designing and Analyzing the Algorithm
that when you go to access a piece of data, it is already in the cache. To achieve
In the 1960s, Les Belady showed that the following simple rule will always
this, a cache maintenance algorithm determines what to keep in the cache and
what to evict from the cache when new data needs to be brought in. incur the minimum number of misses:

Of course, as the caching problem arises in different settings, it involves


various different considerations based on the underlying technologY. For our When di needs to be brought into the cache,
evict the item that is needed the farthest into the future
purposes here, though, we take an abstract view of the problem that underlies
most of these settings. We consider a set U of n pieces of data stored in main
memory. We also have a faster memory, the cache, that can hold k < n pieces We will call this the Farthest-in-Future Algorithm. When it is time to evict
of data at any one time. We will assume that the cache initially holds some something, we look at the next time that each item in the cachewill be
set of k items. A sequence of data items D = dl, d2 ..... dm drawn from U is referenced, and choose the one for which this is as late as possible.
presented to us--this is the sequence of memory references we must process-- This is a very natural algorithm. At the same time, the fact that it is optimal
and in processing them we must decide at all times which k items to keep in the on all sequences is somewhat more subtle than it first appears. Why evict the
cache. When item di is presented, we can access it very quickly if it is already item that is needed farthest in the future, as opposed, for example, to the one
in the cache; otherwise, we are required to bring it from main memory into that will be used least frequently in the future? Moreover, consider a sequence
the cache and, if the cache is ful!, to evict some other piece of data that is like
currently in the cache to make room for di. This is called a cache miss, and we
a,b,c,d,a,d,e,a,d,b,c
want to have as few of these as possible.
Chapter 4 Greedy Algorithms 4.3 Optimal Caching: A More Complex Exchange Argument
134 135
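When the whole sequence is known in advance, the rule can be simulated directly. The following is a minimal Python sketch (ours, not the book's; the function name and representation are assumptions) that plays out the eviction schedule and counts the misses it incurs:

    def farthest_in_future(sequence, cache, k):
        cache = set(cache)              # current cache contents, at most k items
        misses = 0
        for i, d in enumerate(sequence):
            if d in cache:
                continue                # cache hit: nothing to do
            misses += 1                 # cache miss: d must be brought in
            if len(cache) >= k:
                def next_use(x):        # next request to x after position i
                    for j in range(i + 1, len(sequence)):
                        if sequence[j] == x:
                            return j
                    return float("inf")  # never requested again
                cache.remove(max(cache, key=next_use))   # evict farthest in future
            cache.add(d)
        return misses

    # The example from the text: items a, b, c with k = 2 and {a, b} in the cache.
    print(farthest_in_future(list("abcbcab"), {"a", "b"}, 2))   # prints 2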

This is a very natural algorithm. At the same time, the fact that it is optimal on all sequences is somewhat more subtle than it first appears. Why evict the item that is needed farthest in the future, as opposed, for example, to the one that will be used least frequently in the future? Moreover, consider a sequence like

    a, b, c, d, a, d, e, a, d, b, c

with k = 3 and items {a, b, c} initially in the cache. The Farthest-in-Future rule will produce a schedule S that evicts c on the fourth step and b on the seventh step. But there are other eviction schedules that are just as good. Consider the schedule S' that evicts b on the fourth step and c on the seventh step, incurring the same number of misses. So in fact it's easy to find cases where schedules produced by rules other than Farthest-in-Future are also optimal; and given this flexibility, why might a deviation from Farthest-in-Future early on not yield an actual savings farther along in the sequence? For example, on the seventh step in our example, the schedule S' is actually evicting an item (c) that is needed farther into the future than the item evicted at this point by Farthest-in-Future, since Farthest-in-Future gave up c earlier on.

These are some of the kinds of things one should worry about before concluding that Farthest-in-Future really is optimal. In thinking about the example above, we quickly appreciate that it doesn't really matter whether b or c is evicted at the fourth step, since the other one should be evicted at the seventh step; so given a schedule where b is evicted first, we can swap the choices of b and c without changing the cost. This reasoning--swapping one decision for another--forms the first outline of an exchange argument that proves the optimality of Farthest-in-Future.

Before delving into this analysis, let's clear up one important issue. All the cache maintenance algorithms we've been considering so far produce schedules that only bring an item d into the cache in a step i if there is a request to d in step i, and d is not already in the cache. Let us call such a schedule reduced--it does the minimal amount of work necessary in a given step. But in general one could imagine an algorithm that produced schedules that are not reduced, by bringing in items in steps when they are not requested. We now show that for every nonreduced schedule, there is an equally good reduced schedule.

Let S be a schedule that may not be reduced. We define a new schedule S̄--the reduction of S--as follows. In any step i where S brings in an item d that has not been requested, our construction of S̄ "pretends" to do this but actually leaves d in main memory. It only really brings d into the cache in the next step j after this in which d is requested. In this way, the cache miss incurred by S̄ in step j can be charged to the earlier cache operation performed by S in step i, when it brought in d. Hence we have the following fact.

(4.11) S̄ is a reduced schedule that brings in at most as many items as the schedule S.

Note that for any reduced schedule, the number of items that are brought in is exactly the number of misses.

Proving the Optimality of Farthest-in-Future We now proceed with the exchange argument showing that Farthest-in-Future is optimal. Consider an arbitrary sequence D of memory references; let S_FF denote the schedule produced by Farthest-in-Future, and let S* denote a schedule that incurs the minimum possible number of misses. We will now gradually "transform" the schedule S* into the schedule S_FF, one eviction decision at a time, without increasing the number of misses.

Here is the basic fact we use to perform one step in the transformation.

(4.12) Let S be a reduced schedule that makes the same eviction decisions as S_FF through the first j items in the sequence, for a number j. Then there is a reduced schedule S' that makes the same eviction decisions as S_FF through the first j + 1 items, and incurs no more misses than S does.

Proof. Consider the (j + 1)st request, to item d = d_{j+1}. Since S and S_FF have agreed up to this point, they have the same cache contents. If d is in the cache for both, then no eviction decision is necessary (both schedules are reduced), and so S in fact agrees with S_FF through step j + 1, and we can set S' = S. Similarly, if d needs to be brought into the cache, but S and S_FF both evict the same item to make room for d, then we can again set S' = S.

So the interesting case arises when d needs to be brought into the cache, and to do this S evicts item f while S_FF evicts item e ≠ f. Here S and S_FF do not already agree through step j + 1 since S has e in cache while S_FF has f in cache. Hence we must actually do something nontrivial to construct S'.

As a first step, we should have S' evict e rather than f. Now we need to further ensure that S' incurs no more misses than S. An easy way to do this would be to have S' agree with S for the remainder of the sequence; but this is no longer possible, since S and S' have slightly different caches from this point onward. So instead we'll have S' try to get its cache back to the same state as S as quickly as possible, while not incurring unnecessary misses. Once the caches are the same, we can finish the construction of S' by just having it behave like S.

Specifically, from request j + 2 onward, S' behaves exactly like S until one of the following things happens for the first time.

(i) There is a request to an item g ≠ e, f that is not in the cache of S, and S evicts e to make room for it. Since S' and S only differ on e and f, it must be that g is not in the cache of S' either; so we can have S' evict f, and now the caches of S and S' are the same. We can then have S' behave exactly like S for the rest of the sequence.

(ii) There is a request to f, and S evicts an item e'. If e' = e, then we're all set: S' can simply access f from the cache, and after this step the caches
of S and S' will be the same. If e' ≠ e, then we have S' evict e' as well, and bring in e from main memory; this too results in S and S' having the same caches. However, we must be careful here, since S' is no longer a reduced schedule: it brought in e when it wasn't immediately needed. So to finish this part of the construction, we further transform S' to its reduction S̄' using (4.11); this doesn't increase the number of items brought in by S', and it still agrees with S_FF through step j + 1.

Hence, in both these cases, we have a new reduced schedule S' that agrees with S_FF through the first j + 1 items and incurs no more misses than S does. And crucially--here is where we use the defining property of the Farthest-in-Future Algorithm--one of these two cases will arise before there is a reference to e. This is because in step j + 1, Farthest-in-Future evicted the item (e) that would be needed farthest in the future; so before there could be a request to e, there would have to be a request to f, and then case (ii) above would apply. ∎

Using this result, it is easy to complete the proof of optimality. We begin with an optimal schedule S*, and use (4.12) to construct a schedule S_1 that agrees with S_FF through the first step. We continue applying (4.12) inductively for j = 1, 2, 3, ..., m, producing schedules S_j that agree with S_FF through the first j steps. Each schedule incurs no more misses than the previous one; and by definition S_m = S_FF, since it agrees with it through the whole sequence. Thus we have

(4.13) S_FF incurs no more misses than any other schedule S* and hence is optimal.

Extensions: Caching under Real Operating Conditions
As mentioned in the previous subsection, Belady's optimal algorithm provides a benchmark for caching performance; but in applications, one generally must make eviction decisions on the fly without knowledge of future requests. Experimentally, the best caching algorithms under this requirement seem to be variants of the Least-Recently-Used (LRU) Principle, which proposes evicting the item from the cache that was referenced longest ago.

If one thinks about it, this is just Belady's Algorithm with the direction of time reversed--longest in the past rather than farthest in the future. It is effective because applications generally exhibit locality of reference: a running program will generally keep accessing the things it has just been accessing. (It is easy to invent pathological exceptions to this principle, but these are relatively rare in practice.) Thus one wants to keep the more recently referenced items in the cache.
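As a concrete illustration, here is a small sketch (ours, not the book's) of the LRU rule in Python, using an OrderedDict to keep items ordered by how recently they were referenced; the function name and representation are assumptions of this sketch.

    from collections import OrderedDict

    def lru_misses(sequence, k):
        cache, misses = OrderedDict(), 0
        for d in sequence:
            if d in cache:
                cache.move_to_end(d)            # d is now the most recently used
            else:
                misses += 1
                if len(cache) == k:
                    cache.popitem(last=False)   # evict the least recently used item
                cache[d] = True
        return misses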
Long after the adoption of LRU in practice, Sleator and Tarjan showed that one could actually provide some theoretical analysis of the performance of LRU, bounding the number of misses it incurs relative to Farthest-in-Future. We will discuss this analysis, as well as the analysis of a randomized variant on LRU, when we return to the caching problem in Chapter 13.

4.4 Shortest Paths in a Graph
Some of the basic algorithms for graphs are based on greedy design principles. Here we apply a greedy algorithm to the problem of finding shortest paths, and in the next section we look at the construction of minimum-cost spanning trees.

The Problem
As we've seen, graphs are often used to model networks in which one travels from one point to another--traversing a sequence of highways through interchanges, or traversing a sequence of communication links through intermediate routers. As a result, a basic algorithmic problem is to determine the shortest path between nodes in a graph. We may ask this as a point-to-point question: Given nodes u and v, what is the shortest u-v path? Or we may ask for more information: Given a start node s, what is the shortest path from s to each other node?

The concrete setup of the shortest paths problem is as follows. We are given a directed graph G = (V, E), with a designated start node s. We assume that s has a path to every other node in G. Each edge e has a length ℓ_e ≥ 0, indicating the time (or distance, or cost) it takes to traverse e. For a path P, the length of P--denoted ℓ(P)--is the sum of the lengths of all edges in P. Our goal is to determine the shortest path from s to every other node in the graph. We should mention that although the problem is specified for a directed graph, we can handle the case of an undirected graph by simply replacing each undirected edge e = (u, v) of length ℓ_e by two directed edges (u, v) and (v, u), each of length ℓ_e.

Designing the Algorithm
In 1959, Edsger Dijkstra proposed a very simple greedy algorithm to solve the single-source shortest-paths problem. We begin by describing an algorithm that just determines the length of the shortest path from s to each other node in the graph; it is then easy to produce the paths as well. The algorithm maintains a set S of vertices u for which we have determined a shortest-path distance d(u) from s; this is the "explored" part of the graph. Initially S = {s}, and d(s) = 0. Now, for each node v ∈ V − S, we determine the shortest path that can be constructed by traveling along a path through the explored part S to some u ∈ S, followed by the single edge (u, v). That is, we consider the quantity
d'(v) = min_{e=(u,v): u∈S} d(u) + ℓ_e. We choose the node v ∈ V − S for which this quantity is minimized, add v to S, and define d(v) to be the value d'(v).

    Dijkstra's Algorithm (G, ℓ)
      Let S be the set of explored nodes
        For each u ∈ S, we store a distance d(u)
      Initially S = {s} and d(s) = 0
      While S ≠ V
        Select a node v ∉ S with at least one edge from S for which
          d'(v) = min_{e=(u,v): u∈S} d(u) + ℓ_e is as small as possible
        Add v to S and define d(v) = d'(v)
      EndWhile

It is simple to produce the s-v paths corresponding to the distances found by Dijkstra's Algorithm. As each node v is added to the set S, we simply record the edge (u, v) on which it achieved the value min_{e=(u,v): u∈S} d(u) + ℓ_e. The path P_v is implicitly represented by these edges: if (u, v) is the edge we have stored for v, then P_v is just (recursively) the path P_u followed by the single edge (u, v). In other words, to construct P_v, we simply start at v; follow the edge we have stored for v in the reverse direction to u; then follow the edge we have stored for u in the reverse direction to its predecessor; and so on until we reach s. Note that s must be reached, since our backward walk from v visits nodes that were added to S earlier and earlier.

To get a better sense of what the algorithm is doing, consider the snapshot of its execution depicted in Figure 4.7. At the point the picture is drawn, two iterations have been performed: the first added node u, and the second added node v. In the iteration that is about to be performed, the node x will be added because it achieves the smallest value of d'(x); thanks to the edge (u, x), we have d'(x) = d(u) + ℓ_{ux} = 2. Note that attempting to add y or z to the set S at this point would lead to an incorrect value for their shortest-path distances; ultimately, they will be added because of their edges from x.

Figure 4.7 A snapshot of the execution of Dijkstra's Algorithm. The next node that will be added to the set S is x, due to the path through u.

Analyzing the Algorithm
We see in this example that Dijkstra's Algorithm is doing the right thing and avoiding recurring pitfalls: growing the set S by the wrong node can lead to an overestimate of the shortest-path distance to that node. The question becomes: Is it always true that when Dijkstra's Algorithm adds a node v, we get the true shortest-path distance to v?

We now answer this by proving the correctness of the algorithm, showing that the paths P_u really are shortest paths. Dijkstra's Algorithm is greedy in the sense that we always form the shortest new s-v path we can make from a path in S followed by a single edge. We prove its correctness using a variant of our first style of analysis: we show that it "stays ahead" of all other solutions by establishing, inductively, that each time it selects a path to a node v, that path is shorter than every other possible path to v.

(4.14) Consider the set S at any point in the algorithm's execution. For each u ∈ S, the path P_u is a shortest s-u path.

Note that this fact immediately establishes the correctness of Dijkstra's Algorithm, since we can apply it when the algorithm terminates, at which point S includes all nodes.

Proof. We prove this by induction on the size of S. The case |S| = 1 is easy, since then we have S = {s} and d(s) = 0. Suppose the claim holds when |S| = k for some value of k ≥ 1; we now grow S to size k + 1 by adding the node v. Let (u, v) be the final edge on our s-v path P_v.

By induction hypothesis, P_u is the shortest s-u path for each u ∈ S. Now consider any other s-v path P; we wish to show that it is at least as long as P_v. In order to reach v, this path P must leave the set S somewhere; let y be the first node on P that is not in S, and let x ∈ S be the node just before y.

The situation is now as depicted in Figure 4.8, and the crux of the proof is very simple: P cannot be shorter than P_v because it is already at least as
long as P_v by the time it has left the set S. Indeed, in iteration k + 1, Dijkstra's Algorithm must have considered adding node y to the set S via the edge (x, y) and rejected this option in favor of adding v. This means that there is no path from s to y through x that is shorter than P_v. But the subpath of P up to y is such a path, and so this subpath is at least as long as P_v. Since edge lengths are nonnegative, the full path P is at least as long as P_v as well.

This is a complete proof; one can also spell out the argument in the previous paragraph using the following inequalities. Let P' be the subpath of P from s to x. Since x ∈ S, we know by the induction hypothesis that P_x is a shortest s-x path (of length d(x)), and so ℓ(P') ≥ ℓ(P_x) = d(x). Thus the subpath of P out to node y has length ℓ(P') + ℓ(x, y) ≥ d(x) + ℓ(x, y) ≥ d'(y), and the full path P is at least as long as this subpath. Finally, since Dijkstra's Algorithm selected v in this iteration, we know that d'(y) ≥ d'(v) = ℓ(P_v). Combining these inequalities shows that ℓ(P) ≥ ℓ(P') + ℓ(x, y) ≥ ℓ(P_v). ∎

Figure 4.8 The shortest path P_v and an alternate s-v path P through the node y. (The alternate path is already too long by the time it has left the set S.)

Here are two observations about Dijkstra's Algorithm and its analysis. First, the algorithm does not always find shortest paths if some of the edges can have negative lengths. (Do you see where the proof breaks?) Many shortest-path applications involve negative edge lengths, and a more complex algorithm--due to Bellman and Ford--is required for this case. We will see this algorithm when we consider the topic of dynamic programming.

The second observation is that Dijkstra's Algorithm is, in a sense, even simpler than we've described here. Dijkstra's Algorithm is really a "continuous" version of the standard breadth-first search algorithm for traversing a graph, and it can be motivated by the following physical intuition. Suppose the edges of G formed a system of pipes filled with water, joined together at the nodes; each edge e has length ℓ_e and a fixed cross-sectional area. Now suppose an extra droplet of water falls at node s and starts a wave from s. As the wave expands out of node s at a constant speed, the expanding sphere of wavefront reaches nodes in increasing order of their distance from s. It is easy to believe (and also true) that the path taken by the wavefront to get to any node v is a shortest path. Indeed, it is easy to see that this is exactly the path to v found by Dijkstra's Algorithm, and that the nodes are discovered by the expanding water in the same order that they are discovered by Dijkstra's Algorithm.

Implementation and Running Time To conclude our discussion of Dijkstra's Algorithm, we consider its running time. There are n − 1 iterations of the While loop for a graph with n nodes, as each iteration adds a new node v to S. Selecting the correct node v efficiently is a more subtle issue. One's first impression is that each iteration would have to consider each node v ∉ S, and go through all the edges between S and v to determine the minimum min_{e=(u,v): u∈S} d(u) + ℓ_e, so that we can select the node v for which this minimum is smallest. For a graph with m edges, computing all these minima can take O(m) time, so this would lead to an implementation that runs in O(mn) time.

We can do considerably better if we use the right data structures. First, we will explicitly maintain the values of the minima d'(v) = min_{e=(u,v): u∈S} d(u) + ℓ_e for each node v ∈ V − S, rather than recomputing them in each iteration. We can further improve the efficiency by keeping the nodes V − S in a priority queue with d'(v) as their keys. Priority queues were discussed in Chapter 2; they are data structures designed to maintain a set of n elements, each with a key. A priority queue can efficiently insert elements, delete elements, change an element's key, and extract the element with the minimum key. We will need the third and fourth of the above operations: ChangeKey and ExtractMin.

How do we implement Dijkstra's Algorithm using a priority queue? We put the nodes V in a priority queue with d'(v) as the key for v ∈ V. To select the node v that should be added to the set S, we need the ExtractMin operation. To see how to update the keys, consider an iteration in which node v is added to S, and let w ∉ S be a node that remains in the priority queue. What do we have to do to update the value of d'(w)? If (v, w) is not an edge, then we don't have to do anything: the set of edges considered in the minimum min_{e=(u,w): u∈S} d(u) + ℓ_e is exactly the same before and after adding v to S. If e' = (v, w) ∈ E, on the other hand, then the new value for the key is min(d'(w), d(v) + ℓ_{e'}). If d'(w) > d(v) + ℓ_{e'} then we need to use the ChangeKey operation to decrease the key of node w appropriately. This ChangeKey operation can occur at most once per edge, when the tail of the edge e' is added to S. In summary, we have the following result.
(4.15) Using a priority queue, Dijkstra's Algorithm can be implemented on a graph with n nodes and m edges to run in O(m) time, plus the time for n ExtractMin and m ChangeKey operations.

Using the heap-based priority queue implementation discussed in Chapter 2, each priority queue operation can be made to run in O(log n) time. Thus the overall time for the implementation is O(m log n).
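In code, the heap-based implementation is short. Here is a rough Python sketch (ours, not the book's pseudocode), where graph maps each node to a list of (length, neighbor) pairs; instead of an explicit ChangeKey we push a new heap entry whenever a candidate distance improves and skip stale entries when they are popped, which achieves the same O(m log n) bound.

    import heapq

    def dijkstra(graph, s):
        d = {}                        # d(v) for explored nodes (the set S)
        parent = {}                   # edge stored for v, to rebuild the path P_v
        heap = [(0, s, None)]         # entries (candidate distance, node, predecessor)
        while heap:
            dist, v, u = heapq.heappop(heap)
            if v in d:
                continue              # stale entry: v was already added to S
            d[v], parent[v] = dist, u
            for length, w in graph[v]:
                if w not in d:
                    heapq.heappush(heap, (dist + length, w, v))
        return d, parent

    def path_to(v, parent):           # follow the stored edges back to s, then reverse
        P = []
        while v is not None:
            P.append(v)
            v = parent[v]
        return P[::-1]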
4.5 The Minimum Spanning Tree Problem
We now apply an exchange argument in the context of a second fundamental problem on graphs: the Minimum Spanning Tree Problem.

The Problem
Suppose we have a set of locations V = {v_1, v_2, ..., v_n}, and we want to build a communication network on top of them. The network should be connected--there should be a path between every pair of nodes--but subject to this requirement, we wish to build it as cheaply as possible.

For certain pairs (v_i, v_j), we may build a direct link between v_i and v_j for a certain cost c(v_i, v_j) > 0. Thus we can represent the set of possible links that may be built using a graph G = (V, E), with a positive cost c_e associated with each edge e = (v_i, v_j). The problem is to find a subset of the edges T ⊆ E so that the graph (V, T) is connected, and the total cost Σ_{e∈T} c_e is as small as possible. (We will assume that the full graph G is connected; otherwise, no solution is possible.)

Here is a basic observation.

(4.16) Let T be a minimum-cost solution to the network design problem defined above. Then (V, T) is a tree.

Proof. By definition, (V, T) must be connected; we show that it also will contain no cycles. Indeed, suppose it contained a cycle C, and let e be any edge on C. We claim that (V, T − {e}) is still connected, since any path that previously used the edge e can now go "the long way" around the remainder of the cycle C instead. It follows that (V, T − {e}) is also a valid solution to the problem, and it is cheaper--a contradiction. ∎

If we allow some edges to have 0 cost (that is, we assume only that the costs c_e are nonnegative), then a minimum-cost solution to the network design problem may have extra edges--edges that have 0 cost and could optionally be deleted. But even in this case, there is always a minimum-cost solution that is a tree. Starting from any optimal solution, we could keep deleting edges on cycles until we had a tree; with nonnegative edges, the cost would not increase during this process.

We will call a subset T ⊆ E a spanning tree of G if (V, T) is a tree. Statement (4.16) says that the goal of our network design problem can be rephrased as that of finding the cheapest spanning tree of the graph; for this reason, it is generally called the Minimum Spanning Tree Problem. Unless G is a very simple graph, it will have exponentially many different spanning trees, whose structures may look very different from one another. So it is not at all clear how to efficiently find the cheapest tree from among all these options.

Designing Algorithms
As with the previous problems we've seen, it is easy to come up with a number of natural greedy algorithms for the problem. But curiously, and fortunately, this is a case where many of the first greedy algorithms one tries turn out to be correct: they each solve the problem optimally. We will review a few of these algorithms now and then discover, via a nice pair of exchange arguments, some of the underlying reasons for this plethora of simple, optimal algorithms.

Here are three greedy algorithms, each of which correctly finds a minimum spanning tree.

o One simple algorithm starts without any edges at all and builds a spanning tree by successively inserting edges from E in order of increasing cost. As we move through the edges in this order, we insert each edge e as long as it does not create a cycle when added to the edges we've already inserted. If, on the other hand, inserting e would result in a cycle, then we simply discard e and continue. This approach is called Kruskal's Algorithm.

o Another simple greedy algorithm can be designed by analogy with Dijkstra's Algorithm for paths, although, in fact, it is even simpler to specify than Dijkstra's Algorithm. We start with a root node s and try to greedily grow a tree from s outward. At each step, we simply add the node that can be attached as cheaply as possible to the partial tree we already have. More concretely, we maintain a set S ⊆ V on which a spanning tree has been constructed so far. Initially, S = {s}. In each iteration, we grow S by one node, adding the node v that minimizes the "attachment cost" min_{e=(u,v): u∈S} c_e, and including the edge e = (u, v) that achieves this minimum in the spanning tree. This approach is called Prim's Algorithm.

o Finally, we can design a greedy algorithm by running sort of a "backward" version of Kruskal's Algorithm. Specifically, we start with the full graph (V, E) and begin deleting edges in order of decreasing cost. As we get to each edge e (starting from the most expensive), we delete it as
long as doing so would not actually disconnect the graph we currently have. For want of a better name, this approach is generally called the Reverse-Delete Algorithm (as far as we can tell, it's never been named after a specific person).

For example, Figure 4.9 shows the first four edges added by Prim's and Kruskal's Algorithms respectively, on a geometric instance of the Minimum Spanning Tree Problem in which the cost of each edge is proportional to the geometric distance in the plane.

Figure 4.9 Sample run of the Minimum Spanning Tree Algorithms of (a) Prim and (b) Kruskal, on the same input. The first 4 edges added to the spanning tree are indicated by solid lines; the next edge to be added is a dashed line.

The fact that each of these algorithms is guaranteed to produce an optimal solution suggests a certain "robustness" to the Minimum Spanning Tree Problem--there are many ways to get to the answer. Next we explore some of the underlying reasons why so many different algorithms produce minimum-cost spanning trees.

Analyzing the Algorithms
All these algorithms work by repeatedly inserting or deleting edges from a partial solution. So, to analyze them, it would be useful to have in hand some basic facts saying when it is "safe" to include an edge in the minimum spanning tree, and, correspondingly, when it is safe to eliminate an edge on the grounds that it couldn't possibly be in the minimum spanning tree. For purposes of the analysis, we will make the simplifying assumption that all edge costs are distinct from one another (i.e., no two are equal). This assumption makes it easier to express the arguments that follow, and we will show later in this section how this assumption can be easily eliminated.

When Is It Safe to Include an Edge in the Minimum Spanning Tree? The crucial fact about edge insertion is the following statement, which we will refer to as the Cut Property.

(4.17) Assume that all edge costs are distinct. Let S be any subset of nodes that is neither empty nor equal to all of V, and let edge e = (v, w) be the minimum-cost edge with one end in S and the other in V − S. Then every minimum spanning tree contains the edge e.

Proof. Let T be a spanning tree that does not contain e; we need to show that T does not have the minimum possible cost. We'll do this using an exchange argument: we'll identify an edge e' in T that is more expensive than e, and with the property that exchanging e for e' results in another spanning tree. This resulting spanning tree will then be cheaper than T, as desired.

The crux is therefore to find an edge that can be successfully exchanged with e. Recall that the ends of e are v and w. T is a spanning tree, so there must be a path P in T from v to w. Starting at v, suppose we follow the nodes of P in sequence; there is a first node w' on P that is in V − S. Let v' ∈ S be the node just before w' on P, and let e' = (v', w') be the edge joining them. Thus, e' is an edge of T with one end in S and the other in V − S. See Figure 4.10 for the situation at this stage in the proof.

If we exchange e for e', we get a set of edges T' = T − {e'} ∪ {e}. We claim that T' is a spanning tree. Clearly (V, T') is connected, since (V, T) is connected, and any path in (V, T) that used the edge e' = (v', w') can now be "rerouted" in (V, T') to follow the portion of P from v' to v, then the edge e, and then the portion of P from w to w'. To see that (V, T') is also acyclic, note that the only cycle in (V, T' ∪ {e'}) is the one composed of e and the path P, and this cycle is not present in (V, T') due to the deletion of e'.

We noted above that the edge e' has one end in S and the other in V − S. But e is the cheapest edge with this property, and so c_e < c_{e'}. (The inequality is strict since no two edges have the same cost.) Thus the total cost of T' is less than that of T, as desired. ∎

The proof of (4.17) is a bit more subtle than it may first appear. To appreciate this subtlety, consider the following shorter but incorrect argument for (4.17). Let T be a spanning tree that does not contain e. Since T is a spanning tree, it must contain an edge f with one end in S and the other in V − S. Since e is the cheapest edge with this property, we have c_e < c_f, and hence T − {f} ∪ {e} is a spanning tree that is cheaper than T.
The problem with this argument is not in the claim that f exists, or that T − {f} ∪ {e} is cheaper than T. The difficulty is that T − {f} ∪ {e} may not be a spanning tree, as shown by the example of the edge f in Figure 4.10. The point is that we can't prove (4.17) by simply picking any edge in T that crosses from S to V − S; some care must be taken to find the right one.

Figure 4.10 Swapping the edge e for the edge e' in the spanning tree T, as described in the proof of (4.17). (e can be swapped for e'.)

The Optimality of Kruskal's and Prim's Algorithms We can now easily prove the optimality of both Kruskal's Algorithm and Prim's Algorithm. The point is that both algorithms only include an edge when it is justified by the Cut Property (4.17).

(4.18) Kruskal's Algorithm produces a minimum spanning tree of G.

Proof. Consider any edge e = (v, w) added by Kruskal's Algorithm, and let S be the set of all nodes to which v has a path at the moment just before e is added. Clearly v ∈ S, but w ∉ S, since adding e does not create a cycle. Moreover, no edge from S to V − S has been encountered yet, since any such edge could have been added without creating a cycle, and hence would have been added by Kruskal's Algorithm. Thus e is the cheapest edge with one end in S and the other in V − S, and so by (4.17) it belongs to every minimum spanning tree.

So if we can show that the output (V, T) of Kruskal's Algorithm is in fact a spanning tree of G, then we will be done. Clearly (V, T) contains no cycles, since the algorithm is explicitly designed to avoid creating cycles. Further, if (V, T) were not connected, then there would exist a nonempty subset of nodes S (not equal to all of V) such that there is no edge from S to V − S. But this contradicts the behavior of the algorithm: we know that since G is connected, there is at least one edge between S and V − S, and the algorithm will add the first of these that it encounters. ∎

(4.19) Prim's Algorithm produces a minimum spanning tree of G.

Proof. For Prim's Algorithm, it is also very easy to show that it only adds edges belonging to every minimum spanning tree. Indeed, in each iteration of the algorithm, there is a set S ⊆ V on which a partial spanning tree has been constructed, and a node v and edge e are added that minimize the quantity min_{e=(u,v): u∈S} c_e. By definition, e is the cheapest edge with one end in S and the other end in V − S, and so by the Cut Property (4.17) it is in every minimum spanning tree.

It is also straightforward to show that Prim's Algorithm produces a spanning tree of G, and hence it produces a minimum spanning tree. ∎

When Can We Guarantee an Edge Is Not in the Minimum Spanning Tree? The crucial fact about edge deletion is the following statement, which we will refer to as the Cycle Property.

(4.20) Assume that all edge costs are distinct. Let C be any cycle in G, and let edge e = (v, w) be the most expensive edge belonging to C. Then e does not belong to any minimum spanning tree of G.

Proof. Let T be a spanning tree that contains e; we need to show that T does not have the minimum possible cost. By analogy with the proof of the Cut Property (4.17), we'll do this with an exchange argument, swapping e for a cheaper edge in such a way that we still have a spanning tree.

So again the question is: How do we find a cheaper edge that can be exchanged in this way with e? Let's begin by deleting e from T; this partitions the nodes into two components: S, containing node v; and V − S, containing node w. Now, the edge we use in place of e should have one end in S and the other in V − S, so as to stitch the tree back together.

We can find such an edge by following the cycle C. The edges of C other than e form, by definition, a path P with one end at v and the other at w. If we follow P from v to w, we begin in S and end up in V − S, so there is some
edge e' on P that crosses from S to V − S. See Figure 4.11 for an illustration of this.

Now consider the set of edges T' = T − {e} ∪ {e'}. Arguing just as in the proof of the Cut Property (4.17), the graph (V, T') is connected and has no cycles, so T' is a spanning tree of G. Moreover, since e is the most expensive edge on the cycle C, and e' belongs to C, it must be that e' is cheaper than e, and hence T' is cheaper than T, as desired. ∎

Figure 4.11 Swapping the edge e' for the edge e in the spanning tree T, as described in the proof of (4.20).

The Optimality of the Reverse-Delete Algorithm Now that we have the Cycle Property (4.20), it is easy to prove that the Reverse-Delete Algorithm produces a minimum spanning tree. The basic idea is analogous to the optimality proofs for the previous two algorithms: Reverse-Delete only adds an edge when it is justified by (4.20).

(4.21) The Reverse-Delete Algorithm produces a minimum spanning tree of G.

Proof. Consider any edge e = (v, w) removed by Reverse-Delete. At the time that e is removed, it lies on a cycle C; and since it is the first edge encountered by the algorithm in decreasing order of edge costs, it must be the most expensive edge on C. Thus by (4.20), e does not belong to any minimum spanning tree.

So if we show that the output (V, T) of Reverse-Delete is a spanning tree of G, we will be done. Clearly (V, T) is connected, since the algorithm never removes an edge when this will disconnect the graph. Now, suppose by way of contradiction that (V, T) contains a cycle C. Consider the most expensive edge e on C, which would be the first one encountered by the algorithm. This edge should have been removed, since its removal would not have disconnected the graph, and this contradicts the behavior of Reverse-Delete. ∎
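For completeness, here is a rough Python sketch (ours, not the book's) of the Reverse-Delete rule just analyzed; it is not meant to be efficient--each candidate deletion is checked with a fresh depth-first search for connectivity--and the edge representation (cost, u, v) and function names are our own.

    def reverse_delete(nodes, edges):
        def connected(edge_set):                  # DFS over the remaining edges
            adj = {v: [] for v in nodes}
            for cost, u, v in edge_set:
                adj[u].append(v)
                adj[v].append(u)
            start = next(iter(nodes))
            seen, stack = {start}, [start]
            while stack:
                x = stack.pop()
                for y in adj[x]:
                    if y not in seen:
                        seen.add(y)
                        stack.append(y)
            return len(seen) == len(nodes)

        T = set(edges)
        for e in sorted(edges, reverse=True):     # most expensive edge first
            if connected(T - {e}):                # deleting e keeps the graph connected
                T.remove(e)
        return T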
While we will not explore this further here, the combination of the Cut Property (4.17) and the Cycle Property (4.20) implies that something even more general is going on. Any algorithm that builds a spanning tree by repeatedly including edges when justified by the Cut Property and deleting edges when justified by the Cycle Property--in any order at all--will end up with a minimum spanning tree. This principle allows one to design natural greedy algorithms for this problem beyond the three we have considered here, and it provides an explanation for why so many greedy algorithms produce optimal solutions for this problem.

Eliminating the Assumption that All Edge Costs Are Distinct Thus far, we have assumed that all edge costs are distinct, and this assumption has made the analysis cleaner in a number of places. Now, suppose we are given an instance of the Minimum Spanning Tree Problem in which certain edges have the same cost--how can we conclude that the algorithms we have been discussing still provide optimal solutions?

There turns out to be an easy way to do this: we simply take the instance and perturb all edge costs by different, extremely small numbers, so that they all become distinct. Now, any two costs that differed originally will still have the same relative order, since the perturbations are so small; and since all of our algorithms are based on just comparing edge costs, the perturbations effectively serve simply as "tie-breakers" to resolve comparisons among costs that used to be equal.

Moreover, we claim that any minimum spanning tree T for the new, perturbed instance must have also been a minimum spanning tree for the original instance. To see this, we note that if T cost more than some tree T* in the original instance, then for small enough perturbations, the change in the cost of T cannot be enough to make it better than T* under the new costs. Thus, if we run any of our minimum spanning tree algorithms, using the perturbed costs for comparing edges, we will produce a minimum spanning tree T that is also optimal for the original instance.
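One concrete way to realize this tie-breaking in code (a small sketch of ours, not the book's construction) is to compare edges by the pair (cost, index), where index is any fixed numbering of the edges; this behaves like perturbing each cost by a tiny amount proportional to its index, so equal costs are resolved consistently while distinct costs keep their original order.

    edges = [(4, "a", "b"), (4, "b", "c"), (2, "a", "c")]      # (cost, u, v), with a tie
    ordered = sorted((cost, i, u, v) for i, (cost, u, v) in enumerate(edges))
    # considers (2, 'a', 'c') first, then (4, 'a', 'b'), then (4, 'b', 'c')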
Implementing Prim's Algorithm
We next discuss how to implement the algorithms we have been considering so as to obtain good running-time bounds. We will see that both Prim's and Kruskal's Algorithms can be implemented, with the right choice of data structures, to run in O(m log n) time. We will see how to do this for Prim's Algorithm
here, and defer discussing the implementation of Kruskal's Algorithm to the next section. Obtaining a running time close to this for the Reverse-Delete Algorithm is difficult, so we do not focus on Reverse-Delete in this discussion.

For Prim's Algorithm, while the proof of correctness was quite different from the proof for Dijkstra's Algorithm for the shortest-path problem, the implementations of Prim and Dijkstra are almost identical. By analogy with Dijkstra's Algorithm, we need to be able to decide which node v to add next to the growing set S, by maintaining the attachment costs a(v) = min_{e=(u,v): u∈S} c_e for each node v ∈ V − S. As before, we keep the nodes in a priority queue with these attachment costs a(v) as the keys; we select a node with an ExtractMin operation, and update the attachment costs using ChangeKey operations. There are n − 1 iterations in which we perform ExtractMin, and we perform ChangeKey at most once for each edge. Thus we have

(4.22) Using a priority queue, Prim's Algorithm can be implemented on a graph with n nodes and m edges to run in O(m) time, plus the time for n ExtractMin and m ChangeKey operations.

As with Dijkstra's Algorithm, if we use a heap-based priority queue we can implement both ExtractMin and ChangeKey in O(log n) time, and so get an overall running time of O(m log n).
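A compact Python sketch (ours) of this implementation is below; graph maps each node to a list of (cost, neighbor) pairs, and, as in the Dijkstra sketch earlier, we push a new heap entry instead of using an explicit ChangeKey and skip stale entries when they are popped.

    import heapq

    def prim(graph, root):
        in_tree = {root}
        tree_edges = []
        heap = [(cost, root, v) for cost, v in graph[root]]
        heapq.heapify(heap)
        while heap and len(in_tree) < len(graph):
            cost, u, v = heapq.heappop(heap)
            if v in in_tree:
                continue                       # stale entry: v is already attached
            in_tree.add(v)
            tree_edges.append((u, v, cost))    # the cheapest edge attaching v to S
            for cost2, w in graph[v]:
                if w not in in_tree:
                    heapq.heappush(heap, (cost2, v, w))
        return tree_edges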
Extensions
The minimum spanning tree problem emerged as a particular formulation of a broader network design goal--finding a good way to connect a set of sites by installing edges between them. A minimum spanning tree optimizes a particular goal, achieving connectedness with minimum total edge cost. But there are a range of further goals one might consider as well.

We may, for example, be concerned about point-to-point distances in the spanning tree we build, and be willing to reduce these even if we pay more for the set of edges. This raises new issues, since it is not hard to construct examples where the minimum spanning tree does not minimize point-to-point distances, suggesting some tension between these goals.

Alternately, we may care more about the congestion on the edges. Given traffic that needs to be routed between pairs of nodes, one could seek a spanning tree in which no single edge carries more than a certain amount of this traffic. Here too, it is easy to find cases in which the minimum spanning tree ends up concentrating a lot of traffic on a single edge.

More generally, it is reasonable to ask whether a spanning tree is even the right kind of solution to our network design problem. A tree has the property that destroying any one edge disconnects it, which means that trees are not at all robust against failures. One could instead make resilience an explicit goal, for example seeking the cheapest connected network on the set of sites that remains connected after the deletion of any one edge.

All of these extensions lead to problems that are computationally much harder than the basic Minimum Spanning Tree Problem, though due to their importance in practice there has been research on good heuristics for them.

4.6 Implementing Kruskal's Algorithm: The Union-Find Data Structure
One of the most basic graph problems is to find the set of connected components. In Chapter 3 we discussed linear-time algorithms using BFS or DFS for finding the connected components of a graph.

In this section, we consider the scenario in which a graph evolves through the addition of edges. That is, the graph has a fixed population of nodes, but it grows over time by having edges appear between certain pairs of nodes. Our goal is to maintain the set of connected components of such a graph throughout this evolution process. When an edge is added to the graph, we don't want to have to recompute the connected components from scratch. Rather, we will develop a data structure that we call the Union-Find structure, which will store a representation of the components in a way that supports rapid searching and updating.

This is exactly the data structure needed to implement Kruskal's Algorithm efficiently. As each edge e = (v, w) is considered, we need to efficiently find the identities of the connected components containing v and w. If these components are different, then there is no path from v to w, and hence edge e should be included; but if the components are the same, then there is a v-w path on the edges already included, and so e should be omitted. In the event that e is included, the data structure should also support the efficient merging of the components of v and w into a single new component.

The Problem
The Union-Find data structure allows us to maintain disjoint sets (such as the components of a graph) in the following sense. Given a node u, the operation Find(u) will return the name of the set containing u. This operation can be used to test if two nodes u and v are in the same set, by simply checking if Find(u) = Find(v). The data structure will also implement an operation Union(A, B) to take two sets A and B and merge them to a single set.

These operations can be used to maintain connected components of an evolving graph G = (V, E) as edges are added. The sets will be the connected components of the graph. For a node u, the operation Find(u) will return the
name of the component containing u. If we add an edge (u, v) to the graph, then we first test if u and v are already in the same connected component (by testing if Find(u) = Find(v)). If they are not, then Union(Find(u), Find(v)) can be used to merge the two components into one. It is important to note that the Union-Find data structure can only be used to maintain components of a graph as we add edges; it is not designed to handle the effects of edge deletion, which may result in a single component being "split" into two.

To summarize, the Union-Find data structure will support three operations.

o MakeUnionFind(S) for a set S will return a Union-Find data structure on set S where all elements are in separate sets. This corresponds, for example, to the connected components of a graph with no edges. Our goal will be to implement MakeUnionFind in time O(n), where n = |S|.

o For an element u ∈ S, the operation Find(u) will return the name of the set containing u. Our goal will be to implement Find(u) in O(log n) time. Some implementations that we discuss will in fact take only O(1) time for this operation.

o For two sets A and B, the operation Union(A, B) will change the data structure by merging the sets A and B into a single set. Our goal will be to implement Union in O(log n) time.

Let's briefly discuss what we mean by the name of a set--for example, as returned by the Find operation. There is a fair amount of flexibility in defining the names of the sets; they should simply be consistent in the sense that Find(v) and Find(w) should return the same name if v and w belong to the same set, and different names otherwise. In our implementations, we will name each set using one of the elements it contains.

A Simple Data Structure for Union-Find
Maybe the simplest possible way to implement a Union-Find data structure is to maintain an array Component that contains the name of the set currently containing each element. Let S be a set, and assume it has n elements denoted {1, ..., n}. We will set up an array Component of size n, where Component[s] is the name of the set containing s. To implement MakeUnionFind(S), we set up the array and initialize it to Component[s] = s for all s ∈ S. This implementation makes Find(v) easy: it is a simple lookup and takes only O(1) time. However, Union(A, B) for two sets A and B can take as long as O(n) time, as we have to update the values of Component[s] for all elements in sets A and B.

To improve this bound, we will do a few simple optimizations. First, it is useful to explicitly maintain the list of elements in each set, so we don't have to look through the whole array to find the elements that need updating. Further, we save some time by choosing the name for the union to be the name of one of the sets, say, set A: this way we only have to update the values Component[s] for s ∈ B, but not for any s ∈ A. Of course, if set B is large, this idea by itself doesn't help very much. Thus we add one further optimization. When set B is big, we may want to keep its name and change Component[s] for all s ∈ A instead. More generally, we can maintain an additional array size of length n, where size[A] is the size of set A, and when a Union(A, B) operation is performed, we use the name of the larger set for the union. This way, fewer elements need to have their Component values updated.

Even with these optimizations, the worst case for a Union operation is still O(n) time; this happens if we take the union of two large sets A and B, each containing a constant fraction of all the elements. However, such bad cases for Union cannot happen very often, as the resulting set A ∪ B is even bigger. How can we make this statement more precise? Instead of bounding the worst-case running time of a single Union operation, we can bound the total (or average) running time of a sequence of k Union operations.

(4.23) Consider the array implementation of the Union-Find data structure for some set S of size n, where unions keep the name of the larger set. The Find operation takes O(1) time, MakeUnionFind(S) takes O(n) time, and any sequence of k Union operations takes at most O(k log k) time.

Proof. The claims about the MakeUnionFind and Find operations are easy to verify. Now consider a sequence of k Union operations. The only part of a Union operation that takes more than O(1) time is updating the array Component. Instead of bounding the time spent on one Union operation, we will bound the total time spent updating Component[v] for an element v throughout the sequence of k operations.

Recall that we start the data structure from a state when all n elements are in their own separate sets. A single Union operation can consider at most two of these original one-element sets, so after any sequence of k Union operations, all but at most 2k elements of S have been completely untouched. Now consider a particular element v. As v's set is involved in a sequence of Union operations, its size grows. It may be that in some of these Unions, the value of Component[v] is updated, and in others it is not. But our convention is that the union uses the name of the larger set, so in every update to Component[v] the size of the set containing v at least doubles. The size of v's set starts out at 1, and the maximum possible size it can reach is 2k (since we argued above that all but at most 2k elements are untouched by Union operations). Thus Component[v] gets updated at most log_2(2k) times throughout the process. Moreover, at most 2k elements are involved in any Union operations at all, so
we get a bound of O(k log k) for the time spent updating Component values in a sequence of k Union operations. ∎
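A short Python sketch (ours; the class and field names are assumptions) of this array-based structure is below: Component names each element's set, an explicit member list is kept for each set, and a Union relabels only the smaller set.

    class ArrayUnionFind:
        def __init__(self, S):                  # MakeUnionFind(S)
            self.component = {s: s for s in S}
            self.elements = {s: [s] for s in S}
            self.size = {s: 1 for s in S}

        def find(self, u):                      # O(1): a single lookup
            return self.component[u]

        def union(self, A, B):                  # A and B are set names
            if self.size[A] < self.size[B]:
                A, B = B, A                     # keep the name of the larger set
            for s in self.elements[B]:
                self.component[s] = A           # relabel only the smaller set
            self.elements[A].extend(self.elements[B])
            self.size[A] += self.size[B]
            del self.elements[B], self.size[B]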

While this bound on the average running time for a sequence of k operations is good enough in many applications, including implementing Kruskal's Algorithm, we will try to do better and reduce the worst-case time required. We'll do this at the expense of raising the time required for the Find operation to O(log n).

A Better Data Structure for Union-Find
The data structure for this alternate implementation uses pointers. Each node v ∈ S will be contained in a record with an associated pointer to the name of the set that contains v. As before, we will use the elements of the set S as possible set names, naming each set after one of its elements. For the MakeUnionFind(S) operation, we initialize a record for each element v ∈ S with a pointer that points to itself (or is defined as a null pointer), to indicate that v is in its own set.

Consider a Union operation for two sets A and B, and assume that the name we used for set A is a node v ∈ A, while set B is named after node u ∈ B. The idea is to have either u or v be the name of the combined set; assume we select v as the name. To indicate that we took the union of the two sets, and that the name of the union set is v, we simply update u's pointer to point to v. We do not update the pointers at the other nodes of set B.

As a result, for elements w ∈ B other than u, the name of the set they belong to must be computed by following a sequence of pointers, first leading them to the "old name" u and then via the pointer from u to the "new name" v. See Figure 4.12 for what such a representation looks like. For example, the two sets in Figure 4.12 could be the outcome of the following sequence of Union operations: Union(w, u), Union(s, u), Union(t, v), Union(z, v), Union(i, x), Union(y, j), Union(x, j), and Union(u, v).

Figure 4.12 A Union-Find data structure using pointers. The data structure has only two sets at the moment, named after nodes v and j. The dashed arrow from u to v is the result of the last Union operation: the set {s, u, w} was merged into {t, v, z}. To answer a Find query, we follow the arrows until we get to a node that has no outgoing arrow. For example, answering the query Find(i) would involve following the arrows i to x, and then x to j.

This pointer-based data structure implements Union in O(1) time: all we have to do is to update one pointer. But a Find operation is no longer constant time, as we have to follow a sequence of pointers through a history of old names the set had, in order to get to the current name. How long can a Find(u) operation take? The number of steps needed is exactly the number of times the set containing node u had to change its name, that is, the number of times the Component[u] array position would have been updated in our previous array representation. This can be as large as O(n) if we are not careful with choosing set names. To reduce the time required for a Find operation, we will use the same optimization we used before: keep the name of the larger set as the name of the union. The sequence of Unions that produced the data structure in Figure 4.12 followed this convention. To implement this choice efficiently, we will maintain an additional field with the nodes: the size of the corresponding set.

(4.24) Consider the above pointer-based implementation of the Union-Find data structure for some set S of size n, where unions keep the name of the larger set. A Union operation takes O(1) time, MakeUnionFind(S) takes O(n) time, and a Find operation takes O(log n) time.

Proof. The statements about Union and MakeUnionFind are easy to verify. The time to evaluate Find(u) for a node u is the number of times the set containing node u changes its name during the process. By the convention that the union keeps the name of the larger set, it follows that every time the name of the set containing node u changes, the size of this set at least doubles. Since the set containing u starts at size 1 and is never larger than n, its size can double at most log_2 n times, and so there can be at most log_2 n name changes. ∎
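A Python sketch (ours) of the pointer-based structure follows: each element stores a pointer toward the name of its set, Union redirects a single pointer, and Find follows pointers until it reaches a self-pointing name.

    class PointerUnionFind:
        def __init__(self, S):                  # MakeUnionFind(S)
            self.parent = {v: v for v in S}     # a self-pointer marks a set name
            self.size = {v: 1 for v in S}

        def find(self, u):                      # O(log n): at most log2 n pointer hops
            while self.parent[u] != u:
                u = self.parent[u]
            return u

        def union(self, A, B):                  # A and B are current set names
            if self.size[A] < self.size[B]:
                A, B = B, A                     # keep the name of the larger set
            self.parent[B] = A                  # a single pointer update: O(1)
            self.size[A] += self.size[B]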
Further Improvements
Next we will briefly discuss a natural optimization in the pointer-based Union-Find data structure that has the effect of speeding up the Find operations. Strictly speaking, this improvement will not be necessary for our purposes in this book: for all the applications of Union-Find data structures that we consider, the O(log n) time per operation is good enough in the sense that further improvement in the time for operations would not translate to improvements
in the overall running time of the algorithms where we use them. (The Union-Find operations will not be the only computational bottleneck in the running time of these algorithms.)

To motivate the improved version of the data structure, let us first discuss a bad case for the running time of the pointer-based Union-Find data structure. First we build up a structure where one of the Find operations takes about log n time. To do this, we can repeatedly take Unions of equal-sized sets. Assume v is a node for which the Find(v) operation takes about log n time. Now we can issue Find(v) repeatedly, and it takes log n for each such call. Having to follow the same sequence of log n pointers every time for finding the name of the set containing v is quite redundant: after the first request for Find(v), we already "know" the name x of the set containing v, and we also know that all other nodes that we touched during our path from v to the current name also are all contained in the set x. So in the improved implementation, we will compress the path we followed after every Find operation by resetting all pointers along the path to point to the current name of the set. No information is lost by doing this, and it makes subsequent Find operations run more quickly. See Figure 4.13 for a Union-Find data structure and the result of Find(v) using path compression.

Figure 4.13 A Union-Find data structure and the result of Find(v) using path compression: everything on the path from v to x now points directly to x.

Now consider the running time of the operations in the resulting implementation. As before, a Union operation takes O(1) time and MakeUnionFind(S) takes O(n) time to set up a data structure for a set of size n. How did the time required for a Find(v) operation change? Some Find operations can still take up to log n time; and for some Find operations we actually increase the time, since after finding the name x of the set containing v, we have to go back through the same path of pointers from v to x, and reset each of these pointers to point to x directly. But this additional work can at most double the time required, and so does not change the fact that a Find takes at most O(log n) time. The real gain from compression is in making subsequent calls to Find cheaper, and this can be made precise by the same type of argument we used in (4.23): bounding the total time for a sequence of n Find operations, rather than the worst-case time for any one of them. Although we do not go into the details here, a sequence of n Find operations employing compression requires an amount of time that is extremely close to linear in n; the actual upper bound is O(nα(n)), where α(n) is an extremely slow-growing function of n called the inverse Ackermann function. (In particular, α(n) < 4 for any value of n that could be encountered in practice.)
Now consider the running time of the operations in the resulting imple- After the sorting operation, we use the Union-Find data structure to
mentation. As before, a Union operation takes O(1) time and MakeUnion- maintain the connected components of (V, T) as edges are added. As each
Find(S) takes O(rt) time to set up a data structure for a set of size ft. How did edge e = (v, w) is considered, we compute Find(u) and Find(v) and test
the time required for a Find(v) operation change? Some Find operations can if they are equal to see if v and w belong to different components. We
still take up to log n time; and for some Find operations we actually increase use Union(Find(u),Find(v)) to merge the two components, if the algorithm
decides to include edge e in the tree T.
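Before counting the total work over a run of Kruskal's Algorithm, it may help to see the structure in code. The following is a minimal Python sketch of the pointer-based implementation described above, with union by size and the path-compression heuristic; the class and method names (UnionFind, find, union) and the use of dictionaries are illustrative choices, not from the text, and the sketch assumes the elements of S are hashable values.

class UnionFind:
    def __init__(self, elements):
        # MakeUnionFind(S): every element starts as the name of its own set.
        self.parent = {v: v for v in elements}
        self.size = {v: 1 for v in elements}   # extra field: size of each set

    def find(self, v):
        # Follow pointers to the current name, then compress the path.
        root = v
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[v] != root:
            self.parent[v], v = root, self.parent[v]
        return root

    def union(self, a, b):
        # Merge the sets containing a and b; the larger set keeps its name.
        a, b = self.find(a), self.find(b)
        if a == b:
            return a
        if self.size[a] < self.size[b]:
            a, b = b, a
        self.parent[b] = a
        self.size[a] += self.size[b]
        return a

Note that union follows the convention discussed above (the larger set keeps its name), and find rewrites every pointer on the traversed path to point directly at the current name.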
We are doing a total of at most 2m Find and n - 1 Union operations over the course of Kruskal's Algorithm. We can use either (4.23) for the
array-based implementation of Union-Find, or (4.24) for the pointer-based
implementation, to conclude that this is a total of O(m log n) time. (While
more efficient implementations of the Union-Find data structure are possible,
this would not help the running time of Kruskal’s Algorithm, which has an
unavoidable O(m log n) term due to the initial sorting of the edges by cost.)
To sum up, we have

(4.25) Kruskal's Algorithm can be implemented on a graph with n nodes and m edges to run in O(m log n) time.
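As a rough illustration of how the pieces fit together, here is a hedged Python sketch of Kruskal's Algorithm built on the UnionFind sketch above; representing the graph as (cost, u, v) triples and the function name kruskal_mst are assumptions made for the example, not part of the text.

def kruskal_mst(nodes, edges):
    # edges: iterable of (cost, u, v) triples; returns the edges chosen for the tree T.
    uf = UnionFind(nodes)
    tree = []
    for cost, u, v in sorted(edges, key=lambda e: e[0]):   # sort by cost: O(m log m)
        ru, rv = uf.find(u), uf.find(v)
        if ru != rv:                 # u and v lie in different components
            uf.union(ru, rv)         # merge the two components
            tree.append((u, v, cost))
    return tree

Sorting dominates the work here; together with the Union-Find operations, this matches the O(m log n) bound of (4.25).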

Figure 4.13 (a) An instance of a Union-Find data structure; and (b) the result of the operation Find(v) on this structure, using path compression: everything on the path from v to x now points directly to x.

4.7 Clustering

We motivated the construction of minimum spanning trees through the problem of finding a low-cost network connecting a set of sites. But minimum spanning trees arise in a range of different settings, several of which appear on the surface to be quite different from one another. An appealing example is the role that minimum spanning trees play in the area of clustering.

The Problem

Clustering arises whenever one has a collection of objects--say, a set of photographs, documents, or microorganisms--that one is trying to classify or organize into coherent groups. Faced with such a situation, it is natural to look first for measures of how similar or dissimilar each pair of objects is. One common approach is to define a distance function on the objects, with the interpretation that objects at a larger distance from one another are less similar to each other. For points in the physical world, distance may actually be related to their physical distance; but in many applications, distance takes on a much more abstract meaning. For example, we could define the distance between two species to be the number of years since they diverged in the course of evolution; we could define the distance between two images in a video stream as the number of corresponding pixels at which their intensity values differ by at least some threshold.

Now, given a distance function on the objects, the clustering problem seeks to divide them into groups so that, intuitively, objects within the same group are "close," and objects in different groups are "far apart." Starting from this vague set of goals, the field of clustering branches into a vast number of technically different approaches, each seeking to formalize this general notion of what a good set of groups might look like.

Clusterings of Maximum Spacing  Minimum spanning trees play a role in one of the most basic formalizations, which we describe here. Suppose we are given a set U of n objects, labeled p1, p2, ..., pn. For each pair, pi and pj, we have a numerical distance d(pi, pj). We require only that d(pi, pi) = 0; that d(pi, pj) > 0 for distinct pi and pj; and that distances are symmetric: d(pi, pj) = d(pj, pi).

Suppose we are seeking to divide the objects in U into k groups, for a given parameter k. We say that a k-clustering of U is a partition of U into k nonempty sets C1, C2, ..., Ck. We define the spacing of a k-clustering to be the minimum distance between any pair of points lying in different clusters. Given that we want points in different clusters to be far apart from one another, a natural goal is to seek the k-clustering with the maximum possible spacing.

The question now becomes the following. There are exponentially many different k-clusterings of a set U; how can we efficiently find the one that has maximum spacing?

Designing the Algorithm

To find a clustering of maximum spacing, we consider growing a graph on the vertex set U. The connected components will be the clusters, and we will try to bring nearby points together into the same cluster as rapidly as possible. (This way, they don't end up as points in different clusters that are very close together.) Thus we start by drawing an edge between the closest pair of points. We then draw an edge between the next closest pair of points. We continue adding edges between pairs of points, in order of increasing distance d(pi, pj).

In this way, we are growing a graph H on U edge by edge, with connected components corresponding to clusters. Notice that we are only interested in the connected components of the graph H, not the full set of edges; so if we are about to add the edge (pi, pj) and find that pi and pj already belong to the same cluster, we will refrain from adding the edge--it's not necessary, because it won't change the set of components. In this way, our graph-growing process will never create a cycle; so H will actually be a union of trees. Each time we add an edge that spans two distinct components, it is as though we have merged the two corresponding clusters. In the clustering literature, the iterative merging of clusters in this way is often termed single-link clustering, a special case of hierarchical agglomerative clustering. (Agglomerative here means that we combine clusters; single-link means that we do so as soon as a single link joins them together.) See Figure 4.14 for an example of an instance with k = 3 clusters where this algorithm partitions the points into an intuitively natural grouping.

What is the connection to minimum spanning trees? It's very simple: although our graph-growing procedure was motivated by this cluster-merging idea, our procedure is precisely Kruskal's Minimum Spanning Tree Algorithm. We are doing exactly what Kruskal's Algorithm would do if given a graph G on U in which there was an edge of cost d(pi, pj) between each pair of nodes (pi, pj). The only difference is that we seek a k-clustering, so we stop the procedure once we obtain k connected components.

In other words, we are running Kruskal's Algorithm but stopping it just before it adds its last k - 1 edges. This is equivalent to taking the full minimum spanning tree T (as Kruskal's Algorithm would have produced it), deleting the k - 1 most expensive edges (the ones that we never actually added), and defining the k-clustering to be the resulting connected components C1, C2, ..., Ck. Thus, iteratively merging clusters is equivalent to computing a minimum spanning tree and deleting the most expensive edges.

Analyzing the Algorithm

Have we achieved our goal of producing clusters that are as spaced apart as possible? The following claim shows that we have.
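As a concrete reference point before we verify this, here is a hedged Python sketch of the single-link clustering procedure just described: Kruskal's Algorithm on the complete graph of pairwise distances, stopped once k components remain. It reuses the UnionFind sketch from earlier; the function names, the choice to pass distances as a function dist(p, q), and the spacing helper are illustrative assumptions, and the points are assumed to be hashable.

def k_clustering(points, dist, k):
    # Returns a list of k clusters (each cluster a list of points).
    uf = UnionFind(points)
    pairs = [(dist(p, q), p, q)
             for i, p in enumerate(points) for q in points[i + 1:]]
    pairs.sort(key=lambda t: t[0])        # consider pairs in order of increasing distance
    components = len(points)
    for d, p, q in pairs:
        if components == k:
            break                         # stop once k connected components remain
        rp, rq = uf.find(p), uf.find(q)
        if rp != rq:                      # only edges that merge two clusters matter
            uf.union(rp, rq)
            components -= 1
    clusters = {}
    for p in points:
        clusters.setdefault(uf.find(p), []).append(p)
    return list(clusters.values())

def spacing(clusters, dist):
    # Minimum distance between any pair of points lying in different clusters.
    return min(dist(p, q)
               for i, C in enumerate(clusters) for D in clusters[i + 1:]
               for p in C for q in D)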

Figure 4.14 An example of single-linkage clustering with k = 3 clusters. The clusters are formed by adding edges between points in order of increasing distance.

Figure 4.15 An illustration of the proof of (4.26), showing that the spacing of any other clustering can be no larger than that of the clustering found by the single-linkage algorithm.

(4.26) The components C1, C2, ..., Ck formed by deleting the k - 1 most expensive edges of the minimum spanning tree T constitute a k-clustering of maximum spacing.

Proof. Let C denote the clustering C1, C2, ..., Ck. The spacing of C is precisely the length d* of the (k - 1)st most expensive edge in the minimum spanning tree; this is the length of the edge that Kruskal's Algorithm would have added next, at the moment we stopped it.

Now consider some other k-clustering C′, which partitions U into nonempty sets C′1, C′2, ..., C′k. We must show that the spacing of C′ is at most d*.

Since the two clusterings C and C′ are not the same, it must be that one of our clusters Cr is not a subset of any of the k sets C′s in C′. Hence there are points pi, pj ∈ Cr that belong to different clusters in C′--say, pi ∈ C′s and pj ∉ C′s.

Now consider the picture in Figure 4.15. Since pi and pj belong to the same component Cr, it must be that Kruskal's Algorithm added all the edges of a pi-pj path P before we stopped it. In particular, this means that each edge on P has length at most d*. Now, we know that pi ∈ C′s but pj ∉ C′s; so let p′ be the first node on P that does not belong to C′s, and let p be the node on P that comes just before p′. We have just argued that d(p, p′) ≤ d*, since the edge (p, p′) was added by Kruskal's Algorithm. But p and p′ belong to different sets in the clustering C′, and hence the spacing of C′ is at most d(p, p′) ≤ d*. This completes the proof. ∎

4.8 Huffman Codes and Data Compression

In the Shortest-Path and Minimum Spanning Tree Problems, we've seen how greedy algorithms can be used to commit to certain parts of a solution (edges in a graph, in these cases), based entirely on relatively short-sighted considerations. We now consider a problem in which this style of "committing" is carried out in an even looser sense: a greedy rule is used, essentially, to shrink the size of the problem instance, so that an equivalent smaller problem can then be solved by recursion. The greedy operation here is proved to be "safe," in the sense that solving the smaller instance still leads to an optimal solution for the original instance, but the global consequences of the initial greedy decision do not become fully apparent until the full recursion is complete.

The problem itself is one of the basic questions in the area of data compression, an area that forms part of the foundations for digital communication.
that can take files as input and reduce their space ~rough efficient encoding
~ The Problem schemes.
Encoding Symbols Using Bits Since computers ultimately operate on se-
We now describe one of the fundamental ways of formulating this issue,
quences of bits (i.e., sequences consisting only of the symbols 0 and 1), one
building up to the question of how we might construct the optimal way to take
needs encoding schemes that take text written in richer alphabets (such as the
advantage of the nonuniform frequencies of the letters. In one sense, such an
alphabets underpinning human languages) and converts this text into long
optimal solution is a very appealing answer to the problem of compressing
strings of bits. data: it squeezes all the available gains out of nonuniformities in the frequen-
The simplest way to do this would be to use a fixed number of bits for cies. At the end of the section, we will discuss how one can make flLrther
each symbol in the alphabet, and then just concatenate the bit strings for progress in compression, taking advantage of features other than nonuniform
each symbol to form the text. To take a basic example, suppose we wanted to frequencies.
encode the 26 letters of English, plus the space (to separate words) and five
punctuation characters: comma, period, question mark, exclamation point,
and apostrophe. This would give us 32 symbols in total to be encoded. Variable-Length Encoding Schemes Before the Internet, before the digital
Now, you can form 2b different sequences out of b bits, and so if we use 5 computer, before the radio and telephone, there was the telegraph. Commu-
bits per symbol, then we can encode 2s= 32 symbols--just enough for our nicating by telegraph was a lot faster than the contemporary alternatives of
purposes. So, for example, we could let the bit string 00000 represent a, the hand-delivering messages by railroad or on horseback. But telegraphs were
bit string 00001 represent b, and so forth up to 11111, which could represent the only capable of transmitting pulses down a wire, and so if you wanted to send
apostrophe. Note that the mapping of bit strings to symbols is arbitrary; the’
a message, you needed a way to encode the text of your message as a sequence
point is simply that five bits per symbol is sufficient. In fact, encoding schemes of pulses.
like ASCII work precisely this way, except that they use a larger number of
bits per symbol so as to handle larger character sets, including capital letters, To deal with this issue, the pioneer of telegraphic communication, Samuel
parentheses, and all those other special symbols you see on a typewriter or Morse, developed Morse code, translating each letter into a sequence of dots
computer keyboard. (short pulses) and dashes (long pulses). For our purposes, we can think of
dots and dashes as zeros and ones, and so this is simply a mapping of symbols
Let’s think about our bare-bones example with just 32 symbols. Is there
into bit strings, just as in ASCII. Morse understood the point that one could
anything more we could ask for from an encoding scheme?. We couldn’t ask
communicate more efficiently by encoding frequent letters with short strings,
to encode each symbo! using just four bits, since 24 is only 16--not enough
and so this is the approach he took. (He consulted local printing presses to get
for the number of symbols we have. Nevertheless, it’s not clear that over large frequency estimates for the letters in English.) Thus, Morse code maps e to 0
stretches of text, we really need to be spending an average of five bits per
(a single dot), t to 1 (a single dash), a to 01 (dot-dash), and in general maps
symbol. If we think about it, the letters in most human alphabets do not more frequent letters to shorter bit strings.
get used equally frequently. In English, for example, the letters e, t: a, o, i,
and n get used much more frequently than q, J, x, and z (by more than an In fact, Morse code uses such short strings for the letters that the encoding
order of magnitude). So it’s really a tremendous waste to translate them all of words becomes ambiguous. For example, just using what we know about
the encoding of e, t, and a, we see that the string 0101 could correspond to
into the same number of bits; instead we could use a small number of bits for
the frequent letters, and a larger number of bits for the less frequent ones, and any of the sequences of letters eta, aa, etet, or aet. (There are other possi-
hope to end up using fewer than five bits per letter when we average over a bilities as well, involving other letters.) To deal with this ambiguity, Morse
code transmissions involve short pauses between letter; (so the encoding of
long string of typical text.
aa would actually be dot-dash-pause-dot-dash-pause). This is a reasonable
This issue of reducing the average number of bits per letter is a funda-
solution--using very short bit strings and then introducing pauses--but it
mental problem in the area of data compression. When large files need to be
means that we haven’t actually encoded the letters using just 0 and 1; we’ve
shipped across communication networks, or stored on hard disks, it’s impor-
actually encoded it using a three-letter alphabet of 0, 1, and "pause." Thus, if
tant to represent them as compactly as possible, subject to the requirement
we really needed to encode everything using only the bits 0 and !, there would
that a subsequent reader of the file should be able to correctly reconstruct it.
need to be some flLrther encoding in which the pause got mapped to bits.
A huge amount of research is devoted to the design of compression algorithms
on the rest of the message, 0000011101; next they will conclude that the second
Prefix Codes The ambiguity problem in Morse code arises because there exist letter is e, encoded as 000.
pairs of letters where the bit string that encodes one letter is a prefix of the bit
string that encodes another. To eliminate this problem, and hence to obtain an Optimal Prefix Codes We’ve been doing all this because some letters are
encoding scheme that has a well-defined interpretation for every sequence of
more frequent than others, and we want to take advantage of the fact that more
bits, it is enough to map letters to bit strings in such a way that no encoding frequent letters can have shorter encodings. To make this objective precise, we
is a prefix of any other. now introduce some notation to express the frequencies of letters.
Concretely, we say that a prefix code for a set S of letters is a function y Suppose that for each letter x ~ S, there is a frequency fx, representing the
that maps each letter x ~ S to some sequence of zeros and ones, in such a way fraction of letters in the text that are equal to x. In other words, assuming
that for distinct x, y ~ S, the sequence },(x) is not a prefix of the sequence y(y). there are n letters total, nfx of these letters are equal to x. We notice that the
Now suppose we have a text consisting of a sequence of letters xlx2x3 ¯ ¯ ¯ frequencies sum to 1; that is, ~x~S fx = 1.
x~.~ We can convert this to a sequence of bits by simply encoding each letter as Now, if we use a prefix code ~, to encode the given text, what is the total
a bit sequence using ~ and then concatenating all these bit sequences together: length of our encoding? This is simply the sum, over all letters x ~ S, of the
~ (xl) y (x2) ¯ ¯ ¯ y (xn). If we then hand this message to a recipient who knows the number of times x occurs times the length of the bit string }, (x) used to encode
function y, they will be able to reconstruct the text according to the following x. Using Iy(x)l to denote the length y(x), we can write this as
rule.
o Scan the bit sequence from left to right. encoding length = ~ nfxl},(x)[ = n ~ fx[y(x)l.
o As soon as you’ve seen enough bits to match the encoding of some letter, x~S
output this as the first letter of the text. This must be the correct first letter, Dropping the leading coefficient of n from the final expression gives us
since no shorter or longer prefix of the bit sequence could encode any ~x~s fxl}’(x)l, the average number of bits required per letter. We denote this
other letter. quantity by ABL0,’).
o Now delete the corresponding set of bits from the front of the message
To continue the earlier example, suppose we have a text with the letters
and iterate. S = {a, b, c, d, e}, and their frequencies are as follows:

In this way, the recipient can produce the correct set of letters without our
having to resort to artificial devices like pauses to separate the letters. £=.B2, f~=.25, f~=.20, fa=.~8, f~=.o5.
For example, suppose we are trying to encode the set of five letters Then the average number of bits per letter using the prefix code Yl defined
S = {a, b, c, d, e}. The encoding ~1 specified by previously is
y~(a) = 11
Zl(b) = O1 .32.2+.25.2+.20.3+.18.2+.05.3 =2.25.
y~(c) = 001 It is interesting to compare this to the average number of bits per letter using
y~(d) = 10 a fixed-length encoding. (Note that a fixed-length encoding is a prefix code:
}q(e) = 000 if all letters have encodings of the same length, then clearly no encoding can
be a prefix of any other.) With a set S of five letters, we would need three bits
is a prefix code, since we can check that no encoding is a prefix of any other. per letter for a fixed-length encoding, since two bits could only encode four
NOW, for example, the string cecab would be encoded as 0010000011101. A letters. Thus, using the code ~1 reduces the bits per letter from 3 to 2.25, a
recipient of this message, knowing y~, would begin reading from left to right. savings of 25 percent.
Neither 0 nor O0 encodes a letter, but 001 does, so the recipient concludes that
And, in fact, Yl is not the best we can do in this example. Consider the
the first letter is c. This is a safe decision, since no longer sequence of bits
prefix code ya given by
beginning with 001 could encode a different letter. The recipient now iterates
g2(a) = 11 to y. But this is the same as saying that x would lie on the path from the
root to y, which isn’t possible if x is a leaf. []
g2(b) = 10
g2(c) = 01
This relationship between binary trees and prefix codes works in the other
g2(d) = 001 direction as well. Given a prefix code g, we can build a binary tree recursively
g2(e) = 000 as follows. We start with a root; all letters x ~ S whose encodings begin with
The average number of bits per letter using gz is a 0 will be leaves in the left subtree of the root, and all letters y ~ S whose
encodlngs begin with a 1 will be leaves in the right subtree of the root. We
.32.2 + .25- 2 -k .20 ¯ 2 + .18.3 4- .05- 3 = 2.23. now build these two subtrees recursively using this rule.
For example, the labeled tree in Figure 4.16(a) corresponds to the prefix
So now it is natural to state the underlying question. Given an alphabet code g0 specified by
and a set of frequencies for the letters, we would like to produce a prefix
code that is as efficient as possible--namely, a prefix code that minimizes the go(a) -- 1
average nu}nber of bits per letter ABL(g) = ~_,x~S fxlg(x)l. We will call such a go(b) -- 011
prefix code optimal.
g0(c) = 010
g0(d) = 001
f! Designing the Algorithm g0(e) = 000
The search space for this problem is fairly complicated; it includes all possible
ways of mapping letters to bit strings, subiect to the defining property of prefix
codes. For alphabets consisting of an extremely small number of letters, it is To see this, note that the leaf labeled a is obtained by simply taking the right-
hand edge out of the root (resulting in an encoding of !); the leaf labeled e is
feasible to search this space by brute force, but this rapidly becomes infeasible.
obtained by taMng three successive left-hand edges starting from the root; and
We now describe a greedy method to construct an optimal prefix code analogous explanations apply for b, c, and d. By similar reasoning, one can
very efficiently. As a first step, it is useful to develop a tree-based means of see that the labeled tree in Figure 4.16(b) corresponds to the prefix code gl
representing prefix codes that exposes their structure more clearly than simply defined earlier, and the labeled tree in Figure 4.16(c) corresponds to the prefix
the lists of function values we used in our previous examples. code g2 defined earlier. Note also that the binary trees for the two prefix codes
Representing Prefix Codes Using Binary Trees Suppose we take a rooted tree gl and g2 are identical in structure; only the labeling of the leaves is different.
T in which each node that is not a leaf has at most two children; we call such The tree for go, on the other hand, has a different structure.
a tree a binary tree. Further suppose that the number of leaves is equal to the Thus the search for an optimal prefix code can be viewed as the search for
size of the alphabet S, and we label each leaf with a distinct letter in S. a binary tree T, together with a labeling of the leaves of T, that minimizes the
Such a labeled binary tree T naturally describes a prefix code, as follows. average number of bits per letter. Moreover, this average quantity has a natural
For each letter x ~ S, we follow the path from the root to the leaf labeled x; interpretation in the terms of the structure of T: the length of the encoding of
each time the path goes from a node to its left child, we write down a 0, and a letter x ~ S is simply the length of the path from the root to the leaf labeled
each time the path goes from a node to its right child, we write down a 1. We x. We will refer to the length of this path as the depth of the leaf, and we will
take the resulting string of bits as the encoding of x. denote the depth of a leaf u in T simply by depthw(u). (As fwo bits of notational
convenience, we will drop the subscript T when it is clear from context, and
Now we observe
we will often use a letter x ~ S to also denote the leaf that is labeled by it.)
(4.27) The enCoding of S Constructed from T is a prefix code. Thus we dre seeking the labeled tree that minimizes the weighted average
of the depths of all leaves, where the average is weighted by the frequencies
of the letters that label the leaves: ~x~s Ix" depthw(X). We will use ABL(T) to
Proof. In order for the encoding of x to be a prefix of the encoding of y, the denote this quantity.
path from the root to x would have to be a prefix of the path from the root
a node u with exactly one child u. Now convert T into a tree T’ by replacing
node u with v.
To be precise, we need to distinguish two cases. If u was the root of the
tree, we simply delete node u and use u as the root. If u is not the root, let w
be the parent of u in T. Now we delete node u and make v be a child of w
in place of u. This change decreases the number of bits needed to encode any
leaf in the subtree rooted at node u, and it does notaffect other leaves. So the
prefix code corresponding to T’ has a smaller average number of bits per letter
than the prefix code for T, contradicting the optimality of T. []

A First Attempt: The Top-Down Approach Intuitively, our goal is to produce


a labeled binary tree in which the leaves are as close to the root as possible.
This is what will give us a small average leaf depth.
A natural way to do this would be to try building a tree from the top down
by "packing" the leaves as tightly as possible. So suppose we try to split the
alphabet S into two sets S1 and S2, such that the total frequency of the letters
in each set is exactly ½. If such a perfect split is not possible, then we can try
for a split that is as nearly balanced as possible. We then recursively construct
prefix codes for S1 and S2 independently, and make these the two subtrees of
the root. (In terms of bit strings, this would mean sticking a 0 in front of the
encodings we produce for S1, and sticking a 1 in front of the encodings we
produce for $2.)
It is not entirely clear how we should concretely define this "nearly
balanced" split of th6 alphabet, but there are ways to make this precise.
The resulting encoding schemes are called Shannon-Fano codes, named after
Claude Shannon and Robert Fano, two of the major early figures in the area
of information theory, which deals with representing and encoding digital
information. These types of prefix codes can be fairly good in practice, but
for our present purposes they represent a kind of dead end: no version of this
Figure 4.16 Parts (a), (b), and (c) of the figure depict three different prefix codes for top-down splitting strategy is guaranteed to always produce an optimal prefix
the alphabet S = {a, b, c, d, el. code. Consider again our example with the five-letter alphabet S = {a, b, c, d, e}
and frequencies
As a first step in considering algorithms for this problem, let’s note a simple fa=.32, fb=.25, fc=.20, fd=.18, re=.05.
fact about the optimal tree. For this fact, we need a definition: we say that a
binary tree is full if each node that is not a leaf has two children. (In other There is a unique way to split the alphabet into two sets’ of equal frequency:
words, there are no nodes with exactly one chiAd.) Note that all three binary {a, d} and {b, c, e}. For {a, d}, we can use a single bit to encode each. For
{b, c, e}, we need to continue recursively, and again there is a unique way
trees in Figure 4.16 are full. to split the set into two subsets of equal frequency. The resulting code corre-
(4.28) The binary tree corresponding to the optimal prefix code is full. sponds to the code gl, given by the labeled tree in Figure 4.16(b); and we’ve
already seen that 1~ is not as efficient as the prefix code ~2 corresponding to
Proof. This is easy to prove using an exchange argument. Let T denote the the labeled tree in Figure 4.16(c).
binary tree corresponding to the optimal prefix code, and suppose it contains
Statement (4.29) gives us the following intuitively natura!, and optimal,
Shannon and Fano knew that their approach did not always yield the way to label the tree T* if someone should give it to us. We first take all leaves
optimal prefix code, but they didn’t see how to compute the optimal code of depth 1 (if there are an.y) ~nd label them with the highest-frequency letters
without brute-force search. The problem was solved a few years later by David in any order. We then take all leaves of depth 2 (if there are any) and label them
Huffman, at the time a graduate student who learned about the question in a with the next-highest-frequency letters in any order. We continue through the
class taught by Fano. leaves in order of increasing depth, assigning letters in order of decreasing
We now describe the ideas leading up to the greedy approach that Huffrnan frequency. The point is that this can’t lead to a suboptimal labeling of T*,
discovered for producing optimal prefix codes. since any supposedly better labeling would be susceptible to the exchange in
(4.29). It is also crucial to note that, among the labels we assign to a block of
What If We Knew the Tree Structure of the Optimal Prefix Code? A tech- leaves all at the same depth, it doesn’t matter which label we assign to which
nique that is often helpful in searching for an efficient algorithm is to assume, leaf. Since the depths are all the same, the corresponding multipliers in the
as a thought experiment, that one knows something partial about the optimal expression Y~x~s fxlY (x) l are the same, and so the choice of assignment among
solution, and then to see how one would make use of this partial knowledge leaves of the same depth doesn’t affect the average number of bits per letter.
in finding the complete solution. (Later, in Chapter 6, we will see in fact that
But how is all this helping us? We don’t have the structure of the optimal
this technique is a main underpinning of the dynamic programming approach
tree T*, and since there are exponentially many possible trees (in the size of
to designing algorithms.) the alphabet), we aren’t going to be able to perform a brute-force search over
For the current problem, it is useful to ask: What if someone gave us the all of them.
binary tree T* that corresponded to an optimal prefix code, but not the labeling
In fact, our reasoning about T* becomes very useful if we think not about
of the leaves? To complete the solution, we would need to figure out which
the very beginning of this labeling process, with the leaves of minimum depth,
letter should label which leaf of T*, and then we’d have our code. How hard but about the very end, with the leaves of maximum depth--the ones that
is this? receive the letters with lowest frequency. Specifically, consider a leaf v in T*
In fact, this is quite easy. We begin by formulating the following basic fact. whose depth is as large as possible. Leaf u has a parent u, and by (4.28) T* is
a till binary tree, so u has another child w. We refer to v and w as siblings,
(4.29) Suppose that u and v are leaves of T*, such that depth(u) < depth(v). since they have a common parent. Now, we have
Further, suppose that in a labeling of T* corresponding to an optimal prefix
code, leaf u is labeled with y ~ S and leaf v is labeled with z ~ S. Then fy >_ fz. (4.30) w is a leaf of T*.

Proof. This has a quick proof using an exchange argument. If fy < fz, then Proof. If w were not a leaf, there would be some leaf w’ in the subtree below
consider the code obtained by exchanging the labels at the nodes u and it. But then w’ would have a depth greater than that of v, contradicting our
v. In the expression for the average number of bits per letter, ,~BL(T*)= assumption that v is a leaf of maximum depth in T*. ~,
~x~S fx depth(x), the effect of this exchange is as follows: the multiplier on fy
increases (from depth(u) to depth(v)), and the multiplier on fz decreases by So v and w are sibling leaves that are as deep as possible in T*. Thus our
the same amount (from depth(v) to depth(u)). level-by-level process of labeling T*, as justified by (4.29), will get to the level
Thus the change to the overall sum is (depth(v) - depth(u))(fy - fz). If containing v and w last. The leaves at this level will get the lowest-frequency
letters. Since we have already argued that the order in which we assign these
~fy < fz, this change is a negative number, contradicting the supposed optimality
of the prefix code that we had before the exchange, m letters to the leaves within this level doesn’t matter, there is an optimal labeling
in which u and w get the two lowest-frequency letters of all.
We can see the idea behind (4.29) in Figure 4. !6 (b): a quick way to see that We sum this up in the following claim.
the code here is not optimal is to notice that it can be improved by exchanging
the positions of the labels c and d. Having a lower-frequency letter at a strictly (4.31) There is an optimal prefix code, with corresponding tree T*, in which
smaller depth than some other higher-frequency letter is precisely what (4.29) :the two lowest-frequency letters are assigned to leaves that are Siblings in T*.
rules out for an optimal solution.
Take the leaf labeled ~ and add two children below it
labeled y* and z*
Endif
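The recursion above can be implemented directly with a priority queue; the following is a hedged Python sketch using the standard-library heapq module (the heap-based analysis appears under "Implementation and Running Time" below). The function name, the tie-breaking counter, and the representation of subtrees as nested pairs are illustrative assumptions rather than anything specified in the text.

import heapq

def huffman_code(freqs):
    # freqs: dict mapping each letter to its frequency; returns dict letter -> bit string.
    # Each heap entry is (frequency, tie-breaker, subtree), where a subtree is
    # either a letter or a pair (left, right) produced by a merge.
    heap = [(f, i, x) for i, (x, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)                      # two lowest-frequency "letters"
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, counter, (t1, t2)))   # merged meta-letter
        counter += 1
    code = {}
    def label(tree, prefix):
        if isinstance(tree, tuple):
            label(tree[0], prefix + '0')    # left child: append 0
            label(tree[1], prefix + '1')    # right child: append 1
        else:
            code[tree] = prefix or '0'      # degenerate one-letter alphabet
    label(heap[0][2], '')
    return code

On the sample instance with frequencies .32, .25, .20, .18, .05, this sketch first merges d and e, then c with (de), then a with b, reproducing the codeword lengths of the optimal code γ2.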

We refer to this as Huffman’s Algorithm, and the prefix code that it


produces for a given alphabet is accordingly referred to as a Huffman code.
In general, it is clear that this algorithm always terminates, since it simply
invokes a recursive call on an alphabet that is one letter smaller. Moreover,
letter
using (4.31), it will not be difficult to prove that the algorithm in fact produces
with sum of ffequenciesJ
an optimal prefix code. Before doing this, however, we pause to note some
; ’, further observations about the algorithm.
First let’s consider the behavior of the algorithm on our sample instance
0 0"~-~Tw° l°west-frequency letters ) with S = {a, b, c, d, e} and frequencies
Figure 4.17 There is an optimal solution in which the two lowest-frequency letters
labe! sibling leaves; deleting them and labeling their parent with a new letter having t~e
.20, ~=..18, 5=.o5.
combined frequency yields an instance ~th a smaller alphabet. The algorithm would first merge d and e into a single letter--let’s denote it
(de)--of frequency .18 + .05 = .23. We now have an instance of the problem
on the four letters S’ = {a, b, c, (de)}. The two lowest-frequency letters in S’ are
An Algorithm to Construct an Optimal Prefix Code Suppose that y* and z* c and (de), so in the next step we merge these into the single letter (cde) of
are the two lowest-frequency letters in S. (We can break ties in the frequencies frequency .20 + .23 = .43. This gives us the three-letter alphabet {a, b, (cde)}.
arbitrarily.) Statement (4.31) is important because it tells us something about Next we merge a and b, and this gives us a two-letter alphabet, at which point
where y* and z* go in the optim!l solution; it says that it is safe to "lock them we invoke the base case of the recursion. If we unfold the result back through
together" in thinking about the solution, because we know they end up as the recursive calls, we get the tree pictured in Figure 4.16(c).
sibling leaves below a common parent. In effect, this common parent acts like
a "meta-letter" whose frequency is the sum of the frequencies of y* and z*. It is interesting to note how the greedy rule underlying Huffman’s
Algorithm--the merging of the two lowest-frequency letters--fits into the
This directly suggests an algorithm: we replace y* and z* with this meta- structure of the algorithm as a whole. Essentially, at the time we merge these
letter, obtaining an alphabet that is one letter smaller. We recursively find a two letters, we don’t know exactly how they will fit into the overall code.
prefix code for the smaller alphabet, and then "open up" the meta-letter back
Rather, we simply commit to having them be children of the same parent, and
into y* and z* to obtain a prefix code for S. This recursive strategy is depicted
this is enough to produce a new, equivalent problem with one less letter.
in Figure 4.17. Moreover, the algorithm forms a natural contrast with the earlier approach
A concrete description of the algorithm is as follows. that led to suboptimal Shannon-Fano codes. That approach was based on a
top-down strategy that worried first and foremost about the top-level split in
To construct a prefix code for an alphabet S, with given frequencies: the binary tree--namely, the two subtrees directly below the root. Huffman’s
If S has two letters then Algorithm, on the other hand, follows a bottom-up approach: it focuses on
Encode one letter using 0 and the other letter using I the leaves representing the two lowest-frequency letters~ and then continues
Else by recursion.
Let y* and z* be the two lowest-frequency letters
Form a new alphabet S’ by deleting y* and z* and ~ Analyzing the Mgorithm
replacing them with a new letter ~ of frequency ~. ÷ ~*
The Optimality of the Algorithm We first prove the optimaliW of Huffman’s
Kecursively construct a prefix code Z’ for S’, with tree T’
Mgorithm. Since the algorithm operates recursively, invoking itself on smaller
Define a prefix code for S as fol!ows:
and smaller alphabets, it is natural to try establishing optimaliW by induction
Start with T’

on the size of the alphabet. Clearly it is optimal for all two-letter alphabets such that ABL(Z) < ABL(T); and by (4.31), there is such a tree Z in which the
(since it uses only one bit per letter). So suppose by induction that it is optimal leaves representing y* and z* are siblings.
for all alphabets of size/~ - 1, and consider an input instance consisting of an It is now easy to get a contradiction, as follows. If we delete the leaves
alphabet S of size labeled y* and z* from Z, and label their former parent with w, we get a tree
Let’s quickly recap the behavior of the algorithm on this instance. The Z’ that defines a prefix code for S’. In the same way that T is obtained from
algorithm merges the two lowest-frequency letters y*, z* ~ S into a single letter T’, the tree Z is obtained from ZI by adding leaves for y* and z* below to; thus
o0, calls itself recursively on the smaller alphabet S’ (in which y* and z* are the identity in (4.32) applies to Z and Z’ as well: ABL(Z’) = ABL(Z) -- [to.
replaced by a)), and by induction produces an optimal prefix code for S’, But we have assumed that ABL(Z) < ABL(T); subtracting/:to from both sides
represented by a labeled binary tree T’. It then extends this into a tree T for S, of this inequality we get ,~BL(Z’) < ABL(T’), which contradicts the optimality
by attaching leaves labeled y* and z* as children of the node in T’ labeled of T’ as a prefix code for S’. ,,
There is a close relationship between ABL(T) and ABL(T’). (Note that the
former quantity is the average number of bits used to encode letters in S, while Implementation and Running Time It is clear that Huffman’s Algorithm can
the latter quantity is the average number of bits used to encode letters in S’.) be made to run in polynomial time in k, the number of letters in the alphabet.
The recursive calls of the algorithm define a sequence of k - 1 iterations over
(4.32) ABL(T’) = ABL(T) -- fro- smaller and smaller alphabets, and each iteration except the last consists
Proof. The depth of each lefter x other than y*, z* is the same in both T and simply of identifying the two lowest-frequency letters and merging them into
T’. Also, the depths of y* and z* in T are each one greater than the depth of a single letter that has the combined frequency. Even without being careful
o) in T’. Using this, plus the fact that [to = fy. + fz*, we have about the implementation, identifying the lowest-frequency letters can be done
in a single scan of the alphabet, in time O(k), and so summing this over the
ABL(T) = ~ ~" depthr(X) k - 1 iterations gives O(k2) time.
But in fact Huffman’s Algorithm is an ideal setting in which to use a
= f~,- depthriv*) + fz*" depthr(z*) + ~ ~. depthT(X) priority queue. Recall that a priority queue maintains a set of/c elements,
x-aY*r- ,Z* each with a numerical key, and it allows for the insertion of new elements and
depthT,(X) the extraction of the element with the minimum key. Thus we can maintain
= (fy* q- fz*)" (1 q- depthT,(~o)) +
x~y*,z* the alphabet S in a priority queue, using each letter’s frequency as its key.
In each iteration we just extract the minimum twice (this gives us the two
= ]’to" (1 q- depthr,(O))) q- ]’x" depthr’(X) lowest-frequency letters), and then we insert a new letter whose key is the
x~-y*,z*
sum of these two minimum frequencies. Our priority queue now contains a
representation of the alphabet that we need for the next iteration.
Using an implementation of priority queues via heaps, as in Chapter 2, we
= L + ~ ]’x" depthr’(X) can make each insertion and extraction of the minimum run in time O(log k);
xES~ hence, each iteration--which performs just three of these operations--takes
time O(log/0. Summing over all k iterations, we get a total running time of
= ]:to q- ABE(T/)..
O(k log k).
Using this, we now prove optimality as follows.
Extensions
(4.33) The Huffinan code for a given alphabet achieves the minimum average
number of bits per letter of any prefix code. The structure of optimal prefix codes, which has been our focus here, stands
as a fundamental result in the area of data compression. But it is important to
Proof. Suppose by way of contradiction that the tree T produced by our greedy understand that this optimality result does not by any means imply that we
algorithm is not optimal. This means that there is some labeled binary tree Z have found the best way to compress data under all circumstances.
letter over a long run of text that follows. Such approaches, which change
What more could we want beyond an optimal prefix code? First, consider the encoding in midstream, are called adaptive compression schemes, and
an application in which we are transmitting black-and-white images: each for many kinds of data they lead to significant improvements over the static
image is a 1,000-by-l,000 array of pixels, and each pixel takes one of the two method we’ve considered here.
values black or white. Further, suppose that a typical image is almost entirely These issues suggest some of the directions in which work on data com-
white: roughly 1,000 of the million pixels are black, and the rest are white. Now, pression has proceeded. In many of these cases, there is a trade-off between
if we wanted to compress such an image, the whole approach of prefix codes the power of the compression technique and its computational cost. In partic-
has very little to say: we have a text of length one million over the two-letter ular, many of the improvements to Huffman codes just described come with
alphabet {black, white}. As a result, the text is already encoded using one bit a corresponding increase in the computational effort needed both to produce
per letter--the lowest possible in our framework. the compressed version of the data and also to decompress it and restore the
It is clear, though, that such images should be highly compressible. original text. Finding the right balance among these trade-offs is a topic of
Intuitively, one ought to be able to use a "fraction of a bit" for each white pixel, active research.
since they are so overwhelmingly frequent, at the cost of using multiple bits
for each black pixel. (In an extreme version, sending a list of (x, y) coordinates
for each black pixel would be an improvement over sending the image as a * 4.9 Minimum-Cost Arborescences: A Multi-Phase
text with a million bits.) The challenge here is to define an encoding scheme Greedy Algorithm
where the notion of using fractions of bits is well-defined. There are results
As we’ve seen more and more examples of greedy algorithms, we’ve come to
in the area of data compression, however, that do iust this; arithmetic coding
appreciate that there can be considerable diversity in the way they operate.
and a range of other techniques have been developed to handle settings like
Many greedy algorithms make some sort of an initial "ordering" decision on
this. the input, and then process everything in a one-pass fashion. Others make
A second drawback of prefix codes, as defined here, is that they cannot more incremental decisions--still local and opportunistic, but without a g!obal
adapt to changes in the text. Again let’s consider a simple example. Suppose we "plan" in advance. In this section, we consider a problem that stresses our
are trying to encode the output of a program that produces a long sequence intuitive view of greedy algorithms still further.
of letters from the set {a, b, c, d}. Further suppose that for the first half of
this sequence, the letters a and b occur equally frequently, while c and d do
not occur at all; but in the second half of this sequence, the letters c and d ,~J The Problem
occur equally frequently, while a and b do not occur at all. In the framework The problem is to compute a minimum-cost arborescence of a directed graph.
developed in this section, we are trying to compress a text over the four-letter This is essentially an analogue of the Minimum Spanning Tree Problem for
alphabet {a, b, c, d}, and all letters are equally frequent. Thus each would be directed, rather than undirected, graphs; we will see that the move to directed
encoded with two bits. graphs introduces significant new complications. At the same time, the style
But what’s really happening in this example is that the frequency remains of the algorithm has a strongly greedy flavor, since it still constructs a solution
stable for half the text, and then it changes radically. So one could get away according to a local, myopic rule.
with iust one bit per letter, plus a bit of extra overhead, as follows. We begin with the basic definitions. Let G = (V, E) be a directed graph in
o Begin with an encoding in which the bit 0 represents a and the bit 1 which we’ve distinguished one node r ~ V as a root. An arborescence (with
represents b. respect to r) is essentially a directed spanning tree rooted at r. Specifically, it
is a subgraph T = (V, F) such that T is a spanning tree of G if we ignore the
o Halfway into the sequence, insert some kind of instruction that says,
direction of edges; and there is a path in T from r to each other node v ~ V if
"We’re changing the encoding now. From now on, the bit 0 represents c we take the direction of edges into account. Figure 4.18 gives an example of
and the bit I represents d:’ two different arborescences in the same directed graph.
o Use this new encoding for the rest of the sequence. There is a useful equivalent way to characterize arborescences, and this
The point is that investing a small amount of space to describe a new encoding is as follows.
can pay off many times over if it reduces the average number of bits per
The basic problem we consider here is the following. We are given a
directed graph G = (V, E), with a distinguished root node r and with a non-
negative cost ce >_ 0 on each edge, and we wish to compute an arborescence
rooted at r of minimum total cost. (We will refer to this as an optimal arbores-
cence.) We will assume throughout that G at least has an arborescence rooted
at r; by (4.35), this can be easily checked at the outset.
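A hedged sketch of that check, under assumptions made only for the example (nodes labeled 0, ..., n-1 and the graph given as a list of directed (u, v) edges): an arborescence rooted at r exists exactly when every node is reachable from r, and a breadth-first search settles this in linear time.

from collections import deque

def has_arborescence(n, edges, r):
    # Returns True iff every node is reachable from r; in that case the
    # breadth-first search tree edges themselves form an arborescence rooted at r.
    out = [[] for _ in range(n)]
    for u, v in edges:
        out[u].append(v)
    seen = {r}
    queue = deque([r])
    while queue:
        u = queue.popleft()
        for v in out[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == n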

Designing the Algorithm


Given the relationship between arborescences and trees, the minimum-cost
arborescence problem certainlyhas a strong initial resemblance to the Mini-
mum Spanning Tree Problem for undirected graphs. Thus it’s natural to start
by asking whether the ideas we developed for that problem can be carried
over directly to this setting. For example, must the minimum-cost arbores-
cence contain the cheapest edge in the whole graph? Can we safely delete the
Figttre 4.18 A directed graph can have many different arborescences. Parts (b) and (c) most expensive edge on a cycle, confident that it cannot be in the optimal
depict two different aborescences, both rooted at node r, for the graph in part (a).
arborescence?
Clearly the cheapest edge e in G will not belong to the optimal arborescence
(4.t4) A subgraph T = (V, F) of G is an arborescence with respect to root r if if e enters the root, since the arborescence we’re seeking is not supposed to
and only if T has no cycles, and for each node v ~ r, there is exactly one edge have any edges entering the root. But even if the cheapest edge in G belongs
to some arborescence rooted at r, it need not belong to the optimal one, as
in F that enters v.
the example of Figure 4.19 shows. Indeed, including the edge of cost 1 in
Proof. If T is an arborescence with root r, then indeed every other node v Figure 4.!9 would prevent us from including the edge of cost 2 out of the
has exactly one edge entering it: this is simply the last edge on the unique r-v root r (since there can only be one entering edge per node); and this in turn
path. would force us to incur an unacceptable cost of 10 when we included one of
Conversely, suppose T has no cycles, and each node v # r has exactly
one entering edge. In order to establish that T is an arborescence, we need
only show that there is a directed path from r to each other node v. Here is
how to construct such a path. We start at v and repeatedly follow edges in
the backward direction. Since T has no cycles, we can never return tO a node 2
10 10
we’ve previously visited, and thus this process must terminate. But r is the
only node without incoming edges, and so the process must in fact terminate
4
by reaching r; the sequence of nodes thus visited yields a path (in the reverse
direction) from r to v. m

It is easy to see that, just as every connected graph has a spanning tree, a
directed graph has an arborescence rooted at r provided that r can reach every
node. Indeed, in this case, the edges in a breadth-first search tree rooted at r
will form an arborescence.
(a)
(4.t5) A directed graph G has an arborescence rooted at r if and only if the¢e Figure 4.19 (a) A directed graph with costs onits edges, and (b) an optimal arborescence
rooted at r for this graph.
_
the other edges out of r. This kind of argument never clouded our thinking in
the Minimum Spanning Tree Problem, where it was always safe to plunge
ahead and include the cheapest edge; it suggests that finding the optimal
arborescence may be a significantly more complicated task. (It's worth noticing
that the optimal arborescence in Figure 4.19 also includes the most expensive
edge on a cycle; with a different construction, one can even cause the optimal
arborescence to include the most expensive edge in the whole graph.)

Despite this, it is possible to design a greedy type of algorithm for this
problem; it's just that our myopic rule for choosing edges has to be a little
more sophisticated. First let's consider a little more carefully what goes wrong
with the general strategy of including the cheapest edges. Here's a particular
version of this strategy: for each node v ≠ r, select the cheapest edge entering
v (breaking ties arbitrarily), and let F* be this set of n − 1 edges. Now consider
the subgraph (V, F*). Since we know that the optimal arborescence needs to
have exactly one edge entering each node v ≠ r, and (V, F*) represents the
cheapest possible way of making these choices, we have the following fact.

(4.36) If (V, F*) is an arborescence, then it is a minimum-cost arborescence.

So the difficulty is that (V, F*) may not be an arborescence. In this case,
(4.34) implies that (V, F*) must contain a cycle C, which does not include the
root. We now must decide how to proceed in this situation.

To make matters somewhat clearer, we begin with the following observation.
Every arborescence contains exactly one edge entering each node v ≠ r;
so if we pick some node v and subtract a uniform quantity from the cost of
every edge entering v, then the total cost of every arborescence changes by
exactly the same amount. This means, essentially, that the actual cost of the
cheapest edge entering v is not important; what matters is the cost of all other
edges entering v relative to this. Thus let y_v denote the minimum cost of any
edge entering v. For each edge e = (u, v), with cost c_e ≥ 0, we define its
modified cost c'_e to be c_e − y_v. Note that since c_e ≥ y_v, all the modified
costs are still nonnegative. More crucially, our discussion motivates the
following fact.

(4.37) T is an optimal arborescence in G subject to costs {c_e} if and only if it
is an optimal arborescence subject to the modified costs {c'_e}.

Proof. Consider an arbitrary arborescence T. The difference between its cost
with costs {c_e} and {c'_e} is exactly Σ_{v ≠ r} y_v -- that is,

    Σ_{e ∈ T} c_e − Σ_{e ∈ T} c'_e = Σ_{v ≠ r} y_v.

This is because an arborescence has exactly one edge entering each node
in the sum. Since the difference between the two costs is independent of the
choice of the arborescence T, we see that T has minimum cost subject to {c_e}
if and only if it has minimum cost subject to {c'_e}. ■

We now consider the problem in terms of the costs {c'_e}. All the edges in
our set F* have cost 0 under these modified costs; and so if (V, F*) contains
a cycle C, we know that all edges in C have cost 0. This suggests that we can
afford to use as many edges from C as we want (consistent with producing an
arborescence), since including edges from C doesn't raise the cost.

Thus our algorithm continues as follows. We contract C into a single
supernode, obtaining a smaller graph G' = (V', E'). Here, V' contains the nodes
of V − C, plus a single node c* representing C. We transform each edge e ∈ E to
an edge e' ∈ E' by replacing each end of e that belongs to C with the new node
c*. This can result in G' having parallel edges (i.e., edges with the same ends),
which is fine; however, we delete self-loops from E' -- edges that have both
ends equal to c*. We recursively find an optimal arborescence in this smaller
graph G', subject to the costs {c'_e}. The arborescence returned by this recursive
call can be converted into an arborescence of G by including all but one edge
on the cycle C.

In summary, here is the full algorithm.

    For each node v ≠ r
        Let y_v be the minimum cost of an edge entering node v
        Modify the costs of all edges e entering v to c'_e = c_e − y_v
    Choose one 0-cost edge entering each v ≠ r, obtaining a set F*
    If F* forms an arborescence, then return it
    Else there is a directed cycle C ⊆ F*
        Contract C to a single supernode, yielding a graph G' = (V', E')
        Recursively find an optimal arborescence (V', F') in G' with costs {c'_e}
        Extend (V', F') to an arborescence (V, F) in G
            by adding all but one edge of C
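
To make the contraction step concrete, here is a compact Python sketch of the
procedure just summarized. It is an illustration rather than anything from the
text: the edge-list representation, the function name min_arborescence, and the
small example at the end are our own choices, and we assume every node is
reachable from the root (so an arborescence exists).

    def min_arborescence(nodes, edges, r):
        # edges: list of (u, v, cost) triples; every node is assumed reachable from r.
        # Returns a set of input edges forming a minimum-cost arborescence rooted at r.

        # The quantity y_v: the cheapest edge entering each node v != r.
        best_in = {}
        for (u, v, c) in edges:
            if v != r and (v not in best_in or c < best_in[v][2]):
                best_in[v] = (u, v, c)

        # F* consists of one cheapest edge entering each node other than the root.
        F = dict(best_in)

        # Look for a cycle in (V, F*) by walking the chosen in-edges backward.
        cycle = None
        for start in nodes:
            seen, u = set(), start
            while u != r and u not in seen and u in F:
                seen.add(u)
                u = F[u][0]
            if u != r and u in seen:          # walked back onto the path: a cycle in F*
                cycle, w = set(), u
                while w not in cycle:
                    cycle.add(w)
                    w = F[w][0]
                break
        if cycle is None:
            return set(F.values())            # F* is an arborescence, optimal by (4.36)

        # Contract the cycle into a supernode c*.  Only edges entering the cycle need
        # the reduced cost c'_e = c_e - y_v here; by (4.37), reducing the others too
        # (as in the text) would not change which arborescence is optimal.
        cstar = object()                      # fresh label that cannot clash with a node
        new_nodes = [x for x in nodes if x not in cycle] + [cstar]
        new_edges, origin = [], {}
        for (u, v, c) in edges:
            uu = cstar if u in cycle else u
            vv = cstar if v in cycle else v
            if uu == vv:                      # self-loop inside the contracted cycle: drop it
                continue
            cc = c - best_in[v][2] if v in cycle else c
            e2 = (uu, vv, cc)
            new_edges.append(e2)
            origin.setdefault(e2, (u, v, c))  # parallel edges of equal cost are interchangeable

        sub = min_arborescence(new_nodes, new_edges, r)

        # Expand: keep the original edges behind the recursive solution, then add every
        # cycle edge except the one entering the node at which the solution enters C.
        result = {origin[e] for e in sub}
        b = next(origin[e][1] for e in sub if e[1] == cstar)
        result |= {best_in[v] for v in cycle if v != b}
        return result

For instance, on a made-up graph with root r and edges (r, a, 4), (a, b, 1),
(b, c, 2), (c, a, 3), (r, c, 8), the cheapest in-edges form the 0-cost cycle
a → b → c → a after the cost modification; contracting it and recursing, the
sketch returns the edges of costs 4, 1, and 2, an arborescence of total cost 7.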

Analyzing the Algorithm
It is easy to implement this algorithm so that it runs in polynomial time. But
does it lead to an optimal arborescence? Before concluding that it does, we need
to worry about the following point: not every arborescence in G corresponds to
an arborescence in the contracted graph G'. Could we perhaps "miss" the true
optimal arborescence in G by focusing on G'? What is true is the following.
The arborescences of G' are in one-to-one correspondence with arborescences
of G that have exactly one edge entering the cycle C; and these corresponding
arborescences have the same cost with respect to {c'_e}, since C consists of 0-
cost edges. (We say that an edge e = (u, v) enters C if v belongs to C but u does
not.) So to prove that our algorithm finds an optimal arborescence in G, we
must prove that G has an optimal arborescence with exactly one edge entering
C. We do this now.

(4.38) Let C be a cycle in G consisting of edges of cost 0, such that r ∉ C.
Then there is an optimal arborescence rooted at r that has exactly one edge
entering C.

Proof. Consider an optimal arborescence T in G. Since r has a path in T to
every node, there is at least one edge of T that enters C. If T enters C exactly
once, then we are done. Otherwise, suppose that T enters C more than once.
We show how to modify it to obtain an arborescence of no greater cost that
enters C exactly once.

Let e = (a, b) be an edge entering C that lies on as short a path as possible
from r; this means in particular that no edges on the path from r to a can enter
C. We delete all edges of T that enter C, except for the edge e. We add in all
edges of C except for the one edge that enters b, the head of edge e. Let T'
denote the resulting subgraph of G.

We claim that T' is also an arborescence. This will establish the result,
since the cost of T' is clearly no greater than that of T: the only edges of
T' that do not also belong to T have cost 0. So why is T' an arborescence?
Observe that T' has exactly one edge entering each node v ≠ r, and no edge
entering r. So T' has exactly n − 1 edges; hence if we can show there is an r-v
path in T' for each v, then T' must be connected in an undirected sense, and
hence a tree. Thus it would satisfy our initial definition of an arborescence.

So consider any node v ≠ r; we must show there is an r-v path in T'. If
v ∈ C, we can use the fact that the path in T from r to e has been preserved
in the construction of T'; thus we can reach v by first reaching e and then
following the edges of the cycle C. Now suppose that v ∉ C, and let P denote
the r-v path in T. If P did not touch C, then it still exists in T'. Otherwise,
let w be the last node in P ∩ C, and let P' be the subpath of P from w to v.
Observe that all the edges in P' still exist in T'. We have already argued that
w is reachable from r in T', since it belongs to C. Concatenating this path
to w with the subpath P' gives us a path to v as well. ■

We can now put all the pieces together to argue that our algorithm is
correct.

(4.39) The algorithm finds an optimal arborescence rooted at r in G.

Proof. The proof is by induction on the number of nodes in G. If the edges
of F* form an arborescence, then the algorithm returns an optimal arborescence
by (4.36). Otherwise, we consider the problem with the modified costs {c'_e},
which is equivalent by (4.37). After contracting a 0-cost cycle C to obtain a
smaller graph G', the algorithm produces an optimal arborescence in G' by the
inductive hypothesis. Finally, by (4.38), there is an optimal arborescence in G
that corresponds to the optimal arborescence computed for G'. ■
Solved Exercises

Solved Exercise 1
Suppose that three of your friends, inspired by repeated viewings of the
horror-movie phenomenon The Blair Witch Project, have decided to hike the
Appalachian Trail this summer. They want to hike as much as possible per
day but, for obvious reasons, not after dark. On a map they've identified a
large set of good stopping points for camping, and they're considering the
following system for deciding when to stop for the day. Each time they come
to a potential stopping point, they determine whether they can make it to the
next one before nightfall. If they can make it, then they keep hiking; otherwise,
they stop.

Despite many significant drawbacks, they claim this system does have
one good feature. "Given that we're only hiking in the daylight," they claim,
"it minimizes the number of camping stops we have to make."

Is this true? The proposed system is a greedy algorithm, and we wish to
determine whether it minimizes the number of stops needed.

To make this question precise, let's make the following set of simplifying
assumptions. We'll model the Appalachian Trail as a long line segment of
length L, and assume that your friends can hike d miles per day (independent
of terrain, weather conditions, and so forth). We'll assume that the potential
stopping points are located at distances x_1, x_2, ..., x_n from the start of the
trail. We'll also assume (very generously) that your friends are always correct
when they estimate whether they can make it to the next stopping point before
nightfall.

We'll say that a set of stopping points is valid if the distance between each
adjacent pair is at most d, the first is at distance at most d from the start of
the trail, and the last is at distance at most d from the end of the trail. Thus
a set of stopping points is valid if one could camp only at these places and

still make it across the whole trail. We'll assume, naturally, that the full set of
n stopping points is valid; otherwise, there would be no way to make it the
whole way.

We can now state the question as follows. Is your friends' greedy
algorithm -- hiking as long as possible each day -- optimal, in the sense that it
finds a valid set whose size is as small as possible?

Solution  Often a greedy algorithm looks correct when you first encounter it,
so before succumbing too deeply to its intuitive appeal, it's useful to ask: why
might it not work? What should we be worried about?

There's a natural concern with this algorithm: Might it not help to stop
early on some day, so as to get better synchronized with camping opportunities
on future days? But if you think about it, you start to wonder whether this could
really happen. Could there really be an alternate solution that intentionally lags
behind the greedy solution, and then puts on a burst of speed and passes the
greedy solution? How could it pass it, given that the greedy solution travels as
far as possible each day?

This last consideration starts to look like the outline of an argument based
on the "staying ahead" principle from Section 4.1. Perhaps we can show that as
long as the greedy camping strategy is ahead on a given day, no other solution
can catch up and overtake it the next day.

We now turn this into a proof showing the algorithm is indeed optimal,
identifying a natural sense in which the stopping points it chooses "stay ahead"
of any other legal set of stopping points. Although we are following the style
of proof from Section 4.1, it's worth noting an interesting contrast with the
Interval Scheduling Problem: there we needed to prove that a greedy algorithm
maximized a quantity of interest, whereas here we seek to minimize a certain
quantity.

Let R = {x_{p_1}, ..., x_{p_k}} denote the set of stopping points chosen by the
greedy algorithm, and suppose by way of contradiction that there is a smaller
valid set of stopping points; let's call this smaller set S = {x_{q_1}, ..., x_{q_m}},
with m < k.

To obtain a contradiction, we first show that the stopping point reached by
the greedy algorithm on each day j is farther along than the stopping point
reached under the alternate solution. That is,

(4.40) For each j = 1, 2, ..., m, we have x_{p_j} ≥ x_{q_j}.

Proof. We prove this by induction on j. The case j = 1 follows directly from
the definition of the greedy algorithm: your friends travel as long as possible
on the first day before stopping. Now let j > 1 and assume that the claim is
true for all i < j. Then

    x_{q_j} − x_{q_{j−1}} ≤ d,

since S is a valid set of stopping points, and

    x_{q_j} − x_{p_{j−1}} ≤ x_{q_j} − x_{q_{j−1}},

since x_{p_{j−1}} ≥ x_{q_{j−1}} by the induction hypothesis. Combining these two
inequalities, we have

    x_{q_j} − x_{p_{j−1}} ≤ d.

This means that your friends have the option of hiking all the way from
x_{p_{j−1}} to x_{q_j} in one day; and hence the location x_{p_j} at which they finally
stop can only be farther along than x_{q_j}. (Note the similarity with the
corresponding proof for the Interval Scheduling Problem: here too the greedy
algorithm is staying ahead because, at each step, the choice made by the
alternate solution is one of its valid options.) ■

Statement (4.40) implies in particular that x_{q_m} ≤ x_{p_m}. Now, if m < k, then
we must have x_{p_m} < L − d, for otherwise your friends would never have needed
to stop at the location x_{p_{m+1}}. Combining these two inequalities, we have
concluded that x_{q_m} < L − d; but this contradicts the assumption that S is a
valid set of stopping points.

Consequently, we cannot have m < k, and so we have proved that the
greedy algorithm produces a valid set of stopping points of minimum possible
size.
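
As a companion to this argument, the greedy rule itself is easy to state in
code. The sketch below is ours (not part of the original solution); the function
name, parameter names, and example values are illustrative, and it assumes the
stopping points are given in sorted order and that the full set of stops is valid.

    def greedy_stops(stops, L, d):
        # stops: sorted distances x_1 < ... < x_n of the potential stopping points;
        # L: trail length; d: miles hiked per day.  Returns the camping stops
        # chosen by the "hike as far as possible each day" rule.
        chosen = []
        day_start = 0                 # position where the current day began
        for i, x in enumerate(stops):
            # The "next" target is the following stopping point, or the trail's end.
            nxt = stops[i + 1] if i + 1 < len(stops) else L
            if nxt - day_start > d:
                chosen.append(x)      # can't make it past x today, so camp here
                day_start = x
        return chosen

For example, greedy_stops([4, 8, 12, 17], L=20, d=10) camps at 8 and 17, a
valid set of two stops; no single stop could suffice here, since a lone stop at
distance s would need both s ≤ 10 and 20 − s ≤ 10.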
Solved Exercise 2
Your friends are starting a security company that needs to obtain licenses for
n different pieces of cryptographic software. Due to regulations, they can only
obtain these licenses at the rate of at most one per month.

Each license is currently selling for a price of $100. However, they are
all becoming more expensive according to exponential growth curves: in
particular, the cost of license j increases by a factor of r_j > 1 each month,
where r_j is a given parameter. This means that if license j is purchased t
months from now, it will cost 100 · r_j^t. We will assume that all the price
growth rates are distinct; that is, r_i ≠ r_j for licenses i ≠ j (even though they
start at the same price of $100).

The question is: Given that the company can only buy at most one license
a month, in which order should it buy the licenses so that the total amount of
money it spends is as small as possible?

Give an algorithm that takes the n rates of price growth r_1, r_2, ..., r_n, and
computes an order in which to buy the licenses so that the total amount of
money spent is minimized. The running time of your algorithm should be
polynomial in n.

Solution  Two natural guesses for a good sequence would be to sort the r_i in
decreasing order, or to sort them in increasing order. Faced with alternatives
like this, it's perfectly reasonable to work out a small example and see if the
example eliminates at least one of them. Here we could try r_1 = 2, r_2 = 3, and
r_3 = 4. Buying the licenses in increasing order results in a total cost of

    100(2 + 3^2 + 4^3) = 7,500,

while buying them in decreasing order results in a total cost of

    100(4 + 3^2 + 2^3) = 2,100.

This tells us that increasing order is not the way to go. (On the other hand, it
doesn't tell us immediately that decreasing order is the right answer, but our
goal was just to eliminate one of the two options.)

Let's try proving that sorting the r_i in decreasing order in fact always gives
the optimal solution. When a greedy algorithm works for problems like this,
in which we put a set of things in an optimal order, we've seen in the text that
it's often effective to try proving correctness using an exchange argument.

To do this here, let's suppose that there is an optimal solution O that
differs from our solution S. (In other words, S consists of the licenses sorted in
decreasing order.) So this optimal solution O must contain an inversion -- that
is, there must exist two neighboring months t and t + 1 such that the price
increase rate of the license bought in month t (let us denote it by r_t) is less
than that bought in month t + 1 (similarly, we use r_{t+1} to denote this). That
is, we have r_t < r_{t+1}.

We claim that by exchanging these two purchases, we can strictly improve
our optimal solution, which contradicts the assumption that O was optimal.
Therefore if we succeed in showing this, we will successfully show that our
algorithm is indeed the correct one.

Notice that if we swap these two purchases, the rest of the purchases
are identically priced. In O, the amount paid during the two months involved
in the swap is 100(r_t^t + r_{t+1}^{t+1}). On the other hand, if we swapped these
two purchases, we would pay 100(r_{t+1}^t + r_t^{t+1}). Since the constant 100 is
common to both expressions, we want to show that the second term is less than
the first one. So we want to show that

    r_{t+1}^t + r_t^{t+1} < r_t^t + r_{t+1}^{t+1}
    r_t^{t+1} − r_t^t < r_{t+1}^{t+1} − r_{t+1}^t
    r_t^t (r_t − 1) < r_{t+1}^t (r_{t+1} − 1).

But this last inequality is true simply because r_i > 1 for all i and since
r_t < r_{t+1}.

This concludes the proof of correctness. The running time of the algorithm
is O(n log n), since the sorting takes that much time and the rest (outputting)
is linear. So the overall running time is O(n log n).
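
The resulting algorithm is just a sort. Here is a small Python sketch (ours, for
illustration), under the convention that the purchase made in month t costs
100 · r^t; the function name is a hypothetical choice.

    def license_order(rates):
        # Buy the licenses in decreasing order of growth rate; the month-t
        # purchase costs 100 * r**t.  Returns the order and the total spent.
        order = sorted(rates, reverse=True)       # O(n log n), the dominant step
        total = sum(100 * r ** t for t, r in enumerate(order, start=1))
        return order, total

On the example above, license_order([2, 3, 4]) returns the order [4, 3, 2]
with total cost 2,100, matching the hand calculation.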
Note: It's interesting to note that things become much less straightforward
if we vary this question even a little. Suppose that instead of buying licenses
whose prices increase, you're trying to sell off equipment whose cost is
depreciating. Item i depreciates at a factor of r_i < 1 per month, starting from
$100, so if you sell it t months from now you will receive 100 · r_i^t. (In
other words, the exponential rates are now less than 1, instead of greater
than 1.) If you can only sell one item per month, what is the optimal order in
which to sell them? Here, it turns out that there are cases in which the optimal
solution doesn't put the rates in either increasing or decreasing order (as in
the input ...).

Solved Exercise 3
Suppose you are given a connected graph G, with edge costs that you may
assume are all distinct. G has n vertices and m edges. A particular edge e of G
is specified. Give an algorithm with running time O(m + n) to decide whether
e is contained in a minimum spanning tree of G.

Solution  From the text, we know of two rules by which we can conclude
whether an edge e belongs to a minimum spanning tree: the Cut Property
(4.17) says that e is in every minimum spanning tree when it is the cheapest
edge crossing from some set S to the complement V − S; and the Cycle Property
(4.20) says that e is in no minimum spanning tree if it is the most expensive
edge on some cycle C. Let's see if we can make use of these two rules as part
of an algorithm that solves this problem in linear time.

Both the Cut and Cycle Properties are essentially talking about how e
relates to the set of edges that are cheaper than e. The Cut Property can be
viewed as asking: Is there some set S ⊆ V so that in order to get from S to V − S
without using e, we need to use an edge that is more expensive than e? And
if we think about the cycle C in the statement of the Cycle Property, going the
Your friend is working as a camp counselor, and he is in charge of
Prove that, for a given set of boxes with specified weights, the greedy
organizing activities for a set of junior-high-school-age campers. One of
algorithm currently in use actually minimizes the number of trucks that
his plans is the following mini-triathalon exercise: each contestant must
are needed. Your proof should follow the type of analysis we used for
swim 20 laps of a pool, then bike 10 miles, then run 3 miles. The plan is
the Interval Scheduling Problem: it should establish the optimality of this
to send the contestants out in a staggered fashion, via the following rule:
greedy packing algorithm by identif34ng a measure under which it "stays
the contestants must use the pool one at a lime. In other words, first one
ahead" of all other solutions.
contestant swims the 20 laps, gets out, and starts biking. As soon as this
Some of your friends have gotten Into the burgeoning field of time-series first person is out of the pool, a second contestant begins swimming the
data mining, in which one looks for patterns in sequences of events that 20 laps; as soon as he or she is out and starts biking, a third contestant
occur over time. Purchases at stock exchanges--what’s being bought-- begins swimming.., and so on.)
are one source of data with a natural ordering in time. Given a long Each contestant has a projected swimming time (the expected time it
sequence S of such events, your friends want an efficient way to detect will take him or her to complete the 20 laps), a projected biking time (the
certain "patterns" in them--for example, they may want to know if the expected time it will take him or her to complete the 10 miles of bicycling),
four events and a projected running time (the time it will take him or her to complete
the 3 miles of running). Your friend wants to decide on a schedule for the
buy Yahoo, buy eBay, buy Yahoo, buy Oracle
triathalon: an order in which to sequence the starts of the contestants.
occur in this sequence S, in order but not necessarily consecutively. Let’s say that the completion time of a schedul~ is the earliest time at
They begin with a collection of possible events (e.g., the possible’ which all contestants will be finished with all three legs of the triathalon,
transactions) and a sequence S of n of these events. A given event may assuming they each spend exactly their projected swimming, biking, and
occur multiple times in S (e.g., Yahoo stock may be bought many times running times on the three parts. (Again, note that participants can bike
In a single sequence S). We will say that a sequence S’ is a subsequence and run simultaneously, but at most one person can be in the pool at
of S if there is a way to delete certain of the events from S so that the any time.) What’s the best order for sending people out, if one wants the
remaining events, in order, are equal to the sequence S’. So, for example, whole competition to be over as early as possible? More precisely, give
the sequence of four events above is a subsequence of the sequence an efficient algorithm that produces a schedule whose completion time
is as small as possible.
buy Amazon, buy Yahoo, buy eBay, buy Yahoo, buy Yahoo,
buy Oracle
The wildly popular Spanish-language search engine E1 Goog needs to do
Their goal is to be able to dream up short sequences and quickly a serious amount of computation every time it recompiles its index. For-
detect whether they are subsequences of S. So this is the problem they tunately, the company has at its disposal a single large supercomputer,
pose to you: Give an algorithm that takes two sequences of even~s--S’ of together with an essentia!ly unlimited supply of high-end PCs.
length m and S of length n, each possibly containing an event more than They’ve broken the overall computation into n distinct jobs, labeled
once--and decides in time O(m + n) whether S’ is a subsequence of S.
71, J2 ..... Jn, which can be performed completely Independently of one
another. Each job consists of two stages: first it needs to be preprocessed
Let’s consider a long, quiet country road with houses scattered very
on the supercomputer, and then it needs to be finished on one of the
sparsely along it. (We can picture the road as a long line segment, with
PCs. Let’s say that job J~ needs p~ seconds of time on. the supercomputer,
an eastern endpoint and a western endpoint.) Further, let’s suppose that
followed by f~ seconds of time on a PC.
despite the bucolic setting, the residents of all these houses are avid cell
phone users. You want to place cell phone base stations at certain points Since there are at least n PCs available on the premises, the finishing
along the road, so that every house is within four miles of one of the base of the jobs can be performed fully in para!lel--all the jobs can be pro-
stations. cessed at the same time. However, the supercomputer can only work on
a single job at a time, so the system managers need to work out an order
Give an efficient algorithm that achieves this goal, using as few base
in which to feed the jobs to the supercomputer. As soon as the first job
stations as possible.

in order is done on the supercomputer, it can be handed off to a PC for Suppose T is no longer the minimum-cost spanning tree. Give a
finishing; at that point in time a second job can be fed to the supercom- linear-time algorithm (time O(IEI)) to update the tree T to the new
purer; when the second job is done on the supercomputer, it can proceed minLmum-cost spanning tree.
to a PC regardless of whether or not the first job is done (since the PCs
work in parallel); and so on. 11. Suppose you are given a connected graph G = (V, E), with a cost ce on
each edge e. In an earlier problem, we saw that when all edge costs are
Let’s say that a schedule is an ordering of the jobs for the super-
distinct, G has a unique minimum spanning tree. However, G may have
computer, and the completion time of the schedule is the earliest time at
many minimum spanning trees when the edge costs are not all distinct.
which all jobs will have finished processing on the PCs. This is an impor-
Here we formulate the question: Can Kruskal’s Algorithm be made to find
tant quantity to minimize, since it determines how rapidly E1 Goog can
all the minimum spanning trees of G?
generate a new index.
RecaLl that Kxuskal’s Algorithm sorted the edges in order of increas-
Give a polynomial-time algorithm that finds a schedule with as small
ing cost, then greedily processed edges one by one, adding an edge e as
a completion time as possible. long as it did not form a cycle. When some edges have the same cost, the
phrase "in order of increasing cost" has to be specified a little more care-
8. Suppose you are given a connected graph G, with edge costs that are all fully: we’Ll say that an ordering of the edges is valid if the corresponding
distinct. Prove that G has a unique minimum spanning tree.
of Kruskal’s Algorithm is one that begins with a valid ordering of the
One of the basic motivations behind the’Minimum Spanning Tree Proble~fi edges of G.
is the goal of designing a spanning network for a set of nodes with
For any graph G, and any minimum spanning tree T of G, is there a
minimum total cost. Herewe explore another type of objective: designing
valid execution of Kruskal’s Algorithm onG that produces T as output?
a spanning network for which the most expensive edge is as cheap as
Giv,e a proof or a countere.xample.
possible.
Specifically, let G -= (V, E) be a connected graph with n vertices, m 12. Suppose you have n video streams that need to be sent, one after another,
edges, and positive edge costs that you may assume are all distinct. Let over a communication link. Stream i consists of a total of bi bits that need
T = (V, E’) be a spanning tree of G; we define the bottleneck edge of T to to be sent, at a constant rate, over a period of ti seconds. You cannot send
be the edge of T with the greatest cost. two streams at the same time, so you need to determine a schedule for the
A spanning tree T of G is a minimum-bottleneck spanning tree ff there streams: an order in which to send them. Whichever order you choose,
is no spanning tree T’ of G with a cheaper bottleneck edge. there cannot be any delays between the end of one stream and the start
(a) Is every minimum-bottleneck tree of G a minimum spanning tree of of the next. Suppose your schedule starts at time 0 (and therefore ends at
time ~1 ti, whichever order you choose). We assume that all the values
G? Prove or give a counterexample.
bi and t~ are positive integers.
(b) Is every minimum spanning tree of G a minimum-bottleneck tree of
G? Prove or give a counterexample. Now, because you’re just one user, the link does not want you taking
up too much bandwidth, so it imposes the following constraint, using a
fixed parameter r:
10. Let G = (V, E) be an (undirected) graph with costs ce >_ 0 on the edges e ~ E.
Assume you are given a minimum-cost spanning tree T in G. Now assume (,) For each natural number t > O, the total number of bits you send over the
that a new edge is added to G, connecting two nodes v, tv V with cost c. time interval from 0 to t cannot exceed rt.
(a) Give an efficient algorithm to test if T remains the minimum-cost
spanning tree with the new edge added to G (but not to the tree T). Note that this constraint is only imposed for time intervals that start at
0, not for time intervals that start at any other value.
Make your algorithm run in time O(IEI). Can you do it in O(IVI) time?
Please note any assumptions you make about what data structure is We say that a schedule is valid if it satisfies the constraint (.) imposed
used to represent the tree T and the graph G. by the link.
w2 = 2. Then doing job 1 first would yield a weighted completion time
The Problem. Given a set of n streams, each specified by its number of
of 10 · 1 + 2 · 4 = 18, while doing the second job first would yield the larger
bits bi and its time duration ti, as well as the link parameter r, determine
weighted completion time of 10 · 4 + 2 · 3 = 46.
whether there exists a valid schedule.
Example. Suppose we have n = 3 streams, with
(b_1, t_1) = (2000, 1), (b_2, t_2) = (6000, 2), (b_3, t_3) = (2000, 1),
monitor a large computer system. There’s particular interest in keeping
and suppose the link’s parameter is r = 5000. Then the schedule that runs track of processes that are labeled "sensitive." Each such process has a
the streams in the order 1, 2, 3, is valid, since the constraint (.) is satisfied: designated start time and finish time, and it rtms continuously between
t = 1: the whole first stream has been sent, and 2000 ≤ 5000 · 1.
t = 2: half of the second stream has also been sent, and 2000 + 3000 ≤ 5000 · 2.
Similar calcalations hold for t = 3 and t = 4. that, when invoked, runs for a few seconds and records various pieces
(a) Consider the following claim: of logging information about all the sensitive processes running on the
system at that moment. (We’ll model each invocation of status_check
Claim: There exists a valid schedule if and only if each stream i satisfies
as lasting for only this single point in time.) What they’d like to do is to
bi < rti. run status_check as few times as possible during the day, but enough
Decide whether you think the claim is true or false, and give a proof that for each sensitive process P, status_check is invoked at least once
of either the claim or its negation. during the execution of process P.
(b) Give an algorithm that takes a set of n streams, each specified by its (a) Give an efficient algorithm that, given the start and finish times of
number of bits bi and its time duration ti, as well as the link parameter all the sensitive processes, finds as small a set of times as possi-
r, and determines whether there exists a valid schedule. The rtmning ble at which to invoke s~;a~cus_check, subject to the requirement
time of your algorithm should be polynomial in n. that s~a~cus_check is invoked at least once during each sensitive
process P.
A small business--say, a photocopying service with a single large
machine--faces the following scheduling problem. Each morning they (b) WtKle you were designing your algorithm, the security consultants
get a set of jobs from customers. They want to do the jobs on their single were engaging in a little back-of-the-envelope reasoning. "Suppose
machine in an order that keeps their customers happiest. Customer i’s we can find a set of k sensitive processes with the property that no
job will take ti time to complete. Given a schedule (i.e., an ordering of the two are ever running at the same time. Then clearly your algorithm
jobs), let Ci denote the finishing time of job i. For example, if job j is the will need to invoke s~ca~;us_check at least k times: no one invocation
first to be donel we would have Ci = tj; and ff job j is done right after job of s~a~cus_check can handle more than one of these processes."
i, we would have Ci = Q + ti. Each customer i also has a given weight wg This is true, of course, and after some further discussion, you al!
~sents his or her importance to the business. The happiness of begin wondering whether something stronger is true as well, a kind
customer i is expected to be dependent o~ the finishing time of i’s job. of converse to the above argument. Suppose that k* is the largest
So the company decides that they want to order the jobs to mJnimlze the value of k such that one can find a set of k sensitive processes with
weighted sum of the completion times, ~,n i=1 wiCi" no two ever running at the same time. Is it the ~ase that there must
Design an efficient algorithm to solve this problem. That is, you are be a set of k* times at which you can run s~a~;us_check so that some
given a set of n jobs with a processing time ti and a weight w~ for each invocation occurs during the execution of each sensitive process? (In
job. You want to order the jobs so as to minimize the weighted sum of other words, the kind of argument in the previous paragraph is really
the completion times, ~P=I wiCi- the only thing forcing you to need a lot of invocations of
check.) Decide whether you think this claim is true or false, and give
Example. Suppose there are two jobs: the first takes time q = ! and has
a proof or a counterexample.
weight wl = !0, while the second job takes time t2 = 3 and has weight
whether it’s possible to associate each of the account’s n events with
15. The manager of a large student union on campus comes to you with the a distinct one of the n suspicious transactions in such a way that, if the
following problem. She’s in charge of a group of n students, each of whom
account event at time x~ is associated with the suspicious transaction that
is scheduled to work one shift during the week. There are different jobs
occurred approximately at time tj, then Itj - x~l <_ e~. (In other words, they
associated with these shifts (tending the main desk, helping with package
want to know if the activity on the account lines up with the suspicious
delivery, rebooting cranky information kiosks, etc.), but.we can view each
transactions to within the margin of error; the tricky part here is that
shift as a single contiguous interval of time. There can be multiple shifts
they don’t know which account event to associate with which suspicious
going on at once. transaction.)
She’s trying to choose a subset of these n students to form a super-
Give an efficient algorithm that takes the given data and decides
vising committee that she can meet with once a week. She considers such
whether such an association exists. If possible, you should make the
a committee to be complete if, for every student not on the committee,
running time be at most O(n2).
that student’s shift overlaps (at least partially) the shift of some student
who is on the committee. In this way, each student’s performance can be 17. Consider the following variation on the Interval Scheduling Problem. You
observed by at least one person who’s serving on the committee. have a processor that can operate 24 hours a day, every day. People
Give an efficient algorithm that takes the schedule of n shifts and submit requests to run daily jobs on the processor. Each such job comes
produces a complete supervising committee containing as few students with a start time and an end time; if the job is accepted to run on the
as possible. processor, it must run conl~nuously, every day, for the period between
Example. Suppose n = 3, and the shifts are its start and end times. (Note that certain jobs can begin before midnight
and end after midnight; this makes for a type of situation different from
Monday 4 p.M.-Monday 8 P.M., what we saw in the Interval Scheduling Problem.)
Monday 6 p.M.-Monday 10 P.M.,
Given a list of n such jobs, your goal is to accept as many jobs as
Monday 9 P.M.-Monday 1I P.M..
possible (regardless of their length), subject to the constraint that the
Then the smallest complete supervising committee would consist of just processor can run at most one job at any given point in time. Provide an
the second student, since the second shift overlaps both the first and the algorithm to do this with a running time that is polynomial in n. You may
third. assume for simplicity that no two jobs have the same start or end times.
Example. Consider the fol!owing four jobs, specified by (start-time, end-
16. Some security consultants wor~g in the financial domain are cur-
rently advising a client who is investigating a potential money-latmdering time) pairs.
scheme. The investigation thus far has indicated that n suspicious trans- (6 P.M., 6 A.M.), (9 P.M., 4 A.M.), (3 A.M., 2 P.M.), (1 P.M., 7 P.M.).
actions took place in recent days, each involving money transferred into a
single account. Unfortunately, the sketchy nature of the evidence to date The optimal solution would be to pick the two jobs (9 P.M., 4 A.M.) and (1
means that they don’t know the identiW of the account, the amounts of P.M., 7 P.~1.), which can be scheduled without overlapping.
the transactions, or the exact t~nes at which the transactions took place.
18. Your friends are planning an expedition to a small town deep in the Cana-
What they do have is an approximate time-stamp for each transaction; the
dian north next winter break. They’ve researched all the travel options
evidence indicates that transaction i took place at time ti ~: e~, for some
and have drawn up a directed graph whose nodes represent intermediate
"margin of error" ev (In other words, it took place sometime between t~ - ei
destinations and edges represent the roads between them.
and t~ + e~.) Note that different transactions may have different margins
In the course of this, they’ve also learned that extreme weather causes
of error.
roads in this part of the world to become quite slow in the winter and
In the last day or so, they’ve come across a bank account that (for
may cause large travel delays. They’ve found an excellent travel Web site
other reasons we don’t need to go into here) they suspect might be the
that can accurately predict how fast they’ll be able to trave_l along the
one involved in the crime. There are n recent events involving the account,
roads; however, the speed of travel depends on the time of year. More
which took place at times Xl, x2 ..... xn. To see whether it’s plausible
precisely, the Web site answers queries of the following form: given an
that this really is the account they’re looking for, they’re wondering
edge e = (u, w) connecting two sites v and w, and given a proposed starting Show that such a tree exists, and give an efficient algorithm to find
time t from location u, the site will return a value fe(t), the predicted one. That is, give an algorithm constructing a spanning tree T in which,
for each u, v v, the bottleneck rate of the u-v path in T is equal to the
arrival time at w. The Web site guarantees that re(t) >_ t for all edges e
and all times t (you can’t travel backward in time), and that fe(t) is a best achievable bottleneck rate for the pair u, v in G.
monotone increasing function of t (that is, you do not arrive earlier by 20. Every September, somewhere In a far-away mountainous part of the
starting later). Other than that, the functions fe(t) may be arbitrary. For world, the county highway crews get together and decide which roads to
example, in areas where the travel time does not vary with the season, keep dear through thecoming winter. There are n towns in this county,
we would have fe(t) = t + ee, where ee is the time needed to travel from the and the road system can be viewed as a (connected) graph G = (V, E) on
beginning to the end of edge e. this set of towns, each edge representing a road joining two of them.
Your friends want to use the Web site to determine the fastest way In the winter, people are high enough up in the mountains that they
to travel through the directed graph from their starting point to their stop worrying about the length of roads and start worrying about their
intended destination. (You should assume that they start at time 0, and altitude--this is really what determines how difficult the trip will be.
that all predictions made by the Web site are completely correct.) Give a So each road--each edge e in the graph--is annotated with a number
polynomial-time algorithm to do this, where we treat a single query to ue that gives the altitude of the highest point on the road. We’ll assume
the Web site (based on a specific edge e and a time t) as taking a single that no two edges have exactly the same altitude value ae. The height of
computational step. a path P in the graph is then the maximum of ae over all edges e on P.
Fina~y, a path between towns i andj is declared tO be winter-optimal flit
achieves the minimum possible height over a~ paths from i to j.
19. A group of network designers at the communications company CluNet The highway crews are goIng to select a set E’ ~ E of the roads to keep
find themselves facing the following problem. They have a connected dear through the winter; the rest will be left unmaintained and kept off
graph G = (V, E), in which the nodes represent sites that want to com- limits to travelers. They all agree that whichever subset of roads E’ they
municate. Each edge e is a communication link, with a given available decide to keep clear, it should have the properW that (v, E’) is a connected
bandwidth by subgraph; and more strongly, for every pair of towns i and j, the height
For each pair of nodes u, u ~ V, they want to select a single u-u path P of the winter-optimal path in (V, E’) should be no greater than it is In the
on which this pair will communicate. The bottleneck rate b(V) of this p athbV fi~ graph G = (V, E). We’ll say that (V, E’) is a minimum-altitude connected
is the minimumbandwidth of any edge it contains; that is, b(P) = mine~p e. subgraph if it has this property.
The best achievable bottleneck rate for the pair u, v in G is simply the Given that they’re goIng to maintain ~s key property, however, they
maximum, over all u-v paths P in G, of the value b(P). otherwise want to keep as few roads clear as possible. One year, they hit
It’s getting to be very complicated to keep track of a path for each pair upon the following conjecture:
of nodes, and so one of the network designers makes a bold suggestion: The minimum spanning tree of G, with respect to the edge weights ae, is a
Maybe one can find a spanning tree T of G so that for every pair of nodes minimum-altitude connected subgraph.
u, v, the unique u-v path in the tree actually attains the best achievable
bottleneck rate for u, v in G. (In other words, even if you could choose (In an earlier problem, we claimed that there is a unique minimum span-
any u-v path in the whole graph, you couldn’t do better than the u-u path ning tree when the edge weights are distinct. Thus, thanks to the assump-
tion that all ae are distinct, it is okay for us to speak of the minimum
In T.)
spanning tree.)
This idea is roundly heckled in the offices of CluNet for a few days,
and there’s a natural reason for the skepticism: each pair of nodes Initially, this conjecture is somewhat counterintuitive, sInce the min-
might want a very different-looking path to maximize its bottleneck rate; imum spanning tree is trying to minimize the sum of the values ae, while
why should there be a single tree that simultaneously makes everybody the goal of minimizing altitude seems to be asking for a fully different
happy? But after some failed attempts to rule out the idea, people begin thing. But lacking an argument to the contrary, they begin considering an
even bolder second conjecture:
to suspect it could be possible.

A subgraph (V, E’) is a minimum-altitude connected subgraph if and only if


it contains the edges of the minimum spanning tree.
Note that this second conjecture would immediately imply the first one,
since a minimum spanning tree contains its own edges.
So here’s the question.
(a) Is the first conjecture true, for all choices of G and distinct altitudes
at? Give a proof or a counterexample with e, xplanation.
(b) Is the second conjecture true, for all choices of G and distinct alti-
tudes ae? Give a proof or a countere~xample with explanation. Figure 4.20 An instance of the zero-skew problem, described in Exercise 23.

21. Let us say that a graph G = (V, E) is a near-tree if it is connected and has at The root generates a clock signal which is propagated along the edges
most n + 8 edges, where n = |V|. Give an algorithm with running time O(n)
to the leaves. We’]] assume that the time it takes for the signal to reach a
that takes a near-tree G with costs on its edges, and returns a minimum
given leaf is proportional to the distance from the root to the leaf.
spanning tree of G. You may assume that all the edge costs are distinct.
Now, if all leaves do not have the same distance from the root, then
the signal will not reach the leaves at the same time, and this is a big
22. Consider the Minimum Spanning Tree Problem on an undirected graph
G = (V, E), with a cost ce >_ 0 on each edge, where the costs may not all problem. We want the leaves to be completely synchronized, and all to
be different. If the costs are not a~ distinct, there can in general be receive the signal at the same time. To make this happen, we will have to
many distinct minimum-cost solutions. Suppose we are given a spanning increase the lengths of certain edges, so that all root-to-leaf paths have
tree T c E with the guarantee that for every e ~ T, e belongs to some the same length (we’re not able to shrink edge lengths). If we achieve this,
minimum-cost spanning tree in G. Can we conclude that T itself must then the tree (with its new edge lengths) will be said to have zero skew.
be a minimum-cost spanning tree in G? Give a proof or a counterexample Our goal is to achieve zero skew in a way that keeps the sum of all the
with explanation. edge lengths as small as possible.
Give an algorithm that increases the lengths of certain edges so that
23. Recall the problem of computing a minimum-cost arborescence in a the resulting tree has zero skew and the total edge length is as sma]] as
directed graph G = (V, E), with a cost ce >_ 0 on each edge. Here we will possible.
consider the case in which G is a directed acyclic graph--that is, it contains
Example. Consider the tree in Figure 4.20, in which letters name the nodes
no directed cycles. and numbers indicate the edge lengths.
As in general directed graphs, there can be many distinct minimum- The unique optimal solution for ~s instance would be to take the
cost solutions. Suppose we are given a directed acyclic graph G = (V, E), three length-1 edges and increase each of their lengths to 2. The resulting
and an arborescence A c E with the guarantee that for every e ~ A, e
tree has zero skew, and the total edge length is 12, the smallest possible.
belongs to some minimum-cost arborescence in G. Can we conclude that
A itself must be a minimum-cost arborescence in G? Give a proof or a 25. Suppose we are given a set of points P = [Pl,P2 ..... Pn}, together with a
counterexample with explanation. distance function d on the set P; d is simply a function bn paJ_rs of points in
P with the properties that d(p~,pi) = d(py, Pi) > 0 ff i #j, and that d(p~, pi) = 0
24. TimJ.ng circuits are a crucial component of VLSI chips. Here’s a simple for each i.
model of such a timing circuit. Consider a complete balanced binary tree
We define a hierarchical metric onP to be any distance function r that
with n leaves, where n is a power of two. Each edge e of the tree has an
can be constructed as fo]]ows. We build a rooted tree T with n leaves, and
associated length ~e, which is a positive number. The distance from the we associate with each node v of T (both leaves and internal nodes) a
root to a given leaf is the sum of the lengths of all the edges on the path height hr. These heights must satisfy the properties that h(v) = 0 for each
from the root to the leaf.
Here is one way to do this. Let G be a connected graph, and T and T’
leaf v, and ff u is the parent of v in T, then h(u) >_ h(v). We place each point two different spanning trees of G.. We say that T and T’ are neighbors if
in P at a distinct leaf in T. Now, for any pair of points p~ and Pi, their T contains exactly one edge that is not in T’, and T"contains exactly one
distance ~(p~, Pi) is defined as follows. We determine the least common edge that is not in T.
ancestor v in T of the leaves containing p~ and Pi, and define ~(p~,
Now, from any graph G, we can build a (large) graph 9~ as follows.
We say that a hierarchical metric r is consistent with our distance The nodes of 9~ are the spanning trees of G, and there is an edge between
function d if, for all pairs i,j, we have r(p~,pl) _< d(p~,Pi). two nodes of 9C if the corresponding spanning trees are neighbors.
Give a polynomial-time algorithm that takes the distance function d Is it true that, for any connected graph G, the resulting graph ~
and produces a hierarchical metric ~ with the following properties. is connected? Give a proof that ~K is always connected, or provide an
(i) ~ is consistent with d, and example (with explanation) of a connected graph G for which % is not
<- connected.
(ii) ff ~’ is any other hierarchical metric consistent with d, then ~’(P~,Pi)
r(p~,pi) for each pair of points Pi and
28. Suppose you’re a consultant for the networking company CluNet, and
26. One of the first things you learn in calculus is how to minimize a dif- they have the following problem. The network that they’re currently
ferentiable function such as y = ax2 + bx + c, where a > 0. The Minimum working on is modeled by a connected graph G = (V, E) with n nodes.
Spanning Tree Problem, on the other hand, is a minimization problem of Each edge e is a fiber-optic cable that is owned by one of two companies--
a very different flavor: there are now just a~ finite number of possibilities, creatively named X and Y--and leased to CluNet.
for how the minimum might be achieved--rather than a continuum of Their plan is to choose a spanning tree T of G and upgrade the links
possibilities--and we are interested in how to perform the computation corresponding to the edges of T. Their business relations people have
without having to exhaust this (huge) finite number of possibilities. already concluded an agreement with companies X and Y stipulating a
One Can ask what happens when these two minimization issues number k so that in the tree T that is chosen, k of the edges will be owned
are brought together, and the following question is an example of this. by X and n - k - 1 of the edges will be owned by Y.
Suppose we have a connected graph G = (V, E). Each edge e now has a time- CluNet management now faces the following problem. It is not at all
varying edge cost given by a function fe :R-+R. Thus, at time t, it has cost clear to them whether there even exists a spanning tree T meeting these
re(t). We’l! assume that all these functions are positive over their entire conditions, or how to find one if it exists. So this is the problem they put
range. Observe that the set of edges constituting the minimum spanning to you: Give a polynomial-time algorithm that takes G, with each edge
tree of G may change over time. Also, of course, the cost of the minimum labeled X or Y, and either (i) returns a spanning tree with e~xactly k edges
spanning tree of G becomes a function of the time t; we’ll denote this labeled X, or (ii) reports correctly that no such tree exists.
function ca(t). A natural problem then becomes: find a value of t at which
cG(t) is minimized. 29. Given a list of n natural numbers all, d2 ..... tin, show how to decide
Suppose each function f_e is a polynomial of degree 2: f_e(t) = a_e t^2 +
b_e t + c_e, where a_e > 0. Give an algorithm that takes the graph G and the
values {(ae, be, ce) : e ~ E} and returns a value of the time t at which the V = {Ul, v2 ..... vn}, then the degree of u~ should be exactly dv) G should not
minimum spanning tree has minimum cost. Your algorithm should run contain multiple edges between the same pair of nodes, or "!oop" edges
in time polynomial in the number of nodes and edges of the graph G. You with both endpoints equal to the same node.
may assume that arithmetic operations on the numbers {(ae, be, q)} can
be done in constant time per operation. 30. Let G = (V, E) be a graph with n nodes in which each pair of nodes is
joined by an edge. There is a positive weight w~i on each edge (i,]); and
27. In trying to understand the combinatorial StlXlcture of spanning trees, we will assume these weights satisfy the triangle inequality tv~k <_ ra~i + Wik.
we can consider the space of all possible spanning trees of a given graph For a subset V’ _ V, we will use G[V’] to denote the subgraph (with edge
and study the properties of this space. This is a strategy that has been weights) induced on the nodes in V’.
applied to many similar problems as well.
32. Consider a directed graph G = (V, E) with a root r ~ V and nonnegative
We are given a set X _ V of k terminals that must be connected by
costs on the edges. In this problem we consider variants of the ~um-
edges. We say that a Steiner tree onX is a set Z so that X ~_ Z _ V, together
cost arborescence algorithm.
with a spanning subtree T of G[Z]. The weight of the Steiner tree is the
weight of the tree T. (a) The algorithm discussed in Section 4.9 works as follows. We modify
the costs, consider the subgraph of zero-cost edges, look for a
Show that the problem of finding a minimum-weight Steiner tree on
directed cycle in this subgraph, and contract it (if one exists). Argue
X can be solved in time briefly that instead of looking for cycles, we can instead identify and
contract strong components of this subgraph.
31. Let’s go back to the original motivation for the Minimum Spanning Tree
Problem. We are given a connected, undirected graph G = (V, E) with (b) In the course of the algorithm, we defined Yv to be the minimum
positive edge lengths {~e}, and we want to find a spanning subgraph of cost of an edge entering ~, and we modified the costs of all edges e
it. Now suppose we are ~g to settle for a subgraph/4 = (V, F) that is entering node u to be c’e = ce - yr. Suppose we instead use the follow-
"denser" than a tree, and we are interested in guaranteeing that, for each ing modified cost: c~’ = max(0, ce - 2y~). This new change is_likely to
pair of vertices a, v ~ V, the length of the shortest u-v path in/4 is not turn more edges to 0 cost. Suppose now we find an arborescence T
much longer than the length of the shortest a-v path in G. By the length of 0 cost. Prove that this T has cost at most twice the cost of the
of a path P here, we mean the sum of ~e over all edges e in P. minimum-cost arborescence in the original graph.
Here’s a variant of Kruskal’s Algorithm designed to produce such a (c) Assume you do not find an arborescence of 0 cost. Contract al! 0-
subgraph. cost strong components and recursively apply the same procedure
on the resttlting graph unti! an arborescence is found. Prove that this
* First we sort all the edges in order of increasing length. (You may
T has cost at most twice the cost of the minimum-cost arborescence
assume all edge lengths are distinct.)
in the original graph.
o We then construct a subgraph H = (V, F) by considering each edge in
order. 33. Suppose you are given a directed graph G = (V, E) In which each edge has
¯ When we come to edge e = (u, v), we add e to the subgraph/4 if there a cost of either 0 or 1. Also suppose that G has a node r such that there is a
is currently no a-v path in/4. (This is what Kruskal’s Algorithm would path from r to every other node in G. You are also given an integer k. Give a
do as well.) On the other hand, if there is a u-v path in/4, we let duv polynomial-time algorithm that either constructs an arborescence rooted
denote the length of the shortest such path; again, length is with at r of cost exactly k, or reports (correctly) that no such arborescence
respect to the values {~e}. We add e to/4 ff 3~e < duv- exists.
In other words, we add an edge even when a and v are already In the same
connected component, provided that the addition of the edge reduces Notes and Further Reading
their shortest-path distance by a sufficient amount.
Let H = (V, F) be the, subgraph of G returned by the algorithm. Due to their conceptual cleanness and intuitive appeal, greedy algorithms have
(a) Prove that for evet3~ pair of nodes a, v ~ V, the length of the shortest a long history and many applications throughout computer science. In this
u-v path in H is at most three times the length of the shortest a-v chapter we focused on cases in which greedy algorithms find the optimal
solution. Greedy algorithms are also often used as simple heuristics even when
path in G.
they are not guaranteed to find the optimal solution. In Chapter 11 we will
(b) Despite its ability to approximately preserve sh°rtest-p ath distances’
discuss greedy algorithms that find near-optimal approximate solutions.
the subgraph/4 produced by the algorithm cannot be too dense.
Let f(n) denote the maximum number of edges that can possibly As discussed in Chapter 1, Interval Scheduling can be viewed as a special
be produced as the out-put of this algorithm, over all n-node input case of the Independent Set Problem on a graph that represents the overlaps
among a collection of intervals. Graphs arising this way are called interval
graphs with edge lengths. Prove that
graphs, and they have been extensively studied; see, for example, the book
by Golumbic (1980). Not just Independent Set but many hard computational

problems become much more tractable when restricted to the special case of
interval graphs.

Interval Scheduling and the problem of scheduling to minimize the maximum
lateness are two of a range of basic scheduling problems for which
a simple greedy algorithm can be shown to produce an optimal solution. A
wealth of related problems can be found in the survey by Lawler, Lenstra,
Rinnooy Kan, and Shmoys (1993).

The optimal algorithm for caching and its analysis are due to Belady
(1966). As we mentioned in the text, under real operating conditions caching
algorithms must make eviction decisions in real time without knowledge of
future requests. We will discuss such caching strategies in Chapter 13.
future requests. We will discuss such caching strategies in Chapter 13. Althofer, Gantam Das, David Dobkin, and Deborah Joseph.

The algorithm for shortest paths in a graph with nonnegative edge lengths
is due to Dijkstra (1959). Surveys of approaches to the Minimum Spanning Tree
Problem, together with historical background, can be found in the reviews by
Graham and Hell (1985) and Nesetril (1997).
The single-link algorithm is one of the most widely used approaches to
the general problem of clustering; the books by Anderberg (1973), Duda, Hart,
and Stork (2001), and Jain and Dubes (1981) survey a variety of clustering
techniques.
The algorithm for optimal prefix codes is due to Huffman (1952); the ear-
lier approaches mentioned in the text appear in the books by Fano (1949) and
Shannon and Weaver (1949). General overviews of the area of data compres-
sion can be found in the book by Bell, Cleary, and Witten (1990) and the
survey by Lelewer and Hirschberg (1987). More generally, this topic belongs
to the area of information theory, which is concerned with the representation
and encoding of digital information. One of the founding works in this field
is the book by Shannon and Weaver (1949), and the more recent textbook by
Cover and Thomas (1991) provides detailed coverage of the subject.
The algorithm for finding minimum-cost arborescences is generally cred-
ited to Chu and Liu (1965) and to Edmonds (1967) independently. As discussed
in the chapter, this multi-phase approach stretches our notion of what consti-
tutes a greedy algorithm. It is also important from the perspective of linear
programming, since in that context it can be viewed as a fundamental ap-
plication of the pricing method, or the primal-dual technique, for designing
algorithms. The book by Nemhauser and Wolsey (1988) develops these con-
nections to linear programming. We will discuss this method in Chapter 11 in
the context of approximation algorithms.
More generally, as we discussed at the outset of the chapter, it is hard to find a precise definition of what constitutes a greedy algorithm. In the search for such a definition, it is not even clear that one can apply the analogue of U.S. Supreme Court Justice Potter Stewart's famous test for obscenity ("I know it when I see it"), since one finds disagreements within the research community on what constitutes the boundary, even intuitively, between greedy and nongreedy algorithms. There has been research aimed at formalizing classes of greedy algorithms: the theory of matroids is one very influential example (Edmonds 1971; Lawler 2001); and the paper of Borodin, Nielsen, and Rackoff (2002) formalizes notions of greedy and "greedy-type" algorithms, as well as providing a comparison to other formal work on this question.

Notes on the Exercises Exercise 24 is based on results of M. Edahiro, T. Chao, Y. Hsu, J. Ho, K. Boese, and A. Kahng; Exercise 31 is based on a result of Ingo Althöfer, Gautam Das, David Dobkin, and Deborah Joseph.
Chapter 5
Divide and Conquer

Divide and conquer refers to a class of algorithmic techniques in which one
breaks the input into several parts, solves the problem in each part recursively,
and then combines the solutions to these subproblems into an overall solution.
In many cases, it can be a simple and powerful method.
Analyzing the running time of a divide and conquer algorithm generally
involves solving a recurrence relation that bounds the running time recursively
in terms of the running time on smaller instances. We begin the chapter with
a general discussion of recurrence relations, illustrating how they arise in the
analysis and describing methods for working out upper bounds from them.
We then illustrate the use of divide and conquer with applications to
a number of different domains: computing a distance function on different
rankings of a set of objects; finding the closest pair of points in the plane;
multiplying two integers; and smoothing a noisy signal. Divide and conquer
will also come up in subsequent chapters, since it is a method that often works
well when combined with other algorithm design techniques. For example, in
Chapter 6 we will see it combined with dynamic programming to produce a
space-efficient solution to a fundamental sequence comparison problem, and
in Chapter 13 we will see it combined with randomization to yield a simple
and efficient algorithm for computing the median of a set of numbers.
One thing to note about many settings in which divide and conquer
is applied, including these, is that the natural brute-force algorithm may
already be polynomial time, and the divide and conquer strategy is serving
to reduce the running time to a lower polynomial. This is in contrast to most
of the problems in the previous chapters, for example, where brute force was
exponential and the goal in designing a more sophisticated algorithm was to
achieve any kind of polynomial running time. For example, we discussed in
Chapter 2 that the natural brute-force algorithm for finding the closest pair among n points in the plane would simply measure all Θ(n^2) distances, for a (polynomial) running time of O(n^2). Using divide and conquer, we will improve the running time to O(n log n). At a high level, then, the overall theme of this chapter is the same as what we've been seeing earlier: that improving on brute-force search is a fundamental conceptual hurdle in solving a problem efficiently, and the design of sophisticated algorithms can achieve this. The difference is simply that the distinction between brute-force search and an improved solution here will not always be the distinction between exponential and polynomial.

5.1 A First Recurrence: The Mergesort Algorithm

To motivate the general approach to analyzing divide-and-conquer algorithms, we begin with the Mergesort Algorithm. We discussed the Mergesort Algorithm briefly in Chapter 2, when we surveyed common running times for algorithms. Mergesort sorts a given list of numbers by first dividing them into two equal halves, sorting each half separately by recursion, and then combining the results of these recursive calls--in the form of the two sorted halves--using the linear-time algorithm for merging sorted lists that we saw in Chapter 2.

To analyze the running time of Mergesort, we will abstract its behavior into the following template, which describes many common divide-and-conquer algorithms.

(†) Divide the input into two pieces of equal size; solve the two subproblems on these pieces separately by recursion; and then combine the two results into an overall solution, spending only linear time for the initial division and final recombining.

In Mergesort, as in any algorithm that fits this style, we also need a base case for the recursion, typically having it "bottom out" on inputs of some constant size. In the case of Mergesort, we will assume that once the input has been reduced to size 2, we stop the recursion and sort the two elements by simply comparing them to each other.
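To make the template (†) concrete before analyzing it, here is a minimal Python sketch of Mergesort written in exactly this style. The function names merge_sort and merge are our own illustration rather than part of the text, and the base case sorts lists of size at most 2 directly.

def merge(left, right):
    # Linear-time merge of two sorted lists, as in Chapter 2.
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    result.extend(left[i:])
    result.extend(right[j:])
    return result

def merge_sort(a):
    # Base case: once the input has size at most 2, sort it directly.
    if len(a) <= 2:
        return sorted(a)
    mid = len(a) // 2
    # Two recursive calls on halves of the input, then a linear-time combine step.
    return merge(merge_sort(a[:mid]), merge_sort(a[mid:]))

For example, merge_sort([5, 2, 4, 1, 3]) returns [1, 2, 3, 4, 5]; the division and the final merge are the linear-time work that appears as the cn term in the recurrence below.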
Consider any algorithm that fits the pattern in (†), and let T(n) denote its worst-case running time on input instances of size n. Supposing that n is even, the algorithm spends O(n) time to divide the input into two pieces of size n/2 each; it then spends time T(n/2) to solve each one (since T(n/2) is the worst-case running time for an input of size n/2); and finally it spends O(n) time to combine the solutions from the two recursive calls. Thus the running time T(n) satisfies the following recurrence relation.

(5.1) For some constant c,
    T(n) ≤ 2T(n/2) + cn
when n > 2, and
    T(2) ≤ c.

The structure of (5.1) is typical of what recurrences will look like: there's an inequality or equation that bounds T(n) in terms of an expression involving T(k) for smaller values k; and there is a base case that generally says that T(n) is equal to a constant when n is a constant. Note that one can also write (5.1) more informally as T(n) ≤ 2T(n/2) + O(n), suppressing the constant c. However, it is generally useful to make c explicit when analyzing the recurrence.

To keep the exposition simpler, we will generally assume that parameters like n are even when needed. This is somewhat imprecise usage; without this assumption, the two recursive calls would be on problems of size ⌈n/2⌉ and ⌊n/2⌋, and the recurrence relation would say that
    T(n) ≤ T(⌈n/2⌉) + T(⌊n/2⌋) + cn
for n > 2. Nevertheless, for all the recurrences we consider here (and for most that arise in practice), the asymptotic bounds are not affected by the decision to ignore all the floors and ceilings, and it makes the symbolic manipulation much cleaner.

Now (5.1) does not explicitly provide an asymptotic bound on the growth rate of the function T; rather, it specifies T(n) implicitly in terms of its values on smaller inputs. To obtain an explicit bound, we need to solve the recurrence relation so that T appears only on the left-hand side of the inequality, not the right-hand side as well.

Recurrence solving is a task that has been incorporated into a number of standard computer algebra systems, and the solution to many standard recurrences can now be found by automated means. It is still useful, however, to understand the process of solving recurrences and to recognize which recurrences lead to good running times, since the design of an efficient divide-and-conquer algorithm is heavily intertwined with an understanding of how a recurrence relation determines a running time.

Approaches to Solving Recurrences

There are two basic ways one can go about solving a recurrence, each of which we describe in more detail below.
o The most intuitively natural way to search for a solution to a recurrence is to "unroll" the recursion, accounting for the running time across the first few levels, and identify a pattern that can be continued as the recursion expands. One then sums the running times over all levels of the recursion (i.e., until it "bottoms out" on subproblems of constant size) and thereby arrives at a total running time.
o A second way is to start with a guess for the solution, substitute it into the recurrence relation, and check that it works. Formally, one justifies this plugging-in using an argument by induction on n. There is a useful variant of this method in which one has a general form for the solution, but does not have exact values for all the parameters. By leaving these parameters unspecified in the substitution, one can often work them out as needed.

We now discuss each of these approaches, using the recurrence in (5.1) as an example.

Unrolling the Mergesort Recurrence

Let's start with the first approach to solving the recurrence in (5.1). The basic argument is depicted in Figure 5.1.

o Analyzing the first few levels: At the first level of recursion, we have a single problem of size n, which takes time at most cn plus the time spent in all subsequent recursive calls. At the next level, we have two problems each of size n/2. Each of these takes time at most cn/2, for a total of at most cn, again plus the time in subsequent recursive calls. At the third level, we have four problems each of size n/4, each taking time at most cn/4, for a total of at most cn.

[Figure 5.1 Unrolling the recurrence T(n) ≤ 2T(n/2) + O(n): level 0 contributes cn; level 1 contributes cn/2 + cn/2 = cn; level 2 contributes 4(cn/4) = cn.]

o Identifying a pattern: What's going on in general? At level j of the recursion, the number of subproblems has doubled j times, so there are now a total of 2^j. Each has correspondingly shrunk in size by a factor of two j times, and so each has size n/2^j, and hence each takes time at most cn/2^j. Thus level j contributes a total of at most 2^j (cn/2^j) = cn to the total running time.
o Summing over all levels of recursion: We've found that the recurrence in (5.1) has the property that the same upper bound of cn applies to the total amount of work performed at each level. The number of times the input must be halved in order to reduce its size from n to 2 is log2 n. So summing the cn work over log n levels of recursion, we get a total running time of O(n log n).

We summarize this in the following claim.

(5.2) Any function T(·) satisfying (5.1) is bounded by O(n log n), when n > 1.

Substituting a Solution into the Mergesort Recurrence

The argument establishing (5.2) can be used to determine that the function T(n) is bounded by O(n log n). If, on the other hand, we have a guess for the running time that we want to verify, we can do so by plugging it into the recurrence as follows.

Suppose we believe that T(n) ≤ cn log2 n for all n ≥ 2, and we want to check whether this is indeed true. This clearly holds for n = 2, since in this case cn log2 n = 2c, and (5.1) explicitly tells us that T(2) ≤ c. Now suppose, by induction, that T(m) ≤ cm log2 m for all values of m less than n, and we want to establish this for T(n). We do this by writing the recurrence for T(n) and plugging in the inequality T(n/2) ≤ c(n/2) log2(n/2). We then simplify the resulting expression by noticing that log2(n/2) = (log2 n) - 1. Here is the full calculation.
    T(n) ≤ 2T(n/2) + cn
         ≤ 2c(n/2) log2(n/2) + cn
         = cn[(log2 n) - 1] + cn
         = (cn log2 n) - cn + cn
         = cn log2 n.
This establishes the bound we want for T(n), assuming it holds for smaller values m < n, and thus it completes the induction argument.
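As a quick sanity check on both arguments, one can tabulate the recurrence directly and compare it against cn log2 n. The following small Python sketch is our own illustration, not part of the text; it takes (5.1) with equality, c = 1, and n a power of 2.

import math

def T(n, c=1.0):
    # Recurrence (5.1) taken with equality: T(2) = c and T(n) = 2*T(n/2) + c*n.
    if n <= 2:
        return c
    return 2 * T(n // 2, c) + c * n

for k in range(2, 11):
    n = 2 ** k
    bound = n * math.log2(n)   # the claimed c*n*log2(n) bound, with c = 1
    print(n, T(n), bound, T(n) <= bound)

Every row prints True, matching the bound established by (5.2) and by the induction above.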
An Approach Using Partial Substitution

There is a somewhat weaker kind of substitution one can do, in which one guesses the overall form of the solution without pinning down the exact values of all the constants and other parameters at the outset.

Specifically, suppose we believe that T(n) = O(n log n), but we're not sure of the constant inside the O(·) notation. We can use the substitution method even without being sure of this constant, as follows. We first write T(n) ≤ kn logb n for some constant k and base b that we'll determine later. (Actually, the base and the constant we'll end up needing are related to each other, since we saw in Chapter 2 that one can change the base of the logarithm by simply changing the multiplicative constant in front.)

Now we'd like to know whether there is any choice of k and b that will work in an inductive argument. So we try out one level of the induction as follows.
    T(n) ≤ 2T(n/2) + cn ≤ 2k(n/2) logb(n/2) + cn.
It's now very tempting to choose the base b = 2 for the logarithm, since we see that this will let us apply the simplification log2(n/2) = (log2 n) - 1. Proceeding with this choice, we have
    T(n) ≤ 2k(n/2) log2(n/2) + cn
         = 2k(n/2)[(log2 n) - 1] + cn
         = kn[(log2 n) - 1] + cn
         = (kn log2 n) - kn + cn.
Finally, we ask: Is there a choice of k that will cause this last expression to be bounded by kn log2 n? The answer is clearly yes; we just need to choose any k that is at least as large as c, and we get
    T(n) ≤ (kn log2 n) - kn + cn ≤ kn log2 n,
which completes the induction.

Thus the substitution method can actually be useful in working out the exact constants when one has some guess of the general form of the solution.

5.2 Further Recurrence Relations

We've just worked out the solution to a recurrence relation, (5.1), that will come up in the design of several divide-and-conquer algorithms later in this chapter. As a way to explore this issue further, we now consider a class of recurrence relations that generalizes (5.1), and show how to solve the recurrences in this class. Other members of this class will arise in the design of algorithms both in this and in later chapters.

This more general class of algorithms is obtained by considering divide-and-conquer algorithms that create recursive calls on q subproblems of size n/2 each and then combine the results in O(n) time. This corresponds to the Mergesort recurrence (5.1) when q = 2 recursive calls are used, but other algorithms find it useful to spawn q > 2 recursive calls, or just a single (q = 1) recursive call. In fact, we will see the case q > 2 later in this chapter when we design algorithms for integer multiplication; and we will see a variant on the case q = 1 much later in the book, when we design a randomized algorithm for median finding in Chapter 13.

If T(n) denotes the running time of an algorithm designed in this style, then T(n) obeys the following recurrence relation, which directly generalizes (5.1) by replacing 2 with q:

(5.3) For some constant c,
    T(n) ≤ qT(n/2) + cn
when n > 2, and
    T(2) ≤ c.

We now describe how to solve (5.3) by the methods we've seen above: unrolling, substitution, and partial substitution. We treat the cases q > 2 and q = 1 separately, since they are qualitatively different from each other--and different from the case q = 2 as well.

The Case of q > 2 Subproblems

We begin by unrolling (5.3) in the case q > 2, following the style we used earlier for (5.1). We will see that the punch line ends up being quite different.

o Analyzing the first few levels: We show an example of this for the case q = 3 in Figure 5.2. At the first level of recursion, we have a single problem of size n, which takes time at most cn plus the time spent in all subsequent recursive calls. At the next level, we have q problems, each of size n/2. Each of these takes time at most cn/2, for a total of at most (q/2)cn, again plus the time in subsequent recursive calls. The next level yields q^2 problems of size n/4 each, for a total time of (q^2/4)cn. Since q > 2, we see that the total work per level is increasing as we proceed through the recursion.
o Identifying a pattern: At an arbitrary level j, we have q^j distinct instances, each of size n/2^j. Thus the total work performed at level j is q^j (cn/2^j) = (q/2)^j cn.

[Figure 5.2 Unrolling the recurrence T(n) ≤ 3T(n/2) + O(n): level 0 contributes cn; level 1 contributes cn/2 + cn/2 + cn/2 = (3/2)cn; level 2 contributes 9(cn/4) = (9/4)cn.]

o Summing over all levels of recursion: As before, there are log2 n levels of recursion, and the total amount of work performed is the sum over all these:
    T(n) ≤ sum_{j=0}^{(log2 n)-1} (q/2)^j cn = cn · sum_{j=0}^{(log2 n)-1} (q/2)^j.
This is a geometric sum, consisting of powers of r = q/2. We can use the formula for a geometric sum when r > 1, which gives us the formula
    T(n) ≤ cn (r^{log2 n} - 1)/(r - 1) ≤ cn r^{log2 n}/(r - 1).
Since we're aiming for an asymptotic upper bound, it is useful to figure out what's simply a constant; we can pull out the factor of r - 1 from the denominator, and write the last expression as
    T(n) ≤ (c/(r - 1)) n r^{log2 n}.
Finally, we need to figure out what r^{log2 n} is. Here we use a very handy identity, which says that, for any a > 1 and b > 1, we have a^{log b} = b^{log a}. Thus
    r^{log2 n} = n^{log2 r} = n^{log2(q/2)} = n^{(log2 q)-1}.
Thus we have
    T(n) ≤ (c/(r - 1)) n · n^{(log2 q)-1} ≤ (c/(r - 1)) n^{log2 q} = O(n^{log2 q}).

We sum this up as follows.

(5.4) Any function T(·) satisfying (5.3) with q > 2 is bounded by O(n^{log2 q}).

So we find that the running time is more than linear, since log2 q > 1, but still polynomial in n. Plugging in specific values of q, the running time is O(n^{log2 3}) = O(n^{1.59}) when q = 3; and the running time is O(n^{log2 4}) = O(n^2) when q = 4. This increase in running time as q increases makes sense, of course, since the recursive calls generate more work for larger values of q.
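One can get a feel for these exponents by tabulating the recurrence numerically. The sketch below is our own illustration, not part of the text; it takes (5.3) with equality, c = 1, and n a power of 2, and compares its value with n^{log2 q} for q = 3 and q = 4.

import math
from functools import lru_cache

@lru_cache(maxsize=None)
def T(n, q, c=1.0):
    # Recurrence (5.3) taken with equality: T(2) = c and T(n) = q*T(n/2) + c*n.
    if n <= 2:
        return c
    return q * T(n // 2, q, c) + c * n

for q in (3, 4):
    d = math.log2(q)
    for k in (4, 8, 12):
        n = 2 ** k
        print(q, n, round(T(n, q)), round(n ** d), round(T(n, q) / n ** d, 2))

The ratio in the last column stays bounded as n grows (it approaches roughly 5/3 for q = 3 and 3/4 for q = 4), in line with (5.4).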
Applying Partial Substitution The appearance of log2 q in the exponent followed naturally from our solution to (5.3), but it's not necessarily an expression one would have guessed at the outset. We now consider how an approach based on partial substitution into the recurrence yields a different way of discovering this exponent.

Suppose we guess that the solution to (5.3), when q > 2, has the form T(n) ≤ kn^d for some constants k > 0 and d > 1. This is quite a general guess, since we haven't even tried specifying the exponent d of the polynomial. Now let's try starting the inductive argument and seeing what constraints we need on k and d. We have
    T(n) ≤ qT(n/2) + cn,
and applying the inductive hypothesis to T(n/2), this expands to
    T(n) ≤ qk(n/2)^d + cn = (q/2^d) kn^d + cn.
This is remarkably close to something that works: if we choose d so that q/2^d = 1, then we have T(n) ≤ kn^d + cn, which is almost right except for the extra term cn. So let's deal with these two issues: first, how to choose d so we get q/2^d = 1; and second, how to get rid of the cn term.

Choosing d is easy: we want 2^d = q, and so d = log2 q. Thus we see that the exponent log2 q appears very naturally once we decide to discover which value of d works when substituted into the recurrence.

But we still have to get rid of the cn term. To do this, we change the form of our guess for T(n) so as to explicitly subtract it off. Suppose we try the form T(n) ≤ kn^d - ln, where we've now decided that d = log2 q but we haven't fixed the constants k or l. Applying the new formula to T(n/2), this expands to
    T(n) ≤ q[k(n/2)^d - l(n/2)] + cn
         = (q/2^d) kn^d - (ql/2) n + cn
         = kn^d - (ql/2) n + cn
         = kn^d - ((ql/2) - c) n.
This now works completely, if we simply choose l so that ((ql/2) - c) = l: in other words, l = 2c/(q - 2). This completes the inductive step for n. We also need to handle the base case n = 2, and this we do using the fact that the value of k has not yet been fixed: we choose k large enough so that the formula is a valid upper bound for the case n = 2.

The Case of One Subproblem

We now consider the case of q = 1 in (5.3), since this illustrates an outcome of yet another flavor. While we won't see a direct application of the recurrence for q = 1 in this chapter, a variation on it comes up in Chapter 13, as we mentioned earlier.

We begin by unrolling the recurrence to try constructing a solution.

o Analyzing the first few levels: We show the first few levels of the recursion in Figure 5.3. At the first level of recursion, we have a single problem of size n, which takes time at most cn plus the time spent in all subsequent recursive calls. The next level has one problem of size n/2, which contributes cn/2, and the level after that has one problem of size n/4, which contributes cn/4. So we see that, unlike the previous case, the total work per level when q = 1 is actually decreasing as we proceed through the recursion.

[Figure 5.3 Unrolling the recurrence T(n) ≤ T(n/2) + O(n): level 0 contributes cn; level 1 contributes cn/2; level 2 contributes cn/4.]

o Identifying a pattern: At an arbitrary level j, we still have just one instance; it has size n/2^j and contributes cn/2^j to the running time.
o Summing over all levels of recursion: There are log2 n levels of recursion, and the total amount of work performed is the sum over all these:
    T(n) ≤ sum_{j=0}^{(log2 n)-1} cn/2^j = cn · sum_{j=0}^{(log2 n)-1} (1/2)^j.
This geometric sum is very easy to work out; even if we continued it to infinity, it would converge to 2. Thus we have
    T(n) ≤ 2cn = O(n).

We sum this up as follows.

(5.5) Any function T(·) satisfying (5.3) with q = 1 is bounded by O(n).

This is counterintuitive when you first see it. The algorithm is performing log n levels of recursion, but the overall running time is still linear in n. The point is that a geometric series with a decaying exponent is a powerful thing: fully half the work performed by the algorithm is being done at the top level of the recursion.

It is also useful to see how partial substitution into the recurrence works very well in this case. Suppose we guess, as before, that the form of the solution is T(n) ≤ kn^d. We now try to establish this by induction using (5.3), assuming that the solution holds for the smaller value n/2:
    T(n) ≤ T(n/2) + cn
         ≤ k(n/2)^d + cn
         = (k/2^d) n^d + cn.
If we now simply choose d = 1 and k = 2c, we have
    T(n) ≤ (k/2) n + cn = (k/2 + c) n = kn,
which completes the induction.

The Effect of the Parameter q. It is worth reflecting briefly on the role of the parameter q in the class of recurrences T(n) ≤ qT(n/2) + O(n) defined by (5.3). When q = 1, the resulting running time is linear; when q = 2, it's O(n log n); and when q > 2, it's a polynomial bound with an exponent larger than 1 that grows with q. The reason for this range of different running times lies in where
most of the work is spent in the recursion: when q = 1, the total running time is dominated by the top level, whereas when q > 2 it's dominated by the work done on constant-size subproblems at the bottom of the recursion. Viewed this way, we can appreciate that the recurrence for q = 2 really represents a "knife-edge"--the amount of work done at each level is exactly the same, which is what yields the O(n log n) running time.

A Related Recurrence: T(n) ≤ 2T(n/2) + O(n^2)

We conclude our discussion with one final recurrence relation; it is illustrative both as another application of a decaying geometric sum and as an interesting contrast with the recurrence (5.1) that characterized Mergesort. Moreover, we will see a close variant of it in Chapter 6, when we analyze a divide-and-conquer algorithm for solving the Sequence Alignment Problem using a small amount of working memory.

The recurrence is based on the following divide-and-conquer structure.

Divide the input into two pieces of equal size; solve the two subproblems on these pieces separately by recursion; and then combine the two results into an overall solution, spending quadratic time for the initial division and final recombining.

For our purposes here, we note that this style of algorithm has a running time T(n) that satisfies the following recurrence.

(5.6) For some constant c,
    T(n) ≤ 2T(n/2) + cn^2
when n > 2, and
    T(2) ≤ c.

One's first reaction is to guess that the solution will be T(n) = O(n^2 log n), since it looks almost identical to (5.1) except that the amount of work per level is larger by a factor equal to the input size. In fact, this upper bound is correct (it would need a more careful argument than what's in the previous sentence), but it will turn out that we can also show a stronger upper bound.

We'll do this by unrolling the recurrence, following the standard template for doing this.

o Analyzing the first few levels: At the first level of recursion, we have a single problem of size n, which takes time at most cn^2 plus the time spent in all subsequent recursive calls. At the next level, we have two problems, each of size n/2. Each of these takes time at most c(n/2)^2 = cn^2/4, for a total of at most cn^2/2, again plus the time in subsequent recursive calls. At the third level, we have four problems each of size n/4, each taking time at most c(n/4)^2 = cn^2/16, for a total of at most cn^2/4. Already we see that something is different from our solution to the analogous recurrence (5.1); whereas the total amount of work per level remained the same in that case, here it's decreasing.
o Identifying a pattern: At an arbitrary level j of the recursion, there are 2^j subproblems, each of size n/2^j, and hence the total work at this level is bounded by 2^j c(n/2^j)^2 = cn^2/2^j.
o Summing over all levels of recursion: Having gotten this far in the calculation, we've arrived at almost exactly the same sum that we had for the case q = 1 in the previous recurrence. We have
    T(n) ≤ sum_{j=0}^{(log2 n)-1} cn^2/2^j = cn^2 · sum_{j=0}^{(log2 n)-1} (1/2)^j ≤ 2cn^2 = O(n^2),
where the second inequality follows from the fact that we have a convergent geometric sum.

In retrospect, our initial guess of T(n) = O(n^2 log n), based on the analogy to (5.1), was an overestimate because of how quickly n^2 decreases as we replace it with (n/2)^2, (n/4)^2, (n/8)^2, and so forth in the unrolling of the recurrence. This means that we get a geometric sum, rather than one that grows by a fixed amount over all n levels (as in the solution to (5.1)).
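Here too a quick tabulation makes the point vivid. The sketch below is our own illustration, not part of the text; it takes (5.6) with equality, c = 1, and n a power of 2, and shows that T(n) stays within a constant factor of n^2, well below the first guess of n^2 log n.

def T(n, c=1.0):
    # Recurrence (5.6) taken with equality: T(2) = c and T(n) = 2*T(n/2) + c*n**2.
    if n <= 2:
        return c
    return 2 * T(n // 2, c) + c * n ** 2

for k in (4, 8, 12):
    n = 2 ** k
    print(n, T(n), T(n) / n ** 2)

The ratio T(n)/n^2 approaches 2, exactly the constant produced by the convergent geometric sum above.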
5.3 Counting Inversions

We've spent some time discussing approaches to solving a number of common recurrences. The remainder of the chapter will illustrate the application of divide-and-conquer to problems from a number of different domains; we will use what we've seen in the previous sections to bound the running times of these algorithms. We begin by showing how a variant of the Mergesort technique can be used to solve a problem that is not directly related to sorting numbers.

The Problem

We will consider a problem that arises in the analysis of rankings, which are becoming important to a number of current applications. For example, a number of sites on the Web make use of a technique known as collaborative filtering, in which they try to match your preferences (for books, movies, restaurants) with those of other people out on the Internet. Once the Web site has identified people with "similar" tastes to yours--based on a comparison
of how you and they rate various things--it can recommend new things that these other people have liked. Another application arises in meta-search tools on the Web, which execute the same query on many different search engines and then try to synthesize the results by looking for similarities and differences among the various rankings that the search engines return.

A core issue in applications like this is the problem of comparing two rankings. You rank a set of n movies, and then a collaborative filtering system consults its database to look for other people who had "similar" rankings. But what's a good way to measure, numerically, how similar two people's rankings are? Clearly an identical ranking is very similar, and a completely reversed ranking is very different; we want something that interpolates through the middle region.

Let's consider comparing your ranking and a stranger's ranking of the same set of n movies. A natural method would be to label the movies from 1 to n according to your ranking, then order these labels according to the stranger's ranking, and see how many pairs are "out of order." More concretely, we will consider the following problem. We are given a sequence of n numbers a1, ..., an; we will assume that all the numbers are distinct. We want to define a measure that tells us how far this list is from being in ascending order; the value of the measure should be 0 if a1 < a2 < ... < an, and should increase as the numbers become more scrambled.

A natural way to quantify this notion is by counting the number of inversions. We say that two indices i < j form an inversion if ai > aj, that is, if the two elements ai and aj are "out of order." We will seek to determine the number of inversions in the sequence a1, ..., an.

Just to pin down this definition, consider an example in which the sequence is 2, 4, 1, 3, 5. There are three inversions in this sequence: (2, 1), (4, 1), and (4, 3). There is also an appealing geometric way to visualize the inversions, pictured in Figure 5.4: we draw the sequence of input numbers in the order they're provided, and below that in ascending order. We then draw a line segment between each number in the top list and its copy in the lower list. Each crossing pair of line segments corresponds to one pair that is in the opposite order in the two lists--in other words, an inversion.

[Figure 5.4 Counting the number of inversions in the sequence 2, 4, 1, 3, 5. Each crossing pair of line segments corresponds to one pair that is in the opposite order in the input list and the ascending list--in other words, an inversion.]

Note how the number of inversions is a measure that smoothly interpolates between complete agreement (when the sequence is in ascending order, then there are no inversions) and complete disagreement (if the sequence is in descending order, then every pair forms an inversion, and so there are (n choose 2) of them).

Designing and Analyzing the Algorithm

What is the simplest algorithm to count inversions? Clearly, we could look at every pair of numbers (ai, aj) and determine whether they constitute an inversion; this would take O(n^2) time.
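For instance, the following short Python check (our own illustration, not part of the text) counts inversions by examining every pair, and confirms that the sequence 2, 4, 1, 3, 5 used above has exactly three of them.

def count_inversions_brute(a):
    # Examine every pair (i, j) with i < j; quadratic time.
    return sum(1 for i in range(len(a))
                 for j in range(i + 1, len(a)) if a[i] > a[j])

print(count_inversions_brute([2, 4, 1, 3, 5]))   # prints 3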
We now show how to count the number of inversions much more quickly, in O(n log n) time. Note that since there can be a quadratic number of inversions, such an algorithm must be able to compute the total number without ever looking at each inversion individually. The basic idea is to follow the strategy (†) defined in Section 5.1. We set m = ⌈n/2⌉ and divide the list into the two pieces a1, ..., am and am+1, ..., an. We first count the number of inversions in each of these two halves separately. Then we count the number of inversions (ai, aj), where the two numbers belong to different halves; the trick is that we must do this part in O(n) time, if we want to apply (5.2). Note that these first-half/second-half inversions have a particularly nice form: they are precisely the pairs (ai, aj), where ai is in the first half, aj is in the second half, and ai > aj.

To help with counting the number of inversions between the two halves, we will make the algorithm recursively sort the numbers in the two halves as well. Having the recursive step do a bit more work (sorting as well as counting inversions) will make the "combining" portion of the algorithm easier.

So the crucial routine in this process is Merge-and-Count. Suppose we have recursively sorted the first and second halves of the list and counted the inversions in each. We now have two sorted lists A and B, containing the first and second halves, respectively. We want to produce a single sorted list C from their union, while also counting the number of pairs (a, b) with a in A, b in B, and a > b. By our previous discussion, this is precisely what we will need for the "combining" step that computes the number of first-half/second-half inversions.

This is closely related to the simpler problem we discussed in Chapter 2, which formed the corresponding "combining" step for Mergesort: there we had two sorted lists A and B, and we wanted to merge them into a single sorted list in O(n) time. The difference here is that we want to do something extra: not only should we produce a single sorted list from A and B, but we should also count the number of "inverted pairs" (a, b) where a is in A, b is in B, and a > b.

It turns out that we will be able to do this in very much the same style that we used for merging. Our Merge-and-Count routine will walk through the sorted lists A and B, removing elements from the front and appending them to the sorted list C. In a given step, we have a Current pointer into each list, showing our current position. Suppose that these pointers are currently
at elements ai and bj. In one step, we compare the elements ai and bj being pointed to in each list, remove the smaller one from its list, and append it to the end of list C.

[Figure 5.5 Merging two sorted lists while also counting the number of inversions between them.]

This takes care of merging. How do we also count the number of inversions? Because A and B are sorted, it is actually very easy to keep track of the number of inversions we encounter. Every time the element ai is appended to C, no new inversions are encountered, since ai is smaller than everything left in list B, and it comes before all of them. On the other hand, if bj is appended to list C, then it is smaller than all the remaining items in A, and it comes after all of them, so we increase our count of the number of inversions by the number of elements remaining in A. This is the crucial idea: in constant time, we have accounted for a potentially large number of inversions. See Figure 5.5 for an illustration of this process.

To summarize, we have the following algorithm.

Merge-and-Count(A, B)
  Maintain a Current pointer into each list, initialized to
    point to the front elements
  Maintain a variable Count for the number of inversions,
    initialized to 0
  While both lists are nonempty:
    Let ai and bj be the elements pointed to by the Current pointer
    Append the smaller of these two to the output list
    If bj is the smaller element then
      Increment Count by the number of elements remaining in A
    Endif
    Advance the Current pointer in the list from which the
      smaller element was selected
  EndWhile
  Once one list is empty, append the remainder of the other list
    to the output
  Return Count and the merged list

The running time of Merge-and-Count can be bounded by the analogue of the argument we used for the original merging algorithm at the heart of Mergesort: each iteration of the While loop takes constant time, and in each iteration we add some element to the output that will never be seen again. Thus the number of iterations can be at most the sum of the initial lengths of A and B, and so the total running time is O(n).

We use this Merge-and-Count routine in a recursive procedure that simultaneously sorts and counts the number of inversions in a list L.

Sort-and-Count(L)
  If the list has one element then
    there are no inversions
  Else
    Divide the list into two halves:
      A contains the first ⌈n/2⌉ elements
      B contains the remaining ⌊n/2⌋ elements
    (rA, A) = Sort-and-Count(A)
    (rB, B) = Sort-and-Count(B)
    (r, L) = Merge-and-Count(A, B)
  Endif
  Return r = rA + rB + r, and the sorted list L

Since our Merge-and-Count procedure takes O(n) time, the running time T(n) of the full Sort-and-Count procedure satisfies the recurrence (5.1). By (5.2), we have

(5.7) The Sort-and-Count algorithm correctly sorts the input list and counts the number of inversions; it runs in O(n log n) time for a list with n elements.
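A direct Python rendering of this pair of routines may be useful alongside the pseudocode; it is our own sketch of the procedure just described (returning the inversion count together with the sorted list), not part of the text.

def merge_and_count(a, b):
    # Merge sorted lists a and b; whenever an element of b is appended,
    # every element still remaining in a forms an inversion with it.
    merged, count = [], 0
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            merged.append(a[i])
            i += 1
        else:
            merged.append(b[j])
            j += 1
            count += len(a) - i      # elements remaining in a
    merged.extend(a[i:])
    merged.extend(b[j:])
    return count, merged

def sort_and_count(lst):
    # Recursively sort each half and count inversions, as in Sort-and-Count.
    if len(lst) <= 1:
        return 0, lst
    mid = (len(lst) + 1) // 2        # first half gets the extra element
    ra, a = sort_and_count(lst[:mid])
    rb, b = sort_and_count(lst[mid:])
    r, merged = merge_and_count(a, b)
    return ra + rb + r, merged

print(sort_and_count([2, 4, 1, 3, 5]))   # (3, [1, 2, 3, 4, 5])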
5.4 Finding the Closest Pair of Points

We now describe another problem that can be solved by an algorithm in the style we've been discussing; but finding the right way to "merge" the solutions to the two subproblems it generates requires quite a bit of ingenuity.
The Problem

The problem we consider is very simple to state: Given n points in the plane, find the pair that is closest together.

The problem was considered by M. I. Shamos and D. Hoey in the early 1970s, as part of their project to work out efficient algorithms for basic computational primitives in geometry. These algorithms formed the foundations of the then-fledgling field of computational geometry, and they have found their way into areas such as graphics, computer vision, geographic information systems, and molecular modeling. And although the closest-pair problem is one of the most natural algorithmic problems in geometry, it is surprisingly hard to find an efficient algorithm for it. It is immediately clear that there is an O(n^2) solution--compute the distance between each pair of points and take the minimum--and so Shamos and Hoey asked whether an algorithm asymptotically faster than quadratic could be found. It took quite a long time before they resolved this question, and the O(n log n) algorithm we give below is essentially the one they discovered. In fact, when we return to this problem in Chapter 13, we will see that it is possible to further improve the running time to O(n) using randomization.

Designing the Algorithm

We begin with a bit of notation. Let us denote the set of points by P = {p1, ..., pn}, where pi has coordinates (xi, yi); and for two points pi, pj in P, we use d(pi, pj) to denote the standard Euclidean distance between them. Our goal is to find a pair of points pi, pj that minimizes d(pi, pj).

We will assume that no two points in P have the same x-coordinate or the same y-coordinate. This makes the discussion cleaner; and it's easy to eliminate this assumption either by initially applying a rotation to the points that makes it true, or by slightly extending the algorithm we develop here.

It's instructive to consider the one-dimensional version of this problem for a minute, since it is much simpler and the contrasts are revealing. How would we find the closest pair of points on a line? We'd first sort them, in O(n log n) time, and then we'd walk through the sorted list, computing the distance from each point to the one that comes after it. It is easy to see that one of these distances must be the minimum one.

In two dimensions, we could try sorting the points by their y-coordinate (or x-coordinate) and hoping that the two closest points were near one another in the order of this sorted list. But it is easy to construct examples in which they are very far apart, preventing us from adapting our one-dimensional approach.

Instead, our plan will be to apply the style of divide and conquer used in Mergesort: we find the closest pair among the points in the "left half" of P and the closest pair among the points in the "right half" of P; and then we use this information to get the overall solution in linear time. If we develop an algorithm with this structure, then the solution of our basic recurrence from (5.1) will give us an O(n log n) running time.

It is the last, "combining" phase of the algorithm that's tricky: the distances that have not been considered by either of our recursive calls are precisely those that occur between a point in the left half and a point in the right half; there are Ω(n^2) such distances, yet we need to find the smallest one in O(n) time after the recursive calls return. If we can do this, our solution will be complete: it will be the smallest of the values computed in the recursive calls and this minimum "left-to-right" distance.

Setting Up the Recursion Let's get a few easy things out of the way first. It will be very useful if every recursive call, on a set P' ⊆ P, begins with two lists: a list P'_x in which all the points in P' have been sorted by increasing x-coordinate, and a list P'_y in which all the points in P' have been sorted by increasing y-coordinate. We can ensure that this remains true throughout the algorithm as follows.

First, before any of the recursion begins, we sort all the points in P by x-coordinate and again by y-coordinate, producing lists Px and Py. Attached to each entry in each list is a record of the position of that point in both lists.

The first level of recursion will work as follows, with all further levels working in a completely analogous way. We define Q to be the set of points in the first ⌈n/2⌉ positions of the list Px (the "left half") and R to be the set of points in the final ⌊n/2⌋ positions of the list Px (the "right half"). See Figure 5.6.

[Figure 5.6 The first level of recursion: The point set P is divided evenly into Q and R by the line L, and the closest pair is found on each side recursively.]
By a single pass through each of Px and Py, in O(n) time, we can create the following four lists: Qx, consisting of the points in Q sorted by increasing x-coordinate; Qy, consisting of the points in Q sorted by increasing y-coordinate; and analogous lists Rx and Ry. For each entry of each of these lists, as before, we record the position of the point in both lists it belongs to.

We now recursively determine a closest pair of points in Q (with access to the lists Qx and Qy). Suppose that q*0 and q*1 are (correctly) returned as a closest pair of points in Q. Similarly, we determine a closest pair of points in R, obtaining r*0 and r*1.

Combining the Solutions The general machinery of divide and conquer has gotten us this far, without our really having delved into the structure of the closest-pair problem. But it still leaves us with the problem that we saw looming originally: How do we use the solutions to the two subproblems as part of a linear-time "combining" operation?

Let δ be the minimum of d(q*0, q*1) and d(r*0, r*1). The real question is: Are there points q in Q and r in R for which d(q, r) < δ? If not, then we have already found the closest pair in one of our recursive calls. But if there are, then the closest such q and r form the closest pair in P.

Let x* denote the x-coordinate of the rightmost point in Q, and let L denote the vertical line described by the equation x = x*. This line L "separates" Q from R. Here is a simple fact.

(5.8) If there exists q in Q and r in R for which d(q, r) < δ, then each of q and r lies within a distance δ of L.

Proof. Suppose such q and r exist; we write q = (qx, qy) and r = (rx, ry). By the definition of x*, we know that qx ≤ x* ≤ rx. Then we have
    x* - qx ≤ rx - qx ≤ d(q, r) < δ
and
    rx - x* ≤ rx - qx ≤ d(q, r) < δ,
so each of q and r has an x-coordinate within δ of x* and hence lies within distance δ of the line L.

So if we want to find a close q and r, we can restrict our search to the narrow band consisting only of points in P within δ of L. Let S ⊆ P denote this set, and let Sy denote the list consisting of the points in S sorted by increasing y-coordinate. By a single pass through the list Py, we can construct Sy in O(n) time.

We can restate (5.8) as follows, in terms of the set S.

(5.9) There exist q in Q and r in R for which d(q, r) < δ if and only if there exist s, s' in S for which d(s, s') < δ.

It's worth noticing at this point that S might in fact be the whole set P, in which case (5.8) and (5.9) really seem to buy us nothing. But this is actually far from true, as the following amazing fact shows.

(5.10) If s, s' in S have the property that d(s, s') < δ, then s and s' are within 15 positions of each other in the sorted list Sy.

Proof. Consider the subset Z of the plane consisting of all points within distance δ of L. We partition Z into boxes: squares with horizontal and vertical sides of length δ/2. One row of Z will consist of four boxes whose horizontal sides have the same y-coordinates. This collection of boxes is depicted in Figure 5.7.

[Figure 5.7 The portion of the plane close to the dividing line L, as analyzed in the proof of (5.10). (Each box can contain at most one input point.)]

Suppose two points of S lie in the same box. Since all points in this box lie on the same side of L, these two points either both belong to Q or both belong to R. But any two points in the same box are within distance δ·(√2)/2 < δ, which contradicts our definition of δ as the minimum distance between any pair of points in Q or in R. Thus each box contains at most one point of S.

Now suppose that s, s' in S have the property that d(s, s') < δ, and that they are at least 16 positions apart in Sy. Assume without loss of generality that s has the smaller y-coordinate. Then, since there can be at most one point per box, there are at least three rows of Z lying between s and s'. But any two points in Z separated by at least three rows must be a distance of at least 3δ/2 apart--a contradiction.

We note that the value of 15 can be reduced; but for our purposes at the moment, the important thing is that it is an absolute constant.

In view of (5.10), we can conclude the algorithm as follows. We make one pass through Sy, and for each s in Sy, we compute its distance to each of the next 15 points in Sy. Statement (5.10) implies that in doing so, we will have computed the distance of each pair of points in S (if any) that are at distance less than δ from each other. So having done this, we can compare the smallest such distance to δ, and we can report one of two things: (i) the closest pair of points in S, if their distance is less than δ; or (ii) the (correct) conclusion that no pairs of points in S are within δ of each other. In case (i), this pair is the closest pair in P; in case (ii), the closest pair found by our recursive calls is the closest pair in P.

Note the resemblance between this procedure and the algorithm we rejected at the very beginning, which tried to make one pass through P in order
of y-coordinate. The reason such an approach works now is due to the ex- Else
tra knowledge (the value of 8) we’ve gained from the recursive calls, and the Return (r~, r~)
spec