Algorithms
in C
PART 5
GRAPH ALGORITHMS
Robert Sedgewick
Princeton University
Addison-Wesley
Boston • San Francisco • New York • Toronto • Montreal
London • Munich • Paris • Madrid
Capetown • Sydney • Tokyo • Singapore • Mexico City
Many of the designations used by manufacturers and sellers to distin-
guish their products are claimed as trademarks. Where those designa-
tions appear in this book and we were aware of a trademark claim, the
designations have been printed in initial capital letters or all capitals.
The author and publisher have taken care in the preparation of this
book, but make no expressed or implied warranty of any kind and as-
sume no responsibility for errors or omissions. No liability is assumed
for incidental or consequential damages in connection with or arising
out of the use of the information or programs contained herein.
Copyright © 2002 by Addison-Wesley
All rights reserved. No part of this publication may be reproduced,
stored in a retrieval system, or transmitted, in any form or by any
means, electronic, mechanical, photocopying, recording, or otherwise,
without the prior written permission of the publisher. Printed in the
United States of America. Published simultaneously in Canada.
The publisher offers discounts on this book when ordered in quantity
for special sales. For more information, please contact: Pearson Edu-
cation Corporate Sales Division, One Lake Street, Upper Saddle River,
NJ 07458, (800) 382-3419, [email protected].
Visit us on the Web at www.awl.com/cseng/ .
Library of Congress Cataloging-in-Publication Data
Sedgewick, Robert, 1946 –
Algorithms in C / Robert Sedgewick. — 3d ed.
500 p. 24 cm.
Includes bibliographical references and index.
Contents: v. 2, pt. 5. Graph algorithms
1. C (Computer program language) 2. Computer algorithms.
I. Title.
QA76.73.C15S43 2002
005.13’3—dc21 97-23418
CIP
ISBN 0201316633
Text printed on recycled and acid-free paper.
6 7 8 9 10 11 DOC 09 08 07
6th Printing July 2007
Preface
This book is the second of three volumes that are intended to survey
the most important computer algorithms in use today. The first volume
(Parts 1–4) covers fundamental concepts (Part 1), data structures (Part
2), sorting algorithms (Part 3), and searching algorithms (Part 4);
this volume (Part 5) covers graphs and graph algorithms; and the
(yet to be published) third volume (Parts 6–8) covers strings (Part
6), computational geometry (Part 7), and advanced algorithms and
applications (Part 8).
The books are useful as texts early in the computer science cur-
riculum, after students have acquired basic programming skills and
familiarity with computer systems, but before they have taken spe-
cialized courses in advanced areas of computer science or computer
applications. The books also are useful for self-study or as a refer-
ence for people engaged in the development of computer systems or
applications programs because they contain implementations of useful
algorithms and detailed information on these algorithms’ performance
characteristics. The broad perspective taken makes the series an ap-
propriate introduction to the field.
Scope
This book, Algorithms in C, Third Edition, Part 5: Graph Algorithms,
contains six chapters that cover graph properties and types, graph
search, directed graphs, minimal spanning trees, shortest paths, and
networks. The descriptions here are intended to give readers an un-
derstanding of the basic properties of as broad a range of fundamental
graph algorithms as possible.
You will most appreciate the material here if you have had a
course covering basic principles of algorithm design and analysis and
programming experience in a high-level language such as C, Java, or
C++. Algorithms in C, Third Edition, Parts 1–4 is certainly ade-
quate preparation. This volume assumes basic knowledge about ar-
rays, linked lists, and ADT design, and makes use of priority-queue,
symbol-table, and union-find ADTs—all of which are described in de-
tail in Parts 1–4 (and in many other introductory texts on algorithms
and data structures).
Basic properties of graphs and graph algorithms are developed
from first principles, but full understanding of the properties of the
algorithms can lead to deep and difficult mathematics. Although the
discussion of advanced mathematical concepts is brief, general, and
descriptive, you certainly need a higher level of mathematical maturity
to appreciate graph algorithms than you do for the topics in Parts 1–4.
Still, readers at various levels of mathematical maturity will be able to
profit from this book. The topic dictates this approach: some elemen-
tary graph algorithms that should be understood and used by everyone
differ only slightly from some advanced algorithms that are not un-
derstood by anyone. The primary intent here is to place important
algorithms in context with other methods throughout the book, not
to teach all of the mathematical material. But the rigorous treatment
demanded by good mathematics often leads us to good programs, so I
have tried to provide a balance between the formal treatment favored
by theoreticians and the coverage needed by practitioners, without
sacrificing rigor.
… algorithms are brought to light on an intuitive level through the visual
dimension provided by these figures.
Characteristics of the algorithms and of the situations in which
they might be useful are discussed in detail. Although not emphasized,
connections to the analysis of algorithms and theoretical computer
science are developed in context. When appropriate, empirical and
analytic results are presented to illustrate why certain algorithms are
preferred. When interesting, the relationship of the practical algo-
rithms being discussed to purely theoretical results is described. Spe-
cific information on performance characteristics of algorithms and im-
plementations is synthesized, encapsulated, and discussed throughout
the book.
Programming Language
The programming language used for all of the implementations is C
(versions of the book in C++ and Java are under development). Any
particular language has advantages and disadvantages; we use C in this
book because it is widely available and provides the features needed
for the implementations here. The programs can be translated easily
to other modern programming languages because relatively few con-
structs are unique to C. We use standard C idioms when appropriate,
but this book is not intended to be a reference work on C program-
ming.
We strive for elegant, compact, and portable implementations,
but we take the point of view that efficiency matters, so we try to
be aware of the code’s performance characteristics at all stages of
development. There are many new programs in this edition, and
many of the old ones have been reworked, primarily to make them
more readily useful as abstract-data-type implementations. Extensive
comparative empirical tests on the programs are discussed throughout
the book.
A goal of this book is to present the algorithms in as simple and
direct a form as possible. The style is consistent whenever possible
so that similar programs look similar. For many of the algorithms,
the similarities remain regardless of which language is used: Dijkstra’s
algorithm (to pick one prominent example) is Dijkstra’s algorithm,
whether expressed in Algol-60, Basic, Fortran, Smalltalk, Ada, Pascal, …
Acknowledgments
Many people gave me helpful feedback on earlier versions of this book.
In particular, hundreds of students at Princeton and Brown have suf-
fered through preliminary drafts over the years. Special thanks are due
to Trina Avery and Tom Freeman for their help in producing the first
edition; to Janet Incerpi for her creativity and ingenuity in persuading
our early and primitive digital computerized typesetting hardware and
software to produce the first edition; to Marc Brown for his part in the
algorithm visualization research that was the genesis of so many of the
figures in the book; to Dave Hanson for his willingness to answer all of
my questions about C; and to Kevin Wayne, for patiently answering my
basic questions about networks. I would also like to thank the many
readers who have provided me with detailed comments about various
editions, including Guy Almes, Jon Bentley, Marc Brown, Jay Gischer,
Allan Heydon, Kennedy Lemke, Udi Manber, Dana Richards, John
Reif, M. Rosenfeld, Stephen Seidman, Michael Quinn, and William
Ward.
To produce this new edition, I have had the pleasure of working
with Peter Gordon and Helen Goldstein at Addison-Wesley, who have
patiently shepherded this project as it has evolved from a standard
update to a massive rewrite. It has also been my pleasure to work with
several other members of the professional staff at Addison-Wesley. The
nature of this project made the book a somewhat unusual challenge
for many of them, and I much appreciate their forbearance.
I have gained two new mentors in writing this book, and partic-
ularly want to express my appreciation to them. First, Steve Summit
carefully checked early versions of the manuscript on a technical level,
and provided me with literally thousands of detailed comments, partic-
ularly on the programs. Steve clearly understood my goal of providing
elegant, efficient, and effective implementations, and his comments not
only helped me to provide a measure of consistency across the imple-
mentations, but also helped me to improve many of them substantially.
Second, Lyn Dupre also provided me with thousands of detailed comments
on the manuscript, which were invaluable in helping me not only
to correct and avoid grammatical errors, but also—more important—
to find a consistent and coherent writing style that helps bind together
the daunting mass of technical material here. I am extremely grateful
for the opportunity to learn from Steve and Lyn—their input was vital
in the development of this book.
Much of what I have written here I have learned from the teaching
and writings of Don Knuth, my advisor at Stanford. Although Don had
no direct influence on this work, his presence may be felt in the book,
for it was he who put the study of algorithms on the scientific footing
that makes a work such as this possible. My friend and colleague
Philippe Flajolet, who has been a major force in the development of
the analysis of algorithms as a mature research area, has had a similar
influence on this work.
I am deeply thankful for the support of Princeton University,
Brown University, and the Institut National de Recherche en Informa-
tique et Automatique (INRIA), where I did most of the work on the
books; and of the Institute for Defense Analyses and the Xerox Palo
Alto Research Center, where I did some work on the books while
visiting. Many parts of these books are dependent on research that
has been generously supported by the National Science Foundation
and the Office of Naval Research. Finally, I thank Bill Bowen, Aaron
Lemonick, and Neil Rudenstine for their support in building an aca-
demic environment at Princeton in which I was able to prepare this
book, despite my numerous other responsibilities.
Robert Sedgewick
Marly-le-Roi, France, February, 1983
Princeton, New Jersey, January, 1990
Jamestown, Rhode Island, May, 2001
To Adam, Andrew, Brett, Robbie,
and especially Linda
Notes on Exercises
Classifying exercises is an activity fraught with peril, because readers
of a book such as this come to the material with various levels of
knowledge and experience. Nonetheless, guidance is appropriate, so
many of the exercises carry one of four annotations, to help you decide
how to approach them.
Exercises that test your understanding of the material are marked
with an open triangle, as follows:
17.2 Consider the graph
3-7 1-4 7-8 0-5 5-2 3-8 2-9 0-6 4-9 2-6 6-4.
Draw its DFS tree and use the tree to find the graph’s bridges
and edge-connected components.
Most often, such exercises relate directly to examples in the text. They
should present no special difficulty, but working them might teach you
a fact or concept that may have eluded you when you read the text.
Exercises that add new and thought-provoking information to the
material are marked with an open circle, as follows:
Contents
Graph Algorithms
Chapter 21. Shortest Paths 265
21.1 Underlying Principles · 273
21.2 Dijkstra’s algorithm · 280
21.3 All-Pairs Shortest Paths · 290
21.4 Shortest Paths in Acyclic Networks · 300
21.5 Euclidean Networks · 308
21.6 Reduction · 314
21.7 Negative Weights · 331
21.8 Perspective · 350
Index 475
PART FIVE
Graph Algorithms
CHAPTER SEVENTEEN

Graph Properties and Types
17.1 Glossary
A substantial amount of nomenclature is associated with graphs. Most
of the terms have straightforward definitions, and, for reference, it is
convenient to consider them in one place: here. We have already used
some of these concepts when considering basic algorithms in Part 1;
others of them will not become relevant until we address associated
advanced algorithms in Chapters 18 through 22.
Definition 17.1 A graph is a set of vertices plus a set of edges that
connect pairs of distinct vertices (with at most one edge connecting
any pair of vertices).
We use the names 0 through V-1 for the vertices in a V-vertex graph.
The main reason that we choose this system is that we can access
quickly information corresponding to each vertex, using array index-
ing. In Section 17.6, we consider a program that uses a symbol table
to establish a 1–1 mapping to associate V arbitrary vertex names with
the V integers between 0 and V − 1. With that program in hand, we
can use indices as vertex names (for notational convenience) without
loss of generality. We sometimes assume that the set of vertices is
defined implicitly, by taking the set of edges to define the graph and
considering only those vertices that are included in at least one edge.
To avoid cumbersome usage such as “the ten-vertex graph with the
following set of edges,” we do not explicitly mention the number of
vertices when that number is clear from the context. By convention,
we always denote the number of vertices in a given graph by V , and
denote the number of edges by E.
We adopt as standard this definition of a graph (which we first
encountered in Chapter 5), but note that it embodies two technical
simplifications. First, it disallows duplicate edges (mathematicians
sometimes refer to such edges as parallel edges, and a graph that can
contain them as a multigraph). Second, it disallows edges that connect
vertices to themselves; such edges are called self-loops. Graphs that
have no parallel edges or self-loops are sometimes referred to as simple
graphs.
We use simple graphs in our formal definitions because it is easier
to express their basic properties and because parallel edges and self-
loops are not needed in many applications. For example, we can
bound the number of edges in a simple graph with a given number of
vertices.
Property 17.1 A graph with V vertices has at most V(V − 1)/2 edges.
Proof: The total of V^2 possible pairs of vertices includes V self-loops
and accounts twice for each edge between distinct vertices, so the
number of edges is at most (V^2 − V)/2 = V(V − 1)/2.
No such bound holds if we allow parallel edges: a graph that is not
simple might consist of two vertices and billions of edges connecting
them (or even a single vertex and billions of self-loops).
For some applications, we might consider the elimination of par-
allel edges and self-loops to be a data-processing problem that our
implementations must address. For other applications, ensuring that
a given set of edges represents a simple graph may not be worth the
trouble. Throughout the book, whenever it is more convenient to ad-
dress an application or to develop an algorithm by using an extended
definition that includes parallel edges or self-loops, we shall do so.
For example, self-loops play a critical role in a classical algorithm that
we will examine in Section 17.4; and parallel edges are common in
the applications that we address in Chapter 22. Generally, it is clear
from the context whether we intend the term “graph” to mean “simple
graph” or “multigraph” or “multigraph with self-loops.”
Mathematicians use the words vertex and node interchangeably,
but we generally use vertex when discussing graphs and node when
discussing representations—for example, in C data structures. We
normally assume that a vertex can have a name and can carry other
associated information. Similarly, the words arc, edge, and link are all
widely used by mathematicians to describe the abstraction embodying
a connection between two vertices, but we consistently use edge when
discussing graphs and link when discussing C data structures.
… the vertices correspond to points in the plane and the distances between
them are relevant. We refer to such graphs as Euclidean graphs. For
many other applications, such as graphs that represent relationships
or schedules, the graphs simply embody connectivity information, and
no particular geometric placement of vertices is ever implied. We
consider examples of algorithms that exploit the geometric information
in Euclidean graphs in Chapters 20 and 21, but we primarily work with
algorithms that make no use of any geometric information, and stress
that graphs are generally independent of any particular representation
in a drawing or in a computer.
Focusing solely on the connections themselves, we might wish to
view the vertex labels as merely a notational convenience, and to regard
two graphs as being the same if they differ in only the vertex labels.
Two graphs are isomorphic if we can change the vertex labels on one
to make its set of edges identical to the other. Determining whether
or not two graphs are isomorphic is a difficult computational problem
(see Figure 17.2 and Exercise 17.5). It is challenging because there are
V! possible ways to label the vertices—far too many for us to try all
the possibilities. Therefore, despite the potential appeal of reducing
the number of different graph structures that we have to consider by
treating isomorphic graphs as identical structures, we rarely do so.

Figure 17.2 Graph isomorphism examples: The top two graphs are
isomorphic because we can relabel the vertices to make the two sets of
edges identical (to make the middle graph the same as the top graph,
change 10 to 4, 7 to 3, 2 to 5, 3 to 1, 12 to 0, 5 to 2, 9 to 11, 0 to 12,
11 to 9, 1 to 7, and 4 to 10). The bottom graph is not isomorphic to
the others because there is no way to relabel the vertices to make its
set of edges identical to either.

As we saw with trees in Chapter 5, we are often interested in
basic structural properties that we can deduce by considering specific
sequences of edges in a graph.

Definition 17.2 A path in a graph is a sequence of vertices in which
each successive vertex (after the first) is adjacent to its predecessor in
the path. In a simple path, the vertices and edges are distinct. A cycle
is a path that is simple except that the first and final vertices are the
same.

We sometimes use the term cyclic path to refer to a path whose first
and final vertices are the same (and is otherwise not necessarily simple);
and we use the term tour to refer to a cyclic path that includes every
vertex. An equivalent way to define a path is as the sequence of
edges that connect the successive vertices. We emphasize this in our
notation by connecting vertex names in a path in the same way as we
connect them in an edge. For example, the simple paths in Figure 17.1
include 3-4-6-0-2, and 9-12-11, and the cycles in the graph include …
Figure 17.3 Graph terminology: This graph has 55 vertices, 70 edges,
and 3 connected components. One of the connected components is a
tree (right). The graph has many cycles, one of which is highlighted in
the large connected component (left). The diagram also depicts a
spanning tree in the small connected component (center). The graph
as a whole does not have a spanning tree, because it is not connected.
(Features labeled in the diagram: vertex, edge, path, cycle, tree,
spanning tree, clique.)
… knots or beads, and the edges were physical connections, such as strings
or wires, a connected graph would stay in one piece if picked up by
any vertex, and a graph that is not connected comprises two or more
such pieces.

Definition 17.4 An acyclic connected graph is called a tree (see Chapter 4).
A set of trees is called a forest. A spanning tree of a connected
graph is a subgraph that contains all of that graph’s vertices and is a
single tree. A spanning forest of a graph is a subgraph that contains
all of that graph’s vertices and is a forest.

For example, the graph illustrated in Figure 17.1 has three connected
components, and is spanned by the forest 7-8 9-10 9-11 9-12
0-1 0-2 0-5 5-3 5-4 4-6 (there are many other spanning forests).
Figure 17.3 highlights these and other features in a larger graph.

We explore further details about trees in Chapter 4, and look at
various equivalent definitions. For example, a graph G with V vertices
is a tree if and only if it satisfies any of the following four conditions:
• G has V − 1 edges and no cycles.
• G has V − 1 edges and is connected.
• Exactly one simple path connects each pair of vertices in G.
• G is connected, but removing any edge disconnects it.
Any one of these conditions is necessary and sufficient to prove the
other three, and we can develop other combinations of facts about
trees from them (see Exercise 17.1). Formally, we should choose one
condition to serve as a definition; informally, we let them collectively
serve as the definition, and freely engage in usage such as the “acyclic
connected graph” choice in Definition 17.4.

Graphs with all edges present are called complete graphs (see
Figure 17.4). We define the complement of a graph G by starting with
a complete graph that has the same set of vertices as the original graph,
and removing the edges of G. The union of two graphs is the graph
induced by the union of their sets of edges. The union of a graph and
its complement is a complete graph. All graphs that have V vertices are
subgraphs of the complete graph that has V vertices. The total number
of different graphs that have V vertices is 2^(V(V−1)/2) (the number of
different ways to choose a subset from the V(V − 1)/2 possible edges).
A complete subgraph is called a clique.

Figure 17.4 Complete graphs: These complete graphs, with every
vertex connected to every other vertex, have 10, 15, 21, 28, and 36
edges (bottom to top). Every graph with between 5 and 9 vertices
(there are more than 68 billion such graphs) is a subgraph of one of
these graphs.

Most graphs that we encounter in practice have relatively few
of the possible edges present. To quantify this concept, we define the …
17.3 Write down a list of the nonisomorphic cycles of the graph in Fig-
ure 17.1. For example, if your list contains 3-4-5-3, it should not contain
3-5-4-3, 4-5-3-4, 4-3-5-4, 5-3-4-5, or 5-4-3-5.
17.4 Consider the graph
3-7 1-4 7-8 0-5 5-2 3-8 2-9 0-6 4-9 2-6 6-4.
Determine the number of connected components, give a spanning forest, list
all the simple paths with at least three vertices, and list all the nonisomorphic
cycles (see Exercise 17.3).
◦ 17.5 Consider the graphs defined by the following four sets of edges:
0-1 0-2 0-3 1-3 1-4 2-5 2-9 3-6 4-7 4-8 5-8 5-9 6-7 6-9 7-8
0-1 0-2 0-3 0-3 1-4 2-5 2-9 3-6 4-7 4-8 5-8 5-9 6-7 6-9 7-8
0-1 1-2 1-3 0-3 0-4 2-5 2-9 3-6 4-7 4-8 5-8 5-9 6-7 6-9 7-8
4-1 7-9 6-2 7-3 5-0 0-2 0-8 1-6 3-9 6-3 2-8 1-5 9-8 4-5 4-7
Which of these graphs are isomorphic to one another? Which of them are
planar?
17.6 Consider the more than 68 billion graphs referred to in the caption to
Figure 17.4. What percentage of them has fewer than nine vertices?
17.7 How many different subgraphs are there in a given graph with V ver-
tices and E edges?
• 17.8 Give tight upper and lower bounds on the number of connected com-
ponents in graphs that have V vertices and E edges.
◦ 17.9 How many different undirected graphs are there that have V vertices
and E edges?
••• 17.10 If we consider two graphs to be different only if they are not isomorphic,
how many different graphs are there that have V vertices and E edges?
17.11 How many V-vertex graphs are bipartite?
… (by removing some edges and adding others), and to retrieve the graphs
(in the form of an array of edges).
The ADT in Program 17.1 is primarily a vehicle to allow us to
develop and test algorithms; it is not a general-purpose interface. As
usual, we work with the simplest interface that supports the basic
graph-processing operations that we wish to consider. Defining such
an interface for use in practical applications involves making numerous
tradeoffs among simplicity, efficiency, and generality. We consider a
few of these tradeoffs next; we address many others in the context of
implementations and applications throughout this book.
We assume for simplicity that graph representations include inte-
gers V and E that contain the number of vertices and edges, respectively,
so that we can refer directly to those values by name in ADT implemen-
tations. When convenient, we make other, similar, assumptions about
#include <stdio.h>
#include <stdlib.h>
#include "GRAPH.h"
main(int argc, char *argv[])
{ int V = atoi(argv[1]), E = atoi(argv[2]);
Graph G = GRAPHrand(V, E);
if (V < 20)
GRAPHshow(G);
else printf("%d vertices, %d edges, ", V, E);
printf("%d component(s)\n", GRAPHcc(G));
}
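The interface that this client assumes is defined in Program 17.1, which falls outside this excerpt. For reference, here is a minimal sketch of a GRAPH.h consistent with the client above and with the implementations that follow; the book’s actual interface may declare these operations differently or include others.

/* GRAPH.h (sketch): graph ADT operations used in this section.
   Every name below appears in the surrounding programs or text. */
typedef struct { int v; int w; } Edge;
Edge EDGE(int, int);
typedef struct graph *Graph;
Graph GRAPHinit(int);
void GRAPHinsertE(Graph, Edge);
void GRAPHremoveE(Graph, Edge);
int GRAPHedges(Edge [], Graph);
Graph GRAPHcopy(Graph);
void GRAPHdestroy(Graph);
void GRAPHshow(Graph);
Graph GRAPHrand(int, int);
int GRAPHcc(Graph);   /* number of connected components */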
#include <stdlib.h>
#include "GRAPH.h"
struct graph { int V; int E; int **adj; };
Graph GRAPHinit(int V)
{ Graph G = malloc(sizeof *G);
G->V = V; G->E = 0;
G->adj = MATRIXint(V, V, 0);
return G;
}
void GRAPHinsertE(Graph G, Edge e)
{ int v = e.v, w = e.w;
if (G->adj[v][w] == 0) G->E++;
G->adj[v][w] = 1;
G->adj[w][v] = 1;
}
void GRAPHremoveE(Graph G, Edge e)
{ int v = e.v, w = e.w;
if (G->adj[v][w] == 1) G->E--;
G->adj[v][w] = 0;
G->adj[w][v] = 0;
}
int GRAPHedges(Edge a[], Graph G)
{ int v, w, E = 0;
for (v = 0; v < G->V; v++)
for (w = v+1; w < G->V; w++)
if (G->adj[v][w] == 1)
a[E++] = EDGE(v, w);
return E;
}
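Program 17.3 calls MATRIXint to allocate and initialize its two-dimensional array; that utility comes from Parts 1–4 and is not reproduced in this excerpt. The following is a minimal sketch consistent with the call MATRIXint(V, V, 0) above, not necessarily the book’s version.

#include <stdlib.h>
/* Allocate an r-by-c matrix of ints with every entry initialized
   to val (sketch of the MATRIXint utility used by Program 17.3). */
int **MATRIXint(int r, int c, int val)
{ int i, j;
  int **t = malloc(r*sizeof(int *));
  for (i = 0; i < r; i++)
    t[i] = malloc(c*sizeof(int));
  for (i = 0; i < r; i++)
    for (j = 0; j < c; j++)
      t[i][j] = val;
  return t;
}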
(Marginal figure: the 0-1 adjacency matrix for the sample 13-vertex
graph, one row per vertex.)

… be 0 otherwise. Program 17.3 is an implementation of the graph ADT
that uses a direct representation of this matrix. The implementation
maintains a two-dimensional array of integers with the entry a[v][w]
set to 1 if there is an edge connecting v and w in the graph, and set to 0
otherwise. In an undirected graph, each edge is actually represented by
two entries: the edge v-w is represented by 1 values in both a[v][w]
and a[w][v], as is the edge w-v.

As mentioned in Section 17.2, we generally assume that the number
of vertices is known to the client when the graph is initialized. For
many applications, we might set the number of vertices as a compile-time
constant and use statically allocated arrays, but Program 17.3
takes the slightly more general approach of allocating dynamically the …
void GRAPHshow(Graph G)
{ int i, j;
printf("%d vertices, %d edges\n", G->V, G->E);
for (i = 0; i < G->V; i++)
{
printf("%2d:", i);
for (j = 0; j < G->V; j++)
if (G->adj[i][j] == 1) printf(" %2d", j);
printf("\n");
}
}
… given a vertex, we can immediately access its list; we use linked lists so
that we can add new edges in constant time.

Program 17.6 is an implementation of the ADT interface in Program 17.1
that is based on this approach, and Figure 17.10 depicts an
example. To add an edge connecting v and w to this representation
of the graph, we add w to v’s adjacency list and v to w’s adjacency
list. In this way, we still can add new edges in constant time, but the
total amount of space that we use is proportional to the number of
vertices plus the number of edges (as opposed to the number of vertices
squared, for the adjacency-matrix representation). We again represent
each edge in two different places: an edge connecting v and w is represented
as nodes on both adjacency lists. It is important to include
both; otherwise, we could not answer efficiently simple questions such
as, “Which vertices are connected directly to vertex v?”

Figure 17.10 Adjacency-lists data structure: This figure depicts a
representation of the graph in Figure 17.1 as an array of linked lists.
The space used is proportional to the number of nodes plus the number
of edges. To find the indices of the vertices connected to a given vertex
v, we look at the vth position in an array, which contains a pointer to a
linked list containing one node for each vertex connected to v. The
order in which the nodes appear on the lists depends on the method
that we use to construct the lists.
#include <stdlib.h>
#include "GRAPH.h"
typedef struct node *link;
struct node { int v; link next; };
struct graph { int V; int E; link *adj; };
link NEW(int v, link next)
{ link x = malloc(sizeof *x);
x->v = v; x->next = next;
return x;
}
Graph GRAPHinit(int V)
{ int v;
Graph G = malloc(sizeof *G);
G->V = V; G->E = 0;
G->adj = malloc(V*sizeof(link));
for (v = 0; v < V; v++) G->adj[v] = NULL;
return G;
}
void GRAPHinsertE(Graph G, Edge e)
{ int v = e.v, w = e.w;
G->adj[v] = NEW(w, G->adj[v]);
G->adj[w] = NEW(v, G->adj[w]);
G->E++;
}
int GRAPHedges(Edge a[], Graph G)
{ int v, E = 0; link t;
for (v = 0; v < G->V; v++)
for (t = G->adj[v]; t != NULL; t = t->next)
if (v < t->v) a[E++] = EDGE(v, t->v);
return E;
}
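The GRAPHshow implementation given earlier works from the adjacency matrix; for this representation, the analogous function can traverse each list directly. This is a sketch (assuming <stdio.h>), not part of the book’s Program 17.6.

#include <stdio.h>
/* Print V, E, and the contents of each adjacency list
   (adjacency-lists analog of the matrix-based GRAPHshow). */
void GRAPHshow(Graph G)
{ int v; link t;
  printf("%d vertices, %d edges\n", G->V, G->E);
  for (v = 0; v < G->V; v++)
    {
      printf("%2d:", v);
      for (t = G->adj[v]; t != NULL; t = t->next)
        printf(" %2d", t->v);
      printf("\n");
    }
}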
◦ 17.33 Add a function declaration to the graph ADT (Program 17.1) that
removes self-loops and parallel edges. Provide the trivial implementation
of this function for the adjacency-matrix–based ADT implementation (Program
17.3), and provide an implementation of the function for the adjacency-list–based
ADT implementation (Program 17.6) that uses time proportional
to E and extra space proportional to V.

17.34 Extend your solution to Exercise 17.33 to also remove degree-0 (isolated)
vertices. Note: To remove vertices, you need to rename the other vertices
and rebuild the data structures—you should do so just once.

• 17.35 Write an ADT function for the adjacency-lists representation (Program
17.6) that collapses paths that consist solely of degree-2 vertices. Specifically,
every degree-2 vertex in a graph with no parallel edges appears on some
path u-...-w where u and w are either equal or not of degree 2. Replace
any such path with u-w and then remove all unused degree-2 vertices as in
Exercise 17.34. Note: This operation may introduce self-loops and parallel
edges, but it preserves the degrees of vertices that are not removed.

17.36 Give a (multi)graph that could result from applying the transformation
described in Exercise 17.35 on the sample graph in Figure 17.1.
Table 17.1 Worst-case cost of graph-processing operations

                     array of edges  adjacency matrix  adjacency lists
space                E               V^2               V + E
initialize empty     1               V^2               V
copy                 E               V^2               E
destroy              1               V                 E
insert edge          1               1                 1
find/remove edge     E               1                 V
is v isolated?       E               V                 1
path from u to v?    E lg* V         V^2               V + E
… programs that we consider differ only in the way that such abstract
operations are implemented.
Why not develop these algorithms at a higher level of abstraction,
then discuss different options for representing the data and implement-
ing the associated operations, as we have done in so many instances
throughout this book? The answer to this question is not clearcut. Of-
ten, we are presented with a clear choice between adjacency lists, for
sparse graphs, and adjacency matrices, for dense graphs. We choose
to work directly with one of these two particular representations in
our primary implementations because the performance characteristics
of the algorithms are clearly exposed in code that works with the low-
level representation, and that code is generally no more difficult to read
and understand than code written in terms of higher-level abstractions.
In some instances, our algorithm-design decisions depend on cer-
tain properties of the representation. Working at a higher level of
abstraction might obscure our knowledge of that dependence. If we
know that one representation would lead to poor performance but
another would not, we would be taking an unnecessary risk were we
to consider the algorithm at the wrong level of abstraction. As usual,
our goal is to craft implementations such that we can make precise
statements about performance.
It is possible to address these issues with a rigorous abstract
approach, where we build layers of abstraction that culminate in the
abstract operations that we need for our algorithms. Adding an ADT
operation to test for the existence of an edge is one example (see
Exercise 17.19), and we could also arrange to have representation-
independent code for processing each of the vertices adjacent to a given
vertex (see Exercise 17.60). Such an approach is worthwhile in many
situations. In this book, however, we keep one eye on performance by
using code that directly accesses adjacency matrices when working with
dense graphs and adjacency lists when working with sparse graphs,
augmenting the basic structures as appropriate to suit the task at hand.
All of the operations that we have considered so far are simple,
albeit necessary, data-processing functions; and the essential net re-
sult of the discussion in this section is that basic algorithms and data
structures from Parts 1 through 3 are effective for handling them. As
we develop more sophisticated graph-processing algorithms, we face
more difficult challenges in finding the best implementations for spe-
◦ 17.38 Why not use a direct representation for graphs (a data structure that
models the graph exactly, with vertices represented as allocated records and
edge lists containing links to vertices instead of vertex names)?
17.39 Write a representation-independent ADT function that returns a pointer
to a vertex-indexed array giving the degree of each vertex in the graph. Hint:
Use GRAPHedges.
17.40 Modify the adjacency-matrix ADT implementation (Program 17.3) to
include in the graph representation a vertex-indexed array that holds the degree
of each vertex. Add an ADT function GRAPHdeg that returns the degree of a
given vertex.
17.41 Do Exercise 17.40 for the adjacency-lists representation.
17.42 Give a row to add to Table 17.1 for the problem of determining the
number of isolated vertices in a graph. Support your answer with function
implementations for each of the three representations.
◦ 17.43 Give a row to add to Table 17.1 for the problem of determining whether
a given digraph has a vertex with indegree V and outdegree 0. Support your
answer with function implementations for each of the three representations.
Note: Your entry for the adjacency-matrix representation should be V .
17.44 Use doubly-linked adjacency lists with cross links as described in the
text to implement a constant-time remove edge function GRAPHremoveE for the
adjacency-lists graph ADT implementation (Program 17.6).
17.45 Add a remove vertex function GRAPHremoveV to the doubly-linked
adjacency-lists graph ADT implementation described in the previous exercise.
◦ 17.46 Modify your solution to Exercise 17.15 to use a dynamic hash table,
as described in the text, such that insert edge and remove edge take constant
amortized time.
int randV(Graph G)
{ return G->V * (rand() / (RAND_MAX + 1.0)); }
Graph GRAPHrand(int V, int E)
{ Graph G = GRAPHinit(V);
while (G->E < E)
GRAPHinsertE(G, EDGE(randV(G), randV(G)));
return G;
}
For graphs, the latter goal is more elusive than for other domains that
we have considered, although it is still a worthwhile objective. We
shall encounter various different models of randomness, starting with
these two:

Random edges This model is simple to implement, as indicated
by the generator given in Program 17.7. For a given number of vertices
V, we generate random edges by generating pairs of numbers between
0 and V − 1. The result is likely to be a random multigraph with self-loops,
rather than a graph as defined in Definition 17.1. A given pair
could have two identical numbers (hence, self-loops could occur); and
any pair could be repeated multiple times (hence, parallel edges could
occur). Program 17.7 generates edges until the graph is known to have
E edges, and leaves to the implementation the decision of whether to
eliminate parallel edges. If parallel edges are eliminated, the number of
edges generated is substantially higher than the number of edges used
(E) for dense graphs (see Exercise 17.62); so this method is normally
used for sparse graphs.

Figure 17.12 Two random graphs: Both of these random graphs have
50 vertices. The sparse graph at the top has 50 edges, while the dense
graph at the bottom has 500 edges. The sparse graph is not connected,
with each vertex connected only to a few others; the dense graph is
certainly connected, with each vertex connected to 20 others, on the
average. These diagrams also indicate the difficulty of developing
algorithms that can draw arbitrary graphs (the vertices here are placed
in random position).

Random graph The classic mathematical model for random
graphs is to consider all possible edges and to include each in the
graph with a fixed probability p. If we want the expected number …
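(The continuation of this sentence falls outside this excerpt.) The model itself is easy to implement with the ADT operations; the book does so in Program 17.8, which is also not included here. The following is a sketch of a generator in that spirit; the name GRAPHrandG and the parameterization by p are assumptions of this sketch, not the book’s code.

#include <stdlib.h>
#include "GRAPH.h"
/* Sketch: include each of the V(V-1)/2 possible edges
   independently with probability p. */
Graph GRAPHrandG(int V, double p)
{ int u, v;
  Graph G = GRAPHinit(V);
  for (u = 0; u < V; u++)
    for (v = u+1; v < V; v++)
      if (rand() < p*RAND_MAX)
        GRAPHinsertE(G, EDGE(u, v));
  return G;
}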
… vertex defined for each phone number, and an edge for each pair i and
j with the property that i made a telephone call to j within some fixed
period. This set of edges represents a huge multigraph. It is certainly
sparse, since each person places calls to only a tiny fraction of the
available telephones. It is representative of many other applications.
For example, a financial institution’s credit card and merchant account
records might have similar information.

Figure 17.14 Transaction graph: A sequence of pairs of numbers like
this one might represent a list of telephone calls in a local exchange, or
financial transfers between accounts, or any similar situation involving
transactions between entities with unique identifiers. The graphs are
hardly random—some phones are far more heavily used than others
and some accounts are far more active than others.

Function call graph We can associate a graph with any computer
program with functions as vertices and an edge connecting X
and Y whenever function X calls function Y. We can instrument the
program to create such a graph (or have a compiler do it). Two completely
different graphs are of interest: the static version, where we
create edges at compile time corresponding to the function calls that
appear in the program text of each function; and a dynamic version,
where we create edges at run time when the calls actually happen. We
use static function call graphs to study program structure and dynamic
ones to study program behavior. These graphs are typically huge and
sparse.

In applications such as these, we face massive amounts of data,
so we might prefer to study the performance of algorithms on real
sample data rather than on random models. We might choose to try
to avoid degenerate situations by randomly ordering the edges, or by
introducing randomness in the decision making in our algorithms, but
that is a different matter from generating a random graph. Indeed, in
many applications, learning the properties of the graph structure is a
goal in itself.

In several of these examples, vertices are natural named objects,
and edges appear as pairs of named objects. For example, a transaction
graph might be built from a sequence of pairs of telephone numbers,
and a Euclidean graph might be built from a sequence of pairs of
cities or towns. Program 17.9 is a client program that we can use to
build a graph in this common situation. For the client’s convenience,
it takes the set of edges as defining the graph, and deduces the set of
vertex names from their use in edges. Specifically, the program reads a
sequence of pairs of symbols from standard input, uses a symbol table
to associate the vertex numbers 0 to V − 1 to the symbols (where V is
the number of different symbols in the input), and builds a graph by
inserting the edges, as in Programs 17.7 and 17.8. We could adapt any …
#include <stdio.h>
#include "GRAPH.h"
#include "ST.h"
Graph GRAPHscan(int Vmax, int Emax)
{ char v[100], w[100];
Graph G = GRAPHinit(Vmax);
STinit();
while (scanf("%99s %99s", v, w) == 2)
GRAPHinsertE(G, EDGE(STindex(v), STindex(w)));
return G;
}
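Program 17.9 relies on a symbol-table interface ST.h, which is developed in Parts 1–4 and not reproduced here. The following is a deliberately naive sketch, sufficient to make GRAPHscan self-contained for small inputs; the fixed capacity and linear search are assumptions of this sketch, not features of the book’s ST.

#include <stdlib.h>
#include <string.h>
static char *keys[10000];  /* assumed capacity for this sketch */
static int N = 0;
void STinit(void)
{ N = 0; }
/* Return the index previously assigned to key, or assign the
   next unused index (0, 1, 2, ...) to a new key. */
int STindex(char *key)
{ int i;
  for (i = 0; i < N; i++)
    if (strcmp(keys[i], key) == 0) return i;
  keys[N] = malloc(strlen(key)+1);  /* remember a copy of the key */
  strcpy(keys[N], key);
  return N++;
}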
Figure 17.16 de Bruijn graphs: A de Bruijn digraph of order n has
2^n vertices with edges from i to 2i mod 2^n and (2i + 1) mod 2^n,
for all i. Pictured here are the underlying undirected de Bruijn graphs
of order 6, 5, 4, and 3 (top to bottom).

17.64 Write a client program that generates sparse random graphs for a well-chosen
set of values of V and E, and prints the amount of space that it used
for the graph representation and the amount of time that it took to build it.
Test your program with a sparse-graph ADT implementation (Program 17.6)
and with the random-graph generator (Program 17.7), so that you can do
meaningful empirical tests on graphs drawn from this model.

17.65 Write a client program that generates dense random graphs for a well-chosen
set of values of V and E, and prints the amount of space that it used
for the graph representation and the amount of time that it took to build it.
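As an aside, the de Bruijn graphs of Figure 17.16 are easy to generate with the basic ADT operations. Here is a sketch; the function name GRAPHdeBruijn is hypothetical, and edges are inserted as in Program 17.7.

/* Build the underlying undirected de Bruijn graph of order n:
   2^n vertices, with edges from each i to 2i mod 2^n and to
   (2i+1) mod 2^n. (These graphs include self-loops at 0 and
   2^n - 1.) Sketch only; not a program from the book. */
Graph GRAPHdeBruijn(int n)
{ int i, V = 1 << n;
  Graph G = GRAPHinit(V);
  for (i = 0; i < V; i++)
    {
      GRAPHinsertE(G, EDGE(i, (2*i) % V));
      GRAPHinsertE(G, EDGE(i, (2*i + 1) % V));
    }
  return G;
}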
… in Exercise 17.64 (for low densities) and as described in Exercise 17.65 (for
high densities).
◦ 17.79 Design an appropriate extension that allows you to use Program 17.1
to build separation graphs without having to call a function for each implied
edge. That is, the number of function calls required to build a graph should
be proportional to the sum of the sizes of the groups. (Implementations of
graph-processing functions are left with the problem of efficiently handling
implied edges.)
17.80 Give a tight upper bound on the number of edges in any separation
graph with N different groups of k people.
17.81 Draw graphs in the style of Figure 17.16 that, for V = 8, 16, and 32,
have V vertices numbered from 0 to V − 1 and an edge connecting each vertex
i with i/2.
17.82 Modify the ADT interface in Program 17.1 to allow clients to use
symbolic vertex names and edges to be pairs of instances of a generic Vertex
type. Hide the vertex-index representation and the symbol-table ADT usage
completely from clients.
17.83 Add a function to the ADT interface from Exercise 17.82 that supports
a join operation for graphs, and provide implementations for the adjacency-
matrix and adjacency-lists representations. Note: Any vertex or edge in either
graph should be in the join, but vertices that are in both graphs appear only
once in the join, and you should remove parallel edges.
Program 17.12 Hamilton path

The function GRAPHpathH searches for a Hamilton path from v to w. It
uses a recursive function that differs from the one in Program 17.11 in
just two respects: First, it returns successfully only if it finds a path of
length V; second, it resets the visited marker before returning unsuccessfully.
Do not expect this function to finish except for tiny graphs
(see text).

static int visited[maxV];
int pathR(Graph G, int v, int w, int d)
{ int t;
  if (v == w)
    { if (d == 0) return 1; else return 0; }
  visited[v] = 1;
  for (t = 0; t < G->V; t++)
    if (G->adj[v][t] == 1)
      if (visited[t] == 0)
        if (pathR(G, t, w, d-1)) return 1;
  visited[v] = 0;
  return 0;
}
int GRAPHpathH(Graph G, int v, int w)
{ int t;
  for (t = 0; t < G->V; t++) visited[t] = 0;
  return pathR(G, v, w, G->V-1);
}

Figure 17.19 Hamilton-tour–search trace: This trace shows the edges
checked by Program 17.12 when discovering that the graph shown at
the top has no Hamilton tour. For brevity, edges to marked vertices
are omitted.

Hamilton tour problem. Is there a cycle that visits every vertex in the
graph exactly once?

At first blush, this problem seems to admit a simple solution: We
can write down the simple recursive program for finding a Hamilton
path that is shown in Program 17.12. But this program is not likely
to be useful for many graphs, because its worst-case running time is
exponential in the number of vertices in the graph.

Property 17.3 A recursive search for a Hamilton tour could take
exponential time.

Proof: Consider a graph where vertex V-1 is isolated and the edges,
with the other V − 1 vertices, constitute a complete graph. Program
17.12 will never find a Hamilton path, but it is easy to see by
induction that it will examine all of the (V − 1)! paths in the complete
graph, all of which involve V − 1 recursive calls. The total number of
recursive calls is therefore about V!, or about (V/e)^V, which is higher
than any constant to the Vth power.
Our implementations Program 17.11 for finding simple paths and
Program 17.12 for finding Hamilton paths are extremely similar, and
both programs terminate when all the elements of the visited array
are set to 1. Why are the running times so dramatically different?
Program 17.11 is guaranteed to finish quickly because it sets at least
one element of the visited array to 1 each time pathR is called.
Program 17.12, on the other hand, can set visited elements back to
0, so we cannot guarantee that it will finish quickly.
When searching for simple paths, in Program 17.11, we know
that, if there is a path from v to w, we will find it by taking one of
the edges v-t from v, and the same is true for Hamilton paths. But
there the similarity ends. If we cannot find a simple path from t to
w, then we can conclude that there is no simple path from v to w that
goes through t; but the analogous situation for Hamilton paths does
not hold. It could be the case that there is no Hamilton path to w
that starts with v-t, but there is one that starts with v-x-t for some
other vertex x. We have to make a recursive call from t corresponding
to every path that leads to it from v. In short, we may have to check
every path in the graph.
It is worthwhile to reflect on just how slow a factorial-time algo-
rithm is. If we could process a graph with 15 vertices in 1 second, it
would take 1 day to process a graph with 19 vertices, over 1 year for
21 vertices, and over 6 centuries for 23 vertices. Faster computers do
not help much, either. A computer that is 200,000 times faster than
our original one would still take more than a day to solve that 23-
vertex problem. The cost to process graphs with 100 or 1000 vertices
is too high to contemplate, let alone graphs of the size that we might
encounter in practice. It would take millions of pages in this book just
to write down the number of centuries required to process a graph that
contained millions of vertices.
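The arithmetic behind these estimates is easy to reproduce. The following throwaway program (an illustration, not from the book) scales the hypothetical one-second, 15-vertex baseline used above by V!/15!:

#include <stdio.h>
/* Estimate factorial-time costs: if 15 vertices take 1 second,
   V vertices take about V!/15! seconds. */
int main(void)
{ int v;
  double t = 1.0;  /* seconds for V = 15 */
  for (v = 16; v <= 23; v++)
    {
      t *= v;
      printf("V = %2d: about %.2e seconds (%.1f days)\n",
             v, t, t/86400.0);
    }
  return 0;
}

Running it confirms the figures in the text: about 1.1 days for 19 vertices, about 450 days for 21, and over six centuries for 23.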
In Chapter 5, we examined a number of simple recursive pro-
grams that are similar in character to Program 17.12 but that could
be drastically improved with top-down dynamic programming. This
Program 17.13 Euler path existence

This function, which is based upon the corollary to Property 17.4, tests
whether there is an Euler path from v to w in a connected graph, using
the GRAPHdeg ADT function from Exercise 17.40. It takes time proportional
to V, not including preprocessing time to check connectivity and
to build the vertex-degree table used by GRAPHdeg.

int GRAPHpathE(Graph G, int v, int w)
{ int t;
  t = GRAPHdeg(G, v) + GRAPHdeg(G, w);
  if ((t % 2) != 0) return 0;
  for (t = 0; t < G->V; t++)
    if ((t != v) && (t != w))
      if ((GRAPHdeg(G, t) % 2) != 0) return 0;
  return 1;
}

Proof: This statement is equivalent to Property 17.4 in the graph
formed by adding an edge connecting the two vertices of odd degree
(the ends of the path).

Therefore, for example, there is no way for anyone to traverse
all the bridges of Königsberg in a continuous path without retracing
their steps, since all four vertices in the corresponding graph have odd
degree (see Figure 17.21).

As discussed in Section 17.5, we can find all the vertex degrees
in time proportional to E for the adjacency-lists or set-of-edges representation
and in time proportional to V^2 for the adjacency-matrix
representation, or we can maintain a vertex-indexed array with vertex
degrees as part of the graph representation (see Exercise 17.40).
Given the array, we can check whether Property 17.4 is satisfied in
time proportional to V. Program 17.13 implements this strategy, and
demonstrates that determining whether a given graph has a Euler path
is an easy computational problem. This fact is significant because we
have little intuition to suggest that the problem should be easier than
determining whether a given graph has a Hamilton path.

Figure 17.22 Partial tours: Following edges from any vertex in a graph
that has an Euler tour always takes us back to that vertex, as shown in
these examples. The cycle does not necessarily use all the edges in the
graph. (The partial tours shown: 1-0-2-1, 6-0-1-2-4-6, 2-0-1-2-3-4-2,
and 0-6-4-5-0-2-1-0.)

Now, suppose that we actually wish to find a Euler path. We are
treading on thin ice because a direct recursive implementation (find
a path by trying an edge, then making a recursive call to find a path
#include "STACK.h"
int path(Graph G, int v)
{ int w;
for (; G->adj[v] != NULL; v = w)
{
STACKpush(v);
w = G->adj[v]->v;
GRAPHremoveE(G, EDGE(v, w));
}
return v;
}
void pathEshow(Graph G, int v, int w)
{
STACKinit(G->E);
printf("%d", w);
while ((path(G, v) == v) && !STACKempty())
{ v = STACKpop(); printf("-%d", v); }
printf("\n");
}
for the rest of the graph) will have the same kind of factorial-time
performance as Program 17.12. We expect not to have to live with
such performance because it is so easy to test whether a path exists,
so we seek a better algorithm. It is possible to avoid factorial-time
blowup with a fixed-cost test for determining whether or not to use an
edge (rather than unknown costs from the recursive call), but we leave
this approach as an exercise (see Exercises 17.96 and 17.97).
Another approach is suggested by the proof of Property 17.4.
Traverse a cyclic path, deleting the edges encountered and pushing
onto a stack the vertices that it encounters, so that (i) we can trace
back along that path, printing out its edges, and (ii) we can check each
vertex for additional side paths (which can be spliced into the main
path). This process is illustrated in Figure 17.23.

Figure 17.23 Euler tour by removing cycles: This figure shows how
Program 17.14 discovers an Euler tour from 0 back to 0 in a sample
graph. Thick black edges are those on the tour, the stack contents are
listed below each diagram, and adjacency lists for the non-tour edges
are shown at left. First, the program adds the edge 0-1 to the tour and
removes it from the adjacency lists (in two places) (top left, lists at left).
Second, it adds 1-2 to the tour in the same way (left, second from top).
Next, it winds up back at 0 but continues to do another cycle
0-5-4-6-0, winding up back at 0 with no more edges incident upon 0
(right, second from top). Then it pops the isolated vertices 0 and 6
from the stack until 4 is at the top and starts a tour from 4 (right, third
from top), which takes it to 3, 2, and back to 4, whereupon it pops all
the now-isolated vertices 4, 2, 3, and so forth. The sequence of vertices
popped from the stack defines the Euler tour 0-6-4-2-3-4-5-0-2-1-0 of
the whole graph.
Program 17.14 is an implementation along these lines, for an
adjacency-lists graph ADT. It assumes that a Euler path exists, and it
destroys the graph representation; so the client has responsibility to
use GRAPHpathE, GRAPHcopy, and GRAPHdestroy as appropriate. The
code is tricky—novices may wish to postpone trying to understand it
until gaining more experience with graph-processing algorithms in the
next few chapters. Our purpose in including it here is to illustrate that
good algorithms and clever implementations can be very effective for
solving some graph-processing problems.
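For instance, a cautious client along the lines the text describes might proceed as follows. This is a sketch: the wrapper name showEulerPath is hypothetical, and GRAPHcopy and GRAPHdestroy are the ADT functions mentioned above.

/* Sketch: print an Euler path from v to w on a scratch copy,
   since pathEshow (Program 17.14) removes the edges of the
   graph it processes. */
void showEulerPath(Graph G, int v, int w)
{
  if (GRAPHpathE(G, v, w))
    {
      Graph T = GRAPHcopy(G);
      pathEshow(T, v, w);
      GRAPHdestroy(T);
    }
}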
Property 17.6 We can find a Euler tour in a graph, if one exists, in
linear time.
We leave a full induction proof as an exercise (see Exercise 17.101).
Informally, after the first call on path, the stack contains a path from
v to w, and the graph that remains (after removal of isolated vertices)
consists of the smaller connected components (sharing at least one
vertex with the path so far found) that also have Euler tours. We pop
isolated vertices from the stack and use path to find Euler tours that
contain the nonisolated vertices, in the same manner. Each edge in the
graph is pushed onto (and popped from) the stack exactly once, so the
total running time is proportional to E.
Despite their appeal as a systematic way to traverse all the edges
and vertices, we rarely use Euler tours in practice because few graphs
have them. Instead, we typically use depth-first search to explore
graphs, as described in detail in Chapter 18. Indeed, as we shall see,
doing depth-first search in an undirected graph amounts to computing
a two-way Euler tour: a path that traverses each edge exactly twice,
once in each direction.
In summary, we have seen in this section that it is easy to find
simple paths in graphs, that it is even easier to know whether we can
tour all the edges of a large graph without revisiting any of them (by
just checking that all vertex degrees are even), and that there is even a
clever algorithm to find such a tour; but that it is practically impossible
to know whether we can tour all the graph’s vertices without revisiting
any. We have simple recursive solutions to all these problems, but the
Exercises
17.84 Show, in the style of Figure 17.17, the trace of recursive calls (and
vertices that are skipped), when Program 17.11 finds a path from 0 to 5 in
the graph
3-7 1-4 7-8 0-5 5-2 3-8 2-9 0-6 4-9 2-6 6-4.
17.85 Modify the recursive function in Program 17.11 to print out a trace
like Figure 17.17, using a global variable as described in the text.
◦ 17.88 Modify Program 17.11 such that it takes a third argument d and
tests the existence of a path connecting u and v of length greater than d. In
particular, GRAPHpath(v, v, 2) should be nonzero if and only if v is on a
cycle.
◦ 17.91 Consider the graphs defined by the following four sets of edges:
0-1 0-2 0-3 1-3 1-4 2-5 2-9 3-6 4-7 4-8 5-8 5-9 6-7 6-9 7-8
0-1 0-2 0-3 1-3 0-3 2-5 5-6 3-6 4-7 4-8 5-8 5-9 6-7 6-9 8-8
0-1 1-2 1-3 0-3 0-4 2-5 2-9 3-6 4-7 4-8 5-8 5-9 6-7 6-9 7-8
4-1 7-9 6-2 7-3 5-0 0-2 0-8 1-6 3-9 6-3 2-8 1-5 9-8 4-5 4-7
Which of these graphs have Euler tours? Which of them have Hamilton tours?
◦ 17.92 Give necessary and sufficient conditions for a directed graph to have
a (directed) Euler tour.
17.93 Prove that every connected undirected graph has a two-way Euler
tour.
17.94 Modify the proof of Property 17.4 to make it work for graphs with
parallel edges and self-loops.
17.95 Show that adding one more bridge could give a solution to the
bridges-of-Königsberg problem.
• 17.96 Prove that a connected graph has a Euler path from v to w only if
it has an edge incident on v whose removal does not disconnect the graph
(except possibly by isolating v).
• 17.97 Use Exercise 17.96 to develop an efficient recursive method for find-
ing a Euler tour in a graph that has one. Beyond the basic graph ADT
functions, you may use the ADT functions GRAPHdeg (see Exercise 17.40) and
GRAPHpath (see Program 17.11). Implement and test your program for the
adjacency-matrix graph representation.
17.98 Develop a representation-independent version of Program 17.14 that
uses a general interface for visiting all edges adjacent to a given vertex (see
Exercise 17.60). Note: Be careful of interactions between your code and the
GRAPHremoveE function. Make sure that your implementation works properly
in the presence of parallel edges and self-loops.
17.99 Give an example where the graph remaining after the first call to
path in Program 17.14 is not connected (in a graph that has a Euler tour).
17.100 Describe how to modify Program 17.14 so that it can be used to
detect whether or not a given graph has a Euler tour, in linear time.
17.101 Give a complete proof by induction that the linear-time Euler tour
algorithm described in the text and implemented in Program 17.14 properly
finds a Euler tour.
◦ 17.102 Find the number of V -vertex graphs that have a Euler tour, for as
large a value of V as you can feasibly afford to do the computation.
• 17.103 Run experiments to determine empirically the average length of the
path found by the first call to path in Program 17.14 for various graphs (see
Exercises 17.63–76). Calculate the probability that this path is cyclic.
◦ 17.106 Modify Program 17.12 to print out the Hamilton tour if it finds one.
• 17.107 Find a Hamilton tour of the graph
1-2 5-2 4-2 2-6 0-8 3-0 1-3 3-6 1-0 1-4 4-0 4-6 6-5 2-6
6-9 9-0 3-1 4-3 9-2 4-9 6-9 7-9 5-0 9-7 7-3 4-5 0-5 7-8
or show that none exists.
•• 17.108 Determine how many V -vertex graphs have a Hamilton tour, for as
large a value of V as you can feasibly afford to do the computation.
Planarity Can we draw a given graph with no two of its edges intersecting? We have the freedom to place the vertices anywhere, so we can solve this problem for many graphs, but it is impossible to solve for many other graphs. A remarkable classical
result known as Kuratowski’s theorem provides an easy test for de-
termining whether a graph is planar: it says that the only graphs that
cannot be drawn with no edge intersections are those that contain some
subgraph that, after removing vertices of degree 2, is isomorphic to
one of the graphs in Figure 17.24. A straightforward implementation
of that test, even without taking the vertices of degree 2 into consid-
eration, would be too slow for large graphs (see Exercise 17.111), but
in 1974 R. Tarjan developed an ingenious (but intricate) algorithm for
solving the problem in linear time, using a depth-first search scheme
that extends those that we consider in Chapter 18. Tarjan’s algorithm
does not necessarily give a practical layout; it just certifies that a layout
exists. As discussed in Section 17.1, developing a visually pleasing lay-
out in applications where vertices do not necessarily relate directly to
the physical world has turned out to be a challenging research problem.
Matching Given a graph, what is the largest subset of its edges with the property that no two are connected to the same vertex? This classical problem is known to be solvable in time proportional to a polynomial function of V and E, but a fast algorithm that is suitable for huge graphs is still an elusive research goal. The problem is easier to solve when restricted in various ways. For example, the problem of matching students to available positions in selective institutions is an example of bipartite matching: We have two different types of vertices (students and institutions) and we are concerned with only those edges that connect a vertex of one type with a vertex of the other type. We see a solution to this problem in Chapter 22.
The solutions to some tractable problems have never been written down as programs, or have running times so high that we could not contemplate using them in practice. The following example is in this class. It also demonstrates the capricious nature of the mathematical reality of the difficulty of graph processing.
Even cycles in digraphs Does a given digraph have a cycle of even length? This question would seem simple to resolve because the corresponding question for undirected graphs is easy to solve (see Section 18.4), as is the question of whether a digraph has a cycle of odd length. However, for many years, the problem was not sufficiently well understood for us even to know whether or not there exists an efficient algorithm for solving it.

Figure 17.24 Forbidden subgraphs in planar graphs
Neither of these graphs can be drawn in the plane without intersecting edges, nor can any graph that contains either of these graphs as a subgraph (after we remove vertices of degree two); but all other graphs can be so drawn.
The relationships between those results and the properties of graphs that arise in practice
are little understood. The development of general schemes such as the
network-simplex algorithm has been an extremely effective approach
to dealing with such problems.
An intractable graph-processing problem is one for which there
is no known algorithm that is guaranteed to solve the problem in a rea-
sonable amount of time. Many such problems have the characteristic that a brute-force method that tries all possibilities can, in principle, compute the solution—we consider them to be intractable because
there are far too many possibilities to consider. This class of problems
is extensive, and includes many important problems that we would
like to know how to solve. The term NP-hard describes the prob-
lems in this class. Most experts believe that no efficient algorithms
exist for these problems. We consider the bases for this terminology
and this belief in more detail in Part 8. The Hamilton path problem
that we discussed in Section 17.7 is a prime example of an NP-hard
graph-processing problem, as are those on the following list.
Longest path What is the longest simple path connecting two
given vertices in a graph? Despite its apparent similarity to shortest-
paths problems, this problem is a version of the Hamilton tour prob-
lem, and is NP-hard.
Colorability Is there a way to assign one of k colors to each of
the vertices of a graph such that no edge connects two vertices of the
same color? This classical problem is easy for k = 2 (see Section 18.4),
but it is NP-hard for k = 3.
Independent set What is the size of the largest subset of the
vertices of a graph with the property that no two are connected by an
edge? Just as we saw when contrasting the Euler and Hamilton tour
problems, this problem is NP-hard, despite its apparent similarity to
the matching problem, which is solvable in polynomial time.
Clique What is the size of the largest clique (complete subgraph) in
a given graph? This problem generalizes part of the planarity problem,
because if the largest clique has more than four nodes, the graph is not
planar.
These problems are formulated as existence problems—we are
asked to determine whether or not a particular subgraph exists. Some
of the problems ask for the size of the largest subgraph of a particular type, which we can do by framing an existence problem where we test for the existence of a subgraph of a given size.
Table 17.2 Difficulty of classical graph-processing problems

This table summarizes the discussion in the text about the relative difficulty of various classical graph-processing problems, comparing them in rough subjective terms. These examples indicate not only the range of difficulty of the problems but also that classifying a given problem can be a challenging task.
                                          E   T   I   ?
undirected graphs
  connectivity                            *
  general connectivity                        *
  Euler tour                              *
  Hamilton tour                                   *
  bipartite matching                      *
  maximum matching                            *
  planarity                               *
  maximum clique                                  *
  2-colorability                          *
  3-colorability                                  *
  shortest paths                          *
  longest paths                                   *
  vertex cover                                    *
  isomorphism                                         *
digraphs
  transitive closure                      *
  strong connectivity                     *
  odd-length cycle                        *
  even-length cycle                           *
weighted graphs
  minimum spanning tree                   *
  traveling salesperson                           *
networks
  shortest paths (nonnegative weights)    *
  shortest paths (negative weights)               *
  maximum flow                            *
  assignment                              *
  minimum-cost flow                       *
Key:
E Easy—efficient classical algorithm known (see reference)
T Tractable—solution exists (implementation difficult)
I Intractable—no efficient solution known (NP-hard)
? Unknown
Many of the algorithms that we consider have a running time that depends on only the number of vertices and
edges, rather than on the graph structure; so we can concentrate on
streamlining implementations and still can predict performance with
confidence.
In summary, there is a wide spectrum of problems and algorithms
known for graph processing. Table 17.2 summarizes some of the in-
formation that we have discussed. Every problem comes in different
versions for different types of graphs (directed, weighted, bipartite,
planar, sparse, dense), and there are thousands of problems and algo-
rithms to consider. We certainly cannot expect to solve every problem
that we might encounter, and some problems that appear to be simple
are still baffling the experts. Despite a natural a priori expectation
that we should have no problem distinguishing easy problems from
intractable ones, the many examples that we have discussed illustrate
that placing a problem even into these rough categories can turn into
a significant research challenge.
As our knowledge about graphs and graph algorithms expands,
given problems may move among these categories. Despite a flurry of
research activity in the 1970s and intensive work by many researchers
since then, the possibility still remains that all the problems that we
are discussing will someday be categorized as “easy” (solvable by an
algorithm that is compact, efficient, and possibly ingenious).
Having developed this context, we shall press on to consider
numerous useful graph-processing algorithms. Problems that we can
solve do arise often, the graph algorithms that we study serve well in
a great variety of applications, and these algorithms serve as the basis
for attacking numerous other problems that we need to handle even if
we cannot guarantee efficient solutions.
Exercises
• 17.109 Prove that neither of the two graphs depicted in Figure 17.24 is
planar.
17.114 What is the size of the largest clique in a de Bruijn graph of order n?
CHAPTER EIGHTEEN
Graph Search
One trick for exploring a maze without getting lost that has been
known since antiquity (dating back at least to the legend of Theseus
and the Minotaur) is to unroll a ball of string behind us. The string
guarantees that we can always find a way out, but we are also interested
in being sure that we have explored every part of the maze, and we
do not want to retrace our steps unless we have to. To accomplish
these goals, we need some way to mark places that we have been. We
could use the string for this purpose as well, but we use an alternative
approach that models a computer implementation more closely.
We assume that there are lights, initially off, in every intersection,
and doors, initially closed, at both ends of every passage. We further
assume that the doors have windows and that the lights are sufficiently
strong and the passages sufficiently straight that we can determine,
by opening the door at one end of a passage, whether or not the
intersection at the other end is lit (even if the door at the other end is
closed). Our goals are to turn on all the lights and to open all the doors.
To reach them, we need a set of rules to follow, systematically. The
following maze-exploration strategy, which we refer to as Trémaux
exploration, has been known at least since the nineteenth century (see
reference section):
(i) If there are no closed doors at the current intersection, go to step
(iii). Otherwise, open any closed door to any passage leading out
of the current intersection (and leave it open).
(ii) If you can see that the intersection at the other end of that passage
is already lighted, try another door at the current intersection
(step (i)). Otherwise (if you can see that the intersection at the
other end of the passage is dark), follow the passage to that
intersection, unrolling the string as you go, turn on the light, and
go to step (i).
(iii) If all the doors at the current intersection are open, check whether
you are back at the start point. If so, stop. If not, use the string
to go back down the passage that brought you to this intersection
for the first time, rolling the string back up as you go, and look
for another closed door there (that is, return to step (i)).
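These rules translate almost directly into code. The following is a minimal sketch under simplifying assumptions (the maze is represented as an adjacency-matrix graph with G->V intersections; the array lit plays the role of the lights, and the function-call stack plays the role of the string); it is the depth-first search of Section 18.2 in thin disguise:

static int lit[maxV];      /* lit[v] is nonzero after we light intersection v */
void tremaux(Graph G, int v)
{ int w;
  lit[v] = 1;                           /* turn on the light */
  for (w = 0; w < G->V; w++)            /* rule (i): open each closed door */
    if (G->adj[v][w] != 0 && !lit[w])   /* rule (ii): skip lighted ends */
      tremaux(G, w);                    /* follow the passage, unrolling string */
  /* rule (iii): all doors here are open; returning rolls the string back up */
}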
Figures 18.2 and 18.3 depict a traversal of a sample graph and show
that, indeed, every light is lit and every door is opened for that ex-
ample. The figures depict just one of many possible outcomes of the
exploration, because we are free to open the doors in any order at each
Figure 18.2 Trémaux maze exploration example
In this diagram, places that we have not visited are shaded (dark) and places that we have visited are white (light). We assume that there are lights in the intersections, and that, when we have opened doors into lighted intersections on both ends of a passage, the passage is lighted. To explore the maze, we begin at 0 and take the passage to 2 (left, top). Then we proceed to 6, 4, 3 and 5, opening the doors to the passages, lighting the intersections as we proceed through them, and leaving a string trailing behind us (left). When we open the door that leads to 0 from 5, we can see that 0 is lighted, so we skip that passage (top right). Similarly, we skip the passage from 5 to 4 (right, second from top), leaving us with nowhere to go from 5 except back to 3 and then back to 4, rolling up our ball of string. When we open the doorway of the passage from 4 to 5, we can see that 5 is lighted through the open door at the other end, and we therefore skip that passage (right, bottom). We never walked down the passage connecting 4 and 5, but we lighted it by opening the doors at both ends.
Figure 18.3 Trémaux maze exploration example (continued)
Next, we proceed to 7 (top left), open the door to see that 0 is lighted (left, second from top), and then proceed to 1 (left, third from top). At this point, most of the maze is traversed, and we use our string to take us back to the beginning, moving from 1 to 7 to 4 to 6 to 2 to 0. Back at 0, we complete our exploration by checking the passages to 5 (right, second from bottom) and 7 (bottom right), leaving all passages and intersections lighted. Again, the passages connecting 0 to 5 and 0 to 7 are both lighted because we opened the doors at both ends, but we did not walk through them.
From the detailed example in Figures 18.2 and 18.3, we see that there are four different possible situations that arise for each passage that we consider taking:
(i) The passage is dark, so we take it.
(ii) The passage is the one that we used to enter (it has our string in it), so we use it to exit (and we roll up the string).
(iii) The door at the other end of the passage is closed (but the intersection is lit), so we skip the passage.
(iv) The door at the other end of the passage is open (and the intersection is lit), so we skip it.
The first and second situations apply to any passage that we traverse, first at one end and then at the other end. The third and fourth situations apply to any passage that we skip, first at one end and then at the other end. Next, we see how this perspective on maze exploration translates directly to graph search.

Figure 18.4 Decomposing a maze
To prove by induction that Trémaux exploration takes us everywhere in a maze (top), we break it into two smaller pieces, by removing all edges connecting the first intersection with any intersection that can be reached from the first passage without going back through the first intersection (bottom).
Exercises
18.1 Assume that intersections 6 and 7 (and all the hallways connected to
them) are removed from the maze in Figures 18.2 and 18.3, and a hallway
is added that connects 1 and 2. Show a Trémaux exploration of the resulting
maze, in the style of Figures 18.2 and 18.3.
◦ 18.2 Which of the following could not be the order in which lights are turned
on at the intersections during a Trémaux exploration of the maze depicted in
Figures 18.2 and 18.3?
0-7-4-5-3-1-6-2
0-2-6-4-3-7-1-5
0-5-3-4-7-1-6-2
0-7-4-6-2-1-3-5
• 18.3 How many different ways are there to traverse the maze depicted in
Figures 18.2 and 18.3 with a Trémaux exploration?
Program 18.1 Depth-first search (adjacency-matrix)

This code is intended for use with a generic graph-search ADT function that initializes a counter cnt to 0 and all of the entries in the vertex-indexed array pre to -1, then calls search once for each connected component (see Program 18.3), assuming that the call search(G, EDGE(v, v)) marks all vertices in the same connected component as v (by setting their pre entries to be nonnegative).

Here, we implement search with a recursive function dfsR that visits all the vertices connected to e.w by scanning through its row in the adjacency matrix and calling itself for each edge that leads to an unmarked vertex.

#define dfsR search
void dfsR(Graph G, Edge e)
{ int t, w = e.w;
  pre[w] = cnt++;
  for (t = 0; t < G->V; t++)
    if (G->adj[w][t] != 0)
      if (pre[t] == -1)
        dfsR(G, EDGE(w, t));
}
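The generic ADT driver just described is not shown on this page; a minimal sketch of such a wrapper, in the manner of the book's Program 18.3 (the name GRAPHsearch and the maxV bound are illustrative assumptions of this sketch), is:

static int cnt, pre[maxV];
void GRAPHsearch(Graph G)
{ int v;
  cnt = 0;
  for (v = 0; v < G->V; v++) pre[v] = -1;   /* mark every vertex unseen */
  for (v = 0; v < G->V; v++)
    if (pre[v] == -1)                       /* one call per connected component */
      search(G, EDGE(v, v));
}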
Figure 18.5 DFS trace
This trace shows the order in which DFS checks the edges and vertices for the adjacency-matrix representation of the graph corresponding to the example in Figures 18.2 and 18.3 (top) and traces the contents of the pre array (right) as the search progresses (asterisks represent -1, for unseen vertices). There are two lines in the trace for every graph edge (once for each orientation). Indentation indicates the level of recursion.

Figure 18.5 traces the operation of this program on the graph of Figures 18.2 and 18.3 (see also Figure 18.17). Figure 18.6 depicts the same process using standard graph drawings.
These figures illustrate the dynamics of a recursive DFS and show the correspondence with Trémaux exploration of a maze. First, the vertex-indexed array corresponds to the lights in the intersections: When we encounter an edge to a vertex that we have already visited (see a light at the end of the passage), we do not make a recursive call to follow that edge (go down that passage). Second, the function call–return mechanism in the program corresponds to the string in the maze: When we have processed all the edges adjacent to a vertex (explored all the passages leaving an intersection), we "return" (in both senses of the word).
In the same way that we encounter each passage in the maze twice (once at each end), we encounter each edge in the graph twice (once at each of its vertices). In Trémaux exploration, we open the doors at each end of each passage; in DFS of an undirected graph, we check each of the two representations of each edge. If we encounter
Figure 18.6 Depth-first search
These diagrams are a graphical view of the process depicted in Figure 18.5, showing the DFS recursive-call tree as it evolves. Thick black edges in the graph correspond to edges in the DFS tree shown to the right of each graph diagram. Shaded edges are the candidates to be added to the tree next. In the early stages (left) the tree grows down in a straight line, as we make recursive calls for 0, 2, 6, and 4 (left). Then we make recursive calls for 3, then 5 (right, top two diagrams); and return from those calls to make a recursive call for 7 from 4 (right, second from bottom) and to 1 from 7 (right, bottom).
18.8 Show, in the style of Figure 18.7, a trace of the recursive function calls
made for a standard adjacency-lists DFS of the graph
3-7 1-4 7-8 0-5 5-2 3-8 2-9 0-6 4-9 2-6 6-4.
Figure 18.9 DFS tree representations
If we augment the DFS recursive-call tree to represent edges that are checked but not followed, we get a complete description of the DFS process (left). Each tree node has a child representing each of the nodes adjacent to it, in the order they were considered by the DFS, and a preorder traversal gives the order in which DFS examines the vertices and edges.

Figure 18.11 DFS forest
The DFS forest at the top represents a DFS of an adjacency-matrix representation of the graph at the bottom right. The graph has three connected components, so the forest has three trees. The pre array is a preorder numbering of the nodes in the tree (the order in which they are examined by the DFS) and the st array is a parent-link representation of the forest. The cc array associates each vertex with a connected-component index (see Program 18.4). As in Figure 18.9, edges to circles are tree edges; edges that go to squares are back edges; and shaded nodes indicate that the incident edge was encountered earlier in the search, in the other direction.

       0  1  2  3  4  5  6  7  8  9 10 11 12
  pre  0  1  2  4  5  3  6  7  8  9 10 11 12
   st  0  0  0  5  3  0  4  7  7  9  9  9 11
   cc  0  0  0  0  0  0  0  1  1  2  2  2  2

The order in which DFS visits the vertices corresponds to preorder tree traversal. In Section 18.7, we examine the graph-searching analog to level-order tree traversal and explore its relationship to DFS; in Section 18.8, we examine a general schema that encompasses any traversal method.
When traversing graphs, we have been assigning preorder num-
bers to the vertices in the order that we start processing them (just after
entering the recursive search function). We can also assign postorder
numbers to vertices, in the order that we finish processing them (just
before returning from the recursive search function). When processing
a graph, we do more than simply traverse the vertices—as we shall
see, the preorder and postorder numbering give us knowledge about
global graph properties that helps us to accomplish the task at hand.
Preorder numbering suffices for the algorithms that we consider in this
chapter, but we use postorder numbering in later chapters.
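In code, each numbering amounts to one extra line in the recursive function. The following sketch is a variant of the dfsR function of Program 18.1 (the array post and the counter postCnt are assumptions of this illustration):

void dfsR(Graph G, Edge e)
{ int t, w = e.w;
  pre[w] = cnt++;                 /* preorder number: just after entering */
  for (t = 0; t < G->V; t++)
    if (G->adj[w][t] != 0 && pre[t] == -1)
      dfsR(G, EDGE(w, t));
  post[w] = postCnt++;            /* postorder number: just before returning */
}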
We describe the dynamics of DFS for a general undirected graph
with a DFS forest that has one DFS tree for each connected component.
An example of a DFS forest is illustrated in Figure 18.11.
With an adjacency-lists representation, we visit the edges con-
nected to each vertex in an order different from that for the adjacency-
matrix representation, so we get a different DFS forest, as illustrated
in Figure 18.12. DFS trees and forests are graph representations that
describe not only the dynamics of DFS but also the internal representa-
tion of the graph. For example, by reading the children of any node in
Figure 18.12 from left to right, we see the order in which they appear
Figure 18.12 Another DFS forest
This forest describes depth-first search of the same graph as Figure 18.11, for an adjacency-lists representation of the graph.
18.15 Draw the DFS forest that results from a standard adjacency-lists DFS
of the graph
3-7 1-4 7-8 0-5 5-2 3-8 2-9 0-6 4-9 2-6 6-4.
Figure 18.13
Depth-first search
This figure illustrates the progress
of DFS in a random Euclidean
near-neighbor graph (left). The
figures show the DFS tree ver-
tices and edges in the graph as the
search progresses through 1/4, 1/2,
3/4, and all of the vertices (top to
bottom). The DFS tree (tree edges
only) is shown at the right. As is
evident from this example, the
search tree for DFS tends to be
quite tall and thin for this type of
graph (as it is for many other types
of graphs commonly encountered
in practice). We normally find a
vertex nearby that we have not
seen before.
Figure 18.14 A two-way Euler tour
Depth-first search gives us a way to explore any maze, traversing both passages in each direction. We modify Trémaux exploration to take the string with us wherever we go and take a back-and-forth trip on passages without any string in them that go to intersections that we have already visited. This figure shows a different traversal order than shown in Figures 18.2 and 18.3, primarily so that we can draw the tour without crossing itself. This ordering might result, for example, if the edges were processed in some different order when building an adjacency-lists representation of the graph, or we might explicitly modify DFS to take the geometric placement of the nodes into account (see Exercise 18.24). Moving along the lower track leading out of 0, we move from 0 to 2 to 6 to 4 to 7, then take a trip from 7 to 0 and back because pre[0] is less than pre[7]. Then we go to 1, back to 7, back to 4, to 3, to 5, from 5 to 0 and back, from 5 to 4 and back, back to 3, back to 4, back to 6, back to 2, and back to 0. This path may be obtained by a recursive pre- and postorder walk of the DFS tree (ignoring the shaded vertices that represent the second time we encounter the edges) where we print out the vertex name, recursively visit the subtrees, then print out the vertex name again.

       0  1  2  3  4  5  6  7
  pre  0  5  1  6  3  7  2  4

To compute a two-way Euler tour (Figure 18.14), we can modify DFS so that we choose to ignore the back links (first encounter) and to go back and forth on down links (second encounter) (see Exercise 18.23).
Spanning tree Given a connected graph with V vertices, find a set of V − 1 edges that connects the vertices. DFS solves this problem because any DFS tree is a spanning tree. Our DFS implementations make precisely V − 1 recursive calls for a connected graph, one for each edge on a spanning tree, and can be easily instrumented to produce a parent-link representation of the tree, as discussed at the beginning of Section 18.4. If the graph has C connected components, we get a spanning forest with V − C edges. In an ADT function implementation, we might choose to add the parent-link array to the graph representation, in the style of Program 18.4, or to compute it in a client-supplied array.
Vertex search How many vertices are in the same connected component as a given vertex? We can solve this problem easily by starting a DFS at the vertex and counting the number of vertices marked. In a dense graph, we can speed up the process considerably by stopping the DFS after we have marked V vertices—at that point, we know that no edge will take us to a vertex that we have not yet seen, so we will be ignoring all the rest of the edges. This improvement is likely to allow us to visit all vertices in time proportional to V log V, not E (see Section 18.8).
Two-colorability, bipartiteness, odd cycle Is there a way to assign one of two colors to each vertex of a graph such that no edge connects two vertices of the same color? Is a given graph bipartite (see Section 17.1)? Does a given graph have a cycle of odd length? These three problems are all equivalent: The first two are different nomenclature for the same problem; any graph with an odd cycle is
18.26 Modify your solution to Exercise 18.25 such that you can use it for
huge graphs, where you might not have the space to make a copy of the
list nodes corresponding to each edge. That is, use the list nodes that were
allocated to build the graph, and destroy the original adjacency-lists graph
representation.
◦ 18.28 Explain why the approach taken in Program 18.6 does not generalize
to give an efficient method for determining whether a graph is three-colorable.
18.29 Most graphs are not two-colorable, and DFS tends to discover that
fact quickly. Run empirical tests to study the number of edges examined by
Program 18.6, for graphs of various sizes, drawn from various graph models
(see Exercises 17.63–76).
◦ 18.30 Prove that every connected graph has a vertex whose removal will not
disconnect the graph, and write a DFS function that finds such a vertex. Hint:
Consider the leaves of the DFS tree.
18.31 Prove that every graph with more than one vertex has at least two
vertices whose removal will not increase the number of connected components.
Figure 18.17 DFS tree for finding bridges
Nodes 5, 7, and 12 in this DFS tree for the graph in Figure 18.16 all have the property that no back edge connects a descendant with an ancestor, and no other nodes have that property. Therefore, as indicated, breaking the edge between one of these nodes and its parent would disconnect the subtree rooted at that node from the rest of the graph. That is, the edges 0-5, 11-12, and 6-7 are bridges. We use the vertex-indexed array low to keep track of the lowest preorder number referenced by any back edge in the subtree rooted at the vertex. For example, the value of low[9] is 2 because one of the back edges in the subtree rooted at 9 points to 4 (the vertex with preorder number 2), and no other back edge points higher in the tree. Nodes 5, 7, and 12 are the ones for which the low value is equal to the pre value.

       0  1  2  3  4  5  6  7  8  9 10 11 12
  pre  0  7  8  3  2  1  9 10 12  4 11  5  6
  low  0  0  0  1  1  1  0 10 10  2 10  2  6

A graph is edge-connected if it remains connected when we remove any single edge. In some contexts, it is more natural to emphasize our ability to disconnect the graph rather than the graph's ability to stay connected, so we freely use alternate terminology that provides this emphasis: We refer to a graph that is not edge-connected as an edge-separable graph, and we call bridges separation edges. If we remove all the bridges in an edge-separable graph, we divide it into edge-connected components or bridge-connected components: maximal subgraphs with no bridges.
Figure 18.16 is a small example that illustrates these concepts.
Finding the bridges in a graph seems, at first blush, to be a
nontrivial graph-processing problem, but it actually is an application
of DFS where we can exploit basic properties of the DFS tree that we
have already noted. Specifically, back edges cannot be bridges because
we know that the two nodes they connect are also connected by a path
in the DFS tree. Moreover, we can add a simple test to our recursive
DFS function to test whether or not tree edges are bridges. The basic
idea, stated formally next, is illustrated in Figure 18.17.
Property 18.5 In any DFS tree, a tree edge v-w is a bridge if and
only if there are no back edges that connect a descendant of w to an
ancestor of w.
Proof : If there is such an edge, v-w cannot be a bridge. Conversely, if
v-w is not a bridge, then there has to be some path from w to v in the
graph other than w-v itself. Every such path has to have some such
edge.
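The following is a sketch of how this test can be folded into the recursive DFS, in the spirit of the book's Program 18.7 (an adjacency-lists representation with list-node type link, the globals pre, low, and cnt, and the printf bridge report are assumptions of this sketch; parallel edges would need extra care):

void bridgeR(Graph G, Edge e)
{ link t; int v, w = e.w;
  pre[w] = cnt++; low[w] = pre[w];
  for (t = G->adj[w]; t != NULL; t = t->next)
    if ((v = t->v) != e.v)            /* skip the edge back to the parent */
    { if (pre[v] == -1)               /* tree edge: recur, then fold in */
      { bridgeR(G, EDGE(w, v));       /* the subtree's low value */
        if (low[w] > low[v]) low[w] = low[v];
        if (low[v] == pre[v])         /* no back edge escapes the subtree, */
          printf("%d-%d\n", w, v);    /* so w-v is a bridge */
      }
      else                            /* back edge: use v's preorder number */
        if (low[w] > pre[v]) low[w] = pre[v];
    }
}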
Figure 18.18 Another DFS tree for finding bridges
This diagram shows a different DFS tree than the one in Figure 18.17 for the graph in Figure 18.16, where we start the search at a different node. Although we visit the nodes and edges in a completely different order, we still find the same bridges (of course). In this tree, 0, 7, and 11 are the ones for which the low value is equal to the pre value, so the edges connecting each of them to their parents (12-11, 5-0, and 6-7, respectively) are bridges.

       0  1  2  3  4  5  6  7  8  9 10 11 12
  pre  6  7  8  4  3  5  9 10 12  2 11  1  0
  low  6  6  6  3  1  3  6 10 10  1 10  1  0
We compute the low values by proceeding through the adjacency list, keeping track of the minimum of the numbers that
we can reach by following each edge. For tree edges, we do the
computation recursively; for back edges, we use the preorder number
of the adjacent vertex. If the call to the recursive function for an edge
w-v does not uncover a path to a node with a preorder number less
than v’s preorder number, then w-v is a bridge.
Property 18.6 We can find a graph’s bridges in linear time.
Proof : Program 18.7 is a minor modification to DFS that involves
adding a few constant-time tests, so it follows directly from Proper-
ties 18.3 and 18.4 that finding the bridges in a graph requires time
proportional to V^2 for the adjacency-matrix representation and to
V + E for the adjacency-lists representation.
In Program 18.7, we use DFS to discover properties of the graph.
The graph representation certainly affects the order of the search, but
it does not affect the results because bridges are a characteristic of the
graph rather than of the way that we choose to represent or search
the graph. As usual, any DFS tree is simply another representation
of the graph, so all DFS trees have the same connectivity properties.
The correctness of the algorithm depends on this fundamental fact.
For example, Figure 18.18 illustrates a different search of the graph,
starting from a different vertex, that (of course) finds the same bridges.
Despite Property 18.6, when we examine different DFS trees for the
same graph, we see that some search costs may depend not just on
properties of the graph, but also on properties of the DFS tree. For
example, the amount of space needed for the stack to support the
recursive calls is larger for the example in Figure 18.18 than for the
example in Figure 18.17.
As we did for regular connectivity in Program 18.4, we may wish
to use Program 18.7 to build an ADT function for testing whether a
graph is edge-connected or to count the number of edge-connected
components. If desired, we can proceed as for Program 18.4 to create
ADT functions that gives clients the ability to call a linear-time pre-
processing function, then respond in constant time to queries that ask
whether two vertices are in the same edge-connected component (see
Exercise 18.35).
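In such a design, the query itself reduces to an array comparison; a sketch (the array name ecc and the function name are illustrative assumptions, mirroring what Program 18.4 does with cc for ordinary connectivity):

static int ecc[maxV];        /* edge-connected component index per vertex */
int GRAPHedgeconnected(int s, int t)
{ return ecc[s] == ecc[t]; } /* constant time, after linear preprocessing */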
We conclude this section by considering other generalizations of connectivity, including the problem of determining which vertices are critical to keeping a graph connected. By including this material here, we keep in one place the basic background material for the more complex algorithms that we consider in Chapter 22. If you are new to connectivity problems, you may wish to skip to Section 18.7 and return here when you study Chapter 22.
When we speak of removing a vertex, we also mean that we remove all its incident edges. As illustrated in Figure 18.19, removing either of the vertices on a bridge would disconnect a graph (unless the bridge were the only edge incident on one or both of the vertices), but there are also other vertices, not associated with bridges, that have the same property.
Definition 18.6 An articulation point in a graph is a vertex that, if removed, would separate a connected graph into at least two disjoint subgraphs.
We also refer to articulation points as separation vertices or cut vertices. We might use the term "vertex connected" to describe a graph that has no separation vertices, but we use different terminology based on a related characterization that turns out to be equivalent.
Definition 18.7 A graph is said to be biconnected if every pair of vertices is connected by two disjoint paths.

Figure 18.19 Graph separability terminology
This graph has two edge-connected components and one bridge. The edge-connected component above the bridge is also biconnected; the one below the bridge consists of two biconnected components that are joined at an articulation point.
We can find the articulation points with a recursive DFS function similar to Program 18.7, with an extra test to check whether the root of the DFS tree is an articulation point (see Exercise 18.42). Developing code to print out the biconnected components is also a worthwhile exercise that is only slightly more difficult than the corresponding code for edge connectivity (see Exercise 18.43).
Property 18.8 We can find a graph's articulation points and biconnected components in linear time.
Proof: As for Property 18.7, this fact follows from the observation that the solutions to Exercises 18.42 and 18.43 involve minor modifications to DFS that amount to adding a few constant-time tests per edge.
18.36 Draw the standard adjacency-lists DFS tree. Use it to find the articulation points and the biconnected components.
18.37 Do the previous exercise using the standard adjacency-matrix DFS tree.
18.38 Prove that every edge in a graph either is a bridge or belongs to exactly
one biconnected component.
• 18.39 Prove that any graph with no articulation points is biconnected. Hint:
Given a pair of vertices s and t and a path connecting them, use the fact
that none of the vertices on the path are articulation points to construct two
disjoint paths connecting s and t.
18.40 Modify Program 18.3 to derive a program for determining whether a
graph is biconnected, using a brute-force algorithm that runs in time propor-
tional to V (V + E). Hint: If you mark a vertex as having been seen before
you start the search, you effectively remove it from the graph.
◦ 18.41 Extend your solution to Exercise 18.40 to derive a program that deter-
mines whether a graph is 3-connected. Give a formula describing the approx-
imate number of times your program examines a graph edge, as a function of
V and E.
18.42 Prove that the root of a DFS tree is an articulation point if and only if
it has two or more (internal) children.
• 18.43 Write an ADT function for the adjacency-lists representation that prints
the biconnected components of the graph.
18.44 What is the minimum number of edges that must be present in any
biconnected graph with V vertices?
18.45 Modify Programs 18.3 and 18.7 to implement an ADT function that
determines whether a graph is edge-connected (returning as soon as it identifies
a bridge if the graph is not). Run empirical tests to study the number of edges
examined by your function, for graphs of various sizes, drawn from various
graph models (see Exercises 17.63–76).
◦ 18.46 Instrument Programs 18.3 and 18.7 to print out the number of artic-
ulation points, bridges, and biconnected components.
• 18.47 Run experiments to determine empirically the average values of the
quantities described in Exercise 18.46 for graphs of various sizes, drawn from
various graph models (see Exercises 17.63–76).
18.48 Give the edge connectivity and the vertex connectivity of the graph
0-1 0-2 0-8 2-1 2-8 8-1 3-8 3-7 3-6 3-5 3-4 4-6 4-5 5-6 6-7 7-8.
Suppose that we want to find a shortest path between two given vertices: a path connecting them with the property that no other path connecting those vertices has fewer edges. The
classical method for accomplishing this task, called breadth-first search
(BFS), is also the basis of numerous algorithms for processing graphs,
so we consider it in detail in this section. DFS offers us little assistance
in solving this problem, because the order in which it takes us through
the graph has no relationship to the goal of finding shortest paths. In
contrast, BFS is based on this goal. To find a shortest path from v to w,
we start at v and check for w among all the vertices that we can reach
by following one edge, then we check all the vertices that we can reach
by following two edges, and so forth.
When we come to a point during a graph search where we have
more than one edge to traverse, we choose one and save the others
to be explored later. In DFS, we use a pushdown stack (that is man-
aged by the system to support the recursive search function) for this
purpose. Using the LIFO rule that characterizes the pushdown stack
corresponds to exploring passages that are close by in a maze: We
choose, of the passages yet to be explored, the one that was most re-
cently encountered. In BFS, we want to explore the vertices in order of
their distance from the start. For a maze, doing the search in this order
might require a search team; within a computer program, however, it
is easily arranged: We simply use a FIFO queue instead of a stack.
Program 18.8 is an implementation of BFS for graphs represented
with an adjacency matrix. It is based on maintaining a queue of all
edges that connect a visited vertex with an unvisited vertex. We put
a dummy self-loop to the start vertex on the queue, then perform the
following steps until the queue is empty:
• Take edges from the queue until finding one that points to an
unvisited vertex.
• Visit that vertex; put onto the queue all edges that go from that
vertex to unvisited vertices.
Figure 18.21 shows the step-by-step development of BFS on a sample
graph.
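The loop just described translates directly into code; the following is a sketch in the spirit of Program 18.8 (the FIFO-queue ADT functions QUEUEput, QUEUEget, and QUEUEempty, the globals of Program 18.1, and the parent-link array st are assumptions of this sketch):

#define bfs search
void bfs(Graph G, Edge e)
{ int v;
  QUEUEput(e);                          /* dummy self-loop to the start vertex */
  while (!QUEUEempty())
    if (pre[(e = QUEUEget()).w] == -1)  /* skip edges to visited vertices */
    { pre[e.w] = cnt++; st[e.w] = e.v;  /* visit the vertex */
      for (v = 0; v < G->V; v++)
        if (G->adj[e.w][v] != 0 && pre[v] == -1)
          QUEUEput(EDGE(e.w, v));       /* queue edges to unvisited vertices */
    }
}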
As we saw in Section 18.4, DFS is analogous to one person ex-
ploring a maze. BFS is analogous to a group of people exploring by
fanning out in all directions. Although DFS and BFS are different in
many respects, there is an essential underlying relationship between
the two methods—one that we noted when we briefly considered the
methods in Chapter 5. In Section 18.8, we consider a generalized
Figure 18.21 Breadth-first search
This figure traces the operation of BFS on our sample graph. We begin with all the edges adjacent to the start vertex on the queue (top left). Next, we move edge 0-2 from the queue to the tree and process its incident edges 2-0 and 2-6 (second from top, left). We do not put 2-0 on the queue because 0 is already on the tree. Third, we move edge 0-5 from the queue to the tree; again 5's incident edge (to 0) leads nowhere new, but we add 5-3 and 5-4 to the queue (third from top, left). Next, we add 0-7 to the tree and put 7-1 on the queue (bottom left).
The edge 7-4 is printed in gray because we could also avoid putting it on the queue, since there is another edge that will take us to 4 that is already on the queue. To complete the search, we take the remaining edges off the queue, completely ignoring the gray edges when they come to the front of the queue (right). Edges enter and leave the queue in order of their distance from 0.
Property 18.11 BFS visits all the vertices and edges in a graph in
time proportional to V^2 for the adjacency-matrix representation and
to V + E for the adjacency-lists representation.
Figure 18.23 All-pairs shortest paths example
These figures depict the result of doing BFS from each vertex, thus computing the shortest paths connecting all pairs of vertices. Each search gives a BFS tree that defines the shortest paths connecting all graph vertices to the vertex at the root. The results of all the searches are summarized in the two matrices at the bottom. In the left matrix, the entry in row v and column w gives the length of the shortest path from v to w (the depth of v in w's tree). Each row of the right matrix contains the st array for the corresponding search. For example, the shortest path from 3 to 2 has three edges, as indicated by the entry in row 3 and column 2 of the left matrix. The third BFS tree from the top on the left tells us that the path is 3-4-6-2, and this information is encoded in row 2 in the right matrix. The matrix is not necessarily symmetric when there is more than one shortest path, because the paths found depend on the BFS search order. For example, the BFS tree at the bottom on the left and row 3 of the right matrix tell us that the shortest path from 2 to 3 is 2-0-5-3.

       0 1 2 3 4 5 6 7           0 1 2 3 4 5 6 7
    0  0 2 1 2 2 1 2 1        0  0 7 0 5 7 0 2 0
    1  2 0 3 3 2 3 3 1        1  7 1 0 4 7 4 4 1
    2  1 3 0 3 2 2 1 2        2  2 7 2 4 6 0 2 0
    3  2 3 3 0 1 1 2 2        3  5 7 0 3 3 3 4 4
    4  2 2 2 1 0 1 1 1        4  7 7 6 4 4 4 4 4
    5  1 3 2 1 1 0 2 2        5  5 7 0 5 5 5 4 4
    6  2 3 1 2 1 2 0 2        6  2 7 6 4 6 4 6 4
    7  1 1 2 2 1 2 2 0        7  7 7 0 4 7 4 4 7
Figure 18.24 Breadth-first search
This figure illustrates the progress of BFS in a random Euclidean near-neighbor graph (left), in the same style as Figure 18.13. As is evident from this example, the search tree for BFS tends to be quite short and wide for this type of graph (and many other types of graphs commonly encountered in practice). That is, vertices tend to be connected to one another by rather short paths. The contrast between the shapes of the DFS and BFS trees is striking testimony to the differing dynamic properties of the algorithms.

After this preprocessing, we can report shortest-path lengths in constant time and the paths themselves in time proportional to their length (see Exercise 18.53).
These BFS-based solutions are effective, but we do not consider implementations in any further detail here because they are special cases of algorithms that we consider in detail in Chapter 21. The term shortest paths in graphs is generally taken to describe the corresponding problems for digraphs and networks. Chapter 21 is devoted to this topic. The solutions that we examine there are strict generalizations of the BFS-based solutions described here.
The basic characteristics of BFS search dynamics contrast sharply with those for DFS search, as illustrated in the large graph depicted in Figure 18.24, which you should compare with Figure 18.13. The tree is shallow and broad, and demonstrates a set of facts about the graph being searched different from those shown by DFS. For example,
• There exists a relatively short path connecting each pair of ver-
tices in the graph.
• During the search, most vertices are adjacent to numerous unvis-
ited vertices.
Again, this example is typical of the behavior that we expect from BFS,
but verifying facts of this kind for graph models of interest and graphs
that arise in practice requires detailed analysis.
DFS wends its way through the graph, storing on the stack the
points where other paths branch off; BFS sweeps through the graph,
using a queue to remember the frontier of visited places. DFS explores
the graph by looking for new vertices far away from the start point,
taking closer vertices only when dead ends are encountered; BFS com-
pletely covers the area close to the starting point, moving farther away
only when everything nearby has been examined. The order in which
the vertices are visited depends on the graph structure and representa-
tion, but these global properties of the search trees are determined more by the algorithms than by the graphs or their representations.
The key to understanding graph-processing algorithms is to real-
ize that not only are various different search strategies effective ways
to learn various different graph properties, but also we can imple-
ment many of them uniformly. For example, the DFS illustrated in
Figure 18.13 tells us that the graph has a long path, and the BFS illus-
trated in Figure 18.24 tells us that it has many short paths. Despite
these marked dynamic differences, DFS and BFS are similar, essen-
tially differing in only the data structure that we use to save edges that
are not yet explored (and the fortuitous circumstance that we can use
a recursive implementation for DFS, to have the system maintain an
implicit stack for us). Indeed, we turn next to a generalized graph-
search algorithm that encompasses DFS, BFS, and a host of other use-
ful strategies, and will serve as the basis for solutions to numerous
classic graph-processing problems.
Exercises
18.49 Draw the BFS forest that results from a standard adjacency-lists BFS of
the graph
3-7 1-4 7-8 0-5 5-2 3-8 2-9 0-6 4-9 2-6 6-4.
18.50 Draw the BFS forest that results from a standard adjacency-matrix BFS
of the graph
3-7 1-4 7-8 0-5 5-2 3-8 2-9 0-6 4-9 2-6 6-4.
18.51 Give the all-shortest-path matrices (in the style of Figure 18.23) for the
graph
3-7 1-4 7-8 0-5 5-2 3-8 2-9 0-6 4-9 2-6 6-4,
assuming that you use the adjacency-matrix representation.
18.54 What does the BFS tree tell us about the distance from v to w when
neither is at the root?
18.55 Develop a graph ADT function that returns the path length that suf-
fices to connect any pair of vertices. (This quantity is known as the graph's diameter.) Note: You need to define a convention for the return value in the
case that the graph is not connected.
18.56 Give a simple optimal recursive algorithm for finding the diameter of a
tree (see Exercise 18.55).
Figure 18.25 Stack-based DFS
Together with Figure 18.21, this figure illustrates that BFS and DFS differ only in the underlying data structure. For BFS, we used a queue; for DFS we use a stack. We begin with all the edges adjacent to the start vertex on the stack (top left). Second, we move edge 0-7 from the stack to the tree and push onto the stack its incident edges that go to vertices not yet on the tree: 7-1, 7-4, and 7-6 (second from top, left). The LIFO stack discipline implies that, when we put an edge on the stack, any edges that point to the same vertex are obsolete and will be ignored when they reach the top of the stack. Such edges are printed in gray. Third, we move edge 7-6 from the stack to the tree and push its incident edges on the stack (third from top, left). Next, we pop 4-6 and push its incident edges on the stack, two of which will take us to new vertices (bottom left). To complete the search, we take the remaining edges off the stack, completely ignoring the gray edges when they come to the top of the stack (right).
Starting with a self-loop to a start vertex on the fringe and an empty tree, we perform the following operation until the fringe is empty:
    Move an edge from the fringe to the tree. If the vertex to
    which it leads is unvisited, visit that vertex, and put onto
    the fringe all edges that lead from that vertex to unvisited
    vertices.
This strategy describes a family of search algorithms that will visit all the vertices and edges of a connected graph, no matter what type of generalized queue we use to hold the fringe edges.
When we use a queue to implement the fringe, we get BFS, the topic of Section 18.7. When we use a stack to implement the fringe, we get DFS. Figure 18.25, which you should compare with Figures 18.6 and 18.21, illustrates this phenomenon in detail. Proving this equivalence between recursive and stack-based DFS is an interesting exercise in recursion removal, where we essentially transform the stack underlying the recursive program into the stack implementing the fringe (see Exercise 18.61). The search order for the DFS depicted in Figure 18.25 differs from the one depicted in Figure 18.6 only because the stack discipline implies that we check the edges incident on each vertex in the reverse of the order that we encounter them in the adjacency matrix (or the adjacency lists). The basic fact remains that, if we change the data structure used by Program 18.8 to be a stack instead of a queue (which is trivial to do because the ADT interfaces of the two data structures differ in only the function names), then we change that program from BFS to DFS.

Figure 18.26 Graph search terminology
During a graph search, we maintain a search tree (black) and a fringe (gray) of edges that are candidates to be next added to the tree. Each vertex is either on the tree (black), the fringe (gray), or not yet seen (white). Tree vertices are connected by tree edges, and each fringe vertex is connected by a fringe edge to a tree vertex.

Now, as we discussed in Section 18.7, this general method may not be as efficient as we would like, because the fringe becomes cluttered up with edges that point to vertices that are moved to the tree during the time that the edge is on the fringe. For FIFO queues, we avoid this situation by marking destination vertices when we put edges on the queue. We ignore edges to fringe vertices because we know that they will never be used: The old one will come off the queue (and the vertex visited) before the new one does (see Program 18.9). For a stack implementation, we want the opposite: When an edge is to be added to the fringe that has the same destination vertex as the one already there, we know that the old edge will never be used, because the new one will come off the stack (and the vertex visited) before the old one. To encompass these two extremes and to allow for fringe
#include <stdlib.h>
#include "GQ.h"
/* Random queue: an implementation of the generalized-queue (GQ)
   interface in which RQget removes and returns an item chosen
   uniformly at random from the N items stored in the array s */
static Item *s;
static int N;
void RQinit(int maxN)
{ s = malloc(maxN*sizeof(Item)); N = 0; }
int RQempty()
{ return N == 0; }
void RQput(Item x)
{ s[N++] = x; }
void RQupdate(Item x)
{ } /* no-op in this implementation */
Item RQget()
{ Item t;
  int i = N*(rand()/(RAND_MAX + 1.0)); /* random index in [0, N) */
  t = s[i]; s[i] = s[N-1]; s[N-1] = t; /* swap that item to the end */
  return s[--N];                       /* remove and return it */
}
One simple approach is to put all the edges on the fringe, then ignore those that go to tree vertices when
we take them off. The disadvantage of this approach, as with BFS,
is that the maximum queue size has to be E instead of V . Or, we
could handle updates implicitly in the ADT implementation, just by
specifying that no two edges with the same destination vertex can
be on the queue. But the simplest way for the ADT implementation
to do so is essentially equivalent to using a vertex-indexed array (see
Exercises 4.51 and 4.54), so the test fits more comfortably into the
client graph-search program.
The combination of Program 18.10 and the generalized-queue
abstraction gives us a general and flexible graph-search mechanism.
To illustrate this point, we now consider briefly two interesting and
useful alternatives to BFS and DFS.
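For concreteness, here is a sketch of what such a mechanism might look like, in the spirit of the text's Program 18.10 (reconstructed under assumptions, not the book's code: generalized-queue operations GQput, GQget, and GQempty matching the GQ.h interface above, the adjacency-matrix representation, and the globals of Program 18.1 with a parent-link array st):

void searchG(Graph G, Edge e)
{ int v, w;
  GQput(e);                         /* self-loop to the start vertex */
  while (!GQempty())
  { e = GQget(); w = e.w;
    if (pre[w] == -1)               /* ignore edges to tree vertices */
    { pre[w] = cnt++; st[w] = e.v;  /* visit w; record the tree edge */
      for (v = 0; v < G->V; v++)
        if (G->adj[w][v] != 0 && pre[v] == -1)
          GQput(EDGE(w, v));        /* put fringe edges on the generalized queue */
    }
  }
}

Substituting a FIFO queue, a stack, or the random queue for the GQ implementation yields BFS, DFS, or randomized search, respectively, without changing this function.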
Figure 18.29
Randomized graph search
This figure illustrates the progress
of randomized graph searching
(left), in the same style as Fig-
ures 18.13 and 18.24. The search
tree shape falls somewhere be-
tween the BFS and DFS shapes.
The dynamics of these three al-
gorithms, which differ only in the
data structure for work to be com-
pleted, could hardly be more dif-
ferent.
• 18.69 Write a client program that does dynamic graphical animations of gen-
eralized graph search for graphs that have (x, y) coordinates associated with
each vertex (see Exercises 17.55 through 17.59). Test your program on ran-
dom Euclidean neighbor graphs, using as many points as you can process in
a reasonable amount of time. Your program should produce images like the
snapshots shown in Figures 18.13, 18.24, and 18.29, although you should
feel free to use colors instead of shades of gray to denote tree, fringe, and
unseen vertices and edges.
We are left in a situation where, for many applications, we are on thin ice with respect
to all three properties listed in the previous paragraph:
• The mathematical analysis is challenging, and many basic ana-
lytic questions remain unanswered.
• There is a huge number of different types of graphs, and we
cannot reasonably test our algorithms on all of them.
• Characterizing the types of graphs that arise in practical problems
is, for the most part, a poorly understood problem.
Graphs are sufficiently complicated that we often do not fully under-
stand the essential properties of the ones that we see in practice or of
the artificial ones that we can perhaps generate and analyze.
The situation is perhaps not as bleak as just described for one
primary reason: Many of the graph algorithms that we consider are
optimal in the worst case, so predicting performance is a trivial ex-
ercise. For example, Program 18.7 finds the bridges after examining
each edge and each vertex just once. This cost is the same as the cost of
building the graph data structure, and we can confidently predict, for
example, that doubling the number of edges will double the running
time, no matter what kind of graphs we are processing.
When the running time of an algorithm depends on the structure
of the input graph, predictions are much harder to come by. Still,
when we need to process huge numbers of huge graphs, we want
efficient algorithms for the same reasons that we want them for any
other problem domain, and we will continue to pursue the goals of
understanding the essential properties of algorithms and applications,
striving to identify those methods that can best handle the graphs that
might arise in practice.
To illustrate some of these issues, we revisit the study of graph
connectivity, a problem that we considered already in Chapter 1 (!).
Connectivity in random graphs has fascinated mathematicians for
years, and it has been the subject of an extensive literature. That
literature is beyond the scope of this book, but it provides a backdrop
that validates our use of the problem as the basis for some experimen-
tal studies that help us understand the basic algorithms that we use
and the types of graphs that we are considering.
For example, growing a graph by adding random edges to a set of
initially isolated vertices (essentially, the process behind
Program 17.7) is a well-studied process that has served as the basis for
classical random graph theory.
Table 18.2
This table shows the number of connected components and the size of the
largest connected component for 100000-vertex graphs drawn from two
different distributions. For the random graph model, these experiments
support the well-known fact that the graph is highly likely to consist
primarily of one giant component if the average vertex degree is larger
than a small constant. The right two columns give experimental results
when we restrict the edges to be chosen from those that connect each
vertex to just one of 10 specified neighbors.

                random edges     random 10-neighbors
        E          C       L         C        L
     1000      99000       5     99003        3
     2000      98000       4     98010        4
     5000      95000       6     95075        5
    10000      90000       8     90300        7
    20000      80002      16     81381        9
    50000      50003    1701     57986       27
   100000      16236   79633     28721      151
   200000       1887   98049      3818     6797
   500000          4   99997        19    99979
  1000000          1  100000         1   100000

Key:
C  number of connected components
L  size of largest connected component
It is well known that, as the number of edges grows, the graph coalesces
into just one giant component. The literature on random graphs gives
extensive information about the nature of this process; for example:
Property 18.13 If E > (V/2) ln V + μV (with μ positive), a random graph
with V vertices and E edges consists of a single giant connected
component plus, on average, fewer than e^(-2μ) vertices outside that
component, with probability approaching 1 as V approaches infinity.

Proof: This fact was established by seminal work of Erdős and Rényi in
1960. The proof is beyond the scope of this book (see reference
section).
This property tells us that we can expect large nonsparse random graphs
to be connected. For example, if V > 1000 and E > 10V, then
μ > 10 − (1/2) ln 1000 > 6.5 and the average number of vertices not in
the giant component is (almost surely) less than e^(-13) < .000003. If
we generate a million random 1000-vertex graphs of density greater than
10, we might get a few with a single isolated vertex, but the rest will
all be connected.
Figure 18.30 compares random graphs with random neighbor
graphs, where we allow only edges that connect vertices whose indices
are within a small constant of one another. The neighbor-graph model
yields graphs that are evidently substantially different in character from
random graphs. We eventually get a giant component, but it appears
suddenly, when two large components merge.
Table 18.2 shows that these structural differences between ran-
dom graphs and neighbor graphs persist for V and E in ranges of prac-
tical interest. Such structural differences certainly may be reflected in
the performance of our algorithms.
Table 18.3 gives empirical results for the cost of finding the num-
ber of connected components in a random graph, using various al-
gorithms. Although the algorithms perhaps are not subject to direct
comparison for this specific task because they are designed to handle
different tasks, these experiments do validate a subset of the general
conclusions that we have drawn.
First, it is plain from the table that we should not use the
adjacency-matrix representation for large sparse graphs (and cannot
use it for huge ones), not just because the cost of initializing the array
is prohibitive, but also because the algorithm inspects every entry in
the array, so its running time is proportional to the size (V^2) of the
array rather than to the number of 1s in it (E). The table shows, for
example, that it takes about as long to process a graph that contains
1000 edges as it does to process one that contains 100000 edges when
we are using an adjacency matrix.
Second, it is also plain from Table 18.3 that the cost of allocating
memory for the list nodes is significant when we build adjacency lists
for large sparse graphs. The cost of building the lists is more than
five times the cost of traversing them.
Figure 18.30
Connectivity in random
graphs
This figure shows the evolution
of two types of random graphs at
10 equal increments as a total of
2E edges are added to initially
empty graphs. Each plot is a his-
togram of the number of vertices
in components of size 1 through
V − 1 (left to right). We start out
with all vertices in components of
size 1 and end with nearly all ver-
tices in a giant component. The
plot at left is for a standard random
graph: the giant component forms
quickly, and all other components
are small. The plot at right is for a
random neighbor graph: compo-
nents of various sizes persist for a
longer time.
Table 18.3
This table shows relative timings for various algorithms for the task of
determining the number of connected components (and the size of the
largest one) for graphs with various numbers of vertices and edges. As
expected, algorithms that use the adjacency-matrix representation are
slow for sparse graphs, but competitive for dense graphs. For this
specialized task, the union-find algorithms that we considered in
Chapter 1 are the fastest, because they build a data structure tailored
to solve the problem and do not need otherwise to represent the graph.
Once the data structure representing the graph has been built, however,
DFS and BFS are faster, and more flexible. Adding a test to stop when
the graph is known to consist of a single connected component
significantly speeds up DFS and union-find (but not BFS) for dense
graphs.

                          adjacency matrix      adjacency lists
        E       U    U*     I     D    D*      I     D    D*     B    B*
  5000 vertices
      500       1     0   255   312   356      1     0     0     0     1
     1000       0     1   255   311   354      1     0     0     0     1
     5000       1     2   258   312   353      2     2     1     2     1
    10000       3     3   258   314   358      5     2     1     2     1
    50000      12     6   270   315   202     25     6     4     5     6
   100000      23     7   286   314   181     52     9     2    10    11
   500000     117     5   478   248   111    267    54    16    56    47
  100000 vertices
     5000       5     3                        3     8     7    24    24
    10000       4     5                        6     7     7    24    24
    50000      18    18                       26    12    12    28    28
   100000      34    35                       51    28    24    34    34
   500000     133   137                      259                88    89

Key:
U   weighted quick union with halving (Program 1.4)
I   initial construction of the graph representation
D   recursive DFS (Programs 18.1 and 18.2)
B   BFS (Programs 18.9 and 18.10)
*   exit when graph is found to be fully connected
In the typical situation where we are going to perform numerous searches
of various types after building the graph, this cost is acceptable.
Otherwise, we might consider preallocating the array to reduce the cost
of memory allocation, as discussed in Chapter 2.
Third, the absence of numbers in the DFS columns for large sparse graphs
is significant. These graphs cause excessive recursion depth, which
(eventually) causes the program to crash. If we want to use DFS on such
graphs, we need to use the nonrecursive version that we discussed in
Section 18.7.
Fourth, the table shows that the union-find–based method from
Chapter 1 is faster than DFS or BFS, primarily because it does not
have to represent the entire graph. Without such a representation,
however, we cannot answer simple queries such as “Is there an edge
connecting v and w?” so union-find–based methods are not suitable
if we want to do more than what they are designed to do (answer “Is
there a path between v and w?” queries intermixed with adding edges).
Once the internal representation of the graph has been built, it is not
worthwhile to implement a union-find algorithm just to determine
whether it is connected, because DFS or BFS can provide the answer
about as quickly.
When we run empirical tests that lead to tables of this sort,
various anomalies might require further explanation. For example,
on many computers, the cache architecture and other features of the
memory system might have dramatic impact on performance for large
graphs. Improving performance in critical applications may require
detailed knowledge of the machine architecture in addition to all the
factors that we are considering.
Careful study of these tables will reveal more properties of these
algorithms than we are able to address. Our aim is not to do an ex-
haustive comparison but to illustrate that, despite the many challenges
that we face when we compare graph algorithms, we can and should
run empirical studies and make use of any available analytic results,
both to get a feeling for the algorithms’ important characteristics and
to predict performance.
Exercises
◦ 18.70 Do an empirical study culminating in a table like Table 18.2 for the
problem of determining whether or not a graph is bipartite (two-colorable).
18.71 Do an empirical study culminating in a table like Table 18.2 for the
problem of determining whether or not a graph is biconnected.
◦ 18.72 Do an empirical study to find the expected size of the second largest
connected component in sparse graphs of various sizes, drawn from various
graph models (see Exercises 17.63–76).
18.73 Write a program that produces plots like those in Figure 18.30, and
test it on graphs of various sizes, drawn from various graph models (see
Exercises 17.63–76).
◦ 18.74 Modify your program from Exercise 18.73 to produce similar his-
tograms for the sizes of edge-connected components.
•• 18.75 The numbers in the tables in this section are results for only one sample.
We might wish to prepare a similar table where we run 1000 experiments for
each entry and give the sample mean and standard deviation, but we probably
could not include nearly as many entries. Would this approach be a better use
of computer time? Defend your answer.
CHAPTER NINETEEN

Digraphs and DAGs
19.6 How many digits do we need to express the number of digraphs that
have V vertices as a base-10 number?
◦ 19.7 Draw the nonisomorphic digraphs that contain 3 vertices.
••• 19.8 How many different digraphs are there with V vertices and E edges if
we consider two digraphs to be different only if they are not isomorphic?
◦ 19.9 Compute an upper bound on the percentage of 10-vertex digraphs
that could ever be examined by any computer, under the assumptions
described in the text and the additional ones that the universe has less
than 10^80 electrons and that the age of the universe will be less than
10^20 years.
Program 19.1 Reversing a digraph (adjacency lists)
Given a digraph represented with adjacency lists, this function creates
a new adjacency-lists representation of a digraph that has the same
vertices and edges but with the edge directions reversed.

  Graph GRAPHreverse(Graph G)
  { int v; link t;
    Graph R = GRAPHinit(G->V);
    for (v = 0; v < G->V; v++)
      for (t = G->adj[v]; t != NULL; t = t->next)
        GRAPHinsertE(R, EDGE(t->v, v));
    return R;
  }
To adapt for digraphs the programs for generating, building, and showing
graphs (Programs 17.1 through 17.9), we remove the statement

  G->adj[w] = NEW(v, G->adj[w]);

from the function GRAPHinsertE in the adjacency-lists version
(Program 17.6), remove the references to G->adj[w][v] from the functions
GRAPHinsertE and GRAPHremoveE in the adjacency-matrix version
(Program 17.3), and make the appropriate adjustments to GRAPHedges in
both versions.
The indegree of a vertex in a digraph is the number of directed edges
that lead to that vertex. The outdegree of a vertex in a digraph is the
number of directed edges that emanate from that vertex. No vertex is
reachable from a vertex of outdegree 0, which is called a sink; a vertex
of indegree 0, which is called a source, is not reachable from any other
vertex. A digraph where self-loops are allowed and every vertex has
outdegree 1 is called a map (a function from the set of integers from 0
to V − 1 onto itself). We can easily compute the indegree and outdegree
of each vertex, and find sources and sinks, in linear time and space
proportional to V, using vertex-indexed arrays (see Exercise 19.19).
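For example, the following is a minimal sketch of such a computation,
assuming the adjacency-lists digraph representation of Program 17.6 (the
function name and the arrays in and out are hypothetical). Sources are
the vertices with in[v] equal to 0; sinks are those with out[v] equal
to 0.

  void DIGRAPHdegrees(Graph G, int in[], int out[])
  { int v; link t;
    for (v = 0; v < G->V; v++)
      in[v] = out[v] = 0;          /* vertex-indexed arrays */
    for (v = 0; v < G->V; v++)     /* one pass over all the edges */
      for (t = G->adj[v]; t != NULL; t = t->next)
        { out[v]++; in[t->v]++; }
  }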
Figure 19.5
Digraph reversal
Reversing the edges of a digraph corresponds to transposing the
adjacency matrix but requires rebuilding the adjacency lists (see
Figures 19.1 and 19.4).

The reverse of a digraph is the digraph that we obtain by switching the
direction of all the edges. Figure 19.5 shows the reverse and its
representations for the digraph of Figure 19.1. We use the reverse in
digraph algorithms when we need to know from where edges come
because our standard representations tell us only where the edges go.
For example, indegree and outdegree change roles when we reverse a
digraph.
For an adjacency-matrix representation, we could compute the
reverse by making a copy of the array and transposing it (interchanging
its rows and columns). If we know that the graph is not going to be
modified, we can actually use the reverse without any extra computa-
tion by simply interchanging our references to the rows and columns
when we want to refer to the reverse. That is, an edge s-t in a digraph
G is indicated by a 1 in G->adj[s][t], so, if we were to compute the
reverse R of G, it would have a 1 in R->adj[t][s]; we do not need
to do so, however, because, if we want to check whether there is an
edge from s to t in the reverse of G, we can just check G->adj[t][s].
This opportunity may seem obvious, but it is often overlooked. For
an adjacency-lists representation, the reverse is a completely different
data structure, and we need to take time proportional to the number
of edges to build it, as shown in Program 19.1.
Yet another option, which we shall address in Chapter 22, is
to maintain two representations of each edge, in the same manner as
we do for undirected graphs (see Section 17.3) but with an extra bit
that indicates edge direction. For example, to use this method in an
adjacency-lists representation we would represent an edge s-t by a node
for t on the adjacency list for s (with the direction bit set to
indicate that to move from s to t is a forward traversal of the edge)
and a node for s on the adjacency list for t (with the direction bit set
to indicate that to move from t to s is a backward traversal of the
edge). This representation supports algorithms that need to traverse
edges in digraphs in both directions. It is also generally convenient to
include pointers connecting the two representations of each edge in such
cases. We defer considering this representation in detail to Chapter 22,
where it plays an essential role.

Figure 19.6
A directed acyclic graph (DAG)
This digraph has no cycles, a property that is not immediately apparent
from the edge list or even from examining its drawing.
Edges: 2-3 11-12 3-5 6-4 0-6 9-12 8-7 6-9 0-1 9-10 5-4 7-6 2-0 9-11 0-5

In digraphs, by analogy to undirected graphs, we speak of directed
cycles, which are directed paths from a vertex back to itself, and
simple directed paths and cycles, where the vertices and edges are
distinct. Note that s-t-s is a cycle of length 2 in a digraph but that
cycles in undirected graphs must have at least three distinct vertices.
In many applications of digraphs, we do not expect to see any cycles,
and we work with yet another type of combinatorial object.
Note that no DAG that contains more than one vertex is strongly
connected.
Like simple connectivity in undirected graphs, this relation is
transitive: If s is strongly connected to t and t is strongly connected
to u, then s is strongly connected to u. Strong connectivity is an
equivalence relation that divides the vertices into equivalence classes
containing vertices that are strongly connected to each other. (See
Section 19.4 for a detailed discussion of equivalence relations.) Again,
strong connectivity provides a property for digraphs that we take for
granted with respect to connectivity in undirected graphs.
Property 19.1 A digraph that is not strongly connected comprises a
set of strongly connected components (strong components, for short),
which are maximal strongly connected subgraphs, and a set of directed
edges that go from one component to another.
Proof : Like components in undirected graphs, strong components in
digraphs are induced subgraphs of subsets of vertices: Each vertex is
in exactly one strong component. To prove this fact, we first note that
every vertex belongs to at least one strong component, which contains
(at least) the vertex itself. Then we note that every vertex belongs to
at most one strong component: If a vertex were to belong to two dif-
ferent ones, then there would be paths through that vertex connecting
vertices in those components to each other, in both directions, which
contradicts the maximality of both components.
For example, a digraph that consists of a single directed cycle
has just one strong component. At the other extreme, each vertex in
a DAG is a strong component, so each edge in a DAG goes from one
component to another. In general, not all edges in a digraph are in
the strong components. This situation is in contrast to the analogous
situation for connected components in undirected graphs, where every
vertex and every edge belongs to some connected component, but
similar to the analogous situation for edge-connected components in
undirected graphs. The strong components in a digraph are connected
by edges that go from a vertex in one component to a vertex in another,
but do not go back again.
Property 19.2 Given a digraph D, define another digraph K(D) with one
vertex corresponding to each strong component of D and one edge in K(D)
corresponding to each edge in D that connects vertices in two different
strong components: K(D) is a DAG (that we call the kernel DAG of D).
But our primary aim is to address the fact that reachability queries
in digraphs are more difficult to handle than connectivity or strong con-
nectivity queries. In this chapter, we examine classical algorithms that
require preprocessing time proportional to V E and space proportional
to V^2, develop implementations that can achieve constant-time reach-
ability queries with linear space and preprocessing time for some di-
graphs, and study the difficulty of achieving this optimal performance
for all digraphs.
Exercises
19.10 Give the adjacency-lists structure that is built by Program 17.6, modi-
fied as described in the text to process digraphs, for the digraph
3-7 1-4 7-8 0-5 5-2 3-8 2-9 0-6 4-9 2-6 6-4.
19.11 Implement an ADT for sparse digraphs, based on Program 17.6, mod-
ified as described in the text. Include a random-digraph generator based on
Program 17.7. Write a client program to generate random digraphs for a well-
chosen set of values of V and E, such that you can use it to run meaningful
empirical tests on digraphs drawn from this model.
19.12 Implement an ADT for dense digraphs, based on Program 17.3, mod-
ified as described in the text. Include a random-digraph generator based on
Program 17.8. Write a client program to generate random graphs for a well-
chosen set of values of V and E, such that you can use it to run meaningful
empirical tests on graphs drawn from this model.
◦ 19.13 Write a program that generates random digraphs by connecting
vertices arranged in a √V-by-√V grid to their neighbors, with edge
directions randomly chosen (see Figure 19.3).
◦ 19.14 Augment your program from Exercise 19.13 to add R extra random
edges (all possible edges equally likely). For large R, shrink the grid so that the
total number of edges remains about V . Test your program as described in
Exercise 19.11.
◦ 19.15 Modify your program from Exercise 19.14 such that an extra edge
goes from a vertex s to a vertex t with probability inversely proportional to
the Euclidean distance between s and t.
◦ 19.16 Write a program that generates V random intervals in the unit interval,
all of length d, then builds a digraph with an edge from interval s to interval t if
and only if at least one of the endpoints of s falls within t (see Exercise 17.73).
Determine how to set d so that the expected number of edges is E. Test your
program as described in Exercise 19.11 (for low densities) and as described
in Exercise 19.12 (for high densities).
• 19.17 Write a program that chooses V vertices and E edges from the real
digraph that you found for Exercise 19.1. Test your program as described
152 §19.2 CHAPTER NINETEEN
in Exercise 19.11 (for low densities) and as described in Exercise 19.12 (for
high densities).
• 19.18 Write a program that produces each of the possible digraphs with V
vertices and E edges with equal likelihood (see Exercise 17.69). Test your
program as described in Exercise 19.11 (for low densities) and as described
in Exercise 19.12 (for high densities).
19.19 Add digraph ADT functions that return the number of sources and
sinks in the digraph. Modify the adjacency-lists ADT implementation (see
Exercise 19.11) such that you can implement the functions in constant time.
◦ 19.20 Use your program from Exercise 19.19 to find the average number of
sources and sinks in various types of digraphs (see Exercises 19.11–18).
19.21 Show the adjacency-lists structure that is produced when you use Pro-
gram 19.1 to find the reverse of the digraph
3-7 1-4 7-8 0-5 5-2 3-8 2-9 0-6 4-9 2-6 6-4.
• 19.27 Give a kernel DAG of the grid digraph shown in Figure 19.3.
19.28 How many digraphs have V vertices, all of outdegree k?
• 19.29 What is the expected number of different adjacency-list representations
of a random digraph? Hint: Divide the total number of possible representa-
tions by the total number of digraphs.
(recursively) visit all the vertices that can be reached from each of
the vertices on its adjacency list.

In an undirected graph, we have two representations of each edge, but
the second representation that is encountered in a DFS always leads to a
marked vertex and is ignored (see Section 18.2). In a digraph, we have
just one representation of each edge, so we might expect DFS algorithms
to be more straightforward. But digraphs themselves are more complicated
combinatorial objects than undirected graphs, so this expectation is not
justified. For example, the search trees that we use to understand the
operation of the algorithm have a more complicated structure for
digraphs than for undirected graphs.

Figure 19.11
DFS forests for a digraph
These forests describe depth-first search of the same graph as
Figure 19.9, when the graph search function checks the vertices (and
calls the recursive function for the unvisited ones) in the order
s, s+1, ..., V-1, 0, 1, ..., s-1.
edge. Suppose that v is the first of the vertices on the cycle that is
visited by the DFS. That vertex has the lowest preorder number of all
the vertices on the cycle. The edge that points to it will therefore be a
back edge: It will be encountered during the recursive call for v (for
a proof that it must be, see Property 19.5); and it points from some
node on the cycle to v, a node with a lower preorder number (see
Property 19.3).
We can convert any digraph into a DAG by doing a DFS and
removing any graph edges that correspond to back edges in the DFS.
For example, Figure 19.9 tells us that removing the edges 2-0, 3-5,
2-3, 9-11, 10-12, 4-2, and 8-7 makes the digraph in Figure 19.1 a
DAG. The specific DAG that we get in this way depends on the graph
representation and the associated implications for the dynamics of the
DFS (see Exercise 19.38). This method is a useful way to generate
large arbitrary DAGs randomly (see Exercise 19.79) for use in testing
DAG-processing algorithms.
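In outline, the conversion might look like the following sketch, an
illustration with hypothetical names (GRAPHdag, dagR), assuming the
adjacency-matrix digraph representation of Program 17.3 and a bound maxV
on the number of vertices. An edge v-w is a back edge precisely when w
has been visited but not yet finished, so those are the edges that we
omit.

  static int pre[maxV], post[maxV], cnt, pcnt;
  void dagR(Graph G, Graph D, int v)
  { int w;
    pre[v] = cnt++;
    for (w = 0; w < G->V; w++)
      if (G->adj[v][w] != 0)
      { if (pre[w] == -1)        /* tree edge: keep it and recurse */
          { GRAPHinsertE(D, EDGE(v, w)); dagR(G, D, w); }
        else if (post[w] != -1)  /* down or cross edge: keep it */
          GRAPHinsertE(D, EDGE(v, w));
        /* otherwise w is still active, so v-w is a back edge: omit it */
      }
    post[v] = pcnt++;
  }
  Graph GRAPHdag(Graph G)
  { int v;
    Graph D = GRAPHinit(G->V);
    cnt = pcnt = 0;
    for (v = 0; v < G->V; v++) pre[v] = post[v] = -1;
    for (v = 0; v < G->V; v++)
      if (pre[v] == -1) dagR(G, D, v);
    return D;
  }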
Directed cycle detection is a simple problem, but contrasting
the solution just described with the solution that we considered in
Chapter 18 for undirected graphs gives insight into the necessity to
consider the two types of graphs as different combinatorial objects,
even though their representations are similar and the same programs
work on both types for some applications. By our definitions, we seem
to be using the same method to solve this problem as for cycle detection
in undirected graphs (look for back edges), but the implementation
that we used for undirected graphs would not work for digraphs.
For example, in Section 18.5 we were careful to distinguish between
parent links and back links, since the existence of a parent link does
not indicate a cycle (cycles in undirected graphs must involve at least
three vertices). But to ignore links back to a node’s parents in digraphs
would be incorrect; we do consider a doubly-connected pair of vertices
in a digraph to be a cycle. Theoretically, we could have defined back
edges in undirected graphs in the same way as we have done here, but
then we would have needed an explicit exception for the two-vertex
case. More important, we can detect cycles in undirected graphs in
time proportional to V (see Section 18.5), but we may need time
proportional to E to find a cycle in a digraph (see Exercise 19.33).
The essential purpose of DFS is to provide a systematic way to
visit all the vertices and all the edges of a graph. It therefore gives
the top level (the roots of the DFS trees) in a particular order that
immediately exposes the strong components.
Exercises
19.30 Add code to Program 19.2 to print out an indented DFS trace, as
described in the commentary to Figure 19.10.
19.31 Draw the DFS forest that results from a standard adjacency-lists DFS
of the digraph
3-7 1-4 7-8 0-5 5-2 3-8 2-9 0-6 4-9 2-6 6-4.
19.32 Draw the DFS forest that results from a standard adjacency-matrix DFS
of the digraph
3-7 1-4 7-8 0-5 5-2 3-8 2-9 0-6 4-9 2-6 6-4.
◦ 19.33 Describe a family of digraphs with V vertices and E edges for which a
standard adjacency-lists DFS requires time proportional to E for cycle detec-
tion.
19.34 Show that, during a DFS in a digraph, no edge connects a node to
another node whose preorder and postorder numbers are both smaller.
◦ 19.35 Show all possible DFS forests for the digraph
0-1 0-2 0-3 1-3 2-3.
Tabulate the number of tree, back, cross, and down edges for each forest.
19.36 If we denote the number of tree, back, cross, and down edges by t, b,
c, and d, respectively, then we have t + b + c + d = E and t < V for any DFS
of any digraph with V vertices and E edges. What other relationships among
these variables can you infer? Which of the values are dependent solely on
graph properties and which are dependent on dynamic properties of the DFS?
19.37 Prove that every source in a digraph must be a root of some tree in the
forest corresponding to any DFS of that digraph.
◦ 19.38 Construct a connected DAG that is a subgraph of Figure 19.1 by
deleting five edges (see Figure 19.11).
19.39 Define a digraph ADT function that provides the capability for a client
to check that a digraph is indeed a DAG, and provide a DFS-based implemen-
tation for the adjacency-matrix representation.
19.40 Use your solution to Exercise 19.39 to estimate (empirically) the prob-
ability that a random digraph with V vertices and E edges is a DAG for various
types of digraphs (see Exercises 19.11–18).
19.41 Run empirical studies to determine the relative percentages of tree,
back, cross, and down edges when we run DFS on various types of digraphs
(see Exercises 19.11–18).
◦ 19.45 Give rules corresponding to Trémaux traversal for a maze where all the
passages are one-way.
Figure 19.14
Squaring an adjacency matrix
If we put 0s on the diagonal of a digraph's adjacency matrix, the square
of the matrix represents a graph with an edge corresponding to each path
of length 2 (top). If we put 1s on the diagonal, the square of the
matrix represents a graph with an edge corresponding to each path of
length 1 or 2 (bottom).
closure with just one operation of this kind, building up the transitive
closure from the adjacency matrix in place, as follows:

  for (i = 0; i < V; i++)
    for (s = 0; s < V; s++)
      for (t = 0; t < V; t++)
        if (A[s][i] && A[i][t]) A[s][t] = 1;
This classical method, invented by S. Warshall in 1962, is the method of
choice for computing the transitive closure of dense digraphs. The code
is similar to the code that we might try to use to square a Boolean
matrix in place: The difference (which is significant!) lies in the
order of the for loops.
Property 19.7 With Warshall's algorithm, we can compute the transitive
closure of a digraph in time proportional to V^3.

Proof: The running time is immediately evident from the structure of the
code. We prove that it computes the transitive closure by induction
on i. After the first iteration of the loop, the matrix has a 1 in row s
and column t if and only if we have either the paths s-t or s-0-t. The
second iteration checks all the paths between s and t that include 1 and
perhaps 0, such as s-1-t, s-1-0-t, and s-0-1-t. We are led to the
following inductive hypothesis: the ith iteration of the loop sets the
bit in row s and column t in the matrix to 1 if and only if there is a
directed path from s to t in the digraph that does not include any
vertices with indices greater than i (except possibly the endpoints s
and t). As just argued, the condition is true when i is 0, after the
first iteration of the loop. Assuming that it is true for the ith
iteration of the loop, there is a path from s to t that does not include
any vertices with indices greater than i+1 if and only if (i) there is a
path from s to t that does not include any vertices with indices greater
than i, in which case A[s][t] was set on a previous iteration of the
loop (by the inductive hypothesis); or (ii) there is a path from s to
i+1 and a path from i+1 to t, neither of which includes any vertices
with indices greater than i (except endpoints), in which case A[s][i+1]
and A[i+1][t] were previously set to 1 (by hypothesis), so the inner
loop sets A[s][t].

Figure 19.15
Adjacency matrix powers and directed paths
This sequence shows the first, second, third, and fourth powers (right,
top to bottom) of the adjacency matrix at the top right, which gives
graphs with edges for each of the paths of lengths less than 1, 2, 3,
and 4, respectively (left, top to bottom), in the graph that the matrix
represents. The bottom graph is the transitive closure for this example,
since there are no paths of length greater than 4 that connect vertices
not connected by shorter paths.

We can improve the performance of Warshall's algorithm with a simple
transformation of the code: We move the test of A[s][i] out of the inner
loop because its value does not change as t varies. This move allows us
to avoid executing the t loop entirely when A[s][i] is zero.
Figure 19.16
Warshall's algorithm
This sequence shows the development of the transitive closure (bottom)
of an example digraph (top) as computed with Warshall's algorithm. The
first iteration of the loop (left column, top) adds the edges 1-2 and
1-5 because of the paths 1-0-2 and 1-0-5, which include vertex 0 (but no
vertex with a higher number); the second iteration of the loop (left
column, second from top) adds the edges 2-0 and 2-5 because of the paths
2-1-0 and 2-1-0-5, which include vertex 1 (but no vertex with a higher
number); and the third iteration of the loop (left column, bottom) adds
the edges 0-1, 3-0, 3-1, and 3-5 because of the paths 0-2-1, 3-2-1-0,
3-2-1, and 3-2-1-0-5, which include vertex 2 (but no vertex with a
higher number). The right column shows the edges added when paths
through 3, 4, and 5 are considered. The last iteration of the loop
(right column, bottom) adds the edges from 0, 1, and 2, to 4, because
the only directed paths from those nodes to 4 include 5, the
highest-numbered vertex.

The savings that we achieve from this improvement depend on the digraph
and are substantial for many digraphs (see Exercises 19.54 and 19.55).
Program 19.3 implements this improvement and packages Warshall's method
as a pair of ADT functions for digraphs such that clients can preprocess
a digraph (compute the transitive closure), then compute the answer to
any reachability query in constant time.
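In outline, the improved loop might read as follows; this is a sketch
only (Program 19.3 packages the same idea behind the digraph ADT),
assuming a V-by-V array A that initially holds the adjacency matrix with
1s on the diagonal:

  for (i = 0; i < V; i++)
    for (s = 0; s < V; s++)
      if (A[s][i])              /* skip the t loop when A[s][i] is zero */
        for (t = 0; t < V; t++)
          if (A[i][t]) A[s][t] = 1;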
We are interested in pursuing more efficient solutions, particu-
larly for sparse digraphs. We would like to reduce both the prepro-
cessing time and the space because both make the use of Warshall’s
method prohibitively costly for huge sparse digraphs.
In modern applications, abstract data types provide us with the
ability to separate out the idea of an operation from any particular
implementation, so we can focus on efficient implementations. For the
transitive closure, this point of view leads to a recognition that we do
not necessarily need to compute the entire matrix to provide clients
The same argument holds for any linear-time generalized search (see
Section 18.8 and Exercise 19.69).
Program 19.4 is an implementation of this search-based transitive-
closure algorithm. The result of running this program on the sample
digraph in Figure 19.1 is illustrated in the first tree in each forest in
Figure 19.11. The implementation is packaged in the same way as
we packaged Warshall’s algorithm in Program 19.3: a preprocessing
function that computes the transitive closure, and a function that can
determine whether any vertex is reachable from any other in constant
time by testing the indicated entry in the transitive-closure array.
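The implementation might look like the following sketch, in the spirit
of Program 19.4 though not a verbatim copy of it. It assumes the
adjacency-lists representation of Program 17.6, extended with a
hypothetical tc field for the transitive-closure array, and a
matrix-allocation function MATRIXint like the one used in Program 19.9:

  void TCdfsR(Graph G, int v, int w)
  { link t;
    G->tc[v][w] = 1;                 /* w is reachable from v */
    for (t = G->adj[w]; t != NULL; t = t->next)
      if (G->tc[v][t->v] == 0) TCdfsR(G, v, t->v);
  }
  void GRAPHtc(Graph G)
  { int v;
    G->tc = MATRIXint(G->V, G->V, 0);
    for (v = 0; v < G->V; v++)
      TCdfsR(G, v, v);               /* one DFS from each vertex */
  }
  int GRAPHreach(Graph G, int s, int t)
  { return G->tc[s][t]; }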
For sparse digraphs, this search-based approach is the method
of choice. For example, if E is proportional to V , then Program 19.4
computes the transitive closure in time proportional to V 2 . How can
it do so, given the reduction to Boolean matrix multiplication that we
This table shows running times that exhibit dramatic performance
differences for various algorithms for computing the transitive closure
of random digraphs, both dense and sparse. For all but the
adjacency-lists DFS, the running time goes up by a factor of 8 when we
double V, which supports the conclusion that it is essentially
proportional to V^3. The adjacency-lists DFS takes time proportional to
VE, which explains the running time roughly increasing by a factor of 4
when we double both V and E (sparse graphs) and by a factor of about 2
when we double E (dense graphs), except that list-traversal overhead
degrades performance for high-density graphs.

     V     W    W*    A    L        E     W    W*    A    L

Key:
W   Warshall's algorithm (Section 19.3)
W*  Improved Warshall's algorithm (Program 19.3)
A   DFS, adjacency-matrix representation (Exercise 19.64)
L   DFS, adjacency-lists representation (Program 19.4)
• 19.50 Show how to construct a digraph with V vertices and E edges with
the property that the number of edges in the transitive closure is
proportional to t, for any t between E and V^2. As usual, assume that
E > V.
19.51 Give a formula for the number of edges in the transitive closure of a
digraph that is a directed forest as a function of structural properties of the
forest.
19.52 Show, in the style of Figure 19.15, the process of computing the tran-
sitive closure of the digraph
3-7 1-4 7-8 0-5 5-2 3-8 2-9 0-6 4-9 2-6 6-4
through repeated squaring.
19.53 Show, in the style of Figure 19.16, the process of computing the tran-
sitive closure of the digraph
3-7 1-4 7-8 0-5 5-2 3-8 2-9 0-6 4-9 2-6 6-4
with Warshall’s algorithm.
◦ 19.54 Give a family of sparse digraphs for which the improved version
of Warshall's algorithm for computing the transitive closure
(Program 19.3) runs in time proportional to VE.
◦ 19.55 Find a sparse digraph for which the improved version of
Warshall's algorithm for computing the transitive closure (Program 19.3)
runs in time proportional to V^3.
◦ 19.56 Develop an ADT for integer matrices with appropriate implementations
such that we can have a single client program that encompasses both Warshall’s
algorithm and Floyd’s algorithm. (This exercise is a version of Exercise 19.57
for people who are more familiar with abstract data types than with abstract
algebra.)
• 19.57 Use abstract algebra to develop a generic algorithm that encompasses
both Warshall’s algorithm and Floyd’s algorithm. (This exercise is a version of
Exercise 19.56 for people who are more familiar with abstract algebra than
with abstract data types.)
◦ 19.58 Show, in the style of Figure 19.16, the development of the all-shortest
paths matrix for the example graph in the figure as computed with Floyd’s
algorithm.
19.59 Is the Boolean product of two symmetric matrices symmetric? Explain
your answer.
19.60 Modify Programs 19.3 and 19.4 to provide implementations of a di-
graph ADT function that returns the number of edges in the transitive closure
of the digraph.
Exercises
19.70 Show that “has the same remainder after dividing by k” is a transitive
relation (and therefore is an equivalence relation) on the set of integers.
19.71 Show that “is in the same edge-connected component as” is an equiva-
lence relation among vertices in any graph.
19.72 Show that “is in the same biconnected component as” is not an equiv-
alence relation among vertices in all graphs.
◦ 19.75 Using an online dictionary, build a graph that represents the equivalence
relation “has k letters in common with” among words. Determine the number
of equivalence classes for k = 1 through 5.
19.76 The cardinality of a partial order is its number of ordered pairs. What
is the cardinality of the subset containment partial order for an n-element set?
19.77 Show that “is a factor of” is a partial order among integers.
19.5 DAGs
In this section, we consider various applications of directed acyclic
graphs (DAGs). We have two reasons to do so. First, because they
serve as implicit models for partial orders, we work directly with DAGs
in many applications and need efficient algorithms to process them.
Second, these various applications give us insight into the nature of
DAGs, and understanding DAGs is essential to understanding general
digraphs.
Since DAGs are a special type of digraph, all DAG-processing
problems trivially reduce to digraph-processing problems. Although
we expect processing DAGs to be easier than processing general digraphs,
we know that, when we encounter a problem that is difficult to solve on
DAGs, we should not expect to do any better solving the same problem on
general digraphs. As we shall see, the problem of computing
the transitive closure lies in this category. Conversely, understanding
the difficulty of processing DAGs is important because every digraph
has a kernel DAG (see Property 19.2), so we encounter DAGs even
when we work with digraphs that are not DAGs.
The prototypical application where DAGs arise directly is called
scheduling. Generally, solving scheduling problems has to do with ar-
ranging for the completion of a set of tasks, under a set of constraints,
by specifying when and how the tasks are to be performed. Con-
straints might involve functions of the time taken or other resources
consumed by the tasks. The most important type of constraints are
a standard DFS and checking that the DFS forest has no back edges.
A separate ADT for DAGs should include an ADT function allowing
client programs to perform such a check (see Exercise 19.78).
In a sense, DAGs are part tree, part graph. We can certainly
take advantage of their special structure when we process them. For
example, we can view a DAG almost as we view a tree, if we wish.
The following simple program is like a recursive tree traversal:
  void traverseR(Dag D, int w)
  { link t;
    visit(w);
    for (t = D->adj[w]; t != NULL; t = t->next)
      traverseR(D, t->v);
  }

The result of this program is to traverse the vertices of the DAG D as
though it were a tree rooted at w. For example, the result of traversing
the two DAGs in Figure 19.18 with this program would be the same. We
rarely use a full traversal, however, because we normally want to take
advantage of the same economies that save space in a DAG to save time in
traversing it (for example, by marking visited nodes in a normal DFS).
The same idea applies to a search, where we make a recursive call for
only one link incident on each vertex. In such an algorithm, the search
cost will be the same for the DAG and the tree, but the DAG uses far
less space.

Because they provide a compact way to represent trees that have
identical subtrees, we often use DAGs instead of trees when we represent
computational abstractions. In the context of algorithm design, the
distinction between the DAG representation and the tree representation
of a program in execution is the essential distinction behind dynamic
programming (see, for example, Figure 19.18 and Exercise 19.81). DAGs
are also widely used in compilers as intermediate representations of
arithmetic expressions and programs (see, for example, Figure 19.19),
and in circuit-design systems as intermediate representations of
combinational circuits.

Figure 19.18
DAG model of Fibonacci computation
The tree at the top shows the dependence of computing each Fibonacci
number on computing its two predecessors. The DAG at the bottom shows
the same dependence with only a fraction of the nodes.

Along these lines, an important example that has many applications
arises when we consider binary trees. We can apply the same restriction
to DAGs that we applied to trees to define binary trees:
Definition 19.6 A binary DAG is a directed acyclic graph with two edges
leaving each node, identified as the left edge and the right edge,
either or both of which may be null.

Figure 19.19
DAG representation of an arithmetic expression
Both of these DAGs are representations of the arithmetic expression
(c*(a+b))-((a+b)*((a+b)+e)). In the binary parse tree at left, leaf
nodes represent operands and internal nodes each represent operators to
be applied to the expressions represented by their two subtrees (see
Figure 5.31). The DAG at right is a more compact representation of the
same tree. More important, we can compute the value of the expression in
time proportional to the size of the DAG, which is typically
significantly less than the size of the tree (see Exercises 19.114
and 19.115).

The distinction between a binary DAG and a binary tree is that in the
binary DAG we can have more than one link pointing to a node. As did our
definition for binary trees, this definition models a natural
representation, where each node is a structure with a left link and a
right link that point to other nodes (or are null), subject to only the
global restriction that no directed cycles are allowed. Binary DAGs are
significant because they provide a compact way to represent binary trees
in certain applications. For example, we can compress an existence trie
into a binary DAG without changing the search implementation, as shown
in Figure 19.20 and Program 19.5.
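In code, the natural representation that the definition models might be
declared as follows; this is a sketch with hypothetical type names, in
which nothing prevents two nodes from sharing a child, which is exactly
the economy that binary DAGs provide:

  typedef struct BDAGnode* BDAGlink;
  struct BDAGnode
    { int item;        /* node contents */
      BDAGlink l, r;   /* left and right links; either may be NULL, and
                          several nodes may point to the same child */
    };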
An equivalent application is to view the trie keys as correspond-
ing to rows in the truth table of a Boolean function for which the
function is true (see Exercises 19.87 through 19.90). The binary DAG
is a model for an economical circuit that computes the function. In
this application, binary DAGs are known as binary decision diagrams
(BDDs).
Motivated by these applications, we turn, in the next two sec-
tions, to the study of DAG-processing algorithms. Not only do these
algorithms lead to efficient and useful DAG ADT function implemen-
tations, but also they provide insight into the difficulty of process-
ing digraphs. As we shall see, even though DAGs would seem to
be substantially simpler structures than general digraphs, some basic
problems are apparently no easier to solve.
Exercises
19.78 Define an ADT interface that is suitable for processing DAGs, and
build adjacency-lists and adjacency-matrix implementations. Include an ADT
function for verifying that the DAG has no cycles, implemented with DFS.
19.85 In the style of Figure 19.20, give the existence trie and corresponding
binary DAG for the keys 01001010 10010101 00100001 11101100 01010001
00100001 00000111 01010011 .
19.86 Implement an ADT based on building an existence trie from a set of
32-bit keys, compressing it as a binary DAG, then using that data structure to
support existence queries.
◦ 19.87 Draw the BDD for the truth table for the odd parity function of four
variables, which is 1 if and only if the number of variables that have the value
1 is odd.
19.88 Write a function that takes a 2^n-bit truth table as argument and
returns the corresponding BDD. For example, given the input
1110001000001100, your program should return a representation of the
binary DAG in Figure 19.20.
19.89 Write a function that takes a 2^n-bit truth table as argument,
computes every permutation of its argument variables, and, using your
solution to Exercise 19.88, finds the permutation that leads to the
smallest BDD.
• 19.90 Run empirical studies to determine the effectiveness of the
strategy of Exercise 19.89 for various Boolean functions, both standard
and randomly generated.
19.91 Write a program like Program 19.5 that supports common
subexpression removal: Given a binary tree that represents an arithmetic
expression, compute a binary DAG that represents the same expression
with common subexpressions removed.
◦ 19.92 Draw all the nonisomorphic DAGs with two, three, four, and five
vertices.
•• 19.93 How many different DAGs are there with V vertices and E edges?
••• 19.94 How many different DAGs are there with V vertices and E edges,
if we consider two DAGs to be different only if they are not isomorphic?
Figure 19.22
Topological sorting (rearrangement)
This diagram shows another way to look at the topological sort in
Figure 19.21, where we specify a way to rearrange the vertices, rather
than relabel them. When we place the vertices in the order specified in
the array ts, from left to right, then all directed edges point from
left to right. The inverse of the permutation ts is the permutation tsI
that specifies the relabeling described in Figure 19.21.
ts   0 1 2 3 8 7 6 4 5 9 10 11 12
tsI  0 1 2 3 7 8 6 5 4 9 10 11 12

Topological sort (rearrange) Given a DAG, rearrange its vertices on a
horizontal line such that all the directed edges point from left to
right (see Figure 19.22).
As indicated in Figure 19.22, it is easy to establish that the re-
labeling and rearrangement permutations are inverses of one another:
Given a rearrangement, we can obtain a relabeling by assigning the label
0 to the first vertex on the list, 1 to the second vertex on the list, and
so forth. For example, if an array ts has the vertices in topologically
sorted order, then the loop
for (i = 0; i < V; i++) tsI[ts[i]] = i;
defines a relabeling in the vertex-indexed array tsI. Conversely, if we
have the relabeling in an array tsI, then we can get the rearrangement
with the loop
for (i = 0; i < V; i++) ts[tsI[i]] = i;
which puts the vertex that would have label 0 first in the list, the vertex
that would have label 1 second in the list, and so forth. Most often,
we use the term topological sort to refer to the rearrangement version
of the problem.
In general, the vertex order produced by a topological sort is not
unique. For example,
8 7 0 1 2 3 6 4 9 10 11 12 5
0 1 2 3 8 6 4 9 10 11 12 5 7
0 2 3 8 6 4 7 5 9 10 1 11 12
8 0 7 6 2 3 4 9 5 1 11 12 10
are all topological sorts of the example DAG in Figure 19.6 (and there
are many others). In a scheduling application, this situation arises
whenever one task has no direct or indirect dependence on another and
Figure 19.23
Reverse topological sort
In this reverse topological sort of our sample digraph, the edges all
point from right to left. Numbering the vertices as specified by the
inverse permutation tsI gives a graph where every edge points from a
higher-numbered vertex to a lower-numbered vertex.
ts   5 12 11 10 9 4 6 3 2 1 0 7 8
tsI  10 9 8 7 5 0 6 11 12 4 3 2 1
thus they can be performed either before or after the other (or even in
parallel). The number of possible schedules grows exponentially with
the number of such pairs of tasks.
As we have noted, it is sometimes useful to interpret the edges in
a digraph the other way around: We say that an edge directed from
s to t means that vertex s “depends” on vertex t. For example, the
vertices might represent terms to be defined in a book, with an edge
from s to t if the definition of s uses t. In this case, it would be
useful to find an ordering with the property that every term is defined
before it is used in another definition. Using this ordering corresponds
to positioning the vertices in a line such that edges all go from right
to left—a reverse topological sort. Figure 19.23 illustrates a reverse
topological sort of our sample DAG.
Now, it turns out that we have already seen an algorithm for
reverse topological sorting: our standard recursive DFS! When the
input graph is a DAG, a postorder numbering puts the vertices in
reverse topological order. That is, we number each vertex as the final
action of the recursive DFS function, as in the post array in the DFS
implementation in Program 19.2. As illustrated in Figure 19.24, using
this numbering is equivalent to numbering the nodes in the DFS forest
in postorder. Taking the vertices in this example in order of their
postorder numbers, we get the vertex order in Figure 19.23—a reverse
topological sort of the DAG.
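For reference, here is a minimal sketch of this use of DFS, with
hypothetical names (DAGts, tsR, visited), assuming the adjacency-matrix
DAG representation used in Program 19.9 and a bound maxV on the number
of vertices; the array ts collects the vertices in postorder, which, for
a DAG, is a reverse topological order.

  static int visited[maxV], ts[maxV], tscnt;
  void tsR(Dag D, int v)
  { int w;
    visited[v] = 1;
    for (w = 0; w < D->V; w++)
      if (D->adj[v][w] != 0 && !visited[w]) tsR(D, w);
    ts[tscnt++] = v;   /* postorder: v after everything reachable from it */
  }
  void DAGts(Dag D)
  { int v;
    tscnt = 0;
    for (v = 0; v < D->V; v++) visited[v] = 0;
    for (v = 0; v < D->V; v++)
      if (!visited[v]) tsR(D, v);
  }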
Property 19.11 Postorder numbering in DFS yields a reverse topological
sort for any DAG.

Proof: Suppose that s and t are two vertices such that s appears before
t in the postorder numbering even though there is a directed edge s-t
in the graph. Since we are finished with the recursive DFS for s at the
every DAG has at least one sink. It follows that every DAG also has
at least one source: its reverse’s sink.
Figure 19.25
Topologically sorting a DAG by removing sources
Since it is a source (no edges point to it), 0 can appear first in a
topological sort of this example graph (left, top). If we remove 0 (and
all the edges that point from it to other vertices), then 1 and 2 become
sources in the resulting DAG (left, second from top), which we can then
sort using the same algorithm. This figure illustrates the operation of
Program 19.8, which picks from among the sources (the shaded nodes in
each diagram) using the FIFO discipline, though any of the sources could
be chosen at each step. See Figure 19.26 for the contents of the data
structures that control the specific choices that the algorithm makes.
The result of the topological sort illustrated here is the node order
0 8 2 1 7 3 6 5 4 9 11 10 12.
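In code, the method that the figure illustrates might be sketched as
follows. This is an illustration in the spirit of Program 19.8, not that
program itself; it assumes the adjacency-lists digraph representation of
Program 17.6, and the function name and local arrays are hypothetical.
The array ts receives the vertices in topological order.

  #include <stdlib.h>
  void DAGtsSources(Graph G, int ts[])
  { int v, N = 0, head = 0, tail = 0; link t;
    int *in = malloc(G->V * sizeof(int));
    int *queue = malloc(G->V * sizeof(int));
    for (v = 0; v < G->V; v++) in[v] = 0;
    for (v = 0; v < G->V; v++)            /* compute all indegrees */
      for (t = G->adj[v]; t != NULL; t = t->next) in[t->v]++;
    for (v = 0; v < G->V; v++)
      if (in[v] == 0) queue[tail++] = v;  /* enqueue initial sources */
    while (head < tail)
    { v = queue[head++]; ts[N++] = v;     /* remove a source; output it */
      for (t = G->adj[v]; t != NULL; t = t->next)
        if (--in[t->v] == 0) queue[tail++] = t->v;  /* new source */
    }
    free(in); free(queue);
  }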
Exercises
19.95 Add a topological sort function to your DAG ADT from Exer-
cise 19.78, then add an ADT function that checks whether or not a given
permutation of a DAG’s vertices is a proper topological sort of that DAG.
19.96 How many different topological sorts are there of the DAG that is
depicted in Figure 19.6?
19.97 Give the DFS forest and the reverse topological sort that results from
doing a standard adjacency-lists DFS (with postorder numbering) of the DAG
3-7 1-4 7-8 0-5 5-2 3-8 2-9 0-6 4-9 2-6 6-4 4-3 2-3.
◦ 19.98 Give the DFS forest and the topological sort that results from building
a standard adjacency-lists representation of the DAG
3-7 1-4 7-8 0-5 5-2 3-8 2-9 0-6 4-9 2-6 6-4 4-3 2-3,
then using Program 19.1 to build the reverse, then doing a standard adjacency-
lists DFS with postorder numbering.
• 19.99 Prove the correctness of each of the three suggestions given in the
text for modifying DFS with postorder numbering such that it computes a
topological sort instead of a reverse topological sort.
19.100 Give the DFS forest and the topological sort that results from do-
ing a standard adjacency-matrix DFS with implicit reversal (and postorder
numbering) of the DAG
3-7 1-4 7-8 0-5 5-2 3-8 2-9 0-6 4-9 2-6 6-4 4-3 2-3
(see Program 19.7).
• 19.101 Given a DAG, does there exist a topological sort that cannot re-
sult from applying a DFS-based algorithm, no matter what order the vertices
adjacent to each vertex are chosen? Prove your answer.
19.102 Show, in the style of Figure 19.26, the process of topologically sort-
ing the DAG
3-7 1-4 7-8 0-5 5-2 3-8 2-9 0-6 4-9 2-6 6-4 4-3 2-3
with the source-queue algorithm (Program 19.8).
19.103 Give the topological sort that results if the data structure used in the
example depicted in Figure 19.25 is a stack rather than a queue.
• 19.104 Given a DAG, does there exist a topological sort that cannot result
from applying the source-queue algorithm, no matter what queue discipline is
used? Prove your answer.
19.105 Modify the source-queue topological-sort algorithm to use a gener-
alized queue. Use your modified algorithm with a LIFO queue, a stack, and a
randomized queue.
19.106 Use Program 19.8 to provide an implementation for the ADT func-
tion for verifying that a DAG has no cycles (see Exercise 19.78).
◦ 19.107 Convert the source-queue topological-sort algorithm into a sink-
queue algorithm for reverse topological sorting.
19.108 Write a program that generates all possible topological orderings of
a given DAG, or, if the number of such orderings exceeds a bound taken as an
argument, prints that number.
19.109 Write a program that converts any digraph with V vertices and E
edges into a DAG by doing a DFS-based topological sort and changing the
orientation of any back edge encountered. Prove that this strategy always
produces a DAG.
•• 19.110 Write a program that produces each of the possible DAGs with V
vertices and E edges with equal likelihood (see Exercise 17.69).
19.111 Give necessary and sufficient conditions for a DAG to have just one
possible topologically sorted ordering of its vertices.
19.112 Run empirical tests to compare the topological-sort algorithms given
in this section for various DAGs (see Exercise 19.2, Exercise 19.79, Ex-
ercise 19.109, and Exercise 19.110). Test your program as described in
Exercise 19.11 (for low densities) and as described in Exercise 19.12 (for
high densities).
19.113 Modify Program 19.8 so that it computes the number of different
simple paths from any source to each vertex in a DAG.
◦ 19.114 Write a program that evaluates DAGs that represent arithmetic ex-
pressions (see Figure 19.19). Use the adjacency-lists graph ADT, extended to
include a double corresponding to each vertex (to hold its value). Assume
that values corresponding to leaves have been established.
◦ 19.115 Describe a family of arithmetic expressions with the property that
the size of the expression tree is exponentially larger than the size of the cor-
responding DAG (so the running time of your program from Exercise 19.114
for the DAG is proportional to the logarithm of the running time for the tree).
◦ 19.116 Write a program that finds the longest simple directed path in a DAG,
in time proportional to V . Use your program to implement an ADT function
that finds a Hamilton path in a given DAG, if it has one.
/* Program 19.9 (transitive closure of a DAG): pre[] is a
   vertex-indexed array and cnt a counter, both assumed to be declared
   at file scope, as in the chapter's other DFS programs. */
void DAGtc(Dag D)
  { int v;
    D->tc = MATRIXint(D->V, D->V, 0);
    for (v = 0; v < D->V; v++) pre[v] = -1;
    for (v = 0; v < D->V; v++)
      if (pre[v] == -1) TCdfsR(D, EDGE(v, v));
  }
void TCdfsR(Dag D, Edge e)
  { int u, i, v = e.w;
    pre[v] = cnt++;
    for (u = 0; u < D->V; u++)
      if (D->adj[v][u] != 0)
        { D->tc[v][u] = 1;
          /* down edge: everything reachable from u has already been
             merged into v's row by way of the tree path from v to u */
          if (pre[u] > pre[v]) continue;
          if (pre[u] == -1) TCdfsR(D, EDGE(v, u));
          /* merge u's row of the closure into v's row */
          for (i = 0; i < D->V; i++)
            if (D->tc[u][i] == 1) D->tc[v][i] = 1;
        }
  }
int DAGreach(Dag D, int s, int t)
  { return D->tc[s][t]; }
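For concreteness, here is a small, hypothetical client sketch (not
code from the text): it computes the transitive closure once with
DAGtc, then uses DAGreach for constant-time queries. It assumes access
to the Dag representation (for D->V) and <stdio.h>.

#include <stdio.h>
void printReachable(Dag D, int s)
  { int t;
    DAGtc(D);
    for (t = 0; t < D->V; t++)
      if (DAGreach(D, s, t))
        printf("%d is reachable from %d\n", t, s);
  }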
merge. We access a row of size V for each tree edge and each cross edge.
There are no back edges, and we can ignore down edges because we
accounted for any vertices they reach when we processed any ancestors
of both nodes earlier in the search.
If our DAG has no down edges (see Exercise 19.43), the running
time of Program 19.9 is proportional to VE and represents no im-
provement over the transitive-closure algorithms that we examined for
general digraphs in Section 19.3 (such as, for example, Program 19.4).
Exercises
◦ 19.117 Show, in the style of Figure 19.27, the reachability vectors that result
when we use Program 19.9 to compute the transitive closure of the DAG
3-7 1-4 7-8 0-5 5-2 3-8 2-9 0-6 4-9 2-6 6-4 4-3 2-3.
◦ 19.122 Does your solution to Exercise 19.121 require that you examine all
edges in the DAG, or are there edges that can be ignored, such as the down
edges in DFS? Give an example that requires examination of all edges, or
characterize the edges that can be skipped.
Figure 19.28 Computing strong components (Kosaraju's algorithm). To
compute the strong components of the digraph at the lower left, we
first do a DFS of its reverse (top left), computing a postorder array
that gives the vertex indices in the order in which the recursive DFS
completed (top). This order is equivalent to a postorder walk of the
DFS forest (top right). Then we use the reverse of that order to do a
DFS of the original digraph (bottom). First we check all nodes
reachable from 9, then we scan from right to left through the array to
find that 1 is the rightmost unvisited vertex, so we do the recursive
call for 1, and so forth. The trees in the DFS forest that results
from this process define the strong components: all vertices in each
tree have the same value in the vertex-indexed id array (bottom).

           0  1  2  3  4  5  6  7  8  9 10 11 12
   post    8  7  6  5  4  3  2  0  1 10 11 12  9
   G->sc   2  1  2  2  2  2  2  3  3  0  0  0  0
strong component (because all the vertices that can be reached from
that vertex have been processed). Second, the back links in the tree
provide a second path from one vertex to another and bind together
the strong components.
The recursive DFS function uses the same computation as Pro-
gram 18.7 to find the highest vertex reachable (via a back edge) from
any descendant of each vertex. It also uses a vertex-indexed array to
keep track of the strong components and a stack to keep track of the
current search path. It pushes the vertex names onto a stack on en-
try to the recursive function, then pops them and assigns component
numbers after visiting the final member of each strong component.
The algorithm is based on our ability to identify this moment with a
simple test (based on keeping track of the highest ancestor reachable
via one up link from all descendants of each node) at the end of the
recursive procedure that tells us that all vertices encountered since en-
try (except those already assigned to a component) belong to the same
strong component.
The implementation in Program 19.11 is a succinct and complete
description of the algorithm that fills in the details missing from the
brief sketch just given. Figure 19.29 illustrates the operation of the
algorithm for our sample digraph from Figure 19.1.
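Program 19.11 itself is not reproduced at this point, so the following
rough sketch, written under stated assumptions, may help in reading
the description above; it is an approximation, not the book's code.
The names low, s, N, and maxV are illustrative, and the digraph is
taken to be in the adjacency-matrix form used by Program 19.9. Here
low[v] records the smallest preorder number known to be reachable from
v, s is the stack of vertices on the current search path, and a vertex
whose low value is its own preorder number is the first-visited vertex
of its strong component.

static int cnt0, cntSC, N;
static int pre[maxV], low[maxV], sc[maxV], s[maxV];
void SCdfsR(Graph G, int v)
  { int w, min;
    pre[v] = cnt0++; low[v] = pre[v]; min = low[v];
    s[N++] = v;                       /* push v onto the path stack */
    for (w = 0; w < G->V; w++)
      if (G->adj[v][w] != 0)
        { if (pre[w] == -1) SCdfsR(G, w);
          if (low[w] < min) min = low[w]; }
    if (min < low[v])                 /* v is not first in its component */
      { low[v] = min; return; }
    do                                /* pop the whole component; setting
                                         low to G->V keeps finished
                                         components out of later minima */
      { sc[s[--N]] = cntSC; low[s[N]] = G->V; }
    while (s[N] != v);
    cntSC++;
  }
void GRAPHsc(Graph G)
  { int v;
    cnt0 = cntSC = N = 0;
    for (v = 0; v < G->V; v++) pre[v] = -1;
    for (v = 0; v < G->V; v++)
      if (pre[v] == -1) SCdfsR(G, v);
  }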
Figure 19.29 Computing strong components (Tarjan and Gabow
algorithms). Tarjan's algorithm is based on a recursive DFS, augmented
to push vertices onto a stack.
19.127 Show, in the style of Figure 19.28, the DFS forests and the contents
of the auxiliary vertex-indexed arrays that result when you use Kosaraju’s
algorithm to compute the strong components of the digraph
3-7 1-4 7-8 0-5 5-2 3-8 2-9 0-6 4-9 2-6 6-4.
◦ 19.131 Show, in the style of Figure 19.29, the DFS forest, stack contents
during the execution of the algorithm, and the final contents of the auxiliary
vertex-indexed arrays that result when you use Tarjan’s algorithm to compute
the strong components of the reverse of the digraph in Figure 19.5. (You
should have the same strong components.)
19.132 Show, in the style of Figure 19.29, the DFS forest, stack contents
during the execution of the algorithm, and the final contents of the auxiliary
vertex-indexed arrays that result when you use Tarjan’s algorithm to compute
the strong components of the digraph
3-7 1-4 7-8 0-5 5-2 3-8 2-9 0-6 4-9 2-6 6-4.
19.139 Develop a table in the spirit of Table 18.1 to study strong connectivity
in random digraphs (see Table 19.2). Let S be the set of vertices in the largest
strong component. Keep track of the size of S and study the percentages of
edges in the following four classes: those connecting two vertices in S, those
pointing out of S, those pointing into S, those connecting two vertices not in
S.
19.140 Run empirical tests to compare the brute-force method for comput-
ing strong components described at the beginning of this section, Kosaraju’s
algorithm, Tarjan’s algorithm, and Gabow’s algorithm, for various types of
digraphs (see Exercises 19.11–18).
••• 19.141 Develop a linear-time algorithm for strong 2-connectivity: Determine
whether a strongly connected digraph has the property that it remains strongly
connected after deleting any vertex (and all its incident edges).
DAG (as described in the next paragraph); and DFS (Program 19.9) to
compute its transitive closure. After this preprocessing, we can imme-
diately access the information necessary to determine reachability.
Once we have a vertex-indexed array with the strong compo-
nents of a digraph, building an adjacency-array representation of its
kernel DAG is a simple matter. The vertices of the DAG are the com-
ponent numbers in the digraph. For each edge s-t in the original
digraph, we simply set D->adj[sc[s]][sc[t]] to 1. We would have
to cope with duplicate edges in the kernel DAG if we were using an
adjacency-lists representation—in an adjacency array, duplicate edges
simply correspond to setting an array entry to 1 that has already been
set to 1. This small point is significant because the number of duplicate
edges is potentially huge (relative to the size of the kernel DAG) in this
application.
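The following sketch (with the hypothetical helper names KERNELdag and
DAGinit, and assuming direct access to the adjacency-matrix
representations) illustrates the construction just described; it is an
approximation, not code from the text. Here sc[] is the vertex-indexed
component array and cnt the number of strong components.

Dag KERNELdag(Graph G, int sc[], int cnt)
  { int s, t;
    Dag K = DAGinit(cnt);      /* one DAG vertex per strong component */
    for (s = 0; s < G->V; s++)
      for (t = 0; t < G->V; t++)
        if (G->adj[s][t] != 0)
          K->adj[sc[s]][sc[t]] = 1;  /* duplicates just rewrite a 1 */
    return K;
  }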
Property 19.17 Given two vertices s and t in a digraph D, let sc(s)
and sc(t), respectively, be their corresponding vertices in D’s kernel
DAG K. Then, t is reachable from s in D if and only if sc(t) is
reachable from sc(s) in K.
This simple fact follows from the definitions. In particular, this prop-
erty assumes the convention that a vertex is reachable from itself (all
vertices have self-loops). If the vertices are in the same strong compo-
nent (sc(s) = sc(t)), then they are mutually reachable.
We determine whether a given vertex t is reachable from a given
vertex s in the same way as we built the kernel DAG: We use the
vertex-indexed array computed by the strong-components algorithm
to get component numbers sc(s) and sc(t) (in constant time), then
use those numbers to index into the transitive closure of the kernel
DAG (in constant time), which tells us the result. Program 19.13 is an
implementation of the abstract–transitive-closure ADT that embodies
these ideas.
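Since Program 19.13 is not reproduced here, the following one-line
sketch conveys the structure of the query, assuming (hypothetically)
that the graph structure has been extended with the strong-components
array sc and the kernel DAG K:

int GRAPHreach(Graph G, int s, int t)
  { return DAGreach(G->K, G->sc[s], G->sc[t]); }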
We use an abstract–transitive-closure interface for the kernel
DAG as well. For purposes of analysis, we suppose that we use
an adjacency-matrix representation for the kernel DAG because we
expect the kernel DAG to be small, if not also dense.
Property 19.18 We can support constant query time for the abstract
transitive closure of a digraph with space proportional to V + v^2 and
time proportional to E + v^2 + vx for preprocessing (computing the
strong components and the transitive closure of the kernel DAG), where
v and x denote, respectively, the numbers of vertices and edges in the
kernel DAG.
Property 19.19 We can support constant query time for the abstract
transitive closure of any digraph whose kernel DAG has fewer than
V^(1/3) vertices, with space proportional to V and time proportional
to E + V for preprocessing.

Proof: Take v < V^(1/3) in Property 19.18. Then x < v^2 implies that
xv < v^3 < V, and the space bound follows because v^2 < V.
Table 19.2

This table shows the numbers of edges and vertices in the kernel DAGs
for random digraphs generated from two different models (the directed
versions of the models in Table 18.1). In both cases, the kernel DAG
becomes small (and is sparse) as the density increases.

        E        v      e        v      e
  1000 vertices
     1000      983    981      916    755
     2000      424    621      713   1039
     5000       13     13      156    313
    10000        1      1        8     17
    20000        1      1        1      1
  10000 vertices
    50000      144    150     1324    150
   100000        1      1       61    123
   200000        1      1        1      1

  Key:
    v  Number of vertices in kernel DAG
    e  Number of edges in kernel DAG
broadened the class of graphs for which we can avoid this worst-
case performance. Indeed, constructing a random-digraph model that
produces digraphs for which the algorithm is slow is a challenge (see
Exercise 19.146).
Table 19.2 displays the results of an empirical study; it shows
that random digraphs have small kernel DAGs even for moderate
densities and even in models with severe restrictions on edge placement.
Although there can be no guarantees in the worst case, we can expect
to see huge digraphs with small kernel DAGs in practice. When we
do have such a digraph, we can provide an efficient implementation of
the abstract–transitive-closure ADT.
Exercises
• 19.142 Develop a version of the implementation of the abstract transitive
closure for digraphs based on using an adjacency-lists representation of the
kernel DAG. Your challenge is to eliminate duplicates on the list without using
an excessive amount of time or space (see Exercise 19.68).
19.143 Show the kernel DAG computed by Program 19.13 and its transitive
closure for the digraph
3-7 1-4 7-8 0-5 5-2 3-8 2-9 0-6 4-9 2-6 6-4.
19.145 Do empirical studies to estimate the expected size of the kernel DAG
for various types of digraphs (see Exercises 19.11–18).
19.10 Perspective
In this chapter, we have considered algorithms for solving the
topological-sorting, transitive-closure, and shortest-paths problems for
digraphs and for DAGs, including fundamental algorithms for finding
cycles and strong components in digraphs. These algorithms have nu-
merous important applications in their own right and also serve as the
basis for the more difficult problems involving weighted graphs that
we consider in the next two chapters. Worst-case running times of
these algorithms are summarized in Table 19.3.
In particular, a common theme through the chapter has been
the solution of the abstract–transitive-closure problem, where we wish
to support an ADT that can determine quickly, after preprocessing,
whether there is a directed path from one given vertex to another.
Despite a lower bound that implies that our worst-case preprocessing
costs are significantly higher than V^2, the method discussed in Sec-
tion 19.7 melds the basic methods from throughout the chapter into
a simple solution that provides optimal performance for many types
of digraphs—the significant exception being dense DAGs. The lower
bound suggests that better guaranteed performance on all graphs will
be difficult to achieve, but we can use these methods to get good per-
formance on practical graphs.
The goal of developing an algorithm with performance charac-
teristics similar to the union-find algorithms of Chapter 1 for dense di-
graphs remains elusive. Ideally, we would like to define an ADT where
we can add directed edges or test whether one vertex is reachable from
another and to develop an implementation where we can support all
the operations in constant time (see Exercises 19.157 through 19.159).
As discussed in Chapter 1, we can come close to that goal for undi-
rected graphs, but comparable solutions for digraphs or DAGs are still
not known. (Note that deleting edges presents a challenge even for
undirected graphs.) Not only does this dynamic reachability problem
have both fundamental appeal and direct application in practice, but
also it plays a critical role in the development of algorithms at a higher
level of abstraction. For example, reachability lies at the heart of the
problem of implementing the network simplex algorithm for the min-
cost flow problem, a problem-solving model of wide applicability that
we consider in Chapter 22.
Table 19.3

  problem                        cost          algorithm

  digraphs
  cycle detect                   E             DFS
  transitive closure             V(E + V)      DFS from each vertex
  single-source shortest paths   E             DFS
  all shortest paths             V(E + V)      DFS from each vertex
  strong components              E             Kosaraju, Tarjan, or Gabow
  transitive closure             E + v(v + x)  kernel DAG

  DAGs
  acyclic verify                 E             DFS or source queue
  topological sort               E             DFS or source queue
  transitive closure             V(V + E)      DFS
  transitive closure             V(V + X)      DFS/dynamic programming
Throughout this chapter, some of the facts that we have discovered about
digraphs are expressions of more general mathematical phenomena,
and many of our algorithms have applicability at levels of abstrac-
tion different from that at which we have been working. On the one
hand, the concept of intractability tells us that we might encounter
fundamental roadblocks in our quest for efficient algorithms that can
guarantee efficient solutions to some problems. On the other hand, the
classic algorithms described in this chapter are of fundamental impor-
tance and have broad applicability, as they provide efficient solutions
to problems that arise frequently in practice and would otherwise be
difficult to solve.
Exercises
19.148 Adapt Programs 17.13 and 17.14 to implement an ADT function
for printing an Euler path in a digraph, if one exists. Explain the purpose of
any additions or changes that you need to make in the code.
19.149 Draw the dominator tree of the digraph
3-7 1-4 7-8 0-5 5-2 3-0 2-9 0-6 4-9 2-6
6-4 1-5 8-2 9-0 8-3 4-5 2-3 1-6 3-5 7-6.
•• 19.150 Write an ADT function that uses DFS to create a parent-link repre-
sentation of the dominator tree of a given digraph (see reference section).
◦ 19.151 Find a transitive reduction of the digraph
3-7 1-4 7-8 0-5 5-2 3-0 2-9 0-6 4-9 2-6
6-4 1-5 8-2 9-0 8-3 4-5 2-3 1-6 3-5 7-6.
might represent time or the cost of performing tasks or of waiting for
tasks to be performed.

Questions that entail cost minimization naturally arise for such
situations. We examine algorithms for two such problems: (i) find the
lowest-cost way to connect all of the points, and (ii) find the
lowest-cost path between two given points. The first type of
algorithm, which is useful for undirected graphs that represent
objects such as electric circuits, finds a minimum spanning tree; it
is the subject of this chapter. The second type of algorithm, which is
useful for digraphs that represent objects such as an airline route
map, finds the shortest paths; it is the subject of Chapter 21. These
algorithms have broad applicability beyond circuit and map
applications, extending to a variety of problems that arise on
weighted graphs.

When we study algorithms that process weighted graphs, our intuition
is often supported by thinking of the weights as distances: we speak
of "the vertex closest to x," and so forth. Indeed, the term
"shortest path" embraces this bias. Despite numerous applications
where we actually do work with distance and despite the benefits of
geometric intuition in understanding the basic algorithms, it is
important to remember that the weights do not need to be proportional
to a distance at all.

Figure 20.1 A weighted undirected graph and its MST. A weighted
undirected graph is a set of weighted edges. The MST is a set of edges
of minimal total weight that connects the vertices (black in the edge
list, thick edges in the graph drawing). In this particular graph, the
weights are proportional to the distances between the vertices, but
the basic algorithms that we consider are appropriate for general
graphs and make no assumptions about the weights (see Figure 20.2).
The edges shown include 5-3 .18, 7-1 .21, 7-6 .25, 7-0 .31, 5-4 .40,
7-4 .46, 6-4 .51, and 0-5 .60.
but the efficiency of implementations varies widely, and researchers
still seek better methods. In this section, we examine three classical
algorithms that are easily understood at a conceptual level; in
Sections 20.3 through 20.5, we examine implementations of each in
detail; and in Section 20.6, we consider comparisons of and
improvements on these basic approaches.

Definition 20.1 A minimum spanning tree (MST) of a weighted graph is
a spanning tree whose weight (the sum of the weights of its edges) is
no larger than the weight of any other spanning tree.

If the edge weights are all positive, it suffices to define the MST as
the set of edges with minimal total weight that connects all the
vertices, as such a set of edges must form a spanning tree. The
spanning-tree condition in the definition is included so that it
applies for graphs that may have negative edge weights (see Exercises
20.2 and 20.3).

If edges can have equal weights, the minimum spanning tree may not be
unique. For example, Figure 20.2 shows a graph that has two different
MSTs. The possibility of equal weights also complicates the

Figure 20.2 Arbitrary weights. In this example, the edge weights are
arbitrary and do not relate to the geometry of the drawn graph
representation at all. This example also illustrates that the MST is
not necessarily unique if edge weights may be equal: we get one MST by
including 3-4 (shown) and a different MST by including 0-5 instead
(although 7-6, which has the same weight as those two edges, is not in
any MST). The edges shown include 6-4 .01, 7-4 .12, 7-1 .28, 5-3 .37,
7-0 .49, 0-5 .65, 7-6 .65, and 5-4 .78.
Exercises
20.1 Assume that the weights in a graph are positive. Prove that you can
rescale them by adding a constant to all of them or by multiplying them all by
a constant without affecting the MSTs, provided only that the rescaled weights
are positive.
20.2 Show that, if edge weights are positive, a set of edges that connects all
the vertices whose weights sum to a quantity no larger than the sum of the
weights of any other set of edges that connects all the vertices is an MST.
20.3 Show that the property stated in Exercise 20.2 holds for graphs with
negative weights, provided that there are no cycles whose edges all have non-
positive weights.
◦ 20.4 How would you find a maximum spanning tree of a weighted graph?
20.5 Show that if a graph’s edges all have distinct weights, the MST is unique.
20.6 Consider the assertion that a graph has a unique MST only if its edge
weights are distinct. Give a proof or a counterexample.
• 20.7 Assume that a graph has t < V edges with equal weights and that all
other weights are distinct. Give upper and lower bounds on the number of
different MSTs that the graph might have.
20.1 Representations
In this chapter, we concentrate on weighted undirected graphs—the
most natural setting for MST problems. Extending the basic graph rep-
resentations from Chapter 17 to represent weighted graphs is straight-
forward: In the adjacency-matrix representation, the matrix can con-
tain edge weights rather than Boolean values; in the adjacency-lists
representation, we add a field to the list elements that represent edges,
for the weights.
In our examples, we generally assume that edge weights are real
numbers between 0 and 1. This decision does not conflict with various
alternatives that we might need in applications, because we can explic-
itly or implicitly rescale the weights to fit this model (see Exercises 20.1
and 20.8). For example, if the weights are positive integers less than
a known maximum value, we can divide them all by the maximum
value to convert them to real numbers between 0 and 1.
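A minimal sketch of that rescaling, assuming positive integer weights
no larger than a known maximum maxwt:

double rescale(int wt, int maxwt)
  { return (double) wt / maxwt; }   /* maps 1..maxwt into (0, 1] */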
We use the same basic graph ADT interface that we used in
Chapter 17 (see Program 17.1), except that we add a weight field to
the edge data type, as follows:
typedef struct { int v; int w; double wt; } Edge;
Edge EDGE(int, int, double);
To avoid proliferation of simple types, we use double for edge weights
throughout this chapter and Chapter 21. If we wanted to do so,
we could build a more general ADT interface and use any data type
that supports addition, subtraction, and comparisons, since we do
little more with the weights than to accumulate sums and to make
decisions based on their values. In Chapter 22, our algorithms are
concerned with comparing linear combinations of edge weights, and the
running time of some algorithms depends on arithmetic properties of
the weights, so we switch to integer weights to allow us to more
easily analyze the algorithms.

We use sentinel weights to indicate the absence of an edge. Another
straightforward approach would be to use the standard adjacency matrix
to indicate the existence of edges and a parallel matrix to hold
weights. With sentinels, many of our algorithms do not need to test
explicitly whether or not an edge exists. The adjacency-matrix
representation of our sample graph is shown in Figure 20.3;
Program 20.1 gives the implementation details of the weighted-graph
ADT for an adjacency-matrix representation. It uses an auxiliary
function that allocates a matrix of weights and fills it with the
sentinel weight. Inserting an edge amounts to storing the weight in
two places in the matrix, one for each orientation of the edge. The
sentinel weight value that indicates the absence of an edge is larger
than all other weights, not 0, which represents an edge of length 0
(an alternative would be to disallow edges of length 0). As is true of
algorithms that use the adjacency-matrix representation for unweighted
graphs, the running time of any algorithm that uses this
representation is proportional to V^2 (to initialize the matrix) or
higher.

Figure 20.3 Weighted-graph representations (undirected). The two
standard representations of weighted undirected graphs include weights
with each edge representation, as illustrated in the adjacency-matrix
(left) and adjacency-lists (right) representation of the graph
depicted in Figure 20.1. The adjacency matrix is symmetric and the
adjacency lists contain two nodes for each edge, as in unweighted
undirected graphs. Nonexistent edges are represented by a sentinel
value in the matrix (indicated by asterisks in the figure) and are not
present at all in the lists. Self-loops are absent in both of the
representations illustrated here because MST algorithms are simpler
without them; other algorithms that process weighted graphs use them
(see Chapter 21).
Similarly, Program 20.2 gives the implementation details of the
weighted-graph ADT for an adjacency-lists representation. A vertex-
indexed array associates each vertex with a linked list of that
vertex's incident edges.
/* Program 20.1 (weighted-graph ADT, adjacency-matrix representation):
   MATRIXdouble allocates a V-by-V matrix of doubles initialized to
   its last argument; it and the sentinel weight maxWT are assumed to
   be defined elsewhere, as described in the text. */
#include <stdlib.h>
#include "GRAPH.h"
struct graph { int V; int E; double **adj; };
Graph GRAPHinit(int V)
  { Graph G = malloc(sizeof *G);
    G->adj = MATRIXdouble(V, V, maxWT);   /* no edges initially */
    G->V = V; G->E = 0;
    return G;
  }
void GRAPHinsertE(Graph G, Edge e)
  { /* store the weight in both orientations of the edge */
    if (G->adj[e.v][e.w] == maxWT) G->E++;
    G->adj[e.v][e.w] = e.wt;
    G->adj[e.w][e.v] = e.wt;
  }
#include "GRAPH.h"
typedef struct node *link;
struct node { int v; double wt; link next; };
struct graph { int V; int E; link *adj; };
link NEW(int v, double wt, link next)
{ link x = malloc(sizeof *x);
x->v = v; x->wt = wt; x->next = next;
return x;
}
Graph GRAPHinit(int V)
{ int i;
Graph G = malloc(sizeof *G);
G->adj = malloc(V*sizeof(link));
G->V = V; G->E = 0;
for (i = 0; i < V; i++) G->adj[i] = NULL;
return G;
}
void GRAPHinsertE(Graph G, Edge e)
{ link t;
int v = e.v, w = e.w;
if (v == w) return;
G->adj[v] = NEW(w, e.wt, G->adj[v]);
G->adj[w] = NEW(v, e.wt, G->adj[w]);
G->E++;
}
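A hypothetical client fragment (not from the text) that builds a few
of the edges of Figure 20.1 with either implementation of the
interface:

#include "GRAPH.h"
int main(void)
  { Graph G = GRAPHinit(8);
    GRAPHinsertE(G, EDGE(5, 3, .18));
    GRAPHinsertE(G, EDGE(7, 1, .21));
    GRAPHinsertE(G, EDGE(7, 6, .25));
    GRAPHinsertE(G, EDGE(0, 2, .29));
    return 0;
  }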
Figure 20.4 MST representations. This figure depicts various
representations of the MST in Figure 20.1. The most straightforward is
a list of its edges, in no particular order (left). The MST is also a
sparse graph, and might be represented with adjacency lists (center).
The most compact is a parent-link representation: we choose one of the
vertices as the root and keep two vertex-indexed arrays, one with the
parent of each vertex in the tree, the other with the weight of the
edge from each vertex to its parent (right). The orientation of the
tree (choice of root vertex) is arbitrary, not a property of the MST.
We can convert from any one of these representations to any other in
linear time.

  Edge list: 0-2 .29  4-3 .34  5-3 .18  7-4 .46  7-0 .31  7-6 .25  7-1 .21

          0    1    2    3    4    5    6    7
   st     0    7    0    4    7    3    7    0
   val    0  .21  .29  .34  .46  .18  .25  .31

How should we represent the MST itself? The MST of a graph G is a
subgraph of G that is also a tree, so we have numerous options. Chief
among them are
• A graph
• A linked list of edges
• An array of edges
• A vertex-indexed array with parent links
Figure 20.4 illustrates these options for the example MST in
Figure 20.1. Another alternative is to define and use an ADT for
trees.

The same tree might have many different representations in any of the
schemes. In what order should the edges be presented in the
list-of-edges representation? Which node should be chosen as the root
in the parent-link representation (see Exercise 20.15)? Generally
speaking, when we run an MST algorithm, the particular MST
representation that results is an artifact of the algorithm used,
rather than reflecting any important features of the MST.

From an algorithmic point of view, the choice of MST representation is
of little consequence, because we can convert easily from each of
these representations to any of the others. To convert from the graph
representation to an array of edges, we can use the GRAPHedges
function in the graph ADT. To convert from the parent-link
representation in an array st (with the parent-link weights in a
parallel array wt) to an array of edges in an array mst, we can use
the simple loop

  for (k = 1; k < G->V; k++) mst[k] = EDGE(k, st[k], wt[k]);

This code is for the typical case where the MST is rooted at 0, and it
does not put the dummy edge 0-0 onto the MST edge list.

These two conversions are trivial, but how do we convert from the
array-of-edges representation to the parent-link representation?
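One answer, sketched here under stated assumptions rather than taken
from the text: insert the MST edges into an adjacency-lists graph,
then traverse the tree from the chosen root, recording each vertex's
parent and parent-link weight. The helper names are illustrative, the
code is written as if inside the implementation file of Program 20.2
(where struct graph and struct node are visible), and the edges are
assumed to occupy mst[1..V-1], as in the loop above.

void setLinks(Graph T, int v, int st[], double wt[])
  { link t;
    for (t = T->adj[v]; t != NULL; t = t->next)
      if (st[t->v] == -1)
        { st[t->v] = v; wt[t->v] = t->wt;   /* record parent link */
          setLinks(T, t->v, st, wt); }
  }
void edgesToParentLinks(Edge mst[], int V, int st[], double wt[])
  { int k, v;
    Graph T = GRAPHinit(V);
    for (k = 1; k < V; k++) GRAPHinsertE(T, mst[k]);
    for (v = 0; v < V; v++) st[v] = -1;     /* mark all unvisited */
    st[0] = 0; wt[0] = 0.0;                 /* root the tree at 0 */
    setLinks(T, 0, st, wt);
  }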
Exercises
20.8 Build a graph ADT that uses integer weights, but keep track of the
minimum and maximum weights in the graph and include an ADT function
that always returns weights that are numbers between 0 and 1.
20.12 Write a program that generates a random complete graph that has
weights chosen from a Gaussian distribution.
• 20.13 Write a program that generates V random points in the plane, then
builds a weighted graph by connecting each pair of points within a given
distance d of one another with an edge whose weight is the distance (see
Exercise 17.60). Determine how to set d so that the expected number of
edges is E.
A few basic terms from graph theory that we define next make possible
a concise statement of this property, which follows.

Definition 20.2 A cut in a graph is a partition of the vertices into
two disjoint sets. A crossing edge is one that connects a vertex in
one set with a vertex in the other.

Property 20.1 (Cut property) Given any cut in a graph, every minimal
crossing edge belongs to some MST of the graph, and every MST contains
a minimal crossing edge.

Proof: Suppose that e is a minimal crossing edge that is not in any
MST and let T be any MST; or suppose that T is an MST that contains no
minimal crossing edge and let e be any minimal crossing edge. In
either case, T is an MST that does not contain the minimal crossing
edge e. Now consider the graph formed by adding e to T. This graph has
a cycle that contains e, and that cycle must contain at least one
other crossing edge, say f, whose weight is equal to or greater than
that of e (since e is minimal). We can get a spanning tree of equal or
lower weight by deleting f and adding e, contradicting either the
minimality of T or the assumption that e is not in T.

If a graph's edge weights are distinct, it has a unique MST; and the
cut property says that the shortest crossing edge for every cut must
be in the MST. When equal weights are present, we may have multiple
minimal crossing edges. At least one of them will be in any given MST,
and the others may be present or absent.

Figure 20.5 illustrates several examples of this cut property. Note
that there is no requirement that the minimal edge be the only MST
edge connecting the two sets; indeed, for typical cuts there are
several MST edges that connect a vertex in one set with a vertex in
the other. If we could be sure that there were only one such edge, we
might be able to develop divide-and-conquer algorithms based on
judicious selection of the sets; but that is not the case.

Figure 20.5 Cut property. These four examples illustrate
Property 20.1. If we color one set of vertices gray and another set
white, then the shortest edge connecting a gray vertex with a white
one belongs to an MST.

We use the cut property as the basis for algorithms to find MSTs, and
it also can serve as an optimality condition that characterizes
MSTs. Specifically, the cut property implies that every edge in an MST
is a minimal crossing edge for the cut defined by the vertices in the
two subtrees connected by the edge.

The second property, which we refer to as the cycle property, has to
do with identifying edges that do not have to be in a graph's MST.
That is, if we ignore these edges, we can still find an MST.

Property 20.2 (Cycle property) Given a graph G, consider the graph G'
defined by adding an edge e to G. Adding e to an MST of G and deleting
a maximal edge on the resulting cycle gives an MST of G'.

Proof: If e is longer than all the other edges on the cycle, it cannot
be on an MST of G', because of Property 20.1: Removing e from any such
MST would split the latter into two pieces, and e would not be the
shortest edge connecting vertices in each of those two pieces, because
some other edge on the cycle must do so. Otherwise, let t be a maximal
edge on the cycle created by adding e to the MST of G. Removing t
would split the original MST into two pieces, and edges of G
connecting those pieces are no shorter than t; so e is a minimal edge
in G' connecting vertices in those two pieces. The subgraphs induced
by the two subsets of vertices are identical for G and G', so an MST
for G' consists of e and the MSTs of those two subsets.

In particular, note that if e is maximal on the cycle, then we have
shown that there exists an MST of G' that does not contain e (the MST
of G).

Figure 20.6 illustrates this cycle property. Note that the process of
taking any spanning tree, adding an edge that creates a cycle, and
then deleting a maximal edge on that cycle gives a spanning tree of
weight less than or equal to the original. The new tree weight will be
less than the original if and only if the added edge is shorter than
some edge on the cycle.

Figure 20.6 Cycle property. Adding the edge 1-3 to the graph in
Figure 20.1 invalidates the MST (top). To find the MST of the new
graph, we add the new edge to the MST of the old graph, which creates
a cycle (center). Deleting the longest edge on the cycle (4-7) yields
the MST of the new graph (bottom). One way to verify that a spanning
tree is minimal is to check that each edge not on the MST has the
largest weight on the cycle that it forms with tree edges. For
example, in the bottom graph, 4-6 has the largest weight on the cycle
4-6-7-1-3-4.

The cycle property also serves as the basis for an optimality
condition that characterizes MSTs: It implies that every edge in a
graph that is not in a given MST is a maximal edge on the cycle that
it forms with MST edges.

The cut property and the cycle property are the basis for the
classical algorithms that we consider for the MST problem. We consider
edges one at a time, using the cut property to accept them as MST
edges or the cycle property to reject them as not needed.
edge does not create a cycle, it is the only crossing edge, and since we
consider the edges in sorted order, it is a minimal edge and therefore
in an MST. The basis for the induction is the V individual vertices;
once we have chosen V − 1 edges, we have one tree (the MST). No
unexamined edge is shorter than an MST edge, and all would create
a cycle, so ignoring all of the rest of the edges leaves an MST, by the
cycle property.
choice as the closest, and v would have led to the choice of only one
of its lower-numbered neighbors, not both.
20.22 Suppose that a graph has distinct edge weights. Does its shortest edge
have to belong to the MST? Prove that it does or give a counterexample.
20.23 Answer Exercise 20.22 for the graph’s longest edge.
20.24 Give a counterexample that shows why the following strategy does not
necessarily find the MST: “Start with any vertex as a single-vertex MST, then
add V − 1 edges to it, always taking next a minimal edge incident upon the
vertex most recently added to the MST.”
20.25 Suppose that a graph has distinct edge weights. Does a minimal edge
on every cycle have to belong to the MST? Prove that it does or give a coun-
terexample.
20.26 Given an MST for a graph G, suppose that an edge in G is deleted.
Describe how to find an MST of the new graph in time proportional to the
number of edges in G.
20.27 Show the MST that results when you repeatedly apply the cycle property
to the graph in Figure 20.1, taking the edges in the order given.
20.28 Prove that repeated application of the cycle property gives an MST.
20.29 Describe how each of the algorithms described in this section can be
adapted (if necessary) to the problem of finding a minimal spanning forest of
a weighted graph (the union of the MSTs of its connected components).
Figure 20.8 Prim's MST algorithm. The first step in computing the MST
with Prim's algorithm is to add 0 to the tree. Then we find all the
edges that connect 0 to other vertices (which are not yet on the tree)
and keep track of the shortest (top left). The edges that connect tree
vertices with nontree vertices (the fringe) are shadowed in gray and
listed below each graph drawing. For simplicity in this figure, we
list the fringe edges in order of their length, so that the shortest
is the first in the list. Different implementations of Prim's
algorithm use different data structures to maintain this list and to
find the minimum. The second step is to move the shortest edge 0-2
(along with the vertex that it takes us to) from the fringe to the
tree (second from top, left). Third, we move 0-7 from the fringe to
the tree, replace 0-1 by 7-1 and 0-6 by 7-6 on the fringe (because
adding 7 to the tree brings 1 and 6 closer to the tree), and add 7-4
to the fringe (because adding 7 to the tree makes 7-4 an edge that
connects a tree vertex with a nontree vertex) (third from top, left).
Next, we move edge 7-1 to the tree (bottom, left). To complete the
computation, we take 7-6, 7-4, 4-3, and 3-5 off the queue, updating
the fringe after each insertion to reflect any shorter or new paths
discovered (right, top to bottom). An oriented drawing of the growing
MST is shown at the right of each graph drawing. The orientation is an
artifact of the algorithm: we generally view the MST itself as a set
of edges, unordered and unoriented. The final parent-link
representation is

         0    1    2    3    4    5    6    7
   st    0    7    0    4    7    3    7    0
   wt    0  .21  .29  .34  .46  .18  .25  .31
change. The key is to note that our interest is in the shortest distance
from each nontree vertex to the tree. When we add a vertex v to the
tree, the only possible change for each nontree vertex w is that adding
v brings w closer than before to the tree. In short, we do not need to
check the distance from w to all tree vertices—we just need to keep
track of the minimum and check whether the addition of v to the tree
necessitates that we update that minimum.
To implement this idea, we need data structures that can give us
the following information:
• For each tree vertex, its parent in the MST
• For each nontree vertex, the closest tree vertex
• For each tree vertex, the length of its parent link
• For each nontree vertex, its distance to the tree
The simplest implementation for each of these data structures is a
vertex-indexed array, though we have various options to save space by
combining arrays or using structures.
Program 20.3 is an implementation of Prim’s algorithm for an
adjacency-matrix graph ADT implementation. It uses the arrays st,
fr, and wt for these four data structures, with st and fr for the first
two (respectively), and wt for both the third and fourth. For a tree
vertex v, the entry wt[v] is the length of the parent link (corresponding
to st[v]); for a nontree vertex w, the entry wt[w] is the distance to the
tree (corresponding to fr[w]). The implementation is packaged as a
function GRAPHmstV that takes st and wt as client-supplied arrays. If
desired, we could add a wrapper function GRAPHmst to build an edge
list (or some other) MST representation, as discussed in Section 20.1.
After adding a new edge (and vertex) to the tree, we have two
tasks to accomplish:
• Check to see whether adding the new edge brought any nontree
vertex closer to the tree.
• Find the next edge to add to the tree.
The implementation in Program 20.3 accomplishes both of these tasks
with a single scan through the nontree vertices, updating wt[w] and
fr[w] if v-w brings w closer to the tree, then updating the current
minimum if wt[w] (the distance from w to fr[w]) indicates that w is
closer to the tree than any other vertex with a lower index.
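Program 20.3 itself is not reproduced in this passage, so the
following sketch approximates the scan just described under stated
conventions: the adjacency-matrix ADT of Program 20.1, a hypothetical
compile-time bound maxV on the number of vertices, and a client array
wt with V+1 entries, so that wt[G->V] can serve as a sentinel for the
minimum computation.

static int fr[maxV];
void GRAPHmstV(Graph G, int st[], double wt[])
  { int v, w, min;
    for (v = 0; v < G->V; v++)
      { st[v] = -1; fr[v] = v; wt[v] = maxWT; }
    st[0] = 0; wt[G->V] = maxWT;   /* start the tree at vertex 0 */
    for (min = 0; min != G->V; )
      { v = min; st[v] = fr[v];    /* add closest vertex to the tree */
        for (w = 0, min = G->V; w < G->V; w++)
          if (st[w] == -1)
            { if (G->adj[v][w] < wt[w])        /* v brings w closer */
                { wt[w] = G->adj[v][w]; fr[w] = v; }
              if (wt[w] < wt[min]) min = w; }  /* track the minimum */
      }
  }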
Property 20.6 Using Prim’s algorithm, we can find the MST of a
dense graph in linear time.
nontree edges to find the one that is closest to the tree does not repre-
sent excessive extra cost. But in a sparse graph, we can expect to use
substantially fewer than V steps to perform each of these operations.
The crux of the strategy that we will use to do so is to focus on the set
of potential edges to be added next to the MST—a set that we call the
fringe. The number of fringe edges is typically substantially smaller
than the number of nontree edges, and we can recast our description
of the algorithm as follows. Starting with a self-loop to a start vertex
on the fringe and an empty tree, we perform the following operation
until the fringe is empty:
Move a minimal edge from the fringe to the tree. Visit the
vertex that it leads to, and put onto the fringe any edges
that lead from that vertex to a nontree vertex, replacing
the longer edge when two edges on the fringe point to the
same vertex.
From this formulation, it is clear that Prim’s algorithm is nothing
more than a generalized graph search (see Section 18.8), where the
fringe is a priority queue based on a delete the minimum operation
(see Chapter 9). We refer to generalized graph searching with priority
queues as priority-first search (PFS). With edge weights for priorities,
PFS implements Prim’s algorithm.
This formulation encompasses a key observation that we made
already in connection with implementing BFS in Section 18.7. An
even simpler general approach is to simply keep on the fringe all of
the edges that are incident upon tree vertices, letting the priority-
queue mechanism find the shortest one and ignore longer ones (see
Exercise 20.37). As we saw with BFS, this approach is unattractive
because the fringe data structure becomes unnecessarily cluttered with
edges that will never get to the MST. The size of the fringe could grow
to be proportional to E (with whatever attendant costs having a fringe
this size might involve), while the PFS approach just outlined ensures
that the fringe will never have more than V entries.
As with any implementation of a general algorithm, we have
a number of available approaches for interfacing with priority-queue
ADTs. One approach is to use a priority queue of edges, as in our gener-
alized graph-search implementation of Program 18.10. Program 20.4
is an implementation that is essentially equivalent to Program 18.10
but uses a vertex-based approach so that it can use the index
priority-queue interface.
Figure 20.9 Prim's MST algorithm. This sequence shows how the MST
grows as Prim's algorithm discovers 1/4, 1/2, 3/4, and all of the
edges in the MST (top to bottom). An oriented representation of the
full MST is shown at the right.
◦ 20.31 Answer Exercise 20.30 for graphs in which all vertices have the same
fixed degree t.
◦ 20.32 Answer Exercise 20.30 for general sparse graphs that have V vertices
and E edges. Since the running time depends on the weights of the edges and
on the degrees of the vertices, do a worst-case analysis. Exhibit a family of
graphs for which your worst-case bound is confirmed.
20.33 Show, in the style of Figure 20.8, the result of computing the MST of
the network defined in Exercise 20.21 with Prim’s algorithm.
• 20.34 Describe a family of graphs with V vertices and E edges for which
the worst-case running time of the PFS implementation of Prim’s algorithm is
confirmed.
•• 20.35 Develop a reasonable generator for random graphs with V vertices
and E edges such that the running time of the PFS implementation of Prim’s
algorithm (Program 20.4) is nonlinear.
20.36 Convert Program 20.4 for use in an adjacency-matrix graph ADT
implementation.
◦ 20.37 Modify Program 20.4 so that it works like Program 18.8, in that it
keeps on the fringe all edges incident upon tree vertices. Run empirical studies
to compare your implementation with Program 20.4, for various weighted
graphs (see Exercises 20.9–14).
• 20.38 Provide an implementation of Prim’s algorithm that makes use of your
representation-independent graph ADT from Exercise 17.51.
20.39 Suppose that you use a priority-queue implementation that maintains
a sorted list. What would be the worst-case running time for graphs with V
vertices and E edges, to within a constant factor? When would this method
be appropriate, if ever? Defend your answer.
◦ 20.40 An MST edge whose deletion from the graph would cause the MST
weight to increase is called a critical edge. Show how to find all critical edges
in a graph in time proportional to E lg V .
20.41 Run empirical studies to compare the performance of Program 20.3
to that of Program 20.4, using an unordered array implementation for the
priority queue, for various weighted graphs (see Exercises 2