Data Structures and
Algorithms in Python
Michael T. Goodrich
Department of Computer Science
University of California, Irvine
Roberto Tamassia
Department of Computer Science
Brown University
Michael H. Goldwasser
Department of Mathematics and Computer Science
Saint Louis University
Instructor’s Solutions Manual
Chapter
12 Sorting and Selection
Hints and Solutions
Reinforcement
R-12.1) Hint Argue in more detail about why the merge-sort tree has
height O(log n).
R-12.1) Solution For each element in a sequence of size n there is ex-
actly one exterior node in the merge-sort tree associated with it. Since the
merge-sort tree is binary with exactly n exterior nodes, we know it has
height ⌈log n⌉.
R-12.2) Hint Recall the definition of a recursion trace from Chapter 4.
R-12.2) Solution The downward arrows represent a recursive call of merge-
sort. The upward arrows represent the return of a recursive call.
R-12.3) Hint Consider “padding” out the input with infinities to make n a
power of 2. How does this affect the running time?
R-12.4) Hint Consider an input with duplicates.
R-12.4) Solution It is not stable. Given equality of S1[i] and S2[j], it
makes S2[j] the next element of the result. This could easily be fixed.
R-12.5) Hint Consider an input with duplicates.
R-12.5) Solution It is not stable. Given equality of S1.first() and S2.first(),
it chooses S2.first() as the next element of the result. This could easily be
fixed.
R-12.6) Hint You need a different way to handle the equal case in the
merge procedure.
R-12.7) Hint Consider using something like the merge for merge-sort.
R-12.7) Solution Merge sequences A and B into a new sequence C (i.e.,
call merge(A, B,C)). Do a linear scan through the sequence C removing all
duplicate elements (i.e., if the next element is equal to the current element,
remove it).
79
R-12.8) Hint Derive a recurrence equation for this algorithm assuming n
is a power of 2. Does it look familiar? It should.
R-12.8) Solution O(n log n) time.
R-12.9) Hint You want each choice of pivot to form a very bad split.
R-12.9) Solution The sequence should have the property that the selected
pivot is the largest element in the subsequence. This is to say that if G
contains no elements, the worst-case Ω(n2 ) bound is obtained.
R-12.10) Hint Recall what is the best possible split we can get for a given
pivot and then derive a recurrence equation assuming we get this kind of
a split. This equation should look familiar.
R-12.11) Hint To gain intuition, work out the first few splits on the se-
quence (1, 1, 1, 1, 1, 1, 1, 1, 2).
R-12.11) Solution The implementation of quickSortInPlace runs in worst-
case O(n log n) time on a sequence of identical values, and expected O(n log n)
time on any sequence. The key is that in the loops scanning from left and
right, we stop when finding equal elements, leading to a more even split.
R-12.12) Hint Clearly the flaw must involve a case where a pass of the
outermost loop completes with the value of left precisely equal to right.
R-12.12) Solution Note that within the loop, just after swapping two
elements, left is incremented and right is decremented before retesting
the loop condition. The flaw is that if the loop were to be exited when
left == right, the element currently at S[left] has not been examined. Just
outside the loop, we conclude by swapping the pivot element S[b] with the
element at S[left]. This may cause an element that is smaller than the pivot
to be placed in the right side. As a counterexample, consider original con-
tents {40, 10, 20, 30}. 30 is the pivot value. During the first pass of the
loop S[0] and S[2] are swapped and then left and right both become 1. If
the outermost loop is exited, we conclude by swapping S[1] and S[3], re-
sulting in contents {20, 30, 40, 10}. Although the recursive call will invert
40 and 10, 10 will remain to the right of 20 and 30. As a minimal coun-
terexample, notice that if sorting the contents {5, 6}, the outerloop never
executes, yet the values 6 and 5 become swapped after the loop.
R-12.13) Hint Develop a test case in which left equals right immediately
prior to the evaluation of line 14.
R-12.13) Solution If left == right immediately after the inner loops com-
plete, it is not that we need the values of S[left] and S[right] swapped; but
what is important is that left = left + 1 and right = right − 1 are exe-
cuted, or else we will be stuck in an infinite loop within the outerloop. As
a counterexample, simply consider input {30, 20}.
80 Chapter 12. Sorting and Selection
R-12.14) Hint 1/n2 is the same as 1/22 log n .
R-12.14) Solution In order for x to belong to more than i subproblems in
size group i, x has to belong to i subproblems with bad calls. This event
occurs with probability 1/22 log n = 1/n2 for i = 2 log n.
R-12.15) Hint What is the maximum number of external nodes that a
binary tree of height n can have?
R-12.15) Solution 2n .
R-12.16) Hint Recall that to sort n elements with a comparison-based
algorithm requires Ω(n log n) time.
R-12.16) Solution k must be O(n/ log n).
R-12.17) Hint No. Why not?
R-12.17) Solution No. Bucket-sort does not use a constant amount of
additional storage. It uses O(n + N) space.
R-12.18) Hint Work out some examples with triples first. Then move on
to d-tuples.
R-12.18) Solution Perform the stable bucket sort on m, l, and then k. In
general, perform the stable bucket sort on kd down to k1 .
R-12.19) Hint The two running times are not the same.
R-12.19) Solution Merge-sort takes O(n log n) time, as it is oblivious to
the case when there are only two possible values. On the other hand, the
way we have defined quick-sort with a three-way split implies that using
quick-sort to sort S will take only O(n) time.
R-12.20) Hint There are only two possible key values.
R-12.20) Solution O(n) time.
R-12.21) Hint Try to mimic the partition method used in the in-place
quick-sort algorithm.
R-12.21) Solution Imagine that we color the 0’s blue and the 1’s red. Start
with a marker at the beginning of the array and one at the end of the ar-
ray. While the first marker is at a blue element, continue incrementing its
index. Likewise, when the second marker is at a red element, continue
decrementing its index. When the first marker has reached a red element
and the second a blue element, swap the elements. Continue moving the
markers and swapping until they meet. At this point, the sequence is or-
dered. With three colors in the sequence, we can order it by doing the
above algorithm twice. In the first run, we will move one color to the
front, swapping back elements of the other two colors. Then we can start
at the end of the first run and swap the elements of the other two colors in
exactly the same way as before. Only this time the first marker will begin
where it stopped at the end of the first run.
81
R-12.22) Hint Check out the discussion comparing the various sorting
algorithms.
R-12.22) Solution An input list that is already sorted will cause merge-
sort and heap-sort to take O(n log n) time to sort, but can be sorted by
insertion-sort in O(n) time. If we reverse this list, then merge-sort and
heap-sort still take O(n log n) time, but insertion-sort will take O(n2 ) time.
R-12.22) Hint Check out the discussion comparing the various sorting
algorithms.
R-12.22) Solution An input list that is already sorted will cause .
R-12.23) Hint Consider the complexity of comparisons versus using ele-
ments as indices into an array.
R-12.24) Hint Think of the worst possible way to choose pivots in this
algorithm.
Creativity
C-12.25) Hint How do you know S and T have the same elements in
them?
C-12.26) Hint Sort first.
C-12.26) Solution First we sort the objects of A. Then we can walk
through the sorted sequence and remove all duplicates. This takes O(n log n)
time to sort and n time to remove the duplicates. Overall, therefore, this is
an O(n log n)-time method.
C-12.27) Hint Can you adapt the merge algorithm of Code Fragment 12.3
to directly manipulate nodes of the list.
C-12.28) Hint Merge-sort is a particularly good choice for a linked list.
C-12.29) Hint A queue of queues can be very helpful.
C-12.30) Hint It would be easier if the last element in the array were still
the pivot...
C-12.31) Hint For the overall worst case, recall the worst case for choos-
ing the last element as the pivot.
C-12.31) Solution This pivot should be a good one, since the median of
these d values has a better chance of being close to the true median than
the last element in the input had by itself. Still, the worst-case running
time is still not so great: O(n2 /d), which is still O(n2 ) for constant d.
C-12.32) Hint You need to use an induction hypothesis that T (n) ≤ cn log n,
for some constant c.
C-12.32) Solution We will show that T (n) ≤ cn log n, for some constant
c > 0, by induction (Section 3.4.3).
82 Chapter 12. Sorting and Selection
By the definition of good and bad calls (and their respective worst-case
partitions), as well as the linearity of expectation (Proposition B.19),
1 1
T (n) ≤ (T (3n/4) + T (n/4)) + (T (n − 1)) + bn,
2 2
where bn is the amount of time needed to do the nonrecursive work of
splitting a list into sublists and concatenating the final sorted sublists. Ap-
plying our induction hypothesis, we can write
1 3n 3n n n 1
T (n) ≤ c log + c log + [c(n − 1) log(n − 1)] + bn
2 4 4 4 4 2
1 3n 3n n n 1
≤ c log + c log + cn log n + bn
2 4 4 4 4 2
1 3n 3n 4 n n 1
= c log n − c log + c log n − c log 4 + cn log n + bn
2 4 4 3 4 4 2
= cn log n − c(3n/8) log(4/3) − c(n/8) log 4 + bn
≤ cn log n,
if c ≥ 4. Thus, the expected running time of randomized quick-sort is
O(n log n).
C-12.33) Hint Carefully consider how to maintain the stated invariant
when classifying each additional element.
C-12.34) Hint Sort the votes, and then determine who received the maxi-
mum number of votes.
C-12.34) Solution First sort the sequence S by the candidate’s ID. Then
walk through the sorted sequence, storing the current max count and the
count of the current candidate ID as you go. When you move on to a new
ID, check it against the current max and replace the max if necessary.
C-12.35) Hint Think of a data structure that can be used for sorting in a
way that only stores k elements when there are only k keys.
C-12.35) Solution In this case we can store candidate ID’s in a balanced
search tree, such as an AVL tree or red-black tree, where in addition to
each ID we store in this tree the number of votes that ID has received. Ini-
tially, all such counts are 0. Then, we traverse the sequence of votes, incre-
menting the count for the appropriate ID with each vote. Since this data
structure stored k elements, each such search and update takes O(log k)
time. Thus, the total time is O(n log k).
C-12.36) Hint Shoot for an O(n) expected running time.
C-12.37) Hint Develop a meaningful way to break ties during compar-
isons .
83
C-12.38) Hint Sort A and B first.
C-12.39) Hint Think of alternate ways of viewing the elements.
C-12.39) Solution To sort S, do a radix sort on the bits of each of the n
elements, viewing them as pairs (i, j) such that i and j are integers in the
range [0, n − 1].
C-12.40) Hint Find a way of sorting them as a group that keeps each
sequence contiguous in the final listing.
C-12.41) Hint Sort first.
C-12.41) Solution Sort the elements of S, which takes O(n log n) time.
Then, step through the sequence looking for two consecutive elements
that are equal, which takes an additional O(n) time. Overall, this method
takes O(n log n) time.
C-12.42) Hint Try to modify the merge-sort algorithm to solve this prob-
lem.
C-12.43) Hint Try to modify the insertion-sort algorithm to solve this
problem.
C-12.44) Hint 1
C-12.44) Solution Note that half of the elements ranked in the top half of a
sorted version of S are expected to be in the first half of S. Likewise, half of
the elements ranked in the bottom half of a sorted version of S are expected
to be in the second half of S. These two sets define (n/4)(1 + n/4)/2
inversions, which is Omega(n2 ).
C-12.45) Hint Consider the graph of the equation m = a + b for a fixed
value of m.
C-12.46) Hint Perform a selection first on some appropriate order statis-
tics.
C-12.47) Hint Try to design an efficient divide-and-conquer algorithm.
C-12.47) Solution This problem can be solved using a divide-and-conquer
approach. First, we choose a random bolt and partition the remaining nuts
around it. Then we take the nut that matches the chosen bolt and partition
the remaining bolts around it. We can continue doing this until all the nuts
and bolts are matched up. In essence, we are doing the randomized quick-
sort algorithm. Thus, we have an average running time of O(n log n).
C-12.48) Hint You will need two-passes through the data at each level of
recursion.
C-12.49) Hint Use in-place quick-sort as a starting point.
C-12.50) Hint Think about what would be the perfect pivot in a an algo-
rithm like quick-sort.
C-12.51) Hint Use linear-time selection in an appropriate way.
84 Chapter 12. Sorting and Selection
C-12.52) Hint Think of an alien version of quick-sort.
C-12.53) Hint You could dynamically define a new key type with its own
definition for comparisons.
C-12.54) Hint For (a), revisit the definition of the randomized quick-
sort algorithm. For (b), argue why the probability that Ci, j (x) = 1 is at
most 1/2 j and why the dependence between Ci, j (x)’s only helps. For (c),
review the book’s discussion of geometric sums. For (d), just plug in the
equation for µ and do the math. For (e), argue about all n elements from
the bound on a single one.
C-12.54) Solution For (a), note that we need not define Ci, j for j > n, since
it is impossible for an element to belong to more than n subproblems, since
each subproblem at least removes the pivot.
For (b), first note that the probability that Ci, j (x) = 1 is at most 1/2 j , since
in order for x to be in j + 1 subproblems in size group i it must belong
to j bad calls, each of which occurs with probability 1/2. Of course, if
the ( j + 1)st call for x in size group is good, then Ci,k = 0 for k > j, but
this only helps, as does the case if x is not in size group i at all. Thus,
∑Li=0 ∑nj=0 Ci, j (x) ≤ ∑Li=0 ∑nj=0 Xi, j .
For (c), note that this sum is 1 + 1/2 + 1/4 + 1/8 + · · · , which is at most
2.
For (d), we note that µ = 2L; hence Pr(X > 4L) < Pr(X > 2µ) < (4/e)−µ =
n
(4/e)−(2−1/2 )L < (4/3)−2L = (4/3)−2 log4/3 n = 1/n2 , for n ≥ 2.
For (e), we note that the above bound implies that, for any input element x,
the probability that x belongs to more than 4 log4/3 n subproblems is less
than 1/n2 . Thus, the probability that any input element is in more than
4 log4/3 n subproblems is less than 1/n. Thus, the randomized quick-sort
algorithm runs in O(n log n) time with probability at least 1 − 1/n.
C-12.55) Hint The recurrence equation denotes two recursive calls, but
one is smaller than the other.
Projects
P-12.56) Hint Think about how to define subproblems concisely and store
them on the stack. You then can use a while loop to process problems from
and to this stack. Also, please see the chapter discussion about in-place
quick-sort for more hints.
P-12.57) Hint Implement the version that is not in-place first.
P-12.58) Hint An almost sorted sequence could be one with at most a
linear number of inversions.
P-12.59) Hint An almost sorted sequence could be one with at most a
linear number of inversions.
85
P-12.60) Hint Be sure to perform enough tests so that your results are
trustworthy.
P-12.61) Hint Use good testing inputs to verify that your method is stable.
Also, be sure to copy the elements of the list in and out of the bucket array.
P-12.62) Hint One good animation style uses vertical lines various lengths
to represent the different elements.