Dynamic Programming - III
Longest Common Sequence
Instructor: Ashok Singh Sairam
Lecture Plan
• Longest Common Sequence
Motivation and Problem Definition
Brute Force Approach
Dynamic Programming Approach
MA512: Data Structures and Algorithms
2
Longest Common Subsequence (LCS)
• We have two strings X and Y and we want to find
the longest possible string that is a subsequence
of both X and Y.
• When we say that Z is a subsequence of X, we
mean we can derive Z from X by removing some of
its elements
MA512: Data Structures and Algorithms
3
Ex: LCS
• X = A, B, C, B, D, A, B X = A, B, C, B, D, A, B
• Y = B, D, C, A, B, A Y = B, D, C, A, B, A
• B, C, B, A and B, D, A, B are longest
common subsequences of X and Y (length = 4)
• B, C, A, however is not a LCS of X and Y
MA512: Data Structures and Algorithms
4
Practical example of LCS
• DNA string: A sequence of symbols A, C, G, T
S= ACCGGTCGAGCTTCGAATT
• DNA analysis: two DNA string comparison
MA512: Data Structures and Algorithms
5
Practical example of LCS
• DNA strand has millions of base elements and even a
simple gene has many thousands of elements.
Efficiency in comparison is important
• Exact match of gene unlikely, so researchers look for
common subsequences of gene and the DNA strand
If subsequence is most of length of gene sequence,
strand is regarded as containing the gene.
MA512: Data Structures and Algorithms
6
Problem Defintion
• Given two sequences
X = x1, x2, …, xm
Y = y1, y2, …, yn
find a maximum length common subsequence
(LCS) of X and Y
MA512: Data Structures and Algorithms
7
Brute-Force Solution
• For every subsequence of X, check whether it’s a
subsequence of Y
• There are 2m subsequences of X to check
• Each subsequence takes (n) time to check
scan Y for first letter, scan for second, and so on
• Running time: (n2m)
MA512: Data Structures and Algorithms
8
Notations
• Given a sequence X = x1, x2, …, xm we define the
i-th prefix of X, for i = 0, 1, 2, …, m
Xi = x1, x2, …, xi
• c[i, j] = the length of an LCS of the sequences
Xi = x1, x2, …, xi and Yj = y1, y2, …, yj
MA512: Data Structures and Algorithms
9
LCS DP: Step 1
• Characterize optimal substructure
MA512: Data Structures and Algorithms
10
What the theorem says?
• Case 1: If 𝑥𝑚 = 𝑦𝑛 find LCS of 𝑥𝑚−1 and 𝑦𝑛−1 ,
then append 𝑥𝑚
Include one element into the common sequence (Z) and
solve the resulting subproblem
One subproblem
X = A, B, D, E
Y = Z, B, E
MA512: Data Structures and Algorithms
11
What the theorem says? – contd.
• Case2: If 𝑥𝑚 ≠ 𝑦𝑛 find LCS of 𝑥𝑚−1 and 𝑦𝑛 and LCS
of 𝑥𝑚 and 𝑦𝑛−1 , take which one is longer
Exclude an element from a string and solve the resulting
subproblem
Results in two subproblems
X = A, B, D, G
Y = Z, B, D
• find a LCS of Xi-1 and Yj: Xi-1 = A, B, D and Yj = Z, B, D
• find a LCS of Xi and Yj-1: Xi = A, B, D, G and Yj = Z, B
MA512: Data Structures and Algorithms
12
Overlapping substructure
• Overlapping substructure
Both LCS of 𝑥𝑚−1 and 𝑦𝑛 and
LCS of 𝑥𝑚 and 𝑦𝑛−1 ,
will need to solve LCS of 𝑥𝑚−1 and 𝑦𝑛−1
MA512: Data Structures and Algorithms
13
LCS DP: Step 2 – Recursive solution
• Recall c[i, j] = Length of LCS of sequences Xi and Yj
• Case 1: xi = yj
c[i, j] = c[i - 1, j - 1] + 1
• Case 2: xi yj
c[i, j] = max { c[i - 1, j], c[i, j-1] }
• Recursive formula
MA512: Data Structures and Algorithms
14
Overlapping Subproblems
• To find a LCS of X and Y
we may need to find the LCS between Xm and Yn-1 and
that of Xm-1 and Y
Both the above subproblems has the subproblem of
finding the LCS of Xm-1 and Yn-1
• Subproblems share subsubproblems
MA512: Data Structures and Algorithms
15
LCS DP: Step 3 – Computing length
of the LCS
j
• Store the value of yj: y1 y2 yn
the c[i,j] in a table 0 xi 0 0 0 0 0 0
Compute the 1 x1 first
0
entries in a row-
2 x2 0 second
major order i
0
0
m xm 0
0 1 2 n
MA512: Data Structures and Algorithms
16
Additional information
A matrix b[i, j]:
• For a subproblem [i, j] it 0 1 2 3 n
tells us what choice was b & c: yj: A C D F
made to obtain the 0 xi 0 0 0 0 0 0
optimal value 1 A 0
• If xi = yj
2 B
b[i, j] = “ ” 0 c[i-1,j]
i
• Else, if c[i - 1, j] ≥ c[i, j-1] 3 C 0 c[i,j-1]
b[i, j] = “ ” 0
else m D 0
b[i, j] = “ ”
j
MA512: Data Structures and Algorithms
17
LCS-LENGTH(X, Y)
1. for i ← 1 to m
The length of the LCS if one of the sequences
2. do c[i, 0] ← 0 is empty is zero
3. for j ← 0 to n
4. do c[0, j] ← 0
5. for i ← 1 to m
6. do for j ← 1 to n
7. do if xi = yj
Case 1: xi = yj
8. then c[i, j] ← c[i - 1, j - 1] + 1
9. b[i, j ] ← “ ”
10. else if c[i - 1, j] ≥ c[i, j - 1]
11. then c[i, j] ← c[i - 1, j]
12. b[i, j] ← “↑”
Case 2: xi yj
13. else c[i, j] ← c[i, j - 1]
14. b[i, j] ← “←”
15. return c and b
Running time: (mn)
MA512: Data Structures and Algorithms
18
0 if i = 0 or j = 0
Example c[i, j] = c[i-1, j-1] + 1 if xi = yj
max(c[i, j-1], c[i-1, j]) if xi yj
X = A, B, C, B, D, A
Y = B, D, C, A, B, A 0 1 2 3 4 5 6
yj B D C A B A
If xi = yj 0 xi 0 0 0 0 0 0 0
b[i, j] = “ ” 1 A
0 0 0 0 1 1 1
Else if c[i - 1, j] ≥ c[i, j-1]
2 B 0 1 1 1 1 2 2
b[i, j] = “ ”
3 C
else 0 1 1 2 2 2 2
4 B
b[i, j] = “ ” 0 1 1 2 2 3 3
5 D
0 1 2 2 2 3 3
6 A 0 1 2 2 3 3 4
7 B 0 1 2 2 3 4 4
19
4. Constructing a LCS
• Start at b[m, n] and
follow the arrows 0 1 2 3 4 5 6
• When we encounter yj B D C A B A
a “ “ in b[i, j] 0 xi 0 0 0 0 0 0 0
xi = yj is an 1 A 0
0
0
0 1 1 1
element of the LCS 2 B
0 1 1 1 1 2 2
3 C
0 1 1 2 2 2 2
4 B
0 1 1 2 2 3 3
5 D
0 1 2 2 2 3 3
6 A 0 1 2 2 3 3 4
7 B 0 1 2 2 3 4 4
20
PRINT-LCS(b, X, i, j)
1. if i = 0 or j = 0 Running time: (m + n)
2. then return
3. if b[i, j] = “ ”
4. then PRINT-LCS(b, X, i - 1, j - 1)
5. print xi
6. elseif b[i, j] = “↑”
7. then PRINT-LCS(b, X, i - 1, j)
8. else PRINT-LCS(b, X, i, j - 1)
Initial call: PRINT-LCS(b, X, length[X], length[Y])
21
Improving the Code
• What can we say about how each entry c[i, j] is
computed?
It depends on c[i -1, j - 1], c[i - 1, j], and c[i, j - 1]
Eliminate table b and compute in O(1) which of the
three values was used to compute c[i, j]
We save (mn) space from table b
However, we do not asymptotically decrease the
auxiliary space requirements: still need table c
MA512: Data Structures and Algorithms
22
Improving the Code – contd.
• If we only need the length of the LCS
LCS-LENGTH works only on two rows of c at a time
• The row being computed and the previous row
We can reduce the asymptotic space requirements by
storing only these two rows
MA512: Data Structures and Algorithms
23
Exercise
MA512: Data Structures and Algorithms
24