Archive
Posts Tagged ‘Levenshtein’
Levenshtein distance
October 19, 2010
Leave a comment
The Levenshtein distance (or edit distance) between two strings is the minimal number of “edit operations” required to change one string into the other. The two strings can have different lengths. There are three kinds of “edit operations”: deletion, insertion, or alteration of a character in either string.
Example: the Levenshtein distance of “ag-tcc” and “cgctca” is 3.
#!/usr/bin/env python
def LD(s,t):
s = ' ' + s
t = ' ' + t
d = {}
S = len(s)
T = len(t)
for i in range(S):
d[i, 0] = i
for j in range (T):
d[0, j] = j
for j in range(1,T):
for i in range(1,S):
if s[i] == t[j]:
d[i, j] = d[i-1, j-1]
else:
d[i, j] = min(d[i-1, j] + 1, d[i, j-1] + 1, d[i-1, j-1] + 1)
return d[S-1, T-1]
a = 'ag-tcc'
b = 'cgctca'
print LD(a, b) # 3
The implementation is from here.
Categories: python
distance, Levenshtein
