WALCHAND INSTITUTE OF TECHNOLOGY, SOLAPUR
INFORMATION TECHNOLOGY
2021-22 SEMESTER - II
ASSIGNMENT - 10
Subject: Information Retrieval
Name: Ayush Pande Roll no: 74 Class: Final Year Btech IT
Title: Implementation of sequential searching using Boyer-Moore Algorithm
Theory:
The Boyer-Moore algorithm uses two different heuristics for determining the
maximum possible shift distance in case of a mismatch: the "bad
character" and the "good suffix" heuristics. Both heuristics can
lead to a shift distance of m. For the bad character heuristics this is the case, if the
first comparison causes a mismatch and the corresponding text symbol does not
occur in the pattern at all. For the good suffix heuristics this is the case, if only
the first comparison was a match, but that symbol does not occur elsewhere in the
pattern.
The preprocessing for the good suffix heuristics is rather difficult to understand
and to implement. Therefore, sometimes versions of the Boyer-Moore algorithm
are found in which the good suffix heuristics is left away. The argument is that
the bad character heuristics would be sufficient and the good suffix heuristics
would not save many comparisons. However, this is not true for small alphabets.
Analysis:
If there are only a constant number of matches of the pattern in the text, the Boyer-
Moore searching algorithm perfoms O(n) comparisons in the worst case. The
proof of this is rather difficult. In general Θ(n·m) comparisons are necessary, e.g.
if the pattern is am and the text an. By a slight modification of the algorithm the
number of comparisons can be bounded to O(n) even in the general case. If the
alphabet is large compared to the length of the pattern, the algorithm performs
O(n/m) comparisons on the average. This is because often a shift by m is possible
due to the bad character heuristics.
Program Code:
from tkinter import *
class BoyerMooreSearch:
def __init__(self, text, pattern):
[Link], [Link] = text, pattern
[Link], [Link] = len(text), len(pattern)
def match_in_pattern(self, char):
for i in range([Link] - 1, -1, -1):
if char == [Link][i]:
return i
return -1
def mismatch_in_text(self, currentPos):
for i in range([Link] - 1, -1, -1):
if [Link][i] != [Link][currentPos + i]:
return currentPos + i
return -1
def bad_character_heuristic(self):
positions = []
for i in range([Link] - [Link] + 1):
mismatch_index = self.mismatch_in_text(i)
if mismatch_index == -1:
[Link](i)
else:
match_index = self.match_in_pattern([Link][mismatch_index])
i=(
mismatch_index - match_index
)
return positions
def mhello():
mtext = [Link]()
pat1 = [Link]()
bms = BoyerMooreSearch(mtext, pat1)
positions = bms.bad_character_heuristic()
if len(positions) == 0:
mlabel3 = Label(mgui,fg="red",text="No match found !!").pack()
else:
mlabel4 = Label(mgui,fg="red",text="Pattern found in following positions:").pack()
for i in positions:
mlabel3 = Label(mgui,fg="red",text=str(i)).pack()
return
mgui = Tk()
ment = StringVar()
pat = StringVar()
[Link]('450x450+500+300')
2
[Link]('Assignment 8')
mlabel = Label(mgui,fg='white',bg="purple",text=': Boyer- Moore Algorithm :').pack()
mlabel2 = Label(mgui,text="ENTER THE TEXT").pack()
mentry = Entry(mgui,textvariable=ment).pack()
mlabel2 = Label(mgui,text="ENTER THE PATTERN").pack()
mentry1 = Entry(mgui,textvariable =pat).pack()
mbutton = Button(mgui,text="Boyer moore search",command=mhello,fg='white',bg='purple').pack()
[Link]()
Screenshots/Output: