Module 6: String Matching Algorithm
String Matching Introduction
String Matching Algorithm is also called "String Searching Algorithm."
This is a vital class of string algorithm is declared as "this is the method to find a place
where one is several strings are found within the larger string."
A string-searching algorithm, sometimes called string-matching algorithm, is
an algorithm that searches a body of text for portions that match by pattern.
Algorithms used for String Matching:
1. The Naive String Matching Algorithm
2. The Rabin-Karp-Algorithm
3. The Knuth-Morris-Pratt Algorithm
The Naïve String-Matching Algorithm
Naive pattern searching is the simplest method among other pattern searching
algorithms. Although, it is more efficient than the brute force approach
it also checks for all characters of the main string in order to find the pattern. Hence, its
time complexity is O(m*n) where the 'm' is the size of pattern and 'n' is the size of the
main string. This algorithm is helpful for smaller texts only.
It compares first character of pattern with searchable text. If Match Found, pointer in
the both string is advanced.
If Matched is not found, pointer of the text is incremented & pointer of pattern is reset.
Algorithm:
Step 1: There will be two loops: One loop for input pattern and the second loop for text
string.
Step 2: Now, At the time of searching algorithm in a window, there will be two cases:
o Case 1: If a match is found, we will match the entire pattern with the current
window of the text string. And if found the pattern string is found in the current
window after traversing, print the result. Else traverse in the next window.
o Case 2: If a match is not found, we will break from the loop (using the 'break'
keyword), and the j pointer of the inner loop will move one index more and start
the search algorithm in the next window.
Step 3: This process is repeated till the end of text.
Time Complexity:
o Worst case: O((n - m + 1) * m)
o Best case: O(n) (when mismatch happens quickly)
Example:
Rabin-Karp Algorithm
The Rabin-Karp algorithm is a string matching algorithm that efficiently finds patterns
within a larger text using a technique called hashing.
The basic idea is to convert the pattern and each possible substring of the text into
numeric values (hashes) and then compare these values rather than the strings
themselves. This allows for faster comparisons, especially when dealing with large texts.
The Rabin-Karp algorithm for string matching is useful because it can quickly find patterns
in large texts. It’s especially good when you need to search for multiple patterns at once
or when the text is very long.
Algorithm:
Step 1: Initially calculate the hash value of the pattern.
Step 2: Start iterating from the starting of the string:
o Calculate the hash value of the current substring having length m.
o If the hash value of the current substring and the pattern are same check if the
substring is same as the pattern.
o If they are same, store the starting index as a valid answer. Otherwise,
continue for the next substrings.
Step 3: Return the starting indices as the required answer.
Time Complexity:
o Average case: O(n + m)
o Worst case: O(nm) (due to hash collisions)
Example:
Knuth-Morris-Pratt (KMP) Algorithm
To avoid such redundancy, a linear sequence-matching algorithm named the KMP pattern
matching algorithm. It is also referred to as Knuth Morris Pratt pattern matching
algorithm.
The KMP algorithm starts the search operation from left to right. It uses the prefix
function to avoid unnecessary comparisons while searching for the pattern.
The KMP algorithm starts the search operation from left to right. It uses the prefix
function to avoid unnecessary comparisons while searching for the pattern.
This function stores the number of characters matched so far which is known as LPS
value.
Algorithm:
Step 1: Define a prefix function LPS (Longest Prefix Suffix) array.
Step 2: Slide the pattern over the text for comparison.
Step 3: If all the character’s match, we have found a match.
Step 4: If not, use the prefix function LPS Array to skip the unnecessary comparisons.
Time Complexity:
Preprocessing: O(m)
Search: O(n)
Overall: O(n + m)
Example:
Comparison Summary
Algorithm Time Complexity Space Best For
Naïve O((n-m+1)*m) O(1) Small texts
Rabin-Karp Avg: O(n + m) O(1) Hash-based searching
Efficient string matching with
Knuth-Morris-Pratt O(n + m) O(m)
preprocessing
Summary
Choose Naïve for simplicity and small data
Use Rabin-Karp for large text and multiple patterns
Use KMP for efficiency with a single long pattern and repetitive text