Rabin-Karp and KMP Algorithms Explained

The document discusses three string matching algorithms: Rabin-Karp, Knuth-Morris-Pratt (KMP), and Automata-Based. It provides C implementations for each algorithm, explaining their key concepts, parameters, and time complexities. Each algorithm is designed to efficiently find occurrences of a pattern in a given text, with unique approaches to handle mismatches and optimize search performance.
NAME: MANIPRIYAA R

REG NO: 23BAI0214

1. Rabin-Karp Algorithm (C Implementation)


#include <stdio.h>
#include <string.h>

#define d 256   // size of the input alphabet

void rabinKarp(char pattern[], char text[], int q) {
    int M = strlen(pattern);
    int N = strlen(text);
    int p = 0, t = 0, h = 1;
    int i, j;

    // h = d^(M-1) % q, the weight of the leading character of a window
    for (i = 0; i < M - 1; i++)
        h = (h * d) % q;

    // Initial hash values of the pattern and the first text window
    for (i = 0; i < M; i++) {
        p = (d * p + pattern[i]) % q;
        t = (d * t + text[i]) % q;
    }

    // Slide the pattern over the text
    for (i = 0; i <= N - M; i++) {
        if (p == t) {
            // Hashes match: verify character by character
            for (j = 0; j < M; j++)
                if (text[i + j] != pattern[j])
                    break;
            if (j == M)
                printf("Pattern found at index %d\n", i);
        }
        // Roll the hash: drop text[i], append text[i + M]
        if (i < N - M) {
            t = (d * (t - text[i] * h) + text[i + M]) % q;
            if (t < 0)
                t += q;
        }
    }
}

int main() {
    char text[] = "ABCCDEABCDABCD";
    char pattern[] = "ABCD";
    int q = 101; // a prime number used as the modulus
    rabinKarp(pattern, text, q);
    return 0;
}

1. Understanding the Algorithm

 The Rabin-Karp algorithm is a pattern-searching algorithm that uses hashing to find occurrences of a pattern in a given text efficiently.

 It calculates the hash value of the pattern and compares it with the hash values of substrings of the text.

2. Defining Key Parameters

 The text and pattern lengths are calculated.

 Variables are initialized to store the hash values of the pattern and the current substring of the
text.

 h is the precomputed weight of the window's leading character, used when updating the rolling hash.

3. Precomputing the Hash Factor (h)

 A helper value h = (d^(M-1)) % q is computed.

 This helps in efficiently removing the first character and adding a new character while updating the rolling hash.
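 As a quick check with the sample values in main() (M = 4, d = 256, q = 101): h = 256^3 mod 101 = 5, i.e. 5 is the weight carried by the leading character of each window when it is removed.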

4. Calculating Initial Hash Values

 The hash values of the pattern and the first window of text (of the same length as the pattern)
are computed.

 This avoids comparing the pattern character-by-character initially.
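 For example, with d = 256 and q = 101, the pattern "ABCD" hashes to 22 while the first text window "ABCC" hashes to 21, so the very first window is rejected by a single integer comparison.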

5. Sliding the Pattern Over the Text

 The algorithm moves the pattern one character at a time over the text.
 If the hash values match, a character-by-character comparison is performed to confirm the
match.

6. Rolling Hash Technique

 Instead of recalculating the hash from scratch for each new substring, the hash is updated using
the rolling hash formula:

o Remove the first character's contribution.

o Add the new character at the end.

 This reduces redundant computations and makes the algorithm efficient.
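 The update can also be seen in isolation in the short sketch below: a minimal standalone program (not part of the implementation above, but reusing its d = 256, q = 101, and sample text) that prints the rolling hash of every window of length M.

#include <stdio.h>
#include <string.h>

int main(void) {
    const char *text = "ABCCDEABCDABCD";
    int d = 256, q = 101, M = 4;
    int N = strlen(text);
    int h = 1, t = 0, i;

    /* h = d^(M-1) mod q, the weight of the leading character */
    for (i = 0; i < M - 1; i++)
        h = (h * d) % q;

    /* Hash of the first window */
    for (i = 0; i < M; i++)
        t = (d * t + text[i]) % q;
    printf("hash of window at 0 = %d\n", t);

    /* Roll: drop text[i-1], append text[i+M-1] */
    for (i = 1; i <= N - M; i++) {
        t = (d * (t - text[i - 1] * h) + text[i + M - 1]) % q;
        if (t < 0)
            t += q;
        printf("hash of window at %d = %d\n", i, t);
    }
    return 0;
}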

7. Handling Negative Hash Values

 If the updated hash value becomes negative, q is added to bring it back into the range [0, q-1].

8. Output the Match Index

 If a pattern match is found, the index of occurrence is printed.

9. Main Function Execution

 A sample text ("ABCCDEABCDABCD") and pattern ("ABCD") are given.

 The Rabin-Karp function is called, and it finds all occurrences of "ABCD" in the text.
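 With this input the matches begin at indices 6 and 10, so the program should print "Pattern found at index 6" followed by "Pattern found at index 10".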

10. Time Complexity

 Best case: O(N) (when there are few hash collisions).

 Worst case: O(NM) (when many hash collisions occur, leading to character-by-character comparisons).

 Average case: O(N) (efficient due to hashing).

2. Knuth-Morris-Pratt (KMP) Algorithm


#include <stdio.h>
#include <string.h>

// Build the LPS array: lps[i] = length of the longest proper prefix of
// pattern[0..i] that is also a suffix of pattern[0..i]
void computeLPSArray(char* pattern, int M, int* lps) {
    int len = 0, i = 1;
    lps[0] = 0;
    while (i < M) {
        if (pattern[i] == pattern[len]) {
            len++;
            lps[i] = len;
            i++;
        } else {
            if (len != 0)
                len = lps[len - 1];   // fall back to the previous border
            else {
                lps[i] = 0;
                i++;
            }
        }
    }
}

void KMPSearch(char* pattern, char* text) {
    int M = strlen(pattern);
    int N = strlen(text);
    int lps[M];

    computeLPSArray(pattern, M, lps);

    int i = 0, j = 0;   // i indexes the text, j indexes the pattern
    while (i < N) {
        if (pattern[j] == text[i]) {
            i++;
            j++;
        }
        if (j == M) {
            printf("Pattern found at index %d\n", i - j);
            j = lps[j - 1];           // continue searching for further matches
        } else if (i < N && pattern[j] != text[i]) {
            if (j != 0)
                j = lps[j - 1];       // skip characters already known to match
            else
                i++;
        }
    }
}

int main() {
    char text[] = "ABABDABACDABABCABAB";
    char pattern[] = "ABABCABAB";
    KMPSearch(pattern, text);
    return 0;
}

1. Compute the LPS (Longest Prefix Suffix) Array

 The LPS array helps determine how much the pattern should be shifted when a mismatch occurs.

 It stores the length of the longest proper prefix which is also a suffix for each prefix of the
pattern.

 If a mismatch happens, the LPS array tells where to resume the comparison instead of restarting
from the beginning.

2. Constructing the LPS Array

 Start with the first character (lps[0] = 0).

 Compare the current character with the last matched prefix character:

o If they match, extend the LPS length and update lps[i].

o If they don’t match, reduce the prefix length using previous LPS values until a match is
found or reset to zero.
 Continue until the entire pattern is processed; a worked example for the sample pattern follows this list.
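For the sample pattern "ABABCABAB" used in main(), these steps produce the following LPS array:

index:    0  1  2  3  4  5  6  7  8
pattern:  A  B  A  B  C  A  B  A  B
lps[i]:   0  0  1  2  0  1  2  3  4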

3. Searching the Pattern in the Text

 Traverse the text and compare it with the pattern.

 If characters match, move both pointers forward.

 If a complete match occurs, print the index and use lps[j-1] to determine how much to shift.

 If a mismatch occurs:

o Use the LPS array to avoid unnecessary rechecking.

o Move the pattern according to lps[j-1], instead of shifting it back completely.
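 With the sample text "ABABDABACDABABCABAB" and pattern "ABABCABAB", this procedure reports exactly one match, printing "Pattern found at index 10".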

4. Time Complexity

 Preprocessing (LPS Calculation): O(M)

 Pattern Matching: O(N)

 Overall Complexity: O(N + M), making KMP much faster than naive string-matching algorithms.

5. Key Advantage

 Unlike brute force methods, KMP doesn’t backtrack in the text, making it highly efficient for
searching in large strings.

3. Automata-Based String Matching Algorithm


#include <stdio.h>
#include <string.h>

#define NO_OF_CHARS 256

// Build the transition function: TF[state][ch] = next state
void computeTF(char* pattern, int M, int TF[][NO_OF_CHARS]) {
    int i, lps = 0, x;

    // State 0: only the first pattern character advances the automaton
    for (x = 0; x < NO_OF_CHARS; x++)
        TF[0][x] = 0;
    TF[0][pattern[0]] = 1;

    for (i = 1; i < M; i++) {
        // Copy transitions from the longest prefix-suffix (fallback) state
        for (x = 0; x < NO_OF_CHARS; x++)
            TF[i][x] = TF[lps][x];
        // A matching character moves to the next state
        TF[i][pattern[i]] = i + 1;
        // Update the fallback state for the next iteration
        lps = TF[lps][pattern[i]];
    }
    // Final (accepting) state behaves like its fallback state
    for (x = 0; x < NO_OF_CHARS; x++)
        TF[M][x] = TF[lps][x];
}

void search(char* text, char* pattern) {
    int M = strlen(pattern);
    int N = strlen(text);
    int TF[M + 1][NO_OF_CHARS];

    computeTF(pattern, M, TF);

    int state = 0, i;
    for (i = 0; i < N; i++) {
        state = TF[state][text[i]];
        if (state == M)
            printf("Pattern found at index %d\n", i - M + 1);
    }
}

int main() {
    char text[] = "AABAACAADAABAABA";
    char pattern[] = "AABA";
    search(text, pattern);
    return 0;
}

1. Transition Function (TF) Table Construction

 The Transition Function (TF) table determines the next state based on the current state and the
incoming character.

 It ensures that when a mismatch occurs, the algorithm jumps to the longest prefix which is also a
suffix of the matched portion.

Steps for Building the TF Table:

1. Initialize the table: Set all transitions for state 0 to 0, except for the first character.

2. Iterate through the pattern:

o Copy transitions from the longest prefix suffix (LPS) state.

o Update the transition for the current character.

o Update the LPS state using previous transitions.

3. Final state transitions: ensure that the last row (state M) contains proper fallback transitions so that matching can continue efficiently after a full match; a worked table for the sample pattern follows this list.
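As a concrete illustration of these steps, the transition table produced for the sample pattern "AABA" is shown below (only the transitions for 'A' and 'B' are listed; every other character sends the automaton back to state 0):

state | on 'A' | on 'B' | any other character
  0   |   1    |   0    |   0
  1   |   2    |   0    |   0
  2   |   2    |   3    |   0
  3   |   4    |   0    |   0
  4   |   2    |   0    |   0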

2. Pattern Matching Process

 Start from state 0.

 For each character in the text:

o Move to the next state using the TF table.

o If the final state (equal to pattern length) is reached, a match is found.
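For the sample text "AABAACAADAABAABA" and pattern "AABA", the state after reading each character is 1 2 3 4 2 0 1 2 0 1 2 3 4 2 3 4; state 4 (the final state) is reached at text positions 3, 12, and 15, which corresponds to matches at indices 0, 9, and 12.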

3. Time Complexity

 Preprocessing (TF Table Construction): O(M × NO_OF_CHARS), where M is the pattern length.

 Pattern Matching: O(N), where N is the text length.

 Overall Complexity: O(M × NO_OF_CHARS + N).

4. Key Advantages
 Efficient searching: Uses precomputed shifts, avoiding unnecessary comparisons.

 No backtracking: unlike brute-force methods, the finite automaton never re-reads a text character; it simply follows precomputed state transitions.

5. Example Execution

 Text: "AABAACAADAABAABA"

 Pattern: "AABA"

 The algorithm identifies matches at indices 0, 9, and 12 in a single left-to-right pass over the text.
