Finite Automaton Simulation With Regular Expression
Finite Automaton Simulation With Regular Expression
Submitted to:
In Partial Fulfillment
CSP20233X
By:
i
UNIVERSITY OF NEGROS OCCIDENTAL – RECOLETOS, INC.
COLLEGE OF INFORMATION TECHNOLOGY
#51 Lizares St, Bacolod City, 6100 Negros Occidental, Philippines
INTRODUCTION..................................................................................................................1
Problem Definition:............................................................................................................. 2
Input Requirements.......................................................................................................2
Accepted RE Symbols.................................................................................................... 2
PROGRAM DESIGN............................................................................................................. 5
Overview of the Program.................................................................................................... 5
Regex Input.................................................................................................................. 5
Standardization............................................................................................................ 5
Conversion to Postfix (Reverse Polish Notation)......................................................... 5
Regex to NFA converter............................................................................................... 5
Input String................................................................................................................... 5
String Validation............................................................................................................5
Input/Output....................................................................................................................... 6
Algorithm Implementation:.................................................................................................6
CODE WALKTHROUGH........................................................................................................8
TESTING........................................................................................................................... 16
CONCLUSION....................................................................................................................18
References....................................................................................................................... 19
Appendix............................................................................................................................... 20
CURRICULUM VITAE............................................................................................................21
INTRODUCTION
Automata theory, a branch of computer science that deals with the logic of
computation, focuses on abstract machines and the problems they can solve formally called
as Automatons. It’s an important subject for computer scientists to understand how machines
compute and decide whether a function is decidable or not (Basics of Automata Theory, n.d).
One of its core models is Finite Automatons (FA), which forms the basis of understanding
regular languages. This project focuses on simulating a finite automaton that is derived from
a regular expression.
Regular expressions (RE) or regex are concepts used commonly together with finite
automatons. They are notations that are used to define search patterns within strings
given string belongs to the language defined by the regular expression. The main goal of this
project is to design a regex to NFA converter and validate whether the given regex and input
The system works by taking a regular expression and standardizing it by making sure
that in between two symbols there is an operator. After standardizing, we convert the string
into a postfix notation version of the string to eliminate parentheses. Lastly, we take the
postfix string into the regular expression to NFA converter to generate an NFA using a
transition table. In the regular expression to NFA converter, we read each character in the
string. If the character is a symbol, we make an NFA fragment, else if it’s an operator, we do
PROBLEM DEFINITION
NFA) and accepts or rejects strings based on a given regular expression. The RE should be
converted to an automaton before string recognition. As such, the assumptions for the
- The user must always provide both a regular expression and an input string for
In a Non-deterministic Finite Automaton (NFA), the language is the set of all strings
over a given alphabet that the NFA "accepts". Since the given problem is to simulate a Finite
Automaton from a given regular expression, as long as both regex and input strings are
clearly defined, the program can generate any language. Thus any regular languages can be
THEORETICAL BACKGROUND
Automata theory is a branch of computer science that deals with the designing and
(Tutorialpoint, n.d). These abstract machines are commonly referred to as Automatons. These
automatons are used for natural language processing, pattern matching in text, or designing
compilers.
The components of an Automaton are states (Q) , alphabet (Σ), transition function (δ),
initial state (q0), and accept state (F). States are defined as the finite set of conditions that the
automaton can be in. Alphabets refers to the finite set of symbols that the automatons read as
input. Transition functions show how the automaton moves from one state to another based
on the language or input. The Initial state is where the computation/algorithm first begins.
Lastly, the Final state indicates whether the given input was accepted or rejected.
Expressions (RE) or regex. These are formal notations that describe patterns in strings, and
they define regular languages, the same class of language recognized by finite automatons.
Regular expressions are constructed using operators such as Concatenation (+), Union (|), and
transforms the RE input into a standardized format such that when a pair of symbols are
converted to postfix notation by removing the parentheses and moving the operator of the
pair of states to the right side of the symbols. The postfixed RE is then converted to an NFA
by reading each character and if that character is a symbol, it creates an NFA fragment with a
start state, an accept state, and a transition function to connect both states. If it's an operator, a
PROGRAM DESIGN
derived from a given regex and determines whether a given input string is accepted or
rejected. The program is implemented in Python for its simplicity and its extensive libraries
To fully simulate a Finite Automaton, the program needs to follow a structured flow
from providing a valid regex, converting it to NFA, and checking whether the inputted string
4. Regex to NFA converter - After the postfix, it creates a transition table to
5. Input String - The user inputs the string to be tested on the automaton.
6. String Validation - the NFA simulates state transitions for each character in the
input string. If the automaton ends in an accepting state, the string is accepted;
otherwise, it is rejected.
By doing these steps, the program not only validates the input strings correctly but
Input/Output
The program requires users to input a regex and input strings during compile time.
The regex must contain only alphanumeric characters and certain operators for it to be valid.
For the given example of string A and B, some valid expressions are (A.B), (A+B), (AB*),
etc,.
During the simulation process, the regex is converted into NFA where it presents a
transition table and compares the given input string with the converted regex. If it is of the
Algorithm Implementation:
To fully simulate the NFA-based automaton, the algorithm required several key steps
to fully convert a given regex into an NFA, and they can be broken down into the following:
precedence. An example being the infix regex A+B will become AB+.
- The postfix notation is then processed symbol after symbol to construct the
- Concatenation (.) connects the accept state of the first NFA to the start state of
the second. Connection of two NFA fragments refers to updating the transition
table of the first NFA such that the transition function of the accept state of
that NFA will be updated as the transition function of the start state of the
second NFA
- Union (+) creates NFA with a new start state then it reads the past NFA
fragments. The new start state is then connected to the accept state of the NFA
fragments. This is done by copying the transition function of the start states of
the two or more NFA fragments then applying it to the transition function of
- Lastly, the Kleene Star (*) makes the start state accepting and then adds a
- They will all then be combined step by step until the full NFA is constructed.
After which, another NFA object will be created to traverse the whole NFA to
- The given input string will be tested to be evaluated if it’s accepted on the
NFA or not. For each character in the input string, it checks all possible
transitions from the current states and moves on to the next states according to
the transition table. If any current state is an accepting state, then the string is
By following this algorithm, the program can fully simulate an NFA and generate any
language as long as the given regex and input strings are clearly defined.
CODE WALKTHROUGH
There are multiple functions that make this simulation possible, from creating the
Figure 1. State Creation function
In this portion of the code, the new_state() function is responsible for creating unique
state names to ensure every NFA fragment has distinct states during the construction. We also
have the constructor that initializes the states, its starting and accepting states, as well as the
transitions table which are implemented using a dictionary. This dictionary contains states as
keys and another dictionary as its value, this inner dictionary contains the alphabet as keys
The add_state() function is responsible for adding the states, identifying it if it’s either
a start or accepting state. This new state is added to its corresponding variables or containers
and then added to the transition table as a key. The add_transition() function defines the
transitions between states for a given symbol. The from state’s dictionary is updated by
adding a symbol and a set if it doesn’t exist, otherwise it just appends the destination state to
the set of destination states inside the dictionary of that from state.
The display() function prints out the transition table of the given NFA. First it collects
all symbols present in the RE. A header row is then created by joining the symbols. The loop
iterates each state then it prints the name of that state followed by the set of destination states
The simulate() function tests whether the given input string/test string is accepted or
rejected. Each character in the test string is read and traverses the graph if a state has a
transition function for that specific character. After all of the characters are read, it checks if
This function is done after converting the given regex into a postfix notation.
Alphanumeric symbols create the starting and accepting states and are added into a stack.
Operations such as concatenation take the last two fragments of the NFA by popping the
stack and are connected by replacing the transition table of the first NFA’s accept state with
the transition table of the start state of the second NFA. In union, a new NFA with a new start
state is created and is connected by replacing the transition table of the start state to the
transition table of the start state of the NFA fragments. Lastly, in kleene star, the start state
The inputted regex must be standardized before converting it into an NFA. The
an opening parenthesis don’t have an operation between them. This ensures that cases like
"ab", or "a*b" are rewritten as "a.b" and "a*.b", making the structure of the regex
After standardizing the regex, the postfix() function converts it into a postfix notation
using operator precedence to make building the NFA easier. It reads every character then it
checks if it’s an operator then it adds it to the stack if it’s empty but if another operator is
present in the stack then it checks the precedence such that if the current operator’s
precedence is lower than the previous operator then the past operator is appended. For
example, “a+b.c” will output “abc.+” because the “.” operator has a higher precedence thus it
is appended first.
In the main() function, it is where the users input the regex and test strings. It prints
out the original regex, then it becomes standardized, then converted into postfix notation,
then into an NFA transition table, then reconstructed to clean the NFA by eliminating the
dead states, lastly the transition table is displayed. Test strings are then declared and a for
loop is implemented to check if those test strings are accepted or rejected. Lastly, the set of
TESTING
To verify and test the accuracy of the program, multiple test cases were designed to
This test contains the regex “ab+(ba)*” and the input string are: “”, ”ab”, ”ba”,
“baba”, and “abab”. After the regex is standardized and converted into a postfix notation, the
given NFA transition table was made. The string for “ab” was accepted after traversing q8 →
q1 → q3 , for “ba” it’s q8→ q5 → q7, and for “baba” q8→ q5 → q7→q5 → q7. With q8 as
the start state and q3 and q7 as accept states, if the string reaches these accept states then the
test string is accepted. However, inputs such as blank and “abab” are rejected because in the
blank state, its state only stays in q8 while in “abab” traverses q8 → q1→ q3→ Φ thus both
CONCLUSION
regex into a Non-Deterministic Finite Automaton with the use of multiple functions and
following certain algorithms. The program works by first converting the regex into a
standardized form, followed by transforming it into a postfix notation. Afterwards, it uses the
After constructing the NFA, the test strings are then simulated and tested if they are
accepted or not. If they end in an accepting state, they are accepted; otherwise they are
rejected. The implementation demonstrates the use of Automata theory, and as long as both
regex and input strings are clearly defined, the program can generate any language.
Future works may include adding more regular expression operators, conversion of
NFA to DFA, and lastly, optimizing lines of code. Several lessons have been learned in the
making of this project such as how to implement graphs by using a table implemented by
dictionaries and how to implement a transition table. Other concepts have also been learned
such as the Postfix notation or the Reverse Polish notation. Lastly, what we learned overall is
the in-depth implementation and understanding of the regular expression to NFA conversion.
References
https://cs.stanford.edu/people/eroberts/courses/soco/projects/2004-05/automata-theory
/basics.html
GeeksforGeeks.
https://www.geeksforgeeks.org/theory-of-computation/regular-expressions-regular-gra
mmar-and-regular-languages/
https://www.geeksforgeeks.org/theory-of-computation/introduction-of-finite-automata
https://www.geeksforgeeks.org/theory-of-computation/applications-of-various-automa
ta/
https://www.tutorialspoint.com/automata_theory/constructing_fa_from_re.htm
Appendix
CURRICULUM VITAE