0% found this document useful (0 votes)
34 views24 pages

Finite Automaton Simulation With Regular Expression

The document outlines a project focused on simulating a Finite Automaton derived from a regular expression to determine if a given input string is accepted or rejected. It details the program's design, including steps for regex standardization, conversion to postfix notation, and the creation of a Non-deterministic Finite Automaton (NFA). The project serves as a practical application of automata theory and compiler design principles.

Uploaded by

Maxil Urocay
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
34 views24 pages

Finite Automaton Simulation With Regular Expression

The document outlines a project focused on simulating a Finite Automaton derived from a regular expression to determine if a given input string is accepted or rejected. It details the program's design, including steps for regex standardization, conversion to postfix notation, and the creation of a Non-deterministic Finite Automaton (NFA). The project serves as a practical application of automata theory and compiler design principles.

Uploaded by

Maxil Urocay
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 24

UNIVERSITY OF NEGROS OCCIDENTAL – RECOLETOS, INC.

COLLEGE OF INFORMATION TECHNOLOGY


#51 Lizares St, Bacolod City, 6100 Negros Occidental, Philippines

FINITE AUTOMATON SIMULATION WITH REGULAR EXPRESSION

AUTOMATA AND COMPILER THEORIES AND FORMAL LANGUAGES

Submitted to: ​

Mr. Maxil S. Urocay, MSCS ongoing

And to the Faculty of the

College of Information Technology

University of Negros Occidental – Recoletos, Inc.

In Partial Fulfillment

Of the Requirements for the Course

CSP20233X

By:

Rujz Jeamar Bernaldez

Cristeto III Turiano

September 25, 2025

i
UNIVERSITY OF NEGROS OCCIDENTAL – RECOLETOS, INC.
COLLEGE OF INFORMATION TECHNOLOGY
#51 Lizares St, Bacolod City, 6100 Negros Occidental, Philippines

INTRODUCTION..................................................................................................................1
Problem Definition:............................................................................................................. 2
Input Requirements.......................................................................................................2
Accepted RE Symbols.................................................................................................... 2
PROGRAM DESIGN............................................................................................................. 5
Overview of the Program.................................................................................................... 5
Regex Input.................................................................................................................. 5
Standardization............................................................................................................ 5
Conversion to Postfix (Reverse Polish Notation)......................................................... 5
Regex to NFA converter............................................................................................... 5
Input String................................................................................................................... 5
String Validation............................................................................................................5
Input/Output....................................................................................................................... 6
Algorithm Implementation:.................................................................................................6
CODE WALKTHROUGH........................................................................................................8
TESTING........................................................................................................................... 16
CONCLUSION....................................................................................................................18
References....................................................................................................................... 19
Appendix............................................................................................................................... 20
CURRICULUM VITAE............................................................................................................21

College of Information Technology ​


UNIVERSITY OF NEGROS OCCIDENTAL – RECOLETOS, INC.
COLLEGE OF INFORMATION TECHNOLOGY
#51 Lizares St, Bacolod City, 6100 Negros Occidental, Philippines

INTRODUCTION

Automata theory, a branch of computer science that deals with the logic of

computation, focuses on abstract machines and the problems they can solve formally called

as Automatons. It’s an important subject for computer scientists to understand how machines

compute and decide whether a function is decidable or not (Basics of Automata Theory, n.d).

One of its core models is Finite Automatons (FA), which forms the basis of understanding

regular languages. This project focuses on simulating a finite automaton that is derived from

a regular expression.

Regular expressions (RE) or regex are concepts used commonly together with finite

automatons. They are notations that are used to define search patterns within strings

(GeeksforGeeks, n.d). By converting a regular expression into a finite automaton–either

Deterministic (DFA) or Non-Deterministic (NFA) – we can systematically determine if a

given string belongs to the language defined by the regular expression. The main goal of this

project is to design a regex to NFA converter and validate whether the given regex and input

string are of the same language.

The system works by taking a regular expression and standardizing it by making sure

that in between two symbols there is an operator. After standardizing, we convert the string

into a postfix notation version of the string to eliminate parentheses. Lastly, we take the

postfix string into the regular expression to NFA converter to generate an NFA using a

transition table. In the regular expression to NFA converter, we read each character in the

string. If the character is a symbol, we make an NFA fragment, else if it’s an operator, we do

the corresponding connection in between NFA fragments.

College of Information Technology ​


1
UNIVERSITY OF NEGROS OCCIDENTAL – RECOLETOS, INC.
COLLEGE OF INFORMATION TECHNOLOGY
#51 Lizares St, Bacolod City, 6100 Negros Occidental, Philippines

PROBLEM DEFINITION

The problem is to develop a program that simulates a Finite Automaton (DFA or

NFA) and accepts or rejects strings based on a given regular expression. The RE should be

converted to an automaton before string recognition. As such, the assumptions for the

following projects goes as the following:

a.​ Input Requirements

-​ The user must always provide both a regular expression and an input string for

the system to process.

b.​ Accepted RE Symbols

-​ Only alphanumeric characters and operators such as Concatenation (“.”),

Union (“+”), and Kleene Star (“*”)

​ In a Non-deterministic Finite Automaton (NFA), the language is the set of all strings

over a given alphabet that the NFA "accepts". Since the given problem is to simulate a Finite

Automaton from a given regular expression, as long as both regex and input strings are

clearly defined, the program can generate any language. Thus any regular languages can be

expressed based on a given regular expression inputted by the user.

College of Information Technology ​


2
UNIVERSITY OF NEGROS OCCIDENTAL – RECOLETOS, INC.
COLLEGE OF INFORMATION TECHNOLOGY
#51 Lizares St, Bacolod City, 6100 Negros Occidental, Philippines

THEORETICAL BACKGROUND

​ Automata theory is a branch of computer science that deals with the designing and

creation of abstract machines that follows a predetermined set of operations automatically

(Tutorialpoint, n.d). These abstract machines are commonly referred to as Automatons. These

automatons are used for natural language processing, pattern matching in text, or designing

compilers.

​ The components of an Automaton are states (Q) , alphabet (Σ), transition function (δ),

initial state (q0), and accept state (F). States are defined as the finite set of conditions that the

automaton can be in. Alphabets refers to the finite set of symbols that the automatons read as

input. Transition functions show how the automaton moves from one state to another based

on the language or input. The Initial state is where the computation/algorithm first begins.

Lastly, the Final state indicates whether the given input was accepted or rejected.

​ Another fundamental concept used in Automata theory is the use of Regular

Expressions (RE) or regex. These are formal notations that describe patterns in strings, and

they define regular languages, the same class of language recognized by finite automatons.

Regular expressions are constructed using operators such as Concatenation (+), Union (|), and

Kleene Star (*) to enable definition of more complex patterns.

The conversion of a regular expression to an NFA involves multiple steps such as

standardization, conversion to postfix notation, then finally postfix to NFA. Standardization

transforms the RE input into a standardized format such that when a pair of symbols are

present without an operator, a concatenation operator is added. The standardized RE is then

converted to postfix notation by removing the parentheses and moving the operator of the

College of Information Technology ​


3
UNIVERSITY OF NEGROS OCCIDENTAL – RECOLETOS, INC.
COLLEGE OF INFORMATION TECHNOLOGY
#51 Lizares St, Bacolod City, 6100 Negros Occidental, Philippines

pair of states to the right side of the symbols. The postfixed RE is then converted to an NFA

by reading each character and if that character is a symbol, it creates an NFA fragment with a

start state, an accept state, and a transition function to connect both states. If it's an operator, a

corresponding function handles the connection between the NFA fragments.

College of Information Technology ​


4
UNIVERSITY OF NEGROS OCCIDENTAL – RECOLETOS, INC.
COLLEGE OF INFORMATION TECHNOLOGY
#51 Lizares St, Bacolod City, 6100 Negros Occidental, Philippines

PROGRAM DESIGN

Overview of the Program

The program is designed to simulate a Finite Automaton, specifically NFA, that is

derived from a given regex and determines whether a given input string is accepted or

rejected. The program is implemented in Python for its simplicity and its extensive libraries

that makes the program possible.

To fully simulate a Finite Automaton, the program needs to follow a structured flow

from providing a valid regex, converting it to NFA, and checking whether the inputted string

is accepted or not. Specifically, the program does the following:

1.​ Regex Input - The user provides a valid regex

2.​ Standardization - The regex is preprocessed into a consistent format.

3.​ Conversion to Postfix (Reverse Polish Notation) - After standardization, the

regex is converted into a postfix notation for the operator precedence.

4.​ Regex to NFA converter - After the postfix, it creates a transition table to

convert the regex into an NFA.

5.​ Input String - The user inputs the string to be tested on the automaton.

6.​ String Validation - the NFA simulates state transitions for each character in the

input string. If the automaton ends in an accepting state, the string is accepted;

otherwise, it is rejected.

​ By doing these steps, the program not only validates the input strings correctly but

also demonstrates the practical application of Automata theory.

College of Information Technology ​


5
UNIVERSITY OF NEGROS OCCIDENTAL – RECOLETOS, INC.
COLLEGE OF INFORMATION TECHNOLOGY
#51 Lizares St, Bacolod City, 6100 Negros Occidental, Philippines

Input/Output​

The program requires users to input a regex and input strings during compile time.

The regex must contain only alphanumeric characters and certain operators for it to be valid.

For the given example of string A and B, some valid expressions are (A.B), (A+B), (AB*),

etc,.

​ During the simulation process, the regex is converted into NFA where it presents a

transition table and compares the given input string with the converted regex. If it is of the

same language, the output would be Accepted, else it would be Rejected.

Algorithm Implementation:

​ To fully simulate the NFA-based automaton, the algorithm required several key steps

to fully convert a given regex into an NFA, and they can be broken down into the following:

1.​ Regular Expression Standardization

-​ The inputted regular expressions are first standardized to make concatenation

explicit. For example, the regex AB* is converted into A.B*.

2.​ Conversion to Postfix Notation:

-​ The standardized regex is converted into postfix to simplify operator

precedence. An example being the infix regex A+B will become AB+.

3.​ Regex to NFA Conversion

-​ The postfix notation is then processed symbol after symbol to construct the

NFA. If it’s an alphanumeric character or a symbol, it creates an NFA with one

transition from a start to an accept state. Operators such as Concatenation, do

specific tasks based on their operation.

College of Information Technology ​


6
UNIVERSITY OF NEGROS OCCIDENTAL – RECOLETOS, INC.
COLLEGE OF INFORMATION TECHNOLOGY
#51 Lizares St, Bacolod City, 6100 Negros Occidental, Philippines

-​ Concatenation (.) connects the accept state of the first NFA to the start state of

the second. Connection of two NFA fragments refers to updating the transition

table of the first NFA such that the transition function of the accept state of

that NFA will be updated as the transition function of the start state of the

second NFA

-​ Union (+) creates NFA with a new start state then it reads the past NFA

fragments. The new start state is then connected to the accept state of the NFA

fragments. This is done by copying the transition function of the start states of

the two or more NFA fragments then applying it to the transition function of

the start state of the new NFA.

-​ Lastly, the Kleene Star (*) makes the start state accepting and then adds a

transition from the starting state to itself.

-​ They will all then be combined step by step until the full NFA is constructed.

After which, another NFA object will be created to traverse the whole NFA to

create a new NFA without any dead states.

4.​ String Simulation:

-​ The given input string will be tested to be evaluated if it’s accepted on the

NFA or not. For each character in the input string, it checks all possible

transitions from the current states and moves on to the next states according to

the transition table. If any current state is an accepting state, then the string is

accepted, otherwise it is rejected.

​ By following this algorithm, the program can fully simulate an NFA and generate any

language as long as the given regex and input strings are clearly defined.

College of Information Technology ​


7
UNIVERSITY OF NEGROS OCCIDENTAL – RECOLETOS, INC.
COLLEGE OF INFORMATION TECHNOLOGY
#51 Lizares St, Bacolod City, 6100 Negros Occidental, Philippines

CODE WALKTHROUGH

​ There are multiple functions that make this simulation possible, from creating the

states to the regex conversion process.


Figure 1. State Creation function

​ In this portion of the code, the new_state() function is responsible for creating unique

state names to ensure every NFA fragment has distinct states during the construction. We also

have the constructor that initializes the states, its starting and accepting states, as well as the

transitions table which are implemented using a dictionary. This dictionary contains states as

keys and another dictionary as its value, this inner dictionary contains the alphabet as keys

and a set of destination states as its value.

College of Information Technology ​


8
UNIVERSITY OF NEGROS OCCIDENTAL – RECOLETOS, INC.
COLLEGE OF INFORMATION TECHNOLOGY
#51 Lizares St, Bacolod City, 6100 Negros Occidental, Philippines

Figure 2. State and Transition Addition function

​ The add_state() function is responsible for adding the states, identifying it if it’s either

a start or accepting state. This new state is added to its corresponding variables or containers

and then added to the transition table as a key. The add_transition() function defines the

transitions between states for a given symbol. The from state’s dictionary is updated by

adding a symbol and a set if it doesn’t exist, otherwise it just appends the destination state to

the set of destination states inside the dictionary of that from state.

College of Information Technology ​


9
UNIVERSITY OF NEGROS OCCIDENTAL – RECOLETOS, INC.
COLLEGE OF INFORMATION TECHNOLOGY
#51 Lizares St, Bacolod City, 6100 Negros Occidental, Philippines

Figure 3. Display Function

​ The display() function prints out the transition table of the given NFA. First it collects

all symbols present in the RE. A header row is then created by joining the symbols. The loop

iterates each state then it prints the name of that state followed by the set of destination states

from the alphabetical order of the symbols.

College of Information Technology ​


10
UNIVERSITY OF NEGROS OCCIDENTAL – RECOLETOS, INC.
COLLEGE OF INFORMATION TECHNOLOGY
#51 Lizares St, Bacolod City, 6100 Negros Occidental, Philippines

Figure 4. Simulate Function

​ The simulate() function tests whether the given input string/test string is accepted or

rejected. Each character in the test string is read and traverses the graph if a state has a

transition function for that specific character. After all of the characters are read, it checks if

the state is an accept state thus returns as accepted.

College of Information Technology ​


11
UNIVERSITY OF NEGROS OCCIDENTAL – RECOLETOS, INC.
COLLEGE OF INFORMATION TECHNOLOGY
#51 Lizares St, Bacolod City, 6100 Negros Occidental, Philippines

Figure 5. Regex to NFA converter

​ This function is done after converting the given regex into a postfix notation.

Alphanumeric symbols create the starting and accepting states and are added into a stack.

Operations such as concatenation take the last two fragments of the NFA by popping the

stack and are connected by replacing the transition table of the first NFA’s accept state with

the transition table of the start state of the second NFA. In union, a new NFA with a new start

state is created and is connected by replacing the transition table of the start state to the

transition table of the start state of the NFA fragments. Lastly, in kleene star, the start state

will become an accept state and connected to itself.

College of Information Technology ​


12
UNIVERSITY OF NEGROS OCCIDENTAL – RECOLETOS, INC.
COLLEGE OF INFORMATION TECHNOLOGY
#51 Lizares St, Bacolod City, 6100 Negros Occidental, Philippines

Figure 6. Standardize function

​ The inputted regex must be standardized before converting it into an NFA. The

standardize() function transforms the regex to make concatenation explicit. It adds a

concatenation operator by checking if a pair of symbols or a closing parenthesis followed by

an opening parenthesis don’t have an operation between them. This ensures that cases like

"ab", or "a*b" are rewritten as "a.b" and "a*.b", making the structure of the regex

unambiguous and much easier to parse correctly during NFA construction.

College of Information Technology ​


13
UNIVERSITY OF NEGROS OCCIDENTAL – RECOLETOS, INC.
COLLEGE OF INFORMATION TECHNOLOGY
#51 Lizares St, Bacolod City, 6100 Negros Occidental, Philippines

Figure 7. Postfix (Reverse Polish Notation) function

​ After standardizing the regex, the postfix() function converts it into a postfix notation

using operator precedence to make building the NFA easier. It reads every character then it

checks if it’s an operator then it adds it to the stack if it’s empty but if another operator is

present in the stack then it checks the precedence such that if the current operator’s

precedence is lower than the previous operator then the past operator is appended. For

example, “a+b.c” will output “abc.+” because the “.” operator has a higher precedence thus it

is appended first.

College of Information Technology ​


14
UNIVERSITY OF NEGROS OCCIDENTAL – RECOLETOS, INC.
COLLEGE OF INFORMATION TECHNOLOGY
#51 Lizares St, Bacolod City, 6100 Negros Occidental, Philippines

Figure 8. Main function

​ In the main() function, it is where the users input the regex and test strings. It prints

out the original regex, then it becomes standardized, then converted into postfix notation,

then into an NFA transition table, then reconstructed to clean the NFA by eliminating the

dead states, lastly the transition table is displayed. Test strings are then declared and a for

loop is implemented to check if those test strings are accepted or rejected. Lastly, the set of

accept states are displayed.

College of Information Technology ​


15
UNIVERSITY OF NEGROS OCCIDENTAL – RECOLETOS, INC.
COLLEGE OF INFORMATION TECHNOLOGY
#51 Lizares St, Bacolod City, 6100 Negros Occidental, Philippines

TESTING

​ To verify and test the accuracy of the program, multiple test cases were designed to

cover normal inputs, edge cases, and special cases.

Figure 9 & 10 Simulation

College of Information Technology ​


16
UNIVERSITY OF NEGROS OCCIDENTAL – RECOLETOS, INC.
COLLEGE OF INFORMATION TECHNOLOGY
#51 Lizares St, Bacolod City, 6100 Negros Occidental, Philippines

​ This test contains the regex “ab+(ba)*” and the input string are: “”, ”ab”, ”ba”,

“baba”, and “abab”. After the regex is standardized and converted into a postfix notation, the

given NFA transition table was made. The string for “ab” was accepted after traversing q8 →

q1 → q3 , for “ba” it’s q8→ q5 → q7, and for “baba” q8→ q5 → q7→q5 → q7. With q8 as

the start state and q3 and q7 as accept states, if the string reaches these accept states then the

test string is accepted. However, inputs such as blank and “abab” are rejected because in the

blank state, its state only stays in q8 while in “abab” traverses q8 → q1→ q3→ Φ thus both

test strings are rejected.

College of Information Technology ​


17
UNIVERSITY OF NEGROS OCCIDENTAL – RECOLETOS, INC.
COLLEGE OF INFORMATION TECHNOLOGY
#51 Lizares St, Bacolod City, 6100 Negros Occidental, Philippines

CONCLUSION

​ This project successfully simulates a Finite automaton. Specifically, it can convert a

regex into a Non-Deterministic Finite Automaton with the use of multiple functions and

following certain algorithms. The program works by first converting the regex into a

standardized form, followed by transforming it into a postfix notation. Afterwards, it uses the

regex and converts it into a NFA.

​ After constructing the NFA, the test strings are then simulated and tested if they are

accepted or not. If they end in an accepting state, they are accepted; otherwise they are

rejected. The implementation demonstrates the use of Automata theory, and as long as both

regex and input strings are clearly defined, the program can generate any language.

​ Future works may include adding more regular expression operators, conversion of

NFA to DFA, and lastly, optimizing lines of code. Several lessons have been learned in the

making of this project such as how to implement graphs by using a table implemented by

dictionaries and how to implement a transition table. Other concepts have also been learned

such as the Postfix notation or the Reverse Polish notation. Lastly, what we learned overall is

the in-depth implementation and understanding of the regular expression to NFA conversion.

College of Information Technology ​


18
UNIVERSITY OF NEGROS OCCIDENTAL – RECOLETOS, INC.
COLLEGE OF INFORMATION TECHNOLOGY
#51 Lizares St, Bacolod City, 6100 Negros Occidental, Philippines

References

Basics of Automata Theory. (n.d.).

https://cs.stanford.edu/people/eroberts/courses/soco/projects/2004-05/automata-theory

/basics.html

GeeksforGeeks. (n.d). Regular expressions, regular grammar and regular languages.

GeeksforGeeks.

https://www.geeksforgeeks.org/theory-of-computation/regular-expressions-regular-gra

mmar-and-regular-languages/

GeeksforGeeks. (n.d). Introduction of finite automata. GeeksforGeeks.

https://www.geeksforgeeks.org/theory-of-computation/introduction-of-finite-automata

GeeksforGeeks. (n.d). Applications of various Automata. GeeksforGeeks.

https://www.geeksforgeeks.org/theory-of-computation/applications-of-various-automa

ta/

Construction of an FA from an RE. (n.d.).

https://www.tutorialspoint.com/automata_theory/constructing_fa_from_re.htm

Automata Theory Tutorial. (n.d.). https://www.tutorialspoint.com/automata_theory/index.htm

College of Information Technology ​


19
UNIVERSITY OF NEGROS OCCIDENTAL – RECOLETOS, INC.
COLLEGE OF INFORMATION TECHNOLOGY
#51 Lizares St, Bacolod City, 6100 Negros Occidental, Philippines

Appendix

College of Information Technology ​


20
UNIVERSITY OF NEGROS OCCIDENTAL – RECOLETOS, INC.
COLLEGE OF INFORMATION TECHNOLOGY
#51 Lizares St, Bacolod City, 6100 Negros Occidental, Philippines

CURRICULUM VITAE

Name ​ : ​ Rujz Jeamar P. Bernaldez


Program of Study : ​ BS in Computer Science
Date of Birth ​ : ​ December 29, 2004
Contact Details ​ : ​ [email protected];
​ 09682952204
Areas of Interest : ​ Software Engineering, Web dev, ​
​ ​ ​ ​ Computer Programmer

Name ​ : ​ Cristeto III M. Turiano


Program of Study : ​ BS in Computer Science
Date of Birth ​ : ​ November 27, 2004
Contact Details ​ : ​ [email protected];
​ 09762407404
Areas of Interest : ​ Cybersecurity, Software Engineer,
​ Computer Programmer

College of Information Technology ​


21
UNIVERSITY OF NEGROS OCCIDENTAL – RECOLETOS, INC.
COLLEGE OF INFORMATION TECHNOLOGY
#51 Lizares St, Bacolod City, 6100 Negros Occidental, Philippines

AUTOMATA PRELIM PROJECT RUBRICS:

RUBRICS CATEGORY SCORES

Title of the Project: Clear, concise, and accurately reflects 5 4 3 2 1


the project.
Introduction: Clear, concise explanation of the project, 5 4 3 2 1
problem, and importance
Problem Definition: Clear, concise, and defines the problem 5 4 3 2 1
perfectly
Theoretical Background: Comprehensive and accurate 5 4 3 2 1
definition with components well-explained
Program Design: Clear, well-structured program flow 5 4 3 2 1

Code Walk-through: Comprehensive, clear breakdown of the 5 4 3 2 1


code with references to key functions
Testing: Well-documented test cases with sample input/output 5 4 3 2 1

Conclusion: Clear and concise summary, highlights major 5 4 3 2 1


accomplishments
References & Appendix: Comprehensive and correctly 5 4 3 2 1
formatted list of references. Relevant and well-organized
additional information (e.g., full code, diagrams)
Video Walk-through: Clear, concise, and professional 5 4 3 2 1
presentation with detailed explanations
TOTAL SCORE:

College of Information Technology ​


22

You might also like