Assignment # 1
Question 1:
(5 + 5 = 10)
Calculate Tokens with its type in a Given Code Snippet?
Code:
int main() {
int a = 10;
float b = 20.5;
a = a + 1;
printf("Sum: %d", a + b);
return 0;
Solution:
Now that we have identified each token, we can classify and count them:
Token Type
Keywor
int
d
Identifie
main
r
Delimit
(
er
Delimit
)
er
Delimit
{
er
Keywor
int
d
Identifie
a
r
Operato
=
r
10 Literal
; Delimit
er
Keywor
float
d
Identifie
b
r
Operato
=
r
20.5 Literal
Delimit
;
er
Identifie
a
r
Operato
=
r
Identifie
a
r
Operato
+
r
1 Literal
Delimit
;
er
Identifie
printf
r
Delimit
(
er
"Sum:
Literal
%d"
Delimit
,
er
Identifie
a
r
Operato
+
r
Identifie
b
r
Delimit
)
er
Delimit
;
er
return Keywor
d
0 Literal
Delimit
;
er
Delimit
}
er
Total Tokens: 34
Summary of Token Types
Keywords: 4 (int, int, float, return)
Identifiers: 8 (main, a, b, a, a, printf, a, b)
Operators: 5 (=, =, =, +, +)
Literals: 5 (10, 20.5, 1, "Sum: %d", 0)
Delimiters: 12 ((, ), {, ;, ;, ;, (, ,, ), ;, }, ;)
Question 2:
What is the concept of Regular Expression (RE) and why we used it
in the Formal Language?
Solution:
A regular expression (regex) is a sequence of characters that defines a
search pattern, primarily for use in pattern matching with strings. Regular
expressions are foundational in formal language theory, a branch of
computer science that studies the syntax and structure of languages,
especially in terms of automata and formal grammars.
Key Concepts of Regular Expressions
In formal language and automata theory, regular expressions are used to
describe patterns in strings that belong to a regular language. A regular
language is one that can be expressed using a finite state machine (FSM) or
a deterministic finite automaton (DFA), making it suitable for describing
simple syntax in programming languages, lexical analysis, search patterns,
and more.
Why Regular Expressions Are Used in Formal Language
1. Lexical Analysis in Compilers: Regular expressions are fundamental
in lexical analysis, where the compiler uses them to recognize tokens
(keywords, identifiers, literals, etc.) in programming languages. Each
type of token has a corresponding regular expression that matches its
structure. For example:
a. Keywords: int|float|if|else
b. Identifiers: [a-zA-Z_][a-zA-Z0-9_] *
c. Integer literals: [0-9]+
2. String Matching and Pattern Recognition: Regular expressions
allow for efficient pattern recognition in strings. They can identify
sequences that match a particular structure, such as email addresses,
dates, phone numbers, etc., by specifying the rules of composition.
3. Finite State Machines (FSM): Regular expressions can be converted
into finite automata, such as DFA or NFA (nondeterministic finite
automata), which makes them easy to process computationally. This
relationship enables regular expressions to provide a framework for
designing and understanding FSMs, which in turn form the basis for
recognizing patterns in text or code.
4. Simplicity and Efficiency: Regular expressions allow for concise and
efficient specification of patterns. Since they only require finite
memory (can be processed without requiring unbounded memory like
stack), they are computationally efficient, which is ideal for tokenizing
programming languages in compilers.
Example of Regular Expressions in Formal Language
Suppose we want to recognize integer literals and identifiers:
Integer literals: [0-9]+ matches any sequence of digits, such as 123,
4567, etc.
Identifiers: [a-zA-Z_][a-zA-Z0-9_]* matches strings that start with
a letter or underscore and are followed by letters, numbers, or
underscores (e.g., count, _value1).
In formal language, regular expressions make defining these patterns
efficient, and they help in constructing finite automata for recognizing
language patterns systematically. This efficiency and precision are why
regular expressions are integral in defining formal languages and are widely
used in compiler construction and text processing tasks.