Ch2 - Lexical Analysis
Ch2 - Lexical Analysis
Lexical Analysis/Scanner
• Note that eof retains its use as a marker for the end of the entire
input.
• Any eof that appears other than at the end of a buffer means that
the input is at an end.
▪ All Strings in English alphabet that start with “tab” or end with
“bat”: tab{A,…,Z,a,...,z}*|{A,…,Z,a,....,z}*bat
▪ All Strings in English alphabet in Which {1,2,3} exist in
ascending order: {A,…,Z}*1 {A,…,Z}*2 {A,…,Z}*3 {A,…,Z}*
▪ where:
• Each di is a new symbol, not in and not the same as any other of the
d's
• Each ri is a regular expression over the alphabet U {dl, d2,. . . , di-1).
Note that
• * is used to indicate
that we must retract the
input/forward pointer
one position ( back)
• i.e., the lexeme does
not include the symbol
that got us to the
accepting state
NB: A non-letter/digit is used to test any character that cannot be the continuation of an identifier
Input Symbol
State 0 1
A {A,B} {A,C}
Transition graph shown B {D} --
above can be represented C -- {D}
by transition table on the D {D} {D}
right side
Chapter – 2 : Lexical Analysis 42 Bahir Dar Institute of Technology
Nondeterministic Finite Automata…
▪ Acceptance of Input Strings by Automata
• An NFA accepts an input string str iff there is some path in the
finite-state transition diagram from the start state to some final
state such that the edge labels along this path spell out str
• The language recognized by an NFA is the set of strings it
accepts
• Example
1
1
• Ex: a|b
▪ Example: (a|b)*
let Dtran[A, a] = B
▪ compute Dtran[A, b].
• Among the states in A, only 4 has a transition on b, and it
goes to 5
a
start a b b
0 1 2 3
b
-closure(0) = {0}
move(0,a) = {0,1} New states
move(0,b) = {0} a b
move({0,1}, a) = {0,1} A B A
A = {0}
move({0,1}, b) = {0,2} B = {0,1} B B C
move({0,2}, a) = {0,1} C = {0,2}
move({0,2}, b) = {0,3} C B D
D = {0,3}
move({0,3}, a) = {0,1} D B A
move({0,3}, b) = {0}
NFA to DFA conversion: Example 3
a
a
1 2
start
0 b
b
3 4
Regular
R RE NFA
Expression Conversion
NFA DFA
Conversion
▪ Solution:
{𝐴, 𝐵, 𝐶, 𝐷, 𝐸}
{𝐴, 𝐵, 𝐶} {𝐷}
{𝐴, 𝐶} {𝐵}
lex.yy.c C a.out
compiler
%%
{ws} {/* no action and no return */}
if {return(IF);}
then {return(THEN);}
else {return(ELSE);}
{id} {yylval = (int) installID(); return(ID); }
{number} {yylval = (int) installNum(); return(NUMBER);}
…