Compiler Design CS - 3
Compiler Design CS - 3
BITS Pilani
Pilani | Dubai | Goa | Hyderabad
BITS Pilani
Pilani | Dubai | Goa | Hyderabad
Contact Session - 3
Introduction to Lexical Analyzer
IMP Note to Self
5
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Hierarchy of languages
Recursive Languages
Context-Free Languages
Regular Languages
6
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Deterministic Finite State Automata (DFA)
……..
0 1 1 0 0
Finite
Control
• One-way, infinite tape, broken into cells
• One-way, read-only tape head.
• Finite control, i.e.,
– finite number of states, and
– transition rules between them, i.e.,
– a program, containing the position of the read head, current symbol being scanne
d, and the current “state.”
• A string is placed on the tape, read head is positioned at the left end,
and the DFA will read the string one symbol at a time until all symbols
have been read. The DFA will then either accept or reject the string.
7
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
• The finite control can be described by a transition diagram or table:
• Example #1:
1
0
q0 q1 1
0
1 0 0 1 1
q0 q0 q1 q0 q0 q0
M1 M2 …. M-inf
1 1
q3 q2
1 0 0 0
q0 0 q1 1
1 0 0 1 1
q0 q3 q1 q2 q2 q2 accept string
a a a/b/c
c c
q0 q1 q2
b b
a c c c b accepted
q0 q0 q1 q2 q2 q2
a a c rejected
q0 q0 q0 q1
11
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
a a a/b/c
q0 c q1 c q2
b b
Inductive Proof (sketch): that the machine correctly accepts strings with at least two c’s
Proof goes over the length of the string.
Inductive steps: Each case for symbol p, for string xp (|xp| = k+1), the last symbol p = a, b or c
xa xb xc
• A DFA is a five-tuple:
M = (Q, Σ, δ, q0, F)
Intuitively, δ(q,s) is the state entered by M after reading symbol s while in sta
te q.
13
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
• Revisit example #1: 1
Q = {q0, q1} 0
Σ = {0, 1} q0 q1 1
Start state is q0 0
F = {q0}
δ:
0 1
q0 q1 q0
q1 q0 q1
14
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
• Revisit example #2:
a a a/b/c
Q = {q0, q1, q2}
Σ = {a, b, c} c c
q0 q1 q2
Start state is q0
F = {q2} b b
δ: a b c
q0 q0 q0 q1
q1 q1 q1 q2
q2 q2 q2 q2
15
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Extension of δ to Strings
δ^ : (Q x Σ*) –> Q
δ^(q,w) – The state entered after reading string w having started in state
q.
Formally:
1) δ^(q, ε) = q, and
2) For all w in Σ* and a in Σ
δ^(q,wa) = δ (δ^(q,w), a)
16
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
• Recall Example #1: 1
0
q0 q1 1
0
• What is δ^(q0, 011)? Informally, it is the state entered by M after proc
essing 011 having started in state q0.
• Formally:
• Therefore:
18
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
• Example #3:
1 1 1
0
0
q0 q1 q2
0
• What is δ(q0, 011)? Informally, it is the state entered by M after proce
ssing 011 having started in state q0.
• Formally:
δ(q0, 011) = δ (δ(q0,01), 1) by rule #2
= δ (δ (δ(q0,0), 1), 1) by rule #2
= δ (δ (q1, 1), 1) by definition of δ
= δ (q1, 1) by definition of δ
= q1 by definition of δ
• Is 10 accepted? No, since δ(q0, 10) = q1 is not a final state. The fact tha
t δ(q1, 10) = q2 is irrelevant, q1 is not the start state!
20
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Definitions related to DFAs
• Let M = (Q, Σ, δ,q0,F) be a DFA and let w be in Σ*. Then w is accepted by M if
f δ(q0,w) = p for some state p in F.
• Let M = (Q, Σ, δ,q0,F) be a DFA. Then the language accepted by M is the set:
• Let M1 = (Q1, Σ1, δ1, q0, F1) and M2 = (Q2, Σ2, δ2, p0, F2) be DFAs. Then M1 and
M2 are equivalent iff L(M1) = L(M2).
21
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
• Notes:
– A DFA M = (Q, Σ, δ,q0,F) partitions the set Σ* into two sets: L(M) and
Σ* - L(M).
– If L = L(M) then L is a subset of L(M) and L(M) is a subset of L (def. of set equality).
– Similarly, if L(M1) = L(M2) then L(M1) is a subset of L(M2) and L(M2) is a subset of L(
M1).
• Can you write a program to “simulate” a given DFA, or any arbitrary input DFA?
22
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
• Give a DFA M such that:
0/1
0/1 0/1
q0 q1 q2
23
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
• Give a DFA M such that:
b/c a/b/c
a
a
q0 q1 q2
b/c
Logic:
In Start state (q0): b’s and c’s: ignore – stay in same state
q0 is also “accept” state
First ‘a’ appears: get ready (q1) to reject
But followed by a ‘b’ or ‘c’: go back to start state q0
When second ‘a’ appears after the “ready” state: go to reject state q2
Ignore everything after getting to the “reject” state q2
24
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
• Give a DFA M such that:
b/c a a/b/c
a a
b
q0 q1 q2 q3
c
b/c
Logic: acceptance is straight forward, progressing on each expected sym
bol
However, rejection needs special care, in each state (for DFA, we will see t
his becomes easier in NFA, non-deterministic machine)
25
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
• Give a DFA M such that:
q0 a b q7
b
b a
b a
q4 q5 q6
b
Remember, you may have multiple “final” states, but only one “start” stat
e
26
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
• Let Σ = {0, 1}. Give DFAs for {}, {ε}, Σ*, and Σ+.
0/1
q0 q0 q1
27
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
• Problem: Third symbol from last is 1
0/1
1 0/1 0/1
q0 q1 q2 q3
Is this a DFA?
28
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Practice problem 1
• Draw a DFA for the language accepting strings starting with ‘ab’ over input alphab
ets ∑ = {a, b}
– Solution:
– Step-01:
• All strings of the language starts with substring “ab”. So, length of substring =
2.
– Thus, Minimum number of states required in the DFA = 2 + 2 = 4.
– It suggests that minimized DFA will have 4 states.
– Step-02:
– We will construct DFA for the following strings-
• ab
• aba
• abab
29
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
• Step 3
30
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Practice problem 2
• Construct a DFA that accepts a language L over input alphabet
s ∑ = {a, b} such that L is the set of all strings starting with ‘aa’
or ‘bb’.
• Step-01:
• Minimum number of states required in the DFA = 5.
• It suggests that minimized DFA will have 5 states.
• Step-02:
• We will construct DFA for the following strings-
• aa
• aaa
• aaaa
• bb
• bbb
• bbbb
31
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
• Step 3
32
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Practice problem 3
• Let Σ = {a, b} and let L = { w ∈ Σ* | w ≠ ε and the first and last c
haracter of w are the same }. Design a DFA for L.
33
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Practice problem 4
• Let Σ = {a, b} and let L = { w ∈ Σ* | w is a nonempty string who
se characters alternate between a's and b's }. Design a DFA w
hose language is L.
34
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Nondeterministic Finite Automata
Nondeterminism
Subset Construction
35
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Nondeterminism
• A nondeterministic finite automaton has the a
bility to be in several states at once.
• Transitions from a state on an input symbol ca
n be to any set of states.
36
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Nondeterminism – (2)
• Start in one start state.
• Accept if any sequence of choices leads to a
final state.
• Intuitively: the NFA always “guesses right.”
37
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Formal NFA
• A finite set of states, typically Q.
• An input alphabet, typically Σ.
• A transition function, typically δ.
• A start state in Q, typically q0.
• A set of final states F ⊆ Q.
38
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Transition Function of an NFA
• δ(q, a) is a set of states.
• Extend to strings as follows:
• Basis: δ(q, ε) = {q}
• Induction: δ(q, wa) = the union over all states
p in δ(q, w) of δ(p, a)
39
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Language of an NFA
• A string w is accepted by an NFA if δ(q0, w) co
ntains at least one final state.
• The language of the NFA is the set of strings it
accepts.
40
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Equivalence of DFA’s, NFA’s
41
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Equivalence – (2)
42
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Subset Construction
43
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Critical Point
44
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Subset Construction – (2)
• The transition function δD is defined by:
δD({q1,…,qk}, a) is the union over all i = 1,…,k of
δN(qi, a).
• Example: We’ll construct the DFA equivalent o
f our “chessboard” NFA.
45
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Example: Subset Construction
r b r b
47
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Example: Subset Construction
r b r b
48
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Example: Subset Construction
r b r b
49
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Example: Subset Construction
r b r b
50
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Example: Subset Construction
r b r b
51
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Example: Subset Construction
r b r b
52
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
NFA’s With ε-Transitions
• We can allow state-to-state transitions on ε in
put.
• These transitions are done spontaneously, wit
hout looking at the input string.
• A convenience at times, but still only regular l
anguages are accepted.
53
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Example: ε-NFA
ε 0 1 ε
A {E} {B} ∅
1 1 B ∅ {C} {D}
1 B C D C ∅ {D} ∅
D ∅ ∅ ∅
A ε ε 0
* E
F
{F} ∅ {B, C}
{D} ∅ ∅
0
E F
0
54
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Closure of States
55
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Equivalence of NFA, ε-NFA
• Every NFA is an ε-NFA.
– It just has no transitions on ε.
• Converse requires us to take an ε-NFA and co
nstruct an NFA that accepts the same languag
e.
• We do so by combining ε–transitions with the
next transition on a real input.
a
Transitions
on ε
Transitions
on ε
57
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Picture of ε-Transition Removal
To here, and performs
Text goes
the subset construction
from here
a
Transitions
on ε
Transitions
on ε
58
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Picture of ε-Transition Removal
To here, with no
subset construction
a
We’ll go
from here
a
a
Transitions
on ε
Transitions
on ε
59
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Equivalence – (2)
• Start with an ε-NFA with states Q, inputs Σ, st
art state q0, final states F, and transition functi
on δE.
• Construct an “ordinary” NFA with states Q, inp
uts Σ, start state q0, final states F’, and transiti
on function δN.
60
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Equivalence – (3)
61
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Interesting closures:
Example: ε-NFA-to-NFA
CL(B) = {B,D};
CL(E) = {B,C,D,E}
0 1 ε 0 1
A {E} {B} ∅ A {E} {B}
B ∅ {C} {D} * B ∅ {C}
C ∅ {D} ∅ C ∅ {D}
D ∅ ∅ ∅ D ∅ ∅
*
* E {F} ∅ {B, C} E {F} {C, D}
F {D} ∅ ∅ * F {D} ∅
ε-NFA
Since closure of
E includes B and
Since closures of C; which have
B and E include transitions on 1
final state D. to C and D.
62
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Summary
• DFA’s, NFA’s, and ε–NFA’s all accept exactly th
e same set of languages: the regular language
s.
• The NFA types are easier to design and may h
ave exponentially fewer states than a DFA.
• But only a DFA can be implemented!
63
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
• https://www.gatevidyalay.com/tag/dfa-exam
ples-with-solutions-pdf/
• https://web.stanford.edu/class/cs103a/ass
n/070S%20Solutions%20for%20Practice%
20with%20Automata.pdf
• https://www.cse.iitd.ac.in/~naveen/courses/C
OL352/slides/fa3.ppt
64
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
IMP Note to Self