CSC 4181
Compiler Construction
Parsing
1
Outline
Top-down v.s. Bottom-up
Top-down parsing Bottom-up parsing
Recursive-descent Shift-reduce parsers
parsing
LR(0) parsing
LL(1) parsing
LR(0) items
LL(1) parsing
algorithm Finite automata of items
First and follow sets LR(0) parsing algorithm
Constructing LL(1) LR(0) grammar
parsing table SLR(1) parsing
Error recovery
SLR(1) parsing algorithm
SLR(1) grammar
Parsing conflict
Parsing 2
Introduction
Parsing is a process that constructs a
syntactic structure (i.e. parse tree) from the
stream of tokens.
We already learned how to describe the
syntactic structure of a language using
(context-free) grammar.
So, a parser only needs to do this?
Stream of tokens
Parser Parse tree
Context-free grammar
Parsing 3
Top–Down Parsing Bottom–Up Parsing
A parse tree is created A parse tree is created
from root to leaves from leaves to root
The traversal of parse trees The traversal of parse trees
is a preorder traversal is a reversal of postorder
Tracing leftmost derivation traversal
Two types: Tracing rightmost
Backtracking parser derivation
Predictive parser More powerful than top-
Try different
Backtracking:
down parsing
structures and backtrack if it
does not matched the input Guess the
Predictive:
structure of the parse
tree from the next input
Parsing 4
Parse Trees and Derivations
E
E E+E
E E
id + E
+
E E id + E * E
id *
id id id + id * E
Top-down parsing id + id * id
E
EE+E
E + E E + E * E
E E E + E * id
id *
E + id * id
id id
Bottom-up parsing id + id * id
Parsing 5
Top-down Parsing
What does a parser need to decide?
Which production rule is to be used at each point
of time ?
How to guess?
What is the guess based on?
What is the next token?
Reserved word if, open parentheses, etc.
What is the structure to be built?
If statement, expression, etc.
Parsing 6
Top-down Parsing
Why is it difficult?
Cannot decide until later
Next token: if Structure to be built: St
St MatchedSt | UnmatchedSt
UnmatchedSt
if (E) St| if (E) MatchedSt else UnmatchedSt
MatchedSt if (E) MatchedSt else MatchedSt |...
Production with empty string
Next token: id Structure to be built: par
par parList |
parList exp , parList | exp
Parsing 7
Recursive-Descent
Write one procedure for each set of
productions with the same nonterminal in
the LHS
Each procedure recognizes a structure
described by a nonterminal.
A procedure calls other procedures if it needs
to recognize other structures.
A procedure calls match procedure if it needs
to recognize a terminal.
Parsing 8
Recursive-Descent: Example
EEOF|F E ::= F {O F} For this grammar:
O+|- O ::= + | - We cannot decide which
F ( E ) | id F ::= ( E ) | id rule to use for E, and
If we choose E E O F,
procedure F procedure E it leads to infinitely
{ switch token { E; O; F; } recursive loops.
{ case (: match(‘(‘); Rewrite the grammar
E; into EBNF
match(‘)’);
case id: match(id); procedure E
default: error; { F;
} while (token=+ or token=-)
} { O; F; }
}
Parsing 9
Match procedure
procedure match(expTok)
{ if (token==expTok)
then getToken
else error
}
The token is not consumed until getToken
is executed.
Parsing 10
Problems in Recursive-Descent
Difficult to convert grammars into EBNF
Cannot decide which production to use at
each point
Cannot decide when to use -production
A
Parsing 11
LL(1) Parsing
LL(1)
Read input from (L) left to right
Simulate (L) leftmost derivation
1 lookahead symbol
Use stack to simulate leftmost derivation
Part of sentential form produced in the leftmost
derivation is stored in the stack.
Top of stack is the leftmost nonterminal symbol
in the fragment of sentential form.
Parsing 12
Concept of LL(1) Parsing
Simulate leftmost derivation of the input.
Keep part of sentential form in the stack.
If the symbol on the top of stack is a
terminal, try to match it with the next input
token and pop it out of stack.
If the symbol on the top of stack is a
nonterminal X, replace it with Y if we have a
production rule X Y.
Which production will be chosen, if there are
both X Y and X Z ?
Parsing 13
Example of LL(1) Parsing
E TX
FNX
n
F
(E)NX
(TX)NX T
N
( ( n + ( n ) ) * n $
(FNX)NX
(nNX)NX
(nX)NX X
E
(nATX)NX
(n+TX)NX
(n+FNX)NX
A
n
F
+
)
(n+(E)NX)NX E T X
(n+(TX)NX)NX (
T
N X A T X |
(n+(FNX)NX)NX
(n+(nNX)NX)NX
(n+(nX)NX)NX Finished
E
X
M
* A + | -
(n+(n)NX)NX T F N
(n+(n)X)NX
(n+(n))NX
F
)
n N M F N |
(n+(n))MFNX M *
(n+(n))*FNX T
N
(n+(n))*nNX F ( E ) | n
(n+(n))*nX
(n+(n))*n
E
X
Parsing $ 14
LL(1) Parsing Algorithm
Push the start symbol into the stack
WHILE stack is not empty ($ is not on top of stack) and the
stream of tokens is not empty (the next input token is not $)
SWITCH (Top of stack, next token)
CASE (terminal a, a):
Pop stack; Get next token
CASE (nonterminal A, terminal a):
IF the parsing table entry M[A, a] is not empty THEN
Get A X1 X2 ... Xn from the parsing table entry M[A,
a] Pop stack;
Push Xn ... X2 X1 into stack in that order
ELSE Error
CASE ($,$): Accept
OTHER: Error
Parsing 15
LL(1) Parsing Table
If the nonterminal N is on
the top of stack and the
next token is t, which
production rule to use?
Choose a rule N X t N
X
such that N
Y
X t
X * tY or Q Y
X * and S * WNtY
t … … …
Parsing 16
First Set
Let X be or be in V or T.
First(X ) is the set of the first terminal in any
sentential form derived from X.
If X is a terminal or , then First(X ) ={X }.
If X is a nonterminal and X X1 X2 ... Xn is a
rule, then
First(X1) -{} is a subset of First(X)
First(Xi )-{} is a subset of First(X) if for all j<i
First(Xj) contains {}
is in First(X) if for all j≤n First(Xj)contains
Parsing 17
Examples of First Set
exp exp addop term | st ifst | other
term ifst if ( exp ) st elsepart
addop + | - elsepart else st |
term term mulop factor | exp 0|1
factor
mulop * First(exp) = {0,1}
factor (exp) | num First(elsepart) = {else, }
First(ifst) = {if}
First(addop) = {+, -}
First(st) = {if, other}
First(mulop) = {*}
First(factor) = {(, num}
First(term) = {(, num}
First(exp) = {(, num}
Parsing 18
Algorithm for finding First(A)
For all terminals a, First(a) = {a} If A is a terminal or ,
then First(A) = {A}.
For all nonterminals A, First(A) := {} If A is a nonterminal,
While there are changes to any First(A) then for each rule A
X1 X2 ... Xn, First(A)
For each rule A X1 X2 ... Xn contains First(X1) - {}.
For each Xi in {X1, X2, …, Xn } If also for some i<n,
First(X1), First(X2), ...,
If for all j<i First(Xj) contains and First(Xi) contain ,
, then First(A) contains
First(Xi+1)-{}.
Then If First(X1), First(X2), ...,
add First(Xi)-{} to First(A) and First(Xn) contain ,
then First(A) also
If is in First(X1), First(X2), ..., contains .
and First(Xn)
ParsingThen add to First(A) 19
Finding First Set: An Example
exp term exp’ First
exp’ addop term exp’ | exp
addop + | - exp’
term factor term’ addop + -
term’ mulop factor term’ | term ( num
mulop * term’
factor ( exp ) | num mulop *
factor ( num
Parsing 20
Follow Set
Let $ denote the end of input tokens
If A is the start symbol, then $ is in
Follow(A).
If there is a rule B X A Y, then First(Y) -
{} is in Follow(A).
If there is production B X A Y and is in
First(Y), then Follow(A) contains Follow(B).
Parsing 21
Algorithm for Finding Follow(A)
Follow(S) = {$} If A is the start
FOR each A in V-{S} symbol, then $ is
Follow(A)={} in Follow(A).
WHILE change is made to some Follow sets If there is a rule A
Y X Z, then
FOR each production A X1 X2 ... Xn, First(Z) - {} is in
FOR each nonterminal Xi Follow(X).
Add First(Xi+1 Xi+2...Xn)-{} If there is production
into Follow(Xi). B X A Y and
(NOTE: If i=n, Xi+1 Xi+2...Xn= ) is in First(Y), then
IF is in First(Xi+1 Xi+2...Xn) THEN Follow(A) contains
Add Follow(A) to Follow(Xi) Follow(B).
Parsing 22
Finding Follow Set: An Example
exp term exp’
First Follow
exp’ addop term exp’ |
exp ( num $)
addop + | -
term factor term’ exp’ + - $ )
term’ mulop factor term’ | addop + -
mulop * term ( num + - $ )
factor ( exp ) | num term’ *
mulop *
factor ( num
Parsing 23
Constructing LL(1) Parsing Tables
FOR each nonterminal A and a production A X
FOR each token a in First(X)
A X is in M(A, a)
IF is in First(X) THEN
FOR each element a in Follow(A)
Add A X to M(A, a)
Parsing 24
Example: Constructing LL(1) Parsing Table
First Follow
exp {(, num} {$,)} ( ) + - * n $
exp’ {+,-, } {$,)}
addop {+,-} {(,num} exp
1 1
term {(,num} {+,-,),$}
term’ {*, } {+,-,),$} exp’ 3 2 2 3
mulop {*} {(,num}
factor {(, num} {*,+,-,),$} addop 4 5
1 exp term exp’
2 exp’ addop term exp’ term 6 6
3 exp’
4 addop +
5 addop - term’ 8 8 8 7 8
6 term factor term’
7 term’ mulop factor term’ mulop
8 term’ 9
9 mulop *
10 factor ( exp ) factor
11 factor num 10 11
Parsing 25
LL(1) Grammar
A grammar is an LL(1) grammar if its LL(1)
parsing table has at most one production in
each table entry.
Parsing 26
LL(1) Parsing Table for non-LL(1) Grammar
1 exp exp addop term
2 exp term
3 term term mulop factor
4 term factor
5 factor ( exp ) ( ) + - * num $
6 factor num exp 1,2 1,2
7 addop +
8 addop - term 3,4 3,4
9 mulop * factor 5 6
addop 7 8
First(exp) = { (, num } mulop 9
First(term) = { (, num }
First(factor) = { (, num }
First(addop) = { +, - }
First(mulop) = { * }
Parsing 27
Causes of Non-LL(1) Grammar
What causes grammar being non-LL(1)?
Left-recursion
Left factor
Parsing 28
Left Recursion
Immediate left Can be removed very
recursion easily
A A X | Y A=Y X* A Y A’, A’ X A’|
A A X1 | A X2 |…| A Xn A Y1 A’ | Y2 A’ |...| Ym A’,
| Y1 | Y2 |... | Ym A’ X1 A’| X2 A’|…| Xn A’|
A={Y1, Y2,…, Ym} {X1, X2, …, Xn}*
General left recursion Can be removed when
A => X =>* A Y there is no empty-string
production and no cycle
in the grammar
Parsing 29
Removal of Immediate Left Recursion
exp exp + term | exp - term | term
term term * factor | factor
factor ( exp ) | num
Remove left recursion
exp term exp’ exp = term ( term)*
exp’ + term exp’ | - term exp’ |
term factor term’ term = factor (* factor)*
term’ * factor term’ |
factor ( exp ) | num
Parsing 30
General Left Recursion
Bad News!
Can only be removed when there is no empty-
string production and no cycle in the grammar.
Good News!!!!
Never seen in grammars of any programming
languages
Parsing 31
Left Factoring
Left factor causes non-LL(1)
Given A X Y | X Z. Both A X Y and A X Z
can be chosen when A is on top of stack and a
token in First(X) is the next token.
AXY|XZ
can be left-factored as
A X A’ and A’ Y | Z
Parsing 32
Example of Left Factor
ifSt if ( exp ) st else st | if ( exp ) st
can be left-factored as
ifSt if ( exp ) st elsePart
elsePart else st |
seq st ; seq | st
can be left-factored as
seq st seq’
seq’ ; seq |
Parsing 33
Bottom-up Parsing
Use explicit stack to perform a parse
Simulate rightmost derivation (R) from left
(L) to right, thus called LR parsing
More powerful than top-down parsing
Left recursion does not cause problem
Two actions
Shift: take next input token into the stack
Reduce: replace a string B on top of stack by a
nonterminal A, given a production A B
Parsing 34
Example of Shift-reduce Parsing
Grammar
S’ S
S (S)S | Reverse of
Parsing actions rightmost derivation
Stack Input Action from left to right
$ (())$ shift 1 (())
$( ())$ shift 2 (())
$(( ))$ reduce S 3 (())
$((S ))$ shift 4 ((S))
$((S) )$ reduce S 5 ((S))
$((S)S )$ reduce S (S)S 6 ((S)S)
$(S )$ shift 7 (S)
$(S) $ reduce S 8 (S)
$(S)S $ reduce S (S)S 9 (S)S
$S $ accept 10 S’ S
Parsing 35
Example of Shift-reduce Parsing
Grammar
S’ S
S (S)S |
Parsing actions
Stack Input Action
$ (())$ shift 1 (()) handle
$( ())$ shift 2 (())
$(( ))$ reduce S 3 (())
$((S ))$ shift 4 ((S))
$((S) )$ reduce S 5 ((S))
$((S)S )$ reduce S (S)S 6 ((S)S)
$(S )$ shift 7 (S)
$(S) $ reduce S 8 (S)
$(S)S $ reduce S (S)S 9 (S)S
$S $ accept 10 S’ S
Viable prefix
Parsing 36
Terminologies
Right sentential form Right sentential form
sentential form in a rightmost (S)S
derivation ((S)S)
Viable prefix Viable prefix
( S ) S, ( S ), ( S, (
sequence of symbols on the
( ( S ) S, ( ( S ), ( ( S , ( (, (
parsing stack
Handle
Handle ( S ) S. with S
right sentential form + ( S ) S . with S
position where reduction can ( ( S ) S . ) with S ( S ) S
be performed + production
used for reduction LR(0) item
S ( S ) S.
LR(0) item S (S).S
production with distinguished S (S.)S
position in its RHS S (.S)S
S .(S)S
Parsing 37
Shift-reduce parsers
There are two possible actions:
shift and reduce
Parsing is completed when
the input stream is empty and
the stack contains only the start symbol
The grammar must be augmented
a new start symbol S’ is added
a production S’ S is added
To make sure that parsing is finished when S’ is on
top of stack because S’ never appears on the RHS of
any production.
Parsing 38
LR(0) parsing
Keep track of what is left to be done in the
parsing process by using finite automata of
items
An item A w . B y means:
A w B y might be used for the reduction in the
future,
at the time, we know we already construct w in the
parsing process,
if B is constructed next, we get the new item
AwB.Y
Parsing 39
LR(0) items
LR(0) item
production with a distinguished position in the RHS
Initial Item
Item with the distinguished position on the leftmost of
the production
Complete Item
Item with the distinguished position on the rightmost of
the production
Closure Item of x
Item x together with items which can be reached from x
via -transition
Kernel Item
Original item, not including closure items
Parsing 40
Finite automata of items
Grammar: S
S’ .S S’ S.
S’ S
S (S)S
S S .(S)S S.
Items:
S’ .S (
S’ S.
S .(S)S S
S (.S)S S (S.)S
S (.S)S
S (S.)S )
S (S).S
S
S (S)S. S (S).S S (S)S.
S.
Parsing 41
DFA of LR(0) Items
S S’ .S S S’ S.
S’ .S S’ S. S .(S)S
S.
S (S.)S
S .(S)S
S. ( S
)
( S (.S)S
S S .(S)S
S (.S)S S (S.)S S.
) ( ( S (S).S
S .(S)S
S.
S (S).S
S S
S (S)S.
S (S)S.
Parsing 42
LR(0) parsing algorithm
Item in state token Action
A-> x.By where B is terminal B shift B and push state s
containing A -> xB.y
A-> x.By where B is terminal not B error
A -> x. - reduce with A -> x (i.e. pop x,
backup to the state s on top of
stack) and push A with new
state d(s,A)
S’ -> S. none accept
S’ -> S. any error
Parsing 43
LR(0) Parsing Table
A’ .A A A’ A. 1
A .(A) State Action Rule ( a ) A
A .a 0 a
A a. 2
0 shift 3 2 1
( a 1 reduce A’ -> A
A (.A)
A (A.) 4
2 reduce A -> a
A .(A)
A .a 3 A 3 shift 3 2 4
)
( 4 shift 5
A (A). 5 5 reduce A -> (A)
Parsing 44
Example of LR(0) Parsing
State Action Rule ( a ) A
0 shift 3 2 1
1 reduce A’ -> A
2 reduce A -> a
3 shift 3 2 4
4 shift 5
5 reduce A -> (A)
Stack Input Action
$0 ((a))$ shift
$0(3 (a))$ shift
$0(3(3 a))$ shift
$0(3(3a2 ))$ reduce
$0(3(3A4 ))$ shift
$0(3(3A4)5 )$ reduce
$0(3A4 )$ shift
$0(3A4)5 $ reduce
$0A1 $ accept
Parsing 45
Non-LR(0)Grammar
Conflict
S’ .S S S’ S. 1
Shift-reduce conflict S .(S)S
A state contains a S .
0 S (S.)S 3
complete item A x. and (
a shift item A x.By S
Reduce-reduce conflict S (.S)S )
S .(S)S
A state contains more S .
2
than one complete items.
( ( S (S).S
S .(S)S
A grammar is a LR(0) S .
4
grammar if there is no
S
conflict in the
grammar. S (S)S. 5
Parsing 46
SLR(1) parsing
Simple LR with 1 lookahead symbol
Examine the next token before deciding to
shift or reduce
If the next token is the token expected in an
item, then it can be shifted into the stack.
If a complete item A x. is constructed and the
next token is in Follow(A), then reduction can be
done using A x.
Otherwise, error occurs.
Can avoid conflict
Parsing 47
SLR(1) parsing algorithm
Item in state token Action
A-> x.By (B is terminal) B shift B and push state s containing
A -> xB.y
A-> x.By (B is terminal) not B error
A -> x. in reduce with A -> x (i.e. pop x,
Follow(A) backup to the state s on top of
stack) and push A with new state
d(s,A)
A -> x. not in error
Follow(A)
S’ -> S. none accept
S’ -> S. any error
Parsing 48
SLR(1) grammar
Conflict
Shift-reduce conflict
A state contains a shift item A x.Wy such that W is
a terminal and a complete item B z. such that W
is in Follow(B).
Reduce-reduce conflict
A state contains more than one complete item with
some common Follow set.
A grammar is an SLR(1) grammar if there is
no conflict in the grammar.
Parsing 49
SLR(1) Parsing Table
A (A) | a
State ( a ) $ A
0 S3 S2 1
A’ .A A A’ A. 1 1 AC
A .(A)
A .a 0 a 2 R2
A a. 2 3 S3 S2 4
( a
A (.A)
4 S5
A .(A)
A .a 3 A A (A.) 4
5 R1
)
( A (A). 5
Parsing 50
SLR(1) Grammar not LR(0)
S’ .S S
S .(S)S S’ S. 1 S (S)S |
S. 0
S (S.)S 3
( S
S (.S)S ) State ( ) $ S
S .(S)S
S. 2 0 S2 R2 R2 1
S (S).S 1 AC
( ( S .(S)S 2 S2 R2 R2 3
S. 4 3 S4
S 4 S2 R2 R2 5
5 R1 R1
S (S)S. 5
Parsing 51
Disambiguating Rules for Parsing Conflict
Shift-reduce conflict
Prefer shift over reduce
In case of nested if statements, preferring shift over
reduce implies most closely nested rule for dangling
else
Reduce-reduce conflict
Error in design
Parsing 52
Dangling Else
S’ .S S S’ S. 1
0 I if S else .S 6
S .I I I S .I
S .other S I. 2 S .other
I .if S I .if S S
I .if S else I I .if S else S
S if
else
I .if S else S 7
other other
if state if else other $ S I
other
0 S4 S3 1 2
S .other 3 I if .S 4 1 ACC
other
I if .S else S
I if S. 5 S .I 2 R1 R1
I if S. else S S S .other 3 R2 R2
I .if S
4 S4 S3 5 2
I .if S else S
5 S6 R3
if
6 S4 S3 7 2
7 R4 R4
Parsing 53