Bottom-Up Parsing
(cont.)
CS2210
Lecture 6
CS2210 Compiler Design 2004/5
LR Grammars
■ = a grammar for which a LR parsing table can be
constructed
■ LR(0) and LR(1) typically of interest
■ What about LL(0)?
■ Parsing tables from LR grammars
■ SLR (simple LR) tables
■ Many grammars for which it is not possible
■ Canonical LR tables
■ Works on “most” grammars
■ LALR (= lookahead LR) tables
■ Often used in practice because significantly smaller than
canonical LR
CS2210 Compiler Design 2004/5
LR(0) items
■ A->XYZ
■ Items:
■ A->.XYZ
■ A->X.YZ
■ A->XY.Z
■ A->XYZ.
■ Represents how much of a production
we have seen in parsing
CS2210 Compiler Design 2004/5
1
Parse Table Construction
■ Build a DFA to recognize viable prefixes
■ = set of prefixes of right sentential forms that can appear on
the stack of a shift-reduce parser
■ Items are ~ states of an NFA for viable prefixes
■ Group items together (subset construction) to get DFA
■ Define a canonical collection of LR(0) items
■ Helpers:
■ Augmented grammar
■ S start symbol of G, S’->S new start symbol S’
■ Closure function
■ Goto function
CS2210 Compiler Design 2004/5
Closure
■ I set of items of G
■ Closure(I)
■ Initially I
■ Repeat
■ If A->α.Bβ in closure(I) and B->γ is a production add
B->.γ to closure
■ Until no new items get added
CS2210 Compiler Design 2004/5
Goto
■ Goto(I,X), I set of items, X grammar
symbol
■ Goto(I,X) :=
closure({A->αX.β| A->α.Xβ ∈ I})
■ For valid items I for viable prefix γ ,
then goto(I,X) = valid items for viable
prefix γX
CS2210 Compiler Design 2004/5
2
Sets of Items Construction
procedure items(G’);
begin
C := closure({S->.S});
repeat
for each set of items I in C and each
grammar symbol X such that goto(I,X) is not
empty and not in C do
add goto(I,X) to C
until no more changes to C
end
CS2210 Compiler Design 2004/5
Sets of Items Construction
Example
I0 =
E’ -> E I1 =
E -> E + T | T I2 =
I3 =
T -> T * F | F I4 =
I5 =
F -> (E) | id I6 =
I7 =
I8 =
I9 =
I10 =
I11 =
CS2210 Compiler Design 2004/5
SLR Table Construction
Input G’ augmented grammar
1. Construct C ={I 0 ,…,In}, collection of LR(0) items
2. State i is constructed from I i with parsing actions:
a) if A->α.aβ is in Ii and goto(I i,a) = Ij then action[i,a]
:= shift jγβ (a must be a terminal)
b) if A-> α. is in Iithen action[i,a] := reduce A-> α for
all a in Follow(A)
c) if S’->S is in Ii then set action[I,$] = accept
3. Goto transitions for state I are constructed for all
nonterminal A as follows: if goto(Ii,A) = Ij then
goto[i,A] := j
4. Undefined entries are set to error
5. The initial state of the parser is the on constructed
from the items containing S’->.S
CS2210 Compiler Design 2004/5
3
SLR Table Example
Same grammar…
CS2210 Compiler Design 2004/5
(1) E->E+T
Parsing Example (2) E->T
State id + * ( ) $ E T F
(3) T->T*F
0 (4) T->F
1 (5) F->(E)
2
(6) F->id
3
4
5
6
7
8
9
10
11
Action Goto
CS2210 Compiler Design 2004/5
Grammars that are not SLR
■ There are non-ambiguous grammars for
which SLR tables have multiple entries
■ Grammar 4.20 in Aho
■ Alternatives
■ Canonical LR
■ Most powerful
■ LALR
■ Smaller parsing tables
CS2210 Compiler Design 2004/5
4
Building LR Parsing Tables
■ Problem in SLR
■ If [A -> α.] is in Items set and
a ∈ Follow(A) then we reduce by A -> α but βα
on the stack is such that βA cannot be followed by
a.
■ Solution
■ Carry more information in state on stack
CS2210 Compiler Design 2004/5
SLR Problem Example
Grammar: Items:
I0= {S’-> .S, S->.L=R, S->.R, L->.*R,
S -> L = R L->.id, R->.L}
S -> R I1={S’->S.}
L -> *R I2={S->L.=R, R->L.}
I3={S->R.}
L -> id I4={L->*.R, R->.L, L->..*R, L->.id}
R -> L I5={L->id.}
Shift-reduce conflict I6->{S->L=.R, R->.L, L->.*R, L->.id}
I7={L->*R.}
Viable prefix L=R should I8={R->L.}
not reduce to R=R I9={S->L=R.}
CS2210 Compiler Design 2004/5
SLR Problem Example (2)
Parse id =id Items:
Stack Input I0= {S’-> .S, S->.L=R, S->.R, L->.*R,
s0 id=id L->.id, R->.L}
s0ids5 =id I1={S’->S.}
s0Ls2 =id I2={S->L.=R, R->L.}
s0Rs3 =id I3={S->R.}
s0Ss1 =id I4={L->*.R, R->.L, L->..*R, L->.id}
Error (expect to see a $) I5={L->id.}
I6->{S->L=.R, R->.L, L->.*R, L->.id}
I7={L->*R.}
I8={R->L.}
= ∈Follow(L) I9={S->L=R.}
CS2210 Compiler Design 2004/5
5
LR(1) Items
■ New item form: [A -> α. β, a] a
terminal (including $)
■ a represents (1 token) lookahead
■ Used only if β=ε, i.e., we use lookahead to
decide whether to reduce (or shift/error)
■ a ∈ Follow(A)
CS2210 Compiler Design 2004/5
LR(1) Items Construction
function closure(I); function goto(I,X);
begin begin
repeat let J be the set
for each item [A->α.Bβ, a] in I, of items
each production B->γ in G’, and [A->a.Xb,a] such
each terminal b in First(βa) that [A->a.Xb,a]
add [B->.γ, b] to I (if not present)
is in I;
until no more items can be added to I
end;
end;
procedure items(G’);
begin
C := {closure({S’->.S,$]})};
repeat
for each set of items I in C and
each grammar symbol X such that
goto(I,X) is not empty do
add goto(I,X) to C (if not there)
until no more items can be added to C
end
CS2210 Compiler Design 2004/5
Example
■ Consider same grammar again:
S -> L = R
S -> R
L -> *R
L -> id
R -> L
CS2210 Compiler Design 2004/5
6
LALR Parsing
■ Advantages
■ Smaller parsing tables than canonical LR
■ 100s versus 1,000s for typical language
■ Most PL constructs can be parsed with LALR
■ Algorithm of choice in practice
■ Construction idea
■ Merge different items with same core, e.g.
L->id., $ and L->id.,)
■ May reduce where an error should be reported
■ But reduction will lead to a parse error later
■ Merging does not produce new shift-reduce conflicts
■ Possible reduce-reduce conflicts; merge only if no conflicts
CS2210 Compiler Design 2004/5
LALR Table Construction
■ Simple but space / time consuming
■ Build LR tables
■ Merge item sets with identical cores
■ If conflict results: grammar not LALR
CS2210 Compiler Design 2004/5
Efficient LALR Construction
■ Ideas:
■ Represent only kernels for items set I, i.e.,
either S’->.S or item with “.” not leftmost
■ Can generate all items in I from kernel
■ Can generate parsing actions from kernel
■ Details in the book
CS2210 Compiler Design 2004/5
7
Parsing with more Lookahead
■ LR(k)
■ Closure: A->α.Bβ, a predict B->.γ, y where
y∈Firstk(βx)
■ Reduce: A-> α.,x reduce on lookahead x
■ Shift on lookahead x for A-> α.aβ, y, where a is a
terminal and x ∈ Firstk(aβy)
■ SLR(k)
■ Reduce: A-> α. if lookahead x ∈ Followk(A)
■ Shift on lookahead x for A-> α.aβ, a terminal and
x ∈ Firstk(aβFollow k(A))
■ LALR(k): merge LR(k) machine states with
same core CS2210 Compiler Design 2004/5
LR(k) in Practice
■ LR(k), SLR(k) are not used in practice
■ Tables too large
■ Not necessary in practice since most
grammars can be made LR(1) and even
LALR(1)
■ Some parser generators for LALR(2)
■ Useful if too lazy too rewrite the grammar
CS2210 Compiler Design 2004/5
Language Class Hierarchy
CS2210 Compiler Design 2004/5
8
Parsing with Ambiguous
Grammars
■ Use non-grammatical means to resolve
conflict
■ Want to have just ONE parse tree!
■ Useful because grammar can be
■ “more natural”
■ Smaller
■ Conflict resolution:
■ Precedence
■ Associativity
■ Shift versus reduce
CS2210 Compiler Design 2004/5
Precedence Example
E->E+E | E*E | (E) | I0={E’->.E, E->.E+E, E->.E*E, E->.(E),
E->.id}
id I1={E’->E., E->E.+E, E->E.*E}
I2={E->(.E), E->.E+E, E->.E*E, E->.(E),
versus E->.id}
I3={E->id.}
E-> E+T | T I4={E->E+.E, E->.E+E, E->.E*E, E->.(E),
E->id}
T -> T*F | F I5={E->E*.E, E->.E+E, E->.E*E, E->.(E),
E->.id}
F -> (E) | id I6={E->(E.), E->E.+E, E->E.*E}
I7={E->E+E., E->E.+E, E->E.*E}
Parse for id+id*id I8={E->E*E., E->E.+E, E->E.*E}
I9={E->(E).}
CS2210 Compiler Design 2004/5
Parser Generators
■ Yacc, bison, LLGen, LRGen etc
■ Specify grammar
■ Produce a table-drive parser
■ Typically can use precedence & associativity rules
and allow shift-reduce conflicts
■ Project uses Cup a parser generator for Java
CS2210 Compiler Design 2004/5