LL(1) or Table Driven or non recursive Predictive Parsing:
Predictive parser is a recursive descent parser, which has the capability to predict which
production is to be used to replace the input string. The predictive parser does not suffer from
backtracking.
To accomplish its tasks, the predictive parser uses a look-ahead pointer, which points to the
next input symbols. To make the parser backtracking free, the predictive parser puts some
constraints on the grammar and accepts only a class of grammar known as LL(k) grammar.
Predictive parsing uses a stack and a parsing table to parse the input and generate a parse tree.
Both the stack and the input contains an end symbol $ to denote that the stack is empty and
the input is consumed. The parser refers to the parsing table to take any decision on the input
and stack element combination.
The goal of predictive parsing is to construct a top-down parser that never backtracks. To do
so, we must transform a grammar in two ways:
1. eliminate left recursion, and
2. Perform left factoring.
What is Left recursion?
The grammar is ambiguous if it has left recursion.The production is left-recursive if the
leftmost symbol on the right side is the same as the non terminal on the left side.
Consider an example:
E E+T|T
T T*F|T
F a|b|0|1|2|3|4|5|6|7|8|9
In the above grammar the production
E E+T
T T*F
is left recursive because the nonterminal on the left hand side of the production, E, is the first
symbol on the right hand side of the production.
Ie: if the production in the form
A A
A
Then it is left recursive.
How to eliminate left recursion?
If the production in the form ,
A A
A
We can rewrite the rules to eliminate the left recursion as follows:
A A'
A' A'
A'
The production
E E+T
becomes after eliminating left recursion
E T E'
E' + T E'
E'
Here's the entire expression grammar, which is free from left recursion.
E T E'
E' + T E'
E'
T F T'
T' * F T'
T'
F a | b | 0 | 1 | 2 | ... | 9
Left Factoring
Another property required of a grammar to be suitable for top-down parsing is that the
grammar is left-factored.
Grammar with common prefixes- If the RHS of more than one production starts with the
same symbol, then such a grammar is called as grammar with common prefixes
1 | αβ then it is left factored as:
2
A A`
A |β
1 2
For example, here is a grammar that is not left-factored:
A abc
A abd
Both productions share the common prefix a b on the right hand side of the production.
We can left-factor the grammar by making a new nonterminal to represent the alternatives for
the symbols following the common prefix:
A a b A'
A' c
A' d
Steps for designing Predictive Parser:
1. Make the grammar suitable for top-down parser. By performing the elimination of left
recursion. And by performing left factoring.
2. Find the FIRST and FOLLOW of the variables
3. Design predictive parser table
4. Write predictive parsing algorithm
Step 1:FIRST
FIRST:It is Starting symbol of the production
● FIRST(α) is the set of terminal symbols that begin the strings derivable from a string
of terminal and nonterminal symbols α in a grammar.
If α can derive ε, then ε is also in FIRST(α).
● Algorithm to compute FIRST(X):
o If X is a terminal, then FIRST(X) = { X }.
o If X is a production, then add to FIRST( X).
o If X Y Y ... Y is a production for k
1 2 k
1, and
i k, Y Y ... Y derives the empty string, and a is in FIRST(Y ),
1 2 i-1 i
then add a to FIRST(X).
If Y Y ... Y derives the empty string, then add ε to FIRST(X).
1 2 k
● Example. Consider the grammar G:
o S (S)S|
o
● For G, FIRST(S) = {(, ε}.
Step 2. FOLLOW
Follow:It is the terminal which could follow the Non terminal or Variable in the process of
production.
● FOLLOW(A) is the set of terminals that can appear immediately to the right of A in
some sentential form in a grammar.
Let us assume the string to be parsed is terminated by an end-of-string endmarker $.
Then if A can be the rightmost symbol in some sentential form, the right endmarker $
is also in FOLLOW(A).
● Algorithm to compute FOLLOW(A) for all nonterminals A of a grammar:
1. Place $ in FOLLOW(S) where S is the start symbol of the grammar.
2. If A Bβ is a production, then add every terminal symbol a in FIRST(β) to
FOLLOW(B).
3. If there is a production A B, or a production A Bβ, where FIRST(β)
contains ε,
then add every symbol in FOLLOW(A) to FOLLOW(B).
● Example. For G above, FOLLOW(S) = {), $}.
Step 3. Algorithm to Construct a Predictive Parsing Table
1. Input: Grammar G.
2. Output: Predictive parsing table M.
3. Method:
4.
for (each production A in G) {
for (each terminal a in FIRST(α))
add A to M[A, a];
if (ε is in FIRST(α))
for (each symbol b in FOLLOW(A))
add A to M[A, b];
}
make each undefined entry of M be error;
LL(1) Grammars
●
A grammar is LL(1) iff whenever A | are two distinct productions, the following three conditions ho
1. For no terminal a do both α and β derive strings beginning with a.
2. At most one of α and β can derive the empty string.
3. If β derives the empty string, then α does not derive any string beginning with
a terminal in FOLLOW(A).
Likewise, if α derives the empty string, then β does not derive any string
beginning with a terminal in FOLLOW(A).
● We can use the algorithm above to construct a predictive parsing table with uniquely
defined entries for any LL(1) grammar.
● The first "L" in LL(1) means scanning the input from left to right, the second "L" for
producing a leftmost derivation, and the "1" for using one symbol of lookahead to
make each parsing action decision.
Step 4:Algorithm for LL(1) Parsing
Input:
string ω
parsing table M for grammar G
Output:
If ω is in L(G) then left-most derivation of ω,
error otherwise.
Initial State : $S on stack (with S being start symbol)
ω$ in the input buffer
SET ip to point the first symbol of ω$.
repeat
let X be the top stack symbol and a the symbol pointed by ip.
if X∈ V t
or $
Questions
1.Find the FIRST and FOLLOW for the grammar
S ABCDE
Aa|
Bb|
Cc
Dd|
Ee|
production FIRST FOLLOW
S ABCDE {a,b,c} {$}
(S is starting
symbol of the
grammar.
First(S) is A,First(A) is {a,},add {a} to FIRST(S) set and substitute in the production SABC
Therefore add $)
Aa| {a,ε} {b,c}
(A followed by
BCDE. So add
first symbols of
B and C)
Bb| {b,ε} {c}
(B followed by
CDE. So add
first of C)
Cc {c} {d,e,$}
(C followed by
DE,So add first
of THE. Right of
E is nothing. So
add $}
Dd| {d,ε} {e,$}
Ee| {e,ε} {$}
2.Find the FIRST and FOLLOW for the grammar
S Bb|Cd
B aB|
CcC|
production FIRST FOLLOW
S Bb|Cd {a,b,c,d} {$}
FIRST(S)=FIRST(B)={a,ε}, add {a} to FIRST(S)
set and substitute ε in S Bb.
Then FIRST(S)={b}
FIRST(S)=FIRST(C}={c,ε}
Add {c} to FIRST(S) and substitute ε in S Cd
Then FIRST(S)={d}.
∴FIRST(S)={a,b,c,d}
B aB| {a,ε} {b}
{ the terminal which could
follow the variable B is b)
CcC| {c,ε} {d}{ the terminal which
could follow the variable C is
d}
3.Find the FIRST and FOLLOW for the grammar
S ACB|CbB|Ba
Ada|BC
Bg|
Ch|
Production FIRST FOLLOW
S {d,g,h,ε,b,,a} {$}
ACB|CbB|Ba (no any S on right side of production)
Ada|BC {d,g,h,ε} {h,g,$}
(C follows A, FIRST(C)={h,ε}.Add {h} to FOLLOW set
and substitute ε to S ACB
Bg| {g,ε}
Ch| {h,ε}
4.Find the FIRST and FOLLOW for the grammar
SaABb
Ac|
Bd|
productions FIRST FOLLOW
SaABb { a} {$}
Ac| { c,ε } {d,b}
Bd| { d, ε } {b}
5.Check whether grammar is LL(1) or not?
If parsing table has multiple defined entries then it is not in LL(1).
Example
SA|a
Aa
The above grammar is not in LL(1)
Because the parsing table has multiple entries for the production Sa and Aa
6.Construct Predictive parsing table for the grammar:
S +SS | *SS | a;
FIRST(S) = {+, *, a}
FOLLOW(S) = {+, *, a, $}
Consider the production S +SS compare with A where
A=S and α=+SS and FIRST(α)=FIRST(+SS)={+}
add A to M[A, a];
Update table with production S +SS in the entry M[A,a]=M[A,+]
Input Symbol
Non terminal a + * $
S error
S a S *SS
7. Predictive parsing table for the grammar.
S (S)S|
FIRST(S) = {(, ε}
FOLLOW(S) = {), $}
Consider the production :S
Compare S with A in G
Here A=S and α=ε and FIRST(α)={ε}
As FIRST(α) contains {ε} so each symbol b in FOLLOW(A))
add A to M[A, b]
That is Follow(S)={),$}
Add S to the entries M[A, b ]=M[S ,),$]
Input symbols
Non terminal ( ) $
S
S (S)S
8. Predictive parsing table for the grammar.
S S(S)|
FIRST(S) = {(, ε}
FOLLOW(S) = {(, ), $}
Input Symbol
( ) $
Nonterminal
S
S S(S)
The above grammar is not LL(1).Because parsing table has multiple entries
9.Construct predictive parsing table for the grammar or LL(1) parsing table
E E+T|T
T T*F|T
F (E)|id
Step 1:Grammar has left recursion, eliminate left recursion.
Step 2:Construct FIRST and Follow set
Step 3:Construct Parsing table
Step1 : After eliminating left recursion the grammar becomes
E T E'
E' + T E'
E'
T F T'
T' * F T'
T'
F (E)|id.
Step2: Construct FIRST and FOLLOW for given grammar
Production FIRST FOLLOW
E T E' { (,id { ) ,$ }
} ( symbol which follows Variable E is ) and E is start symbol so
add $ )
{ +,ε } { ), $ }
E' + T E'| (Symbol follows E' is ε, so FOLLOW of E' is FOLLOW of E)
T F T' { (,id { +, ), $ }
} Variable which Follows T is E', First(E') ={+} and
FOLLOW( E')={i),$ }
∴FOLLOW(T)={+,),$ }
{*,ε } { +, ), $ }
T' * F T'| (Symbol Follows T' is ε, FOLLOW(T')=FOLLOW(T)
F (E)|id { (,id { *, +, ), $}
}
Therefore FOLLOW(T')=FOLLOW(T)={+,),$ }
So FOLLOW(F)={+}union{ +, ) , $ }={ * , + , ), $}
Predictive parsing table for above grammar is:
input symbol
Non terminal $
id + * ( )
E E T E' E T E'
E' E' + T E' E' E'
T T F T' T F T'
T' T' T' * F T' T' T'
F F id F (E)
Consider the production E T E',FIRST(E)={ id, ) }. Fill the cell with production E T E' where the input sym
Consider the production T' * F T', FIRST(T')={*}, Fill the cell with production T' * F T' where the input sym
Consider the production E T E',FIRST(E)={),id},Fill the cell with the production E T E'
where input symbol is {) ,id}
Now consider E' here FIRST(E)={},therefore consider the FOLLOW(E) and fill the table.
FOLLOW(E)={),$},fill the cell with the production E' where the input symbol is {),$}
Similarly for T' , Follow(T')={+,),$}
Step4: predictive parsing using parser table.
The following example illustrates the top-down predictive parsing for the input string id + id
* id
stack input Action Matched
$ id+id*id$
$E id+id*id$ Place E on stack
$E'T id+id*id$
E on id in above parsing table gives E T E', replace E by E T E' so that top of the stac
$E'T'F id+id*id$
T on id gives T F T',replace T by the production T F T' so that top of the stack will be
$E'T'id id+id*id$ F on id gives F id id
$E'T' +id*id$ Top of the stack id is matched with input id.
So pop id from stack and advance input pointer
$E' +id*id$ T' on + gives T' id
$E'T+ +id*id$ E' on + gives E' + T E',replace E' by E'T+ id
$E'T id*id$ + is mathed, so pop + from stack and remove + from input id+
$E'T'F id*id$ T on id gives T F T', push T'F on to stack id+
$E'T'id id*id$ F on id gives F id,push id on to stack id+
$E'T' *id$ Top of the stack is matched with input id, remove id from id+id
input string and pop id from stack, advance input pointer
$E' *id$ T' on * gives T' * F T, push TF* on to stack i id+id
$E'TF *id$ * is matched, pop * from stak and remove from input id+id*
*
$E'TF id$ F on id gives F id, push id on to stack id+id*
$E'T'id id$ id is matched with top of stack, pop id from stack and remove id+id*id
from input
$E'T' $ T' on $ gives T' id+id*id
$E' $ E' on $ gives E' id+id*id
$ $ String is accepted
Note:
While constructing LL(1) parser we have to follow the following
1)If the given grammar has left recursion , we have to eliminate left recursion.
2)We have to find FIRST and FOLLOW
3)We have to construct parsing table
4)Parse the given input with the help of parsing table.
Bottom Up Parsers / Shift Reduce Parsers
Bottom up parsing can be defined as an attempt to reduce the input string w to the start
symbol of a grammar by tracing out the rightmost derivations of w in reverse.
Bottom up parser or shift reduce parser is classified as follows:-
1.Operator precedence parser-This is the only parser allows ambiguous grammar for parsing
2.LR parser
2.1 LR(0) parser
2.2 SLR(1) parser
2.3 LALR(1)
2.4 CLR(1)
Here LR(0) and SLR(1) parser use canonical collection of LR(0) items for parsing. And
LR(0) parser is least powerful. LALR(1) and CLR(1) parser uses canonical collection of
LR(1) items and CLR(1) parser is most powerful
Shift reduce parser:
Shift-reduce parsing attempts to construct a parse tree for an input string beginning at the
leaves and working up towards the root.
In other words, it is a process of “reducing” (opposite of deriving a symbol using a
production rule) a string w to the start symbol of a grammar. At every (reduction) step, a
particular substring matching the RHS of a production rule is replaced by the symbol on the
LHS of the production.
A general form of shift-reduce parsing is LR (scanning from Left to right and using Right-
most derivation in reverse) parsing, which is used in a number of automatic parser generators
like Yacc, Bison, etc.
Here are the definitions of some commonly used terminologies in this context
What is handle pruning?
Handle:
A “handle” of a string is a substring that matches the RHS of a production and whose
reduction to the non-terminal (on the LHS of the production) represents one step along the
reverse of a rightmost derivation toward reducing to the start symbol.
If S * Aw * w, then A in the position following is a handle of w.
In such a case, it is suffice to say that the substring β is a handle of αβw, if the position of β
and the corresponding production are clear.
Consider the following grammar:
E E + E | E * E | (E) | id
and a right-most derivation is as follows:
E E+E
E*E
id3
id2 * id3
id1 + id2 * id3
Note that the reduction is in the opposite direction from id1 + id2 * id3 back to E, where the
handle at every step is underlined.
And this process of reduction is called handle pruning.
Implementation of shift reduce parsing:
A convenient way to implement a shift-reduce parser is to use a stack to hold grammar
symbols and an input buffer to hold the string w to be parsed. The symbol $ is used to mark
the bottom of the stack and also the right-end of the input.
Basic Operations –
● Shift: This involves moving of symbols from input buffer onto the stack.
● Reduce: If the handle appears on top of the stack then, its reduction by using
appropriate production rule is done i.e. RHS of production rule is popped out of stack
and LHS of production rule is pushed onto the stack.
● Accept: If only start symbol is present in the stack and the input buffer is empty then,
the parsing action is called accept. When accept action is obtained, it is means
successful parsing is done.
● Error: This is the situation in which the parser can neither perform shift action nor
reduce action and not even accept action.
Example 1 – Consider the grammar
E E+E
EE*E
E id
Perform Shift Reduce parsing for input string “id + id + id”.
stack Input Parsing action
$ id+id*id$ Shift id on to the stack
$id +id*id$ Reduce id by id
$E +id*id$ Shift + on to stack
$E+ id*id$ Shift id
$E+id *id$ Reduce id by E id
$E+E *id$ Shift *
$E+E* id$ Shift id
$E+E*id $ Reduce id by E id
$E+E*E $ Reduce E E*E
$E+E $ Reduce E E+E
$E $ accept
Operator precedence parser:
An operator precedence parser is a bottom-up parser that interprets an operator-precedence
grammar.This is the only parser allows ambiguous grammar for parsing.
What is operator grammar?
A grammar which is used to define mathematical operation is called operator grammar,
With some restrictions that no production right side is empty or has two adjacent non
terminal.
Example1:
E –> E + E
E –> E * E
E –> id
Above grammar is operator precedence grammar, no two adjacent variable and no ε
production
Example 2:Consider the grammar below which in not operator grammar, because two
adjacent non terminal
E –> E AE
A –> +|*
But we can convert this grammar to operator grammar
E->EAE
E->E+E
E->E*E
Now the grammar is operator grammar
Example 3:Convert the grammar to operator grammar
S SAS|a
AbSb|b
Above grammar is not operator grammar.
S SAS|a
SS bSbS|SbS|a (∵ AbSb)
AbSb|b
Now the grammar is operator grammar, no two adjacent variable.
Constructing Operator precedence parser:
To design operator precedence parser we should follow following steps:
step1:Convert the given grammar to operator grammar if the grammar is not operator
grammar
Step2: Construct operator precedence table
Step3:parse the given string.
Algorithm to construct Operator precedence table:
In operator precedence parsing, firstly precedence relations are defined between every pair of
terminal symbols and then operator precedence table is constructed.
● If precedence of b is higher than precedence of a, then we define a < b
● If precedence of b is same as precedence of a, then we define a = b
● If precedence of b is lower than precedence of a, then we define a > b
The precedence relation between terminal symbols is determined by making use of the
traditional ideas of associativity and operators precedence.The following rules we have to
follow to decide precedence
● Any identifier id will always be given higher precedence than the precedence of any
other symbol.
● Symbol $ will always be given the lowest precedence.
● If there exists two operators having the same precedence, then in that case, we go by
checking the associativity rule for that operator.
Algorithm to parse input:
initialize : set ip(input pointer) to point to the first symbol of the input string w$
Let b be the top symbol of the stack , a the input symbol pointed by ip
Repeat
if(a is $ and b is $)
return
if(a .> b)
push a into stack
move input pointer
else if(a <. b)
c<-pop stack
until(c .> b)
else
error()
Exampl1:Construct operator precedence parser for the given grammar
E –> E + E
E –> E * E
E –> id
Then parse the following string id+id*id
Step1: The above grammar is in the form of operator grammar.
step2:Construct operator precedence table as follows
id + * $
i - .> .> .>
d (id with id never (id has highest (compare id with *, id ($ has least
compare,two variable precedence over +) has highest precedence
never come side by precedence among all among all
side ) input) input)
+ <. .> <. .>
(+ has less precedence ( while comparing + ( + has less
over id) and + ,left side + precedence over * )
given more
precedence)
* < > > >
( id has highest (left associative, left *
precedence) has more precedence)
$ < < < -
Step3: Parse the given string id+id*id
Compare stack symbol $ and input id , $<.id , push id onto stack.
Now compare id And input +, id>+, so pop id from stack.
Same procedure is used for other input
Precedence functions:
Operator precedence parsers use precedence functions that map terminal symbols to integers.
Algorithm for Constructing Precedence Functions
Create functions fa for each grammar terminal 'a' and for the end of string symbol.
Partition the symbols in groups so that fa and gb are in the same group if a =· b (there can be
symbols in the same group even if they are not connected by this relation).
Create a directed graph whose nodes are in the groups, next for each symbols a and b do:
place an edge from the group of gb to the group of fa if a <· b, otherwise if a ·> b place an
edge from the group of fa to that of gb.
If the constructed graph has a cycle then no precedence functions exist. When there are no
cycles collect the length of the longest paths from the groups of fa and gb respectively.
Consider the following table:
Function f Function g
id + * $
id - .> .> .>
+ <. .> <. .>
* < > > >
$ < < < -
Resulting graph:From the above table if the entry in the cell is > then draw outgoing edge.
That is
Id with + gives precedence .>. Means draw the edge from fid to g+.
If the entry in the table is <. Then draw the incoming edge. That is + with id, + has
precedence <, that means draw the edge from gid to f+
Converting relation table into function table:
From the above graph we can get the path from fid to g+ with length 4
Similarly we get path from gid to f$ path length 5
id + * $
f 4 2 4 0
g 5 1 3 0