Chapter 3 - Syntax Analysis
Compiler Design
Instructor: Fikru T. (MSc.)
Email: fikrutafesse08@gmail.com

• The tokens generated by the lexical analyzer are accepted by the next phase of the compiler, i.e. the syntax analyzer.
• The syntax analyzer or parser comes after lexical analysis.
• This is one of the important components of the front end of the compiler.
• Syntax analysis is the second phase in compilation.
• The syntax analyzer (parser) basically checks the syntax of the language.
• The syntax analyzer takes the tokens from the lexical analyzer and groups them in such a way that some programming structure (syntax) can be recognized.
• After grouping the tokens, if any syntax cannot be recognized then a syntactic error will be generated.
• This overall process is called syntax checking of the language.
• For example: a = b + 10;
• In turn, the lexical analyzer supplies tokens to the syntax analyzer (parser).
• The right-hand side of a production is a combination of non-terminal and terminal symbols.
• The NULL derivation can be specified as NT → ε.
• One of the non-terminals should be the start symbol, and conventionally we should begin the derivation from it.
[Parse tree for the declaration int id, id; with nodes Type, List and Terminator]
• Hence, int id, id; can be defined by means of the above context free grammar.
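One plausible reading of the parse tree sketched above is the following declaration grammar (an assumed reconstruction, given here only for illustration):

Decl → Type List Terminator
Type → int
List → List , id | id
Terminator → ;

With these rules the declaration is derived as Decl ⇒ Type List Terminator ⇒ int List Terminator ⇒ int List , id Terminator ⇒ int id , id Terminator ⇒ int id , id ;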
• Derivation from S means generation of the string W from S.
• For constructing a derivation two things are important:
i. Choice of non-terminal from several others.
ii. Choice of rule from the production rules for the corresponding non-terminal.
• Instead of choosing an arbitrary non-terminal, one can choose:
i. Either the leftmost non-terminal in a sentential form; then it is called a leftmost derivation.
ii. Or the rightmost non-terminal in a sentential form; then it is called a rightmost derivation.
Example 1: Derive the input string abb by leftmost derivation and rightmost derivation using the CFG given below:
S → AB | ε
[further productions of the example grammar not recovered]
Definition of Derivation Tree
• Let G = (V, T, P, S) be a Context Free Grammar.
• Solution:
[Parse tree for the leftmost derivation of abb]
[Parse tree for the rightmost derivation of abb]
• Also obtain the leftmost and rightmost derivations for the string 'aaabaab' using the above grammar.
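To make the two derivation orders concrete, here is a small worked illustration with a toy grammar chosen only for this purpose (it is not the example grammar above): S → S + S | a, deriving the string a + a.

Leftmost derivation: S ⇒ S + S ⇒ a + S ⇒ a + a
Rightmost derivation: S ⇒ S + S ⇒ S + a ⇒ a + a

In the leftmost derivation the leftmost non-terminal is replaced at every step, while in the rightmost derivation the rightmost non-terminal is replaced.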
• There are two different parse trees for deriving the string aab.
• Hence, the above given grammar is an ambiguous grammar.
• The parser applies the grammar rules for derivation, and finally a parse tree is constructed.
Cont'd
• When the parse tree is constructed from the root and expanded down to the leaves, then such a parser is called a top-down parser.
• The name itself tells us that the parse tree is built from top to bottom.
• When the parse tree is constructed from the leaves up to the root, then such a parser is called a bottom-up parser.
• Thus, the parse tree is built in a bottom-up manner.
• The next figure shows the types of parsers.
Problems with Top-Down Parsing
• There are certain problems in top-down parsing.
• In order to implement the parsing we need to eliminate these problems.
1. Backtracking:
• In backtracking we need to move some levels upward in order to check other possibilities.
• This increases a lot of overhead in the implementation of parsing.
• And hence it becomes necessary to eliminate the backtracking by modifying the grammar.
…Cont’d
2. Left Recursion:
• A left recursive grammar is a grammar in which some non-terminal A derives a string that begins with A itself:
A ⇒+ Aα
• Here ⇒+ means deriving the input in one or more steps.
• A is a non-terminal and α denotes some input string.
• If left recursion is present in the grammar then it creates a serious problem.
• Because of the left recursion the top-down parser can enter an infinite loop.
• This is shown in Fig. 3.9 below.
Fig. 3.9 Left Recursion: [each expansion of A produces A again as the leftmost leaf, growing the subtree A, Aα, Aαα, …]
• The expansion of A causes further expansion of A and, due to the generation of A, Aα, Aαα, Aααα, …, the input pointer will not be advanced.
• This causes a major problem in top-down parsing and therefore elimination of left recursion is a must.
• To eliminate left recursion we need to modify the grammar.
• Let G be a context free grammar having a production rule with left recursion:
…Cont’d
A → Aα ------- (1)
A → β
• Then eliminate the left recursion by re-writing the production rules as:
A → βA' ------- (2)
A' → αA'
A' → ε
• Thus a new symbol A' is introduced.
• We can also verify whether the modified grammar is equivalent to the original or not: both generate the strings β, βα, βαα, βααα, …
[Parse trees of the original grammar and of the modified grammar for the string βααα]
• For example: consider the grammar E → E+T | T.
• Comparing with equation (1): A = E, α = +T, β = T.
• Then using equation (2) above, the rule becomes:
E → TE'
E' → +TE' | ε
• Similarly, for the rule T → T*F | F we can eliminate the left recursion:
T → FT'
T' → *FT' | ε
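As an illustration of this transformation, the following is a minimal Python sketch (the function name and the list-of-symbols representation are assumptions made here, not part of the original notes) that eliminates immediate left recursion of the form A → Aα | β:

def eliminate_immediate_left_recursion(nonterminal, alternatives):
    """Split A -> A alpha | beta into A -> beta A' and A' -> alpha A' | epsilon.

    Each alternative is a list of grammar symbols; [] represents epsilon.
    Returns a dict mapping non-terminals to their new lists of alternatives.
    """
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == nonterminal]      # the alpha parts
    non_recursive = [alt for alt in alternatives if not alt or alt[0] != nonterminal]   # the beta parts
    if not recursive:
        return {nonterminal: alternatives}   # no immediate left recursion: nothing to do
    new_nt = nonterminal + "'"
    return {
        nonterminal: [beta + [new_nt] for beta in non_recursive],     # A  -> beta A'
        new_nt: [alpha + [new_nt] for alpha in recursive] + [[]],     # A' -> alpha A' | epsilon
    }

# Example: E -> E + T | T  becomes  E -> T E',  E' -> + T E' | epsilon
print(eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]]))
# {'E': [['T', "E'"]], "E'": [['+', 'T', "E'"], []]}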
…Cont’d
3. Left Factoring
• If the grammar is left factored then it becomes suitable for use.
• Basically, left factoring is used when it is not clear which of two alternatives should be used to expand a non-terminal.
• By left factoring we may be able to re-write the productions so that the decision can be deferred until enough of the input is seen to make the right choice.
• In general, if A → αβ1 | αβ2 is a production, then it is not possible for us to decide whether to choose the first rule or the second. The left factored productions are:
A → αA'
A' → β1 | β2
• For example: consider the following productions:
S → iEtS | iEtSeS | a
E → b
• The left factored grammar becomes (a small code sketch of this transformation is given after Fig. 3.10 below):
S → iEtSS' | a
S' → eS | ε
E → b

4. Ambiguity
• An ambiguous grammar is not desirable in top-down parsing.
• Hence, we need to remove the ambiguity from the grammar if it is present.
• Example: E → E+E | E*E | id is an ambiguous grammar.
• We will design the parse trees for id+id*id as follows:
Fig. 3.10 Ambiguous Grammar: [two different parse trees, (a) Parse Tree 1 and (b) Parse Tree 2, for the string id+id*id]
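Returning to left factoring from point 3 above, here is a minimal Python sketch of the transformation A → αβ1 | αβ2 into A → αA', A' → β1 | β2 (the helper name and the representation are assumptions for illustration; the sketch only handles the simple case where the listed alternatives all share the common prefix):

def left_factor(nonterminal, alternatives):
    """Factor out the longest common prefix shared by all alternatives of one rule.

    Each alternative is a list of grammar symbols; [] represents epsilon.
    Returns a dict of rules; if there is no common prefix the rule is unchanged.
    """
    prefix = []
    for symbols in zip(*alternatives):        # walk the alternatives column by column
        if len(set(symbols)) == 1:
            prefix.append(symbols[0])
        else:
            break
    if not prefix:
        return {nonterminal: alternatives}
    new_nt = nonterminal + "'"
    suffixes = [alt[len(prefix):] for alt in alternatives]   # the beta_i parts ([] = epsilon)
    return {
        nonterminal: [prefix + [new_nt]],     # A  -> alpha A'
        new_nt: suffixes,                     # A' -> beta1 | beta2 | ...
    }

# Example: S -> iEtS | iEtSeS  becomes  S -> iEtS S',  S' -> epsilon | eS
print(left_factor("S", [["i", "E", "t", "S"], ["i", "E", "t", "S", "e", "S"]]))
# {'S': [['i', 'E', 't', 'S', "S'"]], "S'": [[], ['e', 'S']]}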
…Cont’d
• For removing the ambiguity we will apply one rule:
• If the grammar has a left associative operator (such as +, -, *, /) then induce left recursion, and if the grammar has a right associative operator (such as the exponentiation operator) then induce right recursion.
• Example:
Ambiguous grammar:
E → E+E
E → E*E
E → id
Unambiguous grammar:
E → E+T
E → T
T → T*F
T → F
F → id
• Note one thing: the grammar is now unambiguous, but it is left recursive, and elimination of such left recursion is again a must.
Grammar after the left recursion is removed:
E → TE'
E' → +TE' | ɛ
T → FT'
T' → *FT' | ɛ
F → id

Backtracking
• Backtracking is a technique in which, for the expansion of a non-terminal symbol, we choose one alternative, and if some mismatch occurs then we try another alternative, if any.
• If for a non-terminal there are multiple production rules beginning with the same input symbol, then to get the correct derivation we need to try all these possibilities.
• Secondly, in backtracking we need to move some levels upward in order to check the other possibilities.
• This increases a lot of overhead in the implementation of parsing.
Cont'd
• And hence it becomes necessary to eliminate the backtracking by modifying the grammar.
• Backtracking will try different production rules to find the match for the input string by backtracking each time.
• Backtracking is more powerful than predictive parsing.
• But this technique, a backtracking parser, is slower and in general it requires exponential time.
• Hence, backtracking is not preferred for practical compilers.
Limitation:
• If the given grammar has a larger number of alternatives then the cost of backtracking is high.

Predictive Parser
• As the name indicates, a predictive parser tries to predict the next construction using one or more lookahead symbols from the input string.
• There are two types of predictive parser:
i. Recursive Descent Parser
ii. LL (1) Parser
• A parser that uses a collection of recursive procedures for parsing the given input string is called a Recursive Descent (RD) Parser.
• In this type of parser the CFG is used to build the recursive routines.
• The RHS of the production rule is directly converted to a program.
• For each non-terminal a separate procedure is written, and the body of the procedure (code) is the RHS of the corresponding non-terminal.
Cont'd
Basic steps for the construction of an RD Parser
• The RHS of the rule is directly converted into program code symbol by symbol.
1. If the symbol is a non-terminal then a call to the procedure corresponding to that non-terminal is made.
2. If the symbol is a terminal then it is matched with the lookahead from the input.
3. If the production rule has many alternatives then all these alternatives have to be combined into a single body of the procedure.
4. The parser should be activated by the procedure corresponding to the start symbol.

Cont'd
Example: Consider the grammar having start symbol S.
• To construct a parse tree top-down for the input string w = cad, begin with a tree consisting of a single node labeled S and the input pointer pointing to c, the first symbol of w.
• S has only one production, so we use it to expand S and obtain the tree as in the figure.
• The leftmost leaf, labeled c, matches the first symbol of the input w, so we advance the input pointer to a, the second symbol of w.
• Now we expand A using its first (leftmost) alternative.
• We have a match for the second input symbol a, so we advance the input pointer to d, the third input symbol, and compare d against the next leaf, labeled b.
• Since b does not match d, we report failure and go back to A to see whether there is another alternative for A that has not been tried, but that might produce a match.
• In going back to A, we must reset the input pointer to position 2, the position it had when we first came to A, which means that the procedure for A must store the input pointer in a local variable.
• The second alternative for A produces the tree of Figure (c).
• The leaf a matches the second symbol of w and the leaf d matches the third symbol.
• Since we have produced a parse tree for w, we halt and announce successful completion of parsing.
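The walk-through above can be turned into code. The slide does not print the grammar, but the narrative matches the classic example S → cAd, A → ab | a with w = cad, so that grammar is assumed in the following minimal Python sketch of a recursive descent parser with backtracking (procedure names are illustrative only):

# Assumed grammar:  S -> c A d,   A -> a b | a
# Each procedure returns the input position reached on success, or None on failure.

def parse_S(w, i):
    # S -> c A d
    if i < len(w) and w[i] == "c":
        j = parse_A(w, i + 1)
        if j is not None and j < len(w) and w[j] == "d":
            return j + 1
    return None

def parse_A(w, i):
    # First alternative: A -> a b
    if i + 1 < len(w) and w[i] == "a" and w[i + 1] == "b":
        return i + 2
    # Backtrack: the saved position i is unchanged, so simply try the second alternative A -> a
    if i < len(w) and w[i] == "a":
        return i + 1
    return None

w = "cad"
print("success" if parse_S(w, 0) == len(w) else "failure")   # prints: success

Because each procedure receives the current input position as a parameter, "resetting the input pointer" on backtracking simply means reusing the same value of i for the next alternative, exactly as described above.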
Cont'd
ii. LL (1) Parser
• The simple block diagram for the LL (1) parser is as given below.
[Fig.: block diagram of the LL (1) parser]
• The data structures used by LL (1) are:
Input buffer
Stack
Parsing table
• The LL (1) parser uses the input buffer to store the input tokens.
• The stack is used to hold the left sentential form.
• The symbols on the RHS of a rule are pushed onto the stack in reverse order, i.e. from right to left.
• Thus, the use of a stack makes this algorithm non-recursive.
• The table is basically a two-dimensional array.
Cont'd
• The table has a row for each non-terminal and a column for each terminal.
• The table can be represented as M[A, a], where A is a non-terminal and a is the current input symbol.
• The parser works as follows:
• The parsing program reads the top of the stack and the current input symbol.
• With the help of these two symbols the parsing action is determined.
• The parsing action can be:

Top | Input token | Parsing Action
$ | $ | Parsing successful; halt
a | a | Pop a and advance the lookahead to the next token
a | b | Error
A | a | Refer to table M[A, a]; if the entry at M[A, a] is error, report an error
A | a | Refer to table M[A, a]; if the entry at M[A, a] is A → PQR, then pop A, then push R, then push Q, then push P

• The parser consults the table M[A, a] each time while taking the parsing actions; hence this type of parsing method is called a table-driven parsing algorithm.
• The configuration of the LL (1) parser can be defined by the top of the stack and the lookahead token.
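A minimal Python sketch of this table-driven loop is given below (the function name, the token representation and the table encoding are assumptions for illustration; the example table is the standard LL (1) table for the grammar E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → (E) | id of Example 1 below):

def ll1_parse(table, start_symbol, nonterminals, tokens):
    """Table-driven LL(1) parsing.
    table  : dict mapping (non-terminal, terminal) -> list of RHS symbols ([] = epsilon)
    tokens : input token list, already ending with the end marker '$'
    Returns True on a successful parse, False on an error."""
    stack = [start_symbol, "$"]                          # top of the stack is stack[0]
    i = 0                                                # index of the lookahead token
    while stack:
        top, lookahead = stack[0], tokens[i]
        if top == "$" and lookahead == "$":
            return True                                  # successful parse, halt
        if top not in nonterminals:                      # top is a terminal
            if top == lookahead:
                stack.pop(0)                             # pop it and advance the lookahead
                i += 1
            else:
                return False                             # terminal mismatch: error
        else:                                            # top is a non-terminal: consult M[A, a]
            rhs = table.get((top, lookahead))
            if rhs is None:
                return False                             # empty table entry: error
            stack = list(rhs) + stack[1:]                # pop A; RHS ends up with its leftmost symbol on top

nts = {"E", "E'", "T", "T'", "F"}
table = {
    ("E", "("): ["T", "E'"], ("E", "id"): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "("): ["F", "T'"], ("T", "id"): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "("): ["(", "E", ")"], ("F", "id"): ["id"],
}
print(ll1_parse(table, "E", nts, ["id", "+", "id", "*", "id", "$"]))   # True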
Cont'd
• One by one, the configurations are performed, and the input is successfully parsed if the parser reaches the halting configuration.
• When the stack is empty and the next token is $, then it corresponds to a successful parse.
Configuration of Predictive LL (1) Parser
• The construction of the predictive LL (1) parser is based on two very important functions, and those are FIRST and FOLLOW.
• For the construction of the Predictive LL (1) parser we have to follow these steps:
1. Computation of the FIRST and FOLLOW functions
2. Construction of the Predictive Parsing Table using the FIRST and FOLLOW functions
3. Stack Implementation
4. Construct Parse Tree

First Function
• FIRST(α) is the set of terminal symbols that begin the strings derived from α.
Example: A → abc | def | ghi
then FIRST(A) = {a, d, g}
Rules for Calculating the FIRST Function
1. For a production rule X → ε, FIRST(X) = {ε}
2. For any terminal symbol a, FIRST(a) = {a}
3. For a production rule X → Y1 Y2 Y3
• Calculating FIRST(X):
i. If ε does not belong to FIRST(Y1), then FIRST(X) = FIRST(Y1).
ii. If ε belongs to FIRST(Y1), then FIRST(X) = {FIRST(Y1) - ε} U FIRST(Y2Y3).
Cont'd
• Calculating FIRST(Y2Y3):
i. If ε does not belong to FIRST(Y2), then FIRST(Y2Y3) = FIRST(Y2).
ii. If ε belongs to FIRST(Y2), then FIRST(Y2Y3) = {FIRST(Y2) - ε} U FIRST(Y3).
• Similarly, we can expand the rule for any production rule X → Y1 Y2 Y3 … Yn.
FOLLOW Function:
• FOLLOW(A) is the set of terminal symbols that appear immediately to the right of A.
Example: S → aAc
A → bd
then FOLLOW(A) = {c}
Rules for Calculating the FOLLOW Function:
1. For the start symbol S, place $ in FOLLOW(S):
FOLLOW(S) = {$}
2. For any production A → αB:
FOLLOW(B) = FOLLOW(A)
3. For any production rule A → αBβ:
• If ε does not belong to FIRST(β), then FOLLOW(B) = FIRST(β).
• If ε belongs to FIRST(β), then FOLLOW(B) = {FIRST(β) - ε} U FOLLOW(A).
Cont'd
Note
• ε may appear in the FIRST function of a non-terminal.
• ε will never appear in the FOLLOW function of a non-terminal.
• It is recommended to eliminate left recursion from the grammar, if present, before calculating the FIRST and FOLLOW functions.
• We will calculate the FOLLOW function of a non-terminal by looking at where it is present on the RHS of a production rule.

• Example 1: Calculate the FIRST and FOLLOW functions of the given grammar.
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id
Solution:
Step 1: Calculating the FIRST and FOLLOW functions

Symbol | FIRST | FOLLOW
E | { (, id } | { ), $ }
E' | { +, ε } | { ), $ }
T | { (, id } | { +, ), $ }
T' | { *, ε } | { +, ), $ }
F | { (, id } | { +, *, ), $ }
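The FIRST/FOLLOW rules above can be written as a small fixed-point computation. The following Python sketch (the names and the grammar encoding are my own assumptions) reproduces the table of Example 1:

EPS = "ε"

def first_of_string(symbols, first):
    """FIRST of a sequence Y1 Y2 ... Yn, given the FIRST sets of single symbols."""
    result = set()
    for sym in symbols:
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return result
    result.add(EPS)                       # every Yi can derive epsilon (or the sequence is empty)
    return result

def compute_first_follow(grammar, start):
    """grammar: dict non-terminal -> list of alternatives, each a list of symbols ([] = epsilon)."""
    terminals = {s for alts in grammar.values() for alt in alts for s in alt if s not in grammar}
    first = {t: {t} for t in terminals}
    first.update({nt: set() for nt in grammar})
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")                # rule 1: $ goes into FOLLOW of the start symbol
    changed = True
    while changed:                        # iterate until nothing changes any more
        changed = False
        for nt, alts in grammar.items():
            for alt in alts:
                f = first_of_string(alt, first)
                if not f <= first[nt]:
                    first[nt] |= f
                    changed = True
                for i, sym in enumerate(alt):              # FOLLOW rules for A -> alpha B beta
                    if sym not in grammar:
                        continue
                    beta_first = first_of_string(alt[i + 1:], first)
                    add = (beta_first - {EPS}) | (follow[nt] if EPS in beta_first else set())
                    if not add <= follow[sym]:
                        follow[sym] |= add
                        changed = True
    return first, follow

grammar = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
first, follow = compute_first_follow(grammar, "E")
print(first["E"], follow["T'"])   # {'(', 'id'} and {'+', ')', '$'} (set printing order may vary)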
Cont'd
[Fragments of the parse tree for Example 1 (Step 4) appeared here; the figure is not recovered]

• Example 2: Consider the grammar with start symbol S and the input string acbgh. Only part of the grammar survives:
S → aBDh
B → cC
F → f | ε
[remaining productions not recovered]
Step 1: FIRST and FOLLOW functions (recovered rows):
Symbol | FIRST | FOLLOW
D | { g, f, ε } | { h }
E | { g, ε } | { f, h }
F | { f, ε } | { h }
Step 2: Construct the parse table using the FIRST and FOLLOW functions.
[Parse table over the terminals a, b, c, f, g, h, $; entries not recovered]
Step 3: Stack Implementation for the input string acbgh$:
Stack | Input | Production (action)
S$ | acbgh$ | S → aBDh
aBDh$ | acbgh$ | Pop a
[remaining rows not recovered]

• Example 3: A left recursive grammar (the original rules are not recovered); the new grammar after eliminating the left recursion is:
S → A
A → aBA'
A' → dA' | ε
B → b
C → g
Cont'd
First Function:
First(S) = First(A) = {a}
First(A) = {a}
First(A') = {d, ε}
First(B) = {b}
First(C) = {g}
Follow Function:
Follow(S) = {$}
Follow(A) = Follow(S) = {$}
Follow(A') = Follow(A) = {$}
Follow(B) = {First(A') – ε} U Follow(A) = {d, $}
Follow(C) = NA
• Step 2: Construct the parse table using the First and Follow functions.

 | a | b | d | g | $
S | S → A | | | |
A | A → aBA' | | | |
A' | | | A' → dA' | | A' → ε
B | | B → b | | |
C | | | | C → g |
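The Step 2 table construction can be sketched in code as well. The helper below reuses first_of_string and compute_first_follow from the FIRST/FOLLOW sketch given earlier in this chapter and applies the usual rule: A → α is entered under every terminal in FIRST(α), and under every terminal in FOLLOW(A) when α can derive ε (the names and encoding are the same assumed ones):

def build_ll1_table(grammar, start):
    """Build the LL(1) parsing table as a dict M[(A, a)] -> RHS."""
    first, follow = compute_first_follow(grammar, start)
    table = {}
    for nt, alts in grammar.items():
        for alt in alts:
            f = first_of_string(alt, first)
            for a in f - {EPS}:
                table[(nt, a)] = alt           # A -> alpha under every a in FIRST(alpha)
            if EPS in f:
                for b in follow[nt]:
                    table[(nt, b)] = alt       # A -> alpha under FOLLOW(A) when alpha derives epsilon
    return table

grammar3 = {
    "S":  [["A"]],
    "A":  [["a", "B", "A'"]],
    "A'": [["d", "A'"], []],
    "B":  [["b"]],
    "C":  [["g"]],
}
table3 = build_ll1_table(grammar3, "S")
print(table3[("A'", "d")], table3[("A'", "$")])   # ['d', "A'"] and []  (A' → dA' and A' → ε)

The two printed entries match the A' row of the table above.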
Cont'd
• Example 4: Show that the following grammar is LL(1).
S → AaAb | BbBa
A → ε
B → ε
• Solution: first we construct FIRST and FOLLOW for the given grammar.

 | FIRST | FOLLOW
S | { a, b } | { $ }
A | { ε } | { a, b }
B | { ε } | { a, b }

Parsing table for the above grammar:

 | a | b | $
S | S → AaAb | S → BbBa |
A | A → ε | A → ε |
B | B → ε | B → ε |

• Now consider the input "ba" for parsing:

Stack | Input | Production
S$ | ba$ | S → BbBa
BbBa$ | ba$ | B → ε
bBa$ | ba$ | Pop b
Ba$ | a$ | B → ε
a$ | a$ | Pop a
$ | $ | Accept

• This shows that the given grammar is LL (1).
• Parse tree for the above parse of "ba": the root S has the children B, b, B, a, and each B derives ε.
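For a quick machine check of the trace above, the same parse can be reproduced with the ll1_parse sketch from the LL (1) parser section earlier in this chapter, using the table of Example 4 (same assumed encoding):

table4 = {
    ("S", "a"): ["A", "a", "A", "b"], ("S", "b"): ["B", "b", "B", "a"],
    ("A", "a"): [], ("A", "b"): [],   # A -> epsilon
    ("B", "a"): [], ("B", "b"): [],   # B -> epsilon
}
print(ll1_parse(table4, "S", {"S", "A", "B"}, ["b", "a", "$"]))   # True, mirroring the stack trace above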
Cont'd
• Example 5: Construct the LL (1) parsing table for the following grammar.
S → aB | aC | Sd | Se
B → bBc | f
C → g
• Solution: first we construct FIRST and FOLLOW for the given grammar.

 | FIRST | FOLLOW
S | { a } | { d, e, $ }
B | { b, f } | { c, d, e, $ }
C | { g } | { d, e, $ }

Parsing table:

 | a | b | c | d | f | g | $
S | S → aB, S → aC | | | | | |
B | | B → bBc | | | B → f | |
C | | | | | | C → g |

• Note that the entry M[S, a] contains more than one production, so the grammar as given is not LL (1).