Compiler Design
Chapter 3: Syntax Analysis - Part One
Instructor: Fikru T. (MSc.)
Email: fikrutafesse08@gmail.com
AU

• The tokens generated by the lexical analyzer are accepted by the next phase of the compiler, i.e. the syntax analyzer.
• The syntax analyzer, or parser, comes after lexical analysis.
• It is one of the important components of the front end of the compiler.
• Syntax analysis is the second phase of compilation.
• The syntax analyzer (parser) basically checks the syntax of the language.
• The syntax analyzer takes the tokens from the lexical analyzer and groups them in such a way that some programming structure (syntax) can be recognized.
• After grouping the tokens, if any syntax cannot be recognized then a syntactic error is generated.
• This overall process is called syntax checking of the language.
  For example: a = b + 10;
• In turn, the lexical analyzer supplies tokens to the syntax analyzer (parser).
Cont'd ...
• The parser collects a sufficient number of tokens and builds a parse tree.
• Thus, by building the parse tree, the parser finds the syntactic errors, if any.
• It is also necessary that the parser recover from commonly occurring errors so that the remaining task of processing the input can be continued.

Why are the Lexical and Syntax Analyzers separated?
• The lexical analyzer scans the input program and collects the tokens from it.
• The parser builds a parse tree using these tokens.
• These are two important activities, and they are carried out independently by these two phases of the compiler.
• Separating these two phases has two advantages:
  1. It accelerates the process of compilation.
  2. Errors in the source input can be identified precisely.

Basic Issues in Parsing
• There are two important issues in parsing:
  1. Specification of syntax
  2. Representation of the input after parsing
• A very important issue in parsing is the specification of syntax in a programming language.
• Specification of syntax means how to write any programming language statement.
• There are some characteristics of a specification of syntax:
  i. The specification should be precise and unambiguous.
  ii. The specification should be in detail.
  iii. The specification should be complete.
• Such a specification is called a "Context Free Grammar" (CFG).
• Another important issue in parsing is the representation of the input after parsing.
• Lastly, the most crucial issue is the parsing algorithm.
Cont'd ...
• For example, if we want to define a declarative statement using a context free grammar, it could be as follows:

    State → Type List Terminator
    Type → int
    List → List , id | id
    Terminator → ;

• Using the given rules, we can derive int id, id; — the derivation tree (redrawn from the figure) is:

    State
    ├─ Type ─ int
    ├─ List
    │   ├─ List ─ id
    │   ├─ ,
    │   └─ id
    └─ Terminator ─ ;

• Hence, int id, id; can be defined by means of the above context free grammar.

Rules to be followed while writing context free grammars (CFG):
1. A single non-terminal should be on the LHS.
2. The rule should always be in the form LHS → RHS, where the RHS may be a combination of non-terminal and terminal symbols.
3. The NULL derivation can be specified as NT → ε.
4. One of the non-terminals should be the start symbol, and conventionally we should write the rules for the start symbol first.
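The grammar above can be exercised in code. Here is a minimal Python sketch (the representation and names are ours, not from the slides) that performs the leftmost derivation of int id, id; step by step:

```python
# The declaration grammar above as a Python dictionary; each
# non-terminal maps to a list of alternatives (lists of symbols).
GRAMMAR = {
    "State": [["Type", "List", "Terminator"]],
    "Type": [["int"]],
    "List": [["List", ",", "id"], ["id"]],
    "Terminator": [[";"]],
}

def leftmost_step(form, alternative):
    """Rewrite the leftmost non-terminal of the sentential form
    using the alternative with the given index."""
    for i, sym in enumerate(form):
        if sym in GRAMMAR:
            return form[:i] + GRAMMAR[sym][alternative] + form[i + 1:]
    return form  # no non-terminal left: the form is a sentence

# Leftmost derivation of "int id , id ;":
# State => Type List Terminator => int List Terminator
#       => int List , id Terminator => int id , id Terminator
#       => int id , id ;
form = ["State"]
for choice in (0, 0, 0, 1, 0):
    form = leftmost_step(form, choice)
print(" ".join(form))  # -> int id , id ;
```

The tuple of choices picks which alternative is used at each step; at the fourth step alternative 1 (List → id) ends the left recursion.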
• Derivation from S means generation of a string W from S.
• For constructing a derivation, two things are important:
  i. The choice of non-terminal from among several others.
  ii. The choice of rule from the production rules for the corresponding non-terminal.
• Instead of choosing an arbitrary non-terminal, one can choose:
  i. The leftmost non-terminal in a sentential form; this is then called a leftmost derivation.
  ii. The rightmost non-terminal in a sentential form; this is then called a rightmost derivation.

Definition of Derivation Tree
• Let G = (V, T, P, S) be a Context Free Grammar.

Example 1: Derive the input string abb by leftmost derivation and by rightmost derivation using the CFG given by:

    S → AB | ε
    (the remaining productions were shown in a figure; read off the parse tree on the next slide, they are A → aB and B → Sb | b)

Leftmost derivation:  S ⇒ AB ⇒ aBB ⇒ abB ⇒ abSb ⇒ abb
Rightmost derivation: S ⇒ AB ⇒ ASb ⇒ Ab ⇒ aBb ⇒ abb
Cont'd ...
• The parse tree for the above derivations is as given below; the leftmost and the rightmost derivation produce the same tree:

    S
    ├─ A
    │   ├─ a
    │   └─ B ─ b
    └─ B
        ├─ S ─ ε
        └─ b

• Example 2: Design a derivation tree for the following grammar:
    [the grammar was given in a figure and is not recoverable from the text]
• Also obtain the leftmost and rightmost derivation for the string 'aaabaab' using the above grammar.
• Solution: [given in figures]
Cont'd ...
Example 2: Find whether the following grammar is ambiguous or not.
    [the grammar was given in a figure]
Solution: we construct parse trees for the string aab.
• There are two different parse trees deriving the string aab.
• Hence, the given grammar is an ambiguous grammar.

Parsing Techniques
– There are two parsing techniques. These are:
  1. Top-down parsing
  2. Bottom-up parsing
– These parsing techniques work on the following principles:
  • The parser scans the input string from left to right and identifies whether the derivation is leftmost or rightmost.
  • The parser makes use of the production rules for choosing the appropriate derivation.
  • The different parsing techniques use different approaches for selecting the appropriate rules for the derivation.
  • Finally, a parse tree is constructed.
Parsing Techniques Cont.…
– When the parse tree is constructed from the root and expanded towards the leaves, such a parser is called a top-down parser.
  • The name itself tells us that the parse tree is built from top to bottom.
– When the parse tree is constructed from the leaves to the root, such a parser is called a bottom-up parser.
  • Thus, the parse tree is built in a bottom-up manner.
– The next figure shows the types of parsers.
  Figure: Parsing Techniques
Top-Down Parser
– In top-down parsing the parse tree is generated from top to bottom (from the root to the leaves).
– The process of constructing the parse tree starting from the root and proceeding to the children is called top-down parsing (TDP).
– That means starting from the start symbol of the given grammar and reaching the input string.
– The derivation terminates when the required input string is obtained.
– Example: Construct a parse tree for the string w using the given grammar. [the grammar and the trees were given in figures]
– Consider the input string xyz as shown.

Top-Down Parser Cont.…
– TDP internally uses leftmost derivation.
– A TDP can be constructed for a grammar only if the grammar is free from ambiguity and left recursion.
– The leftmost derivation matches this requirement.
– The main task in top-down parsing is to find the appropriate production rule in order to produce the correct input string.
– Example: consider the grammar [given in a figure]
…Cont'd
1. Backtracking: a technique in which, for the expansion of a non-terminal symbol, we choose one alternative, and if some mismatch occurs then we try another alternative, if any.
• For example: given S → xPz and P → yw | y, the parser first tries the alternative P → yw,

    S                 S
    ├─ x              ├─ x
    ├─ P              ├─ P ─ y
    │   ├─ y          └─ z
    │   └─ w
    └─ z

  and on a mismatch it backtracks and tries P → y.
• If for a non-terminal there are multiple production rules beginning with the same input symbol, then to get the correct derivation we need to try all these possibilities.
• Secondly, in backtracking we need to move some levels upward in order to check the other possibilities.
• This adds a lot of overhead to the implementation of parsing.
• Hence it becomes necessary to eliminate the backtracking by modifying the grammar.

2. Left Recursion:
• A left recursive grammar is a grammar in which

    A ⇒+ Aα

• Here ⇒+ means deriving the input in one or more steps.
• A is a non-terminal and α denotes some input string.
• If left recursion is present in the grammar then it creates a serious problem: because of the left recursion the top-down parser can enter an infinite loop.
• This is shown in the figure on the next slide.
…Cont'd
• Fig. 3.9 Left Recursion: the expansion of A causes further expansion of A,

    A
    ├─ A
    │   ├─ A
    │   │   └─ …
    │   └─ α
    └─ α

  and due to the generation of A, Aα, Aαα, Aααα, …, the input pointer (pointing at the input string β α α α …) is never advanced.
• This causes a major problem in top-down parsing, and therefore elimination of left recursion is a must.
• To eliminate left recursion we need to modify the grammar.
• Let G be a context free grammar having a production rule with left recursion:

    A → Aα ------- (1)
    A → β

• Then eliminate the left recursion by re-writing the production rules as:

    A → βA' ------- (2)
    A' → αA'
    A' → ε

• Thus a new non-terminal A' is introduced.
• We can also verify whether the modified grammar is equivalent to the original or not: both derive strings of the form β α α α … (β followed by any number of α's).
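The rewriting from (1) to (2) can be sketched as a small Python function (the names are ours; ε is written "ε" and alternatives are tuples of symbols):

```python
def eliminate_left_recursion(a, alternatives):
    """Apply rule (2) above: split the alternatives of non-terminal `a`
    into left-recursive ones (A -> A alpha) and the rest (A -> beta),
    then rewrite as A -> beta A', A' -> alpha A' | epsilon."""
    alphas = [alt[1:] for alt in alternatives if alt and alt[0] == a]
    betas = [alt for alt in alternatives if not alt or alt[0] != a]
    if not alphas:                      # no left recursion: nothing to do
        return {a: alternatives}
    a2 = a + "'"                        # the new non-terminal A'
    return {
        a: [beta + (a2,) for beta in betas],
        a2: [alpha + (a2,) for alpha in alphas] + [("ε",)],
    }

rules = eliminate_left_recursion("E", [("E", "+", "T"), ("T",)])
# rules == {"E": [("T", "E'")], "E'": [("+", "T", "E'"), ("ε",)]}
```

Applied to E → E+T | T this yields exactly the E → TE', E' → +TE' | ε rules derived on the next slide.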
…Cont'd
• For example: consider the grammar E → E+T | T.
• Then, using equation (2) above with A = E, α = +T and β = T, the rules become:

    E → TE'
    E' → +TE' | ε

• Similarly, for T → T*F | F we obtain:

    T → FT'
    T' → *FT' | ε

• The two derivation trees below show that the original rules and the rewritten rules derive the same strings βαα…α:

    A                 A
    ├─ A              ├─ β
    │   ├─ A ─ β      └─ A'
    │   └─ α              ├─ α
    └─ α                  └─ A'
                              ├─ α
                              └─ A' ─ ε

3. Left Factoring
• If the grammar is left factored then it becomes suitable for use by a top-down parser.
• Basically, left factoring is used when it is not clear which of two alternatives should be used to expand a non-terminal.
• By left factoring we may be able to re-write the productions so that the decision can be deferred until enough of the input has been seen to make the right choice.
• In general, if A → αβ1 | αβ2 is a production then it is not possible for us to take the decision whether to choose the first rule or the second. We factor it as:

    A → αA'
    A' → β1 | β2

• For example: consider the grammar

    S → iEtS | iEtSeS | a
    E → b

• The left factored grammar becomes:

    S → iEtSS' | a
    S' → eS | ε
    E → b
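The A → αβ1 | αβ2 ⇒ A → αA', A' → β1 | β2 transformation can likewise be sketched in code. A minimal one-round left-factoring function (a sketch; it assumes at most one group of alternatives needs the fresh name A'):

```python
from collections import defaultdict

def left_factor(a, alternatives):
    """One round of left factoring for non-terminal `a`: group the
    alternatives by their first symbol, pull out the longest common
    prefix of each group, and move the differing suffixes into a
    new non-terminal A'.  Alternatives are tuples; "ε" is epsilon."""
    groups = defaultdict(list)
    for alt in alternatives:
        groups[alt[0] if alt else ""].append(alt)
    factored, new_rules = [], {}
    for alts in groups.values():
        if len(alts) == 1:              # unique first symbol: keep as is
            factored.append(alts[0])
            continue
        prefix = list(alts[0])          # longest common prefix of the group
        for alt in alts[1:]:
            k = 0
            while k < min(len(prefix), len(alt)) and prefix[k] == alt[k]:
                k += 1
            prefix = prefix[:k]
        a2 = a + "'"
        factored.append(tuple(prefix) + (a2,))
        new_rules[a2] = [alt[len(prefix):] or ("ε",) for alt in alts]
    return factored, new_rules

rules, helper = left_factor("S", [("i", "E", "t", "S"),
                                  ("i", "E", "t", "S", "e", "S"),
                                  ("a",)])
# rules  == [("i", "E", "t", "S", "S'"), ("a",)]
# helper == {"S'": [("ε",), ("e", "S")]}
```

On the slide's example this reproduces S → iEtSS' | a with S' → ε | eS.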
…Cont'd
4. Ambiguity
• An ambiguous grammar is not desirable in top-down parsing.
• Hence, we need to remove the ambiguity from the grammar if it is present.
• For removing the ambiguity we apply one rule: if the grammar has a left associative operator (such as +, -, *, /) then induce left recursion, and if the grammar has a right associative operator (the exponentiation operator) then induce right recursion.
Cont'd
There are two types by which top-down parsing can be performed:
1. Backtracking
2. Predictive Parsing

Backtracking
– Backtracking will try different production rules to find a match for the input string.
– Backtracking is more powerful than predictive parsing.
– But a backtracking parser is slower, and it can require exponential time in the worst case.
– Hence, backtracking is not preferred for practical compilers.
– If the given grammar has a larger number of alternatives then the cost of backtracking increases.

Predictive Parsing
• As the name indicates, a predictive parser tries to predict the next construction using one or more lookahead symbols.
• There are two types of predictive parser:
  i. Recursive Descent Parser
  ii. LL(1) Parser
• A parser that uses a collection of recursive procedures for parsing the given input string is called a recursive descent (RD) parser.
• In this type of parser the CFG is used to build the recursive routines.
Cont'd
Basic steps for construction of an RD Parser
• The RHS of a rule is directly converted into program code, symbol by symbol:
  1. If the symbol is a non-terminal then a call to the procedure corresponding to that non-terminal is made.
  2. If the symbol is a terminal then it is matched against the lookahead symbol from the input.
  3. If the production rule has many alternates then all these alternates have to be combined into a single body of the procedure.
  4. The parser is activated by the procedure corresponding to the start symbol.

Cont'd
Example: Consider the grammar having start symbol S. [The grammar itself appears only in a figure; from the narrative it is the classic example S → cAd, A → ab | a.]
• To construct a parse tree top-down for the input string w = cad, begin with a tree consisting of a single node labeled S, and the input pointer pointing to c, the first symbol of w.
• S has only one production, so we use it to expand S and obtain the tree in the figure.
• The leftmost leaf, labeled c, matches the first symbol of the input w, so we advance the input pointer to a, the second symbol of w.
• Now we expand A using the leftmost alternative.
• We have a match for the second input symbol a, so we advance the input pointer to d, the third input symbol, and compare d against the next leaf, labeled b.
• Since b does not match d, we report failure and go back to A to see whether there is another alternative for A that has not been tried but that might produce a match.
• In going back to A, we must reset the input pointer to position 2, the position it had when we first came to A; this means that the procedure for A must store the input pointer in a local variable.
• The second alternative for A produces the tree of Figure (c).
• The leaf a matches the second symbol of w and the leaf d matches the third symbol.
• Since we have produced a parse tree for w, we halt and announce successful completion of parsing.
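The walkthrough above can be sketched as a pair of recursive procedures. Note the grammar is an assumption (the figure is missing; S → cAd, A → ab | a is the classic example the narrative matches):

```python
# Recursive-descent parsing with backtracking for the example above.
# Assumed grammar (read from the narrative):  S -> c A d,  A -> a b | a.

def parse_S(tokens, i):
    """Procedure for S: try S -> c A d from position i; return the
    position after the match, or None on failure."""
    if i < len(tokens) and tokens[i] == "c":
        j = parse_A(tokens, i + 1)
        if j is not None and j < len(tokens) and tokens[j] == "d":
            return j + 1
    return None

def parse_A(tokens, i):
    """Procedure for A: try each alternative in order; on a mismatch,
    backtrack by restarting from the stored position i."""
    for alt in (("a", "b"), ("a",)):
        if all(i + k < len(tokens) and tokens[i + k] == s
               for k, s in enumerate(alt)):
            return i + len(alt)
    return None

def accepts(s):
    return parse_S(list(s), 0) == len(s)

# accepts("cad") -> True  (A -> ab fails on d, backtrack, A -> a matches)
# accepts("cbd") -> False
```

The parameter `i` plays the role of the stored input pointer: when the alternative ab fails, the loop retries A → a from the same position.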
Cont'd
ii. LL(1) Parser and LL(1) Grammar
• The simple block diagram for the LL(1) parser is as given below. [figure]
• The data structures used by LL(1) are:
  - Input buffer
  - Stack
  - Parsing table
• The LL(1) parser uses the input buffer to store the input tokens.
• The stack is used to hold the left sentential form.
• The symbols on the RHS of a rule are pushed onto the stack in reverse order, i.e. from right to left.
• Thus, the use of a stack makes this algorithm non-recursive.
• The table is basically a two-dimensional array.
• The table has a row for each non-terminal and a column for each terminal.
• The table is represented as M[A, a], where A is a non-terminal and a is the current input symbol.
• The parser works as follows:
  - The parsing program reads the top of the stack and the current input symbol.
  - With the help of these two symbols the parsing action is determined.
• The parsing actions can be:

    Top | Input token | Parsing action
    $   | $           | Parsing successfully halts
    a   | a           | Pop a and advance the lookahead to the next token
    a   | b           | Error
    A   | a           | Refer to table M[A, a]; if the entry at M[A, a] is error, report an error
    A   | a           | Refer to table M[A, a]; if the entry at M[A, a] is A → PQR, then pop A, then push R, then push Q, then push P

• The parser consults the table M[A, a] each time while taking the parsing actions; hence this type of parsing method is called a table-driven parsing algorithm.
• The configuration of the LL(1) parser is defined by the top of the stack and the lookahead token.
Cont'd
• Configurations are performed one by one, and the input is successfully parsed if the parser reaches the halting configuration.
• When the stack is empty and the next token is $, this corresponds to a successful parse.

Configuration of a Predictive LL(1) Parser
• The construction of a predictive LL(1) parser is based on two very important functions: FIRST and FOLLOW.
• For the construction of a predictive LL(1) parser we follow these steps:
  1. Computation of the FIRST and FOLLOW functions
  2. Construction of the predictive parsing table using the FIRST and FOLLOW functions
  3. Stack implementation
  4. Construction of the parse tree

FIRST Function
• FIRST(α) is the set of terminal symbols that begin the strings derived from α.
  Example: for A → abc | def | ghi, FIRST(A) = {a, d, g}

Rules for Calculating the FIRST Function
1. For a production rule X → ε, FIRST(X) = {ε}.
2. For any terminal symbol a, FIRST(a) = {a}.
3. For a production rule X → Y1 Y2 Y3, to calculate FIRST(X):
   i. If ε does not belong to FIRST(Y1), then FIRST(X) = FIRST(Y1).
   ii. If ε belongs to FIRST(Y1), then FIRST(X) = {FIRST(Y1) - ε} ∪ FIRST(Y2Y3).
Cont'd
• Calculating FIRST(Y2Y3):
  i. If ε does not belong to FIRST(Y2), then FIRST(Y2Y3) = FIRST(Y2).
  ii. If ε belongs to FIRST(Y2), then FIRST(Y2Y3) = {FIRST(Y2) - ε} ∪ FIRST(Y3).
• Similarly, we can extend the rule to any production rule X → Y1 Y2 Y3 … Yn.

FOLLOW Function
• FOLLOW(A) is the set of terminal symbols that appear immediately to the right of A.
  Example: for S → aAc, A → bd, FOLLOW(A) = {c}

Rules for Calculating the FOLLOW Function
1. For the start symbol S, place $ in FOLLOW(S):
   FOLLOW(S) = {$}
2. For any production A → αB:
   FOLLOW(B) = FOLLOW(A)
3. For any production rule A → αBβ:
   • If ε does not belong to FIRST(β), then FOLLOW(B) = FIRST(β).
   • If ε belongs to FIRST(β), then FOLLOW(B) = {FIRST(β) - ε} ∪ FOLLOW(A).
Cont'd
Note
• ε may appear in the FIRST function of a non-terminal.
• ε will never appear in the FOLLOW function of a non-terminal.
• It is recommended to eliminate left recursion from the grammar, if present, before calculating the FIRST and FOLLOW functions.
• We calculate the FOLLOW function of a non-terminal by looking at where it is present on the RHS of a production rule.

Cont'd
• Example 1: Calculate the FIRST and FOLLOW functions of the given grammar:

    E → TE'
    E' → +TE' | ε
    T → FT'
    T' → *FT' | ε
    F → (E) | id

Solution:
Step 1: Calculating the FIRST and FOLLOW functions

    Symbol | FIRST     | FOLLOW
    E      | { (, id } | { ), $ }
    E'     | { +, ε }  | { ), $ }
    T      | { (, id } | { +, ), $ }
    T'     | { *, ε }  | { +, ), $ }
    F      | { (, id } | { +, *, ), $ }
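The fixed-point computation described by the rules above can be sketched in Python and checked against the table of Example 1 (function and variable names are ours):

```python
# FIRST/FOLLOW computation following the rules above, checked against
# Example 1.  "ε" marks epsilon; alternatives are lists of symbols.
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], ["ε"]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], ["ε"]],
    "F":  [["(", "E", ")"], ["id"]],
}

def first_of_seq(seq, first):
    """FIRST of a sequence of symbols (rule 3, extended to Y1...Yn)."""
    out = set()
    for sym in seq:
        f = first.get(sym, {sym})       # for a terminal a, FIRST(a) = {a}
        out |= f - {"ε"}
        if "ε" not in f:
            return out
    out.add("ε")                        # every symbol can vanish
    return out

def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:                      # iterate to a fixed point
        changed = False
        for nt, alts in grammar.items():
            for alt in alts:
                f = first_of_seq(alt, first)
                if not f <= first[nt]:
                    first[nt] |= f
                    changed = True
    return first

def compute_follow(grammar, start, first):
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")              # rule 1
    changed = True
    while changed:
        changed = False
        for nt, alts in grammar.items():
            for alt in alts:
                for i, sym in enumerate(alt):
                    if sym not in grammar:
                        continue        # only non-terminals get FOLLOW
                    tail = first_of_seq(alt[i + 1:], first)
                    new = tail - {"ε"}  # rule 3: FIRST(beta) - epsilon
                    if "ε" in tail:     # rules 2/3: also add FOLLOW(A)
                        new |= follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

FIRST = compute_first(GRAMMAR)
FOLLOW = compute_follow(GRAMMAR, "E", FIRST)
# FIRST["E"] == {"(", "id"},  FOLLOW["F"] == {"+", "*", ")", "$"}
```

Running this reproduces every row of the table above.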
Cont'd
Step 4: Construct the Parse Tree
• The figure (redrawn; it appears to correspond to the input id*id) is:

    E
    ├─ T
    │   ├─ F ─ id
    │   └─ T'
    │       ├─ *
    │       ├─ F ─ id
    │       └─ T' ─ ε
    └─ E' ─ ε

Cont'd
• Example 2: Calculate the FIRST and FOLLOW functions of the given grammar:

    S → aBDh
    B → cC
    C → bC / ε
    D → EF
    E → g / ε
    F → f / ε

Solution:
Step 1: Calculating the FIRST and FOLLOW functions

    Symbol | FIRST       | FOLLOW
    S      | { a }       | { $ }
    B      | { c }       | { f, g, h }
    C      | { b, ε }    | { f, g, h }
    D      | { g, f, ε } | { h }
    E      | { g, ε }    | { f, h }
    F      | { f, ε }    | { h }

(The rows for S, B and C were lost from the extracted table and have been reconstructed by the same rules.)
Cont'd
Step 2: Construct the parse table using the FIRST and FOLLOW functions.

         | a        | b      | c      | f      | g      | h      | $
    S    | S → aBDh |        |        |        |        |        |
    B    |          |        | B → cC |        |        |        |
    C    |          | C → bC |        | C → ε  | C → ε  | C → ε  |
    D    |          |        |        | D → EF | D → EF | D → EF |
    E    |          |        |        | E → ε  | E → g  | E → ε  |
    F    |          |        |        | F → f  |        | F → ε  |

Note: blank entries are errors.

Cont'd
Step 3: Stack Implementation, input string acbgh$

    Stack | Input  | Production (action)
    S$    | acbgh$ | S → aBDh
    aBDh$ | acbgh$ | Pop a
    BDh$  | cbgh$  | B → cC
    cCDh$ | cbgh$  | Pop c
    CDh$  | bgh$   | C → bC
    bCDh$ | bgh$   | Pop b
    CDh$  | gh$    | C → ε
    Dh$   | gh$    | D → EF
    EFh$  | gh$    | E → g
    gFh$  | gh$    | Pop g
    Fh$   | h$     | F → ε
    h$    | h$     | Pop h
    $     | $      | accept
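The trace above is exactly what a table-driven LL(1) driver produces. Below is a minimal Python sketch of the driver, with the parse table of Step 2 hard-coded as a dictionary (ε entries push nothing; the names are ours):

```python
# Table-driven LL(1) driver, replaying the stack trace above.
# The table is M[A, a] from Step 2; "ε" marks an epsilon entry.
TABLE = {
    ("S", "a"): ["a", "B", "D", "h"],
    ("B", "c"): ["c", "C"],
    ("C", "b"): ["b", "C"], ("C", "f"): ["ε"], ("C", "g"): ["ε"], ("C", "h"): ["ε"],
    ("D", "f"): ["E", "F"], ("D", "g"): ["E", "F"], ("D", "h"): ["E", "F"],
    ("E", "f"): ["ε"], ("E", "g"): ["g"], ("E", "h"): ["ε"],
    ("F", "f"): ["f"], ("F", "h"): ["ε"],
}
NONTERMINALS = {"S", "B", "C", "D", "E", "F"}

def ll1_parse(tokens):
    stack = ["$", "S"]                  # top of the stack = end of the list
    tokens = list(tokens) + ["$"]
    i = 0
    while stack:
        top = stack.pop()
        look = tokens[i]
        if top == "$":
            return look == "$"          # halt: success iff input is consumed
        if top not in NONTERMINALS:     # terminal on top: must match input
            if top != look:
                return False            # error
            i += 1
        else:
            rhs = TABLE.get((top, look))
            if rhs is None:
                return False            # blank entry: error
            for sym in reversed(rhs):   # push the RHS right to left
                if sym != "ε":
                    stack.append(sym)
    return False

# ll1_parse("acbgh") -> True  (the trace above)
# ll1_parse("abgh")  -> False (M[B, b] is blank)
```

Pushing the RHS in reverse, as the table of parsing actions prescribes, keeps the leftmost symbol of the rule on top of the stack.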
Cont'd
Step 4: Constructing the parse tree for the string acbgh:

    S
    ├─ a
    ├─ B
    │   ├─ c
    │   └─ C
    │       ├─ b
    │       └─ C ─ ε
    ├─ D
    │   ├─ E ─ g
    │   └─ F ─ ε
    └─ h

Cont'd
• Example 3: Calculate the FIRST and FOLLOW functions of the given grammar.
• Solution: the given grammar is left recursive. [The left recursive grammar was shown in a figure.]
• New grammar (after eliminating the left recursion):

    S → A
    A → aBA'
    A' → dA' / ε
    B → b
    C → g
Cont'd
First Function                FOLLOW Function
First(S) = First(A) = {a}     Follow(S) = {$}
First(A) = {a}                Follow(A) = Follow(S) = {$}
First(A') = {d, ε}            Follow(A') = Follow(A) = {$}
First(B) = {b}                Follow(B) = {First(A') - ε} ∪ Follow(A) = {d, $}
First(C) = {g}                Follow(C) = NA (C is not reachable from S)

• Step 2: Construct the parse table using the First and Follow functions.

         | a        | b     | d        | g | $
    S    | S → A    |       |          |   |
    A    | A → aBA' |       |          |   |
    A'   |          |       | A' → dA' |   | A' → ε
    B    |          | B → b |          |   |

(The row for B was lost from the extracted table and is reconstructed from First(B) = {b}; blank entries are errors.)
Cont'd
• Step 3: Stack Implementation using the Parsing Table. [figure]
• Step 4: Generate the Parse Tree using the Stack Implementation, following the Top-Down Approach. [figure]

Cont'd
• Example 4: Show that the following grammar is LL(1).

    S → AaAb | BbBa
    A → ε
    B → ε

• Solution: first we construct FIRST and FOLLOW for the given grammar:

         | FIRST  | FOLLOW
    S    | {a, b} | {$}
    A    | {ε}    | {a, b}
    B    | {ε}    | {a, b}

• Parsing table for the above grammar:

         | a        | b        | $
    S    | S → AaAb | S → BbBa |
    A    | A → ε    | A → ε    |
    B    | B → ε    | B → ε    |

• Now consider the string "ba" for parsing:

    Stack | Input | Production
    S$    | ba$   | S → BbBa
    BbBa$ | ba$   | B → ε
    bBa$  | ba$   | Pop b
    Ba$   | a$    | B → ε
    a$    | a$    | Pop a
    $     | $     | Accept

• Parse tree for the above parse:

    S
    ├─ B ─ ε
    ├─ b
    ├─ B ─ ε
    └─ a

• Since no cell of the parsing table is multiply defined, the given grammar is LL(1).
Cont'd
• Example 5: Construct the LL(1) parsing table for the following grammar.

    S → aB | aC | Sd | Se
    B → bBc | f
    C → g

         | FIRST  | FOLLOW
    S    | {a}    | {d, e, $}
    B    | {b, f} | {c, d, e, $}
    C    | {g}    | {d, e, $}

         | a              | b       | f     | g     | c, d, e, $
    S    | S → aB, S → aC |         |       |       |
    B    |                | B → bBc | B → f |       |
    C    |                |         |       | C → g |

• Note that the entry M[S, a] is multiply defined (and the grammar is left recursive), so this grammar is not LL(1).

End!!!
Thank You!!!
To be Continued with Part Two: Bottom-up Parsing
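The table-filling rule and the conflict in M[S, a] can be checked mechanically. The sketch below (with FIRST/FOLLOW hard-coded from the tables above) fills the table and records any cell that receives more than one production:

```python
# Building the LL(1) table for Example 5 and detecting the conflict.
# FIRST/FOLLOW are taken from the tables above; "ε" marks epsilon.
FIRST = {"S": {"a"}, "B": {"b", "f"}, "C": {"g"}}
FOLLOW = {"S": {"d", "e", "$"}, "B": {"c", "d", "e", "$"}, "C": {"d", "e", "$"}}
PRODUCTIONS = [
    ("S", ["a", "B"]), ("S", ["a", "C"]), ("S", ["S", "d"]), ("S", ["S", "e"]),
    ("B", ["b", "B", "c"]), ("B", ["f"]), ("C", ["g"]),
]

def first_of_seq(seq):
    out = set()
    for sym in seq:
        f = FIRST.get(sym, {sym})       # for a terminal a, FIRST(a) = {a}
        out |= f - {"ε"}
        if "ε" not in f:
            return out
    out.add("ε")
    return out

def build_table():
    """For each A -> alpha, put the production in M[A, a] for every a
    in FIRST(alpha) (and in M[A, b] for b in FOLLOW(A) if alpha can
    vanish).  A cell receiving two productions is an LL(1) conflict."""
    table, conflicts = {}, set()
    for head, body in PRODUCTIONS:
        sel = first_of_seq(body)
        terminals = (sel - {"ε"}) | (FOLLOW[head] if "ε" in sel else set())
        for t in terminals:
            if (head, t) in table:
                conflicts.add((head, t))
            else:
                table[(head, t)] = (head, body)
    return table, conflicts

table, conflicts = build_table()
# conflicts == {("S", "a")} : M[S, a] is multiply defined,
# confirming that the grammar is not LL(1).
```

Both S → aB and S → aC (and, via FIRST(S), the left-recursive S → Sd and S → Se as well) select the same cell M[S, a], which is exactly the conflict visible in the table above.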