University of sulaimani
College of science
Department of Computer Science
Compiler
Second Phase of the Compiler
Syntax Analyzer
2023-2024
Mzhda Hiwa Hama
Syntax Analyzer
The second phase of the compiler is syntax analysis
or parsing. The parser uses the first components of
the tokens produced by the lexical analyzer to create
a tree-like intermediate representation that depicts
the grammatical structure of the token stream.
Role of the Syntax Analyzer
• Constructs a tree (called a parse tree) to discover the
structure of a document/program. This parse tree is
used to guide translation.
• Syntactic error detection – reports to user where any
syntax error in the source code are.
• Recognizes sentences in a language.
• Represents the structure of the language.
Introduction to Parsing
• Parsing is the process of determining how a string of terminals
can be generated by a grammar. F → id | (E)
• A parser must be capable of constructing the tree in principle,
or else the translation cannot be guaranteed correct.
• A parser scans the input string from left to right and it makes
use of production rules for choosing appropriate derivation.
• Two parsing techniques:
• Top-down parsing
• Bottom-up parsing
Top-down parsing
• A parser is top-down if it generates a parse tree
starting from the root and precedes towards the
leaves.
• It is easier to understand and program manually
• A leftmost derivation is applied at each derivation step
Two kinds of top-down parsing techniques will be
studied
1. Recursive-Decent parser
2. Predictive parser
Bottom-Up Parsing
• A bottom-up parse corresponds to the construction
of a parse tree for an input string beginning at the
leaves (the bottom) and working up towards the
root (the top).
• Bottom-up parsing is more general than top-down
parsing.
Example
1- Top-Down Parsing by Recursive-Descent
• It reads characters from the input stream and matches
them with terminals from the grammar.
The operation involved are :
• Start from the “Start non-terminal” and select a rule
from the production rules (CFG)
• If it was not a correct rule, then backtrack and choose
another rule.
• If every production is unsuitable for string match, then
parse tree cannot be built, and syntax error is
reported.
Example1
• Consider grammar
S xPz
P yw | y
• For Token stream is: xyz
1- Select rule S xPz
S
x P z
Example1…cont’d
S xPz
2- Select rule P yw P yw | y
• Not correct S
x P z
y w
3- Select rule P y S
• Correct
x P z
y
Example2
• Consider the grammar
•ET+E|T
• T var | var * T
• Token stream is: var*var
• Start with top-level non-terminal E
• Try the rules for E in order E
Try E T + E
•Then try a rule for T var T
+
E
• Token matches var
• But + after var does not match input var
token *
Example2…cont’d
• Try T var* T
• Token matches.
• This will match but + after T will be unmatched
• Has exhausted the choices for E T + E E
T E
+
• Backtrack to choice for E var * T
Example2…cont’d
• Try E T
• Follow same steps as before for T
• Try a rule for T var
E
Token matches var
• But there is no other token after var T
var
• Try T var* T E
• Then try T var
• Succeed with the following parse tree T
var T
*
var
Notes
S
•Easy to implement by hand.
S a
•But does not always work …
S a
•Consider a production S S a S a
•S will get into an infinite loop.
S a
•This case is called left-recursion. .
.
•Recursive descent does not work in .
such cases.
Left Recursion
• A production of grammar is said to have left recursion if the
leftmost variable of its RHS is same as variable of its LHS.
• A grammar containing a production having left recursion is
called as Left Recursive Grammar.
Elimination of Left Recursion
If we have the left-recursive pair of productions-
A → Aα / β
Then, we can eliminate left recursion by replacing the pair of
productions with-
A → βA’
A’ → αA’ / ∈
Example of Elimination of Left Recursion
E → E + T|T
Eliminate immediate left recursion from the Grammar
Solution
Comparing E → E + T|T with A → A α |β
A = E, α = +T, β = T
A → A α |β is changed to A → βA′ and A′ → α A′|ε
A → βA′ means E → TE′
A′ → α A′|ε means E′ → +TE′|ε
Result
E → TE′
E′ → +TE′|ε
2- Top-Down Parsing by Predictive parser
• Consider Grammar
E → T E’
E’ → + T E’ | ε
T → F T’
T’ → * F T’ | ε
F → ( E ) | id
• String id + id * id
2- Top-Down Parsing by Predictive parser
• Predicts which production to use
• By looking at the next few tokens, using “lookahead”
variable
• No backtracking
• Predictive parsers accept LL(k) grammars
• L means “Left-to-Right” scan of input
• L means “Leftmost derivation”
• k means “predict based on k tokens of lookahead”
• In practice, LL(1) is used
• LL(k) grammar must be unambiguous
• LL(k) grammar must not include any left-recursion
LL(1) Parser
• input buffer : The string to be parsed
• Output: A production rule representing a step of the
derivation sequence (left-most derivation) of the string in the
input buffer.
• Stack: keeps the grammar symbols, initially contains $.
• The symbols in RHS of rule are pushed into the stack in
reverse order i.e. from right to left
• parsing table:
• a two-dimensional array
• each row is a non-terminal symbol
• each column is a terminal symbol or the special symbol $
• each entry holds a production rule.
20
LL(1) Parser Model
LL(1) Parser
Input token a + b
LL(1) Parser
Output
top
$
Stack
a + b $ Parsing table
A
B
C
Building Predictive Parser
Three steps
1. Compute FIRST and FOLLOW
2. Construct the predictive parsing table
3. Parse the input string
• Note: the grammar must be unambiguous and Left
Recursion must be eliminated
• First and follow are used to construct the predictive
parsing table
Computing First and Follow
• FIRST() is a set of the terminal symbols which
occur as first symbols in strings derived from .
Where is any string of grammar (terminals and
non-terminals).
• if derives to , then is also in FIRST() .
• FOLLOW(A) is the set of the terminals which occur
immediately after (follow) the non-terminal A. If the
strings derived from the starting symbol.
_ $ is in FOLLOW(A) if S A
• a terminal a is in FOLLOW(A) if S Aa
Computing FIRST
• FIRST(a) = {a} if a ∈ T
• FIRST(ε) = {ε}
• FIRST (X) for a non-terminal X
If there is production X→Y1Y2....Yk then
• FIRST(X) = FIRST(Y1) - {ε} .
• But, if ε ∈ FIRST(Y1),then add FIRST(Y2)-{ε}
• And, if ε ∈ FIRST(Y2),…
Example 1
• E → TE`
E`→ +TE` | ε
T → FT`
T`→ *FT` | ε
F → id | (E)
• FIRST(E) = FIRST(T) =FIRST(F)= { id, (}
• FIRST(E`) = {+, ε}
• FIRST(T) = FIRST(F)= { id, (}
• FIRST(T`) = {*, ε}
• FIRST(F) = { id, (}
Example 2
• type → simple
| ^ id
| array [ simple ] of type
• simple → integer
| char
| num dot num
• FIRST(simple) = { integer, char, num }
• FIRST(type) = { integer, char, num, ^, array }
Exercise 3
Find FIRST for the following grammar
• S ACB | CbB | Ba
A da | BC
Bg|ε
Ch|ε
• S Aa
A bdZ |eZ
Z cZ |adZ | ε
Computing FOLLOW
• If S is the start symbol $ is in FOLLOW(S)
• if A B is a production rule
everything in FIRST() is FOLLOW(B) except
• If ( A B is a production rule ) or
( A B is a production rule and is in FIRST() )
everything in FOLLOW(A) is in FOLLOW(B).
We apply these rules until nothing more can be
added to any follow set
Example 1
E → T E’
E’→ + T E’ | ε
T → F T’
T’→ * F T’ | ε
F → ( E ) | id
• FIRST(E) = FIRST(T) = FIRST(F) = {( , id}
FIRST(E’) = {+, ε}
FIRST(T’) = {*, ε}
• FOLLOW(E) = {) , $}
FOLLOW(E’) = {) , $}
FOLLOW(T) = {+, ), $}
FOLLOW(T’) = {+, ), $}
FOLLOW(F) = {*, +, ), $}
Example 2
E → T E'
E' → + T E‘ | - T E‘ | ε
T → F T'
T' → * F T‘ | / F T' | ε
F → num | id
First Follow
E {num , id} {$}
E’ {+, -, ε} {$}
T {num , id} {+, -, $}
T’ {*, /, ε} {+, -, $}
F {num , id} {*, /, +, -, $}
29
Notes
• To compute FIRST(A) you must look for A on a production's
left-hand side.
• To compute FOLLOW(A) you must look for A on a
production's right-hand side.
• FIRST sets are always sets of terminals (plus, perhaps
epsilon).
• FOLLOW sets are always sets of terminals (plus, perhaps $).
• Nonterminals are never in a FIRST or a FOLLOW set.
• epsilon is never in a FOLLOW set.
Constructing the Parse Table
• Parse table summarizes the applicable RHS for each
terminal/non-terminal combination.
• Construct a parsing table T for CFG :
• For each production X → α
• Add → α to the X row for each symbol in FIRST(α)
• If α is nullable, add → α for each symbol in
FOLLOW(X)
• Entry for [S, $] is ACCEPT
• All other undefined entries of the parsing table are
error entries.
Parse table Example (1)
E → T E’
E’→ + T E’ | ε
T → F T’
T’→ * F T’ | ε
F → ( E ) | id
First Follow
E {( , id} {), $}
E’ {+, ε} {), $}
T {( , id} {+, ), $}
T’ {*, ε} {+, ), $}
F {( , id} {*, +, ), $}
32
Parse table Example 1
• Create table with:
• Put each terminals in the columns
• Put each non-terminals to rows
+ * ( ) id $
E
E’
T
T’
F
33
Parse table Example 1
• For production E → T E’
• Add E → T E’ to the E row for each symbol in FIRST(E)
+ * ( ) id $
E E → T E’ E → T E’
E’
T
T’
F
Parse table Example 1
• For production E’→ + T E’ | ε
• E’→ + T E’, Add → + T E’ to the E’ row for each symbol
in FIRST(E’)
• E’ → ε, add → ε for each symbol in FOLLOW(E’)
+ * ( ) id $
E E → T E’ E → T E’
E’ E’→ + T E’ E’ → ε E’ → ε
T
T’
F
35
Parse table Example 1
• For production T → F T’
• Add T → F T’ to the T row for each symbol in FIRST(T)
+ * ( ) id $
E E → T E’ E → T E’
E’ E’→ + T E’ E’ → ε E’ → ε
T T → F T’ T → F T’
T’
F
36
Parse table Example 1
• For production T’→ * F T’ | ε
• Add T’→ * F T’ to the T’ row for each symbol in FIRST(T’)
• Add T’ → ε for each symbol in FOLLOW(T’)
+ * ( ) id $
E E → T E’ E → T E’
E’ E’→ + T E’ E’ → ε E’ → ε
T T → F T’ T → F T’
T’ T’ → ε T’→ * F T’ T’ → ε T’ → ε
F
37
Parse table Example 1
• For production F → ( E ) | id
• Add F → ( E ) to the F row and symbol (
• Add F → id to the F row and symbol id
+ * ( ) id $
E E → T E’ E → T E’
E’ E’→ + T E’ E’ → ε E’ → ε
T T → F T’ T → F T’
T’ T’ → ε T’→ * F T’ T’ → ε T’ → ε
F F→(E) F → id
38
Parser Actions
• The symbol at the top of the stack (say X) and the current symbol in
the input string (say a) determine the parser action.
• There are four possible parser actions.
1. If X and a are $ parser halts (successful completion)
2. If X and a are the same terminal symbol (different from $)
parser pops X from the stack, and moves the next symbol in the
input buffer.
3. If X is a non-terminal
parser looks at the parsing table entry M[X,a]. If M[X,a] holds a
production rule XY1Y2...Yk, it pops X from the stack and pushes
Yk,Yk-1,...,Y1 into the stack. The parser also outputs the production rule
XY1Y2...Yk to represent a step of the derivation.
4. none of the above error
• all empty entries in the parsing table are errors.
• If X is a terminal symbol different from a, this is also an error case.
LL(1)Parser Example
S aBa
B bB |
stack input output
$S abba$ S aBa
$aBa abba$
$aB bba$ B bB
$aBb bba$
$aB ba$ B bB
$aBb ba$
$aB a$ B
$a a$
$ $ accept, successful completion
LL(1) Parser – Example2
E TE’
E’ +TE’ |
T FT’
T’ *FT’ |
F (E) | id
LL(1) Parser – Example2…cont’d
Input is id+id
stack input output
$E id+id$ E TE’
$E’T id+id$ T FT ’
$E’ T ’F id+id$ F id
$ E’ T ’id id+id$
$ E’ T ’ +id$ T’
$ E’ +id$ E’ +FT ’
$ E’ T+ +id$
$ E’ T id$ T FT ’
$ E’ T ’ F id$ F id
$ E’ T ’id id$
$ E’ T ’ $ T’
$ E’ $ E’
$ $ accept