0% found this document useful (0 votes)
30 views60 pages

Lecture 6

The document discusses syntax analysis and parsing, focusing on ambiguous grammars and their resolution. It explains how to eliminate ambiguity through disambiguation rules and addresses issues like left recursion, cycles, and left factoring in grammars. Additionally, it covers top-down parsing techniques, including recursive descent and predictive parsing, along with the computation of the FIRST function for grammar analysis.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
30 views60 pages

Lecture 6

The document discusses syntax analysis and parsing, focusing on ambiguous grammars and their resolution. It explains how to eliminate ambiguity through disambiguation rules and addresses issues like left recursion, cycles, and left factoring in grammars. Additionally, it covers top-down parsing techniques, including recursive descent and predictive parsing, along with the computation of the FIRST function for grammar analysis.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 60

SYNTAX ANALYSIS

OR
PARSING
Lecture 06
AMBIGUOUS GRAMMAR

2
AMBIGUOUS GRAMMAR

More than one Parse Tree for some sentence.


⚫ The grammar for a programming language may be ambiguous
⚫ Need to modify it for parsing.

Also: Grammar may be left recursive.


Need to modify it for parsing.

3
ELIMINATION OF AMBIGUITY

Ambiguous

A Grammar is ambiguous if there are multiple parse trees


for the same sentence.

Disambiguation

Express Preference for one parse tree over others


⚫ Add disambiguating rule into the grammar
4
RESOLVING PROBLEMS: AMBIGUOUS
GRAMMARS
Consider the following grammar segment:
stmt → if expr then stmt
| if expr then stmt else stmt
| other (any other statement)
If E1 then S1 else if E2 then S2 else S3

simple parse tree:

stmt

if expr then stmt else stmt

E1 S1 expr then stmt else stmt


if

E2 S2 S3 5
EXAMPLE : WHAT HAPPENS WITH
THIS STRING?
If E1 then if E2 then S1 else S2
How is this parsed ?

if E1 then if E1 then
if E2 then if E2 then
S1 vs. S1
else else
S2 S2

6
PARSE TREES: IF E1 THEN IF E2 THEN S1
ELSE S2
Form 1: stmt

if expr then stmt

E1 then stmt else stmt


if expr

E2 S1 S2

Form 2:
stmt

if expr then stmt else stmt

E1 expr then stmt S2


if
7
E2 S1
REMOVING AMBIGUITY

Take Original Grammar:


stmt → if expr then stmt
| if expr then stmt else stmt
| other (any other statement)

Rule: Match each else with the closest previous


unmatched then.
Revise to remove ambiguity:
stmt → matched_stmt | unmatched_stmt
matched_stmt → if expr then matched_stmt else matched_stmt | other
unmatched_stmt → if expr then stmt
8
| if expr then matched_stmt else unmatched_stmt
RESOLVING DIFFICULTIES : LEFT
RECURSION
A left recursive grammar has rules that support the
+
derivation : A ⇒ Aα, for some α.

Top-Down parsing can’t reconcile this type of grammar,


since it could consistently make choice which wouldn’t
allow termination.
A ⇒ Aα ⇒ Aαα ⇒ Aααα … etc. A→ Aα | β

Take left recursive grammar:


A → Aα | β
To the following:
A → βA’ 9
A’ → αA’ | ∈
WHY IS LEFT RECURSION A PROBLEM
?
Consider: Derive : id + id + id
E→E+T | T E⇒E+T⇒
T→T*F | F
F → ( E ) | id

How can left recursion be removed ?


E→E+T | T What does this generate?
E⇒E+T⇒T+T
E⇒E+T⇒E+T+T⇒T+T+T

How does this build strings ? 10

What does each string have to start with ?


RESOLVING DIFFICULTIES : LEFT
RECURSION (2)
Informal Discussion:

Take all productions for A and order as:


A → Aα1 | Aα2 | … | Aαm | β1 | β2 | … | βn
Where no βi begins with A.
Now apply concepts of previous slide:
A → β1A’ | β2A’ | … | βnA’
A’ → α1A’ | α2A’ | … | αm A’ | ∈

For our example:


E→E+T | T E → TE’
E’ → + TE’ | ∈
T→T*F | F T → FT’
T’ → * FT’ | ∈
F → ( E ) | id F → ( E ) | id
11
RESOLVING DIFFICULTIES : LEFT
RECURSION (3)
Problem: If left recursion is two-or-more levels deep,
this isn’t enough
S → Aa | b
S ⇒ Aa ⇒ Sda
A → Ac | Sd | ∈
Algorithm:
Input: Grammar G with ordered Non-Terminals A1, ..., An
Output: An equivalent grammar with no left recursion
1. Arrange the non-terminals in some order A1=start NT,A2,…An
2. for i := 1 to n do begin
for j := 1 to i – 1 do begin
replace each production of the form Ai → Ajγ
by the productions Ai → δ1γ | δ2γ | … | δkγ
where Aj → δ1|δ2|…|δk are all current Aj productions;
end 12
eliminate the immediate left recursion among Ai productions
end
USING THE
ALGORITHM

Apply the algorithm to: A1 → A2a | b| ∈


A2 → A2c | A1d
i=1
For A1 there is no left recursion

i=2
for j=1 to 1 do
Take productions: A2 → A1γ and replace with
A2 → δ1 γ | δ2 γ | … | δk γ|
where A1→ δ1 | δ2 | … | δk are A1 productions
in our case A2 → A1d becomes A2 → A2ad | bd | d

What’s left: A1→ A2a | b | ∈ 13


Are we done ?
A2 → A2 c | A2 ad | bd | d
USING THE ALGORITHM
(2)

No ! We must still remove A2 left recursion !


A1→ A2a | b | ∈
A2 → A2 c | A2 ad | bd | d
Recall: A1→ A2a | b | ∈
A → Aα1 | Aα2 | … | Aαm | β1 | β2 | … | βn
A2 → bdA2’ | dA2’

A → β1A’ | β2A’ | … | βnA’ A2’ → c A2’ | adA2’ | ∈


A’ → α1A’ | α2A’ | … | αm A’ | ∈

Apply to above case. What do you get ? 14


REMOVING DIFFICULTIES :
∈-MOVES

Transformation: In order to remove A→ ∈ find all


rules of the form B→ uAv and add the rule B→ uv to
the grammar G.
Why does this work ? A is Grammar ∈-free if:
Examples: 1. It has no ∈-production or
E → TE’ 2. There is exactly one ∈-production
E’ → + TE’ | ∈
S → ∈ and then the start symbol S
T → FT’
T’ → * FT’ | ∈ does not appear on the right side of
any production.
F → ( E ) | id

A1 → A2 a | b
A2 → bd A2’ | A2’
15
A2’ → c A2’ | bd A2’ | ∈
REMOVING DIFFICULTIES :
CYCLES

How would cycles be removed ?


Make sure every production is adding some terminal(s) (except
a single ∈ -production in the start NT)…
e.g.
S → SS | ( S ) | ∈ Transform to:
Has a cycle: S ⇒ SS ⇒ S S→S(S)|(S)|∈

S→∈

16
REMOVING DIFFICULTIES : LEFT
FACTORING
Problem : Uncertain which of 2 rules to choose:
stmt → if expr then stmt else stmt
| if expr then stmt

When do you know which one is valid ?


What’s the general form of stmt ?
A → αβ1 | αβ2 α : if expr then stmt
β1: else stmt β2 : ∈

Transform to: EXAMPLE:


A → α A’ stmt → if expr then stmt rest
17
A’ → β1 | β2 rest → else stmt | ∈
TOP DOWN PARSING
Find a left-most derivation
Find (build) a parse tree
Start building from the root and work down…
As we search for a derivation
⚫ Must make choices:
Which rule to use
Where to use it

May run into problems!!

18
TOP-DOWN PARSING

Recursive-Descent Parsing
Backtracking is needed (If a choice of a production rule does not
work, we backtrack to try other alternatives.)
It is a general parsing technique, but not widely used.
Not efficient
Predictive Parsing
no backtracking
efficient
needs a special form of grammars (LL(1) grammars).
Recursive Predictive Parsing is a special form of Recursive
Descent parsing without backtracking.
Non-Recursive (Table Driven) Predictive Parser is also known as
LL(1) parser. 19
RECURSIVE DESCENT PARSING
(BACKTRACKING)

20
RECURSIVE DESCENT PARSING
(BACKTRACKING)

21
RECURSIVE DESCENT PARSING
(BACKTRACKING)

22
RECURSIVE DESCENT PARSING
(BACKTRACKING)

23
RECURSIVE DESCENT PARSING
(BACKTRACKING)

24
RECURSIVE DESCENT PARSING
(BACKTRACKING)

25
RECURSIVE DESCENT PARSING
(BACKTRACKING)

26
RECURSIVE DESCENT PARSING
(BACKTRACKING)

27
RECURSIVE DESCENT PARSING
(BACKTRACKING)

28
RECURSIVE DESCENT PARSING
(BACKTRACKING)

29
RECURSIVE DESCENT PARSING
(BACKTRACKING)

30
RECURSIVE DESCENT PARSING
(BACKTRACKING)

31
RECURSIVE DESCENT PARSING
(BACKTRACKING)

32
RECURSIVE DESCENT PARSING
(BACKTRACKING)

33
RECURSIVE DESCENT PARSING
(BACKTRACKING)

34
RECURSIVE DESCENT PARSING
(BACKTRACKING)

35
RECURSIVE DESCENT PARSING
(BACKTRACKING)

36
Successfully parsed!!
RECURSIVE-DESCENT PARSING
ALGORITHM
A recursive-descent parsing program consists of a set of procedures –
one for each non-terminal
Execution begins with the procedure for the start symbol
⚫ Announces success if the procedure body scans the entire input

void A(){
for (j=1 to t){ /* assume there is t number of A-productions */
Choose a A-production, Aj X1X2…Xk;
for (i=1 to k){
if (Xi is a non-terminal)
call procedure Xi();
else if (Xi equals the current input symbol a)
advance the input to the next symbol;
else backtrack in input and reset the pointer
} 37
}
}
PREDICTIVE PARSER
When re-writing a non-terminal in a derivation step, a
predictive parser can uniquely choose a production rule by
just looking the current symbol in the input string.

A → α1 | ... | αn input: ... a .......

current token

38
PREDICTIVE PARSER (EXAMPLE)

stmt → if ...... |
while ...... |
begin ...... |
for .....

When we are trying to write the non-terminal stmt, if the current


token is if we have to choose first production rule.
When we are trying to write the non-terminal stmt, we can uniquely
choose the production rule by just looking the current token.
We eliminate the left recursion in the grammar, and left factor it. But
it may not be suitable for predictive parsing (not LL(1) grammar).

39
RECURSIVE PREDICTIVE PARSING
Each non-terminal corresponds to a procedure.

Ex: A → aBb (This is only the production rule for A)

proc A {
- match the current token with a, and move to the next token;
- call ‘B’;
- match the current token with b, and move to the next token;
}

40
RECURSIVE PREDICTIVE PARSING
(CONT.)
A → aBb | bAB

proc A {
case of the current token {
‘a’: - match the current token with a, and move to the next token;
- call ‘B’;
- match the current token with b, and move to the next token;

‘b’: - match the current token with b, and move to the next token;
- call ‘A’;
- call ‘B’;
}
}

41
RECURSIVE PREDICTIVE PARSING
(CONT.)
When to apply ε-productions.

A → aA | bB | ε

If all other productions fail, we should apply an


ε-production. For example, if the current token is not a or
b, we may apply the ε-production.
Most correct choice: We should apply an ε-production
for a non-terminal A when the current token is in the
follow set of A (which terminals can follow A in the
sentential forms). 42
RECURSIVE PREDICTIVE PARSING
(EXAMPLE)
A → aBe | cBd | C
B → bB | ε
C→f
proc C { match the current token with f,
proc A { and move to the next token; }
case of the current token {
a: - match the current token with a,
and move to the next token; proc B {
- call B; case of the current token {
- match the current token with e, b: - match the current token with b,
and move to the next token; and move to the next token;
c: - match the current token with c, - call B
and move to the next token; e,d: do nothing
- call B; }
- match the current token with d, }
and move to the next token;
f: - call C
}
}

follow set of B

43
first set of C
FIRST FUNCTION

• FIRST (E) =
• FIRST (E’) =
• FIRST (T) = 44
• FIRST (T’) =
• FIRST (F) =
COMPUTING THE FIRST FUNCTION

45
TO COMPUTE THE FIRST(X1X2X3...XN)

46
FIRST - EXAMPLE

P i |c|nTS
Q P|aS|bScST
R b|ε
S c|Rn|ε
T RSq
FIRST(P) =
FIRST(Q) =
FIRST(R) =
FIRST(S) =
FIRST(T) =
47
FIRST - EXAMPLE

P i |c|nTS
Q P|aS|bScST
R b|ε
S c|Rn|ε
T RSq FIRST(P) = {i,c,n}
FIRST(Q) = {i,c,n,a,b}
FIRST(R) = {b,ε}
FIRST(S) = {c,b,n,ε}
FIRST(T) = {b,c,n,q}

48
FIRST - EXAMPLE

S aSe|STS
T RSe|Q
R rSr|ε
Q ST|ε FIRST(S) =
FIRST(R) =
FIRST(T) =
FIRST(Q) =

49
FIRST - EXAMPLE

S aSe|STS
T RSe|Q
R rSr|ε
Q ST|ε FIRST(S) = {a}
FIRST(R) = {r, ε}
FIRST(T) = {r, a, ε}
FIRST(Q) = {a, ε}

50
FOLLOW SETS
FOLLOW(A) is the set of terminals (including end
marker of input - $) that may follow non-terminal A in
some sentential form.
FOLLOW(A) = {c | S ⇒+ …Ac…} ∪ {$} if S ⇒+
…A
For example, consider L ⇒+ (())(L)L
Both ‘)’ and end of file can follow L
NOTE: ε is never in FOLLOW sets

51
COMPUTING FOLLOW(A)
1. If A is start symbol, put $ in FOLLOW(A)
2. Productions of the form B α A β,
Add FIRST(β) – {ε} to FOLLOW(A)
3. Productions of the form B α A or
B α A β where β ⇒* ε
Add FOLLOW(B) to FOLLOW(A)

52
EXAMPLE
FOLLOW(E) = {$}
E T E′ FOLLOW(E′) =
E′ + T E′ | ε FOLLOW(T) =
T F T′ FOLLOW(T′) =
T′ * F T ′ | ε FOLLOW(F) =
F ( E ) | id

FIRST(E) = {(, id} Using rule #1


FIRST(E′) = {+, ε}
FIRST(T) = {(, id} 1. If A is start symbol, put $ in FOLLOW(A)
FIRST(T′) = {*, ε}
FIRST(F) = {(, id}} 53
Assume the first non-terminal is the start symbol
EXAMPLE
E T E′ FOLLOW(E) = {$, )}
E′ + T E′ | ε FOLLOW(E′) =
T F T′ FOLLOW(T) = {+}
T′ * F T ′ | ε FOLLOW(T′) =
F ( E ) | id FOLLOW(F) = {*}

FIRST(E) = {(, id}


FIRST(E′) = {+, ε}
FIRST(T) = {(, id}
FIRST(T′) = {*, ε}
Using rule #2
FIRST(F) = {(, id}}
2. Productions of the form B α A β, 54
Add FIRST(β) – {ε} to FOLLOW(A)
EXAMPLE
E T E′ FOLLOW(E) = {$, )}
E′ + T E′ | ε FOLLOW(E′) = FOLLOW(E)
= {$, )}
T F T′
FOLLOW(T) = {+} ∪ FOLLOW(E′)
T′ * F T ′ | ε
= {+, $, )}
F ( E ) | id
FOLLOW(T′) = FOLLOW(T)
= {+, $, )}
FIRST(E) = {(, id}
FOLLOW(F) = {*} ∪ FOLLOW(T′)
FIRST(E′) = {+, ε} = {*, +, $, )}
FIRST(T) = {(, id}
FIRST(T′) = {*, ε} Using rule #3
FIRST(F) = {(, id}}
3. Productions of the form B α A or
55
B α A β where β ⇒* ε
Add FOLLOW(B) to FOLLOW(A)
EXAMPLE
S ( A) | ε
A TE
E &TE|ε
T (A)|a|b|c

⚫ FIRST(T) =
⚫ FIRST(E) =
⚫ FIRST(A) =
⚫ FIRST(S) =

FOLLOW(S) =
FOLLOW(A) =
FOLLOW(E) =
FOLLOW(T) =
56
EXAMPLE
S ( A) | ε
A TE
E &TE|ε
T (A)|a|b|c

FIRST(T) = {(,a,b,c}
FIRST(E) = {&, ε }
FIRST(A) = {(,a,b,c}
FIRST(S) = {(, ε}

FOLLOW(S) = {$}
FOLLOW(A) = { ) }
FOLLOW(E) = FOLLOW(A) = { ) } 57

FOLLOW(T) = FIRST(E) ∪ FOLLOW(E) = {&, )}


EXAMPLE
S aSe|B FOLLOW(C) =
B bBCf|C
C cCg|d| ε FOLLOW(B) =

FIRST(C) = FOLLOW(S) = {$}


FIRST(B) =
FIRST(S) =
Assume the first non-terminal is the start symbol
1. If A is start symbol, put $ in FOLLOW(A)
2. Productions of the form B α A β,
Add FIRST(β) – {ε} to FOLLOW(A)
3. Productions of the form B α A or
58
B α A β where β ⇒* ε
Add FOLLOW(B) to FOLLOW(A)
EXAMPLE
S aSe|B FOLLOW(C) =
B bBCf|C
{f,g} ∪ FOLLOW(B)
C cCg|d| ε
= {c,d,e,f,g,$}
FIRST(C) = {c,d,ε} FOLLOW(B) =
FIRST(B) = {b,c,d,ε} {c,d} ∪ FOLLOW(S)
FIRST(S) = {a,b,c,d,ε}
= {c,d,e,$}
FOLLOW(S) = { }
$, e

59
QUESTIONS ?

60

You might also like