0% found this document useful (0 votes)
21 views53 pages

Chapter 4 Intro - To - Parsing

The document discusses the fundamentals of parsing in compiler design, focusing on formal languages, particularly regular and context-free languages. It explains the role of parsers in generating parse trees from token sequences and highlights the importance of context-free grammars (CFGs) in representing programming language constructs. Additionally, it addresses issues of ambiguity in grammars and methods for resolving such ambiguities to ensure clear program interpretation.

Uploaded by

ahmed301saeed
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
21 views53 pages

Chapter 4 Intro - To - Parsing

The document discusses the fundamentals of parsing in compiler design, focusing on formal languages, particularly regular and context-free languages. It explains the role of parsers in generating parse trees from token sequences and highlights the importance of context-free grammars (CFGs) in representing programming language constructs. Additionally, it addresses issues of ambiguity in grammars and methods for resolving such ambiguities to ensure clear program interpretation.

Uploaded by

ahmed301saeed
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 53

Intro to Compiler

Introduction to Parsing

Chapter 4

1
Languages and Automata

• Formal languages are very important in CS


– Especially in programming languages

• Regular languages
– The weakest formal languages widely used
– Many applications

• We will also study context-free languages

2
Beyond Regular Languages

• Many languages are not regular

• Strings of balanced parentheses are not regular:

 ) | i  0
( i i

3
What Can Regular Languages Express?

A  B

0 0

4
What Can Regular Languages Express?

• Languages requiring counting modulo a fixed


integer

• Intuition: A finite automaton that runs long


enough must repeat states

• Finite automaton can’t remember # of times it


has visited a particular state

5
The Functionality of the Parser

• Input: sequence of tokens from lexer

• Output: parse tree of the program


(But some parsers never produce a parse tree . . .)

6
Example

• Cool
if x =y then 1 else 2 fi
• Parser input
IF ID =ID THEN INT ELSE INT FI
• Parser output
IF-THEN-ELSE
= INT INT

ID ID
7
Comparison with Lexical Analysis

Phase Input Output

Lexer String of String of tokens


characters

Parser String of tokens Parse tree


(may be implicit)

8
The Role of the Parser

• Not all strings of tokens are programs . . .


• . . . parser must distinguish between valid and
invalid strings of tokens

• We need
– A language for describing valid strings of tokens
– A method for distinguishing valid from invalid
strings of tokens

9
Chomsky Hierarchy

1 Unrestricted A  

2 Context-Sensitive | LHS |  | RHS |

3 Context-Free |LHS | = 1

4 Regular |RHS| = 1 or 2 ,
A  a | aB, or
A  a | Ba

10
Context-Free Grammars

• Programming language constructs have


recursive structure

• An STMT is
if EXPR then STMT else STMT
while EXPR do STMT end

• Context-free grammars are a natural notation
for this recursive structure

11
CFGs (Cont.)

• A CFG consists of
– A set of terminals T
– A set of non-terminals N
– A start symbol S (a non-terminal)
– A set of productions

12
Notational Conventions

• In these lecture notes


– Non-terminals are written upper-case
– Terminals are written lower-case
– The start symbol is the left-hand side of the first
production

13
Example of CFGs

Simple arithmetic expressions:

E  E E
| E+E
| E
| id
14
The Language of a CFG

Read productions as replacement rules:

X Y1…Yn

Means X can be replaced by Y1…Yn

15
Key Idea

1. Begin with a string consisting of the start


symbol “S”
2. Replace any non-terminal X in the string by a
the right-hand side of some production

X Y1…Yn

3. Repeat (2) until there are no non-terminals in


the string
16
The Language of a CFG (Cont.)

More formally, write

if there is a production

17
The Language of a CFG (Cont.)

Write

if

in 0 or more steps

18
The Language of a CFG

Let G be a context-free grammar with start symbol S.


Then the language L(G) of G is:

The sentential forms SF(G) of G is:

 X1 … Xn  S X1 … Xn and every Xi is a terminal or non-terminal

Therefore: L(G)  SF(G)


19
Terminals

• Terminals are so-called because there are no


rules for replacing them

• Once generated, terminals are permanent

• Terminals ought to be tokens of the language

20
Examples

L(G) is the language of CFG G

Strings of balanced parentheses ( )


i i
| i  0

Two grammars:
S  (S) S  (S)
OR
S   | 

21
Arithmetic Example

Simple arithmetic expressions:

E  E+E | E  E | (E) | id
Some elements of the language:
id id + id
(id) id  id
(id)  id id  (id)
22
Notes

The idea of a CFG is a big step. But:

• Membership in a language is “yes” or “no”; also


need parse tree of the input

• Must handle errors gracefully

• Need an implementation of CFG’s (e.g., bison)

23
Derivations and Parse Trees

A derivation is a sequence of productions


S …… ………

A derivation can be drawn as a tree


– Start symbol is the tree’s root
– For a production X  Y1…Yn add children Y1…Yn to
node X

24
Derivation Example

• Grammar

E  E+E | E  E | (E) | id
• String
id  id + id

25
Derivation Example (Cont.)

E
E
 E+E
E + E
 E  E+E
 id  E + E E * E id
 id id + E
id id
 id id + id
26
Derivation in Detail (1)

27
Derivation in Detail (2)

E + E
E
 E+E

28
Derivation in Detail (3)

E E + E

 E+E
E * E
 E  E+E

29
Derivation in Detail (4)

E
E + E
 E+E
 E  E+E E * E
 id  E + E
id

30
Derivation in Detail (5)

E
E
 E+E E + E

 E  E+E
E * E
 id  E + E
 id id + E id id

31
Derivation in Detail (6)

E
E
 E+E
E + E
 E  E+E
 id  E + E E * E id
 id id + E
id id
 id id + id
32
Notes on Derivations

• A parse tree has


– Terminals at the leaves
– Non-terminals at the interior nodes

• An in-order traversal of the leaves is the


original input

• The parse tree shows the association of


operations, the input string does not

33
Left-most and Right-most Derivations

• The example was a left-


most derivation E
– At each step, replaced the
left-most non-terminal  E+E
• There is an equivalent
 E+id
notion of a right-most  E  E + id
derivation
 E id + id
 id id + id
34
Right-most Derivation in Detail (1)

35
Right-most Derivation in Detail (2)

E + E
E
 E+E

36
Right-most Derivation in Detail (3)

E E + E

 E+E
id
 E+id

37
Right-most Derivation in Detail (4)

E
E + E
 E+E
 E+id E * E id
 E  E + id

38
Right-most Derivation in Detail (5)

E
E
 E+E E + E

 E+id
E * E id
 E  E + id
 E id + id id

39
Right-most Derivation in Detail (6)

E
E
 E+E
E + E
 E+id
 E  E + id E * E id
 E id + id
id id
 id id + id
40
Derivations and Parse Trees

• Note that right-most and left-most


derivations have the same parse tree

• The difference is the order in which branches


are added

41
Summary of Derivations

• We are not just interested in whether


s  L(G)
– We need a parse tree for s

• A derivation defines a parse tree


– But one parse tree may have many derivations

• Left-most and right-most derivations are


important in parser implementation

42
Ambiguity

• Grammar E  E + E | E * E | ( E ) | id

• String id * id + id

43
Ambiguity (Cont.)

This string has two parse trees


E E

E + E E * E

E E id id E + E
*
id id id id

44
Ambiguity (Cont.)

• A grammar is ambiguous if it has more than


one parse tree for some string
– Equivalently, there is more than one right-most or
left-most derivation for some string

• Ambiguity is BAD
– Leaves meaning of some programs ill-defined

45
Dealing with Ambiguity

• There are several ways to handle ambiguity

• Most direct method is to rewrite grammar


unambiguously
EE+T|T
TT*F|F
F  ( E ) | id

• Enforces precedence of * over +


46
Ambiguity in Arithmetic Expressions

• Recall the grammar


E  E + E | E * E | ( E ) | id
• The string id * id + id has two parse trees:
E E

E + E E * E
E * E id id E + E
id id id id
47
Ambiguity in Arithmetic Expressions

• Recall the grammar


E  E + E | E * E | ( E ) | id
• The string id * id + id has two parse trees:
E E

E + E E * E
E * E id id E + E
id id id id
48
Ambiguity: The Dangling Else

• Consider the grammar


S  if E then S
| if E then S else S
| OTHER

• This grammar is also ambiguous

49
The Dangling Else: Example

• The expression
if E1 then if E2 then S else S
has two parse trees
S S

if E1 then S else S if E1 then S

if E2 then S if E2 then S else S

• Typically we want the second form


50
The Dangling Else: A Fix

• else matches the closest unmatched then


• We can describe this in the grammar

S  MIF /* all then are matched */


| UIF /* some then is unmatched */
MIF  if E then MIF else MIF
| OTHER
UIF  if E then S
| if E then MIF else UIF
• Describes the same set of strings
51
The Dangling Else: Example Revisited

• The expression if E1 then if E2 then S else S


S S

UIF UIF

if E1 then MIF elseUIF if E1 then S


MIF
if E2 then S
if E2 then MIF else MIF
• Not valid because the • A valid parse tree
then expression is not (for a UIF)
a MIF
Ambiguity

• No general techniques for handling ambiguity

• Impossible to convert automatically an


ambiguous grammar to an unambiguous one

• Used with care, ambiguity can simplify the


grammar
– Sometimes allows more natural definitions
– We need disambiguation mechanisms

53

You might also like