G52Cmp Compilers: Syntax Analysis
Syntax Analysis
Lecture 5
Learning objectives
Syntax Analysis
[Figure: phases of a compiler: Source Program → Lexical Analyser → Syntax Analyser → Semantic Analyser → Intermediate Code Generator → Code Optimiser → Code Generator → Target Program, with the Symbol Table Manager and the Error Handler interacting with every phase.]
Grouping of Tokens
Where is the Syntax Analysis?
if (b == 0) a = b;
Lexical analysis produces the token stream: if ( b == 0 ) a = b ;
Syntax analysis groups the tokens into an abstract syntax tree (or a parse tree, without token names):
[Figure: tree with if at the root; its children are the comparison == (with leaves b and 0) and the assignment = (with leaves a and b). // inorder traversal of each operator subtree recovers the token order]
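As a minimal sketch (illustrative Python, not from the lecture), the tree above can be represented with nested tuples; an inorder traversal of an operator node then recovers the original token order:

# Illustrative only: each operator node is (operator, left, right); leaves are token strings.
cond   = ("==", "b", "0")        # condition subtree of the if
assign = ("=",  "a", "b")        # assignment subtree of the if
ast    = ("if", cond, assign)    # root of the abstract syntax tree

def inorder(node):
    # Visit the left subtree, then the operator, then the right subtree.
    if isinstance(node, str):
        return [node]
    op, left, right = node
    return inorder(left) + [op] + inorder(right)

print(inorder(cond))    # ['b', '==', '0']
print(inorder(assign))  # ['a', '=', 'b']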
The Role of the Parser
[Figure: the source program enters the Lexical Analyzer, which hands a token to the Parser on each "get next token" request; the Parser builds the syntax tree for the Semantic Analyzer, which produces the intermediate representation; all phases consult the symbol table.]
Parsing Analogy (for English language)
• Syntax analysis for natural languages
• Recognize whether a sentence is grammatically correct
• Identify the function of each word
“I gave him the book”
[Figure: parse tree for the sentence “I gave him the book”; for example, “the book” is parsed as article + noun within the sentence.]
Parsing Overview
• Goal – determine if the input token stream satisfies
the syntax of the program
• What do we need to do this?
– An expressive way to describe the syntax
– A mechanism that determines if the input token
stream satisfies the syntax description
• For lexical analysis
– Regular expressions describe tokens
– Finite automata provide the mechanism to recognize tokens
in the input stream
Use Regular Expressions
• REs can expressively describe tokens
– Easy to implement via DFAs
• So just use them to describe the syntax of a
programming language ???
– NO! They are not powerful enough to express non-trivial
syntax. Example: nested constructs (blocks, expressions,
statements), such as balanced braces:
{ { { { {
{{} {} {{} { }}}
...
} } } } }
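As an illustration (a small Python sketch, not from the slides): recognizing balanced braces needs a counter that can grow without bound, which a finite automaton with a fixed set of states cannot provide.

def braces_balanced(s):
    depth = 0
    for c in s:
        if c == '{':
            depth += 1
        elif c == '}':
            depth -= 1
            if depth < 0:        # a '}' appeared with no matching '{'
                return False
    return depth == 0            # every '{' must have been closed

print(braces_balanced("{{} {} {{} { }}}"))   # True
print(braces_balanced("{{}"))                # False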
Context-Free Grammars
• Consist of 4 components/properties:
– Terminal symbols = tokens
– Nonterminal symbols = syntactic variables
– Start symbol S = a special non-terminal
– Productions of the form LHS → RHS
• LHS = single non-terminal
• RHS = string of terminals and non-terminals
• Specify how non-terminals may be expanded
Example production rules (grammar):
S → aSa
S → T
T → bTb
T → ε
• Language generated by a grammar is the set of strings of
terminals derived from the start symbol by repeatedly
applying the productions
– L(G) = language generated by grammar G
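For the example grammar above, L(G) = { a^n b^2m a^n : n, m ≥ 0 }. A minimal membership test for this particular language (illustrative Python, not from the slides):

def in_language(s):
    # Strings derived from S -> aSa | T, T -> bTb | epsilon have the shape a^n b^2m a^n.
    i, j = 0, len(s) - 1
    while i < j and s[i] == 'a' and s[j] == 'a':   # peel matching outer a's (uses of S -> aSa)
        i += 1
        j -= 1
    middle = s[i:j + 1]
    # The remainder must come from T: an even number of b's (T -> bTb, finally T -> epsilon).
    return all(c == 'b' for c in middle) and len(middle) % 2 == 0

print(in_language("abba"))   # True:  S => aSa => aTa => abTba => abba
print(in_language("ab"))     # False: not derivable from S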
Context-Free Grammars continued
Notational Conventions
To avoid always having to state that “these are the
terminals”, “these are the nonterminals”, and so on, we
shall employ the following notational conventions with
regard to grammars throughout the remainder of this
subject.
Terminals: id, +, -, *, /, (, )
Nonterminals: expr, op
Productions:
expr → expr op expr
expr → ( expr )
expr → - expr
expr → id
op → + | - | * | /
The start symbol: expr
Notational Conventions continued
1. These symbols are terminals:
i) Lower-case letters early in the alphabet such as a, b, c.
ii) Operator symbols such as +, -, etc.
iii) Punctuation symbols such as parentheses, comma, etc.
iv) The digits 0, 1, . . . , 9.
v) Boldface strings such as id or if.
Notational Conventions continued
Notational Conventions continued
Derivations
E ⇒ - E ⇒ - ( E ) ⇒ - ( id )
We write E ⇒* - ( id ) (derives in zero or more steps) and E ⇒+ - ( id ) (derives in one or more steps).
Example
• Grammar for balanced-parentheses language
– S → ( S ) S
– S → ε // Why is the final S required?
• 1 non-terminal: S
• 2 terminals: “(”, “)”
• Start symbol: S
• 2 productions
• If the grammar accepts a string, there is a derivation of that
string using the productions. For example:
– S ⇒ (S)S ⇒ (S) ⇒ ((S)S) ⇒ (()S) ⇒ (())
– S ⇒ (S)S ⇒ ((S)S)S ⇒ ((S)S)(S)S ⇒* (()) ()
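The two productions translate directly into a recursive-descent recognizer for this language (a minimal Python sketch, not the lecture's code; the production is chosen by looking at the next character):

def parse(s):
    pos = 0
    def S():
        nonlocal pos
        if pos < len(s) and s[pos] == '(':      # S -> ( S ) S
            pos += 1
            S()
            if pos >= len(s) or s[pos] != ')':
                raise SyntaxError("expected ')'")
            pos += 1
            S()
        # otherwise: S -> epsilon, consume nothing
    try:
        S()
    except SyntaxError:
        return False
    return pos == len(s)                        # accept only if all input was consumed

print(parse("(())"))     # True
print(parse("(())()"))   # True
print(parse("(()"))      # False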
More on CFGs
• Shorthand notation – vertical bar for multiple
productions
S → aSa | T
T → bTb | ε
• CFGs are powerful enough to express the syntax of
most programming languages
• Derivation = successive application of productions
starting from S
• Acceptance? = Determine if there is a derivation for
an input token stream
RE is a Subset of CFG
Can inductively build a grammar for each RE
ε        S → ε
a        S → a
R1 R2    S → S1 S2
R1 | R2  S → S1 | S2
R1*      S → S1 S | ε
Note:
G1 = grammar for RE1, with start symbol S1
G2 = grammar for RE2, with start symbol S2
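To make the construction concrete, here is a small Python sketch (illustrative only; the tuple encoding of REs is my own assumption) that builds productions inductively, following the table above:

def re_to_cfg(regex):
    """Return (start_nonterminal, productions) for an RE given as a tuple tree:
    ('sym', 'a'), ('concat', r1, r2), ('alt', r1, r2) or ('star', r1)."""
    prods = []
    count = 0
    def fresh():
        nonlocal count
        count += 1
        return f"S{count}"
    def build(r):
        S = fresh()
        if r[0] == 'sym':                          # a       =>  S -> a
            prods.append((S, [r[1]]))
        elif r[0] == 'concat':                     # R1 R2   =>  S -> S1 S2
            prods.append((S, [build(r[1]), build(r[2])]))
        elif r[0] == 'alt':                        # R1 | R2 =>  S -> S1 | S2
            prods.append((S, [build(r[1])]))
            prods.append((S, [build(r[2])]))
        elif r[0] == 'star':                       # R1*     =>  S -> S1 S | epsilon
            S1 = build(r[1])
            prods.append((S, [S1, S]))
            prods.append((S, []))                  # the empty list stands for epsilon
        return S
    return build(regex), prods

# Grammar for the regular expression (a|b)*
start, productions = re_to_cfg(('star', ('alt', ('sym', 'a'), ('sym', 'b'))))
print(start, productions)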
Context-Free Languages
A Parser
[Figure: the Parser takes a context-free grammar G and the token stream s (from the lexer); it answers "Yes" if s is in L(G) and "No" otherwise, and reports error messages.]
Left- & Right-most Derivations
Each derivation step needs to choose
a nonterminal to rewrite
a production to apply
Consider the following grammar:
E → E A E | ( E ) | - E | id
A → + | - | * | /
A leftmost derivation always chooses the
leftmost nonterminal to rewrite
E ⇒lm - E ⇒lm - ( E ) ⇒lm - ( E A E ) ⇒lm - ( E + E )
⇒lm - ( id + E ) ⇒lm - ( id + id )
A rightmost derivation always chooses the rightmost nonterminal to rewrite.
// leftmost derivation
E ⇒lm - E ⇒lm - ( E ) ⇒lm - ( E + E ) ⇒lm - ( id + E ) ⇒lm - ( id + id )
// rightmost derivation
E ⇒rm - E ⇒rm - ( E ) ⇒rm - ( E + E ) ⇒rm - ( E + id ) ⇒rm - ( id + id )
[Figure: both derivations build the same parse tree for - ( id + id ): E at the root with children -, (, E, ); the inner E expands to E + E, and both of those E's expand to id.]
Parse Tree vs Abstract Syntax Tree
S → E + S | E
E → number | ( S )
Derive: (1 + 2 + (3 + 4)) + 5
• Parse tree is a tree representation of the derivation
• Leaves of the tree are terminals
• Internal nodes are non-terminals
• NO information about the order of the derivation steps
• AST (abstract syntax tree) discards unneeded information; more compact format; inorder traversal recovers the expression
[Figure: the parse tree for (1 + 2 + (3 + 4)) + 5, and the corresponding AST: a + node with 5 as its right operand and the subtree for 1 + (2 + (3 + 4)) as its left operand.]
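To show how such a grammar can drive a parser that builds the AST directly, a minimal recursive-descent sketch (illustrative Python, not the lecture's code; tokens are passed in as a ready-made list):

def parse(tokens):
    pos = 0
    def peek():
        return tokens[pos] if pos < len(tokens) else None
    def parse_S():                       # S -> E + S | E
        nonlocal pos
        left = parse_E()
        if peek() == '+':
            pos += 1
            return ('+', left, parse_S())
        return left
    def parse_E():                       # E -> number | ( S )
        nonlocal pos
        if peek() == '(':
            pos += 1
            inner = parse_S()            # parentheses leave no node in the AST
            assert peek() == ')', "expected ')'"
            pos += 1
            return inner
        tok = peek()
        assert tok is not None and tok.isdigit(), "expected a number"
        pos += 1
        return int(tok)
    ast = parse_S()
    assert pos == len(tokens), "trailing input"
    return ast

# AST for (1 + 2 + (3 + 4)) + 5
print(parse(['(', '1', '+', '2', '+', '(', '3', '+', '4', ')', ')', '+', '5']))
# ('+', ('+', 1, ('+', 2, ('+', 3, 4))), 5)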
Derivation Order
• Can choose to apply productions in any order: select a
non-terminal and substitute the RHS of one of its
productions
• Two standard orders: left and right-most
• Consider the following grammar:
S → E + S | E
E → number | ( S )
• Leftmost derivation
– In the string, find the leftmost non-terminal and apply a
production to it
– E + S ⇒ 1 + S
• Rightmost derivation
– Same, but find rightmost non-terminal
– E + S ⇒ E + E + S
Leftmost & Rightmost Derivation
Grammar: S → E + S | E    E → number | ( S )
Leftmost derivation of (1 + 2 + (3 + 4)) + 5:
S ⇒ E+S ⇒ (S)+S ⇒ (E+S)+S ⇒ (1+S)+S ⇒ (1+E+S)+S ⇒ (1+2+S)+S ⇒ (1+2+E)+S ⇒ (1+2+(S))+S
⇒ (1+2+(E+S))+S ⇒ (1+2+(3+S))+S ⇒ (1+2+(3+E))+S ⇒ (1+2+(3+4))+S ⇒ (1+2+(3+4))+E ⇒ (1+2+(3+4))+5
Rightmost derivation of (1 + 2 + (3 + 4)) + 5:
S ⇒ E+S ⇒ E+E ⇒ E+5 ⇒ (S)+5 ⇒ (E+S)+5 ⇒ (E+E+S)+5 ⇒ (E+E+E)+5 ⇒ (E+E+(S))+5
⇒ (E+E+(E+S))+5 ⇒ (E+E+(E+E))+5 ⇒ (E+E+(E+4))+5 ⇒ (E+E+(3+4))+5 ⇒ (E+2+(3+4))+5 ⇒ (1+2+(3+4))+5
Result: Same parse tree, same productions chosen, but in different order
Ambiguous Grammar
* A grammar is ambiguous if it produces more than
one parse tree for some sentence
E → E + E | E * E | id
Derive: id + id * id
Two different derivations (and hence two different parse trees):
E ⇒ E + E ⇒ id + E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id
E ⇒ E * E ⇒ E + E * E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id
[Figure: the two corresponding parse trees: one with + at the root, grouping the input as id + (id * id); one with * at the root, grouping it as (id + id) * id.]
Resolving Ambiguity
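One standard way to remove this + / * ambiguity (a common textbook rewriting, shown here as an illustration) is to encode operator precedence and associativity in the grammar itself:
E → E + T | T
T → T * F | F
F → id
With these productions, id + id * id has exactly one parse tree, in which * binds more tightly than + and both operators associate to the left.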
Example
The dangling-else grammar
stmt → if expr then stmt
| if expr then stmt else stmt
| other
Which parse tree for if E1 then if E2 then S1 else S2 ?
[Figure: the two candidate parse trees. In one, the else belongs to the inner statement: if E1 then (if E2 then S1 else S2). In the other, it belongs to the outer statement: if E1 then (if E2 then S1) else S2.]
Incorporating Disambiguating Rules
into Grammars
* Rule: match each “else” with the closest previous unmatched
“then”
* Remove undesired state transitions in the pushdown automaton
Which parse tree for if E1 then if E2 then S1 else S2 ?
[Figure: with this rule, the else attaches to the closest previous unmatched then, so the chosen parse tree is if E1 then (if E2 then S1 else S2).]
Grammar Rewriting
Consider the ambiguous grammar below:
stmt → if expr then stmt
| if expr then stmt else stmt
| other
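A standard rewriting (the usual textbook solution, offered here as an illustration) splits statements into those whose then parts are all matched and those with an open then:
stmt → matched | open
matched → if expr then matched else matched | other
open → if expr then stmt | if expr then matched else open
With this grammar, each else can only attach to the closest previous unmatched then, so the dangling-else ambiguity disappears.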
The cat sees a rat. The rat sees me. I like a cat.
The cat sees the rat. The rat like me. I see the rat.
I sees a rat.
I like the rat.
// Shhh… I don’t like rat.
Bottom-up parsing
The parse tree “grows” from the bottom (leaves) up to the top (root).
[Figure: an English sentence parse tree built bottom-up: the Noun leaves are grouped into the Subject and the Object, which are then combined into the Sentence at the root.]
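As a tiny illustration (my own example using the earlier expression grammar S → E + S | E, E → number | ( S ), not taken from the slides), a bottom-up parse reduces the input to the start symbol, tracing a rightmost derivation in reverse:
1 + 2   reduce 1 to E            ⇒ E + 2
E + 2   reduce 2 to E            ⇒ E + E
E + E   reduce the second E to S ⇒ E + S
E + S   reduce E + S to S        ⇒ S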