Compiler Construction
[SWE-310]
Syntax Analysis
Compiler Construction [SWE-310] Syntax Analysis
Syntax Analysis
2nd phase of a compiler.
Also called parsing or hierarchal analysis.
The grammar of a language defines the
correct form for sentences in that
language.
Compiler Construction [SWE-310] Syntax Analysis
Benefits of Syntax or Grammar
Grammars offer following significant benefits
for both language designers and compiler
writers:
A grammar gives a precise, yet easy-to-
understand, syntactic specification of a
programming language.
From certain classes of grammars, a compiler
writer can construct automatically an efficient
parser that determines the syntactic structure
of a source program.
Compiler Construction [SWE-310] Syntax Analysis
Benefits of Syntax or Grammar
The structure imparted to a language by a
properly designed grammar is useful for
translating source programs into correct object
code and for detecting errors.
A grammar allows a language to be evolved or
developed iteratively, by adding new
constructs to perform new tasks. These new
constructs can be integrated more easily into
an implementation that follows the
grammatical structure of the language.
Compiler Construction [SWE-310] Syntax Analysis
The Role of Syntax Analyzer
It involves grouping the tokens of the
source program into grammatical phrases
that are used by the compiler to synthesize
output.
A language must exhibit structure.
Just as the structure of English prose.
Formal grammars are used to describe
structure of computer program.
Compiler Construction [SWE-310] Syntax Analysis
The Role of Syntax Analyzer
Compiler Construction [SWE-310] Syntax Analysis
Types of Languages
There are two types of languages:
Natural languages
Formal or Computer languages
E.g.#1 Natural Language
<Sentences> <noun phrase> <verb phrase>
<noun phrase><adjective> <noun phrase>
<noun phrase> <noun>
<noun> boy
<adjective> little
Compiler Construction [SWE-310] Syntax Analysis
Types of Languages
E.g.#2 Formal or Computer Language
<expression> <expression> + <expression>
<expression> <expression> * <expression>
<expression> ( <expression> )
<expression> id
Compiler Construction [SWE-310] Syntax Analysis
Types of Languages
E.g.#3 Show that ( id + id ) * id is a valid
expression
<expression> <expression> * <expression>
<expression> (<expression>) * <expression>
<expression> (<expression> + <expression>) * <expression>
<expression> (<expression> + <expression>) * id
<expression> (<expression> + id ) * id
<expression> ( id + id ) * id
Compiler Construction [SWE-310] Syntax Analysis
Types of Grammars
There are four (4) types of grammars as
defined by Noam Chomsky in 1950’s.
Level Grammar
3 Regular
2 Context-Free
1 Context-Sensitive
0 Unrestricted
Compiler Construction [SWE-310] Syntax Analysis
Context-Free Grammar (CFG)
Allthe computer languages are expressed
by Context-Free Grammar (CFG).
A CFG, ‘G’ is defined as:
G = { T, N, S, P}, where,
T: Set of Terminals
N:Set of Non-terminals
S: Start Symbol
P: Productions
Compiler Construction [SWE-310] Syntax Analysis
Context-Free Grammar (CFG)
1. T: Set of Terminals
Terminals are the basic symbols from which
strings are formed.
The word “token" is a synonym for
“terminal“.
E.g. if, then, else, *, ;
Compiler Construction [SWE-310] Syntax Analysis
Context-Free Grammar (CFG)
2. N: Set of Non-terminals
Non-terminals are syntactic variables that
denote sets of strings.
impose a hierarchal structure on the
language.
Compiler Construction [SWE-310] Syntax Analysis
Context-Free Grammar (CFG)
3. S: Start Symbol
One non-terminals is distinguished as start
symbol.
Compiler Construction [SWE-310] Syntax Analysis
Context-Free Grammar (CFG)
4. P: Productions
Productions specify the manner in which the terminals and non-
terminals can be combined to form strings.
Each production consists of:
(a) A nonterminal called the head or left side of the production; this
production defines some of the strings denoted by the head.
(b) The symbol . Sometimes ::= has been used in place of the
arrow.
(c) A body or right side consisting of zero or more terminals and non-
terminals. The components of the body describe one way in which
strings of the nonterminal at the head can be constructed.
Compiler Construction [SWE-310] Syntax Analysis
Context-Free Grammar (CFG)
E.g. The CFG of simple arithmetic expression can be
defined by following productions:
expr expr op term
expr ( expr )
expr - expr
expr id
Op +
Op -
Op *
Op /
Op ↑
Compiler Construction [SWE-310] Syntax Analysis
Notational Conventions for CFG
1. Terminals:
Compiler Construction [SWE-310] Syntax Analysis
Notational Conventions for CFG
2. Non-terminals:
Compiler Construction [SWE-310] Syntax Analysis
Notational Conventions for CFG
3. Grammar Symbols:
4. Strings of Terminals:
Compiler Construction [SWE-310] Syntax Analysis
Notational Conventions for CFG
5. Strings of Grammar Symbols:
6. Alternatives for Non-terminals:
Compiler Construction [SWE-310] Syntax Analysis
Notational Conventions for CFG
7. Start Symbol:
Compiler Construction [SWE-310] Syntax Analysis
Notational Conventions for CFG
E.g. Using above notations and shorthand, the
grammar of previous example can be rewritten
as:
E E A E | ( E ) | - E | id
A+|-|*|/|↑
Compiler Construction [SWE-310] Syntax Analysis
Process of Defining Grammar as a Language
There are two processes by which a grammar
can be defined as a language:
Derivations
Parse Trees
Compiler Construction [SWE-310] Syntax Analysis
Process of Defining Grammar as a Language
1. Derivation:
Production is a re-writing rule in which the
non-terminal on the left is replaced by the
strings on the right side of productions.
E.g. E - E, states that E with a minus sign
is also an expression. OR
E ==> - E, implies E derives – E
Also,
E ==> - E ==> - ( E ) ==> - ( id )
Compiler Construction [SWE-310] Syntax Analysis
Derivation
Generalized definition of derivation would be:
α𝐴𝛽 ==> α𝛾𝛽 ,
if A ==> 𝛾
Also,
if α1 ==> α2 ==> ----- ==> αn, then
α1 ==> αn
==> : Derives in one step
==>* : Derives in zero or more steps
==>+ : Derives in one or more steps
Compiler Construction [SWE-310] Syntax Analysis
Derivation
If S ==>* α, where S is the start symbol of a grammar G, then it
can be deduce that α is a sentential form of G.
Note that a sentential form may contain both terminals and
non-terminals, and may be empty.
A sentence of G is a sentential form with no non-terminals.
The language generated by a grammar is its set of sentences.
Thus, a string of terminals w is in L(G), the language generated
by G, if and only if w is a sentence of G (or S ==>* w ).
A language that can be generated by a grammar is said to be a
context-free language.
If two grammars generate the same language, the grammars
are said to be equivalent.
Compiler Construction [SWE-310] Syntax Analysis
Derivation
E.g. Prove that - ( id + id) is a sentence of grammar of previous
example.
E ==> - E ==> - ( E ) ==> - ( E+ E ) ==> - (id + E ) ==> - ( id + id )
At each step of derivation, two choices are available:
Leftmost replacement of non-terminal
Rightmost replacement of non-terminal
Since the above derivation is leftmost, it can be re-written as:
E ==>lm - E ==>lm - ( E ) ==>lm - ( E+ E ) ==>lm
- (id + E ) ==>lm - ( id + id )
Similarly, a rightmost derivation would be:
E ==>rm - E ==>rm - ( E ) ==>rm - ( E+ E ) ==>rm
- (id + E ) ==>rm - ( id + id )
Compiler Construction [SWE-310] Syntax Analysis
Types of Parsers
Parsers that work on leftmost derivations
are called Top-down Parsers.
Parsers that work on rightmost derivations
are called Bottom-up Parsers.
Compiler Construction [SWE-310] Syntax Analysis
Parse Trees
2. Parse Trees:
A graphical representation for a derivation that
filters out the choice regarding replacement order
of non-terminals.
Each interior node of a parse tree represents the
application of a production.
The interior node is labeled with the non-terminal
in the head of the production.
The children of the node are labeled, from left to
right, by the symbols in the body of the
production by which the non-terminal are
replaced during the derivation.
Compiler Construction [SWE-310] Syntax Analysis
Parse Trees
E.g. Construct parse tree for the following derivation:
E ==> - E ==> - ( E ) ==> - ( E+ E ) ==> - (id + E ) ==> - ( id + id )
Compiler Construction [SWE-310] Syntax Analysis
Parse Trees
E.g. The sentence id + id * id has two
distinct leftmost derivations derivation:
Compiler Construction [SWE-310] Syntax Analysis
Parse Trees
E.g. The corresponding parse trees for the
sentence id + id * id are:
a+(b*c) (a+b)*c
Which of the above parse tree is right?
Compiler Construction [SWE-310] Syntax Analysis
Parse Tree Examples
E.g.#1 Construct parse tree for the expression
id + id * id using the following grammar:
EE+T
TT+F
F ( E ) | id
where,
E: Expression
T: Term
F: Factor
Compiler Construction [SWE-310] Syntax Analysis
Parse Tree Examples
The corresponding parse tree for the
expression id + id * id is given below:
Compiler Construction [SWE-310] Syntax Analysis
Parse Tree Examples
E.g.#2 Construct parse tree for the expression
id * num + id – num / id using the following
grammar:
E E Top T | T
Top + | -
T T Fop F | F
Fop * | /
F ( E ) | id | - F | num
Compiler Construction [SWE-310] Syntax Analysis
Parse Tree Examples
id * num + id – num / id
Compiler Construction [SWE-310] Syntax Analysis
Parse Tree Examples
E.g.#3 - id * ( num + num ) / id
Compiler Construction [SWE-310] Syntax Analysis
Ambiguity
A grammar that produces more then one
parse tree for some sentence is called
ambiguous.
OR
Ambiguous grammars produce more then
one leftmost or rightmost derivation for
same sentence.
Compiler Construction [SWE-310] Syntax Analysis
Eliminating Ambiguity
E.g. Consider the following grammar:
Here other stands for any other
statement.
Compiler Construction [SWE-310] Syntax Analysis
Eliminating Ambiguity
The above grammar is ambiguous since it produces two parse
trees for the following statement:
if E1 then if E2 then S1 else S2
Parse tree#1
Compiler Construction [SWE-310] Syntax Analysis
Eliminating Ambiguity
if E1 then if E2 then S1 else S2
Parse Tree#1
Parse Tree#2
Compiler Construction [SWE-310] Syntax Analysis
Eliminating Ambiguity
In all the programming languages 1st parse
tree is preferred.
The unambiguous grammar can be re-
written by applying general rule that:
“Match each else with the closest previous
unmatched then”
Compiler Construction [SWE-310] Syntax Analysis
Eliminating Ambiguity
Thus, the new grammar would be:
Compiler Construction [SWE-310] Syntax Analysis
A Sample Grammar
The formal grammar of VSL (Very Simple Language) is
presented below:
Compiler Construction [SWE-310] Syntax Analysis
A Sample Grammar
Compiler Construction [SWE-310] Syntax Analysis
A Sample Grammar