Compiler Design

The document discusses the principles of compiler design. It describes compilers as programs that translate programs written in a high-level language into an equivalent program in a low-level language. The phases of compilation are analysis and synthesis. The analysis phase performs lexical analysis, syntax analysis, and semantic analysis. The synthesis phase generates intermediate code, performs code optimization, and generates the target code. Compilers use techniques like symbol tables, intermediate representations, and error handling to perform the translation.

Uploaded by

shnsundar

PRINCIPLES OF COMPILER DESIGN

1. Introduction to compilers:-
A compiler is a program that reads a program written in one language (the source language, typically a high level language) and translates it into an equivalent program in another language (the target language, typically a low level language).

Source program (High Level Language) → COMPILER → Target program (Low Level Language)

Compiler:- It converts a high level language program into an equivalent low level language program.

Assembler:- It converts an assembly language (low level language) program into machine code (binary representation).

PHASES OF COMPILER

There are two parts to compilation. They are

(i) Analysis Phase
(ii) Synthesis Phase

Source Program
→ Lexical Analyzer
→ Syntax Analyzer
→ Semantic Analyzer
→ Intermediate Code Generator
→ Code Optimizer
→ Code Generator
→ Target Program

The Symbol Table Manager and the Error Handler interact with every one of these phases.

Analysis Phase:-
The analysis phase breaks up the source program into constituent pieces. The analysis phase
of a compiler performs,
1. Lexical analysis
2. Syntax Analysis
3. Semantic Analysis


1. Lexical Analysis (or) Linear Analysis (or) Scanning:-

The lexical analysis phase reads the characters in the source program and groups them into tokens, which are sequences of characters having a collective meaning, such as an identifier, a keyword, a punctuation character or a multi-character operator like ++.
"The character sequence forming a token is called a lexeme."
For example, in the statement pos = init + rate * 60 some of the lexemes are,
Lexeme    Token    Attribute value
rate      ID       Pointer to symbol table
+         ADD
60        num      60
init      ID       Pointer to symbol table
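
The grouping of characters into tokens can be sketched with a small regular-expression scanner. This is only an illustration: the token names NUM, ID, ASSIGN, ADD and MUL and the function name tokenize are assumptions made for the example, not part of these notes.

```python
import re

# Illustrative token patterns (names are assumptions for this sketch).
TOKEN_SPEC = [
    ("NUM",    r"\d+"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("ASSIGN", r"="),
    ("ADD",    r"\+"),
    ("MUL",    r"\*"),
    ("SKIP",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Return (token, lexeme) pairs; raise if the remaining input forms no token."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":          # white space is discarded
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

print(tokenize("pos = init + rate * 60"))
```

A real lexical analyzer would also enter each identifier into the symbol table and return a pointer to that entry as the attribute value.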

2. Syntax Analysis (or) Hierarchical Analysis:-

Syntax analysis processes the string of descriptors (tokens) produced by the lexical analyzer to determine the syntactic structure of an input statement. This process is known as parsing.
That is, the output of the parsing step is a representation of the syntactic structure of a statement.
Example:-
pos = init + rate * 60

        =
       / \
    pos   +
         / \
     init   *
           / \
       rate   60

3. Semantic Analysis:-
The semantic analysis phase checks the source program for semantic errors.
Processing performed by the semantic analysis step can be classified into
a. Processing of declarative statements
b. Processing of executable statements
During semantic processing of declarative statements, items of information are added to the lexical tables.
Example:- (symbol table or lexical table)
real a, b;

id    a    real    length ……
id    b    real    length …..
Synthesis Phase:-
The synthesis phase of a compiler performs,
1. Intermediate code generation
2. Code optimization
3. Code generation

1. Intermediate code generation:-

After syntax and semantic analysis some compilers generate an explicit intermediate representation of the source program. This intermediate representation should have two important properties.
a. It should be easy to produce
b. It should be easy to translate into the target program.
We consider the intermediate form called "Three Address Code". It consists of a sequence of instructions, each of which has at most three operands.

Example:-
pos = init + rate * 60
pos = init + rate * inttoreal(60)

might appear in three address code as,

temp1 = inttoreal(60)
temp2 = id3 * temp1
temp3 = id2 + temp2
id1 = temp3

for the syntax tree,

        =
       / \
    id1   +
         / \
      id2   *
           / \
        id3   60
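
The step from a syntax tree to three address code can be sketched as a post-order walk that creates one temporary per operator. The nested-tuple tree format and the names gen_expr and translate are assumptions made for this sketch.

```python
def gen_expr(node, code, counter):
    """Emit instructions for an expression node; return the name that holds its value."""
    if not isinstance(node, tuple):            # leaf: an identifier or a constant
        return str(node)
    op, *args = node
    operands = [gen_expr(arg, code, counter) for arg in args]
    counter[0] += 1                            # a fresh temporary for this operator
    temp = f"temp{counter[0]}"
    if op == "inttoreal":
        code.append(f"{temp} = inttoreal({operands[0]})")
    else:
        code.append(f"{temp} = {operands[0]} {op} {operands[1]}")
    return temp

def translate(assignment):
    """Translate ('=', target, expression) into a list of three address instructions."""
    _, target, expr = assignment
    code = []
    value = gen_expr(expr, code, [0])
    code.append(f"{target} = {value}")
    return code

# Tree produced by semantic analysis for pos = init + rate * 60
TREE = ("=", "id1", ("+", "id2", ("*", "id3", ("inttoreal", 60))))
for line in translate(TREE):
    print(line)
```

Each emitted instruction has at most three operands, as required by the three address form.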

2. Code Optimization:-
The code optimization phase attempts to improve the intermediate code, so that faster
running machine code will result.

3. Code Generation:-
The final phase of the compiler is the generation of the target code, which may be machine code or assembly code.
Memory locations are selected for each of the variables used by the program. Then the intermediate instructions are translated into a sequence of machine instructions that perform the same task.

Example:-
MOVF id3, R2
MULF #60.0, R2
MOVF id2, R1
ADDF R2, R1
MOVF R1, id1


Translation of a statement

pos = init + rate * 60

→ Lexical Analyzer

id1 = id2 + id3 * 60

→ Syntax Analyzer

        =
       / \
    id1   +
         / \
      id2   *
           / \
        id3   60

→ Semantic Analyzer

        =
       / \
    id1   +
         / \
      id2   *
           / \
        id3   inttoreal
                  |
                  60

→ Intermediate Code Generator

temp1 = inttoreal(60)
temp2 = id3 * temp1
temp3 = id2 + temp2
id1 = temp3

→ Code Optimizer

temp1 = id3 * 60.0
id1 = id2 + temp1

→ Code Generator

MOVF id3, R2
MULF #60.0, R2
MOVF id2, R1
ADDF R2, R1
MOVF R1, id1


Symbol table management:-
A symbol table is a data structure containing a record for each identifier, with fields for
the attributes of the identifier. The data structure allows us to find the record for each identifier
quickly and to store or retrieve data from that record quickly.
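
A minimal sketch of such a data structure, assuming a hash table keyed by the identifier name; the class name SymbolTable and its methods insert and lookup are hypothetical names chosen for the illustration.

```python
class SymbolTable:
    """One record per identifier, with fields for its attributes."""

    def __init__(self):
        self._records = {}                     # a dict gives fast lookup by name

    def insert(self, name, **attributes):
        """Create the record for `name` if needed and store the given attributes."""
        record = self._records.setdefault(name, {"name": name})
        record.update(attributes)
        return record

    def lookup(self, name):
        """Return the record for `name`, or None if it was never entered."""
        return self._records.get(name)

# Processing the declaration: real a, b;
table = SymbolTable()
for identifier in ("a", "b"):
    table.insert(identifier, type="real")
print(table.lookup("a"))
```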

Error Handler:-
Each phase can encounter errors.

• The lexical phase can detect errors where the characters remaining in the input do not
form any token of the language.
• The syntax analysis phase can detect errors where the token stream violates the structure
rules of the language.
• During semantic analysis, the compiler tries to detect constructs that have the right syntactic structure but no meaning to the operation involved.
• The intermediate code generator may detect an operator whose operands have incompatible types.
• The code optimizer, doing control flow analysis, may detect that certain statements can never be reached.
• The code generator may find a compiler-created constant that is too large to fit in a word of the target machine.


Role of Lexical Analyzer:-
The main task of the lexical analyzer is to read the input characters and produce as output a sequence of tokens that the parser uses for syntax analysis.

Source program → Lexical Analyzer → (token) → Parser
Parser → (get next token) → Lexical Analyzer
Both the lexical analyzer and the parser consult the Symbol Table.

After receiving a "get next token" command from the parser, the lexical analyzer reads input characters until it can identify the next token.

Token:-
Token is a sequence of characters that can be treated as a single logical entity. Typical
tokens are,
(a) Identifiers
(b) Keywords
(c) Operators
(d) Special symbols
(e) Constants
Pattern:-
A pattern is a rule describing the set of strings in the input for which the same token is produced as output.

Lexeme:-
A lexeme is a sequence of characters in the source program that is matched by the
pattern for a token.

Finite Automata

Definition:-
A recognizer for a language is a program that takes as input a string x and answers "yes" if x is a sentence of the language and "no" otherwise.
A better way to convert a regular expression to a recognizer is to construct a generalized transition diagram from the expression. This diagram is called a finite automaton.

A finite automaton can be,

1. Deterministic finite automaton
2. Non-deterministic finite automaton


1. Non-deterministic Finite Automata:- [NFA]
An NFA is a mathematical model that consists of,
1. a set of states S
2. a set of input symbols Σ
3. a transition function δ
4. a state S0 that is distinguished as the start state
5. a set of states F distinguished as accepting states. An accepting state is indicated by a double circle.

Example:-
The transition graph for an NFA that recognizes the language (a/b)*a has two states. The start state 0 has a loop labeled a, b and an edge labeled a to the accepting state 1.

start → 0 --a--> 1        (loop on 0 labeled a, b)

The transition table is,

State    Input symbol a    Input symbol b
0        {0, 1}            {0}
1        -                 -
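
The transition table can be used directly to simulate the NFA by tracking the set of states it may be in after each input symbol. This is a minimal sketch; the names NFA, START, ACCEPTING and nfa_accepts are assumptions for the example.

```python
# Transition table of the NFA for (a/b)*a: state -> symbol -> set of next states
NFA = {
    0: {"a": {0, 1}, "b": {0}},
    1: {},
}
START = 0
ACCEPTING = {1}

def nfa_accepts(text):
    """Answer True ("yes") if the NFA can end in an accepting state on `text`."""
    current = {START}
    for symbol in text:
        current = {nxt for state in current for nxt in NFA[state].get(symbol, set())}
    return bool(current & ACCEPTING)

for word in ("a", "ba", "abba", "ab", ""):
    print(repr(word), nfa_accepts(word))
```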

2. Deterministic Finite Automata:- [DFA]

A DFA is a special case of a non-deterministic finite automaton in which,
1. No state has an ε-transition
2. For each state S and input symbol a, there is at most one edge labeled a leaving S.

PROBLEM:-

1. Construct a non-deterministic finite automaton for the regular expression (a/b)*

Solution:-
r = (a/b)*
Decomposition of (a/b)* (parse tree)

           r5
          /  \
        r4    *
      /  |  \
     (   r3   )
       / | \
     r1  /  r2
     |       |
     a       b

NFA for r1 = a:
start → 2 --a--> 3

NFA for r2 = b:
start → 4 --b--> 5

NFA for r3 = r1/r2 (start state 1, final state 6):
1 --ε--> 2,  2 --a--> 3,  3 --ε--> 6
1 --ε--> 4,  4 --b--> 5,  5 --ε--> 6

NFA for r4, that is (r3), is the same as that for r3.

NFA for r5 = (r3)* (start state 0, accepting state 7):
0 --ε--> 1,  0 --ε--> 7
1 --ε--> 2,  1 --ε--> 4
2 --a--> 3,  4 --b--> 5
3 --ε--> 6,  5 --ε--> 6
6 --ε--> 1,  6 --ε--> 7


2. Construct a non-deterministic finite automaton for the regular expression (a/b)*abb
Solution:-
r = (a/b)*abb
Decomposition of (a/b)*abb (parse tree)

                   r11
                  /   \
                r9     r10
               /  \     |
             r7    r8   b
            /  \    |
          r5    r6  b
         /  \    |
       r4    *   a
     /  |  \
    (   r3   )
      / | \
    r1  /  r2
    |       |
    a       b

The NFAs for r1 to r5 are constructed exactly as in Problem 1, giving the NFA for (a/b)* with start state 0 and final state 7.

NFA for r6 = a:
start → 7 --a--> 8

NFA for r7 = r5.r6: the NFA for r5 followed by 7 --a--> 8 (state 7 is shared).

NFA for r8 = b:
start → 8 --b--> 9

NFA for r9 = r7.r8: the NFA for r7 followed by 8 --b--> 9.

NFA for r10 = b:
start → 9 --b--> 10

NFA for r11 = r9.r10 = (a/b)*abb (start state 0, accepting state 10):
0 --ε--> 1,  0 --ε--> 7
1 --ε--> 2,  1 --ε--> 4
2 --a--> 3,  4 --b--> 5
3 --ε--> 6,  5 --ε--> 6
6 --ε--> 1,  6 --ε--> 7
7 --a--> 8,  8 --b--> 9,  9 --b--> 10


CONVERSION OF NFA INTO DFA

1. Convert the NFA for (a/b)* into a DFA.

Solution:
The NFA for (a/b)* is (start state 0, accepting state 7),
0 --ε--> 1,  0 --ε--> 7
1 --ε--> 2,  1 --ε--> 4
2 --a--> 3,  4 --b--> 5
3 --ε--> 6,  5 --ε--> 6
6 --ε--> 1,  6 --ε--> 7

ε-closure {0} = {0,1,2,4,7} -------------- A

Transition of input symbol a on A = {3}
Transition of input symbol b on A = {5}

ε-closure {3} = {1,2,3,4,6,7} ------------ B

Transition of input symbol a on B = {3}
Transition of input symbol b on B = {5}

ε-closure {5} = {1,2,4,5,6,7} ------------ C

Transition of input symbol a on C = {3}
Transition of input symbol b on C = {5}

A is the start state. Each of A, B and C contains the NFA accepting state 7, so all three are accepting states (the empty string is also in the language). The transition table is,

State    a    b
A        B    C
B        B    C
C        B    C

The DFA is,

start → A
A --a--> B,  A --b--> C
B --a--> B,  B --b--> C
C --a--> B,  C --b--> C


2. Convert the NFA for (a/b)*abb into a DFA.
Solution:
The NFA for (a/b)*abb is (start state 0, accepting state 10),
0 --ε--> 1,  0 --ε--> 7
1 --ε--> 2,  1 --ε--> 4
2 --a--> 3,  4 --b--> 5
3 --ε--> 6,  5 --ε--> 6
6 --ε--> 1,  6 --ε--> 7
7 --a--> 8,  8 --b--> 9,  9 --b--> 10

ε-closure {0} = {0,1,2,4,7} -------------- A

Transition of input symbol a on A = {3,8}
Transition of input symbol b on A = {5}

ε-closure {3,8} = {1,2,3,4,6,7,8} -------------- B

Transition of input symbol a on B = {3,8}
Transition of input symbol b on B = {5,9}

ε-closure {5} = {1,2,4,5,6,7} -------------- C

Transition of input symbol a on C = {3,8}
Transition of input symbol b on C = {5}

ε-closure {5,9} = {1,2,4,5,6,7,9} -------------- D

Transition of input symbol a on D = {3,8}
Transition of input symbol b on D = {5,10}

ε-closure {5,10} = {1,2,4,5,6,7,10} -------------- E

Transition of input symbol a on E = {3,8}
Transition of input symbol b on E = {5}

Since A is the start state and E is the only state containing the NFA accepting state 10, E is the only accepting state. The transition table is,

State    a    b
A        B    C
B        B    D
C        B    C
D        B    E
E        B    C


The DFA is,

start → A
A --a--> B,  A --b--> C
B --a--> B,  B --b--> D
C --a--> B,  C --b--> C
D --a--> B,  D --b--> E
E --a--> B,  E --b--> C
E is the accepting state.
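
The ε-closure and transition steps used in these solutions can be sketched as the subset construction below. The NFA is the one for (a/b)*abb with states 0 to 10; the dictionary layout and the names EPS, MOVE, eps_closure, move and subset_construction are assumptions made for the sketch.

```python
# NFA for (a/b)*abb: ε-edges and labeled edges (states 0..10, accepting state 10)
EPS = {0: {1, 7}, 1: {2, 4}, 3: {6}, 5: {6}, 6: {1, 7}}
MOVE = {(2, "a"): {3}, (4, "b"): {5}, (7, "a"): {8}, (8, "b"): {9}, (9, "b"): {10}}
NFA_START, NFA_ACCEPT = 0, 10

def eps_closure(states):
    """All NFA states reachable from `states` using ε-edges only."""
    closure, stack = set(states), list(states)
    while stack:
        state = stack.pop()
        for nxt in EPS.get(state, ()):
            if nxt not in closure:
                closure.add(nxt)
                stack.append(nxt)
    return frozenset(closure)

def move(states, symbol):
    """NFA states reachable from `states` on one edge labeled `symbol`."""
    return {nxt for state in states for nxt in MOVE.get((state, symbol), ())}

def subset_construction(alphabet="ab"):
    """Return (transition table, accepting states) of the DFA, states named A, B, C, ..."""
    start = eps_closure({NFA_START})
    names = {start: "A"}
    table, worklist = {}, [start]
    while worklist:
        current = worklist.pop(0)
        for symbol in alphabet:
            target = eps_closure(move(current, symbol))
            if target not in names:            # a new DFA state
                names[target] = chr(ord("A") + len(names))
                worklist.append(target)
            table[(names[current], symbol)] = names[target]
    accepting = {name for subset, name in names.items() if NFA_ACCEPT in subset}
    return table, accepting

dfa_table, dfa_accepting = subset_construction()
print(dfa_table, dfa_accepting)
```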

MINIMIZATION OF STATES

Problem 1: Construct a minimum state DFA for the regular expression (a/b)*abb

Solution:-
1. The NFA for (a/b)*abb is (start state 0, accepting state 10),
0 --ε--> 1,  0 --ε--> 7
1 --ε--> 2,  1 --ε--> 4
2 --a--> 3,  4 --b--> 5
3 --ε--> 6,  5 --ε--> 6
6 --ε--> 1,  6 --ε--> 7
7 --a--> 8,  8 --b--> 9,  9 --b--> 10

2. Construct a DFA:

ε-closure {0} = {0,1,2,4,7} -------------- A

Transition of input symbol a on A = {3,8}
Transition of input symbol b on A = {5}

ε-closure {3,8} = {1,2,3,4,6,7,8} -------------- B

Transition of input symbol a on B = {3,8}
Transition of input symbol b on B = {5,9}

ε-closure {5} = {1,2,4,5,6,7} -------------- C

Transition of input symbol a on C = {3,8}
Transition of input symbol b on C = {5}

ε-closure {5,9} = {1,2,4,5,6,7,9} -------------- D

Transition of input symbol a on D = {3,8}
Transition of input symbol b on D = {5,10}

ε-closure {5,10} = {1,2,4,5,6,7,10} -------------- E

Transition of input symbol a on E = {3,8}
Transition of input symbol b on E = {5}

Since A is the start state and state E is the only accepting state, the transition table is,

State    a    b
A        B    C
B        B    D
C        B    C
D        B    E
E        B    C

3. Minimizing the DFA

Let Π = ABCDE
The initial partition Π consists of two groups.
Π1 = ABCD (that is, the non-accepting states)
Π2 = E (that is, the accepting state)

So, (ABCD) (E)

Compare the transitions of the states in Π1:

State    on a    on b
A        B       C
B        B       D
C        B       C
D        B       E

On input "a" each of these states has a transition to B, so they could all remain in one group as far as input a is concerned.
On input "b", A, B and C go to members of the group Π1 (ABCD) while D goes to Π2 (E). Thus the Π1 group is split into two new groups.

Π1 = ABC, Π2 = D, Π3 = E
So, (ABC) (D) (E)

Compare A, B and C again. On input "b", A and C go to C, a member of Π1 (ABC), while B goes to D, which is Π2. Thus the Π1 group is again split into two new groups. The new groups are,

Π1 = AC, Π2 = B, Π3 = D, Π4 = E
So, (AC) (B) (D) (E)

Here we cannot split any of the groups consisting of a single state. The only possibility is to try to split (AC).

State    on a    on b
A        B       C
C        B       C

But A and C go to the same state B on input a, and they go to the same state C on input b.
Hence the final partition is
(AC) (B) (D) (E)
Here we choose A as the representative for the group AC.
Thus A is the start state and state E is the only accepting state.

So the minimized transition table is,

State    a    b
A        B    A
B        B    D
D        B    E
E        B    A

Thus the minimized DFA is,

start → A
A --a--> B,  A --b--> A
B --a--> B,  B --b--> D
D --a--> B,  D --b--> E
E --a--> B,  E --b--> A
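
The repeated splitting of groups can be sketched as partition refinement: two states stay together only if, on every input symbol, they go to states of the same group. The names DFA, ACCEPTING and minimize are assumptions for this sketch; the table is the five-state DFA for (a/b)*abb.

```python
DFA = {
    "A": {"a": "B", "b": "C"},
    "B": {"a": "B", "b": "D"},
    "C": {"a": "B", "b": "C"},
    "D": {"a": "B", "b": "E"},
    "E": {"a": "B", "b": "C"},
}
ACCEPTING = {"E"}

def minimize(dfa, accepting, alphabet="ab"):
    """Return the final partition of the DFA states as a list of sets."""
    partition = [group for group in (set(dfa) - set(accepting), set(accepting)) if group]
    while True:
        def group_index(state):
            return next(i for i, group in enumerate(partition) if state in group)

        refined = []
        for group in partition:
            buckets = {}
            for state in sorted(group):
                # Signature: which group each transition of this state leads to.
                signature = tuple(group_index(dfa[state][symbol]) for symbol in alphabet)
                buckets.setdefault(signature, set()).add(state)
            refined.extend(buckets.values())
        if len(refined) == len(partition):     # no group was split: done
            return refined
        partition = refined

groups = sorted(sorted(group) for group in minimize(DFA, ACCEPTING))
print(groups)
```

Choosing one representative per group (A for the group AC) then gives the minimized transition table.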


SYNTAX ANALYSIS

Definition of Context – free – Grammar:- [CFG]

A CFG has four components.
1. a set of Tokens known as Terminal symbols.
2. a set of non-terminals
3. start symbol
4. production.

Notational Conventions:-

a) These symbols are terminals. (Ts)

(i) Lower case letters early in the alphabet such as a,b,c
(ii) Operator symbols such as +, -, etc.
(iii) Punctuation symbols such as parenthesis, comma etc.
(iv) The digits 0, 1, 2, 3, …, 9
(v) Bold face Strings.
b) These symbols are Non-Terminals (NTs)
(i) Upper case letters early in the alphabet such as A, B, C
(ii) The letter S, which is the start symbol.
(iii) Lower case italic names such as expr, stmt.
c) Uppercase letters such as X, Y, Z represent grammar symbols either NTs or Ts.

PARSER:
A parser for grammar G is a program that takes a string W as input and produces either a
parse tree for W, if W is a sentence of G or an error message indicating that W is not a sentence
of G as output.

There are two basic types of parsers for CFG.

1. Bottom – up Parser
2. Top – down Parser

1. Bottom up Parser:-
A bottom up parser builds the parse tree from the bottom (leaves) to the top (root). The input to the parser is scanned from left to right, one symbol at a time. This is also called "Shift Reduce Parsing" because it consists of shifting input symbols onto a stack until the right side of a production appears on top of the stack; that right side is then reduced to the non-terminal on the left side of the production.

There are two kinds of shift reduce parser (Bottom up Parser)

1. Operator Precedence Parser
2. LR Parser (more general type)


Designing of Shift Reduce Parser(Bottom up Parser) :-

Here let us "reduce" a string w to the start symbol of a grammar. At each step a string matching the right side of a production is replaced by the symbol on the left.

For example, consider the grammar,

S → aAcBe
A → Ab / b
B → d
and the string abbcde.

We want to reduce the string to S. We scan abbcde, looking for substrings that match the right side of some production. The substrings b and d qualify.

Let us choose the leftmost b and replace it by A, using the production A → b.

So, abbcde becomes
aAbcde (A → b)
We now see that Ab, b and d each match the right side of some production.
Suppose this time we choose to replace the substring Ab by A, the left side of the production A → Ab.
We now obtain,
aAcde (A → Ab)
Then replacing d by B,
aAcBe (B → d)
Now we can replace the entire string by S.

Right sentential form    Handle position    Reducing production
abbcde                   2                  A → b
aAbcde                   2                  A → Ab
aAcde                    4                  B → d
aAcBe                                       S → aAcBe
Thus we reach the start symbol S.
Each replacement of the right side of a production by the left side in the process above is called a reduction.
In the above example the handles of the successive right sentential forms are,
A → b at position 2 (in abbcde)
A → Ab at position 2 (in aAbcde)
B → d at position 4 (in aAcde)


Example:- Consider the following grammar
E → E+E
E → E*E
E → (E)
E → id
and the input string id1+id2*id3. Reduce it to the start symbol E.
Solution:-
Right sentential form    Handle    Reducing production
id1 + id2 * id3          id1       E → id
E + id2 * id3            id2       E → id
E + E * id3              id3       E → id
E + E * E                E*E       E → E*E
E + E                    E+E       E → E+E
E

Stack implementation of shift reduce parsing:

Initialize the stack with $ at the bottom. Place a $ at the right end of the input string.
Stack    Input String
$        w$
The parser operates by shifting zero or more input symbols onto the stack until a handle β is on top of the stack; it then reduces β to the left side of the appropriate production.
Example:- Reduce the input string id1+id2*id3 according to the following grammar.
1. E → E*E
2. E → E+E
3. E → (E)
4. E → id
Solution:-
Stack        Input String      Action
$            id1+id2*id3$      shift
$id1         +id2*id3$         reduce by E → id
$E           +id2*id3$         shift
$E+          id2*id3$          shift
$E+id2       *id3$             reduce by E → id
$E+E         *id3$             shift
$E+E*        id3$              shift
$E+E*id3     $                 reduce by E → id
$E+E*E       $                 reduce by E → E*E
$E+E         $                 reduce by E → E+E
$E           $                 accept

The actions of a shift reduce parser are,

1. Shift: the next input symbol is shifted onto the top of the stack.
2. Reduce: the parser knows that the right end of the handle is at the top of the stack. It must then locate the left end of the handle within the stack and decide with which non-terminal to replace the handle.
3. Accept: the parser announces successful completion of parsing.
4. Error: the parser detects a syntax error and calls an error recovery routine.

Operator Precedence Parsing;-

In operator precedence parsing we use three disjoint precedence relations between pairs of terminals.
a < b means a "yields precedence to" b
a = b means a "has the same precedence as" b
a > b means a "takes precedence over" b
There are two common ways of determining which precedence relation should hold between a pair of terminals.
1. Based on associativity and precedence of operators
2. Using operator precedence relation.
For example, * has higher precedence than +, so we make + < * and * > +

Problem 1:- Create an operator precedence relation table for id+id*id$

       id    +     *     $
id     -     >     >     >
+      <     >     <     >
*      <     >     >     >
$      <     <     <     -
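
The table can drive a small parser skeleton that keeps only terminals on the stack: it shifts while the top terminal yields precedence to the next input symbol and reduces when it takes precedence. This is a sketch under that simplification (non-terminals are ignored, so it does not check every syntax error); the names PRECEDENCE and operator_precedence_parse are assumptions.

```python
# PRECEDENCE[a][b] is the relation between stack terminal a and input terminal b.
PRECEDENCE = {
    "id": {"+": ">", "*": ">", "$": ">"},
    "+":  {"id": "<", "+": ">", "*": "<", "$": ">"},
    "*":  {"id": "<", "+": ">", "*": ">", "$": ">"},
    "$":  {"id": "<", "+": "<", "*": "<"},
}

def operator_precedence_parse(tokens):
    """Return the handles (terminals only) in the order in which they are reduced."""
    stack = ["$"]
    tokens = list(tokens) + ["$"]
    pos = 0
    handles = []
    while True:
        top, look = stack[-1], tokens[pos]
        if top == "$" and look == "$":
            return handles                     # accept
        relation = PRECEDENCE.get(top, {}).get(look)
        if relation in ("<", "="):             # shift
            stack.append(look)
            pos += 1
        elif relation == ">":                  # reduce: pop back to the nearest "<"
            popped = stack.pop()
            handle = [popped]
            while PRECEDENCE[stack[-1]].get(popped) != "<":
                popped = stack.pop()
                handle.append(popped)
            handles.append(" ".join(reversed(handle)))
        else:
            raise SyntaxError(f"no precedence relation between {top!r} and {look!r}")

print(operator_precedence_parse(["id", "+", "id", "*", "id"]))
```

For id+id*id the three id handles are reduced first, then *, then +, which reflects * taking precedence over +.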

Problem 2: Tabulate the operator precedence relations for the grammar
E → E+E | E-E | E*E | E/E | E↑E | (E) | -E | id
Solution:-
Assuming 1. ↑ has the highest precedence and is right associative
2. * and / have the next higher precedence and are left associative
3. + and - have the lowest precedence and are left associative

       +    -    *    /    ↑    id   (    )    $
+      >    >    <    <    <    <    <    >    >
-      >    >    <    <    <    <    <    >    >
*      >    >    >    >    <    <    <    >    >
/      >    >    >    >    <    <    <    >    >
↑      >    >    >    >    <    <    <    >    >
id     >    >    >    >    >    -    -    >    >
(      <    <    <    <    <    <    <    =    -
)      >    >    >    >    >    -    -    >    >
$      <    <    <    <    <    <    <    -    -

Derivations:-
The central idea is that a production is treated as a rewriting rule in which the non-terminal on the left side is replaced by the string on the right side of the production.
For example, consider the following grammar for arithmetic expressions,
E → E+E | E*E | (E) | -E | id
The production E → -E says that we can replace a single E by -E. We describe this action by writing
E => -E, which is read "E derives -E"
E → (E) tells us that we could also replace E by (E).
So, E*E => (E)*E or E*E => E*(E)
We can take a single E and repeatedly apply productions in any order to obtain a sequence of replacements.
E => -E => -(E) => -(id)
Such a sequence of replacements is called a derivation.
We say α A β => α γ β if A → γ is a production and α and β are arbitrary strings of grammar symbols.
If α1 => α2 => ……. => αn, we say α1 derives αn.


The symbol,
=> means "derives in one step"
=>* means "derives in zero or more steps"
=>+ means "derives in one or more steps"

Example:- E → E+E | E*E | (E) | -E | id. The string -(id+id) is a sentence of the above grammar.
E => -E
=> -(E)
=> -(E+E)
=> -(id+E)
=> -(id+id)
The above derivation is called a leftmost derivation, because at every step the leftmost non-terminal is the one that is replaced.
We can write the whole derivation as E =>* -(id+id)

Example for Rightmost Derivation:-

A rightmost derivation, in which the rightmost non-terminal is replaced at every step, is otherwise called a canonical derivation.

E => -E
=> -(E)
=> -(E+E)
=> -(E+id)
=> -(id+id)


Parse trees & Derivations:-
A parse tree is a graphical representation for a derivation that filters out the choice regarding replacement order.
For a given CFG a parse tree is a tree with the following properties.
1. The root is labeled by the start symbol
2. Each leaf is labeled by a token or ε
3. Each interior node is labeled by a NT
Ex. The parse tree grows with each step of the derivation of -(id+id).

E => -E

      E
     / \
    -   E

E => -(E)

      E
     / \
    -   E
      / | \
     (  E  )

E => -(E+E)

      E
     / \
    -   E
      / | \
     (  E  )
      / | \
     E  +  E

E => -(id+E)

      E
     / \
    -   E
      / | \
     (  E  )
      / | \
     E  +  E
     |
     id

E => -(id+id)

      E
     / \
    -   E
      / | \
     (  E  )
      / | \
     E  +  E
     |     |
     id    id


Top-Down Parsing:-
A top down parser builds the parse tree starting from the root, creating the nodes of the parse tree in preorder and working down to the leaves. Here also the input to the parser is scanned from left to right, one symbol at a time.
For example, consider the grammar
S → cAd
A → ab / a
and the input string w = cad.
To construct a parse tree for this sentence top down, we initially create a parse tree consisting of a single node S.
The input pointer points to c, the first symbol of w. We expand S using its only production to obtain the tree,

      S
    / | \
   c  A  d

The leftmost leaf, labeled c, matches the first symbol of w. So we advance the input pointer to a, the second symbol of w, and consider the next leaf, labeled A. We can then expand A using the first alternative for A to obtain the tree,

      S
    / | \
   c  A  d
     / \
    a   b

We now have a match for the second input symbol. We advance the input pointer to d, the third input symbol, and compare it against the next leaf, labeled b. Since b does not match d, we report failure and go back to A to see whether there is another alternative for A.
In going back to A we must reset the input pointer to position 2.
We now try the second alternative for A to obtain the tree,

      S
    / | \
   c  A  d
      |
      a

The leaf a matches the second symbol of w and the leaf d matches the third symbol. We have now produced a parse tree for w = cad using the grammar S → cAd and A → ab / a.
This is successful completion.
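
The try-an-alternative, fail, reset-the-pointer behaviour described above can be sketched with generators, where each function yields every input position it can reach and the caller moves on to the next alternative when a later match fails. The names GRAMMAR, parse, match_sequence and accepts are assumptions for the sketch.

```python
# S → cAd, A → ab / a   (alternatives are tried in the order listed)
GRAMMAR = {
    "S": [["c", "A", "d"]],
    "A": [["a", "b"], ["a"]],
}

def parse(symbol, text, pos):
    """Yield every input position reachable after matching `symbol` from `pos`."""
    if symbol not in GRAMMAR:                  # terminal: must match the input
        if text.startswith(symbol, pos):
            yield pos + len(symbol)
        return
    for alternative in GRAMMAR[symbol]:        # non-terminal: try each alternative
        yield from match_sequence(alternative, text, pos)

def match_sequence(symbols, text, pos):
    """Match the symbols one after another, backtracking over earlier choices."""
    if not symbols:
        yield pos
        return
    for middle in parse(symbols[0], text, pos):
        yield from match_sequence(symbols[1:], text, middle)

def accepts(text):
    return any(end == len(text) for end in parse("S", text, 0))

print(accepts("cad"), accepts("cabd"), accepts("cd"))
```

A left recursive grammar would make this sketch recurse forever, which is the first difficulty discussed next.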
Difficulties of Top – Down Parsing (or) Disadvantages of Top - Down Parsing
1. Left Recursion:-
A grammar G is said to be left recursive if it has a non-terminal A such that there is a derivation A =>+ Aα for some α.
Such a grammar can cause a top-down parser to go into an infinite loop.
Elimination of left Recursion:-
Consider the left recursive pair of productions
A → Aα / β,
where β does not begin with A.
This left recursion can be eliminated by replacing this pair of productions with,
A → βA´
A´ → αA´ / ε
Parse tree of original Grammar:-
A → Aα / β

      A
     / \
    A   α
    |
    β

Parse tree for new grammar to eliminate left recursion:-

      A
     / \
    β   A´
       /  \
      α    A´
           |
           ε
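
The replacement rule can be sketched as a function that splits the alternatives of A into the left recursive ones (Aα) and the others (β). The function name and the list-of-symbols representation are assumptions for this sketch; it handles immediate left recursion only.

```python
def eliminate_immediate_left_recursion(nonterminal, alternatives):
    """Rewrite A → Aα1 / ... / β1 / ... as A → βA', A' → αA' / ε.

    `alternatives` is a list of right sides, each a list of grammar symbols.
    """
    alphas = [alt[1:] for alt in alternatives if alt and alt[0] == nonterminal]
    betas = [alt for alt in alternatives if not alt or alt[0] != nonterminal]
    if not alphas:                             # nothing to eliminate
        return {nonterminal: alternatives}
    prime = nonterminal + "'"
    return {
        nonterminal: [beta + [prime] for beta in betas],
        prime: [alpha + [prime] for alpha in alphas] + [["ε"]],
    }

print(eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]]))
```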


Example 1:
Consider the following grammar
a. E → E+T / T
b. T → T*F / F
c. F → (E) / id
Eliminate the immediate left recursions.
Solution:-
These productions are of the form A → Aα / β, which is replaced by
A → βA´
A´ → αA´ / ε
(a) E → E+T / T
The productions eliminating left recursion are,
E → TE´
E´ → +TE´ / ε
(b) T → T*F / F
T → FT´
T´ → *FT´ / ε
(c) F → (E) / id
This is not left recursive, so the production remains F → (E) / id

Example 2:- Eliminate left recursion in the following grammar.
S → Aa / b
A → Ac / Sd / e
Solution:-
1. Arrange the non-terminals in order: S, A
2. There is no immediate left recursion among the S productions. We then substitute the S productions in A → Sd to obtain the following productions.
A → Ac / (Aa/b)d / e
A → Ac / Aad / bd / e
Now this production is in immediate left recursion form,
A → A(c/ad) / bd / e
The productions eliminating left recursion are,
A → (bd/e)A´ ie, A → bdA´ / eA´
A´ → (c/ad)A´ / ε ie, A´ → cA´ / adA´ / ε
So the grammar is,
1. S → Aa / b
2. A → bdA´ / eA´
3. A´ → cA´ / adA´ / ε
2. Back Tracking:-
A general top-down parser may have to backtrack, as in the cad example above. The two top-down parsers which avoid backtracking are,
1. Recursive Descent Parser
2. Predictive Parser
1. Recursive Descent Parser:-
A parser that uses a set of recursive procedures to recognize its input with no backtracking is called a recursive descent parser.
2. Predictive Parser:-
A predictive parser is an efficient way of implementing recursive descent parsing by handling the stack of activation records explicitly.
The picture for a predictive parser is,

INPUT:   a + b $
STACK:   X Y Z $   (X on top, $ at the bottom)
The parsing program reads the input and the stack, consults the parsing table, and produces the OUTPUT.

The predictive parser has,

1. An input: the string to be parsed followed by $ (w$)
2. A stack: a sequence of grammar symbols preceded by $ (the bottom of stack marker)
3. A parsing table
4. An output


FIRST AND FOLLOW
The construction of a predictive parser is aided by two functions associated with a grammar G. The functions FIRST and FOLLOW allow us to fill in the entries of a predictive parsing table for G.
Ex. 1:-
A. Give the predictive parsing table for the following grammar:-
E → E+T / T
T → T*F / F
F → (E) / id
B. Show the moves of the parser for the input id+id*id
Solution:-
A. Elimination of left recursion:-
E → TE´
E´ → +TE´ / ε
T → FT´
T´ → *FT´ / ε
F → (E) / id
Finding FIRST and FOLLOW:-
FIRST(E) = FIRST(T) = FIRST(F) = { ( , id }
FIRST(E´) = { + , ε }
FIRST(T´) = { * , ε }
FOLLOW(E) = { ) , $ }
FOLLOW(E´) = FOLLOW(E) = { ) , $ }
FOLLOW(T) = (FIRST(E´) - {ε}) ∪ FOLLOW(E´) = { + , ) , $ }
FOLLOW(T´) = FOLLOW(T) = { + , ) , $ }
FOLLOW(F) = (FIRST(T´) - {ε}) ∪ FOLLOW(T´) = { * , + , ) , $ }
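
The sets above can be checked with a small fixed-point computation: the rules are applied repeatedly until no set grows any further. The grammar encoding and the function names first_of_sequence, compute_first and compute_follow are assumptions made for the sketch.

```python
EPS = "ε"
GRAMMAR = {                                    # grammar after left recursion elimination
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F":  [["(", "E", ")"], ["id"]],
}
START = "E"

def first_of_sequence(symbols, first):
    """FIRST of a string of grammar symbols, given FIRST of the non-terminals."""
    result = set()
    for symbol in symbols:
        symbol_first = first[symbol] if symbol in GRAMMAR else {symbol}
        result |= symbol_first - {EPS}
        if EPS not in symbol_first:
            return result
    result.add(EPS)                            # every symbol can derive ε
    return result

def compute_first():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in GRAMMAR.items():
            for alt in alternatives:
                new = first_of_sequence(alt, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def compute_follow(first):
    follow = {nt: set() for nt in GRAMMAR}
    follow[START].add("$")
    changed = True
    while changed:
        changed = False
        for nt, alternatives in GRAMMAR.items():
            for alt in alternatives:
                for i, symbol in enumerate(alt):
                    if symbol not in GRAMMAR:
                        continue
                    tail = first_of_sequence(alt[i + 1:], first)
                    new = tail - {EPS}
                    if EPS in tail:            # the rest can vanish: add FOLLOW(left side)
                        new |= follow[nt]
                    if not new <= follow[symbol]:
                        follow[symbol] |= new
                        changed = True
    return follow

FIRST = compute_first()
FOLLOW = compute_follow(FIRST)
print(FIRST)
print(FOLLOW)
```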
So the Predictive Parsing table is,

NT    id          +            *            (           )          $
E     E → TE´                               E → TE´
E´                E´ → +TE´                             E´ → ε     E´ → ε
T     T → FT´                               T → FT´
T´                T´ → ε       T´ → *FT´                T´ → ε     T´ → ε
F     F → id                                F → (E)

B. Moves made by the predictive parser on input id+id*id

STACK        INPUT          OUTPUT
$E           id+id*id$      E → TE´
$E´T         id+id*id$      T → FT´
$E´T´F       id+id*id$      F → id
$E´T´id      id+id*id$      Remove id
$E´T´        +id*id$        T´ → ε
$E´          +id*id$        E´ → +TE´
$E´T+        +id*id$        Remove +
$E´T         id*id$         T → FT´
$E´T´F       id*id$         F → id
$E´T´id      id*id$         Remove id
$E´T´        *id$           T´ → *FT´
$E´T´F*      *id$           Remove *
$E´T´F       id$            F → id
$E´T´id      id$            Remove id
$E´T´        $              T´ → ε
$E´          $              E´ → ε
$            $              Accept
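
The moves of the parser can be reproduced with a table-driven loop over an explicit stack. This is a sketch: the dictionary TABLE encodes the predictive parsing table for the expression grammar (an empty list stands for an ε right side), and the name predictive_parse is an assumption.

```python
TABLE = {
    ("E", "id"): ["T", "E'"],      ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"],      ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"],           ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def predictive_parse(tokens):
    """Return the productions applied, in order; raise SyntaxError on bad input."""
    tokens = list(tokens) + ["$"]
    stack = ["$", "E"]                         # start symbol on top of the $ marker
    pos = 0
    output = []
    while stack:
        top = stack.pop()
        look = tokens[pos]
        if top in NONTERMINALS:
            body = TABLE.get((top, look))
            if body is None:
                raise SyntaxError(f"no table entry for ({top}, {look})")
            output.append(f"{top} -> {' '.join(body) or 'ε'}")
            stack.extend(reversed(body))       # leftmost body symbol ends up on top
        elif top == look:
            pos += 1                           # matched a terminal (or the final $)
        else:
            raise SyntaxError(f"expected {top!r} but found {look!r}")
    return output

for production in predictive_parse(["id", "+", "id", "*", "id"]):
    print(production)
```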

Ex.No:2:- Give the Predictive parsing table for the following Grammar,
S → iEtSS´ / a
S´ → eS / ε
E → b
Solution:-
Elimination of Left Recursion
The above grammar has no left recursion. So we move to FIRST and FOLLOW.
First(S) = { i, a }
First(S´) = { e, ε }
First(E) = { b }
Follow(S) = (First(S´) - {ε}) ∪ Follow(S´) ∪ { $ } = { e, $ }
Follow(S´) = Follow(S) = { e, $ }
Follow(E) = { t }
So the Predictive Parsing table is,

NT    a        b        e                    i              t    $
S     S → a                                  S → iEtSS´
S´                      S´ → eS, S´ → ε                          S´ → ε
E              E → b

The entry for S´ on input e contains two productions, because the grammar is ambiguous.

Ex.No:3:- Give the Predictive parsing table for the following Grammar,
S → CC
C → cC / d
Solution:-
First(S) = First(C) = { c, d }
Follow(S) = { $ }
Follow(C) = First(C) ∪ Follow(S) = { c, d, $ }
So the predictive parsing table is

NT    c          d          $
S     S → CC     S → CC
C     C → cC     C → d

Ex.No:4:- Give the Predictive parsing table for the following Grammar,
S → iCtSS´ / a
S´ → eS / ε
C → b
Solution:-
FIRST(S) = { i, a }
FIRST(S´) = { e, ε }
FIRST(C) = { b }
FOLLOW(S) = (FIRST(S´) - {ε}) ∪ FOLLOW(S´) ∪ { $ } = { e, $ }
FOLLOW(S´) = FOLLOW(S) = { e, $ }
FOLLOW(C) = { t }
So the predictive parsing table is

NT    a        b        e                    i              t    $
S     S → a                                  S → iCtSS´
S´                      S´ → eS, S´ → ε                          S´ → ε
C              C → b


LR PARSERS:-
LR parsing is a technique for constructing efficient bottom-up parsers for a large class of context-free grammars. Such bottom-up parsers are called LR parsers, and the technique is called LR(k) parsing.
L denotes that the input sequence is processed from left to right
R denotes that a rightmost derivation is constructed (in reverse)
k denotes that at most k symbols of lookahead are used to make a decision.
Features Of LR Parsers:-
* LR parsers can be constructed to recognize virtually all programming language constructs for which a CFG can be written.
* The LR parsing method is more general than operator precedence or any of the other common shift reduce techniques.
* LR parsers can detect syntactic errors as soon as it is possible to do so on a left to right scan of the input.
* LR parsers can handle all languages recognizable by LL(1) parsers.
* LR parsers can handle a large class of context-free languages.
Drawbacks of LR Parser:-
Too much work has to be done to implement an LR parser manually for a typical programming language grammar.
An LR parser consists of two parts.
(i) A driver routine
(ii) A parsing table

LR Parsing Algorithm:
An LR parser consists of an input, an output, a stack, a driver program and a parsing table that has two functions.
1. ACTION 2. GOTO
The driver program is the same for all LR parsers. Only the parsing table changes from one parser to another. The parsing program reads symbols from an input buffer one at a time.
The program uses a STACK to store a string of the form S0 X1 S1 X2 S2 …… Xm Sm, where Sm is on top. Each Si is a symbol called a STATE and each Xi is a grammar symbol.


The function ACTION takes a state and an input symbol as arguments and produces one of four values.
1. Shift S, where S is a state
2. Reduce by a grammar production
3. Accept and
4. Error
The function GOTO takes a state and a grammar symbol as arguments and produces a state.

INPUT:   a1 … ai … an $
STACK:   Sm Xm Sm-1 Xm-1 …. S0   (Sm on top, S0 at the bottom)
The LR parsing program reads the input and the stack, consults the ACTION and GOTO parts of the parsing table, and produces the OUTPUT.

Different LR Parsing Techniques:-

There are three techniques for constructing an LR Parsing table for a Grammar.
1. Simple LR Parsing (SLR)
* Easy to implement
* Fails to produce a table for certain Grammars
2. Canonical LR parsing (CLR)
* Most Powerful
* Very Expensive to implement
3. Look Ahead LR Parsing (LALR Parsing)
* It is intermediate in power between the SLR and the Canonical LR Methods.


LR Grammars:-
A grammar for which a parsing table can be constructed and for which every entry is uniquely defined is said to be an LR grammar.
Not every CFG is an LR grammar.
Closure Operation:-
If I is a set of items for a grammar G then Closure(I) is the set of items constructed from I by the two rules.
1. Initially every item in I is added to Closure(I).
2. If A → α.Bβ is in Closure(I) and B → γ is a production, then add the item B → .γ to Closure(I), if it is not already there.
Augmented Grammar:-
If G is a grammar with start symbol S, then G´, the augmented grammar for G, is G with a new start symbol S´ and the production S´ → S

Ex:-1. Consider the grammar given below,

E → E+T / T
T → T*F / F
F → (E) / id
Construct an LR Parsing table for the above grammar.
Solution:-
(i) Elimination of left recursion
Left recursion does not have to be eliminated for LR parsing, so the grammar is used as given.
(ii) Finding FIRST and FOLLOW (for the grammar as given):-
FIRST(E) = FIRST(T) = FIRST(F) = { ( , id }
FOLLOW(E) = { + , ) , $ }
FOLLOW(T) = FOLLOW(E) ∪ { * } = { + , * , ) , $ }
FOLLOW(F) = FOLLOW(T) = { + , * , ) , $ }
(iii) Numbering the Grammar:-
1. E → E+T
2. E → T
3. T → T*F
4. T → F
5. F → (E)
6. F → id
Augmented Grammar (I´)
E´ → E
E → E+T
E → T
T → T*F
T → F
F → (E)
F → id
Closure (I´) = I0:
E´ → .E
E → .E+T
E → .T
T → .T*F
T → .F
F → .(E)
F → .id

GOTO(I0, E) = I1
E´ → E.
E → E.+T

GOTO(I0, T) = I2
E → T.
T → T.*F

GOTO(I0, F) = I3
T → F.

GOTO(I0, ( ) = I4
F → (.E)
E → .E+T
E → .T
T → .T*F
T → .F
F → .(E)
F → .id

GOTO(I0, id) = I5
F → id.

GOTO(I1, +) = I6
E → E+.T
T → .T*F
T → .F
F → .(E)
F → .id

GOTO(I2, *) = I7
T → T*.F
F → .(E)
F → .id
GOTO(I4, E) = I8
F → (E.)
E → E.+T
GOTO(I4, T) = I2
E → T.
T → T.*F

GOTO(I4, F) = I3
T → F.

GOTO(I4, ( ) = I4
F → (.E)
E → .E+T
E → .T
T → .T*F
T → .F
F → .(E)
F → .id

GOTO(I4, id) = I5
F → id.

GOTO(I6, T) = I9
E → E+T.
T → T.*F

GOTO(I6, F) = I3
T → F.

GOTO(I6, ( ) = I4
F → (.E)
E → .E+T
E → .T
T → .T*F
T → .F
F → .(E)
F → .id

GOTO(I6, id) = I5
F → id.

GOTO(I7, F) = I10
T → T*F.

GOTO(I7, ( ) = I4
F → (.E)
E → .E+T
E → .T
T → .T*F
T → .F
F → .(E)
F → .id

GOTO(I7, id) = I5
F → id.

GOTO(I8, ) ) = I11
F → (E).

GOTO(I8, +) = I6
E → E+.T
T → .T*F
T → .F
F → .(E)
F → .id

GOTO(I9, *) = I7
T → T*.F
F → .(E)
F → .id
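Reduce:-
A complete item A → α. in state Ii gives a reduce entry for every terminal in FOLLOW(A),
where FOLLOW(E) = { + , ) , $ } and FOLLOW(T) = FOLLOW(F) = { + , * , ) , $ } in the
original grammar. The item E´ → E. in I1 gives ACTION(1,$) = acc.

E → T. (I2)
ACTION(2,FOLLOW(E)) = (2,+), (2,) ), (2,$) → r2

T → F. (I3)
ACTION(3,FOLLOW(T)) = (3,+), (3,*), (3,) ), (3,$) → r4

F → id. (I5)
ACTION(5,FOLLOW(F)) = (5,+), (5,*), (5,) ), (5,$) → r6

E → E+T. (I9)
ACTION(9,FOLLOW(E)) = (9,+), (9,) ), (9,$) → r1

T → T*F. (I10)
ACTION(10,FOLLOW(T)) = (10,+), (10,*), (10,) ), (10,$) → r3

F → (E). (I11)
ACTION(11,FOLLOW(F)) = (11,+), (11,*), (11,) ), (11,$) → r5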

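      ACTION                        GOTO
State +    *    (    )    id   $    E    T    F
0     -    -    S4   -    S5   -    1    2    3
1     S6   -    -    -    -    acc  -    -    -
2     r2   S7   -    r2   -    r2   -    -    -
3     r4   r4   -    r4   -    r4   -    -    -
4     -    -    S4   -    S5   -    8    2    3
5     r6   r6   -    r6   -    r6   -    -    -
6     -    -    S4   -    S5   -    -    9    3
7     -    -    S4   -    S5   -    -    -    10
8     S6   -    -    S11  -    -    -    -    -
9     r1   S7   -    r1   -    r1   -    -    -
10    r3   r3   -    r3   -    r3   -    -    -
11    r5   r5   -    r5   -    r5   -    -    -
(Si = shift and go to state i, rj = reduce by production j, acc = accept, - = error)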
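How a shift-reduce driver consults such a table can be sketched in Python. This is a minimal sketch under stated assumptions, not the notes' own implementation: it uses the SLR entries for the numbered expression grammar, with reduce actions placed on FOLLOW(E) = { + , ) , $ } and FOLLOW(T) = FOLLOW(F) = { + , * , ) , $ }, and the names PRODUCTIONS, GOTO, build_action and parse are illustrative.

```python
# Production number -> (head, length of body), following the numbering of the grammar.
PRODUCTIONS = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3), 4: ("T", 1), 5: ("F", 3), 6: ("F", 1)}

GOTO = {
    (0, "E"): 1, (0, "T"): 2, (0, "F"): 3,
    (4, "E"): 8, (4, "T"): 2, (4, "F"): 3,
    (6, "T"): 9, (6, "F"): 3,
    (7, "F"): 10,
}


def build_action():
    """Build the ACTION part of the table as a dict keyed by (state, terminal)."""
    action = {}
    for state in (0, 4, 6, 7):
        action[state, "("] = ("s", 4)
        action[state, "id"] = ("s", 5)
    action[1, "+"] = ("s", 6)
    action[1, "$"] = ("acc", 0)
    action[8, "+"] = ("s", 6)
    action[8, ")"] = ("s", 11)
    action[2, "*"] = ("s", 7)
    action[9, "*"] = ("s", 7)
    reduce_entries = [
        (2, 2, ["+", ")", "$"]), (9, 1, ["+", ")", "$"]),
        (3, 4, ["+", "*", ")", "$"]), (10, 3, ["+", "*", ")", "$"]),
        (5, 6, ["+", "*", ")", "$"]), (11, 5, ["+", "*", ")", "$"]),
    ]
    for state, production, lookaheads in reduce_entries:
        for terminal in lookaheads:
            action[state, terminal] = ("r", production)
    return action


def parse(tokens):
    """Parse a token list; return the production numbers in the order they are reduced."""
    action = build_action()
    tokens = list(tokens) + ["$"]
    stack = [0]                      # stack of states
    pos = 0
    reductions = []
    while True:
        kind, arg = action.get((stack[-1], tokens[pos]), ("err", 0))
        if kind == "s":              # shift: push the new state, consume the token
            stack.append(arg)
            pos += 1
        elif kind == "r":            # reduce: pop |body| states, then take GOTO
            head, length = PRODUCTIONS[arg]
            del stack[len(stack) - length:]
            stack.append(GOTO[stack[-1], head])
            reductions.append(arg)
        elif kind == "acc":
            return reductions
        else:
            raise SyntaxError("unexpected %r at position %d" % (tokens[pos], pos))


print(parse(["id", "*", "id", "+", "id"]))
```

For the input id * id + id the reductions come out in the order of a rightmost derivation in reverse; an input such as id id reaches an error entry and raises SyntaxError.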


Ex:-2. Consider the grammar given below,
S → CC
C → cC / d
Construct a CLR Parsing table for the above grammar.

Solution:-
(i) Left recursion
The grammar has no left recursion, and none would have to be eliminated for LR parsing.
S → CC
C → cC / d
(ii) Finding FIRST and FOLLOW:-
FIRST(S) = FIRST(C) = { c , d }
FOLLOW(S) = { $ }
FOLLOW(C) = FIRST(C) + FOLLOW(S) = { c , d , $ }
(iii) Numbering the Grammar:-
1. S → CC
2. C → cC
3. C → d
Augmented Grammar
S´ → S
S → CC
C → cC
C → d
Closure (S´ → .S, $) = I0
S´ → .S, $
S → .CC, $
C → .cC, c/d
C → .d, c/d
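The lookaheads in the canonical LR closure come from FIRST(βa) for each item A → α.Bβ, a. The following Python sketch shows that computation for this grammar. It is illustrative only: the item encoding (head, body, dot, lookahead) and the names GRAMMAR, FIRST, first_of, closure and goto are assumptions, and first_of relies on the fact that this grammar has no ε-productions.

```python
GRAMMAR = {
    "S'": [("S",)],
    "S": [("C", "C")],
    "C": [("c", "C"), ("d",)],
}
# FIRST sets of the non-terminals, as computed in step (ii).
FIRST = {"S": {"c", "d"}, "C": {"c", "d"}}


def first_of(symbols):
    """FIRST of a non-empty symbol string; valid because no symbol derives ε."""
    leading = symbols[0]
    return FIRST.get(leading, {leading})   # a terminal (or $) is its own FIRST


def closure(items):
    """Closure of a set of LR(1) items (head, body, dot, lookahead)."""
    result = set(items)
    worklist = list(items)
    while worklist:
        head, body, dot, lookahead = worklist.pop()
        if dot == len(body) or body[dot] not in GRAMMAR:
            continue
        beta_a = body[dot + 1:] + (lookahead,)
        for production in GRAMMAR[body[dot]]:
            for b in first_of(beta_a):
                item = (body[dot], production, 0, b)
                if item not in result:
                    result.add(item)
                    worklist.append(item)
    return result


def goto(items, symbol):
    """Move the dot over symbol in every item that allows it, then close."""
    moved = {(h, b, d + 1, a) for (h, b, d, a) in items if d < len(b) and b[d] == symbol}
    return closure(moved)


I0 = closure({("S'", ("S",), 0, "$")})
I2 = goto(I0, "C")
print(len(I0), len(I2))
```

Each lookahead is stored as a separate item here, so an entry written as C → .cC, c/d in the notes corresponds to two tuples in the set.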

GOTO(I0, S) = I1
S´ → S., $

GOTO(I0, C) = I2
S → C.C, $
C → .cC, $
C → .d, $

GOTO(I0, c) = I3
C → c.C, c/d
C → .cC, c/d
C → .d, c/d

GOTO(I0, d) = I4
C → d., c/d

GOTO(I2, C) = I5
S → CC., $

GOTO(I2, c) = I6
C → c.C, $
C → .cC, $
C → .d, $

GOTO(I2, d) = I7
C → d., $

GOTO(I3, C) = I8
C → cC., c/d

GOTO(I3, c) = I3
C → c.C, c/d
C → .cC, c/d
C → .d, c/d

GOTO(I3, d) = I4
C → d., c/d

GOTO(I6, C) = I9
C → cC., $

GOTO(I6, c) = I6
C → c.C, $
C → .cC, $
C → .d, $

GOTO(I6, d) = I7
C → d., $

Reduce:-
S´ → S., $ (I1)
ACTION(1,$) = (1,$) → acc

C → d., c/d (I4)
ACTION(4,c/d) = (4,c), (4,d) → r3

S → CC., $ (I5)
ACTION(5,$) = (5,$) → r1

C → d., $ (I7)
ACTION(7,$) = (7,$) → r3

C → cC., c/d (I8)
ACTION(8,c/d) = (8,c), (8,d) → r2

C → cC., $ (I9)
ACTION(9,$) = (9,$) → r2


INTERMEDIATE CODE GENERATION

A compiler while translating a source program into a functionally equivalent object code
representation may first generate an intermediate representation.
Advantages of generating intermediate representation
1. Ease of conversion from the source program to the intermediate code
2. Ease with which subsequent processing can be performed from the intermediate code

Parser --(parse tree)--> Intermediate Code Generator --(intermediate code)--> Code Generator

INTERMEDIATE LANGUAGES:
There are three kinds of Intermediate representation. They are,
1. Syntax Trees
2. Postfix Notation
3. Three address code
1. Syntax Tree:-
A syntax tree depicts the natural hierarchical structure of a source program. A DAG
(Directed Acyclic Graph) gives the same information in a more compact way, because common
subexpressions are identified.
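A syntax tree and DAG for the assignment statement a := b * -c + b * -c are shown below
(children are indented under their parent).

Syntax Tree:

assign
    a
    +
        *
            b
            uminus
                c
        *
            b
            uminus
                c

DAG:

assign
    a
    +
        *    (both operands of + point to this single node)
            b
            uminus
                c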


2. Postfix notation:-
Postfix notation is a linearized representation of a syntax tree. It is a list of the nodes of
the tree in which a node appears immediately after its children.
The postfix notation for the syntax tree is,
a b c uminus * b c uminus * + assign
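The postfix form is simply a post-order walk of the syntax tree, as the following Python sketch shows. The Node class and the helper names are assumptions made for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    label: str
    children: tuple = ()


def postfix(node):
    """List the labels of the tree so that each node follows its children."""
    out = []
    for child in node.children:
        out.extend(postfix(child))
    out.append(node.label)
    return out


def b_times_minus_c():
    # Subtree for b * -c
    return Node("*", (Node("b"), Node("uminus", (Node("c"),))))


# Syntax tree for a := b * -c + b * -c
tree = Node("assign", (Node("a"), Node("+", (b_times_minus_c(), b_times_minus_c()))))
print(" ".join(postfix(tree)))
```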
3. Three Address Code:-
Three Address code is a sequence of statements of the general form
x := y op z
where x,y and z are names, constants or compiler generated temporaries.
op stands for any operator such as a fixed or floating point arithmetic operator or a logical
operator on a Boolean valued data.
The Three Address Code for the source language expression like x+y*z is,
t1:= y * z
t2 := x + t1
Where t1 and t2 are compiler generated temporary names
So, three address code is a linearized representation of a syntax tree or a dag in which explicit
names correspond to the interior nodes of the graph.
Three Address Code Corresponding to the syntax tree and DAG is,
Code for Syntax Tree
t1 := -c
t2 := b * t1
t3 := -c
t4 := b * t3
t5 := t2 + t4
a := t5
Code for DAG
t1 := -c
t2 := b * t1
t5 := t2 + t2
a := t5
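Both code sequences can be produced by one small generator, sketched below in Python. This is an assumption-laden illustration rather than the method of the notes: expressions are encoded as nested tuples, gen_tac is a hypothetical name, and the share flag imitates the DAG by reusing the temporary of a subexpression that has already been computed. Temporaries are numbered consecutively, so the DAG version ends with t3 where the listing above keeps the name t5.

```python
def gen_tac(expr, target, share=False):
    """Generate three address code for `target := expr`.

    expr is a name (str) or a tuple ("uminus", e), ("+", l, r) or ("*", l, r).
    With share=True, identical subexpressions are computed only once.
    """
    code = []
    cache = {}                       # subexpression -> temporary holding its value

    def emit(e):
        if isinstance(e, str):       # a leaf: the name itself
            return e
        if share and e in cache:     # common subexpression already computed
            return cache[e]
        args = [emit(a) for a in e[1:]]
        temp = "t%d" % (len(code) + 1)
        if e[0] == "uminus":
            code.append("%s := -%s" % (temp, args[0]))
        else:
            code.append("%s := %s %s %s" % (temp, args[0], e[0], args[1]))
        cache[e] = temp
        return temp

    result = emit(expr)
    code.append("%s := %s" % (target, result))
    return code


product = ("*", "b", ("uminus", "c"))
expression = ("+", product, product)

print("\n".join(gen_tac(expression, "a")))
print()
print("\n".join(gen_tac(expression, "a", share=True)))
```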


Types of Three Address Statements:-
1. Assignment statement of the form x := y op z
2. Assignment instructions of the form x := op z
where op is a unary operation.
3. Copy statements of the form x := y
where, the value of y is assigned to x.
4. The unconditional jump goto L
5. Conditional jumps such as if x relop y goto L
6. param x and call p, n for procedure calls, and return y for returning a value.
7. Indexed assignments of the form x := y[i] and x[i] := y
8. Address and pointer assignments, x :=&y, x := *y and *x := y

Implementations of Three Address Statements:

Three address statements can be implemented in three ways,
1. Quadruples
2. Triples
3. Indirect Triples
Quadruples:-
A Quadruple is a record structure with four fields, which we call op, arg1, arg2, and
result. The op field contains an internal code for the operator.
For Eg, the three address statements,
x := y op z is represented by
y in arg1
z in arg2
x in result.
The quadruples for the assignment a:=b* -c + b* -c are,

      op      arg1  arg2  result

(0)   uminus  c           t1
(1)   *       b     t1    t2
(2)   uminus  c           t3
(3)   *       b     t3    t4
(4)   +       t2    t4    t5
(5)   :=      t5          a


Triples:-
A triple is a record structure with three fields: op, arg1, arg2. This method is used to
avoid entering temporary names into the symbol table.
Ex. Triple representation of a:= b * -c + b * -c

      op      arg1  arg2
(0)   uminus  c
(1)   *       b     (0)
(2)   uminus  c
(3)   *       b     (2)
(4)   +       (1)   (3)
(5)   assign  a     (4)
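The relation between the two representations can be sketched in Python by converting the quadruples of the example into triples. The function name quads_to_triples and the encoding are assumptions for this illustration: a reference to triple (n) is stored as the integer n, and a missing argument as None.

```python
# Quadruples (op, arg1, arg2, result) for a := b * -c + b * -c
QUADS = [
    ("uminus", "c", None, "t1"),
    ("*", "b", "t1", "t2"),
    ("uminus", "c", None, "t3"),
    ("*", "b", "t3", "t4"),
    ("+", "t2", "t4", "t5"),
    (":=", "t5", None, "a"),
]


def quads_to_triples(quads):
    """Replace each temporary name by the position of the triple that computes it."""
    position = {}                    # temporary name -> index of its triple
    triples = []

    def ref(arg):
        return position.get(arg, arg)

    for op, arg1, arg2, result in quads:
        if op == ":=":               # copy: the target becomes arg1 of an assign triple
            triples.append(("assign", result, ref(arg1)))
        else:
            triples.append((op, ref(arg1), ref(arg2)))
            position[result] = len(triples) - 1
    return triples


for index, triple in enumerate(quads_to_triples(QUADS)):
    print("(%d)" % index, triple)
```

No temporary name survives the conversion, which is why triples avoid entering temporaries into the symbol table.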

Indirect Triples:-
Listing pointers to triples, rather than listing the triples themselves, is called indirect
triples.
Eg. Indirect Triple Representation of a := b * -c + b * -c

      statement
(0)   (10)
(1)   (11)
(2)   (12)
(3)   (13)
(4)   (14)
(5)   (15)

      op      arg1  arg2
(10)  uminus  c
(11)  *       b     (10)
(12)  uminus  c
(13)  *       b     (12)
(14)  +       (11)  (13)
(15)  assign  a     (14)


BASIC BLOCKS & FLOW GRAPHS

Basic Blocks:
A block of code means a block of intermediate code with no jumps in except at the
beginning and no jumps out except at the end.
A basic block is a sequence of consecutive statements in which flow of control enters at
the beginning and leaves at the end without halt or possibility of branching except at the end.

Algorithm for Partition into Basic Blocks:-

Input:- A sequence of three address statements.
Output:- A list of basic blocks with each three address statement in exactly one block.
Method:-
1. We first determine the set of leaders, the first statements of basic blocks.
The rules we use are the following,
(i) The first statement is a leader.
(ii) Any statement that is the target of a conditional or unconditional GOTO is a
leader.
(iii) Any statement that immediately follows a conditional or unconditional GOTO
statement is a leader.
2. For each leader, its basic block consists of the leader and all statements up to but not
including the next leader or the end of the program.
Example:-
Consider the following fragment of code, which computes the dot product of two vectors
A and B of length 20.
Begin
PROD:=0
I:=1
Do
Begin
PROD:=PROD+A[I]*B[I]
I:=I+1
End
While I<=20
End

The three address statements performing the computation of the above program (for a
machine with four bytes per word) are,
1. PROD:=0
2. I:=1
3. t1:=4*I
4. t2:=A[t1]
5. t3:=4*I
6. t4:=B[t3]
7. t5:=t2*t4
8. t6:=PROD+t5
9. PROD:=t6
10. t7:=I+1
11. I:=t7
12. if I<=20 GOTO (3)
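The leaders are statements 1 and 3: statement 1 is the first statement, and statement 3 is the
target of the GOTO in statement 12. So there are two basic blocks.

Block 1.

1. PROD:=0
2. I:=1

Block 2.

3. t1:=4*I
4. t2:=A[t1]
5. t3:=4*I
6. t4:=B[t3]
7. t5:=t2*t4
8. t6:=PROD+t5
9. PROD:=t6
10. t7:=I+1
11. I:=t7
12. if I<=20 GOTO (3)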
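The partitioning algorithm can be sketched in Python as follows. This is a minimal sketch under stated assumptions: statements are plain strings numbered from 1, a jump is recognised only by the text GOTO (n) as written in the example, and the names find_leaders and basic_blocks are hypothetical.

```python
import re

STATEMENTS = [
    "PROD:=0",
    "I:=1",
    "t1:=4*I",
    "t2:=A[t1]",
    "t3:=4*I",
    "t4:=B[t3]",
    "t5:=t2*t4",
    "t6:=PROD+t5",
    "PROD:=t6",
    "t7:=I+1",
    "I:=t7",
    "if I<=20 GOTO (3)",
]


def find_leaders(statements):
    """Return the sorted statement numbers (1-based) that start a basic block."""
    if not statements:
        return []
    leaders = {1}                                   # rule (i)
    for number, text in enumerate(statements, start=1):
        match = re.search(r"GOTO \((\d+)\)", text, re.IGNORECASE)
        if match:
            leaders.add(int(match.group(1)))        # rule (ii): target of a jump
            if number < len(statements):
                leaders.add(number + 1)             # rule (iii): statement after a jump
    return sorted(leaders)


def basic_blocks(statements):
    """Split the statements into blocks, each running from a leader up to the next one."""
    leaders = find_leaders(statements)
    bounds = leaders + [len(statements) + 1]
    return [statements[start - 1:end - 1] for start, end in zip(bounds, bounds[1:])]


print(find_leaders(STATEMENTS))
for block in basic_blocks(STATEMENTS):
    print(block)
```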