0% found this document useful (0 votes)
125 views47 pages

Intermediate Code Generation Guide

Uploaded by

Yohannes Dereje
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
125 views47 pages

Intermediate Code Generation Guide

Uploaded by

Yohannes Dereje
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 47

Principles of Compiler

Design

Chapter 6: Intermediate Code


Generation
Contents

• Introduction

– Intermediate code generation

– Why we need Intermediate code generator

• Intermediate code representation

– Parse trees, syntax trees, and Direct Acyclic

Graph(DAG),

– Postfix notation,

– Three-address code,

• Data structure of TAC

– quadruples, triples and indirect Triples


Introduction — Intermediate code
generation
• In compiler, the front-end translates a source program into an

intermediate representation from which the back end

generates target code.

• Intermediate code is:

– The output of the Parser and the input to the Code Generator.

– Relatively machine-independent and allows the compiler to

be retargeted.

– Relatively easy to manipulate (optimize).

• Intermediate code serves as a bridge between the high-


level source language and the low-level machine code.
Why Intermediate Code?
• Facilitates retargeting: enables attaching a back
end for the new machine to an existing front end.
• Enables machine-independent code optimization.
• Simplicity : Conversion to Target Code is made
easy.
Without IR vs. With IR
• Without using intermediate code for n-source languages

and m-Target languages, for each source-target pair we

need (n*m) Compilers.

• If we use Intermediate code, We require n-Compilers to

convert each source language into Intermediate code and

m-Compilers to convert Intermediate code into m-target

languages. Thus We require only (n+m) Compilers.


Intermediate code Representations

• In compiler, the front-end translates a source program into

an intermediate representation from which the back end

generates target code.


Intermediate Languages

 The most commonly used intermediate representations

were :
– Syntax Tree
– Direct Acyclic Graph (DAG)
– Postfix Notation
– 3 Address Code
 Graphical Representation

• Includes-
– Syntax Tree
In addition, we use:
• Static Single Assignment (SSA)
– Direct Acyclic Graph(DAG)
• Control Flow Graph (CFG)
Syntax Tree or Abstract Syntax Tree(AST)
 Syntax Tree depicts the hierarchical structure of a

source program.

 Syntax tree (AST) is a condensed form of parse tree

useful for representing language constructs.

 Graphical Intermediate Representation

 Ex
Syntax Tree / Abstract Syntax Tree(AST)

 Example - Parse tree and syntax tree for 3 * 5 + 4

as follows.
Parse Tree VS Syntax Tree
Parse Tree Syntax Tree

• A parse tree is a graphical • A syntax tree (AST) is a


representation of a condensed form of parse
replacement process in a tree
derivation

• Each interior node • Each interior node


represents a grammar Rule represents an operator
• Each leaf node represents a • Each leaf node represents
terminal an operand

• Parse tree represent every • Syntax tree does not


detail from the real syntax represent every detail from
the real syntax
Constructing Syntax Tree For Expression

• Each node in a syntax tree can be implemented in a record


with several fields.
• In the node of an operator, one field contains operator and
remaining field contains pointer to the nodes for the
operands.
• When used for translation, the nodes in a syntax tree may
contain addition of fields to hold the values of attributes
attached to the node.
• Following functions are used to create syntax tree

1. mknode(op,left,right): creates an operator node with


label op and two fields containing pointers to left and
right.
Constructing Syntax Tree For Expression

• Example -
a–4+c
• The tree is constructed bottom up

P1 = mkleaf(id,entry a)
P2 = mkleaf(num, 4)
P3 = mknode(-, P1, P2)
P4 = mkleaf(id,entry c)
P5 = mknode(+, P3, P4)
Syntax directed definition
• Syntax trees for assignment statements are produced
by the syntax-directed definition.
• Non terminal S generates an assignment statement.
• The two binary operators + and * are examples of the
full operator set in a typical language.
• Operator associates and precedencies are the usual
ones, even though they have not been put into the
grammar.
• This definition constructs the tree from the input a:=b*
-c + b* -c
Syntax directed definition
This definition constructs the tree from the input
a:=b* -c + b* -c
Syntax directed definition
 Two representations of the syntax tree are as
follows.
Direct Acyclic Graph (DAG)
 DAG : Direct Acyclic Graph
 Variant of Syntax tree with a unique node for
each value. [contains no cycles]
 Helps optimize the code by identifying common
sub expressions in syntax tree.
 SDD used to generate Syntax tree will be used to
construct DAG too, with a simple check:

 The process of making this check is an overhead;


hence constructing DAG is costly.
Direct Acyclic Graph (DAG)
 DAG also gives the hierarchical structure of source
program but in a more compact way because
common sub expressions are identified.
Direct Acyclic Graph (DAG)
Postfix Notation
 Linearized representation of syntax tree
 In postfix notation, each operator appears
immediately after its last operand.
 Operators can be evaluated in the order in which
they appear in the string
 EXAMPLE
 Source String : a := b * -c + b * -c
 Postfix String: a b c uminus * b c uminus * +
assign
Postfix Notation - Rules
1) If E is a variable or constant, then the postfix notation for
E is E itself.
2) If E is an expression of the form E1 op E2 then postfix
notation for E is
– E1E2op.

3) If E is an expression of the form (E), then the postfix


notation for E is the same as the postfix notation for E.
4) For unary operation –E the postfix is E
– Ex: postfix notation for 9- (5+2) is 952+-

 Postfix notation of an infix expression can be obtained


using stack
Three-address Code

• In Three address statement, at most 3 addresses are used


to represent any statement.
• The reason for the term “three address code” is that each
statement contains 3 addresses at most.
– Two for the operands and one for the result.
• General Form of three Address Code is:

a = b op c where,
– a, b, c are the operands that can be names, constants
or compiler generated temporaries.
– op represents operator, such as fixed or floating point
arithmetic operator or a logical operator on Boolean
valued data.
Three-address Code
• Linearized representation of syntax tree or DAG.
• At most one operator on RHS of an instruction.
• Each instruction can have up to three addresses.
• Address can either be a
– name(identifier)
– constant(number)
– Temporary (holds intermediate result)
• Thus a source language expression like x + y * z
might be translated into a sequence
Three-address Code

• Example #2: Consider the assignment a:=b*-c+b*-c:


• The tree address code(TAC) representation will be;
T1 := -c
T2: = b*T1
T3: = T2 + T2
a := T3
• Let us calculate addresses based on widths, using the formula
address(List[i]) = base+i1*w1+i2*w2+………………+ik*wk,
where
– base – the address of first element
– W – width of the array, for Integer =4, char =1, Float =4, double =8, …
– i –the index of array
• Implementation of Arrays
• Example: calculate address(list[1]), assume base = 12 and
element size =4.
12 16 20 24 28 32 36 40 44 48
10 5 20 12 25 21 48 52 99 01

• Access function for single-dimensioned arrays:


address(list[k])= address(list[1])+((k - 1) * element_size)
address(list[4])= 12+((4-1)*4) = > 12+12 =24
address(list[3])= 12+((3-1)*4) = > 12+8 =20
address(list[2])= 12+((2-1)*4) = > 12+4 =16
address(list[1])= 12+((1-1)*4) = > 12+0 =12
• Translation of Array References – Example
– Let a denote an integer array of size 5, and
let c and i be integers.
– Assuming that the width of an integer is 4.
– Three address code for expression c+a[i] is
t1 = i * 4
t2 = a [ t 1 ] or a+t1
t 3 = C + t2
• TAC generation for Arrays
• Implementing Multi-dimensioned Arrays
• Multi-dimensioned arrays must be mapped to linear memory.
• Given this 3 x 3 array: A[3][3] =
347
625
138
• There are two common ordering ways:
• Row major order (stored by rows) – used in most languages
– 3, 4, 7, 6, 2, 5, 1, 3, 8
• Column major order (stored by columns) – used in Fortran
– 3, 6, 1, 4, 2, 3, 7, 5, 8
• Accessing an Element in a Multi-dimensioned Array
Location (a[i,j]) = address of a [row_lb,col_lb] +
(((number of rows above ith row) * size of row) +
(number of elements left to the jth column)) *
element_size
Location (a[i,j]) = address of a [row_lb,col_lb] +(((i -
row_lb) * n) + (j - col_lb)) * element_size
• Translation of Array References – Example
• Example 2- Let a denote a 3 x 3 array of integers, and let c, i,
and j all denote integers.
• Three address code for expression c+a[i][i] is

t1 = i * 12
t2=j*4
t 3 = t 1+ t 2
t 4 = a [ t 3 ] or a + t3
t s = C + t4
Implementation of Three-Address Statements

• Three address code is represented as record

structure with fields for operator and operands.

• These records can be stored as array or linked list.

• Those representations are

– Quadruples

– Triples

– Indirect triples
Quadruples

• A quadruple is a record structure with four fields,

which are op, ag1, arg2 and result

• The op field contains an internal code for the operator.

The three address statement


– x:= y op z is represented by placing y in arg1, z in arg2 and x

in result.

• Disadvantage – Temporary names have to be entered


Quadruples
Quadruples
TRIPLES
 In triples representation, the use of temporary
variables is avoided & instead serial number of
statement is used.
 So three address statements can be represented by
records with only there fields OP, arg1 & arg2.
 Since, there fields are used this intermediated code
formal is known as triples
 Advantages
 No need to use temporary variable which saves memory
as well as time
 Disadvantages
 Code Immovability
TRIPLES
INDIRECT TRIPLES
 This representation is an enhancement over triple
representation.
 It uses an additional instruction array to led the pointer
to the triples in the desired order.
 Since, it uses pointers instead of position to stage
reposition the expression to produce an optimized
code.
Indirect Triples - Example
Con…
Con…
PROBLEM 1
• Translate the following expression to quadruple tuples &
indirect tuples
a=b*-c+b*-c
• Sol : - Three address code for given expression is
TAC
t1 = uniminus c
t2 = b* t1
t3 = uniminus c
t4 = b* t3
t5 = t2 + t4
Q = t5
Summary

• Data Structures for TAC :


• Three address code is represented as record structure with
fields for operator and operands.
• These records can be stored as array or linked list.
• We discuss the following arrangements :
1. Quadruples

2. 2. Triples
3. 3. Indirect Triples

You might also like