INTERMEDIATE CODE
GENERATION
DR. K. M. AZHARUL HASAN
Phases of Compiler
Input Source Program
↓
Lexical Analyzer
↓
Syntax Analyzer
↓
Symbol Table
Semantic Analyzer Error Handler
Manager
↓
Intermediate Code
Generator
↓
Code Optimizer
↓
Code Generator
↓
Dr. Azhar,
CSE Dept. KUET
2
Target Program Wednesday, November 9, 2022
INTERMEDIATE CODE
3
Defined as
The source code of a program is translated into a form
called intermediate code which is suitable for code-
improvement and transformations before use to
generate object or machine code for a target machine.
Dr. Azhar, CSE Dept. KUET Wednesday, November 9, 2022
INTERMEDIATE CODE GENERATION
4
Translate the source program Such as
declarations, assignments and flow-of-control statements
into an intermediate form
Dr. Azhar, CSE Dept. KUET Wednesday, November 9, 2022
Intermediate Languages
5
Three types of intermediate representations
1. Syntax Trees
2. Postfix notation
3. Three Address Code
Dr. Azhar, CSE Dept. KUET Wednesday, November 9, 2022
Graphical Representations
6
A syntax tree depicts the natural hierarchical structure of a
source program.
Ex. A syntax tree for the assignment statement a:=b*-c+a*-c
Dr. Azhar, CSE Dept. KUET Wednesday, November 9, 2022
Graphical Representations
• There are two representations of the syntax tree.
• Each node is represented as a record with a field for its operator
and additional fields for pointers to its children.
• All the nodes in the syntax tree can be visited by following
pointers, starting from the root at position .
Dr. Azhar, CSE Dept. KUET 7 Wednesday, November 9, 2022
Graphical Representations
All nodes are allocated from an array of records and the index
or position of the node serves as the pointer to the node.
0 id b
1 id c
2 uminus 1
3 * 0 2
4 id a
5 id c
6 uminus 5
7 * 4 6
8 + 3 7
9 id a
10 assign 9 8
11 ……
Dr. Azhar, CSE Dept. KUET 8 Wednesday, November 9, 2022
Three-Address Code
A three-address statement is an abstract form of intermediate code.
Three-address code is a sequence of statements of the form
x:= y op z
where x, y, and z are names, constants, or compiler-generated
temporaries;
op stands for any operator, such as a fixed- or floating-point
arithmetic operator, or a logical operator on Boolean-valued data.
The reason for the term ”three-address code” is that each statement
usually contains three addresses, two for the operands and one for
the result.
Dr. Azhar, CSE Dept. KUET 9 Wednesday, November 9, 2022
Three-Address Code
No arithmetic expressions are permitted, as there is only one
operator on the right side of a statement
Thus expression like x+y*z might be translated into a sequence
t1: = y*z
where t1 and t2 are compiler-
t2:=x+t1 generated temporary names.
Dr. Azhar, CSE Dept. KUET 10 Wednesday, November 9, 2022
Three-Address Code
The use of names for the intermediate values computed by a
program allow- three-address code to be easily rearranged
a : = b*-c + a*-c
t1 := -c
t2 := b * t1
t3 := -c
t4 := a * t3
t5 := t2 + t4
a := t5
Dr. Azhar, CSE Dept. KUET 11 Wednesday, November 9, 2022
Three-Address Code : Types
12
Assignment statements of the form x: = y op z, where op is a
binary arithmetic or logical operation.
Assignment instructions of the form x:= op y, where op is a
unary operation.
Unary operations include
unary minus, logical negation, shift operators, and conversion
operators.
Dr. Azhar, CSE Dept. KUET Wednesday, November 9, 2022
Types Of Three-Address Statements
13
Copy statements of the form x: = y where the value of y
is assigned to x.
The unconditional jump goto L. The three-address
statement with label L is the next to be executed.
Conditional jumps such as if x relop y goto L. This
instruction applies a relational operator (<, =, >=, etc.)
Dr. Azhar, CSE Dept. KUET Wednesday, November 9, 2022
Types Of Three-Address Statements
14
param x and call p, n for procedure calls and return y,
where y representing a returned value is optional. Their
typical use is as the sequence of three-address
statements
param x1
param x2
param xn
call p, n
Dr. Azhar, CSE Dept. KUET Wednesday, November 9, 2022
Types Of Three-Address Statements
15
Indexed assignments of the form x: = y[ i ] and x [ i ]: =
y.
The statement x[i]:=y sets the contents of the location i units
beyond x to the value of y.
Address and pointer assignments of the form x:= &y,
x:= *y and *x: = y.
Dr. Azhar, CSE Dept. KUET Wednesday, November 9, 2022
Syntax-Directed Translation into Three-Address
Code
16
• When three-address code is generated, temporary
names are made up for the interior nodes of a syntax
tree.
• The value of non-terminal E on the left side of E E1
+ E2 will be computed into a new temporary t.
• In general, the three- address code for id := E consists
of code to evaluate E into some temporary t, followed
by the assignment [Link]: = t.
Dr. Azhar, CSE Dept. KUET Wednesday, November 9, 2022
Syntax-Directed Definition
17
• The attribute [Link] represents the three- address code
for the assignment S. The non-terminal E has two
attributes:
• [Link], the name that will hold the value of E, and
• [Link], the sequence of three-address statements evaluating E.
Dr. Azhar, CSE Dept. KUET Wednesday, November 9, 2022
Syntax directed translation for assignment
• The function newtemp returns a sequence of distinct names t1, t2,.. in response
to successive call
• Semantic rules are the agreed definitions of statements.
Dr. Azhar, CSE Dept. KUET 18 Wednesday, November 9, 2022
Semantic rules generating code for while loop
Semantic rules generating
code for while statement
S-> while E do S1
Dr. Azhar, CSE Dept. KUET 19 Wednesday, November 9, 2022
Implementations of three-Address Statements
20
Three address statements can be implemented as records with
fields for the operator and the operands.
Three such representations are
quadruples
Triples
indirect triples.
Dr. Azhar, CSE Dept. KUET Wednesday, November 9, 2022
Implementations of three-Address Statements
21
Quadruples
• A quadruple is a record structure with four fields
• op, argl, arg2, and result.
• The op field contains an internal code for the operator.
• The three-address statement x := y op z is represented by
placing y in arg1, z in arg2 and x in result.
• Statements with unary operators like x: = – y or x: = y do not
use arg 2.
Dr. Azhar, CSE Dept. KUET Wednesday, November 9, 2022
Implementations of three-Address Statements
22
Quadruples
Operators like param use neither arg2 nor result.
Conditional and unconditional jumps put the target label in
result.
t1 := -c op Arg1 Arg2 Result
(0) uminus c t1
t2 := b * t1
(1) * b t1 t2
t3 := -c
(2) uminus c t3
t4 := a * t3
(3) * a t3 t4
t5 := t2 + t4
(4) + t2 t4 t5
a := t5
(5) := t5 a
Dr. Azhar, CSE Dept. KUET Wednesday, November 9, 2022
Implementations of three-Address Statements
23
Triples
To avoid entering temporary names into the symbol table.
refer to a temporary value by the position of the statement
three-address statements can be represented by records with
only three fields: op, arg1 and arg2.
The fields argl and arg2, for the arguments of op, are either
pointers to the symbol table or pointers into the triple
structure.
Since three fields are used, this intermediate code format is
known as triples.
Dr. Azhar, CSE Dept. KUET Wednesday, November 9, 2022
Implementations of three-Address Statements
op Arg1 Arg2 Result
(0) uminus c t1
op Arg1 Arg2
(1) * b t1 t2
(2) uminus c t3 (0) uminus c
(3) * a t3 t4
(4) + t2 t4 t5 (1) * b (0)
(5) := t5 a
(2) uminus c
Quadruples
(3) * a (2)
t1 := -c
t2 := b * t1 (4) + (1) (3)
t3 := -c
(5) := a (4)
t4 := a * t3
t5 := t2 + t4
Triples
a := t5
Dr. Azhar, CSE Dept. KUET 24 Wednesday, November 9, 2022
Implementations of three-Address Statements
25
A ternary operation like x[ i ]: = y requires two entries in the
triple structure.
Dr. Azhar, CSE Dept. KUET Wednesday, November 9, 2022
Implementations of three-Address Statements
26
Indirect Triples
Listing pointers to triples, rather than listing the triples
themselves.
This implementation is naturally called indirect triples.
Use an array statement to list pointers to triples in the desired
order
Dr. Azhar, CSE Dept. KUET Wednesday, November 9, 2022
Comparison of the Implementations
27
Using the quadruple notation, a three- address statement a
temporary can immediately access the location via the symbol
table.
In the triples declaration; moving a statement that defines a
temporary value requires us to change all references to that
statement in the arg1 and arg2 arrays.
This problem makes triples difficult to use in an optimizing
compiler.
Dr. Azhar, CSE Dept. KUET Wednesday, November 9, 2022
Comparison of the Implementations
28
Indirect triples present no such problem. A statement can be
moved by reordering the statement list.
Indirect triples can save some space compared with quadruples
if the same temporary value is used more than once.
The reason is that two or more entries in the statement array can
point to the same line of the op-arg1-arg2 structure.
Dr. Azhar, CSE Dept. KUET Wednesday, November 9, 2022
Declarations
29
• A global variable, offset, can keep track of the next avai1able relative address.
• Non-terminal P generates a sequence of declarations of the form id: T.
Before the first declaration is considered, offset is set to 0.
PD {Offset=0 }
DD;D
D id : T {enter ([Link], [Link], offset); //creates a symbol table
Offset:= offset + [Link] }
T integer {[Link] := integer;
[Link] := 4}
T real {[Link] := real;
[Link] := 8}
T array [num] of T1 {[Link] := array ([Link], [Link]);
[Link] := [Link] × [Link]}
T *T1 {[Link] := pointer ([Link]);
[Link]:= 4}
Dr. Azhar, CSE Dept. KUET Wednesday, November 9, 2022
Keeping Track of Scope Information
When a nested procedure is seen, processing of declarations in the
enclosing procedure is temporarily suspended. This approach will
be illustrated by adding semantic rules to the following language.
PD
D D;D | id: T | proc id; D;S
A new symbol table is created when a procedure declaration D
proc id D1;S is seen, and entries for the declarations in D1 are
created in the new table. S is statements of the proc.
The new table points back to the symbol table of the enclosing
procedure;
Dr. Azhar, CSE Dept. KUET 30 Wednesday, November 9, 2022
Symbol table for nested procedures
Dr. Azhar, CSE Dept. KUET 31 Wednesday, November 9, 2022
Assignment Statement
32
How names can be looked up in the symbol table
How elements of arrays and records can be accessed
Emit() to emit three-address statements to an
output file
Operation lookup ([Link]) checks
if there is an entry for this occurrence of the name in the
symbol table.
If so, a pointer to the entry is returned;
otherwise, lookup returns nil to indicate that no entry was
found.
Dr. Azhar, CSE Dept. KUET Wednesday, November 9, 2022
Assignment Statement
S id := E {p=lookup([Link]);
if p!=nil then emit(P ’:=’ [Link])
else error }
E E1 + E2 {[Link] := newtemp;
emit([Link] ‘:=’[Link] ‘+’ [Link])}
E E1 * E2 {[Link] := newtemp;
Emit([Link] ‘:=’ [Link] ‘*’ [Link])}
E -E1 {[Link] := newtemp;
Emit([Link] ‘:=’ ‘uminus’ [Link])}
E ( E1 ) {[Link] := [Link] }
E id { p:= lookup([Link]);
If p != nil then [Link] := p
else error}
Dr. Azhar, CSE Dept. KUET 33 Wednesday, November 9, 2022
Reusing Temporary Names
34
newtemp generates a new temporary name each
time a temporary is needed.
It is useful, especially in optimizing compilers, to
actually create a distinct name each time newtemp is
called.
The temporaries used to hold intermediate values
in expression calculations
Space has to be allocated to hold their values.
Dr. Azhar, CSE Dept. KUET Wednesday, November 9, 2022
Reusing Temporary Names
35
The code generated by the rules for
E E1 + E2 has the general form:
evaluate E1 into t1
evaluate E2 into t2
t:= t1 + t2
From the rules [Link] it follows that t1 and t2 are
not used elsewhere in the program.
The lifetimes of all temporaries used in the evaluation
of E1 are contained in the lifetime of t1.
Dr. Azhar, CSE Dept. KUET Wednesday, November 9, 2022
Reusing Temporary Names
36
Let us assume for simplicity that we are dealing only
with integers.
Keep a count c, initialized to zero.
Whenever a temporary name is used as an operand,
decrement c by 1.
Whenever a new temporary name is generated, use $c
and increase c by 1.
Dr. Azhar, CSE Dept. KUET Wednesday, November 9, 2022
Reusing Temporary Names
Statement Value of c
x:=a*b+c*d-e*f
0
t1=a*b $0:= a * b 1
t2=c*d $1:= c * d 2
t3=t1+t2 $0:= $0 + $1 1
$1:= e * f 2
t4=e*f
$0:= $0 - $1 1
t5=t3-t4
x := $0 0
x:=t5
Dr. Azhar, CSE Dept. KUET 37 Wednesday, November 9, 2022
Addressing Array Elements
38
Elements of an array can be accessed quickly if the
elements are stored in a block of consecutive locations.
If the width of each array element is w, then the ith
element of array A begins in location
base + (i – low) × w
low is the lower bound on the subscript; base is the relative address of
A[low]. Can be written as
i×w + (base – low×w)
c= base – low×w can be evaluated during declaration.
• If c is saved in the symbol table, the relative address of A[i] is obtained
by simply adding i × w to c
Dr. Azhar, CSE Dept. KUET Wednesday, November 9, 2022
Addressing Array Elements
39
A two-dimensional array is normally stored in one of two forms, either
row-major or column-major.
Layout of a 2 × 3 array , If the no. of dimension is n then ordering is n!
Dr. Azhar, CSE Dept. KUET Wednesday, November 9, 2022
Addressing Array Elements
40
In the case of a two-dimensional array stored in row-major form,
the relative address of A[i1,i2] can be calculated by the formula
base + ((i1 – low1) × n2 + i2 – low2) × w
where low1 and low2 are the lower bounds on the values of i1 and i2
n2 is the number of values that i2 can take (length of dimension). if
high2 is the upper bound on the value of i2, then
n2 = high2 – low2 + 1.
Assuming that i1 and i2 are the only values that are not known at
compile time,
rewrite the above expression as
((i1 * n2) + i2) × w + (base – ((low1 × n2) + low2) × w)
The last term can be determined at compile time
Dr. Azhar, CSE Dept. KUET Wednesday, November 9, 2022
Addressing Array Elements
41
Generalization of A[i1, i2,..., ik]
(( …((i1n2+i2)n3+i3)… )nk + ik) × w + base – (( …((low1 n2 +
low2) n3 +low3)… )nk + lowk) × w
Since for all j, nj = highj – lowj+ 1 is assumed fixed
The 2nd term can be computed in the compile time.
Dr. Azhar, CSE Dept. KUET Wednesday, November 9, 2022
Addressing Array Elements
42
A be a 2D int Array 10*20
Low1=low2=1, n1=10, n2=20
x:=A[y,z]
The three address code
((i1 * n2) + i2) × w +
t1:=y*20 (base – ((low1 × n2) + low2) ×w)
t1:=t1+z
t2:=c //c=base-84
t3=4*t1 ((y *20) + z) X w
t4:=t2[t3] (base – ((1 X20) + 1 ) X 4)
x:=t4
Dr. Azhar, CSE Dept. KUET Wednesday, November 9, 2022
INTERMEDIATE CODE GENERATION
43
Type Conversions
The semantic rule for [Link] associated with the production
E E1 + E2 is:
E E1+E2
{
[Link] := if [Link] = integer and
[Link] = integer
then integer
else real
}
Dr. Azhar, CSE Dept. KUET Wednesday, November 9, 2022
INTERMEDIATE CODE GENERATION
44
[Link] := newtemp;
if [Link] = integer and [Link] = integer then
begin
emit([Link]’:=’ [Link] ’int+’ [Link]);
[Link]: = integer
end
else if [Link] = real and [Link] = real then
begin
emit ([Link] ’:=’ [Link] ’real +’ [Link]);
[Link] := real
end
else if [Link] = integer and [Link] = real then
begin
u := newtemp;
emit(u ’:=’ ’inttoreal’ [Link]);
emit([Link] ’:=’ u ’real+’ [Link]);
[Link]:= real
end
Dr. Azhar, CSE Dept. KUET Wednesday, November 9, 2022
INTERMEDIATE CODE GENERATION
45
else if [Link] = real and [Link] = integer then
begin
u := newtemp;
emit(u ’:=’ ’inttoreal’ [Link]);
emit([Link] ’:=’ [Link] ’real+’ u);
[Link]: = real
end
else
[Link]:= type error;
Dr. Azhar, CSE Dept. KUET Wednesday, November 9, 2022
INTERMEDIATE CODE GENERATION
46
BOOLEAN EXPRESSIONS
Boolean operators: AND, OR, NOT.
Form E1 relop E2
Relop <,>,=,<=,…
Grammer
E E or E E and E not E (E) id relop id true false
Nonzero true and zero false
a or b and not c (Three address code)
t1=not c
t2=b and t1
t3=a or t2
Dr. Azhar, CSE Dept. KUET Wednesday, November 9, 2022
INTERMEDIATE CODE GENERATION
47
a<b equivalent to if a<b then 1 else 0
Three address code
100: if a<b goto 103
101: t=0
102: goto 104
103: t:=1
104:
Dr. Azhar, CSE Dept. KUET Wednesday, November 9, 2022
INTERMEDIATE CODE GENERATION
48
Dr. Azhar, CSE Dept. KUET Wednesday, November 9, 2022
INTERMEDIATE CODE GENERATION
49
a<b or c<d and e<f
Dr. Azhar, CSE Dept. KUET Wednesday, November 9, 2022
INTERMEDIATE CODE GENERATION
50
Flow of control Statement
Grammer
Sif E then S1|
if E then S1 else S2|
while E do S1
E is boolean expression to be translated
[Link] and [Link] controls the flow
Dr. Azhar, CSE Dept. KUET Wednesday, November 9, 2022
INTERMEDIATE CODE GENERATION
51
a) if E then S1
b) if E then S1 else S2
c) while E do S1
Dr. Azhar, CSE Dept. KUET Wednesday, November 9, 2022
INTERMEDIATE CODE GENERATION
52
Case Statements
Case Statements
Switch E
Switch expression
Begin
begin
Case V1: S1
Case value: statement
Case value: statement Case V2: S2
… ….
Case value: statement Case Vn-1: Sn-1
Default: statement
Default: Sn
End end
Dr. Azhar, CSE Dept. KUET Wednesday, November 9, 2022
INTERMEDIATE CODE GENERATION
53
The translation of switch code is to
Evaluate the expression
Find the match in list of cases
n way branch
Can be implemented using goto
Or create a table of pairs (value and level)
Or hash table if n is very large. Construct array of levels
Execute the statement with the value
Dr. Azhar, CSE Dept. KUET Wednesday, November 9, 2022
INTERMEDIATE CODE GENERATION
54
Dr. Azhar, CSE Dept. KUET Wednesday, November 9, 2022
Phases
of Compiler
Input Source Program
↓
Lexical Analyzer
↓
Syntax Analyzer
↓
Symbol Table
Semantic Analyzer Error Handler
Manager
↓
Intermediate
Code
Generator
↓
Code Optimizer
↓
Code Generator
↓
Out Target Program
Dr. Azhar, CSE Dept. KUET Wednesday, November 9, 2022
56
Thank You
Dr. Azhar, CSE Dept. KUET Wednesday, November 9, 2022