Intermediate Code Generation
Part I
Compiler Architecture
Intermediate
Code
Static Intermediate Code Target
Parser
Checker Code generator Generator language
Front End Back End
• m × n compliers can be built by writing m front ends
and n back ends – save considerable amount of effort
• We assume parsing, static checking and IC
generation is done sequentially
– These can be combined and done during parsing
• Static checking
– Operator operand compatibility
– Proper placement of break/continue keywords etc.
Intermediate Code (IC)
• The given program in a source language is converted
to an equivalent program in an intermediate language
by the IC generator.
• Ties the front and back ends together
• Language and Machine neutral
• Many forms
• Level depends on how being processed
• More than one intermediate language may be used by
a compiler
• Intermediate language can be many different languages, and
the designer of the compiler decides this intermediate language.
– syntax trees can be used as an intermediate language.
– postfix notation can be used as an intermediate language.
– three-address code (Quadraples) can be used as an intermediate
language
• we will use quadraples to discuss intermediate code generation
• quadraples are close to machine instructions, but they are not
actual machine instructions.
– some programming languages have well defined intermediate
languages.
• java – java virtual machine
• prolog – warren abstract machine
• In fact, there are byte-code emulators to execute instructions in
these intermediate languages.
Intermediate language levels
• High • Medium • Low
T1 a[i,j+2] t1 j+2 r1 [fp-4]
t2 i * 20 r2 r1 + 2
t3 t1 + t2 r3 [fp-8]
t4 4 * t3 r4 r3*20
t5 addr a r5 r4 + r2
t6 t5 + t 4 r6 4 * r5
t7 *t6 r7 fp – 216
f1 [r7+r6]
Intermediate Languages Types
• Graphical IRs:
– Abstract Syntax trees
– Directed Acyclic Graphs (DAGs)
– Control Flow Graphs
• Linear IRs:
– Stack based (postfix)
– Three address code (quadruples)
Graphical IRs
• Abstract Syntax Trees (AST) – retain essential
structure of the parse tree, eliminating unneeded
nodes.
• Directed Acyclic Graphs (DAG) – compacted AST to
avoid duplication – smaller footprint as well
• Control flow graphs (CFG) – explicitly model control
flow
ASTs and DAGs:
a := b *-c + b*-c
:= :=
a + a +
* * *
b - (uni) b - (uni) b - (uni)
c c c
AST DAG
Implementation of DAG/AST: Value Number Method
Three-Address Code
• A three-address code is:
x := y op z
where x, y and z are names, constants or compiler-generated
temporaries; op is any operator.
• But we may also the following notation for three-address code
(it looks like a machine code instruction)
op y,z,x
apply operator op to y and z, and store the result in x.
• We use the term “three-address code” because each statement
usually contains three addresses (two for operands, one for the
result).
Linearized Representation of DAG/AST
• Source Code
– a = b * -c + b * -c
• Three address code
• Tree Representation
Three-Address Statements
Binary Operator: op y,z,result or result :=
y op z
where op is a binary arithmetic or logical operator. This binary
operator is applied to y and z, and the result of the operation is
stored in result.
Ex: add a,b,c
gt a,b,c
addr a,b,c
addi a,b,c
Unary Operator: op y,,result or result :=
op y
where op is a unary arithmetic or logical operator. This unary
operator is applied to y, and the result of the operation is stored
in result.
Ex: uminus a,,c
not a,,c
Three-Address Code
• Two concepts
– Address
– Instruction
• Address
– Name: source-program names to appear as addresses
– Constant: Different types of constants
– Compiler Generated temporary:
Three-Address Instruction
Assignment Type 1: x := y op z
op is a binary arithmetic or logical operation
x, y and z are addresses
Assignment Type 2: x := op z
op is a unary arithmetic or logical operation
x and z are addresses
Copy Instruction: x := y
x and z are addresses and x is assigned the value of y
Three-Address Instructions
Unconditional Jump: goto L
We will jump to the three-address code with the label L, and the
execution continues from that statement.
Ex: goto L1 // jump to L1
jmp 7 // jump to the statement 7
Conditional Jump 1: if x goto L and ifFalse x goto L
We will jump to the three-address code with the label L if x is TRUE
and FALSE, respectively. Otherwise, the following three-address
instruction in sequence is executed next.
Conditional Jump 2: if x relop y goto L
We will jump to the three-address code with the label L if the result of
y relop z is true, and the execution continues from that statement.
If the result is false, the execution continues from the statement
following this conditional jump statement.
Three-Address Statements (cont.)
Procedure Parameters: param x
Procedure Calls: call p,n
where x is an actual parameter, we invoke the procedure p with
n parameters.
Ex: param x1
param x2
…… p(x1,...,xn)
param xn
n is necessary because call
call p,n,
can be nested
f(x+1,y) add x,1,t1
param t1,,
param y,,
call f,2,
Three-Address Statements (cont.)
Indexed Assignments:
x := y[i]
sets x to the value in location i memory units beyond location y
y[i] := x
sets contents of the location i memory units beyond location y to the
value of x
Address and Pointer Assignments:
x := &y
sets the r-value of x to l-value of y
x := *y where y is a pointer whose r-value is a location
sets the r-value of x equal to the contents of that location
*x := y
sets the r-value of the object pointed by x to the r-value of y
Three address code example
do i=i+1; while (a[i] < v)
L: t1=i+1 100: t1 = I +1
i=t1 101: i = t1
t2=i*8 102: t2=i*8
t3=a[t2] 103: t3=a[t2]
if t3 < v goto L 104: if t3 < v goto 100
(A) Symbolic Labels (B) Position Numbers
Representing 3-Address Statements
op, arg1, arg2, result
• x = minus y
– Does not use arg2
• x = y
– Op is =
• param a1
– Uses neither arg2 nor result
• Conditional/Unconditional jumps
– Put the target label in result
Quadruples
• a = b * -c + b * -c
Quadruples
• Store each fields directly
Triples
op arg1 arg2
0 [ ]= x i
1 := 0 y
Indirect Triples