
Code Generation

Compiler Design (CS 3007)

S. Pyne and T. K. Mishra

Assistant Professor
Computer Science and Engineering Department
National Institute of Technology, Rourkela, INDIA

November 6, 2020

S. Pyne and T. K. Mishra Code Generation November 6, 2020 1 / 115


Code Generator

Final phase of compilation


Takes intermediate code and symbol-table information as input
Produces as output a semantically equivalent target program
The target program must
preserve the semantic meaning of the source program
be of high quality - less time, space, energy, etc.
make effective use of the available resources of the target machine
The code generator itself must run efficiently
Optimal target-code generation is undecidable
Subproblems such as register allocation are NP-complete
Heuristic techniques are used to generate near-optimal code
For efficient target code, an optimization phase runs prior to code generation
It optimizes the intermediate code to enable efficient code generation



Three primary tasks of code generator

Instruction selection
choosing appropriate target-machine instructions to implement
intermediate code
Register allocation and assignment
deciding what values to keep in which registers
Instruction ordering
deciding in what order to schedule execution of instructions



Instruction Selection
Code generator maps intermediate code into a code sequence that
can be executed by the target machine
The complexity of the mapping is determined by factors such as
the level of the intermediate code
Each high-level intermediate statement is translated into a sequence of
machine instructions using code templates. This often produces poor code
that needs further optimization.
Intermediate code that reflects low-level details of the underlying
machine leads to generation of more efficient code sequences.
the nature of the instruction-set architecture
If the target machine does not support each data type in a uniform
manner, then each exception to the general rule requires special
handling.
On some machines, for example, floating-point operations are done
using separate registers.
the desired quality of the generated code
The quality of generated code is determined by its speed and size
A naive translation gives correct but inefficient code
A given intermediate program can be translated into many different code
sequences, with significant cost differences between them
Generation of different code sequences
Translation of a three-address statement of the form x = y + z
LD R0, y // R0 = y (load y into register R0)
ADD R0, R0, z // R0 = R0 + z (add z to R0)
ST x, R0 // x = R0 (store R0 into x)
This often produces redundant loads and stores
For example, the sequence of three-address statements
a = b + c
d = a + e
can be translated into
LD R0, b // R0 = b
ADD R0, R0, c // R0 = R0 + c
ST a, R0 // a = R0
LD R0, a // R0 = a
ADD R0, R0, e // R0 = R0 + e
ST d, R0 // d = R0
The 4th statement is redundant - it loads a value that has just been stored
The 3rd statement is also redundant if a is not subsequently used
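The statement-at-a-time scheme can be sketched as a toy translator (an illustrative sketch, not a full code generator; the helper name `gen_binary` and the single-register convention are assumptions):

```python
# Sketch of the statement-at-a-time scheme: each three-address statement
# is translated in isolation, so the redundant LD/ST pairs shown above
# appear naturally.

def gen_binary(dst, src1, op, src2):
    """Translate one statement dst = src1 op src2 into target code."""
    opcode = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}[op]
    return [
        f"LD R0, {src1}",            # R0 = src1
        f"{opcode} R0, R0, {src2}",  # R0 = R0 op src2
        f"ST {dst}, R0",             # dst = R0
    ]

# a = b + c; d = a + e reproduces the six-instruction sequence above,
# including the redundant 4th instruction LD R0, a.
code = gen_binary("a", "b", "+", "c") + gen_binary("d", "a", "+", "e")
```

A peephole pass that remembers "a is still in R0 after ST a, R0" would delete the redundant load.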
Improving quality of code

The three-address statement a = a + 1 may be implemented as


LD R0, a // R0 = a
ADD R0, R0, #1 // R0 = R0 + 1
ST a, R0 // a = R0
If target machine has an “increment” instruction INC
a = a + 1 can be implemented more efficiently as INC a
The three-address statement a = a / 2 may be implemented as
LD R0, a // R0 = a
DIV R0, R0, #2 // R0 = R0 / 2
ST a, R0 // a = R0
If target machine has a “right shift” instruction RSHF
a = a / 2 can be implemented more efficiently as RSHF a



Register Allocation
Registers are the fastest storage units on the target machine
Registers form the smallest memory level and cannot hold all values
Values not held in registers need to reside in memory
Instructions with register operands are shorter and faster
Instructions with operands in memory are longer and slower
Efficient utilization of registers is particularly important
The use of registers is subdivided into two subproblems
Register allocation, during which the set of variables that will
reside in registers at each point in the program is selected
Register assignment, during which the specific register in which
a variable will reside is picked
Finding an optimal assignment of registers to variables is difficult,
even with single-register machines.
Mathematically, the problem is NP-complete
Register-usage conventions of the hardware and/or operating system
make this problem more complicated
The Memory Hierarchy

Level         Latency            Capacity
Registers     1 cycle            256 - 8000 bytes
Cache         3 cycles           256 KB - 1 MB
Main Memory   20 - 100 cycles    32 MB - 1 GB
Disk          0.5 - 5 M cycles   4 GB - 1 TB



Managing the Memory Hierarchy

Programs seem to interact only with main memory and disk

The programmer is responsible for moving data between disk and memory
(e.g., file I/O)
Hardware is responsible for moving data between memory and caches
The compiler is responsible for moving data between memory and registers
Cache and register sizes are growing slowly
Processor speed improves faster than the speed of memory and disk
The cost of a cache miss is growing
The widening gap is bridged with more levels of cache
It is very important to manage registers and caches properly
Compilers are good at managing registers



The Register Allocation Problem
Intermediate code uses as many temporaries as necessary
Complicates the final translation to assembly
Simplifies code generation and optimization
Typical intermediate code uses too many temporaries
The register allocation problem:
rewrite the intermediate code so that it uses no more temporaries
than there are machine registers
Method: assign several temporaries to the same register
But without changing the program behaviour
History
Register allocation is as old as intermediate code
Register allocation was used in the original FORTRAN compiler in the 1950s
Very crude algorithms
In 1980 Chaitin invented register allocation via graph coloring
Relatively simple and works well in practice
Gives a global picture of the register requirements
Characterizes legal register assignments
Architecture independent



An Example

Input Code          Output Code

a := c + d          r1 := r2 + r3
e := a + b          r1 := r1 + r4
f := e − 1          r1 := r1 − 1
Assuming temporaries a and e die after use
Temporary a can be “reused” after e := a + b
Same with temporary e
Can allocate a, e and f all to one register r1



Basic Register Allocation Idea

The value in a dead temporary is not needed for the rest of the
computation
A dead temporary can be reused
Basic rule:
Temporaries t1 and t2 can share the same register if at any point in the
program at most one of t1 or t2 is live!
Algorithm for a machine with k registers:
1 Compute the live variables at each point
2 Construct the register interference graph
3 Color the register interference graph
4 If #colors > k then spill a temporary and go to Step 1
5 Otherwise allocate registers and stop
Coloring is NP-complete - no efficient exact algorithms are known
Use of heuristics
Assign colors to vertices ordered in decreasing degree [Welsh-Powell]
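The k-coloring loop can be sketched as below (a simplified illustration: the interference graph, derived from the live sets on the following slide, is given directly, and a node that cannot be colored is merely reported as a spill candidate instead of spilling and repeating):

```python
# Sketch of the k-coloring loop: greedy Welsh-Powell coloring of a
# register-interference graph (RIG).

def color_rig(graph, k):
    """graph: dict mapping each temporary to the set of its neighbours.
    Returns (assignment, spilled); colors 0..k-1 stand for registers."""
    order = sorted(graph, key=lambda v: len(graph[v]), reverse=True)
    assignment, spilled = {}, []
    for v in order:                       # decreasing degree (Welsh-Powell)
        taken = {assignment[n] for n in graph[v] if n in assignment}
        free = [c for c in range(k) if c not in taken]
        if free:
            assignment[v] = free[0]       # lowest available color
        else:
            spilled.append(v)             # would trigger spill + repeat
    return assignment, spilled

# RIG derived from the live sets of the running example a..f.
rig = {
    "a": {"c", "f"},
    "b": {"c", "e", "f"},
    "c": {"a", "b", "d", "e", "f"},
    "d": {"c", "e", "f"},
    "e": {"b", "c", "d", "f"},
    "f": {"a", "b", "c", "d", "e"},
}
colors, spills = color_rig(rig, 4)        # 4 registers suffice here
```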



Computation of live variables

[Figure: a flow graph annotated with the live-variable set at each
program point. The statements are a := b + c, d := -a, e := d + f,
b := d + e, e := e - 1, f := 2 * e, and b := f + c; the annotated live
sets include {b, c, f}, {a, c, f}, {c, d, f}, {c, d, e, f}, {b, c, e, f},
{c, e}, {c, f}, and {b}.]



Register interference graph

Two simultaneously live temporaries cannot be allocated to the same
register
Construction of the undirected graph
A node for each temporary
An edge between t1 and t2 if they are live simultaneously at some point
in the program
[Figure: the register interference graph for the example - nodes
a, b, c, d, e, f, with an edge joining every pair of temporaries that
are live at the same program point.]
Two temporaries can be allocated to the same register if there is no
edge connecting them
Register Allocation Through Graph Coloring
Number of colors = number of registers required
Let k = number of machine registers
If the graph is k-colorable then the register assignment uses no more
than k registers

[Figure: the interference graph colored with four colors:
f -> r1, c -> r2, a -> r3, e -> r3, b -> r4, d -> r4.]

Suppose k = 4; the example graph requires 4 colors
The 4 colors represent the 4 registers
Code after register allocation

r3 := r2 + r4
r4 := -r3
r3 := r4 + r1

r4 := r4 + r3
r1 := 2 * r3     |     r3 := r3 - 1     (the two branches)

r4 := r1 + r2



Spilling
If #colors > k then spill the temporary with the highest degree
Reconstruct the interference graph without the spilled temporary
Color the new graph and allocate registers

[Figure: the interference graph before and after spilling f; after the
spill the remaining graph is 3-colorable: c -> r1, a -> r2, e -> r2,
b -> r3, d -> r3.]

Suppose k = 3 and the example graph requires 4 colors
Temporary f is spilled out
A memory location must be allocated for f
belonging to the current stack frame
Let this address be fa
Code after Spilling

Before each operation that uses f, insert f := load fa

After each operation that defines f, insert store f, fa

r2 := r3 + r1
r3 := -r2
f := load fa
r2 := r3 + f

f := 2 * r2     |     r3 := r3 + r2
store f, fa     |     r2 := r2 - 1

f := load fa
r3 := f + r1



Spilling - liveness and heuristics

The new liveness information is almost as before

f is live only
between an f := load fa and the next instruction
between a store f, fa and the preceding instruction
Spilling reduces the live range of f
and thus reduces its interferences
which results in fewer neighbors in the RIG for f
Spilling Heuristics
Spill temporaries with the most conflicts
Spill temporaries with few definitions and uses
Avoid spilling in inner loops



Summary

Register allocation is a “must have” optimization in most compilers:


Because intermediate code uses too many temporaries
Because it makes a big difference in performance
Graph coloring is a powerful register allocation scheme
Register allocation is more complicated for CISC machines



The Target Language
A byte-addressable simple target machine model
Three-address machine with load-store, computation, and
conditional/unconditional jump operations
n general-purpose registers R0, R1, ..., Rn-1
Load operations: Instruction LD dst, addr loads the value in location
addr into location dst (i.e. [dst] ← [addr])
Denotes the assignment dst = addr
LD r, x loads the value in location x into register r (r ← [x])
LD r1, r2 is a register-to-register copy: the content of r2 is copied
into r1 (r1 ← r2)
Store operations: Instruction ST x, r stores the value in register r
into location x (i.e. [x] ← r)
This instruction denotes the assignment x = r
Computation operations: OP dst, src1, src2 and OP dst, src1
OP ∈ {ADD, SUB, DIV, MUL, AND, OR, NOT, XOR, RSHF, LSHF}
src1, src2 - operand locations, dst - result location
SUB r1, r2, r3 computes r1 ← r2 − r3
NOT r1, r2 computes r1 ← ¬r2 (the bitwise complement of r2)
Branch instructions

Unconditional jumps: Instruction BR L causes control to branch to


the machine instruction with label L
Conditional jumps of the form Bcond r, L
r is a register, L is a label
cond stands for any of the common tests on values in the register r
BLTZ r, L causes a jump to label L if r < 0
Otherwise control passes to next machine instruction



Addressing Modes

In instructions, location of a variable named x is a memory address


reserved for x (i.e. the l-value of x)
A location can be an indexed address of the form a(r )
a is a variable and r is a register
memory location denoted by a(r ) is computed by taking the l-value of
a and adding to it the value in register r
For instruction LD R1, a(R2)
R1 ← contents(a + contents(R2))
contents(x) denotes the content of register or memory location x
This addressing mode is useful for accessing arrays
a is the base address of the array (i.e. the address of the first element)
the value in r gives the displacement in bytes of location a(r) from a
A memory location can be an integer indexed by a register
LD R1, 100(R2) =⇒ R1 ← contents(100 + contents(R2))
Useful for pointers



Addressing modes

Indirect addressing modes


∗r means the memory location found in the location represented by the
contents of register r
∗100(r ) means the memory location found in the location obtained by
adding 100 to the contents of r
LD R1, *100(R2)
=⇒ R1 ← contents(contents(100 + contents(R2)))
Immediate constant addressing mode
LD R1, #100 loads the integer 100 into register R1
ADD R1, R1, #100 adds the integer 100 to register R1



Equivalent 3-address statements and machine instructions

x=y-z
LD R1, y // R1 ← y
LD R2, z // R2 ← z
SUB R1, R1, R2 // R1 ← R1 - R2
ST x, R1 // x ← R1
b = a[i]; starting index of array a is 0, 8 bytes per array element
LD R1, i // R1 ← i
MUL R1, R1, 8 // R1 ← R1 * 8
LD R2, a(R1) // R2 ← contents(a + contents(R1))
ST b, R2 // b ← R2
a[j] = c
LD R1, c // R1 ← c
LD R2, j // R2 ← j
MUL R2, R2, 8 // R2 ← R2 * 8
ST a(R2), R1 // contents(a + contents(R2)) ← R1



Equivalent 3-address statements and machine instructions

x = *p
LD R1, p // R1 ← p
LD R2, 0(R1) // R2 ← contents(0 + contents(R1))
ST x, R2 // x ← R2
*p = y
LD R1, p // R1 ← p
LD R2, y // R2 ← y
ST 0(R1), R2 // contents(0 + contents(R1)) ← R2
if x < y goto L
LD R1, x // R1 = x
LD R2, y // R2 = y
SUB R1, R1, R2 // R1 = R1 - R2
BLTZ R1, M // if R1 < 0 jump to M
label M represents the first machine instruction generated from the
three-address instruction that has label L



Program and Instruction Costs

Some common cost measures to be optimized

compilation time
size, running time, and power consumption of the target program
Determining the compiling and running costs of a program is complex
Finding an optimal target program is undecidable
Many of the subproblems are NP-hard
Heuristic techniques produce good suboptimal target programs
Cost of an instruction: one plus the costs of the addressing modes of
the operands
This cost corresponds to the length in words of the instruction
Addressing modes involving registers have zero additional cost
Addressing modes with a memory location or constant have an additional
cost of one, as such operands are stored in the words following the instruction



Instruction Cost

LD R0, R1 copies contents of register R1 into register R0


Cost is one because no additional memory words are required
LD R0, M loads contents of memory location M into register R0
Cost is two since address of M is in the word following the instruction
LD R1, *100(R2) loads into register R1 the value given by
contents(contents(100 + contents(R2)))
Cost is two since constant 100 is stored in the word following the
instruction
Cost of a target-language program on a given input is the sum of
costs of the individual instructions executed when the program is run
on that input
Good code-generation algorithms seek to minimize the sum of the
costs of the instructions executed by the generated target program on
typical inputs
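The cost convention can be sketched as a small cost function (a sketch of the stated one-word-per-non-register-operand rule; the operand-syntax parsing is deliberately simplified):

```python
# Sketch of the cost convention: 1 for the instruction word, plus 1 for
# each operand needing an extra word (memory names, constants, indexed
# or literal addresses); plain register operands are free.

def instruction_cost(instr):
    _op, *operands = instr.replace(",", " ").split()
    cost = 1
    for operand in operands:
        bare = operand.lstrip("*")        # indirection marker itself is free
        if bare.startswith("R") and bare[1:].isdigit():
            continue                      # plain register operand
        cost += 1                         # memory address or constant
    return cost

assert instruction_cost("LD R0, R1") == 1        # register-to-register
assert instruction_cost("LD R0, M") == 2         # address M in next word
assert instruction_cost("LD R1, *100(R2)") == 2  # constant 100 in next word
```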



Addresses in the Target Code

1 A statically determined area Code holds executable target code.


The size of target code is determined at compile time.
2 A statically determined data area Static for holding global constants
and other data generated by the compiler.
The size of the global constants and compiler data can also be
determined at compile time.
3 A dynamically managed area Heap for holding data objects that are
allocated and freed during program execution.
The size of the Heap cannot be determined at compile time.
4 A dynamically managed area Stack for holding activation records as
they are created and destroyed during procedure calls and returns.
The size of the Stack cannot be determined at compile time.



Static Allocation

call callee is implemented by

ST callee.staticArea, #here + 20
BR callee.codeArea
Constant callee.staticArea is the start address of the activation record for callee
Constant callee.codeArea is the address of the first instruction of the
called procedure callee in the Code area of run-time memory
#here + 20 in ST is the return address - the address of the instruction following BR
#here is the address of the current ST instruction
The call sequence of two instructions consumes 5 words or 20 bytes
return is implemented by BR *callee.staticArea
which branches to the return address saved at the start of callee's
activation record



Call and return

A three-address code
// code for c
action1
call p
action2
halt
// code for p
action3
return
The first procedure has no caller, so its final instruction is HALT
HALT returns control to the operating system



Call and return

Instructions starting at address 100 implement statements


action1; call p; action2; halt
Target program for static allocation
// code for c
100: ACTION1 // code for action1
120: ST 364, #140 // save return address 140 in location 364
132: BR 200 // call p
140: ACTION2
160: HALT // return to operating system
···
// code for p
200: ACTION3
220: BR *364 // return to address saved in location 364
···
// 300-363 hold activation record for c
300: // return address
304: // local data for c
···
// 364-451 hold activation record for p
364: // return address
368: // local data for p



Stack Allocation

Uses relative addresses for storage in activation records

The position of a procedure's activation record is unknown until run time
This position is usually stored in a register
Words in the activation record are accessed as offsets from the value in that register
The indexed address mode of the target machine is convenient for this purpose
Relative addresses in an activation record can be taken as offsets from
any known position in the activation record
We use positive offsets by maintaining a register SP
SP is a pointer to the beginning of the activation record on top of the stack
The caller increments SP and transfers control to the called procedure
After control returns, the caller decrements SP
Decrementing SP deallocates the activation record of the called procedure



Stack Allocation

The first procedure initializes the stack by setting SP to the start of
the stack area in memory:
LD SP, #stackStart // initialize the stack
code for the first procedure
HALT // terminate execution

A procedure call sequence increments SP, saves the return address,
and transfers control to the called procedure:
ADD SP, SP, #caller.recordSize // increment stack pointer
ST 0(SP), #here + 16 // save return address
BR callee.codeArea // jump to the callee
#caller.recordSize represents the size of the caller's activation record
The return sequence:
BR *0(SP) // return to caller
SUB SP, SP, #caller.recordSize // decrement stack pointer (done by the caller)
0(SP) is the address of the first word in the activation record
*0(SP) is the return address saved there



An example

Three-address code
// code for m
action1
call q
action2
halt
// code for p
action3
return
// code for q
action4
call p
action5
call q
action6
call q
return

The activation record sizes for m, p, and q are msize, psize, and qsize
Let msize, psize, and qsize be 20, 40, and 60 bytes



Target code for stack allocation
// code for m
100: LD SP, #600 // initialize the stack
108: ACTION1 // code for action1
128: ADD SP, SP, #msize // call sequence begins
136: ST 0(SP), #152 // push return address
144: BR 300 // call q
152: SUB SP, SP, #msize // restore SP
160: ACTION2
180: HALT
···
// code for p
200: ACTION3
220: BR *0(SP) // return
···
// code for q
300: ACTION4 // contains a conditional jump to 456
320: ADD SP, SP, #qsize
328: ST 0(SP), #344 // push return address
336: BR 200 // call p
344: SUB SP, SP, #qsize
352: ACTION5
372: ADD SP, SP, #qsize
380: ST 0(SP), #396 // push return address
388: BR 300 // call q
396: SUB SP, SP, #qsize
404: ACTION6
424: ADD SP, SP, #qsize
432: ST 0(SP), #448 // push return address
440: BR 300 // call q
448: SUB SP, SP, #qsize
456: BR *0(SP) // return
···
600: // stack starts here
Run-Time Addresses for Names

Storage-allocation and layout of local data in an activation record for


a procedure determine how the storage for names is accessed
Name in a three-address statement is a pointer to a symbol-table
entry for that name
It makes the compiler more portable
Front end need not be changed even when compiler is moved to a
different machine where a different run-time organization is needed
Generating the specific sequence of access steps while generating
intermediate code benefits optimizing compiler
It lets the optimizer take advantage of details it would not see in the
simple three-address statement



Basic Blocks and Flow Graphs

Partition intermediate code into basic blocks with following properties


1 Flow of control can only enter a basic block through its first instruction
there are no jumps into the middle of the block
2 Control leaves the block without halting or branching,
except possibly at the last instruction in the block
Basic blocks become nodes of a flow graph
whose edges indicate which blocks can follow which other blocks



Algorithm - Partitioning 3-address code into basic blocks

INPUT: A sequence of three-address instructions


OUTPUT: A list of basic blocks
METHOD
1 Determination of leaders

i First 3-address instruction is a leader


ii Any instruction that is the target of a conditional or
unconditional jump is a leader
iii Any instruction that immediately follows a conditional
or unconditional jump is a leader
2 For each leader, its basic block consists of itself and all
instructions up to but not including the next leader or
the end of the intermediate program
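The three leader rules can be sketched as follows (a sketch assuming jumps are written `goto (n)` with 1-based instruction numbers, as in the example on the next slides):

```python
# Sketch of the leader-based partitioning algorithm.
import re

def partition(instructions):
    leaders = {1}                                   # rule (i): first instruction
    for i, instr in enumerate(instructions, start=1):
        m = re.search(r"goto \((\d+)\)", instr)
        if m:
            leaders.add(int(m.group(1)))            # rule (ii): jump target
            if i + 1 <= len(instructions):
                leaders.add(i + 1)                  # rule (iii): after a jump
    starts, blocks = sorted(leaders), []
    for j, start in enumerate(starts):
        end = starts[j + 1] - 1 if j + 1 < len(starts) else len(instructions)
        blocks.append(instructions[start - 1:end])  # leader up to next leader
    return blocks

# The identity-matrix intermediate code of the following slides.
code = [
    "i = 1", "j = 1", "t1 = 10 * i", "t2 = t1 + j", "t3 = 8 * t2",
    "t4 = t3 - 88", "a[t4] = 0.0", "j = j + 1", "if j <= 10 goto (3)",
    "i = i + 1", "if i <= 10 goto (2)", "i = 1", "t5 = i - 1",
    "t6 = 88 * t5", "a[t6] = 1.0", "i = i + 1", "if i <= 10 goto (13)",
]
blocks = partition(code)      # leaders 1, 2, 3, 10, 12, 13 -> six blocks
```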



Example - A program to form a 10 × 10 identity matrix

for i from 1 to 10 do
for j from 1 to 10 do
a[i, j] = 0.0;
for i from 1 to 10 do
a[i, i] = 1.0;



Intermediate Code
1 i = 1
2 j = 1
3 t1 = 10 * i
4 t2 = t1 + j
5 t3 = 8 * t2
6 t4 = t3 - 88
7 a[t4] = 0.0
8 j = j + 1
9 if j <= 10 goto (3)
10 i = i + 1
11 if i <= 10 goto (2)
12 i = 1
13 t5 = i - 1
14 t6 = 88 * t5
15 a[t6] = 1.0
16 i = i + 1
17 if i <= 10 goto (13)

Leaders - instructions 1, 2, 3, 10, 12, and 13


Basic blocks
B1: i = 1
B2: j = 1
B3: t1 = 10 * i
    t2 = t1 + j
    t3 = 8 * t2
    t4 = t3 - 88
    a[t4] = 0.0
    j = j + 1
    if j <= 10 goto B3
B4: i = i + 1
    if i <= 10 goto B2
B5: i = 1
B6: t5 = i - 1
    t6 = 88 * t5
    a[t6] = 1.0
    i = i + 1
    if i <= 10 goto B6
Next-Use Information

Knowing the next use of the value in a variable is essential

If the value of a variable currently in a register will never be
referenced again, then that register can be assigned to another variable
Use of a name in a three-address statement:
suppose three-address statement i assigns a value to x
If statement j has x as an operand, and control can flow from
statement i to j along a path that has no intervening assignments to x,
then statement j uses the value of x computed at statement i,
and x is live at statement i


Algorithm - Determining liveness and next-use information

INPUT: A basic block B of three-address statements


OUTPUT: At each statement i: x = y op z in B, attach liveness and
next-use information of x, y, and z
METHOD:
Start at the last statement in B and scan backwards to the beginning of B
At each statement i: x = y op z in B
1 Attach to statement i the information currently found in the symbol
table regarding the next use and liveness of x, y, and z
2 In the symbol table, set x to "not live" and "no next use"
3 In the symbol table, set y and z to "live" and the next uses of y and z to i
(steps 2 and 3 must be done in this order, so that x = x op y is handled correctly)
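The backward scan can be sketched as below (an illustrative sketch: statements are (x, y, z) triples for x = y op z, and the "symbol table" is a dict mapping each variable to a (live, next_use) pair):

```python
# Sketch of the backward scan over a basic block.

def next_use_info(block, live_on_exit):
    table = {v: (True, None) for v in live_on_exit}
    info = []
    for i in range(len(block) - 1, -1, -1):          # last to first
        x, y, z = block[i]
        # step 1: attach the information currently in the table
        info.append((i, {v: table.get(v, (False, None)) for v in (x, y, z)}))
        table[x] = (False, None)                     # step 2: kill x
        table[y] = (True, i)                         # step 3: y, z used at i
        table[z] = (True, i)
    return list(reversed(info))

block = [("a", "b", "c"),        # a = b op c
         ("d", "a", "e")]        # d = a op e
info = next_use_info(block, live_on_exit={"d"})
# at statement 0, a is live with next use at statement 1
```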



Flow Graphs

Represent the flow of control between basic blocks by a flow graph

Nodes of the flow graph are the basic blocks
An edge from block B to block C exists iff the first instruction in C
can immediately follow the last instruction in B in some execution
Conditions for an edge from B to C:
there is a conditional or unconditional jump from the end of B to the
beginning of C
C immediately follows B in the original order of the three-address
instructions, and B does not end in an unconditional jump
B is a predecessor of C, and C is a successor of B
entry node: an edge runs from the entry node to the first executable node
exit node: an edge runs to the exit node from any block whose last
instruction can be the program's last executed instruction
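The two edge rules can be sketched as follows (a sketch: each block is summarized by the set of blocks its final jump may target and whether that jump is unconditional; block ids are 0-based, so block 0 is B1 of the identity-matrix example):

```python
# Sketch of the two edge rules for building a flow graph.

def successors(blocks):
    edges = set()
    for b, info in enumerate(blocks):
        for c in info["jumps"]:
            edges.add((b, c))                        # rule 1: explicit jump
        if b + 1 < len(blocks) and not info["unconditional"]:
            edges.add((b, b + 1))                    # rule 2: fall-through
    return edges

blocks = [
    {"jumps": set(), "unconditional": False},  # B1: i = 1
    {"jumps": set(), "unconditional": False},  # B2: j = 1
    {"jumps": {2}, "unconditional": False},    # B3 ends: if j <= 10 goto B3
    {"jumps": {1}, "unconditional": False},    # B4 ends: if i <= 10 goto B2
    {"jumps": set(), "unconditional": False},  # B5: i = 1
    {"jumps": {5}, "unconditional": False},    # B6 ends: if i <= 10 goto B6
]
edges = successors(blocks)     # 8 edges, including the B3 and B6 self-loops
```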



Flow graph

[Figure: the flow graph of blocks B1-B6; edges B1→B2, B2→B3, B3→B3,
B3→B4, B4→B2, B4→B5, B5→B6, B6→B6.]


Loops

A set of nodes L in a flow graph is a loop if L contains a node e called


the loop entry, such that
1 e is not ENTRY, the entry of the entire flow graph.
2 No node in L besides e has a predecessor outside L. That is, every path
from ENTRY to any node in L goes through e.
3 Every node in L has a nonempty path, completely within L, to e.



Example

Loops in the flow graph

1 B3 by itself
2 B6 by itself
3 {B2, B3, B4}
Optimization of Basic Blocks

Improvement in the running time of code


Local optimization within each basic block by itself
Global optimization looks at how information flows among basic
blocks of a program



DAG Representation of Basic Blocks

Constructing a DAG for a basic block


1 There is a node in the DAG for each of the initial values of the
variables appearing in the basic block.
2 There is a node N associated with each statement s within the block.
The children of N are those nodes corresponding to statements that
are the last definitions, prior to s, of the operands used by s.
3 Node N is labeled by the operator applied at s, and also attached to
N is the list of variables for which it is the last definition within
the block.
4 Certain nodes are designated output nodes. These are the nodes
whose variables are live on exit from the block; i.e., their values may
be used later, in another block of the flow graph. Calculation of these
live variables is a matter for global flow analysis.



Code improving transformations due to DAG

1 Eliminate local common subexpressions, i.e., instructions that


compute a value that has already been computed.
2 Eliminate dead code, i.e., instructions that compute a value that is
never used.
3 Reorder statements that do not depend on one another; such
reordering may reduce the time a temporary value needs to be
preserved in a register.
4 Apply algebraic laws to reorder operands of three-address instructions,
and sometimes thereby simplify the computation.



Finding Local Common Subexpressions

Common subexpressions can be detected by checking, when a new node M
is about to be added,
whether there is an existing node N with the same children, in the
same order, and with the same operator
If so, N computes the same value as M and may be used in its place
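This check can be sketched as local value numbering over the DAG (an illustrative sketch; representing nodes as (operator, left, right) triples keyed in a cache is an assumption of this sketch):

```python
# Sketch of the CSE check during DAG construction: a cache keyed on
# (operator, left child, right child) detects an existing matching node.

def build_dag(block):
    node_of = {}      # variable -> id of node currently holding its value
    nodes = []        # node id -> ("leaf", var, None) or (op, left, right)
    cache = {}        # (op, left, right) -> node id: the CSE check

    def leaf(v):
        if v not in node_of:              # first use: initial value of v
            nodes.append(("leaf", v, None))
            node_of[v] = len(nodes) - 1
        return node_of[v]

    for dst, op, s1, s2 in block:         # dst = s1 op s2
        key = (op, leaf(s1), leaf(s2))
        if key not in cache:              # no matching node N exists:
            nodes.append(key)             # add the new node M
            cache[key] = len(nodes) - 1
        node_of[dst] = cache[key]         # else reuse N in place of M
    return node_of, nodes

block = [("a", "+", "b", "c"),
         ("b", "-", "a", "d"),
         ("c", "+", "b", "c"),
         ("d", "-", "a", "d")]
node_of, nodes = build_dag(block)
# b and d share one node: a - d is recognized as a common subexpression
```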



Example: A DAG for the block
1 a=b+c
2 b=a-d
3 c=b+c
4 d=a-d

After elimination of common subexpressions

1 a = b + c
2 d = a - d
3 c = d + c
Example: A DAG for the block

1 a=b+c
2 b=b-d
3 c=c+d
4 e=b+c

No common subexpressions: the occurrences of b + c in statements 1 and 4
denote different values, since b and c are redefined in between



Dead Code Elimination
Delete from a DAG any root (node with no ancestors) that has no
live variables attached
Repeated application of this transformation will remove all nodes
from the DAG that correspond to dead code

Example: a and b are live on exit but c and e are not

remove the root labeled e
then the node labeled c becomes a root and can be removed
the roots labeled a and b remain, since each has a live variable attached
The Use of Algebraic Identities

Algebraic identities for optimizations on basic blocks

x + 0 = 0 + x = x
x − 0 = x
x × 1 = 1 × x = x
x / 1 = x
Local reduction in strength - expensive → cheaper equivalent
x² = x × x
2 × x = x + x
x / 2 = x × 0.5
Constant folding - compile-time evaluation and replacement of constant
expressions
Expression 2 * 3.14 is replaced by 6.28
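These identities, the strength reductions, and constant folding can be sketched as a single simplification step over one expression (a sketch: operands are numbers or variable-name strings, results are a simplified operand or an (op, x, y) tuple, and the rule set is deliberately small):

```python
# Sketch of one algebraic-simplification step over x op y.

def simplify(op, x, y):
    def is_num(v):
        return isinstance(v, (int, float))
    if is_num(x) and is_num(y):                      # constant folding
        return {"+": x + y, "-": x - y, "*": x * y, "/": x / y}[op]
    if op == "+" and y == 0: return x                # x + 0 = x
    if op == "+" and x == 0: return y                # 0 + x = x
    if op == "-" and y == 0: return x                # x - 0 = x
    if op == "*" and y == 1: return x                # x * 1 = x
    if op == "*" and x == 1: return y                # 1 * x = x
    if op == "/" and y == 1: return x                # x / 1 = x
    if op == "*" and y == 2: return ("+", x, x)      # 2 * x -> x + x
    if op == "*" and x == 2: return ("+", y, y)
    if op == "/" and y == 2: return ("*", x, 0.5)    # x / 2 -> x * 0.5
    return (op, x, y)                                # nothing applies

assert simplify("*", 2, 3.14) == 6.28                # constant folding
assert simplify("-", "x", 0) == "x"                  # identity
assert simplify("/", "x", 2) == ("*", "x", 0.5)      # strength reduction
```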



Associativity and Commutativity

1 a = b + c;
2 e = c + d + b;
Generated intermediate code
1 a=b+c
2 t=c+d
3 e=t+b
If t is not needed outside this block then
1 a=b+c
2 e=a+d



Representation of Array References

1 x = a[i]
2 a[j] = y
3 z = a[i]
Representing array accesses in a DAG:
1 x = a[i]

create a node with operator =[ ] as the root node

its children are the initial value a0 of array a and the index i
x becomes a label of the node =[ ]
2 a[j] = y
create a new node with operator [ ]= as the root node
with three children: a0, the index j, and y
no variable labels the node [ ]=
the node [ ]= kills all nodes whose value depends on a0
A killed node cannot receive any more labels; i.e., it cannot become a
common subexpression.



Example: The DAG for the basic block

1 x = a[i]
2 a[j] = y
3 z = a[i]



Example: The DAG for the basic block

1 b = 12 + a
2 x = b[i]
3 b[j] = y



Pointer Assignments
Assignments indirectly through a pointer
1 x = *p
2 *q = y
We do not know what p or q point to
x = *p is a use of every variable whatsoever
*q = y is a possible assignment to every variable
The operator =* must take all nodes that are currently associated with
identifiers as arguments, which is relevant for dead-code elimination
The *= operator kills all other nodes so far constructed in the DAG
Global pointer analyses limit the set of variables a pointer could
reference at a given place in the code
Even local pointer analysis can restrict the scope of a pointer
1 p = &x
2 *p = y
Here we know that x, and no other variable, is given the value of y
So we need not kill any node but the node to which x was attached
Procedure Calls

Procedure calls behave much like assignments through pointers


In absence of global data-flow information, it is assumed that
a procedure uses and changes any data to which it has access
If procedure P is in the scope of variable x then
a call to P both uses the node with attached variable x
kills that node



Reassembling Basic Blocks From DAGs
After performing possible optimizations
while constructing the DAG or
by manipulating the DAG once constructed
Reconstitute the three-address code for the basic block from which
the DAG is built
For each node that has one or more attached variables
construct a 3-address statement that computes value of one of those
variables
Prefer to compute result into a variable that is live on exit from block
If do not have global live-variable information to work from, then
assume that every variable of the program (except temporaries
generated by compiler) is live on exit from block
If node has more than one live variable attached, then
introduce copy statements to give correct value to each of those
variables
Global optimization can eliminate copies, if arranged to use one of
two variables in place of the other
Example
Original basic block
1 a=b +c
2 b=a -d
3 c=b +c
4 d=a -d

If b is not live on exit from the block then, then three-address code
1 a=b+c
2 d=a-d
3 c=d+c

c = d + c, must use d as an operand rather than b


because the optimized block never computes b
If both b and d are live on exit
code with subtraction replaced by a less expensive copy
1 a=b+c
2 d=a-d
3 b=d
4 c=d+c
Rules to reconstruct basic blocks from DAG
1 Order of instructions must follow order of nodes in DAG, i.e., do not
compute a node's value until a value for each of its children is computed.
2 Assignments to an array must follow all previous assignments to, or
evaluations from, the same array, according to the order of these
instructions in the original basic block.
3 Evaluations of array elements must follow any previous (according to
the original block) assignments to the same array. The only
permutation allowed is that two evaluations from the same array may
be done in either order, as long as neither crosses over an assignment
to that array.
4 Any use of a variable must follow all previous (according to the
original block) procedure calls or indirect assignments through a
pointer.
5 Any procedure call or indirect assignment through a pointer must
follow all previous (according to the original block) evaluations of any
variable.
A Simple Code Generator

Consider an algorithm that generates code for a single basic block


Keep track of what values are in what registers
Avoid generating unnecessary loads and stores
Deciding how to use registers to best advantage
Four principal uses of registers
1 Some or all operands of an operation must be in registers
2 Registers make good temporaries - hold results of a subexpression
3 Registers hold global values computed in one basic block and used in
other blocks
4 Registers used in run-time storage management
Machine instructions are of the form
LD reg, mem
ST mem, reg
OP reg, reg, reg



Register and Address Descriptors

For each three-address instruction what loads are necessary to get


needed operands into registers
After generating the loads, generate operation
If there is a need to store result into a memory location, generate store
Require a data structure that tells what program variables currently
have their value in which registers
Need to know whether memory location for a given variable currently
has the proper value for that variable
Since a new value for the variable may have been computed in a
register and not yet stored



Descriptors of data structure

1 For each available register, a register descriptor keeps track of the


variable names whose current value is in that register.
Use those registers that are available for local use within a basic block
Assume initially all register descriptors are empty
As the code generation progresses, each register will hold the value of
zero or more names
2 For each program variable, an address descriptor keeps track of the
location or locations where the current value of that variable can be
found
The location might be a register, a memory address, a stack location,
or some set of more than one of these
The information can be stored in the symbol-table entry for that
variable name



Code-Generation Algorithm

Code-Generation Algorithm uses function getReg(I)


getReg(I) selects registers for each memory location associated with
three-address instruction I
getReg has access to register and address descriptors for all the
variables of the basic block, and data-flow information such as the
variables that are live on exit from the block



Machine Instructions for Operations

For a three-address instruction such as x = y op z, do the following:


1 Use getReg(x = y op z) to select registers for x, y, and z. Call
these Rx , Ry , and Rz .
2 If y is not in Ry (according to the register descriptor for Ry ), then
issue an instruction LD Ry , y 0 , where y 0 is one of the memory
locations for y (according to the address descriptor for y).
3 Similarly, if z is not in Rz , issue an instruction LD Rz , z 0 , where z’ is a
location for z.
4 Issue the instruction OP Rx , Ry , Rz .



Machine Instructions for Copy Statements

A three-address copy statement of the form x = y


Assume getReg will always choose same register for both x and y
If y is not already in that register Ry , then
generate the machine instruction LD Ry , y .
If y was already in Ry , then do nothing
It is only necessary to adjust the register descriptor for Ry
so that it includes x as one of the values found there



Ending the Basic Block

If the variable is a temporary used only within the block then


when the block ends, forget about the value of temporary and assume
its register is empty
If the variable is live on exit from the block, or if not known which
variables are live on exit then
need to assume that the value of the variable is needed later
For each variable x whose address descriptor does not say that its
value is located in the memory location for x
generate ST x, R
R is a register in which x’s value exists at the end of the block



Managing Register and Address Descriptors
Code-generation algorithm issues load, store, and other machine
instructions
Updates register and address descriptors using rules as
1 For the instruction LD R, x
a. Change register descriptor for register R so it holds only x
b. Change address descriptor for x by adding register R as an additional
location
2 For instruction ST x, R, change address descriptor for x to include its
own memory location
3 For an operation such as OP Rx , Ry , Rz implementing a three-address
instruction x = y op z
a. Change register descriptor for Rx so that it holds only x
b. Change address descriptor for x so that its only location is Rx
c. Remove Rx from address descriptor of any variable other than x
4 For a copy statement x = y, generate the load for y into register Ry
then manage descriptors as for all load statements (per rule 1)
a. Add x to register descriptor for Ry
b. Change address descriptor for x so that its only location is Ry
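The rules above can be sketched as descriptor bookkeeping (a hypothetical encoding; "mem" stands for a variable's own memory location):

```python
# Register and address descriptors updated per rules 1-3 above.
class Descriptors:
    def __init__(self, regs):
        self.reg = {r: set() for r in regs}   # register -> variables held
        self.addr = {}                        # variable -> set of locations

    def locs(self, v):
        return self.addr.setdefault(v, {"mem"})

    def ld(self, R, x):        # LD R, x (rule 1)
        self.reg[R] = {x}
        self.locs(x).add(R)

    def st(self, x, R):        # ST x, R (rule 2); R needs no update
        self.locs(x).add("mem")

    def op(self, Rx, x):       # OP Rx, ... for x = y op z (rule 3)
        self.reg[Rx] = {x}
        self.addr[x] = {Rx}
        for v, l in self.addr.items():
            if v != x:
                l.discard(Rx)  # Rx no longer holds any other variable
```

For example, after LD R1,y; LD R2,z; OP R1 (for x = y op z), R1's descriptor holds only x, x's only location is R1, and y's descriptor drops R1.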
Example



Design of the Function getReg

Three-address instruction I: x = y op z
Rules for getReg (I )
1 If y is currently in a register
Pick a register already containing y as Ry .
Do not issue a machine instruction to load this register.
2 If y is not in a register and there is an empty register
Pick one such register as Ry
3 If y is not in a register and there is no empty register
Pick an allowable register and make it safe to reuse.
Let R be a candidate register
v is one of the variables in register descriptor for R
Ensure that v's value is either not needed, or can be obtained from
somewhere else



Possibilities
(a) If the address descriptor for v says that v is somewhere besides R,
then we are OK.
(b) If v is x, the variable being computed by instruction I, and x is not
also one of the other operands of instruction I(z in this example),
then we are OK. The reason is that in this case, we know this value of
x is never again going to be used, so we are free to ignore it.
(c) Otherwise, if v is not used later (that is, after the instruction I, there
are no further uses of v, and if v is live on exit from the block, then v
is recomputed within the block), then we are OK.
(d) If we are not OK by one of the first three cases, then we need to
generate the store instruction
ST v , R to place a copy of v in its own memory location. This
operation is called a spill.
Since R may hold several variables at the moment, repeat the above
steps for each such variable v. At the end, R’s “score” is the number
of store instructions needed to generate.
Pick one of the registers with the lowest score.
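The scoring step can be sketched as follows (a simplification: case (a) is approximated by checking whether the value has more than one known location):

```python
# Score candidate register R for instruction x = y op z: the number of
# spill stores (case d) needed before R can safely be reused.
def score(R, reg_desc, addr_desc, live_later, x, z):
    stores = 0
    for v in reg_desc[R]:
        if len(addr_desc[v]) > 1:   # (a) value also lives elsewhere
            continue
        if v == x and v != z:       # (b) about to be overwritten anyway
            continue
        if v not in live_later:     # (c) no further uses of v
            continue
        stores += 1                 # (d) must generate ST v, R (a spill)
    return stores

def get_reg(regs, reg_desc, addr_desc, live_later, x, z):
    # pick a register with the lowest spill score
    return min(regs, key=lambda R: score(R, reg_desc, addr_desc,
                                         live_later, x, z))
```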
Selection of the register Rx

The issues and options are almost as for y


The only differences are
1 Since a new value of x is being computed, a register that holds only x
is always an acceptable choice for Rx . This statement holds even if x is
one of y and z, since the considered machine instructions allow two
registers to be the same in one instruction.
2 If y is not used after instruction I, in the sense described for variable v
in item (3c), and Ry holds only y after being loaded, if necessary, then
Ry can also be used as Rx . A similar option holds regarding z and Rz .
For a copy instruction I: x = y pick register Ry , then choose Rx = Ry



Peephole Optimization

A simple but effective technique for locally improving target code


Examining a sliding window of target instructions - peephole
Replacing instructions within peephole by a shorter/faster sequence
Can be applied directly after intermediate code generation
Common peephole optimizations
Redundant-instruction elimination
Flow-of-control optimizations
Algebraic simplifications
Use of machine idioms



Redundant-instruction elimination
Eliminating Redundant Loads and Stores
LD R0, a
ST a, R0
delete store instruction
since first instruction ensures value of ’a’ loaded in R0
Eliminating Unreachable Code
if debug == 1 goto L1
goto L2
L1: print debugging information
L2:
Peephole optimization - eliminate jumps over jumps
if debug != 1 goto L2
print debugging information
L2:
If debug is set to 0 then by constant propagation
if 0 != 1 goto L2
print debugging information
L2:
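The load/store elimination above can be sketched as one pass over textual instructions (valid only when the store is not a jump target):

```python
# Drop a store that is made redundant by the load immediately before it:
#   LD R, m  followed by  ST m, R  -> keep only the load.
def drop_redundant_stores(code):
    out = []
    for ins in code:
        if out and ins.startswith("ST ") and out[-1].startswith("LD "):
            mem, reg = [s.strip() for s in ins[3:].split(",")]
            lreg, lmem = [s.strip() for s in out[-1][3:].split(",")]
            if reg == lreg and mem == lmem:
                continue            # mem already holds reg's value
        out.append(ins)
    return out
```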
Flow-of-Control Optimizations
replace the sequence
goto L1
···
L1: goto L2
by the sequence
goto L2
···
L1: goto L2
replace the sequence
if a < b goto L1
···
L1: goto L2
by the sequence
if a < b goto L2
···
L1: goto L2
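The jump-to-jump replacements can be sketched with a map from each label to the statement it marks (encoding mine; a seen-set guards against goto cycles):

```python
# Retarget jumps whose destination label marks an unconditional goto.
def collapse_jumps(code):
    target = {}
    for ins in code:                  # "L1: goto L2" -> target["L1"]
        if ":" in ins:
            lab, stmt = ins.split(":", 1)
            target[lab.strip()] = stmt.strip()
    out = []
    for ins in code:
        lab = ins.split()[-1] if ins.startswith(("goto", "if")) else None
        seen = set()
        while lab and lab not in seen and target.get(lab, "").startswith("goto"):
            seen.add(lab)                       # avoid goto cycles
            new = target[lab].split()[-1]
            ins = ins[: -len(lab)] + new        # retarget the jump
            lab = new
        out.append(ins)
    return out
```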
Flow-of-Control Optimizations

replace the sequence


goto L1
···
L1: if a < b goto L2
L3:
by the sequence
if a < b goto L2
goto L3
···
L3:



Algebraic Simplification and Reduction in Strength

Eliminate expressions x = x + 0 or x = x * 1
Replace x 2 with x ∗ x
Implement fixed-point multiplication or division using shift
Floating point division by a constant approximated as multiplication
by a constant



Use of Machine Idioms

Target machine instructions to implement certain operations


efficiently
Use auto-increment and auto-decrement addressing modes
For pushing or popping a stack during parameter passing
To replace statements like x = x + 1



Register Allocation and Assignment

Instructions involving only registers operands are faster than those


involving memory operands
Processor speeds are faster than memory speeds
Efficient utilization of registers is vitally important in generating good
code
Register allocation - deciding at each point in a program what values
should reside in registers
Register assignment - in which register each value should reside



Global Register Allocation

Assign registers to frequently used variables and keep these registers


consistent across block boundaries (globally)
Programs spend most of their time in inner loops
Global register assignment keeps a frequently used value in a fixed
register throughout a loop
Assuming that the loop structure of a flow graph is known
It is known what values computed in a basic block are used outside
that block



Usage Counts

Variable x in a register for duration of a loop L


Saves one unit of cost for each reference to x if x is already in a
register
After x has been computed in a block it will remain in a register if
there are subsequent uses of x in that block
This counts a savings of one for each use of x in loop L that is not
preceded by an assignment to x in the same block
Saves two units if avoid a store of x at the end of a block
If x is allocated a register, count a savings of two for each block in
loop L for which x is live on exit and in which x is assigned a value
Approx. savings = Σ over blocks B in L of (use(x, B) + 2 × live(x, B))
use(x, B) is number of times x is used in B prior to any definition of x
live(x, B) is 1 if x is live on exit from B and is assigned a value in B,
and live(x, B) is 0 otherwise
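The estimate can be written directly (the block names and counts here are illustrative, not taken from the flow-graph figure):

```python
# Approximate savings of keeping x in a register throughout loop L.
#   use[B]:  uses of x in block B before any definition of x
#   live[B]: 1 if x is live on exit from B and assigned in B, else 0
def savings(use, live):
    return sum(use[B] + 2 * live[B] for B in use)
```

With a total use count of 2 and x live on exit from exactly one assigning block, the savings come to 2 + 2 × 1 = 4, matching the next slide's figure for x = a.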
An Example - Flow graph of an inner loop

for x = a, a is live on exit from B1 and is assigned a value there, but


is not live on exit from B2 , B3 , or B4
Thus, Σ over blocks B in L of use(a, B) = 2, hence the savings for x = a is 4
Four units of cost saved by selecting ’a’ for one of global registers
Savings for b, c, d, e, and f are 5, 3, 6, 4, and 4, respectively
May select a, b, and d for registers R0, R1, and R2, respectively
Code sequence using global register assignment



Register Assignment for Outer Loops

If an outer loop L1 contains an inner loop L2


The names allocated registers in L2 need not be allocated registers in
L1 − L2
If allocate x a register in L2 but not L1
load x on entrance to L2
store x on exit from L2



Tree-Translation Schemes

Tree for the assignment statement a[i] = b + 1


array a is stored on the run-time stack
variable b is a global in memory location Mb
The run-time addresses of locals a and i are given as constant offsets
Ca and Ci from SP
ind operator treats its argument as a memory address
Tree Rewrite Rules



Code Generation by Tiling an Input Tree



Pattern Matching by Parsing

Prefix representation: ind + +Ca RSP ind + Ci RSP + Mb C1


LR-parser using productions of SDT generates
ind + +ca sp ind + ci sp + mb c1



Tree Matching

String C represents template with C at root


String +1R represents + and its left operand R in the two templates
that have + at the root
Using sets of strings a tree-pattern matcher can be constructed by
using techniques for efficiently matching multiple strings in parallel
The tree-rewriting process can be implemented by
running the tree-pattern matcher during a depth-
first traversal of input tree, and
performing the reductions as the nodes are visited for the last time
Optimal Code Generation for Expressions

Begin by assigning to each node of an expression tree a number


It tells how many registers are needed to evaluate that node
without storing any temporaries
These numbers are sometimes called Ershov numbers, after A. Ershov
Rules for numbering an expression tree
1 Label all leaves 1
2 The label of an interior node with one child is the label of its child.
3 The label of an interior node with two children is
(a) The larger of the labels of its children, if those labels are different
(b) One plus the common label of its children, if the labels are the same



An example
Consider an expression (a − b) + e × (c + d)
Three-address code
t1 = a - b
t2 = c + d
t3 = e * t2
t4 = t1 + t3
A tree labeled with Ershov numbers



Generating Code From Labeled Expression Trees

INPUT: A labeled tree with each operand appearing once


OUTPUT: An optimal sequence of machine instructions to evaluate
the root into a register
METHOD: Start at the root of the tree. If the algorithm is applied to
a node with label k, then only k registers will be used. However,
there is a “base” b ≥ 1 for the registers used so that the actual
registers used are Rb , Rb+1 , · · · , Rb+k−1 . The result always appears in
Rb+k−1 .
1. To generate machine code for an interior node with label k and two
children with equal labels (which must be k − 1) do the following:
(a) Recursively generate code for the right child, using base b + 1. The
result of the right child appears in register Rb+k−1 .
(b) Recursively generate code for the left child, using base b; the result
appears in Rb+k−2 .
(c) Generate the instruction OP Rb+k−1 , Rb+k−2 , Rb+k−1 , where OP is the
appropriate operation for the interior node in question.



Generating Code From Labeled Expression Trees

2. Suppose there is an interior node with label k and children with


unequal labels. Then one of the children, called the “big” child, has
label k, and the other child, the “little” child, has some label m < k.
Do the following to generate code for this interior node, using base b:

(a) Recursively generate code for the big child, using base b; the result
appears in register Rb+k−1 .
(b) Recursively generate code for the little child, using base b; the result
appears in register Rb+m−1 . Note that since m < k, neither Rb+k−1
nor any higher-numbered register is used.
(c) Generate the instruction OP Rb+k−1 , Rb+m−1 , Rb+k−1 or the
instruction OP Rb+k−1 , Rb+k−1 , Rb+m−1 , depending on whether the big
child is the right or left child, respectively.
3. For a leaf representing operand x, if the base is b generate the
instruction LD Rb , x.



Optimal three-register code

LD R3, d
LD R2, c
ADD R3, R2, R3
LD R2, e
MUL R3, R2, R3
LD R2, b
LD R1, a
SUB R2, R1, R2
ADD R3, R2, R3
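The nine-instruction sequence above can be reproduced by transcribing the labeled-tree algorithm directly; a sketch in Python (tree encoding mine: a leaf is an operand name, an interior node is (op, left, right)):

```python
def label(t):                         # Ershov number of a tree
    if isinstance(t, str):
        return 1
    l, r = label(t[1]), label(t[2])
    return l + 1 if l == r else max(l, r)

def gen(t, b, code):                  # base b: registers Rb..Rb+k-1
    if isinstance(t, str):            # rule 3: leaf
        code.append(f"LD R{b}, {t}")
        return
    op, L, R = t
    k, kl, kr = label(t), label(t[1]), label(t[2])
    if kl == kr:                      # rule 1: equal labels k-1
        gen(R, b + 1, code)           # right child into R(b+k-1)
        gen(L, b, code)               # left child into R(b+k-2)
        code.append(f"{op} R{b+k-1}, R{b+k-2}, R{b+k-1}")
    elif kl > kr:                     # rule 2: big child on the left
        gen(L, b, code)
        gen(R, b, code)
        code.append(f"{op} R{b+k-1}, R{b+k-1}, R{b+kr-1}")
    else:                             # rule 2: big child on the right
        gen(R, b, code)
        gen(L, b, code)
        code.append(f"{op} R{b+k-1}, R{b+kl-1}, R{b+k-1}")

tree = ("ADD", ("SUB", "a", "b"), ("MUL", "e", ("ADD", "c", "d")))
code = []
gen(tree, 1, code)                    # reproduces the slide's sequence
```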
Expressions with an Insufficient Supply of Registers

INPUT: A labeled tree with each operand appearing once and a


number of registers r ≥ 2.
OUTPUT: An optimal sequence of machine instructions to evaluate
the root into a register, using no more than r registers, which we
assume are R1 , R2 , · · · , Rr .
METHOD: Apply the following recursive algorithm, starting at the
root of the tree, with base b = 1. For a node N with label r or less,
the algorithm is exactly the same as the previous one. However, for
interior nodes with a label k > r , need to work on each side of the
tree separately and store the result of the larger subtree. That result is
brought back from memory just before node N is evaluated, and the final
step will take place in registers Rr −1 and Rr . The modifications to the
basic algorithm are as follows:



Expressions with an Insufficient Supply of Registers
1 Node N has at least one child with label r or greater. Pick the larger
child (or either if their labels are the same) to be the “big” child and
let the other child be the “little” child.
2 Recursively generate code for the big child, using base b = 1. The
result of this evaluation will appear in register Rr .
3 Generate the machine instruction ST tk , Rr , where tk is a temporary
variable used for temporary results used to help evaluate nodes with
label k.
4 Generate code for the little child as follows. If the little child has label
r or greater, pick base b = 1. If the label of the little child is j < r ,
then pick b = r − j. Then recursively apply this algorithm to the little
child; the result appears in Rr .
5 Generate the instruction LD Rr −1 , tk .
6 If the big child is the right child of N, then generate the instruction
OP Rr , Rr , Rr −1 . If the big child is the left child, generate
OP Rr , Rr −1 , Rr .
Optimal three-register code assuming r = 2

Available registers R1 and R2


LD R2, d
LD R1, c
ADD R2, R1, R2
LD R1, e
MUL R2, R1, R2
ST t3, R2
LD R2, b
LD R1, a
SUB R2, R1, R2
LD R1, t3
ADD R2, R2, R1
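This eleven-instruction sequence also falls out of a direct transcription of the r-limited algorithm; a sketch (tree encoding mine; ties for the big child go to the right, matching the choice made above):

```python
def label(t):                                   # Ershov number
    if isinstance(t, str):
        return 1
    l, r = label(t[1]), label(t[2])
    return l + 1 if l == r else max(l, r)

def gen_r(t, b, r, code):
    if isinstance(t, str):
        code.append(f"LD R{b}, {t}")
        return
    op, L, R = t
    k, kl, kr = label(t), label(L), label(R)
    if k <= r:                                  # enough registers: as before
        if kl == kr:
            gen_r(R, b + 1, r, code)
            gen_r(L, b, r, code)
            code.append(f"{op} R{b+k-1}, R{b+k-2}, R{b+k-1}")
        elif kl > kr:
            gen_r(L, b, r, code)
            gen_r(R, b, r, code)
            code.append(f"{op} R{b+k-1}, R{b+k-1}, R{b+kr-1}")
        else:
            gen_r(R, b, r, code)
            gen_r(L, b, r, code)
            code.append(f"{op} R{b+k-1}, R{b+kl-1}, R{b+k-1}")
    else:                                       # k > r: spill the big child
        if kr >= kl:
            big, little, big_right = R, L, True
        else:
            big, little, big_right = L, R, False
        gen_r(big, 1, r, code)                  # step 2: result in Rr
        code.append(f"ST t{k}, R{r}")           # step 3: spill to t_k
        j = label(little)
        gen_r(little, 1 if j >= r else r - j, r, code)   # step 4's base rule
        code.append(f"LD R{r-1}, t{k}")         # step 5: reload
        if big_right:                           # step 6
            code.append(f"{op} R{r}, R{r}, R{r-1}")
        else:
            code.append(f"{op} R{r}, R{r-1}, R{r}")

tree = ("ADD", ("SUB", "a", "b"), ("MUL", "e", ("ADD", "c", "d")))
code = []
gen_r(tree, 1, 2, code)                         # r = 2 registers available
```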



Dynamic Programming Code-Generation

Previous algorithm produces optimal code from an expression tree


Time taken is a linear function of the size of the tree
It works for machines in which all computation is done in registers
Where instructions consist of an operator applied to two registers or
to a register and a memory location
Dynamic programming can be used to extend the class of machines
for which optimal code can be generated from expression trees in
linear time
Dynamic programming algorithm applies to register machines with
complex instruction sets



Contiguous Evaluation

The dynamic programming algorithm partitions the problem of


generating optimal code for an expression into the subproblems of
generating optimal code for the subexpressions of the given
expression.
As a simple example, consider an expression E of the form E1 + E2 .
An optimal program for E is formed by combining optimal programs for
E1 and E2 , in one or the other order, followed by code to evaluate the
operator +.
The subproblems of generating optimal code for E1 and E2 are solved
similarly.
It evaluates an expression E = E1 op E2 “contiguously”



Dynamic Programming

Expression E = E1 op E2
Syntax tree T for E
T1 and T2 are trees for E1 and E2
A program P evaluates a tree T contiguously if it
First evaluates subtrees of T to be computed into memory
Then, it evaluates remainder of T either in the order
T1 , T2 and then root, or
T2 , T1 and then root
using the previously computed values from memory
The contiguous evaluation ensures that for any expression tree T
there always exists an optimal program
that consists of optimal programs for subtrees of the root
followed by an instruction to evaluate the root



The Dynamic Programming Algorithm

Suppose the target machine has r registers


The dynamic programming algorithm proceeds in three phases
1 Compute bottom-up for each node n of the expression tree T
An array C of costs, in which the ith component C [i] is the optimal
cost of computing the subtree S rooted at n into a register, assuming i
registers are available for the computation, for 1 ≤ i ≤ r .
2 Traverse T , using the cost vectors to determine which subtrees of T
must be computed into memory
3 Traverse each tree using the cost vectors and associated instructions to
generate the final target code. The code for the subtrees computed
into memory locations is generated first.
Each phases run in time linearly proportional to size of expression tree



An example

Consider a machine having two registers R0 and R1


Following instructions, each of unit cost:
LD Ri, Mj // Ri = Mj
op Ri, Ri, Rj // Ri = Ri op Rj
op Ri, Ri, Mj // Ri = Ri op Mj
LD Ri, Rj // Ri = Rj
ST Mi, Rj // Mi = Rj
Ri is either R0 or R1, and Mj is a memory location
Expression (a − b) + c ∗ (d/e)



Code generation from expression tree

C [0], the cost of computing ’a’ into memory, is 0


it is already there
C [1], the cost of computing ’a’ into a register, is 1
load it into a register with the instruction LD R0, a.
C [2], the cost of loading a into a register with two registers available
is the same as that with one register
The cost vector at leaf a is therefore (0,1,1)
Cost Vector Evaluation

Minimum cost of evaluating root with one register available is


minimum cost of computing its right subtree into memory, plus
minimum cost of computing its left subtree into the register, plus
1 for the instruction
Three cases arise depending on order of evaluation



Cost Vector Evaluation

Compute the left subtree with two registers available into register R0
Compute the right subtree with one register available into register R1
Use the instruction ADD R0, R0, R1 to compute the root
This sequence has cost 2 + 5 + 1 = 8



Cost Vector Evaluation

Compute the right subtree with two registers available into R1


Compute the left subtree with one register available into R0
use the instruction ADD R0, R0, R1
This sequence has cost 4 + 2 + 1 = 7



Cost Vector Evaluation

Compute the right subtree into memory location M


Compute the left subtree with two registers available into register R0
Use the instruction ADD R0, R0, M
This sequence has cost 5 + 2 + 1 = 8
The second choice gives the minimum cost 7
The cost vector at the root is therefore (8,8,7)
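The cost vectors can be computed bottom-up exactly as described; a sketch for this two-register machine, with each vector ordered (into memory, with one register, with two registers):

```python
# Cost vectors for the two-register machine above; every instruction
# (LD, ST, op reg-reg, op reg-mem) costs one unit.
def cost(t):
    if isinstance(t, str):                 # operand already in memory
        return (0, 1, 1)
    _, L, R = t
    lm, l1, l2 = cost(L)
    rm, r1, r2 = cost(R)
    c1 = l1 + rm + 1                       # right into memory, op Ri,Ri,Mj
    c2 = min(l2 + r1 + 1,                  # left with 2 regs, right with 1
             r2 + l1 + 1,                  # right with 2 regs, left with 1
             l2 + rm + 1)                  # right into memory, op Ri,Ri,Mj
    return (min(c1, c2) + 1, c1, c2)       # memory cost = best reg cost + ST

tree = ("+", ("-", "a", "b"), ("*", "c", ("/", "d", "e")))
```

Here cost(tree) yields (8, 8, 7), reproducing the root's cost vector, and the second option inside c2 is the one that achieves the minimum of 7.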
Generated Optimal Code Sequence

LD R0, c // R0 = c
LD R1, d // R1 = d
DIV R1, R1, e // R1 = R1 / e
MUL R0, R0, R1 // R0 = R0 * R1
LD R1, a // R1 = a
SUB R1, R1, b // R1 = R1 - b
ADD R1, R1, R0 // R1 = R1 + R0
Thank You

