Code Optimization (Machine-Independent)
The Use of Algebraic Identities
Algebraic identities represent another important class of optimizations on
basic blocks. For example, we may apply arithmetic identities, such as
x + 0 = 0 + x = x        x - 0 = x
x * 1 = 1 * x = x        x / 1 = x
to eliminate computations from a basic block.
Expensive vs. Cheaper Operators (Operator Strength Reduction)
Another class of algebraic optimizations includes a local reduction in
strength, that is, replacing a more expensive operator with a cheaper one as
in:
x^2 = x * x
2 * x = x + x
x / 2 = x * 0.5
Example:
Before:
while (i < 10)
{
    j = 3*i + 1;      // expensive: a multiplication on every iteration
    a[j] = a[j] - 2;
    i = i + 2;
}
After strength reduction:
s = 3*i + 1;
while (i < 10) {
    j = s;
    a[j] = a[j] - 2;
    i = i + 2;
    s = s + 6;        // cheaper: an addition on every iteration
}
Constant Folding
A third class of related optimizations is constant folding. Here we evaluate
constant expressions at compile time and replace the constant expressions
by their values. Thus, the expression 2 * 3.14 would be replaced by 6.28.
Many
constant expressions arise in practice because of the frequent use of
symbolic constants in programs.
Example: y = 3 + 4 => y = 7
Optimization within each block (Local optimization)
a + a*(b-c) + (b-c)*d
The DAG representation of a basic block lets us perform several code-
improving transformations on the code represented by the block.
a) We can eliminate local common subexpressions, that is, instructions that
compute a value that has already been computed.
b) We can eliminate dead code, that is, instructions that compute a value
that is never used.
c) We can reorder statements that do not depend on one another; such
reordering may reduce the time a temporary value needs to be preserved in
a register.
d) We can apply algebraic laws to reorder operands of three-address
instructions, and sometimes thereby simplify the computation.
An occurrence of an expression E is called a common subexpression if E was
previously computed and the values of the variables in E have not changed
since the previous computation. We can avoid recomputing E and use its
previously computed value, provided that the variable x to which the previous
computation of E was assigned has not changed in the interim.
Many code generators partition IR instructions into basic blocks, which
consist of sequences of instructions that are always executed together.
Simple local transformations can be used to transform basic blocks into
modified basic blocks from which more efficient code can be generated.
These transformations are a rudimentary form of code optimization; the
deeper theory of code optimization is known as global flow analysis.
The input to the code generator is the intermediate representation of the
source program produced by the front end, along with information in the
symbol table that is used to determine the run-time addresses of the data
objects denoted by the names in the IR.
The many choices for the IR include three-address representations such as
quadruples, triples, and indirect triples; virtual machine representations such
as bytecodes and stack-machine code; linear representations such as postfix
notation; and graphical representations such as syntax trees and DAG's.
Many of the algorithms in this chapter are couched in terms of the
representations such as three-address code, trees, and DAG's.
Instruction speeds and machine idioms are other important factors. If we do
not care about the efficiency of the target program, instruction selection is
straightforward. For each type of three-address statement, we can design a
code skeleton that defines the target code to be generated for that
construct. For example, every three-address statement of the form x = y+z,
where x, y, and z
are statically allocated, can be translated into the code sequence
LD R0, y // R0 = y (load y into register R0)
ADD R0, R0, z // R0 = R0 + z (add z to R0)
ST x, R0 // x = R0 (store R0 into x)
This strategy often produces redundant loads and stores. For example, the
sequence of three-address statements
a=b+c
d=a+e
would be translated into
LD R0, b // R0 = b
ADD R0, R0, c // R0 = R0 + c
ST a, R0 // a = R0
LD R0, a // R0 = a
ADD R0, R0, e // R0 = R0 + e
ST d, R0 // d = R0
Here, the fourth statement is redundant since it loads a value that has just
been stored, and so is the third if a is not subsequently used.
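If a is not used after this block, the store to a and the reload of a can both
be dropped; a sketch of the improved sequence under that assumption:

LD R0, b // R0 = b
ADD R0, R0, c // R0 = b + c (the value of a)
ADD R0, R0, e // R0 = R0 + e (the value of d)
ST d, R0 // d = R0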
The quality of the generated code is usually determined by its speed and
size. On most machines, a given IR program can be implemented by many
different code sequences, with significant cost differences between the
different implementations. A naive translation of the intermediate code may
therefore lead to correct but unacceptably inefficient target code.
Basic Blocks and Flow Graphs (textbook p. 525)
Partition the intermediate code into basic blocks. A basic block is a maximal
sequence of consecutive three-address instructions such that:
(a) The flow of control can only enter the basic block through the first
instruction in the block. That is, there are no jumps into the middle of the
block.
(b) Control will leave the block without halting or branching, except possibly
at the last instruction in the block.
A basic block therefore has a single entry point and a single exit point.
Basic Block Construction Algorithm (textbook p. 526): finding the leaders of blocks
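To summarize the textbook's algorithm: the first three-address instruction is a
leader; any instruction that is the target of a conditional or unconditional
jump is a leader; and any instruction immediately following a jump is a leader.
Each basic block then runs from a leader up to, but not including, the next
leader. A minimal sketch of the leader-marking step, assuming a simple
array-of-instructions representation (the Instr struct and its fields are
illustrative, not from the text):

#include <stdbool.h>

typedef struct {
    bool is_jump;  /* conditional or unconditional branch (illustrative) */
    int  target;   /* index of the branch target, or -1 if none          */
} Instr;

/* Mark the leaders of basic blocks in a straight-line TAC array. */
void find_leaders(const Instr *code, int n, bool *leader) {
    for (int i = 0; i < n; i++)
        leader[i] = false;
    if (n > 0)
        leader[0] = true;                      /* rule 1: first instruction */
    for (int i = 0; i < n; i++) {
        if (code[i].is_jump) {
            if (code[i].target >= 0 && code[i].target < n)
                leader[code[i].target] = true; /* rule 2: jump target */
            if (i + 1 < n)
                leader[i + 1] = true;          /* rule 3: instruction after a jump */
        }
    }
}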
Loops in Programming Languages
Programming-language constructs like while-statements, do-while-
statements, and for-statements naturally give rise to loops in programs.
Since virtually every program spends most of its time in executing its loops,
it is especially important for a compiler to generate good code for loops.
Many code transformations depend upon the identification of loops in a flow
graph.
Identifying Loops in a Flow Graph (textbook Example 8.9)
We say that a set of nodes L in a flow graph is a loop if L contains a node e
called the loop entry, such that:
1. e is not ENTRY, the entry of the entire flow graph.
2. No node in L besides e has a predecessor outside L. That is, every path
from ENTRY to any node in L goes through e.
3. Every node in L has a nonempty path, completely within L, to e.
This flow graph has three loops: {B3}, {B6}, and {B2, B3, B4}.
The first two are single nodes with an edge to the node itself. For instance,
B3 forms a loop with B3 as its entry. Note that the last requirement for a loop
is that there be a nonempty path from B3 to itself. Thus, a single node like
B2, which does not have an edge B2 -> B2, is not a loop, since there is no
nonempty path from B2 to itself within {B2}.
The third loop, L = {B2, B3, B4}, has B2 as its loop entry. Note that among
these three nodes, only B2 has a predecessor, B1, that is not in L. Further,
each of the three nodes has a nonempty path to B2 staying within L. For
instance, B2 has the path B2 -> B3 -> B4 -> B2.
Code optimization of basic blocks
We can often obtain a substantial improvement in the running time of code
merely by performing local optimization within each basic block by itself.
Through global optimization, which looks at how information flows among
the basic blocks of a program, still more optimization can be done.
DAG Representation of Basic Blocks
Many important techniques for local optimization begin by transforming a
basic block into a DAG (directed acyclic graph). The idea extends naturally to
the collection of expressions that are created within one basic block.
We construct a DAG for a basic block as follows:
1. There is a node in the DAG for each of the initial values of the variables
appearing in the basic block.
2. There is a node N associated with each statement s within the block. The
children of N are those nodes corresponding to statements that are the last
definitions, prior to s, of the operands used by s.
3. Node N is labeled by the operator applied at s, and also attached to N is
the list of variables for which it is the last definition within the block.
4. Certain nodes are designated output nodes. These are the nodes whose
variables are live on exit from the block; that is, their values may be used
later, in another block of the flow graph.
Calculation of these "live variables" is a matter for global flow analysis.
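The block being discussed is not reproduced here, but it can be reconstructed
from the code below; it is the textbook's running example:

a = b + c
b = a - d
c = b + c
d = a - d

Its DAG has three interior nodes: a + node for the first b + c; a - node for
a - d, shared by the assignments to b and d because neither operand changes
between those two statements; and a second + node for the later b + c,
distinct from the first because b has been redefined.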
It might appear that, since there are only three nonleaf nodes in the DAG,
the basic block can be replaced by a block with only three statements.
In fact, if b is not live on exit from the block, then we do not need to compute
that variable and can use d to receive the value represented by the node
labeled -. The block then becomes
a=b+c
d=a-d
c=d+c
However, if both b and d are live on exit, then a fourth statement must be
used to copy the value from one to the other.
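For example, keeping d as the computed value, the copy can be placed as
follows:

a = b + c
d = a - d
b = d
c = d + c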
Reconstruction of three address code from modified DAG
After we perform whatever optimizations are possible while constructing the
DAG or by manipulating the DAG once constructed, we may reconstitute the
three-address code for the basic block from which we built the DAG.
If b is not live on exit from the block, we remove b from the list of variables
attached to the - node and then reconstruct the code as shown above.
For each node that has one or more attached variables, we construct a three-
address statement that computes the value of one of those variables. We
prefer to compute the result into a variable that is live on exit from the block.
However, if we do not have global live-variable information to work from, we
need to assume that every variable of the program (but not temporaries that
are generated by the compiler to process expressions) is live on exit from the
block. If the node has more than one live variable attached, then we must
introduce copy statements to give the correct value to each of those
variables (a copy is cheaper than recomputing the expression). Sometimes,
global optimization can eliminate those copies, if we can arrange to use one
of the two variables in place of the other.
Loops are important because programs spend most of their time executing
them, and optimizations that improve the performance of loops can have a
significant impact. Thus, it is essential that we identify loops and treat them
specially.
Peephole Optimization (textbook p. 549)
Depth-First Search Algorithm (textbook p. 602)
Depth-First Spanning Tree, DFST (textbook p. 661)
Programs spend most of their execution time in loops. We introduce the
following concepts: dominators, depth first ordering, back edges, graph
depth, and reducibility. Each of these is needed for our subsequent
discussions on finding loops and the speed of convergence of iterative data
flow analysis.
There is one back edge: (B5, B2). Its natural loop consists of the
nodes B2, B3, B4, B5.
Reducible Control Flow Graphs. A control flow graph G is said to
be reducible if the removal of its back edges leads to an acyclic graph where
each node can be reached from the initial node of G. In this case, all the
edges of this acyclic graph are called forward edges.
What is depth-first ordering in compiler design?
Depth-first ordering is a way of numbering the nodes of a flow graph based
on a depth-first traversal: nodes are visited along each path from the root to
the deepest level before backtracking. The depth-first ordering itself is the
reverse of the order in which nodes are last visited (that is, the reverse of
the postorder) during the traversal.
Depth First Spanning Tree (DFST)
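A minimal sketch of depth-first numbering in the spirit of the textbook's
search() procedure; the adjacency-array representation and array sizes are
illustrative assumptions:

#include <stdbool.h>

#define MAXN 64
int  nsucc[MAXN];    /* number of successors of each node (illustrative)   */
int  succ[MAXN][4];  /* successor lists (illustrative)                     */
bool visited[MAXN];
int  dfn[MAXN];      /* depth-first number = position in reverse postorder */
int  counter;        /* initialized to the total number of nodes           */

void search(int n) {
    visited[n] = true;
    for (int i = 0; i < nsucc[n]; i++) {
        int s = succ[n][i];
        if (!visited[s])
            search(s);   /* edge n -> s joins the depth-first spanning tree */
    }
    dfn[n] = counter--;  /* nodes that finish first get the highest numbers;
                            the entry node finishes last and is numbered 1  */
}

Usage: set counter to the node count, clear visited[], and call search() on
the entry node.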
Loop Determination
In Fig. 9.38, there are five back edges, those whose heads dominate their
tails: 10 -> 7, 7 -> 4, 4 -> 3, 8 -> 3 and 9 -> 1. Note that these are exactly
the edges that one would think of as forming loops in the flow graph.
Back edge 10 -> 7 has natural loop {7, 8, 10}, since 8 and 10 are the only
nodes that can reach 10 without going through 7. Back edge 7 -> 4 has a
natural loop consisting of {4, 5, 6, 7, 8, 10} and therefore contains the loop
of 10 -> 7. We thus assume the latter is an inner loop contained inside the
former. The natural loops of back edges 4 -> 3 and 8 -> 3 have the same
header, node 3, and they also happen to have the same set of nodes:
{3, 4, 5, 6, 7, 8, 10}. We shall therefore combine these two loops as one. This loop
contains the two smaller loops discovered earlier. Finally, the edge 9 -> 1
has as its natural loop the entire flow graph, and therefore is the outermost
loop. In this example, the four loops are nested within one another. It is
typical, however, to have two loops, neither of which is a subset of the other.
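A sketch of the natural-loop construction used above: for a back edge n -> d,
the natural loop is d plus every node that can reach n without passing through
d. The predecessor-list representation is an illustrative assumption:

#include <stdbool.h>

#define MAXN 64
int  npred[MAXN];    /* number of predecessors of each node (illustrative) */
int  pred[MAXN][4];  /* predecessor lists (illustrative)                   */
bool in_loop[MAXN];

/* Walk backward over predecessors, stopping at nodes already marked. */
static void mark(int m) {
    if (in_loop[m]) return;
    in_loop[m] = true;
    for (int i = 0; i < npred[m]; i++)
        mark(pred[m][i]);
}

/* After the call, in_loop[] holds the natural loop of back edge n -> d. */
void natural_loop(int n, int d) {
    for (int i = 0; i < MAXN; i++)
        in_loop[i] = false;
    in_loop[d] = true;  /* the header is in the loop and blocks the walk */
    mark(n);            /* everything that can reach n joins the loop    */
}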
Control Flow Analysis
Control-flow analysis begins by constructing a control-flow graph, which is a
graph of the different possible paths program flow could take through a
function. To build the graph, we first divide the code into basic blocks. A
basic block is a segment of the code that a program must enter at the
beginning and exit only at the end. This means that only the first statement
can be reached from outside the block (there are no branches into the
middle of the block) and all statements are executed consecutively after the
first one is (no branches or halts until the exit). Thus a basic block has
exactly one entry point and one exit point. If a program executes the first
instruction in a basic block, it must execute every instruction in the block
sequentially after it.
A basic block begins in one of several ways:
• the entry point into the function
• the target of a branch (in our example, any label)
• the instruction immediately following a branch or a return
A basic block ends in any of the following ways:
• a jump statement
• a conditional or unconditional branch
• a return statement
Now we can construct the control-flow graph between the blocks. Each basic
block is a node in the graph, and the possible different routes a program
might take are the connections, i.e. if a block ends with a branch, there will
be a path leading from that block to the branch target. The blocks that can
follow a block are called its successors.
There may be multiple successors or just one. Similarly, the block may have
many, one, or no predecessors.
Local Optimizations
Optimizations performed exclusively within a basic block are called "local
optimizations". These are typically the easiest to perform since we do not
consider any control flow information, we just work with the statements
within the block. Many of the local optimizations we will discuss have
corresponding global optimizations that operate on the same principle, but
require additional analysis to perform. We'll consider some of the more
common local optimizations as examples.
Constant Folding
Constant folding refers to the evaluation at compile-time of expressions
whose operands are known to be constant. In its simplest form, it involves
determining that all of the operands in an expression are constant-valued,
performing the evaluation of the expression at compile-time, and then
replacing the expression by its value. If an expression such as 10 + 2 * 3 is
encountered, the compiler can compute the result at compile-time (16) and
emit code as if the input contained the result rather than the original
expression. Similarly, a constant condition, such as a conditional branch
if a < b goto L1 else goto L2 where a and b are constant, can be replaced by
goto L1 or goto L2, depending on the truth of the expression evaluated at
compile-time. The constant expression has to be evaluated at least once, but
if the compiler does it at compile time, it does not have to be evaluated
again each time it is reached at run time.
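A minimal constant-folding sketch over a toy expression tree; the Node type
and fold() function are illustrative, not from the handout:

#include <stddef.h>

typedef enum { NUM, ADD, MUL } Kind;
typedef struct Node {
    Kind kind;
    int value;                  /* valid when kind == NUM */
    struct Node *left, *right;  /* valid for ADD and MUL  */
} Node;

/* Recursively replace constant subtrees by their computed values,
   so a tree for 10 + 2 * 3 collapses to the single value 16. */
Node *fold(Node *n) {
    if (n->kind == NUM) return n;
    n->left  = fold(n->left);
    n->right = fold(n->right);
    if (n->left->kind == NUM && n->right->kind == NUM) {
        n->value = (n->kind == ADD) ? n->left->value + n->right->value
                                    : n->left->value * n->right->value;
        n->kind = NUM;
        n->left = n->right = NULL;  /* children leak; fine for a sketch */
    }
    return n;
}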
Algebraic Simplification And Reassociation
Simplifications use algebraic properties or particular operator-operand
combinations to simplify expressions. Reassociation refers to using
properties such as associativity, commutativity and distributivity to
rearrange an expression to enable other optimizations such as constant-
folding or loop-invariant code motion. The most obvious of these are the
optimizations that can remove useless instructions entirely via algebraic
identities. The rules of arithmetic can come in handy when looking for
redundant calculations to eliminate. Consider the examples below, which
allow you to replace an expression on the left with a simpler equivalent on
the right:
x+0 = x
0+x = x
x*1 = x
1*x = x
0/x = 0
x-0 = x
b && true = b
b && false = false
b || true = true
b || false = b
The use of algebraic rearrangement can restructure an expression to enable
constant folding or common sub-expression elimination and so on.
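A sketch of how a few of the identities above can be applied, reusing the toy
Node type from the constant-folding sketch (again, illustrative only):

/* Apply x + 0 = x, 0 + x = x, x * 1 = x, and 1 * x = x bottom-up. */
Node *simplify(Node *n) {
    if (n->kind == NUM) return n;
    n->left  = simplify(n->left);
    n->right = simplify(n->right);
    if (n->kind == ADD) {
        if (n->right->kind == NUM && n->right->value == 0) return n->left;
        if (n->left->kind  == NUM && n->left->value  == 0) return n->right;
    }
    if (n->kind == MUL) {
        if (n->right->kind == NUM && n->right->value == 1) return n->left;
        if (n->left->kind  == NUM && n->left->value  == 1) return n->right;
    }
    return n;
}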
Operator Strength Reduction
Operator strength reduction replaces an operator by a "less expensive" one.
Given each group of identities below, which operations are the most and
least expensive, assuming
f is a float and i is an int?
i * 2 = 2 * i = i + i = i << 1
i / 2 = (int)(i * 0.5)
0 - i = -i
f * 2 = 2.0 * f = f + f
f / 2.0 = f * 0.5
Strength reduction is often performed as part of loop-induction variable
elimination.
[Table omitted: a source code snippet, its three-address code (TAC), and the optimized TAC.]
This eliminates the multiplication entirely and reduces the need for an extra
temporary.
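The table itself is not reproduced here; a hypothetical TAC fragment in the
same spirit (the names _tmp0, _tmp1, and t are illustrative) shows a loop
induction-variable multiplication reduced to an addition:

Before (inside a loop body):
_tmp0 = i * 4
_tmp1 = base + _tmp0

After (t is initialized to i * 4 before the loop and updated in step with i):
t = t + 4
_tmp1 = base + t

The multiplication is gone, and the temporary _tmp0 is no longer needed.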
Copy Propagation
This optimization is similar to constant propagation, but generalized to non-
constant values. If we have an assignment a = b in our instruction stream,
we can replace later occurrences of a with b (assuming there are no changes
to either variable in-between).
Given the way we generate TAC code, this is a particularly valuable
optimization since it is able to eliminate a large number of instructions that
only serve to copy values from one variable to another.
The original code makes a copy of tmp1 in tmp2 and a copy of tmp3 in tmp4.
In the optimized version, those unnecessary copies are eliminated and the
original variables are propagated into the later uses:
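The handout's before/after code is not reproduced here; a hypothetical pair
matching the description (the tmp names are illustrative):

Before:
tmp2 = tmp1
tmp4 = tmp3
tmp5 = tmp2 + tmp4

After copy propagation (the two copies become dead code and can be removed):
tmp5 = tmp1 + tmp3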
Common Subexpression Elimination
Two operations are common if they produce the same result. In such a case,
it is likely more efficient to compute the result once and reference it the
second time rather than re-evaluate it. An expression is alive if the operands
used to compute the expression have not been changed. An expression that
is no longer alive is dead.
Which subexpressions can be eliminated? How can valid (live) common
subexpressions be determined? After constant folding, constant propagation,
and elimination of common subexpressions, an optimized version of the code
results; a small illustration follows.
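The handout's snippet is not reproduced here; a hypothetical TAC fragment
(names illustrative) shows the idea:

Before:
t1 = b + c
t2 = t1 * d
t3 = b + c      ; common: same operands, and neither b nor c changed
t4 = t3 * e

After common subexpression elimination:
t1 = b + c
t2 = t1 * d
t4 = t1 * e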
Global Optimizations, Data-Flow Analysis
So far we were only considering making changes within one basic block. With
some additional analysis, we can apply similar optimizations across basic
blocks, making them global optimizations. It’s worth pointing out that global
in this case does not mean across the entire program. We usually only
optimize one function at a time. Interprocedural analysis is an even larger
task, one not even attempted by some compilers.
The additional analysis the optimizer must do to perform optimizations
across basic blocks is called data-flow analysis. Data-flow analysis is much
more complicated than control-flow analysis.
Let’s consider a global common subexpression elimination optimization as
our example. Careful analysis across blocks can determine whether an
expression is alive on entry to a block. Such an expression is said to be
available at that point. Once the set of available expressions is known,
common sub-expressions can be eliminated on a global basis.
Each block is a node in the flow graph of a program. The successor set
(succ(x)) for a node x is the set of all nodes that x directly flows into. The
predecessor set (pred(x)) for a node x is the set of all nodes that flow directly
into x. An expression is defined at the
point where it is assigned a value and killed when one of its operands is
subsequently assigned a new value. An expression is available at some point
p in a flow graph if every path leading to p contains a prior definition of that
expression which is not subsequently killed.
avail[B] = set of expressions available on entry to block B
exit[B] = set of expressions available on exit from B
avail[B] = ∩ exit[x]: x ∈ pred[B] (i.e. B has available the intersection of
the exit of its predecessors)
killed[B] = set of the expressions killed in B
defined[B] = set of expressions defined in B
exit[B] = avail[B] - killed[B] + defined[B]
avail[B] = ∩ (avail[x] - killed[x] + defined[x]) : x ∈ pred[B]
Here is an algorithm for global common sub-expression elimination:
1) First, compute defined and killed sets for each basic block (this does not
involve any of its predecessors or successors).
2) Iteratively compute the avail and exit sets for each block by running the
following algorithm until you hit a stable fixed point (a sketch of this
iteration appears after the list):
a) Identify each statement s of the form a = b op c in some block B such
that b op c is available at the entry to B and neither b nor c is redefined in B
prior to s.
b) Follow flow of control backward in the graph passing back to but not
through each block that defines b op c. The last computation of b op c in
such a block reaches s.
c) After each computation d = b op c identified in step 2a, add statement t
= d to that block where t is a new temp.
d) Replace s by a = t.
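A minimal sketch of the fixed-point iteration in step 2, representing each set
of expressions as a bit set (so at most 32 expressions here); the graph and
set representations are illustrative assumptions, not from the handout:

#include <stdbool.h>
#include <stdint.h>

#define MAXB 32
int      nblocks;                     /* block 0 is the entry block       */
int      npreds[MAXB];
int      preds[MAXB][4];              /* predecessor lists (illustrative) */
uint32_t killed[MAXB], defined[MAXB]; /* per-block sets from step 1       */
uint32_t avail[MAXB], exitset[MAXB];  /* entry/exit availability          */

void compute_available(void) {
    /* Optimistic start: assume everything is available everywhere. */
    for (int b = 0; b < nblocks; b++) exitset[b] = ~0u;
    exitset[0] = defined[0];          /* nothing flows into the entry block */
    bool changed = true;
    while (changed) {                 /* iterate until a stable fixed point */
        changed = false;
        for (int b = 1; b < nblocks; b++) {
            uint32_t in = ~0u;
            for (int i = 0; i < npreds[b]; i++)
                in &= exitset[preds[b][i]];  /* intersect predecessors' exits */
            avail[b] = in;
            uint32_t out = (in & ~killed[b]) | defined[b];
            if (out != exitset[b]) { exitset[b] = out; changed = true; }
        }
    }
}

Once avail[] is stable, step a) amounts to checking whether the bit for
b op c is set on entry to block B.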