generate three-address code, assuming that all array elements are integers tak-
ing four bytes each. In parts (d) and (e), assume that a, b, and c are constants
giving the location of the first (0th) elements of the arrays with those names,
as in all previous examples of array accesses in this chapter.
! Exercise 8.6.2: Repeat Exercise 8.6.1 parts (d) and (e), assuming that the
arrays a, b, and c are located via pointers, pa, pb, and pc, respectively, pointing
to the locations of their respective first elements.
Exercise 8.6.3: Convert your three-address code from Exercise 8.6.1 into ma-
chine code for the machine model of this section. You may use as many registers
as you need.
Exercise 8.6.4: Convert your three-address code from Exercise 8.6.1 into ma-
chine code, using the simple code-generation algorithm of this section, assuming
three registers are available. Show the register and address descriptors after
each step.
Exercise 8.6.5: Repeat Exercise 8.6.4, but assuming only two registers are
available.
8.7 Peephole Optimization
While most production compilers produce good code through careful instruc-
tion selection and register allocation, a few use an alternative strategy: they
generate naive code and then improve the quality of the target code by applying
"optimizing" transformations to the target program. The term "optimizing" is
somewhat misleading because there is no guarantee that the resulting code is
optimal under any mathematical measure. Nevertheless, many simple transfor-
mations can significantly improve the running time or space requirement of the
target program.
A simple but effective technique for locally improving the target code is
peephole optimization, which is done by examining a sliding window of target
instructions (called the peephole) and replacing instruction sequences within
the peephole by a shorter or faster sequence, whenever possible. Peephole
optimization can also be applied directly after intermediate code generation to
improve the intermediate representation.
The peephole is a small, sliding window on a program. The code in the
peephole need not be contiguous, although some implementations do require
this. It is characteristic of peephole optimization that each improvement may
spawn opportunities for additional improvements. In general, repeated passes
over the target code are necessary to get the maximum benefit. In this sec-
tion, we shall give the following examples of program transformations that are
characteristic of peephole optimizations:
• Redundant-instruction elimination
• Flow-of-control optimizations
• Algebraic simplifications
• Use of machine idioms
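All four kinds of transformation can be driven by the same simple mechanism: slide a small window over the instruction sequence, try each rewrite rule on the window, and repeat whole passes until no rule applies. The sketch below, written in Python with instructions represented simply as strings, shows one way such a driver might look; the representation and the rule interface are assumptions made for this illustration, not part of any particular compiler. Some of the transformations that follow fit this windowed interface, while others (such as unreachable-code removal) are more naturally written as a single pass over the whole instruction list.

def peephole(instrs, rules, window=2):
    """Slide a window over instrs, applying the first rule that matches,
    and repeat whole passes until a pass makes no change.
    A rule must return None when it has nothing to improve."""
    changed = True
    while changed:
        changed = False
        i = 0
        while i < len(instrs):
            chunk = instrs[i:i + window]
            for rule in rules:
                new = rule(chunk)          # a rule returns a replacement list,
                if new is not None:        # or None if it does not apply
                    instrs[i:i + window] = new
                    changed = True         # one improvement may expose others
                    break
            else:
                i += 1                     # no rule fired; advance the window
    return instrs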
8.7.1 Eliminating Redundant Loads and Stores
If we see the instruction sequence
LD a, R0
ST R0, a
in a target program, we can delete the store instruction because whenever it is
executed, the first instruction will ensure that the value of a has already been
loaded into register R0. Note that if the store instruction had a label, we could
not be sure that the first instruction is always executed before the second, so we
could not remove the store instruction. Put another way, the two instructions
have to be in the same basic block for this transformation to be safe.
Redundant loads and stores of this nature would not be generated by the
simple code generation algorithm of the previous section. However, a naive code
generation algorithm like the one in Section 8.1.3 would generate redundant
sequences such as these.
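A rule for this transformation, in the style of the driver sketched at the start of this section, might look as follows. The instruction syntax ("LD a, R0", "ST R0, a") follows the machine model of this chapter; treating instructions as plain strings, and assuming that labeled instructions are recognized elsewhere so that both instructions lie in the same basic block, are simplifications made for the example.

def drop_redundant_store(chunk):
    """Delete 'ST R, a' when it immediately follows 'LD a, R'.
    Assumes neither instruction is labeled, so both lie in the
    same basic block."""
    if len(chunk) != 2:
        return None
    ld, st = chunk[0].split(), chunk[1].split()
    if (len(ld) == 3 and len(st) == 3
            and ld[0] == "LD" and st[0] == "ST"
            and ld[1].rstrip(",") == st[2]       # same memory location
            and ld[2] == st[1].rstrip(",")):     # same register
        return [chunk[0]]                        # keep the load, drop the store
    return None

With a window of two instructions, the driver applies this rule wherever the pair occurs.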
8.7.2 Eliminating Unreachable Code
Another opportunity for peephole optimization is the removal of unreachable
instructions. An unlabeled instruction immediately following an unconditional
jump may be removed. This operation can be repeated to eliminate a sequence
of instructions. For example, for debugging purposes, a large program may
have within it certain code fragments that are executed only if a variable debug
is equal to 1. In the intermediate representation, this code may look like
if debug == 1 goto L1
goto L2
L1: print debugging information
L2:
One obvious peephole optimization is to eliminate jumps over jumps. Thus,
no matter what the value of debug, the code sequence above can be replaced
by
if debug != 1 goto L2
print debugging information
L2:
If debug is set to 0 at the beginning of the program, constant propagation
would transform this sequence into
if 0 != 1 goto L2
print debugging information
L2:
Now the argument of the first statement always evaluates to true, so the
statement can be replaced by goto L2. Then all statements that print debug-
ging information are unreachable and can be eliminated one at a time.
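One way to express unreachable-code removal as a single pass is sketched below, again treating instructions as strings in which labels end with a colon and goto is the only unconditional jump; both conventions are assumptions of this sketch.

def remove_unreachable(instrs):
    """After an unconditional 'goto', drop instructions until the next
    labeled one, since control can never fall into them."""
    out, skipping = [], False
    for ins in instrs:
        parts = ins.split()
        labeled = bool(parts) and parts[0].endswith(":")
        if labeled:
            skipping = False         # a label makes code reachable again
        if not skipping:
            out.append(ins)
        if parts and parts[0] == "goto":
            skipping = True          # what follows an unconditional jump is dead
    return out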
8.7.3 Flow-of-Control Optimizations
Simple intermediate code-generation algorithms frequently produce jumps to
jumps, jumps to conditional jumps, or conditional jumps to jumps. These
unnecessary jumps can be eliminated in either the intermediate code or the
target code by the following types of peephole optimizations. We can replace
the sequence
goto L1
...
L1: goto L2
by the sequence
goto L2
...
L1: goto L2
If there are now no jumps to L1, then it may be possible to eliminate the
statement L1: goto L2 provided it is preceded by an unconditional jump.
Similarly, the sequence
if a < b goto L1
...
L1: goto L2
can be replaced by the sequence
if a < b goto L2
...
L1: goto L2
Finally, suppose there is only one jump to L1 and L1 is preceded by an
unconditional goto. Then the sequence
goto L1
...
L1: if a < b goto L2
L3:
may be replaced by the sequence
if a < b goto L2
goto L3
...
L3:
While the number of instructions in the two sequences is the same, we sometimes
skip the unconditional jump in the second sequence, but never in the first. Thus,
the second sequence is superior to the first in execution time.
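The first of these replacements, redirecting a jump whose target is itself an unconditional jump, can be sketched as a pass over the instruction list. As before, instructions are plain strings, labels end with a colon, and goto is the only unconditional jump; those are assumptions of the example. Chains such as L1 to L2 to L3 are handled by rerunning the pass until nothing changes, as the peephole driver earlier in this section would do.

def collapse_jump_chains(instrs):
    """Redirect any jump that targets a label whose only instruction
    is an unconditional 'goto L2' so that it jumps to L2 directly."""
    target = {}
    for ins in instrs:
        parts = ins.split()
        if (len(parts) == 3 and parts[0].endswith(":")
                and parts[1] == "goto"):
            target[parts[0][:-1]] = parts[2]      # e.g. L1 -> L2
    out = []
    for ins in instrs:
        parts = ins.split()
        if len(parts) >= 2 and parts[-2] == "goto" and parts[-1] in target:
            parts[-1] = target[parts[-1]]         # retarget the jump
            out.append(" ".join(parts))
        else:
            out.append(ins)
    return out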
8.7.4 Algebraic Simplification and Reduction in Strength
In Section 8.5 we discussed algebraic identities that could be used to simplify
DAG's. These algebraic identities can also be used by a peephole optimizer to
eliminate three-address statements such as x = x + 0 or x = x * 1 in the
peephole.
Similarly, reduction-in-strength transformations can be applied in the peep-
hole to replace expensive operations by equivalent cheaper ones on the target
machine. Certain machine instructions are considerably cheaper than others
and can often be used as special cases of more expensive operators. For ex-
ample, x² is invariably cheaper to implement as x * x than as a call to an
exponentiation routine. Fixed-point multiplication or division by a power of
two is cheaper to implement as a shift. Floating-point division by a constant
can be approximated as multiplication by a constant, which may be cheaper.
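A sketch of a few such rewrites on three-address statements, here represented as (op, arg1, arg2, result) quadruples, is given below; the quadruple form, the copy operator "=", and the particular set of identities are assumptions chosen for illustration.

def reduce_strength(quad):
    """Rewrite one quadruple into a cheaper equivalent when an
    algebraic identity or a strength reduction applies."""
    op, a, b, res = quad
    if op == "+" and b == 0:
        return ("=", a, None, res)        # x = y + 0  becomes  x = y
    if op == "*" and b == 1:
        return ("=", a, None, res)        # x = y * 1  becomes  x = y
    if op == "*" and isinstance(b, int) and b > 0 and (b & (b - 1)) == 0:
        k = b.bit_length() - 1
        return ("<<", a, k, res)          # multiply by 2**k becomes a left shift
    if op == "**" and b == 2:
        return ("*", a, a, res)           # x = y ** 2 becomes x = y * y
    return quad                           # no rewrite applies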
8.7.5 Use of Machine Idioms
The target machine may have hardware instructions to implement certain spe-
cific operations efficiently. Detecting situations that permit the use of these
instructions can reduce execution time significantly. For example, some ma-
chines have auto-increment and auto-decrement addressing modes. These add
or subtract one from an operand before or after using its value. The use of the
modes greatly improves the quality of code when pushing or popping a stack,
as in parameter passing. These modes can also be used in code for statements
like x = x + 1.
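For instance, on a machine that provides a single-instruction increment, the load/add-one/store sequence generated for x = x + 1 can be collapsed into it. The sketch below assumes such a hypothetical INC instruction and the string instruction format used earlier in this section; neither is a feature of any particular real machine.

def use_inc_idiom(chunk):
    """Replace  LD x, R / ADD R, R, #1 / ST R, x  with  INC x  on a
    (hypothetical) target that has an INC instruction."""
    if len(chunk) != 3:
        return None
    ld, add, st = (ins.split() for ins in chunk)
    if (len(ld) == 3 and len(add) == 4 and len(st) == 3
            and ld[0] == "LD" and add[0] == "ADD" and st[0] == "ST"
            and add[3] == "#1"
            and ld[2] == add[1].rstrip(",") == add[2].rstrip(",")
                      == st[1].rstrip(",")
            and ld[1].rstrip(",") == st[2]):
        return ["INC " + st[2]]           # a single increment of x
    return None

Run under the peephole driver with a window of three instructions, this rule fires on exactly the three-instruction pattern shown in its comment.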
8.7.6 Exercises for Section 8.7
Exercise 8.7.1: Construct an algorithm that will perform redundant-instruc-
tion elimination in a sliding peephole on target machine code.
Exercise 8.7.2: Construct an algorithm that will do flow-of-control optimiza-
tions in a sliding peephole on target machine code.
Exercise 8.7.3: Construct an algorithm that will do simple algebraic simpli-
fications and reductions in strength in a sliding peephole on target machine
code.
8.8 Register Allocation and Assignment
Instructions involving only register operands are faster than those involving
memory operands. On modern machines, processor speeds are often an order
of magnitude or more faster than memory speeds. Therefore, efficient utilization
of registers is vitally important in generating good code. This section presents
various strategies for deciding at each point in a program what values should
reside in registers (register allocation) and in which register each value should
reside (register assignment).
One approach to register allocation and assignment is to assign specific
values in the target program to certain registers. For example, we could decide
to assign base addresses to one group of registers, arithmetic computations to
another, the top of the stack to a fixed register, and so on.
This approach has the advantage that it simplifies the design of a code gener-
ator. Its disadvantage is that, applied too strictly, it uses registers inefficiently;
certain registers may go unused over substantial portions of code, while unnec-
essary loads and stores are generated into the other registers. Nevertheless, it is
reasonable in most computing environments to reserve a few registers for base
registers, stack pointers, and the like, and to allow the remaining registers to
be used by the code generator as it sees fit.
8.8.1 Global Register Allocation
The code generation algorithm in Section 8.6 used registers to hold values for
the duration of a single basic block. However, all live variables were stored
at the end of each block. To save some of these stores and corresponding
loads, we might arrange to assign registers to frequently used variables and keep
these registers consistent across block boundaries (globally). Since programs
spend most of their time in inner loops, a natural approach to global register
assignment is to try to keep a frequently used value in a fixed register throughout
a loop. For the time being, assume that we know the loop structure of a flow
graph, and that we know what values computed in a basic block are used outside
that block. The next chapter covers techniques for computing this information.