Design Issues in Code Generator
There are 5 main issues in the design of a code generator as explained below:
1) Input to the Code Generator
The input to the code generator is the intermediate representation of the source program
produced by the front end, along with information in the symbol table that is used to
determine the run-time addresses of the data objects denoted by the names in the IR.
The many choices for the IR include three-address representations such as quadruples,
triples, indirect triples
virtual machine representations such as byte codes and stack-machine code; linear
representations such as postfix notation and graphical representations such as syntax trees
and DAG's
2) The Target Program
The instruction-set architecture of the target machine has a significant impact on the
difficulty of constructing a good code generator that produces high-quality machine code.
The most common target-machine architectures are RISC (reduced instruction set
computer), CISC (complex instruction set computer), and stack based.
A RISC machine typically has many registers, three-address instructions, simple
addressing modes, and a relatively simple instruction-set architecture
A CISC machine typically has few registers, two-address instructions, a variety of
addressing modes, several register classes, variable-length instructions, and instructions
with side effects.
In a stack-based machine, operations are done by pushing operands onto a stack and then
performing the operations on the operands at the top of the stack. To achieve high
performance the top of the stack is typically kept in registers.
Stack-based architectures were revived with the introduction of the Java Virtual Machine
(JVM), which produced object code either as absolute or relative
Producing an absolute machine-language program as output has the advantage that it can
be placed in a fixed location in memory and can be executed quickly.
3) Instruction Selection
The code generator must map the IR program into a code sequence that can be
executed by the target machine. The complexity of performing this mapping is
determined by factors such as : the level of the IR, the nature of the instruction-set
architecture, the desired quality of the generated code.
If the IR is high level, the code generator may translate each IR statement into a
sequence of machine instructions using code templates.
If the IR reflects some of the low-level details of the underlying machine, then the
code generator can use this information to generate more efficient code sequences.
If the target machine does not support each data type in a uniform manner, then
each exception to the general rule requires special handling
4) Register Allocation
A key problem in code generation is deciding what values to hold in what registers.
Registers are the fastest computational unit on the target machine, but we usually do not
have enough of them to hold all values.
Values not held in registers need to reside in memory. Instructions involving register
operands are invariably shorter and faster than those involving operands in memory, so
efficient utilization of registers is particularly important.
The use of registers is often subdivided into two sub problems:
1. Register allocation, during which we select the set of variables that will reside
in registers at each point in the program.
2. Register assignment, during which we pick the specific register that variable
will reside in.
5) Evaluation Order
The order in which the computations are performed can often affect the efficiency of the
target code.
Some computation orders require fewer registers to hold intermediate results than others.
However picking the best order in the general case is a difficult NP-Complete problem