Compiler Design Ch1
Compiler Design Ch1
What is a compiler?
a program that reads a program written in one language (the source language) and translates it into an equivalent
program in another language (the target language).
Output
Using a high-level language for programming has a large impact on how fast programs can be developed.
Compared to machine language, the notation used by programming languages is closer to the way humans think
about problems.
Programs written in a high-level language tend to be shorter than equivalent programs written in machine
language.
The same program can be compiled to many different machine languages and, hence, be brought to run on
many different machines.
Since different platforms, or hardware architectures along with the operating systems (Windows, Macs, Unix), require
different machine code, you must compile most programs separately for each platform.
1
compi progra
le r m
compi compi
ler ler
1) Interpreter
Works by analyzing and executing the source program commands one at a time
Does not translate the whole source program into object code
o Programmer is working in interactive mode and needs to view and update variables
o Commands have simple formats, and thus can be quickly analyzed and executed
o Basic interpreter, Lisp interpreter, UNIX shell command interpreter, SQL interpreter, java interpreter…
You can run bytecode on any computer that has a Java Interpreter installed
2
Java compil
Java bytecode
Program er Interpreter
2) Assemblers
3) Linker
4) Loader
Loading of the executable codes, which are the outputs of linker, into main memory.
5) Pre-processors
A pre-processor is a separate program that is called by the compiler before actual translation begins.
Such a pre-processor:
3
A compiler consists of
internally of a number of
steps, or phases, that
perform distinct logical
operations.
The phases of a compiler
are shown in the next slide,
together with three
auxiliary components that
interact with some or all of
the phases:
The symbol table,
the literal table,
and error handler.
Synthesis.
o Creates an intermediate representation of the source program.
o During analysis, the operations implied by the source program are determined and recorded in hierarchical structure
called a tree.
o The synthesis part constructs the desired program from the intermediate representation.
4
Analysis of the source program
1. Linear/Lexical analysis
2. Hierarchical/Syntax analysis
3. Semantic analysis
The stream of characters making up the source program is read from left to right and is grouped into tokens.
A lexical analyzer, also called a lexer or a scanner, receives a stream of characters from the source program and groups
them into tokens.
Examples:
Identifiers
Keywords
Symbols (+, -, …)
Numbers …
Blanks, new lines, tabulation marks will be removed during lexical analysis.
Example
a[index] = 4 + 2;
a identifier
[ left bracket
index identifier
] right bracket
= assignment operator
4 number
+ plus operator
2 number
; semicolon
A scanner may perform other operations along with the recognition of tokens.
5
a) lex - produces C source code (UNIX/linux).
The parser receives the source code in the form of tokens from the scanner and performs syntax analysis.
The results of syntax analysis are usually represented by a parse tree or a syntax tree.
Syntax tree à each interior node represents an operation and the children of the node represent the arguments of the
operation.
The syntactic structure of a programming language is determined by context free grammar (CFG).
6
Sometimes syntax trees are called abstract syntax trees, since they represent a further abstraction from parse trees.
Example is shown in the following figure.
3. Semantic analysis
o Type checking.
7
Ex. Consider again the line of C code: a[index] = 4 + 2
Code Improvement
o Constant folding
o Improving loops
8
o Should be easy to produce
Intermediate code
t1=2
t2 = 4 + t1
a[index] = t2
II. IC optimizer
This phase changes the IC so that the code generator produces a faster and less memory consuming program.
The optimized code does the same thing as the original (non-optimized) code but with less cost in terms of CPU time and
memory space.
Intermediate code
There are several techniques of optimizing code and they will be discussed in the forthcoming chapters.
Ex. Unnecessary lines of code in loops (i.e. code that could be executed outside of the loop) are moved out of the loop.
x = y+1;
z = x+i; }
x = y+1;
z = x+i;
In our previous example, we have included an opportunity for source level optimization; namely, the expression 4 + 2 can
be recomputed by the compiler to the result 6(This particular optimization is called constant folding).
9
This optimization can be performed directly on the syntax tree as shown below.
However, in a number of cases, it is easier to optimize a linearized form of the tree that is closer to assembly code.
A standard choice is Three-address code, so called because it contains the addresses of up to three locations in memory.
In our example, three address code for the original C expression might look like this:
o t1=2
o t2 = 4 + t 1
o a[index] = t2
Now the optimizer would improve this code in two steps, first computing the result of the addition
o t = 4+2
o a[index] = t
o a[index] = 6
The machine code generator receives the (optimized) intermediate code, and then it produces either:
Code generator
10
The code generator takes the IR code and generates code for the target machine.
*R1-indirect registers addressing (the last instruction stores the value 6 to the address contained in R1)
In this phase, the compiler attempts to improve the target code generated by the code generator.
In the sample target code given, use a shift instruction to replace the multiplication in the second instruction.
Another is to use a more powerful addressing mode, such as indexed addressing to perform the array store.
Grouping of phases
11
Back end - machine-specific and language-independent.
Compiler passes:
For example, the front-end phases of lexical analysis, syntax analysis, semantic analysis, and intermediate code
generation might be grouped together into one pass.
Single pass
o is a compiler that passes through the source code of each compilation unit only once.
o they are unable to generate as efficient programs, due to the limited scope available.
Multi pass
o is a type of compiler that processes the source code or abstract syntax tree of a program several times.
Token
Syntax Tree
Nodes have fields containing information collected by the parser and semantic analyzer
Symbol Table
o Code generation and optimization phases use the information in the symbol table
Performance Issues
o Insertion, deletion, and search operations need to be efficient because they are frequent
12
o More than one symbol table may be used
Literal Table
Various tools are used in the construction of the various parts of a compiler.
Scanner generators
Parser Generators
o These tools produce a parser /syntax analyzer/ if given a Context Free Grammar (CFG) that describes the syntax
of the source language.
o It produces a collection of routines that walk the parse tree and execute some tasks.
o Take a collection of rules that define the translation of the IC to target code and produce a code generator.
13