BCS 324: COMPILER DESIGN &
CONSTRUCTION
1. Introduction to Compiling
Dr. D. K. Muyobo
Introduction
Definitions:
◦ A program: is a set of instructions written in a
programming language to perform a specific task.
◦ Programs can be written in high-level languages
(e.g., Python, Java) or low-level languages (e.g.,
Assembly, Machine Code).
◦ A translator: is any program that converts code
from one language to another.
◦ They include:
1. Assemblers
2. compilers, and
3. interpreters.
Definition:
◦ A compiler is a program that reads a program
written in one language (the source language) and
translates into an equivalent program in another
language (the target language).
Why study compilers?
◦ Compiler writing is useful to programming
languages, machine architecture, language theory,
algorithms and software engineering
Understanding compilers helps in:
Writing efficient programs.
Learning about programming language design.
Optimizing performance.
Developing new programming languages.
Issues in Compiler Design
Optimization: How to make code run faster and
consume fewer resources.
Error Handling: Detecting and reporting
syntax/semantic errors.
Portability: Making the compiler work on different
architectures.
Speed: Efficient translation of code without long
compilation times.
In the process of compiling
◦ The compiler reports errors in the source
program
Source Target
Compiler
Program Program
Error messages
Fig.1: An oversimplified view of a compiler
5
In contrast, an interpreter is a program that
◦ Performs the operations implied by the source
program
◦ This course will focus on compilers.
Source
Interpreter Output
Program
Error messages
Fig.2: An oversimplified view of an interpreter
6
The analysis-synthesis model of compilation
There are two parts to compilation:
◦ Part 1:Analysis
The analysis part is also known as the front end of a
compiler
It breaks up source program into pieces and
imposes a grammatical structure on them
Creates intermediate representation of source
program
Determines the operations and records them in a
tree structure, syntax tree
7
◦ Part2: Synthesis
Synthesis part is also known as back end of a compiler
It constructs a target program from intermediate
representation
Takes the tree structure and translates the operations
into the target program
Intermediate
Language
Source Target
Language Front End Back End Language
Fig.3:The analysis-synthesis model
8
During analysis
◦ Source program statements are recorded in a
syntax tree
◦ Each node of the tree represents an operation
◦ The children of the node represents the
arguments of the operation e.g.
Stmt: Totalmarks = test1 + test2 * 3
=
Totalmarks +
test1 *
test2 3
Fig.4: Syntax tree for Totalmarks = test1 + test2 * 3
9
Other tools that use the analysis-synthesis
model
Editors
◦ Can perform text highlighting like an ordinary text
editor
◦ Can analyze text into hierarchical structures
Pretty printers
◦ Makes structure of program visible e.g. may indent
nested statements
Static checkers
◦ Analyzes a program and attempts to find errors in it
without running it.
10
Interpreters
◦ Translates one statement at a time instead of
producing a complete target program
Text formatters
◦ Takes a stream of characters as input
◦ Recognizes paragraphs, figures, tables etc.
Silicon compilers
◦ Used in circuit design
◦ Its variables are logical signals of “0” and “1” and not
memory locations
Query interpreters
◦ Used in query translation in databases.
11
A language-processing system
Skeletal source program
Preprocessor
Source program
Compiler
Target assembly program
Assembler
Relocatable machine code
Linker Libraries and
relocatable object files
Absolute machine code
Fig.5: A language processing system
12
Analysis
Analysis has three phases:
1. Lexical analysis (linear analysis/scanning):
Reads a stream of characters from left-to-right and
groups them into tokens
Eliminates white space
2. Syntax analysis (hierarchical analysis/parsing):
Groups the tokens hierarchically
3. Semantic analysis:
Checks if the program components fit together
meaningfully e.g. if float and integer are not being
computed together.
13
1. Lexical analysis
The stream of characters in source program is read
from left to right and grouped into tokens (sequences
of characters having a collective meaning)
Tokens: smallest units like keywords, operators,
identifiers [KEYWORD int] [IDENTIFIER x] [ASSIGN =] [NUMBER 5]
[SEMICOLON ;]
Example: Totalmarks = test1 + test2 * 3 is broken into
seven tokens as follows:
1. The identifier Totalmarks
2. The assignment =
3. The identifier test1
4. The plus sign
5. The identifier test2
6. The multiplication sign
7. The number 3 14
2. Syntax analysis (parsing)
The tokens are grouped into grammatical phrases
needed by the compiler to synthesize the output
The phrases are represented by a parse tree e.g.
asignment
stmt
id = expr
Totalmarks expr + expr
id expr * expr
test1 id number
test2 3
Fig.6: Parse tree for Totalmarks = test1 + test2 * 3
15
Syntax analysis is usually expressed by
recursive rules e.g. rules for definition of
expression:
1. Any identifier is an expression
2. Any number is an expression
3. If expression1 and expression2 are expressions, then so
are
expression1 + expression2
expression1 * expression2
(expression)
16
This means that
◦ Rules (1) and (2) are non-recursive
◦ Rule (3) defines expressions in terms of
operators applied to other expressions.
Thus
◦ By rule (1) – test1 and test2 are expressions
◦ By rule (2) – 3 is an expression
◦ By rule (3) – ‘test2 * 3’ is first inferred as an
expression then ‘test1 + test2 * 3’ is an
expression.
17
NB:
◦ The difference between lexical and syntax analysis
is that lexical doesn’t need recursion
◦ A syntax tree is a compressed form of a parse
tree. Thus, in a syntax tree, operators are interior
nodes.
Task:
◦ Compare and contrast Figs. 4 and 6 above
18
3. Semantic analysis
Checks source program for semantic errors
Gathers type information for subsequent code
generation (also called type checking)
Identifies operators and operands of
expressions and statements
Thus a semantic analyzer reports errors
◦ when two different types are involved in an operation
e.g. in Fig.6, the number 3 raises an error, and is
therefore converted to float i.e. to 3.0.
19
=
Totalmarks +
test1 *
test2 inttofloat
Fig.7: Semantic analysis converts int to float
20
Phases of a compiler
Fig.8: Phases of a compiler
21
Symbol-Table Management
Symbol table – data structure with a
record for each identifier and its
attributes
Attributes include storage allocation, type,
scope, etc.
All the compiler phases insert and modify
the symbol table e.g.
◦ Type information is detected late at semantic
phase, hence it is added to the symbol table
immediately after being seen.
22
Error detection and reporting
Each phase can detect an error
After detecting it, the phase must get a way of dealing
with the error
Syntax and semantic phases handle a large portion of
the errors
Lexical
◦ Detects errors where a character does not form a token of the
language e.g. := is an assignment statement in Pascal language
but meaningless in Java
Syntax
◦ Detects errors where structure (syntax) rules are violated
Semantic
◦ Detects constructs that may have the right syntax but no
meaning
23
The running example is given using real
characters and variables
However, internal representation is usually
done using id1,id2 and id3 instead of
actual token names
Therefore, Totalmarks = test1 + test2 * 3
is represented internally as
◦ id1 = id2 + id3 * 3.
24
Intermediate code generation
Comes after semantic analysis
It is like a program for an abstract
machine with two properties:
◦ Should be easy to produce
◦ Easy to translate into target program
It is a sequence of instructions, each with
at most three operands.
25
The intermediate code for our example is
written as follows:
temp1 = inttofloat(3)
temp2 = id3 * temp1
temp3 = id2 + temp2
id1 = temp3
26
Code generation
Code generation phase is used to
◦ Generate target code, which is either
assembly code or machine code
◦ The following assembly code translates each
statement in the intermediate code program
using registers as memory locations
MOVF id3, R2
MULF #3.0, R2
MOVF id2, R1
ADDF R2, R1
MOVF R1, id1
27
Code optimization
The code optimization phase
◦ Attempts to condense the number of
statements so as to increase speed in
executing the code i.e. Improve intermediate
code by producing code that runs faster e.g.
temp1 = id3 * 3.0
id1 = id2 * id2 + temp1
NB: The inttofloat statement is eliminated, and
temp3 is used only once.
28