Chapter 1 - Introduction To Compilers
Chapter 1 - Introduction To Compilers
Chapter # 1
Introduction to
Compiler
Course Book
The primary text for
the course is:
Compilers –
Principles,
Techniques and
Tools by Aho, Sethi
and Ullman
This is also called the
Dragon Book
1
3/22/2020
Compilers
A Compiler is a program (system software) that
reads a program written in one language – the
source language – and translates it into an
equivalent program in another language – the
target language.
During translation process, the compiler reports
to its user the presence of errors in the source
program.
Source Target
Program Compiler Program
Error
Messages
2
3/22/2020
Compilers
Source language can be any high level computer programming
language ranging from traditional programming language such as
Fortran, C, Java etc to specialized language that have been written
for a specific area of computer application such as LISP for AI etc.
Target language may be another programming language (assembly
language) or the machine language of a computer, depending upon
the compiler.
High--level source code
High
Compiler
Compilation Process
It takes the whole program at a time and either
displays all of the possible errors in the program
or creates an object program.
The time at which the conversion of a source
program to an object program occurs is called
compile time.
The object program is executed at run time.
3
3/22/2020
Properties of compilers
It must generate a correct executable code
The input program and the output program must
be equivalent
The compiler should preserve the meaning of
the input program
Output program should run fast
Compiler itself should be fast i.e., low
compilation time
Compiler should provide good diagnostics for
programming errors
Properties of compilers
Compiler should support separate compilation
Compiler should work well with debuggers
Compile time should be maximally proportional
to the code size
4
3/22/2020
Interpreter
Interpreter is a system software that is used for
the translation of high level language programs.
Directly execute the statements of source
program rather than generating object code
It is different from the compilers in a sense that:
It translates a program by taking one instruction at a
time and produces the results before taking the next
instruction.
It can identify only one error at a time.
It does not produces the object program.
Needs retranslation
Makes it slow than compilers by a factor of 10
times
Interpreter
5
3/22/2020
Assembler
Assembler is a translator (software) that
particularly converts a program written in
assembly language into machine language.
Assembly language is called low-level language.
Because there is one to one correspondence
between the assembly language statements
and machine language statements.
Symbolic form of the machine language,
makes it easy to translate
Compiler generates assembly language as its
target language and assembler translate it
into object code
Linker
Separate program often part of operating system
Collects code in object file(s) into a file that is
directly executable
Object code produced by a compiler or
assembler --- machine code that has not yet
been linked
Linker creates executable machine code
Connects an object program to the code of
standard library functions and to resources
supplied by operating system
For example, memory allocation and input and
output devices etc.
6
3/22/2020
Loader
The loader reads the reloadable machine code
Alters its addresses by adding the starting
position of main memory block to them and
loads the code into main memory
7
3/22/2020
8
3/22/2020
Analysis part
Analysis part breaks the source program into
constituent pieces and imposes a grammatical
structure on them which further uses this structure to
create an intermediate representation of the source
program
It is also termed as front end of compiler
Synthesis part
Synthesis part takes the intermediate representation
as input and transforms it to the target program
It is also termed as back end of compiler
9
3/22/2020
Analysis Part
Intermediate Representation
Synthesis Part
Target Program
10
3/22/2020
11
3/22/2020
Lexical Analysis.
It is also called Linear Analysis or Scanner
It reads the stream of characters making up the
source program from left-to-right and grouped
into tokens (the sequence of characters having a
collective meaning), meaningful units
For example, the characters in the assignment
statement:
Lexical Analysis.
Tokens:
1. The identifier position
2. The assignment symbol =
3. The identifier initial
4. The plus sign +
5. The identifier rate
6. The multiplication sign *
7. The number 60
Once a token is generated the corresponding entry is
made in the symbol table
White paces are removed
The blanks separating the characters of these
tokens would normally be eliminated during lexical
analysis
Carriage returns are removed (.i.e., \r)
12
3/22/2020
Syntax Analysis
It is also called Parsing or Hierarchical Analysis
Parser converts the tokens produced by lexical analyzer
into a tree like representation called parse tree
It involves grouping of the tokens of the source program
into grammatical phrases using source language grammar
The parser checks if the expression made by the
tokens is syntactically correct. Similar to performing
grammatical analysis on a sentence in a natural
language
The grammatical phrases of the source program are
represented by a parse tree/syntax tree
Syntax tree is a compressed representation of the parse
tree in which the operators appear as interior nodes and
the operands of the operator are the children of the node
for that operator
Syntax Analysis
The hierarchical structure of a program is expressed by
recursive rules
For example, the rules for the definition of expression are:
1. Any identifier is an expression
2. Any number is an expression
3. If expression1 and expression2 are expression, then so are
1. expression1 * expression2
2. expression1 + expression2
3. ( expression1 )
Thus by rule (1) initial and rate are expressions.
By rule (2) 60 is an expression
By rule (3), we can first infer that rate * 60 is an
expression and finally that initial + rate * 60 is an
expression
13
3/22/2020
Syntax Analysis
Example
14
3/22/2020
Semantic Analysis
The function of the semantic analyzer is to determine the
meaning of the source program.
Concerned with meanings – checks meaningfulness of the
statements in the source program
It checks the source program for semantic errors and gathers
type information
It uses the parse tree/syntax tree produced by the syntax
analysis phase whether the parse tree constructed
follows the rules of language.
For example, assignment of values is between compatible data
types, and adding string to an integer.
The semantic analysis performs type checking
Here the compiler checks that each operator has operands that
are permitted by the source language specification
Semantic Analysis
Type information is gathered and stored in the symbol
table or in syntax tree for the next phase code
generation
However, many language specification permit some
operand coercions.
When a binary arithmetic operator is applied to an integer and
real. The compiler may need to convert an integer to a real.
Also identify semantic errors
For example, many programming language definitions require a
compiler to report an error every time a real number is used to
index an array
The semantic analyzer produces an annotated syntax
tree as an output
15
3/22/2020
Semantic Analysis
16
3/22/2020
17
3/22/2020
Code Optimization
The code optimization phase attempts to improve the
intermediate code, so that faster-running machine code
will result
To produce more efficient object/target program to
execute faster.
To efficiently use memory
To yield better performance
To remove redundant code without changing the
meaning of program
Achieved through code transformation while
preserving semantics
There is a great variation in the amount of code
optimization different compilers perform.
It is optional - Compiler can be either optimizing
compiler or dirty compiler
Code Optimization
Optimizing compiler
The compilers, that do the most called
“optimizing compilers”
A significant fraction of the time of the compiler is
spent on this phase
Dirty compiler
Do not provide code optimization
Code optimization can slow down the
compilation process
Some simple optimization can improve running time of which improve
the running time of target program without slowing compilation process
Therefore code optimization and compilation time are directly
proportional to each other
Therefore code optimization and running time are inversely proportional
to each other
18
3/22/2020
19
3/22/2020
Code Optimization.
Code Generation
The final phase of the compiler is the generation
of the target program, consisting of normally
relocatable machine code or assembly code.
Memory locations are selected for each of the
variable used by the program.
Generate the target code form the intermediate
code
Intermediate instructions are each translated
in to the sequence of machine instructions
that perform the same task
Operations are converted into OP-Codes
Registers are loaded with variables Then,.
20
3/22/2020
Code Generation
21
3/22/2020
22
3/22/2020
23
3/22/2020
24
3/22/2020
Translation of a statement.
Types of Errors
Types of error
Lexical error
Syntax error
Semantic error
Logical error
Fatal error
Spurious error
25
3/22/2020
Types of Errors
Lexical Error
A sequence of characters does not form any
valid token of the language
Example
An identifier name begin with letter followed by any
combination of letters and digits
Misspelling of keywords
Syntax error
The stream of tokens violates the grammar
rules of a language
Types of Error
Example
Statement not ending with semicolon (;)
Expression like S = A + * B
Semantic Error
Statements not meaningful
Example
Adding character with integer
Indexing an array with a floating value
Expression like arrayname[1] + function name
26
3/22/2020
Types of Error
Logical Error
Error in the algorithm of the source program
which would produce undesired output
Compiler cannot detect logical error
Example
Infinite loop
Array index out of bound
Division by zero
Fatal Error
Error occur during run-time of a program
Halt the program
Types of Error
Example
Accessing invalid memory location
Code error not handled by exception handling
Spurious Error
Error made by compiler during the error
recovery
Example
A function fi () could be converted into if ()
27
3/22/2020
source IR machine
code Front Back code
End End
errors
28
3/22/2020
Back End
Target
Language
IBM IBM
29
3/22/2020
Cousins of Compiler
Software/programs that are related with compiler
Not specific parts of compiler but to carry out
some functions/processing as that of
compiler
To help compiler by performing operations
like compiler
Cousins of compiler are
Preprocessor
Assembler
Linker and loader
30
3/22/2020
Cousins of Compiler
Preprocessor
As the name indicates “pre-process” – to do something
before formally starting the compilation
A preprocessor is a program that processes its input
data to produce output that is used as input to another
program
The output is said to be a preprocessed form of the
input data, which is often used by some subsequent
programs like compilers
The preprocessor is executed before the actual
compilation of code begins, therefore the preprocessor
digests all these directives before any code is generated
by the statements
31
3/22/2020
Preprocessor
Preprocessor may perform the following
functions
Macro processing
File inclusion
Rational preprocessing
Conditional Compilation
Language extension
Macro Processing
A macro is a rule or pattern that specifies how a
certain input sequence (often a sequence of
characters) should be mapped to an output
sequence (also often a sequence of characters)
according to a defined procedure
The mapping process that instantiates
(transforms) a macro into a specific output
sequence is known as macro expansion
Macro preprocessor deals with
Macro definition
Macro use
32
3/22/2020
Macro Processing
Macro Definition
To define preprocessor macros we can use
#define construct in C++
It consists of name of macro and body
forming its definition
Its format is: #define identifier replacement
Identifier can be any valid name
This replacement can be an expression, a statement, a block
or simply anything
Example
#define PI 3.14
Macro Processing
Macro Use
When the preprocessor encounters this directive, it
replaces any occurrence of identifier in the rest of the
code by replacement
The preprocessor does not understand C++, it simply
replaces any occurrence of identifier by replacement
#define TABLE_SIZE 100
int table1[TABLE_SIZE];
int table2[TABLE_SIZE];
Afterthe preprocessor has replaced TABLE_SIZE,
the code becomes equivalent to
int table1[100];
int table2[100];
33
3/22/2020
Example
File Inclusion
Preprocessor includes header files into the
program text
When the preprocessor finds an #include
directive it replaces it by the entire content of the
specified file
There are two ways to specify a file to be
included:
#include "file“
#include <file>
The only difference between both expressions is
the places (directories) where the compiler is
going to look for the file
34
3/22/2020
File Inclusion
In the first case where the file name is specified
between double-quotes
The file is searched first in the same directory that
includes the file containing the directive
In case that it is not there, the compiler searches the file in
the default directories where it is configured to look for the
standard header files
If the file name is enclosed between angle-
brackets <>
The file is searched directly where the compiler is configured to
look for the standard header files
Therefore, standard header files are usually included in angle-
brackets, while other specific header files are included using
quotes
Relational Preprocessor
These processors augment older languages with
more modern flow of control and data structuring
facilities.
For example, such a preprocessor might
provide the user with built-in macros for
constructs like while-statements or if-
statements, where none exist in the
programming language itself
If an old language does not support “if” or “while”,
using relational preprocessor we can include it due
to macro (i.e. #)
35
3/22/2020
Conditional Compilation
Instruct preprocessor whether to include certain
chuck of code or not – allows programmer to
compile one part of his program leaving the
remaining program un-compiled
It's similar like a if statement. However, there is
a big difference you need to understand
The if statement is tested during the execution
time to check whether a block of code should
be executed or not whereas, the conditionals
is used to include (or skip) certain chucks of
code in your program before execution
Conditional Compilation
Uses of Conditional
Use different code depending on the machine,
operating system
Compile same source file in two different programs
To exclude certain code from the program but to keep
it as reference for future purpose
How to use conditional?
To use conditional, #ifdef, #if, #defined, #else and
#elseif directives are used
#ifdef Directive
#ifdef MACRO Here, the conditional codes
conditional codes are included in the program
#endif only if MACRO is defined.
36
3/22/2020
Conditional Compilation
#if, #elif and #else Directive
Here, expression is a expression of
#if expression integer type (can be integers, characters,
conditional codes arithmetic expression, macros and so
#endif on). The conditional codes are included
in the program only if the expression is
evaluated to a non-zero value.
Language Extension
These processors attempt to add capabilities to
the language by using built-in macros
For example, the language equal is a database query language
embedded in C
Statements begging with ## are taken by the preprocessor to be
database access statements unrelated to C and are translated
into procedure calls on routines that perform the database
access
The behavior of the compiler with respect to
extensions is declared with the #extension
directive:
#extension extension_name : behavior
#extension all : behavior
extension_name is the name of an extension
37
3/22/2020
Assembler
Typically a modern assembler creates object code by
translating assembly instruction mnemonics into
opcodes, and by resolving symbolic names for memory
locations and other entities
There are two types of assemblers based on how many
passes through the source are needed to produce the
executable program
One-pass assemblers go through the source code
once and assumes that all symbols will be defined
before any instruction that references them
Two-pass assemblers create a table with all
symbols and their values in the first pass, then use
the table in a second pass to generate code
Assembler
The advantage of a one-pass assembler is
speed, which is not as important as it once was
with advances in computer speed and
capabilities
The advantage of the two-pass assembler is
that symbols can be defined anywhere in the
program source
As a result, the program can be defined in a
more logical and meaningful way
This makes two-pass assembler programs
easier to read and maintain
38
3/22/2020
Linker
A linker is a program that takes one or more
objects generated by a compiler and combines
them into a single executable program.
Three tasks:
Searches the program to find library routines
used by program, e.g. printf(), math routines.
Determines the memory locations that code
from each module will occupy and relocates
its instructions by adjusting absolute
references
Resolves references among files Loader
Loader
A loader is the part of an operating system that
is responsible for loading programs, one of the
essential stages in the process of starting a
program
Loading a program involves reading the
contents of executable file - the file containing
the program text - into memory, and then
carrying out other required preparatory tasks to
prepare the executable for running
Once loading is complete, the operating system
starts the program by passing control to the
loaded program code
39
3/22/2020
Loader
Steps for loaders :
Read executable file's header to determine
the size of text and data segments
Create a new address space for the program
Copies instructions and data into address
space
Copies arguments passed to the program on
the stack
Jumps to a startup routine that copies the
program's arguments from the stack to
registers and calls the program's main routine
The End.
40