Course Name:
COMPILER DESIGN
Course code:
SCS1303
1/3/23 SCS1303 Compiler Design 1
Course Objective:
• To explore the principles, algorithms, and data structures
involved in the design and construction of compilers.
• To introduce the theory and tools that can be employed in
order to perform syntax-directed translation of a high-level
programming language into an executable code.
Prerequisite Courses:
• Theory of Computation
• Data Structures
• Programming Languages
Syllabus:Compiler_Design.doc
1/3/23 SCS1303 Compiler Design 2
1/3/23 SCS1303 Compiler Design 3
UNIT 1 LEXICAL ANALYSIS
• Structure of compiler
• Functions and Roles of lexical phase
• Input buffering
• Representation of tokens using regular expression
• Properties of regular expression
• Finite Automata
• Regular Expression to Finite Automata
• NFA to Minimized DFA.
1/3/23 SCS1303 Compiler Design 4
Outline
• Software
• Languages
• Language Processors
• Language Processing System
• Need for Compiler
• Structure of Compiler
1/3/23 SCS1303 Compiler Design 5
Software
• Software is a set of computer programs which are
designed and developed to perform specific task desired
by the user or the computer itself.
Software
System Application
Software Software
System System System General Special
Control Support Development Purpose Purpose
Programs Programs Programs
1/3/23 SCS1303 Compiler Design 6
System S/W & Application S/W
1/3/23 SCS1303 Compiler Design 7
• System Control Programs:
They control the execution of programs, manage the storage
and processing resources of the computer and perform other
management and monitoring functions. e.g.. OS
• System Support Programs:
They provide routine service functions to other computer
programs and computer users. e.g.. Utility Programs
• System Development Programs:
They assist in the creation of publication [Link].
Language translators like interpreters, compilers and
assemblers.
1/3/23 SCS1303 Compiler Design 8
Languages
• Low Level language is machine friendly. Commands or
functions in the language map closely to processor
instructions.
• Assembly language is machine dependent ,the mnemonics
that are being used to represent instructions are not directly
understandable by machine.
• High Level language is machine independent.
• A computer understands instructions in machine code, i.e. in
the form of 0s and 1s. It is a tedious task to write a computer
program directly in machine code.
• The programs are written mostly in high level languages like
Java, C++, Python etc. and are called source code.
1/3/23 SCS1303 Compiler Design 9
Language Processor
• Source code cannot be executed directly by the computer and
must be converted into machine language to be executed.
• Hence, special translator system software is used to translate
the program written in high-level language into machine
code(object program / object code) is called Language
Processor .
There are three types of translator programs:
– Assembler
– Interpreter
– Compiler
1/3/23 SCS1303 Compiler Design 10
Assembler
• Translate the program written in Assembly language into
machine code.
1/3/23 SCS1303 Compiler Design 11
Compiler
• Reads the source program written in High Level
Language(HLL) as a whole in one go and translates it into an
equivalent machine language program.
Example HLL: C, C++, C#, Java…
• The compiler specifies the errors at the end of compilation
with line numbers.
• The errors must be removed before the compiler can
successfully recompile the source code again.
1/3/23 SCS1303 Compiler Design 12
Interpreter
• The translation of single statement of source program into machine
code and executes it immediately before moving on to the next line is
called an interpreter.
• If there is an error in the statement, the interpreter terminates its
translating process at that statement and displays an error message.
• An Interpreter directly executes instructions written in a
programming or scripting language without previously converting
them to an object code or machine code. Example: Perl, Python and
Matlab.
• An interpreter generally uses one of the following strategies for
program execution:
– Parse the source code and perform its behavior directly;
– Translate source code into some efficient intermediate
representation and immediately execute this;
– Explicitly execute stored precompiled code made by a compiler which is
part of the interpreter system.
1/3/23 SCS1303 Compiler Design 13
What is the difference?
Once a program is compiled, its source code is not useful for running the
code.
For interpreted programs, the source code is needed to run the program
every time.
1/3/23 SCS1303 Compiler Design 14
Compiler vs Interpreter
COMPILER INTERPRETER
A compiler is a program which coverts the
Interpreter takes a source program and
entire source code of a programming runs it line by line, translating each line as
language into executable machine code for
it comes to it.
a CPU.
Compiler takes large amount of time to Interpreter takes less amount of time to
analyze the entire source code but the
analyze the source code but the overall
overall execution time of the program is
execution time of the program is slower.
comparatively faster.
Compiler generates the error message Its Debugging is easier as it continues
only after scanning the whole program, so
translating the program until the error is
debugging is hard as the error can be
present any where in the program. met.
Generates intermediate object code. No intermediate object code is generated.
1/3/23 SCS1303 Compiler Design 15
• Languages that use compilers; Pascal, C ++, Ada, Visual Basic, C
and many more.
• Languages that use interpreters; Python,HTML, XML, PHP, Script
Languages.
1/3/23 SCS1303 Compiler Design 16
Language Processing System
1/3/23 SCS1303 Compiler Design 17
Language Processing System
Preprocessor:
– It replaces macros with their definitions, discovers
dependencies and resolves preprocessor directives.
Linker:
– Link and merge various object files to create an executable
file.
– Search for called modules in a program and to find out the
memory location where all modules are stored.
Loader:
– Performs the tasks of loading executable files into memory
and run them.
– Calculates the size of a program which creates additional
memory space.
1/3/23 SCS1303 Compiler Design 18
Need for Compiler
• A Computer understands only binary language and executes
instructions coded in binary language.
• It is difficult write every program as a sequence of 0s and 1s?
• Humans are good at giving instructions in English language,
whereas computers can only process binary language.
• So, there was a need of a translator that translates the
computer instructions given in English language to binary
language.
1/3/23 SCS1303 Compiler Design 19
Applications
• The techniques used in compiler design can be applicable to many
problems in computer science.
– Techniques used in a lexical analyzer can be used in text editors,
information retrieval system and pattern recognition programs.
– Techniques used in a parser can be used in a query processing
system such as SQL.
– Many software having a complex front-end may need
techniques used in compiler design.
– Techniques used in compiler design can be used in Natural
Language Processing(NLP) systems.
1/3/23 SCS1303 Compiler Design 20
Structure of Compiler
1/3/23 SCS1303 Compiler Design 21
Analysis Part:
– Breaks up the source code into pieces and imposes a grammatical
structure.
– Uses this structure to create an intermediate representation of the
source code.
– Checks for syntax/semantics and provides messages to user in
case of errors.
– Builds a symbol table to collect and store information about source
code.
– It is also termed as front end of compiler.
.
1/3/23 SCS1303 Compiler Design 22
The analysis of a source program is divided into
mainly three phases. They are:
Linear Analysis
Hierarchical Analysis
Semantic Analysis
1/3/23 SCS1303 Compiler Design 23
Synthesis part:
– Synthesis part takes the intermediate representation as
input and transforms it to the target program.
– It is also termed as back end of compiler.
1/3/23 SCS1303 Compiler Design 24
Lexical Analysis or Scanner
• Reads characters in the source code and groups them into
meaningful word called Token.
• Tokens fall into these categories: Keywords, Operators,
Identifiers,Constants,Literals strings and Punctuation.
• Pattern: Rule for a set of string in the input for which a token is
produced as output.
• A Lexeme is a sequence of characters in the source code that is
matched by the Pattern for a Token.
Input: stream of characters
Output: Token
1/3/23 SCS1303 Compiler Design 25
Rules or regular expression
Regular expression for identifier:
Letter (Letter | Digit)* Kleene’s Closure
alternate operator
concatenation operator
Transition diagram for identifiers
where ,
letter –alphabet set {a-z A-Z}
digit –number set {0-9}
1/3/23 SCS1303 Compiler Design 26
APPLICATIONS
Transition diagram for Bank
Transition diagram for Traffic signal
1/3/23 SCS1303 Compiler Design 27
Syntax Analysis or Parser
• Parser groups the tokens produced by lexical analyzer into
grammatical phrases or syntactic variables.
• Generates a tree like representation called parse tree .
• A parse tree describes the syntactic structure of the input.
Parse tree for position :=initial + rate * 60
1/3/23 SCS1303 Compiler Design 28
Applications
• Grammar Checking
• Indexing for information retrieval
• Information extraction
• Machine translation
• Speech synthesis from parses
• Speech recognition using parsing
1/3/23 SCS1303 Compiler Design 29
Semantic Analysis
• Checks for semantic errors .
• Concentrates on Type checking -whether operands are type
compatible.
• e.g. real number used to index an array.
Semantic analysis inserts a conversion of integer to real
1/3/23 SCS1303 Compiler Design 30
Intermediate Code Generation
• Produces intermediate representations for the source program
which are of the following forms:
– Postfix notation
– Three address code
– Syntax tree
• Commonly used form is the three address code.
where t1 , t2 , t3 are compiler generated temporary variables.
Properties of intermediate code
• It should be easy to produce and easy to translate into target
program.
1/3/23 SCS1303 Compiler Design 31
Postfix
1/3/23 SCS1303 Compiler Design 32
Code Optimizer
• Improves and produces optimized intermediate code as
output.
e.g.
t1 = id3* 60.0
id1 = id2 + t1
• Helps in generating fast running machine code.
• Techniques used:
– Common sub expression elimination
– Dead code elimination
– Constant folding
– Copy propagation
– Code motion
– Reduction in strength
– Induction variable elimination......etc.
1/3/23 SCS1303 Compiler Design 33
e.g. Constant folding e.g. Code motion
Before:
int f (void)
{
return 3 + 5;
}
After:
int f (void)
{
return 8;
}
e.g. Copy Propagation
Before:
y=x
z=3+y
After:
z=3+x
1/3/23 SCS1303 Compiler Design 34
Code Generation
• Intermediate instructions are translated into a sequence of
machine instructions that perform the same task.
• Using registers R1 and R2 the translation of the e.g. is:
1/3/23 SCS1303 Compiler Design 35
Symbol Table Management
• Stores all the information about identifiers used in the
program.
• Data structure containing a record for each identifier, with
fields for the attributes of the identifier.
• Whenever an identifier is detected in any of the phases, it is
stored in the symbol table.
1/3/23 SCS1303 Compiler Design 36
Error Handling
• Each phase can encounter errors. After detecting an error, a
phase must handle the error so that compilation can proceed.
Few errors:
– In lexical analysis: misspelled name of some identifier.
– In syntax analysis: missing semicolon or unbalanced
parenthesis.
– In Semantic analysis: Type mismatch.
– In code optimization, errors occur when the result is
affected by the optimization.
– In code generation, it shows error when code is missing
etc.
1/3/23 SCS1303 Compiler Design 37
Translation of a statement
1/3/23 SCS1303 Compiler Design 38
References
1. Alfred V. Aho, Jeffery D. Ullman & Ravi Sethi, ” Compiler
Principles, Techniques & Tools”, Addison- Wesley Publishing
Company,1986.
2. Alfred V. Aho, Jeffery D. Ullman, “Principles of Compiler
Design”, Narosa Publishing House, 15th reprint, 1996.
3. D M. Dhamdhere , “System Programming”, 2nd Edition, Tata
McGraw Hill Publishing, 1999.
1/3/23 SCS1303 Compiler Design 39
THANK YOU
1/3/23 SCS1303 Compiler Design 40