
Compiler Design

CSE 353

Unit 1 syllabus

Course Outcomes
CO1: Explain the concepts and different phases of compilation, with compile-time error handling.
CO2: Represent language tokens using regular expressions, context-free grammar and finite automata, and design a lexical analyzer for a language.
CO3: Compare top-down with bottom-up parsers, and develop an appropriate parser to produce a parse-tree representation of the input.
CO4: Design syntax-directed translation schemes for a given context-free grammar.
CO5: Generate intermediate code for statements in a high-level language; benefits and limitations of automatic memory management.
CO6: Apply optimization techniques to intermediate code and generate machine code for a high-level-language program.
Introduction to Language Processing System

[Diagram: HLL source → Pre-processor → Pure HLL → Compiler → Assembly language → Assembler → Relocatable machine code → Loader/Linker → running program]
• Pre-Processor – The pre-processor handles all the #include directives by including the named files (file inclusion) and all the #define directives using macro expansion. It performs file inclusion, augmentation, macro processing, etc.

• Assembly Language – It is neither binary nor high-level. It is an intermediate form that combines machine instructions with other useful data needed for execution.

• Assembler – Every platform (hardware + OS) has its own assembler; assemblers are not universal. An assembler translates assembly language into machine code, and its output is called an object file.

• Interpreter – An interpreter converts a high-level language into low-level machine language, just like a compiler, but the two differ in how they read the input. A compiler scans the entire program and translates it as a whole into machine code, whereas an interpreter translates and executes the program one statement at a time. Interpreted programs are therefore usually slower than compiled ones.
• Relocatable Machine Code – Code that can be loaded at any point in memory and run. The addresses within the program are arranged so that the code tolerates being moved.

• Loader/Linker – The linker combines a number of object files into a single executable file. The loader then converts the relocatable code into absolute code, loads it into memory, and runs it, resulting in a running program or an error message (or sometimes both).
Compiler
A compiler is a translator program that takes a program written in a high-level language (HLL), the source program, and translates it into an equivalent program in machine-level language (MLL), the target program. An important part of a compiler's job is reporting errors to the programmer.

Structure of Compiler
Executing a program written in an HLL basically consists of two parts: the source program must first be compiled (translated) into an object program; the resulting object program is then loaded into memory and executed.

Execution process of source program in Compiler


LIST OF COMPILERS
1. Ada compilers
2. ALGOL compilers
3. BASIC compilers
4. C# compilers
5. C compilers
6. C++ compilers
7. COBOL compilers
8. Common Lisp compilers
9. ECMAScript interpreters
10. Fortran compilers
11. Java compilers
12. Pascal compilers
13. PL/I compilers
14. Python compilers
15. Smalltalk compilers
STRUCTURE OF THE COMPILER DESIGN
Phases of a compiler: A compiler operates in phases. A phase is a logically interrelated operation that takes the source program in one representation and produces output in another representation.

There are two parts of compilation:

a. Analysis (machine independent / language dependent)
b. Synthesis (machine dependent / language independent)

The compilation process is partitioned into a number of sub-processes called 'phases'.


Structure of Compiler

[Diagram: source program → Lexical Analyzer → stream of tokens → Syntax Analyzer → parse tree → Semantic Analyzer → refined parse tree → Intermediate Code Generator → three-address code → Code Optimizer → optimized code → Code Generator → target code]
Lexical Analyzer –

• It is the first phase of the compiler.
• It gets input from the source program and produces tokens as output.
• It reads the characters one by one, from left to right, and forms tokens.
• Token: a logically cohesive sequence of characters. In a + b = 20, the items a, b, +, =, 20 are all separate tokens. The group of characters forming a token is called the lexeme.
• The lexical analyzer not only generates tokens but also classifies them into categories such as keywords, operators, identifiers, special symbols, etc. Example: it enters the lexeme into the symbol table if it is not already there.

Syntax Analyzer –

• It is the second phase of the compiler. It is also known as the parser.
• It gets the token stream as input from the lexical analyzer and generates a syntax tree as output.
• Syntax tree: a tree in which the interior nodes are operators and the leaf nodes are operands.
• Example: for a = b + c * 2, the syntax tree is sketched below.
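A text sketch of this syntax tree (with * binding tighter than +):

         =
        / \
       a   +
          / \
         b   *
            / \
           c   2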
Semantic Analyzer –
• It is the third phase of the compiler.
• It gets the parse tree from syntax analysis as input and checks whether the tree is semantically consistent, i.e., whether a syntactically correct construct actually has meaning.
• It performs type checking and inserts type conversions where needed, e.g., converting an integer operand to a real in a mixed-type expression.

Intermediate Code Generator –

• It is the fourth phase of the compiler.
• It gets input from semantic analysis and converts it into intermediate code such as three-address code.
• The three-address code consists of a sequence of instructions, each of which has at most three operands.
Example, for a = b + c * 2:
t1 = c * 2
t2 = b + t1
a = t2
CODE OPTIMIZATION:
• It is the fifth phase of the compiler.
• It gets the intermediate code as input and produces optimized intermediate code as output.
• This phase reduces redundant code and attempts to improve the intermediate code so that faster-running machine code will result.
• Code optimization does not affect the result of the program.
• To improve the generated code, optimization involves:
- detection and removal of dead code (unreachable code)
- calculation of constants in expressions and terms
- collapsing of repeated (common) subexpressions into a temporary
- loop unrolling
- moving code outside the loop
- removal of unwanted temporary variables
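As a hedged illustration (not from the slides), constant calculation plus dead-code removal might transform the code on the left into the code on the right:

t1 = 2 * 8           t1 = 16
t2 = b + t1          t2 = b + t1
a  = t2              a  = t2
t3 = a + 0           (t3 removed: its value is never used)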

CODE GENERATION:
• It is the final phase of the compiler.
• It gets input from the code optimization phase and produces the target code or object code as result.
• Intermediate instructions are translated into a sequence of machine instructions that perform the same task.
• Code generation involves:
- allocation of registers and memory
- generation of correct references
- generation of correct data types
- generation of missing code
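As a hedged sketch, assuming a simple two-address register machine (the mnemonics are illustrative, not any particular instruction set), the intermediate code for a = b + c * 2 might become:

LOAD  R1, c     ; R1 <- c
MUL   R1, #2    ; R1 <- R1 * 2   (t1)
ADD   R1, b     ; R1 <- R1 + b   (t2)
STORE R1, a     ; a  <- R1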
SYMBOL TABLE MANAGEMENT:
• The symbol table is used to store all the information about identifiers used in the program.
• It is a data structure containing a record for each identifier, with fields for the attributes of the identifier.
• It allows the compiler to find the record for each identifier quickly and to store or retrieve data from that record.
• Whenever an identifier is detected in any of the phases, it is stored in the symbol table.
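A minimal C sketch of such a record with a lookup and an insert (the field set and the linear search are illustrative; real compilers usually use a hash table for speed):

#include <string.h>

struct symbol {
    char name[32];    /* identifier lexeme            */
    char type[16];    /* attribute: data type         */
    int  scope;       /* attribute: scope level       */
    int  offset;      /* attribute: storage location  */
};

static struct symbol table[1000];
static int n_symbols = 0;

/* Find the record for an identifier, or return NULL if absent. */
struct symbol *lookup(const char *name) {
    for (int i = 0; i < n_symbols; i++)
        if (strcmp(table[i].name, name) == 0)
            return &table[i];
    return NULL;
}

/* Enter the lexeme into the table if it is not already there. */
struct symbol *insert(const char *name) {
    struct symbol *s = lookup(name);
    if (s == NULL) {
        s = &table[n_symbols++];
        strncpy(s->name, name, sizeof s->name - 1);
    }
    return s;
}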

ERROR HANDLING:
• Each phase can encounter errors. After detecting an error, a phase must handle it so that compilation can proceed.
• In lexical analysis, errors occur in the separation of tokens.
• In syntax analysis, errors occur during construction of the syntax tree.
• In semantic analysis, errors occur when the compiler detects constructs that have the right syntactic structure but no meaning, e.g., during type checking.
• In code optimization, errors occur when the result is affected by the optimization. In code generation, errors occur when code is missing, etc.
Lexical Analyzer
Lexical analysis is the first phase of the compiler, also known as the scanner. It converts the high-level input program into a sequence of tokens.
• Lexical analysis can be implemented with a deterministic finite automaton.
• Its task is to read the input characters and produce as output a sequence of tokens that the parser uses for syntax analysis.
• Upon receiving a "get next token" command from the parser, the lexical analyzer reads input characters until it can identify the next token.
TOKENS
A token is a string of characters, categorized according to the rules as a symbol (e.g., IDENTIFIER, NUMBER, COMMA).
The process of forming tokens from an input stream of characters is called tokenization.
Consider this expression in the C programming language: sum = 3 + 2;

Lexeme   Token Type
sum      Identifier
=        Assignment operator
3        Number
+        Addition operator
2        Number
;        End of statement

LEXEME: A collection or group of characters forming a token is called a lexeme.

PATTERN: A pattern is a description of the form that the lexemes of a token may take. In the case of a keyword as a token, the pattern is just the sequence of characters that form the keyword. For identifiers and some other tokens, the pattern is a more complex structure that is matched by many strings.
How Lexical Analyzer functions

1. Tokenization, i.e., dividing the program into valid tokens.


2. Remove white space characters.
3. Remove comments.

The function of lexical analysis is to form the tokens and separate them out from the program or statement.

How do we separate them from the program?

We design regular expressions to represent the tokens, and design an NFA/DFA to act as the token recognizer.

Transition diagram for an identifier:

state 0 (start) --letter--> state 1
state 1 --letter/digit--> state 1 (loop)
state 1 --delimiter--> state 2 (accept)
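A minimal C sketch of this recognizer, with isalpha/isalnum standing in for "letter" and "letter/digit", and end-of-string playing the role of the delimiter:

#include <ctype.h>

/* Accepts strings of the form: letter (letter | digit)*  */
int is_identifier(const char *s) {
    int state = 0;                              /* state 0: start          */
    for (; *s; s++) {
        if (state == 0 && isalpha((unsigned char)*s))
            state = 1;                          /* state 1: in identifier  */
        else if (state == 1 && isalnum((unsigned char)*s))
            state = 1;                          /* loop on letter/digit    */
        else
            return 0;                           /* no transition: reject   */
    }
    return state == 1;                          /* delimiter -> state 2    */
}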


Consider the following program and find the number of tokens:

int main()
{
// 2 variables
int a, b;
a = 10;
return 0;
}
Solution

'int' 'main' '(' ')' '{' 'int' 'a'

',' 'b' ';' 'a' '=' '10'

';' 'return' '0' ';' '}'

Total: 18 tokens (the comment and white space are discarded).

Question 2:
int max(x, y)
int x, y;
/* find max of x and y */
{
return(x > y ? x : y);
}
Solution:
int max ( x , y )

int x , y ; { return

( x > y ? x :

y ) ; }

Total: 25 tokens.

Question 3:

Printf("%d, hi", &x);
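Solution (note that a string literal forms a single token):

Printf ( "%d, hi" , & x ) ;

Total: 8 tokens.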


Passes

A pass refers to the number of times the compiler goes through the source
code.

Two types of passes in a compiler:

A single-pass compiler goes through the program only once. In other words, the single-pass compiler allows the source code to pass through each compilation unit only once, immediately translating each code section into its final machine code.

A multi-pass compiler goes through the source code several times. In other words, it allows the source code to pass through each compilation unit several times. Each pass takes the result of the previous pass as input and creates intermediate outputs; therefore, the code improves in each pass. The final code is generated after the final pass.

The main difference between phases and passes of a compiler is that phases are the steps in the compilation process, while passes are the number of times the compiler traverses the source code.
Bootstrapping and Cross Compiler
Bootstrapping is a process in which a simple language is used to translate a more complicated program, which in turn may handle a far more complicated program. This complicated program can further handle an even more complicated program, and so on.

A cross compiler is a compiler capable of creating executable code for a platform other than the one on which the compiler is running. For example, a compiler that runs on a Windows 7 PC but generates code that runs on an Android smartphone is a cross compiler.
Continued..

▪ Suppose we want to write a cross compiler for a new language X.
▪ The implementation language of this compiler is, say, Y, and the target code being generated is in language Z. That is, we create XYZ.
▪ Now if the existing compiler for Y runs on machine M and generates code for M, then it is denoted as YMM.
▪ Now if we run XYZ using YMM, then we get a compiler XMZ.
▪ That means a compiler for source language X that runs on machine M and generates target code in language Z.
Difference between Native Compiler and Cross Compiler

Native Compiler:
- Translates a program for the same hardware/platform/machine on which it is running.
- Used to build programs for the same system/machine and OS on which it is installed.
- Dependent on the system/machine and OS.
- Can generate an executable file, e.g., .exe.
- Turbo C or GCC is a native compiler.

Cross Compiler:
- Translates a program for a different hardware/platform/machine than the one on which it is running.
- Used to build programs for other systems/machines, such as AVR/ARM.
- Independent of the system/machine and OS.
- Can generate raw code, e.g., .hex.
- Keil is a cross compiler.


Deterministic Finite Automata
• DFA stands for deterministic finite automaton. Deterministic refers to the uniqueness of the computation.
• A finite automaton is called deterministic if the machine reads the input string one symbol at a time and has exactly one move defined for each state and input symbol.
• In a DFA, there is only one path for a specific input from the current state to the next state.
• A DFA does not accept the null move, i.e., it cannot change state without an input character.
• A DFA can contain multiple final states. DFAs are used in lexical analysis in a compiler.
• A DFA is a collection of 5 tuples:
• Q: finite set of states
• ∑: finite set of input symbols
• q0: initial state
• F: set of final states
• δ: transition function

The transition function can be defined as:

• δ: Q × ∑ → Q

• Design an FA with ∑ = {0, 1} that accepts those strings which start with 1 and end with 0, e.g.:
• 10100 (accepted)
• 111 (rejected)
• 00110 (rejected)
• Design an FA with ∑ = {0, 1} that accepts the set of all strings with three consecutive 0's, e.g.:
• 100001 (accepted)
• 10001 (accepted)
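A hedged C sketch of the first automaton (state names are illustrative: S = start; A = started with 1, last symbol 1; B = started with 1, last symbol 0, accepting; D = dead state after a leading 0):

#include <stdio.h>

int accepts(const char *s) {
    enum { S, A, B, D } state = S;
    for (; *s; s++) {
        switch (state) {
        case S: state = (*s == '1') ? A : D; break;  /* must start with 1 */
        case A: state = (*s == '0') ? B : A; break;
        case B: state = (*s == '0') ? B : A; break;
        case D: break;                               /* trap state        */
        }
    }
    return state == B;                               /* must end with 0   */
}

int main(void) {
    printf("%d %d %d\n", accepts("10100"), accepts("111"), accepts("00110"));
    /* prints 1 0 0: only 10100 is accepted */
    return 0;
}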
Deterministic and Nondeterministic
Automata
• Deterministic Finite Automata (DFA)
– One transition per input per state
– No ε-moves
• Nondeterministic Finite Automata (NFA)
– Can have multiple transitions for one input in a
given state
– Can have ε-moves
• Finite automata have finite memory
– Need only to encode the current state

Execution of Finite Automata
• A DFA can take only one path through the
state graph
– Completely determined by input

• NFAs can choose


– Whether to make ε-moves
– Which of multiple transitions for a single input to
take

Acceptance of NFAs
• An NFA can get into multiple states
[NFA diagram omitted: a small NFA over {0, 1}]

• Input: 1 0 1

• Rule: an NFA accepts if it can get into a final state

NFA vs. DFA (1)
• NFAs and DFAs recognize the same set of
languages (regular languages)

• DFAs are easier to implement


– There are no choices to consider

NFA vs. DFA (2)
• For a given language the NFA can be simpler than
the DFA
[Diagrams omitted: an NFA for the language, and the corresponding DFA over {0, 1}, which needs more states]

• DFA can be exponentially larger than NFA

Regular Expressions to Finite Automata
• High-level sketch
Lexical specification → regular expressions → NFA → DFA

Regular Expressions to NFA (1)
• For each kind of rexp, define an NFA
– Notation: NFA for rexp A

• For ε: a start state with a single ε-transition to an accepting state.

• For input a: a start state with a single a-transition to an accepting state.

Regular Expressions to NFA (2)
• For AB: the accepting state of the NFA for A gets an ε-transition into the start state of the NFA for B.

• For A | B: a new start state has ε-transitions into the NFAs for both A and B, and the accepting states of both have ε-transitions into a new common accepting state.

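A minimal C sketch of these two rules, assuming each NFA state carries one symbol (or ε) and at most two outgoing edges — a common representation for this construction; all names here are illustrative:

#include <stdlib.h>

enum { EPS = -1 };                 /* marks an ε / no-symbol state */

struct state {
    int c;                         /* input symbol, or EPS         */
    struct state *out, *out1;      /* up to two outgoing edges     */
};

struct frag { struct state *start, *accept; };

static struct state *new_state(int c, struct state *out, struct state *out1) {
    struct state *s = malloc(sizeof *s);
    s->c = c; s->out = out; s->out1 = out1;
    return s;
}

/* NFA for a single input symbol a:  start --a--> accept */
struct frag frag_sym(int a) {
    struct state *acc = new_state(EPS, NULL, NULL);
    return (struct frag){ new_state(a, acc, NULL), acc };
}

/* AB: A's accepting state gets an ε-edge into B's start. */
struct frag frag_concat(struct frag a, struct frag b) {
    a.accept->out = b.start;
    return (struct frag){ a.start, b.accept };
}

/* A | B: new ε start into both; both accepts ε into a new accept. */
struct frag frag_alt(struct frag a, struct frag b) {
    struct state *acc = new_state(EPS, NULL, NULL);
    a.accept->out = acc;
    b.accept->out = acc;
    return (struct frag){ new_state(EPS, a.start, b.start), acc };
}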
Example of RegExp -> NFA conversion
• Consider the regular expression
(1 | 0)*1
• The NFA (states A–J, written as a transition list):
A ε→ B, A ε→ H; B ε→ C, B ε→ D; C 1→ E; D 0→ F; E ε→ G, F ε→ G; G ε→ A; H ε→ I; I 1→ J (J is the accepting state)

NFA to DFA. The Trick
• Simulate the NFA
• Each state of resulting DFA
= a non-empty subset of states of the NFA
• Start state
= the set of NFA states reachable through ε-moves from
NFA start state
• Add a transition S --a--> S' to the DFA iff
– S' is the set of NFA states reachable from the states in S after seeing the input a, considering ε-moves as well
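A compact C sketch of this trick, representing each set of NFA states as a bitmask; it assumes at most 32 NFA states, and the eps and delta tables are illustrative placeholders to be filled in from the NFA:

#include <stdint.h>

#define NSTATES 10                   /* e.g., NFA states A..J        */

uint32_t eps[NSTATES];               /* eps[s]: ε-successors of s    */
uint32_t delta[NSTATES][2];          /* delta[s][a]: successors on a */

/* Close a set of NFA states under ε-moves. */
uint32_t eps_closure(uint32_t set) {
    uint32_t prev;
    do {
        prev = set;
        for (int s = 0; s < NSTATES; s++)
            if (set & (1u << s))
                set |= eps[s];
    } while (set != prev);
    return set;
}

/* One DFA transition: S' = ε-closure(union of delta[s][a] for s in S). */
uint32_t dfa_step(uint32_t S, int a) {
    uint32_t next = 0;
    for (int s = 0; s < NSTATES; s++)
        if (S & (1u << s))
            next |= delta[s][a];
    return eps_closure(next);
}

The DFA start state is then eps_closure(1u << nfa_start).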

NFA -> DFA Example
The NFA for (1 | 0)*1 from the previous slide, with states A–J. Subset construction yields a DFA with three states:

start state {A,B,C,D,H,I}: on 0 → {F,G,A,B,C,D,H,I}; on 1 → {E,J,G,A,B,C,D,H,I} (accepting, as it contains J)
{F,G,A,B,C,D,H,I}: on 0 → itself; on 1 → {E,J,G,A,B,C,D,H,I}
{E,J,G,A,B,C,D,H,I}: on 1 → itself; on 0 → {F,G,A,B,C,D,H,I}

Issues in Lexical Analysis

Lexical analysis is the process of producing tokens from the source program.
It has the following issues:

•Lookahead

•Ambiguities
Lookahead
• Lookahead is required to decide when one token ends and the next token begins. Simple examples with lookahead issues are i vs. if, and = vs. ==. Therefore, a way to describe the lexemes of each token is required.
• A way is also needed to resolve ambiguities:
• Is if two variables i and f, or the keyword if?
• Is == two equal signs = =, or one operator ==?
• Hence, the amount of lookahead to consider, and a way to describe the lexemes of each token, are both needed.
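A tiny C sketch of the = vs. == decision using one character of lookahead (the token codes are illustrative):

#include <stdio.h>

enum { ASSIGN, EQ };

/* Called after reading '='; peeks one character ahead. */
int scan_after_equals(FILE *in) {
    int c = getc(in);
    if (c == '=')
        return EQ;        /* consumed "==" */
    ungetc(c, in);        /* the lookahead is not part of this token: push it back */
    return ASSIGN;        /* plain "=" */
}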
Ambiguities

The lexical analysis programs written with lex accept ambiguous specifications and choose the longest match possible at each input point. When more than one expression can match the current input, lex chooses as follows:

•The longest match is preferred.

• Among rules which matched the same number of characters, the rule given
first is preferred.
Lexical Errors
A character sequence that cannot be scanned into any valid token is a lexical error. Important facts about lexical errors:

▪ Lexical errors are not very common, but they should be managed by the scanner.

▪ Misspellings of identifiers, operators, or keywords are considered lexical errors.

▪ Generally, a lexical error is caused by the appearance of some illegal character, mostly at the beginning of a token.
Error Recovery in Lexical Analyzer

The most common error recovery techniques:

▶ Remove one character from the remaining input.

▶ In panic mode, successive characters are ignored until we reach a well-formed token (a sketch follows this list).

▶ Insert a missing character into the remaining input.

▶ Replace a character with another character.

▶ Transpose two adjacent characters.
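A hedged C sketch of panic-mode recovery (the set of characters assumed to start a token is illustrative):

#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Discard characters until one that can start a well-formed token. */
void panic_skip(FILE *in) {
    int c;
    while ((c = getc(in)) != EOF) {
        if (isalnum(c) || strchr("+-*/=<>;(){},", c)) {
            ungetc(c, in);         /* stop: c may begin the next token */
            return;
        }
        /* otherwise keep ignoring successive characters */
    }
}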


Lexical Analyzer vs. Parser

Lexical Analyzer                  Parser
Scans the input program           Performs syntax analysis
Identifies tokens                 Creates an abstract representation of the code
Inserts tokens into the           Updates symbol table entries
symbol table
Generates lexical errors          Generates a parse tree of the source code
