Phases of Compiler
Efficiency: A lexer may do the simple parts of the work faster than the more general parser can. Furthermore, the size of a system that is split in two may be smaller than a combined system. While it is usually not terribly difficult to write a lexer by hand, a handwritten lexer may be complex and difficult to maintain. Hence, lexers are normally constructed by lexer generators, which transform human-readable specifications of tokens and white-space into efficient programs. For lexical analysis, such specifications are traditionally written using regular expressions, and the generated lexers belong to a class of extremely simple programs called finite automata.
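To make this concrete, here is a minimal hand-coded sketch in Python of such a finite automaton, one that accepts identifiers of the form letter (letter | digit)*; the function name and state encoding are illustrative assumptions, not the output of any particular generator.

# A two-state deterministic finite automaton for identifiers:
# state 0 is the start state, state 1 is the accepting state.
def is_identifier(s: str) -> bool:
    state = 0
    for ch in s:
        if state == 0 and ch.isalpha():
            state = 1                  # first character must be a letter
        elif state == 1 and (ch.isalpha() or ch.isdigit()):
            state = 1                  # later characters: letters or digits
        else:
            return False               # no transition defined: reject
    return state == 1                  # accept only if we ended in state 1

print(is_identifier("rate"), is_identifier("60rate"))  # True False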
A practical parsing method must, moreover, be deterministic: at each step the parser must be able to pick the right rewrite from the input it has read so far. Methods that build the parse tree from the leaves upward toward the root are called bottom-up parsing methods.
The advantage of using an intermediate language is most obvious if many languages are to be compiled to many machines. If translation is done directly, the number of compilers equals the product of the number of languages and the number of machines. If a common intermediate language is used, one front-end (i.e., a compiler to intermediate code) is needed for every language and one back-end for each machine, making the total equal to the sum of the number of languages and the number of machines. For example, five languages and four machines would require 5 × 4 = 20 direct compilers, but only 5 front-ends plus 4 back-ends.
If an interpreter for the intermediate language is written in a language for which there already
exist compilers on the target machines, the interpreter can be compiled on each of these. This
way, there is no need to write a separate back-end for each machine.
The advantages of this approach are:
1. No actual back-end needs to be written for each new machine.
2. A compiled program can be distributed in a single intermediate form for all machines, as opposed to shipping separate binaries for each machine.
3. The intermediate form may be more compact than machine code. This saves space both in distribution and on the machine that executes the programs (though the latter is somewhat offset by requiring the interpreter to be kept in memory during execution).
String:
position = initial + rate * 60
Lexical Analysis:
The lexical analyzer reads the stream of characters making up the source program and groups the characters into meaningful sequences called lexemes. For each lexeme, the lexer produces a token.
position → <id,1>
= → <=>
initial → <id,2>
+ → <+>
rate → <id,3>
* → <*>
60 → <60>

The statement is thus represented by the token sequence <id,1> <=> <id,2> <+> <id,3> <*> <60>, where each <id,n> refers to entry n in the symbol table.
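As a rough illustration, here is a minimal tokenizer sketch in Python for this example; the token patterns, tuple shapes, and symbol-table layout are illustrative assumptions, not the specification of any particular lexer.

import re

# Illustrative token patterns; a real lexer would cover the whole language.
TOKEN_SPEC = [
    ("NUM",  r"\d+"),
    ("ID",   r"[A-Za-z_]\w*"),
    ("OP",   r"[=+\-*/]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))

def tokenize(source):
    symtab = {}                          # lexeme -> symbol-table index
    tokens = []
    for m in MASTER.finditer(source):
        kind, lexeme = m.lastgroup, m.group()
        if kind == "SKIP":
            continue                     # white-space produces no token
        if kind == "ID":
            idx = symtab.setdefault(lexeme, len(symtab) + 1)
            tokens.append(("id", idx))   # <id, symbol-table entry>
        elif kind == "NUM":
            tokens.append(("num", int(lexeme)))
        else:
            tokens.append((lexeme,))     # operators carry no attribute
    return tokens, symtab

tokens, symtab = tokenize("position = initial + rate * 60")
print(tokens)  # [('id', 1), ('=',), ('id', 2), ('+',), ('id', 3), ('*',), ('num', 60)]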
Semantic Analysis:
The semantic analyzer uses the syntax tree and the information in the symbol table to check the
source program for semantic consistency with the language definition. It also gathers type
information and saves it in either the syntax tree or the symbol table, for subsequent use during
intermediate-code generation. An important part of semantic analysis is type checking, where the
compiler checks that each operator has matching operands. For example, many programming
language definitions require an array index to be an integer; the compiler must report an error if a
floating-point number is used to index an array.
The language specification may permit some type conversions called coercions. For example, a
binary arithmetic operator may be applied to either a pair of integers or to a pair of floating-point
numbers. If the operator is applied to a floating-point number and an integer, the compiler may
convert or coerce the integer into a floating-point number. Suppose that position, initial, and rate
have been declared to be floating-point numbers, and that the lexeme 60 by itself forms an
integer. The type checker in the semantic analyzer discovers that the operator * is applied to a
floating-point number rate and an integer 60. In this case, the integer may be converted into a
floating-point number. Notice that the output of the semantic analyzer then contains an extra node for the operator inttofloat, which explicitly converts its integer argument into a floating-point number.
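As a rough sketch of how a type checker might discover and insert this coercion, the following Python walks a syntax tree encoded as nested tuples; the node encoding and type names are illustrative assumptions.

# Tree nodes: interior (op, left, right); leaves ("id", name, type)
# or ("num", value, type), where type is the string "int" or "float".
def typecheck(node):
    """Return (typed_node, node_type), inserting coercions where needed."""
    if node[0] in ("id", "num"):
        return node, node[2]
    op, left, right = node
    left, lt = typecheck(left)
    right, rt = typecheck(right)
    if lt == rt:
        return (op, left, right), lt
    # Mixed int/float operands: coerce the integer side to float.
    if lt == "int":
        left = ("inttofloat", left)
    else:
        right = ("inttofloat", right)
    return (op, left, right), "float"

tree = ("+", ("id", "initial", "float"),
             ("*", ("id", "rate", "float"), ("num", 60, "int")))
typed, t = typecheck(tree)
print(typed)
# ('+', ('id', 'initial', 'float'),
#  ('*', ('id', 'rate', 'float'), ('inttofloat', ('num', 60, 'int'))))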
Intermediate Code Generation:
In the process of translating a source program into target code, a compiler may construct one or more intermediate representations, which can have a variety of forms. One such form is the syntax tree, commonly used during syntax and semantic analysis. After syntax and semantic analysis of the source program, many compilers generate an explicit low-level or machine-like intermediate representation, which we can think of as a program for an abstract machine. This intermediate representation should have two important properties: it should be easy to produce, and it should be easy to translate into the target machine.
A second form is three-address code, which consists of a sequence of assembly-like instructions with at most three operands per instruction. For our example, the output of the intermediate code generator is the following three-address code sequence:

t1 = inttofloat(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3
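A minimal sketch, in Python, of how such a sequence might be produced from the annotated syntax tree; the tuple encoding and temporary-naming scheme are illustrative assumptions, and the leaves carry source names rather than the id1/id2/id3 symbol-table references used above.

from itertools import count

# The syntax tree after semantic analysis (inttofloat already inserted);
# leaves are ("id", name) or ("num", value) in this illustrative encoding.
tree = ("+", ("id", "initial"),
             ("*", ("id", "rate"), ("inttofloat", ("num", 60))))

def gen(node, code, temps):
    """Emit three-address instructions into `code`; return the operand name."""
    if node[0] == "id":
        return node[1]
    if node[0] == "num":
        return str(node[1])
    if node[0] == "inttofloat":
        arg = gen(node[1], code, temps)
        t = f"t{next(temps)}"
        code.append(f"{t} = inttofloat({arg})")
        return t
    op, left, right = node
    l = gen(left, code, temps)
    r = gen(right, code, temps)
    t = f"t{next(temps)}"
    code.append(f"{t} = {l} {op} {r}")
    return t

code = []
result = gen(tree, code, count(1))
code.append(f"position = {result}")
print("\n".join(code))
# t1 = inttofloat(60)
# t2 = rate * t1
# t3 = initial + t2
# position = t3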
Code Optimization
The machine-independent code-optimization phase attempts to improve the intermediate code so
that better target code will result. Usually better means faster, but other objectives may be
desired, such as shorter code or target code that consumes less power. For example, a straightforward algorithm generates the intermediate code as in the previous phase, using an instruction for each operator in the tree representation that comes from the semantic analyzer.
A simple intermediate code generation algorithm followed by code optimization is a reasonable
way to generate good target code. The optimizer can deduce that the conversion of 60 from
integer to floating point can be done once and for all at compile time, so the inttofloat operation
can be eliminated by replacing the integer 60 with the floating-point number 60.0. Moreover, t3 is used only once, to transmit its value to id1, so the optimizer can transform the intermediate code produced by the previous phase into the shorter sequence below:
t1 = id3 * 60.0
id1 = id2 + t1
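A minimal sketch, in Python, of these two improvements (folding the compile-time conversion and merging the single-use copy) over a tuple encoding of the three-address code; the encoding is an illustrative assumption, temporaries are not renumbered (so the output reads t2 where the text above shows t1), and source names stand in for the id1/id2/id3 references.

# Three-address code as tuples: ("inttofloat", dest, src),
# ("binop", op, dest, left, right), ("assign", dest, src).
tac = [
    ("inttofloat", "t1", "60"),
    ("binop", "*", "t2", "rate", "t1"),
    ("binop", "+", "t3", "initial", "t2"),
    ("assign", "position", "t3"),
]

def fold_and_propagate(code):
    """Fold inttofloat of integer constants and merge single-use copies."""
    consts = {}                              # temp -> folded float constant
    out = []
    for instr in code:
        kind = instr[0]
        if kind == "inttofloat" and instr[2].lstrip("-").isdigit():
            consts[instr[1]] = instr[2] + ".0"   # done once at compile time
            continue                             # the conversion disappears
        if kind == "binop":
            _, op, dest, l, r = instr
            out.append(("binop", op, dest, consts.get(l, l), consts.get(r, r)))
        elif kind == "assign" and out and out[-1][0] == "binop" and out[-1][2] == instr[2]:
            # The previous result is used only by this copy (true in this
            # straight-line example), so write directly to the final target.
            _, op, _, l, r = out.pop()
            out.append(("binop", op, instr[1], l, r))
        else:
            out.append(instr)
    return out

for instr in fold_and_propagate(tac):
    if instr[0] == "binop":
        _, op, dest, l, r = instr
        print(f"{dest} = {l} {op} {r}")
    else:
        print(f"{instr[1]} = {instr[2]}")
# t2 = rate * 60.0
# position = initial + t2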
Code Generation
The code generator takes as input an intermediate representation of the source program and maps
it into the target language. If the target language is machine code, registers or memory locations
are selected for each of the variables used by the program. Then, the intermediate instructions are
translated into sequences of machine instructions that perform the same task. A crucial aspect of
code generation is the judicious assignment of registers to hold variables.
For example, using registers R1 and R2, the intermediate code produced by the previous phase might get translated into the following machine code:
LDF R2, id3
MULF R2, #60.0
LDF R1, id2
ADDF R1, R2
STF id1, R1
The first operand of each instruction specifies a destination. The F in each instruction tells us that it deals with floating-point numbers. The first instruction loads the contents of address id3 into register R2, and the second multiplies it by the floating-point constant 60.0; the # signifies that 60.0 is to be treated as an immediate constant. The third instruction moves id2 into register R1 and the fourth adds to it the value previously computed in register R2. Finally, the value in register R1 is stored into the address of id1, so the code correctly implements the assignment statement that we started with.
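As a rough sketch of this last step, the following Python translates the optimized three-address code into instructions of the style shown above, with a deliberately naive register allocator (no spilling); the tuple encoding and allocation order are illustrative assumptions chosen to mirror the example, and source names again stand in for id1/id2/id3.

# Optimized three-address code as (dest, left, op, right) tuples.
tac = [("t1", "rate", "*", "60.0"),
       ("position", "initial", "+", "t1")]

def is_const(x):
    return x.replace(".", "", 1).lstrip("-").isdigit()

def codegen(tac):
    regs = {}                           # value name -> register holding it
    free = iter(["R2", "R1"])           # order chosen to mirror the example
    asm = []

    def load(operand):
        if is_const(operand):
            return f"#{operand}"        # immediate constant
        if operand not in regs:
            r = next(free)
            asm.append(f"LDF {r}, {operand}")
            regs[operand] = r
        return regs[operand]

    for dest, l, op, r in tac:
        rl = load(l)
        asm.append(f"{'MULF' if op == '*' else 'ADDF'} {rl}, {load(r)}")
        regs[dest] = rl                 # result replaces the left operand
        if not dest.startswith("t"):    # program variable: store it back
            asm.append(f"STF {dest}, {rl}")
    return asm

print("\n".join(codegen(tac)))
# LDF R2, rate
# MULF R2, #60.0
# LDF R1, initial
# ADDF R1, R2
# STF position, R1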
The compiler writer, like any software developer, can profitably use modern software
development environments containing tools such as language editors, debuggers, version
managers, profilers, test harnesses, and so on. In addition to these general software-development
tools, other more specialized tools have been created to help implement various phases of a
compiler.
These tools use specialized languages for specifying and implementing specific components, and
many use quite sophisticated algorithms. The most successful tools are those that hide the details
of the generation algorithm and produce components that can be easily integrated into the
remainder of the compiler.
Some commonly used compiler-construction tools include:
1. Parser generators that automatically produce syntax analyzers from a grammatical description
of a programming language.
2. Scanner generators that produce lexical analyzers from a regular-expression description of the
tokens of a language.
3. Syntax-directed translation engines that produce collections of routines for walking a parse
tree and generating intermediate code.
4. Code-generator generators that produce a code generator from a collection of rules for
translating each operation of the intermediate language into the machine language for a target
machine.
5. Data-flow analysis engines that facilitate the gathering of information about how values are
transmitted from one part of a program to each other part. Data-flow analysis is a key part of
code optimization.
6. Compiler-construction toolkits that provide an integrated set of routines for constructing various phases of a compiler.