CD Unit 1
Lexical Analysis
• Language Processors
• Structure of a Compiler
• Lexical Analysis
• The Role of the Lexical Analyzer
• Bootstrapping
• Input Buffering
• Specification of Tokens
• Recognition of Tokens
• Lexical Analyzer Generator – LEX
• Finite Automata
• Regular Expressions and Finite Automata
• Design of a Lexical Analyzer Generator
Compiler
• A compiler is software which converts a program written in a high-level language (HLL) into a low-level language (LLL).
Language Processors
Steps:
1. The source code is first passed through the preprocessor
2. The preprocessor output (pure HLL code) is given to the compiler
3. The compiler converts this code into assembly language
4. The assembler converts the assembly code into machine code
* The assembler translates LLL (assembly) into machine code
* The linker links all parts of the program together for execution
* The loader loads them all into memory, and then the program is executed
Preprocessor
• It processes directives such as
#include
#define
• In the place of these directives it inserts the complete source code (or macro expansion) they refer to
Structure of a Compiler
• We can also call this the phases of a compiler, as a compiler operates in
phases.
• A phase is a logically interrelated operation that takes the source program in one
representation and produces output in another representation.
• Two major phases of compilation:
a) Analysis Phase – machine independent but language dependent
b) Synthesis Phase – machine dependent but language independent
• The different phases come in between the conversion of HLL to LLL.
• All these phases are managed with the help of the symbol table manager;
all symbol-related information is stored there.
If any error occurs in between these phases, the error handler is going to
handle that error.
1. Lexical analysis: Lexical analysis is also called scanning. It is the phase of compilation
in which the complete source code is scanned and the source program is broken up
into a group of strings called tokens.
Ex: Consider an assignment statement
position := initial + rate * 60
In the lexical analysis phase the above statement is broken up into a series of
tokens as follows:
lexeme      token
position    id
:=          assignment symbol
initial     id
+           plus sign
rate        id
*           multiplication sign
60          numerical literal
The statement after lexical analysis is id1 := id2 + id3 * 60,
where id1, id2, id3 are the tokens for position, initial and rate.
Lexical Analysis
• The source code enters phase 1.
• The LA reads the stream of characters making up the source program and groups
the characters into meaningful sequences called lexemes.
• The LA represents these lexemes in the form of tokens.
2. Syntax analysis: Syntax analysis is also called parsing.
• In this phase, the tokens generated by the lexical analyzer are
grouped together to form a hierarchical structure.
• The hierarchical structure generated in this phase is called a
parse tree.
It can be represented by a syntax tree, which is a
compressed representation of the parse tree.
Syntax Analysis
• Also called parsing.
• In this phase, expressions, statements and declarations are identified by
using the result of the LA.
• So the SA takes the tokens produced by the LA as input and generates a
parse tree/syntax tree.
3. Semantic Analysis
• It checks whether the parse tree constructed follows the rules of the
language or not.
• Semantic analysis checks the source program for semantic
errors. It performs type checking.
• It changes one data type to another where needed.
Example: id1 := id2 + id3 * 60
If all the identifiers are real, then we have to convert the
number 60 from int to real.
4. Intermediate code generator:
• The intermediate code is a kind of code which is easy to generate,
and this code can be easily converted to target code. (or)
• The intermediate code generator converts the syntax
tree into an intermediate form from which target code is later produced.
• The intermediate code should be easily translated to target code,
and intermediate code generation should not take much time.
• It comes in a variety of forms such as three-address code,
quadruples, triples and postfix notation.
Ex: The three-address code for the statement
position := initial + rate * 60, i.e. id1 := id2 + id3 * 60, is given
as
temp1 := inttoreal(60)
temp2 := id3 * temp1
temp3 := id2 + temp2
id1 := temp3
5. Code optimizer:
The code optimizer phase improves the intermediate
code. It reduces the code by removing repeated
or unwanted instructions from the intermediate
code.
The code for id1 := id2 + id3 * 60 can be optimized as
temp1 := id3 * 60.0
id1 := id2 + temp1
Specification of Tokens
• In order to specify tokens we use Regular Expressions.
• REs generate regular languages.
• A language is a set of strings formed over an alphabet.
• We can specify tokens using:
1. Alphabets, strings, language
2. Operations on language
3. Regular Expression
4. Regular Definition
Alphabet
• It is a set of symbols denoted by Σ (sigma)
String
• A string is a finite sequence of symbols drawn from the alphabet.
• We can derive any number of strings from an alphabet.
a) Length of a string:
The length is the total number of characters/symbols present in the string.
Let s = 1100.
• The length of the string is denoted by |s| – the name of the string
within the mod symbols.
|s| = 4
b) Empty String:
• If the length of the string is 0, then it is the empty string.
• Denoted by epsilon 𝜖
|𝜖| = 0
c) Prefix of a String:
A prefix of a string s is obtained by removing zero or more symbols from
the end of s (e.g., ban is a prefix of banana).
Languages: A language is a set of strings generated from
the alphabet.
Ex: The set of strings consisting of an equal number of a’s and b’s:
L = {ε, ab, aabb, aaabbb, …}
Operations on languages:
1. Union
2. Concatenation
3. Kleene star/closure (a set of strings which
includes the empty string)
Regular Expression:
• A regular expression can be defined over an alphabet as follows.
• Let ∑ be an alphabet. The regular expressions over ∑ and the sets they denote are
defined as follows:
1) ∅ is a regular expression and denotes the empty set.
2) ε is a regular expression and denotes the set {ε}, i.e., a set containing the empty string.
3) If ‘a’ is a symbol in ∑, then a is a regular expression that denotes {a}, i.e., a set
containing the string a.
4) If r and s are two regular expressions denoting the languages L(r) and L(s), then:
a) the union of two regular expressions is a regular expression;
b) the concatenation of two regular expressions is a regular expression;
c) the Kleene closure of any regular expression is a regular expression.
• These three are known as the properties or components of regular expressions.
• Ex: 0(0+1)*1 is a regular expression denoting the set of all strings over {0,1} starting
with 0 and ending with 1.
Regular Definition: A regular definition is a sequence of definitions of
the form:
d1 -> r1 (d1 produces r1)
d2 -> r2
where d1, d2, …, di are symbols not part of the
alphabet and r1, r2, … are regular expressions.
Ex: Regular definition for an identifier in the Pascal language:
letter -> a|b|…|z|A|B|…|Z
digit -> 0|1|…|9
id -> letter(letter|digit)*
Recognition of Tokens
• We recognize tokens with the help of transition diagrams:
1) Recognition of identifiers (names of variables)
2) Recognition of delimiters (blank space/tab space/newline character)
3) Recognition of relational operators (>, >=, <, <=, ==, !=)
4) Recognition of keywords such as if, else, for
5) Recognition of numbers (int/floating-point numbers)
Lexical Analyzer Generator-LEX
• There is a wide range of tools for the construction of lexical
analyzers.
• The majority of these tools are based on regular expressions.
• One of the traditional tools of this kind is LEX.
• LEX is a tool that is used to generate a lexical analyzer
(or) scanner. The tool is often referred to as the LEX compiler, and its
input specification as the LEX language.
1. The lex.l program is converted into a C program, lex.yy.c, by the LEX compiler.
2. The C compiler compiles lex.yy.c and produces an object program,
a.out.
3. a.out is the LA: it transforms
an input stream into a sequence of tokens.
Design of a Lexical Analyzer Generator
Declaration section:
This section is used to declare variables, manifest constants and regular definitions.
A manifest constant is an identifier that is used to represent a constant.
Ex: In C, manifest constants are declared as #define MAX 10
A regular definition is a sequence of regular expressions where each
regular expression is given a name:
d1 -> r1
d2 -> r2
d3 -> r3
…………
dn -> rn
Translation rules section: The translation rules are a set of statements
which define the action to be performed for each regular expression
given.
Below is the form of the statements in the translation rules section:
r1 {action1}
r2 {action2}
. .
. .
rn {actionn}
Auxiliary procedures: These procedures are used by the actions in the
translation rules. These procedures can be compiled separately and loaded
with the lexical analyzer.
Lex Specification
Declaration section
%%
Translation rules section
%%
Auxiliary procedures section
%{
int vowels = 0, consonants = 0;
%}
%%
[aeiouAEIOU]    { vowels++; }
[a-zA-Z]        { consonants++; }
.|\n            { }
%%
int main()
{
    printf("enter the string: ");
    yylex();
    printf("no of vowels=%d\nno of consonants=%d\n", vowels, consonants);
    return 0;
}
int yywrap()
{
    return 1;
}