Unit 2: Lexical Analysis
Lexical analysis is the process by which a program reads an input stream of
characters, groups them into lexemes, and converts them into tokens to be
analyzed by the parser.
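For example, given the input statement position = initial + rate * 60, the
lexical analyzer produces a stream of tokens such as:
<id, position> <=> <id, initial> <+> <id, rate> <*> <number, 60>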
Errors in the lexical analysis phase
1. Undefined symbol
2. Invalid character
3. Unterminated comment
Main functions of the lexical analyzer
1. Separate out the lexemes.
2. Recognize reserved words, operators and special symbols according to
the specification of terminal and non-terminal symbols.
3. Identify user-defined words (words that contain no blank, hash or
slash character) and check whether each user-defined word follows the
rules defined by the language.
4. Tokenizing: form tokens of the form <token, value>.
5. Enter the information into the symbol table.
Additional Tasks
It removes spaces and comments from the input text.
For every '\n' it assigns a line number.
It identifies a string of digits and converts it into its internal
binary representation.
It processes single special characters and double special characters
(e.g. < and <=).
It recognizes string constants.
It deletes erroneous characters and recovers from errors while
processing illegal strings.
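A minimal C sketch of these housekeeping tasks, assuming a simple
stdio-based scanner (the function scan_number and its interface are
illustrative, not part of any standard API):

#include <ctype.h>
#include <stdio.h>

int line_no = 1;  /* a line number is assigned for every '\n' seen */

/* Skip blanks (counting lines), then convert a run of digits into
   its internal binary (int) representation. Returns the value, or
   -1 if the next non-blank character is not a digit. */
int scan_number(FILE *in)
{
    int c = fgetc(in);
    while (c == ' ' || c == '\t' || c == '\n') {
        if (c == '\n')
            line_no++;               /* assign line numbers on '\n' */
        c = fgetc(in);
    }
    if (!isdigit(c)) {
        if (c != EOF) ungetc(c, in);
        return -1;
    }
    int value = 0;
    while (c != EOF && isdigit(c)) {
        value = value * 10 + (c - '0');  /* digit string -> binary value */
        c = fgetc(in);
    }
    if (c != EOF) ungetc(c, in);
    return value;
}

int main(void)
{
    printf("line %d: value %d\n", line_no, scan_number(stdin));
    return 0;
}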
Error recovery in lexical analysis
There are two options to consider when an error is diagnosed during lexical
analysis:
1. The lexical analyzer skips the erroneous statement completely and goes
on to the next one.
2. The lexical analyzer identifies the error, corrects it internally, and
calls the parser for syntax analysis.
A situation may arise in which the lexical analyzer is unable to proceed
because none of the patterns for tokens matches a prefix of the remaining
input. For this we use the simplest recovery strategy, panic-mode recovery:
successive characters are deleted from the remaining input until the lexical
analyzer can find a well-formed token. This is not always the proper way to
handle a lexical error, however, because the tokens still have to be sent to
the parser for syntax analysis.
Other possible recovery methods
Inserting a missing character.
Replacing an incorrect character with the correct character.
Transposing two adjacent characters.
Deleting an extra character.
For example, the misspelled keyword "whlie" can be repaired by transposing
the adjacent characters l and i to obtain "while".
Token, Pattern, Lexeme
---- A token describes a class or category of input strings,
e.g. identifiers, keywords and constants are tokens.
---- A pattern is the set of rules that describes a token.
---- A lexeme is a sequence of characters in the source program
that matches the pattern for a token.
Token
A token is a pair consisting of a token name and an
optional attribute value.
A token name is an abstract symbol representing a
kind of lexical unit.
They are the input symbols that the parser
processes.
Pattern
A pattern is a description of the form that the lexemes of
a token may take.
In the case of a keyword as a token, the pattern is just
the sequence of characters that forms the keyword.
For identifiers and some other tokens, the pattern is a
more complex structure that is matched by many strings.
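For example, the pattern for the token id can be written as
letter (letter | digit)*, which is matched by lexemes such as rate, x1
and initial.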
Regular Expression
Regular expressions are a mathematical notation that describes
the set of strings of a specific language.
They provide a convenient and useful notation for representing
the patterns of tokens.
The grammar defined by regular expressions is known
as regular grammar. The language defined by regular
grammar is known as regular language.
Notations
If r and s are regular expressions denoting the
languages L(r) and L(s), then:
Union: (r)|(s) is a regular expression denoting L(r) ∪ L(s)
Concatenation: (r)(s) is a regular expression denoting L(r)L(s)
Kleene closure: (r)* is a regular expression denoting (L(r))*
Parentheses: (r) is a regular expression denoting L(r)
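For example, over the alphabet ∑ = {a, b}: a|b denotes the language
{a, b}; (a|b)(a|b) denotes {aa, ab, ba, bb}; a* denotes {ε, a, aa, aaa, …};
and (a|b)*abb denotes all strings of a's and b's ending in abb.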
Precedence and Associativity
The operators *, concatenation (.) and | (union) are left
associative.
* has the highest precedence.
Concatenation (.) has the second highest precedence.
| has the lowest precedence of all.
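Under these conventions, for example, the regular expression a|b*c is
interpreted as a|((b*)c): the closure applies only to b, the
concatenation groups b* with c, and the union is applied last.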
Regular definition
If Σ is an alphabet of basic symbols, then a regular definition is a
sequence of definitions of the form:
d1 → r1
d2 → r2
.
dn → rn
where each di is a new symbol and each ri is a regular expression
over the alphabet Σ ∪ {d1, d2, …, di-1}.
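For example, identifiers can be described by the regular definition:
letter → a | b | … | z | A | B | … | Z
digit  → 0 | 1 | … | 9
id     → letter ( letter | digit )*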
Recognition of tokens
For this, transition diagrams are used.
It is a diagrammatic representation to depict the action
that will take place when a lexical analyzer is called by
the parser to get the next token.
It is used to keep track of information about the characters
that are seen as the forward pointer scans the input.
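As a sketch, the transition diagram for identifiers
(letter (letter | digit)*) can be simulated in C as follows; the state
numbering and the scan_identifier interface are illustrative assumptions:

#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Simulate the transition diagram for identifiers:
   state 0 --letter--> state 1; state 1 --letter|digit--> state 1;
   state 1 --other--> accept (the non-matching character is never
   consumed, which plays the role of retracting the forward pointer). */
int scan_identifier(const char **forward, char *lexeme)
{
    const char *p = *forward;    /* the forward pointer */
    const char *begin = p;       /* lexemeBegin */
    int state = 0;
    for (;;) {
        unsigned char c = (unsigned char)*p;
        switch (state) {
        case 0:
            if (isalpha(c)) { state = 1; p++; }
            else return 0;       /* no identifier starts here */
            break;
        case 1:
            if (isalnum(c)) { p++; }       /* stay in state 1 */
            else {                         /* accepting: copy out lexeme */
                size_t n = (size_t)(p - begin);
                memcpy(lexeme, begin, n);
                lexeme[n] = '\0';
                *forward = p;
                return 1;
            }
            break;
        }
    }
}

int main(void)
{
    const char *fwd = "rate = 60";
    char lexeme[64];
    if (scan_identifier(&fwd, lexeme))
        printf("id lexeme: %s\n", lexeme);   /* prints "rate" */
    return 0;
}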
The lexical analyzer Generator Lex
Lex: The Lex compiler is a tool that allows one to specify
a lexical analyzer by giving regular expressions that
describe the patterns for tokens.
The input is specified in the Lex language, and the tool itself
is the Lex compiler.
The Lex compiler transforms the input patterns into a transition
diagram and generates code in a file called lex.yy.c.
Overview of Lex
An input file lex.l is written in the Lex language and describes
the lexical analyzer to be generated.
The Lex compiler transforms lex.l into a C program, in a file that is
always named lex.yy.c.
The latter file is compiled by the C compiler into a file called
a.out.
The C compiler output is a working lexical analyzer that can take
a stream of input characters and produce a stream of tokens.
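On a typical Unix system the whole pipeline is (the command names are
the standard ones; the input file name lex.l is by convention):

lex lex.l          # produces lex.yy.c
cc lex.yy.c -ll    # compiles it, linking the Lex library, into a.out
./a.out < source   # runs the generated analyzer on an input stream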
More about a.out
It is a subroutine of the parser.
It is a C function that returns an integer,
which is a code for one of the possible token
names.
The attribute value is placed in a global
variable yylval, which is shared between the
lexical analyzer and the parser, thereby making
it simple to return both the name and an
attribute value of a token.
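A small complete Lex program sketching this interface (the token code
NUMBER is an illustrative constant; in practice it would come from the
parser, e.g. via y.tab.h):

%{
#include <stdio.h>
#include <stdlib.h>
#define NUMBER 258   /* illustrative token code; normally defined by the parser */
int yylval;          /* attribute value shared with the parser */
%}
%%
[0-9]+    { yylval = atoi(yytext); return NUMBER; }
[ \t\n]   { /* strip whitespace */ }
.         { return yytext[0]; }
%%
int yywrap(void) { return 1; }
int main(void)
{
    int tok;
    while ((tok = yylex()) != 0)   /* yylex() returns an integer token code */
        printf("token %d (yylval = %d)\n", tok, yylval);
    return 0;
}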
Structure of a Lex Program
A Lex program consists of
declarations
%%
translation rules
%%
auxiliary functions
The declaration section includes declarations of variables,
constants and regular definitions.
The translation rules each have the form:
Pattern { Action }
Each pattern is a regular expression which may use the
regular definitions of the declaration section.
The actions are fragments of code typically written in C.
The third section holds additional functions used in the
actions. Alternatively these functions can be compiled
separately and loaded with the lexical analyzer.
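For instance, a word-and-line counter showing all three sections (a
sketch; the helper count_word exists only to illustrate the
auxiliary-functions section):

%{
/* declarations section: C declarations and regular definitions below */
#include <stdio.h>
int words = 0, lines = 0;
void count_word(void);
%}
word    [A-Za-z]+
%%
{word}  { count_word(); }   /* pattern {action} */
\n      { lines++; }
.       { /* ignore everything else */ }
%%
/* auxiliary functions section */
void count_word(void) { words++; }
int yywrap(void) { return 1; }
int main(void)
{
    yylex();
    printf("%d words, %d lines\n", words, lines);
    return 0;
}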
Finite Automata (FA)
A finite automaton (FA) is the simplest machine that recognizes
patterns.
A finite automaton consists of the following:
Q : Finite set of states.
∑ : set of Input Symbols.
q : Initial state.
F : set of Final States.
δ : Transition Function. Formal specification of machine is
{ Q, ∑, q, F, δ }.
Deterministic Finite Automata (DFA)
A DFA is a 5-tuple {Q, ∑, q, F, δ}.
Q : set of all states.
∑ : set of input symbols. ( Symbols which
machine takes as input )
q : Initial state. ( Starting state of a machine )
F : set of final state.
δ : Transition function, defined as δ : Q × ∑ → Q.
In a DFA, for a particular input character, the
machine goes to one state only.
A transition function is defined on every state
for every input symbol.
Also, in a DFA, null (or ε) moves are not allowed,
i.e., a DFA cannot change state without consuming an
input character.
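As a sketch, a DFA is naturally simulated in C with a transition table;
the machine below (assumed here for illustration) accepts strings over
{a, b} ending in abb:

#include <stdio.h>

/* Transition table for a DFA accepting (a|b)*abb.
   Rows are states; columns are inputs: 0 = 'a', 1 = 'b'. */
static const int delta[4][2] = {
    {1, 0},   /* state 0 */
    {1, 2},   /* state 1 */
    {1, 3},   /* state 2 */
    {1, 0},   /* state 3 (accepting) */
};

int dfa_accepts(const char *s)
{
    int state = 0;                        /* q: initial state */
    for (; *s; s++) {
        if (*s != 'a' && *s != 'b')
            return 0;                     /* symbol outside the alphabet */
        state = delta[state][*s == 'b'];  /* exactly one next state */
    }
    return state == 3;                    /* F = {3} */
}

int main(void)
{
    printf("%d\n", dfa_accepts("aababb"));  /* 1: ends in abb */
    printf("%d\n", dfa_accepts("abba"));    /* 0 */
    return 0;
}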
Nondeterministic Finite Automata(NFA)
An NFA is similar to a DFA except for the following
additional features:
1. Null (or ε) moves are allowed, i.e., it can move
forward without reading symbols.
2. The ability to transition to any number of states
for a particular input.
However, the above features do not add any power to
the NFA: if we compare the two in terms of power,
both are equivalent.
Due to the above additional features, an NFA has a
different transition function; the rest is the same
as for a DFA.
δ : Transition function, defined as δ : Q × (∑ ∪ {ε}) → 2^Q.
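For example, in the standard NFA for (a|b)*abb with start state q0,
δ(q0, a) = {q0, q1}: a single input symbol can lead to a set of states,
and δ(q, ε) may be non-empty as well.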
Some important points:
1. Every DFA is an NFA, but not vice versa.
2. Both NFA and DFA have the same power, and each NFA can be translated
into a DFA.
3. There can be multiple final states in both a DFA and an NFA.
4. The NFA is more of a theoretical concept.
5. The DFA is used in lexical analysis in a compiler.
Applications: Finite Automata in Practice
Grep (Global Regular Expression Print)
A Unix pattern-matching utility that searches files for a string of text and outputs the lines
that contain the pattern.
Thermostats
Coke Machines
Elevators
Train Track Switches
Security Properties
Lexical Analyzers for Parsers
The structure of the generated analyzer
The figure gives an overview of the architecture of a lexical
analyzer generated by Lex.
The lexical analyzer program includes:
• A program to simulate automata
• Components created from the Lex program by Lex itself, which are
listed as follows:
o A transition table for the automaton.
o Functions that are passed directly through Lex to the
output.
o Actions from the input program (fragments of code) which are
invoked by the automaton simulator when needed.
Steps to construct the automaton
Step 1: Convert each regular expression into an NFA,
either by Thompson's construction or by the direct
method.
Step 2: Combine all the NFAs into one by introducing
a new start state with ε-transitions to each of the
start states of the NFAs Ni for pattern Pi.
Step 2 is needed because the objective is to
construct a single automaton that recognizes lexemes
matching any of the patterns. For example:
a     { action A1 for pattern P1 }
abb   { action A2 for pattern P2 }
a*b+  { action A3 for pattern P3 }
For the string abb, both pattern P2 and pattern P3
match, but pattern P2 will be taken into account
because it is listed first in the Lex program.
For the string aabbb···, many prefixes match pattern
P3; the longest matching prefix is chosen.
Fig. shows NFAs for recognizing the three patterns
mentioned above.
The combined NFA for all three given
patterns is shown in Fig.
Pattern Matching Based on NFAs
The lexical analyzer reads input from the input buffer, starting from the
beginning of the lexeme pointed to by the pointer lexemeBegin.
The forward pointer is used to move ahead through the input symbols, and
the analyzer calculates the set of states it is in at each point.
If the NFA simulation has no next state for some input symbol, then no
longer prefix reaching an accepting state can exist.
In such cases, the decision is made on the longest prefix seen so far,
i.e., the lexeme matching some pattern.
The process is repeated until one or more accepting states are reached.
If there are several accepting states, then the pattern Pi which appears
earliest in the list of the Lex program is chosen.
e.g. w = aaba
Explanation
The process starts with the ε-closure of the initial state 0. After
processing all the input symbols, no state is found, as
there is no transition out of state 8 on input a. Hence, we
look for an accepting state by retracting to a previous
state. From Fig., state 2, which is an accepting state, is
reached after reading the input symbol a, and therefore
the pattern a has been matched. At state 8, the string aab
has been matched with pattern a*b+. By the Lex rule, the
longest matching prefix should be considered. Hence,
the action A3 corresponding to pattern P3 is executed
for the string aab.
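Assuming the standard combined NFA for the three patterns (states 0–8,
with 0 the new start state, as in the figure), the simulation of w = aaba
computes the following state sets:
ε-closure(0) = {0, 1, 3, 7}
on a → {2, 4, 7}   (state 2 accepts: pattern a, prefix "a")
on a → {7}
on b → {8}         (state 8 accepts: pattern a*b+, prefix "aab")
on a → ∅           (no move; retract)
The last configuration containing an accepting state was reached after
reading aab, so the longest matched prefix is aab and action A3 fires.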