Unit 2
Lexical Analyzer
Outline
Informal sketch of lexical analysis
Identifies tokens in input string
Issues in lexical analysis
Lookahead
Ambiguities
Specifying lexemes
Regular expressions
Examples of regular expressions
Lexical Analyzer
Functions
Grouping input characters into tokens
Stripping out comments and white space
Correlating error messages with the
source program
Issues (why separate lexical analysis
from parsing)
Simpler design
Compiler efficiency
Compiler portability (e.g. Linux to Win)
The role of the lexical
analyzer
Lexical analyzers are divided into two processes:
Scanning
Simple processing that needs no tokenization of the input:
deletion of comments, compaction of consecutive white-space
characters
Lexical analysis
Producing tokens from the scanner's output
Scanning
Perspective
Purpose
Transform a stream of symbols
Into a stream of tokens
Lexical Analyzer
Responsibilities
Lexical analyzer
[Scanner]
Scan input
Remove white space
Remove comments
Manufacture tokens
Generate lexical
errors
Pass token to
parser
The Role of a Lexical Analyzer
[Figure: the source program (read entirely into memory) is scanned by the
lexical analyzer, which reads characters and can put a character back; on each
"get next token" request from the parser it passes a token and attribute
value; both the lexical analyzer and the parser consult the symbol table
(e.g. for id entries).]
Lexical Analysis
What do we want to do? Example:
if (i == j)
    z = 0;
else
    z = 1;
The input is just a string of characters:
\t if (i == j) \n \t \t z = 0; \n \t else \n \t \t z = 1;
Goal: Partition input string into substrings
Where the substrings are tokens
Compiler Construction
What's a Token?
A syntactic category
In English:
noun, verb, adjective, ...
In a programming language:
Identifier, Integer, Keyword,
Whitespace, ...
What are Tokens For?
Classify program substrings
according to role
Output of lexical analysis is a stream
of tokens . . .which is input to the
parser
Parser relies on token distinctions
An identifier is treated differently than a
keyword
Tokens
Tokens correspond to sets of strings.
Identifier: strings of letters or digits,
starting with a letter
Integer: a non-empty string of digits
Keyword: else or if or begin or ...
Whitespace: a non-empty sequence of
blanks, newlines, and tabs
Typical Tokens in a PL
Symbols: +, -, *, /, =, <, >, ->, ...
Keywords: if, while, struct, float, int, ...
Integer and Real (floating point)
literals
123, 123.45
Char (string) literals
Identifiers
Comments
White space
Terminology
Token
A classification for a common set of strings
Examples: Identifier, Integer, Float, Assign, LeftParen,
RightParen,....
Pattern
The rules that characterize the set of strings for a token
A pattern is a description of the form that the lexemes of
a token may take.
Examples: [0-9]+
Lexeme
Actual sequence of characters that matches a pattern and
has a given Token class.
A lexeme is a sequence of characters in the source
program that matches the pattern for a token and is
identified by the lexical analyzer as an instance of that
token.
Examples:
Identifier: Name, Data, x
Integer: 345, 2, 0, 629, ...
Lexical Errors
Error handling is very localized with respect to the input source
Example:
fi(a==f(x))
generates no lexical error in C
In what situations do errors occur?
Prefix of remaining input doesn't match any defined
token
Possible error recovery actions:
Deleting or Inserting Input Characters
Replacing or Transposing Characters
Or, skip over to next separator to ignore problem
Basic Scanning technique
Use 1 character of look-ahead
Obtain char with getc()
Do a case analysis
Based on lookahead char
Based on current lexeme
Outcome
If char can extend lexeme, all is well, go on.
If char cannot extend lexeme:
Figure out what the complete lexeme is and
return its token
Put the lookahead back into the symbol stream
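The case analysis above can be sketched in C. This is a minimal illustration, not a reference scanner: the names `next_token`, `TOK_NUM`, `TOK_ID`, and `TOK_EOF` are invented for the sketch. Reading from a string, the cursor simply stops at the first character that cannot extend the lexeme, which plays the role of "putting the lookahead back".

```c
#include <assert.h>
#include <ctype.h>

/* Token kinds for this sketch (hypothetical names). */
enum { TOK_NUM, TOK_ID, TOK_EOF };

/* Scan one token starting at *pos in src; advance *pos past the lexeme.
 * The character that cannot extend the lexeme is left unconsumed,
 * which is the "put the lookahead back" step. */
static int next_token(const char *src, int *pos) {
    while (isspace((unsigned char)src[*pos]))   /* skip white space */
        (*pos)++;
    char c = src[*pos];
    if (c == '\0') return TOK_EOF;
    if (isdigit((unsigned char)c)) {            /* Integer: digit+ */
        while (isdigit((unsigned char)src[*pos])) (*pos)++;
        return TOK_NUM;
    }
    if (isalpha((unsigned char)c)) {            /* Identifier: letter(letter|digit)* */
        while (isalnum((unsigned char)src[*pos])) (*pos)++;
        return TOK_ID;
    }
    (*pos)++;                                   /* unknown character: skip it */
    return TOK_EOF;
}
```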
Token Attribute
E = C1 ** 10
Token | Attribute
------+-----------------------------------
ID    | index to symbol-table entry for E
=     | -
ID    | index to symbol-table entry for C1
**    | -
NUM   | 10
Lexical Errors
It is hard for a lexical analyzer to tell,
without the aid of other components,
that there is a source-code error.
E.g., fi ( a == f(x)) ...
Is fi a misspelling of the keyword if, or an identifier?
Since fi is a valid lexeme for the token id, the lexical
analyzer returns it as an identifier.
Suppose a situation in which none of
the patterns for tokens matches a
prefix of the remaining input.
E.g. $%#if a>0 a+=1;
The simplest recovery strategy is
panic mode recovery.
Delete successive characters from the
remaining input until the lexical analyzer
can find a well-formed token.
This technique may occasionally
confuse the parser, but in an
interactive computing environment
it may be quite adequate.
Other possible error-recovery actions
Delete one extraneous character
from the remaining input.
Insert a missing character into the
remaining input.
Replace a character by another
character.
Transpose two adjacent characters.
Lexical Error and Recovery
Error detection
Error reporting
Error recovery
Delete the current character and
restart scanning at the next character
Delete the first character read by the
scanner and resume scanning at the
character following it.
How about runaway strings and
comments?
Specification of Tokens
Regular expressions are an important notation
for specifying lexeme patterns. While they
cannot express all possible patterns, they are
very effective in specifying those types of
patterns that we actually need for tokens.
Input Buffering
The LA reads the source program character by character from secondary storage. Reading the input character by character is
costly.
Therefore a block of data is first read into a buffer and then scanned by the LA.
Input buffering is used in order to move back and forth in the input stream efficiently.
There are various methods by which buffering may be done:
One Buffer Scheme
Two Buffer Scheme
One Buffer Scheme: If a lexeme crosses the boundary of the buffer, the buffer has to be refilled in order to scan the
rest of the lexeme, and the first part of the lexeme is overwritten in the process.
Buffer Pairs
Two buffers of the same size, say 4096, are alternately
reloaded.
Two pointers to the input are maintained:
Pointer lexeme_Begin marks the beginning of the
current lexeme.
The forward pointer scans ahead until a pattern
match is found.
The string of characters between the two pointers is the
current lexeme. Initially both pointers point to the
beginning of the lexeme.
The forward pointer (fp) then scans ahead until a match for
a pattern is found. When the end of the lexeme is
identified, fp is set to the character at its right end, and
the token and attribute corresponding to this
lexeme are returned.
After that, both pointers lexeme_begin (lb) and fp are set
to the character immediately past the lexeme, i.e. the
beginning of the next token.
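The two-pointer discipline can be shown in miniature. The sketch below (`current_lexeme` is an invented name) ignores buffer reloading and only illustrates how the text between lexeme_begin and the position reached by the forward scan is the current lexeme.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Extract the identifier lexeme delimited by lexeme_begin and forward.
 * Returns its length and copies it into out. Illustrative only: a real
 * scanner would also wrap the pointers across two buffer halves. */
static int current_lexeme(const char *buf, int lexeme_begin, char *out) {
    int forward = lexeme_begin;
    while (isalnum((unsigned char)buf[forward]))  /* forward scans ahead */
        forward++;
    int len = forward - lexeme_begin;             /* lexeme = text between pointers */
    memcpy(out, buf + lexeme_begin, len);
    out[len] = '\0';
    return len;
}
```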
Buffering
Input Buffering:
Some efficiency issues concerned with the buffering of input.
A two-buffer input scheme that is useful when lookahead on the input is necessary to
identify tokens.
Techniques for speeding up the lexical analyser, such as the use of sentinels to mark the
buffer end.
There are three general approaches to the implementation of a lexical analyser:
1. Use a lexical-analyser generator, such as the Lex compiler, to produce the lexical analyser from a
regular-expression-based specification. In this case, the generator provides routines for reading and
buffering the input.
2. Write the lexical analyser in a conventional systems-programming language, using the I/O
facilities of that language to read the input.
3. Write the lexical analyser in assembly language and explicitly manage the reading of input.
Buffer pairs:
Because a large amount of time can be consumed moving characters, specialized
buffering techniques have been developed to reduce the amount of overhead required to
process an input character.
The scheme to be discussed:
Consists of a buffer divided into two N-character halves.
N = number of characters in one disk block, e.g., 1024 or 4096.
Read N characters into each half of the buffer with one system read command.
If fewer than N characters remain in the input, then eof is read into the buffer
after the input characters.
Two pointers to the input buffer are maintained.
The string of characters between two pointers is the current lexeme.
Initially both pointers point to the first character of the next lexeme to be
found.
The forward pointer scans ahead until a match for a pattern is found.
Once the next lexeme is determined, the forward pointer is set to the character at its
right end.
If the forward pointer is about to move past the halfway mark, the right half is
filled with N new input characters.
If the forward pointer is about to move past the right end of the buffer, the left
half is filled with N new characters and the forward pointer wraps around to the
beginning of the buffer.
Disadvantage of this scheme:
This scheme works well most of the time, but the amount of lookahead is
limited.
This limited lookahead may make it impossible to recognize tokens in
situations where the distance that the forward pointer must travel is more than the
length of the buffer.
For example: DECLARE ( ARG1, ARG2, ..., ARGn ) in a PL/1 program:
we cannot determine whether DECLARE is a keyword or an array name until we see
the character that follows the right parenthesis.
Whenever we advance fp we must check whether
we have reached the end of one buffer or not as in
that case we have to refill the other buffer before
advancing fp. So we have to make three tests.
1. To check whether the first buffer is full or not.
2. To check whether the second buffer is full or not.
3. To check the end of the input.
if forward at end of first half then begin
    reload second half;
    forward := forward + 1
end
else if forward at end of second half then begin
    reload first half;
    move forward to beginning of first half
end
else if forward = eof then
    terminate scanning
else
    forward := forward + 1
Sentinels
We can reduce these three tests to one by
introducing a special character, called a
sentinel, at the end. Let us choose eof as
the sentinel at the end of each buffer.
[Buffer layout for the input E = M * C * * 2, with an eof sentinel at the end
of each half and a final eof marking the true end of input:
| E | = | M | * | eof | C | * | * | 2 | eof | ... | eof |]
Sentinels
In the previous scheme, each time we move the forward pointer we must check that
we have not moved off one half of the buffer. If we have, then we must reload
the other half.
Therefore the ends of the buffer halves require two tests for each advance of the
forward pointer.
We can reduce the two tests to one if we extend each buffer half to hold a
sentinel character at the end.
The sentinel is a special character that cannot be part of the source program
(the eof character is used as the sentinel).
With this scheme, most of the time the scanner performs only one test, to see
whether forward points to an eof.
Only when it reaches the end of a buffer half or the real end of input does it
perform more tests.
Since N input characters are encountered between eofs, the average number of
tests per input character is very close to 1.
forward := forward + 1;
if forward↑ = eof then begin
    if forward at end of first half then begin
        reload second half;
        forward := forward + 1
    end
    else if forward at end of second half then begin
        reload first half;
        move forward to beginning of first half
    end
    else  (* eof within a buffer marks the end of input *)
        terminate lexical analysis
end
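A C rendering of the sentinel scheme may help. Everything here is illustrative: the names (`reload`, `advance`), the half size N (unrealistically small so the sketch is easy to trace), and the choice of NUL as a stand-in for the eof sentinel are all assumptions of the sketch, which therefore requires that the source text contain no NUL bytes.

```c
#include <assert.h>
#include <string.h>

#define N 4            /* half-buffer size; tiny for illustration only */
#define SENTINEL '\0'  /* this sketch uses NUL as the eof sentinel */

static char buf[2 * (N + 1)];   /* two halves, each followed by a sentinel slot */
static const char *input;       /* remaining source text */
static int forward;             /* index of the forward pointer in buf */

/* Fill the half starting at `start` with up to N input characters,
 * then place the sentinel after them. */
static void reload(int start) {
    int n = strlen(input) < N ? (int)strlen(input) : N;
    memcpy(buf + start, input, n);
    input += n;
    buf[start + n] = SENTINEL;
}

/* Advance forward and return the character it now points at. Most characters
 * need only the single sentinel comparison; only on a sentinel do we perform
 * the extra end-of-half tests. */
static char advance(void) {
    forward++;
    if (buf[forward] == SENTINEL) {
        if (forward == N) {                 /* end of first half */
            reload(N + 1);
            forward = N + 1;
        } else if (forward == 2 * N + 1) {  /* end of second half */
            reload(0);
            forward = 0;
        }
        /* otherwise: sentinel inside a half = real end of input */
    }
    return buf[forward];
}
```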
A very long lexeme is still a problem in the two-buffer
scheme, e.g.
DECLARE(ARG1, ARG2, ..., ARGn)
(Similarly, in C++, where one function name may represent
several overloaded functions, such lookahead problems can
arise.)
3.3 Specification of Tokens
Regular expressions are an important notation for
specifying token patterns.
Study formal notations for regular expressions.
These expressions are used in lexical-analyzer generators.
Sec. 3.7 shows how to build the lexical analyzer by
converting regular expressions to automata.
1 Regular Definition of Tokens
Defined in regular expression
e.g. an identifier can be defined by the regular
grammar
id → letter (letter | digit)*
letter → A | B | ... | Z | a | b | ... | z
digit → 0 | 1 | 2 | ... | 9
An identifier can also be expressed by the following
regular expression:
(A|B|...|Z|a|b|...|z)(A|B|...|Z|a|b|...|z|0|1|2|...|9)*
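As a quick check of the definition, the sketch below (`is_identifier` is an invented name) tests whether a whole string matches letter(letter|digit)* directly in C. Note that, matching the definition above, underscore is not a letter here.

```c
#include <assert.h>
#include <ctype.h>

/* Does s match letter(letter|digit)* ? A direct encoding of the
 * regular definition above. */
static int is_identifier(const char *s) {
    if (!isalpha((unsigned char)s[0])) return 0;  /* must start with a letter */
    for (int i = 1; s[i] != '\0'; i++)
        if (!isalnum((unsigned char)s[i])) return 0;
    return 1;
}
```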
Regular expressions are an
important notation for specifying
patterns. Each pattern matches a
set of strings, so regular
expressions will serve as names
for sets of strings.
2 Regular Expression & Regular
language
Regular Expression
A notation that allows us to define a pattern in
a high level language.
Regular language
Each regular expression r denotes a language
L(r) (the set of sentences relating to the regular
expression r)
Each token in a program can be expressed in a regular expression
3 The construction rules of regular
expressions over an alphabet Σ
1) ε is a regular expression that denotes
{ε}
ε is a regular expression;
{ε} is the related regular language
2) If a is a symbol in Σ, then a is a regular
expression that denotes {a}
a is a regular expression;
{a} is the related regular language
3) Suppose α and β are regular
expressions; then α|β, αβ, (α), α*, β* are
also regular expressions
L(α|β) = L(α) ∪ L(β)
L(αβ) = L(α)L(β)
L((α)) = L(α)
L(α*) = {ε} ∪ L(α) ∪ L(α)L(α) ∪ L(α)L(α)L(α) ∪ ...
4 Algebraic laws of regular
expressions
1) α|β = β|α
2) α|(β|γ) = (α|β)|γ    (αβ)γ = α(βγ)
3) α(β|γ) = αβ | αγ
(α|β)γ = αγ | βγ
4) εα = αε = α
5) (α*)* = α*
6) α* = αα* | ε
αα* = α*α
7) (α|β)* = (α*|β*)* = (α*β*)*
8) If ε ∉ L(α), then
X = αX | β
has the unique solution X = α*β, and
X = Xα | β
has the unique solution X = βα*
Notes: We assume that the precedence
of * is the highest, the precedence of |
is the lowest, concatenation lies in
between, and all are left associative
Example: unsigned numbers such as
5280, 39.37, 6.336E4, 1.894E-4
digit → 0 | 1 | ... | 9
digits → digit digit*
optional_fraction → . digits | ε
optional_exponent → ( E (+|-|ε) digits ) | ε
num → digits optional_fraction
optional_exponent
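The regular definition of num can be hand-translated into a straight-line C scan. This sketch (`is_num` is an invented name) checks digits, then the optional fraction, then the optional exponent, in that order:

```c
#include <assert.h>
#include <ctype.h>

/* Does s match digits (. digits)? (E (+|-)? digits)? -- the regular
 * definition of num above, encoded as a straight-line scan. */
static int is_num(const char *s) {
    int i = 0;
    if (!isdigit((unsigned char)s[i])) return 0;   /* digits */
    while (isdigit((unsigned char)s[i])) i++;
    if (s[i] == '.') {                             /* optional_fraction */
        i++;
        if (!isdigit((unsigned char)s[i])) return 0;
        while (isdigit((unsigned char)s[i])) i++;
    }
    if (s[i] == 'E') {                             /* optional_exponent */
        i++;
        if (s[i] == '+' || s[i] == '-') i++;
        if (!isdigit((unsigned char)s[i])) return 0;
        while (isdigit((unsigned char)s[i])) i++;
    }
    return s[i] == '\0';                           /* whole string must match */
}
```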
5 Notational Short-hands
a) One or more instances
(r)+      e.g. digit+
b) Zero or one instance
r? is a shorthand for r|ε
e.g. optional_exponent can be written ( E (+|-)? digits )?
c) Character classes
[a-z] denotes a|b|c|...|z
similarly [A-Za-z], [A-Za-z0-9]
3.4 Recognition of Tokens
1 The task of recognition of tokens in a
lexical analyzer:
Isolate the lexeme for the next token
in the input buffer
Produce as output a pair consisting of
the appropriate token and attribute value, such as <id, pointer to table
entry>, using the translation table
given in the figure on the next page
Regular expression | Token | Attribute value
-------------------+-------+------------------------
if                 | if    | -
id                 | id    | pointer to table entry
<                  | relop | LT
2 Methods for recognition of tokens
Use Transition Diagram
3 Transition Diagram (stylized
flowchart)
Depicts the actions that take place
when a lexical analyzer is called by the
parser to get the next token
Fragment of the diagram for > and >= :
start → (0) --'>'--> (6)
(6) --'='--> (7)      return(relop, GE)
(6) --other--> (8)*   return(relop, GT)
Notes: Here we use * to indicate states on which input
retraction must take place
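The starred state can be made concrete with a tiny C sketch (`relop_after_gt` and the `retract` flag are invented for illustration): after seeing '>', we examine one character of lookahead; if it is not '=', that character belongs to the next token and must be retracted.

```c
#include <assert.h>

enum { GT, GE };  /* attribute values for relop in this sketch */

/* Simulate the diagram fragment above: after '>' has been read, look at one
 * more character; '=' accepts >= (GE), anything else accepts > (GT) and the
 * extra character must be pushed back (the starred state). */
static int relop_after_gt(char lookahead, int *retract) {
    if (lookahead == '=') { *retract = 0; return GE; }
    *retract = 1;            /* starred state: retract the lookahead */
    return GT;
}
```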
4 Implementing a Transition Diagram
Each state gets a segment of code
If there are edges leaving a state, then
its code reads a character and selects
an edge to follow, if possible
Use nextchar() to read next character
from the input buffer
while (1) {
    switch (state) {
    case 0: c = nextchar();
        if (c == ' ' || c == '\t' || c == '\n') {
            state = 0; lexeme_beginning++;
        }
        else if (c == '<') state = 1;
        else if (c == '=') state = 5;
        else if (c == '>') state = 6;
        else state = fail();
        break;
    case 9: c = nextchar();
        if (isletter(c)) state = 10;
        else state = fail();
        break;
    }
}
5 A generalized transition diagram:
Finite Automaton
Deterministic or non-deterministic FA
Non-deterministic means that more
than one transition out of a state may
be possible on the same input
symbol
6 The model of recognition of
tokens
[Figure: the input buffer (e.g. holding "d 2 ="), with the lexeme_beginning
pointer marking the start of the current lexeme, feeding an FA simulator.]
e.g. The FA simulator for identifiers is:
(1) --letter--> (2)    (2) --letter--> (2)    (2) --digit--> (2)
which represents the rule:
identifier = letter (letter | digit)*
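A table-driven simulator for this automaton might look as follows in C. The names `id_step` and `id_accepts` and the state numbering are choices made for the sketch: the slide's states 1 and 2 map onto states 0 (start) and 1 (accepting) here, with -1 as a dead state.

```c
#include <assert.h>
#include <ctype.h>

/* One step of the identifier automaton.
 * States: 0 = start, 1 = in identifier (accepting), -1 = dead. */
static int id_step(int state, char c) {
    /* character classes: 0 = letter, 1 = digit, 2 = other */
    int cls = isalpha((unsigned char)c) ? 0
            : isdigit((unsigned char)c) ? 1 : 2;
    static const int delta[2][3] = {
        /* letter digit other */
        {  1,    -1,   -1 },   /* state 0: start */
        {  1,     1,   -1 },   /* state 1: accepting */
    };
    return state < 0 ? -1 : delta[state][cls];
}

/* Run the simulator over the whole string; accept iff we end in state 1. */
static int id_accepts(const char *s) {
    int state = 0;
    for (int i = 0; s[i] != '\0'; i++)
        state = id_step(state, s[i]);
    return state == 1;
}
```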