0% found this document useful (0 votes)
40 views12 pages

Lexical Analysis

Uploaded by

Musa Gill
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
40 views12 pages

Lexical Analysis

Uploaded by

Musa Gill
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd

Lexical Analysis

1
Contents
 Lexical Analysis
 Role of the Lexical Analyzer
 Tokens, Patterns & Lexemes
 Attributes for Tokens

2
Lexical Analysis
 Typical tasks performed by the lexical analyzer

 Remove white space and comments


 Encode constants as tokens ( 45, Hello & True/False)
int x = 42;
char* str = "Hello";

 Recognize keywords (if, int, while, return, for etc)


 Recognize identifiers (int x = 10; float myVar = 20.5; )and store
identifier names in a global symbol table
Lexical Error
 A character sequence that cannot scan any valid
token is a lexical error.
Lexical Phase error can be:
 Spelling error
Exceeding the length of the identifier
Appearance of an illegal character
Note:
Lexical phase error is found during the execution of
the program.
Role of Lexical Analyzer

5
Role of Lexical Analyzer..
 Sometimes, lexical analyzers are divided into a cascade of two
processes:

 Scanning consists of simple processes that do not require tokenization


of the input, such as deletion of comments and compaction of
consecutive whitespace characters into one.

 Lexical analysis is the more complex portion, where the scanner


produces the sequence of tokens as output.

6
Lexical Analysis Vs Parsing
 There are several reasons why the analysis portion is normally
separated into lexical analysis and parsing.
 Parsing: It is the process of analyzing the grammatical structure of the
source code based on the rules of a formal grammar (usually context-
free grammar)
 The separation of lexical and syntactic analysis often allows us to
simplify at least one of these tasks.
 Lexical and syntactic separation simplifies these tasks by allowing the
lexical analyzer to focus on identifying tokens, while the parser focuses
on grammar structure. This improves compiler efficiency, as specialized
techniques can be used for each task separately.
 Compiler portability is enhanced because the lexical analyzer can
handle device-specific input peculiarities, isolating them from the main
parsing logic, making the compiler easier to adapt to different systems.
7
Tokens, Patterns & Lexemes
 A token is a sequence of characters that can be treated as a unit in
the grammar of the programming language.

Examples: Identifiers, Keywords, operators, special symbols, constants


Non Tokens: Comments, Tabs, Blanks & New Line

8
Tokens, Patterns & Lexemes
 A pattern is a description of the form that the lexemes of a token may take.
 In the case of a keyword as a token, the pattern is just the sequence of characters that
form the keyword.
Example:

For a keyword to be identified as a valid token, the pattern is the sequence of characters
that make the keyword.

For identifier to be identified as a valid token, the pattern is the predefined rules that it
must start with alphabet, followed by alphabet or a digit.

 A lexeme is a sequence of characters in the source program that matches the


pattern for a token and is identified by the lexical analyzer as an instance of that
token.
Example:
main is lexeme of type identifier(token)
(,),{,} are lexemes of type punctuation(token)
9
Tokens, Patterns & Lexemes..
 In many programming languages, the following classes cover most
or all of the tokens:

10
Attributes for Tokens
 When a lexeme is encountered in a source program, it is necessary
to keep track of other occurrences of the same lexeme.
For example, if this lexeme seen before or not.

11
Attributes for Tokens..
 Ex. The token names and associated attribute values for the
Fortran statement E = M * C2 are as follows:

<id, pointer to symbol-table entry for E>


<assign_op>
<id, pointer to symbol-table entry for M>
<mult_op>
<id, pointer to symbol-table entry for C>
<exp_op>
<number, integer value 2>

12

You might also like