Lexical Analysis

The document discusses lexical analysis which is the process of reading input characters and producing tokens. It defines tokens, patterns, and lexemes. It describes the role of the lexical analyzer in sending tokens to the parser and mentions why lexical analysis is separated from parsing for simpler design, compiler efficiency, and portability. It also discusses lexical errors, error recovery techniques, and specification of tokens using regular expressions.


Lexical Analysis

Prepared By: Dr. D. P. Singh


Lexical Analysis (Token, Pattern and Lexeme)
• Lexical analyzer: reads the input characters and produces a sequence of tokens
as output (nextToken()); it tries to understand each element in a program.
• Token: a group of characters having a collective meaning, represented as a pair
<Token_Name, Attribute>.
• Tokens are the smallest meaningful units; they are defined by the programming
language, specified by regular expressions, and may be separated by spaces.
• A pattern is a description of the form that the lexemes of a token may take.
• A lexeme is a particular instance of a token.
e.g.
const int pi=3.14;
Tokens: <KEYWORD, const> <KEYWORD, int> <ID, pi> <OP, => <CONSTANT, 3.14> <SEPARATOR, ;>
Example of Non Tokens
• Comments
• Whitespaces
• Preprocessor Directives
• Macros
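As a minimal sketch (in Python, which the slides do not prescribe), the statement above can be tokenized with the kind of regular-expression patterns a lexical analyzer uses. The token names mirror the slide, the patterns are illustrative, and the SKIP rule discards the non-tokens listed above (whitespace and comments):

```python
import re

# Token patterns, tried in order; names mirror the slide (illustrative).
TOKEN_SPEC = [
    ("KEYWORD",   r"\b(?:const|int|float|if|else)\b"),
    ("CONSTANT",  r"\d+\.\d+|\d+"),
    ("ID",        r"[A-Za-z_][A-Za-z0-9_]*"),
    ("OP",        r"="),
    ("SEPARATOR", r";"),
    ("SKIP",      r"\s+|//[^\n]*"),   # non-tokens: whitespace, comments
]
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))

def tokenize(source):
    """Yield <Token_Name, Lexeme> pairs, discarding non-tokens."""
    for m in MASTER.finditer(source):
        if m.lastgroup != "SKIP":
            yield (m.lastgroup, m.group())

print(list(tokenize("const int pi=3.14;")))
# [('KEYWORD', 'const'), ('KEYWORD', 'int'), ('ID', 'pi'),
#  ('OP', '='), ('CONSTANT', '3.14'), ('SEPARATOR', ';')]
```

The output reproduces the token sequence shown in the slide's example.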
Role of Lexical Analyzer

                              sendToken
Source Program --> Lexical Analyzer ----------->  Parser --> to semantic
                                    <-----------             analysis
                              getNextToken

        (both the Lexical Analyzer and the Parser use the Symbol Table)
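The getNextToken interaction in the diagram can be sketched as a pull-style interface, where the parser requests one token at a time (a Python illustration; the function names mirror the diagram, not any real API):

```python
# Sketch of the parser-driven interface: the parser repeatedly calls
# get_next_token(), and the lexer hands back one token per call.
def make_lexer(tokens):
    it = iter(tokens)
    def get_next_token():
        # Return a sentinel EOF token once the input is exhausted.
        return next(it, ("EOF", ""))
    return get_next_token

get_next_token = make_lexer([("KEYWORD", "const"), ("ID", "pi")])
print(get_next_token())  # ('KEYWORD', 'const')
print(get_next_token())  # ('ID', 'pi')
print(get_next_token())  # ('EOF', '')
```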
Why separate Lexical Analysis and Parsing?
• Simpler design.
• Separation allows the simplification of one or the other.
• Example: a parser that also had to handle comments and whitespace
would be more complex.
• Compiler efficiency is improved.
• Lexical analysis can be optimized separately, because a large amount of
time is spent reading the source program and partitioning it into tokens.
• Compiler portability is enhanced.
• Input alphabet peculiarities and other device-specific anomalies
can be restricted to the lexical analyzer.
Lexical Errors
• Some errors are beyond the power of the lexical analyzer to recognize:
• fi(a==b)…… (fi is a valid identifier, so this misspelling of if is not a lexical error)
• However, it may be able to recognize errors like:
• a=2b
• Such errors are recognized when no pattern for a token matches the character
sequence, for example:
➢Exceeding the length limit of identifiers or numeric constants.
➢Appearance of illegal characters.
➢Unmatched strings or comments.
• e.g.
printf("Compiler Design");$
• This is a lexical error, since the illegal character $ appears at the end of the statement.
This is a comment */
• This is a lexical error, since the end-of-comment marker is present but the beginning is not.
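The detection rule above (report an error when no token pattern matches at the current position) can be sketched as follows; the patterns are illustrative, and the scanner stops at the illegal `$`:

```python
import re

# Minimal error check: if no token pattern matches at the current
# position, report a lexical error (patterns are illustrative).
PATTERNS = [r"[A-Za-z_][A-Za-z0-9_]*", r"\d+(?:\.\d+)?", r"[=;()]", r"\s+"]
MASTER = re.compile("|".join(PATTERNS))

def scan(source):
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if not m:
            # No pattern matches here: this is a lexical error.
            raise SyntaxError(f"illegal character {source[pos]!r} at {pos}")
        pos = m.end()
    return "ok"

print(scan("x = 1;"))   # ok
try:
    scan("x = 1;$")
except SyntaxError as e:
    print(e)            # illegal character '$' at 6
```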
Error Recovery
• Panic Mode Recovery
• In this method, successive characters are removed from the input,
one at a time, until a designated set of synchronizing tokens is
found. Synchronizing tokens are delimiters such as ; or }.
• Advantage: it is easy to implement and is guaranteed not to go into
an infinite loop.
• Disadvantage: a considerable amount of input is skipped
without checking it for additional errors.
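Panic-mode recovery can be sketched in a few lines (a Python illustration of the idea, not a prescribed implementation): starting at the error position, characters are dropped one at a time until a synchronizing delimiter is reached.

```python
# Panic-mode sketch: on a lexical error, discard characters until a
# synchronizing delimiter (';' or '}') is found, then resume after it.
SYNC = {";", "}"}

def panic_mode_skip(source, error_pos):
    """Return the position just past the next synchronizing token."""
    pos = error_pos
    while pos < len(source) and source[pos] not in SYNC:
        pos += 1                      # drop one character at a time
    return min(pos + 1, len(source))  # guaranteed to terminate at end of input

src = "a = 2@@b; c = 3;"
print(panic_mode_skip(src, 5))  # 9 (position just after the ';')
```

Note how the bounded loop makes the guarantee from the slide concrete: scanning can never loop forever, but everything between the error and the delimiter goes unchecked.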
Specification of Tokens
• In the theory of compilation, regular expressions are used to formalize the
specification of tokens.
• Regular expressions are a means of specifying regular languages.
• e.g. the pattern for identifiers:
letter(letter|digit)*

• Each regular expression is a pattern specifying the form of strings
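The identifier pattern letter(letter|digit)* can be rendered directly as a concrete regular expression (here in Python, with letter taken as [A-Za-z] and digit as [0-9], an assumption about the alphabet):

```python
import re

# letter(letter|digit)* as a Python regex; \Z anchors at end of string
# so the whole input must match the pattern.
IDENT = re.compile(r"[A-Za-z][A-Za-z0-9]*\Z")

for s in ["pi", "x1", "1x", "count2"]:
    print(s, bool(IDENT.match(s)))
# pi True / x1 True / 1x False / count2 True
```

As the slide's a=2b example suggests, 1x is rejected because an identifier may not begin with a digit.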


Thank You
