Lexical Analysis

[Figure: interaction between the lexical analyzer and the parser. The
lexical analyzer reads the source program; each time the parser calls
getNextToken, the analyzer scans ahead and sends the next token
(sendToken) back to the parser, whose output goes on to semantic
analysis. Both phases consult the symbol table.]
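The demand-driven interface in the figure can be sketched as follows. This is a minimal illustration (all names, such as make_lexer and get_next_token, are hypothetical): the parser repeatedly calls get_next_token, and the lexer returns one (token type, lexeme) pair per call, hiding white space from the parser.

```python
# Minimal sketch of the getNextToken interface (hypothetical names).
def make_lexer(source):
    pos = 0

    def get_next_token():
        nonlocal pos
        # The lexer silently skips white space so the parser never sees it.
        while pos < len(source) and source[pos].isspace():
            pos += 1
        if pos >= len(source):
            return ("EOF", "")
        ch = source[pos]
        if ch.isalpha():                      # identifier: letter(letter|digit)*
            start = pos
            while pos < len(source) and source[pos].isalnum():
                pos += 1
            return ("ID", source[start:pos])
        if ch.isdigit():                      # numeric constant
            start = pos
            while pos < len(source) and source[pos].isdigit():
                pos += 1
            return ("NUM", source[start:pos])
        pos += 1                              # single-character operator
        return ("OP", ch)

    return get_next_token

# The parser's loop: pull tokens on demand until EOF.
get_next_token = make_lexer("x = y + 42")
tokens = []
while True:
    tok = get_next_token()
    tokens.append(tok)
    if tok[0] == "EOF":
        break
# tokens is [("ID","x"), ("OP","="), ("ID","y"), ("OP","+"), ("NUM","42"), ("EOF","")]
```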
Why separate lexical analysis from parsing?
• Simpler design.
• Separation allows each phase to be simplified independently.
• Example: a parser that also had to handle comments and white
space would be more complex.
• Compiler efficiency is improved.
• Lexical analysis can be optimized on its own, which matters because a
large fraction of compilation time is spent reading the source
program and partitioning it into tokens.
• Compiler portability is enhanced.
• Input-alphabet peculiarities and other device-specific anomalies
can be confined to the lexical analyzer.
Lexical Errors
• Some errors are beyond the power of the lexical analyzer to recognize:
• fi(a==b)……
(fi is a valid lexeme: it could be a misspelled keyword if or an
undeclared function name, so the lexer cannot flag it.)
• However, it may be able to recognize errors like:
• a=2b
• Such errors are recognized when no pattern for tokens matches a character
sequence, for example:
➢Exceeding the length limit on identifiers or numeric constants.
➢Appearance of illegal characters.
➢Unmatched strings.
• e.g.
printf("Compiler Design");$
• This is a lexical error, since the illegal character $ appears at the end of the statement.
This is a comment */
• This is a lexical error, since the end-of-comment marker is present but the beginning is not.
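The "no pattern matches" rule above can be made concrete with a small pattern-table scanner. This is a sketch, not a production lexer; the patterns and the scan function are illustrative assumptions. Any character that matches no token pattern, such as the $ in the example, is reported as a lexical error.

```python
import re

# Hypothetical token patterns, tried left to right at each position.
TOKEN_RE = re.compile(r"""
    (?P<ID>     [A-Za-z_]\w* )   # identifiers
  | (?P<NUM>    \d+          )   # numeric constants
  | (?P<STRING> "[^"]*"      )   # string literals
  | (?P<OP>     [()+\-*/=;,] )   # operators and punctuation
  | (?P<WS>     \s+          )   # white space (discarded)
""", re.VERBOSE)

def scan(source):
    tokens, errors, pos = [], [], 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            # No pattern matches this character: a lexical error.
            errors.append((pos, source[pos]))
            pos += 1
            continue
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens, errors

tokens, errors = scan('printf("Compiler Design");$')
# errors reports the illegal character '$'; everything before it tokenizes cleanly.
```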
Error Recovery
• Panic Mode Recovery
• In this method, successive characters are deleted from the input,
one at a time, until a token from a designated set of synchronizing
tokens is found. Synchronizing tokens are delimiters such as ; or }.
• Advantage: it is easy to implement and is guaranteed not to enter an
infinite loop.
• Disadvantage: a considerable amount of input may be skipped
without being checked for additional errors.
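Panic-mode recovery as described above can be sketched in a few lines. The helper below is hypothetical: given the position where the error was detected, it discards characters until it reaches a synchronizing delimiter and returns the position at which scanning should resume.

```python
# Synchronizing tokens: delimiters at which scanning can safely resume.
SYNC = {";", "}"}

def panic_recover(source, pos):
    """Skip characters one at a time from the error position `pos`
    until a synchronizing delimiter is found; return the position
    just past it (or end of input), so scanning cannot loop forever."""
    while pos < len(source) and source[pos] not in SYNC:
        pos += 1                       # delete one character of input
    return min(pos + 1, len(source))   # resume after the delimiter

# Example: garbage at the start of the input is skipped up to the ';'.
src = "@#$ garbage ; x = 1"
resume = panic_recover(src, 0)
# scanning resumes in front of "x = 1"
```

Note the guaranteed termination: pos strictly increases on every step, which is exactly why panic mode cannot enter an infinite loop.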
Specification of Tokens
• In the theory of compilation, regular expressions are used to formalize
the specification of tokens.
• Regular expressions are a means of specifying regular languages.
• e.g. the identifier pattern
letter(letter|digit)*
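The pattern letter(letter|digit)* can be written directly as a Python regular expression, a small sketch of how such a specification is checked in practice (real lexer generators compile it into a deterministic finite automaton instead):

```python
import re

# letter(letter|digit)* : a letter followed by any number of letters or digits.
IDENT = re.compile(r"[A-Za-z][A-Za-z0-9]*")

print(bool(IDENT.fullmatch("count1")))   # True: starts with a letter
print(bool(IDENT.fullmatch("2b")))       # False: must not start with a digit
```

This also explains why a=2b from the error examples is suspect: 2b cannot match the identifier pattern, since an identifier must begin with a letter.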