CS606
ASSIGNMENT NO 1
NAME: NARMEEN SHAHID
VU ID: BC230428551
Question 1:
Explain the role of a lexical analyzer in a compiler. What are the different phases it
involves?
Use a small code snippet in C to illustrate how the lexical analyzer breaks down the
source code into tokens.
Solution:
A lexical analyzer (also known as a lexer or scanner) is the first phase of a compiler that
processes the source code and converts it into a sequence of tokens. These tokens are the
basic building blocks of the source code, such as keywords, operators, identifiers, literals, and
punctuation. The primary role of the lexical analyzer is to scan the input source code and
group characters into meaningful units that can be further processed by the parser in later
stages of compilation.
Role of a Lexical Analyzer in a Compiler:
1. Reading Input: It reads the raw source code, which is typically in the form of text,
character by character.
2. Tokenization: The lexical analyzer splits the input into a series of tokens. A token is
a meaningful sequence of characters, often associated with a specific type (such as a
keyword, identifier, operator, etc.).
3. Classification: It classifies each token into a specific type (e.g., keyword, identifier,
operator, literal, etc.), which helps the parser understand the structure of the code.
4. Handling Whitespace and Comments: It discards whitespace (spaces, tabs,
newlines) and comments. They aid human readability but are not needed for syntax
analysis, so they are not passed on to the parser.
5. Error Detection: If the lexical analyzer encounters a character or sequence of
characters that does not match any valid token pattern, it reports a lexical error.
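The whitespace and comment handling in item 4 can be sketched in C as follows. This is a minimal illustration, not how any particular compiler does it; the function name skip_ignorable is hypothetical, and an unterminated block comment is simply treated as running to the end of the input rather than being reported as an error.

```c
#include <ctype.h>

/* Hypothetical helper: advance past whitespace and C comments.
   Returns a pointer to the first character that could start a token,
   or to the terminating '\0' if nothing else remains. */
const char *skip_ignorable(const char *p)
{
    for (;;) {
        if (isspace((unsigned char)*p)) {
            p++;                                  /* space, tab, newline */
        } else if (p[0] == '/' && p[1] == '/') {
            while (*p != '\0' && *p != '\n')      /* line comment */
                p++;
        } else if (p[0] == '/' && p[1] == '*') {
            p += 2;                               /* block comment body */
            while (*p != '\0' && !(p[0] == '*' && p[1] == '/'))
                p++;
            if (*p != '\0')
                p += 2;                           /* consume the closing marker */
        } else {
            return p;                             /* start of a token, or end */
        }
    }
}
```

A scanner would call such a helper before recognizing each token, so the parser never sees the skipped characters.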
Phases of a Lexical Analyzer:
1. Input Buffering: The source code is read into a buffer to allow efficient scanning.
2. Lexeme Recognition: A lexeme is a sequence of characters in the source code that
matches a regular expression or pattern for a token. The lexical analyzer identifies
lexemes and groups them into tokens.
3. Token Classification: Each lexeme is classified into a specific token type based on its
pattern (e.g., keywords, identifiers, operators, etc.).
4. Output Tokens: The analyzer outputs a sequence of tokens to the parser, often along
with additional information like the lexeme's value or position in the source code.
5. Error Reporting: If an invalid lexeme is encountered, the lexical analyzer reports an
error.
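Phases 3 and 4 above (classification and output) can be illustrated with a small token record. The sketch below is only an assumption about how such a record might look: the names TokenType, Token, KEYWORDS, classify_word and make_token are hypothetical, and the keyword table covers only a small subset of C's keywords.

```c
#include <string.h>

/* Token categories, matching the types used in this answer. */
typedef enum { TOK_KEYWORD, TOK_IDENTIFIER, TOK_LITERAL,
               TOK_OPERATOR, TOK_PUNCTUATION, TOK_ERROR } TokenType;

/* One output record: the category, the matched lexeme, and its line. */
typedef struct {
    TokenType type;
    char lexeme[32];
    int line;
} Token;

/* Hypothetical keyword table; only a small subset of C's keywords. */
static const char *const KEYWORDS[] = { "int", "if", "else", "while", "return" };

/* Classify a word-shaped lexeme: keyword if it is in the table,
   identifier otherwise. The comparison is case-sensitive, as in C. */
TokenType classify_word(const char *lexeme)
{
    size_t n = sizeof KEYWORDS / sizeof KEYWORDS[0];
    for (size_t i = 0; i < n; i++)
        if (strcmp(lexeme, KEYWORDS[i]) == 0)
            return TOK_KEYWORD;
    return TOK_IDENTIFIER;
}

/* Build the record handed to the parser; long lexemes are truncated. */
Token make_token(TokenType type, const char *lexeme, int line)
{
    Token t;
    t.type = type;
    t.line = line;
    strncpy(t.lexeme, lexeme, sizeof t.lexeme - 1);
    t.lexeme[sizeof t.lexeme - 1] = '\0';
    return t;
}
```

For example, classify_word("int") yields TOK_KEYWORD, while classify_word("Int") yields TOK_IDENTIFIER because the lookup is case-sensitive.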
Breaking Down C Code into Tokens:
Simple C code snippet:
int main() {
    int x = 10;
    x = x + 5;
    return 0;
}
Lexeme    Token Type     Description
int       Keyword        Indicates a data type.
main      Identifier     Name of the function.
(         Punctuation    Opening parenthesis, part of function declaration.
)         Punctuation    Closing parenthesis, part of function declaration.
{         Punctuation    Opening curly brace, start of function body.
int       Keyword        Indicates a data type.
x         Identifier     Variable name.
=         Operator       Assignment operator.
10        Literal        Integer literal value.
;         Punctuation    Semicolon, statement terminator.
x         Identifier     Variable name.
=         Operator       Assignment operator.
x         Identifier     Variable name.
+         Operator       Addition operator.
5         Literal        Integer literal value.
;         Punctuation    Semicolon, statement terminator.
return    Keyword        Return statement in function.
0         Literal        Integer literal value.
;         Punctuation    Semicolon, statement terminator.
}         Punctuation    Closing curly brace, end of function body.
Question 2:
Consider the following code snippet. Identify the lexemes and corresponding tokens for
each line.
int x = 20;
if (x > 10) {
    x = x + 5;
}
Solution:
Breakdown of Lexemes and Tokens:
Line                     Lexeme    Token Type     Description
Line 1: int x = 20;      int       Keyword        A keyword indicating the data type int.
                         x         Identifier     An identifier representing a variable name.
                         =         Operator       The assignment operator.
                         20        Literal        An integer literal value.
                         ;         Punctuation    Statement terminator (semicolon).
Line 2: if (x > 10) {    if        Keyword        A keyword that introduces a conditional statement.
                         (         Punctuation    Opening parenthesis for the conditional expression.
                         x         Identifier     An identifier representing a variable name.
                         >         Operator       The greater-than operator.
                         10        Literal        An integer literal value.
                         )         Punctuation    Closing parenthesis for the conditional expression.
                         {         Punctuation    Opening curly brace, indicating the start of the block.
Line 3: x = x + 5;       x         Identifier     An identifier representing a variable name.
                         =         Operator       The assignment operator.
                         x         Identifier     An identifier representing a variable name.
                         +         Operator       The addition operator.
                         5         Literal        An integer literal value.
                         ;         Punctuation    Statement terminator (semicolon).
Line 4: }                }         Punctuation    Closing curly brace, indicating the end of the block.
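As a quick cross-check of the breakdown, the number of tokens on each line can be counted mechanically. The helper below is a hypothetical sketch (the name count_tokens is not from the question) and assumes every operator and punctuation mark is a single character, which holds for this snippet but not for C in general (for example >= or ==).

```c
#include <ctype.h>

/* Count the tokens on one line of source text.
   Assumption: operators and punctuation are single characters. */
int count_tokens(const char *line)
{
    int count = 0;
    const char *p = line;

    while (*p != '\0') {
        if (isspace((unsigned char)*p)) {
            p++;                                   /* whitespace is not a token */
        } else if (isalpha((unsigned char)*p) || *p == '_') {
            while (isalnum((unsigned char)*p) || *p == '_')
                p++;                               /* keyword or identifier */
            count++;
        } else if (isdigit((unsigned char)*p)) {
            while (isdigit((unsigned char)*p))
                p++;                               /* integer literal */
            count++;
        } else {
            p++;                                   /* operator or punctuation */
            count++;
        }
    }
    return count;
}
```

Under these assumptions the four lines of the snippet yield 5, 7, 6 and 1 tokens respectively, and the counts do not depend on how the line is spaced.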