Document 4: Compiler Design – Lexical Analysis Notes

1. Introduction to Lexical Analysis

First phase of compiler.

Converts the source code into a stream of tokens.

Removes whitespace, comments.

Passes token stream to syntax analysis.

2. Role of Lexical Analyzer

Reads input characters.

Groups them into lexemes.

Produces tokens (token name + attribute value).


Reports lexical errors.
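Sketched below, in C, is one way this interface is often organized; the enum values, struct fields, and the get_next_token name are illustrative, not a standard API.

enum token_name { TOK_KEYWORD, TOK_IDENTIFIER, TOK_INTEGER,
                  TOK_OPERATOR, TOK_DELIMITER, TOK_EOF };

struct token {
    enum token_name name;   /* token category (token name)      */
    char lexeme[64];        /* matched string / attribute value */
};

/* The syntax analyzer calls this repeatedly to pull one token at a time. */
struct token get_next_token(void);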

3. Tokens, Lexemes, and Patterns

Token: Category of lexemes (e.g., keyword, identifier).

Lexeme: Actual string in source (e.g., int, main).

Pattern: Rule that defines a token (e.g., [a-zA-Z][a-zA-Z0-9]* for identifiers).

4. Example

Source Code:

int sum = a + b;

Tokens:

1. Keyword: int
2. Identifier: sum
3. Operator: =
4. Identifier: a
5. Operator: +
6. Identifier: b
7. Delimiter: ;
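The following self-contained C sketch shows how a hand-written scanner could produce exactly this token list; the printed labels mirror the list above, and the code assumes only the single keyword int for simplicity.

#include <stdio.h>
#include <ctype.h>
#include <string.h>

/* Scan a fixed input string and print one token per line. */
int main(void) {
    const char *src = "int sum = a + b;";
    const char *p = src;

    while (*p != '\0') {
        if (isspace((unsigned char)*p)) {        /* skip whitespace */
            p++;
        } else if (isalpha((unsigned char)*p)) { /* identifier or keyword */
            char lexeme[64];
            int n = 0;
            while (isalnum((unsigned char)*p) && n < 63)
                lexeme[n++] = *p++;
            lexeme[n] = '\0';
            if (strcmp(lexeme, "int") == 0)
                printf("Keyword: %s\n", lexeme);
            else
                printf("Identifier: %s\n", lexeme);
        } else if (*p == '=' || *p == '+') {     /* operators */
            printf("Operator: %c\n", *p++);
        } else if (*p == ';') {                  /* delimiter */
            printf("Delimiter: %c\n", *p++);
        } else {                                 /* anything else is a lexical error */
            printf("Lexical error: %c\n", *p++);
        }
    }
    return 0;
}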

5. Finite Automata in Lexical Analysis

Regular expressions are used to describe token patterns.

An NFA (Nondeterministic Finite Automaton) built from the regular expressions is converted to a DFA (Deterministic Finite Automaton) for efficient scanning.

The resulting DFA is used to implement the lexical analyzer, typically as a transition table (see the sketch below).
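As a concrete sketch, the table-driven C code below simulates a DFA for the identifier pattern [a-zA-Z][a-zA-Z0-9]*; the state names and character classes are chosen for the example.

#include <ctype.h>
#include <stdio.h>

/* States: 0 = start, 1 = in identifier (accepting), 2 = dead (reject). */
enum { START = 0, IN_ID = 1, DEAD = 2 };

/* Character classes: 0 = letter, 1 = digit, 2 = other. */
static int char_class(char c) {
    if (isalpha((unsigned char)c)) return 0;
    if (isdigit((unsigned char)c)) return 1;
    return 2;
}

/* Transition table: next_state[state][class]. */
static const int next_state[3][3] = {
    /* letter  digit   other */
    {  IN_ID,  DEAD,   DEAD },  /* START */
    {  IN_ID,  IN_ID,  DEAD },  /* IN_ID */
    {  DEAD,   DEAD,   DEAD },  /* DEAD  */
};

/* Returns 1 if the whole string is a valid identifier. */
static int is_identifier(const char *s) {
    int state = START;
    for (; *s != '\0'; s++)
        state = next_state[state][char_class(*s)];
    return state == IN_ID;      /* accept only in the accepting state */
}

int main(void) {
    printf("%d\n", is_identifier("sum1"));  /* 1: matches the pattern */
    printf("%d\n", is_identifier("1sum"));  /* 0: cannot start with a digit */
    return 0;
}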


6. Regular Expressions Examples

Identifier: [a-zA-Z][a-zA-Z0-9]*

Integer Constant: [0-9]+

Whitespace: ( " " | \t | \n )+
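These patterns can be tried out directly with the POSIX regex library in C; the short sketch below tests two lexemes against the identifier pattern (the ^ and $ anchors are added so the whole string must match).

#include <regex.h>
#include <stdio.h>

int main(void) {
    regex_t re;
    /* Identifier pattern from above, anchored to the whole string. */
    if (regcomp(&re, "^[a-zA-Z][a-zA-Z0-9]*$", REG_EXTENDED) != 0)
        return 1;

    /* regexec returns 0 on a match. */
    printf("sum -> %s\n", regexec(&re, "sum", 0, NULL, 0) == 0 ? "identifier" : "no match");
    printf("9ab -> %s\n", regexec(&re, "9ab", 0, NULL, 0) == 0 ? "identifier" : "no match");

    regfree(&re);
    return 0;
}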

7. Lexical Errors

Unrecognized symbols.

Unterminated strings/comments.

Illegal identifiers.

Error Handling:

Panic mode recovery: skip characters until a valid token can be formed, then resume scanning (a sketch follows below).
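A minimal sketch of panic-mode recovery in C: the scanner reports and discards characters until it reaches one that can begin a valid token; the can_start_token helper is a simplified, illustrative check.

#include <ctype.h>
#include <stdio.h>

/* Illustrative check: can this character begin a valid token? */
static int can_start_token(char c) {
    return isalpha((unsigned char)c) || isdigit((unsigned char)c) ||
           c == '=' || c == '+' || c == ';';
}

/* Panic-mode recovery: skip characters until a valid token can start. */
static const char *recover(const char *p) {
    while (*p != '\0' && !can_start_token(*p)) {
        printf("Lexical error: skipping '%c'\n", *p);
        p++;
    }
    return p;   /* scanning resumes here */
}

int main(void) {
    const char *p = recover("@#sum");
    printf("resume at: %s\n", p);   /* prints "resume at: sum" */
    return 0;
}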

8. Symbol Table

Stores identifiers with attributes (type, scope, memory location).

Used by later phases of the compiler.
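A minimal array-based symbol table sketch in C, storing an identifier together with a few typical attributes; production compilers usually use hash tables and per-scope tables, and all names here are illustrative.

#include <stdio.h>
#include <string.h>

struct symbol {
    char name[64];   /* identifier lexeme        */
    char type[16];   /* e.g. "int"               */
    int  scope;      /* scope level              */
    int  offset;     /* memory location / offset */
};

static struct symbol table[100];
static int count = 0;

/* Look up an identifier by name; returns its index or -1 if not found. */
static int lookup(const char *name) {
    for (int i = 0; i < count; i++)
        if (strcmp(table[i].name, name) == 0)
            return i;
    return -1;
}

/* Insert an identifier; later phases fill in or read its attributes. */
static int insert(const char *name, const char *type, int scope, int offset) {
    if (count >= 100 || lookup(name) != -1)
        return -1;                   /* table full or already present */
    strcpy(table[count].name, name);
    strcpy(table[count].type, type);
    table[count].scope = scope;
    table[count].offset = offset;
    return count++;                  /* index serves as the token's attribute value */
}

int main(void) {
    insert("sum", "int", 0, 0);
    printf("sum is entry %d\n", lookup("sum"));
    return 0;
}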

9. Lex Tool

Lex (or Flex) is a tool to automatically generate lexical analyzers.

Input: a set of token definitions (regular expressions with associated actions).

Output: a C program (lex.yy.c) that recognizes and returns tokens.
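For illustration, a small Flex specification covering the token classes used in these notes might look like the sketch below (save as scanner.l, run flex scanner.l, then compile the generated lex.yy.c with a C compiler); the keyword and operator sets shown are assumptions made for the example.

%{
#include <stdio.h>
%}

%%
"int"|"float"|"return"     { printf("Keyword: %s\n", yytext); }
[a-zA-Z][a-zA-Z0-9]*       { printf("Identifier: %s\n", yytext); }
[0-9]+                     { printf("Integer: %s\n", yytext); }
[=+\-*/]                   { printf("Operator: %s\n", yytext); }
[;,(){}]                   { printf("Delimiter: %s\n", yytext); }
[ \t\n]+                   { /* skip whitespace */ }
.                          { printf("Lexical error: %s\n", yytext); }
%%

int yywrap(void) { return 1; }

int main(void) {
    yylex();
    return 0;
}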


10. Conclusion

Lexical analysis is the foundation phase of compiler design. It simplifies input by converting
raw code into meaningful tokens, which allows further phases like syntax and semantic
analysis to work effectively.
