Introduction to
LEXICAL ANALYSIS AND LEX PROGRAMMING
by Abhishek kr. Shaw
Aman Kumar
Ranvir kr. Yadav
WHAT IS LEX PROGRAMMING?
Lexical analysis is the first phase of the compiler process, converting source code into tokens.
It simplifies the structure of the source code for the parser by breaking the text down into manageable units.
Lex is a tool that generates such lexical analyzers automatically from regular-expression rules.
THE ROLE OF TOKENS
Tokens: a sequence of units representing keywords, operators, identifiers, constants, and symbols.
Lexemes: the actual character sequences that match a token pattern.
Common Token Types:
Keywords: (e.g., if, else, while).
Identifiers: names of variables, functions, etc. (e.g., AddNums, Res).
Literals: constant values (e.g., numbers, strings).
Operators: (e.g., +, -, *, /).
Punctuators: symbols that separate parts of the code (e.g., ;, (, )).
For example, in Res = Res + 1; the lexeme Res matches the Identifier pattern, = and + match Operator patterns, 1 is a Literal, and ; is a Punctuator.
STRUCTURE OF LEX
Three Main Sections in Lex Files:
1. Definition Section:
Optional; holds name definitions plus C headers and declarations (enclosed in %{ ... %}).
2. Rules Section:
Contains regular-expression patterns paired with actions.
When input matches a pattern, the corresponding action is executed (Lex prefers the longest match).
3. User Subroutines Section:
Optional C code; typically contains helper functions and main().
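A minimal skeleton showing how the three sections fit together (an illustrative sketch, not a program from the slides):

%{
/* 1. Definition Section: C headers, declarations, and name definitions */
#include <stdio.h>
%}
DIGIT [0-9]

%%
 /* 2. Rules Section: each line pairs a pattern with an action */
{DIGIT}+   { printf("number: %s\n", yytext); }
.|\n       { /* ignore everything else */ }
%%

/* 3. User Subroutines Section: helper functions and main() */
int main(void) { return yylex(); }
int yywrap(void) { return 1; }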
CODE EXAMPLE: LEX PROGRAM STRUCTURE
Input Code Example:
123 + varName
Expected Output:
Number: 123
Plus Operator [+]
Identifier: varName
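The program itself is not shown on the slide, so here is a minimal Lex specification that would produce that output (the pattern choices are assumptions):

%{
#include <stdio.h>
%}

%%
[0-9]+                  { printf("Number: %s\n", yytext); }
"+"                     { printf("Plus Operator [%s]\n", yytext); }
[a-zA-Z_][a-zA-Z0-9_]*  { printf("Identifier: %s\n", yytext); }
[ \t\n]+                { /* skip whitespace */ }
.                       { printf("Unknown: %s\n", yytext); }
%%

int main(void) {
    yylex();
    return 0;
}

int yywrap(void) {
    return 1;
}

Saved as, say, tokens.l (a hypothetical file name), it can be built with lex tokens.l && cc lex.yy.c -o tokens; feeding it the input 123 + varName prints the three lines above.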
HOW LEX AND YACC WORK TOGETHER
Lex: Handles lexical analysis, generating tokens.
YACC (Yet Another Compiler-Compiler): Handles syntax analysis, using the tokens generated by Lex to build a parse tree or AST (Abstract Syntax Tree).
Diagram: Lex → Tokens → YACC → Parse Tree
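A sketch of that hand-off (file and token names here are illustrative): each Lex action returns a token code that Yacc declared, and the Yacc-generated parser calls yylex() whenever it needs the next token.

/* calc.l (scanner fragment) -- NUMBER comes from y.tab.h, generated by yacc -d */
[0-9]+   { yylval = atoi(yytext); return NUMBER; }
"+"      { return '+'; }

/* calc.y (parser fragment) -- declares the token and a grammar rule that uses it */
%token NUMBER
%%
expr : expr '+' NUMBER
     | NUMBER
     ;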
BENEFITS OF USING LEX
Advantages:
Simplifies creation of lexical analyzers.
Uses flexible and powerful regular expressions.
Works well with Yacc to build compilers or interpreters.
Common Use Cases: Compilers, interpreters, data processors.
CONCLUSION
Summary:
Lexical analysis is a key step in the compiler process, breaking down
code into tokens.
Lex simplifies this process by using regular expressions to define and match tokens.
Combined with Yacc, Lex provides a powerful toolchain for building language processors.