Compiler Design

Uploaded by

Sameer verma


Compiler Design

Lexeme: In a compiler, a lexeme refers to the smallest meaningful unit in the source code of a
program. It is a sequence of characters in the source code that represents a single, indivisible
element, such as a keyword, identifier, operator, or literal. Lexemes are the building blocks of a
program's syntax and semantics, and they are used by the lexical analyzer (lexer) to generate tokens.

A lexer is a crucial component of the compiler that performs lexical analysis or scanning. Its primary
role is to read the source code character by character, identify lexemes, and categorize them into
tokens, associating each token with a specific type and, in some cases, additional attributes. Tokens
are then passed to the parser for further analysis and processing.

Here are some examples of lexemes and the corresponding tokens they may generate in a
programming language like C:

Lexeme: "while"

Token Type: Keyword

Description: The lexeme "while" is recognized as a keyword in C, indicating the start of a while loop.

Lexeme: "count"

Token Type: Identifier

Description: The lexeme "count" represents an identifier, which could be a variable or function name.

Lexeme: "+"

Token Type: Operator

Description: The lexeme "+" is recognized as an operator, indicating addition.

Lexeme: "42"

Token Type: Integer Literal

Description: The lexeme "42" represents an integer literal, a numeric constant.

Lexeme: "3.14"

Token Type: Floating-Point Literal

Description: The lexeme "3.14" represents a floating-point literal, a decimal number with a fractional
part.

Lexeme: "("

Token Type: Left Parenthesis

Description: The lexeme "(" is recognized as a left parenthesis, often used to group expressions.

In summary, a lexeme in a compiler is a sequence of characters in the source code that represents a
single, meaningful element of the program. Lexemes are identified and categorized into tokens
during the lexical analysis phase, and these tokens are then used by subsequent phases of the
compiler for parsing, semantic analysis, and code generation.
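
The lexeme-to-token mapping above can be sketched as a small regular-expression lexer. This is only an illustration, not how a production C compiler scans its input: the token names, the tiny keyword set, and the function name tokenize are assumptions made for this example.

```python
import re

# Token categories mirror the examples above; names are illustrative only.
TOKEN_SPEC = [
    ("FLOAT_LITERAL", r"\d+\.\d+"),   # must come before INT_LITERAL
    ("INT_LITERAL", r"\d+"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("OPERATOR", r"[+\-*/=<>]"),
    ("LEFT_PAREN", r"\("),
    ("RIGHT_PAREN", r"\)"),
    ("SKIP", r"\s+"),                 # whitespace produces no token
    ("MISMATCH", r"."),               # anything else is an error
]
# A deliberately small keyword set (assumption, not the full C list).
KEYWORDS = {"while", "if", "else", "for", "return", "int", "float"}
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (token_type, lexeme) pairs for the source string."""
    tokens = []
    for match in MASTER.finditer(source):
        kind, lexeme = match.lastgroup, match.group()
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise ValueError(f"unexpected character {lexeme!r}")
        if kind == "IDENTIFIER" and lexeme in KEYWORDS:
            kind = "KEYWORD"          # keywords are reserved identifiers
        tokens.append((kind, lexeme))
    return tokens


print(tokenize("while (count + 42 < 3.14)"))
```

Keywords are matched as identifiers first and reclassified afterwards, which keeps the pattern list short; a real lexer may instead use a symbol table or dedicated keyword patterns.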

Type of grammar
CFG Simplification
[Video lecture-11: Knowledge Gate]

In a CFG, not all production rules and symbols may be needed for the
derivation of strings. There may also be some null productions and unit
productions. Eliminating these productions and symbols is called
simplification of the CFG. Simplification essentially comprises the
following steps −

• Reduction of CFG
• Removal of Unit Productions
• Removal of Null Productions

Reduction of CFG
CFGs are reduced in two phases −

Phase 1 − Derivation of an equivalent grammar, G’, from the CFG, G, such
that each variable derives some terminal string.

Derivation Procedure −

Step 1 − Let W1 be the set of all variables that directly derive a terminal
string (A → x with x ∈ T*), and initialize i = 1.

Step 2 − Let Wi+1 be Wi together with every variable A that has a
production A → α with α ∈ (Wi ∪ T)*.

Step 3 − Increment i and repeat Step 2, until Wi+1 = Wi.

Step 4 − Keep only the production rules in which every symbol belongs to
Wi ∪ T.

Phase 2 − Derivation of an equivalent grammar, G”, from the CFG, G’,
such that each symbol appears in a sentential form.

Derivation Procedure −

Step 1 − Include the start symbol in Y1 and initialize i = 1.

Step 2 − Let Yi+1 be Yi together with every symbol that appears on the
right-hand side of a production whose left-hand side is in Yi, and include
all production rules that have been applied.

Step 3 − Increment i and repeat Step 2, until Yi+1 = Yi.
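
The two phases can be sketched in Python as two fixed-point loops. This is a minimal sketch under stated assumptions: the function name reduce_cfg and the data layout (a list of (head, body) pairs plus an explicit set of variables) are choices made for this example, and the sample grammar is the one used in the worked problem of this section.

```python
def reduce_cfg(productions, variables, start):
    """Drop symbols that derive no terminal string or are unreachable.

    productions: list of (head, body) pairs, body being a tuple of symbols.
    variables:   set of non-terminals; every other symbol is a terminal.
    """
    def usable(body, generating):
        # True when every symbol is a terminal or a generating variable.
        return all(s in generating or s not in variables for s in body)

    # Phase 1: grow W until it stops changing.
    generating = set()
    changed = True
    while changed:
        changed = False
        for head, body in productions:
            if head not in generating and usable(body, generating):
                generating.add(head)
                changed = True
    phase1 = [(h, b) for h, b in productions
              if h in generating and usable(b, generating)]

    # Phase 2: grow Y, the symbols reachable from the start symbol.
    reachable = {start}
    changed = True
    while changed:
        changed = False
        for head, body in phase1:
            if head in reachable:
                for symbol in body:
                    if symbol not in reachable:
                        reachable.add(symbol)
                        changed = True
    return [(h, b) for h, b in phase1 if h in reachable]


grammar = [
    ("S", ("A", "C")), ("S", ("B",)),
    ("A", ("a",)),
    ("C", ("c",)), ("C", ("B", "C")),
    ("E", ("a", "A")), ("E", ("e",)),
]
print(reduce_cfg(grammar, {"S", "A", "B", "C", "E"}, "S"))
```

Phase 1 has to run before Phase 2: removing a non-generating variable can make other symbols unreachable, but not the other way round.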

Problem
Find a reduced grammar equivalent to the grammar G, having production
rules, P: S → AC | B, A → a, C → c | BC, E → aA | e

Solution

Phase 1 −

T = { a, c, e }

W1 = { A, C, E } from rules A → a, C → c and E → aA

W2 = { A, C, E } ∪ { S } from rule S → AC

W3 = { A, C, E, S } ∪ ∅

Since W2 = W3, we can derive G’ as −

G’ = ( { A, C, E, S }, { a, c, e }, P, S )

where P: S → AC, A → a, C → c, E → aA | e

Phase 2 −

Y1 = { S }

Y2 = { S, A, C } from rule S → AC

Y3 = { S, A, C, a, c } from rules A → a and C → c

Y4 = { S, A, C, a, c }

Since Y3 = Y4, we can derive G” as −

G” = ( { A, C, S }, { a, c }, P, S )

where P: S → AC, A → a, C → c

Removal of Unit Productions
Any production rule of the form A → B, where A and B are non-terminals, is
called a unit production.

Removal Procedure −

Step 1 − To remove A → B, add the production A → x to the grammar
whenever B → x occurs in the grammar. [x ∈ (V ∪ T)*, i.e. x is any
right-hand side of B, and x can be ε]

Step 2 − Delete A → B from the grammar.

Step 3 − Repeat from step 1 until all unit productions are removed.
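
One way to sketch this in Python is shown below. It is a variant of the procedure rather than a literal transcription: instead of deleting one unit production at a time, it first collects every variable reachable through unit productions only and then copies the non-unit right-hand sides. The function name and the dict-of-sets layout are assumptions for this example.

```python
def remove_unit_productions(productions, variables):
    """productions: dict mapping each variable to a set of bodies (tuples)."""
    def is_unit(body):
        return len(body) == 1 and body[0] in variables

    result = {}
    for a in variables:
        # Collect every variable B with A =>* B using unit productions only.
        closure, stack = {a}, [a]
        while stack:
            for body in productions.get(stack.pop(), ()):
                if is_unit(body) and body[0] not in closure:
                    closure.add(body[0])
                    stack.append(body[0])
        # A inherits every non-unit body of the variables in its closure.
        result[a] = {body
                     for b in closure
                     for body in productions.get(b, ())
                     if not is_unit(body)}
    return result


grammar = {
    "S": {("X", "Y")},
    "X": {("a",)},
    "Y": {("Z",), ("b",)},
    "Z": {("M",)},
    "M": {("N",)},
    "N": {("a",)},
}
cleaned = remove_unit_productions(grammar, set(grammar))
print(cleaned["Y"])
```

The sketch does not remove symbols that become unreachable afterwards; that is the job of the reduction phase.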

Problem

Remove unit production from the following −

S → XY, X → a, Y → Z | b, Z → M, M → N, N → a

Solution −

There are 3 unit productions in the grammar −

Y → Z, Z → M, and M → N

At first, we will remove M → N.

As N → a, we add M → a, and M → N is removed.

The production set becomes

S → XY, X → a, Y → Z | b, Z → M, M → a, N → a

Now we will remove Z → M.

As M → a, we add Z → a, and Z → M is removed.

The production set becomes

S → XY, X → a, Y → Z | b, Z → a, M → a, N → a

Now we will remove Y → Z.

As Z → a, we add Y → a, and Y → Z is removed.

The production set becomes

S → XY, X → a, Y → a | b, Z → a, M → a, N → a

Now Z, M, and N are unreachable, hence we can remove those.

The final CFG is unit production free −

S → XY, X → a, Y → a | b

Removal of Null Productions
In a CFG, a non-terminal symbol ‘A’ is a nullable variable if there is a
production A → ε or there is a derivation that starts at A and finally ends
up with ε: A → … → ε

Removal Procedure

Step 1 − Find out nullable non-terminal variables which derive ε.

Step 2 − For each production A → α, construct all productions A → x,
where x is obtained from α by removing one or more of the nullable
non-terminals found in Step 1.

Step 3 − Combine the original productions with the result of step 2 and
remove ε - productions.
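
The three steps can be sketched as follows. This is a minimal sketch, assuming ε is represented by the empty tuple and the grammar by a dict of sets; the function name is hypothetical. The sample grammar is S → ASA | aB | b, A → B, B → b | ε.

```python
from itertools import product


def remove_null_productions(productions):
    """productions: dict variable -> set of bodies; () stands for epsilon."""
    # Step 1: a variable is nullable if some body consists only of
    # nullable symbols (the empty body qualifies trivially).
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in productions.items():
            if head not in nullable and any(
                all(s in nullable for s in body) for body in bodies
            ):
                nullable.add(head)
                changed = True

    # Steps 2 and 3: each nullable symbol may be kept or dropped;
    # the empty body itself is discarded.
    result = {}
    for head, bodies in productions.items():
        new_bodies = set()
        for body in bodies:
            choices = [((s,), ()) if s in nullable else ((s,),) for s in body]
            for pick in product(*choices):
                candidate = tuple(sym for part in pick for sym in part)
                if candidate:
                    new_bodies.add(candidate)
        result[head] = new_bodies
    return result, nullable


grammar = {
    "S": {("A", "S", "A"), ("a", "B"), ("b",)},
    "A": {("B",)},
    "B": {("b",), ()},
}
cleaned, nullable = remove_null_productions(grammar)
print(sorted(cleaned["S"]))
```

If the start symbol itself is nullable, the resulting grammar no longer generates ε; handling that case (for example with a fresh start symbol) is left out of this sketch.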

Problem

Remove null production from the following −

S → ASA | aB | b, A → B, B → b | ε

Solution −

There are two nullable variables − A and B (B → ε directly, and A through A → B).

At first, we will remove B → ε.

After removing B → ε, the production set becomes −

S → ASA | aB | b | a, A → B | ε, B → b

Now we will remove A → ε.

After removing A → ε, the production set becomes −

S → ASA | aB | b | a | SA | AS | S, A → B, B → b

This is the final production set without null productions.

Question: In general, how many normal forms are there for a CFG (Context-Free Grammar)?
Ans. Two: Chomsky Normal Form and Greibach Normal Form

Important terms
The most general phrase-structure grammar is?

Answer: a. Context Sensitive Grammar

Explanation: Among the restricted grammar classes of the Chomsky hierarchy
(regular, context-free, context-sensitive), context-sensitive grammar is the
most general: both sides of a production may contain terminals and
non-terminals, provided the right-hand side is at least as long as the
left-hand side. Only unrestricted (Type 0) grammars are more general.

In the compiler, the function of using intermediate code is:

Answer: d. to increase the chances of re-using the machine-independent code
optimizer in other compilers.

Explanation: Intermediate code is generated after semantic analysis. Because
it is independent of the target machine, the same machine-independent code
optimizer can be reused in other compilers.

Into how many types can code optimization be divided?

Answer: a. two types

Explanation: Code optimization techniques are divided into machine-dependent
and machine-independent types.

Which language is accepted by the push-down automata?

Answer: c. Type 2 language

Explanation: According to the Chomsky hierarchy, push-down automata accept
Type 2 languages, that is, the context-free languages.

Which of the following parsers is a top-down parser?

Answer: d. Recursive descent parser

Explanation: A recursive descent parser is a type of top-down parser which
generates the parse tree from top to bottom and reads the input string from
left to right.
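
To make the top-down idea concrete, here is a minimal recursive descent sketch in Python. The grammar E → T ('+' T)*, T → NUM | '(' E ')' is a toy grammar chosen for this example (it is not taken from the notes), and tokens are assumed to be already split into integers and one-character strings.

```python
def parse(tokens):
    """Parse a token list and return a nested-tuple parse tree."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {peek()!r}")
        pos += 1

    def expr():                      # E -> T ('+' T)*
        node = term()
        while peek() == "+":
            expect("+")
            node = ("+", node, term())
        return node

    def term():                      # T -> NUM | '(' E ')'
        nonlocal pos
        tok = peek()
        if tok == "(":
            expect("(")
            node = expr()
            expect(")")
            return node
        if isinstance(tok, int):
            pos += 1
            return tok
        raise SyntaxError(f"unexpected token {tok!r}")

    tree = expr()                    # start from the start symbol
    if peek() is not None:
        raise SyntaxError(f"trailing input at {peek()!r}")
    return tree


print(parse([1, "+", "(", 2, "+", 3, ")"]))
```

Each non-terminal becomes one function, and parsing starts from the start symbol and works down to the tokens, which is what makes the method top-down.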

Which parser is most powerful in the following parsers?

a. Operator Precedence
b. SLR
c. Canonical LR
d. LALR

Answer: c. Canonical LR

Explanation: Canonical LR (CLR) is more powerful than LALR and SLR, and all three are more powerful than operator precedence parsing.

The value of which variable is updated inside the loop by a loop-invariant value?

a. loop
b. strength
c. induction
d. invariable

Answer: c. induction

Explanation: The value of the induction variable is updated inside the loop by a
loop-invariant value.

Which of the following is not a characteristic of the compiler?

a. More execution time

b. Debugging process is slow
c. The execution takes place after the removal of all syntax errors
d. First scans the entire program and then transforms it into
machine-understandable code

Answer: a. More execution time

Explanation: A compiled program generally runs faster than an interpreted
one, so more execution time is not a characteristic of the compiler.

Which method merges multiple loops into a single one?

a. Constant Folding
b. Loop rolling
c. Loop fusion or jamming
d. None of the above

Answer: c. Loop fusion or Loop jamming

Explanation: Loop fusion is an optimization technique which merges the
bodies of multiple loops into a single body. This technique can reduce loop
overhead and so improve the runtime performance of the program.
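
A small before-and-after illustration of loop fusion, written by hand in Python. A compiler would apply the transformation to its intermediate code automatically; the function names here are made up for the example.

```python
def separate_loops(values):
    # Two loops over the same range, each with its own loop overhead.
    squares = []
    for v in values:
        squares.append(v * v)
    doubles = []
    for v in values:
        doubles.append(2 * v)
    return squares, doubles


def fused_loop(values):
    # After fusion: one traversal does the work of both loop bodies.
    squares, doubles = [], []
    for v in values:
        squares.append(v * v)
        doubles.append(2 * v)
    return squares, doubles


print(fused_loop([1, 2, 3]))
```

Fusion is only valid when the two loops iterate over the same range and the second loop does not depend on the first having finished completely.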

Which parser is known as the shift-reduce parser?

a. Bottom-up parser
b. Top-down parser
c. Both Top-down and bottom-up
d. None of the Above

Answer: a. Bottom-up parser

Explanation: The bottom-up parser in the compiler is also called the
shift-reduce parser.
