
Case Insensitive Regex for SQL

A programming language may be case sensitive or case insensitive. Case-sensitive
languages treat uppercase and lowercase letters as distinct characters, whereas
case-insensitive languages treat them as the same. Most programming languages are
case sensitive; however, some languages like SQL are case insensitive, so a keyword
can be written in lowercase, uppercase, or any mixture of cases. The SQL keyword
SELECT can also be written select, Select, or SeLeCT. Show how to write a regular
expression for a keyword in a case-insensitive language, and explain with “select” in SQL.

In a case-insensitive language like SQL, a regular expression (regex) to match a keyword (like `SELECT`)
should allow all variations of letter casing (e.g., `select`, `SELECT`, `Select`, `SeLeCT`, etc.).

✅ Option 1: Use Regex Flags (Preferred)


Most modern regex engines support a case-insensitive flag—commonly `i`—which tells the engine to
ignore letter casing:

Regex:

regex

/select/i

Explanation:

`/.../` is the regex pattern.


`select` is the keyword to match.

`i` flag makes the match case-insensitive.

Matches:

`SELECT`

`select`

`Select`

`SeLeCT`

etc.

✅ Option 2: Manual Case Matching (if flags are not allowed)


You can explicitly allow all case variations by matching each letter individually:

Regex:

regex

[Ss][Ee][Ll][Ee][Cc][Tt]

Explanation:

`[Ss]` matches either `S` or `s`


`[Ee]` matches either `E` or `e`

and so on...

Matches:

Same as above: all casing combinations of "SELECT"

✅ SQL Example:
If you’re scanning SQL code for the `SELECT` keyword, either approach will help identify the keyword
regardless of case.

Example in Python (using the `re` module):

python

import re

sql_code = "select * from users"


match = re.search(r'select', sql_code, re.IGNORECASE)
if match:
print("SELECT keyword found!")

Would you like help building a regex for multiple SQL keywords (like `SELECT`, `FROM`, `WHERE`, etc.)?

give me the regular expression

Sure! Here's the regular expression for the SQL keyword `SELECT` in a case-insensitive language:

✅ Using Case-Insensitive Flag (Recommended if supported):


regex

/select/i

Use this if your regex engine supports the `i` (ignore case) flag. It matches `select`, `SELECT`,
`Select`, `SeLeCT`, etc.

✅ Without Case-Insensitive Flag (Manual Matching):


regex

[Ss][Ee][Ll][Ee][Cc][Tt]

Use this if your regex engine does not support flags. It manually matches each letter in either
case.

Let me know if you want to match other SQL keywords too, like `FROM`, `WHERE`, etc., or build a pattern
to match full SQL statements.

Show how to write a regular expression for a keyword in a


case insensitive language

To write a regular expression for a keyword in a case-insensitive language, you can use either of
two main methods depending on your regex engine:

✅ Method 1: Use a Case-Insensitive Flag ( `i`)

This is the simplest and cleanest way to match a keyword like `"select"` in a case-insensitive language
(e.g., SQL):

Regular Expression:

regex

/select/i

Explanation:

`select` is the keyword.


`/.../` denotes the regex pattern (in many languages like JavaScript, Perl, etc.).

`i` is the case-insensitive flag, making the regex match any case variation:
`SELECT`

`select`

`SeLeCt`

`sElEcT`, etc.

Supported in:
JavaScript

Python (`re.IGNORECASE`)
Java (`Pattern.CASE_INSENSITIVE`)
Most modern regex engines

✅ Method 2: Manual Case Matching (If Flags Are Not Allowed)


If you cannot use the `i` flag, manually match upper and lower case for each character:

Regular Expression:

regex

[Ss][Ee][Ll][Ee][Cc][Tt]

Explanation:

`[Ss]` matches either `S` or `s`


`[Ee]` matches either `E` or `e`
And so on for each character in `SELECT`

Also Matches:

`select`, `SELECT`, `SeLeCt`, etc.

✅ Summary Table:
Method Regex Use When
Case-insensitive flag `/select/i` Your regex engine supports `i` flag
Manual case matching `[Ss][Ee][Ll][Ee][Cc][Tt]` Flags not supported (e.g., some tools)
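
As a quick check, here is a small Python sketch (using the standard `re` module; the variant list is illustrative) showing that both approaches accept the same casings:

python

import re

variants = ["select", "SELECT", "Select", "SeLeCT"]

# Option 1: the case-insensitive flag
flag_pattern = re.compile(r"select", re.IGNORECASE)

# Option 2: manual character classes, no flag needed
manual_pattern = re.compile(r"[Ss][Ee][Ll][Ee][Cc][Tt]")

for word in variants:
    assert flag_pattern.fullmatch(word)
    assert manual_pattern.fullmatch(word)
print("Both patterns match all casings of SELECT")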

Would you like a version that matches multiple SQL keywords case-insensitively (like
`SELECT|FROM|WHERE`)?

how to show the given grammar is LALR(1)

To show that a given grammar is LALR(1), you need to follow a formal procedure involving parsing
table construction and state merging. Here's a step-by-step explanation:

✅ Step-by-Step Process to Show a Grammar is LALR(1)


1. Start with the grammar
Ensure your grammar is augmented (add a new start symbol, e.g., `S' → S`).

2. Construct the LR(0) Automaton

Build LR(0) items and construct the canonical collection of LR(0) states using the closure and goto
operations.

Each item looks like this:

A → α·β

Where `·` represents the current parsing position.

3. Construct the SLR(1) parsing table (optional but helpful)

This helps verify if the grammar is already SLR(1). If it is, then it is definitely LALR(1) (since SLR(1) ⊆
LALR(1)).

If not, continue with the LALR(1) procedure.

4. Construct the Canonical LR(1) Parsing Table

This is the key step:

Add lookahead symbols to each LR(0) item, making them LR(1) items:

A → α·β , a

where `a` is a lookahead symbol (terminal).


Build the canonical collection of LR(1) items and then the full LR(1) parse table.

5. Merge LR(1) States with Same LR(0) Core

Identify states in the LR(1) collection that have the same LR(0) core (i.e., the same productions
without the lookahead).
Merge these states only if their lookaheads don’t cause conflicts.

This is what defines the LALR(1) parser.

6. Check for Conflicts in the LALR(1) Table

If the resulting LALR(1) parsing table has no shift/reduce or reduce/reduce conflicts, the grammar is
LALR(1).

✅ Summary Criteria:
Grammar is LALR(1) ⇔ Its LALR(1) parsing table has no parsing conflicts.
Construct LR(1) items → Merge compatible states → Check for table conflicts.

🔧 Tools That Can Help:


You can use parser generators to verify LALR(1) status:

YACC / Bison (C/C++)


ANTLR (Java/Python/etc.)

PLY (Python)
Lark (Python with LALR support)

They will output conflicts or confirm the grammar is LALR(1).
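
For example, with the Lark library (assuming `lark` is installed; the arithmetic grammar below is purely illustrative), requesting `parser="lalr"` builds the LALR(1) tables immediately, so a conflict surfaces as an exception at construction time:

python

from lark import Lark

# A small arithmetic grammar in Lark's EBNF notation (illustrative).
grammar = """
    ?start: expr
    ?expr: expr "+" term | term
    ?term: term "*" factor | factor
    ?factor: "(" expr ")" | NAME
    %import common.CNAME -> NAME
    %import common.WS
    %ignore WS
"""

# Lark builds the LALR(1) tables eagerly; a conflict raises an error here.
parser = Lark(grammar, parser="lalr")
print(parser.parse("a + b * c").pretty())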

Would you like to try this with a specific grammar example?

how to Check for Conflicts in the LALR(1) Table

To check for conflicts in the LALR(1) parsing table, you follow a structured approach. This step is
crucial to determine whether a grammar is truly LALR(1).

✅ What Is a Conflict?
In a parsing table, a conflict happens when a parser doesn't know what action to take for a given state
and lookahead symbol. There are two main types:

1. Shift/Reduce Conflict:
The parser can either shift (read more input) or reduce (apply a grammar rule) — and it’s unclear
which to choose.

2. Reduce/Reduce Conflict:
The parser could reduce using two or more different rules — again, unclear which one to use.

✅ Steps to Check for Conflicts in an LALR(1) Table


1. Build the Canonical LR(1) Collection

Construct the full LR(1) item sets with lookaheads.


Merge LR(1) item sets that have the same LR(0) core into LALR(1) states.

2. Create the LALR(1) Parsing Table

Build an Action and Goto table from the merged states:

Action table: For terminals (shift/reduce/accept).


Goto table: For non-terminals (transitions on non-terminals).

3. Scan the Action Table for Conflicts

For each [state, terminal] pair in the Action table, check:

Are there two or more actions listed?

If it's both shift and reduce → shift/reduce conflict

If it's reduce by two different rules → reduce/reduce conflict

If no conflicts exist, the grammar is LALR(1).
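
A minimal sketch of this scan in Python, assuming the Action table is stored as a mapping from (state, terminal) to a list of candidate actions (the table entries below are hypothetical):

python

# Hypothetical Action table: (state, terminal) -> list of actions.
# "s9" means shift to state 9; "r2" means reduce by production 2.
action = {
    (5, ")"): ["s9", "r2"],   # two entries in one cell: a conflict
    (5, "+"): ["s6"],
    (7, "$"): ["r1"],
}

for (state, terminal), acts in action.items():
    if len(acts) > 1:
        shifts = [a for a in acts if a.startswith("s")]
        reduces = [a for a in acts if a.startswith("r")]
        if shifts and reduces:
            kind = "shift/reduce"
        else:
            kind = "reduce/reduce"
        print(f"{kind} conflict in state {state} on '{terminal}': {acts}")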

🔍 Example Conflict (in Action Table):


State Input Action

5 `)` shift to 9
5 `)` reduce by A→α

⛔ Conflict: Shift/Reduce on input `)` in state 5.

✅ How to Resolve Conflicts?


Modify grammar (left-factor, eliminate ambiguity, etc.).

Use precedence and associativity rules (some parser tools allow this).
Consider rewriting grammar for full LR(1) if LALR(1) isn't enough.

🔧 Tools That Help Identify Conflicts


Tool Language Notes

Bison/Yacc C/C++ Outputs shift/reduce and reduce/reduce conflicts


ANTLR Java/etc. Highlights ambiguous rules

Lark (Python) Python Shows conflicts in LALR table

Would you like a small grammar example showing how to find a conflict in an actual LALR(1) table?

What do you mean by attributed grammars? Discuss the translation scheme for

Converting an infix expression to its equivalent postfix form.

✅ What is an Attributed Grammar?


An attributed grammar is an extension of a context-free grammar (CFG) where attributes are attached
to grammar symbols, and semantic rules (actions) define how to compute these attributes.

🔧 Key Concepts:
Attributes: Values associated with terminals and non-terminals (e.g., types, values, or
intermediate code).

Synthesized attributes: Computed from child nodes (bottom-up).


Inherited attributes: Passed down from parent or sibling nodes (top-down).

Semantic Rules: Used to compute attribute values during parsing.

Purpose: Attributed grammars enable translation, type checking, code generation, etc., in
compilers.

✅ Translation Scheme for Infix → Postfix


To convert an infix expression (like `a + b * c`) to postfix (`a b c * +`), we use an S-attributed
grammar (only synthesized attributes) and a bottom-up traversal.

💡 Grammar for Infix Expressions:


Let's define a simple grammar for expressions involving addition and multiplication:

E → E + T
E → T
T → T * F
T → F
F → (E)
F → id

🔄 Add Semantic Actions for Postfix Translation:


We add a synthesized attribute `postfix` and define rules that compute the postfix string during parsing:

E → E1 + T   { E.postfix = E1.postfix || T.postfix || "+" }

E → T        { E.postfix = T.postfix }

T → T1 * F   { T.postfix = T1.postfix || F.postfix || "*" }

T → F        { T.postfix = F.postfix }

F → (E)      { F.postfix = E.postfix }

F → id       { F.postfix = id.lexeme }

`id.lexeme` represents the actual variable name or token string.

🧮 Example: Convert `a + b * c` to Postfix

Let’s walk through the derivation:

1. Infix: `a + b * c`

2. Parse Tree:

E → E + T
E → T
T → F
F → a
T → T * F
T → F
F → b
F → c

3. Bottom-up Postfix Computation:


`F → a` → `a`

`F → b` → `b`

`F → c` → `c`
`T → F` → `b`

`T → T * F` → `b c *`

`E → T` → `a`
`E → E + T` → `a b c * +`

✅ Final Postfix: `a b c * +`
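
To make these semantic actions concrete, here is a small recursive-descent sketch in Python (function names like `parse_expr` are my own, and the grammar is used after removing left recursion, since recursive descent cannot handle `E → E + T` directly). Each function returns the synthesized `postfix` attribute:

python

import re

def tokenize(text):
    # Identifiers, operators and parentheses (an illustrative lexer).
    return re.findall(r"[A-Za-z_]\w*|[+*()]", text)

def parse_expr(toks):            # E -> T { '+' T }   postfix: T T '+'
    post = parse_term(toks)
    while toks and toks[0] == "+":
        toks.pop(0)
        post = post + " " + parse_term(toks) + " +"
    return post

def parse_term(toks):            # T -> F { '*' F }
    post = parse_factor(toks)
    while toks and toks[0] == "*":
        toks.pop(0)
        post = post + " " + parse_factor(toks) + " *"
    return post

def parse_factor(toks):          # F -> ( E ) | id
    tok = toks.pop(0)
    if tok == "(":
        post = parse_expr(toks)
        toks.pop(0)              # consume ')'
        return post
    return tok                   # F.postfix = id.lexeme

print(parse_expr(tokenize("a + b * c")))   # prints: a b c * +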

✅ Summary:
Concept Description

Attributed Grammar CFG + attributes + semantic rules

Used For Translation, type checking, code gen


Postfix Conversion Use synthesized attributes in semantic rules

Translation Scheme Compute postfix recursively using children

Would you like a full implementation of this in Python or a parser generator like ANTLR or Lark?

how is the T1 value obtained here

In the given Three Address Code (TAC), the line:


104 : T1 = i * 4

is computing the value of `T1` as `i * 4` because it is simulating array indexing in memory.

✅ Why `T1 = i * 4`?

This assumes:

The array `A` is an array of elements of size 4 bytes (commonly the size of an integer).
To access `A[i]` in memory, we calculate the byte offset:


offset = i * size_of_element = i * 4

✅ Explanation of Line 104:


`T1 = i * 4`

`i` is the current index.

`T1` stores the memory offset of the `i-th` element in array `A`.

✅ Next Line Uses It:


`105 : T2 = A[T1]`

This line retrieves the value from array `A` at the offset `T1`.

Essentially, it means: `T2 = A[i]`.

🧠 Summary:
TAC Line Meaning Explanation

`T1 = i * 4` Compute offset Each element is 4 bytes; `i` is index


`T2 = A[T1]` Access array element Get value at `A[i]`

This is a typical low-level representation of array access in intermediate code where arrays are stored
in contiguous memory.
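
A quick numeric check of the offset arithmetic (the base address and array contents below are made up for illustration):

python

element_size = 4                       # bytes per integer, as assumed above
base = 1000                            # pretend A starts at address 1000
memory = {base + 4 * k: v for k, v in enumerate([10, 20, 30, 40])}

i = 2
t1 = i * element_size                  # 104: T1 = i * 4  -> 8
t2 = memory[base + t1]                 # 105: T2 = A[T1]  -> A[2]
print(t1, t2)                          # 8 30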

Would you like to see this example simulated in memory for better understanding?

how to show a grammar is LR(1) but not LALR(1).

To show that a grammar is LR(1) but not LALR(1), you need to find a grammar where:

1. ✅ The LR(1) parsing table has no conflicts (so the grammar is LR(1)), but
2. ❌ Merging LR(1) states with identical LR(0) cores introduces conflicts, making the LALR(1)
table invalid.

✅ Step-by-Step Method:
1. Construct the Canonical LR(1) Item Sets

Create the full LR(1) collection with lookaheads.

Ensure no shift/reduce or reduce/reduce conflicts → Grammar is LR(1).

2. Merge States with the Same LR(0) Core

Now merge LR(1) items that have the same LR(0) items but different lookaheads.

If merging creates conflicting actions (like reduce/reduce) → The grammar is not LALR(1).

🔍 Example: A Grammar That Is LR(1) But Not LALR(1)


Here is a classic example:

S → Aa | bAc | Bc | bBa
A → d
B → d

📌 Step 1: Prove It Is LR(1)


The canonical LR(1) parse table for this grammar is conflict-free — so it's LR(1).

📌 Step 2: LALR(1) Table Has Conflict
When you merge the two LR(1) states that contain:

A → d , a    B → d , c
A → d , c    B → d , a

You get a single state containing:

A → d , a/c
B → d , a/c

Then, in the merged state, a reduce/reduce conflict occurs for lookaheads `a` and `c`, since both `A → d` and `B → d` could be applied on either lookahead.
Hence, this grammar is LR(1) but not LALR(1).

✅ Conclusion:
To prove a grammar is LR(1) but not LALR(1), do this:

Step Action Result


1 Build LR(1) item sets and table No conflicts → Grammar is LR(1)
2 Merge states with same LR(0) items If conflicts appear → Not LALR(1)

Would you like a detailed construction of the LR(1) and LALR(1) tables for the example above?

Explain with an example “why left recursion is an issue in top down parsing” and
write steps for left recursion elimination

✅ Why Left Recursion Is an Issue in Top-Down Parsing


Top-down parsers, like recursive descent parsers, expand grammar rules starting from the start
symbol and try to match input tokens.

However, left-recursive grammars can cause infinite recursion during this process.

⚠️ What Is Left Recursion?


A grammar is left-recursive if a non-terminal recursively calls itself as its first symbol on the right-
hand side:
Direct Left Recursion:


A → A α | β

Indirect Left Recursion:


A → B α
B → A β

❗ Why It Causes Problems in Top-Down Parsing


Top-down parsers try to expand from the left. If the rule is left-recursive, the parser keeps applying the
left-recursive rule without consuming any input, leading to infinite recursion.

🔍 Example: Problem with Left Recursion


Grammar:

E → E + T | T
T → id

Input: `id + id`

Recursive Descent Parser Behavior:

Tries to expand `E` using `E → E + T`


Again expands the left `E → E + T`
And again...

Never reaches `T → id`

👉 Infinite recursion occurs.


✅ Steps to Eliminate Left Recursion
We eliminate left recursion by rewriting the grammar.

📌 General Algorithm (for Direct Left Recursion)


Given:


A → A α1 | A α2 | ... | β1 | β2 | ...

Where `β` is a production that doesn't start with A.


Transform into:


A → β1 A' | β2 A' | ...


A' → α1 A' | α2 A' | ... | ε

🔄 Example: Remove Left Recursion


Original:

E → E + T | T

Transformed:


E → T E'
E' → + T E' | ε

✅ Now it's not left-recursive and suitable for top-down parsing.
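
A minimal recursive-descent sketch of the transformed grammar in Python (token handling is simplified, and `T → id` is assumed as above). With the original rule `E → E + T`, `parse_E` would call itself before consuming any token; after the rewrite, every recursive call is preceded by consuming `+`:

python

def parse_E(toks):        # E -> T E'
    parse_T(toks)
    parse_E_prime(toks)

def parse_E_prime(toks):  # E' -> + T E' | epsilon
    if toks and toks[0] == "+":
        toks.pop(0)       # consume '+', so recursion always makes progress
        parse_T(toks)
        parse_E_prime(toks)

def parse_T(toks):        # T -> id
    assert toks.pop(0) == "id"

parse_E(["id", "+", "id"])   # succeeds with no infinite recursion
print("parsed")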


📌 Algorithm (for Indirect Left Recursion)
For non-terminals `A1, A2, ..., An`:

1. For each `Ai` from 1 to n:


For each `Aj` where j < i:

Replace occurrences of `Ai → Aj γ` with `Aj`'s right-hand sides.


Then eliminate direct left recursion for `Ai`.

This ensures all indirect left recursion is handled before removing direct left recursion.

✅ Summary
Topic Explanation

Left Recursion A → A α form causes infinite recursion


Issue in Top-Down Parser recurses infinitely without consuming input
Fix Rewrite using right recursion

Applies To Recursive Descent, LL(1) parsers

Would you like help transforming a specific grammar to eliminate left recursion?

Syntax Directed Definition


A syntax-directed definition (SDD) is a context-free grammar together with attributes

and rules. Attributes are associated with grammar symbols and rules are associated
with productions. If X is a symbol and a is one of its attributes, then we write X.a to
denote the value of a at a particular parse-tree node labeled X. For example, an infix-to
postfix translator might have a production and rule
PRODUCTION
E → E1 + T
SEMANTIC RULE
E.code = E1.code || T.code || ‘+’
This production has two nonterminals, E and T; the subscript in E1 distinguishes the
occurrence of E in the production body from the occurrence of E as the head. Both E and
T have a string-valued attribute code. The semantic rule specifies that the string E.code is
formed by concatenating E1.code, T.code, and the character '+'.
1.1 Inherited and Synthesized Attributes
Two kinds of attributes for nonterminals:
1. A synthesized attribute for a nonterminal A at a parse-tree node N is defined by a
semantic rule associated with the production at N. Note that the production must
have A as its head. A synthesized attribute at node N is defined only in terms of
attribute values at the children of N and at N itself.
2. An inherited attribute for a nonterminal B at a parse-tree node N is defined by a
semantic rule associated with the production at the parent of N. Note that the
production must have B as a symbol in its body. An inherited attribute at node N
is defined only in terms of attribute values at N's parent, N itself, and N's
siblings.
Example 3.3.1: The SDD in Fig. 3.3.1 is based on our familiar grammar for arithmetic
expressions with operators + and *. It evaluates expressions terminated by an end marker
n. In the SDD, each of the nonterminals has a single synthesized attribute, called val. We
also suppose that the terminal digit has a synthesized attribute lexval, which is an integer
value returned by the lexical analyzer.
Figure 3.3.1 : Syntax-directed definition of a simple desk calculator
 The rule for production 1, L → E n, sets L.val to E.val, which we shall see is the
numerical value of the entire expression.
 Production 2, E → E1 + T, also has one rule, which computes the val attribute for
the head E as the sum of the values at E1 and T. At any parse-tree node N labeled E,
the value of val for E is the sum of the values of val at the children of node N
labeled E and T.
 Production 3, E T, has a single rule that defines the value of val for E to be the
same as the value of val at the child for T.
 Production 4 is similar to the second production; its rule multiplies the values at
the children instead of adding them.
 The rules for productions 5 and 6 copy values at a child, like that for the third
production. Production 7 gives [Link] the value of a digit, that is, the numerical
value of the token digit that the lexical analyzer returned.
An SDD that involves only synthesized attributes is called S-attributed; the SDD in Fig.
3.3.1 has this property. In an S-attributed SDD, each rule computes an attribute for the
nonterminal at the head of a production from attributes taken from the body of the
production.
An SDD without side effects is sometimes called an attribute grammar. The rules in an
attribute grammar define the value of an attribute purely in terms of the values of other
attributes and constants.
Evaluation Orders for SDD's
1. Dependency Graphs
A dependency graph depicts the flow of information among the attribute instances in a
particular parse tree; an edge from one attribute instance to another means that the
value of the first is needed to compute the second. Edges express constraints implied by
the semantic rules.
 For each parse-tree node, say a node labeled by grammar symbol X, the
dependency graph has a node for each attribute associated with X.
 Suppose that a semantic rule associated with a production p defines the value of
synthesized attribute A.b in terms of the value of X.c. Then, the dependency graph
has an edge from X.c to A.b. More precisely, at every node N labeled A where
production p is applied; create an edge to attribute b at N, from the attribute c at
the child of N corresponding to this instance of the symbol X in the body of the
production.
 Suppose that a semantic rule associated with a production p defines the value of
inherited attribute B.c in terms of the value of X.a. Then, the dependency graph
has an edge from X.a to B.c. For each node N labeled B that corresponds to an
occurrence of this B in the body of production p, create an edge to attribute c at N
from the attribute a at the node M that corresponds to this occurrence of X. Note
that M could be either the parent or a sibling of N.
Example 3.4.1: Consider the following production and rule:
PRODUCTION
E → E1 + T
SEMANTIC RULE
E.val = E1.val + T.val
At every node N labeled E, with children corresponding to the body of this production,
the synthesized attribute val at N is computed using the values of val at the two children,
labeled E and T. Thus, a portion of the dependency graph for every parse tree in which
this production is used looks like Fig. 3.4.1. As a convention, we shall show the parse tree
edges as dotted lines, while the edges of the dependency graph are solid.
Figure 3.4.1: E.val is synthesized from E1.val and T.val
Example 3.4.2: An example of a complete dependency graph appears in Fig. 3.4.2. The
nodes of the dependency graph, represented by the numbers 1 through 9, correspond to
the attributes in the annotated parse tree in Fig. 3.3.5
III [Link] I SEM I.T, KHIT
Page 4
UNIT-3- PART-1
COMPILER DESIGN
Figure 3.4.2: Dependency graph for the annotated parse tree of Fig. 3.3.5
 Nodes 1 and 2 represent the attribute lexval associated with the two leaves
labeled digit.
 Nodes 3 and 4 represent the attribute val associated with the two nodes labeled F.
The edges to node 3 from 1 and to node 4 from 2 result from the semantic rule
that defines F.val in terms of digit.lexval. In fact, F.val equals digit.lexval, but the
edge represents dependence, not equality.
 Nodes 5 and 6 represent the inherited attribute T'.inh associated with each of the
occurrences of nonterminal T'. The edge to 5 from 3 is due to the rule T'.inh =
F.val, which defines T'.inh at the right child of the root from F.val at the left child.
We see edges to 6 from node 5 for T'.inh and from node 4 for F.val, because these
values are multiplied to evaluate the attribute inh at node 6.
 Nodes 7 and 8 represent the synthesized attribute syn associated with the
occurrences of T'. The edge to node 7 from 6 is due to the semantic rule T'.syn =
T'.inh associated with production 3 in Fig. 3.3.4. The edge to node 8 from 7 is due
to a semantic rule associated with production 2.
 Finally, node 9 represents the attribute T.val. The edge to 9 from 8 is due to the
semantic rule, T.val = T'.syn, associated with production 1.
The dependency graph of Fig. 3.4.2 has no cycles. One topological sort is the order in
which the nodes have already been numbered: 1,2,... ,9. Notice that every edge of the
graph goes from a node to a higher-numbered node, so this order is surely a topological
sort. There are other topological sorts as well, such as 1,3,5,2,4,6,7,8,9.
2. S-Attributed Definitions
Given an SDD, it is very hard to tell whether there exist any parse trees whose
dependency graphs have cycles. In practice, translations can be implemented using
classes of SDD's that guarantee an evaluation order, since they do not permit dependency
graphs with cycles. Moreover, the two classes can be implemented efficiently in
connection with top-down or bottom-up parsing.
The first class is defined as follows:
 An SDD is S-attributed if every attribute is synthesized.
Example 3.4.2: The SDD of Fig 3.3.1 is an example of an S-attributed definition. Each
attribute, L.val, E.val, T.val, and F.val, is synthesized.
When an SDD is S-attributed, we can evaluate its attributes in any bottom up order of the
nodes of the parse tree. It is often especially simple to evaluate the attributes by
performing a postorder traversal of the parse tree and evaluating the attributes at a node
N when the traversal leaves N for the last time. That is, we apply the function postorder,
defined below, to the root of the parse tree
postorder(N) {
for ( each child C of N, from the left ) postorder(C);
evaluate the attributes associated with node N;
}
S-attributed definitions can be implemented during bottom-up parsing, since a bottom-up
parse corresponds to a postorder traversal. Specifically, postorder corresponds exactly to
the order in which an LR parser reduces a production body to its head.
3. L-Attributed Definitions
The second class of SDD's is called L-attributed definitions. The idea behind this class is
that, between the attributes associated with a production body, dependency-graph edges
can go from left to right, but not from right to left(hence "L-attributed"). More precisely,
each attribute must be either
1. Synthesized, or
2. Inherited, but with the rules limited as follows. Suppose that there is a production
A X1X2 …. Xn, and that there is an inherited attribute Xi.a computed by a rule
associated with this production. Then the rule may use only:
(a) Inherited attributes associated with the head A.
(b) Either inherited or synthesized attributes associated with the
occurrences of symbols X1X2 …. Xn located to the left of Xi.
(c) Inherited or synthesized attributes associated with this occurrence of Xi
itself, but only in such a way that there are no cycles in a dependency
graph formed by the attributes of this Xi.
Example 3.4.2: The SDD in Fig. 3.3.4 is L-attributed. To see why, consider the semantic
rules for inherited attributes, which are repeated here for convenience:
PRODUCTION
T → F T'
T' → * F T1'
SEMANTIC RULE
T'.inh = F.val
T1'.inh = T'.inh × F.val
The first of these rules defines the inherited attribute T'.inh using only F.val, and F
appears to the left of T' in the production body, as required. The second rule defines
T1'.inh using the inherited attribute T'.inh associated with the head, and F.val, where F
appears to the left of T1' in the production body.
In each of these cases, the rules use information "from above or from the left," as
required by the class. The remaining attributes are synthesized. Hence, the SDD is L
attributed. (Explain about SDD’s and also evaluation orders for SDD’s.)

Here’s a complete explanation of Syntax Directed Definitions (SDD) and their evaluation orders,
organized and simplified for clarity:

✅ Syntax Directed Definitions (SDD)


An SDD combines:

A Context-Free Grammar (CFG)


Attributes associated with grammar symbols

Semantic rules attached to grammar productions to compute attribute values

🧱 Basic Structure:
For a grammar symbol `X`, and attribute `a`, we write:


X.a → value of attribute 'a' at node labeled 'X'

🧩 Types of Attributes
1. Synthesized Attributes

Computed from the children of a parse tree node

Associated with bottom-up flow


Most common in bottom-up parsers (like LR parsers)

🔹 Example:
For production: `E → E1 + T`
Rule: `E.val = E1.val + T.val`
`E.val` is synthesized from its children `E1` and `T`.

2. Inherited Attributes

Computed from the parent or siblings


Flow is typically top-down

Useful in top-down parsers (like LL parsers)

🔹 Example:
For production: `T → F T'`
Rule: `T'.inh = F.val`
`T'.inh` is inherited from sibling `F`.

🎯 Example SDD – Infix to Postfix


Production:

E → E1 + T

Semantic Rule:

E.code = E1.code || T.code || '+'

Here, `code` is a synthesized attribute; it's computed from `E1.code` and `T.code`.

📊 Evaluation Orders for SDD


To evaluate attribute values in the parse tree, we use dependency graphs.

1. Dependency Graphs

Shows which attributes depend on others.

Nodes = attribute instances

Edges = dependency direction (from needed attribute to dependent one)

🔸 Acyclic graph = valid evaluation order exists


🔸 If there’s a cycle, attributes can’t be evaluated consistently.
📍 Example:
E → E1 + T
E.val = E1.val + T.val

Dependency Graph (portion):

E1.val → E.val
T.val → E.val

2. S-Attributed Definitions

Only synthesized attributes

Evaluated bottom-up
Can use postorder traversal (visit children before evaluating parent)

🔸 Good for LR parsing


🔹 Example:
E → E1 + T
E.val = E1.val + T.val

This is S-attributed.
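
A sketch of this bottom-up evaluation in Python over a hand-built parse tree (the tuple encoding of nodes is my own, for the familiar grammar E → E + T | T, T → T * F | F, F → digit):

python

# Parse-tree nodes: (production, children...) or ("F->digit", lexval).
# Tree for 3 + 5 * 2.
tree = ("E->E+T",
        ("E->T", ("T->F", ("F->digit", 3))),
        ("T->T*F", ("T->F", ("F->digit", 5)), ("F->digit", 2)))

def val(node):
    # Postorder: evaluate the children first, then apply the semantic rule.
    prod = node[0]
    if prod == "F->digit":
        return node[1]                    # F.val = digit.lexval
    kids = [val(child) for child in node[1:]]
    if prod == "E->E+T":
        return kids[0] + kids[1]          # E.val = E1.val + T.val
    if prod == "T->T*F":
        return kids[0] * kids[1]          # T.val = T1.val * F.val
    return kids[0]                        # copy rules: E->T, T->F

print(val(tree))   # 13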

3. L-Attributed Definitions

Allows both synthesized and inherited attributes

Left-to-right restrictions on inherited attributes


Still guarantees acyclic dependency graphs

🔸 Good for LL parsing (top-down)


Rule Restrictions for inherited attributes `Xi.a` in:


A → X1 X2 ... Xi ... Xn

Can depend on:

Inherited attributes of `A` (head)


Any attribute of `X1` to `Xi-1` (left siblings)
Attributes of `Xi` itself, as long as no cycles occur

🔹 Example:
T → F T'
T'.inh = F.val

This is L-attributed since `T'.inh` depends on the left sibling (`F.val`).

🔄 Summary Table
Feature S-Attributed SDD L-Attributed SDD
Attribute Type Synthesized only Synthesized + Inherited
Evaluation Order Bottom-up (postorder) Top-down (left to right)

Parser Compatibility Bottom-up (LR) Top-down (LL)


Dependency Graph Cycles Never Never (if L-rules followed)

💡 Conclusion
SDDs provide a structured way to attach meaning to grammar rules.
Synthesized attributes work best in bottom-up parsing.

Inherited attributes enable top-down communication in parsing.


Using S-attributed or L-attributed rules ensures we can evaluate attributes deterministically.

Would you like a worked-out example of a full L-attributed or S-attributed grammar in action?

Write triples for the following statement.


c[i] = a[i] + b[i]
where a,b,c are arrays of type integers

To write triples for the statement:

c[i] = a[i] + b[i]

where `a`, `b`, and `c` are integer arrays, we break the expression into a sequence of 3-address
operations, then represent them using triples, which are indexed representations of these operations.

🔧 Step 1: Break Down into 3-Address Code (TAC)


Let’s express the computation step-by-step:

cpp

t1 = a[i] // Load a[i]


t2 = b[i] // Load b[i]
t3 = t1 + t2 // Add values
c[i] = t3 // Store result into c[i]

📋 Step 2: Represent as Triples


A triple is a 3-field structure:

(index) (operator, arg1, arg2)

Here’s the triple representation:

text

(0) = (LOAD, a, i)        // t1 = a[i]
(1) = (LOAD, b, i)        // t2 = b[i]
(2) = (+, (0), (1))       // t3 = t1 + t2
(3) = (STORE, c, i, (2))  // c[i] = t3 (four fields here, so this form is quad-like)

Alternatively, if we want to keep only three fields (as traditional triples), we can handle the store
separately in code generation.
Simplified triple format (without explicit store):

text

(0) = (LOAD, a, i)
(1) = (LOAD, b, i)
(2) = (+, (0), (1))
(3) = (STORE, c, (2)) // interpreted as c[i] = result at (2)

✅ Final Answer: Triples


Index Operator Arg1 Arg2
(0) LOAD a i
(1) LOAD b i
(2) + (0) (1)
(3) STORE c (2)

Let me know if you want this in quadruple or indirect triple form too.

which phases of the compilation are machine-dependent and which are machine-independent.

In a compiler, the phases are generally categorized as machine-independent or machine-dependent,
based on whether they rely on the architecture or instruction set of the target machine.

✅ Machine-Independent Phases
These phases do not depend on the hardware architecture:

1. Lexical Analysis
→ Converts source code into tokens
(Machine-independent because it only deals with character streams.)
2. Syntax Analysis (Parsing)
→ Analyzes token structure according to grammar
(Independent of the target machine.)
3. Semantic Analysis
→ Checks for semantic consistency (e.g., type checking)
(Works with language rules, not machine specifics.)

4. Intermediate Code Generation


→ Converts syntax tree to intermediate representation (IR)
(IR is designed to be independent of any machine.)
5. Optimization (Middle-Level)
→ Performs optimizations on IR (e.g., dead code elimination, constant folding)
(General optimizations apply across machines.)

⚙️ Machine-Dependent Phases
These phases rely on the target machine's architecture:

1. Code Optimization (Low-Level)


→ Performs optimizations using knowledge of the target hardware (e.g., register allocation,
pipeline optimization)

2. Target Code Generation


→ Converts IR to actual machine code or assembly specific to a target architecture
(Clearly machine-dependent, as instruction sets vary.)
3. Assembly and Linking (if done by compiler)
→ May generate object code, link libraries
(Depends on machine’s binary formats and calling conventions.)

🔄 Summary Table
Phase Machine Dependency
Lexical Analysis Machine-Independent
Syntax Analysis Machine-Independent
Semantic Analysis Machine-Independent
Intermediate Code Generation Machine-Independent
Code Optimization (generic) Machine-Independent
Code Optimization (low-level) Machine-Dependent
Target Code Generation Machine-Dependent
Linking/Assembly Machine-Dependent
Would you like a diagram showing how these phases connect in a compiler?

Explain shift-reduce and reduce-reduce conflicts in the context of shift-reduce


parsing.

In shift-reduce parsing, conflicts occur when the parser cannot decide what action to take at a certain
point in the parse due to ambiguity or grammar design issues. The two main types of conflicts are:

⚔️ 1. Shift-Reduce Conflict
🔹 Definition:
A shift-reduce conflict occurs when the parser cannot decide whether to shift (read the next input
symbol) or reduce (apply a grammar rule) at a particular step.

🔸 Why It Happens:
Happens in ambiguous grammars or in situations where multiple parse paths seem valid.
Typically seen in LR parsers during construction of the parsing table.

📌 Example:
Consider the classic "dangling else" problem:

grammar

S → if E then S
S → if E then S else S
S → other

For input:


if E then if E then S else S

At the `else`, the parser doesn't know whether to:

Shift the `else` and associate it with the nearest unmatched `if`
or Reduce the previous `if E then S` as a complete statement

👉 This is a shift-reduce conflict.
⚔️ 2. Reduce-Reduce Conflict
🔹 Definition:
A reduce-reduce conflict occurs when the parser can reduce by two or more different grammar
rules and cannot decide which to choose.

🔸 Why It Happens:
This usually indicates a serious problem in the grammar (e.g., ambiguous or improperly factored).
Typically arises during parser table construction when multiple items in a state indicate reductions
by different rules.

📌 Example:
grammar

S → A
S → B
A → α
B → α

Here, for the same input that matches `α`, the parser could reduce by either `A → α` or `B → α`. It
doesn't know which rule to choose — this is a reduce-reduce conflict.

🧠 Key Differences
Aspect Shift-Reduce Conflict Reduce-Reduce Conflict
Decision Between Shift or Reduce Reduce by Rule 1 or Reduce by Rule 2
Cause Ambiguity in continuation or grouping Ambiguity in rule matching

Severity Can often be resolved by precedence Often requires grammar redesign

Would you like a visual parse table example to demonstrate these conflicts?

Explain S-attributed and L-attributed definitions with an example.

✅ S-Attributed and L-Attributed Definitions in Syntax-Directed Definitions (SDD)


In compiler design, Syntax-Directed Definitions (SDDs) are used to associate semantic rules with
grammar productions. These rules define how to compute attributes — values associated with
grammar symbols.
There are two important types of SDDs based on how attributes are computed:

🟩 1. S-Attributed Definitions
🔹 Definition:
An S-attributed definition uses only synthesized attributes.

🔸 Synthesized Attribute:
Computed from children in the parse tree.
Flow is bottom-up.

📌 Example:
grammar

E → E1 + T { E.val = E1.val + T.val }

E → T      { E.val = T.val }
T → num    { T.val = num.lexval }

Explanation:

`E1`, `T`, and `num` are children nodes.


`E.val` is synthesized from its children.
Suitable for bottom-up parsing like LR parsers.

🟦 2. L-Attributed Definitions
🔹 Definition:
An L-attributed definition uses both synthesized and inherited attributes, but inherited ones are
restricted.

🔸 Inherited Attribute:
Passed from parent or left siblings.
Flow is left-to-right.

✅ Rule for Inherited Attribute in L-Attributed SDD:


If a production is `A → X1 X2 ... Xn`, then `Xi.a` (inherited) can depend on:

1. `A`’s attributes
2. Attributes of symbols to the left of Xi (i.e., `X1` to `Xi−1`)
3. Its own attributes, as long as there’s no cycle

📌 Example:
grammar

Production: A → B C
Semantic Rules:
B.inh = A.inh
A.syn = C.syn

This is L-attributed because:

`B.inh` (an inherited attribute) comes from `A.inh` (its parent)


`A.syn` (a synthesized attribute) is computed from `C.syn` (its child)

🧠 Summary Table:
Feature S-Attributed L-Attributed
Uses Inherited Attr? ❌ No ✅ Yes
Uses Synthesized Attr? ✅ Yes ✅ Yes
Suitable for Bottom-Up Parsing (LR) Top-Down Parsing (LL) or Bottom-Up Parsing
Evaluation Order Bottom-Up Left-to-Right with restrictions

Would you like me to generate a dependency graph example for one of these?

a) Justify the following statements


(i) If there is no shift-reduce conflict in CLR(1) parsing table then there is no
shift-reduce conflict in LALR(1) parsing table of same grammar
(ii) Even if there is no reduce-reduce conflict in the CLR(1) parsing table, there
may be a reduce-reduce conflict in the LALR(1) parsing table of the same grammar

Let’s analyze and justify both of the statements using concepts from LR parsing theory.

🔹 (i) If there is no shift-reduce conflict in CLR(1) parsing table, then there is no


shift-reduce conflict in LALR(1) parsing table of the same grammar.

✅ Justification:
CLR(1) (Canonical LR(1)) parsers are more powerful than LALR(1) parsers because they keep full
lookahead information in each item.

LALR(1) merges states with same cores but possibly different lookaheads.

When merging states in LALR(1), shift actions are determined only by the core items. That
means no new shift actions are introduced when merging.

Hence:

If no shift-reduce conflict exists in CLR(1), the same shift actions will not create a conflict in
LALR(1), because merging cannot introduce a new shift action where none existed.

✅ Conclusion:
True. No shift-reduce conflict in CLR(1) implies none in LALR(1) for the same grammar.

🔹 (ii) Even if there is no reduce-reduce conflict in CLR(1) parsing table, there may be
a reduce-reduce conflict in LALR(1) parsing table of the same grammar.

✅ Justification:
In CLR(1), the lookaheads in each state are precise and separated.
In LALR(1), merging two CLR(1) states with the same core but different lookaheads combines
their lookaheads.

This can lead to a case where two reductions, which were in separate CLR(1) states with non-
overlapping lookaheads, now appear in the same LALR(1) state with combined
lookaheads.

If both reductions are valid for a common terminal, this causes a reduce-reduce conflict.
✅ Conclusion:
True. Reduce-reduce conflicts may appear in LALR(1) even if CLR(1) is conflict-free, because merging
states may lead to overlapping lookaheads for multiple reductions.

Would you like an example grammar that demonstrates this behavior?

Write quadruple, triple , and indirect triple for the following statement
x = y*-z + y*-z

To generate Quadruples, Triples, and Indirect Triples for the expression:


x = y * -z + y * -z

we break it down into three-address code (TAC) first.

🔹 Step 1: Break into 3-address code (TAC)


Let’s rewrite `x = y * -z + y * -z` in smaller steps:


t1 = -z
t2 = y * t1
t3 = -z
t4 = y * t3

t5 = t2 + t4
x = t5

✅ 1. Quadruples
Each instruction is represented as a 4-tuple:
(Operator, Arg1, Arg2, Result)

No Operator Arg1 Arg2 Result


1 UMINUS z t1
2 * y t1 t2
3 UMINUS z t3
4 * y t3 t4
5 + t2 t4 t5

6 = t5 x

✅ 2. Triples
Each instruction is stored using indices (instead of temporary variables).
Triples: (Operator, Arg1, Arg2) — Arg1 and Arg2 can be constants, variables, or pointers to other rows.

No Operator Arg1 Arg2

0 UMINUS z
1 * y (0)
2 UMINUS z
3 * y (2)
4 + (1) (3)
5 = (4) x

✅ 3. Indirect Triples
Indirect triples use a pointer table to the actual triples, making it easier to reorder instructions if needed.
Pointer Table:

Index Points to
0 Instruction 0

1 Instruction 1
2 Instruction 2
3 Instruction 3
4 Instruction 4
5 Instruction 5

Instruction Table (same as triples):

No Operator Arg1 Arg2
0 UMINUS z
1 * y (0)
2 UMINUS z
3 * y (2)
4 + (1) (3)

5 = (4) x

Would you like this expression optimized to avoid redundant calculations (like computing `y*-z` twice)?

Compare and contrast SLR(1), CLR(1), LALR(1) parsers.

Here’s a clear comparison of SLR(1), CLR(1), and LALR(1) parsers, focusing on their power, table size,
complexity, and conflict handling:

🔶 1. SLR(1) – Simple LR
Aspect Details
Full Form Simple LR

Lookahead 1 token
Parsing Table Smallest among the three
Uses Follow sets to construct parsing table
Strength Least powerful (can’t handle many real-world grammars)
Conflicts May have shift-reduce or reduce-reduce conflicts in many valid cases
Ease of Construction Easiest to build manually
Example Good for simple grammars but fails on ambiguous constructs

🔶 2. CLR(1) – Canonical LR
Aspect Details
Full Form Canonical LR
Lookahead 1 token
Parsing Table Largest, because it tracks full LR(1) items

Uses Lookahead with every item (more precise)


Strength Most powerful (handles all deterministic context-free grammars)
Conflicts Least number of conflicts
Ease of Construction Harder to construct manually
Example Used in Yacc-like tools when precision is required

🔶 3. LALR(1) – LookAhead LR
Aspect Details
Full Form LookAhead LR
Lookahead 1 token
Parsing Table Same size as SLR(1), but more precise than SLR
Uses Merges similar LR(1) states with same core, combines lookaheads
Strength Middle ground: more powerful than SLR, less than CLR

Conflicts May introduce reduce-reduce conflicts that CLR avoids


Ease of Construction Practical, commonly used in real-world tools like Yacc/Bison
Example Handles most programming language grammars efficiently

🔽 Summary Table
Feature SLR(1) LALR(1) CLR(1)

Power Low Medium High


Table Size Small Medium (≈SLR) Large
Lookahead 1 1 1
Grammar Coverage Limited Good Excellent
Conflicts Common Fewer Least
Ease of Construction Easiest Moderate Hardest

Would you like a visual diagram to better understand how their state diagrams differ?

Write triples for the following statement.


c[i] = a[i] + b[i]
where a,b,c are arrays of type integers

To write triples for the statement


`c[i] = a[i] + b[i]`,
we break the expression into intermediate operations. A triple stores three fields, with each instruction identified by its index:

(index) : (operation, argument1, argument2)

Here’s how the computation is broken down:

✅ Original Statement:
`c[i] = a[i] + b[i]`

🔧 Triples Representation:
Index Operation Arg1 Arg2
(0) `[]` a i
(1) `[]` b i
(2) `+` (0) (1)
(3) `[]= ` c i, (2)

🧠 Explanation:
(0): Access `a[i]`
(1): Access `b[i]`
(2): Add the two values from above
(3): Store the result in `c[i]`

Note: `[]` represents array access, and `[]=` indicates array assignment.
Would you also like the quadruples and indirect triples for this?

first write three address code

Sure! Let's begin with the Three-Address Code (TAC) for the statement:
`c[i] = a[i] + b[i]`

✅ Three-Address Code (TAC):


text

t1 = a[i]
t2 = b[i]
t3 = t1 + t2
c[i] = t3

🧠 Explanation:
`t1` holds the value of `a[i]`
`t2` holds the value of `b[i]`

`t3` computes `t1 + t2`


`c[i] = t3` stores the result in the `i`-th index of `c`

Let me know if you'd like to see the quadruples, triples, or indirect triples for this as well.

What are different intermediate code forms? Discuss different Three Address
code types and implementations of Three Address statements.

✅ Intermediate Code Forms in Compiler Design


Intermediate code is a low-level representation of the source code, generated after syntax analysis and
before code optimization or target code generation. It helps in making the compiler more machine-
independent, easier to optimize, and modular.

📌 Types of Intermediate Code Forms


1. Three-Address Code (TAC)
2. Quadruples

3. Triples
4. Indirect Triples
5. Polish Notation (Prefix and Postfix forms)
6. Syntax Trees
7. Directed Acyclic Graphs (DAG)
8. Stack-based Code (used in stack machines or JVM)

✅ Three-Address Code (TAC)


A common intermediate form where each instruction contains at most three addresses (operands).
These instructions are easy to map to assembly or machine code.

Example:

For:
`x = a + b * c`

TAC:

text

t1 = b * c
t2 = a + t1
x = t2
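
A minimal sketch of how such TAC can be emitted from an expression tree (the tuple-based AST and the `emit` helper are my own illustration, not a standard API):

python

# AST: ("op", left, right) for binary operators, or a variable name string.
ast = ("+", "a", ("*", "b", "c"))   # represents a + b * c

temp_count = 0
code = []

def emit(node):
    global temp_count
    if isinstance(node, str):        # a leaf: the variable names itself
        return node
    op, left, right = node
    l, r = emit(left), emit(right)   # children first (postorder)
    temp_count += 1
    temp = f"t{temp_count}"
    code.append(f"{temp} = {l} {op} {r}")
    return temp

code.append(f"x = {emit(ast)}")
print("\n".join(code))   # t1 = b * c, then t2 = a + t1, then x = t2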

📌 Types of Three-Address Statements


1. Assignment with binary operator
`x = y op z`

2. Assignment with unary operator


`x = op y`

3. Copy/Assignment statement
`x = y`

4. Unconditional jump
`goto L`

5. Conditional jump
`if x relop y goto L`

6. Indexed assignment
`x = y[i]` or `x[i] = y`

7. Pointer assignment
`x = *y` or `*x = y`

8. Procedure call and return


`call p, n`, `param x`, `return y`

✅ Implementations of TAC
There are three main ways to represent TAC in compilers:

1. Quadruples

Each instruction is a 4-tuple:


`(operator, arg1, arg2, result)`

Example:
`x = y + z`

op arg1 arg2 result


+ y z x

2. Triples

Each instruction is a 3-tuple:


`(operator, arg1, arg2)`, and operands can refer to result positions using indices.

Example:

Index op arg1 arg2

(0) + y z
(1) = (0) x

3. Indirect Triples

Used to avoid complications in code optimization due to position-based references. It uses a pointer
table.

A table of pointers points to triples.

Useful when reordering code for optimization.

✅ Summary
Form Structure Advantage
Three-Address Code Human-readable, high-level IR Easier for analysis and optimization
Quadruples Explicit temporary storage Easy to translate to machine code
Triples Saves space by avoiding result names Harder to reorder/optimize

Indirect Triples Adds flexibility to triples Better for code movement

Would you like a visual example comparing all three (TAC, quadruples, triples) for a specific statement
like `x = (a + b) * (c - d)`?

Write a note on simple type checker and list the different types of type
checking.

✅ Note on Simple Type Checker


A type checker is a crucial component of a compiler responsible for ensuring that the program adheres
to the type rules of the language. It verifies that the operations in a program are applied to compatible
data types.

A simple type checker is one that:

Works with basic, statically typed programming languages.


Assumes no polymorphism or complex type systems.
Validates expressions, assignments, function calls, etc., based on predefined type rules.

🧠 Key Tasks Performed by a Simple Type Checker


1. Type Compatibility: Ensures that operands used with operators are of compatible types.
Example: `int + int` → valid, but `int + string` → error.
2. Type Consistency: Ensures that variables are used with their declared types.
3. Type Propagation: Determines the type of complex expressions from subexpressions.
4. Function Argument Checking: Validates number and types of arguments in function calls.
5. Return Type Checking: Ensures function returns values of declared type.
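
A minimal sketch of such a checker for binary expressions (the AST shape, the `env` symbol table, and the type rules are illustrative assumptions):

python

# AST: ("op", left, right), an int literal, or a variable-name string.
env = {"x": "int", "msg": "string"}      # declared types (illustrative)

def type_of(node):
    if isinstance(node, int):
        return "int"
    if isinstance(node, str):            # a variable reference
        return env[node]
    op, left, right = node
    lt, rt = type_of(left), type_of(right)
    if lt != rt:                         # no implicit conversions here
        raise TypeError(f"operands of '{op}' have types {lt} and {rt}")
    return lt

print(type_of(("+", "x", 5)))            # int + int -> int
try:
    type_of(("+", "x", "msg"))           # int + string -> rejected
except TypeError as err:
    print("type error:", err)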

📌 Types of Type Checking


Type Checking Description
Static Type Checking Performed at compile time. Detects errors before program runs. Example: C, Java
Dynamic Type Checking Performed at runtime. More flexible but less safe. Example: Python, JavaScript

Strong Type Checking Enforces strict rules to avoid type errors (e.g., no implicit conversions).

Type Checking Description
Weak Type Checking Allows more implicit conversions and flexibility (risk of subtle bugs).

🛠️ Example (Static Type Checking):


c

int x = 5;
x = "hello"; // Error: assigning string to int

🛠️ Example (Dynamic Type Checking):


python

x = 5
x = "hello" # No error; type is checked during execution

✅ Summary:
A simple type checker ensures program correctness by enforcing type rules early in compilation. Type
checking can be static or dynamic, strong or weak, depending on the language's design philosophy.
Would you like an example of a type checker implemented in a mini-compiler or interpreter (in Python
or C)?

What do you mean by attributed grammars? Discuss the translation scheme for
Converting an infix expression to its equivalent postfix form.

✅ What is an Attributed Grammar?


An Attributed Grammar is a Context-Free Grammar (CFG) augmented with attributes and semantic
rules. It is used in syntax-directed translation — a technique that combines parsing and translation by
associating actions (semantic rules) with grammar productions.
Each symbol in the grammar (terminal or non-terminal) can have:

Attributes (values like type, code, value, etc.)

Semantic rules to compute those attributes

There are two main types of attributes:

1. Synthesized attributes – computed from children nodes (bottom-up)


2. Inherited attributes – computed from parent/sibling nodes (top-down or left-to-right)

📘 Infix to Postfix Translation Using Attributed Grammar


Grammar:

E → E + T
E → T
T → T * F
T → F
F → (E)
F → id

Attributes:

Each non-terminal has a synthesized attribute `postfix` which stores the postfix form of the
expression.

Semantic Rules:

E → E1 + T   { E.postfix = E1.postfix || T.postfix || '+' }

E → T        { E.postfix = T.postfix }
T → T1 * F   { T.postfix = T1.postfix || F.postfix || '*' }
T → F        { T.postfix = F.postfix }
F → (E)      { F.postfix = E.postfix }
F → id       { F.postfix = id.lexeme }

🧠 Example: Converting `a + b * c` to Postfix

Parse Tree (based on precedence rules):

          E
        / | \
       E  +  T
       |   / | \
      T   T  *  F
      |   |     |
      F   F     c
      |   |
      a   b

Postfix computation using rules:

`F → id` → `F.postfix = a`, `b`, `c` (one per leaf)

`T → F` → `T.postfix = b` (and `a` on the left branch)
`T → T1 * F` → `T.postfix = b c *`
`E → T` → `E.postfix = a`
`E → E + T` → `E.postfix = a b c * +`

✅ Result: `a b c * +` → Correct postfix for `a + b * c`

🔁 Summary:
Feature Attributed Grammar
Base Structure Context-Free Grammar
Added Elements Attributes + Semantic Rules
Purpose Perform syntax-directed translation

Example Application Converting infix to postfix using synthesized attributes

Would you like this process illustrated with a parse tree diagram or in code form (e.g., Python)?

Show that the following grammar:


S → Aa|bAc|Bc|bBa
A→d
B→d
Is LR(1) but not LALR(1).

To show that the given grammar is LR(1) but not LALR(1), we need to follow these steps:

📌 Grammar:
markdown

1. S → Aa
2. S → bAc
3. S → Bc
4. S → bBa
5. A → d
6. B → d

✅ Step 1: Show it's LR(1)


Build LR(1) items:

Each item is of the form `[production, lookahead]`.


We compute LR(1) items, i.e., items with lookahead tokens. For example:

[S → • Aa, $]
[A → • d, a]
[B → • d, c]

This grammar does not produce conflicts in its full LR(1) parsing table:

Each item is distinct because of its lookahead, so no shift/reduce or reduce/reduce conflict


occurs.

Hence, it is LR(1).

❌ Step 2: Show it's not LALR(1)
To check if it’s not LALR(1):

LALR(1) parsers are created by merging LR(1) items with the same core, i.e., same production
and dot position, ignoring lookahead.

Consider the two LR(1) states reached after reading `d` and after reading `bd`:

State 1: `[A → d•, a]`, `[B → d•, c]`

State 2: `[A → d•, c]`, `[B → d•, a]`

Both states have the same core `{A → d•, B → d•}`, so LALR(1) merges them, and their lookaheads combine: the merged state contains `[A → d•, a/c]` and `[B → d•, a/c]`.

Now consider:

After seeing `d`, we must decide whether to reduce using `A → d` or `B → d`, based on the
lookahead (`a` or `c`).
In the merged state both reductions are possible on both lookaheads, causing a reduce-reduce conflict.

🛑 Conflict:
In LR(1): No conflict due to distinct lookaheads.
In LALR(1): Conflict occurs because both rules are reduced from the same core `[d → •]`.

✅ Conclusion:
✔ The grammar is LR(1) — no conflicts when using full lookaheads.
❌ The grammar is not LALR(1) — because merging items causes a reduce-reduce conflict.
Would you like a step-by-step construction of the LR(1) and LALR(1) tables for this grammar?

Generate the three address code for the following code fragment.
a=b+1 x=y+3 y=a/b a=b+c

Here's how we generate three-address code (TAC) for the given code fragment:

🧾 Given Statements:

1. a = b + 1
2. x = y + 3
3. y = a / b
4. a = b + c

✅ Three Address Code (TAC):
We introduce temporary variables (`t1`, `t2`, etc.) for intermediate computations.

plaintext

t1 = b + 1
a = t1

t2 = y + 3
x = t2

t3 = a / b
y = t3

t4 = b + c
a = t4

💡 Explanation:
Each arithmetic operation is isolated into a separate step using a temporary variable.
This makes optimization and code generation easier in compiler backends.

Would you like me to also generate triples, quadruples, or indirect triples for this code?

Explain the stack allocation strategy with an example.

📚 Stack Allocation Strategy in Compiler Design


Stack allocation is a common memory management strategy used for managing activation records
(function call frames) during program execution. It is widely used in procedural programming
languages like C, Java, and Python.

✅ What is Stack Allocation?


In stack allocation:

Memory is organized as a stack (LIFO - Last In, First Out).


When a function is called, an activation record (stack frame) is pushed onto the stack.

When the function returns, its activation record is popped off the stack.

Each activation record typically contains:

Return address
Parameters
Local variables
Temporaries
Control link (pointer to the caller's frame)

📌 Example:
Consider the following C-like code:

int main() {
int a = 5;
int result = square(a);
return result;
}

int square(int x) {
int y = x * x;
return y;
}

🧱 Stack Behavior:
Let’s trace stack allocation during execution:

1. Call `main()`
   - Push `main`'s activation record:
     - Local variable `a`
     - Local variable `result`
2. Call `square(a)`
   - Push `square`'s activation record:
     - Parameter `x = 5`
     - Local variable `y = 25`
3. Return from `square`
   - Pop `square`'s activation record.
4. Return from `main`
   - Pop `main`'s activation record.
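To make the push/pop behavior concrete, here is a toy Python simulation of exactly this trace. The dict-based frames and helper names (`call`, `ret`) are illustrative; a real runtime lays frames out in contiguous stack memory with return addresses and control links.

python

# Toy simulation of stack allocation for the main()/square() example.
stack = []                                   # the runtime stack of frames

def call(name, local_vars):
    frame = {"function": name, "locals": local_vars}
    stack.append(frame)                      # push activation record
    print(f"push {name:<6} -> depth {len(stack)}")
    return frame

def ret():
    frame = stack.pop()                      # pop activation record
    print(f"pop  {frame['function']:<6} -> depth {len(stack)}")

main_frame = call("main", {"a": 5, "result": None})
sq_frame = call("square", {"x": 5, "y": 5 * 5})
main_frame["locals"]["result"] = sq_frame["locals"]["y"]
ret()                                        # return from square
ret()                                        # return from main
print("result was", main_frame["locals"]["result"])   # 25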

📝 Advantages of Stack Allocation


Simple and efficient
Automatic memory management for functions
Fast allocation/deallocation

❌ Limitations
Not suitable for values that must persist beyond the function call (like dynamic objects or closures).

Stack size is limited (stack overflow is possible).

Would you like a diagram of how the stack frames are organized for this example?

Explain register allocation by graph coloring.

🧠 Register Allocation by Graph Coloring


Register allocation is a phase in a compiler's backend where variables are assigned to a limited
number of CPU registers for faster access instead of memory. One powerful technique to do this is
graph coloring.

🧾 What is Graph Coloring?


In graph theory, graph coloring is the act of assigning colors to the nodes of a graph such that no two
adjacent nodes have the same color.
In register allocation, the idea is:

Each variable is a node in the graph.

An edge is drawn between two nodes if the variables interfere (i.e., are live at the same time).
The number of colors corresponds to the number of available registers.

✅ Steps in Register Allocation by Graph Coloring


1. Liveness Analysis:

Determine which variables are live at each point in the program.


2. Interference Graph Construction:
Create a graph where each node is a variable.
Add edges between variables that are live at the same time.
3. Coloring the Graph:
Try to assign a color (register) to each node such that no two adjacent nodes share a color.

If a variable cannot be colored (i.e., not enough registers), it is spilled to memory.


4. Code Generation:
Replace variable access with the register or memory location according to the assignment.

📌 Example
Consider the following simplified code:

c

a = 1
b = 2
c = a + b
d = c + b
e = d + a

Live variables:

`a` and `b` are live when `c` is computed.


`c` and `b` are live when `d` is computed.
`d` and `a` are live when `e` is computed.

Interference graph:

plaintext

a -- b
a -- d
b -- c
c -- d
d -- e
a -- e

Coloring:

With 3 registers, one valid assignment is:

`a`: R1
`b`: R2
`c`: R3
`d`: R2 (safe, since `d` does not interfere with `b`)
`e`: R3 (safe, since `e` does not interfere with `c`)

This graph is therefore 3-colorable and no spill is needed; with only 2 registers, some variable would have to be spilled to memory. A greedy coloring sketch follows below.
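Here is a minimal greedy-coloring sketch in Python over the interference edges listed above. It is only a sketch: real allocators (e.g., Chaitin-style) use simplification and spill heuristics, and the exact assignment depends on the order in which nodes are visited.

python

# Greedy graph coloring over the interference graph from the example.
edges = [("a", "b"), ("a", "d"), ("b", "c"), ("c", "d"), ("d", "e"), ("a", "e")]
k = 3                                        # number of available registers

neighbors = {}
for u, v in edges:
    neighbors.setdefault(u, set()).add(v)
    neighbors.setdefault(v, set()).add(u)

assignment = {}
for node in sorted(neighbors):               # fixed order keeps the demo deterministic
    taken = {assignment[n] for n in neighbors[node] if n in assignment}
    assignment[node] = next(
        (f"R{r + 1}" for r in range(k) if f"R{r + 1}" not in taken), "SPILL")

print(assignment)
# e.g. {'a': 'R1', 'b': 'R2', 'c': 'R1', 'd': 'R2', 'e': 'R3'}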

🔍 Key Terms
Spilling: Moving a variable from a register to memory due to lack of registers.
Coalescing: Removing unnecessary move instructions by assigning the same register to both
variables.

Chaitin’s Algorithm: A popular register allocation algorithm based on graph coloring.

✅ Advantages
Produces efficient code with fewer loads/stores.
Works well in optimizing compilers.

Would you like a diagram of an interference graph for the example?

Explain about different forms object code forms used as target code in target
code generation.

In target code generation, the compiler translates intermediate representation (IR) into target
(machine) code. This target code may appear in several object code forms, each suitable for a different
stage of compilation or linking.
Here are the major forms of object code used as target code:

1. Absolute Code

Description: Machine code that is ready to be loaded into a fixed memory location and executed.
Fixed addresses are already assigned.
No linking or relocation is needed.

Example:

asm

MOV R1, 1000   ; Load from memory address 1000
ADD R1, R2

Advantages:

Simple to load and execute.

Disadvantages:

Not flexible (can’t relocate or link with other code).


Must know the exact memory addresses beforehand.

2. Relocatable Code

Description: Object code that includes relative addresses instead of fixed ones.
Uses a relocation table that tells the loader how to adjust addresses based on where the program
is loaded into memory.

Example:

Code might use `LOAD R1, VAR1` where `VAR1` is defined as an offset (e.g., +20) in a table.

Advantages:

More flexible than absolute code.


Can be loaded anywhere in memory.

Disadvantages:
Loader must handle relocation.

3. Position-Independent Code (PIC)

Description: Code that can execute correctly regardless of its absolute memory address.
Often used in shared libraries and dynamic linking.

Mechanism:

Uses relative addressing (e.g., address of a variable is computed relative to current instruction
pointer).

Advantages:

Can be loaded at any address.


Ideal for shared libraries.

Disadvantages:

Slightly more complex to generate and manage.

4. Assembly Code

Description: A low-level textual representation of machine code instructions.


Human-readable and can be passed to an assembler to produce relocatable or absolute code.

Example:

asm

MOV R1, #5
ADD R1, R2

Advantages:

Easy to debug and modify.


Platform-specific optimizations possible.

Disadvantages:

Needs further assembly to become executable.

5. Binary Object Code (Object File)

Description: Encoded machine instructions stored in binary format, produced by the compiler or
assembler.
May include:
Code section

Data section

Symbol table
Relocation info

Formats: ELF (Linux), COFF, PE (Windows)


Advantages:

Standard format for linking and loading.


Supports modular programming and separate compilation.

🔁 Summary Table

| Code Form | Fixed Address | Relocatable | Human-Readable | Use Case |
|---|---|---|---|---|
| Absolute Code | Yes | No | No | Simple systems, embedded code |
| Relocatable Code | No | Yes | No | Modular programming, linking |
| Position-Independent | No | Yes | No | Shared libraries, dynamic linking |
| Assembly Code | No | Yes | Yes | Debugging, manual optimization |
| Binary Object Code | Depends | Yes | No | Compiler output, linking |

Would you like an example of how a high-level statement (like `x = a + b`) is translated into one of
these forms?

Explain about next-use, register descriptor, address descriptor data structures


used in simple code generation algorithm.

In simple code generation, particularly for intermediate code like three-address code (TAC), we aim
to generate efficient target code using a limited number of registers. To do this effectively, we use three
important data structures:

🔹 1. Next-Use Information
Purpose:

Tells whether a variable will be used again in the future and when.
Helps in deciding whether to keep a variable in a register or store it back to memory.

Use:

If a variable is not used again, the register holding it can be reused immediately.
If a variable will be used later, it may need to be preserved.

Computed During:

Liveness analysis on the intermediate code (e.g., TAC).

Format Example:

| Statement | x | y | z |
|---|---|---|---|
| x = y + z | no | yes | yes |

🔹 2. Register Descriptor
Purpose:

Keeps track of which variable is currently stored in each register.


Helps avoid unnecessary loads and stores.

Structure:

A table mapping registers to variables.

Example:

| Register | Contents (Variable) |
|---|---|
| R1 | x |
| R2 | y |
| R3 | — (empty) |

Use:

If a variable is already in a register, no need to load it again.


If a register is needed and full, decide which variable to spill based on next-use info.

🔹 3. Address Descriptor
Purpose:

Tells where the current value of each variable is stored (register, memory, or both).
Used to generate correct code and avoid reloading values unnecessarily.

Structure:

A table mapping variables to their locations.

Example:

| Variable | Location(s) |
|---|---|
| x | R1, Memory[x] |
| y | Memory[y] |
| z | R2 |

Use:

When generating code, check if the variable is in a register.


If not, load from memory and update descriptors.

🔁 How They Work Together


Suppose we are generating code for:

x = y + z;

Steps:

1. Check the Address Descriptor for `y` and `z`
   - If either is not in a register, load it and update the Register & Address Descriptors.
2. Perform the addition
   - Store the result in a register (say R1).
   - Update Register Descriptor: R1 → x.
   - Update Address Descriptor: x → R1.
3. Use Next-Use Info
   - If `y` or `z` is not needed later, their registers can be reused.
   - If `x` has no next use, it can be stored directly in memory.
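A small Python snapshot of these three structures at this point might look as follows (the register names and the statement number 14 are made up for illustration):

python

# Descriptor snapshot while generating code for "x = y + z".
next_use = {"x": None, "y": 14, "z": 14}        # per the table above: x no, y yes, z yes
reg_desc = {"R1": "y", "R2": "z", "R3": None}   # what each register currently holds
addr_desc = {"y": {"R1", "mem"}, "z": {"R2"}}   # where each variable's value lives

# Emit "ADD R3, R1, R2" for x = y + z, then update the descriptors:
reg_desc["R3"] = "x"
addr_desc["x"] = {"R3"}

# x has no next use, so its value can go straight to memory and R3 be freed:
if next_use["x"] is None:
    addr_desc["x"].add("mem")                   # corresponds to emitting "MOV x, R3"
    reg_desc["R3"] = None

print(reg_desc, addr_desc)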

✅ Summary Table

| Data Structure | Tracks | Purpose |
|---|---|---|
| Next-Use Info | Future uses of variables | Help decide register reuse or spilling |
| Register Descriptor | Variable(s) in each register | Avoid redundant loads |
| Address Descriptor | Location(s) of each variable's value | Generate correct and efficient code |

Would you like to see a worked-out example of how these are used in code generation for a specific TAC
block?

Write simple code generation algorithm.

✅ Simple Code Generation Algorithm


The simple code generation algorithm takes intermediate code (like three-address code) and
generates target machine instructions, while trying to use registers efficiently. It uses the following key
data structures:

Register Descriptor – Tracks what variables are in which registers.


Address Descriptor – Tracks where each variable's current value resides (register or memory).
Next-Use Info – Helps decide whether to keep or discard variables in registers.

📘 Algorithm Outline (for one three-address code statement)


Let's assume a general TAC statement:

plaintext

x = y op z

Steps:

1. Check Address Descriptor for y and z


If y or z is not in a register, generate code to load them from memory into a register.
Update both address and register descriptors.
2. Choose a register r to store result x
Prefer an empty register or reuse one holding a variable with no next use.
If all are in use, pick one and spill (store) its variable to memory if needed.
3. Generate instruction:

plaintext

r = y op z

4. Update descriptors:
`Register Descriptor[r] = x`

`Address Descriptor[x] = r`

Remove x from all other registers.


If y or z is not live anymore (based on next-use info), free their registers.
5. If x has no further use, store it to memory (optional optimization).

🧾 Pseudocode:
plaintext

For each statement s: x = y op z

    Ensure y is in a register Ry
    Ensure z is in a register Rz

    Choose a register R to hold x
        - If R is occupied and its value is live, store that value to memory
        - Free R

    Generate: R = Ry op Rz

    Update Register Descriptor: R -> x
    Update Address Descriptor: x -> R
    Remove x from all other registers

    If y or z is not used later:
        - Free their registers

💡 Example:
Given: `a = b + c`, with initial Address Descriptor:

plaintext

b in R1
c in memory

Steps:

1. Load `c` into register R2 → `MOV R2, c`
2. `ADD R3, R1, R2` → result in R3
3. Store `a` → `MOV a, R3`
4. Update descriptors:
   - RegDesc: R3 → a
   - AddrDesc: a → R3, mem[a]
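For completeness, here is a condensed Python sketch of the whole loop. The helpers `ensure_in_reg` and `spill_one` are illustrative, next-use handling is omitted, and spilling just evicts an arbitrary victim; a real generator would consult next-use information as described above.

python

# Condensed simple code generator for statements of the form x = y op z.
def codegen(block, num_regs=3):
    reg_of, out = {}, []                      # register descriptor, emitted code
    free = [f"R{i + 1}" for i in range(num_regs)]

    def spill_one():
        victim, r = next(iter(reg_of.items()))
        out.append(f"MOV {victim}, {r}")      # store the victim back to memory
        del reg_of[victim]
        return r

    def ensure_in_reg(v):
        if v in reg_of:
            return reg_of[v]                  # already in a register: no load
        r = free.pop(0) if free else spill_one()
        out.append(f"MOV {r}, {v}")           # load v from memory
        reg_of[v] = r
        return r

    for x, y, op, z in block:                 # each statement: x = y op z
        ry, rz = ensure_in_reg(y), ensure_in_reg(z)
        r = free.pop(0) if free else spill_one()
        out.append(f"{op} {r}, {ry}, {rz}")
        reg_of[x] = r                         # address descriptor would record x -> r
    return out

for line in codegen([("a", "b", "ADD", "c")]):
    print(line)                               # MOV R1, b / MOV R2, c / ADD R3, R1, R2

Unlike the worked example above, the sketch starts with all registers empty, so `b` is loaded as well.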

Would you like this implemented for a full TAC block or translated to a specific target like x86 or MIPS?

Generate simple assembly language target code for the following intermediate
code statements using simple code generator. Assume that target machine has
three registers (R1, R2, R3). Initially all registers are empty and no variable is
live at the end of all statements
T1 = a*b
T2 = a*c
T3 = T1 + b
T4 = a+T2
T5 = b + T4

To generate simple assembly language target code using a simple code generator, we assume the
following:

Machine has 3 registers: `R1`, `R2`, and `R3`.


Each operation is in three-address form.
All registers are initially empty.
No variable is live at the end, so we don’t need to save anything back to memory unless required
mid-way.

📌 Intermediate Code:
plaintext

T1 = a * b
T2 = a * c
T3 = T1 + b
T4 = a + T2
T5 = b + T4

📘 Target Code Generation


1. `T1 = a * b`

asm

MOV R1, a     ; Load 'a' into R1
MUL R1, b     ; R1 = a * b
              ; T1 → R1

2. `T2 = a * c`

asm

MOV R2, a     ; Load 'a' into R2 (R1 cannot be reused; T1 is in R1)
MUL R2, c     ; R2 = a * c
              ; T2 → R2

3. `T3 = T1 + b` (T1 in R1)

asm

MOV R3, R1    ; Copy T1 from R1 to R3
ADD R3, b     ; R3 = T1 + b
              ; T3 → R3

4. `T4 = a + T2` (T2 is in R2)

asm

MOV R1, a     ; Reuse R1, since T1 is no longer used
ADD R1, R2    ; R1 = a + T2
              ; T4 → R1

5. `T5 = b + T4` (T4 is in R1)

asm

MOV R2, b     ; Reuse R2, since T2 is no longer used
ADD R2, R1    ; R2 = b + T4
              ; T5 → R2

✅ Final Assembly Code


asm

MOV R1, a     ; R1 = a
MUL R1, b     ; R1 = a * b   (T1)

MOV R2, a     ; R2 = a
MUL R2, c     ; R2 = a * c   (T2)

MOV R3, R1    ; R3 = T1
ADD R3, b     ; R3 = T1 + b  (T3)

MOV R1, a     ; R1 = a       (T1 overwritten)
ADD R1, R2    ; R1 = a + T2  (T4)

MOV R2, b     ; R2 = b       (T2 overwritten)
ADD R2, R1    ; R2 = b + T4  (T5)

Notes:

We reused registers carefully to avoid overwriting needed values.


Since no values are live at the end, no stores to memory are necessary.

Register pressure was handled within 3 registers.

Would you like this code adapted for a specific real-world architecture like x86 or MIPS?

Explain various object code forms used in compilation.

In compiler design, the object code is the output of the code generation phase. It is the machine-level
representation of the source code or an intermediate form ready for linking and execution. There are
several object code forms, depending on the target machine, the execution model, and the compiler
design strategy.

🔹 Various Object Code Forms


1. Absolute Code

This is complete machine code that can be loaded directly into memory and executed.
Memory addresses are fixed.
No linking or relocation is required.
Usually generated when the exact memory location is known at compile time.

Example:

asm

LOAD R1, 500    ; Load value from address 500 into R1
ADD R1, 600     ; Add value at address 600
STORE R1, 700   ; Store result at address 700

Advantages:

Simple and fast to load and execute.

Disadvantages:

Not flexible; code must be generated for specific memory addresses.


Difficult to use in multi-module programs.

2. Relocatable Code

Code generated with relative addresses (e.g., offsets), not absolute addresses.
A relocation loader adjusts the addresses at load time.
Useful when code location in memory isn't known at compile time.

Example:

asm

LOAD R1, [Base + 0]    ; Base is assigned at load time
ADD R1, [Base + 4]
STORE R1, [Base + 8]

Advantages:

Code can be loaded at different memory locations.


Useful for dynamic linking and memory management.

Disadvantages:

Slightly more complex loading process.

3. Position-Independent Code (PIC)

Code that executes correctly regardless of where in memory it is loaded.
Often used in shared libraries and dynamic loading.
Uses registers or relative addressing rather than fixed addresses.

Example:

asm

MOV R1, [RIP + offset] ; RIP-relative addressing (x86_64)

Advantages:

Excellent for shared code across processes.


Increases flexibility in modern OS environments.

Disadvantages:

May involve slight performance overhead due to indirect addressing.

4. Assembly Code (Symbolic Object Code)

Human-readable intermediate code often generated before machine code.


Contains mnemonics and labels, not raw machine instructions.
Passed to an assembler to generate actual machine code.

Example:

asm

MOV R1, a
ADD R1, b
MOV c, R1

Advantages:

Easier to debug and analyze.


Useful in early stages or for optimization/debugging.

Disadvantages:

Needs an additional assembly step.

5. Binary Object File

The final object code stored in a standard file format (like ELF, COFF, or Mach-O).
Contains:
Machine code
Symbol tables
Relocation info
Debugging info
Used by the linker to generate the final executable.

🔸 Summary Table

| Form | Fixed Address | Needs Linking | Human-readable | Use Case |
|---|---|---|---|---|
| Absolute Code | Yes | No | No | Embedded systems, bootloaders |
| Relocatable Code | No | Yes | No | Modular applications |
| Position-Independent | No | Optional | No | Shared libraries, ASLR |
| Assembly Code | No | Yes | Yes | Debugging, optimization |
| Binary Object File | No | Yes | No | Final output for execution |

Would you like a diagram to visualize these object code forms and how they fit into the compilation
pipeline?

Explain how does register allocation techniques affect program performance.

Register allocation is a key optimization technique in the code generation phase of a compiler. It
determines how variables and intermediate values are assigned to a limited number of CPU registers.
Since registers are the fastest form of storage, efficient register allocation significantly improves
program performance.

🔹 How Register Allocation Affects Program Performance


1. Reduced Memory Access

Accessing memory (RAM) is much slower than accessing registers.


Good register allocation minimizes the need to load/store values from memory, reducing
execution time.
Poor allocation leads to spilling, where variables are temporarily stored in memory due to a
shortage of registers.

Impact: Fewer memory operations → faster execution.

2. Instruction Count Minimization

With optimal register usage, intermediate results can stay in registers across multiple instructions.
Reduces the total number of load, store, and move instructions.

Example:
Without good allocation:

asm

LOAD R1, a
STORE R1, temp1
LOAD R2, b
ADD R1, R2
STORE R1, temp2

With good allocation:

asm

LOAD R1, a
ADD R1, b

Impact: Fewer instructions → less CPU time and code size.

3. Pipeline and Cache Optimization

Registers help reduce pipeline stalls and improve instruction-level parallelism.


Fewer memory accesses reduce cache misses, especially in tight loops.

Impact: Better CPU utilization and performance.

4. Improved Loop Performance

Registers are critical in loops where variables are repeatedly accessed.


Allocating loop-invariant or frequently-used variables to registers accelerates loop execution.

Impact: Dramatic performance gain in compute-heavy loops.

🔸 Register Allocation Techniques

| Technique | Description | Performance Impact |
|---|---|---|
| Graph Coloring | Allocates registers using interference graphs; tries to avoid spilling. | High (global optimization) |
| Linear Scan | Fast, suitable for JIT compilers; less optimal than graph coloring. | Medium |
| Priority-Based | Assigns registers based on usage frequency or loop nesting. | Medium to High |
| Greedy Algorithms | Simplified allocation for small code blocks. | Low to Medium |

🔹 Conclusion
Efficient register allocation:

Reduces execution time


Minimizes memory usage
Improves CPU throughput
Leads to smaller, faster binaries

Inefficient allocation can severely bottleneck performance, especially in performance-critical


applications like games, scientific computation, or embedded systems.
Would you like a visual example of how spilling and optimal allocation affect a code block?

Explain peephole optimization on target assembly language programs with


examples.

Peephole optimization is a local code optimization technique that examines a small set of
instructions (called a peephole, usually a few lines at a time) and attempts to replace inefficient
instruction sequences with more efficient ones without changing the program's semantics.
It is called "peephole" because it only looks at a small portion of the code at a time — like looking
through a peephole.

🔸 Objectives of Peephole Optimization:


Eliminate redundant instructions
Replace inefficient instructions with efficient ones
Simplify address calculations
Optimize jumps and branches

🔹 Common Peephole Optimization Techniques


1. Redundant Load/Store Elimination

If a value is already in a register, loading it again is unnecessary.


Before:

asm

LOAD R1, a
LOAD R1, a ; redundant

After:

asm

LOAD R1, a

2. Algebraic Simplifications
Replace operations with simpler equivalents.
Before:

asm

MUL R1, 2 ; R1 = R1 * 2

After:

asm

SHL R1, 1 ; Shift left is faster than multiplication

3. Strength Reduction

Replace expensive operations with cheaper ones.


Before:

asm

MUL R1, 8

After:

asm

SHL R1, 3 ; Multiply by 8 using shift left by 3

4. Unreachable Code Elimination

Code that follows an unconditional jump or return can be removed.

Before:

asm

JMP L1
MOV R1, 0 ; Unreachable
L1:

After:

asm

JMP L1
L1:

5. Jump-to-Jump Optimization

Jumping to another jump can be simplified.


Before:

asm

JMP L1
...
L1: JMP L2

After:

asm

JMP L2

6. Use of Efficient Instructions

Some architectures have optimized instructions for common patterns.


Before:

asm

LOAD R1, #0
ADD R1, R2

After:

asm

MOVE R1, R2 ; if MOVE is more efficient

🔹 Example: Before and After Peephole Optimization


Before Optimization:

asm

LOAD R1, a
LOAD R2, b
ADD R1, R2
STORE R1, a
LOAD R1, a ; redundant

After Optimization:

asm

LOAD R1, a
LOAD R2, b
ADD R1, R2
STORE R1, a
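As a sketch of how such a pass can be automated, the toy Python function below applies two of the rewrites shown above, redundant-load elimination and jump-to-jump shortening, to a list of assembly-like strings (the instruction syntax is illustrative):

python

def peephole(code):
    out = []
    for instr in code:
        # Redundant load: drop a LOAD that exactly repeats the previous one.
        if out and instr.startswith("LOAD") and out[-1] == instr:
            continue
        out.append(instr)

    # Jump-to-jump: JMP L1 where "L1: JMP L2" becomes JMP L2.
    target = {}
    for i, instr in enumerate(out):
        if instr.endswith(":") and i + 1 < len(out) and out[i + 1].startswith("JMP"):
            target[instr[:-1]] = out[i + 1].split()[1]
    return [f"JMP {target[i.split()[1]]}"
            if i.startswith("JMP") and i.split()[1] in target else i
            for i in out]

print(peephole(["LOAD R1, a", "LOAD R1, a", "JMP L1", "L1:", "JMP L2"]))
# -> ['LOAD R1, a', 'JMP L2', 'L1:', 'JMP L2']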

🔸 Summary

| Feature | Description |
|---|---|
| Scope | Small window of instructions |
| Speed | Fast and simple |
| Type | Local optimization |
| Examples | Redundant loads, jump simplification, algebraic optimizations |

Peephole optimizations are often the last phase of compilation, applied just before or during code
generation, and can significantly improve performance and reduce code size.
Would you like a live example or illustration of how a peephole optimization engine might work?

Explain how relocatable object code form helps in cross-platform


compatibility.

🔧 Relocatable Object Code and Cross-Platform Compatibility


Relocatable object code is a format in which the code and data can be loaded at different memory
addresses without requiring recompilation. This flexibility is essential in modern systems and
contributes to cross-platform compatibility and modular software design.

🔹 What is Relocatable Object Code?


Relocatable object code is produced by a compiler or assembler and:

Contains placeholders or symbolic references for memory addresses.


Has relocation information to adjust those addresses during linking or loading.
Is not bound to a fixed memory address.

🔹 How It Works
When a program is compiled:

Absolute addresses are not hardcoded.


The object code contains symbolic names and relative addresses.
During the linking/loading phase, these addresses are resolved and relocated to match the
target system's memory layout.

🔹 Benefits for Cross-Platform Compatibility


| Feature | How It Helps |
|---|---|
| ✅ Platform Independence of Addressing | Code does not rely on fixed memory addresses; the relocator adjusts for different systems. |
| ✅ Modular Compilation | Code modules compiled separately can be linked on different systems. |
| ✅ Dynamic Linking & Loading | Shared libraries or DLLs can be loaded and relocated at runtime. |
| ✅ Portability Across Systems | The same object code can be linked differently on different architectures. |
| ✅ Ease of Integration | Object files from different compilers/platforms can be integrated into the same program. |

🔸 Example
Suppose we compile the following C code:

c

int sum(int a, int b) {
    return a + b;
}

The compiler will produce relocatable object code for the `sum` function, where:

The code does not specify an exact address for `sum`.


The linker/loader on Windows or Linux can relocate it to the correct place in memory.

This same object file might be used:

On a Windows system in one memory segment.


On a Linux system at a different address.
Or in an embedded system, with a custom loader relocating it.

🔹 Summary
Relocatable object code:

Enables flexibility in where code and data reside in memory.


Makes software more modular, reusable, and portable.
Is crucial for supporting multiple architectures and operating systems with minimal changes.

Would you like a diagram showing the relocation process from compilation to execution?

Explain which phases of compilation are machine independent and which are
machine dependent?

In a compiler, different phases handle various aspects of translating source code into machine code.
These phases can be classified into machine-independent and machine-dependent based on whether
they depend on the details of the target hardware architecture.

🔷 Machine-Independent Phases
These phases deal with language syntax and semantics, not the target hardware. They are usually the
front end of the compiler:

1. Lexical Analysis
Converts the source code into tokens.
Example: Identifies keywords, identifiers, operators, etc.
❌ No hardware dependency.
2. Syntax Analysis (Parsing)
Checks for grammatical structure using context-free grammars.
Builds a parse tree or syntax tree.
❌ Hardware independent.
3. Semantic Analysis

Ensures correct meaning of syntactically valid statements.


Example: Type checking, scope resolution.
❌ Independent of hardware.
4. Intermediate Code Generation
Produces a generic, low-level representation of the program.
Example: Three-address code (TAC), abstract syntax trees.
✅ Platform-neutral form.
5. Optimization (Machine-Independent)
Improves code performance without considering specific machine architecture.
Example: Constant folding, loop unrolling.
❌ Not tied to hardware.
🔷 Machine-Dependent Phases
These phases depend on the target machine architecture (e.g., number of registers, instruction set).
They form the back end of the compiler:

1. Code Generation
Converts intermediate code to machine code or assembly.
Takes into account registers, addressing modes, and instruction formats.
✅ Tightly tied to hardware.
2. Optimization (Machine-Dependent)
Improves performance based on specific hardware.
Example: Instruction scheduling, pipeline optimizations.
✅ Depends on target CPU.
3. Register Allocation and Assignment

Maps variables to machine registers.
✅ Depends on the number and usage of CPU registers.
4. Instruction Selection and Scheduling
Chooses optimal instructions and reorders them for better CPU utilization.
✅ Target-machine-specific.
🔸 Summary Table

| Phase | Machine Independent | Machine Dependent |
|---|---|---|
| Lexical Analysis | ✅ Yes | ❌ No |
| Syntax Analysis | ✅ Yes | ❌ No |
| Semantic Analysis | ✅ Yes | ❌ No |
| Intermediate Code Generation | ✅ Yes | ❌ No |
| Machine-Independent Optimization | ✅ Yes | ❌ No |
| Code Generation | ❌ No | ✅ Yes |
| Machine-Dependent Optimization | ❌ No | ✅ Yes |
| Register Allocation | ❌ No | ✅ Yes |
Would you like a visual diagram showing how front-end and back-end phases interact?

Perform available expression analysis on the following intermediate code


statements numbered from 1 to 9
i) Argue whether a*b expression is available at statement 6 and
statement 8
ii) Argue whetherb*c expression is available at statement5 and
statement 9
1 x = a*b
2 y = b*c
3 if a>20 goto 6
4 z = a*b
5 w = b*c
6 p = a*b
7 a = a+20
8 q = a*b
9 r = b*c

To perform available expression analysis, we analyze where expressions like `a*b` or `b*c` are
available — meaning they have already been computed earlier and none of the variables involved
have been redefined in the meantime.

🔷 Step-by-Step: Available Expression Analysis


We need to trace where expressions are computed, and whether the variables involved (`a`, `b`, `c`)
are redefined in between uses.
Let’s annotate each line first:

| Line | Statement | Expression Generated | Expression Killed (invalidated) |
|---|---|---|---|
| 1 | `x = a*b` | `a*b` | None |
| 2 | `y = b*c` | `b*c` | None |
| 3 | `if a > 20 goto 6` | — | None |
| 4 | `z = a*b` | `a*b` | None |
| 5 | `w = b*c` | `b*c` | None |
| 6 | `p = a*b` | `a*b` | None |
| 7 | `a = a + 20` | — | All expressions involving `a` (e.g., `a*b`, `a*c`) |
| 8 | `q = a*b` | `a*b` | None |
| 9 | `r = b*c` | `b*c` | None |

🔶 i) Is `a*b` available at statement 6 and 8?

🔹 At Statement 6:
Two paths reach statement 6: 1 → 2 → 3 → 6 (branch taken) and 1 → 2 → 3 → 4 → 5 → 6 (fall-through).
`a*b` is computed at line 1 on the first path, and at lines 1 and 4 on the second.
On both paths, neither `a` nor `b` is redefined before statement 6.
✅ Yes, `a*b` is available at statement 6

🔹 At Statement 8:
Statement 7 (`a = a + 20`) lies on the only path into statement 8 and redefines `a`.
Since `a` changes at line 7, any previously computed `a*b` is invalidated.
❌ No, `a*b` is not available at statement 8

🔶 ii) Is `b*c` available at statement 5 and 9?

🔹 At Statement 5:
`b*c` is computed at line 2
No change to `b` or `c` between line 2 and line 5
✅ Yes, `b*c` is available at statement 5

🔹 At Statement 9:
No redefinition of `b` or `c` occurs between line 2 and line 9
Thus, the result of `b*c` at line 2 is still valid
✅ Yes, `b*c` is available at statement 9

✅ Final Answers
(i)

`a*b` is available at statement 6: ✅ Yes
`a*b` is not available at statement 8: ❌ No

(ii)

`b*c` is available at statement 5: ✅ Yes
`b*c` is available at statement 9: ✅ Yes
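The same conclusions fall out of the standard forward "must" dataflow formulation. Below is a compact Python sketch of available-expressions analysis for these nine statements; the encoding of the statements and the branch at statement 3 (`succ[3] = [4, 6]`) follows the code above, and the kill rule simply checks whether the assigned variable occurs in an expression.

python

# Available-expressions dataflow over the 9-statement example.
stmts = {1: ("x", "a*b"), 2: ("y", "b*c"), 3: (None, None), 4: ("z", "a*b"),
         5: ("w", "b*c"), 6: ("p", "a*b"), 7: ("a", None), 8: ("q", "a*b"),
         9: ("r", "b*c")}
succ = {1: [2], 2: [3], 3: [4, 6], 4: [5], 5: [6], 6: [7], 7: [8], 8: [9], 9: []}
pred = {n: [m for m in stmts if n in succ[m]] for n in stmts}
exprs = {"a*b", "b*c"}

IN = {n: set() for n in stmts}
OUT = {n: set(exprs) for n in stmts}          # optimistic initialization
changed = True
while changed:
    changed = False
    for n in sorted(stmts):
        new_in = set(exprs) if pred[n] else set()
        for p in pred[n]:
            new_in &= OUT[p]                  # must hold on every incoming path
        target, e = stmts[n]
        kill = {x for x in exprs if target in (x.split("*") if target else [])}
        gen = {e} - kill if e else set()
        new_out = (new_in - kill) | gen
        if new_in != IN[n] or new_out != OUT[n]:
            IN[n], OUT[n], changed = new_in, new_out, True

print("a*b available at 6:", "a*b" in IN[6])  # True
print("a*b available at 8:", "a*b" in IN[8])  # False (killed by statement 7)
print("b*c available at 5:", "b*c" in IN[5])  # True
print("b*c available at 9:", "b*c" in IN[9])  # True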

Would you like a control flow graph (CFG) to visualize these paths and dataflow?

Consider the following intermediate code statements numbered from 1 to 12


1 i=0
2 if i<n goto 4
3 goto 11
4 t1 = 4*i
5 t2 = a[t1]
6 t3 = t2 + 1
7 t4 = 4*i
8 b[t4] = t3
9 i= i+1
10 goto 2
11 t5 = 4*i
12 b[t5] = 0
Construct a control flow graph for the given code and explain which of the 4*i
computation in statement 7 and statement 11 are redundant.

To analyze redundancy in the computation of `4*i` in statements 7 and 11, we need to:

1. Construct the Control Flow Graph (CFG)


2. Trace the flow of expressions, particularly `4*i`
3. Determine if `4*i` is already computed and not invalidated before its reuse

🔷 Step 1: Intermediate Code Statements


Let’s restate the code for clarity:

plaintext

1 i = 0
2 if i < n goto 4
3 goto 11
4 t1 = 4 * i
5 t2 = a[t1]
6 t3 = t2 + 1
7 t4 = 4 * i

8 b[t4] = t3
9 i = i + 1
10 goto 2
11 t5 = 4 * i
12 b[t5] = 0

🔷 Step 2: Control Flow Graph (CFG)


Here's how the control flow works:

plaintext

+------+
| 1 |
+--+---+
|
v
+--+---+
| 2 |<-----------+
+--+---+ |
/ \ |
true false |
/ \ |
v v |
+---+ +----+ |
| 4 | | 11 | |
+---+ +----+ |
| | |
v v |
+---+ +----+ |
| 5 | | 12 | |
+---+ +----+ |
| |
v |
+---+ |
| 6 | |
+---+ |
| |
v |
+---+ |
| 7 | |
+---+ |
| |
v |
+---+ |
| 8 | |
+---+ |
| |
v |
+---+ |
| 9 | |
+---+ |
| |
v |
+---+ |
|10 |----------------------+
+---+

🔷 Step 3: Redundancy Analysis


Let’s track the expression `4 * i`:

✅ Statement 4: `t1 = 4 * i`

First computation of `4*i`

✅ Statement 7: `t4 = 4 * i`

Same expression `4*i` appears again


Between statement 4 and 7:
`i` is not modified
So, `4*i` is already available in `t1`

👉 Therefore, statement 7 is redundant.


It can be replaced with `t4 = t1`

✅ Statement 11: `t5 = 4 * i`

To see if this is redundant:

The only modification to `i` occurs at statement 9: `i = i + 1`


So `i` has changed before we reach statement 11 (after the loop ends)
Even though `4*i` was computed earlier, the value of `i` has changed

👉 Therefore, statement 11 is not redundant.


`4*i` must be recomputed here.

✅ Final Answer
Statement 7: Redundant (same `4*i` already computed in statement 4, and `i` has not changed)
Statement 11: Not Redundant (`i` has been modified in statement 9)

Would you like a diagram of the control flow graph to visualize this better?

make the control flow graph with basic blocks

Here's the Control Flow Graph (CFG) with clearly defined basic blocks:

B1: Statement 1 (initialization)
B2: Statement 2 (loop condition; it is the target of the `goto` at statement 10, so it starts its own block)
B3: Statement 3 (exit jump taken when the condition fails)
B4: Statements 4 to 10 (loop body)
B5: Statements 11 and 12 (post-loop actions)

Redundancy Analysis for `4*i` Computation:

Statement 7 (`t4 = 4*i`) is redundant because `4*i` was already computed in statement 4 and `i`
hasn't changed between 4 and 7.
Statement 11 (`t5 = 4*i`) is not redundant since `i` is updated in statement 9 (`i = i + 1`) before
reaching 11, hence the value of `4*i` has changed.
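For reference, the block boundaries can also be derived mechanically with the standard leader rules (the first statement, every jump target, and every statement following a jump). A short Python sketch with this example's jumps hard-coded:

python

# Basic-block leaders for the 12-statement code.
n_stmts = 12
jumps = {2: 4, 3: 11, 10: 2}       # source -> target (statement 2 is conditional)

leaders = {1}                      # rule 1: the first statement is a leader
for src, tgt in jumps.items():
    leaders.add(tgt)               # rule 2: any jump target is a leader
    if src + 1 <= n_stmts:
        leaders.add(src + 1)       # rule 3: the statement after a jump is a leader

ordered = sorted(leaders)
blocks = [(start, (ordered[i + 1] - 1) if i + 1 < len(ordered) else n_stmts)
          for i, start in enumerate(ordered)]
print(blocks)                      # [(1, 1), (2, 2), (3, 3), (4, 10), (11, 12)]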

Would you like this CFG saved as an image or explained in more detail?

Show the contents of the activation record for the function call f(6) where
definition of f is given as follows and f is called from main function
int f(int n){
if n==0 return 1
else return n * f(n-1)
}

To show the activation record (AR) for the function call `f(6)` from `main`, we need to understand how
recursion builds a stack of activation records, one for each call to `f(n)` until `n == 0`.

🔍 What is an Activation Record?


An activation record (also called a stack frame) is a collection of information maintained during a
function call. It typically contains:

Return address (to return after the call)


Actual parameters (like `n`)
Local variables
Temporary values
Saved registers (optional)
Control link (link to the caller’s AR)

📌 Given Function:
c

int f(int n) {
if(n == 0) return 1;
else return n * f(n - 1);
}

🧠 Function Call: `f(6)`

This results in recursive calls:


`f(6) → f(5) → f(4) → f(3) → f(2) → f(1) → f(0)`

Each call creates an activation record and pushes it onto the call stack.

📋 Example Activation Record Structure (for `f(n)`):

| Field | Example (for `f(3)`) |
|---|---|
| Return Address | Line after the `f(3)` call |
| Parameter `n` | 3 |
| Return Value | (computed later) |
| Control Link | Pointer to `f(4)`'s AR |

🧮 Stack of Activation Records at Deepest Call ( `f(0)`):

plaintext

Top of Stack
-------------
| f(0) AR |
| n = 0 |
-------------
| f(1) AR |
| n = 1 |
-------------
| f(2) AR |
| n = 2 |
-------------
| f(3) AR |
| n = 3 |
-------------
| f(4) AR |
| n = 4 |
-------------
| f(5) AR |
| n = 5 |
-------------
| f(6) AR |
| n = 6 |
-------------
| main AR |
-------------
Bottom of Stack

Each activation record holds the parameter `n`, a return address, and a link to the caller's frame.
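A tiny Python tracer makes the push/pop sequence visible (illustrative only; the `depth` parameter stands in for the number of frames currently on the stack):

python

def f(n, depth=0):
    print("  " * depth + f"push f({n})")
    result = 1 if n == 0 else n * f(n - 1, depth + 1)
    print("  " * depth + f"pop  f({n}) -> returns {result}")
    return result

print("f(6) =", f(6))   # 720, after 7 activation records are pushed and popped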

Would you like a visual diagram of the activation record stack for this recursive call?

Write a brief note on structure preserving transfromations.

📘 Structure Preserving Transformations


Structure preserving transformations are a category of code optimizations applied during
intermediate code optimization in compilers. These transformations improve the efficiency of the
code without changing the structure or the flow of the original program. That means the control flow
graph (CFG) of the program remains unchanged, even though individual instructions may be
optimized.

🔍 Key Characteristics:
Do not alter the flow of control.
Preserve program structure, making them safe and easy to apply.
Usually applied locally (within a basic block) or globally (across basic blocks).

✅ Common Structure Preserving Transformations:


1. Common Subexpression Elimination (CSE)
Avoid recomputing expressions that have already been evaluated.

// Before
x = a + b;
y = a + b;
// After
x = a + b;
y = x;

2. Copy Propagation
Replaces the occurrences of variables that are simply copies of other variables.

// Before
x = y;
z = x + 1;
// After
z = y + 1;

3. Dead Code Elimination


Removes code that computes values never used later.

// Before
x = a + b;
// x is never used
// After
// (x = a + b;) is removed

4. Constant Folding
Evaluates constant expressions at compile time.

c

// Before
x = 3 * 4;
// After
x = 12;

5. Strength Reduction
Replaces expensive operations with cheaper ones.

// Before
x = y * 2;
// After
x = y + y;

📌 Why Important?
Improves performance (speed and/or size)
Keeps program semantics unchanged
Makes further optimizations possible

Would you like examples or a visual showing how these work in a control flow graph?

Draw the activation tree for the function call f(5) where definition of f is given
as follows
int f(int n){
if n==0 return 0
else if n==1 return 1
else return f(n-1) + f(n-2)
}

To draw the activation tree for the function call `f(5)` for the Fibonacci-like function:

c

int f(int n) {
    if (n == 0) return 0;
    else if (n == 1) return 1;
    else return f(n - 1) + f(n - 2);
}

We visualize the function calls as a tree, where each node represents a call to `f(n)` and its children
represent the recursive calls it makes.

📐 Activation Tree for `f(5)`

plaintext

                          f(5)
                       /        \
                  f(4)            f(3)
                /      \         /     \
            f(3)      f(2)    f(2)    f(1)
           /    \     /   \   /   \
        f(2)  f(1) f(1) f(0) f(1) f(0)
        /  \
     f(1) f(0)

🔍 Explanation:
Each node `f(n)` makes:

A call to `f(n-1)`
A call to `f(n-2)`

The recursion terminates when `n == 0` or `n == 1`, which are the base cases.

📊 Number of Function Activations:


The number of function calls grows exponentially, following the Fibonacci growth. The activation tree
reflects this by repeating many subcomputations, e.g., `f(2)`, `f(1)` appear multiple times.
Would you like this as a diagram image too?

Draw the activation tree for the function call f(5) where definition of f is given
as follows
int f(int n){
if n==0 return 0
else if n==1 return 1
else return f(n-1) + f(n-2)
}

show it like tree no need of image

Here is the activation tree for the function call `f(5)` based on the recursive Fibonacci-like function:

plaintext

                          f(5)
                       /        \
                  f(4)            f(3)
                /      \         /     \
            f(3)      f(2)    f(2)    f(1)
           /    \     /   \   /   \
        f(2)  f(1) f(1) f(0) f(1) f(0)
        /  \
     f(1) f(0)

Explanation:

Each node `f(n)` branches into two children: `f(n-1)` and `f(n-2)`.
The base cases are `f(0)` and `f(1)`, which do not branch further.
This tree illustrates the recursive calls that occur during the evaluation of `f(5)`.

Would you like a table showing the number of times each `f(n)` is called?

Write quadruple, triple , and indirect triple for the following statement
x = y*-z + y*-z

Let's break the expression `x = y * -z + y * -z` into smaller parts and generate the Three Address
Code (TAC), then derive:

1. Quadruples
2. Triples
3. Indirect Triples

✅ Three Address Code (TAC)


We simplify the expression step by step:

plaintext

t1 = -z
t2 = y * t1
t3 = -z
t4 = y * t3
t5 = t2 + t4
x = t5

📘 Quadruples

| Index | Operator | Arg1 | Arg2 | Result |
|---|---|---|---|---|
| 1 | UMINUS | z | — | t1 |
| 2 | * | y | t1 | t2 |
| 3 | UMINUS | z | — | t3 |
| 4 | * | y | t3 | t4 |
| 5 | + | t2 | t4 | t5 |
| 6 | = | t5 | — | x |

📘 Triples

| Index | Operator | Arg1 | Arg2 |
|---|---|---|---|
| 1 | UMINUS | z | — |
| 2 | * | y | (1) |
| 3 | UMINUS | z | — |
| 4 | * | y | (3) |
| 5 | + | (2) | (4) |
| 6 | = | x | (5) |

(In the assignment triple, the target `x` is the first argument and the value `(5)` the second.)

📘 Indirect Triples
We store pointers (indexes) into the triple table:

Pointer Table:

| Index | Triple Index |
|---|---|
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 4 |
| 5 | 5 |
| 6 | 6 |

The triples themselves remain the same as above.
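A small Python sketch that encodes both tables makes the indirection explicit (the tuple-vs-string convention for arguments is just an illustration):

python

# Triple table and indirect-triple pointer table for x = y * -z + y * -z.
triples = [
    ("UMINUS", "z",  None),
    ("*",      "y",  (1,)),
    ("UMINUS", "z",  None),
    ("*",      "y",  (3,)),
    ("+",      (2,), (4,)),
    ("=",      "x",  (5,)),
]
pointer_table = list(range(1, len(triples) + 1))   # indirect triples: one pointer each

def fmt(arg):
    if arg is None:
        return "-"
    return f"({arg[0]})" if isinstance(arg, tuple) else str(arg)

for ptr in pointer_table:                          # statements execute in pointer order
    op, a1, a2 = triples[ptr - 1]
    print(f"{ptr}: {op} {fmt(a1)} {fmt(a2)}")

Reordering statements during optimization only permutes `pointer_table`; the triples themselves never move, which is the point of the indirect form.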

Would you like a visual diagram of this structure?

What is an induction variable and identify the induction variables in the
following statements in C
for( i=0,j=1,k=2,l=4 ; i< n ; i++){
j= j+2
k = k+3
l = l+4
sum = sum + i*j*k
}

✅ What is an Induction Variable?


An induction variable is a variable that gets updated in a predictable arithmetic progression (usually
linear) within a loop. These are commonly used in loop optimization (like strength reduction or loop
unrolling) during compiler optimization.
Typically, induction variables follow a pattern like:

i = i + c; // linear induction
i = i * c; // sometimes non-linear induction

🔍 Now, for the given code:


c

for(i = 0, j = 1, k = 2, l = 4; i < n; i++) {


j = j + 2;
k = k + 3;
l = l + 4;
sum = sum + i * j * k;
}

🧠 Let's analyze:
`i` starts at `0` and is incremented by 1 → induction variable.
`j` starts at `1` and is incremented by `2` each iteration → induction variable.
`k` starts at `2` and is incremented by `3` each iteration → induction variable.
`l` starts at `4` and is incremented by `4` each iteration → induction variable.
`sum` accumulates a computed value → not an induction variable, it's an accumulator.

✅ Induction Variables Identified:


`i` — loop counter, incremented by 1

`j` — modified by constant step 2


`k` — modified by constant step 3
`l` — modified by constant step 4
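As a quick sanity check of why these qualify as induction variables, the following Python snippet (with `n = 5` chosen arbitrarily) verifies their linear closed forms at the point where `sum` is updated: in iteration `i` (0-based), `j = 3 + 2*i`, `k = 5 + 3*i`, and `l = 8 + 4*i`:

python

n = 5
j, k, l, total = 1, 2, 4, 0
for i in range(n):
    j += 2
    k += 3
    l += 4
    # each induction variable is a linear function of the loop counter i
    assert j == 3 + 2 * i and k == 5 + 3 * i and l == 8 + 4 * i
    total += i * j * k
print("all closed forms hold; sum =", total)

These linear closed forms are exactly what allows a compiler to apply strength reduction and induction-variable elimination to such a loop.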

Would you like to see how a compiler might optimize this using strength reduction?

Differentiate between
i) Phase and a pass ii) single-pass and multi-pass compiler.

Here's the difference between Phase and Pass, and between Single-pass and Multi-pass compilers:

✅ i) Phase vs Pass

| Aspect | Phase | Pass |
|---|---|---|
| Definition | A phase is a logical step or component of compilation (e.g., lexical analysis, parsing, etc.). | A pass refers to a complete traversal of the source code or its intermediate representation. |
| Granularity | More fine-grained – each compilation task is a phase. | More coarse-grained – may include multiple phases in one pass. |
| Examples | Lexical analysis, Syntax analysis, Semantic analysis, Code generation. | First pass for syntax + symbol table creation, second pass for code generation. |
| Objective | Each phase transforms the code in some way for the next phase. | Each pass processes the entire code once and may combine several phases. |

✅ ii) Single-pass vs Multi-pass Compiler

| Aspect | Single-pass Compiler | Multi-pass Compiler |
|---|---|---|
| Definition | Compiler that reads and processes the source code only once. | Compiler that reads the source code multiple times, each time with a specific goal. |
| Speed | Faster and requires less memory. | Slower, but allows more sophisticated optimizations. |
| Design Simplicity | Simpler design, but harder to implement complex features. | More complex design, allows for better analysis and code optimization. |
| Common Usage | Used in smaller or simpler compilers (e.g., early C compilers). | Used in modern compilers (e.g., Java, C++) for better performance and flexibility. |
| Optimization | Limited optimization. | Advanced optimization possible due to full knowledge of the code. |

Summary:

Phases are logical units of compilation.


Passes are physical traversals over code.
Single-pass compilers are fast but limited.
Multi-pass compilers are powerful but more complex.

Would you like an example illustrating a single-pass vs multi-pass scenario?
