0% found this document useful (0 votes)
27 views115 pages

Intermediate Code Generation

IIT KGP slides on Intermediate code generation

Uploaded by

shasankgedda6
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
27 views115 pages

Intermediate Code Generation

IIT KGP slides on Intermediate code generation

Uploaded by

shasankgedda6
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Intermediate-Code Generation

Intermediate-Code Generation

Three-Address Code
• In three-address code, there is at most one operator on the right
side of an instruction
• Thus a source-language expression like x+y*z might be translated
into the sequence of three-address instructions
Common three-address instructions
Common three-address instructions
Data structures for TAC
Quadruples
Data structures for TAC
Quadruples
Data structures for TAC
Quadruples
Data structures for TAC
Triples
• A triple has only three fields, which we call op, arg1, and arg2

• Note that the result field in Quad is used primarily for


temporary names.

• Using triples, we refer to the result of an operation x op y by its


position, rather than by an explicit temporary name.

• Thus, instead of the temporary t, a triple representation would


refer to position (0).

• Parenthesized numbers represent pointers into the triple


structure itself.
Data structures for TAC
Triples
Benefit of Quad over Triples

• A benefit of quadruples over triples can be seen in an optimizing


compiler, where instructions are often moved around.
• With quadruples, if we move an instruction that
computes a temporary t, then the instructions that use t
require no change.

• With triples, the result of an operation is referred to by its


position

• So moving an instruction may require us to change all references


to that result.
Indirect triples
• Consist of a listing of pointers to triples,
• Rather than a listing of triples themselves.
• For example, use an array instruction to list pointers to triples in
the desired order.

With indirect triples, an optimizing compiler can move an instruction by


reordering the instruction list, without affecting the triples themselves.
Static Single-Assignment Form
Two distinctive aspects distinguish SSA from three-address code.
(a) The first is that all assignments in SSA are to variables
with distinct names; hence the term static single-assignment.
Static Single-Assignment Form
Two distinctive aspects distinguish SSA from three-address code.
(a) The first is that all assignments in SSA are to variables
with distinct names; hence the term static single-assignment.
(b)
Static Single-Assignment Form
Major translation classes of Three address
code generation
(a) Declaration statements (+ handling data
type and storage)

(b) Expressions and statements

(a) Control flow statements


Declaration statement
Representing data types: Type Expressions

Types have structure, which we shall represent using type


expressions.

• A type expression is either a basic type (boolean, char, integer,


float, and void)
or
• is formed by applying an operator called a type constructor to a
type expression.
• A type expression can be formed by applying the array type
constructor to a number and a type expression.
Declaration statement
• The array type int [2] [3] can be read as "array of 2
arrays of 3 integers each"
• Can be represented as a type expression array(2,
array(3, integer)).
• This type is represented by the tree.

• The operator array takes two parameters, a number


and a type.
• Here the type expression can be formed by
applying the array type constructor to a number
and a type expression.
Declaration statement
Example SDD

Type Expressions

• Nonterminal T generates either a basic type or an array type.


• Nonterminal B generates one of the basic types int and float.
• T generates a basic type when C derives €.
• Otherwise, C generates array components consisting of a sequence of
integers, each integer surrounded by brackets.
Declaration statement
Example SDD

Type Expressions

• The nonterminals B and T have a synthesized attribute t


representing a type.
• The nonterminal C has two attributes: an inherited attribute b
and a synthesized attribute t.
Declaration statement
Example SDD
input string int [2][3]

• The nonterminal C has two attributes: an inherited attribute b


and a synthesized attribute t.
• The inherited b attributes pass a basic type down the tree, and
the synthesized t attributes accumulate the result.
Declaration statement: Example SDD
Symbol table

Find the storage for each variable


More on Declaration statement
Data type + Storage layout
• Simplified grammar that
declares just one name at
a time;
• We already explored the
declarations with lists of
names

Storage layout:
• Relative address of all the variables
• From the type of a variable, we can determine the amount of storage that
will be needed for the variable at run time.
• At compile time, we can use these amounts to assign each variable a
relative address.
• The type and relative address are saved in the symbol-table entry for the
variable name.
More on Declaration statement
Data type + Storage layout
• The width of a type is the number of storage units needed for
objects of that type.
• A basic type, such as a character, integer, or float, requires an
integral number of bytes.
• Arrays allocated in one contiguous block of bytes
Computes data types and
their widths for basic and
array types

Type Expressions
The width of an array is obtained by multiplying the
width of an element by the number of elements in the
array.
More on Declaration statement
Data type + Storage layout
Relative address

Name Data type Relative address


Sequences of Declarations int x;
float y;
• Relative address: offset
• Keeps track of the next available relative address

• The translation scheme deals with a sequence of declarations of


the form T id, where T generates a data type
• Before the first declaration is considered, offset is set to 0.
• As each new name x is seen, x is entered into the symbol table
with its relative address = current value of offset,
• which is then incremented by the width of the type of x.
Sequences of Declarations

• Relative address: offset


• Keeps track of the next available relative address
Record or Structure data type
Record or Structure data type
Record or Structure data type
Homework

int x, y;
float p, q Write the grammar, SDD
record and populate the symbol
{ int x; table by executing those
float q; rules
} a;
char b;
Major translation classes of Three address
code generation
(a) Declaration statements (+ handling data
type and storage)

(b) Expressions and statements

(a) Control flow statements


Translation of Expressions
Three-address code for an assignment statement

Attribute code for S


attributes addr and code for an expression E.
Attributes S.code and E.code denote the three-address code for S and E, respectively.
Attribute E.addr denotes the address that will hold the value of E.
Translation of Expressions
Three-address code for an assignment statement

When an expression is a single identifier, say x,


then x itself holds the value of the expression.

The semantic rules for this production define


E.addr to point to the symbol-table entry

Address of the variable


Translation of Expressions
Three-address code for an assignment statement
Translation of Expressions
Three-address code for an assignment statement
Translation of Expressions
Three-address code for an assignment statement
E.code=‘ ’

E.code=‘ ’

E.code=‘ ’
E.code=
E.code=‘ ’
E.code=
E.code=‘ ’

S.code=
Translation of Expressions
Three-address code for an assignment statement
Incremental Translation

• So far, E.Code attributes were long strings


• Generated incrementally

• In incremental translation, generate only the new three-


address instructions

• Past sequence may either be retained in memory for


further processing, or it may be output incrementally.

• In the incremental approach, gen() not only constructs a


three-address instruction,
• it appends the instruction to the sequence of
instructions generated so far.
Incremental Translation

• This translation scheme generates the same code as the


previous syntax directed definition.
• With the incremental approach, the E.code attribute is not used,
• Since there is a single sequence of instructions that is
created by successive calls to gen().
Translation of Array Expressions
Translation of Array Expressions

Translate 2D index to 1D index

Let w1 be the length of a row (# of columns) and w2 be the size of an


element in a row.
The relative address of A[i1][i2] can be calculated by the formula
Translation of Array Expressions

Objective a is a 2×3 array


Translation of Array Expressions
L.addr denotes a temporary
variable containing the offset
of the array elements (array
index)
Translation of Array Expressions

• L.array is a pointer to the


symbol-table entry for the array
name.

• L.array.base indicates base


address of the array -- array
name

• L.type is the type t of the


subarray generated by L.

• For array type t, we assume that


its width is given by t.width.

• For any array type t, t.elem


gives the array element.
Translation of Array Expressions
Standard

Standard
Translation of Array Expressions
Translation of Array Expressions

Parse tree
Translation of Array Expressions
Translation of Array Expressions

a[i]
Translation of Array Expressions

a[i][j]
Translation of Array Expressions
Translation of Array Expressions
Control Flow
Translation of statements such as if-else-statements and while-
statements
Key step: Translation of boolean expressions

Boolean expressions are used as


(i) Conditional expressions in statements that alter the flow of
control
(ii) A boolean expression can evaluate true Or false as values. Such
boolean expressions can be evaluated in analogy to arithmetic
expressions using three-address instructions with logical
operators

• The intended use of boolean expressions is determined by its


syntactic context.
• We concentrate on the use of boolean expressions to alter the
flow of control.
• For clarity, we introduce a new nonterminal B
Control Flow: Three address codes
Boolean Expressions

Short-Circuit Code
Control Flow
Translation of Flow-of-Control Statements
into three-address code

• Nonterminal B represents a boolean expression and nonterminal S


represents a statement.
• Both B and S have a synthesized attribute code, which gives the
translation into three-address instructions.
• Inherited attributes B.true, B.false, S.next generate labels for control
flow
• B.true the label to which control flows if B is true,
• B.false, the label to which control flows if B is false.
• With a statement S, we associate an inherited attribute S.next
denoting a label for the instruction immediately after the code for S.
• Nonterminal B represents a boolean expression and nonterminal S
represents a statement.
• Both B and S have a synthesized attribute code, which gives the
translation into three-address instructions.
• Inherited attributes B.true, B.false, S.next generate labels for control
flow
• B.true the label to which control flows if B is true,
• B.false, the label to which control flows if B is false.
• With a statement S, we associate an inherited attribute S.next
denoting a label for the instruction immediately after the code for S.
• newlabel() creates a new label each time it is called,
• label(L) attaches label L to the next three-address instruction to be
generated

• A program consists of a statement generated by P -> S.


• The semantic rules associated with this production initialize S.next to a
new label.
• P.code consists of S.code followed by the new label S.next.
standard
S→ id=E;

assign in the production S —> assign is a placeholder for


assignment statements.
• In translating S -> if (B) S1, the semantic rules create a new label B.true
and attach it to the first three-address instruction generated for the
statement S1.
• Thus, jumps to B.true within the code for B will go to the code S1.
• By setting B.false to S.next, we ensure that control will skip the code
for S1 if B evaluates to false.
• In translating S -> if (B) S1, the semantic rules create a new label B.true
and attach it to the first three-address instruction generated for the
statement S1.
• Thus, jumps to B.true within the code for B will go to the code S1.
• By setting B.false to S.next, we ensure that control will skip the code
for S1 if B evaluates to false.
Translation of Boolean Expressions

if a>b
x=0 S→ id=E;
S→ id=E;

S.next=L1 P.Code:
S.code
L1:

if a>b
x=0
S.next=L1 P.Code:
B.true= L2 B.code
B. false=L1 L2:
S1.code
L1:
if a>b
x=0
S.next=L1 P.Code:
B.true= L2 if a>b goto L2
B. false=L1 goto L1
L2: S1
L1:…… if a>b
x=0
Translation of Boolean Expressions

If B1 is true, then we immediately know that B itself


is true, so B1.true is the same as B.true.
If B1 is false, then B2 must be evaluated, so B1.false
gets the label of the first instruction in the code for B2

The true and false exits of B2 are the same as the true and false
B1 exits of B, respectively.
B
L2
B2

L1
Translation of Boolean Expressions
Translation of if-else statement

• In translating the if-else-statement, if B is true , the code for the


boolean expression B has jumps to the first instruction of the code
for S1,
• if B is false, control jumps to the first instruction of the code for S2.
• Further, control flows from both S1 and S2 to the three-address
instruction immediately following the code for S — its label is
given by the inherited attribute S.next.
• An explicit goto S.next appears after the code for S1 to skip over
the code for S2 . No goto is needed after S2 , since S2.next is the
same as S.next.
S.next=L1

S→ id=E;

if a>b
x=0;
else
y=0; L1:
S.next=L1

S→ id=E;

B.code=
if a>b goto L2
goto L3
if a>b
x=0;
else L1:
y=0;
S.next=L1

S→ id=E;

S.code=
if a>b goto L2
goto L3
if a>b L2: x=0
x=0; goto L1
else L3: y=0
L1:
y=0;
Translation of while statement

• We use a local variable begin to hold a new label attached to the first
instruction for this while-statement,
• which is also the first instruction for B.
• Variable begin is local to the semantic rules for this production.
• The inherited label S.next marks the instruction that control must flow to if B
is false; hence, B.false is set to be S.next.
• A new label B.true is attached to the first instruction for S1;
• the code for B generates a jump to this label if B is true.
• After the code for S1 we place the instruction goto begin, which causes a jump
back to the beginning of the code for the boolean expression.
• Note that S1.next is set to this label begin
Avoiding Redundant Gotos
Free fall to L4 when x>200
Jumps to L1 only when x<=200

The code for statement S1


immediately follows the code for the
boolean expression B
if
B

L1 S1
L2

Jump when false

L1
Avoiding Redundant Gotos

Jump when true


Jump when false
Avoiding Redundant Gotos

• Meaning of label fall for B is different from its


meaning for B1.
• Suppose B.true is fall; i.e, control falls through B, if B B1
evaluates to true. B
L2
• Although B evaluates to true if B1 does, B1.true must B2
ensure that control jumps over the code for B2 to get
to the next instruction after B.
L1
Avoiding Redundant Gotos
Avoiding Redundant Gotos

change

change
Backpatching: Problem

S.next=L1 P.Code:
B.true= L2 if a>b goto L2
B. false=L1 goto L1
L2: S1
L1:…… if a>b
x=0
Backpatching
• Key problem when generating code for boolean expressions and flow-of-
control statements is that
• Matching a jump instruction with the target of the jump.
• For example, the translation of the boolean expression B in if ( B ) S contains
a jump, for when B is false, to the instruction following the code for S.

• In a one-pass translation, B must be translated before S is examined.

• What then is the target of the goto that jumps over the code for S?

if
B

L1 S
L2
Backpatching
• Lists of jumps are passed as synthesized attributes.
• When a jump is generated, the target of the jump is temporarily left
unspecified.
• Each such jump is put on a list of jumps whose labels are to be filled in
when the proper label can be determined.
• All of the jumps on a list have the same target label.

• Synthesized attributes truelist and falselist of nonterminal B are used to


manage labels.
• B.truelist will be a list of jump or conditional jump instructions into
which, we must insert the label to which control goes if B is true.
• B.falselist likewise is the list of instructions that eventually get the label
to which control goes when B is false.
• Code is generated for B, jumps to the true and false exits are left
incomplete, with the label field unfilled.
• These incomplete jumps are placed on lists pointed to by B.truelist and
B.falselist, as appropriate.
Backpatching
• We generate instructions into an instruction array,
• Labels will be indices into this array.
• To manipulate lists of jumps, we use three functions:

1. makelist(i) creates a new list containing an index i into the array of


instructions; makelist returns a pointer to the newly created list.

2. merge(p1,p2) concatenates the lists pointed to by p1 and p2, and returns


a pointer to the concatenated list.

3. backpatch(p,i) inserts i as the target label for each of the instructions on


the list pointed to by p.
Backpatching for Boolean Expressions
• We now construct a translation scheme suitable for generating code for
Boolean expressions during bottom-up parsing.

Semantic actions generates two instructions, a conditional goto and an


unconditional goto.
Neither has its target filled in.
These instructions are put on new lists, pointed to by B. truelist and B .falselist,
respectively
Backpatching for Boolean Expressions

• We now construct a translation scheme suitable for generating code for


Boolean expressions during bottom-up parsing.
• A marker nonterminal M causes a semantic action to pick up, at
appropriate times, the index of the next instruction to be generated.

• If B1 is true, then B is also true, so the jumps on B1.truelist become part of


B.truelist.
• If B1 is false, however, we must next test B2,
• So the target for the jumps B1.falselist must be the beginning of the code
generated for B2.
• This target is obtained using the marker nonterminal M.
• That nonterminal M produces, as a synthesized attribute M.instr, the index
of the next instruction, just before B2 code starts being generated.
Backpatching for Boolean Expressions
Flow-of-Control Statements

Boolean expressions generated by nonterminal B have two lists of


jumps, B.truelist and B.falselist, corresponding to the true and false
exits from the code for B

Statements generated by nonterminals S and L have a list of unfilled


jumps

S.nextlist is a list of all conditional and unconditional jumps to the


instruction following the code for statement S in execution order.
L.nextlist is defined similarly.
Flow-of-Control Statements
Flow-of-Control Statements (while)

• The two occurrences of the marker nonterminal M record the


instruction numbers of the
• M1--beginning of the code for B (begin) and the
• M2--beginning of the code for S1 (B.true).
Flow-of-Control Statements (while)

• Attribute M.instr points to the number of the next instruction.


• After the body S1 of the while-statement is executed,
• Control flows to the beginning.
• We backpatch S1.nextlist to make all targets on that list be
M1.instr.
• An explicit jump to the beginning of the code for B is appended after
the code for S1

• B. truelist is backpatched to go to the beginning of S1 ‘


• by making jumps on B.truelist go to M2.instr
Flow-of-Control Statements (if-else)
Flow-of-Control Statements (if-else)

• S1 is an assignment statement, we must


include at the end of the code for S1 a jump
over the code for S2

• N has attribute N.nextlist, which will be a list


consisting of the instruction number of the
jump goto _
Flow-of-Control Statements (if-else)
Translation of if-else statement
Intermediate Code for Procedures

• Nonterminals D and T generate function definition and types,


respectively
• A function definition generated by D consists of keyword define, a return
type, the function name, formal parameters in parentheses and a function
body consisting of a statement.
• Nonterminal F generates zero or more formal parameters, where a
formal parameter consists of a type followed by an identifier.
• Nonterminals S and E generate statements and expressions,
respectively. The production for S adds a statement that returns the
value of an expression.
• The production for E adds function calls, with actual parameters
generated by A. An actual parameter is an expression.
Intermediate Code for Procedures

Function calls.
• When generating three-address instructions for a function call id(E, E,... , E),
• It is sufficient to generate the three-address instructions for evaluating or
reducing the parameters E to addresses (E.addr),
• Followed by a param instruction for each parameter.
Intermediate Code for Procedures

Symbol tables.
• Let s be the top symbol table when the function definition is reached.
• The function name is entered into symbol table s for use in the rest of the
program.
• Data type function (return type, parameter type) inserted in symbol table
--------------------------------------------------
• The formal parameters of a function can be handled in analogy with field names
in a record
• In the production for D, after seeing define and the function name, we push s
and set up a new symbol table
• Env.push(top)]
• t = new Env();
• The new symbol table, t.
• The new table t is used to translate the function body.
• We revert to the previous symbol table s after the function body is translated.
Record or Structure data type
Type Conversions
• Consider expressions like x + i, where x is of type float and i is of type integer

• Since the representation of integers and floating-point numbers is different


within a computer
• Compiler may need to convert one of the operands of + to ensure that both
operands are of the same type when the addition occurs.
Type Conversions

Type conversion rules


widening conversions, which are intended to preserve information,

• Conversion from one type to another is said to be


implicit if it is done automatically by the compiler.
• Conversion is said to be explicit if the programmer must
write something to cause the conversion.
• Explicit conversions are also called type casts
New semantic action illustrates how type conversions can be added to the
SDT for translating expressions
New semantic action illustrates how type conversions can be added to the
SDT for translating expressions

You might also like