CD Unit 3 - Merged
NEED FOR SEMANTIC ANALYSIS
Semantic analysis is the phase in which a compiler adds semantic information to the parse tree
and performs certain checks based on this information.
It logically follows the parsing phase, in which the parse tree is generated, and logically
precedes the code generation phase, in which (intermediate/target) code is generated. (In a
compiler implementation, it may be possible to fold different phases into one pass.)
Typical examples of the semantic information that is added and checked are type information
(type checking) and the binding of variables and function names to their definitions
(object binding).
Some early code optimization is sometimes also done in this phase. For this phase the
compiler usually maintains symbol tables in which it stores what each symbol (variable
names, function names, etc.) refers to.
2. Type checking : The process of verifying and enforcing the constraints of types is called type checking.
This may occur either at compile time (a static check) or at run time (a dynamic check).
Static type checking is a primary task of the semantic analysis carried out by a compiler.
If the type rules are enforced strongly (that is, generally allowing only those automatic type conversions
which do not lose information), the language is called strongly typed; if not, weakly typed.
3. Uniqueness checking : Whether a variable name is unique or not within its scope.
4. Type coercion : Whether some mixing of types is allowed. This is done in languages which are not
strongly typed, and can be performed dynamically as well as statically.
5. Name checks : Check whether any variable has a name which is not allowed, e.g. a name that is the
same as a reserved word (such as int in Java).
A parser has its own limitations in catching program errors related to semantics, which lie
deeper than syntax analysis.
Typical features of semantic analysis cannot be modeled using context free grammar
formalism.
If one tries to incorporate those features in the definition of a language then that language
doesn't remain context free anymore.
These are a couple of examples of what a compiler typically has to do beyond syntax analysis.
An identifier x can be declared in two separate functions in the program, once with type
int and once with type char. Hence the same identifier will have to be bound to these two
different properties in the two different contexts.
Semantic Errors
Some of the semantic errors that the semantic analyzer is expected to recognize are listed
below (a small checker sketch follows the list):
Type mismatch
Undeclared variable
Reserved identifier misuse.
Multiple declaration of variable in a scope.
Accessing an out of scope variable.
Actual and formal parameter mismatch.
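As a rough illustration of a few of these checks (not the full analyzer), the following Python sketch walks made-up declarations and uses in a single scope and reports reserved-identifier misuse, multiple declarations, undeclared variables and simple type mismatches; all names, types and the error wording are invented for the example.

# Minimal sketch of a few semantic checks over one flat scope (hypothetical example).
RESERVED = {"int", "float", "if", "while"}                 # assumed reserved words
decls = [("x", "int"), ("y", "float"), ("x", "int"), ("int", "int")]
uses  = [("x", "int"), ("z", "int"), ("y", "int")]         # (name, type expected by context)

symbol_table = {}
errors = []

for name, typ in decls:
    if name in RESERVED:                                   # reserved identifier misuse
        errors.append(f"reserved identifier misuse: '{name}'")
    elif name in symbol_table:                             # uniqueness check
        errors.append(f"multiple declaration of '{name}' in scope")
    else:
        symbol_table[name] = typ

for name, expected in uses:
    if name not in symbol_table:                           # undeclared variable
        errors.append(f"undeclared variable '{name}'")
    elif symbol_table[name] != expected:                   # type mismatch
        errors.append(f"type mismatch: '{name}' is {symbol_table[name]}, used as {expected}")

print("\n".join(errors))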
Syntax Directed Translation
The Principle of Syntax Directed Translation states that the meaning of an input sentence is related
to its syntactic structure, i.e., to its Parse-Tree. By Syntax Directed Translations we indicate those
formalisms for specifying translations for programming language constructs guided by context-
free grammars.
– We associate Attributes to the grammar symbols representing the language constructs.
– Values for attributes are computed by Semantic Rules associated with grammar
productions.
Evaluation of Semantic Rules may:
– Generate Code;
– Insert information into the Symbol Table;
– Perform Semantic Check;
– Issue error messages;
– etc.
There are two ways to represent the semantic rules associated with grammar symbols.
1. Syntax-Directed Definitions (SDD)
2. Syntax-Directed Translation Schemes (SDT)
ATTRIBUTE GRAMMAR
Attributes are properties associated with grammar symbols. Attributes can be numbers,
strings, memory locations, data types, etc.
Attribute grammar is a special form of context-free grammar where some additional
information (attributes) is appended to one or more of its non-terminals in order to
provide context-sensitive information.
The right part of a production carries the semantic rules that specify how the grammar should
be interpreted. For example, in the production E → E + T { E.val = E.val + T.val }, the values of
the non-terminals E and T on the right are added together and the result is copied to the
non-terminal E on the left.
Semantic attributes may be assigned values from their domain at the time of parsing and
evaluated at assignments or in conditions.
Based on the way the attributes get their values, they can be broadly divided into two
categories : synthesized attributes and inherited attributes
1. Synthesized Attributes: These are the attributes which get their values from their child
nodes, i.e. the value of a synthesized attribute at a node is computed from the values of the
attributes at its child nodes in the parse tree.
To illustrate, assume the following production:
EXAMPLE: S -> ABC   { S.a = A.a, B.a, C.a }
If S is taking values from its child nodes (A, B, C), then S.a is said to be a synthesized attribute,
as the values of A, B and C are synthesized to S.
Computation of Synthesized Attributes
Write the SDD using appropriate semantic rules for each production in the given grammar.
The annotated parse tree is generated and the attribute values are computed in a bottom-up
manner.
The value obtained at root node is the final output.
Consider the following grammar:
S --> E
E --> E1 + T
E --> T
T --> T1 * F
T --> F
F --> digit
Let us assume an input string 4 * 5 + 6 for computing synthesized attributes. The annotated parse
tree for the input string is rooted at S, and its attributes are computed as follows.
For computation of the attributes we start from the leftmost bottom node. The rule F --> digit is
used to reduce digit to F, and the value of digit is obtained from the lexical analyzer; it becomes
the value of F via the semantic action F.val = digit.lexval.
Hence, F.val = 4, and since T is the parent node of F, we get T.val = 4 from the semantic action
T.val = F.val.
Then, for the T --> T1 * F production, the corresponding semantic action is T.val = T1.val * F.val.
Hence, T.val = 4 * 5 = 20.
Similarly, the digit 6 gives T.val = 6 on the right of the +, and the combination E1.val + T.val
becomes E.val, i.e. E.val = E1.val + T.val = 20 + 6 = 26.
Then, the production S --> E is applied, and the semantic action associated with it prints the
result E.val. Hence, the output will be 26.
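The same bottom-up computation can be sketched in Python: the parse tree for 4 * 5 + 6 is hard-coded as nested tuples (an encoding assumed only for this example), and val is synthesized from the children exactly as in the semantic actions above.

# Node: ("digit", lexval) | ("T*F", left, right) | ("E+T", left, right)   (hypothetical encoding)
def val(node):
    kind = node[0]
    if kind == "digit":                 # F -> digit  : F.val = digit.lexval
        return node[1]
    if kind == "T*F":                   # T -> T1 * F : T.val = T1.val * F.val
        return val(node[1]) * val(node[2])
    if kind == "E+T":                   # E -> E1 + T : E.val = E1.val + T.val
        return val(node[1]) + val(node[2])
    raise ValueError(kind)

tree = ("E+T", ("T*F", ("digit", 4), ("digit", 5)), ("digit", 6))   # parse tree of 4 * 5 + 6
print(val(tree))                        # S -> E prints E.val, i.e. 26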
2. Inherited Attributes: These are the attributes which get their values from their parent
and/or sibling nodes, i.e. the value of an inherited attribute at a node is computed from the
values of attributes at the parent and/or sibling nodes in the parse tree.
EXAMPLE: A -> BCD
Here B can get values from A, C and D; C can take values from A, B, and D; likewise, D can
take values from A, B, and C.
Computation of Inherited Attributes
The annotated parse tree is generated and the attribute values are computed in a top-down
manner. Consider the following grammar:
D --> T L
T --> int
T --> float
T --> double
L --> L1, id
L --> id
Let us assume an input string int a, c for computing inherited attributes. The annotated parse
tree for the input string is constructed, and the attributes are computed as follows.
The value of the L nodes is obtained from T.type (their sibling), which is basically the lexical
value obtained as int, float or double.
Each L node then gives the type of the identifiers a and c. The computation of the type is done
in a top-down manner (preorder traversal).
Using the function Enter_type, the type of the identifiers a and c is inserted in the symbol table
at the corresponding id.entry.
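The top-down propagation can be sketched in Python as well: T.type is computed first and then passed down to the L nodes as an inherited attribute, and Enter_type simply fills a dictionary standing in for the symbol table (the data layout is assumed for illustration).

symbol_table = {}

def enter_type(ident, typ):                  # stands in for Enter_type(id.entry, T.type)
    symbol_table[ident] = typ

def decorate_L(ids, inherited_type):         # L -> L1 , id | id   with L.in = T.type
    for ident in ids:
        enter_type(ident, inherited_type)

# D -> T L  for the input "int a, c": T.type = int is inherited by every L node
t_type = "int"
decorate_L(["a", "c"], t_type)
print(symbol_table)                          # {'a': 'int', 'c': 'int'}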
The interdependencies among the attributes of the various nodes of a parse-tree can be
depicted by a directed graph called a dependency graph.
There is a node for each attribute;
If attribute b depends on an attribute c there is a link from the node for c to
the node for b (b ← c).
Dependency Rule: If an attribute b depends on an attribute c, then the semantic rule for c must
be evaluated before the semantic rule for b.
2. Evaluation order
A dependency graph characterizes the possible orders in which we can calculate the
attributes at the various nodes of the parse tree.
If there is an edge from node M to N, then the attribute corresponding to M must be
evaluated before the attribute corresponding to N.
Thus the allowable orders of evaluation are N1, N2, ..., Nk such that if there is an edge
from Ni to Nj then i < j.
Such an ordering embeds the directed graph into a linear order, and is called a topological
sort of the graph.
If there is any cycle in the graph, then there is no topological sort, i.e. there is no way to
evaluate the SDD on this parse tree.
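A compact sketch of this idea: given dependency edges c → b (meaning b needs c first), Kahn's algorithm yields one allowable evaluation order or detects a cycle. The tiny attribute graph used here is made up.

from collections import defaultdict, deque

def topological_order(nodes, edges):
    """edges: list of (c, b) meaning attribute b depends on attribute c."""
    succ, indeg = defaultdict(list), {n: 0 for n in nodes}
    for c, b in edges:
        succ[c].append(b)
        indeg[b] += 1
    ready = deque(n for n in nodes if indeg[n] == 0)
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                ready.append(m)
    if len(order) != len(nodes):
        raise ValueError("cycle: the SDD cannot be evaluated on this parse tree")
    return order

# hypothetical dependency graph: T.type -> L.in -> id.entry
print(topological_order(["T.type", "L.in", "id.entry"],
                        [("T.type", "L.in"), ("L.in", "id.entry")]))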
TYPES OF SDT’S
1. S-attributed definition
2. L-attributed definition
S-attributed definition
If an SDT uses only synthesized attributes, it is called an S-attributed SDT. S-attributed SDTs
are evaluated during bottom-up parsing, as the values of the parent nodes depend upon the
values of the child nodes, and the semantic actions are placed at the right end of the productions.
L –attributed definition
L stands for left to right: the attributes can be evaluated in one left-to-right (depth-first) pass
over the parse tree.
That is, if an SDT uses both synthesized attributes and inherited attributes, with the restriction
that an inherited attribute can inherit values from the parent and left siblings only, it is called an
L-attributed SDT.
EXAMPLE:
A → BCD   { B.a = A.a, C.a = B.a }   is allowed, whereas
C.a = D.a is not possible, since D is a right sibling of C.
Attributes in L-attributed SDTs are evaluated in a depth-first, left-to-right manner.
Semantic actions may be placed anywhere in the RHS:
A → { } B C   |   B { } C   |   B C { }
Note: (Also write SDD for declaration statement as example)
If an attribute is S attributed, it is also L attributed.
Evaluation of L-attributed SDD
SDD for desk calculator / SDD for evaluation of expressions
Evaluate the expression 3*5+4n using the above SDD, both in the bottom-up and in the
top-down approach.
Solution: The bottom-up evaluation for this expression is shown below.
In both cases, first we need to draw the parse tree.
Then traverse it from top to bottom, left to right.
In the bottom-up approach, whenever there is a reduction, go to the corresponding production
and carry out its semantic action.
Workout problems
Traversal: depth-first evaluation
procedure visit(n : node);
begin
    for each child m of n, from left to right do
        visit(m);
    evaluate the semantic rules at node n
end
Bottom up evaluation of S-attributed definition
Syntax-directed definitions with only synthesized attributes (S-attributed) can be evaluated
by a bottom-up parser (BUP) as the input is parsed.
In this approach, the parser keeps the values of the synthesized attributes associated with
the grammar symbols on its stack.
The stack is implemented as a pair of fields, state and val.
When a reduction is made, the values of the new synthesized attributes are computed from the
attributes appearing on the stack for the grammar symbols on the right side of the reducing
production.
The implementation typically uses an LR parser generator (e.g. YACC).
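The value stack can be sketched directly in Python: each time the (hand-simulated) parser reduces by a production, the synthesized attribute of the left side is computed from the values popped for the right-side symbols, mirroring what a YACC-generated LR parser does. The shift/reduce trace for 4 * 5 + 6 is written out by hand.

# val stack kept parallel to the parser's state stack (states omitted for brevity)
val = []

def reduce(production, arity, action):
    """Pop the attribute values of the RHS symbols and push the LHS value."""
    args = [val.pop() for _ in range(arity)][::-1]
    val.append(action(*args))
    print(f"{production:<12} val stack = {val}")

# hand-written shift/reduce trace for 4 * 5 + 6
val.append(4); reduce("F -> digit", 1, lambda d: d)
reduce("T -> F", 1, lambda f: f)
val.append("*"); val.append(5); reduce("F -> digit", 1, lambda d: d)
reduce("T -> T * F", 3, lambda t, _star, f: t * f)
reduce("E -> T", 1, lambda t: t)
val.append("+"); val.append(6); reduce("F -> digit", 1, lambda d: d)
reduce("T -> F", 1, lambda f: f)
reduce("E -> E + T", 3, lambda e, _plus, t: e + t)
print("result:", val[-1])                      # 26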
Example: The SDD and code fragment using the S-attributed definition for the input 3*5+4n are
as follows:
Top Down Translation
For top-down (predictive) translation, first eliminate left recursion from this grammar.
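For example, eliminating left recursion from E → E + T | T gives E → T R, R → + T R | ε (and similarly for T), after which a predictive recursive-descent translator can compute the value in one left-to-right pass. The sketch below is only illustrative and uses a single-character "lexer".

# Recursive-descent evaluator for the left-recursion-free grammar
#   E -> T R      R -> + T R | ε
#   T -> F S      S -> * F S | ε
#   F -> digit
tokens = list("4*5+6")          # simplified lexer: single-digit tokens
pos = 0

def peek():
    return tokens[pos] if pos < len(tokens) else None

def eat():
    global pos
    pos += 1
    return tokens[pos - 1]

def F():
    return int(eat())           # F -> digit

def T():
    left = F()
    while peek() == "*":        # S -> * F S | ε  (loop form of the tail)
        eat()
        left = left * F()
    return left

def E():
    left = T()
    while peek() == "+":        # R -> + T R | ε
        eat()
        left = left + T()
    return left

print(E())                      # 26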
Type Checking
The main aim of the compiler is to translate the source program to a form that can be
executed on a target machine.
For this purpose the compiler needs to
1. Check that the source program follows the syntax and semantics of the concerned
language.
2. Check the flow of data between variables.
What is type?
Type expressions
Specification of simple type checker
A simple language
This grammar generates programs, represented by the nonterminal P, consisting of
a sequence of declarations D followed by a single expression E.
E → E1 mod E2   { E.type := if E1.type = integer and E2.type = integer
                             then integer else type_error }
T → T1 '→' T2   { T.type := T1.type → T2.type }
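A minimal Python sketch of such a type checker over an abstract syntax, assuming (as in the classic simple language) basic types char and integer, array and pointer type constructors, and the mod, indexing and dereference operators; the tuple encoding of expressions and the sample environment are invented for the example.

TYPE_ERROR = "type_error"

# types: "char", "integer", ("array", size, elem_type), ("pointer", elem_type)
env = {"a": ("array", 10, "integer"), "p": ("pointer", "char"), "i": "integer"}

def type_of(e):
    kind = e[0]
    if kind == "num":      return "integer"
    if kind == "literal":  return "char"
    if kind == "id":       return env.get(e[1], TYPE_ERROR)        # lookup(id.entry)
    if kind == "mod":                                              # E -> E1 mod E2
        t1, t2 = type_of(e[1]), type_of(e[2])
        return "integer" if t1 == t2 == "integer" else TYPE_ERROR
    if kind == "index":                                            # E -> E1 [ E2 ]
        t1, t2 = type_of(e[1]), type_of(e[2])
        if t2 == "integer" and isinstance(t1, tuple) and t1[0] == "array":
            return t1[2]
        return TYPE_ERROR
    if kind == "deref":                                            # E -> E1 ^
        t1 = type_of(e[1])
        return t1[1] if isinstance(t1, tuple) and t1[0] == "pointer" else TYPE_ERROR
    return TYPE_ERROR

print(type_of(("index", ("id", "a"), ("id", "i"))))   # integer
print(type_of(("mod", ("id", "p"), ("num", 3))))      # type_error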
**********
UNIT-3 INTERMEDIATE CODE GENERATION
INTRODUCTION
The front end translates a source program into an intermediate representation from which
the back end generates target code.
Benefits of using a machine-independent intermediate form include:
1. Retargeting is facilitated. That is, a compiler for a different machine can be created by
attaching a back end for the new machine to an existing front end.
INTERMEDIATE LANGUAGES
Syntax tree
Postfix notation
Three-address code
The semantic rules for generating three-address code from common programming language
constructs are similar to those for constructing syntax trees or for generating postfix notation.
Graphical Representations:
Syntax tree:
A syntax tree depicts the natural hierarchical structure of a source program. A dag
(Directed Acyclic Graph) gives the same information but in a more compact way because
common subexpressions are identified. A syntax tree and dag for the assignment statement a : =
b * - c + b * - c are as follows:
[Figure: syntax tree (left) and DAG (right) for a : = b * - c + b * - c; in the DAG the common
subexpression b * - c appears only once.]
Postfix notation:
The postfix notation for the assignment statement a : = b * - c + b * - c is:
a b c uminus * b c uminus * + assign
Syntax-directed definition:
Syntax trees for assignment statements are produced by the syntax-directed definition.
Non-terminal S generates an assignment statement. The two binary operators + and * are
examples of the full operator set in a typical language. Operator associativities and precedences
are the usual ones, even though they have not been put into the grammar. This definition
constructs the tree from the input a : = b * - c + b * - c.
S → id : = E     S.nptr : = mknode( ‘assign’, mkleaf( id, id.place ), E.nptr )
E → E1 + E2      E.nptr : = mknode( ‘+’, E1.nptr, E2.nptr )
E → E1 * E2      E.nptr : = mknode( ‘*’, E1.nptr, E2.nptr )
E → - E1         E.nptr : = mknode( ‘uminus’, E1.nptr )
E → ( E1 )       E.nptr : = E1.nptr
E → id           E.nptr : = mkleaf( id, id.place )
Two representations of the syntax tree are as follows. In (a) each node is represented as a
record with a field for its operator and additional fields for pointers to its children. In (b), nodes
are allocated from an array of records and the index or position of the node serves as the pointer
to the node. All the nodes in the syntax tree can be visited by following pointers, starting from
the root at position 10.
(a) In the first representation each node is a record holding its operator and pointers to its
children, linked from the assign node at the root down to the id and uminus leaves.
(b) In the array representation the nodes for a : = b * - c + b * - c are:
 0   id b
 1   id c
 2   uminus  1
 3   *       0  2
 4   id b
 5   id c
 6   uminus  5
 7   *       4  6
 8   +       3  7
 9   id a
10   assign  9  8
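The functions mknode and mkleaf used in the syntax-directed definition above can be sketched as follows; nodes are appended to a Python list, so the index of a node plays the role of the pointer, and building a : = b * - c + b * - c reproduces the array representation (b).

nodes = []                                   # array of records, as in representation (b)

def mkleaf(op, value):
    nodes.append((op, value))
    return len(nodes) - 1                    # "pointer" = index of the new node

def mknode(op, left, right=None):
    nodes.append((op, left, right))
    return len(nodes) - 1

# build the tree for  a := b * - c + b * - c
def subexpr():                               # b * - c
    b = mkleaf("id", "b")
    c = mkleaf("id", "c")
    return mknode("*", b, mknode("uminus", c))

plus = mknode("+", subexpr(), subexpr())
root = mknode("assign", mkleaf("id", "a"), plus)

for i, n in enumerate(nodes):
    print(i, n)
print("root at", root)                       # the assign node ends up at position 10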
Three-Address Code:
x : = y op z
where x, y and z are names, constants, or compiler-generated temporaries; op stands for any
operator, such as a fixed- or floating-point arithmetic operator, or a logical operator on boolean-
valued data. Thus a source language expression like x+ y*z might be translated into a sequence
t1 : = y * z
t2 : = x + t1
The use of names for the intermediate values computed by a program allows three-
address code to be easily rearranged – unlike postfix notation.
Three-address code corresponding to the syntax tree and dag given above:
(a) Code for the syntax tree        (b) Code for the dag
t1 : = - c                          t1 : = - c
t2 : = b * t1                       t2 : = b * t1
t3 : = - c                          t5 : = t2 + t2
t4 : = b * t3                       a : = t5
t5 : = t2 + t4
a : = t5
The reason for the term “three-address code” is that each statement usually contains three
addresses, two for the operands and one for the result.
4. The unconditional jump goto L. The three-address statement with label L is the next to be
executed.
5. Conditional jumps such as if x relop y goto L. This instruction applies a relational operator (
<, =, >=, etc. ) to x and y, and executes the statement with label L next if x stands in relation
relop to y. If not, the three-address statement following if x relop y goto L is executed next,
as in the usual sequence.
6. param x and call p, n for procedure calls and return y, where y representing a returned value
is optional. For example,
param x1
param x2
...
param xn
call p,n
generated as part of a call of the procedure p(x1, x2, ..., xn).
When three-address code is generated, temporary names are made up for the interior
nodes of a syntax tree. For example, id : = E consists of code to evaluate E into some temporary
t, followed by the assignment id.place : = t.
E → E1 + E2     E.place := newtemp;
                E.code := E1.code || E2.code || gen(E.place ‘:=’ E1.place ‘+’ E2.place)
E → E1 * E2     E.place := newtemp;
                E.code := E1.code || E2.code || gen(E.place ‘:=’ E1.place ‘*’ E2.place)
E → - E1        E.place := newtemp;
                E.code := E1.code || gen(E.place ‘:=’ ‘uminus’ E1.place)
E → ( E1 )      E.place : = E1.place;
                E.code : = E1.code
E → id          E.place : = id.place;
                E.code : = ‘ ‘
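These rules can be sketched directly: newtemp manufactures t1, t2, ... and gen appends a statement, so a walk over the expression tree yields E.place and emits E.code as in the scheme above (the tuple encoding of the tree is assumed for the example).

code, temp_count = [], 0

def newtemp():
    global temp_count
    temp_count += 1
    return f"t{temp_count}"

def gen(stmt):
    code.append(stmt)

def translate(e):
    """Return E.place for expression e, emitting its three-address code as a side effect."""
    kind = e[0]
    if kind == "id":
        return e[1]                                    # E.place := id.place
    if kind == "uminus":
        place, t = translate(e[1]), newtemp()
        gen(f"{t} := uminus {place}")
        return t
    op, left, right = e                                # ("+", e1, e2) or ("*", e1, e2)
    p1, p2 = translate(left), translate(right)
    t = newtemp()
    gen(f"{t} := {p1} {op} {p2}")
    return t

# a := b * - c + b * - c
rhs = ("+", ("*", ("id", "b"), ("uminus", ("id", "c"))),
            ("*", ("id", "b"), ("uminus", ("id", "c"))))
gen(f"a := {translate(rhs)}")
print("\n".join(code))                                 # same six statements as code (a) above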
Semantic rules generating code for a while statement produce code with the following layout:
S.begin:   E.code
           if E.place = 0 goto S.after
           S1.code
           goto S.begin
S.after:   ...
Three-address statements can be implemented as records with fields for the operator and the
operands. Three such representations are quadruples, triples and indirect triples.
Quadruples:
A quadruple is a record structure with four fields, which are, op, arg1, arg2 and result.
The op field contains an internal code for the operator. The three-address statement x : =
y op z is represented by placing y in arg1, z in arg2 and x in result.
The contents of fields arg1, arg2 and result are normally pointers to the symbol-table
entries for the names represented by these fields. If so, temporary names must be entered
into the symbol table as they are created.
Triples:
To avoid entering temporary names into the symbol table, we might refer to a temporary
value by the position of the statement that computes it.
If we do so, three-address statements can be represented by records with only three fields:
op, arg1 and arg2.
The fields arg1 and arg2, for the arguments of op, are either pointers to the symbol table
or pointers into the triple structure ( for temporary values ).
Since three fields are used, this intermediate code format is known as triples.
Indirect Triples:
Another implementation is to list pointers to triples, rather than listing the triples themselves;
this representation is called indirect triples. For example, let us use an array statement to list
pointers to triples in the desired order. Then the triples shown above might be represented as
follows:
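As a rough sketch of the three representations, the statements for a : = b * - c + b * - c can be held as quadruples, as triples (where an operand may be the number of the statement that computes it), and as an indirect-triple listing of pointers into the triple array; the Python structures below are only illustrative.

# Quadruples: (op, arg1, arg2, result)
quads = [
    ("uminus", "c",  None, "t1"),
    ("*",      "b",  "t1", "t2"),
    ("uminus", "c",  None, "t3"),
    ("*",      "b",  "t3", "t4"),
    ("+",      "t2", "t4", "t5"),
    (":=",     "t5", None, "a"),
]

# Triples: (op, arg1, arg2); a temporary is referred to by the number of the
# statement that computes it, written here as ("stmt", i)
triples = [
    ("uminus", "c",         None),          # (0)
    ("*",      "b",         ("stmt", 0)),   # (1)
    ("uminus", "c",         None),          # (2)
    ("*",      "b",         ("stmt", 2)),   # (3)
    ("+",      ("stmt", 1), ("stmt", 3)),   # (4)
    ("assign", "a",         ("stmt", 4)),   # (5)
]

# Indirect triples: a separate statement list holds pointers into the triple array,
# so statements can be reordered without renumbering the triples themselves.
statement_list = [0, 1, 2, 3, 4, 5]

for i in statement_list:
    print(i, triples[i])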
DECLARATIONS
Before the first declaration is considered, offset is set to 0. As each new name is seen,
that name is entered in the symbol table with offset equal to the current value of offset,
and offset is incremented by the width of the data object denoted by that name.
The procedure enter( name, type, offset ) creates a symbol-table entry for name, gives its
type type and relative address offset in its data area.
Attribute type represents a type expression constructed from the basic types integer and
real by applying the type constructors pointer and array. If type expressions are
represented by graphs, then attribute type might be a pointer to the node representing a
type expression.
The width of an array is obtained by multiplying the width of each element by the
number of elements in the array. The width of each pointer is assumed to be 4.
P → D            { offset : = 0 }
D → D ; D
D → id : T       { enter(id.name, T.type, offset); offset : = offset + T.width }
For nested procedure declarations, the grammar is extended to:
P → D
D → D ; D | id : T | proc id ; D ; S
One possible implementation of a symbol table is a linked list of entries for names.
A new symbol table is created when a procedure declaration D → proc id ; D1 ; S is seen,
and entries for the declarations in D1 are created in the new table. The new table points back to
the symbol table of the enclosing procedure; the name represented by id itself is local to the
enclosing procedure. The only change from the treatment of variable declarations is that the
procedure enter is told which symbol table to make an entry in.
For example, consider the symbol tables for procedures readarray, exchange, and
quicksort pointing back to that for the containing procedure sort, consisting of the entire
program. Since partition is declared within quicksort, its table points to that of quicksort.
[Figure: nested symbol tables. The outermost table, for sort, has a header and entries a, x,
readarray, exchange and quicksort, with links to the tables for readarray, exchange and
quicksort; the table for partition (header; entries i, j) points back to the table for quicksort,
and each nested table points back to the table of its enclosing procedure.]
The semantic rules are defined in terms of the following operations:
1. mktable(previous) creates a new symbol table and returns a pointer to the new table. The
argument previous points to a previously created symbol table, presumably that for the
enclosing procedure.
2. enter(table, name, type, offset) creates a new entry for name name in the symbol table pointed
to by table. Again, enter places type type and relative address offset in fields within the entry.
3. addwidth(table, width) records the cumulative width of all the entries in table in the header
associated with this symbol table.
4. enterproc(table, name, newtable) creates a new entry for procedure name in the symbol table
pointed to by table. The argument newtable points to the symbol table for this procedure
name.
D → D1 ; D2
The stack tblptr is used to contain pointers to the tables for sort, quicksort, and partition
when the declarations in partition are considered.
The top element of stack offset is the next available relative address for a local of the
current procedure.
In a production such as
A → B C { actionA }
the actions in the subtrees for B and C are done before actionA at the end of the production
occurs. Hence, the action associated with the marker M is the first to be done.
The action for nonterminal M initializes stack tblptr with a symbol table for the
outermost scope, created by operation mktable(nil). The action also pushes relative
address 0 onto stack offset.
For each variable declaration id: T, an entry is created for id in the current symbol table.
The top of stack offset is incremented by T.width.
When the action on the right side of D → proc id ; N D1 ; S occurs, the width of all
declarations generated by D1 is on the top of stack offset; it is recorded using addwidth.
Stacks tblptr and offset are then popped.
At this point, the name of the enclosed procedure is entered into the symbol table of its
enclosing procedure.
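The four operations and the two stacks can be sketched as follows; the symbol table is a small Python class, and the calls are made in the order in which the parser would perform the actions for a single declaration x : integer inside a procedure quicksort (the names and the width 4 are assumptions for the example).

class SymTable:
    def __init__(self, previous=None):
        self.previous = previous          # link to the enclosing procedure's table
        self.entries = {}                 # name -> (kind, ...) entry
        self.width = 0

def mktable(previous):                    # 1. create a new table
    return SymTable(previous)

def enter(table, name, typ, offset):      # 2. enter a variable
    table.entries[name] = ("var", typ, offset)

def addwidth(table, width):               # 3. record the total width of the locals
    table.width = width

def enterproc(table, name, newtable):     # 4. enter a nested procedure
    table.entries[name] = ("proc", newtable)

# stacks used by the translation scheme
tblptr, offset = [], []

tblptr.append(mktable(None)); offset.append(0)        # action of marker M (outermost scope)

tblptr.append(mktable(tblptr[-1])); offset.append(0)  # action of N: entering proc quicksort
enter(tblptr[-1], "x", "integer", offset[-1])         # id : T  with T.width = 4
offset[-1] += 4

addwidth(tblptr[-1], offset[-1])                      # end of D -> proc id ; N D1 ; S
inner = tblptr.pop(); offset.pop()
enterproc(tblptr[-1], "quicksort", inner)             # record quicksort in the enclosing table

print(list(tblptr[-1].entries), inner.entries, inner.width)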
ASSIGNMENT STATEMENTS
Suppose that the context in which an assignment appears is given by the following grammar.
P → M D
M → ɛ
D → D ; D | id : T | proc id ; N D ; S
N → ɛ
Nonterminal P becomes the new start symbol when these productions are added to those in the
translation scheme shown below.
S → id : = E     { p : = lookup ( id.name);
                   if p ≠ nil then
                      emit( p ‘ : =’ E.place)
                   else error }
E → E1 + E2      { E.place : = newtemp;
                   emit( E.place ‘: =’ E1.place ‘ + ‘ E2.place ) }
E → E1 * E2      { E.place : = newtemp;
                   emit( E.place ‘: =’ E1.place ‘ * ‘ E2.place ) }
E → - E1         { E.place : = newtemp;
                   emit ( E.place ‘: =’ ‘uminus’ E1.place ) }
E → ( E1 )       { E.place : = E1.place }
E → id           { p : = lookup ( id.name);
                   if p ≠ nil then
                      E.place : = p
                   else error }
Temporaries can be reused by changing newtemp. The code generated by the rules for
E → E1 + E2 has the general form:
evaluate E1 into t1
evaluate E2 into t2
t : = t1 + t2
The lifetimes of these temporaries are nested like matching pairs of balanced parentheses.
For example, for x : = a * b + c * d - e * f, keeping a count c that is incremented when a new
temporary is generated and decremented whenever a temporary is used as an operand:
statement            value of c
0
$0 := a * b 1
$1 := c * d 2
$0 := $0 + $1 1
$1 := e * f 2
$0 := $0 - $1 1
x := $0 0
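A sketch of this reuse scheme: the counter c is incremented when newtemp hands out $c and decremented whenever a temporary is used as an operand, so the trace below mirrors the counts in the table above (the source operand named c is unrelated to the counter; the statement encoding is made up).

c = 0                                   # the counter, not the source variable "c"

def use(operand):
    """Decrement the counter when a temporary is used as an operand."""
    global c
    if operand.startswith("$"):
        c -= 1
    return operand

def newtemp():
    """Hand out $c and increment the counter."""
    global c
    name = f"${c}"
    c += 1
    return name

def emit(op1, op, op2):
    a, b = use(op1), use(op2)           # operands release their temporaries first
    t = newtemp()
    print(f"{t} := {a} {op} {b}    (c = {c})")
    return t

t0 = emit("a", "*", "b")                # $0 := a * b      c = 1
t1 = emit("c", "*", "d")                # $1 := c * d      c = 2
t0 = emit(t0, "+", t1)                  # $0 := $0 + $1    c = 1
t1 = emit("e", "*", "f")                # $1 := e * f      c = 2
t0 = emit(t0, "-", t1)                  # $0 := $0 - $1    c = 1
print(f"x := {use(t0)}    (c = {c})")   # c = 0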
Elements of an array can be accessed quickly if the elements are stored in a block of
consecutive locations. If the width of each array element is w, then the ith element of array A
begins in location
base + ( i – low ) x w
where low is the lower bound on the subscript and base is the relative address of the storage
allocated for the array. That is, base is the relative address of A[low].
The expression can be partially evaluated at compile time if it is rewritten as
i x w + ( base – low x w)
The subexpression c = base – low x w can be evaluated when the declaration of the array is seen.
We assume that c is saved in the symbol table entry for A , so the relative address of A[i] is
obtained by simply adding i x w to c.
Row-major (row-by-row)
Column-major (column-by-column)
For a 2 x 3 array A:
Row-major order:     A[1,1], A[1,2], A[1,3], A[2,1], A[2,2], A[2,3]
Column-major order:  A[1,1], A[2,1], A[1,2], A[2,2], A[1,3], A[2,3]
In the case of row-major form, the relative address of A[ i1, i2 ] can be calculated by the formula
base + ( ( i1 – low1 ) x n2 + ( i2 – low2 ) ) x w
where low1 and low2 are the lower bounds on the values of i1 and i2, and n2 is the number of
values that i2 can take. That is, if high2 is the upper bound on the value of i2, then n2 = high2 –
low2 + 1.
Assuming that i1 and i2 are the only values that are not known at compile time, we can rewrite
the above expression as
( ( i1 x n2 ) + i2 ) x w + ( base – ( ( low1 x n2 ) + low2 ) x w )
Generalized formula:
The expression generalizes to the following expression for the relative address of A[ i1, i2, ..., ik ]:
( ( ... ( ( i1 n2 + i2 ) n3 + i3 ) ... ) nk + ik ) x w
+ base – ( ( ... ( ( low1 n2 + low2 ) n3 + low3 ) ... ) nk + lowk ) x w
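The k-dimensional formula is Horner's rule applied to the indices; the sketch below (with invented bounds) evaluates the index-dependent part and the compile-time constant part separately, as the text suggests, and for k = 1 it degenerates to i x w + (base – low x w).

def array_address(base, width, indices, lows, dims):
    """Row-major relative address of A[i1,...,ik].
    dims[j] is n_{j+1}, the number of values the (j+1)-th index can take."""
    def horner(values):
        acc = values[0]
        for n, v in zip(dims[1:], values[1:]):
            acc = acc * n + v
        return acc
    variable_part = horner(indices) * width            # must be computed at run time
    constant_part = base - horner(lows) * width        # computable at compile time
    return variable_part + constant_part

# hypothetical 2-D array A[1..10, 1..20] of 4-byte elements, element A[3, 5]
print(array_address(base=100, width=4, indices=[3, 5], lows=[1, 1], dims=[10, 20]))  # 276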
(1) S → L : = E
(2) E → E + E
(3) E → ( E )
(4) E → L
(5) L → Elist ]
(6) L → id
(7) Elist → Elist , E
(8) Elist → id [ E
We generate a normal assignment if L is a simple name, and an indexed assignment into the
location denoted by L otherwise :
When an array reference L is reduced to E, we want the r-value of L. Therefore we use indexing
to obtain the contents of the location L.place [ L.offset ].
For the production Elist → id [ E, the semantic action is:
{ Elist.place : = E.place;
  Elist.ndim : = 1 }
Consider the grammar for assignment statements as above, but suppose there are two
types – real and integer , with integers converted to reals when necessary. We have another
attribute E.type, whose value is either real or integer. The semantic rule for E.type associated
with the production E → E + E is :
E → E1 + E2     { E.type : = if E1.type = integer and
                              E2.type = integer then integer
                              else real }
The entire semantic rule for E → E + E and most of the other productions must be
modified to generate, when necessary, three-address statements of the form x : = inttoreal y,
whose effect is to convert integer y to a real of equal value, called x.
E.place := newtemp;
if E1.type = integer and E2.type = integer then begin
emit( E.place ‘: =’ E1.place ‘int +’ E2.place);
E.type : = integer
end
else if E1.type = real and E2.type = real then begin
emit( E.place ‘: =’ E1.place ‘real +’ E2.place);
E.type : = real
end
else if E1.type = integer and E2.type = real then begin
u : = newtemp;
emit( u ‘: =’ ‘inttoreal’ E1.place);
emit( E.place ‘: =’ u ‘ real +’ E2.place);
E.type : = real
end
else if E1.type = real and E2.type =integer then begin
u : = newtemp;
emit( u ‘: =’ ‘inttoreal’ E2.place);
emit( E.place ‘: =’ E1.place ‘ real +’ u);
E.type : = real
end
else
E.type : = type_error;
For example, for the input x : = y + i * j
assuming x and y have type real, and i and j have type integer, the output would look like
t1 : = i int* j
t3 : = inttoreal t1
t2 : = y real+ t3
x : = t2
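The conversion logic can be sketched as a small function: given the operand places and types, it emits int +, real +, or an inttoreal conversion followed by real +, and on x : = y + i * j it reproduces the output above (symbol types are supplied by hand; the type_error case is omitted).

code, temps = [], 0

def newtemp():
    global temps
    temps += 1
    return f"t{temps}"

def emit(s):
    code.append(s)

def add(p1, t1, p2, t2):
    """Code for p1 + p2 with int-to-real coercion; returns (place, type)."""
    place = newtemp()                       # E.place is created first, as in the rule
    if t1 == t2 == "integer":
        emit(f"{place} := {p1} int+ {p2}")
        return place, "integer"
    if t1 == "integer":                     # convert the left operand
        u = newtemp(); emit(f"{u} := inttoreal {p1}"); p1 = u
    elif t2 == "integer":                   # convert the right operand
        u = newtemp(); emit(f"{u} := inttoreal {p2}"); p2 = u
    emit(f"{place} := {p1} real+ {p2}")
    return place, "real"

# x := y + i * j, with y : real and i, j : integer
t_mul = newtemp()
emit(f"{t_mul} := i int* j")                # i * j : both operands integer
place, _ = add("y", "real", t_mul, "integer")
emit(f"x := {place}")
print("\n".join(code))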
BOOLEAN EXPRESSIONS
Boolean expressions have two primary purposes. They are used to compute logical
values, but more often they are used as conditional expressions in statements that alter the flow
of control, such as if-then-else, or while-do statements.
Boolean expressions are composed of the boolean operators ( and, or, and not ) applied
to elements that are boolean variables or relational expressions. Relational expressions are of the
form E1 relop E2, where E1 and E2 are arithmetic expressions.
There are two principal methods of representing the value of a boolean expression. They are :
To encode true and false numerically and to evaluate a boolean expression analogously
to an arithmetic expression. Often, 1 is used to denote true and 0 to denote false.
To implement boolean expressions by flow of control, that is, representing the value of a
boolean expression by a position reached in a program. This method is particularly
convenient in implementing the boolean expressions in flow-of-control statements, such
as the if-then and while-do statements.
Numerical Representation
Here, 1 denotes true and 0 denotes false. Expressions will be evaluated completely from
left to right, in a manner similar to arithmetic expressions.
For example :
E → E1 or E2       { E.place : = newtemp;
                     emit( E.place ‘: =’ E1.place ‘or’ E2.place ) }
E → E1 and E2      { E.place : = newtemp;
                     emit( E.place ‘: =’ E1.place ‘and’ E2.place ) }
E → not E1         { E.place : = newtemp;
                     emit( E.place ‘: =’ ‘not’ E1.place ) }
E → ( E1 )         { E.place : = E1.place }
E → id1 relop id2  { E.place : = newtemp;
                     emit( ‘if’ id1.place relop.op id2.place ‘goto’ nextstat + 3);
                     emit( E.place ‘: =’ ‘0’ );
                     emit( ‘goto’ nextstat + 2);
                     emit( E.place ‘: =’ ‘1’ ) }
E → true           { E.place : = newtemp;
                     emit( E.place ‘: =’ ‘1’ ) }
E → false          { E.place : = newtemp;
                     emit( E.place ‘: =’ ‘0’ ) }
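The relop rule can be sketched with an explicit statement counter so that the goto nextstat + 3 and goto nextstat + 2 offsets become visible; statement numbering starts at 100 purely for illustration.

code = []

def nextstat():
    return 100 + len(code)        # index of the next three-address statement

def emit(s):
    print(f"{nextstat()}: {s}")
    code.append(s)

def relop(place, x, op, y):       # E -> id1 relop id2
    emit(f"if {x} {op} {y} goto {nextstat() + 3}")
    emit(f"{place} := 0")
    emit(f"goto {nextstat() + 2}")
    emit(f"{place} := 1")

relop("t1", "a", "<", "b")        # leaves the 0/1 value of a < b in t1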
Short-Circuit Code:
We can also translate a boolean expression into three-address code without generating
code for any of the boolean operators and without having the code necessarily evaluate the entire
expression. This style of evaluation is sometimes called “short-circuit” or “jumping” code. It is
possible to evaluate boolean expressions without generating code for the boolean operators and,
or, and not if we represent the value of an expression by a position in the code sequence.
We now consider the translation of boolean expressions into three-address code in the
context of if-then, if-then-else, and while-do statements such as those generated by the following
grammar:
S → if E then S1
  | if E then S1 else S2
  | while E do S1
E.true is the label to which control flows if E is true, and E.false is the label to which
control flows if E is false.
The semantic rules for translating a flow-of-control statement S allow control to flow
from the translation S.code to the three-address instruction immediately following
S.code.
S.next is a label that is attached to the first three-address instruction to be executed after
the code for S.
(a) if-then:
            E.code        (with jumps to E.true and E.false)
E.true:     S1.code
E.false:    ...

(b) if-then-else:
            E.code        (with jumps to E.true and E.false)
E.true:     S1.code
            goto S.next
E.false:    S2.code
S.next:     ...

(c) while-do:
S.begin:    E.code        (with jumps to E.true and E.false)
E.true:     S1.code
            goto S.begin
E.false:    ...
Syntax-directed definition for flow-of-control statements
E → E1 or E2     E1.true : = E.true;
                 E1.false : = newlabel;
                 E2.true : = E.true;
                 E2.false : = E.false;
                 E.code : = E1.code || gen(E1.false ‘:’) || E2.code
E → ( E1 )       E1.true : = E.true;
                 E1.false : = E.false;
                 E.code : = E1.code
CASE STATEMENTS
switch expression
begin
case value : statement
case value : statement
...
case value : statement
default : statement
end
switch E
begin
case V1 : S1
case V2 : S2
...
case Vn-1 : Sn-1
default : Sn
end
This case statement is translated into intermediate code that has the following form :
When keyword switch is seen, two new labels test and next, and a new temporary t are
generated.
As each case keyword occurs, a new label Li is created and entered into the symbol table.
A pointer to this symbol-table entry and the value Vi of case constant are placed on a
stack (used only to store cases).
Each statement case Vi : Si is processed by emitting the newly created label Li, followed
by the code for Si , followed by the jump goto next.
Then when the keyword end terminating the body of the switch is found, the code can be
generated for the n-way branch. Reading the pointer-value pairs on the case stack from
the bottom to the top, we can generate a sequence of three-address statements of the form
case V1 L1
case V2 L2
...
case Vn-1 Ln-1
case t Ln
label next
where t is the name holding the value of the selector expression E, and Ln is the label for
the default statement.
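A sketch of this strategy: as each case is seen, its constant and newly created label are pushed on the case stack, and when end is reached the n-way branch is emitted from the saved pairs. Label management and the code for each Si are reduced to placeholder strings.

code, label_count = [], 0

def newlabel():
    global label_count
    label_count += 1
    return f"L{label_count}"

def translate_switch(t, cases, default_body):
    """t: name holding the value of the selector expression E (assumed already evaluated).
    cases: list of (Vi, [three-address statements for Si])."""
    test, next_ = newlabel(), newlabel()     # created when the keyword 'switch' is seen
    case_stack = []                          # (value, label) pairs, pushed per case
    code.append(f"goto {test}")
    for value, body in cases:                # each 'case Vi : Si'
        label = newlabel()
        case_stack.append((value, label))
        code.extend([f"{label}:"] + body + [f"goto {next_}"])
    default_label = newlabel()               # 'default : Sn'
    code.extend([f"{default_label}:"] + default_body + [f"goto {next_}"])
    code.append(f"{test}:")
    for value, label in case_stack:          # emit the n-way branch from the case stack
        code.append(f"  case {value} {label}")
    code.append(f"  case {t} {default_label}")
    code.append(f"label {next_}")

translate_switch("t", [(1, ["code for S1"]), (2, ["code for S2"])], ["code for Sn"])
print("\n".join(code))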
BACKPATCHING
The easiest way to implement the syntax-directed definitions for boolean expressions is
to use two passes. First, construct a syntax tree for the input, and then walk the tree in depth-first
order, computing the translations. The main problem with generating code for boolean
expressions and flow-of-control statements in a single pass is that during one single pass we may
not know the labels that control must go to at the time the jump statements are generated. Hence,
a series of branching statements with the targets of the jumps left unspecified is generated. Each
statement will be put on a list of goto statements whose labels will be filled in when the proper
label can be determined. We call this subsequent filling in of labels backpatching.
1. makelist(i) creates a new list containing only i, an index into the array of quadruples;
makelist returns a pointer to the list it has made.
2. merge(p1,p2) concatenates the lists pointed to by p1 and p2, and returns a pointer to the
concatenated list.
3. backpatch(p,i) inserts i as the target label for each of the statements on the list pointed to
by p.
Boolean Expressions:
We now construct a translation scheme suitable for producing quadruples for boolean
expressions during bottom-up parsing. The grammar we use is the following:
(1) E → E1 or M E2
(2)   | E1 and M E2
(3)   | not E1
(4)   | ( E1 )
(5)   | id1 relop id2
(6)   | true
(7)   | false
(8) M → ɛ
Synthesized attributes truelist and falselist of nonterminal E are used to generate jumping code
for boolean expressions. Incomplete jumps with unfilled labels are placed on lists pointed to by
E.truelist and E.falselist.
Consider production E E1 and M E2. If E1 is false, then E is also false, so the statements on
E1.falselist become part of E.falselist. If E1 is true, then we must next test E2, so the target for the
statements E1.truelist must be the beginning of the code generated for E2. This target is obtained
using marker nonterminal M.
Attribute M.quad records the number of the first statement of E2.code. With the production
M → ɛ we associate the semantic action
{ M.quad : = nextquad }
The variable nextquad holds the index of the next quadruple to follow. This value will be
backpatched onto the E1.truelist when we have seen the remainder of the production E E1 and
M E2. The translation scheme is as follows:
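The three list-handling functions and the actions for relops and for E → E1 and M E2 can be sketched together: jump quadruples are emitted with unfilled targets, and once M.quad (here, nextquad at the moment the marker would be reduced) is known, E1.truelist is backpatched to it. All data structures are simplified for illustration.

quads = []                                # each quad: [opcode string, target or None]

def nextquad():
    return len(quads)

def emit(op, target=None):
    quads.append([op, target])

def makelist(i):
    return [i]

def merge(p1, p2):
    return p1 + p2

def backpatch(lst, i):
    for q in lst:
        quads[q][1] = i                   # fill in the jump target

def relop(x, op, y):                      # E -> id1 relop id2
    truelist  = makelist(nextquad()); emit(f"if {x} {op} {y} goto")
    falselist = makelist(nextquad()); emit("goto")
    return truelist, falselist

# E -> E1 and M E2   for the expression  a < b and c < d
E1_true, E1_false = relop("a", "<", "b")
M_quad = nextquad()                       # marker M: number of the first quad of E2
E2_true, E2_false = relop("c", "<", "d")
backpatch(E1_true, M_quad)                # if E1 is true, test E2 next
E_true  = E2_true
E_false = merge(E1_false, E2_false)

for i, (op, target) in enumerate(quads):
    print(i, op, target if target is not None else "_")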
Flow-of-Control Statements:
(1) S → if E then S
(2)   | if E then S else S
(3)   | while E do S
(4)   | begin L end
(5)   | A
(6) L → L ; S
(7)   | S
The nonterminal E has two attributes E.truelist and E.falselist. L and S also need a list of
unfilled quadruples that must eventually be completed by backpatching. These lists are pointed
to by the attributes L.nextlist and S.nextlist. S.nextlist is a pointer to a list of all conditional and
unconditional jumps to the quadruple following the statement S in execution order, and L.nextlist
is defined similarly.
We backpatch the jumps when E is true to the quadruple M1.quad, which is the beginning of the
code for S1. Similarly, we backpatch jumps when E is false to go to the beginning of the code for
S2. The list S.nextlist includes all jumps out of S1 and S2, as well as the jump generated by N.
The statement following L1 in order of execution is the beginning of S. Thus the L1.nextlist list is
backpatched to the beginning of the code for S, which is given by M.quad.
PROCEDURE CALLS
The procedure is such an important and frequently used programming construct that it is
imperative for a compiler to generate good code for procedure calls and returns. The run-time
routines that handle procedure argument passing, calls and returns are part of the run-time
support package.
Calling Sequences:
The translation for a call includes a calling sequence, a sequence of actions taken on entry
to and exit from each procedure. The following are the actions that take place in a calling sequence:
When a procedure call occurs, space must be allocated for the activation record of the
called procedure.
The arguments of the called procedure must be evaluated and made available to the called
procedure in a known place.
Environment pointers must be established to enable the called procedure to access data in
enclosing blocks.
The state of the calling procedure must be saved so it can resume execution after the call.
Also saved in a known place is the return address, the location to which the called
routine must transfer after it is finished.
Finally a jump to the beginning of the code for the called procedure must be generated.
(3) Elist → E    { initialize queue to contain only E.place }
Here, the code for S is the code for Elist, which evaluates the arguments, followed by a
param p statement for each argument, followed by a call statement.
The queue is emptied and then gets a single pointer to the symbol-table location for the name
that denotes the value of E.
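A sketch of this part of the calling sequence: the places of the already-evaluated arguments are queued in order, a param statement is emitted for each queued place, and a call p, n statement follows; expression evaluation itself is reduced to ready-made places.

code = []

def emit(s):
    code.append(s)

def translate_call(proc_name, arg_places):
    queue = []                                  # queue of E.place values, in order
    for place in arg_places:                    # Elist -> Elist , E appends to the queue
        queue.append(place)
    for place in queue:                         # after all arguments are evaluated
        emit(f"param {place}")
    emit(f"call {proc_name}, {len(queue)}")     # call p, n

# hypothetical call p(x, y+1) whose second argument was evaluated into t1
translate_call("p", ["x", "t1"])
print("\n".join(code))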