Module3 Part2
MODULE III
• Assembler design options:
o Machine Independent assembler features
▪ Literals
▪ Symbol Defining Statements
▪ Expressions
▪ Program blocks
▪ Control sections
o Assembler design options
▪ Algorithm for Single Pass assembler
▪ Multi pass assembler
o Implementation example of MASM Assembler
o Literals
▪ Programmers can write the value of a constant operand as part of the
instruction that uses it. Such an operand is called a literal.
▪ A literal is identified by the prefix =
▪ Eg: LDA =X’05’
o Literals can be used in both SIC and SIC/XE, but immediate operands are valid only in SIC/XE
▪ Literal Pools
• All the literal operands used in a program are gathered together into one or more
literal pools.
• There are two ways to place the literals in the program
o Can place the literals at the end of the program (After END statement).
o Can place the literals at some other location in the object program.
▪ Reason: keep the literal operand close to the instruction
▪ An assembler directive LTORG is used.
▪ Whenever the LTORG is encountered, it creates a literal pool that
contains all the literal operands used since the beginning of the program
or since the previous LTORG.
▪ It is better to place the literals close to the instructions.
• If a literal operand is placed too far from the instruction that
references it, PC-relative and base-relative addressing cannot be
used to generate the object code, and the assembler is forced to use
the extended instruction format. Placing LTORG at several points
in the program avoids this.
• LITTAB is often organized as a hash table, using the literal name or value as the
key
▪ Implementation of Literals
• During Pass-1:
o The literal encountered is searched in the literal table.
o If the literal already exists, no action is taken.
o If it is not present, the literal name, operand value and length are added to
the LITTAB.
o When the assembler encounters an LTORG statement or the end of the program
▪ The assembler makes a scan of the LITTAB and assigns an address for
each literal not yet assigned an address.
▪ Update the location counter value.
• During Pass-2:
o Search LITTAB for each literal operand encountered
o Literal values placed at correct locations in the object program.
o If the literal value represents an address in the program, the assembler must
also generate the appropriate Modification Record.
▪ Literals may also refer to the current value of the location counter.
• The operand =* denotes a literal whose value is the current value of the location counter
• Eg: LDB =*
▪ Duplicate literals
• The same literal used more than once in the program
• e.g. WLOOP TD =X’05’
---- ----- -----
---- ----- -----
WD =X’05’
• The assembler should recognize duplicate literals and store only one copy of
the specified data value
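The Pass 1 handling of literals described above can be sketched in Python as follows. This is only an illustrative sketch: the names `Literal`, `note_literal` and `place_pool` are hypothetical, only the =X'..' and =C'..' forms are handled, and the starting LOCCTR value is an assumed number. Duplicates are recognized here by comparing literal names.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Literal:
    value: bytes                   # assembled operand value
    length: int                    # length in bytes
    address: Optional[int] = None  # filled in at LTORG or at END


def literal_bytes(name: str) -> bytes:
    """Convert =X'..' or =C'..' to bytes (only these two forms are handled)."""
    kind, text = name[1], name[3:-1]
    return bytes.fromhex(text) if kind == "X" else text.encode("ascii")


def note_literal(littab: Dict[str, Literal], name: str) -> None:
    """Pass 1: add the literal unless an identical name is already in LITTAB."""
    if name not in littab:
        value = literal_bytes(name)
        littab[name] = Literal(value, len(value))


def place_pool(littab: Dict[str, Literal], locctr: int) -> int:
    """LTORG or END: give every unplaced literal an address, return new LOCCTR."""
    for lit in littab.values():
        if lit.address is None:
            lit.address = locctr
            locctr += lit.length
    return locctr


littab: Dict[str, Literal] = {}
for operand in ["=C'EOF'", "=X'05'", "=X'05'"]:   # =X'05' is used twice
    note_literal(littab, operand)
new_locctr = place_pool(littab, 0x002D)           # 0x002D is an assumed LOCCTR
```

Because the table is keyed by the literal name, the second =X'05' causes no new entry, so only one copy of the value is stored in the pool.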
• Usage of EQU:
o To improve readability in place of numeric values
▪ Eg: Replace +LDT #4096
With
MAXLEN EQU 4096
+LDT #MAXLEN
o To define mnemonic names for registers
▪ Eg: Replace RMO 0,1
with
A EQU 0
X EQU 1
RMO A,X
• No forward reference
o One restriction on EQU is that every symbol on the right-hand side of
the EQU statement must already be defined.
o Eg:
▪ ORG Statement
• ORG is an Assembler directive
• Allows the assembler to reset the location counter (LOCCTR) to a specified value
• Syntax: ORG value
• When ORG is encountered, the assembler resets its LOCCTR to the specified
value
• ORG will affect the values of all labels defined until the next ORG
• We can return to the normal use of LOCCTR by simply writing ORG with no operand
• ORG is used to control the assignment of storage in the object program.
• No forward reference is allowed
o All symbols used to specify the new LOCCTR value must have been
previously defined.
o If ORG referred to a symbol that is defined later, then during Pass 1 the
assembler would not know what value to assign to the location counter in
response to the first ORG statement. As a result, the symbols BYTE1, BYTE2
and BYTE3 could not be assigned addresses during Pass 1.
o Expressions
▪ Assemblers allow expressions to be used as operands
▪ The assembler evaluates each expression and produces a single operand address or
value
▪ Expressions consist of
4 Prepared By: Dona Jose, AP, CSE,VJCET
Reference Book: System Software: An Introduction to System Programming, Leland L Beck
Module III System Software(S5 CSE)
• Operators: +, -, *, /
• Constants
• User-defined symbols
• Special terms: *, the current value of LOCCTR
• Examples
MAXLEN EQU BUFEND-BUFFER
STAB RESB (6+3+2)*MAXENTRIES
BUFEND EQU *
The current value of location counter is assigned to BUFEND.
▪ Values of terms can be classified as absolute or relative.
• Absolute terms
o Independent of program location
o Eg: Constants
MAXLEN EQU 1000
• Relative terms
o Defined relative to the beginning of the program
o Eg:
▪ Labels on instructions
▪ References to location counter: *
▪ Expressions can be either absolute or relative
• Absolute Expression
o Expression contains only absolute terms
MAXLEN EQU 1000+5
o Relative terms in pairs with opposite signs for each pair
MAXLEN EQU BUFEND-BUFFER
▪ BUFEND and BUFFER both are relative terms, representing addresses
within the program. The expression BUFEND-BUFFER represents an
absolute value.
▪ When relative terms are paired with opposite signs, the dependency on
the program starting address is canceled out. The result is an absolute
value.
▪ No relative term may enter into a multiplication or division operation.
• Relative Expression
o Contains an odd number of relative terms, with one more positive term than
negative term.
STAB EQU OPTAB + (BUFEND - BUFFER)
o No relative term may enter into a multiplication or division operation.
▪ Eg: 3*BUFFER is incorrect.
• Expressions that are neither absolute nor relative will lead to assembler error.
o Eg:
▪ BUFEND+BUFFER
▪ 100-BUFFER
▪ 3*BUFFER
▪ Defining Symbol Types in the Symbol Table
• To find the type of expression, we must keep track of the types of all symbols
defined in this program.
• For this purpose we need a flag in the SYMTAB to indicate type of value
(absolute or relative) in addition to the value itself.
• With this information the assembler can easily determine the type of each
expression used as an operand and generate Modification Record in the object
program for relative values.
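The absolute/relative rules above can be sketched with a SYMTAB that carries a type flag. This is a minimal sketch under stated assumptions: the function name `classify` is hypothetical, the addresses are illustrative values only, and only sums and differences of terms are handled (an expression such as 3*BUFFER would have to be rejected separately, since no relative term may be multiplied or divided).

```python
from typing import Dict, List, Tuple

# SYMTAB entry: (value, is_relative). All values are illustrative only.
SYMTAB: Dict[str, Tuple[int, bool]] = {
    "BUFFER": (0x0036, True),
    "BUFEND": (0x1036, True),
    "OPTAB": (0x2000, True),    # hypothetical address
    "MAXLEN": (0x1000, False),  # absolute symbol
}


def classify(terms: List[Tuple[int, str]]) -> str:
    """Classify a sum of signed terms, e.g. [(+1, 'BUFEND'), (-1, 'BUFFER')]."""
    net_relative = 0
    for sign, term in terms:
        # Numeric constants are absolute; symbols are looked up in SYMTAB.
        if not term.isdigit() and SYMTAB[term][1]:
            net_relative += sign
    if net_relative == 0:
        return "absolute"       # relative terms cancel in pairs
    if net_relative == 1:
        return "relative"       # exactly one unpaired positive relative term
    return "error"              # neither absolute nor relative


kinds = {
    "BUFEND-BUFFER": classify([(+1, "BUFEND"), (-1, "BUFFER")]),
    "OPTAB+(BUFEND-BUFFER)": classify([(+1, "OPTAB"), (+1, "BUFEND"), (-1, "BUFFER")]),
    "BUFEND+BUFFER": classify([(+1, "BUFEND"), (+1, "BUFFER")]),
    "100-BUFFER": classify([(+1, "100"), (-1, "BUFFER")]),
}
```

A result of "relative" is the case for which the assembler would also generate a Modification Record.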
o Program blocks
▪ A source program logically contains subroutines, data areas, etc.
▪ So far, the generated machine instructions and data appear in the object program in
the same order as they were written in the source program.
▪ Program blocks allow the generated machine instructions and data to appear in the
object program, and so in memory, in a different order from the source statements.
• Separating blocks for storing code, data, stack, and larger data block
▪ Assembler directive: USE
• Syntax: USE [blockname]
• USE indicates which portion of the source program belongs to the various
blocks.
• At the beginning, statements are assumed to be part of the default block
• If no USE statements are included, the entire program belongs to this single
block
• Each program block may actually contain several separate segments of the
source program
▪ The assembler rearranges these segments to gather together the pieces of each block
and assigns addresses
• Separate the program into blocks in a particular order
• Large buffer area is moved to the end of the object program
• Program readability is better if data areas are placed in the source program close
to the statements that reference them.
• Consider the following program. Here 3 blocks are used.
o Unnamed (default) block (block no: 0): contains the executable instructions
of the program.
o CDATA block (block no: 1): contains all data areas that are short in length.
o CBLKS block (block no: 2): contains all data areas that consist of larger
blocks of memory
• At the beginning, statements are assumed to be part of the default block
• The USE statement on line 92 signals the beginning of the block named
CDATA.
• The USE statement on line 103 signals the beginning of the block named
CBLKS.
• The USE statements on line 123 and 208 resume the default block, and the
statements on line 183 and 252 resume CDATA block.
• Line 107 is shown without a block number because the value of MAXLEN is an
absolute symbol.
▪ Pass 1
• A separate location counter for each program block
o At the beginning of a block, LOCCTR is set to 0.
o Save and restore LOCCTR when switching between blocks
• Assign each label an address relative to the start of the block that contains it.
• Store the block name (or number) in the SYMTAB along with the assigned
relative address of the label
• At the end of Pass1 the latest value of LOCCTR for each block indicates the
length of that block.
• At the end of Pass1 the assembler constructs a block table that contains the
block name, block number, starting addresses and length of all blocks.
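The construction of the block table at the end of Pass 1 can be sketched as below. The function name `build_block_table` is hypothetical. The block lengths are derived from the address ranges that this section quotes for the sample program (0000 to 0065, 0066 to 0070, 0071 to 1070), and the starting address 0 is assumed.

```python
from typing import Dict, List, Tuple


def build_block_table(lengths: List[Tuple[str, int]],
                      start: int = 0) -> Dict[str, Dict[str, int]]:
    """Assign each block a number and a starting address, in order of first use."""
    table: Dict[str, Dict[str, int]] = {}
    address = start
    for number, (name, length) in enumerate(lengths):
        table[name] = {"number": number, "address": address, "length": length}
        address += length       # the next block starts where this one ends
    return table


# Final LOCCTR value of each block = length of that block.
blocks = build_block_table([
    ("(default)", 0x0066),
    ("CDATA", 0x000B),
    ("CBLKS", 0x1000),
])
program_length = sum(entry["length"] for entry in blocks.values())
```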
▪ Pass 2
• Calculate the address for each symbol relative to the start of the object program
by adding the location of the symbol relative to the start of its block, to the
assigned block starting address.
• Eg:
o Consider the instruction LDA LENGTH
o The relative location of LENGTH in CDATA block = 0003
o Starting address for CDATA = 0066
o Therefore, TA = 0003 + 0066 = 0069
o This instruction is to be assembled using PC-relative addressing mode.
o After fetching this instruction, PC = 0009. Since the default block starts at
location 0000, this address = 0000 + 0009 = 0009
o Displacement = TA-PC = 0069 – 0009 = 0060
o Therefore, the object code is 032060
• The first 2 Text records are generated from lines 5 through 70 (first segment of the
default block).
• No new Text record is created for lines 95 through 105, because these lines do not
generate any object code. The next 2 Text records come from lines 125 through
180 (second segment of the default block). The fifth Text record is for the second
segment of the CDATA block, and so on.
• The loader loads the default block in the memory from location 0000 to 0065.
CDATA will occupy locations from 0066 through 0070. CBLKS will occupy
locations 0071 through 1070.
• CDATA(1) and CBLKS(1), the first segments of these blocks, are not present in the
object program. Storage is automatically reserved for these areas when the
program is loaded.
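The LDA LENGTH calculation worked out above can be checked with a few lines of Python. This is a sketch only: the function name `pc_relative_object_code` is hypothetical, and it assumes the standard SIC/XE format 3 layout (6-bit opcode, flags n i x b p e, 12-bit displacement) with simple addressing (n = 1, i = 1) and no indexing.

```python
def pc_relative_object_code(opcode: int, target: int, pc: int) -> str:
    """Assemble a format 3 instruction using PC-relative addressing."""
    disp = target - pc
    if not -2048 <= disp <= 2047:
        raise ValueError("displacement does not fit PC-relative addressing")
    first_byte = opcode | 0b11      # n = 1, i = 1
    flags = 0b0010                  # x = 0, b = 0, p = 1, e = 0
    word = (first_byte << 16) | (flags << 12) | (disp & 0xFFF)
    return f"{word:06X}"


cdata_start = 0x0066                # starting address of the CDATA block
length_offset = 0x0003              # LENGTH relative to the start of CDATA
target = cdata_start + length_offset
code = pc_relative_object_code(0x00, target, pc=0x0009)   # LDA has opcode 00
```

Here `target` is 0069 and `code` is the same 032060 obtained by hand.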
▪ Benefits of Program Blocks
• The large buffer area is moved to the end of the object program, so the
extended instruction format is no longer needed to reach the other data areas.
• Program readability improves when the definitions of data areas are placed in the
source program close to the statements that reference them.
Pass 1 Algorithm
Begin
    Insert (Default, 0) into block table
    i = 0
    next block number = 1
    LOCCTR[i] = 0 for all i
    Read the first input line
    If OPCODE = ‘START’ then
    {
        Write line into intermediate file
        Read next input line
    }
    While OPCODE != ‘END’ do
    {
        If OPCODE = ‘USE’ then
        {
            If there is no operand name then block name = Default
            Else block name = OPERAND name
            If there is no entry for block name in block table then
                Insert (block name, next block number++) into block table
            i = block number for block name
        }
        Else if this is not a comment line then
        {
            If there is a symbol in the LABEL field then
            {
                Search SYMTAB for LABEL
                If found then Set error flag
                Else Insert (LABEL, LOCCTR[i], i) into SYMTAB
            }
            Search OPTAB for OPCODE
            If found then LOCCTR[i] = LOCCTR[i] + 3
            Else if OPCODE = ‘WORD’ then LOCCTR[i] = LOCCTR[i] + 3
            Else if OPCODE = ‘RESW’ then LOCCTR[i] = LOCCTR[i] + 3 * #OPERAND
            Else if OPCODE = ‘RESB’ then LOCCTR[i] = LOCCTR[i] + #OPERAND
            Else if OPCODE = ‘BYTE’ then LOCCTR[i] = LOCCTR[i] + length of the constant
            Else Set error flag
        }
        Write line into intermediate file
        Read next input line
    }
    LENGTH[i] = LOCCTR[i] for all i
    Address[0] = starting address
    Address[i] = Address[i-1] + LENGTH[i-1] for all i = 1 to max(block number)
    Insert (Address[i], LENGTH[i]) into block table for all i
End
Pass 2 Algorithm
o Control sections
▪ Program blocks vs. Control sections
• Program blocks: Segments of code that are rearranged within a single object
program unit
• Control sections: Segments of code that are translated into independent object
program units
▪ These are most often used for subroutines or other logical subdivisions of a program
▪ The programmer can assemble, load, and manipulate each of these control sections
separately
▪ Assembler directive: CSECT
• Syntax: secname CSECT
▪ Separate location counter for each control section. Initial value of the location
counter is 0.
▪ Instructions in one control section may need to refer to instructions or data located in
another section. The assembler does not know where any other control section will be
located at execution time.
▪ It is necessary to provide some means of linking them together. For this purpose we
can use the following 2 assembler directives
• External definition EXTDEF symbol1,symbol2, . . . . . ,symboln
o Names symbols that are defined in this control section and may be used by
other sections
o Ex: EXTDEF BUFFER, BUFEND, LENGTH
• External reference EXTREF symbol1,symbol2, . . . . . ,symboln
o Names symbols that are used in this control section and are defined elsewhere
o Ex: EXTREF RDREC, WRREC
• Both terms in each pair of an expression must be within the same control
section
o Legal: BUFEND-BUFFER
o Illegal: RDREC-COPY
▪ The assembler must include information in the object program that will cause the
loader to insert proper values where they are required. The Define record, Refer record
and Modification record are used for this purpose.
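The three record types can be illustrated with a short Python sketch. The layouts used here are assumptions based on the usual SIC/XE object program format (a 6-column name followed by a 6-digit hexadecimal address in the Define record, 6-column names in the Refer record, and address, length in half-bytes, sign and symbol in the Modification record). The function names are hypothetical and the addresses are illustrative values, not taken from an actual listing.

```python
from typing import Dict, List


def define_record(symbols: Dict[str, int]) -> str:
    """D record: each EXTDEF symbol with its address in this control section."""
    return "D" + "".join(f"{name:<6}{addr:06X}" for name, addr in symbols.items())


def refer_record(names: List[str]) -> str:
    """R record: the EXTREF symbols used by this control section."""
    return "R" + "".join(f"{name:<6}" for name in names)


def modification_record(address: int, half_bytes: int,
                        sign: str, symbol: str) -> str:
    """M record: add or subtract an external symbol at the given field."""
    return f"M{address:06X}{half_bytes:02X}{sign}{symbol}"


# Illustrative addresses only.
d_rec = define_record({"BUFFER": 0x0033, "BUFEND": 0x1033, "LENGTH": 0x002D})
r_rec = refer_record(["RDREC", "WRREC"])
m_rec = modification_record(0x000004, 5, "+", "RDREC")
```

The Modification record tells the loader to add the address of RDREC to the 5 half-byte field starting at the given address once the control sections have been placed in memory.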
▪ In the main routine, RDREC (line 15), WRREC (lines 35 and 65) and
ENDFILL (line 30) are forward references.
▪ After scanning line 40 of the above program
• Some of the forward references have been resolved by this time, while others
have been added.
• When the symbol ENDFILL was defined (line 45), the assembler places 2024 in
the SYMTAB entry.
• It then inserts 2024 at location 201C and deletes the linked list of forward
references for ENDFILL.
• Similar operations are performed for all forward references.
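The handling of forward references in a one-pass assembler, as described above for ENDFILL, can be sketched as follows. The names `reference`, `define`, `pending` and `memory` are hypothetical, and memory is modeled simply as a dictionary from operand location to address value. The list of waiting locations plays the role of the linked list attached to the SYMTAB entry.

```python
from typing import Dict, List, Optional

memory: Dict[int, int] = {}             # operand location -> address value
symtab: Dict[str, Optional[int]] = {}   # None marks an undefined symbol
pending: Dict[str, List[int]] = {}      # symbol -> locations waiting for it


def reference(symbol: str, operand_location: int) -> None:
    """Use a symbol as an operand; remember the location if it is undefined."""
    value = symtab.get(symbol)
    if value is None:
        symtab[symbol] = None
        pending.setdefault(symbol, []).append(operand_location)
        memory[operand_location] = 0    # placeholder until the symbol is defined
    else:
        memory[operand_location] = value


def define(symbol: str, value: int) -> None:
    """Define a symbol and patch every location that was waiting for it."""
    symtab[symbol] = value
    for location in pending.pop(symbol, []):   # the list is deleted afterwards
        memory[location] = value


reference("ENDFILL", 0x201C)   # forward reference: operand field at 201C
define("ENDFILL", 0x2024)      # definition reached: 2024 is patched into 201C
```

At the end of the program, any symbol still present in `pending` corresponds to an undefined symbol and would be reported as an error.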
LOCCTR = LOCCTR +3
}
Else if OPCODE = ‘WORD’ then
{
Object code = #OPERAND
load this object code in memory location LOCCTR
LOCCTR = LOCCTR +3
}
Else if OPCODE = ‘RESW’ then
LOCCTR = LOCCTR + 3 * #OPERAND
Else if OPCODE = ‘RESB’ then
LOCCTR = LOCCTR + #OPERAND
Else if OPCODE = ‘BYTE’ then
{
Convert constant to object code and load it in memory location LOCCTR
LOCCTR = LOCCTR +length of the constant
}
Else
Set error flag
}
Read the next input line
}
If any SYMTAB entries still indicate undefined symbols
Report the error
Else
Jump to the location specified in END statement.
End
}
Else
{ Insert (symbol name, null) into SYMTAB
Create a linked list with address as LOCCTR+1
}
Generate object code
LOCCTR = LOCCTR +3
}
Else if OPCODE = ‘WORD’ then
{
LOCCTR = LOCCTR +3
Object code = #OPERAND
}
Else if OPCODE = ‘RESW’ then
LOCCTR = LOCCTR + 3 * #OPERAND
Else if OPCODE = ‘RESB’ then
LOCCTR = LOCCTR + #OPERAND
Else if OPCODE = ‘BYTE’ then
{
LOCCTR = LOCCTR +length of the constant
Convert constant to object code
}
Else
Set error flag
If object code will not fit into the current text record then
{
Write Text Record into object program
Initialize new Text Record
}
Add object code to Text Record
}
Read the next input line
}
Write last Text Record to object program
Write End Record to object program
End
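The Text record step in the algorithm above ("If object code will not fit into the current text record") can be sketched in Python. This is a sketch under assumptions: the function name `pack_text_records` is hypothetical, a Text record is assumed to hold at most 30 bytes (hex 1E) of object code with the layout T, 6-digit start address, 2-digit length, object code, and the sample object codes are illustrative. The sketch also starts a new record when addresses are not contiguous, for example after RESW or RESB.

```python
from typing import List, Optional, Tuple

MAX_BYTES = 30   # assumed maximum object code per Text record (hex 1E)


def pack_text_records(items: List[Tuple[int, str]]) -> List[str]:
    """Pack (address, object code in hex) pairs into Text records."""
    records: List[str] = []
    start: Optional[int] = None
    body = ""
    for address, code in items:
        if start is not None:
            contiguous = address == start + len(body) // 2
            fits = len(body) + len(code) <= MAX_BYTES * 2
            if not (contiguous and fits):
                # Write the current Text record and initialize a new one.
                records.append(f"T{start:06X}{len(body) // 2:02X}{body}")
                start, body = None, ""
        if start is None:
            start = address
        body += code
    if start is not None:
        # Write the last Text record.
        records.append(f"T{start:06X}{len(body) // 2:02X}{body}")
    return records


# Illustrative object codes; the gap before 1000 stands for reserved storage.
records = pack_text_records([(0x0000, "172063"), (0x0003, "4B2021"),
                             (0x1000, "032060")])
```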
o Multipass Assembler
▪ The symbols used on the RHS of EQU should be defined previously in the program.
▪ Eg:
▪ Forward references tend to create difficulty for a person reading the program.
▪ The general solution for forward references is a multi-pass assembler that can make
as many passes as are needed to process the definitions of symbols.
▪ It is not necessary for such an assembler to make more than 2 passes over the entire
program.
▪ The portions of the program that involve forward references in symbol definition are
saved during Pass 1.
▪ Additional passes through these stored definitions are made as the assembly
progresses.
▪ This process is followed by a normal Pass 2.
▪ Implementation
• For a forward reference in symbol definition, we store in the SYMTAB:
o The symbol name
o The defining expression
o The number of undefined symbols in the defining expression
• Each undefined symbol (marked with a flag *) has an associated list of the symbols
whose values depend on it.
• When a symbol is defined, we can recursively evaluate the symbol expressions
that depend on the newly defined symbol.
▪ Example
1 A EQU B/2
2 B EQU C-D
3 E EQU D-1
4 D RESB 4096
5 C EQU *
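The processing of these five statements can be sketched in Python. This is only a sketch of the idea: the names `equ`, `set_value`, `waiting` and `dependents` are hypothetical, expressions are evaluated strictly left to right with integer division, and the location counter is assumed to be 1034 (hex) when statement 4 is reached, a value chosen only for illustration.

```python
import re
from typing import Dict, List, Set, Tuple

values: Dict[str, int] = {}                     # symbols with known values
waiting: Dict[str, Tuple[str, Set[str]]] = {}   # symbol -> (expression, missing)
dependents: Dict[str, List[str]] = {}           # undefined symbol -> dependents


def evaluate(expression: str) -> int:
    """Evaluate 'term op term ...' left to right; every symbol must be known."""
    tokens = re.findall(r"[A-Z][A-Z0-9]*|\d+|[-+*/]", expression)

    def term(token: str) -> int:
        return int(token) if token.isdigit() else values[token]

    result = term(tokens[0])
    for op, token in zip(tokens[1::2], tokens[2::2]):
        rhs = term(token)
        if op == "+":
            result += rhs
        elif op == "-":
            result -= rhs
        elif op == "*":
            result *= rhs
        else:
            result //= rhs
    return result


def set_value(symbol: str, value: int) -> None:
    """Define a symbol, then recursively resolve the symbols that depend on it."""
    values[symbol] = value
    for name in dependents.pop(symbol, []):
        expression, missing = waiting[name]
        missing.discard(symbol)
        if not missing:                 # no undefined symbols remain
            del waiting[name]
            set_value(name, evaluate(expression))


def equ(symbol: str, expression: str) -> None:
    """Process 'symbol EQU expression', deferring it if it has forward references."""
    names = set(re.findall(r"[A-Z][A-Z0-9]*", expression))
    missing = {n for n in names if n not in values}
    if not missing:
        set_value(symbol, evaluate(expression))
        return
    waiting[symbol] = (expression, missing)
    for name in missing:
        dependents.setdefault(name, []).append(symbol)


locctr = 0x1034          # assumed LOCCTR at statement 4
equ("A", "B/2")          # statement 1: B undefined, A waits on B
equ("B", "C-D")          # statement 2: B waits on C and D
equ("E", "D-1")          # statement 3: E waits on D
set_value("D", locctr)   # statement 4: D RESB 4096 defines D, which resolves E
locctr += 4096
set_value("C", locctr)   # statement 5: C EQU * resolves B, and then A
```

Under the assumed location counter, all five symbols are resolved once statement 5 has been processed, without a second pass over the whole program.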
• After executing statement 1, the SYMTAB will become
10. Develop the records (excluding header, text and end records) for the following control
section named COPY