Module 3
Assembler Features and Design Options
Machine Dependent Assembler Features
1. Instruction format and Addressing modes
● PC-relative or Base-relative addressing: op m
● Indirect addressing: op @m
● Immediate addressing: op #c
● Extended format: +op m
● Index addressing: op m,x
● register-to-register instructions
2. Program Relocation
Moving a program from one memory location to another is known as program
relocation. The only parts of the program that require modification at load
time are those that specify direct (absolute) addresses. The rest of the
instructions need not be modified, because their operands are either not memory
addresses or are assembled using PC-relative or base-relative addressing.
The diagram given below is an example of program relocation.
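The load-time modification described above can be sketched in Python. This is a minimal illustration, not a real loader: the function name `relocate`, the (offset, length in half-bytes) pairs and the load address 5000 (hex) are assumptions chosen for the example.

```python
def relocate(image: bytes, address_fields, load_address: int) -> bytearray:
    """Add load_address to every address field of a program image.

    address_fields holds (start_offset, length_in_half_bytes) pairs,
    with offsets counted from the beginning of the program.
    """
    out = bytearray(image)
    for start, half_bytes in address_fields:
        nbytes = (half_bytes + 1) // 2
        field = int.from_bytes(out[start:start + nbytes], "big")
        mask = (1 << (4 * half_bytes)) - 1          # low-order half-bytes only
        address = ((field & mask) + load_address) & mask
        field = (field & ~mask) | address           # keep the flag bits above the address
        out[start:start + nbytes] = field.to_bytes(nbytes, "big")
    return out

# A format 4 instruction (+JSUB) whose 20-bit address field holds 01036.
program = bytes.fromhex("4B101036")
relocated = relocate(program, [(1, 5)], 0x5000)
print(relocated.hex().upper())
```

Only the 20-bit address field changes; the opcode and flag bits are left untouched, and instructions without a direct address would not appear in `address_fields` at all.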
Machine-Independent Assembler Features
Machine-independent assembler features are common assembler features
that are not closely related to the machine architecture. The main such features are:
o Literals
o Symbol Defining Statement
o Expressions
o Program Blocks
o Control Sections and Program Linking
1. Literals
A literal is an operand whose value is stated literally in the instruction
that uses it.
In assembler language, a literal is identified by the prefix =, which is
followed by a specification of the literal value, written in the same way as
in a BYTE statement.
A literal pool is a collection of literal operands gathered together in one
place in the object program.
Design idea
Let programmers write the value of a constant operand as part
of the instruction that uses it. This avoids having to define the constant
elsewhere in the program and make up a label for it.
Literals vs. Immediate Operands
• Immediate addressing
– The operand value is assembled as part of the machine instruction.
e.g. 55 0020 LDA #3 010003
• Literal
– The assembler generates the specified value as a constant at some
other memory location.
– The address of this generated constant is used as the target address for
the machine instruction.
– The effect of using a literal is exactly the same as if the programmer
had defined the constant explicitly and used the label assigned to the
constant as the instruction operand.
e.g. 45 001A ENDFIL LDA =C’EOF’ 032010
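The difference between the two object codes above can be reproduced with a small Python sketch of SIC/XE format 3 encoding. The helper name `format3` is hypothetical, and the literal address 002D is an assumption chosen to be consistent with the object code 032010 shown above.

```python
def format3(opcode: int, n: int, i: int, x: int, b: int, p: int, disp: int) -> str:
    """Assemble a SIC/XE format 3 instruction (e = 0) as six hex digits."""
    word = ((opcode & 0xFC) | (n << 1) | i) << 16    # opcode with the n and i bits
    word |= (x << 15) | (b << 14) | (p << 13)        # x, b, p flags (e stays 0)
    word |= disp & 0xFFF                             # 12-bit displacement
    return f"{word:06X}"

LDA = 0x00

# Immediate operand: the value 3 itself is stored in the displacement field.
immediate = format3(LDA, n=0, i=1, x=0, b=0, p=0, disp=3)

# Literal operand: the instruction at 001A addresses the generated constant
# (assumed to be at 002D) relative to the program counter, which is 001D.
pc = 0x001A + 3
literal = format3(LDA, n=1, i=1, x=0, b=0, p=1, disp=0x002D - pc)

print(immediate, literal)
```

With immediate addressing the instruction carries the value; with a literal it carries a displacement to the place where the assembler stored the value.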
Literal Pool
• All of the literal operands used in the program are gathered together into one or
more literal pools. Normally, literals are placed into a pool at the end of the
program. Sometimes it is desirable to place literals into a pool at some other
location in the object program.
– The LTORG directive is introduced for this purpose.
– When the assembler encounters an LTORG, it creates a pool that
contains all of the literals used since the previous LTORG.
If we do not use an LTORG on line 93, the literal =C’EOF’ will be placed at
the end of the program. This operand will then begin at address 1073 and be
too far away from the instruction that uses it, so PC-relative addressing
cannot be used. LTORG is thus used when we want to keep literal
operands close to the instructions that use them.
Duplicate literals are literals with the same value used in more than one
place in the program. For duplicate literals, we should store only one copy of
the specified data value to save space. Most assemblers can recognize duplicate
literals.
– E.g., there are two uses of =X’05’, on lines 215 and 230 respectively.
– However, just one copy is generated, at address 1076.
A literal table LITTAB is needed for literal processing. For each literal used,
the table contains:
– The literal name.
– The operand value and length.
– The address assigned to the operand when it is placed in a literal pool.
LITTAB is often organized as a hash table, using the literal name or value as
the key.
Implementation of Literals
Pass 1
– Build LITTAB with the literal name, operand value and length, leaving the
address unassigned.
– When an LTORG or END statement is encountered, assign an address to each
literal that has not yet been assigned one.
– Update the location counter to reflect the number of bytes occupied by
each literal.
Pass 2
– Search LITTAB for each literal operand encountered.
– Generate the data values in the literal pool as if they had been defined
by BYTE or WORD statements.
– Generate a Modification record for any literal that represents an address
in the program.
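The Pass 1 handling of LITTAB can be sketched as follows. This is an illustrative Python model, not an actual assembler: the class `LitTab`, its method names and the pool addresses 002D and 1076 (hex) are assumptions, and duplicates are recognized by literal name only.

```python
def literal_value(name: str) -> bytes:
    """Convert =C'...' or =X'...' into the bytes it stands for."""
    kind, body = name[1], name[3:-1]
    return body.encode("ascii") if kind == "C" else bytes.fromhex(body)

class LitTab:
    def __init__(self):
        # literal name -> value, length, and address (None until pooled)
        self.entries = {}

    def add(self, name: str) -> None:
        # A duplicate literal reuses the existing entry, so only one copy is stored.
        if name not in self.entries:
            value = literal_value(name)
            self.entries[name] = {"value": value, "length": len(value), "address": None}

    def place_pool(self, locctr: int) -> int:
        """At LTORG or END: assign addresses to pending literals, return the new LOCCTR."""
        for entry in self.entries.values():
            if entry["address"] is None:
                entry["address"] = locctr
                locctr += entry["length"]
        return locctr

littab = LitTab()
littab.add("=C'EOF'")
locctr = littab.place_pool(0x002D)   # LTORG encountered
littab.add("=X'05'")
littab.add("=X'05'")                 # second use, no new entry
locctr = littab.place_pool(0x1076)   # END encountered
```

In Pass 2 the assembler would look up each literal operand in `entries` and use its `address` as the target address of the instruction.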
2. Symbol Defining Statements
The EQU directive is used to define a symbol’s value. The value assigned to a
symbol may be a constant, or any expression involving constants and
previously defined symbols.
E.g., +LDT #4096 can be changed to:
MAXLEN EQU 4096
+LDT #MAXLEN
• When the assembler encounters the EQU statement, it enters MAXLEN into
SYMTAB (with value 4096). During assembly of the LDT instruction, the
assembler searches SYMTAB for the symbol MAXLEN, using its value as
the operand in the instruction.
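EQU can also be used to define mnemonic names for registers (e.g., A EQU 0
and X EQU 1), so that an instruction can be written with names instead of
register numbers.
ex: RMO A,X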
ORG Directive
The ORG directive can be used to assign values to symbols indirectly. When this
statement is encountered during assembly of a program, the assembler resets its
location counter to the specified value. The ORG statement thus affects the
values of all labels defined until the next ORG. When an ORG with no
specified value is encountered, the previously saved location counter value is
restored.
Suppose that we have the following data structure and want to access its fields:
SYMBOL: 6 bytes
VALUE: 3 bytes (one word)
FLAGS: 2 bytes
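We want to refer to every field of each entry.
If EQU statements are used, each field name is defined as the starting address
of the table plus the field’s offset within an entry (0, 6 and 9 bytes for
SYMBOL, VALUE and FLAGS).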
For EQU and ORG, all symbols used on the right-hand side of the statement must
have been defined previously in the program. This is because, in a two-pass
assembler, all symbols must be defined during pass 1.
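The two ways of naming the fields SYMBOL, VALUE and FLAGS can be compared with a small Python model of Pass 1. It is only a sketch under stated assumptions: the table name STAB, its size of 1100 bytes (100 entries of 11 bytes) and the function names are hypothetical, and operands are limited to sums of symbols and decimal constants.

```python
def evaluate(operand: str, symtab: dict) -> int:
    """Evaluate 'TERM' or 'TERM+TERM'; every symbol must already be in SYMTAB."""
    return sum(int(t) if t.isdigit() else symtab[t] for t in operand.split("+"))

def pass1(statements, start=0) -> dict:
    symtab, locctr = {}, start
    for label, opcode, operand in statements:
        if opcode == "EQU":                 # symbol gets the value of the expression
            symtab[label] = evaluate(operand, symtab)
            continue
        if opcode == "ORG":                 # reset the location counter
            locctr = evaluate(operand, symtab)
            continue
        if label:
            symtab[label] = locctr
        if opcode == "RESB":
            locctr += int(operand)
        elif opcode == "RESW":
            locctr += 3 * int(operand)
    return symtab

with_equ = pass1([
    ("STAB", "RESB", "1100"),
    ("SYMBOL", "EQU", "STAB"),
    ("VALUE", "EQU", "STAB+6"),
    ("FLAGS", "EQU", "STAB+9"),
])
with_org = pass1([
    ("STAB", "RESB", "1100"),
    ("", "ORG", "STAB"),
    ("SYMBOL", "RESB", "6"),
    ("VALUE", "RESW", "1"),
    ("FLAGS", "RESB", "2"),
    ("", "ORG", "STAB+1100"),
])
```

Both versions give the same symbol values. A statement such as `("A", "EQU", "B")` placed before the definition of B fails with a `KeyError`, which mirrors the rule that the right-hand side may use only previously defined symbols.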
3. Expression
The assembler evaluates the expressions and produces a single operand address
or value. Expressions consist of
Operator
● +,-,*,/ (division is usually defined to produce an integer result)
Individual terms
▪ Constants
▪ User-defined symbols
▪ Special terms, e.g., *, the current value of LOCCTR
Regarding program relocation, a symbol’s value can be classified as
Relative
Its value is relative to the beginning of the object program, and
thus its value depends on the program location.
E.g., labels or reference to location counter (*)
Absolute
Its value is independent of program location
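E.g., a constant
An expression is absolute if its relative terms occur in pairs with opposite
signs (e.g., BUFEND-BUFFER), and relative if exactly one relative term with a
positive sign is left unpaired. None of the relative terms may enter into a
multiplication or division operation.
Errors:
o BUFEND+BUFFER
o 100-BUFFER
o 3*BUFFER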
To determine the type of an expression, the assembler must keep track of the
types of all symbols defined in the program. Therefore, we need a flag in the
symbol table to indicate the type of value (absolute or relative) in addition to
the value itself.
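The type check can be sketched in Python using the flag stored with each symbol. This is a simplified illustration: the function name `expression_type` and the symbol MAXLEN are assumptions, parentheses are not supported, and the special term * (current LOCCTR) is not handled.

```python
import re

def expression_type(expr: str, symtab: dict) -> str:
    """Return 'absolute' or 'relative'; raise ValueError for an illegal expression.

    symtab maps each symbol to its flag: 'R' (relative) or 'A' (absolute).
    Numeric constants are absolute.
    """
    tokens = re.findall(r"[A-Za-z]\w*|\d+|[-+*/]", expr)
    net_relative, sign = 0, 1
    for pos, tok in enumerate(tokens):
        if tok == "+":
            sign = 1
        elif tok == "-":
            sign = -1
        elif tok in ("*", "/") or tok.isdigit() or symtab[tok] == "A":
            continue
        else:  # a relative symbol
            neighbours = tokens[max(pos - 1, 0):pos + 2]
            if "*" in neighbours or "/" in neighbours:
                raise ValueError(f"relative term {tok} in multiplication or division")
            net_relative += sign
    if net_relative == 0:
        return "absolute"       # relative terms cancel in pairs
    if net_relative == 1:
        return "relative"       # one positive relative term is left over
    raise ValueError("relative terms are not properly paired")

symtab = {"BUFFER": "R", "BUFEND": "R", "MAXLEN": "A"}
kind = expression_type("BUFEND-BUFFER", symtab)
```

The three erroneous expressions listed above all raise `ValueError` here: two of them leave a net count of relative terms other than 0 or 1, and the third multiplies a relative term.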
4. Program blocks
Program blocks refer to segments of code that are rearranged within a single
object program unit. The assembler directive USE indicates which portions of the
source program belong to which block.
o USE [blockname]
At the beginning, statements are assumed to be part of the unnamed (default)
block. If no USE statements are included, the entire program belongs to this single
block. Each program block may actually contain several separate segments of the
source program.
Three blocks are used
❖default: executable instructions
❖CDATA: all data areas that are a few words or less in length
❖CBLKS: all data areas that consist of larger blocks of memory
The assembler rearranges these segments to gather together the pieces of each
block, places the blocks in the object program in a particular order, and
assigns addresses. In this way the large buffer area is moved to the end of the
object program, while the source program stays readable because data areas can
be placed close to the statements that reference them.
It is not necessary to physically rearrange the generated code in the object
program. The assembler simply inserts the proper load address in each Text
record. The loader will then load the code into the correct places.
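How block-relative addresses become addresses in the object program can be sketched in Python. The block lengths and the position of the symbol LENGTH below are illustrative assumptions, as are the function names.

```python
def assign_block_starts(block_lengths: dict) -> dict:
    """Give each block a starting address, in the order the blocks are listed."""
    starts, address = {}, 0
    for name, length in block_lengths.items():
        starts[name] = address
        address += length
    return starts

def absolute_address(symbol: str, symtab: dict, starts: dict) -> int:
    """A symbol is stored as (block, offset within block) during Pass 1."""
    block, offset = symtab[symbol]
    return starts[block] + offset

# Lengths collected at the end of Pass 1 (hypothetical values, in hex).
block_lengths = {"default": 0x66, "CDATA": 0x0B, "CBLKS": 0x1000}
starts = assign_block_starts(block_lengths)

symtab = {"LENGTH": ("CDATA", 0x0003)}
length_address = absolute_address("LENGTH", symtab, starts)
```

Each block keeps its own location counter during Pass 1; only after all block lengths are known can the starting addresses, and therefore the final symbol addresses, be computed.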
5. Control sections
A control section is a part of the program that keeps its identity after
assembly; each control section can be loaded and relocated independently of the
others. Control sections are most often used for subroutines or other logical
subdivisions of a program. The programmer can assemble, load, and manipulate
each of these control sections separately. Because of this, there should be
some means for linking control sections together. The assembler directive is
CSECT:
secname CSECT
secname is the control section name. Instructions in one control section may need to
refer to instructions or data located in another section.
1. External definition
EXTDEF name [, name]
EXTDEF names symbols that are defined in this control section and
may be used by other sections
Ex: EXTDEF BUFFER, BUFEND, LENGTH
2. External reference
EXTREF name [,name]
EXTREF names symbols that are used in this control section and are
defined elsewhere
Ex: EXTREF RDREC, WRREC
The assembler must include information in the object program that will
cause the loader to insert proper values where they are required. The
assembler communicates with the loader through 3 types of records.
1. Define record
Col. 1 D
Col. 2-7 Name of external symbol defined in this control section
Col. 8-13 Relative address within this control section (hexadecimal)
Col. 14-73 Repeat information in Col. 2-13 for other external symbols
2. Refer record
Col. 1 R
Col. 2-7 Name of external symbol referred to in this control section
Col. 8-73 Names of other external reference symbols
3. Modification record
Col. 1 M
Col. 2-7 Starting address of the field to be modified (hexadecimal)
Col. 8-9 Length of the field to be modified, in half-bytes
(hexadecimal)
Col. 10 Modification flag (+ or -)
Col. 11-16 External symbol whose value is to be added to or
subtracted from the indicated field
Note: control section name is automatically an external symbol, i.e. it is
available for use in Modification records.
Example
M00000405+RDREC
M00000705+COPY
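The record layouts can be checked with a short Python sketch that builds each record as a string. The function names are hypothetical, and the relative addresses given to BUFFER, BUFEND and LENGTH are illustrative assumptions.

```python
def define_record(symbols) -> str:
    """symbols: list of (name, relative_address) pairs defined in this section."""
    return "D" + "".join(f"{name:<6}{address:06X}" for name, address in symbols)

def refer_record(names) -> str:
    """names: external symbols referred to in this section, 6 columns each."""
    return "R" + "".join(f"{name:<6}" for name in names)

def modification_record(start: int, half_bytes: int, flag: str, symbol: str) -> str:
    """flag is '+' or '-'; symbol is the external symbol to add or subtract."""
    return f"M{start:06X}{half_bytes:02X}{flag}{symbol}"

d_rec = define_record([("BUFFER", 0x33), ("BUFEND", 0x1033), ("LENGTH", 0x2D)])
r_rec = refer_record(["RDREC", "WRREC"])
m_rec = modification_record(0x000004, 5, "+", "RDREC")
print(d_rec)
print(r_rec)
print(m_rec)
```

The last call reproduces the first example record above: a 5 half-byte field starting at relative address 000004, to which the loader adds the value of RDREC.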
Assembler Design Options
1. One-pass assemblers
2. Multi-pass assemblers
1. One-pass assembler
One-pass assemblers are used when
a. it is necessary or desirable to avoid a second pass over the source
program
b. the external storage for the intermediate file between two passes
is slow or is inconvenient to use
Main problem: forward references to both data and instructions.
• One simple way to eliminate this problem for data items: require that all
data areas be defined before they are referenced, that is, place all storage
reservation statements at the start of the program. Forward references to
labels on instructions cannot be eliminated as easily.
Types of one-pass assembler
● Type 1: Load-and-go
‒ Produces object code directly in memory for immediate execution
● Type 2:
‒ Produces usual kind of object code for later execution
Type 1: Load-and-go
A load-and-go assembler generates its object code in memory for immediate
execution. For a load-and-go assembler, the actual address must be known at
assembly time. No object program is written out, and no loader is needed.
Assembler operations:
– Omits the operand address if the symbol has not yet been defined
– Enters this undefined symbol into SYMTAB and indicates that it is
undefined
– Adds the address of the operand field to a list of forward references
associated with the SYMTAB entry
– Scans the reference list and inserts the address when the definition for
the symbol is encountered
– Reports an error if there are still SYMTAB entries marked as undefined
symbols at the end of the program
– Searches SYMTAB for the symbol named in the END statement and
jumps to this location to begin execution if there is no error
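The forward-reference handling listed above can be sketched in Python. This is a simplified model: the class name `OnePassAssembler`, the dictionary used as memory and all addresses are assumptions made for illustration.

```python
class OnePassAssembler:
    def __init__(self):
        self.memory = {}   # address of an operand field -> operand address stored there
        self.symtab = {}   # symbol -> {"value": int or None, "refs": [operand field addresses]}

    def reference(self, symbol: str, field_address: int) -> None:
        """An instruction at field_address uses symbol as its operand."""
        entry = self.symtab.setdefault(symbol, {"value": None, "refs": []})
        if entry["value"] is None:
            self.memory[field_address] = 0          # operand address omitted for now
            entry["refs"].append(field_address)     # remember where to patch
        else:
            self.memory[field_address] = entry["value"]

    def define(self, symbol: str, value: int) -> None:
        """The definition of symbol is encountered: patch every waiting reference."""
        entry = self.symtab.setdefault(symbol, {"value": None, "refs": []})
        entry["value"] = value
        for field_address in entry["refs"]:
            self.memory[field_address] = value
        entry["refs"].clear()

    def undefined_symbols(self) -> list:
        """Symbols still undefined; reported as errors at the end of the program."""
        return sorted(s for s, e in self.symtab.items() if e["value"] is None)

asm = OnePassAssembler()
asm.reference("RDREC", 0x2013)     # forward reference
asm.reference("ENDFIL", 0x201C)    # forward reference
asm.define("ENDFIL", 0x2024)       # patches the field at 201C
still_undefined = asm.undefined_symbols()
```

At this point RDREC is still undefined, so its operand field holds 0 until a later `define("RDREC", ...)` call fills it in.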
Type 2: Assembler
This type of one-pass assembler produces an object program as output, for later
execution.
Assembler operations:
– Instructions that contain forward references are written into the object
file as Text records, even though their operand addresses are not yet correct.
If the operand contains an undefined symbol, 0 is used as the address.
– Forward references are entered into lists, as in the load-and-go assembler.
– When the definition of a symbol is encountered, the assembler generates
another Text record with the correct operand address for each entry in the
reference list.
– When the program is loaded, the incorrect address 0 is overwritten by the
later Text record.
2. Multi-pass assemblers
A multi-pass assembler can make as many passes as are needed to process the
definitions of symbols. Only the portions of the program that involve forward
references in symbol definitions are saved for multi-pass reading.
For a two-pass assembler, forward references in symbol definitions are not allowed:
ALPHA EQU BETA
BETA EQU DELTA
DELTA RESW 1
Reason: symbol definition must be completed in pass 1.
Motivation for using a multi-pass assembler
– DELTA can be defined in pass 1
– BETA can be defined in pass 2
– ALPHA can be defined in pass 3
A symbol table is used
– to store symbol definitions that involve forward references
– to indicate which symbols are dependent on the values of others
– to facilitate symbol evaluation
For a forward reference in symbol definition, we store in the SYMTAB:
– the symbol name
– the defining expression
– the number of undefined symbols in the defining expression
– for each undefined symbol (marked with a flag *), a list of the
symbols that depend on it.
When a symbol is defined, we can recursively evaluate the symbol
expressions that depend on the newly defined symbol.
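The bookkeeping for forward references in symbol definitions can be sketched in Python using the ALPHA, BETA, DELTA example. This is a minimal model: the function names, the three dictionaries and the address 1000 (hex) given to DELTA are assumptions, and defining expressions are limited to sums of symbols and integers.

```python
def evaluate(terms, symtab) -> int:
    """Sum the terms; each term is an int or the name of a defined symbol."""
    return sum(t if isinstance(t, int) else symtab[t] for t in terms)

def define(symbol, value, symtab, pending, dependents) -> None:
    """Record the value, then recursively evaluate the symbols waiting on it."""
    symtab[symbol] = value
    for waiter in dependents.pop(symbol, []):
        pending[waiter]["missing"] -= 1
        if pending[waiter]["missing"] == 0:
            terms = pending.pop(waiter)["terms"]
            define(waiter, evaluate(terms, symtab), symtab, pending, dependents)

def add_equ(symbol, terms, symtab, pending, dependents) -> None:
    """Handle 'symbol EQU term+term+...'; store it if some term is undefined."""
    unknown = {t for t in terms if not isinstance(t, int) and t not in symtab}
    if not unknown:
        define(symbol, evaluate(terms, symtab), symtab, pending, dependents)
        return
    pending[symbol] = {"terms": terms, "missing": len(unknown)}
    for name in unknown:
        dependents.setdefault(name, []).append(symbol)

symtab, pending, dependents = {}, {}, {}
add_equ("ALPHA", ["BETA"], symtab, pending, dependents)    # ALPHA EQU BETA
add_equ("BETA", ["DELTA"], symtab, pending, dependents)    # BETA EQU DELTA
define("DELTA", 0x1000, symtab, pending, dependents)       # DELTA RESW 1
```

Here `pending` holds the defining expression and the count of undefined symbols, and `dependents` holds the list attached to each undefined symbol. Defining DELTA triggers the evaluation of BETA, which in turn triggers ALPHA.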
[Figure: symbol table after Pass 1]
Implementation Examples: MASM Assembler
The Microsoft Macro Assembler (MASM) is an assembler for the x86 family of
microprocessors, originally produced by Microsoft for the MS-DOS operating
system. The 8086 family of processors has a set of 16-bit registers: the
general-purpose registers AX, BX, CX and DX, the segment registers SS, CS, DS
and ES, the pointer registers SP and BP, and the index registers DI and SI. A
MASM assembler language program is written as a collection of segments. Each
segment is defined as belonging to a particular class. Common segments are:
CODE Segment
DATA Segment
CONST Segment
STACK Segment
Every program written only in MASM has one main module, where program
execution begins. The main module can contain code, data, or stack segments
defined with all of the simplified segment directives. Any additional modules
should contain only code and data segments. Every module that uses simplified
segments must begin with the .MODEL directive. When you specify a segment in
your program, not only must you tell the CPU that a segment is a data segment,
but you must also tell the assembler where and when that segment is a data (or
code/stack/extra/F/G) segment. The ASSUME directive provides this information
to the assembler.
The assume directive takes the following form:
ASSUME ES:DATASEG2
It tells the assembler to assume that register ES indicates the segment DATASEG2.
ASSUME only tells MASM what the contents of a segment register will be; the
programmer must still provide instructions to load this register when the
program is executed.
ASSUME DS:DATA
This tells the assembler that for any program instruction which refers to the
data segment, it should use the logical segment called DATA.
ENDS-End Segment:
This directive is used with the name of the segment to indicate the end of that
logical segment.
Ex: CODE SEGMENT ; Start of logical segment containing code
CODE ENDS ; End of the segment named CODE
Near jump
A jump to an instruction within the current code segment (the segment currently
pointed to by the CS register), sometimes referred to as an intrasegment jump.
A near jump occupies 2 or 3 bytes.
Short jump
A near jump where the jump range is limited to -128 to +127 bytes from the
current EIP value.
Far jump
A jump to an instruction located in a different segment than the current code
segment, sometimes referred to as an intersegment jump. A far jump occupies 5 bytes.
Consider a jump instruction
JMP TARGET
If the definition of the label TARGET occurs in the program before the JMP
instruction, the assembler can tell whether this is a near jump or a far jump. If
this is a forward reference to TARGET, the assembler does not know how
many bytes to reserve for the instruction. By default, MASM assumes that a
forward jump is a near jump. If the jump address is within 128 bytes of the
current instruction, the programmer can specify the shorter (2-byte) near jump
by writing
JMP SHORT TARGET
If the JMP to TARGET is a far jump and the programmer does not specify FAR
PTR (JMP FAR PTR TARGET), a problem occurs, because the assembler has reserved
too few bytes for the instruction.
Segments in a MASM source program can be written in more than one part. If a
SEGMENT directive specifies the same name as a previously defined segment, it
is considered to be a continuation of that segment.
External references and external definitions in MASM are handled using the
directives EXTRN and PUBLIC. PUBLIC has the same function as EXTDEF in SIC/XE,
and EXTRN has the same function as EXTREF in SIC/XE.
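The choice between the jump forms described in this section can be sketched in Python. This is only an illustration of the size rules for 16-bit code (2 bytes short, 3 bytes near, 5 bytes far); the function name `jump_encoding` and the addresses are assumptions, and it is not how MASM is implemented.

```python
def jump_encoding(jmp_address: int, target_address: int, same_segment: bool = True):
    """Return (kind, size_in_bytes) for a JMP whose target is already known."""
    if not same_segment:
        return "far", 5
    # A short jump stores a signed 8-bit displacement that is measured
    # from the address of the instruction following the 2-byte JMP.
    displacement = target_address - (jmp_address + 2)
    if -128 <= displacement <= 127:
        return "short", 2
    return "near", 3

backward = jump_encoding(0x0100, 0x00F0)                   # target already defined
distant = jump_encoding(0x0100, 0x0300)
other = jump_encoding(0x0100, 0x0010, same_segment=False)
```

For a backward reference the assembler can apply this rule directly. For a forward reference the target address is not yet known, which is why a default (near) has to be assumed unless the programmer writes SHORT or FAR PTR.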