Module 3 - Assemblers

The document provides an overview of assemblers, detailing their basic functions, data structures, and the two-pass assembly process. It explains how assemblers translate assembly language into machine code, manage symbolic operands, and handle forward references. Additionally, it outlines assembler directives, internal data structures like OPTAB and SYMTAB, and the format of assembler output.

Uploaded by

rajeshthoppilin

IMCA4C03

SYSTEM SOFTWARE

Module 3
ASSEMBLERS - Basic assembler functions - A simple SIC assembler - Assembler
algorithm and data structures - Machine dependent assembler features - Instruction
formats and addressing modes - Program relocation - Machine independent assembler
features - Literals - Symbol-defining statements - Expressions - One pass assemblers
and Multi pass assemblers - Implementation example - MASM assembler.

Assemblers
Role of Assembler
Source Program → Assembler → Object Code → Linker → Executable Code → Loader

An assembler is a program that accepts an assembly language program as input and
produces its machine language equivalent along with information for the loader.
E.g. MASM, TASM

Introduction to Assemblers
□ Fundamental functions
■ Translate mnemonic operation codes to their machine language
equivalents
■ Assign machine addresses to symbolic labels used by the
programmer
□ The features and design of an assembler depend on
■ The source language it translates
■ The machine language it produces
□ Machine dependency
■ Different machines have different instruction formats and codes

Basic Assembler Functions
1. Convert mnemonic operation codes to their machine language equivalents.
Ex. STL → 14, JSUB → 48
2. Convert symbolic operands to their equivalent machine addresses.
Ex. CLOOP → 1003
3. Build the equivalent machine instruction in the proper format (format 1, 2, 3 or 4).
4. Convert data constants into their internal machine representation.
Ex. EOF → 454F46
5. Write the object program and the assembly listing.

Assembler Directives
□ Pseudo-instructions
■ Not translated into machine instructions
■ Provide instructions to the assembler itself
□ In addition to the mnemonic machine instructions, the examples use the following
assembler directives:
■ START, END, BYTE, WORD, RESB, RESW

Assembler Directives
□ Basic assembler directives
■ START: specifies the name and starting address of the program, e.g., SUM START 4000
■ END: marks the end of the program and (optionally) specifies the first executable
instruction in the program, e.g., END 6700
□ If not specified, use the address of the first executable instruction
■ BYTE: directs the assembler to generate a constant (character or hexadecimal)
occupying as many bytes as needed to represent the constant
■ WORD: generates a one-word integer constant
■ RESB: reserves the indicated number of bytes for a data area without generating
data values
■ RESW: reserves the indicated number of words for a data area

Assembler Data Structures

□ Data Structures needed


■ Operation Code Table (OPTAB)
■ Symbol Table (SYMTAB)
■ Location Counter (LOCCTR)

7
Assembler Data Structures

• OPTAB (Operation Code Table) is used to look up mnemonic operation codes and
translate them to their machine language equivalents.
• In more complex assemblers, this table also contains information about
instruction format and length.

Internal Data Structures

□ OPTAB (operation code table)
■ Content
□ Mnemonic operation code and its machine language equivalent
□ May also include instruction format, length, etc.
■ Usage
□ Pass 1: used to look up and validate operation codes in the source
program
□ Pass 2: used to translate the operation codes to machine language
■ Characteristics
□ Static table, predefined when the assembler is written
■ Implementation
□ Array or hash table (preferred) with the mnemonic operation code as the key
Internal Data Structures
• SYMTAB is used to store values assigned to labels.
• It includes the name and value (address) for each label in the source
program.

10
Internal Data Structures (Cont.)
□ SYMTAB (symbol table)
■ Content
□ Label name and its value (address)
□ May also include flag (type, length) etc.
■ Usage
□ Pass 1: labels are entered into SYMTAB with their address (from
LOCCTR) as they are encountered in the source program
□ Pass 2: symbols used as operands are looked up in SYMTAB to
obtain the address to be inserted in the assembled instruction
■ Characteristic
□ Dynamic table (insert, delete, search)
■ Implementation
□ Hash table for efficiency of insertion and retrieval
11
SYMTAB (symbol table) Example
Symbol   Value (hex)
COPY     1000
FIRST    1000
CLOOP    1003
ENDFIL   1015
EOF      102A
THREE    102D
ZERO     1030
RETADR   1033
LENGTH   1036
BUFFER   1039
RDREC    2039

Internal Data Structures (Cont.)
• The Location Counter (LOCCTR) is a variable that helps in the assignment of
addresses.
• LOCCTR is initialized to the beginning address specified in the START
statement.
• After each source statement is processed, the length of the assembled
instruction or data area to be generated is added to LOCCTR.
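
The LOCCTR update rule can be sketched in Python. This is a minimal illustration for standard SIC, where every machine instruction is assumed to occupy 3 bytes; the function name `statement_length` is hypothetical and the three statements are taken from the COPY example.

```python
def statement_length(opcode, operand=""):
    """Return the number of bytes a SIC statement adds to LOCCTR."""
    if opcode == "WORD":
        return 3                        # one 3-byte word
    if opcode == "RESW":
        return 3 * int(operand)         # operand counts words
    if opcode == "RESB":
        return int(operand)             # operand counts bytes
    if opcode == "BYTE":
        body = operand[2:-1]            # text between the quotes
        if operand.startswith("C"):
            return len(body)            # one byte per character
        if operand.startswith("X"):
            return (len(body) + 1) // 2 # two hex digits per byte
        raise ValueError(f"bad BYTE operand: {operand}")
    if opcode in ("START", "END"):
        return 0                        # these directives generate nothing
    return 3                            # assumption: standard SIC instruction


locctr = 0x1000                         # from "COPY START 1000"
for opcode, operand in [("STL", "RETADR"), ("JSUB", "RDREC"), ("BYTE", "C'EOF'")]:
    print(f"{locctr:04X}  {opcode} {operand}")
    locctr += statement_length(opcode, operand)
```

A SIC/XE assembler would instead take the instruction length (1, 2, 3 or 4 bytes) from OPTAB and the `+` prefix.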

13
Internal Data Structures (Cont.)
□ Location Counter
■ A variable used to help in assignment
of addresses
■ Initialized to the beginning address
specified in the START statement
■ Counted in bytes

14
Basic Assembler Functions (Cont.)
□ Structure of an assembly language program
■ Instruction
Label mnemonic operand

□ Operand
■ Direct addressing
□ E.g. LDA ZERO
■ Immediate addressing
□ E.g. LDA #0
■ Indexed addressing
□ E.g. STCH BUFFER,X
■ Indirect addressing
□ E.g. J @RETADR

Basic Assembler Functions (Cont.)
□ Structure of an assembly language program (Cont.)
■ Data
Label BYTE value
Label WORD value
Label RESB value
Label RESW value

□ Label: name of the operand
□ value: integer or character constant
□ E.g. EOF BYTE C'EOF'
□ E.g. FIVE WORD 5
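
How the two example constants become object code can be sketched as follows. This is an illustration only: `constant_to_hex` is a hypothetical helper, it assumes ASCII character codes and a 24-bit SIC word, and negative values are shown in two's complement.

```python
def constant_to_hex(opcode, operand):
    """Return the object code (hex digits) for a BYTE or WORD statement."""
    if opcode == "WORD":
        # One 24-bit word; masking gives the two's complement form of negatives.
        return f"{int(operand) & 0xFFFFFF:06X}"
    if opcode == "BYTE":
        kind, body = operand[0], operand[2:-1]   # C'...' or X'...'
        if kind == "C":
            return body.encode("ascii").hex().upper()
        if kind == "X":
            return body.upper()
    raise ValueError(f"cannot convert {opcode} {operand}")


print(constant_to_hex("BYTE", "C'EOF'"))   # character constant
print(constant_to_hex("WORD", "5"))        # integer constant
```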

16
Example of a SIC Assembler Language Program (Fig. 2.1)
□ Goal:
■ Read records from the input device (code F1)
■ Copy them to the output device (code 05)
■ Loop until the end of the file is detected
□ Then write EOF on the output device
□ Terminate by executing an RSUB instruction to return to the
operating system
■ Assume this program is called by the OS using JSUB

Example of a SIC Assembler Language Program (Fig. 2.2)
□ Shows the generated object code for each statement in Fig 2.1
□ The Loc column shows the machine address for each part of the
assembled program
■ Assume the program starts at address 1000
■ All instructions, data, and reserved storage are arranged sequentially,
in the order in which they appear in the source program
■ A location counter is used to keep track of the current address

Example of a SIC Assembler Language Program (Fig. 2.1, 2.2)
[Figure: program listing of Fig. 2.1 with the Loc and object code columns of Fig. 2.2]

Functions of a Basic Assembler
□ Convert mnemonic operation codes to their machine language
equivalents
■ E.g. STL -> 14 (line 10)
□ Convert symbolic operands to their equivalent machine addresses
■ E.g. RETADR -> 1033 (line 10)
□ Build the machine instructions in the proper format
□ Convert the data constants to internal machine representations
■ E.g. EOF -> 454F46 (line 80)
□ Write the object program and the assembly listing

22
Functions of a Basic Assembler (Cont.)

□ All of the above functions can be accomplished by sequential processing of the
source program
■ Except the translation of symbolic operands
□ Example
■ 10 STL RETADR
□ RETADR is not yet defined when the STL instruction is encountered
□ This is called a forward reference

Difficulties: Forward Reference

● Forward reference: a reference to a label that is defined later in the
program.

Loc    Label    Operator  Operand
1000   FIRST    STL       RETADR
1003   CLOOP    JSUB      RDREC
…      …        …         …
1012            J         CLOOP
…      …        …         …
1033   RETADR   RESW      1

Symbolic Operands
□ We’re not likely to write memory addresses directly in our code.
■ Instead, we will define variable names.
□ Other examples of symbolic operands
■ Labels (for jump instructions)
■ Subroutines
■ Constants

26
Address Translation Problem

□ Forward reference
■ A reference to a label that is defined later in the program
□ The assembler cannot translate such a statement when it first encounters it,
because the address of the label is not yet known
□ As a result, most assemblers make 2 passes over the source
program
■ 1st pass: scan label definitions and assign addresses
■ 2nd pass: perform the actual translation (generate object code)

Assembler output format
□ Contains 3 types of records:
■ Header record
Col. 1      H
Col. 2-7    Program name
Col. 8-13   Starting address of object program (hex)
Col. 14-19  Length of object program in bytes (hex)
■ Text record
Col. 1      T
Col. 2-7    Starting address for object code in this record (hex)
Col. 8-9    Length of object code in this record in bytes (hex)
Col. 10-69  Object code (hex) (2 columns per byte)
■ End record
Col. 1      E
Col. 2-7    Address of first executable instruction in object program (hex)
            (taken from the operand of the END statement)
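
The column layouts above map directly onto fixed-width string formatting. The sketch below is an illustration, not a complete object-file writer: the three function names are hypothetical, and the sample values (program SUM, start 4000, end 578E, and the object codes) are those of the SUM example in these slides.

```python
def header_record(name, start, length):
    # Col. 1 "H", Col. 2-7 name (blank padded), Col. 8-13 start, Col. 14-19 length
    return f"H{name[:6]:<6}{start:06X}{length:06X}"


def text_record(start, object_codes):
    body = "".join(object_codes)
    if len(body) > 60:                       # Col. 10-69 holds at most 30 bytes
        raise ValueError("a Text record holds at most 30 bytes of object code")
    # Col. 2-7 start address, Col. 8-9 length in bytes (2 hex digits per byte)
    return f"T{start:06X}{len(body) // 2:02X}{body}"


def end_record(first_instruction):
    return f"E{first_instruction:06X}"


codes = ["045788", "005788", "18C015", "2C5785", "384006", "0C578B", "4C0000"]
records = [
    header_record("SUM", 0x4000, 0x578E - 0x4000),
    text_record(0x4000, codes),
    text_record(0x5788, ["000000"]),         # ZERO WORD 0, after the RESW gap
    end_record(0x4000),
]
print("\n".join(records))
```

The reserved storage (RESW) generates no object code, so the bytes after it start a new Text record with its own starting address.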
28
Assembler output format

□ Header Record → only one
□ Text Record → any number, depending on program length
□ End Record → only one

Object Program for Fig 2.2 (Fig 2.3)
[Figure: object program with annotations. Header record: program name, starting
address (hex), length of object program in bytes (hex). Text record: starting
address (hex), length of object code in this record (hex), object code (hex).
End record: address of first executable instruction (hex).]
Object Program (Cont.)
Line no  LOCATION COUNTER  LABEL  OPCODE  OPERAND   Object Code
1                          SUM    START   4000(H)
2                          FIRST  LDX     ZERO
3                                 LDA     ZERO
4                          LOOP   ADD     TABLE,X
5                                 TIX     COUNT
6                                 JLT     LOOP
7                                 STA     TOTAL
8                                 RSUB
9                          TABLE  RESW    2000
10                         COUNT  RESW    1
11                         ZERO   WORD    0
12                         TOTAL  RESW    1
13                                END     FIRST
Object Program (Cont.)
Opcodes (OPTAB): LDA=00, LDX=04, STA=0C, ADD=18, TIX=2C, JLT=38, RSUB=4C

Line no  LOCATION COUNTER  LABEL  OPCODE  OPERAND   Object Code
1                          SUM    START   4000(H)
2        4000              FIRST  LDX     ZERO
3        4003                     LDA     ZERO
4        4006              LOOP   ADD     TABLE,X
5        4009                     TIX     COUNT
6        400C                     JLT     LOOP
7        400F                     STA     TOTAL
8        4012                     RSUB
9        4015              TABLE  RESW    2000
10       5785              COUNT  RESW    1
11       5788              ZERO   WORD    0
12       578B              TOTAL  RESW    1
13       578E                     END     FIRST
Object Program (Cont.)
Opcodes (OPTAB): LDA=00, LDX=04, STA=0C, ADD=18, TIX=2C, JLT=38, RSUB=4C

Line no  LOCATION COUNTER  LABEL  OPCODE  OPERAND   Object Code
1                          SUM    START   4000(H)
2        4000              FIRST  LDX     ZERO      045788
3        4003                     LDA     ZERO      005788
4        4006              LOOP   ADD     TABLE,X   18C015
5        4009                     TIX     COUNT     2C5785
6        400C                     JLT     LOOP      384006
7        400F                     STA     TOTAL     0C578B
8        4012                     RSUB              4C0000
9        4015              TABLE  RESW    2000
10       5785              COUNT  RESW    1
11       5788              ZERO   WORD    0         000000
12       578B              TOTAL  RESW    1
13       578E                     END     FIRST
Functions of Two Pass Assembler
□ Pass 1 - define symbols (assign addresses)
● Assign addresses to all statements in the program
● Save the values assigned to all labels for use in Pass 2
● Perform some processing of assembler directives

□ Pass 2 - assemble instructions and generate object program


● Assemble instructions
● Generate data values defined by BYTE, WORD
● Perform processing of assembler directives not done in Pass 1
● Write the object program and the assembly listing

34
Object Program

□ Finally, the assembler must write the generated object code to some
output device
■ This output is called the object program
■ It will later be loaded into memory for execution

Two Pass Assembler
● Read each input line
● Separate it into LABEL, OPCODE, OPERAND

Source program → Pass 1 → Intermediate file → Pass 2 → Object codes
Tables consulted by the passes: OPTAB, SYMTAB

Algorithm for 2 Pass
Assembler (Fig 2.4)
□ Both pass1 and pass 2 need to read
the source program.
■ However, pass 2 needs more information
□ Location counter value, error flags
□ Intermediate file
■ Contains each source statement with its
assigned address, error indicators, etc
■ Used as the input to Pass 2

37
Intermediate File

Source Program (LABEL, OPCODE, OPERAND) → Pass 1 assembler → Intermediate file → Pass 2 assembler → Object Program
Data structures: LOCCTR, OPTAB, SYMTAB

Passes of an Assembler
□ Pass 1
● Separate contents of the label, mnemonic opcode and operand fields of a
statement.
● If a symbol is present in the label field, enter the pair (symbol, <LC>) in a
new entry of the symbol table.
● Check validity of the mnemonic opcode through a look-up in the optab.
● Perform LC processing, i.e., update the address contained in the location
counter by considering the opcode and operands of the statement.
□ Pass 2
● Obtain the machine opcode corresponding to the opcode from the
optab.
● Obtain the address of each memory operand from the Symbol table.
● Synthesize a machine instruction or the correct representation of a
constant, as the case may be.
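
The two passes described above can be sketched in Python for the SUM program shown earlier. This is a minimal sketch under stated assumptions, not the textbook algorithm of Fig 2.4: it handles only the opcodes and directives that SUM uses, keeps the intermediate file as an in-memory list, and the names `pass1`, `pass2` and `SOURCE` are hypothetical.

```python
OPTAB = {"LDA": 0x00, "LDX": 0x04, "STA": 0x0C, "ADD": 0x18,
         "TIX": 0x2C, "JLT": 0x38, "RSUB": 0x4C}

SOURCE = [  # (label, opcode, operand)
    ("SUM", "START", "4000"), ("FIRST", "LDX", "ZERO"), ("", "LDA", "ZERO"),
    ("LOOP", "ADD", "TABLE,X"), ("", "TIX", "COUNT"), ("", "JLT", "LOOP"),
    ("", "STA", "TOTAL"), ("", "RSUB", ""), ("TABLE", "RESW", "2000"),
    ("COUNT", "RESW", "1"), ("ZERO", "WORD", "0"), ("TOTAL", "RESW", "1"),
    ("", "END", "FIRST"),
]


def pass1(source):
    """Assign an address to every statement and build SYMTAB."""
    symtab, intermediate, locctr = {}, [], 0
    for label, opcode, operand in source:
        if opcode == "START":
            locctr = int(operand, 16)          # starting address is in hex
        if label:
            if label in symtab:
                raise ValueError(f"duplicate symbol {label}")
            symtab[label] = locctr
        intermediate.append((locctr, opcode, operand))
        if opcode in OPTAB or opcode == "WORD":
            locctr += 3
        elif opcode == "RESW":
            locctr += 3 * int(operand)
        elif opcode == "RESB":
            locctr += int(operand)
        elif opcode not in ("START", "END"):
            raise ValueError(f"invalid operation code {opcode}")
    return symtab, intermediate


def pass2(symtab, intermediate):
    """Translate each statement using OPTAB and SYMTAB."""
    listing = []
    for loc, opcode, operand in intermediate:
        if opcode in OPTAB:
            address = 0
            if operand:
                symbol, _, index = operand.partition(",")
                address = symtab[symbol]       # KeyError means undefined symbol
                if index.strip() == "X":
                    address |= 0x8000          # set the x (indexed) bit
            listing.append((loc, f"{OPTAB[opcode]:02X}{address:04X}"))
        elif opcode == "WORD":
            listing.append((loc, f"{int(operand) & 0xFFFFFF:06X}"))
    return listing


symtab, intermediate = pass1(SOURCE)
object_code = pass2(symtab, intermediate)
for loc, code in object_code:
    print(f"{loc:04X}  {code}")
```

Because SYMTAB is complete before Pass 2 starts, the forward reference to ZERO in the first instruction needs no special handling.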

39
Algorithm for Pass 1 of Assembler (Fig 2.4a)
[Figure: Pass 1 algorithm]

Algorithm for Pass 2 of Assembler (Fig 2.4b)
[Figure: Pass 2 algorithm]

Assembler Design
□ Machine Dependent Assembler Features
■ instruction formats and addressing modes
■ program relocation
□ Machine Independent Assembler Features
■ literals
■ symbol-defining statements
■ expressions
■ program blocks
■ control sections and program linking
□ Assembler design Options
■ one-pass assemblers
■ multi-pass assemblers
□ Implementation example -MASM assembler.
44
Machine Dependent Assembler Features
□ Many features of an assembler depend on the machine architecture, because they
deal with its memory, registers, etc.
□ Such features are specific to a machine, i.e. they differ from one machine
to another.
□ The two main machine dependent assembler features are
- Instruction Formats and Addressing Modes
- Program Relocation

Instruction Format and Addressing Modes
□ The assembler converts each mnemonic to its opcode and each register
mnemonic to its numeric equivalent.
□ Register mnemonics are usually converted to numbers through SYMTAB, or a
separate table can be used for this purpose.
□ In the first case, SYMTAB is preloaded with the register names and their
values, such as A - 0, X - 1, etc.

Instruction Format and Addressing Modes
□ Register-to-memory instructions are assembled using PC relative or base
relative addressing.
□ In this case the assembler must calculate a displacement to be
assembled as part of the object instruction.
□ When the displacement is too large for both PC relative and base relative
addressing, the extended (Format 4) instruction format must be used.
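
The choice between the three forms can be sketched as below. The ranges are those of a 12-bit Format 3 displacement (signed for PC relative, unsigned for base relative). The function `choose_addressing` is hypothetical, and the sample addresses are illustrative values, not taken from these slides.

```python
def choose_addressing(target, next_instruction, base=None):
    """Return (mode, field) for a SIC/XE memory operand.

    next_instruction is the PC value while the instruction executes,
    i.e. the address of the following instruction.
    """
    disp = target - next_instruction
    if -2048 <= disp <= 2047:
        return "pc", disp & 0xFFF              # 12-bit two's complement
    if base is not None and 0 <= target - base <= 4095:
        return "base", target - base           # 12-bit unsigned
    return "format4", target & 0xFFFFF         # full 20-bit address


print(choose_addressing(0x0030, 0x0003))               # forward, PC relative
print(choose_addressing(0x0006, 0x001A))               # backward, PC relative
print(choose_addressing(0x0036, 0x1051, base=0x0033))  # too far for PC, base works
print(choose_addressing(0x1036, 0x000A))               # needs Format 4
```

In SIC/XE source code the programmer requests Format 4 with the `+` prefix; the sketch only shows when neither relative mode can reach the target.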

47
Program Relocation
□ If the assembler does not know where the object code will be loaded in
memory, the object code it generates is called relocatable code.
□ So during assembly the assembler assumes a starting address of 0.

Need for Relocation
□ Relocation is needed for two reasons:
▪ The assembler does not know where in memory the generated object code will
be loaded, so it generates relocatable object code. A relocating loader is
then used to load the object module anywhere in memory.
▪ When machine memory is not large enough to hold many programs at once,
programs are swapped in and out, and a program need not return to the same
memory location when it is swapped back in. So relocation is needed.

Relocatable Program
□ The assembler does not know the address at which the program will be loaded.
▪ But the assembler can identify for the loader the parts of the object
program that need to be relocated.
▪ A program that contains this relocation information is called a
relocatable program.

Which instructions do not need relocation, and why?
□ Instructions whose operands do not refer to memory at all. The operand value
is obtained from a register, which does not change when the program is
relocated.
▪ Instructions that are assembled using PC relative or base relative
addressing, i.e. the operand address is obtained by adding the displacement
to the contents of the PC or base register.
▪ Only instructions that use direct addressing need modification. So the
advantage of relative addressing is that such instructions do not need
modification.
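
What the relocating loader does with a direct address can be sketched as follows. This is an illustration under assumptions: `relocate` is a hypothetical function, each field to patch is described by its byte offset and its length in half bytes, and the sample is a Format 4 instruction whose 20-bit address field holds 01036.

```python
def relocate(memory, modifications, load_address):
    """Add load_address to every address field flagged for modification.

    modifications is a list of (offset, length_in_half_bytes) pairs.
    """
    for offset, half_bytes in modifications:
        n_bytes = (half_bytes + 1) // 2
        field = int.from_bytes(memory[offset:offset + n_bytes], "big")
        mask = (1 << (4 * half_bytes)) - 1     # low half bytes hold the address
        updated = (field & ~mask) | ((field + load_address) & mask)
        memory[offset:offset + n_bytes] = updated.to_bytes(n_bytes, "big")


program = bytearray.fromhex("4B101036")        # assembled for load address 0
relocate(program, [(1, 5)], 0x5000)            # patch 5 half bytes from byte 1
print(program.hex().upper())
```

Only the low 20 bits change; the opcode byte and the flag half byte are left as assembled.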

51
Machine Independent Assembler Features
□ Features which do not depend on the machine can be implemented for
any assembler.

□ The features are:


- Literals
- Symbol Defining Statements
- Expressions
- Program Blocks
- Control Sections and Program Linking

52
Literals
□ A literal is a constant operand whose value is written as part of the
instruction that uses it.
□ So the programmer does not have to define the constant elsewhere in the
program and make up a label for it.
▪ A literal is identified by the prefix "=", which is followed by a
specification of the literal value.
▪ All literal operands used in a program are gathered together into one
or more literal pools.
▪ Normally, literals are placed into a pool at the end of the program.

Literals (cont…)
□ The literal pool listing shows the assigned addresses and the generated data
values.
□ When the literal pool is to be placed at some other location in the object
program, the assembler directive LTORG is used.
□ When the assembler encounters an LTORG statement, it creates a literal
pool that contains all of the literal operands used since the previous LTORG
statement.
□ When the LTORG statement is not used, the literal operands are placed in a
literal pool only at the end of the program.

Handling of Literals
□ The basic data structure used by the assembler to handle literals is
LITTAB (Literal Table).
□ For each literal, this table contains the literal name, operand value,
length, and the address assigned to the operand when it is placed in the
literal pool.
□ LITTAB is often organized as a hash table with the literal name or value as
the key.
□ During pass 1, the assembler searches LITTAB for the specified literal
name or value; if the literal is not already present, it is added to LITTAB.
□ During pass 2, the operand address for use in generating object code is
obtained by searching LITTAB for each literal operand encountered.
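
A LITTAB with these four fields can be sketched with a Python dictionary, which is itself a hash table. The class and method names are hypothetical, only `=C'...'` and `=X'...'` literals are handled, and the pool address 002D is an illustrative value.

```python
class LiteralTable:
    def __init__(self):
        # literal name -> {"value": hex string, "length": bytes, "address": int or None}
        self.entries = {}

    def add(self, name):
        """Pass 1: record a literal operand the first time it is seen."""
        if name in self.entries:
            return                              # duplicates share one entry
        kind, body = name[1], name[3:-1]        # name looks like =C'EOF'
        if kind == "C":
            value = body.encode("ascii").hex().upper()
        else:
            value = body.upper()
        self.entries[name] = {"value": value, "length": len(value) // 2,
                              "address": None}

    def place_pool(self, locctr):
        """LTORG or END: assign addresses to unplaced literals, return new LOCCTR."""
        for entry in self.entries.values():
            if entry["address"] is None:
                entry["address"] = locctr
                locctr += entry["length"]
        return locctr


littab = LiteralTable()
for operand in ["=C'EOF'", "=X'05'", "=C'EOF'"]:
    littab.add(operand)
next_locctr = littab.place_pool(0x002D)
for name, entry in littab.entries.items():
    print(name, entry["value"], entry["length"], f"{entry['address']:04X}")
```

Pass 2 would then look up `littab.entries[operand]["address"]` wherever a literal operand appears.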

55
Symbol Defining Statements
□ These are statements that define symbols and assign values to them,
e.g. EQU, ORG
□ Labels on instructions or data areas
■ The value of such a label is the address assigned to the statement
on which it appears
□ Defining symbols
■ The assembler directive EQU (Equate) allows the programmer to define
symbols and specify their values.

Symbol Defining Statements
■ Format: symbol EQU value
□ Value can be a constant or an expression involving constants and previously
defined symbols
□ This statement defines the given symbol (enters it into SYMTAB) and
assigns to it the value specified.

□ Usage:
■ Make the source program easier to understand
■ Example
MAXLEN EQU 4096
+LDT #MAXLEN

Object Program Using Literal (Fig 2.9 & 2.10)
[Figure: program using literals and its object code]

Symbol Defining Statements
□ How does the assembler handle EQU?
■ In pass 1: when the assembler encounters the EQU statement,
it enters the symbol and its value into SYMTAB for later reference.
■ In pass 2: it assembles instructions using the value of the symbol.

Symbol Defining Statements
□ ORG (origin)
■ Assembler directive: ORG value
□ Value can be a constant or an expression involving constants and
previously defined symbols
■ The assembler resets the location counter (LOCCTR) to the specified value
■ LOCCTR controls the assignment of storage in the object program

Expressions
□ Most assemblers allow the use of expressions wherever a single
operand is permitted.
□ Each such expression must be evaluated by the assembler to produce a
single operand address or value.
□ Expressions are classified as either absolute expressions or relative
expressions, depending upon the type of value they produce.
□ An expression that contains only absolute terms is an absolute expression.
An expression with relative terms is also absolute if the relative terms occur
in pairs and the terms in each pair have opposite signs.
□ A relative expression is one in which all of the relative terms except one
can be paired; the remaining unpaired relative term must have a positive
sign.
□ Expressions that do not meet the conditions given for either absolute or
relative expressions should be flagged by the assembler as errors.
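
For expressions built only from + and -, the pairing rule reduces to counting signed relative terms. The sketch below assumes the terms have already been parsed into (sign, kind) pairs; `classify` and this representation are hypothetical, and multiplication or division of relative terms is outside its scope.

```python
def classify(terms):
    """Classify an expression from its terms.

    terms is a list of (sign, kind): sign is +1 or -1,
    kind is "A" for an absolute term or "R" for a relative term.
    """
    net = sum(sign for sign, kind in terms if kind == "R")
    if net == 0:
        return "absolute"     # every relative term is paired with an opposite sign
    if net == 1:
        return "relative"     # exactly one unpaired, positive relative term
    raise ValueError("expression is neither absolute nor relative")


print(classify([(+1, "R"), (-1, "R")]))   # BUFEND-BUFFER
print(classify([(+1, "R"), (-1, "A")]))   # BUFFER-1
print(classify([(+1, "A")]))              # 4096
```

An expression such as BUFEND+BUFFER leaves a net count of 2 and is rejected.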
61
Program Blocks
□ The source programs considered so far logically contained subroutines, data
areas, etc.
□ They were handled by the assembler as one entity, resulting in a single
block of object code.
□ The term program blocks refers to segments of code that are rearranged
within a single object program unit, whereas control sections are segments
that are translated into independent object program units.
□ Each program block may contain several separate segments of the source
program.
□ The assembler logically rearranges these segments to gather together the
pieces of each block.
Program Blocks (cont..)
□ In pass 1, the assembler rearranges the segments of each program block to
gather together the pieces of the block.
□ These blocks are then assigned addresses in the object program.
□ A separate location counter is maintained for each block.
□ The location counter of a block is initialized to 0 when the block is first
begun.
□ When switching to another block, the current value of the location counter
is saved; it is restored when the block is resumed.

Program Blocks (cont..)
□ At the end of pass 1, the final location counter value of each block gives
the length of that block.
□ In pass 2, the assembler needs the address of each symbol relative to
the start of the object program.
□ So it adds the location of the symbol, relative to the start of its block,
to the starting address assigned to that block.
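
The address arithmetic for program blocks can be sketched as below. The function name, the block names and the lengths are illustrative assumptions; blocks are laid out one after another in the order in which they were first begun.

```python
def block_start_addresses(block_lengths):
    """Assign consecutive starting addresses to blocks, in first-use order."""
    starts, address = {}, 0
    for name, length in block_lengths.items():
        starts[name] = address
        address += length
    return starts


# Final LOCCTR value of each block at the end of Pass 1 (illustrative values).
lengths = {"(default)": 0x0066, "CDATA": 0x000B, "CBLKS": 0x1000}
starts = block_start_addresses(lengths)

# Pass 2: address of a symbol = start of its block + offset within the block.
symbol_block, symbol_offset = "CDATA", 0x0003
symbol_address = starts[symbol_block] + symbol_offset
print(f"{symbol_address:04X}")
```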

64
Control Sections and Program Linking
□ A control section is a part of the program that maintains its identity after
assembly; each such control section can be loaded and relocated independently
of the others.
□ Different control sections are most often used for subroutines or other
logical subdivisions of a program.
□ The programmer can assemble, load and manipulate each of these
control sections separately.
□ Two new record types are needed in the object program: the Define record and
the Refer record.

Define Record
□ Col 1: D

□ Col 2-7: Name of external symbol defined in this control section

□ Col 8-13: Relative address of symbol within the control section(hex)

□ Col 14-73: Repeat information in Col 2-13 for other external symbols.

66
Refer Record

□ Col 1: R

□ Col 2-7: Name of external symbol referred to in this control section

□ Col 8-73: Names of other external reference symbols.

67
Modification Record

□ Col 1: M
□ Col 2-7: Starting address of the field to be modified, relative to the
beginning of the control section
□ Col 8-9: Length of the field to be modified in half bytes
□ Col 10: Modification flag (+ or -)
□ Col 11-16: External symbol whose value is to be added to or
subtracted from the indicated field.
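
The Define, Refer and Modification layouts can be produced with fixed-width formatting. This is a sketch: the function names are hypothetical, the symbol names and addresses are illustrative, and the limits of 6 and 12 symbols per record follow from the column ranges above.

```python
def define_record(symbols):
    """symbols: list of (name, relative_address); Col 2-73 holds up to 6 pairs."""
    if len(symbols) > 6:
        raise ValueError("at most 6 symbols fit in one Define record")
    return "D" + "".join(f"{name[:6]:<6}{addr:06X}" for name, addr in symbols)


def refer_record(names):
    """names: external symbols referred to; Col 2-73 holds up to 12 names."""
    if len(names) > 12:
        raise ValueError("at most 12 symbols fit in one Refer record")
    return "R" + "".join(f"{name[:6]:<6}" for name in names)


def modification_record(address, half_bytes, sign, symbol):
    """sign is '+' or '-': add or subtract the symbol's value at load time."""
    if sign not in "+-" or len(sign) != 1:
        raise ValueError("modification flag must be + or -")
    return f"M{address:06X}{half_bytes:02X}{sign}{symbol[:6]:<6}"


print(define_record([("BUFFER", 0x33), ("BUFEND", 0x1033), ("LENGTH", 0x2D)]))
print(refer_record(["RDREC", "WRREC"]))
print(modification_record(0x4, 5, "+", "RDREC"))
```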
68
Assembler Design Options

□ One-pass assemblers
□ Multi-pass assemblers

69
One-Pass Assemblers

□ Goal: avoid a second pass over the source program
□ Main problem
■ Forward references to data items or labels on instructions
■ Instruction operands often are symbols that have not yet been defined in the source
program.
■ Thus the assembler does not know what address to insert in the translated instruction.
□ Solution
■ Data items: require that all such areas be defined before they are referenced
■ Labels on instructions: forward references cannot be eliminated
□ E.g. the logic of the program often requires a forward jump
□ It is too inconvenient if forward jumps are not permitted

One-Pass Assemblers
Two Types of One-Pass Assemblers:
□ One type produces object code directly in memory for immediate
execution (load-and-go assembler)
□ The other type produces the usual kind of object program for later
execution.

One-Pass Assemblers
Load-and-Go Assembler
□ No object program is written out, no loader is needed
□ Useful for program development and testing
□ It avoids the overhead of writing the object program out and reading it back
in
□ Both one-pass and two-pass assemblers can be designed as load-and-go
□ However, one-pass also avoids the overhead of an additional pass over the
source program
□ For a load-and-go assembler, the actual address must be known at
assembly time.

Sample Program for a One-Pass Assembler (Fig. 2.18)
[Figure: sample program listing]

Forward Reference Handling in One-pass Assembler

□ When the assembler encounters an instruction operand that has not yet been
defined:
1. The assembler omits the translation of the operand address
2. It inserts the symbol into SYMTAB, if it is not already there,
and flags the entry to indicate that the symbol is undefined
3. The address of the operand field that refers to the undefined symbol is
added to a list of forward references associated with the
symbol table entry
4. When the definition for the symbol is encountered
   1. The forward reference list for that symbol is scanned
   2. The proper address for the symbol is inserted into
      any instructions previously generated
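
The four steps above can be sketched for a load-and-go assembler that writes object code into a byte array. The class `OnePassSymtab` and its methods are hypothetical, the 2-byte patch assumes the standard SIC layout (x bit plus 15-bit address after the opcode byte), and the address 203D used for RDREC is an illustrative value.

```python
class OnePassSymtab:
    def __init__(self, memory):
        self.memory = memory      # object code being built in memory
        self.value = {}           # defined symbols -> address
        self.pending = {}         # undefined symbols -> operand field addresses

    def reference(self, symbol, operand_address):
        """Return the symbol's address, or 0 after recording a forward reference."""
        if symbol in self.value:
            return self.value[symbol]
        self.pending.setdefault(symbol, []).append(operand_address)
        return 0                  # placeholder until the symbol is defined

    def define(self, symbol, address):
        """Enter the definition and patch every instruction waiting for it."""
        self.value[symbol] = address
        for where in self.pending.pop(symbol, []):
            # Keep the x bit of the first byte, fill in the 15-bit address.
            self.memory[where] = (self.memory[where] & 0x80) | (address >> 8)
            self.memory[where + 1] = address & 0xFF

    def undefined(self):
        """Symbols still undefined at END; these are errors."""
        return sorted(self.pending)


memory = bytearray(0x3000)
table = OnePassSymtab(memory)
memory[0x2012] = 0x48                         # JSUB opcode at 2012
address = table.reference("RDREC", 0x2013)    # operand field starts at 2013
still_undefined = table.undefined()
table.define("RDREC", 0x203D)
print(memory[0x2012:0x2015].hex().upper())
```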
Handling Forward Reference in One-pass Assembler (Cont.)

□ At the end of the program
■ Any SYMTAB entries that are still marked with * indicate undefined
symbols
□ These should be flagged by the assembler as errors
■ Search SYMTAB for the symbol named in the END statement and
jump to this location to begin execution of the assembled program.

Example
□ Fig. 2.19 (a)
■ Shows the object code in memory and symbol table entries
after scanning line 40
■ Line 15: forward reference (RDREC)
□ The operand address in the object code is marked ----
□ The value in the symbol table is marked * (undefined)
□ The address of the operand field (2013) is inserted in a list associated
with RDREC
■ Line 30 and Line 35: follow the same procedure

Object Code in Memory and SYMTAB After Scanning Line 40 (Fig. 2.19a)
[Figure: object code in memory and SYMTAB entries]

Example (Cont.)
□ Fig. 2.19 (b)
■ Shows the object code in memory and symbol table entries after scanning line
160
■ Line 45: ENDFIL is defined
□ The assembler places its value in the SYMTAB entry
□ It inserts this value into the instruction operand field (at 201C), as directed by the forward reference list
■ Line 125: RDREC is defined
□ Follow the same procedure
■ Line 65 and 155
□ Two new forward references (WRREC and EXIT)

Object Code in Memory and SYMTAB Entries for Fig 2.18 After Scanning Line 160 (Fig. 2.19b)
[Figure: object code in memory and SYMTAB entries]

One-Pass Assembler Producing Object Code

□ Forward references are entered into the symbol table's lists as before
■ If the operand contains an undefined symbol, use 0 as the address and write the
Text record to the object program.
□ However, when the definition of a symbol is encountered, the assembler
must generate another Text record with the correct operand address.
□ When the program is loaded, this address will be inserted into the
instruction by the loader.
□ The object program records must be kept in their original order when
they are presented to the loader

Multi-Pass Assemblers

□ Motivation: in a 2-pass assembler, any symbol used on the right-hand side
of a symbol definition (e.g. EQU) must already be defined.
■ Forward references are not allowed in such definitions, since the symbol's
value cannot be determined during the first pass

□ E.g. (not allowed in a 2-pass assembler)
ALPHA EQU BETA
BETA  EQU DELTA
DELTA RESW 1

Multi-Pass Assemblers (Cont.)
□ Multi-pass assemblers
■ Eliminate the restriction on EQU and ORG
■ Make as many passes as are needed to process the definitions of symbols.
□ Implementation
■ To facilitate symbol evaluation, each SYMTAB entry for an undefined symbol
must indicate which symbols depend on its value
■ Each entry keeps a linked list of the symbols whose values depend
on this entry

Example of Multi-pass Assembler Operation
(fig 2.21a)

HALFSZ EQU MAXLEN/2


MAXLEN EQU BUFEND-BUFFER
PREVBT EQU BUFFER-1
.
.
.
BUFFER RESB 4096
BUFEND EQU *

86
Example of Multi-Pass Assembler Operation (Fig 2.21b)
[Figure: SYMTAB for the program of Fig 2.21a, with these annotations]
&1: one undefined symbol in the defining expression
*: undefined
MAXLEN entry: a list of the symbols whose values depend on MAXLEN

Example of Multi-Pass Assembler Operation (Fig 2.21c)
[Figure: SYMTAB state; source listing as in Fig 2.21a]

Example of Multi-pass Assembler Operation (Fig 2.21d)
[Figure: SYMTAB state; source listing as in Fig 2.21a]

Example of Multi-pass Assembler Operation (Fig 2.21e)
[Figure: SYMTAB state; source listing as in Fig 2.21a]
Suppose BUFFER = * = (LOCCTR) = 1034₁₆

Example of Multi-pass Assembler Operation
(Fig 2.21f)
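BUFEND = * = (LOCCTR) = 1034₁₆ + 4096₁₀ = 1034₁₆ + 1000₁₆ = 2034₁₆
MAXLEN = BUFEND - BUFFER = 2034₁₆ - 1034₁₆ = 1000₁₆
HALFSZ = MAXLEN/2 = 1000₁₆ ÷ 2 = 800₁₆ (decimal value: 4096 ÷ 2 = 2048)
PREVBT = BUFFER - 1 = 1033₁₆

HALFSZ EQU MAXLEN/2
MAXLEN EQU BUFEND-BUFFER
PREVBT EQU BUFFER-1
.
.
.
BUFFER RESB 4096
BUFEND EQU *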
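
The dependency-list idea can be sketched in Python for this example. This is an illustration, not the textbook's table layout: `DeferredSymtab` and `evaluate` are hypothetical names, expressions are evaluated strictly left to right with integer division, and the addresses 1034 and 2034 (hex) are passed in directly in place of the location counter.

```python
import re


def evaluate(expression, values):
    """Evaluate symbols and integers joined by + - * /, left to right."""
    tokens = re.findall(r"[A-Z][A-Z0-9]*|\d+|[-+*/]", expression)

    def operand(token):
        return int(token) if token.isdigit() else values[token]

    result = operand(tokens[0])
    for op, token in zip(tokens[1::2], tokens[2::2]):
        rhs = operand(token)
        if op == "+":
            result += rhs
        elif op == "-":
            result -= rhs
        elif op == "*":
            result *= rhs
        else:
            result //= rhs
    return result


class DeferredSymtab:
    def __init__(self):
        self.value = {}        # resolved symbols
        self.waiting = {}      # symbol -> (expression, set of undefined symbols)
        self.dependents = {}   # symbol -> symbols whose values depend on it

    def define(self, name, expression):
        if isinstance(expression, int):        # address taken from LOCCTR
            self._resolve(name, expression)
            return
        needed = {s for s in re.findall(r"[A-Z][A-Z0-9]*", expression)
                  if s not in self.value}
        if needed:
            self.waiting[name] = (expression, needed)
            for symbol in needed:
                self.dependents.setdefault(symbol, []).append(name)
        else:
            self._resolve(name, evaluate(expression, self.value))

    def _resolve(self, name, value):
        self.value[name] = value
        for dependent in self.dependents.pop(name, []):
            expression, needed = self.waiting[dependent]
            needed.discard(name)
            if not needed:                     # last missing symbol arrived
                del self.waiting[dependent]
                self._resolve(dependent, evaluate(expression, self.value))


table = DeferredSymtab()
table.define("HALFSZ", "MAXLEN/2")
table.define("MAXLEN", "BUFEND-BUFFER")
table.define("PREVBT", "BUFFER-1")
table.define("BUFFER", 0x1034)
table.define("BUFEND", 0x2034)
for name, value in sorted(table.value.items()):
    print(f"{name:<7}{value:04X}")
```

Defining BUFFER resolves PREVBT at once; MAXLEN and then HALFSZ follow only when BUFEND is defined.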

91
Implementation Example
Microsoft MASM Assembler
□ Microsoft MASM assembler for Pentium and other x86 systems
□ The programmer of an x86 system views memory as a collection of segments.
□ A MASM assembler language program is written as a collection of segments.
□ Each segment is defined as belonging to a particular class (CODE, DATA,
CONST, STACK) corresponding to its contents.
□ Assembler directive: SEGMENT
■ Similar to program blocks in SIC
■ All parts of a segment are gathered together by the assembler

Microsoft MASM Assembler (Cont.)
□ Code segments are addressed using register CS and stack segments using
register SS; these segment registers are automatically set by the system
loader when a program is loaded for execution. Data segments (including
constant segments) are normally addressed using DS, ES, FS, or GS.
□ Assembler directive: ASSUME
■ By default, the assembler assumes that all references to data segments use register DS
■ This can be changed with the assembler directive ASSUME
■ e.g. ASSUME ES:DATASEG2
▪ Tells the assembler that register ES indicates the segment DATASEG2
▪ Thus, any reference to a label defined in DATASEG2 will be assembled using
register ES
■ Similar to the BASE directive in SIC/XE
□ BASE tells a SIC/XE assembler the contents of register B.
□ ASSUME tells MASM the contents of a segment register (the programmer must
provide instructions to load this register when the program is executed).
Microsoft MASM Assembler (Cont.)
□ Jump instructions are assembled in 2 different ways:
■ Near jump: jump to a target in the same code segment, using the current
code segment register CS
□ 2- or 3-byte instruction
■ Far jump: jump to a target in a different code segment
□ 5-byte instruction

□ E.g. JMP TARGET

□ If the definition of the label TARGET occurs in the program before the JMP
instruction, the assembler can tell whether this is a near jump or a far jump.
□ If it is a forward reference, MASM assumes it is a near jump.
□ So if the target is actually in a different code segment, the programmer
must warn the assembler.
Microsoft MASM Assembler (Cont.)
□ Problem: Jump with forward reference
■ By default, MASM assumes that a forward jump is a near jump
■ If it is a far jump, the programmer must tell the assembler.
□ E.g. JMP FAR PTR TARGET
■ If the jump address is within 128 bytes of the current instruction, the
programmer can specify the shorter near jump by writing
JMP SHORT TARGET
■ If a forward far jump is not marked with FAR PTR, the assembler reserves
3 bytes for the jump instruction in pass 1,
■ but the actual assembled instruction requires 5 bytes.
■ In the earlier versions of MASM, this caused an assembler error, called a
phase error.
Microsoft MASM Assembler (Cont.)
□ In later versions of MASM, the assembler can repeat pass 1 to
generate the correct location counter values.
□ A segment in a MASM source program can be written in more than
one part.
□ If a SEGMENT directive specifies the same name as a previously
defined segment, it is considered to be a continuation of that
segment.
□ All of the parts of a segment are gathered together by the
assembly process.
□ References between segments that are assembled together are automatically
handled by the assembler.
Microsoft MASM Assembler (Cont.)
□ In x86, the length of an assembled instruction depends on the operands that are
used.
■ Operands may be registers, memory locations, or immediate values (1 to 4 bytes)
■ Thus, Pass 1 in MASM is much more complex than in a SIC assembler
□ External references between separately assembled modules must be handled
by the linker.
■ MASM directives: PUBLIC, EXTRN
■ Similar to EXTDEF, EXTREF in SIC/XE
□ The object program from MASM may be in several different formats, to allow
easy and efficient execution of the program in a variety of operating
environments.

Module 2 and 3 Important Questions
1. Compare SIC and SIC/XE machine architecture.
2. Explain the addressing modes and instruction sets in SIC
machine architecture with examples.
3. Write in detail data structures used by assembler.
4. Discuss the detailed design of a two-pass assembler with
algorithm.
5. Explain One pass assemblers and two/Multi pass assemblers
6. Explain machine dependent and machine independent
features of an assembler.
7. Explain in detail the features of MASM assembler.
8. What is the need for program relocation?