Module 3 - Assemblers
SYSTEM SOFTWARE
Module 3
ASSEMBLERS - Basic assembler functions- A simple SIC assembler –Assembler
algorithm and data structures –Machine dependent assembler features -Instruction
formats and addressing modes –Program relocation -Machine independent assembler
features -Literals –Symbol-defining statements – Expressions -One pass assemblers
and Multi pass assemblers - Implementation example -MASM assembler.
1
Assemblers
Role of Assembler
Source Program → Assembler → Object Code → Linker → Executable Code → Loader
Assembler is a program that accepts an assembly language program as input
and produces its machine language equivalent along with information for the
loader.
E.g. MASM, TASM
2
Introduction to Assemblers
□ Fundamental functions
■ Translate mnemonic operation codes to their machine language
equivalents
■ Assign machine addresses to symbolic labels used by the
programmer
□ The features and design of an assembler depend
■ Source language it translates
■ The machine language it produces
□ Machine dependency
■ different machine instruction formats and codes
3
Basic Assembler Functions
1. Convert mnemonic operations to their machine language
equivalents.
Ex. STL → 14, JSUB → 48
2. Convert symbolic operands to their equivalent machine address.
Ex. CLOOP → 1003
3. Build equivalent machine instruction in the proper format (format1,
2, 3 or 4)
4. Convert data constant into internal machine representation.
Ex. EOF → 454F46
5. Write the object program and assembly listing.
4
Assembler Directives
□ Pseudo-instructions
■ Not translated into machine instructions
■ Provide instructions to the assembler itself
Assembler directives are pseudo instructions.
5
Assembler Directives
□ Basic assembler directives
■ START: specify name and starting address of the program e.g., SUM START 4000
■ END: specify end of program and (optionally) the first executable instruction in the
program e.g., END 6700
□ If not specified, use the address of the first executable instruction
■ BYTE: direct the assembler to generate constants (character or hexadecimal)
occupying as many bytes as needed to represent the constant.
■ WORD : Generate one-word integer constant
■ RESB: instruct the assembler to reserve memory location without generating data
values
■ RESW: Reserves the indicated number of word for a data area.
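The effect of the data directives on storage can be sketched in a few lines of Python. This is a minimal illustration, not part of any particular assembler: the function names `byte_constant` and `directive_length` are hypothetical, and a SIC word is taken to be 3 bytes.

```python
def byte_constant(operand: str) -> bytes:
    """Convert a BYTE operand such as C'EOF' or X'F1' to its bytes."""
    kind, body = operand[0].upper(), operand[2:-1]  # strip the prefix and quotes
    if kind == "C":
        return body.encode("ascii")      # one byte per character
    if kind == "X":
        return bytes.fromhex(body)       # two hex digits per byte
    raise ValueError(f"unsupported BYTE operand: {operand}")


def directive_length(opcode: str, operand: str) -> int:
    """Number of bytes the directive occupies (assumes a 3-byte SIC word)."""
    if opcode == "BYTE":
        return len(byte_constant(operand))
    if opcode == "WORD":
        return 3
    if opcode == "RESB":
        return int(operand)
    if opcode == "RESW":
        return 3 * int(operand)
    raise ValueError(f"not a data directive: {opcode}")


print(byte_constant("C'EOF'").hex().upper())   # 454F46
print(directive_length("RESB", "4096"))        # 4096
```

BYTE and WORD generate data, while RESB and RESW only advance the address without generating any values.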
6
Assembler Data Structures
7
Internal Data Structures
10
Internal Data Structures (Cont.)
□ SYMTAB (symbol table)
■ Content
□ Label name and its value (address)
□ May also include flag (type, length) etc.
■ Usage
□ Pass 1: labels are entered into SYMTAB with their address (from
LOCCTR) as they are encountered in the source program
□ Pass 2: symbols used as operands are looked up in SYMTAB to
obtain the address to be inserted in the assembled instruction
■ Characteristic
□ Dynamic table (insert, delete, search)
■ Implementation
□ Hash table for efficiency of insertion and retrieval
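A minimal sketch of SYMTAB, assuming a Python dict as the hash table. The class name `SymbolTable` and its methods are hypothetical, chosen only for illustration.

```python
class SymbolTable:
    """SYMTAB sketch: label -> address, backed by a dict (a hash table)."""

    def __init__(self):
        self._entries = {}
        self.errors = []

    def insert(self, label: str, address: int) -> bool:
        """Pass 1: enter a label with the current LOCCTR value."""
        if label in self._entries:
            self.errors.append(f"duplicate symbol: {label}")
            return False
        self._entries[label] = address
        return True

    def lookup(self, label: str):
        """Pass 2: return the address of a symbol, or None if undefined."""
        return self._entries.get(label)


symtab = SymbolTable()
symtab.insert("FIRST", 0x1000)
symtab.insert("CLOOP", 0x1003)
print(f"{symtab.lookup('CLOOP'):04X}")   # 1003
```

A duplicate label is recorded as an error rather than overwriting the earlier definition.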
11
SYMTAB (symbol table Example )
COPY 1000
FIRST 1000
CLOOP 1003
ENDFIL 1015
EOF 1024
THREE 102D
ZERO 1030
RETADR 1033
LENGTH 1036
BUFFER 1039
RDREC 2039
12
Internal Data Structures (Cont.)
• The Location Counter (LOCCTR) is a variable used to help in the
assignment of addresses.
• LOCCTR is initialized to the beginning address specified in the START
statement.
• After each source statement is processed, the length of the assembled
instruction or data area to be generated is added to LOCCTR.
13
Internal Data Structures (Cont.)
□ Location Counter
■ A variable used to help in assignment
of addresses
■ Initialized to the beginning address
specified in the START statement
■ Counted in bytes
14
Basic Assembler Functions (Cont.)
□ Constructions of assembly language program
■ Instruction
Label mnemonic operand
□ Operand
■ Direct addressing
□ E.g. LDA ZERO
■ Immediate addressing
□ E.g. LDA #0
■ Indexed addressing
□ E.g. STCH BUFFER, X
■ Indirect addressing
□ E.g. J @RETADR
15
Basic Assembler Functions (Cont.)
□ Constructions of assembly language program (Cont.)
■ Data
Label BYTE value
Label WORD value
Label RESB value
Label RESW value
16
Example of a SIC Assembler Language
Program (Fig. 2.1)
□ Goal:
■ Reads records from input device (code F1)
■ Copies them to output device (code 05)
■ Loop until end of the file is detected
□ Write EOF on the output device
□ Terminate by executing an RSUB instruction to return to the
operating system
■ Assume this program is called by OS using JSUB
17
Example of a SIC Assembler Language
Program (Fig. 2.2)
□ Show the generated object code for each statement in Fig 2.1
□ Loc column shows the machine address for each part of the
assembled program
■ Assume program starts at address 1000
■ All instructions, data, or reserved storage are arranged sequentially
according to their order in the source program.
■ A location counter is used to keep track of the current address
18
Example of a SIC Assembler Language
Program (Fig. 2.1, 2.2)
21
Functions of a Basic Assembler
□ Convert mnemonic operation codes to their machine language
equivalents
■ E.g. STL -> 14 (line 10)
□ Convert symbolic operands to their equivalent machine addresses
■ E.g. RETADR -> 1033 (line 10)
□ Build the machine instructions in the proper format
□ Convert the data constants to internal machine representations
■ E.g. EOF -> 454F46 (line 80)
□ Write the object program and the assembly listing
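The first three functions can be sketched for the standard SIC instruction format (8-bit opcode, 1-bit index flag, 15-bit address). This is an illustrative sketch only: the function name `assemble_sic` is hypothetical, and the two tables hold just the entries needed for the example.

```python
OPTAB = {"LDA": 0x00, "STL": 0x14, "JSUB": 0x48, "STCH": 0x54}   # mnemonic -> opcode
SYMTAB = {"RETADR": 0x1033, "BUFFER": 0x1039}                     # symbol -> address


def assemble_sic(mnemonic: str, operand: str) -> str:
    """Build a 3-byte SIC instruction: opcode (8) | x (1) | address (15)."""
    operand = operand.replace(" ", "")
    indexed = operand.upper().endswith(",X")
    symbol = operand[:-2] if indexed else operand
    word = (OPTAB[mnemonic] << 16) | (0x8000 if indexed else 0) | SYMTAB[symbol]
    return f"{word:06X}"


print(assemble_sic("STL", "RETADR"))      # 141033
print(assemble_sic("STCH", "BUFFER,X"))   # 549039
```

Indexed addressing only sets the x bit; the address field still holds the address of BUFFER.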
22
Functions of a Basic Assembler (Cont.)
23
Difficulties: Forward Reference
24
Symbolic Operands
□ We’re not likely to write memory addresses directly in our code.
■ Instead, we will define variable names.
□ Other examples of symbolic operands
■ Labels (for jump instructions)
■ Subroutines
■ Constants
26
Address Translation Problem
□ Forward reference
■ A reference to a label that is defined later in the program
□ The assembler cannot process such a statement when it first encounters it,
because the address of the label is not yet known
□ As a result, most assemblers make 2 passes over the source
program
■ 1st pass: scan label definitions and assign addresses
■ 2nd pass: actual translation (object code)
27
Assembler output format
□ Contains 3 types of records:
■ Header record:
Col. 1 H
Col. 2-7 Program name
Col. 8-13 Starting address of object program (hex)
Col. 14-19 Length of object program in bytes (hex)
■ Text record
Col.1 T
Col.2-7 Starting address for object code in this record (hex)
Col. 8-9 Length of object code in this record in bytes (hex)
Col. 10-69 Object code (hex) (2 columns per byte)
■ End record
Col. 1 E
Col. 2-7 Address of first executable instruction in object program (hex)
(END program_name)
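The column layout above maps directly onto fixed-width string formatting. The sketch below is illustrative: the function names are hypothetical, and `text_records` assumes the object codes are contiguous in memory (no RESB/RESW gaps, which would force a new Text record).

```python
def header_record(name: str, start: int, length: int) -> str:
    """Col. 1 'H', col. 2-7 name, col. 8-13 start address, col. 14-19 length."""
    return f"H{name:<6.6}{start:06X}{length:06X}"


def text_records(start: int, object_codes, max_bytes: int = 30):
    """Pack contiguous object codes (hex strings) into Text records.

    Col. 10-69 holds 60 hex digits, so one record carries at most 30 bytes.
    """
    records, chunk, chunk_start, addr = [], "", start, start
    for code in object_codes:
        if len(chunk) + len(code) > 2 * max_bytes:      # record is full
            records.append(f"T{chunk_start:06X}{len(chunk) // 2:02X}{chunk}")
            chunk, chunk_start = "", addr
        chunk += code
        addr += len(code) // 2
    if chunk:
        records.append(f"T{chunk_start:06X}{len(chunk) // 2:02X}{chunk}")
    return records


def end_record(first_executable: int) -> str:
    """Col. 1 'E', col. 2-7 address of first executable instruction."""
    return f"E{first_executable:06X}"


print(header_record("COPY", 0x1000, 0x107A))
for record in text_records(0x1000, ["141033", "482039", "001036"]):
    print(record)
print(end_record(0x1000))
```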
28
Assembler output format
29
Object Program for Fig 2.2 (Fig 2.3)
Header record: program name, starting address (hex), length of object
program in bytes (hex)
34
Object Program
35
Two Pass Assembler
● Read from input line
● LABEL, OPCODE, OPERAND
Source program → Pass 1 → Intermediate file → Pass 2 → Object codes
36
Algorithm for 2 Pass
Assembler (Fig 2.4)
□ Both pass1 and pass 2 need to read
the source program.
■ However, pass 2 needs more information
□ Location counter value, error flags
□ Intermediate file
■ Contains each source statement with its
assigned address, error indicators, etc
■ Used as the input to Pass 2
37
Intermediate File
Pass 1 assembler → Intermediate file → Pass 2 assembler → Object Program
38
Passes of an Assembler
□ Pass 1
● Separate contents of the label, mnemonic opcode and operand fields of a
statement.
● If a symbol is present in the label field, enter the pair (symbol, <LC>) in a
new entry of the symbol table.
● Check validity of the mnemonic opcode through a look-up in the optab.
● Perform LC processing, i.e., update the address contained in the location
counter by considering the opcode and operands of the statement.
□ Pass 2
● Obtain the machine opcode corresponding to the opcode from the
optab.
● Obtain the address of each memory operand from the Symbol table.
● Synthesize a machine instruction or the correct representation of a
constant, as the case may be.
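The two passes can be sketched as follows. This is a deliberately reduced illustration, not the algorithm of Fig 2.4: it assumes source lines already split into (label, opcode, operand) tuples, handles only a handful of opcodes plus WORD, RESW and RESB, and ignores indexed addressing, BYTE and error flags. The names `pass1`, `pass2` and `DATA_SIZE` are hypothetical.

```python
OPTAB = {"LDA": 0x00, "STA": 0x0C, "STL": 0x14, "J": 0x3C, "JSUB": 0x48, "RSUB": 0x4C}
DATA_SIZE = {                      # bytes added to LOCCTR by each data directive
    "WORD": lambda operand: 3,
    "RESW": lambda operand: 3 * int(operand),
    "RESB": lambda operand: int(operand),
}


def pass1(source):
    """Assign addresses: build SYMTAB and the intermediate file."""
    symtab, intermediate, locctr = {}, [], 0
    for label, opcode, operand in source:
        if opcode == "START":
            locctr = int(operand, 16)
        elif opcode != "END" and label:
            if label in symtab:
                raise ValueError(f"duplicate symbol {label}")
            symtab[label] = locctr                 # enter (symbol, <LC>)
        intermediate.append((locctr, opcode, operand))
        if opcode in OPTAB:
            locctr += 3                            # SIC instructions are 3 bytes
        elif opcode in DATA_SIZE:
            locctr += DATA_SIZE[opcode](operand)
        elif opcode not in ("START", "END"):
            raise ValueError(f"invalid operation code {opcode}")
    return symtab, intermediate


def pass2(symtab, intermediate):
    """Generate object code using OPTAB and the completed SYMTAB."""
    listing = []
    for loc, opcode, operand in intermediate:
        if opcode in OPTAB:
            address = symtab[operand] if operand else 0
            listing.append((loc, f"{OPTAB[opcode]:02X}{address:04X}"))
        elif opcode == "WORD":
            listing.append((loc, f"{int(operand):06X}"))
    return listing


source = [
    ("PROG", "START", "1000"),
    ("FIRST", "LDA", "FIVE"),      # forward reference to FIVE
    ("", "STA", "ALPHA"),          # forward reference to ALPHA
    ("", "RSUB", ""),
    ("FIVE", "WORD", "5"),
    ("ALPHA", "RESW", "1"),
    ("", "END", "FIRST"),
]
symtab, intermediate = pass1(source)
for loc, code in pass2(symtab, intermediate):
    print(f"{loc:04X} {code}")
```

Because SYMTAB is complete before pass 2 starts, the forward references to FIVE and ALPHA cause no difficulty.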
39
Algorithm for Pass 1 of
Assembler (Fig 2.4a)
41
Algorithm for Pass 2 of
Assembler (Fig 2.4b)
42
43
Assembler Design
□ Machine Dependent Assembler Features
■ instruction formats and addressing modes
■ program relocation
□ Machine Independent Assembler Features
■ literals
■ symbol-defining statements
■ expressions
■ program blocks
■ control sections and program linking
□ Assembler design Options
■ one-pass assemblers
■ multi-pass assemblers
□ Implementation example -MASM assembler.
44
Machine Dependent Assembler Features
□ Many features of an assembler depend on the machine architecture,
because they deal with how instructions use memory, registers, etc.
45
Instruction Format and Addressing Modes
□ The assembler converts each mnemonic to its opcode and each register
mnemonic to its numeric equivalent.
46
Instruction Format and Addressing Modes
□ Register-to-memory instructions are translated using PC-relative or
base-relative addressing.
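The choice between the two modes can be sketched for a SIC/XE format 3 instruction. This sketch assumes simple addressing (n = 1, i = 1) and tries PC-relative first, falling back to base-relative; the function name `format3` and its parameters are hypothetical.

```python
def format3(opcode: int, target: int, pc: int, base=None, indexed=False) -> str:
    """Assemble opcode(6) n i x b p e disp(12) with n = i = 1 and e = 0."""
    disp = target - pc
    if -2048 <= disp <= 2047:                  # PC-relative: signed 12 bits
        b, p = 0, 1
        disp &= 0xFFF                          # two's complement in 12 bits
    elif base is not None and 0 <= target - base <= 4095:
        b, p = 1, 0                            # base-relative: unsigned 12 bits
        disp = target - base
    else:
        raise ValueError("displacement out of range; format 4 would be needed")
    x = 1 if indexed else 0
    flags = (x << 3) | (b << 2) | (p << 1)     # x b p e
    word = ((opcode | 0b11) << 16) | (flags << 12) | disp
    return f"{word:06X}"


# STL RETADR with RETADR = 0030 and PC = 0003
print(format3(0x14, 0x0030, 0x0003))                          # 17202D
# J CLOOP with CLOOP = 0006 and PC = 001A (negative displacement)
print(format3(0x3C, 0x0006, 0x001A))                          # 3F2FEC
# STCH BUFFER,X with BUFFER = 0036, PC = 1051, B = 0033
print(format3(0x54, 0x0036, 0x1051, base=0x0033, indexed=True))  # 57C003
```

PC here is the address of the next instruction, since the program counter has already advanced when the instruction executes.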
47
Program Relocation
□ If the assembler does not know where the object code will be loaded in
memory, the object code generated by the assembler is called
relocatable code.
48
Need for Relocation
□ Relocation phenomenon occurs due to two reasons:
▪ When the assembler does not know where in memory the generated object
code will be placed, it generates relocatable object code. In this
case a relocating loader is used to load the object module anywhere
in memory.
49
Relocatable Program
□ The assembler does not know the address at which the program will be loaded
▪ But the assembler can identify for the loader which parts of the object
program need to be relocated.
50
Which instructions do not need relocation, and why?
□ Instructions whose operands do not refer to memory at all.
The operand value comes from a register, which does not change even if
the program is relocated.
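What the loader does with the parts marked for relocation can be sketched as follows. The sketch assumes each marked field is described by a starting byte offset and a length in half-bytes, as in the usual SIC/XE Modification record; the function name `relocate` is hypothetical.

```python
def relocate(object_code: bytearray, modifications, load_address: int) -> bytearray:
    """Add load_address to every address field listed in modifications.

    Each modification is (byte offset of the field, field length in half-bytes).
    """
    for offset, half_bytes in modifications:
        nbytes = (half_bytes + 1) // 2
        field = int.from_bytes(object_code[offset:offset + nbytes], "big")
        mask = (1 << (4 * half_bytes)) - 1            # low half-bytes = address
        value = ((field & mask) + load_address) & mask
        field = (field & ~mask) | value               # keep the untouched flag bits
        object_code[offset:offset + nbytes] = field.to_bytes(nbytes, "big")
    return object_code


# +JSUB RDREC assembled as 4B101036 at relative address 0006;
# its 20-bit address field starts in byte 7 and is 5 half-bytes long.
code = bytearray(6) + bytearray.fromhex("4B101036")
relocate(code, [(7, 5)], 0x5000)
print(code[6:10].hex().upper())   # 4B106036
```

Instructions that use PC-relative, base-relative or register operands never appear in the modification list, so they are copied unchanged.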
51
Machine Independent Assembler Features
□ Features which do not depend on the machine can be implemented for
any assembler.
52
Literals
□ A literal is a constant operand whose value is written as part of the
instruction that uses it.
□ The constant therefore gets its value at the point of use, without a
separate definition.
53
Literals (cont…)
□ The literal pool shows the addresses assigned to the literal operands and
the data values generated for them.
□ To place a literal pool at some other location in the object program,
the assembler directive LTORG is used.
□ When no LTORG statement is used, all literal operands are placed in a
single literal pool at the end of the program.
54
Handling of Literals
□ The basic data structure used by the assembler to handle literal is
LITTAB(Literal Table).
□ For each literal, this table contains the literal name, operand value,
length, and address assigned to the operand when it is placed in the
literal pool.
□ LITTAB is organized as a hash table with the literal name as the key.
□ During pass 1, the assembler searches LITTAB for the specified literal
name or value; if the literal is not already present, it is added to the table.
□ During pass 2, the operand address for use in generating code is
obtained by searching LITTAB for each literal operand when
encountered.
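A minimal sketch of LITTAB handling, assuming literals written as =C'...' or =X'...' and a dict as the hash table. The class name `LiteralTable` and its methods are hypothetical.

```python
class LiteralTable:
    """LITTAB sketch: literal name -> value, length and assigned address."""

    def __init__(self):
        self.entries = {}

    def add(self, name: str) -> None:
        """Pass 1: record a literal operand the first time it is seen."""
        if name in self.entries:
            return                                   # duplicates share one entry
        kind, body = name[1].upper(), name[3:-1]     # skip '=' and the quotes
        value = body.encode("ascii") if kind == "C" else bytes.fromhex(body)
        self.entries[name] = {"value": value, "length": len(value), "address": None}

    def place_pool(self, locctr: int) -> int:
        """On LTORG or at the end of the program: assign addresses, return new LOCCTR."""
        for entry in self.entries.values():
            if entry["address"] is None:
                entry["address"] = locctr
                locctr += entry["length"]
        return locctr

    def address_of(self, name: str) -> int:
        """Pass 2: operand address for a literal."""
        return self.entries[name]["address"]


littab = LiteralTable()
littab.add("=C'EOF'")
littab.add("=X'05'")
littab.add("=C'EOF'")                 # second use, no new entry
next_loc = littab.place_pool(0x002D)
print(f"{littab.address_of('=X' + chr(39) + '05' + chr(39)):04X}", f"{next_loc:04X}")
```

The starting address 002D used in the example is illustrative only.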
55
Symbol Defining Statements
□ These are statements that define symbols and help to assign
values to them. EQU, ORG
□ Labels on instructions or data areas
■ The value of such a label is the address assigned to the statement
on which it appears
□ Defining symbols
■ A special assembler directive called EQU(Equate) directive
allows the programmer to define symbols and specify their values.
56
Symbol Defining Statements
■ Format: symbol EQU value
□ Value can be constant or expression involving constants and previously
defined symbols
□ This statement defines the given symbol (enters it into SYMTAB) and
assigns it the value specified.
□ Usage:
■ Make the source program easier to understand
■ Example
MAXLEN EQU 4096
+LDT #MAXLEN
57
Object Program Using Literal (Fig 2.9 & 2.10)
58
Symbol Defining Statements
□ How does the assembler handle it?
■ In pass 1: when the assembler encounters the EQU statement,
it enters the symbol into SYMTAB for later reference.
■ In pass 2: assemble the instruction with the value of the symbol
59
Symbol Defining Statements
□ ORG (origin)
■ Assembler directive: ORG value
□ Value can be constant or expression involving constants and
previously defined symbols
60
Expressions
□ Most assemblers allow the use of expressions wherever a single
operand is permitted.
□ Each such expression must be evaluated by the assembler to produce a
single operand address or value.
□ Expressions are classified as either absolute expressions or relative
expressions depending upon the type of value they produce.
□ An absolute expression contains only absolute terms, or relative terms that
occur in pairs with opposite signs.
□ A relative expression is one in which all of the relative terms except one
can be paired; the remaining unpaired relative term must have a positive
sign.
□ Expressions that do not meet the conditions given for either absolute or
relative expressions should be flagged by the assembler as errors.
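The pairing rule can be checked by counting relative terms with their signs. The sketch below assumes expressions that use only + and -, with no parentheses; the function name `classify` is hypothetical.

```python
import re


def classify(expression: str, relative_symbols) -> str:
    """Return 'absolute', 'relative' or 'error' for a +/- expression."""
    net = 0                                        # positive minus negative relative terms
    for sign, term in re.findall(r"([+-]?)\s*([A-Za-z_]\w*|\d+)", expression):
        if term in relative_symbols:
            net += -1 if sign == "-" else 1
    if net == 0:
        return "absolute"                          # every relative term is paired
    if net == 1:
        return "relative"                          # one unpaired positive term
    return "error"


relative = {"BUFFER", "BUFEND"}                    # labels on relocatable storage
print(classify("BUFEND-BUFFER", relative))         # absolute
print(classify("BUFFER-1", relative))              # relative
print(classify("BUFEND+BUFFER", relative))         # error
print(classify("100-BUFFER", relative))            # error
```

A net count of 0 means every relative term is cancelled by one of opposite sign, and a net count of +1 leaves exactly one unpaired positive term.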
61
Program Blocks
□ A source program logically contains subroutines, data areas, etc.
□ The term program blocks refers to segments of code that are rearranged
within a single object program unit, and control sections refers to segments
that are translated into independent object program units.
□ Each program block may contain several separate segments of the source
program.
□ The assembler will logically rearrange these segments to gather together the
pieces of each block.
62
Program Blocks (cont..)
□ In pass 1, assembler will rearrange the segment of program block to
gather together the pieces of each block.
63
Program Blocks (cont..)
□ At the end of pass 1, the location counter of each block gives the
length of that block.
□ The assembler then assigns each block a starting address; the address of a
symbol is its location within the block plus the starting address of that
block.
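The address calculation for program blocks can be sketched as below. Block names, lengths and the symbol LENGTH are illustrative values only, and the function names are hypothetical.

```python
def assign_block_starts(block_lengths: dict) -> dict:
    """Lay the blocks out one after another, in order of first appearance."""
    starts, address = {}, 0
    for name, length in block_lengths.items():
        starts[name] = address
        address += length
    return starts


def absolute_address(symbol: str, symtab: dict, starts: dict) -> int:
    """Symbol value = start of its block + location within the block."""
    block, offset = symtab[symbol]
    return starts[block] + offset


# Lengths known at the end of pass 1 (final LOCCTR of each block)
lengths = {"default": 0x0066, "CDATA": 0x000B, "CBLKS": 0x1000}
symtab = {"LENGTH": ("CDATA", 0x0003)}            # (block, location within block)

starts = assign_block_starts(lengths)
print(f"{starts['CDATA']:04X}")                               # 0066
print(f"{absolute_address('LENGTH', symtab, starts):04X}")    # 0069
```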
64
Control Sections and Program Linking
□ A control section is a part of the program that maintains its identity after
assembly; each such control section can be loaded and relocated independently
of the others.
□ Different control sections are most often used for subroutines or other
logical subdivision of a program.
□ The two new record types are DEFINE and REFER Record.
65
Define Record
□ Col 1: D
□ Col 14-73: Repeat information in Col 2-13 for other external symbols.
66
Refer Record
□ Col 1: R
67
Modification Record
□ Col 1: M
Assembler Design Options
□ One-pass assemblers
□ Multi-pass assemblers
69
One-Pass Assemblers
Two Types of One-Pass Assemblers:
□ One type produces object code directly in memory for immediate
execution (load-and-go assembler)
□ The other type produces the usual kind of object code for later
execution.
71
One-Pass Assemblers
Load-and-Go Assembler
□ No object program is written out, no loader is needed
□ Useful for program development and testing
□ It avoids the overhead of writing the object program out and reading it back
in
□ Both one-pass and two-pass assemblers can be designed as load-and-go
□ However, one-pass also avoids the overhead of an additional pass over the
source program
□ For a load-and-go assembler, the actual address must be known at
assembly time.
72
Sample Program for a One-Pass
Assembler (Fig. 2.18)
75
Forward Reference Handling in One-pass Assembler
77
Example
□ Fig. 2.19 (a)
■ Show the object code in memory and symbol table entries
after scanning line 40
■ Line 15: forward reference (RDREC)
□ Object code is marked ----
□ Value in symbol table is marked as * (undefined)
□ Insert the address of operand (2013) in a list associated with
RDREC
■ Line 30 and Line 35: follow the same procedure
78
Object Code in Memory and SYMTAB
After scanning line 40
79
Example (Cont.)
□ Fig. 2.19 (b)
■ Show the object code in memory and symbol table entries after scanning line
160
■ Line 45: ENDFIL is defined
□ The assembler places its value in the SYMTAB entry
□ It inserts this value into the instruction operand field (at 201C) as directed by the forward reference list
■ Line 125: RDREC is defined
□ Follow the same procedure
■ Lines 65 and 155
□ Two new forward references (WRREC and EXIT)
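The forward reference list can be sketched for a load-and-go assembler as follows. Memory is modelled as a dict from operand-field address to value, and the class name `OnePassState` is hypothetical. The field addresses 2013 and 201C come from the example above; the addresses at which ENDFIL and RDREC are defined are illustrative values only.

```python
class OnePassState:
    """Object code in memory plus SYMTAB with forward reference lists."""

    def __init__(self):
        self.memory = {}    # operand field address -> value (None while undefined)
        self.symtab = {}    # symbol -> int address, or list of pending field addresses

    def reference(self, symbol: str, field_address: int) -> None:
        """An instruction at field_address uses symbol as its operand."""
        entry = self.symtab.get(symbol)
        if isinstance(entry, int):
            self.memory[field_address] = entry           # already defined
        else:
            self.symtab.setdefault(symbol, []).append(field_address)
            self.memory[field_address] = None            # shown as ---- in the figure

    def define(self, symbol: str, address: int) -> None:
        """The label is encountered: patch every field waiting for it."""
        pending = self.symtab.get(symbol)
        if isinstance(pending, int):
            raise ValueError(f"duplicate symbol {symbol}")
        for field_address in pending or []:
            self.memory[field_address] = address
        self.symtab[symbol] = address

    def undefined(self):
        """Symbols still marked * at the end of the program are errors."""
        return [s for s, e in self.symtab.items() if not isinstance(e, int)]


state = OnePassState()
state.reference("RDREC", 0x2013)      # line 15
state.reference("ENDFIL", 0x201C)     # line 30
state.define("ENDFIL", 0x2024)        # illustrative address
print(state.memory[0x201C] == 0x2024, state.undefined())
```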
80
Object Code in Memory and SYMTAB
After scanning line 160
81
Object Code in Memory and SYMTAB
Entries for Fig 2.18 (Fig. 2.19b)
82
One-Pass Assembler Producing Object Code
□ Forward references are entered into the symbol table’s list as before
■ If the operand contains an undefined symbol, use 0 as the address and write the
Text record to the object program.
□ However, when definition of a symbol is encountered, the assembler
must generate another Text record with the correct operand address.
□ When the program is loaded, this address will be inserted into the
instruction by loader.
□ The object program records must be kept in their original order when
they are presented to the loader
83
Multi-Pass Assemblers
84
Multi-Pass Assemblers (Cont.)
□ Multi-pass assemblers
■ Eliminate the restriction on EQU and ORG
■ Make as many passes as are needed to process the definitions of symbols.
□ Implementation
■ To facilitate symbol evaluation, each SYMTAB entry must indicate which
symbols depend on its value
■ Each entry keeps a linked list of the symbols whose values depend
on this entry
85
Example of Multi-pass Assembler Operation
(fig 2.21a)
86
Example of Multi-Pass
Assembler Operation (Fig 2.21b)
&1: one undefined symbol
87
Example of Multi-Pass
Assembler Operation (Fig 2.21c)
88
Example of Multi-pass
Assembler Operation (fig 2.21d)
89
Example of Multi-pass Assembler
Operation (fig 2.21e)
90
Example of Multi-pass Assembler Operation
(Fig 2.21f)
HALFSZ EQU MAXLEN/2
MAXLEN EQU BUFEND-BUFFER
PREVBT EQU BUFFER-1
.
.
.
BUFFER RESB 4096
BUFEND EQU *
BUFEND = *(PC) = 1034 (hex) + 4096 (decimal) = 1034 (hex) + 1000 (hex) = 2034 (hex)
HALFSZ = MAXLEN/2: 1000 ÷ 2 = 800 (hex)
Decimal value: 4096 ÷ 2 = 2048
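The dependency bookkeeping for this example can be sketched as follows. Each pending symbol keeps a count of undefined symbols (the &n of the figures), and each undefined symbol keeps a list of the symbols that depend on it. The class name `MultiPassSymtab` and the helper `evaluate` are hypothetical, and the evaluator is simplified to work strictly left to right with integer division.

```python
import re


def evaluate(expression: str, values: dict) -> int:
    """Evaluate terms joined by + - * /, strictly left to right."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|[-+*/]", expression)
    term = lambda tok: values[tok] if tok in values else int(tok)
    result = term(tokens[0])
    for op, tok in zip(tokens[1::2], tokens[2::2]):
        operand = term(tok)
        if op == "+":
            result += operand
        elif op == "-":
            result -= operand
        elif op == "*":
            result *= operand
        else:
            result //= operand
    return result


class MultiPassSymtab:
    def __init__(self):
        self.values = {}        # symbols whose value is known
        self.pending = {}       # symbol -> [expression, number of undefined symbols]
        self.dependents = {}    # symbol -> symbols whose value depends on it

    def define(self, symbol: str, expression) -> None:
        expression = str(expression)
        names = set(re.findall(r"[A-Za-z_]\w*", expression))
        missing = [n for n in names if n not in self.values]
        if missing:
            self.pending[symbol] = [expression, len(missing)]
            for name in missing:
                self.dependents.setdefault(name, []).append(symbol)
        else:
            self._resolve(symbol, expression)

    def _resolve(self, symbol: str, expression: str) -> None:
        self.values[symbol] = evaluate(expression, self.values)
        for dep in self.dependents.pop(symbol, []):
            self.pending[dep][1] -= 1
            if self.pending[dep][1] == 0:            # last undefined symbol resolved
                self._resolve(dep, self.pending.pop(dep)[0])


table = MultiPassSymtab()
table.define("HALFSZ", "MAXLEN/2")
table.define("MAXLEN", "BUFEND-BUFFER")
table.define("PREVBT", "BUFFER-1")
table.define("BUFFER", 0x1034)     # address from the location counter
table.define("BUFEND", 0x2034)     # value of * after RESB 4096
print({name: f"{value:X}" for name, value in table.values.items()})
```

Defining BUFFER resolves PREVBT immediately, and defining BUFEND resolves MAXLEN, which in turn resolves HALFSZ.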
91
Implementation Example
Microsoft MASM Assembler
□ Microsoft MASM assembler for Pentium and other x86 systems
□ Programmer of an x86 system views memory as a collection of segments.
□ A MASM assembly language program is written as a collection of segments.
92
Microsoft MASM Assembler (Cont.)
□ Code segments are addressed using register CS and stack segments using
register SS; these segment registers are automatically set by the system
loader when a program is loaded for execution. Data segments are normally
addressed using DS, ES, FS, or GS.
□ Assembler directive: ASSUME
■ By default, assembler assumes all references to data segments use register DS
■ We can change by the assembler directive ASSUME
■ e.g. ASSUME ES : DATASEG2
□ If the definition of the label TARGET occurs in the program before JMP
instruction, the assembler can tell whether this is a near jump or a far jump.
□ If it is a forward reference, MASM assumes it is a near jump.
□ So the programmer must warn the assembler.
94
Microsoft MASM Assembler (Cont.)
□ Problem: Jump with forward reference
■ By default, MASM assumes that a forward jump is a near jump
■ If it is a far jump, the programmer must tell the assembler.
□ E.g. JMP FAR PTR TARGET
■ If the jump address is within 128 bytes of the current instruction the
programmer can specify the shorter near jump by writing
JMP SHORT TARGET
97
Module 2 and 3 Important Questions
1. Compare SIC and SIC/XE machine architecture.
2. Explain the addressing modes and instruction sets in SIC
machine architecture with examples.
3. Write in detail data structures used by assembler.
4. Discuss the detailed design of a two-pass assembler with
algorithm.
5. Explain One pass assemblers and two/Multi pass assemblers
6. Explain machine dependent and machine independent
features of an assembler.
7. Explain in detail the features of MASM assembler.
8. What is the need for program relocation?
98