Assembly Language
Each personal computer has a microprocessor that manages the computer's arithmetical, logical
and control activities.
Each family of processors has its own set of instructions for handling operations such as
getting input from the keyboard, displaying information on the screen and performing various other
jobs. This set of instructions is called the 'machine language instructions'.
A processor understands only machine language instructions, which are strings of 1s and 0s.
However, machine language is too obscure and complex to use in software development. So a
low-level assembly language is designed for each family of processors; it represents the various
instructions in symbolic code and a more understandable form.
Assembly language lacks high-level conveniences such as variables and functions, and it is not
portable between various families of processors. Nevertheless, assembly language gives the
programmer more direct control over the processor than any high-level language, and it gives
programmers the insight required to write effective code in high-level languages.
Assembly language is very closely related to machine language, and there is usually a
straightforward way to translate programs written in assembly language into machine language.
(This algorithm is usually implemented by a program called the assembler.) Because of the close
relationship between machine and assembly languages, each different machine architecture
usually has its own assembly language (in fact, each architecture may have several), and each is
unique.
The advantage of programming in assembler (rather than machine language) is that assembly
language is much easier for a human to read and understand.
Example
A short program, written in assembly language, looks like this:
MOV AX, 47104
MOV DS, AX
MOV [3998], 36
INT 32
When an assembler reads the program above, it converts each line of code into one CPU-level
instruction. This program uses two types of instructions, MOV and INT. On Intel
processors, the MOV instruction copies data from one place to another, while the INT
instruction raises a software interrupt, which transfers processor control to a device driver or
the operating system.
The program still isn't quite clear, but it is much easier to understand than the equivalent
machine code. The first instruction, MOV AX, 47104, tells the computer to copy the number 47104
into the register AX. The next instruction, MOV DS, AX, tells the computer to copy the number
in AX into the register DS. The next instruction, MOV [3998], 36, tells the computer to put the
number 36 into memory location 3998. Finally, INT 32 exits the program by returning to the
operating system.
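The numbers in this example are easier to recognise in hexadecimal. The listing below is a sketch of the same four instructions in NASM syntax. It assumes a 16-bit real-mode DOS environment, an 80x25 colour text display and a .COM executable; none of this is stated in the text above. Under those assumptions, 47104 (0xB800) is the segment of colour text-mode video memory, offset 3998 is the character cell in the bottom-right corner of the screen, 36 is the ASCII code for '$', and interrupt 32 (0x20) is the DOS "terminate program" service.

```nasm
; Sketch only: assumes 16-bit DOS, 80x25 colour text mode, built as a .COM file.
        org 0x100               ; .COM programs are loaded at offset 0x100

        mov ax, 0xB800          ; 0xB800 = 47104, segment of colour text memory
        mov ds, ax              ; DS cannot be loaded with an immediate, so go via AX
        mov byte [3998], 36     ; 3998 = (24*80 + 79)*2, the last cell; 36 = '$'
        int 0x20                ; 0x20 = 32, DOS "terminate program"
```

NASM requires the size keyword byte on the third instruction, because neither operand is a register and the operand size would otherwise be ambiguous.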
Advantages of Assembly Language
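An understanding of assembly language provides knowledge of:
The interface of programs with the OS, processor and BIOS
The representation of data in memory and other external devices
How the processor accesses and executes instructions
How instructions access and process data
How a program accesses external devices
Other advantages of using assembly language are:
It requires less memory and execution time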
It allows hardware-specific complex jobs in an easier way
It is suitable for time-critical jobs
It is most suitable for writing interrupt service routines and other memory resident
programs.
Disadvantages of Assembly Language
It is long and tedious to write initially
It is quite bug-prone
Bugs can be very difficult to chase
Code can be fairly difficult to understand and modify, i.e. to maintain
Result is non-portable to other architectures, existing or upcoming
Code is optimized only for a particular implementation of an architecture: for
instance, among Intel-compatible platforms, each CPU design and its variations (relative
latency, throughput and capacity of processing units, caches, RAM, bus and disks;
presence of FPU, MMX, 3DNow! or other SIMD extensions, etc.) may call for completely
different optimization techniques.
You spend more time on a few details and cannot focus on the small- and large-scale
algorithmic design that is known to bring the largest part of the speed-up.
A small change in algorithmic design might completely invalidate all your existing
assembly code.
For code that is not too far from what is in standard benchmarks, commercial optimizing
compilers outperform hand-coded assembly.
Assembly Basic Syntax
An assembly program can be divided into three sections:
data section
bss section
text section
Data Section
The data section is used for declaring initialized data or constants. This data does not change at
runtime. You can declare constant values, file names, buffer sizes and so on in this section.
The syntax for declaring the data section is:
section .data
bss Section
The bss section is used for declaring variables, that is, for reserving space for data that has no
initial value. The syntax for declaring the bss section is:
section .bss
Text Section
The text section is used for keeping the actual code. This section must begin with the
declaration global main, which makes the label main visible to the linker so that it can mark
where program execution begins.
The syntax for declaring the text section is:
section .text
global main
main:
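Putting the three sections together, a minimal skeleton might look like the sketch below. It assumes the NASM assembler and 32-bit Linux, the same environment as the Hello World program in this section. The names greeting, greeting_len and buffer are invented for illustration.

```nasm
section .data                       ; initialized data and constants
greeting      db 'Hi', 0xa          ; three bytes: 'H', 'i', newline
greeting_len  equ $ - greeting      ; assemble-time constant, occupies no memory

section .bss                        ; space reserved without an initial value
buffer        resb 64               ; reserve 64 bytes

section .text                       ; the code itself
    global main
main:
    mov al, [greeting]              ; read the first byte of the initialized data
    mov [buffer], al                ; store it in the reserved buffer
    mov eax, 1                      ; system call number (sys_exit)
    mov ebx, 0                      ; exit status
    int 0x80                        ; call kernel
```

The data section stores actual bytes in the executable file, while the bss section only records how much space to set aside when the program is loaded.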
Comments
An assembly language comment begins with a semicolon (;). It may contain any printable character,
including blanks. It can appear on a line by itself, like:
; This program displays a message on screen
or, on the same line along with an instruction, like:
add eax, ebx ; adds ebx to eax
Assembly Language Statements
Assembly language programs consist of three types of statements:
1. Executable instructions or instructions
2. Assembler directives or pseudo-ops
3. Macros
The executable instructions or simply instructions tell the processor what to do. Each
instruction consists of an operation code (opcode). Each executable instruction generates one
machine language instruction.
The assembler directives or pseudo-ops tell the assembler about the various aspects of the
assembly process. These are non-executable and do not generate machine language instructions.
Macros are basically a text substitution mechanism.
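The three kinds of statement can be seen side by side in the short sketch below. It assumes NASM syntax and the 32-bit Linux system call convention. The macro exit_with, the constant SYS_EXIT and the variable answer are hypothetical names chosen for this illustration.

```nasm
%define SYS_EXIT 1          ; single-line macro: SYS_EXIT is replaced by 1

%macro exit_with 1          ; multi-line macro taking one parameter (%1)
    mov eax, SYS_EXIT
    mov ebx, %1
    int 0x80
%endmacro

section .data               ; directive: switches the assembler to the data section
answer  dd 42               ; pseudo-op: emits four bytes of data, not an instruction

section .text               ; directive
    global main             ; directive: exports the symbol main
main:
    mov ecx, [answer]       ; executable instruction: one machine instruction
    exit_with 0             ; macro call: expands into the three instructions above
```

The assembler expands the macro before it translates anything, so exit_with 0 leaves no trace in the machine code other than the three instructions it stands for.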
Syntax of Assembly Language Statements
Assembly language statements are entered one statement per line. Each statement has the
following format:
[label] mnemonic [operands] [;comment]
The fields in square brackets are optional. A basic instruction has two parts: the first is
the name of the instruction (the mnemonic) that is to be executed, and the second is the
operands, or parameters, of the command.
Following are some examples of typical assembly language statements:
INC COUNT        ; Increment the memory variable COUNT
MOV TOTAL, 48    ; Transfer the value 48 into the
                 ; memory variable TOTAL
ADD AH, BH       ; Add the content of the
                 ; BH register to the AH register
AND MASK1, 128   ; Perform an AND operation on the
                 ; variable MASK1 and 128
ADD MARKS, 10    ; Add 10 to the variable MARKS
MOV AL, 10       ; Transfer the value 10 to the AL register
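The statements above treat a bare variable name as the contents of that variable. NASM, the assembler assumed for this sketch, instead requires square brackets around a memory operand, plus an explicit size when no register is involved. The listing below shows one way the same statements could be written as a complete program for 32-bit Linux. The byte size and the initial values of the variables are assumptions made for illustration.

```nasm
section .data
count   db 0                    ; one-byte variables, initial values are arbitrary
total   db 0
mask1   db 0xFF
marks   db 50

section .text
    global main
main:
    inc byte [count]            ; increment the memory variable count
    mov byte [total], 48        ; store the value 48 in total
    add ah, bh                  ; add the content of BH to AH
    and byte [mask1], 128       ; AND mask1 with 128, keeping only bit 7
    add byte [marks], 10        ; add 10 to marks
    mov al, 10                  ; load the value 10 into AL
    mov eax, 1                  ; system call number (sys_exit)
    mov ebx, 0                  ; exit status
    int 0x80                    ; call kernel
```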
The Hello World Program in Assembly
The following assembly language code displays the string 'Hello World' on the screen:
section .text
    global main        ;must be declared for linker (ld)
main:                  ;tells linker entry point
    mov edx,len        ;message length
    mov ecx,msg        ;message to write
    mov ebx,1          ;file descriptor (stdout)
    mov eax,4          ;system call number (sys_write)
    int 0x80           ;call kernel
    mov eax,1          ;system call number (sys_exit)
    mov ebx,0          ;exit status (0 = success)
    int 0x80           ;call kernel
section .data
msg db 'Hello, world!', 0xa    ;our dear string
len equ $ - msg                ;length of our dear string
When the above code is assembled, linked and executed, it produces the following result:
Hello, world!