Assembly (x86)
Cheat Sheet (Part 1)
By: Zain Arshad Sandhu
This cheat sheet is based on the book Assembly Language for x86 by Kip Irvine and on some other articles. It is meant to give my fellows and readers a quick overview of assembly language.
© All rights reserved. Reproduction of this sheet is prohibited.
ASCII TABLE
Data Representation
Computers understand only machine language (binary: 0s and 1s).
Languages like Assembly, C/C++, Java, and Python help humans give instructions in a human-readable form, which is then translated (by an assembler, compiler, or interpreter) into machine instructions that the CPU executes.
Most commonly, hexadecimal digits (0-F) are used to display the contents of computer memory.
Binary Integers
To work with a computer, we have to communicate with it in its own language (0s and 1s).
Each ASCII character is assigned its own numeric code, which is stored in the computer as a pattern of bits (one byte per character).
Example: the integer 2 is represented in binary as the bit sequence 10 (a one followed by a zero).
MSB: Most Significant Bit, the bit on the left
LSB: Least Significant Bit, the bit on the right
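A small Python sketch (Python is used here only because it runs anywhere; the helper names to_bits, msb and lsb are hypothetical) that shows the bits of an integer with the MSB on the left and the LSB on the right:

```python
def to_bits(value: int, width: int = 8) -> str:
    """Binary digits of a non-negative value, MSB first (on the left)."""
    return format(value, f"0{width}b")

def msb(value: int, width: int = 8) -> int:
    """Most significant bit: the leftmost bit of a width-bit value."""
    return (value >> (width - 1)) & 1

def lsb(value: int) -> int:
    """Least significant bit: the rightmost bit."""
    return value & 1

print(to_bits(2, 2))                       # 10
print(to_bits(2))                          # 00000010
print(msb(0b10001111), lsb(0b10001110))    # 1 0
```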
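Unsigned Integer
Holds only non-negative values, i.e. all numbers >= 0.
Signed Integer
Holds both negative and positive values. In two's-complement form the MSB is the sign bit: 1 means the number is negative (< 0), 0 means it is >= 0.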
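To see that the same bit pattern means different things as unsigned and as signed, here is a minimal Python sketch, assuming 8-bit values and two's complement (the function names are made up for illustration):

```python
def as_unsigned(bits: int, width: int = 8) -> int:
    """Read a width-bit pattern as an unsigned integer (always >= 0)."""
    return bits & ((1 << width) - 1)

def as_signed(bits: int, width: int = 8) -> int:
    """Read a width-bit pattern as a two's-complement signed integer."""
    bits = as_unsigned(bits, width)
    if bits >> (width - 1):          # MSB (sign bit) is 1 -> negative
        return bits - (1 << width)
    return bits

pattern = 0b11111111
print(as_unsigned(pattern), as_signed(pattern))   # 255 -1
print(as_signed(0b01111111))                      # 127
```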
Storage Sizes in Assembly
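Basic storage unit in x86 machines: BYTE (8 bits). Larger units are WORD (2 bytes = 16 bits), DWORD (4 bytes = 32 bits), and QWORD (8 bytes = 64 bits).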
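The storage sizes can be checked with Python's struct module, assuming its standard-size format codes (B, H, I, Q) as stand-ins for BYTE, WORD, DWORD and QWORD; the STORAGE_UNITS table and the size_in_bytes helper are hypothetical:

```python
import struct

# "<" requests standard sizes with no padding, matching the x86 storage units.
STORAGE_UNITS = {
    "BYTE": "B",    # unsigned 8-bit
    "WORD": "H",    # unsigned 16-bit
    "DWORD": "I",   # unsigned 32-bit
    "QWORD": "Q",   # unsigned 64-bit
}

def size_in_bytes(unit: str) -> int:
    return struct.calcsize("<" + STORAGE_UNITS[unit])

for unit in STORAGE_UNITS:
    print(f"{unit}: {size_in_bytes(unit)} byte(s) = {size_in_bytes(unit) * 8} bits")
```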
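Hexadecimal Integers
The hexadecimal number system is used to represent machine code and memory contents compactly: each hex digit stands for exactly 4 bits.
0001 0110 1010 0111 1001 0100 (binary) in hex is 16A794h (each 4-bit group becomes one hex digit: 1, 6, A, 7, 9, 4).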
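The grouping rule above (4 bits per hex digit) can be sketched in Python as follows; bits_to_hex is a hypothetical helper, not a library function:

```python
def bits_to_hex(bits: str) -> str:
    """Map each 4-bit group of a binary string to one hex digit."""
    bits = bits.replace(".", "").replace(" ", "")   # drop group separators
    bits = bits.zfill((len(bits) + 3) // 4 * 4)     # left-pad to a multiple of 4
    groups = [bits[i:i + 4] for i in range(0, len(bits), 4)]
    return "".join("0123456789ABCDEF"[int(group, 2)] for group in groups)

print(bits_to_hex("0001.0110.1010.0111.1001.0100"))   # 16A794
```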
ASCII String
String: a sequence of one or more characters.
Terminology: in memory, an ASCII string is stored as a succession of bytes, each containing the ASCII (numeric) code of one character.
Example: ABC123 is stored as 41h, 42h, 43h, 31h, 32h, 33h.
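A quick Python check of the ABC123 example, assuming plain ASCII encoding:

```python
text = "ABC123"
codes = text.encode("ascii")     # one byte per character, holding its ASCII code
print(", ".join(f"{b:02X}h" for b in codes))   # 41h, 42h, 43h, 31h, 32h, 33h
```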
Boolean Operations
NOT ( ¬ or ~ ): reverses (inverts) the bit: 0 becomes 1 and 1 becomes 0
AND ( ∧ ): logical AND; the result is 1 only if both inputs are 1
OR ( ∨ or + ): logical OR; the result is 1 if at least one input is 1
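A Python sketch of the three operations as a truth table. Python's ~ inverts a whole (unbounded) integer, so the single-bit NOT below uses XOR with 1, and a byte-wide NOT is masked with 0FFh; the helper names are hypothetical:

```python
def bit_not(x: int) -> int:
    """NOT on a single bit."""
    return x ^ 1

def bit_and(x: int, y: int) -> int:
    return x & y

def bit_or(x: int, y: int) -> int:
    return x | y

print("x y | NOT x | x AND y | x OR y")
for x in (0, 1):
    for y in (0, 1):
        print(f"{x} {y} |   {bit_not(x)}   |    {bit_and(x, y)}    |   {bit_or(x, y)}")

# NOT applied to every bit of a byte: mask the result back to 8 bits
print(format(~0b10001111 & 0xFF, "08b"))   # 01110000
```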
Registers
Registers are high-speed storage locations inside the CPU; they are accessed directly, without a trip to main memory.
General-purpose registers (EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP): primarily used for arithmetic and data movement.
Other registers, flags, and linear data management:
see Kip Irvine, Assembly Language for x86 (6th edition, page 128).
Label: an identifier that acts as a place marker for instructions and data.
Data label: identifies the location of a variable.
NOTE: one data label can refer to multiple data items,
e.g. Array DWORD 1,2,3
Code label: marks the place in the code where instructions are located.
A code label ends with a colon ( : ).
Example:
codeLabel:
    instructions...
    jmp codeLabel
jmp codeLabel transfers control back to codeLabel, so all the instructions after the label are executed again.
Operands: an instruction can have zero to three operands.
Zero operands: stc ; sets the Carry flag (no operand)
One operand: inc ecx ; increments ECX by 1
Two operands: mov eax, 2 ; MOV has two operands (EAX = 2)
Three operands: the IMUL instruction has a three-operand form.
Example: imul eax, ebx, 20
eax (destination operand), ebx and 20 (source operands)
imul eax, ebx, 20 is equivalent to eax = ebx * 20
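Comments: non-executable text, written in plain English, that explains the code.
1. Single-line comments begin with a semicolon:
; comments here
2. Block comments use the COMMENT directive followed by a delimiter character of your choice; everything up to the next occurrence of that character is ignored:
COMMENT !
This code defines comments...
This is another line of definition of
comments
!
Another way (using & as the delimiter):
COMMENT &
This code defines comments...
This is another line of definition of comments
&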
First Look at Assembly Code
INCLUDE Irvine32.inc
INCLUDE directive: copies in all the definitions (procedure prototypes, constants, etc.) from Irvine32.inc.
.CODE directive: marks the area where all executable code is written.
PROC directive: identifies where a procedure (here, main) starts.
CALL: calls (transfers control to) a procedure. Irvine32 library procedures take their input from specific registers, for example:
call WriteInt ; displays the signed integer in EAX
call WriteString ; displays the null-terminated string whose offset is in EDX
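exit: halts the program (in Irvine32.inc, exit is defined to invoke the MS-Windows ExitProcess function).
ENDP: marks the end of a procedure (here, main).
END directive: marks the last line of the program to be assembled (END main also names the entry point).
.386 directive: identifies the minimum CPU required by the program (here, an Intel 386).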
.model flat, stdcall
.MODEL directive:
1. Identifies the memory (segmentation) model used by the program.
2. Identifies the calling convention used for passing parameters to procedures.
flat keyword: tells the assembler to generate code for protected mode, with a flat (unsegmented) memory model.
stdcall keyword: selects the calling convention used by MS-Windows functions, which makes it possible to call them.
PROTO directive: declares the prototype of a procedure.
ExitProcess: an MS-Windows function that halts the current program.
DumpRegs: an Irvine32 procedure that displays the contents of the registers.
INVOKE directive: an assembler directive that calls a procedure or function (and passes its arguments).
Template for ASM program
Variable Initialization
Syntax: Variable_name Variable_Type Variable_Content
e.g.
var WORD 5
myName BYTE "XYZ"
Variable without Content:
Syntax: Variable_name Variable_Type ?
var BYTE ?
Note : ? leaves the variable uninitialized
Initializer: an integer constant or expression whose value must fit in the storage size of the variable's type.
Example: value BYTE 256
The line above generates an error, because
the maximum value a BYTE can store is 255.
var SBYTE -129
ERROR: the minimum value an SBYTE can store is -128.
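The legal ranges follow from the bit width of the type. A minimal Python sketch (helper names hypothetical) that computes them and shows why both examples above are rejected:

```python
def unsigned_range(bits: int) -> tuple[int, int]:
    """Smallest and largest unsigned value that fits in 'bits' bits."""
    return 0, (1 << bits) - 1

def signed_range(bits: int) -> tuple[int, int]:
    """Smallest and largest two's-complement value that fits in 'bits' bits."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1

low, high = unsigned_range(8)              # BYTE
print(low, high, low <= 256 <= high)       # 0 255 False -> value BYTE 256 fails
low, high = signed_range(8)                # SBYTE
print(low, high, low <= -129 <= high)      # -128 127 False -> var SBYTE -129 fails
```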
Multiple Initializers:
A single variable name can label one or more data items, i.e. an ARRAY.
Syntax: variable_name data_type C1, C2, ..., Cn
C1, C2, ..., Cn are values separated by commas ( , ).
e.g.
array BYTE 10, 20, 30, 40
Note: the array label refers to the offset of the first entry in the data list (10 in our case).
Mixing data in different radixes:
list1 and list2 have the same content, even though their values are written in different radixes.
String Definitions
Syntax:
String_Name Type "string here.." or
String_Name Type 'string here..'
e.g.
myName BYTE "ABC...XYZ"
NULL-terminated string: ends with a null byte (containing 0).
Example: myName BYTE "ABC...XYZ",0
Term to know:
Each character of a string uses one byte of storage;
A, B, ..., Z each use 8 bits of storage.
Meaning:
⇒ myName BYTE 'A','B','C',...,'X','Y','Z',0 is the same as:
myName BYTE "ABC...XYZ",0
⇒ myName[0] = 'A', myName[2] = 'C', and so on.
Multiple lines of string:
String BYTE "Do you know how to program?",0dh,0ah
BYTE "It is not programming that is tricky"
BYTE "But the Problem to be solved by Language"
BYTE "Good Luck!",0dh,0ah,0
0dh: carriage return (CR)
0ah: line feed (LF)
When the string is displayed, each 0dh,0ah pair (CRLF) ends the current line and moves to the start of the next one. Lines without 0dh,0ah are simply joined to the following line when printed.
call Crlf (Irvine32 library) outputs the same CR/LF pair.
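A Python sketch of how the four BYTE lines sit in memory, assuming they are stored back to back as one byte sequence; it shows that only the places with 0dh,0ah produce a line break, so the middle lines are printed joined together:

```python
CRLF = bytes([0x0D, 0x0A])   # 0dh (CR), 0ah (LF)

# The four BYTE lines are stored back to back as one byte sequence.
message = (b"Do you know how to program?" + CRLF
           + b"It is not programming that is tricky"
           + b"But the Problem to be solved by Language"
           + b"Good Luck!" + CRLF + b"\x00")    # null terminator

text = message.rstrip(b"\x00").decode("ascii")
for line in text.split("\r\n"):
    print(repr(line))
```

To get one printed line per BYTE line, each of them would need its own 0dh,0ah.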
Line continuation character ( \ ): joins two source lines into a single statement.
Def_0f_Scientist_By_Me \
BYTE "You are the biggest scientist if you explore yourself",0
DUP operator
• DUP stands for duplicate.
• DUP allocates storage for multiple data elements of the same type.
• DUP is useful for declaring arrays and strings.
Examples:
BYTE 4 DUP("STACK"):
S, T, A, C, K each use one byte of storage, so
each "STACK" uses 5 bytes, and the 4 copies use 20 bytes (4 copies * 5 bytes each).
BYTE 20 DUP(0) takes 20 bytes of storage. How?
Size of list = 20 * size of data type = 20 * 1 byte = 20 bytes
For the WORD data type: size = 20 * 2 bytes = 40 bytes
For the DWORD data type: size = 20 * 4 bytes = 80 bytes
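The DUP arithmetic can be checked in Python, assuming struct's standard sizes for BYTE (B), WORD (H) and DWORD (I); dup_size is a hypothetical helper:

```python
import struct

def dup_size(count: int, code: str) -> int:
    """Bytes reserved by 'count DUP(...)' for one element of struct type code."""
    return count * struct.calcsize("<" + code)

print(dup_size(20, "B"))    # BYTE 20 DUP(0)  -> 20
print(dup_size(20, "H"))    # WORD 20 DUP(0)  -> 40
print(dup_size(20, "I"))    # DWORD 20 DUP(0) -> 80

# BYTE 4 DUP("STACK"): the 5-byte string repeated 4 times
print(len(b"STACK" * 4))    # 20
```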
Little-endian order (x86 stores data from the low byte to the high byte): the least significant byte is stored at the first (lowest) memory address.
val DWORD 12345678h ; 78h is stored at offset 0000, 56h at 0001, 34h at 0002, and 12h at 0003.
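Python's int.to_bytes can produce the same little-endian byte order, which makes the layout easy to inspect (offsets are relative to the start of val):

```python
value = 0x12345678                              # val DWORD 12345678h
stored = value.to_bytes(4, byteorder="little")  # x86 (little-endian) memory order

for offset, byte in enumerate(stored):
    print(f"{offset:04X}: {byte:02X}h")
# 0000: 78h
# 0001: 56h
# 0002: 34h
# 0003: 12h
```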
Declaring uninitialized data:
• The .DATA? directive declares uninitialized data efficiently.
• Data declared in .DATA? reduces the size of the compiled program.
( = ) symbol:
symbol_name = constant or integer expression
Size of Arrays and Strings:
arrayList BYTE 1,2,3,4
Explicitly: ArraySize = 4
Implicitly: ArraySize = ($ - arrayList)
Referencing and Dereferencing
A variable's content is accessed through the variable's address.
.data
var BYTE 30
.code
mov al, var ; dereferencing: accesses the data stored in var (AL = 30)
Let the address of var be 0x1000; then
0x1000 is the offset of var (another term for its address).
[ ] is used for dereferencing:
mov al, [var] ; AL = the byte stored at address 0x1000, i.e. 30
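A toy Python model of this, assuming memory is a bytearray indexed by offset and var sits at the offset 0x1000 used in the text (memory and VAR_OFFSET are hypothetical names):

```python
memory = bytearray(0x2000)   # hypothetical data segment, indexed by offset
VAR_OFFSET = 0x1000          # assumed address (offset) of var, as in the text

memory[VAR_OFFSET] = 30      # var BYTE 30

al = memory[VAR_OFFSET]      # mov al, [var]: read the byte stored at offset 0x1000
print(hex(VAR_OFFSET), al)   # 0x1000 30
```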
Assembly                     C/C++
mov destination, source      destination = source
mov eax, 10                  eax = 10
Left operand: destination (usually)
Right operand: source
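Variants of the MOV instruction (assuming var is a DWORD variable):
MOV eax, ebx ; register to register
MOV var, eax ; register to memory
MOV eax, var ; memory to register
MOV var, 5 ; immediate to memory
MOV eax, 5 ; immediate to register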
Rules for using the MOV instruction:
Destination and source must be the same size.
MOV EAX, BL ; ERROR
Destination and source cannot both be memory operands.
MOV var1, var2 ; ERROR
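MOVZX (move with zero-extend)
MOVZX copies the source into a larger destination and fills the remaining upper bits with zeros.
byteVal BYTE 10001111b
movzx ax, byteVal ; AX = 0000000010001111b
MOVSX (move with sign-extend)
MOVSX copies the source and fills the remaining upper bits with copies of the source's sign bit (its MSB). The source must be a register or memory operand, not an immediate value.
movsx ax, byteVal ; AX = 1111111110001111b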
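The two extension rules can be sketched in Python on the byteVal example; zero_extend and sign_extend are hypothetical helpers that mimic MOVZX and MOVSX on plain integers:

```python
def zero_extend(value: int, from_bits: int) -> int:
    """MOVZX-style: keep the low from_bits bits; every upper bit becomes 0."""
    return value & ((1 << from_bits) - 1)

def sign_extend(value: int, from_bits: int, to_bits: int) -> int:
    """MOVSX-style: fill the upper bits with copies of the sign bit (MSB)."""
    value = zero_extend(value, from_bits)
    if value >> (from_bits - 1):                        # sign bit is 1
        upper = ((1 << to_bits) - 1) ^ ((1 << from_bits) - 1)
        value |= upper                                  # set bits from_bits..to_bits-1
    return value

byte_val = 0b10001111
print(format(zero_extend(byte_val, 8), "016b"))       # 0000000010001111
print(format(sign_extend(byte_val, 8, 16), "016b"))   # 1111111110001111
```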
XCHG instruction (exchanges two values)
XCHG reg, reg
XCHG reg, mem
XCHG mem, reg
(XCHG mem, mem is not allowed.)
Basic Operational Instructions
mov - copy a value to another location
add - add two values
sub - subtract one value from another
jmp - jump to a new location in the program
mul - multiply two values
call - call a procedure
INC   inc ecx       ; ecx = ecx + 1
DEC   dec ecx       ; ecx = ecx - 1
ADD   add eax, ebx  ; eax = eax + ebx
SUB   sub eax, ebx  ; eax = eax - ebx
NEG (two's complement of a number)
.data
var SDWORD -1
.code
mov eax, 5
NEG reg   neg eax ; EAX = -5
NEG mem   neg var ; var = 1
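NEG computes the two's complement, which can be written as "invert all bits, then add 1". A minimal Python sketch, assuming 32-bit registers (neg is a hypothetical helper):

```python
def neg(value: int, bits: int = 32) -> int:
    """NEG-style two's complement: invert every bit, add 1, keep 'bits' bits."""
    mask = (1 << bits) - 1
    return (~value + 1) & mask

eax = neg(5)
print(hex(eax))            # 0xfffffffb, the 32-bit pattern for -5
print(neg(0xFFFFFFFF))     # 1 (0FFFFFFFFh is -1, so NEG gives 1)
```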
OFFSET Operator
Returns the offset of a variable: its distance, in bytes, from the beginning of its segment.
PTR Operator
Overrides the declared size of a memory operand, so that a specific number of bytes is accessed.
Case: there is a 32-bit (DWORD) array of 5 elements, and the 1st element must be stored in AX. How is this possible?
array DWORD 1,2,3,4,5
mov ax, array ; ERROR: instruction operands must be the same size
Recall: the MOV rules do not allow a source and a destination of different sizes.
AX can hold only a 16-bit integer.
Solution:
mov ax, WORD PTR array ; AX = low-order word of the 1st element (1)
Syntax:
instruction destination, size PTR variable_name
mov al, 256 ; error: invalid operands (256 does not fit in 8-bit AL)
Example:
val BYTE 5
add eax, DWORD PTR val ; adds the 4 bytes starting at val (val and the 3 bytes after it)
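A Python sketch of what WORD PTR does, assuming the array is laid out little-endian as x86 stores it; reading only the first 2 bytes gives the low-order word:

```python
import struct

# array DWORD 1,2,3,4,5 as it sits in memory: 5 little-endian 4-byte values
memory = struct.pack("<5I", 1, 2, 3, 4, 5)

# mov ax, WORD PTR array: read only the first 2 bytes at the array's offset
ax = int.from_bytes(memory[0:2], "little")
print(ax)                    # 1

# The same rule on a bigger value: WORD PTR reads the low-order word first
big = (0x12345678).to_bytes(4, "little")
low_word = int.from_bytes(big[0:2], "little")
print(hex(low_word))         # 0x5678
```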
TYPE Operator
Returns the size, in bytes, of a single element of a variable, according to its data type (e.g. BYTE = 1, WORD = 2, DWORD = 4).
LENGTHOF Operator
Counts the number of elements in an array, i.e. returns the length of the array.
arr BYTE 1, 2, 3, 4, 5 ; LENGTHOF arr = 5
SIZEOF Operator
Returns the size, in bytes, of a variable/array/string:
SIZEOF = TYPE * LENGTHOF
Example:
intArray BYTE 1,2,3
mov eax, SIZEOF intArray ; eax = 3
TYPE (in bytes) = 1 (size of the BYTE data type)
LENGTHOF = 3
SIZEOF = 3 * 1 = 3
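The relationship SIZEOF = TYPE * LENGTHOF, sketched in Python with struct's standard sizes standing in for BYTE, WORD and DWORD (all three helpers are hypothetical):

```python
import struct

def type_of(code: str) -> int:
    """TYPE: bytes per element ("B" = BYTE, "H" = WORD, "I" = DWORD)."""
    return struct.calcsize("<" + code)

def lengthof(values: list[int]) -> int:
    """LENGTHOF: number of elements."""
    return len(values)

def sizeof(code: str, values: list[int]) -> int:
    """SIZEOF = TYPE * LENGTHOF."""
    return type_of(code) * lengthof(values)

print(sizeof("B", [1, 2, 3]))       # intArray BYTE 1,2,3 -> 3
print(sizeof("H", [1, 2, 3, 4]))    # a WORD array of 4 elements -> 8
```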
Exercise answers (TYPE, LENGTHOF, SIZEOF):
A. eax = 1 (TYPE of BYTE)
B. eax = 4 (number of elements)
C. eax = 4 (TYPE * LENGTHOF)
D. eax = 2 (TYPE of WORD)
E. eax = 4 (number of elements)
F. eax = 8 (TYPE * LENGTHOF)
G. eax = 5 (TYPE * LENGTHOF)
LEA (load effective address) instruction:
Loads the calculated offset (address) of a memory operand into the specified register.
e.g. lea eax, array
mov eax, OFFSET array gives the same result here.
Arrays Data Structure
An array is an ordered set of elements of the same type.
e.g. 1, 2, 3, 4, 5, 6 are elements that form a set of integers.
In assembly language:
name_of_array type_of_array val1, val2, ..., valn
e.g. IntegerArray BYTE 1,2,3,4
Inside the memory for an 8-bit array:
Offset  Element  Access
0000    1        IntegerArray+0
0001    2        IntegerArray+1
0002    3        IntegerArray+2
0003    4        IntegerArray+3
Accessing Elements of Array
lea esi, IntegerArray
mov al, [esi]   ; 1st element
mov al, [esi+1] ; 2nd element
mov al, [esi+2] ; 3rd element
mov al, [esi+3] ; 4th element
e.g. IntegerArray WORD 1,2,3,4
Inside the memory for a 16-bit array:
Offset  Element  Access
0000    1        base+0
0002    2        base+2
0004    3        base+4
0006    4        base+6
Accessing Elements of Array
lea esi, IntegerArray ; base address
mov ax, [esi+0] ; 1st element
mov ax, [esi+2] ; 2nd element
mov ax, [esi+4] ; 3rd element
mov ax, [esi+6] ; 4th element
e.g. IntegerArray DWORD 1,2,3,4
Inside the memory for a 32-bit array:
Keep in mind, to get the next element of an array:
for an 8-bit array, add 1 to the general-purpose register
for a 16-bit array, add 2 to the general-purpose register
for a 32-bit array, add 4 to the general-purpose register
Offset (hex)  Element  Access
0000          1        base+0
0004          2        base+4
0008          3        base+8
000C          4        base+12
Accessing Elements of Array
lea esi, IntegerArray
mov eax, [esi+0]  ; 1st element
mov eax, [esi+4]  ; 2nd element
mov eax, [esi+8]  ; 3rd element
mov eax, [esi+12] ; 4th element
You can use any general-purpose register (not only ESI) to hold the offset of the array.
Simple Debugging
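Other techniques to access the elements of an array (assuming IntegerArray BYTE 1,2,3,4):
1. Indexed operand:
mov esi, 0 ; any general-purpose register
mov al, [IntegerArray+esi]
inc esi
2. Easy syntax, as in C/C++ (arrayName[index]):
Syntax: mov reg, arrayName[index]
e.g. mov al, IntegerArray[esi]
inc esi
e.g. mov al, IntegerArray[3] ; 4th element
Note: the value in brackets is a byte offset, not an element number; for WORD and DWORD arrays it must be scaled (see 3 below).
3. Scale factor as index (index * TYPE):
mov esi, 1 * TYPE IntegerArray ; esi = 1 * 1 byte
mov al, IntegerArray[esi]
Also:
mov esi, 0
mov al, IntegerArray[esi * TYPE IntegerArray]
For 16-bit and 32-bit arrays the formula stays the same, because the TYPE operator automatically gives the number of bytes per element (use AX or EAX as the destination to match the element size).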
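The offset tables and the scale-factor technique above all use the same formula: offset = base + index * TYPE. A Python sketch, assuming a little-endian byte buffer stands in for IntegerArray (element_offset and read_element are hypothetical helpers):

```python
import struct

def element_offset(base: int, index: int, elem_size: int) -> int:
    """Byte offset of element 'index': base + index * TYPE."""
    return base + index * elem_size

def read_element(memory: bytes, base: int, index: int, elem_size: int) -> int:
    """Like mov reg, [base + index*TYPE]: read elem_size little-endian bytes."""
    start = element_offset(base, index, elem_size)
    return int.from_bytes(memory[start:start + elem_size], "little")

for code in ("B", "H", "I"):                        # BYTE, WORD, DWORD arrays
    size = struct.calcsize("<" + code)
    memory = struct.pack("<4" + code, 1, 2, 3, 4)   # IntegerArray ... 1,2,3,4
    offsets = [element_offset(0, i, size) for i in range(4)]
    values = [read_element(memory, 0, i, size) for i in range(4)]
    print(size, offsets, values)
# 1 [0, 1, 2, 3] [1, 2, 3, 4]
# 2 [0, 2, 4, 6] [1, 2, 3, 4]
# 4 [0, 4, 8, 12] [1, 2, 3, 4]
```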
Pointers
A pointer is a variable that contains the address (offset) of another variable.
Pointers are used, for example, in dynamic memory allocation.
ptrB and ptrW point to arrayB and arrayW, respectively, and can also be written as:
ptrB DWORD OFFSET arrayB
ptrW DWORD OFFSET arrayW
Accessing data of arrays
mov esi, ptrB ; esi = offset of arrayB (say 0000)
mov al, [esi] ; al = 10 (first element of arrayB)
TYPEDEF Operator
Creates user-defined data types.
Ideal for creating pointer variables.
Syntax:
Type_name TYPEDEF PTR Data_type
e.g.
BytePtr TYPEDEF PTR BYTE ;pointer to BYTE
WordPtr TYPEDEF PTR WORD ;Pointer to Word
DWordPtr TYPEDEF PTR DWORD ;Pointer to DWord
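Python has no TYPEDEF, but its ctypes module offers a close analogue: ctypes.POINTER builds a pointer type for a given element type. A rough sketch, where the contents of arrayB and arrayW are assumptions (only arrayB's first value, 10, comes from the pointer example above):

```python
import ctypes

# Python analogues (hypothetical) of the TYPEDEFs above
BytePtr = ctypes.POINTER(ctypes.c_uint8)     # BytePtr TYPEDEF PTR BYTE
WordPtr = ctypes.POINTER(ctypes.c_uint16)    # WordPtr TYPEDEF PTR WORD

arrayB = (ctypes.c_uint8 * 3)(10, 20, 30)    # assumed contents
arrayW = (ctypes.c_uint16 * 3)(1, 2, 3)      # assumed contents

ptrB = ctypes.cast(arrayB, BytePtr)          # ptrB holds the address of arrayB
ptrW = ctypes.cast(arrayW, WordPtr)

print(ptrB[0])    # 10, like: mov esi, ptrB / mov al, [esi]
print(ptrW[1])    # 2, indexing steps 2 bytes per WORD element
print(ctypes.addressof(arrayB) == ctypes.cast(ptrB, ctypes.c_void_p).value)   # True
```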
Use Case:
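JMP and LOOP instructions
JMP transfers control unconditionally to the destination label; the instructions in between are skipped. (Conditional jumps such as JZ, JC, and JS jump only when the CPU status flags (ZF, CF, SF, PF, etc.) have the required values.)
LOOP repeats a block of instructions using ECX (CX in 16-bit mode) as a counter: each LOOP subtracts 1 from ECX and jumps back to the label if ECX is not zero; when ECX becomes 0, the loop terminates.
JMP syntax: JMP destination
e.g. JMP labelXYZ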
Use Case : Endless Loop ( infinite Loop )
loopX :
......
.....
jmp loopX
LOOP syntax: loop destination
label1:
....
....
loop label1
Use case: Print 1 to 5 elements
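A Python emulation of LOOP's counting rule (decrement ECX, jump back while it is not zero), used here to print 1 to 5; run_loop is a hypothetical helper and ECX is modeled as a 32-bit value:

```python
def run_loop(ecx: int, body) -> int:
    """Emulate LOOP: run the body, ECX = ECX - 1 (32-bit), repeat while ECX != 0.

    Returns the number of iterations. Starting with ECX = 0 would wrap to
    0FFFFFFFFh, so the real LOOP would run 4,294,967,296 times.
    """
    iterations = 0
    while True:
        iterations += 1
        body(iterations)                   # label1: ... body ...
        ecx = (ecx - 1) & 0xFFFFFFFF       # loop label1: decrement ECX
        if ecx == 0:                       # fall through when ECX = 0
            return iterations

printed = []
run_loop(5, printed.append)     # mov ecx, 5 / label1: ... / loop label1
print(printed)                  # [1, 2, 3, 4, 5]
```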
Some Coding ERRORs
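Nested Loops: a loop within another loop
Basic syntax (both loops use ECX, so save the outer loop's count before the inner loop and restore it afterward):
label1:            ; outer loop
    ...
    push ecx       ; save the outer loop count
    mov ecx, n     ; inner loop count (n is a placeholder)
label2:            ; inner loop
    ...
    loop label2
    pop ecx        ; restore the outer loop count
    loop label1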
Use Case: Each PAKISTAN with 2 ZindaBaad
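A Python emulation of the nested-loop pattern for the PAKISTAN / ZindaBaad use case, assuming the outer loop runs twice and both counts are at least 1 (the function name and the outer count are hypothetical). One variable plays ECX for both loops, so the outer count is saved and restored just like push ecx / pop ecx:

```python
def pakistan_zindabaad(outer_count: int, inner_count: int = 2) -> list[str]:
    """Nested-LOOP emulation; one variable plays ECX for both loops."""
    lines = []
    ecx = outer_count               # mov ecx, outer_count
    while True:                     # label1: outer loop
        lines.append("PAKISTAN")
        saved_ecx = ecx             # push ecx (save the outer count)
        ecx = inner_count           # mov ecx, 2
        while True:                 # label2: inner loop
            lines.append("ZindaBaad")
            ecx -= 1                # loop label2: decrement ECX ...
            if ecx == 0:            # ... and repeat until it reaches 0
                break
        ecx = saved_ecx             # pop ecx (restore the outer count)
        ecx -= 1                    # loop label1
        if ecx == 0:
            return lines

print("\n".join(pakistan_zindabaad(2)))
```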
while loop
Another way to build a loop structure is the .WHILE directive.
Syntax:
.while (condition is true)
....
....
.endw
Use Case: Print PAKISTAN 5 TIMES
A .while loop does not change ECX automatically (unlike LOOP). If you use ECX (or any other counter) in the condition, the loop body must update it, so that the condition eventually becomes false and the loop terminates.
All of these conditions are allowed:
.while (destination < source)
.while (destination <= source)
.while (destination > source)
.while (destination >= source)
.while (destination == source)
.while (destination != source)
.while (!destination)
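The "Print PAKISTAN 5 TIMES" use case as a Python while loop; as with .while, nothing updates the counter automatically, so the body does it (print_n_times is a hypothetical helper):

```python
def print_n_times(word: str, times: int) -> list[str]:
    """Print 'word' 'times' times using an explicit counter; return the lines."""
    lines = []
    ecx = 0                     # mov ecx, 0 (.while will not change it for us)
    while ecx < times:          # .while (ecx < times)
        print(word)             # e.g. call WriteString
        lines.append(word)
        ecx += 1                # inc ecx: the base condition that ends the loop
    return lines                # .endw

print_n_times("PAKISTAN", 5)
```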
Equal-Sign Directive ( = ): associates a symbol name with an integer expression.
Syntax: symbol_name = expression
e.g. lengthofArray = 5
e.g. counter = 10
Where to define ( = ):
a ( = ) symbol can be defined in the .data or the .code section.
Note: a symbol defined with = (such as counter) can be redefined anywhere later in the program; the new value applies at assembly time, from that point in the source onward.
Current location counter ( $ ):
Returns the offset associated with the current program statement.
selfPtr DWORD $ ; contains the offset of selfPtr
Understanding:
array BYTE 1,2,3,4
arraySize = ($ - array)
Let the offset of array = 00406000 and the current offset ($) = 00406004.
The distance ($ - array) = 00000004, so arraySize is 4.
Keep in mind:
if you do not calculate the size immediately after the array declaration, the size will be incorrect.
See example:
array BYTE 1,2,3,4
count BYTE ?
arraySize = ($ - array)
Assuming the offsets from the example above, arraySize = 5, because the byte of count is included.
Number of Array Elements: 32-bit and 16-bit
WordArray WORD 1,2,3,4
DwordArray DWORD 1,2,3,4
To get the number of elements of a 16-bit array (written immediately after the WordArray declaration):
arraySize = ($ - WordArray) / 2
Why divide by 2?
($ - WordArray) gives the size of the array in bytes,
and consecutive elements are 16 bits (2 bytes) apart in storage.
Elements   1     2     3     4
Offsets    0000  0002  0004  0006
If the offset of WordArray = 0x0000,
the current location ($) = 0x0008,
then
arraySize = (0x0008 - 0x0000) / 2 = 8 / 2 = 4 elements
To get the number of elements of a 32-bit array (written immediately after the DwordArray declaration):
arraySize = ($ - DwordArray) / 4
Why divide by 4? Each DWORD element occupies 4 bytes:
Elements       1     2     3     4
Offsets (hex)  0000  0004  0008  000C
If the offset of DwordArray = 0x0000,
the current location ($) = 0x0010 (10h = 16 decimal),
then arraySize = (0x0010 - 0x0000) / 4 = 16 / 4 = 4 elements
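The ($ - array) / TYPE calculation can be sketched in Python by treating each array as its byte image, assuming struct's standard sizes for WORD (2) and DWORD (4); element_count is a hypothetical helper:

```python
import struct

def element_count(array_bytes: bytes, elem_size: int) -> int:
    """($ - array) / TYPE: bytes from the array's start to '$', divided by element size."""
    return len(array_bytes) // elem_size

byte_array = struct.pack("<4B", 1, 2, 3, 4)    # array BYTE 1,2,3,4
word_array = struct.pack("<4H", 1, 2, 3, 4)    # WordArray WORD 1,2,3,4
dword_array = struct.pack("<4I", 1, 2, 3, 4)   # DwordArray DWORD 1,2,3,4

print(len(word_array), element_count(word_array, 2))     # 8 4
print(len(dword_array), element_count(dword_array, 4))   # 16 4

# The pitfall above: if 'count BYTE ?' is declared before '$' is used,
# its byte is counted too.
with_count = byte_array + bytes(1)
print(element_count(with_count, 1))                      # 5, not 4
```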
NOTE: I have tried to solve and debug the examples used in this sheet with the help of the MASM assembler. But, being human, I would say: "Mistakes are made and solved by humans."
Regards: Z.A Sandhu
THANKS