4 Disassembler Example

This document provides a detailed guide on the linear sweep disassembly algorithm, illustrating how to disassemble machine code step-by-step. It includes specific examples of machine code instructions, their corresponding opcodes, and how to interpret them using ModR/M and SIB bytes. The document serves as a resource for a Disassembler Programming Assignment, emphasizing the importance of reading binary files correctly.

Uploaded by

nightmarepuma
Disassembler Example

REVA

Introduction
This document walks through the linear sweep disassembly algorithm step by step, using a
variety of the instructions you will encounter. Not every instruction format or addressing
mode is represented here, but the document should serve as a guide for the Disassembler
Programming Assignment. Note that opcode 6A (push imm8) is not required for your homework.

1 The Machine Code


Below is the machine code that we will be disassembling. This machine code will be available
to you in a binary file, so your program will have to read these values in. Remember to open
the file in binary mode: byte values that look like end-of-line or NULL terminators can appear
in the machine code, and text mode may translate or truncate them.

6A 01 68 04 00 00 00 E8 0C 00 00 00 8B 44 24 04
03 84 24 08 00 00 00 C3 55 89 E5 B8 00 00 00 00
C7 45 00 44 33 22 11 FF 75 0C FF 75 08 E8 DA FF
FF FF 59 59 85 C0 75 05 B8 FF FF FF FF 5D C3

2 The Disassembly
2.1 Opening the Binary File and Reading Bytes
After opening the file for (binary) reading, we must begin reading in the bytes. You may do this
all at once, or you may choose to read the bytes in one at a time. Either choice is fine. For our
examples, you will not run out of memory so reading everything in at once is acceptable!
Keep in mind, as we parse byte-by-byte, we may not know what the instruction is until we
parse the ModR/M byte (recall the /digit cases).
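
As a minimal sketch of the read step, assuming Python and a hypothetical helper named read_machine_code (neither is prescribed by the assignment), the whole file can be read in binary mode like this. The demo writes a throwaway file first so the example is self-contained.

```python
import os
import tempfile


def read_machine_code(path):
    """Return the whole file as a bytes object (hypothetical helper)."""
    with open(path, "rb") as f:  # "rb": no newline translation, no text decoding
        return f.read()


# Demo: a throwaway file containing bytes that text mode could mangle
# (0x0A, 0x0D, and 0x00 are all legal inside machine code).
sample = bytes([0x6A, 0x01, 0x0A, 0x0D, 0x00, 0xC3])
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(sample)
    demo_path = tmp.name

code = read_machine_code(demo_path)
os.remove(demo_path)

end = len(code) - 1  # address of the last byte read
print(len(code), f"{end:08X}")
```

Indexing a bytes object yields integers in the range 0 to 255, which is convenient for the bit manipulation used later.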

2.2 Parsing the Input


1. First, we set our counter (address) to 00000000 and end to 0000003E, the address of the
last of the 63 bytes read. After each instruction is disassembled, we increment the
counter by the size of the instruction. If a byte cannot be processed (invalid or
unsupported opcode), the current counter is displayed along with the hexadecimal
representation of the byte. See below as an example:

0000000A: 99 db 0x99
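
The fallback line above can be produced with a small formatter. This is only a sketch: the function name format_unknown_byte is hypothetical, and it assumes the counter then advances by one byte so the sweep can continue.

```python
def format_unknown_byte(address, byte):
    """Format one undecodable byte as a 'db' line (hypothetical helper)."""
    return f"{address:08X}: {byte:02X} db 0x{byte:02X}"


counter = 0x0A
line = format_unknown_byte(counter, 0x99)
counter += 1  # assumption: skip exactly one byte and keep sweeping
print(line)
```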
2. Our counter points to the byte 6A. When we look up this opcode in the Intel Instruction
Manual, we recognize this as a push imm8.

(a) This tells us that this instruction is two bytes: 1 opcode + 1 byte for the imm8.
(b) This imm8 follows the opcode and is the byte 01.
(c) Our output should be as follows, which will be added to the dictionary.
00000000: 6A01 push 0x01

(d) counter += 2 −→ 00000002.

3. counter points to byte 68 which we recognize as push imm32.

(a) This instruction is 5 bytes: 1-byte opcode + 4-byte immediate.


(b) The imm32 follows the opcode and is the byte sequence 04 00 00 00.
(c) We know that Intel is little-endian, so accounting for endianness the imm32 is the
decimal value 4 or the hexadecimal value 0x00000004.
(d) Add the formatted instruction to the dictionary.
00000002: 6804000000 push 0x00000004

(e) counter += 5 −→ 00000007.
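
The little-endian conversion in this step can be sketched in Python with int.from_bytes. The helper name read_imm32 and the variable names are assumptions made for illustration.

```python
code = bytes.fromhex("6A 01 68 04 00 00 00")  # first seven bytes of the sample


def read_imm32(buf, offset):
    """Interpret 4 bytes starting at offset as a little-endian unsigned value."""
    return int.from_bytes(buf[offset:offset + 4], "little")


counter = 2                            # address of the 68 opcode
imm32 = read_imm32(code, counter + 1)  # immediate starts right after the opcode
raw = code[counter:counter + 5].hex().upper()
line = f"{counter:08X}: {raw} push 0x{imm32:08X}"
counter += 5
print(line)
```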

4. counter points to byte E8 which we recognize as call rel32.

(a) This instruction is 5 bytes: 1-byte opcode + 4-byte relative displacement.


(b) The rel32 follows the opcode and is relative to the end of the current instruction.
The target address is the address of the current instruction + the instruction size +
displacement. Note that our counter is tracking the address of each instruction.
target = counter + instr_len + rel32
(c) Accounting for endianness, the displacement byte sequence of 0C 00 00 00 is
0x0000000C.
(d) target = 0x00000007 + 0x5 + 0x0000000C.
(e) target = 0x00000018. Recall that targets are 32-bit so we must truncate the
value. In this case, the value can already be represented with 32 bits so no truncation
is required, but it is good to remember this.
(f) Add the formatted instruction to the dictionary.

00000007: E80C000000 call offset_00000018h

(g) counter += 5 −→ 0000000C.


(h) Add the label offset_00000018h to a label dictionary.

5. counter points to byte 8B which we recognize as mov r32, r/m32.

(a) We have to inspect at least one more byte to decode the instruction. The byte
following the opcode is the ModR/M byte.
(b) Read and decode the next byte, 44, as the ModR/M byte: 01 000 100.
• MOD = 0b01 means there is a disp8.
• REG = 0b000 means r32 (1st operand) is eax.
• R/M = 0b100 means a SIB byte follows!
(c) Read and decode the next byte, 24, as the SIB byte: 00 100 100.
• SS = 0b00 means the index is scaled by 1.
• INDEX = 0b100 means there is no index register.
• BASE = 0b100 means the base is esp.
(d) Read the next byte as the disp8: 04 is a displacement of 0x04.
(e) This instruction is 4 bytes: 1-byte opcode + 1-byte ModR/M + 1-byte SIB + 1-byte
displacement.
(f) Add the formatted instruction to the dictionary.

0000000C: 8B442404 mov eax, [ esp + 0x04 ]

(g) counter += 4 −→ 00000010.
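
Both the ModR/M byte and the SIB byte use the same 2-3-3 bit layout, so one helper can split either. The sketch below assumes Python; split_fields and REG32 are hypothetical names, with REG32 listing the 32-bit registers in encoding order.

```python
# 32-bit registers indexed by their 3-bit encoding.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def split_fields(byte):
    """Split a ModR/M or SIB byte into its (2-bit, 3-bit, 3-bit) fields."""
    return (byte >> 6) & 0b11, (byte >> 3) & 0b111, byte & 0b111


mod, reg, rm = split_fields(0x44)      # ModR/M byte from 8B 44 24 04
scale, index, base = split_fields(0x24)  # SIB byte from the same instruction

print(f"MOD={mod:02b} REG={reg:03b} R/M={rm:03b}")
print(f"SS={scale:02b} INDEX={index:03b} BASE={base:03b}")
print(REG32[reg], REG32[base])
```

Here R/M = 0b100 signals that a SIB byte follows, and INDEX = 0b100 signals that there is no index register, so those two fields are not looked up in REG32.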

6. counter points to byte 03 which we recognize as add r32, r/m32.

(a) ModR/M is required to interpret the r/m32 operand.


(b) Read and decode the next byte, 84, as the ModR/M byte: 10 000 100.
• MOD = 0b10 means there is a disp32.
• REG = 0b000 means r32 (1st operand) is eax.
• R/M = 0b100 means a SIB byte follows!
(c) Read and decode the next byte, 24, as the SIB byte: 00 100 100.
• SS = 0b00 means the index is scaled by 1.
• INDEX = 0b100 means there is no index register.
• BASE = 0b100 means the base is esp.
(d) Read the next 4 bytes as the displacement, 08 00 00 00 becomes a displacement
of 0x00000008.
(e) This instruction is 7 bytes: 1-byte opcode + 1-byte ModR/M + 1-byte SIB + 4-byte
displacement.
(f) Add the formatted instruction to the dictionary (we may emit dword inside the
brackets to mark the 32-bit displacement).
00000010: 03842408000000 add eax, [ dword esp + 0x00000008 ]

(g) counter += 7 −→ 00000017.

7. counter points to byte C3 which we recognize as retn.

(a) This is a 1-byte instruction with no operands.


(b) Add the formatted instruction to the dictionary.

00000017: C3 retn

(c) counter += 1 −→ 00000018.

8. counter points to byte 55 which we recognize as push r32.

(a) This is a 1-byte instruction where the register being pushed is encoded in the opcode.
To disassemble, we compute 0x55 - 0x50 = 5 (or 0b101), which is the ebp register.
(b) Add the formatted instruction to the dictionary.

00000018: 55 push ebp

(c) counter += 1 −→ 00000019.
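
The subtraction in this step amounts to a table lookup. A minimal sketch, assuming Python and the hypothetical names REG32 and decode_push_r32:

```python
# 32-bit registers indexed by their 3-bit encoding.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_push_r32(opcode):
    """Decode opcodes 0x50-0x57 (push r32); the register is opcode - 0x50."""
    if not 0x50 <= opcode <= 0x57:
        raise ValueError(f"not a push r32 opcode: 0x{opcode:02X}")
    return f"push {REG32[opcode - 0x50]}"


print(decode_push_r32(0x55))
```

The same pattern applies to any opcode family that adds the register number to a base opcode; only the base value changes.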

9. counter points to byte 89 which we recognize as mov r/m32, r32.

(a) Read and decode the next byte, E5, as the ModR/M byte: 11 100 101.
• MOD = 0b11 means r/m32 is a register, not memory.
• REG = 0b100 means r32 (2nd operand) is esp.
• R/M = 0b101 means r/m32 (1st operand) is ebp.
(b) This instruction is 2 bytes: 1-byte opcode + 1-byte ModR/M.
(c) Add the formatted instruction to the dictionary.
00000019: 89E5 mov ebp, esp

(d) counter += 2 −→ 0000001B.

10. counter points to byte B8 which we recognize as mov r32, imm32.

(a) This is another register-in-opcode encoding, but it is also followed by an imm32 (id).
(b) The instruction is 5 bytes: 1-byte opcode + 4-byte immediate.
(c) We take the given opcode from the file (0xB8) and subtract 0xB8 to get 0, or 0b000
which is eax (the r32 operand).
(d) Read the next 4 bytes as the immediate, 00 00 00 00 becomes the immediate
value 0x00000000.
(e) Add the formatted instruction to the dictionary.

0000001B: B800000000 mov eax, 0x00000000

(f) counter += 5 −→ 00000020.

11. counter points to byte C7 which cannot be directly decoded to an instruction mnemonic.

(a) We must read the ModR/M byte and inspect the REG field to determine the mnemonic.
(b) Read and decode the next byte, 45, as the ModR/M byte: 01 000 101.
• MOD = 0b01 means a disp8 follows.
• REG = 0b000 means an extension of /0 indicating mov r/m32, imm32.
• R/M = 0b101 means r/m32 (1st operand) is ebp.
(c) The instruction is 7 bytes: 1-byte opcode + 1-byte ModR/M + 1-byte displacement
+ 4-byte immediate.
(d) Read the next byte, 00, as the disp8 giving a displacement of 0x00.
(e) Read the next 4 bytes as the immediate, 44 33 22 11 becomes the immediate
value 0x11223344.
(f) Add the formatted instruction to the dictionary. We know to use dword because of
the imm32.
00000020: C7450044332211 mov dword [ ebp + 0x00 ], 0x11223344

(g) counter += 7 −→ 00000027.

12. counter points to byte FF which cannot be directly decoded to an instruction mnemonic.

(a) We must read the ModR/M byte and inspect the REG field to determine the mnemonic.
(b) Read and decode the next byte, 75, as the ModR/M byte: 01 110 101.
• MOD = 0b01 means a disp8 follows.
• REG = 0b110 means an extension of /6 indicating push r/m32.
• R/M = 0b101 means r/m32 is ebp.
(c) The instruction is 3 bytes: 1-byte opcode + 1-byte ModR/M + 1-byte displacement.
(d) Read the next byte, 0C, as the disp8 giving a displacement of 0x0C.
(e) Add the formatted instruction to the dictionary. We know to insert dword because
of the r/m32 operand.

00000027: FF750C push dword [ ebp + 0x0C ]

(f) counter += 3 −→ 0000002A.
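
For opcodes such as C7 and FF, the mnemonic depends on the REG field of the ModR/M byte (the /digit). One way to sketch the lookup in Python is a table keyed by (opcode, digit). The table name EXTENSIONS and the function mnemonic_for are hypothetical, and the table holds only the two entries seen in this walkthrough.

```python
# (opcode, /digit) -> mnemonic. Only the entries used in this document.
EXTENSIONS = {
    (0xC7, 0): "mov",   # C7 /0: mov r/m32, imm32
    (0xFF, 6): "push",  # FF /6: push r/m32
}


def mnemonic_for(opcode, modrm):
    """Return the mnemonic selected by the REG field, or None if unsupported."""
    digit = (modrm >> 3) & 0b111
    return EXTENSIONS.get((opcode, digit))


print(mnemonic_for(0xC7, 0x45))  # REG = 0b000
print(mnemonic_for(0xFF, 0x75))  # REG = 0b110
```

A None result means this sketch's table has no entry for that opcode and digit, which a disassembler could treat like any other unsupported opcode.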

13. counter points to byte FF which cannot be directly decoded to an instruction mnemonic.

(a) We must read the ModR/M byte and inspect the REG field to determine the mnemonic.
(b) Read and decode the next byte, 75, as the ModR/M byte: 01 110 101.
• MOD = 0b01 means a disp8 follows.
• REG = 0b110 means an extension of /6 indicating push r/m32.
• R/M = 0b101 means r/m32 is ebp.
(c) The instruction is 3 bytes: 1-byte opcode + 1-byte ModR/M + 1-byte displacement.
(d) Read the next byte, 08, as the disp8 giving a displacement of 0x08.
(e) Add the formatted instruction to the dictionary. We know to insert dword because
of the r/m32 operand.
0000002A: FF7508 push dword [ ebp + 0x08 ]

(f) counter += 3 −→ 0000002D.

14. counter points to byte E8 which we recognize as call rel32.

(a) This instruction is 5 bytes: 1-byte opcode + 4-byte relative displacement.


(b) The rel32 follows the opcode and is relative to the end of the current instruction.
The target address is the address of the current instruction + the instruction size +
displacement. Note that our counter is tracking the address of each instruction.
target = counter + instr_len + rel32
(c) Accounting for endianness, the displacement byte sequence of
DA FF FF FF is 0xFFFFFFDA.
(d) target = 0x0000002D + 0x5 + 0xFFFFFFDA.
(e) target = 0x10000000C (this is a 33-bit value). Recall that targets are 32-bit so
we must truncate the value to 0x0000000C.
(f) Add the formatted instruction to the dictionary.

0000002D: E8DAFFFFFF call offset_0000000Ch

(g) counter += 5 −→ 00000032.


(h) Add the label offset_0000000Ch to a label dictionary.
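
The target computation and the 32-bit truncation can be sketched together in Python. The function name rel32_target is an assumption; the mask 0xFFFFFFFF performs the truncation described above.

```python
def rel32_target(counter, instr_len, rel32_bytes):
    """target = counter + instr_len + rel32, truncated to 32 bits."""
    rel32 = int.from_bytes(rel32_bytes, "little")  # unsigned, as in the text
    return (counter + instr_len + rel32) & 0xFFFFFFFF


target = rel32_target(0x2D, 5, bytes.fromhex("DAFFFFFF"))
label = f"offset_{target:08X}h"
print(label)
```

Reading the displacement as unsigned and masking gives the same result as reading it as a signed value, because both are arithmetic modulo 2^32.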

15. counter points to byte 59 which we recognize as pop r32.

(a) This is a 1-byte instruction where the register being popped is encoded in the opcode.
To disassemble, we take 0x59 - 0x58 = 1 (or 0b001) which is the ecx register.
(b) Add the formatted instruction to the dictionary.

00000032: 59 pop ecx

(c) counter += 1 −→ 00000033.

16. counter points to byte 59 which we recognize as pop r32.

(a) This is a 1-byte instruction where the register being popped is encoded in the opcode.
To disassemble, we take 0x59 - 0x58 = 1 (or 0b001) which is the ecx register.
(b) Add the formatted instruction to the dictionary.

00000033: 59 pop ecx

(c) counter += 1 −→ 00000034.

17. counter points to byte 85 which we recognize as test r/m32, r32.

(a) Read and decode the next byte, C0, as the ModR/M byte: 11 000 000.
• MOD = 0b11 means r/m32 is a register, not memory.
• REG = 0b000 means r32 (2nd operand) is eax.
• R/M = 0b000 means r/m32 (1st operand) is eax.
(b) The instruction is 2 bytes: 1-byte opcode + 1-byte ModR/M.
(c) Add the formatted instruction to the dictionary.
00000034: 85C0 test eax, eax

(d) counter += 2 −→ 00000036.

18. counter points to byte 75 which we recognize as jnz rel8.

(a) We now know that 1 byte (rel8 means 8-bit displacement to follow) follows the 75
and is part of our instruction.
(b) This instruction is 2 bytes: 1-byte opcode + 1-byte relative displacement.
(c) The rel8 follows the opcode and is relative to the end of the current instruction.
The target address is the address of the current instruction + the instruction size +
displacement. Due to the rel8, the displacement is sign extended to 32 bits. Note
that our counter is tracking the address of each instruction.
target = counter + instr_len + sign_extend(rel8)
(d) Reading the next byte as the displacement, 05, we note that the high-order bit is
0, so sign extension fills the upper bits with zeros, giving 0x00000005. Note: had the
rel8 been 80, the high-order bit would be 1, giving a sign-extended value of 0xFFFFFF80.
(e) target = 0x00000036 + 0x2 + 0x00000005.
(f) target = 0x0000003D. Targets are 32-bit, so we would truncate if needed; this
value already fits in 32 bits, so no truncation is necessary.
(g) Add the formatted instruction to the dictionary.
00000036: 7505 jnz offset_0000003Dh

(h) counter += 2 −→ 00000038.


(i) Add the label offset_0000003Dh to a label dictionary.
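
Sign extension of a rel8 can be sketched in a few lines of Python. The names sign_extend8 and rel8_target are hypothetical; the result is kept as an unsigned 32-bit value to match the hexadecimal forms used above.

```python
def sign_extend8(byte):
    """Sign-extend an 8-bit value to an unsigned 32-bit representation."""
    return (byte | 0xFFFFFF00) if (byte & 0x80) else byte


def rel8_target(counter, instr_len, rel8):
    """target = counter + instr_len + sign_extend(rel8), truncated to 32 bits."""
    return (counter + instr_len + sign_extend8(rel8)) & 0xFFFFFFFF


print(f"0x{sign_extend8(0x05):08X}")
print(f"0x{sign_extend8(0x80):08X}")
print(f"offset_{rel8_target(0x36, 2, 0x05):08X}h")
```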

19. counter points to byte B8 which we recognize as mov r32, imm32.

(a) This is another register-in-opcode encoding, but it is also followed by an imm32 (id).
(b) The instruction is 5 bytes: 1-byte opcode + 4-byte immediate.
(c) We take the given opcode from the file (0xB8) and subtract 0xB8 to get 0, or 0b000
which is eax (the r32 operand).
(d) Read the next 4 bytes as the immediate, FF FF FF FF becomes the immediate
value 0xFFFFFFFF.
(e) Add the formatted instruction to the dictionary.

00000038: B8FFFFFFFF mov eax, 0xFFFFFFFF

(f) counter += 5 −→ 0000003D.

20. counter points to byte 5D which we recognize as pop r32.

(a) This is a 1-byte instruction where the register being popped is encoded in the opcode.
To disassemble, we take 0x5D - 0x58 = 5 (or 0b101) which is the ebp register.
(b) Add the formatted instruction to the dictionary.
0000003D: 5D pop ebp

(c) counter += 1 −→ 0000003E.

21. counter points to byte C3 which we recognize as retn.

(a) This is a 1-byte instruction with no operands.


(b) Add the formatted instruction to the dictionary.
0000003E: C3 retn

(c) counter += 1 −→ 0000003F.

22. counter is greater than end so there are no more bytes to read or disassemble.

23. Output all entries in the dictionary. Since labels must be added where appropriate, you
may want to check each address to see if it is in your label dictionary (keep in mind that
branching backwards is possible). This is only one method of solving this requirement.

00000000: 6A01 push 0x01
00000002: 6804000000 push 0x00000004
00000007: E80C000000 call offset_00000018h
offset_0000000Ch:
0000000C: 8B442404 mov eax, [ esp + 0x04 ]
00000010: 03842408000000 add eax, [ dword esp + 0x00000008 ]
00000017: C3 retn
offset_00000018h:
00000018: 55 push ebp
00000019: 89E5 mov ebp, esp
0000001B: B800000000 mov eax, 0x00000000
00000020: C7450044332211 mov dword [ ebp + 0x00 ], 0x11223344
00000027: FF750C push dword [ ebp + 0x0C ]
0000002A: FF7508 push dword [ ebp + 0x08 ]
0000002D: E8DAFFFFFF call offset_0000000Ch
00000032: 59 pop ecx
00000033: 59 pop ecx
00000034: 85C0 test eax, eax
00000036: 7505 jnz offset_0000003Dh
00000038: B8FFFFFFFF mov eax, 0xFFFFFFFF
offset_0000003Dh:
0000003D: 5D pop ebp
0000003E: C3 retn
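
One possible way to merge labels into the output is sketched below in Python. It assumes the instruction dictionary is keyed by address and that the label dictionary can be represented as a set of target addresses; both representations and the function name render are assumptions, and only a few entries from the listing are included.

```python
# Assumed shapes: address -> formatted text, plus a set of label addresses.
instructions = {
    0x07: "E80C000000 call offset_00000018h",
    0x0C: "8B442404 mov eax, [ esp + 0x04 ]",
    0x17: "C3 retn",
    0x18: "55 push ebp",
}
labels = {0x18, 0x0C}  # collected while decoding; 0x0C came from a backward call


def render(instructions, labels):
    """Emit instructions in address order, with a label line before each target."""
    lines = []
    for address in sorted(instructions):
        if address in labels:
            lines.append(f"offset_{address:08X}h:")
        lines.append(f"{address:08X}: {instructions[address]}")
    return lines


output = render(instructions, labels)
print("\n".join(output))
```

Deferring label placement until every instruction has been decoded is what lets a backward branch, such as the call to 0000000C, still get its label printed.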

3 Additional Decodings
3.1 Negative Displacement
89 78 FC
1. counter points to byte 89 which we recognize as mov r/m32, r32.

(a) Read and decode the next byte, 78, as the ModR/M byte: 01 111 000.
• MOD = 0b01 means there is a disp8. Remember that the 8-bit
displacement is treated as a signed value: 0x00 - 0x7F are non-negative
and 0x80 - 0xFF are negative.
• REG = 0b111 means the second operand in this case (the r32 location) is
edi.
• R/M = 0b000 means the base register of the r/m32 memory operand is eax.
(b) Read the next byte, FC, as the disp8, giving a displacement of -4. We can also
represent this as the 32-bit sign-extended hex value 0xFFFFFFFC. The
preferred representation is -4, but either one is acceptable. Because the displacement
is an 8-bit signed value, 0xFC represents a negative number. The simplest
way to derive its magnitude is to take the 8-bit two's complement, which is
4. Thus, our final output subtracts 4 instead of adding.
(c) Add the formatted instruction to the dictionary. We can use either representation.
00000000: 8978FC mov dword [ eax - 4 ], edi
-or-
00000000: 8978FC mov dword [ eax + 0xFFFFFFFC ], edi
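
A sketch of the preferred signed formatting, assuming Python and a hypothetical helper format_disp8. It prints negative displacements in decimal to match the first form above and non-negative ones in two-digit hex to match the earlier examples.

```python
def format_disp8(byte):
    """Format a disp8 as '+ 0xNN' or '- N', treating the byte as signed."""
    value = byte - 0x100 if byte & 0x80 else byte  # 8-bit two's complement
    if value < 0:
        return f"- {-value}"
    return f"+ 0x{value:02X}"


print(f"mov dword [ eax {format_disp8(0xFC)} ], edi")
print(f"mov eax, [ esp {format_disp8(0x04)} ]")
```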

3.2 Scale 0b00 and Base ebp


C7 04 15 DD CC BB AA 44 33 22 11
1. counter points to byte C7 which cannot be directly decoded to an instruction mnemonic.

(a) We must read the ModR/M byte and inspect the REG field to determine the mnemonic.
(b) Read and decode the next byte, 04, as the ModR/M byte: 00 000 100.
• MOD = 0b00 normally means no displacement. However, the esp and ebp
encodings are special cases: an R/M of 0b100 (the esp encoding) indicates that a
SIB byte follows, and within the SIB byte a BASE of 0b101 (the ebp encoding)
with MOD = 0b00 indicates a 32-bit displacement with no base register.
• REG = 0b000 means an extension of /0 indicating mov r/m32, imm32.
• R/M = 0b100 means a SIB byte follows.
(c) Read and decode the next byte, 15, as the SIB byte: 00 010 101.
• SS = 0b00 means the index is scaled by 1.

• INDEX = 0b010 means the scaled index is edx.
• BASE = 0b101 normally means the base is ebp. However, looking at Table 2-3, we
note that a base of ebp with MOD = 0b00 indicates there is no base but
rather only a disp32. Thus, we only have a scaled register (edx) and a 32-bit
displacement.
(d) The instruction is 11 bytes: 1-byte opcode + 1-byte ModR/M + 1-byte SIB + 4-byte
displacement + 4-byte immediate.
(e) Read the next 4 bytes, DD CC BB AA, as the disp32 giving a displacement of
0xAABBCCDD.
(f) Read the next 4 bytes as the immediate, 44 33 22 11 becomes the immediate
value 0x11223344.
(g) Add the formatted instruction to the dictionary. We know to use dword because of
the imm32.

00000000: C70415DDCCBBAA44332211
mov dword [ edx*1 + 0xAABBCCDD ], 0x11223344

3.3 Scale 0b10 and Base edx


C7 84 BA DD CC BB AA 44 33 22 11
1. counter points to byte C7 which cannot be directly decoded to an instruction mnemonic.

(a) We must read the ModR/M byte and inspect the REG field to determine the mnemonic.
(b) Read and decode the next byte, 84, as the ModR/M byte: 10 000 100.
• MOD = 0b10 means disp32.
• REG = 0b000 means an extension of /0 indicating mov r/m32, imm32.
• R/M = 0b100 means a SIB byte follows.
(c) Read and decode the next byte, BA, as the SIB byte: 10 111 010.
• SS = 0b10 means the index is scaled by 4.
• INDEX = 0b111 means the scaled index is edi.
• BASE = 0b010 means the base is edx.
(d) The instruction is 11 bytes: 1-byte opcode + 1-byte ModR/M + 1-byte SIB + 4-byte
displacement + 4-byte immediate.
(e) Read the next 4 bytes, DD CC BB AA, as the disp32 giving a displacement of
0xAABBCCDD.
(f) Read the next 4 bytes as the immediate, 44 33 22 11 becomes the immediate
value 0x11223344.
(g) Add the formatted instruction to the dictionary. We know to use dword because of
the imm32.
00000000: C784BADDCCBBAA44332211
mov dword [ edx + edi*4 + 0xAABBCCDD ], 0x11223344

3.4 Scale 0b11 and No Displacement
C7 04 FA 44 33 22 11
1. counter points to byte C7 which cannot be directly decoded to an instruction mnemonic.

(a) We must read the ModR/M byte and inspect the REG field to determine the mnemonic.
(b) Read and decode the next byte, 04, as the ModR/M byte: 00 000 100.
• MOD = 0b00 normally means no displacement, but if there is a SIB we will
need to consult Table 2-3 as well.
• REG = 0b000 means an extension of /0 indicating mov r/m32, imm32.
• R/M = 0b100 means a SIB byte follows.
(c) Read and decode the next byte, FA, as the SIB byte: 11 111 010.
• SS = 0b11 means the index is scaled by 8.
• INDEX = 0b111 means the scaled index is edi.
• BASE = 0b010 means the base is edx. Since the base is not ebp, there is no
disp32.
(d) The instruction is 7 bytes: 1-byte opcode + 1-byte ModR/M + 1-byte SIB + 4-byte
immediate.
(e) Read the next 4 bytes as the immediate, 44 33 22 11 becomes the immediate
value 0x11223344.
(f) Add the formatted instruction to the dictionary. We know to use dword because of
the imm32.
00000000: C704FA44332211 mov dword [ edx + edi*8 ], 0x11223344

3.5 Using esp in SIB


89 3C 24
1. counter points to byte 89 which we recognize as mov r/m32, r32.

(a) Read and decode the next byte, 3C, as the ModR/M byte: 00 111 100.
• MOD = 0b00 means no displacement but we also need to check the SIB byte
if one exists.
• REG = 0b111 means r32 (2nd operand) is edi.
• R/M = 0b100 means SIB byte follows.
(b) Read and decode the next byte, 24, as the SIB byte: 00 100 100.
• SS = 0b00 means the index is scaled by 1.
• INDEX = 0b100 means there is no scaled index.
• BASE = 0b100 means the base is esp. Since the base is not ebp, there is no
disp32.

(c) The instruction is 3 bytes: 1-byte opcode + 1-byte ModR/M + 1-byte SIB.
(d) Add the formatted instruction to the dictionary.
00000000: 893C24 mov dword [ esp ], edi
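
The SIB special cases from Sections 3.2 through 3.5 can be collected into one formatter. This is a sketch under stated assumptions: the name format_sib_operand is hypothetical, the caller has already read the displacement (if any) as an unsigned value, and negative disp8 values are not given the signed treatment from Section 3.1.

```python
# 32-bit registers indexed by their 3-bit encoding.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def format_sib_operand(mod, sib, disp=0):
    """Format a memory operand that uses a SIB byte (MOD is 0b00, 0b01, or 0b10)."""
    scale = 1 << (sib >> 6)        # SS 00,01,10,11 -> 1,2,4,8
    index = (sib >> 3) & 0b111
    base = sib & 0b111
    no_base = mod == 0b00 and base == 0b101  # ebp encoding with MOD 00

    parts = []
    if not no_base:
        parts.append(REG32[base])
    if index != 0b100:             # esp encoding in INDEX means no index
        parts.append(f"{REG32[index]}*{scale}")
    if mod != 0b00 or no_base:     # a displacement is present
        width = 2 if mod == 0b01 else 8
        parts.append(f"0x{disp:0{width}X}")
    return "[ " + " + ".join(parts) + " ]"


print(format_sib_operand(0b00, 0x15, 0xAABBCCDD))  # Section 3.2
print(format_sib_operand(0b10, 0xBA, 0xAABBCCDD))  # Section 3.3
print(format_sib_operand(0b00, 0xFA))              # Section 3.4
print(format_sib_operand(0b00, 0x24))              # Section 3.5
```

Computing no_base once keeps the two places that depend on the ebp special case (whether to print a base, and whether a disp32 follows) consistent with each other.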
