Assembly #2

The document discusses various assembly programming techniques in x86, including creating loops without the LOOP instruction, using while-loops, and advanced memory addressing. It explains the LEA instruction for efficient address computation and how to handle command-line arguments by converting ASCII to integers. Additionally, it covers register renaming to avoid dependencies and improve instruction-level parallelism in CPU operations.

Uploaded by

Braincain007

Assembly #02 - More on x86

Loop without LOOP instruction

In the last lecture we relied mainly on the LOOP instruction, which uses a count
register (RCX in 64-bit code, ECX in 32-bit code) as its counter. Each time LOOP
executes, it decrements that register and jumps back to the target label as long as
the result is not zero. Although LOOP is a common way to build a counted loop, there
are other ways to create for-loops.

** The current print function (i.e., print_ret) has a slight problem. **

Sadly, because of how I implemented the signed version of printing, and because of
how some OSes are set up, “custom” loops require working with the stack. A call may
overwrite registers that the loop still depends on, such as its counter, so it is
important to always follow a safety procedure. This is where pushing values onto the
stack is useful. The main idea is to push the registers in use before making a call
and then pop them, in reverse order, once the call has returned.

For example, let’s try converting a for-loop without the LOOP instruction.

Python:
for i in range(0, 6):  # inclusive of 5 (i.e., i <= 5)
    print(i)
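
The assembly translation is not reproduced in these notes. As a stand-in, here is a minimal Python sketch of the shape such a loop usually takes once LOOP is gone: an explicit counter, an increment, and a compare followed by a conditional jump back to the top. The function name `count_without_loop` and the instructions named in the comments are illustrative assumptions, not the lecture's code.

```python
def count_without_loop(limit):
    """Model of a counted loop built from inc / cmp / jle instead of LOOP."""
    printed = []
    counter = 0                  # mov rcx, 0
    while True:                  # top:
        printed.append(counter)  #   loop body (stands in for the print call)
        counter += 1             #   inc rcx
        if counter <= limit:     #   cmp rcx, limit
            continue             #   jle top  (jump back while counter <= limit)
        break                    # fall through once counter > limit
    return printed


print(count_without_loop(5))  # [0, 1, 2, 3, 4, 5]
```

Note that this form tests the condition at the bottom, so the body always runs at least once. That matches the for-loop above, but a loop that may run zero times needs a check before the first iteration.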

While-Loops
While-loops are fundamental to our programming, and there are a few ways to handle
them. Let’s take this Python code for example:

Python:
counter = 0
while counter <= 5:
    print(counter)
    counter += 1

While-loops work on a true/false condition, so we need a CMP or TEST instruction to
make the decision. Both instructions only set the flags; a conditional jump that
follows them reads those flags and either leaves the loop or continues it. Remember
that CMP and TEST are a main part of our control flow.
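
To make the flag-based decision concrete, the sketch below models what CMP does for 64-bit operands and how a signed "jump if less or equal" reads the result. The helper names `cmp_flags`, `jle_taken`, and `run_while` are made up for this illustration; only the zero, sign, and overflow flags are modeled.

```python
MASK = (1 << 64) - 1   # keep values to 64 bits
SIGN = 1 << 63         # the sign bit of a 64-bit value


def cmp_flags(a, b):
    """Return (ZF, SF, OF) as CMP a, b would set them: it computes a - b and discards it."""
    ua, ub = a & MASK, b & MASK
    result = (ua - ub) & MASK
    zf = result == 0
    sf = bool(result & SIGN)
    # Signed overflow: the operands differ in sign and the result's sign differs from a's.
    of = bool((ua ^ ub) & (ua ^ result) & SIGN)
    return zf, sf, of


def jle_taken(a, b):
    """JLE is taken when ZF is set or SF differs from OF (signed a <= b)."""
    zf, sf, of = cmp_flags(a, b)
    return zf or (sf != of)


def run_while(limit):
    """The while-loop above, with the condition expressed through the flags."""
    seen = []
    counter = 0
    while jle_taken(counter, limit):  # cmp counter, limit, then leave if not <=
        seen.append(counter)
        counter += 1
    return seen


print(run_while(5))  # [0, 1, 2, 3, 4, 5]
```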

Advanced Memory Addressing in x86-64 (without hiding behind abstractions)
The x86-64 addressing formula is:
[ base + index * scale + displacement ]

Where:

• base = general-purpose register

• index = general-purpose register (optional)

• scale = 1, 2, 4, or 8 (usually matching data sizes)

• displacement = constant offset

This idea was discussed earlier, and in general is used for arrays.

Let’s look at an example:

# Python code for arrays (really a list lmao)
array = [10, 20, 30, 40, 50]
index = 2
print(array[index])      # prints 30
print(array[index + 1])  # prints 40
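
As a rough sketch of what the addressing formula does underneath that list indexing, the code below lays the five values out as raw 8-byte integers and reads them back by computing base + index * scale + displacement by hand. The helper `load_qword` and the choice of base address 0 are assumptions for illustration.

```python
import struct

ELEMENT_SIZE = 8  # each element is a 64-bit integer, so the scale is 8

# The list as it would sit in memory: five little-endian 64-bit integers.
memory = struct.pack("<5q", 10, 20, 30, 40, 50)


def load_qword(memory, base, index, scale, displacement=0):
    """Read the 8 bytes at [base + index*scale + displacement]."""
    address = base + index * scale + displacement
    (value,) = struct.unpack_from("<q", memory, address)
    return value


index = 2
print(load_qword(memory, 0, index, ELEMENT_SIZE))                # 30, i.e. array[index]
print(load_qword(memory, 0, index, ELEMENT_SIZE, ELEMENT_SIZE))  # 40, i.e. array[index + 1]
```

The second read shows why the displacement is useful: `array[index + 1]` needs no extra addition on the index, only a constant offset of one element.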

LEA - Compute Effective Address

One of the most useful instructions is LEA, which performs the address computation
of a memory operand without accessing memory. It evaluates the expression inside the
brackets and stores the resulting address, not the value at that address, in the
destination register (i.e., no dereference takes place).

Instruction syntax:
lea destination, [base + index*scale + displacement]

Essentially we skip the memory access and keep only the arithmetic. Let’s look at an example:

# Python sample code
rbx = 10
rcx = 3
sum = rbx + rcx * 4 + 8

Normally this would take several instructions: a multiply (or shift) followed by two
adds, each overwriting a register along the way. Let’s see how it is handled with LEA:
extern print_ret
section .text
global _start

_start:
    mov rbx, 10
    mov rcx, 3

    lea rax, [rbx + rcx*4 + 8]   ; rax = rbx + rcx*4 + 8
                                 ; rax = 10 + 3*4 + 8 = 30
    ; rax holds 30 without a single ADD or MUL instruction!

    call print_ret               ; view rax

    ; exit
    mov rax, 60
    xor rdi, rdi
    syscall

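It’s important to note that LEA does not execute MUL or ADD; it replaces them with
the addressing arithmetic, so the only multiplications available are by the scale
factors 1, 2, 4, and 8. It can also stand in for SUB, but only in one case:
subtracting a constant, written as a negative displacement (e.g., lea rax, [rbx - 4]).
Subtracting one register from another is not possible.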
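
A small Python model can make the limits of LEA arithmetic easier to see. This is only a sketch: the function `lea` below is a stand-in written for these notes, it wraps results to 64 bits, and it rejects any scale the addressing formula does not allow.

```python
MASK64 = (1 << 64) - 1


def lea(base=0, index=0, scale=1, displacement=0):
    """Model of lea dest, [base + index*scale + displacement] for 64-bit registers."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    return (base + index * scale + displacement) & MASK64


print(lea(base=10, index=3, scale=4, displacement=8))  # 30, the sum from the example
print(lea(base=10, displacement=-4))                   # 6, a constant subtracted via displacement
print(lea(base=7, index=7, scale=4))                   # 35, i.e. 7 * 5 using the same register twice
```

The last call shows a common trick: using one register as both base and index gives multiplication by 3, 5, or 9, even though the scale itself can only be 1, 2, 4, or 8.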

LEA is also more powerful than this, as it can be used for address calculation (i.e.,
its normal use case).
; assume we have this array again
section .data
array dq 10, 20, 30, 40, 50   ; array[0] to array[4]

; how can we get the address of array[2] => 30 using LEA?
section .text
lea rsi, [array + 2*8]        ; rsi = &array[2]

; rsi now points to the memory where array[2] is stored
; it did not load array[2]'s value, only its address

This is just good old address math! A single LEA is typically cheaper than the
equivalent sequence of MOV and ADD/MUL instructions, and unlike ADD or MUL it leaves
the flags untouched.

Arguments/Parameters
We understand how to hard-code values into registers (i.e., as constants) but what if
we want to handle dynamic user input?

For example, what if we want to pass in the numbers 10 and 5 to a program?

$ ./if 10 5

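It was mentioned that we have specific registers dedicated to handling arguments
(e.g., RDI, RSI, RDX, RCX, R8, and R9). While this is true, they are only a calling
convention for passing arguments to functions: something still has to store the
values into those registers, and nothing does that for command-line arguments when
the program starts at _start.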

For example:
$ ./if 10 5
# does not mean RDI now equals 10 and RSI equals 5
# we must grab them from the stack and store them accordingly

So how do we handle this? Don’t CLI arguments arrive as ‘char’ data? They do: each
argument is a string of ASCII characters, so we have to do something similar to what
is normal in C → atoi.

In C:
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    if (argc < 2) return 1;   // no argument was passed

    // int x = argv[1];       // will not resolve properly: argv[1] is still
    //                        // an ASCII representation at this point
    int x = atoi(argv[1]);    // converts the ASCII representation to an int
    printf("%d\n", x);
    return 0;
}

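To handle this ourselves, we first need to grab the arguments from the stack through
the stack pointer (i.e., RSP). On Linux, when _start begins, [rsp] holds the argument
count and the quadwords that follow it hold pointers to the argument strings,
starting with the program name. We then convert each argument from its ASCII
representation into an integer representation.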
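
The conversion step is the same digit-by-digit loop whatever language it is written in, so here is a minimal Python sketch of it. The function name `ascii_to_int` is an assumption, and unlike C's atoi (which skips leading whitespace and stops at the first non-digit) this version rejects anything that is not an optional minus sign followed by digits.

```python
def ascii_to_int(text):
    """Convert bytes such as b"10" to the integer 10, one character at a time."""
    negative = text[:1] == b"-"
    digits = text[1:] if negative else text
    if not digits:
        raise ValueError("no digits to convert")
    value = 0
    for byte in digits:
        if not 0x30 <= byte <= 0x39:        # ASCII '0' is 0x30, '9' is 0x39
            raise ValueError(f"not a digit: {byte!r}")
        value = value * 10 + (byte - 0x30)  # shift one decimal place, add the new digit
    return -value if negative else value


# What the program would find for: ./if 10 5
argv = [b"./if", b"10", b"5"]
print(ascii_to_int(argv[1]) + ascii_to_int(argv[2]))  # 15
```

In assembly the same loop reads one byte at a time from the string, subtracts the ASCII code of '0', and accumulates with a multiply by 10 and an add until it reaches the terminating zero byte.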

Register Renaming
In reality, the CPU internally has more physical registers than the 16 we have
shown/discussed in past lectures.

A problem arises when we reuse registers, because we may introduce:

• True Dependency (Read After Write, RAW)

– This is where an instruction needs a result from a previous instruction.

• False Dependency (Write After Write, WAW), also called an output dependency

– This is where two instructions write to the same register.

• Anti Dependency (Write After Read, WAR)

– This is where an instruction writes to a register that an earlier instruction
still uses as a source.
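
The three categories can be checked mechanically by comparing which registers two instructions read and write. The sketch below does that for register operands only; it ignores memory and the flags register, and the tuple encoding of an instruction as (registers written, registers read) is an assumption made for this illustration.

```python
def hazards(first, second):
    """Classify dependencies between two instructions, where `first` runs earlier.

    Each instruction is a pair (registers_written, registers_read) of sets.
    """
    first_writes, first_reads = first
    second_writes, second_reads = second
    found = []
    if first_writes & second_reads:
        found.append("RAW")  # second needs a value that first produces
    if first_writes & second_writes:
        found.append("WAW")  # both write the same register
    if first_reads & second_writes:
        found.append("WAR")  # second overwrites a register first still reads
    return found


mov_rax = ({"rax"}, set())          # mov rax, [array]
add_rax = ({"rax"}, {"rax"})        # add rax, [array+8]
mov_rbx = ({"rbx"}, set())          # mov rbx, [array+8]
add_rbx = ({"rbx"}, {"rbx", "rax"}) # add rbx, rax
mov_imm = ({"rax"}, set())          # mov rax, 1

print(hazards(mov_rax, add_rax))  # ['RAW', 'WAW']
print(hazards(mov_rax, mov_rbx))  # []
print(hazards(add_rbx, mov_imm))  # ['WAR']
```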

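Register renaming itself is done by the CPU: it maps the registers named in our code
onto its larger set of physical registers. On our side, the main idea is to write
code that avoids unnecessary dependencies.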

Let’s take a look at some Bad Code:

mov rax, [array]       ; rax = array[0]
add rax, [array+8]     ; rax = array[0] + array[1]
sub rax, [array+16]    ; rax = rax - array[2]

What are the issues here?

• RAX is used for both accumulation and subtraction in sequence.

• This forces the CPU to wait for the previous RAX instruction to finish before
starting the next, and each memory read is tied to an instruction in that chain.

How can we improve this?

mov rax, [array]       ; rax = array[0]
mov rbx, [array+8]     ; rbx = array[1]
add rax, rbx           ; rax = array[0] + array[1]
mov rcx, [array+16]    ; rcx = array[2]
sub rax, rcx           ; rax = rax - array[2]

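What is different here?

• Now the CPU can schedule mov rbx, [array+8] and mov rcx, [array+16] in
“parallel”, since neither one reads or writes RAX

• No false dependencies: each load writes its own register

• The add and sub still form a true (RAW) dependency chain through RAX, which no
choice of registers can remove

• Exploits instruction-level parallelism (ILP)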

Register renaming helps remove WAR and WAW hazards.

Without Renaming:
rax --> rax --> rax (sequential dependency chain)

With Renaming:
[load rbx] [load rcx] (independent, can overlap on one core)
[rax + rbx] --> [rax - rcx] => result
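
To show why renaming removes WAW and WAR hazards but leaves RAW in place, here is a toy rename table in Python. It is a simplified sketch, not a description of any particular CPU: every write is given a fresh physical register named p0, p1, and so on, and every read uses whichever physical register currently holds that architectural register. The function `rename` and the (destination, sources) encoding are assumptions for this illustration.

```python
import itertools


def rename(program):
    """Rewrite (dest, sources) instructions to use fresh physical registers per write."""
    counter = itertools.count()
    mapping = {}   # architectural register -> physical register holding its latest value
    renamed = []

    def physical(reg):
        # A register read before any write still needs somewhere to live.
        if reg not in mapping:
            mapping[reg] = f"p{next(counter)}"
        return mapping[reg]

    for dest, sources in program:
        srcs = tuple(physical(s) for s in sources)  # look up sources first
        mapping[dest] = f"p{next(counter)}"         # then allocate a fresh destination
        renamed.append((mapping[dest], srcs))
    return renamed


program = [
    ("rax", ("rbx",)),  # mov rax, rbx
    ("rcx", ("rax",)),  # mov rcx, rax   (RAW on rax)
    ("rax", ("rdx",)),  # mov rax, rdx   (WAW with the first, WAR with the second)
]
for line in rename(program):
    print(line)
# ('p1', ('p0',))
# ('p2', ('p1',))
# ('p4', ('p3',))
```

After renaming, the third instruction writes p4 rather than p1, so it no longer conflicts with the first two and could complete early. The second instruction still reads p1, which the first one writes: that true dependency survives.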
