Floating Point Instructions in Assembly

The document discusses floating point instructions for 64-bit Intel assembly language. It describes how floating point operations were previously handled by a separate chip but are now performed using 16 floating point registers that support both scalar and SIMD instructions. It provides an overview and examples of instructions for moving data to and from registers, basic math operations, conversions, comparisons, and mathematical functions.

Floating Point Instructions

Ray Seyfarth

June 29, 2012

64 Bit Intel Assembly Language 2011

© Ray Seyfarth
Floating point instructions

PC floating point operations were once done in a separate chip, the 8087

This chip managed a stack of eight 80 bit floating point values
The stack and instructions still exist, but are largely ignored
x86-64 CPUs have 16 floating point registers (128 or 256 bits)
These registers can be used for single data (scalar) instructions or
single instruction multiple data (SIMD) instructions
We will focus on these newer registers
The older instructions tended to start with the letter “f” and
referenced the stack using register names like ST0
The newer instructions reference registers with names like “XMM0”

Outline

1 Moving data in and out of floating point registers

2 Addition

3 Subtraction

4 Basic floating point instructions

5 Data conversion

6 Floating point comparisons

7 Mathematical functions

8 Sample floating point code

Moving scalars to or from floating point registers

movss moves a single 32 bit floating point value to or from an XMM
register
movsd moves a single 64 bit floating point value
There is no implicit data conversion, unlike the old instructions
which converted floating point data to an 80 bit internal format
The instructions follow the standard pattern of allowing at most one
memory operand

movss xmm0, [x] ; move value at x into xmm0

movsd [y], xmm1 ; move value from xmm1 to y
movss xmm2, xmm0 ; move from xmm0 to xmm2

Moving packed data

The XMM registers are 128 bits

They can hold 4 floats or 2 doubles (or integers of various sizes)
On newer CPUs they are extended to 256 bits and referred to as YMM
registers when using all 256 bits
movaps moves 4 floats to/from a memory address aligned at a 16
byte boundary
movups does the same task with unaligned memory addresses
The Core i series performs unaligned moves efficiently
movapd moves 2 doubles to/from a memory address aligned at a 16
byte boundary
movupd does the same task with unaligned memory addresses

movups xmm0, [x] ; move 4 floats to xmm0

movupd [a], xmm15 ; move 2 doubles to a

Floating point addition
addss adds a scalar float (single precision) to another
addsd adds a scalar double to another
addps adds 4 floats to 4 floats - pairwise addition
addpd adds 2 doubles to 2 doubles
There are 2 operands: destination and source
The source can be memory or an XMM register
The destination must be an XMM register
Flags are unaffected

movss xmm0, [a] ; load a

addss xmm0, [b] ; add b to a
movss [c], xmm0 ; store sum in c
movapd xmm0, [a] ; load 2 doubles from a
addpd xmm0, [b] ; add a[0]+b[0] and a[1]+b[1]
movapd [c], xmm0 ; store 2 sums in c
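
The difference between the scalar and packed forms can be modelled in plain C. This is only a sketch of the semantics, not of how the CPU implements them: the `xmm_f32` struct and the two function names are hypothetical stand-ins for a 128 bit register viewed as 4 floats.

```c
/* Hypothetical model of an XMM register viewed as 4 packed floats. */
typedef struct {
    float lane[4];
} xmm_f32;

/* Models addss: only lane 0 is added; lanes 1-3 of dst are unchanged. */
xmm_f32 model_addss(xmm_f32 dst, xmm_f32 src)
{
    dst.lane[0] += src.lane[0];
    return dst;
}

/* Models addps: all 4 lanes are added pairwise. */
xmm_f32 model_addps(xmm_f32 dst, xmm_f32 src)
{
    for (int i = 0; i < 4; i++)
        dst.lane[i] += src.lane[i];
    return dst;
}
```

The same pattern would apply to addsd and addpd with 2 double lanes instead of 4 float lanes.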
Floating point subtraction

subss subtracts the source float from the destination

subsd subtracts the source double from the destination
subps subtracts 4 floats from 4 floats
subpd subtracts 2 doubles from 2 doubles

movss xmm0, [a] ; load a

subss xmm0, [b] ; subtract b from a
movss [c], xmm0 ; store a-b in c
movapd xmm0, [a] ; load 2 doubles from a
subpd xmm0, [b] ; compute a[0]-b[0] and a[1]-b[1]
movapd [c], xmm0 ; store 2 differences in c

Basic floating point instructions

instruction   effect
addsd         add scalar double
addss         add scalar float
addpd         add packed double
addps         add packed float
subsd         subtract scalar double
subss         subtract scalar float
subpd         subtract packed double
subps         subtract packed float
mulsd         multiply scalar double
mulss         multiply scalar float
mulpd         multiply packed double
mulps         multiply packed float
divsd         divide scalar double
divss         divide scalar float
divpd         divide packed double
divps         divide packed float

Conversion to a different length floating point

cvtss2sd converts a scalar single (float) to a scalar double

cvtps2pd converts 2 packed floats to 2 packed doubles
cvtsd2ss converts a scalar double to a scalar float
cvtpd2ps converts 2 packed doubles to 2 packed floats

cvtss2sd xmm0, [a] ; get a into xmm0 as a double

addsd xmm0, [b] ; add a double to a
cvtsd2ss xmm0, xmm0 ; convert to float
movss [c], xmm0

Converting floating point to/from integer

cvtss2si converts a float to a double word or quad word integer
cvtsd2si converts a double to a double word or quad word integer
These 2 round the value using the current rounding mode (round to nearest by default)
cvttss2si and cvttsd2si convert by truncation
cvtsi2ss converts an integer to a float in an XMM register
cvtsi2sd converts an integer to a double in an XMM register
When converting from memory a size qualifier is needed

cvtss2si eax, xmm0 ; convert to dword integer

cvtsi2sd xmm0, rax ; convert qword to double
cvtsi2sd xmm0, dword [x] ; convert dword integer
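
The difference between the rounding and truncating conversions can be sketched in C. The function names are hypothetical, and the rounding model covers only the default mode (round to nearest, ties to even); the real instruction follows whatever rounding mode is currently set. The sketch also assumes the value is well within the range of `long`.

```c
/* Models cvttsd2si: a C cast from double to an integer type
   truncates toward zero. */
long model_cvttsd2si(double x)
{
    return (long)x;
}

/* Models cvtsd2si under the default rounding mode only:
   round to nearest, with ties going to the even integer. */
long model_cvtsd2si(double x)
{
    long t = (long)x;               /* truncated value */
    double frac = x - (double)t;    /* signed fractional part, in (-1, 1) */

    if (frac > 0.5 || (frac == 0.5 && (t & 1)))
        return t + 1;               /* away from zero; on a tie only if t is odd */
    if (frac < -0.5 || (frac == -0.5 && (t & 1)))
        return t - 1;               /* away from zero; on a tie only if t is odd */
    return t;
}
```

For example, 2.5 truncates to 2 and also rounds to 2 (the tie goes to the even value), while 3.5 truncates to 3 but rounds to 4.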

Unordered versus ordered comparisons
Floating point comparisons can cause exceptions
Ordered comparisons cause exceptions on QNaN or SNaN
  - QNaN means “quiet not a number”
  - SNaN means “signalling not a number”
  - Both have all exponent field bits set to 1
  - QNaN has its top fraction bit equal to 1
An unordered comparison causes exceptions only for SNaN
gcc uses unordered comparisons
If it’s good enough for gcc, it’s good enough for me
ucomiss compares floats
ucomisd compares doubles
The first operand must be an XMM register
They set the zero flag, parity flag and carry flag
movss xmm0, [a]
mulss xmm0, [b]
ucomiss xmm0, [c]
jbe less_eq ; jump if a*b <= c (also taken if unordered)
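
How the three flags encode the result of ucomiss can be modelled in C. This is a sketch for illustration: the `fp_flags` struct and the function names are hypothetical, and the model ignores exceptions. The flags follow the unsigned comparison pattern, which is why the unsigned jumps (jb, jbe, ja, jae) are the ones to use after ucomiss or ucomisd.

```c
#include <stdbool.h>

/* Hypothetical record of the three flags that ucomiss/ucomisd write. */
typedef struct {
    bool zf, pf, cf;
} fp_flags;

/* Models the flag results of "ucomiss a, b". */
fp_flags model_ucomiss(float a, float b)
{
    fp_flags f = {false, false, false};   /* a > b: all three clear */

    if (a != a || b != b) {               /* a NaN is not equal to itself */
        f.zf = f.pf = f.cf = true;        /* unordered: all three set */
    } else if (a < b) {
        f.cf = true;                      /* less than: carry flag only */
    } else if (a == b) {
        f.zf = true;                      /* equal: zero flag only */
    }
    return f;
}

/* jbe is taken when CF or ZF is set, so it is also taken for unordered. */
bool jbe_taken(fp_flags f) { return f.cf || f.zf; }

/* jp is taken when PF is set, which here means "unordered". */
bool jp_taken(fp_flags f) { return f.pf; }
```

If a NaN operand must not take the less-or-equal branch, testing the parity flag first (jp) is one way to separate the unordered case.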
Mathematical functions

8087 had sine, cosine, arctangent and more

The newer instructions omit these operations on XMM registers
Instead you are supposed to use efficient library functions
There are instructions for
  - Minimum
  - Maximum
  - Rounding
  - Square root
  - Reciprocal of square root

Minimum and maximum

minss and maxss compute minimum or maximum of scalar floats

minsd and maxsd compute minimum or maximum of scalar doubles
The destination operand must be an XMM register
The source can be an XMM register or memory
minps and maxps compute minimum or maximum of packed floats
minpd and maxpd compute minimum or maximum of packed doubles
minps xmm0, xmm1 computes 4 minimums and places them in xmm0

movss xmm0, [x] ; move x into xmm0

maxss xmm0, [y] ; xmm0 has max(x,y)
movapd xmm0, [a] ; move a[0] and a[1] into xmm0
minpd xmm0, [b] ; xmm0[0] has min(a[0],b[0])
; xmm0[1] has min(a[1],b[1])
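
The scalar minimum and maximum behave like a plain C conditional expression, which also shows what happens with a NaN operand. This is a sketch with hypothetical function names: because every comparison with a NaN is false, the source operand is the one returned whenever either operand is a NaN.

```c
/* Models minss: dst is kept only when dst < src, otherwise src is
   returned. A NaN in either operand makes the comparison false,
   so src is returned in that case. */
float model_minss(float dst, float src)
{
    return dst < src ? dst : src;
}

/* Models maxss in the same way, with the comparison reversed. */
float model_maxss(float dst, float src)
{
    return dst > src ? dst : src;
}
```

One consequence is that the result can depend on operand order: with a NaN in the destination and 5.0 in the source the result is 5.0, while the reverse order yields the NaN.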

Rounding

roundss rounds 1 float

roundps rounds 4 floats
roundsd rounds 1 double
roundpd rounds 2 doubles
The first operand is an XMM destination register
The second is the source in an XMM register or memory
The third operand is a rounding mode

mode   meaning
0      round, giving ties to even numbers
1      round down
2      round up
3      round toward 0 (truncate)

Square roots

sqrtss computes 1 float square root

sqrtps computes 4 float square roots
sqrtsd computes 1 double square root
sqrtpd computes 2 double square roots
The first operand is an XMM destination register
The second is the source in an XMM register or memory

Distance in 3D
$d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2}$
distance3d:
movss xmm0, [rdi] ; x from first point
subss xmm0, [rsi] ; subtract x from second point
mulss xmm0, xmm0 ; (x1-x2)^2
movss xmm1, [rdi+4] ; y from first point
subss xmm1, [rsi+4] ; subtract y from second point
mulss xmm1, xmm1 ; (y1-y2)^2
movss xmm2, [rdi+8] ; z from first point
subss xmm2, [rsi+8] ; subtract z from second point
mulss xmm2, xmm2 ; (z1-z2)^2
addss xmm0, xmm1 ; add x and y parts
addss xmm0, xmm2 ; add z part
sqrtss xmm0, xmm0 ; square root of the sum, result in xmm0
ret
Dot product in 3D

$d = x_1 x_2 + y_1 y_2 + z_1 z_2$

dot_product:
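movss xmm0, [rdi] ; x from first vector
mulss xmm0, [rsi] ; x1*x2
movss xmm1, [rdi+4] ; y from first vector
mulss xmm1, [rsi+4] ; y1*y2
addss xmm0, xmm1 ; x1*x2 + y1*y2
movss xmm2, [rdi+8] ; z from first vector
mulss xmm2, [rsi+8] ; z1*z2
addss xmm0, xmm2 ; add z part, result in xmm0
ret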
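
For comparison, a C function that performs the same arithmetic in the same order might look like the sketch below. The name `dot_product_c` is hypothetical. The register usage in the assembly (pointers in rdi and rsi, result in xmm0) matches what the System V AMD64 calling convention would give for this prototype, which is assumed here rather than stated in the slides.

```c
/* C counterpart of dot_product: a and b each point to 3 floats
   (x, y, z). Under the assumed calling convention a arrives in rdi,
   b in rsi, and the float result is returned in xmm0. */
float dot_product_c(const float *a, const float *b)
{
    float d = a[0] * b[0];   /* x1*x2 */
    d += a[1] * b[1];        /* + y1*y2 */
    d += a[2] * b[2];        /* + z1*z2 */
    return d;
}
```

With that assumption, the assembly routine could be declared in C as `float dot_product(const float *a, const float *b);` and called like any other function.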

Polynomial evaluation by Horner’s Rule

$P(x) = p_0 + p_1 x + p_2 x^2 + \cdots + p_n x^n$

$b_n = p_n$
$b_{n-1} = p_{n-1} + b_n x$
$b_{n-2} = p_{n-2} + b_{n-1} x$
$\vdots$
$b_0 = p_0 + b_1 x$
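horner: movsd xmm1, xmm0 ; use xmm1 as x
movsd xmm0, [rdi+rsi*8] ; accumulator for b_k, starts as p_n
cmp esi, 0 ; is the degree 0?
jz done
more: sub esi, 1 ; k = k-1, sets the zero flag when k reaches 0
mulsd xmm0, xmm1 ; b_k * x
addsd xmm0, [rdi+rsi*8] ; add p_k
jnz more ; flags still come from sub; mulsd and addsd leave them alone
done: ret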
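
The recurrence for b_k can be written as a short C loop, which may help when checking the assembly. This is a sketch with a hypothetical name, `horner_c`; the parameter order (coefficient array, degree, x) is inferred from the registers the assembly uses (rdi, esi, xmm0).

```c
/* Evaluates p[0] + p[1]*x + ... + p[n]*x^n by Horner's Rule.
   p points to n+1 coefficients, n is the degree. */
double horner_c(const double *p, int n, double x)
{
    double b = p[n];                 /* b_n = p_n */
    for (int k = n - 1; k >= 0; k--)
        b = b * x + p[k];            /* b_k = p_k + b_{k+1} x */
    return b;                        /* b_0 = P(x) */
}
```

For the coefficients 1, 2, 3 (degree 2) and x = 2, the loop computes 3, then 3*2 + 2 = 8, then 8*2 + 1 = 17.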
