Cortex-M4
Floating Point Unit
Overview
FPU: Floating Point Unit
Handles real-number computation
Standardized by IEEE 754-2008, which defines:
Number format
Arithmetic operations
Number conversion
Special values
4 rounding modes
5 exceptions and their handling
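To make the number format and special values concrete, here is a small C sketch (not part of the original material) that splits a 32-bit float into the three IEEE 754 single-precision fields: sign (1 bit), biased exponent (8 bits, bias 127) and fraction (23 bits). It also classifies the special values. The names `f32_decode`, `f32_fields_t` and `fp_kind_t` are hypothetical, chosen for this illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical classification of an IEEE 754 single-precision value */
typedef enum { FP_KIND_ZERO, FP_KIND_SUBNORMAL, FP_KIND_NORMAL,
               FP_KIND_INFINITY, FP_KIND_NAN } fp_kind_t;

typedef struct {
    uint32_t sign;      /* 1 bit: 0 = positive, 1 = negative */
    uint32_t exponent;  /* 8 bits, biased by 127 */
    uint32_t fraction;  /* 23 bits; normal numbers have an implicit leading 1 */
    fp_kind_t kind;
} f32_fields_t;

f32_fields_t f32_decode(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);  /* reinterpret the float's bit pattern */

    f32_fields_t f;
    f.sign     = bits >> 31;
    f.exponent = (bits >> 23) & 0xFFu;
    f.fraction = bits & 0x7FFFFFu;

    if (f.exponent == 0xFFu)          /* all-ones exponent: special values */
        f.kind = (f.fraction == 0) ? FP_KIND_INFINITY : FP_KIND_NAN;
    else if (f.exponent == 0)         /* zero exponent: zero or subnormal */
        f.kind = (f.fraction == 0) ? FP_KIND_ZERO : FP_KIND_SUBNORMAL;
    else
        f.kind = FP_KIND_NORMAL;
    return f;
}
```

For example, `f32_decode(1.0f)` yields sign 0, exponent 127 and fraction 0, because 1.0 = (1 + 0) x 2^(127-127).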
ARM Cortex-M FPU ISA
Supports
Add, subtract, multiply, divide
Multiply and accumulate
Square root operations
C language example
float function1(float number1, float number2)
{
    float temp1, temp2;
    temp1 = number1 + number2;
    temp2 = number1/temp1;
    return temp2;
}
# float function1(float number1, float number2)
# {
#     float temp1, temp2;
#
#     temp1 = number1 + number2;
          VADD.F32  S1,S0,S1
#
#     temp2 = number1/temp1;
          VDIV.F32  S0,S0,S1
#
#     return temp2;
          BX        LR
# }
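With the FPU (listing above), each floating-point operation compiles to 1 assembly instruction
Without the FPU (listing below), each operation calls a soft-FPU library routine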
# float function1(float number1, float number2)
# {
          PUSH      {R4,LR}
          MOVS      R4,R0
          MOVS      R0,R1
#     float temp1, temp2;
#
#     temp1 = number1 + number2;
          MOVS      R1,R4
          BL        __aeabi_fadd
          MOVS      R1,R0
#
#     temp2 = number1/temp1;
          MOVS      R0,R4
          BL        __aeabi_fdiv
#
#     return temp2;
          POP       {R4,PC}
# }
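Which of the two listings you get depends on the compiler options. With GCC, for example, `-mcpu=cortex-m4 -mfpu=fpv4-sp-d16 -mfloat-abi=hard` selects the FPU, while `-mfloat-abi=soft` selects the software routines. The sketch below reports the mode at compile time using predefined macros, assuming a GCC or Clang toolchain. `__ARM_FP` and `__ARM_PCS_VFP` come from the Arm C Language Extensions (ACLE), and `__SOFTFP__` is a GCC/Clang predefine. The function name `float_build_mode` is hypothetical.

```c
/* Report, at compile time, how floating-point code in this file is built.
   Assumes a GCC- or Clang-compatible compiler and its predefined macros. */
const char *float_build_mode(void)
{
#if defined(__ARM_PCS_VFP)
    /* -mfloat-abi=hard: FPU instructions, float arguments passed in S registers */
    return "hardware FPU, hard-float ABI";
#elif defined(__ARM_FP) && !defined(__SOFTFP__)
    /* -mfloat-abi=softfp: FPU instructions, float arguments passed in core registers */
    return "hardware FPU, soft-float ABI";
#elif defined(__arm__)
    /* -mfloat-abi=soft: every operation becomes a call such as __aeabi_fadd */
    return "software floating point";
#else
    return "not an Arm build";
#endif
}
```

On a host PC the function simply reports "not an Arm build"; on the target it tells you which of the two listings above to expect.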
Performance
Figure: execution time of a 29-coefficient FIR filter on 32-bit floats, with and without the FPU (CMSIS library). The FPU version is about 10x faster and offers the best compromise between development time and performance.
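For reference, this is the kind of computation being benchmarked: a plain C sketch of a 29-tap direct-form FIR filter on 32-bit floats. It is not the CMSIS-DSP `arm_fir_f32` implementation, whose buffer layout differs. The function `fir_step` and its newest-first delay line are hypothetical choices for this example. Each output costs 29 multiply-accumulates, which an FPU executes in hardware and a soft-FPU build turns into library calls.

```c
#include <stddef.h>

#define FIR_TAPS 29  /* same length as the benchmarked filter */

/* Direct-form FIR: y[n] = sum over k of h[k] * x[n-k].
   state[] holds the last FIR_TAPS inputs, newest first (hypothetical layout). */
float fir_step(const float h[FIR_TAPS], float state[FIR_TAPS], float input)
{
    /* shift the delay line by one sample and insert the new input */
    for (size_t k = FIR_TAPS - 1; k > 0; --k)
        state[k] = state[k - 1];
    state[0] = input;

    float acc = 0.0f;
    for (size_t k = 0; k < FIR_TAPS; ++k)
        acc += h[k] * state[k];  /* one multiply-accumulate per tap */
    return acc;
}
```

Feeding an impulse (1, 0, 0, ...) into the filter returns the coefficients h[0], h[1], ... one per call, which is a quick sanity check.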
Rounding issues
Precision is limited
Rounding errors can accumulate over successive operations and produce inaccurate results (do not use floating-point numbers for financial calculations)
A few examples
If the two operands have different exponents, the hardware automatically denormalizes (shifts) the one with the smaller exponent so both use the same exponent, and the low-order bits shifted out are lost
If you subtract two very close numbers, you lose relative precision (also called cancellation error)
If you reorder operations, you may not obtain the same result because of rounding errors
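The three effects above can be reproduced in a few lines of C. In this sketch (function names are hypothetical), each intermediate result is stored in a `volatile float`, which forces it to be rounded to single precision, as the Cortex-M4 FPU does, even on hosts that would otherwise keep extra precision.

```c
/* (big + small) - big: when exponents are aligned, small may be shifted out */
float add_then_sub(float big, float small)
{
    volatile float t = big + small;  /* rounded to single precision */
    return t - big;
}

/* (big - big) + small: same operands, different order */
float sub_then_add(float big, float small)
{
    volatile float t = big - big;    /* exactly 0 */
    return t + small;
}
```

With `big = 1e8f` and `small = 1.0f`, `add_then_sub` returns 0 while `sub_then_add` returns 1. Near 1e8 the spacing between floats is 8, so 100000001 rounds back to 100000000. The same happens at 2^24 = 16777216, the first float above which not every integer is representable. Cancellation shows up with `add_then_sub(1.0f, 1e-7f)`: the result is 2^-23 (about 1.19e-7) instead of 1e-7, an error of roughly 19% even though each individual operation was correctly rounded.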
ARM Cortex-M FPU
Introduction
Single-precision FPU
Conversion between:
Integer numbers
Single-precision floating-point numbers
Half-precision floating-point numbers
Handling of floating-point exceptions (untrapped)
Dedicated registers
32 single-precision registers (S0-S31), which can also be viewed as 16 doubleword registers (D0-D15) for load/store operations
FPSCR for status & configuration
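The S/D register overlay can be modelled on a host in a few lines. This is a hypothetical model, not an API: `fpu_regfile_t`, `fpu_read_d` and `fpu_write_d` are illustration names. It assumes the standard mapping in which Dn is the pair S(2n+1):S(2n), with S(2n) holding the low 32 bits. Arithmetic stays single precision; the D view only matters for moving data, for example with VLDM.64, VSTR.64 or VPUSH.64.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the register file: 32 single-precision registers */
typedef struct { float s[32]; } fpu_regfile_t;

/* Read doubleword register Dn (n = 0..15) as its raw 64-bit content */
uint64_t fpu_read_d(const fpu_regfile_t *rf, unsigned n)
{
    uint32_t lo, hi;
    memcpy(&lo, &rf->s[2 * n], sizeof lo);      /* S(2n): low word */
    memcpy(&hi, &rf->s[2 * n + 1], sizeof hi);  /* S(2n+1): high word */
    return ((uint64_t)hi << 32) | lo;
}

/* Write Dn, which overwrites both aliased single-precision registers */
void fpu_write_d(fpu_regfile_t *rf, unsigned n, uint64_t value)
{
    uint32_t lo = (uint32_t)value;
    uint32_t hi = (uint32_t)(value >> 32);
    memcpy(&rf->s[2 * n], &lo, sizeof lo);
    memcpy(&rf->s[2 * n + 1], &hi, sizeof hi);
}
```

Loading S0 = 1.0f (0x3F800000) and S1 = 2.0f (0x40000000) makes D0 read as 0x400000003F800000.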
Modifications vs IEEE 754
Full-compliance mode
All operations are processed according to IEEE 754
Alternative half-precision format
$(-1)^s \times \left(1 + \sum_{i} N_i \, 2^{-i}\right) \times 2^{16}$ for the all-ones exponent field (which therefore encodes no infinities or NaNs), and no denormalized number support
Flush-to-zero mode
Denormalized numbers are treated as zero
Associated flags for input and output flush
Default NaN mode
Any operation with a NaN as an input, or that generates a NaN, returns the default NaN
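The difference between the two half-precision formats is easiest to see by decoding 16-bit patterns. The sketch below (the name `half_to_float` is hypothetical) handles IEEE half precision and, when `ahp` is non-zero, treats the all-ones exponent as a normal number. This extends the range from 65504 to (1 + 1023/1024) x 2^16 = 131008. For simplicity it decodes exponent-0 patterns the IEEE way in both modes. It models only the exponent-31 difference, not the AHP denormal handling described above.

```c
#include <stdint.h>
#include <string.h>

static float bits_to_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Decode a 16-bit half-precision pattern (1 sign, 5 exponent, 10 fraction bits).
   ahp == 0: IEEE 754 half, exponent 31 encodes infinity/NaN.
   ahp != 0: alternative half precision, exponent 31 is an ordinary exponent. */
float half_to_float(uint16_t h, int ahp)
{
    uint32_t sign = (uint32_t)(h >> 15) << 31;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;

    if (exp == 0x1Fu && !ahp)   /* IEEE infinity (mant == 0) or NaN */
        return bits_to_float(sign | 0x7F800000u | (mant << 13));

    if (exp == 0) {             /* zero or subnormal: mant * 2^-24, exact in float */
        float mag = (float)mant * (1.0f / 16777216.0f);
        return sign ? -mag : mag;
    }
    /* normal number: re-bias exponent 15 -> 127, widen fraction 10 -> 23 bits */
    return bits_to_float(sign | ((exp - 15u + 127u) << 23) | (mant << 13));
}
```

The pattern 0x7C00 is +infinity in IEEE mode but 65536.0 in alternative mode.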
Complete implementation
The Cortex-M4F does NOT support all IEEE 754-2008 operations
The missing operations must be implemented in software
Unsupported operations
Remainder (% operator)
Round FP number to integer-value FP number
Binary to decimal conversions
Decimal to binary conversions
Direct comparison of Single Precision (SP) and Double Precision
(DP) values
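The last point is easy to trigger by accident in C. A literal such as `0.1` is a double, so comparing a float against it promotes the float to double. On a single-precision-only FPU, that conversion and the comparison are done in software, typically through a run-time helper such as `__aeabi_dcmpgt`. The two hypothetical functions below differ only by the `f` suffix, yet they can return different answers because 0.1f and 0.1 are different values. GCC's `-Wdouble-promotion` warning flags such implicit promotions.

```c
/* 0.1 is a double literal: x is promoted to double, so the comparison
   is not done by the single-precision FPU */
int above_threshold_dp(float x) { return x > 0.1; }

/* 0.1f keeps the whole comparison in single precision (VCMP.F32) */
int above_threshold_sp(float x) { return x > 0.1f; }
```

For `x = 0.1f` (about 0.100000001), the double version returns 1 and the float version returns 0.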
Floating-Point Status & Control Register (FPSCR)
Condition code bits
negative, zero, carry and overflow (updated by compare operations)
ARM special operating mode configuration
alternative half-precision, default NaN and flush-to-zero mode
Rounding mode configuration
round to nearest, toward zero, toward plus infinity or toward minus infinity
Exception flags
The inexact-result flag may not be routed to the interrupt controller
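The FPSCR fields above sit at fixed bit positions in the ARMv7-M architecture. The sketch below works on a raw 32-bit FPSCR value, so it can be tested on a host. The macro and function names are hypothetical. The rounding-mode encoding is 0 = nearest, 1 = plus infinity, 2 = minus infinity, 3 = zero.

```c
#include <stdint.h>

#define FPSCR_N           (1u << 31)  /* condition flags, set by VCMP/VCMPE */
#define FPSCR_Z           (1u << 30)
#define FPSCR_C           (1u << 29)
#define FPSCR_V           (1u << 28)
#define FPSCR_AHP         (1u << 26)  /* alternative half-precision */
#define FPSCR_DN          (1u << 25)  /* default NaN mode */
#define FPSCR_FZ          (1u << 24)  /* flush-to-zero mode */
#define FPSCR_RMODE_SHIFT 22u
#define FPSCR_RMODE_MASK  (3u << FPSCR_RMODE_SHIFT)
#define FPSCR_IDC         (1u << 7)   /* input denormal flushed */
#define FPSCR_IXC         (1u << 4)   /* inexact */
#define FPSCR_UFC         (1u << 3)   /* underflow */
#define FPSCR_OFC         (1u << 2)   /* overflow */
#define FPSCR_DZC         (1u << 1)   /* division by zero */
#define FPSCR_IOC         (1u << 0)   /* invalid operation */
#define FPSCR_EXC_MASK    (FPSCR_IDC | FPSCR_IXC | FPSCR_UFC | \
                           FPSCR_OFC | FPSCR_DZC | FPSCR_IOC)

typedef enum { RM_NEAREST = 0, RM_PLUS_INF = 1,
               RM_MINUS_INF = 2, RM_ZERO = 3 } fpscr_rmode_t;

fpscr_rmode_t fpscr_get_rmode(uint32_t fpscr)
{
    return (fpscr_rmode_t)((fpscr & FPSCR_RMODE_MASK) >> FPSCR_RMODE_SHIFT);
}

/* Return a copy of fpscr with the rounding mode replaced */
uint32_t fpscr_set_rmode(uint32_t fpscr, fpscr_rmode_t mode)
{
    return (fpscr & ~FPSCR_RMODE_MASK) | ((uint32_t)mode << FPSCR_RMODE_SHIFT);
}

/* Return a copy of fpscr with all cumulative exception flags cleared */
uint32_t fpscr_clear_exceptions(uint32_t fpscr)
{
    return fpscr & ~FPSCR_EXC_MASK;
}
```

On the target, assuming a CMSIS-Core device header, the raw value can be read and written with `__get_FPSCR()` and `__set_FPSCR()`. For example, `__set_FPSCR(fpscr_set_rmode(__get_FPSCR(), RM_ZERO))` would select round toward zero.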
FPU instructions
FPU arithmetic instructions
Operation          Description                          Assembler    Cycles
Absolute value     of float                             VABS.F32     1
Addition           float                                VADD.F32     1
Subtract           float                                VSUB.F32     1
Negate             float                                VNEG.F32     1
Negate             and multiply float                   VNMUL.F32    1
Multiply           float                                VMUL.F32     1
Multiply           then accumulate float                VMLA.F32     3
Multiply           then subtract float                  VMLS.F32     3
Multiply           then accumulate then negate float    VNMLA.F32    3
Multiply           then subtract then negate float      VNMLS.F32    3
Multiply (fused)   then accumulate float                VFMA.F32     3
Multiply (fused)   then subtract float                  VFMS.F32     3
Multiply (fused)   then accumulate then negate float    VFNMA.F32    3
Multiply (fused)   then subtract then negate float      VFNMS.F32    3
Divide             float                                VDIV.F32     14
Square root        of float                             VSQRT.F32    14
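The cycle counts above suggest a common optimization: when dividing many values by the same number, compute the reciprocal once (one 14-cycle VDIV.F32) and then multiply (1-cycle VMUL.F32 each). The two hypothetical functions below compare the approaches. The instruction mapping in the comments is what a Cortex-M4F compiler would typically emit, not something guaranteed. As the rounding section warns, `x * (1/d)` is rounded twice and can differ from `x / d` in the last bit. For this reason compilers only make this substitution themselves under options such as GCC's `-freciprocal-math`.

```c
#include <stddef.h>

/* One division per element (VDIV.F32, 14 cycles each on the Cortex-M4F) */
void scale_div(float *v, size_t n, float d)
{
    for (size_t i = 0; i < n; ++i)
        v[i] = v[i] / d;
}

/* One division in total, then one multiply per element (VMUL.F32, 1 cycle).
   Results may differ from scale_div in the last bit unless 1/d is exact. */
void scale_recip(float *v, size_t n, float d)
{
    const float inv = 1.0f / d;
    for (size_t i = 0; i < n; ++i)
        v[i] = v[i] * inv;
}
```

When d is a power of two, as in a division by 4.0f, the reciprocal is exact and both functions give identical results.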
FPU compare & convert instructions
Operation   Description                                                  Assembler    Cycles
Compare     float with register or zero                                  VCMP.F32     1
Compare     float with register or zero (also signals on quiet NaN)      VCMPE.F32    1
Convert     between integer, fixed-point, half precision and float       VCVT.F32
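In C, these conversions usually come from plain casts. The sketch below shows float-to-integer and float-to-Q15 fixed-point conversion. The function names and the Q15 choice (15 fractional bits) are assumptions for illustration, and the instructions a compiler emits for them may vary. C's float-to-integer cast truncates toward zero, which matches the default behaviour of VCVT. Out-of-range and NaN inputs are handled in C first, because converting them to an integer type is undefined behaviour in C.

```c
#include <stdint.h>

/* Truncates toward zero; caller must keep x within int32_t range */
int32_t float_to_i32(float x) { return (int32_t)x; }

/* float -> Q15 fixed point, saturating to [-1, 1 - 2^-15]; NaN maps to 0 */
int16_t float_to_q15(float x)
{
    float scaled = x * 32768.0f;              /* 2^15 */
    if (scaled != scaled)     return 0;       /* NaN compares unequal to itself */
    if (scaled >= 32767.0f)   return INT16_MAX;
    if (scaled <= -32768.0f)  return INT16_MIN;
    return (int16_t)scaled;                   /* truncates toward zero */
}

/* Q15 -> float */
float q15_to_float(int16_t q) { return (float)q / 32768.0f; }
```

Note that 1.0 saturates to 32767 (about 0.99997), because +1.0 is not representable in Q15.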
FPU load/store instructions
Operation   Description                                           Assembler    Cycles
Load        multiple doubles (N doubles)                          VLDM.64      1+2*N
Load        multiple floats (N floats)                            VLDM.32      1+N
Load        single double                                         VLDR.64      3
Load        single float                                          VLDR.32      2
Store       multiple double registers (N doubles)                 VSTM.64      1+2*N
Store       multiple float registers (N floats)                   VSTM.32      1+N
Store       single double register                                VSTR.64      3
Store       single float register                                 VSTR.32      2
Move        top/bottom half of double to/from core register       VMOV         1
Move        immediate/float to float register                     VMOV         1
Move        two floats/one double to/from core registers          VMOV         2
Move        one float to/from core register                       VMOV         1
Move        floating-point control/status to core register        VMRS         1
Move        core register to floating-point control/status        VMSR         1
Pop         double registers from stack                           VPOP.64      1+2*N
Pop         float registers from stack                            VPOP.32      1+N
Push        double registers to stack                             VPUSH.64     1+2*N
Push        float registers to stack                              VPUSH.32     1+N
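The multi-register cycle formulas are easy to misread, so here is a small worked check in C. The functions are hypothetical helpers that encode the table's figures, taken as-is with no memory wait states. Saving S16-S31, the registers a function must preserve under the AAPCS, costs the same whether written as 16 floats or 8 doubles. The naive alternative of 16 separate single-register stores costs almost twice as much.

```c
/* Cycle formulas from the load/store table (best case, no wait states) */
unsigned cycles_multi_32(unsigned n_floats)  { return 1u + n_floats; }       /* VLDM/VSTM/VPUSH/VPOP .32 */
unsigned cycles_multi_64(unsigned n_doubles) { return 1u + 2u * n_doubles; } /* VLDM/VSTM/VPUSH/VPOP .64 */
unsigned cycles_single_32(unsigned count)    { return 2u * count; }          /* repeated VLDR.32 / VSTR.32 */
```

VPUSH.32 {S16-S31} gives 1 + 16 = 17 cycles, and VPUSH.64 {D8-D15} gives 1 + 2*8 = 17 cycles. Sixteen separate VSTR.32 instructions give 16 * 2 = 32 cycles.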