02 Computer Arithmetic
02 Computer Arithmetic
Computer Science
Chapter 2 Computer Arithmetic
2. Computer Arithmetic
2 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic | Winter Term 2024/2025
Foundations of
Computer Science
Chapter 2.1 Computer Arithmetic
Numeral Systems
Prof. Dr.-Ing. Richard Membarth
7 decimal system
VII Roman system
Maya system
25
||||| || line (unary) system
4 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Numeral Systems | Winter Term 2024/2025
Numeral Systems
Classification
numeral systems
polyadic system
sign-value notation hybrid systems
positional system
Roman system unary system Maya system decimal system binary system
... ...
2
LXXX ||||| || 42 101
5 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Numeral Systems | Winter Term 2024/2025
Numeral Systems
Polyadic System
Polyadic System
The polyadic system is a numeral system in which the contribution of a digit to the value of a number depends on its position. A
polyadic system to the base b is a system in which a number is formed as a sum of powers of b.
n
X
A natural number N can be represented as a sum of powers: N = ai · bi
i =0
6 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Numeral Systems | Winter Term 2024/2025
Numeral Systems
Polyadic System
7 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Numeral Systems | Winter Term 2024/2025
Numeral Systems
Binary System
binary system
decimal system octal system binary system
b=2 102 101 100 82 81 80 24 23 22 21 20
b ∈ {0, 1} 0 0 0
polyadic representation 1 1 1
2 2 1 0
overflow to next position if value range is exhausted
3 3 1 1
can be efficiently implemented in hardware
4 4 1 0 0
(represented and processed) 5 5 1 0 1
basic numeral system for number processing in 6 6 1 1 0
7 7 1 1 1
digital systems
8 1 0 1 0 0 0
9 1 1 1 0 0 1
1 0 1 2 1 0 1 0
... ... ... ... ...
1 9 2 3 1 0 0 1 1
2 0 2 4 1 0 1 0 0
8 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Numeral Systems | Winter Term 2024/2025
Numeral Systems
Polyadic System
examples
decimal system: 376110
376110 = 3761 = 3 · 103 + 7 · 102 + 6 · 101 + 1 · 100
octal system: 37618
37618 = 3 · 83 + 7 · 82 + 6 · 81 + 1 · 80 = 2033
| {z }
number shown as decimal
binary system: 1001012
1001012 = 1 · 25 + 0 · 24 + 0 · 23 + 1 · 22 + 0 · 21 + 1 · 20 = |{z}
37
number shown as decimal
hexadecimal system 3E116
3E116 = 3 · 162 + 14 · 161 + 1 · 160 = |{z}
993
number shown as decimal
9 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Numeral Systems | Winter Term 2024/2025
Numeral Systems
Horner’s Scheme
n
X
b-adic representation defines a polynomial: N = ai · bi
i =0
polynomial Horner’s scheme
p(b) = a0 + a1 · b + a2 · b2 + · · · + an · bn p(b) = (· · · (an · b + an−1 ) · b + · · · ) · b + a0
multiplications, additions, and only multiplications and additions, no
exponentiations exponentiations
expensive to compute more efficient computation
problems with accuracy (for floating-point can be used to convert between different
numbers) positional numeral systems
1 p = 0 1 p = 0
2 for i in range(0, n): 2 for i in range(n-1, -1, -1):
3 e = 1 3 p = p * b + a[i]
4 for k in range(1, i):
5 e = e * b
6 p = p + a[i] * e
10 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Numeral Systems | Winter Term 2024/2025
Numeral Systems
Horner’s Scheme
11 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Numeral Systems | Winter Term 2024/2025
Numeral Systems
Division with Remainder Method
2/2 = 1 remainder 0
1/2 = 0 remainder 1
12 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Numeral Systems | Winter Term 2024/2025
Numeral Systems
Number Conversion
13 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Numeral Systems | Winter Term 2024/2025
Numeral Systems
Conversion by Combination
if the base of a numeral system is the power of the base of another numeral system, numbers can be easily converted
combining several digits into a single digit
converting 1011 0101 00012 to hexadecimal
( 1011 0101 0001 )2
⇓ ⇓ ⇓
( B 5 1 )16
converting 5218 to binary
( 5 2 1 )8
⇓ ⇓ ⇓
( 101 010 001 )2
14 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Numeral Systems | Winter Term 2024/2025
Numeral Systems
Conversion of Digits
it is much easier to convert between numeral systems when considering each digits separately
this is not possible when converting from the decimal to the binary system
remedy: BCD codes
each decimal digit is represented by a 4-bit binary digit
4910 = 0100 1001BCD 8421
15 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Numeral Systems | Winter Term 2024/2025
Foundations of
Computer Science
Chapter 2.2 Computer Arithmetic
Signed and Unsigned Numbers
Prof. Dr.-Ing. Richard Membarth
negative numbers are prefixed by minus sign: 2510 , −2510 , −10102 , +10102 , · · ·
how should we encode the numbers / sign?
different types of signed binary number representations
sign-magnitude
one’s complement
two’s complement
elementary arithmetic supported in all systems: +, −, ∗, /
technical realization: costs (area) and performance (throughput) are important
17 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Signed and Unsigned Numbers | Winter Term 2024/2025
Signed and Unsigned Numbers
Negative Numbers
sign-magnitude
010101000011
simple approach 0110 5 4 3 0010
use sign bit with magnitude 6 2
0111 0001
7 1
left-most bit (most significant bit) indicates sign 4 bit
1000 -0 0 0000
0: positive number -7
-1
1001 1111
1: negative number -2 -6
1010 -3 -4 -5 1110
signed magnitude
101111001101
↑ | {z } 2 = 1210
0 0001100 | {z } 2 = −1210
1 0001100
↑
| magnitude | magnitude
sign bit sign bit
zero represented twice
000000002 (+0)
100000002 (−0)
18 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Signed and Unsigned Numbers | Winter Term 2024/2025
Signed and Unsigned Numbers
Negative Numbers
one’s complement
010101000011
obtained by inverting all bits 0110 5 4 3 0010
4310 = 001010112 → −4310 = 110101002 6 2
0111 0001
7 1
range of signed numbers: −(2N −1 − 1) · · · 2N −1 − 1 4 bit
1000 -7 0 0000
8-bit range: −127 · · · + 127
-6 -0
1001 1111
zero represented twice -5 -1
1010 -4 -3 -2 1110
000000002 (+0)
101111001101
111111112 (−0)
19 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Signed and Unsigned Numbers | Winter Term 2024/2025
Signed and Unsigned Numbers
Negative Numbers
two’s complement
010101000011
obtained by inverting all bits and adding one 0110 5 4 3 0010
4310 = 001010112 → −4310 = 110101012 6 2
0111 0001
7 1
range of signed numbers: −(2N −1 ) · · · 2N −1 − 1 4 bit
1000 -8 0 0000
8-bit range: −128 · · · + 127
-7 -1
1001 1111
zero represented once -6 -2
1010 -5 -4 -3 1110
000000002
101111001101
converting to decimal
−27 +26 +24 +22 +20 = −128+64+16+4+1 = −43
20 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Signed and Unsigned Numbers | Winter Term 2024/2025
Signed and Unsigned Numbers
Logical Operators
21 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Signed and Unsigned Numbers | Winter Term 2024/2025
Signed and Unsigned Numbers
Logical Operators
0 1 0 0 0 0 0 0 0 0 0
1 0 0 1 0 0 1 1 0 1 1
1 0 0 1 0 1 1 0 1
1 1 1 1 1 1 1 1 0
22 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Signed and Unsigned Numbers | Winter Term 2024/2025
Signed and Unsigned Numbers
Logical Operators
5 00000101 5 00000101
−9 11110111 −9 11110111
2 00000010 2 00000010
23 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Signed and Unsigned Numbers | Winter Term 2024/2025
Signed and Unsigned Numbers
Arithmetic
3) 1 + 0 = 1 0 0 0 0 0 1 0 1 0 0 0 1 (8110 )
4) 1 + 1 = 0 (with a carry-over of 1)
two’s complement: carry beyond the msb is
ignored
24 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Signed and Unsigned Numbers | Winter Term 2024/2025
Signed and Unsigned Numbers
Arithmetic
3) 1 − 0 = 1 0 0 0 0 0 0 1 1 1 0 0 1 (5710 )
4) 1 − 1 = 0
two’s complement: subtraction via addition
25 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Signed and Unsigned Numbers | Winter Term 2024/2025
Signed and Unsigned Numbers
Arithmetic
1 1 1 1 1 1 0 0 0 1 1 1 (−5710 )
26 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Signed and Unsigned Numbers | Winter Term 2024/2025
Signed and Unsigned Numbers
Arithmetic
multiplication
can be realized by shifting and summing up
multiplying n-bit number with m-bit number results in n+m-bit number
1 1 0 1 (1310 )
× 1 0 1 (510 )
1 1 0 1
+ 0 0 0 0
1 1 0 1
+ 1 1 0 1
1 0 0 0 0 0 1 (6510 )
27 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Signed and Unsigned Numbers | Winter Term 2024/2025
Signed and Unsigned Numbers
Arithmetic
division
using Euclidean’s algorithm 1 1 0 1 / 1 0 1 = 10 (1310 /510 = 210 )
− 1 0 1 0
1 # c = n / d, remainder r
2 r = n 0 1 1
3 c = 0 − 0 0 0
4 while r >= d:
5 r = r - d 1 1 remainder (310 )
6 c = c + 1
28 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Signed and Unsigned Numbers | Winter Term 2024/2025
Signed and Unsigned Numbers
Adder
S3 C3 S2 C2 S1 C1 S0
29 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Signed and Unsigned Numbers | Winter Term 2024/2025
Signed and Unsigned Numbers
Status Register
the adder sets several status registers for the next instruction or software
N negative (sign) flag: indicates that the result is negative
Z zero flag: indicates that the result is zero
C carry flag: allows to carry a binary digit from a less significant word to the LSB of a more significant word
V overflow flag: indicates that the signed result of an operation is too large to fit in the register width
the carry flag can be used to indicate overflow for unsigned addition
the overflow flag is set if the sign of the summands are equal and different from the sign of the result
0 1 1 1 1 1 1 1 (12710 ) 1 0 0 1 1 0 0 1 ( − 10310 )
+ 0 0 0 0 0 0 0 1 (110 ) + 1 0 0 0 1 1 1 1 ( − 11310 )
1 1 1 1 1 1 1 carry 1 1 1 1 1 1 carry
1 0 0 0 0 0 0 0 (−12810 ) 0 0 1 0 1 0 0 0 (4010 )
N: 1 Z : 0 C: 0 V : 1 N: 0 Z : 0 C: 1 V : 1
8-bit signed integer (two’s complement) 8-bit signed integer (two’s complement)
30 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Signed and Unsigned Numbers | Winter Term 2024/2025
Signed and Unsigned Numbers
Status Register
1 1 1 1 1 1 1 1 ( − 110 ) 1 1 1 1 1 1 1 1 (25510 )
+ 1 1 1 1 1 1 1 1 ( − 110 ) + 1 1 1 1 1 1 1 1 (25510 )
1 1 1 1 1 1 1 1 carry 1 1 1 1 1 1 1 1 carry
1 1 1 1 1 1 1 0 (−210 ) 1 1 1 1 1 1 1 0 (25410 )
N: 1 Z : 0 C: 1 V : 0 N: 1 Z : 0 C: 1 V : 0
31 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Signed and Unsigned Numbers | Winter Term 2024/2025
Signed and Unsigned Numbers
Data Types
32 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Signed and Unsigned Numbers | Winter Term 2024/2025
Signed and Unsigned Numbers
Hex Dumps
hex dump or hex editor allow to inspect and edit raw data
first column: displacement (offset) in file or memory in hexadecimal format
second column: raw data in hexadecimal format, 16 bytes per row
third column (optional): ASCII character translation
1 00000000: 4761 6c6c 6961 2065 7374 206f 6d6e 6973 Gallia est omnis
2 00000010: 2064 6976 6973 6120 696e 2070 6172 7465 divisa in parte
3 00000020: 7320 7472 6573 2c20 7175 6172 756d 2075 s tres, quarum u
4 00000030: 6e61 6d20 696e 636f 6c75 6e74 2042 656c nam incolunt Bel
5 00000040: 6761 652c 2061 6c69 616d 2041 7175 6974 gae, aliam Aquit
6 00000050: 616e 692c 2074 6572 7469 616d 2071 7569 ani, tertiam qui
7 00000060: 2069 7073 6f72 756d 206c 696e 6775 6120 ipsorum lingua
8 00000070: 4365 6c74 6165 2c20 6e6f 7374 7261 2047 Celtae, nostra G
9 00000080: 616c 6c69 2061 7070 656c 6c61 6e74 7572 alli appellantur
10 00000090: 2e20 4869 206f 6d6e 6573 206c 696e 6775 . Hi omnes lingu
11 000000a0: 612c 2069 6e73 7469 7475 7469 732c 206c a, institutis, l
12 000000b0: 6567 6962 7573 2069 6e74 6572 2073 6520 egibus inter se
13 000000c0: 6469 6666 6572 756e 742e 2047 616c 6c6f differunt. Gallo
14 ...
33 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Signed and Unsigned Numbers | Winter Term 2024/2025
Signed and Unsigned Numbers
Data Types
34 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Signed and Unsigned Numbers | Winter Term 2024/2025
Signed and Unsigned Numbers
Data Types
35 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Signed and Unsigned Numbers | Winter Term 2024/2025
Signed and Unsigned Numbers
GPS Week Number Rollover
36 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Signed and Unsigned Numbers | Winter Term 2024/2025
Signed and Unsigned Numbers
Y2K22
37 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Signed and Unsigned Numbers | Winter Term 2024/2025
Signed and Unsigned Numbers
Saturation
normal behavior
overflow / underflow silently tolerated → often incorrect result
cause traps for exception handling → expensive
often it is enough to continue the calculating with the highest (overflow) or lowest (underflow) value
example
z = (a + b)/2
a = 7, b = 9 → z = (7 + 9)/2 = 16/2 = 8
calculation with 4-bit unsigned integer
38 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Signed and Unsigned Numbers | Winter Term 2024/2025
Signed and Unsigned Numbers
Saturation
39 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Signed and Unsigned Numbers | Winter Term 2024/2025
Foundations of
Computer Science
Chapter 2.3 Computer Arithmetic
Fixed-Point Numbers
Prof. Dr.-Ing. Richard Membarth
41 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Fixed-Point Numbers | Winter Term 2024/2025
Fixed-Point Numbers
Q Number Format
42 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Fixed-Point Numbers | Winter Term 2024/2025
Fixed-Point Numbers
Q Number Format
010101000011 010101000011
0110 1.25 1.00 0.75 0010 0110 1.25 1.00 0.75 0010
1.50 0.50 1.50 0.50
0111 0001 0111 0001
1.75 0.25 1.75 0.25
4 bit 4 bit
1000 2.00 0.00 0000 1000 -2.00 0.00 0000
UQ2.2 Q2.2
2.25 3.75 -1.75 -0.25
1001 1111 1001 1111
2.50 3.50 -1.50 -0.50
43 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Fixed-Point Numbers | Winter Term 2024/2025
Fixed-Point Numbers
Horner’s Scheme
Horner’s scheme
p.101 (b) = ((1/b + 0)/b + 1)/b
p101. (b) = (1 · b + 0) · b + 1
44 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Fixed-Point Numbers | Winter Term 2024/2025
Fixed-Point Numbers
Division with Remainder Method
45 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Fixed-Point Numbers | Winter Term 2024/2025
Fixed-Point Numbers
Division with Remainder Method
0.1 · 2 = 0.2
0.2 · 2 = 0.4
0.4 · 2 = 0.8
0.8 · 2 = 1.6
0.6 · 2 = 1.2
0.2 · 2 = 0.4
0.110 = 0.000112
a finite number of digits in the source system can mean an infinite number of digits in the target system
→ only an approximate representation in the target system is then possible!
46 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Fixed-Point Numbers | Winter Term 2024/2025
Fixed-Point Numbers
Q Number Format
Self Check
The binary representation of a fixed-point number is given as 1011001. Specify the decimal representation of this number if
the number uses the UQ3.4 format
the number uses the Q3.4 format
47 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Fixed-Point Numbers | Winter Term 2024/2025
Foundations of
Computer Science
Chapter 2.4 Computer Arithmetic
Floating-Point Numbers
Prof. Dr.-Ing. Richard Membarth
integer and fixed-point numbers do not represent very small or very large number well
1 exaFLOP/s: 1 · 1018 floating-point operations per second (FLOP/s)
electron mass: 9.1093837015(28) × 10−31 kg
scientific notation: 6.0210 × 1023 11101.0112 × 2−1001
n = significand × baseexponent
1.2345 = 12345 × 10−4
base: 10, significand: 12345, exponent: −4
representation of floating-point numbers is defined by the IEEE-754 norm
(−1)s × (1 + f ) × 2e
s: sign, m: mantissa, e: exponent
49 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Floating-Point Numbers | Winter Term 2024/2025
Floating-Point Numbers
Representation
general form
±1.sssss2 × 2eeee or (−1)s × (1 + f ) × 2e
sign (s): uses sign-magnitude representation, 0 for positive, 1 for negative
mantissa (f): uses binary number representation for fractional part
1 + f is called significand
normalized if exactly “1” is in front of the point
1.1011011: normalized mantissa (1 ≤ 1 + f < 2)
otherwise denormalized
0.0011011: denormalized mantissa
111.1110111: denormalized mantissa
IEEE-754 standard requires normalized mantissa
⇒ the msb of the mantissa is always 1 and is *not* stored (hidden bit)
exponent (e): uses biased binary number representation for exponent
50 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Floating-Point Numbers | Winter Term 2024/2025
Floating-Point Numbers
Exponent
smallest representable non-negative number is approximately 2.0 × 10−38 (float) and 2.0 × 10−308 (double)
largest representable number is approximately 2.0 × 1038 (float) and 2.0 × 10308 (double)
what if the number is outside of that range?
1 . 0 1 1 0 ×2127 1 . 0 0 1 1 ×2−124
+ 1 . 1 1 0 1 ×2127 − 1 . 0 0 0 1 ×2−124
1 1 1 carry carry
1 1 . 0 0 1 1 ×2127 0 . 0 0 1 0 ×2−124
34.510 = 100010.12
100010.12 × 20 = 1.0001012 × 25
s: 0 m: 1.0001012 e: 510 + 12710 = 13210 = 100001002
fp32 0 1 0 0 0 0 1 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
53 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Floating-Point Numbers | Winter Term 2024/2025
Floating-Point Numbers
Conversion
fp32 1 1 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1|10000100|01010000000000000000000
0|00011011|00100111100111011100110
n = (−1)0 × (20 + 2−3 + 2−6 + 2−7 + 2−8 + 2−9 + 2−12 + 2−13 + 2−14 + 2−16 + 2−17 + 2−18 + 2−21 + 2−22 ) × 227−127
= 1 × (20 + 2−3 + 2−6 + 2−7 + 2−8 + 2−9 + 2−12 + 2−13 + 2−14 + 2−16 + 2−17 + 2−18 + 2−21 + 2−22 ) × 2−100
= (2−100 + 2−97 + 2−94 + 2−93 + 2−92 + 2−91 + 2−88 + 2−87 + 2−86 + 2−84 + 2−83 + 2−82 + 2−79 + 2−78 )
= · · · = 9.1093837015 × 10−31
54 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Floating-Point Numbers | Winter Term 2024/2025
Floating-Point Numbers
Special Numbers
the value range of floating-point numbers does not include 0, ±∞, and NaN
for those, special encodings are reserved
0: all bits for e and f set to 0
0x00000000 = 0
0x80000000 = 0
∞: all bits for e set to 1 and for f to 0, the sign bit indicates +∞ or −∞
0x7F 800000 = +∞
0xFF 800000 = −∞
NaN: all bits for e and f set to 1
0x7FFFFFFF = NaN
0xFFFFFFFF = NaN
55 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Floating-Point Numbers | Winter Term 2024/2025
Floating-Point Numbers
Value Range
−∞ 0 +∞
−∞
−63 0 +∞
64
56 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Floating-Point Numbers | Winter Term 2024/2025
Floating-Point Numbers
Addition
1 0 . 0 0 0 0 1 1 0 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 ×26
1 . 0 0 0 0 0 1 1 0 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 ×27 normalize
1 . 0 0 0 0 0 1 1 0 1 0 0 1 1 0 0 1 1 0 0 1 1 0 1 ×27 round
fp32 1 1 0 0 0 0 1 1 0 0 0 0 0 0 1 1 0 1 0 0 1 1 0 0 1 1 0 0 1 1 0 1
57 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Floating-Point Numbers | Winter Term 2024/2025
Floating-Point Numbers
Addition
1 . 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 ×22 1 . 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 ×22
+ 0 . 1 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ×22
− 1 . 0 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ×22 1 1 1 1 1 1 1 0 carry
0 . 0 0 0 0 0 0 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 ×22
1 . 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 0 0 0 0 0 ×2−5 normalize
1 . 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 0 0 0 0 0 ×2−5 round
fp32 0 0 1 1 1 1 0 1 0 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 0 0 0 0 0
58 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Floating-Point Numbers | Winter Term 2024/2025
Floating-Point Numbers
Rounding
59 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Floating-Point Numbers | Winter Term 2024/2025
Floating-Point Numbers
Multiplication, Division
multiplication
multiply mantissas
add exponents
division
divide mantissas
subtract exponents
60 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Floating-Point Numbers | Winter Term 2024/2025
Floating-Point Numbers
Recap: Boolean Algebra
commutativity
x ∧y =y ∧x
x ∨y =y ∨x
associativity
(x ∧ y ) ∧ z = x ∧ (y ∧ z )
(x ∨ y ) ∨ z = x ∨ (y ∨ z )
distributivity
x ∧ (y ∨ z ) = x ∧ y ∨ x ∧ z
x ∨ (y ∧ z ) = x ∨ y ∧ x ∨ z
De Morgan’s law
x ∧y =x ∨y
x ∨y =x ∧y
61 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Floating-Point Numbers | Winter Term 2024/2025
Floating-Point Numbers
Associativity
x = −1.100000 × 2100
y = +1.100000 × 2100
z = +1.000000
what is x + (y + z) ?
y + z : 1.1000000 × 2100 + 0.0000000 × 2100 = 1.1000000 × 2100 → x + (y + z ) = 0.0
what is (x + y) + z ?
x + y = 0.0 → (x + y ) + z = 1.0
x + (y + z ) ̸= (x + y ) + z → floating-point addition is *not* associative!
floating-point addition is *not* distributive: x · (y + z ) ̸= (x · y ) + (x · z )
floating-point addition is commutative: x + y = y + x and x × y = y × x
what does that mean when we add x1 + x2 + · · · + xn in parallel?
62 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Floating-Point Numbers | Winter Term 2024/2025
Floating-Point Numbers
Endianness
IEEE-754 defines the representation of 16-bit, 32-bit, and 64-bit floating-point numbers, but not the order in which the
4 or 8 bytes of a number are stored
the endianness of a processor defines the order it stores the bytes of a word (byte sex)
little-endian: the least significant bit is stored at the lowest memory address
big-endian: the most significant bit is stored at the lowest memory address
some processors support big- and little-endian (e.g., ARM, MIPS)
memory addresses
fp32 0 0 1 1 1 1 0 1 0 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 0 0 0 0 0
3 D 1 9 9 9 8 016
63 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Floating-Point Numbers | Winter Term 2024/2025
Floating-Point Numbers
Patriot Failure at Dhahran
64 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Floating-Point Numbers | Winter Term 2024/2025
Floating-Point Numbers
IEEE-754 Floating-Point Formats
65 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Floating-Point Numbers | Winter Term 2024/2025
Floating-Point Numbers
New Floating-Point Formats
Self Check
We consider the values that can be represented using FP4 (E2M1) with 1 bit for the sign, 2 bits for the exponent, and 1 bit for the
mantissa. List the values that can be represented if
encodings are reserved for 0, ∞, and NaN
encodings are only reserved for 0
67 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Floating-Point Numbers | Winter Term 2024/2025