0% found this document useful (0 votes)

40 views67 pages

02 Computer Arithmetic

Chapter 2 of 'Foundations of Computer Science' covers Computer Arithmetic, focusing on numeral systems, signed and unsigned numbers, fixed-point, and floating-point representations. It explains various numeral systems, including polyadic systems and binary, octal, decimal, and hexadecimal representations, along with methods for number conversion. Additionally, it discusses the encoding of negative numbers using sign-magnitude, one’s complement, and two’s complement representations.

Uploaded by

lucky25hada

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

40 views67 pages

02 Computer Arithmetic

Uploaded by

lucky25hada

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

Foundations of

Computer Science
Chapter 2 Computer Arithmetic

Prof. Dr.-Ing. Richard Membarth

October 16th 2024

Contents

2. Computer Arithmetic

2.1 Numeral Systems

2.2 Signed and Unsigned Numbers

2.3 Fixed-Point Numbers

2.4 Floating-Point Numbers

2 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic | Winter Term 2024/2025
Foundations of
Computer Science
Chapter 2.1 Computer Arithmetic
Numeral Systems
Prof. Dr.-Ing. Richard Membarth

October 16th 2024

Numeral Systems

artificial system to quantify amount of elements in a set

consists of
set of symbols, a.k.a. digits
rules to combine digits

set of elements representation of size

7 decimal system
VII Roman system
Maya system

25
||||| || line (unary) system

4 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Numeral Systems | Winter Term 2024/2025
Numeral Systems
Classification

numeral systems

polyadic system
sign-value notation hybrid systems
positional system

Roman system unary system Maya system decimal system binary system
... ...

2
LXXX ||||| || 42 101

5 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Numeral Systems | Winter Term 2024/2025
Numeral Systems
Polyadic System

Polyadic System

The polyadic system is a numeral system in which the contribution of a digit to the value of a number depends on its position. A
polyadic system to the base b is a system in which a number is formed as a sum of powers of b.
n
X
A natural number N can be represented as a sum of powers: N = ai · bi
i =0

with b as base of the number system: b ∈ N, b ≥ 2

and set of b digits (symbols!) ∈ {0, 1, . . . , b − 1}
any number can be represented uniquely in the base-b numeral system
representation of a natural number N
N = (an an−1 · · · a1 a0 )b

6 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Numeral Systems | Winter Term 2024/2025
Numeral Systems
Polyadic System

system base symbols used by humans used in computers

decimal 10 0, 1, . . . , 9 yes no
binary 2 0, 1 no yes
octal 8 0, 1, . . . , 7 no no
hexadecimal 16 0, 1, . . . , 9, A, B, . . . , F no no

decimal system: b = 10(10) = (10)10 = 1010 = 10ten

example: 1999
binary system: b = 210
example: 00102 = 0b0010
octal system: b = 810
example: 1000768
hexadecimal system: b = 1610
example: FF 0AC516 = 0xFF 0AC5

7 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Numeral Systems | Winter Term 2024/2025
Numeral Systems
Binary System

binary system
decimal system octal system binary system
b=2 102 101 100 82 81 80 24 23 22 21 20
b ∈ {0, 1} 0 0 0
polyadic representation 1 1 1
2 2 1 0
overflow to next position if value range is exhausted
3 3 1 1
can be efficiently implemented in hardware
4 4 1 0 0
(represented and processed) 5 5 1 0 1
basic numeral system for number processing in 6 6 1 1 0
7 7 1 1 1
digital systems
8 1 0 1 0 0 0
9 1 1 1 0 0 1
1 0 1 2 1 0 1 0
... ... ... ... ...
1 9 2 3 1 0 0 1 1
2 0 2 4 1 0 1 0 0
8 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Numeral Systems | Winter Term 2024/2025
Numeral Systems
Polyadic System

examples
decimal system: 376110
376110 = 3761 = 3 · 103 + 7 · 102 + 6 · 101 + 1 · 100
octal system: 37618
37618 = 3 · 83 + 7 · 82 + 6 · 81 + 1 · 80 = 2033
| {z }
number shown as decimal
binary system: 1001012
1001012 = 1 · 25 + 0 · 24 + 0 · 23 + 1 · 22 + 0 · 21 + 1 · 20 = |{z}
37
number shown as decimal
hexadecimal system 3E116
3E116 = 3 · 162 + 14 · 161 + 1 · 160 = |{z}
993
number shown as decimal

9 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Numeral Systems | Winter Term 2024/2025
Numeral Systems
Horner’s Scheme

n
X
b-adic representation defines a polynomial: N = ai · bi
i =0
polynomial Horner’s scheme
p(b) = a0 + a1 · b + a2 · b2 + · · · + an · bn p(b) = (· · · (an · b + an−1 ) · b + · · · ) · b + a0
multiplications, additions, and only multiplications and additions, no
exponentiations exponentiations
expensive to compute more efficient computation
problems with accuracy (for floating-point can be used to convert between different
numbers) positional numeral systems
1 p = 0 1 p = 0
2 for i in range(0, n): 2 for i in range(n-1, -1, -1):
3 e = 1 3 p = p * b + a[i]
4 for k in range(1, i):
5 e = e * b
6 p = p + a[i] * e

10 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Numeral Systems | Winter Term 2024/2025
Numeral Systems
Horner’s Scheme

number conversion using the Horner’s scheme

convert from a base b1 to b2 ((((1 · 2 + 1) · 2 + 0) · 2 + 1) · 2 + 0) · 2 + 1
computation happens in b2
(((3 · 2 + 0) · 2 + 1) · 2 + 0) · 2 + 1
example: 1101012 → 5310
((6 · 2 + 1) · 2 + 0) · 2 + 1
polynomial
(13 · 2 + 0) · 2 + 1
p110101 (b) = 1 · b0 + 0 · b1 + 1 · b2 + 0 · b3 + 1 · b4 + 1 · b5
26 · 2 + 1
Horner’s scheme
53
p110101 (b) = ((((1 · b + 1)· b + 0)· b + 1)· b + 0)· b + 1

11 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Numeral Systems | Winter Term 2024/2025
Numeral Systems
Division with Remainder Method

number conversion using successive division with remainder

Euclidean division 157/2 = 78 remainder 1
convert from a base b1 to b2
78/2 = 39 remainder 0
computation happens in b1
39/2 = 19 remainder 1
example: 15710 → 1001 11012
19/2 = 9 remainder 1
it is often easier to convert first to the decimal system and then
9/2 = 4 remainder 1
to the desired system
4/2 = 2 remainder 0

2/2 = 1 remainder 0

1/2 = 0 remainder 1

12 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Numeral Systems | Winter Term 2024/2025
Numeral Systems
Number Conversion

7258 to base 3 using the Horner’s scheme

78 = 213 ; 28 = 23 ; 58 = 123 ; 108 = 810 = 223
7258 = (213 · 223 + 23 ) · 223 + 123
213 · 223 = 20023
20023 + 23 = 20113
20113 · 223 = 1220123
1220123 + 123 = 1221013
889 to base 5 using division with remainder
889 /59 = 179 R 09
179 /59 = 39 R 19
39 /59 = 09 R 39
889 = 3105

13 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Numeral Systems | Winter Term 2024/2025
Numeral Systems
Conversion by Combination

if the base of a numeral system is the power of the base of another numeral system, numbers can be easily converted
combining several digits into a single digit
converting 1011 0101 00012 to hexadecimal
( 1011 0101 0001 )2
⇓ ⇓ ⇓
( B 5 1 )16
converting 5218 to binary
( 5 2 1 )8
⇓ ⇓ ⇓
( 101 010 001 )2

14 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Numeral Systems | Winter Term 2024/2025
Numeral Systems
Conversion of Digits

it is much easier to convert between numeral systems when considering each digits separately
this is not possible when converting from the decimal to the binary system
remedy: BCD codes
each decimal digit is represented by a 4-bit binary digit
4910 = 0100 1001BCD 8421

15 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Numeral Systems | Winter Term 2024/2025
Foundations of
Computer Science
Chapter 2.2 Computer Arithmetic
Signed and Unsigned Numbers
Prof. Dr.-Ing. Richard Membarth

October 17th 2024

Signed and Unsigned Numbers
Negative Numbers

negative numbers are prefixed by minus sign: 2510 , −2510 , −10102 , +10102 , · · ·
how should we encode the numbers / sign?
different types of signed binary number representations
sign-magnitude
one’s complement
two’s complement
elementary arithmetic supported in all systems: +, −, ∗, /
technical realization: costs (area) and performance (throughput) are important

17 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Signed and Unsigned Numbers | Winter Term 2024/2025
Signed and Unsigned Numbers
Negative Numbers

sign-magnitude
010101000011
simple approach 0110 5 4 3 0010
use sign bit with magnitude 6 2
0111 0001
7 1
left-most bit (most significant bit) indicates sign 4 bit
1000 -0 0 0000
0: positive number -7
-1
1001 1111
1: negative number -2 -6
1010 -3 -4 -5 1110
signed magnitude
101111001101
↑ | {z } 2 = 1210
0 0001100 | {z } 2 = −1210
1 0001100
↑
| magnitude | magnitude
sign bit sign bit
zero represented twice
000000002 (+0)
100000002 (−0)

18 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Signed and Unsigned Numbers | Winter Term 2024/2025
Signed and Unsigned Numbers
Negative Numbers

one’s complement
010101000011
obtained by inverting all bits 0110 5 4 3 0010
4310 = 001010112 → −4310 = 110101002 6 2
0111 0001
7 1
range of signed numbers: −(2N −1 − 1) · · · 2N −1 − 1 4 bit
1000 -7 0 0000
8-bit range: −127 · · · + 127
-6 -0
1001 1111
zero represented twice -5 -1
1010 -4 -3 -2 1110
000000002 (+0)
101111001101
111111112 (−0)

19 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Signed and Unsigned Numbers | Winter Term 2024/2025
Signed and Unsigned Numbers
Negative Numbers

two’s complement
010101000011
obtained by inverting all bits and adding one 0110 5 4 3 0010
4310 = 001010112 → −4310 = 110101012 6 2
0111 0001
7 1
range of signed numbers: −(2N −1 ) · · · 2N −1 − 1 4 bit
1000 -8 0 0000
8-bit range: −128 · · · + 127
-7 -1
1001 1111
zero represented once -6 -2
1010 -5 -4 -3 1110
000000002
101111001101
converting to decimal
−27 +26 +24 +22 +20 = −128+64+16+4+1 = −43

20 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Signed and Unsigned Numbers | Winter Term 2024/2025
Signed and Unsigned Numbers
Logical Operators

the execution unit of a processor supports different operations on binary numbers

arithmetic operations: +, −, ∗, /
logical operations: NOT, AND, OR, XOR
shift operations: shift left, shift right
logical operations work bitwise
bits are updated in parallel and independent of each other
no polyadic system, no carry-over

operation symbol C syntax meaning

NOT a ¬ ~a invert all bits of a (one’s complement)

a AND b ∧ a & b logical AND operation on each pair of corresponding bits
a OR b ∨ a | b logical OR operation on each pair of corresponding bits
a XOR b ⊕ a ^ b logical XOR operation on each pair of corresponding bits

21 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Signed and Unsigned Numbers | Winter Term 2024/2025
Signed and Unsigned Numbers
Logical Operators

a ¬a a b a∧b a b a∨b a b a⊕b

0 1 0 0 0 0 0 0 0 0 0
1 0 0 1 0 0 1 1 0 1 1
1 0 0 1 0 1 1 0 1
1 1 1 1 1 1 1 1 0

a = 01010101 a = 01010101 a = 01010101 a = 01010101

b = 11110000 b = 11110000 b = 11110000

¬a = 10101010 a ∧ b = 01010000 a ∨ b = 11110101 a ⊕ b = 10100101

22 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Signed and Unsigned Numbers | Winter Term 2024/2025
Signed and Unsigned Numbers
Logical Operators

the logical operators !, &&, and || in C are wordwise logic operators

they combine n-ary binary numbers and return a n-digit zero or one
the bitwise logical operators ~, &, |, and ^ operate on all bits simultaneously
they return n single digit zeros or ones

(5 & (-9)) | 2 -> 7 (5 && (-9)) || 2 -> 1

5 00000101 5 00000101

−9 11110111 −9 11110111

5&(−9) 00000101 5&&(−9) 00000001

2 00000010 2 00000010

(5&(−9))|2 00000111 (5&&(−9))|2 00000001

23 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Signed and Unsigned Numbers | Winter Term 2024/2025
Signed and Unsigned Numbers
Arithmetic

addition addition can be realized as known from school

the binary system requires only a few rules
0 0 0 0 0 1 0 0 0 1 0 1 (6910 )
1) 0 + 0 = 0 + 0 0 0 0 0 0 0 0 1 1 0 0 (1210 )
2) 0 + 1 = 1 1 1 carry

3) 1 + 0 = 1 0 0 0 0 0 1 0 1 0 0 0 1 (8110 )

4) 1 + 1 = 0 (with a carry-over of 1)
two’s complement: carry beyond the msb is
ignored

24 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Signed and Unsigned Numbers | Winter Term 2024/2025
Signed and Unsigned Numbers
Arithmetic

subtraction subtraction can be realized as known from school

the binary system requires only a few rules
0 0 0 0 0 1 0 0 0 1 0 1 (6910 )
1) 0 − 0 = 0 − 0 0 0 0 0 0 0 0 1 1 0 0 (1210 )
2) 0 − 1 = 1 (with a borrow of 1) 1 1 borrow

3) 1 − 0 = 1 0 0 0 0 0 0 1 1 1 0 0 1 (5710 )

4) 1 − 1 = 0
two’s complement: subtraction via addition

25 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Signed and Unsigned Numbers | Winter Term 2024/2025
Signed and Unsigned Numbers
Arithmetic

subtraction: using two’s complement

overflow when taking the two’s complement of 0 is 0 0 0 0 0 1 0 0 0 1 0 1 (6910 )
ignored + 1 1 1 1 1 1 1 1 0 1 0 0 (−1210 )
1 1 1 1 1 1 1 carry
how can we get the value of a negative number
0 0 0 0 0 0 1 1 1 0 0 1 (5710 )
using two’s complement?
(x + ¬x ) + 1 = 0
¬x + 1 = −x
0 0 0 0 0 0 0 0 1 1 0 0 (1210 )
1111 1100 01112 →
+ 1 1 1 1 1 0 1 1 1 0 1 1 (−6910 )
0000 0011 10012 → −5710 1 1 1 carry

1 1 1 1 1 1 0 0 0 1 1 1 (−5710 )

26 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Signed and Unsigned Numbers | Winter Term 2024/2025
Signed and Unsigned Numbers
Arithmetic

multiplication
can be realized by shifting and summing up
multiplying n-bit number with m-bit number results in n+m-bit number

1 1 0 1 (1310 )
× 1 0 1 (510 )
1 1 0 1
+ 0 0 0 0
1 1 0 1
+ 1 1 0 1
1 0 0 0 0 0 1 (6510 )

27 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Signed and Unsigned Numbers | Winter Term 2024/2025
Signed and Unsigned Numbers
Arithmetic

division
using Euclidean’s algorithm 1 1 0 1 / 1 0 1 = 10 (1310 /510 = 210 )
− 1 0 1 0
1 # c = n / d, remainder r
2 r = n 0 1 1
3 c = 0 − 0 0 0
4 while r >= d:
5 r = r - d 1 1 remainder (310 )
6 c = c + 1

28 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Signed and Unsigned Numbers | Winter Term 2024/2025
Signed and Unsigned Numbers
Adder

circuit for binary addition A3 B3 A2 B2 A1 B1 A0 B0

nested for multiple bits
full adder required for more than 2 bits
+ + + +

S3 C3 S2 C2 S1 C1 S0

29 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Signed and Unsigned Numbers | Winter Term 2024/2025
Signed and Unsigned Numbers
Status Register

the adder sets several status registers for the next instruction or software
N negative (sign) flag: indicates that the result is negative
Z zero flag: indicates that the result is zero
C carry flag: allows to carry a binary digit from a less significant word to the LSB of a more significant word
V overflow flag: indicates that the signed result of an operation is too large to fit in the register width
the carry flag can be used to indicate overflow for unsigned addition
the overflow flag is set if the sign of the summands are equal and different from the sign of the result

0 1 1 1 1 1 1 1 (12710 ) 1 0 0 1 1 0 0 1 ( − 10310 )
+ 0 0 0 0 0 0 0 1 (110 ) + 1 0 0 0 1 1 1 1 ( − 11310 )
1 1 1 1 1 1 1 carry 1 1 1 1 1 1 carry

1 0 0 0 0 0 0 0 (−12810 ) 0 0 1 0 1 0 0 0 (4010 )
N: 1 Z : 0 C: 0 V : 1 N: 0 Z : 0 C: 1 V : 1

8-bit signed integer (two’s complement) 8-bit signed integer (two’s complement)
30 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Signed and Unsigned Numbers | Winter Term 2024/2025
Signed and Unsigned Numbers
Status Register

1 1 1 1 1 1 1 1 ( − 110 ) 1 1 1 1 1 1 1 1 (25510 )
+ 1 1 1 1 1 1 1 1 ( − 110 ) + 1 1 1 1 1 1 1 1 (25510 )
1 1 1 1 1 1 1 1 carry 1 1 1 1 1 1 1 1 carry

1 1 1 1 1 1 1 0 (−210 ) 1 1 1 1 1 1 1 0 (25410 )
N: 1 Z : 0 C: 1 V : 0 N: 1 Z : 0 C: 1 V : 0

8-bit signed integer (two’s complement) 8-bit unsigned integer

31 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Signed and Unsigned Numbers | Winter Term 2024/2025
Signed and Unsigned Numbers
Data Types

what does 100001002 represent? 1 char c; 1 int8_t c;

2 short s; 2 int16_t s;
8410 ? −410 ? z ? 3 int i; 3 int32_t i;
4 unsigned int u; 4 uint32_t u;
we need to specify the data type!
languages like Python do not expose data types finite arithmetic: overflow and underflow
C/C++ allows to specify data types

C type stdint type bits sign range

char uint8_t 8 unsigned 0 . . . 255
signed char int8_t 8 signed -128 . . . 127
unsigned short uint16_t 16 unsigned 0 . . . 65,535
short int16_t 16 signed -32,768 . . . 32,767
unsigned int uint32_t 32 unsigned 0 . . . 4,294,967,295
int int32_t 32 signed -2,147,483,648 . . . 2,147,483,647
unsigned long long uint64_t 64 unsigned 0 . . . 18,446,744,073,709,551,615
long long int64_t 64 signed -9,223,372,036,854,775,808 . . . 9,223,372,036,854,775,807

32 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Signed and Unsigned Numbers | Winter Term 2024/2025
Signed and Unsigned Numbers
Hex Dumps

hex dump or hex editor allow to inspect and edit raw data
first column: displacement (offset) in file or memory in hexadecimal format
second column: raw data in hexadecimal format, 16 bytes per row
third column (optional): ASCII character translation

1 00000000: 4761 6c6c 6961 2065 7374 206f 6d6e 6973 Gallia est omnis
2 00000010: 2064 6976 6973 6120 696e 2070 6172 7465 divisa in parte
3 00000020: 7320 7472 6573 2c20 7175 6172 756d 2075 s tres, quarum u
4 00000030: 6e61 6d20 696e 636f 6c75 6e74 2042 656c nam incolunt Bel
5 00000040: 6761 652c 2061 6c69 616d 2041 7175 6974 gae, aliam Aquit
6 00000050: 616e 692c 2074 6572 7469 616d 2071 7569 ani, tertiam qui
7 00000060: 2069 7073 6f72 756d 206c 696e 6775 6120 ipsorum lingua
8 00000070: 4365 6c74 6165 2c20 6e6f 7374 7261 2047 Celtae, nostra G
9 00000080: 616c 6c69 2061 7070 656c 6c61 6e74 7572 alli appellantur
10 00000090: 2e20 4869 206f 6d6e 6573 206c 696e 6775 . Hi omnes lingu
11 000000a0: 612c 2069 6e73 7469 7475 7469 732c 206c a, institutis, l
12 000000b0: 6567 6962 7573 2069 6e74 6572 2073 6520 egibus inter se
13 000000c0: 6469 6666 6572 756e 742e 2047 616c 6c6f differunt. Gallo
14 ...
33 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Signed and Unsigned Numbers | Winter Term 2024/2025
Signed and Unsigned Numbers
Data Types

type conversions might require an extension of the operand length

the sign of signed operands needs to be preserved
unsigned operands are extended by leading zeros
signed operands copy the sign bit to the leading positions

1 unsigned char c = 0x80; 1 char c = 0x7F; 1 unsigned char c = 0x80;

2 unsigned char d = 0x80; 2 char d = 0x80; 2 char d = 0x80;
3 printf("%08X", c + d); 3 printf("%08X", c + d); 3 printf("%08X", c + d);

1 c: 00000080 1 c: 0000007F 1 c: 00000080

2 d: 00000080 2 d: FFFFFF80 2 d: FFFFFF80
3 -> 00000100 3 -> FFFFFFFF 3 -> 00000000

34 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Signed and Unsigned Numbers | Winter Term 2024/2025
Signed and Unsigned Numbers
Data Types

1 #include <stdio.h> overflow / underflow

2
3 void main() { addition / subtraction can overflow
4 int x = 2;
multiplication can overflow
5 for (int i = 0; i < 5; i++) {
6 x = x * x; precision loss
7 printf("%d, ", x);
division can lose precision (rounding, truncation)
8 }
9 printf("\n");
10 }

35 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Signed and Unsigned Numbers | Winter Term 2024/2025
Signed and Unsigned Numbers
GPS Week Number Rollover

GPS broadcasts also a time, starting from January 6th, 1980

10-bit week number
19-bit seconds-into-week
week number counter range: 0–1023
integer overflow causes the counter to roll over every 1024 weeks or ~19.6 years
1st rollover August 21st to 22nd, 1999
2nd rollover April 6th to 7th, 2019
3rd rollover November 20th to 21st, 2038
...
many devices rely on GPS time, resulting in problems
Honeywell’s flight management and navigation software
New York City Wireless Network (NYCWiN)
GPS navigation devices, mobile phones (Apple), car time (Honda, Acura), . . .

36 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Signed and Unsigned Numbers | Winter Term 2024/2025
Signed and Unsigned Numbers
Y2K22

dates that are stored in the format yymmddHHMM

31.12.2021, 23:59 → 2112312359 → 01111101 11100111 01010100 00100111
01.01.2022, 00:00 → 2201010000 → 10000011 00110000 10111111 01010000
problem when using 32-bit signed integer to represent dates
Microsoft Exchange was affected by this
malware-scanning component used 32-bit signed integer to check for latest update
became known as Y2K22 bug

37 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Signed and Unsigned Numbers | Winter Term 2024/2025
Signed and Unsigned Numbers
Saturation

normal behavior
overflow / underflow silently tolerated → often incorrect result
cause traps for exception handling → expensive
often it is enough to continue the calculating with the highest (overflow) or lowest (underflow) value
example
z = (a + b)/2
a = 7, b = 9 → z = (7 + 9)/2 = 16/2 = 8
calculation with 4-bit unsigned integer

correct overflow saturated

a 0111 0111 0111
b 1001 1001 1001
a+b 10000 0000 1111
(a + b) / 2 1000 0000 0111

38 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Signed and Unsigned Numbers | Winter Term 2024/2025
Signed and Unsigned Numbers
Saturation

Ariane 5 (V88) (1996)

overflow during converting a 64-bit floating-point value to 16-bit integer
maximum thrust, then self-destruction of the rocket (39s after launch)
loss: 290 mio Euro
root cause: wrong specification (mistakenly copied over by Ariane 4)
[Link]

39 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Signed and Unsigned Numbers | Winter Term 2024/2025
Foundations of
Computer Science
Chapter 2.3 Computer Arithmetic
Fixed-Point Numbers
Prof. Dr.-Ing. Richard Membarth

October 24th 2024

Fixed-Point Numbers
Numeral Systems

Recap: Polyadic System

n
X
A natural number N can be represented as a sum of powers: N = ai · bi
i =0

can we also represent non-integer values?

n
X
also use negative values for i: R = ai · b i
i =−m
representation of rational number R
R = (an an−1 an−2 · · · a1 a0 . a−1 a−2 · · · a−m )b
| {z } | {z }
integer part of R fractional part of R
example
n = 2; m = 5; R = 011.110012
011.110012 = 1 · 21 + 1 · 20 + 1 · 2−1 + 1 · 2−2 + 1 · 2−5 = 2 + 1 + 0.5 + 0.25 + 0.03125 = 3.7812510

41 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Fixed-Point Numbers | Winter Term 2024/2025
Fixed-Point Numbers
Q Number Format

specifies the parameters of a binary fixed-point number

n number of bits before binary point
m number of bits after binary point
Qn.m signed fixed-point number stored using two’s complement
UQn.m unsigned fixed-point number
Arm Q-format documentation
[Link]
example
011.110012 = 3.7812510 uses the Q number format UQ3.5 or Q3.5
11.110012 = 3.7812510 uses the Q number format UQ3.5

42 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Fixed-Point Numbers | Winter Term 2024/2025
Fixed-Point Numbers
Q Number Format

unsigned fixed-point number signed fixed-point number using two’s complement

example: 4-bit in UQ2.2 format example: 4-bit in Q2.2 format

010101000011 010101000011
0110 1.25 1.00 0.75 0010 0110 1.25 1.00 0.75 0010
1.50 0.50 1.50 0.50
0111 0001 0111 0001
1.75 0.25 1.75 0.25
4 bit 4 bit
1000 2.00 0.00 0000 1000 -2.00 0.00 0000
UQ2.2 Q2.2
2.25 3.75 -1.75 -0.25
1001 1111 1001 1111
2.50 3.50 -1.50 -0.50

1010 2.75 3.00 3.25 1110 1010 -1.25 -1.00 -0.75

1110
101111001101 101111001101

fixed-point representation, the point is at a fixed position

+ hardware units for integer numbers can be used
− only fixed and limited value range

43 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Fixed-Point Numbers | Winter Term 2024/2025
Fixed-Point Numbers
Horner’s Scheme

number conversion using the Horner’s scheme

convert integer and fractional parts separately Rf = ((1/2 + 0)/2 + 1)/2
fractional part
= (0.5/2 + 1)/2
p(b) = (· · · ((a−m /b + a−m+1 ) /b + · · · ) /b + a−1 ) /b
= 1.25/2
integer part
= 0.625
p(b) = (· · · (an · b + an−1 ) · b + · · · ) · b + a0
Ri = (1 · 2 + 0) · 2 + 1
example: 101.1012 → 5.62510
=2·2+1
polynomial
p.101 (b) = 1 · b−3 + 0 · b−2 + 1 · b−1 =5

p101. (b) = 1 · b2 + 0 · b1 + 1 · b0 R = Ri .Rf = 5.625

Horner’s scheme
p.101 (b) = ((1/b + 0)/b + 1)/b
p101. (b) = (1 · b + 0) · b + 1

44 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Fixed-Point Numbers | Winter Term 2024/2025
Fixed-Point Numbers
Division with Remainder Method

convert integer and fractional parts separately

fractional part: successive multiplication with remainder
integer part: successive division with remainder
example: 12.687510 → 1100.10112
integer part → successive division fractional part → successive multiplication

12/2 = 6 remainder 0 0.6875 · 2 = 1.375

6/2 = 3 remainder 0 0.375 · 2 = 0.75

3/2 = 1 remainder 1 0.75 · 2 = 1.5

1/2 = 0 remainder 1 0.5 · 2 = 1.0

45 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Fixed-Point Numbers | Winter Term 2024/2025
Fixed-Point Numbers
Division with Remainder Method

successive division / multiplication with remainder

convert 0.110 to binary
fractional part → successive multiplication

0.1 · 2 = 0.2

0.2 · 2 = 0.4

0.4 · 2 = 0.8

0.8 · 2 = 1.6

0.6 · 2 = 1.2

0.2 · 2 = 0.4

0.110 = 0.000112
a finite number of digits in the source system can mean an infinite number of digits in the target system
→ only an approximate representation in the target system is then possible!

46 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Fixed-Point Numbers | Winter Term 2024/2025
Fixed-Point Numbers
Q Number Format

Self Check

The binary representation of a fixed-point number is given as 1011001. Specify the decimal representation of this number if
the number uses the UQ3.4 format
the number uses the Q3.4 format

47 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Fixed-Point Numbers | Winter Term 2024/2025
Foundations of
Computer Science
Chapter 2.4 Computer Arithmetic
Floating-Point Numbers
Prof. Dr.-Ing. Richard Membarth

October 30th 2024

Floating-Point Numbers

integer and fixed-point numbers do not represent very small or very large number well
1 exaFLOP/s: 1 · 1018 floating-point operations per second (FLOP/s)
electron mass: 9.1093837015(28) × 10−31 kg
scientific notation: 6.0210 × 1023 11101.0112 × 2−1001
n = significand × baseexponent
1.2345 = 12345 × 10−4
base: 10, significand: 12345, exponent: −4
representation of floating-point numbers is defined by the IEEE-754 norm
(−1)s × (1 + f ) × 2e
s: sign, m: mantissa, e: exponent

float s e: 8 bit m: 23 bit

double s e: 11 bit m: 52 bit

49 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Floating-Point Numbers | Winter Term 2024/2025
Floating-Point Numbers
Representation

general form
±1.sssss2 × 2eeee or (−1)s × (1 + f ) × 2e
sign (s): uses sign-magnitude representation, 0 for positive, 1 for negative
mantissa (f): uses binary number representation for fractional part
1 + f is called significand
normalized if exactly “1” is in front of the point
1.1011011: normalized mantissa (1 ≤ 1 + f < 2)
otherwise denormalized
0.0011011: denormalized mantissa
111.1110111: denormalized mantissa
IEEE-754 standard requires normalized mantissa
⇒ the msb of the mantissa is always 1 and is *not* stored (hidden bit)
exponent (e): uses biased binary number representation for exponent

50 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Floating-Point Numbers | Winter Term 2024/2025
Floating-Point Numbers
Exponent

floating-point numbers ≥ 1 have a positive exponent

floating-point numbers < 1 have a negative exponent
negative exponents are not represented as two’s complement
instead, a fixed offset 2n−1 − 1 (bias) that corresponds to half of the value range is added
float: 8-bit exponent: 2−126 · · · 2127 → excess-127
double: 11-bit exponent 2−1022 · · · 21023 → excess-1023
float e e+127 double e e+1023
0.0 0 0.0 0
-126 1 -1022 1
... ... ... ...
-1 126 -1 1022
0 127 0 1023
1 128 1 1024
... ... ... ...
127 254 1023 2046
∞ 255 ∞ 2047
51 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Floating-Point Numbers | Winter Term 2024/2025
Floating-Point Numbers
Value Range, Overflow, and Underflow

float s e: 8 bit m: 23 bit

double s e: 11 bit m: 52 bit

smallest representable non-negative number is approximately 2.0 × 10−38 (float) and 2.0 × 10−308 (double)
largest representable number is approximately 2.0 × 1038 (float) and 2.0 × 10308 (double)
what if the number is outside of that range?

1 . 0 1 1 0 ×2127 1 . 0 0 1 1 ×2−124
+ 1 . 1 1 0 1 ×2127 − 1 . 0 0 0 1 ×2−124
1 1 1 carry carry

1 1 . 0 0 1 1 ×2127 0 . 0 0 1 0 ×2−124

→ 1 . 1 0 0 1 ×2128 overflow! → 1 . 0 0 0 0 ×2−127 underflow!

exponent can be −126 · · · 127 for float

52 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Floating-Point Numbers | Winter Term 2024/2025
Floating-Point Numbers
Conversion

we can convert a rational number to IEEE-754 as follows

1. convert integer and fractional parts to binary system

2. normalize mantissa by shifting, adapt exponent
shift m by one position to the left → decrement e by one
shift m by one position to the right → increment e by one
3. combine s, e + offset, and m to build the IEEE-754 representation

34.510 = 100010.12
100010.12 × 20 = 1.0001012 × 25
s: 0 m: 1.0001012 e: 510 + 12710 = 13210 = 100001002

fp32 0 1 0 0 0 0 1 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

53 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Floating-Point Numbers | Winter Term 2024/2025
Floating-Point Numbers
Conversion

fp32 1 1 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

we can convert a IEEE-754 number to a rational decimal number as follows

(−1)s × (1 + fraction) × 2exponent −bias

1|10000100|01010000000000000000000

n = (−1)1 × (20 + 2−2 + 2−4 ) × 2132−127 = −1 × (20 + 2−2 + 2−4 ) × 25

= −(25 + 23 + 21 ) = −(32 + 8 + 2) = −42

0|00011011|00100111100111011100110

n = (−1)0 × (20 + 2−3 + 2−6 + 2−7 + 2−8 + 2−9 + 2−12 + 2−13 + 2−14 + 2−16 + 2−17 + 2−18 + 2−21 + 2−22 ) × 227−127

= 1 × (20 + 2−3 + 2−6 + 2−7 + 2−8 + 2−9 + 2−12 + 2−13 + 2−14 + 2−16 + 2−17 + 2−18 + 2−21 + 2−22 ) × 2−100
= (2−100 + 2−97 + 2−94 + 2−93 + 2−92 + 2−91 + 2−88 + 2−87 + 2−86 + 2−84 + 2−83 + 2−82 + 2−79 + 2−78 )
= · · · = 9.1093837015 × 10−31
54 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Floating-Point Numbers | Winter Term 2024/2025
Floating-Point Numbers
Special Numbers

−2128 −2−126 2−126 2128

−∞ 0 +∞

the value range of floating-point numbers does not include 0, ±∞, and NaN
for those, special encodings are reserved
0: all bits for e and f set to 0
0x00000000 = 0
0x80000000 = 0
∞: all bits for e set to 1 and for f to 0, the sign bit indicates +∞ or −∞
0x7F 800000 = +∞
0xFF 800000 = −∞
NaN: all bits for e and f set to 1
0x7FFFFFFF = NaN
0xFFFFFFFF = NaN

55 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Floating-Point Numbers | Winter Term 2024/2025
Floating-Point Numbers
Value Range

7-bit floating-point number

1 bit sign, 3 bits exponent, 3 bits mantissa
numbers non-uniformly distributed between [−15; +15]
dynamic range, not uniformly spaced
difference between two consecutive representable numbers varies with exponent

−∞ 0 +∞

7-bit integer number

two’s complement
numbers uniformly distributed between [−63; +64]

−∞
−63 0 +∞
64

56 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Floating-Point Numbers | Winter Term 2024/2025
Floating-Point Numbers
Addition

adding two floating-point numbers with the same sign

1. adapt the smaller exponent to the larger one
shift the mantissa correspondingly (denormalizes the mantissa)
2. add the mantissas
3. normalize and round the result if required
−87.8f = 1010111.11002 × 20 = 1.01011111002 × 26
−43.5f = 101011.12 × 20 = 1.0101112 × 25 = 0.10101112 × 26
1 . 0 1 0 1 1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 ×26
+ 0 . 1 0 1 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ×26
1 1 1 1 1 1 1 1 carry

1 0 . 0 0 0 0 1 1 0 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 ×26
1 . 0 0 0 0 0 1 1 0 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 ×27 normalize
1 . 0 0 0 0 0 1 1 0 1 0 0 1 1 0 0 1 1 0 0 1 1 0 1 ×27 round

fp32 1 1 0 0 0 0 1 1 0 0 0 0 0 0 1 1 0 1 0 0 1 1 0 0 1 1 0 0 1 1 0 1
57 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Floating-Point Numbers | Winter Term 2024/2025
Floating-Point Numbers
Addition

adding two floating-point numbers with different sign

1. adapt the smaller exponent to the larger one

2. the result has the sign of the operand with the larger absolute value
3. subtraction of the number with the smaller absolute value from the number with the larger absolute value: the two’s
complement of the number with the smaller absolute value will be added the number with the larger absolute value
4. normalize and round the result if required

5.610 − 5.562510 = 101.10012 × 20 − 101.10012 × 20 = 1.0110012 × 22 − 1.0110012 × 22

1 . 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 ×22 1 . 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 ×22
+ 0 . 1 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ×22
− 1 . 0 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ×22 1 1 1 1 1 1 1 0 carry

0 . 0 0 0 0 0 0 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 ×22
1 . 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 0 0 0 0 0 ×2−5 normalize
1 . 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 0 0 0 0 0 ×2−5 round

fp32 0 0 1 1 1 1 0 1 0 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 0 0 0 0 0
58 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Floating-Point Numbers | Winter Term 2024/2025
Floating-Point Numbers
Rounding

1.1000110 × 26 + 1.0111010 ×22

0 0 0 0 0 1 1 0 1 0 0 1 1 0 0 1 1 0 0 1 1 0 1 GR S
=1.1000110 × 26 + 0.00010111010×26
the hardware uses an extended mantissa
G R S
guard-bit
1 . 1 0 0 0 1 1 0 0 0 0 ×26
round-bit
+ 0 . 0 0 0 1 0 1 1 1 0 1 ×26
sticky-bit 1 1 1 1 carry

rounding considers the extended mantissa 1 . 1 0 1 0 0 0 1 1 0 1 ×26

59 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Floating-Point Numbers | Winter Term 2024/2025
Floating-Point Numbers
Multiplication, Division

multiplication
multiply mantissas
add exponents
division
divide mantissas
subtract exponents

60 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Floating-Point Numbers | Winter Term 2024/2025
Floating-Point Numbers
Recap: Boolean Algebra

commutativity
x ∧y =y ∧x
x ∨y =y ∨x
associativity
(x ∧ y ) ∧ z = x ∧ (y ∧ z )
(x ∨ y ) ∨ z = x ∨ (y ∨ z )
distributivity
x ∧ (y ∨ z ) = x ∧ y ∨ x ∧ z
x ∨ (y ∧ z ) = x ∨ y ∧ x ∨ z
De Morgan’s law
x ∧y =x ∨y
x ∨y =x ∧y

61 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Floating-Point Numbers | Winter Term 2024/2025
Floating-Point Numbers
Associativity

x = −1.100000 × 2100
y = +1.100000 × 2100
z = +1.000000
what is x + (y + z) ?
y + z : 1.1000000 × 2100 + 0.0000000 × 2100 = 1.1000000 × 2100 → x + (y + z ) = 0.0
what is (x + y) + z ?
x + y = 0.0 → (x + y ) + z = 1.0
x + (y + z ) ̸= (x + y ) + z → floating-point addition is *not* associative!
floating-point addition is *not* distributive: x · (y + z ) ̸= (x · y ) + (x · z )
floating-point addition is commutative: x + y = y + x and x × y = y × x
what does that mean when we add x1 + x2 + · · · + xn in parallel?

62 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Floating-Point Numbers | Winter Term 2024/2025
Floating-Point Numbers
Endianness

IEEE-754 defines the representation of 16-bit, 32-bit, and 64-bit floating-point numbers, but not the order in which the
4 or 8 bytes of a number are stored
the endianness of a processor defines the order it stores the bytes of a word (byte sex)
little-endian: the least significant bit is stored at the lowest memory address
big-endian: the most significant bit is stored at the lowest memory address
some processors support big- and little-endian (e.g., ARM, MIPS)
memory addresses

0x12345678 78 56 34 12 little-endian (e.g., Intel, RISC-V)

0x12345678 12 34 56 78 big-endian (e.g., Motorola, Sun)

0x12345678 34 12 78 56 mixed-endian (e.g., National Semiconductor)

fp32 0 0 1 1 1 1 0 1 0 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 0 0 0 0 0
3 D 1 9 9 9 8 016
63 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Floating-Point Numbers | Winter Term 2024/2025
Floating-Point Numbers
Patriot Failure at Dhahran

US Patriot system failed to intercept an incoming Iraqi Scud

missile at an army base in Saudi Arabia, killing 28 soldiers (Feb.
25th, 1991)
the time was stored as 24-bit floating-point
the time was increased every 100ms by 1/10s
1/10 can’t be exactly represented as binary number
error accumulated to 0.34s after 100h standby
as a consequence, the target position was off by ~570m
a software update was available to address the issue
1 void main() {
[Link] 2 float sec = 0.0, sec_inc = 0.1;
3 for (int i = 0; i < 10000; i++)
4 sec = sec + sec_inc;
5 printf("time: %f\n", sec);
6 }

64 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Floating-Point Numbers | Winter Term 2024/2025
Floating-Point Numbers
IEEE-754 Floating-Point Formats

IEEE-754 floating-point formats

64-bit double-precision floating-point (double, fp64)
32-bit single-precision floating-point (float, fp32)
16-bit half-precision floating-point (half, fp16)

double s e: 11 bit m: 52 bit

float s e: 8 bit m: 23 bit

half s e: 5 bit m: 10 bit

65 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Floating-Point Numbers | Winter Term 2024/2025
Floating-Point Numbers
New Floating-Point Formats

new floating-point representations for Deep Learning

bf16 (brain floating-point, bfloat16): Google TPU, Intel AVX-512, NVIDIA Tensor Cores, AMD Matrix Cores, . . .
tf32 (tensor float, tfloat32): NVIDIA Tensor Cores (Ampere), AMD Matrix Cores (MI300)
fp8 (E5M2, E4M3): NVIDIA Tensor Cores (Hopper), AMD Matrix Cores (MI300)
fp4 (E1M2, E2M1): NVIDIA Tensor Cores (Blackwell), AMD Matrix Cores (MI355)

fp4 s e:1 m:2

fp4 s e:2 m:1
fp8 s e: 4 bit m: 3 bit
fp8 s e: 5 bit m: 2 bit
fp16 s e: 5 bit m: 10 bit
fp32 s e: 8 bit m: 23 bit
bf16 s e: 8 bit m: 7 bit
tf32 s e: 8 bit m: 10 bit
66 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Floating-Point Numbers | Winter Term 2024/2025
Floating-Point Numbers
FP4 Values

Self Check

We consider the values that can be represented using FP4 (E2M1) with 1 bit for the sign, 2 bits for the exponent, and 1 bit for the
mantissa. List the values that can be represented if
encodings are reserved for 0, ∞, and NaN
encodings are only reserved for 0

67 THI | Prof. Dr.-Ing. Richard Membarth | Foundations of Computer Science | Computer Arithmetic: Floating-Point Numbers | Winter Term 2024/2025

Digital System Design 1 - Chapter 2 Slide
No ratings yet
Digital System Design 1 - Chapter 2 Slide
72 pages
Chapter 2
No ratings yet
Chapter 2
36 pages
Number Systems and Codes in Digital Electronics
No ratings yet
Number Systems and Codes in Digital Electronics
92 pages
Chap1-Eng-L2-Logique Combinatoire Et Sequentielle-P
No ratings yet
Chap1-Eng-L2-Logique Combinatoire Et Sequentielle-P
9 pages
Converting Decimal to Binary Systems
No ratings yet
Converting Decimal to Binary Systems
106 pages
Chapter 2
No ratings yet
Chapter 2
22 pages
Digital Lecture 01 - Number Systems and Codes
No ratings yet
Digital Lecture 01 - Number Systems and Codes
16 pages
Understanding Number Systems and Conversions
No ratings yet
Understanding Number Systems and Conversions
61 pages
EC4010IntroductionToDigitalLogic L1
No ratings yet
EC4010IntroductionToDigitalLogic L1
46 pages
Number System
No ratings yet
Number System
22 pages
DLD UNIT I-Chkk
No ratings yet
DLD UNIT I-Chkk
103 pages
Unit 1 Number Systems
No ratings yet
Unit 1 Number Systems
54 pages
CH-1 Number System
No ratings yet
CH-1 Number System
44 pages
Foc 1st Chapter Notes
No ratings yet
Foc 1st Chapter Notes
16 pages
Highlighted Notes
No ratings yet
Highlighted Notes
25 pages
Digital Computer Fundamentals Unit1
No ratings yet
Digital Computer Fundamentals Unit1
2 pages
Digital Systems & Binary Conversion
No ratings yet
Digital Systems & Binary Conversion
33 pages
Logic Design - Unit 1 - v1.6 - 20210915
No ratings yet
Logic Design - Unit 1 - v1.6 - 20210915
55 pages
Chapter Two
No ratings yet
Chapter Two
39 pages
Logic Circuit and Design (LABORATORY) Prelims
No ratings yet
Logic Circuit and Design (LABORATORY) Prelims
20 pages
Systems I: Computer Organization and Architecture
No ratings yet
Systems I: Computer Organization and Architecture
21 pages
Chapter 2
No ratings yet
Chapter 2
35 pages
Lecture2 Number Systems
No ratings yet
Lecture2 Number Systems
32 pages
Analogue vs Digital Concepts Guide
No ratings yet
Analogue vs Digital Concepts Guide
41 pages
Digital Logic Design Introduction
No ratings yet
Digital Logic Design Introduction
35 pages
Chapter 3 (Comp. Arch)
No ratings yet
Chapter 3 (Comp. Arch)
31 pages
Logic Design: Some Slides Are Taken From
No ratings yet
Logic Design: Some Slides Are Taken From
31 pages
Unit - 1
No ratings yet
Unit - 1
92 pages
DLD Module 1 Part 1
No ratings yet
DLD Module 1 Part 1
44 pages
ICS Day 1
No ratings yet
ICS Day 1
15 pages
L-8 Number Systems
No ratings yet
L-8 Number Systems
50 pages
Number Systems and Conversions
No ratings yet
Number Systems and Conversions
30 pages
Digital Systems & Binary Basics
No ratings yet
Digital Systems & Binary Basics
102 pages
Computer Organization and Architecture Computer Arithmetic
No ratings yet
Computer Organization and Architecture Computer Arithmetic
78 pages
Cam Scanner
No ratings yet
Cam Scanner
20 pages
EEE 204 - DIgital Logic Techniques and Computer - TEXTBOOK
No ratings yet
EEE 204 - DIgital Logic Techniques and Computer - TEXTBOOK
91 pages
Chapter 1 Binary System
No ratings yet
Chapter 1 Binary System
73 pages
Chapter 1 Binary System
No ratings yet
Chapter 1 Binary System
76 pages
Digital Electronics ND3 Summary
100% (1)
Digital Electronics ND3 Summary
49 pages
Lec 7
No ratings yet
Lec 7
15 pages
Chapter 1 Digital Systems and Binary Numbers
No ratings yet
Chapter 1 Digital Systems and Binary Numbers
60 pages
f37 Book Intarch Pres pt3 PDF
No ratings yet
f37 Book Intarch Pres pt3 PDF
95 pages
Understanding Number Systems in Computing
No ratings yet
Understanding Number Systems in Computing
48 pages
Digital Logic for Engineering Students
No ratings yet
Digital Logic for Engineering Students
47 pages
Chapter 1 Digital Systems and Binary Numbers
No ratings yet
Chapter 1 Digital Systems and Binary Numbers
40 pages
Chapter 3
No ratings yet
Chapter 3
72 pages
Logic Design - Slide - 1
No ratings yet
Logic Design - Slide - 1
50 pages
COA Chapter 3
No ratings yet
COA Chapter 3
23 pages
Lec 1
No ratings yet
Lec 1
27 pages
Lec03 Number Systems
No ratings yet
Lec03 Number Systems
17 pages
Number Systems and Conversions Guide
No ratings yet
Number Systems and Conversions Guide
30 pages
Chapter One, Numbering System
No ratings yet
Chapter One, Numbering System
4 pages
COA Unit 1
No ratings yet
COA Unit 1
115 pages
Digital Electronics
No ratings yet
Digital Electronics
224 pages
3
No ratings yet
3
145 pages
Điện tử số
No ratings yet
Điện tử số
20 pages
Understanding Number Systems in Computing
No ratings yet
Understanding Number Systems in Computing
21 pages
Chapter 2 Computer Architecture and Orgnization
No ratings yet
Chapter 2 Computer Architecture and Orgnization
61 pages
D 5 9042019 ProjectDependencies ReplyToDiscomQueries
No ratings yet
D 5 9042019 ProjectDependencies ReplyToDiscomQueries
4 pages
CH 4 Algorithms and Flowcharts
No ratings yet
CH 4 Algorithms and Flowcharts
23 pages
How To Make A Unity Character Move
No ratings yet
How To Make A Unity Character Move
1 page
Cloud Password Reset Process Must Be Initiated Before Moving A Database To On-Premise Environment
No ratings yet
Cloud Password Reset Process Must Be Initiated Before Moving A Database To On-Premise Environment
3 pages
Compromise Assessment Report CORVID
No ratings yet
Compromise Assessment Report CORVID
22 pages
Teamcenter SOA Create You Own SOA
No ratings yet
Teamcenter SOA Create You Own SOA
1 page
Mastering C
No ratings yet
Mastering C
3 pages
System Design Interview Questions 1696745718
No ratings yet
System Design Interview Questions 1696745718
32 pages
Mendix Training 6 Days Rapid
No ratings yet
Mendix Training 6 Days Rapid
2 pages
Python Program 1 To 13
No ratings yet
Python Program 1 To 13
13 pages
Computing Essentials
No ratings yet
Computing Essentials
27 pages
(FCP Sec Ops - NSE5) - FortiEDR 5.0 - Study Guide
100% (1)
(FCP Sec Ops - NSE5) - FortiEDR 5.0 - Study Guide
257 pages
Predictive Analysis For Big Mart Sales Using Machine Learning Algorithms
No ratings yet
Predictive Analysis For Big Mart Sales Using Machine Learning Algorithms
14 pages
Memory Dump Analysis Project Report - Edited1
No ratings yet
Memory Dump Analysis Project Report - Edited1
18 pages
LiBEIS: EIS Software for Battery Analysis
No ratings yet
LiBEIS: EIS Software for Battery Analysis
4 pages
Notes
No ratings yet
Notes
16 pages
50 Essential Windows Commands Guide
No ratings yet
50 Essential Windows Commands Guide
30 pages
Oop Unit-3
No ratings yet
Oop Unit-3
4 pages
Chap4 Fortran Language
No ratings yet
Chap4 Fortran Language
7 pages
Supermicro X10drh-CT User Manual MNL-1628
No ratings yet
Supermicro X10drh-CT User Manual MNL-1628
130 pages
HomeAssistant INIM API Notes Public-V5
No ratings yet
HomeAssistant INIM API Notes Public-V5
10 pages
ABAP Performance Tips
No ratings yet
ABAP Performance Tips
17 pages
Dat-1400 Technical Manual
No ratings yet
Dat-1400 Technical Manual
88 pages
MCA Curriculum2023
No ratings yet
MCA Curriculum2023
4 pages
Basic 2
No ratings yet
Basic 2
4 pages
Final Project Report Mca
0% (2)
Final Project Report Mca
41 pages
Computer Organization and Design RISC V Edition The Hardware Software Interface David A. Patterson eBook scholarly edition
No ratings yet
Computer Organization and Design RISC V Edition The Hardware Software Interface David A. Patterson eBook scholarly edition
47 pages
Software Requirement Specification For: (Srs Document)
No ratings yet
Software Requirement Specification For: (Srs Document)
37 pages
Database Partitioning Strategies Explained
No ratings yet
Database Partitioning Strategies Explained
8 pages
02 PowerScale Next Gen 16G Hardware Overview
No ratings yet
02 PowerScale Next Gen 16G Hardware Overview
48 pages