Lec 08

Uploaded by sabajitboro0
Copyright © All Rights Reserved

Introduction to Computing

(CS 1109/1110)
http://jatinga.iitg.ac.in/~asahu/cs1109/

Floating point

A. Sahu

Dept of Comp. Sc. & Engg.


Indian Institute of Technology Guwahati
Outline
• Floating Point
• Floating Point Density
• Operations
• Type Casting

Floating Point Numbers

Floating Point Numbers
• Need for floating point numbers
• Number representation: IEEE 754
• Floating point range
• Floating point density
– Accuracy
• Arithmetic and logical operations on FP
• Conversions and type casting in C
Need to go beyond integers
• integer: 7
• rational: 5/8
• real: √3
• complex: 2 - 3i
(Nested sets: integer ⊂ rational ⊂ real ⊂ complex.)

Extremely large and small values:
distance from Pluto to the Sun = $5.9 \times 10^{12}$ m
mass of an electron = $9.1 \times 10^{-28}$ g
Representing fractions
• Integer pairs (for rational numbers): (5, 8) = 5/8
• Strings with an explicit decimal point: -247.09
• Implicit point at a fixed position: 010011010110001011
• Floating point (implicit point): $\text{fraction} \times \text{base}^{\text{power}}$

Numbers with binary point
$101.11_2 = 1\times2^{2} + 0\times2^{1} + 1\times2^{0} + 1\times2^{-1} + 1\times2^{-2} = 4 + 1 + 0.5 + 0.25 = 5.75_{10}$
$0.6_{10} = 0.10011001100110011001\ldots_2$
.6 × 2 = 1 + .2
.2 × 2 = 0 + .4
.4 × 2 = 0 + .8
.8 × 2 = 1 + .6 (back to .6, so the digits 1001 repeat forever)
Numeric Data Type
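• char, short, int, long int
– char: 8-bit number (1 byte = 1 B)
– short: 16-bit number (2 B)
– int: 32-bit number (4 B)
– long int: 64-bit number (8 B) on typical 64-bit Linux/macOS; 32-bit (4 B) on Windows
• float, double, long double
– float: 32-bit number (4 B)
– double: 64-bit number (8 B)
– long double: platform dependent; often 16 B of storage (80-bit extended precision on x86-64 Linux), but 8 B with Microsoft Visual C++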
Numeric Data Type
[Figure: value ranges of unsigned char, char, unsigned short, short, unsigned int and int]
Numbers on integer and log scales
• unsigned char C;
– 8 bits, 256 numbers
– 0, 1, 2, 3, 4, …, 255 // unit spaced
• int A;
– 32-bit numbers
– Non-negative side: 0, 1, 2, …, $2^{31}-1$
– Negative side: -1, -2, -3, …, $-2^{31}$ // unit spaced
• float f;
– 32 bits
– Values spread on a log scale
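Numbers on integer and log scales
• Log scale
– $10^{-5}, 10^{-4}, 10^{-3}, 10^{-2}, 10^{-1}, 10^{0}, 10^{1}, 10^{2}, 10^{3}, 10^{4}, 10^{5}, 10^{6}$
• Number of integers (from the lower bound up to, but not including, the upper bound):
– Between $10^{0}$ and $10^{1}$: 9
– Between $10^{1}$ and $10^{2}$: 90
– Between $10^{2}$ and $10^{3}$: 900
– Between $10^{3}$ and $10^{4}$: 9000
– Between $10^{4}$ and $10^{5}$: 90000
– Between $10^{5}$ and $10^{6}$: 900000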
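Example Log Scale Numbers
• Example scientific format: $d.dd \times 10^{n}$
• For a specific exponent value there are only 999 numbers (d.dd from 0.01 to 9.99)

Values ($d.dd \times 10^{n}$)                  | N   | Range
$0.01\times10^{0}$ – $9.99\times10^{0}$ | 999 | 0.01 – 9.99
$0.01\times10^{1}$ – $9.99\times10^{1}$ | 999 | 0.1 – 99.9
$0.01\times10^{2}$ – $9.99\times10^{2}$ | 999 | 1.00 – 999
$0.01\times10^{3}$ – $9.99\times10^{3}$ | 999 | 10 – 9990
$0.01\times10^{4}$ – $9.99\times10^{4}$ | 999 | 100 – 99900
$0.01\times10^{5}$ – $9.99\times10^{5}$ | 999 | 1000 – 999000
$0.01\times10^{6}$ – $9.99\times10^{6}$ | 999 | 10000 – 9990000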
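Two consecutive numbers when the power is 5
• Ex1: $0.01\times10^{5} = 1000$ and $0.02\times10^{5} = 2000$
• Ex2: $5.54\times10^{5} = 554000$ and $5.55\times10^{5} = 555000$
• Ex3: $9.98\times10^{5} = 998000$ and $9.99\times10^{5} = 999000$
• In every case the two values differ by one unit in the last digit, $0.01\times10^{5} = 1000$: this is the minimum difference at this exponent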
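C float: printing with a chosen precision

#include <stdio.h>
int main(void) {
    float A = 5.84e5f;
    A = A + 20;
    printf("A=%e\n", A);   /* A=5.840200e+05 */
    printf("A=%.2e\n", A); /* A=5.84e+05 (only the display is rounded) */
    return 0;
}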
Numeric Data Type
• char, short, int, long int
– Each has a signed and an unsigned version
– char (8 bit)
• signed char: -128 to 127; two's complement has a single zero (no -0), so the negative side gets one extra value
• unsigned char: 0 to 255
– int: $-2^{31}$ to $2^{31}-1$
– unsigned int: 0 to $2^{32}-1$
• float, double, long double
– For fractional and real-number data
– All of these are signed and are stored in a different format (sign, exponent, mantissa)
Numeric Data Type
float (32 bits): [Sign bit | Exponent | Mantissa]
double (64 bits): [Sign bit | Exponent | Mantissa-1] [Mantissa-2]
FP numbers with base = 10
$(-1)^{S} \times F \times 10^{E}$
S = Sign
F = Fraction (fixed point number), usually called Mantissa or Significand
E = Exponent (positive or negative integer)
Examples: $5.9\times10^{12}$, $-2.6\times10^{3}$, $9.1\times10^{-28}$
Only one non-zero digit to the left of the point
FP with two signs: how to handle them
• Two signs: one for the number, the other for the exponent
$\pm d.dd \times 10^{\pm dd}$
• To remove the confusion:
– Only one sign, for the number
– The sign of the exponent is handled by biasing
– 8 bits give 256 codes; exponents from -127 to 127 use codes 0 to 254
– With bias 127 (stored = E + 127):
E = 0, 1, 2, 3, …, 127 → stored 127, 128, …, 254
E = -1, -2, …, -127 → stored 126, 125, …, 0
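IEEE 754 standard
Single precision numbers (S: 1 bit, E: 8 bits, F: 23 bits)
0 | 1011 0101 | 1101 0110 1011 0001 0110 110
S   E           F
Double precision numbers (S: 1 bit, E: 11 bits, F: 20 + 32 = 52 bits)
0 | 1011 0101 111 | 1101 0110 1011 0001 0110
                    1011 0001 0110 1100 1011 0101 1101 0110
S   E               F (20 bits in the first word, 32 bits in the second)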
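Representing F in IEEE 754
Single precision numbers (F: 23 bits)
1.F = 1.1101 0110 1011 0001 0110 110
Double precision numbers (F: 20 + 32 bits)
1.F = 1.1101 0110 1011 0001 0110 1011 0001 0110 1100 1011 0101 1101 0110

Normalized binary numbers have exactly one non-zero digit to the left of the point. In binary that digit is always 1, so it does not need to be stored (the implicit or "hidden" bit); only F, the bits after the point, are stored.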
Value Range for F
Single precision numbers
$1 \le F \le 2 - 2^{-23}$, i.e. $1 \le F < 2$
Double precision numbers
$1 \le F \le 2 - 2^{-52}$, i.e. $1 \le F < 2$

These are “normalized”.


Representing E in IEEE 754
Single precision numbers: E is 8 bits, e.g. 10110101, with bias 127
Double precision numbers: E is 11 bits, e.g. 10110101110, with bias 1023
FP: how to store -0.75
• Given a numeric value, how does FP store it as bits in memory?
• V = -0.75 = $-(0.11)_2$
• Scientific form: $-1.1_2 \times 2^{-1}$
• With bias: E' = -1 + 127 = 126; S = 1 for negative
• Mantissa: drop the implicit leading 1, so 1.1 → .1 and F = 1000…0
Single precision numbers (1 | 8 | 23 bits)
1 | 0111 1110 | 1000 0000 0000 0000 0000 000
S   E'          F
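FP: what value is stored?
• Given the bits stored in memory in FP format, what numeric value do they represent?
• E = E' - 127, $V = (-1)^{S} \times 1.F \times 2^{E'-127}$
• Example: E' = 0010 1000 = 40, so $V = 1.1101\ldots \times 2^{40-127} = 1.1101\ldots \times 2^{-87}$
Single precision numbers (1 | 8 | 23 bits)
0 | 0010 1000 | 1101 0110 1011 0001 0110 110
S   E'          F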
Value Range for E
Single precision numbers
-126 ≤ E ≤ 127
(all 0’s and all 1’s have special meanings)
Double precision numbers
-1022 ≤ E ≤ 1023
(all 0’s and all 1’s have special meanings)
Floating point demo applet on the web
• https://www.h-schmidt.net/FloatConverter/IEEE754.html
• Google "Float applet" to get the above link
Overflow and underflow
Largest positive/negative number (SP) = $\pm(2 - 2^{-23}) \times 2^{127} \approx \pm 3.4 \times 10^{38}$
Smallest positive/negative normalized number (SP) = $\pm 1 \times 2^{-126} \approx \pm 1.2 \times 10^{-38}$

Largest positive/negative number (DP) = $\pm(2 - 2^{-52}) \times 2^{1023} \approx \pm 1.8 \times 10^{308}$
Smallest positive/negative normalized number (DP) = $\pm 1 \times 2^{-1022} \approx \pm 2.2 \times 10^{-308}$
Density of int vs float
Int: 32 bits
Float: 32 bits [Exponent | Mantissa]
• Number of values that can be represented
– Both cases (float, int): $2^{32}$ bit patterns
• Range
– int: $-2^{31}$ to $2^{31}-1$
– float: largest $\pm(2 - 2^{-23}) \times 2^{127}$, smallest $\pm 1 \times 2^{-126}$
• About 50% of the float bit patterns are small (magnitude less than 1)
Density of Floating Points
• 256 persons in a room of capacity 256 (the range): 8-bit integer, density 256/256 = 1
• 256 persons in a room of capacity 200000 (the range):
– The first row (near 0) should be filled with 128 persons
– The 50% of numbers with a negative power lie in -1 < N < +1
• Density of floating point numbers:
– Dense towards 0
– Sparse towards ±∞
[Figure: number line marked -∞, -2, -1, 0, +1, +2, +∞]
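Expressible Numbers (int and float)
Expressible integers: $-2^{31}$ to $2^{31}-1$; values below are - overflow, values above are + overflow.

Expressible floats:
• - overflow: below $-(1-2^{-24})\times2^{128}$
• negative floats: $-(1-2^{-24})\times2^{128}$ to $-0.5\times2^{-125}$
• - underflow: between $-0.5\times2^{-125}$ and 0
• + underflow: between 0 and $0.5\times2^{-125}$
• positive floats: $0.5\times2^{-125}$ to $(1-2^{-24})\times2^{128}$
• + overflow: above $(1-2^{-24})\times2^{128}$
Here $(1-2^{-24})\times2^{128} = (2-2^{-23})\times2^{127}$ and $0.5\times2^{-125} = 2^{-126}$, the normalized limits given above.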
Distribution of Values
• 6-bit IEEE-like format
– e = 3 exponent bits
– f = 2 fraction bits
– Bias is 3
• Notice how the distribution gets denser toward zero.
[Figure: representable values on a number line from -15 to 15, marked Denormalized, Normalized, Infinity]
Distribution of Values (close-up view)
• 6-bit IEEE-like format
– e = 3 exponent bits
– f = 2 fraction bits
– Bias is 3
[Figure: close-up of the range -1 to 1, marked Denormalized, Normalized, Infinity]
Density of 32 bit float SP
• The fraction/mantissa is 23 bits
• Number of different numbers that can be stored for a particular value of the exponent:
– Assume exp = 1: $2^{23} = 8\times1024\times1024 \approx 8\times10^{6}$
– Between 1 and 2 we can store about $8\times10^{6}$ numbers
• Similarly
– for exp = 2, between 2 and 4, about $8\times10^{6}$ numbers can be stored
– for exp = 3, between 4 and 8, about $8\times10^{6}$ numbers can be stored
– for exp = 4, between 8 and 16, about $8\times10^{6}$ numbers can be stored
(Here "exp = k" labels the interval $[2^{k-1}, 2^{k})$; the IEEE exponent of that interval is E = k - 1.)
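Density of 32 bit float SP
• Similarly
– for exp = 23, between $2^{22}$ and $2^{23}$, about $8\times10^{6}$ numbers can be stored
– for exp = 24, between $2^{23}$ and $2^{24}$, about $8\times10^{6}$ numbers can be stored: OK (every integer in this range is representable)
– for exp = 25, between $2^{24}$ and $2^{25}$, about $8\times10^{6}$ numbers can be stored
• BAD: the interval width is $2^{25} - 2^{24} = 2^{24} > 8\times10^{6}$, so not every integer in it can be represented
– …
– for exp = 127, between $2^{126}$ and $2^{127}$, about $8\times10^{6}$ numbers can be stored: WORST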
Density of 32 bit float SP
• $2^{23} = 8\times1024\times1024 \approx 8\times10^{6}$
[Figure: number line with ticks at 0, 1, 2, 4, 8, 16]
Thanks
