
International Journal of Computer Applications (0975 – 8887)

National Conference on VLSI and Embedded Systems 2013

Implementation of a High Speed Single Precision Floating Point Unit using Verilog
Ushasree G, R Dhanabal, Sarat Kumar Sahoo
ECE-VLSI, VIT University, Vellore 632014, Tamil Nadu, India

ABSTRACT
To represent very large or small values, a large range is required, as the integer representation is no longer appropriate. Such values can be represented using the IEEE-754 standard based floating point representation. This paper presents a high speed ASIC implementation of a floating point arithmetic unit which can perform addition, subtraction, multiplication and division on 32-bit operands that use the IEEE 754-2008 standard. Pre-normalization and post-normalization units are also discussed along with exception handling. All the functions are built from feasible, efficient algorithms with several changes incorporated that improve overall latency and, if pipelined, give higher throughput. The algorithms are modeled in Verilog HDL and have been simulated in ModelSim.

Keywords
Floating point number, normalization, exceptions, latency

1. INTRODUCTION
An arithmetic circuit which performs digital arithmetic operations has many applications in digital coprocessors, application specific circuits, etc. Because of the advancements in VLSI technology, many complex algorithms that appeared impractical to put into practice have become easily realizable today with the desired performance parameters, so that new designs can be incorporated [2]. Standardized methods to represent floating point numbers have been instituted by the IEEE 754 standard, through which floating point operations can be carried out efficiently with modest storage requirements.

The three basic components of an IEEE 754 floating point number are the sign, the exponent, and the mantissa [3]. The sign is a single bit, where 0 refers to a positive number and 1 to a negative number [3]. The mantissa, also called the significand, is 23 bits and is composed of the fraction and a leading digit which represents the precision bits of the number [3][2]. The exponent, with 8 bits, represents both positive and negative exponents; a bias of 127 is added to the exponent to get the stored exponent [2]. Table 1 shows the bit ranges for single (32-bit) and double (64-bit) precision floating-point values [2].

The value of the binary floating point representation is as follows, where S is the sign bit, F the fraction field and E the exponent field:

Value of a floating point number = (-1)^S x val(F) x 2^val(E)

Table 1. Bit Range for Single (32-bit) and Double (64-bit) Precision Floating-Point Values [2]

                    Sign     Exponent     Fraction      Bias
Single precision    1 [31]   8 [30-23]    23 [22-00]    127
Double precision    1 [63]   11 [62-52]   52 [51-00]    1023

There are four types of exceptions that arise during floating point operations. The Overflow exception is raised whenever the result cannot be represented as a finite value in the precision format of the destination [13]. The Underflow exception occurs when an intermediate result is too small to be calculated accurately, or when the operation's result, rounded to the destination precision, is too small to be normalized [13]. The Division by zero exception arises when a finite nonzero number is divided by zero [13]. The Invalid operation exception is raised if the given operands are invalid for the operation to be performed [13]. In this paper an ASIC implementation of a high speed FPU has been carried out using efficient addition, subtraction, multiplication and division algorithms. Section II depicts the architecture of the floating point unit and the methodology used to carry out the arithmetic operations. Section III presents the arithmetic operations, which use efficient algorithms with some modifications to improve latency. Section IV presents the results, which have been simulated in ModelSim. Section V presents the conclusion.

2. ARCHITECTURE AND METHODOLOGY
The FPU of a single precision floating point unit that performs add, subtract, multiply and divide functions is shown in figure 1 [1]. Two pre-normalization units, one for addition/subtraction and one for multiplication/division, have been provided [1]. A post-normalization unit that normalizes the mantissa part has also been provided [2]. The final result is obtained after post-normalization. To carry out the arithmetic operations, two IEEE-754 format single precision operands are considered. Pre-normalization of the operands is done, the selected operation is performed, and the output obtained is post-normalized. Finally, any exceptions that occurred are detected and handled. The executed operation depends on a two bit control signal (z) which determines the arithmetic operation, as shown in table 2.
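The field layout of Table 1 can be checked with a short software model. The following Python sketch (an illustration, not part of the paper's Verilog design) unpacks a number into the sign, exponent and fraction fields of the single precision format:

```python
import struct

def decode_float32(x):
    """Split a number into IEEE-754 single precision fields (Table 1)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]  # raw 32-bit pattern
    sign     = (bits >> 31) & 0x1        # bit  [31]
    exponent = (bits >> 23) & 0xFF       # bits [30:23], biased by 127
    fraction = bits & 0x7FFFFF           # bits [22:0]
    return sign, exponent, fraction

# -12.375 = -1.100011b x 2^3, so the stored exponent is 3 + 127 = 130
# and the fraction field starts with the bits 100011.
sign, exp, frac = decode_float32(-12.375)
```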


Fig 1: Block diagram of floating point arithmetic unit [1]
Fig 2: 16 bit modified carry look ahead adder [4]
Fig 3: Metamorphosis of partial full adder [4]

Table 2. Floating Point Unit Operations

z (control signal)   Operation
2'b00                Addition
2'b01                Subtraction
2'b10                Multiplication
2'b11                Division

3. 32 BIT FLOATING POINT ARITHMETIC UNIT

3.1 Addition Unit
Addition is one of the most complex operations in a floating-point unit compared to the other functions, contributing major delay and considerable area. Many algorithms have been developed that focus on reducing the overall latency in order to improve performance. The floating point addition operation is carried out by first checking for zeros, then aligning the significands, followed by adding the two significands using an efficient architecture. The obtained result is normalized and checked for exceptions. To add the mantissas, a high speed carry look ahead adder has been used. A traditional carry look ahead adder is constructed using AND, XOR and NOT gates; the implemented modified carry look ahead adder uses only NAND and NOT gates, which decreases the cost of the carry look ahead adder and also enhances its speed [4].

The 16 bit modified carry look ahead adder is shown in figure 2 and the metamorphosis of the partial full adder is shown in figure 3; using these, a 24 bit carry look ahead adder has been constructed to perform the addition operation.

3.2 Subtraction Unit
The subtraction operation is implemented by taking the 2's complement of the second operand. Similar to addition, subtraction consists of the major operations of pre-normalization, addition of mantissas, post-normalization and exception handling [4]. Addition of the mantissas is carried out using the 24 bit modified carry look ahead adder.

3.3 Multiplication
Constructing an efficient multiplication module is an iterative process, and a 2n-digit product is obtained from the product of two n-digit operands. In IEEE 754 floating-point multiplication, the two mantissas are multiplied and the two exponents are added. Here, first the exponents are added and the exponent bias (127) is subtracted from the sum. Then the mantissas are multiplied using a feasible algorithm, and the output sign bit is determined by XORing the two input sign bits. The obtained result is normalized and checked for exceptions.

To multiply the mantissas, the Bit Pair Recoding (or Modified Booth Encoding) algorithm has been used, because of which the number of partial products is reduced by about a factor of two, with no pre-addition required to produce the partial products. It recodes the bits by considering three bits at a time. Bit pair recoding increases the efficiency of multiplication by pairing. To further increase the efficiency of the algorithm and decrease the time complexity, the Karatsuba algorithm can be paired with the bit pair recoding algorithm.

One of the fastest multiplication algorithms is the Karatsuba algorithm, which reduces the multiplication of two n-digit numbers to at most n^(log2 3) ≈ n^1.585 single-digit multiplications and is therefore faster than the classical algorithm, which requires n^2 single-digit products [11]. It allows the product of two large numbers x and y to be computed using three multiplications of smaller numbers, each with about half as many digits as x or


y, with some additions and digit shifts, instead of four multiplications [11]. The steps are carried out as follows. Let x and y be represented as n-digit numbers with base B and m < n:

x = x1*B^m + x0
y = y1*B^m + y0

where x0 and y0 are less than B^m [11]. The product is then

xy = (x1*B^m + x0)(y1*B^m + y0) = c1*B^(2m) + b1*B^m + a1

where

c1 = x1*y1
a1 = x0*y0
p1 = (x1 + x0)(y1 + y0)
b1 = x1*y0 + x0*y1 = p1 - c1 - a1

Fig 5: Partial product generation [10]
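The decomposition above can be illustrated with a minimal software model. The following Python sketch (illustrative only; it operates on ordinary integers with base B = 10 rather than on the mantissa hardware) computes the product from the three sub-products c1, a1 and p1:

```python
def karatsuba(x, y):
    """One-level-at-a-time Karatsuba split: three sub-products, not four."""
    if x < 10 or y < 10:                     # single digit: multiply directly
        return x * y
    m = max(len(str(x)), len(str(y))) // 2   # split point, base B = 10
    Bm = 10 ** m
    x1, x0 = divmod(x, Bm)                   # x = x1*B^m + x0
    y1, y0 = divmod(y, Bm)                   # y = y1*B^m + y0
    c1 = karatsuba(x1, y1)                   # high part
    a1 = karatsuba(x0, y0)                   # low part
    p1 = karatsuba(x1 + x0, y1 + y0)
    b1 = p1 - c1 - a1                        # cross terms from one multiply
    return c1 * Bm * Bm + b1 * Bm + a1
```

For example, karatsuba(12, 34) splits into c1 = 3, a1 = 8, p1 = 21, so b1 = 10 and the result is 3*100 + 10*10 + 8 = 408.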
Here c1, a1 and p1 have been calculated using the bit pair recoding algorithm. Radix-4 modified Booth encoding has been used, which allows the partial product array to be reduced by half, to about n/2 rows. The bit pair recoding table is shown in table 3. In the implemented algorithm, for each group of three bits (y2i+1, y2i, y2i-1) of the multiplier, one partial product row is generated according to the encoding in table 3. The radix-4 modified Booth encoding signals and their respective partial products have been generated using figures 4 and 5. For each partial product row, figure 4 generates the one, two, and neg signals. These values are then given to the logic in figure 5, together with the bits of the multiplicand, to produce the whole partial product array. To prevent sign extension, the obtained partial products are extended as shown in figure 6, and the product has been calculated using a carry save select adder.

Fig 6: Sign Prevention Extension of Partial Products [10]

3.4 Division Algorithm
Division is one of the most complex and time-consuming of the four basic arithmetic operations. Division produces two components as its result, a quotient and a remainder, when two inputs, a dividend and a divisor, are given. Here the exponent of the result has been calculated using the equation e0 = eA - eB + bias(127) - zA + zB, followed by division of the fractional bits [5][6]. The sign of the result has been calculated by XORing the signs of the two operands. Then the obtained quotient has been normalized [5][6].

Division of the fractional bits has been performed using the non-restoring division algorithm, modified to improve the delay. The non-restoring division algorithm is the fastest among the digit recurrence division methods [5][6]. Restoring division generally requires two additions per iteration when the temporary partial remainder is less than zero, and this makes the worst case delay longer [5][6]. To decrease the delay during division, the non-restoring division algorithm, shown in figure 7, was introduced. Non-restoring division has a different quotient digit set, namely one and negative one, while restoring division has zero and one as its quotient set [5][6]. Using this quotient set reduces the delay of non-restoring division compared to restoring division: only one addition is performed per iteration, which improves its arithmetic performance [6].

Table 3. Bit-Pair Recoding [11]

Bit pattern   Operation    Action
0 0 0         none         no operation
0 0 1         +1 x a       prod = prod + a
0 1 0         +2 x a - a   prod = prod + a
0 1 1         +2 x a       prod = prod + 2a
1 0 0         -2 x a       prod = prod - 2a
1 0 1         -2 x a + a   prod = prod - a
1 1 0         -1 x a       prod = prod - a
1 1 1         none         no operation
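The recoding of Table 3 can be modeled in software. The sketch below is an illustration, not the implemented hardware; the multiplier y is interpreted as an n-bit two's complement value, each overlapping three-bit group maps to a radix-4 digit in {-2, -1, 0, 1, 2}, and the product accumulates digit * a * 4^i:

```python
def bit_pair_recode(y, n_bits):
    """Recode an n-bit two's complement multiplier into radix-4 digits.

    Each group (y[2i+1], y[2i], y[2i-1]) maps to one digit as in Table 3,
    so an n-bit multiplier yields about n/2 partial product rows.
    """
    digits = []
    y_ext = y << 1                      # append the implicit y[-1] = 0
    for i in range(0, n_bits, 2):
        group = (y_ext >> i) & 0b111    # three overlapping bits
        digit = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
                 0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}[group]
        digits.append(digit)            # digit i has weight 4**i
    return digits

def booth_multiply(a, y, n_bits):
    """Multiply a by the recoded multiplier: prod += digit * a * 4^i."""
    return sum(d * a * 4**i for i, d in enumerate(bit_pair_recode(y, n_bits)))
```

For instance, y = 0101b recodes to the digits [1, 1] (rows +a and +4a), so booth_multiply(7, 5, 4) gives 35; y = 1101b is -3 in 4-bit two's complement and recodes to [1, -1].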
The delay of the multiplexer that selects the quotient digit and determines how the partial remainder is calculated can be reduced by rearranging the order of the computations. In the implemented design, the adder that calculates the partial remainder and the multiplexer operate at the same time, so the multiplexer delay can be ignored, since the adder delay is generally longer than the multiplexer delay. Second, one adder and one inverter are removed by using a new quotient digit converter, so the delay of one adder and one inverter connected in series is eliminated.

Fig 4: MBE Signal Generation [10]
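The non-restoring iteration described above can be sketched as an integer model. The following Python code is an approximation of the algorithm in figure 7, not the implemented design; it assumes a non-negative dividend and a quotient that fits in n_bits, performs one addition or subtraction per step, and converts the {+1, -1} digit set to a conventional quotient at the end:

```python
def non_restoring_divide(dividend, divisor, n_bits):
    """Non-restoring division: quotient digits {+1, -1}, one add/sub per step."""
    r = dividend                       # partial remainder
    q = 0                              # bit 1 records digit +1, bit 0 digit -1
    for i in range(n_bits - 1, -1, -1):
        if r >= 0:
            r -= divisor << i          # digit +1: subtract shifted divisor
            q = (q << 1) | 1
        else:
            r += divisor << i          # digit -1: add shifted divisor back
            q = q << 1
    # convert the {+1, -1} digit set: Q = (bits recorded 1) - (bits recorded 0)
    q = q - (~q & ((1 << n_bits) - 1))
    if r < 0:                          # final correction step
        q -= 1
        r += divisor
    return q, r
```

For example, non_restoring_divide(100, 36, 4) returns the quotient 2 and remainder 28, matching the division testcase simulated in section 4.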


Fig 7: Non Restoring Division algorithm

4. RESULTS

4.1 Addition Unit
The single precision addition operation has been implemented in ModelSim for the inputs input1 = 25.0 and input2 = 4.5, as shown in figure 8, for which the result obtained is 29.5.

Fig 8: Implementation of 32 bit Addition operation

4.2 Subtraction Unit
The single precision subtraction operation has been implemented in ModelSim for the inputs input1 = 25.0 and input2 = 4.5, as shown in figure 9, for which the result obtained is 20.5.

Fig 9: Implementation of 32 bit Subtraction operation

4.3 Multiplication Unit
The single precision multiplication operation implemented in ModelSim is shown in figure 10. For the inputs in_sign1 = 1'b0, in_sign2 = 1'b0; in_exp1 = 8'b10000011, in_exp2 = 8'b10000010; in_mant1 = 23'b00100, in_mant2 = 23'b001100, the output obtained is out_sign = 1'b0; out_exp = 8'd131; out_mant = 23'b00101011.

Fig 10: Implementation of 32 bit Multiplication operation

4.4 Division Operation
The single precision division operation has been implemented in ModelSim for the inputs input1 = 32'd100 and input2 = 32'd36, as shown in figure 11, for which the quotient obtained is 23'd2 and the remainder 23'd28.
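The simulated add, subtract and divide testcases can be cross-checked independently in software; all of the operands above are exactly representable in binary32, so the hardware results should match exactly:

```python
import struct

def f32(x):
    """Round x to IEEE-754 single precision."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

# Testcases from figures 8, 9 and 11
assert f32(25.0) + f32(4.5) == 29.5       # addition unit result
assert f32(25.0) - f32(4.5) == 20.5       # subtraction unit result
assert divmod(100, 36) == (2, 28)         # divider quotient and remainder
```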


5. REFERENCES
[1] Rudolf Usselmann, Open Floating Point Unit, The Free
IP Cores Projects.
[2] Edvin Catovic, revised by Jan Andersson, GRFPU – High Performance IEEE-754 Floating Point Unit, Gaisler Research, Första Långatan 19, SE-413 27 Göteborg, Sweden.
[3] David Goldberg, What Every Computer Scientist
Should Know About Floating-Point Arithmetic, ACM
Computing Surveys, Vol 23, No 1, March 1991, Xerox
Palo Alto Research Center, 3333 Coyote Hill Road, Palo
Alto, California 94304.
[4] Yu-Ting Pai and Yu-Kumg Chen, The Fastest Carry
Lookahead Adder, Department of Electronic
Engineering, Huafan University.
[5] S. F. Oberman and M. J. Flynn, Division algorithms and implementations, IEEE Transactions on Computers, vol. 46, pp. 833-854, 1997.
[6] Milos D. Ercegovac and Tomas Lang, Division and
Square Root: Digit-Recurrence Algorithms and
Implementations, Boston: Kluwer Academic Publishers,
1994.
[7] ANSI/IEEE Standard 754-1985, IEEE Standard for
Binary Floating-Point Arithmetic, 1985.
[8] Behrooz Parhami, Computer Arithmetic - Algorithms
and Hardware Designs, Oxford: Oxford University Press,
2000.
[9] Steven Smith, (2003), Digital Signal Processing-A
Practical guide for Engineers and Scientists, 3rd Edition,
Elsevier Science, USA.
[10] D. J. Bernstein, Multidigit Multiplication for Mathematicians, Advances in Applied Mathematics, to appear.
[11] A. Karatsuba and Y. Ofman. Multiplication of Multidigit
Numbers on Automata. Soviet Physics- Doklady, 7
(1963), 595-596.
[12] D. E. Knuth. The Art of Computer Programming.
Volume 2: Seminumerical Algorithms.Addison-Wesley,
Reading, Massachusetts, 3rd edition, 1997.
[13] Yamin Li and Wanming Chu, Implementation of Single Precision Floating Point Square Root on FPGAs, Proc. of FCCM'97, IEEE Symposium on FPGAs for Custom Computing Machines, April 16-18, 1997, Napa, California, USA, pp. 226-232.
Fig 11: Implementation of 32 bit Division operation

