2012, IJRET
This work describes the design of a radix-8 divider. Low-power techniques are applied in the design of the unit, and energy-delay tradeoffs are considered. The energy dissipation in the divider can be reduced by up to 70% with respect to a standard implementation not optimized for energy, without penalizing the latency. The radix-8 divider is compared with one obtained by overlapping three radix-2 stages and with a radix-4 divider. Results show that the latency of our divider is similar to that of the divider with overlapped stages, but the area is smaller. The speed-up of the radix-8 over the radix-4 is about 23.58%, and the energy dissipated to complete a division is almost the same, although the area of the radix-8 is 67.58% larger.
IEEE Transactions on Computers, 1999
The general objective of our work is to develop methods to reduce the energy consumption of arithmetic modules while maintaining the delay unchanged and keeping the increase in the area to a minimum. Here, we illustrate some techniques for dividers realized in CMOS technology. The energy dissipation reduction is carried out at different levels of abstraction: from the algorithm level down to the implementation, or gate, level. We describe the use of techniques such as switching off inactive blocks, retiming, dual voltage, and equalizing the paths to reduce glitches. We also describe modifications in the on-the-fly conversion and rounding algorithm and in the redundant representation of the residual in order to reduce the energy dissipation. The techniques and modifications mentioned above are applied to a radix-4 divider, realized with static CMOS standard cells, for which an energy reduction of 40 percent is obtained with respect to the standard implementation. This reduction is expected to be about 60 percent if low-voltage gates, for dual voltage implementation, are available. The techniques used here should be applicable to a variety of arithmetic modules which have similar characteristics.
IEEE 17th International Conference on Application-specific Systems, Architectures and Processors (ASAP'06), 2006
As technology evolves, there is a never-ending need to explore design tradeoffs and alternatives. In the CMOS technologies of the recent past, where minimizing the die area was crucial, radix-4 minimally redundant SRT dividers were widely used because they only require simple multiples of the divisor. Quotient conversion was typically done by on-the-fly conversion. In deep submicron CMOS technology these decisions need to be reconsidered. Now it is attractive to use maximum redundancy to simplify quotient selection. Replacing the on-the-fly conversion that operates on every cycle with an adder that operates on only one cycle reduces the switching factor for the conversion by the order of 29x during a double precision division. This is significant because the on-the-fly conversion can consume 30% of the total energy of a divider. Furthermore, the quotient computation is sped up by the elimination of the big lookup table of minimally redundant SRT dividers. To illustrate this concept of trading extra hardware for improved power and speed and a simpler implementation, a radix-4 maximally redundant divider is designed and implemented in 65 nm CMOS technology using an ASIC flow and single, double and triple VT devices. Clock and data gating and data recirculation techniques are used to save power. Finally, a method to evaluate design alternatives for energy efficiency is proposed that takes into account the active power consumption, the inactive power consumption and the duty cycle.
Proceedings of the 8th International Conference on VLSI Design
Digit-recurrence division relies on a sequence of addition/subtraction and shift operations in a manner similar to the paper-and-pencil approach, which gives a very regular structure suitable for efficient VLSI implementation. Speed is obtained through the use of redundant number notation, allowing carry-propagation-free addition/subtraction with a delay independent of the size of the divisor. Since the quotient digits are obtained sequentially, the delay can theoretically be further reduced by resorting to higher radices to obtain several quotient bits at once. This paper compares the synthesis of radix-2 and radix-4 dividers.
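The shift-and-subtract recurrence described in this abstract can be illustrated with a minimal software sketch. It is not the paper's design: it uses ordinary (non-redundant) integers instead of carry-save residuals, and the function name `digit_recurrence_divide` and its parameters are assumptions made for illustration. Setting `bits_per_step=2` retires two quotient bits per iteration, as a radix-4 divider would.

```python
def digit_recurrence_divide(dividend, divisor, width=8, bits_per_step=1):
    """Unsigned digit-recurrence division (illustrative, non-redundant).

    Each iteration shifts the partial remainder left by `bits_per_step`
    bits, brings in the next dividend bits, and selects one quotient
    digit in the range 0 .. radix-1.
    """
    if divisor <= 0:
        raise ValueError("divisor must be a positive integer")
    if not 0 <= dividend < (1 << width):
        raise ValueError("dividend does not fit in the given width")

    radix = 1 << bits_per_step
    steps = -(-width // bits_per_step)  # ceiling division
    remainder = 0
    quotient = 0
    for step in range(steps - 1, -1, -1):
        # Shift the remainder and append the next group of dividend bits.
        chunk = (dividend >> (step * bits_per_step)) & (radix - 1)
        remainder = remainder * radix + chunk
        # Quotient digit selection: largest digit whose multiple fits.
        digit = 0
        while digit + 1 < radix and (digit + 1) * divisor <= remainder:
            digit += 1
        remainder -= digit * divisor
        quotient = quotient * radix + digit
    return quotient, remainder


# Radix-2 takes 8 iterations for 8 bits; radix-4 takes 4.
print(digit_recurrence_divide(100, 7, width=8, bits_per_step=1))
print(digit_recurrence_divide(100, 7, width=8, bits_per_step=2))
```

In hardware, the digit-selection loop is replaced by a small selection function on a few leading bits of the residual, which is where redundant digit sets become useful.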
2006 IEEE International Symposium on Signal Processing and Information Technology, 2006
This paper presents a proposal for the design of radix-4 SRT dividers for single precision DSP in deep submicron CMOS technology. Radix-4 dividers with minimal redundancy were widely used in previous technologies, where minimizing the die area was the top priority. This was done because these dividers only require simple multiples of the divisor, and quotient conversion was typically done by on-the-fly conversion without the need for an adder. On the other hand, in the current deep submicron CMOS technology, where many millions of transistors are available in a relatively small silicon area, it is attractive to use an adder and maximum redundancy to simplify quotient selection and conversion. Replacing the on-the-fly conversion that operates on every cycle by an adder that operates on only one cycle reduces the switching factor by the order of 6x. This is significant because the on-the-fly conversion can consume 30% of the total energy of a divider. Furthermore, thanks to the elimination of the big lookup table intrinsic in minimally redundant SRT dividers, the quotient computation is sped up. To illustrate this concept of trading a little extra hardware for reduced power and increased speed in deep submicron CMOS technology, a number of single precision radix-4 dividers with maximal redundancy are designed and implemented in 65 nm CMOS technology using an ASIC flow and triple VT devices. Clock and data gating and data recirculating techniques are used to save power. High VT devices are introduced to reduce leakage power. Finally, a novel method to evaluate different alternatives for energy efficiency is described along with the implementation results. Index terms: single precision DSP, SRT divider, redundant divider, simple quotient computation, multiple VT, deep submicron CMOS, energy efficiency, ASIC flow, redundant quotient, active energy, leakage energy.
Proceedings 13th IEEE Symposium on Computer Arithmetic
SRT dividers are common in modern floating point units. Higher division performance is achieved by retiring more quotient bits in each cycle. Previous research has shown that realistic stages are limited to radix-2 and radix-4. Higher radix dividers are therefore formed by a combination of low-radix stages. In this paper, we present an analysis of the effects of radix-2 and radix-4 SRT divider architectures and circuit families on divider area and performance. We show the performance and area results for a wide variety of divider architectures and implementations. We conclude that divider performance is only weakly sensitive to reasonable choices of architecture but significantly improved by aggressive circuit techniques.
IEEE Access
The electronics world is very well described in two distinct but dependent interdisciplinary areas, namely hardware and software. Arithmetic operations are very vital building blocks of an electronic system. An algorithm is a systematic arrangement that helps develop a sophisticated electronic system, including hardware and software aspects. Addition, subtraction, multiplication, and division are critical elements of arithmetic implementation in the electronic system, but fewer efforts have been made to implement division than other arithmetic operations, even though the number of transistors on a chip is increasing beyond the Moore's law prediction. It is quite complicated to implement arithmetical operations; here, a sophisticated algorithm is essential to successful implementation. Technological upgrades are leading to a new paradigm of applications, where the performance of a division circuit or block is a vital and critical feature of a successful system. The lexicon of algorithms used in the implementation of the division operation in electronics systems is discussed in detail in the present article, which indicates the mathematical formulation, criticality, conversion pattern, hardware requirements, and logic used for conversion. The current report describes the broad classification of dividers into basic classes named digit recurrence, high radix, functional iteration, estimation, a look-up table, and variable latency. It also illustrates that, in practical implementation, many algorithms have been developed that combine one or many classes and are implemented with different hardware architectures. The study indicated the possibility of improving the presently available algorithms or creating a new algorithm to enhance practical implementation.
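Of the classes listed above, functional iteration is perhaps the easiest to show in a few lines. The following is a minimal sketch of Newton-Raphson reciprocal refinement in double-precision floating point, not code from the article; the function name, the iteration count, and the linear initial estimate `48/17 - 32/17 * m` (a commonly used starting point for a divisor scaled into [0.5, 1)) are assumptions chosen for illustration.

```python
import math


def newton_raphson_divide(numerator, divisor, iterations=4):
    """Approximate numerator / divisor by iterating x <- x * (2 - d * x).

    The divisor magnitude is first scaled to m in [0.5, 1) so that a
    simple linear initial estimate of 1/m is accurate to about 1/17.
    Each iteration roughly squares the relative error.
    """
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    sign = -1.0 if divisor < 0 else 1.0
    # abs(divisor) == mantissa * 2**exponent with 0.5 <= mantissa < 1
    mantissa, exponent = math.frexp(abs(divisor))
    x = 48.0 / 17.0 - (32.0 / 17.0) * mantissa  # initial estimate of 1/mantissa
    for _ in range(iterations):
        x = x * (2.0 - mantissa * x)
    reciprocal = sign * math.ldexp(x, -exponent)  # undo the scaling
    return numerator * reciprocal


print(newton_raphson_divide(1.0, 3.0))
print(newton_raphson_divide(10.0, -4.0))
```

Unlike digit recurrence, which produces a fixed number of quotient bits per step, this kind of iteration converges quadratically but needs a multiplier and does not directly yield an exact remainder.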
International Journal of Computer Applications, 2015
The ever-increasing demand for VLSI architectures that handle complex systems has driven the design of high-speed divider architectures. The divider here is designed using the ancient methodology known as "Vedic mathematics". Vedic mathematics offers several division methods; here the Parvartya sutra is used. It is a general division formula applicable to all cases of division, and it is an efficient way of dividing large numbers with respect to delay and power consumption. A thirty-two bit divider architecture is implemented using this sutra, synthesized and simulated using the Xilinx ISE simulator, and implemented on the Virtex-4 FPGA device XC4VLX15. Output parameters such as propagation delay and device utilization are calculated from the synthesis results. Our results show a speed improvement compared to other architectures presented in the literature. This architecture can be used in many applications such as digital signal processing, cryptography, and processor arithmetic unit design.
This paper presents an efficient 4-bit unsigned binary serial divider and its implementation using 180 nm CMOS process technology. The layout design of the serial divider circuit is optimized in terms of area. The serial divider circuit provides a good compromise between area and performance in divider design. The serial divider is based on a repeated one's complement binary subtraction algorithm. The implementation consists of several combinational and sequential components such as a 4-bit ripple carry adder, 2:1 multiplexers, D flip-flops and a 4-bit synchronous up counter. The circuit analysis is carried out in terms of performance parameters such as transistor count, propagation delay and power consumption. According to the estimates, the transistor count, propagation delay and power consumption of the serial divider without parasitics were found to be 568, 5.13 ns and 196.2 µW respectively. The presence of parasitics due to metal layers in the layout design increases the propagation delay to 86.42 ns and the power consumption to 206 µW.
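The repeated one's complement subtraction scheme named in this abstract can be sketched in software as follows. This is an illustrative model under stated assumptions, not the paper's circuit: the function names are hypothetical, and treating the all-ones "negative zero" result as a successful subtraction is an assumption about how equality is handled.

```python
WIDTH = 4
MASK = (1 << WIDTH) - 1  # 0b1111


def ones_complement_subtract(a, b):
    """Compute a - b as a + ~b on WIDTH bits with end-around carry.

    Returns (difference, non_negative). A carry out means a > b; an
    all-ones sum is one's complement negative zero, i.e. a == b.
    """
    total = a + (~b & MASK)
    if total > MASK:                      # carry out of the adder
        return (total & MASK) + 1, True   # end-around carry
    if total == MASK:                     # negative zero: a == b (assumed handling)
        return 0, True
    return total, False                   # negative result, discard


def serial_divide(dividend, divisor):
    """Divide by subtracting the divisor until the result goes negative.

    The quotient plays the role of the up counter in the abstract.
    """
    if not (0 <= dividend <= MASK and 0 < divisor <= MASK):
        raise ValueError("operands must be 4-bit, divisor nonzero")
    quotient = 0
    remainder = dividend
    while True:
        difference, non_negative = ones_complement_subtract(remainder, divisor)
        if not non_negative:
            break
        remainder = difference
        quotient += 1
    return quotient, remainder


print(serial_divide(13, 3))
print(serial_divide(12, 4))
```

The number of iterations equals the quotient plus one, which is why this approach trades latency for a very small datapath.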
1994
The increasing computation requirements of modern computer applications have stimulated a large interest in developing extremely high-performance floating-point dividers. A variety of division algorithms are available, with SRT being utilized in many computer systems. A careful analysis of SRT divider topologies has demonstrated that a relatively simple divider designed in an aggressive circuit style can achieve extremely high performance. Further, an aggressive circuit implementation can minimize many of the performance advantages of more complex divider algorithms. This paper presents the tradeoffs of the different divider topologies, the design of the divider, and performance results.
2000
Hardware dividers are needed in many application areas, such as computer floating-point units, communication systems, cryptography, and signal processing. The performance requirements of these applications differ regarding data and architectural issues. In this paper, the basic principles used in hardware integer dividers are shown. A hybrid data-dependent divider is proposed based on several improvements that speed up division on average.
1997
This paper presents two new radix-8 division algorithms. These algorithms are based on comparisons of relatively narrow operands, rather than on confusing P-D plots. It is hoped that by implementing comparison-based division algorithms, Pentium-style bugs would become less likely.
Lecture Notes in Computer Science, 2004
This paper surveys different implementations of dividers on FPGA technology. Special attention is paid to ATP (area-time-power) trade-offs between restoring, non-restoring, and SRT divider algorithms for different operand widths, remainder representations, and radices. The main results show that radix-2 SRT presents the best ATP figure. In combinational implementations, an important power improvement, up to 51%, is obtained with respect to traditional non-restoring implementations. Moreover, a power improvement of up to 93% can be achieved if pipelining is implemented. Finally, sequential implementation is another important way to reduce consumption, by more than 89%.
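For readers unfamiliar with the distinction between restoring and non-restoring division mentioned above, here is a minimal non-restoring sketch for unsigned operands. It is a textbook-style illustration, not one of the surveyed FPGA implementations, and the function name and `width` parameter are assumptions. The key point is that a negative partial remainder is not corrected immediately; the next step adds the divisor instead of subtracting it.

```python
def non_restoring_divide(dividend, divisor, width=8):
    """Unsigned non-restoring division with a final remainder correction.

    Exactly one addition or subtraction is performed per iteration:
    subtract the divisor when the partial remainder is non-negative,
    add it when the partial remainder is negative.
    """
    if divisor <= 0:
        raise ValueError("divisor must be a positive integer")
    if not 0 <= dividend < (1 << width):
        raise ValueError("dividend does not fit in the given width")

    remainder = 0
    quotient = 0
    for i in range(width - 1, -1, -1):
        bit = (dividend >> i) & 1
        if remainder >= 0:
            remainder = 2 * remainder + bit - divisor
        else:
            remainder = 2 * remainder + bit + divisor
        # The quotient bit is 1 when the new partial remainder is non-negative.
        quotient = (quotient << 1) | (1 if remainder >= 0 else 0)
    if remainder < 0:
        remainder += divisor  # single correction step at the end
    return quotient, remainder


print(non_restoring_divide(100, 7))
```

A restoring divider would instead undo each unsuccessful subtraction before the next shift, which costs an extra addition (or a multiplexer) per iteration.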
Scientific Reports
This article elaborates on the state-of-the-art novel Udayan S. Patankar (USP)-Awadhoot algorithm for distinctive implementation area improvement for area-critical electronic applications. The proposed USP-Awadhoot divider is a digit recurrence class, but it can be flexibly implemented as a restoring or nonrestoring algorithm. The implementation example indicates the use of the Baudhayan-Pythagoras triplet method in association with the proposed USP-Awadhoot divider. The triplet method provides an easy way to generate Mat_Term1, Mat_Term2, and T_Term, which are further utilized with the proposed USP-Awadhoot divider. The USP-Awadhoot divider is implemented in three parts. First is preprocessing circuit stage for executing a dynamic separate scaling operation on input operands, ensuring the inputs are in the correct form. Second is the processing circuit stage for implementing the conversion logic expressed by the Awadhoot matrix, and third is the postprocessing circuit stage for rec...
Solid-State Circuits …, 1994
A 1.2 µm CMOS combinational implementation of a new hybrid radix-4 division algorithm is presented. The algorithm is named hybrid because the dividend, the quotient, and the remainder are represented using the signed-digit set {-2, -1, 0, 1, 2}, while the divisor is represented using the conventional digit set {0, 1, 2, 3}. The divider requires the divisor Y to be pre-scaled to the range 1
1996 IEEE International Symposium on Circuits and Systems. Circuits and Systems Connecting the World. ISCAS 96, 1996
A hybrid radix-4/radix-8 architecture targeted for high bit multipliers is presented as a compromise between the high speed of a radix-4 multiplier architecture and the low power dissipation of a radix-8 multiplier architecture. In this hybrid radix-4/radix-8 multiplier architecture, the performance bottleneck of a radix-8 multiplier, the generation of three times the multiplicand for use in generating the radix-8 partial product, is performed in parallel with the reduction of the radix-4 partial products rather than serially, as in a radix-8 multiplier. This hybrid radix-4/radix-8 multiplier architecture requires 13% less power for a 64 x 64 bit multiplier, and results in only a 9% increase in delay, as compared with a radix-4 implementation. When supply voltage is scaled such that all multipliers exhibit the same delay, the 64 x 64 bit hybrid radix-4/radix-8 multiplier dissipates less power than either the radix-4 or radix-8 multipliers. The hybrid radix-4/radix-8 architecture is therefore appropriate for those applications that must dissipate minimal power and operate at high speeds.
2011 International Symposium on Electronic System Design, 2011
Vedic Mathematics is the ancient methodology of Indian mathematics, which has a unique computational technique for calculations based on 16 Sutras (formulae). A novel divider architecture for high-speed VLSI applications using this ancient methodology is presented in this paper. Propagation delay and dynamic power consumption of the divider circuitry were minimized significantly by removing unnecessary recursion through the Vedic division methodology. The functionality of these circuits was checked, and performance parameters such as propagation delay and dynamic power consumption were calculated by Spice Spectre using 90 nm CMOS technology. The propagation delay of the resulting circuit dividing a 16-bit binary dividend by an 8-bit divisor was only ~10.5 ns, and it consumed ~24 μW of power for a layout area of ~10.25 mm². By combining Boolean logic with ancient Vedic mathematics, a substantial number of iterations were eliminated, which resulted in a ~45% reduction in delay and a ~30% reduction in power compared with the most commonly used architectures (digit recurrence, convergence and series expansion).
The Journal of Engineering, 2014
A transistor-level implementation of a division methodology using ancient Vedic mathematics is reported in this Letter. The 'Dhvajanka (on top of the flag)' formula from Vedic mathematics was adopted to implement this type of divider for practical very large scale integration applications. The division methodology was implemented using half of the divisor bits instead of the actual divisor, together with subtraction and a small amount of multiplication. Propagation delay and dynamic power consumption of the divider circuitry were minimised significantly by stage reduction through the Vedic division methodology. The functionality of the division algorithm was checked, and performance parameters such as propagation delay and dynamic power consumption were calculated through Spice Spectre with 90 nm complementary metal oxide semiconductor technology. The propagation delay of the resulting (32 ÷ 16) bit divider circuitry was only ∼300 ns, and it consumed ∼32.5 mW of power for a layout area of 17.39 mm². By combining Boolean arithmetic with ancient Vedic mathematics, a substantial number of iterations were eliminated, resulting in ∼47%, ∼38% and 34% reductions in delay and ∼34%, ∼21% and ∼18% reductions in power compared with the most commonly used architectures (e.g. digit-recurrence, Newton-Raphson, Goldschmidt).
Unpublished, August, 1997
This paper presents two new radix-8 division algorithms. These algorithms are based on comparisons of relatively narrow operands, rather than on confusing P-D plots. It is hoped that by implementing comparison-based division algorithms, Pentium-style bugs would become less likely.
Integration, 2021
Energy efficiency has emerged as one of the most essential design parameters in contemporary computing system design. Approximate computing is a new computing paradigm that achieves energy efficiency by trading off accuracy for energy, area and latency improvements in error-resilient applications. This paper proposes the Reconfigurable Energy-efficient Approximate Divider (READ), which achieves several energy-quality configurable modes using a fixed restoring array divider architecture. Conventional approximate binary dividers require various divider hardware configurations to achieve distinct energy-quality trade-off points, which decreases hardware flexibility, especially for modern embedded systems. READ accomplishes energy efficiency while meeting the dynamically varying accuracy requirements of the targeted application. READ uses reconfigurable subtractor cells that can work in either accurate or approximate mode under the control of subtractor cell controller logic. The paper also introduces the design of an overflow detector using minimal hardware resources. A comprehensive accuracy and hardware evaluation on the CMOS 45-nm technology node is performed for the proposed dividers as well as other state-of-the-art divider designs. Compared to the accurate 16/8 divider design, the proposed divider shows an improvement of 49% in terms of energy efficiency and is 1.26x faster, while introducing minimal errors. The efficacy of the proposed divider design is demonstrated in image processing tasks, where it shows a nominal effect on the output quality.
Proceedings of ASP-DAC/VLSI Design 2002. 7th Asia and South Pacific Design Automation Conference and 15th International Conference on VLSI Design
In this paper, we present a new method of performing division in hardware and explore different ways of implementing it. This method involves computing a preliminary estimate of the quotient by splitting the dividend, performing division of each of the parts in parallel and merging them. The estimate is refined iteratively to get the final quotient. The method is fast because it carries out parallel operations to compute the preliminary quotient and makes use of a fast multiplier to refine the result. It is possible to pipeline the execution of the unit, yielding a further increase in throughput. Speed estimates show that this method yields a much higher throughput than other fast methods, while area and latency are comparable.