Alberto Nannarelli

Proceedings of the 43rd IEEE Midwest Symposium on Circuits and Systems (Cat.No.CH37144), 2000

The aim of this work is to reduce the power dissipated in high-order Finite Impulse Response (FIR) filters while keeping the delay unchanged. We compare the implementation of a traditional FIR filter with that of a Residue Number System (RNS) based one in terms of performance, area, and power dissipation. The resulting implementations, designed to work at the same clock rate, show that the RNS filter is smaller and consumes less power than the traditional one when the number of taps is larger than eight.
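The following Python sketch illustrates the RNS idea behind this comparison. It computes an FIR output independently in each residue channel and then reconstructs the result with the Chinese Remainder Theorem. The moduli set (7, 15, 16) and the names `to_rns`, `from_rns` and `fir_rns` are hypothetical choices made for illustration; they are not the paper's design. The sketch shows only that the RNS computation is arithmetically equivalent to the direct computation. It does not model the area or power results.

```python
from math import prod

# Hypothetical moduli for illustration only (pairwise coprime); not the paper's set.
MODULI = (7, 15, 16)
M = prod(MODULI)  # dynamic range: 1680, i.e. signed results in [-840, 839]

def to_rns(x):
    """Forward conversion: one residue per modulus (Python % is non-negative)."""
    return tuple(x % m for m in MODULI)

def from_rns(residues):
    """Reverse conversion with the Chinese Remainder Theorem, mapped to a signed range."""
    x = 0
    for r, m in zip(residues, MODULI):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)  # pow(a, -1, m) is a modular inverse (Python 3.8+)
    x %= M
    return x - M if x >= M // 2 else x

def fir_rns(coeffs, samples):
    """y[n] = sum_k h[k] * x[n-k], accumulated separately in each residue channel."""
    h = [to_rns(c) for c in coeffs]
    out = []
    for n in range(len(samples)):
        acc = [0] * len(MODULI)
        for k, hk in enumerate(h):
            if n - k < 0:
                break
            xk = to_rns(samples[n - k])
            for i, m in enumerate(MODULI):
                # Narrow modular multiply-accumulate; no carries cross between channels.
                acc[i] = (acc[i] + hk[i] * xk[i]) % m
        out.append(from_rns(acc))
    return out

def fir_direct(coeffs, samples):
    """Reference FIR computed with ordinary integer arithmetic."""
    return [sum(c * samples[n - k] for k, c in enumerate(coeffs) if n - k >= 0)
            for n in range(len(samples))]

h = [3, -2, 5, 1]
x = [4, -1, 7, 0, -6, 2]
print(fir_rns(h, x))  # [12, -11, 43, -15, 16, 25], same as fir_direct(h, x)
```

The reconstruction is correct only while every true output lies inside the signed dynamic range, so a real design would size the moduli to the filter's worst-case accumulation.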

Low Latency Digit-Recurrence Reciprocal and Square-Root Reciprocal Algorithm and Architecture

Fast radix-4 retimed division with selection by comparisons

Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors, 2002

A large portion of the critical path in a radix-4 division implementation is the delay of the quotient-digit selection module, so it is of interest to reduce this delay. This paper extends a recently presented approach. That approach prestores the selection constants that correspond to the actual value of the divisor, and it determines the quotient digit by carry-free subtraction and sign detection. The extension moves the subtraction earlier so that it lies outside the critical path. This change also allows the registers to be placed so as to minimize the cycle time. We present the method and report synthesis results using a family of standard cells. The extension gives a speedup of 1.35 with respect to the basic implementation and of 1.3 with respect to the previously mentioned approach. We estimate that the areas of all three units are about the same.
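The sketch below gives a simplified illustration of selection by comparison. It runs a radix-4 digit recurrence with digit set {-2, ..., 2}. The selection constants depend only on the divisor, so they are computed once before the iterations begin. Several simplifying assumptions apply. Operands are normalized to [1/2, 1). The arithmetic is exact rational instead of a truncated carry-save residual. The thresholds are the exact values (k + 1/2)d. The retiming of the subtraction described above is not modeled. The function name `radix4_divide` is hypothetical.

```python
from fractions import Fraction

def radix4_divide(x, d, n_digits):
    """Radix-4 digit recurrence with quotient digits in {-2, ..., 2}.

    Assumes 1/2 <= x < 1 and 1/2 <= d < 1. Invariant: |w[j]| <= (2/3) * d.
    Returns the digits, the quotient approximation and the final residual.
    """
    x, d = Fraction(x), Fraction(d)
    # Selection constants depend only on the divisor, so they are computed
    # once ("prestored"). Simplification: exact thresholds (k + 1/2) * d.
    thresholds = [Fraction(2 * k + 1, 2) * d for k in (-2, -1, 0, 1)]
    w = x / 4  # |x/4| < 1/4 <= (2/3) * d, so the invariant holds at the start
    digits = []
    for _ in range(n_digits):
        v = 4 * w
        # Selection by comparison: count how many constants v reaches.
        q = -2 + sum(1 for t in thresholds if v >= t)
        w = v - q * d  # w[j+1] = 4 * w[j] - q[j+1] * d
        digits.append(q)
    # Since w[0] = x/4, the quotient x/d is approximated by 4 * sum(q[j] * 4^-j).
    quotient = 4 * sum(Fraction(q, 4 ** (j + 1)) for j, q in enumerate(digits))
    return digits, quotient, w

digits, q, w = radix4_divide(Fraction(3, 4), Fraction(5, 8), 20)
print(float(q))  # approximately 1.2
```

After n digits the error is exactly 4^(1-n) · w[n] / d, so it is bounded by (2/3) · 4^(1-n).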

17th IEEE Symposium on Computer Arithmetic (ARITH'05), 2005

Code compression architecture for cache energy minimisation in embedded systems

Low-power radix-4 divider

IEE Proceedings - Computers and Digital Techniques, 2002


Proceedings of 1996 International Symposium on Low Power Electronics and Design, 1996

Low-power radix-8 divider

Proceedings International Conference on Computer Design. VLSI in Computers and Processors (Cat. No.98CB36273), 1998

This work describes the design of a double-precision radix-8 divider. Low-power techniques are applied in the design of the unit, and energy-delay tradeoffs are considered. The energy dissipated in the divider can be reduced by up to 70% compared with a standard implementation that is not optimized for energy, without penalizing the latency. The radix-8 divider is compared with the one obtained ...

Cached-code compression for energy minimization in embedded processors

A Radix-10 Digit-Recurrence Division Unit: Algorithm and Architecture

Proceedings of the 2001 international symposium on Low power electronics and design - ISLPED '01, 2001

This paper contributes a novel approach for reducing static code size and instruction fetch energy in cache-based core processors running embedded applications. Our implementation of the decompression unit guarantees fast, low-energy, on-the-fly instruction decompression at each cache lookup. The decompressor is placed outside the core boundaries. Therefore the processor architecture does not need any modification, making the proposed compression approach ...

Temperature aware power optimization for multicore floating-point units

2010 Conference Record of the Forty Fourth Asilomar Conference on Signals, Systems and Computers, 2010


Power and Thermal Efficient Numerical Processing

Handbook on Data Centers, 2015

Division Unit for Binary Integer Decimals

2009 20th IEEE International Conference on Application-specific Systems, Architectures and Processors, 2009

IEEE Transactions on Computers, 2007

In this work, we present a radix-10 division unit based on the digit-recurrence algorithm. Previous decimal division designs do not include recent developments in the theory and practice of this type of algorithm, which were developed for radix-2^k dividers. These features are adapted here. In addition, the radix-10 quotient digit is decomposed into a radix-2 digit and a radix-5 digit in such a way that only five and two times the divisor are required in the recurrence. The most significant slice of the recurrence, which includes the selection function, is implemented in radix-2. This avoids the additional delay introduced by the radix-10 carry-save additions and allows the paths to be balanced to reduce the cycle delay. The implementation results show that the latency of the proposed radix-10 division unit is close to that of radix-16 division units (which have a comparable dynamic range of significands). Its latency is also shorter than that of a radix-10 unit based on the Newton-Raphson approximation.
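The digit decomposition can be sketched as follows. The digit sets used here are assumptions for illustration: q is in [-7, 7] and is split as q = 5·qH + qL, with qH in {-1, 0, 1} and qL in {-2, ..., 2}. The point of the sketch is that forming q·d then needs only the multiples d, 2d and 5d, plus sign selection. The function names are hypothetical.

```python
def split_digit(q):
    """Split a radix-10 quotient digit q in [-7, 7] as q = 5*qH + qL.

    Assumed digit sets: qH in {-1, 0, 1}, qL in {-2, ..., 2}.
    """
    if not -7 <= q <= 7:
        raise ValueError("digit outside the assumed set [-7, 7]")
    qH = (q + 2) // 5  # floor division gives -1, 0 or 1 for q in [-7, 7]
    qL = q - 5 * qH    # equals ((q + 2) mod 5) - 2, so it lies in [-2, 2]
    return qH, qL

def digit_times_divisor(q, d):
    """Form q*d using only the multiples d, 2d and 5d, plus sign selection."""
    qH, qL = split_digit(q)
    d2, d5 = 2 * d, 5 * d  # the only multiples that must be generated
    high = qH * d5         # qH in {-1, 0, 1}: select -5d, 0 or +5d
    low = {0: 0, 1: d, 2: d2}[abs(qL)]
    if qL < 0:
        low = -low
    return high + low

for q in range(-7, 8):
    print(q, split_digit(q))
```

For example, split_digit(3) gives (1, -2), because 3 = 5 - 2.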

Reducing power dissipation in complex digital filters by using the quadratic residue number system

A Radix-10 Combinational Multiplier

2006 Fortieth Asilomar Conference on Signals, Systems and Computers, 2006

[Excerpt with figure content omitted. Fig. 5: m-digit radix-10 counter.] The adder can be divided into three stages. ...

Low-power division: comparison among implementations of radix 4, 8 and 16

Proceedings 14th IEEE Symposium on Computer Arithmetic (Cat. No.99CB36336), 1999

Although division is less frequent than addition and multiplication, its longer latency means that it dissipates a substantial part of the energy in floating-point units. In this paper we explore the relation between the radix and the energy dissipated. Previous work has been done on radix-4 and radix-8 division. Here we extend this study to a radix- ...

A variant of a radix-10 combinational multiplier

2008 IEEE International Symposium on Circuits and Systems, 2008

We consider the problem of adding the partial products in the combinational decimal multiplier presented by Lang and Nannarelli. In the original paper this addition is done with a tree of decimal carry-save adders. In this paper, we treat the problem using the multi-operand decimal addition previously published by Dadda, where the sum of each column of the partial product ...

A Hybrid RNS Adaptive Filter for Channel Equalization

by Alberto Nannarelli, Gian Cardarilli, and Andrea Del Re

2006 Fortieth Asilomar Conference on Signals, Systems and Computers, 2006

In this work a hybrid residue number system (RNS) implementation of an adaptive FIR filter is presented. The adaptation algorithm is least mean squares (LMS). The filter has been designed to meet the constraints of a specific class of applications. In fact, it is suitable for applications requiring a large number of taps, where a serial updating of the ...
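For reference, the sketch below shows a plain floating-point LMS update used to identify an unknown FIR system. It is not the hybrid RNS implementation, and it does not model the serial coefficient updating. The function name `lms_identify`, the step size and the test system are all hypothetical.

```python
import random

def lms_identify(true_h, n_samples=5000, mu=0.05, seed=1):
    """Adapt an N-tap FIR filter with LMS so that it mimics an unknown FIR system."""
    rng = random.Random(seed)
    n_taps = len(true_h)
    w = [0.0] * n_taps     # adaptive coefficients
    xbuf = [0.0] * n_taps  # delay line; xbuf[0] is the newest sample
    for _ in range(n_samples):
        xbuf = [rng.uniform(-1.0, 1.0)] + xbuf[:-1]
        desired = sum(h * x for h, x in zip(true_h, xbuf))  # unknown system output
        y = sum(wk * x for wk, x in zip(w, xbuf))           # adaptive filter output
        e = desired - y
        # LMS update: w[k] <- w[k] + mu * e * x[n-k]
        w = [wk + mu * e * x for wk, x in zip(w, xbuf)]
    return w

print(lms_identify([0.5, -0.3, 0.2, 0.1]))  # close to [0.5, -0.3, 0.2, 0.1]
```

The step size is kept small relative to the input power so that the noise-free adaptation converges steadily. A hardware version would also fix the word lengths of w and e.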

Combined Radix-10 and Radix-16 Division Unit

2007 Conference Record of the Forty-First Asilomar Conference on Signals, Systems and Computers, 2007

In this work we extend a previously proposed digit-recurrence radix-10 division unit so that it can also perform radix-16 division. The extension is simplified because the radix-10 implementation already decomposes the quotient digit into two parts, and this decomposition is also appropriate for the radix-16 case. In the radix-10 unit, the most-significant portion of the datapath, including the selection function, was implemented in radix-2 to reduce latency. As a result, adding radix-16 to that part consists mainly of combining the two modules that obtain the selection constants. The remaining modifications concern the generation of multiples, the carry-save adder, the carry-propagate adder, and the on-the-fly conversion and rounding. The implementation results show that the delay of an iteration is similar to that of the radix-10 case and that the area is about thirty percent larger.

Conference Record of the Thirty-Fourth Asilomar Conference on Signals, Systems and Computers (Cat. No.00CH37154), 2000

The aim of this work is to compare a complex FIR filter realized in the traditional two's complement system with a Quadratic Residue Number System (QRNS) based one, in terms of performance, area and power dissipation. The resulting implementations, designed to work at the same clock rate, show that the QRNS filter is almost half the size of the traditional one and dissipates about one third of the energy.
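The QRNS mapping behind this result can be sketched as follows. For a prime modulus m with m mod 4 = 1, there is a j with j² ≡ -1 (mod m). Mapping a + jb to the pair (a + jb, a - jb) mod m turns a complex multiplication into two independent modular multiplications, with no cross terms. The modulus 13 (with j = 5) is a hypothetical toy value. A real filter would need a much larger modulus, or several such moduli, to cover its dynamic range.

```python
M = 13  # hypothetical toy prime modulus with M % 4 == 1
J = 5   # J * J % M == M - 1, i.e. J is a square root of -1 modulo M

def to_qrns(a, b):
    """Map the Gaussian integer a + jb to the QRNS pair (z, z*) modulo M."""
    return (a + J * b) % M, (a - J * b) % M

def from_qrns(z, zs):
    """Inverse map: a = (z + z*) / 2 and b = (z - z*) / (2J), with modular inverses."""
    inv2 = pow(2, -1, M)  # modular inverse (Python 3.8+)
    invJ = pow(J, -1, M)
    a = (z + zs) * inv2 % M
    b = (z - zs) * inv2 * invJ % M
    return a, b

def qrns_mul(p, q):
    """Complex product as two independent real modular products."""
    return p[0] * q[0] % M, p[1] * q[1] % M

# (2 + 3j) * (4 - 1j) = 11 + 10j
print(from_qrns(*qrns_mul(to_qrns(2, 3), to_qrns(4, -1))))  # (11, 10)
```

One complex multiplication (four real products and two additions in the usual form) becomes two real modular products. This is the source of the savings that QRNS filters exploit.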

Low-power implementation of polyphase filters in Quadratic Residue Number system

by Alberto Nannarelli and Andrea Del Re

2004 IEEE International Symposium on Circuits and Systems (IEEE Cat. No.04CH37512), 2004

The aim of this work is to reduce the power dissipated in digital filters while keeping the timing unchanged. A polyphase filter bank in the Quadratic Residue Number System (QRNS) has been implemented. It was then compared with a polyphase filter bank implemented in the traditional two's complement system (TCS), in terms of performance, area, and power dissipation. The resulting implementations, designed to have the same clock rate, show that the QRNS filter is smaller and consumes less power than the TCS one.

Reducing power dissipation in FIR filters using the residue number system

Low-power radix-4 combined division and square root

Proceedings 1999 IEEE International Conference on Computer Design: VLSI in Computers and Processors (Cat. No.99CB37040), 1999

Tradeoffs between residue number system and traditional FIR filters

ISCAS 2001. The 2001 IEEE International Symposium on Circuits and Systems (Cat. No.01CH37196), 2001

In this work, a study on the implementation of FIR filters in the Residue Number System (RNS) is ...
