2007 44th ACM/IEEE Design Automation Conference, 2007
Asynchronous circuits are increasingly attractive as low-power or high-performance replacements for synchronous designs. A key part of these circuits is the asynchronous micropipeline; unfortunately, the existing micropipeline styles either improve performance or decrease power consumption, but not both. Very often, the pipeline register plays a crucial role in these cost metrics. In this paper we introduce a new register design, called self-resetting latches, for asynchronous micropipelines, which bridges the gap between fast but power-hungry latch-based designs and slow but low-power flip-flop designs. The energy-delay metric for large asynchronous systems implemented with self-resetting latches is, on average, 41% better than latch-based designs and 15% better than flip-flop designs.
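As an illustration of how an energy-delay comparison of this kind can be computed, here is a minimal sketch. It assumes the metric is the plain product of energy and delay and that "better" means a relative reduction of that product; the function names and the numbers in the usage note are hypothetical and are not taken from the paper.

```python
def energy_delay_product(energy_joules, delay_seconds):
    """Energy-delay product (EDP); lower is better. Assumed definition: E * D."""
    return energy_joules * delay_seconds


def relative_improvement(baseline_edp, new_edp):
    """Fractional EDP reduction of a new design relative to a baseline."""
    return (baseline_edp - new_edp) / baseline_edp


# Hypothetical figures, for illustration only (not measurements from the paper).
latch_edp = energy_delay_product(2.0e-12, 1.0e-9)
new_edp = energy_delay_product(1.5e-12, 0.8e-9)
improvement = relative_improvement(latch_edp, new_edp)
```

With these made-up inputs the new design's EDP is 40% lower than the baseline's. A design can win on this metric by saving energy, by saving delay, or by a mix of both, which is why it suits a register style that sits between latches and flip-flops.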
Proceedings Second International Symposium on Advanced Research in Asynchronous Circuits and Systems, 1996
This paper presents design and simulation results of two high-performance asynchronous pipeline circuits. The first circuit is a two-phase micropipeline but uses pseudo-static Svensson-style double edge-triggered D-flip-flops (DETDFF) for data storage in place of traditional transmission gate latches or Sutherland's capture-pass latches. The second circuit is a four-phase micropipeline with burst-mode control circuits.
ACM Journal on Emerging Technologies in Computing Systems, 2011
We present two novel energy-efficient pipeline templates for high-throughput asynchronous circuits. The proposed templates, called N-P and N-Inverter pipelines, use a single-track handshake protocol. There are multiple stages of logic within each pipeline. The proposed techniques minimize handshake overheads associated with input tokens and intermediate logic nodes within a pipeline template. Each template can pack a significant amount of logic in a single stage, while still maintaining a fast cycle time of only 18 transitions. Noise and timing robustness constraints of our pipelined circuits are quantified across all process corners. We present a completion detection scheme based on wide NOR gates, which results in significant latency and energy savings, especially as the number of outputs increases. To fully quantify all design trade-offs, three separate pipeline implementations of an 8x8-bit Booth-encoded array multiplier are presented. Compared to a standard QDI pipeline implementat...
IEICE Electronics Express
Voltage scaling is an effective technique for ultra-low-power applications. However, PVT variation severely degrades the robustness of traditional synchronous pipelines when the voltage scales into the sub-threshold region. In this paper, we propose a register-based bundled-data asynchronous pipeline, called Snake, that can operate robustly in the sub-threshold region. By looping the match delay line, Snake halves the design overhead compared to other asynchronous pipelines. We also propose a practical asynchronous design methodology that is compatible with commercial EDA tools and needs only a few modifications to the synchronous design flow. Monte-Carlo SPICE simulation shows that a pipelined multiplier applying the proposed techniques operates stably at 0.2 V, achieving a minimum power of 1.3 nW at 0.2 V and a minimum energy of 1.07 pJ per cycle at 0.3 V. It provides a 6.7-times advantage over the synchronous baseline design with a 22% area overhead. Comparison with other state-of-the-art works shows that the proposed techniques are quite competitive.
2008 14th IEEE International Symposium on Asynchronous Circuits and Systems, 2008
We present a technique to automatically synthesize heterogeneous asynchronous pipelines by combining two different latching styles: normally open D-latches for high performance and self-resetting D-latches for low power. The former is fast but results in high power consumption due to data glitches that leak through the latch when it is open. The latter is normally closed and is opened just before data stabilizes. Thus, it is more power-efficient but slower than normally open D-latches.
Circuits and Systems
The objective of this work is to design a new clock-gated flip-flop for pipelined architectures. In computing and consumer products, most of the dynamic power is consumed by the system's clock signal, typically about 30% to 70% of the total dynamic (switching) power consumption. Several techniques to reduce dynamic power have been developed, of which clock gating is predominant. In this work, a new methodology is applied to gate the flip-flop and thereby reduce power. Clock gating is applied to the pipeline-stage flip-flop, which is active only when valid data have arrived. The methodology used in this project, named Selective Look-Ahead Clock Gating, computes the clock enabling signals of each FF one cycle ahead of time, based on the present-cycle data of those FFs on which it depends. Similarly to data-driven gating, it is capable of stopping the majority of redundant clock pulses. In this work, the circuit implementation of the various blocks of data-driven clock gating is carried out and the results are observed. The proposed work is intended for pipeline stages in microprocessor and DSP architectures. The proposed method is simulated using Quartus for the Cyclone 3 kit.
International Journal of Engineering and Technology, 2011
Synchronous logic design is the dominant mainstream integrated circuit design methodology. Flip-flops are an inherent building block in any synchronous design. Furthermore, flip-flops constitute most of the load on the clock distribution and power networks, which are the main power-consuming networks of a synchronous integrated circuit. We survey, design and simulate a superset of flip-flops designed for low power and high performance. We highlight the basic design features of these flip-flops and evaluate them based on timing characteristics, power consumption, and other metrics. Moreover, we propose a new flip-flop design. We compare in finer detail the surveyed flip-flops with the lowest peak power reported in the literature; we show the competitiveness of the new design and make our recommendations.
IEEE Computer Society Annual Symposium on VLSI, 2014
Advances in deep submicron (DSM) technologies have led to miniaturization. However, they have also increased vulnerability to some electrical and device non-idealities, including soft errors. These errors are a significant threat to the reliable functionality of digital circuits. Several techniques for the detection and deterrence of soft errors (to improve reliability) have been proposed, in both the synchronous and asynchronous domains. In this paper we propose a low-power and soft-error-tolerant solution for synchronous systems that leverages an asynchronous pipeline within a synchronous framework. We name our technique the macro synchronous micro asynchronous (MSMA) pipeline. We provide a framework along with a timing analysis of the MSMA technique. MSMA is implemented using a macro synchronous system and a soft-error-tolerant, low-power version of a null convention logic (NCL) asynchronous circuit. We find that this solution can easily replace the intermediate stages of synchronous and asynchronous pipelines without changing their interface protocol. Such NCL asynchronous circuits can be used as standard cells in the synchronous ASIC design flow. Power and performance analysis using electrical simulations shows that this technique consumes at least 22% less power and achieves a 45% lower energy-delay product (EDP) compared to state-of-the-art solutions.
The demand for low-power circuit design has increased tremendously due to the explosive growth of battery-operated portable devices such as microcontrollers. Microcontrollers use register blocks that in turn consist of flip-flops. The mandate to reduce system power consumption and design energy-efficient ICs has led to the increasing use of low-power IC design techniques that prolong battery life. In this paper, a novel, highly efficient, power- and delay-optimized True Single Phase Clocked (TSPC) edge-triggered flip-flop is proposed. The proposed circuit uses fewer transistors than the conventional transmission gate D flip-flop, which reduces the overall power and delay. The proposed design is also free from both glitch and charge-sharing problems, making it suitable for high-speed and low-power applications. The circuits are simulated in the TANNER EDA simulation tool using PTM 180nm technology files to compare the performance of the proposed circuit with the existing ones. The circuit performs well at different supply voltages.
Journal of Circuits, Systems and Computers, 2011
With CMOS technology scaling, leakage power is expected to become a significant portion of the total power. A dual-threshold CMOS circuit, which has both high- and low-threshold transistors in a single chip, can be used to deal with the leakage problem in high-performance applications. This paper presents a dual-threshold voltage technique for reducing the leakage power dissipation of Quasi Delay Insensitive asynchronous pipelines while still maintaining the high performance of these circuits. We exploit the Dependency Graph model to produce a formal performance analysis. In order to reduce leakage power, an efficient algorithm for selecting and assigning high threshold voltage to templates of a pipeline is proposed. The results indicate that our proposed technique can achieve, on average, 40% savings in leakage power with no performance penalty.
Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017, 2017
Power dissipation is one of the primary design constraints in modern digital circuits. From a multitude of hand-held portable devices to big data analytics using high-performance computing, low energy dissipation is a key requirement for most modern devices. This paper showcases an elegant low-power circuit design methodology based on Relative Timing driven asynchronous techniques. A low-power MSP430 microprocessor design based on a novel asynchronous finite state machine implementation is presented. The design showcases the power benefits of the proposed asynchronous implementation over the synchronous counterpart and avoids major architectural modifications that would directly influence performance or power consumption. The implemented asynchronous MSP430 exhibits a minimum of 8x power benefit over the synchronous design for an almost identical pipeline structure and comparable throughput. The paper further elaborates on the novel asynchronous state machine implementation used ...
2003
With increasing clock frequencies and silicon integration, power-aware computing has become a critical concern in the design of embedded processors and systems-on-chip. One of the more effective and widely used methods for power-aware computing is dynamic voltage scaling (DVS). In order to obtain the maximum power savings from DVS, it is essential to scale the supply voltage as low as possible while ensuring correct operation of the processor. The critical voltage is chosen such that under a worst-case scenario of process and environmental variations, the processor always operates correctly. However, this approach leads to a very conservative supply voltage since such a worst-case combination of different variabilities will be very rare. In this paper, we propose a new approach to DVS, called Razor, based on dynamic detection and correction of circuit timing errors. The key idea of Razor is to tune the supply voltage by monitoring the error rate during circuit operation, thereby eliminating the need for voltage margins and exploiting the data dependence of circuit delay. A Razor flip-flop is introduced that double-samples pipeline stage values, once with a fast clock and again with a time-borrowing delayed clock. A metastability-tolerant comparator then validates latch values sampled with the fast clock. In the event of a timing error, a modified pipeline mispeculation recovery mechanism restores correct program state. A prototype Razor pipeline was designed in 0.18 µm technology and was analyzed. Razor energy overheads during normal operation are limited to 3.1%. Analyses of a full-custom multiplier and a SPICE-level Kogge-Stone adder model reveal that substantial energy savings are possible for these devices (up to 64.2%) with little impact on performance due to error recovery (less than 3%).
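The idea of tuning the supply voltage from an observed error rate can be sketched as a simple feedback loop. The sketch below is an assumption-laden toy, not the paper's controller: the error model `error_rate`, the critical voltage `v_crit`, the gain, and the voltage limits are all hypothetical values chosen only so that the loop behaves visibly.

```python
def error_rate(vdd, v_crit=1.2):
    """Hypothetical error model: no timing errors at or above v_crit,
    and an error rate that grows linearly as vdd drops below it."""
    return max(0.0, (v_crit - vdd) * 0.5)


def tune_voltage(vdd, target=0.01, gain=0.5, steps=200, v_min=0.8, v_max=1.8):
    """Proportional control: raise vdd when the observed error rate exceeds
    the target, lower it when the error rate is below the target."""
    for _ in range(steps):
        observed = error_rate(vdd)
        vdd += gain * (observed - target)
        vdd = min(v_max, max(v_min, vdd))  # keep vdd within assumed limits
    return vdd


settled = tune_voltage(1.8)
```

Under this toy model the loop settles slightly below `v_crit`, at the voltage where the modeled error rate equals the target. That is the point of the approach described above: the operating voltage is set by the errors actually observed, not by a worst-case margin.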
IEEE Access
Ongoing miniaturization in VLSI circuits continues to pose challenges to the synchronous design style conventionally used in microprocessors. These include distribution of the clock in the GHz range, robustness to delay variations, reduction in electromagnetic interference, and energy conservation, to name a few. Asynchronous logic has been known for its ability to address these challenges by means of closed-loop handshake protocols instead of clock signals. Because of these advantages, there have been numerous attempts at building general- and special-purpose microprocessors during the last three decades. Still, the number of commercially available asynchronous processors is small, mainly due to insufficient support from electronic design automation tools, an ambiguous design flow and testing mechanisms for asynchronous logic, and, most importantly, the absence of a forum in which to look for relevant works explaining the design steps and tools for such microprocessors. This work is intended to bridge this gap by 1) reviewing the design principles of asynchronous logic, including classification, signaling conventions, and pipelining approaches, 2) presenting the complete design flow and available EDA tools, 3) developing an encyclopedia of the various general- and special-purpose microprocessors proposed so far, and 4) presenting an evaluation of those works in terms of die area and performance metrics. This work will also serve as a set of guidelines for asynchronous microprocessor design and implementation in all phases from specification to tape-out. Index Terms: Asynchronous logic, electronic design and automation, microprocessor.
1994
The clock frequency of a synchronous circuit can be increased at the expense of increased system latency, area, and power using synchronous optimization techniques such as pipelining and retiming. Pipelining is a well-developed methodology, having been applied to almost every computer architecture from microprocessors to supercomputers. Retiming, on the other hand, has only recently become popular, with practical application areas currently being developed. Both pipelining and retiming are reviewed in this paper. In order to make retiming more generally useful, low-level circuit delay components inherent to ICs must be incorporated into the retiming process. These include variable register delay, clock skew, and interconnect delay. An algorithm is presented by the authors for incorporating variable register delays, interconnect delay, and clock skew into retiming. The algorithm identifies and eliminates path-dependent race conditions in synchronous circuits. The results of applying the algorithm to MCNC benchmarks are presented, and both performance and reliability improvements are observed.
IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, 1997
Energy consumption has become one of the important factors in digital systems, because of the requirement to dissipate this energy in high-density circuits and to extend the battery life in portable systems such as devices with wireless communication capabilities. Flip-flops are one of the most energy-consuming components of digital circuits. This paper presents techniques to reduce energy consumption by individually deactivating the clock when flip-flops do not have to change their value. Flip-flop structures are proposed and selection criteria given to obtain minimum energy consumption. The structures have been evaluated using energy models and validated by switch-level simulations. For the applications considered, significant energy reductions are achieved. Index Terms: Flip-flop energy model, gated clocks, low-power datapaths.
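The principle of deactivating the clock when a flip-flop does not have to change its value can be illustrated with a small behavioral count. This is a minimal sketch that assumes an idealized gating condition (clock enabled only when the input differs from the stored value); the function name and the input sequence are hypothetical, and the sketch ignores the energy cost of the comparison logic that the paper's energy models would account for.

```python
def gated_clock_pulses(data, q0=0):
    """Count the clock pulses a single flip-flop receives when its clock is
    enabled only in cycles where the input differs from the stored value."""
    q = q0
    pulses = 0
    for d in data:
        if d != q:  # gating condition: the stored value must change
            pulses += 1
            q = d
    return pulses


inputs = [0, 0, 1, 1, 1, 0, 1, 1]   # hypothetical input stream, one bit per cycle
ungated = len(inputs)               # an ungated flip-flop is clocked every cycle
gated = gated_clock_pulses(inputs)
```

For this input stream the gated flip-flop is clocked far less often than the ungated one. Whether that translates into a net energy saving depends on how often the data actually toggles and on the overhead of the gating logic, which is why selection criteria of the kind the abstract mentions are needed.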
IEEE Design & Test of Computers, 2011
One of the foundations of high-performance digital system design is the use of pipelining. In synchronous systems, for several decades, pipelining has been the fundamental technique used to increase parallelism and hence boost system throughput, whether for high-performance processors, multimedia and graphics units, or signal processors. This article provides an overview of pipelining in asynchronous, or clockless, digital systems. We do not attempt an exhaustive coverage, but rather introduce the basics of several leading representative styles. These pipelines naturally fall into two classes: those that use static logic versus those that use dynamic logic for the data path. Each class tends to use a distinct approach for its control and data storage. For static logic, we introduce the classic micropipeline of Sutherland [1], along with two high-performance variants: Mousetrap [2] (which uses a standard cell design) and GasP [3] (which uses a custom design). For dynamic logic, we present the classic PS0 pipeline of Williams and Horowitz [4,5], along with two high-performance variants: the precharge half-buffer (PCHB) pipeline [6] (which provides greater timing robustness) and the high-capacity (HC) pipeline [7] (which provides double the storage capacity). We also briefly discuss design tradeoffs, performance evaluation, system-level analysis and optimization techniques, CAD tool support, testing, and recent industrial and academic applications.
IEE Proceedings - Circuits, Devices and Systems, 1996
This paper presents new high-performance building blocks for two-phase micropipelines. We develop pseudo-static Svensson-style double edge-triggered D-flip-flops (DETDFF) for datapath storage in place of traditional capture-pass or transmission gate latches. We compare a DETDFF FIFO buffer implementation with the current state-of-the-art micropipeline implementation using four-phase controllers designed by Day and Woods for the AMULET-2 processor. We implemented both designs in the MOSIS 1.2 µm CMOS process and simulated them under the worst-case process corner with a 4.6 V power supply and at 100 °C. Our SPICE simulations show that the DETDFF design has 70% higher throughput. This higher throughput is due to latching the data on both edges of the latch control, removing the need for a reset phase and simplifying the control structures. In addition, we present two commonly used micropipeline event-control structures, the select and toggle elements, implemented using the extended-burst-mode 3D synthesis system. Detailed simulations demonstrate that our implementations are up to 50% faster than traditional implementations. This speed advantage can be primarily attributed to careful application of generalized C-elements rather than discrete basic gates.
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2000
Micropipelines and most of their variants use a delay-insensitive controller to moderate a pipeline. In search of improved performance, we depart from the delay-insensitive model in favor of a bounded-delay model for the controller. In particular, we demonstrate how a general delay-insensitive controller for level-sensitive pipelines can be improved by assuming a bounded-delay model and taking advantage of delay information to make the controller faster and more efficient. The new control scheme is referred to as locally clocked (LC) control. A highly pipelined logic technique called LC dynamic logic is presented that combines the bounded-delay controller with a latching dynamic logic gate design. Simulations comparing LC control with its delay-insensitive counterpart are presented. Also, an 8 × 8 bit multiplier with a maximum frequency of 715 MHz for a 1 µm CMOS process that uses LC dynamic logic is presented.
2007 IEEE International Symposium on Circuits and Systems, 2007
This paper presents a method to investigate power-performance tradeoffs in digital pipelined designs. The method is applied at the architectural level of the design. It will be shown that addressing the tradeoffs at this level results in significant savings in power consumption without impacting performance. The reduction in power is obtained by reducing the number of registers used in implementing the pipeline stages. The method has been validated by synthesizing a floating-point unit with different pipeline stages, and the power consumption of the designs was obtained using industry-standard tools. It is shown that it is possible to obtain up to an 18% reduction in power without affecting the clock period and with less area.
Memoria Investigaciones en Ingeniería N23, 2022
Asynchronous circuits are an alternative for designing digital systems that is attracting the interest of many researchers in the digital design area, mainly due to their low power consumption and robustness. One of the most compelling design paradigms for asynchronous circuits is NULL Convention Logic (NCL). Pipelining is a very common technique used in digital circuits to achieve high throughput. Although one can implement a pipeline using NCL gates, recent works have shown that register-less pipelines are possible using modified NCL gates. In this paper we propose two new Register-Less NCL (RL-NCL) pipeline architectures and two new methods to design NCL gates, which can be implemented even in Field Programmable Gate Arrays (FPGAs) or using the standard cells method. The proposed architecture achieved an average area reduction of 27.32%, an average latency reduction of 14.1% and an average throughput increase of 5.54% compared with the conventional NCL pipeline architecture.
Proceedings of the 1998 international symposium on Low power electronics and design - ISLPED '98, 1998
In this paper we propose a set of rules for consistent estimation of the real performance and power features of latch and flip-flop structures. A new simulation and optimization approach is presented, targeting both high-performance and power budget issues. The analysis approach reveals the sources of performance and power consumption bottlenecks in different design styles. Certain misleading parameters have been properly modified and weighted to reflect the real properties of the compared structures. Furthermore, the results of the comparison of representative latches and flip-flops illustrate the advantages of our approach and the suitability of different design styles for low-power and high-performance applications.