2004, … Workshop on Logic …
We present a complete toolflow that translates ANSI-C programs into asynchronous circuits. The toolflow is built around a compiler that converts C into a functional dataflow intermediate representation, exposing instruction-level, pipeline and memory parallelism. The compiler performs optimizations and converts the intermediate representation into pipelined asynchronous circuits, with no centralized controllers. In the resulting circuits, control is distributed, communication is achieved through local wires, and arbitration for datapath resources is unnecessary. Circuits automatically synthesized from Mediabench kernels exhibit excellent energy-delay.
Sixth International Workshop on High-Level Synthesis, 1992
This paper presents Achilles, a High-Level Synthesis System for asynchronous digital circuits. A new architecture model based on a completely distributed control structure is proposed. The most relevant differences from synthesis systems for synchronous circuits appear in the phases of scheduling and synthesis of the control. Signal Transition Graphs are automatically generated to describe the behavior of local controllers.
ACM Journal on Emerging Technologies in Computing Systems, 2011
We present two novel energy-efficient pipeline templates for high throughput asynchronous circuits. The proposed templates, called N-P and N-Inverter pipelines, use a single-track handshake protocol. There are multiple stages of logic within each pipeline. The proposed techniques minimize handshake overheads associated with input tokens and intermediate logic nodes within a pipeline template. Each template can pack a significant amount of logic in a single stage, while still maintaining a fast cycle time of only 18 transitions. Noise and timing robustness constraints of our pipelined circuits are quantified across all process corners. We present a completion detection scheme based on wide NOR gates, which results in significant latency and energy savings, especially as the number of outputs increases. To fully quantify all design trade-offs, three separate pipeline implementations of an 8x8-bit Booth-encoded array multiplier are presented. Compared to a standard QDI pipeline implementat...
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2010
A method is described for synthesising asynchronous circuits based on the Handshake Circuit paradigm but employing a data-driven, rather than a control-driven, style. This approach attempts to combine the performance advantages of data-driven asynchronous design styles with the handshake circuit style of construction used in existing syntax-directed synthesis. The method is demonstrated on a significant design: a 32-bit microprocessor. This example shows that the data-driven circuit style provides better performance than control-driven synthesised circuits. The paper extends previously reported work by illustrating how conditional execution, oft-cited as a problem for data-driven descriptions, is handled within the system, and by a more detailed analysis of the design example.
Though asynchronous design has been studied for decades, some objective design difficulties have made the synchronous design style more suitable for commercial applications. The promise of no clock skew, a higher degree of modularity, and low power consumption has generated a resurgence of interest in asynchronous logic within the scientific community.
2001
This paper describes a novel approach to high-level synthesis of complex pipelined circuits, including pipelined circuits with feedback. This approach combines a high-level, modular specification language with an efficient implementation. In our system, the designer specifies the circuit as a set of independent modules connected by conceptually unbounded queues. Our synthesis algorithm automatically transforms this modular, asynchronous specification into a tightly coupled, fully synchronous implementation in synthesizable Verilog.
• Approach: It presents a new approach to high-level synthesis. This approach combines the best of both worlds: a modular, asynchronous specification language and an automatically generated synchronous, fully pipelined implementation.
• Algorithms: It presents a relaxation algorithm for decreasing the clock cycle time and a coordinated global scheduling algorithm for mapping the individual operations of the modules into clock cycles. The latter is the enabling technology for efficient pipelining, as it allows the data to move together across the circuit even when the pipeline buffers are full.
2008 14th IEEE International Symposium on Asynchronous Circuits and Systems, 2008
We present a technique to automatically synthesize heterogeneous asynchronous pipelines by combining two different latching styles: normally open D-latches for high performance and self-resetting D-latches for low power. The former is fast but results in high power consumption due to data glitches that leak through the latch when it is open. The latter is normally closed and is opened just before data stabilizes. Thus, it is more power-efficient but slower than normally open D-latches.
2005 International Conference on Computer Design
The development of robust synthesis techniques and tools is important if asynchronous design is to gain more widespread acceptance. Handshake circuits are a method of constructing asynchronous circuits from a set of modular components connected by handshake channels. They offer a level of abstraction above a particular target technology or implementation style. The Balsa system employs the handshake circuit approach and has demonstrated that it can be used to rapidly generate large, robust circuits. This speed and flexibility is currently achieved at the cost of performance. This paper examines the problem of control overhead in handshake circuits and proposes new handshake component specifications and implementations that significantly reduce this overhead. These changes are incorporated into the Balsa synthesis system and are shown to produce a doubling of the performance of a 32-bit processor without making any changes to the original description.
Asynchronous techniques are regaining relevance in the VLSI research community because, by relaxing timing assumptions, they considerably increase robustness against process variability. In addition, asynchronous circuits enable low-power and high-speed designs. However, because no commercial standard cell libraries are dedicated to making the most of asynchronous design, implementations of such circuits are relegated to full-custom approaches only. This limits the applicability of asynchronous solutions and hinders further development of dedicated design automation tools. This paper describes an improvement to this situation by proposing a fully-automated design flow called ASCEnD-A, able to implement standard cells specifically required for asynchronous circuit design. The flow is capable of generating cells at the layout level, providing the physical, power and timing models required by cell-based flows available in state-of-the-art technologies.
IEEE Design & Test of Computers, 2011
IEEE Access
Ongoing miniaturization in VLSI circuits continues to pose challenges to the conventionally used synchronous design style in microprocessors. These include distribution of the clock in the GHz range, robustness to delay variations, reduction in electromagnetic interference, and energy conservation, to name a few. Asynchronous logic has been known for its ability to address the aforementioned challenges by means of closed-loop handshake protocols, instead of notorious clock signals. Because of these advantages, there have been numerous attempts at building general and special purpose microprocessors during the last three decades. Still, however, the number of asynchronous processors commercially available is small, mainly due to insufficient support from electronic design and automation tools, ambiguous design flows and testing mechanisms for asynchronous logic, and, most importantly, the absence of a forum to look for relevant works explaining the design steps and tools for such microprocessors. This work is intended to bridge this gap by 1) reviewing the design principles of asynchronous logic, including classification, signaling conventions, and pipelining approaches, 2) presenting the complete design flow and available EDA tools, 3) developing an encyclopedia of various general and special purpose microprocessors proposed so far, and 4) presenting an evaluation of those works in terms of area on the die and performance metrics. This work will also serve as guidelines for asynchronous microprocessor design and implementation in all phases from specification to tape out. INDEX TERMS Asynchronous logic, electronic design and automation, microprocessor.
Proceedings Second International Symposium on Advanced Research in Asynchronous Circuits and Systems, 1996
This paper presents design and simulation results of two high-performance asynchronous pipeline circuits. The first circuit is a two-phase micropipeline but uses pseudo-static Svensson-style double edge-triggered D flip-flops (DETDFF) for data storage in place of traditional transmission gate latches or Sutherland's capture-pass latches. The second circuit is a four-phase micropipeline with burst-mode control circuits.
This paper describes a novel approach to high-level synthesis of complex pipelined circuits, including pipelined circuits with feedback. This approach combines a high-level, modular specification language with an efficient implementation. In our system, the designer specifies the circuit as a set of independent modules connected by conceptually unbounded queues. Our synthesis algorithm automatically transforms this modular, asynchronous specification into a tightly coupled, fully synchronous implementation in synthesizable Verilog.
IEEE Transactions on Computer-aided Design of Integrated Circuits and Systems, 2006
Asynchronous implementation techniques, which measure logic delays at runtime and activate registers accordingly, are inherently more robust than their synchronous counterparts, which estimate worst case delays at design time and constrain the clock cycle accordingly. Desynchronization is a new paradigm to automate the design of asynchronous circuits from synchronous specifications, thus permitting widespread adoption of asynchronicity without requiring special design skills or tools. In this paper, different protocols for desynchronization are first studied, and their correctness is formally proven using techniques originally developed for distributed deployment of synchronous language specifications. A taxonomy of existing protocols for asynchronous latch controllers, covering, in particular, the four-phase handshake protocols devised in the literature for micropipelines, is also provided. A new controller that exhibits provably maximal concurrency is then proposed, and the performance of desynchronized circuits is analyzed with respect to the original synchronous optimized implementation. Finally, this paper proves the feasibility and effectiveness of the proposed approach by showing its application to a set of real designs, including a complete implementation of the DLX microprocessor architecture.
2006
Asynchronous microprocessors adapt more flexibly to physical parameters and have lower power consumption than synchronous microprocessors. In this paper we introduce the design of an asynchronous microprocessor (V8-uRISC) and explore its design process compared to synchronous design. The processor is synthesized by Persia, an automatic tool for synthesizing asynchronous circuits. We have performed full functional tests at various levels of design and synthesis. Our results show that an area overhead is expected for the asynchronous design as the cost of lower power and robustness.
The 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2020
Commercial high-level synthesis tools typically produce statically scheduled circuits. Yet, effective C-to-circuit conversion of arbitrary software applications calls for dataflow circuits, as they can handle efficiently variable latencies (e.g., caches) and unpredictable memory dependencies. Dataflow circuits exhibit an unconventional property: registers (usually referred to as "buffers") can be placed anywhere in the circuit without changing its semantics, in strong contrast to what happens in traditional datapaths. Yet, although functionally irrelevant, this placement has a significant impact on the circuit's timing and throughput. In this work, we show how to strategically place buffers into a dataflow circuit to optimize its performance. Our approach extracts a set of choice-free critical loops from arbitrary dataflow circuits and relies on the theory of marked graphs to optimize the buffer placement and sizing. We demonstrate the performance benefits of our approach on a set of dataflow circuits obtained from imperative code.
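As a rough illustration of why buffer placement affects throughput, here is a minimal sketch based on the standard marked-graph result that a loop carrying `tokens` tokens with a total latency of `latency` clock cycles can sustain at most `tokens / latency` tokens per cycle. The function names, the example loops, and the cap of one token per cycle are assumptions made for illustration; this is not the optimization procedure from the paper.

```python
from fractions import Fraction


def cycle_throughput(tokens, latency):
    """Upper bound on tokens per clock cycle for one choice-free loop."""
    return Fraction(tokens, latency)


def circuit_throughput(loops):
    """Bound set by the slowest loop.

    Assumption: no unit accepts more than one token per cycle,
    so the result is capped at 1.
    """
    bounds = [cycle_throughput(tokens, latency) for tokens, latency in loops]
    return min([Fraction(1)] + bounds)


# Hypothetical loops: (tokens circulating, total latency in clock cycles).
loops = [(1, 3), (2, 4)]
print(circuit_throughput(loops))  # 1/3
```

In this toy model, adding a buffer on a loop raises its latency, so the bound for that loop drops unless the loop also carries more tokens.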
Proceedings Design, Automation and Test in Europe Conference and Exhibition, 2004
A novel methodology and algorithm for the design of large low-power asynchronous systems are described. The system is synthesized by a commercial tool as a synchronous circuit, and subsequently converted into an asynchronous one. The conversion algorithm consists of extracting input and output sets, replacing the storage elements, identifying fork and join sets, and constructing request and acknowledge networks. A DLAP (Doubly Latched Asynchronous Pipeline) architecture is employed. The resulting asynchronous circuit can adapt its effective operating frequency to the supply voltage, facilitating flexible and efficient power management. The algorithm has been validated on several circuits.
2008 3rd International Design and Test Workshop, 2008
As the number of cores continues to grow in both digital signal and general purpose processors, tools which perform automatic scheduling from model-based designs are of increasing interest. CAL is a new actor/dataflow oriented language that aims at helping the programmer to express the concurrency and parallelism that are very important aspects of embedded system design as we enter the multicore era. The design framework is composed of the OpenDF simulation platform, the Cal2C and CAL2HDL code generators, and a multiprocessor scheduling tool called PREESM. In this paper, however, a subset of CAL is used to describe the application such that the application is SDF. This SDF graph is one starting point of the workflow of PREESM (composed of several plug-ins) to be prototyped/distributed/scheduled over an IP-XACT multiprocessor platform description. The PREESM automatic scheduling consists in statically distributing the tasks that constitute an application between available cores in a multi-core architecture in order to minimize the final latency. This problem has been proven to be NP-complete. An IDCT 2D example will be used as a test case of the full framework.
In order to convert a High Level Language (HLL) into hardware, a Control Dataflow Graph (CDFG) is a fundamental element. Alternatively, a Dataflow Architecture can be obtained directly from the CDFG. From the 1970s to the late 1980s, the Dataflow Model was the focus of attention because it provided parallelism in a natural form. In particular, a dynamic dataflow architecture can be generated to produce a high level of parallelism. In this paper, the ChipCflow project is described as a system to convert HLL into a dynamic dataflow graph to be executed in dynamically reconfigurable hardware, exploring dynamic reconfiguration. ChipCflow consists of various parts: the compiler to convert the C program into a dataflow graph; the operators and their instances; the tagged-token; and the matching data. Some results are presented in order to show a proof of concept for the project.
Design & Test of Computers, IEEE, 2006
2012 IEEE 11th International Conference on Signal Processing, 2012
Increasing use of multiprocessor system-on-chip (MPSoC) technology is an important trend in the design and implementation of signal processing systems. However, the design of efficient DSP software for MPSoC platforms involves complex inter-related steps, including data decomposition, memory management, and inter-task and inter-thread synchronization. These design steps are challenging, especially under strict constraints on performance and power consumption, and tight time to market pressures. To facilitate these steps, we have developed a new dataflow based design flow within the targeted dataflow interchange format (TDIF) design tool. Our new MPSoC-oriented design flow, called TDIF-PPG, is geared towards analysis and mapping of embedded DSP applications on MPSoCs. An important feature of TDIF-PPG is its capability to integrate graph level parallelism for DSP system flowgraphs and actor level parallelism for DSP functional modules into the application mapping processing. Here, graph level parallelism is exposed by the dataflow graph application representation in TDIF, and actor level parallelism is modeled by a novel model for multiprocessor dataflow graph implementation that we call the parallel processing group (PPG) model. We demonstrate our approach through actor and subsystem design for software defined radio.