Papers by Tiberiu Chelcea
IWLS, Jun 2, 2004
We present a complete toolflow that translates ANSI-C programs into asynchronous circuits. The toolflow is built around a compiler that converts C into a functional dataflow intermediate representation, exposing instruction-level, pipeline and memory parallelism. The compiler performs optimizations and converts the intermediate representation into pipelined asynchronous circuits, with no centralized controllers. In the resulting circuits, control is distributed, communication is achieved through local wires, and arbitration for datapath resources is unnecessary. Circuits automatically synthesized from Mediabench kernels exhibit excellent energy-delay.
Adding Faster with Application Specific Early Termination

Area Optimizations for Dual-Rail Circuits Using Relative-Timing Analysis
13th IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC'07), 2007
Future deep sub-micron technologies will be characterized by large parametric variations, which could make asynchronous design an attractive solution for use on a large scale. However, the investment in asynchronous CAD tools does not approach that in synchronous ones. Even when asynchronous tools leverage existing synchronous tool flows, they introduce large area and speed overheads. This paper proposes several heuristic and optimal algorithms, based on timing interval analysis, for improving existing asynchronous CAD solutions by optimizing area. The optimized circuits are 2.4 times smaller for an optimal algorithm and 1.8 times smaller for a heuristic one than the existing solutions. The optimized circuits are also shown to be resilient to large parametric variations, yielding better average-case latencies than their synchronous counterparts.
A burst-mode oriented back-end for the Balsa synthesis system
Proceedings 2002 Design, Automation and Test in Europe Conference and Exhibition, 2002
This paper introduces several new component clustering techniques for the optimization of asynchronous systems. In particular, novel "burst-mode aware" restrictions are imposed to limit the cluster sizes and to ensure synthesizability. A new control ...
2008 14th IEEE International Symposium on Asynchronous Circuits and Systems, 2008
We present a technique to automatically synthesize heterogeneous asynchronous pipelines by combining two different latching styles: normally open D-latches for high performance and self-resetting D-latches for low power. The former is fast but results in high power consumption due to data glitches that leak through the latch when it is open. The latter is normally closed and is opened just before data stabilizes. Thus, it is more power-efficient but slower than normally open D-latches.
2007 44th ACM/IEEE Design Automation Conference, 2007
Asynchronous circuits are increasingly attractive as low-power or high-performance replacements for synchronous designs. A key part of these circuits is the asynchronous micropipeline; unfortunately, the existing micropipeline styles either improve performance or decrease power consumption, but not both. Very often, the pipeline register plays a crucial role in these cost metrics. In this paper we introduce a new register design, called self-resetting latches, for asynchronous micropipelines which bridges the gap between fast, but power-hungry, latch-based designs and slow, but low-power, flip-flop designs. The energy-delay metric for large asynchronous systems implemented with self-resetting latches is, on average, 41% better than latch-based designs and 15% better than flip-flop designs.
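As a reading aid for the energy-delay figures quoted in this abstract, the following is a minimal sketch of how an energy-delay product and a relative improvement are commonly computed. It assumes that "better" means a lower energy-delay product and that the percentage is a fractional reduction relative to the baseline; the abstract does not state this. The function names and all numeric inputs are hypothetical and are not taken from the paper.

```python
def energy_delay_product(energy, delay):
    """Energy-delay product: energy per operation times delay per operation."""
    return energy * delay


def relative_improvement(baseline_edp, new_edp):
    """Fractional reduction of the energy-delay product versus a baseline."""
    return (baseline_edp - new_edp) / baseline_edp


# Hypothetical, unitless numbers chosen only to illustrate the arithmetic.
baseline = energy_delay_product(energy=100, delay=10)   # assumed baseline design
candidate = energy_delay_product(energy=59, delay=10)   # assumed improved design

print(f"improvement: {relative_improvement(baseline, candidate):.0%}")
```

Under this reading, a design with the same delay and 59% of the baseline energy would be reported as 41% better.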
Several approaches have been proposed for the syntax-directed compilation of asynchronous circuits from high-level specification languages, such as Balsa (1) and Tangram (13, 10). Both compilers have been successfully used in large real-world applications; however, in practice, these methods suffer from significant performance overheads due to their reliance on straightforward syntax-directed translation. This paper describes
Soma
Proceedings of the 3rd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis - CODES+ISSS '05, 2005
Arbitrary memory dependencies and variable latency memory systems are major obstacles to the synthesis of large-scale ASIC systems in high-level synthesis. This paper presents SOMA, a synthesis framework for constructing Memory Access Network (MAN) architectures that inherently enforce memory consistency in the presence of dynamic memory access dependencies. A fundamental bottleneck in any such network is arbitrating between concurrent accesses
International Workshop on Logic & Synthesis, 2005
A major constraint in high-level synthesis (HLS) for large-scale ASIC systems is memory access patterns. Typically, most state-of-the-art HLS tools severely constrain the kinds of memory references allowed in the source, requiring them to have predictable access patterns or requiring dependencies between them to be statically determinable. This paper shows how these constraints can be eliminated. We
Tartan
ACM SIGARCH Computer Architecture News, 2006
Spatial Computing (SC) has been shown to be an energy-efficient model for implementing program kernels. In this paper we explore the feasibility of using SC for more than small kernels. To this end, we evaluate the performance and energy efficiency of entire applications on Tartan, a general-purpose architecture which integrates a reconfigurable fabric (RF) with a superscalar core. Our compiler
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems - ASPLOS-XII, 2006
Spatial Computing (SC) has been shown to be an energy-efficient model for implementing program kernels. In this paper we explore the feasibility of using SC for more than small kernels. To this end, we evaluate the performance and energy efficiency of entire applications on Tartan, a general-purpose architecture which integrates a reconfigurable fabric (RF) with a superscalar core. Our compiler automatically partitions and compiles an application into an instruction stream for the core and a configuration for the RF. We use a detailed simulator to capture both timing and energy numbers for all parts of the system.

2007 44th ACM/IEEE Design Automation Conference, 2007
An effective method for focusing optimization effort on the most important parts of a design is to examine those elements on the critical path. Traditionally, the critical path is defined at the RTL level, as the longest path in the combinational logic between clocked registers. In this paper, we present a system-level timing analysis technique to define the concept of a Global Critical Path (GCP), for predicting system-level performance. We show how the GCP can be used as a theoretical and practical tool for understanding, summarizing and optimizing the behavior of highly concurrent self-timed circuits. We formally define the GCP and show how it can be constructed using a discrete event model and hardware profiling techniques. The GCP provides valuable insight into the control-path behavior of circuits and in finding system-level bottlenecks. We have incorporated the GCP construction and analysis framework into a high-level synthesis and simulation toolchain, thus enabling complete automation in modeling, analysis and optimization.
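To make the idea of a critical path over a discrete event model more concrete, here is a minimal sketch, not the paper's algorithm. It assumes a recorded trace in which each event stores its firing time and the events it had to wait for, and it traces the path by repeatedly stepping back to the last-arriving input. The `Event` class, its fields, and `trace_critical_path` are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Event:
    """One firing in a simulated trace (hypothetical structure)."""
    name: str
    time: float                                           # when the event fired
    inputs: List["Event"] = field(default_factory=list)   # events it waited for


def trace_critical_path(last_event: Event) -> List[Event]:
    """Walk backwards from the final event, always following the input
    that fired last, since that input is the one that delayed the event."""
    path = [last_event]
    current = last_event
    while current.inputs:
        current = max(current.inputs, key=lambda e: e.time)
        path.append(current)
    path.reverse()
    return path


# Tiny trace: two concurrent operations feed a join; the slower one is critical.
start = Event("start", 0.0)
mul = Event("mul", 5.0, [start])
add = Event("add", 2.0, [start])
join = Event("join", 6.0, [mul, add])

print([e.name for e in trace_critical_path(join)])
```

In this toy trace, speeding up `add` would not change when `join` fires, which is the kind of bottleneck information a critical-path analysis is meant to surface.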
Spatial computation
ACM SIGPLAN Notices, 2004
Spatial Computation. Mihai Budiu, Girish Venkataramani, Tiberiu Chelcea and Seth Copen Goldstein, {mihaib,girish,tibi,seth}@cs.cmu.edu, Carnegie Mellon University. Abstract: This paper describes a computer architecture ...

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2000
A major obstacle to successful high-level synthesis (HLS) of large-scale application-specific integrated circuit systems is the presence of memory accesses to a shared-memory subsystem. The latency to access memory is often not statically predictable, which creates problems for scheduling operations dependent on memory reads. More fundamental is that dependences between accesses may not be statically provable (e.g., if the specification language permits pointers), which introduces memory-consistency problems. Addressing these issues with static scheduling results in overly conservative circuits, and thus, most state-of-the-art HLS tools limit memory systems to those that have predictable latencies and limit programmers to specifications that forbid arbitrary memory-reference patterns. A new HLS framework for the synthesis and optimization of memory accesses (SOMA) is presented. SOMA enables specifications to include arbitrary memory references (e.g., pointers) and allows the memory system to incorporate features that might cause the latency of a memory access to vary dynamically. This raises the level of abstraction in the input specification, enabling faster design times. SOMA synthesizes a memory access network (MAN) architecture that facilitates dynamic scheduling and ordering of memory accesses. The paper describes a basic MAN construction technique that illustrates how dynamic ordering helps in efficiently maintaining memory consistency and how dynamic scheduling helps alleviate the variable-latency problem. Then, it is shown how static analysis of the access patterns can be used to optimize the MAN. One optimization changes the MAN interconnect topology to increase concurrency. A second optimization reduces the synchronization overhead necessary to maintain memory consistency.
Postlayout experiments demonstrate that SOMA's application-specific MAN construction significantly improves power and performance for a range of benchmarks.
Modeling the global critical path in concurrent systems
Modeling the Global Critical Path in Concurrent Systems. Girish Venkataramani, Tiberiu Chelcea, Mihai Budiu, Seth C. Goldstein. Carnegie Mellon University, Pittsburgh, PA; Microsoft Research, Mountain View, CA. {girish,tibi,seth}@cs.cmu.edu ...
Balsa-cube: an optimising back-end for the balsa synthesis system
Several approaches have been proposed for the syntax-directed compilation of asynchronous circuits from high-level specification languages, such as Balsa [1] and Tangram [13, 10]. Both compilers have been successfully used in large real-world ...
MCMAHAN, HUGH BRENDAN CMU-CS-06-166 MEADOWS, Catherine CMU-CS-06-172 MILLER, Gary CMU-CS-06-132
reports-archive.adm.cs.cmu.edu
2006 Technical Reports by Author. Computer Science Department, School of Computer Science, Carnegie Mellon University. ACAR, Umut A. CMU-CS-06-115, CMU-CS-06-168. AILAMAKI, Anastassia CMU-CS-06-116, CMU-CS-06-128, CMU-CS-06-139. ALDRICH, Jonathan CMU-CS-06-109, CMU-CS-06-178. ANDERSEN, David G. CMU-CS-06-114, CMU-CS-06-154. AVRAMOPOULOS, Ioannis CMU-CS-06-154. BALAN, Rajesh CMU-CS-06-120, CMU-CS-06-123. BENNETT, Paul N. CMU-CS-06-121. BENNETT, Rachael CMU-CS-06-125. ...