2009, Lecture Notes in Computer Science
Reconfigurable computing is an emerging paradigm enabled by the growth in size and speed of FPGAs. In this paper we discuss its place in the evolution of computing as a technology as well as the role it can play in the current technology outlook. We discuss the evolution of ROCCC (Riverside Optimizing Compiler for Configurable Computing) in this context.
2013
Reconfigurable systems can offer the high spatial parallelism and fine-grained, bit-level resource control traditionally associated with hardware implementations, along with the flexibility and adaptability characteristic of software. While reconfigurable systems create new opportunities for engineering and delivering high-performance programmable systems, the traditional approaches to programming and managing computations used for hardware systems (e.g. Verilog, VHDL) and software systems (e.g. C, Fortran, Java) are inappropriate and inadequate for exploiting reconfigurable platforms. To address this need, we develop a stream-oriented compute model, system architecture, and execution patterns which can capture and exploit the parallelism of spatial computations while simultaneously abstracting software applications from hardware details (e.g., timing, device capacity, microarchitectural implementation details) and consequently allowing applications to scale to exploit newer, larg...
2010
Reconfigurable computing platforms offer the promise of substantially accelerating computations through the concurrent nature of hardware structures and the capacity of these architectures for hardware customization.
Computer, 2000
2011
As the complexity of modern embedded systems increases, it becomes less practical to design monolithic processing platforms. As a result, reconfigurable computing is being adopted widely for more flexible design. Reconfigurable computers offer the spatial parallelism and fine-grained customizability of application-specific circuits with the post-fabrication programmability of software.
To accelerate the execution of an application, repetitive logic and arithmetic computation tasks may be mapped to reconfigurable hardware, since dedicated hardware can deliver much higher speeds than a general-purpose processor. However, this is only feasible if the run-time reconfiguration of new tasks is fast enough not to delay application execution. At present, architectural constraints intrinsic to current Field-Programmable Gate Array (FPGA) architectures stand in the way. Despite all the new features exhibited by current FPGAs, architecturally they are still largely based on general-purpose architectures that are inadequate for the demands of reconfigurable computing. Large configuration file sizes and poor hardware and software support for partial and dynamic reconfiguration limit the acceleration that reconfigurable computing may bring to applications. The objective of this work is the identification of the architectural limitations exhibited by current FPGAs...
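The feasibility condition stated in this abstract (offloading pays off only if reconfiguration is fast enough) can be illustrated with a simple timing model. The following is a minimal sketch, not taken from the paper: it assumes the reconfiguration time is charged in full to a single execution of the task, and the names `t_software`, `t_hardware`, and `t_reconfig` are hypothetical.

```python
def effective_speedup(t_software, t_hardware, t_reconfig):
    """Speedup of a hardware task over software once the reconfiguration
    time is charged to it. All times share one unit (e.g. milliseconds).

    Assumption: the reconfiguration is not hidden behind other work and
    is paid once per task execution.
    """
    return t_software / (t_hardware + t_reconfig)


def max_reconfig_time(t_software, t_hardware):
    """Largest reconfiguration time for which offloading still breaks even
    (effective speedup of exactly 1) under the same assumption."""
    return t_software - t_hardware


if __name__ == "__main__":
    # Hypothetical task: 10 ms in software, 1 ms in hardware.
    for t_reconfig in (0.0, 1.0, 9.0, 20.0):
        print(t_reconfig, effective_speedup(10.0, 1.0, t_reconfig))
```

Under this model a tenfold raw hardware advantage disappears entirely once reconfiguration takes nine times as long as the hardware execution itself, which is why large configuration files matter.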
Parallel …, 1999
This paper presents the Cameron Project, which aims to provide a high level, algorithmic language and optimizing compiler for the development of image processing applications on Reconfigurable Computing Systems (RCSs). SA-C, a single assignment variant of the C programming language, is designed to exploit both coarse-grain and fine-grain parallelism in image processing applications. Khoros, a software development environment commonly used for image processing, has been modified to support SA-C program development.
Dynamic programming languages have become increasingly popular and adaptive compilation, which uses runtime measurements to generate improved code, is a key technology for high performance implementations of such languages. While it has been used in servers and desktops, adaptive compilation has not been as successful in the low end and embedded systems and even less so in the high end such as supercomputers. SiliconSqueak is a parallel, reconfigurable architecture optimized for adaptive compilation which can address both computing extremes. This manycore system includes a mix of basic and extended processors, where these extensions are configurable accelerators. In FPGA implementations the ratio of these changes at runtime.
2011
In this paper we present "Snake", a novel technique for allocating and executing hardware tasks on partially reconfigurable Xilinx FPGAs. Snake alleviates the bottleneck introduced by the Internal Configuration Access Port (ICAP) in Xilinx FPGAs by reusing both intermediate partial results and previously allocated pieces of circuitry. Moreover, when making allocation decisions, Snake considers aspects often neglected in previous approaches, such as the technological constraints introduced by reconfigurable technology and inter-task communication issues. Because it is a realistic solution, its implementation on real FPGA hardware has been successful. We have checked its ability to reduce not only the overall execution time of a wide range of synthetic reconfigurable applications, but also the time overhead of making allocation decisions in the first place.
Proceedings 5th Australasian Computer Architecture Conference. ACAC 2000 (Cat. No.PR00512), 1999
A novel architecture for reconfigurable computing based on a coarse-grain, FPGA-like architecture is introduced. The basic blocks contain all arithmetic and logic capabilities as well as some registers, and are programmable by sequential instruction streams produced by a software compiler. Reconfiguration is related to hyperblocks of instructions. For the composed reconfigurable processors, a classification is introduced for describing real-time, multithreading and performance capabilities.
Journal of Signal Processing Systems, 2012
Embedded systems are adding ever more compute-intensive applications to their end products: cryptography and image and video processing are some examples found in leading markets like consumer electronics and automotive. To meet these ever-increasing computational demands, hardware accelerators synthesized in field-programmable gate arrays (FPGAs) can achieve processing speedups of orders of magnitude over their CPU-based software counterparts. However, the accompanying increase in physical resources raises cost. To address this issue, dynamically reconfigurable hardware technology has reached maturity. SRAM-based reconfigurable logic goes beyond the classical conception of static hardware resources distributed in space and held invariant for the entire application life cycle; it provides a new design abstraction characterized by the temporal partitioning of such resources to promote their continuous reuse, reconfiguring them on the fly to play a different role at each instant. This new computing paradigm makes it possible to balance the design of embedded applications by partitioning their functionality in space and time (through a series of mutually exclusive processing tasks synthesized and multiplexed in time on the same set of resources), thus achieving cost savings in both area and power. However, exploiting this versatility requires special attention to avoid performance degradation. Such technical aspects are addressed in this work, which is intended as a survey of reconfigurable hardware technology and aims to define an open, standard and cost-effective system architecture driven by flexible coprocessors instantiated on demand on the reconfigurable resources of an FPGA. This concept fits well with the functional features demanded of many embedded applications today, and its feasibility has been demonstrated with a state-of-the-art commercial SRAM-based FPGA platform. The achieved results highlight dynamic partial reconfiguration as a potential technology to lead the next computing wave in the industry.
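The area argument in this abstract (mutually exclusive tasks multiplexed in time on the same resources) can be made concrete with a small comparison. This is a simplified sketch under stated assumptions, not the paper's model: task areas are hypothetical figures in arbitrary units, the tasks never run concurrently, and the fixed overhead of the static region and reconfiguration controller is ignored.

```python
def static_area(task_areas):
    """Area when every task has its own dedicated hardware,
    resident for the whole application life cycle."""
    return sum(task_areas)


def time_multiplexed_area(task_areas):
    """Area when mutually exclusive tasks share one reconfigurable region,
    which must only be large enough for the biggest task.

    Assumption: no two tasks are ever active at the same time.
    """
    return max(task_areas)


if __name__ == "__main__":
    areas = [400, 250, 300]  # hypothetical task areas, arbitrary units
    print("static:", static_area(areas))
    print("time-multiplexed:", time_multiplexed_area(areas))
```

The saving comes at the price of reconfiguration time on each task switch, which is the performance degradation the abstract warns about.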
2005
ROCCC (Riverside Optimizing Configurable Computing Compiler) is an optimizing C-to-HDL compiler targeting FPGA and CSOC (Configurable System On a Chip) architectures. The ROCCC system is built on the SUIF-MACHSUIF compiler infrastructure. Our system first identifies frequently executed kernel loops inside programs and then compiles them to VHDL after optimizing the kernels to make the best use of FPGA resources. This paper presents an overview of the ROCCC project as well as the optimizations performed inside the ROCCC compiler.
Computer, 2000
Initial performance results with FPGAs were impressive. However, commercial FPGAs have inherent shortcomings, which heretofore made reconfigurable computing impractical for mainstream computing:
• Logic granularity. FPGAs are designed for logic replacement. The functional units' granularity is optimized to replace random logic, not to perform multimedia computations.
Reconfigurable computing will change the way computing systems are designed, built, and used. PipeRench, a new reconfigurable fabric, combines the flexibility of general-purpose processors with the efficiency of customized hardware to achieve extreme performance speedup.
Compilation Techniques for Reconfigurable Architectures, 2008
This chapter describes the most prominent academic efforts on compilation and synthesis of application codes written in high-level programming languages to reconfigurable architectures. The maturity of some of the compilation and mapping techniques described in Chaps. 4 and 5, and the stability of the underlying reconfigurable technologies, have enabled the emergence of commercial compilation solutions, such as the MAP compiler from SRC Computers [292] and the High-Level Compiler from Nallatech [223], both of which support the mapping of programs written in a subset of the C programming language to FPGAs. In this chapter, we distinguish between compilation efforts that target fine-grained commercially available reconfigurable devices, such as well-known FPGAs, and efforts that target architectures with proprietary reconfigurable devices, typically coarse-grained devices. Despite their granularity distinction, and thus the different mapping techniques used, these efforts exhibit many commonalities. We begin with a brief historical perspective on early compilation efforts, which naturally focused on fine-grained architectures. We then describe various representative compilation efforts, highlighting their use of the transformations and mapping techniques described in the previous two chapters. We conclude by summarizing and highlighting the differences between the described compilation efforts.
it - Information Technology, 2007
The CRC project focuses on the utilization of fast reconfiguration to optimize area, performance, and power. The results are quantified by a synthesizable architecture model and by a commercial architecture. In order to assure good applicability of the research, a C-compiler is co-developed with the architecture. This article provides an overview of the optimization techniques and a summary of current evaluation results.
Proceedings of 1998 Asia and South Pacific Design Automation Conference
Ersa, 2004
Modern reconfigurable computing systems feature powerful hybrid architectures with multiple microprocessor cores, large reconfigurable logic arrays and distributed memory hierarchies. Mapping applications to these complex systems requires a representation that allows both hardware and software synthesis. Additionally, this representation must enable optimizations that exploit fine- and coarse-grained parallelism in order to effectively utilize the performance of the underlying reconfigurable architecture. Our work explores a representation based on the program dependence graph (PDG) combined with static single-assignment (SSA) form for synthesis to high-performance reconfigurable devices. The PDG effectively describes control dependencies, while SSA yields precise data dependencies. When used together, these two representations provide a powerful, synthesizable form that exploits both fine- and coarse-grained parallelism. Compared to other commonly used representations for reconfigurable systems, the PDG+SSA form yields faster execution times while using similar area.
Proceedings of the 2006 ACM/IEEE Conference on Supercomputing, SC'06, 2006
High-Performance Reconfigurable Computers (HPRCs) based on integrating conventional microprocessors and Field Programmable Gate Arrays (FPGAs) have been gaining increasing attention in the past few years. With offerings from rising companies such as SRC and major high-performance computing vendors such as Cray and SGI, a wide array of such architectures is already available, and it is believed that more and more offerings by others will emerge. Furthermore, in spite of the recent birth of this class of high-performance computing architectures, the approaches followed by hardware and software vendors are starting to converge, signaling progress towards the maturity of this area and perhaps making room for standardization.
ACM Transactions on Reconfigurable Technology and Systems, 2009
Run-Time Reconfiguration (RTR) has been traditionally utilized as a means for exploiting the flexibility of High-Performance Reconfigurable Computers (HPRCs). However, the RTR feature comes with the cost of high configuration overhead which might negatively impact the overall performance. Currently, modern FPGAs have more advanced mechanisms for reducing the configuration overheads, particularly Partial Run-Time Reconfiguration (PRTR). It has been perceived that PRTR on HPRC systems can be the trend for improving the performance. In this work, we will investigate the potential of PRTR on HPRC by formally analyzing the execution model and experimentally verifying our analytical findings by enabling PRTR for the first time, to the best of our knowledge, on one of the current HPRC systems, Cray XD1. Our approach is general and can be applied to any of the available HPRC systems. The paper will conclude with recommendations and conditions, based on our conceptual and experimental work, for the optimal utilization of PRTR as well as possible future usage in HPRC.
2007
High-Performance Reconfigurable Computing (HPRC) systems have always been characterized by their high performance and flexibility. Flexibility has been traditionally exploited through the Run-Time Reconfiguration (RTR) provided by most of the available platforms. However, the RTR feature comes with the cost of high configuration overhead which might negatively impact the overall performance. Currently, modern FPGAs have more advanced mechanisms for reducing the configuration overheads, particularly Partial Run-Time Reconfiguration (PRTR). It has been perceived that PRTR on HPRC systems can be the trend for improving the performance. In this work, we will investigate the potential of PRTR on HPRC by formally analyzing the execution model and experimentally verifying our analytical findings by enabling PRTR for the first time, to the best of our knowledge, on one of the state-of-the-art HPRC systems, Cray XD1. Our approach is general and can be applied to any of the available HPRC systems. The paper will conclude with recommendations and conditions, based on our conceptual and experimental work, for the optimal utilization of PRTR as well as possible future usage in HPRC.