2000, VLSI: Systems on a Chip
Currently, multi-FPGA reconfigurable computing systems are still commonly used for accelerating algorithms. This technology, where acceleration is achieved by spatial implementation of an algorithm in reconfigurable hardware, has proven to be feasible. However, the best-suited algorithms are those that are highly structured, can benefit from deep pipelining and need only local communication resources. Many algorithms cannot fulfil the third requirement once the problem size grows and multi-FPGA systems become necessary. In this paper we address the emulation of a run-time reconfigurable processor architecture, which scales better for this kind of computing problem.
2013
— Reconfigurable systems can offer the high spatial parallelism and fine-grained, bit-level resource control traditionally associated with hardware implementations, along with the flexibility and adaptability characteristic of software. While reconfigurable systems create new opportunities for engineering and delivering high-performance programmable systems, the traditional approaches to programming and managing computations used for hardware systems (e.g. Verilog, VHDL) and software systems (e.g. C, Fortran, Java) are inappropriate and inadequate for exploiting reconfigurable platforms. To address this need, we develop a stream-oriented compute model, system architecture, and execution patterns which can capture and exploit the parallelism of spatial computations while simultaneously abstracting software applications from hardware details (e.g., timing, device capacity, microarchitectural implementation details) and consequently allowing applications to scale to exploit newer, larg...
To accelerate the execution of an application, repetitive logic and arithmetic computation tasks may be mapped to reconfigurable hardware, since dedicated hardware can deliver much higher speeds than those of a general-purpose processor. However, this is only feasible if the run-time reconfiguration of new tasks is fast enough not to delay application execution. Currently, this is opposed by architectural constraints intrinsic to current Field-Programmable Gate Array (FPGA) architectures. Despite all the new features exhibited by current FPGAs, architecturally they are still largely based on general-purpose architectures that are inadequate for the demands of reconfigurable computing. Large configuration file sizes and poor hardware and software support for partial and dynamic reconfiguration limit the acceleration that reconfigurable computing may bring to applications. The objective of this work is the identification of the architectural limitations exhibited by current FPGAs...
Many real-world engineering problems require high computational power, especially with regard to processing time. Current parallel processing techniques play an important role in reducing processing time. Recently, reconfigurable computing has gained considerable attention thanks to its ability to combine hardware performance and software flexibility. Also, the availability of high-density FPGA (Field-Programmable Gate Array) devices and corresponding development systems has allowed the popularization of reconfigurable computing, encouraging the development of very complex, compact and powerful systems for custom applications. This work presents an architecture for parallel reconfigurable computation based on the dataflow concept. This architecture allows reconfiguration of the system for many problems and, particularly, for numerical computation. Several experiments were carried out to analyze the scalability of the architecture, as well as to compare its performance with other approaches. Overall results are relevant and promising. The developed architecture has performance and scalability suited to engineering problems that demand intensive numerical computation.
2007 International Conference on Field Programmable Logic and Applications, 2007
Modern FPGAs' parallel computing capability and their ability to be reconfigured make them an ideal platform to build accelerators for supercomputing systems. As a multicore processor, the recently announced Cell Broadband Engine™ offers tremendous computing power. In this paper, we introduce a prototype system that combines these two types of computing devices together in a reconfigurable blade and we describe its architecture, memory system and abundant interfaces.
2013
Partial Reconfiguration (PR) is the ability to dynamically modify blocks of logic by downloading partial bit files while the remaining logic continues to operate without interruption. The concept is analogous to a processor context switch.
- System Flexibility: when a specific part of a design needs to be reconfigured, it is sometimes necessary to preserve the existing communication link instead of resetting the full device.
- Size and Cost Reduction: some functions are mutually exclusive in time, meaning they never need to exist at the same time. Instead of implementing all functions in parallel and selecting the needed one with a multiplexer, PR can dynamically load the needed function.
- Power Reduction: in embedded systems where power efficiency is an issue, some functions can be reconfigured with a blank bitstream to save power. Also, multiple versions of the same function can be made: a high-end implementation consuming a lot of power and a m...
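To make the context-switch analogy above concrete, the following Python sketch illustrates the idea of time-multiplexing mutually exclusive functions in a single reconfigurable region instead of instantiating them side by side behind a multiplexer. The region model, bitstream names and print-based "reconfiguration" are hypothetical illustrations, not any vendor's API.

```python
# Minimal sketch of the partial-reconfiguration idea described above:
# mutually exclusive functions share one reconfigurable region and are
# swapped in on demand instead of being instantiated side by side.
# The region/bitstream names are hypothetical illustrations only.

class ReconfigRegion:
    """Models one partially reconfigurable region of the device."""
    def __init__(self):
        self.loaded = None          # name of the currently loaded partial bitstream

    def load(self, bitstream):
        if self.loaded != bitstream:
            # In real hardware this would stream a partial bit file through
            # the configuration port while the rest of the design keeps running.
            print(f"reconfiguring region with {bitstream}")
            self.loaded = bitstream

FUNCTIONS = {                        # time-mutually-exclusive functions
    "fir_filter": "fir.partial.bit",
    "fft":        "fft.partial.bit",
    "blank":      "blank.partial.bit",   # blank bitstream to save power
}

region = ReconfigRegion()

def run(function_name, data):
    region.load(FUNCTIONS[function_name])   # the "context switch"
    return f"{function_name} processed {len(data)} samples"

print(run("fir_filter", range(1024)))
print(run("fft", range(1024)))
region.load(FUNCTIONS["blank"])             # power reduction between jobs
```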
ArXiv, 2011
In this paper, the acceleration of algorithms using a field-programmable gate array (FPGA) design as a prototype of a static dataflow architecture is discussed. The static dataflow architecture, using operators interconnected by parallel buses, was implemented. Accelerating algorithms using a dataflow graph in a reconfigurable system shows the potential for high computation rates. The results of benchmarks implemented on the static dataflow architecture are reported at the end of this paper. Keywords: accelerating algorithms, reconfigurable computing, static dataflow graph, Modules C to VHDL. 1. Introduction: With the advent of reconfigurable computing, basically using a Field-Programmable Gate Array (FPGA), researchers are trying to explore the maximum capacities of these devices, which are: flexibility, parallelism, optimization for power, security and real-time applications [7, 14]. Because of the complexity of the applications and the large possibilities to develop systems using FPGAs...
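The firing rule that gives a static dataflow machine its parallelism, namely that an operator executes as soon as all of its input tokens are available, can be sketched in a few lines of Python. The graph below, computing (a + b) * (c - d), is an invented example rather than one of the paper's benchmarks.

```python
# Sketch of static dataflow firing: a node executes as soon as all of its
# input tokens are present, so independent nodes can run in parallel in
# hardware. The graph below (for (a + b) * (c - d)) is illustrative only.

import operator

graph = {
    "add": {"op": operator.add, "inputs": ["a", "b"]},
    "sub": {"op": operator.sub, "inputs": ["c", "d"]},
    "mul": {"op": operator.mul, "inputs": ["add", "sub"]},
}

def evaluate(graph, tokens):
    """Fire every node whose operands are available until none are left."""
    pending = dict(graph)
    while pending:
        for name, node in list(pending.items()):
            if all(src in tokens for src in node["inputs"]):
                args = [tokens[src] for src in node["inputs"]]
                tokens[name] = node["op"](*args)     # the node "fires"
                del pending[name]
    return tokens

result = evaluate(graph, {"a": 3, "b": 4, "c": 10, "d": 6})
print(result["mul"])   # (3 + 4) * (10 - 6) = 28
```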
Journal of Computers, 2007
Embedded systems normally involve a combination of hardware and software resources designed to perform dedicated tasks. Such systems have widely crept into industrial control, automotive, networking, and consumer products. These systems require efficient devices that occupy small area and consume low power. The device area can be minimized by reusing the same hardware for different applications. If possible, reconfiguring the hardware to adapt to the application needs is important for reducing execution time and/or power consumption. Partial reconfiguration facilitates minimum hardware changes to form a new configuration. We have designed a reconfigurable vector processor for embedded applications. Benchmark results on Xilinx FPGAs (Field-Programmable Gate Arrays) are presented involving partial reconfiguration for embedded applications that process vectors. Two approaches are studied toward performance evaluation. The first one estimates the required partial reconfiguration time based on the resources consumed by the corresponding vector kernels. The second approach uses the actual measurement of partial reconfiguration time on a platform that supports a particular type of partial reconfiguration. More than 20% performance improvement has been observed for benchmark kernels, without neglecting the reconfiguration overhead. A framework is proposed as well to efficiently manage the reconfiguration overhead for applications involving multiple kernels.
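In the spirit of the first evaluation approach mentioned above, a rough estimate of partial reconfiguration time can be derived from the resources (configuration frames) a kernel occupies and the throughput of the configuration port. The sketch below uses placeholder numbers only; they are not values from the paper or from any particular device.

```python
# Back-of-the-envelope estimate of partial reconfiguration time from the
# resources a kernel occupies. All numbers are illustrative placeholders.

FRAME_BYTES = 164            # assumed size of one configuration frame
ICAP_BYTES_PER_SEC = 100e6   # assumed configuration-port throughput

def reconfig_time(frames_used):
    """Estimated time to load a partial bitstream covering `frames_used` frames."""
    return frames_used * FRAME_BYTES / ICAP_BYTES_PER_SEC

for kernel, frames in [("vector_add", 120), ("dot_product", 150)]:
    print(f"{kernel}: ~{reconfig_time(frames) * 1e3:.3f} ms")
```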
Proceedings of the 14th international conference on Compilers, architectures and synthesis for embedded systems - CASES '11, 2011
Coarse-grained reconfigurable architecture (CGRA) has emerged as a promising model for embedded systems, as a solution to reduce the complexity of FPGA synthesis and mapping steps and, consequently, reconfiguration time. Despite these advantages, CGRA usage has been limited due to the lack of commercial CGRA circuits. This work proposes a virtual and dynamic CGRA implemented on top of an FPGA. This approach allows the use of commercial off-the-shelf FPGA devices combined with the advantages of CGRAs. The proposed architecture consists of a set of heterogeneous functional units (FUs) and a global interconnection network. The global network allows any FU to be used at each cycle, which significantly reduces the placement complexity. In addition, we introduce a polynomial mapping algorithm which includes scheduling, placement and routing steps (SPR). Moreover, the proposed approach performs placement and routing very quickly in comparison to similar CGRA approaches: the three SPR steps are computed in a few milliseconds. The feasibility of this approach is demonstrated for a suite of digital signal processing benchmarks.
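The claim that a global interconnection network simplifies placement can be illustrated with a toy greedy list scheduler: since any free functional unit of the right type is reachable in every cycle, assigning operations cycle by cycle already yields a legal mapping. The FU mix and the small dataflow graph below are assumptions for illustration, not the paper's SPR algorithm.

```python
# Toy greedy scheduler/placer: with a global interconnect, any free FU of
# the right type can be chosen in each cycle. The FU budget and the tiny
# dependence graph are invented for illustration.

FUS = {"alu": 2, "mul": 1}                      # FUs available per type, per cycle

# op -> (FU type required, operations it depends on)
DFG = {
    "t1": ("alu", []),
    "t2": ("alu", []),
    "t3": ("mul", ["t1", "t2"]),
    "t4": ("alu", ["t3"]),
}

def schedule(dfg, fus):
    done, cycle, placement = set(), 0, {}
    while len(done) < len(dfg):
        free = dict(fus)                         # FU budget for this cycle
        for op, (kind, deps) in dfg.items():
            ready = op not in done and all(d in done for d in deps)
            if ready and free.get(kind, 0) > 0:
                placement[op] = (cycle, kind, fus[kind] - free[kind])  # (cycle, type, slot)
                free[kind] -= 1
        done.update(op for op, (c, _, _) in placement.items() if c == cycle)
        cycle += 1
    return placement

for op, slot in schedule(DFG, FUS).items():
    print(op, "->", slot)
```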
Proceedings 13th Symposium on Integrated Circuits and Systems Design (Cat. No.PR00843), 2000
This paper addresses the design and implementation of a configurable "combinatorial processor", a computational device, which can be used for solving different combinatorial problems. These can be characterized by a set of variables having a limited number of values with a corresponding set of operations that might be applied to these variables. Different mathematical models can be used to describe such tasks. We adopted a matrix representation, which is easier to treat in digital devices. The operations on discrete matrices are unique and cannot be efficiently performed on a general-purpose processor. Although the number of such operations grows exponentially with the number of variables, to solve a particular combinatorial problem a very small number of such operations is usually required. Hence the importance of providing for the dynamic change of operations. The paper presents an approach allowing the run-time modification of combinatorial computations via reloading the RAM-based configurable logic blocks of the FPGAs.
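As a rough illustration of the matrix view described above, the sketch below encodes a tiny set-cover instance as a Boolean matrix and applies one discrete-matrix operation (checking whether a chosen set of rows covers every column). The instance and the operation are invented examples, not taken from the paper.

```python
# Illustrative sketch of a combinatorial problem in matrix form: rows are
# candidate subsets, columns are elements, and the operation checks whether
# the selected rows jointly cover all columns. Made-up example only.

M = [
    [1, 0, 1, 0],   # subset A covers elements 0 and 2
    [0, 1, 0, 1],   # subset B covers elements 1 and 3
    [1, 1, 0, 0],   # subset C covers elements 0 and 1
]

def covers(matrix, rows):
    """True if the OR of the selected rows has a 1 in every column."""
    width = len(matrix[0])
    return all(any(matrix[r][c] for r in rows) for c in range(width))

print(covers(M, [0, 1]))   # True:  A and B together cover all elements
print(covers(M, [0, 2]))   # False: element 3 is left uncovered
```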
2011
Adaptive embedded systems are currently being investigated as an answer to more stringent requirements on low power, in combination with significant performance. It is clear that runtime adaptation can offer benefits to embedded systems over static implementations, as the architecture itself can be tuned to the problem at hand. Such architecture specialisation should be done fast enough that the overhead of adapting the system does not overshadow the benefits obtained by the adaptivity. In this paper, we propose a methodology for FPGA design that allows such fast reconfiguration for dynamic data folding applications. Dynamic Data Folding (DDF) is a technique to dynamically specialise an FPGA configuration according to the values of a set of parameters. The general idea of DDF is that each time the parameter values change, the device is reconfigured with a configuration that is specialised for the new parameter values. Since specialised configurations are smaller and faster than their generic counterpart, the hope is that their corresponding system implementation will be more cost-efficient. In this paper, we show that DDF can be implemented on current commercial FPGAs by using the parameterisable run-time reconfiguration methodology. This methodology comprises a tool flow that automatically transforms DDF applications into a runtime-adaptive implementation. Experimental results with this tool flow show that we can reap the benefits (smaller area and faster clocks) without too much reconfiguration overhead.
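The effect of specialising a configuration to the current parameter values can be mimicked in software: below, a generic multiplier is "folded" around a constant coefficient, and a power-of-two coefficient degenerates into a shift, which needs far less logic in hardware. This is only a Python analogy for the DDF idea, with made-up names, not the paper's tool flow.

```python
# Sketch of the Dynamic Data Folding idea: a generic datapath parameterised
# by `coeff` versus a version specialised for one coefficient value. When
# the parameter changes, a new specialised version is produced.

def generic_multiply(x, coeff):
    # generic datapath: full multiplier, works for any coefficient
    return x * coeff

def specialise(coeff):
    # "reconfiguration": fold the datapath around the current value;
    # e.g. multiplying by 8 degenerates into a constant shift
    if coeff & (coeff - 1) == 0:                 # power of two
        shift = coeff.bit_length() - 1
        return lambda x: x << shift
    return lambda x: x * coeff

mul_by_8 = specialise(8)                     # specialised configuration for coeff = 8
print(generic_multiply(5, 8), mul_by_8(5))   # both print 40

mul_by_3 = specialise(3)                     # parameter changed -> new specialised config
print(mul_by_3(5))                           # 15
```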
ACACES Poster Abstracts, L'Aquila, Italy
2007
This work deals with reconfigurable computation platforms for high-speed simulation of physical phenomena, based on numerical models of algebraic linear systems. This type of simulation is of extreme importance in research centers such as CENPES/Petrobras, which develops geophysical processing applications for oil and gas prospection. Currently, these applications are implemented on conventional PC clusters. A new approach for this type of problem is presented here, based on reconfigurable computer systems using Field-Programmable Gate Array (FPGA) technology, together with its implications regarding hardware/software partitioning, operating system, memory connections, communication and device drivers. Such technologies make appreciable gains possible in terms of performance (electric power and processing speed) when compared to conventional clusters. This solution also promotes cost reduction when applied to massive computation and to the high-complexity, large-data applications normally used in scientific computation.
1997
While reconfigurable computing promises to deliver incomparable performance, it is still a marginal technology due to the high cost of developing and upgrading applications.
Proceedings of the 2000 ACM/SIGDA eighth international symposium on Field programmable gate arrays - FPGA '00, 2000
With increased logic density due to the shift towards Deep Submicron (DSM) technologies, FPGAs have become a viable option for implementing large designs. However, most commercial FPGAs, due to their general-purpose architectural nature, cannot handle designs which require very high throughput. In this paper, we propose a novel high-throughput FPGA architecture which tries to combine the high performance of Application-Specific Integrated Circuits (ASICs) and the flexibility afforded by the reconfigurability of FPGAs. This architecture utilizes the concept of 'Wave-Steering' and works best for designs which are highly regular and have almost equal delays along all paths. It has enormous potential in digital signal and image processing applications, since a good portion of these applications are regular in nature. Preliminary results for some commonly used DSP designs are encouraging and yield throughputs in the neighborhood of 770 MHz in 0.5 µm CMOS technology.
2011
As the complexity of modern embedded systems increases, it becomes less practical to design monolithic processing platforms. As a result, reconfigurable computing is being adopted widely for more flexible design. Reconfigurable Computers offer the spatial parallelism and fine-grained customizability of application-specific circuits with the postfabrication programmability of software.
Computer, 2000
2016
During recent years, much research has focused on making Partial Reconfiguration (PR) more widespread. The FASTER project aimed at realizing an integrated toolchain that assists the designer in the steps of the design flow that are necessary to port a given application onto an FPGA device. The novelty of the framework lies in the use of partial dynamic reconfiguration, treated as a first-class citizen throughout the entire design flow, in order to exploit the FPGA device's potential. The STMicroelectronics SPEAr development platform combines an ARM processor with a Virtex-5 FPGA daughter-board. While partial reconfiguration on the attached board was considered feasible from the beginning, there was no full implementation of a hardware architecture using PR. This work describes our efforts to exploit PR on the SPEAr prototyping embedded platform. The paper discusses the implemented architecture, as well as the integration of the Run-Time System Manager for scheduling (run-time reconfigu...
ACS/IEEE International Conference on Computer Systems and Applications, 2003. Book of Abstracts., 2003
The main focus of this paper is on implementing high-level functional algorithms in reconfigurable hardware. The approach adopts the transformational programming paradigm for deriving massively parallel algorithms from functional specifications. It extends previous work by systematically generating efficient circuits and mapping them onto reconfigurable hardware. The massive parallelisation of the algorithm works by carefully composing "off the shelf" highly parallel implementations of each of the basic building blocks involved in the algorithm. These basic building blocks are a small collection of well-known higher-order functions such as map, fold, and zipWith. By using function decomposition and data refinement techniques, these powerful functions are refined into highly parallel implementations described in Hoare's CSP. The CSP descriptions are very closely associated with Handel-C program fragments. Handel-C is a programming language based on C and extended with parallelism and communication primitives taken from CSP. In the final stage, the circuit description is generated by compiling Handel-C programs and mapping them onto the targeted reconfigurable hardware, such as the Celoxica RC-1000 FPGA system. This approach is illustrated by a case study involving the generation of several versions of the matrix multiplication algorithm.
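Before refinement towards CSP and Handel-C, such a functional specification of matrix multiplication can be written purely in terms of map, fold (reduce) and zipWith. The Python sketch below is one possible rendering of that kind of specification, not code from the paper.

```python
# Functional specification of matrix multiplication built only from map-style
# comprehensions, fold (reduce) and zipWith, as described in the abstract.

from functools import reduce

def zip_with(f, xs, ys):
    return [f(x, y) for x, y in zip(xs, ys)]

def dot(row, col):
    # fold (+) 0 (zipWith (*) row col)
    return reduce(lambda acc, v: acc + v, zip_with(lambda a, b: a * b, row, col), 0)

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    # map the dot product over each row of A and each column of B
    return [[dot(row, col) for col in bt] for row in a]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul(A, B))   # [[19, 22], [43, 50]]
```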
2011
In this paper we present "Snake", a novel technique for allocating and executing hardware tasks onto partially reconfigurable Xilinx FPGAs. Snake alleviates the bottleneck introduced by the Internal Configuration Access Port (ICAP) in Xilinx FPGAs by reusing both intermediate partial results and previously allocated pieces of circuitry. Moreover, Snake considers aspects often neglected in previous approaches when making allocation decisions, such as the technological constraints introduced by reconfigurable technology and inter-task communication issues. Being a realistic solution, it has been successfully implemented on real FPGA hardware. We have checked its ability to reduce not only the overall execution time of a wide range of synthetic reconfigurable applications, but also the time overhead of making allocation decisions in the first place.
International Journal of Reconfigurable Computing, 2011