2003, Lecture Notes in Computer Science
In this paper we present compiler extensions for the Molen programming paradigm, a sequential consistency paradigm for programming custom computing machines (CCMs). The compiler supports instruction set extensions and register file extensions. Based on pragma annotations in the application code, it identifies the code fragments implemented on the reconfigurable hardware and automatically maps the application onto the target reconfigurable architecture. We also define and implement a mechanism that allows multiple operations to be executed in parallel on the reconfigurable hardware. In a case study, we evaluated the Molen processor using two popular multimedia benchmarks, mpeg2enc and ijpeg, with some well-known time-consuming operations implemented in the reconfigurable hardware. The total number of executed instructions was reduced by 72% for mpeg2enc and by 35% for the ijpeg encoder, compared to their pure software implementations on a general-purpose processor (GPP).
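As a rough illustration of pragma-based annotation, the sketch below marks a sum-of-absolute-differences kernel as a candidate for hardware execution. The pragma name `call_fpga` and the function name `sad16` are illustrative assumptions; the abstract does not give the annotation syntax. A compiler that does not recognize the pragma ignores it, so the function still runs as ordinary software.

```c
#include <stdlib.h>

/* Hypothetical annotation: `call_fpga` and `sad16` are assumed names used
 * only for illustration. An annotation-aware compiler could replace calls
 * to this function with instructions that drive the reconfigurable
 * hardware; any other compiler simply ignores the unknown pragma. */
#pragma call_fpga sad16
int sad16(const unsigned char *a, const unsigned char *b)
{
    int sum = 0;
    /* Software fallback: sum of absolute differences over 16 samples. */
    for (int i = 0; i < 16; ++i)
        sum += abs((int)a[i] - (int)b[i]);
    return sum;
}
```

The point of this style is that the application source stays valid C: the same file builds for a plain GPP and for the reconfigurable target, and only the annotated call sites are treated differently.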
ACM Transactions on Embedded Computing Systems, 2007
In this paper, we describe the compiler developed to target the Molen reconfigurable processor and programming paradigm. The compiler automatically generates optimized binary code for C applications, based on pragma annotation of the code executed on the reconfigurable hardware. For the IBM PowerPC 405 processor included in the Virtex II Pro platform FPGA, we implemented code generation and register and stack frame allocation following the PowerPC EABI (Embedded Application Binary Interface). The PowerPC backend has been extended to generate the appropriate instructions for the reconfigurable hardware and data transfer, taking into account information about the specific hardware implementations and the system. Starting with an annotated C application, a complete design flow has been integrated to generate the executable bitstream for the reconfigurable processor. The flexible design of the proposed infrastructure makes it possible to take the special features of reconfigurable architectures into account. In order to hide the reconfiguration latencies, we implemented an instruction scheduling algorithm for the dynamic hardware configuration instructions. The algorithm schedules the hardware configuration instructions in advance, taking into account the conflicts for reconfigurable hardware resources (FPGA area) between the hardware operations. To verify the Molen compiler, we used the multimedia video frame M-JPEG encoder, whose extended Discrete Cosine Transform (DCT*) function was mapped onto the FPGA. We obtained an overall speedup of 2.5 (about 84% of the maximal theoretical speedup of 2.96). This efficiency was achieved using an automatically generated, non-optimized DCT* hardware implementation. The instruction scheduling algorithm has been tested for DCT, Quantization and VLC operations. Based on simulation results, we determine that, while simple scheduling produces a significant performance decrease, our proposed scheduling contributes up to a 16x M-JPEG encoder speedup.
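The efficiency figure quoted above can be checked with a few lines of arithmetic. The sketch below assumes the theoretical maximum follows Amdahl's law with an infinitely fast accelerator; the abstract does not state how the 2.96 limit was derived, so `accelerated_fraction` is an interpretation rather than the authors' model. Both function names are hypothetical.

```c
/* Fraction of the theoretical maximum speedup that was actually achieved. */
double speedup_efficiency(double achieved, double theoretical_max)
{
    return achieved / theoretical_max;
}

/* Assumption: the limit follows Amdahl's law with an infinitely fast
 * accelerator, i.e. max = 1 / (1 - f). Inverting gives the fraction f of
 * the original runtime spent in the accelerated kernel. */
double accelerated_fraction(double theoretical_max)
{
    return 1.0 - 1.0 / theoretical_max;
}
```

With the quoted numbers, `speedup_efficiency(2.5, 2.96)` is roughly 0.845, matching the "about 84%" in the abstract, and under the Amdahl assumption `accelerated_fraction(2.96)` is roughly 0.66, meaning about two thirds of the software runtime would lie in the accelerated kernel.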
IEEE Transactions on Computers, 2004
In this paper, we present a polymorphic processor paradigm incorporating both general-purpose and custom computing processing. The proposal incorporates an arbitrary number of programmable units, exposes the hardware to the programmers/designers, and allows them to modify and extend the processor functionality at will. To achieve these attributes, we present a new programming paradigm, a new instruction set architecture, a microcode-based microarchitecture, and a compiler methodology. The programming paradigm, in contrast with conventional programming paradigms, allows general-purpose conventional code and hardware descriptions to coexist in a program. In our proposal, for a given instruction set architecture, a one-time instruction set extension of eight instructions is sufficient to implement the reconfigurable functionality of the processor. We propose a microarchitecture based on reconfigurable hardware emulation to allow high-speed reconfiguration and execution. To prove the viability of the proposal, we experimented with the MPEG-2 encoder and decoder and a Xilinx Virtex II Pro FPGA. We have implemented three operations: SAD, DCT, and IDCT. The overall attainable application speedup is between 2.64 and 3.18 for the MPEG-2 encoder and between 1.56 and 1.94 for the decoder, representing between 93 percent and 98 percent of the theoretically obtainable speedups.
The performance advantages of reconfigurable technology have been widely recognized. However, programming reconfigurable systems and designing hardware accelerators for them is not a trivial task. The Molen paradigm provides an easy-to-use approach to coupling a General Purpose Processor (GPP) with custom-designed reconfigurable accelerators, both at the program level and at the hardware design level. In this case study, we illustrate the entire design flow to demonstrate how one can use the Delft-Workbench Automated Reconfigurable VHDL Generator (DWARV) tool, the Molen compiler and the Molen reconfigurable co-processor to accelerate C application code in hardware. The G721 audio encoder is used as the case study application. The implementation platform is a Xilinx Virtex-II Pro XC2VP30-7 FPGA, which integrates two PowerPC 405 processors. The experimental results obtained after employing the described design flow suggest an overall application speedup of 2.7 times over a pure software implementation.
2008
One of the upcoming challenges in embedded processing is to incorporate an increasing amount of adaptivity in order to respond to the multifarious constraints induced by today's embedded systems that feature complex and diverse application behaviors.
2010
Reconfigurable computing platforms offer the promise of substantially accelerating computations through the concurrent nature of hardware structures and the ability of these architectures for hardware customization.
Lecture Notes in Computer Science, 2004
We use the Xilinx Virtex II Pro™ technology as a prototyping platform to design a MOLEN polymorphic processor, a custom computing machine based on the co-processor architectural paradigm. The PowerPC embedded in the FPGA operates as a general-purpose (core) processor and the reconfigurable fabric is used as a reconfigurable co-processor. The paper focuses on hardware synthesis results and experimental performance evaluation, proving the viability of the MOLEN concept. More precisely, the MPEG-2 application is accelerated very close to its theoretical limits by implementing SAD, DCT and IDCT as reconfigurable co-processors. For a set of popular test video sequences, the MPEG-2 encoder overall speedup is in the range between 2.64 and 3.18. The speedup of the MPEG-2 decoder varies between 1.65 and 1.94.
Processor Design, 2007
The capability to tailor the processor instruction set architecture (ISA) around the computational requirements of a given application is proposed today as the most appealing way to match performance with very short time-to-market while reducing non-recurring engineering (NRE) costs. From Mask-Time Configurable Processors (MTCPs) to Run-Time Reconfigurable Processors (RTRPs), ISA customization is performed by "moving" kernels of the initial code from software to hardware, thus introducing a design space exploration problem involving skills in both software and hardware design. Since adaptive processors appear as the natural extension of Digital Signal Processors (DSPs), programming tools for customizable processors need to be as similar as possible to standard software development environments, in order to open adaptive computing to the wide audience of DSP programmers. While fast design-space explorations can be performed using high-level description languages, programmers proficient in hardware design can further improve the performance through "structural" descriptions involving, for example, the direct utilization of macro-operators or the possibility of balancing critical paths through register insertion. The widespread knowledge of ANSI C among developers suggests its usage as the main entry language for both configurable and reconfigurable architectures, thus introducing the problem of translating C code (or C dialects) into some kind of hardware description, be it HDL in the case of MTCPs or a bit-stream for RTRPs. In this context, Data-Flow Graphs (DFGs) can be efficiently used to close the gap between hardware and software design, thus representing the most natural bridge between the hardware and software descriptions. Furthermore, standard ANSI C can be used by the programmer for the management of the application control flow on the processor core, embedding custom-designed instructions in
2005
A recent approach to platform-based design involves the use of extensible processors, which offer possibilities for architecture customization. Part of the designer's responsibility is the domain-specific extension of the baseline processor to fit customer requirements. Key issues of this process are automated application analysis and candidate instruction identification/selection for implementation as application-specific functional units (AFUs). In this paper, a design approach that encapsulates automated workload characterization and instruction generation is utilized for extending processors to efficiently support embedded application sets. The method used for instruction generation is a highly parameterized adaptation of the MaxMISO technique, which allows for fast design space exploration. It is shown that only a small number of AFUs are needed to support the algorithms of interest (MPEG-4 encoding kernels) and that it is possible to achieve 2× to 3.5× performance improvements, even though further possibilities such as subword parallelization are not currently considered.
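To make the instruction generation step more concrete, the sketch below partitions a small data-flow graph into maximal multiple-input single-output (MISO) groups, each a candidate custom instruction. It is a generic MaxMISO-style pass, not the parameterized adaptation the abstract refers to; the names `N_OPS`, `edge` and `max_miso_partition`, and the assumption that operations are numbered in topological order, are all illustrative.

```c
#define N_OPS 6  /* number of operations in the example graph (assumed) */

/* edge[u][v] != 0 means the result of operation u is consumed by
 * operation v. Operations are assumed to be numbered in topological
 * order, so every edge satisfies u < v. On return, miso_of[i] holds the
 * id of the group that operation i belongs to. */
void max_miso_partition(int edge[N_OPS][N_OPS], int miso_of[N_OPS])
{
    int next_id = 0;

    for (int i = 0; i < N_OPS; ++i)
        miso_of[i] = -1;                      /* -1 means unassigned */

    /* Walk from the graph outputs backwards; every operation that is
     * still unassigned becomes the single output (root) of a new group. */
    for (int root = N_OPS - 1; root >= 0; --root) {
        if (miso_of[root] != -1)
            continue;
        int id = next_id++;
        miso_of[root] = id;

        /* Absorb a producer only if every consumer of its result is
         * already inside this group, which keeps the single output. */
        for (int u = root - 1; u >= 0; --u) {
            if (miso_of[u] != -1)
                continue;
            int consumers = 0, all_inside = 1;
            for (int v = u + 1; v < N_OPS; ++v) {
                if (!edge[u][v])
                    continue;
                ++consumers;
                if (miso_of[v] != id)
                    all_inside = 0;
            }
            if (consumers > 0 && all_inside)
                miso_of[u] = id;
        }
    }
}
```

For a graph with edges 0→2, 1→2, 2→4, 3→4 and 3→5, operations 0, 1, 2 and 4 end up in one group, while operation 3 stays alone because its result is consumed by two different groups. A practical flow would additionally exclude operations that cannot be implemented in an AFU (memory accesses, for example) and bound the number of inputs per group.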
Compilation Techniques for Reconfigurable Architectures, 2008
This chapter describes the most prominent academic efforts on compilation and synthesis of application codes written in high-level programming languages to reconfigurable architectures. The maturity of some of the compilation and mapping techniques described in Chaps. 4 and 5, and the stability of the underlying reconfigurable technologies, have enabled the emergence of commercial compilation solutions, such as the MAP compiler from SRC Computers [292] and the High-Level Compiler from Nallatech [223], both of which support the mapping of programs written in a subset of the C programming language to FPGAs. In this chapter, we distinguish between compilation efforts that target fine-grained commercially available reconfigurable devices, such as well-known FPGAs, and efforts that target architectures with proprietary reconfigurable devices, typically coarse-grained devices. Despite their granularity distinction, and thus the different mapping techniques used, these efforts exhibit many commonalities. We begin with a brief historical perspective on early compilation efforts, which naturally focused on fine-grained architectures. We then describe various representative compilation efforts, highlighting their use of the transformations and mapping techniques described in the previous two chapters. We conclude by summarizing and highlighting the differences between the described compilation efforts.
2008
Processors with a reconfigurable instruction set combine the performance of dedicated application accelerators with a flexibility that goes beyond that of traditional Application Specific Instruction Set Processors (ASIPs). The latter are optimized for certain application domains and thus typically do not provide high performance and/or efficiency when deployed in other domains. State-of-the-art Reconfigurable Processors, on the other hand, still use the concept of monolithic Special Instructions (SIs, i.e. the application accelerators). In our work, we instead present modular SIs as a hierarchy of elementary data paths and different SI implementations that facilitate high flexibility and performance. This is a novel concept that achieves a speedup of 26.6x compared to a General Purpose Processor and 1.24x compared to a state-of-the-art Reconfigurable Processor (that is statically optimized for the predetermined benchmark situation) when executing an H.264 video encoder. We introduce a novel infrastructure for computation and communication that enables the implementation of modular SIs and offers various parameters to match specific requirements. The infrastructure is implemented and tested on an FPGA-based prototype to demonstrate its feasibility.
ACM Transactions on Embedded Computing Systems, 2003
Proceedings of the international conference on Compilers, architecture, and synthesis for embedded systems - CASES '01, 2001
2007 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation, 2007
International Journal of Electronics, 2007
… Symposium on Circuits and Systems, 2006. …, 2006
Lecture Notes in Computer Science, 2004