2005, Design, Automation and Test in Europe
FPGAs, as computing devices, offer significant speedup over microprocessors. Furthermore, their configurability offers an advantage over traditional ASICs. However, they do not yet enjoy high-level language programmability, as microprocessors do. This has become the main obstacle for their wider acceptance by application designers.
ACM Transactions on Architecture and Code Optimization, 2008
The wider acceptance of FPGAs as a computing device requires a higher level of programming abstraction. ROCCC is an optimizing C to HDL compiler. We describe the code generation approach in ROCCC. The smart buffer is a component that reuses input data between adjacent iterations. It significantly improves the performance of the circuit and simplifies loop control. The ROCCC-generated datapath can execute one loop iteration per clock cycle when there is no loop dependency or when the only dependency is a scalar recurrence variable. ROCCC's approach to supporting while-loops operating on scalars enables the compiler to move scalar iterative computation into hardware.
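The data reuse that the smart buffer provides can be illustrated with a small software model. This is only a sketch under stated assumptions, not ROCCC's generated hardware: the 3-element window, the function names `window_sum_naive` and `window_sum_buffered`, and the read counter are all hypothetical and chosen for illustration. The naive loop fetches three inputs per iteration, while the buffered loop keeps the two overlapping values in local variables (standing in for registers) and fetches one new input per iteration.

```c
#include <assert.h>
#include <stddef.h>

#define TAPS 3 /* hypothetical window size, not taken from the paper */

/* Naive version: every iteration re-reads all TAPS inputs from memory. */
void window_sum_naive(const int *in, int *out, size_t n) {
    assert(n >= TAPS);
    for (size_t i = 0; i + TAPS <= n; ++i) {
        out[i] = in[i] + in[i + 1] + in[i + 2];
    }
}

/* Buffered version: w0 and w1 model registers that carry the values shared
 * with the previous iteration, so only one new input is read per iteration.
 * Returns the total number of input reads performed. */
size_t window_sum_buffered(const int *in, int *out, size_t n) {
    assert(n >= TAPS);
    int w0 = in[0];
    int w1 = in[1];
    size_t reads = 2; /* initial fill of the window */
    for (size_t i = 0; i + TAPS <= n; ++i) {
        int w2 = in[i + 2]; /* the only new read in this iteration */
        ++reads;
        out[i] = w0 + w1 + w2;
        w0 = w1; /* shift the window by one element */
        w1 = w2;
    }
    return reads;
}
```

For an input of n elements the buffered version performs n reads in total, compared with 3(n - 2) for the naive version, while producing the same outputs.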
FPGA computing is widely regarded as a means to dramatically improve computational performance. The real obstacle to its widespread diffusion is the lack of compilation tools that accept common specification languages (such as ANSI C); instead, FPGAs have to be programmed either in very low-level HDLs or in non-standard languages that are dialects derived from C but remain far from standard C. To overcome these drawbacks, Ylichron developed a compilation chain, the HARWEST Compiling Environment (HCE), which allows algorithms to be mapped onto FPGAs to be specified as standard C programs. As a consequence, no special skills are required to access the power of FPGA computing, and no special effort has to be spent learning proprietary languages. The HCE design flow and some performance figures are presented in the paper.
Proceedings. The 5th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (Cat. No.97TB100186)
2010 18th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2010
While FPGA-based hardware accelerators have repeatedly been demonstrated as a viable option, their programmability remains a major barrier to their wider acceptance by application code developers. These platforms are typically programmed in a low-level hardware description language, a skill not common among application developers and a process that is often tedious and error-prone. Programming FPGAs from high-level languages would provide easier integration with software systems as well as open up hardware accelerators to a wider spectrum of application developers. In this paper, we present a major revision to the Riverside Optimizing Compiler for Configurable Circuits (ROCCC) designed to create hardware accelerators from C programs. Novel additions to ROCCC include (1) intuitive modular bottom-up design of circuits from C, and (2) separation of code generation from specific FPGA platforms. The additions we make do not introduce any new syntax to the C code and maintain the high-level optimizations from the ROCCC system that generate efficient code. The modular code we support functions identically as software or hardware. Additionally, we enable user control of hardware optimizations such as systolic array generation and temporal common subexpression elimination. We evaluate the quality of the ROCCC 2.0 tool by comparing it to hand-written VHDL code. We show comparable clock frequencies and an 18% higher throughput. The productivity advantages of ROCCC 2.0 are evaluated using the metrics of lines of code and programming time, showing an average of 15x improvement over hand-written VHDL.
2018
Multi-Processor System-on-Chip FPGAs can utilize programmable logic for compute-intensive functions, using so-called Accelerators, implementing a heterogeneous computing architecture. Thereby, Embedded Systems can benefit from the computing power of programmable logic while still maintaining the software flexibility of a CPU. As a design option to the well-established RTL design process, Accelerators can be designed using High-Level Synthesis. The abstraction level for the functionality description can be raised to algorithm level by a tool generating HDL code from a high-level language like C/C++. The Xilinx tool Vivado HLS allows the user to guide the generated RTL implementation by inserting compiler pragmas into the C/C++ source code. This paper analyzes the possibilities to improve the performance of an FPGA accelerator generated with Vivado HLS and integrated into a Vivado block design. It investigates how much the pragmas affect the performance and resource cost and shows pro...
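As a rough illustration of guiding the generated RTL with pragmas, the sketch below places a `PIPELINE` directive with a target initiation interval of 1 inside a loop. The function `dot_product` and the size `N` are hypothetical and not taken from the paper; the pragma spelling follows the Vivado HLS directive syntax as generally documented, and the actual effect on performance and resources depends on the tool version and target device. An ordinary C compiler ignores the pragma (possibly with a warning), so the same code also serves as a software model.

```c
#include <assert.h>

#define N 8 /* hypothetical vector length, chosen for illustration */

/* Dot product of two fixed-size vectors. Under Vivado HLS, the pragma asks
 * the tool to pipeline the loop, aiming to start one iteration per clock
 * cycle (II=1). In plain software the pragma has no effect. */
int dot_product(const int a[N], const int b[N]) {
    assert(a && b);
    int acc = 0;
    for (int i = 0; i < N; ++i) {
#pragma HLS PIPELINE II=1
        acc += a[i] * b[i];
    }
    return acc;
}
```

Whether the tool can actually reach the requested interval depends on the loop-carried dependency through `acc` and on memory port availability, which is the kind of trade-off between performance and resource cost that the abstract refers to.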
IEEE Transactions on Computer-aided Design of Integrated Circuits and Systems, 2011
Escalating system-on-chip design complexity is pushing the design community to raise the level of abstraction beyond register transfer level. Despite the unsuccessful adoptions of early generations of commercial high-level synthesis (HLS) systems, we believe that the tipping point for transitioning to HLS methodology is happening now, especially for field-programmable gate array (FPGA) designs. The latest generation of HLS tools has made significant progress in providing wide language coverage and robust compilation technology, platform-based modeling, advancement in core HLS algorithms, and a domain-specific approach. In this paper, we use AutoESL's AutoPilot HLS tool coupled with domain-specific system-level implementation platforms developed by Xilinx as an example to demonstrate the effectiveness of state-of-art C-to-FPGA synthesis solutions targeting multiple application domains. Complex industrial designs targeting Xilinx FPGAs are also presented as case studies, including comparison of HLS solutions versus optimized manual designs. In particular, the experiment on a sphere decoder shows that the HLS solution can achieve an 11-31% reduction in FPGA resource usage with improved design productivity compared to hand-coded design.
Although high-level synthesis tools and processor synthesis tools have emerged to improve the design productivity of SoC components, we have yet to see a practical solution for the challenging tasks of system-level integration and verification of these individual components, which involve different design languages, different tool sets and different debugging environments. In this paper, we propose a new C-based design framework where the RTL structure is directly described in a dataflow C coding style, while the same C code serves as a fast simulation model. A design example on an image signal processing pipeline shows the effectiveness of the proposed C-based tool framework: the dataflow C code has 1/5 of the number of lines compared to HDLs and can generate high-performance circuits with a parallelism of 4000 operations/cycle. For RISC processor designs, our dataflow C coding style also effectively captures the behavior of the instruction set simulator in fewer than 1000 lines of C code, which runs at 11M cycles/sec (42x faster than RTL simulation) and can be directly transformed into an RTL description.
This paper describes our approaches to raise the level of abstraction at which hardware suitable for accelerating computationally intensive applications can be specified. Field-programmable gate arrays are becoming adopted as a computational platform by the high-performance computing community, but there are challenges to extract maximum performance from these devices. Unlike other approaches, our focus is on data memory organization and input–output bandwidth considerations, which are the typical stumbling block of existing hardware compilation schemes. We describe our approaches, which are based on formal optimization techniques, and present some results showing the advantage of exposing the interaction between data memory system design and parallelism extraction to the compiler.
2010
Reconfigurable computers, where one or more FPGAs are attached to a conventional microprocessor, are promising platforms for code acceleration. Despite their advantages, programmability concerns and the lack of efficient design tools and compilers for FPGAs are preventing the technology's widespread adoption. Traditional compiler technology is specific to microprocessor-based systems and needs to be customized and augmented to address the needs of reconfigurable computing. The challenges are several: the resource and performance constraints for FPGAs are drastically different from those of microprocessors, and compiling for FPGAs requires laying out the computation in space as a circuit rather than in time as a sequence of instructions.