Photonics Applications in Astronomy, Communications, Industry, and High-Energy Physics Experiments 2013, 2013
Parallel computing is emerging as an important area of research in computer architectures and software systems. Many algorithms can be greatly accelerated using parallel computing techniques. Specialized parallel computer architectures are used to accelerate specific tasks. Measurement systems in high-energy physics experiments often use FPGAs for fine-grained computation. An FPGA combines many benefits of both software and ASIC implementations: like software, the mapped circuit is flexible and can be reconfigured over the lifetime of the system. FPGAs therefore have the potential to achieve far greater performance than software by bypassing the fetch-decode-execute cycle of traditional processors and possibly exploiting a greater level of parallelism. Creating parallel programs implemented in FPGAs is, however, not trivial. This paper presents existing methods and tools for fine-grained computation implemented in FPGAs using behavioral description and high-level programming languages.
The Mini-Symposium "Parallel computing with FPGAs" aimed at exploring the many ways in which field programmable gate arrays can be arranged into high-performance computing blocks. Examples include high-speed operations obtained by sheer parallelism, numerical algorithms mapped into hardware, co-processing time critical sections and the development of powerful programming environments for hardware software co-design.
Current high-performance computing (HPC) applications are found in many consumer, industrial and research fields. From web searches to auto crash simulations to weather prediction, these applications demand large amounts of power for the compute farms and supercomputers required to run them. The demand for more and faster computation continues to increase, along with an even sharper increase in the cost of the power required to operate and cool these installations. The ability of standard processor-based systems to address these needs, in both speed of computation and power consumption, has declined over the past few years. This paper presents a new method of computation based on programmable logic, as represented by Field Programmable Gate Arrays (FPGAs), that addresses these needs while requiring only minimal changes to the current software design environment.
High-Performance Computing with FPGA-Based Parallel Data Processing Systems, 2024
Traditional Central Processing Unit (CPU) and Graphics Processing Unit (GPU) architectures are becoming unsuitable for High-Performance Computing (HPC) due to their high power consumption and inability to process data in real time. This study introduces a novel FPGA-based parallel data processing system that capitalizes on the remarkable reconfigurability of FPGAs to enhance the speed and efficiency of computation. The system's modular architecture combines pipeline parallelism with dataflow computation, enabling continuous, concurrent execution of tasks with substantially reduced latency. Throughput and scalability are improved by dynamic task scheduling, enhanced resource allocation, hardware-accelerated compute cores, and other critical features. Experimental results show that the proposed FPGA-based system achieves up to five times higher throughput, 60% lower latency, and 40% lower power consumption than conventional CPU- and GPU-based architectures.
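The latency and throughput benefits of pipeline parallelism follow from standard pipeline arithmetic. The sketch below is our own illustrative model with hypothetical cycle counts, not the authors' system; it contrasts a sequential design, where each item passes through every stage before the next item starts, with a pipelined dataflow design in which stages run concurrently:

```python
# Illustrative pipeline-arithmetic model (hypothetical stage cycle counts).

def sequential_metrics(stage_cycles, n_items):
    """Each item traverses all stages before the next item begins."""
    per_item = sum(stage_cycles)
    return {"latency": per_item, "total_cycles": per_item * n_items}

def pipelined_metrics(stage_cycles, n_items):
    """Stages run concurrently; throughput is limited by the slowest stage."""
    ii = max(stage_cycles)                 # initiation interval
    latency = sum(stage_cycles)            # fill time for one item
    total = latency + (n_items - 1) * ii   # classic pipeline cycle count
    return {"latency": latency, "total_cycles": total}

stages = [4, 2, 3]   # hypothetical per-stage cycle counts
seq = sequential_metrics(stages, 1000)
pipe = pipelined_metrics(stages, 1000)
print(seq["total_cycles"], pipe["total_cycles"])  # 9000 4005
```

The model makes the general point behind such designs: once the pipeline is full, one result emerges per initiation interval rather than per full pass, so total cycles drop from 9000 to 4005 in this toy configuration.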
2011
Field-Programmable Gate Arrays (FPGAs) are becoming increasingly popular as computing platforms for high-performance embedded systems. Their flexibility and customization capabilities allow them to achieve orders-of-magnitude better performance than conventional embedded computing systems. Programming FPGAs is, however, cumbersome and error-prone, and as a result their true potential is often achieved only at unreasonably high design effort.
2006
It has been shown that a small number of FPGAs can significantly accelerate certain computing tasks by up to two or three orders of magnitude. However, particularly intensive large-scale computing applications, such as molecular dynamics simulations of biological systems, underscore the need for even greater speedups to address relevant length and time scales.
Second NASA/ESA Conference on Adaptive Hardware and Systems (AHS 2007), 2007
We describe the FPGA HPC Alliance's Parallel Toolkit (PTK), an initial step towards the standardization of high-level configuration and APIs for high-performance reconfigurable computing (HPRC). We discuss the motivation and challenges of reaping the performance benefits of FPGAs for memory-bound HPC codes and describe the approach we have taken on the FHPCA supercomputer Maxwell.
2013 23rd International Conference on Field programmable Logic and Applications, 2013
Whether for use as the final target or simply a rapid prototyping platform, programming systems containing FPGAs is challenging. Some of the difficulty is due to the difference between the models used to program hardware and software, but great effort is also required to coordinate the simultaneous execution of the application running on the microprocessor with the accelerated kernel(s) running on the FPGA. In this paper we present a new methodology and programming model for introducing hardware-acceleration to an application running in software. The application is represented as a data-flow graph and the computation at each node in the graph is specified for execution either in software or on the FPGA using the programmer's language of choice. We have implemented an interface compiler which takes as its input the FIFO edges of the graph and generates code to connect all the different parts of the program, including those which communicate across the hardware/software boundary. Our methodology and compiler enable programmers to effectively exploit FPGA acceleration without ever leaving the application space.
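As a rough illustration of that programming model, the sketch below uses our own names, not the paper's interface-compiler API. Nodes are plain functions that communicate only through FIFO edges; because a node sees nothing but its FIFOs, the same graph description could bind a node to software or to an FPGA kernel:

```python
# Toy data-flow graph with FIFO edges (illustrative names, not the paper's API).
from collections import deque

class FifoEdge:
    """A FIFO channel; in hardware this would be an on-chip FIFO."""
    def __init__(self):
        self.q = deque()
    def push(self, v):
        self.q.append(v)
    def pop(self):
        return self.q.popleft()
    def empty(self):
        return not self.q

def fire(node_fn, in_edges, out_edge):
    """Fire a node repeatedly while all of its input FIFOs hold data."""
    while all(not e.empty() for e in in_edges):
        out_edge.push(node_fn(*(e.pop() for e in in_edges)))

# Graph: source FIFO -> scale node (could run on the FPGA) -> sink FIFO
src, dst = FifoEdge(), FifoEdge()
for x in range(5):
    src.push(x)
fire(lambda x: 2 * x, [src], dst)
result = []
while not dst.empty():
    result.append(dst.pop())
print(result)  # [0, 2, 4, 6, 8]
```

The point of the FIFO discipline is that the scale node's implementation can be swapped between software and a hardware kernel without touching the rest of the graph, which is the hardware/software boundary the paper's interface compiler generates code for.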
2013
Future computing systems will require dedicated accelerators to achieve high-performance. The mini-symposium ParaFPGA explores parallel computing with FPGAs as an interesting avenue to reduce the gap between the architecture and the application. Topics discussed are the power of functional and dataflow languages, the performance of high-level synthesis tools, the automatic creation of hardware multi-cores using C-slow retiming, dynamic power management to control the energy consumption, real-time reconfiguration of streaming image processing filters and memory optimized event image segmentation.
… , Signal Processing and …, 2009
Algorithms used in signal and image processing applications are computationally intensive. For optimized hardware realization of such algorithms with efficient utilization of available resources, an in-depth knowledge of the targeted field programmable gate array (FPGA) technology is required. This paper presents an overview of the architectures and technologies used in modern FPGAs. A case study of the most popular and widely used state-of-the-art commercial FPGA technologies from Xilinx and Altera is also presented. Three-dimensional (3D) FPGA architecture is discussed as well.
IEEE Design & Test of Computers, 2011
As part of their ongoing work with the National Science Foundation (NSF) Center for High-Performance Reconfigurable Computing (CHREC), the authors are developing a complete tool chain for FPGA-based acceleration of scientific computing, from early-stage assessment of applications down to rapid routing. This article provides an overview of this tool chain.
MASAUM Journal of Computing, 2009
Algorithms used in signal processing, image processing and high-performance computing applications are computationally intensive. For efficient implementation of such algorithms with efficient utilization of available resources, an in-depth knowledge of the targeted field programmable gate array (FPGA) technology is required. This paper presents a state-of-the-art review of the architectures and technologies used in modern FPGAs. A case study of the most popular and widely used state-of-the-art commercial FPGA technologies from Xilinx and Altera is also presented. The upcoming three-dimensional (3D) FPGA architecture is also discussed.
2010
While ASIC design and manufacturing costs are soaring with each new technology node, the computing power and logic capacity of modern FPGAs steadily advance. High-performance computing with FPGA-based systems therefore becomes increasingly attractive and viable. Unfortunately, truly unleashing the computing potential of FPGAs often requires cumbersome HDL programming and laborious manual optimization. To circumvent such challenges, we propose a Many-core Approach to Reconfigurable Computing (MARC) that (i) allows programmers to easily express parallelism through a high-level programming language, (ii) supports coarse-grain multithreading and dataflow-style fine-grain threading while permitting bit-level resource control, and (iii) greatly reduces the effort required to repurpose the hardware system for different algorithms or different applications. Leveraging a many-core architectural template, sophisticated logic synthesis techniques, and state-of-the-art compiler optimization techniques...
Computing Research Repository, 2007
This paper describes JANUS, a modular, massively parallel and reconfigurable FPGA-based computing system. Each JANUS module has a computational core and a host. The computational core is a 4x4 array of FPGA-based processing elements with nearest-neighbor data links. Processors are also directly connected to an I/O node attached to the JANUS host, a conventional PC. JANUS is tailored for, but not limited to, the requirements of a class of hard scientific applications characterized by regular code structure, unconventional data-manipulation instructions and a not-too-large database size.
Parallel Computing, 2007
High-performance computing using accelerators. A recent trend in high-performance computing is the development and use of heterogeneous architectures that combine fine-grain and coarse-grain parallelism using tens or hundreds of disparate processing cores. These processing cores are available as accelerators or many-core processors, which are designed with the goal of achieving higher parallel-code performance. This is in contrast with traditional multicore CPUs that effectively replicate serial CPU cores. The recent demand for these accelerators comes primarily from consumer applications, including computer gaming and multimedia. Examples of such accelerators include graphics processing units (GPUs), Cell Broadband Engines (Cell BEs), field-programmable gate arrays (FPGAs), and other data-parallel or streaming processors. Compared to conventional CPUs, accelerators can offer an order-of-magnitude improvement in performance per dollar as well as per watt. Moreover, some recent industry announcements point towards the design of heterogeneous processors and computing environments, which are scalable from a system with a single homogeneous processor to a high-end computing platform with tens, or even hundreds, of thousands of heterogeneous processors. This special issue on "High-Performance Computing Using Accelerators" includes many papers on such commodity many-core processors, including GPUs, Cell BEs, and FPGAs. GPGPUs: Current top-of-the-line GPUs have tens or hundreds of fragment processors and high memory bandwidth, i.e., 10x more than current CPUs. This processing power of GPUs has been successfully exploited for scientific, database, geometric and imaging applications (hence GPGPU, short for General-Purpose computation on GPUs). The significant increase in parallelism within a processor can also lead to other benefits, including higher power efficiency and better memory-latency tolerance.
In many cases, an order-of-magnitude performance improvement was shown compared to top-of-the-line CPUs. For example, GPUTeraSort used the GPU interface to drive memory more efficiently and achieved a threefold improvement in records/second/CPU. Similarly, some of the fastest algorithms for many numerical computations, including FFT, dense matrix multiplication, linear solvers, and collision and proximity computations, use GPUs to achieve tremendous speed-ups. Cell Broadband Engines: The Cell Broadband Engine is a joint venture between Sony, Toshiba, and IBM. It appears in consumer products such as Sony's PlayStation 3 computer entertainment system and Toshiba's Cell Reference Set, a development tool for Cell Broadband Engine applications. When viewed as a processor, the Cell can exploit the orthogonal dimensions of task and data parallelism on a single chip. The Cell processor consists of a symmetric multi-threaded (SMT) Power Processing Element (PPE) and eight Synergistic Processing Elements (SPEs) with pipelined SIMD capabilities. The processor achieves a theoretical peak performance of over 200 Gflops for single-precision floating-point calculations and has a peak memory bandwidth of over 25 GB/s. Actual speed-up factors achieved when automatically parallelizing sequential code kernels via the Cell's pipelined SIMD capabilities reach as high as 26-fold. Field-Programmable Gate Arrays (FPGAs): FPGAs support the notion of reconfigurable computing and offer a high degree of on-chip parallelism that can be mapped directly from the dataflow characteristics of an application's parallel algorithm. Their recent emergence in the high-performance computing arena can be attributed to a hybrid approach that combines the logic blocks and interconnects of traditional FPGAs with
Journal of Signal Processing Systems, 2017
Current tools for High-Level Synthesis (HLS) excel at exploiting Instruction-Level Parallelism (ILP). The support for Data-Level Parallelism (DLP), one of the key advantages of Field Programmable Gate Arrays (FPGAs), is in contrast very limited. This work examines the exploitation of DLP on FPGAs using code generation for C-based HLS of image filters and streaming pipelines. In addition to well-known loop tiling techniques, we propose loop coarsening, which delivers superior performance and scalability. Loop tiling corresponds to splitting an image into separate regions, which are then processed in parallel by replicated accelerators. For data streaming, this also requires the generation of glue logic for the distribution of image data. Conversely, loop coarsening allows processing multiple pixels in parallel, whereby only the kernel operator is replicated within a single accelerator. We present concrete implementations of tiling and coarsening for Vivado HLS and Altera OpenCL. Furthermore, we present a comparison of our implementations to the keyword-driven parallelization support provided by the Altera Offline Compiler. We augment the FPGA back end of the heterogeneous Domain-Specific Language (DSL) framework Hipacc to generate loop coarsening implementations for Vivado HLS and Altera OpenCL. Moreover, we compare the resulting FPGA accelerators to highly optimized software implementations for Graphics Processing Units (GPUs), all generated from exactly the same code base.
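The distinction between the two schemes can be sketched in plain Python. This is our own toy code using a pointwise kernel, not the paper's generated Vivado HLS or Altera OpenCL output: tiling replicates the whole accelerator across image regions, while coarsening widens one accelerator so the kernel operator is applied to several pixels per loop iteration:

```python
# Toy comparison of loop tiling vs. loop coarsening (illustrative only).

def kernel(px):
    """Stand-in for the replicated kernel operator (a pointwise square)."""
    return px * px

def tiled(image, n_tiles):
    """Each tile would map to its own replicated accelerator instance.
    Assumes the image divides evenly into tiles."""
    tile = len(image) // n_tiles
    out = []
    for t in range(n_tiles):                # tiles run in parallel on hardware
        out.extend(kernel(p) for p in image[t * tile:(t + 1) * tile])
    return out

def coarsened(image, factor):
    """One accelerator; `factor` kernel operators fire per iteration."""
    out = []
    for i in range(0, len(image), factor):  # wider datapath per iteration
        out.extend(kernel(p) for p in image[i:i + factor])
    return out

img = list(range(8))
print(tiled(img, 4) == coarsened(img, 2) == [p * p for p in img])  # True
```

For a pointwise kernel both schemes compute the same result; the paper's observation is about hardware cost: tiling duplicates the entire accelerator (including the glue logic that distributes streamed image data), whereas coarsening duplicates only the kernel operator inside a single accelerator.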
IEEE Transactions on Computers, 2021
This paper presents the new features of the OmpSs@FPGA framework. OmpSs is a data-flow programming model that supports task nesting and dependencies to target asynchronous parallelism and heterogeneity. OmpSs@FPGA is the extension of the programming model addressed specifically to FPGAs. The OmpSs environment is built on top of the Mercurium source-to-source compiler and the Nanos++ runtime system. To address FPGA specifics, the Mercurium compiler implements several FPGA-related features, such as local variable caching, wide memory accesses and accelerator replication. In addition, part of the Nanos++ runtime has been ported to hardware. Driven by the compiler, this new hardware runtime adds new features to FPGA codes, such as task creation and dependence management, providing both performance increases and ease of programming. To demonstrate these new capabilities, different high-performance benchmarks have been evaluated on different FPGA platforms using the OmpSs programming model. The results show that programs using the OmpSs programming model achieve very competitive performance with low to moderate porting effort compared to other FPGA implementations.
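OmpSs itself expresses tasks through compiler pragmas; as a language-neutral illustration of the dependence management the hardware runtime performs, the toy scheduler below (our own model, not OmpSs syntax) releases a task only once every datum it reads has been produced:

```python
# Toy dependence-driven task scheduler (not OmpSs syntax, which uses pragmas).

def schedule(tasks, initially_ready):
    """Run tasks in an order that satisfies their in/out data dependences."""
    produced = set(initially_ready)
    order, pending = [], list(tasks)
    while pending:
        for t in pending:
            if all(d in produced for d in t["in"]):  # all inputs available?
                order.append(t["name"])
                produced.update(t["out"])            # publish outputs
                pending.remove(t)
                break
        else:
            raise RuntimeError("deadlock: unsatisfiable dependences")
    return order

tasks = [
    {"name": "gemm",   "in": ["A", "B"], "out": ["C"]},  # must wait for loads
    {"name": "load_A", "in": [],         "out": ["A"]},
    {"name": "load_B", "in": [],         "out": ["B"]},
]
print(schedule(tasks, []))  # ['load_A', 'load_B', 'gemm']
```

This is the essence of what the paper describes moving into hardware: tracking which data each FPGA task reads and writes, and releasing tasks as their dependences are satisfied, without a round-trip to the host runtime.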
International Journal of …, 2010
Field-Programmable Gate Arrays (FPGAs) are becoming increasingly important in embedded and high-performance computing systems. They allow performance levels close to those obtained with Application-Specific Integrated Circuits (ASICs), while still keeping design and implementation flexibility. However, to efficiently program FPGAs, one needs the expertise of hardware developers to master hardware description languages (HDLs) such as VHDL or Verilog. Attempts to provide a high-level compilation flow (e.g., from C programs) must still address open issues before broadly efficient results can be obtained. Bearing in mind the hardware resources available in contemporary FPGAs, we developed LALP (Language for Aggressive Loop Pipelining), a novel language for programming FPGA-based accelerators, and its compilation framework. The main ideas behind LALP are to provide a higher abstraction level than HDLs, to exploit the intrinsic parallelism of hardware resources, and to allow the programmer to control execution stages whenever compiler techniques are unable to generate efficient implementations. Those features are particularly useful for implementing loop pipelining, a well-regarded technique used to accelerate computations in several application domains. This paper describes LALP and shows how it can be used to achieve high-performance embedded computing solutions.
Computing in Science & Engineering, 2000
This paper describes JANUS, a modular, massively parallel and reconfigurable FPGA-based computing system. Each JANUS module has a computational core and a host. The computational core is a 4x4 array of FPGA-based processing elements with nearest-neighbor data links. Processors are also directly connected to an I/O node attached to the JANUS host, a conventional PC. JANUS is tailored for, but not limited to, the requirements of a class of hard scientific applications characterized by regular code structure, unconventional data-manipulation instructions and a not-too-large database size. We discuss the architecture of this configurable machine and focus on its use for Monte Carlo simulations of statistical mechanics. On this class of application JANUS achieves impressive performance: in some cases one JANUS processing element outperforms a high-end PC by a factor of 1000. We also discuss the role of JANUS in other classes of scientific applications.