Efficient Hardware Looping Units for FPGAs

2010

Abstract

Looping operations are a significant bottleneck to computational efficiency in embedded applications. To address this problem, whether the computation is implemented on programmable processors or on FSMD (Finite-State Machine with Datapath) architectures, the use of customized loop controllers has been suggested. This paper presents a thorough examination of zero-cycle overhead loop controllers applicable to perfect loop nests operating on multi-dimensional data. The design of such loop controllers is formalized by introducing a hardware algorithm that fully automates this task for the spectrum of behavioral as well as generated register-transfer level architectures. The presented algorithm would prove beneficial in the high-level synthesis of architectures for data-intensive processing. It is also shown that the proposed loop controllers can be used efficiently to support generalized loop structures such as imperfect loop nests. The performance characteristics (cycle time, chip area) of the proposed architectures have been evaluated for FPGA target implementations. Maximum clock frequencies above 230 MHz, with a low logic footprint of about 1.4% of the overall logic resources, can be achieved when supporting up to 8 nested loops with 16-bit indices on a modestly sized Xilinx Virtex-5 device.

Key takeaways

  • Potential uses and extensions of the HWLU design are also discussed for supporting irregular loop structures such as imperfect loop nests.
  • While ZOLC can control a complex loop structure with an arbitrary number and combination of loops using a single processing unit (one adder, one comparator, etc.), HWLU requires this hardware to be replicated for each loop.
  • The hardware looping architecture (HWLU) can naturally incorporate any number of loop nesting levels in hardware, eliminating the branch instruction overhead of loop increments.
  • To assess the performance of the HWLU, IXGEN-B, and IXGEN-R hardware looping units for perfect loop nests, they are evaluated over the entire parameter set, with N_LP ranging from 1 to 8 and DW taking the values 8, 12, and 16.
  • All three variants of the hardware looping architecture (HWLU, IXGEN-B, and IXGEN-R) have been designed in VHDL and synthesized for the XC5VLX50 device.
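
To make the idea of a per-loop counter chain concrete, the following is a minimal behavioral sketch in Python of an index generator for a perfect loop nest. It is not the paper's VHDL design or its hardware algorithm; the names `hwlu_step`, `run_nest`, and `matches_software_loops` are hypothetical, and the model assumes every loop starts at 0, steps by 1, and has a bound of at least 1. Each call to `hwlu_step` stands for one clock cycle and yields a complete index vector, which is the sense in which no extra cycles are spent on loop control. The per-level compare and increment mirror the "one adder, one comparator" replicated for each loop.

```python
from itertools import product


def hwlu_step(indices, bounds):
    """Advance the index vector by one modeled clock cycle.

    indices[0] is the outermost loop and indices[-1] the innermost.
    Assumes every bound is >= 1 (illustrative restriction).
    Returns (next_indices, done).
    """
    nxt = list(indices)
    # Ripple from the innermost loop outward, like a chain of counters.
    for level in reversed(range(len(bounds))):
        if nxt[level] + 1 < bounds[level]:  # per-loop comparator
            nxt[level] += 1                 # per-loop incrementer
            return nxt, False
        nxt[level] = 0                      # wrap and carry to the next outer loop
    return nxt, True                        # every loop wrapped: the nest is finished


def run_nest(bounds):
    """Collect the index vector produced on each modeled cycle."""
    indices = [0] * len(bounds)
    trace = []
    done = False
    while not done:
        trace.append(tuple(indices))
        indices, done = hwlu_step(indices, bounds)
    return trace


def matches_software_loops(bounds):
    """Check the model against ordinary nested for-loops."""
    expected = list(product(*(range(b) for b in bounds)))
    return run_nest(bounds) == expected
```

For example, `run_nest([2, 3])` visits the same index pairs, in the same order, as a two-level nested `for` loop with bounds 2 and 3. A real implementation would evaluate all levels in parallel within one cycle rather than in a sequential Python loop.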