Fig. 1: General FPGA architecture
Fig. 2: The XSV-300 FPGA board
For the particular implementation reported in this paper, we used a prototyping board called the XSV-300 FPGA Board, developed by XESS [9]. The board, shown in Fig. 2, employs a Xilinx XCV300 FPGA with 300,000 gates [10]. It can accept video with up to 9 bits of resolution and output video images through a 110 MHz, 24-bit RAMDAC.

The fundamental multirate filter banks are the analysis filter bank and the synthesis filter bank [11]. The analysis filter bank, shown in Fig. 3a, consists of two decimators connected in parallel: the upper decimator is a low-pass filter, $H_0(z)$, followed by a down-sampler, and the lower decimator is a high-pass filter, $H_1(z)$, followed by a down-sampler. Each down-sampler takes a filtered sequence $x[n]$ and generates an output sequence $y[n]$ according to the relation $y[n] = x[2n]$; all filtered samples in the odd-indexed subsequence $x[2n+1]$ are discarded. The synthesis filter bank, shown in Fig. 3b, consists of two interpolators connected in parallel: the upper is a low-pass filter, $G_0(z)$, preceded by an up-sampler, and the lower is a high-pass filter, $G_1(z)$, preceded by an up-sampler [12]. Each up-sampler inserts an equidistant zero-valued sample between every two successive input samples, so that its output is $y[2n] = x[n]$ and $y[2n+1] = 0$.

Fig. 4: Direct FIR filter structure
Fig. 6: Partitioned serial distributed arithmetic implementation of the FIR filter

Since the LUT size in a distributed arithmetic implementation increases exponentially with the number of coefficients, the LUT access time can become a bottleneck for the speed of the whole system when the LUT is large. Hence, we decomposed the 8-bit LUT ($2^8 = 256$ entries) into two 4-bit LUTs ($2^4 = 16$ entries each) and added their outputs using a two-input accumulator. The partitioned-LUT FIR filter architecture is shown in Fig. 6. The total storage is now reduced, since the accumulator is less costly than the larger 8-bit LUT. Furthermore, partitioning the larger LUT into two smaller LUTs that are accessed in parallel reduces the access time.
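The partitioned-LUT scheme can be checked in software before it is mapped to hardware. The following is a minimal bit-serial distributed arithmetic model, assuming integer (fixed-point) coefficients, 8-bit two's-complement input samples, and an 8-tap filter whose 256-entry LUT is split into two 16-entry LUTs. The coefficient values and the function names (`build_lut`, `da_dot`, `da_fir`, `direct_fir`) are hypothetical and are not taken from the paper's design. Each iteration of the bit loop corresponds to one clock cycle of a serial DA unit, and the result is compared against a direct-form FIR filter.

```python
from typing import List, Sequence


def build_lut(coeffs: Sequence[int]) -> List[int]:
    """Precompute LUT[addr] = sum of coeffs[k] over every bit k that is set in addr."""
    return [
        sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
        for addr in range(1 << len(coeffs))
    ]


def da_dot(window: Sequence[int], luts: List[List[int]], group: int, bits: int) -> int:
    """Bit-serial DA inner product of the tap window with the coefficients.

    window[k] holds x[n-k] as a `bits`-wide two's-complement integer;
    luts[g] covers taps g*group .. g*group + group - 1.
    """
    acc = 0
    for j in range(bits):  # one bit position per iteration (one clock cycle in hardware)
        partial = 0
        for g, lut in enumerate(luts):  # the partitioned LUTs are read in parallel
            addr = 0
            for i in range(group):
                addr |= ((window[g * group + i] >> j) & 1) << i
            partial += lut[addr]  # adder that combines the partial LUT outputs
        # In two's complement the sign bit (MSB) carries a negative weight.
        weight = -(1 << j) if j == bits - 1 else (1 << j)
        acc += weight * partial  # shift-and-accumulate
    return acc


def da_fir(x: Sequence[int], coeffs: Sequence[int], bits: int = 8, group: int = 4) -> List[int]:
    """FIR filter y[n] = sum_k coeffs[k] * x[n-k] computed with partitioned-LUT DA."""
    if len(coeffs) % group != 0:
        raise ValueError("number of taps must be a multiple of the LUT group size")
    luts = [build_lut(coeffs[s:s + group]) for s in range(0, len(coeffs), group)]
    delay = [0] * len(coeffs)  # tap delay line: delay[k] == x[n-k]
    out = []
    for sample in x:
        delay = [sample] + delay[:-1]
        out.append(da_dot(delay, luts, group, bits))
    return out


def direct_fir(x: Sequence[int], coeffs: Sequence[int]) -> List[int]:
    """Reference direct-form FIR (cf. Fig. 4) used to check the DA model."""
    return [
        sum(c * x[n - k] for k, c in enumerate(coeffs) if n - k >= 0)
        for n in range(len(x))
    ]


# Hypothetical 8-tap integer coefficients and 8-bit input samples.
coeffs = [3, -1, 4, 1, -5, 9, 2, -6]
x = [17, -128, 127, 0, -3, 64, -77, 5, 12, -1]
print(da_fir(x, coeffs) == direct_fir(x, coeffs))  # True
print(len(build_lut(coeffs)), 2 * len(build_lut(coeffs[:4])))  # 256 32
```

Setting `group=8` models the single 256-entry LUT. The filter outputs are identical; only the storage differs (256 entries versus 2 × 16 entries plus one adder), which is the trade-off described above.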
In addition, the throughput of the partitioned-LUT filter is maintained regardless of the length of the FIR filter.

Parallel distributed arithmetic FIR filter: As with most hardware applications, we can obtain more performance by using more hardware. In this case, more than one bit sum can be computed at a time by …

Fig. 7: Two-bit parallel distributed arithmetic implementation of the FIR filter
Fig. 8: Single-bit parallel distributed arithmetic implementation of the FIR filter
Fig. 9: Eight-bit parallel distributed arithmetic implementation of the FIR filter
Fig. 10: FPGA-based implementation of the analysis filter bank
Fig. 11: Simplified functional Verilog simulation of the analysis filter

… implementations. It is also worthwhile to compare the performance of the analysis and synthesis filter banks. Referring back to Tables 1 and 2, it is clear that the throughput of the synthesis filter bank is lower than that of the analysis filter bank. This is because the synthesis filter bank is somewhat more computation-intensive than the analysis filter bank, owing to the up-sampling operation, which inserts a zero sample between every two successive input samples. Moreover, the synthesis filter bank architecture …

Table 1: Performance of different implementations of the analysis filter bank
Table 2: Performance of different implementations of the synthesis filter bank
Fig. 12: FPGA-based implementation of the synthesis filter bank
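To make the behavior of the down-samplers and zero-inserting up-samplers concrete, here is a minimal software sketch of the two-channel analysis and synthesis filter banks of Figs. 3a and 3b. It assumes simple Haar-type filters, which are hypothetical and chosen only because they make the arithmetic easy to follow; they are not the filters used in the paper. The synthesis filters are picked as $G_0(z) = 2H_1(-z)$ and $G_1(z) = -2H_0(-z)$, so aliasing cancels and the output equals the input delayed by one sample. Note that each synthesis filter runs on an up-sampled sequence twice as long as its subband input, half of whose samples are inserted zeros.

```python
from typing import List, Sequence, Tuple


def fir(x: Sequence[float], h: Sequence[float]) -> List[float]:
    """Causal FIR filter, y[n] = sum_k h[k] * x[n-k], truncated to len(x) samples."""
    return [
        sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
        for n in range(len(x))
    ]


def downsample(x: Sequence[float]) -> List[float]:
    """Factor-2 down-sampler: y[n] = x[2n]; the samples x[2n+1] are discarded."""
    return list(x[::2])


def upsample(x: Sequence[float]) -> List[float]:
    """Factor-2 up-sampler: y[2n] = x[n] and y[2n+1] = 0 (one zero between samples)."""
    y: List[float] = []
    for v in x:
        y.extend([v, 0.0])
    return y


# Hypothetical Haar-type filters, chosen only because they give exact reconstruction.
H0 = [0.5, 0.5]    # analysis low pass   H0(z) = (1 + z^-1) / 2
H1 = [0.5, -0.5]   # analysis high pass  H1(z) = (1 - z^-1) / 2
G0 = [1.0, 1.0]    # synthesis low pass  G0(z) =  2 H1(-z)
G1 = [-1.0, 1.0]   # synthesis high pass G1(z) = -2 H0(-z)


def analysis(x: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Two decimators in parallel: filter, then keep every second sample."""
    return downsample(fir(x, H0)), downsample(fir(x, H1))


def synthesis(low: Sequence[float], high: Sequence[float]) -> List[float]:
    """Two interpolators in parallel: up-sample, filter, then add the branches."""
    return [a + b for a, b in zip(fir(upsample(low), G0), fir(upsample(high), G1))]


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
low, high = analysis(x)
print(synthesis(low, high))  # [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
```

The one-sample delay comes from the product $H_0(z)G_0(z) + H_1(z)G_1(z) = 2z^{-1}$, which, after the factor $\tfrac{1}{2}$ introduced by down-sampling and up-sampling, leaves $Y(z) = z^{-1}X(z)$.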