
Before the accelerator runs, a command from the CPU triggers the transfer of input data from DRAM into BRAM. The first row of A is then buffered into a register. Each row of B is fetched in turn and presented to a full row’s contingent of multiply-accumulators (MACs), along with the corresponding element of A. Once the last row of B has been consumed, the computation of Q’s first row is complete: it is stored, the MACs are reset, and the process repeats for the next row of A. Once a multiplication is completed in its entirety, its results are transferred from the output BRAM back to DRAM.
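The row-by-row dataflow of the accelerator can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware design itself: the function name `matmul_row_dataflow` and the variables `a_reg` and `macs` are hypothetical, matrices are plain Python lists, the DRAM-to-BRAM transfers are not modelled, and the MAC updates that would happen in parallel in hardware are written as a sequential loop.

```python
def matmul_row_dataflow(A, B):
    """Compute Q = A x B, mimicking the accelerator's row-by-row dataflow.

    Assumes A and B are non-empty lists of equal-length rows.
    """
    n_inner = len(B)       # number of rows of B (= columns of A)
    n_cols = len(B[0])     # number of columns of B (= MACs in the row)
    Q = []
    for a_row in A:
        if len(a_row) != n_inner:
            raise ValueError("columns of A must match rows of B")
        a_reg = list(a_row)        # buffer one row of A into a register
        macs = [0] * n_cols        # reset the MACs, one per column of Q
        for k, b_row in enumerate(B):
            a_elem = a_reg[k]      # element of A matching this row of B
            # In hardware every MAC updates at once; here we loop over them.
            for j in range(n_cols):
                macs[j] += a_elem * b_row[j]
        Q.append(macs)             # last row of B consumed: store row of Q
    return Q


if __name__ == "__main__":
    print(matmul_row_dataflow([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Each pass of the outer loop corresponds to one buffered row of A and produces one complete row of Q, which is why the accumulators are reset only between rows.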

Figure 1

C. Error Detection