Figure from: Datapath fault tolerance for parallel accelerators


Figure 1

Before the accelerator runs, a command from the CPU triggers the transfer of input data from DRAM into BRAM. Following this, the first row of A is buffered into a register. Each row of B is then fetched in turn and presented to a full row's contingent of multiply-accumulators (MACs) along with the corresponding element of A. Once the last row of B has been consumed, the computation of Q's first row is complete: it is stored, the MACs are reset and the process repeats. Once a multiplication is completed in its entirety, its results are transferred from the output BRAM back to DRAM.
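The dataflow described above can be modelled in software. The following is a minimal Python sketch, not the authors' hardware design: the function name `accelerator_matmul` and the list-of-lists matrix representation are assumptions made for illustration, and the inner loop runs sequentially where the hardware MACs would operate in parallel. DRAM/BRAM transfers are not modelled.

```python
def accelerator_matmul(A, B):
    """Software model of the row-wise dataflow: Q = A x B.

    Hypothetical helper for illustration only; matrices are lists of rows.
    """
    if not A or not B or len(A[0]) != len(B):
        raise ValueError("inner dimensions of A and B must match")

    n_cols = len(B[0])
    Q = []
    for a_row in A:                      # one row of A is buffered at a time
        macs = [0] * n_cols              # MACs are reset before each output row
        for k, b_row in enumerate(B):    # each row of B is fetched in turn
            a_elem = a_row[k]            # the corresponding element of A
            for j in range(n_cols):      # in hardware, these MACs run in parallel
                macs[j] += a_elem * b_row[j]
        Q.append(macs)                   # last row of B consumed: store the row of Q
    return Q


if __name__ == "__main__":
    print(accelerator_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Each output row needs only one buffered row of A and one accumulator per output column, which is why a single row of MACs suffices in this model.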

Related Figures (11)

Fig. 2. Checksum generation logic

Fig. 3. Checksum validation logic

TABLE I. BASELINE ACCELERATOR RESOURCE USAGE & Fmax

TABLE IV. ERROR-DETECTING ACCELERATOR PERFORMANCE

TABLE VI. ERROR-DETECTING AND FAULT-AVOIDING ACCELERATOR PERFORMANCE UNDER FAULT-FREE OPERATION

TABLE VII. ERROR-DETECTING AND FAULT-AVOIDING ACCELERATOR PERFORMANCE IN THE PRESENCE OF A SINGLE FAULTY MAC

Fig. 4. Resource reallocation steps for N = 2 accelerator with single faulty MAC

[...] acquired for the accelerator with error detection logic only. For N = 382, performance falls to 0.842 times that of the baseline accelerator, representing an 18.8% increase in computation time. Performance drops off with N when fault avoidance is required. Fig. 6 summarises speedups for all designs over their software equivalents.
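The two figures quoted for the error-detecting accelerator are consistent if 0.842 is read as a relative speed (baseline time divided by error-detecting time). That reading is an assumption, and the symbols $T_{\text{base}}$ and $T_{\text{ed}}$ are introduced here only for illustration:

```latex
\frac{T_{\text{base}}}{T_{\text{ed}}} = 0.842
\quad\Longrightarrow\quad
\frac{T_{\text{ed}}}{T_{\text{base}}} = \frac{1}{0.842} \approx 1.188
```

Under that reading, computation time is roughly 18.8% longer than the baseline.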

Fig. 5. Total resource usage impacts for varying N

Fig. 6. Performance impacts for varying N
