2013 International Conference on Field-Programmable Technology (FPT), 2013
While we reap the benefits of process scaling in terms of transistor density and switching speed, consideration must be given to the negative effects it causes: increased variation, degradation and fault susceptibility. Above device level, such phenomena and the faults they induce can lead to reduced yield, decreased system reliability and, in extreme cases, total failure after a period of successful operation. Although error detection and correction are almost always considered for highly sensitive and susceptible applications such as those in space, for other, more general-purpose applications they are often overlooked. In this paper, we present a parallel matrix multiplication accelerator running in hardware on the Xilinx Zynq system-on-chip platform, along with 'bolt-on' logic for detecting, locating and avoiding faults within its datapath. Designs of various sizes are compared with respect to resource overhead and performance impact. Our largest implemented fault-tolerant accelerator was found to consume 17.3% more area, run at a 3.95% lower frequency and incur an 18.8% execution time penalty over its equivalent fault-susceptible design during fault-free operation.
2016
As the threat of fault susceptibility caused by mechanisms including variation and degradation increases, engineers must give growing consideration to error detection and correction. While common fault tolerance strategies frequently incur significant overheads in area, performance and/or power consumption, options exist that buck these trends. In particular, algorithm-based fault tolerance (ABFT) embodies a proven family of low-overhead error mitigation techniques that can be built upon to create self-verifying circuitry. In this paper, we present our research into the application of ABFT in FPGA-implemented accelerators at reduced levels of precision. This allows for the introduction of a previously unexplored tradeoff: sacrificing the observability of faults associated with low-magnitude errors for gains in area, performance and efficiency by reducing the bit-widths of logic used for error detection. We describe the implementat...
2011 IEEE Ninth International Symposium on Parallel and Distributed Processing with Applications, 2011
Commercial graphics processing units (GPUs) have proven to be attractive, inexpensive options for high-performance scientific applications. However, recent research [1] conducted through Folding@home demonstrates that two-thirds of the GPUs tested on Folding@home exhibit a detectable, pattern-sensitive rate of memory soft errors under GPGPU workloads. Fault tolerance is therefore viewed as critical to the effective use of these GPUs. In this paper, we present an on-line GPU error detection, location, and correction method to incorporate fault tolerance into matrix multiplication. The main contribution of the paper is to extend traditional algorithm-based fault tolerance (ABFT) from offline to online operation and apply it to matrix multiplication on GPUs. The proposed on-line fault tolerance mechanism detects soft errors in the middle of the computation so that better reliability can be achieved by correcting corrupted computations in time. Experimental results demonstrate that the proposed method is highly efficient.
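To make the checksum relationship behind ABFT matrix multiplication concrete, the sketch below (Python/NumPy; the matrix sizes and the artificially injected error are illustrative and not taken from the paper) shows the classical Huang-Abraham encoding that online schemes of this kind build on: a column-checksum copy of A and a row-checksum copy of B are multiplied, and a single corrupted result element is detected, located at the intersection of the failing row and column checks, and corrected from the checksum difference.

```python
import numpy as np

def abft_matmul(A, B):
    """Multiply checksum-augmented matrices (classical Huang-Abraham ABFT encoding)."""
    Ac = np.vstack([A, A.sum(axis=0)])                # column-checksum A: extra row of column sums
    Br = np.hstack([B, B.sum(axis=1, keepdims=True)]) # row-checksum B: extra column of row sums
    return Ac @ Br                                    # full checksum matrix, shape (n+1) x (n+1)

def detect_locate_correct(Cf):
    """Verify the checksum relations; correct a single corrupted element in place."""
    data = Cf[:-1, :-1]
    row_err = np.flatnonzero(~np.isclose(data.sum(axis=1), Cf[:-1, -1]))
    col_err = np.flatnonzero(~np.isclose(data.sum(axis=0), Cf[-1, :-1]))
    if row_err.size == 0 and col_err.size == 0:
        return data, None                             # no error detected
    if row_err.size == 1 and col_err.size == 1:
        i, j = row_err[0], col_err[0]                 # intersection locates the faulty element
        data[i, j] += Cf[i, -1] - data[i, :].sum()    # correct via the checksum difference
        return data, (i, j)
    raise RuntimeError("multiple errors detected: re-run the affected computation")

# Illustrative run with one injected soft error.
rng = np.random.default_rng(0)
A, B = rng.random((4, 4)), rng.random((4, 4))
Cf = abft_matmul(A, B)
Cf[2, 1] += 0.5                                       # simulate a corrupted result element
C, loc = detect_locate_correct(Cf)
assert np.allclose(C, A @ B) and loc == (2, 1)
```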
22nd International Conference on Field Programmable Logic and Applications (FPL), 2012
Commercial SRAM-based, field-programmable gate arrays (FPGAs) have the capability to provide space applications with the necessary performance, energy-efficiency, and adaptability to meet next-generation mission requirements. However, mitigating an FPGA's susceptibility to radiation-induced faults is challenging. Triple-modular redundancy (TMR) techniques are traditionally used to mitigate radiation effects, but TMR incurs substantial overheads such as increased area and power requirements. In order to reduce these overheads while still providing sufficient radiation mitigation, we propose the use of algorithm-based fault tolerance (ABFT). We investigate the effectiveness of hardware-based ABFT logic in COTS FPGAs by developing multiple ABFT-enabled matrix multiplication designs, carefully analyzing resource usage and reliability tradeoffs, and proposing design modifications for higher reliability. We perform fault-injection testing on a Xilinx Virtex-5 platform to validate these ABFT designs, measure design vulnerability, and compare ABFT effectiveness to other fault-tolerance methods. Our hybrid ABFT design reduces total design vulnerability by 99% while only incurring 25% overhead over a baseline, non-protected design.
1991
Reliability of a computer architecture may be increased through fault tolerance. However, fault tolerance is achieved at the price of decreased throughput. The Fault Tolerant Parallel Processor at the Charles Stark Draper Laboratory maintains high levels of reliability and throughput by combining the technologies of fault tolerance and parallel processing. The architecture is based on a Network Element (NE), which performs the functions of fault tolerance and parallel processing. A design for two field-programmable gate arrays (FPGAs) is proposed herein which will replace much of the NE and perform the communication, synchronization, and redundancy management functions within the NE. This will yield increased reliability, reduced size, and reduced power dissipation. These FPGAs will be integrated with the next implementation of the Fault Tolerant Parallel Processor. (Thesis supervised by Thomas F. Knight, Associate Professor of Electrical Engineering.)
2018 30th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), 2018
Machine Learning (ML) is making a strong resurgence in step with the massive generation of unstructured data, which in turn requires massive computational resources. Due to the inherently compute- and power-intensive structure of Neural Networks (NNs), hardware accelerators emerge as a promising solution. However, with technology nodes scaling below 10 nm, hardware accelerators become more susceptible to faults, which in turn can impact NN accuracy. In this paper, we study the resilience aspects of Register-Transfer Level (RTL) models of NN accelerators, in particular fault characterization and mitigation. Following a High Level Synthesis (HLS) approach, we first characterize the vulnerability of various components of the RTL NN. We observed that the severity of faults depends on both i) application-level specifications, i.e., NN data (inputs, weights, or intermediate values), NN layers, and NN activation functions, and ii) architectural-level specifications, i.e., the data representation model and the parallelism degree of the underlying accelerator. Second, motivated by the characterization results, we present a low-overhead fault mitigation technique that can efficiently correct bit flips, performing 47.3% better than state-of-the-art methods.
Defect and Fault Tolerance in VLSI Systems, 1989
This paper addresses the important fault-tolerance issue for arrays with a large number of processors. An array grid model based on single-track switches is adopted. Single-track switches require less hardware overhead and suffer less from possible switch faults. More significantly, we are able to establish a very useful necessary and sufficient condition for the reconfigurability of the array. This is the theoretical footing for two reconfiguration algorithms: one adopts global control for (fabrication-time) yield enhancement and the other is a distributed scheme for (run-time) reliability improvement. For the fabrication-time reconfiguration algorithm, the task can be reformulated as a maximum independent set problem. An existing algorithm from graph theory is adopted to solve this problem effectively. The simulations conducted indicate that the algorithm is computationally very efficient; therefore, it is also very suitable for compile-time fault tolerance. In contrast, for the real-time reconfiguration algorithm, a distributed method is more suitable for (asynchronous) array processors. The algorithm has several important features: (1) it is executed in a distributed fashion by the processing elements (PEs); (2) no global information is required by the individual PEs; (3) the time overhead for reconfiguration is independent of the array size; (4) transient faults are handled by retries or by deactivating/reactivating the temporarily failed PE. Based on simulations, the performance of the algorithms and the tradeoffs between fault-tolerance capability and hardware complexity for various kinds of spare-PE distributions are evaluated.
Applied Sciences
Deep learning technology has enabled the development of increasingly complex safety-related autonomous systems using high-performance computers, such as GPU, which provide the required high computing performance for the execution of parallel computing algorithms, such as matrix–matrix multiplications (a central computing element of deep learning software libraries). However, the safety certification of parallel computing software algorithms and GPU-based safety-related systems is a challenge to be addressed. For example, achieving the required fault-tolerance and diagnostic coverage for random hardware errors. This paper contributes with a safe matrix–matrix multiplication software implementation for GPUs with random hardware error-detection capabilities (permanent, transient) that can be used with different architectural patterns for fault-tolerance, and which serves as a foundation for the implementation of safe deep learning libraries for GPUs. The proposed contribution is comple...
Microprocessors and Microsystems, 2014
International Journal of Modern Education and Computer Science, 2017
This paper presents a new fault-tolerant architecture for floating-point multipliers in which the fault-tolerance capability is achieved at the cost of reduced output precision. In this approach, to obtain the fault-tolerant floating-point multiplier, the hardware cost of the primary design is reduced by lowering the output precision. Appropriate redundancy is then utilized to provide error detection/correction in such a way that the overall required hardware remains almost the same as that of the primary multiplier. The proposed multiplier can tolerate a variety of permanent and transient faults given the reduced precisions acceptable in many applications. The implementation results reveal that 17-bit and 14-bit mantissas are enough to obtain a floating-point multiplier with error detection or error correction, respectively, instead of the 23-bit mantissa in the IEEE-754 standard-based multiplier, with only a few percent area and power overheads.
VLSI Design, 1996
In order to reduce cost and achieve high speed, a new hardware accelerator for fault simulation has been designed. The architecture of the new accelerator is based on a reconfigurable mesh-type processing element (PE) array. Circuit elements at the same topological level are simulated concurrently, as in a pipelined process. A new parallel simulation algorithm expands all gates into two-input gates in order to limit the number of faults to two at each gate, so that the faults can be distributed uniformly throughout the PE array. The PE array reconfiguration operation provides a simulation speed advantage by maximizing the use of each PE cell.
IRJET, 2020
Parallel matrix processing is a common operation in many systems, and in particular matrix-vector multiplication (MVM) is one of the most widely used operations in modern digital signal processing and digital communication systems. This paper proposes a fault-tolerant scheme for integer parallel MVMs. The scheme combines ideas from error correction codes with the self-checking capability of MVM. Field-programmable gate array evaluation shows that the proposed scheme can significantly reduce the overheads compared to protecting each MVM on its own. Therefore, the proposed technique can be used to reduce the cost of providing fault tolerance in practical implementations.
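As a rough illustration of the self-checking property such a scheme builds on (a minimal sketch only, not the paper's combination with error-correcting codes; the matrix values and the injected fault below are made up), the consistency test for an integer MVM y = A x is that the sum of the outputs must equal (1^T A) x, where the checksum row 1^T A is computed once per matrix and reused across multiplications:

```python
import numpy as np

def protected_mvm(A, x, checksum_row=None):
    """Integer matrix-vector multiply with a single checksum-based consistency test."""
    if checksum_row is None:
        checksum_row = A.sum(axis=0)       # 1^T A, precomputed once per matrix
    y = A @ x
    # For a fault-free multiply, sum(y) == (1^T A) x holds exactly in integer arithmetic.
    ok = int(y.sum()) == int(checksum_row @ x)
    return y, ok

A = np.arange(16, dtype=np.int64).reshape(4, 4)
x = np.array([1, -2, 3, 4], dtype=np.int64)
y, ok = protected_mvm(A, x)
print(ok)                                              # True on a fault-free run
y_bad = y.copy(); y_bad[1] ^= 8                        # emulate a bit flip in one output word
print(int(y_bad.sum()) == int(A.sum(axis=0) @ x))      # False: error detected
```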
This paper presents a systematic approach for designing one class of fault-tolerant systolic arrays with orthogonal interconnects and unidirectional data flow (OUSA) for the multiplication of rectangular matrices. The method employs space-time redundancy to achieve fault tolerance. It consists of four steps. In the first step, the inner computation space of the basic systolic algorithm for matrix multiplication is expanded. In the second step, we derive a matrix multiplication algorithm which enables us to obtain OUSAs with data pipeline period λ = 3. During the third step, redundancy is introduced by deriving three equivalent algorithms with disjoint index spaces. In the last step, the obtained algorithm is mapped into a processor-time domain. In this way we obtain four different OUSAs. For the given matrix dimensions, two out of the four arrays have an optimal number of processing elements (PEs) and minimal execution time. For the square case, all arrays have an optimal number of PEs, Ω = n(n + 2), and a total execution time of T_tot = 6n. All of them can tolerate single transient errors and the majority of multiple error patterns with high probability. In addition, two of the arrays can tolerate permanent faults as well. The obtained arrays are suitable for implementation in VLSI technology. Compared to a hexagonal array of the same dimensions, the number of I/O pins is reduced by approximately 30%.
VLSI Design, 2013
This paper examines fault-tolerant adder designs implemented on FPGAs which are inspired by the methods of modular redundancy, roving, and gradual degradation. A parallel-prefix adder based upon the Kogge-Stone configuration is compared with the simple ripple-carry adder (RCA) design. The Kogge-Stone design utilizes a sparse carry tree complemented by several smaller RCAs. Additional RCAs are inserted into the design to allow fault tolerance to be achieved using the established methods of roving and gradual degradation. A triple modular redundant ripple-carry adder (TMR-RCA) is used as a point of reference. Simulation and experimental measurements on a Xilinx Spartan 3E FPGA platform are carried out. The TMR-RCA is found to have the best delay performance and most efficient resource utilization for an FPGA fault-tolerant implementation, due to the simplicity of the approach and the use of the fast-carry chain. However, the superior performance of the carry-tree adder over an RCA in a...
Nanotechnology, 2003
Both von Neumann's NAND multiplexing, based on a massive duplication of imperfect devices and randomized imperfect interconnects, and reconfigurable architectures have been investigated as solutions for the integration of highly unreliable nanometre-scale devices. In this paper, we review these two techniques and present a defect- and fault-tolerant architecture in which von Neumann's NAND multiplexing is combined with a massively reconfigurable architecture. The system performance of this architecture is evaluated by studying its reliability, i.e. the probability of system survival. Our evaluation shows that the suggested architecture can tolerate a device error rate of up to 10^-2 with multiple redundant components; the structure is robust against both permanent and transient faults for an ultra-large integration of highly unreliable nanometre-scale devices.
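As a simplified illustration of why massive duplication helps (a plain majority-redundancy estimate under an i.i.d. fault model, not von Neumann's full NAND-multiplexing analysis; the redundancy factors are arbitrary), one can compute the probability that a majority of redundant copies remain fault-free at the device error rate of 10^-2 mentioned above:

```python
from math import comb

def majority_survival(eps, n_copies):
    """P(a majority of n_copies redundant units are fault-free), each failing
    independently with probability eps. A simplified stand-in for the bundle-level
    reliability analysed in NAND multiplexing."""
    need = n_copies // 2 + 1
    return sum(comb(n_copies, k) * (1 - eps) ** k * eps ** (n_copies - k)
               for k in range(need, n_copies + 1))

# Illustrative redundancy factors at a device error rate of 1e-2.
for n in (3, 9, 33):
    print(n, round(majority_survival(1e-2, n), 10))
```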
IEEE Design & Test
Neural networks are a popular choice for accurately performing complex classification tasks. In edge applications, neural network inference is accelerated on embedded hardware platforms, which often utilise FPGA-based architectures due to their low power and flexible parallelism. A significant number of applications require hardware that is resilient against faults and compliant with safety standards. In this work, we present Selective TMR, an automated tool which analyses how sensitive the overall network accuracy is to individual computations within neural network inference. The tool then triplicates the most sensitive computations, which increases the functional safety of the neural network accelerator without resorting to full triple modular redundancy (TMR). As a result, this allows designers to explore the trade-off between accelerator reliability and hardware cost. In some cases, we see a 24% improvement in minimum accuracy under a single stuck-at hardware fault, while increasing the overall resource footprint by 56%.
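A minimal software analogue of the selective-triplication idea is sketched below (the sensitivity analysis performed by the tool is not reproduced; the operations, the sensitivity flags and the voting fallback are illustrative assumptions): only computations flagged as sensitive are run three times and resolved by majority vote, while the rest run once.

```python
from collections import Counter

def vote(values):
    """Bitwise-exact majority vote over three redundant results."""
    winner, count = Counter(values).most_common(1)[0]
    return winner if count >= 2 else values[0]   # fall back if all three disagree

def selective_tmr(ops, sensitive):
    """Run each operation once, or triplicated with voting if flagged as sensitive.
    `ops` is a list of zero-argument callables; `sensitive` is a parallel list of bools."""
    results = []
    for op, is_sensitive in zip(ops, sensitive):
        results.append(vote([op(), op(), op()]) if is_sensitive else op())
    return results

# Illustrative: only the first (sensitive) computation pays the 3x redundancy cost.
ops = [lambda: 2 * 3, lambda: 4 + 5]
print(selective_tmr(ops, sensitive=[True, False]))   # [6, 9]
```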
IEEE Design and Test of Computers, 2004
Editors' note: FPGAs have become prevalent in critical applications in which transient faults can seriously affect the circuit's operation. This article presents a fault-tolerance technique for transient and permanent faults in SRAM-based FPGAs. The technique combines duplication with comparison (DWC) and concurrent error detection (CED) to provide a highly reliable circuit while maintaining hardware, pin, and power overheads far lower than with classic triple-modular-redundancy techniques.

ICs are sensitive to upsets that occur in aerospace. More recently, ICs have also become sensitive to upsets at ground level because of the continual evolution of semiconductor fabrication technology. Drastic device shrinkage, power supply reduction, and increasing operating speeds significantly reduce noise margins and thus reliability because of the internal noise sources that very deep-submicron ICs face [1]. This trend is approaching a point at which it will be infeasible to produce ICs that are free from these effects. Consequently, fault tolerance is no longer a matter exclusively for aerospace designers; it's important for the designers of next-generation ground-level products as well. FPGAs are popular for design solutions because they improve logic density and performance for many applications. SRAM-based FPGAs, in particular, are highly flexible because they are reprogrammable, allowing on-site design changes. However, because the reprogrammability leads to a high logic density in terms of SRAM memory cells, SRAM-based FPGAs are also sensitive to radiation and require protection to work in harsh environments [2]. Our high-level fault tolerance technique combines time and hardware redundancy to cope with upsets in SRAM-based FPGAs. This technique reduces the number of I/O pads, and therefore power dissipation, in the interface compared to the well-known triple modular redundancy (TMR) solution. Our goal is to reduce the hardware overhead (which in TMR is three times the original area of the unprotected design) to close to twice the original area, maintaining the same reliability and consequently reducing power dissipation. We've evaluated our technique in two types of circuits: multipliers and digital filters.

Radiation effects on SRAM-based FPGAs: A radiation environment contains various charged particles, generated by solar activity, that interact with silicon atoms, exciting and ionizing the atomic electrons [3]. At ground level, neutrons are the most frequent causes of upsets [4]. When a single heavy ion strikes the silicon, it loses its energy through the production of free electron-hole pairs, resulting in a dense ionized track in the local region. Protons and neutrons can cause a nuclear reaction when passing through the material. The recoil also produces ionization, generating a transient current pulse that can cause an upset in the circuit. A single particle can hit either the combinational or the sequential logic in the silicon [5]. When a charged particle strikes a memory cell's sensitive nodes, such as the drain of an off-state transistor, it generates a transient current pulse that can mistakenly turn on the opposite transistor.
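A minimal software sketch of the duplication-with-comparison idea follows (the re-execution-based diagnosis here is an illustrative assumption standing in for the article's CED block, not necessarily how that block works): two copies of a module are compared, and on a mismatch a time-redundant recomputation is used to decide which copy to trust.

```python
def dwc_ced(copy_a, copy_b, x):
    """Duplication with comparison plus a simple time-redundancy diagnosis.
    copy_a and copy_b model two redundant hardware modules computing the same function."""
    ra, rb = copy_a(x), copy_b(x)
    if ra == rb:
        return ra                          # outputs agree: pass the result through
    # Mismatch: recompute both copies. A transient fault does not repeat, so the
    # copy whose result changed on re-execution is the one that was upset.
    ra2, rb2 = copy_a(x), copy_b(x)
    if ra == ra2 and rb != rb2:
        return ra                          # copy B was hit by a transient upset
    if rb == rb2 and ra != ra2:
        return rb                          # copy A was hit by a transient upset
    # Both copies are stable but disagree: consistent with a permanent fault,
    # which requires a stronger diagnosis step (e.g. the article's CED) or repair.
    raise RuntimeError("persistent mismatch: escalate to further diagnosis")
```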
Journal of Parallel and Distributed Computing, 2009
We present a new approach to fault tolerance for High Performance Computing systems. Our approach is based on a careful adaptation of the Algorithm-Based Fault Tolerance technique [K. Huang, J. Abraham, Algorithm-based fault tolerance for matrix operations, IEEE Transactions on Computers (Spec. Issue Reliable & Fault-Tolerant Comp.) 33 (1984) 518-528] to the needs of parallel distributed computation.
2020 IEEE International Electron Devices Meeting (IEDM)
Despite great promise shown in the laboratory environment, matrix multiplication accelerators based on memristor crossbars (non-volatile resistive analog memory) suffer from unexpected computing errors, limiting opportunities to replace mainstream digital systems. While many previously demonstrated applications, such as neural networks, are tolerant of small errors, they are challenged by any significant outliers, which must be detected and corrected. Herein, we experimentally demonstrate an analog Error Correcting Code (ECC) scheme that considerably reduces the chance of substantial errors by detecting and correcting them with minimum hardware overhead. Different from the well-known digital ECC used in communication and memory, this analog version can tolerate small errors while detecting and correcting those over a predefined threshold. With this scheme, we can experimentally recover the MNIST handwritten digit classification accuracy from 90.31% to 96.21% in the event an array builds up shorted devices, and from 73.12% to 97.36% when current noise is injected. For applications where high reliability and compute precision are demanded, such as in high-performance and scientific computing, we expect the schemes shown here to make analog computing more feasible.
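To illustrate the tolerate-small/correct-large policy in software (a purely illustrative model: the weighted-checksum encoding, threshold, noise level and injected outlier below are assumptions, not the paper's in-crossbar analog scheme), small read noise is left untouched while a single large outlier in an analog-style MVM is located and corrected:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
W, x = rng.random((n, n)), rng.random(n)

# Checksum references, computed once per programmed matrix.
w2 = np.arange(1, n + 1, dtype=float)            # weights used to locate an outlier
c1 = W.sum(axis=0) @ x                           # expected plain sum of the outputs
c2 = (w2 @ W) @ x                                # expected weighted sum of the outputs

# Analog read: small noise on every output plus one large outlier (e.g. a shorted device).
y = W @ x + rng.normal(0.0, 0.002, n)
y[5] += 0.8

tau = 0.05                                       # tolerance for benign analog noise
d1 = y.sum() - c1
d2 = w2 @ y - c2
if abs(d1) > tau:                                # only significant outliers trigger correction
    k = int(round(d2 / d1)) - 1                  # weighted/plain ratio locates the faulty output
    y[k] -= d1                                   # remove the estimated error magnitude
print(np.max(np.abs(y - W @ x)))                 # residual error is back near the noise floor
```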
2008
We present a new approach to fault tolerance for High Performance Computing systems. Our approach is based on a careful adaptation of the Algorithm-Based Fault Tolerance technique to the needs of parallel distributed computation. We obtain a strongly scalable mechanism for fault tolerance. We can also detect and correct errors (bit flips) on the fly during a computation. To assess the viability of our approach, we have developed a fault-tolerant matrix-matrix multiplication subroutine and we propose some models to predict its running time. Our parallel fault-tolerant matrix-matrix multiplication scores 1.4 TFLOPS on 484 processors (cluster jacquard.nersc.gov) and returns a correct result even when one process failure has occurred. This represents 65% of the machine's peak efficiency and less than 12% overhead with respect to the fastest failure-free implementation. We predict (and have observed) that, as we increase the processor count, the overhead of the fault tolerance drops significantly.
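As a toy serial model of the recovery step such a scheme enables (the block sizes, the single "lost" process and the NumPy formulation are illustrative; the actual subroutine distributes blocks over many processes), a checksum row block of C allows the result block held by a failed process to be rebuilt from the surviving blocks:

```python
import numpy as np

# Toy model: C = A @ B is partitioned into row blocks, one block per "process",
# plus a checksum block held by an extra process. If one process is lost, its
# block of C is rebuilt from the survivors and the checksum.
rng = np.random.default_rng(2)
p, blk, n = 4, 2, 8                                  # 4 processes, 2 rows of C each
A, B = rng.random((n, n)), rng.random((n, n))

C_blocks = [A[i*blk:(i+1)*blk] @ B for i in range(p)]       # per-process results
C_check = sum(A[i*blk:(i+1)*blk] for i in range(p)) @ B     # checksum process result

lost = 1                                              # process 1 fails mid-run
C_blocks[lost] = None

# Recovery: the checksum block equals the sum of all row blocks of C,
# so the missing block is the checksum minus the surviving blocks.
recovered = C_check - sum(b for i, b in enumerate(C_blocks) if i != lost)
assert np.allclose(recovered, A[lost*blk:(lost+1)*blk] @ B)
```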
2008 Fourth Advanced International Conference on Telecommunications, 2008
The approach described in this paper uses an array of Field Programmable Gate Array (FPGA) devices to implement a fault-tolerant hardware system that can be compared to running fault-tolerant software on a traditional processor. Fault tolerance is achieved by using FPGAs with an on-the-fly partial reprogrammability feature. Major considerations when mapping to the FPGA include the size of the area to be mapped and the communication issues between the mapped modules. Area size selection is compared to page-size selection in operating system design. Communication issues between modules are compared to the software engineering paradigms dealing with module coupling, fan-in, fan-out and cohesiveness. Finally, the overhead associated with downloading the reconfiguration files is discussed.