2022, Springer eBooks
The design and implementation of Deep Learning (DL) models is currently receiving a lot of attention from both industry and academia. However, the computational workload associated with DL is often out of reach for low-power embedded devices and remains costly when run in datacenters. By relaxing the need for fully precise operations, Approximate Computing (AxC) substantially improves performance and energy efficiency. DL is extremely relevant in this context, since tuning the accuracy of computations to what the application actually requires can significantly enhance performance while keeping the quality of results within a user-constrained range. This chapter explores how AxC can improve the performance and energy efficiency of hardware accelerators for DL applications during both inference and training.
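As a concrete illustration of the precision/accuracy trade-off this line of work exploits (a generic sketch, not code from the chapter), a symmetric uniform weight quantizer might look like:

```python
import numpy as np

def quantize_uniform(w, n_bits=8):
    """Symmetric uniform quantization of a weight tensor to n_bits.

    Returns the dequantized (approximate) weights, so the quantization
    error introduced by reduced precision can be inspected directly.
    """
    qmax = 2 ** (n_bits - 1) - 1          # e.g. 127 for 8 bits
    scale = np.max(np.abs(w)) / qmax      # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale                      # dequantize back to float

w = np.array([0.51, -0.32, 0.07, -1.0])
w_approx = quantize_uniform(w, n_bits=8)
print(np.max(np.abs(w - w_approx)))       # small per-weight quantization error
```

Lowering `n_bits` shrinks multipliers and memory traffic in hardware at the cost of a larger, but often tolerable, error.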
2021 24th International Symposium on Design and Diagnostics of Electronic Circuits & Systems (DDECS)
The design and implementation of Convolutional Neural Networks (CNNs) for deep learning (DL) is currently receiving a lot of attention from both industry and academia. However, the computational workload involved with CNNs is often out of reach for low-power embedded devices and is still very costly when running on datacenters. By relaxing the need for fully precise operations, approximate computing substantially improves performance and energy efficiency. Deep learning is very relevant in this context, since tuning the accuracy of computations to what the application requires can significantly enhance performance while keeping the quality of results in a user-constrained range. AdequateDL is a project aiming to explore how approximations can improve the performance and energy efficiency of hardware accelerators in DL applications. This paper presents the main concepts and techniques related to approximation of CNNs and preliminary results obtained in the AdequateDL framework.
ACM Computing Surveys, 2022
Deep Neural Networks (DNNs) are very popular because of their high performance in various cognitive tasks in Machine Learning (ML). Recent advancements in DNNs have pushed accuracy beyond human levels in many tasks, but at the cost of high computational complexity. To enable efficient execution of DNN inference, more and more research works therefore exploit the inherent error resilience of DNNs and employ Approximate Computing (AC) principles to address the elevated energy demands of DNN accelerators. This article provides a comprehensive survey and analysis of hardware approximation techniques for DNN accelerators. First, we analyze the state of the art and, by identifying approximation families, cluster the respective works with respect to the approximation type. Next, we analyze the complexity of the performed evaluations (with respect to the dataset and DNN size) to assess the efficiency, the potential, and limitations of approximate DNN accelerators. Moreover, a broad discussion...
2018 Design, Automation & Test in Europe Conference & Exhibition (DATE)
Deep Neural Networks (DNNs) have emerged as a powerful and versatile set of techniques to address challenging artificial intelligence (AI) problems. Applications in domains such as image/video processing, natural language processing, speech synthesis and recognition, genomics and many others have embraced deep learning as the foundational technique. DNNs achieve superior accuracy for these applications using very large models which require 100s of MBs of data storage, ExaOps of computation and high bandwidth for data movement. Despite advances in computing systems, training state-of-the-art DNNs on large datasets takes several days/weeks, directly limiting the pace of innovation and adoption. In this paper, we discuss how these challenges can be addressed via approximate computing. Based on our earlier studies demonstrating that DNNs are resilient to numerical errors from approximate computing, we present techniques to reduce the communication overhead of distributed deep learning training via adaptive residual gradient compression (AdaComp), and the computation cost of deep learning inference via PArameterized Clipping acTivation (PACT) based network quantization. Experimental evaluation demonstrates order-of-magnitude savings in communication overhead for training and in computational cost for inference while not compromising application accuracy.
2019 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 2019
State-of-the-art approaches employ approximate computing to reduce the energy consumption of DNN hardware. Approximate DNNs then require extensive retraining to recover from the accuracy loss caused by the approximate operations. However, retraining of complex DNNs does not scale well. In this paper, we demonstrate that efficient approximations can be introduced into the computational path of DNN accelerators while retraining is avoided entirely. ALWANN provides highly optimized implementations of DNNs for custom low-power accelerators in which the number of computing units is lower than the number of DNN layers. First, a fully trained DNN (e.g., in TensorFlow) is converted to operate with 8-bit weights and 8-bit multipliers in convolutional layers. A suitable approximate multiplier is then selected for each computing element from a library of approximate multipliers in such a way that (i) one approximate multiplier serves several layers, and (ii) the overall classification error and energy consumption are minimized. The optimizations, including the multiplier-selection problem, are solved by means of the multi-objective NSGA-II algorithm. To completely avoid the computationally expensive retraining of the DNN, which is usually employed to improve classification accuracy, we propose a simple weight-updating scheme that compensates for the inaccuracy introduced by the approximate multipliers. The proposed approach is evaluated for two architectures of DNN accelerators with approximate multipliers from the open-source "EvoApprox" library, while executing three versions of ResNet on CIFAR-10. We report that the proposed approach saves 30% of the energy needed for multiplication in the convolutional layers of ResNet-50 while degrading accuracy by only 0.6% (0.9% for ResNet-14).
The proposed technique and approximate layers are available as an open-source extension of TensorFlow at https://github.com/ehw-fit/tf-approximate.
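The retraining-free weight update can be illustrated with a toy model: given a (hypothetical) truncating approximate multiplier standing in for an EvoApprox circuit, each 8-bit weight is nudged to the nearby value whose approximate products best match the exact ones over sampled activations.

```python
import numpy as np

def approx_mult8(a, b, cut=2):
    """Toy approximate 8-bit multiplier: drop the `cut` least-significant
    bits of each operand before multiplying. A stand-in for an
    EvoApprox-style circuit, not one of the actual library designs."""
    return ((a >> cut) << cut) * ((b >> cut) << cut)

def tune_weight(w, acts, cut=2):
    """ALWANN-style retraining-free weight update (sketch): pick the
    nearby integer weight whose approximate products best match the
    exact products over a sample of activation values."""
    candidates = range(max(0, w - 4), min(255, w + 4) + 1)
    exact = acts * w
    err = lambda c: np.abs(approx_mult8(acts, c, cut) - exact).sum()
    return min(candidates, key=err)

acts = np.arange(0, 256, 7)          # sampled 8-bit activations
print(tune_weight(99, acts))         # → 100 (shifted up to offset the truncation bias)
```

Because the toy multiplier truncates operands downward, its products are biased low; the tuning step compensates by choosing a slightly larger stored weight, with no gradient-based retraining involved.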
Journal of Low Power Electronics, 2018
Growing interest in the development of smart Cyber-Physical Systems (CPS) and the Internet of Things (IoT) has motivated researchers to explore the suitability of carrying out embedded machine learning. This has enabled a new age of smart CPS and IoT with emerging applications like autonomous vehicles, smart cities and houses, advanced robotics, IoT-Healthcare, and Industry 4.0. Due to the availability of a huge amount of data and compute power, Deep Neural Networks (DNNs) have become one of the enabling technologies behind this current age of machine learning and intelligent systems. The benefits of DNNs however come at a high computational cost and require a tremendous amount of energy/power resources that are typically not available on (embedded) IoT and CPS devices, especially when considering IoT-Edge nodes. To improve the performance and energy/power efficiency of these DNNs, this paper presents a cross-layer approximation methodology which exploits the error resiliency offered by DNNs at various hardware and software layers of the computing stack. We present various case studies at both the software and hardware level in order to demonstrate the energy benefits of the proposed methodology. At the software level we provide a systematic pruning methodology, while at the hardware level we provide a case study utilizing approximation of the multipliers used for performing the weighted-sum operation in the neural processing of DNNs.
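The software-level side of such a methodology is typically some form of magnitude-based pruning. A minimal sketch (a generic technique, not the paper's specific pruning schedule):

```python
import numpy as np

def magnitude_prune(w, sparsity=0.5):
    """Magnitude-based weight pruning: zero out the fraction `sparsity`
    of weights with the smallest absolute value. Ties at the threshold
    may zero a few extra weights."""
    k = int(sparsity * w.size)
    if k == 0:
        return w.copy()
    threshold = np.sort(np.abs(w), axis=None)[k - 1]
    return np.where(np.abs(w) <= threshold, 0.0, w)

w = np.array([0.9, -0.05, 0.4, -0.01, 0.7, 0.02])
print(magnitude_prune(w, sparsity=0.5))   # → [0.9, 0., 0.4, 0., 0.7, 0.]
```

Zeroed weights mean multiplications that can be skipped entirely, which is where the energy saving comes from on hardware that exploits sparsity.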
IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 2018
Deep Neural Networks (DNNs) have emerged as the state-of-the-art technique in a wide range of machine learning tasks for analytics and computer vision in the next generation of embedded (mobile, IoT, wearable) devices. Despite their success, they suffer from high energy requirements. In recent years, the inherent error resiliency of DNNs has been exploited by introducing approximations at either the algorithmic or the hardware level (individually) to obtain energy savings while incurring tolerable accuracy degradation. However, there is a need to investigate the overall energy-accuracy trade-offs arising from the introduction of approximations at different levels in complex DNNs. We perform a comprehensive analysis to determine the effectiveness of cross-layer approximations for the energy-efficient realization of large-scale DNNs. The approximations considered are as follows: (i) use of lower-complexity networks (containing a smaller number of layers and/or neurons per layer), (ii) pruning of synaptic weights, (iii) approximate multiplication in the neuronal MAC (Multiply-and-Accumulate) computation, and (iv) approximate write/read operations to/from the synaptic memory. Our experiments on recognition benchmarks (MNIST, CIFAR10) show that cross-layer approximation provides substantial improvements in energy efficiency for different accuracy/quality requirements. Furthermore, we propose a synergistic framework for combining the approximation techniques to achieve maximal energy benefits from approximate DNNs.
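Approximation (iii), the approximate neuronal MAC, can be modeled in a few lines. The fixed-point format and bit-truncation below are illustrative assumptions, not the paper's specific circuits:

```python
def approx_mac(xs, ws, frac_bits=8, drop_bits=3):
    """Approximate multiply-and-accumulate in fixed point: operands are
    converted to a Q-format integer and their `drop_bits` low bits are
    truncated before each multiply, modeling a cheap, inexact MAC unit.
    """
    acc = 0
    for x, w in zip(xs, ws):
        xi = (int(x * (1 << frac_bits)) >> drop_bits) << drop_bits
        wi = (int(w * (1 << frac_bits)) >> drop_bits) << drop_bits
        acc += xi * wi
    return acc / float(1 << (2 * frac_bits))   # back to real-valued

print(approx_mac([0.5, 0.25], [0.5, 1.0]))     # → 0.5 (exact for these round values)
```

Raising `drop_bits` shrinks the multiplier hardware but increases the error injected into every weighted sum, which is exactly the energy-accuracy trade-off the paper quantifies across layers.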
Lattice ML Journal, 2020
Deep Neural Networks (DNNs) are among the most powerful machine learning techniques and are becoming increasingly attractive for Big Data applications. In the context of embedded platforms, implementing DNNs that are efficient in terms of performance and energy consumption while maintaining the required quality is very challenging. Sparsity can be used as an effective technique for reducing the size of DNNs. The purpose of this research is to explore the possibilities of introducing sparsity into CNNs and to evaluate the resulting performance.
Energy efficiency is becoming crucial to realizing the benefits of technology scaling. We introduce a new class of low-power accelerators called Neural Processing Units (NPUs). Instead of being programmed, NPUs learn to behave like general-purpose code written in an imperative language. After a training phase, NPUs mimic the original code with acceptable accuracy.
TELKOMNIKA Telecommunication Computing Electronics and Control, 2019
This paper investigates the possibility of reducing power consumption in neural networks using approximate computing techniques. The authors compare a traditional fixed-point neuron with an approximate neuron composed of approximate multipliers and an approximate adder. Experiments show that, in the proposed case study (a wine classifier), the approximate neuron saves up to 43% of the area and 35% of the power consumption, and improves the maximum clock frequency by 20%.
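One classic approximate adder such a neuron might use is the lower-part-OR adder (LOA), which replaces the carry chain of the low bits with a simple OR. The sketch below is a generic LOA, not necessarily the specific adder evaluated in the paper:

```python
def loa_add(a, b, lower_bits=4, width=16):
    """Lower-part-OR adder (LOA): the low `lower_bits` bits are OR-ed
    instead of added (no carry propagation there), and only the upper
    part uses an exact adder. Cheaper and faster, at the cost of a
    bounded error in the low bits."""
    mask = (1 << lower_bits) - 1
    low = (a & mask) | (b & mask)                       # approximate low part
    high = ((a >> lower_bits) + (b >> lower_bits)) << lower_bits
    return (high | low) & ((1 << width) - 1)

print(loa_add(0b1010_0110, 0b0101_0101))   # → 247 (exact sum is 251)
```

Because the error is confined to the low `lower_bits` bits, it is bounded and often negligible after the neuron's activation function, which is what makes such adders attractive for the weighted-sum datapath.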
International Journal of Advanced Trends in Computer Science and Engineering, 2020
In today's technology era, Convolutional Neural Networks (CNNs) are in the limelight for various cognitive tasks because of their high accuracy. As applications grow in complexity, CNNs present high computation and storage demands, which call for customized hardware support to boost their performance. The streaming nature of CNN workloads makes them suitable for hardware implementations like FPGAs and ASICs. Providing sufficient resources alone cannot address this difficulty, which makes Approximate Computing an attractive solution. This article gives an insight into various approximate computing techniques used to accelerate CNNs at multiple levels of hardware implementation. The survey has been conducted by considering different metrics: approximation technique used, datasets used for evaluation, network structure (AlexNet, LeNet, Visual Geometry Group (VGG)), hardware platform for implementation (Application-Specific Integrated Circuit (ASIC) or Field-Programmable Gate Array (FPGA)), training or testing phase, and results (in terms of accuracy, area, power, throughput, and resource utilization). The approximate computation techniques applied at the various levels of the network and its layers are discussed. Necessary comparisons have also been made to assess the utility of these techniques for yielding significant performance gains with minimal losses in accuracy. Methods are presented alongside recent contributions to state-of-the-art image-processing applications, together with future outlooks based on the surveyed studies.