Towards Real-time CNN Inference from a Video Stream on a Mobile GPU (WiP Paper)
The 21st ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems, 2020
While several frameworks exist for CNN inference on mobile GPUs, they do not achieve real-time processing for most CNNs that target reasonable accuracy, because they all employ a kernel-by-kernel execution model and do not yet effectively support INT8 quantization. In this paper, we reveal that, unlike server GPUs, mobile GPUs suffer from large kernel launch overhead. We then propose an on-device deep learning inference framework that achieves real-time CNN inference on mobile GPUs by removing kernel launch overhead and by effectively exploiting INT8 quantization. We evaluated the proposed framework with RetinaFace, a state-of-the-art CNN-based face detector, and observed a speedup of up to 2.01x over ARM Compute Library (ACL) on a commodity smartphone.
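The abstract does not describe the quantization scheme the framework uses, so the following is only a generic illustration of what INT8 quantization involves, not the paper's method. It assumes symmetric per-tensor quantization with a single float scale per tensor; the function names `quantize_int8`, `dequantize` and `int8_dot` are hypothetical. The point it shows is that the inner products of a layer can run entirely on small integers, with a single float rescale at the end.

```python
def quantize_int8(values):
    """Hypothetical helper: symmetric per-tensor INT8 quantization.

    Every value v is approximated as scale * q, with q an integer in [-127, 127]
    and one float scale shared by the whole tensor.
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest code and clamp to the symmetric INT8 range.
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Hypothetical helper: map INT8 codes back to approximate float values."""
    return [scale * c for c in q]


def int8_dot(qa, sa, qb, sb):
    """Hypothetical helper: dot product of two quantized vectors.

    The products are summed as integers (Python ints do not overflow; an INT8
    GPU kernel would typically accumulate into 32-bit integers), and the result
    is rescaled to float once, using the product of the two scales.
    """
    acc = sum(a * b for a, b in zip(qa, qb))
    return acc * (sa * sb)


if __name__ == "__main__":
    x = [0.5, -1.0, 0.25, 0.75]   # illustrative activations
    w = [1.0, 0.5, -2.0, 0.1]     # illustrative weights
    qx, sx = quantize_int8(x)
    qw, sw = quantize_int8(w)
    exact = sum(a * b for a, b in zip(x, w))
    approx = int8_dot(qx, sx, qw, sw)
    print(f"float dot: {exact:.4f}  int8 dot: {approx:.4f}")
```

With these illustrative inputs, the INT8 result differs from the float result by less than 0.01. Per-channel scales or asymmetric zero points are common alternatives that reduce this error further.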
Papers by Youngmin Yi