Module 4 - Hardware Accelerators for Deep Learning

The document provides an overview of the Intel® Distribution of OpenVINO™ Toolkit, focusing on hardware accelerators for deep learning applications, including Intel® CPUs, GPUs, and the Movidius™ Myriad™ X VPU. It outlines factors to consider when selecting hardware for deep learning inference, discusses performance benchmarks, and includes hands-on lab exercises for practical application. The module aims to equip educators and learners with the knowledge to evaluate and deploy deep learning models effectively using Intel hardware.

Intel® Distribution of OpenVINO™ Toolkit
Digital Courseware for Educators
Course: Deploying Deep Learning Applications

MODULE 4: Hardware Accelerators for Deep Learning
Notices and disclaimers

Performance varies by use, configuration, and other factors. Learn more at [Link]/PerformanceIndex.
Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. See
backup for configuration details. No product or component can be absolutely secure.
Your costs and results may vary.
Intel® technologies may require enabled hardware, software, or service activation.
Intel® optimizations, for Intel® compilers or other products, may not optimize to the same degree for non-Intel products.
Intel does not control or audit third-party data. You should consult other sources to evaluate accuracy.
Results have been estimated or simulated.
Intel is committed to respecting human rights and avoiding complicity in human rights abuses.
See Intel’s Global Human Rights Principles. Intel® products and software are intended only to be used in
applications that do not cause or contribute to a violation of an internationally recognized human right.
© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries.
Other names and brands may be claimed as the property of others.

Module 4
Hardware Accelerators for Deep Learning

Table of Contents

• Factors to be Considered while Selecting a Hardware Platform for Deep Learning (DL) Inference
• Intel® CPU (Central Processing Unit) for Deep Learning Inference
• GPU and iGPU (Integrated Graphics Processing Unit) for Deep Learning Inference
• Intel® Movidius™ Myriad™ X Vision Processing Unit (VPU)
• Intel® Neural Compute Stick 2
• Performance Benchmark Results
• Hands-on Lab
  • Exercise 1: Benchmark App
  • Exercise 2: Accelerated Object Detection

Module 4: Learning Objectives

• Discover the various Intel® hardware platforms for AI inference: CPU, GPU, and VPU.

• Learn about the Intel® Movidius™ Myriad™ X VPU platform and its features.

• Apply the concepts and knowledge gained in previous modules to evaluate different hardware
platforms and their capabilities.

Module 4: Learning Outcomes

After completing this module, students/learners should be able to:

• Evaluate different hardware platforms for AI inference.

• Demonstrate a thorough understanding of the hardware platforms available in the Intel® AI ecosystem.

Module 4: Key Questions Addressed

• How do you evaluate different hardware platforms for AI inference?

• What hardware platforms are available in the Intel® ecosystem?

Factors to be Considered while Selecting a Hardware Platform for Deep Learning (DL) Inference

There are several factors to consider when selecting a hardware platform for DL inference; they depend primarily on your use case:

▪ Deploying on the Edge or in the Cloud
▪ Power consumption of the platform
▪ Latency
▪ Extensibility
▪ Support
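
A practical first step when weighing these factors is to check which inference devices a given machine actually exposes. Here is a minimal sketch, assuming the 2021-era Inference Engine Python API used by the toolkit versions referenced later in this module (newer releases expose a different API):

    # Minimal device-discovery sketch (assumes the OpenVINO 2021.x
    # Inference Engine Python API is installed).
    from openvino.inference_engine import IECore

    ie = IECore()
    # Each entry is a device name usable at network-load time,
    # for example "CPU", "GPU", "MYRIAD", or "HDDL".
    print(ie.available_devices)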

Discussion Points
• What factors other than those listed in the previous section should you consider while
selecting a hardware platform?
• What is the latency-throughput tradeoff? How does this affect hardware platform
selection?
• At the hardware level, can you think of some features that are desirable for AI
applications?

Intel® CPU (Central Processing Unit) for Deep Learning Inference

The CPU is the host device that powers the processes supporting deep learning inference, including pre-processing, data communication, and post-processing. A CPU executes instructions sequentially and is a latency-oriented device. While CPUs generally act as hosts for accelerators, you can also use the CPU itself for accelerated AI inference.

The Intel® CPU platform provides a flexible, cost-efficient platform for AI inference that you can use when deploying to the Edge or the Cloud. The figure below shows the various Intel® CPU platforms.

Various performance indices for Intel® processors can be found here.
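
To make the CPU path concrete, below is a minimal inference sketch, assuming the 2021-era Inference Engine Python API; the IR file names and the zero-filled input are placeholders, not artifacts shipped with this module:

    import numpy as np
    from openvino.inference_engine import IECore

    ie = IECore()
    # Placeholder paths to an Intermediate Representation (IR) model.
    net = ie.read_network(model="model.xml", weights="model.bin")
    exec_net = ie.load_network(network=net, device_name="CPU", num_requests=1)

    # Build a dummy input that matches the network's expected input shape.
    input_blob = next(iter(net.input_info))
    dummy = np.zeros(net.input_info[input_blob].input_data.shape, dtype=np.float32)

    # Synchronous (latency-oriented) inference on the CPU.
    results = exec_net.infer({input_blob: dummy})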

iGPU (Integrated Graphics Processing Unit) and Intel® Xe Graphics for Deep Learning Inference

As part of its GPU offerings, Intel provides both integrated and discrete graphics platforms for accelerated parallel inference on GPUs:
• Integrated GPU (iGPU) within select Intel® CPUs
• Intel® Xe Graphics (discrete GPU)
• Intel® Arc™ Graphics (discrete GPU)
• Intel® Data Center GPU (discrete GPU)

The architecture of both Intel® Xe Graphics and the iGPU provides a powerful ISA (Instruction Set Architecture) supporting FP32 and FP16 instructions with SIMD multiply-accumulate instructions. They also offer efficient memory blocks to load data quickly. The optimized GEMM (generalized matrix multiply) is very effective in accelerating deep learning operations.

To read more about deep learning performance on GPUs, please refer to this link.
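
Targeting a GPU from the same API is, in the simplest case, only a device-name change; a minimal sketch under the same assumptions as the CPU example (the GPU plugin generally favors FP16 IR models):

    from openvino.inference_engine import IECore

    ie = IECore()
    net = ie.read_network(model="model.xml", weights="model.bin")  # placeholder paths
    # "GPU" selects the Intel GPU plugin (integrated or discrete), if present.
    exec_net = ie.load_network(network=net, device_name="GPU", num_requests=2)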

Intel® Movidius™ Myriad™ X Vision Processing Unit (VPU)

The Intel® Movidius™ Myriad™ X Vision Processing Unit (VPU) is a dedicated hardware accelerator for deep neural network inference.

The Intel® Movidius™ Myriad™ VPU platform provides a Neural Compute Engine that offers flexible interconnect and ease of configuration, allowing on-device DNN applications to run. These features are essential for safety-critical applications like self-driving cars.

To read more about deep learning performance on VPUs, please refer to this link.
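
With a Myriad™ X device attached, for example through the Intel® Neural Compute Stick 2, the same load call targets it via the "MYRIAD" device name; a minimal sketch under the same assumptions as the earlier examples:

    from openvino.inference_engine import IECore

    ie = IECore()
    net = ie.read_network(model="model.xml", weights="model.bin")  # placeholder paths
    # "MYRIAD" addresses Movidius™ Myriad™ VPUs; the plugin expects FP16 models.
    exec_net = ie.load_network(network=net, device_name="MYRIAD")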

Next-Generation AI Inference: Intel® Movidius™ Myriad™ X VPU

Neural Compute Engine
An entirely new deep neural network (DNN) inferencing engine that offers flexible interconnect and ease of configuration for on-device DNNs and computer vision applications.

16 SHAVE Cores
VLIW (DSP) programmable processors optimized for complex vision and imaging workloads.

Hardware-Based Encoder
Supports up to 4K video resolution and includes a new stereo depth block capable of processing dual 720p feeds at up to 180 Hz.

Intel® Vision Accelerator Design with Intel® Movidius™ Vision Processing Unit (VPU)

• Specialized processors designed to deliver high-performance machine vision at ultra-low power
• Supports up to 16 video streams per device
• Ideal for camera and network video recorder (NVR) use cases with power, size, and cost constraints
• Supports small-memory-footprint networks
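
Assuming such a card is installed and its drivers configured (setup steps vary by product), the toolkit typically exposes it through the "HDDL" device name; a minimal sketch:

    from openvino.inference_engine import IECore

    ie = IECore()
    net = ie.read_network(model="model.xml", weights="model.bin")  # placeholder paths
    # "HDDL" targets Intel® Vision Accelerator Design cards carrying multiple VPUs.
    exec_net = ie.load_network(network=net, device_name="HDDL")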

Examples of Intel® Vision Accelerator Design Products
Accelerators based on Intel® Movidius™ VPU

Example cards based on Vision Accelerator Designs:

VPUs per card: 1 Intel® Movidius™ VPU | 2 Intel® Movidius™ VPUs | 8 Intel® Movidius™ VPUs
Interface: M.2, Key E | miniPCIe** | PCIe x4
Currently manufactured by*

**Other names and brands may be claimed as the property of others

Intel® Distribution of OpenVINO™ Toolkit
Software tools: develop a neural network model; deploy across Intel® CPU, GPU, VPU, and FPGA; leverage common algorithms.

Discussion Points
• Based on your understanding of the topics covered in the preceding section, what are
the limitations of each of the Intel® hardware platforms?
• Why do you believe processors specialized for specific data workloads are better suited
for tasks like computer vision?
• Discuss which hardware platforms are already a part of your daily life and what role they
play in your life.

Performance Benchmark Results

The benchmark results show significant performance improvements on several public neural networks across a variety of Intel® CPUs, GPUs, and VPUs spanning a wide performance range. The results may be useful for planning AI workloads on Intel® compute already present in your solutions, or for determining which hardware is ideal for your applications and solutions.

More details on the benchmark setup can be found in the documentation here.

Parameters used for measurement in the OpenVINO™ toolkit benchmark results:

▪ Throughput
▪ Value (throughput per unit cost)
▪ Efficiency (throughput per watt)
▪ Latency

Refer here for performance benchmark results for Intel® pretrained models.

Summary
• In this module, we learned about the various hardware platforms available for deep learning inference, including the Intel® CPU, iGPU, and Intel® Movidius™ Myriad™ X VPU platforms, as well as the benefits and limitations of each.
• In the upcoming module, we will discover how to use Deep Learning Workbench to optimize and deploy a deep learning model.

Hands-on Lab

Hands-on Lab
Exercise 1: Benchmark App

In this lab exercise, you will learn how to use the benchmarking tool on Intel® DevCloud for Edge Workloads to measure the performance of your model's synchronous and asynchronous inference.

You can access the lab exercise on Intel® DevCloud for Edge Workloads.
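
For orientation before the lab, a typical benchmark_app invocation looks like the sketch below; the model path and device are placeholders, and you should run benchmark_app -h in your environment to confirm which options your toolkit version supports:

    # Synchronous (latency-oriented) run on the CPU
    benchmark_app -m model.xml -d CPU -api sync -t 30

    # Asynchronous (throughput-oriented) run on the CPU
    benchmark_app -m model.xml -d CPU -api async -t 30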

Hands-on Lab
Exercise 2: Accelerated Object Detection

In this lab exercise, you will learn to accelerate object detection by using asynchronous inferencing and distributing workloads to multiple types of processing units.

You can access the lab exercise on Intel® DevCloud for Edge Workloads.

Launch the Jupyter* Notebook and follow the instructions to finish the lab.
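
As a rough illustration of the two ideas this exercise combines, the sketch below, assuming the 2021-era Inference Engine Python API and placeholder IR paths, issues an asynchronous request and uses the MULTI plugin to spread requests across the CPU and GPU; the DevCloud notebook remains the authoritative version:

    import numpy as np
    from openvino.inference_engine import IECore

    ie = IECore()
    net = ie.read_network(model="model.xml", weights="model.bin")  # placeholder paths
    input_blob = next(iter(net.input_info))

    # The MULTI plugin distributes inference requests across the listed devices.
    exec_net = ie.load_network(network=net, device_name="MULTI:CPU,GPU", num_requests=4)

    # Dummy frame matching the network's input shape.
    frame = np.zeros(net.input_info[input_blob].input_data.shape, dtype=np.float32)

    # Asynchronous inference: start the request, do other work, then wait.
    exec_net.start_async(request_id=0, inputs={input_blob: frame})
    if exec_net.requests[0].wait(-1) == 0:  # 0 == StatusCode.OK
        results = exec_net.requests[0].output_blobs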

System configuration

Parameter: System 1 | System 2
System board: Intel prototype, TGL U DDR4 SODIMM RVP | ASUSTek COMPUTER INC. / Prime Z370-A
CPU: 11th Gen Intel® Core™ i5-1145G7 @ 2.6 GHz | 8th Gen Intel® Core™ i5-8500T @ 3.0 GHz
Sockets / physical cores: 1 / 4 | 1 / 6
Hyperthreading / turbo setting: Enabled / On | NA / On
Memory: 2 x 8198 MB 3200 MT/s DDR4 | 2 x 16384 MB 2667 MT/s DDR4
OS: Ubuntu 18.04 LTS | Ubuntu 18.04 LTS
Kernel: 5.8.0-050800-generic | 5.3.0-24-generic
Software: Intel® Distribution of OpenVINO™ toolkit 2021.1.075 | Intel® Distribution of OpenVINO™ toolkit 2021.1.075
BIOS: Intel TGLIFUI1.R00.3243.A04.2006302148 | AMI, version 2401
BIOS release date: June 30, 2020 | July 12, 2019
BIOS setting: Load default settings | Load default settings, set XMP to 2667
Test date: September 9, 2020 | September 9, 2020
Precision and batch size: CPU: int8, GPU: FP16-int8, batch size: 1 | CPU: int8, GPU: FP16-int8, batch size: 1
Number of inference requests: 4 | 6
Number of execution streams: 4 | 6
Power (TDP, link): 28 W | 35 W
Price (USD, link, as of 02/25/2022; prices may vary): USD 312 | USD 192

1) Memory is installed such that all primary memory slots are populated.
2) Testing by Intel as of September 9, 2020.

Compounding effect of hardware and software configuration
See the compounding effect

Parameter: System 1 | System 2 | System 3 | System 4
System board: Purley E63448-400, Intel® Internal Reference System | Intel® Server Board S2600STB | Intel® Server Board S2600STB | Intel® Internal Reference System
CPU: Intel® Xeon® Silver 4116 @ 2.1 GHz | Intel® Xeon® Silver 4216 @ 2.10 GHz | Intel® Xeon® Silver 4216R @ 2.20 GHz | Intel® Xeon® Silver 4316 @ 2.30 GHz
Sockets, physical cores/socket: 2, 12 | 2, 16 | 2, 16 | 2, 20
Hyperthreading / turbo setting: Enabled / On | Enabled / On | Enabled / On | Enabled / On
Memory: 12 x 16 GB DDR4 2400 MHz | 12 x 64 GB DDR4 2400 MHz | 12 x 32 GB DDR4 2666 MHz | 16 x 32 GB DDR4 2666 MHz
OS: Ubuntu 16.04.3 LTS | Ubuntu 18.04 LTS | Ubuntu 18.04 LTS | Ubuntu 20.04 LTS
Kernel: 4.4.0-210-generic | 4.15.0-96-generic | 5.3.0-24-generic | 5.13.0-rc5-intel-next+
Software: Intel® Distribution of OpenVINO™ toolkit R5 2018 | Intel® Distribution of OpenVINO™ toolkit R3 2019 | Intel® Distribution of OpenVINO™ toolkit 2021.2 | Intel® Distribution of OpenVINO™ toolkit 2021.4.1
BIOS: PLYXCRB1.86B.0616.D08.2109180410 | — | Intel Corporation SE5C620.86B.02.01.0009.092820190230 | [Link].0020.P93.2103190412
BIOS release date: September 18, 2021 | — | September 28, 2019 | March 19, 2021
BIOS setting: Select optimized default settings, save, and exit | Select optimized default settings, save, and exit | Select optimized default settings, change power policy to "performance," save, and exit | Select optimized default settings, change power policy to "performance," save, and exit
Test date: October 8, 2021 | September 27, 2019 | December 24, 2020 | September 6, 2021
Precision and batch size: FP32 / batch 1 | int8 / batch 1 | int8 / batch 1 | int8 / batch 1
Workload (model / image size): MobileNet-SSD / 300x300 | MobileNet-SSD / 300x300 | MobileNet-SSD / 300x300 | MobileNet-SSD / 300x300
Number of inference requests: 24 | 32 | 32 | 10
Number of execution streams: 24 | 32 | 32 | 10
Power (TDP, link): 170 W | 200 W | 250 W | 300 W
Price (USD, link, as of 02/25/2022; prices may vary): USD 2,024 | USD 1,926 | USD 2,004 | USD 2,166
