Graphics Processing Unit
Parallel and Distributed Computing
Arfan Shahzad
{ [email protected] }
GPU Architecture & Programming
• Graphics Processing Units (GPUs) are specialized hardware designed
for highly parallel computing tasks.
• Unlike CPUs, which have a few powerful cores optimized for
sequential processing, GPUs consist of thousands of smaller cores
that execute tasks in parallel.
GPU vs. CPU Architecture
• GPUs differ from Central Processing Units (CPUs) in terms of architecture
and execution model:
• CPU: Optimized for sequential tasks, with few powerful cores, complex
control logic, and large caches.
• GPU: Optimized for parallelism, with thousands of simple cores executing
multiple threads simultaneously.
Feature            CPU            GPU
Cores              Few (4–64)     Thousands
Execution Model    Sequential     Parallel
Latency            Low            High
Throughput         Low            High
Control Flow       Complex        Simple
Memory Hierarchy   Large caches   Small shared memory
Architectural components
• A typical GPU consists of the following architectural components:
• 1- Streaming Multiprocessors (SMs): A GPU consists of multiple SMs, each
containing numerous CUDA cores (or streaming processors) that perform
computations in parallel.
• 2- CUDA Cores: Individual processing units within SMs, executing
instructions in parallel.
• 3- Memory Hierarchy: GPUs have multiple memory types, including:
• Global Memory: Large but relatively slow, accessible by all threads.
• Shared Memory: Faster, limited in size, shared within a single SM.
• Registers: Fastest memory, private to each thread.
• Texture and Constant Memory: Optimized for specific read patterns and
frequently used data.
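The memory spaces above can be seen side by side in one kernel. This is an illustrative sketch only; the kernel and variable names (memorySpaces, tile, coeff) are hypothetical:

```cuda
__constant__ float coeff[16];          // constant memory: read-only in kernels, cached

__global__ void memorySpaces(const float *in, float *out, int n)
{
    __shared__ float tile[256];        // shared memory: visible to one block, lives on the SM

    int i = blockIdx.x * blockDim.x + threadIdx.x;  // local variable: held in a register
    if (i < n) {
        tile[threadIdx.x] = in[i];     // read from global memory into shared memory
        __syncthreads();               // make the shared data visible block-wide
        out[i] = tile[threadIdx.x] * coeff[0];      // write result back to global memory
    }
}
```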
• 4- Warp-based Execution: Threads are organized into warps (typically 32
threads), which execute instructions in lockstep.
• 5- Memory Controller: Manages access to different memory types and optimizes
bandwidth.
• 6- SIMT Execution Model: GPUs follow the Single Instruction Multiple Thread (SIMT) model, a variant of SIMD (Single Instruction Multiple Data), in which the threads of a warp execute the same instruction on different data.
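The SIMT idea is easiest to see in a minimal kernel: every thread executes the same code, but each operates on the element selected by its own index. A sketch, with a hypothetical vecAdd kernel:

```cuda
__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // unique global thread index
    if (i < n)                  // threads past the end of the array do nothing
        c[i] = a[i] + b[i];     // same instruction, different data per thread
}
```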
Programming models
• Programming a GPU involves writing parallel code using specialized
frameworks. The most common models include:
• 1- CUDA (Compute Unified Device Architecture): A parallel computing platform and API from NVIDIA that allows direct programming of GPUs using C/C++. Key features of CUDA include:
• A- CUDA Kernels: Functions executed on the GPU, written in CUDA.
• B- CUDA Threads and Blocks: Threads are grouped into blocks, and
blocks form a grid, allowing scalable parallel execution.
• C- Memory Management: Optimizing memory access patterns (e.g.,
coalesced memory access) is crucial for performance.
• D- Streams and Asynchronous Execution: CUDA streams allow
overlapping computation and memory transfers.
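A hedged sketch of two streams overlapping transfers and compute. This is a fragment, not a full program: it assumes device buffers d_a and d_b, pinned host buffers h_a and h_b of `half` bytes each (allocated with cudaMallocHost, which asynchronous copies require), and a previously defined kernel; `blocks` and `threads` are placeholder launch parameters:

```cuda
cudaStream_t s1, s2;
cudaStreamCreate(&s1);
cudaStreamCreate(&s2);

// Each stream copies its half in, runs the kernel, and copies back;
// the work queued in s1 can overlap with the work queued in s2.
cudaMemcpyAsync(d_a, h_a, half, cudaMemcpyHostToDevice, s1);
kernel<<<blocks, threads, 0, s1>>>(d_a);
cudaMemcpyAsync(h_a, d_a, half, cudaMemcpyDeviceToHost, s1);

cudaMemcpyAsync(d_b, h_b, half, cudaMemcpyHostToDevice, s2);
kernel<<<blocks, threads, 0, s2>>>(d_b);
cudaMemcpyAsync(h_b, d_b, half, cudaMemcpyDeviceToHost, s2);

cudaStreamSynchronize(s1);   // wait for each stream to drain
cudaStreamSynchronize(s2);
cudaStreamDestroy(s1);
cudaStreamDestroy(s2);
```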
• 2- OpenCL (Open Computing Language): A framework for writing
programs that execute across heterogeneous platforms, including
GPUs, CPUs, and FPGAs.
• 3- HIP (Heterogeneous-compute Interface for Portability): AMD's CUDA-like framework; HIP code can be compiled for both AMD and NVIDIA GPUs.
CUDA Programming Model
• CUDA is a widely used framework for GPU programming. It follows a
hierarchical execution model:
• Thread: Smallest execution unit.
• Block: A group of threads sharing memory and synchronization
mechanisms.
• Grid: A collection of thread blocks.
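The thread/block/grid hierarchy shows up directly in how a kernel is launched. A sketch for a 1-D problem of n elements; myKernel is a hypothetical kernel name:

```cuda
int threadsPerBlock = 256;                                        // threads per block
int blocksPerGrid = (n + threadsPerBlock - 1) / threadsPerBlock;  // ceil(n / 256) blocks

// Launch a grid of blocksPerGrid blocks, each with threadsPerBlock threads.
myKernel<<<blocksPerGrid, threadsPerBlock>>>(d_data, n);

// Inside the kernel, each thread recovers its global index from the hierarchy:
//   int i = blockIdx.x * blockDim.x + threadIdx.x;
```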
• A basic CUDA program consists of:
• A- Kernel Function: Defines the computation to be executed on the GPU.
• B- Memory Management: Allocating and transferring data between CPU (host)
and GPU (device).
• C- Launching Kernel: Configuring execution parameters (grid and block sizes).
• D- Synchronizing Threads: Ensuring proper execution order.
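The four parts above fit into one small, complete program. A minimal sketch (the addOne kernel is hypothetical), compilable with nvcc:

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// A- Kernel function: the computation executed on the GPU, one thread per element.
__global__ void addOne(float *x, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] += 1.0f;
}

int main(void)
{
    const int n = 1024;
    size_t bytes = n * sizeof(float);
    float h_x[1024];
    for (int i = 0; i < n; i++) h_x[i] = (float)i;

    // B- Memory management: allocate device memory, copy host -> device.
    float *d_x;
    cudaMalloc(&d_x, bytes);
    cudaMemcpy(d_x, h_x, bytes, cudaMemcpyHostToDevice);

    // C- Launch the kernel: a grid of 4 blocks, 256 threads each (4 * 256 = n).
    addOne<<<4, 256>>>(d_x, n);

    // D- Synchronize, then copy the result back and clean up.
    cudaDeviceSynchronize();
    cudaMemcpy(h_x, d_x, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_x);

    printf("h_x[0]=%.1f h_x[1023]=%.1f\n", h_x[0], h_x[1023]);
    return 0;
}
```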
Memory Management in GPU
• Efficient memory management is crucial for performance optimization:
• A- Memory Coalescing: Ensuring threads access consecutive memory
locations.
• B- Shared Memory Usage: Reducing global memory accesses by using
shared memory.
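Memory coalescing can be illustrated by contrasting two access patterns. A sketch with hypothetical kernel names: in the first, neighbouring threads touch neighbouring addresses, so a warp's 32 loads merge into a few memory transactions; in the second, a stride scatters the accesses across many cache lines:

```cuda
__global__ void coalescedCopy(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];            // consecutive threads, consecutive addresses
}

__global__ void stridedCopy(const float *in, float *out, int n, int stride)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i * stride < n)
        out[i] = in[i * stride];          // accesses spread over many cache lines
}
```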
• C- Register Optimization: Minimizing register spilling to avoid
performance degradation.
• D- Unified Memory: Allowing CPU and GPU to share memory
seamlessly.
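With Unified Memory, a single cudaMallocManaged pointer is usable from both host and device, removing explicit cudaMemcpy calls. A minimal sketch (the scale kernel is hypothetical):

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

__global__ void scale(float *x, int n, float f)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= f;
}

int main(void)
{
    const int n = 256;
    float *x;
    cudaMallocManaged(&x, n * sizeof(float));   // one allocation, visible to CPU and GPU
    for (int i = 0; i < n; i++) x[i] = 1.0f;    // written by the CPU

    scale<<<1, n>>>(x, n, 2.0f);                // read and written by the GPU
    cudaDeviceSynchronize();                    // wait before the CPU touches x again

    printf("x[0] = %.1f\n", x[0]);              // no explicit cudaMemcpy needed
    cudaFree(x);
    return 0;
}
```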
Optimization Techniques
• To achieve high performance in GPU programming, the following
optimizations are applied:
• A- Thread Block Sizing: Choosing optimal thread and block sizes for
maximum occupancy.
• B- Loop Unrolling: Reducing loop overhead by manually unrolling
iterations.
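Instead of unrolling by hand, nvcc can be asked to unroll with a pragma. A sketch using a hypothetical grid-stride summation kernel:

```cuda
__global__ void sumStrided(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float s = 0.0f;
    #pragma unroll 4    // hint: replicate 4 iterations to cut loop overhead
    for (int k = i; k < n; k += blockDim.x * gridDim.x)
        s += in[k];
    out[i] = s;         // each thread writes its partial sum
}
```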
• C- Warp-Level Synchronization: Utilizing warp-level primitives for
faster communication.
• D- Occupancy Maximization: Ensuring high SM utilization by
managing resource allocation.
• E- Reducing Divergence: Minimizing branch divergence within warps.
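Warp-level primitives let the 32 threads of a warp exchange values without shared memory. A sketch of a warp-wide sum using __shfl_down_sync (available since CUDA 9); the helper name warpSum is hypothetical:

```cuda
__inline__ __device__ float warpSum(float v)
{
    // Halve the active distance each step: 16, 8, 4, 2, 1.
    for (int offset = 16; offset > 0; offset /= 2)
        v += __shfl_down_sync(0xffffffff, v, offset);  // add the value from lane id + offset
    return v;   // after the loop, lane 0 holds the sum over the whole warp
}
```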
Applications of GPU Computing
• GPUs are extensively used in:
• A- Deep Learning & AI: Accelerating neural network training and
inference (e.g., TensorFlow, PyTorch).
• B- Scientific Computing: Simulating physical phenomena, fluid
dynamics, and quantum mechanics.
• C- Cryptography & Blockchain: Processing cryptographic algorithms
and mining cryptocurrencies.
• D- Gaming & Graphics: Real-time rendering and physics-based
simulations.
• E- Financial Modeling: Risk analysis and Monte Carlo simulations.