Parallel and Distributed
Computing
BSCS 20F Morning
Dawood University of Engineering and Technology
Dr. Sarwat Iqbal
Lecture Topics
• GPU
• GPU Architecture
• GPU Programming
• Key Concepts
• Parallel Computing Model
• Applications of GPU Programming
• GPU Programming Languages
Graphics Processing Unit (GPU)
• GPU stands for Graphics Processing Unit.
• It's a specialized electronic circuit designed to accelerate the creation and
rendering of images, videos, and animations.
• Originally, GPUs were developed primarily to handle the complex calculations
required for rendering graphics in video games, but they have since found
applications in various fields such as scientific simulations, artificial
intelligence, cryptocurrency mining, and more.
Graphics Processing Unit (GPU)
• GPUs are parallel processors, meaning they can perform many calculations
simultaneously, making them well-suited for tasks that involve processing
large amounts of data in parallel.
• This parallel processing capability also makes GPUs useful for tasks like
machine learning and deep learning, where many simple calculations need to
be performed simultaneously.
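As a rough CPU-side analogy (an illustrative sketch, not actual GPU code), the "apply one simple operation to many elements at once" pattern looks like this in Python, with a small thread pool standing in for the GPU's thousands of lightweight threads:

```python
from concurrent.futures import ThreadPoolExecutor

def scale(x):
    # The same simple operation is applied to every element.
    return 2 * x

data = list(range(8))

# A GPU would run one lightweight thread per element; here a
# thread pool merely stands in for that massive parallelism.
with ThreadPoolExecutor(max_workers=4) as pool:
    result = list(pool.map(scale, data))
```

The key property is that each element's computation is independent of the others, so they can all proceed at the same time.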
GPU Architecture
• GPU architecture refers to the design and organization of the various
components within a graphics processing unit (GPU).
• It encompasses the layout of processing cores, memory subsystems, cache
structures, interconnects, and other components that work together to perform
computations and render graphics.
GPU Architecture
• The architecture of a GPU typically includes the following key elements:
1. Processing Cores: These are the computational units responsible for executing instructions and
performing calculations. Modern GPUs contain thousands of processing cores, organized into
streaming multiprocessors (SMs) or similar units.
2. Memory Subsystem: This includes various types of memory such as VRAM (Video RAM),
which stores textures, frame buffers, and other graphics data. The memory subsystem also
includes cache memory to speed up data access.
3. Interconnects: These are the pathways that allow communication between different components
of the GPU, such as between processing cores, memory, and other units.
GPU Architecture
4. Instruction Set Architecture (ISA): This defines the set of instructions that the GPU can
execute and the organization of these instructions.
5. Control Units: These units manage the flow of instructions and data within the GPU,
orchestrating the execution of programs and coordinating various tasks.
6. Specialized Units: Some GPUs may include specialized units for specific tasks such as
geometry processing, tessellation, ray tracing, or tensor operations in the case of AI-
focused architectures.
GPU Architecture
• GPU architectures can vary significantly between different manufacturers and
product lines, with each architecture optimized for specific applications or
performance targets.
• Major GPU manufacturers like NVIDIA and AMD continuously develop and
refine their architectures to improve performance, power efficiency, and
support for new features and technologies.
GPU Programming
• GPU programming refers to the process of writing and executing code that harnesses the
computational power of a Graphics Processing Unit (GPU) to perform tasks beyond
traditional graphics rendering.
• GPU programming typically involves utilizing the parallel processing capabilities of GPUs
to accelerate computations in various domains, such as scientific simulations, machine
learning, data analytics, and more.
Key Concepts in GPU Programming
1. Parallelism: GPUs excel at parallel execution of tasks. GPU programs are structured to exploit
parallelism by dividing workloads into smaller tasks that can be executed simultaneously across
processing cores.
2. Thread Hierarchy: GPU programs organize computation into threads, with threads grouped into
blocks, and blocks organized into a grid. This hierarchical structure allows for efficient
management of parallel execution.
3. Memory Management: Efficient memory access is crucial for GPU performance. GPU
programming involves managing different memory types, optimizing memory access patterns,
and minimizing data transfers between CPU and GPU memory.
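The grid/block/thread numbering above boils down to simple index arithmetic; in CUDA C the global index is computed as `blockIdx.x * blockDim.x + threadIdx.x`. A minimal Python simulation (illustrative only, no real GPU involved):

```python
def global_thread_id(block_idx, block_dim, thread_idx):
    # Each thread derives a unique global index from its position
    # in the hierarchy: grid -> block -> thread.
    return block_idx * block_dim + thread_idx

# A grid of 4 blocks with 256 threads per block covers 1024 elements.
ids = [global_thread_id(b, 256, t) for b in range(4) for t in range(256)]
assert ids == list(range(1024))  # every element gets exactly one thread
```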
Key Concepts in GPU Programming
4. Optimization Techniques: GPU programming requires optimizing algorithms and code for
parallel execution. Techniques such as loop unrolling, data tiling, and shared memory utilization
are employed to maximize performance.
5. Data Parallelism vs. Task Parallelism: GPU programs can exploit both data parallelism (where
the same operation is performed on multiple data elements concurrently) and task parallelism
(where independent tasks are executed simultaneously).
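The contrast between the two styles can be sketched in plain Python (a CPU-side analogy, not GPU code):

```python
from concurrent.futures import ThreadPoolExecutor

data = [1, 2, 3, 4]

with ThreadPoolExecutor() as pool:
    # Data parallelism: the SAME operation on many elements at once.
    squares = list(pool.map(lambda x: x * x, data))

    # Task parallelism: DIFFERENT independent tasks run concurrently.
    f_sum = pool.submit(sum, data)
    f_max = pool.submit(max, data)
    total, biggest = f_sum.result(), f_max.result()
```

GPUs are strongest at the first style; the second maps more naturally onto CPU cores or onto independent kernel launches.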
GPU Programming
1. Parallel Computing Model:
• GPUs excel at parallel processing, allowing them to execute thousands of
computational tasks simultaneously.
• Unlike CPUs, which typically have a few powerful cores optimized for
sequential processing, GPUs contain numerous smaller cores optimized for
parallel workloads.
• GPU programming harnesses this parallelism to accelerate computations.
GPU Programming
2. Programming Models:
• Graphics APIs: Historically, GPU programming was primarily done through graphics APIs like OpenGL
and DirectX. These APIs were designed for graphics rendering but can also be used for general-purpose
computing (GPGPU) through techniques like shader programming.
• GPGPU APIs: To enable more general-purpose computation on GPUs, specialized APIs such as NVIDIA's
CUDA (Compute Unified Device Architecture) and OpenCL (Open Computing Language) have been
developed. These APIs provide a lower-level programming interface for directly controlling and utilizing the
computational resources of GPUs.
• High-Level Frameworks: High-level frameworks and libraries built on top of GPGPU APIs, such as
NVIDIA's cuDNN (CUDA Deep Neural Network library) for deep learning or libraries like OpenACC and
OpenMP for parallel programming, provide abstractions that simplify GPU programming by handling low-
level details and optimizations.
GPU Programming
3. Data Parallelism:
• GPU programming often leverages data parallelism, where the same computation is
applied to multiple data elements simultaneously.
• This is achieved by launching a large number of threads, each responsible for processing a
different data element in parallel. GPUs automatically manage thread scheduling and
execution across their cores.
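A classic example is vector addition, where one thread is launched per element. This Python sketch (a hypothetical, sequential simulation of the per-thread kernel body) shows the idea:

```python
def vector_add_kernel(tid, a, b, out):
    # Body of a per-thread "kernel": each thread handles one index.
    if tid < len(a):  # bounds guard, as in real kernels, for tid >= n
        out[tid] = a[tid] + b[tid]

n = 6
a = [1, 2, 3, 4, 5, 6]
b = [10, 20, 30, 40, 50, 60]
out = [0] * n

# On a GPU all n threads would run concurrently; here we loop
# over thread IDs to simulate the launch.
for tid in range(n):
    vector_add_kernel(tid, a, b, out)
```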
GPU Programming
4. Memory Hierarchy:
• Global Memory: GPU cores have access to global memory, which is shared among all threads.
However, accessing global memory can be slower due to latency and limited bandwidth.
• Shared Memory: Each streaming multiprocessor (SM) typically has its own shared memory, which is
much faster than global memory. Shared memory is shared among the threads of a single block (a
group of threads scheduled together on the same SM).
• Registers and Local Memory: Each thread has access to its own registers for fast, private
storage. If the data does not fit in registers, it spills over to local memory, which is slower than
registers but faster than global memory.
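The shared-memory idea can be sketched as a per-block scratch buffer: each block first copies its slice of slow "global" data into a small fast buffer, reduces it there, and writes one result back. The Python below is purely illustrative (lists standing in for the two memory spaces):

```python
def block_sum(global_data, block_idx, block_dim):
    # Stage 1: the block's threads cooperatively load their slice of
    # slow "global memory" into fast per-block "shared memory".
    start = block_idx * block_dim
    shared = global_data[start:start + block_dim]
    # Stage 2: the block reduces entirely within shared memory.
    return sum(shared)

data = list(range(16))
block_dim = 4
# One partial result per block, then a final combine on the host.
partials = [block_sum(data, b, block_dim) for b in range(4)]
total = sum(partials)
```

Staging data this way pays the global-memory cost once per element instead of once per access.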
GPU Programming
5. Thread Hierarchy:
• Thread Blocks: Threads are organized into groups called thread blocks. Threads within
the same block can communicate and synchronize using shared memory and
synchronization primitives.
• Grids: Thread blocks are organized into a grid. Each block can be executed
independently, and multiple blocks can execute concurrently on different multiprocessors.
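Choosing how many blocks to launch for N elements is usually a ceiling division, so the grid covers every element even when N is not a multiple of the block size. A common sketch:

```python
def grid_size(n_elements, threads_per_block):
    # Ceiling division: enough blocks so every element gets a thread,
    # even when n_elements is not a multiple of the block size.
    return (n_elements + threads_per_block - 1) // threads_per_block

blocks = grid_size(1000, 256)  # 1000 elements, 256 threads per block
```

The surplus threads in the final block are masked off by a bounds check inside the kernel.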
GPU Programming
6. Optimizations:
• Memory Access Patterns: Optimizing memory access patterns to maximize memory bandwidth
utilization and minimize latency.
• Thread Divergence: Minimizing thread divergence so that threads within the same warp (a fixed-size
group, typically 32 threads, that executes instructions in lockstep) follow the same instruction path;
divergent branches within a warp are serialized and incur performance penalties.
• Occupancy: Maximizing GPU occupancy by efficiently utilizing computational resources to keep
all cores busy.
• Kernel Fusion: Combining multiple computational tasks into a single GPU kernel to reduce
overhead and improve efficiency.
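Kernel fusion can be illustrated by contrasting two passes over the data with one fused pass. This Python is only a sketch; on a GPU each unfused pass would be a separate kernel launch with a round trip through global memory:

```python
data = [1.0, 2.0, 3.0]

# Unfused: two separate "kernels", with the intermediate result
# written to and re-read from (slow) global memory between them.
scaled = [x * 2.0 for x in data]      # kernel 1
shifted = [x + 1.0 for x in scaled]   # kernel 2

# Fused: one kernel does both operations per element, keeping the
# intermediate value in a register instead of global memory.
fused = [x * 2.0 + 1.0 for x in data]

assert fused == shifted
```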
GPU Programming
7. Debugging and Profiling:
• GPU programming often involves debugging and profiling tools to identify performance bottlenecks,
memory access errors, and other issues.
• Tools like NVIDIA Nsight, AMD CodeXL, and Intel VTune provide insights into GPU execution and help
optimize code.
GPU Programming
• Overall, GPU programming offers significant performance benefits for
parallelizable tasks but requires a good understanding of parallel computing
concepts, GPU architecture, and optimization techniques to achieve optimal
performance.
Applications of GPU Programming:
• Graphics Rendering: GPUs are widely used for real-time rendering of 3D graphics in video games,
virtual reality (VR), and computer-aided design (CAD) applications.
• Scientific Computing: GPUs accelerate scientific simulations, computational fluid dynamics
(CFD), weather forecasting, and other complex calculations in fields such as physics, chemistry, and
engineering.
• Machine Learning and AI: GPUs power deep learning algorithms for tasks like image recognition,
natural language processing (NLP), and speech recognition. Their parallel processing capabilities
enable fast training and inference in neural networks.
Applications of GPU Programming:
• Data Processing and Analytics: GPUs accelerate data processing tasks such as data mining, signal
processing, and database queries. They are used in big data analytics, financial modeling, and
scientific data analysis.
• High-Performance Computing (HPC): GPUs are integral to HPC clusters for solving large-scale
computational problems in areas like computational biology, molecular dynamics, and numerical
simulations.
GPU Programming Languages:
• GPU programming is supported in several languages, each with its
own set of libraries and frameworks tailored to interact with the
GPU.
• Some of the most commonly used languages for GPU programming
include:
GPU Programming Languages:
• CUDA (Compute Unified Device Architecture):
• Developed by NVIDIA, CUDA is a parallel computing platform and
programming model specifically designed for NVIDIA GPUs.
• CUDA provides a C/C++ programming interface with extensions for
parallelism, allowing developers to write code directly targeting NVIDIA
GPUs.
GPU Programming Languages:
• OpenCL (Open Computing Language):
• OpenCL is an open standard for parallel programming across heterogeneous
platforms, including GPUs, CPUs, and other accelerators.
• It offers a C-like programming language with APIs for writing parallel code
that can execute across various devices, including those from NVIDIA,
AMD, Intel, and others.
GPU Programming Languages:
• OpenGL and Vulkan:
• While primarily graphics APIs, OpenGL and Vulkan can also be used for
GPU computing tasks.
• They provide compute shaders, which allow developers to perform general-
purpose computation on the GPU within the graphics pipeline.
• However, their use for non-graphics tasks is less common compared to
CUDA and OpenCL.
GPU Programming Languages:
• DirectX Compute (DirectCompute):
• DirectCompute is part of the Microsoft DirectX API and enables general-
purpose computing on GPUs using shaders within the DirectX framework.
• While primarily used for graphics rendering, DirectCompute can also be
utilized for parallel processing tasks on compatible GPUs.
GPU Programming Languages:
• SYCL:
• SYCL (pronounced "sickle") is a higher-level programming model built on
top of OpenCL.
• It allows developers to write code in C++ and take advantage of OpenCL's
parallel execution capabilities.
• SYCL provides a more modern C++ interface compared to traditional
OpenCL programming.
GPU Programming Languages:
• CUDA Python (CuPy) and Numba:
• For Python developers, there are libraries like CuPy and Numba that allow
GPU programming using CUDA.
• CuPy provides a NumPy-like interface for GPU arrays and operations, while
Numba offers just-in-time (JIT) compilation of Python functions for
execution on the GPU.
GPU Programming Languages:
• ROCm (Radeon Open Compute):
• Developed by AMD, ROCm is an open-source platform for GPU computing.
• It supports various programming languages, including C/C++, Python, and Fortran,
and provides libraries and tools for developing GPU-accelerated applications on AMD
GPUs.
• ROCm consists of a collection of drivers, development tools, and APIs that enable
GPU programming from low-level kernel to end-user applications.
GPU Programming Languages:
• These languages and frameworks offer different levels of abstraction and
support for GPU programming, catering to diverse developer preferences and
requirements.
• The choice of language often depends on factors such as hardware
compatibility, performance goals, existing codebase, and developer expertise.
END