
PDC 21 - Graphical Processing Unit

The document provides an overview of Graphics Processing Units (GPUs), highlighting their architecture, programming models, and applications. It contrasts GPUs with Central Processing Units (CPUs) in terms of parallel processing capabilities and details the components of GPU architecture, including CUDA cores and memory hierarchy. Additionally, it discusses programming frameworks like CUDA and OpenCL, optimization techniques, and various applications of GPU computing in fields such as deep learning, scientific computing, and gaming.


Graphical Processing Unit
Parallel and Distributed Computing
Arfan Shahzad
{ [email protected] }
GPU Architecture & Programming

• Graphics Processing Units (GPUs) are specialized hardware designed for highly parallel computing tasks.

• Unlike CPUs, which have a few powerful cores optimized for sequential processing, GPUs consist of thousands of smaller cores that execute tasks in parallel.


GPU vs. CPU Architecture
• GPUs differ from Central Processing Units (CPUs) in terms of architecture and execution model:

• CPU: Optimized for sequential tasks, with few powerful cores, complex control logic, and large caches.

• GPU: Optimized for parallelism, with thousands of simple cores executing multiple threads simultaneously.
Feature            CPU             GPU
-----------------  --------------  -------------------
Cores              Few (4–64)      Thousands
Execution Model    Sequential      Parallel
Latency            Low             High
Throughput         Low             High
Control Flow       Complex         Simple
Memory Hierarchy   Large caches    Small shared memory
Architectural components
• A typical GPU consists of the following architectural components:

• 1- Streaming Multiprocessors (SMs): A GPU consists of multiple SMs, each containing numerous CUDA cores (or streaming processors) that perform computations in parallel.

• 2- CUDA Cores: Individual processing units within SMs, executing instructions in parallel.
• 3- Memory Hierarchy: GPUs have multiple memory types, including:

• Global Memory: Large but relatively slow, accessible by all threads.

• Shared Memory: Faster, limited in size, shared within a single SM.

• Registers: Fastest memory, private to each thread.

• Texture and Constant Memory: Optimized for specific read patterns and frequently used data.
• 4- Warp-based Execution: Threads are organized into warps (typically 32 threads), which execute instructions in lockstep.

• 5- Memory Controller: Manages access to different memory types and optimizes bandwidth.

• 6- SIMD Execution Model: GPUs follow the Single Instruction, Multiple Data (SIMD) model, where multiple threads execute the same instruction on different data.
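The SIMD model can be illustrated with a minimal CUDA kernel sketch (the kernel name and parameters are illustrative): every thread executes the same instruction stream, but each derives a different array index from its position in the grid, so the same instruction operates on different data.

```cuda
// Minimal SIMD/SIMT sketch: one instruction stream, many data elements.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // unique index per thread
    if (i < n)               // guard: the grid may be larger than the array
        data[i] *= factor;   // same instruction, different data element
}
```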
Programming models
• Programming a GPU involves writing parallel code using specialized frameworks. The most common models include:

• 1- CUDA (Compute Unified Device Architecture): A parallel computing platform and API from NVIDIA that allows direct programming of GPUs using C/C++. Key features of CUDA include:
• A- CUDA Kernels: Functions executed on the GPU, written in CUDA.

• B- CUDA Threads and Blocks: Threads are grouped into blocks, and blocks form a grid, allowing scalable parallel execution.

• C- Memory Management: Optimizing memory access patterns (e.g., coalesced memory access) is crucial for performance.
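Coalesced access can be sketched as follows (illustrative kernels, not from the slides): when consecutive threads in a warp touch consecutive addresses, the hardware merges their accesses into a few memory transactions; a strided pattern cannot be merged and wastes bandwidth.

```cuda
// Coalesced: thread i reads element i — consecutive threads hit
// consecutive addresses, which the hardware merges into few transactions.
__global__ void copyCoalesced(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

// Strided: thread i reads element i * stride — accesses are scattered,
// so each warp needs many separate memory transactions.
__global__ void copyStrided(const float *in, float *out, int n, int stride) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i * stride < n) out[i] = in[i * stride];
}
```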


• D- Streams and Asynchronous Execution: CUDA streams allow overlapping computation and memory transfers.
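A sketch of how streams are typically used (the kernel, buffer names, and launch sizes are placeholders; error checking omitted): two independent chunks are queued in separate streams, so the copy for one chunk can overlap the kernel for the other.

```cuda
__global__ void scaleK(float *d, float f, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= f;
}

// h_a/h_b are assumed to be pinned host buffers, d_a/d_b device buffers.
void twoStreams(float *h_a, float *h_b, float *d_a, float *d_b,
                size_t bytes, int n, int grid, int block) {
    cudaStream_t s0, s1;
    cudaStreamCreate(&s0);
    cudaStreamCreate(&s1);

    // Chunk 0 queued in s0: its copy can overlap chunk 1's work below.
    cudaMemcpyAsync(d_a, h_a, bytes, cudaMemcpyHostToDevice, s0);
    scaleK<<<grid, block, 0, s0>>>(d_a, 2.0f, n);

    // Chunk 1 queued independently in s1.
    cudaMemcpyAsync(d_b, h_b, bytes, cudaMemcpyHostToDevice, s1);
    scaleK<<<grid, block, 0, s1>>>(d_b, 2.0f, n);

    cudaStreamSynchronize(s0);
    cudaStreamSynchronize(s1);
    cudaStreamDestroy(s0);
    cudaStreamDestroy(s1);
}
```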


• 2- OpenCL (Open Computing Language): A framework for writing programs that execute across heterogeneous platforms, including GPUs, CPUs, and FPGAs.

• 3- HIP (Heterogeneous-Computing Interface for Portability): A CUDA-like framework for AMD GPUs.


CUDA Programming Model
• CUDA is a widely used framework for GPU programming. It follows a
hierarchical execution model:

• Thread: Smallest execution unit.

• Block: A group of threads sharing memory and synchronization mechanisms.

• Grid: A collection of thread blocks.
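The thread/block/grid hierarchy maps directly onto launch syntax, as in this sketch (kernel name and sizes are illustrative; `threadIdx`, `blockIdx`, `blockDim`, and `dim3` are the CUDA built-ins):

```cuda
__global__ void setOnes(float *a, int width, int height) {
    // Each thread's coordinates come from its block and its position in it.
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height)
        a[y * width + x] = 1.0f;
}

// Host side: a 2D grid of 2D blocks covering a width x height image.
//   dim3 block(16, 16);                              // 256 threads per block
//   dim3 grid((width + 15) / 16, (height + 15) / 16); // enough blocks to cover
//   setOnes<<<grid, block>>>(d_a, width, height);
```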


• A basic CUDA program consists of:

• A- Kernel Function: Defines the computation to be executed on the GPU.

• B- Memory Management: Allocating and transferring data between CPU (host) and GPU (device).

• C- Launching Kernel: Configuring execution parameters (grid and block sizes).

• D- Synchronizing Threads: Ensuring proper execution order.
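Steps A–D fit together roughly as in this minimal vector-add sketch (error checking omitted for brevity):

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];               // A: the kernel function
}

int main(void) {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    float *h_a = (float *)malloc(bytes), *h_b = (float *)malloc(bytes),
          *h_c = (float *)malloc(bytes);
    for (int i = 0; i < n; i++) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    float *d_a, *d_b, *d_c;                      // B: device allocation...
    cudaMalloc(&d_a, bytes); cudaMalloc(&d_b, bytes); cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);  // ...and transfer
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    int block = 256, grid = (n + block - 1) / block;
    vecAdd<<<grid, block>>>(d_a, d_b, d_c, n);   // C: launch configuration
    cudaDeviceSynchronize();                     // D: wait for completion

    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", h_c[0]);               // expect 3.0

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}
```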


Memory Management in GPU
• Efficient memory management is crucial for performance optimization:

• A- Memory Coalescing: Ensuring threads access consecutive memory locations.

• B- Shared Memory Usage: Reducing global memory accesses by using shared memory.
• C- Register Optimization: Minimizing register spilling to avoid performance degradation.

• D- Unified Memory: Allowing CPU and GPU to share memory seamlessly.
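Shared memory (B) and unified memory (D) can be sketched together as follows (illustrative names; error handling omitted): the kernel stages data once per block into fast shared memory and reuses it, while `cudaMallocManaged` gives host and device one shared pointer whose pages the runtime migrates on demand.

```cuda
// B- Shared memory: stage once per block, then reuse.
// (Assumes n is a multiple of the 256-thread block size to keep it short.)
__global__ void blockReverse(const float *in, float *out) {
    __shared__ float tile[256];                  // per-block shared memory
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = in[i];                   // one global read per thread
    __syncthreads();                             // loads visible block-wide
    out[i] = tile[blockDim.x - 1 - threadIdx.x]; // served from shared memory
}

// D- Unified memory: one allocation visible to both CPU and GPU.
void run(int n) {
    float *in, *out;
    cudaMallocManaged(&in,  n * sizeof(float));
    cudaMallocManaged(&out, n * sizeof(float));
    for (int i = 0; i < n; i++) in[i] = (float)i;  // host writes directly
    blockReverse<<<n / 256, 256>>>(in, out);       // device uses same pointers
    cudaDeviceSynchronize();                       // before the host reads out[]
    cudaFree(in); cudaFree(out);
}
```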
Optimization Techniques
• To achieve high performance in GPU programming, the following
optimizations are applied:

• A- Thread Block Sizing: Choosing optimal thread and block sizes for
maximum occupancy.

• B- Loop Unrolling: Reducing loop overhead by manually unrolling iterations.
• C- Warp-Level Synchronization: Utilizing warp-level primitives for faster communication.

• D- Occupancy Maximization: Ensuring high SM utilization by managing resource allocation.

• E- Reducing Divergence: Minimizing branch divergence within warps.
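Two of these techniques in miniature (an illustrative sketch, not from the slides): `#pragma unroll` asks the compiler to unroll a fixed-trip-count loop (B), and replacing a data-dependent branch with branchless arithmetic avoids splitting a warp into serialized paths (E).

```cuda
__global__ void demo(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    // B- Loop unrolling: a fixed trip count lets the compiler unroll fully,
    // removing the loop counter and branch overhead.
    float acc = 0.0f;
    #pragma unroll
    for (int k = 0; k < 4; k++)
        acc += x[i] * k;

    // E- Reducing divergence: branchless max(acc, 0) instead of an if/else,
    // which would split the warp into two serialized execution paths.
    x[i] = fmaxf(acc, 0.0f);
}
```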


Applications of GPU Computing
• GPUs are extensively used in:

• A- Deep Learning & AI: Accelerating neural network training and inference (e.g., TensorFlow, PyTorch).

• B- Scientific Computing: Simulating physical phenomena, fluid dynamics, and quantum mechanics.


• C- Cryptography & Blockchain: Processing cryptographic algorithms and mining cryptocurrencies.

• D- Gaming & Graphics: Real-time rendering and physics-based simulations.

• E- Financial Modeling: Risk analysis and Monte Carlo simulations.
