Parallel and Distributed
Computing
BSCS 20F Morning
Dawood University of Engineering and Technology
Dr. Sarwat Iqbal
Lecture Topics
• GPU
• GPU Architecture
• GPU Programming
• Key Concepts
• Parallel Computing Model
• Applications of GPU Programming
• GPU Programming Languages
Graphics Processing Unit (GPU)
• GPU stands for Graphics Processing Unit.
• It's a specialized electronic circuit designed to accelerate the creation and
rendering of images, videos, and animations.
• Originally, GPUs were developed primarily to handle the complex calculations
required for rendering graphics in video games, but they have since found
applications in various fields such as scientific simulations, artificial
intelligence, cryptocurrency mining, and more.
Graphics Processing Unit (GPU)
• GPUs are parallel processors, meaning they can perform many calculations
simultaneously, making them well-suited for tasks that involve processing
large amounts of data in parallel.
• This parallel processing capability also makes GPUs useful for tasks like
machine learning and deep learning, where many simple calculations need to
be performed simultaneously.
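As a rough CPU-side analogy (an illustrative sketch, not actual GPU code), the "apply one simple operation to many elements at once" pattern looks like this in Python, with a small thread pool standing in for the GPU's thousands of lightweight threads:

```python
from concurrent.futures import ThreadPoolExecutor

def scale(x):
    # The same simple operation is applied to every element.
    return 2 * x

data = list(range(8))

# A GPU would run one lightweight thread per element; here a
# thread pool merely stands in for that massive parallelism.
with ThreadPoolExecutor(max_workers=4) as pool:
    result = list(pool.map(scale, data))
```

The key property is that each element's computation is independent of the others, so they can all proceed at the same time.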
GPU Architecture
• GPU architecture refers to the design and organization of the various
components within a graphics processing unit (GPU).
• It encompasses the layout of processing cores, memory subsystems, cache
structures, interconnects, and other components that work together to perform
computations and render graphics.
GPU Architecture
• The architecture of a GPU typically includes the following key elements:
1. Processing Cores: These are the computational units responsible for executing instructions and
performing calculations. Modern GPUs contain thousands of processing cores, organized into
streaming multiprocessors (SMs) or similar units.
2. Memory Subsystem: This includes various types of memory such as VRAM (Video RAM),
which stores textures, frame buffers, and other graphics data. The memory subsystem also
includes cache memory to speed up data access.
3. Interconnects: These are the pathways that allow communication between different components
of the GPU, such as between processing cores, memory, and other units.
GPU Architecture
4. Instruction Set Architecture (ISA): This defines the set of instructions that the GPU can
execute and the organization of these instructions.
5. Control Units: These units manage the flow of instructions and data within the GPU,
orchestrating the execution of programs and coordinating various tasks.
6. Specialized Units: Some GPUs may include specialized units for specific tasks such as
geometry processing, tessellation, ray tracing, or tensor operations in the case of AI-
focused architectures.
GPU Architecture
• GPU architectures can vary significantly between different manufacturers and
product lines, with each architecture optimized for specific applications or
performance targets.
• Major GPU manufacturers like NVIDIA and AMD continuously develop and
refine their architectures to improve performance, power efficiency, and
support for new features and technologies.
GPU Programming
• GPU programming refers to the process of writing and executing code that harnesses the
computational power of a Graphics Processing Unit (GPU) to perform tasks beyond
traditional graphics rendering.
• GPU programming typically involves utilizing the parallel processing capabilities of GPUs
to accelerate computations in various domains, such as scientific simulations, machine
learning, data analytics, and more.
Key Concepts in GPU Programming
1. Parallelism: GPUs excel at parallel execution of tasks. GPU programs are structured to exploit
parallelism by dividing workloads into smaller tasks that can be executed simultaneously across
processing cores.
2. Thread Hierarchy: GPU programs organize computation into threads, with threads grouped into
blocks, and blocks organized into a grid. This hierarchical structure allows for efficient
management of parallel execution.
3. Memory Management: Efficient memory access is crucial for GPU performance. GPU
programming involves managing different memory types, optimizing memory access patterns,
and minimizing data transfers between CPU and GPU memory.
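The grid/block/thread numbering above boils down to simple index arithmetic; in CUDA C the global index is computed as `blockIdx.x * blockDim.x + threadIdx.x`. A minimal Python simulation (illustrative only, no real GPU involved):

```python
def global_thread_id(block_idx, block_dim, thread_idx):
    # Each thread derives a unique global index from its position
    # in the hierarchy: grid -> block -> thread.
    return block_idx * block_dim + thread_idx

# A grid of 4 blocks with 256 threads per block covers 1024 elements.
ids = [global_thread_id(b, 256, t) for b in range(4) for t in range(256)]
assert ids == list(range(1024))  # every element gets exactly one thread
```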
Key Concepts in GPU Programming
4. Optimization Techniques: GPU programming requires optimizing algorithms and code for
parallel execution. Techniques such as loop unrolling, data tiling, and shared memory utilization
are employed to maximize performance.
5. Data Parallelism vs. Task Parallelism: GPU programs can exploit both data parallelism (where
the same operation is performed on multiple data elements concurrently) and task parallelism
(where independent tasks are executed simultaneously).
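The contrast between the two styles can be sketched in plain Python (a CPU-side analogy, not GPU code):

```python
from concurrent.futures import ThreadPoolExecutor

data = [1, 2, 3, 4]

with ThreadPoolExecutor() as pool:
    # Data parallelism: the SAME operation on many elements at once.
    squares = list(pool.map(lambda x: x * x, data))

    # Task parallelism: DIFFERENT independent tasks run concurrently.
    f_sum = pool.submit(sum, data)
    f_max = pool.submit(max, data)
    total, biggest = f_sum.result(), f_max.result()
```

GPUs are strongest at the first style; the second maps more naturally onto CPU cores or onto independent kernel launches.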
GPU Programming
1. Parallel Computing Model:
• GPUs excel at parallel processing, allowing them to execute thousands of
computational tasks simultaneously.
• Unlike CPUs, which typically have a few powerful cores optimized for
sequential processing, GPUs contain numerous smaller cores optimized for
parallel workloads.
• GPU programming harnesses this parallelism to accelerate computations.
GPU Programming
2. Programming Models:
• Graphics APIs: Historically, GPU programming was primarily done through graphics APIs like OpenGL
and DirectX. These APIs were designed for graphics rendering but can also be used for general-purpose
computing (GPGPU) through techniques like shader programming.
• GPGPU APIs: To enable more general-purpose computation on GPUs, specialized APIs such as NVIDIA's
CUDA (Compute Unified Device Architecture) and OpenCL (Open Computing Language) have been
developed. These APIs provide a lower-level programming interface for directly controlling and utilizing the
computational resources of GPUs.
• High-Level Frameworks: High-level frameworks and libraries built on top of GPGPU APIs, such as
NVIDIA's cuDNN (CUDA Deep Neural Network library) for deep learning or libraries like OpenACC and
OpenMP for parallel programming, provide abstractions that simplify GPU programming by handling low-
level details and optimizations.
GPU Programming
3. Data Parallelism:
• GPU programming often leverages data parallelism, where the same computation is
applied to multiple data elements simultaneously.
• This is achieved by launching a large number of threads, each responsible for processing a
different data element in parallel. GPUs automatically manage thread scheduling and
execution across their cores.
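A classic example is vector addition, where one thread is launched per element. This Python sketch (a hypothetical, sequential simulation of the per-thread kernel body) shows the idea:

```python
def vector_add_kernel(tid, a, b, out):
    # Body of a per-thread "kernel": each thread handles one index.
    if tid < len(a):  # bounds guard, as in real kernels, for tid >= n
        out[tid] = a[tid] + b[tid]

n = 6
a = [1, 2, 3, 4, 5, 6]
b = [10, 20, 30, 40, 50, 60]
out = [0] * n

# On a GPU all n threads would run concurrently; here we loop
# over thread IDs to simulate the launch.
for tid in range(n):
    vector_add_kernel(tid, a, b, out)
```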
GPU Programming
4. Memory Hierarchy:
• Global Memory: GPU cores have access to global memory, which is shared among all threads.
However, accessing global memory can be slower due to latency and limited bandwidth.
• Shared Memory: Each streaming multiprocessor (SM) typically has its own shared memory, which is
much faster than global memory. Shared memory is shared among the threads of a single block (a
group of threads scheduled together on the same SM).
• Registers and Local Memory: Each thread has access to its own registers for fast, private
storage. If the data does not fit in registers, it spills over to local memory, which is slower than
registers but faster than global memory.
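The shared-memory idea can be sketched as a per-block scratch buffer: each block first copies its slice of slow "global" data into a small fast buffer, reduces it there, and writes one result back. The Python below is purely illustrative (lists standing in for the two memory spaces):

```python
def block_sum(global_data, block_idx, block_dim):
    # Stage 1: the block's threads cooperatively load their slice of
    # slow "global memory" into fast per-block "shared memory".
    start = block_idx * block_dim
    shared = global_data[start:start + block_dim]
    # Stage 2: the block reduces entirely within shared memory.
    return sum(shared)

data = list(range(16))
block_dim = 4
# One partial result per block, then a final combine on the host.
partials = [block_sum(data, b, block_dim) for b in range(4)]
total = sum(partials)
```

Staging data this way pays the global-memory cost once per element instead of once per access.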
GPU Programming
5. Thread Hierarchy:
• Thread Blocks: Threads are organized into groups called thread blocks. Threads within
the same block can communicate and synchronize using shared memory and
synchronization primitives.
• Grids: Thread blocks are organized into a grid. Each block can be executed
independently, and multiple blocks can execute concurrently on different multiprocessors.
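Choosing how many blocks to launch for N elements is usually a ceiling division, so the grid covers every element even when N is not a multiple of the block size. A common sketch:

```python
def grid_size(n_elements, threads_per_block):
    # Ceiling division: enough blocks so every element gets a thread,
    # even when n_elements is not a multiple of the block size.
    return (n_elements + threads_per_block - 1) // threads_per_block

blocks = grid_size(1000, 256)  # 1000 elements, 256 threads per block
```

The surplus threads in the final block are masked off by a bounds check inside the kernel.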
GPU Programming
6. Optimizations:
• Memory Access Patterns: Optimizing memory access patterns to maximize memory bandwidth
utilization and minimize latency.
• Thread Divergence: Minimizing thread divergence so that threads within the same warp (a fixed-size
group, typically 32 threads, that executes instructions in lockstep) follow the same instruction path;
divergent branches within a warp are serialized and incur performance penalties.
• Occupancy: Maximizing GPU occupancy by efficiently utilizing computational resources to keep
all cores busy.
• Kernel Fusion: Combining multiple computational tasks into a single GPU kernel to reduce
overhead and improve efficiency.
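Kernel fusion can be illustrated by contrasting two passes over the data with one fused pass. This Python is only a sketch; on a GPU each unfused pass would be a separate kernel launch with a round trip through global memory:

```python
data = [1.0, 2.0, 3.0]

# Unfused: two separate "kernels", with the intermediate result
# written to and re-read from (slow) global memory between them.
scaled = [x * 2.0 for x in data]      # kernel 1
shifted = [x + 1.0 for x in scaled]   # kernel 2

# Fused: one kernel does both operations per element, keeping the
# intermediate value in a register instead of global memory.
fused = [x * 2.0 + 1.0 for x in data]

assert fused == shifted
```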
GPU Programming
7. Debugging and Profiling:
• GPU programming often involves debugging and profiling tools to identify performance bottlenecks,
memory access errors, and other issues.
• Tools like NVIDIA Nsight, AMD CodeXL, and Intel VTune provide insights into GPU execution and help
optimize code.
GPU Programming
• Overall, GPU programming offers significant performance benefits for
parallelizable tasks but requires a good understanding of parallel computing
concepts, GPU architecture, and optimization techniques to achieve optimal
performance.
Applications of GPU Programming:
• Graphics Rendering: GPUs are widely used for real-time rendering of 3D graphics in video games,
virtual reality (VR), and computer-aided design (CAD) applications.
• Scientific Computing: GPUs accelerate scientific simulations, computational fluid dynamics
(CFD), weather forecasting, and other complex calculations in fields such as physics, chemistry, and
engineering.
• Machine Learning and AI: GPUs power deep learning algorithms for tasks like image recognition,
natural language processing (NLP), and speech recognition. Their parallel processing capabilities
enable fast training and inference in neural networks.
Applications of GPU Programming:
• Data Processing and Analytics: GPUs accelerate data processing tasks such as data mining, signal
processing, and database queries. They are used in big data analytics, financial modeling, and
scientific data analysis.
• High-Performance Computing (HPC): GPUs are integral to HPC clusters for solving large-scale
computational problems in areas like computational biology, molecular dynamics, and numerical
simulations.
GPU Programming Languages:
• GPU programming is supported in several languages, each with its
own set of libraries and frameworks tailored to interact with the
GPU.
• Some of the most commonly used languages for GPU programming
include:
GPU Programming Languages:
• CUDA (Compute Unified Device Architecture):
• Developed by NVIDIA, CUDA is a parallel computing platform and
programming model specifically designed for NVIDIA GPUs.
• CUDA provides a C/C++ programming interface with extensions for
parallelism, allowing developers to write code directly targeting NVIDIA
GPUs.
GPU Programming Languages:
• OpenCL (Open Computing Language):
• OpenCL is an open standard for parallel programming across heterogeneous
platforms, including GPUs, CPUs, and other accelerators.
• It offers a C-like programming language with APIs for writing parallel code
that can execute across various devices, including those from NVIDIA,
AMD, Intel, and others.
GPU Programming Languages:
• OpenGL and Vulkan:
• While primarily graphics APIs, OpenGL and Vulkan can also be used for
GPU computing tasks.
• They provide compute shaders, which allow developers to perform general-
purpose computation on the GPU within the graphics pipeline.
• However, their use for non-graphics tasks is less common compared to
CUDA and OpenCL.
GPU Programming Languages:
• DirectX Compute (DirectCompute):
• DirectCompute is part of the Microsoft DirectX API and enables general-
purpose computing on GPUs using shaders within the DirectX framework.
• While primarily used for graphics rendering, DirectCompute can also be
utilized for parallel processing tasks on compatible GPUs.
GPU Programming Languages:
• SYCL:
• SYCL (pronounced "sickle") is a higher-level programming model built on
top of OpenCL.
• It allows developers to write code in C++ and take advantage of OpenCL's
parallel execution capabilities.
• SYCL provides a more modern C++ interface compared to traditional
OpenCL programming.
GPU Programming Languages:
• CUDA Python (CuPy) and Numba:
• For Python developers, there are libraries like CuPy and Numba that allow
GPU programming using CUDA.
• CuPy provides a NumPy-like interface for GPU arrays and operations, while
Numba offers just-in-time (JIT) compilation of Python functions for
execution on the GPU.
GPU Programming Languages:
• ROCm (Radeon Open Compute):
• Developed by AMD, ROCm is an open-source platform for GPU computing.
• It supports various programming languages, including C/C++, Python, and Fortran,
and provides libraries and tools for developing GPU-accelerated applications on AMD
GPUs.
• ROCm consists of a collection of drivers, development tools, and APIs that enable
GPU programming from low-level kernel to end-user applications.
GPU Programming Languages:
• These languages and frameworks offer different levels of abstraction and
support for GPU programming, catering to diverse developer preferences and
requirements.
• The choice of language often depends on factors such as hardware
compatibility, performance goals, existing codebase, and developer expertise.
END