IT301: INTRODUCTION TO
CUDA
By,
Ms. Thanmayee
Ad hoc Faculty,
Department of IT,
NITK, Surathkal
OUTLINE
● Introduction to GPU
● Evolution of GPU microarchitectures
● General Purpose GPU
● Introduction to CUDA
● CUDA Execution Model
● CUDA Memory Model
● Steps in GPU Execution
● Hello World Program
● CUDA Device Variables
● CUDA Programming examples
CPU vs GPU
● Need to understand how CPUs and GPUs differ:
− Simple calculations versus complex calculations
− Basic graphics versus 3D rendering and animation
− A few high-capacity cores versus many low-capacity cores
− Latency intolerance versus latency tolerance
− Task parallelism versus data parallelism
− Tens of threads versus tens of thousands of threads
Latency Hiding in GPU
● GPUs tolerate long memory latencies by keeping thousands of threads resident: while one group of threads waits on memory, the hardware switches to another group that is ready to execute.
General Purpose GPU : GPGPU
The dawn of GPGPU
● General-purpose computing on GPUs was far from easy back then
− even for those who knew graphics programming languages such as OpenGL!
− Developers had to map scientific calculations onto problems that could be represented by triangles and polygons.
Applications
● Machine learning – self-driving cars, the Watson AI supercomputer.
● Scientific applications such as genome sequencing and molecular simulations.
● Medical image processing.
● Image tagging on Facebook.
● Numerical weather prediction.
● Oil exploration.
● Movie making.
● Atmospheric simulation.
● Sequencing the novel coronavirus and the genomes of people afflicted with COVID-19.
CUDA – Compute Unified Device Architecture
● In 2003, a team of researchers led by Ian Buck unveiled Brook, the first widely adopted programming model to extend C with data-parallel constructs.
● Brook exposed the GPU as a general-purpose processor in a high-level language.
− Most importantly, Brook programs were:
● easier to write than hand-tuned GPU code, and
● seven times faster than similar existing code.
CUDA – Compute Unified Device Architecture
● NVIDIA invited Ian Buck to join the company.
− He began evolving a solution to run C seamlessly on the GPU.
− Putting the software and hardware together, NVIDIA unveiled CUDA in 2006 and launched it publicly in 2007.
− It was the world's first solution for general-purpose computing on GPUs.
− CUDA:
■ is a parallel computing architecture and programming model;
■ includes a C/C++ compiler, with support for OpenCL and DirectCompute as well.
General Structure of the GPU Program in CUDA
● Host Program – executed by the CPU.
− This is serial code.
− It sets up the parameters for GPU (kernel) execution.
● Kernel Program – executed in parallel by the SIMD cores (streaming processors) in the GPU.
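The host/kernel split described above can be sketched as follows; the kernel name, array, and launch configuration here are illustrative assumptions, not taken from the slides:

```cuda
#include <cstdio>

// Kernel program: executed in parallel by the GPU's streaming processors.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // unique global thread index
    if (i < n) data[i] *= factor;                   // guard against excess threads
}

int main() {
    const int n = 1024;
    float host[n];
    for (int i = 0; i < n; ++i) host[i] = 1.0f;

    // Host program: serial code that sets up kernel execution.
    float *dev;
    cudaMalloc(&dev, n * sizeof(float));
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);

    scale<<<4, 256>>>(dev, 2.0f, n);   // grid of 4 blocks x 256 threads = 1024 threads

    cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    printf("host[0] = %f\n", host[0]);
    return 0;
}
```

The serial host code allocates device memory, copies data across, and configures the launch; all parallelism lives inside the `__global__` kernel.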
Compiling CUDA Program:
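CUDA source files (`.cu`) are compiled with NVIDIA's `nvcc` driver, which separates device code (compiled to GPU machine code) from host code (handed to the host C++ compiler). A typical invocation, with a hypothetical file name:

```
nvcc -o scale scale.cu               # compile and link host + device code
nvcc -arch=sm_70 -o scale scale.cu   # optionally target a specific GPU architecture
./scale                              # run the resulting executable
```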
CUDA Execution Model
● Threads:
○ perform the computations; they run on the scalar processors (streaming processors) of the GPU.
○ Thousands are needed to achieve full efficiency.
● Blocks:
○ A block is a group of threads; it can contain from 1 up to 1024 threads.
○ Blocks are allotted to the streaming multiprocessors (SMs) of the GPU.
○ Multiple blocks can reside on one SM.
● Grid:
○ A grid is a group of blocks.
○ It holds the complete computation task; a grid corresponds to one kernel launch.
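The thread–block–grid hierarchy above maps directly onto the kernel launch syntax; a minimal sketch with assumed names (`myKernel`, the 8×256 configuration) chosen for illustration:

```cuda
__global__ void myKernel(int *out) {
    // Each thread derives a unique global index from its block and thread IDs.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    out[i] = i;
}

// Launch a grid of 8 blocks, each with 256 threads
// (within the 1024-threads-per-block limit):
//
//     myKernel<<<8, 256>>>(devOut);   // 8 * 256 = 2048 threads in total
//
// The first launch parameter is the grid size (blocks), the second the
// block size (threads per block).
```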
Blocks in SMs
THANK YOU