The document provides an overview of CUDA programming, which is an extension of C/C++ developed by Nvidia for parallel computing using GPUs. It discusses the architecture, execution model, and applications of CUDA, highlighting its benefits such as significant speed-ups in processing tasks. Additionally, it outlines the limitations of CUDA, including its compatibility with only NVIDIA hardware and interoperability issues with APIs like OpenGL.

Introduction to CUDA Programming


Last Updated : 23 Jul, 2025

In this article, we will cover an overview of CUDA programming, focusing mainly on why CUDA is needed and on its execution model, and finally look at its applications. Let us discuss it one by one.

CUDA stands for Compute Unified Device Architecture. It is an extension of C/C++ programming, and a parallel computing platform and API (Application Programming Interface) model developed by Nvidia. CUDA allows computations to be performed in parallel on the Graphics Processing Unit (GPU), delivering significant speed-ups. Using CUDA, one can harness the power of an Nvidia GPU for general-purpose computing tasks, such as processing matrices and other linear algebra operations, rather than only graphical calculations.

Why do we need CUDA?

- GPUs are designed to perform high-speed parallel computations to display graphics, such as in games.
- CUDA lets you use this widely deployed hardware: more than 100 million CUDA-capable GPUs are already in use.
- It provides a 30-100x speed-up over conventional microprocessors for some applications.
- GPUs contain many small Arithmetic Logic Units (ALUs), in contrast to a CPU's few, larger cores. This allows for many parallel calculations, such as computing the color of each pixel on the screen.

https://www.geeksforgeeks.org/electronics-engineering/introduction-to-cuda-programming/
Architecture of CUDA

- The G80 card has 16 Streaming Multiprocessors (SMs).
- Each Streaming Multiprocessor has 8 Streaming Processors (SPs), i.e., a total of 128 Streaming Processors.
- Each Streaming Processor has a MAD unit (Multiply-and-Add unit) and an additional MU (Multiplication Unit).
- The GT200 has 30 Streaming Multiprocessors, each with 8 Streaming Processors, i.e., a total of 240 SPs, and more than 1 TFLOP of processing power.
- Each Streaming Processor is massively threaded and can run thousands of threads per application.
- The G80 supports 768 threads per Streaming Multiprocessor (note: per SM, not per SP).
- Since each SM has 8 SPs, each SP supports a maximum of 768/8 = 96 threads, so the total number of threads that can run on the 128 SPs is 128 × 96 = 12,288.
- Therefore these processors are called massively parallel.
- The G80 chip has a memory bandwidth of 86.4 GB/s.

It also has an 8GB/s communication channel with the CPU (4GB/s for
uploading to the CPU RAM, and 4GB/s for downloading from the CPU
RAM).

How does CUDA work?

- GPUs run one kernel (a grid of parallel tasks) at a time.
- Each kernel consists of blocks, which are independent groups of threads scheduled onto the Streaming Multiprocessors.
- Each block contains threads, the basic units of computation.
- The threads in each block typically work together to calculate a value.
- Threads in the same block can share memory.
- In CUDA, moving data between the CPU and the GPU is often the most costly part of the computation.
- For each thread, registers are the fastest memory, followed by on-chip shared memory; constant and texture memory are cached, and global (and per-thread local) memory is the slowest.
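The point that threads in the same block can cooperate through shared memory can be illustrated with a block-wide sum. The following is a minimal sketch (the kernel name blockSum is illustrative, a block size of 256 is assumed, and compiling requires Nvidia's nvcc):

```cuda
// Each block sums 256 consecutive input elements into one output value.
__global__ void blockSum(const float *in, float *out) {
    __shared__ float buf[256];  // fast on-chip memory, visible to the whole block
    int tid = threadIdx.x;
    buf[tid] = in[blockIdx.x * blockDim.x + tid];
    __syncthreads();            // wait until every thread has written its value

    // Tree reduction: halve the number of active threads each step.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride) buf[tid] += buf[tid + stride];
        __syncthreads();
    }
    if (tid == 0) out[blockIdx.x] = buf[0];  // one result per block
}
```

Because buf lives in shared memory, each input element is read from slow global memory only once, and all subsequent partial sums use the fast on-chip buffer.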

Typical CUDA Program flow

1. Load data into CPU memory.
2. Copy data from CPU to GPU memory, e.g., cudaMemcpy(..., cudaMemcpyHostToDevice).
3. Call the GPU kernel using device variables, e.g., kernel<<<...>>>(gpuVar).
4. Copy results from GPU to CPU memory, e.g., cudaMemcpy(..., cudaMemcpyDeviceToHost).
5. Use results on CPU.
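The five steps above can be sketched as a minimal CUDA vector-add program (names such as vecAdd are illustrative; compiling requires Nvidia's nvcc and running requires an Nvidia GPU):

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Step 3's kernel: each thread adds one pair of elements.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1024;
    const size_t bytes = n * sizeof(float);

    // 1. Load data into CPU memory.
    float *hA = (float *)malloc(bytes);
    float *hB = (float *)malloc(bytes);
    float *hC = (float *)malloc(bytes);
    for (int i = 0; i < n; i++) { hA[i] = (float)i; hB[i] = 2.0f * i; }

    // 2. Copy data from CPU to GPU memory.
    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes); cudaMalloc(&dB, bytes); cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    // 3. Call the GPU kernel using device variables.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(dA, dB, dC, n);

    // 4. Copy results from GPU to CPU memory.
    cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);

    // 5. Use results on CPU.
    printf("c[10] = %f\n", hC[10]);

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    free(hA); free(hB); free(hC);
    return 0;
}
```

Compile with, e.g., `nvcc vec_add.cu -o vec_add`.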

How is work distributed?

- Each thread "knows" the x and y coordinates of the block it is in, and its own coordinates within that block.
- These positions can be used to calculate a unique thread ID for each thread.
- The computational work done depends on the value of this thread ID.
- For example, the thread ID might correspond to a particular group of matrix elements.

CUDA Applications

CUDA suits applications that run parallel operations on large amounts of data and are processing-intensive. Typical domains include:

1. Computational finance
2. Climate, weather, and ocean modeling
3. Data science and analytics
4. Deep learning and machine learning
5. Defence and intelligence
6. Manufacturing/AEC
7. Media and entertainment
8. Medical imaging
9. Oil and gas
10. Research
11. Safety and security
12. Tools and management

Benefits of CUDA

There are several advantages that give CUDA an edge over traditional general-purpose GPU (GPGPU) computing through graphics APIs:

- Unified memory (CUDA 6.0 or later) and unified virtual addressing (CUDA 4.0 or later).
- Shared memory: CUDA exposes a fast region of on-chip memory shared among the threads of a block. It can be used as a user-managed cache and provides higher bandwidth than texture lookups.
- Scattered reads: code can read from arbitrary addresses in memory.
- Improved performance on transfers in both directions, to and from the GPU.
- CUDA has full support for bitwise and integer operations.
Limitations of CUDA

- CUDA source code on the host machine is processed according to C++ syntax rules, while older versions of CUDA used C syntax rules; as a result, up-to-date CUDA source code may or may not work as required with older toolchains.
- CUDA has one-way interoperability (the ability of computer systems or software to exchange and make use of information) with rendering APIs such as OpenGL: OpenGL can access CUDA-registered memory, but CUDA cannot access OpenGL memory.
- Later versions of CUDA do not provide emulators or fallback support, so a CUDA-capable GPU is required.
- CUDA runs only on Nvidia hardware.
