GETTING STARTED WITH CUDA SAMPLES
DA-05723-001_v5.0 | October 2012

Application Note

TABLE OF CONTENTS
Chapter 1. Getting Started with CUDA Samples
1.1 Before You Begin
1.2 Getting Started
1.2.1 Getting Started Samples
1.2.2 Simple CUDA Samples
1.2.3 CUDA + Graphics Interoperability
1.2.4 Multi-GPU Programming

www.nvidia.com

Getting Started with CUDA Samples

DA-05723-001_v5.0|ii

Chapter 1.
GETTING STARTED WITH CUDA SAMPLES

NVIDIA CUDA is a general-purpose parallel computing architecture introduced
by NVIDIA. It includes the CUDA Instruction Set Architecture (ISA) and the parallel
compute engine in the GPU.
This document introduces a set of samples that serve as an introduction to CUDA.
Most of these samples use the CUDA runtime API; the exceptions, which use the
CUDA Driver API, are noted explicitly.
To run these samples, you should have experience with C and/or C++. No prior
parallel programming experience is required.
The CUDA C samples listed in this document are found in both the C and
7_CUDALibraries default directories in the following folders:
Windows
ProgramData\NVIDIA Corporation\CUDA Samples\v5.0
Linux
~/NVIDIA_CUDA-5.0_Samples
Mac OS X
/Developer/NVIDIA/CUDA-5.0/Samples

1.1 BEFORE YOU BEGIN

This document assumes you have installed CUDA on your system. CUDA runs
on Windows, Mac, and Linux environments. To install CUDA, refer to the CUDA
Getting Started Guide available with the SDK and on the CUDA web site at
http://www.nvidia.com/cuda

1.2 GETTING STARTED

The list of samples is divided into three categories:
Getting started samples. If you are new to CUDA, these are the best samples to begin
with.
Simple CUDA samples

Samples that demonstrate CUDA + Graphics interoperability

There are some overlaps between the three categories.

1.2.1 Getting Started Samples

matrixMul, matrixMulCUBLAS
These samples implement matrix multiplication. matrixMul implements it as a CUDA kernel
written for clarity of exposition, to illustrate various CUDA programming principles,
and not with the goal of providing the best-performing kernel for matrix multiplication.
For performance, matrixMulCUBLAS uses the CUBLAS library to show high-performance matrix
multiplication.
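As a rough illustration of the kind of kernel described above, the following is a minimal sketch of a shared-memory tiled matrix multiplication. It is not the code shipped with the sample: the tile width TILE, the kernel name matMulTiled, and the host helper matMulMaxError are assumptions made for this example, and the matrices are assumed to be square with a size that is a multiple of the tile width.

```cuda
#include <cuda_runtime.h>
#include <cmath>
#include <vector>

#define TILE 16  // hypothetical tile width; the shipped sample may use another value

// C = A * B for square n x n row-major matrices; n must be a multiple of TILE.
__global__ void matMulTiled(const float *A, const float *B, float *C, int n)
{
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    for (int t = 0; t < n / TILE; ++t) {
        // Each thread loads one element of the current A tile and B tile.
        As[threadIdx.y][threadIdx.x] = A[row * n + t * TILE + threadIdx.x];
        Bs[threadIdx.y][threadIdx.x] = B[(t * TILE + threadIdx.y) * n + col];
        __syncthreads();  // tiles fully loaded before use
        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();  // finish using tiles before they are overwritten
    }
    C[row * n + col] = acc;
}

// Hypothetical host helper: multiplies a matrix of 1.0 by a matrix of 0.5 and
// returns the largest deviation from the expected value 0.5 * n, or -1 on error.
float matMulMaxError(int n)
{
    size_t bytes = (size_t)n * n * sizeof(float);
    std::vector<float> hA((size_t)n * n, 1.0f), hB((size_t)n * n, 0.5f), hC((size_t)n * n, 0.0f);
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    bool ok = cudaMalloc(&dA, bytes) == cudaSuccess &&
              cudaMalloc(&dB, bytes) == cudaSuccess &&
              cudaMalloc(&dC, bytes) == cudaSuccess;
    if (ok) {
        cudaMemcpy(dA, hA.data(), bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(dB, hB.data(), bytes, cudaMemcpyHostToDevice);
        dim3 block(TILE, TILE), grid(n / TILE, n / TILE);
        matMulTiled<<<grid, block>>>(dA, dB, dC, n);
        ok = cudaMemcpy(hC.data(), dC, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    if (!ok) return -1.0f;

    float maxErr = 0.0f;
    for (float v : hC) maxErr = fmaxf(maxErr, fabsf(v - 0.5f * n));
    return maxErr;
}
```

Calling matMulMaxError(64) from a main function should return a value close to zero on a working CUDA device. A library routine such as the one used by matrixMulCUBLAS would normally be preferred when performance matters.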
simpleTemplates
This sample is a templatized version of the template project. It also shows how to
correctly templatize dynamically-allocated shared memory arrays.
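One common way to templatize dynamically allocated shared memory is sketched below. Declaring the extern shared array with the template type directly would redeclare the same symbol with a different type for each instantiation, so the sketch routes every instantiation through one untyped declaration. The names SharedMemory, reverseInBlock, and reverseCheck are assumptions for this example and are not taken from the sample source.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hands out the dynamically allocated shared memory as a T*.
// All instantiations share one declaration with a fixed element type,
// which avoids conflicting redeclarations across template types.
template <typename T>
struct SharedMemory {
    __device__ T *get()
    {
        extern __shared__ double sharedRaw[];  // double gives 8-byte alignment
        return reinterpret_cast<T *>(sharedRaw);
    }
};

// Reverses each block-sized chunk of data in place, staging it in shared memory.
template <typename T>
__global__ void reverseInBlock(T *data)
{
    T *tile = SharedMemory<T>().get();
    int base = blockIdx.x * blockDim.x;
    tile[threadIdx.x] = data[base + threadIdx.x];
    __syncthreads();
    data[base + threadIdx.x] = tile[blockDim.x - 1 - threadIdx.x];
}

// Hypothetical host helper; n must be a multiple of threads.
template <typename T>
bool reverseCheck(int n, int threads)
{
    std::vector<T> h(n);
    for (int i = 0; i < n; ++i) h[i] = static_cast<T>(i);

    T *d = nullptr;
    if (cudaMalloc(&d, n * sizeof(T)) != cudaSuccess) return false;
    cudaMemcpy(d, h.data(), n * sizeof(T), cudaMemcpyHostToDevice);

    // The third launch parameter sets the dynamic shared memory size in bytes.
    reverseInBlock<T><<<n / threads, threads, threads * sizeof(T)>>>(d);
    bool ok = cudaMemcpy(h.data(), d, n * sizeof(T), cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d);

    for (int i = 0; ok && i < n; ++i) {
        int blockStart = (i / threads) * threads;
        ok = (h[i] == static_cast<T>(blockStart + threads - 1 - (i - blockStart)));
    }
    return ok;
}
```

For example, reverseCheck<int>(256, 64) and reverseCheck<double>(256, 64) exercise the same kernel source with two element types.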
template
This sample is a basic template project that can be used as a starting point for creating
new CUDA projects.
template_runtime
This is a simple template project that can be used as a starting point to create a new
CUDA project that does not use the cutil library.

1.2.2 Simple CUDA Samples

bandwidthTest
This is a test program to measure the memory copy bandwidth of the GPU. It currently
is capable of measuring device-to-device copy bandwidth, host-to-device copy
bandwidth for pageable and page-locked memory, and device-to-host copy bandwidth
for pageable and page-locked memory.
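The difference between pageable and page-locked host memory can be measured with a few runtime calls. The sketch below times a single host-to-device copy with CUDA events; the helper name hostToDeviceGBps is an assumption for this example, and a real benchmark would repeat the copy and average the results.

```cuda
#include <cuda_runtime.h>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: returns the host-to-device bandwidth in GB/s for one
// copy of `bytes` bytes, or -1 on error. `pinned` selects page-locked host
// memory (cudaMallocHost) instead of pageable memory (malloc).
float hostToDeviceGBps(size_t bytes, bool pinned)
{
    void *hostBuf = nullptr;
    void *devBuf = nullptr;
    if (pinned) {
        if (cudaMallocHost(&hostBuf, bytes) != cudaSuccess) return -1.0f;
    } else {
        hostBuf = malloc(bytes);
        if (hostBuf == nullptr) return -1.0f;
    }
    memset(hostBuf, 0, bytes);  // touch the pages before timing

    float gbps = -1.0f;
    if (cudaMalloc(&devBuf, bytes) == cudaSuccess) {
        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);

        cudaEventRecord(start, 0);
        cudaError_t err = cudaMemcpy(devBuf, hostBuf, bytes, cudaMemcpyHostToDevice);
        cudaEventRecord(stop, 0);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);  // elapsed time in milliseconds
        if (err == cudaSuccess && ms > 0.0f)
            gbps = (bytes / 1.0e9f) / (ms / 1.0e3f);

        cudaEventDestroy(start);
        cudaEventDestroy(stop);
        cudaFree(devBuf);
    }
    if (pinned) cudaFreeHost(hostBuf); else free(hostBuf);
    return gbps;
}
```

Comparing hostToDeviceGBps(size, false) with hostToDeviceGBps(size, true) for a buffer of a few tens of megabytes typically shows the benefit of page-locked memory, although the actual figures depend on the system.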
clock
This example shows how to use the clock function in CUDA kernels to accurately measure
the performance of code within a kernel.
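A minimal sketch of in-kernel timing with clock() follows. The kernel name timedLoop and the helper measureLoopCycles are assumptions for this example. The counter is read as an unsigned 32-bit value so that the subtraction stays correct if the counter wraps once, and the compiler may still schedule some instructions outside the timed region, so the figure is approximate.

```cuda
#include <cuda_runtime.h>

// Times a small arithmetic loop using the per-multiprocessor cycle counter.
__global__ void timedLoop(float *data, int iters, unsigned int *elapsed)
{
    unsigned int start = (unsigned int)clock();

    float v = data[threadIdx.x];
    for (int i = 0; i < iters; ++i)
        v = v * 1.0001f + 0.5f;
    data[threadIdx.x] = v;  // store the result so the loop is not removed

    unsigned int stop = (unsigned int)clock();
    if (threadIdx.x == 0)
        *elapsed = stop - start;  // unsigned subtraction tolerates one wrap
}

// Hypothetical host helper: returns the cycle count seen by thread 0, or 0 on error.
unsigned int measureLoopCycles(int iters)
{
    const int threads = 64;
    float *dData = nullptr;
    unsigned int *dElapsed = nullptr;
    unsigned int hElapsed = 0;

    if (cudaMalloc(&dData, threads * sizeof(float)) != cudaSuccess) return 0;
    if (cudaMalloc(&dElapsed, sizeof(unsigned int)) != cudaSuccess) {
        cudaFree(dData);
        return 0;
    }
    cudaMemset(dData, 0, threads * sizeof(float));

    timedLoop<<<1, threads>>>(dData, iters, dElapsed);
    if (cudaMemcpy(&hElapsed, dElapsed, sizeof(hElapsed),
                   cudaMemcpyDeviceToHost) != cudaSuccess)
        hElapsed = 0;

    cudaFree(dData);
    cudaFree(dElapsed);
    return hElapsed;
}
```

The returned value is in clock cycles of the multiprocessor that ran the block, not in seconds; converting it to time requires the device clock rate.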

cudaOpenMP
This is a sample application demonstrating how to use the OpenMP API for launching
workloads across multiple GPUs. The binaries for this sample are not pre-built with the
Windows installer.
deviceQuery, deviceQueryDrv
These two samples show how to enumerate properties of the CUDA devices present in
the system (using the CUDA runtime API). The *Drv version has the same functions as
the runtime sample, but uses the CUDA Driver API.
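With the runtime API, enumerating devices comes down to cudaGetDeviceCount and cudaGetDeviceProperties. The sketch below prints a handful of the available properties; the helper name listCudaDevices and the choice of fields are assumptions for this example, and the shipped sample reports many more properties.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical helper: prints a few properties of each CUDA device and
// returns the number of devices found (0 if the query fails).
int listCudaDevices()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) return 0;

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) continue;

        printf("Device %d: %s\n", dev, prop.name);
        printf("  Compute capability:  %d.%d\n", prop.major, prop.minor);
        printf("  Global memory:       %.0f MiB\n",
               prop.totalGlobalMem / (1024.0 * 1024.0));
        printf("  Multiprocessors:     %d\n", prop.multiProcessorCount);
        printf("  Shared memory/block: %zu bytes\n", prop.sharedMemPerBlock);
        printf("  Max threads/block:   %d\n", prop.maxThreadsPerBlock);
        printf("  Warp size:           %d\n", prop.warpSize);
    }
    return count;
}
```

The compute capability printed here is the value to compare against the requirements stated for the individual samples in this document.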
ptxjit
This sample demonstrates JIT compilation of PTX code. This sample uses a PTX program
embedded in a string array. The CUDA Driver API calls are used to compile and run a
PTX program.
simpleAtomicIntrinsics
This is a simple demonstration of global memory atomic instructions. This sample
requires Compute Capability 1.1 or higher.
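As a small illustration of a global memory atomic, the sketch below counts the even values in an array by having each matching thread call atomicAdd on a single counter. The kernel name countEven and the helper countEvenOnGpu are assumptions for this example; the shipped sample exercises a wider set of atomic functions.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Each thread that sees an even value increments the shared counter.
// atomicAdd makes the read-modify-write safe across all threads.
__global__ void countEven(const int *in, int n, int *counter)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && in[i] % 2 == 0)
        atomicAdd(counter, 1);
}

// Hypothetical host helper: fills an array with 0..n-1 and returns the
// number of even values counted on the GPU, or -1 on error.
int countEvenOnGpu(int n)
{
    std::vector<int> h(n);
    for (int i = 0; i < n; ++i) h[i] = i;

    int *dIn = nullptr, *dCounter = nullptr;
    int result = -1;
    if (cudaMalloc(&dIn, n * sizeof(int)) == cudaSuccess &&
        cudaMalloc(&dCounter, sizeof(int)) == cudaSuccess) {
        cudaMemcpy(dIn, h.data(), n * sizeof(int), cudaMemcpyHostToDevice);
        cudaMemset(dCounter, 0, sizeof(int));
        countEven<<<(n + 255) / 256, 256>>>(dIn, n, dCounter);
        if (cudaMemcpy(&result, dCounter, sizeof(int),
                       cudaMemcpyDeviceToHost) != cudaSuccess)
            result = -1;
    }
    cudaFree(dIn);
    cudaFree(dCounter);
    return result;
}
```

For the input 0..999 the helper should return 500. Without the atomic, concurrent increments of the same address would race and the count would generally come out too low.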
simpleCUBLAS
This is a basic example demonstrating how to use the CUBLAS (CUDA Basic Linear
Algebra) library. This sample can be found within the 7_CUDALibraries folder.
For more details on how to use the CUBLAS Library, refer to the CUBLAS Library
Programming Guide included with the CUDA Toolkit.
simpleCUFFT
This is a basic example demonstrating how to use the CUFFT (CUDA Fast Fourier
Transform) Library. In this sample, CUFFT is used to compute the 1D convolution of a
signal. The signal is transformed to the frequency domain, multiplied by a
filter kernel, and then transformed back to the time domain. This sample can be
found within the 7_CUDALibraries folder.
For more details on how to use the CUFFT Library, refer to the CUFFT Library
documentation included with the CUDA Toolkit.
simpleMPI
This sample demonstrates how to use MPI in combination with CUDA to launch workloads
across multiple systems that each have a GPU. It generates random numbers on one node,
dispatches them to all nodes, and then computes the square root on each node's GPU.
The average of the results is then computed. The binary is not pre-built with the
installer.
simpleMultiCopy
Since Compute Capability 1.1, it is possible to overlap compute with one memcopy to/
from the host. Compute Capability 2.0 with a Tesla or Quadro GPU improves on this by
enabling a second parallel copy operation in the opposite direction at full speed (PCIe is
symmetric). This sample illustrates the usage of CUDA streams to achieve overlapping
of kernel execution with copying data to and from the device.
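The overlap described above depends on three ingredients: page-locked host memory, asynchronous copies, and separate streams. The sketch below splits an array into chunks and issues the upload, kernel, and download for each chunk in its own stream. The names addOne and streamedIncrement are assumptions for this example, and whether the operations actually overlap depends on the GPU's copy engines.

```cuda
#include <cuda_runtime.h>
#include <vector>

__global__ void addOne(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Hypothetical host helper: increments n integers using nStreams streams and
// returns true if every element was updated. n must be divisible by nStreams.
bool streamedIncrement(int n, int nStreams)
{
    const int chunk = n / nStreams;
    int *h = nullptr, *d = nullptr;

    // Asynchronous copies need page-locked host memory to overlap with kernels.
    if (cudaMallocHost(&h, n * sizeof(int)) != cudaSuccess) return false;
    if (cudaMalloc(&d, n * sizeof(int)) != cudaSuccess) {
        cudaFreeHost(h);
        return false;
    }
    for (int i = 0; i < n; ++i) h[i] = i;

    std::vector<cudaStream_t> streams(nStreams);
    for (int s = 0; s < nStreams; ++s) cudaStreamCreate(&streams[s]);

    // Work in one stream runs in order; work in different streams may overlap.
    for (int s = 0; s < nStreams; ++s) {
        int off = s * chunk;
        cudaMemcpyAsync(d + off, h + off, chunk * sizeof(int),
                        cudaMemcpyHostToDevice, streams[s]);
        addOne<<<(chunk + 255) / 256, 256, 0, streams[s]>>>(d + off, chunk);
        cudaMemcpyAsync(h + off, d + off, chunk * sizeof(int),
                        cudaMemcpyDeviceToHost, streams[s]);
    }
    bool ok = cudaDeviceSynchronize() == cudaSuccess;
    for (int i = 0; ok && i < n; ++i) ok = (h[i] == i + 1);

    for (int s = 0; s < nStreams; ++s) cudaStreamDestroy(streams[s]);
    cudaFree(d);
    cudaFreeHost(h);
    return ok;
}
```

A call such as streamedIncrement(1 << 20, 4) gives each stream a quarter of the data. A profiler timeline is the most direct way to confirm that copies and kernels from different streams run concurrently.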
simpleMultiGPU
This application demonstrates how to use the CUDA API to launch workloads across
multiple GPUs. Starting with CUDA 4.0, there is a new API for CUDA context
management and multi-threaded access. This greatly simplifies the way that CUDA
kernels can be launched across multiple GPUs.
For more details, refer to Portable Memory, Mapped Memory, and Multi-Device-System in
the CUDA C Programming Guide and to the CUDA_4.0_Readiness_Tech_Brief.pdf about the
new multi-device programming model.
simplePitchLinearTexture
This sample demonstrates how to use 1D Pitch Linear Textures in a CUDA program.
simplePrintf
This CUDA Runtime API sample is a very basic sample that shows how to use the
printf function in device code. Specifically, for devices with compute capability
less than 2.0, the function cuPrintf is called; otherwise, printf can be used directly.
For more details, refer to Formatted Output in the CUDA C Programming Guide included with the
CUDA Toolkit.
simpleStreams
This sample uses CUDA streams to overlap kernel executions with memcopies between
the device and the host. Starting with CUDA 4.0, this sample adds support for pinning
generic host memory. This sample requires Compute Capability 1.1 or higher.
For more details, refer to Streams in the CUDA C Programming Guide and to the
CUDA_4.0_Readiness_Tech_Brief.pdf included with the CUDA Toolkit.
simpleSurfaceWrite
This sample demonstrates the use of surface references, thus enabling write-to-texture.
This sample requires a Fermi-based GPU (Compute Capability 2.0).

simpleTemplates
This sample is a templatized version of the template project. It also shows how to
correctly templatize dynamically-allocated shared memory arrays.
simpleTexture, simpleTextureDrv
This sample demonstrates how to use textures with CUDA (Runtime API and the Driver
API versions).
simpleVoteIntrinsics
This is a simple program that demonstrates how to use the Vote (any, all) intrinsic
instruction in a CUDA kernel. This sample requires Compute Capability 1.2 or higher.
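The sketch below shows warp vote operations in use. It assumes a CUDA 9.0 or newer toolkit, where the vote intrinsics are spelled __any_sync and __all_sync and take an explicit lane mask; toolkits contemporary with this document used the unsuffixed forms instead. The names voteKernel and voteCheck are assumptions for this example.

```cuda
#include <cuda_runtime.h>

// Each warp reports whether any, and whether all, of its inputs are positive.
__global__ void voteKernel(const int *in, int *anyOut, int *allOut)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    int pred = in[tid] > 0;

    // Full mask: all 32 lanes of the warp take part in the vote.
    int anyPos = __any_sync(0xffffffffu, pred);
    int allPos = __all_sync(0xffffffffu, pred);

    if (threadIdx.x % warpSize == 0) {
        anyOut[tid / warpSize] = (anyPos != 0);
        allOut[tid / warpSize] = (allPos != 0);
    }
}

// Hypothetical host helper: runs two warps and checks the votes.
// Warp 0 holds only positive values; warp 1 holds zeros except one element.
bool voteCheck()
{
    const int n = 64;  // two warps of 32 threads
    int hIn[n];
    for (int i = 0; i < n; ++i) hIn[i] = (i < 32) ? 1 : 0;
    hIn[40] = 1;
    int hAny[2] = {-1, -1}, hAll[2] = {-1, -1};

    int *dIn = nullptr, *dAny = nullptr, *dAll = nullptr;
    bool ok = cudaMalloc(&dIn, sizeof(hIn)) == cudaSuccess &&
              cudaMalloc(&dAny, sizeof(hAny)) == cudaSuccess &&
              cudaMalloc(&dAll, sizeof(hAll)) == cudaSuccess;
    if (ok) {
        cudaMemcpy(dIn, hIn, sizeof(hIn), cudaMemcpyHostToDevice);
        voteKernel<<<1, n>>>(dIn, dAny, dAll);
        ok = cudaMemcpy(hAny, dAny, sizeof(hAny), cudaMemcpyDeviceToHost) == cudaSuccess &&
             cudaMemcpy(hAll, dAll, sizeof(hAll), cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dIn);
    cudaFree(dAny);
    cudaFree(dAll);

    // Expected: any = {1, 1}, all = {1, 0}.
    return ok && hAny[0] == 1 && hAny[1] == 1 && hAll[0] == 1 && hAll[1] == 0;
}
```

The vote is evaluated per warp, which is why the second warp reports "any" as true but "all" as false for the same predicate.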
simpleZeroCopy
This sample illustrates how to use Zero Memory Copy. Kernels can read directly from
and write directly to pinned system memory. This sample requires GPUs that support this
feature (MCP79, GT200, Fermi based GPUs).
Refer to Parallel Libraries in the CUDA C Best Practices Guide for more details.
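A minimal sketch of zero-copy access follows: the host buffer is allocated as mapped page-locked memory, and the kernel works through a device pointer that aliases it, with no explicit cudaMemcpy. The names scaleInPlace and zeroCopyScale are assumptions for this example. The call to cudaSetDeviceFlags reflects older toolkits; its result is ignored here on the assumption that recent runtimes enable host mapping implicitly.

```cuda
#include <cuda_runtime.h>

__global__ void scaleInPlace(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical host helper: doubles n floats that live in pinned host memory
// and returns true if the host sees the updated values.
bool zeroCopyScale(int n)
{
    int dev = 0;
    cudaDeviceProp prop;
    if (cudaGetDevice(&dev) != cudaSuccess ||
        cudaGetDeviceProperties(&prop, dev) != cudaSuccess ||
        !prop.canMapHostMemory)
        return false;  // device cannot map host memory

    // Required before context creation on older toolkits; failure is ignored
    // here because the context may already exist (an assumption of this sketch).
    cudaSetDeviceFlags(cudaDeviceMapHost);
    cudaGetLastError();  // clear any error left by the call above

    float *hostBuf = nullptr;
    float *devAlias = nullptr;
    if (cudaHostAlloc((void **)&hostBuf, n * sizeof(float),
                      cudaHostAllocMapped) != cudaSuccess)
        return false;
    for (int i = 0; i < n; ++i) hostBuf[i] = (float)i;

    // Obtain the device-side address of the same physical host memory.
    bool ok = cudaHostGetDevicePointer((void **)&devAlias, hostBuf, 0) == cudaSuccess;
    if (ok) {
        scaleInPlace<<<(n + 255) / 256, 256>>>(devAlias, n, 2.0f);
        ok = cudaDeviceSynchronize() == cudaSuccess;  // make writes visible to the host
    }
    for (int i = 0; ok && i < n; ++i) ok = (hostBuf[i] == 2.0f * i);

    cudaFreeHost(hostBuf);
    return ok;
}
```

Every access from the kernel travels over the host interconnect, so this pattern tends to suit data that is read or written once rather than data that is reused many times.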
template
This sample is a basic template project that can be used as a starting point for creating
new CUDA projects.
template_runtime
This is a simple template project that can be used as a starting point to create a new
CUDA project that does not use the cutil library.

1.2.3 CUDA + Graphics Interoperability

These SDK samples demonstrate interoperability between CUDA and graphics.
simpleD3D9
This program demonstrates the interoperability between CUDA and Direct3D9. The
program modifies vertex positions with CUDA and uses Direct3D9 to render the
geometry. A Direct3D-capable device is required.
simpleD3D9texture
This program demonstrates Direct3D9 texture interoperability with CUDA. The
program creates a number of D3D9 textures (2D, 3D, and CubeMap) which are written
to from CUDA kernels. Direct3D then renders the results on the screen. A
Direct3D-capable device is required.

simpleD3D10
This program demonstrates the interoperability between CUDA and Direct3D10. The
program modifies vertex positions with CUDA and uses Direct3D10 to render the
geometry. A Direct3D-capable device is required.
simpleD3D10Texture
This program demonstrates Direct3D10 texture interoperability with CUDA. The
program creates a number of D3D10 textures (2D, 3D, and CubeMap) which are written
to from CUDA kernels. Direct3D then renders the results on the screen. A
Direct3D-capable device is required.
simpleD3D11Texture
This program demonstrates Direct3D11 texture interoperability with CUDA. The
program creates a number of D3D11 textures (2D, 3D, and CubeMap) which are written
to from CUDA kernels. Direct3D then renders the results on the screen. A
Direct3D-capable device is required.
simpleGL
This program demonstrates interoperability between CUDA and OpenGL. The program
modifies vertex positions with CUDA and uses OpenGL to render the geometry.
simpleTexture3D
This program demonstrates the use of 3D textures in CUDA.

1.2.4 Multi-GPU Programming
simpleP2P
This sample demonstrates how to use a feature introduced in the CUDA 4.0 API for
multi-device programming with UVA (Unified Virtual Addressing) and GPU Direct 2.0
peer-to-peer communications (copying of data and memory addressing). This sample
requires two GPUs with peer-to-peer capability based on GF100 or GF110 or newer.
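The basic peer-to-peer calls can be sketched as follows. The helper name peerCopyCheck is an assumption for this example, devices 0 and 1 are assumed to be the peer-capable pair, and the helper simply reports that nothing was tested when two such GPUs are not present.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical host helper: copies n integers from device 0 to device 1.
// Returns 1 if the data round-trips correctly, 0 if two peer-capable GPUs
// are not available, and -1 on error.
int peerCopyCheck(int n)
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count < 2) return 0;

    int can01 = 0, can10 = 0;
    cudaDeviceCanAccessPeer(&can01, 0, 1);
    cudaDeviceCanAccessPeer(&can10, 1, 0);
    if (!can01 || !can10) return 0;

    size_t bytes = n * sizeof(int);
    std::vector<int> hIn(n), hOut(n, 0);
    for (int i = 0; i < n; ++i) hIn[i] = i;

    int *d0 = nullptr, *d1 = nullptr;
    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);  // let device 0 access device 1
    bool ok = cudaMalloc(&d0, bytes) == cudaSuccess;
    cudaSetDevice(1);
    cudaDeviceEnablePeerAccess(0, 0);  // let device 1 access device 0
    ok = ok && cudaMalloc(&d1, bytes) == cudaSuccess;

    if (ok) {
        cudaSetDevice(0);
        cudaMemcpy(d0, hIn.data(), bytes, cudaMemcpyHostToDevice);
        // Direct device-to-device copy between the two GPUs.
        ok = cudaMemcpyPeer(d1, 1, d0, 0, bytes) == cudaSuccess;
        cudaSetDevice(1);
        ok = ok && cudaMemcpy(hOut.data(), d1, bytes,
                              cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    cudaSetDevice(1);
    cudaFree(d1);
    cudaSetDevice(0);
    cudaFree(d0);
    return (ok && hIn == hOut) ? 1 : -1;
}
```

With peer access enabled and UVA available, a kernel running on one device can also dereference a pointer allocated on the other, which this sketch does not show.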


Notice
ALL NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS,
DIAGNOSTICS, LISTS, AND OTHER DOCUMENTS (TOGETHER AND SEPARATELY,
"MATERIALS") ARE BEING PROVIDED "AS IS." NVIDIA MAKES NO WARRANTIES,
EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO THE
MATERIALS, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF
NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR
PURPOSE.
Information furnished is believed to be accurate and reliable. However, NVIDIA
Corporation assumes no responsibility for the consequences of use of such
information or for any infringement of patents or other rights of third parties
that may result from its use. No license is granted by implication or otherwise
under any patent rights of NVIDIA Corporation. Specifications mentioned in this
publication are subject to change without notice. This publication supersedes and
replaces all other information previously supplied. NVIDIA Corporation products
are not authorized as critical components in life support devices or systems
without express written approval of NVIDIA Corporation.
Trademarks
NVIDIA and the NVIDIA logo are trademarks or registered trademarks of NVIDIA
Corporation in the U.S. and other countries. Other company and product names
may be trademarks of the respective companies with which they are associated.
Copyright
2007-2012 NVIDIA Corporation. All rights reserved.
