GPU Assignment2

The assignment requires implementing a 2D convolution operation using CUDA to enhance understanding of parallel computing. Students must handle input images and filters as stacked 2D matrices, ensuring efficient GPU parallelism and zero-padding for out-of-bounds operations. The output must maintain specified dimensions, and adherence to guidelines is crucial to avoid penalties and ensure proper execution timing.

CS6023: GPU Programming (Jan 2025)

Assignment 2
Due March 2, 23:59
1 Problem Statement
Convolution is a crucial operation in image processing, computer vision, and neural
networks. It helps extract spatial features by applying filters to input data, allowing
for pattern detection, edge recognition, and more complex feature extraction. This
assignment focuses on implementing a 2D convolution operation using CUDA to better
understand parallel computing concepts.
In this assignment, you will implement a 2D convolution operation on an input image
and filter set using CUDA. The task requires handling images and filters in a non-tensor
form, specifically as stacked 2D matrices. Your objective is to perform this operation
efficiently using GPU parallelism; memory coalescing and shared memory are
mandatory.

1.1 Definitions
• Input Image: An image typically has dimensions H × W × C, where H is the
height, W is the width, and C is the number of channels (e.g., RGB has 3 channels).
• Filter Set: Filters typically have dimensions K × R × S × C, where K is the
number of filters, R and S are the spatial dimensions of each filter, and C is the
number of channels (which will be the same as the number of input image channels).

1.2 Input Transformation

To simplify the implementation using 2D matrices, the input image and filters are
transformed as follows:
1. The input image of size H × W × C is stacked along the channel axis, resulting in
a matrix of size (H × C) × W (see Figure 1).
2. Each filter of size R × S × C is transformed into a matrix of size (R × C) × S. The
full filter set, with K filters, will be a collection of K such matrices.
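
The stacking step can be sketched on the CPU as follows. This is a minimal illustration, not part of the starter code: the function name stack_image and the nested-list image representation are assumptions, and the row order (all rows of channel 0, then all rows of channel 1, and so on) is inferred from the "Stacked by Channels" matrices in Examples 2 and 3.

```python
def stack_image(image):
    """Stack an H x W x C image (nested lists) into an (H*C) x W matrix.

    Channel c occupies rows c*H .. c*H + H - 1, matching the
    "stacked by channels" layout used in the examples.
    """
    height = len(image)
    width = len(image[0])
    channels = len(image[0][0])
    stacked = [[0] * width for _ in range(height * channels)]
    for c in range(channels):
        for h in range(height):
            for w in range(width):
                stacked[c * height + h][w] = image[h][w][c]
    return stacked


# A 2 x 2 image with 2 channels; each pixel is [channel0, channel1].
image = [[[1, 10], [2, 20]],
         [[3, 30], [4, 40]]]
print(stack_image(image))
# [[1, 2], [3, 4], [10, 20], [30, 40]]
```

The same mapping applies to a single R × S × C filter, giving an (R × C) × S matrix.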

1.3 Task Description

Your task is to:

• Perform a 2D convolution of the transformed input image with each filter in the
set. Convolve each image channel with the corresponding filter channel and sum
the results over all channels (see Figure 2).

Figure 1: Input representation

• The output for each filter should have dimensions H × W, because zero-padding
keeps the output the same size as the input.

• Populate the output matrices in the corresponding variables provided in the starter
code.

Figure 2: conv2d operation for one filter

• Ensure that your implementation is parallelized using CUDA kernel calls.

• Assume that if any cell required for the convolution operation falls outside the
matrix boundaries, its value is 0 (zero-padding).

• The stride length for the convolution is 1.

• Filters will always have odd dimensions. Align the center of the filter with the
image cell whose output value is being computed.
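
The rules above (stride 1, zero-padding, center alignment, sum over channels) can be checked against a sequential CPU reference such as the sketch below. It is meant only for validating GPU output, not as a submission, and the name conv2d_reference and its argument order are assumptions. Note that the worked examples in Section 3 apply the filter without flipping it (cross-correlation), and the sketch follows that convention.

```python
def conv2d_reference(image, filters, H, W, C, R, S, K):
    """Sequential reference for the stacked-matrix convolution.

    image   : (H*C) x W matrix, channels stacked vertically
    filters : (K*C*R) x S matrix, filters stacked, then channels within a filter
    returns : (K*H) x W matrix, one H x W block per filter
    """
    out = [[0] * W for _ in range(K * H)]
    for k in range(K):
        for h in range(H):
            for w in range(W):
                acc = 0
                for c in range(C):
                    for r in range(R):
                        for s in range(S):
                            # Filter center (R//2, S//2) sits on output cell (h, w).
                            ih = h + r - R // 2
                            iw = w + s - S // 2
                            # Cells outside the image count as 0 (zero-padding).
                            if 0 <= ih < H and 0 <= iw < W:
                                acc += (image[c * H + ih][iw]
                                        * filters[(k * C + c) * R + r][s])
                out[k * H + h][w] = acc
    return out


# Example 1 from Section 3: one channel, one 3 x 3 filter of ones.
img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
ones = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
print(conv2d_reference(img, ones, 3, 3, 1, 3, 3, 1))
# [[12, 21, 16], [27, 45, 33], [24, 39, 28]]
```

In a CUDA version, the three outer loops (k, h, w) are the natural candidates for mapping to blocks and threads, since each output cell is independent of the others.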

2 General Guidelines
• Do not use tensor libraries or pre-built convolution functions. You must work with
2D matrices only.

• Do not modify any existing pieces of code provided in the starter template. You are
expected to add necessary memory allocation and CUDA memory copy operations.

• We will time your kernel executions to verify parallelism.

• Plagiarism is a serious offense and will result in academic disciplinary actions.

3 Input and Output Format

3.1 Input Format and Output Format
The input folder consists of test case files. Each file has the following format:

• The input image matrix is stacked vertically along its channels.

• The filter matrix is stacked similarly and applied to the image using 2D convolution.

• The output matrix corresponds to the result of this operation, including padding
to maintain the H × W dimensions for one filter, and in the case of multiple filters,
the matrices are stacked vertically.
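
Because device memory is linear, the stacked matrices are usually addressed through flat offsets. The helpers below sketch that index arithmetic under the assumption of row-major storage; the helper names are hypothetical and the starter code may lay out its buffers differently. The filter ordering (filters outermost, then channels, then rows) is inferred from the filter matrix in Example 3.

```python
def image_offset(c, h, w, H, W):
    """Flat offset of image cell (channel c, row h, column w)."""
    return (c * H + h) * W + w


def filter_offset(k, c, r, s, C, R, S):
    """Flat offset of filter cell (filter k, channel c, row r, column s)."""
    return ((k * C + c) * R + r) * S + s


def output_offset(k, h, w, H, W):
    """Flat offset of output cell (filter k, row h, column w)."""
    return (k * H + h) * W + w


# Example 3 dimensions: H = W = 3, C = 2, R = S = 3, K = 2.
print(image_offset(1, 0, 0, 3, 3))          # 9: first cell of channel 2
print(filter_offset(1, 0, 0, 0, 2, 3, 3))   # 18: first cell of filter 2
print(output_offset(1, 2, 2, 3, 3))         # 17: last cell of filter 2's output
```

Threads with consecutive w values read and write consecutive offsets, which is the access pattern that memory coalescing relies on.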

Example 1 (Single Channel):

Image Dimensions:

3 3 1

Height = 3, Width = 3, Channels = 1

Image Matrix:
1 2 3
4 5 6
7 8 9
Filter Dimensions:

1 3 3 1

Filter Channels = 1, Filter Height = 3, Filter Width = 3, Number of Filters = 1

Filter Matrix:

1 1 1
1 1 1
1 1 1
Output Matrix:
12 21 16
27 45 33
24 39 28

Example 2 (Two Channels, One Output Filter):

Image Dimensions:

3 3 2

Height = 3, Width = 3, Channels = 2

Image Matrix (Stacked by Channels):

1 2 3
4 5 6
7 8 9
10 11 12
13 14 15
16 17 18
Filter Dimensions:

2 3 3 1

Filter Channels = 2, Filter Height = 3, Filter Width = 3, Number of Filters = 1

Filter Matrix (Stacked by Channels):

1 0 1
1 0 1
1 0 1
2 0 -1
2 0 -1
2 0 -1
Output Matrix:
-18 33 57
-27 63 99
-18 51 75

Example 3 (Two Channels, Two Output Filters):

Image Dimensions:

3 3 2

Height = 3, Width = 3, Channels = 2
Image Matrix (Stacked by Channels):

1 2 3
4 5 6
7 8 9
10 11 12
13 14 15
16 17 18
Filter Dimensions:

2 3 3 2

Filter Channels = 2, Filter Height = 3, Filter Width = 3, Number of Filters = 2

Filter Matrix (Stacked by Channels):

1 1 1
1 1 1
1 1 1
2 2 2
2 2 2
2 2 2
3 3 3
3 3 3
3 3 3
4 4 4
4 4 4
4 4 4
Output Matrices:

Filter 1:
108 171 120
189 297 207
144 225 156

Filter 2:
228 363 256
405 639 447
312 489 340

3.2 Constraints
• 0 ≤ each cell of the input ≤ 1000

• 1 ≤ H, W ≤ 1024

• 1 ≤ K × R × S × C ≤ 4096
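
These bounds allow a rough estimate of on-chip memory needs. The sketch below assumes 4-byte elements (the starter code's element type is not shown here) and uses hypothetical helper names. Under that assumption the whole filter set takes at most 16 KiB, which is below the 48 KB of shared memory per block that CUDA devices typically offer by default. An input tile with its halo can be much larger when a filter is very wide or very tall, so the tile size may need to depend on R and S.

```python
BYTES_PER_ELEMENT = 4  # assumption: 4-byte int or float elements


def filter_bytes(K, R, S, C):
    """Bytes needed to hold the full filter set."""
    return K * R * S * C * BYTES_PER_ELEMENT


def tile_with_halo_bytes(tile_h, tile_w, R, S):
    """Bytes for one channel of an input tile plus the halo an R x S filter needs."""
    return (tile_h + R - 1) * (tile_w + S - 1) * BYTES_PER_ELEMENT


print(filter_bytes(64, 1, 1, 64))             # 16384: worst case, K*R*S*C = 4096
print(tile_with_halo_bytes(16, 16, 3, 3))     # 1296: 18 x 18 cells
print(tile_with_halo_bytes(16, 16, 1, 4095))  # 263040: a 1 x 4095 filter
```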

4 Testing and Submission
• Use the provided tester program to validate your solution.

• Penalty: 2 points will be deducted for execution times greater than twice the average.

• Submit a zip file named with your roll number (e.g., [Link]) that contains a single
.cu file named with your roll number (e.g., [Link]).

• Ensure the output matches the expected format to avoid penalties.

• 0 marks if the code is written sequentially or does not use shared memory.
