TEAM MEMBERS:
Aryan Sinha 20214530
Ayushi Singh 20214174
Aviral Gupta 20214508
Emad Shoaib 20214506
Ganesh Patidar 20214061
Gautham Krishna Jayasurya 20214532
Harshika Singh 20214234
Hruday Vinayak 20214514
Khushbu Yadav 20214111
Project Report: Image Processing with CUDA
1. Problem Statement
Image processing tasks, such as filtering, edge detection, and convolution, are
computationally intensive when processed on CPUs. Traditional serial processing struggles
to meet real-time requirements in applications like medical imaging, surveillance, and
autonomous driving. This project explores GPU acceleration using CUDA, demonstrating
how parallel computing improves performance in image processing tasks.
2. Overview
This project aims to:
● Implement and accelerate common image processing tasks using CUDA.
● Compare the performance of CPU-based and GPU-based processing.
● Optimize CUDA implementations using memory coalescing, shared memory, and
warp-level optimizations.
● Evaluate speedup using benchmarking tools like NVIDIA Nsight Compute.
3. Dataset & Data Source
We use publicly available image datasets for testing our CUDA-based image processing
algorithms. Some suitable sources include:
● COCO Dataset (Common Objects in Context): [Link]
● ImageNet Dataset: [Link]
● BSDS500 (Berkeley Segmentation Dataset): [Link]
● Custom Images: Captured or generated synthetic images.
4. Dataset Breakdown
The dataset consists of images in various resolutions for benchmarking. A typical dataset
breakdown:
● Training Set: 70% (Used to develop and test algorithms).
● Validation Set: 15% (Used to tune parameters).
● Testing Set: 15% (Used for final evaluation).
● Image Types:
○ Grayscale & RGB images.
○ Resolution: 128x128, 256x256, 512x512, 1024x1024.
○ Various image types: natural scenes, medical images, and textures.
5. Model Architecture (CUDA Implementation)
5.1 CUDA Parallelization Approach
We use CUDA to parallelize pixel-wise image operations, enabling thousands of threads
to run concurrently. The key architectural elements include:
● Thread Hierarchy
○ Grid: Entire image.
○ Blocks: Subsections of the image.
○ Threads: Individual pixels.
● Memory Optimization Strategies
○ Global Memory: Used for large datasets.
○ Shared Memory: Faster access for intra-block data sharing.
○ Texture Memory: Used for 2D spatial locality optimization.
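The thread hierarchy above maps naturally onto per-pixel work: each thread computes one output pixel from its block and grid indices. As a minimal sketch (buffer names and launch parameters here are illustrative assumptions, not the exact project code), an RGB-to-grayscale kernel looks like this:

```cuda
// One thread per pixel: blockIdx/blockDim/threadIdx give the pixel coordinate.
__global__ void rgbToGray(const unsigned char *rgb, unsigned char *gray,
                          int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;  // column
    int y = blockIdx.y * blockDim.y + threadIdx.y;  // row
    if (x >= width || y >= height) return;          // guard partial edge blocks

    int idx = y * width + x;
    unsigned char r = rgb[3 * idx + 0];
    unsigned char g = rgb[3 * idx + 1];
    unsigned char b = rgb[3 * idx + 2];
    gray[idx] = (unsigned char)(0.299f * r + 0.587f * g + 0.114f * b);
}

// Host-side launch: the grid covers the whole image, blocks cover subsections.
// dim3 block(16, 16);
// dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
// rgbToGray<<<grid, block>>>(d_rgb, d_gray, width, height);
```

The bounds check is needed because image dimensions are rarely exact multiples of the block size, so threads in the last row/column of blocks may fall outside the image.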
5.2 Implemented CUDA Kernels
● Image Filtering (Gaussian Blur, Sharpening, Edge Detection - Sobel & Prewitt)
● Histogram Equalization (Contrast enhancement)
● Image Convolution (Using custom kernels)
● Thresholding & Segmentation (Otsu's method for object segmentation)
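As one concrete example of the kernels listed above, a Sobel edge-detection kernel that uses shared memory for intra-block data sharing might look like the following. This is a simplified sketch under stated assumptions (a fixed 16x16 tile, a 1-pixel clamped halo), not the project's exact implementation:

```cuda
#define TILE 16

// Sobel edge detection on a grayscale image. Each block stages a
// (TILE+2) x (TILE+2) tile (input tile plus 1-pixel halo) in shared
// memory so neighboring pixels are read from fast on-chip storage.
__global__ void sobel(const unsigned char *in, unsigned char *out,
                      int width, int height)
{
    __shared__ unsigned char tile[TILE + 2][TILE + 2];

    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;

    // Cooperatively load the tile plus halo, clamping at the image border.
    for (int dy = threadIdx.y; dy < TILE + 2; dy += TILE)
        for (int dx = threadIdx.x; dx < TILE + 2; dx += TILE) {
            int sx = min(max((int)(blockIdx.x * TILE) + dx - 1, 0), width - 1);
            int sy = min(max((int)(blockIdx.y * TILE) + dy - 1, 0), height - 1);
            tile[dy][dx] = in[sy * width + sx];
        }
    __syncthreads();  // all loads must finish before any thread reads the tile

    if (x >= width || y >= height) return;

    int tx = threadIdx.x + 1, ty = threadIdx.y + 1;
    // 3x3 Sobel operators for horizontal and vertical gradients.
    int gx = -tile[ty-1][tx-1] + tile[ty-1][tx+1]
             - 2*tile[ty][tx-1] + 2*tile[ty][tx+1]
             - tile[ty+1][tx-1] + tile[ty+1][tx+1];
    int gy = -tile[ty-1][tx-1] - 2*tile[ty-1][tx] - tile[ty-1][tx+1]
             + tile[ty+1][tx-1] + 2*tile[ty+1][tx] + tile[ty+1][tx+1];
    int mag = min(abs(gx) + abs(gy), 255);  // |gx|+|gy| approximates the magnitude
    out[y * width + x] = (unsigned char)mag;
}
```

The same tiling pattern applies to Gaussian blur and general convolution: only the coefficients and halo width change with the filter kernel size.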
6. Performance Analysis & Results
The GPU-based implementation is compared against a CPU-based approach using
OpenCV. Key performance metrics include:
● Execution Time (CPU vs. GPU)
● Speedup Factor
● Memory Usage
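GPU execution times like those reported here are typically measured with CUDA events, which timestamp work on the device itself rather than on the host. A minimal host-side sketch (kernel name, launch configuration, and buffers are placeholders):

```cuda
// Timing a kernel launch with CUDA events.
cudaEvent_t start, stop;
cudaEventCreate(&start);
cudaEventCreate(&stop);

cudaEventRecord(start);
myKernel<<<grid, block>>>(d_in, d_out, width, height);
cudaEventRecord(stop);

cudaEventSynchronize(stop);              // wait until the kernel has finished
float ms = 0.0f;
cudaEventElapsedTime(&ms, start, stop);  // elapsed device time in milliseconds
printf("GPU time: %.3f ms\n", ms);

cudaEventDestroy(start);
cudaEventDestroy(stop);
```

Because kernel launches are asynchronous, timing with host-side clocks alone would measure only the launch overhead; events record on the GPU's own timeline.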
Results:
Operation                          CPU Time (ms)   GPU Time (ms)   Speedup
Gaussian Blur (512x512)            120             8               15x
Sobel Edge Detection (1024x1024)   250             14              18x
Histogram Equalization (256x256)   90              6               15x
Key Insights:
● CUDA significantly reduces processing time.
● Larger images benefit more from parallelization.
● Optimizations like shared memory usage further enhance performance.
7. Conclusion & Future Scope
Conclusion
● GPU-based image processing using CUDA outperforms CPU-based methods by a
significant factor.
● CUDA parallelism is highly effective for pixel-based operations like convolution
and filtering.
● Shared memory and texture memory optimization further boost efficiency.
Future Scope
● Extend to Deep Learning-based image processing (CNNs, GANs).
● Implement real-time applications like video processing and object detection.
● Explore TensorRT for further optimization.