0% found this document useful (0 votes)
30 views4 pages

Parallel Computing Project

This project report discusses the implementation of image processing tasks using CUDA to enhance performance through GPU acceleration. It compares CPU and GPU processing, detailing the architecture, dataset, and performance results, which show significant speed improvements for various image operations. The conclusion highlights the effectiveness of CUDA for pixel-based operations and suggests future work in deep learning and real-time applications.

Uploaded by

Siddhant Singh
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
30 views4 pages

Parallel Computing Project

This project report discusses the implementation of image processing tasks using CUDA to enhance performance through GPU acceleration. It compares CPU and GPU processing, detailing the architecture, dataset, and performance results, which show significant speed improvements for various image operations. The conclusion highlights the effectiveness of CUDA for pixel-based operations and suggests future work in deep learning and real-time applications.

Uploaded by

Siddhant Singh
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

TEAM:

TEAM MEMBERS:

Aryan Sinha 20214530

Ayushi Singh 20214174

Aviral Gupta 20214508

Emad Shoaib 20214506

Ganesh Patidar 20214061

Gautham Krishna Jayasurya 20214532

Harshika SINGH 20214234

Hruday Vinayak 20214514

Khushbu Yadav 20214111

Project Report: Image Processing with CUDA

1. Problem Statement
Image processing tasks, such as filtering, edge detection, and convolution, are
computationally intensive when processed on CPUs. Traditional serial processing struggles
to meet real-time requirements in applications like medical imaging, surveillance, and
autonomous driving. This project explores GPU acceleration using CUDA, demonstrating
how parallel computing improves performance in image processing tasks.

2. Overview
This project aims to:

●​ Implement and accelerate common image processing tasks using CUDA.


●​ Compare the performance difference between CPU and GPU-based processing.
●​ Optimize CUDA implementations using memory coalescing, shared memory, and
warp-level optimizations.
●​ Evaluate speedup using benchmarking tools like NVIDIA Nsight Compute.

3. Dataset & Data Source


We use publicly available image datasets for testing our CUDA-based image processing
algorithms. Some suitable sources include:

●​ COCO Dataset (Common Objects in Context): [Link]


●​ ImageNet Dataset: [Link]
●​ BSDS500 (Berkeley Segmentation Dataset):
[Link]
l
●​ Custom Images: Captured or generated synthetic images.

4. Dataset Breakdown
The dataset consists of images in various resolutions for benchmarking. A typical dataset
breakdown:

●​ Training Set: 70% (Used to develop and test algorithms).


●​ Validation Set: 15% (Used to tune parameters).
●​ Testing Set: 15% (Used for final evaluation).
●​ Image Types:
○​ Grayscale & RGB images.
○​ Resolution: 128x128, 256x256, 512x512, 1024x1024.
○​ Various image types: natural scenes, medical images, and textures.

5. Model Architecture (CUDA Implementation)


5.1 CUDA Parallelization Approach

We use CUDA to parallelize pixel-wise image operations, enabling thousands of threads


to run concurrently. The key architectural elements include:

●​ Thread Hierarchy​

○​ Grid: Entire image.


○​ Blocks: Subsections of the image.
○​ Threads: Individual pixels.
●​ Memory Optimization Strategies​

○​ Global Memory: Used for large datasets.


○​ Shared Memory: Faster access for intra-block data sharing.
○​ Texture Memory: Used for 2D spatial locality optimization.

5.2 Implemented CUDA Kernels

●​ Image Filtering (Gaussian Blur, Sharpening, Edge Detection - Sobel & Prewitt)
●​ Histogram Equalization (Contrast enhancement)
●​ Image Convolution (Using custom kernels)
●​ Thresholding & Segmentation (Otsu's method for object segmentation)

6. Performance Analysis & Results


The GPU-based implementation is compared against a CPU-based approach using
OpenCV. Key performance metrics include:

●​ Execution Time (CPU vs. GPU)


●​ Speedup Factor
●​ Memory Usage

Results:

Operation CPU Time (ms) GPU Time (ms) Speedu


p

Gaussian Blur (512x512) 120 8 15x

Sobel Edge Detection 250 14 18x


(1024x1024)

Histogram Equalization (256x256) 90 6 15x

Key Insights:

●​ CUDA significantly reduces processing time.


●​ Larger images benefit more from parallelization.
●​ Optimizations like shared memory usage further enhance performance.

7. Conclusion & Future Scope


Conclusion
●​ GPU-based image processing using CUDA outperforms CPU-based methods by a
significant factor.
●​ CUDA parallelism is highly effective for pixel-based operations like convolution
and filtering.
●​ Shared memory and texture memory optimization further boost efficiency.

Future Scope

●​ Extend to Deep Learning-based image processing (CNNs, GANs).


●​ Implement real-time applications like video processing and object detection.
●​ Explore TensorRT for further optimization.

You might also like