Experiment 2
Title: Parallel Computation with OpenMP: Summation Techniques and Matrix-Vector
Multiplication
Aim/Objective:
To illustrate parallel computation techniques using OpenMP for two fundamental algorithms, summation and matrix-vector multiplication, with the aim of enhancing performance through concurrent processing.
Description:
Implement OpenMP-based programs for parallel summation and matrix-vector multiplication,
exploring various OpenMP directives and strategies to optimize computation speed and
scalability.
Pre-Requisites:
Proficiency in a language supporting OpenMP (e.g., C/C++), understanding of OpenMP
directives, familiarity with array and matrix manipulation, and a foundational grasp of parallel
programming concepts including thread management, synchronization, and parallelization
strategies.
Pre-Lab:
1. What is the primary objective of utilizing OpenMP in parallel summation techniques?
The primary objective is to improve computational efficiency and performance by distributing the workload of summation across multiple threads. Key aspects include:
1. Dividing the workload across threads
2. Leveraging multicore processors
3. Simplifying parallel programming
4. Reducing computational bottlenecks: a 'computational bottleneck' is a point in an algorithm where the computational demand is high enough to slow down the overall process
5. Maintaining scalability as the problem size and the number of cores grow
2. Briefly discuss one advantage of using OpenMP for parallel summation techniques
compared to sequential approaches.
Reduced execution time: OpenMP divides the array among threads so that partial sums are computed concurrently and combined with a reduction, which can cut the running time substantially on multicore processors compared to a single sequential loop. A rough way to observe this is sketched below.
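A minimal timing sketch (illustrative only; the array size N and the all-ones data are arbitrary choices, not part of the prescribed lab programs):

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define N 100000000 /* arbitrary large size so the timing is measurable */

int main() {
    long long sum = 0;
    int *arr = malloc(N * sizeof(int));
    for (int i = 0; i < N; i++) arr[i] = 1;

    double start = omp_get_wtime(); /* OpenMP wall-clock timer */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++) {
        sum += arr[i];
    }
    double elapsed = omp_get_wtime() - start;

    printf("sum = %lld, time = %f s\n", sum, elapsed);
    free(arr);
    return 0;
}

Running the same loop without the pragma gives the sequential baseline; the ratio of the two times is the observed speedup.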
3. In the context of matrix-vector multiplication, define the term "data parallelism" and explain
how OpenMP leverages this concept for parallel computation.
Data parallelism refers to the simultaneous execution of the same operation on different pieces of data.
In the context of matrix-vector multiplication, this means distributing the computation of individual
rows of the matrix (each contributing to a single element of the result vector) across multiple threads or
processing units.
How OpenMP leverages data parallelism (a sketch follows this list):
1. Dividing the rows of the matrix among threads, so each thread produces a disjoint set of result elements
2. Parallelizing the outer loop with the #pragma omp parallel for directive
3. Managing workload distribution through the scheduling of loop iterations
4. Reducing overhead by creating one team of threads for the whole loop rather than synchronizing per element
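A minimal sketch of a data-parallel matrix-vector product (illustrative; the 4x4 matrix and all-ones vector are arbitrary test values):

#include <stdio.h>
#include <omp.h>
#define ROWS 4
#define COLS 4

int main() {
    int A[ROWS][COLS] = {{1, 2, 3, 4}, {5, 6, 7, 8},
                         {9, 10, 11, 12}, {13, 14, 15, 16}};
    int x[COLS] = {1, 1, 1, 1};
    int y[ROWS];

    // Each outer iteration writes a distinct y[i], so rows are independent
    // and can be assigned to different threads with no synchronization.
    #pragma omp parallel for
    for (int i = 0; i < ROWS; i++) {
        int dot = 0; // private to each iteration
        for (int j = 0; j < COLS; j++) {
            dot += A[i][j] * x[j];
        }
        y[i] = dot;
    }

    for (int i = 0; i < ROWS; i++) {
        printf("y[%d] = %d\n", i, y[i]);
    }
    return 0;
}

Every thread performs the same dot-product operation on different rows, which is exactly the data-parallel pattern described above.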
4. Explain the role of load balancing in the context of parallel matrix-vector multiplication
using OpenMP, and why it is crucial for optimizing performance.
Load balancing refers to the even distribution of computational tasks among the available threads or processing units in a parallel system. In parallel matrix-vector multiplication, it means giving every thread roughly the same number of rows (or the same total amount of work) so that no thread finishes long before the others.
Why load balancing is crucial for optimizing performance (a sketch follows this list):
Minimizing idle time: threads do not sit at the final barrier waiting for an overloaded thread
Reducing execution time: a parallel loop is only as fast as its slowest thread
Improving scalability: imbalance wastes a larger fraction of the machine as more threads are added
Mitigating (reducing) overhead: balanced chunks avoid excessive rescheduling and synchronization
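One common OpenMP tool for balancing uneven iterations is the schedule clause (an illustrative sketch; the dynamic schedule, chunk size, and simulated per-row costs are arbitrary choices):

#include <stdio.h>
#include <omp.h>
#define ROWS 8

int main() {
    int cost[ROWS] = {1, 1, 1, 1, 8, 8, 8, 8}; // made-up uneven per-row work
    long long result[ROWS];

    // schedule(dynamic, 1): each thread grabs one row at a time, so a thread
    // that finishes a cheap row immediately takes the next available row
    // instead of idling while others work on expensive rows.
    #pragma omp parallel for schedule(dynamic, 1)
    for (int i = 0; i < ROWS; i++) {
        long long acc = 0;
        for (long long k = 0; k < cost[i] * 1000000LL; k++) {
            acc += k % 3; // simulated work proportional to cost[i]
        }
        result[i] = acc;
    }

    printf("result[0] = %lld\n", result[0]);
    return 0;
}

With the default static partition into contiguous chunks, the threads that receive the expensive later rows would dominate the runtime; dynamic scheduling hands rows to whichever thread is free.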
5. Name one key OpenMP directive used in the parallelization of computations. Provide a
brief description of its purpose.
One key OpenMP directive is #pragma omp parallel. It creates a team of threads, and every thread in the team executes the structured block that follows; work-sharing constructs such as #pragma omp for are then placed inside it to divide loop iterations among those threads. A short illustration follows.
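A minimal sketch of the directive in use (the printed message is arbitrary):

#include <stdio.h>
#include <omp.h>

int main() {
    // The block below is executed once by each thread in the team
    // created by #pragma omp parallel.
    #pragma omp parallel
    {
        printf("Hello from thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }
    return 0;
}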
In-Lab:
1. Implement Parallel Summation using OMP - The Array Element Sum Problem, Tree
structure global sum - Parallel-tree-sum.c
Program:
Aim: Implement Parallel Summation using OMP - The Array Element Sum Problem
Source code:
#include <stdio.h>
#include <omp.h>

int main() {
    int n = 10; // Number of elements in the array
    int arr[] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}; // Array elements
    int sum = 0; // Shared variable to store the result

    // Parallel loop with a reduction to compute the sum safely
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++) {
        sum += arr[i]; // Each thread accumulates into its private copy of sum
    }

    printf("The sum of the array elements is: %d\n", sum);
    return 0;
}
Output:
The sum of the array elements is: 55
Program Overview
1. Array initialization: the array arr has 10 elements: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}.
2. Parallelization: the #pragma omp parallel for directive parallelizes the for loop, and the reduction(+:sum) clause ensures the sum is computed correctly in parallel without race conditions: each thread accumulates into a private copy of sum, and the copies are combined when the loop ends.
3. Goal: compute the sum of the elements in the array, i.e., 1 + 2 + 3 + ... + 10 = 55.
Tracing:
------------------------------------------------------------------------------------
Aim: Tree structure global sum
Source code:
#include <stdio.h>
#include <omp.h>

int main() {
    int n = 16; // Number of elements in the array
    int arr[] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16};
    int sum = 0;             // Variable to store the global sum
    omp_set_num_threads(4);  // Set the number of threads to 4
    int num_threads;
    int local_sums[4] = {0}; // Array to hold partial sums for each thread

    // Parallel region with thread-local partial sums
    #pragma omp parallel
    {
        int tid = omp_get_thread_num();             // Thread ID
        int chunk_size = n / omp_get_num_threads(); // Elements per thread
        int start = tid * chunk_size;               // Start index for this thread
        int end = start + chunk_size;               // End index for this thread

        // Compute the partial sum for this thread
        for (int i = start; i < end; i++) {
            local_sums[tid] += arr[i];
        }

        // Synchronize threads before the final reduction
        #pragma omp barrier

        // Combine partial sums in a tree-like fashion (done by one thread)
        #pragma omp single
        {
            num_threads = omp_get_num_threads();
            for (int step = 1; step < num_threads; step *= 2) {
                for (int i = 0; i + step < num_threads; i += 2 * step) {
                    local_sums[i] += local_sums[i + step];
                }
            }
        }
    }

    // The final sum is stored in local_sums[0]
    sum = local_sums[0];
    printf("The sum of the array elements is: %d\n", sum);
    return 0;
}
Output:
The sum of the array elements is: 136
Explanation
1. Initialization:
   a. The array has 16 elements.
   b. The number of threads is set to 4, so each thread handles 4 elements.
2. Thread Work:
   a. Each thread computes the partial sum of its assigned chunk of the array:
      i. Thread 0: arr[0..3]
      ii. Thread 1: arr[4..7]
      iii. Thread 2: arr[8..11]
      iv. Thread 3: arr[12..15]
3. Tree-Structured Reduction:
   a. After all threads compute their local sums, the results are combined in a tree-like fashion:
      i. Step 1: combine sums from adjacent threads (0+1, 2+3).
      ii. Step 2: combine the results of the previous step (0+2).
   b. This reduction pattern minimizes contention and scales well for large arrays.
Tracing
-------------------------------------------------------------------------------------
Aim: Parallel-Tree-Sum.c
Source code:
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define SIZE 16 // Size of the array (power of 2 for simplicity)

int main() {
    int array[SIZE];
    int i;

    // Initialize the array with values 1 to SIZE
    for (i = 0; i < SIZE; i++) {
        array[i] = i + 1;
    }

    printf("Array elements: ");
    for (i = 0; i < SIZE; i++) {
        printf("%d ", array[i]);
    }
    printf("\n");

    int sum = 0;  // Final global sum
    int step = 1; // Step size for tree reduction (shared by all threads)

    // Perform the tree-based parallel sum
    #pragma omp parallel
    {
        while (step < SIZE) {
            // At each level, element i absorbs its partner at distance 'step'
            #pragma omp for
            for (i = 0; i < SIZE; i += 2 * step) {
                array[i] += array[i + step];
            }
            // Exactly one thread advances the shared step; the implicit
            // barrier at the end of 'single' keeps every thread at the same
            // tree level (letting all threads update step would be a race).
            #pragma omp single
            step *= 2;
        }
    }

    // The total sum is stored in the first element
    sum = array[0];
    printf("Total Sum: %d\n", sum);
    return 0;
}
Output:
For SIZE = 16, the array elements are [1, 2, 3, ..., 16], so the expected output is:
Array elements: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
Total Sum: 136
Explanation of the Code
The program uses a binary tree structure to progressively reduce pairs of elements: at the first level each even-indexed element absorbs its neighbour one position away, at the next level every fourth element absorbs the element two positions away, and so on, until the entire sum ends up in array[0]. For SIZE elements this takes log2(SIZE) levels, here 4.
Tracing
2. Implement a program to demonstrate Parallel prefix sum using OpenMP - OMP Parallel
prefix sumfinal.c
The prefix sum (or scan) is a common operation in parallel computing, in which each element of an array is replaced by the sum of all the elements before it, including the element itself. For example, the prefix sum of [1, 2, 3, 4] is [1, 3, 6, 10].
• Program:
#include <stdio.h>
#include <omp.h>

#define SIZE 16 // Size of the array (power of 2 for simplicity)

int main() {
    int arr[SIZE];
    int prefix_sum[SIZE]; // Array to hold the prefix sum
    int temp[SIZE];       // Scratch buffer so reads and writes do not race
    int i;

    // Initialize the array with values 1 to SIZE
    for (i = 0; i < SIZE; i++) {
        arr[i] = i + 1;
    }

    // Print the original array
    printf("Original array: ");
    for (i = 0; i < SIZE; i++) {
        printf("%d ", arr[i]);
    }
    printf("\n");

    // Prefix sum computation using a parallel, tree-like (Hillis-Steele) approach
    #pragma omp parallel
    {
        int step;

        // Copy the input array to the prefix sum array
        #pragma omp for
        for (i = 0; i < SIZE; i++) {
            prefix_sum[i] = arr[i];
        }

        // At each step, element i adds the value 'step' positions behind it.
        // Writing into temp and copying back avoids reading a value that
        // another thread has already updated within the same step.
        for (step = 1; step < SIZE; step *= 2) {
            #pragma omp for
            for (i = 0; i < SIZE; i++) {
                temp[i] = (i >= step) ? prefix_sum[i] + prefix_sum[i - step]
                                      : prefix_sum[i];
            }
            #pragma omp for
            for (i = 0; i < SIZE; i++) {
                prefix_sum[i] = temp[i];
            }
        }
    }

    // Print the prefix sum array
    printf("Prefix sum: ");
    for (i = 0; i < SIZE; i++) {
        printf("%d ", prefix_sum[i]);
    }
    printf("\n");
    return 0;
}
Final Output
Original array: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
Prefix sum: 1 3 6 10 15 21 28 36 45 55 66 78 91 105 120 136
Explanation of Output
Each element of the prefix sum array is the running total of the original array up to and including that position; the final element, 136, equals the sum of 1 through 16.