CSC 334 – Parallel and Distributed Computing
Instructor: Ms. Muntha Amjad
Lecture# 05: Parallel Algorithms
1
Problem
• Assume we have a 100-inch sandwich to feed 8 people.
• We know how many inches each person wants.
• [6 4 16 10 16 14 2 8]
• How do we cut the sandwich quickly? How much will be left?
• Method 1: cut the sections sequentially: 6 inches first, 4 inches second, 16
inches third, etc.
• Method 2: calculate Prefix sum and cut in parallel
• [6, 10, 26, 36, 52, 66, 68, 76] (24 inches left) 2
Prefix Sum (Scan)
• Given an array X, compute output array Y as follows:
for
Using recursive definition
Y[0] = X[0];
• Implementation: for (i = 1; i < N; i++)
Y[i] = Y[i-1] + X[i];
• The above is an inclusive prefix sum since includes X.
• Fo an exclusive prefix sum perform the summation for . Y[i] excludes X[i], i.e.,
sum from index 0 to i - 1.
3
• It is easy to see that prefix sums can be computed sequentially in O(N) time
Naïve Inclusive Parallel Prefix Sum
• Assign one thread to calculate each element that adds up all X elements
needed for that Y element.
• Implementation:
#pragma omp parallel for
for (i = 0; i < N; i++){
Y[i] = 0;
for (j = 0; j <= i; j++)
Y[i] += X[j];
}
4
Naïve Inclusive Parallel Prefix Sum
10 26 36 52 66 68 76
6 4
10 16 26 10 36 16 52 14 66 2 68 8
6 4 52 14 66 2
10 16 26 10 36 16
6 4 36 16 52 14
10 16 26 10
6 4 26 10 36 16
10 16
6 4 10 26 10
16
6 4
10 16
6 4
5
Naïve Inclusive Parallel Prefix Sum
Observations:
6
Naïve Inclusive Parallel Prefix Sum
Best-Case Parallel Time:
The time is still O(N) — same as the sequential version! You used lots of processors, but got no
speedup. Parallel programming is easy as long as you don’t care about performance.
Parallel Prefix Sum
• How to improve efficiency?
• Build a balanced binary tree on the input data and sweep it to and from the root
• Traverse up from leaves to root building partial sums at internal nodes in the tree
• Root holds sum of all leaves
• Traverse back down the tree building the scan from the partial sums
8
Parallel Prefix Sum: Upward Sweep (Bottom-up)
76
for (i = 0; i < n - 1; i++)
Y[i] = X[i]; 36 40
for (d = 1; d < n; d *= 2) {
#pragma omp parallel for 10 26 30 10
for (int i = 0; i < n; i += 2 * d) {
Y[i + 2 * d - 1] += Y[i + d - 1];
} 6 4 16 10 16 14 2 8
}
Step 6 4 16 10 16 14 2 8
d =1 6 10 16 26 16 30 2 10
d=2 6 10 16 36 16 30 2 40
9
d=4 6 10 16 36 16 30 2 76
Parallel Prefix Sum: Downward Sweep (Top-down)
0
Y[n-1] = 0;
for (d = n / 2; d >= 1; d /= 2) {
#pragma omp parallel for 0 36
for (int i = 0; i < n; i += 2 * d) {
int left = i + d - 1;
int right = i + 2 * d - 1; 0 10 36 66
temp = Y[left];
Y[left] Y[right];
Y[right] temp; 0 6 10 26 36 52 66 68
}
}
Step 6 10 16 36 16 30 2 0
d=4 6 10 16 0 16 30 2 36
d=2 6 0 16 10 16 36 2 66 10
d=1 0 6 10 26 36 52 66 68
Parallel Prefix Sum: Inclusive
for (i = 0; i < n; i++) { X 6 4 16 10 16 14 2 8
Y[i] += X[i];
}
Y 0 6 10 26 36 52 66 68
Updated Y 6 10 26 36 52 66 68 76
11
Parallel Prefix Sum: Inclusive
Observations:
12
Applications of Prefix Sum
• Parallel Prefix Sum has several applications that go way beyond computing the
• sum of array elements. A key primitive in many parallel algorithms to convert
serial computation into parallel computation
• Radix Sort
• String Comparison
• Lexical Analysis
• Stream Compaction
• Polynomial Evaluation
• Solving Recurrences
• Tree Operations
• Histograms
• Assigning space in farmers market
13
• Allocating memory to parallel threads
• Allocating memory buffer for communication channels
• Frequently used for parallel work assignment and resource allocation