Parallel Computational Algorithms
1. Prefix Computation
• Parallel algorithms organize an application’s computational work so that multiple parts of the workload can be performed concurrently.
• This can reduce the time to solution and increase performance.
• A prefix-sum algorithm processes an array to compute, at each index, the cumulative sum of all elements up to that index.
• Mathematically, the prefix sum array P of A is defined as
P[i] = A[0] + A[1] + ⋯ + A[i] for all i ∈ [0, n−1], where n is the size of the array.
Example:
Input: A = [1, 2, 3, 4, 5]
Output: Prefix Sums: P = [1, 3, 6, 10, 15]
Range sum: Sum(L, R) = Prefix(R) − Prefix(L−1)
Sum(2, 4) = 15 − 3 = 12
Algorithm Steps:
• Initialize P[0] = A[0].
• For each subsequent element i in A, set P[i] = P[i−1] + A[i].
• This ensures that P[i] holds the cumulative sum up to index i.
Applications:
1. Cumulative sums in data analysis.
2. Parallel algorithms for GPUs and multi-threaded systems.
3. Range queries in segment trees.
4. Scan operations in functional programming.
5. Image processing (integral image computations).
• Time Complexity: O(n) (linear time), where n is the size of the input array. Note: building P costs O(n); each range-sum query afterwards costs O(1).
• Space Complexity: O(n) for the output array P.
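The recurrence P[i] = P[i−1] + A[i] is serial as written, but prefix sums parallelize well with a blocked two-pass scheme: each thread scans its own block, then block offsets are propagated. A minimal C/OpenMP sketch (the function name parallel_prefix and the thread cap are illustrative, not from the source):

```c
#include <stdio.h>
#include <omp.h>

#define MAX_THREADS 256   /* assumed upper bound on thread count */

/* Blocked two-pass parallel inclusive prefix sum:
 * pass 1: each thread scans its own block;
 * pass 2: each block adds the total of all preceding blocks.
 * Compile with: gcc -fopenmp prefix_sum.c */
void parallel_prefix(const int *a, long long *p, int n) {
    long long block_total[MAX_THREADS + 1] = {0};

    #pragma omp parallel
    {
        int t = omp_get_thread_num();
        int nt = omp_get_num_threads();
        int lo = (int)((long long)n * t / nt);
        int hi = (int)((long long)n * (t + 1) / nt);

        long long sum = 0;                        /* pass 1: local scan */
        for (int i = lo; i < hi; i++) { sum += a[i]; p[i] = sum; }
        block_total[t + 1] = sum;

        #pragma omp barrier
        #pragma omp single
        for (int b = 1; b <= nt; b++)             /* exclusive scan of block totals */
            block_total[b] += block_total[b - 1];
        /* implicit barrier at the end of the single region */

        for (int i = lo; i < hi; i++)             /* pass 2: add block offset */
            p[i] += block_total[t];
    }
}

int main(void) {
    int a[] = {1, 2, 3, 4, 5};
    long long p[5];
    parallel_prefix(a, p, 5);
    for (int i = 0; i < 5; i++) printf("%lld ", p[i]);  /* prints 1 3 6 10 15 */
    printf("\n");
    return 0;
}
```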
Examples:
• A = [2, 4, 5, 6, 10, 20, 30]
• B = [2, 4, 6, 8, 10]
• C = [3, 4, 8, 9, 14, 23, 25]
• D = [3, 1, 4, 1, 5, 9, 2]
• Prefix sum of D = [3, 4, 8, 9, 14, 23, 25] (the same as C above).
• Indexing starts from 0.
• Range sum: Sum(L, R) = Prefix(R) − Prefix(L−1).
2D (Matrix):
A 3×3 matrix in row-major order and its running (flattened) prefix sum:
Location:  0,0  0,1  0,2  1,0  1,1  1,2  2,0  2,1  2,2
Index:       0    1    2    3    4    5    6    7    8
Element:     7    2    1    3    5    2    4    6    8
Sum:         7    9   10   13   18   20   24   30   38
2. Matrix-Vector Multiplication
• Matrix-vector multiplication is a fundamental operation used in linear algebra and machine learning.
• For an m×n matrix A and a vector x of size n, the product y = Ax has entries y[i] = A[i][0]·x[0] + ⋯ + A[i][n−1]·x[n−1]; each entry is an independent dot product, which makes the operation naturally parallel.
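As a sketch of that row-level parallelism, the following C/OpenMP fragment (sizes and values are illustrative, not from the source) assigns rows of A to threads:

```c
#include <stdio.h>
#include <omp.h>

/* Row-parallel matrix-vector product y = A x: every row's dot
 * product is independent, so rows are split across threads.
 * Compile with: gcc -fopenmp matvec.c */
#define M 3
#define N 3

int main(void) {
    double a[M][N] = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
    double x[N] = {1, 0, -1};
    double y[M];

    #pragma omp parallel for
    for (int i = 0; i < M; i++) {
        double s = 0.0;
        for (int j = 0; j < N; j++)   /* dot product of row i with x */
            s += a[i][j] * x[j];
        y[i] = s;
    }

    for (int i = 0; i < M; i++)
        printf("y[%d] = %g\n", i, y[i]);  /* expected: -2, -2, -2 */
    return 0;
}
```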
Matrix-Matrix Multiplication
• Matrix-matrix multiplication involves multiplying two matrices A (of size m×n) and B (of size n×p). The result is a matrix C with dimensions m×p.
• Steps:
1. Take each row of the first matrix A and each column of the second matrix B.
2. Compute the dot product between the row of A and the column of B to get an entry in the resulting matrix C.
3. Repeat this for every row-column pair.
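A minimal C/OpenMP sketch of these steps (sizes and values are illustrative) parallelizes the outer loop over rows of C, since every (i, j) entry is independent:

```c
#include <stdio.h>
#include <omp.h>

/* Parallel matrix-matrix product C = A * B (A is M x N, B is N x P).
 * Compile with: gcc -fopenmp matmul.c */
#define M 2
#define N 3
#define P 2

int main(void) {
    double a[M][N] = {{1, 2, 3}, {4, 5, 6}};
    double b[N][P] = {{7, 8}, {9, 10}, {11, 12}};
    double c[M][P];

    #pragma omp parallel for
    for (int i = 0; i < M; i++)
        for (int j = 0; j < P; j++) {
            double s = 0.0;
            for (int k = 0; k < N; k++)   /* dot product of row i of A and column j of B */
                s += a[i][k] * b[k][j];
            c[i][j] = s;
        }

    for (int i = 0; i < M; i++) {
        for (int j = 0; j < P; j++)
            printf("%6g ", c[i][j]);      /* expected: 58 64 / 139 154 */
        printf("\n");
    }
    return 0;
}
```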
3. Gaussian Elimination
• The Gaussian elimination algorithm is used for solving systems of linear equations.
• The basic concept is to subtract some scalar of an equation from another equation to
eliminate an unknown.
• It transforms a given system into an upper triangular matrix (row echelon form)
through a series of row operations, followed by back substitution to find the
solution.
• Example: x+y+z=2, x+2y+3z =5, 2x+3y+4z = 11
Forward Elimination
• Convert the matrix into an upper triangular form by eliminating elements below the pivot element (diagonal element).
• Perform row operations to make entries below the pivot zero.
Backward Substitution
• Solve for variables starting from the last row, moving upwards.
Parallel Computing in Gaussian Elimination
• The sequential algorithm performs O(n³) operations for an n×n matrix.
• It is computationally expensive for large datasets.
• Parallel computing therefore improves performance by distributing tasks across multiple processors.
Parallelization Strategies:
1. Row-wise Parallelism
• Each row operation is independent and can be computed simultaneously by different processors (a sketch follows this list).
• Suitable for shared-memory systems where threads access the same matrix.
2. Column-wise Parallelism
• Parallelize operations involving columns to perform computations on pivot elements simultaneously.
• Often used in distributed-memory systems.
3. Block-based Decomposition
• Divide the matrix into smaller submatrices (blocks) and operate on them concurrently.
• Suitable for architectures like GPUs, which handle matrix blocks effectively.
4. Pipeline Parallelism
• Break down tasks into stages and assign each stage to a processor.
• Overlaps computation and communication for better utilization.
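As an illustration of the row-wise strategy, here is a minimal C/OpenMP sketch of Gaussian elimination. It omits pivoting for brevity, so it assumes nonzero pivots, and the example system is illustrative rather than the one above:

```c
#include <stdio.h>
#include <omp.h>

/* Row-wise parallel Gaussian elimination: for each pivot column k,
 * the updates of rows k+1..N-1 are independent and shared among
 * threads. Back substitution stays sequential (it is only O(n^2)).
 * Compile with: gcc -fopenmp gauss.c */
#define N 3

int main(void) {
    double a[N][N + 1] = {          /* augmented matrix [A | b] */
        { 2,  1, -1,   8},
        {-3, -1,  2, -11},
        {-2,  1,  2,  -3}
    };
    double x[N];

    /* forward elimination */
    for (int k = 0; k < N; k++) {
        #pragma omp parallel for    /* rows below the pivot in parallel */
        for (int i = k + 1; i < N; i++) {
            double f = a[i][k] / a[k][k];
            for (int j = k; j <= N; j++)
                a[i][j] -= f * a[k][j];
        }
    }

    /* backward substitution, last row upward */
    for (int i = N - 1; i >= 0; i--) {
        x[i] = a[i][N];
        for (int j = i + 1; j < N; j++)
            x[i] -= a[i][j] * x[j];
        x[i] /= a[i][i];
    }

    for (int i = 0; i < N; i++)
        printf("x[%d] = %g\n", i, x[i]);  /* expected: 2, 3, -1 */
    return 0;
}
```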
4. Vector Addition and Dot Product
• Due to the computational intensity of these operations for large datasets, parallel computing
techniques are employed to enhance performance.
Problem Statement
• Given two vectors A and B of size n, compute their sum C, where C[i] = A[i] + B[i] for all i.
Parallel Algorithm
1. Input: Two vectors A and B of size n.
2. Output: Resultant vector C of size n.
3. Parallel Steps:
1. Partition the vectors into p segments, where p is the number of processors available.
2. Assign each processor a subrange of indices and compute the element-wise sums for that subrange.
3. Aggregate the results into the final output vector.
• Discuss the complexity of the algorithm.
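A minimal C/OpenMP sketch of these steps (values illustrative); the work-sharing loop performs the partitioning and assignment of subranges automatically:

```c
#include <stdio.h>
#include <omp.h>

/* Parallel vector addition c = a + b: every element is independent,
 * so the loop is split evenly across threads.
 * Compile with: gcc -fopenmp vecadd.c */
#define N 8

int main(void) {
    double a[N] = {1, 2, 3, 4, 5, 6, 7, 8};
    double b[N] = {8, 7, 6, 5, 4, 3, 2, 1};
    double c[N];

    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    for (int i = 0; i < N; i++)
        printf("%g ", c[i]);   /* prints 9 for every element */
    printf("\n");
    return 0;
}
```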
Dot Product
Problem Statement
• Given two vectors A and B of size n, compute their dot product s = A[0]·B[0] + A[1]·B[1] + ⋯ + A[n−1]·B[n−1].
Parallel Algorithm
1. Input: Two vectors A and B of size n.
2. Output: Scalar result s.
3. Parallel Steps:
1. Partition the vectors into segments.
2. Each processor computes partial products and sums them locally.
3. Perform a reduction operation to combine the partial sums into the final result.
• Discuss the complexity of the algorithm.
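A minimal C/OpenMP sketch (values illustrative); the reduction(+:s) clause implements both the local partial sums and the final combining step:

```c
#include <stdio.h>
#include <omp.h>

/* Parallel dot product: each thread accumulates a private partial
 * sum; reduction(+:s) combines the partial sums at the end.
 * Compile with: gcc -fopenmp dot.c */
#define N 8

int main(void) {
    double a[N] = {1, 2, 3, 4, 5, 6, 7, 8};
    double b[N] = {1, 1, 1, 1, 1, 1, 1, 1};
    double s = 0.0;

    #pragma omp parallel for reduction(+:s)
    for (int i = 0; i < N; i++)
        s += a[i] * b[i];

    printf("dot = %g\n", s);   /* expected: 36 */
    return 0;
}
```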
5. Parallel Computation of π
• Parallel computation of π (pi) involves distributing the computation across multiple processors or threads to speed up the process.
• Several algorithms can be adapted for parallel computation, including:
• Monte Carlo Method
• Gregory–Leibniz Series
• Bailey–Borwein–Plouffe (BBP) Formula
• Gauss–Legendre Algorithm
• Chudnovsky Algorithm
• Time Complexity: O(N·K) for Monte Carlo estimation with K rounds of N random points each.
• Auxiliary Space: O(1).
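A minimal C/OpenMP sketch of the Monte Carlo method (constants illustrative; it assumes the POSIX rand_r generator so each thread has its own random stream):

```c
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

/* Monte Carlo estimation of pi: sample random points in the unit
 * square; the fraction landing inside the quarter circle tends to
 * pi/4. Threads count hits independently; reduction(+:inside)
 * combines the counts. Compile with: gcc -fopenmp mc_pi.c */
int main(void) {
    const long long N = 10000000;   /* total number of random points */
    long long inside = 0;

    #pragma omp parallel reduction(+:inside)
    {
        unsigned int seed = 1234u + omp_get_thread_num();  /* per-thread seed */
        #pragma omp for
        for (long long i = 0; i < N; i++) {
            double x = (double)rand_r(&seed) / RAND_MAX;
            double y = (double)rand_r(&seed) / RAND_MAX;
            if (x * x + y * y <= 1.0)
                inside++;
        }
    }
    printf("pi estimate = %f\n", 4.0 * (double)inside / (double)N);
    return 0;
}
```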
Bailey–Borwein–Plouffe (BBP) Formula
• A digit-extraction formula for π:
π = Σ_{k=0..∞} 16^(−k) · (4/(8k+1) − 2/(8k+4) − 1/(8k+5) − 1/(8k+6))
• Each thread computes the sum for a specific range of k, and the results are combined.
• Discuss the complexity of the algorithm.
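A minimal C/OpenMP sketch (constants illustrative); note that it sums the series directly in double precision, rather than performing the true digit extraction, which requires modular arithmetic:

```c
#include <stdio.h>
#include <math.h>
#include <omp.h>

/* Parallel evaluation of the BBP series: terms are independent, so
 * the k range is split across threads and the partial sums are
 * combined by the reduction. Each term adds more than one hex digit
 * of accuracy. Compile with: gcc -fopenmp bbp.c -lm */
int main(void) {
    double pi = 0.0;

    #pragma omp parallel for reduction(+:pi)
    for (int k = 0; k < 20; k++)
        pi += pow(16.0, -k) * (4.0 / (8 * k + 1) - 2.0 / (8 * k + 4)
                             - 1.0 / (8 * k + 5) - 1.0 / (8 * k + 6));

    printf("pi = %.15f\n", pi);
    return 0;
}
```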
Trapezoidal Integration
• Trapezoidal integration is a numerical method used to approximate the definite
integral of a function.
• It divides the integration interval into smaller subintervals and approximates the
area under the curve using trapezoids.
Formula
For a function f(x) defined on the interval [a, b], the Trapezoidal Rule with n subintervals of width h = (b − a)/n is defined as:
∫[a,b] f(x) dx ≈ (h/2) · [f(a) + 2·(f(a+h) + f(a+2h) + ⋯ + f(a+(n−1)h)) + f(b)]
Implementation Steps:
1. Divide the interval [a, b] into n subintervals.
2. Calculate the step size h = (b − a)/n.
3. Evaluate the function at the endpoints and intermediate points.
4. Sum the areas of the trapezoids formed.
Example:
• Compute the integral of f(x) = x² on [0, 1] using 4 subintervals (n = 4). With h = 0.25, the rule gives (0.25/2)·[0 + 2(0.0625 + 0.25 + 0.5625) + 1] = 0.34375, against the exact value 1/3.
• Assignment problem: Use a parallel algorithm to compute the trapezoidal integration of f(x) = x² over the interval [0, 2] with two subintervals.
Task 1:
• Using the trapezoidal rule, calculate the integral of f(x) = x³ over the interval [1, 3] using four subintervals in parallel. Show the computations for each segment.
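A minimal C/OpenMP sketch of the parallel trapezoidal rule, using the f(x) = x² on [0, 1] example above:

```c
#include <stdio.h>
#include <omp.h>

/* Parallel trapezoidal rule: the interior function evaluations are
 * independent, so the loop is shared among threads and combined
 * with a reduction. Compile with: gcc -fopenmp trap.c */
double f(double x) { return x * x; }   /* integrand; swap in any f */

double trapezoid(double a, double b, int n) {
    double h = (b - a) / n;
    double sum = 0.5 * (f(a) + f(b));  /* endpoint terms */

    #pragma omp parallel for reduction(+:sum)
    for (int i = 1; i < n; i++)        /* interior points */
        sum += f(a + i * h);

    return h * sum;
}

int main(void) {
    printf("%f\n", trapezoid(0.0, 1.0, 4));   /* x^2 on [0,1], n=4 -> 0.34375 */
    return 0;
}
```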
Mandelbrot set computation
To compute the Mandelbrot set in parallel:
• Identify which parts of the computation can be calculated independently (each pixel's escape-time iteration depends only on its own coordinates).
• Split the work across multiple cores.
• Use dynamic scheduling instead of static scheduling: pixels near the set need many more iterations than pixels that escape quickly, so static partitions are badly load-imbalanced (see the sketch below).
• For an example implementation of the algorithm, refer to the following link:
https://github.com/philipzhux/parallel-mandelbrot-compute/tree/master/src
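Independently of that repository, a minimal C/OpenMP sketch of the idea (resolution and coordinate-mapping constants are illustrative):

```c
#include <stdio.h>
#include <omp.h>

/* Escape-time Mandelbrot computation: each pixel iterates
 * z = z^2 + c until |z| > 2 or MAX_ITER is reached. Rows are
 * distributed with schedule(dynamic) because their costs are very
 * uneven. Compile with: gcc -fopenmp mandel.c */
#define W 80
#define H 40
#define MAX_ITER 1000

int main(void) {
    static int iters[H][W];

    #pragma omp parallel for schedule(dynamic)
    for (int py = 0; py < H; py++) {
        for (int px = 0; px < W; px++) {
            double cr = -2.0 + 3.0 * px / W;   /* map pixel to [-2,1] x [-1.5,1.5] */
            double ci = -1.5 + 3.0 * py / H;
            double zr = 0.0, zi = 0.0;
            int k = 0;
            while (k < MAX_ITER && zr * zr + zi * zi <= 4.0) {
                double t = zr * zr - zi * zi + cr;
                zi = 2.0 * zr * zi + ci;
                zr = t;
                k++;
            }
            iters[py][px] = k;
        }
    }

    /* crude ASCII rendering of the result */
    for (int py = 0; py < H; py++) {
        for (int px = 0; px < W; px++)
            putchar(iters[py][px] == MAX_ITER ? '#' : '.');
        putchar('\n');
    }
    return 0;
}
```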
Pointer Jumping Techniques:
• Pointer jumping (also known as path doubling or pointer doubling) is a common technique used to solve
problems involving linked structures, such as
• List Ranking
• Linked List Parallel Prefix
• Euler Tour
• It is particularly useful in parallel algorithms and graph theory applications, such as finding ancestors, shortest
paths, or connected components.
Why the Pointer Jumping Technique:
• Instead of traversing elements one step at a time, pointer jumping advances multiple steps in each iteration by updating pointers to point further along the path.
• This exponentially reduces the number of iterations required (to O(log n) rounds), making traversal faster.
Applications:
Finding the Root or Representative in Disjoint Sets (Union-Find Algorithm)
• Used in disjoint-set union-find data structures to maintain and query connected components
efficiently.
• Pointer jumping helps flatten the tree structure during path compression for faster queries.
Lowest Common Ancestor (LCA) Problem
• Pointer jumping is used to precompute ancestors at powers of 2 (binary lifting).
• This allows querying ancestors in logarithmic time.
Successor Queries in Linked Lists
• Speeds up successor finding by skipping intermediate nodes.
Tree Algorithms
• Used for parent pointer doubling in tree traversal or processing.
• Finds ancestors or computes depths quickly.
Parallel Algorithms
Pointer jumping is useful in parallelizing computations, such as list ranking and spanning trees.
1. List Ranking
• Assignment
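As a starting point for the assignment, a minimal C sketch of list ranking by pointer jumping (Wyllie-style path doubling). The parallel rounds are simulated sequentially here; the double buffering stands in for the synchronous per-node updates a real parallel version would perform:

```c
#include <stdio.h>

/* List ranking by pointer jumping: each round, every node's rank
 * accumulates its successor's rank and its pointer jumps two steps
 * ahead, so distances double and O(log n) rounds suffice. */
#define N 8

int main(void) {
    int next[N] = {1, 2, 3, 4, 5, 6, 7, 7};  /* successor; tail points to itself */
    int rank[N];
    for (int i = 0; i < N; i++)
        rank[i] = (next[i] == i) ? 0 : 1;     /* tail has rank 0 */

    int done = 0;
    while (!done) {
        int nn[N], nr[N];
        done = 1;
        for (int i = 0; i < N; i++) {         /* one synchronous round */
            if (next[i] != next[next[i]]) done = 0;
            nr[i] = rank[i] + rank[next[i]];  /* accumulate skipped distance */
            nn[i] = next[next[i]];            /* jump: point two steps ahead */
        }
        for (int i = 0; i < N; i++) { rank[i] = nr[i]; next[i] = nn[i]; }
    }

    for (int i = 0; i < N; i++)
        printf("node %d: rank %d\n", i, rank[i]);  /* distance to the tail */
    return 0;
}
```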
Euler Tour
• Example
• DFS traversal: [1, 2, 4, 4, 2, 5, 5, 2, 1, 3, 6, 6, 3, 7, 7, 3, 1]
• Discuss complexity.
Linked List Parallel Prefix
• Assignment