KWAME NKRUMAH UNIVERSITY OF SCIENCE AND
TECHNOLOGY, KUMASI
Faculty of Physical and Computational Sciences
Department of Mathematics
Parallel Computing Project Report
(MPI)
Date: August 24, 2025
Contents
1 Background and Motivation
  1.1 Introduction
  1.2 Key Concepts
2 Methodology
  2.1 Mathematical Foundation
  2.2 Parallel Strategy
3 System Design and Implementation
  3.1 Design Overview
  3.2 Implementation Workflow
4 Experimental Setup and Observations
5 Evaluation and Insights
  5.1 Performance Trends
  5.2 Scalability Assessment
6 Conclusion and Future Work
7 References
1 Background and Motivation
1.1 Introduction
This work investigates the use of Message Passing Interface (MPI) in accelerating vector
computations. The primary focus is the dot product of two vectors, a key operation
in scientific and engineering workloads. By distributing computations among multiple
processes, we aim to demonstrate both performance gains and scalability.
1.2 Key Concepts
Dot Product: The dot product of two vectors produces a scalar and is widely used in
numerical simulations, machine learning, and data analysis.
Parallel Execution: To leverage modern multi-core and multi-node systems, the
operation is parallelized with MPI, which allows flexible communication and workload
distribution, even when partition sizes differ.
2 Methodology
2.1 Mathematical Foundation
For vectors A[i] and B[i] of length N,

    R = \sum_{i=1}^{N} A[i] \cdot B[i]        (2.1)

This direct computation is O(N).
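
For reference, a minimal serial C implementation of Equation (2.1) could look as follows; the vector length and test values are illustrative assumptions rather than the project's actual data.

#include <stdio.h>
#include <stdlib.h>

/* Serial dot product: accumulate A[i] * B[i] over all N elements (Equation 2.1). */
double dot_product(const double *A, const double *B, long N) {
    double R = 0.0;
    for (long i = 0; i < N; i++) {
        R += A[i] * B[i];
    }
    return R;
}

int main(void) {
    long N = 1000000;                         /* illustrative vector length */
    double *A = malloc(N * sizeof(double));
    double *B = malloc(N * sizeof(double));
    for (long i = 0; i < N; i++) {            /* simple test data */
        A[i] = 1.0;
        B[i] = 2.0;
    }
    printf("R = %f\n", dot_product(A, B, N));
    free(A);
    free(B);
    return 0;
}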
2.2 Parallel Strategy
The approach divides the data across MPI processes (a minimal sketch follows the list):
1. Initialize MPI and determine process ranks and total processes.
2. Partition arrays A and B into suitable chunks (not necessarily equal).
3. Each process computes its local dot product.
4. Results are combined at the root process using MPI_Reduce.
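
The following sketch illustrates these four steps, assuming for simplicity that N divides evenly among the processes and that each process initializes its own chunk; the uneven case, with the root distributing data via MPI_Scatterv, is shown in Section 3.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);                        /* Step 1: initialize MPI */
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    long N = 1000000;                              /* illustrative size, assumed divisible by size */
    long local_n = N / size;                       /* Step 2: equal partition per process */
    double *a = malloc(local_n * sizeof(double));
    double *b = malloc(local_n * sizeof(double));
    for (long i = 0; i < local_n; i++) {           /* illustrative local data */
        a[i] = 1.0;
        b[i] = 2.0;
    }

    double local_dot = 0.0;                        /* Step 3: local dot product */
    for (long i = 0; i < local_n; i++)
        local_dot += a[i] * b[i];

    double global_dot = 0.0;                       /* Step 4: combine at the root */
    MPI_Reduce(&local_dot, &global_dot, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("Dot product = %f\n", global_dot);

    free(a);
    free(b);
    MPI_Finalize();
    return 0;
}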
3 System Design and Implementation
3.1 Design Overview
The design prioritizes load balance and efficient communication. Unequal data sizes are
supported via MPI_Scatterv, which assigns tailored portions of the vectors to each process.
3.2 Implementation Workflow
Initialization: MPI environment setup, rank identification, and allocation of local
arrays.
Distribution: The root process divides the input vectors and uses MPI_Scatterv for communication.
Local Computation: Each process computes the partial dot product for its assigned segment.
Aggregation: Local results are reduced to a single global result.
Termination: Root process reports execution time; memory is freed; MPI is finalized.
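
A condensed sketch of this workflow is shown below, assuming the uneven partition is computed from N and the process count; the variable names, the timing with MPI_Wtime, and the initialization of the vectors at the root are illustrative assumptions, not the project's exact code.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);                               /* Initialization */
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    long N = 250000000;                                   /* vector size used in the experiments */
    int *counts = malloc(size * sizeof(int));
    int *displs = malloc(size * sizeof(int));
    long base = N / size, rem = N % size, offset = 0;
    for (int p = 0; p < size; p++) {                      /* not-necessarily-equal chunk sizes */
        counts[p] = (int)(base + (p < rem ? 1 : 0));
        displs[p] = (int)offset;
        offset += counts[p];
    }

    double *A = NULL, *B = NULL;
    if (rank == 0) {                                      /* root owns the full vectors */
        A = malloc(N * sizeof(double));
        B = malloc(N * sizeof(double));
        for (long i = 0; i < N; i++) { A[i] = 1.0; B[i] = 2.0; }  /* illustrative data */
    }

    int local_n = counts[rank];
    double *a = malloc(local_n * sizeof(double));
    double *b = malloc(local_n * sizeof(double));

    double t0 = MPI_Wtime();
    /* Distribution: tailored portions of A and B to each process */
    MPI_Scatterv(A, counts, displs, MPI_DOUBLE, a, local_n, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    MPI_Scatterv(B, counts, displs, MPI_DOUBLE, b, local_n, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    double local_dot = 0.0;                               /* Local computation */
    for (int i = 0; i < local_n; i++)
        local_dot += a[i] * b[i];

    double global_dot = 0.0;                              /* Aggregation at the root */
    MPI_Reduce(&local_dot, &global_dot, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)                                        /* Termination: report and clean up */
        printf("Dot product = %f, time = %f s\n", global_dot, MPI_Wtime() - t0);

    free(a); free(b); free(counts); free(displs);
    if (rank == 0) { free(A); free(B); }
    MPI_Finalize();
    return 0;
}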
4 Experimental Setup and Observations
A test case with vector size 2.5 × 10^8 was executed under varying numbers of processes.
Table 4.1 summarizes recorded runtimes.
Table 4.1: Execution Time under Different Core Counts
Vector Size Processes Runtime (s)
250000000 1 0.6697
250000000 2 0.4596
250000000 3 0.2876
250000000 4 0.2502
250000000 5 0.1808
250000000 6 0.1404
250000000 7 0.1096
250000000 8 0.0853
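
For context, a run of this kind is typically compiled and launched with the standard MPI tool chain; the source file name dot_product.c below is an assumed placeholder, not the project's actual file name.

mpicc -O2 dot_product.c -o dot_product    # compile with the MPI C compiler wrapper
mpirun -np 8 ./dot_product                # launch the executable with 8 processes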
5 Evaluation and Insights
5.1 Performance Trends
Execution time consistently declined with additional processes, confirming the effectiveness of parallel decomposition. The flexibility of MPI_Scatterv ensured that uneven data partitioning did not hinder performance.
5.2 Scalability Assessment
The speedup S(n) is defined as

    S(n) = \frac{T(1)}{T(n)}        (5.1)

where T(1) is the runtime with one process and T(n) is the runtime with n processes. The experimental results demonstrate near-linear improvements up to 8 processes.
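
As a worked check against Table 4.1, the measured speedup at n = 8 is

    S(8) = \frac{T(1)}{T(8)} = \frac{0.6697}{0.0853} \approx 7.85

which is close to the ideal value of 8.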
6 Conclusion and Future Work
This project confirms that MPI can significantly accelerate vector operations while han-
dling both uniform and irregular data distribution effectively. Future studies could exam-
ine hybrid MPI+OpenMP implementations, adaptive data partitioning, or experiments
on distributed cluster environments.
7 References
[1] MPI Forum, MPI Standard Documentation, https://www.mpi-forum.org/docs/.
[2] M. J. Quinn, Parallel Programming in C with MPI and OpenMP.
[3] V. Eijkhout, Introduction to High-Performance Scientific Computing.