KWAME NKRUMAH UNIVERSITY OF SCIENCE AND TECHNOLOGY, KUMASI
Faculty of Physical and Computational Sciences
Department of Mathematics

Parallel Computing Project Report (MPI)

Date: August 24, 2025


Contents

1 Background and Motivation
  1.1 Introduction
  1.2 Key Concepts

2 Methodology
  2.1 Mathematical Foundation
  2.2 Parallel Strategy

3 System Design and Implementation
  3.1 Design Overview
  3.2 Implementation Workflow

4 Experimental Setup and Observations

5 Evaluation and Insights
  5.1 Performance Trends
  5.2 Scalability Assessment

6 Conclusion and Future Work

7 References

1 Background and Motivation

1.1 Introduction
This work investigates the use of Message Passing Interface (MPI) in accelerating vector
computations. The primary focus is the dot product of two vectors, a key operation
in scientific and engineering workloads. By distributing computations among multiple
processes, we aim to demonstrate both performance gains and scalability.

1.2 Key Concepts


Dot Product: The dot product of two vectors produces a scalar and is widely used in
numerical simulations, machine learning, and data analysis.
Parallel Execution: To leverage modern multi-core and multi-node systems, the
operation is parallelized with MPI, which allows flexible communication and workload
distribution, even when partition sizes differ.

2 Methodology

2.1 Mathematical Foundation


For vectors A[i] and B[i] of length N,

    R = \sum_{i=1}^{N} A[i] \cdot B[i]    (2.1)

This direct computation is O(N).
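
As a point of reference, a minimal serial C version of Eq. (2.1) might look as follows; the small example vectors are illustrative and not taken from the report:

#include <stdio.h>

/* Direct O(N) evaluation of Eq. (2.1) on small example vectors. */
int main(void) {
    double A[] = {1.0, 2.0, 3.0};
    double B[] = {4.0, 5.0, 6.0};
    long   N   = 3;

    double R = 0.0;
    for (long i = 0; i < N; i++)      /* 0-based indexing in C */
        R += A[i] * B[i];

    printf("R = %.1f\n", R);          /* 1*4 + 2*5 + 3*6 = 32.0 */
    return 0;
}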

2.2 Parallel Strategy


The approach divides the data across MPI processes:

1. Initialize MPI and determine process ranks and total processes.

2. Partition arrays A and B into suitable chunks (not necessarily equal).

3. Each process computes its local dot product.

4. Results are combined at the root process using MPI_Reduce (a full code sketch follows below).
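
The following is a minimal, self-contained C sketch of these four steps. The variable names, the constant-valued initialization at the root, and the way the remainder elements are assigned to the first ranks are illustrative assumptions, not the report's actual code:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);                         /* step 1: start MPI        */

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);           /* this process's rank      */
    MPI_Comm_size(MPI_COMM_WORLD, &size);           /* total number of processes */

    const long N = 250000000L;                      /* vector length used in the report */

    /* step 2: partition into chunks that need not be equal
       (the remainder N % size is spread over the first ranks) */
    int *counts = malloc(size * sizeof(int));
    int *displs = malloc(size * sizeof(int));
    long base = N / size, rem = N % size, offset = 0;
    for (int p = 0; p < size; p++) {
        counts[p] = (int)(base + (p < rem ? 1 : 0));
        displs[p] = (int)offset;
        offset += counts[p];
    }

    double *A = NULL, *B = NULL;
    if (rank == 0) {                                /* only the root holds the full vectors */
        A = malloc(N * sizeof(double));
        B = malloc(N * sizeof(double));
        for (long i = 0; i < N; i++) { A[i] = 1.0; B[i] = 2.0; }
    }

    double *local_a = malloc(counts[rank] * sizeof(double));
    double *local_b = malloc(counts[rank] * sizeof(double));
    MPI_Scatterv(A, counts, displs, MPI_DOUBLE,
                 local_a, counts[rank], MPI_DOUBLE, 0, MPI_COMM_WORLD);
    MPI_Scatterv(B, counts, displs, MPI_DOUBLE,
                 local_b, counts[rank], MPI_DOUBLE, 0, MPI_COMM_WORLD);

    /* step 3: each process computes its local dot product */
    double local = 0.0;
    for (int i = 0; i < counts[rank]; i++)
        local += local_a[i] * local_b[i];

    /* step 4: combine partial results at the root */
    double global = 0.0;
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("Dot product = %f\n", global);       /* expect 2.0 * N for this initialization */

    free(local_a); free(local_b); free(counts); free(displs);
    if (rank == 0) { free(A); free(B); }
    MPI_Finalize();
    return 0;
}

When launched with p processes, each rank holds roughly N/p elements, so the local loop shrinks as the process count grows while only a single scalar per rank travels back in the reduction.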

3 System Design and Implementation

3.1 Design Overview


The design prioritizes load balance and efficient communication. Unequal data sizes are
supported via MPI_Scatterv, which assigns tailored portions of the vectors to each process.
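
As a concrete (hypothetical) illustration of how MPI_Scatterv expresses unequal portions, suppose a vector of 10 elements is split over 3 processes; the fragment below assumes A, local_a and rank are declared as in the sketch in Section 2.2:

int counts[3] = {4, 3, 3};   /* elements per rank: rank 0 receives one extra element */
int displs[3] = {0, 4, 7};   /* starting offset of each rank's portion within A      */

MPI_Scatterv(A, counts, displs, MPI_DOUBLE,        /* send arguments: significant at the root only */
             local_a, counts[rank], MPI_DOUBLE,    /* receive buffer and count on every rank       */
             0, MPI_COMM_WORLD);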

3.2 Implementation Workflow


• Initialization: MPI environment setup, rank identification, and allocation of local arrays.

• Distribution: The root process divides the input vectors and uses MPI_Scatterv for communication.

• Local Computation: Each process computes its assigned segment of the dot product.

• Aggregation: Local results are reduced to a single global result.

• Termination: The root process reports the execution time; memory is freed; MPI is finalized (see the timing fragment below).
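
The timing and termination steps might look like the fragment below; this is a sketch using MPI_Wtime, not the report's code, and global, local_a and local_b are as in the sketch in Section 2.2:

double t0 = MPI_Wtime();    /* wall-clock time before distribution and computation */
/* ... MPI_Scatterv distribution, local dot product, MPI_Reduce aggregation ... */
double t1 = MPI_Wtime();    /* wall-clock time after the reduction                 */

if (rank == 0)
    printf("Result = %f, elapsed = %f s\n", global, t1 - t0);

free(local_a); free(local_b);    /* release local buffers                 */
MPI_Finalize();                  /* clean shutdown of the MPI environment */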

4 Experimental Setup and Observations

A test case with vector size 2.5 × 10⁸ was executed under varying numbers of processes.
Table 4.1 summarizes recorded runtimes.

Table 4.1: Execution Time under Different Core Counts


Vector Size Processes Runtime (s)
250000000 1 0.6697
250000000 2 0.4596
250000000 3 0.2876
250000000 4 0.2502
250000000 5 0.1808
250000000 6 0.1404
250000000 7 0.1096
250000000 8 0.0853
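
For reference, the process count in runs of this kind is typically varied through the MPI launcher, e.g. mpiexec -n 4 ./dot_product for four processes; the exact commands and executable name used for the report's experiments are not stated, so this is only indicative.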

5 Evaluation and Insights

5.1 Performance Trends


Execution time consistently declined with additional processes, confirming the effectiveness of parallel decomposition. The flexibility of MPI_Scatterv ensured that uneven data partitioning did not hinder performance.

5.2 Scalability Assessment


The speedup S(n) is defined as

    S(n) = \frac{T(1)}{T(n)}    (5.1)

where T(1) is the runtime with one process and T(n) the runtime with n processes. Experimental
results demonstrate near-linear improvements up to 8 processes.
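For example, using the measurements in Table 4.1, S(8) = T(1)/T(8) = 0.6697/0.0853 ≈ 7.85, i.e. a parallel efficiency of roughly 7.85/8 ≈ 98% at eight processes.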

6 Conclusion and Future Work

This project confirms that MPI can significantly accelerate vector operations while handling both uniform and irregular data distributions effectively. Future studies could examine hybrid MPI+OpenMP implementations, adaptive data partitioning, or experiments on distributed cluster environments.

7 References

[1] MPI Forum, MPI Standard Documentation, https://www.mpi-forum.org/docs/.

[2] M. J. Quinn, Parallel Programming in C with MPI and OpenMP.

[3] V. Eijkhout, Introduction to High-Performance Scientific Computing.
