SMAG-J ENTERPRISES

Email: smagjinfo@[Link] | Phone: 9392680519

MODULE-3
A. MPI FUNCTIONS

MPI (Message Passing Interface) is a standardized and portable


message-passing system designed for parallel computing
architectures. It allows multiple processes (running on the same or
different machines) to communicate with one another by sending and
receiving messages.
Here’s a categorized list of important MPI functions, with brief
descriptions:

1. Initialization and Finalization
• MPI_Init – starts the MPI environment (must be called before any other MPI call)
• MPI_Finalize – shuts down the MPI environment
• MPI_Abort – terminates all processes in a communicator
• MPI_Initialized – checks whether MPI_Init has already been called

2. Process Information
• MPI_Comm_rank – returns the rank (ID) of the calling process in a communicator
• MPI_Comm_size – returns the number of processes in a communicator
• MPI_Get_processor_name – returns the name of the node the process is running on

3. Point-to-Point Communication
• MPI_Send / MPI_Recv – blocking send and receive between two processes
• MPI_Isend / MPI_Irecv – non-blocking send and receive
• MPI_Wait / MPI_Test – complete or check a non-blocking operation
• MPI_Sendrecv – combined send and receive in one call

4. Collective Communication
• MPI_Bcast – broadcast data from one process to all others
• MPI_Scatter / MPI_Gather – distribute data to, or collect data from, all processes
• MPI_Reduce / MPI_Allreduce – combine values from all processes (e.g., sum, max)
• MPI_Barrier – synchronize all processes in a communicator

5. Derived Datatypes
• MPI_Type_contiguous, MPI_Type_vector, MPI_Type_indexed, MPI_Type_create_struct – build custom datatypes
• MPI_Type_commit / MPI_Type_free – register and release a derived datatype

6. Miscellaneous Utilities
• MPI_Wtime – returns wall-clock time (useful for timing code)
• MPI_Wtick – resolution of MPI_Wtime
• MPI_Get_count – number of elements actually received

7. Communicators and Groups
• MPI_Comm_split – split a communicator into sub-communicators
• MPI_Comm_dup / MPI_Comm_free – duplicate and free communicators
• MPI_Comm_group / MPI_Comm_create – work with process groups
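
To see the initialization and process-information calls in action, here is a minimal "hello world" program (a sketch for illustration; the printed message is not part of the original notes):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);                 // Start the MPI environment
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   // Rank (ID) of this process
    MPI_Comm_size(MPI_COMM_WORLD, &size);   // Total number of processes
    printf("Hello from process %d of %d\n", rank, size);
    MPI_Finalize();                         // Shut down MPI
    return 0;
}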


B. THE TRAPEZOID RULE IN MPI


The Trapezoidal Rule in MPI is a classic parallel programming
example to illustrate distributed computing using MPI (Message
Passing Interface). It's used for numerical integration by
approximating the area under a curve.

Problem Statement
Approximate the definite integral of f(x) from a to b. The composite trapezoidal rule with n trapezoids is

    integral of f(x) dx from a to b  ≈  h * [ f(x_0)/2 + f(x_1) + f(x_2) + ... + f(x_(n-1)) + f(x_n)/2 ]

where h = (b - a)/n and x_i = a + i*h. Each process computes its own block of trapezoids, and the partial results are summed with MPI_Reduce.

MPI Program Structure (in C)


#include <stdio.h>
#include <mpi.h>

double f(double x) {
    return x * x;   // Example function: f(x) = x^2
}

double trap_area(double local_a, double local_b, int local_n, double h) {
    double area;
    double x;
    int i;

    area = (f(local_a) + f(local_b)) / 2.0;
    for (i = 1; i < local_n; i++) {
        x = local_a + i * h;
        area += f(x);
    }
    area = area * h;
    return area;
}

int main(int argc, char** argv) {
    int my_rank, comm_sz, n = 1024, local_n;
    double a = 0.0, b = 1.0, h, local_a, local_b;
    double local_int, total_int;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);

    h = (b - a) / n;          // Step size
    local_n = n / comm_sz;    // Trapezoids per process (assumes n is divisible by comm_sz)

    local_a = a + my_rank * local_n * h;
    local_b = local_a + local_n * h;

    local_int = trap_area(local_a, local_b, local_n, h);

    // Reduce all local integrals into the total integral at process 0
    MPI_Reduce(&local_int, &total_int, 1, MPI_DOUBLE, MPI_SUM, 0,
               MPI_COMM_WORLD);

    if (my_rank == 0) {
        printf("With n = %d trapezoids, integral from %.2f to %.2f = %.15f\n",
               n, a, b, total_int);
    }

    MPI_Finalize();
    return 0;
}


Key MPI Concepts Used
• MPI_Init / MPI_Finalize – start and end the MPI environment
• MPI_Comm_rank / MPI_Comm_size – identify each process and the total process count
• Data decomposition – each process integrates its own sub-interval [local_a, local_b]
• MPI_Reduce with MPI_SUM – combines the partial integrals into the final result at process 0

Sample Output:
With n = 1024 trapezoids, integral from 0.00 to 1.00 = 0.333333...
(The exact value is 1/3; with n = 1024 the trapezoidal approximation agrees with it to about six decimal places.)
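
To compile and run such a program with a typical MPI installation (MPICH or Open MPI, both of which provide the mpicc compiler wrapper and the mpiexec launcher), one would use something like mpicc trap.c -o trap followed by mpiexec -n 4 ./trap; the source file name trap.c here is only illustrative.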

C. DEALING WITH I/O


In MPI, dealing with I/O (input/output) efficiently is crucial for performance, especially when working with large-scale parallel applications. MPI provides several ways to handle I/O depending on the need, from simple standard I/O to parallel file I/O using MPI-IO (part of the MPI-2 standard).

1. Standard I/O (Serial I/O)


This is the simplest form: only one process (often rank 0) performs
the file I/O.
if (rank == 0) {
    FILE *fp = fopen("[Link]", "w");
    fprintf(fp, "Result: %f\n", result);
    fclose(fp);
}


2. Gather-then-Write (Using MPI_Gather, then rank 0 writes)

Each process computes partial data and sends it to rank 0, which writes it to the file.

// Each process computes `local_data` (a double); rank 0 needs <stdlib.h> for malloc
double *recv_data = NULL;
if (rank == 0) recv_data = malloc(size * sizeof(double));

MPI_Gather(&local_data, 1, MPI_DOUBLE, recv_data, 1, MPI_DOUBLE,
           0, MPI_COMM_WORLD);

if (rank == 0) {
    FILE *fp = fopen("gathered_output.txt", "w");
    for (int i = 0; i < size; i++) {
        fprintf(fp, "%f\n", recv_data[i]);
    }
    fclose(fp);
    free(recv_data);
}

3. Parallel I/O with MPI-IO (Best for Performance)


MPI provides its own functions for reading/writing to files in
parallel.


#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank, size;
    MPI_File fh;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    char filename[] = "parallel_output.txt";
    int data = rank * 10;
    MPI_Offset offset = rank * sizeof(int);   // each rank writes at its own byte offset

    MPI_File_open(MPI_COMM_WORLD, filename,
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_File_write_at(fh, offset, &data, 1, MPI_INT,
                      MPI_STATUS_IGNORE);      // writes the int as raw binary data
    MPI_File_close(&fh);

    MPI_Finalize();
    return 0;
}

“Each process writes its own part of the file simultaneously – no bottleneck!”
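
To read the data back, each rank can open the same file read-only and call MPI_File_read_at at its own offset. A minimal sketch, reusing fh, filename, offset, and data from the program above:

MPI_File_open(MPI_COMM_WORLD, filename, MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);
MPI_File_read_at(fh, offset, &data, 1, MPI_INT, MPI_STATUS_IGNORE);   // read this rank's int back
MPI_File_close(&fh);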

D. Collective communication
In MPI, collective communication refers to communication operations that involve all processes in a communicator (usually MPI_COMM_WORLD). These are high-level operations that simplify tasks like broadcasting, gathering, scattering, and synchronizing data across processes.
Categories of Collective Communication

1. Broadcast
Sends data from one process (root) to all others.
MPI_Bcast(&data, 1, MPI_INT, root, MPI_COMM_WORLD);
Use when all processes need the same data.

2. Gather
Each process sends data to a root process.
MPI_Gather(&send_data, 1, MPI_INT, recv_data, 1, MPI_INT,
root, MPI_COMM_WORLD);
• send_data – data each process sends
• recv_data – array to store received data at root

3. Scatter
Root process sends chunks of data to all other processes.
MPI_Scatter(send_data, 1, MPI_INT, &recv_data, 1, MPI_INT,
root, MPI_COMM_WORLD);

• Opposite of Gather
• Useful in parallel processing workloads

4. Allgather
Each process gathers data from all other processes (no root).
MPI_Allgather(&send_data, 1, MPI_INT, all_data, 1, MPI_INT,
MPI_COMM_WORLD);
Result: All processes end up with a complete copy.

5. Alltoall
Every process sends a unique piece of data to every other
process.
MPI_Alltoall(send_buf, 1, MPI_INT, recv_buf, 1, MPI_INT,
MPI_COMM_WORLD);
Use in matrix transposes or ring-based algorithms.

6. Reduction
Combines data from all processes using an operation like
MPI_SUM, MPI_MAX.
MPI_Reduce(&local_val, &global_sum, 1, MPI_INT,
MPI_SUM, root, MPI_COMM_WORLD);
Common Operations:
• MPI_SUM, MPI_PROD (product)
• MPI_MAX, MPI_MIN
• MPI_LAND, MPI_LOR (logical AND/OR)

7. Allreduce
Same as Reduce, but result is sent to all processes.
MPI_Allreduce(&local_val, &global_sum, 1, MPI_INT,
MPI_SUM, MPI_COMM_WORLD);


8. Barrier
Synchronizes all processes. No process continues until all
have reached the barrier.
MPI_Barrier(MPI_COMM_WORLD);
Use it before/after time measurements or to enforce ordering around I/O.
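
The snippets above can be combined into one short program. Here is a minimal sketch (the value of n and the cyclic work split are illustrative): rank 0 broadcasts n, every process computes a partial sum, and MPI_Reduce collects the total at rank 0.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, size, n = 100, local_sum = 0, global_sum = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);       // every process gets n

    for (int i = rank + 1; i <= n; i += size)           // cyclic split of 1..n
        local_sum += i;

    MPI_Reduce(&local_sum, &global_sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("Sum of 1..%d = %d\n", n, global_sum);   // 5050 for n = 100

    MPI_Finalize();
    return 0;
}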

E. DERIVED DATATYPES
These functions were introduced in the MPI Functions section above. A few examples:
1. MPI_Type_contiguous
MPI_Datatype conti_type;
MPI_Type_contiguous(4, MPI_INT, &conti_type);
MPI_Type_commit(&conti_type);
Used for sending 4 integers as a block.

2. MPI_Type_vector
Used for sending every nth element (e.g., matrix columns).
MPI_Datatype vector_type;
MPI_Type_vector(count, blocklength, stride, MPI_DOUBLE,
&vector_type);
MPI_Type_commit(&vector_type);
• count = number of blocks
• blocklength = how many elements in each block
• stride = spacing between start of each block
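
For example (an illustrative sketch, not from the original notes), one column of a 4 x 4 row-major matrix of doubles has count = 4, blocklength = 1, stride = 4; dest and tag below are placeholders:

double A[4][4];
MPI_Datatype col_type;
MPI_Type_vector(4, 1, 4, MPI_DOUBLE, &col_type);   // 4 blocks of 1 double, spaced 4 apart
MPI_Type_commit(&col_type);
MPI_Send(&A[0][2], 1, col_type, dest, tag, MPI_COMM_WORLD);   // sends column 2 of A
MPI_Type_free(&col_type);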

3. MPI_Type_indexed
For irregularly spaced data blocks.
int blocklengths[3] = {1, 2, 3};
int displs[3] = {0, 4, 7};
MPI_Datatype indexed_type;
MPI_Type_indexed(3, blocklengths, displs, MPI_INT,
&indexed_type);
MPI_Type_commit(&indexed_type);

4. MPI_Type_create_struct
Used for user-defined C structures (offsetof requires <stddef.h>).
typedef struct {
    int id;
    double grade;
} Student;

int blocklens[2] = {1, 1};
MPI_Aint displs[2];
MPI_Datatype types[2] = {MPI_INT, MPI_DOUBLE};
MPI_Datatype student_type;

displs[0] = offsetof(Student, id);
displs[1] = offsetof(Student, grade);

MPI_Type_create_struct(2, blocklens, displs, types, &student_type);
MPI_Type_commit(&student_type);
Now you can MPI_Send a whole Student struct!

Don’t Forget: Free the Type
MPI_Type_free(&your_type);
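
As a usage sketch (assuming the rank variable and a two-process run, as in the earlier examples), the committed student_type can be used directly in point-to-point calls:

Student s;
if (rank == 0) {
    s.id = 1;
    s.grade = 9.5;
    MPI_Send(&s, 1, student_type, 1, 0, MPI_COMM_WORLD);   // send one Student to rank 1
} else if (rank == 1) {
    MPI_Recv(&s, 1, student_type, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    printf("Received student %d with grade %.1f\n", s.id, s.grade);
}
MPI_Type_free(&student_type);   // free the type once it is no longer needed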

F. PERFORMANCE EVALUATION OF MPI PROGRAMS

What is Performance Evaluation in MPI?
It helps us measure how fast and efficiently our MPI program runs when we use multiple processes.

What Do We Measure?
1. Execution Time
How long the program takes to run (wall-clock time).

2. Speedup
How much faster the program is with many processes: Speedup = T_serial / T_parallel.

3. Efficiency
How well the processes are used: Efficiency = Speedup / p, where p is the number of processes.

How to Measure Time in MPI?
Use MPI_Wtime():

MPI_Barrier(MPI_COMM_WORLD);   // optional: synchronize processes before timing
double start = MPI_Wtime();
// Your computation here
double end = MPI_Wtime();

if (rank == 0)
    printf("Time = %f seconds\n", end - start);


If the measured efficiency is much less than 1, it means not all processes are being used effectively.

Why Performance May Drop?


• Too much communication between processes
• Uneven work distribution (some processes do more)
• Small problem size (not enough work for each process)

How to Improve Performance?
✅ Use fewer MPI calls
✅ Share data only when needed
✅ Use MPI_Scatter, MPI_Gather, and MPI_Reduce smartly
✅ Use non-blocking communication (MPI_Isend / MPI_Irecv) to overlap communication with computation (see the sketch below)
✅ Make sure each process does an equal amount of work
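
As an example of the non-blocking point above, here is a minimal exchange between ranks 0 and 1 (a sketch; run with at least 2 processes, and the exchanged values are illustrative). Useful computation can be placed between posting the operations and waiting for them:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, send_val, recv_val;
    MPI_Request reqs[2];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    send_val = rank;                      // each rank sends its own rank number
    int partner = (rank == 0) ? 1 : 0;    // only ranks 0 and 1 take part

    if (rank < 2) {
        MPI_Isend(&send_val, 1, MPI_INT, partner, 0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Irecv(&recv_val, 1, MPI_INT, partner, 0, MPI_COMM_WORLD, &reqs[1]);
        // ... useful computation can overlap with communication here ...
        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
        printf("Rank %d received %d\n", rank, recv_val);
    }

    MPI_Finalize();
    return 0;
}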


G. Parallel Sort-Algorithm
Parallel Odd-Even Transposition Sort
This is a commonly used comparison-based parallel sorting algorithm that maps naturally onto MPI and neighbor-to-neighbor process communication.

Idea
• Each process has a portion of the full array.
• Processes compare and swap data with their neighbors in alternating
phases:
o Even phase: rank 0 with 1, 2 with 3, ...
o Odd phase: rank 1 with 2, 3 with 4, ...
Repeat for P iterations (where P = number of processes).

Structure
1. Distribute the array to all processes (use MPI_Scatter)
2. Each process sorts its own part (qsort)
3. Loop for P phases:
o Exchange the local block with the phase partner (using MPI_Sendrecv)
o Merge the two blocks and keep either the smaller or the larger half
4. Collect the sorted parts (use MPI_Gather)

MPI C Code Example (Simplified)


#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

// Merge two sorted arrays of length n and keep either the lower half (keep_small = 1)
// or the upper half (keep_small = 0) in `local`.
void merge(int *local, int *recv, int n, int keep_small) {
    int *temp = malloc(2 * n * sizeof(int));
    int i = 0, j = 0, k = 0;

    for (int t = 0; t < 2 * n; t++) {
        if (i < n && (j >= n || local[i] <= recv[j])) temp[k++] = local[i++];
        else temp[k++] = recv[j++];
    }

    if (keep_small)
        for (i = 0; i < n; i++) local[i] = temp[i];
    else
        for (i = 0; i < n; i++) local[i] = temp[i + n];

    free(temp);
}

int compare(const void *a, const void *b) {
    return (*(int*)a - *(int*)b);
}

int main(int argc, char **argv) {
    int rank, size, N = 8;
    int data[] = {34, 7, 23, 32, 5, 62, 1, 45};   // Array to sort

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int local_n = N / size;                        // Assumes N is divisible by size
    int *local_data = malloc(local_n * sizeof(int));

    MPI_Scatter(data, local_n, MPI_INT, local_data, local_n, MPI_INT,
                0, MPI_COMM_WORLD);

    qsort(local_data, local_n, sizeof(int), compare);   // Local sort

    for (int phase = 0; phase < size; phase++) {
        int partner;
        if (phase % 2 == 0) {
            partner = (rank % 2 == 0) ? rank + 1 : rank - 1;   // Even phase: 0-1, 2-3, ...
        } else {
            partner = (rank % 2 == 0) ? rank - 1 : rank + 1;   // Odd phase: 1-2, 3-4, ...
        }

        if (partner >= 0 && partner < size) {
            int *recv_data = malloc(local_n * sizeof(int));
            MPI_Sendrecv(local_data, local_n, MPI_INT, partner, 0,
                         recv_data, local_n, MPI_INT, partner, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);

            if (rank < partner)
                merge(local_data, recv_data, local_n, 1);   // Keep the smaller half
            else
                merge(local_data, recv_data, local_n, 0);   // Keep the larger half

            free(recv_data);
        }
    }

    MPI_Gather(local_data, local_n, MPI_INT, data, local_n, MPI_INT,
               0, MPI_COMM_WORLD);

    if (rank == 0) {
        printf("Sorted array:\n");
        for (int i = 0; i < N; i++) printf("%d ", data[i]);
        printf("\n");
    }

    free(local_data);
    MPI_Finalize();
    return 0;
}

Output (for 8 elements)


Sorted array:
1 5 7 23 32 34 45 62
