0% found this document useful (0 votes)
8 views10 pages

Bert 3a Parallel Algorithms Distributed Memory

The document discusses algorithms for distributed memory systems, contrasting them with the PRAM model where all processors share memory. It highlights the importance of task interaction graphs for unstructured problems and the challenges of graph partitioning, which is NP-hard and aims to minimize communication costs while balancing workloads. Several partitioning strategies, including BFS and more sophisticated methods, are mentioned as heuristics to tackle this complex problem.

Uploaded by

ewnetdbu
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
8 views10 pages

Bert 3a Parallel Algorithms Distributed Memory

The document discusses algorithms for distributed memory systems, contrasting them with the PRAM model where all processors share memory. It highlights the importance of task interaction graphs for unstructured problems and the challenges of graph partitioning, which is NP-hard and aims to minimize communication costs while balancing workloads. Several partitioning strategies, including BFS and more sophisticated methods, are mentioned as heuristics to tackle this complex problem.

Uploaded by

ewnetdbu
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 10

Algorithms with

distributed memory
Introduction
Distributed vs. PRAM
In the PRAM model we assumed all processors were
connected to the same memory.
A more common architecture is that each processor has its
own memory.
Connection network

P P P P P P
M M M M M M

Now concepts like reduction, broadcast, all-to-all broadcast


become relevant.
We will have communication costs of the shape
𝑡𝑠 𝑎 + 𝑡𝑤 𝑏, as we have seen in a previous lesson.
This will make the analysis for linear scaling more involved
Structured problems
• A well-structured problem, in which the
interactions between (blocks of) data are well-
structured, might allow for a simple mapping to
different processors.
• We will discuss this in the case of matrix-vector and
matrix-matrix multiplication.
Unstructured problems
• A more general problem is not per se structured
and can have any shape.
• We would then map different tasks to processers
with help of a task interaction graph.
• The problem then consists of partitioning the taks
interaction graph in a good way.
• Our goal is partitioning as to have maximal locality.
• In general this is an NP-hard problem.
Graph partitioning
Partitioning of a graph
• The problem of distributing tasks over processors
can be interpreted as the problem of partitioning a
graph.

• Each node represents some computational work,


and each edge represents some communication.
Graph partitioning
The optimization problem has two goals
• Minimize the number of edge cuts by the partitioning.
• Try to divide the nodes evenly.
Abstractly:
Input: Graph G = (V, E), number of partitions P
Output: Partition V=V0 ∪ V1 ∪ V2 ∪ ⋯ ∪ VP-1 such that
1) {Vi} are disjoint
2) Work is balanced, if each node represents the same
amount of work this: |Vi|≈|Vj|
3) Let Ecut={(u,v)|u∈Vi, v∈Vj, i≠j}, minimize |Ecut|
NP-hard, so solved heuristically!
Graph partitioning via BFS
Run BFS from any vertex
Stop when about ½ vertices are visited, assign
visited to one partition, unvisited to the other.
Repeat for each partition (divide-and-conquer) until
you have 𝑃 partitions.

BFS
Graph partitioning
• BFS assures connectedness of the resulting
partition.
• More sophisticated partitioning strategies:
• Coarsen, partition, refine.
• Maximal and maximum matchings
• Spectral partitioning
• Kernighan Lin algorithm: find local improvements
Overview
• When having distributed memory, we need to map
such that the number of interactions is minimized.
• Matrix calculations tend to be well-structured (the task
interaction graph is often just a mesh), at least in the
case of dense matrices!
This allows for a simple way to map.
• In general this is not the case and you would need to
take the task interaction graph into account and
partition it.
• This is a difficult problems, but there exist several heuristics.
• One heuristic would be graph partitioning via BFS.

You might also like