Parallel and Distributed Programming
Dr. Muhammad Naveed Akhtar
Lecture – 01
Why Parallel Computing
Roadmap
• Why we need ever-increasing performance.
• Why we’re building parallel systems.
• Why we need to write parallel programs.
• How do we write parallel programs?
• What we’ll be doing.
• Concurrent, parallel, distributed!
Changing times
• From 1986 to 2002, microprocessor performance increased rapidly, by an average of about 50% per year.
• Since then, the increase has dropped to about 20% per year.
An intelligent solution
• Instead of designing and building faster microprocessors, put multiple
processors on a single integrated circuit.
Now it’s up to the programmers
• Adding more processors doesn’t help much if programmers aren’t aware of them…
• … or don’t know how to use them.
• Serial programs don’t benefit from this approach (in most cases).
Why we need ever-increasing performance
• Computational power is increasing, but so are our computation problems and needs.
• Problems we never dreamed of have been solved because of past increases, such as decoding the
human genome.
• More complex problems are still waiting to be solved.
• Examples: climate modeling, protein folding, drug discovery, energy research, data analysis.
Why we’re building parallel systems
• Up to now, performance increases have been attributable to the increasing density of transistors.
• But there are inherent problems. A little physics lesson:
  • Smaller transistors = faster processors.
  • Faster processors = increased power consumption.
  • Increased power consumption = increased heat.
  • Increased heat = unreliable processors.
• Solution: introduce parallelism!
  • Move away from single-core systems to multicore processors.
  • “core” = central processing unit (CPU)
https://www.fujitsu.com/sg/about/businesspolicy/tech/k/whatis/processor/cpu.html
Why we need to write parallel programs
Need for parallel programs
• Running multiple instances of a serial program often isn’t very useful.
• Think of running multiple instances of your favorite game.
• What you really want is for it to run faster.
Approaches to the serial problem
• Rewrite serial programs so that they’re parallel.
• Write translation programs that automatically convert serial programs into parallel programs.
  • This is very difficult to do.
  • Success has been limited.
More problems
• Some coding constructs can be recognized by an automatic program generator and converted to a parallel construct.
• However, the result is likely to be a very inefficient program.
• Sometimes the best parallel solution is to step back and devise an entirely new algorithm.
Example
• Compute n values and add them together.
• Serial solution: a single loop adds each value into a running sum.
• Parallel solution: we have p cores, p much smaller than n; each core performs a partial sum of approximately n/p values.
• Each core uses its own private variables and executes its block of code independently of the other cores (see the sketch below).
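On the original slides the two code blocks appeared as images; below is a minimal self-contained C sketch of both, assuming a stub Compute_next_value and a simple block assignment of indices (my_rank, my_first_i, and my_last_i are this sketch’s assumptions about how each core would be handed its slice):

#include <stdio.h>

/* Stub for the Compute_next_value calls named on the slides;
   here it just returns its argument so the sketch runs. */
static int Compute_next_value(int i) { return i; }

int main(void) {
    int n = 24;

    /* Serial solution: one loop adds all n values. */
    int sum = 0;
    for (int i = 0; i < n; i++) {
        int x = Compute_next_value(i);
        sum += x;
    }
    printf("serial sum = %d\n", sum);

    /* One core's share of the parallel version: a block of
       roughly n/p indices and a private my_sum. my_rank and p
       would come from the parallel runtime (assumed here). */
    int p = 8, my_rank = 0;
    int my_first_i = my_rank * (n / p);
    int my_last_i  = my_first_i + (n / p);
    int my_sum = 0;
    for (int my_i = my_first_i; my_i < my_last_i; my_i++) {
        int my_x = Compute_next_value(my_i);
        my_sum += my_x;
    }
    printf("core %d: my_sum = %d\n", my_rank, my_sum);
    return 0;
}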
Example (cont.)
• After each core completes execution of the code, its private variable my_sum contains the sum of the values computed by its calls to Compute_next_value.
• Ex., 8 cores, n = 24; the calls to Compute_next_value return:

Core:     0      1      2      3      4      5      6      7
Values:   1,4,3  9,2,8  5,1,1  6,2,7  2,5,0  4,1,8  6,5,1  2,3,9
• Once all the cores are done computing their private my_sum, they form a global sum by sending their results to a designated “master” core, which adds them to produce the final result (a sketch follows).
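MPI is introduced properly later in the course, but since the master-core scheme is essentially a sequence of sends and receives, here is a minimal MPI sketch of it (the partial sum is faked so the program is self-contained; compile with mpicc, run with mpiexec):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int p, my_rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &p);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    int my_sum = my_rank + 1;   /* stand-in for a real partial sum */

    if (my_rank == 0) {
        /* The master performs p-1 receives and p-1 additions. */
        int global_sum = my_sum;
        for (int src = 1; src < p; src++) {
            int received;
            MPI_Recv(&received, 1, MPI_INT, src, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            global_sum += received;
        }
        printf("global sum = %d\n", global_sum);
    } else {
        /* Every other core just sends its my_sum to the master. */
        MPI_Send(&my_sum, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}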
Example (cont.)
Core      0    1    2    3    4    5    6    7
my_sum    8    19   7    15   7    13   12   14

Global sum: 8 + 19 + 7 + 15 + 7 + 13 + 12 + 14 = 95

After the master (core 0) receives and adds all results:

Core      0    1    2    3    4    5    6    7
my_sum    95   19   7    15   7    13   12   14
But wait!
There’s a much better way to compute the global sum.
Better parallel algorithm
• Don’t make the master core do all the work.
• Share it among the other cores.
• Pair the cores:
  • Core 0 adds its result to Core 1’s result.
  • Core 2 adds its result to Core 3’s result, etc.
  • Work with odd- and even-numbered pairs of cores.
• Repeat the process, now with only the even-ranked cores:
  • Core 0 adds the result from core 2.
  • Core 4 adds the result from core 6, etc.
• Now cores divisible by 4 repeat the process, and so forth, until core 0 has the final result (see the simulation below).
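A single-process C simulation of this pairing scheme, using the eight my_sum values from the previous slide (the loop structure is this sketch’s own way of expressing the pairing, not code from the slides):

#include <stdio.h>

/* Simulates the tree-structured global sum in one process.
   sums[r] plays the role of core r's private my_sum; each
   round, the surviving cores add in their partner's value. */
int main(void) {
    int sums[8] = {8, 19, 7, 15, 7, 13, 12, 14};
    int p = 8;

    for (int stride = 1; stride < p; stride *= 2) {
        /* Cores whose rank is a multiple of 2*stride receive
           from the core `stride` ranks above them and add. */
        for (int rank = 0; rank < p; rank += 2 * stride) {
            if (rank + stride < p)
                sums[rank] += sums[rank + stride];
        }
    }
    printf("core 0 ends with the global sum: %d\n", sums[0]);  /* 95 */
    return 0;
}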
Analysis
• In the first example, the master core performs 7 receives and 7 additions.
• In the second example, the master core performs 3 receives and 3 additions.
• The improvement is more than a factor of 2!
• The difference is more dramatic with a larger number of cores: with p cores, the tree-structured sum needs only ⌈log2 p⌉ receives and additions at the master, versus p − 1 for the first scheme.
• If we have 1000 cores:
  • The first example would require the master to perform 999 receives and 999 additions.
  • The second example would only require 10 receives and 10 additions, since ⌈log2 1000⌉ = 10.
  • That’s an improvement of almost a factor of 100 (see the quick check below)!
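As a quick check of these counts, a small illustrative C helper (not from the slides) that computes the master’s receive count under both schemes:

#include <stdio.h>
#include <math.h>

/* Master's workload: the naive scheme needs p-1 receives; the
   tree-structured scheme needs one receive per pairing round,
   i.e. ceil(log2(p)). Link with -lm. */
int main(void) {
    int ps[] = {8, 1000};
    for (int k = 0; k < 2; k++) {
        int p = ps[k];
        int naive = p - 1;
        int tree  = (int)ceil(log2((double)p));
        printf("p = %4d: naive = %3d receives, tree = %2d receives\n",
               p, naive, tree);
    }
    return 0;
}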
How do we write parallel programs?
• Task parallelism
  • Partition the various tasks carried out in solving the problem among the cores.
• Data parallelism
  • Partition the data used in solving the problem among the cores.
  • Each core carries out similar operations on its part of the data.
Professor “P” at some university
• 300 exams to grade, each with 15 questions.
• Three teaching assistants to share the work: TA#1, TA#2, TA#3.
Division of work
Data Parallelism              Task Parallelism
TA#1: 100 exams               TA#1: Questions 1 – 5
TA#2: 100 exams               TA#2: Questions 6 – 10
TA#3: 100 exams               TA#3: Questions 11 – 15
Division of work (Programmer's view)
Data Parallelism
• Partition the data used in solving the problem among the cores.
• Each core performs almost the same operations on its part of the data.
Task Parallelism
• Partition the various tasks carried out in solving the problem among the cores.
• A core’s operation depends on the task assigned.
• In the global-sum example, the tasks were: 1. receiving, 2. addition.
A sketch contrasting the two divisions follows.
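A hedged C sketch of the same contrast, phrased in terms of the TA example; grade_question and the loop bounds are illustrative assumptions, and the three TAs are simulated sequentially here:

#include <stdio.h>

/* Stand-in for actually grading one question of one exam. */
static int grade_question(int exam, int question) {
    return (exam + question) % 4;   /* fake score */
}

int main(void) {
    int n_exams = 300, n_questions = 15, n_tas = 3;

    /* Data parallelism: TA t grades every question of its own
       block of 100 exams -- same operations, different data. */
    for (int t = 0; t < n_tas; t++) {
        int first = t * (n_exams / n_tas);
        int last  = first + (n_exams / n_tas);
        int total = 0;
        for (int e = first; e < last; e++)
            for (int q = 0; q < n_questions; q++)
                total += grade_question(e, q);
        printf("data parallel: TA#%d, exams %d-%d, points %d\n",
               t + 1, first + 1, last, total);
    }

    /* Task parallelism: TA t grades its own block of 5
       questions on every exam -- a different task per TA. */
    for (int t = 0; t < n_tas; t++) {
        int first_q = t * (n_questions / n_tas);
        int last_q  = first_q + (n_questions / n_tas);
        int total = 0;
        for (int e = 0; e < n_exams; e++)
            for (int q = first_q; q < last_q; q++)
                total += grade_question(e, q);
        printf("task parallel: TA#%d, questions %d-%d, points %d\n",
               t + 1, first_q + 1, last_q, total);
    }
    return 0;
}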
Coordination
• Cores usually need to coordinate their work:
• Communication – one or more cores send their current partial sums to another core.
• Load balancing – share the work evenly among the cores so that no single core is overloaded.
• Synchronization – because each core works at its own pace, make sure some cores do not get too far ahead of the rest.
A minimal sketch of these three concerns follows.
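Since Pthreads is one of the APIs named on the next slide, here is a minimal sketch of all three concerns, assuming a POSIX system with barrier support; the partial sums are faked and the thread count is arbitrary (compile with -pthread):

#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4

static pthread_barrier_t barrier;
static pthread_mutex_t sum_lock = PTHREAD_MUTEX_INITIALIZER;
static int global_sum = 0;

static void *worker(void *arg) {
    int my_rank = *(int *)arg;
    int my_sum = my_rank + 1;          /* stand-in partial sum */

    /* Communication: publish the partial sum into shared
       state, guarded so updates do not interleave. */
    pthread_mutex_lock(&sum_lock);
    global_sum += my_sum;
    pthread_mutex_unlock(&sum_lock);

    /* Synchronization: no thread reads the total until every
       thread has contributed its share. */
    pthread_barrier_wait(&barrier);
    printf("thread %d sees global sum %d\n", my_rank, global_sum);
    return NULL;
}

int main(void) {
    pthread_t tid[NTHREADS];
    int ranks[NTHREADS];

    pthread_barrier_init(&barrier, NULL, NTHREADS);
    /* Load balancing: each thread gets the same amount of
       work here -- one partial sum apiece. */
    for (int t = 0; t < NTHREADS; t++) {
        ranks[t] = t;
        pthread_create(&tid[t], NULL, worker, &ranks[t]);
    }
    for (int t = 0; t < NTHREADS; t++)
        pthread_join(&tid[t], NULL);
    pthread_barrier_destroy(&barrier);
    return 0;
}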
What we’ll be doing
• Learning to write programs that are explicitly parallel.
• Using the C language.
• Using three different extensions to C.
• Message-Passing Interface (MPI)
• POSIX Threads (Pthreads)
• OpenMP
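As a small preview of one of these extensions, the lecture’s global sum written with OpenMP, using the 24 values from the earlier example; the reduction clause plays the role of the tree-structured sum (compile with -fopenmp):

#include <stdio.h>

int main(void) {
    int values[24] = {1, 4, 3,  9, 2, 8,  5, 1, 1,  6, 2, 7,
                      2, 5, 0,  4, 1, 8,  6, 5, 1,  2, 3, 9};
    int sum = 0;

    /* Each thread accumulates a private partial sum; the
       reduction clause combines them at the end. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < 24; i++)
        sum += values[i];

    printf("global sum = %d\n", sum);   /* 95 */
    return 0;
}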
Types of parallel systems
Shared-memory
• The cores can share the same computer’s memory.
• Coordinate the cores by having them examine and update shared memory locations.
Distributed-memory
• Each core has its own, private memory.
• The cores must communicate explicitly by sending messages across a network.
Terminology
• Concurrent computing – a program in which multiple tasks can be in progress at any instant.
• Parallel computing – a program in which multiple tasks cooperate closely to solve a problem.
• Distributed computing – a program that may need to cooperate with other programs to solve a problem.
Concluding Remarks
• The laws of physics have brought us to the doorstep of multicore technology.
• Serial programs typically don’t benefit from multiple cores.
• Automatic parallel program generation from serial program code isn’t the most efficient approach
to get high performance from multicore computers.
• Learning to write parallel programs involves learning how to coordinate the cores.
• Parallel programs are usually very complex and therefore require sound programming techniques and careful development.
Questions and comments?