High Performance Computing
LECTURE #3
Agenda
O Introduction
O Parallel Computing Platforms: logical & physical
O Logical Organization
  1- Control
  2- Communication
O Flynn Taxonomy
O 1- Control
1- Introduction
❖A computing platform includes a hardware architecture and a software
framework (including application frameworks), where the combination allows
software to run.
❖ Typical platforms include a
- computer architecture,
- operating system,
- programming languages and
- related user interface (run-time system libraries or graphical user interface).
2- Parallel Computing Platform
Parallel Computing Platform
Logical Organization
❖ Parallelism can be expressed at various levels of granularity.
❖ Computation / Communication Ratio:
◦ In parallel computing, granularity is a qualitative measure of the ratio of
computation to communication.
◦ Periods of computation are typically separated from periods of communication by
synchronization events.
Parallel Computing Platform
Logical Organization
Fine-grain Parallelism:
❖ Relatively small amounts of computational work are done between communication events.
❖ Low computation to communication ratio.
❖ Facilitates load balancing.
❖ Implies high communication overhead and less opportunity for performance enhancement.
❖ If granularity is too fine it is possible that the overhead required for communications and synchronization between tasks takes longer than the computation.
Coarse-grain Parallelism:
❖ Relatively large amounts of computational work are done between communication/synchronization events.
❖ High computation to communication ratio.
❖ Implies more opportunity for performance increase.
❖ Harder to load balance efficiently.
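As a concrete illustration, here is a minimal MPI sketch (an assumed example, not part of the slides): the coarse-grain version sums a large local block and communicates only once, while reducing after every element instead would make the job fine-grain, with one communication per addition.

/* Assumed example: coarse-grain vs. fine-grain decomposition of a vector
 * sum using MPI. Compile with an MPI compiler wrapper such as mpicc. */
#include <mpi.h>
#include <stdio.h>

#define N 1000000   /* assume N is divisible by the number of processes */

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Coarse grain: each process does a large amount of local computation
     * (summing its whole block) between communication events. */
    long chunk = N / size;
    double local = 0.0;
    for (long i = (long)rank * chunk; i < (long)(rank + 1) * chunk; i++)
        local += (double)i;

    /* A single collective communication / synchronization event. */
    double total = 0.0;
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    /* Fine grain (do NOT do this): calling MPI_Reduce once per element
     * would mean one communication per addition, i.e. a very low
     * computation-to-communication ratio and huge overhead. */

    if (rank == 0)
        printf("sum = %f\n", total);
    MPI_Finalize();
    return 0;
}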
Parallel Computing Platform
Logical Organization
Which is Best?
❖ The most efficient granularity is dependent on the algorithm and the
hardware environment in which it runs.
❖In most cases the overhead associated with communications and
synchronization is high relative to execution speed so it is advantageous to
have coarse granularity.
❖ Fine-grain parallelism can help reduce overheads due to load imbalance.
Parallel Computing Platform
Logical Organization
Task A logically discrete section of computational work. A task is typically a program
or program-like set of instructions that is executed by a processor.
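For illustration only, a minimal sketch (an assumed example, not from the slides) of a task expressed as a C function that a POSIX thread executes:

/* Assumed example: a "task" as a discrete unit of work handed to a thread. */
#include <pthread.h>
#include <stdio.h>

/* The task: a logically discrete section of computational work. */
static void *sum_task(void *arg) {
    long n = *(long *)arg;
    long s = 0;
    for (long i = 0; i < n; i++)
        s += i;
    printf("task done, sum = %ld\n", s);
    return NULL;
}

int main(void) {
    pthread_t worker;
    long n = 1000;
    pthread_create(&worker, NULL, sum_task, &n);  /* dispatch the task to a processor */
    pthread_join(worker, NULL);                   /* wait for it to finish */
    return 0;
}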
Models:
Flynn's Classical Taxonomy
❖Flynn’s classification scheme is based on the notion of a stream of information.
❖Two types of information flow into a processor: instructions and data
❖Each of these dimensions can have only one of two possible states: Single or
Multiple.
❖ One of the more widely used classifications of parallel computers, in use since 1966.
Flynn's Classical Taxonomy
The matrix below defines the 4 possible classifications according to Flynn:

                              Single Data (SD)    Multiple Data (MD)
  Single Instruction (SI)     SISD                SIMD
  Multiple Instruction (MI)   MISD                MIMD
Flynn's Taxonomy
Single Instruction, Single Data (SISD):
❖ A serial (non-parallel) computer
❖Single instruction: only one instruction stream is being acted on by the
CPU during any one clock cycle
❖Single data: only one data stream is being used as input during any one
clock cycle
❖ This is the oldest and, even today, the most common type of computer
❖ Examples: older generation mainframes, minicomputers and
workstations; most modern day PCs.
Flynn's Taxonomy
Single Instruction, Multiple Data (SIMD):
❖ Single instruction: All processing units execute the same instruction at any given clock cycle.
❖ Multiple data: Each processing unit can operate on a different data element.
❖ A single control unit dispatches the same instruction to various processors (that work on different data).
Flynn's Taxonomy
❖ Processor Arrays: ILLIAC IV, DAP Connection Machine CM-2, MasPar MP-1.
❖Vector Pipelines: IBM 9000, Cray X-MP, Y-MP & C90, Fujitsu VP, NEC SX-2,
Hitachi S820, ETA10
❖ Most modern computers, particularly those with graphics processing units (GPUs), employ SIMD instructions and execution units.
❖ Example:
for (i = 0; i < 1000; i++)
    c[i] = a[i] + b[i];
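Expanded into a complete program (a minimal sketch assuming a compiler with auto-vectorization, e.g. gcc -O3; not part of the original slides), the element-wise add is a textbook SIMD candidate because one instruction is applied to many data elements:

/* Assumed example: the element-wise add above as a self-contained program.
 * With optimization enabled, the compiler can map the loop to SIMD
 * instructions: a single add applied to several data elements at once. */
#include <stdio.h>

#define N 1000

int main(void) {
    float a[N], b[N], c[N];

    for (int i = 0; i < N; i++) {   /* initialize the input vectors */
        a[i] = (float)i;
        b[i] = 2.0f * (float)i;
    }

    /* Single instruction (add), multiple data (all elements of a and b). */
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    printf("c[42] = %f\n", c[42]);
    return 0;
}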
Your Turn !!!
Can you guess the SIMD drawbacks?
Flynn's Taxonomy
Multiple Instruction, Single Data (MISD):
❖A single data stream is fed into multiple processing
units.
❖Each processing unit operates on the data
independently via independent instruction streams.
❖Few actual examples of this class of parallel
computer have ever existed. One is the experimental
Carnegie-Mellon C.mmp computer (1971).
❖ Example: multiple cryptography algorithms attempting to crack a single coded message.
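To make the idea concrete, a minimal pthreads sketch (an assumed example, not from the slides): two independent instruction streams operate on the same single data stream, here one short coded message.

/* Assumed MISD-style sketch: several instruction streams, one data stream. */
#include <pthread.h>
#include <stdio.h>
#include <string.h>

static const char *message = "Uryyb";   /* the single coded message */

static void *try_rot13(void *arg) {     /* instruction stream 1 */
    char out[64];
    size_t n = strlen(message);
    for (size_t i = 0; i < n; i++) {
        char c = message[i];
        if (c >= 'A' && c <= 'Z')      c = 'A' + (c - 'A' + 13) % 26;
        else if (c >= 'a' && c <= 'z') c = 'a' + (c - 'a' + 13) % 26;
        out[i] = c;
    }
    out[n] = '\0';
    printf("ROT13 guess:   %s\n", out);
    return NULL;
}

static void *try_reverse(void *arg) {   /* instruction stream 2 */
    char out[64];
    size_t n = strlen(message);
    for (size_t i = 0; i < n; i++)
        out[i] = message[n - 1 - i];
    out[n] = '\0';
    printf("Reverse guess: %s\n", out);
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, try_rot13, NULL);
    pthread_create(&t2, NULL, try_reverse, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}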
Flynn's Taxonomy
Multiple Instruction, Multiple Data (MIMD):
❖Currently, the most common type of parallel computer. Most
modern computers fall into this category.
❖Multiple Instruction: every processor may be executing a
different instruction stream.
❖Multiple Data: every processor may be working with a
different data stream.
❖Examples: most current supercomputers, networked parallel
computer clusters and "grids", multi-processor SMP computers,
multi-core PCs.
❖ Note: many MIMD architectures also include SIMD execution sub-components.
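A minimal pthreads sketch (an assumed example, not from the slides): two threads execute different instruction streams on different data streams at the same time.

/* Assumed MIMD-style sketch: different instructions on different data. */
#include <pthread.h>
#include <stdio.h>

static void *sum_ints(void *arg) {          /* instruction stream 1 */
    long s = 0;
    for (long i = 0; i < 1000; i++) s += i; /* data stream 1: the integers */
    printf("sum = %ld\n", s);
    return NULL;
}

static void *count_vowels(void *arg) {      /* instruction stream 2 */
    const char *text = (const char *)arg;   /* data stream 2: a string */
    int count = 0;
    for (const char *p = text; *p; p++)
        if (*p == 'a' || *p == 'e' || *p == 'i' || *p == 'o' || *p == 'u')
            count++;
    printf("vowels = %d\n", count);
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, sum_ints, NULL);
    pthread_create(&t2, NULL, count_vowels, "multiple instruction multiple data");
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}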
Your Turn !!!
Compare SIMD and MIMD.