Chapter 3
Performance Analysis of
Multiprocessor Architecture
Advanced Computer Architecture and Parallel Processing Hesham El-Rewini & Mostafa Abd-El-Barr
3.1 Computational Models
• Equal Duration Model
– A task is divided into n equal subtasks, each of which is executed by one processor.
– ts is the execution time of the whole task using a single processor.
– The time taken by each processor to execute its subtask is tm = ts/n.
– Since all n processors run concurrently, the time taken by the whole system to execute the whole task is also tm = ts/n.
– Speedup factor S(n) = ts/tm = ts/(ts/n) = n.
– The speedup factor equals the number of processors used, n.
– If an overhead time tc is needed for the processors to communicate, then tm = (ts/n) + tc.
– The speedup becomes: S(n) = ts/((ts/n) + tc) = n/(1 + n·tc/ts).
– Efficiency = S(n)/n = 1/(1 + n·tc/ts).
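The equal-duration model with communication overhead can be sketched as a short Python function (the function names are illustrative, not from the text):

```python
def equal_duration_speedup(n, ts, tc=0.0):
    """Speedup under the equal-duration model: the task splits into n
    equal subtasks of ts/n each, plus a communication overhead tc."""
    tm = ts / n + tc              # time on the n-processor system
    return ts / tm                # equals n / (1 + n*tc/ts)

def equal_duration_efficiency(n, ts, tc=0.0):
    """Efficiency = speedup divided by the number of processors."""
    return equal_duration_speedup(n, ts, tc) / n
```

With tc = 0 the speedup is exactly n and the efficiency is 1; any nonzero tc pulls both below those ideals.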
• Parallel Computation With Serial Sections
Model:
– A fraction f of a given task cannot be divided into concurrent subtasks.
– The remaining fraction (1 − f) is assumed to be dividable into concurrent subtasks.
– Therefore the speedup is:
S(n) = ts / (f·ts + (1 − f)·ts/n) = n / (1 + (n − 1)·f)
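A minimal Python sketch of this model (illustrative naming):

```python
def serial_sections_speedup(n, f):
    """Speedup when a fraction f of the task is serial and the rest is
    split evenly over n processors: S(n) = n / (1 + (n - 1)*f)."""
    return n / (1 + (n - 1) * f)
```

Setting f = 0 recovers the equal-duration result S(n) = n; f = 1 gives S(n) = 1, i.e. no benefit from parallelism.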
• Speedup:
– S = Speed(new) / Speed(old)
– S = (Work/time(new)) / (Work/time(old))
– S = time(old) / time(new)
– S = time(before improvement) / time(after improvement)
– Time (one CPU): T(1).
– Time (n CPUs): T(n).
– Speedup: S
– S = T(1)/T(n)
3.2 An Argument For Parallel
Architectures
• Grosch’s law
– “To sell a computer for twice as much, it must
be four times as fast”.
– Vendors skip small speed improvements in
favor of waiting for large ones.
– Buyers of expensive machines would wait for
a twofold improvement in performance for the
same price.
[Figure: computing power versus cost under Grosch's law — power grows as the square of the cost]
• Amdahl’s law
– The performance improvement to be gained
from using some faster mode of execution is
limited by the fraction of the time the faster
mode can be used.
– There is an intrinsic limit set on the
performance improvement (speed)
regardless of the number of processors used.
Example: travel from A to B, a distance of 200 miles, where a fixed 20-hour portion of the trip cannot be sped up.

Walk, 4 miles/hour → 50 + 20 = 70 hours, S = 1
Bike, 10 miles/hour → 20 + 20 = 40 hours, S = 1.8
Car-1, 50 miles/hour → 4 + 20 = 24 hours, S = 2.9
Car-2, 120 miles/hour → 1.67 + 20 = 21.67 hours, S = 3.2
Car-3, 600 miles/hour → 0.33 + 20 = 20.33 hours, S = 3.4

However fast the vehicle, the total time can never drop below the fixed 20 hours, so the speedup is bounded above by 70/20 = 3.5.
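The travel example can be checked numerically; the helper below is illustrative and assumes the 200-mile distance and fixed 20-hour segment from the slide:

```python
def trip_time_and_speedup(speed_mph, distance=200.0, fixed_hours=20.0):
    """Total trip time at the given speed, and speedup relative to
    walking the distance at 4 miles/hour (the 70-hour baseline)."""
    baseline = distance / 4.0 + fixed_hours   # 50 + 20 = 70 hours
    total = distance / speed_mph + fixed_hours
    return total, baseline / total
```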
– β : The fraction of the program that is naturally
serial.
– (1- β): The fraction of the program that is
naturally parallel.
S = T(1)/T(N)

T(N) = T(1)·β + T(1)·(1 − β)/N

S = 1 / (β + (1 − β)/N) = N / (βN + (1 − β))
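Amdahl's law as code, making the intrinsic limit visible (illustrative naming):

```python
def amdahl_speedup(n, beta):
    """Amdahl's law: beta is the naturally serial fraction.
    S = N / (beta*N + (1 - beta)), which approaches 1/beta as N grows."""
    return n / (beta * n + (1 - beta))
```

For example, beta = 0.05 caps the speedup at 1/0.05 = 20 no matter how many processors are used.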
[Figure: speedup under Amdahl's law versus the number of processors N]
• Gustafson-Barsis’s law
– If s and p are the serial and parallel portions of the time spent on the parallel system, then s + p × n is the time a single serial processor would need to perform the same computation.
– N and α are not independent of each other.
– α: the fraction of the program that is naturally serial, measured on the parallel system.
– Normalize the parallel execution time: T(N) = 1.
– Then the equivalent serial time is T(1) = α + (1 − α)·N.
– S = T(1)/T(N) = α + (1 − α)·N = N − (N − 1)·α
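The Gustafson-Barsis scaled speedup as a one-liner (illustrative naming):

```python
def gustafson_speedup(n, alpha):
    """Scaled speedup: alpha is the serial fraction measured on the
    n-processor system. S = N - (N - 1)*alpha, linear in N."""
    return n - (n - 1) * alpha
```

Unlike Amdahl's bound, this speedup keeps growing with N for any fixed alpha < 1, because the problem size is assumed to scale with the machine.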
[Figure: scaled speedup under Gustafson-Barsis's law versus the number of processors]
3.3 Interconnection Networks
Performance Issues
• Bandwidth of a crossbar:
– is the average number of requests that can
be accepted by a crossbar in a given cycle.
– For M memory modules and n processors, if each processor generates a request with probability ρ in a cycle, directed to each memory module with equal probability, then the bandwidth is:
BW = M(1 − (1 − ρ/M)ⁿ)
– Here (1 − ρ/M)ⁿ is the probability that none of the n processors requests a given module in that cycle.
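The crossbar bandwidth expression, sketched in Python (illustrative naming):

```python
def crossbar_bandwidth(m, n, rho):
    """Expected number of memory modules accepting a request per cycle:
    BW = M * (1 - (1 - rho/M)**n), where (1 - rho/M)**n is the chance
    that no processor targets a given module in that cycle."""
    return m * (1 - (1 - rho / m) ** n)
```

The result is always below M (and below n·ρ) because of conflicts: several processors may target the same module, and only one of them is served.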
• Bandwidth of a multiple bus:
– For N processors and B buses (B < N), the bandwidth can be written as:
BW = Σ_{k=1}^{B} k·β_k + Σ_{k=B+1}^{N} B·β_k
where β_k is the probability that exactly k distinct memory requests are issued in a cycle: for k ≤ B all k requests are served, while for k > B only B of them can be carried by the buses.

[Figure: N processors P1 … PN and M memory modules M1 … MM connected by B shared buses]
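Given a distribution beta[k] over the number of distinct requests per cycle, the multiple-bus bandwidth is the expected number of requests served; a sketch (the list-based interface is an assumption for illustration):

```python
def multiple_bus_bandwidth(beta, num_buses):
    """beta[k] = probability that exactly k distinct memory requests are
    issued in a cycle (beta[0] corresponds to k = 0). Each cycle serves
    min(k, B) of them, so BW = sum over k of min(k, B) * beta[k]."""
    return sum(min(k, num_buses) * p for k, p in enumerate(beta))
```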
• Bandwidth of a Multistage Interconnection
Network:
– Assumption: the MIN consists of n stages of a × b crossbar switches.
– If r_j is the request rate on a line at the output of stage j, with r_0 the rate offered at the inputs, then r_j = 1 − (1 − r_{j−1}/b)^a.
– BW = bⁿ × r_n
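The stage-by-stage rate computation can be sketched as follows, assuming independent requests distributed uniformly over switch outputs (illustrative naming):

```python
def min_bandwidth(a, b, n, rho):
    """Bandwidth of an n-stage MIN built from a x b crossbar switches.
    Each stage maps an input request rate r to 1 - (1 - r/b)**a on each
    of its outputs; BW = b**n * r_n, the expected accepted requests."""
    r = rho
    for _ in range(n):
        r = 1 - (1 - r / b) ** a   # rate on each output line of this stage
    return b ** n * r
```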
3.4 Scalability of Parallel
Architectures
– A parallel architecture is scalable if it can be expanded
(or reduced) to a larger (smaller) system with a linear
increase (decrease) in its performance (cost).
– In terms of speed, a scalable system is capable of
increasing its speed in proportion to the increase in
number of processors.
– In terms of efficiency, a parallel system is scalable if its
efficiency is kept fixed as the number of processors is
increased.
– Size scalability: measures the maximum number of
processors a system can accommodate.
– Application scalability: ability to run application
software with improved performance on a scaled-up
version of the system.
– Generation scalability: ability of a system to scale up by using next-generation (faster) components.
– Heterogeneous scalability: ability of a system to scale up by using hardware and software components supplied by different vendors.
3.5 Benchmark Performance
– Benchmark performance: the use of a set of integer and floating-point programs (known as a benchmark suite) designed to test different performance aspects of the computing system under test.
– Benchmark programs should be designed to
provide fair and effective comparisons among
high-performance computing systems.
– Serial Benchmarks
– Parallel Benchmarks
– PERFECT Benchmarks
– NAS Kernel
– SLALOM
– The Gordon Bell Prize
– WebSTONE for the Web
3.6 Summary
• A number of issues related to the performance of multiprocessor systems were covered.
• Two computational models were introduced:
– Equal duration
– Parallel computation with serial sections
• A rebuttal was made to a number of critical views about the effectiveness of parallel architectures, among them:
– Grosch's law
– Amdahl's law
• A number of performance metrics for static and dynamic interconnection networks have been provided.
• The scalability of parallel architectures in terms of speed
and efficiency has been discussed.
• A number of unconventional metrics for scalability have also been discussed.
• Finally, the issue of benchmark performance measurement was introduced.