03 - Lecture - Performance Analysis

This chapter discusses performance analysis and scalability of multiprocessor architectures. It introduces computational models like the equal duration model and parallel computation with serial sections model. It describes speedup factors and efficiency. Amdahl's law and Gustafson-Barsis's law relate to the limits of parallelization. Interconnection network bandwidth is analyzed for crossbars, buses, and multistage networks. Scalability is defined in terms of speed, efficiency, size, application, and generation scalability.

Chapter 3

Performance Analysis of
Multiprocessor Architecture

Advanced Computer Architecture and Parallel Processing Hesham El-Rewini & Mostafa Abd-El-Barr
3.1 Computational Models
• Equal Duration Model
– A task is divided into n equal subtasks, each of which is executed by one processor.
– ts is the execution time of the whole task using a single processor.
– The time taken by each processor to execute its subtask is tm = ts/n.
– Since all n processors run concurrently, the time taken by the whole system to execute the whole task is also tm = ts/n.

– Speedup factor S(n) = ts/tm = ts/(ts/n) = n.
– That is, the speedup factor equals the number of processors used, n.
– Now assume an overhead time tc is needed for the processors to communicate; then tm = (ts/n) + tc.
– S(n) becomes: ts/((ts/n) + tc) = n/(1 + n × tc/ts).
– Efficiency = S(n)/n = 1/(1 + n × tc/ts).
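The effect of the communication overhead can be checked numerically. A minimal Python sketch of the model above (the function names are illustrative, not from the text):

```python
def speedup(n, ts, tc):
    """Speedup under the equal-duration model with communication overhead tc."""
    tm = ts / n + tc              # time taken by each of the n processors
    return ts / tm                # equivalent to n / (1 + n*tc/ts)

def efficiency(n, ts, tc):
    """Efficiency is speedup divided by the number of processors."""
    return speedup(n, ts, tc) / n

# Example: ts = 100 time units, tc = 1 unit of communication overhead.
# With n = 10 processors the speedup is 100/11, about 9.09 instead of the
# ideal 10, and the efficiency drops to about 0.91.
```

With tc = 0 the ideal case S(n) = n is recovered.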

• Parallel Computation With Serial Sections
Model:
– A fraction f of a given task is not dividable into
concurrent subtasks.
– (1-f) is assumed to be dividable into
concurrent subtasks.
– Therefore: S(n) = ts / (f × ts + (1 − f) × ts/n) = n / (1 + (n − 1) × f)
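The two forms of this expression are algebraically identical, which is easy to verify numerically. A sketch (names are illustrative):

```python
def speedup_serial_fraction(n, f, ts=1.0):
    """Speedup when a fraction f of the task cannot be divided into
    concurrent subtasks; the remaining (1 - f) runs on n processors."""
    parallel_time = f * ts + (1 - f) * ts / n
    return ts / parallel_time     # simplifies to n / (1 + (n - 1)*f)

# With n = 16 processors and a 25% serial fraction, the speedup is only
# 16 / (1 + 15*0.25), about 3.37, far below the ideal 16.
```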

• Speedup:
– S = Speed(new) / Speed(old)
– S = Work/time(new) / Work/time(old)
– S = time(old) / time(new)
– S = time(before improvement) /
time(after improvement)

• Speedup:
– Time (one CPU): T(1).
– Time (n CPUs): T(n).
– Speedup: S
– S = T(1)/T(n)

3.2 An Argument For Parallel
Architectures
• Grosch’s law
– “To sell a computer for twice as much, it must
be four times as fast”.
– Vendors skip small speed improvements in
favor of waiting for large ones.

– Buyers of expensive machines would wait for
a twofold improvement in performance for the
same price.

[Figure: computing power plotted against cost, power growing as the square of cost]

• Amdahl’s law
– The performance improvement to be gained
from using some faster mode of execution is
limited by the fraction of the time the faster
mode can be used.
– There is an intrinsic limit on the achievable performance improvement (speedup), regardless of the number of processors used.

• Amdahl's law example:

  Travel 200 miles from A to B, then spend 20 hours on a task that cannot be sped up; only the travel can be improved.

  Walk   4 miles/hour   → 50 + 20 = 70 hours       S = 1
  Bike   10 miles/hour  → 20 + 20 = 40 hours       S = 1.8
  Car-1  50 miles/hour  → 4 + 20 = 24 hours        S = 2.9
  Car-2  120 miles/hour → 1.67 + 20 = 21.67 hours  S = 3.2
  Car-3  600 miles/hour → 0.33 + 20 = 20.33 hours  S = 3.4
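The table can be reproduced in a few lines of Python (variable names are illustrative):

```python
distance = 200   # miles of travel, the part that can be sped up
fixed = 20       # hours of work that cannot be sped up
base = distance / 4 + fixed   # walking baseline: 50 + 20 = 70 hours

for name, mph in [("Walk", 4), ("Bike", 10), ("Car-1", 50),
                  ("Car-2", 120), ("Car-3", 600)]:
    total = distance / mph + fixed
    print(f"{name}: {total:.2f} hours, S = {base / total:.1f}")
```

However fast the vehicle, the 20 fixed hours cap the speedup at 70/20 = 3.5.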

• Amdahl’s law
– β: the fraction of the program that is naturally serial.
– (1 − β): the fraction of the program that is naturally parallel.

• Amdahl’s law
S = T(1)/T(N)

T(N) = T(1) × β + T(1) × (1 − β)/N

S = 1 / (β + (1 − β)/N) = N / (β × N + (1 − β))
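As N grows, the denominator is dominated by βN, so S approaches 1/β. A quick numerical check (a sketch; names are illustrative):

```python
def amdahl_speedup(N, beta):
    """Amdahl's law: S = N / (beta*N + (1 - beta))."""
    return N / (beta * N + (1 - beta))

# With a 5% serial fraction the speedup is capped near 1/0.05 = 20,
# no matter how many processors are used:
for N in (10, 100, 10000):
    print(N, amdahl_speedup(N, 0.05))   # climbs toward, but never reaches, 20
```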

• Gustafson-Barsis’s law
– If s and p are the serial and parallel time spent on the parallel system, then s + p × n is the time a serial processor would need to perform the same computation.

• Gustafson-Barsis’s law
– N and β are not independent of each other: in practice the problem size grows with N.
– α: the fraction of the (scaled) program that is naturally serial.

  T(N) = 1 (parallel execution time, normalized)

  T(1) = α + (1 − α) × N

  S = T(1)/T(N) = α + (1 − α) × N = N − (N − 1) × α
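Under this scaled-workload assumption the speedup grows linearly in N instead of saturating. A sketch (names are illustrative):

```python
def gustafson_speedup(N, alpha):
    """Scaled speedup: S = alpha + (1 - alpha)*N = N - (N - 1)*alpha."""
    return N - (N - 1) * alpha

# Same 5% serial fraction as in the Amdahl example, but on a workload
# that scales with N: at N = 100 the speedup is 100 - 99*0.05 = 95.05,
# not capped at 20.
```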

3.3 Interconnection Networks
Performance Issues
• Bandwidth of a crossbar:
– is the average number of requests that can
be accepted by a crossbar in a given cycle.
– For M memory modules and n processors,
• if a processor generates a request with probability ρ in a cycle, directed to each memory module with equal probability, then the bandwidth is: BW = M(1 − (1 − ρ/M)^n)
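The formula is the expected number of distinct memory modules accessed in a cycle. A sketch of the computation (names are illustrative):

```python
def crossbar_bandwidth(M, n, rho):
    """Expected number of memory requests accepted per cycle by an
    n-processor, M-module crossbar; each processor issues a request
    with probability rho, uniformly over the M modules."""
    # (1 - rho/M)**n is the probability that no processor requests a
    # given module; one minus that is the probability it is busy.
    return M * (1 - (1 - rho / M) ** n)

# Fully loaded 8x8 crossbar (rho = 1): about 5.25 of the 8 modules are
# accessed per cycle, since some processors collide on the same module.
```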

• Bandwidth of a multiple-bus system with B buses:

  BW = Σ_{k=1}^{B} k × β_k + Σ_{k=B+1}^{N} B × β_k

– where β_k is the probability that exactly k distinct memory requests are issued in a cycle: up to B requests can be served simultaneously (one per bus), so any surplus beyond B is limited to B.

[Figure: N processors P1 … PN connected to M memory modules M1 … MM through B shared buses]

• Bandwidth of a Multistage Interconnection Network (MIN):
– Assumption: the MIN consists of n stages of a × b crossbar switches.
– BW = b^n × r_n, where r_n is the request rate on each output link of the final stage.
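Assuming the standard delta-network analysis, the rate on each output link of an a × b switch whose inputs each carry rate r is 1 − (1 − r/b)^a, and this recurrence is iterated stage by stage. A sketch, not taken verbatim from the text:

```python
def min_bandwidth(a, b, stages, r0=1.0):
    """Bandwidth of a MIN built from `stages` stages of a x b crossbar
    switches, with request rate r0 on each network input link.

    r is updated per stage as r = 1 - (1 - r/b)**a; the bandwidth is
    (number of network outputs) * (final rate) = b**stages * r_n.
    """
    r = r0
    for _ in range(stages):
        r = 1 - (1 - r / b) ** a
    return (b ** stages) * r

# An 8x8 network built from three stages of 2x2 switches, under full
# load, delivers about 4.13 requests per cycle.
```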

3.4 Scalability of Parallel
Architectures
– A parallel architecture is scalable if it can be expanded
(or reduced) to a larger (smaller) system with a linear
increase (decrease) in its performance (cost).
– In terms of speed, a scalable system is capable of
increasing its speed in proportion to the increase in
number of processors.
– In terms of efficiency, a parallel system is scalable if its
efficiency is kept fixed as the number of processors is
increased.
– Size scalability: measures the maximum number of
processors a system can accommodate.

– Application scalability: the ability to run application software with improved performance on a scaled-up version of the system.
– Generation scalability: the ability of a system to scale up by using next-generation (faster) components.
– Heterogeneous scalability: the ability of a system to scale up using hardware and software components supplied by different vendors.

3.5 Benchmark Performance
– Benchmark performance: the use of a set of integer and floating-point programs (known as a benchmark suite) designed to test different performance aspects of the computing system under test.
– Benchmark programs should be designed to provide fair and effective comparisons among high-performance computing systems.

– Serial Benchmarks
– Parallel Benchmarks
– PERFECT Benchmarks
– NAS Kernel
– The SLALOM
– The Gordon Bell Prize
– WebSTONE for the Web

3.6 Summary
• A number of issues related to the performance of multiprocessor systems were covered.
• Two computational models were introduced:
– Equal duration
– Parallel computation with serial sections
• A rebuttal to a number of critical views about the effectiveness of parallel architectures was made:
– Grosch’s law
– Amdahl’s law
• A number of performance metrics for static and dynamic interconnection networks were provided.

• The scalability of parallel architectures in terms of speed and efficiency has been discussed.
• A number of unconventional metrics for scalability have also been discussed.
• Finally, the issue of benchmark performance measurement has been introduced.
