
Lecture 02 - Performance

This document discusses improving CPU performance by optimizing execution time. It explains that execution time depends on instruction count, cycles per instruction (CPI), and clock rate. Clock rates historically rose as transistors shrank, in step with Moore's Law, but power constraints now prevent further increases. As a result, computer architects focus on reducing CPI through techniques such as pipelining, caching, and speculative execution to improve performance within the same power budget. Software, in turn, reduces instruction counts through better algorithms and compiler optimizations.


Lecture 2: Performance

CMPS 221 – Computer Organization and Design

Slides by Mahmoud Bdeir and Izzat El Hajj


Clocks
• A computer is driven by a clock that determines
when events take place

• A clock cycle is a discrete time interval between two
pulses of an oscillator
• A clock period is the duration of a clock cycle
• The clock rate or frequency is the number of clock
cycles per second (inverse of the clock period)
• Example: the Intel Core i7-8700K has a clock rate of 3.7 GHz.
What is its clock period?
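
The question above can be worked out directly, since the clock period is the inverse of the clock rate. The following is a minimal Python sketch; the function name `clock_period_seconds` is an illustrative choice, not something from the slides.

```python
def clock_period_seconds(clock_rate_hz: float) -> float:
    """Return the clock period (seconds per cycle) for a clock rate given in Hz."""
    if clock_rate_hz <= 0:
        raise ValueError("clock rate must be positive")
    return 1.0 / clock_rate_hz


period = clock_period_seconds(3.7e9)  # 3.7 GHz, the clock rate quoted on the slide
print(f"{period * 1e12:.0f} ps")      # 270 ps, i.e. roughly 0.27 ns
```
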
Which has better performance?
• CPU1: 2.4 GHz
• CPU2: 3.8 GHz

Trick question!
Which has better performance?

It depends on the performance metric we care about


Which has better performance?

If we care about minimizing the time to transport one person
from one place to another (i.e., execution time)…

…the car
Which has better performance?

If we care about maximizing the number of people we can
transport in a certain amount of time (i.e., throughput)…

…the bus
Which has better performance?

If we care about minimizing the energy it takes
to transport people (i.e., energy efficiency)…

…bikes
Which has better execution time?
• CPU1: 2.4 GHz
• CPU2: 3.8 GHz

Still a trick question!


Components of Execution Time
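$$
\frac{\text{Seconds}}{\text{Program}}
= \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Seconds}}{\text{Instruction}}
= \underbrace{\frac{\text{Instructions}}{\text{Program}}}_{\text{Instruction Count}} \times \underbrace{\frac{\text{Clock Cycles}}{\text{Instruction}}}_{\text{CPI}} \times \underbrace{\frac{\text{Seconds}}{\text{Clock Cycle}}}_{\text{Clock Rate}^{-1}}
$$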
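
The three-factor equation above translates directly into a small calculation. The following is a minimal Python sketch; the function name `execution_time` and the sample numbers are illustrative assumptions, not values from the slides.

```python
def execution_time(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """Seconds per program = instruction count x CPI x (1 / clock rate)."""
    if clock_rate_hz <= 0:
        raise ValueError("clock rate must be positive")
    # Dividing by the clock rate is the same as multiplying by the clock period.
    return instruction_count * cpi / clock_rate_hz


# Hypothetical program: 1 billion instructions, average CPI of 2, on a 2 GHz clock.
print(execution_time(1e9, 2.0, 2e9))  # 1.0 (second)
```
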


Which has better execution time?
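$$
\frac{\text{Seconds}}{\text{Program}}
= \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Seconds}}{\text{Instruction}}
= \underbrace{\frac{\text{Instructions}}{\text{Program}}}_{\text{Instruction Count}} \times \underbrace{\frac{\text{Clock Cycles}}{\text{Instruction}}}_{\text{CPI}} \times \underbrace{\frac{\text{Seconds}}{\text{Clock Cycle}}}_{\text{Clock Rate}^{-1}}
$$

        Instruction Count   CPI   Clock Rate
CPU1    ?                   ?     2.4 GHz
CPU2    ?                   ?     3.8 GHz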

Cannot decide based on just the clock rate
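
To see why the clock rate alone is not enough, the unknown entries can be filled in with made-up values. The following Python sketch assumes both CPUs run the same 1-billion-instruction program and picks CPI values of 1.0 and 2.0 purely for illustration; none of these numbers come from the slides.

```python
def execution_time(instruction_count, cpi, clock_rate_hz):
    """Execution time in seconds: instruction count x CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


INSTRUCTIONS = 1e9  # assumed: both CPUs execute the same number of instructions

# Assumed CPI values; the slide leaves them unknown ("?").
cpu1_time = execution_time(INSTRUCTIONS, cpi=1.0, clock_rate_hz=2.4e9)
cpu2_time = execution_time(INSTRUCTIONS, cpi=2.0, clock_rate_hz=3.8e9)

print(f"CPU1: {cpu1_time:.3f} s")  # CPU1: 0.417 s
print(f"CPU2: {cpu2_time:.3f} s")  # CPU2: 0.526 s
```

Under these assumed values the 2.4 GHz CPU finishes first; with different CPI values the ordering could just as easily flip.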


Improving a CPU’s Execution Time
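$$
\underbrace{\frac{\text{Instructions}}{\text{Program}}}_{\text{Instruction Count}} \times \underbrace{\frac{\text{Clock Cycles}}{\text{Instruction}}}_{\text{CPI}} \times \underbrace{\frac{\text{Seconds}}{\text{Clock Cycle}}}_{\text{Clock Rate}^{-1}}
$$

Approaches to decreasing execution time involve
decreasing one of these three components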
Improving a CPU’s Execution Time
$$
\underbrace{\frac{\text{Instructions}}{\text{Program}}}_{\text{Instruction Count}} \times \underbrace{\frac{\text{Clock Cycles}}{\text{Instruction}}}_{\text{CPI}} \times \underbrace{\frac{\text{Seconds}}{\text{Clock Cycle}}}_{\text{Clock Rate}^{-1}}
$$

For a long time, improvements in circuit technology enabled
driving processors at higher clock rates, improving execution
time without the need for additional effort by software
developers and computer architects

This trend was called the “free lunch”


Moore’s “Law”
[Figure: transistor count (thousands) vs. year, 1970-2020, logarithmic vertical axis from 10^0 to 10^7]

Source: M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond, C. Batten (1970-2010). K. Rupp (2010-2017).

Moore’s “Law” predicted that the number of transistors
per unit area would double every 18-24 months
No More Free Lunch
[Figure: transistor count (thousands) and frequency (MHz) vs. year, 1970-2020, logarithmic vertical axis from 10^0 to 10^7]

Source: M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond, C. Batten (1970-2010). K. Rupp (2010-2017).

Processor frequency (clock rate) followed the same trend because
smaller transistors can be switched faster… until around 2005.
Power Wall
[Figure: transistor count (thousands) and frequency (MHz) vs. year, 1970-2020, logarithmic vertical axis from 10^0 to 10^7, with a "Power Wall" annotation]

Source: M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond, C. Batten (1970-2010). K. Rupp (2010-2017).

Around 2005, frequency stopped increasing due to the Power Wall


Power Breakdown

$$P \propto C V^2 f$$

where P is power, C is capacitance, V is voltage, and f is frequency
Power Breakdown

$$P \propto C V^2 f$$

where P is power, C is capacitance, V is voltage, and f is frequency

Increasing frequency increases power which dissipates more heat,
requiring more support for cooling the chip
Power Breakdown

$$P \propto C V^2 f$$

where P is power, C is capacitance, V is voltage, and f is frequency

Historically, the increase in power was partially compensated for by a
decrease in voltage, enabled by the decrease in transistor size
(over 20 years, there was a 1,000x increase in frequency but only
a 30x increase in power because voltage decreased by 5x)
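
The effect of voltage scaling can be checked against the proportionality above. The following Python sketch plugs in the slide's figures (1,000x frequency, voltage divided by 5) and assumes the capacitance term stays constant, which is a simplification introduced here; the function name `relative_power` is illustrative.

```python
def relative_power(capacitance_ratio: float, voltage_ratio: float, frequency_ratio: float) -> float:
    """Power ratio implied by P ∝ C * V^2 * f, given the ratio of each factor."""
    return capacitance_ratio * voltage_ratio ** 2 * frequency_ratio


# Figures quoted on the slide: frequency up 1,000x, voltage down 5x.
# Assumption: capacitance unchanged (ratio 1.0).
predicted = relative_power(1.0, 1 / 5, 1000)
print(f"predicted power increase: {predicted:.0f}x")  # predicted power increase: 40x

# Without the voltage drop, power would have grown with frequency alone.
print(f"without voltage scaling: {relative_power(1.0, 1.0, 1000):.0f}x")  # without voltage scaling: 1000x
```

The constant-capacitance estimate of about 40x is of the same order as the 30x quoted on the slide; the remaining gap would have to come from the capacitance term or from rounding in the quoted figures.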
Power Breakdown

$$P \propto C V^2 f$$

where P is power, C is capacitance, V is voltage, and f is frequency

Today, voltage can no longer be decreased because lowering it further makes transistors unreliable,

and power can no longer be increased because we have reached the limit of what we
can cool.

Therefore, frequency can no longer be increased.


Power Trend
[Figure: transistor count (thousands), frequency (MHz), and typical power (Watts) vs. year, 1970-2020, logarithmic vertical axis from 10^0 to 10^7]

But we still get more transistors! What to do with them?

Source: M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond, C. Batten (1970-2010). K. Rupp (2010-2017).

Stagnation in frequency is associated with a stagnation in power


Where to invest transistors?
• Increase number of cores (or threads per core)
  • Improves throughput

• Make cores more advanced
  • Improves execution time

• Tradeoff between execution time and throughput


Improving a CPU’s Execution Time
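$$
\underbrace{\frac{\text{Instructions}}{\text{Program}}}_{\text{Instruction Count}} \times \underbrace{\frac{\text{Clock Cycles}}{\text{Instruction}}}_{\text{CPI}} \times \underbrace{\frac{\text{Seconds}}{\text{Clock Cycle}}}_{\text{Clock Rate}^{-1}}
$$

Computer architects have developed a wide variety of
techniques for increasing the number of instructions that
can be executed each clock cycle (that is, for reducing CPI)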
Techniques for Reducing CPI
• Pipelining (Chapter 4)

• Caching (Chapter 5)

• Speculative Execution (covered briefly)

• Out-of-order Execution (covered briefly)


Improving a CPU’s Execution Time
$$
\underbrace{\frac{\text{Instructions}}{\text{Program}}}_{\text{Instruction Count}} \times \underbrace{\frac{\text{Clock Cycles}}{\text{Instruction}}}_{\text{CPI}} \times \underbrace{\frac{\text{Seconds}}{\text{Clock Cycle}}}_{\text{Clock Rate}^{-1}}
$$

Software plays a primary role in reducing a program’s instruction count
(low complexity algorithms, powerful compiler optimizations)

Computer architecture also plays a role by providing special purpose
hardware for common operations (increasingly popular trend)
Pitfalls
$$
\underbrace{\frac{\text{Instructions}}{\text{Program}}}_{\text{Instruction Count}} \times \underbrace{\frac{\text{Clock Cycles}}{\text{Instruction}}}_{\text{CPI}} \times \underbrace{\frac{\text{Seconds}}{\text{Clock Cycle}}}_{\text{Clock Rate}^{-1}}
$$

A CPU manufacturer increases the clock rate of their processor,
decreasing the clock cycle duration.

As a result, some instructions that used to take 1 cycle to
complete now require 2 cycles, increasing the overall CPI.

If the CPI grows by a larger factor than the clock rate, execution time increases.
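
This pitfall can be made concrete with a small calculation. The following Python sketch compares two designs running the same instruction count; the clock rates and CPI values are hypothetical numbers chosen for illustration, and the function name `speedup` is an assumption.

```python
def speedup(old_cpi, old_clock_hz, new_cpi, new_clock_hz):
    """Old execution time divided by new execution time, for the same instruction count.

    Values above 1 mean the new design is faster; values below 1 mean it is slower.
    """
    old_time_per_instruction = old_cpi / old_clock_hz
    new_time_per_instruction = new_cpi / new_clock_hz
    return old_time_per_instruction / new_time_per_instruction


# Hypothetical numbers: the clock rate rises 25% (2.0 -> 2.5 GHz),
# but the average CPI rises 50% (1.0 -> 1.5).
result = speedup(old_cpi=1.0, old_clock_hz=2.0e9, new_cpi=1.5, new_clock_hz=2.5e9)
print(f"speedup: {result:.2f}")  # speedup: 0.83, so the higher-clocked design is slower
```
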


Pitfalls
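$$
\underbrace{\frac{\text{Instructions}}{\text{Program}}}_{\text{Instruction Count}} \times \underbrace{\frac{\text{Clock Cycles}}{\text{Instruction}}}_{\text{CPI}} \times \underbrace{\frac{\text{Seconds}}{\text{Clock Cycle}}}_{\text{Clock Rate}^{-1}}
$$

A CPU manufacturer creates a fused multiply-add (FMA)
instruction, since a multiply followed by an add is a common operation in linear algebra.

Assuming an add instruction takes 4 cycles and a multiply
instruction takes 8 cycles, if the FMA instruction takes 12 cycles,
then execution time does not improve: the instruction count drops, but the CPI rises by the same factor.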
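
The arithmetic behind this pitfall can be sketched in a few lines of Python. The add and multiply latencies are the slide's figures; the workload size, the function names, and the assumption that instructions execute one after another without overlap are illustrative simplifications.

```python
ADD_CYCLES = 4  # from the slide
MUL_CYCLES = 8  # from the slide


def cycles_without_fma(pairs: int) -> int:
    """Cycles for `pairs` multiply-then-add pairs issued as separate instructions."""
    return pairs * (MUL_CYCLES + ADD_CYCLES)


def cycles_with_fma(pairs: int, fma_cycles: int) -> int:
    """Cycles for the same work when each pair is a single FMA instruction."""
    return pairs * fma_cycles


pairs = 1_000_000  # assumed workload size
print(cycles_without_fma(pairs))   # 12000000
print(cycles_with_fma(pairs, 12))  # 12000000: half the instructions, double the CPI
print(cycles_with_fma(pairs, 10))  # 10000000: a gain only if the FMA takes fewer than 12 cycles
```
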
Textbook Sections
• Some of the content in these slides corresponds to:

  • Textbook:
    • Computer Organization and Design, 5th Edition by David
      Patterson and John Hennessy, Morgan Kaufmann, 2014.

  • Sections:
    • 1.6, 1.7
