0% found this document useful (0 votes)
5 views12 pages

CH 6

The document discusses pipelining in microprocessors, emphasizing its importance for improving instruction throughput and reducing execution time. It outlines the differences between single-cycle and multi-cycle implementations, detailing how pipelining can enhance performance while addressing data and control hazards. The text also covers the architecture of pipelined datapaths and control mechanisms necessary for efficient instruction processing.

Uploaded by

jonathanj302
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
5 views12 pages

CH 6

The document discusses pipelining in microprocessors, emphasizing its importance for improving instruction throughput and reducing execution time. It outlines the differences between single-cycle and multi-cycle implementations, detailing how pipelining can enhance performance while addressing data and control hazards. The text also covers the architecture of pipelined datapaths and control mechanisms necessary for efficient instruction processing.

Uploaded by

jonathanj302
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 12

Pipelining Motivation

Forecast Want to minimize:


• Big Picture • Time = Instructions/prog x CPI x Cycle time = P x ? x ?
• Datapath Single cycle implementation:
• Control • CPI = 1
• Data Hazards • Cycle=imem+reg_rd+alu+dmem+reg_wr+muxes & control
• Stalls
• = 500 + 250 + 500 + 500 + 250 + 0 + 0 = 2000 ps = 2 ns
• Forwarding
• Time/prog = P * 2 ns
• Control Hazards
• Exceptions

© 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 6 1 © 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 6 2

Motivation Big Picture


Multicycle implementation: Multicycle implementation: Cycles
• CPI = 3, 4, 5
Instructions
• Cycle=Max(memory, registers, ALU, muxes&control)
1 1 1 1
• = max(500, 250, 500) = 500 ps 1 2 3 4 5 6 7 8 9
0 1 2 3
• Time/prog = P * 4 * 500 = P * 2000 ps = P * 2 ns i F D X M W
Would like: i+1 F D X
• CPI = 1 + hazards i+2 F D X M
• Cycle = 500 ps + overheads i+3 F
• In reality, ~3x improvement i+4

© 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 6 3 © 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 6 4
Big Picture Big Picture
Instruction Latency = 5 cycles Cycles
Instruction Throughput = 1/5 instructions per cycle Instructions
CPI = 5 cycles per instruction 1 1 1 1
1 2 3 4 5 6 7 8 9
0 1 2 3
i F D X M W
Pipelining: process instructions like a lunch buffet!
i+1 F D X M W
ALL microprocessors today employ pipelining for speed i+2 F D X M W
E.g., Intel PentiumIII and Compaq Alpha 21264 i+3 F D X M W
i+4 F D X M W

© 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 6 5 © 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 6 6

Big Picture Big Picture


Instruction latency = 5 cycles - no change But

Instruction throughput = 1 instruction per cycle • datapath? note: five instructions in datapath in cycle 5
• control? must be generated by multiple instructions
CPI = 1 cycle per instruction
• instructions may have data and control flow dependences
CPI = cycle between instruction completion = 1!

© 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 6 7 © 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 6 8
Datapath (Fig. 6.11) Datapath (Fig. 6.10)
Time (in clock cycles)
IF: Instruction fetch
ID: Instruction decode/ EX: Execute/ MEM: Memory access WB: Write bac
Program register file read address calculation
CC 1 CC 2 CC 3 CC 4 CC 5 CC 6 CC 7 0
execution M
u
order x
1
(in instructions)
lw $1, 100($0) IM Reg ALU DM Reg
Add

4 Add
Add result

Shift
left 2

Read
lw $2, 200($0) IM Reg ALU DM Reg PC Address register 1
Read
data 1
Read
register 2 Zero
Instruction RegistersRead ALU ALU
Write data 2 0 Read
result Address 1
register M data
Instruction M
u Data
memory Write x u
memory x
data 1
0
Write
data
16 32
lw $3, 300($0) IM Reg ALU DM Reg Sign
extend

© 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 6 9 © 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 6 10

Big Picture Data Dependence


Control One instruction produces a value used by a later instruction
• Set by five different instructions E.g.,
• Divide and conquer: carry IR down the pipeline • add $1, - , -
• sub -, $4, -

MIPS ISA requires the appearance of sequential execution 1 2 3 4 5 6 7 8 9


i F D X M W*
i+1 F D* X M W

© 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 6 11 © 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 6 12
Data Dependence Control Dependence
Simple solution : Stall the pipeline One instruction affects which instruction will execute next
E.g., E.g., bne, j
• add $1, - , - • sw $4, 0($5)
• sub -, $4, - • bne $2, $3, loop
• sub -, - , -
1 2 3 4 5 6 7 8 9
add F D X M W* 1 2 3 4 5 6 7 8 9
sub F D* X M W sw F D X M W
bne F D X* M W
But CPI > 1, we will do better using “register forwarding”
sub F D X M W

© 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 6 13 © 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 6 14

Control Dependence Pipelined Datapath


• sw $4, 0($5) Single-cycle datapath (Recall Fig. 6.10)
• bne $2, $3, loop Pipelined execution
• sub -, - , - • assume each instruction has itw own datapath (Fig. 6.11)
• but each instruction uses different part in every cycle
1 2 3 4 5 6 7 8 9
• multiplex all on one datapath
sw F D X M W
• latch to separate cycles (as in multicycle) and instructions!
bne F D X* M W
?? F D X M W Ignore data and control flow dependences for now
• data hazards
CPI > 1, we will do better • control flow hazards

© 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 6 15 © 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 6 16
Pipelined Datapath (Fig. 6.12) Pipelined Datapath
0
M
u
Instruction flow
x

• add and load


1

IF/ID ID/EX EX/MEM MEM/WB

Add • write of registers


4 Add
Add result

Shift
left 2
• pass register specifiers
Instruction

Read
PC Address register 1 Read
data 1
Read
register 2 Zero

Any info needed by a later stage will be passed down


Instruction
RegistersRead ALU ALU
memory Write 0 Read
data 2 result Address 1
register M data
u M
Data u
Write x memory
data x
1

• store value through EX


0
Write
data
16 32
Sign
extend

© 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 6 17 © 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 6 18

Pipelined Control Figure 6.25


IF and ID PCSrc

• none M
u
x
1

EX IF/ID ID/EX EX/MEM MEM/WB

Add

• ALUop, ALUsrc, Regdst 4

Shift
Add
Add
result
Branch

RegWrite left 2

Instruction
Read MemWrite

MEM PC Address register 1

Read
register 2
Read
data 1 ALUSrc
Zero
Zero MemtoReg
Instruction
RegistersRead ALU ALU
memory Write 0 Read

• Branch MemRead, MemWrite


data 2 result Address 1
register M data
u M
Data u
Write x memory
data x
1
0
Write
data
Instruction

WB [15–0] 16
Sign
extend
32 6
ALU
control MemRead

Instruction

• MemtoReg, RegWrite [20–16]

Instruction
0
M
u
ALUOp

[15–11] x
1

RegDst

© 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 6 19 © 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 6 20
Figure 6.29 Figure 6.30
PCSrc

ID/EX
0
M
u WB
WB x
1
EX/MEM

Control M WB
MEM/WB

EX M WB
Instruction IF/ID

Control M WB
Add

4 Add
Add result

RegWrite
Branch
EX M WB Shift

MemWrite
left 2

Instruction
ALUSrc
Read

MemtoReg
PC Address register 1 Read
data 1
Read
register 2 Zero
Instruction
RegistersRead ALU ALU
memory Write 0 Read
data 2 result Address 1
register M data
u Data M
Write x memory u
data x
1
0
Write
data

Instruction
16 32 6
[15–0] Sign ALU MemRead
extend control

Instruction
[20–16]
0
IF/ID ID/EX EX/MEM MEM/WB M
ALUOp

Instruction u
[15–11] x
1
RegDst

© 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 6 21 © 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 6 22

Pipelined Control Pipelining


But controlled by different instructions Not too complex yet

Decode instructions and pass the signals down the pipe • data hazards
• control hazards
Control sequencing is embedded in the pipeline
• exceptions

© 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 6 23 © 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 6 24
Data Hazards Data Hazards
sub $2, $1, $3 Must first detect hazards
and $12, $2, $5 ID/EX.WriteRegister = IF/ID.ReadRegister1

or $13, $6, $2 ID/EX.WriteRegister = IF/ID.ReadRegister2


add $14, $2, $2 EX/MEM.WriteRegister = IF/ID.ReadRegister1

sw $15, 100($2) EX/MEM.WriteRegister = IF/ID.ReadRegister2

MEM/WB.WriteRegister = IF/ID.ReadRegister1
MEM/WB.WriteRegister = IF/ID.ReadRegister2

© 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 6 25 © 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 6 26

Data Hazards Data Hazards


Not all hazards because some Hazard detection unit
• WriteRegister not used e.g., sw • several 5-bit (or 6-bit) comparators
• ReadRegister not used e.g., addi, jump
Response? Stall pipeline
• Do something only if necessary • Instructions in IF and ID stay
• IF/ID pipeline latch not updated
• send “nop” down pipeline - called a “bubble”
• PcWrite, IF/IDWrite and nop mux

© 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 6 27 © 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 6 28
Register Forwarding (Figure 6.38) Data Hazard
A better response - forwarding

u
x

u
x
M

M
MEM/WB

MEM/WB
all of the above made sure reg read after reg write

MEM/WB.RegisterRd
EX/MEM.RegisterRd
memory

memory
Data

Data
Instead of stalling
• use mux to select forwarded value rather than reg value
EX/MEM

EX/MEM

Forwarding
ALU
• control mux with hazard detection logic

unit
ForwardA
ALU

ForwardB
u
x

u
x

u
x
M

M
Rs
Rt
Rt
Rd
ID/EX

ID/EX

b. With forwarding
a. No forwarding
Registers

Registers

© 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 6 29 © 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 6 30

Data Hazards Data Hazards


Load followed by a use Other options

Can’t avoid a stall Disallow hazardous sequences

Stall one cycle and the forward • compiler will never generate them
• assembly programmers will not use them
• If used, result is random

© 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 6 31 © 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 6 32
Control Flow Hazards Control Flow Hazards
Control flow instructions What to do?
• branches, jumps, jals, returns • Always stall
• can’t fetch until branch outcome known • easy to implement
• too late for next IF • performs poorly
• 1/6th instructions is a branch, each branch takes 3 cycle
• what is the CPI?

© 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 6 33 © 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 6 34

Control Flow Hazards Control Flow Hazards


Predict branch not taken Late flush of instructions on misprediction

let sequential instructions go down the pipeline Complex

must kill later instructions if incorrect

must stop memory accesses and reg writes


• including loads (why?)

© 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 6 35 © 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 6 36
Control Flow Hazards Control Flow Hazards
Even better but more complex Another option: delayed branches
• predict taken • always execute following instruction
• predict both • delay slot
• dynamically adapt to program branch patters • put useful instruction, nop otherwise
• significant fraction of chip real estate losing popularity
• PentiumIII
• Alpha 21264
• current topic of research

© 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 6 37 © 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 6 38

Exceptions Exceptions
add $1, $2, $3 overflows! Even worse: in one cycle

a surprise branch • I/O interrupt

• earlier instruction flow to completion • user trap to OS

• kill later instructions • illegal instruction

• save PC in EPC, PC to exception handler, Cause, etc • arithmetic overflow


• hardware error
cost a lot of designer sanity
• etc

© 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 6 39 © 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 6 40
State of the Art: Superscalar State of the Art: Superscalar
1 1 1 1 IF: parallel access to I-cache, require alignment?
1 2 3 4 5 6 7 8 9
0 1 2 3
ID: replicate logic, fixed length instrs? hazard checks? dynamic?
i F D X M W
i+1 F D X M W EX: parallel/pipelined
i+2 F D X M W
MEM: >1 per cycle? If so, hazards, multi-ported register D-cache?
i+3 F D X M W
i+4 F D X M W WB: different register files? multi-ported register files?
i+5 F D X M W more things replicated
i+5 F D X M W
i+7 F D X M W more possibilities for hazards
more loss due to hazards (why?)

© 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 6 41 © 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 6 42

State of the Art: Out of Order Out of Order in the Limit


• execute later instructions while previous is waiting Program Processing
Form Phase
• decouple into different units
static
• one to fetch/decode, several to execute, one to write back program
instruction fetch
• fetch in program order & branch prediction

dynamic
• execute out of order speculatively! instruction dependence checking
stream & dispatch
• commit in order execution
window
instruction issue
Execution Wavefront
instruction execution

completed instruction reorder


& commit
instructions

© 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 6 43 © 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 6 44
A Generic Out of Order Processor Review
Big picture
floating pt.
register
Datapath
file
floating pt.
Control
instruction functional units
buffers • data hazards
register memory
instr.
pre- instr. rename interface • stalls
decode cache buffer
&dispatch functional units
integer/address
instruction and
• forwarding
buffers data cache
• control flow hazards
integer
register • branch prediction
file
re-order buffer
Exceptions

© 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 6 45 © 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 6 46

You might also like