CH 6
© 2000 by Mark D. Hill CS/ECE 552 Lecture Notes: Chapter 6
Big Picture
Instruction Latency = 5 cycles
Instruction Throughput = 1/5 instructions per cycle
CPI = 5 cycles per instruction
Pipelining: process instructions like a lunch buffet!
ALL microprocessors today employ pipelining for speed
E.g., Intel PentiumIII and Compaq Alpha 21264

Big Picture
Cycles across, instructions down:
       1  2  3  4  5  6  7  8  9
i      F  D  X  M  W
i+1       F  D  X  M  W
i+2          F  D  X  M  W
i+3             F  D  X  M  W
i+4                F  D  X  M  W
Instruction throughput = 1 instruction per cycle
CPI = 1 cycle per instruction
CPI = cycles between instruction completions = 1!

• datapath? note: five instructions in datapath in cycle 5
• control? must be generated by multiple instructions
• instructions may have data and control flow dependences
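The latency, throughput, and CPI figures above can be checked with a little arithmetic. The following is a minimal sketch, assuming a 5-stage pipeline (F, D, X, M, W) with no hazards; the function names are illustrative and not from the notes.

```python
from fractions import Fraction

STAGES = 5  # F, D, X, M, W


def unpipelined_cycles(n: int, stages: int = STAGES) -> int:
    """Each instruction runs through every stage before the next one starts."""
    return n * stages


def pipelined_cycles(n: int, stages: int = STAGES) -> int:
    """The first instruction fills the pipe; each later one completes one cycle after."""
    if n == 0:
        return 0
    return stages + (n - 1)


def cpi(cycles: int, n: int) -> Fraction:
    """Average cycles per instruction over a run of n instructions."""
    return Fraction(cycles, n)


if __name__ == "__main__":
    for n in (1, 5, 1000):
        print(n, cpi(unpipelined_cycles(n), n), cpi(pipelined_cycles(n), n))
```

Under these assumptions, five instructions take 9 cycles pipelined versus 25 unpipelined, and the pipelined CPI approaches 1 as the instruction count grows, while the latency of each instruction stays at 5 cycles.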
Datapath (Fig. 6.11)
[Figure: pipeline timing diagram. Time (in clock cycles) CC 1 through CC 7 across, program execution order (in instructions) down. Each instruction uses IM, Reg, ALU, DM, Reg in successive cycles.]
lw $1, 100($0)
lw $2, 200($0)
lw $3, 300($0)

Datapath (Fig. 6.10)
[Figure: datapath divided into five stages. IF: Instruction fetch; ID: Instruction decode/register file read; EX: Execute/address calculation; MEM: Memory access; WB: Write back.]
Data Dependence
Simple solution: Stall the pipeline
E.g.,
• add $1, - , -
• sub -, $1, -
       1  2  3  4  5  6  7  8  9
add    F  D  X  M  W*
sub       F        D* X  M  W
But CPI > 1, we will do better using “register forwarding”

Control Dependence
One instruction affects which instruction will execute next
E.g., bne, j
• sw $4, 0($5)
• bne $2, $3, loop
• sub -, - , -
       1  2  3  4  5  6  7  8  9
sw     F  D  X  M  W
bne       F  D  X* M  W
sub                F  D  X  M  W
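The cost of stalling on a data dependence can be sketched as a small calculation. This is an illustration under stated assumptions, not the notes' own method: a 5-stage pipeline, no forwarding, and a register file whose write in W is visible to a read in D during the same cycle (the starred stages in the add/sub diagram). The function names are hypothetical.

```python
def stall_cycles(distance: int) -> int:
    """Stall cycles for a consumer issued `distance` instructions after its producer.

    A producer fetched in cycle p writes back (W) in cycle p + 4.
    A consumer fetched in cycle p + distance would normally decode (D)
    in cycle p + distance + 1.
    Assumption: a register written in W can be read by D in the same cycle,
    so D may happen no earlier than cycle p + 4.
    """
    if distance < 1:
        raise ValueError("consumer must come after the producer")
    return max(0, 3 - distance)


def cpi_with_stalls(n_instructions: int, total_stalls: int) -> float:
    """Steady-state CPI: one cycle per instruction plus the stall cycles."""
    return (n_instructions + total_stalls) / n_instructions
```

With the sub immediately following the add, `stall_cycles(1)` gives 2, which is why the CPI rises above 1 when the pipeline stalls.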
Pipelined Datapath (Fig. 6.12)
[Figure: pipelined datapath.]

Pipelined Datapath
Instruction flow
• pass register specifiers
• none
[Figure: pipelined datapath with control signals RegDst, ALUOp, ALUSrc, Branch, MemRead, MemWrite, MemtoReg, and RegWrite.]
Figure 6.29
Figure 6.30
[Figures: pipelined datapath with control. Control signals (PCSrc, RegDst, ALUOp, ALUSrc, Branch, MemRead, MemWrite, MemtoReg, RegWrite) are grouped into EX, M, and WB fields and carried through the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers.]
Decode instructions and pass the signals down the pipe
Control sequencing is embedded in the pipeline

• data hazards
• control hazards
• exceptions
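The idea of decoding once and passing the control signals down the pipe can be sketched as follows. This is a minimal model, not the notes' implementation: the control values follow the usual textbook MIPS control table (an assumption, since the notes show only the signal names), and `CONTROL_TABLE`, `decode`, and `clock` are illustrative names.

```python
# Control values per instruction class, grouped by the stage that consumes them.
# None marks a don't-care. Values are assumed from the usual textbook MIPS table.
CONTROL_TABLE = {
    "R-format": {
        "ex": {"RegDst": 1, "ALUOp": 0b10, "ALUSrc": 0},
        "mem": {"Branch": 0, "MemRead": 0, "MemWrite": 0},
        "wb": {"RegWrite": 1, "MemtoReg": 0},
    },
    "lw": {
        "ex": {"RegDst": 0, "ALUOp": 0b00, "ALUSrc": 1},
        "mem": {"Branch": 0, "MemRead": 1, "MemWrite": 0},
        "wb": {"RegWrite": 1, "MemtoReg": 1},
    },
    "sw": {
        "ex": {"RegDst": None, "ALUOp": 0b00, "ALUSrc": 1},
        "mem": {"Branch": 0, "MemRead": 0, "MemWrite": 1},
        "wb": {"RegWrite": 0, "MemtoReg": None},
    },
    "beq": {
        "ex": {"RegDst": None, "ALUOp": 0b01, "ALUSrc": 0},
        "mem": {"Branch": 1, "MemRead": 0, "MemWrite": 0},
        "wb": {"RegWrite": 0, "MemtoReg": None},
    },
}


def decode(opcode):
    """Produce all control fields for an instruction while it is in ID."""
    return CONTROL_TABLE[opcode]


def clock(regs, opcode_in_id):
    """One clock edge. regs = (ID/EX, EX/MEM, MEM/WB) control contents.

    Each pipeline register keeps only the fields that later stages still need:
    ID/EX holds ex + mem + wb, EX/MEM holds mem + wb, MEM/WB holds wb.
    """
    id_ex, ex_mem, mem_wb = regs
    next_mem_wb = None if ex_mem is None else {"wb": ex_mem["wb"]}
    next_ex_mem = None if id_ex is None else {"mem": id_ex["mem"], "wb": id_ex["wb"]}
    next_id_ex = None if opcode_in_id is None else decode(opcode_in_id)
    return (next_id_ex, next_ex_mem, next_mem_wb)
```

Calling `clock` three times with "lw", "sw", and then None leaves the lw's WB field in MEM/WB and the sw's M and WB fields in EX/MEM, so several instructions each supply control for a different stage in the same cycle.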
Data Hazards
sub $2, $1, $3
and $12, $2, $5

Data Hazards
Must first detect hazards
ID/EX.WriteRegister = IF/ID.ReadRegister1
MEM/WB.WriteRegister = IF/ID.ReadRegister1
MEM/WB.WriteRegister = IF/ID.ReadRegister2
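The comparisons above can be sketched as a small checker. This is an illustration only: it applies the same comparison to every pipeline register passed in and to both read registers, which goes slightly beyond the three conditions listed, and it skips register $0 on the assumption that $0 is hardwired to zero. The function name and the dictionary layout are hypothetical.

```python
def detect_hazards(read_register1, read_register2, write_registers):
    """Return (pipeline register, read port) pairs where a pending write matches a read.

    write_registers maps a pipeline register name (e.g. "ID/EX") to the
    destination register number of the instruction held there, or None if
    that instruction writes no register.
    """
    hazards = []
    for stage, dest in write_registers.items():
        # Assumption: $0 is hardwired to zero, so a write to it never matters.
        if dest is None or dest == 0:
            continue
        if dest == read_register1:
            hazards.append((stage, "ReadRegister1"))
        if dest == read_register2:
            hazards.append((stage, "ReadRegister2"))
    return hazards


if __name__ == "__main__":
    # sub $2, $1, $3 is in ID/EX while and $12, $2, $5 is in IF/ID.
    print(detect_hazards(2, 5, {"ID/EX": 2, "EX/MEM": None, "MEM/WB": None}))
```

For the sub/and pair, the only match is ID/EX.WriteRegister against IF/ID.ReadRegister1, since the and reads $2 as its first source.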
Register Forwarding (Figure 6.38)
[Figure: a. No forwarding; b. With forwarding. A forwarding unit compares Rs and Rt from ID/EX against EX/MEM.RegisterRd and MEM/WB.RegisterRd and drives the ForwardA and ForwardB muxes at the ALU inputs.]

Data Hazard
A better response - forwarding
all of the above made sure reg read after reg write
Instead of stalling
• use mux to select forwarded value rather than reg value
• control mux with hazard detection logic
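The forwarding mux control can be sketched in a few lines. This is a sketch under assumptions, not the figure's exact logic: the select encodings (0b00 register file, 0b10 from EX/MEM, 0b01 from MEM/WB) follow a common textbook convention, the RegWrite inputs and the check for register $0 are assumed, and the function names are illustrative.

```python
FROM_REGFILE = 0b00  # no forwarding, use the value read from the register file
FROM_MEM_WB = 0b01   # forward the value about to be written back
FROM_EX_MEM = 0b10   # forward the previous instruction's ALU result


def forward_select(src, ex_mem_rd, ex_mem_regwrite, mem_wb_rd, mem_wb_regwrite):
    """Mux select for one ALU operand whose source register number is src."""
    # The most recent producer (EX/MEM) wins when both stages match.
    if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == src:
        return FROM_EX_MEM
    if mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == src:
        return FROM_MEM_WB
    return FROM_REGFILE


def forwarding_unit(rs, rt, ex_mem_rd, ex_mem_regwrite, mem_wb_rd, mem_wb_regwrite):
    """Return (ForwardA, ForwardB) for the instruction currently in EX."""
    forward_a = forward_select(rs, ex_mem_rd, ex_mem_regwrite, mem_wb_rd, mem_wb_regwrite)
    forward_b = forward_select(rt, ex_mem_rd, ex_mem_regwrite, mem_wb_rd, mem_wb_regwrite)
    return forward_a, forward_b
```

For example, with sub $2, $1, $3 in EX/MEM and and $12, $2, $5 in EX, `forwarding_unit(2, 5, 2, True, 0, False)` selects the EX/MEM value for the first operand and the register file for the second.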
Stall one cycle and then forward

• compiler will never generate them
• assembly programmers will not use them
• If used, result is random
Control Flow Hazards
Control flow instructions
• branches, jumps, jals, returns
• can’t fetch until branch outcome known
• too late for next IF

Control Flow Hazards
What to do?
• Always stall
• easy to implement
• performs poorly
• 1/6th of instructions are branches, each branch takes 3 cycles
• what is the CPI?
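The CPI question above can be worked as a weighted average. The sketch assumes every non-branch instruction takes exactly 1 cycle and that there are no other stalls; the function name is illustrative.

```python
from fractions import Fraction


def average_cpi(branch_fraction, branch_cycles, other_cycles=1):
    """Weighted average of cycles per instruction over branches and non-branches."""
    branch_fraction = Fraction(branch_fraction)
    return branch_fraction * branch_cycles + (1 - branch_fraction) * other_cycles


if __name__ == "__main__":
    result = average_cpi(Fraction(1, 6), 3)
    print(result, float(result))
```

Under these assumptions the answer is 1/6 × 3 + 5/6 × 1 = 4/3, or about 1.33 cycles per instruction.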
Control Flow Hazards
Even better but more complex
• predict taken
• predict both
• dynamically adapt to program branch patterns
• significant fraction of chip real estate
• PentiumIII
• Alpha 21264
• current topic of research

Control Flow Hazards
Another option: delayed branches
• always execute following instruction
• delay slot
• put useful instruction, nop otherwise
losing popularity
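One simple way hardware can dynamically adapt to program branch patterns is a 2-bit saturating counter per branch. The notes do not name a specific scheme, so this is an illustrative example rather than what the PentiumIII or Alpha 21264 actually implement; the class and function names are hypothetical.

```python
class TwoBitPredictor:
    """2-bit saturating counter: states 0 and 1 predict not taken, 2 and 3 predict taken."""

    def __init__(self, state: int = 0):
        if state not in range(4):
            raise ValueError("state must be 0..3")
        self.state = state

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def count_mispredictions(outcomes, predictor=None):
    """Run the predictor over a sequence of branch outcomes (True = taken)."""
    if predictor is None:
        predictor = TwoBitPredictor()
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses
```

For a loop branch taken nine times and then not taken, a predictor starting in state 0 mispredicts three times on the first pass and only once (the loop exit) on each later pass, because a single not-taken outcome does not flip the prediction.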
Exceptions
add $1, $2, $3 overflows!

Exceptions
Even worse: in one cycle
State of the Art: Superscalar
[Figure: pipeline diagram, cycles across and instructions i through i+7 down. Each instruction passes through F D X M W, with more than one instruction entering the pipeline per cycle.]

State of the Art: Superscalar
IF: parallel access to I-cache, require alignment?
ID: replicate logic, fixed length instrs? hazard checks? dynamic?
EX: parallel/pipelined
MEM: >1 per cycle? If so, hazards, multi-ported register D-cache?
WB: different register files? multi-ported register files?
more things replicated
more possibilities for hazards
more loss due to hazards (why?)
• execute out of order speculatively!
• commit in order

Execution Wavefront
[Figure: dynamic instruction stream; dependence checking & dispatch; execution window; instruction issue; instruction execution.]
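Executing out of order while committing in order is commonly handled with a re-order buffer. The following is a minimal sketch of that bookkeeping only, with no register renaming or speculation recovery; the class and method names are illustrative and not taken from the notes.

```python
from collections import deque


class ReorderBuffer:
    """Tracks instructions in program order and retires them only from the head."""

    def __init__(self):
        self._entries = deque()  # each entry is [name, done], oldest first

    def dispatch(self, name):
        """Enter an instruction in program order."""
        self._entries.append([name, False])

    def complete(self, name):
        """Mark an instruction finished; this may happen in any order."""
        for entry in self._entries:
            if entry[0] == name:
                entry[1] = True
                return
        raise KeyError(name)

    def commit(self):
        """Retire finished instructions from the head, stopping at the first unfinished one."""
        committed = []
        while self._entries and self._entries[0][1]:
            committed.append(self._entries.popleft()[0])
        return committed
```

If i+2 finishes before i and i+1, `commit()` returns nothing until i is done, and i+2 retires only after i+1 has retired, so architectural state always changes in program order.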
A Generic Out of Order Processor
[Figure: pre-decode, instr. cache, instr. buffer, register rename & dispatch; floating pt. instruction buffers, floating pt. register file, and functional units; integer/address instruction buffers, integer register file, and functional units and data cache; memory interface; re-order buffer.]

Review
Big picture
Datapath
Control
• data hazards
• stalls
• forwarding
• control flow hazards
• branch prediction
Exceptions