Unit 3 Pipelining
Organization
Clock cycle   1     2     3     4     5     6
Instruction   I1          I2          I3
              F1    E1    F2    E2    F3    E3
Where
I  Instruction
F  Fetch
E  Execution
To execute 3 instructions, sequential execution requires 6 clock cycles.
I1 I2 I3
F1 B1 E1 B2 F2 B3 E2 B4 F3 B5 E3
Where
I  Instruction
F  Fetch
B  Buffer
E  Execution
Clock cycle   1     2     3     4
Instruction
I1            F1    E1
I2                  F2    E2
I3                        F3    E3
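The cycle counts in the two diagrams above can be checked with a short calculation. This is a minimal sketch that assumes every stage takes exactly one clock cycle and that no stalls occur; the function names are illustrative and do not come from the slides.

```python
def sequential_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles when each instruction finishes all stages before the next starts."""
    return num_instructions * num_stages


def pipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles for an ideal pipeline: fill the pipe once, then one result per cycle."""
    if num_instructions == 0:
        return 0
    return num_stages + num_instructions - 1


if __name__ == "__main__":
    # Three instructions, two stages (Fetch, Execute), as in the diagrams above.
    print(sequential_cycles(3, 2))  # 6
    print(pipelined_cycles(3, 2))   # 4
```

With two stages the saving is small; the gap grows as more instructions flow through the pipeline.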
Decode (D) : To decode the instruction and fetch the source operand(s).
F1 -> B1 -> D1 -> B2 -> E1 -> B3 -> W1
Where
I  Instruction
F  Fetch
D  Decode
E  Execute
W  Write
B  Buffer
Time
Clock cycle   1     2     3     4     5     6     7
Instruction
I1            F1    D1    E1    W1
I2                  F2    D2    E2    W2
I3                        F3    D3    E3    W3
I4                              F4    D4    E4    W4
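The timing diagram above follows a simple rule: instruction i occupies stage s in clock cycle i + s. The sketch below generates the same diagram under the assumption of an ideal four-stage pipeline with no stalls; `pipeline_schedule` and `format_schedule` are hypothetical helper names.

```python
STAGES = ["F", "D", "E", "W"]


def pipeline_schedule(num_instructions, stages=STAGES):
    """Return {instruction number: {clock cycle: stage label}} for an ideal pipeline."""
    schedule = {}
    for i in range(1, num_instructions + 1):
        # Instruction i enters the first stage in cycle i and advances one stage per cycle.
        schedule[i] = {i + s: f"{name}{i}" for s, name in enumerate(stages)}
    return schedule


def format_schedule(schedule):
    """Render the schedule as a text timing diagram."""
    last_cycle = max(max(row) for row in schedule.values())
    lines = ["Clock cycle " + "".join(f"{c:<4}" for c in range(1, last_cycle + 1))]
    for i, row in schedule.items():
        cells = "".join(f"{row.get(c, ''):<4}" for c in range(1, last_cycle + 1))
        lines.append(f"I{i:<11}" + cells)
    return "\n".join(line.rstrip() for line in lines)


if __name__ == "__main__":
    print(format_schedule(pipeline_schedule(4)))
```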
Each stage is allotted one clock cycle to complete its operation.
A faster stage must wait for the slowest stage to complete, so the slowest stage determines the clock period.
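The effect of the slowest stage on the clock can be illustrated numerically. This is a sketch only: the stage delays below are made-up values, not measurements from the slides, and the model assumes an ideal pipeline with no stalls.

```python
def clock_period(stage_delays_ns):
    """The clock must be long enough for the slowest stage."""
    return max(stage_delays_ns)


def total_time_ns(num_instructions, stage_delays_ns):
    """Ideal pipelined execution time: (stages + instructions - 1) clock periods."""
    cycles = len(stage_delays_ns) + num_instructions - 1
    return cycles * clock_period(stage_delays_ns)


if __name__ == "__main__":
    # Hypothetical delays for F, D, E, W in nanoseconds (illustrative values only).
    delays = [2, 1, 3, 1]
    print(clock_period(delays))      # 3
    print(total_time_ns(4, delays))  # 21
```

Here the 1 ns stages sit idle for 2 ns of every cycle, which is why balanced stage delays are desirable.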
The processor is faster than the main memory, so instructions should be fetched
ahead of time and held in fast storage. A cache memory is used for this purpose.
Hazards are of 3 types:
1) Data Hazard
2) Instruction Hazard
3) Structural Hazard
Structural hazard:
Two different instructions try to access the same hardware resource at the same time.
Identifying hazards is therefore important: it prevents errors and allows
performance to be increased.
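A structural hazard can be detected by checking whether any resource is requested by more than one instruction in the same clock cycle. The following is a minimal sketch; the usage table and the resource names ("memory", "alu") are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def find_structural_hazards(usage):
    """usage: list of (instruction, clock_cycle, resource) tuples.

    Returns a sorted list of (clock_cycle, resource, instructions) for every
    resource requested by more than one instruction in the same cycle.
    """
    requests = defaultdict(list)
    for instruction, cycle, resource in usage:
        requests[(cycle, resource)].append(instruction)
    return sorted(
        (cycle, resource, instrs)
        for (cycle, resource), instrs in requests.items()
        if len(instrs) > 1
    )


if __name__ == "__main__":
    # Hypothetical example: I1 (a load) accesses memory in cycle 4 while
    # I4 is being fetched from the same single-ported memory.
    usage = [
        ("I1", 1, "memory"), ("I2", 2, "memory"), ("I3", 3, "memory"),
        ("I4", 4, "memory"), ("I1", 4, "memory"),
        ("I1", 3, "alu"), ("I2", 4, "alu"),
    ]
    print(find_structural_hazards(usage))  # [(4, 'memory', ['I4', 'I1'])]
```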
During instruction execution, the output data is written to the destination register
and is then available to the next instruction.
However, before the data is written to the destination register, it can be forwarded
directly from the “Buffer” itself.
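Figure: Forwarding path. The register file supplies SRC1 and SRC2 to the ALU; the ALU result RSLT goes to the destination register, and a forwarding path returns RSLT to the ALU inputs SRC1, SRC2.
E: Execute (ALU)
W: Write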
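The benefit of the forwarding path can be estimated by counting stall cycles for a dependent instruction. This sketch rests on an assumed timing model that is not stated in the slides: stages F, D, E, W take one cycle each; without forwarding, the consumer reads its operands in D no earlier than the cycle after the producer's W; with forwarding, the consumer's E only has to follow the producer's E.

```python
def stall_cycles(distance, forwarding):
    """Stall cycles seen by an instruction that reads a register written by an
    instruction `distance` positions earlier (distance >= 1).

    Assumed model (not from the source): without forwarding the consumer's D
    must come after the producer's W; with forwarding the consumer's E must
    come after the producer's E.
    """
    if distance < 1:
        raise ValueError("distance must be at least 1")
    if forwarding:
        return max(0, 1 - distance)  # always 0 for distance >= 1
    return max(0, 3 - distance)


if __name__ == "__main__":
    for d in (1, 2, 3):
        print(d, stall_cycles(d, forwarding=False), stall_cycles(d, forwarding=True))
```

Under this model, a consumer placed immediately after its producer loses two cycles without forwarding and none with it.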
When a NOP (no-operation) is inserted, the next instruction does not start its operation in that slot.
The hardware is therefore free to perform other operations.
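One common use of NOPs is to separate a dependent instruction from the instruction that produces its operand. The sketch below shows such a pass under stated assumptions: the instruction tuple format is hypothetical, and the default gap of 3 slots corresponds to an assumed four-stage pipeline without forwarding.

```python
NOP = ("NOP", None, ())


def insert_nops(program, min_gap=3):
    """Insert NOPs so that any instruction reading a register is at least
    `min_gap` slots after the instruction that wrote it.

    program: list of (opcode, dest_register, source_registers) tuples.
    min_gap=3 is an assumption for a four-stage pipeline without forwarding.
    """
    scheduled = []
    last_write = {}  # register -> index in `scheduled` of its latest writer
    for opcode, dest, sources in program:
        for src in sources:
            if src in last_write:
                # Pad with NOPs until the writer is far enough behind.
                while len(scheduled) - last_write[src] < min_gap:
                    scheduled.append(NOP)
        if dest is not None:
            last_write[dest] = len(scheduled)
        scheduled.append((opcode, dest, sources))
    return scheduled


if __name__ == "__main__":
    # Illustrative program: Add reads R4, which Mul has just written.
    program = [("Mul", "R4", ("R2", "R3")), ("Add", "R6", ("R5", "R4"))]
    print([op for op, _, _ in insert_nops(program)])  # ['Mul', 'NOP', 'NOP', 'Add']
```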
Instructions designed for execution on pipelined hardware should have few side
effects.
Instructions are fetched in advance and stored in cache memory so that the next data
is ready to be given to the ALU.
If a cache miss occurs, a branch instruction also cannot be executed in its allotted time slot.
Instruction
I1                 F1    E1
I2 (Branch)              F2    E2
I3                             F3    X
Ik                                   Fk    Ek
Ik+1                                       Fk+1  Ek+1
(X: execution unit idle)
I1                 F1    D1    E1    W1
I3                             F3    D3    X
I4                                   F4    X
Ik                                         Fk    Dk    Ek    Wk
Ik+1                                             Fk+1  Dk+1  Ek+1
Time
Clock cycle        1     2     3     4     5     6     7
I1                 F1    D1    E1    W1
I2 (Branch)              F2    D2
I3                             F3    X
Ik                                   Fk    Dk    Ek    Wk
Ik+1                                       Fk+1  Dk+1  Ek+1
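The diagrams above differ in the stage at which the branch is resolved, and that stage fixes how many wrongly fetched instructions must be discarded. A minimal sketch, assuming one cycle per stage and that fetching resumes from the target in the cycle after the resolving stage; `branch_penalty` is an illustrative name.

```python
STAGES = ["F", "D", "E", "W"]


def branch_penalty(resolve_stage, stages=STAGES):
    """Cycles lost on a taken branch resolved at the end of `resolve_stage`.

    One instruction is fetched in each cycle the branch spends after its own
    fetch, up to and including the resolving stage; all of them are discarded.
    """
    return stages.index(resolve_stage)


if __name__ == "__main__":
    print(branch_penalty("E", ["F", "E"]))  # 1  (two-stage pipeline)
    print(branch_penalty("E"))              # 2  (resolved in Execute)
    print(branch_penalty("D"))              # 1  (resolved in Decode)
```

Moving the branch-address computation from Execute to Decode halves the penalty in this model.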
D : Dispatch/Decode unit
E : Execute instruction
W : Write results
The decision to branch cannot be made until the execution of that instruction has
been completed.
Branch instructions represent about 20% of the dynamic instruction count of most
programs.
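The 20% figure gives a rough feel for the cost of branches. The sketch below is a worst-case estimate that assumes every branch incurs the full penalty and that all other instructions complete at one per cycle; both assumptions are simplifications, not claims from the slides.

```python
def effective_cpi(branch_fraction, branch_penalty, base_cpi=1.0):
    """Average cycles per instruction when every branch costs `branch_penalty`
    extra cycles (worst-case assumption: no prediction, no delay-slot filling)."""
    return base_cpi + branch_fraction * branch_penalty


if __name__ == "__main__":
    # 20% branches with a two-cycle penalty, then with a one-cycle penalty.
    print(round(effective_cpi(0.20, 2), 2))  # 1.4
    print(round(effective_cpi(0.20, 1), 2))  # 1.2
```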
Branch F E
Instruction
I1 (Compare)       F1    D1    E1    W1
I2 (Branch>0)            F2    D2/P2 E2
I3                             F3    D3    X
I4                                   F4    X
Ik                                         Fk    Dk
LT : Likely to be taken
LNT : Likely not to be taken
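A predictor with only the two states LT and LNT can be sketched as follows. This is an assumed two-state scheme in which the state records the most recent outcome of the branch; the outcome trace in the example is made up for illustration.

```python
LT, LNT = "LT", "LNT"


def predict(state):
    """Predict taken when the state is LT, not taken when it is LNT."""
    return state == LT


def next_state(state, taken):
    """Two-state scheme: the new state depends only on the actual outcome."""
    return LT if taken else LNT


def run_predictor(outcomes, state=LNT):
    """Return the number of correct predictions for a list of actual outcomes."""
    correct = 0
    for taken in outcomes:
        if predict(state) == taken:
            correct += 1
        state = next_state(state, taken)
    return correct


if __name__ == "__main__":
    # A loop branch taken four times and then falling through (made-up trace).
    print(run_predictor([True, True, True, True, False]))  # 3
```

The two mispredictions occur on the first and last iterations of the loop, which is the usual weakness of a two-state scheme.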
Figure: Datapath modified for pipelined execution. Labeled components: Bus A, Bus B, Bus C; ALU with registers A, B and R; PC with incrementer; IMAR (memory address for instruction fetches); instruction queue; instruction decoder; control signal pipeline; data cache.