Unit-3
Pipelining
By
Dr. Sarvesh Vishwakarma
Professor – CSE
TCS 704 - Advanced Computer Architecture
Dr. Sarvesh Vishwakarma TCS-704 ACA
Introduction of pipelining
• Pipelining can efficiently increase the performance of processor by
overlapping execution of instructions. But the efficiency of the pipelining
depends upon, how the problem encounter during the implementation of
pipelining is handled. These problems are known as Hazards.
• Pipelining is an implementation techniques where by multiple instructions
are overlapped in execution. It takes advantage of parallelism that exists
among the actions needed to execute an instruction. A pipeline is like an
assembly line, In an automobile assembly line, there are many steps, each
constituting something to the construction of the car. Each steps operate
in parallel with the other steps.
• Pipelining increases the CPU instruction throughput the number of
instruction completed per unit of time-but it does not reduce the
execution time of an individual instruction. In fact, it usually slightly
increase the execution time of each instruction due to overhead in the
control of the pipeline.
Dr. Sarvesh Vishwakarma TCS-704 ACA
Classic five-stage pipeline for a RISC processor
Input
Common Clock
L
S1
L
S2
L
Sk
Output
Dr. Sarvesh Vishwakarma TCS-704 ACA
Classic five-stage pipeline for a RISC processor
The Logic circuitry in each stage 𝑆𝑖 has a time delay denoted by τ𝑖
Let τ𝑙 be time delay of each interface latch
Clock period of a linear pipeline is defined by speed of the slowest stage plus overhead
τ = max τ𝑖 1𝑘 + τ𝑙
τ = 𝜏𝑚 + 𝜏𝑙
1
The reciprocal of the clock period is called the frequency ‘𝑓’ of a pipeline processor 𝑓 =
τ
𝐼𝑗𝑖 ∶ 𝑡ℎ𝑒 𝑗𝑡ℎ 𝑖𝑛𝑠𝑡𝑟𝑢𝑐𝑡𝑖𝑜𝑛 𝑜𝑛 𝑖𝑡ℎ 𝑠𝑡𝑎𝑔𝑒
1 2 3 4 5 6 7 8 9 10 11 12 13
S 𝐼11 𝐼21 𝐼31 𝐼41 𝐼51 𝐼61 𝐼71
Space
S1 𝐼12 𝐼22 𝐼32 𝐼42 𝐼52 𝐼62 𝐼72
S2 𝐼13 𝐼23 𝐼33 𝐼43 𝐼53 𝐼63 𝐼73
S3 𝐼14 𝐼24 𝐼34 𝐼44 𝐼54 𝐼64 𝐼74
S4 𝐼15 𝐼25 𝐼35 𝐼45 𝐼55 𝐼65 𝐼75
5
Time
Dr. Sarvesh Vishwakarma TCS-704 ACA
Classic five-stage pipeline for a RISC processor
Ideally, a linear pipeline processor with k-stage can process n instructions in
𝑇𝑘 = 𝑘 + (𝑛 − 1) clock periods
𝑘 cycles are used to filled up the pipeline or to complete execution of the first instruction I 1
𝑛 − 1 cycles are needed to complete the remaining 𝑛 − 1 instructions
a non-pipeline processor with k-stage can process n instructions in
𝑇1 = 𝑘(𝑛) clock periods
𝐼𝑗𝑖 ∶ 𝑡ℎ𝑒 𝑗𝑡ℎ 𝑖𝑛𝑠𝑡𝑟𝑢𝑐𝑡𝑖𝑜𝑛 𝑜𝑛 𝑖𝑡ℎ 𝑠𝑡𝑎𝑔𝑒
1 2 3 4 5 6 7 8 9 10 11 12 13
S 𝐼11 𝐼21 𝐼31
Space
S1 𝐼12 𝐼22 𝐼32
S2 𝐼13 𝐼23 𝐼33
S3 𝐼14 𝐼24
S4 𝐼15 𝐼25
5
Time
Dr. Sarvesh Vishwakarma TCS-704 ACA
Classic five-stage pipeline for a RISC processor
Speedup (𝑆𝑘 ) of a k-stage linear pipeline processor over an equivalent non-pipeline processor as
𝑇1 𝑛×𝑘
Speedup = =
𝑇𝑘 𝑘+(𝑛−1)
For n ≫ 𝑘
𝑛×𝑘
Speedup 𝑆𝑘 ≈ =𝑘
𝑛
The maximum speedup that a linear pipeline can provide is k, where k is the number of stages in the pipeline . This
max. speedup is never fully achievable because of data dependencies between instructions, interrupts, program
branches, and other factors.
Pipeline efficiency (η): Large the number of instructions flowing through the pipeline, the better is its efficiency.
Let n : Nos. of instructions
𝑘 : No. of pipeline stages
τ ∶ 𝑐𝑙𝑜𝑐𝑘 𝑝𝑒𝑟𝑖𝑜𝑑 𝑜𝑓 𝑎 𝑙𝑖𝑛𝑒𝑎𝑟 𝑝𝑖𝑝𝑒𝑙𝑖𝑛𝑒
𝑛×𝑘×τ 𝑛
Pipeline efficiency η = {𝑘+ 𝑛−1 }×τ
= 𝑘+𝑛−1 (if 𝑛 ≫ 𝑘 )
η should approach to 1
Dr. Sarvesh Vishwakarma TCS-704 ACA
Classic five-stage pipeline for a RISC processor
Throughput (W): Number of instructions that can be executed by a pipeline processor per unit time is called its
throughput rate. This reflect the computing power of a pipeline processor.
𝑛 𝑛 1
W= = ×
{𝑘+ 𝑛−1 }×τ 𝑘+𝑛−1 τ
𝑛
Since η =
𝑘+𝑛−1
1
Therefore, Throughput W = η ×
τ
when 𝑛 ≫ 𝑘 then, η → 1
1
So, Throughput W ≈
τ
Throughput become equivalent to frequency of pipeline processor.
Dr. Sarvesh Vishwakarma TCS-704 ACA
Classic five-stage pipeline for a RISC processor
1 2 3 4 5 6 7 8 9
S 𝐼𝑖 𝐼𝑗 𝐼𝑘 𝐼𝑙 𝐼𝑚
Space
S1 𝐼𝑖 𝐼𝑗 𝐼𝑘 𝐼𝑙 𝐼𝑚
2
S 𝐼𝑖 𝐼𝑗 𝐼𝑘 𝐼𝑙 𝐼𝑚
3 𝐼𝑗 ∶ 𝑡ℎ𝑒 𝑗𝑡ℎ 𝑖𝑛𝑠𝑡𝑟𝑢𝑐𝑡𝑖𝑜𝑛
S 𝐼𝑖 𝐼𝑗 𝐼𝑘 𝐼𝑙 𝐼𝑚
4
S
𝐼𝑖 𝐼𝑗 𝐼𝑘 𝐼𝑙 𝐼𝑚
5 𝐵𝑢𝑠𝑦 𝑆𝑡𝑎𝑡𝑒
Performance =
Time 𝐵𝑢𝑠𝑦 𝑆𝑡𝑎𝑡𝑒+𝑖𝑑𝑒𝑎𝑙 𝑆𝑡𝑎𝑡𝑒
Can be shifted here 𝑘×𝑛×τ 𝑛
= =
𝑘×{𝑘+ 𝑛−1 }×τ 𝑘+ 𝑛−1
1 2 3 4 5 6 7 8 9 If 𝑛 → ∞,
𝐼𝑖 𝐼𝑗 𝐼𝑘 𝐼𝑙 𝐼𝑚
Space
S 𝑛
Then Performance ≈ = 1
S1 𝑛
𝐼𝑚 𝐼𝑖 𝐼𝑗 𝐼𝑘 𝐼𝑙
2
S 𝐼𝑙 𝐼𝑚 𝐼𝑖 𝐼𝑗 𝐼𝑘
3
S 𝐼𝑘 𝐼𝑙 𝐼𝑚 𝐼𝑖 𝐼𝑗
4
S
𝐼𝑗 𝐼𝑘 𝐼𝑙 𝐼𝑚 𝐼𝑖
5 Time
Busy state Ideal state
Dr. Sarvesh Vishwakarma TCS-704 ACA
Introduction
Ii……………
Pipelining Ij……………
Ik……………
Il……………
• Challenges: Im……………
– Data dependency
• Instruction Ij is data dependent on instruction Ii if
– Instruction Ii produces a result that may be used by instruction Ij
• Dependent instructions cannot be executed
simultaneously
Dr. Sarvesh Vishwakarma TCS-704 ACA
Introduction
Ii……………
Pipelining Ik……………
Ij……………
• Challenges: Il……………
– Data dependency Im……………
– Instruction Ij is data dependent on instruction Ik and
• Dependent instructions cannot be executed
simultaneously
Dr. Sarvesh Vishwakarma TCS-704 ACA
Introduction
Pipelining Ii……………
Ik……………
Ij……………
• Challenges: Il……………
– Data dependency
Im……………
– Instruction Ij is data dependent on instruction Ik and
instruction Ik is data dependent on instruction Ii.
• Dependent instructions cannot be executed
simultaneously
Dr. Sarvesh Vishwakarma TCS-704 ACA
Introduction
Pipelining Ii……………
Ik……………
• Challenges: Ij……………
Il……………
– Data dependency Im……………
• Instruction Ij is data dependent on instruction Ii if
– Instruction Ij is data dependent on instruction Ik and
instruction Ik is data dependent on instruction Ii.
• Dependent instructions cannot be executed
simultaneously
Dr. Sarvesh Vishwakarma TCS-704 ACA
Introduction
Pipelining
• Challenges:
– Data dependency
• Instruction Ij is data dependent on instruction Ii if
– Instruction Ii produces a result that may be used by instruction Ij
– Instruction Ij is data dependent on instruction Ik and instruction
Ik is data dependent on instruction Ii.
• Dependent instructions cannot be executed
simultaneously
Dr. Sarvesh Vishwakarma TCS-704 ACA
Range of Instruction
ADD.D F4, F0, F2 F4 = F0 + F2
SUB.D F6, F5, F4 F6 = F5 – F4
SUB.D instruction depends on
Ii ADD.D F4, F0, F2
ADD.D due to transfer of data Ij SUB.D F6, F5, F4
through resource object F4.
Ik……………
Il……………
Im……………
Dr. Sarvesh Vishwakarma TCS-704 ACA
Non-pipeline processor
ADD.D F4, F0, F2 F4 = F0 + F2
SUB.D F6, F5, F4 F6 = F5 – F4
SUB.D instruction depends on
ADD.D due to transfer of data
through resource object F4.
stage 1 2 3 4 5 6 7 8 9 10
IF ADD SUB
ID ADD SUB
Rop ADD SUB
EXE ADD SUB
WB ADD SUB
Dr. Sarvesh Vishwakarma TCS-704 ACA
Pipeline processor
Data Hazards
ADD.D F4, F0, F2 F4 = F0 + F2
SUB.D F6, F5, F4 F6 = F5 – F4
SUB.D instruction depends on
ADD.D due to transfer of data Reading a value from F4
through resource object F4.
stage 1 2 3 4 5 6 7 8 9 10
IF ADD SUB
ID ADD SUB
Rop ADD SUB
EXE ADD SUB
WB ADD SUB
Value written
in F4 TCS-704 ACA
Dr. Sarvesh Vishwakarma
Pipeline processor
Avoid Data Hazards
ADD.D F4, F0, F2 F4 = F0 + F2
SUB.D F6, F5, F4 F6 = F5 – F4
SUB.D instruction depends on Reading a value from F4
ADD.D due to transfer of data
through resource object F4.
stage 1 2 3 4 5 6 7 8 9 10
IF ADD SUB
ID ADD SUB
Rop ADD STALL STALL SUB
EXE ADD STALL SUB
WB ADD SUB
Value written
in F4 TCS-704 ACA
Dr. Sarvesh Vishwakarma
Introduction
Data Dependence
• Dependencies are a property of programs
• Pipeline organization determines if dependence is
detected and if it causes a stall
• Data dependence conveys:
– Possibility of a hazard
– Order in which results must be calculated
– Upper bound on exploitable instruction level
parallelism
• Dependencies that flow through memory
locations are difficult to detect
Dr. Sarvesh Vishwakarma TCS-704 ACA
Name Dependence continue…….
Anti dependence
I0…………… Instruction I1 precedes Instruction I3 in
program order.
I1…………… READ There is no flow of
I2…………… Resource data between
I3…………… WRITE Object Instructions
I4…………… associated with
resource object.
The Original ordering must be preserved to ensure
No data flow from
that Instruction I1 read the correct value.
Instruction I1 to
Instruction I3 in
this program.
Dr. Sarvesh Vishwakarma TCS-704 ACA
Name Dependence continue…….
Output dependence
I0…………… Instruction I1 precedes Instruction I3 in
program order.
I1…………… WRITE There is no flow of
I2…………… Resource data between
I3…………… WRITE Object Instructions
I4…………… associated with
resource object.
The ordering between the instructions must be
No data flow from
preserved to ensure that the value finally written Instruction I1 to
corresponds to Instruction I3 . Instruction I3 in
this program.
Dr. Sarvesh Vishwakarma TCS-704 ACA
Introduction
Name Dependence
• Two instructions use the same name but no flow
of information
– Not a true data dependence, but is a problem when
reordering instructions
– Antidependence: instruction j writes a register or
memory location that instruction i reads
• Initial ordering (i before j) must be preserved
– Output dependence: instruction i and instruction j
write the same register or memory location
• Ordering must be preserved
• To resolve, use renaming techniques
Dr. Sarvesh Vishwakarma TCS-704 ACA
Introduction
Other Factors
• Data Hazards
– Read after write (RAW)
– Write after write (WAW)
– Write after read (WAR)
• Control Dependence
– Ordering of instruction i with respect to a branch
instruction
• Instruction control dependent on a branch cannot be moved
before the branch so that its execution is no longer
controller by the branch
• An instruction not control dependent on a branch cannot be
moved after the branch so that its execution is controlled by
the branch
Dr. Sarvesh Vishwakarma TCS-704 ACA
Pipeline Hazards
Pipelining can efficiently increase the performance of
processor by overlapping execution of instructions. But
the efficiency of the pipelining depends upon, how the
problem encountered during the implementation of
pipelining is handled. These problems are known as
Hazards.
1) Data Hazards:- due to true data & name dependence
2) Control Hazards:- due to presence of branch instruction
3) Resource/structure Hazards:- due to lack of resources
Dr. Sarvesh Vishwakarma TCS-704 ACA
• Structural hazards
– If some combination of instructions can not be
accommodated because of a resource conflict, the
machine is said to have a structural hazards.
– This type of hazard occurs, when two activities require
the same resource simultaneously. Under this
situation hardware can not support all possible
combination of instruction simultaneously.
– Also, named as Resource Hazard.
Dr. Sarvesh Vishwakarma TCS-704 ACA
Implementation of Pipeline
• Need to take stalls at run time during
execution of instructions if there exist a data
dependence.
• Pipeline CPI = (Ideal pipeline CPI + Structural
stalls + Data hazard stalls + Control stalls)
Dr. Sarvesh Vishwakarma TCS-704 ACA
What makes pipelining hard to
implement?
• Pipeline hazards
– True data dependence
– Name dependence
• Anti dependence
• Output dependence
– Control dependence
– Resource hazards
Dr. Sarvesh Vishwakarma TCS-704 ACA
Domain of Instruction
• Set of resource objects whose data objects may
affect the execution of instruction `I’
• Denoted by D(I)
L.D F0, 0(R1) D(L.D) = {R1}
ADD.D F4, F0, F2 D(ADD.D) = {F0, F2}
S.D F4, 0(R1) D(S.D) = {R1,F4}
Dr. Sarvesh Vishwakarma TCS-704 ACA
Range of Instruction
• Set of resource objects whose data objects may
be modified by the execution of instruction `I’
• Denoted by R(I)
L.D F0, 0(R1) R(L.D) = {F0}
ADD.D F4, F0, F2 R(ADD.D) = {F4}
S.D F4, 0(R1) R(S.D) = {Nil}
Dr. Sarvesh Vishwakarma TCS-704 ACA
Pipeline Hazard Detection and
Resolution
R (I ) D(J ) for RAW
D(I ) R (J ) for WAR
R (I ) R (J ) for WAW
Dr. Sarvesh Vishwakarma TCS-704 ACA
RAW hazard condition
(Pure data dependence)
Instruction I
D(I) R(I)
(Write)
Instruction J
D(J) R(J)
(Read)
Dr. Sarvesh Vishwakarma TCS-704 ACA
WAR hazard condition
(Anti dependence)
Instruction I
D(I) R(I)
(Read)
Instruction J
D(J) R(J)
(Write)
Dr. Sarvesh Vishwakarma TCS-704 ACA
WAW hazard condition
(Output dependence)
Instruction I
D(I) R(I)
(Write)
R(J)
Instruction J
(Write)
D(J)
Dr. Sarvesh Vishwakarma TCS-704 ACA
Find Data Dependence
True Data depenence (RAW)
D(ADD.D) R(LD) = {F0} L.D F0, 0(R1) D(L.D) = {R1} R(L.D) ={F0}
ADD.D F4, F0, F2 D(ADD.D) = {F0, F2} R(ADD.D) ={F4}
D(SD) R(LD) = S.D F4, 0(R1) D(S.D) = {R1,F4} R(S.D) ={Nil}
D(SD) R(ADD.D) = {F4}
Two cases of true data dependence.
Output dependence (WAW)
R(LD) R(ADD.D) =
R(LD) R(SD) =
R(ADD.D) R(SD) =
No case of output dependence.
Antidependence (WAR)
D(LD) R(ADD.D) =
D(LD) R(SD) =
D(ADD.D) R(SD) =
No case of anti dependence.
Dr. Sarvesh Vishwakarma TCS-704 ACA