University Institute of Engineering
Department of Computer Science & Engineering
COMPUTER ORGANIZATION & ARCHITECTURE
(23CST-204/23ITT-204)
ER. SHIKHA ATWAL
E11186
ASSISTANT PROFESSOR
BE-CSE
INSTRUCTION PIPELINE
Instruction pipelining is a technique that implements a form of parallelism
called instruction-level parallelism within a single processor.
A pipelined processor does not wait until the previous instruction has been
executed completely.
Rather, it fetches the next instruction and begins its execution.
As a result, multiple instructions are in different stages of execution at the
same time.
Pipelined execution therefore completes a sequence of instructions in less time
than non-pipelined execution.
Four-Stage Pipeline
In a four-stage pipelined architecture, the execution of each instruction is
completed in the following four stages:
1. Instruction Fetch (IF)
2. Instruction Decode (ID)
3. Instruction Execute (IE)
4. Write Back (WB)
To implement a four-stage pipeline,
●The hardware of the CPU is divided into four functional units.
●Each functional unit performs a dedicated task.
Stage-01:
At stage-01,
●First functional unit performs instruction fetch.
●It fetches the instruction to be executed.
●The instruction is fetched from the code area of main memory into the
instruction register.
Stage-02:
At stage-02,
●The second functional unit performs instruction decode.
●It decodes the instruction to be executed.
●The opcode from instruction buffer is decoded so as to identify the operation to
be performed.
●The input data (operands) on which the instruction operates are fetched from
memory (data area).
Stage-03:
At stage-03,
●The third functional unit performs instruction execution.
●It executes the instruction.
●It performs the arithmetic, logical, or other specified operation on the
operands and generates the result.
Stage-04:
At stage-04,
●The fourth functional unit performs writeback.
●It writes the result obtained from executing the instruction back to the main
memory or the registers.
Execution of Instruction Pipeline
In pipeline architecture,
●Instructions in the program execute in parallel.
●When one instruction goes from the nth stage to the (n+1)th stage, another
instruction goes from the (n-1)th stage to the nth stage.
In non-pipelined mode, these four stages are performed sequentially, one after
the other, for each instruction.
[Figure: non-pipelined execution across stages S1 to S4]
In a 4-stage pipelined computer, the stages of successive instructions are
executed in an overlapped fashion.
[Figure: overlapped (pipelined) execution across stages S1 to S4]
Phase-Time Diagram
●A phase-time diagram shows the execution of instructions in the pipelined
architecture.
●The following diagram shows the execution of three instructions in a four-stage
pipeline architecture.
Time taken to execute three instructions in a four-stage pipelined architecture
= 6 clock cycles.
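The phase-time diagram for this three-instruction case can be regenerated with a short sketch. It assumes an ideal pipeline with no stalls, in which each instruction enters one clock cycle after the previous one. The function names `phase_time_diagram` and `render` are illustrative and not part of the original material.

```python
def phase_time_diagram(num_instructions, stages=("IF", "ID", "IE", "WB")):
    """Return one row per instruction; each row holds the stage occupied in each cycle."""
    k = len(stages)
    total_cycles = k + num_instructions - 1
    rows = []
    for i in range(num_instructions):
        # Instruction i enters the pipeline in cycle i (0-based) and
        # advances by one stage per clock cycle (ideal case, no stalls).
        cells = ["--"] * total_cycles
        for s, name in enumerate(stages):
            cells[i + s] = name
        rows.append(cells)
    return rows


def render(rows):
    """Format the rows as a text table with one column per clock cycle."""
    header = "     " + " ".join(f"C{c + 1:<2}" for c in range(len(rows[0])))
    lines = [header]
    for i, cells in enumerate(rows):
        lines.append(f"I{i + 1:<3} " + " ".join(f"{c:<3}" for c in cells))
    return "\n".join(lines)


rows = phase_time_diagram(3)
print(render(rows))
print("Total clock cycles:", len(rows[0]))
```

Each row is one instruction and each column is one clock cycle, so the number of columns is the total execution time in cycles.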
NOTE
In a non-pipelined architecture, the time taken to execute three instructions would
be = 3 x Time taken to execute one instruction
= 3 x 4 clock cycles
= 12 clock cycles
Clearly, pipelined execution is more efficient than non-pipelined execution:
here it takes half as many clock cycles (6 versus 12).
Performance of Pipelined Execution
The following parameters serve as criteria to estimate the performance of
pipelined execution:
●Speed Up
●Efficiency
●Throughput
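1. Speed Up
It indicates how much faster pipelined execution is compared to non-pipelined
execution. It is calculated as-
Speed up (S) = Non-pipelined execution time / Pipelined execution time
2. Efficiency
The efficiency of pipelined execution is calculated as-
Efficiency (η) = Speed up / Number of stages in the pipeline
3. Throughput
Throughput is defined as the number of instructions executed per unit time. It
is calculated as-
Throughput = Number of instructions executed / Total time taken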
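As a rough illustration, the three parameters can be computed for the earlier example of three instructions on a four-stage pipeline (12 clock cycles non-pipelined, 6 clock cycles pipelined). The sketch assumes the usual textbook definitions: speed up as the ratio of non-pipelined to pipelined execution time, efficiency as speed up divided by the number of stages, and throughput as instructions per unit time. The function names are illustrative only.

```python
def speed_up(non_pipelined_time, pipelined_time):
    """Ratio of non-pipelined to pipelined execution time."""
    return non_pipelined_time / pipelined_time


def efficiency(speed_up_value, num_stages):
    """Assumed definition: speed up divided by the number of pipeline stages."""
    return speed_up_value / num_stages


def throughput(num_instructions, total_time):
    """Instructions completed per unit time (here, per clock cycle)."""
    return num_instructions / total_time


s = speed_up(12, 6)      # three instructions, four stages
e = efficiency(s, 4)
t = throughput(3, 6)
print(f"Speed up   = {s}")
print(f"Efficiency = {e * 100:.0f}%")
print(f"Throughput = {t} instructions per clock cycle")
```

With only three instructions the pipeline spends much of its time filling and draining, which is why the efficiency in this small example is well below 100%.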
Calculation of Important Parameters
Let us learn how to calculate certain important parameters of pipelined
architecture. Consider-
●A pipelined architecture consisting of k-stage pipeline
●Total number of instructions to be executed = n
Point-01: Calculating Cycle Time-
In pipelined architecture,
●There is a global clock that synchronizes the working of all the stages.
●Frequency of the clock is set such that all the stages are synchronized.
●At the beginning of each clock cycle, each stage reads the data from its
register and processes it.
●Cycle time is the value of one clock cycle.
There are two cases possible-
Case-01: All the stages offer the same delay-
If all the stages offer the same delay, then-
Cycle time = Delay offered by one stage, including the delay due to its register
Case-02: The stages do not all offer the same delay-
If the stages do not all offer the same delay, then-
Cycle time = Maximum delay offered by any stage, including the delay due to its
register
Point-02: Calculating Frequency of Clock-
Frequency of the clock (f) = 1 / Cycle time
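A minimal sketch of Point-01 and Point-02 follows. The stage delays (60, 50, 90 and 80 ns) and the 10 ns register delay are hypothetical values chosen for illustration, and the sketch assumes every stage has the same register delay.

```python
def cycle_time_ns(stage_delays_ns, register_delay_ns=0.0):
    """Cycle time = slowest stage delay plus the delay of its register."""
    return max(stage_delays_ns) + register_delay_ns


def clock_frequency_mhz(cycle_ns):
    """f = 1 / cycle time; a cycle time in ns gives 1000 / cycle_ns in MHz."""
    return 1000.0 / cycle_ns


# Hypothetical delays, not taken from the original material.
delays = [60, 50, 90, 80]
cycle = cycle_time_ns(delays, register_delay_ns=10)
print(f"Cycle time      = {cycle} ns")
print(f"Clock frequency = {clock_frequency_mhz(cycle)} MHz")
```

Case-01 is the special case in which all entries of `delays` are equal, so the maximum is simply the common stage delay.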
Point-03: Calculating Non-Pipelined Execution Time-
In non-pipelined architecture,
●The instructions execute one after the other.
●The execution of a new instruction begins only after the previous instruction
has executed completely.
So, number of clock cycles taken by each instruction = k clock cycles
Thus, Non-pipelined execution time
= Total number of instructions x Time taken to execute one instruction
= n x k clock cycles
Point-04: Calculating Pipelined Execution Time-
In pipelined architecture,
●Multiple instructions execute in parallel.
●Number of clock cycles taken by the first instruction = k clock cycles
●After the first instruction has completely executed, one instruction completes
per clock cycle.
●So, number of clock cycles taken by each remaining instruction = 1 clock
cycle
Thus, Pipelined execution time
= Time taken to execute first instruction + Time taken to execute remaining
instructions
= 1 x k clock cycles + (n-1) x 1 clock cycle
= (k + n – 1) clock cycles
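The two execution-time formulas from Point-03 and Point-04 can be checked with a small sketch. It assumes an ideal pipeline with no stalls; the function names are illustrative.

```python
def non_pipelined_cycles(n, k):
    """Each of the n instructions takes all k stages before the next one starts."""
    return n * k


def pipelined_cycles(n, k):
    """The first instruction takes k cycles; each remaining one completes 1 cycle later."""
    if n == 0:
        return 0
    return k + (n - 1)


n, k = 3, 4  # the three-instruction, four-stage example used earlier
print("Non-pipelined:", non_pipelined_cycles(n, k), "clock cycles")
print("Pipelined:    ", pipelined_cycles(n, k), "clock cycles")
```

For n = 3 and k = 4 this reproduces the 12 and 6 clock cycles found from the phase-time diagram.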
Point-05: Calculating Speed Up-
Speed up
= Non-pipelined execution time / Pipelined execution time
= n*k clock cycles / (k + n – 1) clock cycles
= n*k / (k + n – 1)
= n*k / [n + (k – 1)]
= k / [1 + (k – 1)/n]
●For a very large number of instructions, n→∞ and the speed up approaches k.
●Practically, the total number of instructions never tends to infinity.
●Therefore, speed up is always less than the number of stages in the pipeline.
●Theoretically, a k-stage pipeline is k times faster than serial execution.
●But this ideal speed up cannot be achieved in practice due to factors like
data dependencies, branches and interrupts.
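To see how the speed up approaches k as n grows, the formula n*k / (k + n – 1) can be evaluated for increasing n. This is a sketch for an ideal pipeline; k = 4 is a hypothetical choice.

```python
def speed_up(n, k):
    """Speed up of an ideal k-stage pipeline over non-pipelined execution of n instructions."""
    return (n * k) / (k + n - 1)


k = 4  # hypothetical number of stages
for n in (1, 10, 100, 1000, 10**6):
    print(f"n = {n:>7}: speed up = {speed_up(n, k):.4f}")
```

The value starts at 1 for a single instruction and climbs toward 4 without reaching it, matching the limit stated above.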
Important Notes
Note-01:
●The aim of pipelined architecture is to execute one complete instruction in
one clock cycle.
●In other words, the aim of pipelining is to maintain CPI ≅ 1.
●Practically, CPI = 1 is not achieved exactly: the first instruction takes k
clock cycles, so n instructions take (k + n – 1) clock cycles, and the
registers between stages add delay to each clock cycle.
●Ideally, a pipelined architecture executes one complete instruction per
clock cycle (CPI = 1).
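The average CPI of an ideal pipeline follows from the pipelined execution time of (k + n – 1) clock cycles for n instructions. The sketch below assumes no stalls; the function name is illustrative.

```python
def cycles_per_instruction(n, k):
    """Average CPI of an ideal k-stage pipeline executing n instructions."""
    return (k + n - 1) / n


print(cycles_per_instruction(3, 4))
print(cycles_per_instruction(1000, 4))
```

The result is 1 + (k – 1)/n, so the CPI is above 1 for any finite n and moves closer to 1 as more instructions are executed.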
Note-02:
●The maximum speed up that can be achieved is always equal to the
number of stages.
●This is achieved when efficiency becomes 100%.
●Practically, efficiency is always less than 100%.
●Therefore, speed up is always less than the number of stages in the pipelined
architecture.
Note-03:
Under ideal conditions,
●One complete instruction is executed per clock cycle i.e. CPI = 1.
●Speed up = Number of stages in pipelined architecture
Note-04:
●A 5-stage pipeline is commonly cited as giving good performance; the best
depth in practice depends on the processor design and the workload.
Note-05:
In case only one instruction has to be executed, then-
●Non-pipelined execution gives better performance than pipelined
execution.
●This is because delays are introduced due to registers in pipelined
architecture.
●Thus, the time taken to execute one instruction in a non-pipelined
architecture is less.
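The single-instruction case can be illustrated with hypothetical numbers. The sketch assumes that a non-pipelined processor has no registers between stages, so one instruction takes the sum of the stage delays, while a pipelined processor needs k full clock cycles, each as long as the slowest stage plus the register delay. The delay values and function names are assumptions made for illustration.

```python
def non_pipelined_latency_ns(stage_delays_ns):
    """One instruction passes through every stage with no inter-stage registers."""
    return sum(stage_delays_ns)


def pipelined_latency_ns(stage_delays_ns, register_delay_ns):
    """One instruction needs k cycles, each as long as the slowest stage plus its register."""
    cycle = max(stage_delays_ns) + register_delay_ns
    return len(stage_delays_ns) * cycle


# Hypothetical delays, not taken from the original material.
delays = [60, 50, 90, 80]
print("Non-pipelined:", non_pipelined_latency_ns(delays), "ns")
print("Pipelined:    ", pipelined_latency_ns(delays, register_delay_ns=10), "ns")
```

With these values a single instruction takes 280 ns without pipelining and 400 ns with it, so pipelining pays off only when many instructions overlap.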
Note-06:
High efficiency of pipelined processor is achieved when-
●All the stages are of equal duration.
●There are no conditional branch instructions.
●There are no interrupts.
●There are no register and memory conflicts.
●Performance degrades in absence of these conditions.