Chapter 06

Pipelining and Parallel Processing


Agenda

• Pipelining: Introduction, Pipeline organization, Pipelining issues, Memory delays, Branch delays, Performance evaluation, The ARM processor
Pipelining:
• Pipelining is a particularly effective way of organizing concurrent activity in a computer system.
• Pipelining is a technique in which the execution of multiple instructions is overlapped.
• In a pipelined system, each segment consists of an input register followed by a combinational circuit.
• A pipeline is a series of stages, where some work is done at each stage. The work is not finished until it has passed through all stages.

Types of Pipelines
• Arithmetic pipeline
where the different stages of an arithmetic operation are handled along the stages of a pipeline. Arithmetic pipelines are used for floating-point operations and for multiplication of fixed-point numbers.
• Let A and B be mantissas (the significant digits of floating-point numbers), and a and b their exponents. Floating-point addition and subtraction is done in four steps:
1. Compare the exponents.
2. Align the mantissas.
3. Add or subtract the mantissas.
4. Produce (normalize) the result.
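The four steps above can be sketched in software. This is a minimal illustration using a toy decimal (mantissa, exponent) representation, not real floating-point hardware; the function name and representation are assumptions for the example.

```python
def fp_add(x, y):
    """Add two numbers given as (mantissa, exponent) pairs, value = m * 10**e."""
    (ma, ea), (mb, eb) = x, y
    # 1. Compare the exponents; make x the operand with the larger exponent.
    if ea < eb:
        (ma, ea), (mb, eb) = (mb, eb), (ma, ea)
    # 2. Align the mantissas: shift the smaller operand's mantissa right.
    mb = mb / (10 ** (ea - eb))
    # 3. Add the mantissas.
    m, e = ma + mb, ea
    # 4. Produce the result: normalize so the mantissa is below 10.
    while abs(m) >= 10:
        m /= 10
        e += 1
    return (m, e)

m, e = fp_add((9.5, 2), (8.0, 1))  # 950 + 80 = 1030
print(m, e)
```

In a hardware arithmetic pipeline, each of these four steps would be a separate stage, so a new pair of operands can enter step 1 while the previous pair is in step 2.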
Pipelining:
• Instruction pipeline
where the different stages of instruction fetch and execution are handled in a pipeline.
A stream of instructions can be executed by overlapping the fetch, decode, and execute phases of the instruction cycle.
An instruction pipeline reads instructions from memory while previous instructions are being executed in other segments of the pipeline.
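The overlap of fetch, decode, and execute phases can be visualized with a small illustrative simulation (the stage names and schedule format are assumptions for the example, not from the text):

```python
def pipeline_schedule(n_instructions, stages=("Fetch", "Decode", "Execute")):
    """Return, for each cycle, which instruction occupies each stage."""
    k = len(stages)
    total_cycles = k + n_instructions - 1  # k cycles to fill, then 1 per instruction
    schedule = []
    for cycle in range(total_cycles):
        row = {}
        for s, stage in enumerate(stages):
            i = cycle - s  # instruction index occupying this stage in this cycle
            if 0 <= i < n_instructions:
                row[stage] = f"I{i + 1}"
        schedule.append(row)
    return schedule

for cycle, row in enumerate(pipeline_schedule(4), start=1):
    print(f"Cycle {cycle}: {row}")
# 4 instructions finish in 3 + (4 - 1) = 6 cycles instead of 4 * 3 = 12.
```

The run shows why throughput improves: after the pipeline fills, one instruction completes every cycle.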
Pipeline organization
• Each stage of the pipeline is processing a different instruction.
• Interstage buffer B1 feeds the Decode stage with a newly fetched instruction.
• Interstage buffer B2 feeds the Compute stage with the two operands read from the register file, the source/destination register identifiers, the immediate value derived from the instruction, and the incremented PC value.
• Interstage buffer B3 holds the result of the ALU operation, which may be data to be written into the register file or an address that feeds the Memory stage.
• Interstage buffer B4 feeds the Write stage with a value to be written into the register file.
Pipelining Issues
• There are times when it is not possible to have a new instruction enter the pipeline in every cycle.
• Consider two instructions, Ij and Ij+1, where the destination register of instruction Ij is a source register for instruction Ij+1.
• If Ij+1 reads that register before Ij has written its result, the result of instruction Ij+1 would be incorrect, because the arithmetic operation would be performed using the old value of the register.
• Any condition that causes the pipeline to stall is called a hazard.
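The register overlap between Ij and Ij+1 described above (a read-after-write dependency) can be detected mechanically. A hedged sketch, where the (dest, src1, src2) instruction format is an assumption for illustration:

```python
def has_raw_hazard(ij, ij_next):
    """True if ij_next reads the register that ij writes (read-after-write)."""
    dest, _, _ = ij          # register written by Ij
    _, src1, src2 = ij_next  # registers read by Ij+1
    return dest in (src1, src2)

# Ij:   Add R2, R3, R4   (writes R2)
# Ij+1: Sub R5, R2, R6   (reads R2 -> hazard: pipeline must stall or forward)
print(has_raw_hazard(("R2", "R3", "R4"), ("R5", "R2", "R6")))  # True
print(has_raw_hazard(("R2", "R3", "R4"), ("R5", "R6", "R7")))  # False
```

Real hardware performs this comparison on register identifiers in the interstage buffers to decide when to stall.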
Memory delays
• Delays arising from memory accesses are another cause of pipeline stalls.
• They occur when the requested instruction or data is not found in the cache, resulting in a cache miss. A memory access may then take ten or more cycles.
• There is an additional type of memory-related stall that occurs when there is a data dependency involving a Load instruction:
Load R2, (R3)
Subtract R9, R2, #30

• The compiler can eliminate the one-cycle stall for this type of data dependency by reordering instructions to insert a useful instruction between the Load instruction and the instruction that depends on the data read from memory.
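The reordering idea can be checked with a small sketch. The program encoding and the added Add instruction are assumptions for illustration; the Load/Subtract pair is the example from the text.

```python
def count_load_use_stalls(program):
    """program: list of (op, dest, sources). Count one stall whenever a Load's
    destination register is read by the instruction immediately after it."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        if prev[0] == "Load" and prev[1] in cur[2]:
            stalls += 1
    return stalls

original = [
    ("Load",     "R2", ["R3"]),
    ("Subtract", "R9", ["R2"]),  # depends on the Load -> one-cycle stall
    ("Add",      "R7", ["R4"]),  # independent instruction (assumed)
]
# Move the independent Add between the Load and the dependent Subtract:
reordered = [original[0], original[2], original[1]]

print(count_load_use_stalls(original))   # 1
print(count_load_use_stalls(reordered))  # 0
```

The reordered program does the same work, but the Load's one-cycle latency is now hidden behind the independent Add.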
Branch delays
• Branch instructions can alter the sequence of execution, but they must first be executed to determine whether and where to branch.
• We now consider the effect of branch instructions and the techniques that can be used to mitigate their impact on pipelined execution.

Unconditional Branches-
• The two-cycle delay incurred by an unconditional branch constitutes the branch penalty.
• With a two-cycle branch penalty, the relatively high frequency of branch instructions could increase the execution time of a program by as much as 40 percent.
• Reducing the branch penalty requires the branch target address to be computed earlier in the pipeline.
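The 40 percent figure above can be reproduced with a quick back-of-the-envelope calculation, assuming an ideal pipeline of one instruction per cycle and that roughly 20 percent of executed instructions are branches (the 20 percent branch frequency is an assumption for illustration):

```python
def execution_time_increase(branch_fraction, penalty_cycles):
    """Extra cycles per instruction, relative to the ideal 1 cycle/instruction."""
    return branch_fraction * penalty_cycles

print(execution_time_increase(0.20, 2))  # 0.4 -> 40 percent longer
```

Halving the penalty to one cycle (by deciding the branch earlier in the pipeline) halves this overhead to 20 percent.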
Branch delays
Conditional Branches-
• Consider a conditional branch instruction such as
Branch_if_[R5]=[R6] LOOP
• For pipelining, the branch condition must be tested as early as possible to limit the branch
penalty.
• Moving the branch decision to the Decode stage ensures a common branch penalty of only
one cycle for all branch instructions.
Performance Evaluation
• For a non-pipelined processor, the execution time, T, of a program that has a dynamic instruction count of N is given by
T = (N × S) / R
where S is the average number of clock cycles it takes to fetch and execute one instruction, and R is the clock rate in cycles per second.
• A useful performance indicator is the instruction throughput, which is the number of
instructions executed per second.
• Pipelining improves performance by overlapping the execution of successive instructions,
which increases instruction throughput even though an individual instruction is still executed
in the same number of cycles.
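A worked example of the formula T = (N × S) / R above, together with the instruction throughput R / S. The particular values of N, S, and R are illustrative assumptions, not from the text:

```python
def execution_time(n_instructions, cycles_per_instruction, clock_rate_hz):
    """T = (N * S) / R for a non-pipelined processor."""
    return (n_instructions * cycles_per_instruction) / clock_rate_hz

def throughput(clock_rate_hz, cycles_per_instruction):
    """Instructions executed per second: R / S."""
    return clock_rate_hz / cycles_per_instruction

N = 1_000_000       # dynamic instruction count (assumed)
S = 5               # average cycles per instruction, no overlap (assumed)
R = 1_000_000_000   # 1 GHz clock (assumed)

print(execution_time(N, S, R))  # 0.005 seconds
print(throughput(R, S))         # 200 million instructions per second
```

With ideal pipelining the effective S approaches 1, so throughput approaches R, even though each individual instruction still takes S cycles from fetch to completion.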
Performance Evaluation
1. Effects of Stalls and Penalties-
• The five-stage pipeline involves memory-access operations in the Fetch and Memory stages,
and ALU operations in the Compute stage.
• The operations with the longest delay dictate the cycle time, and hence the clock rate R.
• The compiler can improve performance by reducing the number of times that a Load
instruction is immediately followed by a dependent instruction.
• A stall is eliminated each time the compiler can safely move a nearby instruction to a position
between the Load instruction and the dependent instruction.
• The effect of cache misses on performance can be assessed by considering the frequency of
their occurrence.
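One common way to assess the effect of cache misses, sketched here under assumed numbers: the average cycles per instruction grow with the miss rate and the miss penalty. The specific model and all values are assumptions for illustration, not from the text.

```python
def average_cpi(base_cpi, mem_refs_per_instr, miss_rate, miss_penalty):
    """Average cycles per instruction including cache-miss stall cycles."""
    return base_cpi + mem_refs_per_instr * miss_rate * miss_penalty

# Assumed: base CPI of 1, 1.3 memory references per instruction
# (instruction fetch plus some loads/stores), 5 percent miss rate,
# 10-cycle miss penalty (the text notes a miss may take ten or more cycles).
print(average_cpi(1.0, 1.3, 0.05, 10))  # about 1.65 cycles per instruction
```

Even a modest miss rate noticeably inflates the average CPI, which is why miss frequency is the key quantity to measure.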

2. Number of Pipeline Stages-


• As the number of pipeline stages increases, more instructions are executed concurrently.
