Unit 3: Pipelining

Computer Architecture and Organization
UNIT - 3

© Kalasalingam Academy of Research and Education


Course Outline

CO 1. Examine functional units of a computer, bus structure & addressing modes
CO 2. Apply various algorithms to solve arithmetic unit problems
CO 3. Demonstrate single bus, multiple bus organization and pipelining
CO 4. Analyze RAM, ROM, cache memory and virtual memory
CO 5. Explore the various I/O interfaces

Course Description
This course aims to provide a strong foundation for students to understand computer system architecture and to apply these insights and principles to future computer designs. The course is structured around the three primary building blocks of general-purpose computing systems: processors, memories and input/output.
This course includes the organization and architecture of computer systems hardware, instruction set architectures, addressing modes, register transfer notation, processor design and computer arithmetic, memory systems, hardware implementations of virtual memory, and input/output control and devices.


Syllabus

Unit 3: Fundamental Concepts - Execution of a Complete Instruction - Multiple Bus Organization - Hardwired Control - Micro Programmed Control - Pipelining: Basic Concepts - Data Hazards - Instruction Hazards - Data Path and Control Considerations


Unit Outline

Lesson 1. Basic Structure of Computers

Lesson 2. Arithmetic Unit

Lesson 3. Basic Processing Unit

Lesson 4. Memory System

Lesson 5. I/O Organization


Course Progress (use before starting new lesson)

Lesson 1. Basic Structure of Computers

Lesson 2. Arithmetic Unit

Lesson 3. Basic Processing Unit

Lesson 4. Memory System

Lesson 5. I/O Organization



Topic 1: Fundamental Concepts
Topic 2: Execution of a Complete Instruction
Topic 3: Multiple Bus Organization
Topic 4: Hardwired Control
Topic 5: Micro Programmed Control
Topic 6: Pipelining - Basic Concepts
Topic 7: Data Hazards
Topic 8: Instruction Hazards
Topic 9: Data Path and Control Considerations

Third Lesson Summary
We will learn about the functional operation of a processor, and how data is transferred among the processor registers using the internal processor bus.


Topic 6: Pipelining - Basic Concepts

 Pipelining is an effective way of organizing hardware so that multiple instructions are executed in overlapping clock cycles.

 A pipelined organization requires sophisticated compilation techniques.

 Pipelining is used extensively in modern processors.

 Using pipelining increases instruction throughput.

 The processor and the main memory are built with the fastest available technology.

 The hardware is arranged so that a larger number of operations can be performed at the same time.


Topic 6: Pipelining - Basic Concepts

 Pipelining does not reduce the latency of a single task; it improves the throughput of the entire workload.

 The pipeline rate is limited by the slowest pipeline stage.

 Multiple tasks operate simultaneously, using different resources.

 The potential speedup equals the number of pipe stages (a rough calculation is sketched below).

 Unbalanced lengths of the pipe stages reduce the speedup.

 The time to "fill" the pipeline and the time to "drain" it reduce the speedup.

 The pipeline must stall for dependences.
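To make these points concrete, here is a minimal sketch (my own illustration, not from the slides) comparing sequential and ideally pipelined execution times for n instructions on a k-stage pipeline, assuming one clock cycle per stage and no stalls:

def sequential_cycles(n, k):
    """Each instruction passes through all k stages before the next one starts."""
    return n * k

def pipelined_cycles(n, k):
    """k cycles to fill the pipeline, then one instruction completes per cycle."""
    return k + (n - 1)

def speedup(n, k):
    return sequential_cycles(n, k) / pipelined_cycles(n, k)

# The 2-stage example from the following slides: 3 instructions take
# 6 clock cycles sequentially but only 4 clock cycles when pipelined.
print(sequential_cycles(3, 2), pipelined_cycles(3, 2))   # 6 4

# For a long instruction stream the speedup approaches the stage count k.
print(round(speedup(1000, 4), 2))                        # 3.99

As n grows, the k-cycle fill time becomes negligible and the speedup approaches k, which is why the potential speedup is quoted as the number of pipe stages.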
Topic 6: Pipelining in a Computer

 Sequential execution (2 steps per instruction)

Clock cycle      1      2      3      4      5      6
I1               F1     E1
I2                             F2     E2
I3                                           F3     E3

Where
 I  Instruction
 F  Fetch
 E  Execution

 To execute 3 instructions, sequential execution requires 6 clock cycles.


Topic 6: Pipelining in a Computer

[Figure: hardware needed for 2-step pipelined execution - an instruction fetch unit and an execution unit separated by an interstage buffer; while the execution unit works on the instruction held in the buffer, the fetch unit fetches the next instruction.]

Where
 I  Instruction
 F  Fetch
 B  Interstage buffer
 E  Execution


Topic 6: Pipelining in a Computer

 Pipelined execution (2 steps per instruction)

Clock cycle      1      2      3      4
I1               F1     E1
I2                      F2     E2
I3                             F3     E3

 To execute 3 instructions, pipelined execution requires only 4 clock cycles.


Topic 6: Pipelining in a Computer

 Pipelined execution (4 steps per instruction)

 Each instruction in a pipelined processor is processed in the following four steps:

 Fetch (F) : read the instruction from memory.

 Decode (D) : decode the instruction and fetch the source operand(s).

 Execute (E) : perform the operation specified by the instruction.

 Write (W) : store the result in the destination location.


Topic 6: Pipelining in a Computer

[Figure: hardware for 4-step pipelined execution - the Fetch, Decode, Execute and Write units are separated by interstage buffers B1, B2 and B3, so that each unit can work on a different instruction during the same clock cycle.]

Where
 I  Instruction
 F  Fetch
 D  Decode
 E  Execute
 W  Write
 B  Interstage buffer


Topic 6: Pipelining in a Computer

Clock cycle      1      2      3      4      5      6      7
I1               F1     D1     E1     W1
I2                      F2     D2     E2     W2
I3                             F3     D3     E3     W3
I4                                    F4     D4     E4     W4


Topic 6: Pipeline Performance

 A pipeline may be organized as a 2-stage or a 4-stage structure.

 Each stage is allotted one clock cycle to complete its work.

 The clock period must be chosen to accommodate the slowest pipeline stage (see the sketch below).

 Faster stages simply wait for the slowest one to complete.

 The processor is faster than the main memory, so instructions must be fetched ahead of time and kept close to the processor; a cache memory is used for this purpose.
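A minimal sketch (with made-up stage delays, not values from the slides) of how the clock period and the throughput follow from the slowest stage:

# Hypothetical stage delays in nanoseconds (illustrative values only).
stage_delay_ns = {"Fetch": 2.0, "Decode": 1.5, "Execute": 3.0, "Write": 1.5}

# The clock period must accommodate the slowest pipeline stage.
clock_period_ns = max(stage_delay_ns.values())        # 3.0 ns

# Ideal pipelined throughput: one instruction completes every clock cycle.
throughput_mips = 1000 / clock_period_ns              # ~333 million instructions/s

# Without pipelining, each instruction would take the sum of all stage delays.
unpipelined_time_ns = sum(stage_delay_ns.values())    # 8.0 ns

print(clock_period_ns, round(throughput_mips), unpipelined_time_ns)

The unbalanced stages (3.0 ns versus 1.5 ns) illustrate the earlier point that unequal stage lengths reduce the achievable speedup: the faster stages idle while the Execute stage finishes.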


Topic 6: Pipeline Performance

 Ideally, each instruction stage completes in one clock cycle.

 In this example, the Execute stage of instruction I2 is not completed in one clock cycle; it takes three clock cycles.

 I2 therefore needs two extra clock cycles, and the stages of the instructions behind it are held up by the same two cycles (a small model of this is sketched below).

[Figure: pipeline timing for instructions I1-I5 over clock cycles 1-9; E2 occupies three cycles, so the Decode, Execute and Write stages of I3, I4 and I5 are delayed accordingly.]
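A small model of this effect (my own sketch, not from the slides): every cycle in which a stage fails to complete adds directly to the total execution time of the instruction stream.

def pipelined_cycles_with_stalls(n, k, stall_cycles):
    """Ideal pipelined time (k + n - 1) plus every cycle the pipeline is stalled."""
    return (k + n - 1) + stall_cycles

# Five instructions on a 4-stage pipeline, with I2's Execute stage taking
# 3 cycles instead of 1 (2 extra cycles), as in the timing diagram above.
print(pipelined_cycles_with_stalls(5, 4, 0))   # 8  - no stalls
print(pipelined_cycles_with_stalls(5, 4, 2))   # 10 - two stall cycles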


Topic 6: Pipeline Performance

 In the previous example the pipeline is stalled for two clock cycles.

 Stall - a delay in the execution of an instruction.

 Hazard - any condition that causes the pipeline to stall.

 There are three types of hazards:

1) Data hazard

2) Instruction (control) hazard

3) Structural hazard


Topic 6: Pipeline Performance

 Data hazard:
 An operand (source or destination) of an instruction is not available when it is needed, causing the pipeline to stall.

 Instruction (control) hazard:
 An instruction is not available, or its fetch is delayed, causing the pipeline to stall.

 Structural hazard:
 Two different instructions try to use the same hardware resource at the same time.


Topic 6: Pipeline Performance

 Stalls in the pipeline degrade its performance.

 Identifying hazards is therefore important: once they are detected they can be handled correctly and the performance can be increased.


Topic 7: Data Hazard

 Data hazard:
 An operand (source or destination) of an instruction is not available when it is needed, causing the pipeline to stall.

 Example of when a hazard occurs:

I1 :  A ← 3 + C      where C = 5
I2 :  B ← 4 × A

 I2 uses A, which is produced by I1, so I2 cannot be executed until I1 has computed and written its result.


Topic 7: Data Hazard

 Example 1 - no hazard
I1 :  A ← 5 × C      where C = 6
I2 :  B ← 20 + C
 Both instructions only read C; neither depends on a result of the other, so they can be executed simultaneously.

 Example 2 - no hazard
I1 :  Mul R1, R2, R3
I2 :  Add R4, R2, R6
 R2 is only read by both instructions (it is not modified), so again there is no dependence between them (a dependence check in this style is sketched below).
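A minimal sketch (my own illustration, assuming the last operand of a register instruction is its destination) of how a read-after-write dependence can be detected by comparing the destination of one instruction with the sources of the next:

def raw_hazard(producer, consumer):
    """True if the consumer reads a value that the producer writes (read-after-write)."""
    dest, _ = producer
    _, sources = consumer
    return dest in sources

# (destination, sources) for the hazard example:  A <- 3 + C,  B <- 4 x A
print(raw_hazard(("A", {"C"}), ("B", {"A"})))                    # True  - I2 needs I1's result

# Example 2 above:  Mul R1, R2, R3  followed by  Add R4, R2, R6
print(raw_hazard(("R3", {"R1", "R2"}), ("R6", {"R4", "R2"})))    # False - no hazard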


Topic 7: Operand Forwarding

 During normal execution, the result of an instruction is first written into its destination register, and only then can a following instruction read it.

 The result can instead be taken directly from the interstage buffer, before it has been written into the destination register.

 A special hardware arrangement is needed to "forward" the output of the ALU back to the input of the ALU; this is called operand forwarding (a rough software model of the idea follows below).
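A rough software model of the idea (a sketch under my own assumptions, not the hardware arrangement itself): when the instruction entering the Execute stage needs a register whose new value is still sitting in the result buffer, the value is taken from that buffer instead of from the register file.

# regs models the register file; rslt holds (destination, value) for a result
# that has been computed but not yet written back to the register file.
def read_operand(reg, regs, rslt):
    """Return an operand value, forwarding it from the result buffer if needed."""
    if rslt is not None and rslt[0] == reg:
        return rslt[1]            # forwarded from RSLT - no stall required
    return regs[reg]              # normal read from the register file

regs = {"R1": 2, "R2": 3, "R3": 0, "R4": 5}
rslt = ("R3", regs["R1"] * regs["R2"])    # Mul R1, R2, R3 executed, not yet written back

# Add R4, R3, R5: the value of R3 is forwarded, so the Add need not wait.
print(regs["R4"] + read_operand("R3", regs, rslt))   # 11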


Topic 7: Operand Forwarding

[Figure: part of the datapath involved in operand forwarding - the register file supplies the source buffers SRC1 and SRC2, which feed the ALU; the ALU result is placed in the RSLT buffer and from there written to the destination register.]


Topic 7: Operand Forwarding

[Figure: the forwarding path in the pipeline - SRC1 and SRC2 belong to the Execute (ALU) stage and RSLT to the Write stage; the forwarding path routes the value in RSLT back to the ALU inputs, so a dependent instruction does not have to wait for the write-back.]


Topic 7: Handling Data Hazards in Software

 Data hazards can also be handled in software: the compiler inserts NOP (No OPeration) instructions between the dependent instructions to avoid a stall.

Instruction 1 : Mul R1, R2, R3
                NOP
                NOP
Instruction 2 : Add R4, R3, R5

 While the NOPs move through the pipeline, the next instruction does not start its operation, so the result in R3 is ready when the Add needs it.
 The slots occupied by the NOPs can instead be filled with other useful operations, so the hardware is not left idle (a toy version of this pass is sketched below).
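A toy compiler pass in the same spirit (my own sketch, assuming the last operand of each instruction is its destination and that two intervening slots are enough to hide the hazard):

def insert_nops(program, gap=2):
    """Insert NOPs so that an instruction that depends on its predecessor starts later."""
    scheduled = []
    for op, *regs in program:
        if scheduled:
            prev = scheduled[-1]
            prev_dest = prev[-1] if prev[0] != "NOP" else None
            if prev_dest is not None and prev_dest in regs[:-1]:   # RAW on a source register
                scheduled.extend([("NOP",)] * gap)
        scheduled.append((op, *regs))
    return scheduled

program = [("Mul", "R1", "R2", "R3"), ("Add", "R4", "R3", "R5")]
for instr in insert_nops(program):
    print(" ".join(instr))
# Mul R1 R2 R3
# NOP
# NOP
# Add R4 R3 R5

In practice the compiler tries to fill these slots with useful, independent instructions rather than NOPs, so the cycles are not wasted.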


Topic 7: Side Effects

 Normally, an instruction changes only the contents of the register named as its destination.
 When a location other than the one explicitly named in an instruction as a destination operand is affected, the instruction is said to have a side effect.

Example - condition code flags:
Add           R1, R3
AddWithCarry  R2, R4

 The Add implicitly sets the carry flag and the AddWithCarry implicitly reads it, creating a dependence that is not visible in the register operands (a small check illustrating this is sketched below).
 Instructions designed for pipelined execution should therefore have few side effects.
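A small sketch (my own illustration) of why a hazard check that looks only at the explicitly named registers misses this kind of dependence, whereas modelling the carry flag as a hidden operand exposes it:

# Checking only the explicit register operands: Add R1, R3 writes R3,
# while AddWithCarry R2, R4 names only R2 and R4 - it looks independent.
def explicit_raw(producer_dest, consumer_sources):
    return producer_dest in consumer_sources

print(explicit_raw("R3", {"R2", "R4"}))                  # False

# Treating the carry flag as an extra written/read location reveals the hazard.
def full_raw(producer_writes, consumer_reads):
    return bool(producer_writes & consumer_reads)

print(full_raw({"R3", "carry"}, {"R2", "R4", "carry"}))  # True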


Topic 8: Instruction Hazard

 The pipeline stalls whenever the instruction fetch unit is delayed.

 Instructions are normally fetched from the instruction cache, so that the next instruction is ready to enter the pipeline every cycle.

 If the required instruction is not present in the cache, a cache miss occurs.

 A cache miss causes the pipeline to stall until the instruction has been fetched from main memory.

 When a cache miss occurs, the affected instruction (including a branch) cannot be executed in its allotted time slot.


Topic 8: Instruction Hazard
8.1 - Unconditional Branch

Clock cycle      1      2      3      4      5      6
I1               F1     E1
I2 (Branch)             F2     E2
I3                             F3     X
Ik                                    Fk     Ek
Ik+1                                         Fk+1   Ek+1

 I3 is fetched while the branch I2 is still being executed; when the branch is taken, I3 is discarded (X) and the execution unit is idle for one cycle while the branch target Ik is fetched.


Topic 8: Instruction Hazard
8.1 - Unconditional Branch

 The time lost as a result of a branch instruction is known as the branch penalty.

Clock cycle      1      2      3      4      5      6      7      8
I1               F1     D1     E1     W1
I2 (Branch)             F2     D2     E2
I3                             F3     D3     X
I4                                    F4     X
Ik                                           Fk     Dk     Ek     Wk
Ik+1                                                Fk+1   Dk+1   Ek+1

 Here the branch target address becomes known only in the Execute stage of I2, so the two instructions already fetched (I3 and I4) are discarded and the branch penalty is two clock cycles.

 Reducing the penalty:


Topic 8: Instruction Hazard
8.1 - Unconditional Branch

Clock cycle      1      2      3      4      5      6      7
I1               F1     D1     E1     W1
I2 (Branch)             F2     D2
I3                             F3     X
Ik                                    Fk     Dk     Ek     Wk
Ik+1                                         Fk+1   Dk+1   Ek+1

 If the branch target address is computed already in the Decode stage, only one instruction (I3) has to be discarded and the branch penalty is reduced to one clock cycle.


Topic 8: Instruction Hazard
8.1 - Unconditional Branch

 Instruction Queue and Prefetching

[Figure: the instruction fetch unit (F: fetch instruction) places fetched instructions into an instruction queue; the dispatch/decode unit (D) takes instructions from the queue and passes them to the Execute (E) and Write (W) stages. Prefetching into the queue helps hide fetch delays from the rest of the pipeline.]


Topic 8: Instruction Hazard
8.2 - Conditional Branch
 A conditional branch instruction introduces the added hazard caused by the
dependency of the branch condition on the result of a preceding instruction.

 The decision to branch cannot be made until the execution of that instruction has
been completed.

 Branch instructions represent about 20% of the dynamic instruction count of most
programs.

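A back-of-the-envelope sketch (my own, using the 20% figure above, an assumed penalty, and pessimistically charging the penalty to every branch) of how branch penalties inflate the average number of cycles per instruction:

def effective_cpi(base_cpi, branch_fraction, branch_penalty_cycles):
    """Average cycles per instruction once branch penalties are included."""
    return base_cpi + branch_fraction * branch_penalty_cycles

print(effective_cpi(1.0, 0.20, 2))   # 1.4 - two-cycle penalty on every branch
print(effective_cpi(1.0, 0.20, 1))   # 1.2 - penalty reduced to one cycle
print(effective_cpi(1.0, 0.20, 0))   # 1.0 - penalty hidden (delay slots, prediction)

This is why the techniques that follow (delayed branches and branch prediction) concentrate on reducing or hiding the branch penalty.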


Topic 8: Instruction Hazard
8.2 - Conditional Branch
 Delayed Branch
 The instructions in the delay slots are always fetched. Therefore, we would like to
arrange for them to be fully executed whether or not the branch is taken.

 The objective is to place useful instructions in these slots.

 The effectiveness of the delayed branch approach depends on how often it is


possible to reorder instructions.



Topic 8: Instruction Hazard
8.2 - Conditional Branch

 Delayed Branch

Original program loop:               Reordered for the delayed branch:

LOOP   Shift_left  R1                LOOP   Decrement   R2
       Decrement   R2                       Branch=0    LOOP
       Branch=0    LOOP                     Shift_left  R1
NEXT   Add         R1,R3             NEXT   Add         R1,R3

 The Shift_left instruction is moved into the branch delay slot; it is executed whether or not the branch is taken, so no cycles are wasted (the condition for such a move is sketched below).
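A toy sketch (my own, using a simplified instruction format in which the last operand is the destination) of the check a compiler makes before moving the instruction that precedes a branch into the delay slot: the moved instruction must not produce the value that the branch condition depends on.

def can_fill_delay_slot(candidate, branch_condition_reg):
    """True if the candidate instruction may be moved into the branch delay slot."""
    op, *regs = candidate
    dest = regs[-1] if regs else None
    return dest != branch_condition_reg

# Loop body from the slide: Shift_left R1 / Decrement R2 / Branch=0 LOOP,
# where the branch condition is computed from R2.
print(can_fill_delay_slot(("Shift_left", "R1"), "R2"))   # True  - safe to move
print(can_fill_delay_slot(("Decrement", "R2"), "R2"))    # False - defines the condition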


Topic 8: Instruction Hazard
8.2 - Conditional Branch

 Delayed Branch - execution timing

Clock cycle                    1     2     3     4     5     6     7     8
Decrement                      F     E
Branch                               F     E
Shift (delay slot)                         F     E
Decrement (branch taken)                         F     E
Branch                                                 F     E
Shift (delay slot)                                           F     E
Add (branch not taken)                                             F     E


Topic 8: Instruction Hazard
8.2 - Conditional Branch
 Branch Prediction
 To predict whether or not a particular branch will be taken.
 Simplest form: assume branch will not take place and continue to fetch instructions in
sequential address order.
 Until the branch is evaluated, instruction execution along the predicted path must be done
on a speculative basis.
 Speculative execution: instructions are executed before the processor is certain that they
are in the correct execution sequence.
 Need to be careful so that no processor registers or memory locations are updated until it
is confirmed that these instructions should indeed be executed.



Topic 8: Instruction Hazard
8.2 - Conditional Branch

 Incorrectly Predicted Branch

Clock cycle        1      2      3       4      5      6
I1 (Compare)       F1     D1     E1      W1
I2 (Branch>0)             F2     D2/P2   E2
I3                               F3      D3     X
I4                                       F4     X
Ik                                              Fk     Dk

 P2 marks the point at which the prediction is made, while I2 is being decoded; I3 and I4, fetched along the predicted path, are discarded (X) when the branch outcome shows that the prediction was wrong.


Topic 8: Instruction Hazard
8.2 - Conditional Branch

 Dynamic Branch Prediction

2-state algorithm:
  LT  : Likely to be taken
  LNT : Likely not to be taken

4-state algorithm:
  ST  : Strongly likely to be taken
  LT  : Likely to be taken
  LNT : Likely not to be taken
  SNT : Strongly likely not to be taken

(A sketch of the 4-state predictor follows below.)
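A minimal sketch of the 4-state idea (my own illustration, not code from the slides): a saturating counter that moves between SNT, LNT, LT and ST, so the prediction only flips after two consecutive mispredictions in the same direction.

STATES = ["SNT", "LNT", "LT", "ST"]   # strongly not taken ... strongly taken

class FourStatePredictor:
    def __init__(self, state="LNT"):
        self.index = STATES.index(state)

    def predict(self):
        """Predict 'taken' in the two upper states."""
        return STATES[self.index] in ("LT", "ST")

    def update(self, taken):
        """Move one step toward the actual outcome, saturating at the ends."""
        self.index = min(self.index + 1, 3) if taken else max(self.index - 1, 0)

p = FourStatePredictor()
for outcome in [True, True, False, True]:    # a loop branch that is mostly taken
    print(p.predict(), STATES[p.index])
    p.update(outcome)
# After two taken outcomes the predictor reaches ST; the single not-taken
# outcome only drops it to LT, so it continues to predict "taken".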


Topic 9: Datapath and Control Considerations

[Figure: datapath modified for pipelined execution - the register file drives the ALU through buses A and B, and results return on bus C via the buffer register R; the PC and its incrementer feed the instruction memory address register IMAR, which addresses the instruction cache (instruction fetches); fetched instructions enter an instruction queue that feeds the instruction decoder, whose output travels down a control signal pipeline; data accesses use a separate data memory address register DMAR together with MDR/Read and MDR/Write registers connected to the data cache.]


Topic 9: Datapath and Control Considerations

 The major changes to the processor bus organization are:
 Separate instruction and data caches are provided.
 The Program Counter (PC) is connected to an Instruction Memory Address Register (IMAR).
 A separate Data Memory Address Register (DMAR) is provided.
 Separate MDRs are used for reading and for writing data.
 Buffer registers are provided at the inputs and output of the ALU to improve performance.
 An instruction queue is provided to hold prefetched instructions.
 The output of the instruction decoder is carried along a control signal pipeline.


Topic 9: Datapath and Control Considerations

 With this modified datapath, the following operations can be carried out independently, and therefore simultaneously, in the same clock cycle:
 Reading an instruction from the instruction cache
 Incrementing the PC
 Decoding an instruction
 Reading from or writing into the data cache
 Reading the contents of up to two registers from the register file
 Writing into one register in the register file
 Performing an ALU operation


Thank You!
