
Unit5 Parallel Processing Multiprocessor

The document discusses CPU performance, parallel processing, and multi-processor systems, highlighting the importance of processor time, clock cycles, and cache memory in program execution. It explains pipelining as a technique to enhance computational speed by executing sub-operations simultaneously and outlines various pipeline hazards that can impede performance. Additionally, it covers multi-processing systems, interconnection structures, and their implications for memory access and system reliability.

Uploaded by

Vabhav Mehta


COMPUTER ORGANIZATION AND ARCHITECTURE

Unit-5: CPU Performance, Parallel Processing, Multi-processor

• The elapsed time for the execution of a program depends on all the units in a
computer system, whereas the processor time depends only on the hardware involved
in executing individual machine instructions. This hardware comprises the
processor and the memory, which are usually connected by a bus.

• Let us examine the flow of program instructions and data between the memory
and the processor. At the start of execution, all program instructions and the
required data are stored in the main memory. As execution proceeds,
instructions are fetched one by one over the bus into the processor, and a copy is
placed in the cache. Later, if the same instruction or data item is needed a second
time, it is read directly from the cache.

03/08/2025 2
• The processor and relatively small cache memory can be fabricated on a single IC
chip. The internal speed of performing the basic steps of instruction processing
on chip is very high and is considerably faster than the speed at which the
instruction and data can be fetched from the main memory. A program will be
executed faster if the movement of instructions and data between the main
memory and the processor is minimized, which is achieved by using the cache.
• Processor Clock
Processor circuits are controlled by a timing signal called the clock. The clock defines
regular time intervals called clock cycles. To execute a machine instruction, the
processor divides the action to be performed into a sequence of basic steps such that
each step can be completed in one clock cycle. The length of one clock cycle is an
important parameter that affects processor performance.

• Basic performance equation

Let “T” be the processor time required to execute a program that has been prepared
in some high-level language. The compiler generates a machine language object
program that corresponds to the source program. Assume that complete execution of
the program requires the execution of “N” machine language instructions. N counts
the instructions actually executed: some instructions may be executed more than
once, which is the case for instructions inside a program loop, while others may not
be executed at all, depending on the input data used.
Suppose that the average number of basic steps needed to execute one machine
instruction is “S”, where each basic step is completed in one clock cycle. If the
clock rate is “R” cycles per second, the program execution time is given by

T = (N × S) / R
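As a rough illustration, the basic performance equation in its standard form, T = (N × S) / R, can be evaluated directly. The function name and the workload figures below are hypothetical and chosen only to show the arithmetic.

```python
def execution_time(n_instructions: int, steps_per_instruction: float,
                   clock_rate_hz: float) -> float:
    """Processor time T = (N * S) / R, in seconds."""
    if clock_rate_hz <= 0:
        raise ValueError("clock rate must be positive")
    return n_instructions * steps_per_instruction / clock_rate_hz


# Hypothetical workload: N = 2 million instructions, S = 4 steps each,
# R = 500 MHz clock.
t = execution_time(2_000_000, 4, 500e6)
print(f"T = {t:.3f} s")
```

The equation shows the three ways to shorten T: execute fewer instructions (smaller N), use fewer steps per instruction (smaller S), or raise the clock rate (larger R).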

• Clock rate
The clock rate is the number of clock cycles a processor completes in one second.
Generally, it is said that the higher the clock speed, the faster the CPU, but clock
speed is not the only factor. Others include the number of processors, the speed of
RAM, the bus speed, and the size of the cache. Some instructions also require more
cycles than others to complete. Depending upon the architecture of the CPU, the
clock speed can be more or less important.

MIPS: Million instructions per second
A measure of the execution speed of a computer. It gives the approximate number
of machine instructions, in millions, that the computer can execute in one second.
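A minimal sketch of how a MIPS figure is usually computed: the number of instructions executed divided by the execution time, scaled to millions. The function name and the sample numbers are assumptions for illustration.

```python
def mips(n_instructions: int, exec_time_s: float) -> float:
    """MIPS = instructions executed / (execution time in seconds * 10**6)."""
    if exec_time_s <= 0:
        raise ValueError("execution time must be positive")
    return n_instructions / (exec_time_s * 1e6)


# Hypothetical run: 50 million instructions executed in 0.4 seconds.
print(f"{mips(50_000_000, 0.4):.1f} MIPS")
```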

Parallel processing and Pipelining
• Parallel processing covers a large class of techniques that provide simultaneous
data-processing tasks in order to increase the computational speed of a computer

• Pipelining is a technique of dividing a sequential process into sub-operations, with
each sub-operation executed in a dedicated segment that works simultaneously
with the other segments

• Each segment in a pipeline performs partial processing and the result obtained
from one segment is transferred to the next segment

• The final result is achieved after the data has passed through all the segments
• The simplest example of a pipeline uses an input register and a digital
combinational circuit in each segment. The register holds the data and the
combinational circuit performs the sub-operation. The output of the circuit is then
fed to the input register of the next segment.

Each operand needs to pass through all four segments in a fixed sequence.

Each segment has a combinational circuit Si that performs the sub-operation on a
data stream. The segments are separated by registers Ri that hold the intermediate
results between stages.

Space time diagram
A space-time diagram illustrates the behaviour of a pipeline by showing the segment
utilization as a function of time.
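A space-time diagram can be generated programmatically. The sketch below assumes an ideal pipeline in which every task spends exactly one clock cycle in each segment; the function name and the grid layout are illustrative choices, not taken from the slides.

```python
def space_time_diagram(k: int, n: int) -> list[list[str]]:
    """Build the diagram for n tasks on a k-segment pipeline.

    Row i is segment i+1 and column c is clock cycle c+1.
    Task j (1-based) occupies segment s (1-based) during cycle j + s - 1.
    """
    total_cycles = k + n - 1
    grid = [["--"] * total_cycles for _ in range(k)]
    for task in range(n):
        for seg in range(k):
            grid[seg][task + seg] = f"T{task + 1}"
    return grid


# Print the diagram for 6 tasks on a 4-segment pipeline.
for seg, row in enumerate(space_time_diagram(k=4, n=6), start=1):
    print(f"Segment {seg}: " + " ".join(f"{cell:>2}" for cell in row))
```

Each row shifts one cycle to the right of the row above it, which is the staircase pattern characteristic of a space-time diagram.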

Assume a k-segment pipeline with clock cycle time Tp that executes n tasks.
The first task T1 passes through all k segments, so it completes after time kTp.
The remaining (n-1) tasks emerge from the pipeline one per clock cycle, so they
complete after a further (n-1)Tp.
Total no. of clock cycles required = k+(n-1)
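The cycle count k+(n-1) can be compared against a non-pipelined unit. The sketch below assumes, for illustration only, that the non-pipelined unit needs k clock cycles per task with no overlap, so it takes n*k cycles in total; the function names are hypothetical.

```python
def pipeline_cycles(k: int, n: int) -> int:
    """Clock cycles for n tasks on a k-segment pipeline: k + (n - 1)."""
    return k + (n - 1)


def nonpipelined_cycles(k: int, n: int) -> int:
    """Assumption: without pipelining every task takes k cycles, one after another."""
    return n * k


def speedup(k: int, n: int) -> float:
    """Ratio of non-pipelined to pipelined cycle counts."""
    return nonpipelined_cycles(k, n) / pipeline_cycles(k, n)


for n in (1, 10, 100, 10_000):
    print(f"k=4, n={n:>6}: {pipeline_cycles(4, n):>6} cycles, "
          f"speedup {speedup(4, n):.3f}")
```

Under this assumption the speedup approaches k as n grows, because the k-1 cycles spent filling the pipeline become negligible.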

Arithmetic Pipeline
• Pipelined arithmetic units are found in high-speed computers.
• They are used to implement floating-point operations, multiplication
of fixed-point numbers, and similar computations in scientific problems.
• e.g. when two floating-point numbers need to be added, the pipeline
sub-operations can be broken down as:
➢ Compare the exponents
➢ Align the mantissas
➢ Add the mantissas
➢ Normalize the result
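The four sub-operations can be traced in software. This is only a sketch: it assumes decimal numbers stored as (mantissa, exponent) pairs with value mantissa × 10^exponent and a normalized mantissa in the range 0.1 ≤ |m| < 1, and it runs the steps one after another rather than in overlapped hardware segments. The function name and the operand values are illustrative.

```python
from fractions import Fraction


def fp_add(a, b, base=10):
    """Add two (mantissa, exponent) pairs using the four pipeline sub-operations."""
    (ma, ea), (mb, eb) = a, b

    # Segment 1: compare the exponents; keep the larger one.
    diff = ea - eb
    exp = max(ea, eb)

    # Segment 2: align the mantissa that belongs to the smaller exponent.
    if diff > 0:
        mb = mb / Fraction(base) ** diff
    elif diff < 0:
        ma = ma / Fraction(base) ** (-diff)

    # Segment 3: add the mantissas.
    m = ma + mb

    # Segment 4: normalize so that 1/base <= |m| < 1 (zero is left unchanged).
    if m != 0:
        while abs(m) >= 1:
            m /= base
            exp += 1
        while abs(m) < Fraction(1, base):
            m *= base
            exp -= 1
    return m, exp


# 0.9504 x 10^3 + 0.8200 x 10^2
mantissa, exponent = fp_add((Fraction("0.9504"), 3), (Fraction("0.8200"), 2))
print(float(mantissa), exponent)
```

Exact fractions are used instead of binary floats so that the alignment and normalization steps do not introduce rounding noise into the trace.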

Instruction Pipeline
• An instruction pipeline reads consecutive instructions from memory while
previous instructions are being executed in other segments.
This causes the instruction fetch and execute phases to overlap and perform
simultaneous operations.

• Consider a two-segment pipeline with instruction fetch and execution units.
The fetch segment can be implemented using a FIFO queue. Whenever the execution
unit is not using the memory, the control increments the PC and uses its address
to fetch consecutive instructions from memory, storing them in the queue.

In the most general case, the steps needed to process each instruction are:
➢ Fetch the instruction from memory
➢ Decode the instruction
➢ Calculate the effective address
➢ Fetch the operands from memory
➢ Execute the instruction
➢ Store the results

Certain difficulties prevent an instruction pipeline from operating at its
maximum rate:
➢ Different segments take different times to operate on incoming information
➢ Some segments are skipped for certain operations
➢ Two or more segments may require memory access at the same time, causing one
segment to go into a wait state

As an example, take a 4-segment pipeline for instruction execution.

Vector Processing
• Vector processing is used in science and engineering problems that require a vast
number of calculations, which might otherwise take days or weeks to complete.

• Applications include:
Long-range weather forecasting
Seismic data analysis
Medical diagnosis
Artificial intelligence
Image processing

• In scientific problems, the data is usually formulated as vectors and
matrices of floating-point numbers.
• To access each element in these vectors, program loops are
introduced

• A computer capable of vector processing eliminates the overhead
associated with the time taken to fetch and execute the instructions in a
program loop

• A vector instruction includes the initial address of the operands, the length
of the vectors and the operation to be performed, all in one instruction
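To make the idea concrete, the sketch below models a vector instruction as a single record holding the operation, the initial operand addresses, and the vector length. The `VectorInstruction` layout, the `execute` helper, and the flat list standing in for memory are all hypothetical; in a real vector processor the element-by-element stepping is done by hardware rather than by a fetched and decoded program loop.

```python
import operator
from dataclasses import dataclass
from typing import Callable


@dataclass
class VectorInstruction:
    """Hypothetical encoding: one instruction describes the whole vector operation."""
    op: Callable[[float, float], float]  # operation to be performed
    src_a: int                           # initial address of the first operand vector
    src_b: int                           # initial address of the second operand vector
    dest: int                            # initial address of the result vector
    length: int                          # number of elements


def execute(memory: list[float], ins: VectorInstruction) -> None:
    """Stand-in for the vector hardware: apply op to every element pair."""
    for i in range(ins.length):
        memory[ins.dest + i] = ins.op(memory[ins.src_a + i], memory[ins.src_b + i])


# Vector A at address 0, vector B at address 3, result C at address 6.
memory = [1.0, 2.0, 3.0, 10.0, 20.0, 30.0, 0.0, 0.0, 0.0]
execute(memory, VectorInstruction(operator.add, src_a=0, src_b=3, dest=6, length=3))
print(memory[6:9])
```

The program issues one instruction for the whole addition instead of fetching and executing a load, add, store, increment, and branch sequence for every element.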

Pipeline Hazards
• Pipeline hazards are situations that prevent the next instruction in the instruction
stream from executing during its designated clock cycles.
• Any condition that causes a stall in the pipeline operations can be called a hazard.
• There are primarily three types of hazards:
Data Hazards
Control Hazards (or Instruction Hazards)
Structural Hazards

• Data Hazard
A data hazard is any condition in which either the source or the destination
operands of an instruction are not available at the time expected in the pipeline.
As a result, some operation has to be delayed and the pipeline stalls. This happens
whenever one of two instructions depends on data produced by the other.

• Structural Hazard
A structural hazard arises when two instructions require a given hardware
resource at the same time, so the pipeline must be stalled for one of the
instructions.
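A data hazard of the read-after-write kind can be spotted by checking whether an instruction reads a register that a nearby earlier instruction writes. The sketch below is deliberately simplified: the `Instr` record, the register names, and the `window` parameter (how many following instructions are still close enough to conflict) are assumptions, and it ignores forwarding hardware and intervening rewrites of the same register.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    text: str               # assembly text, for display only
    dest: str               # register written by the instruction
    srcs: tuple[str, ...]   # registers read by the instruction


def raw_hazards(program: list[Instr], window: int = 2) -> list[tuple[int, int, str]]:
    """Return (producer index, consumer index, register) for each dependency
    where the consumer follows the producer by at most `window` instructions."""
    hazards = []
    for i, producer in enumerate(program):
        for j in range(i + 1, min(i + 1 + window, len(program))):
            if producer.dest in program[j].srcs:
                hazards.append((i, j, producer.dest))
    return hazards


program = [
    Instr("ADD R1, R2, R3", "R1", ("R2", "R3")),
    Instr("SUB R4, R1, R5", "R4", ("R1", "R5")),  # reads R1 written just above
    Instr("MUL R6, R7, R8", "R6", ("R7", "R8")),
]
print(raw_hazards(program))
```

Here the SUB needs R1 before the ADD has necessarily produced it, so the pipeline must delay the SUB unless the result can be made available some other way.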

• Control Hazard
The instruction fetch unit of the CPU is responsible for providing a stream of
instructions to the execution unit. The fetch unit normally fetches instructions from
consecutive memory locations, and they are executed in that order. A problem arises
when one of the instructions is a branch to some other memory location. The
instructions already fetched into the pipeline from the consecutive locations are
then invalid and need to be removed. This induces a stall until new instructions are
fetched from the memory address specified in the branch instruction.

• Multi-processing improves the reliability of a system so that a failure in one part
has a limited effect on the rest of the system.

• If a fault causes one processor to fail, a second processor can be assigned to
perform the functions of the disabled processor.

• An overall function can be partitioned into a number of tasks, each handled by an
individual processor

• A program can be decomposed into tasks that can be executed in parallel

• A multi-processor system with common shared memory is called a shared-memory
or tightly-coupled multi-processor

• The alternative is a distributed-memory or loosely-coupled system, in which each
processor element has its own private local memory. The processors are tied
together by a switching scheme designed to route information from one processor
to another through a message-passing scheme.

Interconnection structures
• Physical forms available for establishing an interconnection network between
the various components of the computer system.

➢ Time-shared common bus

In a multiprocessor system, the time-shared common bus interconnection
structure provides a common communication path connecting all the functional
units, such as the processors, I/O processors and memory units.

• Only one processor can communicate with memory or another processor at any
given time. Transfer operations are conducted by the processor that is in control
of the bus at the time.
• Any other processor wishing to initiate a transfer must first determine the
availability status of the bus. When the bus becomes available, the processor
can address the destination unit to initiate the transfer.

• A single bus system is restricted to one transfer at a time, i.e. while one processor
is communicating with the memory, every other processor that needs the bus must
wait for it.

• One solution is to implement a dual bus structure

➢ Multi-port memory
A multiport memory structure employs separate buses between every memory module
and every CPU, so each processor is connected to each memory module.

The processor bus consists of address, data and control lines required to communicate
with the memory.

Each memory module has multiple ports and each port accommodates one of the
buses.

The module must have internal control logic to determine which port will have access
to memory at any given time.

Memory access conflicts are resolved by assigning fixed priorities to each memory port.

The disadvantage of this technique is that it requires expensive memory control logic
and a large number of connectors.
➢ Crossbar switch
This organization consists of a number of crosspoints placed at the intersections
between processor buses and memory module paths

The crosspoint consists of a switch that determines the path from a processor to a
memory module

Each crosspoint has control logic to set up the transfer path between a processor
and memory. It examines the address placed on the bus to determine whether its
particular module is being addressed.

It also resolves multiple requests for access to the same memory module on the
basis of a pre-determined priority.
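The arbitration behaviour can be sketched in a few lines. The model below is an assumption-laden simplification: each processor requests at most one memory module per cycle, and a lower processor number means a higher fixed priority. The function name and the request table are hypothetical.

```python
def crossbar_grant(requests: dict[int, int]) -> dict[int, int]:
    """Resolve one cycle of crossbar requests.

    `requests` maps processor id -> requested memory module.
    Returns memory module -> processor granted access this cycle.
    Assumption: the lower processor id has the higher priority.
    """
    grants: dict[int, int] = {}
    for cpu in sorted(requests):          # visit processors in priority order
        module = requests[cpu]
        if module not in grants:          # module still free this cycle
            grants[module] = cpu
    return grants


# Processors 0 and 1 both want module 2; processors 2 and 3 want modules 0 and 1.
requests = {0: 2, 1: 2, 2: 0, 3: 1}
grants = crossbar_grant(requests)
waiting = sorted(cpu for cpu in requests if grants[requests[cpu]] != cpu)
print("granted:", grants)
print("waiting:", waiting)
```

Three transfers proceed in the same cycle because they target different modules, and only processor 1 has to wait. On a single time-shared bus the same four requests would be served one at a time.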
