Unit5 Parallel Processing Multiprocessor

The document discusses CPU performance, parallel processing, and multi-processor systems, highlighting the importance of processor time, clock cycles, and cache memory in program execution. It explains pipelining as a technique to enhance computational speed by executing sub-operations simultaneously and outlines various pipeline hazards that can impede performance. Additionally, it covers multi-processing systems, interconnection structures, and their implications for memory access and system reliability.

Uploaded by

Vabhav Mehta

COMPUTER ORGANIZATION AND ARCHITECTURE

Unit-5: CPU Performance, Parallel Processing, Multi-processor


• The elapsed time for the execution of a program depends on all units in a
computer system, whereas the processor time depends only on the hardware involved
in the execution of individual machine instructions. This hardware comprises the
processor and the memory, which are usually connected by a bus.

• Let us examine the flow of program instructions and data between the memory
and the processor. At the start of execution, all program instructions and the
required data are stored in the main memory. As the execution proceeds,
instructions are fetched one by one over the bus into the processor, and a copy is
placed in the cache. Later, if the same instruction or data item is needed a second
time, it is read directly from the cache.

03/08/2025 2
• The processor and a relatively small cache memory can be fabricated on a single IC
chip. The internal speed of performing the basic steps of instruction processing
on the chip is very high and is considerably faster than the speed at which
instructions and data can be fetched from the main memory. A program will be
executed faster if the movement of instructions and data between the main
memory and the processor is minimized, which is achieved by using the cache.

• Processor Clock
Processor circuits are controlled by a timing signal called the clock. The clock defines
regular time intervals called clock cycles. To execute a machine instruction, the
processor divides the action to be performed into a sequence of basic steps such that
each step can be completed in one clock cycle. The length of one clock cycle is an
important parameter that affects the processor performance.

• Basic performance equation


Let “T” be the processor time required to execute a program that has been prepared
in some high-level language. The compiler generates a machine language object
program that corresponds to the source program. Assume that complete execution of
the program requires the execution of “N” machine language instructions. Some
instructions may be executed more than once, which is the case for instructions inside
a program loop; others may not be executed at all, depending on the input data used.
Suppose that the average number of basic steps needed to execute one machine
instruction is “S”, where each basic step is completed in one clock cycle. If the
clock rate is “R” cycles per second, the program execution time is given by

T = (N × S) / R
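The basic performance equation, T = (N × S) / R, can be evaluated with a few lines of Python. This is only a sketch: the function name and the sample values of N, S and R are assumptions chosen for illustration, not figures from the slides.

```python
def execution_time(n_instructions, steps_per_instruction, clock_rate_hz):
    """Processor time T = (N * S) / R, in seconds."""
    return n_instructions * steps_per_instruction / clock_rate_hz

# Hypothetical program: 1 million instructions, 4 basic steps each, 500 MHz clock.
t = execution_time(n_instructions=1_000_000,
                   steps_per_instruction=4,
                   clock_rate_hz=500e6)
print(f"T = {t * 1e3:.1f} ms")
```

Reducing N or S, or raising R, shortens T; in practice the three parameters are not independent of one another.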

• Clock rate
The number of clock cycles a processor completes in one second.
Generally, it is said that the higher the clock speed, the faster the CPU. But this may
not be the only reason for a faster CPU. There are many other factors, like the
number of processors, speed of RAM, bus speed, size of cache etc. Some
instructions require more cycles from the CPU to be completed. Depending upon
the architecture of the CPU, the clock speed can be more or less important.

MIPS: Million instructions per second
A measure of the execution speed of a computer. It gives, in millions, the
approximate number of machine instructions that a computer can execute in one
second.
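Using the symbols of the basic performance equation (N instructions, S basic steps per instruction, clock rate R), a MIPS rating can be sketched as N / (T × 10^6), which simplifies to R / (S × 10^6). The function name and the sample numbers below are assumptions for illustration.

```python
def mips_rating(clock_rate_hz, steps_per_instruction):
    """MIPS = N / (T * 10**6) = R / (S * 10**6), since T = N * S / R."""
    return clock_rate_hz / (steps_per_instruction * 1e6)

# Hypothetical processor: 500 MHz clock, 4 clock cycles per instruction on average.
print(mips_rating(clock_rate_hz=500e6, steps_per_instruction=4), "MIPS")
```

Because S is an average over a particular program, the same machine yields different MIPS figures for different instruction mixes.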

Parallel processing and Pipelining
• Parallel processing denotes a large class of techniques that perform data-processing
tasks simultaneously in order to increase the computational speed of a computer

• Pipelining is a technique of dividing a sequential process into sub-operations, with
each sub-operation being executed in a dedicated segment that works
simultaneously with the other segments

• Each segment in a pipeline performs partial processing and the result obtained
from one segment is transferred to the next segment

• The final result is achieved after the data has passed through all the segments
• The simplest example of a pipeline uses an input register and a digital
combinational circuit in each segment. The register holds the data and the digital
circuit performs the sub-operation. The output of the digital circuit is then fed to
the data register of the next segment.

Each operand needs to pass through all four segments in a fixed sequence.

Each segment has a combinational circuit Si that performs the sub-operation on a
data stream. The segments are separated by registers Ri that hold the intermediate
results between stages.
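The register-plus-circuit arrangement can be sketched in Python as below. Each entry of `stages` plays the role of a combinational circuit Si and each entry of `registers` the role of a register Ri. The four arithmetic sub-operations, the `EMPTY` marker and the function name are hypothetical, chosen only to make the sketch runnable.

```python
EMPTY = object()  # marks a register that currently holds no operand

def run_pipeline(stages, inputs):
    """Clock a stream of operands through the segments, one step per cycle."""
    pending = list(inputs)
    registers = [EMPTY] * len(stages)  # registers[i] feeds stages[i]
    results = []
    cycles = 0
    while pending or any(r is not EMPTY for r in registers):
        cycles += 1
        if pending:
            registers[0] = pending.pop(0)  # load the next operand into R1
        # Every segment works on its own register during the same clock cycle.
        outputs = [EMPTY if r is EMPTY else stage(r)
                   for stage, r in zip(stages, registers)]
        if outputs[-1] is not EMPTY:
            results.append(outputs[-1])  # a finished result leaves the pipeline
        # At the end of the cycle each output moves to the next segment's register.
        registers = [EMPTY] + outputs[:-1]
    return results, cycles

# Four hypothetical sub-operations standing in for S1..S4.
stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x ** 2]
results, cycles = run_pipeline(stages, [1, 2, 3])
print(results, "in", cycles, "clock cycles")
```

In hardware all segments operate at once; the list comprehension only models that behaviour one cycle at a time.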

Space time diagram
A space-time diagram illustrates the behaviour of a pipeline by showing the segment
utilization as a function of time.
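A space-time diagram can be generated as in the following sketch, which assumes that every segment takes exactly one clock cycle and that a new task enters the pipeline on every cycle. The function names and the sizes (4 segments, 6 tasks) are illustrative assumptions.

```python
def space_time_diagram(k, n):
    """Return one row per segment; each cell names the task in that segment."""
    total_cycles = k + (n - 1)
    rows = []
    for segment in range(1, k + 1):
        cells = []
        for cycle in range(1, total_cycles + 1):
            task = cycle - segment + 1  # task i reaches segment s in cycle i + s - 1
            cells.append(f"T{task}" if 1 <= task <= n else "--")
        rows.append(cells)
    return rows

def print_diagram(rows):
    header = "Cycle    " + " ".join(str(c).ljust(3) for c in range(1, len(rows[0]) + 1))
    print(header)
    for number, cells in enumerate(rows, start=1):
        print(f"Segment {number}".ljust(9) + " ".join(c.ljust(3) for c in cells))

rows = space_time_diagram(k=4, n=6)
print_diagram(rows)
```

Each row is a segment and each column a clock cycle; the "--" cells at the start and end are the cycles in which the pipeline is filling or draining.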

Assume a k-segment pipeline with clock cycle time Tp that executes n tasks.
The first task T1 needs time kTp to be completely executed.
The remaining (n-1) tasks leave the pipeline one per clock cycle, so they are
completed after a further time (n-1)Tp.
Total number of clock cycles required = k + (n-1)
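The cycle count k + (n-1) can be checked numerically with the sketch below. The non-pipelined figure used for comparison assumes that a unit without pipelining needs k cycles for every task; that assumption, the function names and the sample values are not taken from the slides.

```python
def pipeline_cycles(k, n):
    """Clock cycles for n tasks on a k-segment pipeline: k + (n - 1)."""
    return k + (n - 1)

def non_pipelined_cycles(k, n):
    """Assumption: without pipelining each task occupies the unit for k cycles."""
    return k * n

k, n, tp_ns = 4, 6, 20  # hypothetical: 4 segments, 6 tasks, 20 ns clock cycle
cycles = pipeline_cycles(k, n)
print("pipelined cycles:    ", cycles)
print("pipelined time (ns): ", cycles * tp_ns)
print("non-pipelined cycles:", non_pipelined_cycles(k, n))
print("speedup:             ", round(non_pipelined_cycles(k, n) / cycles, 2))
```

Under this assumption the ratio n·k / (k + n - 1) approaches k as the number of tasks n grows large.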

Arithmetic Pipeline
• Pipelined arithmetic units are found in high speed
computers.
• Used to implement floating-point operations, multiplication
of fixed-point numbers and similar computations encountered in scientific problems.
• e.g. to add two floating-point numbers,
the pipeline sub-operations can be broken down as:
➢ Compare the exponents
➢ Align the mantissas
➢ Add the mantissas
➢ Normalize the result
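The four sub-operations can be sketched as four small Python functions, one per segment. The sketch assumes decimal numbers stored as (mantissa, exponent) pairs with a fractional mantissa, and uses exact fractions to avoid rounding; the representation, function names and operand values are illustrative assumptions. It processes one pair of operands step by step and does not model the overlap between successive pairs.

```python
from fractions import Fraction

BASE = 10  # decimal mantissas keep the example readable

def compare_exponents(x, y):
    """Segment 1: subtract the exponents and keep the larger one."""
    (_, ex), (_, ey) = x, y
    return ex - ey, max(ex, ey)

def align_mantissas(x, y, diff):
    """Segment 2: shift right the mantissa that has the smaller exponent."""
    (mx, _), (my, _) = x, y
    if diff >= 0:
        return mx, my / BASE ** diff
    return mx / BASE ** (-diff), my

def add_mantissas(mx, my):
    """Segment 3: add the aligned mantissas."""
    return mx + my

def normalize(mantissa, exponent):
    """Segment 4: adjust until 0.1 <= |mantissa| < 1 (zero is left alone)."""
    while abs(mantissa) >= 1:
        mantissa /= BASE
        exponent += 1
    while mantissa != 0 and abs(mantissa) < Fraction(1, BASE):
        mantissa *= BASE
        exponent -= 1
    return mantissa, exponent

def float_add(x, y):
    diff, exponent = compare_exponents(x, y)
    mx, my = align_mantissas(x, y, diff)
    return normalize(add_mantissas(mx, my), exponent)

# Hypothetical operands: 0.9504 x 10^3 and 0.8200 x 10^2.
x = (Fraction(9504, 10000), 3)
y = (Fraction(8200, 10000), 2)
mantissa, exponent = float_add(x, y)
print(f"{float(mantissa)} x 10^{exponent}")
```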

Instruction Pipeline
• An instruction pipeline reads consecutive instructions from memory while
previous instructions are being executed in other segments.
This causes the instruction fetch and execute phases to overlap and perform
simultaneous operations.

• Consider a two-segment pipeline with instruction fetch and execution units.
The fetch segment can be implemented using a FIFO queue. Whenever the execution
unit is not using the memory, the control increments the PC and uses its address
to fetch the consecutive instructions from memory and store them
in the queue.
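The two-segment arrangement can be sketched with a FIFO queue between the fetch and execute units. The sketch makes simplifying assumptions that are not in the slides: every instruction executes in one cycle, the execution unit never needs the memory, and there are no branches. The function name, queue capacity and instruction names are hypothetical.

```python
from collections import deque

def run_two_segment(program, queue_capacity=4):
    """Overlap instruction fetch and execute through a FIFO queue."""
    queue = deque()   # instructions fetched but not yet executed
    pc = 0            # program counter: index of the next instruction to fetch
    executed = []
    cycles = 0
    while pc < len(program) or queue:
        cycles += 1
        # Execute segment: take the oldest instruction waiting in the queue.
        if queue:
            executed.append(queue.popleft())
        # Fetch segment: memory is assumed free, so fetch the next instruction.
        if pc < len(program) and len(queue) < queue_capacity:
            queue.append(program[pc])
            pc += 1
    return executed, cycles

program = ["LOAD", "ADD", "STORE"]  # hypothetical instruction names
executed, cycles = run_two_segment(program)
print(executed, "in", cycles, "cycles")
```

An instruction fetched in one cycle is executed in the next, so fetch of instruction i+1 overlaps with execution of instruction i.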

In the most general case, the steps needed to process each instruction are:
➢ Fetch the instruction from memory
➢ Decode the instruction
➢ Calculate the effective address
➢ Fetch operands from memory
➢ Execute the instruction
➢ Store the results

There are certain difficulties that prevent an instruction pipeline from operating at its
maximum rate:
➢ Different segments take different times to operate on incoming information
➢ Some segments get skipped for certain operations
➢ Two or more segments require memory access at the same time, causing one
segment to go into a wait state

As an example, take a 4-segment pipeline for instruction execution

Vector Processing
• Utilized in science and engineering problems where a vast number of calculations
are required, which might take days or weeks to complete.

• Applications include:
➢ Long-range weather forecasting
➢ Seismic data analysis
➢ Medical diagnosis
➢ Artificial intelligence
➢ Image processing

• In scientific problems, the data is usually formulated as vectors and
matrices of floating-point numbers.
• To access each element in these vectors, program loops are
introduced.

• A computer capable of vector processing eliminates the overhead
associated with the time taken to fetch and execute the instructions in a
program loop.

• A vector instruction includes the initial address of the operands, the length
of the vectors and the operation to be performed, all in one instruction.
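A vector instruction of this kind can be modelled as a single record holding the operation, the initial addresses and the vector length. The field names, the operation codes and the memory layout below are hypothetical, and the Python loop inside `execute_vector` only stands in for what vector hardware would do without fetching and decoding an instruction per element.

```python
import operator
from dataclasses import dataclass

@dataclass
class VectorInstruction:
    op: str        # operation to be performed
    src_a: int     # initial address of the first operand vector
    src_b: int     # initial address of the second operand vector
    dest: int      # initial address of the result vector
    length: int    # number of elements in each vector

OPERATIONS = {"ADD": operator.add, "MUL": operator.mul}

def execute_vector(memory, instr):
    """Apply one vector instruction to a flat memory modelled as a list."""
    fn = OPERATIONS[instr.op]
    for i in range(instr.length):
        memory[instr.dest + i] = fn(memory[instr.src_a + i],
                                    memory[instr.src_b + i])

# Memory layout (assumed): A at address 0, B at address 4, C at address 8.
memory = [1, 2, 3, 4, 10, 20, 30, 40, 0, 0, 0, 0]
execute_vector(memory, VectorInstruction("ADD", src_a=0, src_b=4, dest=8, length=4))
print(memory[8:12])
```

One instruction replaces the loop that a scalar program would need, along with that loop's repeated instruction fetches, index updates and branch tests.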

Pipeline Hazards
• Pipeline hazards are situations that prevent the next instruction in the instruction
stream from executing during its designated clock cycles.
• Any condition that causes a stall in the pipeline operations can be called a hazard.
• There are primarily three types of hazards:

➢ Data hazards
➢ Control hazards (also called instruction hazards)
➢ Structural hazards

• Data Hazard
Any condition in which either the source or the destination operands of an
instruction are not available at the time expected in the pipeline. As a result,
some operation has to be delayed and the pipeline stalls. This happens whenever
one of two instructions depends on the data produced by the other.
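A dependency of this kind can be detected by comparing the destination of one instruction with the sources of those that follow it. The sketch below checks only the read-after-write case; the instruction format, register names and the `window` parameter (how many following instructions are still close enough to conflict) are assumptions for illustration.

```python
from typing import NamedTuple, Tuple

class Instr(NamedTuple):
    text: str                 # assembly-like text, for display only
    dest: str                 # register written by the instruction
    sources: Tuple[str, ...]  # registers read by the instruction

def find_raw_hazards(program, window=1):
    """List (producer, consumer, register) triples where a later instruction
    reads a register written by one of the `window` instructions before it."""
    hazards = []
    for i, producer in enumerate(program):
        for consumer in program[i + 1 : i + 1 + window]:
            if producer.dest in consumer.sources:
                hazards.append((producer.text, consumer.text, producer.dest))
    return hazards

program = [
    Instr("ADD R1, R2, R3", "R1", ("R2", "R3")),
    Instr("SUB R4, R1, R5", "R4", ("R1", "R5")),  # needs R1 from the ADD
    Instr("MUL R6, R7, R8", "R6", ("R7", "R8")),  # independent
]
for producer, consumer, reg in find_raw_hazards(program):
    print(f"'{consumer}' must wait for {reg} from '{producer}'")
```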

• Structural Hazard
This situation arises mainly when two instructions require a given hardware
resource at the same time and hence for one of the instructions the pipeline needs
to be stalled.

• Control Hazard
The instruction fetch unit of the CPU is responsible for providing a stream of
instructions to the execution unit. The instructions fetched by the fetch unit are in
consecutive memory locations and are executed in sequence. However, a problem arises
when one of the instructions is a branch to some other memory
location. All the instructions already fetched into the pipeline from consecutive
memory locations are then invalid and need to be removed. This induces a stall until
new instructions are fetched from the memory address specified in the branch
instruction.
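The cost of such stalls can be estimated with a simple model. The sketch assumes a k-segment pipeline in which a taken branch is resolved only in the last segment, so the k-1 instructions fetched behind it are discarded; this model, the function name and the sample numbers are assumptions, and real processors reduce the penalty with techniques not covered here.

```python
def cycles_with_branches(k, n, taken_branches):
    """Cycles for n completed instructions on a k-segment pipeline when each
    taken branch discards the k - 1 instructions fetched behind it."""
    ideal = k + (n - 1)                  # no hazards
    penalty = taken_branches * (k - 1)   # refill after every taken branch
    return ideal + penalty

# Hypothetical run: 4 segments, 10 instructions.
print("no branches:      ", cycles_with_branches(4, 10, 0))
print("2 taken branches: ", cycles_with_branches(4, 10, 2))
```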

• Multi-processing improves the reliability of a system so that a failure in one part
has limited effect on the rest of the system.

• If a fault causes one processor to fail, a second processor can be assigned to
perform the functions of the disabled processor.

• An overall function can be partitioned into a number of tasks, each handled by
a processor individually.

• A program can be decomposed into parallel executable tasks.
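Decomposing a program into parallel executable tasks can be sketched with Python's standard `concurrent.futures` module. Worker threads stand in for processors here, and the problem (a sum of squares), the chunking scheme and the function names are assumptions. In standard CPython, threads do not speed up CPU-bound code, so the sketch shows the decomposition rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    """One task: handle an independent slice of the data."""
    return sum(x * x for x in chunk)

def parallel_sum_of_squares(data, workers=4):
    """Partition the overall function into tasks and combine their results."""
    size = max(1, -(-len(data) // workers))  # ceiling division
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, chunks))

print(parallel_sum_of_squares(list(range(1, 101))))
```

The tasks are independent of one another, which is what allows them to be handed to different processors.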

• A multi-processor system with common shared memory is called a shared-memory
or tightly-coupled multi-processor.

• The alternative is a distributed-memory or loosely-coupled
system, in which each processor element has its own private local memory. The
processors are tied together by a switching scheme designed to route information
from one processor to another through a message-passing scheme.

Interconnection structures
• Several physical forms are available for establishing an interconnection network
between the various components of the computer system.

➢ Time-shared common bus
In a multiprocessor system, the time-shared common bus interconnection
structure provides a common communication path connecting all the functional
units such as the processors, I/O processors and memory units.

• Only one processor can communicate with memory or another processor at any
given time. Transfer operations are conducted by the processor that is in control
of the bus at the time.
• Any other processor wishing to initiate a transfer must first determine the
availability status of the bus and when the bus becomes available, the processor
can address the destination unit to initiate transfer.

• A single bus system is restricted to one transfer at a time i.e. when one processor
is communicating with the memory all other processors are idle waiting for the
bus.

• One solution for this is to implement a dual bus structure
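The one-transfer-at-a-time restriction of a single common bus can be sketched as follows. The sketch assumes that all requests are pending at time 0 and that the bus is granted in arrival order; the function name, processor names and transfer lengths are hypothetical.

```python
from collections import deque

def simulate_common_bus(requests):
    """requests: list of (processor, transfer_cycles) in arrival order.
    Returns (processor, start_cycle, end_cycle) for every transfer."""
    pending = deque(requests)
    clock = 0
    schedule = []
    while pending:
        processor, duration = pending.popleft()  # this processor controls the bus
        schedule.append((processor, clock, clock + duration))
        clock += duration                        # all other processors wait
    return schedule

for processor, start, end in simulate_common_bus([("P1", 2), ("P2", 1), ("P3", 3)]):
    print(f"{processor}: waits {start} cycles, uses the bus during cycles {start}..{end}")
```

The waiting time of each processor grows with the number of processors sharing the bus, which is the bottleneck described above.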

➢ Multi-port memory
A multiport memory structure employs separate buses between each memory module
and each CPU. Every processor in a multiport memory system is connected to each
memory unit.

The processor bus consists of address, data and control lines required to communicate
with the memory.

Each memory module has multiple ports and each port accommodates one of the
buses.

The module must have internal control logic to determine which port will have access
to memory at any given time.

Memory access conflicts are resolved by assigning fixed priorities to each memory port.

The disadvantage of this technique is that it requires expensive memory control logic
and a large number of connectors.
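The fixed-priority rule inside one memory module can be sketched as below. Port numbers and the priority order are hypothetical; the function only models which port the module's control logic would serve in a given cycle.

```python
def grant_port(requesting_ports, priority_order):
    """Return the requesting port with the highest fixed priority, or None."""
    for port in priority_order:      # priority_order lists ports, highest first
        if port in requesting_ports:
            return port
    return None

priority_order = [0, 1, 2, 3]        # assumed: port 0 has the highest priority
print(grant_port({3, 1}, priority_order))
```

Ports that lose the arbitration must retry in a later cycle, so a low-priority processor can be delayed repeatedly when a module is busy.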
➢ Crossbar switch
This organization consists of a number of crosspoints placed at the intersections
between processor buses and memory module paths.

Each crosspoint consists of a switch that determines the path from a processor to a
memory module.

Each crosspoint has control logic to set up the transfer path between processor
and memory. It examines the address that is placed on the bus to determine
whether its particular module is being addressed.

It also resolves multiple requests for access to the same memory module on the basis
of a pre-determined priority.
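The behaviour of the crosspoints in one cycle can be sketched as follows. Requests to different memory modules are granted together, while requests to the same module are resolved by a pre-determined priority. The processor and module names, the priority list and the function name are assumptions for illustration.

```python
def crossbar_grant(requests, priority):
    """requests: dict mapping processor -> memory module it addresses.
    priority: processors listed from highest to lowest priority.
    Returns a dict mapping each requested module -> processor connected to it."""
    granted = {}
    for processor in priority:
        module = requests.get(processor)
        if module is not None and module not in granted:
            granted[module] = processor  # close the crosspoint for this pair
    return granted

requests = {"P1": "M2", "P2": "M2", "P3": "M1"}  # P1 and P2 conflict on M2
connections = crossbar_grant(requests, priority=["P1", "P2", "P3"])
print(connections)
```

Here P1 and P3 transfer simultaneously over separate paths, while P2 must wait for module M2.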
