Parallel processing: - The purpose of parallel processing is to increase the computational speed
of the CPU. Instead of processing each instruction sequentially, a parallel processor performs
concurrent data processing tasks to achieve faster execution time. For example, while an instruction
is being executed in the ALU, the next instruction can be read from memory. If two or more ALUs are
included within the processor unit, two or more instructions can execute at the same time. The
purpose of parallel processing is to speed up the processing capability of the computer. The amount
of hardware increases with parallel processing, and with it the cost of the system increases.
To reduce the cost, some parallel processing techniques are used:
1. A large computation can be divided into many parts that can be performed in parallel
2. Pipeline processing
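As a minimal sketch of the first technique, the Python example below splits one large summation into parts and gives each part to a separate worker. The function name `parallel_sum`, the chunking scheme and the use of a thread pool are illustrative assumptions, not something these notes prescribe. In CPython, threads show the structure of the idea rather than guarantee a speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_sum(values, parts=4):
    """Sum `values` by dividing the work into up to `parts` independent pieces."""
    if not values:
        return 0
    # Split the large computation into roughly equal, independent parts.
    size = (len(values) + parts - 1) // parts
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    # Each part is summed by its own worker; the partial sums are combined at the end.
    with ThreadPoolExecutor(max_workers=parts) as pool:
        partial_sums = list(pool.map(sum, chunks))
    return sum(partial_sums)


numbers = list(range(1, 101))
total = parallel_sum(numbers)
```

The parts are independent of each other, which is what allows them to be computed in parallel; only the final combination step has to wait for all of them.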
Parallel structure: -
SISD: - A single-processor computer system is called a single instruction stream, single data
stream (SISD) system.
Instruction stream: - The sequence of instructions read from memory constitutes an instruction
stream. The operations performed on the data in the processor constitute a data stream. A program
executed by the processor constitutes a single instruction stream, and the sequence of data items
that it operates on constitutes a single data stream.
SIMD: - Single instruction stream, multiple data stream. A single stream of instructions is
broadcast to a number of processors, each of which operates on its own data. A system in which all
processors execute the same program but operate on different data is called SIMD.
MIMD: - Multiple instruction stream, multiple data stream. It involves a number of independent
processors, each executing a different program and accessing its own sequence of data items. Such a
mechanism is called MIMD.
MISD: - Multiple instruction stream, single data stream. In such a system a common data structure
is manipulated by separate processors, each executing a different program. Parallel processing can
be established by distributing the data among the functional units. All functional units are under
the supervision of a control unit.
[Figure: processor with multiple functional units. The processor registers, connected to memory,
feed an adder-subtractor, integer multiply unit, logic unit, shift unit, incrementer,
floating-point add-subtract unit, floating-point multiply unit and a third floating-point unit.]
The figure shows one possible way of separating the execution unit into eight functional units
operating in parallel.
Definition: - Parallel processing is a technique in which a multiprocessor system is used to solve
complex problems faster by breaking the problem into parts that are processed simultaneously by the
different processors of the multiprocessor system.
Definition of system bus: - A bus that connects the major components in a multiprocessor system,
such as CPUs, IOPs and memory, is called a system bus. A typical system bus consists of
approximately 100 signal lines. These lines are address, data and control lines, together with power
distribution lines that supply power to the components.
Pipelining
Pipelining is a technique of decomposing a sequential process into suboperations, with each
subprocess being executed in a special dedicated segment that operates concurrently with all other
segments. Pipelining provides a way to start a new task before an old one has been completed. Each
segment performs partial processing. The result obtained from the computation in each segment is
transferred to the next segment in the pipeline. The final result is obtained after the data have
passed through all segments.
[The characteristic of a pipeline is that several computations can be in progress in distinct
segments at the same time.]
Let’s consider an example that performs the combined multiply and add operation on a stream of
numbers:
For i = 1 to 7 do
A[i] * B[i] + C[i];
Each segment consists of an input register followed by a combinational circuit. The register holds
the data and the combinational circuit performs the suboperation in that particular segment.
[The overlapping of computation is made possible by associating a register with each segment in the
pipeline. The registers provide isolation between the segments so that each can operate on distinct
data simultaneously.]
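[Figure: pipeline for the multiply-add example. Registers R1 and R2 feed the multiplier, the
multiplier output loads R3 while C[i] loads R4, and R3 and R4 feed the adder, whose output loads R5.]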
R1 through R5 are registers that receive new data with every clock pulse. The multiplier and adder
are combinational circuits. The suboperations performed in each segment can be written as
R1 ← A[i], R2 ← B[i]          Input A[i] and B[i]
R3 ← R1 * R2, R4 ← C[i]       Multiply and input C[i]
R5 ← R3 + R4                  Add C[i] to product
The five registers are loaded with new data every clock pulse.
The first clock pulse transfers the values A[1] and B[1] into registers R1 and R2 respectively.
The second clock pulse transfers the product of R1 and R2 into register R3 and the input C[1] into
register R4. The same clock pulse transfers A[2] and B[2] into R1 and R2. The third clock pulse
places A[3] and B[3] into R1 and R2, transfers the product of R1 and R2 into R3, transfers C[2]
into R4, and places the sum of R3 and R4 into R5.
Clock pulse   Segment 1      Segment 2        Segment 3
number        R1     R2      R3       R4      R5
1             A1     B1      -        -       -
2             A2     B2      A1*B1    C1      -
3             A3     B3      A2*B2    C2      A1*B1+C1
4             A4     B4      A3*B3    C3      A2*B2+C2
5             A5     B5      A4*B4    C4      A3*B3+C3
6             A6     B6      A5*B5    C5      A4*B4+C4
7             A7     B7      A6*B6    C6      A5*B5+C5
8             -      -       A7*B7    C7      A6*B6+C6
9             -      -       -        -       A7*B7+C7
The process continues in the same manner for the first seven clock pulses. At the eighth clock
pulse there is no more data to input, so segment 1 of the pipeline, consisting of registers R1 and
R2, lies idle, while the product of the previous R1 and R2 values is computed and stored in register
R3 and the last value of C is input into register R4. At the ninth clock pulse both segments 1 and 2
lie idle, while segment 3 completes its computation. When no more input data are available, the
clock continues until the pipeline produces the last output.
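The clock-by-clock behaviour described above can be sketched in Python as a small simulation. The function name `simulate_pipeline`, the use of `None` to mark an idle register and the sample input values are assumptions made for illustration only. Python lists are indexed from 0, so `A[0]` plays the role of A[1] in the notes.

```python
def simulate_pipeline(A, B, C):
    """Simulate the three-segment multiply-add pipeline, one clock pulse at a time.

    Returns (results, pulses): the values that appeared in R5, in order, and
    the number of clock pulses needed to produce the last output.
    """
    n = len(A)
    R1 = R2 = R3 = R4 = R5 = None   # None marks an empty (idle) register
    results = []
    pulse = 0
    while len(results) < n:
        pulse += 1
        # All registers load on the same clock pulse, so the new values are
        # computed from the OLD register contents before anything is overwritten.
        new_R5 = R3 + R4 if R3 is not None else None        # segment 3: add
        new_R3 = R1 * R2 if R1 is not None else None        # segment 2: multiply
        new_R4 = C[pulse - 2] if R1 is not None else None   # segment 2: input C[i]
        if pulse <= n:                                      # segment 1: input A[i], B[i]
            new_R1, new_R2 = A[pulse - 1], B[pulse - 1]
        else:
            new_R1 = new_R2 = None                          # no more input: segment 1 idles
        R1, R2, R3, R4, R5 = new_R1, new_R2, new_R3, new_R4, new_R5
        if R5 is not None:
            results.append(R5)
    return results, pulse


# Hypothetical sample data for seven multiply-add operations.
A = [1, 2, 3, 4, 5, 6, 7]
B = [7, 6, 5, 4, 3, 2, 1]
C = [1, 1, 1, 1, 1, 1, 1]
results, pulses = simulate_pipeline(A, B, C)
```

For seven items passing through three segments the simulation takes 3 + 7 - 1 = 9 clock pulses, the same count as in the table of register contents.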
Array Processors
Array processors are highly specialized machines that perform computations on large arrays of
data. The two types of array processor are:
Attached array processor
SIMD array processor
Attached Array Processor
An attached array processor is an auxiliary processor attached to a general-purpose computer.
It is designed as a peripheral to the host computer, and its purpose is to improve the performance
of the computer by providing vector processing for complex scientific applications.
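[Figure: attached array processor with host computer. The general-purpose computer connects through
an input-output interface to the attached array processor. The main memory of the computer connects
to the local memory of the array processor through a high-speed memory-to-memory bus.]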
It has multiple functional units, which work in parallel on different streams of data. It includes
an arithmetic unit consisting of one or more floating-point multipliers and adders.
The host computer is any general-purpose computer, and the array processor is a back-end machine
driven by the host computer. The array processor is connected through an input-output controller to
the host computer, which treats it like an external interface. The data for the attached processor
are transferred from main memory to a local memory through a high-speed bus. Without the attached
processor, the general-purpose computer serves users that need conventional data processing. With
the attached processor, the system meets the needs of complex arithmetic applications.
SIMD Array Processors
A SIMD array processor is a processor that has a single-instruction multiple-data organization. It
consists of multiple processing elements (PEs), each having a local memory (M), under the
supervision of one master control unit. Because each PE works on its own data, an array processor
can handle multiple data streams.
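[Figure: SIMD array processor organization. A master control unit, connected to main memory, drives
the processing elements PE1, PE2, PE3, ..., PEn, each with its own local memory M1, M2, M3, ..., Mn.]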
The main memory is used for storage of the program. The function of the master control unit is to
decode the instructions and determine how each instruction is to be executed. Scalar and program
control instructions are executed directly within the master control unit. SIMD processors are
highly specialized computers. They are suited to numerical problems that can be expressed in vector
or matrix form.
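The division of work between the master control unit and the processing elements can be sketched in Python as follows. The class names, the instruction tuples (`"SET"`, `"VECTOR"`) and the operation names are all hypothetical and chosen only for this illustration. The loop over PEs runs one after another in software, whereas in the hardware described above all PEs respond to the broadcast instruction at the same time.

```python
class ProcessingElement:
    """One PE with its own local memory (a dict of named operands)."""

    def __init__(self, local_memory):
        self.memory = dict(local_memory)

    def execute(self, op, dest, src1, src2):
        # Every PE runs the same broadcast instruction on its own local data.
        a, b = self.memory[src1], self.memory[src2]
        if op == "ADD":
            self.memory[dest] = a + b
        elif op == "MUL":
            self.memory[dest] = a * b
        else:
            raise ValueError(f"unknown vector operation: {op}")


class MasterControlUnit:
    """Decodes instructions; runs scalar ones itself and broadcasts vector ones."""

    def __init__(self, elements):
        self.elements = elements
        self.scalars = {}

    def run(self, program):
        for instr in program:
            kind = instr[0]
            if kind == "SET":        # scalar instruction: executed in the control unit
                _, name, value = instr
                self.scalars[name] = value
            elif kind == "VECTOR":   # vector instruction: broadcast to every PE
                _, op, dest, src1, src2 = instr
                for pe in self.elements:
                    pe.execute(op, dest, src1, src2)
            else:
                raise ValueError(f"unknown instruction kind: {kind}")


# Four PEs, each holding one element of the vectors x and y in local memory.
pes = [ProcessingElement({"x": i, "y": 10 * i}) for i in range(1, 5)]
mcu = MasterControlUnit(pes)
mcu.run([("SET", "count", 4), ("VECTOR", "ADD", "z", "x", "y")])
z_values = [pe.memory["z"] for pe in pes]
```

A single `VECTOR` instruction produces one element of z = x + y in every PE, which is why problems in vector or matrix form map well onto this organization.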
Flynn’s Classification
Flynn’s classification is based on the multiplicity of instruction streams and data streams in a
computer system. The sequence of instructions read from memory constitutes the instruction
stream, and the data they operate on in the processor constitute the data stream.
Flynn’s classification divides computers into 4 categories:
1. Single instruction stream, single data stream (SISD)
2. Single instruction stream, multiple data stream (SIMD)
3. Multiple instruction stream, single data stream (MISD)
4. Multiple instruction stream, multiple data stream (MIMD)
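Since the four categories depend only on whether there is one stream or more than one stream of each kind, the classification can be written as a small lookup. The function name `flynn_category` and its two integer parameters are assumptions for this sketch.

```python
def flynn_category(instruction_streams, data_streams):
    """Return the Flynn category for the given numbers of streams."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("a computer needs at least one stream of each kind")
    # "S" for a single stream, "M" for multiple streams.
    i = "S" if instruction_streams == 1 else "M"
    d = "S" if data_streams == 1 else "M"
    return f"{i}I{d}D"
```

For example, a machine with one control unit broadcasting to eight processing elements has one instruction stream and eight data streams, so `flynn_category(1, 8)` gives `"SIMD"`.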
SISD: - It represents the organization of a single computer containing a control unit, a processor
unit and a memory unit.
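CU - Control unit
PU - Processor unit
MM - Memory module
IS - Instruction stream
DS - Data stream
[Figure: SISD organization. CU passes the instruction stream (IS) to PU, PU exchanges the data
stream (DS) with MM, and MM supplies the instruction stream to CU.]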
Most computers are built on the SISD principle. Here instructions are executed sequentially, and
the system may or may not have internal parallel processing capabilities. A SISD system may have
more than one functional unit, but all the functional units are under the supervision of one control
unit. To increase processing speed, most SISD uniprocessor systems are pipelined.
SIMD: It consists of multiple processing elements supervised by the same control unit.
All processing units receive the same instruction, which is broadcast by the control unit, but work
on different data streams. The shared memory unit must contain multiple modules so that it can
communicate with all the processors simultaneously.
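[Figure: SIMD organization. CU broadcasts the instruction stream IS to PU1, PU2, ..., PUn. Each
processor unit exchanges its own data stream (DS1, DS2, ..., DSn) with a memory module (MM1, MM2,
..., MMn) of the shared memory (SM), which also supplies the instruction stream to CU.]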
MISD: An MISD organization consists of n processor units, each working on a different set of
instructions but on the same set of data. The output of one processor becomes the input to the
next unit. This configuration is yet to be realized in practice, and therefore it is given less
importance.
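[Figure: MISD organization. Control units CU1, CU2, ..., CUn each send their own instruction stream
(IS1, IS2, ..., ISn) to processor units PU1, PU2, ..., PUn. All processor units work on a single
data stream DS from the shared memory (SM), which is made up of modules MM1, MM2, MM3 and supplies
the instruction streams to the control units.]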
MIMD: It implies interaction among the n processors, because all memory streams are derived from
the same data space shared by all processors.
If the interaction between the processors is high, it is called a tightly coupled system. If the
interaction between the processors is low, it is called a loosely coupled system. [Parallel
computers appear as either SIMD or MIMD configurations.]
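[Figure: MIMD organization. Each control unit CU1, CU2, ..., CUn sends its own instruction stream
(IS1, IS2, ..., ISn) to its processor unit PU1, PU2, ..., PUn, which works with memory module MM1,
MM2, ..., MMn.]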
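As a rough software analogy for the MIMD organization, the Python sketch below runs several different programs at the same time, each on its own data. The three small functions and their inputs are hypothetical, and a thread pool is assumed here only as a stand-in for independent processors.

```python
from concurrent.futures import ThreadPoolExecutor


def word_count(text):
    return len(text.split())


def largest(numbers):
    return max(numbers)


def reverse(items):
    return list(reversed(items))


# Each (program, data) pair plays the role of one processor with its own
# instruction stream and its own data stream.
jobs = [
    (word_count, "parallel computers appear as SIMD or MIMD"),
    (largest, [3, 41, 7, 19]),
    (reverse, ["a", "b", "c"]),
]

with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
    futures = [pool.submit(program, data) for program, data in jobs]
    outputs = [f.result() for f in futures]
```

These three jobs never exchange data, so they resemble the loosely coupled case; a tightly coupled system would have the programs interacting frequently through shared memory.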