
Aca UNIT-5

The document discusses multi-vector and SIMD computers, detailing their architectures and processing principles. It explains SIMD array processors, their organization, and the types of vector processing principles, including gather and scatter instructions. Additionally, it covers the architectures of the CM-2 and CM-5 Connection Machines, highlighting their processing nodes, networks, and operational paradigms.

Uploaded by

sarah .s
Copyright
© All Rights Reserved

UNIT-5

1Q. Discuss the Multivector and SIMD Computers


Ans: MULTIVECTOR COMPUTERS
Vector supercomputers (same as the Unit-1 answer)
SIMD COMPUTERS
1. SIMD array processors:
o A synchronous array of parallel processors is called an array processor. It is
composed of N identical processing elements (PEs) under the supervision of a
single control unit (CU). This control unit is itself a computer with high-speed
registers, local memory and an arithmetic logic unit.
o An array processor is basically a single instruction, multiple data (SIMD)
computer. There are N data streams, one per processor, so different data can
be used in each processor. The figure below shows a typical SIMD or array
processor.

o The array also contains a number of memory modules, which can be either
global or dedicated to each processor. Thus the main memory is the aggregate of
the memory modules. The processing elements and memory units
communicate with each other through an interconnection network. SIMD
processors are especially designed for performing vector computations.
2. SIMD COMPUTER ORGANIZATIONS:
Vector processing can also be carried out by SIMD computers.
Most SIMD computers use a single control unit and distributed memories, except for a
few that use associative memories.
Based on memory distribution and addressing schemes SIMD computer models are
divided into:
a. Distributed-Memory SIMD model
b. Shared-Memory SIMD model
a. Distributed-Memory Model:
o Spatial parallelism is exploited among the PEs in an SIMD computer. A
distributed-memory SIMD computer consists of an array of PEs which are
controlled by the same array control unit, as shown in Fig. a. Program and data
are loaded into the control memory through the host computer.
o An instruction is sent to the control unit for decoding. If it is a scalar or program
control operation, it will be directly executed by a scalar processor attached to
the control unit. If the decoded instruction is a vector operation, it will be
broadcast to all the PEs for parallel execution.
o Partitioned data sets are distributed to all the local memories attached to the
PEs through a vector data bus.
o The PEs are interconnected by a data-routing network which performs inter-PE
data communications such as shifting, permutation, and other routing
operations. The data-routing network is under program control through the
control unit.
o The PEs are synchronized in hardware by the control unit: the same instruction
is executed by all the PEs in the same cycle.
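The distributed-memory flow described above (scalar work stays in the control unit, vector instructions are broadcast, and every PE applies the same instruction to its own local memory) can be sketched as a toy simulator. This is a minimal illustration, not a model of any real machine: the class names, the opcode strings `scalar_add`, `vadd` and `vmul`, and the one-element-per-PE data layout are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PE:
    """One processing element with its own local memory (name -> value)."""
    local_mem: dict = field(default_factory=dict)


class ControlUnit:
    """Hypothetical array control unit: runs scalar ops, broadcasts vector ops."""

    def __init__(self, n_pes):
        self.pes = [PE() for _ in range(n_pes)]
        self.scalar_regs = {}

    def load_partitioned(self, name, data):
        # Stand-in for the vector data bus: one element goes to each PE.
        assert len(data) == len(self.pes), "one element per PE"
        for pe, x in zip(self.pes, data):
            pe.local_mem[name] = x

    def execute(self, op, dst, a, b):
        if op == "scalar_add":
            # Scalar / program-control work is done by the control unit itself.
            self.scalar_regs[dst] = self.scalar_regs[a] + self.scalar_regs[b]
        elif op in ("vadd", "vmul"):
            # Vector op: the same instruction is broadcast to every PE. The loop
            # visits PEs one at a time, but models them all acting in one cycle.
            fn = (lambda x, y: x + y) if op == "vadd" else (lambda x, y: x * y)
            for pe in self.pes:
                pe.local_mem[dst] = fn(pe.local_mem[a], pe.local_mem[b])
        else:
            raise ValueError(f"unknown op {op}")

    def collect(self, name):
        # Read one named value back from every PE's local memory.
        return [pe.local_mem[name] for pe in self.pes]


cu = ControlUnit(4)
cu.load_partitioned("A", [1, 2, 3, 4])
cu.load_partitioned("B", [10, 20, 30, 40])
cu.execute("vadd", "C", "A", "B")
print(cu.collect("C"))  # [11, 22, 33, 44]
```

Keeping each operand inside the PE that uses it is the point of the distributed-memory model: a broadcast vector instruction needs no memory traffic between PEs, and only data-routing operations move values from one PE to another.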

b. Shared-Memory Model:
o The figure shows a variation of the SIMD computer using shared memory
among the PEs.
o An alignment network is used as the inter-PE memory communication
network. Again, this network is controlled by the control unit.
o The alignment network must be properly set to avoid access conflicts.
o For example, the Burroughs Scientific Processor (BSP) used this architecture,
with n = 16 PEs updating m = 17 shared-memory modules through a 16 × 17
alignment network.
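The notes do not explain why the configuration with 16 PEs has 17 memory modules. A common explanation is that 17 is prime: if word `a` lives in module `a mod 17` (low-order interleaving, assumed here), then 16 PEs fetching elements that are any stride apart, except a multiple of 17, always hit 16 different modules. The sketch below checks this claim numerically; the function names are hypothetical.

```python
def module_of(address, m):
    """Low-order interleaving (assumed): word `address` is stored in module address mod m."""
    return address % m


def conflict_free(n_pes, m, stride, base=0):
    """True if PE i fetching word base + i*stride gives n_pes distinct modules."""
    modules = {module_of(base + i * stride, m) for i in range(n_pes)}
    return len(modules) == n_pes


# 16 PEs, strides 1..16: compare 16 memory modules against 17.
ok_16 = [s for s in range(1, 17) if conflict_free(16, 16, s)]
ok_17 = [s for s in range(1, 17) if conflict_free(16, 17, s)]
print(ok_16)  # [1, 3, 5, 7, 9, 11, 13, 15]
print(ok_17)  # [1, 2, 3, ..., 16]
```

With 16 modules, every even stride makes two or more PEs collide on one module, so the alignment network would have to serialize those accesses. With 17 modules only strides that are multiples of 17 collide, at the cost of one module being idle in each access.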
SIMD Instruction: SIMD computers execute vector instructions for arithmetic, logic, data-
routing, and masking operations over vector quantities. In bit-slice SIMD machines, the
vectors are simply binary vectors. In word-parallel SIMD machines, the vector
components are 4- or 8-byte numerical values.
All SIMD instructions must use vector operands of equal length n, where n is the number of
PEs. SIMD instructions are similar to those used in pipelined vector processors, except that
temporal parallelism in pipelines is replaced by spatial parallelism in multiple PEs.
The data-routing instructions include permutations, broadcasts, multicasts, and various
rotate and shift operations. Masking operations are used to enable or disable a subset of PEs
in any instruction cycle.
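A rotate-type data-routing instruction and a masked arithmetic instruction can be sketched over a list that holds one value per PE. This is a minimal illustration under assumed semantics: the function names are hypothetical, and "disabled PEs keep their old value" is one common way to define masking.

```python
def route_rotate(vec, k):
    """Circular shift by k: PE i receives the value held by PE (i - k) mod n."""
    n = len(vec)
    return [vec[(i - k) % n] for i in range(n)]


def masked_op(op, a, b, mask, old):
    """Apply op only on PEs whose mask bit is 1; disabled PEs keep their old value."""
    return [op(x, y) if m else o for x, y, m, o in zip(a, b, mask, old)]


a = [1, 2, 3, 4]
print(route_rotate(a, 1))  # [4, 1, 2, 3]
print(masked_op(lambda x, y: x + y, a, [10, 10, 10, 10], [1, 0, 1, 0], a))  # [11, 2, 13, 4]
```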
Host and I/O: All I/O activities are handled by the host computer in the above SIMD
organizations. A special control memory is used between the host and the array control unit.
This is a staging memory for holding programs and data.
Divided data sets are distributed to the local memories or to the shared memory modules
before starting the program execution. The host manages the mass storage and graphics
display of computational results. The scalar processor operates concurrently with the PE
array under the coordination of the control unit.

2Q. Discuss the vector processing principles.


Ans: VECTOR PROCESSING PRINCIPLES:
1. Vector processor:
o A vector processor is a central processing unit that can apply an operation to
an entire input vector in a single instruction.

o It is a complete unit of hardware resources that operates on a sequential set of
similar data items in memory using a single instruction. These instructions
are called single instruction multiple data (SIMD) or vector instructions.

o The functional units of a vector computer are as follows:


• IPU or instruction processing unit
• Vector register
• Scalar register
• Scalar processor
• Vector instruction controller
• Vector access controller
• Vector processor
o A vector is an ordered set of data items arranged as a one-dimensional array. A
vector V of length n can be represented as a row vector V = [V1 V2 V3 · · · Vn].
o Usually, the vector elements are ordered to have a fixed addressing increment
between successive elements, called the stride.
o For a processor with multiple ALUs, it is possible to operate on multiple data
elements in parallel using a single instruction. Such instructions are called single-
instruction multiple-data (SIMD) instructions. They are also called vector
instructions.
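The stride and the idea of one instruction covering a whole vector can be made concrete with a small sketch. It assumes a 4 × 4 matrix stored row by row starting at address 0, so that a row has stride 1 and a column has stride 4. The function names are hypothetical, and the `vector_add` loop only models the result of a vector instruction, not its pipelined or parallel timing.

```python
def element_addresses(base, stride, n):
    """Addresses of V1..Vn when successive elements are `stride` words apart."""
    return [base + i * stride for i in range(n)]


def vector_load(memory, base, stride, n):
    """Fetch an n-element vector with a fixed addressing increment (the stride)."""
    return [memory[a] for a in element_addresses(base, stride, n)]


def vector_add(v1, v2):
    """One 'vector instruction': the same add applied to every element pair."""
    assert len(v1) == len(v2), "vector operands must have equal length"
    return [x + y for x, y in zip(v1, v2)]


memory = list(range(16))                # assumed 4x4 matrix, row-major: M[r][c] at 4*r + c
column1 = vector_load(memory, 1, 4, 4)  # stride 4 walks down a column
row2 = vector_load(memory, 8, 1, 4)     # stride 1 walks along a row
print(column1, row2)                    # [1, 5, 9, 13] [8, 9, 10, 11]
print(vector_add(column1, row2))        # [9, 14, 19, 24]
```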

2. Vector Instruction Types: (Spectrum pg4.14)


Gather and scatter instruction diagram:
Gather is an operation that fetches from memory the nonzero elements of a sparse
vector using indices that themselves are indexed. Scatter does the opposite: it stores
a vector into memory as a sparse vector whose nonzero entries are indexed. The vector
register V1 contains the data, and the vector register V0 is used as an index to gather
or scatter data from or to random memory locations, as illustrated in Figs. a and b,
respectively.
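The roles of V0 (indices) and V1 (data) in gather and scatter can be sketched with Python lists standing in for memory and the vector registers. The function names and the optional `base` address are assumptions for illustration, not a real instruction encoding.

```python
def gather(memory, v0, base=0):
    """V1[i] = M[base + V0[i]]: fetch the elements whose offsets are listed in V0."""
    return [memory[base + idx] for idx in v0]


def scatter(memory, v0, v1, base=0):
    """M[base + V0[i]] = V1[i]: store each data element at its indexed location."""
    for idx, val in zip(v0, v1):
        memory[base + idx] = val


sparse = [0, 0, 7, 0, 0, 3, 0, 0, 0, 5]
v0 = [2, 5, 9]                # index register: positions of the nonzero entries
v1 = gather(sparse, v0)       # dense data register
print(v1)                     # [7, 3, 5]

rebuilt = [0] * len(sparse)
scatter(rebuilt, v0, v1)
print(rebuilt == sparse)      # True
```

Gather followed by scatter with the same index vector round-trips the sparse vector. This is why a vector machine can apply dense vector instructions to the packed V1 and then write the results back to their scattered locations.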
Masking Instruction diagram:
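The notes give only the heading for the masking instruction. In many vector instruction sets, a masking instruction builds a bit vector that marks which elements of a vector register are nonzero. That mask is then used to compress the vector into a shorter index vector, or to expand a packed vector back to full length. The sketch below assumes that reading; the mask name VM and all function names are hypothetical.

```python
def make_mask(v0):
    """VM[i] = 1 where V0[i] is nonzero, else 0."""
    return [1 if x != 0 else 0 for x in v0]


def compress_indices(vm):
    """Compress: collect the positions whose mask bit is set into a shorter vector."""
    return [i for i, bit in enumerate(vm) if bit]


def expand(values, vm, fill=0):
    """Expand: put packed values back at the positions whose mask bit is set."""
    it = iter(values)
    return [next(it) if bit else fill for bit in vm]


v0 = [0, -1, 0, 5, -15, 0, 0, 24]
vm = make_mask(v0)
print(vm)                          # [0, 1, 0, 1, 1, 0, 0, 1]
idx = compress_indices(vm)
print(idx)                         # [1, 3, 4, 7]
packed = [v0[i] for i in idx]      # [-1, 5, -15, 24]
print(expand(packed, vm) == v0)    # True
```

The compressed index vector has the form a gather or scatter instruction expects, so masking and gather/scatter are naturally used together on sparse data.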

3. Vector-access memory schemes or organizations (spectrum pg 4.15 Q16)

3Q. Explain the CM-2 architecture in detail.


Ans : CM-2:
The Connection Machine CM-2 is a fine-grain MPP (massively parallel processing) supercomputer
built from thousands of bit-slice PEs working in parallel to achieve a high peak processing speed.

ARCHITECTURE OF CM-2:
Program Execution Paradigm: All programs started execution on a front-end, which issued
microinstructions to the back-end processing array when data-parallel operations were
desired. The sequencer broke these microinstructions down into nanoinstructions and broadcast
them to all data processors in the array.
Data sets and results could be exchanged between the front-end and the processing array in
one of three ways: broadcasting, global combining, and scalar memory bus as depicted in Fig.
Broadcasting was carried out through the broadcast bus to all data processors at once.
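Broadcasting and global combining move data in opposite directions. Broadcasting sends one front-end value to every data processor. Global combining reduces one value per processor to a single result for the front end. The sketch below assumes typical combining operators such as sum, maximum and logical OR. The function names are hypothetical, and the sequential `reduce` only models the result, not the CM-2's combining hardware.

```python
import operator
from functools import reduce


def broadcast(value, n_processors):
    """Front end sends one value to all data processors at once."""
    return [value] * n_processors


def global_combine(values, op):
    """Reduce one value from each processor to a single result for the front end."""
    return reduce(op, values)


print(broadcast(5, 4))                                        # [5, 5, 5, 5]
per_processor = [3, 9, 4, 1]
print(global_combine(per_processor, operator.add))            # 17
print(global_combine(per_processor, max))                     # 9
print(global_combine([False, True, False], operator.or_))     # True
```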

The CM-2 Architecture consists of the following:


a. Processing Array
b. Processing Nodes
c. Hypercube Routers

a. Processing Array:
The processing array contains 4K to 64K bit-slice data processors (PEs), all
controlled by a sequencer. The sequencer decoded microinstructions from the front-
end and broadcast nanoinstructions to the processors in the array. All processors
could access their memories simultaneously, and all processors executed the
broadcast instructions in lockstep.
b. Processing Nodes:
• Each data processing node contained 32 bit-slice data processors, an optional
floating-point accelerator, and interfaces for inter-processor communication.
• Each data processor was implemented with a 3-input and 2-output bit-slice
ALU and associated latches and a memory interface.
• This ALU could perform bit-serial full-adder and Boolean logic operations.
• Each processor chip contained 16 processors, so a node's 32 processors occupy
two chips. An 18-bit memory address (2^18 = 256K) enabled the 32 processors to
share 256K memory words.
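A 3-input, 2-output bit-slice ALU acting as a full adder can add multi-bit numbers bit-serially: one bit per cycle, least significant bit first, with the carry held in a latch between cycles. The sketch below shows this for a single processor. The function names and the LSB-first bit lists are assumptions for illustration.

```python
def full_adder(a, b, c):
    """3 inputs (a, b, carry-in) -> 2 outputs (sum bit, carry-out)."""
    s = a ^ b ^ c
    carry = (a & b) | (a & c) | (b & c)
    return s, carry


def bit_serial_add(x_bits, y_bits):
    """Add two LSB-first bit lists one bit per cycle, keeping the carry in a latch."""
    carry = 0
    out = []
    for a, b in zip(x_bits, y_bits):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    out.append(carry)  # the final carry becomes the extra top bit
    return out


def to_bits(n, width):
    """Integer -> LSB-first list of `width` bits."""
    return [(n >> i) & 1 for i in range(width)]


def from_bits(bits):
    """LSB-first list of bits -> integer."""
    return sum(b << i for i, b in enumerate(bits))


print(from_bits(bit_serial_add(to_bits(11, 4), to_bits(6, 4))))  # 17
```

A k-bit add therefore takes k cycles in each processor. Because all processors run the same broadcast instructions in lockstep, thousands of such additions still finish in those same k cycles.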

c. Hypercube routers:
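• Special hardware is built on each processor chip for data routing among the
processors. The router nodes on all processor chips are wired together to form
a Boolean n-cube.
• Each router node is connected to 12 other router nodes, including its paired
node. With 64K processors at 16 per chip there are 4096 = 2^12 processor chips,
so the router nodes form a 12-dimensional hypercube.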
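In a Boolean n-cube, each node address is an n-bit number, and two nodes are linked exactly when their addresses differ in one bit. That is why every router node in a 12-cube has 12 neighbors. The sketch also shows dimension-order (e-cube) routing, a standard hypercube routing scheme used here for illustration. It is not necessarily the CM-2 router's exact algorithm, and the function names are hypothetical.

```python
def neighbors(node, dim):
    """Nodes one hop away in a Boolean dim-cube: flip exactly one address bit."""
    return [node ^ (1 << d) for d in range(dim)]


def ecube_route(src, dst, dim):
    """Dimension-order routing: correct differing address bits from lowest to highest."""
    path = [src]
    cur = src
    for d in range(dim):
        if (cur ^ dst) & (1 << d):
            cur ^= 1 << d
            path.append(cur)
    return path


print(len(neighbors(0, 12)))        # 12
print(ecube_route(0b0000, 0b1011, 4))  # [0, 1, 3, 11]
```

The number of hops equals the number of address bits in which source and destination differ, so no message in a 12-cube needs more than 12 hops.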

4Q. Discuss the Connection Machine CM-5.
Ans: CM-5:
CM-5 stands for Connection Machine model 5, built by Thinking Machines Corporation. It is a
distributed-memory parallel computer that uses a large number of processing nodes, each with
its own memory. It is known for its scalability, flexibility and ability to handle complex
scientific simulations.

Key architectural features:


1. A Synchronized MIMD Machine:
• Unlike the SIMD architecture of the CM-1 and CM-2, the CM-5 is designed with a
synchronized MIMD structure. The machine was designed to contain from 32 to 16,384
processing nodes.
• Instead of using a single sequencer (as in the CM-2), the system used a number of
control processors. Each control processor was configured with memory and disk
as needed.
• Input and output were provided via high-bandwidth I/O interfaces to graphics devices,
mass secondary storage such as a data vault, and high-performance networks.

2. The Network Functions:


• The building blocks were interconnected by three networks:
➢ data network
➢ control network
➢ diagnostic network.

a. Data Network:
▪ The data network provides high-performance, point-to-point data communications
between the processing nodes.
▪ The data network is based on the fat-tree concept.
▪ To route a message from one processor node to another, the message was sent up
the tree to the least common ancestor of the two processors and then down to the
destination.
▪ Fat Trees: A fat tree is more like a real tree in that it becomes thicker as it acquires
more leaves. Processing nodes, control processors, and I/O channels are located at
the leaves of a fat tree. The internal nodes are switches.
▪ The CM-5 data network is implemented with a 4-ary fat tree as shown in Fig. Each
of the internal switch nodes was made up of several router chips. Each router chip
was connected to four child chips and either two or four parent chips.
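The "up to the least common ancestor, then down" routing rule can be quantified for a 4-ary tree. Number the leaves 0, 1, 2, and so on. Two leaves share an ancestor k levels up once their indices agree after k integer divisions by 4. A message therefore climbs k levels and descends k levels. This sketch counts only switch levels on the path. It ignores the extra parent links (two or four per router chip) that give the real network several alternative upward paths, and the function names are hypothetical.

```python
def lca_height(src, dst, arity=4):
    """Levels a message climbs before reaching the least common ancestor switch."""
    k = 0
    while src != dst:
        src //= arity   # move each leaf index up one level of the tree
        dst //= arity
        k += 1
    return k


def hops(src, dst, arity=4):
    """Up to the least common ancestor, then the same number of levels down."""
    return 2 * lca_height(src, dst, arity)


# 64 leaves = 3 levels of a 4-ary tree
print(hops(0, 3))    # 2  (same lowest-level switch)
print(hops(0, 4))    # 4  (neighbouring groups of four)
print(hops(0, 63))   # 6  (must go through the root)
```

Nearby nodes communicate through low-level switches, while only traffic between distant parts of the machine reaches the root. This is why the tree is built thicker toward the root.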

b. Control Network:
▪ The control network provided cooperative operations, including broadcast,
synchronization, and scans, as well as system management functions.
▪ The architecture of the control network is that of a complete binary tree with all
system components at the leaves. Each user partition was assigned to a subtree of
the network.
▪ Processing nodes were located at leaves of the subtree, and a control processor
was mapped into the partition at an additional leaf. The control processor executed
scalar part of the code, while the processing nodes executed the data-parallel part.

▪ CONTROL PROCESSOR:

▪ The basic control processor consisted of a RISC microprocessor (CPU), a memory
subsystem, I/O with local disks and Ethernet connections, and a CM-5 network
interface.
▪ The network interface connected the control processor to the rest of the
system through the control network and the data network.
▪ Control processors specialized in managerial functions rather than computational
functions.
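A scan (prefix operation) over a complete binary tree, with the processing nodes at the leaves, can be computed in one sweep up and one sweep down the tree. This matches the tree shape of the control network described above. The sketch uses the standard up-sweep/down-sweep exclusive prefix sum. It illustrates the tree algorithm in general and does not describe the CM-5 control network hardware. It also assumes a power-of-two leaf count, and the function name is hypothetical.

```python
def tree_exclusive_scan(values):
    """Exclusive prefix sum over the leaves of a complete binary tree.

    Up-sweep: every internal node stores the sum of its subtree.
    Down-sweep: a node passes its incoming prefix to its left child, and that
    prefix plus the left subtree's sum to its right child.
    """
    n = len(values)
    assert n > 0 and n & (n - 1) == 0, "sketch assumes a power-of-two leaf count"
    tree = [0] * (2 * n)          # heap layout: node 1 is the root, leaves are n..2n-1
    tree[n:] = values
    for i in range(n - 1, 0, -1):  # up-sweep, from the bottom level to the root
        tree[i] = tree[2 * i] + tree[2 * i + 1]
    prefix = [0] * (2 * n)        # prefix[i] = sum of all leaves left of subtree i
    for i in range(1, n):         # down-sweep, from the root toward the leaves
        prefix[2 * i] = prefix[i]
        prefix[2 * i + 1] = prefix[i] + tree[2 * i]
    return prefix[n:]


print(tree_exclusive_scan([3, 1, 7, 0, 4, 1, 6, 3]))  # [0, 3, 4, 11, 11, 15, 16, 22]
```

Both sweeps take about log2(n) tree levels, so a scan over thousands of nodes needs only a few dozen levels of switching. The root also holds the total sum after the up-sweep, and the broadcast operation can send it back down.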
c. Diagnostic network:
▪ This network is needed to improve system availability. Built-in testability was
achieved with scan-based diagnostics.
▪ The diagnostic network allowed groups of pods to be addressed according to a
"hypercube-address" scheme. A special diagnostic interface was designed to perform
an in-system check of the integrity of all CM-5 chips.
