
Array Processor and its Types

Array processors are also known as multiprocessors or vector processors.
They perform computations on large arrays of data and are therefore used to
improve the performance of the computer.
There are basically two types of array processors:
Attached Array Processors
SIMD Array Processors

Attached Array Processors
An attached array processor is a processor attached to a general-purpose
computer in order to improve that computer's performance on numerical
computational tasks. It achieves high performance by means of parallel
processing with multiple functional units.

Vector processing

The need to increase computational power is a never-ending requirement.
In scientific and research areas, the computations involved are quite
extensive, and hence high-powered computers are a must.
Areas like structural engineering, petroleum exploration, aerodynamics,
hydrodynamics, nuclear research, tomography, VLSI design and AI can have
data in the form of matrices, which suits vector processors to process it at
high speed.

Some examples of its applications are:

1. In radar and signal processing for detection of space /
underwater targets.
2. In remote sensing for earth resources exploration.
3. In computational wind tunnel experiments.
4. In 3D stop-action computer-assisted tomography.
5. Weather forecasting.
6. Medical diagnosis.

Characteristics of Vector processing

1. A vector is an ordered set of elements. A vector operand contains an
ordered set of n elements, where n is called the length of the vector. Each
element in a vector is a scalar quantity, which may be a floating-point
number, an integer, a logical value, or a character (byte).
2. In vector processing, two successive pairs of elements are processed
each clock period. Dual vector pipes and dual sets of vector functional
units allow two pairs of elements to be processed during the same clock
period. As each pair of operations is completed, the results are delivered to
the appropriate elements of the result register. The operation continues
until the number of elements processed is equal to the count specified by
the vector length register.
For example: C (1:50) = A (1:50) + B (1:50)
This vector instruction includes the initial addresses of the two source
operands, one destination operand, the length of the vectors and the
operation to be performed.
3. Vector instructions are classified into four basic types:

F1: V → V    F2: V → S
F3: V × V → V    F4: V × S → V

where V indicates a vector operand and S indicates a scalar operand. The
operations F1 and F2 are unary operations, such as vector square root,
vector sine, vector complement, vector summation and so on. On the other
hand, operations F3 and F4 are binary operations, such as vector add,
vector multiply, vector-scalar add and so on.
4. In vector processing, identical processes are repeatedly invoked many
times, each of which can be subdivided into subprocesses.
5. In vector processing, successive operands are fed through the pipeline
segments and require as few buffers and local controls as possible. This
parallel vector processing allows the generation of more than two results
per clock period. The parallel vector operations are automatically initiated
either when successive vector instructions use different functional units
and different vector registers, or when successive vector instructions use
the result stream from one vector register as the operand of another
operation using different functional units. This process is known as
chaining.

SIMD Array Processors
SIMD is the organization of a single computer containing multiple
processors operating in parallel. The processing units are made to operate
under the control of a common control unit, thus providing a single
instruction stream and multiple data streams.
A general block diagram of an array processor is shown below. It contains
a set of identical processing elements (PEs), each of which has a
local memory M. Each processing element includes an ALU and registers.
The master control unit controls all the operations of the processing
elements. It also decodes the instructions and determines how each
instruction is to be executed.
The main memory is used for storing the program. The control unit is
responsible for fetching the instructions. Vector instructions are sent to all
PEs simultaneously and results are returned to memory.
The best-known SIMD array processor is the ILLIAC IV computer
developed by the Burroughs Corporation. SIMD processors are highly
specialized computers. They are suitable only for numerical problems that
can be expressed in vector or matrix form; they are not suitable for
other types of computations.
Array processors increase the overall instruction processing speed.
As most array processors operate asynchronously from the host
CPU, they improve the overall capacity of the system.
Array processors have their own local memory, providing extra memory
to systems with low memory.
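The example above, C (1:50) = A (1:50) + B (1:50), can be sketched in plain Python. This is purely a software illustration (real vector hardware does this in pipelined functional units): it processes two element pairs per simulated clock period and stops when the vector-length count is reached.

```python
# Software sketch of the vector add C(1:50) = A(1:50) + B(1:50).
# Two element pairs are processed per simulated clock period,
# mimicking dual vector pipes; all names here are illustrative.

def vector_add(a, b, pairs_per_clock=2):
    assert len(a) == len(b)          # the vector length register
    c = [0] * len(a)
    clocks = 0
    i = 0
    while i < len(a):                # until count == vector length
        for _ in range(pairs_per_clock):
            if i < len(a):
                c[i] = a[i] + b[i]   # one pair through the adder pipe
                i += 1
        clocks += 1
    return c, clocks

A = list(range(1, 51))               # A(1:50)
B = list(range(51, 101))             # B(1:50)
C, clocks = vector_add(A, B)
print(C[0], C[-1], clocks)           # 52 150 25
```

With 50 elements and two pairs per period, the whole add completes in 25 simulated clock periods instead of 50.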
DMA stands for "Direct Memory Access" and is a method of transferring data
from the computer's RAM to another part of the computer without processing
it using the CPU. While most data that is input to or output from your
computer is processed by the CPU, some data does not require processing, or
can be processed by another device.
In these situations, DMA can save processing time and is a more efficient
way to move data from the computer's memory to other devices. In order for
devices to use direct memory access, they must be assigned to a DMA
channel. Each type of port on a computer has a set of DMA channels that
can be assigned to each connected device. For example, a PCI controller
and a hard drive controller each have their own set of DMA channels.
For example, a sound card may need to access data stored in the
computer's RAM, but since it can process the data itself, it may use DMA to
bypass the CPU. Video cards that support DMA can also access the system
memory and process graphics without needing the CPU. Ultra DMA hard
drives use DMA to transfer data faster than previous hard drives that
required the data to first be run through the CPU.
An alternative to DMA is the Programmed Input/Output (PIO) interface, in
which all data transmitted between devices goes through the processor. A
newer protocol for the ATA/IDE interface is Ultra DMA, which provides a
burst data transfer rate of up to 33 MBps. Hard drives that come with Ultra
DMA/33 also support PIO modes 1, 3, and 4, and multiword DMA mode 2 at
16.6 MBps.

Multiprocessor

A multiprocessor system is an interconnection of two or more CPUs with
memory and I/O equipment. IOPs are generally not included in the
definition of a multiprocessor system unless they have computational
facilities comparable to CPUs. Multiprocessors are MIMD systems. A
multicomputer system includes a number of computers connected together
by means of communication lines.

 It improves reliability: if one unit fails, the whole system can continue
to function, perhaps with lower efficiency.
 The computation can proceed in parallel in two ways:
o Multiple independent jobs operate in parallel.
o A single job can be partitioned into multiple parallel tasks.
 Multiprocessor systems are classified by the way their memory is
organized:
1. tightly coupled systems (shared memory)
2. loosely coupled systems (distributed memory)

Cache Memory is a special very high-speed memory. It is used to speed
up and synchronize with a high-speed CPU. Cache memory is costlier than
main memory or disk memory but more economical than CPU registers.
Cache memory is an extremely fast memory type that acts as a buffer
between RAM and the CPU. It holds frequently requested data and
instructions so that they are immediately available to the CPU when needed.
Cache memory is used to reduce the average time to access data from the
main memory. The cache is a smaller and faster memory which stores
copies of the data from frequently used main memory locations. There are
various independent caches in a CPU, which store instructions and data.

Levels of memory:
 Level 1 or Register –
It is a type of memory in which data is stored and accepted
immediately by the CPU. The most commonly used registers are the
accumulator, program counter, address register, etc.
 Level 2 or Cache memory –
It is the fastest memory, with a faster access time, where data is
temporarily stored for faster access.
 Level 3 or Main Memory –
It is the memory on which the computer works currently. It is small
in size, and once power is off, data no longer stays in this memory.
 Level 4 or Secondary Memory –
It is external memory, which is not as fast as main memory, but data
stays permanently in this memory.

DMA Transfer Types

Memory To Memory Transfer

In this mode a block of data is moved from one memory address to another
memory address. The current address register of channel 0 is used
to point to the source address, and the current address register of channel 1 is
used to point to the destination address. In the first transfer cycle, the data byte
from the source address is loaded into the temporary register of the DMA
controller, and in the next transfer cycle the data from the temporary register
is stored in the memory pointed to by the destination address. After each data
transfer the current address registers are decremented or incremented
according to the current settings. The channel 1 current word count register is
also decremented by 1 after each data transfer. When the word count of
channel 1 goes to FFFFH, a TC is generated, which activates the EOP output,
terminating the DMA service.

Auto initialize
In this mode, during the initialization the base address and word count
registers are loaded simultaneously with the current address and word count
registers by the microprocessor. The address and the count in the base
registers remain unchanged throughout the DMA service.
After the first block transfer i.e. after the activation of the EOP signal, the
original values of the current address and current word count registers are
automatically restored from the base address and base word count register
of that channel. After auto initialization the channel is ready to perform
another DMA service, without CPU intervention.
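The two modes above can be sketched together in a toy Python model. This is a simplification loosely modelled on an 8237-style controller; the register and method names are illustrative, not a real device interface, and terminal count is modelled as the word count passing zero.

```python
# Sketch of a memory-to-memory DMA transfer with auto-initialize.
# Base registers keep the programmed values; current registers do the
# actual bookkeeping and are restored from the base registers at EOP.

class DmaChannel:
    def __init__(self, src, dst, count):
        self.base_src, self.base_dst, self.base_count = src, dst, count
        self.cur_src, self.cur_dst, self.cur_count = src, dst, count

    def run(self, memory):
        while True:
            temp = memory[self.cur_src]      # cycle 1: source -> temporary register
            memory[self.cur_dst] = temp      # cycle 2: temporary register -> destination
            self.cur_src += 1                # addresses incremented per settings
            self.cur_dst += 1
            self.cur_count -= 1
            if self.cur_count < 0:           # count wraps past 0: terminal count (TC)
                break
        # EOP: auto-initialize restores the current registers
        self.cur_src, self.cur_dst, self.cur_count = (
            self.base_src, self.base_dst, self.base_count)

memory = list(range(16)) + [0] * 16          # 32-byte toy memory
ch = DmaChannel(src=0, dst=16, count=15)     # count N-1 => 16 transfers
ch.run(memory)
print(memory[16:32])                         # now a copy of memory[0:16]
print(ch.cur_count)                          # 15 again, ready for the next block
```

After the block transfer the channel is immediately ready for another DMA service without CPU intervention, exactly as the auto-initialize description says.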

DMA Controller
The controller is integrated into the processor board and manages all DMA
data transfers. Transferring data between system memory and an I/O
device requires two steps: data goes from the sending device to the DMA
controller and then to the receiving device. The microprocessor gives the
DMA controller the location, destination, and amount of data that is to be
transferred. Then the DMA controller transfers the data, allowing the
microprocessor to continue with other processing tasks. When a device
needs to use the Micro Channel bus to send or receive data, it competes
with all the other devices that are trying to gain control of the bus. This
process is known as arbitration. The DMA controller does not arbitrate for
control of the bus; instead, the I/O device that is sending or receiving data
(the DMA slave) participates in arbitration. It is the DMA controller, however,
that takes control of the bus when the central arbitration control point grants
the DMA slave's request.
Cache Performance:
When the processor needs to read or write a location in main memory, it
first checks for a corresponding entry in the cache.

Locality of reference –
Since the size of cache memory is small compared to main memory,
which part of main memory should be given priority and loaded into the
cache is decided based on locality of reference.

Cache Mapping:
There are three different types of mapping used for the purpose of cache
memory, which are as follows: direct mapping, associative mapping, and
set-associative mapping. These are explained below.
1. Direct Mapping –
The simplest technique, known as direct mapping, maps each
block of main memory into only one possible cache line. In direct
mapping, each memory block is assigned to a specific line in
the cache. If a line is previously taken up by a memory block when
a new block needs to be loaded, the old block is trashed. An
address space is split into two parts, an index field and a tag field.
The cache is used to store the tag field, whereas the rest is stored
in main memory. Direct mapping's performance is directly
proportional to the hit ratio.

i = j modulo m

where

i = cache line number
j = main memory block number
m = number of lines in the cache

For purposes of cache access, each main memory address can be
viewed as consisting of three fields. The least significant w bits
identify a unique word or byte within a block of main memory. In
most contemporary machines, the address is at the byte level. The
remaining s bits specify one of the 2^s blocks of main memory. The
cache logic interprets these s bits as a tag of s−r bits (the most
significant portion) and a line field of r bits. This latter field identifies
one of the m = 2^r lines of the cache.
2. Associative Mapping –
In this type of mapping, the associative memory is used to store
both the content and the addresses of the memory word. Any block can go
into any line of the cache. This means that the word id bits are
used to identify which word in the block is needed, but the tag
becomes all of the remaining bits. This enables the placement of
any word at any place in the cache memory. It is considered to be
the fastest and the most flexible mapping form.
3. Set-associative Mapping –
This form of mapping is an enhanced form of direct mapping in which
the drawbacks of direct mapping are removed. Set-associative mapping
addresses the problem of possible thrashing in the direct mapping
method. It does this by saying that, instead of having exactly one
line that a block can map to in the cache, we will group a few lines
together, creating a set. Then a block in memory can map to any
one of the lines of a specific set. Set-associative mapping allows
each word that is present in the cache to have two or more
words in the main memory for the same index address. Set-
associative cache mapping combines the best of the direct and
associative cache mapping techniques.
In this case, the cache consists of a number of sets, each of which
consists of a number of lines. The relationships are

m = v * k

i = j mod v

where

i = cache set number

j = main memory block number

v = number of sets

m = number of lines in the cache

k = number of lines in each set
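The two formulas can be exercised in a short Python sketch. The cache sizes and the access trace below are made-up values: it computes the direct-mapped line and the 2-way set number for one block, then simulates a tiny direct-mapped cache to show how the mapping determines hits and misses.

```python
# Illustrating i = j mod m (direct) and i = j mod v (set-associative)
# with a hypothetical cache of m = 8 lines, k = 2 lines per set.

m = 8                         # number of lines in the cache
k = 2                         # number of lines in each set
v = m // k                    # number of sets, since m = v * k

j = 13                        # main memory block number
print(j % m)                  # direct-mapped line number: 5
print(j % v)                  # set number: 1

# A tiny direct-mapped cache: each access first checks the mapped
# line; on a miss, the old block in that line is replaced.
def hit_ratio(trace, lines):
    cache = [None] * lines
    hits = 0
    for block in trace:
        line = block % lines          # i = j mod m
        if cache[line] == block:
            hits += 1
        else:
            cache[line] = block
    return hits / len(trace)

trace = [0, 1, 0, 1, 4, 0, 1, 0]      # blocks 0 and 4 collide on line 0
print(hit_ratio(trace, 4))            # 0.5
```

Note how the collision between blocks 0 and 4 on the same line causes the extra misses; a set-associative cache with two lines per set would keep both blocks resident.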

Application of Cache Memory –

1. Usually, the cache memory can store a reasonable number
of blocks at any given time, but this number is small
compared to the total number of blocks in the main
memory.
2. The correspondence between the main memory blocks and
those in the cache is specified by a mapping function.
Types of Cache –
 Primary Cache –
A primary cache is always located on the processor chip.
This cache is small and its access time is comparable to
that of processor registers.
 Secondary Cache –
Secondary cache is placed between the primary cache and
the rest of the memory. It is referred to as the level 2 (L2)
cache. Often, the Level 2 cache is also housed on the
processor chip.
What is addressing modes?
The addressing modes are a really important topic to be considered in
microprocessor or computer organisation. The addressing modes in
computer architecture actually define how an operand is chosen to
execute an instruction. It is the way that is used to identify the location of
an operand which is specified in an instruction.
Whenever an instruction executes, it requires operands to be operated on.
An instruction field consists of an opcode and an operand, where
operand means the data and opcode means the instruction itself. Operations
like addition or subtraction require two data, so they are called binary
instructions. On the other hand, increment or decrement operations need
only one data and are so called unary instructions. Now the question is how
these data can be obtained.
Consider an example: you want to buy a product from an online shopping
site, say Amazon. You can pay either by using cash on delivery, by
net banking, by debit/credit card, by UPI, etc. So, there are different ways
you can make the payment to Amazon; these are the various payment
modes available. In the same way, the addressing modes are the various
ways an instruction can access its operands.
The various addressing modes in computer architecture can be classified as
below. We have some other addressing modes too, but these are the prime
addressing modes in computer architecture.
 Implicit
 Immediate
 Direct
 Indirect
 Register
 Register Indirect
 Displacement
o Relative
o Base register
o Indexing
 Stack
Let us discuss all these addressing modes available in computer
architecture one by one.

Implicit addressing mode :
The term implicit addressing mode means we do not state explicitly
where the instruction gets its operand; by default, the instruction itself
knows from where it is supposed to access the operand. For example,
CMA stands for complement accumulator. The meaning of the CMA
instruction is that whatever value is present in the accumulator will be
replaced by its 1's complement.
With the instruction CMA, we do not mention any operand. So the
instruction knows that the operand has to be accessed from the
accumulator implicitly. This is known as implicit addressing mode.

Immediate addressing mode :
In the immediate addressing mode, the instruction contains two fields. One
is for the opcode and the other field contains the operand itself. That means
in this addressing mode, there is no need to go anywhere to access the
operand, because the instruction itself contains the operand. This is
known as immediate addressing mode.

Register addressing mode:
In the register addressing mode, the instruction will have the opcode
and a register number. Depending upon the register number, one of the
registers will be selected from the available set of registers automatically.
The unique identification of the register can be done by the register number
mentioned in the instruction. In that register, the operand can be found.

Register indirect addressing mode :
In the register indirect addressing mode, the instruction will contain
the opcode as well as a register number. Depending upon the register
number mentioned in the instruction, the corresponding register will be
accessed from the set of registers. But here the register doesn't contain the
operand; it contains the address of the memory location where the
operand can be found.
Suppose the operand is at the Ath location in memory. This address A
will be stored in the register, and the register number, say R, will be
mentioned in the instruction. This is called register indirect addressing
mode.

Displacement addressing mode :
In the displacement addressing mode, the instruction has three fields:
one for the opcode, one for a register number, and the remaining one
for an absolute address.
First, depending upon the register number, the register is selected from
the register set. After that, its content is added to the absolute address,
and the new address formed is the actual physical address of the operand
in memory.
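The modes described in this section can be contrasted in a small Python sketch. The memory contents, register names, and the (mode, field) instruction encoding below are invented purely for illustration; a real ISA encodes these fields in binary.

```python
# How different addressing modes locate an operand (toy model).

memory = {100: 7, 200: 100, 300: 42, 320: 9}
regs = {"R1": 300}

def fetch_operand(mode, field):
    if mode == "immediate":          # the instruction contains the operand
        return field
    if mode == "direct":             # field is the operand's address
        return memory[field]
    if mode == "indirect":           # field holds the address of the address
        return memory[memory[field]]
    if mode == "register":           # the named register holds the operand
        return regs[field]
    if mode == "register_indirect":  # the register holds the operand's address
        return memory[regs[field]]
    if mode == "displacement":       # register content + absolute field
        reg, offset = field
        return memory[regs[reg] + offset]
    raise ValueError(mode)

print(fetch_operand("immediate", 5))             # 5
print(fetch_operand("direct", 100))              # 7
print(fetch_operand("indirect", 200))            # 7  (memory[memory[200]])
print(fetch_operand("register", "R1"))           # 300
print(fetch_operand("register_indirect", "R1"))  # 42 (memory[300])
print(fetch_operand("displacement", ("R1", 20))) # 9  (memory[300 + 20])
```

The indirect case shows the extra memory access the mode costs: one lookup to find the address, a second to fetch the operand.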

Direct addressing mode :
In the direct addressing mode, the instruction has two parts. One part
contains the opcode and the other contains the address of the memory
location where the operand can be found.
Here A is the address of the operand. That means the operand can be
found at the Ath location in memory.

Displacement addressing mode in computer architecture can be categorized
into 3 different modes:

1. Relative
2. Base register
3. Indexing
In the relative addressing mode, the register used is the program
counter.
In the base addressing mode, the register contains the base address and
the absolute field is the offset or displacement from the base address.
After adding both, the actual physical address of the operand is obtained,
and mapping this address to memory we can access the operand.
For example, if the base address is 3000 and the offset is 20, then after
adding both, 3020 will be the actual address of the operand.
In the indexing mode, the absolute field contains the starting base
address of the memory block and the register field contains the index
value. Adding both gives the actual physical address of the operand.
Stack addressing mode:
In the stack addressing mode, the instruction knows that the topmost data
should be the operand. If the instruction is a unary instruction, it selects
the topmost data as the operand, and if the instruction is a binary
instruction, it selects the topmost two data as the operands from the
top of the stack.
These are the basic and primary addressing modes in computer
architecture. Apart from these modes, we have some other addressing
modes in computer architecture, including the auto-increment and
auto-decrement modes. But those mentioned above are the most important
addressing modes in computer architecture or computer organisation.

Indirect addressing mode :

The indirect addressing mode contains the opcode and an address field. But
unlike the direct addressing mode, the address field doesn't contain the
address of the operand; it contains the address of a memory location in
which the actual address of the operand can be found.

Here A contains the address of location B in memory, and B contains the
actual address of the operand in memory.
Computer Organization | Instruction Formats (Zero, One, Two and Three
Address Instruction)

Computers perform tasks on the basis of the instructions provided. An
instruction in a computer comprises groups called fields. These fields
contain different information; since for computers everything is in 0 and 1,
each field has a different significance, on the basis of which the CPU
decides what to perform. The most common fields are:

 Operation field, which specifies the operation to be performed, like
addition.
 Address field, which contains the location of the operand, i.e., a
register or a memory location.
 Mode field, which specifies how the operand is to be found.
An instruction is of various lengths depending upon the number of addresses
it contains. Generally, CPU organizations are of three types on the basis of
the number of address fields:

1. Single accumulator organization
2. General register organization
3. Stack organization
In the first organization, operations are done involving a special register
called the accumulator. In the second, multiple registers are used for the
computation. In the third organization, work is done on a stack basis, due to
which instructions do not contain any address field. It is not necessary that
only a single organization is applied; a blend of various organizations is
what we mostly see.
On the basis of the number of addresses, instructions are classified as
follows. Note that we will use the expression X = (A+B)*(C+D) to showcase
the procedure.

1. Zero Address Instructions –
A stack-based computer does not use an address field in its
instructions. To evaluate an expression, it is first converted to
reverse Polish notation, i.e. postfix notation.

Expression: X = (A+B)*(C+D)
Postfix: X = AB+CD+*
TOP means top of stack
M[X] is any memory location

PUSH A    TOP = A
PUSH B    TOP = B
ADD       TOP = A+B
PUSH C    TOP = C
PUSH D    TOP = D
ADD       TOP = C+D
MUL       TOP = (C+D)*(A+B)
POP X     M[X] = TOP

2. One Address Instructions –
These use an implied ACCUMULATOR register for data
manipulation. One operand is in the accumulator and the other is in
a register or memory location. Implied means that the CPU already
knows that one operand is in the accumulator, so there is no need
to specify it.

Expression: X = (A+B)*(C+D)
AC is the accumulator
M[] is any memory location
M[T] is a temporary location

LOAD A    AC = M[A]
ADD B     AC = AC + M[B]
STORE T   M[T] = AC
LOAD C    AC = M[C]
ADD D     AC = AC + M[D]
MUL T     AC = AC * M[T]
STORE X   M[X] = AC

3. Two Address Instructions –
This is common in commercial computers. Here two addresses can
be specified in the instruction. Unlike one-address instructions,
where the result was stored in the accumulator, here the result can
be stored at a different location rather than just the accumulator,
but this requires more bits to represent the addresses.
The destination address can also contain an operand.

Expression: X = (A+B)*(C+D)
R1, R2 are registers
M[] is any memory location

MOV R1, A     R1 = M[A]
ADD R1, B     R1 = R1 + M[B]
MOV R2, C     R2 = M[C]
ADD R2, D     R2 = R2 + M[D]
MUL R1, R2    R1 = R1 * R2
MOV X, R1     M[X] = R1

4. Three Address Instructions –
These have three address fields to specify a register or a memory
location. Programs created are much shorter in size, but the number
of bits per instruction increases. These instructions make the
creation of programs much easier, but it does not mean that
programs will run much faster, because now each instruction only
contains more information, while each micro-operation (changing
the content of a register, loading an address on the address bus,
etc.) is still performed in one cycle.

Expression: X = (A+B)*(C+D)
R1, R2 are registers
M[] is any memory location

ADD R1, A, B    R1 = M[A] + M[B]
ADD R2, C, D    R2 = M[C] + M[D]
MUL X, R1, R2   M[X] = R1 * R2
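The zero-address PUSH/ADD/MUL/POP trace above maps directly onto a little stack machine. A Python sketch, where the values of A–D are made up for illustration:

```python
# Evaluating X = (A+B)*(C+D) the zero-address way: the postfix form
# AB+CD+* is executed on a stack, mirroring the PUSH/ADD/MUL/POP trace.

M = {"A": 2, "B": 3, "C": 4, "D": 5}   # M[] is any memory location

def run(postfix):
    stack = []
    for sym in postfix:
        if sym == "+":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)        # ADD: TOP = a + b
        elif sym == "*":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)        # MUL: TOP = a * b
        else:
            stack.append(M[sym])       # PUSH sym: TOP = M[sym]
    return stack.pop()                 # POP X: M[X] = TOP

M["X"] = run("AB+CD+*")
print(M["X"])                          # 45 = (2+3)*(4+5)
```

Because the operands are implicit in the stack, no instruction in the loop carries an address field; only PUSH and POP name a memory location.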


RAID

RAID, or "Redundant Arrays of Independent Disks", is a technique which
makes use of a combination of multiple disks instead of a single disk
for increased performance, data redundancy, or both. The term was coined
by David Patterson, Garth A. Gibson, and Randy Katz at the University of
California, Berkeley in 1987.

Why data redundancy?
Data redundancy, although taking up extra space, adds to disk reliability.
This means that, in case of disk failure, if the same data is also backed up
onto another disk, we can retrieve the data and go on with the operation. On
the other hand, if the data is spread across multiple disks without the RAID
technique, the loss of a single disk can affect the entire data.

Key evaluation points for a RAID System

 Reliability: How many disk faults can the system tolerate?
 Availability: What fraction of the total session time is the system in
uptime mode, i.e. how available is the system for actual use?
 Performance: How good is the response time? How high is the
throughput (rate of processing work)? Note that performance
involves a lot of parameters, not just these two.
 Capacity: Given a set of N disks each with B blocks, how much
useful capacity is available to the user?
RAID is very transparent to the underlying system. This means that, to the
host system, it appears as a single big disk presenting itself as a linear array
of blocks. This allows older technologies to be replaced by RAID without
making too many changes in the existing code.

Hardwired Control Unit –
The control hardware can be viewed as a state machine that changes from
one state to another in every clock cycle, depending on the contents of the
instruction register, the condition codes and the external inputs. The
outputs of the state machine are the control signals. The sequence of
operations carried out by this machine is determined by the wiring of the
logic elements, hence the name "hardwired".

 Fixed logic circuits that correspond directly to the Boolean
expressions are used to generate the control signals.
 Hardwired control is faster than micro-programmed control.
 A controller that uses this approach can operate at high speed.
 RISC architecture is based on a hardwired control unit.

Micro-programmed Control Unit –

 The control signals associated with operations are stored in special
memory units, inaccessible by the programmer, as Control Words.
 Control signals are generated by a program, similar to machine
language programs.
 A micro-programmed control unit is slower in speed because of the
time it takes to fetch microinstructions from the control memory.
Some Important Terms –
1. Control Word: A control word is a word whose individual bits
represent various control signals.
2. Micro-routine: A sequence of control words corresponding to the
control sequence of a machine instruction constitutes the micro-
routine for that instruction.
3. Micro-instruction: Individual control words in this micro-routine
are referred to as microinstructions.
4. Micro-program: A sequence of micro-instructions is called a
micro-program, which is stored in a ROM or RAM called a Control
Memory (CM).
5. Control Store: The micro-routines for all instructions in the
instruction set of a computer are stored in a special memory called
the Control Store.

Types of Micro-programmed Control Unit – Based on the type of
Control Word stored in the Control Memory (CM), it is classified into two
types:
1. Horizontal Micro-programmed Control Unit:
The control signals are represented in the decoded binary format, that is,
1 bit per control signal; for N control signals, N bits are required.
2. Vertical Micro-programmed Control Unit:
The control signals are represented in the encoded binary format; for N
control signals, Log2(N) bits are required.
 It supports shorter control words.
 It supports easy implementation of new control signals, therefore it
is more flexible.
 It allows a low degree of parallelism, i.e., the degree of parallelism
is either 0 or 1.
 It requires additional hardware (decoders) to generate the control
signals, which implies it is slower than horizontal micro-programming.
 It is less flexible than horizontal but more flexible than a
hardwired control unit.

Cache Coherence Protocols

Prerequisite – Cache Memory
In a multiprocessor system where many processes need a copy of the same
memory block, the maintenance of consistency among these copies raises
a problem referred to as the Cache Coherence Problem.
This occurs mainly due to these causes:
 Sharing of writable data.
 Process migration.
 Inconsistency due to I/O.
Cache Coherence Protocols:
These are explained as follows below:
1. MSI Protocol:
This is a basic cache coherence protocol used in multiprocessor systems.
The letters of the protocol name identify the possible states in which a
cache block can be. So, for MSI, each block can have one of the following
possible states:
Modified –
The block has been modified in cache, i.e., the data in the cache is
inconsistent with the backing store (memory). So, a cache with a block in
the "M" state has the responsibility to write the block to the backing store
when it is evicted.
Shared –
This block is not modified and is present in at least one cache. The cache
can evict the data without writing it to the backing store.
Invalid –
This block is invalid and must be fetched from memory or from another
cache if it is to be stored in this cache.
2. MOSI Protocol:
This protocol is an extension of the MSI protocol. It adds the following
state to the MSI protocol:
Owned –
It indicates that the present processor owns this block and will service
requests from other processors for the block.
3. MESI Protocol –
It is the most widely used cache coherence protocol. Every cache line is
marked with one of the following states:
Modified –
This indicates that the cache line is present in the current cache only and
is dirty, i.e. its value is different from the main memory. The cache is
required to write the data back to main memory in the future, before
permitting any other read of the invalid main memory state.
Exclusive –
This indicates that the cache line is present in the current cache only and
is clean, i.e. its value matches the main memory value.
Shared –
It indicates that this cache line may be stored in other caches of the
machine.
Invalid –
It indicates that this cache line is invalid.
4. MOESI Protocol:
This is a full cache coherence protocol that encompasses all of the
possible states commonly used in other protocols. Each cache line is in
one of the following states:
Modified –
A cache line in this state holds the most recent, correct copy of the data,
while the copy in main memory is incorrect and no other processor holds
a copy.
Owned –
A cache line in this state holds the most recent, correct copy of the data. It
is similar to the shared state in that other processors can hold a copy of the
most recent, correct data; unlike the shared state, however, the copy in
main memory can be incorrect. Only one processor can hold the data in
the owned state; all other processors must hold the data in the shared
state.
bit/CS. Example: If 53 Control signals are present in the processor than 53 Exclusive – A cache line in this state holds the most recent, correct copy
bits are required. More than 1 control signal can be enabled at a time. of the data. The main memory copy is also most recent, correct copy of
 It supports longer control word. data while no other holds a copy of data.
Shared – A cache line in this state holds the most recent, correct copy of
 It is used in parallel processing applications. the data. Other processors in system may hold copies of data in shared
 It allows higher degree of parallelism. If degree is n, n CS are state as well. The main memory copy is also the most recent, correct copy
enabled at a time. of the data, if no other processor holds it in owned state.
 It requires no additional hardware(decoders). It means it is faster  Invalid –
than Vertical Microprogrammed. A cache line in this state does not hold a valid copy of data. Valid
 It is more flexible than vertical microprogrammed copies of data can be either in main memory or another processor
cache.
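The per-line state transitions described for MESI can be sketched as a small state machine. This is a minimal illustrative sketch, not a full protocol implementation: it covers only the local processor's reads and writes (bus-side snoop transitions are omitted), and all names are our own.

```python
from enum import Enum

class MESI(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"

def on_local_read(state, others_have_copy):
    """Next state of a cache line when the local processor reads it.

    An Invalid line must be (re)fetched: it becomes Shared if any other
    cache holds the block, Exclusive otherwise. Valid lines keep their
    state on a read.
    """
    if state is MESI.INVALID:
        return MESI.SHARED if others_have_copy else MESI.EXCLUSIVE
    return state

def on_local_write(state):
    """Next state of a cache line when the local processor writes it.

    Any write leaves the line dirty (Modified); a write to a Shared or
    Invalid line must first invalidate copies in other caches, which this
    sketch does not model.
    """
    return MESI.MODIFIED
```

For example, a read miss (`on_local_read(MESI.INVALID, others_have_copy=False)`) yields `MESI.EXCLUSIVE`, reflecting that the line is clean and held by this cache alone.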
Shared-Memory Multiprocessors

A shared-memory multiprocessor is an architecture consisting of a modest number of processors, all of which have direct (hardware) access to all the main memory in the system. This permits any of the system processors to access data that any of the other processors has created or will use. The key to this form of multiprocessor architecture is the interconnection network that directly connects all the processors to the memories. This is complicated by the need to retain cache coherence across all caches of all processors in the system.

Figure 2.17. The shared-memory multiprocessor architecture.

Cache coherence ensures that any change in the data of one cache is reflected by some change to all other caches that may have a copy of the same global data location. It guarantees that any data load or store to a processor register, if acquired from the local cache, will be correct, even if another processor is using the same data. The interconnection network that provides cache coherence may employ any one of several techniques. One of the earliest is the modified-exclusive-shared-invalid (MESI) protocol, sometimes referred to as a "snooping cache", in which a shared bus is used to connect all processors and memories together. This method permits any write of one processor to memory to be detected by all other processors and checked to see if the same memory location is cached locally. If so, some indication is recorded and the cache is either updated or at least invalidated, so that no error occurs.

Shared-memory multiprocessors are differentiated by the relative time their processors take to access the common memory blocks. An SMP is a system architecture in which all the processors can access each memory block in the same amount of time, a capability often referred to as uniform memory access (UMA). SMPs are controlled by a single operating system across all the processor cores and use a network, such as a bus or crossbar, that gives direct access to the multiple memory banks. Access times can still vary, as contention between two or more processors for any single memory bank will delay the accesses of one or more of them, but all processors still have the same chance and equal access. Early SMPs emerged in the 1980s with systems such as the Sequent Balance 8000. Today SMPs serve as enterprise servers, deskside machines, and even laptops using multicore chips, and thus play a major role in medium-scale computing, which is a major part of the commercial market. SMPs also serve as nodes within much larger MPPs.
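The write-invalidate behaviour of a snooping bus, where every cache observes each write on the shared bus and invalidates its own stale copy, can be sketched as follows. This is a hypothetical simplified model (write-through, no state bits, class and method names are ours), intended only to show why a reader on another processor never sees stale data.

```python
class Bus:
    """Shared bus to which every cache is attached; all writes are visible to all caches."""
    def __init__(self):
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def broadcast_write(self, addr, writer):
        # Every cache except the writer snoops the write and invalidates its copy.
        for cache in self.caches:
            if cache is not writer:
                cache.snoop(addr)

class SnoopingCache:
    def __init__(self, bus):
        self.lines = {}  # address -> cached value
        self.bus = bus
        bus.attach(self)

    def read(self, addr, memory):
        if addr not in self.lines:           # miss: fetch from backing memory
            self.lines[addr] = memory[addr]
        return self.lines[addr]

    def write(self, addr, value, memory):
        self.bus.broadcast_write(addr, self) # other caches invalidate their copies
        self.lines[addr] = value
        memory[addr] = value                 # write-through, for simplicity

    def snoop(self, addr):
        self.lines.pop(addr, None)           # invalidate local copy, if present
```

After cache A writes an address that cache B also holds, B's copy is invalidated, so B's next read misses and fetches the up-to-date value from memory.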