Chapters 4.1, 4.2, 4.3, 4.4, 4.5 - Hesham & Mostafa
Shared Memory Architecture
Nadeem Kafi
All material is taken from the textbooks & the Internet
Shared Memory Systems
Communication between tasks running on
different processors is performed through
writing to and reading from the global
memory.
All inter-processor coordination and
synchronization is also accomplished via
the global memory.
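As a minimal sketch (not taken from the slides), two threads standing in for tasks on different processors can communicate and coordinate purely through loads and stores to shared locations. The POSIX threads and C11 atomics below are one possible rendering, and every name in it is illustrative:

#include <stdio.h>
#include <pthread.h>
#include <stdatomic.h>

int shared_data = 0;    /* a location in the global (shared) memory */
atomic_int ready = 0;   /* a flag in shared memory used for coordination */

void *producer(void *arg) {
    shared_data = 42;   /* communicate by writing to global memory */
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

void *consumer(void *arg) {
    /* coordinate by reading the flag from global memory */
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        ;               /* spin until the producer signals */
    printf("consumer read %d from shared memory\n", shared_data);
    return NULL;
}

int main(void) {
    pthread_t p, c;
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}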
Characteristics of Shared Memory Systems
• Any processor can directly reference any memory
location.
• Communication occurs implicitly as a result of loads and stores.
• Location of data in memory is transparent to the
programmer.
• Inherently provided on a wide range of platforms (standard
processors today include dedicated hardware support for shared
memory systems).
• Memory may be physically distributed among
processors.
Shared Memory Systems
Two main problems need to be addressed when designing a shared
memory system: (1) performance degradation due to contention, and
(2) coherence problems.
Performance degradation might happen when multiple processors are
trying to access the shared memory simultaneously. A typical design
might use caches to solve the contention problem.
However, having multiple copies of data, spread throughout the caches,
might lead to a coherence problem.
The copies in the caches are coherent if they are all equal to the same
value. However, if one of the processors writes over the value of one of
the copies, then the copy becomes inconsistent because it no longer
equals the value of the other copies.
Classification of Shared Memory
UMA (Uniform Memory Access)
Each processor has equal opportunity to read/write to
memory, including equal access speed.
NUMA (Nonuniform Memory Access)
Memory access time depends on where the data resides: each processor
reaches its own local memory faster than remote memory attached to
other processors.
COMA (Cache-Only Memory Architecture)
There is no memory hierarchy and the address space is made of all the
caches. There is a cache directory (D) that helps in remote cache access.
Shared Memory Requirements
• Support for memory coherency
– The machine must make sure that all of the
processing nodes have an accurate picture of the
most up-to-date memory.
• Support for atomic operations on data
– The machine must allow only one processor at a time to
change a given piece of data (see the sketch after this list).
– Non-atomic operation: One processor requests data
and before the request is answered, another
processor changes that data.
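A minimal sketch of the contrast, assuming POSIX threads and C11 atomics (thread and iteration counts are illustrative): the plain counter's read-modify-write can be interleaved by another processor, exactly the non-atomic situation described above, while atomic_fetch_add performs the whole update as one indivisible operation.

#include <stdio.h>
#include <pthread.h>
#include <stdatomic.h>

#define NTHREADS 4
#define NITER    100000

int plain_counter = 0;           /* non-atomic: updates can be lost */
atomic_int atomic_counter = 0;   /* atomic: one change at a time */

void *worker(void *arg) {
    for (int i = 0; i < NITER; i++) {
        plain_counter++;   /* data race: another processor may change the
                              value between this read and this write */
        atomic_fetch_add(&atomic_counter, 1);   /* indivisible update */
    }
    return NULL;
}

int main(void) {
    pthread_t t[NTHREADS];
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    printf("plain:  %d (often less than %d)\n", plain_counter, NTHREADS * NITER);
    printf("atomic: %d (always %d)\n", atomic_counter, NTHREADS * NITER);
    return 0;
}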
Shared Memory Design
There are two types of interconnection
network designs:
• Bus-based or
• switch-based
Bus-based SMP
The bus/cache architecture alleviates the need for expensive multi-ported
memories and interface circuitry as well as the need to adopt a message-
passing paradigm when developing application software.
However, the bus may get saturated if multiple processors are trying to
access the shared memory (via the bus) simultaneously.
A typical bus-based design uses caches to solve the bus contention
problem. High speed caches connected to each processor on one side
and the bus on the other side mean that local copies of instructions and
data can be supplied at the highest possible rate.
Hit rate & Miss rate of a cache
A request satisfied by the cache is a hit; the fraction of requests that hit is
the hit rate h, and the remaining fraction, 1 - h, is the miss rate. If a
request is not satisfied by the cache (a miss), the data must be copied from
the global memory, across the bus, into the cache, and then passed on to
the local processor.
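With these rates, the average access time seen by a processor follows the standard relation below (t_c and t_m are assumed names here for the cache and global-memory access times):

t_avg = h * t_c + (1 - h) * t_m

A high hit rate keeps most references off the bus, which is what allows several processors to share it.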
Bus-based SMP
The maximum number of processors with cache memories that the
bus can support is given by the relation

N = B / ((1 - h) * v)

where:
h = hit rate of each cache
v = number of memory references each processor generates per second
B = bandwidth of the bus, in memory references per second

Each processor places (1 - h) * v references per second on the bus, so the
bus saturates once N such processors are attached.
Example
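A minimal worked example in C, using illustrative values (h, v, and B below are assumptions, not numbers from the original slide):

#include <stdio.h>

int main(void) {
    double h = 0.97;   /* assumed cache hit rate */
    double v = 500e6;  /* assumed memory references per second per processor */
    double B = 1e9;    /* assumed bus bandwidth, references per second */

    /* Each processor puts (1 - h) * v references/s on the bus;
       the bus saturates at B references/s. */
    double N = B / ((1.0 - h) * v);
    printf("Maximum processors the bus can support: %.1f\n", N);
    return 0;
}

With these assumed numbers, each processor generates 15 million bus references per second, so the bus supports about 66 processors.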
Caches and Cache Coherence
• Caches play key role in all cases
– Reduce average data access time
– Reduce bandwidth demands placed on shared
interconnect
• But private processor caches create a problem
– Copies of a variable can be present in multiple caches
– A write by one processor may not become visible to
others
• They’ll keep accessing the stale value in their caches
– Cache coherence problem
– Need to take actions to ensure visibility (see the sketch after this list)
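A minimal software model of this problem, under simplifying assumptions (one shared location, two single-entry write-back caches, and no coherence actions; every name below is hypothetical):

#include <stdio.h>

/* Toy model: two private caches with no coherence protocol.
   Each cache holds one private copy of the shared variable. */
typedef struct { int value; int valid; } Cache;

int global_mem = 5;   /* the shared memory location */

int cache_read(Cache *c) {
    if (!c->valid) {            /* miss: fetch from global memory */
        c->value = global_mem;
        c->valid = 1;
    }
    return c->value;            /* hit: serve the private copy */
}

void cache_write_back(Cache *c, int v) {
    c->value = v;               /* update only the private copy; */
    c->valid = 1;               /* global memory is updated later */
}

int main(void) {
    Cache p0 = {0, 0}, p1 = {0, 0};
    cache_read(&p0);            /* P0 caches the value 5 */
    cache_read(&p1);            /* P1 caches the value 5 */
    cache_write_back(&p0, 9);   /* P0 writes 9 to its copy only */

    /* With no coherence actions, P1 keeps reading the stale value. */
    printf("P0 sees %d, P1 sees %d\n", cache_read(&p0), cache_read(&p1));
    return 0;
}

Running this prints "P0 sees 9, P1 sees 5": exactly the stale-copy situation that coherence protocols are designed to prevent.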
Cache Memory Coherence
(Figure: examples of cache coherence)

Shared Memory System Coherence
The four combinations to maintain coherence among
all caches and global memory are:
• Write-update with write-through
• Write-update with write-back
• Write-invalidate with write-through
• Write-invalidate with write-back
Snooping Protocols for Cache Coherence
Snooping protocols are based on watching bus activities and carrying out
the appropriate coherency commands when necessary.
Global memory is moved in blocks, and each block has a state associated
with it, which determines what happens to the entire contents of the block.
The state of a block might change as a result of the operations Read-Miss,
Read-Hit, Write-Miss, and Write-Hit. A cache miss means that the
requested block is not in the cache or it is in the cache but has been
invalidated.
Snooping protocols differ in whether they update or invalidate shared
copies in remote caches in the case of a write operation.
They also differ as to where to obtain the new data in the case of a cache
miss.
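A minimal sketch of a write-invalidate, write-through snooping protocol under simplifying assumptions (single-block caches with just two states, and the snooping bus modeled as a function call; all names are hypothetical). A write-update variant would broadcast the new value to remote caches instead of invalidating their copies:

#include <stdio.h>

#define NPROC 3

/* Single-block caches with two states: enough to show invalidation. */
typedef enum { INVALID, VALID } State;
typedef struct { State state; int value; } Cache;

int global_mem = 5;
Cache caches[NPROC];   /* zero-initialized: all start INVALID */

/* Every cache "snoops" the bus: a broadcast invalidation removes
   all remote copies of the block. */
void bus_invalidate(int writer) {
    for (int i = 0; i < NPROC; i++)
        if (i != writer)
            caches[i].state = INVALID;
}

int proc_read(int p) {
    if (caches[p].state == INVALID) {   /* read miss: fetch from memory */
        caches[p].value = global_mem;
        caches[p].state = VALID;
    }
    return caches[p].value;             /* read hit */
}

void proc_write(int p, int v) {
    caches[p].value = v;    /* update the local copy */
    caches[p].state = VALID;
    global_mem = v;         /* write-through: memory is always current */
    bus_invalidate(p);      /* write-invalidate: remote copies are dropped */
}

int main(void) {
    proc_read(0);
    proc_read(1);                 /* both caches now hold the value 5 */
    proc_write(0, 9);             /* P0 writes; P1's copy is invalidated */
    printf("P1 re-reads and sees %d\n", proc_read(1));   /* 9, not stale 5 */
    return 0;
}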
Directory based Protocols
Updating or invalidating caches using snoopy protocols might become
impractical, owing to the nature of some interconnection networks and the
size of the shared memory system.
For example, when a multistage network is used to build a large shared
memory system, the broadcasting techniques used in the snoopy
protocols become very expensive.
In such situations, coherence commands need to be sent to only those
caches that might be affected by an update. This is the idea behind
directory-based protocols.
Directory based Protocols
Cache coherence protocols that somehow store information on where
copies of blocks reside are called directory schemes.
A directory is a data structure that maintains information on the
processors that share a memory block and on its state. The information
maintained in the directory could be either centralized or distributed.
A central directory maintains information about all blocks in a central
data structure. While a central directory keeps everything in one
location, it becomes a bottleneck and suffers from long search times.
To alleviate this problem, the same information can be handled in a
distributed fashion by allowing each memory module to maintain a
separate directory. In a distributed directory, the entry associated with a
memory block has only one pointer to one of the caches that requested
the block.
Fully Mapped Directory
Each directory entry keeps enough state (one presence bit per processor,
plus a dirty bit) that every cache in the system can simultaneously hold a
copy of any block.
Limited Directory
Each directory entry holds a fixed number of pointers to caches,
regardless of the number of processors, which bounds the directory's size.
Chained Directory
Emulates the fully mapped scheme by distributing the directory among the
caches as a linked list: the memory entry points to one cache, which in
turn points to the next holder of the block.
Invalidation Protocols
Coherence is maintained by invalidating all other cached copies of a block
before a write to it is allowed to complete.
Centralized Directory Invalidation
When a processor issues a write, the central directory sends invalidation
signals to every cache that holds a copy of the block, as recorded by the
presence bits, before granting the write (see the sketch below).
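A minimal sketch of a centralized, fully mapped directory that invalidates on a write, under simplifying assumptions (a single memory block, presence bits kept in a plain array, and an invalidation modeled as clearing a presence bit; all names are hypothetical):

#include <stdio.h>
#include <stdbool.h>

#define NPROC 4

/* Full-map directory entry for one memory block: one presence bit
   per processor plus a dirty bit. */
typedef struct {
    bool present[NPROC];   /* which caches hold a copy of the block */
    bool dirty;            /* true if one cache holds a modified copy */
} DirEntry;

DirEntry dir = {{false}, false};
int cached_copy[NPROC];    /* each processor's private copy of the block */
int global_mem = 5;

void dir_read(int p) {
    cached_copy[p] = global_mem;   /* fetch the block */
    dir.present[p] = true;         /* directory records the new sharer */
}

void dir_write(int p, int v) {
    /* Centralized directory invalidation: clearing a presence bit here
       stands in for sending an invalidation signal to that cache. */
    for (int i = 0; i < NPROC; i++)
        if (i != p && dir.present[i])
            dir.present[i] = false;
    cached_copy[p] = v;
    dir.present[p] = true;   /* p now holds the only, modified copy */
    dir.dirty = true;
}

int main(void) {
    dir_read(0); dir_read(1); dir_read(2);   /* three sharers recorded */
    dir_write(1, 9);             /* P1 writes; P0 and P2 are invalidated */
    for (int i = 0; i < NPROC; i++)
        printf("P%d present = %d\n", i, (int)dir.present[i]);
    return 0;
}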