Chapters 4.1, 4.2, 4.3, 4.4, 4.5 - Hesham & Mostafa
Shared Memory Architecture
Nadeem Kafi
All material is taken from the textbooks & the Internet
Shared Memory Systems
Communication between tasks running on
different processors is performed through
writing to and reading from the global
memory.
All inter-processor coordination and
synchronization is also accomplished via
the global memory.
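As a minimal sketch (not taken from the slides), two threads standing in for tasks on different processors can communicate and coordinate purely through loads and stores to shared locations. The POSIX threads and C11 atomics below are one possible rendering, and every name in it is illustrative:

#include <stdio.h>
#include <pthread.h>
#include <stdatomic.h>

int shared_data = 0;    /* a location in the global (shared) memory */
atomic_int ready = 0;   /* a flag in shared memory used for coordination */

void *producer(void *arg) {
    shared_data = 42;   /* communicate by writing to global memory */
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

void *consumer(void *arg) {
    /* coordinate by reading the flag from global memory */
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        ;               /* spin until the producer signals */
    printf("consumer read %d from shared memory\n", shared_data);
    return NULL;
}

int main(void) {
    pthread_t p, c;
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}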
Characteristics of Shared Memory Systems
• Any processor can directly reference any memory
location.
• Communication occurs implicitly as a result of loads and stores.
• Location of data in memory is transparent to the
programmer.
• Inherently provided on a wide range of platforms (standard
processors today include dedicated hardware support for shared
memory systems).
• Memory may be physically distributed among
processors.
Shared Memory Systems
Two main problems need to be addressed when designing a shared
memory system: (1) performance degradation due to contention, and
(2) coherence problems.
Performance degradation might happen when multiple processors are
trying to access the shared memory simultaneously. A typical design
might use caches to solve the contention problem.
However, having multiple copies of data, spread throughout the caches,
might lead to a coherence problem.
The copies in the caches are coherent if they are all equal to the same
value. However, if one of the processors writes over the value of one of
the copies, then the copy becomes inconsistent because it no longer
equals the value of the other copies.
Classification of Shared Memory
UMA (Uniform Memory Access)
Each processor has equal opportunity to read/write to
memory, including equal access speed.
NUMA (Nonuniform Memory Access)
Memory access time depends on where the data resides: each processor
reaches its own local memory faster than remote memory attached to
other processors.
COMA (Cache-Only Memory Architecture)
There is no memory hierarchy and the address space is made of all the
caches. There is a cache directory (D) that helps in remote cache access.
Shared Memory Requirements
• Support for memory coherency
– The machine must make sure that all of the
processing nodes have an accurate picture of the
most up-to-date memory.
• Support for atomic operations on data
– The machine must allow only one processor at a time to
change a given piece of data (see the sketch after this list).
– Non-atomic operation: One processor requests data
and before the request is answered, another
processor changes that data.
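A minimal sketch of the contrast, assuming POSIX threads and C11 atomics (thread and iteration counts are illustrative): the plain counter's read-modify-write can be interleaved by another processor, exactly the non-atomic situation described above, while atomic_fetch_add performs the whole update as one indivisible operation.

#include <stdio.h>
#include <pthread.h>
#include <stdatomic.h>

#define NTHREADS 4
#define NITER    100000

int plain_counter = 0;           /* non-atomic: updates can be lost */
atomic_int atomic_counter = 0;   /* atomic: one change at a time */

void *worker(void *arg) {
    for (int i = 0; i < NITER; i++) {
        plain_counter++;   /* data race: another processor may change the
                              value between this read and this write */
        atomic_fetch_add(&atomic_counter, 1);   /* indivisible update */
    }
    return NULL;
}

int main(void) {
    pthread_t t[NTHREADS];
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    printf("plain:  %d (often less than %d)\n", plain_counter, NTHREADS * NITER);
    printf("atomic: %d (always %d)\n", atomic_counter, NTHREADS * NITER);
    return 0;
}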
Shared Memory Design
There are two types of interconnection
network designs:
• Bus-based or
• switch-based
Bus-based SMP
The bus/cache architecture alleviates the need for expensive multi-ported
memories and interface circuitry as well as the need to adopt a message-
passing paradigm when developing application software.
However, the bus may get saturated if multiple processors are trying to
access the shared memory (via the bus) simultaneously.
A typical bus-based design uses caches to solve the bus contention
problem. High speed caches connected to each processor on one side
and the bus on the other side mean that local copies of instructions and
data can be supplied at the highest possible rate.
Hit rate & Miss rate of a cache
A request satisfied by the cache is a hit; the fraction of requests that hit is
the hit rate h, and the remaining fraction, 1 - h, is the miss rate. If a
request is not satisfied by the cache (a miss), the data must be copied from
the global memory, across the bus, into the cache, and then passed on to
the local processor.
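With these rates, the average access time seen by a processor follows the standard relation below (t_c and t_m are assumed names here for the cache and global-memory access times):

t_avg = h * t_c + (1 - h) * t_m

A high hit rate keeps most references off the bus, which is what allows several processors to share it.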
Bus-based SMP
The maximum number of processors with cache memories that the
bus can support is given by the relation

N = B / ((1 - h) * v)

where:
h = hit rate of each cache
v = number of memory references each processor generates per second
B = bandwidth of the bus, in memory references per second

Each processor places (1 - h) * v references per second on the bus, so the
bus saturates once N such processors are attached.
Example
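A minimal worked example in C, using illustrative values (h, v, and B below are assumptions, not numbers from the original slide):

#include <stdio.h>

int main(void) {
    double h = 0.97;   /* assumed cache hit rate */
    double v = 500e6;  /* assumed memory references per second per processor */
    double B = 1e9;    /* assumed bus bandwidth, references per second */

    /* Each processor puts (1 - h) * v references/s on the bus;
       the bus saturates at B references/s. */
    double N = B / ((1.0 - h) * v);
    printf("Maximum processors the bus can support: %.1f\n", N);
    return 0;
}

With these assumed numbers, each processor generates 15 million bus references per second, so the bus supports about 66 processors.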
Caches and Cache Coherence
• Caches play key role in all cases
– Reduce average data access time
– Reduce bandwidth demands placed on shared
interconnect
• But private processor caches create a problem
– Copies of a variable can be present in multiple caches
– A write by one processor may not become visible to
others
• They’ll keep accessing the stale value in their caches
– Cache coherence problem
– Need to take actions to ensure visibility (see the sketch after this list)
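A minimal software model of this problem, under simplifying assumptions (one shared location, two single-entry write-back caches, and no coherence actions; every name below is hypothetical):

#include <stdio.h>

/* Toy model: two private caches with no coherence protocol.
   Each cache holds one private copy of the shared variable. */
typedef struct { int value; int valid; } Cache;

int global_mem = 5;   /* the shared memory location */

int cache_read(Cache *c) {
    if (!c->valid) {            /* miss: fetch from global memory */
        c->value = global_mem;
        c->valid = 1;
    }
    return c->value;            /* hit: serve the private copy */
}

void cache_write_back(Cache *c, int v) {
    c->value = v;               /* update only the private copy; */
    c->valid = 1;               /* global memory is updated later */
}

int main(void) {
    Cache p0 = {0, 0}, p1 = {0, 0};
    cache_read(&p0);            /* P0 caches the value 5 */
    cache_read(&p1);            /* P1 caches the value 5 */
    cache_write_back(&p0, 9);   /* P0 writes 9 to its copy only */

    /* With no coherence actions, P1 keeps reading the stale value. */
    printf("P0 sees %d, P1 sees %d\n", cache_read(&p0), cache_read(&p1));
    return 0;
}

Running this prints "P0 sees 9, P1 sees 5": exactly the stale-copy situation that coherence protocols are designed to prevent.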
Cache Memory Coherence
(Figure: examples of cache coherence)

Shared Memory System Coherence
The four combinations to maintain coherence among
all caches and global memory are:
• Write-update with write-through
• Write-update with write-back
• Write-invalidate with write-through
• Write-invalidate with write-back
Snooping Protocols for Cache Coherence
Snooping protocols are based on watching bus activities and carrying out
the appropriate coherency commands when necessary.
Global memory is moved in blocks, and each block has a state associated
with it, which determines what happens to the entire contents of the block.
The state of a block might change as a result of the operations Read-Miss,
Read-Hit, Write-Miss, and Write-Hit. A cache miss means that the
requested block is not in the cache or it is in the cache but has been
invalidated.
Snooping protocols differ in whether they update or invalidate shared
copies in remote caches in the case of a write operation.
They also differ as to where to obtain the new data in the case of a cache
miss.
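A minimal sketch of a write-invalidate, write-through snooping protocol under simplifying assumptions (single-block caches with just two states, and the snooping bus modeled as a function call; all names are hypothetical). A write-update variant would broadcast the new value to remote caches instead of invalidating their copies:

#include <stdio.h>

#define NPROC 3

/* Single-block caches with two states: enough to show invalidation. */
typedef enum { INVALID, VALID } State;
typedef struct { State state; int value; } Cache;

int global_mem = 5;
Cache caches[NPROC];   /* zero-initialized: all start INVALID */

/* Every cache "snoops" the bus: a broadcast invalidation removes
   all remote copies of the block. */
void bus_invalidate(int writer) {
    for (int i = 0; i < NPROC; i++)
        if (i != writer)
            caches[i].state = INVALID;
}

int proc_read(int p) {
    if (caches[p].state == INVALID) {   /* read miss: fetch from memory */
        caches[p].value = global_mem;
        caches[p].state = VALID;
    }
    return caches[p].value;             /* read hit */
}

void proc_write(int p, int v) {
    caches[p].value = v;    /* update the local copy */
    caches[p].state = VALID;
    global_mem = v;         /* write-through: memory is always current */
    bus_invalidate(p);      /* write-invalidate: remote copies are dropped */
}

int main(void) {
    proc_read(0);
    proc_read(1);                 /* both caches now hold the value 5 */
    proc_write(0, 9);             /* P0 writes; P1's copy is invalidated */
    printf("P1 re-reads and sees %d\n", proc_read(1));   /* 9, not stale 5 */
    return 0;
}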
Directory based Protocols
Updating or invalidating caches using snoopy protocols might become
impractical, owing to the nature of some interconnection networks and the
size of the shared memory system.
For example, when a multistage network is used to build a large shared
memory system, the broadcasting techniques used in the snoopy
protocols become very expensive.
In such situations, coherence commands need to be sent to only those
caches that might be affected by an update. This is the idea behind
directory-based protocols.
Directory based Protocols
Cache coherence protocols that somehow store information on where
copies of blocks reside are called directory schemes.
A directory is a data structure that maintains information on the
processors that share a memory block and on its state. The information
maintained in the directory could be either centralized or distributed.
A central directory maintains information about all blocks in a central
data structure. While a central directory keeps everything in one
location, it becomes a bottleneck and suffers from long search times.
To alleviate this problem, the same information can be handled in a
distributed fashion by allowing each memory module to maintain a
separate directory. In a distributed directory, the entry associated with a
memory block has only one pointer to one of the caches that requested
the block.
Fully Mapped Directory
Each directory entry keeps enough state (one presence bit per processor,
plus a dirty bit) that every cache in the system can simultaneously hold a
copy of any block.
Limited Directory
Each directory entry holds a fixed number of pointers to caches,
regardless of the number of processors, which bounds the directory's size.
Chained Directory
Emulates the fully mapped scheme by distributing the directory among the
caches as a linked list: the memory entry points to one cache, which in
turn points to the next holder of the block.
Invalidation Protocols
Coherence is maintained by invalidating all other cached copies of a block
before a write to it is allowed to complete.
Centralized Directory Invalidation
When a processor issues a write, the central directory sends invalidation
signals to every cache that holds a copy of the block, as recorded by the
presence bits, before granting the write (see the sketch below).
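A minimal sketch of a centralized, fully mapped directory that invalidates on a write, under simplifying assumptions (a single memory block, presence bits kept in a plain array, and an invalidation modeled as clearing a presence bit; all names are hypothetical):

#include <stdio.h>
#include <stdbool.h>

#define NPROC 4

/* Full-map directory entry for one memory block: one presence bit
   per processor plus a dirty bit. */
typedef struct {
    bool present[NPROC];   /* which caches hold a copy of the block */
    bool dirty;            /* true if one cache holds a modified copy */
} DirEntry;

DirEntry dir = {{false}, false};
int cached_copy[NPROC];    /* each processor's private copy of the block */
int global_mem = 5;

void dir_read(int p) {
    cached_copy[p] = global_mem;   /* fetch the block */
    dir.present[p] = true;         /* directory records the new sharer */
}

void dir_write(int p, int v) {
    /* Centralized directory invalidation: clearing a presence bit here
       stands in for sending an invalidation signal to that cache. */
    for (int i = 0; i < NPROC; i++)
        if (i != p && dir.present[i])
            dir.present[i] = false;
    cached_copy[p] = v;
    dir.present[p] = true;   /* p now holds the only, modified copy */
    dir.dirty = true;
}

int main(void) {
    dir_read(0); dir_read(1); dir_read(2);   /* three sharers recorded */
    dir_write(1, 9);             /* P1 writes; P0 and P2 are invalidated */
    for (int i = 0; i < NPROC; i++)
        printf("P%d present = %d\n", i, (int)dir.present[i]);
    return 0;
}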