Digital Design &
Computer Architecture
Sarah Harris & David Harris
Chapter 8:
Memory Systems
Chapter 8 :: Topics
• Introduction
• Memory System Performance
Analysis
• Caches
• Virtual Memory
• MemoryMapped I/O
• Summary
2 Digital Design & Computer Architecture Memory Systems
Introduction
• Computer performance depends on:
– Processor performance
– Memory system performance
Processor / Memory Interface:
CLK CLK
MemWrite WE
Address ReadData
Processor Memory
WriteData
3 Digital Design & Computer Architecture Memory Systems
ProcessorMemory Gap
• In prior chapters, assumed access memory in 1
clock cycle – but hasn’t been true since the
1980’s.
4 Digital Design & Computer Architecture Memory Systems
Memory System Challenge
• Make memory system appear as fast as
processor
• Use hierarchy of memories
• Ideal memory:
– Fast
– Cheap (inexpensive)
– Large (capacity)
• But can only choose two!
5 Digital Design & Computer Architecture Memory Systems
Memory Hierarchy
CLK Processor Chip
Main
Hard
CPU Cache Memory
Disk
Technology Price / GB Access Time (ns) Bandwidth (GB/s)
SRAM $100 0.2 3 100+
Cache
DRAM $3 10 50 30
Speed
Main Memory
SSD $0.10 20,000 0.05 3
HDD $0.03 5,000,000 0.001 0.1
Virtual Memory
Capacity
6 Digital Design & Computer Architecture Memory Systems
Locality
Exploit locality to make memory accesses fast:
• Temporal Locality:
– Locality in time
– If data used recently, likely to use it again soon
– How to exploit: keep recently accessed data in higher
levels of memory hierarchy
• Spatial Locality:
– Locality in space
– If data used recently, likely to use nearby data soon
– How to exploit: when access data, bring nearby data
into higher levels of memory hierarchy too
7 Digital Design & Computer Architecture Memory Systems
Chapter 7: Microarchitecture
Memory Performance
Memory Performance
• Hit: data found in that level of memory hierarchy
• Miss: data not found (must go to next level)
Hit Rate = # hits / # memory accesses
= 1 – Miss Rate
Miss Rate = # misses / # memory accesses
= 1 – Hit Rate
• Average memory access time (AMAT): average
time for processor to access data
AMAT = tcache + MRcache[tMM + MRMM(tVM)]
9 Digital Design & Computer Architecture Memory Systems
Memory Performance Example 1
• A program has 2,000 loads and stores
• 1,250 of these data values in cache
• Rest supplied by other levels of memory
hierarchy
• What are the cache hit and miss rates?
Hit Rate = 1250/2000 = 0.625
Miss Rate = 750/2000 = 0.375 = 1 – Hit Rate
10 Digital Design & Computer Architecture Memory Systems
Memory Performance Example 2
• Suppose processor has 2 levels of hierarchy:
cache and main memory
• tcache = 1 cycle, tMM = 100 cycles
• What is the AMAT (average memory access
time) of the program from Example 1?
AMAT = tcache + MRcache(tMM)
= [1 + 0.375(100)] cycles
= 38.5 cycles
11 Digital Design & Computer Architecture Memory Systems
Gene Amdahl, 1922
• Amdahl’s Law: the effort
spent increasing the
performance of a
subsystem is wasted
unless the subsystem
affects a large percentage
of overall performance
• Cofounded 3 companies,
including one called
Amdahl Corporation in
1970
12 Digital Design & Computer Architecture Memory Systems
Chapter 7: Microarchitecture
Caches
Cache
• Highest level in memory hierarchy
• Fast (typically ~ 1 cycle access time)
• Ideally supplies most data to processor
• Usually holds most recently accessed data
CLK Processor Chip
Main
Hard
CPU Cache Memory
Disk
14 Digital Design & Computer Architecture Memory Systems
Cache Design Questions
• What data is held in the cache?
• How is data found?
• What data is replaced?
We focus on data loads, but stores follow the same principles.
15 Digital Design & Computer Architecture Memory Systems
What data is held in the cache?
• Ideally, cache anticipates needed data and
puts it in cache
• But impossible to predict future
• Use past to predict future – temporal and
spatial locality:
– Temporal locality: copy newly accessed data into
cache
– Spatial locality: copy neighboring data into cache
too
16 Digital Design & Computer Architecture Memory Systems
Cache Terminology
• Capacity (C):
– number of data bytes in cache
• Block size (b):
– bytes of data brought into cache at once
• Number of blocks (B = C/b):
– number of blocks in cache: B = C/b
• Degree of associativity (N):
– number of blocks in a set
• Number of sets (S = B/N):
– each memory address maps to exactly one cache set
17 Digital Design & Computer Architecture Memory Systems
How is data found?
• Cache organized into S sets
• Each memory address maps to exactly one set
• Caches categorized by # of blocks in a set:
– Direct mapped: 1 block per set
– Nway set associative: N blocks per set
– Fully associative: all cache blocks in 1 set
• Examine each organization for a cache with:
– Capacity (C = 8 words)
– Block size (b = 1 word)
– So, number of blocks (B = 8)
18 Digital Design & Computer Architecture Memory Systems
Example Cache Parameters
• C = 8 words (capacity)
• b = 1 word (block size)
• So, B = 8 (# of blocks)
Ridiculously small, but will illustrate organizations
19 Digital Design & Computer Architecture Memory Systems
Chapter 7: Microarchitecture
DirectMapped Caches
Direct Mapped Cache
Address
11...11111100 mem[0xFF...FC]
11...11111000 mem[0xFF...F8]
11...11110100 mem[0xFF...F4]
11...11110000 mem[0xFF...F0]
11...11101100 mem[0xFF...EC]
11...11101000 mem[0xFF...E8]
11...11100100 mem[0xFF...E4]
11...11100000 mem[0xFF...E0]
00...00100100 mem[0x00...24]
00...00100000 mem[0x00..20] Set Number
00...00011100 mem[0x00..1C] 7 (111)
00...00011000 mem[0x00...18] 6 (110)
00...00010100 mem[0x00...14] 5 (101)
00...00010000 mem[0x00...10] 4 (100)
00...00001100 mem[0x00...0C] 3 (011)
00...00001000 mem[0x00...08] 2 (010)
00...00000100 mem[0x00...04] 1 (001)
00...00000000 mem[0x00...00] 0 (000)
230 Word Main Memory 23 Word Cache
21 Digital Design & Computer Architecture Memory Systems
Direct Mapped Cache Hardware
Byte
Tag Set Offset
Memory
00
Address
27 3
V Tag Data
8-entry x
(1+27+32)-bit
SRAM
27 32
Hit Data
22 Digital Design & Computer Architecture Memory Systems
Direct Mapped Cache Performance
Byte
Tag Set Offset
Memory
00...00 001 00
Address 3
V Tag Data
0 Set 7 (111)
# RISC-V assembly code 0 Set 6 (110)
addi s0, zero, 5 0 Set 5 (101)
addi s1, zero, 0 0 Set 4 (100)
1 00...00 mem[0x00...0C] Set 3 (011)
LOOP: beq s0, zero, DONE
1 00...00 mem[0x00...08] Set 2 (010)
lw s2, 4(s1) 1 00...00 mem[0x00...04] Set 1 (001)
lw s3, 12(s1) 0 Set 0 (000)
lw s4, 8(s1)
addi s0, s0, -1 Miss Rate = 3/15
j LOOP = 20%
DONE:
Temporal Locality
Compulsory Misses
23 Digital Design & Computer Architecture Memory Systems
Direct Mapped Cache: Conflict Miss
Byte
Tag Set Offset
Memory
00...01 001 00
Address 3
V Tag Data
# RISC-V assembly code 0 Set 7 (111)
0 Set 6 (110)
addi s0, zero, 5
0 Set 5 (101)
addi s1, zero, 0 0 Set 4 (100)
LOOP: beq s0, zero, DONE 0 Set 3 (011)
lw s2, 0x4(s1) 0 Set 2 (010)
mem[0x00...04] Set 1 (001)
1 00...00
lw s4, 0x24(s1) mem[0x00...24]
0 Set 0 (000)
addi s0, s0, -1
j LOOP Miss Rate = 10/10
DONE:
= 100%
Conflict Misses
24 Digital Design & Computer Architecture Memory Systems
Chapter 7: Microarchitecture
Associative Caches
NWay Set Associative Cache
Byte
Tag Set Offset
Memory
00
Address Way 1 Way 0
28 2
V Tag Data V Tag Data
28 32 28 32
= =
0
Hit1 Hit0 Hit1
32
Hit
Conflict Misses
Data
26 Digital Design & Computer Architecture Memory Systems
NWay Set Assoc. Cache Performance
# RISC-V assembly code
addi s0, zero, 5
addi s1, zero, 0
LOOP: beq s0, zero, DONE Miss Rate = 2/10
lw s2, 0x4(s1) = 20%
lw s4, 0x24(s1)
addi s0, s0, -1 Associativity reduces
j LOOP
conflict misses
DONE:
Way 1 Way 0
V Tag Data V Tag Data
0 0 Set 3
0 0 Set 2
1 00...10 mem[0x00...24] 1 00...00 mem[0x00...04] Set 1
0 0 Conflict Misses Set 0
27 Digital Design & Computer Architecture Memory Systems
Fully Associative Cache
V Tag Data V Tag Data V Tag Data V Tag Data V Tag Data V Tag Data V Tag Data V Tag Data
Reduces conflict misses
Expensive to build
Conflict Misses
28 Digital Design & Computer Architecture Memory Systems
Chapter 7: Microarchitecture
Spatial Locality
Spatial Locality
• Increase block size:
– Block size, b = 4 words
– C = 8 words
– Direct mapped (1 block per set)
– Number of blocks, B = 2 (C/b = 8/4 = 2)
Block Byte
Tag Set Offset Offset
Memory
00
Address
27 2
V Tag Data
Set 1
Set 0
27 32 32 32 32
11
10
01
00
32
=
Hit Data
30 Digital Design & Computer Architecture Memory Systems
Cache with Larger Block Size
Block Byte
Tag Set Offset Offset
Memory
00
Address
27 2
V Tag Data
Set 1
Set 0
27 32 32 32 32
11
10
01
00
32
=
Hit Data
31 Digital Design & Computer Architecture Memory Systems
Cache Perf. with Spatial Locality
addi s0, zero, 5
addi s1, zero, 0 Miss Rate = 1/15
LOOP: beq s0, zero, DONE
= 6.67%
lw s2, 4(s1)
lw s3, 12(s1)
Larger blocks reduce
lw s4, 8(s1)
addi s0, s0, -1 compulsory misses
j LOOP through spatial locality
DONE: Tag
Block Byte
Set Offset Offset
Memory
00...00 0 11 00
Address
27 2
V Tag Data
0 Set 1
1 00...00 mem[0x00...0C] mem[0x00...08] mem[0x00...04] mem[0x00...00] Set 0
27 32 32 32 32
11
10
01
00
32
=
Hit Data
32 Digital Design & Computer Architecture Memory Systems
Types of Misses
• Compulsory: first time data accessed
• Capacity: cache too small to hold all data of
interest
• Conflict: data of interest maps to same
location in cache
Miss penalty: time it takes to retrieve a block from
lower level of hierarchy
33 Digital Design & Computer Architecture Memory Systems
Cache Organization Recap
• Capacity: C
• Block size: b
• Number of blocks in cache: B = C/b
• Number of blocks in a set: N
• Number of sets: S = B/N
Number of Ways Number of Sets
Organization (N) (S = B/N)
Direct Mapped 1 B
N-Way Set Associative 1 < N < B B/N
Fully Associative B 1
34 Digital Design & Computer Architecture Memory Systems
Chapter 7: Microarchitecture
Cache Replacement
Policy
Replacement Policy
• Cache is too small to hold all data of interest at
once
• If cache full: program accesses data X and evicts
data Y
• Capacity miss when access Y again
• How to choose Y to minimize chance of needing it
again?
– Least recently used (LRU) replacement: the least
recently used block in a set evicted
36 Digital Design & Computer Architecture Memory Systems
LRU Replacement
# RISC-V assembly
lw s1, 0x04(zero)
lw s2, 0x24(zero)
lw s3, 0x54(zero)
Way 1 Way 0
V U Tag Data V Tag Data
0 0 0 Set 3 (11)
0 0 0 Set 2 (10)
01 0 00...010 mem[0x00...24] 01 00...000 mem[0x00...04] Set 1 (01)
0 0 0 Set 0 (00)
(a)
Way 1 Way 0
V U Tag Data V Tag Data
0 0 0 Set 3 (11)
0 0 0 Set 2 (10)
1 1 00...010 mem[0x00...24] 1 00...101 mem[0x00...54] Set 1 (01)
0 0 0 Set 0 (00)
(b)
37 Digital Design & Computer Architecture Memory Systems
Chapter 7: Microarchitecture
Cache Summary
Cache Summary
• What data is held in the cache?
– Recently used data (temporal locality)
– Nearby data (spatial locality)
• How is data found?
– Set is determined by address of data
– Word within block also determined by address
– In associative caches, data could be in one of several
ways
• What data is replaced?
– Leastrecently used way in the set
39 Digital Design & Computer Architecture Memory Systems
Miss Rate Trends
• Bigger caches reduce capacity misses
• Greater associativity reduces conflict misses
Adapted from Patterson & Hennessy, Computer Architecture: A Quantitative Approach, 2011
40 Digital Design & Computer Architecture Memory Systems
Miss Rate Trends
• Bigger blocks reduce compulsory misses
• Bigger blocks increase conflict misses
41 Digital Design & Computer Architecture Memory Systems
Multilevel Caches
• Larger caches have lower miss rates, longer
access times
• Expand memory hierarchy to multiple levels of
caches
• Level 1: small and fast (e.g. 16 KB, 1 cycle)
• Level 2: larger and slower (e.g. 256 KB, 26
cycles)
• Most modern PCs have L1, L2, and L3 cache
42 Digital Design & Computer Architecture Memory Systems
Intel Pentium III Die
© Intel Corp.
43 Digital Design & Computer Architecture Memory Systems