ECE 511 - Fall 2019 - Lecture 5 - Cache Concept
ECE 511 - Fall 2019 - Lecture 5 - Cache Concept
Cache Concept
Khaled N. Khasawneh
George Mason University
Fall 2019
Cache Concept
3
Topics Covered
n The Memory System
q Memory Hierarchy
n Cache Basics
q Block size
Register
Register
Register
I-mem
D-mem
Reg
Decode File
Logic ALU
Adder
Pipeline
Instruction (Instruction Data
Supply Supply
execution)
6
Memory and Processor Performance Gap!
n Memory technology evolves slower!
7
8
Memory Speed
n In fact, memory is orders of magnitude slower (than the CPU).
q Processor speeds are on the order of Gigahertz (GHz).
Why?
1
1
Register File
n The register file is a fast memory.
q It operates at the speed of the processor.
q It can be read/written in a portion of one
clock cycle.
q Pipeline does not need to be stalled when
accessing the register file.
q (It is part of the processor datapath).
SRAM
q High power consumption. SRAM Bit-Cell
q Requires large area on die (chip).
DRAM
q Low power consumption. DRAM Bit-Cell
q Small size.
n DRAM is dense (small size per bit) and inexpensive, but slow.
q Minimize cost.
15
Analogy
L1 Cache
Main Memory
L1 Cache
L2 Cache
L3 Cache
L1 Cache
L2 Cache
16
1
7
Miss Rate and EMAT
n Miss rate
q The probability that the data or instruction needed by the processor is
not found in the high-speed cache (SRAM).
CACHE BASICS
Block and Block size
Cache hit and miss
Effective Memory Access Time (EMAT)
2
3
Memory (and Cache) Block
n Block (aka. Line)
q The smallest unit of memory that can be
transferred between the main memory and Processor
the cache. Registers
q Can be a single word or many bytes.
Cache
block
Byte addressed
by processor.
Main Memory
Entire block is
transferred to the
cache.
Main Memory
2
4
Memory (and Cache) Block
n The word size and block size can be different.
word
byte byte byte byte byte byte byte byte byte byte byte byte byte byte byte byte
block
2
5
Cache Lookup
n The processor specifies a main memory address when
q Fetching an instruction from program memory.
q Reading data from or writing data to data memory.
n The cache is searched, using the MM address, for the instruction/data
needed.
n Hit: a successful search of the cache.
lw $s0, 0x1040($zero)
n Hit rate (h): the probability of
finding the specified MM address in
MM Address Data Block To CPU the cache.
n Miss: an unsuccessful search of the
cache.
n Miss rate (m): the probability of not
finding the specified MM address in
0x1040 the cache.
q M=1–h
n Miss penalty: the time penalty
incurred when MM must be accessed
for the required instruction or data.
Cache
2
6
Effective Memory Access Time
n EMAT is the effective (or average) time experienced by
the processor when accessing the memory system.
Level 0
n Ti = access time of level i
n mi = miss rate of level i
Level 1
n EMATi+1 = effective
memory access time of
level i+1 Level 2
Level 3
2
8
Multi-level Cache
n The goal of the design of an hierarchical memory system is to
reduce the effective memory access time.
EMAT = TC + ( mC * TMM )
CACHE BASICS
How does the cache work?
3
0
Reading Data from Memory
n How does the CPU read data (and instructions) from memory?
n Processor issues a MM address.
n Cache is searched using the MM
address. Processor
q Address is also sent to the main Registers
memory.
n If the MM address is found (cache hit),
processor reads data from cache. Cache
q One clock cycle to read from cache.
n If the MM address is not found (miss),
data is transferred from MM to the
cache and the processor.
q Miss penalty = 100’s of clock cycles. Main Memory
q Data is copied into an unused cache
block.
q Or must overwrite a used cache block.
3
1
Reading Data from Memory
n How does the CPU read data (and instructions) from memory?
MEMORY ORGANIZATION
Memory is organized into blocks.
Calculating the main memory block number.
3
6
Main Memory Block Number
n Main memory and cache are organized into blocks.
q The block size is the same for both memories.
n The processor issues an a-bit main memory address.
n To determine the block number, the address can be
partitioned into two fields:
a bits
a – b = 13 bits b = 3 bits
Block 1858 Byte 7
a – b = 11 bits b = 5 bits
Block 464 Byte 23
CACHE ORGANIZATION
4
0
Cache Organization
n Each main memory block is mapped to a block in the cache.
n The cache can be organized in a variety of ways.
1. Fully-associative
q Each main memory block can be mapped to any cache block.
2. Direct-mapped
q Each main memory block is mapped to exactly one cache block.
3. Set-associative
q Each main memory block is mapped to a set of cache blocks.
4
1
Register
Register
Register
I-Cache D-
Cache
Reg
Decode File
Logic ALU
Adder
q Reads data from and writes data to the D-cache in the MEM stage.
Register
Register
Register
I-Cache D-
Cache
Reg
Decode File
Logic ALU
Adder
n The instruction and data caches make it possible for the IF and MEM
stages of the pipeline to have comparable clock cycle times to the
other stages of the pipeline.
q This assumes cache hits.
Register
Register
Register
I-Cache D-
Cache
Reg
Decode File
Logic ALU
Adder
Register
Register
Register
I-Cache D-
Cache
Reg
Decode File
Logic ALU
Adder
n Instruction fetch
n Load word instruction (read from memory)
q Write stall
n Store word instruction (write to memory)
4
7
Effect on Processor Performance
n Processor performance is measured using Execution Time (ET).
n Assuming a perfect cache (i.e. miss rate = 0 %)
n Thus, due to the non-ideal cache, the time required to execute the program
will increase by a factor of 2.902 / 1.8 = 1.61.
5
1
Cache Performance: Example #2
n Given the following system specifications:
q CPIAVG = 2.0 (does not include memory stalls)
n Thus, a processor with a perfect cache is 5.44 / 2.0 = 2.72 times faster than
a processor with the non-ideal cache described above.
5
4
Improving Cache Performance
n Improve cache performance to improve processor performance.
n Processor performance can be improved by reducing pipeline stalls,
including those due to memory access (memory stalls).
EMAT = TC + ( mC * TMM )
CACHE ORGANIZATIONS
The (hierarchical) memory system.
Cache is “hidden” memory.
Fully-associative, Direct-mapped, Set-associative
Main memory (and cache) is organized into blocks.
56
The (Hierarchical) Memory System
n The memory system is organized hierarchically:
57
Cache and Main Memory
n Cache is “hidden” memory.
Processor
faster expensive q Not visible to the programmer.
Registers q Not directly accessible to the
processor.
1. Fully-associative
q Each main memory block can be mapped to any cache block.
2. Direct-mapped
q Each main memory block is mapped to exactly one cache block.
3. Set-associative
q Each main memory block is mapped to a set of cache blocks.
59
Main Memory Block Number
n Main memory and cache are organized into blocks.
q The block size is the same for both memories.
a bits
DIRECT-MAPPED CACHE
Basics
Organization of a direct-mapped cache
Example
61
Direct-mapped Cache
n The cache is organized into blocks.
n Each main memory block maps to exactly one cache block.
q A cache block is identified by its cache index.
8 cache blocks
Block (data or
instructions) copied from
MM
62
Direct-mapped Cache
n The processor issues an a-bit main memory address.
n When accessing a direct-mapped cache, the address is partitioned
into three fields:
q Tag field: distinguishes between the different main memory
blocks that can map to the same cache block in a direct-mapped
cache.
q Cache index: identifies the cache block to which the main
memory block is mapped in a direct-mapped cache.
q Block offset: identifies a byte within the block.
Cache
H/M
Index V Tag Data
lbu $s0, 0x32($zero)
0 0
lbu $s0, 0xC9($zero) 1 0
Cache
H/M
Index V Tag Data
lbu $s0, 0x32($zero) M 0 0
lbu $s0, 0xC9($zero) 1 0
1 2 Byte 2
74
Direct-mapped Cache: Example
Memory Specifications: Processor
0xC9
Main Memory size = 256 bytes (a = 8)
Block size = 8 bytes (b = 3) Registers
Cache
H/M
Index V Tag Data
lbu $s0, 0x32($zero) M 0 0
lbu $s0, 0xC9($zero) M 1 1 6 D25
Cache
H/M
Index V Tag Data
lbu $s0, 0x32($zero) M 0 0
lbu $s0, 0xC9($zero) M 1 1 6 D25
2 1 7 D31
lbu $s0, 0xF4($zero) M
3 0
lbu $s0, 0x37($zero)
lbu $s0, 0x5D($zero) Main Memory
0 1 2 3 4 5 6 7
lbu $s0, 0x58($zero)
8 9 10 11 12 13 14 15
lbu $s0, 0x1E($zero) 16 17 18 19 20 21 22 23
D31
Block 31 11110 = 30
24 25 26 27 28 29 30 31
Cache
H/M
Index V Tag Data
lbu $s0, 0x32($zero) M 0 0
lbu $s0, 0xC9($zero) M 1 1 6 D25
2 1 1 D06
lbu $s0, 0xF4($zero) M
3 0
lbu $s0, 0x37($zero) M
lbu $s0, 0x5D($zero) Main Memory
0 1 2 3 4 5 6 7
lbu $s0, 0x58($zero)
8 9 10 11 12 13 14 15
lbu $s0, 0x1E($zero) 16 17 18 19 20 21 22 23
D06
Block 6 24 25 26 27 28 29 30 31
Cache
H/M
Index V Tag Data
lbu $s0, 0x32($zero) M 0 0
lbu $s0, 0xC9($zero) M 1 1 6 D25
2 1 1 D06
lbu $s0, 0xF4($zero) M
3 1 2 D11
lbu $s0, 0x37($zero) M
lbu $s0, 0x5D($zero) M Main Memory
0 1 2 3 4 5 6 7
lbu $s0, 0x58($zero)
8 9 10 11 12 13 14 15
lbu $s0, 0x1E($zero) 16 17 18 19 20 21 22 23
D11
Block 11 24 25 26 27 28 29 30 31
2 3 Byte 5
78
Direct-mapped Cache: Example
Memory Specifications: Processor
0x58
Main Memory size = 256 bytes (a = 8)
Block size = 8 bytes (b = 3) Registers
Cache
H/M
Index V Tag Data
lbu $s0, 0x32($zero) M 0 0
lbu $s0, 0xC9($zero) M 1 1 6 D25
2 1 1 D06
lbu $s0, 0xF4($zero) M D11
3 1 2 D11
lbu $s0, 0x37($zero) M
lbu $s0, 0x5D($zero) M Main Memory
0 1 2 3 4 5 6 7
lbu $s0, 0x58($zero) H 8 9 10 11 12 13 14 15
lbu $s0, 0x1E($zero) 16 17 18 19 20 21 22 23
Block 11 24 25 26 27 28 29 30 31
Cache
H/M
Index V Tag Data
lbu $s0, 0x32($zero) M 0 0
lbu $s0, 0xC9($zero) M 1 1 6 D25
2 1 1 D06
lbu $s0, 0xF4($zero) M
3 1 0 D03
lbu $s0, 0x37($zero) M
lbu $s0, 0x5D($zero) M Main Memory
0 1 2 3 4 5 6 7
lbu $s0, 0x58($zero) H 8 9 10 11 12 13 14 15
lbu $s0, 0x1E($zero) M 16 17 18 19 20 21 22 23
D03
Block 3 24 25 26 27 28 29 30 31
REPLACEMENT POLICY
Direct-mapped cache
81
Replacement Policy
n Empty block
q Use an “empty” cache block, if one is available.
CACHE SIZE
Direct-mapped cache: Example
83
Cache Size
n Memory system specifications: Direct-mapped Cache
q Main memory size = 1M byte Index V Tag Data
q Cache size = 128 bytes 0 0
q Block size = 8 bytes 1 0
q Direct-mapped cache 2 0
(One-way Set-associative cache). 3 0
(illustrated here:
4 sets with one block each)
n Calculate the total size of the cache (in bits).
n Determine the number of blocks in main memory.
q # blocks in MM = Main memory size / Block size
q # blocks in MM = 1M byte / 8 bytes = 220 / 23 = 217 = 131072
blocks.
n Determine the number of blocks in the cache.
q # blocks in cache = Cache size / Block size
q # blocks in cache = 128 bytes / 8 bytes = 16 blocks.
84
Cache Size
n Determine the number of address bits for the tag field.
q a = address bits = log2(memory size in bytes) = log2(1M bytes) = 20 bits.
q n = index = log2(cache size in blocks) = log2(16) = 4 bits.
q b = block offset = log2(block size in bytes) = log2(8 bytes) = 3 bits.
q t = tag = a – (b + n) = 20 – (3 + 4) = 13 bits.
n This implies that just 82.05% of the cache is used to store data.
n There may be a more efficient design for the cache.
q One in which a greater percentage of the cache is used to store data.
86
More Efficient Cache Design
n How can the size of the cache be reduced?
q This leads to a reduction in the cache overhead and total cache size.
(illustrated here:
n Calculate the total size of the cache (in bits). 4 sets with one block each)
n Determine the number of blocks in main memory.
q # blocks in MM = Main memory size / Block size
n By increasing the block size from 8 bytes to 32 bytes, the cache overhead was
decreased from 17.95 % to 5.19 % .
q This implies that nearly 95 % of the cache is now used for data.
q The total cache size was also decreased from 1248 bits to 1080 bits, which
DIRECT-MAPPED CACHE
The problem with a direct-mapped cache: Example.
92
Direct-mapped Cache: Example
n The code segment given below is executed on
a processor with a 4-block direct-mapped Cache specifications:
cache. • Cache size = 4 blocks
• Block size = 4 bytes
11
n Problem: the direct-mapped cache design allows each main memory block
to be placed in just one cache block (specified by the cache index).
FULLY-ASSOCIATIVE CACHE
Basics
Organization of fully-associative cache
Example
94
Fully-associative Cache
n The cache is organized into blocks.
n Each cache block contains storage space for the
q Tag field: contains the tag field of the MM block copied into the cache.
q Data: contains a copy of the data in the specified main memory block.
Tag field of block copied from MM Block (data or instructions) copied from MM
Tag Data Tag Data Tag Data Tag Data Tag Data Tag Data Tag Data Tag Data
8 cache blocks
95
Fully-associative Cache
n The processor issues an a-bit main memory address.
n When accessing a fully-associative cache, the address is partitioned into
two fields:
q Tag field: identifies the main memory block (i.e. the block number).
a bits
multiplexer
102
Fully-associative Cache
Cache Blocks
n Each block contains
q Data (or instructions)
30 q Metadata (V bit, tag)
n Valid bit
q Indicates validity of
V Tag Data V Tag Data V Tag Data corresponding block.
n Tag bits
30 32 30 32 q Identify the MM block
30 32 that was copied into
the cache block.
q Upper t bits of MM
addr.
n Data
q Block copied from MM.
multiplexer q Data or instructions.
103
Valid bit
n Blocks are copied from main memory to the cache (as needed).
multiplexer
Large OR gate and multiplexer is
difficult and costly to realize.
106
Fully-associative Cache: Example
Memory Specifications:
Main Memory size = 256 bytes (a = 8)
Block size = 8 bytes (b = 3) Processor
Registers
H/M
lbu $s0, 0x32($zero)
lbu $s0, 0x37($zero)
lbu $s0, 0x5B($zero)
Cache
lbu $s0, 0xC9($zero)
lbu $s0, 0x58($zero) 0 1 2 3 4 5 6 7
lbu $s0, 0xA4($zero) 8 9 10 11 12 13 14 15
16 17 18 19 20 21 22 23
lbu $s0, 0x3D($zero)
24 25 26 27 28 29 30 31
Block # Byte #
(Tag field)
107
Fully-associative Cache: Example
Memory Specifications:
Main Memory size = 256 bytes (a = 8)
Block size = 8 bytes (b = 3) Processor
0x32
Registers
H/M
lbu $s0, 0x32($zero) M
lbu $s0, 0x37($zero)
06 D06
lbu $s0, 0x5B($zero)
Cache
lbu $s0, 0xC9($zero)
lbu $s0, 0x58($zero) 0 1 2 3 4 5 6 7
lbu $s0, 0xA4($zero) 8 9 10 11 12 13 14 15
16 17 18 19 20 21 22 23
lbu $s0, 0x3D($zero) D06
24 25 26 27 28 29 30 31
Block 6 Byte 2
Processor is stalled 100’s of clock cycles.
(Tag field)
108
Fully-associative Cache: Example
Memory Specifications:
Main Memory size = 256 bytes (a = 8)
Block size = 8 bytes (b = 3) Processor
0x37
D06
Registers
H/M
lbu $s0, 0x32($zero) M
lbu $s0, 0x37($zero) H
06 D06
lbu $s0, 0x5B($zero)
Cache
lbu $s0, 0xC9($zero)
lbu $s0, 0x58($zero) 0 1 2 3 4 5 6 7
lbu $s0, 0xA4($zero) 8 9 10 11 12 13 14 15
16 17 18 19 20 21 22 23
lbu $s0, 0x3D($zero)
24 25 26 27 28 29 30 31
Block 11 Byte 3
Processor is stalled 100’s of clock cycles.
(Tag field)
110
Fully-associative Cache: Example
Memory Specifications:
Main Memory size = 256 bytes (a = 8)
Block size = 8 bytes (b = 3) Processor
0xC9
Registers
H/M
lbu $s0, 0x37($zero) M
lbu $s0, 0x32($zero) H
06 D06 11 D11 25 D25
lbu $s0, 0x5B($zero) M
Cache
lbu $s0, 0xC9($zero) M
lbu $s0, 0x58($zero) 0 1 2 3 4 5 6 7
lbu $s0, 0xA4($zero) 8 9 10 11 12 13 14 15
16 17 18 19 20 21 22 23
lbu $s0, 0x3D($zero) D25
24 25 26 27 28 29 30 31
Block 25 Byte 1
Processor is stalled 100’s of clock cycles.
(Tag field)
111
Fully-associative Cache: Example
Memory Specifications:
Main Memory size = 256 bytes (a = 8)
Block size = 8 bytes (b = 3) Processor
0x58
D11
Registers
H/M
lbu $s0, 0x37($zero) M
lbu $s0, 0x32($zero) H
06 D06 11 D11 25 D25
lbu $s0, 0x5B($zero) M
Cache
lbu $s0, 0xC9($zero) M
lbu $s0, 0x58($zero) H 0 1 2 3 4 5 6 7
lbu $s0, 0xA4($zero) 8 9 10 11 12 13 14 15
16 17 18 19 20 21 22 23
lbu $s0, 0x3D($zero)
24 25 26 27 28 29 30 31
Block 11 Byte 0
Processor is not stalled.
(Tag field)
112
Fully-associative Cache: Example
Memory Specifications:
Main Memory size = 256 bytes (a = 8)
Block size = 8 bytes (b = 3) Processor
0xA4
Registers
H/M
lbu $s0, 0x37($zero) M
lbu $s0, 0x32($zero) H
06 D06 11 D11 25 D25 20 D20
lbu $s0, 0x5B($zero) M
Cache
lbu $s0, 0xC9($zero) M
lbu $s0, 0x58($zero) H 0 1 2 3 4 5 6 7
lbu $s0, 0xA4($zero) M 8 9 10 11 12 13 14 15
16 17 18 19 20 21 22 23
lbu $s0, 0x3D($zero) D20
24 25 26 27 28 29 30 31
Block 20 Byte 4
Processor is stalled 100’s of clock cycles.
(Tag field)
113
Fully-associative Cache: Example
Memory Specifications:
Main Memory size = 256 bytes (a = 8)
Block size = 8 bytes (b = 3) Processor
0x3D
Registers
H/M
lbu $s0, 0x37($zero) M
To where should Block 7 be copied?
lbu $s0, 0x32($zero) H
06 D06 11 D11 25 D25 20 D20
lbu $s0, 0x5B($zero) M
Cache
lbu $s0, 0xC9($zero) M
lbu $s0, 0x58($zero) H 0 1 2 3 4 5 6 7
lbu $s0, 0xA4($zero) M 8 9 10 11 12 13 14 15
16 17 18 19 20 21 22 23
lbu $s0, 0x3D($zero) M D07
24 25 26 27 28 29 30 31
Block 7 Byte 5
Processor is stalled 100’s of clock cycles.
(Tag field)
114
REPLACEMENT POLICY
Fully-associative cache
115
Replacement Policy
n Empty block
q Use an “empty” cache block, if one is available.
n Random
q Gives approximately the same performance as LRU for high
associativity.
q Used for Fully-associative caches.
116
CACHE SIZE
Fully-associative cache: Example
117
Cache Size
n Memory system specifications:
q Main memory size = 1M byte
q Cache size = 128 bytes Fully-associative Cache
q Block size = 8 bytes V Tag Data
q Total bits in each cache block = data bits + metadata bits = 82 bits.
n This implies that just 78.05% of the cache is used to store data.
120
121
SET-ASSOCIATIVE CACHE
Basics
Organization of a set-associative cache
Example
122
Set-associative Cache
m-way Set-associative cache
n The cache is organized into sets.
q Each set contains m blocks for data or instructions.
2 blocks
Block 0 Block 1
cache index Index Tag Data Tag Data
m=4
130
Set-associative Cache
Cache
n Each cache set contains
q m blocks of data
q Metadata for each block
n Data (or instruction)
q Block copied from MM.
n Tag bits
q Identify the MM block
that is stored in the
cache block.
q Upper t bits of MM addr.
n Valid bit
q Indicates validity of
corresponding block.
n Index
q specifies location of the
m=4 set within the cache
131
Set-associative Cache
Cache Search (hit or miss)
n CPU issues a MM addr.
q Index field selects set.
Parallel comparison
q Tag field compared
against tag stored in all m
blocks of selected set.
q Valid bit checked
n If cache hit
q Processor reads data from
cache.
q Processor does not need
to be stalled.
n If cache miss
q CPU reads data from MM.
q MM block copied into
appropriate cache set.
m=4 q CPU must be stalled.
H/M
lbu $s0, 0x32($zero) Block 0 Block 1
Index V Tag Data V Tag Data
lbu $s0, 0xC9($zero) 0 0 0
lbu $s0, 0xF4($zero) 1 0 0
Block # 24 25 26 27 28 29 30 31
Main Memory
MM Address: 0000 0000
Cache specs: 2-way (w = 2), 2 sets (n = 1)
Tag index Byte #
133
Set-associative Cache: Example
Memory Specifications: Processor
0x32
Main Memory size = 256 bytes (a = 8)
Block size = 8 bytes (b = 3) Registers
H/M
Block 0 Block 1
lbu $s0, 0x32($zero) M
Index V Tag Data V Tag Data
lbu $s0, 0xC9($zero) 0 1 3 D06 0
lbu $s0, 0xF4($zero) 1 0 0
Main Memory
MM Address: 0011 0010
Processor is stalled 100’s of clock cycles.
3 0 2
134
Set-associative Cache: Example
Memory Specifications: Processor
0xC9
Main Memory size = 256 bytes (a = 8)
Block size = 8 bytes (b = 3) Registers
H/M
Block 0 Block 1
lbu $s0, 0x32($zero) M
Index V Tag Data V Tag Data
lbu $s0, 0xC9($zero) M 0 1 3 D06 0
lbu $s0, 0xF4($zero) 1 1 12 D25 0
Main Memory
MM Address: 1100 1001
Processor is stalled 100’s of clock cycles.
12 1 1
135
Set-associative Cache: Example
Memory Specifications: Processor
0xF4
Main Memory size = 256 bytes (a = 8)
Block size = 8 bytes (b = 3) Registers
H/M
Block 0 Block 1
lbu $s0, 0x32($zero) M
Index V Tag Data V Tag Data
lbu $s0, 0xC9($zero) M 0 1 3 D06 1 15 D30
lbu $s0, 0xF4($zero) M 1 1 12 D25 0
Main Memory
MM Address: 1111 0100
Processor is stalled 100’s of clock cycles.
15 0 4
136
Set-associative Cache: Example
Memory Specifications: Processor
0x37 D06
Main Memory size = 256 bytes (a = 8)
Block size = 8 bytes (b = 3) Registers
H/M
Block 0 Block 1
lbu $s0, 0x32($zero) M
Index V Tag Data V Tag Data
lbu $s0, 0xC9($zero) M 0 1 3 D06 1 15 D30
lbu $s0, 0xF4($zero) M 1 1 12 D25 0
Block 6 24 25 26 27 28 29 30 31
Main Memory
MM Address: 0011 0010
3 0 2 Processor is not stalled.
137
Set-associative Cache: Example
Memory Specifications: Processor
0x5D
Main Memory size = 256 bytes (a = 8)
Block size = 8 bytes (b = 3) Registers
H/M
Block 0 Block 1
lbu $s0, 0x32($zero) M
Index V Tag Data V Tag Data
lbu $s0, 0xC9($zero) M 0 1 3 D06 1 15 D30
lbu $s0, 0xF4($zero) M 1 1 12 D25 1 5 D11
Main Memory
MM Address: 0101 1101
Processor is stalled 100’s of clock cycles.
5 1 5
138
Set-associative Cache: Example
Memory Specifications: Processor
0x58 D11
Main Memory size = 256 bytes (a = 8)
Block size = 8 bytes (b = 3) Registers
H/M
Block 0 Block 1
lbu $s0, 0x32($zero) M
Index V Tag Data V Tag Data
lbu $s0, 0xC9($zero) M 0 1 3 D06 1 15 D30
lbu $s0, 0xF4($zero) M 1 1 12 D25 1 5 D11
Block 11 24 25 26 27 28 29 30 31
Main Memory
MM Address: 0101 1000
Processor is not stalled.
5 1 0
139
Set-associative Cache: Example
Memory Specifications: Processor
0x58
Main Memory size = 256 bytes (a = 8)
Block size = 8 bytes (b = 3) Registers
Main Memory
MM Address: 0001 1110
Processor is stalled 100’s of clock cycles.
1 1 6
140
REPLACEMENT POLICY
Set-associative cache
141
Replacement Policy
n Empty block
q Use an “empty” cache block, if one is available.
n Random
q Gives approximately the same performance as LRU for high
associativity.
q Used for Fully-associative caches.
142
SPECTRUM OF ASSOCIATIVITY
Set-associative cache
143
Spectrum of Associativity
n Consider a cache with 8 blocks for data storage.
m=2
m=1
m=4
m=8
144
Spectrum of Associativity
n Memory system specifications:
q Main memory size = 64K bytes -> a = log2(64K) = 16 bits
q Cache size = 128 bytes (cache size is fixed).
q Block size = 16 bytes -> b = log2(16) = 4 bits
Cache Type Cache Entries Ways (m) Tag bits Index bits Block Offset
(b)
Direct Mapped 8 1 9 3 4
Two-way
4 2 10 2 4
Set Associative
Four-way
2 4 11 1 4
Set Associative
Fully Associative 1 8 12 0 4
145
How much Associativity?
n Increased associativity decreases miss rate.
q But with diminishing returns.
CACHE SIZE
Set-associative cache: Example
147
Cache Size Set-associative Cache
Block 0 Block 1
q t = tag = a – (b + n) = 20 – (3 + 3) = 14 bits.
149
Cache Size
n Determine the number of bits in each cache block.
q Data bits = block size = 8 bytes * 8 bits/byte = 64 bits.
q Metadata bits = Valid bit + Tag bits = 1 + 14 = 15 bits.
q Total bits in each cache block = data bits + metadata bits = 79 bits.
n This implies that just 81.01% of the cache is used to store data.
n As previously discussed, increasing the block size will reduce the cache
overhead and total cache size.
q It may also lead to a reduction in the cache miss rate.
151
Questions?