
King Khalid University, Computer Engineering Department 9 January 2023

Computer Architecture: A Quantitative Approach, Fifth Edition
Chapter 2
Memory Hierarchy Design
(Pages 78-125) + Appendix B

Dr. Abdulmajeed Farea Aljunaid

Introduction
 Programmers want unlimited amounts of memory with
low latency
 Fast memory technology is more expensive per bit than
slower memory
 Solution: organize memory system into a hierarchy
 Entire addressable memory space available in largest, slowest
memory
 Incrementally smaller and faster memories, each containing a
subset of the memory below it, proceed in steps up toward the
processor
 Temporal and spatial locality ensure that nearly all
references can be found in the smaller memories
 Gives the illusion of a large, fast memory being presented to the
processor


Memory Hierarchy

(Figure: levels of the memory hierarchy)


Memory Performance Gap

(Figure: the processor-memory performance gap; annotated "high-bandwidth memory")
Memory Hierarchy Design


 Memory hierarchy design becomes more crucial
with recent multi-core processors:
 Aggregate peak bandwidth grows with # cores:
 Intel Core i7 can generate two data memory references per
core per clock

 Four cores and a 3.2 GHz clock:
 25.6 billion 64-bit data references/second +
 12.8 billion 128-bit instruction references/second
 = 409.6 GB/s!

 This can be achieved by:


 Multi-port, pipelined caches
 Two levels of cache per core
 Shared third-level cache on chip
 But, DRAM bandwidth is only 8% of 409.6 GB/s = 34.1
GB/s
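To make the bandwidth arithmetic above concrete, here is a small illustrative sketch (Python; the figures are simply the slide's assumptions restated):

# Peak bandwidth demand of a 4-core, 3.2 GHz Core i7 (figures from the slide above)
cores = 4
clock_hz = 3.2e9
data_refs_per_core_per_clock = 2            # 64-bit (8-byte) data references
inst_refs_per_sec = 12.8e9                  # 128-bit (16-byte) instruction references

data_refs_per_sec = cores * clock_hz * data_refs_per_core_per_clock   # 25.6e9
total_bytes_per_sec = data_refs_per_sec * 8 + inst_refs_per_sec * 16

print(total_bytes_per_sec / 1e9)            # 409.6 GB/s of demand
print(34.1 / (total_bytes_per_sec / 1e9))   # ~0.083: DRAM supplies only about 8% of it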

Introduction
Performance and Power
 High-end microprocessors have >10 MB on-chip
cache
 Consumes a large amount of the area and power budget
 Thus, more designs must consider both performance
and power trade-offs

Introduction

Memory Hierarchy Basics


 When a word is not found in the cache, a miss
occurs:
 Fetch word from lower level in hierarchy, requiring a
higher latency reference
 Lower level may be another cache or the main
memory
 Also fetch the other words contained within the block
 Takes advantage of spatial locality
 Place the block into the cache in any location within its set, as
determined by the address (set-associative mapping)
 Set number = (block address) MOD (number of sets) (see the sketch below)
 See Appendix B for More details
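A minimal sketch of that placement rule (illustrative Python; the 4-set cache is a made-up parameter, not from the text):

def cache_set(block_address, num_sets):
    # Set-associative placement: the block may go anywhere within this one set
    return block_address % num_sets

print(cache_set(12, 4))   # block 12 in a 4-set cache lands in set 0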


Introduction
Memory Hierarchy Basics
 n blocks per set => n-way set associative
 Direct-mapped cache => one block per set (one-way)
 Fully associative => one set
 Place block into cache in any location within its set, determined by
address
 block address MOD number of sets
 Hit: data appears in some block in the upper level (example: Block X)
 Hit Rate: the fraction of memory accesses found in the upper level
 Hit Time: time to access the upper level, which consists of
RAM access time + time to determine hit/miss
 Miss: data needs to be retrieved from a block in the lower level
(example: Block Y)
 Miss Rate = 1 - (Hit Rate)
 Miss Penalty: time to replace a block in the upper level +
time to deliver the block to the processor
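These terms combine into the usual average memory access time (AMAT) relation from Appendix B; a tiny sketch with made-up numbers:

def amat(hit_time, miss_rate, miss_penalty):
    # Average memory access time = hit time + miss rate * miss penalty
    return hit_time + miss_rate * miss_penalty

print(amat(hit_time=1.0, miss_rate=0.05, miss_penalty=20.0))   # 2.0 ns for these assumed values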

Memory Hierarchy Basics


(Figure: the processor exchanges data with the upper-level memory, which holds Blk X; the lower-level memory holds Blk Y)

 Writing to cache: two strategies


 Write-through
 Immediately update lower levels of hierarchy
 Write-back
 Only update lower levels of hierarchy when an updated block is
replaced
 Both strategies use a write buffer to make writes asynchronous

(Figure: Processor -> Cache -> Write Buffer -> Lower-Level Memory;
the write buffer holds data awaiting write-through to lower-level memory)


Q4: What happens on a write?


                                  Write-Through                       Write-Back
Policy                            Data written to the cache block     Write data only to the cache;
                                  is also written to lower-level      update lower-level memory when
                                  memory                              the block falls out of the cache
Debug                             Easy                                Hard
Do read misses produce writes?    No                                  Yes
Do repeated writes make it
to the lower level?               Yes                                 No

Introduction

Memory Hierarchy Basics


 Causes of misses
 Compulsory
 First reference to a block, also called “cold miss”
 Capacity
 Blocks discarded (lack of space) and later retrieved
 Conflict
 Program makes repeated references to multiple addresses from
different blocks that map to the same location in the cache

 Note that speculative and multithreaded processors may


execute other instructions during a miss
 Reduces performance impact of misses
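The conflict case above can be seen in a toy simulation (illustrative Python; the addresses and cache geometry are assumptions, not from the slides). Two blocks that share a set in a direct-mapped cache keep evicting each other, while a 2-way set-associative cache of the same total size holds both:

def count_misses(block_trace, num_sets, ways):
    sets = [[] for _ in range(num_sets)]   # each set is an LRU list, most recent at the end
    misses = 0
    for block in block_trace:
        s = sets[block % num_sets]
        if block in s:
            s.remove(block)                # hit: refresh its recency
        else:
            misses += 1
            if len(s) == ways:             # set is full: evict the least recently used block
                s.pop(0)
        s.append(block)
    return misses

trace = [0, 8] * 10                        # blocks 0 and 8 map to the same set when there are 8 sets

print(count_misses(trace, num_sets=8, ways=1))   # 20 misses: after the 2 compulsory ones, all are conflict misses
print(count_misses(trace, num_sets=4, ways=2))   # 2 misses: same capacity, 2-way associative, only compulsory misses remain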


Introduction
Memory Hierarchy Basics
 Six basic cache optimizations (see Appendix B
for quantitative examples):
 Larger block size
 Reduces compulsory misses
 Increases capacity and conflict misses, increases miss penalty
 Larger total cache capacity to reduce miss rate
 Increases hit time, increases power consumption
 Higher associativity
 Reduces conflict misses
 Increases hit time, increases power consumption
 Higher number of cache levels
 Reduces overall memory access time
 Giving priority to read misses over writes
 Reduces miss penalty (read will check write buffer and not wait for writes)
 Avoiding address translation in cache indexing
 Reduces hit time


Advanced Optimizations of Cache

 Goal: increase cache bandwidth and decrease hit time, miss rate, and miss penalty.

1- Small and simple first-level caches (to reduce hit
time and power)
 Critical timing path of hit time:

 addressing tag memory using the index, then


 comparing tags, then
 selecting correct data from set using multiplexer
 Direct-mapped caches can overlap tag compare and
transmission of data
 Lower associativity reduces power because fewer
cache lines are accessed


L1 Size and Associativity

(Figure: access time vs. cache size and associativity)

L1 Size and Associativity (Power)

(Figure: energy per read vs. cache size and associativity)


Advanced Optimizations
2- Pipelining Cache
 Pipeline cache accesses to improve bandwidth (divide the
cache-access stage of the instruction pipeline into multiple stages)
 The effective latency of a first-level cache hit can then be
multiple clock cycles.
 Gives a fast clock cycle time and high bandwidth, but slow hits.
Example: Accessing instructions from I-Cache
 Pentium: 1 cycle
 Pentium Pro – Pentium III: 2 cycles
 Pentium 4 – Core i7: 4 cycles
 Drawback: Increasing the number of pipeline stages
leads to greater penalty of mispredicted branches.

Advanced Optimizations

3- Nonblocking Caches
 In out-of-order-execution processors, allow continued cache hits during
misses to increase cache bandwidth
 “Hit under miss”
 “Hit under multiple miss”
 Requires multibanked memories
 Reduces miss penalty by being helpful during a miss instead of ignoring the
requests of the processor, but increases cache complexity


Advanced Optimizations
4- Multibanked Caches
 Organize cache as independent banks to support
simultaneous access
 ARM Cortex-A8 supports 1-4 banks for L2
 Intel i7 supports 4 banks for L1 and 8 banks for L2
 Banking supports simultaneous accesses only when the
addresses are spread across multiple banks.
 The mapping of addresses to banks affects the behavior
of the memory system.
 Interleave banks according to block address
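A minimal sketch of that interleaving (illustrative; 4 banks, matching the i7 L1 figure mentioned above):

def bank_of(block_address, num_banks=4):
    # Sequential interleaving: consecutive block addresses fall in consecutive banks
    return block_address % num_banks

print([bank_of(b) for b in range(8)])   # [0, 1, 2, 3, 0, 1, 2, 3] -> sequential blocks can be accessed in parallel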

Advanced Optimizations

5- Hardware Prefetching

(Drawback: prefetching can hurt if fetching a prefetched block delays the block that actually missed)


Advanced Optimizations
Hardware Prefetching
 Hardware prefetching reduces miss rate or miss penalty
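A toy model of sequential (next-block) prefetching (purely illustrative; real prefetchers use stream buffers and stride detection):

def demand_misses(trace, prefetch=False):
    # Count demand misses for an unbounded cache, optionally prefetching block+1 on each miss
    cached, misses = set(), 0
    for block in trace:
        if block not in cached:
            misses += 1
            cached.add(block)
            if prefetch:
                cached.add(block + 1)      # fetch the next sequential block as well
    return misses

sweep = list(range(16))                    # a sequential sweep over 16 blocks
print(demand_misses(sweep, prefetch=False))   # 16 misses
print(demand_misses(sweep, prefetch=True))    # 8 misses: every other block is already there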

Advanced Optimizations

Summary

(Table: summary of the advanced cache optimizations and their effects)


Example
 In an L2 cache, a cache hit takes 0.8 ns and a
cache miss takes 5.1 ns on average. The
cache hit ratio is 95% while the cache miss
ratio is 5%. Assuming a cycle time of 0.5 ns,
compute the average memory access time.
 A cache hit takes 0.8/0.5 = 1.6 ≈ 2 cycles, and a
cache miss takes 5.1/0.5 = 10.2 ≈ 11 cycles (rounding up to whole cycles)
 Average memory access cycles =
0.95*2+0.05*11 = 2.45 cycles
 Average memory access time = 2.45*0.5 =
1.225ns
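The same example re-checked in a few lines (only re-doing the slide's arithmetic):

import math

cycle_time  = 0.5                                # ns
hit_cycles  = math.ceil(0.8 / cycle_time)        # 2 cycles
miss_cycles = math.ceil(5.1 / cycle_time)        # 11 cycles (10.2 rounded up)

avg_cycles = 0.95 * hit_cycles + 0.05 * miss_cycles
print(avg_cycles)                                # 2.45 cycles
print(avg_cycles * cycle_time)                   # 1.225 ns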

Computer Memory Hierarchy

http://www.bit-tech.net/hardware/memory/2007/11/15/the_secrets_of_pc_memory_part_1/3


Memory Technology
 Performance metrics
 Latency is the concern of caches
 Bandwidth is the concern of main memory, for multiprocessors
and I/O
 External approach (e.g., multi-bank memory)
 Internal approach (e.g., SDRAM, DDR)
 Memory latency
 Access time (AT): time between read request and when

desired word arrives


 Cycle time (CT): minimum time between unrelated requests
to memory
 DRAM used for main memory
 Dynamic: Must write after read, must refresh: AT < CT
 SRAM used for cache
 Static: no refresh or read followed by write: AT ≈ CT


Memory Technology
 SRAM
 Requires low power to retain bit
 Requires 6 transistors/bit
 DRAM
 Must be re-written after being read
 Must also be periodically refreshed
 Every ~ 8 ms
 All bits in a row can be refreshed simultaneously
 One transistor/bit
 Address lines are multiplexed:
 Upper half of address: row access strobe (RAS)
 Lower half of address: column access strobe (CAS)
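A toy sketch of that RAS/CAS address split (Python; the 2048 x 2048 array geometry is taken from the DRAM figure that follows):

ROW_BITS = 11    # 2048 rows, as in the 2048 x 2048 array of Figure 5.3 below
COL_BITS = 11    # 2048 columns

def split_dram_address(addr):
    # Upper address half is presented with RAS, lower half with CAS, on the same pins
    row = (addr >> COL_BITS) & ((1 << ROW_BITS) - 1)
    col = addr & ((1 << COL_BITS) - 1)
    return row, col

print(split_dram_address(0x12345))   # (36, 837): row strobed first, then column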


SRAM versus DRAM


dc voltage

Address line T3 T4

T5 C1 C2 T6
Transistor

Storage
capacitor
T1 T2

Bit line Ground


Ground
B

Bit line Address Bit line


B line B

(a) Dynamic RAM (DRAM) cell (b) Static RAM (SRAM) cell

Figure 5.2 Typical Memory Cell Structures

27

A DRAM Example
(Figure 5.3, Typical 16-Megabit DRAM (4M x 4): RAS, CAS, WE, and OE drive the timing and control logic; a refresh counter and MUX feed the row address buffer and decoder into a 2048 x 2048 x 4 memory array, with a column address buffer/decoder and data input/output buffers (D1-D4))


Memory Technology
 Amdahl:
 Memory capacity should grow linearly with processor speed
 Unfortunately, memory capacity and speed have not kept
pace with processors

 Some optimizations:
 Multiple column accesses to same row (Asynchronous
interface - Overhead problem)
 Synchronous DRAM
 Added clock to DRAM interface and enables pipelining
 Burst mode (block transfer) with critical word first
 Wider interfaces (4-bit, 8-bit, 16-bit)
 Double data rate (DDR, read on rising and falling edge)
 Multiple banks on each DRAM device



Memory Technology
Memory Optimizations
DRAMs are commonly sold on small boards called dual
inline memory modules (DIMMs) that contain 4–16 DRAM
chips.


DIMM Dual Inline Memory Module

http://en.wikipedia.org/wiki/DIMM



Memory Technology
Memory Optimizations
 DDR:
 DDR2
 Lower power (2.5 V -> 1.8 V)
 Higher clock rates (266 MHz, 333 MHz, 400 MHz)
 DDR3
 1.5 V
 800 MHz
 DDR4
 1-1.2 V
 1600 MHz
 Graphic DRAM is a special class of DRAMs based on
SDRAM designs but tailored for handling the higher
bandwidth demands of graphics processing units.
 GDDR5 is graphics memory based on DDR3


Memory Technology

Memory Optimizations
 Graphics memory:
 Achieve 2-5 X bandwidth per DRAM vs. DDR3
 Wider interfaces (32 vs. 16 bit)
 Higher clock rate
 Possible because they are attached via soldering instead of
socketed DIMM modules

 Reducing power in SDRAMs:


 Lower voltage
 Low power mode (ignores clock, continues to
refresh)


Memory Power Consumption

Memory Technology

Flash Memory
 Used as secondary storage in PMDs (personal mobile devices).
 A type of EEPROM; Flash uses a very different architecture
and has different properties than standard DRAM.
 Reads to Flash are sequential and read an entire page
 Must be erased (in blocks) before being overwritten
 Non volatile
 Limited number of write cycles (at least 100,000)
 Cheaper than SDRAM, more expensive than disk ($2/GiB
for Flash, $20 to $40/GiB for SDRAM, and $0.09/GiB for
magnetic disks)
 Slower than SDRAM, faster than disk


Solid State Drive (nowadays)


Comparison

Attribute              SSD                          HDD
Random access time     0.1 ms                       5-10 ms
Bandwidth              100-500 MB/s                 100 MB/s sequential
Price/GB               $0.9-$2                      $0.1
Size                   Up to 2 TB (250 GB common)   Up to 4 TB
Power consumption      5 watts                      Up to 20 watts
Read/write symmetry    No                           Yes
Noise                  No                           Yes (spin, rotate)


Virtual Memory
The Limits of Physical Addressing

“Physical addresses” of memory locations

(Figure: the CPU is wired directly to memory through address lines A0-A31 and data lines D0-D31)

o All programs share one address space: the physical
address space
o Machine-language programs must be aware of the machine
organization
o No way to prevent a program from accessing any machine
resource

Virtual Memory
Solution: Add a Layer of Indirection

(Figure: the CPU issues “virtual addresses” (A0-A31); an address-translation unit maps them to “physical addresses” before they reach physical memory, with data flowing over D0-D31)
• User programs run in a standardized virtual address space
• Address Translation hardware, managed by the operating
system (OS), maps virtual address to physical memory
• Hardware supports “modern” OS features: protection,
translation, sharing
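A minimal sketch of this indirection (illustrative Python; the 4 KB page size and the page-table contents are assumptions, not from the slide):

PAGE_SIZE = 4096                        # assume 4 KB pages -> 12-bit page offset

page_table = {0: 7, 1: 3, 2: 12}        # hypothetical mapping: virtual page -> physical frame

def translate(virtual_addr):
    vpn    = virtual_addr // PAGE_SIZE  # virtual page number
    offset = virtual_addr %  PAGE_SIZE  # the offset is unchanged by translation
    if vpn not in page_table:
        raise RuntimeError("page fault: the OS must map the page and retry")
    return page_table[vpn] * PAGE_SIZE + offset

print(hex(translate(0x1ABC)))           # 0x3abc: virtual page 1 maps to physical frame 3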


Virtual Memory and Virtual Machines


Virtual Memory
 The page table of each process resides in main memory
 The Translation Lookaside Buffer (TLB) is a special
address-translation cache


Virtual Memory and Virtual Machines

Virtual Memory
 Multiprogramming, where several programs running
concurrently would share a computer, led to demands
for protection and sharing among programs
 Protection via virtual memory
 Keeps processes in their own memory space
 Role of architecture:
 Provide user mode and supervisor mode
 Protect certain aspects of process state (read/write privileges)

 Provide mechanisms for switching between user mode and
supervisor mode
 Provide mechanisms to limit memory accesses
 Provide TLB (Translation Look aside Buffer) to translate
addresses
 Some bits in each TLB or page-table entry are used for page protection.



Virtual Memory and Virtual Machines

Virtual Machines
 Supports isolation and security
 Sharing a computer among many unrelated users
 Enabled by raw speed of processors, making the
overhead more acceptable
 Two types: System VM (like IBM VM/370) and
Application VM (like Java VM and .NET Framework)
 System VM allows different ISAs and operating
systems to be presented to user programs
 SVM software is called “virtual machine monitor” or
“hypervisor”
 Individual virtual machines running under the monitor are called
“guest VMs”


Virtual Machine Monitors (VMMs)


• Virtual machine monitor (VMM) or hypervisor is software
that supports VMs
• VMM determines how to map virtual resources to physical
resources
• Physical resource may be time-shared, partitioned, or
emulated in software
• VMM is much smaller than a traditional OS

Cache Review


Q1: Where can a block be placed in the upper level?


• Block 12 placed in an 8-block cache:
– Fully associative, direct mapped, 2-way set associative
– S.A. Mapping = (Block Number) Modulo (Number Sets)
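The standard worked example in a few lines (block 12 in an 8-block cache, as in the figures that follow):

NUM_BLOCKS = 8
block = 12

# Direct mapped: exactly one legal cache block
print(block % NUM_BLOCKS)                      # 4

# 2-way set associative: 4 sets of 2 blocks; block 12 may use either block of set 0
num_sets = NUM_BLOCKS // 2
s = block % num_sets
print([s * 2, s * 2 + 1])                      # [0, 1]

# Fully associative: a single set, so any of the 8 cache blocks is legal
print(list(range(NUM_BLOCKS)))                 # [0, 1, 2, 3, 4, 5, 6, 7]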


Direct Mapped Block Placement

Each memory block address maps to exactly one cache block:
location = (block address) MOD (# blocks in cache)

(Figure: memory blocks 00-4C mapping into a small cache with columns *0, *4, *8, *C)

Fully Associative Block Placement

A memory block may be placed in any cache location (arbitrary mapping):
location = any

(Figure: memory blocks 00-4C mapping anywhere in the cache)


Set-Associative Block Placement

A memory block maps to one set and may be placed anywhere within that set:
location = (block address) MOD (# sets in cache), arbitrary position within the set

(Figure: memory blocks 00-4C mapping into a cache organized as Set 0 to Set 3)

Q2: How is a block found if it is in the upper level?

• Tag on each block


– No need to check index or block offset
• Increasing associativity shrinks index, expands
tag

Address fields:

| <------ Block Address ------> |              |
|     Tag      |     Index      | Block Offset |


Direct-Mapped Cache Design


(Figure: a direct-mapped cache. The address is split into tag, index, and byte offset; the index selects one SRAM entry holding a valid bit, tag, and data; the stored tag is compared with the address tag, and HIT = 1 when they match and the valid bit is set)

Set Associative Cache Design

• Key idea:
  – Divide the cache into sets
  – Allow a block to be placed anywhere within its set
• Advantage:
  – Better hit rate
• Disadvantages:
  – More tag bits
  – More hardware
  – Higher access time

(Figure: A Four-Way Set-Associative Cache. An 8-bit index selects one of 256 sets; the 22-bit tag is compared against all four ways in parallel, and a 4-to-1 multiplexor selects the data on a hit)


Fully Associative Cache Design


• Key idea: the whole cache is a single set, so a block can go anywhere
  – One comparator required for each block
  – No address decoding
  – Practical only for small caches due to hardware demands

(Figure: the incoming tag (e.g. 11110111) is compared in parallel against every stored tag; the matching entry drives its data out)

Exercise
• Given the following requirements for a cache design for a 32-bit-address
computer (word addressable, 4-byte words): (1) the cache contains 16 KB of
data, and (2) each cache block contains 16 words. (3) The placement policy
is 4-way set-associative.
– What are the lengths (in bits) of the block offset field and the index
field in the address?

– Block offset: 16 words per block → offset = log2(16) = 4 bits
– # blocks = cache size / block size = 16 KB / (16 words × 4 bytes) = 256 = 2^8 blocks
– # sets = # blocks / associativity = 256 / 4 = 64 = 2^6 → index = 6 bits
– Tag = 32 − 6 − 4 = 22 bits

    Tag: 22    Index: 6    Block offset: 4

– What are the lengths (in bits) of the index field and the tag field in the
address if the placement is 1-way set-associative (direct mapped)?
– # sets = # blocks = 256 = 2^8 → index = 8 bits; Tag = 32 − 8 − 4 = 20 bits

    Tag: 20    Index: 8    Block offset: 4
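The same field widths computed programmatically (a sketch under the exercise's assumptions: word-addressable machine, 4-byte words):

from math import log2

ADDRESS_BITS = 32                 # width of the word address
WORD_BYTES   = 4
CACHE_BYTES  = 16 * 1024
BLOCK_WORDS  = 16

def fields(ways):
    blocks = CACHE_BYTES // (BLOCK_WORDS * WORD_BYTES)   # 256 blocks
    sets   = blocks // ways
    offset = int(log2(BLOCK_WORDS))                      # word offset within a block
    index  = int(log2(sets))
    tag    = ADDRESS_BITS - index - offset
    return tag, index, offset

print(fields(4))   # (22, 6, 4)  4-way set associative
print(fields(1))   # (20, 8, 4)  direct mapped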


Q3: After a cache read miss, if there are no empty


cache blocks, which block should be removed from
the cache?
• Easy for Direct Mapped
• Set Associative or Fully Associative:
 Random
 LRU (Least Recently Used)

The Least Recently Used (LRU) block? Appealing, but hard to implement
for high associativity.
A randomly chosen block? Easy to implement, but how well does it work?

Miss Rate for a 2-way Set Associative Cache
Size       Random    LRU
16 KB      5.7%      5.2%
64 KB      2.0%      1.9%
256 KB     1.17%     1.15%
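A minimal sketch of LRU bookkeeping for a single set (illustrative Python; real hardware only approximates LRU at high associativity, which is why it is hard to implement there):

from collections import OrderedDict

class LRUSet:
    """One cache set with true LRU replacement, tracked with an ordered dict."""
    def __init__(self, ways):
        self.ways = ways
        self.blocks = OrderedDict()            # insertion order doubles as recency order

    def access(self, tag):
        if tag in self.blocks:
            self.blocks.move_to_end(tag)       # hit: mark as most recently used
            return "hit"
        if len(self.blocks) == self.ways:
            self.blocks.popitem(last=False)    # full set: evict the least recently used block
        self.blocks[tag] = True
        return "miss"

s = LRUSet(ways=2)
print([s.access(t) for t in "ABACB"])
# ['miss', 'miss', 'hit', 'miss', 'miss'] -- C evicts B (the LRU block), so the later B misses again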


More about replacement algorithms

Assoc.: 2-way 4-way 8-way


Size LRU Ran LRU Ran LRU Ran
16 KB 5.2% 5.7% 4.7% 5.3% 4.4% 5.0%
64 KB 1.9% 2.0% 1.5% 1.7% 1.4% 1.5%
256 KB 1.15% 1.17% 1.13% 1.13% 1.12% 1.12%


Reducing Cache Misses: 1. Larger Block Size

Using the principle of locality: the larger the block, the greater the chance that
parts of it will be used again.

(Figure: miss rate vs. block size (16, 32, 64, 128, 256 bytes) for cache sizes of 1K, 4K, 16K, 64K, and 256K)

Increasing Block Size


• One way to reduce the miss rate is to increase
the block size
– Take advantage of spatial locality
– Decreases compulsory misses
• However, larger blocks have disadvantages
– May increase the miss penalty (need to get more
data)
– May increase hit time (need to read more data from
cache and larger mux)
– May increase miss rate, since conflict misses
• Increasing the block size can help, but don’t
overdo it.


Reducing Cache Misses: 2. Higher Associativity

• Increasing associativity helps reduce conflict


misses
• 2:1 Cache Rule:
– The miss rate of a direct mapped cache of size N is
about equal to the miss rate of a 2-way set
associative cache of size N/2
– For example, the miss rate of a 32 Kbyte direct
mapped cache is about equal to the miss rate of a
16 Kbyte 2-way set associative cache
• Disadvantages of higher associativity
– Need to do a large number of comparisons
– Need an n-to-1 multiplexor for an n-way set associative cache
– Could increase hit time
– Consume more power


AMAT vs. Associativity

Cache Size (KB)    1-way    2-way    4-way    8-way
1                  7.65     6.60     6.22     5.44
2                  5.90     4.90     4.62     4.09
4                  4.60     3.95     3.57     3.19
8                  3.30     3.00     2.87     2.59
16                 2.45     2.20     2.12     2.04
32                 2.00     1.80     1.77     1.79
64                 1.70     1.60     1.57     1.59
128                1.50     1.45     1.42     1.44

(Red in the original figure marks entries where A.M.A.T. is not improved by more associativity)

Q4: What Happens on a Write?


The write policies often distinguish cache designs. There are two
basic options when writing to the cache:
• Write through: The information is written to both the block in the
cache and to the block in the lower-level memory.
• Write back: The information is written only to the block in the
cache. The modified cache block is written to main memory only
when it is replaced.
– is block clean or dirty? (add a dirty bit to each block)
• Pros and Cons of each:
– Write through
» Read misses cannot result in writes to memory, (always clean)
» Easier to implement and simplifies data coherency
» Always combine with write buffers to avoid memory latency
– Write back
» Less memory traffic (writes to memory occur only when a dirty block is replaced)
» Performs writes at the speed of the cache


Q4: What Happens on a Write?


• Since data does not have to be brought into the
cache on a write miss, there are two options:
– Write allocate
» The block is brought into the cache on a write miss
» Used with write-back caches
» Hope subsequent writes to the block hit in cache
– No-write allocate
» The block is modified in memory, but not brought
into the cache
» Used with write-through caches
» Writes have to go to memory anyway, so why bring
the block into the cache
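A compact sketch tying the write policy and the write-miss policy together (illustrative Python, not a full cache model; the address and values are made up):

def handle_write(cache, memory, addr, value, policy):
    if policy == "write-through + no-write-allocate":
        if addr in cache:
            cache[addr] = value                 # update the block only if it is already cached
        memory[addr] = value                    # always write memory (buffered by a write buffer in practice)
    elif policy == "write-back + write-allocate":
        if addr not in cache:
            cache[addr] = memory.get(addr)      # write miss: bring the block into the cache
        cache[addr] = value                     # write only the cache; the block is now dirty
        # lower-level memory is updated later, only when this dirty block is replaced

cache, memory = {}, {0x40: 1}
handle_write(cache, memory, 0x40, 99, "write-back + write-allocate")
print(cache[0x40], memory[0x40])                # 99 1 -> memory stays stale until eviction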
