CSE 410
Computer Systems
Hal Perkins
Spring 2010

Lecture 13: Cache Writes and Performance
Reading

Computer Organization and Design
Section 5.1 Introduction
Section 5.2 Basics of Caches
Section 5.3 Measuring and Improving Cache Performance
Cache Writing & Performance

What's left?

Writing to caches: keeping memory consistent & write-allocation.

We'll also investigate some main memory organizations that can help increase memory system performance.

Later, we'll talk about Virtual Memory, where memory is treated like a cache of the disk.
Four important questions

1. When we copy a block of data from main memory to the cache, where exactly should we put it?
2. How can we tell if a word is already in the cache, or if it has to be fetched from main memory first?
3. Eventually, the small cache memory might fill up. To load a new block from main RAM, we'd have to replace one of the existing blocks in the cache... which one?
4. How can write operations be handled by the memory system?

We've answered the first three. Now we consider the fourth.
Writing to a cache

Writing to a cache raises several additional issues.

First, let's assume that the address we want to write to is already loaded in the cache. We'll assume a simple direct-mapped cache.

  Cache:  Index  V  Tag    Data
          110    1  11010  42803

  Memory: Address    Data
          1101 0110  42803
If we write a new value to that address, we can store the new data in
the cache, and avoid an expensive main memory access.
Mem[214] = 21763
  Cache:  Index  V  Tag    Data
          110    1  11010  21763

  Memory: Address    Data
          1101 0110  42803
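As a concrete illustration, here is a rough C sketch (not from the slides) of how an 8-bit address such as 1101 0110 splits into the 5-bit tag and 3-bit index of this direct-mapped cache, assuming one-word blocks (so there are no block-offset bits):

    #include <stdio.h>

    int main(void) {
        unsigned addr  = 0xD6;        /* 1101 0110 = 214 decimal */
        unsigned index = addr & 0x7;  /* low 3 bits:  110 (6)    */
        unsigned tag   = addr >> 3;   /* high 5 bits: 11010 (26) */
        printf("address %u -> tag %u, index %u\n", addr, tag, index);
        return 0;
    }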
Inconsistent memory

But now the cache and memory contain different, inconsistent data!

How can we ensure that subsequent loads will return the right value?

This is also problematic if other devices are sharing the main memory, as in a multiprocessor system.

  Cache:  Index  V  Tag    Data
          110    1  11010  21763

  Memory: Address    Data
          1101 0110  42803
Write-through caches

A write-through cache solves the inconsistency problem by forcing all writes to update both the cache and the main memory.

Mem[214] = 21763

  Cache:  Index  V  Tag    Data
          110    1  11010  21763

  Memory: Address    Data
          1101 0110  21763

This is simple to implement and keeps the cache and memory consistent.

Why is this not so good?
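As a rough C sketch (not the slides' code), a write-through store on a write hit might look like this; cache_line, index_bits, and mem[] are hypothetical stand-ins for the hardware, assuming a direct-mapped cache with one-word blocks:

    typedef struct { int valid; unsigned tag; int data; } cache_line;

    /* Write-through: update the cache on a hit, and always update memory. */
    void write_through(cache_line cache[], int mem[],
                       unsigned addr, int value, unsigned index_bits) {
        unsigned index = addr & ((1u << index_bits) - 1);
        unsigned tag   = addr >> index_bits;
        if (cache[index].valid && cache[index].tag == tag)
            cache[index].data = value;  /* keep the cached copy current */
        mem[addr] = value;              /* every write reaches main memory */
    }

The drawback hinted at above: every store pays the main memory latency, even when the address hits in the cache.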
Write-back caches

In a write-back cache, the memory is not updated until the cache block needs to be replaced (e.g., when loading data into a full cache set).

For example, we might write some data to the cache at first, leaving it inconsistent with the main memory as shown before. The cache block is marked dirty to indicate this inconsistency.

Mem[214] = 21763

  Cache:  Index  V  Dirty  Tag    Data
          110    1  1      11010  21763

  Memory: Address    Data
          1000 1110  1225
          1101 0110  42803

Subsequent reads to the same memory address will be serviced by the cache, which contains the correct, updated data.
Finishing the write back

We don't need to store the new value back to main memory unless the cache block gets replaced.

For example, on a read from Mem[142], which maps to the same cache block, the modified cache contents will first be written to main memory.

  Cache:  Index  V  Dirty  Tag    Data
          110    1  1      11010  21763

  Memory: Address    Data
          1000 1110  1225
          1101 0110  21763

Only then can the cache block be replaced with data from address 142.

  Cache:  Index  V  Dirty  Tag    Data
          110    1  0      10001  1225

  Memory: Address    Data
          1000 1110  1225
          1101 0110  21763
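A minimal C sketch of the dirty-bit bookkeeping (my own illustration; wb_line, index_bits, and mem[] are hypothetical, assuming a direct-mapped cache with one-word blocks):

    typedef struct { int valid, dirty; unsigned tag; int data; } wb_line;

    /* Write-back store: a dirty victim is flushed to memory only when
       its block is about to be replaced. */
    void write_back_store(wb_line cache[], int mem[],
                          unsigned addr, int value, unsigned index_bits) {
        unsigned index = addr & ((1u << index_bits) - 1);
        unsigned tag   = addr >> index_bits;
        wb_line *line  = &cache[index];
        if (!(line->valid && line->tag == tag)) {      /* write miss */
            if (line->valid && line->dirty) {
                unsigned victim = (line->tag << index_bits) | index;
                mem[victim] = line->data;   /* finish the write back first */
            }
            line->valid = 1;
            line->tag   = tag;
        }
        line->data  = value;   /* the store itself touches only the cache */
        line->dirty = 1;
    }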
Write-back cache discussion

The advantage of write-back caches is that not all write operations need to access main memory, as with write-through caches.

If a single address is frequently written to, then it doesn't pay to keep writing that data through to main memory.

If several bytes within the same cache block are modified, they will only force one memory write operation at write-back time.
Write-back cache discussion

Each block in a write-back cache needs a dirty bit to indicate whether or not it must be saved to main memory before being replaced; otherwise we might perform unnecessary writebacks.

Notice the penalty for the main memory access will not be applied until the execution of some subsequent instruction following the write.

In our example, the write to Mem[214] affected only the cache.

But the load from Mem[142] resulted in two memory accesses: one to save data to address 214, and one to load data from address 142.
Write misses

A second scenario is if we try to write to an address that is not already contained in the cache; this is called a write miss.

Let's say we want to store 21763 into Mem[1101 0110], but we find that address is not currently in the cache.

  Cache:  Index  V  Tag    Data
          110    1  00010  123456

  Memory: Address    Data
          1101 0110  6378

When we update Mem[1101 0110], should we also load it into the cache?
Write-around caches (a.k.a. write-no-allocate)

With a write-around policy, the write operation goes directly to main memory without affecting the cache.

Mem[214] = 21763

  Cache:  Index  V  Tag    Data
          110    1  00010  123456

  Memory: Address    Data
          1101 0110  21763

This is good when data is written but not immediately used again, in which case there's no point to load it into the cache yet.
for (int i = 0; i < SIZE; i++)
    a[i] = i;    /* each a[i] is written once and not re-read soon */
Allocate on write

An allocate on write strategy would instead load the newly written data into the cache.

Mem[214] = 21763

  Cache:  Index  V  Tag    Data
          110    1  11010  21763

  Memory: Address    Data
          1101 0110  21763
If that data is needed again soon, it will be available in the
cache.
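To contrast the two miss policies, here is a rough sketch (my own, reusing the hypothetical wb_line from the write-back example): write-around sends a missing store straight to memory, while allocate-on-write fills the block first.

    /* Write-around (write-no-allocate): on a miss, bypass the cache. */
    void write_around(wb_line cache[], int mem[],
                      unsigned addr, int value, unsigned index_bits) {
        unsigned index = addr & ((1u << index_bits) - 1);
        unsigned tag   = addr >> index_bits;
        if (cache[index].valid && cache[index].tag == tag)
            cache[index].data = value;  /* hits still update the cache */
        mem[addr] = value;
    }

    /* Allocate on write: write_back_store() above already behaves this
       way, since it brings the block into the cache before writing it. */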
Basic main memory design

There are some ways the main memory can be organized to reduce miss penalties and help with caching.

For some concrete examples, let's assume the following three steps are taken when a cache needs to load data from the main memory.

1. It takes 1 cycle to send an address to the RAM.
2. There is a 15-cycle latency for each RAM access.
3. It takes 1 cycle to return data from the RAM.

[Diagram: CPU, cache, and main memory connected by one-word-wide buses]

In the setup shown here, the buses from the CPU to the cache and from the cache to RAM are all one word wide. If the cache has one-word blocks, then filling a block from RAM (i.e., the miss penalty) would take 17 cycles.

1 + 15 + 1 = 17 clock cycles

The cache controller has to send the desired address to the RAM, wait, and receive the data.
Miss penalties for larger cache blocks

If the cache has four-word blocks, then loading a single block would need four individual main memory accesses, and a miss penalty of 68 cycles!

4 x (1 + 15 + 1) = 68 clock cycles

[Diagram: CPU, cache, and main memory, one-word-wide buses]
A wider memory

A simple way to decrease the miss penalty is to widen the memory and its interface to the cache, so we can read multiple words from RAM in one shot.

If we could read four words from the memory at once, a four-word cache load would need just 17 cycles.

1 + 15 + 1 = 17 cycles

[Diagram: CPU, cache, and main memory connected by four-word-wide buses]

The disadvantage is the cost of the wider buses: each additional bit of memory width requires another connection to the cache.
An interleaved memory

Another approach is to interleave the memory, or split it into banks that can be accessed individually. The main benefit is overlapping the latencies of accessing each word.

For example, if our main memory has four banks, each one byte wide, then we could load four bytes into a cache block in just 20 cycles.

1 + 15 + (4 x 1) = 20 cycles

[Diagram: CPU and cache connected to a main memory split into Bank 0, Bank 1, Bank 2, and Bank 3]

Our buses are still one byte wide here, so four cycles are needed to transfer data to the cache. This is cheaper than implementing a four-byte bus, but not too much slower.
Interleaved memory accesses

[Timing diagram: four overlapped loads (Load word 1 through Load word 4), each with a 1-cycle address send, a 15-cycle bank latency, and a 1-cycle data return]

Here is a diagram to show how the memory accesses can be interleaved.

The magenta cycles represent sending an address to a memory bank. Each memory bank has a 15-cycle latency, and it takes another cycle (shown in blue) to return data from the memory.

This is the same basic idea as pipelining! As soon as we request data from one memory bank, we can go ahead and request data from another bank as well.

Each individual load takes 17 clock cycles, but four overlapped loads require just 20 cycles.
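The three miss penalties above can be summarized in a small C sketch (my own, with the 1/15/1 cycle costs as assumed parameters):

    #include <stdio.h>

    enum { SEND = 1, LATENCY = 15, XFER = 1 };  /* cycle costs from above */

    int narrow_bus(int words)  { return words * (SEND + LATENCY + XFER); }
    int wide_bus(void)         { return SEND + LATENCY + XFER; }
    int interleaved(int words) { return SEND + LATENCY + words * XFER; }

    int main(void) {
        printf("one-word bus, 4 words: %d cycles\n", narrow_bus(4));   /* 68 */
        printf("four-word-wide bus:    %d cycles\n", wide_bus());      /* 17 */
        printf("4 interleaved banks:   %d cycles\n", interleaved(4));  /* 20 */
        return 0;
    }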
Summary

Writing to a cache poses a couple of interesting issues.

Write-through and write-back policies keep the cache consistent with main memory in different ways for write hits.

Write-around and allocate-on-write are two strategies to handle write misses, differing in whether updated data is loaded into the cache.

Memory system performance depends upon the cache hit time, miss rate, and miss penalty, as well as the actual program being executed.

We can use these numbers to find the average memory access time. We can also revise our CPU time formula to include stall cycles.

AMAT = Hit time + (Miss rate x Miss penalty)
Memory stall cycles = Memory accesses x miss rate x miss penalty
CPU time = (CPU execution cycles + Memory stall cycles) x Cycle time
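For instance, with illustrative values (chosen here only for the example) of a 1-cycle hit time, a 5% miss rate, and a 17-cycle miss penalty:

AMAT = 1 + (0.05 x 17) = 1.85 cycles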
The organization of a memory system affects its performance.

The cache size, block size, and associativity affect the miss rate.

We can organize the main memory to help reduce miss penalties. For example, interleaved memory supports pipelined data accesses.