CS4617 Computer Architecture
Lecture 3: Memory Hierarchy 1
Dr J Vaughan
September 15, 2014
Important terms

cache, fully associative, write allocate, virtual memory, dirty bit, unified cache, memory stall cycles, block offset, misses per instruction, direct mapped, write-back, block, valid bit, data cache, locality, block address, hit time, address trace

Table: Memory terms
Important terms (ctd)

write-through, cache miss, set, instruction cache, page fault, random replacement, average memory access time, miss rate, index field, cache hit, n-way set associative, no-write allocate, page, least recently used, write buffer, miss penalty, tag field, write stall

Table: Memory terms
Term definitions 1
Cache The first level of the memory hierarchy encountered when the address leaves the processor. The word cache is often used more generally to mean buffering commonly-used items for re-use
Cache hit Success: a referenced item is found in the cache
Cache miss Failure: the required item is not in the cache
Block The fixed number of bytes of main memory that is copied to the cache in one transfer operation. This transfer happens when a cache miss occurs
Temporal locality A referenced item is likely to be referenced again in the near future
Spatial locality Other data in the same block as a referenced item are likely to be needed soon
Term definitions 2
Virtual memory The extension of memory to an address space that includes programs and data physically resident on disk as well as in main memory
Page A fixed-size block of virtual address space, usually in the range 1 KB to 4 KB. A page is either in main memory or on disk
Page fault An interrupt generated when the processor references a page that is neither in cache nor in main memory
Term definitions 3
Memory stall cycles The number of cycles during which the processor is stalled waiting for a memory access
Miss penalty The cost, in clock cycles, of each cache miss
Address trace A record of instruction and data references, with a count of the number of accesses and miss totals
Term definitions 4
CPU execution time = (CPU clock cycles + Memory stall cycles) × Clock cycle time

This assumes that CPU clock cycles include the time to handle a cache hit and that the processor is stalled during a cache miss.

Memory stall cycles = Number of misses × Miss penalty
                    = IC × (Misses / Instruction) × Miss penalty
                    = IC × (Memory accesses / Instruction) × Miss rate × Miss penalty

where IC = instruction count

Miss rate = Fraction of cache accesses that miss
          = (Number of accesses that miss) / (Number of accesses)
Miss rates and miss penalties are different for reads and writes
but are averaged here
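The stall-cycle formula above can be sketched as a small helper. This is my own illustration, not code from the lecture; the example values (1000 instructions, 1.5 accesses per instruction, 2% miss rate, 25-cycle penalty) are the ones used in the worked examples that follow.

```python
def memory_stall_cycles(ic, accesses_per_instr, miss_rate, miss_penalty):
    """Memory stall cycles = IC x (accesses/instruction) x miss rate x miss penalty."""
    return ic * accesses_per_instr * miss_rate * miss_penalty

# 1000 instructions, 1.5 memory accesses per instruction,
# 2% miss rate, 25-cycle miss penalty:
print(memory_stall_cycles(1000, 1.5, 0.02, 25))  # 750.0
```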
Example

A computer's cycles per instruction (CPI) is 1.0 when all memory accesses are cache hits. The only data accesses are loads and stores, representing a total of 50% of the instructions. If the miss penalty is 25 clock cycles and the miss rate is 2%, how much faster would the computer be if all instructions were cache hits?

Solution

If all accesses are cache hits:

CPU execution time = (CPU clock cycles + Memory stall cycles) × Clock cycle time
                   = (IC × CPI + 0) × Clock cycle time
                   = IC × 1.0 × Clock cycle time

With the real cache:

Memory stall cycles = IC × (Memory accesses / Instruction) × Miss rate × Miss penalty
                    = IC × (1 + 0.5) × 0.02 × 25
                    = IC × 0.75

where (1 + 0.5) represents 1 instruction access and 0.5 data accesses per instruction
Solution (continued)
Total performance:

CPU execution time_cache = (IC × 1.0 + IC × 0.75) × Clock cycle time
                         = 1.75 × IC × Clock cycle time

Performance ratio = CPU execution time_cache / CPU execution time
                  = (1.75 × IC × Clock cycle time) / (1.0 × IC × Clock cycle time)
                  = 1.75

So the computer with no cache misses is 1.75 times faster
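The worked example can be recomputed numerically. This is a minimal sketch of my own: IC is set to an arbitrary 1000 instructions, since both IC and the clock cycle time cancel in the ratio; the other values come from the slides.

```python
IC = 1000                 # arbitrary instruction count (cancels in the ratio)
CPI = 1.0                 # cycles per instruction with all hits
ACCESSES_PER_INSTR = 1.5  # 1 instruction fetch + 0.5 data accesses
MISS_RATE = 0.02
MISS_PENALTY = 25

stall_cycles = IC * ACCESSES_PER_INSTR * MISS_RATE * MISS_PENALTY  # 750 cycles
with_cache = IC * CPI + stall_cycles                               # 1750 cycles
ideal = IC * CPI                                                   # 1000 cycles
print(with_cache / ideal)  # 1.75
```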
Misses per instruction

Misses / Instruction = (Miss rate × Memory accesses) / Instruction count
                     = Miss rate × (Memory accesses / Instruction)

This formula is useful when the average number of memory accesses per instruction is known. It allows conversion of miss rate into misses per instruction and vice versa.

In the last example,

Misses / Instruction = Miss rate × (Memory accesses / Instruction)
                     = 0.02 × 1.5 = 0.03
Example

Same data as the previous example. What is the memory stall time in terms of instruction count? Assume a miss rate of 30 misses per 1000 instructions.

Answer

Memory stall cycles = Number of misses × Miss penalty
                    = IC × (Misses / Instruction) × Miss penalty
                    = IC × 0.030 × 25
                    = IC × 0.75
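The same calculation can start from misses per instruction instead of miss rate, as this example does. A sketch of my own, using the slide's numbers:

```python
def stall_cycles_per_instruction(misses_per_instr, miss_penalty):
    """Stall cycles per instruction = (misses/instruction) x miss penalty."""
    return misses_per_instr * miss_penalty

# 30 misses per 1000 instructions, 25-cycle penalty:
print(stall_cycles_per_instruction(30 / 1000, 25))  # 0.75, i.e. IC x 0.75 overall
```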
Four Memory Hierarchy Questions
Q1: Block placement Where can a block be placed in the upper
level?
Q2: Block identification How is a block found if it is in the upper
level?
Q3: Block replacement Which block should be replaced on a miss?
Q4: Write strategy What happens on a write?
Q1: Where can a block be placed in cache?

Three organisations:

Direct mapping Line = (Block address) mod (Number of blocks in cache)
Associative mapping Block can be placed in any line
Set-associative mapping n lines per set = n-way set associative. Set = (Block address) mod (Number of sets in cache); the block may be placed in any line of that set
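The two mod-based placement rules above can be sketched for a toy cache. The cache size (8 block frames) and the example block address are my own illustration, not from the slides.

```python
NUM_BLOCKS = 8  # toy cache with 8 block frames

def direct_mapped_line(block_address):
    # Direct mapping: line = block address mod number of blocks
    return block_address % NUM_BLOCKS

def set_associative_set(block_address, num_sets):
    # Set-associative mapping: set = block address mod number of sets
    return block_address % num_sets

block = 12345
print(direct_mapped_line(block))      # 1 (12345 mod 8)
print(set_associative_set(block, 4))  # 1 (2-way: 8 blocks / 2 lines per set = 4 sets)
# Fully associative: any of the 8 lines may hold the block.
```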
Mapping

Direct mapping = 1-way set associative
Fully associative with m blocks = m-way set associative

Most processor caches are one of:
- Direct mapped
- 2-way set associative
- 4-way set associative
Q2: How is a block found if it is in cache?

A tag on every block frame gives the block address
All possible tags are searched in parallel for the tag of the required block
A valid bit is used to indicate whether the block contents are valid; if the valid bit is not set, there is no match
Memory address from processor

Address = <Block address><Block offset>
Block address = <Tag><Index>

Index field selects the set
Tag field is used to search within the set for a hit
Offset selects the required data when the block is found
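Splitting an address into these fields is a few shifts and masks. A sketch under assumed parameters of my own choosing (64-byte blocks, 256 sets); the slides do not fix a cache geometry.

```python
BLOCK_OFFSET_BITS = 6  # 64-byte block  -> 6 offset bits (assumption)
INDEX_BITS = 8         # 256 sets       -> 8 index bits  (assumption)

def split_address(addr):
    """Return (tag, index, offset) fields of a memory address."""
    offset = addr & ((1 << BLOCK_OFFSET_BITS) - 1)
    index = (addr >> BLOCK_OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (BLOCK_OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

tag, index, offset = split_address(0x12345678)
print(hex(tag), hex(index), hex(offset))  # 0x48d1 0x59 0x38
```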
Q3: Which block should be replaced on a cache miss?

Direct mapping: only 1 block frame is checked for a hit, so only that block can be replaced
Fully associative or set associative: there is a choice of which block to replace

Replacement strategies:
- Random: selects a block for replacement randomly
- LRU (least recently used): relies on locality; the block unused for the longest time is replaced
- FIFO: because exact LRU is difficult to calculate, the oldest block is selected for replacement as an approximation
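The LRU policy described above can be sketched for a single fully associative set. This is my own minimal illustration using an ordered dictionary as the recency list, not lecture code.

```python
from collections import OrderedDict

def simulate_lru(capacity, block_trace):
    """Count misses for one fully associative set with LRU replacement."""
    cache = OrderedDict()  # keys ordered from least to most recently used
    misses = 0
    for block in block_trace:
        if block in cache:
            cache.move_to_end(block)       # hit: mark as most recently used
        else:
            misses += 1                    # miss: fetch the block
            if len(cache) == capacity:
                cache.popitem(last=False)  # evict the least recently used block
            cache[block] = True
    return misses

print(simulate_lru(2, [1, 2, 1, 3, 1, 2]))  # 4 misses
```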
Q4: What happens on a write?

Reads dominate processor cache access
All instruction fetches are reads
Most instructions do not write to memory
Make the common case fast: optimise for reads
The common case is also easy to make fast
The block can be read from the cache at the same time that the tag is read and compared, so the block read begins as soon as the block address is available
If the read is a hit, the requested part of the block is passed to the CPU immediately
If the read is a miss, there is no benefit but no harm: just ignore the value read
Write
Tag checking and block modification cannot occur in parallel, so writes take longer than reads
The processor specifies the size of the write (between 1 and 8 bytes), and only that part of the block may be changed
Reads, in contrast, can access more bytes than necessary without difficulty
Write policies

Write-through Information is written both to the block in the cache and to the block in lower-level memory
Write-back Information is written only to the block in the cache; the block is written to lower-level memory when it is replaced, and only if its dirty bit is set
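The difference in lower-level traffic can be sketched for a toy single-block cache. This is my own illustration of the two policies above; it counts only writes reaching lower-level memory, and a block still dirty at the end has not yet been written back.

```python
def lower_level_writes(write_trace, policy):
    """Count writes to lower-level memory for a toy one-block cache."""
    writes = 0
    cached, dirty = None, False
    for block in write_trace:
        if policy == "write-through":
            writes += 1                  # every write also goes to lower level
        else:                            # write-back
            if cached != block:
                if dirty:
                    writes += 1          # dirty victim written back on replacement
                cached, dirty = block, False
            dirty = True                 # the write itself hits only the cache
    return writes

trace = ["A", "A", "A", "B"]
print(lower_level_writes(trace, "write-through"))  # 4
print(lower_level_writes(trace, "write-back"))     # 1 (A written back when B replaces it)
```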
Advantages of write-back
Writes occur at the speed of cache memory
Multiple writes within a block require only 1 write to lower-level memory
Write-back therefore uses less memory bandwidth, which is useful in multiprocessors
Write-back uses the memory hierarchy and interconnect less than write-through, so it saves power and is appropriate for embedded applications
Advantages of write-through
Easier to implement than write-back
The cache is always clean, so misses never cause a write to the lower level
The next lower level has the current copy of the data, which simplifies data coherence
Data coherence is important for multiprocessors and I/O
Multilevel caches make write-through more viable for the upper-level caches, as writes need only propagate to the next lower level rather than all the way to main memory