Computer Architecture
A Quantitative Approach, Fifth Edition
Chapter 2
Memory Hierarchy Design
Copyright © 2012, Elsevier Inc. All rights reserved. 1
Introduction
Introduction
Programmers want unlimited amounts of memory with
low latency
Fast memory technology is more expensive per bit than
slower memory
Solution: organize memory system into a hierarchy
Entire addressable memory space available in largest, slowest
memory
Incrementally smaller and faster memories, each containing a
subset of the memory below it, proceed in steps up toward the
processor
Temporal and spatial locality ensure that nearly all
references can be found in smaller memories
Gives the illusion of a large, fast memory being presented to the
processor
Copyright © 2012, Elsevier Inc. All rights reserved. 2
Introduction
Memory Hierarchy
Copyright © 2012, Elsevier Inc. All rights reserved. 3
Introduction
Memory Performance Gap
Figure: the processor-memory performance gap over time, contrasting the increase in processor memory requests/sec with the increase in DRAM memory accesses/sec.
Copyright © 2012, Elsevier Inc. All rights reserved. 4
Introduction
Memory Hierarchy Design
Memory hierarchy design becomes more crucial
with recent multi-core processors:
Aggregate peak bandwidth grows with # cores:
Intel Core i7 can generate two data references per core per clock
Four cores and 3.2 GHz clock
25.6 billion 64-bit data references/second +
12.8 billion 128-bit instruction references
= 409.6 GB/s!
DRAM bandwidth is only 6% of this (25 GB/s)
Requires:
Multi-port, pipelined caches
Two levels of cache per core
Shared third-level cache on chip
Copyright © 2012, Elsevier Inc. All rights reserved. 5
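A quick sanity check of the arithmetic above (a 64-bit reference is 8 bytes, a 128-bit reference is 16 bytes):
  4 cores × 2 data refs/clock × 3.2 GHz = 25.6 G refs/s × 8 B  = 204.8 GB/s
  4 cores × 1 instr ref/clock × 3.2 GHz = 12.8 G refs/s × 16 B = 204.8 GB/s
  Total peak demand = 409.6 GB/s, of which 25 GB/s of DRAM bandwidth is about 6%.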
Introduction
Performance and Power
High-end microprocessors have >10 MB on-chip
cache
Consumes a large share of the area and power budget
Static power due to leakage when not operating and
dynamic power when performing reads/writes.
In personal mobile devices (PMDs), caches can account
for between 25% and 50% of the total power
consumption.
Copyright © 2012, Elsevier Inc. All rights reserved. 6
Introduction
Memory Hierarchy Basics
When a word is not found in the cache, a miss
occurs:
Fetch word from lower level in hierarchy, requiring a
higher latency reference
Lower level may be another cache or the main
memory
Also fetch the other words contained within the block
Takes advantage of spatial locality
Place block into the cache anywhere within its set, which is
determined by the address:
set = (block address) MOD (number of sets in the cache)
Copyright © 2012, Elsevier Inc. All rights reserved. 7
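As a concrete illustration of the MOD mapping above, here is a minimal C sketch; the 64-byte block size and 512 sets are assumed parameters for the example, not values from the slide.

  #include <stdint.h>

  #define BLOCK_SIZE 64   /* assumed block size in bytes */
  #define NUM_SETS   512  /* assumed number of sets */

  /* The block address is the byte address with the in-block offset stripped. */
  static uint64_t block_address(uint64_t byte_addr) {
      return byte_addr / BLOCK_SIZE;
  }

  /* Placement rule from the slide: block address MOD number of sets. */
  static uint64_t set_index(uint64_t byte_addr) {
      return block_address(byte_addr) % NUM_SETS;
  }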
Introduction
Memory Hierarchy Basics
Figure: the CPU issues Read 007. The cache is empty, so the access is a CACHE MISS and the request is forwarded to main memory.
Copyright © 2013, Daniel A. Menasce. All rights reserved. 8
Introduction
Memory Hierarchy Basics
Figure: the block containing word 007 (words 005-008) is copied from memory into the cache, and word 007 is delivered to the CPU.
Copyright © 2013, Daniel A. Menasce. All rights reserved. 9
Introduction
Memory Hierarchy Basics
Figure: the CPU issues Read 005, a CACHE HIT: the block 005-008 is now resident in the cache.
Copyright © 2013, Daniel A. Menasce. All rights reserved. 10
Introduction
Memory Hierarchy Basics
Figure: word 005 is returned to the CPU directly from the cache; memory is not accessed.
Copyright © 2013, Daniel A. Menasce. All rights reserved. 11
Introduction
Memory Hierarchy Basics
Figure: the CPU issues Read 018, a CACHE MISS: the resident block 005-008 does not contain word 018.
Copyright © 2013, Daniel A. Menasce. All rights reserved. 12
Introduction
Memory Hierarchy Basics
Figure: the block containing word 018 (words 017-020) is fetched from memory into the cache and word 018 is returned to the CPU; the cache now holds blocks 005-008 and 017-020.
Copyright © 2013, Daniel A. Menasce. All rights reserved. 13
Introduction
Memory Hierarchy Basics
Placement of blocks in a cache
Set associative: block is mapped into a set and the
block is placed anywhere in the set
Finding a block: map block address to set and search
set (usually in parallel) to find block.
n blocks per set: n-way set associative cache
One block per set (n=1): direct-mapped cache
One set per cache: fully associative cache
Copyright © 2012, Elsevier Inc. All rights reserved. 14
Introduction
Fully Associative Cache
Memory block 3 can go anywhere in the cache.
Figure: a 4-block cache and a 6-block main memory; memory block 3 may be placed in any of the four cache blocks.
Copyright © 2013, Daniel A. Menasce. All rights reserved. 15
Introduction
2-Way Associative Cache
Memory block 3 can only go into cache set (3 mod 2) = 1
in the cache.
Figure: the four cache blocks form two sets of two blocks each; memory block 3 may occupy either block of set 1.
Copyright © 2013, Daniel A. Menasce. All rights reserved. 16
Introduction
Direct-mapped Cache
Memory block 5 can only go into cache block (5 mod 4) = 1
in the cache.
Figure: each of the four cache blocks is its own set; memory block 5 maps only to cache block 1.
Copyright © 2013, Daniel A. Menasce. All rights reserved. 17
Introduction
Processor/Cache Addressing
2-way associative cache.
Figure: the processor issues a 4-bit address split into a 2-bit TAG, a 1-bit INDEX, and a 1-bit block OFFSET. The cache holds two sets of two blocks; each block stores two 8-bit words plus a valid bit (V) and a tag. Main memory holds eight 2-word blocks (B0-B7), alternating between set 0 and set 1.
INDEX: selects the set.
TAG: used to check for a cache hit if the valid bit (V) is 1.
Copyright © 2013, Daniel A. Menasce. All rights reserved. 18
Introduction
Processor/Cache Addressing
2-way associative cache.
Figure: the indexed set contains a valid block (V = 1) whose stored tag equals the address TAG, so the comparison succeeds (=) and the access is a CACHE HIT; the OFFSET then selects the word within the block.
Copyright © 2013, Daniel A. Menasce. All rights reserved. 19
Introduction
Processor/Cache Addressing
2-way associative cache.
Figure: the processor repeats the lookup, but the matching entry's valid bit is now 0, so the access is a CACHE MISS. WHY? Even a matching tag cannot produce a hit when the block is invalid.
Copyright © 2013, Daniel A. Menasce. All rights reserved. 20
Introduction
Processor/Cache Addressing
2-way associative cache.
Figure: here the comparison itself fails (stored tag ≠ address TAG) and the entry is invalid, so the access is a CACHE MISS.
Copyright © 2013, Daniel A. Menasce. All rights reserved. 21
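The lookup in the preceding figures can be written out in C. This is an illustrative sketch of the same 4-bit address split (2-bit tag, 1-bit index, 1-bit offset; two sets of two blocks, two words per block); the struct and function names are invented for the example.

  #include <stdbool.h>
  #include <stdint.h>

  typedef struct {
      bool    valid;    /* V bit */
      uint8_t tag;      /* 2-bit tag */
      uint8_t data[2];  /* two words per block */
  } CacheLine;

  static CacheLine cache[2][2];  /* [set][way]: 2 sets, 2-way */

  static bool lookup(uint8_t addr, uint8_t *word) {
      uint8_t offset = addr & 0x1;         /* bit 0: word within the block */
      uint8_t index  = (addr >> 1) & 0x1;  /* bit 1: selects the set */
      uint8_t tag    = (addr >> 2) & 0x3;  /* bits 3-2: the tag */
      for (int way = 0; way < 2; way++) {  /* hardware compares both ways in parallel */
          CacheLine *line = &cache[index][way];
          if (line->valid && line->tag == tag) {  /* hit only if V=1 and tags match */
              *word = line->data[offset];
              return true;
          }
      }
      return false;  /* miss: no valid block in the set has a matching tag */
  }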
Introduction
Cache Miss: Block Replacement
Which Block to Replace?
Direct-mapped cache:
Only one option: replace the block in the location where the
incoming block has to go.
Fully Associative or Set Associative
Random: spreads allocation uniformly.
Least Recently Used (LRU): the block replaced is the one
that has been unused for the longest time. Can be expensive
to implement in hardware. Pseudo-LRU provides an
approximation using bits associated with each set to record
when blocks in a set were accessed.
First In First Out (FIFO): selects the oldest rather than the
LRU block.
Copyright © 2012, Elsevier Inc. All rights reserved. 22
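For a 2-way set associative cache, true LRU is cheap: one bit per set recording the most recently used way is enough. A minimal sketch (the array name and the 512-set size are assumptions):

  #define NUM_SETS 512

  static unsigned mru_way[NUM_SETS];  /* way touched most recently in each set */

  static void on_access(unsigned set, unsigned way) {
      mru_way[set] = way;             /* update on every hit or fill */
  }

  static unsigned victim(unsigned set) {
      return 1 - mru_way[set];        /* replace the other, least recently used, way */
  }

With higher associativity this bookkeeping grows quickly, which is why hardware often falls back on the pseudo-LRU approximation mentioned above.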
Introduction
Memory Hierarchy Basics
Writing to cache: two strategies
Write-through
Immediately update lower levels of hierarchy
Write-back
Only update lower levels of hierarchy when an updated block
is replaced
Both strategies use a write buffer to make writes
asynchronous
Copyright © 2012, Elsevier Inc. All rights reserved. 23
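The difference between the two policies on a write hit can be sketched as follows; cache_write, memory_write, and set_dirty are hypothetical helpers standing in for the hardware datapath.

  #include <stdint.h>

  void cache_write(uint64_t addr, uint8_t v);   /* hypothetical: update cached copy */
  void memory_write(uint64_t addr, uint8_t v);  /* hypothetical: update lower level */
  void set_dirty(uint64_t addr);                /* hypothetical: mark block modified */

  /* Write-through: the lower level is updated on every write. */
  void write_hit_through(uint64_t addr, uint8_t v) {
      cache_write(addr, v);
      memory_write(addr, v);  /* typically buffered so the CPU need not wait */
  }

  /* Write-back: the lower level is updated only when the dirty block is evicted. */
  void write_hit_back(uint64_t addr, uint8_t v) {
      cache_write(addr, v);
      set_dirty(addr);
  }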
Introduction
Write-through
Figure: the CPU writes word 018, which is resident in the cache (CACHE HIT). With write-through, the cached copy and the copy in main memory are both updated immediately.
Copyright © 2013, Daniel A. Menasce. All rights reserved. 24
Introduction
Write-back – Part I
Figure: the CPU writes word 018 (CACHE HIT). With write-back, only the cached copy is updated; the block is marked dirty, and main memory still holds the old value.
Copyright © 2013, Daniel A. Menasce. All rights reserved. 25
Introduction
Write-back – Part II
Figure: the CPU then reads word 024 (CACHE MISS). The victim chosen for replacement is the dirty block containing word 018, so its contents must first be written back to main memory.
Copyright © 2013, Daniel A. Menasce. All rights reserved. 26
Introduction
Write-back – Part III
Figure: after the dirty block 017-020 is written back to memory, the block containing word 024 (words 021-024) is fetched into its place (BLOCK 1 REPLACEMENT) and word 024 is returned to the CPU.
Copyright © 2013, Daniel A. Menasce. All rights reserved. 27
Introduction
Memory Hierarchy Basics
Miss rate
Fraction of cache accesses that result in a miss
Causes of misses (3 C’s)
Compulsory
First reference to a block. Would happen even for infinite
caches.
Capacity
The cache cannot hold all the blocks the program needs, so blocks are discarded and later retrieved
Conflict
Program makes repeated references to multiple addresses
from different blocks that map to the same location in the
cache
Copyright © 2012, Elsevier Inc. All rights reserved. 28
Introduction
Memory Hierarchy Basics
MemoryStallCycles = NumberOfMisses × MissPenalty
= IC × (Misses / Instruction) × MissPenalty
= IC × (MemoryAccesses / Instruction) × MissRate × MissPenalty
Note that speculative and multithreaded processors
may execute other instructions during a miss
Reduces the performance impact of misses
Copyright © 2012, Elsevier Inc. All rights reserved. 29
Introduction
Memory Performance
Example: Assume that the CPI of a computer is 1.0 when all memory
accesses hit in the cache. Data access instructions (loads and stores)
account for 50% of all instructions. The miss penalty is 25 cycles and the
miss rate is 4%. Assume that 70% of instruction fetches hit in the instruction cache.
What is the execution time, expressed in terms of clock cycles (CC) and instruction count (IC)?

Instruction fetch   Data access?      Data cache    Probability   No. CCs
Hit (0.7)           memory (0.5)      miss (0.04)   0.014         1+25=26
Hit (0.7)           memory (0.5)      hit (0.96)    0.336         1
Hit (0.7)           no memory (0.5)   -             0.350         1
Miss (0.3)          memory (0.5)      miss (0.04)   0.006         25+25+1=51
Miss (0.3)          memory (0.5)      hit (0.96)    0.144         25+1=26
Miss (0.3)          no memory (0.5)   -             0.150         25+1=26

IC*(0.014*26+(0.336+0.35)*1+0.006*51+(0.144+0.15)*26)*CC = 9 * IC * CC
Copyright © 2013, Daniel A. Menasce All rights reserved. 30
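The expected cost per instruction can be re-checked mechanically; this small C program simply re-evaluates the probability tree above.

  #include <stdio.h>

  int main(void) {
      double cycles =
          0.7 * 0.5 * 0.04 * 26 +  /* I-fetch hit, data access, data cache miss  */
          0.7 * 0.5 * 0.96 *  1 +  /* I-fetch hit, data access, data cache hit   */
          0.7 * 0.5 *         1 +  /* I-fetch hit, no data access                */
          0.3 * 0.5 * 0.04 * 51 +  /* I-fetch miss, data access, data cache miss */
          0.3 * 0.5 * 0.96 * 26 +  /* I-fetch miss, data access, data cache hit  */
          0.3 * 0.5 *        26;   /* I-fetch miss, no data access               */
      printf("%.1f cycles per instruction\n", cycles);  /* prints 9.0 */
      return 0;
  }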
Introduction
Memory Performance
Example: How would the execution time in the previous example
change if the miss rate were reduced to 2%?
Instruction fetch   Data access?      Data cache    Probability   No. CCs
Hit (0.7)           memory (0.5)      miss (0.02)   0.007         1+25=26
Hit (0.7)           memory (0.5)      hit (0.98)    0.343         1
Hit (0.7)           no memory (0.5)   -             0.350         1
Miss (0.3)          memory (0.5)      miss (0.02)   0.003         25+25+1=51
Miss (0.3)          memory (0.5)      hit (0.98)    0.147         25+1=26
Miss (0.3)          no memory (0.5)   -             0.150         25+1=26

IC*(0.007*26+(0.343+0.35)*1+0.003*51+(0.147+0.15)*26)*CC = 8.75 * IC * CC
Copyright © 2013, Daniel A. Menasce All rights reserved. 31
Introduction
Memory Performance
Example: What if the miss rate were kept at 4% but memory access
time were improved so that the miss penalty were reduced to 10 cycles?

Instruction fetch   Data access?      Data cache    Probability   No. CCs
Hit (0.7)           memory (0.5)      miss (0.04)   0.014         1+10=11
Hit (0.7)           memory (0.5)      hit (0.96)    0.336         1
Hit (0.7)           no memory (0.5)   -             0.350         1
Miss (0.3)          memory (0.5)      miss (0.04)   0.006         10+10+1=21
Miss (0.3)          memory (0.5)      hit (0.96)    0.144         10+1=11
Miss (0.3)          no memory (0.5)   -             0.150         10+1=11

IC*(0.014*11+(0.336+0.35)*1+0.006*21+(0.144+0.15)*11)*CC = 4.2 * IC * CC
Avg. of 3.2 stall cycles/instruction (4.2 total cycles minus the base CPI of 1.0)
Copyright © 2013, Daniel A. Menasce All rights reserved. 32
Introduction
Memory Hierarchy Basics
Six basic cache optimizations:
Larger block size
Reduces compulsory misses
Increases capacity and conflict misses, increases miss penalty
Larger total cache capacity to reduce miss rate
Increases hit time, increases power consumption
Higher associativity
Reduces conflict misses
Increases hit time, increases power consumption
Higher number of cache levels (multilevel caches)
Reduces overall average memory access time (see the sketch after this list):
AMAT = HitTime_L1 + MissRate_L1 × (HitTime_L2 + MissRate_L2 × MissPenalty_L2)
Giving priority to read misses over writes
Reduces miss penalty
Avoiding address translation during cache indexing
Reduces hit time
Copyright © 2012, Elsevier Inc. All rights reserved. 33
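The multilevel-cache formula is easy to evaluate in code; the sample numbers below (1-cycle L1 hit, 5% L1 miss rate, 10-cycle L2 hit, 20% L2 local miss rate, 100-cycle memory penalty) are invented for illustration.

  /* Average memory access time for a two-level cache hierarchy. */
  double amat2(double hit_l1, double miss_l1,
               double hit_l2, double miss_l2, double penalty_l2) {
      return hit_l1 + miss_l1 * (hit_l2 + miss_l2 * penalty_l2);
  }

  /* Example: amat2(1, 0.05, 10, 0.20, 100) = 1 + 0.05 * (10 + 20) = 2.5 cycles. */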
Advanced Optimizations
Ten Advanced Optimizations
Grouped by the metric they improve:
Hit time
Miss rate
Miss penalty
Cache bandwidth
Power consumption
Copyright © 2012, Elsevier Inc. All rights reserved. 34
Advanced Optimizations
Ten Advanced Optimizations (1)
Small and simple first level caches
Critical timing path:
addressing tag memory, then
comparing tags, then
selecting the correct data item
Direct-mapped caches can overlap tag compare and
transmission of data
Lower associativity reduces power because fewer
cache lines are accessed
Copyright © 2012, Elsevier Inc. All rights reserved. 35
Advanced Optimizations
L1 Size and Associativity
Access time vs. size and associativity
Copyright © 2012, Elsevier Inc. All rights reserved. 36
Advanced Optimizations
L1 Size and Associativity
Energy per read vs. size and associativity
Copyright © 2012, Elsevier Inc. All rights reserved. 37
Introduction
Memory Performance Example
Determine if a 32-KB 4-way set associative L1 cache has a faster memory
access time than a 32-KB 2-way set associative L1 cache. Assume that
the miss penalty to L2 is 15 clock cycles and that the hit time for
the faster L1 cache is 1 clock cycle. Ignore misses beyond the L2 cache
and assume miss rates for the L1 cache as below.
Miss rate for 2-way associative = 0.038
Miss rate for 4-way associative = 0.037
Copyright © 2013, Daniel A. Menasce All rights reserved. 38
Introduction
Memory Performance Example
Determine if a 32-KB 4-way set associative L1 cache has a faster memory
access time than a 32-KB 2-way set associative L1 cache. Assume that
the miss penalty to L2 is 15 clock cycles and that the hit time for
the faster L1 cache is 1 clock cycle. Ignore misses beyond the L2 cache
and assume miss rates for the L1 cache as below.
Miss rate for 2-way associative = 0.038
Miss rate for 4-way associative = 0.037
Hit time for 2-way associative = 1 clock cycle.
Avg. access time for 2-way associative = HitTime + MissRate * MissPenalty
= 1 + 0.038 * 15 = 1.57 clock cycles
The figure on slide 36 shows that the hit time of the 4-way L1 cache is
approximately 1.4 times that of the 2-way associative cache.
Avg. access time for 4-way associative = HitTime + MissRate * MissPenalty
= 1.4 * 1 + 0.037 * 15 = 1.955 clock cycles
Since 1.955 > 1.57, the 2-way associative cache provides the faster average
access time, despite its slightly higher miss rate.
Copyright © 2013, Daniel A. Menasce All rights reserved. 39
Advanced Optimizations
Way Prediction (2)
To improve hit time, predict which way (i.e., which block
within the set) the next cache access will reference, and
pre-set the multiplexor accordingly.
Mis-prediction gives longer hit time
Prediction accuracy
> 90% for two-way
> 80% for four-way
I-cache has better accuracy than D-cache
First used on MIPS R10000 in mid-90s
Used on ARM Cortex-A8
Copyright © 2012, Elsevier Inc. All rights reserved. 40
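In software terms, way prediction amounts to trying one predicted way before searching the rest of the set. A rough C sketch with assumed sizes and names (real predictors index on block-address bits):

  #define SETS 64
  #define WAYS 2

  typedef struct { int valid; unsigned tag; } Line;
  static Line     cache[SETS][WAYS];
  static unsigned predicted_way[SETS];  /* one prediction per set */

  /* Returns 0 = miss, 1 = fast hit (predicted way), 2 = slow hit (other way). */
  int probe(unsigned set, unsigned tag) {
      unsigned p = predicted_way[set];           /* mux is pre-set to this way */
      if (cache[set][p].valid && cache[set][p].tag == tag)
          return 1;                              /* prediction correct: short hit time */
      for (unsigned w = 0; w < WAYS; w++) {
          if (w != p && cache[set][w].valid && cache[set][w].tag == tag) {
              predicted_way[set] = w;            /* retrain on mis-prediction */
              return 2;                          /* hit, but with longer hit time */
          }
      }
      return 0;                                  /* miss */
  }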
Advanced Optimizations
Pipelining Cache (3)
Pipeline cache access to improve bandwidth
Examples:
Pentium: 1 cycle
Pentium Pro – Pentium III: 2 cycles
Pentium 4 – Core i7: 4 cycles
Increases branch mis-prediction penalty
Makes it easier to increase associativity
Copyright © 2012, Elsevier Inc. All rights reserved. 41
Advanced Optimizations
Nonblocking Caches (4)
Allow hits before previous misses complete:
“Hit under miss”
“Hit under multiple miss”
L2 must support this
In general, processors can hide an L1 miss penalty
but not an L2 miss penalty
Copyright © 2012, Elsevier Inc. All rights reserved. 42
Advanced Optimizations
Multibanked Caches (5)
Organize cache as independent banks to
support simultaneous access
ARM Cortex-A8 supports 1-4 banks for L2
Intel i7 supports 4 banks for L1 and 8 banks for L2
Interleave banks according to block address
Copyright © 2012, Elsevier Inc. All rights reserved. 43
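Sequential interleaving means consecutive block addresses round-robin across the banks; a one-line sketch (the four banks are chosen to match the i7's L1 figure above):

  #define NBANKS 4  /* e.g., the i7's four L1 banks */

  /* Bank selection under sequential interleaving by block address. */
  static unsigned bank_of(unsigned long block_addr) {
      return block_addr % NBANKS;  /* blocks 0,1,2,3,4,... map to banks 0,1,2,3,0,... */
  }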
Advanced Optimizations
Critical Word First, Early Restart (6)
Critical word first
Request missed word from memory first
Send it to the processor as soon as it arrives
Early restart
Request words in normal order
Send the missed word to the processor as soon as it
arrives
Effectiveness of these strategies depends on
block size and likelihood of another access to
the portion of the block that has not yet been
fetched
Copyright © 2012, Elsevier Inc. All rights reserved. 44
Advanced Optimizations
Merging Write Buffer (7)
When storing to a block that is already pending in the
write buffer, update write buffer
Reduces stalls due to full write buffer
Does not apply to I/O addresses
Figure: write buffer contents without merging (one word per entry, four entries occupied) vs. with merging (the four writes packed into a single entry).
Copyright © 2012, Elsevier Inc. All rights reserved. 45
Advanced Optimizations
Compiler Optimizations (8)
Loop Interchange
Swap nested loops to access memory in
sequential order
Blocking (see examples on pages 89-90)
Instead of accessing entire rows or columns,
subdivide matrices into blocks
Requires more memory accesses but improves locality
of accesses (a sketch follows this slide)
Copyright © 2012, Elsevier Inc. All rights reserved. 46
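As a sketch of the blocking idea (the textbook's worked example is on pages 89-90), here is a tiled matrix multiply computing x = x + y*z. N, the tile size B, and the accumulation pattern are illustrative choices; x is assumed zero-initialized.

  #define N 512
  #define B 32  /* tile size, tuned so a B-by-B tile stays cache-resident */

  void blocked_mm(double x[N][N], double y[N][N], double z[N][N]) {
      for (int jj = 0; jj < N; jj += B)
          for (int kk = 0; kk < N; kk += B)
              for (int i = 0; i < N; i++)
                  for (int j = jj; j < jj + B; j++) {
                      double r = 0.0;
                      for (int k = kk; k < kk + B; k++)
                          r += y[i][k] * z[k][j];  /* reuse the resident tiles */
                      x[i][j] += r;                /* accumulate partial products */
                  }
  }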
Advanced Optimizations
Loop Interchange
Assume [5000,100] array stored in row major
order (i.e., laid out by rows)
Before: x[0,0], x[1,0], x[2,0] …
for (j=0; j < 100; j = j + 1)
for (i=0; i < 5000; i = i + 1)
x[i][j] = 2 * x[i][j];
After: x[0,0], x[0,1], x[0,2], …
for (i=0; i < 5000; i = i + 1)
for (j=0; j < 100; j = j + 1)
x[i][j] = 2 * x[i][j];
Which one produces fewer misses?
Copyright © 2012, Elsevier Inc. All rights reserved. 47
Advanced Optimizations
Hardware Prefetching (9)
Fetch two blocks on miss (include next
sequential block)
Pentium 4 Pre-fetching (subset of SPEC 2000 programs)
Copyright © 2012, Elsevier Inc. All rights reserved. 48
Advanced Optimizations
Compiler Prefetching (10)
Insert prefetch instructions before data is needed
Non-faulting: prefetch doesn’t cause exceptions
Register prefetch
Loads data into register
Cache prefetch
Loads data into cache
Combine with loop unrolling and software
pipelining
Copyright © 2012, Elsevier Inc. All rights reserved. 49
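A compiler (or programmer) can express a cache prefetch with GCC/Clang's __builtin_prefetch; the prefetch distance of 16 elements below is an assumed tuning value.

  /* Cache prefetch: start fetching a[i+16] while working on a[i]. */
  void scale(double *a, long n) {
      for (long i = 0; i < n; i++) {
          __builtin_prefetch(&a[i + 16], 0, 1);  /* non-faulting read hint */
          a[i] = 2.0 * a[i];
      }
  }

Combined with loop unrolling, one prefetch can cover several iterations, reducing the instruction overhead.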
Advanced Optimizations
Summary
Copyright © 2012, Elsevier Inc. All rights reserved. 50
Memory Technology
Main Memory Technology
Performance metrics
Latency is the primary concern of caches
Bandwidth is the primary concern of multiprocessors and I/O
Access time
Time between read request and when desired word
arrives
Cycle time
Minimum time between unrelated requests to memory
DRAM used for main memory, SRAM used
for cache
Copyright © 2012, Elsevier Inc. All rights reserved. 51
Memory Technology
Memory Technology
SRAM
Does not require refresh
Requires low power to retain bit
Requires 6 transistors/bit
DRAM (used for main memory)
Must be re-written after being read
Must also be periodically refreshed
Every ~ 8 ms
All bits in a row can be refreshed simultaneously by reading
that row
One transistor/bit
Address lines are multiplexed:
Upper half of address: row access strobe (RAS)
Lower half of address: column access strobe (CAS)
Copyright © 2012, Elsevier Inc. All rights reserved. 52
Memory Technology
Memory Technology
Access time
Time between a memory request and when
the desired word arrives.
Cycle time
Minimum time between unrelated requests to memory
SRAMs don’t need refresh, so cycle time and
access time are very close.
Copyright © 2012, Elsevier Inc. All rights reserved. 53
Memory Technology
Memory Technology
Amdahl:
Memory capacity should grow linearly with processor speed
Unfortunately, memory capacity and speed have not kept
pace with processors
Some optimizations:
Multiple accesses to same row (using row buffer)
Synchronous DRAM (SDRAM)
Added clock to DRAM interface
Burst mode with critical word first
Wider interfaces (DDR2 and DDR3)
Double data rate (DDR): transfer data at the rising and falling
edges of the clock cycle.
Multiple independent banks on each DRAM device =>
address = (bank #, row #, column #)
Copyright © 2012, Elsevier Inc. All rights reserved. 54
Memory Technology
Memory Optimizations
Copyright © 2012, Elsevier Inc. All rights reserved. 55
Memory Technology
Memory Optimizations
Copyright © 2012, Elsevier Inc. All rights reserved. 56
Memory Technology
Memory Optimizations
DDR:
DDR2
Lower power (2.5 V -> 1.8 V)
Higher clock rates (266 MHz, 333 MHz, 400 MHz)
DDR3
1.5 V
Maximum clock frequency: 800 MHz
DDR4
1-1.2 V
Maximum clock frequency: 1600 MHz
GDDR5 is graphics memory based on DDR3
Used in GPUs.
Copyright © 2012, Elsevier Inc. All rights reserved. 57
Memory Technology
Memory Optimizations
Graphics memory:
Achieve 2-5× the bandwidth per DRAM of DDR3
Wider interfaces (32 vs. 16 bits)
Higher clock rate
Possible because they are attached via soldering instead of
socketed DIMM modules
Reducing power in SDRAMs:
Lower voltage (1.35V or 1.5V)
Low power mode (ignores clock, continues to
refresh)
Copyright © 2012, Elsevier Inc. All rights reserved. 58
Memory Technology
Memory Power Consumption
Micron 1.5 V 2Gb DDR3-1066 SDRAM
Copyright © 2012, Elsevier Inc. All rights reserved. 59
Memory Technology
Flash Memory
Type of EEPROM (Electrically Erasable
Programmable Read-Only Memory)
Must be erased (in blocks) before being
overwritten
Nonvolatile
Limited number of write cycles
Cheaper than SDRAM, more expensive than
disk
Slower than SDRAM, faster than disk
Copyright © 2012, Elsevier Inc. All rights reserved. 60
Memory Technology
Memory Dependability
Memory is susceptible to cosmic rays
Soft errors: dynamic errors
Detected and fixed by error correcting codes
(ECC)
Hard errors: permanent errors
Use spare rows to replace defective rows
Chipkill: a RAID-like error recovery
technique:
Data and ECC are distributed so that the loss of a
single memory chip can be tolerated and its
contents reconstructed from the others.
Copyright © 2012, Elsevier Inc. All rights reserved. 61
Virtual Memory and Virtual Machines
Virtual Memory Basics
The address space of a process is divided
into equal sized pages (e.g., 4KB in size).
Main memory is divided into page frames that
can hold one page each.
The size of a process’ address space can be
larger than main memory.
At any time, some pages of a process may
not be in main memory. All pages are on
the paging disk.
When a page that is not in memory is
referenced, it must be brought from disk to
memory. This is a page fault.
Copyright © 2013, D.A. Menasce. All rights reserved. 62
Virtual Memory and Virtual Machines
Virtual Memory Basics
If all page frames are occupied when a page
fault occurs, a page has to be evicted from
memory using a page replacement algorithm.
Least Recently Used (LRU) is an example
of a PRA.
There is a Page Table per process,
maintained by the operating system.
Translation from a virtual address to a
physical address is done by the
hardware.
A page fault invokes the operating
system.
Copyright © 2013, D.A. Menasce. All rights reserved. 63
Virtual Memory and Virtual Machines
Virtual Memory Basics
Figure: a virtual address is split into a virtual page number (VPN) and a page offset; the per-process page table maps the VPN to a page frame number (PFN), which is concatenated with the unchanged offset to form the physical address. Here pages 1, 2, 13, and 15 are resident in page frames 3, 0, 2, and 1 of main memory, while all 16 pages of the process address space (0-15) are kept on the paging disk.
Copyright © 2013, D.A. Menasce. All rights reserved. 64
Virtual Memory and Virtual Machines
Virtual Memory Basics
Figure: the process references virtual page 2; the page table maps VPN 2 to page frame 0, so the physical address is frame number 0 concatenated with the unchanged page offset.
Copyright © 2013, D.A. Menasce. All rights reserved. 65
Virtual Memory and Virtual Machines
Virtual Memory Basics
Figure: the process references virtual page 14; the page table entry for page 14 is empty (the page is not in main memory), so the reference raises a Page Fault!
Copyright © 2013, D.A. Menasce. All rights reserved. 66
Virtual Memory and Virtual Machines
Virtual Memory Basics
Figure: to handle the fault, the OS evicts page 13 from page frame 2, loads page 14 from the paging disk into that frame, and updates the page table (page 13 no longer resident, page 14 mapped to frame 2); the translation then yields frame number 2.
Copyright © 2013, D.A. Menasce. All rights reserved. 67
Virtual Memory and Virtual Machines
Virtual Memory Example
A computer system has 4 GB of main
memory and a virtual address space equal to
64 GB. Each page is 8 KB long.
How many pages are there in the virtual
address space?
How many page frames are there in memory?
How many bits are needed for the VPN field
of an address?
How many bits are needed for the page frame
number field of an address?
How many bits are needed for the page offset
field?
Copyright © 2013, D.A. Menasce. All rights reserved. 68
Virtual Memory and Virtual Machines
Virtual Memory Example
A computer system has 4 GB of main
memory and a virtual address space equal to
64 GB. Each page is 8 KB long.
How many pages are there in the virtual
address space?
64 * 2^30 / (8 * 2^10) = 8 * 2^20 = 2^23 pages
Copyright © 2013, D.A. Menasce. All rights reserved. 69
Virtual Memory and Virtual Machines
Virtual Memory Example
A computer system has 4 GB of main
memory and a virtual address space equal to
64 GB. Each page is 8 KB long.
How many page frames are there in memory?
4 * 2^30 / (8 * 2^10) = 2^32 / 2^13 = 2^19 page
frames
Copyright © 2013, D.A. Menasce All rights reserved. 70
Virtual Memory and Virtual Machines
Virtual Memory Example
A computer system has 4 GB of main
memory and a virtual address space equal to
64 GB. Each page is 8 KB long.
How many bits are needed for the VPN field
of an address?
2^23 pages => 23 bits
Copyright © 2013, D.A. Menasce. All rights reserved. 71
Virtual Memory and Virtual Machines
Virtual Memory Example
A computer system has 4 GB of main
memory and a virtual address space equal to
64 GB. Each page is 8 KB long.
How many bits are needed for the page frame
number field of an address?
2^19 page frames => 19 bits
Copyright © 2013, D.A. Menasce. All rights reserved. 72
Virtual Memory and Virtual Machines
Virtual Memory Example
A computer system has 4 GB of main
memory and a virtual address space equal to
64 GB. Each page is 8 KB long.
How many bits are needed for the page offset
field?
If a word is 8 bytes long, then each page has
2^3 * 2^10 / 2^3 = 2^10 words
=> 10 bits are needed for the offset
Copyright © 2013, D.A. Menasce. All rights reserved. 73
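Putting the example's numbers together, address translation can be sketched in C. This sketch assumes byte addressing, so the offset is 13 bits (the slide's 10-bit count is for 8-byte-word addresses); the page_table array is a stand-in for the OS-maintained structure.

  #include <stdint.h>

  #define PAGE_BITS 13           /* 8 KB pages = 2^13 bytes */
  #define NUM_PAGES (1UL << 23)  /* 64 GB / 8 KB virtual pages */

  /* VPN -> frame number; entries must be initialized to -1 for non-resident
     pages. (A flat 64 MB table; real systems use multi-level page tables.) */
  static int64_t page_table[NUM_PAGES];

  static int64_t translate(uint64_t va) {
      uint64_t vpn    = va >> PAGE_BITS;
      uint64_t offset = va & ((1UL << PAGE_BITS) - 1);
      int64_t  frame  = page_table[vpn];
      if (frame < 0)
          return -1;             /* page fault: invoke the operating system */
      return ((uint64_t)frame << PAGE_BITS) | offset;
  }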
Virtual Memory and Virtual Machines
Virtual Memory
Protection via virtual memory
Keeps processes in their own memory space
Role of architecture:
Provide user mode and supervisor mode
Protect certain aspects of CPU state
Provide mechanisms for switching between user
mode and supervisor mode
Provide mechanisms to limit memory accesses
Provide TLB to translate addresses
Copyright © 2012, Elsevier Inc. All rights reserved. 74
Virtual Memory and Virtual Machines
Virtual Machines
Supports isolation and security
Sharing a computer among many unrelated users
Enabled by raw speed of processors, making the
overhead more acceptable
Allows different ISAs and operating systems to be
presented to user programs
“System Virtual Machines”
SVM software is called “virtual machine monitor” or
“hypervisor”
Individual virtual machines running under the monitor are called
“guest VMs”
Copyright © 2012, Elsevier Inc. All rights reserved. 75
Virtual Memory and Virtual Machines
Impact of VMs on Virtual Memory
Each guest OS maintains its own set of page
tables
VMM adds a level of memory between physical
and virtual memory called “real memory”
VMM maintains shadow page table that maps
guest virtual addresses to physical addresses
Requires VMM to detect guest’s changes to its own page
table
Occurs naturally if accessing the page table pointer is a
privileged operation
Copyright © 2012, Elsevier Inc. All rights reserved. 76