COA Memory Hierarchy Notes

The document discusses memory hierarchy in computer architecture, emphasizing the importance of cache design, including concepts like hit/miss rates and associativity. It outlines the specifications of modern computers and the types of memory technologies used, such as SRAM, DRAM, and Flash. Additionally, it covers the principles of locality and cache access mechanisms, providing examples of cache algorithms and their implementation in computer systems.

CS521 CSE IITG 11/23/2012

• Memory Hierarchy
• Hit/Miss, IPC
• Cache: Set, Line size, Associativity

No Class on Friday (07‐SEP‐2012)

A Sahu

Typical specs of a computer today
(Prices from http://www.flipkart.com)
• Dell XPS 14: 14.0" WLED (1366 x 768), Intel Core i7-740QM, 4 GB DDR3 1333 MHz, 500 GB HDD, Windows 7 Home Premium, 8x CD/DVD burner (dual-layer DVD+/-R drive), NVIDIA GeForce GT 425M, 2 x USB 2.0, HDMI, eSATA, 6-cell lithium-ion battery, built-in 2.0-megapixel HD webcam. Price: Rs. 48,300
• HP TouchSmart TM2 Series TM2-2102TU (Modern Argento): Intel Core i3, 3 GB DDR3 RAM, 12.1-inch screen, Windows 7 Home Premium. Price: Rs. 47,199.00

Memory technologies
• Semiconductor: random access, like an array (x = A[i])
  – Registers
  – SRAM
  – DRAM
  – FLASH
• Magnetic: random seek to a sector, then sequential access, like an array + linked list
  – FDD
  – HDD
• Optical
  – CD
  – DVD

Memory Hierarchy
• Smaller is Faster; Bigger is Slower
• Analogy with places people keep cash/money:
  – Pocket ↔ Registers
  – Purse ↔ Cache
  – Home Locker ↔ DDR RAM
  – Bank ↔ HDD
  Speed decreases and storage increases as you move down the list.

CPU Design with Memory Hierarchy
• Without a hierarchy: the PC, register file and ALU connect directly to separate instruction and data memories (address in, instruction/data out, Reg# ports into the register file).
• With a hierarchy: the CPU (PC, register file, ALU) accesses split level-1 caches (IL1 for instructions, DL1 for data), backed by a unified L2 cache and main memory.


Hierarchical structure
• Programmers want unlimited amounts of memory with low latency
• Fast memory technology is more expensive per bit than slower memory
• Solution: organize the memory system into a hierarchy
  – The entire addressable memory space is available in the largest, slowest memory
  – Incrementally smaller and faster memories, each containing a subset of the memory below it, proceed in steps up toward the processor
• Temporal and spatial locality ensure that nearly all references can be found in the smaller memories
  – This gives the illusion of a large, fast memory being presented to the processor
(Diagram: a pyramid of memories below the CPU; the topmost level is the fastest, smallest, and highest cost/bit, the bottom level the slowest, biggest, and lowest cost/bit.)

• Memory hierarchy design becomes more crucial with recent multi-core processors:
  – Aggregate peak bandwidth grows with the number of cores:
    • An Intel Core i7 can generate two data references per core per clock
    • With four cores and a 3.2 GHz clock:
      – 25.6 billion 64-bit data references/second, plus
      – 12.8 billion 128-bit instruction references/second
      – = 409.6 GB/s!
    • DRAM bandwidth is only 6% of this (25 GB/s)
  – This requires:
    • Multi-port, pipelined caches
    • Two levels of cache per core
    • A shared third-level cache on chip
Performance and Power
• High-end microprocessors have >10 MB of on-chip cache
  – It consumes a large amount of the area and power budget
(Figure: processor vs. memory performance, 1980 to 2010, log scale 1 to 100,000; the processor-memory performance gap widens steadily over time.)


Data transfer between levels
• The processor accesses the cache: a hit if the requested data is present, a miss otherwise
• On a miss, data are transferred between cache and memory; the unit of transfer is a block

Principle of locality
• Temporal Locality
  – references repeated in time
• Spatial Locality
  – references repeated in space
  – Special case: Sequential Locality

  // 1D spatial locality: after accessing A[i], the near future
  // will access A[i+1], A[i+2], ...
  for (i = 0; i < 100; i++) {
    A[i] += sqrt(i);
  }

  // Temporal locality: each A[i] is accessed again after some time
  for (T = 0; T < 80; T++) {
    for (i = 0; i < 10; i++)
      A[i] += M[T] * i;
  }

• The address is divided into three parts: Tag, Index, Offset
  – Offset = Address % LineSize
  – Index = (Address / LineSize) % NumSet
  – Tag = Address / (LineSize * NumSet)
• If the Tag matches the tag stored at that index, it is a HIT, else a MISS:
  if (TAG == CACHE[Index].TAG) Cache HIT
  else Cache MISS
• Assume LineSize = 10, NumSet = 100, Address = 2067432
  – Offset = 2, Index = 43, Tag = 2067

Cache Example: Algorithmic
(Decimal example, direct mapped, line size 10, sets numbered 0-9. After accessing addresses 212011 and then 212012, set 1 holds tag 2120 and the other sets are empty; the second access is a hit.)
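The three formulas above translate directly into code. A minimal sketch (the function names are mine, not the slides'):

```c
/* Decimal address split, per the slide:
 *   Offset = Address % LineSize
 *   Index  = (Address / LineSize) % NumSet
 *   Tag    = Address / (LineSize * NumSet)  */
static int cache_offset(int addr, int line_size)
{
    return addr % line_size;
}

static int cache_index(int addr, int line_size, int num_set)
{
    return (addr / line_size) % num_set;
}

static int cache_tag(int addr, int line_size, int num_set)
{
    return addr / (line_size * num_set);
}

/* Slide's example: address 2067432 with LineSize = 10, NumSet = 100
 * splits into Offset = 2, Index = 43, Tag = 2067. */
```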

Cache Example: Algorithmic (continued; decimal, direct mapped, line size 10)
Subsequent accesses behave as follows:
• 212335 (set 3, tag 2123): miss; set 3 now holds tag 2123
• 414368 (set 6, tag 4143): miss; set 6 now holds tag 4143
• 414365 (set 6, tag 4143): hit
• 414318 (set 1, tag 4143): miss; tag 4143 replaces tag 2120 in set 1

Cache Example: Algorithmic (two-way set associative, line size 10)
With two ways (two tag slots) per set, the same access sequence behaves as follows:
• 212011 (set 1, tag 2120): miss; fills one way of set 1
• 212012 (set 1, tag 2120): hit
• 212335 (set 3, tag 2123): miss; fills one way of set 3
• 414368 (set 6, tag 4143): miss; fills one way of set 6
• 414365 (set 6, tag 4143): hit
• 414318 (set 1, tag 4143): miss; fills the second way of set 1, so tags 2120 and 4143 now coexist instead of evicting each other
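Both worked examples can be replayed with a small simulator. This is a sketch under the slides' decimal scheme (10 sets, line size 10; all names are illustrative). It shows the payoff of associativity: the direct-mapped cache evicts tag 2120 on the access to 414318, so a repeated access to 212011 hits only in the two-way version.

```c
#include <string.h>   /* memset */

#define NSET 10       /* sets, per the decimal example  */
#define LS   10       /* line size, per the example     */

/* Direct mapped: one tag per set. Returns 1 on hit; fills on miss. */
static int dm_access(int tags[NSET], int addr)
{
    int idx = (addr / LS) % NSET;
    int tag = addr / (LS * NSET);
    if (tags[idx] == tag) return 1;
    tags[idx] = tag;
    return 0;
}

/* Two-way set associative: two tags per set; a miss fills way 0
 * and shifts the previous occupant to way 1. */
static int sa2_access(int tags[NSET][2], int addr)
{
    int idx = (addr / LS) % NSET;
    int tag = addr / (LS * NSET);
    if (tags[idx][0] == tag || tags[idx][1] == tag) return 1;
    tags[idx][1] = tags[idx][0];
    tags[idx][0] = tag;
    return 0;
}

/* Count hits for an access sequence in each organization.
 * memset with -1 marks every way as empty (tags are non-negative). */
static int dm_hits(const int *seq, int n)
{
    int tags[NSET], i, hits = 0;
    memset(tags, -1, sizeof tags);
    for (i = 0; i < n; i++) hits += dm_access(tags, seq[i]);
    return hits;
}

static int sa2_hits(const int *seq, int n)
{
    int tags[NSET][2], i, hits = 0;
    memset(tags, -1, sizeof tags);
    for (i = 0; i < n; i++) hits += sa2_access(tags, seq[i]);
    return hits;
}
```

Running the slides' sequence (212011, 212012, 212335, 414368, 414365, 414318) and then re-accessing 212011 gives 2 hits in the direct-mapped cache but 3 in the two-way cache, since set 1 still holds tag 2120 alongside 4143.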


Cache Size
• Number of sets (depends on the index field)
• Associativity (how many tags per set)
• Line size (number of addressable units/bytes in a line)
• Cache Size = NumSet x Associativity x LineSize
  = 10 x 4 x 10 = 400 bytes (a four-way version of the running decimal example)

Analogy with hashing:
• Simple hashing: Direct Mapped Cache
  – Example: int A[10], where each slot can store one element
  – Data stored at location Address % 10
  – Direct/random access to an element
• Array of lists: Set Associative Cache
  – LA[10], where each slot can store a list of elements
  – Data stored in the list at location Address % 10
  – The list size is limited in a set associative cache; access is mixed (indexed, then searched)
• List of elements: Fully Associative Cache
  – All data stored in one list
  – Serial/associative access to an element
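The size formula above can be captured in one line (the helper name is mine, not the slides'):

```c
/* Cache size, per the slide: NumSet x Associativity x LineSize,
 * with line size measured in bytes (addressable units). */
static int cache_size_bytes(int num_set, int assoc, int line_size)
{
    return num_set * assoc * line_size;
}

/* Running example: 10 sets x 4 ways x 10-byte lines = 400 bytes. */
```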

Cache: Placement
• Direct Mapped
  – Only one tag is compared, at the position selected by the index
• Set Associative
  – Both index selection and tag matching: the index selects a set, and all tags in that set are compared
• Fully Associative
  – Only tag matching, no index: one comparator per entry (CAM: Content Addressable Memory); the slide's diagram compares all 8 stored tags in parallel

Addressing Cache
• The address is split into Tag | Set Index | Displacement
  – The set index selects the set
  – The tag is compared to the tags stored in that set
  – The displacement selects the addressable unit (AU) within the line
• Early select: access the data after tag matching
• Late select: access the data while tag matching

Cache access mechanism
• Direct mapped, one-word blocks, 1024 lines, 32-bit address:
  – Tag = 20 bits, Index = 10 bits, Byte offset = 2 bits
  – The index selects one of 1024 entries (valid bit, 20-bit tag, 32-bit data); the stored tag is compared with the address tag, and Hit is asserted on a valid match

Cache with 4 word blocks
• Direct mapped, four-word blocks, 32-bit address:
  – Tag = 16 bits, Index = 12 bits, Block offset = 2 bits, Byte offset = 2 bits
  – The index selects an entry (valid bit, 16-bit tag, four 32-bit words); a multiplexer driven by the block offset picks one word, and the tag compare produces Hit
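The binary field splits in these diagrams are shifts and masks. A sketch for the four-word-block layout above (16-bit tag, 12-bit index, 2-bit block offset, 2-bit byte offset; the helper names are mine):

```c
#include <stdint.h>

/* Field split for the four-word-block cache on the slide:
 * bits [31:16] tag, [15:4] index, [3:2] block offset, [1:0] byte offset. */
static uint32_t addr_tag(uint32_t a)       { return a >> 16; }
static uint32_t addr_index(uint32_t a)     { return (a >> 4) & 0xFFFu; }
static uint32_t addr_block_off(uint32_t a) { return (a >> 2) & 0x3u; }
static uint32_t addr_byte_off(uint32_t a)  { return a & 0x3u; }

/* Example: 0x12345678 splits into tag 0x1234, index 0x567,
 * block offset 2, byte offset 0. */
```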


• Four-way set associative, four-word blocks, 32-bit address:
  – Tag = 20 bits, Index = 8 bits, Block offset = 2 bits, Byte offset = 2 bits
  – The index selects one of 256 sets; the four stored tags are compared in parallel, a per-way multiplexer picks one word from each 128-bit block, and a final multiplexer selects the data of the matching way; the comparator outputs combine into the Hit signal
(Figure: access time vs. size and associativity: access time in μs plotted against cache size from 16KB to 256KB for 1-way, 2-way, 4-way, and 8-way caches.)

(Figure: energy per read vs. size and associativity: energy per read in nanojoules, on a 0 to 0.5 nJ scale, plotted against cache size from 16KB to 256KB for 1-way, 2-way, 4-way, and 8-way caches.)
