CSC 203
Computer System Architecture
Budditha Hettige
Department of Statistics and Computer Science, University of Sri Jayewardenepura
Cache memory
Cache Memory
A high-speed, small memory. The most frequently used memory words are kept in the cache. When the CPU needs a word, it first checks the cache; if the word is not found there, it checks main memory.
Cache and Main Memory
Cache Memory vs. Main Memory
Cache Hit and Miss
Cache Hit: a request to read from memory that can be satisfied from the cache without using main memory. Cache Miss: a request to read from memory that cannot be satisfied from the cache, for which main memory has to be consulted.
Locality Principle
PRINCIPLE OF LOCALITY: the tendency to reference data items that are near other recently referenced data items, or that were recently referenced themselves. TEMPORAL LOCALITY: a memory location that is referenced once is likely to be referenced multiple times in the near future. SPATIAL LOCALITY: when a memory location is referenced, the program is likely to reference a nearby memory location in the near future.
Locality Principle
Let
c = cache access time
m = main memory access time
h = hit ratio (the fraction of all references that can be satisfied out of the cache)
miss ratio = 1 - h
Average memory access time = c + (1 - h) m
h = 1: no main memory references; h = 0: all references go to main memory
Example:
Suppose that a word is read k times in a short interval.
First reference: main memory; the other k - 1 references: cache, so
h = (k - 1) / k
Mean access time = c + m / k
Cache Memory
Main memory and caches are divided into fixed-size blocks. Cache lines are the blocks inside the cache. On a cache miss, an entire cache line is loaded into the cache from memory. Example:
A 64 KB cache can be divided into 1K lines of 64 bytes, 2K lines of 32 bytes, etc.
Unified cache
Instructions and data use the same cache
Split cache
Instructions in one cache and data in another
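The line-count arithmetic in the 64 KB example above can be sketched as follows (the helper name is mine; the sizes are from the slide):

```python
# Number of cache lines = cache size / line size.
def num_lines(cache_bytes, line_bytes):
    assert cache_bytes % line_bytes == 0  # line size must divide cache size
    return cache_bytes // line_bytes

cache = 64 * 1024            # 64 KB cache, as in the example
print(num_lines(cache, 64))  # 1024 (1K lines of 64 bytes)
print(num_lines(cache, 32))  # 2048 (2K lines of 32 bytes)
```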
A system with three levels of cache
Pentium 4 Block Diagram
Replacement Algorithm
Optimal Replacement: replace the block that is no longer needed in the future. If all blocks currently in the cache will be used again, replace the one that will not be used for the longest time. Random selection: replace a randomly selected block among all blocks currently in the cache.
Replacement Algorithm
FIFO (first-in first-out): replace the block that has been in the cache for the longest time. LRU (least recently used): replace the block that has not been used for the longest time. LFU (least frequently used): replace the block that has been used the least number of times.
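FIFO and LRU can give different miss counts on the same reference string. A minimal simulation sketch (the reference string is an invented example):

```python
from collections import OrderedDict, deque

def fifo_misses(refs, capacity):
    """Count misses with FIFO replacement: evict the block that has
    been resident longest, regardless of how recently it was used."""
    cache, order, misses = set(), deque(), 0
    for b in refs:
        if b not in cache:
            misses += 1
            if len(cache) == capacity:
                cache.discard(order.popleft())  # evict oldest arrival
            cache.add(b)
            order.append(b)
    return misses

def lru_misses(refs, capacity):
    """Count misses with LRU replacement: evict the block that has
    not been used for the longest time."""
    cache, misses = OrderedDict(), 0
    for b in refs:
        if b in cache:
            cache.move_to_end(b)           # mark as most recently used
        else:
            misses += 1
            if len(cache) == capacity:
                cache.popitem(last=False)  # evict least recently used
            cache[b] = True
    return misses

refs = [1, 2, 3, 1, 4, 1, 2]
print(fifo_misses(refs, 3))  # 6: FIFO evicts block 1 even though it is hot
print(lru_misses(refs, 3))   # 5: LRU keeps the recently used block 1
```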
Cache Memory Placement Policy
Three commonly used methods translate main memory addresses to cache memory addresses:
Associative Mapped Cache
Direct-Mapped Cache
Set-Associative Mapped Cache
The choice of cache mapping scheme affects cost and performance, and there is no single best method that is appropriate for all situations.
Associative Mapping
Associative Mapping
A block in main memory can be mapped to any available (not already occupied) block in the cache. Advantage: flexibility; a main memory block can be mapped anywhere in the cache. Disadvantage: slow or expensive; a search through all the cache blocks is needed to check whether the address matches any of the tags.
Direct Mapping
Direct Mapping
To avoid the search through all cache blocks needed by associative mapping, this method allows only (# blocks in main memory) / (# blocks in cache memory) main memory blocks to be mapped to each cache block. Each entry (row) in the cache holds exactly one cache line from main memory. Example: with a 32-byte cache line size, a 64 KB cache holds 2K lines.
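With the 64 KB / 32-byte geometry above, a direct-mapped cache splits each address into tag, index, and offset fields. A sketch of that decomposition (the function and the sample address are mine, for illustration):

```python
# Address decomposition for a direct-mapped cache.
# Geometry from the slide's example: 64 KB cache, 32-byte lines -> 2048 lines.
LINE_BYTES = 32
NUM_LINES = (64 * 1024) // LINE_BYTES   # 2048

def split_address(addr):
    offset = addr % LINE_BYTES                  # byte within the line
    index = (addr // LINE_BYTES) % NUM_LINES    # which cache line it must use
    tag = addr // (LINE_BYTES * NUM_LINES)      # identifies the MM block
    return tag, index, offset

# Two addresses exactly one cache-size (64 KB) apart share the same
# index, so they compete for the same cache line (only tags differ):
print(split_address(0x12345))              # (1, 282, 5)
print(split_address(0x12345 + 64 * 1024))  # (2, 282, 5)
```

This collision is exactly the inflexibility discussed next: both addresses are forced into line 282, no matter how empty the rest of the cache is.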
Direct Mapping
Advantage: direct mapping is faster than associative mapping, as it avoids searching through all the cache tags for a match. Disadvantage: it lacks mapping flexibility. For example, if two main memory blocks mapped to the same cache block are needed repeatedly (e.g., in a loop), they will keep replacing each other, even though all other cache blocks may be available.
Set-Associative Mapping
Set-Associative Mapping
This is a trade-off between associative and direct mapping, where each address is mapped to a certain set of cache locations. The cache is broken into sets, where each set contains N cache lines, say 4. Each memory address is assigned a set, and can be cached in any one of those 4 locations within the set it is assigned to. In other words, within each set the cache is associative, hence the name.
Set Associative cache
The LRU (Least Recently Used) algorithm is typically used:
keep an ordering of each set of locations that could be accessed from a given memory location
whenever any of the present lines is accessed, update the list, making that entry the most recently accessed
when an entry must be replaced, the one at the least-recently-used end of the list is discarded
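The per-set bookkeeping above can be sketched as a small class (a hypothetical 4-way set; the class and tag values are mine, for illustration):

```python
# Per-set LRU bookkeeping for a 4-way set-associative cache.
# Each set keeps its resident tags ordered from least to most recently used.
class LruSet:
    def __init__(self, ways=4):
        self.ways = ways
        self.lines = []              # front = least recently used

    def access(self, tag):
        if tag in self.lines:
            self.lines.remove(tag)
            self.lines.append(tag)   # move to most-recently-used end
            return True              # hit
        if len(self.lines) == self.ways:
            self.lines.pop(0)        # discard entry at the LRU end
        self.lines.append(tag)
        return False                 # miss

s = LruSet(ways=4)
for t in [7, 8, 9, 7, 10, 11]:
    s.access(t)
print(s.lines)  # [9, 7, 10, 11]: tag 8 was least recently used, so it was evicted
```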
Load-Through and Store-Through
Load-Through : When the CPU needs to read a word from the memory, the block containing the word is brought from MM to CM, while at the same time the word is forwarded to the CPU.
Store-Through: if store-through is used, a word stored from the CPU to memory is written to both the cache (if the word is there) and main memory. By doing so, a cache block to be replaced can be overwritten by an incoming block without being saved to main memory.
Cache Write Methods
Words in a cache have so far been viewed simply as copies of words from main memory that are read from the cache to provide faster access. This viewpoint changes once the CPU writes. There are three possible write actions:
Write the result into the main memory Write the result into the cache Write the result into both main memory and cache memory
Cache Write Methods
Write Through: a cache architecture in which data is written to main memory at the same time as it is cached.
Write Back / Copy Back: the CPU performs the write only to the cache in case of a cache hit; if there is a cache miss, the CPU performs the write to main memory. When a write misses the cache:
Write Allocate: load the memory block into the cache and update the cache block.
No-Write Allocate: bypass the cache and write the word directly into main memory.
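The main-memory traffic difference between the two policies can be illustrated with a rough sketch (the counting model below is an assumption for illustration: it considers repeated stores to a single cached block before eviction):

```python
# Main-memory writes caused by repeated CPU stores to one cached block.
def memory_writes(stores_to_block, policy):
    if policy == "write-through":
        return stores_to_block  # every store is also written to main memory
    if policy == "write-back":
        return 1                # only the eventual eviction writes the block back
    raise ValueError(policy)

print(memory_writes(100, "write-through"))  # 100
print(memory_writes(100, "write-back"))     # 1
```

This is why write-back reduces bus traffic for write-heavy loops, at the cost of the cache holding the only up-to-date copy until eviction.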
Cache Evolution

Problem: external memory slower than the system bus.
Solution: add external cache using faster memory technology.
Processor on which the feature first appears: 386

Problem: increased processor speed results in the external bus becoming a bottleneck for cache access.
Solution: move the external cache on-chip, operating at the same speed as the processor.
Processor on which the feature first appears: 486

Problem: the internal cache is rather small, due to limited space on the chip.
Solution: add an external L2 cache using faster technology than main memory.
Processor on which the feature first appears: 486
Cache Evolution (continued)

Problem: increased processor speed results in the external bus becoming a bottleneck for L2 cache access.
Solution: move the L2 cache onto the processor chip.
Processor on which the feature first appears: Pentium II
Solution: create a separate back-side bus (BSB) that runs at a higher speed than the main (front-side) external bus; the BSB is dedicated to the L2 cache.
Processor on which the feature first appears: Pentium Pro

Problem: some applications deal with massive databases and must have rapid access to large amounts of data; the on-chip caches are too small.
Solution: add an external L3 cache.
Processor on which the feature first appears: Pentium III
Solution: move the L3 cache on-chip.
Processor on which the feature first appears: Pentium 4
Comparison of Cache Sizes

Processor (Type, Year of Introduction): L1 cache / L2 cache / L3 cache
IBM 360/85 (Mainframe, 1968): L1 16 to 32 KB
PDP-11/70 (Minicomputer, 1975): L1 1 KB
VAX 11/780 (Minicomputer, 1978): L1 16 KB
IBM 3033 (Mainframe, 1978): L1 64 KB
IBM 3090 (Mainframe, 1985): L1 128 to 256 KB
Intel 80486 (PC, 1989): L1 8 KB
Pentium (PC, 1993): L1 8 KB/8 KB, L2 256 to 512 KB
PowerPC 601 (PC, 1993): L1 32 KB
PowerPC 620 (PC, 1996): L1 32 KB/32 KB
IBM S/390 G4 (Mainframe, 1997): L1 32 KB, L2 256 KB, L3 2 MB
PowerPC G4 (PC/server, 1999): L1 32 KB/32 KB, L2 256 KB to 1 MB, L3 2 MB
IBM S/390 G6 (Mainframe, 1999): L1 256 KB, L2 8 MB
Pentium 4 (PC/server, 2000): L1 8 KB/8 KB, L2 256 KB
IBM SP (High-end server, 2000): L1 64 KB/32 KB, L2 8 MB
CRAY MTAb (Supercomputer, 2000): L1 8 KB, L2 2 MB
Itanium (PC/server, 2001): L1 16 KB/16 KB, L2 96 KB, L3 4 MB
SGI Origin 2001 (High-end server, 2001): L1 32 KB/32 KB, L2 4 MB
Itanium 2 (PC/server, 2002): L1 32 KB, L2 256 KB, L3 6 MB
IBM POWER5 (High-end server, 2003): L1 64 KB, L2 1.9 MB, L3 36 MB
CRAY XD-1 (Supercomputer, 2004): L1 64 KB/64 KB, L2 1 MB