Unit 5
Memory and I/O System
The Memory System
Overview
Memory technology, organization of the main memory, cache memory, associative memory, and the virtual memory mechanism.
Some Basic Concepts
Memory Types
Characteristics of Memory Technologies
Big-endian and Little-endian
The maximum size of the memory that can be used in any computer is determined by the addressing scheme.
16-bit addresses: 2^16 = 64K memory locations
Most modern computers are byte addressable.
[Figure: byte-address assignments within 32-bit words. (a) Big-endian assignment: byte 0 is the most significant byte of word 0, so word 0 holds bytes 0, 1, 2, 3 from most to least significant. (b) Little-endian assignment: byte 0 is the least significant byte of word 0, so word 0 holds bytes 3, 2, 1, 0 from most to least significant.]
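The two assignments can be illustrated with Python's integer-to-bytes conversion (an illustrative sketch; the word value 0x12345678 is an arbitrary example, not from the text):

```python
word = 0x12345678  # an arbitrary 32-bit word

# Big-endian: the most significant byte gets the lowest byte address.
big = word.to_bytes(4, byteorder="big")
# Little-endian: the least significant byte gets the lowest byte address.
little = word.to_bytes(4, byteorder="little")

print(big.hex())     # 12345678
print(little.hex())  # 78563412
```

The byte sequences are mirror images of each other, which is exactly the difference between parts (a) and (b) of the figure.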
Traditional Architecture
[Figure: the processor's MAR drives a k-bit address bus and its MDR an n-bit data bus to a memory of up to 2^k addressable locations (word length = n bits), with control lines (R/W, MFC, etc.) between processor and memory.]
Figure 5.1. Connection of the memory to the processor.
Basic Concepts
Block transfer: bulk data transfer.
Memory access time: the time from a Read request to MFC.
Memory cycle time: the minimum time between two memory operations (Reads or Writes).
RAM: any location can be accessed for a Read or Write operation in some fixed amount of time that is independent of the location's address.
Cache memory: used to reduce the effective memory access time.
Virtual memory, memory management unit: the physical memory address can differ from the address specified in the program.
Semiconductor RAM Memories
Types of Semiconductor Memory
Semiconductor memory falls into two families:
Read-Only Memory
1. Mask-programmable ROM
2. Field-programmable: fusible-link PROM, UV-light erasable PROM (EPROM), electrically alterable PROM (EAPROM), electrically erasable PROM (EEPROM)
Random Access Memory
1. Static RAM (SRAM)
2. Dynamic RAM (DRAM): Fast Page Mode DRAM (FPM DRAM), Extended Data Out DRAM (EDO RAM), Synchronous DRAM (SDRAM), Double Data Rate Synchronous DRAM (DDR SDRAM), Rambus DRAM (RDRAM)
Internal Organization of Memory Chips
[Figure 5.2. Organization of bit cells in a memory chip: an address decoder driven by A0-A3 selects one of the word lines W0-W15; each row of memory cells (flip-flops) connects through bit lines b7-b0 to Sense/Write circuits controlled by R/W and CS.]
A 16 x 8 organization stores 16 words of 8 bits each and needs 16 external connections: 4 address, 8 data, 2 control (R/W, CS), and 2 power/ground.
A 1K-cell chip organized as 128 x 8 needs 19 external connections (7 + 8 + 2 + 2); organized as 1K x 1 it needs only 15 (10 + 1 + 2 + 2).
Figure 5.2. Organization of bit cells in a memory chip.
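The pin-count arithmetic above follows a simple rule: address lines + data lines + 2 control + 2 power/ground. A small sketch (the function name is ours, not from the text):

```python
import math

def external_connections(words, bits_per_word):
    """Pin count for a simple memory chip, as counted in the text:
    address lines + data lines + 2 control (R/W, CS) + 2 power/ground."""
    address_lines = math.ceil(math.log2(words))
    return address_lines + bits_per_word + 2 + 2

print(external_connections(16, 8))    # 16 x 8:  4 + 8 + 2 + 2 = 16
print(external_connections(128, 8))   # 128 x 8: 7 + 8 + 2 + 2 = 19
print(external_connections(1024, 1))  # 1K x 1: 10 + 1 + 2 + 2 = 15
```

This shows why the 1K x 1 organization is pin-cheaper than 128 x 8 for the same number of cells: fewer data lines more than offset the extra address lines.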
A Memory Chip
[Figure: a 10-bit address is split into a 5-bit row address (decoded by a 5-bit decoder to select one of 32 word lines W0-W31 of a 32 x 32 memory cell array) and a 5-bit column address (driving a 32-to-1 output multiplexer and input demultiplexer); Sense/Write circuitry is controlled by R/W and CS, with a single data input/output line.]
Figure 5.3. Organization of a 1K x 1 memory chip.
Static Memories
The circuits are capable of retaining their state as long as power is applied.
[Figure: two cross-coupled inverters store the bit; transistors T1 and T2 connect the cell to the bit lines b and b' when the word line is activated.]
Figure 5.4. A static RAM cell.
Static Memories
CMOS cell: low power consumption
[Figure: transistors T3-T6 form the cross-coupled inverters between Vsupply and ground, storing complementary values at nodes X and Y; transistors T1 and T2 connect the cell to bit lines b and b' under word-line control.]
Figure 5.5. An example of a CMOS memory cell.
Asynchronous DRAMs
Static RAMs are fast, but they occupy more chip area and are more expensive. Dynamic RAMs (DRAMs) are cheap and area-efficient, but they cannot retain their state indefinitely, so they must be refreshed periodically.
[Figure: a transistor T connects a storage capacitor C to the bit line under word-line control.]
Figure 5.6. A single-transistor dynamic memory cell.
A Dynamic Memory Chip
[Figure: the multiplexed address lines A20-9 / A8-0 feed a row address latch (strobed by RAS, the Row Address Strobe) and a column address latch (strobed by CAS, the Column Address Strobe); the row decoder selects one row of a 4096 x (512 x 8) cell array, and the Sense/Write circuits and column decoder select 8 bits for the data lines D7-D0, under CS and R/W control.]
Figure 5.7. Internal organization of a 2M x 8 dynamic memory chip.
Fast Page Mode
When the DRAM shown above is accessed, the contents of all 4096 cells in the selected row are sensed, but only 8 bits are placed on the data lines D7-0, as selected by A8-0. Fast page mode makes it possible to access the other bytes in the same row without having to reselect the row: a latch is added at the output of the sense amplifier in each column. This is good for bulk transfers.
Synchronous DRAMs
The operations of SDRAM are controlled by a clock signal.
[Figure: a mode register and timing-control block, driven by the Clock, RAS, CAS, R/W, and CS inputs, coordinates a refresh counter, a row address latch, and a column address counter fed from the multiplexed row/column address lines; the row and column decoders surround the cell array, and Read/Write circuits and latches connect to data input and output registers on the Data lines.]
Figure 5.8. Synchronous DRAM.
Synchronous DRAMs
No CAS pulses are needed in burst operation. Refresh circuits are included (a refresh every 64 ms). Clock frequencies exceed 100 MHz, e.g., Intel PC100 and PC133.
Latency and Bandwidth
The speed and efficiency of data transfers among memory, processor, and disk have a large impact on the performance of a computer system.
Memory latency: the amount of time it takes to transfer a word of data to or from the memory.
Memory bandwidth: the number of bits or bytes that can be transferred in one second; it determines how much time is needed to transfer an entire block of data. Bandwidth is not determined solely by the memory: it is the product of the rate at which data are transferred (and accessed) and the width of the data bus.
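The product rule above can be made concrete with a small sketch (the 133 MHz / 64-bit figures are a hypothetical example, not from the text):

```python
def bandwidth_bytes_per_sec(transfer_rate_hz, bus_width_bits):
    """Bandwidth = transfer rate x data-bus width, as stated above."""
    return transfer_rate_hz * bus_width_bits // 8

# Hypothetical example: a 133 MHz bus that is 64 bits wide.
bw = bandwidth_bytes_per_sec(133_000_000, 64)
print(bw)              # 1064000000 bytes/s, i.e. roughly 1 GB/s

# Transfers needed to move a 4 KB block over that 64-bit bus:
print(4096 * 8 // 64)  # 512
```

Note that this is peak bandwidth; the latency of the first access in a block is paid on top of it.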
DDR SDRAM
Double-Data-Rate SDRAM. Standard SDRAM performs all actions on the rising edge of the clock signal. DDR SDRAM accesses the cell array in the same way, but transfers data on both edges of the clock, doubling the transfer rate. The cell array is organized in two banks, each of which can be accessed separately. DDR SDRAMs and standard SDRAMs are most efficiently used in applications where block transfers are prevalent.
Structures of Larger Memories
[Figure: 21-bit addresses A0-A20 are split into a 19-bit internal chip address and 2 high-order bits feeding a 2-bit decoder that generates the chip-select signals; four 512K x 8 chips in each selected row supply D31-24, D23-16, D15-8, and D7-0, each chip receiving the 19-bit address and providing 8-bit data input/output.]
Figure 5.10. Organization of a 2M x 32 memory module using 512K x 8 static memory chips.
Memory System Considerations
The choice of a RAM chip for a given application depends on several factors: cost, speed, power, and size. SRAMs are faster but more expensive and less dense; they are used as cache memories. DRAMs are slower but cheaper and denser; they are used as main memories.
Memory Controller
[Figure: the processor sends Address, R/W, Request, and Clock signals to a memory controller, which generates the multiplexed Row/Column address and the RAS, CAS, R/W, CS, and Clock signals for the memory; data pass directly between processor and memory.]
Figure 5.11. Use of a memory controller.
Read-Only Memories
Read-Only-Memory
ROMs are non-volatile: they retain their contents when power is removed.
PROM: programmable ROM.
EPROM: erasable (by UV light), reprogrammable ROM.
EEPROM: can be programmed and erased electrically.
[Figure: a transistor T connects the bit line to ground at point P; the connection is left open to store a 1 and connected to store a 0.]
Figure 5.12. A ROM cell.
Flash Memory
Flash memory is similar to EEPROM, with one difference: it is only possible to write an entire block of cells, not a single cell. Its low power consumption makes it attractive for portable equipment. Typical implementations of flash modules are flash cards and flash drives.
Speed, Size and Cost
Storage devices form a hierarchy: smaller, faster, and costlier (per byte) devices toward the top; larger, slower, and cheaper (per byte) devices toward the bottom.
L0: CPU registers hold words retrieved from the L1 cache.
L1: on-chip L1 cache (SRAM) holds cache lines retrieved from the L2 cache.
L2: off-chip L2 cache (SRAM) holds cache lines retrieved from main memory.
L3: main memory (DRAM) holds disk blocks retrieved from local disks.
L4: local secondary storage (local disks) holds files retrieved from disks on remote network servers.
L5: remote secondary storage (distributed file systems, Web servers).
Cache Memories
Cache
[Figure: the cache sits between the processor and the main memory.]
Figure 5.14. Use of a cache memory.
Key ideas: replacement algorithm; hit/miss; write-through vs. write-back; load-through.
Cache Memory
The effectiveness of the cache mechanism is based on a property of computer programs called locality of reference: references to memory at any given time interval tend to be confined within localized areas. Analysis of programs shows that most of their execution time is spent in routines in which instructions are executed repeatedly. These may be loops, nested loops, or a few procedures that call each other. Thus many instructions in localized areas of a program are executed repeatedly during some time period, while the remainder of the program is accessed relatively infrequently.
Locality of Reference
Locality of reference is manifested in two ways:
1. Temporal: a recently executed instruction is likely to be executed again very soon. Information that will be used in the near future is likely to be in use already (e.g., reuse of instructions and data in loops).
2. Spatial: instructions in close proximity to a recently executed instruction are also likely to be executed soon. If a word is accessed, adjacent words are likely to be accessed soon (e.g., related data items such as arrays are usually stored together, and instructions are executed sequentially).
Locality of Reference (contd..)
If the active segments of a program can be placed in a fast (cache) memory, total execution time can be reduced significantly. The temporal aspect of locality of reference suggests that whenever an item of information (instruction or data) is first needed, it should be brought into the cache. The spatial aspect suggests that instead of bringing just one item from main memory into the cache, it is wise to bring several items that reside at adjacent addresses as well (i.e., a block of information).
Locality of Reference (contd..)
Example:
sum = 0;
for (i = 0; i < n; i++)
    sum = sum + a[i];
return sum;
Explanation, step by step:
1. sum = 0
2. i = 0
3. sum = sum + a[i]
4. i = i + 1
5. Check the value of i
6. If i < n, go to step 3
7. Otherwise, return sum
Locality of Reference example
Locality in the example above:
Data: the array elements a[i] are referenced in succession (spatial locality); sum is referenced on each iteration (temporal locality).
Instructions: the instructions are referenced in sequence (spatial locality), and the loop is cycled through repeatedly (temporal locality).
Principles of cache
The following diagram is used to explain the cache from the access point of view:
[Diagram: the CPU is connected to a 512 x 12 cache memory, which is connected to a 32K x 12 main memory.]
Principles of cache (contd..)
When a read request is received from the CPU, the contents of a block of memory words containing the specified location are transferred into the cache. When the program references any of the locations in this block, the contents are read from the cache. The number of blocks in the cache is smaller than the number of blocks in main memory; the correspondence between main memory blocks and those in the cache is specified by a mapping function.
Principles of cache (contd..)
Assume the cache is full and a memory word not in the cache is referenced. The control hardware must decide which block to remove from the cache to create space for the new block containing the referenced word. The collection of rules for making this decision is called the replacement algorithm.
Read/ Write operations on cache
Cache Hit Operation
The CPU issues Read/Write requests using addresses that refer to locations in main memory. The cache control circuitry determines whether the requested word currently exists in the cache. If it does, the Read/Write operation is performed on the appropriate location in the cache (a Read/Write hit).
Read/Write operations on cache in case of Hit
In a Read operation, main memory is not involved. In a Write operation, two things can happen:
1. The cache and main memory locations are updated simultaneously (Write-Through), or
2. Only the cache location is updated and marked with a Dirty or Modified bit; the main memory location is updated later, when the cache block is removed (Write-Back or Copy-Back).
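The write-back bookkeeping can be sketched as follows (a minimal illustration; the class and its dictionaries are our own simplification, not a hardware description):

```python
class WriteBackCache:
    """Sketch of write-back policy: a write updates only the cache and
    sets the dirty bit; main memory is updated only on eviction."""
    def __init__(self):
        self.cache = {}    # address -> (value, dirty bit)
        self.memory = {}   # simulated main memory

    def write(self, addr, value):
        self.cache[addr] = (value, True)   # mark dirty; memory untouched

    def evict(self, addr):
        value, dirty = self.cache.pop(addr)
        if dirty:                          # copy back only if modified
            self.memory[addr] = value

c = WriteBackCache()
c.write(0x10, 99)
print(c.memory.get(0x10))  # None: memory not yet updated
c.evict(0x10)
print(c.memory.get(0x10))  # 99: written back when the block is removed
```

Under write-through, by contrast, the `write` step would update `self.memory` immediately as well.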
Read/Write operations on cache in case of Miss
Read Operation
When the addressed word is not in the cache, a Read miss occurs. There are two ways this can be dealt with:
1. The entire block of words containing the requested word is copied from main memory to the cache, and the requested word is then forwarded to the CPU from the cache (Load-Through), or
2. The requested word is sent from memory to the CPU first, and the cache is updated afterwards (Early Restart).
Read/Write operations on cache in case of Miss (contd..)
Write Operation
If the addressed word is not in the cache, a Write miss occurs. If the write-through protocol is used, the information is written directly into main memory. In the write-back protocol, the block containing the word is first brought into the cache, and the desired word is then overwritten.
Mapping Functions
The correspondence between main memory blocks and those in the cache is specified by a memory mapping function. There are three mapping techniques:
1. Direct Mapping
2. Associative Mapping
3. Set-Associative Mapping
Direct Mapping
Example: consider a cache consisting of 128 blocks of 16 words each (128 x 16 = 2048 = 2K words). Main memory is addressed by a 16-bit address, so it has 64K words, i.e., 4096 (4K) blocks of 16 words each.
The 16-bit main memory address is divided into a 5-bit tag, a 7-bit block field, and a 4-bit word field.
Direct Mapping
Block j of main memory maps onto block (j modulo 128) of the cache.
[Figure: main memory blocks 0, 128, 256, ... map to cache block 0; blocks 1, 129, 257, ... map to cache block 1; and so on up to block 4095. Each cache block stores a tag alongside its data.]
Figure 5.15. Direct-mapped cache.
Word field (4 bits): selects one of the 16 = 2^4 words in a block.
Block field (7 bits): points to a particular block in the cache (128 = 2^7).
Tag field (5 bits): compared with the tag bits stored with the cache block, to identify which of the 32 (4096/128) memory blocks that map to this location is currently resident.
Direct Mapping
The main memory address is interpreted as Tag (5) | Block (7) | Word (4).
Example: 11101 1111111 1100
Tag: 11101. Block: 1111111 = 127, the 127th block of the cache. Word: 1100 = 12, the 12th word of that block.
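The field extraction above can be sketched with a few shifts and masks (the function name is ours; the field widths are those of the example):

```python
def split_direct(addr):
    """Split a 16-bit address into the (tag, block, word) fields of the
    direct-mapped cache above: 5 tag bits, 7 block bits, 4 word bits."""
    word  = addr & 0xF           # low 4 bits
    block = (addr >> 4) & 0x7F   # next 7 bits
    tag   = (addr >> 11) & 0x1F  # top 5 bits
    return tag, block, word

# The example address 11101 1111111 1100:
addr = 0b11101_1111111_1100
print(split_direct(addr))  # (29, 127, 12): tag 11101, block 127, word 12
```

The same idiom works for any field layout; only the shift amounts and masks change.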
Associative Mapping
[Figure: any main memory block (0 to 4095) can be placed in any cache block (0 to 127); each cache block stores a 12-bit tag.]
Figure 5.16. Associative-mapped cache.
Word field (4 bits): selects one of the 16 = 2^4 words in a block.
Tag field (12 bits): identifies which of the 4096 = 2^12 memory blocks is resident in a cache block.
Associative Mapping
The main memory address is interpreted as Tag (12) | Word (4).
Example: 111011111111 1100
Tag: 111011111111. Word: 1100 = 12, the 12th word of the block.
Set-Associative Mapping
[Figure: the 128 cache blocks are grouped into 64 sets of two blocks each; main memory blocks 0, 64, 128, ... map to set 0, blocks 1, 65, 129, ... map to set 1, and so on up to block 4095. Each cache block stores a tag.]
Figure 5.17. Set-associative-mapped cache with two blocks per set.
Word field (4 bits): selects one of the 16 = 2^4 words in a block.
Set field (6 bits): points to a particular set in the cache (128/2 = 64 = 2^6).
Tag field (6 bits): checked to see whether the desired block is present in the set (4096/64 = 2^6 memory blocks map to each set).
The main memory address is interpreted as Tag (6) | Set (6) | Word (4).
Set-Associative Mapping
The main memory address is interpreted as Tag (6) | Set (6) | Word (4).
Example: 111011 111111 1100
Tag: 111011. Set: 111111 = 63, the 63rd set of the cache. Word: 1100 = 12, the 12th word of the block within that set.
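The set-associative field split follows the same shift-and-mask pattern as the direct-mapped case (function name ours; widths from the example above):

```python
def split_set_assoc(addr):
    """Split a 16-bit address into (tag, set, word) for the two-way
    set-associative cache above: 6 tag bits, 6 set bits, 4 word bits."""
    word = addr & 0xF           # low 4 bits
    s    = (addr >> 4) & 0x3F   # next 6 bits select the set
    tag  = (addr >> 10) & 0x3F  # top 6 bits
    return tag, s, word

# The example address 111011 111111 1100:
addr = 0b111011_111111_1100
print(split_set_assoc(addr))  # (59, 63, 12): tag 111011, set 63, word 12
```

Note how the 7-bit block field of direct mapping splits into a 6-bit set field plus one extra tag bit once blocks are paired into two-way sets.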
Replacement Policies
When the cache is full and new data must be brought in, a decision must be made as to which data to remove from the cache. The guideline for making this decision is called the replacement policy. The replacement policy depends on the mapping: in direct mapping there is no choice, since each memory block can be placed in only one cache block.
Replacement Policies (contd..)
In associative and set-associative mapping, common policies are:
Random replacement.
First-In First-Out (FIFO): the block chosen is the one that has been in the cache (or set) longest. A simple implementation replaces blocks in round-robin order whenever a new word is requested from memory, which constitutes a FIFO policy.
Least Recently Used (LRU): the block chosen is the one that has been least recently used by the CPU.
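The LRU policy for one set can be sketched as follows (a minimal model using an ordered dictionary; real hardware uses counters or status bits, and the class name is ours):

```python
from collections import OrderedDict

class LRUSet:
    """One cache set managed with least-recently-used replacement."""
    def __init__(self, ways):
        self.ways = ways
        self.blocks = OrderedDict()  # block tag -> resident (order = recency)

    def access(self, tag):
        if tag in self.blocks:             # hit: mark most recently used
            self.blocks.move_to_end(tag)
            return "hit"
        if len(self.blocks) == self.ways:  # miss in a full set: evict LRU
            self.blocks.popitem(last=False)
        self.blocks[tag] = True
        return "miss"

s = LRUSet(2)  # a two-way set, as in Figure 5.17
print([s.access(t) for t in [1, 2, 1, 3, 2]])
# ['miss', 'miss', 'hit', 'miss', 'miss']: block 2 was LRU when 3 arrived
```

Replacing `move_to_end` with a no-op would turn this into FIFO, since recency would then no longer affect the eviction order.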
Associative Memory
Associative Memory
Many data-processing applications require searching for items in a table stored in memory. For example, an account number may be looked up in a file to determine the holder's name and account status. Usually all items are stored in memory where they can be addressed in sequence. The search procedure then consists of choosing a sequence of addresses, reading the content of memory at each address, and comparing the information read with the item being searched for, until a match occurs.
Associative Memory (contd..)
The number of accesses to memory then depends on the location of the item and the efficiency of the search algorithm. The time required to find an item stored in memory can be reduced considerably if the stored data can be identified for access by the content of the data itself rather than by an address. A memory unit accessed by content is called an associative memory or content-addressable memory (CAM).
Associative Memory (contd..)
This type of memory is accessed simultaneously and in parallel on the basis of data content rather than by a specific address or location. When a word is written into an associative memory, no address is given: the memory is capable of finding an empty location in which to store the word. When a word is to be read, the content of the word (or part of it) is specified; the memory locates all words that match the specified content and marks them for reading.
Associative Memory (contd..)
Associative memory is uniquely suited to parallel searches by data association. It is more expensive than a random-access memory because each cell must have storage capability as well as logic circuitry for matching its content against an external argument. Associative memories are therefore used in applications where the search time is critical and must be very short.
Hardware Organization
[Block diagram: an argument register A and a key register K, each n bits wide, feed an associative memory array and logic of m words x n bits per word; an m-bit match register M records which words match. Input, Read, and Write lines control the array, and matched words appear at the output.]
Operation
The unit consists of a memory array and logic for m words of n bits each. The argument register A and key register K each have n bits, one for each bit of a word. The match register M has m bits, one for each memory word. Each word in memory is compared in parallel with the content of the argument register; the words that match set a corresponding bit in the match register.
Operation (contd..)
After the matching process, the bits that are set in the match register indicate that their corresponding words have been matched. Reading is done by sequential access to those words whose corresponding bits in the match register have been set. The key register provides a mask for choosing a particular field or key in the argument word: a key bit is set to 0 to mask that position and to 1 otherwise.
Operation (contd..)
Example of matching (only the three leftmost bits are unmasked by K):
A       101 111100
K       111 000000
Word 1  100 111100   no match
Word 2  101 000001   match
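The masked comparison above can be sketched in a few lines (an illustrative model of the match rule, not a circuit description; the function name is ours):

```python
def matches(word, argument, key):
    """A word matches when it equals the argument in every unmasked
    bit position, i.e. wherever the key bit is 1."""
    return (word & key) == (argument & key)

A = 0b101111100
K = 0b111000000   # compare only the leftmost 3 bits
word1 = 0b100111100
word2 = 0b101000001
print(matches(word1, A, K))  # False: leftmost bits 100 != 101
print(matches(word2, A, K))  # True:  leftmost bits 101 == 101
```

In the hardware, this comparison happens for all m words simultaneously, setting the corresponding match-register bits in parallel.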
Associative memory of m words, n cells per word
[Figure: argument bits A1..Aj..An and key bits K1..Kj..Kn feed every column of the array. Word 1 consists of cells C11..C1j..C1n, word i of cells Ci1..Cij..Cin, and word m of cells Cm1..Cmj..Cmn; the columns are labelled Bit 1 to Bit n. Each word i drives its match-register bit Mi.]
Operation
The cells in the array are denoted by the letter C with two subscripts: the first gives the word number and the second the bit position in the word. For example, Cij is the cell for bit j of word i.
A bit Aj of the argument register is compared with all the bits in column j of the array, provided that Kj = 1. This is done for all columns j = 1, 2, ..., n. If a match occurs between all unmasked bits of the argument and the bits of word i, the corresponding bit Mi in the match register is set to 1. If one or more unmasked bits do not match, Mi is cleared to 0.
Internal organization of a typical cell Cij
[Figure: the cell contains a flip-flop Fij with set/reset inputs driven by the Input and Write lines, a Read line gating the stored bit to the Output, and match logic that compares Fij with argument bit Aj under key bit Kj and contributes to the match bit Mi.]
Internal Organization of one cell (contd..)
Each cell consists of a flip-flop storage element Fij and the circuits for reading, writing, and matching. The input bit is transferred into the storage element during a write operation, and the stored bit is read out during a read operation. The match logic compares the content of the storage element with the corresponding unmasked bit of the argument and provides an output to the decision logic that sets the bit Mi.
MATCH LOGIC
[Figure: for word i, each bit position j contributes a term built from Kj, Aj, and the stored values Fij and F'ij; the terms for j = 1, ..., n are combined to produce the match output Mi.]
Read Operation
If more than one word in memory matches the unmasked argument field, all the matched words will have 1s in the corresponding bit positions of the match register. In a read operation, all matched words are read in sequence by applying a read signal to each word line whose corresponding Mi bit is 1. In applications where no two identical items are stored in the memory, at most one word can match, in which case the Mi output can be used directly as the read signal for the corresponding word.
Write Operation
A write can take two different forms:
1. The entire memory may be loaded with new information.
2. Unwanted words may be deleted and new words inserted.
For case 1, writing is done by addressing each location in sequence. This makes the unit a random-access memory for writing and a content-addressable memory for reading. The number of address lines needed for decoding is d, where m = 2^d and m is the number of words.
Write Operation (contd..)
For case 2, deleting unwanted words and inserting new ones, a tag register is used with as many bits as there are words in memory. For every active (valid) word in memory, the corresponding tag bit is set to 1; when a word is deleted, its tag bit is reset to 0. A new word is stored by scanning the tag register until the first 0 bit is encountered; after the word is stored there, that bit is set to 1.
Virtual Memory
Virtual Memory
In the early days, memory was expensive and therefore small, and programmers used secondary storage for overlaying. Programmers were responsible for breaking programs into overlays, deciding where to keep them in secondary memory, and arranging the transfer of overlays between main memory and secondary memory. In 1961, a group at Manchester University proposed a method for performing the overlay process automatically, which gave rise to the concept of virtual memory used today.
Virtual Memory - Background
Virtual memory separates the concepts of address space and memory locations: programs reference instructions and data independently of the available physical memory. The addresses issued by the processor for instructions or data are called virtual or logical addresses. Virtual addresses are translated into physical addresses by a combination of hardware and software components.
Types of Memory
Real memory: main memory.
Virtual memory: backed by memory on disk; it allows effective multiprogramming and relieves the user of the tight constraints of main memory.
Overview
[Figure: the processor issues a virtual address to the MMU (Memory Management Unit), which produces the physical address used to access the cache and main memory; data move between main memory and disk storage by DMA transfer.]
Figure 5.26. Virtual memory organization.
Address Space and Memory Space
The address used by a programmer is called a virtual address, and the set of such addresses is the address space. An address in main memory is called a location or physical address, and the set of such locations is the memory space. In computers with virtual memory, the address space is allowed to be larger than the memory space.
Address Space and Memory Space
Address space: virtual (logical) addresses generated by programs.
Memory space: actual main memory (physical) addresses.
A mapping translates each virtual address into a physical address.
Address Translation
All programs and data are composed of fixed-length units called pages, each of which consists of a block of words that occupy contiguous locations in main memory. Pages should be neither too small nor too large. The virtual memory mechanism bridges the size and speed gaps between main memory and secondary storage, similar to the way a cache bridges processor and main memory.
Memory Table for mapping a virtual address
The CPU references instructions and data with virtual addresses, but this information must be fetched from physical memory for access. A table is therefore needed to map each virtual address to a physical address (a dynamic operation). This table may be kept in a separate memory, in main memory, or in an associative memory.
Paging
Each process has its own page table. Each page table entry contains the frame number of the corresponding page in main memory. One bit is needed to indicate whether the page is in main memory or not, and another bit indicates whether the page has been modified, which helps decide whether it must be copied back when replaced.
Paging
Virtual address: page number | offset.
Page table entry: control bits (present, modified, other) | frame number.
Address Translation
[Figure: virtual-memory address translation. The virtual address from the processor is split into a virtual page number and an offset. The page table base register plus the virtual page number gives the address of the page table entry, which supplies control bits and the page frame number; the page frame concatenated with the offset forms the physical address in main memory.]
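The translation step can be sketched as follows (a toy model: the 4 KB page size and the page-table contents are hypothetical examples, not from the text):

```python
PAGE_SIZE = 4096                  # hypothetical 4 KB pages
page_table = {0: 5, 1: 2, 7: 9}   # virtual page number -> page frame (toy)

def translate(vaddr):
    """Virtual-to-physical translation as in the figure: the virtual page
    number indexes the page table; the offset is carried over unchanged."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    frame = page_table[vpn]        # a missing entry would be a page fault
    return frame * PAGE_SIZE + offset

print(hex(translate(0x1abc)))  # 0x2abc: virtual page 1 maps to frame 2
```

The offset passes through untouched because a page occupies contiguous locations in both the virtual and the physical address space.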
Organization of memory Mapping Table in a paged system
[Figure: organization of the memory mapping table in a paged system. The virtual address consists of a page number and a line number. The page number indexes a memory page table whose entries (000 to 111) each hold a presence bit and a main memory block number. For a present page, the block number read from the table is concatenated with the line number to form the contents of the main memory address register; main memory holds four blocks (Block 0 to Block 3), and the table entry itself passes through the memory table buffer register.]
Translation Lookaside Buffer
Each virtual memory reference can cause two physical memory accesses: one to fetch the page table entry and one to fetch the data. To overcome this problem, a high-speed cache is set up for page table entries, called a Translation Lookaside Buffer (TLB).
Translation Lookaside Buffer (contd..)
The TLB contains the page table entries that have been most recently used.
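The TLB's role in front of the page table can be sketched as follows (a minimal LRU model; the class name, sizes, and table contents are our own illustrative choices):

```python
from collections import OrderedDict

class TLB:
    """Sketch of a TLB: caches recent VPN-to-frame translations; on a
    miss, the page table is consulted and the TLB updated (LRU)."""
    def __init__(self, entries, page_table):
        self.entries = entries
        self.page_table = page_table
        self.map = OrderedDict()   # virtual page number -> frame

    def lookup(self, vpn):
        if vpn in self.map:                  # TLB hit
            self.map.move_to_end(vpn)
            return self.map[vpn], "hit"
        frame = self.page_table[vpn]         # TLB miss: walk the page table
        if len(self.map) == self.entries:
            self.map.popitem(last=False)     # evict least recently used
        self.map[vpn] = frame
        return frame, "miss"

tlb = TLB(2, {0: 8, 1: 3, 2: 6})  # toy 2-entry TLB, toy page table
print(tlb.lookup(0))  # (8, 'miss'): first reference walks the table
print(tlb.lookup(0))  # (8, 'hit'):  repeat reference served by the TLB
```

The hit case avoids the extra memory access for the page table entry, which is the whole point of the TLB.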
[Figure: the virtual address from the processor is split into a virtual page number and an offset. The virtual page number is compared with the virtual page numbers stored in the TLB. On a hit (=? yes), the associated control bits and page frame number are read out, and the page frame concatenated with the offset forms the physical address in main memory; on a miss (=? no), the page table in memory must be consulted.]
Figure 5.28. Use of an associative-mapped TLB.
Translation Lookaside Buffer (contd..)
Given a virtual address, the processor first examines the TLB. If the page table entry is present (a TLB hit), the frame number is retrieved and the physical address is formed. If the entry is not found in the TLB (a TLB miss), the page number is used to index the process's page table.
Translation Lookaside Buffer (contd..)
The page table entry first indicates whether the page is already in main memory. If it is not, a page fault is issued. The TLB is then updated to include the new page entry.