Memory System
Memory
Memory is an essential component of a computer system: it stores the binary instructions and data that the computer operates on. Memory is the place where the computer holds the programs and data that are currently in use.
No single technology is optimal for satisfying all the memory requirements of a computer system. Indeed, computer memory exhibits perhaps the widest range of type, technology, organization, performance and cost of any feature of a computer system.
The memory unit that communicates directly with the CPU is called main memory. Devices that provide backup storage are called auxiliary memory or secondary memory.
Location
• Processor memory: Memory, such as registers, that is included within the processor itself.
• Internal memory: Often termed main memory; it is internal to the computer and is accessed directly by the processor.
• External memory: Peripheral storage devices, such as disks and magnetic tape, that are accessible to the processor via I/O controllers.
Capacity
• Word size: The natural unit of organisation. Common word lengths are 8, 16 and 32 bits.
• Number of words: Capacity is expressed as a number of words or bytes.
Unit of Transfer
• Internal: For internal memory, the unit of transfer is equal to the number of data lines into and out of the memory module.
• External: For external memory, data are transferred in blocks, which are usually much larger than a word.
• Addressable unit
— The smallest location that can be uniquely addressed
— A word, for internal memory
— A cluster, on magnetic disks
Access Method
• Sequential access: Access must start at the beginning and read through in a specific linear sequence. This means the access time for a unit of data depends on its position (the record's location) and on the previous location accessed.
— e.g. tape
• Direct access: Individual blocks or records have a unique address based on physical location. Access is accomplished by jumping (direct access) to the general vicinity plus a sequential search to reach the final location.
— e.g. disk
• Random access: The time to access a given location is independent of the sequence of prior accesses and is constant. Thus any location can be selected at random and directly addressed and accessed.
— e.g. RAM
• Associative access: This is a random-access type of memory that enables one to make a comparison of desired bit locations within a word for a specified match, and to do this for all words simultaneously.
— e.g. cache
Performance
• Access time: For random-access memory, access time is the time it takes to perform a read or write operation, i.e. the time to address a memory location plus the time to read from or write to it. For non-random-access memory, it is the time needed to position the read/write mechanism at the desired location.
— Time between presenting the address and getting the valid data
• Memory cycle time: The minimum time between the start of one memory access and the start of the next.
Memory cycle time = access time + transient time (any additional time required before a second access can commence).
— Time may be required for the memory to "recover" before the next access
— Cycle time = access time + recovery time
• Transfer rate: The rate at which data can be transferred into or out of a memory unit.
— For random access, R = 1 / cycle time
— For non-random access, Tn = Ta + N / R, where Tn = average time to read or write N bits, Ta = average access time, N = number of bits, and R = transfer rate in bits per second (bps).
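A quick worked example of the non-random-access formula (the figures below are assumed for illustration, not taken from the text):

    # Worked example of Tn = Ta + N / R (all values hypothetical)
    Ta = 100e-9        # average access time: 100 ns
    R = 1e9            # transfer rate: 1 Gbps
    N = 4096           # number of bits to transfer
    Tn = Ta + N / R    # average time to read or write N bits
    print(f"Tn = {Tn * 1e9:.0f} ns")   # prints: Tn = 4196 ns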
Physical Types
• Semiconductor
— RAM
• Magnetic
— Disk & Tape
• Optical
— CD & DVD
• Others
— Bubble
— Hologram
Physical Characteristics
• Decay: Stored information may decay over time, resulting in data loss.
• Volatility: In volatile memory, information is lost when electrical power is switched off.
• Erasable: Whether the contents can be erased and rewritten.
• Power consumption: How much power the memory consumes.
Organization
• Physical arrangement of bits into words
• Not always obvious
- e.g. interleaved
CPU logic is usually faster than main memory access time, with the result that processing speed is limited primarily by the speed of main memory.
The cache is used for storing segments of programs currently being executed in the CPU and temporary data frequently needed in the present calculations.
The memory hierarchy system consists of all storage devices employed in a computer system, from slow but high-capacity auxiliary memory to relatively faster cache memory accessible to high-speed processing logic. The list below illustrates the memory hierarchy.
Hierarchy List
Registers
L1 Cache
L2 Cache
Main memory
Disk cache
Disk
Optical
Tape
Cache Memory
Locality of Reference: Programs access a small portion of memory repeatedly (temporal and spatial locality).
Trade-off Between Speed, Cost, and Capacity: Faster memory is smaller and more expensive, while larger memory is slower
and cheaper.
Data Movement: Information is moved between levels to balance performance and cost efficiently.
Types of ROM
Programmable ROM (PROM)
o It is non-volatile and may be written into only once. The writing process is
performed electrically and may be performed by a supplier or customer at
a time later than the original chip fabrication.
Erasable Programmable ROM (EPROM)
o It is read and written electrically. However, before a write operation, all the storage cells must be erased to the same initial state by exposing the packaged chip to ultraviolet (UV) radiation. Erasure is performed by shining an intense ultraviolet light through a window that is designed into the memory chip. Because EPROM is erased optically, it is more expensive than PROM, but it has the advantage of multiple-update capability.
Electrically Erasable programmable ROM (EEPROM)
o This is a read-mostly memory that can be written into at any time without erasing prior contents; only the byte or bytes addressed are updated. The write operation takes considerably longer than the read operation, on the order of several hundred microseconds per byte. The EEPROM combines the advantage of non-volatility with the flexibility of being updatable in place, using ordinary bus control, address and data lines. EEPROM is more expensive than EPROM and is also less dense, supporting fewer bits per chip.
Flash Memory
o Flash memory is also a semiconductor memory; it is termed "flash" because of the speed with which it can be reprogrammed. It is intermediate between EPROM and EEPROM in both cost and functionality. Like EEPROM, flash memory uses an electrical erasing technology. An entire flash memory can be erased in one or a few seconds, which is much faster than EPROM. In addition, it is possible to erase just blocks of memory rather than an entire chip. However, flash memory does not provide byte-level erasure: a section of memory cells is erased in a single action, or "flash".
External Memory
The devices that provide backup storage are called external memory or auxiliary memory. It includes serial-access types such as magnetic tape and random-access types such as magnetic disks.
Magnetic Tape
A magnetic tape is a strip of plastic coated with a magnetic recording medium. Data can be recorded and read as a sequence of characters through a read/write head. The tape can be stopped, started moving forward or in reverse, or rewound.
Data on tapes are structured as a number of parallel tracks running lengthwise. Earlier tape systems typically used nine tracks, which made it possible to store data one byte at a time, with an additional parity bit as the ninth track. The recording of data in this form is referred to as parallel recording.
Magnetic Disk
A magnetic disk is a circular plate constructed of metal or plastic and coated with magnetic material. Often both sides of the disk are used, and several disks may be stacked on one spindle, with a read/write head available for each surface. All disks rotate together at high speed. Bits are stored on the magnetized surface in spots along concentric circles called tracks. The tracks are commonly divided into sections called sectors. After the read/write head is positioned on the specified track, the system has to wait until the rotating disk brings the specified sector under the read/write head. Information transfer is very fast once the beginning of a sector has been reached. Disks that are permanently attached to the unit assembly and cannot be removed by the occasional user are called hard disks; a drive with a removable disk is called a floppy disk.
Optical Disk
The huge commercial success of the CD enabled the development of low-cost optical disk storage technology that has revolutionized computer data storage. The disk is formed from a resin such as polycarbonate. Digitally recorded information is imprinted as a series of microscopic pits on the surface of the polycarbonate. This is done with a finely focused, high-intensity laser. The pitted surface is then coated with a reflecting layer, usually aluminum or gold. The shiny surface is protected against dust and scratches by a top coat of acrylic.
Information is retrieved from a CD by a low-power laser. The intensity of the reflected laser light changes as it encounters a pit. Specifically, if the laser beam falls on a pit, which has a somewhat rough surface, the light scatters and a low intensity is reflected back to the sensor. The areas between pits are called lands; a land is a smooth surface that reflects back at a higher intensity. The change between pits and lands is detected by a photosensor and converted into a digital signal. The sensor tests the surface at regular intervals.
DVD-Technology
Multi-layer
Very high capacity (4.7 GB per layer)
Full length movie on single disk
Uses MPEG compression
Finally standardized (honest!)
Movies carry regional coding
Players only play correct region films
DVD-Writable
Loads of trouble with standards
First generation DVD drives may not read first generation DVD-W disks
First generation DVD drives may not read CD-RW disks
1. Latency
Latency is the delay between when a request for data is made and when the data is actually available for use. It is typically measured in nanoseconds (ns).
Types of Latency:
Access Latency: Time taken to access a particular memory location.
Read/Write Latency: Time between issuing a read/write command and receiving/sending data.
DRAM Latency: Time required to retrieve data from DRAM after a request.
Impact on Performance:
Lower latency means faster memory response, improving CPU performance.
High latency can slow down applications, especially in real-time systems and gaming.
2. Cycle Time
Cycle time is the minimum time between successive memory accesses. It is the time required to
complete one memory operation (read/write) and be ready for the next.
Relation Between Cycle Time and Access Time:
Cycle Time ≥ Access Time
If cycle time is significantly greater than access time, the memory remains idle between operations,
leading to inefficiencies.
Performance Considerations:
SRAM: Shorter cycle time, leading to higher performance.
DRAM: Longer cycle time due to refresh operations, making it slower.
3. Bandwidth
Memory bandwidth is the amount of data that can be transferred per unit of time. It is measured
in megabytes per second (MB/s) or gigabytes per second (GB/s).
Formula for Bandwidth:
Bandwidth = Data Transferred / Time, or
Bandwidth = Bus Width × Clock Speed × Data Rate (transfers per cycle)
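As a minimal numeric sketch of this formula (the bus width, clock speed and transfer count below are assumed DDR-style figures, not values from the text):

    # Bandwidth = bus width x clock speed x transfers per cycle
    bus_width_bits = 64        # hypothetical 64-bit data bus
    clock_hz = 800e6           # hypothetical 800 MHz memory clock
    transfers_per_cycle = 2    # DDR transfers data twice per cycle
    bandwidth_bps = bus_width_bits * clock_hz * transfers_per_cycle
    print(f"{bandwidth_bps / 8 / 1e9:.1f} GB/s")   # prints: 12.8 GB/s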
4. Memory Interleaving
Memory interleaving is a technique used to increase memory access speed by dividing memory into
multiple banks and accessing them in parallel.
Types of Interleaving:
Low-order Interleaving:
Consecutive memory addresses are assigned to different memory banks (see the sketch after this list).
Helps reduce latency for sequential memory access.
High-order Interleaving:
Groups addresses in large blocks before assigning them to banks.
Used in systems with fewer but larger memory accesses.
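The sketch below contrasts the two schemes for a hypothetical 4-bank memory (the bank count and bank size are assumptions for illustration):

    # Low-order vs. high-order interleaving over a toy 4-bank memory.
    NUM_BANKS = 4
    BANK_SIZE = 1024   # words per bank (assumed)

    def low_order(address):
        # Consecutive addresses fall in different banks.
        return address % NUM_BANKS, address // NUM_BANKS   # (bank, offset)

    def high_order(address):
        # A large contiguous block of addresses stays in one bank.
        return address // BANK_SIZE, address % BANK_SIZE   # (bank, offset)

    for addr in range(4):
        print(addr, low_order(addr), high_order(addr))
    # Low-order spreads addresses 0,1,2,3 across banks 0,1,2,3, so they
    # can be fetched in parallel; high-order keeps all four in bank 0.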
Advantages of Interleaving:
Improves memory throughput by overlapping memory accesses.
Reduces waiting time as the CPU can fetch data from multiple banks simultaneously.
Better CPU utilization as memory is accessed efficiently.
When the CPU needs to access memory, the cache is examined first. If the word is found in the cache, it is read from the cache; if it is not, main memory is accessed to read the word, and a block of words containing the one just accessed is then transferred from main memory to cache memory.
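A minimal sketch of this read flow, using an illustrative dict-based cache and list-based main memory (the names and the block size are assumptions, not the text's design):

    # On a hit, return the word from the cache; on a miss, first fetch the
    # whole block containing the word from main memory into the cache.
    def read_word(address, cache, main_memory, block_size=4):
        block = address // block_size
        if block not in cache:                       # cache miss
            start = block * block_size
            cache[block] = main_memory[start:start + block_size]
        return cache[block][address % block_size]

    main_memory = list(range(64))   # toy memory: word i holds value i
    cache = {}
    print(read_word(5, cache, main_memory))   # miss: loads block 1, prints 5
    print(read_word(6, cache, main_memory))   # hit: block 1 already cached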
Fig: Cache memory / Main memory structure
Locality of Reference
References to memory at any given interval of time tend to be confined within a few localized areas of memory. This property is called locality of reference. It arises because program loops and subroutine calls are encountered frequently. When a program loop is executed, the CPU executes the same portion of the program repeatedly. Similarly, when a subroutine is called, the CPU fetches the starting address of the subroutine and executes the subroutine program. Thus loops and subroutines localize references to memory.
This principle states that memory references tend to cluster: over a long period of time the clusters in use change, but over a short period of time the processor works primarily with fixed clusters of memory references.
Spatial Locality
It refers to the tendency of execution to involve a number of memory locations that are clustered.
It reflects the tendency of a program to access data locations sequentially, such as when processing a table of data.
Temporal Locality
It refers to the tendency of a processor to access memory locations that have been used recently. For example, iterative loops execute the same set of instructions repeatedly.
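A small illustration of both kinds of locality at once (the row-major table below is an assumed example):

    # Temporal locality: the same loop instructions and the accumulator
    # 'total' are reused on every iteration.
    # Spatial locality: row-by-row traversal touches elements that sit
    # next to each other in memory, as when processing a table of data.
    table = [[i * 100 + j for j in range(100)] for i in range(100)]
    total = 0
    for row in table:
        for value in row:
            total += value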
Elements of Cache design
Cache size
The size of the cache should be small enough that the overall average cost per bit is close to that of main memory alone, and large enough that the overall average access time is close to that of the cache alone.
The larger the cache, the larger the number of gates involved in addressing the cache. As a result, large caches tend to be slightly slower than small ones, even when built with the same integrated circuit technology and put in the same place on the chip and circuit board.
The available chip and board area also limits cache size.
Mapping function
The transformation of data from main memory to cache memory is referred to as the memory mapping process.
Because there are fewer cache lines than main memory blocks, an algorithm is
needed for mapping main memory blocks into cache lines.
There are three mapping functions in common use: direct, associative and set associative. All three are illustrated with an example that includes the following elements.
o The cache can hold 64 Kbytes.
o Data is transferred between main memory and the cache in blocks of 4 bytes each. This means that the cache is organized as 16K = 2^14 lines of 4 bytes each.
o The main memory consists of 16 Mbytes, with each byte directly addressable by a 24-bit address (2^24 = 16M). Thus, for mapping purposes, we can consider main memory to consist of 4M blocks of 4 bytes each.
Direct Mapping
It is the simplest technique: each block of main memory maps into only one possible cache line, i.e. a given main memory block can be placed in one and only one place in the cache.
i = j modulo m
where i = cache line number, j = main memory block number, and m = number of lines in the cache.
The mapping function is easily implemented using the address. For purposes of cache access, each main memory address can be viewed as consisting of three fields.
The least significant w bits identify a unique word or byte within a block of main memory. The remaining s bits specify one of the 2^s blocks of main memory.
The cache logic interprets these s bits as a tag of (s - r) bits in the most significant positions and a line field of r bits. The latter field identifies one of the m = 2^r lines of the cache.
24-bit address
2-bit word identifier (4-byte block)
22-bit block identifier
8-bit tag (= 22 - 14), 14-bit slot or line
No two blocks that map to the same line have the same tag field
Check the contents of the cache by finding the line and checking the tag
Cache line:            0    1    2    3    4
Main memory blocks:    0    1    2    3    4
                       5    6    7    8    9
                      10   11   12   13   14
                      15   16   17   18   19
                      20   21   22   23   24
(In this small illustration with five cache lines, block j maps to line j modulo 5.)
Note that
o all locations in a single block of memory have the same higher-order bits (call them the block number), so the lower-order bits can be used to find a particular word in the block;
o within those higher-order bits, their lower-order bits obey the modulo mapping given above (assuming that the number of cache lines is a power of 2), so they can be used to get the cache line for that block;
o the remaining bits of the block number become a tag, stored with each cache line, and used to distinguish one block from another that could fit into that same cache line;
o if a program repeatedly accesses two blocks that map to the same line, cache misses are very high.
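A minimal sketch of the three-field split for the example above (w = 2 word bits, r = 14 line bits, 8 tag bits; the sample addresses are hypothetical):

    # Split a 24-bit address into tag / line / word fields (direct mapping).
    WORD_BITS = 2    # 4-byte blocks
    LINE_BITS = 14   # 2^14 cache lines

    def split_address(address):
        word = address & ((1 << WORD_BITS) - 1)
        line = (address >> WORD_BITS) & ((1 << LINE_BITS) - 1)
        tag = address >> (WORD_BITS + LINE_BITS)
        return tag, line, word

    # Two addresses with the same line field but different tags collide:
    print(split_address(0x000004))   # (0, 1, 0)
    print(split_address(0x010004))   # (1, 1, 0)  -> same line, new tag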
Associative Mapping
It overcomes the disadvantage of direct mapping by permitting each main memory block to be loaded into any line of the cache.
The cache control logic interprets a memory address simply as a tag and a word field. The tag uniquely identifies a block of memory. The cache control logic must simultaneously examine every line's tag for a match, which requires fully associative memory. The circuitry is very complex and grows rapidly with cache size, so cache searching gets expensive.
Set Associative Mapping
Fig: Set associative mapping example
13-bit set number
The block number in main memory is taken modulo 2^13, so addresses 000000, 00A000, 00B000, 00C000 … map to the same set.
Use the set field to determine which cache set to look in, then compare the tag field to see if we have a hit.
e.g.
Main memory address   Tag   Data       Set number
1FF 7FFC              1FF   12345678   1FFF
001 7FFC              001   11223344   1FFF
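As a sketch of the field extraction (2 word bits, 13 set bits, 9 tag bits; reading the example addresses "1FF 7FFC" and "001 7FFC" as the 24-bit values 0xFFFFFC and 0x00FFFC, i.e. the tag concatenated with the remaining 15 bits, is an assumption):

    # Extract the set number and tag from a 24-bit address
    # (2-bit word field, 13-bit set field, 9-bit tag field).
    WORD_BITS = 2
    SET_BITS = 13

    def set_and_tag(address):
        set_number = (address >> WORD_BITS) & ((1 << SET_BITS) - 1)
        tag = address >> (WORD_BITS + SET_BITS)
        return tag, set_number

    print([hex(v) for v in set_and_tag(0xFFFFFC)])   # ['0x1ff', '0x1fff']
    print([hex(v) for v in set_and_tag(0x00FFFC)])   # ['0x1', '0x1fff']
    # Both addresses select set 0x1FFF but carry different tags, so in a
    # two-way set associative cache both blocks can be resident at once.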
Replacement algorithm
When all lines are occupied, bringing in a new block requires that an existing line be
overwritten.
Direct mapping
No choice possible with direct mapping
Each block only maps to one line
Replace that line
Write policy
When a line is to be replaced, the original copy of the line in main memory must be updated if any addressable unit in the line has been changed
If a block has been altered in cache, it is necessary to write it back out to main
memory before replacing it with another block (writes are about 15% of memory
references)
Must not overwrite a cache block unless main memory is up to date
I/O modules may be able to read/write directly to memory
Multiple CPU’s may be attached to the same bus, each with their own cache
Write Through
All write operations are made to main memory as well as to the cache, so main memory is always valid and both copies always agree
Other CPUs can monitor traffic to main memory to keep their local caches up to date
Generates substantial write traffic to main memory, which may create a bottleneck and slow down writes
Write back
When an update occurs, an UPDATE bit associated with that slot is set, so when
the block is replaced it is written back first
During a write, only change the contents of the cache
Update main memory only when the cache line is to be replaced
Causes "cache coherency" problems: different values for the contents of an address may exist in the cache and in main memory, and complex circuitry is needed to avoid this problem
Accesses by I/O modules must occur through the cache
Multiple caches still can become invalidated, unless some cache coherency
system is used. Such systems include:
o Bus Watching with Write Through - other caches monitor memory writes
by other caches (using write through) and invalidates their own cache line
if a match
o Hardware Transparency - additional hardware links multiple caches so
that writes to one cache are made to the others
o Non-cacheable Memory - only a portion of main memory is shared by
more than one processor, and it is non-cacheable
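A minimal sketch contrasting the two write policies (the class design and dict-backed storage are assumptions for illustration):

    # Write-through updates memory on every write; write-back defers the
    # update until the block is replaced, tracking changed blocks as dirty.
    class Cache:
        def __init__(self, write_back=False):
            self.write_back = write_back
            self.data = {}       # block number -> value
            self.dirty = set()   # blocks changed since being loaded

        def write(self, block, value, main_memory):
            self.data[block] = value
            if self.write_back:
                self.dirty.add(block)        # defer the memory update
            else:
                main_memory[block] = value   # write-through: update now

        def evict(self, block, main_memory):
            if self.write_back and block in self.dirty:
                main_memory[block] = self.data[block]   # write back first
                self.dirty.discard(block)
            del self.data[block]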
Number of caches
L1 and L2 Cache
On-chip cache (L1 Cache)
It is the cache memory on the same chip as the processor. It reduces the processor's external bus activity and therefore speeds up execution times and increases overall system performance.
Cache hits require no bus operation
Short data paths, operating at the same speed as other CPU transactions
Split Cache
o The cache is split into two parts, one for instructions and one for data. It can outperform a unified cache in systems that support parallel execution and pipelining, because it reduces cache contention.
o The trend is toward split caches because of superscalar CPUs.
o Better for pipelining, pre-fetching and other parallel instruction execution designs.
o Eliminates cache contention between the instruction fetch unit and the execution unit (which accesses data).
Cache Coherence
Cache coherence ensures data consistency between multiple caches in a multi-core system.
Cache Coherence Problems:
1. Write Propagation Issue: When one processor updates a cache block, others must see the updated value.
2. Stale Data Issue: If one core reads outdated data, incorrect execution occurs.
Cache Coherence Protocols:
(a) Write-Through vs. Write-Back
Write-Through: Every write updates both cache and main memory (slow but consistent).
Write-Back: Updates cache first and writes to memory later (faster but requires coherence management).
(b) MESI Protocol (Most Common)
A state-based protocol that ensures consistency in multi-core systems:
Modified (M): Data in cache is modified and different from memory.
Exclusive (E): Data is unmodified but exists only in this cache.
Shared (S): Data exists in multiple caches, same as main memory.
Invalid (I): Data is no longer valid in the cache.
State       Meaning
Modified    The cache block has been changed and differs from main memory.
Exclusive   The cache block matches memory but is present only in this cache.
Shared      The block is unchanged and present in multiple caches.
Invalid     The block is outdated and must be reloaded.
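A highly simplified sketch of how one block's MESI state reacts to local and remote events (this tracks only the state letter; a real protocol also exchanges bus or directory messages, and the function names are illustrative):

    from enum import Enum

    class State(Enum):
        M = "Modified"
        E = "Exclusive"
        S = "Shared"
        I = "Invalid"

    def on_local_write(state):
        # A write leaves the local copy Modified (other copies must be
        # invalidated via the bus or directory).
        return State.M

    def on_remote_write(state):
        # Another cache wrote the block: the local copy becomes Invalid.
        return State.I

    def on_remote_read(state):
        # Another cache read the block: a Modified or Exclusive copy
        # drops to Shared (a Modified copy is first written back).
        return State.S if state in (State.M, State.E) else state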
Coherence Protocol Techniques:
1. Snooping Protocol
o Every cache monitors (or "snoops") the memory bus to track changes.
o Used in small-scale multiprocessor systems.
2. Directory-Based Protocol
o A directory in main memory keeps track of cache copies.
o More scalable for large multiprocessor systems.