S04 Memory Subsystem

The document covers the organization and architecture of computer memory subsystems, detailing types of memory such as SRAM and DRAM, memory hierarchy, and the concepts of cache and virtual memory. It explains the internal structure of memory chips, the importance of memory access speed, and the differences between Von Neumann and Harvard architectures. Additionally, it discusses memory management techniques and error correction methods.


CS2002-Computer Architecture and Organization

Memory Subsystem

23-08-2025 CS203: Computer Organization and Architecture 1


Contents
• Internal organization of a memory chip
• Organization of a memory unit
• Semiconductor memories: SRAM and DRAM cells.
• Error correction
• Read-Only Memories
• Interleaved Memories
• Cache Memories: Concept, Mapping methods, Caches in commercial processors
• Memory management unit: Concept of virtual memory, Address translation,
• Hardware support for memory management,
• Secondary storage: Hard Disks, RAID, Optical Disks, Magnetic Tape Systems.



Von Neumann Architecture vs. Harvard Architecture
• Von Neumann is an older architecture based on the stored-program computer concept; Harvard is a more modern architecture based on the Harvard Mark I relay-based model.
• Von Neumann uses the same physical memory for both instructions and data; Harvard uses separate physical memories for instructions and data.
• Von Neumann has a common bus for instruction and data transfers; Harvard uses separate buses.
• Von Neumann requires two clock cycles to execute a single instruction; Harvard can execute an instruction in a single cycle.
• Von Neumann is cheaper; Harvard is costlier.
• In Von Neumann, the CPU cannot access instructions and read/write data at the same time; in Harvard, it can.
• Von Neumann is used in personal computers and small computers (microprocessors); Harvard is used in microcontrollers and signal processing.
Basic Concepts: Memory
• The maximum size of the memory that can be used in any
computer is determined by the addressing scheme.
• For example, consider a computer that generates 16-bit addresses:
✓It is capable of addressing up to 2^16 = 64K (kilo) memory
locations.
✓Similarly, a 32-bit address computer can address 2^32 = 4G locations.
✓And a 64-bit address computer can address 2^64 = 16E (exa) ≈ 16 × 10^18 locations.
• The number of locations represents the size of the address
space of the computer.
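As a quick check, the sizes above follow directly from 2^n (a small sketch, not part of the original slides):

```python
# Number of addressable locations for an n-bit address.
def address_space(bits):
    return 2 ** bits

print(address_space(16))   # 65536 -> 64K locations
print(address_space(32))   # 4294967296 -> 4G locations
print(address_space(64) >= 16 * 10 ** 18)   # True: about 16 exa locations
```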



Basic Concepts: Memory
• A digital computer works on the stored-program concept introduced by Von
Neumann.
• Memory is used to store information, which includes both program and
data.
• For several reasons, we have different kinds of memory, i.e., a different
kind of memory is used at each level.
• The memory of a computer is broadly divided into two categories:
• Internal: used by the CPU to perform tasks, and
• External: used to store bulk information, including large software and data.



Basic Concepts: Memory Hierarchy
• Programmers want unlimited amounts of memory with low latency.
• Fast memory technology is more expensive per bit than slower memory.
• Solution: organize memory system into a hierarchy.
• Entire addressable memory space available in largest, slowest memory.
• Incrementally smaller and faster memories, each containing a subset of the
memory below it, proceed in steps up toward the processor.
• The purpose of the memory hierarchy is:
• To bridge the speed mismatch between processor and memory at
reasonable cost.
• To minimize the average access time of the entire memory system.



Basic Concepts: Memory Hierarchy
• Temporal and spatial locality ensure that nearly all references can be found in smaller
memories.
• i.e., it gives the illusion of a large, fast memory being presented to the processor.

• Locality:
• Spatial Locality: data is more likely to be accessed if neighboring data has been
accessed.
e.g., data in a sequentially accessed array
• Temporal Locality: data is more likely to be accessed if it has been accessed
recently.
e.g., code within a loop
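The two kinds of locality can be illustrated with a toy cache model (a sketch; the block size, cache capacity, and access patterns are illustrative assumptions, not from the slides):

```python
# Toy model: a small cache that holds the most recently used 8-word blocks.
from collections import OrderedDict

BLOCK_WORDS = 8
CACHE_BLOCKS = 4

def hit_rate(addresses):
    cache = OrderedDict()              # block number -> present, in LRU order
    hits = 0
    for addr in addresses:
        block = addr // BLOCK_WORDS
        if block in cache:
            hits += 1
            cache.move_to_end(block)   # temporal locality: keep recent blocks hot
        else:
            cache[block] = True
            if len(cache) > CACHE_BLOCKS:
                cache.popitem(last=False)  # evict the least recently used block
    return hits / len(addresses)

# Spatial locality: a sequential sweep hits on 7 of every 8 accesses,
# because each miss brings in a whole 8-word block.
sequential = list(range(64))
# Temporal locality: a small loop body re-reads the same few words.
loop = [0, 1, 2, 3] * 16

print(hit_rate(sequential))  # 0.875
print(hit_rate(loop))        # 0.984375 (only the very first access misses)
```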
Basic Concepts: Memory Hierarchy



Basic Concepts: Memory Hierarchy
• The memory hierarchy aims to obtain the highest possible access speed while minimizing the total
cost of the memory system.
• Since the speed of memory access is critical, the idea is to bring instructions and data that will be used
in the near future as close to the processor as possible.

[Figure: the memory hierarchy, from CPU registers through cache memory and main memory down to magnetic disks and magnetic tapes (auxiliary memory), with an I/O processor handling transfers between main and auxiliary memory; cost per bit increases toward the processor.]


Basic Concepts: Memory Hierarchy Example



Main Memory
• The connection between the processor and its memory consists of
address, data, and control lines.
▪ Address lines to specify the memory location
involved in a data transfer operation
▪ Data lines to transfer the data.
▪ At the same time, control lines carry the
command indicating a Read or a Write operation
and whether a byte or a word is to be transferred.
▪ Control lines also provide the necessary timing
information and are used by the memory to
indicate when it has completed the requested
operation.



How to measure Speed of memory units?
• Memory access time:
• It is the time that elapses between the initiation of an operation to transfer a
word of data and the completion of that operation.
• This is referred to as the memory access time.

• Memory cycle time


• It is the minimum time delay required between the initiation of two
successive memory operations.
• For example, the time between two successive Read operations.

• The cycle time is usually slightly longer than the access time, depending on
the implementation details of the memory unit.
Random-Access Memory (RAM)
• A memory unit is called a Random-access Memory (RAM) if the
access time to any location is the same, independent of the
location’s address.
• This distinguishes such memory units from serial, or partly serial,
access storage devices such as magnetic and optical disks.
• Access time of these devices depends on the address or
position of the data.
• The technology for implementing computer memories uses
semiconductor integrated circuits.



Cache Memory
• In general, a processor can process instructions and data faster than they
can be fetched from the main memory.
• Hence, the memory access time is the bottleneck in the system.
• Solution: use a cache memory.
• Cache memory:
- a small, fast memory inserted between the larger, slower main memory
and the processor.
- It holds the currently active portions of a program and their data.

Reading Assignment: What is Virtual Memory?



Virtual Memory
• Virtual memory is another important concept related to memory
organization.
• With this technique:
• only the active portions of a program are stored in the main
memory, and the remainder is stored on the much larger secondary
storage device.
• Sections of the program are transferred back and forth between
the main memory and the secondary storage device in a manner
that is transparent to the application program.
• As a result, the application program sees a memory that is much
larger than the computer’s physical main memory.
Block Transfers
• As we know, data move frequently between the main memory and the cache
and between the main memory and the disk.
• These transfers do not occur one word at a time.
• Data are always transferred in contiguous blocks involving tens, hundreds,
or thousands of words.
• Data transfers between the main memory and high-speed devices such as a graphic
display or an Ethernet interface also involve large blocks of data.
• Hence, a critical parameter for the performance of the main memory is its
ability to read or write blocks of data at high speed.
• This is an important consideration that we will encounter repeatedly as we
discuss memory technology and the organization of the memory system.



Semiconductor
RAM Memories
Semiconductor random-access memories (RAMs) are available in a wide range of speeds.
Their cycle times range from less than 10 ns to about 100 ns.



Internal Organization of Memory Chips
• Memory cells are usually organized in the form of an array, in which each cell is
capable of storing one bit of information.
• Each row of cells constitutes a memory word and all cells of a row are
connected to a common line referred to as the word line, which is driven by the
address decoder on the chip.
• The cells in each column are connected to a Sense/Write circuit by two bit
lines, and the Sense/Write circuits are connected to the data input/output lines of
the chip.
• During a Read operation, these circuits sense, or read, the information stored in
the cells selected by a word line and place this information on the output data
lines.
• During a Write operation, the Sense/Write circuits receive input data and store
them in the cells of the selected word.
Internal Organization of Memory Chips
• Consider an example of a very small memory circuit consisting of 16 words of 8 bits each,
referred to as a 16 × 8 organization.
• Data input and the data output of
each Sense/Write circuit are
connected to a single bidirectional
data line that can be connected
to the data lines of a computer.
• Two control lines, R/W and CS,
are provided.
• The R/W (Read/Write) input
specifies the required operation,
and the CS (Chip Select) input
selects a given chip in a multichip
memory system.



Internal Organization of Memory Chips

Organization of a
1K × 1 memory chip.
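The figure itself is not reproduced here. As a sketch of the address decoding such a chip performs, assuming the common organization of a 1K × 1 chip as a 32 × 32 cell array, the 10-bit address splits into a 5-bit row address and a 5-bit column address:

```python
# Hypothetical address decode for a 1K x 1 chip organized as a 32 x 32
# cell array: 5 address bits drive the row decoder (word lines) and
# 5 bits drive the column multiplexer.
def decode_1k_by_1(addr):
    assert 0 <= addr < 1024        # 2**10 = 1K single-bit locations
    row = addr >> 5                # high-order 5 bits -> word line
    col = addr & 0b11111           # low-order 5 bits  -> column select
    return row, col

print(decode_1k_by_1(0))      # (0, 0)
print(decode_1k_by_1(1023))   # (31, 31)
```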



Reading Assignments
• What is CMOS (P-type and N-type transistors)? Working Principle & Its
Applications.
• What is field effect transistor (FET)? Working Principle & Its Applications.



Memory Cell: SRAM
Memories that consist of circuits capable of retaining their state as long as power is applied
are known as static memories.
• Two inverters are cross-connected to form a latch and it is
connected to two bit lines by transistors T1 and T2.
• These transistors act as switches that can be opened or closed
under control of the word line.
• When the word line is at ground level, the transistors are
turned off and the latch retains its state.
• For example,
• if the logic value at point X is 1 and at point Y is 0, this state
is maintained as long as the signal on the word line is at
ground level.
• Assume that this state represents the value 1.



Memory Cell: SRAM
Read Operation:
- In order to read the state of the SRAM cell, the word line is
activated to close switches T1 and T2.
- If the cell is in state 1, the signal on bit line b is high and the signal
on bit line b‘ is low.
- The opposite is true if the cell is in state 0.
- Thus, b and b' are always complements of each other.
- The Sense/Write circuit at the end of the two bit lines monitors their
state and sets the corresponding output accordingly.



Memory Cell: SRAM
Write Operation:
- During a Write operation, the Sense/Write circuit drives bit lines
b and b' instead of sensing their state.
- It places the appropriate value on bit line b and its complement
on b' and activates the word line.
- This forces the cell into the corresponding state, which the cell
retains when the word line is deactivated.



Memory Cell: SRAM CMOS Cell
• Transistor pairs (T3, T5) and (T4, T6) form the inverters in the latch.


• For example, in state 1 the voltage at
point X is maintained high by having
transistors:
• T3 and T6 ON, while T4 and T5 are
OFF.
• If T1 and T2 are turned ON, bit
lines b and b' will have high and
low signals, respectively.



Memory Cell: SRAM CMOS Cell
• Continuous power is needed for the cell to retain its state.
• If power is interrupted, the cell’s contents are lost.
• When power is restored, the latch settles into a stable state, but not necessarily the same
state the cell was in before the interruption.
• Hence, SRAMs are said to be volatile memories because their contents are lost when
power is interrupted.
• A major advantage of CMOS SRAMs is their very low power consumption, because
current flows in the cell only when the cell is being accessed.
• Otherwise, T1, T2, and one transistor in each inverter are turned off, ensuring that
there is no continuous electrical path between Vsupply and ground.
• Static RAMs can be accessed very quickly. Access times on the order of a few
nanoseconds are found in commercially available chips.
• SRAMs are used in applications where speed is of critical concern.



Memory Cell: DRAM
• Static RAMs are fast, but their cells require several transistors → Expensive
• Less expensive and higher density RAMs can be implemented with simpler cells.
• But, these simpler cells do not retain their state for a long period, unless they are
accessed frequently for Read or Write operations.
• Memories that use such cells are called dynamic RAMs (DRAMs).
• A DRAM memory cell consists of a single field effect
transistor (FET) and a capacitor.
• A bit is stored in a cell in the form of a charge on the
capacitor.
• To store a bit in this cell, transistor T is turned ON and an
appropriate voltage is applied to the bit line.
• This causes a known amount of charge to be
stored in the capacitor.
Memory Cell: DRAM
• After the transistor is turned off, the charge remains stored in the
capacitor, but not for long.
• The capacitor begins to discharge; the charge can be maintained for
only tens of milliseconds.
• This is because the transistor continues to conduct a tiny amount of
current, measured in picoamperes, after it is turned off.
• If the cell is required to store data for a longer time, its contents must be
periodically refreshed by restoring the capacitor charge to its full
value.



Memory Cell: DRAM
• Information stored in the cell can be retrieved correctly only if it is read before the
charge in the capacitor drops below some threshold value.
• A sense amplifier connected to the bit line detects whether the charge stored in the
capacitor is above or below the threshold value.
• If the charge is above the threshold:
- Sense amplifier drives the bit line to the full voltage representing the logic value 1.
- As a result, the capacitor is recharged to the full charge corresponding to the logic value 1.
• If the charge in the capacitor is below the threshold value:
- It pulls the bit line to ground level to discharge the capacitor fully (logic value 0).
- Thus, reading the contents of a cell automatically refreshes its contents.

Since the word line is common to all cells in a row, all cells in a selected row are read
and refreshed at the same time.



Memory Cell: DRAM



Internal diagram of a DRAM chip
Each cell has a unique location or address
defined by intersection of row and a column.



Memory Control of DRAM
• The word line and bit line are connected as shown to select
the required bit within memory to be read or written.
• A multitude of such cells forms words consisting of bits.
• Memory addresses are decoded and converted into the rows
and columns of the matrix in which the memory elements are
arranged.
• When the processor addresses memory, it sends the complete
address on its address pins.
• Between the processor and the DRAM chip there is a
memory controller whose function is to split the
address into two parts: rows and columns.
• The memory controller also generates the
signals necessary for reading from or writing to the DRAM.



DRAM: 256-Megabit chip
Internal organization of a
32M × 8 DRAM chip.

In commercial DRAM chips, the
RAS and CAS control signals are
active when low.

RAS: row address strobe
CAS: column address strobe



DRAM: 256-Megabit chip
• 256-Megabit DRAM chip, configured as 32M×8, is shown in the above Figure
• The cells are organized in the form of a 16K×16K array.
• The 16,384 cells in each row are divided into 2,048 groups of 8, forming 2,048
bytes of data.
• Therefore, 14 address bits are needed to select a row, and another 11 bits are
needed to specify a group of 8 bits in the selected row.
• In total, a 25-bit address is needed to access a byte in this memory.
• The high-order 14 bits and the low-order 11 bits of the address constitute the row
and column addresses of a byte, respectively.
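The row/column split described above can be sketched in a few lines (illustrative code, not from the slides):

```python
# Split a 25-bit byte address for the 32M x 8 DRAM described above:
# the high-order 14 bits select one of 16K rows, and the low-order
# 11 bits select one of the 2,048 byte groups in that row.
ROW_BITS, COL_BITS = 14, 11

def split_address(addr):
    assert 0 <= addr < 2 ** (ROW_BITS + COL_BITS)
    row = addr >> COL_BITS               # high-order 14 bits
    col = addr & ((1 << COL_BITS) - 1)   # low-order 11 bits
    return row, col

print(split_address(0b10000000000000_00000000001))  # (8192, 1)
```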
DRAM: Refreshing
• The refresh rate of DRAM depends on the temperature and the DRAM standard:
• DDR5 and LPDDR5: refresh period of 32 milliseconds at 85°C.
• JEDEC standard:
• refresh rate of 64 milliseconds at normal temperatures (<85°C)
• 32 milliseconds at high temperatures (>85°C)

How is refreshing done?


- There are many refresh methods; one commonly used method is
ROR (RAS-only Refresh), which refreshes by activating each row using RAS.
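A rough sketch of what a ROR refresh schedule looks like, assuming an 8K-row array and the 64 ms JEDEC period mentioned above (the function name and structure are illustrative, not a real controller API):

```python
# RAS-only refresh sketch: the controller cycles through every row
# address within the refresh period, activating RAS for each so the
# row's cells are read and rewritten before their charge decays.
REFRESH_PERIOD_MS = 64.0   # JEDEC rate at normal temperature
NUM_ROWS = 8192

def refresh_schedule(num_rows, period_ms):
    # one row refresh every period/num_rows milliseconds
    interval = period_ms / num_rows
    return [(row, round(row * interval, 4)) for row in range(num_rows)]

sched = refresh_schedule(NUM_ROWS, REFRESH_PERIOD_MS)
print(sched[0], sched[1])   # row 0 at t=0 ms, row 1 about 0.0078 ms later
```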



DRAM: Refreshing
• DRAM controller takes care of scheduling the refreshes and making sure
that
they do not interfere with regular reads and writes.
• So, to keep the data in the DRAM chip from leaking away, the DRAM controller
periodically sweeps through all of the rows, cycling repeatedly and
placing a series of row addresses on the address bus.
• This method is designated as ROR or RAS Only Refresh.
• To reduce the number of refresh cycles, one method of design is to split the
address such that there are fewer rows and more columns.
• So, the DRAM array is then a rectangular array, rather than a square one.



Synchronous DRAM
• In asynchronous DRAM, access timing is not related to the system clock
at all.
• In synchronous DRAM, accesses are synchronized with the system clock, and
SDRAM is currently the RAM used as primary memory in general-purpose
computer systems.
• Synchronization with the system clock enables easier control of the memory access
operations.
• SDRAMs have built-in refresh circuitry, with a refresh counter to provide
the addresses of the rows to be selected for refreshing.
• As a result, the dynamic nature of these memory chips is almost invisible to
the user.
Synchronous vs. Asynchronous DRAM
Asynchronous:
• Does not share a common clock with the CPU; the controller
chips have to manipulate the DRAM's control pins based on
all sorts of timing considerations.
• For accessing memory, toggling of the external control inputs has
a direct effect on the internal memory array.
Synchronous:
• Shares a common clock with the CPU; commands can be placed on its control
pins on clock edges.
• In SDRAM, the input signals are latched into a control logic block, which functions
as input to a state machine.
• The state machine controls memory access.
• Read, write, and refresh are initiated by loading control commands into the device.
DDR (Double Data Rate) SDRAM
• It can be made to transfer data at rising and falling edges of the clock, instead
of just at rising edge.
• The key idea is to take advantage of the fact that a large number of bits are
accessed at the same time inside the chip when a row address is applied.
• To make the best use of the available clock speed, data are transferred
externally on both the rising and falling edges of the clock.
• That is why it is called double the data rate.
• Several versions of DDR chips have been developed:
• DDR, DDR2, DDR3, and DDR4 with enhanced capabilities in terms of
increased storage capacity, lower power, and faster clock speeds.
DDR (Double Data Rate) SDRAM
Examples: SRAM
Organization of a 2M × 32
memory module using 512K × 8
static memory chips.



Examples: SRAM
• Consider a memory consisting of 2M words of 32 bits each.
• Above Figure shows how this memory can be implemented using 512K × 8 static
memory chips.
• Each column in the figure implements one byte position in a word, with four chips
providing 2M bytes.
• Four columns implement the required 2M × 32 memory.
• Each chip has a control input called Chip-select.

Problem: Describe a structure similar to the one in above Figure for an 8M × 32 memory
using 512K × 8 memory chips.
Solution: The required structure is essentially the same as in the above Figure, except that 16
rows are needed, each with four 512K × 8 chips. Address lines A18−0 should be connected
to all chips. Address lines A22−19 should be connected to a 4-bit decoder to select one of
the 16 rows.
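The decoder selection in this solution can be sketched as follows (illustrative code; only the bit arithmetic comes from the solution above):

```python
# For the 8M x 32 memory built from 512K x 8 chips: A18-A0 go to every
# chip, and A22-A19 drive a 4-bit decoder that asserts the Chip-select
# of one of the 16 rows of chips.
def chip_row(addr):
    assert 0 <= addr < 2 ** 23   # 8M words -> 23 address bits
    return addr >> 19            # A22-A19 select the row of chips

print(chip_row(0))          # 0  -> first row of chips
print(chip_row(0x7FFFFF))   # 15 -> last row of chips
```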
Read-Only Memories



Read-Only Memories
• Both SRAM and DRAM chips are volatile, which means
that they retain information only while power is turned on.
• There are many applications requiring memory devices
that retain the stored information when power is turned off.
For Example:
- Booting information of a computer.
- Many embedded systems do not use a hard disk and require nonvolatile memories to store their
software, such as fire alarms, washing machines, etc.

➢Different types of nonvolatile memories have been developed.


➢Generally, their contents can be read in the same way and a special writing process is needed to
place the information into a nonvolatile memory.
➢Since its normal operation involves only reading the stored data, a memory of this type is called a
read-only memory (ROM).
ROM
▪ A logic value 0 is stored in the cell if the transistor is
connected to ground at point P; otherwise, a 1 is
stored.
▪ The bit line is connected through a resistor to the power
supply.
▪ To read the state of the cell, the word line is activated
to close the transistor switch.
▪ As a result, the voltage on the bit line drops to near zero if there is a connection between
the transistor and ground.
▪ If there is no connection to ground, the bit line remains at the high voltage level,
indicating a 1.
▪ A sense circuit at the end of the bit line generates the proper output value.
▪ The state of the connection to ground in each cell is determined when the chip is
manufactured, using a mask with a pattern that represents the information to be stored.
PROM
▪ Some ROM designs allow the data to be loaded by the user, thus providing a
programmable ROM (PROM).
▪ Programmability is achieved by inserting a fuse at point P.
▪ Before it is programmed, the memory contains all 0s.
▪ The user can insert 1s at the required locations by burning out the fuses at these locations
using high-current pulses. (Of course, this process is irreversible).
▪ PROMs provide flexibility and convenience not available with ROMs.
▪ The cost of preparing the masks needed for storing a particular information pattern makes
ROMs cost effective only in large volumes.
▪ The alternative technology of PROMs provides a more convenient and considerably less
expensive approach, because memory chips can be programmed directly by the user.
▪ Types of PROMS: EPROM and EEPROM



PROM: EPROM and EEPROM
EPROM (Erasable and Programmable ROM):
• Contents can be erased by exposure to ultraviolet radiation.
• Such ROMs have a window through which UV light is applied to the chip.

EEPROM (Electrically Erasable PROM):
• Erasure can be done while the chip is on the circuit board.
• The programmer can change the data one byte at a time; erasing takes a long time.
• EEPROM is non-volatile, but also erasable and reprogrammable.
• Used for data storage in small quantities where data have to be read frequently but may not
have to be changed often.



Interleaved Memories
• Interleaved memory is designed to compensate for the relatively slow speed of
dynamic random-access memory (DRAM) or core memory by spreading memory
addresses evenly across memory banks.
• Problem: A single monolithic memory array takes a long time to access and does not
enable multiple accesses in parallel.
• Goal: Reduce the latency of memory array access and enable multiple accesses in
parallel.
• Idea: Divide the array into multiple banks that can be accessed independently (in
the same cycle or in consecutive cycles).
• Where each bank is smaller than the entire memory storage.
• Access to different banks can be overlapped

• A Key Issue: How do you map data to different banks?
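Two standard answers to this mapping question are low-order and high-order interleaving; the sketch below is illustrative and not from the slides:

```python
# Two common mappings from word address to bank in an interleaved memory.
NUM_BANKS = 4

def low_order_interleave(addr):
    # consecutive addresses go to consecutive banks, so sequential
    # accesses can be overlapped across banks
    return addr % NUM_BANKS, addr // NUM_BANKS     # (bank, index in bank)

def high_order_interleave(addr, bank_size=1024):
    # consecutive addresses stay in the same bank; sequential accesses
    # then conflict in that one bank
    return addr // bank_size, addr % bank_size     # (bank, index in bank)

print([low_order_interleave(a)[0] for a in range(8)])  # [0, 1, 2, 3, 0, 1, 2, 3]
```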


Interleaved Memories

➢ Cannot satisfy multiple accesses to the same bank.

➢ It can use a crossbar interconnection for
input/output.

➢ Bank Conflict: two accesses to the same
bank are difficult to handle.



Direct Memory Access (DMA)
• Blocks of data are often transferred between the
main memory and I/O devices such as disks.
• DMA is a technique for controlling such transfers
without frequent, program-controlled intervention by
the processor.
• Data are transferred from an I/O device to the memory by
first reading them from the I/O device using an
instruction such as:
Load R2, DATAIN
which loads the data into a processor register.
• The reverse process takes place for transferring data
from the memory to an I/O device.
Direct Memory Access (DMA)
• An instruction to transfer input or output data is executed only after the
processor determines that the I/O device is ready, either by polling its status
register or by waiting for an interrupt request.
• In either case, considerable overhead is incurred, because several program
instructions must be executed involving many memory accesses for each data word
transferred.
• An alternative approach is used to transfer blocks of data directly between the main
memory and I/O devices, such as disks.
• A special control unit is provided to manage the transfer, without continuous
intervention by the processor.
• This approach is called direct memory access, or DMA.
• The unit that controls DMA transfers is referred to as a DMA controller.



Cache Memories



Memory Interfacing



Cache Memories
• The cache is a small and very fast memory, interposed between the
processor and the main memory.
• Its purpose is to make the main memory appear to the processor to be
much faster than it actually is.
• The effectiveness of this approach is based on a property of computer
programs called locality of reference.
• Cache block refers to a set of contiguous address locations of some size.

❖ Analysis of programs shows that most of their execution time is spent in routines in
which many instructions are executed repeatedly. These instructions may constitute a
simple loop, nested loops, or a few procedures that repeatedly call each other.
❖ In other words, many instructions in localized areas of the program are executed repeatedly
during some time period. (Locality: Temporal and Spatial)



Cache Memories
• When the processor issues a Read request:
• A block of memory words containing the specified location is transferred into the
cache.
• Subsequently, when the program references any of the locations in this block, the
desired contents are read directly from the cache.
• Usually, the cache memory can store a reasonable number of blocks at any given time,
but this number is small compared to the total number of blocks in the main memory.
• The correspondence between the main memory blocks and those in the cache is
specified by a mapping function.
• When the cache is full and a memory word (instruction or data) that is not in the
cache is referenced, the cache control hardware must decide which block should be
removed to create space for the new block that contains the referenced word.
• The collection of rules for making this decision constitutes the cache’s replacement
algorithm.



Cache Hits
• The processor does not need to know explicitly about the existence of the
cache.
• It simply issues Read and Write requests using addresses that refer to locations
in the memory.
• Cache control circuitry determines whether the requested word currently
exists in the cache.
• If it does, the Read or Write operation is performed on the appropriate
cache location, i.e., a read or write hit.
• The main memory is not involved when there is a cache hit in a Read
operation.



Cache Hits
• For a Write operation, the system can proceed in one of two ways:
I. First, write-through protocol: both the cache location and the
main memory location are updated.
II. The second technique:
✓Update only the cache location and to mark the block containing it with
an associated flag bit, often called the dirty or modified bit.
✓The main memory location of the word is updated later, when the block
containing this marked word is removed from the cache to make room
for a new block.
✓This technique is known as the write-back, or copy-back, protocol.



Cache Hits
• The write-through protocol is simpler than the write-back
protocol, but it results in unnecessary Write operations in the main
memory when a given cache word is updated several times during its
cache residency.
• The write-back protocol also involves unnecessary Write operations,
because all words of the block are eventually written back, even if
only a single word has been changed while the block was in the cache.
• The write-back protocol is used most often, to take advantage of the
high speed with which data blocks can be transferred to memory
chips.
Cache Misses
• A Read operation for a word that is not in the cache constitutes a
Read miss.
• It causes the block of words containing the requested word to be
copied from the main memory into the cache.
• After the entire block is loaded into the cache, the particular word
requested is forwarded to the processor.
• Alternatively, this word may be sent to the processor as soon as it
is read from the main memory (this is called load-through, or early
restart), which reduces the processor’s waiting time.
Cache Misses
• When a Write miss occurs in a computer that uses the write-
through protocol, the information is written directly into the
main memory.
• For the write-back protocol, the block containing the
addressed word is first brought into the cache, and then the
desired word in the cache is overwritten with the new
information.
Access Time: Memory (Two-Level Memory)
Access Time: Memory (Three-Level Memory)
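A hedged sketch of the standard average-access-time formulas for two- and three-level memory hierarchies; the hit rates and cycle counts used below are illustrative, not taken from the slides.

```python
# Average access time for a memory hierarchy.
# h  : hit rate of a cache level
# C  : access time of that cache level
# M  : main-memory access time (miss penalty)

def t_avg_two_level(h, C, M):
    """One cache level in front of main memory: t = h*C + (1-h)*M."""
    return h * C + (1 - h) * M

def t_avg_three_level(h1, C1, h2, C2, M):
    """Two cache levels (L1, L2) in front of main memory:
    t = h1*C1 + (1-h1)*(h2*C2 + (1-h2)*M)."""
    return h1 * C1 + (1 - h1) * (h2 * C2 + (1 - h2) * M)

# Example: 95% L1 hits at 1 cycle, main memory at 100 cycles.
print(t_avg_two_level(0.95, 1, 100))   # ≈ 5.95 cycles
# Adding an L2 (80% of L1 misses hit in L2 at 10 cycles) helps further.
print(t_avg_three_level(0.9, 1, 0.8, 10, 100))  # ≈ 3.7 cycles
```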
Mapping Functions
• There are several possible methods for determining where memory blocks are placed in the
cache.
Mapping Functions
• Example:
- Consider a cache consisting of 128 blocks of 16 words
each, for a total of 2048 (2K) words, and assume that the
main memory is addressable by a 16-bit address.
- The main memory has 64K words, which we will view
as 4K blocks of 16 words each.
- For simplicity, we have assumed that consecutive
addresses refer to consecutive words.
Direct Mapping
• The simplest way to determine cache
locations in which to store memory blocks is
the direct-mapping technique.
• In this technique, block j of the main
memory maps onto block j modulo 128 of
the cache.
• Thus, whenever one of the main memory
blocks 0, 128, 256, . . . is loaded into the
cache, it is stored in cache block 0.
• Blocks 1, 129, 257, . . . are stored in cache
block 1, and so on.
Direct Mapping
• Since more than one memory block is mapped
onto a given cache block position, contention
may arise for that position even when the cache is
not full.
• For example:
• Instructions of a program may start in block
1 and continue in block 129, possibly after a
branch.
• As this program is executed, both of these
blocks must be transferred to the block-1
position in the cache.
• Contention is resolved by allowing the new block
to overwrite the currently resident block.
Direct Mapping
• With direct mapping, the replacement algorithm is trivial.
• Placement of a block in the cache is determined by its memory address.
• The memory address can be divided into three fields:
1. Word: The low-order 4 bits select one of 16 words in a block.

2. Block: When a new block enters the cache, the 7-bit cache block field
determines the cache position in which this block must be stored.

3. Tag: The high-order 5 bits of the memory address of the block are
stored in 5 tag bits associated with its location in the cache. The tag bits
identify which of the 32 main memory blocks mapped into this cache
position is currently resident in the cache.
Direct Mapping
• As execution proceeds, the 7-bit cache block field of each address
generated by the processor points to a particular block location in the
cache.
• The high-order 5 bits of the address are compared with the tag bits
associated with that cache location.
• If they match, then the desired word is in that block of the cache.
• If there is no match, then the block containing the required word
must first be read from the main memory and loaded into the
cache.
• The direct-mapping technique is easy to implement, but it is not very
flexible.
Direct Mapping
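The Tag/Block/Word field split described above can be sketched in Python. The helper names are illustrative; the bit widths match the running example (16-bit address, 128 cache blocks of 16 words).

```python
# Decompose a 16-bit address for the direct-mapped cache example:
# 5-bit tag | 7-bit cache block | 4-bit word.

def split_address(addr):
    word  = addr & 0xF           # low-order 4 bits: word within the block
    block = (addr >> 4) & 0x7F   # next 7 bits: cache block position
    tag   = (addr >> 11) & 0x1F  # high-order 5 bits: tag
    return tag, block, word

def cache_block_for(memory_block):
    # Block j of the main memory maps onto block j modulo 128 of the cache.
    return memory_block % 128

# Memory blocks 1, 129, 257, ... all compete for cache block 1.
print(cache_block_for(129))  # 1
```

A lookup then compares the address's 5 tag bits against the tag stored with `split_address(addr)[1]`; a mismatch is a miss.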
Associative Mapping
Associative Mapping
• A flexible mapping method, in which a main
memory block can be placed into any cache
block position.
• In this case, 12 tag bits are required to
identify a memory block when it is resident in
the cache.
• The tag bits of an address received from the
processor are compared to the tag bits of each
block of the cache to see if the desired block is
present.
• This is called the associative-mapping
technique.
Associative Mapping
• It gives complete freedom in choosing the cache location in which to place the
memory block, resulting in a more efficient use of the space in the cache.
• When a new block is brought into the cache, it replaces (ejects) an existing
block only if the cache is full.
• In this case, we need an algorithm to select the block to be replaced.
• The complexity of an associative cache is higher than that of a direct-mapped
cache, because of the need to search all 128 tag patterns to determine whether
a given block is in the cache.
• To avoid a long delay, the tags must be searched in parallel.
• A search of this kind is called an associative search.
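The associative search can be sketched as follows. This is a serial Python model of what the hardware does in parallel; the function and variable names are illustrative.

```python
# Associative lookup sketch: a main-memory block may sit in any of the
# 128 cache blocks, so the 12-bit tag (16-bit address minus the 4-bit
# word field) must be compared against every resident tag.

def associative_lookup(addr, cache_tags):
    tag = addr >> 4    # 12-bit tag in this example
    word = addr & 0xF
    # Hardware compares all tags simultaneously; this loop models it serially.
    for i, t in enumerate(cache_tags):
        if t == tag:
            return i, word   # hit: cache block i holds the requested word
    return None              # miss

tags = [None] * 128
tags[5] = 0x123    # pretend the block with tag 0x123 resides in slot 5
print(associative_lookup(0x1234, tags))  # (5, 4)
```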
Set-Associative Mapping
• It is a combination of the direct and
associative mapping techniques.
• The blocks of the cache are grouped into
sets, and the mapping allows a block of the
main memory to reside in any block of a
specific set.
• Hence, the contention problem of the direct
method is eased by having a few choices for
block placement.
• At the same time, the hardware cost is
reduced by decreasing the size of the
associative search.
Set-Associative Mapping
Example: A cache with two blocks per set
- memory blocks 0, 64, 128, . . . , 4032 map into
cache set 0, and they can occupy either of the two
block positions within this set.
- Having 64 sets means that the 6-bit set field of the
address determines which set of the cache might
contain the desired block.
- The tag field of the address must then be
associatively compared to the tags of the
two blocks of the set to check if the desired block is
present.
• The number of blocks per set is a parameter that
can be selected to suit the requirements of a
particular computer.
Set-Associative Mapping
For the main memory and cache sizes in Figure:
- 4 blocks per set can be accommodated by a 5-bit
set field,
- 8 blocks per set by a 4-bit set field, and so on.
- The extreme condition of 128 blocks per set
requires no set bits and corresponds to the fully-
associative technique, with 12 tag bits.
- The other extreme of one block per set is the
direct-mapping method.
• A cache that has k blocks per set is referred
to as a k-way set-associative cache.
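For the 2-way configuration in the example (128 cache blocks grouped into 64 sets), the address split and set mapping can be sketched as follows; helper names are illustrative.

```python
# 2-way set-associative sketch for the running example:
# 6-bit tag | 6-bit set | 4-bit word in a 16-bit address.

def split_set_assoc(addr):
    word = addr & 0xF          # low-order 4 bits: word within the block
    s    = (addr >> 4) & 0x3F  # next 6 bits: set index
    tag  = addr >> 10          # high-order 6 bits: tag
    return tag, s, word

def set_for(memory_block, num_sets=64):
    # Memory blocks 0, 64, 128, ..., 4032 all map into set 0,
    # and each may occupy either of the two block positions in that set.
    return memory_block % num_sets

print(set_for(128))  # 0
```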
Reading Assignment
Q. What is Victim Cache?
Cache Replacement Policy
• A cache replacement policy is required for associative and set-associative mapping, but
not for direct mapping.
• Replacement policies aim to minimize the miss penalty for future references.
Replacement Algorithms
• Once the cache has been filled, when a new block is brought into
the cache, one of the existing blocks must be replaced.
➢For direct mapping there is only one possible line for any
particular block and no choice is possible.
➢For the associative and set-associative techniques, a replacement
algorithm is needed.
➢To achieve high speed, the algorithm must be implemented in
hardware.
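As one concrete example of such a policy, a least-recently-used (LRU) scheme for a single cache set can be sketched as follows. This is illustrative Python, not the mechanism of any particular processor; real hardware tracks usage order with a few bits per set rather than a dictionary.

```python
# LRU replacement sketch for one set of a k-way set-associative cache.
from collections import OrderedDict

class LRUSet:
    def __init__(self, ways):
        self.ways = ways
        self.blocks = OrderedDict()  # tag -> data, least recently used first

    def access(self, tag):
        """Return 'hit' or 'miss'; on a miss, evict the LRU block if full."""
        if tag in self.blocks:
            self.blocks.move_to_end(tag)     # mark as most recently used
            return "hit"
        if len(self.blocks) >= self.ways:
            self.blocks.popitem(last=False)  # evict the least recently used
        self.blocks[tag] = None              # bring the new block in
        return "miss"

s = LRUSet(ways=2)
# Tags 1 and 2 miss; 1 hits; 3 evicts 2; 2 then misses again.
print([s.access(t) for t in [1, 2, 1, 3, 2]])
# ['miss', 'miss', 'hit', 'miss', 'miss']
```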
Caches in commercial processors: Intel Cache Evolution
Caches in commercial processors: Pentium 4 Block Diagram
Self Study
• Memory management unit:
• Concept of virtual memory,
• Address translation,
• Hardware support for memory management

• Secondary storage:
• RAID,
• Optical Disks,
• Magnetic Tape Systems.
Secondary Storage: Hard Disks
• The main limitation of semiconductor memories is the cost per bit of stored information.
• The large storage requirements of most computer systems are economically realized in the
form of magnetic and optical disks, which are usually referred to as secondary storage
devices.
Magnetic Hard Disks:
• The storage medium in a magnetic-disk system consists of
one or more disk platters mounted on a common
spindle.
• A thin magnetic film is deposited on each platter,
usually on both sides.
• The assembly is placed in a drive that causes it to rotate at
a constant speed.
• The magnetized surfaces move in close proximity to
read/write heads.
Secondary Storage: Hard Disks
Magnetic Hard Disks:
• Each read/write head consists of a magnetic yoke and a
magnetizing coil.
• Digital information can be stored on the magnetic film by
applying current pulses of suitable polarity to the
magnetizing coil.
• This causes the magnetization of the film in the area
immediately underneath the head to switch to a direction
parallel to the applied field.
• The same head can be used for reading the stored information.
• Changes in the magnetic field in the vicinity of the head caused by the movement of the film relative to
the yoke induce a voltage in the coil, which now serves as a sense coil.
• The polarity of this voltage is monitored by the control circuitry to determine the state of magnetization of the
film.
• Only changes in the magnetic field under the head can be sensed during the Read operation.
Secondary Storage: Hard Disks
Magnetic Hard Disks:
• If the binary states 0 and 1 are represented by two opposite states of magnetization, a voltage
is induced in the head only at 0-to-1 and at 1-to-0 transitions in the bit stream.
• A long string of 0s or 1s causes an induced voltage only at the beginning and end of the
string.
• Therefore, to determine the number of consecutive 0s or 1s stored, a clock must provide
information for synchronization.
• One simple scheme, depicted in the figure,
is known as phase encoding or Manchester
encoding. In this scheme, changes in
magnetization occur for each data bit.
• Clocking information is provided by the
change in magnetization at the midpoint of
each bit period.
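Manchester encoding as described above can be sketched in Python. The polarity convention below (1 as high-to-low, 0 as low-to-high within a bit period) is one of the two common choices, picked arbitrarily for illustration.

```python
# Manchester (phase) encoding sketch: each data bit becomes a pair of
# half-bit levels with a guaranteed transition at the midpoint of every
# bit period, so the clock can be recovered from the signal itself.

def manchester_encode(bits):
    out = []
    for b in bits:
        out.extend((1, 0) if b else (0, 1))  # 1 -> high,low ; 0 -> low,high
    return out

def manchester_decode(levels):
    # The first half-bit level of each pair identifies the data bit,
    # even in a long run of identical bits.
    return [1 if levels[i] == 1 else 0 for i in range(0, len(levels), 2)]

signal = manchester_encode([1, 1, 0, 1])
print(signal)                     # [1, 0, 1, 0, 0, 1, 1, 0]
print(manchester_decode(signal))  # [1, 1, 0, 1]
```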
Secondary Storage: Hard Disks
Secondary Storage: Hard Disks
Data Buffer/Cache:
• A disk drive is connected to the rest of a computer system using some standard
interconnection scheme, such as SCSI (Small Computer System Interface) or SATA (Serial
Advanced Technology Attachment).
• The interconnection hardware is usually capable of transferring data at much higher rates
than the rate at which data can be read from disk tracks.
• An efficient way to deal with the possible differences in transfer rates is to include a data
buffer in the disk unit.
• The buffer is a semiconductor memory, capable of storing a few megabytes of data.
• The requested data are transferred between the disk tracks and the buffer at a rate dependent
on the rotational speed of the disk.
• Transfers between the data buffer and the main memory can then take place at the
maximum rate allowed by the interconnect between them
Secondary Storage: Hard Disks
Disk Controller:
• Operation of a disk drive is controlled by a disk controller circuit, which also provides an
interface between the disk drive and the rest of the computer system. One disk controller
may be used to control more than one drive.
• A disk controller that communicates directly with the processor contains a number of
registers that can be read and written by the operating system.
• Thus, communication between the OS and the disk controller is achieved in the same
manner as with any I/O interface.
• The disk controller uses the DMA scheme to transfer data between the disk and the main
memory.
• Actually, these transfers are from/to the data buffer, which is implemented as a part of the
disk controller module.
• The OS initiates the transfers by issuing Read and Write requests, which entail loading the
controller’s registers with the necessary addressing and control information.
Secondary Storage: Hard Disks
Disk Controller:
• Typically, this information includes:
• Main memory address—The address of the first main memory location of the block of words involved in
the transfer.
• Disk address—The location of the sector containing the beginning of the desired block of words.
• Word count—The number of words in the block to be transferred.
• On the disk drive side, the controller’s major functions are:
• Seek —Causes the disk drive to move the read/write head from its current position to the desired track.
• Read —Initiates a Read operation, starting at the address specified in the disk address register. Data read
serially from the disk are assembled into words and placed into the data buffer for transfer to the main
memory.
• Write —Transfers data to the disk, using a control method similar to that for Read operations.
• Error checking—Computes the error correcting code (ECC) value for the data read from a given sector
and compares it with the corresponding ECC value read from the disk. In the case of a mismatch, it
corrects the error if possible; otherwise, it raises an interrupt to inform the OS that an error has occurred.
During a Write operation, the controller computes the ECC value for the data to be written and stores this
value on the disk.
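The addressing information listed above can be modeled in a short sketch. This is a hypothetical illustration; the field and function names are invented, not the registers of any real controller.

```python
# Sketch of the information the OS loads into a disk controller's
# registers before a transfer (the real transfer then proceeds by DMA
# between the controller's data buffer and main memory).
from dataclasses import dataclass

@dataclass
class DiskRequest:
    main_memory_address: int  # first main-memory word of the block
    disk_address: int         # sector holding the start of the block
    word_count: int           # number of words to transfer
    operation: str            # "read" or "write"

def issue(req):
    """Model the OS handing a request to the controller."""
    assert req.operation in ("read", "write")
    return (f"{req.operation} {req.word_count} words @ sector "
            f"{req.disk_address} <-> mem {req.main_memory_address:#x}")

print(issue(DiskRequest(0x1000, 42, 512, "read")))
# read 512 words @ sector 42 <-> mem 0x1000
```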