IFT212 – Computer Architecture and Organization
Module 8
8 Memory hierarchy, Cache memory and performance
Module outline
8.1 Introduction to memory hierarchy
8.2 Cache memory and its types
Introduction
In navigating the complexities of modern computer systems, our exploration into Memory
Hierarchy, Virtual Memory, Cache Memory, and Performance in this university course is
crucial for understanding the intricate architecture that governs information storage and
retrieval. The introductory phase immerses students in the foundational concept of the
"Memory Hierarchy," where a comprehensive overview delineates the significance of multiple
memory levels, ranging from registers to disk storage. Following this, we delve into the realm
of "Virtual Memory," unraveling its definition and purpose, and exploring the intricate
mechanisms of address translation and paging that underpin its functionality. The journey
progresses to illuminate the "Role of Cache Memory in the Memory Hierarchy," exploring its
pivotal role as a bridge between fast and slow memory components. Subsequently, we
scrutinize the "Types of Cache Memory and Their Characteristics," understanding the nuances
that distinguish various cache architectures. The course concludes by elucidating the
"Relationship between Memory Hierarchy, Virtual Memory, Cache Memory, and Overall
System Performance," providing students with a holistic perspective on how these
components collectively impact the efficiency and speed of a computer system. By following
this sequence, students will gain a comprehensive understanding of memory management in
computing, laying the groundwork for further exploration into system optimization and
performance enhancement. Join us on this educational journey as we unravel the intricacies
of memory hierarchy, virtual memory, cache memory, and their symbiotic relationship with
overall system performance.
Module learning outcomes
Upon completing this module, you should be able to:
MLO 1. Evaluate the components of Memory Hierarchy and synthesize the importance of each
level in the hierarchy for efficient data storage and retrieval.
MLO 2. Define Virtual Memory and apply knowledge of address translation and paging
mechanisms in practical scenarios.
MLO 3. Design and assess the Role of Cache Memory in the Memory Hierarchy.
8.1 Introduction to memory hierarchy
Memory works somewhat like a human brain: it is used to store data and instructions. Computer memory is the storage space in a computer where the data to be processed and the instructions required for processing are kept. The memory is divided into a large number of small parts called cells. Each location, or cell, has a unique address, ranging from zero to the memory size minus one. For example, if a computer has 64K words, then this memory unit has 64 × 1024 = 65,536 memory locations, with addresses ranging from 0 to 65,535.
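The address-range arithmetic above can be checked with a short sketch (Python, purely illustrative):

```python
# A 64K-word memory has 64 * 1024 addressable locations,
# numbered from 0 to (size - 1).
words = 64 * 1024            # 65536 locations
lowest_address = 0
highest_address = words - 1  # 65535

print(words)                 # 65536
print(highest_address)       # 65535
```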
Figure: A four-level computer memory hierarchy
Memory is primarily of three types: Cache Memory, Primary Memory/Main Memory and
Secondary Memory.
8.1.1 Memory Representation
All quantities, physical or otherwise, are measured in some unit. For example, length is measured in metres and mass in grams. Likewise, measuring computer memory requires a standard unit. The basic unit of memory is the bit. Digital computers work with only two states, ON (1) and OFF (0). One such value (either 0 or 1) is called a binary digit, and each such bit can be considered a symbol for a piece of information. The units used to measure computer memory are as follows:
i. Bits (Binary Digits): A binary digit is a logical 0 or 1, representing the passive or active state of a component in an electric circuit.
ii. Nibble: A group of 4 bits is called a nibble.
iii. Byte: A group of 8 bits is called a byte. A byte is the smallest unit that can represent a data item or a character.
iv. Word: A computer word, like a byte, is a group of a fixed number of bits processed as a unit. The number of bits varies from computer to computer but is fixed for each computer. The length of a computer word is called the word size or word length, and it may be as small as 8 bits or as long as 96 bits. A computer stores information in the form of computer words.
v. KiloByte (KB): 1KB = 1024 Bytes
vi. MegaByte (MB): 1MB = 1024 KB (1,048,576 bytes)
vii. GigaByte (GB): 1 GB = 1024 MB (1,073,741,824 bytes)
viii. TeraByte (TB): 1 TB = 1024 GB (1,099,511,627,776 bytes)
ix. PetaByte (PB): 1 PB = 1024 TB (1,125,899,906,842,624 bytes)
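The power-of-two units listed above can be expressed as a small conversion helper (a sketch; the function name `to_bytes` is made up for this example):

```python
# Power-of-two memory units, as defined in the list above.
UNITS = {"KB": 1024, "MB": 1024**2, "GB": 1024**3,
         "TB": 1024**4, "PB": 1024**5}

def to_bytes(value, unit):
    """Convert a value in the given unit to a byte count."""
    return value * UNITS[unit]

print(to_bytes(1, "MB"))  # 1048576
print(to_bytes(1, "GB"))  # 1073741824
print(to_bytes(1, "PB"))  # 1125899906842624
```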
In computer organisation, the memory hierarchy separates computer storage into a hierarchy
based on response time. Since response time, complexity, and capacity are related, the levels
may also be distinguished by their performance and controlling technologies. Memory
hierarchy affects performance in computer architectural design, algorithm predictions, and
lower-level programming constructs involving locality of reference. Designing for high
performance requires considering the restrictions of the memory hierarchy, i.e. the size and
capabilities of each component. Each of the various components can be viewed as part of a
hierarchy of memories (m1, m2, ..., mn) in which each member mi is typically smaller and
faster than the next member mi+1 of the hierarchy. To limit waiting by higher levels, a lower level responds by filling a buffer and then signalling that the transfer is ready.
There are four major storage levels.
i. Internal – Processor registers and cache.
ii. Main – the system RAM and controller cards.
iii. On-line mass storage – Secondary storage.
iv. Off-line bulk storage – Tertiary and Off-line storage.
This is a general memory hierarchy structuring. Many other structures are useful. For example,
a paging algorithm may be considered as a level for virtual memory when designing a
computer architecture, and one can include a level of nearline storage between online and
offline storage.
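One way to picture the four storage levels is as a table of sizes and access times. The figures below are order-of-magnitude assumptions for illustration only; real values vary widely between systems.

```python
# Assumed, order-of-magnitude figures per hierarchy level:
# (level name, typical capacity, typical access time)
hierarchy = [
    ("Registers",             "~1 KB", "~1 ns"),
    ("Cache (SRAM)",          "~1 MB", "~1-10 ns"),
    ("Main memory (DRAM)",    "~8 GB", "~100 ns"),
    ("Disk (secondary)",      "~1 TB", "~10 ms"),
]

# Each step down the hierarchy trades speed for capacity.
for level, size, latency in hierarchy:
    print(f"{level:20s} {size:>8s} {latency:>10s}")
```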
Computing systems use memory very heavily, and so performance depends in part on
efficiently accessing memory. In this document, we investigate how modern systems reduce
memory access time using caches. Today's computing systems include four important
technologies for remembering information.
Disks use a magnetic material and represent a bit using the magnetic polarity at an individual
location on the disk. Although some disks are portable, like the old floppy disks, we'll use the
word disk to refer to a computer's hard drive.
Flash memory uses special transistors that can hold their state even in the absence of any
power. Like disks, then, flash memory retains stored values even when power is off; this
distinguishes it from the other two technologies (DRAM and SRAM), though in this document
we're not concerned about this property.
DRAM (for Dynamic Random Access Memory) uses electrical capacitors for holding data. A
capacitor holds an electric charge only briefly, so storing a bit long-term requires recharging each depleted capacitor many times a second. DRAM is so central to a computer's operation that the terms memory and RAM almost always refer to DRAM. (Manufacturers
often distinguish between varieties of DRAM. The S in SDRAM, for example, stands for
synchronous.)
SRAM (for Static Random Access Memory) is built using transistors, working on the same
principle as an SR latch built out of two NOR gates (or two NAND gates).
Most modern CPUs are so fast that for most program workloads, the bottleneck is the locality
of reference of memory accesses and the efficiency of the caching and memory transfer
between different levels of the hierarchy. As a result, the CPU spends much of its time idling,
waiting for memory I/O to complete. This is sometimes called the space cost, as a larger
memory object is more likely to overflow a small/fast level and require use of a larger/slower
level. The resulting load on memory use is known as pressure (respectively register pressure,
cache pressure, and (main) memory pressure). Terms for data being missing from a higher
level and needing to be fetched from a lower level are, respectively: register spilling (due to
register pressure: register to cache), cache miss (cache to main memory), and (hard) page
fault (main memory to disk).
Modern programming languages mainly assume two levels of memory, main memory and disk
storage, though in assembly language and inline assemblers in languages such as C, registers
can be directly accessed. Taking optimal advantage of the memory hierarchy requires the
cooperation of programmers, hardware, and compilers (as well as underlying support from the
operating system).
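The locality of reference mentioned above can be made concrete with a small sketch: traversing a 2-D array row by row touches elements in the order they sit in memory, while column-by-column traversal strides across rows. (In pure Python the timing effect is muted; in lower-level languages the row-major order is markedly faster on typical caches.)

```python
N = 500
grid = [[1] * N for _ in range(N)]

def row_major_sum():
    # Good spatial locality: consecutive elements of each row are
    # adjacent, so each fetched cache line is fully used.
    return sum(grid[i][j] for i in range(N) for j in range(N))

def column_major_sum():
    # Poor spatial locality: each step jumps to a different row,
    # wasting most of every fetched cache line.
    return sum(grid[i][j] for j in range(N) for i in range(N))

# Both orders compute the same total; only access order differs.
assert row_major_sum() == column_major_sum() == N * N
```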
8.2 Cache memory and its types
A CPU cache is a piece of hardware that reduces access time to data in memory by keeping
some part of the frequently used data of the main memory in a 'cache' of smaller and faster
memory.
A CPU cache is a hardware cache used by the central processing unit (CPU) of a computer
to reduce the average cost (time or energy) to access data from the main memory.
A cache is a smaller, faster memory, located closer to a processor core, which stores copies
of the data from frequently used main memory locations. Most CPUs have a hierarchy of
multiple cache levels (L1, L2, often L3, and rarely even L4), with different instruction-specific
and data-specific caches at level 1.
Cache memory is typically implemented with static random-access memory (SRAM), which in modern CPUs accounts for by far the largest share of chip area. SRAM is not always used for every level (of instruction or data cache), however; some or all levels may instead be implemented with eDRAM.
Caches are faster than main memory, but they are also much more expensive, so they are used only in small sizes. For example, caches of 128 KB or 256 KB are typical in Pentium-based systems, whereas the same systems may have 4 to 128 MB of RAM or more. Memory cache is a portion of high-speed static RAM (SRAM) and is effective because most programs access the same data or instructions over and over. By keeping as much of this information as possible in SRAM, the computer avoids accessing the slower DRAM, making it perform faster and more efficiently.
Today, most computers come with L2 and L3 caches, while older computers included only L1 cache. The Intel i7, for example, pairs per-core caches with a shared L3 cache.
Cache memory is fast and expensive. Traditionally, it is categorized as "levels" that describe
its closeness and accessibility to the microprocessor:
Level 1 (L1) cache is extremely fast but relatively small, and is usually embedded in the
processor chip (CPU).
Level 2 (L2) cache is often more capacious than L1; it may be located on the CPU or on a
separate chip or coprocessor with a high-speed alternative system bus interconnecting the
cache to the CPU, so as not to be slowed by traffic on the main system bus.
Level 3 (L3) cache is typically specialized memory that works to improve the performance of
L1 and L2. It can be significantly slower than L1 or L2, but is usually double the speed of RAM.
In the case of multicore processors, each core may have its own dedicated L1 and L2 cache,
but share a common L3 cache. When an instruction is referenced in the L3 cache, it is typically
elevated to a higher tier cache.
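The hits, misses, and evictions discussed in the rest of this section can be illustrated with a toy direct-mapped cache (a sketch, not any particular CPU's design): each memory block maps to exactly one cache line, chosen by block number modulo the number of lines, and a tag distinguishes blocks that share a line.

```python
class DirectMappedCache:
    """Toy direct-mapped cache: block -> line (block % num_lines);
    the stored tag identifies which block currently occupies a line."""

    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.tags = [None] * num_lines
        self.hits = 0
        self.misses = 0

    def access(self, block):
        index = block % self.num_lines
        tag = block // self.num_lines
        if self.tags[index] == tag:
            self.hits += 1
            return "hit"
        self.tags[index] = tag  # fetch the block, evicting any occupant
        self.misses += 1
        return "miss"

cache = DirectMappedCache(num_lines=4)
for block in [0, 1, 0, 4, 0]:
    cache.access(block)

# Blocks 0 and 4 both map to line 0, so they keep evicting each other.
print(cache.hits, cache.misses)  # 1 4
```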
The performance of a computer system depends on the performance of all individual units—
which include execution units like integer, branch and floating point, I/O units, bus, caches and
memory systems. Cache performance measurement has become important in recent times
where the speed gap between the memory performance and the processor performance is
increasing exponentially. The cache was introduced to reduce this speed gap. Thus knowing
how well the cache is able to bridge the gap in the speed of processor and memory becomes
important, especially in high-performance systems.
The cache hit rate and the cache miss rate play an important role in determining this
performance. To improve the cache performance, reducing the miss rate becomes one of the
necessary steps among other steps. Decreasing the access time to the cache also gives a
boost to its performance and helps with optimization.
The critical component in most high-performance computers is the cache. Since the cache
exists to bridge the speed gap, its performance measurement and metrics are important in
designing and choosing various parameters like cache size, associativity, replacement policy,
etc.
Cache performance depends on cache hits and cache misses, which are the factors that
create constraints to system performance. Cache hits are the number of accesses to the cache
that actually find that data in the cache, and cache misses are those accesses that don't find
the block in the cache. These cache hits and misses contribute to the term average access
time (AAT) also known as AMAT (average memory access time), which, as the name
suggests, is the average time it takes to access the memory. This is one major metric for
cache performance measurement, because this number becomes highly significant and
critical as processor speed increases.
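For a single cache level, AMAT is commonly written as hit time + miss rate × miss penalty: every access pays the hit time, and misses additionally pay the penalty of going to the next level. The numbers below are assumed for illustration only.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time for a single cache level:
    every access pays hit_time; a fraction miss_rate of accesses
    additionally pays miss_penalty to reach the next level."""
    return hit_time + miss_rate * miss_penalty

# Assumed figures: 1 ns hit time, 5% miss rate, 100 ns miss penalty.
print(amat(hit_time=1.0, miss_rate=0.05, miss_penalty=100.0))  # 6.0
```

Halving the miss rate to 2.5% drops the AMAT to 3.5 ns in this sketch, which is why reducing misses dominates cache optimization effort.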