
University Institute of Engineering
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
Bachelor of Engineering (Computer Science & Engineering)
Subject Name: Computer Organization & Architecture
Subject Code: CST-252

INPUT OUTPUT ORGANIZATION

DISCOVER . LEARN . EMPOWER


Topics covered
• Asynchronous Data Transfer:
  • Source Initiated
  • Destination Initiated
  • Handshaking
• Programmed I/O
• Interrupts, DMA, and IOP
Memory Hierarchy in Computer Architecture
• The memory in a computer can be divided into five hierarchies based on speed and use.
• The processor moves from one level to another based on its requirements.
• The five hierarchies in the memory are registers, cache, main memory, magnetic disks, and magnetic tapes.
• The first three hierarchies are volatile memories, which means they lose their stored data when power is removed.
• The last two hierarchies are non-volatile, which means they store data permanently.
Memory Hierarchy in Computer Architecture
(Figure: the memory hierarchy pyramid: registers, cache, main memory, magnetic disk, magnetic tape)
Memory Hierarchy in Computer Architecture
• Primary Memory
• The primary memory is also known as internal memory, and it is directly accessible by the processor. This memory includes main memory, cache, and CPU registers.
• Secondary Memory
• The secondary memory is also known as external memory, and it is accessible by the processor through an input/output module. This memory includes optical disks, magnetic disks, and magnetic tape.
Characteristics of Memory Hierarchy
• Performance
• Capacity
• Access Time
• Cost per bit
Memory Access Methods
• Random Access
• Sequential Access
• Direct Access
Cache Memory
• The data or contents of the main memory that are used again and again by the CPU are stored in the cache memory, so that the data can be accessed in a shorter time.
• Whenever the CPU needs to access memory, it first checks the cache memory. If the data is not found in the cache, the CPU moves on to the main memory. It also transfers blocks of recent data into the cache, deleting old data in the cache to accommodate the new.
• Cache memory is a random access memory.
• It lies on the path between the processor and the main memory.
• It bridges the speed mismatch between the fast processor and the slower main memory.
Hit Ratio
• The performance of cache memory is measured in terms of a quantity called the hit ratio.
• When the CPU refers to memory and finds the word in the cache, it is said to produce a hit.
• If the word is not found in the cache and has to be fetched from main memory, it counts as a miss.
• The ratio of the number of hits to the total number of CPU references to memory is called the hit ratio.

Hit Ratio = Hits / (Hits + Misses)
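As a quick illustration, the sketch below evaluates the hit ratio formula for hypothetical hit and miss counts; the numbers are made up for the example.

```c
#include <stdio.h>

int main(void) {
    /* Hypothetical access counts, chosen only for illustration. */
    long hits = 950, misses = 50;
    double hit_ratio = (double)hits / (hits + misses);
    printf("Hit ratio = %.2f\n", hit_ratio); /* prints 0.95 */
    return 0;
}
```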
Types of Cache Memory
(Figure: the three cache levels L1, L2 and L3 between the CPU and main memory)
Types of Cache Memory
• The L1 cache, also known as the primary cache, is very fast but relatively small. It is usually embedded in the processor chip as the CPU cache.
• L1 is the primary type of cache memory.
• The size of the L1 cache is very small compared to the other levels, between 2 KB and 64 KB, depending on the processor.
• It is embedded in the computer microprocessor (CPU).
• Instructions required by the CPU are searched first in the L1 cache.
Types of Cache Memory
• The secondary cache, also known as the L2 cache, is usually more capacious than the L1 cache.
• L2 is the secondary type of cache memory.
• The size of the L2 cache is larger than L1, typically between 256 KB and 512 KB.
• The L2 cache is located on the computer microprocessor.
• If an instruction is not found in the L1 cache, the microprocessor then searches the L2 cache.
• A high-speed system bus interconnects the cache to the microprocessor.
Types of Cache Memory
• The Level 3 (L3) cache is a specialized memory developed to improve the performance of the L1 and L2 caches.
• The L3 cache is larger in size but slower than L1 and L2.
• Its size is between 1 MB and 8 MB.
• In multi-core processors, each core may have its own L1 and L2 caches, but all cores share a common L3 cache.
• The L3 cache is roughly double the speed of RAM.
Need for Cache Memory
• The need for cache memory arises from the mismatch between the speeds of the main memory and the CPU.
• It acts as a high-speed buffer between the CPU and main memory and is used to temporarily store very active data.
Cache Replacement Algorithms
• Caching
• Caching is the process of storing data near where it is going to be used, rather than fetching it from an expensive origin every time a request comes in.
Cache Replacement Algorithms
• LRU
• LRU (Least Recently Used) keeps the most recently used objects at the front of the list and, when the list reaches maximum capacity, evicts the objects that have not been used in a while.
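The sketch below is a minimal, illustrative LRU cache in C: a fixed-size array kept in recency order, with the least recently used key evicted from the tail. A production cache would pair this with a hash map for O(1) lookup; the capacity and key values here are invented for the example.

```c
#include <stdio.h>

#define CAPACITY 4

static int cache[CAPACITY];   /* index 0 = most recently used */
static int count = 0;

void touch(int key) {
    int i, j;
    for (i = 0; i < count; i++)
        if (cache[i] == key) break;          /* hit at position i */
    if (i == count) {                        /* miss */
        if (count < CAPACITY) count++;
        else printf("evict %d\n", cache[CAPACITY - 1]); /* LRU victim */
        i = count - 1;
    }
    for (j = i; j > 0; j--)                  /* shift others down */
        cache[j] = cache[j - 1];
    cache[0] = key;                          /* key is now the MRU */
}

int main(void) {
    int trace[] = {1, 2, 3, 4, 1, 5};        /* accessing 5 evicts 2 */
    for (int k = 0; k < 6; k++) touch(trace[k]);
    return 0;
}
```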
Cache Replacement Algorithms
• LFU
• The Least Frequently Used (LFU) algorithm works like LRU, except that it tracks how many times an object was accessed instead of how recently it was accessed.
• Each object has a counter recording how many times it has been accessed.
• When the list reaches maximum capacity, the objects with the lowest counters are evicted.
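A minimal LFU sketch, assuming a small fixed capacity: keys and access counters live in parallel arrays, and the smallest counter picks the victim. Illustration only; all names and values are invented.

```c
#include <stdio.h>

#define CAPACITY 4

static int keys[CAPACITY];
static int counts[CAPACITY];
static int used = 0;

/* Record an access: bump the counter on a hit; on a miss, insert the
   key, evicting the least frequently used entry if the cache is full. */
void touch(int key) {
    for (int i = 0; i < used; i++)
        if (keys[i] == key) { counts[i]++; return; }   /* hit */
    int slot = used;
    if (used < CAPACITY) {
        used++;
    } else {
        slot = 0;                                      /* find LFU victim */
        for (int i = 1; i < CAPACITY; i++)
            if (counts[i] < counts[slot]) slot = i;
        printf("evict key %d (count %d)\n", keys[slot], counts[slot]);
    }
    keys[slot] = key;
    counts[slot] = 1;
}

int main(void) {
    int trace[] = {1, 1, 2, 3, 4, 5}; /* 5 evicts key 2 (first lowest count) */
    for (int k = 0; k < 6; k++) touch(trace[k]);
    return 0;
}
```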
Cache Replacement Algorithms
• FIFO
• FIFO (First In, First Out) evicts the object that has been in the cache the longest, regardless of how recently or how often it has been accessed.
Random Replacement (RR)
• This algorithm randomly selects an object to evict when the cache reaches maximum capacity.
• It has the benefit of not keeping any reference or history of objects, and it is very simple to implement.
Write Policy
• A cache's write policy is the behavior of the cache while performing a write operation. The main policies are:
• write-through
• write-around
• write-back
Write-through
• In a write-through cache, every write updates both the cache and the backing store together, so the two never disagree.
(Figure: write-through: the write is applied to the cache and the backing store at the same time)
Write-around
• If we do not expect a read operation shortly after the write, the cache would become polluted with entries we are not using.
• To avoid cache pollution, we may bypass cache entry allocation on a cache miss and write directly to the backing store.
Write-back / write-behind
• In a write-back cache, we return the response before updating the backing store.
• The backing store update happens asynchronously, in a separate sequence.
• For CPU caches, a dirty bit is used as a state indicator, marking lines that have been modified but not yet written back.
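A compact sketch contrasting the three write policies, assuming a single-entry "cache" and a plain array as the backing store; the dirty-bit handling mirrors the write-back description above. Everything here is illustrative.

```c
#include <stdio.h>
#include <stdbool.h>

static int backing_store[16];             /* stand-in for main memory */
static int cached_addr = -1, cached_val;  /* a one-entry "cache"      */
static bool dirty = false;

/* Write-through: update the cache and the backing store together. */
void write_through(int addr, int val) {
    cached_addr = addr; cached_val = val;
    backing_store[addr] = val;
}

/* Write-back: update only the cache and mark it dirty; the backing
   store is updated later, when the dirty entry is evicted. */
void write_back(int addr, int val) {
    if (dirty && cached_addr != addr)
        backing_store[cached_addr] = cached_val;   /* flush old entry */
    cached_addr = addr; cached_val = val; dirty = true;
}

/* Write-around: bypass the cache entirely to avoid polluting it. */
void write_around(int addr, int val) {
    backing_store[addr] = val;
}

int main(void) {
    write_through(1, 10);
    write_back(2, 20);       /* backing_store[2] stays stale until a flush */
    write_around(3, 30);
    printf("%d %d %d\n", backing_store[1], backing_store[2], backing_store[3]);
    return 0;                /* prints: 10 0 30 */
}
```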
Cache Size and Cache Lines
• Cache Size:
• The "size" of the cache is the amount of main memory data it can hold. It can be calculated as the number of bytes stored in each data block times the number of blocks stored in the cache.
• Cache Lines:
• Cache memory is divided into equal-size partitions called cache lines.
Effect of Changing Block Size
• Case 01: Decreasing the Block Size
• A smaller block size contains a smaller number of nearby addresses.
• Thus, fewer nearby addresses are brought into the cache.
• This increases the chance of a cache miss, which reduces the exploitation of spatial locality.
• Thus, the smaller the block size, the poorer the spatial locality.

• Case 02: Increasing the Block Size
• A larger block size contains a larger number of nearby addresses.
• Thus, more nearby addresses are brought into the cache.
• This increases the chance of a cache hit, which increases the exploitation of spatial locality.
(Figure: effect of increasing or decreasing the block size)
Cache Mapping:
• The main memory is divided into multiple partitions of equal size, known as frames or blocks.
• The cache memory is divided into partitions of the same size as the blocks, known as lines.
• During cache mapping, the main memory block is simply copied to a cache line; the block is not removed from the main memory.
Cache Mapping:
• There are three different types of mapping used for cache memory:
• Direct mapping
• Associative mapping
• K-way set-associative mapping
Direct Mapping
• In direct mapping, the cache consists of normal high-speed random-access memory.
• Each location in the cache holds data at a specific address in the cache.
• This address is given by the lower significant bits of the main memory address.
• This enables the block to be selected directly from the lower significant bits of the memory address.
• The remaining higher significant bits of the address are stored in the cache along with the data, to complete the identification of the cached data.
Direct Mapping
(Figure: direct-mapped cache organization: the address is split into tag, index and word)
Direct Mapping
• The index is first used to access a word in the cache.
• The tag stored in the accessed word is read.
• This tag is then compared with the tag in the address.
• If the two tags are the same, this indicates a cache hit, and the required data is read from the cache word.
• If the two tags are not the same, this indicates a cache miss, and a reference is made to the main memory to find the data.
Direct Mapping
(Figure: direct mapping with multi-word lines)
Direct Mapping
• In such a case, the main memory address consists of a tag, an index, and a word within a line. All the words within a line in the cache have the same stored tag.
• The index part of the address is used to access the cache, and the stored tag is compared with the required tag address.
• For a read operation, if the tags are the same, the word within the block is selected for transfer to the processor. If the tags are not the same, the block containing the required word is first transferred to the cache.
Direct Mapping
In direct mapping, the cache consists of normal high-speed random-access memory. A given block of the main memory can map only to one particular line of the cache. The cache line to which a block maps is given by:

Cache line number = (Main Memory Block Address) mod (Total number of lines in Cache)

For example, consider a cache memory divided into a total of 'n' lines.
Then block 'j' of the main memory can map only to line number (j mod n) of the cache.
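The sketch below mirrors this formula: it splits a main-memory block address into the cache line (index) it maps to and the tag that must be stored alongside the data. The cache size and block addresses are made-up illustration values.

```c
#include <stdio.h>

int main(void) {
    unsigned n_lines = 8;                 /* cache lines; a power of two */
    unsigned block_addrs[] = {3, 11, 19}; /* all three map to line 3     */

    for (int i = 0; i < 3; i++) {
        unsigned j = block_addrs[i];
        unsigned line = j % n_lines;      /* index = low-order bits      */
        unsigned tag  = j / n_lines;      /* tag   = remaining high bits */
        printf("block %2u -> line %u, tag %u\n", j, line, tag);
    }
    return 0;
}
```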
The Need for a Replacement Algorithm
In the case of direct mapping:
• There is no requirement for a replacement algorithm.
• This is because a block of the main memory can map to only one particular line of the cache.
• Thus, the incoming (new) block always replaces the block that already exists, if any, in that particular line.

Division of Physical Address
In the case of direct mapping, the physical address is divided as follows:
TAG | LINE NUMBER | WORD OFFSET
Set Associative Mapping
Set Associative Mapping
• In set associative mapping, the cache is divided into sets of blocks.
• The number of blocks in a set is known as the associativity or set size.
• Each block in each set has a stored tag. This tag, together with the index, completely identifies the block.
• Thus, set associative mapping allows a limited number of blocks with the same index and different tags.
Set Associative Mapping
• In this type of cache, the following steps are used to access data:
• The index of the address from the processor is used to access the set.
• Comparators are then used to compare all tags of the selected set with the incoming tag.
• If a match is found, the corresponding location is accessed.
• If no match is found, an access is made to the main memory.
Fully Associative Mapping
In the case of fully associative mapping:
• A main memory block can map to any line of the cache that is freely available at that moment.
• This makes fully associative mapping more flexible than direct mapping.

Consider the following scenario:
• Every single line of the cache is freely available.
• Thus, any main memory block can map to any line of the cache.
• In case all the cache lines are occupied, one of the existing blocks needs to be replaced.

The Need for a Replacement Algorithm
In the case of fully associative mapping:
• A replacement algorithm is always required.
• The replacement algorithm suggests a block to be replaced whenever all the cache lines happen to be occupied.
• Replacement algorithms such as the LRU algorithm, the FCFS algorithm, etc. are employed.

Division of Physical Address
In the case of fully associative mapping, the physical address is divided as follows:
TAG | WORD OFFSET
Fully Associative Mapping
In a fully associative cache memory, each location in the cache stores both the memory address and the data.
Fully Associative Mapping
(Figure: fully associative cache: the incoming tag is compared with every stored tag)
Fully Associative Mapping
• Suppose a line constitutes four words, each word being 4 bytes.
• In that case, the least significant part of the address selects the particular byte, the next part selects the word, and the remaining bits form the address that is compared against the addresses stored in the cache.
• The whole line can be transferred to and from the cache in one transaction if there are sufficient data paths between the main memory and the cache.
• With only one data word path, the words of the line have to be transferred in separate transactions.
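A minimal sketch of the fully associative lookup: in hardware every stored tag is compared in parallel, which the loop below only emulates sequentially. The cache contents are invented for the example.

```c
#include <stdio.h>

#define LINES 4

/* Each cache line stores a full tag alongside its data. */
struct line { unsigned tag; int data; int valid; };

static struct line cache[LINES] = {
    {0x1A, 111, 1}, {0x2B, 222, 1}, {0x3C, 333, 1}, {0, 0, 0}
};

/* Emulates the parallel tag comparison of an associative memory:
   any line may hold any block, so every tag must be checked. */
int lookup(unsigned tag, int *out) {
    for (int i = 0; i < LINES; i++) {
        if (cache[i].valid && cache[i].tag == tag) {
            *out = cache[i].data;
            return 1;                      /* hit */
        }
    }
    return 0;                              /* miss: go to main memory */
}

int main(void) {
    int v;
    printf("tag 0x2B: %s\n", lookup(0x2B, &v) ? "hit" : "miss");
    printf("tag 0x4D: %s\n", lookup(0x4D, &v) ? "hit" : "miss");
    return 0;
}
```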
K-way Set Associative Mapping
In the case of k-way set associative mapping:
• The cache lines are grouped into sets, where each set consists of k lines.
• A given main memory block can map only to one particular cache set.
• However, within that set, the memory block can map to any cache line that is freely available.
• The cache set to which a main memory block maps is given by:

Cache set number = (Main Memory Block Address) mod (Total number of sets in the Cache)

Consider the following example of two-way set-associative mapping:
• k = 2 means that every set consists of two cache lines.
• Since the cache consists of 6 lines, the total number of sets in the cache = 6/2 = 3 sets.
• Block 'j' of the main memory can map only to set number (j mod 3) of the cache.
• Within that set, block 'j' can map to any cache line that is freely available at that moment.
• If all the available cache lines are occupied, one of the existing blocks needs to be replaced.
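The sketch below works through the two-way example above in code: with 6 lines and k = 2 there are 3 sets, and each block address picks its set by a modulo. The values come from the example; the loop bound is arbitrary.

```c
#include <stdio.h>

int main(void) {
    unsigned lines = 6, k = 2;
    unsigned sets = lines / k;            /* 3 sets, as in the example */

    for (unsigned j = 0; j < 8; j++) {    /* a few main-memory blocks  */
        unsigned set = j % sets;          /* set number = j mod 3      */
        printf("block %u -> set %u (any of its %u lines)\n", j, set, k);
    }
    return 0;
}
```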
The Need for a Replacement Algorithm
In the case of k-way set associative mapping:
• K-way set associative mapping is a combination of direct mapping and fully associative mapping.
• It uses fully associative mapping within each set.
• Therefore, k-way set associative mapping needs a replacement algorithm.

Division of Physical Address
In the case of k-way set associative mapping, the physical address is divided as follows:
TAG | SET NUMBER | WORD OFFSET
Special Cases
• If k = 1, k-way set associative mapping becomes direct mapping. Thus, Direct Mapping = one-way set associative mapping.
• If k = the total number of lines present in the cache, k-way set associative mapping becomes fully associative mapping.
Fragmentation
• What is Fragmentation?
• Fragmentation refers to a condition of information storage in which the memory space of the system is used inefficiently, reducing the overall capacity or performance, or sometimes both. The implications of fragmentation depend on the specific storage allocation scheme in operation and on the particular type of fragmentation. In some instances, fragmentation leads to unused storage capacity, and the concept also applies to the unused space generated in that situation.
• The memory used for the preservation of a data set (such as file formats) behaves similarly in other systems (such as the FAT file system), irrespective of the amount of fragmentation, which can range from none to extreme.
Types of Fragmentation
• External Fragmentation
• Internal Fragmentation

• Fragmentation is often noticed as reduced usability or efficiency when allocating storage. Similar effects can also occur for other resources, such as processors. Let us discuss the major difference between internal and external fragmentation.
Fragmentation
• Internal Fragmentation
• Whenever a memory block is allocated to a process, and the process happens to be smaller than the total amount of memory requested, free space is created inside the memory block. This free space within the block goes unused, which is what causes internal fragmentation.
• External Fragmentation
• External fragmentation occurs when a dynamic memory allocation method allocates some memory and leaves small pieces of unusable memory behind. If there is too much external fragmentation, the total quantity of usable memory is reduced substantially. There may be enough total memory space to satisfy a request, but it is not contiguous; this is known as external fragmentation.
What is virtual memory?
• Virtual memory, also called virtual storage, is a technique that computers use to optimize memory management by transferring data between different storage systems, such as random access memory (RAM) and disk storage. This memory system is a built-in component of most modern desktop computers. It is supported by the hardware of a computer's central processing unit (CPU) and is a more cost-effective method of managing memory than expanding the computer's physical memory. Other advantages of this memory system include:
• It frees applications from having to compete for shared memory space.
• It allows processes to share memory through libraries (collections of code that provide the foundation for a program's operations).
• It improves security by isolating and segmenting where the computer stores information.
• It increases the amount of memory available by working beyond the limits of the computer's physical memory space.
• It optimizes CPU usage.
• It allows multiple applications to be managed at the same time.
Memory Management Unit
(Figure: the MMU translating virtual addresses from the CPU into physical addresses)
Virtual Memory
• In most computer systems, the physical main memory is not as large as the address space of the processor.
• When we try to run a program that does not completely fit into the main memory, the parts currently being executed are stored in main memory and the remaining portion is stored in a secondary storage device such as an HDD.
• When a new part of the program is to be brought into main memory for execution and the memory is full, it must replace another part that is already in main memory.
• As this secondary memory is not actually part of the system memory, to the CPU the secondary memory appears as virtual memory.
• Virtual memory is used to logically extend the size of main memory.
• When virtual memory is used, the address field holds a virtual address.
• A special hardware unit known as the MMU translates virtual addresses into physical addresses.
• Virtual memory is an imaginary memory that we assume, or use, when we have material that exceeds the available physical memory.
• Virtual memory is temporary memory that is used along with the RAM of the system.
Virtual Memory
Virtual memory is a storage scheme that gives the user the illusion of having a very big main memory. This is done by treating a part of secondary memory as if it were main memory.
In this scheme, the user can load processes bigger than the available main memory, under the illusion that enough memory is available to load the process.
Instead of loading one big process into the main memory, the operating system loads different parts of more than one process into the main memory.
By doing this, the degree of multiprogramming is increased, and therefore the CPU utilization also increases.

How Virtual Memory Works
In this scheme, whenever some pages need to be loaded into the main memory for execution and the memory is not available for that many pages, then instead of stopping the pages from entering the main memory, the OS searches for the RAM areas that have been least recently used, or are not referenced, and copies them into the secondary memory to make space for the new pages in the main memory.
Since this whole procedure happens automatically, it makes the computer feel like it has unlimited RAM.
Demand Paging
Demand paging is a popular method of virtual memory management. In demand paging, the pages of a process that are least used get stored in the secondary memory.
A page is copied to the main memory when demand for it is made, i.e., when a page fault occurs. There are various page replacement algorithms that are used to determine which pages will be replaced.

Advantages of Virtual Memory
• The degree of multiprogramming is increased.
• Users can run large applications with less physical RAM.
• There is no need to buy more RAM.

Disadvantages of Virtual Memory
• The system becomes slower, since swapping takes time.
• It takes more time to switch between applications.
• The user has less hard disk space available.
Virtual Memory
• Address translation is done by two techniques:
• Paging
• Segmentation
Paging
• Paging is a non-contiguous memory allocation technique in which both secondary memory and main memory are divided into equal-size partitions.
• The partitions of the secondary memory are called pages, while the partitions of the main memory are called frames.
• They are divided into equal-size partitions to maximize utilization of the main memory and to avoid external fragmentation.
Paging
• Example: Consider a process P with a process size of 4 B and a page size of 1 B. There will therefore be four pages (say P0, P1, P2, P3), each of size 1 B. When this process goes into the main memory for execution, then depending upon availability, it may be stored in a non-contiguous fashion in the main memory frames.
Paging
In operating systems, paging is a storage mechanism used to retrieve processes from secondary storage into the main memory in the form of pages.
The main idea behind paging is to divide each process into pages. The main memory is likewise divided into frames.
One page of the process is stored in one of the frames of the memory. The pages can be stored at different locations in memory, but the priority is always to find contiguous frames or holes.
Translation of Logical Address into Physical Address
• The CPU always generates a logical address, but a physical address is needed to access the main memory. This mapping is done by the MMU (Memory Management Unit) with the help of the page table. Let us first understand some basic terms, then see how this translation is done.
• Logical Address: The logical address consists of two parts, the page number and the page offset.
• 1. Page Number: It tells the exact page of the process which the CPU wants to access.
• 2. Page Offset: It tells the exact word on that page which the CPU wants to read.
• 3. Logical Address = Page Number + Page Offset
Translation of Logical Address into Physical Address
• Physical Address: The physical address consists of two parts, the frame number and the page offset.
• 1. Frame Number: It tells the exact frame where the page is stored in physical memory.
• 2. Page Offset: It tells the exact word on that page which the CPU wants to read. It requires no translation, as the page size is the same as the frame size, so the position of the word the CPU wants to access does not change.
• Physical Address = Frame Number + Page Offset
Page Table
• A page table contains the frame number corresponding to each page number of a specific process, so each process has its own page table. A register called the Page Table Base Register (PTBR) holds the base address of the page table.
Translation Process
• The CPU generates the logical address, which contains the page number and the page offset.
• The PTBR register contains the address of the page table. The page table is then used to determine the frame number corresponding to the page number.
• With the frame number and the page offset, the physical address is determined and the page is accessed in the main memory.
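A minimal sketch of this translation, assuming a tiny page size and an in-memory array as the page table; real MMUs do this in hardware, and all sizes and table contents below are invented for illustration.

```c
#include <stdio.h>

#define PAGE_SIZE 256          /* bytes per page (and per frame) */

/* page_table[p] gives the frame that holds page p of this process. */
static unsigned page_table[] = {5, 2, 7, 0};

unsigned translate(unsigned logical) {
    unsigned page   = logical / PAGE_SIZE;   /* page number       */
    unsigned offset = logical % PAGE_SIZE;   /* page offset       */
    unsigned frame  = page_table[page];      /* page-table lookup */
    return frame * PAGE_SIZE + offset;       /* physical address  */
}

int main(void) {
    unsigned logical = 2 * PAGE_SIZE + 17;   /* page 2, offset 17 */
    printf("logical %u -> physical %u\n", logical, translate(logical));
    return 0;                                /* frame 7 -> 7*256+17 = 1809 */
}
```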
Translation Process
(Figure: logical-to-physical address translation through the page table)
Advantages / Disadvantages
• Advantages of Paging
• There is no external fragmentation, as paging allows us to store data in a non-contiguous way.
• Swapping is easy between equal-sized pages and frames.
• Disadvantages of Paging
• As the size of the frame is fixed, paging may suffer from internal fragmentation: the process may be too small to occupy the entire frame.
• The access time increases because of paging, as the main memory now has to be accessed twice. First we access the page table, which is itself stored in the main memory, and second we combine the frame number with the page offset and access the physical address of the page, which is again in the main memory.
• Every process has its own page table, and maintaining the page table is extra overhead.
Segmentation
Segmentation is a memory management technique whereby data items are stored in segments on the storage media. It divides the process, or user-accessible space, into variable-sized blocks called segments.

Segmentation splits up the virtual address space of an application into chunks. By splitting the memory up into manageable chunks, the operating system can track which parts of memory are in use and which parts are free. This makes allocating and deallocating memory much faster and simpler for the operating system.

The segments are of unequal size and are not placed in a contiguous way. As it is a non-contiguous memory allocation technique, internal fragmentation does not occur. The length of a segment is decided on the basis of the purpose of the segment in the user program.
Advantages of Segmentation
• No internal fragmentation.
• The average segment size is larger than the actual page size.
• Less overhead.
• It is easier to relocate segments than the entire address space.
• The segment table is smaller than the page table in paging.
Disadvantages of Segmentation
• It can have external fragmentation.
• It is difficult to allocate contiguous memory to variable-sized partitions.
• Costly memory management algorithms.
Segmentation
• In segmentation, we divide the process into modules for better visualization of the process. Each segment or module consists of the same type of functions.
• For example, the main function is included in one segment, library functions are kept in another segment, and so on.
• As the size of segments may vary, memory is divided into variable-sized parts.
Segmentation
• The CPU always generates a logical address, but a physical address is needed to access the main memory. This mapping is done by the MMU (Memory Management Unit) with the help of the segment table.
• Let us first understand some basic terms, then see how this translation is done.
• Logical Address: The logical address consists of two parts, the segment number and the segment offset.
• 1. Segment Number: It tells the specific segment of the process from which the CPU wants to read data.
Segmentation
• 2. Segment Offset: It tells the exact word in that segment which the CPU wants to read.
• 3. Logical Address = Segment Number + Segment Offset
• Physical Address: The physical address is obtained by adding the base address of the segment to the segment offset.
• Segment Table: A segment table stores the base address of each segment in the main memory. It has two parts, the Base and the Limit. Here, the base indicates the base (starting) address of the segment in the main memory, and the limit tells the size of that segment. A register called the Segment Table Base Register (STBR) holds the base address of the segment table. The segment table itself is also stored in the main memory.
Translation Process
• The CPU generates the logical address, which contains the segment number and the segment offset.
• The STBR register contains the address of the segment table.
• The segment table is used to determine the base address of the segment corresponding to the segment number.
• The segment offset is then compared with the limit corresponding to that base.
• If the segment offset is greater than the limit, it is an invalid address: the CPU is trying to access a word beyond the size of the segment, which is not possible. Only if the segment offset is less than or equal to the limit is the request accepted. The physical address is generated by adding the base address of the segment to the segment offset.
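A minimal sketch of this base-and-limit translation; the segment table contents are invented, and the limit check mirrors the steps above.

```c
#include <stdio.h>

/* A segment-table entry holds a base address and a limit (size). */
struct segment { unsigned base, limit; };

/* Hypothetical segment table for one process. */
static struct segment seg_table[] = {
    {1000, 300},   /* segment 0 */
    {4000, 120},   /* segment 1 */
};

/* Translate (segment, offset) to a physical address with a limit check.
   Returns -1 on an invalid address, standing in for a hardware trap. */
long translate(unsigned seg, unsigned offset) {
    if (offset > seg_table[seg].limit)
        return -1;                           /* beyond the segment end */
    return (long)seg_table[seg].base + offset;
}

int main(void) {
    printf("%ld\n", translate(1, 50));   /* 4050                */
    printf("%ld\n", translate(1, 500));  /* -1: invalid address */
    return 0;
}
```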
Segmentation
(Figure: segmentation address translation through the segment table, with base and limit check)
Segmentation
• Advantages of Segmentation
• The size of the segment table is smaller than the size of the page table.
• There is no internal fragmentation.
• Disadvantages of Segmentation
• When processes are loaded and removed (during swapping) from the main memory, the free memory spaces are broken into smaller pieces, which causes external fragmentation.
• Here, too, the time to access data increases, since due to segmentation the main memory has to be accessed twice. First we access the segment table, which is itself stored in the main memory, and second we combine the base address of the segment with the segment offset and access the physical address, which is again in the main memory.
Sr. No. | Paging | Segmentation
1 | Non-contiguous memory allocation | Non-contiguous memory allocation
2 | Paging divides the program into fixed-size pages. | Segmentation divides the program into variable-size segments.
3 | The OS is responsible. | The compiler is responsible.
4 | Paging is faster than segmentation. | Segmentation is slower than paging.
5 | Paging is closer to the Operating System. | Segmentation is closer to the user.
6 | It suffers from internal fragmentation. | It suffers from external fragmentation.
7 | There is no external fragmentation. | There is no internal fragmentation.
8 | The logical address is divided into page number and page offset. | The logical address is divided into segment number and segment offset.
9 | The page table is used to maintain the page information. | The segment table maintains the segment information.
10 | A page table entry has the frame number and some flag bits representing details about the page. | A segment table entry has the base address of the segment and some protection bits for the segment.
Fragmentation
Fragmentation is an unwanted problem in which memory blocks cannot be allocated to processes due to their small size, and the blocks remain unused.
Internal Fragmentation
In this type of fragmentation, the process is allocated a memory block of a size larger than the size of the process. Due to this, some part of the memory is left unused, and this causes internal fragmentation.

This problem can be removed if we use dynamic partitioning for allocating space to the process. In dynamic partitioning, the process is allocated only as much space as it requires, so there is no internal fragmentation.

(Figure: a process P1 of size 3 MB placed in a larger fixed partition, leaving unused space)
External Fragmentation
In this type of fragmentation, although the total space needed by a process is available, we are still not able to place the process in memory because that space is not contiguous. This is called external fragmentation.

This problem occurs because memory is allocated to processes contiguously. If we remove this condition, external fragmentation can be reduced. This is what is done in compaction, paging and segmentation (non-contiguous memory allocation techniques), where memory is allocated non-contiguously to the processes.
Basic Cache Optimization Techniques
The cache is the part of the memory hierarchy closest to the CPU. It is used to store frequently used data and instructions. It is generally very costly: the larger the cache memory, the higher the cost. Hence, it is used in smaller capacities to minimize costs. To make up for its smaller capacity, it must be used to its full potential. Optimization of cache performance ensures that the cache is utilized efficiently and to its full potential.

Average Memory Access Time (AMAT):
AMAT helps in analyzing cache memory and its performance. The lower the AMAT, the better the performance. AMAT can be calculated as:

AMAT = Hit Ratio * Cache access time + Miss Ratio * (Cache access time + Main memory access time)
     = h*tc + (1-h)*(tc+tm)

or equivalently:

AMAT = Hit Time + (Miss Rate * Miss Penalty)

If Hit Time, Miss Rate and Miss Penalty are reduced, the AMAT is reduced, which in turn ensures optimal performance of the cache.
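A small sketch evaluating the AMAT formula above for made-up timing values; h, tc and tm match the symbols in the formula.

```c
#include <stdio.h>

int main(void) {
    double h  = 0.95;    /* hit ratio (hypothetical)              */
    double tc = 2.0;     /* cache access time, ns (hypothetical)  */
    double tm = 100.0;   /* main memory access time, ns           */

    /* AMAT = h*tc + (1-h)*(tc+tm) */
    double amat = h * tc + (1.0 - h) * (tc + tm);
    printf("AMAT = %.2f ns\n", amat);   /* 0.95*2 + 0.05*102 = 7.00 ns */
    return 0;
}
```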
Methods for reducing Hit Time, Miss Rate, and Miss Penalty:
Methods to reduce Hit Time:
1. Small and simple caches: If less hardware is required to implement the cache, the hit time decreases because of the shorter critical path through the hardware.
2. Avoid address translation during indexing: Caches that use physical addresses for indexing are known as physical caches. Caches that use virtual addresses for indexing are known as virtual caches. Address translation can be avoided by using virtual caches; hence, they help in reducing the hit time.
Methods to reduce Miss Rate:
1. Larger block size: If the block size is increased, spatial locality can be exploited efficiently, which results in a reduction of the miss rate.
2. Larger cache size: Increasing the cache size reduces capacity misses, thereby decreasing the miss rate. However, it increases hit time and power consumption.
3. Higher associativity: Higher associativity results in a decrease in conflict misses, thereby helping to reduce the miss rate.
Methods to reduce Miss Penalty:
1. Multi-level caches: With only one level of cache, we must decide between keeping the cache small to reduce the hit time, or making it larger to reduce the miss rate. Both can be achieved simultaneously by introducing caches at further levels.
2. Critical word first and early restart: Generally, the processor requires one word of the block at a time, so there is no need to wait until the full block is loaded before sending the requested word. This is achieved using:
Critical word first: Also called requested word first. In this method, the exact word required is requested from memory, and as soon as it arrives, it is sent to the processor. In this way two things are achieved: the processor continues execution, and the other words of the block are read at the same time.
Early restart: In this method, the words are fetched in normal order. When the requested word arrives, it is immediately sent to the processor, which continues execution with the requested word.
Problem:
The main memory of a computer has 2cm blocks, while the cache has 2c blocks. If the cache uses the set associative mapping scheme with 2 blocks per set, then to which set does block k of the main memory map?

Number of blocks in main memory = 2cm
Number of blocks in cache = 2c
Number of blocks in one set of cache = 2

Number of Sets in Cache:
Number of sets in cache
= Number of blocks in cache / Number of blocks in one set
= 2c / 2
= c

Thus, block k of the main memory maps to set (k mod c) of the cache.
Problem:
A 4-way set-associative cache memory consists of 128 blocks divided into sets of four blocks each. The main memory consists of 16,384 blocks, and each block contains 256 eight-bit words.
i) How many bits are required for addressing the main memory?
ii) How many bits are needed to represent the TAG, SET and WORD fields?

Given:
Number of blocks in cache memory = 128
Number of blocks in each set of cache = 4
Main memory size = 16,384 blocks
Block size = 256 bytes
1 word = 8 bits = 1 byte

We have:
Size of main memory
= 16,384 blocks
= 16,384 x 256 bytes
= 2^22 bytes
Thus, the number of bits required to address the main memory = 22 bits.

We have:
Block size
= 256 bytes
= 2^8 bytes
Thus, the number of bits in the block offset (word field) = 8 bits.
Number of sets in cache
= Number of blocks in cache / Set size
= 128 blocks / 4 blocks
= 32 sets
= 2^5 sets
Thus, the number of bits in the set number = 5 bits.

Number of bits in tag
= Number of bits in physical address - (Number of bits in set number + Number of bits in word)
= 22 bits - (5 bits + 8 bits)
= 22 bits - 13 bits
= 9 bits
Thus, the number of bits in the tag = 9 bits.

Thus, the physical address is divided as:
TAG (9 bits) | SET (5 bits) | WORD (8 bits)
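The short program below re-derives these field widths from the given sizes; log2i is a hypothetical helper for taking base-2 logs of powers of two, not part of any library.

```c
#include <stdio.h>

/* Base-2 log for powers of two (assumed), used to turn sizes into bit counts. */
static int log2i(unsigned long x) {
    int n = 0;
    while (x > 1) { x >>= 1; n++; }
    return n;
}

int main(void) {
    unsigned long mem_blocks = 16384, cache_blocks = 128;
    unsigned long block_bytes = 256, ways = 4;

    int word_bits = log2i(block_bytes);                 /* 8  */
    int addr_bits = log2i(mem_blocks) + word_bits;      /* 22 */
    int set_bits  = log2i(cache_blocks / ways);         /* 5  */
    int tag_bits  = addr_bits - set_bits - word_bits;   /* 9  */

    printf("address=%d tag=%d set=%d word=%d\n",
           addr_bits, tag_bits, set_bits, word_bits);
    return 0;
}
```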


Synchronous Data Transfer
• In synchronous data transfer, the sending and receiving units are driven by the same clock signal.
• It is possible between two units when each of them knows the behavior of the other.
• The master performs a sequence of actions for data transfer in a predefined order, all synchronized with the common clock.
• The master is designed to supply the data at a time when the slave is definitely ready for it. Usually, the master introduces sufficient delay to account for the slow response of the slave, without any request from the slave.
Synchronous Data Transfer
• The master does not expect any acknowledgment signal from the slave when data is sent by the master to the slave.
• Similarly, when data from the slave is read by the master, neither does the slave indicate that the data has been placed on the data bus, nor does the master acknowledge that the data has been read.
• Both the master and the slave perform their own part of the transfer at the designed clock period. Since both devices know the behavior (response time) of each other, no difficulty arises.
• Prior to transferring data, the master must logically select the slave, either by sending the slave's address or by sending a "device select" signal to the slave. But there is no acknowledgment signal from the slave to the master when the device is selected.
(Figure: timing diagram of the synchronous read operation)
Synchronous Data Transfer
• Advantages:
• The design procedure is easy. The master does not wait for any acknowledgment signal from the slave, though it waits for a time equal to the slave's response time.
• The slave does not generate an acknowledgment signal, though it obeys the timing rules of the protocol set by the master or system designer.
• Disadvantages:
• If a slow unit is connected to a common bus, it can degrade the overall transfer rate of the system.
• If the slave operates at a slow speed, the master will be idle for some time during the data transfer, and vice versa.
Asynchronous Data Transfer
• Asynchronous data transfer is a mode of data transfer in which the two components have two different clocks.
• Asynchronous data transfer between two independent units requires that control signals be transmitted between the communicating units to indicate the times at which they transfer data.
Asynchronous Data Transfer
• If the registers in the I/O interface share a common clock with the CPU registers, then the transfer between the two units is said to be synchronous.
• But in most cases, the internal timing in each unit is independent of the other, so each uses its own private clock for its internal registers.
• In this case, the two units are said to be asynchronous to each other, and data transfer between them is called asynchronous data transfer.
Asynchronous Data Transfer
• Asynchronous data transfer between two independent units requires that control signals be transmitted between the communicating units to indicate the times at which data is sent. There are two approaches:
• Strobe control: A strobe pulse is supplied by one unit to indicate to the other unit when the transfer has to occur.
• Handshaking: In this method, each data item being transferred is accompanied by a control signal that indicates the presence of data on the bus. The unit receiving the data item responds with another signal to acknowledge receipt of the data.
Asynchronous Data Transfer
• The CPU is the source during an output (write) transfer and the destination during an input (read) transfer.
• Therefore, the control sequence during an asynchronous transfer depends on whether the transfer is initiated by the source or by the destination.
Asynchronous Data Transfer Methods
• 1. Strobe Control Method
• The strobe control method of asynchronous data transfer employs a single control line to time each transfer. This control line is known as the strobe, and it may be activated by either the source or the destination, depending on which initiates the transfer.
• Source-initiated strobe: The strobe is initiated by the source: as shown in the timing diagram, the source unit first places the data on the data bus and then activates the strobe.
Asynchronous Data Transfer Methods
(Figure: block and timing diagrams of the source-initiated strobe)
Asynchronous Data Transfer Methods
• Destination-initiated strobe: The strobe is initiated by the destination: as shown in the timing diagram, the destination unit first activates the strobe pulse, informing the source to provide the data.
2. Handshaking Method
• The strobe method has the disadvantage that a source unit initiating the transfer has no way of knowing whether the destination has received the data placed on the bus. Similarly, a destination unit initiating the transfer has no way of knowing whether the source unit has placed the data on the bus.
• This problem is solved by the handshaking method. The handshaking method introduces a second control signal line that provides a reply to the unit that initiates the transfer.
• In this method, one control line runs in the same direction as the data flow on the bus, from the source to the destination. The source unit uses it to inform the destination unit whether there is valid data on the bus.
2. Handshaking Method
• Source-initiated handshaking: The two handshaking lines are "data valid", which is generated by the source unit, and "data accepted", which is generated by the destination unit. A sketch of the exchange follows.
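The sketch below traces the four-phase source-initiated handshake in software, using two boolean flags to stand in for the "data valid" and "data accepted" control lines; real handshaking happens on hardware signal lines, so this is conceptual only.

```c
#include <stdio.h>
#include <stdbool.h>

static bool data_valid = false, data_accepted = false;
static int bus = 0;

int main(void) {
    /* 1. Source places data on the bus and raises data valid. */
    bus = 0x5A; data_valid = true;
    printf("source: data on bus, data_valid=1\n");

    /* 2. Destination sees data_valid, latches the data, raises data accepted. */
    if (data_valid) {
        int latched = bus; data_accepted = true;
        printf("dest:   latched 0x%X, data_accepted=1\n", latched);
    }

    /* 3. Source sees data_accepted and drops data_valid. */
    if (data_accepted) { data_valid = false; printf("source: data_valid=0\n"); }

    /* 4. Destination drops data_accepted, completing the handshake. */
    data_accepted = false; printf("dest:   data_accepted=0\n");
    return 0;
}
```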
2. Handshaking Method
(Figure: block and timing diagrams of source-initiated handshaking)
2. Handshaking Method
• Destination-initiated handshaking: Here the two handshaking lines are "data valid", generated by the source unit, and "ready for data", generated by the destination unit. Note that the signal generated by the destination unit has been renamed from "data accepted" to "ready for data" to reflect its new meaning.
2. Handshaking Method
(Figure: block and timing diagrams of destination-initiated handshaking)
• Advantages of Asynchronous Data Transfer
• Asynchronous data transfer in computer organization has the following advantages:
• It is more flexible: devices can exchange information at their own pace. In addition, individual data characters are self-contained, so even if one packet is corrupted, its predecessors and successors are not affected.
• It does not require complex processing by the receiving device. This means that inconsistency in the data transfer rate does not cause a crisis, since the device can keep up with the data stream. It also makes asynchronous transfer suitable for applications where character data is generated irregularly.
• Disadvantages of Asynchronous Data Transfer
• There are also some disadvantages to using asynchronous data transfer in computer organization:
• The success of these transmissions depends on the start bits and their recognition. Unfortunately, these are easily susceptible to line interference, causing the bits to be corrupted or distorted.
• A large portion of the transmitted data is used for control and identification header bits, and thus carries no useful information related to the transmitted data. This invariably means that more data packets need to be sent.
Modes of Transfer:
• Data transfer between the CPU and I/O devices may be done in different modes.
• Data transfer to and from the peripherals may be done in any of three possible ways:
• Programmed I/O
• Interrupt-initiated I/O
• Direct Memory Access (DMA)
Programmed I/O
• Programmed input/output (PIO) is a method of transferring data between the CPU and a peripheral, such as a network adapter or an ATA storage device. Each data item transfer is initiated by an instruction in the program, involving the CPU in every transaction. In contrast, in Direct Memory Access (DMA) operations, the CPU is not involved in the data transfer.
• The term programmed I/O can refer to either memory-mapped I/O (MMIO) or port-mapped I/O (PMIO). PMIO refers to transfers using a special address space outside of normal memory, usually accessed with dedicated instructions such as IN and OUT in x86 architectures. MMIO refers to transfers to I/O devices that are mapped into the normal address space available to the program. PMIO was very useful for early microprocessors with small address spaces, since the valuable address-space resource was not consumed by the I/O devices.
• The best-known example of a PC device that uses programmed I/O is the ATA interface; however, this interface can also be operated in any of several DMA modes. Many older devices in a PC also use PIO, including legacy serial ports, legacy parallel ports when not in ECP mode, and the PS/2 keyboard and mouse ports.
Programmed I/O
(Figure: flowchart of a programmed I/O transfer: the CPU reads the device status, checks the ready condition, and transfers the data when the device is ready)
I/O Commands
• Whenever the processor encounters an I/O-related instruction, it executes it by issuing two things on the bus: an I/O command and an address, which is decoded by every I/O module connected to the system. The I/O module addressed by the processor recognizes the address and responds to the issued I/O command.
• The I/O commands the processor issues to an I/O module can be of four types:
• Control: This command activates the I/O module addressed by the processor and directs it to the task it has to perform. It can be customized depending on the type of peripheral.
• Test: This command tests the status of the I/O module and its peripherals, to ensure that the addressed peripheral is powered on and available for the task. It also tests whether the most recent I/O operation has completed successfully or an error has occurred.
• Read: This command makes the I/O module fetch the data from the corresponding peripheral and store it in its internal buffer. The I/O module can then place this data on the data bus at the processor's demand.
• Write: This command makes the I/O module accept data from the data bus and transmit it to the corresponding peripheral.
Programmed I/O
• In this case, the I/O device does not have direct access to the memory unit.
• A transfer from an I/O device to memory requires the execution of several instructions by the CPU, including an input instruction to transfer the data from the device to the CPU and a store instruction to transfer the data from the CPU to memory.
• In programmed I/O, the CPU stays in a program loop until the I/O unit indicates that it is ready for data transfer, as sketched below.
• This is a time-consuming process, since it needlessly keeps the CPU busy. This situation can be avoided by using an interrupt facility.
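A minimal sketch of such a polling loop, assuming a hypothetical memory-mapped device with a status register and a data register at invented addresses; the READY bit and the addresses are not from any real device.

```c
#include <stdint.h>

/* Hypothetical memory-mapped device registers (invented addresses). */
#define DEV_STATUS ((volatile uint8_t *)0x4000)
#define DEV_DATA   ((volatile uint8_t *)0x4001)
#define READY_BIT  0x01

/* Programmed I/O read: the CPU busy-waits on the status register
   until the device signals readiness, then reads the data register. */
uint8_t pio_read_byte(void) {
    while ((*DEV_STATUS & READY_BIT) == 0)
        ;                      /* CPU is tied up in this polling loop */
    return *DEV_DATA;          /* input instruction: device -> CPU    */
}
```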
Schemes using Programmed I/O
• Memory-mapped I/O uses the same address space to address both main memory and I/O devices.
• The memory and registers of the I/O devices are mapped to (associated with) address values, so a memory address may refer either to a portion of physical RAM or to the memory and registers of an I/O device. Thus, the CPU instructions used to access memory can also be used for accessing devices.
• With memory-mapped I/O, the processor accesses memory and I/O using a single address space. The processor uses the same address, data and control buses, so the same set of machine instructions addresses both memory and I/O.
Memory-mapped I/O
(Figure: a single address space shared between RAM and device registers)
Port-mapped I/O
• With isolated I/O, the address space for memory is separated from the address space for I/O. Although the processor uses the same data and address lines for memory and I/O devices, it uses separate control lines for memory and for I/O.
• Port-mapped I/O often uses a special class of CPU instructions designed specifically for performing I/O, such as the IN and OUT instructions found on microprocessors:

Mnemonic, Operand  | Opcode (hex) | Bytes
IN  port-address   | DB           | 2
OUT port-address   | D3           | 2
Port-mapped I/O
• The IN or OUT mnemonic is followed by an 8-bit port address. Thus we can have 2^8 = 256 input ports and 256 output ports.
• IN and OUT are both 2-byte instructions. A usage sketch follows.
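For the x86 flavor of port-mapped I/O mentioned above, one common way to reach the IN/OUT instructions from C is GCC inline assembly, as in the sketch below; the port number is an invented example, and the code assumes a GCC-compatible compiler on x86.

```c
#include <stdint.h>

/* Wrappers around the x86 OUT/IN port instructions (GCC inline asm).
   These touch the separate I/O address space, not normal memory. */
static inline void outb(uint16_t port, uint8_t val) {
    __asm__ volatile ("outb %0, %1" : : "a"(val), "Nd"(port));
}

static inline uint8_t inb(uint16_t port) {
    uint8_t val;
    __asm__ volatile ("inb %1, %0" : "=a"(val) : "Nd"(port));
    return val;
}

void demo(void) {
    outb(0x60, 0xAB);            /* write a byte to port 0x60 (example) */
    uint8_t v = inb(0x60);       /* read a byte back from the port      */
    (void)v;
}
```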
The Need for Synchronization
• With programmed I/O, this entire operation is regulated by a program. The action of transferring typed characters from the keyboard to memory, and then to the display module, must happen at the right time.
• The speed at which a character can be transmitted and displayed on the display module is still much slower than the speed of the processor.
• So, to overcome the difference in speed between the processor and the I/O device, a mechanism must be implemented to synchronize the transfer of data between the processor and the I/O modules.
Interrupt-initiated I/O:
• Whenever it is determined that the device is ready for data transfer, it initiates an interrupt request signal to the computer. Upon detection of an external interrupt signal, the CPU momentarily stops the task it was performing, branches to the service program to process the I/O transfer, and then returns to the task it was originally performing.

• Note: Both programmed I/O and interrupt-driven I/O require the active intervention of the processor to transfer data between memory and the I/O module, and any data transfer must traverse a path through the processor. Thus, both these forms of I/O suffer from two inherent drawbacks:
• The I/O transfer rate is limited by the speed with which the processor can test and service a device.
• The processor is tied up in managing each I/O transfer; a number of instructions must be executed for every I/O transfer.
Interrupt-initiated I/O:
• Interrupt-driven I/O is an alternative scheme for dealing with I/O. Interrupt I/O is a way of controlling input/output activity whereby a peripheral or terminal that needs to make or receive a data transfer sends a signal, which causes a program interrupt to be set.
• At a time appropriate to the priority level of the I/O interrupt, relative to the total interrupt system, the processor enters an interrupt service routine. The function of the routine depends on the system of interrupt levels and priorities implemented in the processor. The interrupt technique requires more complex hardware and software, but makes far more efficient use of the computer's time and capacity. The figure below shows simple interrupt processing.
Interrupt-initiated I/O:
(Figure: simple interrupt processing: the CPU suspends the current program, runs the interrupt service routine, and resumes)
With multiple I/O modules, how does the processor determine which device issued the interrupt signal? And how does the processor decide which module to process when multiple interrupts have occurred?

• There are 4 main ways to address these problems:
• Multiple Interrupt Lines
• Software Poll
• Daisy Chain (Hardware Poll, Vectored)
• Bus Arbitration (Vectored)
Interrupt-initiated I/O:
• Multiple Interrupt Lines
As the name suggests, multiple interrupt lines are provided between the processor and the I/O modules. This allows multiple modules to be handled at the same time.
• The processor picks the interrupt line with the highest priority.

• Software Poll
Whenever an interrupt is detected by the processor, it branches to an interrupt service routine which polls each and every I/O module to determine the exact interrupting module. The processor raises a poll, which could be in the form of a command line; the address of the I/O module identified by the poll is then placed on the address lines.
• The priority is determined by the order in which the modules are polled.
Interrupt-initiated I/O:
• Daisy Chain (Hardware Poll, Vectored)
This is effectively a hardware poll. The interrupt acknowledge line is daisy-chained through all the modules. Whenever there is an interrupt, the processor sends out an interrupt acknowledge, which propagates through the series of I/O modules until it reaches the requesting module.
• The priority is determined by the order in which the modules are chained.
Interrupt-initiated I/O:
• Bus Arbitration (Vectored)
The last method, which also utilizes vectored interrupts, is bus arbitration. In this method, an I/O module must gain control of the bus before it can request the interrupt, which limits requests to one module at a time.
Interrupt-initiated I/O:
• When there are multiple interrupts at a single time, priorities must be assigned. Each of these four methods has its own way of assigning priorities, as described above.
Direct Memory Access:
• The data transfer between fast storage media, such as a magnetic disk, and the memory unit is limited by the speed of the CPU.
• Thus, we can allow the peripherals to communicate directly with the memory using the memory buses, removing the intervention of the CPU.
• This type of data transfer technique is known as DMA, or direct memory access.
• During DMA, the CPU is idle and has no control over the memory buses.
• The DMA controller takes over the buses to manage the transfer directly between the I/O devices and the memory unit.
DMA
(Figure: CPU, DMA controller and I/O device connected by the system buses)
DMA
• Large blocks of data are transferred at high speed to or from high-speed devices: magnetic drums, disks, tapes, etc.
• The DMA controller is an interface that provides I/O transfer of data directly to and from the memory and the I/O device.
• The CPU initializes the DMA controller by sending a memory address and the number of words to be transferred.
• The actual transfer of data is done directly between the device and memory through the DMA controller, freeing the CPU for other tasks.
DMA
(Figure: Bus Request and Bus Grant signals between the CPU and the DMA controller)
DMA
• Two control signals, Bus Request (BR) and Bus Grant (BG), are used to facilitate the DMA transfer. The Bus Request input is used by the DMA controller to request the CPU for control of the buses. When the BR signal is high, the CPU terminates the execution of the current instruction, places the address, data, read and write lines into the high-impedance state, and sends the Bus Grant signal. The DMA controller then takes control of the buses and transfers the data directly between memory and I/O without processor interaction. When the transfer is completed, the DMA controller makes the Bus Request signal low, in response to which the CPU disables Bus Grant and again takes control of the address, data, read and write lines.
DMA
• DMA burst: A block of data consisting of a number of memory words is transferred at one time.
• Cycle stealing: The DMA transfers one data word at a time, after which it must return control of the buses to the CPU.
DMA CONTROLLER
(Figure: DMA controller internals: address register, word count register, control register and bus interface)
DMA CONTROLLER
• When the Bus Grant signal is 0, the CPU communicates through the data bus to read from or write into the DMA registers. When Bus Grant is 1, the DMA controller takes control of the buses and transfers data between the memory and I/O.
DMA TRANSFER
(Figure: DMA transfer in a computer system: CPU, DMA controller, peripheral device and memory)
DMA TRANSFER
• A DMA request signal is given from the I/O device to the DMA controller.
• The DMA controller sends a Bus Request signal to the CPU, in response to which the CPU suspends its current instructions and initializes the DMA controller by sending the following information:
  • The starting address of the memory block where the data is available (for a read) or where the data is to be stored (for a write)
  • The word count, which is the number of words in the memory block
  • A control register value specifying the mode of transfer
  • It also sets Bus Grant to 1 so that the DMA controller can take control of the buses
• The DMA controller sends a DMA acknowledge signal, in response to which the peripheral device puts a word on the data bus (for a write) or receives a word from the data bus (for a read). An illustrative register-level sketch follows.
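A sketch of the CPU-side initialization described above, assuming a hypothetical memory-mapped DMA controller; the register layout, the address and the control bits are all invented for illustration.

```c
#include <stdint.h>

/* Hypothetical register layout of a simple DMA controller. */
struct dma_regs {
    volatile uint32_t address;    /* starting address of the memory block */
    volatile uint32_t word_count; /* number of words to transfer          */
    volatile uint32_t control;    /* mode of transfer + start bit         */
};

#define DMA ((struct dma_regs *)0x8000)   /* invented MMIO address */
#define DMA_READ  0x1                     /* "read": data available in
                                             memory, moved to the device */
#define DMA_START 0x2

/* CPU initializes the DMA controller, then is free for other work
   while the controller moves the block directly between memory and I/O. */
void dma_start_transfer(uint32_t block_addr, uint32_t nwords) {
    DMA->address    = block_addr;
    DMA->word_count = nwords;
    DMA->control    = DMA_READ | DMA_START;
}
```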
I/O PROCESSORS
(Figure: block diagram of a computer with an I/O processor)
I/O PROCESSORS
• A computer may incorporate one or more external processors and assign them the task of communicating directly with the I/O devices, so that each interface does not need to communicate with the CPU. An I/O processor (IOP) is a processor with direct memory access capability that communicates with I/O devices. IOP instructions are specifically designed to facilitate I/O transfer. The IOP can also perform other processing tasks, such as arithmetic, logic, branching and code translation.
CPU-IOP Communication
(Figure: communication sequence between the CPU and the IOP)
THANK YOU
