Input Output Organization (2.3)
• Source Initiated
• Destination Initiated
• Handshaking
• Programmed I/O
Memory Hierarchy in Computer Architecture
• The memory in a computer can be divided into five hierarchies (levels) based on speed and use.
• The processor can move from one level to another based on its requirements.
• The five levels are registers, cache, main memory, magnetic disks, and magnetic tapes.
• The first three levels are volatile memories: they lose their stored data when power is removed.
• The last two levels are non-volatile: they store data permanently.
Memory Hierarchy in Computer Architecture
• Primary Memory
• The primary memory, also known as internal memory, is directly accessible by the processor. It includes main memory, cache, and CPU registers.
• Secondary Memory
• The secondary memory, also known as external memory, is accessible by the processor through an input/output module. It includes optical disks, magnetic disks, and magnetic tape.
Characteristics of Memory Hierarchy
• Performance
• Capacity
• Access Time
• Cost per bit
Memory Access Methods
• Random Access
• Sequential Access
• Direct Access
Cache Memory
• The data or contents of the main memory that are used again and again by the CPU are stored in the cache memory, so that this data can be accessed in a shorter time.
• Whenever the CPU needs to access memory, it first checks the cache. If the data is not found in the cache, the CPU moves on to the main memory. It also transfers a block of recently used data into the cache, deleting old data in the cache to accommodate the new block.
• Cache memory is a random access memory.
• It lies on the path between the processor and the main memory.
• It bridges the speed mismatch between the fast processor and the slower main memory.
Hit Ratio
• The performance of cache memory is measured in terms of a quantity called the hit ratio.
• When the CPU refers to memory and finds the word in the cache, it is said to produce a hit.
• If the word is not found in the cache but is in main memory, it counts as a miss.
• The ratio of the number of hits to the total number of CPU references to memory is called the hit ratio.
• Hit ratio = Hits / (Hits + Misses)
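As a quick illustration, the sketch below computes the hit ratio from hit and miss counts (the example numbers are assumed, not from the slides):

```python
def hit_ratio(hits, misses):
    # Hit ratio = hits / total CPU references to memory
    return hits / (hits + misses)

print(hit_ratio(80, 20))   # 80 hits out of 100 references -> 0.8
```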
Types of Cache Memory
• The L1 cache, also known as the primary cache, is very fast but relatively small. It is usually embedded in the processor chip as the CPU cache.
• L1 is the primary type of cache memory.
• The size of the L1 cache is very small compared to the others, between 2 KB and 64 KB, depending on the processor.
• It is embedded in the computer's microprocessor (CPU) as registers.
• Instructions required by the CPU are searched for first in the L1 cache.
Types of Cache Memory
• The secondary cache, also known as the L2 cache, is usually more capacious than the L1 cache.
• L2 is the secondary type of cache memory.
• The size of the L2 cache is larger than L1, between 256 KB and 512 KB.
• The L2 cache is located on the computer's microprocessor.
• After searching for an instruction in the L1 cache, if it is not found, the microprocessor searches the L2 cache.
• A high-speed system bus interconnects the L2 cache to the microprocessor.
Types of Cache Memory
• The Level 3 (L3) cache is a specialized memory developed to improve the performance of the L1 and L2 caches.
• The L3 cache is larger but slower than L1 and L2.
• Its size is between 1 MB and 8 MB.
• In multi-core processors, each core may have its own L1 and L2 caches, but all cores share a common L3 cache.
• The L3 cache is roughly double the speed of RAM.
Need for Cache Memory
• The need for cache memory arises from the mismatch between the speeds of the main memory and the CPU.
• It acts as a high-speed buffer between the CPU and main memory and is used to temporarily store very active data.
Cache Replacement Algorithms
• Caching
• Caching is the process of storing data near where it is supposed to be used, rather than accessing it from an expensive origin every time a request comes in.
Cache Replacement Algorithms
• LRU
• LRU (Least Recently Used) evicts objects that haven't been used in a while once the list reaches its maximum capacity, keeping the most recently used objects at the top.
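A minimal LRU sketch in Python (illustrative only, not from the slides; `OrderedDict` keeps the most recently used keys at the end):

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()   # least recently used entries at the front

    def get(self, key):
        if key not in self.store:
            return None                  # miss
        self.store.move_to_end(key)      # mark as most recently used
        return self.store[key]

    def put(self, key, value):
        if key in self.store:
            self.store.move_to_end(key)
        self.store[key] = value
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)   # evict least recently used
```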
Cache Replacement Algorithms
• LFU
• The Least Frequently Used (LFU) algorithm works similarly to LRU, except that it keeps track of how many times an object was accessed instead of how recently.
• Each object has a counter that records how many times it has been accessed.
• When the list reaches its maximum capacity, objects with the lowest counters are evicted.
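A matching LFU sketch (illustrative; ties between equally infrequent keys are broken arbitrarily here, whereas real designs often also consider recency):

```python
from collections import Counter

class LFUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = {}
        self.counts = Counter()   # per-key access counters

    def get(self, key):
        if key not in self.store:
            return None           # miss
        self.counts[key] += 1
        return self.store[key]

    def put(self, key, value):
        if key not in self.store and len(self.store) >= self.capacity:
            victim = min(self.counts, key=self.counts.get)  # lowest counter
            del self.store[victim], self.counts[victim]
        self.store[key] = value
        self.counts[key] += 1
```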
Cache Replacement Algorithms
• FIFO
• FIFO (First In, First Out) evicts the object that has been in the cache the longest, regardless of how recently or how often it was accessed.
Random Replacement (RR)
• Random Replacement evicts a randomly chosen object when the cache reaches its maximum capacity.
Write Policy
• write-through
• write-around
• write-back
Write-through
• In a write-through policy, data is written to the cache and to the backing store (main memory) at the same time, so the two always stay consistent.
Write-around
• In a write-around policy, writes go directly to the backing store, bypassing the cache; only data that is later read back gets cached.
Write-back / write-behind
• In a write-back policy, the response is returned before the backing store is updated.
• The backing-store update happens asynchronously, in a separate sequence.
• For CPU caches, a dirty bit is used as the state indicator, marking lines that have been modified but not yet written back.
Cache Size and Cache Lines
• Cache Size
• The "size" of the cache is the amount of main-memory data it can hold. It can be calculated as the number of bytes stored in each data block times the number of blocks stored in the cache.
• Cache Lines
• Cache memory is divided into equal-size partitions called cache lines.
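As a small worked sketch of this calculation (the line count and block size below are assumed example values):

```python
num_lines = 128      # assumed: number of cache lines (blocks)
block_size = 64      # assumed: bytes per line
cache_size = num_lines * block_size   # bytes per block x number of blocks
print(cache_size)    # 8192 bytes = 8 KB
```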
Effect of Changing Block Size
• Case 01: Decreasing the Block Size
• A smaller block size will contain a smaller number of nearby addresses.
• Thus, fewer nearby addresses are brought into the cache on each miss.
• This increases the chance of a cache miss, which reduces the exploitation of spatial locality.
• Thus, the smaller the block size, the poorer the spatial locality.
Cache Mapping
• The main memory is divided into multiple partitions of equal size, known as frames or blocks.
• The cache memory is divided into partitions of the same size as the blocks, known as lines.
• During cache mapping, a main-memory block is simply copied to the cache; the block is not removed from the main memory.
Cache Mapping
• There are three different types of mapping used for cache memory:
• Direct mapping
• Associative mapping
• K-way set-associative mapping
Direct Mapping
In direct mapping, the cache consists of normal high-speed random-access memory. A given block of the main memory can map to only one particular line of the cache. The cache line to which a distinct block maps is given by:
Cache line number = (Main memory block address) mod (Total number of lines in cache)
For example, consider a cache memory divided into a total of 'n' lines.
Then block 'j' of the main memory can map only to line number (j mod n) of the cache.
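This mapping rule is a single modulo operation; a tiny sketch with assumed values:

```python
def cache_line(block_address, num_lines):
    # Direct mapping: block j always lands in line (j mod n)
    return block_address % num_lines

print(cache_line(19, 8))   # with n = 8 lines, block 19 maps to line 3
```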
The Need for Replacement Algorithm
In the case of direct mapping, there is no need for a replacement algorithm, because a main-memory block can map to only one particular line of the cache. Thus, the incoming (new) block always replaces the block that already exists, if any, in that particular line.
Division of Physical Address
In the case of direct mapping, the physical address is divided as follows: Tag | Line Number | Block/Word Offset.
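A sketch of extracting these fields from an address, assuming (purely for illustration) a 12-bit address, 8 cache lines (3 line bits), and 32-byte blocks (5 offset bits):

```python
LINE_BITS, OFFSET_BITS = 3, 5   # assumed: 8 lines, 32-byte blocks

def split_address(addr):
    offset = addr & ((1 << OFFSET_BITS) - 1)                 # low 5 bits
    line = (addr >> OFFSET_BITS) & ((1 << LINE_BITS) - 1)    # next 3 bits
    tag = addr >> (OFFSET_BITS + LINE_BITS)                  # remaining high bits
    return tag, line, offset

print(split_address(0b1101_011_10010))   # -> (13, 3, 18)
```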
Set Associative Mapping
• In this type of cache, the following steps are used to access data in the cache:
• The index field of the address from the processor is used to access the set.
• Comparators are then used to compare all tags of the selected set with the incoming tag.
• If a match is found, the corresponding location is accessed.
• If no match is found, an access is made to the main memory.
Fully Associative Mapping
In a fully associative cache, each location stores both the memory address and the data, so any main-memory block can be placed in any cache line.
K-way Set Associative Mapping
In k-way set associative mapping:
The cache lines are grouped into sets, each set consisting of k lines.
A given main memory block can map only to one particular cache set.
However, within that set, the memory block can map to any cache line that is freely available.
The cache set to which a given main memory block maps is given by:
Cache set number = (Main memory block address) mod (Total number of sets in cache)
Consider the following example of two-way set-associative mapping. In this case:
• k = 2, so every set consists of two cache lines.
• Since the cache consists of 6 lines, the total number of sets in the cache = 6/2 = 3 sets.
• Block 'j' of the main memory can map only to set number (j mod 3) of the cache.
• Within that set, block 'j' can map to any cache line that is freely available at that moment.
• If all the available cache lines are occupied, then one of the existing blocks must be replaced.
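The set-index rule from this example in a few lines of code (6 lines, k = 2, hence 3 sets):

```python
def cache_set(block_address, num_sets):
    # Set-associative mapping: block j lands in set (j mod number_of_sets)
    return block_address % num_sets

for j in range(6):
    print(f"block {j} -> set {cache_set(j, 3)}")   # 0, 1, 2, 0, 1, 2
```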
The Need for Replacement Algorithm
K-way set associative mapping is a combination of direct mapping and fully associative mapping: it uses fully associative mapping within each set. Therefore, k-way set associative mapping requires a replacement algorithm.
• External Fragmentation
• Internal Fragmentation
Fragmentation
• Internal Fragmentation
• Whenever a memory block is allocated to a process, and the process happens to be smaller than the total amount of memory requested, a free space is created inside that memory block. This free space within the block goes unused, which causes internal fragmentation.
• External Fragmentation
• External fragmentation occurs when a dynamic memory allocation method allocates some memory and leaves small pieces of unusable memory behind. If there is too much external fragmentation, the total quantity of usable memory is reduced substantially: there is enough memory space to satisfy a request, but it is not contiguous. This is known as external fragmentation.
What is virtual memory?
• Virtual memory, also called virtual storage, is a technique that computers use to optimize memory management by transferring data between different storage systems, such as random access memory (RAM) and disk storage. This memory system is a built-in component of most modern desktop computers. It is incorporated into the hardware of a computer's central processing unit (CPU) and is a more cost-effective method for managing memory than expanding the computer's physical memory. Some other advantages of this memory system:
• It frees applications from having to compete for shared memory space.
• It allows processes to share memory between libraries (collections of code that provide the foundation for a program's operations).
• It improves security by isolating and segmenting where the computer stores information.
• It increases the amount of memory available by working beyond the limits of the computer's physical memory space.
• It optimizes CPU usage.
• It allows multiple applications to be managed at the same time.
Memory Management Unit
Virtual Memory
• In most computer systems, the physical main memory is not as large as the address space of the processor.
• When we run a program that does not fit completely into the main memory, the parts currently being executed are stored in main memory and the remaining portion is stored in a secondary storage device such as an HDD.
• When a new part of a program is to be brought into main memory for execution and the memory is full, it must replace another part that is already in main memory.
• As this secondary memory is not actually part of the system memory, to the CPU the secondary memory is virtual memory.
• Virtual memory is used to logically extend the size of main memory.
• When virtual memory is used, the address field holds a virtual address.
• A special hardware unit known as the MMU translates virtual addresses into physical addresses.
• Virtual memory is an imaginary memory that we assume and use when we have material that exceeds our physical memory at that time.
Virtual Memory
Virtual memory is a storage scheme that gives the user the illusion of having a very big main memory. This is done by treating a part of secondary memory as the main memory.
In this scheme, the user can load processes bigger than the available main memory, under the illusion that the memory needed to load them is available.
Instead of loading one big process into the main memory, the operating system loads different parts of more than one process into the main memory.
By doing this, the degree of multiprogramming is increased, and therefore the CPU utilization also increases.
Virtual memory is commonly implemented through:
• Paging
• Segmentation
Paging
• Example: We have a process P with a process size of 4 B and a page size of 1 B. There will therefore be four pages (say P0, P1, P2, P3), each of size 1 B. When this process goes into the main memory for execution, then depending on availability, it may be stored in a non-contiguous fashion across main-memory frames.
Paging
In operating systems, paging is a storage mechanism used to retrieve processes from secondary storage into main memory in the form of pages.
The main idea behind paging is to divide each process into pages. The main memory is likewise divided into frames.
One page of the process is stored in one of the frames of memory. The pages can be stored at different locations in memory, but the priority is always to find contiguous frames or holes.
Translation of Logical Address into Physical Address
Page Table
• A page table contains the frame number corresponding to each page number of a specific process, so each process has its own page table. A register called the Page Table Base Register (PTBR) holds the base address of the page table.
Translation Process
• The CPU generates a logical address, which contains the page number and the page offset.
• The PTBR contains the address of the page table. The page table is used to determine the frame number corresponding to the page number.
• With the frame number and the page offset, the physical address is determined and the page is accessed in the main memory.
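A minimal sketch of this translation, with an assumed 1 KB page size and a hypothetical page table (in hardware the lookup goes through the PTBR; here it is a plain dictionary):

```python
PAGE_SIZE = 1024                          # assumed page size in bytes
page_table = {0: 5, 1: 2, 2: 7, 3: 0}     # hypothetical: page number -> frame number

def translate(logical_address):
    page_number = logical_address // PAGE_SIZE
    page_offset = logical_address % PAGE_SIZE
    frame_number = page_table[page_number]         # page-table lookup
    return frame_number * PAGE_SIZE + page_offset  # physical address

print(translate(2100))   # page 2, offset 52 -> frame 7 -> 7*1024 + 52 = 7220
```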
Advantages / Disadvantages
• Advantages of Paging
• There is no external fragmentation, since data can be stored in a non-contiguous way.
• Swapping is easy between equal-sized pages and frames.
• Disadvantages of Paging
• As the size of the frame is fixed, paging may suffer from internal fragmentation: the process may be too small to occupy the entire frame.
• The access time increases, because main memory now has to be accessed twice: first to access the page table, which is itself stored in main memory, and second to combine the frame number with the page offset and access the physical address of the page, which is again in main memory.
• Every process has its own page table, and maintaining the page tables is extra overhead.
Segmentation is a memory management technique whereby data items are stored in segments on the storage media. It divides the process or user-accessible space into variable-sized blocks, called segments.
Segmentation
• The CPU always generates a logical address, but we need a physical address to access the main memory. This mapping is done by the MMU (Memory Management Unit) with the help of the segment table.
• Let us first understand some of the basic terms; then we will see how this translation is done.
• Logical Address: The logical address consists of two parts: a segment number and a segment offset.
• 1. Segment Number: It identifies the specific segment of the process from which the CPU wants to read data.
Segmentation
• 2. Segment Offset: It gives the exact word in that segment which the CPU wants to read.
• 3. Logical Address = Segment Number + Segment Offset
• Physical Address: The physical address is obtained by adding the base address of the segment to the segment offset.
• Segment Table: A segment table stores the base address of each segment in the main memory. Each entry has two parts, Base and Limit. Base indicates the base (starting) address of the segment in main memory, and Limit gives the size of that segment. A register called the Segment Table Base Register (STBR) holds the base address of the segment table. The segment table itself is also stored in main memory.
Translation Process
• The CPU generates a logical address, which contains the segment number and the segment offset.
• The STBR contains the address of the segment table.
• The segment table is used to determine the base address of the segment corresponding to the segment number.
• The segment offset is then compared with the limit corresponding to that base.
• If the segment offset is greater than the limit, it is an invalid address: the CPU is trying to access a word beyond the size of the segment itself, which is not possible. If the segment offset is less than or equal to the limit, the request is accepted, and the physical address is generated by adding the base address of the segment to the segment offset.
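A small sketch of this check-and-add translation (segment-table values are hypothetical; following the slide's convention, an offset is rejected only when it exceeds the limit):

```python
# Hypothetical segment table: segment number -> base address and limit (size)
segment_table = {
    0: {"base": 1400, "limit": 1000},
    1: {"base": 6300, "limit": 400},
}

def translate(segment_number, segment_offset):
    entry = segment_table[segment_number]
    if segment_offset > entry["limit"]:      # invalid: beyond segment size
        raise MemoryError("invalid address: offset exceeds segment limit")
    return entry["base"] + segment_offset    # physical = base + offset

print(translate(1, 100))   # 6300 + 100 = 6400
```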
Segmentation
• Advantages of Segmentation
• The size of the segment table is smaller than the size of a page table.
• There is no internal fragmentation.
• Disadvantages of Segmentation
• When processes are loaded and removed from the main memory (during swapping), the free memory space is broken into smaller pieces, and this causes external fragmentation.
• Here, too, the time to access the data increases, since main memory now has to be accessed twice: first to access the segment table, which is itself stored in main memory, and second to combine the base address of the segment with the segment offset and access the physical address, which is again in main memory.
Sr No. | Paging                                                       | Segmentation
1      | Non-contiguous memory allocation                             | Non-contiguous memory allocation
2      | Divides the program into fixed-size pages                    | Divides the program into variable-size segments
3      | The OS is responsible                                        | The compiler is responsible
4      | Logical address is divided into page number and page offset  | Logical address is divided into segment number and segment offset
5      | A page table maintains the page information                  | A segment table maintains the segment information
6      | A page table entry has the frame number and some flag bits with details about the page | A segment table entry has the base address of the segment and some protection bits for the segment
Fragmentation
Fragmentation is an unwanted problem in which memory blocks cannot be allocated to processes due to their small size, so the blocks remain unused.
Internal Fragmentation
In this fragmentation, the process is allocated a memory block larger than the size of the process. Some part of the memory is therefore left unused, and this causes internal fragmentation.
External Fragmentation
In this fragmentation, although the total space a process needs is available, we are still not able to put that process in memory, because the space is not contiguous. This is called external fragmentation.
AMAT
AMAT = Hit Ratio × Cache access time + Miss Ratio × (Cache access time + Main memory access time)
     = h × tc + (1 − h) × (tc + tm)
or
AMAT = Hit Time + (Miss Rate × Miss Penalty)
If Hit Time, Miss Rate, and Miss Penalty are reduced, the AMAT is reduced, which in turn ensures optimal performance of the cache.
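Both formulations give the same number, as this sketch shows (the timing values are assumed examples):

```python
def amat_two_level(h, tc, tm):
    # AMAT = h*tc + (1-h)*(tc + tm)
    return h * tc + (1 - h) * (tc + tm)

def amat_penalty(hit_time, miss_rate, miss_penalty):
    # AMAT = hit time + miss rate * miss penalty
    return hit_time + miss_rate * miss_penalty

# Assumed: tc = 10 ns cache, tm = 100 ns main memory, h = 0.9
print(amat_two_level(0.9, 10, 100))   # 20.0 ns
print(amat_penalty(10, 0.1, 100))     # 20.0 ns -> identical
```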
Methods for reducing Hit Time, Miss Rate, and Miss Penalty:
Methods to reduce Hit Time:
1. Small and simple caches: If less hardware is required to implement the cache, the hit time decreases because of the shorter critical path through the hardware.
2. Avoid address translation during indexing: Caches that use physical addresses for indexing are known as physical caches; caches that use virtual addresses for indexing are known as virtual caches. Address translation can be avoided by using virtual caches, which helps reduce the hit time.
Methods to reduce Miss Rate:
1. Larger block size: If the block size is increased, spatial locality can be exploited efficiently, which reduces the miss rate.
2. Larger cache size: Increasing the cache size reduces capacity misses, thereby decreasing the miss rate. However, it increases the hit time and power consumption.
3. Higher associativity: Higher associativity reduces conflict misses, thereby helping to reduce the miss rate.
Methods to reduce Miss Penalty:
1. Multi-level caches: With only one level of cache, we must decide between keeping the cache small to reduce the hit time or making it larger to reduce the miss rate. Both can be achieved simultaneously by introducing caches at further levels.
2. Critical word first and early restart: Generally, the processor requires one word of the block at a time, so there is no need to wait until the full block is loaded before sending the requested word. This is achieved using:
Critical word first: Also called requested word first. The exact word required is requested from memory, and as soon as it arrives, it is sent to the processor. Two things are thereby achieved: the processor continues execution, and the other words of the block are read at the same time.
Early restart: The words are fetched in normal order. When the requested word arrives, it is immediately sent to the processor, which continues execution with the requested word.
Problem:
i) The main memory of a computer has 2cm blocks while the cache has 2c blocks. If the cache uses the set-associative mapping scheme with 2 blocks per set, to which set does block k of the main memory map? (Number of sets = 2c / 2 = c, so block k maps to set (k mod c).)
ii) For the cache described below, how many bits are needed to represent the TAG, SET, and WORD fields?
Given:
Number of blocks in cache memory = 128
Number of blocks in each set of cache = 4
Main memory size = 16384 blocks
Block size = 256 bytes
1 word = 8 bits = 1 byte
We have:
Size of main memory = 16384 blocks = 16384 × 256 bytes = 2^22 bytes
Thus, number of bits required to address main memory = 22 bits
We have:
Block size = 256 bytes = 2^8 bytes
Thus, number of bits in the block offset (WORD field) = 8 bits
Number of sets in cache = Number of lines in cache / Set size = 128 blocks / 4 blocks = 32 sets = 2^5 sets
Thus, number of bits in the set number (SET field) = 5 bits
Number of bits in the TAG field = 22 − 5 − 8 = 9 bits
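The same field widths can be checked with a short script:

```python
import math

main_memory_blocks = 16384   # given
block_size = 256             # bytes per block, given
cache_blocks = 128           # given
blocks_per_set = 4           # given

address_bits = int(math.log2(main_memory_blocks * block_size))   # 22
word_bits = int(math.log2(block_size))                           # 8
set_bits = int(math.log2(cache_blocks // blocks_per_set))        # 5
tag_bits = address_bits - set_bits - word_bits                   # 9
print(tag_bits, set_bits, word_bits)   # TAG = 9, SET = 5, WORD = 8
```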
Synchronous Data Transfer
• Advantages:
• The design procedure is easy. The master does not wait for an acknowledge signal from the slave, though it does wait for a time equal to the slave's response time.
• The slave does not generate an acknowledge signal, though it obeys the timing rules of the protocol set by the master or system designer.
• Disadvantages:
• If a slow unit is connected to a common bus, it can degrade the overall transfer rate of the system.
• If the slave operates at a slow speed, the master will be idle for some time during the data transfer, and vice versa.
Asynchronous Data Transfer
• Asynchronous data transfer is a mode of data transfer in which the two components have two different clocks.
• Asynchronous data transfer between two independent units requires that control signals be transmitted between the communicating units to indicate the times at which they transfer data.
Asynchronous Data Transfer
• If the registers in the I/O interface share a common clock with the CPU registers, then the transfer between the two units is said to be synchronous.
• In most cases, however, the internal timing in each unit is independent, so each unit uses its own private clock for its internal registers.
• In this case, the two units are said to be asynchronous to each other, and any data transfer between them is called asynchronous data transfer.
Asynchronous Data Transfer
• Asynchronous data transfer between two independent units requires that control signals be transmitted between the communicating units to indicate the time at which data is sent. Two common schemes are:
• Strobe control: A strobe pulse is supplied by one unit to indicate to the other unit when the transfer has to occur.
• Handshaking: Each data item being transferred is accompanied by a control signal that indicates the presence of data on the bus. The unit receiving the data item responds with another signal to acknowledge receipt of the data.
Asynchronous Data Transfer
• The CPU is the source during an output (write) transfer and the destination during an input (read) transfer.
• Therefore, the control sequence during an asynchronous transfer depends on whether the transfer is initiated by the source or by the destination.
Asynchronous Data Transfer Methods
• 1. Strobe Control Method
• The strobe control method of asynchronous data transfer employs a single control line to time each transfer. This control line is known as the strobe, and it may be activated either by the source or by the destination, depending on which one initiates the transfer.
• Source-initiated strobe: The strobe is initiated by the source; the source unit first places the data on the data bus and then activates the strobe.
Asynchronous Data Transfer Methods
• Destination-initiated strobe: The strobe is initiated by the destination; the destination unit first activates the strobe pulse, informing the source to provide the data.
2. Handshaking Method
• The strobe method has the disadvantage that the source unit initiating the transfer has no way of knowing whether the destination has received the data placed on the bus. Similarly, a destination unit that initiates the transfer has no way of knowing whether the source unit has actually placed data on the bus.
• This problem is solved by the handshaking method, which introduces a second control signal line that provides a reply to the unit that initiates the transfer.
• In this method, one control line goes in the same direction as the data flow on the bus, from the source to the destination. The source unit uses it to inform the destination unit whether there is valid data on the bus.
• Advantages of Asynchronous Data Transfer
• Asynchronous data transfer in computer organization has the following advantages:
• It is more flexible, and devices can exchange information at their own pace. In addition, each data character is self-contained, so even if one packet is corrupted, its predecessors and successors are not affected.
• It does not require complex processing by the receiving device. An inconsistency in the data transfer does not result in a big crisis, since the device can keep up with the data stream. This also makes asynchronous transfer suitable for applications where character data is generated irregularly.
• Disadvantages of Asynchronous Data Transfer
• There are also some disadvantages to using asynchronous data transfer in computer organization:
• The success of these transmissions depends on the start bits and their recognition. Unfortunately, these are easily susceptible to line interference, which can cause the bits to be corrupted or distorted.
• A large portion of the transmitted data is used for control and identification header bits, carrying no useful information related to the transmitted data. This invariably means that more data packets need to be sent.
Mode of Transfer
• Data transfer between the CPU and the I/O devices may be done in different modes.
• Data transfer to and from the peripherals may be done in any of three possible ways:
• Programmed I/O
• Interrupt-initiated I/O
• Direct memory access (DMA)
I/O Commands
• Whenever the processor encounters an I/O-related instruction, it issues two things to execute it: an I/O command and an address, which are placed on the bus and decoded by every I/O module connected to the system. The I/O module addressed by the processor recognizes the address and responds to the issued I/O command.
• The I/O commands the processor issues to the I/O module can be of four types:
• Control: This command activates the I/O module addressed by the processor and directs it to the task it has to perform. It can be customized depending on the type of peripheral.
• Test: This command tests the status of the I/O module and its peripherals to ensure that the addressed peripheral is powered on and available for the task. It also tests whether the most recent I/O operation completed successfully or whether any error occurred.
• Read: This command makes the I/O module fetch the data from the corresponding peripheral and store it in its internal buffer. The I/O module can then place this data on the data bus on the processor's demand.
• Write: This command makes the I/O module accept data from the data bus and transmit it to the corresponding peripheral.
Programmed I/O
• In this case, the I/O device does not have direct access to the memory unit.
• A transfer from an I/O device to memory requires the execution of several instructions by the CPU, including an input instruction to transfer the data from the device to the CPU and a store instruction to transfer the data from the CPU to memory.
• In programmed I/O, the CPU stays in a program loop until the I/O unit indicates that it is ready for data transfer.
• This is a time-consuming process, since it needlessly keeps the CPU busy. This situation can be avoided by using an interrupt facility.
Schemes Using Programmed I/O
• Memory-mapped I/O uses the same address space to address both main memory and I/O devices.
• The memory and registers of the I/O devices are mapped to (associated with) address values, so a memory address may refer either to a portion of physical RAM or to the memory and registers of an I/O device. Thus, the CPU instructions used to access memory can also be used to access devices.
• With memory-mapped I/O, the processor accesses memory and I/O using a single address space, over the same address, data, and control buses. The same set of machine instructions therefore addresses both memory and I/O.
Port-mapped I/O
With isolated I/O, the address space for memory is isolated from the address space for I/O. Though the processor uses the same data and address lines for memory and I/O devices, it uses separate control lines for memory and I/O. On the 8085, for example, the instructions are:
IN port-address (opcode DB, 2 bytes)
OUT port-address (opcode D3, 2 bytes)
Port-mapped I/O
• The IN or OUT mnemonic is followed by an 8-bit port address, so we can have 2^8 = 256 input ports and 256 output ports.
• IN and OUT are both 2-byte instructions.
Need?
• With programmed I/O, this entire functioning is regulated by a program. The action of transferring typed characters from the keyboard to memory and then to the display module must happen at the right time.
• The speed at which characters can be transmitted and displayed on the display module is still much slower than the speed of the processor.
• So, to overcome the difference in speed between the processor and the I/O device, a mechanism must be implemented to synchronize the transfer of data between the processor and the I/O modules.
Interrupt-initiated I/O
• Whenever it is determined that the device is ready for data transfer, it initiates an interrupt request signal to the computer. Upon detection of an external interrupt signal, the CPU momentarily stops the task it was performing, branches to the service program to process the I/O transfer, and then returns to the task it was originally performing.
• Note: Both programmed I/O and interrupt-driven I/O require the active intervention of the processor to transfer data between memory and the I/O module, and any data transfer must traverse a path through the processor. Thus, both forms of I/O suffer from two inherent drawbacks:
• The I/O transfer rate is limited by the speed with which the processor can test and service a device.
• The processor is tied up managing each I/O transfer; a number of instructions must be executed for every transfer.
Interrupt-initiated I/O
• Interrupt-driven I/O is an alternative scheme for dealing with I/O. It is a way of controlling input/output activity whereby a peripheral or terminal that needs to make or receive a data transfer sends a signal, causing a program interrupt to be set.
• At a time appropriate to the priority level of the I/O interrupt relative to the total interrupt system, the processor enters an interrupt service routine. The function of the routine depends on the system of interrupt levels and priorities implemented in the processor. The interrupt technique requires more complex hardware and software, but it makes far more efficient use of the computer's time and capacity.
With multiple I/O modules, how should the processor determine which device issued the interrupt signal?
How does the processor decide which module to process when multiple interrupts have occurred?
Interrupt-initiated I/O
• Multiple Interrupt Lines
As the name suggests, multiple interrupt lines are provided between the processor and the I/O modules, allowing multiple modules to be handled at the same time.
• The processor picks the interrupt line with the highest priority.
• Software Poll
Whenever an interrupt is detected by the processor, it branches to an interrupt service routine that polls each I/O module to determine the exact interrupting module. The processor raises a poll, which could take the form of a command line; the address of the I/O module that responds to the poll is then placed on the address lines.
• The priority is determined by the order in which the modules are polled.
Interrupt-initiated I/O
• Daisy Chain (Hardware Poll, Vectored)
This is a hardware poll. The interrupt-acknowledge line is daisy-chained through all the modules. Whenever there is an interrupt, the processor sends out an interrupt acknowledge, which propagates through the series of I/O modules until it reaches a requesting module.
• The priority is determined by the module's position in the chain.
Interrupt-initiated I/O
• Bus Arbitration (Vectored)
The last method, which also utilizes vectored interrupts, is bus arbitration. Here the I/O module gains control of the bus before requesting the interrupt, so only one module can raise the request at a time.
Interrupt-initiated I/O
• When there are multiple interrupts at the same time, priorities must be assigned. Each of these four methods has its own way of assigning priorities, as described above.
Direct Memory Access
• The data transfer between fast storage media such as a magnetic disk and the memory unit is limited by the speed of the CPU.
• We can instead allow the peripherals to communicate directly with the memory over the memory buses, removing the intervention of the CPU.
• This data transfer technique is known as DMA, or direct memory access.
• During DMA the CPU is idle and has no control over the memory buses.
• The DMA controller takes over the buses to manage the transfer directly between the I/O devices and the memory unit.
DMA
• Two control signals, Bus Request (BR) and Bus Grant (BG), are used to facilitate the DMA transfer. The bus request input is used by the DMA controller to request control of the buses from the CPU. When the BR signal is high, the CPU finishes executing the current instruction, places the address, data, read, and write lines in the high-impedance state, and asserts the bus grant signal. The DMA controller then takes control of the buses and transfers data directly between memory and I/O without processor interaction. When the transfer is complete, the DMA controller lowers the bus request signal; in response, the CPU deasserts the bus grant and takes back control of the address, data, read, and write lines.
DMA Controller
• When the bus grant signal is 0, the CPU communicates through the data bus to read from or write to the DMA registers. When the bus grant is 1, the DMA controller takes control of the buses and transfers data between the memory and I/O.
DMA Transfer
• A DMA request signal is sent from the I/O device to the DMA controller.
• The DMA controller sends a bus request signal to the CPU, in response to which the CPU suspends its current instructions and initializes the DMA controller by sending the following information:
• The starting address of the memory block where the data is available (for read) or where data is to be stored (for write)
• The word count, i.e., the number of words in the memory block
• A control word to specify the mode of transfer
• The CPU then sets the bus grant to 1 so that the DMA controller can take control of the buses.
• The DMA controller sends a DMA acknowledge signal, in response to which the peripheral device puts a word on the data bus (for write) or receives a word from the data bus (for read).
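A minimal sketch (not a real device driver; register names and values are illustrative) of the CPU programming a DMA controller before handing over the buses:

```python
class DMAController:
    def __init__(self):
        self.address_register = 0   # starting address of the memory block
        self.word_count = 0         # number of words to transfer
        self.control_register = 0   # mode of transfer (read/write, etc.)
        self.bus_grant = 0          # 1 once the CPU releases the buses

def cpu_initialize_dma(dma, start_address, count, mode):
    dma.address_register = start_address   # where data is read from / stored to
    dma.word_count = count                 # words in the memory block
    dma.control_register = mode            # transfer mode
    dma.bus_grant = 1                      # CPU hands the buses to the controller

dma = DMAController()
cpu_initialize_dma(dma, start_address=0x4000, count=256, mode=0b01)
print(hex(dma.address_register), dma.word_count, dma.bus_grant)
```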
I/O Processors
CPU-IOP Communication
THANK YOU