COA
GATE फर्रे

1. Introduction to COA
1.1. Types of computers
● Embedded computers
● Personal computers: desktop computers, workstation computers, portable computers etc.
● Servers & Enterprise systems
● Supercomputers

1.2. Components of Computer
1.2.1. CPU (Central Processing Unit)
a. ALU (Arithmetic Logic Unit)
The ALU performs the required micro-operations for executing the instructions.

b. Control Unit
The control unit supervises the transfer of information among the registers and instructs the ALU as to which operation to perform.

c. Registers
● Registers are small, high-speed storage locations within the CPU used for temporary storage, control, and data manipulation during program execution.
● They hold binary information and are crucial for efficient CPU operation.

Types of CPU Registers
01. PC (Program Counter)
● The processor keeps track of the address of the memory location containing the next instruction to be fetched using the program counter.
● After fetching an instruction, the contents of the PC are updated to point to the next instruction in the sequence.

02. IR (Instruction Register)
Holds the fetched instruction in a predefined format. The fetch-and-decode sequence is:
1. Fetch the contents of the memory location pointed to by the PC and load them into the IR, i.e. IR ← [[PC]].
2. Assuming that the memory is byte addressable and one word is 32 bits (4 B), increment the contents of the PC by 4, so PC ← [PC] + 4.
3. Decode the instruction to understand the operation & generate the control signals necessary to carry out the operation.

03. Accumulator
Temporary storage location for arithmetic & logical operations.

04. MAR (Memory Address Register)
Works with the memory bus to fetch/store data at a specific address.

05. MDR (Memory Data Register, also called MBR)
a. Temporarily holds data being transferred to/from memory. Contents of the MDR are directly connected to the data bus.
b. Acts as a buffer between CPU and memory.

1.2.2. Memory
● Main/Primary Memory
● Secondary Memory

1.2.3. I/O
a. Input Unit
The process of receiving data from an external source (like a user typing on a keyboard or a sensor reading data) and making it available to the computer's internal components for processing.

b. Output Unit
The process of sending data from the computer's internal components to an external device or system (like displaying information on a monitor, printing to a printer, or sending data over a network).
1.3. Big-endian & Little-endian Assignments
● The name big-endian is used when the lower byte addresses are used for the more significant bytes of the word.
● The name little-endian is used when the lower byte addresses are used for the less significant bytes of the word.

1.4. Memory Addressing
If the memory is byte addressable, then byte locations have addresses 0, 1, 2, ...; if the memory is word addressable and the length of each word is 32 bits, then successive words are located at addresses 0, 4, 8, 12, ..., with each word consisting of 4 bytes. (1 B = 8 bits)

1.5. Instruction Cycle
1.5.1. Instruction Fetch (IF)
● PC → MAR: The Program Counter (PC) holds the address of the next instruction; it is transferred to the Memory Address Register (MAR).
● Read Memory: The control unit issues a read; memory returns the instruction into the Memory Buffer Register (MBR).
● MBR → IR: The Instruction Register (IR) loads the instruction.
● PC ← PC + 1: PC is incremented to point to the following instruction.

1.5.2. Instruction Decode (ID)
● IR → Control Unit: The opcode field is sent to the control unit for interpretation.
● Operand Fetch: If needed, source operand addresses are loaded into MAR; memory or the register file is accessed, placing data into temporary registers.
● Register File Access: Control signals select the appropriate registers; operands are read into internal CPU registers.

1.5.3. Execute (EX)
● ALU Operation: The arithmetic/logic unit performs the operation specified (e.g., add, subtract, logical AND/OR, shift).
● Address Calculation: For memory-reference instructions, the effective address is computed here.
1.5.4. Memory Access (MEM)
Data Read/Write:
● Load: If it's a load instruction, MAR → memory → MBR → destination register.
● Store: If it's a store instruction, source register → MBR → memory at MAR.

1.5.5. Write-Back (WB)
● Result → Register: The ALU or MBR result is written back into the register file or PC (for branches/jumps).
● Flags Update: Condition codes (zero, carry, overflow) are updated if needed.

Note:
● The fetch-execute cycle repeats to execute the next instructions until a halt instruction is executed.
● The Halt (HLT) instruction stops the execution of further instructions until an interrupt or reset signal is received.
● While halted, the CPU may perform minimal operations like memory refresh to maintain system integrity.
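Putting Sections 1.5.1-1.5.5 together, the whole cycle can be condensed into a few lines. This is a minimal sketch in Python, assuming a toy word-addressable memory and an invented three-instruction ISA (LOAD/ADD/HALT); none of these names come from the notes above.

```python
# Minimal fetch-decode-execute loop; memory maps addresses to either
# (opcode, operand-address) tuples or plain data words.
memory = {0: ("LOAD", 100), 1: ("ADD", 101), 2: ("HALT", None),
          100: 7, 101: 5}                 # program + data

pc, acc, running = 0, 0, True
while running:
    ir = memory[pc]                # IF: MAR <- PC, read memory, IR <- MBR
    pc += 1                        # PC <- PC + 1 (word addressable)
    op, addr = ir                  # ID: decode opcode and operand address
    if op == "LOAD":               # MEM: fetch operand into the accumulator
        acc = memory[addr]
    elif op == "ADD":              # EX/WB: ALU adds operand, result to ACC
        acc += memory[addr]
    elif op == "HALT":             # HLT stops further instruction execution
        running = False

print(acc)                         # 12
```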
2. Machine Instructions & Addressing Modes
2.1. Instructions Format
A computer will usually have a variety of instruction code formats. It is the function of the control unit within the CPU to interpret each instruction code and provide the necessary control functions needed to process the instruction. The most common fields found in instruction formats are:
1. An operation code field (opcode) that specifies the operation to be performed.
2. An address field that designates a memory address or a processor register.
3. A mode field that specifies the way the operand or the effective address is determined.

2.2. Types of Instructions
2.2.1. Three-Address Instructions
Each address field specifies either a register or an operand. The advantage of the three-address format is that it results in short programs when evaluating arithmetic expressions.

2.2.2. Two-Address Instructions
Each address field can specify either a register or a memory word. The program to evaluate X = (A + B) * (C + D) is sketched below.
Note: The MOV instruction moves or transfers the operands to and from memory and processor registers.

2.2.3. One-Address Instructions
One-address instructions use an implied accumulator (AC) register for all data manipulation. The program to evaluate X = (A + B) * (C + D) is sketched below.
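The original example programs were figures; as a stand-in, here is a hedged reconstruction of the usual textbook sequences, with the one-address version run by a tiny Python simulator. The mnemonics (MOV, LOAD, MUL, ...) are the conventional ones, not copied from this document.

```python
# Conventional programs for X = (A + B) * (C + D):
#
# Three-address:  ADD R1, A, B     Two-address:  MOV R1, A    MOV R2, C
#                 ADD R2, C, D                   ADD R1, B    ADD R2, D
#                 MUL X, R1, R2                  MUL R1, R2   MOV X, R1
#
# One-address (implied accumulator), executed below:
program = [("LOAD", "A"), ("ADD", "B"), ("STORE", "T"),   # T <- A + B
           ("LOAD", "C"), ("ADD", "D"),                   # AC <- C + D
           ("MUL", "T"), ("STORE", "X")]                  # X <- AC * T

mem = {"A": 2, "B": 3, "C": 4, "D": 1}
ac = 0                                    # the implied AC register
for op, addr in program:
    if op == "LOAD":    ac = mem[addr]
    elif op == "ADD":   ac += mem[addr]
    elif op == "MUL":   ac *= mem[addr]
    elif op == "STORE": mem[addr] = ac

print(mem["X"])                           # (2+3)*(4+1) = 25
```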
2.2.4. Zero-Address Instructions
Used in stack-organised computers with PUSH & POP instructions. A program for X = (A + B) * (C + D) is sketched below.
Note: To evaluate arithmetic expressions in a stack computer, it is necessary to convert the expression into Reverse Polish notation.
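In Reverse Polish notation, X = (A + B) * (C + D) becomes AB+CD+*. A minimal stack-machine sketch in Python follows; the PUSH/POP mnemonics are the usual textbook style and are assumed, since the original program figure is not reproduced here.

```python
# Zero-address evaluation of AB+CD+* on an operand stack.
program = [("PUSH", "A"), ("PUSH", "B"), ("ADD", None),
           ("PUSH", "C"), ("PUSH", "D"), ("ADD", None),
           ("MUL", None), ("POP", "X")]

mem, stack = {"A": 2, "B": 3, "C": 4, "D": 1}, []
for op, addr in program:
    if op == "PUSH":
        stack.append(mem[addr])          # operand onto top of stack
    elif op == "POP":
        mem[addr] = stack.pop()          # store result to memory
    elif op == "ADD":
        b, a = stack.pop(), stack.pop()  # operands are implied (TOS)
        stack.append(a + b)
    elif op == "MUL":
        b, a = stack.pop(), stack.pop()
        stack.append(a * b)

print(mem["X"])                          # 25
```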
2.2.5. RISC Instructions
Memory access is restricted to LOAD & STORE instructions when interacting between memory & CPU. All other instructions like ADD, MUL are executed within the registers of the CPU without referring to memory, so X = (A + B) * (C + D) is evaluated by first loading A, B, C, D into registers and then operating register-to-register.

2.3. Addressing Modes
The way the operands are chosen during program execution is dependent on the addressing mode of the instruction. Its types are:
● Implied Mode
● Immediate Mode
● Direct Addressing Mode/Absolute Addressing Mode
● Indirect Address Mode
● Register Mode
● Register Indirect Mode
● Autoincrement or Autodecrement Mode
● Relative Address Mode
● Indexed Addressing Mode
● Base Register Addressing Mode

2.3.1. Implied Mode
In this mode the operands are specified implicitly in the definition of the instruction.
a. All register reference instructions that use an accumulator are implied-mode instructions.
b. Zero-address instructions in a stack-organized computer are implied-mode instructions since the operands are implied to be on top of the stack.

2.3.2. Immediate Mode
In this mode the operand is specified in the instruction itself. An immediate-mode instruction has an operand field rather than an address field.
2.3.3. Direct Address Mode/Absolute Address Mode
In this mode the effective address is equal to the address part of the instruction. The operand resides in memory and its address is given directly by the address field of the instruction. In a branch-type instruction the address field specifies the actual branch address.

2.3.4. Indirect Address Mode
In this mode the address field of the instruction gives the address where the effective address is stored in memory. Control fetches the instruction from memory and uses its address part to access memory again to read the effective address.

2.3.5. Register Mode
In this mode the operands are in registers that reside within the CPU. The particular register is selected from a register field in the instruction. A k-bit field can specify any one of 2ᵏ registers.

2.3.6. Register Indirect Mode
In this mode the instruction specifies a register in the CPU whose contents give the address of the operand in memory. The advantage of register indirect mode is that the address field of the instruction uses fewer bits to select a register than would be needed to specify a memory address directly.

2.3.7. Autoincrement or Autodecrement Mode
This is similar to the register indirect mode except that the register is incremented or decremented after its value is used to access memory.

Note: The Effective Address (E.A.) is the memory location calculated based on the addressing mode specified in the instruction, e.g.
E.A. = address part of instn + content of CPU register
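The contrast between these modes is easiest to see numerically. Below is a hedged Python sketch; the register file, memory contents and the instruction's address field are invented for illustration.

```python
# How the EA / operand is obtained in modes 2.3.3-2.3.7.
mem  = {500: 800, 800: 42, 900: 99}   # toy memory
regs = {"R1": 900}                    # toy register file
addr_field = 500                      # address part of the instruction

# Direct/absolute: EA is the address field itself.
ea_direct = addr_field                      # EA = 500
# Indirect: the address field points at a location holding the EA.
ea_indirect = mem[addr_field]               # EA = 800, operand = 42
# Register: operand sits in a CPU register, no memory EA at all.
operand_register = regs["R1"]               # 900
# Register indirect: the register holds the operand's memory address.
ea_reg_indirect = regs["R1"]                # EA = 900, operand = 99
# Autoincrement: use the register as EA, then bump it by the word size.
ea_auto = regs["R1"]; regs["R1"] += 4       # EA = 900, R1 becomes 904

print(ea_direct, ea_indirect, ea_reg_indirect, ea_auto, regs["R1"])
```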
2.3.8. Relative Address Mode
In this mode the content of the program counter is added to the address part to obtain the E.A. The address part is a signed number (2's complement) which can be either positive or negative.
EA = Address Part (offset) + PC value
It results in a shorter address field, since the relative address can be specified with a smaller number of bits compared to the entire memory address. It is generally used in branch-type instructions.

2.3.9. Indexed Addressing Mode
● In this mode the content of an index register is added to the address part to obtain the E.A.
● The index register contains an index value.
● The address field of the instruction defines the beginning address of a data array in memory.
EA = Address Part (base address of data array) + Index register value (index value)

2.3.10. Base Register Addressing Mode
● In this mode the content of a base register is added to the address part of the instruction.
● The base register is assumed to hold the base address.
● The address field gives the displacement relative to this base address.
EA = Address Part (displacement/offset) + Base register value (base address)
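The three displacement-style formulas above differ only in which register supplies the second term, as this quick numeric sketch shows (the PC, index and base values are made up):

```python
# EA arithmetic for modes 2.3.8-2.3.10.
offset = 24                    # address part of the instruction
pc, x, base = 1000, 8, 4000    # PC value, index value, base address

ea_relative = offset + pc      # 2.3.8: offset may be negative -> 1024
ea_indexed  = offset + x       # 2.3.9: offset = array start    -> 32
ea_based    = offset + base    # 2.3.10: offset = displacement  -> 4024
print(ea_relative, ea_indexed, ea_based)
```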
2.4. CISC vs RISC
CISC (Complex Instruction Set Computer)
● A large number of instructions, typically from 100 to 250 instructions
● Some instructions that perform specialized tasks and are used infrequently
● A large variety of addressing modes, typically from 5 to 20 different modes
● Variable-length instruction formats
● Instructions that manipulate operands in memory

RISC (Reduced Instruction Set Computer)
● Relatively few instructions
● Relatively few addressing modes
● Memory access limited to load and store instructions
● All operations done within the registers of the CPU
● Fixed-length, easily decoded instruction format
● Single-cycle instruction execution, CPI = 1
● Hardwired rather than microprogrammed control
● A relatively large number of registers in the processor unit
● Use of overlapped register windows to speed up procedure call and return
● Efficient instruction pipeline, CPI = 1
● Compiler support for efficient translation of high-level language programs into machine language programs

3. ALU, Data-Path and Control Unit
3.1. ALU
The ALU is a digital circuit that provides arithmetic and logic operations. It is the fundamental building block of the CPU of a computer.

3.2. Datapath
The CPU has 2 sections: Data Section (Data Path) + Control Section (Control Path).
Data Path = Registers + ALU + Interconnecting bus
In the datapath figure, the data and address lines of the external memory bus are connected to the internal processor bus via the memory data register (MDR) and the memory address register (MAR) respectively.

3.2.1. Types of Datapath
a. One bus datapath
● Structure: A single internal bus connects all registers, ALU inputs, and memory data lines.
● Operation:
○ Only one data transfer or ALU operation can occur in any given clock cycle.
○ Uses multiplexers to select which register drives the bus and where the bus feeds.
● Advantages:
○ Minimal hardware (only one bus, fewer multiplexers).
○ Low cost.
● Disadvantages:
○ Poor parallelism → low throughput.
○ Longer instruction execution time due to sequential transfers.

b. Two bus datapath
● Structure:
○ Two internal buses (Bus A and Bus B) available for data movement.
○ ALU takes inputs from both buses and writes the result back to one of them.
● Operation:
○ Can perform one register-to-register transfer and one ALU operation concurrently in the same cycle.
○ E.g., load register R₁ → Bus A, register R₂ → Bus B → ALU → result back to R₃ on Bus A.
● Advantages:
○ Better parallelism than single-bus → higher instruction throughput.
○ Still relatively simple compared to three-bus.
● Disadvantages:
○ Increased hardware cost (double buses, more multiplexers).
○ Still limited: only one ALU operation per cycle.

c. Three bus datapath
● Structure:
○ Three distinct internal buses (Bus A, Bus B, Bus C).
○ Two source operands and one destination can be driven simultaneously.
● Operation:
○ Enables two reads and one write to registers in a single cycle.
○ E.g., read R₁ → Bus A, R₂ → Bus B → ALU → write result to R₃ via Bus C.
● Advantages:
○ Maximum data transfer parallelism for simple register-to-register and ALU ops.
○ Shortest instruction cycle times for register-based operations.
● Disadvantages:
○ Highest hardware overhead (three buses, extensive multiplexing).
○ Greater control complexity and cost.

3.3. Control Unit
a. The control unit supervises the transfer of information among the registers and instructs the ALU as to which operation to perform.
b. The function of the control unit in a digital computer is to initiate sequences of micro-operations.
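Point b. becomes concrete if one such sequence of micro-operations is listed. Below is an illustrative sketch of the classic single-bus control-step sequence for ADD R3, R1, R2 (R3 ← R1 + R2), in the style of Hamacher et al.; the signal names (PCout, MARin, Zin, WMFC, ...) are assumptions, not taken from these notes.

```python
# Control steps the control unit would issue, one clock step per row.
control_steps = [
    # fetch phase
    ("T1", ["PCout", "MARin", "Read", "Select4", "Add", "Zin"]),  # Z <- PC+4
    ("T2", ["Zout", "PCin", "WMFC"]),                             # update PC
    ("T3", ["MDRout", "IRin"]),                                   # IR <- MDR
    # execute phase for ADD R3, R1, R2
    ("T4", ["R1out", "Yin"]),                                     # Y <- R1
    ("T5", ["R2out", "SelectY", "Add", "Zin"]),                   # Z <- Y + R2
    ("T6", ["Zout", "R3in", "End"]),                              # R3 <- Z
]
for step, signals in control_steps:
    print(step, ", ".join(signals))
```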
Fig. Types of Control Unit

3.4.1. Microinstruction (Control Word)
Each word in control memory contains within it a microinstruction. The microinstruction specifies one or more micro-operations for the system.

3.4.2. Microprogram
A sequence of microinstructions. An instruction can be executed by performing one or more of the following operations in some specified sequence:
a. Transfer a word of data from one processor register to another or to the ALU.
b. Perform an arithmetic or a logic operation and store the result in a processor register.
c. Fetch the contents of a given memory location and load them into a processor register.
d. Store a word of data from a processor register into a given memory location.

3.4.3. Hardwired Control Unit
1. Implementation
a. Fixed combinational and sequential logic (decoders, counters, gating circuits).
b. Control signals expressed as Boolean SOP (Sum-of-Products) functions of:
➔ Control step counter outputs (T₁, T₂, …).
➔ Instruction register bits (opcode).
➔ Condition codes & external flags (e.g. MFC, interrupt request).
2. Characteristics
➔ Speed: Very fast (single-cycle micro-operations).
➔ Complexity: Logic grows exponentially with ISA complexity.
➔ Flexibility: Difficult to modify or extend once designed.
➔ Use Cases: Simple RISC processors with limited instruction sets.

3.4.4. Microprogrammed Control Unit
1. Implementation
a. Control signals generated by microinstructions stored in Control Memory (CM).
b. Each microinstruction (control word) encodes one or more control signals.
c. Sequencing via a Micro-Program Counter (µPC) & Address Sequencer.
2. Microinstruction Fields
F1, F2, F3: micro-operation fields; CD: condition for branching; BR: branch field; AD: address field.
3. Characteristics
➔ Speed: Slower than hardwired (multiple memory accesses).
➔ Flexibility: Easy to modify control sequences (update the micro-program).
➔ Complexity Handling: Suited for complex ISAs (CISC) with many instructions.
➔ Storage: Requires ROM/RAM for control memory (2K-10K microinstructions).

Horizontal Microprogramming
● Control Word: One bit per control signal → maximal parallelism.
● Word Width: Very wide (one bit × number of signals).
● Decoder: None (signals directly driven).

Vertical Microprogramming
● Control Word: Encoded fields (k bits select 2ᵏ signals).
● Word Width: Narrower, but requires decoders.
● Parallelism: Limited (typically one group executed per cycle).

3.5. Booth's Algorithm
3.5.1. Goal
Efficiently multiply two signed binary integers (two's complement) with fewer add/subtract operations by encoding runs of 1's in the multiplier.

3.5.2. Best Case and Worst Case Occurrence
The best case is when there are large blocks of consecutive 1's and 0's in the multiplier, so that a minimum number of additions and subtractions takes place.
The worst case is when there are pairs of alternate 0's and 1's, either 01 or 10, in the multiplier, so that the maximum number of additions and subtractions is required.

3.5.3. Key Idea
➢ Examine pairs of bits of the multiplier (current LSB and an extra "previous" bit); decide whether to add, subtract, or do nothing with the multiplicand.
➢ Shift right each cycle, accumulating the partial product in a combined register.
3.5.4. Registers & Initialization

| Register | Width | Initial Content |
| A | n-bit | 0…0 |
| Q | n-bit | Multiplier |
| M | n-bit | Multiplicand |
| Q₋₁ | 1 bit | 0 |

(n = number of bits in Q and M)

ACC = [A (n bits) | Q (n bits) | Q₋₁ (1 bit)] starts as [0…0 | Q_initial | 0]

3.5.5. Step-by-Step Algorithm
1. Initialize A = 0, Q = multiplier, Q₋₁ = 0, count = n.
2. Repeat until count = 0:
a. Examine (Q₀, Q₋₁) and modify A per the decision rule: if (Q₀, Q₋₁) = (1, 0), A ← A − M; if (0, 1), A ← A + M; if (0, 0) or (1, 1), leave A unchanged.
b. Right-shift the triple [A | Q | Q₋₁]:
i. New Q₋₁ ← old Q₀
ii. New Q ← old Q shifted right by one, with old A₀ (the least significant bit of A) entering the most significant bit of Q
iii. New A ← sign-extended (arithmetic) right shift of old A
c. Decrement count by 1.
3. The result is in [A | Q] (2n bits).

3.5.6. Example
Multiply M = +3 (0011₂) by Q = −4 (1100₂) using 4-bit registers:

| Cycle | Q₀Q₋₁ | Action | [A | Q | Q₋₁] after shift |
| init | - | - | 0000 1100 0 |
| 1 | 00 | none | 0000 0110 0 |
| 2 | 00 | none | 0000 0011 0 |
| 3 | 10 | A ← A − M | 1110 1001 1 |
| 4 | 11 | none | 1111 0100 1 |

∴ Final [A | Q] = 11110100₂ = −12₁₀, which is 3 × (−4).
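The same recipe in runnable form: a Python sketch of 3.5.5, where the bit-fiddling details (masks, arithmetic shift) are one possible implementation rather than anything specified by the notes.

```python
def booth_multiply(m, q, n):
    """Multiply two n-bit two's-complement integers with Booth's algorithm."""
    mask = (1 << n) - 1
    a, q, q_1 = 0, q & mask, 0
    for _ in range(n):
        pair = (q & 1, q_1)
        if pair == (1, 0):            # 10 -> A <- A - M
            a = (a - m) & mask
        elif pair == (0, 1):          # 01 -> A <- A + M
            a = (a + m) & mask
        # arithmetic right shift of the combined triple [A | Q | Q-1]
        q_1 = q & 1
        q = ((q >> 1) | ((a & 1) << (n - 1))) & mask   # old A0 -> MSB of Q
        a = ((a >> 1) | (a & (1 << (n - 1)))) & mask   # keep the sign bit
    result = (a << n) | q             # 2n-bit product [A | Q]
    if result & (1 << (2 * n - 1)):   # reinterpret as a signed value
        result -= 1 << (2 * n)
    return result

print(booth_multiply(3, -4, 4))       # -12, i.e. [A|Q] = 1111 0100
```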
4. Cache Memory Organisation
4.1. Introduction
● A small, fast SRAM buffer placed between the CPU and main memory.
● Holds copies of frequently accessed memory blocks (cache lines), exploiting temporal locality & spatial locality.
● Goal: Reduce average memory access time by satisfying most requests from the cache rather than slower DRAM.
● The CPU always generates MM (main memory) addresses, even when the access is served by the cache.
● The performance of a cache can be analysed with the following characteristics:
○ Cache size (small, in KBs)
○ Block or line size
○ No. of levels of cache
○ Cache mapping
○ Cache replacement policy
○ Cache updating scheme

4.2. Cache Organisation
➔ Cache Line (Block): Unit of transfer between cache and main memory.
➔ Fields in Address:
◆ Tag: Identifies which memory block is cached.
◆ Index: Selects a cache set or line.
◆ Offset: Chooses the byte/word within the cache line.
➔ Metadata:
◆ Valid Bit: Line contains valid data.
◆ Dirty Bit: Line has been written (for write-back).
Note:
1. CM block number = (MM block no.) modulo (no. of blocks/lines in cache)
2. No. of blocks = Cache size / Block size

4.3. Cache Mapping
4.3.1. Direct Mapping
The Main Memory (MM) address is divided into Tag, CM block number (index) and block offset fields.
● Index in direct mapping = CM block number
● Tag bits in direct mapping = MM address bits − log₂(cache size)
● MM block no. = Tag : CM block no. (concatenated)
Note:
1. Tag directory size (all mappings) = Number of blocks in cache × (tag + extra bits)
2. For a given cache size, block size and MM size, the tag is the same for both byte-addressable and word-addressable memory.

4.3.2. Set Associative Mapping
● CM set number = (MM block no.) % no. of sets in cache
● Index in set associative mapping = set number (set offset)
● Tag bits in K-way set associative mapping = MM address bits − log₂(cache size) + log₂K

4.3.3. Fully Associative Mapping
● Index in fully associative mapping = 0 bits
● Tag bits in fully associative mapping = MM address bits − log₂(block size)
● In fully associative mapping, tag = MM block no.
Note:
1. The size of the tag is maximum in fully associative & minimum in direct mapping.
2. The size of the index is minimum in fully associative & maximum in direct mapping.

4.4. Hardware Implementation
4.4.1. Direct Mapping
➢ Number of MUX for tag selection = Tag-bits
➢ Size of each MUX = Number of blocks : 1
➢ Number of comparators = 1
➢ Size of comparator = Tag-bits

4.4.2. K-way Set Associative Mapping
➢ Number of MUX for tag selection = K × Tag-bits
➢ Size of each MUX = Number of sets : 1
➢ Number of comparators = K
➢ Size of comparator = Tag-bits
➢ OR-gate = 1 (K-input OR gate)

4.4.3. Fully Associative Mapping
➢ Number of comparators = Number of blocks in cache
➢ Size of comparator = Tag-bits
➢ OR-gate = 1 (number-of-blocks-input OR gate)

Note: Hit Latency Time
➢ Direct mapping = MUX delay + comparator delay
➢ Set associative mapping = MUX delay + comparator delay + OR-gate delay
➢ Fully associative mapping = comparator delay + OR-gate delay
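The field-width arithmetic of 4.3 works out mechanically once the sizes are fixed. A short Python sketch, assuming a byte-addressable main memory and power-of-two sizes (the example numbers are invented):

```python
from math import log2

def cache_fields(mm_size, cache_size, block_size, k):
    """Return (tag, index, offset) bit widths; k = 1 is direct mapped,
    k = number of blocks is fully associative."""
    addr_bits   = int(log2(mm_size))
    offset_bits = int(log2(block_size))
    blocks      = cache_size // block_size
    sets        = blocks // k
    index_bits  = int(log2(sets)) if sets > 1 else 0
    tag_bits    = addr_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits

# 64 KB cache, 16 B blocks, 4 GB (32-bit) main memory:
print(cache_fields(2**32, 2**16, 16, 1))        # direct:      (16, 12, 4)
print(cache_fields(2**32, 2**16, 16, 4))        # 4-way:       (18, 10, 4)
print(cache_fields(2**32, 2**16, 16, 2**12))    # fully assoc: (28, 0, 4)
```

Note how the results match the formulas above: tag grows by log₂K going from direct to K-way, and the index disappears entirely in the fully associative case.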
4.5. Types of misses in cache
4.5.1. Compulsory/Cold Miss
The very first access to a block cannot be in the cache, so the block must be brought into the cache. These are also called first-reference misses.

4.5.2. Capacity Miss
Occurs if the cache cannot contain all the blocks needed during the execution of a program. Capacity misses occur because of blocks being discarded and later retrieved.

4.5.3. Conflict Miss
Occurs when multiple blocks compete for the same cache line (or set) under the chosen mapping, even though other lines are free.

Note:
1. To reduce conflict misses: increase associativity
2. To reduce cold misses: increase block size
3. To reduce capacity misses: increase cache size

4.6. Cache Replacement Policies
4.6.1. FIFO (First-In-First-Out)
Replaces the cache block that has been resident the longest (the oldest time stamp), irrespective of how often it has been referenced.

4.6.2. LRU (Least Recently Used)
Replaces the cache block whose most recent reference is furthest in the past, i.e. the block that has gone unused for the longest time.

4.7. Cache Write Policy
4.7.1. Write Through vs Write Back

| Feature | Write Through | Write Back |
| On Write Hit | Update both cache and main memory immediately. | Update only cache; mark line as dirty. |
| On Eviction | N/A (data already in memory) | If dirty, write entire block back to memory. |
| Memory Traffic | High (every write generates a memory write) | Lower (writes only on dirty-line evictions) |
| Data Consistency | Always consistent between cache & memory. | Memory stale until write-back occurs. |
| Hardware Needs | Simple; no dirty bits required. | Requires dirty bit per line and write-back logic. |
| Typical Use | L1 caches for simplicity & predictability | L2/L3 caches to reduce bus traffic |

4.7.2. Write Miss Handling
a. Write Allocate
On a write miss, fetch the block into cache, then perform the write (marking it dirty under write-back).
b. No-Write-Allocate
On a write miss, bypass the cache and write directly to main memory; the cache remains unchanged.
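The memory-traffic row of the table is easy to see with a toy count; the access pattern below is invented and assumes the line is already resident, so every write hits:

```python
# Ten writes to the same cache line under the two hit policies.
writes_to_same_line = ["W"] * 10

wt_memory_writes = len(writes_to_same_line)   # write-through: 10 memory writes
wb_dirty = False
for _ in writes_to_same_line:                 # write-back: just set dirty bit
    wb_dirty = True
wb_memory_writes = 1 if wb_dirty else 0       # one block write at eviction
print(wt_memory_writes, wb_memory_writes)     # 10 1
```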
4.7.3. Write Buffering
Reduces CPU stalls on write operations by enqueuing the write in a write buffer.

4.7.4. Summary

| Policy Combination | Pros | Cons |
| Write Through + No Allocate | Simple; consistent memory; no block pollution on writes. | High memory traffic; write misses still go to memory. |
| Write Through + Write Allocate | Subsequent reads of the block benefit from caching. | Still high memory traffic. |
| Write Back + Write Allocate | Lowest memory traffic; good for write-heavy workloads. | Complex; must track dirty lines; potential data staleness. |
| Write Back + No Allocate | Rarely used | Not practical; block writes aren't cached, so dirty bits go unused. |

5. Memory Organisation
5.1. Memory Hierarchy
Fig. Memory Hierarchy

5.2. Types of memory (Based on methods of accessing)
5.2.1. Sequential Access Memory
● Data is accessed in a fixed linear order.
● Example: Magnetic tape.
● Use Case: Archival storage (low cost, high capacity), not for random reads/writes.

5.2.2. Direct Access Memory
● Allows access to a record by first moving to a general area (track/sector), then sequentially to the exact record.
● Example: Hard disks, optical disks (CD/DVD).
● Use Case: File systems, databases; moderate access time, large capacity.

5.2.3. Random Access Memory (RAM)
● Uniform constant-time access to any location.
● Each address has a dedicated physical path (wired) for immediate read/write.
● Examples:
○ DRAM (Dynamic RAM)
○ SRAM (Static RAM)
● Use Case: Primary/main memory, CPU caches (fastest at their level).

5.2.4. Associative (Content-Addressable) Memory
● Retrieves data by content rather than by a specific address.
● All words are compared simultaneously; matching word(s) are returned.
● Example: Translation Lookaside Buffer (TLB) in virtual memory.
● Use Case: Fast lookups (e.g., cache tags, TLB), where the search key determines the fetch.
Note: Average Memory Access Time (AMAT) = Hit Time + Miss Rate × Miss Penalty

5.3. Difference b/w SRAM & DRAM
Fig. SRAM cell
Fig. DRAM cell
Note: DRAM consists of rows of cells, and
DRAM refresh time = no. of rows of cells in DRAM × 1 cell refresh time

Note:
1. Memory Access Rate = 1 / cycle time
2. Multiplication table for 2 n-bit unsigned numbers = 2²ⁿ × 2n bits
3. Addition table for 2 n-bit unsigned numbers = 2²ⁿ × (n + 1) bits

5.4. Types of Memory Access
5.4.1. Simultaneous Access
The CPU can access data simultaneously from all levels of memory.
Consider n levels; then
Tavg = H₁T₁ + (1−H₁)H₂T₂ + … + (1−H₁)(1−H₂)…(1−Hₙ₋₁)HₙTₙ
(where Hᵢ is the hit ratio and Tᵢ the access time of level i)

5.4.2. Hierarchical Access
Data comes from the other levels to Level 1, and then the CPU gets its access, as shown in the figure.
Consider n levels; then
Tavg = T₁ + (1−H₁)T₂ + (1−H₁)(1−H₂)T₃ + … + (1−H₁)(1−H₂)…(1−Hₙ₋₁)Tₙ

5.5. Locality Principles
5.5.1. Temporal Locality: Recently accessed data is likely to be reused soon.
5.5.2. Spatial Locality: Data near recently accessed addresses is likely to be accessed soon.
Note: Caches exploit both to deliver high hit rates.
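The AMAT note and the two Tavg formulas from 5.4 are just weighted sums, as this numeric sketch shows (the hit ratios and access times are invented):

```python
def amat(hit_time, miss_rate, miss_penalty):
    return hit_time + miss_rate * miss_penalty

def tavg_simultaneous(h, t):     # 5.4.1: sum of (1-H1)...(1-Hi-1) * Hi * Ti
    total, miss = 0.0, 1.0
    for hi, ti in zip(h, t):
        total += miss * hi * ti
        miss *= (1 - hi)
    return total

def tavg_hierarchical(h, t):     # 5.4.2: T1 + (1-H1)T2 + (1-H1)(1-H2)T3 + ...
    total, miss = 0.0, 1.0
    for hi, ti in zip(h, t):
        total += miss * ti
        miss *= (1 - hi)
    return total

# e.g. L1 hit 90% @ 2 ns, L2 hit 100% @ 20 ns:
print(amat(2, 0.10, 20))                         # 4.0 ns
print(tavg_simultaneous([0.9, 1.0], [2, 20]))    # 0.9*2 + 0.1*20 = 3.8 ns
print(tavg_hierarchical([0.9, 1.0], [2, 20]))    # 2 + 0.1*20 = 4.0 ns
```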
5.6. Memory representation
Eg: a 2¹⁴ × 32-bit memory means 2¹⁴ addresses and a 32-bit-wide word.

5.7. Interrupts
5.7.1. Introduction
An interrupt is a signal that causes the CPU to temporarily halt the current execution and jump to a specific service routine (ISR, i.e. Interrupt Service Routine) to handle an event (e.g., I/O completion, error, etc.).

5.7.2. Types of Interrupts
a. Vectored Interrupts
A unique ISR address is supplied (either directly or via a vector table) to reduce the time involved in the polling process.

b. Non-Vectored Interrupts
The CPU jumps to a general or fixed location (e.g., a predefined interrupt handler), and software determines the source.

c. Maskable Interrupts
● Interrupts that can be disabled or ignored by the CPU using a special flag or instruction.
● Used for lower-priority or non-critical tasks.
● Eg. INTR in 8085, I/O completion.
● Controlled By: Interrupt Enable/Disable instructions (EI, DI)

d. Non-Maskable Interrupts
● Cannot be disabled by the CPU; always get attention.
● Used for critical events (e.g., power failure, hardware fault).
● Eg. TRAP in 8085.
● Priority: Highest, as it overrides all other interrupts.

Note: Difference b/w Interrupts and Exceptions
● Exceptions are caused by software executing instructions, e.g. a page fault or an attempted write to a read-only page. An expected exception is a 'trap'; an unexpected one is a 'fault'.
● Interrupts are caused by hardware devices, e.g. a device finishes I/O or a timer fires.

6. Pipelining
6.1. What Is Pipelining?
● Pipelining is a technique of overlapping the execution of multiple instructions by dividing the processor's datapath into stages, each handling a part of the instruction.
● It increases instruction throughput without reducing the execution time of individual instructions.
6.2. Types of Pipelines
6.2.1. Linear Pipeline
Used for a single specific function.

6.2.2. Non-linear Pipeline
At a given stage, multiple parallel functional units or paths exist rather than a single straight chain.

6.2.3. Synchronous Pipeline
On a common clock, all registers transfer data to the next stages simultaneously.

6.2.4. Asynchronous Pipeline
Data moves between stages using handshaking signals rather than a common clock.

6.3. Performance of Pipeline
6.3.1. Total Execution Time
Suppose 5 instructions, T₁ through T₅, pass through a pipeline of four stages.
● Cycle 1: Stage 1 processes T₁
● Cycle 2: Stage 2 works on T₁ while Stage 1 begins T₂
● Cycle 3: Stage 3 takes on T₁, Stage 2 handles T₂, and Stage 1 starts T₃
● Cycle 4: Stage 4 completes T₁; meanwhile, earlier stages are busy with T₂, T₃, and T₄
After these four cycles, the pipeline is "full," and thereafter it finishes one task per cycle.

More generally, for a pipeline with k stages and clock period tp running n instructions:
● The first instruction takes k·tp to traverse all stages.
● The remaining (n−1) instructions each complete in one additional tp.
Total time: k·tp + (n−1)·tp = (k + n − 1)·tp

Note:
1. To complete n tasks using a k-segment pipeline requires k + (n − 1) clock cycles.
2. tp = max(all segment/stage delays) + register delay

Note: Consider a non-pipeline unit that performs the same operation and takes time tn to complete each task. The total time required for n tasks is n·tn, where tn = sum of all segment delays.

6.3.2. Speedup (S)
a. The speedup of pipeline processing over equivalent non-pipeline processing is defined by the ratio
S = (n × tn) / ((k + n − 1) × tp)
b. In ideal conditions, i.e. n ≫ k, we ignore (k − 1); then S = tn / tp.
c. If we assume that the time it takes to process a task is the same in the pipeline and non-pipeline circuits, we will have tn = k·tp, and then S = k. In this scenario, maximum speedup is achieved.

6.4. Latency & Throughput
6.4.1. Latency
The time after which the next input can be given to the system.
● Pipeline Latency = tp (∵ after every cycle, a new input can enter)
● Non-Pipeline Latency = tn
6.4.2. Throughput
Number of tasks/instructions completed per unit time.
● Pipeline: in the ideal case = 1/tp
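The 6.3-6.4 formulas in runnable form; the stage count, instruction count and clock period below are invented example values:

```python
def pipeline_time(k, n, tp):          # first instn takes k*tp, rest 1 tp each
    return (k + n - 1) * tp

def speedup(k, n, tp, tn):            # S = n*tn / ((k + n - 1) * tp)
    return (n * tn) / pipeline_time(k, n, tp)

k, n, tp = 4, 100, 10                 # 4 stages, 100 instructions, 10 ns clock
tn = k * tp                           # assume equal total work: tn = k*tp
print(pipeline_time(k, n, tp))        # 1030 ns
print(speedup(k, n, tp, tn))          # 4000/1030 ~ 3.88, approaches k = 4
print(1 / tp)                         # ideal throughput: one instn per tp
```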
6.5. Instruction Pipelining
It overlaps the fetch-decode-execute phases of multiple instructions in a linear chain of stages, so that each clock cycle a different instruction occupies each stage.
Eg. Consider a four-stage pipeline as follows:
1. FI is the segment that fetches an instruction.
2. DA is the segment that decodes the instruction and calculates the effective address.
3. FO is the segment that fetches the operand.
4. EX is the segment that executes the instruction.
We assume the CPU has separate instruction and data memories, so it can do an instruction fetch (FI) and a data fetch (FO) at the same time.
At cycle 4 in our 4-stage pipeline:
● Stage EX is finishing Instruction 1
● Stage FO is loading the operand for Instruction 2
● Stage DA is decoding Instruction 3
● Stage FI is fetching Instruction 4
Now suppose Instruction 3 turns out to be a branch. As soon as it is decoded in DA (still in cycle 4), we stop feeding any new instruction from FI into DA until we know where the branch goes.
● If the branch is taken, we throw away what was in FI/DA and fetch the correct next instruction in cycle 7.
● If the branch is not taken, we keep and use the instruction that FI already fetched in cycle 4, and the pipeline keeps flowing as normal.

6.6. Pipeline Hazards/Conflicts
6.6.1. Structural Hazard/Resource Conflict
Caused by access to memory by two segments at the same time. Most of these conflicts can be resolved by using separate instruction & data memories.
Solution:
1. Incur stall cycles
2. Increase the no. of resources

Stall cycles
Also called a pipeline bubble, a stall is a clock cycle in which no new instruction completes, inserted to resolve a hazard or resource conflict.
Solution of the above figure (soln of structural hazard):
Introduce bubble/stall cycles which stall the pipeline, as in the adjacent figure. At t4, I4 is not allowed to proceed; rather, it is delayed. It could have been allowed in t5, but that again clashes with I2's RW. For the same reason, I4 is not allowed in t6 either. Finally, I4 can proceed (after stalling) in the pipe only at t7.

Note:
● Stalls because of a branch = i − 1 (if the condition is evaluated after the i-th stage)
● Even if the branch is not taken, stalls still occur due to branch instructions.
● The result of the branch condition evaluation is available after the execution phase of the branch instruction.

6.6.2. Data Hazard/Data Dependency
A data dependency occurs when one instruction needs data that is produced or used by another instruction, creating a potential conflict in a pipeline.
Eg. (figure: an ADD writing R3, followed by instructions that read R3, creating conflicts)
In the above case, the ADD instruction writes its result into register R3 in t5. If stalls are not introduced to delay the next SUB instruction, all three following instructions would be using the wrong data from R3, i.e. the value prior to the ADD result. The program goes wrong! The possible solutions are:

a. Hardware Interlocks
Hardware interlocks are built-in pipeline controls that detect hazards at runtime & automatically stall the pipeline to prevent incorrect execution.

b. Operand/Data Forwarding
Also called bypassing, this is a hardware technique that routes results directly from one pipeline stage to an earlier stage that needs them, avoiding a stall.
c. Delayed Load/No Operation
Delayed load is a compiler technique that avoids a pipeline stall after a load instruction by reordering subsequent instructions or inserting a "no-op" slot before using the loaded data.

d. Data Hazard Classifications
Assume 2 instructions i & j, with i occurring first.
➢ RAW (Read After Write)
i: R1 ← R2 + R3
j: R5 ← R1 + R4
If j reads a source before it is written by i, j gets an incorrect value.
➢ WAW (Write After Write)
i: R1 ← R2 + R3
j: R1 ← R4 * R5
If j writes the destination before it is written by i, the destination is left holding the wrong (older) value.
➢ WAR (Write After Read)
i: R1 ← R2 + R3
j: R2 ← R9 + R7
If j writes a destination before it is read by i, i reads an incorrect value.

Note:
1. RAW is a true dependency, while WAR & WAW are false dependencies, as they can be solved by register renaming.
2. WAR is known as anti-dependency.
3. WAW is known as output dependency.
4. Operand forwarding and register renaming cannot solve memory access dependencies.
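The three cases above reduce to simple set tests. A rough Python classifier, modelling each instruction as (destination, set of sources) with i assumed to come first; this representation is an assumption made for the sketch:

```python
def classify(i, j):
    """Return the hazards between earlier instruction i and later j."""
    i_dst, i_src = i
    j_dst, j_src = j
    hazards = []
    if i_dst in j_src:  hazards.append("RAW (true dependency)")
    if j_dst == i_dst:  hazards.append("WAW (output dependency)")
    if j_dst in i_src:  hazards.append("WAR (anti-dependency)")
    return hazards

add = ("R1", {"R2", "R3"})                   # i: R1 <- R2 + R3
print(classify(add, ("R5", {"R1", "R4"})))   # ['RAW (true dependency)']
print(classify(add, ("R1", {"R4", "R5"})))   # ['WAW (output dependency)']
print(classify(add, ("R2", {"R9", "R7"})))   # ['WAR (anti-dependency)']
```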
6.6.3. Control Hazard
A control hazard arises on branch instructions, when the pipeline does not know which instruction to fetch next until the branch is resolved.

a. Delayed Branch
A technique where the instruction immediately following a branch (the "delay slot") is always executed, regardless of whether the branch is taken. The compiler fills this slot with a safe instruction (or a NOP) to hide the branch-decision penalty and improve pipeline throughput without extra hardware.

b. Branch Prediction
Hardware logic that guesses the outcome of a conditional branch before it is resolved, allowing the pipeline to continue fetching along the predicted path and avoid stalls when the guess is correct.
7. Secondary Memory
7.1. Introduction
7.1.1. Platters & Surfaces
One or more circular platters coated with magnetic material; each surface has its own read/write head.

7.1.2. Track
A concentric circle on a platter surface. All tracks on different platters at the same radius form a cylinder, i.e. no. of cylinders = no. of tracks per surface.

7.1.3. Sector
● Smallest addressable unit on a track (typically 512 B)
● Contains user data + control information (format overhead)

7.1.4. Gaps
a. Inter-Sector Gap: Empty region between adjacent sectors.
b. Inter-Track Gap: Empty region between adjacent tracks.

Fig. Disk Structure

7.2. Types of Disk Constructions
7.2.1. Constant Track Capacity (CTC)
Variable recording density: every track holds the same number of sectors, so inner tracks are recorded more densely. The disk maintains constant angular velocity.

7.2.2. Variable Track Capacity (VTC)
Constant recording density: outer tracks hold more sectors than inner tracks; the disk spins slower at outer tracks to equalize the data rate.

Note: Each rotation of the disk covers 1 track.

7.3. Disk Performance
7.3.1. Seek Time (Ts)
Time to move the R/W head from its current track to the target track.

7.3.2. Rotational Latency (Tr)
Time for the platter to spin so the desired sector comes under the head.
Tr = (rotation time) / 2 on average

7.3.3. Transfer Time (Ttransfer)
Time to actually read/write the bits in the sector once it is under the head.

7.3.4. Overhead Time (Toverhead)
Controller delays (e.g., command processing).

7.3.5. Disk Capacity
Disk Capacity = 2 × no. of platters × tracks per surface × sectors per track × sector capacity
Note:
1. Tavg = Ts + Tr + Ttransfer + Toverhead (if given)
2. Sequentially stored N-sector transfer time = Seek Time + Rotational Latency + N × (1-sector Transfer Time)
3. Randomly stored N-sector transfer time = N × (Seek Time + Rotational Latency + 1-sector Transfer Time)
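A worked Python sketch of the 7.3 formulas and the note above; all the drive parameters here are invented for illustration:

```python
# Toy drive: 4 platters, 1000 tracks/surface, 64 sectors/track, 512 B sectors.
platters, tracks, sectors, sector_bytes = 4, 1000, 64, 512
rpm = 6000                                   # so one rotation takes 10 ms

capacity = 2 * platters * tracks * sectors * sector_bytes   # 7.3.5 formula
rotation = 60_000 / rpm                      # ms per rotation = 10
Tr = rotation / 2                            # avg rotational latency = 5 ms
Ttransfer = rotation / sectors               # one sector under the head
Ts = 4.0                                     # assumed average seek time (ms)

N = 8
sequential = Ts + Tr + N * Ttransfer         # sectors stored contiguously
random_    = N * (Ts + Tr + Ttransfer)       # each sector pays seek + latency
print(capacity)                              # 262144000 bytes
print(round(sequential, 3), round(random_, 3))   # 10.25 vs 73.25 ms
```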
7.4. Disk Addressing
Disk addressing is <c, h, s>, where c = cylinder number, h = surface number, s = sector number.
● Sector number for a given address = c × sectors per cylinder + h × sectors per track + s
● c = sector number / sectors per cylinder
● h = (sector number % sectors per cylinder) / sectors per track
● s = (sector number % sectors per cylinder) % sectors per track
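The forward and inverse conversions are each one line of integer arithmetic; a sketch with invented geometry (8 surfaces, 64 sectors per track):

```python
sectors_per_track = 64
surfaces = 8
sectors_per_cylinder = surfaces * sectors_per_track          # 512

def to_sector_number(c, h, s):
    return c * sectors_per_cylinder + h * sectors_per_track + s

def to_chs(num):
    c, rest = divmod(num, sectors_per_cylinder)
    h, s = divmod(rest, sectors_per_track)
    return c, h, s

n = to_sector_number(3, 5, 17)     # 3*512 + 5*64 + 17 = 1873
print(n, to_chs(n))                # 1873 (3, 5, 17)
```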
8. I/O Interface
8.1. Introduction
The input-output interface provides a method for transferring information between internal storage and external I/O devices. There are 3 ways that computer buses can be used to communicate with memory and I/O:
1. Use two separate buses, one for memory and the other for I/O.
2. Use one common bus for both memory and I/O but have separate control lines for each.
3. Use one common bus for memory and I/O with common control lines.

8.2. I/O Processor (IOP)
➔ Same memory bus for both CPU & IOP.
➔ The IOP communicates with I/O devices through a separate I/O bus with its own address, data and control lines.

8.3. Isolated I/O
Common address space and different control lines.

8.4. Memory Mapped I/O
➔ Takes a few addresses from the memory address space.
➔ Same address space for both memory and I/O.

Fig. Difference between Memory mapped I/O & Isolated I/O
8.5. I/O Interface
8.5.1. Programmed I/O
8.5.2. Interrupt driven I/O
8.5.3. Direct Memory Access (DMA)

8.5.1. Programmed I/O
With programmed I/O, data is exchanged between the processor and the I/O module. The processor executes a program that gives it direct control of the I/O operation. If the processor is faster than the I/O module, this is wasteful of processor time.
Time in programmed I/O = time to check status + time to transfer data

8.5.2. Interrupt driven I/O
a. Interrupt I/O is a data transfer technique where the CPU is interrupted by the I/O device only when it is ready (e.g., after completing a data transfer or requiring service).
b. It eliminates constant polling of the device, thus increasing CPU efficiency.
Time in interrupt I/O = interrupt overhead + time to service the interrupt

c. Daisy-Chaining Priority (Serial-Priority Interrupt)
The system consists of a serial connection of all devices that request an interrupt. The device with the highest priority is placed in the first position, followed by lower-priority devices.

8.5.3. Direct Memory Access
1. Definition & Purpose
➔ DMA is a mode of data transfer where a dedicated DMA controller moves data directly between an I/O device and main memory without CPU intervention for each word.
➔ It frees the CPU from high-volume data transfers, improving overall system throughput.

2. DMA Controller
➔ Address Register: Holds the current memory address for the transfer.
➔ Count Register: Holds the number of words/bytes left to transfer.
➔ Control Logic: Generates read/write and bus-request signals.
➔ Bus Arbitration Interface: Requests and releases the system bus from the CPU.

3. DMA Transfer Sequence
a. CPU Initialization:
● Loads the DMA controller's Address and Count registers.
● Issues a "Start DMA" command.
b. Bus Request: The DMA controller asserts Bus Request (BR); the CPU grants via Bus Grant (BG).
c. Data Transfer:
● The DMA takes control of the bus and reads from the device → writes to memory (or vice versa) automatically.
● The Address register increments/decrements; the Count register decrements.
d. Completion: When Count = 0, the DMA releases the bus and raises an interrupt to the CPU indicating the end of the transfer.

4. DMA Modes

| Mode | Description | CPU |
| Burst Mode | DMA transfers the entire block in one bus-hold period. | CPU paused for the burst |
| Cycle Stealing | DMA takes the bus for one transfer, then releases; later it 'steals' a memory cycle when the CPU is idle. | CPU slowed slightly |
| Interleaving Mode | DMA transfers only when the CPU is not using the bus. | Minimal interference |

Note:
1. % of time CPU blocked (burst mode)
2. % of time CPU blocked (cycle stealing)
3. Max data transferred using DMA without CPU intervention = 2ˣ − 1, where x = bits in the data count register
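The percentage formulas in the note were figures in the original. As a stand-in, here is one common way such cycle-stealing numbers are worked out; the device rate and memory cycle time are assumptions for illustration:

```python
# Cycle stealing: the DMA steals one memory cycle per word delivered.
device_rate = 9600 / 8          # device delivers 1200 words per second
memory_cycle = 1e-6             # each stolen cycle costs the CPU 1 us

blocked_fraction = device_rate * memory_cycle        # stolen time per second
print(f"{blocked_fraction:.4%} of CPU time stolen")  # 0.1200%

x = 16                          # bits in the DMA data-count register
print(2**x - 1)                 # max words per transfer (per note 3): 65535
```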