
Lecture Note: Computer Architecture – HND 1

NCC & SWD

COURSE LECTURER: MAL. MURTALA

FIRST SEMESTER LECTURE MATERIAL

1.1 Theoretical Concept of Computer Organization and Architecture

Computer Architecture
• Refers to the abstract model and design of a computer system.
• It includes the instruction set architecture (ISA), data formats, addressing modes, and
how instructions are executed.
• Example: The x86 or ARM instruction set used by modern CPUs.
Computer Organization
• Refers to the physical implementation of a computer system.
• It focuses on how hardware components like the control unit, ALU, memory, and I/O are
connected and work together.
• Example: How a processor executes an instruction using control signals and registers.

1.2 Three Major Components of a Computer System

1. Central Processing Unit (CPU)


• The brain of the computer.
• Executes instructions using:

o Control Unit (CU): Directs operations.

o Arithmetic Logic Unit (ALU): Performs calculations.

o Registers: Temporary storage for fast access.

2. Memory (Primary Storage)


• Stores data and instructions temporarily.
• Types:

o RAM (Volatile): Temporary storage while the system is running.

o ROM (Non-volatile): Permanent storage for firmware.


3. Input/Output (I/O) Devices
• Facilitate interaction between user and computer.

o Input: Keyboard, mouse, scanner.

o Output: Monitor, printer, speakers.

1.3 Structure and Functioning of CPU Units

1. Control Unit (CU)


• Directs the operation of the processor.
• Fetches instructions from memory, decodes them, and coordinates execution.

2. Arithmetic Logic Unit (ALU)


• Executes arithmetic (add, subtract) and logic operations (AND, OR, NOT).

3. Registers
• Small, fast storage units inside the CPU.
• Types:

o Accumulator (ACC): Stores intermediate results.

o Program Counter (PC): Holds address of the next instruction.

o Instruction Register (IR): Holds current instruction.

1.4 Intel and ARM Processor Evolution


Intel Processors:
• Early Years (1971): 4004 (4-bit CPU).
• Evolution: 8086 → 80286 → Pentium → Core series (i3, i5, i7, i9).
• Focus on high-performance CISC (Complex Instruction Set Computing).

ARM Processors:
• ARM1 (1985): Developed by Acorn Computers.
• Known for RISC (Reduced Instruction Set Computing).
• Focus on low power, high efficiency.
• Widely used in mobile phones, embedded systems, and IoT devices.

1.5 Metrics for Measuring Processor Performance

Metric Description

Speed Measured in GHz, determines how fast instructions are executed.

Power Power consumption in watts. Important for mobile/embedded systems.

Throughput Number of tasks a system can complete in a given time (e.g., MIPS).

• Additional Metrics:

o CPI (Cycles Per Instruction): Lower is better.

o Instructions Per Second (IPS): Measures total instructions executed.

o Latency: Time taken to complete a task from start to end.
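The metrics above relate to each other through a standard formula: execution time equals instruction count times CPI divided by clock rate. A minimal sketch with illustrative (hypothetical) numbers:

```python
def cpu_time(instruction_count, cpi, clock_hz):
    """Seconds to run a program: total cycles needed divided by cycles per second."""
    return instruction_count * cpi / clock_hz

def ips(clock_hz, cpi):
    """Average instructions executed per second (throughput)."""
    return clock_hz / cpi

# Example: 1 billion instructions, CPI of 2, on a 2 GHz processor.
t = cpu_time(1_000_000_000, 2.0, 2_000_000_000)   # execution time in seconds
rate = ips(2_000_000_000, 2.0)                    # instructions per second
print(t, rate)
```

This makes the trade-off explicit: halving CPI has the same effect on execution time as doubling the clock rate.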

1.6 Factors Affecting Processor Performance

1. Architecture Design

o Determines how efficiently a processor executes instructions.

o Factors include pipelining, cache size, branch prediction, and bus width.

o Example: A modern superscalar architecture can execute multiple instructions per cycle.
2. Clock Speed

o Measured in GHz (gigahertz); higher clock speed = faster instruction execution.

o Limited by heat generation and power consumption.

3. Instruction Set Complexity

o CISC (Complex Instruction Set Computing) has more powerful instructions but
can be slower.

o RISC (Reduced Instruction Set Computing) uses simpler instructions for faster
execution.

4. Parallel Processing Capabilities


o Use of multiple cores or threads to perform tasks concurrently.

o Example: Multi-core CPUs (e.g., quad-core, octa-core) increase throughput and multitasking ability.

1.7 Concept of Processor Benchmarking

What is Benchmarking?
• A method of measuring and comparing the performance of processors using standard
programs or tests.

Use of Benchmarking
• Helps users and manufacturers evaluate CPU performance.
• Used to:

o Compare different processors.

o Identify bottlenecks.

o Optimize performance.

Examples of Benchmarks:
• SPEC (Standard Performance Evaluation Corporation): Measures CPU and system
performance.
• Geekbench, Cinebench, PassMark: Used in real-world and synthetic testing scenarios.

1.8 Microprocessor I/O Interfacing


Definition:
• Refers to connecting external input/output devices (e.g., keyboard, printer) to a
microprocessor system.

Types of I/O Interfacing:

1. Memory-Mapped I/O:

o I/O devices share the same address space as memory.

o Use same instructions to access memory and I/O.

2. Isolated I/O (Port-Mapped I/O):

o Separate address space for I/O devices.


o Uses special instructions like IN and OUT.

Importance:
• Ensures data exchange between processor and peripheral devices.
• Requires I/O controllers and ports.
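The distinction between the two interfacing schemes can be sketched in code. In memory-mapped I/O, a range of ordinary addresses is routed to a device instead of RAM, so the same load/store instructions reach both. The `IO_BASE` value and device layout below are illustrative assumptions, not taken from any real chip:

```python
IO_BASE = 0xFF00  # hypothetical cutoff: addresses at or above this go to a device

class Bus:
    def __init__(self):
        self.ram = {}       # ordinary memory
        self.device = {}    # device registers (e.g., an output data port)

    def store(self, addr, value):
        if addr >= IO_BASE:
            self.device[addr - IO_BASE] = value   # write lands in the device
        else:
            self.ram[addr] = value                # write lands in RAM

    def load(self, addr):
        if addr >= IO_BASE:
            return self.device.get(addr - IO_BASE, 0)
        return self.ram.get(addr, 0)

bus = Bus()
bus.store(0x0010, 42)     # ordinary memory write
bus.store(0xFF00, 0x41)   # same store operation, but this address is a device port
```

Isolated (port-mapped) I/O would instead use separate `IN`/`OUT` operations with their own port-number space, leaving all memory addresses available for RAM.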

1.9 I/O Interface: Interrupt and DMA Modes (Software)

1. Interrupt-Driven I/O
• Concept: Processor is interrupted when an I/O device needs attention.
• Allows the CPU to perform other tasks until the device is ready.
• Steps:

1. Device sends an interrupt signal.

2. CPU pauses current task.


3. Executes Interrupt Service Routine (ISR).

4. Resumes previous task.


• Advantages:

o Efficient CPU usage.

o Better multitasking.
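The four interrupt steps above can be sketched as a dispatch through an interrupt vector table. The device number and handler here are made up for illustration:

```python
isr_table = {}   # interrupt vector table: device id -> handler function
log = []         # records the order of events

def register_isr(device_id, handler):
    isr_table[device_id] = handler

def interrupt(device_id):
    log.append("pause main task")        # step 2: CPU pauses current task
    isr_table[device_id]()               # step 3: execute the ISR
    log.append("resume main task")       # step 4: resume previous task

# Hypothetical device 1 (a keyboard) registers its service routine.
register_isr(1, lambda: log.append("keyboard ISR: read scancode"))
interrupt(1)                             # step 1: device raises the interrupt
```

The key point the sketch shows: the main task is suspended only for the duration of the ISR, rather than busy-waiting on the device.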

2. Direct Memory Access (DMA)


• Concept: Allows I/O devices to transfer data directly to/from memory without CPU
involvement.
• A DMA controller manages the transfer.
• Steps:

1. Device requests DMA.

2. CPU grants bus access.

3. DMA transfers data directly.

4. CPU is notified after transfer.


• Advantages:
o Frees CPU from data transfer tasks.

o Faster and more efficient for large data volumes.

2.1 Types of Computer Architecture (Including Harvard and Von Neumann Architectures)

Computer architecture refers to the design principles and structure of a computer system. There
are several types of architectures, but two foundational ones are Von Neumann and Harvard
architectures. These architectures define how memory, data, and instructions interact within a
system.

1. Von Neumann Architecture

Definition:

Proposed by John Von Neumann in 1945, this architecture uses a single memory for storing both
instructions and data.

Key Features:
• Single memory space for both data and programs.
• Instructions are fetched and executed sequentially.
• Uses a single bus for data and instructions, leading to bottlenecks.
• Simpler and cost-effective design.

Von Neumann Bottleneck:


• The limitation occurs because instructions and data cannot be fetched at the same time due
to the shared bus.
• This leads to reduced processing speed, especially in complex systems.
Applications:
• Used in general-purpose computers like PCs, laptops, and workstations.

2. Harvard Architecture
Definition:

This architecture uses separate memory for storing instructions and data, allowing simultaneous
access.

Key Features:
• Separate storage and buses for instructions and data.
• Allows simultaneous fetching of instructions and reading/writing of data.
• More efficient and faster than Von Neumann for certain applications.
• More complex and costly due to dual memory systems.

Advantages:
• No bottleneck between data and instruction transfer.
• Better performance in real-time and embedded systems.

Applications:
• Common in microcontrollers, digital signal processors (DSPs), and embedded systems.

Other Types of Computer Architecture (Brief Overview):

1. Instruction Set Architecture (ISA):

o Defines the set of instructions a processor can execute.

o Examples: x86, ARM, MIPS.

2. RISC (Reduced Instruction Set Computing):

o Simplifies instructions to execute them faster.

o Common in mobile and embedded devices (e.g., ARM processors).

3. CISC (Complex Instruction Set Computing):

o Uses complex instructions that perform multiple operations.

o Common in desktop and server processors (e.g., Intel x86).

4. Parallel Architecture:

o Uses multiple processors or cores for simultaneous execution.

o Common in high-performance computing.

2.2 Instruction Cycle

Definition:

The instruction cycle, also called the fetch-decode-execute cycle, is the basic operational process
of a computer. It refers to the sequence of steps the CPU follows to fetch an instruction from
memory, decode it, and execute it.
This cycle repeats continuously while the computer is running.

Main Stages of the Instruction Cycle:

1. Fetch

o The Control Unit (CU) retrieves the next instruction from memory.

o The address of the instruction is held in the Program Counter (PC).

o The instruction is then loaded into the Instruction Register (IR).

o After fetching, the PC is incremented to point to the next instruction.

2. Decode

o The CPU interprets the fetched instruction.

o The Control Unit decodes the binary code into a form that the CPU can understand
(e.g., operation type, operands).

o It identifies what operation is to be performed and which data or memory locations are involved.

3. Execute

o The Arithmetic Logic Unit (ALU) or appropriate part of the CPU carries out the
instruction.

o Operations may include arithmetic (e.g., addition), logic (e.g., AND), data
movement (e.g., load/store), or control (e.g., jump).

o Results may be stored in registers, memory, or sent to an output device.


4. Store (Optional)

o The result of the execution may be written back to memory or a register, depending on the type of instruction.

Example of an Instruction Cycle:


Let’s say the instruction is ADD R1, R2, R3 (Add contents of R2 and R3, store result in R1)
• Fetch: Instruction ADD R1, R2, R3 is fetched from memory.
• Decode: CPU understands it needs to add contents of R2 and R3.
• Execute: ALU adds the values in R2 and R3.
• Store: Result is placed in R1.
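The ADD R1, R2, R3 walkthrough above can be simulated as a toy fetch-decode-execute loop. The tuple encoding of instructions and the register values (R2 = 5, R3 = 7) are simplifying assumptions:

```python
memory = [("ADD", "R1", "R2", "R3"), ("HALT",)]             # program stored in memory
registers = {"PC": 0, "IR": None, "R1": 0, "R2": 5, "R3": 7}

while True:
    registers["IR"] = memory[registers["PC"]]   # fetch: load instruction into IR
    registers["PC"] += 1                        # PC now points to the next instruction
    op, *operands = registers["IR"]             # decode: split opcode from operands
    if op == "HALT":
        break
    if op == "ADD":                             # execute: ALU adds the two sources
        dest, src1, src2 = operands
        registers[dest] = registers[src1] + registers[src2]   # store: result into R1
```

After the loop, R1 holds 12 and the PC has advanced past both instructions, mirroring the four stages described above.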
Registers Involved in the Instruction Cycle:

Register Function

Program Counter (PC) Holds the memory address of the next instruction

Instruction Register (IR) Holds the current instruction being decoded/executed

Memory Address Register (MAR) Holds the address to access memory

Memory Data Register (MDR) Holds the data being transferred to/from memory

General Purpose Registers Hold temporary data and results

2.3 Description of Register Set

Definition:

A register set refers to a collection of high-speed, small-sized memory units located inside the
Central Processing Unit (CPU). Registers temporarily hold data, instructions, addresses, and
intermediate results that the CPU uses during processing.

Registers are faster than RAM and are crucial for CPU operations.

Characteristics of Registers:
• Very fast access time (faster than cache and RAM).
• Small in size, typically measured in bits (e.g., 8-bit, 16-bit, 32-bit, or 64-bit).
• Located directly inside the CPU.
• Used during instruction execution, arithmetic operations, memory access, and control
operations.

Types of Registers in the Register Set:

Register Type Description

Accumulator (ACC) Stores results of arithmetic and logic operations.

Program Counter (PC) Holds the address of the next instruction to be fetched.

Instruction Register (IR) Holds the current instruction being executed.

Memory Address Register (MAR) Holds the address in memory where data is to be read or written.

Memory Data Register (MDR) Holds the actual data being transferred to or from memory.

General Purpose Registers Temporary storage used by the ALU for computation (e.g., R0, R1, R2, R3).

Stack Pointer (SP) Points to the top of the stack in memory.

Status Register (Flag Register) Contains flags that indicate the status of operations (e.g., Zero, Carry, Overflow).

Functions of Register Set:

1. Instruction Execution:

o Registers like IR, PC, and ACC play key roles in fetching and executing
instructions.

2. Data Storage:

o General-purpose registers temporarily store values during calculations and program flow.

3. Address Handling:

o Registers like MAR and PC handle memory addressing and program sequencing.

4. Control Flow:

o Registers like SP and status flags are essential in managing procedure calls, loops,
and branching.
Example in Practice:

Let’s say an instruction is: ADD R1, R2, R3


• CPU fetches the instruction and places it in the Instruction Register (IR).
• It uses the values in R2 and R3, adds them, and stores the result in R1.
• The Program Counter (PC) is updated to point to the next instruction.

2.3.1 Different CPU Registers, Including General-Purpose and Special-Purpose Registers

CPU Registers

Registers are small, fast storage locations inside the Central Processing Unit (CPU). They are
used to store data, addresses, instructions, and intermediate results during processing.
Registers are essential to the functioning of the instruction cycle and overall CPU operations.

1. General-Purpose Registers (GPRs)

These are flexible, multipurpose registers used by the CPU to perform arithmetic, logic, and data
manipulation operations. They temporarily hold operands and intermediate values during
program execution.

Examples of General-Purpose Registers:

Register Description

R0 – Rn Typically labeled R0, R1, R2, ..., Rn depending on the architecture (e.g., 8 or 16 registers in RISC). Used for holding temporary data.

AX (Accumulator Register) Used in arithmetic and logic operations. Common in x86 architecture.

BX (Base Register) Often used as a base pointer for memory access.

CX (Count Register) Used in loop operations and count-controlled processes.

DX (Data Register) Used for I/O operations and extended-precision arithmetic.
Note: In RISC (e.g., ARM) architectures, general-purpose registers are usually named R0 to R15.
2. Special-Purpose Registers (Specific Registers)

These registers serve dedicated roles in the CPU for control and memory operations, instruction
execution, and status reporting.

Examples of Special-Purpose Registers:

Register Full Name Description

PC Program Counter Holds the address of the next instruction to execute. Automatically increments after fetching an instruction.

IR Instruction Register Holds the current instruction fetched from memory for decoding and execution.

SP Stack Pointer Points to the top of the current stack in memory. Used in function calls and returns.

MAR Memory Address Register Holds the address of the memory location to read from or write to.

MDR Memory Data Register Holds the actual data being transferred to or from the memory location specified by MAR.

FLAGS / PSW Flag Register or Program Status Word Holds condition flags (e.g., Zero, Carry, Sign, Overflow) that affect instruction flow and decision-making.

Common Condition Flags in the Flag Register:

Flag Description

Zero (Z) Set if the result of an operation is zero.

Carry (C) Set if there is a carry-out from an arithmetic operation.

Sign (S) Set if the result is negative.

Overflow (O) Set if the result of an operation exceeds the data size.

Parity (P) Set if the number of 1 bits in the result is even.

2.4 Instruction Set


Definition:
An Instruction Set is the complete collection of machine language instructions that a particular
CPU is designed to understand and execute. These instructions are the commands given to the
processor to perform specific operations such as data movement, arithmetic, logic, and control.

Each CPU architecture (like x86, ARM, MIPS, etc.) has its own unique instruction set.

Types of Instructions in an Instruction Set:

1. Data Transfer Instructions

o Move data from one location to another.

o Examples:
▪ MOV R1, R2 – Copy contents of R2 into R1
▪ LOAD A, [1000] – Load data from memory location 1000 into register A
▪ STORE A, [1000] – Store contents of A into memory location 1000
2. Arithmetic Instructions

o Perform mathematical operations.

o Examples:
▪ ADD R1, R2 – Add R1 and R2
▪ SUB R3, R4 – Subtract contents of R4 from R3

▪ MUL R1, R2 – Multiply R1 by R2


▪ DIV R5, R6 – Divide R5 by R6
3. Logical Instructions

o Perform bitwise operations and logical tests.

o Examples:
▪ AND R1, R2
▪ OR R3, R4
▪ XOR R5, R6
▪ NOT R1

4. Control (Branching) Instructions

o Control the flow of execution.

o Examples:
▪ JMP 200 – Jump to instruction at address 200

▪ CALL 300 – Call subroutine at address 300


▪ RET – Return from subroutine
▪ JE, JNE, JZ, JNZ – Conditional jumps based on flag status

5. Input/Output Instructions

o Used for communication with external devices.

o Examples:
▪ IN R1, PORT1 – Input from port1 to R1
▪ OUT PORT2, R2 – Output contents of R2 to port2
6. Shift and Rotate Instructions

o Shift or rotate bits in a register.

o Examples:
▪ SHL R1, 1 – Shift bits of R1 left by 1
▪ SHR R2, 2 – Shift bits of R2 right by 2
▪ ROL R3 – Rotate bits of R3 left

▪ ROR R4 – Rotate bits of R4 right

Types of Instruction Set Architectures (ISA):

1. RISC (Reduced Instruction Set Computer)

o Few simple instructions

o Fixed-length instructions

o Faster execution
o Example: ARM, MIPS

2. CISC (Complex Instruction Set Computer)

o Many complex instructions

o Variable-length instructions

o More powerful per instruction but slower overall

o Example: x86, Intel Pentium

Importance of Instruction Set:


• Acts as a bridge between hardware and software.
• Determines what operations the CPU can perform.
• Affects compiler design and overall system performance.
• Software must be written in the machine language that matches the instruction set of the
CPU.

Use of Various Sets of Instructions in Program Development

In computer programming and software development—particularly in low-level programming or assembly language—instruction sets are essential for writing programs that communicate directly
with the hardware. Each instruction set category plays a specific role in controlling how a program
behaves, processes data, manages memory, and interacts with external devices.
Below is an explanation of how the various sets of instructions are used during program
development:

1. Data Transfer Instructions


Purpose:

To move data between memory, registers, and I/O devices.

Use in Program Development:


• Loading data into registers for computation.
• Saving results back into memory.
• Moving data between different parts of a program.

Examples:
• MOV R1, R2: Copy contents of R2 into R1.
• LOAD R1, [1000]: Load data from memory location 1000 into R1.
• STORE R1, [1000]: Store contents of R1 into memory location 1000.
2. Arithmetic Instructions

Purpose:

To perform mathematical operations like addition, subtraction, multiplication, and division.

Use in Program Development:


• Implementing business logic (e.g., tax calculation, billing).
• Creating mathematical models and simulations.
• Performing data analysis and numerical computations.
Examples:
• ADD R1, R2: Add R1 and R2.
• SUB R3, R4: Subtract R4 from R3.
• MUL R1, R2: Multiply R1 by R2.

3. Logical Instructions

Purpose:
To perform bitwise operations and logical comparisons.

Use in Program Development:


• Evaluating conditions in control structures (e.g., IF, WHILE).
• Bit-level operations for encryption, compression, and hardware control.
• Implementing logical decision-making in programs.

Examples:
• AND R1, R2: Perform bitwise AND on R1 and R2.
• OR R1, R2: Perform bitwise OR.
• NOT R1: Invert bits of R1.

4. Control Transfer Instructions (Branching)

Purpose:
To alter the flow of execution based on conditions or jump to subroutines.

Use in Program Development:


• Controlling loops, conditional execution, and function calls.
• Creating modular programs with reusable procedures or functions.
• Handling decision-making logic and recursion.

Examples:
• JMP 200: Jump to instruction at address 200.
• CALL 300: Call subroutine at 300.
• RET: Return from subroutine.
• JZ, JNZ, JE, JNE: Conditional jumps based on flag status.

5. Input/Output (I/O) Instructions

Purpose:

To enable communication with external devices such as keyboards, screens, printers, and
sensors.
Use in Program Development:
• Reading user input.
• Displaying output to the screen or other devices.
• Sending and receiving data in embedded or real-time systems.
Examples:
• IN R1, PORT1: Input data from port1 to R1.
• OUT PORT2, R1: Output contents of R1 to port2.

6. Shift and Rotate Instructions


Purpose:

To perform bit-level manipulation of data.

Use in Program Development:


• Multiplying or dividing by powers of 2.
• Cryptographic algorithms.
• Manipulating individual bits for control flags or hardware signaling.

Examples:
• SHL R1, 1: Shift bits in R1 to the left.
• SHR R1, 1: Shift bits in R1 to the right.
• ROL R1: Rotate bits of R1 left.
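The multiply/divide-by-powers-of-2 use of shifts, and an 8-bit rotate, can be shown directly with Python's bitwise operators. The 8-bit width in the rotate helper is an assumption for illustration:

```python
x = 5
print(x << 1)    # SHL by 1: same as 5 * 2  -> 10
print(x << 3)    # SHL by 3: same as 5 * 8  -> 40
print(40 >> 2)   # SHR by 2: same as 40 // 4 -> 10

def rol8(value, count):
    """Rotate an 8-bit value left: bits shifted out re-enter on the right."""
    count %= 8
    return ((value << count) | (value >> (8 - count))) & 0xFF
```

For example, rotating 0b1000_0001 left by one gives 0b0000_0011: the high bit wraps around to the low end instead of being discarded as a shift would do.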

Instruction Set Architecture (ISA)

Definition:
The Instruction Set Architecture (ISA) is the interface between hardware and software of a
computer. It defines how the processor understands and executes instructions—including the
types of instructions available, their format, and how they interact with memory and registers.

In simple terms, the ISA is the programmer's view of the computer hardware, and it tells how
a CPU should behave in response to specific binary commands.

Key Components of an ISA:


1. Instruction Format

o Specifies the layout of bits in an instruction.

o Typically includes:
▪ Opcode: Operation code (e.g., ADD, LOAD).
▪ Operands: Registers or memory addresses involved in the operation.
▪ Addressing Mode: How operands are accessed (e.g., direct, immediate).

2. Instruction Types

o Data movement (e.g., LOAD, STORE)

o Arithmetic and logic (e.g., ADD, SUB, AND, OR)


o Control (e.g., JUMP, CALL, RETURN)

o I/O instructions

3. Data Types

o Defines the size and type of data the CPU can process (e.g., 8-bit, 16-bit, 32-bit
integers, floating-point numbers).

4. Registers

o Describes the number, type, and function of CPU registers available (e.g., general-purpose, program counter, stack pointer).

5. Addressing Modes

o Specifies how to access operands:


▪ Immediate addressing: Operand is part of the instruction.
▪ Direct addressing: Operand address is given directly.
▪ Indirect addressing: Address of operand is stored in a register.

6. Memory Model

o Describes how memory is accessed and managed (e.g., byte-addressable vs. word-addressable).
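The instruction-format component above (opcode, operands, addressing mode packed into a fixed layout) can be illustrated with a made-up 16-bit format — 4-bit opcode, then three 4-bit register fields. This layout and the opcode numbers are assumptions for the sketch, not a real ISA:

```python
OPCODES = {"ADD": 0x1, "SUB": 0x2, "LOAD": 0x3}   # hypothetical opcode assignments

def encode(op, rd, rs1, rs2):
    """Pack an opcode and three register numbers into one 16-bit word."""
    return (OPCODES[op] << 12) | (rd << 8) | (rs1 << 4) | rs2

def decode(word):
    """Unpack a 16-bit word back into its opcode and register fields."""
    op = {v: k for k, v in OPCODES.items()}[(word >> 12) & 0xF]
    return op, (word >> 8) & 0xF, (word >> 4) & 0xF, word & 0xF

word = encode("ADD", 1, 2, 3)    # encodes "ADD R1, R2, R3"
```

Decoding recovers exactly the fields that were packed, which is what the Control Unit's decode stage does in hardware.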

Types of Instruction Set Architectures:

1. CISC (Complex Instruction Set Computer)

o Large number of complex instructions.

o One instruction may do multiple tasks.

o Example: Intel x86 architecture

o Pros: Shorter programs.

o Cons: Slower execution per instruction.

2. RISC (Reduced Instruction Set Computer)

o Smaller number of simple instructions.

o Each instruction performs a single task.

o Example: ARM, MIPS


o Pros: Faster execution and easier pipelining.

o Cons: May need more instructions to perform a task.

Importance of ISA:
• Software Compatibility: Programs compiled for a specific ISA will only run on
processors that support that ISA.
• Hardware Flexibility: Allows hardware engineers to design different implementations of
the same ISA with varying performance and cost.
• Compiler Design: The compiler relies on ISA to generate machine code that the CPU can
understand.

Different Instruction Set Architecture (ISA) Designs


The Instruction Set Architecture (ISA) defines how a CPU interprets and executes machine-level instructions. There are different types of ISA designs based on how instructions are structured
and executed. The two most prominent ISA designs are:
• Reduced Instruction Set Computer (RISC)
• Complex Instruction Set Computer (CISC)

1. Reduced Instruction Set Computer (RISC)

Definition:
RISC is an ISA design philosophy that uses a small number of simple instructions, each designed
to execute in a single clock cycle.

Key Characteristics:
• Fixed-length instructions
• Fewer instruction types
• Load/store architecture: Memory is accessed only via specific load and store instructions.
• Simplified addressing modes
• Emphasizes compiler optimization and hardware pipelining

Advantages:
• Faster execution due to simple instructions
• Easier to implement pipelining and parallelism
• Lower power consumption

Disadvantages:
• Programs may require more instructions
• More demand on memory bandwidth

Examples of RISC Architectures:


• ARM (used in smartphones and tablets)
• MIPS (used in routers and embedded systems)
• RISC-V (open-source and academic use)
• SPARC (used in servers and workstations)

2. Complex Instruction Set Computer (CISC)

Definition:

CISC uses a large and rich set of instructions, where each instruction may perform multiple
operations (e.g., memory access + arithmetic + condition check).

Key Characteristics:
• Variable-length instructions
• Supports complex operations with fewer instructions
• Rich addressing modes (immediate, direct, indirect, indexed)
• Emphasizes instruction-level efficiency
Advantages:
• Programs can be smaller in size (fewer instructions)
• Reduces the complexity of compiler design
• Good for code density in memory-constrained systems

Disadvantages:
• Slower instruction decoding
• More difficult to pipeline
• Higher power consumption

Examples of CISC Architectures:


• x86 (Intel, AMD CPUs for desktops and laptops)
• VAX (used historically in DEC minicomputers)
• System/360 and 370 (IBM mainframes)

3. Other ISA Design Approaches (Brief Overview)

a. Very Long Instruction Word (VLIW)


• Executes multiple operations in one instruction word
• Compiler decides which operations to group together
• Used in Intel Itanium, TI DSPs

b. Explicitly Parallel Instruction Computing (EPIC)


• Similar to VLIW, but adds extra hardware support for parallelism
• Used in Intel’s Itanium processors

c. Stack-Based Architecture
• Uses a stack to hold intermediate values rather than registers
• Simple design but limited flexibility
• Example: Java Virtual Machine (JVM)
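A stack-based architecture like the JVM's can be sketched in a few lines: operands are pushed onto a stack, and arithmetic instructions pop their inputs and push the result. The tuple instruction encoding is an assumption of the sketch:

```python
def run(program):
    """Execute a list of stack-machine instructions; result is left on top of the stack."""
    stack = []
    for instr in program:
        op, *arg = instr
        if op == "PUSH":
            stack.append(arg[0])          # operand comes from the instruction
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)           # inputs are popped, result is pushed
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
    return stack[-1]

# (2 + 3) * 4, written in the postfix order a stack machine needs:
result = run([("PUSH", 2), ("PUSH", 3), ("ADD",), ("PUSH", 4), ("MUL",)])
```

Note that no register names appear anywhere — intermediate values live only on the stack, which is exactly the "simple design but limited flexibility" trade-off mentioned above.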

2.6 Addressing Modes

Definition:

Addressing modes are the methods used by the CPU to locate the operands (data values) required
for executing an instruction. In simple terms, addressing modes define how and where the CPU
should look for the data to be processed.

Each addressing mode offers a different way to access data from memory, registers, or directly
from the instruction itself.
Types of Addressing Modes:

1. Immediate Addressing Mode


• Description: The operand (data) is given directly in the instruction itself.

2. Register Addressing Mode


• Description: The operand is located in a register.

3. Direct Addressing Mode


• Description: The memory address of the operand is specified explicitly in the
instruction.

4. Indirect Addressing Mode


• Description: The instruction refers to a register or memory location that holds the
address of the operand.

5. Indexed Addressing Mode


• Description: The effective address is obtained by adding a constant value (offset) to the
contents of a register (index register).

6. Base Register Addressing Mode

• Description: The effective address is obtained by adding an offset given in the instruction to the contents of a base register.
7. Relative Addressing Mode


• Description: The effective address is determined by adding a constant value (offset) to
the current value of the program counter (PC).

8. Implied Addressing Mode


• Description: The operand is implicitly specified in the instruction (no need to mention it).
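The addressing modes above differ only in how the operand's location is computed. A sketch with illustrative memory contents and register values (all the numbers are made up):

```python
memory = {100: 25, 200: 100, 300: 77}            # address -> stored value
registers = {"R1": 100, "PC": 296, "X": 20}      # X is the index register

immediate = 25                               # immediate: operand is in the instruction itself
register  = registers["R1"]                  # register: operand is the register's content
direct    = memory[100]                      # direct: instruction names address 100
indirect  = memory[memory[200]]              # indirect: memory[200] holds the operand's address
indexed   = memory[280 + registers["X"]]     # indexed: base 280 + index register X
relative  = memory[registers["PC"] + 4]      # relative: offset 4 added to the PC
```

Base-register addressing follows the same pattern as indexed addressing, with a base register supplying the register part and the instruction supplying the offset.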

3.1 Arithmetic and Logic Unit (ALU)


Definition:

The Arithmetic and Logic Unit (ALU) is a core component of the Central Processing Unit
(CPU) responsible for carrying out arithmetic operations (like addition and subtraction) and
logic operations (like AND, OR, NOT). It is the part of the CPU where actual data processing
takes place.

Main Functions of the ALU:

The ALU performs two categories of operations:

1. Arithmetic Operations

These involve basic mathematical calculations such as:

o Addition (e.g., R1 = R2 + R3)

o Subtraction (e.g., R4 = R5 - R6)

o Multiplication

o Division

o Increment/Decrement (e.g., increasing a counter)

2. Logic (Bitwise) Operations

These involve operations at the bit level, such as:

o AND: Returns 1 if both bits are 1

o OR: Returns 1 if either bit is 1

o XOR: Returns 1 if the bits are different

o NOT: Inverts the bits (1 becomes 0 and vice versa)

o Shift and Rotate: Moves bits to the left or right

Components of an ALU:

The ALU usually consists of the following parts:

Component Function

Arithmetic Circuit Performs mathematical operations

Logic Circuit Handles logical and comparison operations

Accumulator/Register Temporarily holds data being processed

Flags/Status Register Indicates the outcome of operations (Zero, Carry, Overflow, Negative, etc.)

Common ALU Status Flags:

Flag Description

Zero (Z) Set if the result of an operation is zero

Carry (C) Set if there's a carry out of the most significant bit in addition

Overflow (O) Set if the result exceeds the representable range

Negative (N) Set if the result is negative (most significant bit is 1)

Role of ALU in CPU Operations:


• The ALU works in coordination with the Control Unit (CU) and registers.
• The CU sends control signals to the ALU, telling it what operation to perform.
• The data is fetched from registers, processed by the ALU, and then stored back in a
register or memory.
Example Operation:

If the instruction is ADD R1, R2, R3:


• The ALU receives values from R2 and R3.
• It performs the addition.
• The result is stored in R1.
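An ALU operation and its status flags can be modeled together. The sketch below assumes an 8-bit data width; the flag definitions follow the table above:

```python
def alu_add8(a, b):
    """Add two 8-bit values; return the truncated result plus status flags."""
    raw = a + b
    result = raw & 0xFF                                    # keep only the low 8 bits
    flags = {
        "Z": result == 0,                                  # Zero: result is zero
        "C": raw > 0xFF,                                   # Carry: carry out of bit 7
        "N": (result & 0x80) != 0,                         # Negative: sign bit is set
        "O": ((a ^ result) & (b ^ result) & 0x80) != 0,    # Overflow: signed range exceeded
    }
    return result, flags

r, f = alu_add8(200, 100)   # 300 does not fit in 8 bits: result truncates, Carry is set
```

This mirrors the hardware behavior: the Control Unit later tests these flags to decide conditional jumps such as JZ or JC.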

Importance of the ALU:


• It is central to processing tasks in any computer system.
• Every computation (whether basic or complex) goes through the ALU.
• Determines system performance, especially in tasks like gaming, data science,
encryption, etc.
3.3 Graphical Processing Units (GPUs)

Definition:

A Graphical Processing Unit (GPU) is a specialized electronic circuit designed to accelerate the
processing of images, videos, and complex graphical computations. Unlike the Central
Processing Unit (CPU), which is designed for general-purpose tasks, a GPU is optimized for
parallel processing—making it extremely effective for tasks that involve large-scale data
operations, such as rendering graphics, video editing, and machine learning.

Key Functions of a GPU:

1. Rendering Graphics: The primary purpose of a GPU is to convert data into images by
performing calculations related to lighting, shading, textures, and object positioning in
2D/3D spaces.
2. Parallel Data Processing: GPUs can process thousands of threads simultaneously,
which makes them ideal for performing repetitive tasks across large data sets.
3. Acceleration of Complex Computations: Modern GPUs are also used for scientific
computing, AI training, cryptocurrency mining, and more.
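The parallel-processing idea above can be illustrated with a CPU-side sketch: one "kernel" function is applied uniformly to every data element, which is the pattern a GPU runs across thousands of threads at once. Here Python's `map` stands in for the hardware and executes serially; the kernel and data are invented for illustration:

```python
# A "kernel" applied uniformly to every element: one instruction stream,
# many data elements (the pattern a GPU executes across thousands of cores).
def brighten(pixel, amount=40):
    return min(pixel + amount, 255)

pixels = [0, 100, 200, 250]
# On a GPU each element would be handled by its own thread in parallel;
# map() expresses the same "same operation, many data items" idea serially.
result = list(map(brighten, pixels))
print(result)  # [40, 140, 240, 255]
```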

Difference Between CPU and GPU:

Feature CPU (Central Processing Unit) GPU (Graphics Processing Unit)

Purpose General-purpose processing Specialized in graphics and parallel tasks

Cores Fewer (typically 4 to 16) Hundreds or thousands

Thread Handling Handles few threads efficiently Handles thousands of threads in parallel

Processing Type Sequential Highly parallel

Use Cases OS tasks, applications, logic Gaming, graphics rendering, AI computation

Components of a GPU:

1. Streaming Multiprocessors (SMs):

o Contain multiple cores to perform parallel operations.

o Each SM can run many threads concurrently.


2. Memory:

o High-speed memory (e.g., GDDR6) used to store textures, data, and intermediate
results.

o Includes cache and shared memory for efficient processing.

3. Shader Units:

o Perform pixel shading, vertex calculations, lighting, and other rendering effects.

4. Rasterizers & Render Output Units (ROPs):

o Convert 3D models into 2D images.

o Handle final image output to the display.

Types of GPUs:

1. Integrated GPUs:

o Built into the CPU or motherboard.

o Share system memory with the CPU.

o Lower performance; suitable for basic graphics and video playback.

o Example: Intel HD Graphics


2. Dedicated (Discrete) GPUs:

o Standalone cards with their own memory and power.

o High performance; ideal for gaming, design, and scientific applications.

o Example: NVIDIA GeForce, AMD Radeon

Applications of GPUs:

1. Gaming:

o Rendering realistic 3D environments, lighting, and special effects.

2. Scientific Computing:

o Accelerating simulations in physics, chemistry, biology, and more.

3. Machine Learning and AI:


o Training deep neural networks using frameworks like TensorFlow and PyTorch.

4. Cryptocurrency Mining:

o Performing hash calculations at high speed.

5. Medical Imaging:

o Processing high-resolution scans (e.g., CT, MRI).

Popular GPU Technologies:


• CUDA (Compute Unified Device Architecture):

o Developed by NVIDIA, allows developers to use C/C++ to write parallel programs


for GPUs.
• OpenCL (Open Computing Language):

o Open standard for parallel programming of diverse platforms including GPUs and
CPUs.
• DirectX / OpenGL / Vulkan:

o Graphics APIs used for rendering images in games and applications.

3.4 Control Unit Design

Definition:
The Control Unit (CU) is a fundamental component of the Central Processing Unit (CPU)
responsible for directing the operations of the processor. It controls how data moves within the
CPU, to and from memory, and between input/output devices. Essentially, it acts as the "brain
within the brain"—coordinating and sequencing all actions taken by the computer.

Functions of the Control Unit:

The Control Unit performs the following essential functions:

1. Fetch: Retrieves instructions from main memory.

2. Decode: Interprets the fetched instruction.


3. Execute Control: Directs the appropriate components (ALU, registers, memory) to carry
out the operation.

4. Control Signals: Generates and sends signals to control the flow of data and the operation
of hardware components.

Control Unit Workflow (Instruction Cycle):

1. Fetch the instruction from memory.

2. Decode the instruction to understand the operation.

3. Execute by sending control signals to the appropriate units (ALU, registers, memory, I/O).
4. Repeat for the next instruction.
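The cycle above can be sketched as a toy loop, assuming a hypothetical three-instruction ISA (LOAD, ADD, STORE; all names and values are invented):

```python
# Tiny control-unit sketch: fetch an instruction from "memory", decode its
# opcode, then direct the matching unit to execute.
memory = [("LOAD", "R1", 7), ("ADD", "R1", 5), ("STORE", "R1", 0)]
registers = {"R1": 0}
ram = {0: None}

pc = 0
while pc < len(memory):
    opcode, reg, operand = memory[pc]  # Fetch
    if opcode == "LOAD":               # Decode, then execute control
        registers[reg] = operand
    elif opcode == "ADD":
        registers[reg] += operand
    elif opcode == "STORE":
        ram[operand] = registers[reg]
    pc += 1                            # Repeat for the next instruction

print(registers["R1"], ram[0])  # 12 12
```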

Types of Control Unit Design:

There are two main approaches to designing a Control Unit:

1. Hardwired Control Unit

Definition:
A control unit where control signals are generated by hardware logic circuits (gates, flip-flops,
decoders, etc.).

Features:
• Fast and efficient.
• Difficult to modify (not flexible).
• Best suited for simple and fixed instruction sets.

Advantages:
• High performance (speed).
• Low latency signal generation.

Disadvantages:
• Hard to change or upgrade.
• Complex to design for large instruction sets.
Use Cases:
• Used in high-speed processors or embedded systems with fixed functionality.

2. Microprogrammed Control Unit


Definition:
A control unit that uses a set of instructions (microinstructions) stored in control memory to
generate control signals.
Features:
• Flexible and easier to modify.
• Slower than hardwired control units.

Components:
• Control memory: Stores microprograms.
• Control address register: Points to the current microinstruction.
• Control data register: Holds the microinstruction being executed.
Advantages:
• Easier to design and modify.
• Supports complex and large instruction sets.

Disadvantages:
• Slower due to memory fetches.
• Higher control memory overhead.

Use Cases:
• Used in CISC (Complex Instruction Set Computer) architectures like Intel x86.
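The control-memory idea can be sketched as a lookup table that maps each machine instruction to its sequence of microinstructions; every signal name below is invented for illustration:

```python
# Control memory sketch: each opcode maps to a microprogram, i.e. an ordered
# list of microinstructions, each asserting a set of control signals.
CONTROL_MEMORY = {
    "ADD":  [{"reg_read"}, {"alu_add"}, {"reg_write"}],
    "LOAD": [{"addr_out"}, {"mem_read"}, {"reg_write"}],
}

def control_signals(opcode):
    """Fetch the microprogram for an opcode and emit its signals in order."""
    return [sorted(micro) for micro in CONTROL_MEMORY[opcode]]

print(control_signals("ADD"))  # [['reg_read'], ['alu_add'], ['reg_write']]
```

Changing the CPU's behaviour here means editing the table, not redesigning logic circuits, which is exactly why microprogrammed designs are easier to modify than hardwired ones.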

Comparison: Hardwired vs Microprogrammed

Feature Hardwired CU Microprogrammed CU

Speed Faster Slower



Flexibility Low (hard to change) High (easier to update)

Complexity Complex for large instruction sets Simpler and more scalable

Instruction Set Best for small/fixed sets Suitable for complex instructions

Design Components of a Control Unit:

Component Function

Instruction Register Holds the current instruction to be decoded and executed

Decoder Breaks down the instruction into control signals

Timing Generator Provides timing signals to coordinate operation sequences

Control Signal Generator Generates control signals for all other CPU components

Role of Control Unit in CPU Operation:


• Coordinates ALU, memory, and I/O units.
• Manages timing and sequencing of operations.
• Ensures correct execution of instruction cycles (fetch-decode-execute).
• Facilitates pipelining in modern CPUs.

Hardwired and Microprogrammed Control Units (Control Unit Design)

The Control Unit (CU) is the component of the CPU (Central Processing Unit) that directs the
operations of the processor by generating control signals to guide the execution of instructions.
There are two primary design approaches for building control units:
1. Hardwired Control Unit

Definition:

A Hardwired Control Unit uses combinational logic circuits (like gates, flip-flops, and
decoders) to generate control signals directly based on the instruction opcode and the current
timing step.

How It Works:
• The control logic is implemented with hardware circuits.
• The instruction is decoded, and based on its type and the timing signals, control signals
are activated.
• The circuit is pre-designed for a specific instruction set.

Advantages:
• Very fast (ideal for high-speed processors).
• No memory needed for control instructions.

Disadvantages:
• Difficult to modify or upgrade (not flexible).
• Complex for large instruction sets (hard to manage as logic becomes dense).

2. Microprogrammed Control Unit

Definition:
A Microprogrammed Control Unit uses a set of microinstructions stored in a special memory
called the control memory. These microinstructions are fetched and executed to generate the
required control signals.
How It Works:
• The control unit contains a control memory with microprograms.
• Each machine instruction maps to a sequence of microinstructions.
• The control memory outputs microinstructions which generate control signals.

Advantages:
• Easier to modify and extend (microcode can be changed).
• Simplifies complex instruction sets (ideal for CISC architectures).
Disadvantages:
• Slower than hardwired control units due to microinstruction fetch time.
• Consumes more memory for microprogram storage.

4.1 Hierarchy of Different Types of Memory and Storage Devices

Definition:

In computer systems, memory hierarchy refers to a structured arrangement of storage devices and memory units based on their speed, cost, size, and proximity to the CPU. This hierarchy
helps optimize performance by balancing access speed and storage capacity.

4.2 Characteristics of Different Types of Memory and Storage Devices

In computer systems, various types of memory and storage devices are used to hold data
temporarily or permanently. Each type has specific characteristics such as speed, size, cost,
volatility, and accessibility that make it suitable for particular tasks.

Below is a detailed explanation of the key characteristics of different types of memory and storage
devices.

1. Registers

Characteristics:
• Speed: Fastest memory (1 nanosecond or less).
• Volatility: Volatile (data is lost when power is off).
• Size: Very small (few bytes).
• Location: Inside the CPU.
• Purpose: Holds data currently being processed, such as instructions and immediate results.
• Accessibility: Directly accessed by the CPU without delay.

2. Cache Memory

Characteristics:
• Speed: Very fast (faster than RAM, slower than registers).
• Volatility: Volatile.
• Size: Small (KB to a few MB).
• Location: Between CPU and main memory.
• Purpose: Stores frequently accessed data and instructions to reduce memory access time.
• Types:

o L1 Cache: Closest to CPU, smallest, fastest.

o L2 Cache: Larger and slower than L1.

o L3 Cache: Shared between cores, larger and slower than L2.

3. Main Memory (RAM - Random Access Memory)

Characteristics:
• Speed: Fast (slower than cache).
• Volatility: Volatile.
• Size: Medium (GBs, e.g., 4GB – 64GB).
• Location: On the motherboard.
• Purpose: Temporarily stores data and programs that are in use.
• Types:

o DRAM (Dynamic RAM): Common, needs refreshing.

o SRAM (Static RAM): Faster, used in cache, more expensive.

4. ROM (Read-Only Memory)

Characteristics:
• Speed: Slower than RAM.
• Volatility: Non-volatile (retains data when power is off).
• Size: Small (KB to MB).
• Purpose: Stores firmware and bootloader.
• Types:

o PROM: Programmable once.

o EPROM: Erasable with UV light.

o EEPROM/Flash ROM: Electrically erasable and programmable.

5. Secondary Storage (HDD and SSD)

A. Hard Disk Drive (HDD)


• Speed: Slow (mechanical parts, ~10 ms access time).
• Volatility: Non-volatile.
• Size: Large (500GB to 10TB or more).
• Cost: Low cost per GB.
• Purpose: Permanent storage of OS, software, and files.
• Durability: Susceptible to damage due to moving parts.

B. Solid-State Drive (SSD)


• Speed: Much faster than HDD (~0.1 ms access time).
• Volatility: Non-volatile.
• Size: Medium to large (128GB to several TB).
• Cost: More expensive per GB than HDD.
• Purpose: High-speed storage for faster system performance.
• Durability: More shock-resistant; no moving parts.

6. Optical Storage (CD, DVD, Blu-ray)

Characteristics:
• Speed: Slow (compared to HDD/SSD).
• Volatility: Non-volatile.
• Size: Small to medium (700MB for CD, 4.7GB for DVD).
• Purpose: Distribution of media, backup.
• Durability: Can degrade over time; sensitive to scratches.
• Usage: Becoming obsolete due to flash storage and cloud.

7. Flash Memory (USB, SD Cards)


Characteristics:
• Speed: Faster than HDD, slower than SSD.
• Volatility: Non-volatile.
• Size: Medium (2GB – 1TB).
• Purpose: Portable data storage and transfer.
• Durability: Good resistance to shock; reusable many times.
• Common Forms: Pen drives, memory cards.

8. Cloud Storage

Characteristics:
• Speed: Depends on internet connection.
• Volatility: Non-volatile.
• Size: Virtually unlimited.
• Purpose: Online storage, backup, remote access.
• Durability: Data stored in data centers with redundancy.
• Cost: Subscription-based or free with limits.

4.3 How Data is Stored and Retrieved in Different Memory and Storage Devices

Overview:

In computer systems, data storage and retrieval vary depending on the type of memory or storage
device involved. These processes are influenced by the device’s structure, access method, speed,
and volatility. Understanding how data is stored and retrieved in different types of memory helps
in optimizing system performance and reliability.
1. Registers (CPU Registers)

Storage:
• Data is stored directly using electrical flip-flops or latches within the CPU.
• Each register is identified by name (e.g., AX, BX, PC).

Retrieval:
• CPU directly accesses and retrieves data from registers during instruction execution.
• Access is instantaneous and requires no addressing mechanism.

Speed: Fastest
Volatility: Volatile (data lost when power is off)

2. Cache Memory (L1, L2, L3)

Storage:
• Uses SRAM (Static RAM) technology.
• Data and instructions that are frequently accessed from RAM are automatically stored in
cache.
Retrieval:
• When the CPU requests data:

o It first checks the L1 cache.

o If not found, it checks L2, then L3, before going to RAM.


• This is called a cache hit (data found) or cache miss (data not found).

Access Method: Associative mapping and locality of reference


Volatility: Volatile
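The L1 → L2 → L3 → RAM lookup order can be sketched as follows (cache contents, capacities, and the fill policy are simplified assumptions):

```python
# Sketch of a multi-level cache lookup: check L1, then L2, then L3, and
# fall through to RAM on a miss at every level.
caches = {"L1": {}, "L2": {}, "L3": {}}
ram = {addr: f"data@{addr}" for addr in range(100)}

def read(addr):
    for level in ("L1", "L2", "L3"):
        if addr in caches[level]:
            return caches[level][addr], f"{level} hit"
    value = ram[addr]           # Cache miss at every level: go to RAM
    caches["L1"][addr] = value  # Fill L1 so the next access is fast
    return value, "miss"

print(read(42))  # ('data@42', 'miss'): first access goes to RAM
print(read(42))  # ('data@42', 'L1 hit'): now served from L1
```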

3. Main Memory (RAM)

Storage:
• Data is stored in memory cells, each with a unique address.
• The CPU uses address buses to write data to specific memory locations.
Retrieval:
• The CPU sends a memory address via the address bus.
• The RAM locates the data and sends it back via the data bus.

Access Method: Direct (random) access


Volatility: Volatile

4. Read-Only Memory (ROM)


Storage:
• Data is permanently written during manufacturing or programming.
• Uses masking or programmable logic (PROM, EPROM, EEPROM).

Retrieval:
• Data is retrieved using the address lines, like RAM.
• The CPU reads the content but cannot modify it during execution.

Access Method: Direct access


Volatility: Non-volatile

5. Secondary Storage

A. Hard Disk Drive (HDD)

Storage:
• Data is stored magnetically on rotating platters in sectors and tracks.
• A read/write head moves to the correct track to write bits as magnetic patterns.

Retrieval:
• The read/write head locates the correct track and sector.
• Data is read magnetically and passed to RAM via the I/O interface.
Access Method: Semi-sequential/random access
Volatility: Non-volatile
Speed: Relatively slow due to mechanical parts
B. Solid-State Drive (SSD)

Storage:
• Uses flash memory (NAND chips) to store data as electrical charges in cells.
• Organized into blocks and pages.
Retrieval:
• Controller fetches the data electrically without moving parts.
• Fast access to specific pages.

Access Method: Random access


Volatility: Non-volatile
Speed: Faster than HDD

6. Optical Storage (CD, DVD, Blu-ray)

Storage:
• Data is stored as pits and lands on a reflective disk surface.
• A laser burns these patterns during writing.

Retrieval:
• A laser beam scans the surface.
• Reflection differences (pits and lands) are translated into binary data.

Access Method: Sequential or indexed access


Volatility: Non-volatile
Speed: Slow compared to HDD/SSD

7. Flash Memory (USB, SD Cards)

Storage:
• Data is stored by trapping electrons in floating gate transistors.
• Each bit is stored as a charge state (1 or 0).

Retrieval:
• Controller reads voltage levels to determine stored bits.
• Access is block-based.

Access Method: Random block-level access


Volatility: Non-volatile
Speed: Moderate to fast

8. Cloud Storage

Storage:
• Data is stored on remote servers in data centers.
• Uses a combination of SSDs, HDDs, and backup tapes.
Retrieval:
• Data is accessed over the internet via protocols (e.g., HTTP, FTP).
• Requires authentication and encryption for security.

Access Method: Network-based access


Volatility: Non-volatile
Speed: Depends on internet speed

4.4 Performance Differences Among Various Types of Memory and Storage Devices

Overview:

The performance of memory and storage devices in a computer system depends on several factors,
including speed (latency and bandwidth), capacity, cost, volatility, and power consumption.
These differences affect how quickly data can be accessed, how much data can be stored, and how
efficiently a system operates.

Key Performance Metrics:

Metric Explanation

Access Time (Latency) Time taken to read/write a single piece of data. Lower is better.

Bandwidth Amount of data that can be transferred per second. Higher is better.

Volatility Whether data persists after power is off.

Throughput Total amount of data transferred over a given period.

Cost per Bit Expense to store one bit of data. Lower is more cost-effective for large storage.

Energy Efficiency Power consumed during operation. Important for mobile and embedded devices.

Performance Comparison of Memory and Storage Devices

1. CPU Registers
• Speed: Fastest (nanoseconds or less)
• Latency: Almost zero (directly accessed by CPU)
• Bandwidth: Extremely high
• Capacity: Very limited (bytes)
• Use Case: Immediate data used by the CPU
• Cost: Very high per bit

2. Cache Memory (L1, L2, L3)


• Speed: Very fast (next to registers)
• Latency: L1 (1–2 ns), L2 (3–10 ns), L3 (10–30 ns)
• Bandwidth: High
• Capacity: Small (KB to a few MB)
• Use Case: Store frequently accessed data and instructions
• Cost: High

3. Main Memory (RAM)


• Speed: Fast
• Latency: 50–100 ns (depends on DRAM type)
• Bandwidth: Moderate to high
• Capacity: Medium (4GB to 64GB+)
• Use Case: Holds active programs and data
• Cost: Moderate

4. ROM (Read-Only Memory)


• Speed: Slower than RAM
• Latency: ~100 ns to several microseconds
• Bandwidth: Low
• Capacity: Small (KB to MB)
• Use Case: Firmware storage
• Cost: Low to moderate

5. Solid-State Drives (SSD)


• Speed: High (especially NVMe SSDs)
• Latency: ~0.05–0.1 ms
• Bandwidth: Very high (especially for PCIe NVMe)
• Capacity: Large (128GB to several TB)
• Use Case: Fast permanent storage (OS, games, apps)
• Cost: Higher than HDD per GB
6. Hard Disk Drives (HDD)
• Speed: Moderate to slow
• Latency: 5–10 ms (mechanical delay)
• Bandwidth: Lower than SSD
• Capacity: Very large (500GB to 10TB+)
• Use Case: Mass storage (documents, videos, backups)
• Cost: Low per GB

7. Optical Discs (CD/DVD/Blu-ray)


• Speed: Very slow
• Latency: High (hundreds of milliseconds)
• Bandwidth: Low
• Capacity: Low to moderate (700MB – 50GB)
• Use Case: Media distribution, backup (becoming obsolete)
• Cost: Very low
8. Flash Memory (USB drives, SD cards)
• Speed: Moderate (depends on quality and interface)
• Latency: ~1–10 ms
• Bandwidth: Lower than SSDs
• Capacity: Small to moderate (2GB – 1TB)
• Use Case: Portable storage
• Cost: Moderate

9. Cloud Storage
• Speed: Depends on internet speed
• Latency: High (due to network)
• Bandwidth: Varies with internet and provider
• Capacity: Virtually unlimited
• Use Case: Remote access, backup, collaboration
• Cost: Subscription-based

4.5 Concept and Use of Virtual Memory in Computer Systems

Definition:
Virtual Memory is a memory management technique that allows a computer to compensate
for physical memory (RAM) limitations by temporarily transferring data from RAM to disk
storage. It enables a system to execute large programs or multiple programs simultaneously,
even if the physical RAM is insufficient.

Concept of Virtual Memory


Virtual memory creates an illusion of a large and continuous memory space, even though the
system may have limited physical RAM. It works by using a portion of the hard disk or SSD as
an extension of RAM, often referred to as the page file or swap space.

Key Concepts:

1. Address Translation:

o Programs use virtual addresses, which are translated by the Memory


Management Unit (MMU) to physical addresses.

2. Paging:

o Virtual memory is divided into pages (typically 4KB each).

o RAM is divided into page frames.

o Pages are loaded from secondary storage into RAM as needed.

3. Page Table:

o A data structure that keeps track of the mapping between virtual pages and
physical frames.
4. Page Fault:

o Occurs when the CPU accesses a page that is not in RAM.

o The operating system retrieves the page from disk into RAM.

How Virtual Memory Works:

1. The CPU generates a virtual address.


2. The MMU translates it to a physical address.
3. If the page is in RAM, data is accessed directly.

4. If the page is not in RAM (page fault), the OS:

o Pauses the program.

o Loads the page from disk into RAM.

o Updates the page table.

o Resumes execution.
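The steps above can be sketched in a few lines, assuming 4 KB pages and an always-available free frame (no eviction policy; all structures are simplified for illustration):

```python
# Minimal paging sketch: a page table plus a page-fault path that "loads"
# the missing page into the next free frame.
PAGE_SIZE = 4096
page_table = {}        # virtual page number -> physical frame number
next_free_frame = [0]

def translate(virtual_addr):
    vpn, offset = divmod(virtual_addr, PAGE_SIZE)
    if vpn not in page_table:       # Page fault: page not in RAM
        frame = next_free_frame[0]  # OS loads the page from disk...
        next_free_frame[0] += 1
        page_table[vpn] = frame     # ...and updates the page table
    return page_table[vpn] * PAGE_SIZE + offset

print(translate(8200))  # vpn=2, offset=8 -> frame 0 -> physical address 8
print(translate(8300))  # same page, already resident -> physical address 108
```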
Uses of Virtual Memory

1. Running Large Programs


• Programs larger than available RAM can still execute efficiently.
2. Multitasking
• Multiple programs can run simultaneously without exhausting RAM.

3. Memory Isolation and Protection


• Each process operates in its own virtual address space, protecting it from others.

4. Efficient Use of RAM


• Frequently used pages stay in RAM; infrequently used pages are swapped out.

5. Simplified Programming
• Programmers can write programs as if infinite memory is available.

Advantages of Virtual Memory

Advantage Description

Efficient memory usage Only necessary parts of programs are loaded into RAM

Program isolation Prevents one process from corrupting another’s memory

Simplified program development Developers don’t worry about exact memory availability

Multitasking support Enables multiple applications to run smoothly

Disadvantages of Virtual Memory

Disadvantage Description

Slower than physical RAM Disk access is much slower than RAM

Page faults cause delays Excessive page swapping (thrashing) degrades performance

Requires disk space Needs part of disk for swap file or page file
Real-Life Example:

If your computer has 4 GB of RAM and you're running programs that require 6 GB, virtual
memory will use disk space (e.g., 2 GB) as an extension of RAM, ensuring smooth operation —
though with reduced speed compared to real RAM.

5.1 Instruction Level Parallelism (ILP) and Its Importance in Increasing Computing
Performance

What is Instruction-Level Parallelism (ILP)?

Instruction-Level Parallelism (ILP) is a concept in computer architecture that refers to the ability of a CPU to execute multiple instructions simultaneously within a single program or
thread. It involves identifying independent instructions that can be executed in parallel rather
than sequentially, to improve CPU throughput and performance.

In simple terms, ILP means doing more work at once within the processor by overlapping the
execution of instructions.

5.2 Hardware and Software Techniques Used to Increase Instruction-Level Parallelism (ILP)

Instruction-Level Parallelism (ILP) aims to execute multiple instructions in parallel within a single processor. To exploit ILP effectively, both hardware mechanisms (built into the CPU) and
software techniques (employed by compilers and programmers) are used. These techniques help
identify, schedule, and manage independent instructions for simultaneous execution.

A. Hardware Techniques to Increase ILP

Hardware techniques are implemented within the processor architecture to dynamically identify
and exploit ILP during program execution.

1. Pipelining
• Breaks instruction execution into multiple stages (e.g., fetch, decode, execute, write-back).
• Allows overlapping execution of multiple instructions at different stages.
Result: Increases throughput and reduces instruction latency.
2. Superscalar Architecture
• Allows multiple instructions to be fetched, decoded, and executed per clock cycle.
• Uses multiple functional units (e.g., ALUs, FPUs).
Example: A dual-issue processor can process two instructions per cycle.

3. Out-of-Order Execution
• CPU executes instructions out of the original program order as long as data
dependencies are not violated.
• Keeps execution units busy and improves performance.

Benefit: Avoids stalling due to waiting for a previous instruction’s result.

4. Branch Prediction
• Predicts the outcome of conditional branches (e.g., if-else) before they are executed.
• Ensures the pipeline stays filled by speculatively executing instructions.

Modern CPUs use dynamic branch predictors with high accuracy.

5. Register Renaming
• Resolves false data dependencies by assigning new registers to avoid naming conflicts.
• Prevents Write After Read (WAR) and Write After Write (WAW) hazards.

6. Speculative Execution
• Executes instructions before it is certain they are needed (based on branch prediction).
• If the prediction is correct, the results are kept; if not, they are discarded.

7. Scoreboarding / Tomasulo’s Algorithm


• Techniques for dynamic scheduling of instructions in hardware.
• Allow instructions to be issued out-of-order but commit in order to maintain correctness.

B. Software Techniques to Increase ILP

Software techniques are applied by compilers and programmers to optimize code and expose
more parallelism.

1. Instruction Scheduling
• Compilers rearrange instructions to avoid stalls due to data hazards or delays.
• Example: Placing independent instructions between a load and its dependent instruction.

2. Loop Unrolling
• Reduces loop control overhead by executing multiple iterations in one loop pass.
• Increases the number of independent instructions for parallel execution.
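A sketch of unrolling by a factor of four (written in Python for readability; a compiler would apply the same transformation to machine code):

```python
# Rolled loop: one add plus loop control per iteration.
def rolled_sum(xs):
    total = 0
    for x in xs:
        total += x
    return total

# Unrolled by 4: four adds per pass, so less loop-control overhead and
# more independent work exposed for the hardware to overlap.
def unrolled_sum(xs):
    total, i, n = 0, 0, len(xs)
    while i + 4 <= n:
        total += xs[i] + xs[i + 1] + xs[i + 2] + xs[i + 3]
        i += 4
    while i < n:  # Handle the leftover iterations
        total += xs[i]
        i += 1
    return total

data = list(range(10))
print(rolled_sum(data), unrolled_sum(data))  # 45 45
```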

3. Software Pipelining
• Reorganizes loops so that different iterations are overlapped.
• Independent operations from multiple loop iterations are interleaved.

4. Function Inlining
• Reduces function call overhead and exposes more optimization opportunities.
• Allows better scheduling across what would have been function call boundaries.

5. Use of Parallel Libraries and Compilers


• Compilers like GCC, LLVM, and Intel compilers automatically perform static analysis to
identify ILP.
• Libraries and directives (e.g., OpenMP, SIMD intrinsics) guide compilers in parallel
instruction generation.
6. Data Dependency Analysis
• Compilers analyze instructions to identify true, anti, and output dependencies.
• Helps to schedule instructions or perform register renaming appropriately.

5.3 Impact of Instruction-Level Parallelism (ILP) on Pipeline Performance

Introduction:

Instruction-Level Parallelism (ILP) directly influences the efficiency and throughput of instruction pipelines in modern processors. Pipelining is a technique where the execution of
multiple instruction stages is overlapped to improve performance. The higher the ILP, the more
instructions can be kept in the pipeline simultaneously, maximizing the processor's utilization and
minimizing idle stages.

How ILP Enhances Pipeline Performance

1. Better Pipeline Utilization


• With high ILP, independent instructions are available to fill every pipeline stage.
• This ensures the pipeline is kept full, reducing idle time and improving throughput.

Result: More instructions completed per unit of time.

2. Reduced Pipeline Stalls and Hazards


• ILP allows the processor or compiler to reorder instructions to avoid pipeline stalls
caused by:

o Data hazards (dependencies between instructions)

o Control hazards (branches)

o Structural hazards (limited execution resources)

Example: While waiting for the result of an instruction, other independent instructions can be
executed.
3. Increased Instruction Throughput
• Throughput is the number of instructions completed per cycle.
• A high degree of ILP allows multiple instructions per cycle, especially in superscalar
and out-of-order processors.

Effect: More instructions are retired per second, leading to higher performance.

4. Better Branch Prediction and Speculative Execution


• ILP enables speculative execution, where instructions following a predicted branch are
executed in parallel.
• If the branch prediction is correct, the pipeline continues without interruption.

Effect: Minimizes performance penalties from branch instructions.

5. Enables Deeper and Wider Pipelines


• In systems with high ILP, CPU architects can design deeper pipelines (more stages) and
wider pipelines (multiple instruction issue slots).
• These designs require a steady stream of independent instructions to operate efficiently.

Result: Higher clock speeds and multiple instructions per cycle.

Impact of Low ILP on Pipeline Performance


When ILP is low, the pipeline suffers from:

1. Frequent Stalls
• Due to data or control dependencies, the pipeline stages wait for previous instructions to
complete.

2. Underutilization of Execution Units


• Some pipeline stages remain idle because there are no independent instructions to feed
them.

3. Increased Latency
• Execution time for each instruction increases, as the pipeline isn’t fully optimized.
Illustrative Example:

Assume a simple 5-stage pipeline:


Fetch → Decode → Execute → Memory → Write-back
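Under the standard idealized model, n independent instructions finish in k + (n - 1) cycles on a k-stage pipeline, versus k × n cycles without pipelining, and each dependency stall adds a bubble cycle. A small sketch of that arithmetic:

```python
def cycles(n_instructions, stages=5, stalls=0):
    """Cycle counts for a simple in-order pipeline (idealized model)."""
    unpipelined = stages * n_instructions
    pipelined = stages + (n_instructions - 1) + stalls
    return unpipelined, pipelined

# 10 independent instructions (high ILP): the pipeline is nearly 4x faster.
print(cycles(10))            # (50, 14)
# The same 10 instructions with 6 dependency stalls (low ILP): bubbles added.
print(cycles(10, stalls=6))  # (50, 20)
```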

5.4 Out-of-Order Execution and Speculative Execution in the Context of Instruction-Level Parallelism (ILP)

Introduction

Modern processors aim to execute instructions as fast as possible by exploiting Instruction-Level Parallelism (ILP). Two powerful hardware techniques that significantly enhance ILP are:

1. Out-of-Order Execution (OoOE)

2. Speculative Execution
These techniques allow the processor to continue executing instructions even when the natural
(program) order would require it to wait due to dependencies or uncertainties (like branches).

A. Out-of-Order Execution (OoOE)

Definition:
Out-of-Order Execution is a technique where the processor executes instructions as soon as their
operands are available, not necessarily in the order they appear in the program. This
increases instruction throughput by avoiding unnecessary stalls.

How It Works:

1. Instruction Fetch: Instructions are fetched in program order.

2. Instruction Queue: Instructions are placed in a buffer.

3. Dependency Check: The processor checks if the operands (data) needed for an instruction
are available.

4. Execution: If ready, the instruction is sent to the execution unit, even if earlier instructions
are not yet complete.

5. Reordering Buffer: Ensures results are committed in original program order to


preserve correctness.

B. Speculative Execution
Definition:

Speculative Execution is a technique where the processor guesses the outcome of instructions
(typically branches) and executes subsequent instructions ahead of time. If the guess is correct,
execution continues without interruption. If incorrect, the speculative work is discarded.

How It Works:

1. Branch Prediction:

o The CPU predicts the result of a conditional branch (e.g., if-else).

2. Speculative Execution:

o The CPU continues executing instructions beyond the branch as if the prediction
was correct.

3. Commit or Rollback:

o If the prediction is right, results are committed.

o If wrong, the CPU rolls back and executes the correct path.

5.5 Implications of Dependencies Between Instructions in the Concept of Instruction-Level Parallelism (ILP)

Introduction

In the context of Instruction-Level Parallelism (ILP), the ability to execute multiple instructions simultaneously is often limited by instruction dependencies. Dependencies
determine whether instructions can be safely reordered or executed in parallel. When dependencies
are present, they can cause pipeline stalls, reduce execution throughput, and limit the
effectiveness of ILP.
Types of Instruction Dependencies

Dependencies between instructions fall into three main categories, based on how they affect
parallel execution:

1. Data Dependencies (True Data Dependency or Read-After-Write – RAW)


• Occurs when an instruction depends on the result of a previous instruction.
Example:

assembly

1. A = B + C ; Instruction 1

2. D = A * E ; Instruction 2 (depends on result of A)


• Implication: Instruction 2 must wait for Instruction 1 to finish.
• Impact on ILP: Prevents simultaneous execution; causes stalls or delayed issue.

2. Name Dependencies (False Dependencies)

These dependencies exist because of shared register names, not because of actual data flow.
a. Write-After-Read (WAR)
• An instruction writes to a register after another instruction has read it.

assembly

1. B = A + 5 ; Reads A

2. A = C + 1 ; Writes A (should not happen before instruction 1 reads A)

b. Write-After-Write (WAW)
• Two instructions write to the same register; the second must not overwrite the first
prematurely.

assembly

1. A = B + 1
2. A = C + 2
• Implication: WAR and WAW can often be resolved by register renaming.
• Impact on ILP: Can cause unnecessary stalls if not handled.
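A toy sketch of register renaming: every write is assigned a fresh physical register, so the WAR example above no longer conflicts over the name A (the instruction representation and naming scheme are invented for illustration):

```python
# Rename each destination register to a fresh physical name, eliminating
# WAR/WAW name conflicts while preserving true (RAW) data flow.
def rename(instructions):
    version = {}  # architectural register -> latest physical name
    out = []
    for dest, srcs in instructions:
        srcs = [version.get(s, s) for s in srcs]  # read the latest versions
        version[dest] = f"{dest}{len(out)}"       # fresh physical register
        out.append((version[dest], srcs))
    return out

# B = A + 5 then A = C + 1 (a WAR hazard on A in the original code)
print(rename([("B", ["A"]), ("A", ["C"])]))
# [('B0', ['A']), ('A1', ['C'])]: the second write now targets A1,
# so the two instructions no longer conflict over the name A.
```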

3. Control Dependencies
• Occur when an instruction depends on the outcome of a branch or conditional statement.
Example:

assembly
1. if (x > 0)

2. y = x + 1;
• The instruction in line 2 depends on whether the branch in line 1 is taken.
• Implication: Until the condition is evaluated, the processor cannot know which path to
execute.
• Impact on ILP: Limits the number of instructions that can be issued speculatively.
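All three data-related dependency types can be detected mechanically from each instruction's read set and write set. A small hedged sketch (the representation of an instruction as a (registers-written, registers-read) pair is an assumption for this example):

```python
def classify(first, second):
    """Return the dependency types of `second` on `first`.
    Each instruction is a pair (set_of_registers_written, set_of_registers_read)."""
    w1, r1 = first
    w2, r2 = second
    deps = set()
    if w1 & r2:
        deps.add("RAW")   # second reads what first wrote (true dependency)
    if r1 & w2:
        deps.add("WAR")   # second writes what first read (anti-dependency)
    if w1 & w2:
        deps.add("WAW")   # both write the same register (output dependency)
    return deps

# A = B + C  then  D = A * E  ->  RAW dependency on A
print(classify(({"A"}, {"B", "C"}), ({"D"}, {"A", "E"})))
```

A scheduler can only reorder two instructions freely when this function returns an empty set (and no control dependency separates them).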
5.6 Flynn’s Classification of Parallel Computers

Introduction

Flynn's Classification is a method proposed by Michael J. Flynn in 1966 to categorize computer


architectures based on the number of instruction streams and data streams they can handle
simultaneously. It is a widely used model in computer architecture to describe the structure and
behavior of sequential and parallel computing systems.

Flynn's taxonomy classifies computers into four categories, using the concept of Instruction
Stream (IS) and Data Stream (DS):

Flynn's Four-Class Model

Category Instruction Stream Data Stream Description

SISD Single Single Traditional sequential computers

SIMD Single Multiple Same instruction on multiple data elements

MISD Multiple Single Rare; multiple instructions on same data

MIMD Multiple Multiple Most modern multiprocessors

1. SISD – Single Instruction, Single Data


• Description: Executes one instruction at a time on one data element.
• Architecture Type: Traditional Von Neumann architecture.
• Execution: Sequential; no parallelism.
• Example Systems:

o Simple uniprocessor systems

o Basic microcontrollers

2. SIMD – Single Instruction, Multiple Data


• Description: Executes the same instruction on multiple data elements simultaneously.
• Use Case: Ideal for data-parallel tasks (e.g., vector processing, graphics).
• Example Systems:

o GPUs (Graphics Processing Units)

o Vector processors (e.g., Intel AVX, ARM NEON)


o Array processors
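The SIMD idea, one instruction applied across many data elements, can be mimicked in ordinary Python. Real SIMD hardware such as Intel AVX or ARM NEON performs this in a single machine instruction per vector; the loop below is only a behavioral model:

```python
def simd_add(a, b):
    """Behavioral model of a SIMD vector add: the same ADD operation
    is applied to every lane of the two input vectors."""
    assert len(a) == len(b), "SIMD lanes must match"
    return [x + y for x, y in zip(a, b)]

# One "instruction" (add), four data lanes:
print(simd_add([1, 2, 3, 4], [10, 20, 30, 40]))  # -> [11, 22, 33, 44]
```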

3. MISD – Multiple Instruction, Single Data


• Description: Executes different instructions on the same data element.
• Practicality: Rare in commercial systems; mainly theoretical or specialized (e.g., fault-tolerant systems).
• Example Use:

o Real-time systems with redundancy for fault detection (e.g., space shuttles)

o Pipelined systems (some interpretations)

4. MIMD – Multiple Instruction, Multiple Data


• Description: Executes different instructions on different data simultaneously.
• Architecture Type: Most modern parallel systems fall into this category.
• Types:

o Shared Memory (e.g., multicore CPUs)

o Distributed Memory (e.g., clusters, supercomputers)


• Example Systems:

o Multi-core processors (Intel Core i9, AMD Ryzen)

o Parallel servers

o High-Performance Computing (HPC) clusters

6.1 Basic Concepts of Multiprocessors and Thread-Level Parallelism (TLP)

1. Multiprocessors: Basic Concept

Definition:
A multiprocessor system is a computer system with two or more processors (CPUs) that share
a common memory and are capable of executing multiple instructions simultaneously. These
processors collaborate to perform computational tasks more efficiently.
Types of Multiprocessor Systems
Symmetric Multiprocessing (SMP): All processors have equal access to shared memory and operate under a single OS.

Asymmetric Multiprocessing (AMP): One processor is the master and controls the others; often used in embedded systems.

Massively Parallel Systems: Systems with dozens to thousands of processors working together (e.g., supercomputers).

Key Features of Multiprocessor Systems


• Shared Memory: All CPUs access a common memory space.
• Concurrent Execution: Multiple instructions are executed at once.
• Inter-Processor Communication: Processors communicate via shared memory or
interconnects.
• Scalability: More processors can be added for higher performance.

Advantages of Multiprocessors
• Increased system throughput and performance.
• Faster execution of parallel tasks.
• Better reliability and fault tolerance.
• Efficient resource utilization in multitasking environments.

Example Use Cases


• Servers and data centers
• Scientific computing
• Real-time processing systems
• Graphics rendering
2. Thread-Level Parallelism (TLP): Basic Concept

Definition:
Thread-Level Parallelism (TLP) is the ability of a processor to execute multiple threads
simultaneously, either on separate cores or interleaved on the same core. A thread is the
smallest unit of a program that can be scheduled for execution.

TLP vs ILP

Aspect Instruction-Level Parallelism (ILP) Thread-Level Parallelism (TLP)

Granularity Within a single thread/program Across multiple threads

Focus Parallelism of instructions Parallelism of independent threads

Example Pipeline, Out-of-order execution Multi-threaded apps, parallel tasks

Types of Thread Execution

Coarse-Grained Multithreading: Switches threads only when a stall occurs (e.g., a memory access delay).

Fine-Grained Multithreading: Switches threads every clock cycle, reducing idle time.

Simultaneous Multithreading (SMT): Executes multiple threads in parallel on the same core, utilizing different functional units.

Multi-core Parallelism: Executes threads on separate processor cores simultaneously.

Advantages of TLP
• Better utilization of CPU resources.
• Improved responsiveness in multi-user/multitasking systems.
• Increased performance for parallel workloads (e.g., web servers, simulations).
• Enables smooth execution of background tasks.
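A minimal TLP sketch using Python's standard threading module is given below. Note that CPython's global interpreter lock limits CPU-bound speedup, so this illustrates the programming model (independent threads cooperating on one task) rather than raw performance:

```python
import threading

def partial_sum(data, results, index):
    # Each thread independently sums its own slice of the data.
    results[index] = sum(data)

data = list(range(1_000))
mid = len(data) // 2
results = [0, 0]

# Two threads work on the two halves concurrently.
t1 = threading.Thread(target=partial_sum, args=(data[:mid], results, 0))
t2 = threading.Thread(target=partial_sum, args=(data[mid:], results, 1))
t1.start(); t2.start()
t1.join(); t2.join()

print(sum(results))  # -> 499500, same as the sequential sum
```

Because the two threads write to different slots of `results`, no lock is needed here; the `join()` calls are the synchronization points that make both partial results visible before they are combined.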

6.2 Different Architectures of Multiprocessor Systems


Introduction

A multiprocessor system is a computer system with two or more processing units (CPUs or
cores) that operate simultaneously. The architecture of such systems determines how processors
are connected, how they communicate, and how memory is shared. These architectural designs
impact performance, scalability, complexity, and the ability to execute parallel tasks efficiently.

Main Multiprocessor Architectures

Multiprocessor systems are generally classified based on how they manage memory and
coordinate processors. The major architectures are:

1. Shared Memory Architecture

Description:

In shared memory architecture, all processors have access to a common global memory.
Processors communicate and exchange data by reading and writing to the shared memory.

Characteristics:
• All CPUs access the same address space.
• Communication is implicit via memory operations.
• Requires memory synchronization mechanisms (e.g., locks, semaphores).

Types of Shared Memory Architecture:

Uniform Memory Access (UMA): All processors have equal access time to memory.

Non-Uniform Memory Access (NUMA): Access time varies depending on which processor accesses which memory segment.

Cache-Coherent NUMA (ccNUMA): Adds hardware mechanisms to ensure cache consistency across processors.

Advantages:
• Easy to program (since memory is globally accessible).
• Efficient for tasks with frequent communication between processors.

Disadvantages:
• Scalability issues due to contention for shared memory.
• Complexity in managing cache coherence.

2. Distributed Memory Architecture


Description:

Each processor has its own private local memory. Processors communicate by explicit message
passing rather than shared memory.

Characteristics:
• No global memory.
• Communication is done using interconnection networks.
• Used in cluster computing and massively parallel processors (MPPs).
Advantages:
• Highly scalable.
• Avoids memory contention and bottlenecks.

Disadvantages:
• Complex programming model (requires explicit communication).
• Data locality must be managed manually.

Examples:
• MPI-based (Message Passing Interface) systems.
• Beowulf clusters.
• Supercomputers like Cray and IBM Blue Gene.
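Distributed-memory programs communicate by explicit messages (MPI's send/receive being the standard example). The toy model below uses two threads and queues as the "network"; it is purely illustrative of the message-passing style, since a real system would run the nodes on separate machines:

```python
import threading
import queue

def worker(inbox, outbox):
    # Each "node" holds only its private data: it receives work as a
    # message, computes locally, and sends the result back as a message.
    nums = inbox.get()         # explicit receive
    outbox.put(sum(nums))      # explicit send

to_node = queue.Queue()
from_node = queue.Queue()
node = threading.Thread(target=worker, args=(to_node, from_node))
node.start()

to_node.put([1, 2, 3, 4])      # send work to the "remote" node
result = from_node.get()       # receive its reply
node.join()
print(result)  # -> 10
```

Notice that the two sides share no variables at all; every exchange of data is an explicit put/get, which is exactly the discipline that message-passing architectures impose.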

3. Hybrid Architecture (Shared + Distributed Memory)

Description:
Combines both shared and distributed memory approaches. Often implemented in multi-core
clusters where each node has shared memory (among cores), and nodes communicate via message
passing.

Characteristics:
• Intra-node communication uses shared memory.
• Inter-node communication uses distributed memory.

Advantages:
• Balances ease of programming (within nodes) and scalability (across nodes).
• Efficient resource utilization.
Disadvantages:
• Increased system complexity.
• Requires hybrid programming models (e.g., OpenMP + MPI).

4. Symmetric Multiprocessing (SMP)


Description:
All processors are equal peers and share memory and I/O devices.

Characteristics:
• Shared memory.
• Single operating system instance.
• Common in desktop, server, and small multiprocessor systems.

Advantages:
• Simplified design.
• Efficient for a small number of processors.

Disadvantages:
• Not scalable beyond a certain number of processors.

5. Asymmetric Multiprocessing (AMP)

Description:

One processor is the master and controls the system; others are slaves that perform specific tasks.

Characteristics:
• Used in embedded systems or systems with mixed workloads.
• Limited or no memory sharing.

Advantages:
• Simpler task division.
• Useful in real-time or specialized systems.

Disadvantages:
• Poor fault tolerance.
• Not suitable for general-purpose parallel computing.
6.3 Concept of Cache Coherence and Memory Consistency in Multiprocessor Systems

Introduction

In multiprocessor systems, where multiple CPUs (or cores) have their own private caches and
share a main memory, ensuring data consistency becomes a major challenge. This leads to the
concept of:
• Cache Coherence
• Memory Consistency

These mechanisms help ensure that all processors see the correct and updated value of shared
variables and maintain the expected behavior of memory operations across processors.

A. Cache Coherence

What is Cache Coherence?


Cache coherence refers to the uniformity of shared data values that are stored in multiple
caches. In multiprocessor systems, if a processor modifies a value in its cache, the change must
be visible to other processors accessing the same memory location to prevent inconsistent views
of memory.

Why Cache Coherence is Needed

Let’s consider two processors P1 and P2, both having a cached copy of variable X = 5.
• P1 updates X to 10 in its cache.
• P2 continues to read X as 5 from its own cache.
This creates an inconsistency — a cache coherence problem.

Key Goals of Cache Coherence Protocols


• Ensure that no two caches have inconsistent copies of the same memory location.
• Ensure that any read operation returns the most recent write.
Basic Cache Coherence Protocols

Write-through: Writes are immediately passed to main memory, and other caches are updated.

Write-back: Writes occur only in the cache; memory is updated later (requires tracking of changes).

Snoopy Protocol: All caches monitor (or "snoop") a common bus to observe and react to memory actions.

Directory-based Protocol: A centralized directory keeps track of where copies of data exist and coordinates coherence.

Common Protocols Used:


• MSI Protocol (Modified, Shared, Invalid)
• MESI Protocol (Modified, Exclusive, Shared, Invalid)
• MOESI Protocol (Modified, Owned, Exclusive, Shared, Invalid)

These protocols define states and transitions for cache blocks based on reads/writes by other
processors.
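As a concrete illustration, here is a highly simplified MSI state machine for one cache's view of a single line. This is a teaching model only; real protocols add bus transactions, write-backs, ownership states (as in MESI/MOESI), and many corner cases:

```python
# Simplified MSI transitions. States: "M" (Modified), "S" (Shared), "I" (Invalid).
def msi_next(state, event):
    """Return the next MSI state for (current_state, event).
    Events: local_read, local_write, remote_read, remote_write."""
    transitions = {
        ("I", "local_read"):   "S",  # fetch a shared copy
        ("I", "local_write"):  "M",  # fetch with intent to modify
        ("S", "local_write"):  "M",  # upgrade; other copies are invalidated
        ("S", "remote_write"): "I",  # another cache wrote: our copy is stale
        ("M", "remote_read"):  "S",  # write back, downgrade to shared
        ("M", "remote_write"): "I",  # another cache took ownership
    }
    return transitions.get((state, event), state)  # otherwise stay put

# P1 writes X, then P2 reads X: P1's line goes I -> M -> S.
state = "I"
for event in ["local_write", "remote_read"]:
    state = msi_next(state, event)
print(state)  # prints S
```

Following the transitions through the X = 5 scenario above: when P1 updates X, P2's copy moves to Invalid, so P2's next read misses and fetches the new value, which is precisely how the coherence problem is avoided.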

B. Memory Consistency

What is Memory Consistency?

While cache coherence ensures correctness of individual memory locations, memory consistency
defines the order in which memory operations (reads/writes) become visible across processors.

Memory Consistency Models

Strict Consistency: A read returns the most recent write instantly; hard to implement.

Sequential Consistency: The result is the same as if operations were executed in some sequential order.

Weak Consistency: Memory may appear out of order unless explicitly synchronized.

Release Consistency: Operations are ordered at synchronization points (e.g., locks/barriers).
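Weak and release consistency both push correctness onto explicit synchronization. The classic demonstration in Python is a shared counter guarded by a lock: without the lock, interleaved read-modify-write sequences can lose updates, while with it the final result is as if the increments ran in some sequential order:

```python
import threading

counter = 0
lock = threading.Lock()

def increment(n):
    global counter
    for _ in range(n):
        with lock:        # synchronization point orders the updates
            counter += 1  # the read-modify-write is now atomic

threads = [threading.Thread(target=increment, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # -> 40000: a sequentially consistent outcome
```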

6.4 Role of Parallel Programming and How to Exploit Thread-Level Parallelism (TLP)

Introduction
Modern computing systems are built with multiple cores and processors. To fully utilize their
power, we need to write programs that can execute tasks concurrently — this is where parallel
programming and thread-level parallelism (TLP) come in.
• Parallel Programming is the technique of writing code that runs multiple instructions
or tasks simultaneously.
• Thread-Level Parallelism (TLP) refers to executing multiple threads of a program
concurrently to improve speed, efficiency, and performance.
Role of Parallel Programming in Modern Computing

1. Utilization of Multi-Core CPUs


• Today’s processors have multiple cores.
• Serial (non-parallel) code uses only one core, wasting others.
• Parallel programming ensures all cores are actively used.

2. Improved Performance and Speed


• Parallel programs divide tasks into smaller subtasks.
• These are executed at the same time, significantly reducing execution time.

3. Scalability
• Well-designed parallel programs can scale across many cores or nodes, making them
suitable for:

o Cloud computing

o High-Performance Computing (HPC)

o Real-time systems

4. Handling Complex Applications


• Essential in fields like:

o Scientific simulations

o Video rendering

o Machine learning

o Web servers

6.5 Performance Implications and Scalability of Multiprocessor Systems


Introduction

Multiprocessor systems (computers with two or more processing units) are designed to improve performance, handle more workloads, and scale better than single-processor systems. However, achieving efficient performance and ensuring scalability is not automatic; it depends on hardware design, memory architecture, inter-processor communication, and the nature of the software being run.
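A standard way to quantify this scalability limit is Amdahl's Law: if a fraction p of a program is parallelizable, the best possible speedup on n processors is 1 / ((1 - p) + p / n). A quick numeric check (the function name `amdahl_speedup` is ours; the formula itself is the standard one):

```python
def amdahl_speedup(p, n):
    """Amdahl's Law: upper bound on speedup when fraction p of the
    work is parallelizable across n processors."""
    return 1.0 / ((1.0 - p) + p / n)

# Even with 90% parallel code, 10 processors give only about 5.3x, not 10x.
print(round(amdahl_speedup(0.9, 10), 2))  # -> 5.26
```

The serial fraction (1 - p) dominates as n grows, which is why the software being run matters as much as the number of processors.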
