Lecture Notes Computer System Architecture
As shown in the diagram, input is given to the CPU through input devices. This input goes to
memory, and the control unit fetches instructions from memory. The control unit then decides
what to do with the input or instructions and transfers the data to the ALU. The ALU performs
operations such as addition, subtraction, multiplication, division, and logical operations. After
that, the final result is stored in memory and finally passed to output devices to give the
output. This is how the CPU works.
2. Motherboard
It is the main circuit board inside a computer, and it holds together most of the electronic
components. All the components of the computer are directly or indirectly connected to the
motherboard. It includes RAM slots, controllers, system chipsets, etc.
3. RAM (Random Access Memory)
It is also known as temporary or volatile memory. It holds the programs and data that are
currently being processed. All the data is erased as soon as the computer is turned off or in
case of a power failure. Data stored in this memory can be changed. There are two types of
RAM:
1. SRAM (Static RAM): SRAM consists of flip-flops built from transistors (MOSFETs).
It is fast and has a low access time, and refreshing circuits are not required.
However, it is costly and occupies more chip area. Example: cache memory.
2. DRAM (Dynamic RAM): DRAM stores data as charge on capacitors: a charged
capacitor represents 1 and a discharged capacitor represents 0. Because the
capacitors leak charge, DRAM requires refreshing circuitry to retain the data.
It is slower, with a higher access time, but cheaper than SRAM. Example: main
memory.
4. Video Graphics Array Port
A video graphics array (VGA) port is a video connector commonly used on computer
monitors. Troubleshooting a VGA port includes verifying that there isn't a loose connection,
a damaged cable, or a broken display. A technician can also spray compressed air inside the
VGA port to keep it dust-free.
5. Power Supply
A power supply provides power to all of a computer system's parts. Typically, a power cord
connects the computer tower to an electrical outlet. A technician can diagnose the power
supply by turning off the computer, unplugging and reseating the power cord, or trying a
different cord or outlet.
6. Cooling Fan
A computer uses cooling fans to prevent overheating. To support users who work their
computers intensively, such as when streaming video or playing games, many
computers contain more than one cooling fan. If a user detects their computer overheating,
a computer expert might need to repair the cooling fan. The blades may be examined for
any damage and cleared of any foreign objects. A technician’s standard method
of troubleshooting may involve replacing computer fans.
7. Hard Drive
Hard drives are data storage devices that hold files, programs, and other types of
information on a computer system. They use magnetically coated discs to store digital
versions of information. When a hard drive fails, a computer technician may suspect a
corrupt disk.
Relationship Between Computer Hardware and Software
Hardware and software are mutually dependent on each other. Each must
function properly for the computer to produce an output.
Software cannot be used without supporting hardware.
The relevant software must be loaded onto the hardware before it can be used.
Hardware is a one-time expense, while software involves ongoing cost.
Software development is expensive, while hardware cannot be redeveloped once
it is in use.
Many software applications and their sub-applications can be loaded on the same
hardware to run different jobs.
The software acts as an interface between the user and the hardware.
The input device accepts coded information as a source program, i.e., a high-level language
program. This is either stored in memory or used immediately by the processor to perform
the desired operations. The program stored in memory determines the processing steps.
Basically, the computer converts the source program into an object program, i.e., into machine
language. Finally, the results are sent to the outside world through an output device. All of
these actions are coordinated by the control unit.
Input unit: -
The source program (high-level language program, coded information, or simply data) is fed
to a computer through input devices; the keyboard is the most common type. Whenever a key
is pressed, the corresponding word or number is translated into its equivalent binary code
over a cable and fed either to memory or to the processor. Joysticks, trackballs, mice,
scanners, etc. are other input devices.
Memory unit: - Its function is to store programs and data. It is basically of two types:
1. Primary memory
2. Secondary memory
1. Primary memory: -
Primary memory is the memory exclusively associated with the processor, and it operates at
electronic speeds. Programs must be stored in this memory while they are being executed.
The memory contains a large number of semiconductor storage cells, each capable of storing
one bit of information. These cells are processed in groups of fixed size called words. To
provide easy access to a word in memory, a distinct address is associated with each word
location. Addresses are numbers that identify memory locations. The number of bits in each
word is called the word length of the computer. Programs must reside in the memory during
execution. Instructions and data can be written into the memory or read out under the
control of the processor. Memory in which any location can be reached in a short, fixed
amount of time after specifying its address is called random-access memory (RAM). The time
required to access one word is called the memory access time. Memory which is only
readable by the user and whose contents cannot be altered is called read-only memory
(ROM); it typically holds firmware such as the boot code. Caches are small, fast RAM units
which are coupled with the processor and are often contained on the same IC chip to
achieve high performance. Although primary storage is essential, it tends to be expensive.
2. Secondary memory: -
Secondary memory is used where large amounts of data and programs have to be stored,
particularly information that is accessed infrequently. Examples: magnetic disks and tapes,
optical disks (i.e., CD-ROMs), floppies, etc.
Arithmetic logic unit (ALU): -
Most computer operations are executed in the ALU of the processor, such as addition,
subtraction, division, and multiplication. The operands are brought into the ALU from memory
and stored in high-speed storage elements called registers. Then, according to the
instructions, the operations are performed in the required sequence. The control unit and the
ALU are many times faster than other devices connected to a computer system. This enables
a single processor to control a number of external devices such as keyboards, displays,
magnetic and optical disks, sensors, and other mechanical controllers.
Output unit: -
These are the counterparts of the input unit. Their basic function is to send the processed
results to the outside world. Examples: printer, speakers, monitor, etc.
Control unit: -
It is effectively the nerve center that sends signals to other units and senses their states. The
actual timing signals that govern the transfer of data between the input unit, processor,
memory, and output unit are generated by the control unit.
Basic operational concepts: -
To perform a given task, an appropriate program consisting of a list of instructions is stored
in the memory. Individual instructions are brought from the memory into the processor,
which executes the specified operations. Data to be processed are also stored in the memory.
Example: Add LOCA, R0. This instruction adds the operand at memory location LOCA to the
operand in register R0 and places the sum into register R0.
This instruction requires the performance of several steps:
1. First, the instruction is fetched from the memory into the processor.
2. The operand at LOCA is fetched and added to the contents of R0.
3. Finally, the resulting sum is stored in register R0.
The preceding Add instruction combines a memory access operation with an ALU operation.
In some other types of computers, these two types of operations are performed by separate
instructions for performance reasons:
Load LOCA, R1
Add R1, R0
Transfers between the memory and the processor are started by sending the address of the
memory location to be accessed to the memory unit and issuing the appropriate control
signals. The data are then transferred to or from the memory.
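As a rough illustration, the three steps of the Add instruction can be mimicked in Python; the memory contents and register value below are invented for the example.

```python
# Hypothetical trace of "Add LOCA, R0": a tiny memory, one register,
# and the fetch-operand / add / store-back sequence described above.
memory = {"LOCA": 7}   # operand stored at memory location LOCA (invented value)
R0 = 5                 # initial register contents (invented value)

# Step 1: the instruction itself is fetched (implicit in this sketch).
# Step 2: the operand at LOCA is fetched and added to the contents of R0.
operand = memory["LOCA"]
R0 = R0 + operand

# Step 3: the resulting sum is left in register R0.
print(R0)  # 12
```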
Computer components
Computer components are the essential building blocks of a functional computer system.
The processor (CPU), memory, and input/output devices are every computer's three main
building blocks. Initially, computers were primarily used for numerical computations,
because any information can be numerically encoded. The ability of computers to process
information for several purposes was quickly recognized.
o Input Devices
o CPU
o Output Devices
o Primary Memory
o Secondary Memory
Input Devices
A computer system's input devices are important because they allow users to enter
commands and data. Keyboards, mice, scanners, and microphones are common examples
of input devices.
o The keyboard is the most commonly utilized input device for inserting text and queries
into a computer system.
o Mice are another common input tool used to move the cursor on a computer screen.
o Scanners are used for inputting physical documents or images into a computer system.
o Microphones are used to input audio data into a system for computing. They can be
used for various tasks, including recording audio for podcasts, participating in video
conferences, and creating voice memos for later use.
CPU
The central processing unit (CPU) is the "brain" of a computer. It executes the calculations
and commands required for the computer to function. The CPU comprises several
components: the control unit, the arithmetic logic unit (ALU), and registers.
o The CPU's control unit is a crucial component. It is in charge of fetching and decoding
instructions from memory, and it directs the appropriate parts of the CPU to execute
these instructions.
o The arithmetic logic unit (ALU) is another crucial CPU part. The ALU performs
addition, subtraction, comparisons, and other logical and mathematical
operations. These operations are carried out using binary logic, which limits
operations to the digits 0 and 1.
o Registers are compact, high-speed storage spaces for data and instructions within the
CPU. They are used to temporarily store data that is being processed by the CPU.
Registers accelerate data processing because they are much faster than other forms
of memory, such as RAM.
o The CPU's clock speed is another crucial aspect that affects overall performance.
The clock speed, measured in gigahertz (GHz), determines how many instructions
the central processing unit can process in a second.
Modern CPUs also have additional features beyond the components above, such as cache
memory, virtualization capability, and multiple cores. A cache is a small, quick memory
used to store data and instructions that are utilized frequently. Virtualization capability
allows a single CPU to run multiple operating systems. Multiple cores allow the CPU to
execute several tasks simultaneously, enhancing its performance and multitasking
capabilities.
Primary Memory
The CPU has direct access to primary memory, sometimes referred to as random access
memory (RAM). The data and instructions that are currently being processed are kept in
primary memory. The data and instructions are accessed by the CPU from primary memory
when a computer programme is running. The information is removed from primary memory
once the programme is completed.
Primary memory is classified into two types: random access memory (RAM) and read-only
memory (ROM).
o RAM is the most common form of primary memory and is used to store data and
instructions that the CPU needs to access frequently. RAM is volatile, which means
that its contents are lost when the computer is turned off. But RAM can be quickly
and easily written to and read from, making it an ideal storage medium for
temporary data and instructions.
o ROM is a form of memory that is used to store data and instructions that don't change.
ROM is non-volatile, which means that its contents aren't lost when the computer is
turned off. ROM is used to keep firmware and the computer's basic input/output
system (BIOS), which are required for the computer to boot up and function properly.
Other primary memory types, such as cache memory, are sometimes used in computer
systems. Cache is a high-speed memory that stores frequently used data and instructions.
It speeds up processing by lowering the time the CPU has to wait for data to be retrieved
from RAM or secondary storage devices.
Secondary Memory:
Secondary memory, also called auxiliary storage, is a type of computer memory that is used
to store data and programs that aren't currently being utilized by the CPU. In contrast to
primary memory, secondary memory is non-volatile, which means that its contents are not
lost when the computer is turned off.
There are several types of secondary memory devices, such as hard disk drives (HDD), solid-
state drives (SSD), optical disks (including CDs and DVDs), and USB flash drives. These devices
have varying storage capacities, read and write speeds, and different capabilities that make
them appropriate for different types of applications.
o Hard disk drives are the most common secondary memory devices in desktop and
laptop computers. They come in various sizes and speeds and keep data on magnetic
discs. Solid-state drives, by contrast, employ flash memory to store data and are
typically quicker and more reliable, though more expensive than HDDs.
o Optical discs are a form of secondary memory that reads and writes data to discs using
lasers. They are frequently used for data backup, distribution of software, and other
digital information. USB flash drives are small, portable storage devices that connect
to a computer's USB port.
Users can store large amounts of data and programs in secondary memory, which makes
them quickly and readily accessible when needed. Users can also protect crucial data from
loss due to system crashes or other issues by using secondary memory devices as backups.
Output Devices:
Output devices are hardware components of a computer system that are used to display or
send data from the computer to the user or to another device. They enable users to view and
engage with the information and applications the computer is processing. Speakers,
projectors, printers, and monitors are a few examples of output devices.
o Monitors are the most frequently used output devices for displaying data on a
computer system. They can be used to show photos, videos, and other forms of
data, and they come in various sizes and resolutions.
o Printers are another form of output device that is used to print hard copies of
documents and other forms of data. They include inkjet and laser printers and are
available in various sizes and brands. While laser printers utilize toner to make
speedy, high-volume prints, inkjet printers employ liquid ink to produce high-quality
prints.
o Speakers are used to output sound from a computer system. They can be connected
externally or incorporated into the computer system. They enable users to interact
with other forms of multimedia material, view videos, and listen to music.
o Projectors are output devices that display large images and videos on a screen or wall.
They are frequently utilized in presentations and other occasions that call for a sizable
display.
Computer components collaborate to carry out the numerous tasks necessary for a computer
system to run. The following are some of the major operations carried out by computer
components:
1. Inputting: It is the process of entering raw data, instructions and information into the
computer. Keyboards, mice, and scanners are used as input devices to help with the
process. These tools are used to enter information and instructions into a computer
system. Data is transferred to the CPU for processing after inputting by an input device.
2. Processing: It is the process of converting the raw data into useful information. This
process is performed by the CPU of the computer. It takes the raw data from storage,
processes it and then sends back the processed data to storage. The CPU performs
arithmetic computations, logical operations, and data transport processes.
3. Storing: The computer has primary memory and secondary storage to store data and
instructions. It stores the data before sending it to CPU for processing and also stores
the processed data before displaying it as output. The primary memory, sometimes
called RAM, holds the data and instructions while the CPU processes them. Hard disk drives
and solid-state drives, which serve as secondary memory, offer long-term storage for data
and programs that are not currently used.
4. Outputting: It is the process of presenting the processed data through output devices
like monitor, printer and speakers. These devices display or produce the results of the
processing performed by the CPU. The results are sent to an output device for display
or printing after the CPU has finished processing the data and instructions.
5. Controlling: This operation is performed by the control unit, which is part of the CPU.
The control unit ensures that all basic operations are executed in the right manner
and sequence. The main circuit board connects all the parts of the computer and
regulates the data flow between them, ensuring they function properly.
Performance Measures
In computer organization, performance refers to the speed and efficiency at which a
computer system can execute tasks and process data. A high-performing computer system
is one that can perform tasks quickly and efficiently while minimizing the amount of time
and resources required to complete these tasks.
There are several factors that can impact the performance of a computer system, including:
Processor speed: The speed of the processor, measured in GHz (gigahertz),
determines how quickly the computer can execute instructions and process
data.
Memory: The amount and speed of the memory, including RAM (random
access memory) and cache memory, can impact how quickly data can be
accessed and processed by the computer.
Storage: The speed and capacity of the storage devices, including hard drives
and solid-state drives (SSDs), can impact the speed at which data can be stored
and retrieved.
I/O devices: The speed and efficiency of input/output devices, such
as keyboards, mice, and displays, can impact the overall performance of the
system.
Software optimization: The efficiency of the software running on the system,
including operating systems and applications, can impact how quickly tasks can
be completed.
Improving the performance of a computer system typically involves optimizing one or more
of these factors to reduce the time and resources required to complete tasks. This can
involve upgrading hardware components, optimizing software, and using specialized
performance-tuning tools to identify and address bottlenecks in the system.
Computer performance is the amount of work accomplished by a computer system. The
word performance in computer performance means “How well is the computer doing the
work it is supposed to do?”. It basically depends on the response time, throughput, and
execution time of a computer system. Response time is the time from the start to
completion of a task. This also includes:
Operating system overhead.
Waiting for I/O and other processes
Accessing disk and memory
Time spent executing on the CPU or execution time.
Throughput is the total amount of work done in a given time. CPU execution time is the
total time a CPU spends computing on a given task. It also excludes time for I/O or running
other programs. This is also referred to as simply CPU time. Performance is determined by
execution time as performance is inversely proportional to execution time.
Performance = (1 / Execution time)
And,
(Performance of A / Performance of B)
= (Execution Time of B / Execution Time of A)
If processor A is faster than processor B, then the execution time of A is less than the
execution time of B, and therefore the performance of A is greater than the performance
of B. Example: Machine A runs a program in 100 seconds; Machine B runs the same program
in 125 seconds.
(Performance of A / Performance of B)
= (Execution Time of B / Execution Time of A)
= 125 / 100 = 1.25
That means Machine A is 1.25 times faster than Machine B. The time to execute a given
program can be computed as:
Execution time = CPU clock cycles x clock cycle time
Since clock cycle time and clock rate are reciprocals:
Execution time = CPU clock cycles / clock rate
The number of CPU clock cycles can be determined by:
CPU clock cycles
= (No. of instructions / Program) x (Clock cycles / Instruction)
= Instruction Count x CPI
Which gives:
Execution time
= Instruction Count x CPI x clock cycle time
= Instruction Count x CPI / clock rate
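These relations can be checked numerically. The sketch below uses the worked example from the text (100 s vs. 125 s) plus an invented instruction count, CPI, and clock rate:

```python
# Sketch of the performance relations above (all numbers illustrative).
def execution_time(instruction_count, cpi, clock_rate_hz):
    """Execution time = Instruction Count x CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz

# Machine A runs a program in 100 s, Machine B in 125 s:
time_a, time_b = 100.0, 125.0
speedup_of_a = time_b / time_a   # Performance of A / Performance of B
print(speedup_of_a)              # 1.25

# Execution time from instruction count, CPI, and clock rate
# (1 billion instructions, CPI of 2, 2 GHz clock -- invented figures):
t = execution_time(instruction_count=10**9, cpi=2, clock_rate_hz=2 * 10**9)
print(t)                         # 1.0 second
```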
Chapter-2
2.1 Fundamentals of Instructions
Instruction
An instruction is a command given to the processor telling it which task to do and how to
perform that task. Whatever the processor (x86, AMD, Intel Pentium, etc.), it performs the
task according to the instructions provided.
Types of instruction
1. Data transfer instruction
2. Data manipulation instruction
3. Program control instruction
1. Data transfer instruction
As the name suggests, these instructions are used when we want to transfer data, e.g.
MOV, LD, STA, XCHG, PUSH, POP, etc.
If we want to transfer data between registers, between a register and memory (and vice
versa), between memory locations, or between input and output devices, we use these
instructions.
An instruction typically has three parts:
1. Opcode - the first part of the instruction, which tells the computer what function/task
is to be performed.
2. Address/mode - the second part of the instruction, which indicates where the task is
to be performed: where to fetch the operands, where to store the data, or where to
manipulate the data.
3. Operand - the data on which the operation is to be performed.
A stack-based computer does not use the address field in the instruction. To evaluate an
expression first it is converted to reverse Polish Notation i.e. Postfix Notation.
Expression: X = (A+B)*(C+D)
Postfixed : X = AB+CD+*
TOP means top of stack
M[X] is any memory location
PUSH A TOP = A
PUSH B TOP = B
ADD TOP = A + B
PUSH C TOP = C
PUSH D TOP = D
ADD TOP = C + D
MUL TOP = (A+B) * (C+D)
POP X M[X] = TOP
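The stack evaluation of the postfix form AB+CD+* can be sketched in Python with an explicit stack; the operand values are invented for illustration.

```python
# Minimal sketch of zero-address (stack) evaluation of X = (A+B)*(C+D),
# mirroring PUSH/ADD/MUL/POP on an explicit stack. Values are illustrative.
values = {"A": 2, "B": 3, "C": 4, "D": 5}
stack = []

def push(name): stack.append(values[name])                     # PUSH name
def add():      b, a = stack.pop(), stack.pop(); stack.append(a + b)  # ADD
def mul():      b, a = stack.pop(), stack.pop(); stack.append(a * b)  # MUL

push("A"); push("B"); add()    # TOP = A + B
push("C"); push("D"); add()    # TOP = C + D
mul()                          # TOP = (A+B) * (C+D)
X = stack.pop()                # POP X
print(X)  # 45
```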
Expression: X = (A+B)*(C+D)
AC is accumulator
M[] is any memory location
M[T] is temporary location
LOAD A AC = M[A]
ADD B AC = AC + M[B]
STORE T M[T] = AC
LOAD C AC = M[C]
ADD D AC = AC + M[D]
MUL T AC = AC * M[T]
STORE X M[X] = AC
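The one-address (accumulator) program above can be traced in Python, with a dict standing in for memory M[] and a variable AC as the accumulator; the operand values are invented.

```python
# Trace of the accumulator program for X = (A+B)*(C+D); values illustrative.
M = {"A": 2, "B": 3, "C": 4, "D": 5, "T": 0, "X": 0}
AC = 0

AC = M["A"]          # LOAD A
AC = AC + M["B"]     # ADD B
M["T"] = AC          # STORE T  (temporary location)
AC = M["C"]          # LOAD C
AC = AC + M["D"]     # ADD D
AC = AC * M["T"]     # MUL T
M["X"] = AC          # STORE X
print(M["X"])  # 45
```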
Expression: X = (A+B)*(C+D)
R1, R2 are registers
M[] is any memory location
MOV R1, A R1 = M[A]
ADD R1, B R1 = R1 + M[B]
MOV R2, C R2 = M[C]
ADD R2, D R2 = R2 + M[D]
MUL R1, R2 R1 = R1 * R2
MOV X, R1 M[X] = R1
Expression: X = (A+B)*(C+D)
R1, R2 are registers
M[] is any memory location
ADD R1, A, B R1 = M[A] + M[B]
ADD R2, C, D R2 = M[C] + M[D]
MUL X, R1, R2 M[X] = R1 * R2
E.g. INC A (increments the contents of A by 1)
CLC (used to reset the Carry flag to 0)
In the case of auto-increment mode, the content of the register is used as the effective
address, and once the operand has been accessed, the content of the register is
incremented so that it refers to the next operand.
In the case of auto-decrement mode, the content of the register is first decremented, and
the decremented content is then used as the effective address of the operand.
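Assuming the usual convention (auto-increment uses the register content and then increments it; auto-decrement decrements first and then uses it), the two modes can be sketched as follows; the addresses and memory contents are invented.

```python
# Sketch of auto-increment and auto-decrement addressing.
memory = {100: "op0", 101: "op1", 102: "op2"}   # invented operands

def auto_increment(reg):
    ea = reg[0]          # use the current content as the effective address
    reg[0] += 1          # then increment to point at the next operand
    return ea

def auto_decrement(reg):
    reg[0] -= 1          # decrement first
    return reg[0]        # then use the new content as the effective address

reg_i = [100]
first = memory[auto_increment(reg_i)]   # reads op0; reg_i now holds 101
reg_d = [102]
second = memory[auto_decrement(reg_d)]  # 102 -> 101, reads op1
print(first, second)
```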
Relative addressing mode:
EA = A + (PC)
Here, EA: effective address, PC: program counter.
Indexed addressing means that the final address for the data is determined by adding an
offset to a base address.
This memory address mode is ideal to store and access values stored in arrays. Arrays are
often stored as a complete block in memory (A block of consecutive memory locations). The
array has a base address which is the location of the first element, then an index is used that
adds an offset to the base address in order to fetch the specified element within the array.
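A minimal sketch of indexed addressing over an array stored as a contiguous block; the base address and array contents are invented.

```python
# Indexed addressing: EA = base address + offset (index).
memory = {200: 10, 201: 20, 202: 30, 203: 40}   # array block at base 200

def fetch_indexed(base, index):
    ea = base + index          # effective address of the selected element
    return memory[ea]

print(fetch_indexed(200, 2))   # 30 (third element of the array)
```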
Chapter-3
Register File
Program counter
The program counter stores the address of the next instruction to be executed.
Since the program counter contains an address, its size depends on the number of
address bits.
Instruction Cycle
Memory Address Register (MAR): It is connected to the address lines of the system bus. It
specifies the address in memory for a read or write operation.
Memory Buffer Register (MBR): It is connected to the data lines of the system bus. It
contains the value to be stored in memory or the last value read from memory.
Program Counter (PC): Holds the address of the next instruction to be fetched.
Instruction Register (IR): Holds the last instruction fetched.
In computer organization, an instruction cycle, also known as a fetch-decode-execute
cycle, is the basic operation performed by a central processing unit (CPU) to execute an
instruction. The instruction cycle consists of several steps, each of which performs a
specific function in the execution of the instruction. The major steps in the instruction
cycle are:
1. Fetch: In the fetch cycle, the CPU retrieves the instruction from memory. The
instruction is typically stored at the address specified by the program counter
(PC). The PC is then incremented to point to the next instruction in memory.
2. Decode: In the decode cycle, the CPU interprets the instruction and determines
what operation needs to be performed. This involves identifying the opcode
and any operands that are needed to execute the instruction.
3. Execute: In the execute cycle, the CPU performs the operation specified by the
instruction. This may involve reading or writing data from or to memory,
performing arithmetic or logic operations on data, or manipulating the control
flow of the program.
There are also some additional steps that may be performed during the instruction
cycle, depending on the CPU architecture and instruction set:
4. Fetch operands: In some CPUs, the operands needed for an instruction are
fetched during a separate cycle before the execute cycle. This is called the fetch
operands cycle.
5. Store results: In some CPUs, the results of an instruction are stored during a
separate cycle after the execute cycle. This is called the store results cycle.
6. Interrupt handling: In some CPUs, interrupt handling may occur during any
cycle of the instruction cycle. An interrupt is a signal that the CPU receives from
an external device or software that requires immediate attention. When an
interrupt occurs, the CPU suspends the current instruction and executes an
interrupt handler to service the interrupt.
These cycles are the basic building blocks of the CPU’s operation and are performed for
every instruction executed by the CPU. By optimizing these cycles, CPU designers can
improve the performance and efficiency of the CPU, allowing it to execute instructions
faster and more efficiently.
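The fetch-decode-execute cycle can be sketched as a toy interpreter loop; the opcodes, program, and memory layout below are invented purely for illustration.

```python
# Toy fetch-decode-execute loop for a made-up accumulator instruction set.
memory = {0: ("LOAD", 10), 1: ("ADD", 11), 2: ("HALT", None),
          10: 7, 11: 5}       # program at addresses 0..2, data at 10..11
pc, acc, running = 0, 0, True

while running:
    opcode, operand = memory[pc]   # fetch: read the instruction at PC
    pc += 1                        # PC now points at the next instruction
    if opcode == "LOAD":           # decode + execute
        acc = memory[operand]      # load operand from memory into accumulator
    elif opcode == "ADD":
        acc = acc + memory[operand]
    elif opcode == "HALT":
        running = False

print(acc)  # 12
```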
The Instruction Cycle –
Each phase of Instruction Cycle can be decomposed into a sequence of elementary micro-
operations. In the above examples, there is one sequence each for the Fetch, Indirect,
Execute and Interrupt Cycles.
The Indirect Cycle is always followed by the Execute Cycle. The Interrupt Cycle is always
followed by the Fetch Cycle. For both fetch and execute cycles, the next cycle depends on
the state of the system.
The Fetch Cycle –
At the beginning of the fetch cycle, the address of the next instruction to be
executed is in the Program Counter(PC).
Step 1: The address in the program counter is moved to the memory address
register(MAR), as this is the only register which is connected to address lines of
the system bus.
Step 2: The address in MAR is placed on the address bus; the control unit then
issues a READ command on the control bus, and the result appears on the data
bus and is copied into the memory buffer register (MBR). The program counter
is incremented by one to get ready for the next instruction. (These two actions
can be performed simultaneously to save time.)
Step 3: The content of the MBR is moved to the instruction register(IR).
Thus, a simple fetch cycle consists of three steps and four micro-operations.
Symbolically, we can write this sequence of events as follows:
t1: MAR <- (PC)
t2: MBR <- Memory, PC <- (PC) + 1
t3: IR <- (MBR)
Hardwired control
A hardwired control unit uses a fixed set of logic gates and circuits to execute instructions.
The control signals for each instruction are hardwired into the control unit, so the control
unit has a dedicated circuit for each possible instruction.
Hardwired control units are complex and fast, but they can be inflexible and difficult to
modify.
The control hardware can be viewed as a state machine that changes from one state to
another in every clock cycle, depending on the contents of the instruction register, the
condition codes, and external inputs.
The outputs of the state machine are the control signals.
Memory hierarchy
In the Computer System Design, Memory Hierarchy is an enhancement to organize the
memory such that it can minimize the access time. The Memory Hierarchy was developed
based on a program behavior known as locality of references. The figure below clearly
demonstrates the different levels of the memory hierarchy.
Why Memory Hierarchy is Required in the System?
Memory hierarchy is one of the most important aspects of computer memory, as it helps
optimize the memory available in the computer. There are multiple levels in the hierarchy,
each with a different size, cost, etc. Some types of memory, like cache and main memory,
are faster than others, but they have smaller capacity and are more costly, whereas other
types offer higher storage capacity but are slower. Access speed also differs across memory
types: some have faster access, whereas others have slower access.
Types of Memory Hierarchy
This Memory Hierarchy Design is divided into 2 main types:
External Memory or Secondary Memory: Comprises magnetic disk, optical
disk, and magnetic tape, i.e., peripheral storage devices which are accessible by
the processor via an I/O module.
Internal Memory or Primary Memory: Comprises main memory, cache
memory, and CPU registers. This is directly accessible by the processor.
Memory Hierarchy Design
1. Registers
Registers are small, high-speed memory units located in the CPU. They are used to store the
most frequently used data and instructions. Registers have the fastest access time and the
smallest storage capacity, typically ranging from 16 to 64 bits.
2. Cache Memory
Cache memory is a small, fast memory unit located close to the CPU. It stores frequently
used data and instructions that have been recently accessed from the main memory. Cache
memory is designed to minimize the time it takes to access data by providing the CPU with
quick access to frequently used data.
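The benefit of a cache can be sketched with a toy lookup that checks a small fast store before falling back to main memory; the memory contents and access pattern are invented.

```python
# Minimal sketch of the cache idea: serve repeated accesses from a small
# fast store, and go to (slower) main memory only on a miss.
main_memory = {addr: addr * 2 for addr in range(1024)}   # invented contents
cache = {}
hits = misses = 0

def read(addr):
    global hits, misses
    if addr in cache:            # hit: served from the fast store
        hits += 1
        return cache[addr]
    misses += 1                  # miss: fetch from main memory, then cache it
    cache[addr] = main_memory[addr]
    return cache[addr]

for a in (5, 6, 5, 5):           # repeated accesses to address 5
    read(a)
print(hits, misses)  # 2 2
```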
3. Main Memory
Main memory, also known as RAM (Random Access Memory), is the primary memory of a
computer system. It has a larger storage capacity than cache memory, but it is slower. Main
memory is used to store data and instructions that are currently in use by the CPU.
Types of Main Memory
Static RAM: Static RAM stores the binary information in flip flops and
information remains valid until power is supplied. It has a faster access time and
is used in implementing cache memory.
Dynamic RAM: It stores the binary information as a charge on the capacitor. It
requires refreshing circuitry to maintain the charge on the capacitors after a few
milliseconds. It contains more memory cells per unit area as compared to SRAM.
4. Secondary Storage
Secondary storage, such as hard disk drives (HDD) and solid-state drives (SSD), is a non-
volatile memory unit that has a larger storage capacity than main memory. It is used to store
data and instructions that are not currently in use by the CPU. Secondary storage has the
slowest access time and is typically the least expensive type of memory in the memory
hierarchy.
5. Magnetic Disk
Magnetic disks are circular platters made of metal or plastic and coated with a
magnetizable material. They rotate at high speed inside the drive and are among the most
frequently used storage media.
6. Magnetic Tape
Magnetic tape is a recording medium consisting of a plastic film coated with a magnetic
material. It is generally used for the backup of data. Access to a magnetic tape is
sequential, so its access time is comparatively long: the drive must wind through the tape
to reach the required portion.
Advantages of Random Access Memory (RAM)
Speed: RAM is much faster than other types of storage, such as a hard drive or
solid-state drive, which means that the computer can access the data stored in
RAM more quickly.
Flexibility: RAM is volatile memory, which means that the data stored in it can
be easily modified or deleted. This makes it ideal for storing data that the
computer is currently using or processing.
Capacity: The capacity of RAM can be easily upgraded, which allows the
computer to store more data in memory and thus improve performance.
Power Management: RAM consumes less power compared to hard drives, and
solid-state drives, which makes it an ideal memory for portable devices.
Disadvantages of Random Access Memory (RAM)
Volatility: RAM is volatile memory, which means that the data stored in it is lost
when the power is turned off. This can be a problem for important data that
needs to be preserved, such as unsaved work or files that have not been backed
up.
Capacity: The capacity of RAM is limited, and although it can be upgraded, it may
still not be sufficient for certain applications or tasks that require a lot of memory.
Cost: RAM can be relatively expensive compared to other types of memory, such
as hard drives or solid-state drives, which can make upgrading the memory of a
computer or device more costly.
Advantages of Read Only Memory (ROM)
Non-volatility: ROM is non-volatile memory, which means that the data stored
in it is retained even when the power is turned off. This makes it ideal for storing
data that does not need to be modified, such as the BIOS or firmware for other
hardware devices.
Reliability: Because the data stored in ROM is not easily modified, it is less prone
to corruption or errors than other types of memory.
Power Management: ROM consumes less power compared to other types of
memory, which makes it an ideal memory for portable devices.
Disadvantages of Read Only Memory (ROM)
Limited Flexibility: ROM is read-only memory, which means that the data stored
in it cannot be modified. This can be a problem for applications or firmware that
need to be updated or modified.
Limited Capacity: The capacity of ROM is typically limited, and upgrading it can
be difficult or expensive.
Cost: ROM can be relatively expensive compared to other types of memory, such
as hard drives or solid-state drives, which can make upgrading the memory of a
computer or device more costly.
Virtual Memory
When mapping virtual memory onto physical memory, the OS divides memory into fixed-size
pages, which are kept on disk in page files or swap files. When a page is needed, the OS
copies it from the disk to main memory and translates its virtual addresses into real
(physical) addresses.
However, swapping pages between disk and physical memory is slow, so heavy reliance on
virtual memory generally causes a noticeable reduction in performance. This is one reason
computers with more RAM are considered to have better performance.
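The translation step described above can be sketched in a few lines of Python. This is only an illustrative model, not a real OS mechanism: the page size, the page-table contents, and the `translate` helper are all invented for the example.

```python
PAGE_SIZE = 4096  # assume 4 KiB pages

# Hypothetical page table: virtual page number -> physical frame number.
page_table = {0: 5, 1: 9, 2: 3}

def translate(virtual_addr):
    """Split a virtual address into page number and offset,
    then look up the physical frame that holds the page."""
    vpn, offset = divmod(virtual_addr, PAGE_SIZE)
    if vpn not in page_table:
        raise KeyError(f"page fault: virtual page {vpn} not resident")
    return page_table[vpn] * PAGE_SIZE + offset

print(translate(4100))  # virtual page 1, offset 4 -> frame 9
```

If the required page is not resident, a real OS would handle the resulting page fault by loading the page from disk; the sketch simply raises an error at that point.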
Cache memory
Cache Memory is a special very high-speed memory. The data or contents of the main
memory that are used frequently by CPU are stored in the cache memory so that the
processor can easily access that data in a shorter time. Whenever the CPU needs to access
memory, it first checks the cache memory. If the data is not found in the cache, the CPU
then accesses the main memory.
Cache memory is placed between the CPU and the main memory. The block diagram for a
cache memory can be represented as:
The cache is the fastest component in the memory hierarchy and approaches the speed of
CPU components.
Cache memory is organised as a collection of sets, where each set contains a small fixed
number of blocks.
Cache Performance
When the processor needs to read or write a location in the main memory, it first checks
for a corresponding entry in the cache.
If the processor finds that the memory location is in the cache, a Cache Hit has occurred and
data is read from the cache.
If the processor does not find the memory location in the cache, a cache miss has occurred.
For a cache miss, the cache allocates a new entry and copies in data from the main memory,
then the request is fulfilled from the contents of the cache.
The performance of cache memory is frequently measured in terms of a quantity called the
Hit ratio: the fraction of memory accesses that are found in the cache, i.e. hits / (hits + misses).
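The hit ratio can be demonstrated with a toy cache model. The sketch below assumes a tiny fully-associative cache with FIFO replacement; the function name and sizes are invented for illustration only.

```python
def access_sequence(addresses, cache_size=4):
    """Simulate a tiny fully-associative cache with FIFO replacement
    and report the hit ratio (hits / total accesses)."""
    cache, hits = [], 0
    for addr in addresses:
        if addr in cache:
            hits += 1                 # cache hit: data served from the cache
        else:
            if len(cache) == cache_size:
                cache.pop(0)          # cache miss: evict the oldest block (FIFO)
            cache.append(addr)        # copy the block in from main memory
    return hits / len(addresses)

# Locality of reference: repeated addresses raise the hit ratio.
print(access_sequence([1, 2, 1, 3, 1, 2]))  # 3 hits out of 6 -> 0.5
```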
Interleaved Memory
Interleaved memory is designed to compensate for the relatively slow speed of dynamic
random-access memory (DRAM) or core memory by spreading memory addresses evenly
across memory banks. In this way, contiguous memory reads and writes use each memory
bank, resulting in higher memory throughput due to reduced waiting for memory banks to
become ready for the operations.
It is different from multi-channel memory architectures, primarily as interleaved memory does
not add more channels between the main memory and the memory controller. However,
channel interleaving is also possible, for example, in Freescale i.MX6 processors, which allow
interleaving to be done between two channels. With interleaved memory, memory addresses
are allocated to each memory bank.
Example of Interleaved Memory
It is an abstraction technique that divides memory into many modules such that successive
words in the address space are placed in different modules.
Suppose we have 4 memory banks, each containing 256 bytes, and then the Block Oriented
scheme (no interleaving) will assign virtual addresses 0 to 255 to the first bank and 256 to 511
to the second bank. But in Interleaved memory, virtual address 0 will be with the first bank, 1
with the second memory bank, 2 with the third bank and 3 with the fourth, and then 4 with
the first memory bank again.
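The two address-mapping schemes in this example can be written out explicitly. The helper names below are made-up; the bank count and bank size match the example above (4 banks of 256 bytes each).

```python
NUM_BANKS = 4

def bank_interleaved(addr):
    """Low-order interleaving: consecutive addresses go to consecutive banks."""
    return addr % NUM_BANKS, addr // NUM_BANKS   # (bank, offset within bank)

def bank_block_oriented(addr, bank_size=256):
    """Block-oriented scheme: each bank holds one contiguous address range."""
    return addr // bank_size, addr % bank_size   # (bank, offset within bank)

# Address 4 wraps back to bank 0 under interleaving,
# but stays in bank 0 under the block-oriented scheme.
print(bank_interleaved(4))      # (0, 1)
print(bank_block_oriented(4))   # (0, 4)
```

Because consecutive addresses land in different banks, a sequential read of addresses 0..3 can proceed in all four banks at once, which is exactly the throughput gain described above.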
Chapter – 5
Input-Output Interface
When we type something from our keyboard, the input data is transferred to the computer's
CPU, and the screen displays the output data. But how does our computer's CPU or processors
share information with the external Input-Output devices? Well, we can achieve this with the
input-output Interface.
The Input-output interface allows transferring information between external input/output
devices (i.e., peripherals) and processors. This article will discuss the Input-Output Interface,
its structure, and function.
1. In the above figure, we see that every peripheral device has an interface unit
associated with it.
2. For example, a mouse or keyboard that provides input to the computer is called an
input device, while a printer or monitor that receives output from the computer is
called an output device.
The I/O buses include control lines, address lines, and data lines. In any general computer, the
printer, keyboard, magnetic disk, and display terminal are commonly connected. Every
peripheral unit has an interface associated with it. Every interface decodes the address and
control received from the I/O bus.
The interface interprets the address and control signals received from the processor and
supplies the corresponding signals to the peripheral controller. It also carries out the
transfer of data between the peripheral and the processor and synchronizes the data flow.
The I/O buses are linked to all the peripheral interfaces from the computer processor. The
processor locates a device address on the address line for interaction with a specific device.
Every interface contains an address decoder that monitors the address lines, attached to the
I/O bus.
When an interface recognizes its address, it activates the path between the bus and the
device that it controls. The interfaces of peripherals whose address does not match the
address on the bus remain disabled.
1. The CPU sends a command to the I/O device to initiate the transfer. This command may
specify the transfer direction (i.e., from memory to the I/O device or from the I/O device to
memory), the starting address in memory where the data is to be stored or retrieved, and the
number of bytes to be transferred.
2. The I/O device receives the command and begins the transfer. It transfers the data to or
from the specified memory location, one byte at a time.
3. The CPU repeatedly polls the status register of the I/O device to monitor the progress of
the transfer. The status register contains information about the current state of the I/O
device, including whether the transfer is complete.
4. When the transfer is complete, the I/O device sets a flag in the status register to indicate
that the transfer is done. The CPU reads the status register, and when it sees the flag, it knows
that the transfer is complete.
5. The CPU can resume other tasks after the transfer is finished. In programmed I/O transfer,
the CPU manages the transfer and monitors its progress. This allows for low-level control over
the transfer but also means that the CPU is tied up during the transfer and cannot perform
other tasks. Programmed I/O transfer is typically slower than other transfer modes, such as
direct memory access (DMA), as the CPU must repeatedly poll the status register to monitor
the transfer.
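The polling sequence above can be sketched as follows. The device here is a mock object invented for the illustration; in real hardware the status and data registers are memory-mapped or port-mapped registers, not Python attributes.

```python
import time

class MockDevice:
    """Hypothetical I/O device with a status register and a data register."""
    def __init__(self, data):
        self._data = list(data)
        self.status_ready = True   # device asserts READY while a byte is waiting

    def read_data(self):
        byte = self._data.pop(0)
        self.status_ready = bool(self._data)
        return byte

def programmed_io_read(device, count):
    """CPU-driven transfer: poll the status register, then move one byte."""
    buffer = []
    for _ in range(count):
        while not device.status_ready:   # busy-wait on the status register
            time.sleep(0)                # the CPU is tied up, doing no useful work
        buffer.append(device.read_data())
    return bytes(buffer)

print(programmed_io_read(MockDevice(b"hi"), 2))  # b'hi'
```

The busy-wait loop makes the drawback concrete: the CPU spends its cycles polling instead of executing other tasks, which is why DMA is preferred for bulk transfers.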
Here are a few examples of how programmed I/O transfer might be used in a computer system:
1. Keyboard input: When a user types on the keyboard, the keyboard sends a signal to the
CPU indicating that data is ready for transfer. The CPU uses programmed I/O transfer to
receive the data from the keyboard, one character at a time.
2. Serial communication: When a computer communicates with another device over a serial
connection, the CPU uses programmed I/O transfer to send and receive data. The CPU sends
a command to the serial communication device to initiate the transfer and then waits for the
transfer to complete.
3. Disk I/O: When a computer reads data from a disk or writes data to a disk, the CPU uses
programmed I/O transfer to transfer the data. The CPU sends a command to the disk
controller to initiate the transfer and then waits for the transfer to complete.
4. Display output: When a computer displays data on a screen, the CPU uses programmed
I/O transfer to send the data to the display adapter. The CPU sends a command to the display
adapter to initiate the transfer and then waits for the transfer to complete.
These are just a few examples of how programmed I/O transfer might be used in a computer
system. The specific details of the transfer will vary depending on the I/O device and the
architecture of the computer system.
Advantages of Programmed I/O Transfer
1. Flexibility: Programmed I/O transfer allows for fine-grained control over the transfer. The
CPU can issue commands to the I/O device to control the transfer, which can be useful for
certain applications requiring low-level control.
2. Compatibility: Programmed I/O transfer works with a wide range of I/O devices, making it
a versatile transfer mode.
3. Debugging: As the CPU actively manages the transfer, diagnosing and fixing problems that
may occur during the transfer is easier.
4. Simple Implementation: Programmed I/O transfer can be implemented using simple
programming constructs, such as loops and status register polling.
5. Cost Effective: Programmed I/O transfer does not require specialized hardware, making it
a cost-effective solution for transferring data in certain situations.
These are just a few advantages of programmed I/O transfer. The advantages will
depend on the particular system used and the application's requirements.
Input-Output Processor
The DMA mode of data transfer reduces the CPU’s overhead in handling I/O operations. It
also allows parallelism in CPU and I/O operations. Such parallelism is necessary to avoid the
wastage of valuable CPU time while handling I/O devices whose speeds are much slower as
compared to CPU. The concept of DMA operation can be extended to relieve the CPU
further from getting involved with the execution of I/O operations. This gives rise to the
development of special purpose processors called Input-Output Processor (IOP) or IO
channels.
The Input-Output Processor (IOP) is just like a CPU that handles the details of I/O operations.
It is more equipped with facilities than those available in a typical DMA controller. The IOP
can fetch and execute its own instructions that are specifically designed to characterize I/O
transfers. In addition to the I/O-related tasks, it can perform other processing tasks like
arithmetic, logic, branching, and code translation. The memory unit occupies a central
position and communicates with each processor by means of DMA.
The Input-Output Processor is a specialized processor which loads and stores data in
memory along with the execution of I/O instructions. It acts as an interface between the
system and devices. It involves a sequence of events to execute I/O operations and then
store the results in memory.
Features of an Input-Output Processor
Specialized Hardware: An IOP is equipped with specialized hardware that is
optimized for handling input/output operations. This hardware includes
input/output ports, DMA controllers, and interrupt controllers.
DMA Capability: An IOP has the capability to perform Direct Memory Access
(DMA) operations. DMA allows data to be transferred directly between
peripheral devices and memory without going through the CPU, thereby freeing
up the CPU for other tasks.
Interrupt Handling: An IOP can handle interrupts from peripheral devices and
manage them independently of the CPU. This allows the CPU to focus on
executing application programs while the IOP handles interrupts from peripheral
devices.
Protocol Handling: An IOP can handle communication protocols for different
types of devices such as Ethernet, USB, and SCSI. This allows the IOP to interface
with a wide range of devices without requiring additional software support from
the CPU.
Buffering: An IOP can buffer data between the CPU and peripheral devices. This
allows the IOP to handle large amounts of data without overloading the CPU or
the peripheral devices.
Command Processing: An IOP can process commands from peripheral devices
independently of the CPU. This allows the CPU to focus on executing application
programs while the IOP handles peripheral device commands.
Parallel Processing: An IOP can perform input/output operations in parallel with
the CPU. This allows the system to handle multiple tasks simultaneously and
improve overall system performance.
Applications of I/O Processors
Data Acquisition Systems: I/O processors can be used in data acquisition systems
to acquire and process data from various sensors and input devices. The I/O
processor can handle high-speed data transfer and perform real-time processing
of the acquired data.
Industrial Control Systems: I/O processors can be used in industrial control
systems to interface with various control devices and sensors. The I/O processor
can provide precise timing and control signals, and can also perform local
processing of the input data.
Multimedia Applications: I/O processors can be used in multimedia applications
to handle the input and output of multimedia data, such as audio and video. The
I/O processor can perform real-time processing of multimedia data, including
decoding, encoding, and compression.
Network Communication Systems: I/O processors can be used in network
communication systems to handle the input and output of data packets. The I/O
processor can perform packet routing, filtering, and processing, and can also
perform encryption and decryption of the data.
Storage Systems: I/O processors can be used in storage systems to handle the
input and output of data to and from storage devices. The I/O processor can
handle high-speed data transfer and perform data caching and prefetching
operations.
Input-Output Interface
Introduction
1) Bus Lines
A system bus consists of three functional groups of lines:
Data Bus
Address Bus
Control Bus
Address Lines:
Used to carry the address to memory and IO.
Unidirectional.
The width of the address bus determines the memory capacity of the
system: an n-bit address bus can address 2^n locations.
The content of address lines is also used for addressing I/O ports. The higher-
order bits determine the bus module, and the lower-ordered bits determine
the address of memory locations or I/O ports.
The content of the address lines of the bus determines the source or
destination of the data present on the data bus. The number of address lines
together is referred to as the address bus. The number of address lines in
the address bus determines its width.
Whenever the processor has to read a word from memory, it simply places
the address of the corresponding word on the address line.
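The split between higher-order bits (selecting a bus module) and lower-order bits (selecting a location within it) can be illustrated as follows. The 4/12-bit split on a 16-bit address bus is an arbitrary assumption chosen for the example, not a standard layout.

```python
# Assume a 16-bit address bus: the top 4 bits select the module
# (memory bank or I/O port group), the low 12 bits select a
# location inside it. This split is illustrative only.
MODULE_BITS = 4
OFFSET_BITS = 12

def decode(addr):
    module = addr >> OFFSET_BITS               # higher-order bits
    offset = addr & ((1 << OFFSET_BITS) - 1)   # lower-order bits
    return module, offset

print(decode(0x3A7F))  # module 0x3, offset 0xA7F
```

This is exactly what each interface's address decoder does in hardware: compare the module field against its own assigned number and respond only on a match.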
Data Lines
Each data line is able to transfer only one bit at a time. So the number of
data lines in a data bus determines how many bits it can transfer at a time.
The performance of the system also depends on the width of the data bus.
Used to carry the binary data between the CPU, memory and IO.
Bidirectional.
Based on the width of a data bus we can determine the word length of a CPU.
Based on the word length we can determine the performance of a CPU.
Control Lines:
Used to carry the control signals and timing signals
Control signals indicate the type of operation. The control signal consists
of the command and timing information
Timing Signals are used to synchronize the memory and IO operations with
a CPU clock.
Typical Control Lines may include Memory Read/Write, IO Read/Write, Bus
Request/Grant, etc.
2) Method of Arbitration
Determining who can use the bus at a particular time.
A) Centralized
A single hardware device called the bus controller or arbiter allocates time
on the bus.
The device may be a separate or a part of a processor.
B) Distributed
There is no centralized controller.
Each module contains access control logic and the modules act together.
3) Timing
A) Synchronous Timing
Bus includes a clock line upon which a clock transmits a regular sequence of
alternating 1's and 0's
A single 1-0 transition is referred to as a clock cycle or bus cycle.
All other devices on the bus can read the clock line.
All events start at the beginning of a clock cycle
B) Asynchronous Timing
The occurrence of one event on a bus follows and depends on the occurrence
of a previous event.
Harder to implement and test than synchronous timing.
4) Bus Width
The width of the data bus has an impact on system performance.
The wider the data bus, the greater the number of bits transferred at one time.
The wider the address bus, the greater the range of locations that can be referenced.
5) Data Transfer Type
Read-Modify-Write : A read followed immediately by a write to the same
address.
Read-After-Write : Consisting of a write followed immediately by a read
from the same address (for error checking purposes).
A small computer systems interface (SCSI) is a standard interface for connecting peripheral
devices to a PC. Depending on the standard, generally it can connect up to 16 peripheral
devices using a single bus including one host adapter. SCSI is used to increase performance,
deliver faster data transfer transmission and provide larger expansion for devices such as CD-
ROM drives, scanners, DVD drives and CD writers. SCSI is also frequently used with RAID,
servers, high-performance PCs and storage area networks. SCSI has a controller in charge of
transferring data between the devices and the SCSI bus. It is either embedded on the
motherboard or a host adapter is inserted into an expansion slot on the motherboard. The
controller also contains SCSI basic input/output system, which is a small chip providing the
required software to access and control devices. Each device on a parallel SCSI bus must be
assigned a number between 0 and 7 on a narrow bus or 0 and 15 on a wider bus. This number
is called the SCSI ID. Newer serial SCSI interfaces such as Serial Attached SCSI (SAS) use an
automatic process that assigns a 7-bit number with the use of serial storage architecture initiators.
USB was designed to standardize the connection of peripherals such as pointing devices,
keyboards, and digital still and video cameras. Soon, devices such as printers, portable
media players, disk drives, and network adaptors also used USB to communicate with personal
computers and to receive electric power. It is commonplace on many devices and has largely
replaced interfaces such as serial ports and parallel ports. USB chargers have also largely
replaced other types of battery chargers for portable devices.
USB(Universal Serial Bus)
Universal Serial Bus (USB) is an industry standard that establishes specifications for
connectors, cables, and protocols for communication, connection, and power supply
between personal computers and their peripheral devices. There have been 3 generations
of USB specifications:
USB 1.x
USB 2.0
USB 3.x
The first USB standard was formulated in the mid-1990s. USB 1.1 was announced in 1995 and
released in 1996. It became very popular and dominated the market until about the year
2000. During the USB 1.1 era, Intel announced a USB host controller and Philips announced
USB audio for isochronous communication with consumer electronics devices.
In April 2000, USB 2.0 was announced, with multiple updates and additions. The USB
Implementers Forum (USB-IF) currently maintains the USB standard.
In modern times, all computers contain at least one USB port in different locations. Below, a
list is given that contains USB port locations on the devices that may help you out to find
them.
Laptop computer: A laptop computer may contain one to four ports on the left or right side,
and some laptops also have ports on the rear.
Desktop computer: Usually, a desktop computer has 2 to 4 USB ports in the front and 2 to 8
ports on the backside.
Tablet computer: On a tablet, the USB connection is the charging port; it is usually micro
USB and sometimes USB-C.
Smartphone: In the form of micro USB or USB-C, a USB port is used for both data transfer and
charging, similar to tablets on smartphones.
There are different shapes and sizes available for the USB connector. Also, there are
numerous versions of USB connectors, such as Mini USB, Micro USB, etc.
1. Mini-USB: Mini USB is used with digital cameras and computer peripherals and is
divided into A-type, B-type and AB-type. The most common variant is also known as
mini-B. On the latest devices, Micro-USB and USB-C cables have largely replaced
mini-USB. A mini USB cable transfers both data and power between two devices; it
is used with MP3 players, digital cameras, and mobile hard drives. One end of the
cable is a much smaller quadrilateral connector that plugs easily into mobile
devices, and the other end is a standard flat USB connector. Mini USB can also be
used to transfer data between computers with at least one USB port, but it is
mainly used for charging devices. Its two main advantages are water resistance
and portability.
2. Micro-USB: It is a reduced version of the USB (Universal Serial Bus). It was announced
in 2007 and designed to replace mini-USB and developed for connecting compact and
mobile devices such as digital cameras, smartphones, GPS devices, Mp3 players and
photo printers.
Micro-A, Micro-B and Micro-USB 3 are the three varieties of Micro-USB. The
Micro-A and Micro-B connectors both measure 6.85 x 1.8 mm, although the Micro-A
connector has a larger maximum overmold size. Micro-USB 3 is similar to Micro-B,
but achieves higher speed because it includes an additional group of pins on the
side for twice the wires. Micro versions are hot-swappable and plug-and-play like
standard USB, and micro-USB is still widely used with electronic devices.
3. USB Type-C: Found on most newer Android smartphones and other modern USB-connected
devices, USB Type-C is a relatively new type of connector. It is used for
delivering both data and power to computing devices. Unlike other forms of USB
connectors, USB-C cables are reversible: they can be plugged in either way, even
upside down.
o USB 1.0 is an external bus standard capable of supporting up to 127 peripheral devices
and data transfer rates of up to 12 Mbps.
o In 2001, USB 2.0 was developed by Philips, Lucent, Microsoft, Hewlett-Packard, Intel,
NEC, and Compaq; it is also known as Hi-Speed USB. It supports a transfer rate of up to
480 Mbps (megabits per second), equivalent to 60 megabytes per second.
o In November 2009, USB 3.0, also called SuperSpeed USB, was made available for the
first time by Buffalo Technology, although the first certified devices did not appear
until January 2010. USB 3.0 improved upon USB 2.0 in speed, power management, and
bandwidth capability. It offers two unidirectional data paths, so data can be sent
and received at the same time, and it supports transfer rates of up to 5.0 gigabits
per second (Gbps), or 640 megabytes per second. After the release of USB 3.1, it was
renamed USB 3.1 Gen 1 for marketing purposes. The first certified devices were
designed with motherboards from Gigabyte and ASUS. In April 2011, Dell began to
introduce USB 3.0 ports in its Dell XPS and Inspiron series of computers.
o USB 3.1, also known as SuperSpeed+, is the latest version of the USB protocol covered
here and was released on 31 July 2013. It supports transfer rates of up to 10 Gbps.
Nowadays, the USB 3.0 and 3.1 revisions are used by various devices to improve speed
and performance.
Every version of the USB port can support any version of USB device, as the standard is
both backward and forward compatible. For example, devices designed for USB 1.1 and 2.0
work in a USB 3.0 port. Even though USB 3.0 is capable of higher speed, devices with lower
versions run at their native transfer speed. Likewise, if a USB 3.1 device is connected to
a USB 2.0 port, its maximum transfer rate is limited to that of the 2.0 port.
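The compatibility rule can be summarized in a small sketch: a link runs at the slower of the two ends' maximum rates. The table below uses the nominal signaling rates quoted in this section; the function name is invented for the example.

```python
# Nominal signaling rates in Mbit/s for the USB generations
# discussed above.
USB_RATES = {"1.1": 12, "2.0": 480, "3.0": 5000, "3.1": 10000}

def effective_rate(port_version, device_version):
    """Backward/forward compatibility: the link runs at the
    slower of the two ends' maximum rates."""
    return min(USB_RATES[port_version], USB_RATES[device_version])

print(effective_rate("2.0", "3.1"))  # limited to 480 Mbit/s by the port
```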
Parallel processing
Parallel processing is a computing technique when multiple streams of calculations or data
processing tasks co-occur through numerous central processing units (CPUs) working
concurrently. This article explains how parallel processing works and examples of its
application in real-world use cases.
Pipelining
Pipelining is an arrangement of the hardware elements of the CPU such that its overall
performance is increased. In a pipelined processor, more than one instruction executes
simultaneously. Let us see a real-life example that works on the concept of
pipelined operation. Consider a water bottle packaging plant. Let there be 3 stages that a
bottle should pass through, Inserting the bottle(I), Filling water in the bottle(F), and Sealing
the bottle(S). Let us consider these stages as stage 1, stage 2, and stage 3 respectively. Let
each stage take 1 minute to complete its operation. Now, in a non-pipelined operation, a
bottle is first inserted in the plant, after 1 minute it is moved to stage 2 where water is filled.
Now, in stage 1 nothing is happening. Similarly, when the bottle moves to stage 3, both
stage 1 and stage 2 are idle. But in pipelined operation, when the bottle is in stage 2, another
bottle can be loaded at stage 1. Similarly, when the bottle is in stage 3, there can be one
bottle each in stage 1 and stage 2. So, after each minute, we get a new bottle at the end of
stage 3. Hence, the average time taken to manufacture 1 bottle is:
Without pipelining = 9/3 minutes = 3 minutes per bottle
I F S | | | | | |
| | | I F S | | |
| | | | | | I F S   (9 minutes)
With pipelining = 5/3 minutes ≈ 1.67 minutes per bottle
I F S | |
| I F S |
| | I F S   (5 minutes)
Thus, pipelined operation increases the efficiency of a system.
Design of a basic pipeline
In a pipelined processor, a pipeline has two ends, the input end and the output
end. Between these ends, there are multiple stages/segments such that the
output of one stage is connected to the input of the next stage and each stage
performs a specific operation.
Interface registers are used to hold the intermediate output between two
stages. These interface registers are also called latch or buffer.
All the stages in the pipeline along with the interface registers are controlled by
a common clock.
Execution in a pipelined processor
The execution sequence of instructions in a pipelined processor can be visualized using a
space-time diagram. For example, consider a processor having 4 stages and let there be 2
instructions to be executed. We can visualize the execution sequence through the following
space-time diagrams:
Non-overlapped execution:
Stage / Cycle   1    2    3    4    5    6    7    8
S1              I1                  I2
S2                   I1                  I2
S3                        I1                  I2
S4                             I1                  I2
Total time = 8 cycles
Overlapped (pipelined) execution:
Stage / Cycle   1    2    3    4    5
S1              I1   I2
S2                   I1   I2
S3                        I1   I2
S4                             I1   I2
Total time = 5 cycles
Pipeline Stages
A RISC processor has a 5-stage instruction pipeline to execute all the instructions in the
RISC instruction set. Following are the 5 stages of the RISC pipeline with their respective
operations:
Stage 1 (Instruction Fetch) In this stage the CPU reads instructions from the
address in the memory whose value is present in the program counter.
Stage 2 (Instruction Decode) In this stage, instruction is decoded and the
register file is accessed to get the values from the registers used in the
instruction.
Stage 3 (Instruction Execute) In this stage, ALU operations are performed.
Stage 4 (Memory Access) In this stage, memory operands are read and written
from/to the memory that is present in the instruction.
Stage 5 (Write Back) In this stage, computed/fetched value is written back to
the register present in the instructions.
Performance of a pipelined processor Consider a ‘k’ segment pipeline with clock cycle
time as ‘Tp’. Let there be ‘n’ tasks to be completed in the pipelined processor. Now, the
first instruction is going to take ‘k’ cycles to come out of the pipeline but the other ‘n – 1’
instructions will take only ‘1’ cycle each, i.e., a total of ‘n – 1’ cycles. So, the time taken to
execute ‘n’ instructions in a pipelined processor:
ETpipeline = k + n – 1 cycles
= (k + n – 1) Tp
In the same case, for a non-pipelined processor, the execution time of ‘n’ instructions will
be:
ETnon-pipeline = n * k * Tp
So, speedup (S) of the pipelined processor over the non-pipelined processor, when ‘n’
tasks are executed on the same processor is:
S = Performance of non-pipelined processor /
Performance of pipelined processor
As the performance of a processor is inversely proportional to the execution time, we
have,
S = ETnon-pipeline / ETpipeline
=> S = [n * k * Tp] / [(k + n – 1) * Tp]
S = [n * k] / [k + n – 1]
When the number of tasks ‘n’ is significantly larger than k, that is, n >> k
S = [n * k] / n
S = k
where ‘k’ is the number of stages in the pipeline.
Also, Efficiency = Given speedup / Max speedup = S / Smax. We know that Smax = k, so Efficiency = S / k.
Throughput = Number of instructions / Total time to complete the instructions, so Throughput = n / [(k + n – 1) * Tp].
Note: The cycles per instruction (CPI) value of an ideal pipelined processor is 1.
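The formulas above can be checked numerically. The values of k, n, and Tp below are hypothetical, chosen only to illustrate the calculation:

```python
# Hypothetical pipeline parameters: k stages, n instructions, cycle time Tp (ns).
k, n, Tp = 4, 100, 10

et_pipeline = (k + n - 1) * Tp       # ETpipeline = (k + n - 1) * Tp
et_non_pipeline = n * k * Tp         # ETnon-pipeline = n * k * Tp

speedup = et_non_pipeline / et_pipeline      # S = [n * k] / [k + n - 1]
efficiency = speedup / k                     # S / Smax, where Smax = k
throughput = n / ((k + n - 1) * Tp)          # instructions per ns

print(round(speedup, 2), round(efficiency, 2))
```

With n = 100 much larger than k = 4, the speedup comes out close to k (about 3.88 of a maximum 4), illustrating the n >> k limit S = k.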
The performance of a pipeline is measured using two main metrics: throughput and latency.
Throughput:
It measures the number of instructions completed per unit time.
It represents the overall processing speed of the pipeline.
Higher throughput indicates better performance.
Calculated as: Throughput = Number of instructions executed / Execution time.
It can be affected by pipeline length, clock frequency, efficiency of instruction
execution, and the presence of pipeline hazards or stalls.
Latency:
It measures the time taken for a single instruction to complete its execution.
It represents the delay, or the time it takes for an instruction to pass through the
pipeline stages.
Lower latency indicates better performance.
It is calculated as: Latency = Execution time / Number of instructions executed.
It is influenced by pipeline length and depth, clock cycle time, instruction
dependencies, and pipeline hazards.
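Under these simple definitions the two metrics are reciprocals of each other. A quick check with hypothetical numbers:

```python
# Hypothetical workload: 1000 instructions completed in 250 microseconds.
n_instructions = 1000
execution_time_us = 250.0

# Throughput = instructions executed / execution time
throughput = n_instructions / execution_time_us   # instructions per us

# Latency (average) = execution time / instructions executed
latency = execution_time_us / n_instructions      # us per instruction

print(throughput, latency)  # 4.0 instructions/us, 0.25 us/instruction
```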
Linear Pipelining
A linear pipeline is a pipeline in which a series of processors are connected together in a
serial manner. In a linear pipeline the data flows from the first block to the final block,
and the processing of data is done in a linear and sequential manner. The input is
supplied to the first block and the output is taken from the last block once every stage
has processed the data. Linear pipelines can be further divided into synchronous and
asynchronous models. They are typically used when the data transformation process is
straightforward and can be performed in a single path.
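The serial "output of one block feeds the input of the next" structure can be sketched as a chain of functions. The stage functions here are hypothetical placeholders, chosen only to make the data flow visible:

```python
# A minimal sketch of a linear (serial) pipeline: each stage's output
# becomes the next stage's input. Stage bodies are arbitrary placeholders.
def stage1(x):
    return x + 1          # first block

def stage2(x):
    return x * 2          # middle block

def stage3(x):
    return x - 3          # final block

def linear_pipeline(data, stages):
    # Data flows from the first block to the last, sequentially.
    for stage in stages:
        data = stage(data)
    return data

print(linear_pipeline(5, [stage1, stage2, stage3]))  # (5 + 1) * 2 - 3 = 9
```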
Applications of Multiprocessor –
1. As a uniprocessor, such as single instruction, single data stream (SISD).
2. As a multiprocessor, such as single instruction, multiple data stream (SIMD),
which is usually used for vector processing.
3. Multiple series of instructions in a single perspective, such as multiple
instruction, single data stream (MISD), which is used for describing hyper-
threading or pipelined processors.
4. Inside a single system for executing multiple, individual series of instructions in
multiple perspectives, such as multiple instruction, multiple data stream
(MIMD).
Benefits of using a Multiprocessor –
Enhanced performance.
Multiple applications.
Multi-tasking inside an application.
High throughput and responsiveness.
Hardware sharing among CPUs.
Advantages:
Improved performance: Multiprocessor systems can execute tasks faster than single-
processor systems, as the workload can be distributed across multiple processors.
Better scalability: Multiprocessor systems can be scaled more easily than single-processor
systems, as additional processors can be added to the system to handle increased
workloads.
Increased reliability: Multiprocessor systems can continue to operate even if one
processor fails, as the remaining processors can continue to execute tasks.
Reduced cost: Multiprocessor systems can be more cost-effective than building multiple
single-processor systems to handle the same workload.
Enhanced parallelism: Multiprocessor systems allow for greater parallelism, as different
processors can execute different tasks simultaneously.
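The workload-distribution idea can be illustrated with Python's standard multiprocessing module. This is a minimal sketch assuming a CPU-bound task; the task itself (`square`) is a stand-in:

```python
# A minimal sketch of distributing work across multiple processors.
from multiprocessing import Pool

def square(x):
    # Stand-in for a CPU-bound task that each worker process runs.
    return x * x

if __name__ == "__main__":
    # A pool of 4 worker processes splits the input among the CPUs.
    with Pool(processes=4) as pool:
        results = pool.map(square, range(8))
    print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

The pool transparently handles the distribution of inputs and collection of results; with a genuinely CPU-bound task the wall-clock time drops roughly in proportion to the number of processors, up to synchronization and communication overhead.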
Multicomputer Systems
Advantages:
Improved performance: Multicomputer systems can execute tasks faster than single-
computer systems, as the workload can be distributed across multiple computers.
Better scalability: Multicomputer systems can be scaled more easily than single-computer
systems, as additional computers can be added to the system to handle increased
workloads.
Increased reliability: Multicomputer systems can continue to operate even if one
computer fails, as the remaining computers can continue to execute tasks.
Reduced cost: Multicomputer systems can be more cost-effective than building a single
large computer system to handle the same workload.
Enhanced parallelism: Multicomputer systems allow for greater parallelism, as different
computers can execute different tasks simultaneously.
Disadvantages:
Increased complexity: Multicomputer systems are more complex than single-computer
systems, and they require additional hardware, software, and management resources.
Higher power consumption: Multicomputer systems require more power to operate than
single-computer systems, which can increase the cost of operating and maintaining the
system.
Difficult programming: Developing software that can effectively utilize multiple
computers can be challenging, and it requires specialized programming skills.
Synchronization issues: Multicomputer systems require synchronization between
computers to ensure that tasks are executed correctly and efficiently, which can add
complexity and overhead to the system.
Network latency: Multicomputer systems rely on a network to communicate between
computers, and network latency can impact system performance.
Flynn’s Classification
Parallel computing is computing where the jobs are broken into discrete parts that can be
executed concurrently. Each part is further broken down into a series of instructions.
Instructions from each piece execute simultaneously on different CPUs. The breaking up of
different parts of a task among multiple processors will help to reduce the amount of time
to run a program. Parallel systems deal with the simultaneous use of multiple computer
resources that can include a single computer with multiple processors, a number of
computers connected by a network to form a parallel processing cluster, or a combination
of both. Parallel systems are more difficult to program than computers with a single
processor because the architecture of parallel computers varies accordingly and the
processes of multiple CPUs must be coordinated and synchronized. The difficult problem of
parallel processing is portability.
An Instruction Stream is a sequence of instructions that are read from memory. A Data Stream
is the sequence of data on which those instructions operate in the processor.
Flynn’s taxonomy is a classification scheme for computer architectures proposed by Michael
Flynn in 1966. The taxonomy is based on the number of instruction streams and data
streams that can be processed simultaneously by a computer architecture.
There are four categories in Flynn’s taxonomy:
1. Single Instruction Single Data (SISD): In a SISD architecture, there is a single
processor that executes a single instruction stream and operates on a single data
stream. This is the simplest type of computer architecture and is used in most
traditional computers.
2. Single Instruction Multiple Data (SIMD): In a SIMD architecture, there is a single
processor that executes the same instruction on multiple data streams in parallel.
This type of architecture is used in applications such as image and signal
processing.
3. Multiple Instruction Single Data (MISD): In a MISD architecture, multiple
processors execute different instructions on the same data stream. This type of
architecture is not commonly used in practice, as it is difficult to find applications
that can be decomposed into independent instruction streams.
4. Multiple Instruction Multiple Data (MIMD): In a MIMD architecture, multiple
processors execute different instructions on different data streams. This type of
architecture is used in distributed computing, parallel processing, and other high-
performance computing applications.
Flynn’s taxonomy is a useful tool for understanding different types of computer
architectures and their strengths and weaknesses. The taxonomy highlights the importance
of parallelism in modern computing and shows how different types of parallelism can be
exploited to improve performance.
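The four categories follow mechanically from whether the instruction and data streams are single or multiple. A small helper (my own construction, not from Flynn's paper) makes the mapping explicit:

```python
# Map instruction/data stream multiplicity to Flynn's four categories.
def flynn_category(instruction_streams, data_streams):
    i = "S" if instruction_streams == 1 else "M"
    d = "S" if data_streams == 1 else "M"
    return f"{i}I{d}D"

print(flynn_category(1, 1))   # SISD: conventional uniprocessor
print(flynn_category(1, 4))   # SIMD: e.g. vector/array processing
print(flynn_category(4, 1))   # MISD: rarely used in practice
print(flynn_category(4, 4))   # MIMD: multiprocessors, clusters
```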
Flynn’s classification – systems are classified into four major categories:
1. Single-instruction, single-data (SISD) systems – An SISD computing system is a
uniprocessor machine that is capable of executing a single instruction, operating
on a single data stream. In SISD, machine instructions are processed in a
sequential manner and computers adopting this model are popularly called
sequential computers. Most conventional computers have SISD architecture. All
the instructions and data to be processed have to be stored in primary memory.
The speed of the processing element in the SISD model is limited by (dependent on)
the rate at which the computer can transfer information internally. Dominant
representative SISD systems are IBM PCs and workstations.
2. SIMD architecture: This type of architecture is highly parallel and can offer
significant performance gains for applications that can be parallelized. However,
it requires specialized hardware and software and is not well-suited for
applications that cannot be parallelized.
3. MISD architecture: This type of architecture is rarely used in practice, as it is
difficult to find applications that can be decomposed into different instruction
streams operating on a single data stream.
4. MIMD architecture: This type of architecture is highly parallel and can offer
significant performance gains for applications that can be parallelized. It is well-
suited for distributed computing, parallel processing, and other high-
performance computing applications. However, it requires specialized hardware
and software and can be challenging to program and debug.
Overall, the advantages and disadvantages of different types of computer architectures
depend on the specific application and the level of parallelism that can be exploited. Flynn’s
taxonomy is a useful tool for understanding the different types of computer architectures
and their potential uses, but ultimately the choice of architecture depends on the specific
needs of the application.
Some additional features of Flynn’s taxonomy include: