What is a computer?
A computer is a sophisticated electronic calculating machine that:
Accepts input information,
Processes the information according to a list of internally stored
instructions and Produces the resulting output information.
Functions performed by a computer are:
Accepting information to be processed as input.
Storing a list of instructions to process the information.
Processing the information according to the list of instructions.
Providing the results of the processing as output.
What are the functional units of a computer?
Functional units of a computer
Input unit Arithmetic and logic
accepts unit(ALU):
information: •Performs the
desired operations
•Human operators,
on the input
•Electromechanical information as
devices (keyboard) determined by
•Other computers Output instructions in the
Control memory
Output unit sends
results of processing:
Control unit
•To a monitor display, Stores coordinates various
•To a printer information: actions
•Instructions, •Input,
•Data
•Output
•Processing
Information in a computer -- Instructions
Instructions specify commands to:
Transfer information within a computer (e.g., from memory
to ALU)
Transfer of information between the computer and I/O
devices (e.g., from keyboard to computer, or computer to
printer)
Perform arithmetic and logic operations (e.g., Add two
numbers, Perform a logical AND).
A sequence of instructions to perform a task is called a
program, which is stored in the memory.
Processor fetches instructions that make up a program from
the memory and performs the operations stated in those
instructions.
Information in a computer -- Data
Data are the “operands” upon which instructions operate.
Data could be:
Numbers,
Encoded characters.
Data, in a broad sense means any digital information.
Computers use data that is encoded as a string of binary digits called bits.
Input unit
• Binary information must be presented to a computer in a
specific format. This task is performed by the input unit:
-Interfaces with input devices.
-Accepts binary information from the input devices.
-Presents this binary information in a format expected by the
computer.
-Transfers this information to the memory or processor.
Memory Unit
Memory unit stores instructions and data.
Recall, data is represented as a series of bits.
To store data, memory unit thus stores bits.
Processor reads instructions and reads/writes data from/to the memory
during the execution of a program.
In theory, instructions and data could be fetched one bit at a time.
In practice, a group of bits is fetched at a time.
Group of bits stored or retrieved at a time is termed as “word”
Number of bits in a word is termed as the “word length” of a
computer.
In order to read/write to and from memory, a processor should know
where to look:
“Address” is associated with each word location.
Memory Unit contd.,
Processor reads/writes to/from memory based on the memory address:
Access any word location in a short and fixed amount of time based on the
address.
Random Access Memory (RAM) provides fixed access time independent
of the location of the word.
Access time is known as “Memory Access Time”.
Memory and processor have to “communicate” with each other in order to
read/write information.
In order to reduce “communication time”, a small amount of RAM (known
as Cache) is tightly coupled with the processor.
Modern computers have three to four levels of RAM units with different speeds
and sizes:
Fastest, smallest known as Cache
Slowest, largest known as Main memory.
Memory Unit contd.,
Primary storage of the computer consists of RAM units.
Fastest, smallest unit is Cache.
Slowest, largest unit is Main Memory.
Primary storage is insufficient to store large amounts of data and programs.
Primary storage can be added, but it is expensive.
Store large amounts of data on secondary storage devices:
Magnetic disks and tapes,
Optical disks (CD-ROMS).
Access to the data stored in secondary storage in slower, but take advantage
of the fact that some information may be accessed infrequently.
Cost of a memory unit depends on its access time, lesser access time implies
higher cost.
Arithmetic and Logic Unit (ALU)
Operations are executed in the Arithmetic and Logic Unit (ALU).
Arithmetic operations such as addition, subtraction.
Logic operations such as comparison of numbers.
In order to execute an instruction, operands need to be brought into the ALU
from the memory.
Operands are stored in general purpose registers available in the ALU.
Access times of general purpose registers are faster than the cache.
Results of the operations are stored back in the memory or retained in the
processor for immediate use.
Output Unit
•Computers represent information in a specific binary form. Output units:
- Interface with output devices.
- Accept processed results provided by the computer in specific binary form.
- Convert the information in binary form to a form understood by an
output device.
Control Unit
Operation of a computer can be summarized as:
Accepts information from the input units (Input unit).
Stores the information (Memory).
Processes the information (ALU).
Provides processed results through the output units (Output unit).
Operations of Input unit, Memory, ALU and Output unit are coordinated by
Control unit.
Instructions control “what” operations take place (e.g. data transfer,
processing).
Control unit generates timing signals which determines “when” a
particular operation takes place.
Computer Types
How are the functional units connected?
• For a computer to achieve its operation, the functional units need to
communicate with each other.
•In order to communicate, they need to be connected.
•Functional units may be connected by a group of parallel wires.
•The group of parallel wires is called a bus.
•Each wire in a bus can transfer one bit of information.
• The number of parallel wires in a bus is equal to the word length of a
computer
Connection between the processor and the memory
Registers
In addition to the ALU and the control circuitry, the processor contains
number of registers used for several different purposes.
Instruction register (IR)
It holds the instruction that is currently being executed.
Its output is available to the control circuits which generates the timing
signals that control the various processing elements involved in executing
the instruction.
Program counter (PC)
PC is another specialized register.
It keeps track of the execution of a program. It contains the memory
address of the next instruction to be fetched and executed
Registers Contd.,
• During the execution of an instruction, the contents of the PC updated to
correspond to the address of the next instruction to executed.
• PC points to the next instruction that is to be fetched from the memory.
Two registers facilitate communication with the memory
Memory address register (MAR)
• holds the address of the memory location to be accessed.
Memory data register (MDR)
• MDR contains the data to be written into or read out of the addressed location.
General-purpose register (R0 – Rn-1)
Typical Operating Steps
Programs reside in the memory through input devices
PC is set to point to the first instruction
The contents of PC are transferred to MAR
A Read signal is sent to the memory
The first instruction is read out and loaded into MDR
The contents of MDR are transferred to IR
Decode and execute the instruction
• Get operands for ALU
General-purpose register
Memory (address to MAR – Read – MDR to ALU)
Perform operation in ALU
Store the result back
To general-purpose register
To memory (address to MAR, result to MDR – Write)
During the execution, PC is incremented to the next instruction
Example:
List the steps needed to execute the machine instruction Add LOCA, R0
In terms of transfers between the components and simple control commands. Assume that the
instruction itself is stored in the memory at location INSTR and that this address is initially in
register PC.
Solution:
• Transfer the contents of register PC to register MAR
• Issue a Read command to memory, and then wait until it has transferred the requested word into
register MDR
• Transfer the instruction from MDR into IR and decode it
• Transfer the address LOCA from IR to MAR
• Issue a Read command and wait until MDR is loaded
• Transfer contents of MDR to the ALU
• Transfer contents of R0 to the ALU
• Perform addition of the two operands in the ALU and transfer result into R0
• Transfer contents of PC to ALU
• Add 1 to operand in ALU and transfer incremented address to PC
Assignment Question
List the steps needed to execute the machine instruction
Add R1, R2, R3
In terms of transfers between the components and simple control
commands. Assume that the instruction itself is stored in the
memory at location INSTR and that this address is initially in register
PC.
• Example- Give a short sequence of machine instructions for the task
“Add the contents of memory location A to those of location B, and
place the answer in location c”.
Instructions Load LOC, Ri and Store Ri, LOC
are the only instructions available to transfer data between the
memory and general-purpose register Ri. Do not destroy the contents
of either location A or B.
Solution:
Load A, R0
Load B, R1
Add R0, R1
Store R1, C
Bus Structures(Cont..)
• Bus control lines are used to arbitrate multiple requests
for use of the bus.
• Single bus Structure:
• Low cost and its flexibility for attaching peripherals
devices.
• Multiple Bus:
• Systems that contain multiple buses achieve more
concurrency in operations by allowing two or more
transfers to be carried out at the same time.
• Better performance but at increased cost
Single Bus
Drawbacks of the Single Bus
Structure
The devices connected to a bus vary widely in their speed of operation
Some devices are relatively slow, such as printer and keyboard
Some devices are considerably fast, such as optical disks
Memory and processor units operate are the fastest parts of a computer
Efficient transfer mechanism thus is needed to cope with this problem
A common approach is to include buffer registers with the devices to hold
the information during transfers
An another approach is to use two-bus structure and an additional
transfer mechanism
• A high-performance bus, a low-performance, and a bridge for
transferring the data between the two buses. ARMA Bus belongs to this
structure
Performance
The speed with which a computer executes programs is affected by the
design of its hardware and its machine language instructions
Because programs are usually written in a high-level language, performance
is also affected by the compiler that translates programs into machine
languages
For best performance, the following factors must be considered
Compiler
Instruction set
Hardware design
Performance contd.,
Processor circuits are controlled by a timing signal called a clock
The clock defines regular time intervals, called clock cycles
To execute a machine instruction, the processor divides the action to be
performed into a sequence of basic steps, such that each step can be completed
in one clock cycle
Let the length P of one clock cycle, its inverse is the clock rate, R=1/P
Basic performance equation
T=(NxS)/R, where T is the processor time required to execute a program, N is
the number of instruction executions, and S is the average number of basic
steps needed to execute one machine instruction
Problems-1
1. PROCESSOR CLOCK
• Processor circuits are controlled by a timing signal called a clock.
• Clock defines regular time intervals, called clock cycles.
• To execute a machine instruction, processor divides the action to be performed
into a sequence of basic steps, such that each step can be completed in one clock
cycle.
• Length of one clock cycle is an important parameter that affects processor
performance. It is denoted by P.
• Its inverse is the clock rate, R = 1/P, which is measured in cycles per second. It is
denoted by hertz (Hz).
• Processors used in personal computers and workstations have clock rates that
range from a few hundred million to over a billion cycles per second.
1. PROCESSOR CLOCK
• Term “cycles per second” is called hertz (Hz).
• Term “million” is denoted by prefix Mega (M).
• Term “billion” is denoted by prefix Giga (G).
• Hence, 500 million cycles per second is usually abbreviated to 500 Megahertz
(MHz).
• 1250 million cycles per second is abbreviated to 1.25 Gigahertz (GHz).
2. BASIC PERFORMANCE EQUATION
• Let T be processor time required to execute a program.
• Assume that complete execution of the program requires the execution of N
machine language instructions.
• N is the actual number of instruction executions.
• N is not equal to number of machine instructions in the object program.
• Some instructions may be executed more than once. Ex: program loop.
• Others may not be executed at all.
• Suppose that average number of steps needed to execute one machine instruction
is S, where each basic step is completed in one clock cycle.
• If clock rate is R cycles per second, program execution time is given by
• This is often referred to as the basic performance equation.
2. BASIC PERFORMANCE EQUATION
• To achieve high performance, the computer designer must seek ways to reduce the
value of T, which means reducing N and S, and increasing R.
• Value of N is reduced if the source program is compiled into fewer machine
instructions.
• Value of S is reduced if instructions have a smaller number of basic steps to
perform or if the execution of instructions is overlapped.
• R can be increased using a higher-frequency clock, which means that time
required to complete a basic execution step is reduced.
Problem-1:
A CPU is driven by 2 GHz clock.
a) Compute the duration of one clock cycle.
b) Assume that on average the execution of an instruction takes 4 clock cycles. Compute
the performance of the CPU in terms of MIPS (millions of instructions per second).
c) Assume that executing a specific program of 400 million instructions takes 2 seconds.
How many clock cycles does it take on average to execute an instruction of this
program?
a) Compute the duration of one clock cycle.
A CPU is driven by 2 GHz clock.
CPU processor speed (no. of clock cycles per second) R = 2 GHz
= 2 * 109 Hz
Duration of one clock cycle P = 1/R = 1/ 2 * 109
= 0.5 * 10-9 Seconds
Problem-1:
b) Assume that on average the execution of an instruction takes 4 clock cycles.
Compute the performance of the CPU in terms of MIPS (millions of instructions per
second).
• Average execution of an instruction takes 4 clock cycles.
• Number of clock cycles per instruction (CPI) = 4
• MIPS(Million of Instructions Per Second) = Number of Clock Cycles per Second (CPU) /
Number of Clock Cycles per Instruction (CPI)
• MIPS=CPU/CPI
= 2 * 109/4
= 0.5 * 103 * 106
= 500 * 106
=500 MIPS
Problem-1:
c) Assume that executing a specific program of 400 million instructions takes 2
seconds. How many clock cycles does it take on average to execute an instruction
of this program?
Program of 400 million instructions = 400 *106 instructions
CPI (Cycles/Instruction) = Total Number of Clock Cycles (C) / Total Number
of Instructions (I)
Clock Rate = Total Number of Clock Cycles / Second
Total Number of Clock Cycles = Clock Rate * Second
Total Number of Clock Cycles = 2 GHz * 2 second
= 2 * 109 cycles per second * 2 second
= 4 * 109 cycles
CPI (Cycles/Instruction) = 4 * 109 cycles / 400 *106 instructions
= 10 cycles / instructions
Therefore, 10 cycles are needed for one instruction.
Problem-2:
Consider a 3.2 GHz CPU where executing data processing (arithmetic and logical)
instructions takes 4 clock cycles and executing data transfer (load and store) instructions
takes 10 clock cycles. When a specific program of one million instructions runs, 60% of the
instructions are data processing and 40% of the instructions are data transfer. How long
does it take to run this program to completion?
CPU processor speed (no. of clock cycles per second) R = 3.2 GHz
= 3.2 * 109 Hz
= 3.2 * 109 cycles per second
Number of Clock Cycles per Instruction (CPI) for data processing instructions = 4
Number of Clock Cycles per Instruction (CPI) for data transfer instructions = 10
Program of one million instructions runs = 1 * 106 instructions
CPU_time = Clock cycles for a program / Clock rate
Problem-2:
Total Number of Clock Cycles = CPI (Cycles/Instruction) * Total Number of
Instructions (I)
= (60/100) * 4 * 1 *106 + (40/100) * 10 * 1 * 106
= 0.6 * 4 * 106 + 0.4 * 10 * 106
= 2.4 * 106 + 4 * 106
= 6.4 * 106 cycles
CPU_time = Clock cycles for a program / Clock rate
= 6.4 * 106 cycles / 3.2 * 109 cycles per second
= 2 * 10-3 seconds
= 2 milliseconds
RISC vs CISC
Instructions and Instruction Sequencing
Program execution
• The processor contains a register called PC, which holds the address of the
instruction to be executed next.
• The address of the first instruction to be placed into PC. The processor control
circuits use the information in the PC to fetch and execute instructions, one at
a time, in order of increasing addresses. This is called straight-line sequencing.
• During the execution of each instruction PC is incremented by 4 to point the
next instruction. After the Move instruction at location i+8 is executed, the PC
contains value i+12, which is the address of the first location of the next
program segment.
• Executing a given instruction is a two-phase procedure.
• First phase, called instruction fetch, the instruction is fetched from the
memory location whose address is in the PC. This instruction is placed in IR.
• Second phase, called instruction execute, the instruction IR is examined to
determine which operation is to be performed. The specified operation is then
performed.
Branching
• Consider the task of adding the list of n numbers.
• The address of the memory locations containing the n numbers are given
as NUM1, NUM2,….NUMn and a separate add instruction is used to add
each number to the contents of register R0. After all the numbers have
been added, the result is placed in a memory location SUM.
Continued.,
• The loop is a straight line sequence of instructions executed as many times
as needed. It starts at location LOOP and ends at the instructions
Branch>0.
•During each pass through this loop, the address of the next list entry is
determined and that entry is fetched and added to R0.
•Assume that the number of entries in the list is n, which is stored in the
location N. Register R1 is used as a counter to determine the number of
times the loop is executed. Hence, the contents of location N are loaded
into register R1 at the beginning of the program. The instruction is
Decrement R1 reduces the contents R1 by 1 each time through the loop.
Execution of the loop is repeated as long as the result of the decrement
operation is greater than zero.
Continued.,
• Branch target: the processor fetches and executes the instruction at the
new address, instead of the instruction at the locations that follows the
branch instruction in sequential address order.
• A conditional branch instructions causes a branch only if a specified
condition is satisfied. If the condition is not satisfied, the PC is incremented
in the normal way, and the next instruction in sequential address order is
fetched and executed.
Condition codes
• The processor keeps track of information about the results of various
operations for use by subsequent conditional branch instructions. This is
accomplished by recording the required information in individual bits
called condition code flags.
• These flags are grouped together in a special processor register is called
condition code register or status register.
• N (negative) – set to 1 if the result is negative, otherwise, cleared to 0.
• Z (zero)- set to 1 if the result is 0, otherwise, cleared to 0.
• V (overflow)- set to 1 if arithmetic overflow occurs, otherwise, cleared to
0.
• C (carry)- set to 1 if a carry out results from the operation, otherwise,
cleared to 0.
Basic Input / Output Operations
We have assumed that the data on which these
instructions operate are already stored in the memory.
Now examine the means by which data are
transferred between the memory of a computer and
the outside world
Basic Input/Output Operations
Program-controlled I/O
Consider a task that
• reads in character input from a keyboard and produces character output on a display screen.
program-controlled I/O (method).
rate of data transfer from the keyboard to a computer:
• limited by the typing speed of the user
• to exceed a few characters per second.
Rate of output transfer from computer to display:
much higher.
It is determined by the rate which characters can be transmitted over the link between the computer
and the display device, typically several thousand characters per second.
Basic input/operations (Cont.,)
Still much slower than the speed of a processor that can execute many millions of
instructions per second.
The difference in speed between the processor and I/O devices creates the need for
mechanisms to synchronize the transfer of data between them.
Solution:
On output, the processor sends the first character and then waits for a signal from the
display that the character has been received.
It then sends the second character, and so on.
Input is sent from the keyboard in a similar way; the processor waits for a signal
indicating that a character key has been struck and that its code is available in some
buffer register associated with the keyboard.
Then the processor proceeds to read that code.
Basic input/operations
Consider the problem of moving a character code from the
keyboard to the processor.
Striking a key stores the corresponding character code in an 8-bit
buffer register (DATAIN) associated with the keyboard.
To inform the processor that a valid character is in DATAIN, a
status control flag, SIN, is set to 1.
A program monitors SIN, and when SIN is set to 1, the processor
reads the contents of DATAIN.
When the character is transferred to the processor, SIN is
automatically cleared to 0.
If a second character is entered at the keyboard, SIN is again set
to 1 and the process repeats.
Action of Key on the keyboard
• The action of striking a key on the keyboard does not automatically
cause the corresponding character to be displayed on the screen.
• One block of instructions in the I/O program transfers the character
into the processor, and another associated block of instructions
causes the character to be displayed.
Basic input/operations
When characters are transferred from the processor to the display.
A buffer register, DATAOUT, and a status control flag, SOUT, are used for this
transfer
When SOUT = 1,the display is ready to receive a character
Under program control, the processor monitors SOUT, and when SOUT is set to
1,the processor transfers a character code to DATAOUT.
The transfer of a character to DATAOUT clears SOUT to 0;
When the display device is ready to receive a second character, SOUT is again set
to 1.
The buffer registers DATAIN and DATAOUT and the status flags SIN and SOUT
are part of circuitry commonly known as a device interface.
The processor monitors
the status flag by
executing a short wait
loop and proceeds to
transfer the input data
when SIN is set to 1 as a
result of a key being
struck.
The wait loop is executed repeatedly until the status flag SOUT is set to 1 by
the display when it is free to receive a character. The Output operation
transfers a character from R1 to DATAOUT to be displayed, and it clears
SOUT to 0.
Program to read a line of characters and display
Stacks and Queues
A stack is a list of data elements, usually words or bytes, with the
accessing restriction that elements can be added or removed at one end of
the list only
It is also called a last-in-first-out (LIFO) stack
A stack has two basic operations: push and pop
The terms push and pop are used to describe placing a new item on the stack and
removing the top item from the stack, respectively.
Another useful data structure that is similar to the stack is called a queue
Data are stored in and retrieved from a queue on a first-in-first- out (FIFO) basis
Two pointers are needed to keep track of the two ends of the queue
A Stack of Words in the Memory
Stack pointer register Low address
SP -28 Current top element
17
739
Stack
BOTTOM 43 Bottom element
High address
Push and Pop Operations
Assume that a byte-addressable memory with 32-bit words
The push operation can be implemented as
Subtract #4, SP
Move NEWITEM, (SP)
These two instructions move the word from location NEWITEM onto the
top of the stack, decrementing the stack pointer by 4 before the move.
The pop operation can be implemented as
Move (SP), ITEM
Add #4, SP
These two instructions move the top value from the stack into location
ITEM and then increment the stack pointer by 4 so that it points to the
new top element.
If the processor has the Auto-increment and Auto-decrement
addressing modes, then the push operation can be implemented by the
single instruction
• Move NEWITEM, -(SP)
And the pop operation can be implemented as
• Move (SP)+, ITEM
Examples
SP 19
SP -28 SP -28
17 SP 17
43 43
NEWITEM 19 ITEM -28
Push operation Pop operation
Checking for Empty and Full Errors
When a stack is used in a program, it is usually allocated a fixed amount of
space in the memory
We must avoid pushing an item onto the stack when the stack has reached in its
maximum size, i.e., the stack is full
On the other hand, we must avoid popping an item off the stack when the stack has
reached in its minimum size, i.e., the stack is empty
Routine for a safe pop or a safe push operation
Compare src, dst
Perform [dst]-[src]
Sets the condition code flags according to the result
Subroutines
In a given program, it is often necessary to perform a particular
subtask many times on different data values. Such a subtask is called
a subroutine.
Memory Memory
location Calling program location Subroutine SUB
.
.
.
200 Call SUB 1000 first instruction
204 next instruction .
.
. .
. Return
.
The location where the calling program resumes execution is the
location pointed by the updated PC while the Call instruction is
being executed. Hence the contents of the PC must be saved by
the Call instruction to enable correct return to the calling program
Subroutine Linkage
The way in which a computer makes it possible to call and
return from subroutines is referred to as its subroutine
linkage method
Subroutine linkage using a link register
1000
PC 204
Link 204
Call Return
Subroutine Nesting
A common programming practice, called subroutine nesting, is
to have one subroutine call another
Subroutine nesting call be carried out to any depth.
Eventually, the last subroutine called completes its computations
and returns to the subroutine that called it
The return address needed for this first returns is the last one
generated in the nested call sequence. That is, return addresses
are generated and used in a last-in-first-out order
Many processors do this by using a stack pointer and the stack
pointer points to a stack called the processor stack
Example of Subroutine Nesting
Main Program SUB 1 SUB 2 SUB 3
A C
B
. . . .
. . . .
. . . .
Parameter Passing
When calling a subroutine, a program must provide to the
subroutine the parameters, that is, the operands or their addresses,
to be used in the computation. Later, the subroutine returns other
parameters, in this case, the result of computation
The exchange of information between a calling program and a
subroutine is referred to as parameter passing
Parameter passing approaches
The parameters may be placed in registers or in memory locations, where
they can be accessed by the subroutine
The parameters may be placed on the processor stack used for saving the
return address
Passing Parameters with Registers
passed by value
passing by reference
Calling program
Move N, R1 R1 serves as a counter
Move #NUM1, R2 R2 points to the list
Call LISTADD Call subroutine
M.ove R0, SUM Save result
.
.
Subroutine
LISTADD Clear R0 Initialize sum to 0
LOOP Add (R2)+, R0 Add entry from list
Decrement R1
Branch>0
LOOP Return to calling program
Return
Passing Parameters with Stack
Assume top of stack is at level 1 below.
Move #NUM1, -(SP) Push parameters onto stack
Move N, -(SP)
Call LISTADD Call subroutine
(top of stack at level 2)
Move 4(SP), SUM Save result
Add #8, SP Restore top of stack Level 3 [R2]
. (top of stack at level 1) [R1]
.
. [R0]
LISTADD MoveMultiple R0-R2, -(SP) Save registers Return address
Level 2
(top of stack at level 3)
N
Move 16(SP), R1 Initialize counter to N.
Move 20(SP), R2 Initialize pointer to the list NUM1
Clear R0 Initialize sum to 0 Level 1
LOOP Add (R2)+, R0 Add entry from list
Decrement R1
Branch>0 LOOP
Move R0, 20(SP) Put result on the stack
MoveMultiple (SP)+, R0-R2 Restore registers
Return Return to calling program
Stack Frame
SP (Stack pointer) Saved [R1]
Saved [R0]
-12(FP) localvar3
-8(FP) localvar2
-4(FP) localvar1
FP (Frame pointer) saved [FP] Stack frame
Return address
8(FP) param1
12(FP) param2
16(FP) param3
20(FP) param4