INTRODUCTION TO COMPUTER ARCHITECTURE
BASIC CONCEPTS OF COMPUTER ARCHITECTURE
Computer Architecture is the design of
computers, including their instruction sets,
hardware components, and system organization.
It refers to the understanding of the components
that make up the computer and the way they are
interconnected.
More specifically, architecture refers to the
attributes of the system that are visible to the
programmer, that is, those attributes that have a
direct impact on the execution of a program:
- Instruction Sets
- Data Representation
- Addressing
- I/O
On the other hand, Computer Organization is
the underlying implementation of the architecture
which is transparent to the programmer. An
architecture can have a number of organizational
implementations:
- Control Signals
- Technologies
- Device Implementations
Most computers follow the Von Neumann
Architecture. It is also known as the Stored
Program Architecture or the Fetch-Decode-
Execute Architecture.
A computer follows the Von Neumann
Architecture if it meets the following criteria:
1. It has three basic hardware subsystems:
a CPU, a main memory system, and an
I/O system.
2. It is a stored-program computer.
Programs (together with data) are stored
in main memory during execution.
3. It carries out instructions sequentially.
4. It has, or at least appears to have, a
single path between the main memory
and the control unit of the CPU.
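The criteria above can be illustrated with a minimal sketch of a toy stored-program machine. The opcodes (LOAD, ADD, STORE, HALT), the accumulator register, and the memory layout are hypothetical, chosen only for illustration; they do not describe any real instruction set.

```python
# Toy stored-program machine: instructions and data share one memory.
# The opcodes and the single accumulator are assumptions for illustration.

def run(memory):
    """Fetch, decode, and execute instructions sequentially until HALT."""
    pc = 0    # program counter: address of the next instruction
    acc = 0   # accumulator register inside the CPU
    while True:
        opcode, operand = memory[pc]   # fetch over the single memory path
        pc += 1                        # instructions run sequentially
        if opcode == "LOAD":           # decode, then execute
            acc = memory[operand]
        elif opcode == "ADD":
            acc += memory[operand]
        elif opcode == "STORE":
            memory[operand] = acc
        elif opcode == "HALT":
            return memory
        else:
            raise ValueError(f"unknown opcode: {opcode}")

# Addresses 0-3 hold the program; addresses 4-6 hold the data.
memory = [
    ("LOAD", 4),
    ("ADD", 5),
    ("STORE", 6),
    ("HALT", None),
    2, 3, 0,
]
result = run(memory)
print(result[6])  # 5
```

The program and its data sit in the same list, which stands in for main memory, and every access goes through the same indexing operation, which stands in for the single path between memory and the CPU.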
ARCHITECTURAL CLASSIFICATION SCHEMES
Flynn's Classification of Computers (in terms
of multiplicity of instruction-data streams) is the
most universally accepted method of classifying
computers.
Definitions of Terms:
1. Instruction Stream (IS): a sequence of
instructions as executed by a machine.
2. Data Stream (DS): a sequence of data,
including input, partial, or temporary
results, called for by the instruction
stream.
Both instructions and data are fetched from the
memory units (MU). Instructions are decoded by
the control unit (CU), which sends the decoded
instruction stream to the processor unit (PU) for
execution.
Any computer can be placed in one of four broad
categories:
1. SISD (Single Instruction Stream over a
Single Data stream)
2. SIMD (Single Instruction Stream over
Multiple Data Streams)
3. MIMD (Multiple Instruction Streams over
Multiple Data Streams)
4. MISD (Multiple Instruction Streams over
a Single Data Stream)
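As a small illustration of the four categories, the following sketch maps the number of instruction streams and data streams to the category name. The function name `flynn_category` and its parameters are hypothetical and not part of Flynn's scheme itself.

```python
def flynn_category(instruction_streams, data_streams):
    """Return the Flynn category for the given stream multiplicities."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("a machine needs at least one stream of each kind")
    i = "S" if instruction_streams == 1 else "M"   # single or multiple IS
    d = "S" if data_streams == 1 else "M"          # single or multiple DS
    return f"{i}I{d}D"

print(flynn_category(1, 1))  # SISD
print(flynn_category(1, 8))  # SIMD
print(flynn_category(4, 4))  # MIMD
print(flynn_category(4, 1))  # MISD
```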
SISD (Single Instruction Stream over a Single
Data stream)
An SISD machine is a conventional sequential
machine (Von Neumann). A program executed by
the processor constitutes the single instruction
stream, and the sequence of data items that it
operates on constitutes the single data stream.
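[Figure: SISD organization, showing the control unit (CU), processor unit (PU), memory unit (MU), and I/O connected by the instruction stream (IS) and data stream (DS).]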
Instructions are executed sequentially but may be
overlapped in their execution stages (pipelining).
Most SISD uniprocessor systems are pipelined.
SIMD (Single Instruction Stream over Multiple
Data Streams)
A single stream of instructions is broadcast to a
number of processors. Each processor operates
on its own data. The multiple data streams are
the sequences of data items accessed by the
individual processors in their own memories.
In other words, an SIMD computer has several
processors running the same program in lockstep
but each operating on different sets of data. This
type of processing is also called array
processing.
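The lockstep behavior described above can be sketched as follows. This is a sequential simulation only: real SIMD hardware applies each instruction to all processors at the same time, whereas the inner loop here visits them one by one. The names `simd_run`, `double`, and `add_one` are hypothetical.

```python
def simd_run(program, local_memories):
    """Broadcast each instruction to every processor before moving on."""
    for instruction in program:        # the single instruction stream
        for lm in local_memories:      # every PU runs the same instruction
            instruction(lm)            # on its own local memory
    return local_memories

def double(lm):
    lm["x"] = lm["x"] * 2

def add_one(lm):
    lm["x"] = lm["x"] + 1

# Three processors, each with a different data item in local memory.
memories = [{"x": 1}, {"x": 2}, {"x": 3}]
simd_run([double, add_one], memories)
print([lm["x"] for lm in memories])  # [3, 5, 7]
```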
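[Figure: SIMD organization. A single CU broadcasts the IS to processor units PU1 ... PUn, each exchanging a DS with its own local memory LM1 ... LMn. The program is loaded from the host, and the data sets are loaded from the host.]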
MIMD (Multiple Instruction Streams over
Multiple Data Streams)
These are the parallel computers (multiprocessor
and multiple computer systems). They involve a
number of independent processors, each
executing a different program and accessing its
own sequence of data items (or the same program
and the same data but not in lockstep as in SIMD
machines).
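[Figure: MIMD organization, showing control units CU1 ... CUn, processor units PU1 ... PUn, I/O, and a shared memory connected by instruction streams (IS) and data streams (DS).]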
MISD (Multiple Instruction Streams over a Single
Data Stream)
A common data structure is manipulated by
separate processors, and each executes a
different program.
This organization is also known as a systolic
array, used for pipelined execution of specific
algorithms.
This form of computation does not arise often in
practice.
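[Figure: MISD organization, showing control units CU1 ... CUn, processor units PU1 ... PUn, I/O, and a memory holding program and data, connected by instruction streams (IS) and a data stream (DS).]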
SYSTEM ATTRIBUTES TO PERFORMANCE
The ideal performance of a computer system
demands a perfect match between machine
capability and program behavior.
Machine capability can be enhanced with better
hardware technology, innovative architectural
features, and efficient resource management.
Program behavior is affected by algorithm design,
data structures, language efficiency, programmer
skill, and compiler technology.
The simplest measure of program performance is
the turnaround time: the interval from the time
of submission to the time of completion. It is the
sum of the periods spent on disk and memory
accesses, I/O activities, compilation, OS
overhead, and CPU time. To reduce turnaround
time, one must reduce all of these time factors.
In a multiprogrammed computer, the I/O and
system overheads of a given program may overlap
with the CPU times of other programs. Therefore,
it is fair to compare just the total CPU time
needed for program execution.
The CPU of a modern digital computer is
driven by a clock with a constant clock rate or
clock frequency (f, in megahertz). The inverse of
the clock rate is the period or cycle time
($\tau = 1/f$, in seconds).
The size of the program is determined by its
Instruction Count ($I_c$), in terms of the number of
machine instructions to be executed in the
program.
Different machine instructions may require
different numbers of clock cycles to execute.
Example:
For the Intel microprocessors, the MOV
instruction (register to register) takes 2 cycles
to execute. The MOV instruction (memory to
register) takes 8 cycles to execute, while the
SHR instruction takes 4 cycles to execute.
Therefore, the cycles per instruction (CPI)
becomes an important parameter for measuring
the time needed to execute each instruction.
For a given instruction set, the average CPI over
all instruction types can be computed.
The CPU Time (T in seconds/program) needed to
execute the program is estimated by finding the
product of the three contributing factors:
$$T = I_c \times \text{CPI} \times \tau$$
Example 1:
A 40-MHz processor was used to execute a
program with 50,000 instructions. The average
CPI is estimated to be 3.5 cycles/instruction.
Calculate the total execution time.
Solution:
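$$\tau = \frac{1}{f} = \frac{1}{40 \times 10^{6}} = 25 \text{ ns}$$
$$T = I_c \times \text{CPI} \times \tau = 50000 \times 3.5 \times 25 \times 10^{-9} = 4.375 \text{ ms}$$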
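The calculation in Example 1 can be checked with a short sketch. The function name `cpu_time` and its parameter names are hypothetical; the values are those given in the example.

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time in seconds: T = Ic * CPI * tau, with tau = 1 / f."""
    tau = 1.0 / clock_rate_hz            # cycle time in seconds
    return instruction_count * cpi * tau

# Example 1: 40-MHz processor, 50,000 instructions, average CPI of 3.5.
t = cpu_time(50_000, 3.5, 40e6)
print(f"{t * 1e3:.3f} ms")  # 4.375 ms
```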
Example 2:
A 40-MHz processor was used to execute a
benchmark program with the following
instruction mix and clock cycle counts:
Instruction Type      Instruction Count   Clock Cycle Count
Integer Arithmetic    45,000              1
Data Transfer         32,000              2
Floating Point        15,000              2
Control Transfer      8,000               2
Determine the effective CPI and execution time for
this program.
Solution:
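$$\tau = \frac{1}{f} = \frac{1}{40 \times 10^{6}} = 25 \text{ ns}$$
$$\text{Total Cycles} = (45000 \times 1) + (32000 \times 2) + (15000 \times 2) + (8000 \times 2) = 155{,}000 \text{ cycles}$$
$$\text{CPI} = \frac{\text{Total Number of Cycles}}{\text{Total Number of Instructions}} = \frac{155000}{45000 + 32000 + 15000 + 8000} = \frac{155000}{100000} = 1.55 \text{ cycles/instruction}$$
$$T = I_c \times \text{CPI} \times \tau = 100000 \times 1.55 \times 25 \times 10^{-9} = 3.875 \text{ ms}$$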
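The effective CPI and execution time in Example 2 can be reproduced with a minimal sketch. The dictionary layout and the function names `effective_cpi` and `execution_time` are hypothetical; the counts and cycle values come from the instruction mix table.

```python
# Instruction mix from Example 2: type -> (instruction count, cycles each).
MIX = {
    "integer arithmetic": (45_000, 1),
    "data transfer": (32_000, 2),
    "floating point": (15_000, 2),
    "control transfer": (8_000, 2),
}

def total_cycles(mix):
    """Sum of count * cycles over all instruction types."""
    return sum(count * cycles for count, cycles in mix.values())

def effective_cpi(mix):
    """Total cycles divided by total instructions."""
    total_instructions = sum(count for count, _ in mix.values())
    return total_cycles(mix) / total_instructions

def execution_time(mix, clock_rate_hz):
    """CPU time in seconds: total cycles times the cycle time 1 / f."""
    return total_cycles(mix) / clock_rate_hz

print(total_cycles(MIX))                           # 155000
print(effective_cpi(MIX))                          # 1.55
print(f"{execution_time(MIX, 40e6) * 1e3:.3f} ms")  # 3.875 ms
```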
The execution of an instruction requires going
through a cycle of events involving instruction
fetch, decode, operand(s) fetch, execution, and
store results.
Only the instruction decode and execution phases
are carried out in the CPU. The remaining three
operations may require access to memory.
Memory cycle is defined as the time needed to
complete one memory reference (read or write).
Usually, a memory cycle is k times the processor
cycle $\tau$. The value of k depends on the speed of
the memory technology and processor-memory
interconnection scheme used.
The CPI of an instruction can be divided into two
component terms corresponding to the total
processor cycles and memory cycles needed to
complete the execution of the instruction.
$$T = I_c \times (p + m \times k) \times \tau$$
where:
p is the number of processor cycles
needed for the instruction decode and
execute
m is the number of memory references
needed
k is the ratio between memory cycle
and processor cycle
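The split of CPI into processor cycles and memory cycles can be sketched as below. The function name and the sample values of p, m, and k are hypothetical, chosen only to show how the terms combine; they do not come from the examples in these notes.

```python
def cpu_time_with_memory(instruction_count, p, m, k, clock_rate_hz):
    """CPU time in seconds: T = Ic * (p + m * k) * tau, with tau = 1 / f."""
    cycles_per_instruction = p + m * k   # processor cycles + memory cycles
    return instruction_count * cycles_per_instruction / clock_rate_hz

# Assumed values: 100,000 instructions, p = 2 processor cycles,
# m = 1 memory reference per instruction, k = 4, on a 40-MHz clock.
t = cpu_time_with_memory(100_000, p=2, m=1, k=4, clock_rate_hz=40e6)
print(f"{t * 1e3:.1f} ms")  # 15.0 ms
```

With these assumed values, the effective CPI is 2 + 1 x 4 = 6, so memory accesses account for two thirds of the execution time.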
MIPS Rate
The processor speed is often measured in terms
of million instructions per second (MIPS).
Let C be the total number of clock pulses or
cycles needed to execute a given program.
$$C = I_c \times \text{CPI}$$
$$T = I_c \times \text{CPI} \times \tau = C \times \tau = \frac{C}{f}$$
The equation for the MIPS rate is:
$$\text{MIPS} = \frac{I_c}{T \times 10^{6}}$$
Since $T = I_c \times \text{CPI} \times \tau$ and $\tau = 1/f$, a second
equation for the MIPS rate can be derived as:
$$\text{MIPS} = \frac{f}{\text{CPI} \times 10^{6}}$$
Since $\text{CPI} = C / I_c$, a third equation for the
MIPS rate can be derived as:
$$\text{MIPS} = \frac{f \times I_c}{C \times 10^{6}}$$
Example 3:
A 40-MHz processor was used to execute a
benchmark program with the following
instruction mix and clock cycle counts:
Instruction Type      Instruction Count   Clock Cycle Count
Integer Arithmetic    45,000              1
Data Transfer         32,000              2
Floating Point        15,000              2
Control Transfer      8,000               2
Determine the MIPS rate of the system.
Solution:
From the previous example:
$I_c$ = 100,000 instructions
T = 3.875 ms
$$\text{MIPS} = \frac{I_c}{T \times 10^{6}} = \frac{100000}{3.875 \times 10^{-3} \times 10^{6}} = 25.81 \text{ MIPS}$$
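As a check, the three MIPS equations should agree for the same program. The sketch below evaluates all three with the benchmark values (100,000 instructions, 155,000 cycles, CPI of 1.55, T = 3.875 ms, f = 40 MHz); the function names are hypothetical.

```python
def mips_from_time(instruction_count, cpu_time_s):
    """MIPS = Ic / (T * 10^6)."""
    return instruction_count / (cpu_time_s * 1e6)

def mips_from_cpi(clock_rate_hz, cpi):
    """MIPS = f / (CPI * 10^6)."""
    return clock_rate_hz / (cpi * 1e6)

def mips_from_cycles(clock_rate_hz, instruction_count, total_cycles):
    """MIPS = (f * Ic) / (C * 10^6)."""
    return clock_rate_hz * instruction_count / (total_cycles * 1e6)

a = mips_from_time(100_000, 3.875e-3)
b = mips_from_cpi(40e6, 1.55)
c = mips_from_cycles(40e6, 100_000, 155_000)
print(f"{a:.2f} {b:.2f} {c:.2f}")  # 25.81 25.81 25.81
```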