Understanding Computer Architecture Basics
Understanding Computer Architecture Basics
Page-10
project. lower-level details are hidden to offer a simpler
Computer architects must model at higher levels.
anticipate this rapid iii) Make the
change. common case
ii) Use Abstraction to fast
Simplify Design Making the common case fast will tend to
Both computer architects enhance performance better than
and programmers had to optimizing the rare case.
invent techniques to The common case is often simpler than the rare
make themselves more case and it is often easier to enhance
productive.
Common case fast is only possible with careful
A major productivity experimentation and measurement.
technique for hardware
and soft ware is to use iv) Performance via parallelism
abstractions to represent Computer architects have offered designs that get
the design at different more performance by performing operations in
levels of representation; parallel.
Page-11
v) Performance via individuals with buckets running back and
pipelining forth.
Pipelining is an vi) Performance
implementation via prediction
technique in which The next great idea is prediction.
multiple instructions
are overlapped in In some cases it can be faster on average to
execution. Pipelining guess and start working rather than wait until
improves performance you know for sure.
by increasing
This mechanism to recover from a
instruction
misprediction is not too expensive
throughput.
the prediction is relatively accurate.
For example, before fire
engines, a human chain vii) Hierarchy of memories
can carry a water Programmers want memory to be fast, large, and
source to fire much cheap.
more quickly than
Page-12
Memory speed often
shapes performance
Capacity limits the size of
problems that can be
solved.
The cost of memory today
is often the majority of
computer cost.
Page-13
Architects have found that they these conflicting
demands with a hierarchy of memories the fastest,
smallest, and most expensive memory per bit is at
the top of the hierarchy the slowest, largest, and
cheapest
per bit is at the bottom.
Caches give the programmer the illusion that
main memory is nearly as fast as the top of
the hierarchy and nearly as big and cheap as
the bottom of the hierarchy.
After being sliced from the silicon ingot, blank wafers are put through 20 to 40 steps to
create patterned wafers.
These patterned wafers are then tested with a wafer tester, and a map of the good
parts is made. Then, the wafers are diced into dies. The good dies are then bonded into
packages and tested one more time before shipping the packaged parts to customers.
Cost of an IC is found from
Cost per die= (cost per wafer) / ((dies per wafer)*yield) Yield refers the
fraction of dies that pass testing.
Dies / wafer= wafer area / die area
Yield=1 / (1 + (defects per area * die area)/2 )2
Programmable Logic Device (PLD)
A programmable logic device (PLD) is an electronic component used to build
reconfigurable digital circuits. Unlike a logic gate, which has a fixed function, a
PLD has an undefined function at the time of manufacture. Before the PLD can
be used in a circuit it must be programmed, that is, reconfigured.
The major limitations of PLD:
Consume space due to large number of switches for programmability
Low speed due to the presence of many switches.
Programmable Logic Device
Custom chips
An Application-Specific Integrated Circuit (ASIC) is an integrated circuit (IC) customized
for a particular use, rather than intended for general-purpose use. Application-Specific
Standard Products (ASSPs) are intermediate between ASICs and industry standard
integrated circuits.
Topic
Performance
The most important measure of a computer is how quickly it can execute programs.
Performance of computer depends on many things such as modern software
system and wide range of performance improvement techniques in hardware side.
In order to get the best performance it is required to design the compiler, machine
instruction set and hardware in a coordinated manner.
Performance Metrics
Response time
The time between the start and the completion of a task (in time units).
It is defined as the total time required for the computers to complete a task, including
disk access, memory accesses, I/O activities, operating system over head, CPU
execution time and so on.
Throughput
The total amount of tasks done in a given time period (in number of tasks per unit of time)
iii) Performance
No. of events occurring per unit of time
Response time, throughput and performance are all closely related to each other.
Page-19
To maximize performance, we want to minimize response time
Execution time is the reciprocal of performance.
Lower execution time implies higher performance
Performance = 1/ Execution time
Consider two computers X & Y, if the performance of X is greater than the
performance of Y, then we have
PerformanceX > PerformanceY
1 > 1
Execution timeX Execution timeY
Example:
If computer A runs a program in 10 seconds and computer B runs the same program in 15
seconds, how much faster is A than B?
Solution
Execution time of A = 10 sec
Execution time of B = 15 sec
A is n times faster than B if
Page-22
User CPU time
The CPU time spent in a program itself.
System CPU time
The CPU time spent in the operating system performing tasks on behalf of the program.
CPU time (or CPU Execution time) is the time between the start and the end
of execution of a given program.
CPU time is a true measure of processor/memory performance
CPU time depends on the program which is executed, including
i) The number of instructions executed,
ii) Types of instructions executed and their frequency of usage
(OR)
1 Hz = 1 X
To run the program in 6 seconds, B must have twice the clock rate of A.
Instruction Performance
The compiler clearly generated instructions to execute
The execution time must depend on the number of instructions in a program.
Execution time is equals the number of instructions executed multiplied by the
average time per instruction.
The number of clock cycles required for a program can be written as
600 x I ps
= 1.2
500 x I ps
Conclude that computer A is 1.2 times as fast as computer B for this program.
Comparing Code Segments
A compiler designer is trying to decide between two code sequences for a particular computer. The
hardware designers have supplied the following facts:
For a particular high-level language statement, the compiler writer is considering two code
sequences that require the following instruction counts:
Which code sequence executes the most instructions? Which will be faster? What is the CPI for each
sequence?
Solution
Given CPI for A, B and C
Instruction Count for sequence 1 & 2
Sequence 1 executes = 2 + 1 + 2 = 5 instructions.
Sequence 2 executes = 4 + 1 + 1 = 6 instructions.
Therefore, sequence 1 executes fewer
instructions Calculate CPU clock cycles for each
sequence
CPU clock cycles1 = (CA x CPIA ) + (CB x CPIB ) + (CC x CPIC)
= (2 x 1) + (1 x 2) + (2 x 3)
=2+2+6
CPU clock cycles1= 10 cycles
CPU clock cycles2 = (CA x CPIA ) + (CB x CPIB ) + (CC x CPIC)
= (4 x 1) + (1 x 2) + (1 x 3)
=4+2+3
CPU clock cycles2 = 9 cycles
So code sequence 2 is faster, even though it executes one extra instruction
The CPI values can be computed by
Topic
When an instruction with two operands uses immediate addressing, the first operand may
be a register or memory location and the second operand is an immediate constant.
Example
ADD r2,r0,#5
ii) Register Addressing
In this addressing mode, the operand is a register.
Depending upon the instruction, the register may be the first operand, second operand
or both.
Example
ADD r2, r0, r1
iii) Direct Addressing Mode
The direct addressing mode is also known as Absolute Addressing mode.
The instruction contains the address of the location in memory where the value of the
operand is stored.
Here, the effective address is the address of memory location.
EA = A
Example
Add R2, A
Store R2, B
The Add instruction includes the memory location A which has the value to be added to the
content of register R2
iv) Index Addressing Mode
Index addressing mode is helpful when the instructions in the program are accessing
the array or the large range of memory addresses.
In this mode, the effective address is generated by adding a constant to the register’s
content. The content of the register does not change.
The effective address is denoted by
EA = X + (R)
Example
Load R2, A
Load R3, (R2)
Load R4, 4(R2)
Load R5, 8(R2)
Load R6, 12(R2)
The above instructions will load the register
R3, R4, R5, R6 with the contents, present at
the successive memory addresses from
memory location A correspondingly.
v) Register Indirect Addressing mwowdwe.
A processor register is used to hold the address of a memory location where the operand is placed.
This addressing mode allows executing the same set of instructions for the different memory
location.
In higher-level language, it is referred to as pointers. The indirect mode is denoted by placing the
register inside the parenthesis. Here the effective address is the content of memory location
present in the register.
The effective address is the content of
memory location present in the register.
EA=(R)
Example
Load R3, (R2) // Load R2, A
The Load instruction above will load the
value present at the memory location
contained by register R2 in the register
R3.
a* b 1000
vi) 1000 addressing10
PC-relative
Page-41
The branch address is the sum of the PC and a constant in the instruction
Page-42
viii) Auto Decrement Addressing
It is just opposite of auto-increment mode.
In auto decrement mode the content of the register is decremented initially and then
the decremented content of the register is used as effective address.
Symbolically it is presented as:
-(R)
The auto-increment and decrement mode help to implement the stack structure
ix) Base or displacement addressing
The operand is at the memory location whose address is the sum of a register and a
constant in the instruction.
A very powerful mode of addressing combines the capabilities of direct addressing and
register indirect addressing
EA = A + (R)
Topic
In MIPS, data must be in registers to perform arithmetic, register $zero always equals 0, and
register $at is reserved by the assembler to handle large constants.
Accessed only by data transfer instructions. MIPS uses byte addresses, so sequential word
addresses differ by 4. Memory holds data structures, arrays, and spilled registers.
Design Principles
Hardware technology has three design principles that will give the detailed information
about fixed and variable number of operands to perform any operation.
Design Principle 1: Simplicity favors regularity
Consider the following examples to understand the concept.
Example 1: Compiling Two C Assignment Statements into MIPS
This segment of a C program contains the five variables a, b, c, d, and e.
a = b + c;
d = a – e;
The translation from C to MIPS assembly language instructions is performed by the
compiler.
Solution A MIPS instruction operates on two source operands
and places the result in one destination operand.
Hence, the two simple statements above compile
directly into these two MIPS assembly language
instructions.
a
d
d
a
,
b
,
s
Page-50
ub d, a, e t complex statement
Exampl contains the five variables
e 2: f, g, h, i, and j: f = (g + h) – (i
Compili + j);
ng a What might a C compiler produce?
Comple Solution
xC The compiler must break this statement into several
Assignm assembly instructions, since only one operation is
ent into performed per MIPS instruction. The first MIPS
MIPS instruction calculates the sum of g and h.
add t0, g, h #
s temporary
o variable t0
m contains g + h add
e t1, i, j #
w temporary
h variable t1
a contains i + j sub f,
Page-51
t
0
,
t
1
g
e
t
s
t
0
Page-52
Page-53
Operands of the Computer Hardware
Operand is a variable used to perform any kind of operations. Usage of operation is varied between
one language into another language.
The operands of arithmetic instructions are restricted. They must be from a limited number of
special locations built directly in hardware called registers.
Registers are primitives used in hardware design that are also visible to the programmer when the
computer is completed.
The size of a register in the MIPS architecture is 32 bits; groups of 32 bits are called word.
Word
A word is a unit of data of a defined bit length that can be addressed and moved between storage
and the computer processor.
The natural unit of access in a computer, usually a group of 32 bits; corresponds to the size of
a register in the MIPS architecture.
The reason for the limit of 32 registers
Design Principle 2: Smaller is faster
Registers are the fastest place to hold data in a computer. A very large number of registers may
increase the clock cycle time simply because it takes electronic signals longer to travel it.
Example 1: Compiling a C Assignment
Consider the assignment statement
f = (g + h) – (i + j);
The variables f, g, h, i, and j are assigned to the registers $s0, $s1, $s2, $s3, and $s4, respectively.
What is the compiled MIPS code?
Solution
The compiled program is very similar to the prior example, except replace the variables with the
register names mentioned above plus two temporary registers, $t0 and $t1, which correspond
to the temporary variables above:
add $t0,$s1,$s2 # register $t0 contains g + h
add $t1,$s3,$s4 # register $t1 contains i + j
sub $s0,$t0,$t1 # f gets $t0 – $t1, which is (g + h)–(i + j)
Memory Operands
Programming languages have simple variables that contain single data elements.
But some times variables are having more complex data structures like arrays and structures.
These complex data structures can contain large amount of data elements than the registers
in a computer.
The processor can keep only a small amount of data in registers, but computer memory contains
billions of data elements. Hence, data structures (arrays and structures) are kept in memory.
Arithmetic operations occur only on registers in MIPS instructions; thus, MIPS must include
instructions that transfer data between memory and registers.
Data transfer instruction
A command that moves data between memory and registers.
Address
To access a word in memory, the instruction must supply the memory address.
A value used to describe the location of a specific data element within a memory array.
Memory
Memory is just a large, single-dimensional array, with the address acting as the index to that array,
starting at 0.
The address of the third data element is 2, and the value of Memory [2] is 10
The data transfer instruction that copies data from memory to a register is traditionally called load.
The format of the load instruction is the name of the operation followed by the register to
be loaded, then a constant and register used to access memory.
The actual MIPS name for this instruction is lw, standing for load word
Example 1: Compiling an Assignment When an Operand Is in Memory
Assume that A is an array of 100 words and that the compiler has associated the variables g and h
with the registers $s1 and $s2 as before.
Let’s also assume that the starting address, or base address, of the array is in $s3. Compile this C
assignment statement.
g = h + A[8];
Solution
There is a single operation in this assignment statement, one of the operands is in memory, so we
must first transfer A[8] to a register.
The address of this array element is the sum of the base of the array A, found in register $s3, plus
the number to select element 8.
The data should be placed in a temporary register for use in the next instruction. The first
compiled instruction is
The instruction must add h (contained in $s2) to A[8] (contained in $t0) and put the sum
in the register corresponding to g (associated with $s1):
The constant in a data transfer instruction (8) is called the off set, and the register added to
form the address ($s3) is called the base register.
In MIPS, words must start at addresses that are multiples of 4. This requirement is
called an alignment restriction, and many architectures have it.
Byte addressing also affects the array index. To get the proper byte address in the code
above, the off set to be added to the base register $s3 must be 4 8, or 32
The instruction complementary to load is traditionally called store; it copies data from a register to
memory.
The format of a store is similar to that of a load: the name of the operation, followed by the
register to be stored, then offset to select the array element, and finally the base register.
The actual MIPS name is sw, standing for store word.
Example: Compiling Using Load and Store
Assume variable h is associated with register $s2 and the base address of the array A is in $s3. What is
the MIPS assembly code for the C assignment statement below?
A[12] = h + A[8];
Solution
lw $t0, 32($s3) # Temporary reg $t0 gets A[8]
add $t0,$s2,$t0 # Temporary reg $t0 gets h + A[8]
sw $t0,48($s3) # Stores h + A[8] back into A[12]
Load word and store word are the instructions that copy words between memory and registers in the
MIPS architecture
Spilling registers
Many programs have more variables than computers
have registers. Consequently, the compiler tries to
keep the most frequently used variables in registers
and places the rest in memory, using loads and stores
to move variables between registers and memory. The
process of putting less commonly used variables (or
those needed later) into memory is called spilling
registers.
Constant or Immediate Operands
Many times a program will use a constant in an
operation—for example, incrementing an index to
point to the next element of an array. In fact, more
than half of the MIPS arithmetic instructions have a
constant as an operand.
Quick add instruction with one constant operand is
called add immediate or addi.
To add 4 to
register $s3, we
just write
addi
$s3,$s3,
4
$s3 + 4
Topic
Representing
instructions
Introduction
An instruction is an order given to a computer processor by a computer program.
Each instruction is a sequence of 0’s and 1’s that describes a physical operation
Instructions are kept in the computer as a series of high and low electronic signals
and may be represented as numbers.
Each piece of an instruction can be considered as an individual number, and
placing these numbers side by side forms the instruction.
Instruction Format
A form of representation of an instruction composed of fields of binary numbers.
Used to represent instructions specified by particular
language. Machine Language
All MIPS instructions are 32 bits long so we need to focus some numeric version of
instruction called machine language.
It is a binary representation used for communication within a computer system.
Instructions used in machine language are called machine code.
Hexadecimal Numbers
All computer data sizes are multiplies of 4 in that hexadecimal numbers are popular.
Base value of hexadecimal number is 16 and it is the power of 2
MIPS Fields
Design Principle 3: Good design demands good compromises
Different kinds of instruction formats for different kinds of instructions.
MIPS fields have two kinds of format such as
i) R - type or R – format (registers)
It is data processing (DP) instruction format
The first and last fields (containing 0 and 32 in this case) in combination tell the MIPS
computer that this instruction performs addition.
The second field gives the number of the register that is the first source operand of the addition
operation (17 $s1).
The third field gives the other source operand for the addition (18 $s2).
The fourth field contains the number of the register that is to receive the sum (8 $t0).
The fifth field is unused in this instruction, so it is set to 0.
In MIPS assembly language,
Registers $s0 to $s7 map onto registers 16 to 23, and registers $t0 to $t7 map onto registers 8
to 15. Hence, $s0 means register 16, $s1 means register 17, $s2 means register 18 and so on.
$t0 means register 8, $t1 means register 9, and so on.
This instruction can also be numbers as opposed to decimal:
The 16-bit address means a load word instruction can load any word within a region of
±215 or 32,768 bytes of the address in the base register rs.
Similarly, add immediate is limited to constants no larger than ± 215.
8 19 8 32
In a load word instruction, the rt field specifies the destination register, which receives the
result of the load.
The numbers used in each field for the MIPS instructions
Logical
Operations
Introduction
Some operation can be executed in computer using single bit of information is called as
logical operation.
Logical operation is an instruction in which the quantity being operated on bit and the
results of the operation can each have two values ( 0 & 1)
Example
Register $t2 contains
0000 0000 0000 0000 0000 1101 1100 0000two
Then, after executing the MIPS instruction
mvn $t0, $t2 # reg
$t0 = ~ reg $t2
Page-76
16 = 144ten
0000 0000 0000 ii) Logical
0000 0000 0000 Shift Right(
0010 0100 LSL)
shift Moves all the bits in a word to the right side and filling the
0000 0000 emptied bits with 0’s
0000 0000 For example, if register $s0 contained
0000 0000 0000 0000 0000 0000 0000 0000 0000 1000two = 8ten
0100 1000 Shifting right by i bits gives the same The instruction to
-> 3rd Shift shift right by 2 was executed re
i
by 2
0000 0000
0000 0000 0000 0000 0000 0000 0000 1000 Given
0000 0000 i=2 (shift)
0000 0000 0000 0000 0000 0000 0000 0000 0000 0100 -> 1st shift
1001 0000 2i = 22 = 2
4th Shift x2=4
The new value would be: 0000 0000 0000 0000 0000 0000 0000 0010 -> 2nd shift
0000 0000 0000 0000 8/4 =2
0000 0000 1001 0000two The new value would be:
Page-77
0000 0000 0000 0000 reg $t2 = reg $s0 << 4 bits
Page-78
Topic
Control
Operations
Introduction
The main aspects that distinguishes a computer from a
simple calculator is its ability to make
decisions.
Based on the input data and the values created during
computation, different instructions are executed.
Decision making is commonly represented in
programming languages using the if statement,
sometimes combined with go to statements and labels.
MIPS assembly language includes many decision-
making instructions, similar to an if statement with a go
to.
Conditional Branches
An instruction that requires the comparison of two
values and that allows for a subsequent
transfers of control to a new address in the program based
on the outcome of the comparison.
In MIPS assembly language, beq and bne are traditionally
called as conditional branches.
Page-66
beq instruction means go to the statement labeled L1 if the
beq stands value in register1 equals the value in register2
for branch if bne
equal. bne stands for branch
The format of if not equal
these The format is
instruction is bne
be register1,
q register2,
re L1
gis It means go to the statement labeled L1 if the value in
ter register1 does not equal the value in register2.
1, Example: Compiling if-then-else into Conditional Branches
re In the following code segment, f, g, h, i, and j are variables. If
gis the five variables f through j correspond to the five registers
ter $s0 through $s4, what is the compiled MIPS code for this C if
2, statement?
L1 if (i == j) f = g + h; else f = g – h;
This
Page-67
Solu ps
tion
jum
p
Loo
Page-68
Decisions are important both for choosing between two
b Els # go to Else if i ≠
alternatives—found in if statements—and for iterating a
add computation—found in loops.
n $s0,$s1,$s2
Looping statements are used to execute the same task
e more than one time until certain condition
(skipped if i gets failed.
≠ j) For looping also we need to perform some decisions to
$
j Exit obtain the task.
s Exit Example: Compiling a while Loop in C
Uncondition
3 al branch Here is a traditional loop in C:
w
, called as h
i
$ Else: sub
l
$s0,$s1,$s2
s e
4 (skipped if i (
= j) s
, Exit: a
Page-69
v correspond to registers $s3 and $s5 and the base of the array
e save is in $s6. What is the MIPS assembly code corresponding
[ to this C segment?
i S
] o
l
=
u
=
t
k i
) o
n
i The first step is to load save[i] into a temporary register
Multiply the index i by 4 due to the byte addressing
+ problem.
= S
1 hifti
; ng
Assume that left
i and k by 2
Page-70
b 2 or
i 4
t Loop
s : sll
$t1,$
m s3,2 #
u
l Tem
t p reg
i $t1 =
p i*4
l add $t1,$t1,$s6 # $t1 = address of save[i] // To get the
i address of save[i], we need to
e add $t1 and the base
s of save in $s6:
b lw $t0,0($t1) # Temp reg $t0 = save[i] // use that
y address to load save[i] into a
temporary register
2 bne $t0,$s5, Exit # go to Exit if save[i] ≠ k //
Page-71
instruction ic
performs the b
loop test l
addi o
$s3,$s3,1 #i= c
i+1 // k
instruction A sequence of instructions without branches (except possibly
adds 1 to i: at the end) and without branch targets
j Loop # go to or branch labels (except possibly at the beginning).
Loop // Comparison Instruction
The end of The test for equality or inequality is probably the most
the loop popular test, but sometimes it is useful to see if
branches a variable is less than another variable.
back to the MIPS compilers uses the
while test at following to create all
loop relative conditions slt –
Exit: set less than
B a sle – set less than equal
s slti – set less than immediate
Page-72
sgt – set t
greater than h
s a
g n
e
e
– q
u
s a
e l
t
s
g n
r e
e
a -
t
e s
r e
t
Page-73
n $s3, $s4 # $t0 = 1 if $s3 < $s4
o Means that register $t0 is set to 1 if the value in register
t $s3 is less than the value in register $s4; otherwise, register
$t0 is set to 0.
e slti $t0,$s2,10 # $t0 = 1 if $s2 < 10
q To test if register $s2 is less than the constant 10
u
Case/Switc
a
h
l
Statement
E Most programming languages have a case or switch
x statement that allows the programmer to select one of many
a alternatives depending on a single value.
m switch statement can be implemented in two ways.
p
l i) Using a chain of if-then-else statements
e ii) Using jump address table
slt
$t Jump table or Jump address table
0, It is a table of addresses of alternative instruction
Page-74
sequence program loads the appropriate entry from the jump table
s into a register. It then needs to jump
The using the address in the register.
jump To support such situations, MIPS include a jump register
table is instruction (jr), meaning an unconditional jump to the
then just address specified in a register. Then it jumps to the
an array proper address using this instruction.
of words Addressing in
containi Branches and Jumps
ng The MIPS jump instructions have the simplest addressing.
address They use the final MIPS instruction format, called the J-type,
es that which consists of 6 bits for the operation field and the rest of
corresp the bits for the address field.
ond to
labels in
the
code.
The
Page-75
E
x
a j 10000 #
mgo to
p location
l 10000
e
bne
$s0,$s1,Exit
# go to Exit
if $s0 ≠ $s1
Page-76