CS6461 Computer Architecture
Fall 2016
Instructor: Morris Lancaster
Adapted from Professor Stephen Kaisler's slides
Lecture 2 - Basic System Design
Hierarchical System Architecture
Technology Trends
Processor
  logic capacity: about 2x increase in performance every 1.5-2 years
  clock rate: about 25% per year
  overall performance: about 1000x in the last decade
Main Memory
  DRAM capacity: 2x every 2 years; about 1000x in size in the last decade
  memory speed: about 10% per year
  cost/bit: improves about 25% per year
Disk
  capacity: more than 2x increase every 1.5 years; about 120x in the last decade
  cost/bit: improves about 60% per year
  disk architecture not much different from IBM's 10 MByte disks of the early 1980s
Network Bandwidth
  1 Gbit/s is standard to the desktop in many places
  probably 1 Tbit/s by the end of the decade, but that may require new infrastructure
Intel Processor Evolution
Processor Clock Speed
Cost Per GFLOP
# Servers Comprising WWW
Technology Progress
Compound annual growth rate chart (summary of the figure):
  Transistors/chip: >100,000x growth since 1971
  Disk density: >100,000,000x growth since 1956
  Disk speed: only about 12.5x growth since 1956
The disk speed barrier dominates everything!
The 1,000,000:1 disk-speed barrier
RAM access times: ~5-7.5 nanoseconds
CPU clock period: <1 nanosecond
Interprocessor communication can be ~1,000x slower than on-chip communication
Disk seek times: ~2.5-3 milliseconds
  The limit is rotation: half a rotation at 15,000 RPM is 1/30,000 of a minute = 1/500 of a second = 2 ms
Tiering brings the ratio closer to ~1,000:1 in practice, but even so the difference is VERY BIG
State of the Art
A state-of-the-art PC (on your desk) now:
  Processor clock speed: ~4 GHz
  Memory capacity: 2 to 8 GBytes (Windows 7 limits this to 8 GBytes; Windows 8 allows up to 128 GBytes on x64)
  Disk capacity: 1 TByte for <$79; 2 TBytes for <$129
Wow!!
In five years, we will need new units!
  Mega -> Giga -> Tera -> Peta -> Exa (Big Data!)
Intel 4004 Die Photo
(2,250 transistors, 12 mm², 108 kHz, 1971)
Intel 80486 Die Photo
(1,200,000 transistors, 81 mm², 25 MHz, 1989)
Pentium Die Photo
(3,100,000 transistors, 296 mm², 60 MHz, 1993)
I/O System Side
Each bus and adapter has its own specifications.
Interfaces are where the problems are: between functional units, and between the computer and the outside world.
We need to design against constraints of performance, power, area, and cost.
Issues
Performance:
  the key to computing for most intensive problems
  what's the secret? TIME, TIME, TIME
  analogy to real estate: Location, Location, Location
Response Time:
  How long does it take for my job/program to run?
  How long does it take to execute my job/program?
  [NOTE: These are not equivalent. Why not?]
  How long must I wait for a database query?
Throughput:
  How many jobs can the machine run at once?
  What is the average execution rate?
  How much work is getting done?
  How long does it take to handle an interrupt?
Execution Times:
  Elapsed Time: counts everything - disk and memory accesses, I/O waits, etc. Sometimes a useful number, but not good for comparison purposes.
  CPU Time: counts instruction execution time but not I/O time; the basis for MIPS/MFLOPS; often divided into system time and user time.
Q: What are MIPS and MFLOPS good measures of, if anything?
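To make the CPU-time metrics concrete, here is a small back-of-the-envelope calculation in Java; the instruction count, CPI, and clock rate are made-up numbers for illustration, not measurements of any real machine. Note that this computes CPU time only, not elapsed time, since I/O waits are not modeled.

    // Hypothetical example: relating instruction count, CPI, and clock rate
    // to CPU time and MIPS. All numbers are invented for illustration.
    public class PerfExample {
        public static void main(String[] args) {
            long instructionCount = 200_000_000L;   // instructions executed
            double cpi = 2.5;                       // average clock cycles per instruction
            double clockRateHz = 2.0e9;             // 2 GHz clock

            double cpuTimeSec = (instructionCount * cpi) / clockRateHz;
            double mips = instructionCount / (cpuTimeSec * 1.0e6);

            System.out.printf("CPU time = %.3f s%n", cpuTimeSec);   // 0.250 s
            System.out.printf("MIPS     = %.1f%n", mips);           // 800.0
        }
    }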
Let's start to design the machine for the CS211 CISC Computer!
State diagram of the basic machine cycle (figure): from Reset, the machine enters an Initialize Machine state, then loops through Fetch Instruction, Execute Instruction (Load/Store, Register-to-Register, or Branch), and Increment PC. A branch taken loads the PC with the branch target; a branch not taken falls through to Increment PC, and the cycle returns to Fetch Instruction.
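As a rough sketch of how this machine cycle might look in a Java simulator (method and field names here are illustrative placeholders, not the required project interface):

    // Sketch of the machine cycle implied by the state diagram above.
    // Helper names are illustrative only; the stubs are placeholders.
    public class MachineCycleSketch {
        private int pc = 0;            // program counter
        private boolean halted = false;

        public void run() {
            initializeMachine();                   // Reset / Initialize Machine state
            while (!halted) {
                int instruction = fetch(pc);       // Fetch Instruction state
                int nextPc = execute(instruction); // Execute: load/store, reg-reg, or branch
                pc = nextPc;                       // Increment PC, or branch target if taken
            }
        }

        private void initializeMachine() { /* clear registers, load boot program */ }
        private int fetch(int address)   { return 0; /* placeholder: read memory[address] */ }
        private int execute(int instr)   { halted = true; return pc + 1; /* placeholder decode/execute */ }
    }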
Analyze LDR/STR Instructions
From our analysis of the LDR/LDA/STR instructions, what do we know? We need:
  Memory Address Register (MAR)
  Memory Buffer Register (MBR)
  Program Counter (PC)
  4 GPRs (given)
  Instruction Register (IR)
  Register Select Register (RSR)
  Instruction Operation Register (Opcode)
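In a Java simulator these registers could be modeled as plain fields, masked to the 18-bit word size; a minimal sketch, with field names chosen here for illustration:

    // Register set implied by the LDR/LDA/STR analysis, modeled as plain fields.
    // Assumes an 18-bit word; writes should be masked to 18 bits as shown.
    public class RegisterFile {
        static final int WORD_MASK = (1 << 18) - 1;   // 18-bit word

        int mar;                  // Memory Address Register
        int mbr;                  // Memory Buffer Register
        int pc;                   // Program Counter
        int[] gpr = new int[4];   // R0..R3
        int ir;                   // Instruction Register
        int rsr;                  // Register Select Register
        int opcode;               // Instruction Operation Register

        void setMbr(int value) { mbr = value & WORD_MASK; }   // example of a masked write
    }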
How do these hook together?
Datapath sketch (figure): Memory connects through the MAR and MBR; the IR holds the OpCode field; index registers X1-X3 and GPRs R0-R3 feed the ALU, which produces the Carry and Condition Codes; the PC supplies instruction addresses; an RFI signal selects registers in the register file.
Open questions on the slide: How many registers do I need to access the register file? (See the Mul/Div instructions.) How do I hook in the index registers?
Execution Structure
Execution structure (figure): registers R0-R3, the IR, the MBR, and the PC feed two multiplexers that supply the ALU inputs, with the Opcode driving the ALU control. Inside the ALU, an Arithmetic Unit (Data1, Data2, Carry), a Logical Unit (Data1, Data2), and a Shifter (Data1, Count) operate in parallel; their outputs go to the ARR, LRR, and SRR respectively, and the selected value becomes the ALU result.
xRR = result registers, which hold the result of an operation for store on the next cycle.
Comments on Multiplexors
Both the arithmetic unit and the logic unit are active and produce outputs.
  The mux determines whether the final result comes from the arithmetic unit or the logic unit.
  The output of the other unit is effectively ignored.
Our hardware scheme may seem like wasted effort, but it's not really.
  Deactivating one or the other wouldn't save that much time.
  We have to build hardware for both units anyway, so we might as well run them together.
This is a very common use of multiplexers in logic design.
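A small Java sketch of the same idea (unit and selector names are illustrative): both units compute on every operand pair, and the "mux" simply chooses which result is kept.

    // Both functional units always produce a result; the multiplexer merely
    // selects which one is forwarded. Names here are illustrative only.
    public class AluMuxSketch {
        enum Select { ARITHMETIC, LOGIC }

        static int arithmeticUnit(int a, int b) { return a + b; }   // e.g., ADD
        static int logicUnit(int a, int b)      { return a & b; }   // e.g., AND

        static int aluResult(int a, int b, Select sel) {
            int arith = arithmeticUnit(a, b);   // computed regardless of sel
            int logic = logicUnit(a, b);        // computed regardless of sel
            return (sel == Select.ARITHMETIC) ? arith : logic;   // the "mux"
        }

        public static void main(String[] args) {
            System.out.println(aluResult(6, 3, Select.ARITHMETIC));   // 9
            System.out.println(aluResult(6, 3, Select.LOGIC));        // 2
        }
    }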
Shifter
A shifter is most useful for arithmetic operations, since shifting is equivalent to multiplication by powers of two.
  Shifting is necessary, for example, during floating-point arithmetic.
The simplest shifter is the shift register, which can shift by one position per clock cycle.
  So the number of shifts equals the number of clock cycles consumed.
A barrel shifter can shift by an arbitrary number of positions in a single cycle, and allows rotations as well.
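A sketch in Java of a one-position-per-cycle shifter on an 18-bit word (the 18-bit width matches the project word size; each loop iteration stands in for one clock cycle):

    // Models a simple shift register: one position per "clock cycle", so a
    // shift by N costs N iterations. Values are kept to 18 bits by masking.
    public class ShifterSketch {
        static final int WORD_MASK = (1 << 18) - 1;

        static int shiftLeftLogical(int value, int count) {
            for (int cycle = 0; cycle < count; cycle++) {
                value = (value << 1) & WORD_MASK;   // one position per cycle
            }
            return value;
        }

        public static void main(String[] args) {
            // Shifting left by 3 multiplies by 2^3 = 8 (until bits fall off the top).
            System.out.println(shiftLeftLogical(5, 3));   // 40
        }
    }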
Adder
The adder is probably the most studied digital circuit.
  There are a great many ways to perform binary addition, each with its own area/delay trade-offs.
  Adder delay is dominated by the carry chain.
Full Adder: computes a one-bit sum and carry:
  si = ai XOR bi XOR ci
  ci+1 = ai·bi + ai·ci + bi·ci
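Chaining these two equations bit by bit gives a ripple-carry adder; a sketch for an 18-bit word in Java, written bit-serially on purpose so the carry chain is visible:

    // Ripple-carry adder built from the full-adder equations above.
    // The carry chain is explicit: bit i+1 cannot be computed until c(i+1) is known.
    public class RippleCarryAdder {
        static final int WIDTH = 18;

        static int add(int a, int b) {
            int sum = 0, carry = 0;
            for (int i = 0; i < WIDTH; i++) {
                int ai = (a >> i) & 1, bi = (b >> i) & 1;
                int si = ai ^ bi ^ carry;                         // si = ai XOR bi XOR ci
                carry = (ai & bi) | (ai & carry) | (bi & carry);  // ci+1 = ai·bi + ai·ci + bi·ci
                sum |= si << i;
            }
            return sum;   // the carry out of bit 17 is discarded here
        }

        public static void main(String[] args) {
            System.out.println(add(100, 23));   // 123
        }
    }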
Instruction Path
Program Counter (PC)
  Keeps track of program execution
  Holds the address of the next instruction to read from memory
  May have an auto-increment feature or use the ALU
Instruction Register (IR)
  Holds the current instruction
  Includes the ALU operation and the address of the operand
  Also holds the target of a jump instruction and immediate operands
Relationship to the Data Path
  The PC may be incremented through the ALU or through a separate adder
  The contents of the IR may also be required as input to the ALU
Questions?
How will you do scalar integer multiply/divide?
  Just use the Java operators, but be sure to do it only on 18 bits.
  Think about using a wrapper class that restricts values to 18 bits?
There is no negate instruction. How will you compute the negative of a number?
Should you use the adder to increment the PC, or just provide a separate adder circuit?
How will you detect overflow/underflow when adding/subtracting?
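One possible way to handle the 18-bit issues, sketched in Java. This assumes a two's-complement representation, which you should confirm against the project specification before relying on it: mask every result to 18 bits, negate as complement-plus-one, and detect signed overflow from the operand and result sign bits.

    // Sketch of 18-bit arithmetic helpers. Assumes two's-complement 18-bit words;
    // adapt if the project uses a different representation.
    public class Arith18 {
        static final int WIDTH = 18;
        static final int MASK = (1 << WIDTH) - 1;   // 0x3FFFF
        static final int SIGN = 1 << (WIDTH - 1);   // bit 17 is the sign bit

        static boolean overflow;   // set by add(): signed overflow on the last addition

        // Negate via complement-plus-one, since there is no NEG instruction.
        static int negate(int x) { return (~x + 1) & MASK; }

        // Add two 18-bit values; signed overflow occurs when both operands have the
        // same sign and the result's sign differs from theirs.
        static int add(int a, int b) {
            int r = (a + b) & MASK;
            overflow = ((a & SIGN) == (b & SIGN)) && ((r & SIGN) != (a & SIGN));
            return r;
        }

        public static void main(String[] args) {
            int maxPositive = SIGN - 1;       // 0x1FFFF, largest positive 18-bit value
            add(maxPositive, 1);
            System.out.println(overflow);     // true: positive + positive wrapped negative
            System.out.println(negate(1));    // 262143 (0x3FFFF), i.e. -1 in 18 bits
        }
    }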
Simple Procedure Calls
Using a procedure involves the following sequence of actions:
1. Put arguments in places known to the procedure (registers)
2. Transfer control to the procedure, saving the return address (JSR)
3. Acquire storage space, if required, for use by the procedure
4. Perform the desired task
5. Put results in places known to the calling program (registers or elsewhere)
6. Return control to the calling point (RFS)
Example: Finding the absolute value of an integer
jsr abs ; assume integer in r0
. ; instruction after subroutine call
abs
str r0,0,<tempInt> ; store r0 in <tempInt>, some location
ldr r1,0,smask ; mask for sign bit = 100 000 000 000 000 000
and r1,r0 ; AND r1 and r0: if r0 bit is set it will be set in r1
jz r1,0,pos ; test if sign = 0, e.g., r0 bit 0 is 0
src r0,1,1,1 ; shift r0 logical left 1 bit
src r0,1,0,1 ; shift r0 logical right sets sign bit to 0
pos
rfs 1 ; return with 1 => true and r0 has absolute integer
Soooo!
Convoluted?? Yes!
Why??
1. There are no jump-on-less-than or jump-on-greater-than instructions!
2. Did we really need them, or were they a matter of convenience?
   E.g., how many instructions did we save by not having them?
3. Implicit use of r3