Foundations of Computer Systems
(CS541)
Arijit Mondal
CS541 1
Syllabus
• Introduction to computer architecture
• Instruction set architecture
• CPU design
CS541 2
Books to be followed
• Computer Organization and Design: The Hardware/Software Interface – David A.
Patterson, John L. Hennessy
• Computer Organization and Architecture – William Stallings
• Computer Architecture: A Quantitative approach – David A. Patterson, John L.
Hennessy
CS541 3
Evaluation policy
• Midsem: 30%
• Endsem: 50%
• Assignments/Quiz: 20%
CS541 4
Introduction
CS541 5
Abstraction of computing systems
Application
Physics
CS541 6
Abstraction of computing systems
Application
Algorithms
Physics
CS541 7
Abstraction of computing systems
Application
Algorithms
Programming language
Physics
CS541 8
Abstraction of computing systems
Application
Algorithms
Programming language
Operating systems
Physics
CS541 9
Abstraction of computing systems
Application
Algorithms
Programming language
Operating systems
Instruction set architecture
Physics
CS541 10
Abstraction of computing systems
Application
Algorithms
Programming language
Operating systems
Instruction set architecture
Microarchitecture
Physics
CS541 11
Abstraction of computing systems
Application
Algorithms
Programming language
Operating systems
Instruction set architecture
Microarchitecture
Register transfer level
Physics
CS541 12
Abstraction of computing systems
Application
Algorithms
Programming language
Operating systems
Instruction set architecture
Microarchitecture
Register transfer level
Gates
Physics
CS541 13
Abstraction of computing systems
Application
Algorithms
Programming language
Operating systems
Instruction set architecture
Microarchitecture
Register transfer level
Gates
Circuits
Physics
CS541 14
Abstraction of computing systems
Application
Algorithms
Programming language
Operating systems
Instruction set architecture
Microarchitecture
Register transfer level
Gates
Circuits
Physics
CS541 15
Abstraction of computing systems
Application
Algorithms • Application Requirements:
Suggest how to improve architecture
Programming language
Provide revenue to fund development
Operating systems
Instruction set architecture
• Architecture provides feedback to guide applica-
Microarchitecture
tion and technology research directions
Register transfer level
Gates • Technology Constraints:
Restrict what can be done efficiently
Circuits
New technologies make new arch possible
Physics
CS541 16
Abstraction
• Abstraction helps us to deal with complexity
Hide lower level details
• Instruction set architecture
Hardware/Software interface
• Application binary interface
ISA plus system software
• Implementation
The details underlying and interface
CS541 17
Architecture vs Microarchitecture
• Architecture / Instruction Set Architecture
Programmer visible state (Memory & Register)
Operations (Instructions and how they work)
Execution Semantics (interrupts)
Input/Output
Data Types/Sizes
• Microarchitecture / Organization
Microarchitecture/Organization: Tradeoffs on how to implement ISA for some metric
(Speed, Energy, Cost)
Examples: Pipeline depth, number of pipelines, cache size, silicon area, peak power,
execution ordering, bus widths, ALU widths
CS541 18
Levels of Program Code
• High level language
g = h * i ;
Easy to code & debug
k = j + i ;
Close to problem domain g = h[1] ;
Provides productivity
CS541 19
Levels of Program Code
• High level language
g = h * i ;
Easy to code & debug
k = j + i ;
Close to problem domain g = h[1] ;
Provides productivity
compiler
CS541 19
Levels of Program Code
• High level language
g = h * i ;
Easy to code & debug
k = j + i ;
Close to problem domain g = h[1] ;
Provides productivity
compiler
• Assembly language MUL R0, R1, R2 ;
Textual representation of instructions ADD R3, R4, R2
LDR R3, [R0,#4]
CS541 19
Levels of Program Code
• High level language
g = h * i ;
Easy to code & debug
k = j + i ;
Close to problem domain g = h[1] ;
Provides productivity
compiler
• Assembly language MUL R0, R1, R2 ;
Textual representation of instructions ADD R3, R4, R2
LDR R3, [R0,#4]
assembler
CS541 19
Levels of Program Code
• High level language
g = h * i ;
Easy to code & debug
k = j + i ;
Close to problem domain g = h[1] ;
Provides productivity
compiler
• Assembly language MUL R0, R1, R2 ;
Textual representation of instructions ADD R3, R4, R2
LDR R3, [R0,#4]
assembler
• Hardware language
0000101101000010101
Binary data 1010101111100101010
Encoded instruction and data 1010101011110000011
CS541 19
Components of a Computer
• Same components for all kind of computers
Server, Desktop, Embedded systems
CS541 20
Components of a Computer
• Same components for all kind of computers
Server, Desktop, Embedded systems
• Input-Output support
CS541 20
Components of a Computer
• Same components for all kind of computers
Server, Desktop, Embedded systems
• Input-Output support
User interface devices – Keyboard, mouse, display
Storage devices – Hard disk, CD/DVD, Flash
Network adapters for communicating with others
CS541 20
Components of a Computer
• Same components for all kind of computers
Server, Desktop, Embedded systems
• Input-Output support
User interface devices – Keyboard, mouse, display
Storage devices – Hard disk, CD/DVD, Flash
Network adapters for communicating with others
• Inside the computer
CS541 20
Components of a Computer
• Same components for all kind of computers
Server, Desktop, Embedded systems
• Input-Output support
User interface devices – Keyboard, mouse, display
Storage devices – Hard disk, CD/DVD, Flash
Network adapters for communicating with others
• Inside the computer
Arithmetic logic unit (ALU)
Program control unit
Memory
Datapath
CS541 20
IAS Computer
Arithmetic Logic
Unit
CS541 21
IAS Computer
Arithmetic Logic
Unit
Program control
Unit
CS541 22
IAS Computer
Arithmetic Logic
Unit
Main
Memory
Program control
Unit
CS541 23
IAS Computer
Arithmetic Logic
Unit
Main Input
Memory Output
Program control
Unit
CS541 24
IAS Computer
Arithmetic Logic
Unit
Main Input
Memory Output
Program control
Unit
CS541 25
Expanded structure of IAS Computer
Arithmatic logic unit (ALU)
AC MQ Input
Output
equipment
Arithmatic logic circuit
MBR
IBR PC
Main
Memory
IR MAR
Control Control
Circuits signals
Program control unit
CS541 26
Top level view of computer
CPU Main memory
Instruction
PC MBR Instruction
IR MAR
IOAR Data
Execution Data
Unit
IOBR Data
I/O Module
Buffers
CS541 27
Basic instruction cycle
Fetch cyle Execute cycle
Fetch next Execute
Start Halt
instruction Instruction
CS541 28
Machine Model
Stack Accumulator Reg−Mem Reg−Reg
Processor
Processor
Processor
Processor
Memory
Memory
Memory
Memory
push A load A load r1, A load r1,A
push B add B add r3,r1, B load r2,B
add store C store r3, C add r3,r2,r1
pop C store r3,c
CS541 29
Understanding Performance
• Algorithms
Determines number of operation executed
• Programing language, compiler, architecture
Determine number of machine instructions is executed per operation
• Processor and memory systems
Determines how fast instructions are executed
• I/O systems
Determines how fast I/O operations are performed
CS541 30
Performance
• Response time
How long it takes to finish a task
• Throughput
Total workdone per unit time (eg. task/transaction/per hour)
• Dependency of response time and throughput
Replacing the processor with a faster version?
Adding more processors?
CS541 31
Relative performance
• Performance is defined as 1/Execution time
• X is n times faster than Y
PerformanceX /PerformanceY = Execution timeY /Execution timeX = n
• Example: Time taken to run a program
10ns in machine X and 15ns in machine Y
Execution timeY /Execution timeX = 15/10 = 1.5
So, X is 1.5 times faster than Y
CS541 32
Measuring performance
• Elapsed time (Wall clock time)
Total time to complete a task including I/O, memory access, disk access, OS overhead,
etc.
• CPU time
The time the CPU spends computing this task
Does not include I/O time, other jobs’ share
Can be further subdivided – user CPU time and system CPU time
• Different programs are affected differently by CPU and system performance
CS541 33
CPU clocking
• Operation is controlled by a constant rate clock
Clock period is duration of clock cycle. (eg. 300ns = 300 × 10−9 s)
Clock frequency is cycles per second. (eg. 4GHz = 4 × 109 Hz)
Clock period = 1/Clock frequency
CS541 34
CPU Time
CPU clock cycle
• CPU time = CPU clock cycles × Clock period =
Clock frequency
• Performance can be improved by
Reducing number of clock cycle
Increasing clock frequency
Hardware designer must trade off clock frequency against cycle count
CS541 35
Example
• Machine A: Run time 10s, Clock speed 2GHz
• Design a new machine (B say)
Run time is 6s
Faster clock require 1.2 times more clock cycles compared to A
• Clock frequency for machine B?
CS541 36
Instruction count and CPI
• Clock cycles = Instruction count × Cycles per instruction
Instruction count × CPI
• CPU time = Instruction count × CPI × Clock period =
Clock frequency
• Instruction count for a program
Depends on ISA, compiler, program
• Average cycles per instruction
Determined by CPU hardware
Different instruction have different CPI
Average CPI is affected by instruction mix
CS541 37
CPI example
• Machine A: Clock period - 250ps, CPI - 2.0
• Machine B: Clock period - 500ps, CPI - 1.2
• Same set of instructions
• Which is faster?
CS541 38
CPI in more detail
• Different instructions take different cycles
X n
• Clock cycles = (CPIi × Instruction counti )
i=1
• Weighted average CPI =
n
Clock cycle X Instruction counti
= CPIi ×
Instruction count Instruction count
i=1
CS541 39
CPI example
Instruction A B C
CPI for instruction 1 2 3
IC in Sequence 1 2 1 2
IC in Sequence 2 4 1 1
• Which code sequence executes the most instructions?
• Compute average CPI for each sequence.
CS541 40
Performance summary
Instructions Clock cycles second
• CPU Time = × ×
Program Instruction Clock cycle
• Performance depends on
CS541 41
Performance summary
Instructions Clock cycles second
• CPU Time = × ×
Program Instruction Clock cycle
• Performance depends on
Algorithm -
CS541 41
Performance summary
Instructions Clock cycles second
• CPU Time = × ×
Program Instruction Clock cycle
• Performance depends on
Algorithm - Affects IC, possibly CPI
CS541 41
Performance summary
Instructions Clock cycles second
• CPU Time = × ×
Program Instruction Clock cycle
• Performance depends on
Algorithm - Affects IC, possibly CPI
Programming language -
CS541 41
Performance summary
Instructions Clock cycles second
• CPU Time = × ×
Program Instruction Clock cycle
• Performance depends on
Algorithm - Affects IC, possibly CPI
Programming language - Affects IC, CPI
CS541 41
Performance summary
Instructions Clock cycles second
• CPU Time = × ×
Program Instruction Clock cycle
• Performance depends on
Algorithm - Affects IC, possibly CPI
Programming language - Affects IC, CPI
Compiler -
CS541 41
Performance summary
Instructions Clock cycles second
• CPU Time = × ×
Program Instruction Clock cycle
• Performance depends on
Algorithm - Affects IC, possibly CPI
Programming language - Affects IC, CPI
Compiler - Affects IC, CPI
CS541 41
Performance summary
Instructions Clock cycles second
• CPU Time = × ×
Program Instruction Clock cycle
• Performance depends on
Algorithm - Affects IC, possibly CPI
Programming language - Affects IC, CPI
Compiler - Affects IC, CPI
Instruction set architecture -
CS541 41
Performance summary
Instructions Clock cycles second
• CPU Time = × ×
Program Instruction Clock cycle
• Performance depends on
Algorithm - Affects IC, possibly CPI
Programming language - Affects IC, CPI
Compiler - Affects IC, CPI
Instruction set architecture - Affects IC, CPI, Clock period
CS541 41
Performance: Power
• Power ∝ Capacitive load × Voltage2 × Frequency
• Suppose a new CPU has the following
85% of capacitive load of old CPU
15% reduction in voltage, 15% reduction in frequency
Pnew 0.85 × Cold × (Vold × 0.85)2 × Fold × 0.85
◦ = = 0.854 = 0.52
Pold Cold × (Vold )2 × Fold
Constraints
◦ Further reduction in voltage may not be possible
◦ Dissipation of heat
CS541 42
MIPS as performance metric
• MIPS: Millions of Instruction Per Second
Does not account for
◦ Differences in ISAs in computers
◦ Differences in complexity between instructions
Instruction count Instruction count
• MIPS = = Instruction count×CPI
Execution time × 106 × 106
Clock frequency
Clock frequency
=
CPI × 106
• CPI varies between programs on a given CPU
CS541 43
Multiprocessors
• Multicore multiprocessors
More than one processor per chip
• Requires explicit parallel programming
Instruction level parallelism
◦ Hardware executes multiple instructions simultaneously
◦ Hidden from programmer
Hard to do
◦ Programming for performance
◦ Load balancing
◦ Optimizing communication and synchronization
CS541 44
Conclusion
• Cost/performance is improving
Due to underlying technology development
• Hierarchical layer of abstraction
In both hardware and software
• Instruction set architecture
The Hardware/Software interface
• Execution time – measure of performance
• Power is a limiting factor
Use parallelism to improve performance
CS541 45