
01 - Introduction PDF

This document provides an introduction to performance and machine basics in three levels of abstraction: higher level, lower level, and lowest level. It discusses different levels of computer usage and typical tasks at each level. It also covers levels of program code from application software to hardware and why high-level languages are used. Key concepts around performance include execution time, CPU time, clock cycles, and instructions per cycle. Optimization aims to reduce execution time by improving these factors. Amdahl's law models how optimization of a portion of code affects overall performance. Different number representations like decimal and binary are also introduced.


Introduction / Performance / Machine Basics
CS270 Unit 1
Max Luttrell, Spring 2017
"Welcome to the machine"
- Pink Floyd, 1975
levels of computer usage
level | example user | typical task
higher level | grandma | "can you use your phone to check traffic for me?"
 | dad | "use a smartphone to check traffic"
 | application programmer | "write a smartphone app in java which provides traffic predictions"
lower level | cs270 student | "write MIPS assembly language code to sort an array of integers"
levels of program code
• Application software
  • written in a high level language (e.g. C/C++, Java, Swift, etc.)
• Systems software
  • compiler - translate C++ code into target processor's machine code
  • operating system
    • handle input/output (I/O)
    • manage memory and storage
    • schedule tasks
    • share resources
• Hardware
  • processor, memory, I/O controllers
compiler: translates high-level language into assembly language
assembler: translates assembly language into binary machine language
why use a high level language?
• closer to natural language / algebra
  • can be tailored to its typical usage, e.g. Fortran for scientific computation, COBOL for business applications
• improves programmer productivity
  • fewer lines = fewer opportunities for error, easier to debug
  • closer to natural language than assembly, easier to understand/debug
• platform independent
  • the same C program can be compiled for different target architectures, e.g. MIPS, ARM, Intel x86
performance
Which airplane has the best performance?
[Figure: four bar charts comparing the Boeing 777, Boeing 747, BAC/Sud Concorde, and Douglas DC-8-50 by passenger capacity, cruising range (miles), cruising speed (mph), and passengers x mph]

what is high performance?
scenario | which is best performance
run same program on two different desktop computers | whichever one gets the program done first (best execution time)
datacenter with multiple computers running jobs submitted by many users | whichever one completes the most jobs in a given day (best throughput)
relative performance
• define performance = 1 / execution time
• X is n times faster than Y

$$\frac{\text{performance}_X}{\text{performance}_Y} = \frac{\text{execution time}_Y}{\text{execution time}_X} = n$$

• Example: execution time to run a program
  • 10s on computer A, 15s on computer B
  • execution time_B / execution time_A = 15s / 10s = 1.5
  • Thus, A is 1.5 times faster than B
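
The ratio above can be checked with a few lines of Python. This is a minimal sketch; the function name `relative_performance` is an illustrative choice, not something from the slides.

```python
def relative_performance(time_x: float, time_y: float) -> float:
    """Return n such that X is n times faster than Y.

    Since performance = 1 / execution time, the performance ratio
    performance_X / performance_Y equals time_Y / time_X.
    """
    return time_y / time_x


# Slide example: 10 s on computer A, 15 s on computer B.
n = relative_performance(time_x=10.0, time_y=15.0)
print(f"A is {n} times faster than B")  # prints "A is 1.5 times faster than B"
```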

measuring execution time
• total response time
  • more than just computation, also includes I/O, OS time, idle time
• CPU time for a given job
  • user CPU time plus OS (system) CPU time
  • discounts I/O time and idle time

CPU clocks
• operation of digital hardware is governed by a constant-rate clock
• Clock period = duration of one clock cycle
  A. e.g. 0.1 s
  B. e.g. 250 ps = 0.25 ns = $250 \times 10^{-12}$ s
• Clock frequency = clock cycles per second = 1 / clock period
  A. e.g. 10 Hz
  B. e.g. 4.0 GHz = 4000 MHz = $4.0 \times 10^{9}$ Hz
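
A short sketch of the period/frequency relationship, using the two examples above. The helper names `frequency_hz` and `period_s` are assumptions made for illustration.

```python
def frequency_hz(period_seconds: float) -> float:
    """Clock frequency is the reciprocal of the clock period."""
    return 1.0 / period_seconds


def period_s(frequency_hertz: float) -> float:
    """Clock period is the reciprocal of the clock frequency."""
    return 1.0 / frequency_hertz


# Example A: a 0.1 s period corresponds to a 10 Hz clock.
print(frequency_hz(0.1))
# Example B: a 250 ps period corresponds to a 4 GHz clock.
print(frequency_hz(250e-12) / 1e9, "GHz")
```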
CPU time
• CPU Time = CPU Clock Cycles * Clock Cycle Period = CPU Clock Cycles / Clock Rate
• Performance improved by:
  • reduce number of clock cycles needed
  • increase clock rate
  • HW designer must often trade off clock rate against cycle count
CPU time example
• Computer A: 2GHz clock, 10s CPU time for our program
• Designing computer B
  • Aiming for 6s CPU time for our program
  • We can do a faster clock for computer B, but this will cause us to need 1.2x the clock cycles we needed on computer A
• How fast must computer B's clock be?

$$\text{Clock Rate}_B = \frac{\text{Clock Cycles}_B}{\text{CPU Time}_B} = \frac{1.2 \times \text{Clock Cycles}_A}{6\,\text{s}}$$

$$\text{Clock Cycles}_A = \text{CPU Time}_A \times \text{Clock Rate}_A = 10\,\text{s} \times 2\,\text{GHz} = 20 \times 10^{9}$$

$$\text{Clock Rate}_B = \frac{1.2 \times 20 \times 10^{9}}{6\,\text{s}} = \frac{24 \times 10^{9}}{6\,\text{s}} = 4\,\text{GHz}$$
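
The same calculation as a small Python sketch. The function and parameter names are hypothetical; the numbers are the ones from the example.

```python
def required_clock_rate(cpu_time_a: float, clock_rate_a: float,
                        cycle_factor: float, target_time_b: float) -> float:
    """Clock rate B needs in order to hit target_time_b.

    cycle_factor is how many times more clock cycles B needs than A
    for the same program (1.2 in the example).
    """
    cycles_a = cpu_time_a * clock_rate_a   # Clock Cycles = CPU Time x Clock Rate
    cycles_b = cycle_factor * cycles_a
    return cycles_b / target_time_b        # Clock Rate = Clock Cycles / CPU Time


rate_b = required_clock_rate(cpu_time_a=10.0, clock_rate_a=2e9,
                             cycle_factor=1.2, target_time_b=6.0)
print(f"{rate_b / 1e9:.1f} GHz")  # prints "4.0 GHz"
```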
instruction count / CPI

$$\text{Clock Cycles} = \text{Instruction Count} \times \text{Cycles per Instruction}$$

$$\text{CPU Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time} = \frac{\text{Instruction Count} \times \text{CPI}}{\text{Clock Rate}}$$

• instruction count
  • determined by program, Instruction Set Architecture (ISA), and compiler
• average cycles per instruction (CPI)
  • determined by CPU hardware
  • different instructions can have different CPI
  • average CPI affected by instruction mix

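CPI example
• Computer A: Cycle Time = 250ps, CPI = 2.0
• Computer B: Cycle Time = 500ps, CPI = 1.2
• Both computers have the same Instruction Set Architecture (ISA)
• For a given program (with a fixed instruction count I), which computer is faster?

$$\text{CPU Time}_A = \text{Instruction Count} \times \text{CPI}_A \times \text{Cycle Time}_A = I \times 2.0 \times 250\,\text{ps} = I \times 500\,\text{ps}$$

$$\text{CPU Time}_B = \text{Instruction Count} \times \text{CPI}_B \times \text{Cycle Time}_B = I \times 1.2 \times 500\,\text{ps} = I \times 600\,\text{ps}$$

$$\frac{\text{CPU Time}_B}{\text{CPU Time}_A} = \frac{I \times 600\,\text{ps}}{I \times 500\,\text{ps}} = 1.2$$

• A is faster, by a factor of 1.2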
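
A sketch of the comparison in Python. The instruction count cancels in the ratio, so an arbitrary value is used for it; `cpu_time_ps` is an illustrative name.

```python
def cpu_time_ps(instruction_count: int, cpi: float, cycle_time_ps: float) -> float:
    """CPU Time = Instruction Count x CPI x Cycle Time (in picoseconds)."""
    return instruction_count * cpi * cycle_time_ps


I = 1_000_000  # arbitrary instruction count; any positive value gives the same ratio
time_a = cpu_time_ps(I, cpi=2.0, cycle_time_ps=250)
time_b = cpu_time_ps(I, cpi=1.2, cycle_time_ps=500)
print(f"B takes {time_b / time_a:.1f}x as long as A")  # prints "B takes 1.2x as long as A"
```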
another CPI example
• we have two compiled code sequences using instructions in classes A, B, C, each with different CPIs

class | A | B | C
CPI for class | 1 | 2 | 3
IC in code sequence 1 | 2 | 1 | 2
IC in code sequence 2 | 4 | 1 | 1

Sequence 1: IC = 5
  Clock Cycles = 2×1 + 1×2 + 2×3 = 10
  Avg. CPI = 10/5 = 2.0
Sequence 2: IC = 6
  Clock Cycles = 4×1 + 1×2 + 1×3 = 9
  Avg. CPI = 9/6 = 1.5
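
The weighted-average calculation can be sketched as follows. The dictionary layout and the name `summarize` are assumptions for illustration only.

```python
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}


def summarize(instruction_counts: dict[str, int]) -> tuple[int, int, float]:
    """Return (instruction count, clock cycles, average CPI) for one code sequence."""
    ic = sum(instruction_counts.values())
    cycles = sum(count * CPI_BY_CLASS[cls] for cls, count in instruction_counts.items())
    return ic, cycles, cycles / ic


print(summarize({"A": 2, "B": 1, "C": 2}))  # prints (5, 10, 2.0)
print(summarize({"A": 4, "B": 1, "C": 1}))  # prints (6, 9, 1.5)
```

Sequence 2 executes more instructions but needs fewer clock cycles, because its mix leans on the cheapest instruction class.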
performance optimization
• common case
  • optimize what's used most commonly during program execution
  • typically, this is a small fraction of overall code

Amdahl's law

$$T_{\text{improved}} = \frac{T_{\text{affected}}}{\text{improvement factor}} + T_{\text{unaffected}}$$

• Given: a program runs in 100 seconds on our computer. 80 seconds of this time is spent doing multiplication. 20 seconds is spent doing other things.
• Assume we can get our hardware designer to speed up multiplication on our computer.
• Question: what improvement factor for multiplication do we need if we want our program to run:
  A. two times faster?
  B. five times faster?
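
One way to work the question is to solve the formula for the improvement factor. The sketch below does that; the function names are hypothetical, and returning `None` for an unreachable target is a design choice of this example.

```python
from typing import Optional


def improved_time(t_affected: float, t_unaffected: float, factor: float) -> float:
    """Amdahl's law: T_improved = T_affected / factor + T_unaffected."""
    return t_affected / factor + t_unaffected


def required_factor(t_affected: float, t_unaffected: float,
                    target_time: float) -> Optional[float]:
    """Improvement factor needed to reach target_time, or None if unreachable."""
    budget = target_time - t_unaffected  # time left over for the affected part
    if budget <= 0:
        return None  # even an infinitely fast multiplier cannot reach the target
    return t_affected / budget


# A. two times faster: target is 100 s / 2 = 50 s, so 80 / (50 - 20) = 2.67
print(required_factor(80, 20, target_time=50))
# B. five times faster: target is 100 s / 5 = 20 s, which the unaffected
#    20 s already uses up, so no finite factor is enough
print(required_factor(80, 20, target_time=20))  # prints None
```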

performance optimization
optimization type | example
parallel | running two errands: I run one errand, you run the other errand
pipeline | laundry: move first load to dryer, immediately start second load in washer
speculative execution | travel: research trip itinerary before you're sure you will go
use several memory levels | cache: cache memory faster than main memory
number representation
• decimal - base 10 - humans
  • each digit represents a power of 10
  • e.g. 1979 decimal = 1*1000 + 9*100 + 7*10 + 9*1
• binary - base 2 - computers
  • each digit represents a power of 2
  • e.g. 1011 binary = 1*8 + 0*4 + 1*2 + 1*1 = 11 decimal

number representation
base (b) | position values (exponential) | position values (decimal) | example value in base b | expansion using decimal position values | value in decimal
10 | 10^3 10^2 10 1 | 1000 100 10 1 | 1234 | 1*1000 + 2*100 + 3*10 + 4*1 | 1234
2 | 2^3 2^2 2 1 | 8 4 2 1 | 1011 | 1*8 + 0*4 + 1*2 + 1*1 | 11
4 | 4^3 4^2 4 1 | 64 16 4 1 | 3203 | 3*64 + 2*16 + 0*4 + 3*1 | 227
8 | 8^3 8^2 8 1 | 512 64 8 1 | 2715 | 2*512 + 7*64 + 1*8 + 5*1 | 1485
16 | 16^3 16^2 16 1 | 4096 256 16 1 | 2AC8 | 2*4096 + 10*256 + 12*16 + 8*1 | 10952

base 16: hexadecimal (aka hex) uses digits 0-9, then A-F
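
The positional expansions in the table can be reproduced with a short loop. This is a sketch; `positional_value` is an illustrative name, and it leans on Python's built-in `int(digit, base)` to map a single digit character (including A-F) to its value.

```python
def positional_value(digits: str, base: int) -> int:
    """Evaluate a digit string in the given base, most significant digit first."""
    total = 0
    for digit in digits:
        # Shift everything seen so far up one position, then add the new digit.
        total = total * base + int(digit, base)
    return total


for text, base in [("1234", 10), ("1011", 2), ("3203", 4), ("2715", 8), ("2AC8", 16)]:
    print(f"{text} in base {base} = {positional_value(text, base)} decimal")
```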
base conversion
• e.g. convert octal 437 to hexadecimal
• idea: convert octal to decimal, then decimal to hex
  • 437 octal = 4*64 + 3*8 + 7 = 287 decimal
  • 287 decimal contains 256 (16*16). 287-256 = 31
  • 31 contains 16. 31-16 = 15
  • answer: 11F hex, commonly written 0x11F
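
A sketch of the octal → decimal → hex route in Python. Note that it uses repeated division by the base to peel off digits, a standard alternative to the subtract-the-largest-power steps shown above; `to_base` is a hypothetical helper name.

```python
DIGITS = "0123456789ABCDEF"


def to_base(value: int, base: int) -> str:
    """Convert a non-negative integer to a digit string in the given base (2..16)."""
    if value == 0:
        return "0"
    out = []
    while value > 0:
        value, remainder = divmod(value, base)  # remainder is the next lowest digit
        out.append(DIGITS[remainder])
    return "".join(reversed(out))


decimal = int("437", 8)       # octal to decimal
print(decimal)                # prints 287
print(to_base(decimal, 16))   # prints 11F
```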

base conversion
• an easier way to convert base for powers of 2 is to express in binary first. for our example:

octal: 4 3 7
binary (3 bits per octal digit): 100 011 111

binary (regrouped 4 bits at a time from the right): 1 0001 1111
hex: 1 1 F
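
The regrouping trick can be sketched directly on bit strings. The function name `octal_to_hex_via_binary` is an assumption for illustration.

```python
def octal_to_hex_via_binary(octal: str) -> str:
    """Convert octal to hex by expanding to bits and regrouping, with no decimal step."""
    # Each octal digit becomes exactly 3 bits.
    bits = "".join(format(int(digit, 8), "03b") for digit in octal)
    # Pad on the left so the length is a multiple of 4, then read 4 bits per hex digit.
    bits = bits.zfill((len(bits) + 3) // 4 * 4)
    hex_digits = [format(int(bits[i:i + 4], 2), "X") for i in range(0, len(bits), 4)]
    return "".join(hex_digits).lstrip("0") or "0"


print(octal_to_hex_via_binary("437"))  # prints 11F
```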
signed integers
• signed integer can be negative
• unsigned integer cannot be negative
• sign/magnitude: use the first bit to indicate sign. 0-pos; 1-neg. then use the rest of the bits for magnitude.
  • example: we have 4 bit numbers. 6 is 0110. -6 is 1110.

two's complement
• instead of sign-magnitude, modern computers use two's complement
• idea:
  • first bit still indicates sign, 0-pos and 1-neg
  • to change sign, flip every bit and then add 1 (two's complement operation)
• example: convert 8 bit number for positive 23 to negative -23

00010111 (23) → flip → 11101000 → +1 → 11101001 (-23)
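
The flip-and-add-1 operation, sketched on bit strings so each step stays visible. `twos_complement_negate` is an illustrative name.

```python
def twos_complement_negate(bits: str) -> str:
    """Change the sign of a two's complement bit string: flip every bit, then add 1."""
    width = len(bits)
    flipped = "".join("1" if b == "0" else "0" for b in bits)
    # Add 1 and wrap around to stay within the original width.
    result = (int(flipped, 2) + 1) % (1 << width)
    return format(result, f"0{width}b")


print(twos_complement_negate("00010111"))  # 23 -> prints 11101001 (-23)
print(twos_complement_negate("11101001"))  # -23 -> prints 00010111 (23)
```

Applying the operation twice returns the original bits, which is why the same routine converts in both directions.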
interpreting binary
• example: the following binary contains an unsigned 4-bit int, then a signed 8-bit int, then a signed, 2's complement 16-bit int. Convert each to decimal
• 1001001010011111111111101001

value | two's complement | decimal expansion | total
1001 | n/a | 8+1 | 9
00101001 | n/a | 32+8+1 | 41
1111111111101001 | 0000000000010111 | -(16+4+2+1) | -23
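
A sketch that slices the bit string into the three fields and decodes each one. The helper names are hypothetical, and the 8-bit field is decoded as two's complement here; since its sign bit is 0, sign/magnitude would give the same value.

```python
def decode_unsigned(bits: str) -> int:
    """Interpret a bit string as an unsigned integer."""
    return int(bits, 2)


def decode_twos_complement(bits: str) -> int:
    """Interpret a bit string as a signed two's complement integer."""
    value = int(bits, 2)
    if bits[0] == "1":             # sign bit set: subtract 2^width
        value -= 1 << len(bits)
    return value


stream = "1001001010011111111111101001"
field_a, field_b, field_c = stream[:4], stream[4:12], stream[12:]

print(decode_unsigned(field_a))          # prints 9
print(decode_twos_complement(field_b))   # prints 41
print(decode_twos_complement(field_c))   # prints -23
```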
sign extension
• converting shorter ints to longer ints
• for unsigned, just pad with zeros
  • ex: 4-bit 5 is 0101. 8-bit 5 is 00000101.
• for signed two's complement, just pad with most-significant-bit (MSB)
  • ex: 4-bit -7 is 1001. 8-bit -7 is 11111001
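
Both padding rules fit in one line each when working on bit strings. A minimal sketch with illustrative function names:

```python
def zero_extend(bits: str, width: int) -> str:
    """Widen an unsigned value: pad on the left with zeros."""
    return bits.rjust(width, "0")


def sign_extend(bits: str, width: int) -> str:
    """Widen a two's complement value: pad on the left with copies of the MSB."""
    return bits.rjust(width, bits[0])


print(zero_extend("0101", 8))  # 5  -> prints 00000101
print(sign_extend("1001", 8))  # -7 -> prints 11111001
```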

adding/subtracting two's complement
• addition: add bit by bit, carrying into the next column (ordinary binary addition)
  • e.g. 4 bit values
  • 3 decimal = 0011. 2 decimal = 0010. 3+2 = 5 = 0101
• subtraction: take two's complement of number being subtracted, and then do addition
  • 3 decimal = 0011. 2 decimal = 0010. 3 - 2 = 3+(-2) = 0011 + 1110 = 0001 = 1 decimal (the carry out of the top bit is discarded)
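
A sketch of 4-bit addition and subtraction using integer bit patterns. The fixed `WIDTH` constant and the function names are assumptions for this example; masking with `MASK` is what discards the carry out of the top bit.

```python
WIDTH = 4
MASK = (1 << WIDTH) - 1  # 0b1111, keeps only the low 4 bits


def add(a: int, b: int) -> int:
    """Add two 4-bit patterns, discarding any carry out of the top bit."""
    return (a + b) & MASK


def negate(a: int) -> int:
    """Two's complement operation: flip every bit, then add 1."""
    return (~a + 1) & MASK


def subtract(a: int, b: int) -> int:
    """a - b computed as a + (-b)."""
    return add(a, negate(b))


print(format(add(0b0011, 0b0010), "04b"))       # 3 + 2 -> prints 0101
print(format(negate(0b0010), "04b"))            # -2    -> prints 1110
print(format(subtract(0b0011, 0b0010), "04b"))  # 3 - 2 -> prints 0001
```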
overflow
• overflow: not enough bits to store the result
  • e.g. if we try signed two's complement 4-bit 7+3 = 0111+0011 = 1010, which reads as -6 (note sign is wrong!)
• can detect overflow if two operands are like-signed, and the sign of result is different
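
The detection rule can be sketched as a comparison of sign bits. The function name and the default 4-bit width are illustrative choices.

```python
def add_with_overflow(a: int, b: int, width: int = 4) -> tuple[int, bool]:
    """Add two two's complement bit patterns; report whether signed overflow occurred."""
    mask = (1 << width) - 1
    sign_bit = 1 << (width - 1)
    result = (a + b) & mask
    # Overflow: operands have the same sign, but the result's sign differs.
    same_sign = (a & sign_bit) == (b & sign_bit)
    overflow = same_sign and (result & sign_bit) != (a & sign_bit)
    return result, overflow


result, overflow = add_with_overflow(0b0111, 0b0011)  # 7 + 3
print(format(result, "04b"), overflow)                # prints 1010 True
result, overflow = add_with_overflow(0b0011, 0b0010)  # 3 + 2
print(format(result, "04b"), overflow)                # prints 0101 False
```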
interpreting data
• what's this?
  • 0001001010011111
• could be anything!
  • 16 bit unsigned int
  • 16 bit signed int
  • float
  • an instruction for a computer to execute

where is data stored?
memory type (from slower, cheaper, larger down to faster, more expensive, smaller)
• backup (e.g. cloud, tape backup)
• external storage (e.g. hard drive, flash drive)
• random access memory (RAM)
• registers
typically, data must be in a register to run an instruction on it
running a program
• a program's instructions are typically stored consecutively in RAM
• typical machine uses two special registers to execute them
  • Instruction Register (IR): holds current instruction
  • Program Counter (PC): holds address of next instruction to execute
fetch / decode / execute cycle
• Prior to running program, load PC with address of its first instruction. Then repeatedly:
• Fetch
  • load IR with next instruction
  • increment PC
• Decode
  • determine what instruction does and where operands are
• Execute
  • perform the instruction
types of instructions - arithmetic
• arithmetic instruction includes:
  • opcode - operation to perform, e.g. add, multiply
  • operands - what data to perform operation on
  • where to place the result
types of instructions - branch
• branch instruction includes:
  • under what conditions to branch
  • address of next instruction to execute if we are to branch
types of instructions - halt
• halt: stop the machine
  • halt instruction: normal termination
  • or, error condition encountered
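
To tie the fetch / decode / execute cycle to the three instruction types, here is a toy interpreter. Everything about the machine is hypothetical: the single accumulator, the opcode names `LOAD`, `ADD`, `BRNZ`, and `HALT`, and the tuple encoding are invented for this sketch and are not the course's Simple Machine or MIPS.

```python
def run(memory: list[tuple[str, int]]) -> tuple[int, int]:
    """Run a toy program; return (final accumulator, instructions executed)."""
    pc = 0    # Program Counter: address of next instruction to execute
    acc = 0   # single accumulator register (an assumption of this toy machine)
    executed = 0
    while True:
        ir = memory[pc]        # Fetch: load IR with next instruction
        pc += 1                #        and increment PC
        opcode, operand = ir   # Decode: what to do, and with which operand
        executed += 1
        if opcode == "LOAD":   # Execute
            acc = operand
        elif opcode == "ADD":          # arithmetic instruction
            acc += operand
        elif opcode == "BRNZ":         # branch instruction: taken if acc is not zero
            if acc != 0:
                pc = operand
        elif opcode == "HALT":         # halt instruction: normal termination
            return acc, executed
        else:
            raise ValueError(f"unknown opcode {opcode!r}")  # error condition


# Count down from 3 to 0, looping back to address 1 while acc is non-zero.
program = [("LOAD", 3), ("ADD", -1), ("BRNZ", 1), ("HALT", 0)]
print(run(program))  # prints (0, 8)
```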

Instruction Set Architecture (ISA)
• ISA: set of instructions and machine capabilities
• interface between hardware and lowest-level software

type | description | example
stack-based | uses stack for data | postfix calculator, e.g. 3 4 + to calculate 3+4
accumulator-based | implicitly uses one accumulator (or a select few) for data | Simple Machine
general-register | many general purpose registers; info about all operands contained in each instruction | MIPS, ARM, Intel x86
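
The stack-based row can be illustrated with a small postfix calculator. This is a sketch of the general idea only; the supported operators and the name `eval_postfix` are choices made for this example.

```python
def eval_postfix(expression: str) -> int:
    """Evaluate a space-separated postfix expression using a stack for all data."""
    stack: list[int] = []
    for token in expression.split():
        if token in ("+", "-", "*"):
            # Operators take their operands implicitly from the top of the stack.
            right = stack.pop()
            left = stack.pop()
            if token == "+":
                stack.append(left + right)
            elif token == "-":
                stack.append(left - right)
            else:
                stack.append(left * right)
        else:
            stack.append(int(token))  # operands are pushed
    return stack.pop()


print(eval_postfix("3 4 +"))      # prints 7
print(eval_postfix("3 4 + 2 *"))  # prints 14
```

Notice that no instruction names where its operands live, which is the contrast with the general-register style in the last row of the table.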
general-register architecture types
• CISC vs. RISC
  • Reduced Instruction Set Computer - instructions are simple. Most modern architectures are RISC
  • Complex Instruction Set Computer - more complex instructions available. Intel x86 is CISC but with some RISC added
addressing modes
• where are the operands?

mode | description | example
… | instruction requires both … | add two registers, …
mixed / constant | one operand is in a register, the other operand is a constant contained in instruction | add a register to a constant, store result in register
implicit | one or more operands is implicit in the instruction | MIPS mult instruction places results of multiply in lo and hi registers
memory interface
• we need to be able to read from memory and write to memory
• memory interface: array of bytes with two special registers
  • Memory Address Register (MAR) - address of next memory reference
  • Memory Buffer Register (MBR) - storage location for data being read from or written to memory
[Figure: MBR and MAR registers connected to RAM]
memory cycle
• how to write to memory:
  • access the MAR
  • place contents of MBR at memory[MAR]
• how to read from memory:
  • access the MAR
  • retrieve contents of memory[MAR] and put in MBR
[Figure: MBR and MAR registers connected to RAM]
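
A minimal model of the two memory cycles. The class name, its attributes, and the cell granularity are assumptions for this sketch; real hardware details such as timing and bus width are ignored.

```python
class MemoryInterface:
    """Toy memory: an array of cells accessed only through MAR and MBR."""

    def __init__(self, size: int) -> None:
        self.cells = [0] * size
        self.mar = 0  # Memory Address Register: address of next memory reference
        self.mbr = 0  # Memory Buffer Register: data being read or written

    def write_cycle(self) -> None:
        """Place contents of MBR at memory[MAR]."""
        self.cells[self.mar] = self.mbr

    def read_cycle(self) -> None:
        """Retrieve contents of memory[MAR] and put them in MBR."""
        self.mbr = self.cells[self.mar]


mem = MemoryInterface(size=16)
mem.mar, mem.mbr = 5, 42
mem.write_cycle()   # memory[5] now holds 42
mem.mbr = 0         # clobber the buffer to show the read really fetches from memory
mem.read_cycle()
print(mem.mbr)      # prints 42
```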
register transfer language/notation (RTL/RTN)
• all internal operations in a CPU can be described as a sequence of operations on, and transfers between, registers, and memory cycles.
• the RTL (or RTN) contains:
  • operations
  • operands
  • ability to specify specific bits in registers
  • programming constructs for CPU's logic capabilities
Exercise 1A
• Make sure you can connect to our course Canvas page and view homework #1.
  • http://ccsf.instructure.com
• Make sure you can connect between our hills Linux server and the Batmale 413 Linux workstations, using the syllabus as a guide and for password / login information.
  • If you're on a Linux workstation, demonstrate an ssh connection to hills. Replace "uname" with your username.
    ssh [email protected]
  • If you're on your own laptop, demonstrate a terminal window logged into hills, and from that terminal window, ssh to a Linux workstation. The IP range you can use is anything between 147.144.23.31-147.144.23.58.
    ssh [email protected]
  • For Mac laptops, ssh is built in to the terminal window. For Windows laptops, you will need to download a terminal client like PuTTY or SSHSecureShellClient.exe. See the computer systems guide on my web page, http://fog.ccsf.edu/~mluttrel
Exercise 1B
• Linux basics and simple machine intro
  http://fog.ccsf.edu/~mluttrel/cs270/exercises/ex1b.html
