Welcome To CSCE 4610/5610: Computer Architecture
Outcomes for CSCE 4610
1. Apply metrics to evaluate the performance of modern computer systems.
2. Design processor pipelines to meet specifications.
3. Design simple branch prediction for a pipelined processor.
4. Design out-of-order instruction execution using reservation stations and
reorder buffers.
5. Apply simple compiler techniques to improve performance.
6. Gain knowledge about various cache design alternatives.
You will be asked at the end of the semester whether we met these objectives
Review
What is computer architecture?
Instruction Set Architecture
Computer Organization
Micro-architecture
System Architecture
Role of a computer architect
Support functionality
Depends on the type of applications or target market
Desktop, server, scientific, mobile/personal devices
Embedded systems (controllers, etc.)
Understand technology trends
Denser chips
Denser memories
Memory wall
Clock frequencies
Heat dissipation
Support functionality with best performance
Speed performance
Reliability, availability
Power/energy performance
Hard or soft real-time requirements
Issues related to Cost
Cost of Integrated Chip
Cost of the die (or chip)
Cost of testing
Cost of packaging
Cost of the die depends on the number of dies per wafer and
how many of those dies are good (the yield)
Dies per wafer = pi * (wafer diameter / 2)^2 / (die area) - pi * (wafer diameter) / sqrt(2 * die area)
Die yield = (Wafer_yield) * 1 / [1 + (defects_per_unit_area) * (die_area)]^N
N is known as process complexity and in 2010 its value ranged between 11.5 and 15.5
Example from page 31
Note: Some of the problems from Chapter 1 do not work with this formula
Consider a 30 cm wafer and two different die sizes: 1.5 cm or 1.0 cm on a side
Dies per wafer:
With 1.5 cm dies (2.25 cm^2) we get 270 dies
With 1.0 cm dies (1.0 cm^2) we get 640 dies
Defects per square centimeter determine the yield
Given 0.031 defects per cm^2 and N = 13.5:
If we use 1.5 cm dies, die yield = 0.40, so we get 270 * 0.40 = 108 good chips from the 30 cm wafer
If we use 1.0 cm dies, die yield = 0.66, and we get 640 * 0.66 = 422 good chips
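For reference, a small Python sketch (not from the slides) that reproduces these numbers, using the standard dies-per-wafer approximation and the yield equation above:

import math

def dies_per_wafer(wafer_diameter_cm, die_area_cm2):
    # Usable wafer area divided by die area, minus a correction for
    # partial dies lost around the wafer edge
    wafer_area = math.pi * (wafer_diameter_cm / 2) ** 2
    edge_loss = math.pi * wafer_diameter_cm / math.sqrt(2 * die_area_cm2)
    return round(wafer_area / die_area_cm2 - edge_loss)

def die_yield(defects_per_cm2, die_area_cm2, n=13.5, wafer_yield=1.0):
    # Yield model from the slide: wafer_yield / (1 + D * A)^N
    return wafer_yield / (1 + defects_per_cm2 * die_area_cm2) ** n

for die_area in (2.25, 1.0):
    dies = dies_per_wafer(30, die_area)
    y = die_yield(0.031, die_area)
    # Prints 270 / 0.40 / ~109 and 640 / 0.66 / ~424; the slide rounds the
    # yields to 0.40 and 0.66 before multiplying and gets 108 and 422.
    print(f"{die_area} cm^2: {dies} dies, yield {y:.2f}, ~{dies * y:.0f} good dies")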
Another example: Problem 1.1 on page 62
Here we simply apply the yield equation
Die yield = (Wafer_yield) * 1 / [1 + (defects_per_unit_area) * (die_area)]^N
We will assume wafer yield to be 100% and N= 13.5
If we use the equation in the textbook, we get yield = 2.9 * 10^-5
VERY BAD!
The previous edition used the following equation for yield:
Die_yield = (Wafer_yield) * [1 + (defects_per_unit_area) * (die_area) / alpha]^(-alpha)
Assume wafer yield = 100% and alpha = 4
Now the yield for the Power5 turns out to be more reasonable: 0.36
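For comparison, a sketch of both yield formulas in Python. The die area and defect density below are not on this slide; they are assumed from the referenced textbook problem (roughly a 3.89 cm^2 Power5 die with 0.30 defects per cm^2), and they do reproduce the 2.9*10^-5 and 0.36 figures above:

def yield_new(defects_per_cm2, die_area_cm2, n=13.5, wafer_yield=1.0):
    # Current-edition model: wafer_yield / (1 + D * A)^N
    return wafer_yield / (1 + defects_per_cm2 * die_area_cm2) ** n

def yield_old(defects_per_cm2, die_area_cm2, alpha=4.0, wafer_yield=1.0):
    # Previous-edition model: wafer_yield * (1 + D * A / alpha)^(-alpha)
    return wafer_yield * (1 + defects_per_cm2 * die_area_cm2 / alpha) ** (-alpha)

d, a = 0.30, 3.89   # assumed defects/cm^2 and die area in cm^2 (from the referenced problem)
print(f"new formula: {yield_new(d, a):.1e}")   # ~2.9e-05 (unreasonably low)
print(f"old formula: {yield_old(d, a):.2f}")   # ~0.36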
Why does the Power5 have a lower defect rate?
IBM's process technology is older (note the larger feature size, in nm)
so it is more mature and has fewer manufacturing defects
Power consumed by a processor
Two types of power consumed
Static: consumed even if a hardware component is not active
sometimes called leakage
Dynamic: due to the switching of transistors
Power_dynamic = (1/2) * (capacitive load) * (supply voltage)^2 * (operating frequency)
Example: What happens if the voltage is dropped by 15%, with a proportional (15%) change
in operating frequency?
There is no change in capacitive load, so
P_new / P_old = (0.85)^2 * (0.85) = 0.85^3 = 0.61
So we reduced the dynamic power consumption by about 39%
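A minimal sketch of the same scaling argument (the absolute capacitance, voltage, and frequency values are arbitrary placeholders; only the ratios matter):

def dynamic_power(cap_load, voltage, frequency):
    # P_dynamic = 1/2 * C * V^2 * f
    return 0.5 * cap_load * voltage ** 2 * frequency

baseline = dynamic_power(1.0, 1.0, 1.0)         # arbitrary units
scaled = dynamic_power(1.0, 0.85, 0.85)         # 15% lower voltage and frequency
print(f"power ratio: {scaled / baseline:.2f}")  # ~0.61, i.e. about a 39% reduction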
Consider another example: problem 1.4
Note: we are using an Intel processor with 2 DRAM chips and a 7200 rpm disk
a) Total power = 66 W (processor) + 2 * 2.3 W (DRAM) + 7.9 W (disk)
= 78.5 W
However, if the power supply works at only 80% efficiency and needs to deliver 78.5 W, we
need a power supply rated for 78.5 / 0.8 = 98 W
b) The disk is 60% idle (or 40% busy):
average disk power = 7.9 * 40% + 4.0 * 60% = 5.56 W
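A sketch of parts (a) and (b), using only the component powers quoted above:

# Component powers quoted in the example (watts)
cpu_w, dram_w, disk_seek_w, disk_idle_w = 66.0, 2.3, 7.9, 4.0

# Part (a): everything active, then the supply rating at 80% efficiency
total = cpu_w + 2 * dram_w + disk_seek_w
print(f"total power: {total:.1f} W")                            # 78.5 W
print(f"supply rating at 80% efficiency: {total / 0.8:.1f} W")  # ~98 W

# Part (b): disk busy 40% of the time, idle 60%
avg_disk = disk_seek_w * 0.40 + disk_idle_w * 0.60
print(f"average disk power: {avg_disk:.2f} W")                  # 5.56 W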
c) The 7200 rpm disk can be idle longer because its seek time is shorter:
seek_7200 = 75% * seek_5400
For each disk, the seek and idle time fractions sum to 100% (seek_time = 1 - idle_time)
Total power_7200 = (seek power_7200) * seek_7200 + (idle power_7200) * idle_7200
Total power_5400 = (seek power_5400) * seek_5400 + (idle power_5400) * idle_5400
We equate these two power expressions and use the seek and idle power consumed by the two
disks given in the table
Solving, idle_7200 is approximately 29%
Note: More hardware means more power consumption
both static and dynamic
The capacitive load is proportional to the number of transistors
Dynamic voltage and frequency scaling
Changing voltage and clock speed can degrade performance
A better measure may be the time * energy product
If we change voltage and frequency in the middle of execution
we lose some time, since hardware components need to be resynchronized
Dropping the (supply) voltage reduces power consumption, but circuits may become more error prone
A lot of work has been done on changing frequencies, as well as on shutting off components, to save power
Globally Asynchronous Locally Synchronous (GALS)
Different units (different stages of a pipeline) run at different clock rates
Another criterion is the amount of silicon area needed
at least for embedded systems
Let us define the size needed for a 1-bit register as 1 rbe (register bit equivalent)
To build one bit of SRAM we need 0.6 rbe
To build one bit of DRAM we need 0.1 rbe
To build one bit of a direct-mapped cache we need ~0.8 rbe
So we need to decide whether to use DRAM, SRAM, cache, or registers
Tradeoff: registers are faster than caches, and caches are faster than DRAM
Logic circuits (such as control logic and arithmetic logic) consume more area than memory units
If we can reduce the amount of cache memory needed, we can potentially reduce the area
needed for the cache and the power consumed
We have explored some ideas -- keep the same performance but reduce the area and
power consumed by using different cache organizations
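A tiny sketch of how these rbe figures could be used to compare areas; the structure sizes below are hypothetical examples, not from the slide:

# rbe (register bit equivalent) costs per bit, as quoted above
RBE_PER_BIT = {"register": 1.0, "sram": 0.6, "dram": 0.1, "cache": 0.8}

def area_rbe(kind, num_bits):
    # Rough area estimate in rbe for num_bits of a given storage type
    return RBE_PER_BIT[kind] * num_bits

# Hypothetical sizes, purely for illustration:
print(area_rbe("register", 32 * 64))      # a 32-entry, 64-bit register file (~2,048 rbe)
print(area_rbe("cache", 16 * 1024 * 8))   # 16 KB of direct-mapped cache data bits (~105,000 rbe)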
Cost of computers
The cost of the CPU chip is only a small fraction of the overall cost of a computer
The CPU is about 22% of total system cost
This fraction keeps changing as the cost of other components changes
The cost of the system must be understood in relation to the selling price of the system
the actual cost of the system is only about 25% of the list price
the rest goes to marketing, profits, etc.
So, if the cost of the CPU increases by $1, the system cost increases by about $4.5
(since the CPU is ~22% of system cost), and the list price increases by about $18
(since cost is ~25% of the list price)!
So, if we are considering adding new functionality, we need to worry about the impact of that
functionality on cost and price
And the increase should be justified by performance: either speed, reliability/availability,
or lower power
How do we define the performance of a processor?
Execution time for a program?
Wall clock or CPU time?
User CPU and System CPU Time
For now we will only use user CPU time:
CPU time = (instruction count) * (CPI) / (clock rate)
Note: cycle time = 1 / clock_rate; a 1 GHz clock means 1 ns per cycle
CPI: Average number of Cycles Per Instruction.
How do we find this?
Consider for example that we collected average frequencies for various instruction types.
ALU operations occur 43% of the time and take 1 clock cycle to execute
Load instructions occur 21% of the time and need 2 cycles
Store instructions occur 12% of the time and need 2 cycles
Branch instructions occur 24% of the time and need 2 cycles
How do we get CPI, the average number of cycles per instruction?
CPI = Σ_{i} (instruction count_i * C_i) / (total instruction count)
    = Σ_{i} (fraction of instructions of type i) * (cycles for type i)
The average number of cycles per instruction =
0.43*1 + 0.21*2 + 0.12*2 + 0.24*2 = 1.57 cycles per instruction
Once we have the CPI and the clock speed, we can find the MIPS rating of a processor
If we are using a 1 GHz processor, the MIPS rating is given by
10^9 / (1.57 * 10^6) = 637 million instructions per second
Execution time = (instruction_count) * (1 / 637 MIPS)
               = (instruction_count) * 1.57 * 10^-9 seconds
Remember, the clock speed (or frequency) is inversely related to the clock period.
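The CPI/MIPS arithmetic above, as a short Python sketch (instruction mix and 1 GHz clock taken from the example):

# Instruction mix from the example: (fraction of instructions, cycles each)
mix = {"alu": (0.43, 1), "load": (0.21, 2), "store": (0.12, 2), "branch": (0.24, 2)}

cpi = sum(frac * cycles for frac, cycles in mix.values())
clock_hz = 1e9                                   # 1 GHz clock
mips = clock_hz / (cpi * 1e6)
print(f"CPI = {cpi:.2f}")                        # 1.57
print(f"MIPS = {mips:.0f}")                      # ~637
print(f"time per instruction = {1 / (mips * 1e6):.2e} s")  # ~1.57e-09 s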
Consider why MIPS rating can be misleading.
Suppose we have a compiler that can optimize the program
The optimizing compiler can eliminate 50% of the arithmetic (ALU) instructions.
Now let us consider how the equations change. What is the CPI?
Consider for example that we collect average frequencies for various instruction types.
ALU operations now occur 21.5% of the time and take 1 clock cycle to execute
Load instructions occur 21% of the time and need 2 cycles
Store instructions occur 12% of the time and need 2 cycles
Branch instructions occur 24% of the time and need 2 cycles
But we need to rescale these fractions, since they now total only 78.5%
So CPI = [(21.5%)*1 + (21%)*2 + (12%)*2 + (24%)*2] / (78.5%)
       = 1.73 cycles per instruction (a larger CPI!)
MIPS = (10^9) / (1.73 * 10^6) = 578 MIPS
So the computer with the optimizing compiler has a lower MIPS rating, even though it
executes fewer instructions and finishes sooner!
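A sketch of the rescaling step, assuming (as above) that the optimizer removes half of the ALU instructions and nothing else changes:

# Original mix: (fraction of the original instruction count, cycles each)
base = {"alu": (0.43, 1), "load": (0.21, 2), "store": (0.12, 2), "branch": (0.24, 2)}

# The optimizer removes 50% of ALU instructions; the other counts are unchanged,
# so the surviving fractions must be rescaled to sum to 1.
opt = dict(base, alu=(0.43 / 2, 1))
total_frac = sum(frac for frac, _ in opt.values())           # 0.785
cpi = sum(frac * cyc for frac, cyc in opt.values()) / total_frac
mips = 1e9 / (cpi * 1e6)                                     # 1 GHz clock
print(f"CPI = {cpi:.2f}, MIPS = {mips:.0f}")  # CPI ~1.73; MIPS ~579 (578 if CPI is rounded first)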
Another Example
Consider two processors with different ways of implementing conditional instructions
CPU-A: needs two instructions, a compare and a branch (e.g., SLT R3, R1, R2; BNZ R3, loop)
CPU-B: a single instruction to compare and branch (e.g., BLT R1, R2, loop)
Branches take 2 cycles and all other instructions take 1 cycle
Frequency of branches = 20%
CPU-A's clock is 25% faster (simpler instructions)
Time on CPU-A = (Instr_Count) * {0.80*1 + 0.20*(2+1)} * (Cycle_Time)
             = (Instr_Count) * 1.4 * (Cycle_Time)
Time on CPU-B = (Instr_Count) * (0.8*1 + 0.2*2) * (1.25 * Cycle_Time)
             = (Instr_Count) * 1.5 * (Cycle_Time)
CPU-A is faster even though it needs more instructions!
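The same comparison as a sketch, with the instruction count and CPU-A's cycle time normalized to 1:

def exec_time(instr_count, cpi, cycle_time):
    return instr_count * cpi * cycle_time

ic, ct = 1.0, 1.0   # normalized instruction count and CPU-A cycle time
# CPU-A: 20% of operations need a compare (1 cycle) plus a branch (2 cycles)
time_a = exec_time(ic, 0.80 * 1 + 0.20 * (2 + 1), ct)
# CPU-B: single compare-and-branch (2 cycles), but a 25% longer cycle time
time_b = exec_time(ic, 0.80 * 1 + 0.20 * 2, 1.25 * ct)
print(f"CPU-A: {time_a:.2f}, CPU-B: {time_b:.2f}")   # 1.40 vs 1.50 -> CPU-A wins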
Many Complex Interactions During Execution.
Pipeline Bubbles or Stalls or lost cycles due to branch instructions
Consider, for example, that on average 50% of all branches
are taken and cause 3-cycle stalls (lost cycles)
What is the CPI for branch instructions?
If not taken, CPI = 1
If taken, CPI = 4
Effective CPI for branches = 0.5*1+0.5*4 =2.5
If branches are 20%, total CPI = 80%*1 + 20%*2.5 = 1.3
Cache Misses
Affect only load and store instructions
If no cache miss, say CPI =2
If cache miss, we may have a CPI of 50
5% miss rate leads to 0.95*2+0.05*50 = 4.4
Remember the instruction frequencies from a previous example
ALU operations occur 43% of the time and take 1 clock cycle to execute
Load instructions occur 21% of the time and need 2 cycles without cache misses
Store instructions occur 12% of the time and need 2 cycles without cache misses
Branch instructions occur 24% of the time and need 2 cycles
But if we have 21% loads and 12% stores taking 4.4 cycles each (due to cache misses),
the new CPI = 33%*4.4 + 43%*1 + 24%*2 = 2.36
compared to 1.57 CPI with no cache misses
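A small sketch that reproduces the effective-CPI numbers above (branch and cache-miss penalties as given in the examples):

# Effective CPI for branches: 50% taken (1 + 3 stall cycles), 50% not taken (1 cycle)
branch_cpi = 0.5 * 1 + 0.5 * 4
print(f"branch CPI = {branch_cpi}, overall with 20% branches = {0.80 * 1 + 0.20 * branch_cpi:.2f}")

# Effective CPI for loads/stores with a 5% miss rate and a 50-cycle miss penalty
mem_cpi = 0.95 * 2 + 0.05 * 50
# Full mix: 43% ALU (1 cycle), 33% loads+stores, 24% branches (2 cycles)
print(f"memory CPI = {mem_cpi}, overall CPI = {0.43 * 1 + 0.33 * mem_cpi + 0.24 * 2:.2f}")  # 4.4, ~2.36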
How to report performance data?
Execution time for one program
Execution times for all programs
Average execution time across all programs
Weighted average etc.
Assuming n programs:
Arithmetic Mean = (1/n) * Σ_{i=1..n} (Time)_i
Weighted Arithmetic Mean = Σ_{i=1..n} (Weight)_i * (Time)_i
Harmonic Mean = n / Σ_{i=1..n} (1 / (Time)_i)
Let us look at an example. Here we are comparing 3 different computers using 2 programs
(execution times):

              Computer A   Computer B   Computer C
Pgm P1                 1           10           20
Pgm P2              1000          100           20
Total               1001          110           40

Let us find the weighted arithmetic average execution times, using 3 different sets of weights:
W1: P1 = 50%,   P2 = 50%
W2: P1 = 90.9%, P2 = 9.1%
W3: P1 = 99.9%, P2 = 0.1%
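A sketch that reproduces the weighted averages shown in the table that follows (times and weights as given above):

# Execution times from the table and the three weight sets
times = {"A": {"P1": 1, "P2": 1000}, "B": {"P1": 10, "P2": 100}, "C": {"P1": 20, "P2": 20}}
weights = {"W1": (0.5, 0.5), "W2": (0.909, 0.091), "W3": (0.999, 0.001)}

for wname, (w1, w2) in weights.items():
    avg = {m: w1 * t["P1"] + w2 * t["P2"] for m, t in times.items()}
    print(wname, {m: round(v, 2) for m, v in avg.items()})
# W1 -> A: 500.5, B: 55,    C: 20  (C looks best)
# W2 -> A: 91.91, B: 18.19, C: 20  (B looks best)
# W3 -> A: 2.0,   B: 10.09, C: 20  (A looks best)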
              Computer A   Computer B   Computer C
Pgm P1                 1           10           20
Pgm P2              1000          100           20
Total               1001          110           40
Avg with W1        500.5           55           20
Avg with W2        91.91        18.19           20
Avg with W3            2        10.09           20
So which computer is best?
If we use W1, C is best; with W2, B is best; and with W3, A is best
Can we think of a different way of computing averages?
Relative performance: for each program, use a relative execution time, compared to a
standard (reference) computer.
The relative execution times can then be used to compute an arithmetic (or weighted) mean.
We can also compute Geometric Mean.
Geometric Mean = ( Π_{i=1..n} (Relative_execution_time)_i )^(1/n)
Let us look at our example using geometric means. The relative performance of the 3 machines
remains the same.
Normalized to A:
                  Computer A   Computer B   Computer C
Pgm P1                     1           10           20
Pgm P2                     1          0.1         0.02
Arithmetic Mean            1         5.05        10.01
Geometric Mean             1            1         0.63

Normalized to B:
                  Computer A   Computer B   Computer C
Pgm P1                   0.1            1            2
Pgm P2                    10            1          0.2
Arithmetic Mean         5.05            1          1.1
Geometric Mean             1            1         0.63

Normalized to C:
                  Computer A   Computer B   Computer C
Pgm P1                  0.05          0.5            1
Pgm P2                    50            5            1
Arithmetic Mean        25.03         2.75            1
Geometric Mean          1.58         1.58            1

Now, by the geometric mean, C is always the best (the ranking does not depend on the reference machine)
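A sketch of the geometric-mean calculation behind these tables (same execution times as before; math.prod needs Python 3.8 or later):

from math import prod

times = {"A": [1, 1000], "B": [10, 100], "C": [20, 20]}   # P1 and P2 execution times

def geo_mean(values):
    return prod(values) ** (1 / len(values))

for ref in times:   # normalize the times to each machine in turn
    gm = {m: geo_mean([t / r for t, r in zip(times[m], times[ref])]) for m in times}
    print(f"normalized to {ref}:", {m: round(v, 2) for m, v in gm.items()})
# C has the smallest geometric mean regardless of which machine is the reference.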
Another example: see the table on page 43,
where we are looking at geometric means for the Opteron and the Itanium.
The Sun Ultra 5 (a SPARC machine) is used as the reference computer.
The Opteron runs about 30% slower than the Itanium.
What programs to use in evaluating performance?
The programs that will be run in the field
Benchmark programs
Real programs that are common in an application domain
e.g., SPEC benchmarks (SPEC CPU: integer and floating point)
SPECweb, SPECvirt
Bioinformatics
High-performance computing (SPEC OMP)
Program kernels:
e.g., embedded kernels (EEMBC)
NAS benchmarks, Livermore Loops
Synthetic program mixes
How to collect performance data using benchmarks?
Actual Measurements and Simulations
If the architecture already exists, run programs and collect data
Need to be careful in collecting data
Instrumentation may skew data
Performance Registers
Software profiling techniques
Or develop simulations.
Detailed simulations
Trace driven simulations
Monte Carlo simulations