Computer Systems and Performance

Module 1: What is Computer Performance?


Why Should We Measure Performance?
Understanding performance is key to making smart decisions about computers. It helps us
answer critical questions like:
●​ Why is one computer better than another for a specific task (like gaming vs. office work)?
●​ Is my slow computer a hardware problem or a software problem?
●​ If I want to speed up my program, what should I focus on?
●​ Do I need a new computer, or just a new operating system?

The Airplane Analogy: Performance Isn't Just One Thing


Before we dive in, let's use an analogy. Imagine you have four different airplanes:

| Airplane | Passenger Capacity | Cruising Speed | Cruising Range |
| :--- | :--- | :--- | :--- |
| Boeing 787 | ~290 | 567 mph | 8,786 miles |
| Concorde | ~100 | 1,354 mph | 4,488 miles |
| Airbus 380 | ~550 | 561 mph | 9,444 miles |
| Cessna 172 | 4 | 141 mph | 790 miles |

Which airplane has the "best performance"?


●​ For speed, it's the Concorde.
●​ For passenger throughput (carrying the most people), it's the Airbus 380.
●​ For range, it's the Airbus 380.
This shows us that performance can't be defined by a single number. We need to look at
different metrics depending on our goals.

The Two Key Metrics: Response Time vs. Throughput


In computing, we generally focus on two main metrics:
1.​ Response Time (or Execution Time):
○​ What it is: The total time it takes to complete a single task, from start to finish.
○​ Example: How long it takes to open a web browser, load a level in a game, or run a
scientific calculation.
○​ Goal: You want lower response time.
2.​ Throughput:
○​ What it is: The total amount of work (tasks, transactions) completed in a given unit
of time.
○​ Example: How many customer transactions a bank's server can process per minute,
or how many videos a server can stream simultaneously.
○​ Goal: You want higher throughput.
How are they related?
●​ Improving Response Time: If you replace your computer's processor with a faster one,
your programs will run faster. This reduces response time and, as a side effect, also
improves throughput (since you can get more tasks done in the same amount of time).
●​ Improving Throughput: If you add a second processor to a server, you can run two
tasks at once. This doubles the throughput. However, it doesn't change the response
time for any single task (a task still takes the same amount of time to run).

Module 2: How We Measure Execution Time (The First Performance Equation)
When we talk about "performance," we most often mean Response Time. But what is the
total execution time of a program?

Total Time = CPU Time + Memory Access Time + I/O Wait Time + ...

This is too complicated. For this guide, we'll focus on CPU Time: the time the processor (CPU)
spends only on processing your task. This doesn't include time spent waiting for your hard
drive, the internet, or running other programs.

The Clock Cycle


Every computer runs on a fixed "clock" that beats at a constant rate, like a metronome.
Hardware events happen in "ticks" of this clock.
●​ Clock Period (or Clock Cycle Time): The duration of one clock tick (e.g., 0.5
nanoseconds).
●​ Clock Rate: The number of clock ticks (cycles) per second (e.g., 2 GigaHertz (GHz),
which is 2 billion cycles per second).
●​ Relationship: Clock Rate = 1 / Clock Period

The First Performance Equation


This gives us our first and most basic performance equation:

CPU Execution Time = (Number of CPU Clock Cycles) × (Clock Cycle Time)
Or, using Clock Rate:

CPU Execution Time = (Number of CPU Clock Cycles) / (Clock Rate)

This tells us that to improve performance (i.e., reduce execution time), we have two options:
1.​ Reduce the number of clock cycles the program needs.
2.​ Reduce the clock cycle time (which means increasing the clock rate).
Example: Upgrading a Computer
●​ Computer A: Runs a program in 10 seconds with a 2 GHz clock.
●​ We want to build Computer B that can run the same program in 6 seconds.
●​ Constraint: Due to design changes, Computer B will require 1.2 times more clock
cycles for this program.
Question: What clock rate must Computer B have?
1.​ Find cycles for A:
○​ Cycles(A) = CPU Time × Clock Rate
○​ Cycles(A) = 10 s × (2 × 10⁹ cycles/s) = 20 × 10⁹ cycles
2.​ Find cycles for B:
○​ Cycles(B) = Cycles(A) × 1.2
○​ Cycles(B) = (20 × 10⁹) × 1.2 = 24 × 10⁹ cycles
3.​ Find clock rate for B:
○​ Clock Rate(B) = Cycles(B) / CPU Time(B)
○​ Clock Rate(B) = (24 × 10⁹ cycles) / 6 s = 4 × 10⁹ cycles/s, or 4 GHz
Answer: To run the program in 6 seconds, Computer B must have a 4 GHz clock.
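We can sanity-check this arithmetic with a short Python sketch (the variable names are ours, chosen for this example):

```python
# Computer A runs the program in 10 s at 2 GHz; Computer B needs
# 1.2x the cycles and must finish in 6 s.
time_a = 10                       # seconds
rate_a = 2 * 10**9                # 2 GHz, in cycles per second

cycles_a = time_a * rate_a        # cycles = time x rate = 20e9
cycles_b = cycles_a * 12 // 10    # 1.2x the cycles = 24e9 (kept as an int)
rate_b = cycles_b / 6             # rate = cycles / time

print(rate_b / 10**9, "GHz")      # prints: 4.0 GHz
```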

Module 3: Digging Deeper (The "CPI" Performance Equation)
The first equation is good, but it has a problem. It assumes we just know the "Number of CPU
Clock Cycles." How do we figure that out?

We can calculate it based on the number of instructions in our program. But not all
instructions are equal. An instruction that adds two numbers might be very fast, while one
that divides two numbers might be very slow.

CPI: Cycles Per Instruction


We use a metric called CPI (Cycles Per Instruction). This is the average number of clock
cycles required to execute one instruction for a given program.

The Classic Performance Equation (The Second One)


Now we can create a much more useful performance equation:

CPU Clock Cycles = (Instruction Count) × (Average CPI)

By substituting this into our first equation, we get the most important performance
equation in this guide:

CPU Execution Time = (Instruction Count) × (Average CPI) × (Clock Cycle Time)

This equation tells us that the speed of a program depends on three factors:
1.​ Instruction Count (IC): The total number of instructions the program executes.
2.​ Average CPI: The average number of cycles each instruction takes.
3.​ Clock Cycle Time (CT): The speed of the clock.
Example: Comparing Two Computers

Let's compare two computers running the same program (so, the same Instruction Count, "I").
●​ Computer A: Clock Cycle Time = 10 ns, Average CPI = 2.0
●​ Computer B: Clock Cycle Time = 20 ns, Average CPI = 1.2
Question: Which computer is faster?
1.​ CPU Time for A:
○​ Time(A) = I × CPI(A) × CT(A)
○​ Time(A) = I × 2.0 × 10 ns = I × 20 ns
2.​ CPU Time for B:
○​ Time(B) = I × CPI(B) × CT(B)
○​ Time(B) = I × 1.2 × 20 ns = I × 24 ns
Answer: Computer A is faster, because its total execution time (20 ns per instruction) is less
than Computer B's (24 ns per instruction).
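The same comparison in Python (a small sketch; the helper name is ours):

```python
# Time = Instruction Count x CPI x Clock Cycle Time. Both machines run
# the same program, so the instruction count cancels out and we only
# need to compare CPI x cycle time.
def time_per_instruction(cpi, cycle_time_ns):
    return cpi * cycle_time_ns

a = time_per_instruction(2.0, 10)   # Computer A: 20.0 ns per instruction
b = time_per_instruction(1.2, 20)   # Computer B: 24.0 ns per instruction
print("A is faster" if a < b else "B is faster")  # prints: A is faster
```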

Calculating Average CPI (Weighted Average)


How do we find that "average" CPI? We can calculate a weighted average if we know the
different types of instructions.

Example: Comparing Code Sequences

A program uses three classes of instructions:


●​ Class A: CPI = 1 cycle
●​ Class B: CPI = 2 cycles
●​ Class C: CPI = 3 cycles
We are testing two different compiler "sequences" for the same task.
Sequence 1:
●​ Class A: 2 instructions
●​ Class B: 1 instruction
●​ Class C: 2 instructions
Sequence 2:
●​ Class A: 4 instructions
●​ Class B: 1 instruction
●​ Class C: 1 instruction
Question: Which sequence is faster?
1.​ Analyze Sequence 1:
○​ Total Cycles = (2 × 1) + (1 × 2) + (2 × 3) = 2 + 2 + 6 = 10 cycles
○​ Total Instructions = 2 + 1 + 2 = 5 instructions
○​ CPU Time = 10 cycles × (Clock Cycle Time)
2.​ Analyze Sequence 2:
○​ Total Cycles = (4 × 1) + (1 × 2) + (1 × 3) = 4 + 2 + 3 = 9 cycles
○​ Total Instructions = 4 + 1 + 1 = 6 instructions
○​ CPU Time = 9 cycles × (Clock Cycle Time)
Answer: Sequence 2 is faster because it takes only 9 cycles, while Sequence 1 takes 10.

Side Note: Notice that Sequence 2 has a lower average CPI (9 cycles / 6 instructions = 1.5)
than Sequence 1 (10 cycles / 5 instructions = 2.0). Even though it has more instructions, its
smart mix of "cheaper" instructions makes it faster.
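Here's the weighted-average calculation as a small Python sketch (the `totals` helper is ours, not a standard function):

```python
def totals(counts, cpi={"A": 1, "B": 2, "C": 3}):
    """Return (total cycles, average CPI) for an instruction mix."""
    cycles = sum(n * cpi[cls] for cls, n in counts.items())
    instructions = sum(counts.values())
    return cycles, cycles / instructions

seq1 = totals({"A": 2, "B": 1, "C": 2})
seq2 = totals({"A": 4, "B": 1, "C": 1})
print(seq1)  # prints: (10, 2.0)  -- 10 cycles, average CPI 2.0
print(seq2)  # prints: (9, 1.5)   -- fewer cycles despite more instructions
```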

Module 4: What Affects Performance? (The Big Picture)
The classic performance equation Time = Instruction Count × CPI × Clock Cycle Time is
powerful because it shows exactly where each of the four "knobs" below acts on performance.
1.​ Algorithm:
○​ Affects: Instruction Count (IC) and CPI.
○​ A better algorithm (e.g., a "quicksort" vs. a "bubble sort") will accomplish the same
task with far fewer instructions. It might also use simpler instructions, lowering the
CPI.
2.​ Programming Language:
○​ Affects: Instruction Count (IC) and CPI.
○​ A high-level language like Python is easier for humans, but one line of Python might
be translated into 100s of processor instructions (high IC). A language like C gives
the programmer more control, resulting in a lower IC.
○​ Languages with complex features (like data abstraction) may require instructions
that have a higher CPI.
3.​ Compiler:
○​ Affects: Instruction Count (IC) and CPI.
○​ The compiler is the software that translates your programming language (like C) into
processor instructions. A "smart" compiler might find a more efficient way to
translate the code, resulting in a lower IC and/or a better mix of instructions for a
lower CPI. (This is exactly what we saw in the "Code Sequence" example).
4.​ Instruction Set Architecture (ISA):
○​ Affects: Instruction Count (IC), CPI, and Clock Cycle Time (CT).
○​ The ISA is the fundamental "vocabulary" of the processor (e.g., x86 vs. ARM). The ISA
dictates everything. It defines what instructions are available (IC), how complex each
instruction is (CPI), and how fast the clock can be run for those instructions (CT).

Module 5: A Flawed Metric (The "MIPS" Pitfall)


You might hear people talk about performance using MIPS (Million Instructions Per
Second).

MIPS = Instruction Count / (Execution Time × 10⁶)

Since faster computers have less execution time, a faster computer will have a higher MIPS
rating. It's also related to our other metrics:

MIPS = Clock Rate / (CPI × 10⁶)

So, why is MIPS a flawed metric?


1.​ It Ignores Instruction Capability: MIPS tells you how many instructions were run, not
how much work they did. A computer that uses complex, powerful instructions (low IC)
might look "slower" (low MIPS) than a computer that uses millions of simple, weak
instructions (high MIPS) to do the same task.
2.​ It's Useless for Comparing Different ISAs: You cannot compare the MIPS rating of an
ARM processor (in your phone) and an x86 processor (in your laptop). They have
different instruction sets, so their "Instruction Counts" for the same program are
completely different.
3.​ It Varies From Program to Program: Even on the same computer, one program might
have a CPI of 1.5 and another might have a CPI of 3.0. This means the MIPS rating for the
same computer will change depending on what program you're running!
Example: Why MIPS is Misleading

Let's look at two computers running the same program.

●​ Computer A:
○​ Instruction Count: 1 billion
○​ Clock Rate: 4 GHz
○​ CPI: 1.0
●​ Computer B:
○​ Instruction Count: 1.5 billion (a different compiler that favors simpler instructions)
○​ Clock Rate: 4 GHz
○​ CPI: 0.8
Question: Which computer is faster? Let's check MIPS vs. Execution Time.
1.​ MIPS Calculation:
○​ MIPS(A) = (4 × 10⁹) / (1.0 × 10⁶) = 4000 MIPS
○​ MIPS(B) = (4 × 10⁹) / (0.8 × 10⁶) = 5000 MIPS
○​ Conclusion by MIPS: Computer B looks faster.
2.​ Execution Time Calculation (The Right Way):
○​ Time(A) = (1 × 10⁹) × 1.0 / (4 × 10⁹) = 0.25 seconds
○​ Time(B) = (1.5 × 10⁹) × 0.8 / (4 × 10⁹) = 0.30 seconds
○​ Conclusion by Time: Computer A is actually faster.
MIPS rewards Computer B for executing more (but simpler) instructions per second, even
though B takes longer to finish the same task. This is exactly the pitfall: only execution
time tells you which computer really wins.
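A short Python sketch makes the pitfall concrete. The machine numbers below are illustrative, chosen so that the MIPS ranking and the execution-time ranking disagree:

```python
def mips(clock_rate, cpi):
    return clock_rate / (cpi * 10**6)

def exec_time(instr_count, cpi, clock_rate):
    return instr_count * cpi / clock_rate

# A: 1e9 instructions, CPI 1.0.  B: 1.5e9 simpler instructions, CPI 0.8.
a_mips, a_time = mips(4e9, 1.0), exec_time(1e9, 1.0, 4e9)
b_mips, b_time = mips(4e9, 0.8), exec_time(1.5e9, 0.8, 4e9)

print(a_mips, b_mips)   # 4000.0 5000.0 -> B "looks" faster by MIPS
print(a_time, b_time)   # 0.25 0.3     -> A actually finishes first
```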

A Quick Note on MFLOPS:


For scientific computing (which uses a lot of "floating-point" math), a more relevant metric is
MFLOPS (Million Floating-Point Operations Per Second). This is more useful than MIPS in that
specific context but has similar limitations.
Module 6: How to Improve Performance (Amdahl's Law)
When you want to improve a system's performance, a key design principle is to:

"Favor the frequent case."

It's more effective to make a common operation 2x faster than to make a rare operation 100x
faster.

Amdahl's Law is the formal name for this concept. It states that the performance
improvement you get from a faster mode is limited by the fraction of the program that can
actually use that faster mode.

The Formula for Amdahl's Law


The overall "Speedup" of a system can be calculated as:

Speedup = 1 / [ (1 - f) + (f / P) ]
Where:
●​ f = The fraction of the program that can be improved.
●​ P = The improvement factor for that fraction (e.g., "2x faster" means P = 2).
●​ (1 - f) = The fraction of the program that is unaffected by the improvement.
Example 1: Floating-Point Improvement
●​ Given: 10% of a program's instructions are floating-point.
●​ Goal: We make the floating-point operations 2 times faster.
●​ Question: What is the overall speedup for the entire program?
1.​ Identify variables:
○​ Fraction affected, f = 0.1 (or 10%)
○​ Improvement factor, P = 2
○​ Fraction unaffected, (1 - f) = 0.9 (or 90%)
2.​ Calculate:
○​ Speedup = 1 / [ 0.9 + (0.1 / 2) ]
○​ Speedup = 1 / [ 0.9 + 0.05 ]
○​ Speedup = 1 / 0.95 = 1.053
Answer: Even though we made 10% of the program twice as fast, the overall speedup is only
about 5.3%. This shows that improving a small part of the system gives only a small overall
benefit.
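Amdahl's Law is easy to turn into a one-line Python function (the name `amdahl_speedup` is ours):

```python
def amdahl_speedup(f, p):
    """Overall speedup when fraction f of the work gets p times faster."""
    return 1 / ((1 - f) + f / p)

print(round(amdahl_speedup(0.1, 2), 3))      # prints: 1.053
# Even an infinitely fast improvement is capped near 1 / (1 - f):
print(round(amdahl_speedup(0.1, 10**9), 3))  # prints: 1.111
```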

Example 2: How Much Improvement is Needed?


●​ Given: A program runs in 100 seconds. 80 of those seconds are spent on multiplication
operations.
●​ Goal: We want the entire program to run 4 times faster (i.e., in 25 seconds).
●​ Question: How many times faster must the multiplication operation become?
1.​ Identify variables:
○​ Fraction affected, f = 80 / 100 = 0.8 (or 80%)
○​ Desired overall Speedup = 4
○​ We need to find P (the improvement factor for multiplication).
2.​ Calculate:
○​ Speedup = 1 / [ (1 - f) + (f / P) ]
○​ 4 = 1 / [ (1 - 0.8) + (0.8 / P) ]
○​ 4 = 1 / [ 0.2 + (0.8 / P) ]
○​ Now, solve for P:
○​ 1/4 = 0.2 + (0.8 / P)
○​ 0.25 = 0.2 + (0.8 / P)
○​ 0.05 = 0.8 / P
○​ P = 0.8 / 0.05 = 16
Answer: To make the whole program 4x faster, the multiplication operations (which make up
80% of the work) must become 16 times faster.
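Solving the formula for P gives P = f / (1/Speedup - (1 - f)), which we can sketch in Python (the helper name is ours):

```python
def required_improvement(f, target_speedup):
    """Solve Speedup = 1 / ((1-f) + f/P) for P.

    Only solvable when 1/target_speedup > (1 - f); otherwise the
    target exceeds Amdahl's limit of 1 / (1 - f).
    """
    return f / (1 / target_speedup - (1 - f))

print(round(required_improvement(0.8, 4)))  # prints: 16
```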
Module 7: How to Compare Computers Fairly (SPEC Benchmarks)
Since MIPS is flawed and simple examples aren't realistic, how does the industry actually
compare the performance of different computers?

They use benchmarks. A benchmark is a program, or a set of programs, designed to test the
performance of a computer.

The most common is the SPEC (Standard Performance Evaluation Corporation) benchmark suite.
●​ SPEC provides a set of real-world programs, not tiny "toy" examples.
●​ These programs are CPU-bound, meaning they focus on testing the CPU, not I/O or
other factors.
●​ They are split into categories like CINT (integer programs) and CFP (floating-point
programs).

The SPEC Ratio


When you run a SPEC benchmark, you don't just get a "time." Instead, you get a SPEC Ratio.

SPEC Ratio = (Reference Time) / (Your Execution Time)

●​ Reference Time: The time it took a standard "base" machine (chosen by SPEC) to run
the program.
●​ Your Execution Time: The time it took your computer to run the same program.
This means a higher SPEC Ratio is better. If your SPEC Ratio is 2.83, it means your computer
ran the program 2.83 times faster than the reference machine.

Summarizing with Geometric Mean


The SPEC suite contains many programs (e.g., 10 integer programs). How do you combine
those 10 different SPEC Ratios into a single "overall" score?

You might think to just take the average (the arithmetic mean), but this is misleading. Instead,
SPEC uses the Geometric Mean.

Geometric Mean = n-th root of (Score1 × Score2 × ... × Score_n)

Why use the Geometric Mean?


The geometric mean has a special property: the ratio of the geometric means of two
computers is equal to the geometric mean of their performance ratios.
This sounds complex, but it means that the final score is consistent and fair. It doesn't
matter what "reference machine" SPEC chose; the relative comparison between two
computers will always be the same. It prevents a massive score on one program from skewing
the entire result.
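A small Python sketch illustrates this property with made-up SPEC ratios for two hypothetical machines:

```python
import math

def geometric_mean(ratios):
    return math.prod(ratios) ** (1 / len(ratios))

machine_x = [2.0, 8.0]   # made-up SPEC ratios vs. some reference machine
machine_y = [4.0, 4.0]

gx, gy = geometric_mean(machine_x), geometric_mean(machine_y)

# Pick a reference machine that is twice as fast: every ratio halves,
# but the X-vs-Y comparison is unchanged.
gx2 = geometric_mean([r / 2 for r in machine_x])
gy2 = geometric_mean([r / 2 for r in machine_y])
print(gx / gy, gx2 / gy2)   # prints: 1.0 1.0
```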

MIPS Architecture
Introduction: Computer Languages and MIPS Context
Modern computers cannot understand high-level programming languages (like Python or C)
directly. How does a computer execute a simple line of code? How do programming
languages translate to hardware operations?

To bridge this gap, we'll look at MIPS assembly language. This reveals how computers
process data, perform arithmetic, and manage memory. By learning about MIPS, you'll see
"under the hood" and gain a practical sense of how CPUs operate.

Module 1: MIPS Instruction Set Architecture


1. What is Assembly? Why MIPS?
Definition and Mapping

Assembly language is a set of human-readable instructions that are directly mapped to binary
machine code. One assembly instruction translates to exactly one machine instruction (a 1:1
mapping).

Analogy: Think of assembly as giving direct, step-by-step commands to a robot


chef (e.g., "pick up knife," "move knife to onion," "press down"). A high-level
language is like handing the chef a recipe book (e.g., "dice onions").

Key Characteristics

Assembly instructions are "primitive"—they do basic things (add, subtract, load data), not full
tasks like "sort an array." High-level concepts like loops are built by stringing together many
simple assembly instructions.

2. MIPS Architecture: Overview and Naming


What is MIPS?
MIPS stands for Microprocessor without Interlocked Pipeline Stages. (It does not mean Million
Instructions Per Second in this context!)

Classification and Context


●​ Class: It is a RISC (Reduced Instruction Set Computer) architecture. RISC focuses
on simplicity, speed, and keeping instructions regular.
●​ Context: MIPS is used heavily in teaching and in embedded systems (like routers or
printers).
●​ Other Architectures: You are likely more familiar with Intel x86 (in PCs) and ARM (in
mobiles/tablets). Each has its own different assembly language.

Analogy: Architectures are like dialects of a language. Each processor brand has
its own dialect for giving instructions.

3. MIPS Registers & Register File


What is a Register?

Definition: Registers are tiny, extremely fast memory cells inside the CPU that hold data
temporarily during execution.

MIPS Register File Structure


●​ MIPS has 32 registers (named $0 to $31).
●​ Each register is 32 bits wide.
●​ The hardware structure allows the CPU to read from two registers and write to one
register all in a single clock cycle, enabling very rapid calculations.
●​ The number is limited to 32 to keep the CPU fast. More registers = more complexity
and delay.

Real-world Analogy: Registers are like a handful of labeled baskets on a chef's


kitchen counter for holding measured ingredients. The chef (CPU) quickly grabs
from these baskets (registers) to cook (execute instructions). Accessing main
memory would be like the chef having to walk to the pantry for every single
ingredient.

Common MIPS Register Roles

Each 32-bit register has a number (0-31) and a symbolic name for assembly programming.

| Name(s) | Number | Usage |
| :--- | :--- | :--- |
| $zero | 0 | Always contains the value 0 (read-only) |
| $s0 - $s7 | 16-23 | "Saved" variables (for long-term use) |
| $t0 - $t9 | 8-15, 24-25 | "Temporary" variables (for short-term use) |
| $a0 - $a3 | 4-7 | Function arguments |
| $sp | 29 | Stack pointer (manages memory) |
| $ra | 31 | Return address (for functions) |

4. MIPS Instruction Types (R-Type, I-Type)


All MIPS instructions are exactly 32 bits long. This "simplicity favors regularity" design
principle makes the hardware simple and predictable. We'll focus on two main formats.

R-Type (Register) Instructions


Purpose: Perform arithmetic or logic (e.g., add, sub) entirely within registers.

Format:

| opcode (6b) | rs (5b) | rt (5b) | rd (5b) | shamt (5b) | funct (6b) |
| :--- | :--- | :--- | :--- | :--- | :--- |
| 000000 | 1st source | 2nd source | Destination | shift amt | function code |

●​ opcode: Always 0 for R-Type instructions.


●​ rs: The 5-bit number of the 1st source register.
●​ rt: The 5-bit number of the 2nd source register.
●​ rd: The 5-bit number of the destination register (where the result is stored).
●​ shamt: "Shift amount" (used only for shift commands).
●​ funct: Specifies the exact operation (e.g., "add", "subtract").
Analogy: Like mixing baskets of fruit: "Take apples from basket $s1 and basket
$s2, and put the mix into basket $s0."

Step-by-Step C to MIPS Example

C Code:

// A = B + C;

MIPS Equivalent:

(Assume A is in $s0, B is in $s1, C is in $s2)

add $s0, $s1, $s2 // Result = Value in $s1 + Value in $s2

Binary Encoding for add $s0, $s1, $s2

(Using register numbers: $s0=16, $s1=17, $s2=18. The funct code for add is 100000).

| Field | opcode | rs (17) | rt (18) | rd (16) | shamt | funct |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| Binary | 000000 | 10001 | 10010 | 10000 | 00000 | 100000 |
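The field packing can be sketched in Python (the `encode_r_type` helper is ours, for illustration only):

```python
def encode_r_type(rs, rt, rd, shamt=0, funct=0b100000, opcode=0):
    """Pack R-type fields into one 32-bit MIPS word (teaching sketch)."""
    return (opcode << 26) | (rs << 21) | (rt << 16) \
         | (rd << 11) | (shamt << 6) | funct

# add $s0, $s1, $s2  ->  rs = $s1 (17), rt = $s2 (18), rd = $s0 (16)
word = encode_r_type(rs=17, rt=18, rd=16)
print(f"{word:032b}")   # prints: 00000010001100101000000000100000
```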

I-Type (Immediate) Instructions


Purpose: Work with memory (load/store) or perform arithmetic with a constant value (an
"immediate").

Format:

| opcode (6b) | rs (5b) | rt (5b) | immediate (16b) |
| :--- | :--- | :--- | :--- |
| (e.g., lw) | Base register | Target register | Constant/Offset |

●​ opcode: Operation code (e.g., 100011 for lw, "load word").


●​ rs: Base register (usually holds a starting memory address).
●​ rt: Target register (gets data from memory, or provides data to memory).
●​ immediate: A 16-bit signed constant value.

Step-by-Step Memory Example

This is how we access data in memory (like an array).

C Code:

(Assume A is an array, $s3 holds the base address of A. H is in $s2. We want to access A[8].)

// A[8] = A[8] + H;

MIPS Equivalent:

(Note: MIPS addresses are in bytes. Since each word is 4 bytes, element A[8] (index 8) sits at
an offset of 8 * 4 = 32 bytes from the base address.)

# $s3 = base address of A, $s2 = H


lw $t0, 32($s3) # Load Word: $t0 = mem[$s3 + 32] (gets A[8])
add $t0, $t0, $s2 # $t0 = $t0 + H
sw $t0, 32($s3) # Store Word: mem[$s3 + 32] = $t0 (saves result)

Real-world Analogy: Memory is like a long row of lockers (numbered boxes).


Each address is a "locker number." To get to item 8 in an array (where each item
takes 4 lockers), you go to the start of the array (base address) and move forward
8 * 4 = 32 lockers (the offset).
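Here's the lw/add/sw sequence mimicked in Python, modeling memory as a dictionary from byte addresses to words (the base address and data values are made up for illustration):

```python
# Model memory as a dict from byte address to 32-bit word. The array A
# starts at (made-up) byte address 1000 and A[8] holds 7; H = 5.
A_values = [0] * 8 + [7]                     # A[0]..A[8]
memory = {1000 + 4 * i: v for i, v in enumerate(A_values)}
s3, s2 = 1000, 5                             # $s3 = base of A, $s2 = H

t0 = memory[s3 + 32]       # lw  $t0, 32($s3)  -> loads A[8] (7)
t0 = t0 + s2               # add $t0, $t0, $s2
memory[s3 + 32] = t0       # sw  $t0, 32($s3)  -> stores A[8] back

print(memory[1000 + 4 * 8])  # prints: 12
```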

5. Conceptual Checkpoints & Reflection


●​ All MIPS instructions are a fixed 32 bits—this makes the hardware simple and
predictable.
●​ Registers, not memory, are used for all arithmetic operations (add, sub, etc.) for speed.
●​ This design follows two key principles: "Simplicity favors regularity" and "Smaller is
faster."
●​ Unique and special registers (like $zero and $sp) have fixed roles that are essential for
the system.
1. Introduction to Branch Instructions
In MIPS assembly language, branch instructions are fundamental tools used to make
decisions and change the control flow of a program. Think of these instructions as the
assembly language equivalent of if statements combined with a goto statement in
higher-level programming languages.

The Two Main Branch Instructions

MIPS provides two main conditional branch instructions:

●​ BNE (Branch Not Equal): "If register A ≠ register B, then go to label"


●​ BEQ (Branch on Equal): "If register A = register B, then go to label"

How Branch Instructions Work in Detail

These instructions work by comparing the contents of two registers. Here's the step-by-step
process:

1.​ Comparison Phase: The processor compares the values stored in two specified
registers.
2.​ Decision Phase: Based on the comparison result:
○​ If the condition is true (equal for BEQ, not equal for BNE), the program
branches to the target address.
○​ If the condition is false, the program continues to the next sequential
instruction.
3.​ Branch Execution: When branching occurs, the program jumps to the address
specified by a label.

Instruction Format Details

Both BNE and BEQ follow the I-type instruction format, which has the following structure:

●​ 6 bits: Operation code (opcode)


●​ 5 bits: First register to compare
●​ 5 bits: Second register to compare
●​ 16 bits: Immediate value (offset) used to calculate the branch target address

This immediate value is crucial for determining where exactly the program should branch to
when the condition is met.

2. Branch Addressing and Calculation (PC-Relative Addressing)
Understanding the Branch Address Calculation

The destination address for a branch is calculated using a 16-bit offset value provided by the
instruction. This offset is a signed value, which means it can represent both positive and
negative numbers. This capability enables the program to branch both forward and backward
within the code.

Why Signed Offsets Matter

The signed nature of the offset is crucial because:

●​ Positive offsets: Allow branching to instructions that appear later in the program
(forward branching).
●​ Negative offsets: Allow branching to instructions that appear earlier in the program
(backward branching, useful for loops).

Branch Range Limitations Explained

However, there's an important limitation: using a 16-bit offset restricts the branching range to
±2^15 words. Let's break this down:

●​ 16 bits can represent 2^16 = 65,536 different values.


●​ Since it's signed, we get a range of -32,768 to +32,767 words.
●​ This means the program can only branch within a window of roughly ±32,768 words (±2^15)
of the instruction after the branch.
●​ In terms of bytes: roughly ±131,072 bytes (since each word is 4 bytes).

PC-Relative Addressing Mechanism in Detail

MIPS uses PC-relative addressing to calculate the branch address. Here's the detailed
process:

1.​ Current State: The Program Counter (PC) holds the address of the current
instruction being executed.
2.​ PC Increment: Before executing the branch, the PC is automatically incremented by 4
(pointing to the next instruction).
3.​ Address Calculation: Since each MIPS instruction is exactly 4 bytes, the branch
address is calculated using this formula:

Branch Address = (PC + 4) + (4 × Offset)

Why This Formula Works:

●​ (PC + 4): This gives us the address of the instruction that would normally execute
next.
●​ (4 × Offset): The offset is in words, so we multiply by 4 to convert to bytes.
●​ Addition: Adding these gives us the final branch target address.

This calculation method allows the branch to target addresses within a range of ±2^15 words
relative to the current instruction, providing flexibility for both forward and backward
branching within the allowed range.
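The formula can be sketched as a tiny Python function (the example PC values are made up):

```python
def branch_target(pc, offset_words):
    """PC-relative branch: target = (PC + 4) + 4 * signed word offset."""
    return (pc + 4) + 4 * offset_words

print(hex(branch_target(0x1000, 3)))    # prints: 0x1010 (forward branch)
print(hex(branch_target(0x1000, -2)))   # prints: 0xffc  (backward branch)
```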

3. Jump Instructions and Addressing (J-Type Instruction)
Understanding Unconditional Jump Instructions

MIPS supports unconditional jumps using the J-type instruction. Unlike branch instructions
that depend on comparing register values, the jump instruction (j) does not involve any
register comparison. Instead, it directly specifies a target address where the program should
continue execution, making it truly "unconditional."

Key Differences from Branch Instructions


| Aspect | Branch Instructions | Jump Instructions |
| :--- | :--- | :--- |
| Condition | Requires register comparison | No condition needed |
| Format | I-type (16-bit offset) | J-type (26-bit field) |
| Range | ±32,768 words | 256 MB region |
| Addressing | PC-relative | Pseudo-direct |
J-Type Instruction Format Breakdown

The J-type instruction format differs significantly from the branch instruction format:

●​ 6 bits: Operation code (opcode) - identifies this as a jump instruction.


●​ 26 bits: Jump offset value (this is much larger than the 16-bit offset in branches).
●​ The offset value is encoded in words, not bytes.
●​ No register fields are needed since there's no comparison.

Address Construction Process

The jump address construction happens in several steps:

1.​ Extract Offset: Take the 26-bit value from the instruction.
2.​ Shift Operation: Left-shift this 26-bit value by 2 bits (equivalent to multiplying by 4).
○​ This creates a 28-bit address.
○​ The shift converts from words to bytes.
3.​ Address Combination: Concatenate this 28-bit address with the upper 4 bits of the
current PC.
○​ Upper 4 bits of PC + 28 bits from instruction = 32-bit target address

Pseudo-Direct Addressing Explained

This addressing method is called pseudo-direct addressing because:

●​ It's "pseudo" (not truly direct) because it doesn't provide the complete 32-bit address.
●​ It's "direct" because it directly specifies most of the target address bits.
●​ The remaining bits come from the current PC.
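The address construction can be sketched in Python (the function name and example values are ours):

```python
def jump_target(pc, target_field):
    """Pseudo-direct: 26-bit field << 2, glued under the PC's top 4 bits."""
    assert 0 <= target_field < 2**26
    return (pc & 0xF0000000) | (target_field << 2)

print(hex(jump_target(0x40001000, 0x100)))  # prints: 0x40000400
```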

Address Space and Partitioning

The pseudo-direct addressing scheme has specific characteristics:

●​ Total Range: Allows jumps within a 256 MB address space (2^28 bytes).
●​ Partitioning: The full 4 GB (2^32-byte) address space is divided into 16 fixed-size
partitions, selected by the upper 4 bits of the PC.
●​ Each Partition: 4 GB ÷ 16 = 256 MB, which matches the 2^28-byte reach of a jump.
●​ Limitation: Jumps cannot cross partition boundaries with the standard j instruction.

Handling Cross-Partition Jumps

When a jump needs to cross an address partition:

●​ The standard j instruction cannot be used.


●​ It must be replaced with a JR (Jump Register) instruction.
●​ jr uses a full 32-bit address stored in a register.
●​ This provides complete addressing flexibility but requires an additional register.

4. Addressing Modes in MIPS


MIPS uses various addressing modes to specify how operands (data) are retrieved from
memory or registers. The addressing mode determines exactly how the operand is specified
within the instruction and where the processor should look for the data.

4.1 Immediate Addressing Mode

●​ Definition: In immediate addressing, the operand is a constant value directly


embedded within the instruction itself.
●​ Key Characteristics:
○​ The data value is part of the instruction.
○​ No memory access is required to get the operand.
○​ Fastest addressing mode since no additional memory reads are needed.
○​ Limited to small constant values (typically 16-bit in MIPS).

Example:​
addi $t0, $t1, 100 # Add immediate: $t0 = $t1 + 100

●​ In this case, 100 is the immediate operand - it's directly specified in the instruction.

4.2 Register Addressing Mode

●​ Definition: In register addressing, the operand is stored in a processor register.


●​ Key Characteristics:
○​ The operand value is stored in a register.
○​ The instruction specifies which register contains the data.
○​ Very fast access since registers are built into the processor.
○​ No memory access required.

Example:​
add $t0, $t1, $t2 # Add: $t0 = $t1 + $t2

●​ Here, both $t1 and $t2 are register operands - the actual values are stored in these
registers.

4.3 Base Addressing Mode (Also called Base+Offset)

●​ Definition: In base addressing, the operand is located at a memory address computed


by adding a register value (base) to an immediate constant (offset).
●​ Key Characteristics:
○​ Memory address = Base Register + Immediate Offset
○​ Commonly used for accessing arrays and data structures.
○​ The base register typically holds a starting address.
○​ The offset allows access to different elements relative to that base.

Usage Examples:​
lw $t0, 12($t1) # Load word: $t0 = memory[$t1 + 12]
sw $t0, 8($sp) # Store word: memory[$sp + 8] = $t0

●​ Real-world Application: If $t1 points to the start of an array, 12($t1) accesses the
element at position 3 (since each word is 4 bytes: 12 ÷ 4 = 3).

4.4 PC Relative Addressing

●​ Definition: PC relative addressing is used in branch instructions where the target


address is computed relative to the current Program Counter (PC).
●​ Detailed Process:
○​ Current PC: Points to the address of the branch instruction.
○​ PC Increment: PC is automatically incremented by 4 (next instruction address).
○​ Offset Application: The signed offset from the instruction is added to the
incremented PC.
○​ Final Address: This gives us the branch target address.
●​ Formula:​
Branch Target = (PC + 4) + (Offset × 4)
●​ Why This Works:
○​ Allows position-independent code (the same code works regardless of where
it's loaded in memory).
○​ Enables both forward and backward branching using positive and negative
offsets.
○​ Efficient for implementing loops and conditional statements.

4.5 Pseudo Direct Addressing

●​ Definition: Pseudo direct addressing is used in jump instructions where most of the
target address is directly specified in the instruction.
●​ Detailed Address Construction:
○​ Extract 26 bits: Take the jump offset from the instruction (26 bits).
○​ Left Shift: Shift these 26 bits left by 2 positions (multiply by 4).
■​ This creates a 28-bit address value.
■​ The shift converts word addresses to byte addresses.
○​ Combine with PC: Take the upper 4 bits from the current PC.
○​ Final Address: Concatenate PC[31:28] + 28-bit shifted offset.
●​ Address Space Details:
○​ Reach of one jump: 2^28 = 268,435,456 bytes = 256 MB.
○​ Partition structure: the full 4 GB address space is divided into 16 partitions of
256 MB each, selected by PC[31:28].
○​ Partition limitation: Cannot jump between partitions using the standard j
instruction.
●​ Cross-Partition Solution:
○​ When jumping across partitions:
○​ Use jr (Jump Register) instruction instead.
○​ Store the full 32-bit target address in a register.
○​ jr provides complete addressing flexibility without partition restrictions.
