Computer Systems and Performance

Module 1: What is Computer Performance?


Why Should We Measure Performance?
Understanding performance is key to making smart decisions about computers. It helps us
answer critical questions like:
●​ Why is one computer better than another for a specific task (like gaming vs. office work)?
●​ Is my slow computer a hardware problem or a software problem?
●​ If I want to speed up my program, what should I focus on?
●​ Do I need a new computer, or just a new operating system?

The Airplane Analogy: Performance Isn't Just One Thing


Before we dive in, let's use an analogy. Imagine you have four different airplanes:

| Airplane | Passenger Capacity | Cruising Speed | Cruising Range |
| :--- | :--- | :--- | :--- |
| Boeing 787 | ~290 | 567 mph | 8,786 miles |
| Concorde | ~100 | 1,354 mph | 4,488 miles |
| Airbus 380 | ~550 | 561 mph | 9,444 miles |
| Cessna 172 | 4 | 141 mph | 790 miles |

Which airplane has the "best performance"?


●​ For speed, it's the Concorde.
●​ For passenger throughput (carrying the most people), it's the Airbus 380.
●​ For range, it's the Airbus 380.
This shows us that performance can't be defined by a single number. We need to look at
different metrics depending on our goals.

The Two Key Metrics: Response Time vs. Throughput


In computing, we generally focus on two main metrics:
1.​ Response Time (or Execution Time):
○​ What it is: The total time it takes to complete a single task, from start to finish.
○​ Example: How long it takes to open a web browser, load a level in a game, or run a
scientific calculation.
○​ Goal: You want lower response time.
2.​ Throughput:
○​ What it is: The total amount of work (tasks, transactions) completed in a given unit
of time.
○​ Example: How many customer transactions a bank's server can process per minute,
or how many videos a server can stream simultaneously.
○​ Goal: You want higher throughput.
How are they related?
●​ Improving Response Time: If you replace your computer's processor with a faster one,
your programs will run faster. This reduces response time and, as a side effect, also
improves throughput (since you can get more tasks done in the same amount of time).
●​ Improving Throughput: If you add a second processor to a server, you can run two
tasks at once. This doubles the throughput. However, it doesn't change the response
time for any single task (a task still takes the same amount of time to run).

Module 2: How We Measure Execution Time (The First Performance Equation)
When we talk about "performance," we most often mean Response Time. But what is the
total execution time of a program?

Total Time = CPU Time + Memory Access Time + I/O Wait Time + ...

This is too complicated. For this guide, we'll focus on CPU Time: the time the processor (CPU)
spends only on processing your task. This doesn't include time spent waiting for your hard
drive, the internet, or running other programs.

The Clock Cycle


Every computer runs on a fixed "clock" that beats at a constant rate, like a metronome.
Hardware events happen in "ticks" of this clock.
●​ Clock Period (or Clock Cycle Time): The duration of one clock tick (e.g., 0.5
nanoseconds).
●​ Clock Rate: The number of clock ticks (cycles) per second (e.g., 2 GigaHertz (GHz),
which is 2 billion cycles per second).
●​ Relationship: Clock Rate = 1 / Clock Period

The First Performance Equation


This gives us our first and most basic performance equation:

CPU Execution Time = (Number of CPU Clock Cycles) × (Clock Cycle Time)
Or, using Clock Rate:

CPU Execution Time = (Number of CPU Clock Cycles) / (Clock Rate)

This tells us that to improve performance (i.e., reduce execution time), we have two options:
1.​ Reduce the number of clock cycles the program needs.
2.​ Reduce the clock cycle time (which means increasing the clock rate).
Example: Upgrading a Computer
●​ Computer A: Runs a program in 10 seconds with a 2 GHz clock.
●​ We want to build Computer B that can run the same program in 6 seconds.
●​ Constraint: Due to design changes, Computer B will require 1.2 times more clock
cycles for this program.
Question: What clock rate must Computer B have?
1.​ Find cycles for A:
○​ Cycles(A) = CPU Time × Clock Rate
○​ Cycles(A) = 10 s × (2 × 10⁹ cycles/s) = 20 × 10⁹ cycles
2.​ Find cycles for B:
○​ Cycles(B) = Cycles(A) × 1.2
○​ Cycles(B) = (20 × 10⁹) × 1.2 = 24 × 10⁹ cycles
3.​ Find clock rate for B:
○​ Clock Rate(B) = Cycles(B) / CPU Time(B)
○​ Clock Rate(B) = (24 × 10⁹ cycles) / 6 s = 4 × 10⁹ cycles/s, or 4 GHz
Answer: To run the program in 6 seconds, Computer B must have a 4 GHz clock.
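We can sanity-check this arithmetic with a short Python sketch (the variable names are ours, chosen for this example):

```python
# Computer A runs the program in 10 s at 2 GHz; Computer B needs
# 1.2x the cycles and must finish in 6 s.
time_a = 10                       # seconds
rate_a = 2 * 10**9                # 2 GHz, in cycles per second

cycles_a = time_a * rate_a        # cycles = time x rate = 20e9
cycles_b = cycles_a * 12 // 10    # 1.2x the cycles = 24e9 (kept as an int)
rate_b = cycles_b / 6             # rate = cycles / time

print(rate_b / 10**9, "GHz")      # prints: 4.0 GHz
```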

Module 3: Digging Deeper (The "CPI" Performance Equation)
The first equation is good, but it has a problem. It assumes we just know the "Number of CPU
Clock Cycles." How do we figure that out?

We can calculate it based on the number of instructions in our program. But not all
instructions are equal. An instruction that adds two numbers might be very fast, while one
that divides two numbers might be very slow.

CPI: Cycles Per Instruction


We use a metric called CPI (Cycles Per Instruction). This is the average number of clock
cycles required to execute one instruction for a given program.

The Classic Performance Equation (The Second One)


Now we can create a much more useful performance equation:

CPU Clock Cycles = (Instruction Count) × (Average CPI)

By substituting this into our first equation, we get the most important performance
equation in this guide:

CPU Execution Time = (Instruction Count) × (Average CPI) × (Clock Cycle Time)

This equation tells us that the speed of a program depends on three factors:
1.​ Instruction Count (IC): The total number of instructions the program executes.
2.​ Average CPI: The average number of cycles each instruction takes.
3.​ Clock Cycle Time (CT): The speed of the clock.
Example: Comparing Two Computers

Let's compare two computers running the same program (so, the same Instruction Count, "I").
●​ Computer A: Clock Cycle Time = 10 ns, Average CPI = 2.0
●​ Computer B: Clock Cycle Time = 20 ns, Average CPI = 1.2
Question: Which computer is faster?
1.​ CPU Time for A:
○​ Time(A) = I × CPI(A) × CT(A)
○​ Time(A) = I × 2.0 × 10 ns = I × 20 ns
2.​ CPU Time for B:
○​ Time(B) = I × CPI(B) × CT(B)
○​ Time(B) = I × 1.2 × 20 ns = I × 24 ns
Answer: Computer A is faster, because its total execution time (20 ns per instruction) is less
than Computer B's (24 ns per instruction).
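The same comparison in Python (a small sketch; the helper name is ours):

```python
# Time = Instruction Count x CPI x Clock Cycle Time. Both machines run
# the same program, so the instruction count cancels out and we only
# need to compare CPI x cycle time.
def time_per_instruction(cpi, cycle_time_ns):
    return cpi * cycle_time_ns

a = time_per_instruction(2.0, 10)   # Computer A: 20.0 ns per instruction
b = time_per_instruction(1.2, 20)   # Computer B: 24.0 ns per instruction
print("A is faster" if a < b else "B is faster")  # prints: A is faster
```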

Calculating Average CPI (Weighted Average)


How do we find that "average" CPI? We can calculate a weighted average if we know the
different types of instructions.

Example: Comparing Code Sequences

A program uses three classes of instructions:


●​ Class A: CPI = 1 cycle
●​ Class B: CPI = 2 cycles
●​ Class C: CPI = 3 cycles
We are testing two different compiler "sequences" for the same task.
Sequence 1:
●​ Class A: 2 instructions
●​ Class B: 1 instruction
●​ Class C: 2 instructions
Sequence 2:
●​ Class A: 4 instructions
●​ Class B: 1 instruction
●​ Class C: 1 instruction
Question: Which sequence is faster?
1.​ Analyze Sequence 1:
○​ Total Cycles = (2 × 1) + (1 × 2) + (2 × 3) = 2 + 2 + 6 = 10 cycles
○​ Total Instructions = 2 + 1 + 2 = 5 instructions
○​ CPU Time = 10 cycles × (Clock Cycle Time)
2.​ Analyze Sequence 2:
○​ Total Cycles = (4 × 1) + (1 × 2) + (1 × 3) = 4 + 2 + 3 = 9 cycles
○​ Total Instructions = 4 + 1 + 1 = 6 instructions
○​ CPU Time = 9 cycles × (Clock Cycle Time)
Answer: Sequence 2 is faster because it takes only 9 cycles, while Sequence 1 takes 10.

Side Note: Notice that Sequence 2 has a lower average CPI (9 cycles / 6 instructions = 1.5)
than Sequence 1 (10 cycles / 5 instructions = 2.0). Even though it has more instructions, its
smart mix of "cheaper" instructions makes it faster.
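Here's the weighted-average calculation as a small Python sketch (the `totals` helper is ours, not a standard function):

```python
def totals(counts, cpi={"A": 1, "B": 2, "C": 3}):
    """Return (total cycles, average CPI) for an instruction mix."""
    cycles = sum(n * cpi[cls] for cls, n in counts.items())
    instructions = sum(counts.values())
    return cycles, cycles / instructions

seq1 = totals({"A": 2, "B": 1, "C": 2})
seq2 = totals({"A": 4, "B": 1, "C": 1})
print(seq1)  # prints: (10, 2.0)  -- 10 cycles, average CPI 2.0
print(seq2)  # prints: (9, 1.5)   -- fewer cycles despite more instructions
```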

Module 4: What Affects Performance? (The Big Picture)
The classic performance equation Time = Instruction Count × CPI × Clock Cycle Time is
powerful because it shows exactly where each of the four "knobs" below acts on performance.
1.​ Algorithm:
○​ Affects: Instruction Count (IC) and CPI.
○​ A better algorithm (e.g., a "quicksort" vs. a "bubble sort") will accomplish the same
task with far fewer instructions. It might also use simpler instructions, lowering the
CPI.
2.​ Programming Language:
○​ Affects: Instruction Count (IC) and CPI.
○​ A high-level language like Python is easier for humans, but one line of Python might
be translated into 100s of processor instructions (high IC). A language like C gives
the programmer more control, resulting in a lower IC.
○​ Languages with complex features (like data abstraction) may require instructions
that have a higher CPI.
3.​ Compiler:
○​ Affects: Instruction Count (IC) and CPI.
○​ The compiler is the software that translates your programming language (like C) into
processor instructions. A "smart" compiler might find a more efficient way to
translate the code, resulting in a lower IC and/or a better mix of instructions for a
lower CPI. (This is exactly what we saw in the "Code Sequence" example).
4.​ Instruction Set Architecture (ISA):
○​ Affects: Instruction Count (IC), CPI, and Clock Cycle Time (CT).
○​ The ISA is the fundamental "vocabulary" of the processor (e.g., x86 vs. ARM). The ISA
dictates everything. It defines what instructions are available (IC), how complex each
instruction is (CPI), and how fast the clock can be run for those instructions (CT).

Module 5: A Flawed Metric (The "MIPS" Pitfall)


You might hear people talk about performance using MIPS (Million Instructions Per
Second).

MIPS = Instruction Count / (Execution Time × 10⁶)

Since faster computers have less execution time, a faster computer will have a higher MIPS
rating. It's also related to our other metrics:

MIPS = Clock Rate / (CPI × 10⁶)

So, why is MIPS a flawed metric?


1.​ It Ignores Instruction Capability: MIPS tells you how many instructions were run, not
how much work they did. A computer that uses complex, powerful instructions (low IC)
might look "slower" (low MIPS) than a computer that uses millions of simple, weak
instructions (high MIPS) to do the same task.
2.​ It's Useless for Comparing Different ISAs: You cannot compare the MIPS rating of an
ARM processor (in your phone) and an x86 processor (in your laptop). They have
different instruction sets, so their "Instruction Counts" for the same program are
completely different.
3.​ It Varies From Program to Program: Even on the same computer, one program might
have a CPI of 1.5 and another might have a CPI of 3.0. This means the MIPS rating for the
same computer will change depending on what program you're running!
Example: Why MIPS is Misleading

Let's look at two computers running the same program.

●​ Computer A:
○​ Instruction Count: 1 billion
○​ Clock Rate: 4 GHz
○​ CPI: 1.0
●​ Computer B:
○​ Instruction Count: 1.5 billion (a different compiler that favors simpler instructions)
○​ Clock Rate: 4 GHz
○​ CPI: 0.8
Question: Which computer is faster? Let's check MIPS vs. Execution Time.
1.​ MIPS Calculation:
○​ MIPS(A) = (4 × 10⁹) / (1.0 × 10⁶) = 4000 MIPS
○​ MIPS(B) = (4 × 10⁹) / (0.8 × 10⁶) = 5000 MIPS
○​ Conclusion by MIPS: Computer B looks faster.
2.​ Execution Time Calculation (The Right Way):
○​ Time(A) = (1 × 10⁹) × 1.0 / (4 × 10⁹) = 0.25 seconds
○​ Time(B) = (1.5 × 10⁹) × 0.8 / (4 × 10⁹) = 0.30 seconds
○​ Conclusion by Time: Computer A is actually faster.
MIPS rewards Computer B for executing more (but simpler) instructions per second, even
though B takes longer to finish the same task. This is exactly the pitfall: only execution
time tells you which computer really wins.
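A short Python sketch makes the pitfall concrete. The machine numbers below are illustrative, chosen so that the MIPS ranking and the execution-time ranking disagree:

```python
def mips(clock_rate, cpi):
    return clock_rate / (cpi * 10**6)

def exec_time(instr_count, cpi, clock_rate):
    return instr_count * cpi / clock_rate

# A: 1e9 instructions, CPI 1.0.  B: 1.5e9 simpler instructions, CPI 0.8.
a_mips, a_time = mips(4e9, 1.0), exec_time(1e9, 1.0, 4e9)
b_mips, b_time = mips(4e9, 0.8), exec_time(1.5e9, 0.8, 4e9)

print(a_mips, b_mips)   # 4000.0 5000.0 -> B "looks" faster by MIPS
print(a_time, b_time)   # 0.25 0.3     -> A actually finishes first
```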

A Quick Note on MFLOPS:


For scientific computing (which uses a lot of "floating-point" math), a more relevant metric is
MFLOPS (Million Floating-Point Operations Per Second). This is more useful than MIPS in that
specific context but has similar limitations.
Module 6: How to Improve Performance (Amdahl's Law)
When you want to improve a system's performance, a key design principle is to:

"Favor the frequent case."

It's more effective to make a common operation 2x faster than to make a rare operation 100x
faster.

Amdahl's Law is the formal name for this concept. It states that the performance
improvement you get from a faster mode is limited by the fraction of the program that can
actually use that faster mode.

The Formula for Amdahl's Law


The overall "Speedup" of a system can be calculated as:

Speedup = 1 / [ (1 - f) + (f / P) ]
Where:
●​ f = The fraction of the program that can be improved.
●​ P = The improvement factor for that fraction (e.g., "2x faster" means P = 2).
●​ (1 - f) = The fraction of the program that is unaffected by the improvement.
Example 1: Floating-Point Improvement
●​ Given: 10% of a program's instructions are floating-point.
●​ Goal: We make the floating-point operations 2 times faster.
●​ Question: What is the overall speedup for the entire program?
1.​ Identify variables:
○​ Fraction affected, f = 0.1 (or 10%)
○​ Improvement factor, P = 2
○​ Fraction unaffected, (1 - f) = 0.9 (or 90%)
2.​ Calculate:
○​ Speedup = 1 / [ 0.9 + (0.1 / 2) ]
○​ Speedup = 1 / [ 0.9 + 0.05 ]
○​ Speedup = 1 / 0.95 = 1.053
Answer: Even though we made 10% of the program twice as fast, the overall speedup is only
about 5.3%. This shows that improving a small part of the system gives only a small overall
benefit.
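Amdahl's Law is easy to turn into a one-line Python function (the name `amdahl_speedup` is ours):

```python
def amdahl_speedup(f, p):
    """Overall speedup when fraction f of the work gets p times faster."""
    return 1 / ((1 - f) + f / p)

print(round(amdahl_speedup(0.1, 2), 3))      # prints: 1.053
# Even an infinitely fast improvement is capped near 1 / (1 - f):
print(round(amdahl_speedup(0.1, 10**9), 3))  # prints: 1.111
```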

Example 2: How Much Improvement is Needed?


●​ Given: A program runs in 100 seconds. 80 of those seconds are spent on multiplication
operations.
●​ Goal: We want the entire program to run 4 times faster (i.e., in 25 seconds).
●​ Question: How many times faster must the multiplication operation become?
1.​ Identify variables:
○​ Fraction affected, f = 80 / 100 = 0.8 (or 80%)
○​ Desired overall Speedup = 4
○​ We need to find P (the improvement factor for multiplication).
2.​ Calculate:
○​ Speedup = 1 / [ (1 - f) + (f / P) ]
○​ 4 = 1 / [ (1 - 0.8) + (0.8 / P) ]
○​ 4 = 1 / [ 0.2 + (0.8 / P) ]
○​ Now, solve for P:
○​ 1/4 = 0.2 + (0.8 / P)
○​ 0.25 = 0.2 + (0.8 / P)
○​ 0.05 = 0.8 / P
○​ P = 0.8 / 0.05 = 16
Answer: To make the whole program 4x faster, the multiplication operations (which make up
80% of the work) must become 16 times faster.
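Solving the formula for P gives P = f / (1/Speedup - (1 - f)), which we can sketch in Python (the helper name is ours):

```python
def required_improvement(f, target_speedup):
    """Solve Speedup = 1 / ((1-f) + f/P) for P.

    Only solvable when 1/target_speedup > (1 - f); otherwise the
    target exceeds Amdahl's limit of 1 / (1 - f).
    """
    return f / (1 / target_speedup - (1 - f))

print(round(required_improvement(0.8, 4)))  # prints: 16
```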
Module 7: How to Compare Computers Fairly (SPEC Benchmarks)
Since MIPS is flawed and simple examples aren't realistic, how does the industry actually
compare the performance of different computers?

They use benchmarks. A benchmark is a program, or a set of programs, designed to test the
performance of a computer.

The most common is the SPEC (Standard Performance Evaluation Corporation) benchmark suite.
●​ SPEC provides a set of real-world programs, not tiny "toy" examples.
●​ These programs are CPU-bound, meaning they focus on testing the CPU, not I/O or
other factors.
●​ They are split into categories like CINT (integer programs) and CFP (floating-point
programs).

The SPEC Ratio


When you run a SPEC benchmark, you don't just get a "time." Instead, you get a SPEC Ratio.

SPEC Ratio = (Reference Time) / (Your Execution Time)

●​ Reference Time: The time it took a standard "base" machine (chosen by SPEC) to run
the program.
●​ Your Execution Time: The time it took your computer to run the same program.
This means a higher SPEC Ratio is better. If your SPEC Ratio is 2.83, it means your computer
ran the program 2.83 times faster than the reference machine.

Summarizing with Geometric Mean


The SPEC suite contains many programs (e.g., 10 integer programs). How do you combine
those 10 different SPEC Ratios into a single "overall" score?

You might think to just take the average (the arithmetic mean), but this is misleading. Instead,
SPEC uses the Geometric Mean.

Geometric Mean = n-th root of (Score1 × Score2 × ... × Score_n)

Why use the Geometric Mean?


The geometric mean has a special property: the ratio of the geometric means of two
computers is equal to the geometric mean of their performance ratios.
This sounds complex, but it means that the final score is consistent and fair. It doesn't
matter what "reference machine" SPEC chose; the relative comparison between two
computers will always be the same. It prevents a massive score on one program from skewing
the entire result.
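A small Python sketch illustrates this property with made-up SPEC ratios for two hypothetical machines:

```python
import math

def geometric_mean(ratios):
    return math.prod(ratios) ** (1 / len(ratios))

machine_x = [2.0, 8.0]   # made-up SPEC ratios vs. some reference machine
machine_y = [4.0, 4.0]

gx, gy = geometric_mean(machine_x), geometric_mean(machine_y)

# Pick a reference machine that is twice as fast: every ratio halves,
# but the X-vs-Y comparison is unchanged.
gx2 = geometric_mean([r / 2 for r in machine_x])
gy2 = geometric_mean([r / 2 for r in machine_y])
print(gx / gy, gx2 / gy2)   # prints: 1.0 1.0
```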

MIPS Architecture
Introduction: Computer Languages and MIPS Context
Modern computers cannot understand high-level programming languages (like Python or C)
directly. How does a computer execute a simple line of code? How do programming
languages translate to hardware operations?

To bridge this gap, we'll look at MIPS assembly language. This reveals how computers
process data, perform arithmetic, and manage memory. By learning about MIPS, you'll see
"under the hood" and gain a practical sense of how CPUs operate.

Module 1: MIPS Instruction Set Architecture


1. What is Assembly? Why MIPS?
Definition and Mapping

Assembly language is a set of human-readable instructions that are directly mapped to binary
machine code. One assembly instruction translates to exactly one machine instruction (a 1:1
mapping).

Analogy: Think of assembly as giving direct, step-by-step commands to a robot


chef (e.g., "pick up knife," "move knife to onion," "press down"). A high-level
language is like handing the chef a recipe book (e.g., "dice onions").

Key Characteristics

Assembly instructions are "primitive"—they do basic things (add, subtract, load data), not full
tasks like "sort an array." High-level concepts like loops are built by stringing together many
simple assembly instructions.

2. MIPS Architecture: Overview and Naming


What is MIPS?
MIPS stands for Microprocessor without Interlocked Pipeline Stages. (It does not mean Million
Instructions Per Second in this context!)

Classification and Context


●​ Class: It is a RISC (Reduced Instruction Set Computer) architecture. RISC focuses
on simplicity, speed, and keeping instructions regular.
●​ Context: MIPS is used heavily in teaching and in embedded systems (like routers or
printers).
●​ Other Architectures: You are likely more familiar with Intel x86 (in PCs) and ARM (in
mobiles/tablets). Each has its own different assembly language.

Analogy: Architectures are like dialects of a language. Each processor brand has
its own dialect for giving instructions.

3. MIPS Registers & Register File


What is a Register?

Definition: Registers are tiny, extremely fast memory cells inside the CPU that hold data
temporarily during execution.

MIPS Register File Structure


●​ MIPS has 32 registers (named $0 to $31).
●​ Each register is 32 bits wide.
●​ The hardware structure allows the CPU to read from two registers and write to one
register all in a single clock cycle, enabling very rapid calculations.
●​ The number is limited to 32 to keep the CPU fast. More registers = more complexity
and delay.

Real-world Analogy: Registers are like a handful of labeled baskets on a chef's


kitchen counter for holding measured ingredients. The chef (CPU) quickly grabs
from these baskets (registers) to cook (execute instructions). Accessing main
memory would be like the chef having to walk to the pantry for every single
ingredient.

Common MIPS Register Roles

Each 32-bit register has a number (0-31) and a symbolic name for assembly programming.

| Name(s) | Number | Usage |
| :--- | :--- | :--- |
| $zero | 0 | Always contains the value 0 (read-only) |
| $s0 - $s7 | 16-23 | "Saved" variables (for long-term use) |
| $t0 - $t9 | 8-15, 24-25 | "Temporary" variables (for short-term use) |
| $a0 - $a3 | 4-7 | Function arguments |
| $sp | 29 | Stack pointer (manages memory) |
| $ra | 31 | Return address (for functions) |

4. MIPS Instruction Types (R-Type, I-Type)


All MIPS instructions are exactly 32 bits long. This "simplicity favors regularity" design
principle makes the hardware simple and predictable. We'll focus on two main formats.

R-Type (Register) Instructions


Purpose: Perform arithmetic or logic (e.g., add, sub) entirely within registers.

Format:

| opcode (6b) | rs (5b) | rt (5b) | rd (5b) | shamt (5b) | funct (6b) |
| :--- | :--- | :--- | :--- | :--- | :--- |
| 000000 | 1st source | 2nd source | Destination | shift amt | function code |

●​ opcode: Always 0 for R-Type instructions.


●​ rs: The 5-bit number of the 1st source register.
●​ rt: The 5-bit number of the 2nd source register.
●​ rd: The 5-bit number of the destination register (where the result is stored).
●​ shamt: "Shift amount" (used only for shift commands).
●​ funct: Specifies the exact operation (e.g., "add", "subtract").
Analogy: Like mixing baskets of fruit: "Take apples from basket $s1 and basket
$s2, and put the mix into basket $s0."

Step-by-Step C to MIPS Example

C Code:

// A = B + C;

MIPS Equivalent:

(Assume A is in $s0, B is in $s1, C is in $s2)

add $s0, $s1, $s2 // Result = Value in $s1 + Value in $s2

Binary Encoding for add $s0, $s1, $s2

(Using register numbers: $s0=16, $s1=17, $s2=18. The funct code for add is 100000).

| Field | opcode | rs (17) | rt (18) | rd (16) | shamt | funct |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| Binary | 000000 | 10001 | 10010 | 10000 | 00000 | 100000 |
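The field packing can be sketched in Python (the `encode_r_type` helper is ours, for illustration only):

```python
def encode_r_type(rs, rt, rd, shamt=0, funct=0b100000, opcode=0):
    """Pack R-type fields into one 32-bit MIPS word (teaching sketch)."""
    return (opcode << 26) | (rs << 21) | (rt << 16) \
         | (rd << 11) | (shamt << 6) | funct

# add $s0, $s1, $s2  ->  rs = $s1 (17), rt = $s2 (18), rd = $s0 (16)
word = encode_r_type(rs=17, rt=18, rd=16)
print(f"{word:032b}")   # prints: 00000010001100101000000000100000
```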

I-Type (Immediate) Instructions


Purpose: Work with memory (load/store) or perform arithmetic with a constant value (an
"immediate").

Format:

| opcode (6b) | rs (5b) | rt (5b) | immediate (16b) |
| :--- | :--- | :--- | :--- |
| (e.g., lw) | Base register | Target register | Constant/Offset |

●​ opcode: Operation code (e.g., 100011 for lw, "load word").


●​ rs: Base register (usually holds a starting memory address).
●​ rt: Target register (gets data from memory, or provides data to memory).
●​ immediate: A 16-bit signed constant value.

Step-by-Step Memory Example

This is how we access data in memory (like an array).

C Code:

(Assume A is an array, $s3 holds the base address of A. H is in $s2. We want to access A[8].)

// A[8] = A[8] + H;

MIPS Equivalent:

(Note: MIPS addresses are in bytes. Since each word is 4 bytes, element A[8] (index 8) sits at
an offset of 8 * 4 = 32 bytes from the base address.)

# $s3 = base address of A, $s2 = H


lw $t0, 32($s3) # Load Word: $t0 = mem[$s3 + 32] (gets A[8])
add $t0, $t0, $s2 # $t0 = $t0 + H
sw $t0, 32($s3) # Store Word: mem[$s3 + 32] = $t0 (saves result)

Real-world Analogy: Memory is like a long row of lockers (numbered boxes).


Each address is a "locker number." To get to item 8 in an array (where each item
takes 4 lockers), you go to the start of the array (base address) and move forward
8 * 4 = 32 lockers (the offset).
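Here's the lw/add/sw sequence mimicked in Python, modeling memory as a dictionary from byte addresses to words (the base address and data values are made up for illustration):

```python
# Model memory as a dict from byte address to 32-bit word. The array A
# starts at (made-up) byte address 1000 and A[8] holds 7; H = 5.
A_values = [0] * 8 + [7]                     # A[0]..A[8]
memory = {1000 + 4 * i: v for i, v in enumerate(A_values)}
s3, s2 = 1000, 5                             # $s3 = base of A, $s2 = H

t0 = memory[s3 + 32]       # lw  $t0, 32($s3)  -> loads A[8] (7)
t0 = t0 + s2               # add $t0, $t0, $s2
memory[s3 + 32] = t0       # sw  $t0, 32($s3)  -> stores A[8] back

print(memory[1000 + 4 * 8])  # prints: 12
```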

5. Conceptual Checkpoints & Reflection


●​ All MIPS instructions are a fixed 32 bits—this makes the hardware simple and
predictable.
●​ Registers, not memory, are used for all arithmetic operations (add, sub, etc.) for speed.
●​ This design follows two key principles: "Simplicity favors regularity" and "Smaller is
faster."
●​ Unique and special registers (like $zero and $sp) have fixed roles that are essential for
the system.
1. Introduction to Branch Instructions
In MIPS assembly language, branch instructions are fundamental tools used to make
decisions and change the control flow of a program. Think of these instructions as the
assembly language equivalent of if statements combined with a goto statement in
higher-level programming languages.

The Two Main Branch Instructions

MIPS provides two main conditional branch instructions:

●​ BNE (Branch Not Equal): "If register A ≠ register B, then go to label"


●​ BEQ (Branch on Equal): "If register A = register B, then go to label"

How Branch Instructions Work in Detail

These instructions work by comparing the contents of two registers. Here's the step-by-step
process:

1.​ Comparison Phase: The processor compares the values stored in two specified
registers.
2.​ Decision Phase: Based on the comparison result:
○​ If the condition is true (equal for BEQ, not equal for BNE), the program
branches to the target address.
○​ If the condition is false, the program continues to the next sequential
instruction.
3.​ Branch Execution: When branching occurs, the program jumps to the address
specified by a label.

Instruction Format Details

Both BNE and BEQ follow the I-type instruction format, which has the following structure:

●​ 6 bits: Operation code (opcode)


●​ 5 bits: First register to compare
●​ 5 bits: Second register to compare
●​ 16 bits: Immediate value (offset) used to calculate the branch target address

This immediate value is crucial for determining where exactly the program should branch to
when the condition is met.

2. Branch Addressing and Calculation (PC-Relative Addressing)
Understanding the Branch Address Calculation

The destination address for a branch is calculated using a 16-bit offset value provided by the
instruction. This offset is a signed value, which means it can represent both positive and
negative numbers. This capability enables the program to branch both forward and backward
within the code.

Why Signed Offsets Matter

The signed nature of the offset is crucial because:

●​ Positive offsets: Allow branching to instructions that appear later in the program
(forward branching).
●​ Negative offsets: Allow branching to instructions that appear earlier in the program
(backward branching, useful for loops).

Branch Range Limitations Explained

However, there's an important limitation: using a 16-bit offset restricts the branching range to
±2^15 words. Let's break this down:

●​ 16 bits can represent 2^16 = 65,536 different values.


●​ Since it's signed, we get a range of -32,768 to +32,767 words.
●​ This means the program can only branch within a window of roughly ±32,768 words (±2^15)
of the instruction after the branch.
●​ In terms of bytes: roughly ±131,072 bytes (since each word is 4 bytes).

PC-Relative Addressing Mechanism in Detail

MIPS uses PC-relative addressing to calculate the branch address. Here's the detailed
process:

1.​ Current State: The Program Counter (PC) holds the address of the current
instruction being executed.
2.​ PC Increment: Before executing the branch, the PC is automatically incremented by 4
(pointing to the next instruction).
3.​ Address Calculation: Since each MIPS instruction is exactly 4 bytes, the branch
address is calculated using this formula:

Branch Address = (PC + 4) + (4 × Offset)

Why This Formula Works:

●​ (PC + 4): This gives us the address of the instruction that would normally execute
next.
●​ (4 × Offset): The offset is in words, so we multiply by 4 to convert to bytes.
●​ Addition: Adding these gives us the final branch target address.

This calculation method allows the branch to target addresses within a range of ±2^15 words
relative to the current instruction, providing flexibility for both forward and backward
branching within the allowed range.
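The formula can be sketched as a tiny Python function (the example PC values are made up):

```python
def branch_target(pc, offset_words):
    """PC-relative branch: target = (PC + 4) + 4 * signed word offset."""
    return (pc + 4) + 4 * offset_words

print(hex(branch_target(0x1000, 3)))    # prints: 0x1010 (forward branch)
print(hex(branch_target(0x1000, -2)))   # prints: 0xffc  (backward branch)
```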

3. Jump Instructions and Addressing (J-Type Instruction)
Understanding Unconditional Jump Instructions

MIPS supports unconditional jumps using the J-type instruction. Unlike branch instructions
that depend on comparing register values, the jump instruction (j) does not involve any
register comparison. Instead, it directly specifies a target address where the program should
continue execution, making it truly "unconditional."

Key Differences from Branch Instructions


| Aspect | Branch Instructions | Jump Instructions |
| :--- | :--- | :--- |
| Condition | Requires register comparison | No condition needed |
| Format | I-type (16-bit offset) | J-type (26-bit field) |
| Range | ±32,768 words | 256 MB region |
| Addressing | PC-relative | Pseudo-direct |
J-Type Instruction Format Breakdown

The J-type instruction format differs significantly from the branch instruction format:

●​ 6 bits: Operation code (opcode) - identifies this as a jump instruction.


●​ 26 bits: Jump offset value (this is much larger than the 16-bit offset in branches).
●​ The offset value is encoded in words, not bytes.
●​ No register fields are needed since there's no comparison.

Address Construction Process

The jump address construction happens in several steps:

1.​ Extract Offset: Take the 26-bit value from the instruction.
2.​ Shift Operation: Left-shift this 26-bit value by 2 bits (equivalent to multiplying by 4).
○​ This creates a 28-bit address.
○​ The shift converts from words to bytes.
3.​ Address Combination: Concatenate this 28-bit address with the upper 4 bits of the
current PC.
○​ Upper 4 bits of PC + 28 bits from instruction = 32-bit target address

Pseudo-Direct Addressing Explained

This addressing method is called pseudo-direct addressing because:

●​ It's "pseudo" (not truly direct) because it doesn't provide the complete 32-bit address.
●​ It's "direct" because it directly specifies most of the target address bits.
●​ The remaining bits come from the current PC.
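The address construction can be sketched in Python (the function name and example values are ours):

```python
def jump_target(pc, target_field):
    """Pseudo-direct: 26-bit field << 2, glued under the PC's top 4 bits."""
    assert 0 <= target_field < 2**26
    return (pc & 0xF0000000) | (target_field << 2)

print(hex(jump_target(0x40001000, 0x100)))  # prints: 0x40000400
```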

Address Space and Partitioning

The pseudo-direct addressing scheme has specific characteristics:

●​ Total Range: Allows jumps within a 256 MB address space (2^28 bytes).
●​ Partitioning: The full 4 GB (2^32-byte) address space is divided into 16 fixed-size
partitions, selected by the upper 4 bits of the PC.
●​ Each Partition: 4 GB ÷ 16 = 256 MB, which matches the 2^28-byte reach of a jump.
●​ Limitation: Jumps cannot cross partition boundaries with the standard j instruction.

Handling Cross-Partition Jumps

When a jump needs to cross an address partition:

●​ The standard j instruction cannot be used.


●​ It must be replaced with a JR (Jump Register) instruction.
●​ jr uses a full 32-bit address stored in a register.
●​ This provides complete addressing flexibility but requires an additional register.

4. Addressing Modes in MIPS


MIPS uses various addressing modes to specify how operands (data) are retrieved from
memory or registers. The addressing mode determines exactly how the operand is specified
within the instruction and where the processor should look for the data.

4.1 Immediate Addressing Mode

●​ Definition: In immediate addressing, the operand is a constant value directly


embedded within the instruction itself.
●​ Key Characteristics:
○​ The data value is part of the instruction.
○​ No memory access is required to get the operand.
○​ Fastest addressing mode since no additional memory reads are needed.
○​ Limited to small constant values (typically 16-bit in MIPS).

Example:​
addi $t0, $t1, 100 # Add immediate: $t0 = $t1 + 100

●​ In this case, 100 is the immediate operand - it's directly specified in the instruction.

4.2 Register Addressing Mode

●​ Definition: In register addressing, the operand is stored in a processor register.


●​ Key Characteristics:
○​ The operand value is stored in a register.
○​ The instruction specifies which register contains the data.
○​ Very fast access since registers are built into the processor.
○​ No memory access required.

Example:​
add $t0, $t1, $t2 # Add: $t0 = $t1 + $t2

●​ Here, both $t1 and $t2 are register operands - the actual values are stored in these
registers.

4.3 Base Addressing Mode (Also called Base+Offset)

●​ Definition: In base addressing, the operand is located at a memory address computed


by adding a register value (base) to an immediate constant (offset).
●​ Key Characteristics:
○​ Memory address = Base Register + Immediate Offset
○​ Commonly used for accessing arrays and data structures.
○​ The base register typically holds a starting address.
○​ The offset allows access to different elements relative to that base.

Usage Examples:​
lw $t0, 12($t1) # Load word: $t0 = memory[$t1 + 12]
sw $t0, 8($sp) # Store word: memory[$sp + 8] = $t0

●​ Real-world Application: If $t1 points to the start of an array, 12($t1) accesses the
element at position 3 (since each word is 4 bytes: 12 ÷ 4 = 3).

4.4 PC Relative Addressing

●​ Definition: PC relative addressing is used in branch instructions where the target


address is computed relative to the current Program Counter (PC).
●​ Detailed Process:
○​ Current PC: Points to the address of the branch instruction.
○​ PC Increment: PC is automatically incremented by 4 (next instruction address).
○​ Offset Application: The signed offset from the instruction is added to the
incremented PC.
○​ Final Address: This gives us the branch target address.
●​ Formula:​
Branch Target = (PC + 4) + (Offset × 4)
●​ Why This Works:
○​ Allows position-independent code (the same code works regardless of where
it's loaded in memory).
○​ Enables both forward and backward branching using positive and negative
offsets.
○​ Efficient for implementing loops and conditional statements.

4.5 Pseudo Direct Addressing

●​ Definition: Pseudo direct addressing is used in jump instructions where most of the
target address is directly specified in the instruction.
●​ Detailed Address Construction:
○​ Extract 26 bits: Take the jump offset from the instruction (26 bits).
○​ Left Shift: Shift these 26 bits left by 2 positions (multiply by 4).
■​ This creates a 28-bit address value.
■​ The shift converts word addresses to byte addresses.
○​ Combine with PC: Take the upper 4 bits from the current PC.
○​ Final Address: Concatenate PC[31:28] + 28-bit shifted offset.
●​ Address Space Details:
○​ Reach of one jump: 2^28 = 268,435,456 bytes = 256 MB.
○​ Partition structure: the full 4 GB address space is divided into 16 partitions of
256 MB each, selected by PC[31:28].
○​ Partition limitation: Cannot jump between partitions using the standard j
instruction.
●​ Cross-Partition Solution:
○​ When jumping across partitions:
○​ Use jr (Jump Register) instruction instead.
○​ Store the full 32-bit target address in a register.
○​ jr provides complete addressing flexibility without partition restrictions.
