CSC21000
Mohamed Hiba
April 27, 2025
Bubble Sort Implementation and Performance Analysis (C++ and MIPS)
Objective:
The purpose of this project was to:
Implement a Bubble Sort algorithm based on the textbook Computer Organization and
Design, 5th Edition, Section 2.13.
Measure and compare runtime performance of:
o MIPS assembly implementation (handwritten, running in MARS).
o C++ implementation compiled with:
No optimization (-O0)
Full optimization (-O3)
Generate an optimized assembly listing (.s file) from the C++ program.
Analyze the performance impact of compiler optimization and discuss the potential for
manual optimization in MIPS.
Files Submitted:
File Name Description
sort_driver.cpp
C++17 program implementing bubble sort and swap, with timing via
std::chrono.
sort_O0 Executable compiled from sort_driver.cpp with no optimization (-O0).
sort_O3 Executable compiled from sort_driver.cpp with full optimization (-O3).
sort_O3.s Optimized C++ assembly listing generated with clang++ -S -O3.
bubble_sort.asm Handwritten MIPS program for bubble sort, runnable in MARS.
Methodology:
C++ Implementation:
o sort_driver.cpp contains a direct implementation of textbook bubble sort and
swap.
o Timing was performed using std::chrono::high_resolution_clock.
o Arrays of size 10, 100, 500, and 1000 were initialized in descending order and
sorted.
o Compilation was done twice:
Without optimization (-O0).
With full optimization (-O3).
o Assembly code was generated using clang++ -S -O3.
MIPS Implementation:
o bubble_sort.asm implemented the same algorithm manually in MIPS assembly.
o Timing was measured using syscall 30 (MARS simulator’s microsecond timer).
o Output printed the first 20 elements after sorting to verify correctness.
o Arrays of size 10, 100, 500, and 1000 were sorted.
o 10,000 elements were not tested on MIPS due to impractical simulation time (>20
minutes).
C++ Code (sort_driver.cpp):
// sort_driver.cpp
#include <iostream>
#include <vector>
#include <chrono>
#include <cstdlib>
void swap(int v[], int k)
{
int temp = v[k];
v[k] = v[k+1];
v[k+1] = temp;
}
void sort(int v[], int n)
{
for (int i = 0; i < n; ++i)
for (int j = i-1; j >= 0 && v[j] > v[j+1]; --j)
swap(v, j);
}
int main(int argc, char* argv[])
{
int N = (argc > 1) ? std::atoi(argv[1]) : 10000;
if (N < 10) N = 10;
if (N > 10000) N = 10000;
std::vector<int> v(N);
for (int i = 0; i < N; ++i) v[i] = N - i;
auto t0 = std::chrono::high_resolution_clock::now();
sort([Link](), N);
auto t1 = std::chrono::high_resolution_clock::now();
std::cout << "first 10: ";
for (int i = 0; i < 10 && i < N; ++i) std::cout << v[i] << ' ';
std::cout << "\nμs: "
<< std::chrono::duration_cast<std::chrono::microseconds>(t1-
t0).count() << '\n';
}
MIPS Assembly Code (bubble_sort.asm)
The swap and sort functions are based directly on textbook Figures 2-24 and 2-27.
The array is manually initialized in descending order.
Sorting time is measured using syscall 30.
#############################################################################
##
# bubble_sort_demo.s — Textbook bubble-sort + swap, ready for MARS
# Source: Computer Organization & Design, 5e (§2.13) – adapted for I/O + timing
# Author : Mohamed Hiba
#############################################################################
##
.data
.align 2
array: .space 40000 # 10 000 words × 4 bytes
prompt: .asciiz "Enter array size (10-10000): "
newline: .asciiz "\nFirst 20 sorted numbers:\n"
space: .asciiz " "
time_msg: .asciiz "\nElapsed time (microseconds): "
acc_newline: .asciiz "\n"
.text
.globl main
############################################################
# main — build descending array, call sort, time it, show results
############################################################
main:
# ---- read N -----------------------------------------------------------
li $v0, 4
la $a0, prompt
syscall
li $v0, 5 # read_int
syscall
move $s0, $v0 # s0 ← N
# clamp N to [10,10000] -----------------------------------------------
li $t0, 10
blt $s0, $t0, set10
li $t1, 10000
bgt $s0, $t1, set10000
j init_array
set10: li $s0, 10
j init_array
set10000: li $s0, 10000
# fall through
# ---- initialise array: array[i] = N-i --------------------------------
init_array:
la $s1, array # base ptr
move $t0, $zero #i=0
move $t2, $s0 # val = N
init_loop:
beq $t0, $s0, prep_sort
sw $t2, 0($s1)
addi $t0, $t0, 1
addi $t2, $t2, -1
addi $s1, $s1, 4
j init_loop
# ---- take start-time & call sort(v,n) -------------------------------
prep_sort:
li $v0, 30 # syscall 30: time in µs
syscall
move $t6, $a0 # t6 = start_time
la $a0, array # a0 = &v[0]
move $a1, $s0 # a1 = n
jal sort # ---- BUBBLE SORT ----
# ---- take end-time, compute elapsed -------------------------------
li $v0, 30
syscall
subu $t7, $a0, $t6 # t7 = end - start (µs)
# ---- print first 20 elements ---------------------------------------
li $v0, 4
la $a0, newline
syscall
li $t3, 0 # counter
la $t4, array # ptr
print_loop:
li $t5, 20
beq $t3, $t5, print_time
lw $a0, 0($t4)
li $v0, 1 # print_int
syscall
li $v0, 4
la $a0, space
syscall
addi $t3, $t3, 1
addi $t4, $t4, 4
j print_loop
# ---- show elapsed time ---------------------------------------------
print_time:
li $v0, 4
la $a0, time_msg
syscall
move $a0, $t7
li $v0, 1 # print_int
syscall
li $v0, 4
la $a0, acc_newline
syscall
# ---- exit ----------------------------------------------------------
li $v0, 10
syscall
############################################################
# swap(int v[], int k) — textbook leaf procedure (Fig 2-25)
############################################################
swap:
sll $t1, $a1, 2 # k*4
add $t1, $a0, $t1 # &v[k]
lw $t0, 0($t1) # temp = v[k]
lw $t2, 4($t1) # t2 = v[k+1]
sw $t2, 0($t1) # v[k] = t2
sw $t0, 4($t1) # v[k+1] = temp
jr $ra
############################################################
# sort(int v[], int n) — textbook bubble sort (Fig 2-27)
############################################################
sort:
addi $sp, $sp, -20
sw $ra, 16($sp)
sw $s3, 12($sp)
sw $s2, 8($sp)
sw $s1, 4($sp)
sw $s0, 0($sp)
move $s2, $a0 #v
move $s3, $a1 #n
move $s0, $zero #i=0
outer_test:
slt $t0, $s0, $s3 #i<n?
beq $t0, $zero, outer_exit
addi $s1, $s0, -1 # j = i-1
inner_test:
slti $t0, $s1, 0 #j<0?
bne $t0, $zero, inner_exit
sll $t1, $s1, 2 # j*4
add $t2, $s2, $t1 # &v[j]
lw $t3, 0($t2) # v[j]
lw $t4, 4($t2) # v[j+1]
slt $t0, $t4, $t3 # v[j+1] < v[j] ?
beq $t0, $zero, inner_exit
move $a0, $s2 # parm 1
move $a1, $s1 # parm 2
jal swap
addi $s1, $s1, -1 # j--
j inner_test
inner_exit:
addi $s0, $s0, 1 # i++
j outer_test
outer_exit:
lw $s0, 0($sp)
lw $s1, 4($sp)
lw $s2, 8($sp)
lw $s3, 12($sp)
lw $ra, 16($sp)
addi $sp, $sp, 20
jr $ra
6. Timing Results
Array Size (N) MIPS (baseline) (µs) C++ -O0 (µs) C++ -O3 (µs)
10 4 0 0
100 233 35 9
500 4462 573 305
1000 16606 4467 991
As the textbook mentions it is aligned with what we got:
Analysis:
MIPS vs C++:
o MIPS shows classic O ( n2 ) growth, with runtime rapidly increasing with N.
o C++ -O0 performance was much better than MIPS, but still significant due to lack
of optimization.
o C++ -O3 optimization provided drastic improvements, showing the power of
modern compiler optimizations.
10,000 element sort skipped on MIPS because it would require impractical simulation
time.
Observations and Takeaways
Optimization Matters:
C++ -O3 provided up to 4× faster execution compared to -O0 without modifying source
code.
Compiler Optimization:
In the C++ assembly (sort_O3.s), functions were inlined, loops were optimized, and
unnecessary instructions were eliminated.
Manual MIPS Improvement (Future Work):
Although not required for this project, inlining the swap function or unrolling loops
would further improve the MIPS timing results.
Scalability:
Bubble sort is not practical for large arrays; better sorting algorithms like quicksort would
perform significantly better.