2021
Summary of algorithms
DATA STRUCTURES AND ALGORITHMS COURSE KEN1420
Table of contents
Algorithms
Basic Algorithm Analysis
Asymptotic Algorithm Analysis
Binary Search
Tree Traversal
Bubble Sort
Merge Sort
Quick Sort
Bucket Sort
Graph Traversal
Dijkstra’s Algorithm
Topological Sort
Amortization
The Greedy Method
Divide and Conquer!
Dynamic Programming
Brute-force pattern search
Boyer-Moore Heuristics
The KMP-algorithm
Algorithm complexity summary
Algorithms:
• An algorithm consists of a set of inputs/outputs and a sequence of instructions
describing how to process the input into the output.
• Instructions are:
o Unambiguous
o Executable
o Terminating
Basic Algorithm Analysis
• Pseudo-code: high level description of an algorithm, less detailed than
actual code.
• Main idea: help to understand how program works, useful to analyze
efficiency.
• Theoretical Analysis:
o High-level description instead of implementation.
o Characterize running time in terms of input size N.
o Take into account all possible inputs.
• Properties of Algorithm: runtime, storage space, input, output, operations
performed. All these properties determine the complexity of an algorithm.
• Central questions: performance, scale, classification, bounds.
• Primitive Operations: basic computation performed by an algorithm;
assumed to take constant amount of time.
• Algorithm Classes (n is input size):
o Constant ≈ 1. Typical for a set of atomic operations. Not affected by
input size.
o Logarithmic ≈ log(𝑛). Occurs when the input is repeatedly cut in
half.
o Linear ≈ 𝑛. Every item is processed once, e.g., a single loop over the input.
o N-Log-N ≈ 𝑛 ∗ log(𝑛). The case of the best-performing comparison-based
sorting algorithms. Each item in the input is processed log(𝑛) times.
o Quadratic ≈ 𝑛². Can be seen when there are nested loops over the input.
o Exponential ≈ 2ⁿ. Typically occurs when all permutations or
combinations of the input are considered.
Asymptotic Algorithm Analysis
• Big O: upper bound for the algorithm complexity. Maximal growth rate of a
function.
• In the asymptotic analysis all the constant factors and lower-order terms
are dropped, so we end up in a specific complexity class.
• Big Ω: lower bound for algorithm complexity.
• Big Θ: tight bound for algorithm complexity; the function is bounded both
above and below by the same growth rate (not the same thing as the average case).
Binary Search: searches for a specific element in a sorted set. With each
new threshold the set is cut in half, until the element is found or the set is exhausted.
• Pick the middle element as a threshold.
• If the searched element is not equal to the threshold, continue the search
on the left half (if it is smaller than the threshold) or the right half (if it is bigger).
• If the threshold equals the searched element, stop.
• Runs in 𝑂(log(𝑛)) time; the iterative version needs only 𝑂(1) extra space.
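The steps above can be sketched as a short Python function (an iterative version; the function name is illustrative):

```python
def binary_search(a, target):
    """Search for target in the sorted list a; return its index or -1."""
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        mid = (lo + hi) // 2      # pick the middle element as the threshold
        if a[mid] == target:
            return mid            # threshold equals the searched element: stop
        elif a[mid] < target:
            lo = mid + 1          # continue on the right half
        else:
            hi = mid - 1          # continue on the left half
    return -1                     # set exhausted: element not found
```

Each iteration halves the remaining range, which is where the 𝑂(log(𝑛)) bound comes from.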
Tree Traversal: Iteration through a tree in systematic manner.
• Euler tour: walk around the tree and visit each node by preorder, inorder
and postorder techniques.
• Preorder traversal: each node is visited before its descendants. Process root
node, then its left/right subtrees.
• Postorder traversal: nodes are visited after their descendants. Process
left/right subtrees, then root node.
• Inorder: node is visited after its left subtree and before right subtree.
Process left subtree, then root node, then right. Can be used in printing
arithmetic expressions.
• Traversals are defined recursively.
(Figures: preorder, postorder and inorder traversal examples.)
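The three recursive traversals can be sketched in Python (a minimal Node class is assumed here for illustration):

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def preorder(node, visit):
    if node is None:
        return
    visit(node.value)             # root before its subtrees
    preorder(node.left, visit)
    preorder(node.right, visit)

def inorder(node, visit):
    if node is None:
        return
    inorder(node.left, visit)
    visit(node.value)             # root between its subtrees
    inorder(node.right, visit)

def postorder(node, visit):
    if node is None:
        return
    postorder(node.left, visit)
    postorder(node.right, visit)
    visit(node.value)             # root after its subtrees
```

On an expression tree such as Node('+', Node(1), Node(2)), inorder visits 1, '+', 2, which is why it can be used to print arithmetic expressions.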
Bubble Sort
• Repeatedly sweep over the list; for each adjacent pair of items:
o Swap if they are in the wrong order.
o Stop if a full pass makes no more swaps.
• 𝑂(𝑛²) complexity.
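A minimal in-place sketch with the early-exit rule from the bullets above:

```python
def bubble_sort(a):
    """In-place bubble sort with an early exit when no swaps occur."""
    n = len(a)
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):       # the last i items are already in place
            if a[j] > a[j + 1]:          # pair in the wrong order
                a[j], a[j + 1] = a[j + 1], a[j]
                swapped = True
        if not swapped:                  # a full pass without swaps: sorted
            break
    return a
```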
Merge Sort
• Not in-place: needs n extra space during execution.
• General approach:
o Divide the list into two halves.
o Sort each part.
o Merge parts back.
• Runtime is of 𝑂(𝑛 ∗ log(𝑛)).
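The three steps can be sketched in Python (this version returns a new list, matching the "not in-place" note):

```python
def merge_sort(a):
    """Return a new sorted list; the merge step uses O(n) extra space."""
    if len(a) <= 1:
        return a                       # base case: constant-size sub-problem
    mid = len(a) // 2
    left = merge_sort(a[:mid])         # sort each half recursively
    right = merge_sort(a[mid:])
    merged, i, j = [], 0, 0            # merge the sorted halves back
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])            # one side may still have leftovers
    merged.extend(right[j:])
    return merged
```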
Quick Sort
• In-place sort.
• General approach:
o Divide into two parts around a picked pivot element, such that
elements in one part are smaller than the pivot and elements in the
other are larger.
o Sort each part recursively.
o Append one part to the other to maintain the proper order.
• The pivot element can be picked randomly, or as the median of the first,
middle and last elements.
• Worst-case time: 𝑂(𝑛²). Can occur due to a bad pivot or degenerate input,
e.g., 1, 2, 2, 2, 2, 2.
• Average time: 𝑂(𝑛 ∗ log(𝑛)).
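A compact Python sketch with a random pivot. Note that, unlike the in-place version described above, this variant allocates new lists for readability:

```python
import random

def quick_sort(a):
    """Quick sort with a randomly picked pivot (not in-place, for clarity)."""
    if len(a) <= 1:
        return a
    pivot = random.choice(a)                  # randomly picked pivot
    smaller = [x for x in a if x < pivot]     # part below the pivot
    equal = [x for x in a if x == pivot]
    larger = [x for x in a if x > pivot]      # part above the pivot
    return quick_sort(smaller) + equal + quick_sort(larger)
```

Grouping the elements equal to the pivot also sidesteps the degenerate all-equal input mentioned above.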
Bucket Sort: Linear-time sorting that exploits the representation of the
elements (their keys) instead of comparisons.
• Non-comparison based.
• Not in-place.
• General approach:
o Establish a finite key range N.
o Create N buckets (an auxiliary array indexed by the keys of the elements).
o Put each element into the bucket corresponding to its key.
• Run time of 𝑂(𝑛).
• Storage: 𝑂(N); can get out of hand if the key range N is too large.
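For integer keys in a known range, the approach can be sketched as a counting-style bucket sort (one bucket per key; illustrative only):

```python
def bucket_sort(a, key_range):
    """Sort non-negative integers in [0, key_range) via counting buckets."""
    buckets = [0] * key_range            # one bucket per possible key
    for x in a:
        buckets[x] += 1                  # put each element into its key-bucket
    out = []
    for key, count in enumerate(buckets):
        out.extend([key] * count)        # read the buckets back in key order
    return out
```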
Graph Traversal
• Two typical algorithms: Depth-first search (DFS), Breadth-first search (BFS).
• DFS: finds a path between two vertices by following each possible path as
far as possible before backtracking.
o General approach.
▪ Mark v as visited.
▪ For each unvisited neighbor vi of v.
▪ Perform DFS on vi.
o Run time: 𝑂(V + E).
• BFS: explores the graph level by level, visiting all neighbors of a vertex
before moving one step further from the start. Returns paths with the
fewest edges between the start and the goal vertices.
o General approach:
▪ Start with the initial vertex enqueued.
▪ Dequeue a vertex.
▪ For all unvisited neighbors of that vertex, mark them as visited and
enqueue them.
▪ Back to step 2, until the queue is empty.
o Run time: 𝑂(V + E).
• DFS is better suited for finding biconnected components.
• BFS is better at finding shortest paths.
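Both traversals can be sketched on an adjacency-list dict (illustrative; a visited set keeps membership tests constant-time):

```python
from collections import deque

def dfs(graph, v, visited=None, order=None):
    """Recursive DFS; returns vertices in the order they are visited."""
    if visited is None:
        visited, order = set(), []
    visited.add(v)                          # mark v as visited
    order.append(v)
    for neighbor in graph[v]:
        if neighbor not in visited:
            dfs(graph, neighbor, visited, order)  # recurse on unvisited neighbors
    return order

def bfs(graph, start):
    """BFS; returns vertices in order of discovery (fewest edges first)."""
    visited = {start}
    order = []
    queue = deque([start])                  # start with the initial vertex enqueued
    while queue:
        v = queue.popleft()                 # dequeue a vertex
        order.append(v)
        for neighbor in graph[v]:
            if neighbor not in visited:     # mark unvisited neighbors, enqueue them
                visited.add(neighbor)
                queue.append(neighbor)
    return order
```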
Dijkstra’s Algorithm: Finds shortest path between two vertices in a
weighted directed graph.
• Idea is to create table of current best path to each vertex: distance,
previous vertex. Improve it until it reaches the best solution.
• Initialization step:
o Add each vertex to a priority queue Q and create its table entry
(distance ∞, no previous vertex; distance 0 for the source).
• Main part:
o Repeatedly remove the vertex v with the smallest current distance
from Q and visit its neighbors.
o If the current distance to a neighbor is bigger than the distance to v
plus the edge weight from v to the neighbor, update the table and
mark v as the previous vertex of that neighbor.
• Greedy algorithm: makes locally optimal decisions.
• Run time, using a priority queue and a graph represented by an adjacency
list, is 𝑂((E + V) ∗ log(V)).
o 𝑂(E ∗ log(V)): changing a vertex's priority takes log(V) when using a
binary heap. This is performed at most once per edge, i.e. E times
in the worst case.
o 𝑂(V ∗ log(V)): removing an item from the priority queue takes log(V).
This operation is performed on every vertex, so V times.
o 𝑂((E + V) ∗ log(V)): 𝑂(E ∗ log(V)) + 𝑂(V ∗ log(V)).
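A sketch using Python's heapq as the priority queue. heapq has no decrease-key operation, so instead of changing priorities this version pushes fresh entries and skips stale ones ("lazy deletion"), which gives a comparable bound:

```python
import heapq

def dijkstra(graph, source):
    """Shortest distances from source; graph maps each vertex to a list
    of (neighbor, weight) pairs."""
    dist = {v: float('inf') for v in graph}   # table of current best distances
    dist[source] = 0
    pq = [(0, source)]                        # priority queue of (distance, vertex)
    while pq:
        d, v = heapq.heappop(pq)
        if d > dist[v]:
            continue                          # stale entry, already improved
        for neighbor, weight in graph[v]:
            if dist[v] + weight < dist[neighbor]:   # found a shorter path via v
                dist[neighbor] = dist[v] + weight
                heapq.heappush(pq, (dist[neighbor], neighbor))
    return dist
```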
Topological Sort: Applied on directed acyclic graphs. Used for Bayesian
networks.
• Given a DAG, will output vertices in such order that no vertex appears
before another vertex that points to it.
• Algorithm:
o Keep track of the in-degree of each node.
o Use a queue to store vertices that can be added to the output.
o Every time a vertex's in-degree reaches zero, enqueue it.
o Every time a vertex is processed, decrement the in-degrees of its
adjacent vertices.
• Run time: 𝑂(𝐸 + 𝑉):
o Initialization: 𝑂(E + V), if adjacency list is used.
o Sum of all enqueues and dequeues: 𝑂(V).
o Sum of all decrements: 𝑂(E).
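The queue-based algorithm above (Kahn's algorithm) can be sketched as:

```python
from collections import deque

def topological_sort(graph):
    """Order the vertices of a DAG; graph maps vertex -> list of successors."""
    in_degree = {v: 0 for v in graph}
    for v in graph:
        for w in graph[v]:
            in_degree[w] += 1                  # track the in-degree of each node
    queue = deque(v for v in graph if in_degree[v] == 0)
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in graph[v]:
            in_degree[w] -= 1                  # processing v "removes" edge v -> w
            if in_degree[w] == 0:              # w no longer waits on anything
                queue.append(w)
    if len(order) != len(graph):
        raise ValueError("graph contains a cycle")
    return order
```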
Amortization: Paying off of a debt in equal instalments composed of
gradually changing amounts of principal and interest.
• When calculating amortized runtime the goal is to compare total cost of
operations with how many of those happened.
• Intuition: build up enough credit with a series of cheap operations, so that
when a more expensive one has to be executed, we can average out its cost.
• Example:
o Always analyze the following ratio: total cost of the operations /
total number of operations.
o Resize rule: when inserting into full array, make a new one of a
doubled size.
o Insertion is a cheap operation, 𝑂(1), and growing the array is expensive,
𝑂(N). The number of cheap operations grows linearly as we increase
the size of the array.
o (𝑁 cheap operations ∗ 𝑂(1) + 1 expensive operation of 𝑂(𝑁)) / (𝑁 + 1 operations) = 𝑂(1)
o Important here: we had N cheap operations of 𝑂(1) each and one
expensive operation whose cost is 𝑂(N).
• Used for average case analysis of run time.
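The doubling example can be checked numerically. The sketch below (illustrative, not from the notes) counts every element write and copy under the resize rule, confirming that the total stays within a small constant times N, i.e. 𝑂(1) amortized per insertion:

```python
def appends_with_doubling(n):
    """Count element writes/copies when appending n items to an array
    that doubles its capacity whenever it is full."""
    capacity, size, copies = 1, 0, 0
    for _ in range(n):
        if size == capacity:        # expensive O(size) step: copy everything over
            copies += size
            capacity *= 2
        size += 1
        copies += 1                 # cheap O(1) step: write the new element
    return copies
```

For n = 1024 the total is 1024 writes plus 1 + 2 + ... + 512 = 1023 resize copies, i.e. 2047 operations in total, which is below 3n.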
The Greedy Method: assumes that a globally-optimal solution can be found
through a series of locally-optimal improvements.
• Does not lead to optimal solution in general.
• General approach:
o Start from some well understood configuration.
o Make the decision that is best among all currently possible choices.
• Example: Knapsack problem.
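For the fractional variant of the knapsack problem the greedy choice does give the optimum; a sketch (the (value, weight) item format is an assumption for illustration):

```python
def fractional_knapsack(items, capacity):
    """Greedy fractional knapsack: items is a list of (value, weight) pairs;
    fractions of items may be taken."""
    # Locally optimal decision: always take the best value-per-weight next.
    items = sorted(items, key=lambda vw: vw[0] / vw[1], reverse=True)
    total = 0.0
    for value, weight in items:
        if capacity <= 0:
            break
        take = min(weight, capacity)   # take as much of this item as fits
        total += value * take / weight
        capacity -= take
    return total
```

For the 0/1 variant this greedy rule can fail, which matches the note that the method does not lead to an optimal solution in general.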
Divide and Conquer!
• Divide: divide input S in disjoint subsets S1, S2….
• Recur: solve the subproblems recursively.
• Conquer: combine solutions for S1, S2… into a solution for S.
• Implemented as recursion:
o The base case for the recursion are the sub-problems of constant
size.
• Example: Merge sort.
• Time complexity: can become exponential, since the same sub-problems can
occur multiple times.
o Solution: Start with the smallest sub-problems, store their results and
use them to solve larger problems.
Dynamic Programming: Can be interpreted as a special variety of space-
and-time tradeoff.
• Used to solve optimization problems.
• Breaks the task up into sub-problems which can be enumerated easily.
• Optimal solution can be derived from local optimal sub-solutions.
• Sub-problems can overlap.
• Main principle:
o Store the results of smaller instances.
o Solve a larger instance more quickly by reusing them, instead of
solving the same smaller instances more than once.
• Example: Dijkstra shortest path algorithm.
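The main principle — store results of smaller instances and reuse them — in a classic textbook illustration (bottom-up Fibonacci; not from the course notes):

```python
def fib_table(n):
    """Bottom-up dynamic programming for Fibonacci numbers."""
    if n < 2:
        return n
    table = [0, 1]                     # results of the two smallest instances
    for i in range(2, n + 1):
        table.append(table[i - 1] + table[i - 2])   # reuse stored results
    return table[n]                    # no sub-problem is solved twice
```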
Brute-force pattern search
• Compare the pattern P with the text T for each possible shift of P relative
to T, until a match is found or all placements have been tried.
• Runs in 𝑂(nm) time, where n is the length of the pattern P and m the
length of the text T.
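A direct Python sketch (using the same convention as above: n for the pattern length, m for the text length):

```python
def brute_force_search(text, pattern):
    """Try every shift of pattern relative to text; return the first
    matching index, or -1 if all placements fail."""
    n, m = len(pattern), len(text)
    for shift in range(m - n + 1):          # each possible placement of P
        j = 0
        while j < n and text[shift + j] == pattern[j]:
            j += 1                          # characters still match
        if j == n:                          # the whole pattern matched
            return shift
    return -1
```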
Boyer-Moore Heuristics
• Based on two heuristics:
o Looking-glass heuristic: compare P with a subsequence of T moving
backwards (right to left through the pattern).
o Character-jump heuristic: when a mismatch occurs at T[i] = c, shift P
so that the last occurrence of c in P lines up with T[i]; if c does not
occur in P, shift P entirely past T[i].
o Last-occurrence function: preprocesses the pattern and the alphabet.
▪ Stores, for each character of the alphabet, the index of its last
occurrence in the pattern.
• Runs in 𝑂(nm + s) time, where s is the size of the alphabet.
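The two heuristics combine into the sketch below. A dict stands in for the last-occurrence table (characters absent from the pattern get index -1); n is the pattern length, m the text length:

```python
def boyer_moore(text, pattern):
    """Boyer-Moore search with the looking-glass and character-jump
    heuristics; returns the first match index or -1."""
    n, m = len(pattern), len(text)
    if n == 0:
        return 0
    # Last-occurrence function: last index of each character in the pattern.
    last = {c: i for i, c in enumerate(pattern)}
    i = j = n - 1                       # compare backwards from the pattern's end
    while i < m:
        if text[i] == pattern[j]:
            if j == 0:
                return i                # whole pattern matched
            i -= 1
            j -= 1
        else:
            l = last.get(text[i], -1)   # character-jump heuristic
            i += n - min(j, 1 + l)
            j = n - 1
    return -1
```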
The KMP-algorithm: Compares the pattern to the text left to right.
• When a mismatch occurs, shifts the pattern to the largest prefix of the
matched part that is also a suffix of it.
• KMP failure function: processes the pattern to find matches of prefixes of
the pattern with the pattern itself.
o KMP modifies brute-force algorithm so that if a mismatch occurs at
P[j] != T[i] we set j <- F(j-1).
• Run time in 𝑂(m + n).
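Failure function and search together (illustrative Python; F(j) is the length of the longest proper prefix of P[0..j] that is also its suffix):

```python
def failure_function(pattern):
    """F[j]: length of the longest proper prefix of pattern[:j+1]
    that is also a suffix of it."""
    f = [0] * len(pattern)
    k = 0
    for j in range(1, len(pattern)):
        while k > 0 and pattern[j] != pattern[k]:
            k = f[k - 1]               # fall back to a shorter prefix-suffix
        if pattern[j] == pattern[k]:
            k += 1
        f[j] = k
    return f

def kmp_search(text, pattern):
    """Left-to-right scan; on a mismatch, reuse the longest matched prefix."""
    f = failure_function(pattern)
    j = 0                              # number of pattern characters matched
    for i, c in enumerate(text):
        while j > 0 and c != pattern[j]:
            j = f[j - 1]               # j <- F(j - 1), as in the notes
        if c == pattern[j]:
            j += 1
        if j == len(pattern):
            return i - j + 1           # match starts here
    return -1
```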
Algorithm complexity summary
Algorithm         | Best time         | Average time      | Worst time        | Worst space
------------------|-------------------|-------------------|-------------------|------------
Binary search     | Ω(log(n))         | Θ(log(n))         | O(log(n))         | N/A
Bubble sort       | Ω(n)              | Θ(n²)             | O(n²)             | O(1)
Insertion sort    | Ω(n)              | Θ(n²)             | O(n²)             | O(1)
Selection sort    | Ω(n²)             | Θ(n²)             | O(n²)             | O(1)
Heapsort          | Ω(n log(n))       | Θ(n log(n))       | O(n log(n))       | O(1)
Merge sort        | Ω(n log(n))       | Θ(n log(n))       | O(n log(n))       | O(n)
Quick sort        | Ω(n log(n))       | Θ(n log(n))       | O(n²)             | O(log(n))
Bucket sort       | Ω(n + k)          | Θ(n + k)          | O(n²)             | O(n)
DFS               | Ω(E + V)          | Θ(E + V)          | O(E + V)          | N/A
BFS               | Ω(E + V)          | Θ(E + V)          | O(E + V)          | N/A
Dijkstra          | O((E + V) log(V)) with adjacency list + binary heap     | O(n)
Topological sort  | Ω(E + V)          | Θ(E + V)          | O(E + V)          | N/A
Boyer-Moore       | Ω(nm + s)         | Θ(nm + s)         | O(nm + s)         | N/A
KMP               | Ω(m + n)          | Θ(m + n)          | O(m + n)          | N/A