Finals Complexity and Algorithm
Bernisto, MIS
COMSC 3
WHAT IS HASHING?
Hashing refers to the process of generating a fixed-size output from an input of
variable size using mathematical formulas known as hash functions. This
technique determines an index or location for the storage of an item in a data
structure.
Every day, the data on the internet is increasing multifold and it is always a
struggle to store this data efficiently. In day-to-day programming, this amount
of data might not be that big, but still, it needs to be stored, accessed, and
processed easily and efficiently. A very common data structure that is used for
such a purpose is the Array data structure.
Now the question arises: if the array was already there, what was the need for a new
data structure? The answer lies in the word “efficiency”. Though storing an item in an
array takes O(1) time, searching for it takes at least O(log n) time on a sorted array
(and O(n) on an unsorted one). This time appears to be small, but for a large data set
it can cause a lot of problems, and this, in turn, makes the array data structure
inefficient.
Components of Hashing
1. Key: A key can be any string or integer that is fed as input to the
hash function, the technique that determines an index or location for
storage of an item in a data structure.
2. Hash Function: The hash function receives the input key and returns the
index of an element in an array called a hash table. The index is known as
the hash index.
3. Hash Table: A hash table is a data structure that maps keys to values using
a special function called a hash function. It stores the data in an
associative manner in an array, where each data value has its own unique
index.
When implementing a hash table using arrays, the nodes are not stored
consecutively; instead, the location of storage is computed using the key and a
hash function. The computation of the array index can be visualized as shown
below:
The value computed by applying the hash function to the key is often referred to
as the hashed key. The entries in the array are scattered (not necessarily
sequential), as can be seen in the figure below.
The cost of the insert, find and delete operations is now only O(1) on average. Can
you think of why? Hash tables are very good if you need to perform a lot of search
operations on a relatively stable table (i.e. there are far fewer insertion and
deletion operations than search operations). On the other hand, if traversals
(covering the entire table), insertions and deletions are a lot more frequent than
simple search operations, then balanced binary search trees (such as AVL trees) are
the preferred implementation choice.
Hashing Performance
Hash function
o should distribute the keys and entries evenly throughout the entire
table
o should minimize collisions
Collision resolution strategy
o Open Addressing: store the key/entry in a different position within the table
o Closed Addressing (separate chaining): store colliding entries in a chain attached to the slot
1. Efficiently computable.
2. Should uniformly distribute the keys (Each table position equally likely for
each key)
For example: For phone numbers, a bad hash function would take the first three digits;
a better function considers the last three digits. Please note that this may not be
the best hash function; there may be better ways.
3. The number of collisions should be low while placing records in the hash
table. Ideally, no collisions should occur; such a function is called a perfect hash
function.
4. The hash function should produce keys that get distributed uniformly
over the array.
5. The hash function should depend on every bit of the key. Thus, a hash
function that simply extracts a portion of the key is not suitable.
The hash function converts the key into the table position. It can be carried out
using:
Modular Arithmetic: Compute the index by dividing the key with some
value and use the remainder as the index. This forms the basis of the next
two techniques.
Truncation: Ignoring part of the key and using the rest as the array
index. The problem with this approach is that there may not always be an
even distribution throughout the table.
For example: If student IDs such as 928324312 are the keys, then select just the last
three digits, i.e. 312, as the index. => the table size has to be
at least 1000 (indices 000 - 999).
Mid-Square: a hashing technique in which a seed value (or the key itself) is
squared, and some digits from the middle of the result are extracted. These
extracted digits form a number that is used as the index (or as the new seed
when generating further values). This technique can generate keys with high
randomness if a big enough seed value is taken. However, it has the limitation
that squaring a very large key produces a value that is awkward to compute and store.
Example:
Suppose the size of the hash table (m) = 10 (indices 0 - 9), so the maximum number
of digits required for the index is 1.
Element (x) = 12 ⇒ x² = 144
The middle digit of 144 is 4, so the element x = 12 will be stored at
index 4 in the hash table with 10 slots.
Another Example:
Suppose the size of the hash table (m) = 1000 (indices 0 - 999), so the maximum
number of digits required for the index is 3.
Element (x) = 87431 ⇒ x² = 7644179761
The possible 3-digit middles of 7644179761 are 417 or 179; we can pick
either. If we pick 417, then the element x = 87431 will be
stored at index 417 in the hash table with 1000
slots.
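A minimal Python sketch of the mid-square computation above (the function name
mid_square_hash and the way the middle digits are sliced are illustrative choices
for this example, not part of the handout):

def mid_square_hash(key, table_size):
    # Square the key and use its middle digits as the hash index.
    digits_needed = len(str(table_size - 1))     # 1 digit for m = 10, 3 digits for m = 1000
    squared = str(key * key)
    start = (len(squared) - digits_needed) // 2  # starting position of the middle digits
    return int(squared[start:start + digits_needed])

print(mid_square_hash(12, 10))       # 12^2 = 144 -> middle digit 4
print(mid_square_hash(87431, 1000))  # 87431^2 = 7644179761 -> middle slice 417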
Folding: Partition the key into several pieces and then combine them in some
convenient way. The method breaks a key value up into precise segments that are
added together to form a hash value; another variation applies a
multiplicative hash function to each segment individually before adding.
Some folding methods go one step further and reverse every other piece
before the addition. This folding method is independent of distribution.
Algorithm:
The folding method for creating hash functions starts with the item being
divided into equal-sized pieces (the last piece may not be of equal size).
The outcome of adding these pieces together gives the hash value, H(x) = (a + b + c)
mod M, where a, b, and c represent the preconditioned key broken down into
three parts, M is the table size, and mod stands for modulo.
In other words, the sum of three parts of the preconditioned key is divided by
the table size. The remainder is the hash key.
Explanation:
Example 1: The task is to fold the key 123456789 into a Hash Table of ten
spaces (0 through 9).
It is given that the key, say X is 123456789 and the table size (i.e., M =
10).
Since we can break X into three parts in any order, let’s divide it evenly.
Therefore, a = 123, b = 456, c = 789.
Now, H(x) = (a + b + c) mod M i.e., H(123456789) =(123 + 456 + 789) mod
10 = 1368 mod 10 = 8.
Hence, 123456789 is inserted into the table at address 8.
Example 2: The task is to fold the key 452378912 into a Hash Table of
ten spaces (0 through 9).
It is given that the key, say X is 452378912 and the table size (i.e., M =
10).
Since we can break X into three parts in any order, let’s divide it evenly.
Therefore, a = 452, b = 378, c = 912.
Now, H(x) = (a + b + c) mod M i.e., H(452378912) =(452 + 378 + 912) mod
10 = 1742 mod 10 = 2.
Hence, 452378912 is inserted into the table at address 2.
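The two worked examples can be reproduced with a small Python sketch of
H(x) = (a + b + c) mod M (the function name fold_hash and the fixed three-digit
piece size are assumptions made for this illustration):

def fold_hash(key, table_size, piece_len=3):
    # Split the key's digits into fixed-size pieces, add them, then take mod M.
    digits = str(key)
    pieces = [int(digits[i:i + piece_len]) for i in range(0, len(digits), piece_len)]
    return sum(pieces) % table_size

print(fold_hash(123456789, 10))  # (123 + 456 + 789) mod 10 = 1368 mod 10 = 8
print(fold_hash(452378912, 10))  # (452 + 378 + 912) mod 10 = 1742 mod 10 = 2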
Collision
Let us consider the case when we have a single array with four records, each
with two fields, one for the key and one to hold data (we call this a single slot
bucket). Let the hashing function be a simple modulus operator i.e. array index
is computed by finding the remainder of dividing the key by 4.
Then key values 9, 13, 17 will all hash to the same index. When two (or more)
keys hash to the same value, a collision is said to occur.
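A quick Python check of this collision example, using the modulus hash described
above (the helper name hash_index is illustrative):

TABLE_SIZE = 4

def hash_index(key):
    return key % TABLE_SIZE

for key in (9, 13, 17):
    print(key, "->", hash_index(key))   # all three keys map to index 1, i.e. they collide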
Collision Resolution
1. Open Addressing:
In open addressing, all elements are stored in the hash table itself, and when a
collision occurs we probe for another slot. In linear probing, the typical gap
between two probes is 1, as seen in the example below:
Let hash(x) be the slot index computed using a hash function and S be the table
size.
Let us consider a simple hash function “key mod 7” and a sequence of keys
50, 700, 76, 85, 92, 73, 101,
which means hash(key) = key % S, where S = size of the table = 7, indexed from 0 to
6. We can define the hash function as per our choice when creating a hash
table, although it is often fixed internally with a pre-defined formula.
Symbol tables: Linear probing is commonly used in symbol tables, which are
used in compilers and interpreters to store variables and their associated
values. Since symbol tables can grow dynamically, linear probing can be used to
handle collisions and ensure that variables are stored efficiently.
Databases: Linear probing can be used in databases to store records and their
associated keys. When a collision occurs, linear probing can be used to find the
next available slot to store the record.
Overall, linear probing is a simple and efficient method for handling collisions
in hash tables, and it can be used in a variety of applications that require
efficient storage and retrieval of data.
Example: Let us consider a simple hash function as “key mod 5” and a sequence
of keys that are to be inserted are 50, 70, 76, 93.
Step 1: First draw the empty hash table, which will have a possible range of hash
values from 0 to 4 according to the hash function provided.
Hash table
Step 2: Now insert all the keys in the hash table one by one. The first key is 50.
It will map to slot number 0 because 50%5=0. So insert it into slot number 0.
Step 3: The next key is 70. It will map to slot number 0 because 70%5=0 but 50
is already at slot number 0 so, search for the next empty slot and insert it.
Step 4: The next key is 76. It will map to slot number 1 because 76%5=1 but 70
is already at slot number 1 so, search for the next empty slot and insert it.
Step 5: The next key is 93. It will map to slot number 3 because 93%5=3, so
insert it into slot number 3.
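The five steps above can be reproduced with a minimal linear-probing insert in
Python; the function name linear_probe_insert is an illustrative choice, and the
sketch assumes the table is never completely full:

def linear_probe_insert(table, key):
    size = len(table)
    index = key % size
    while table[index] is not None:     # slot taken: try the next slot, wrapping around
        index = (index + 1) % size
    table[index] = key

table = [None] * 5
for key in (50, 70, 76, 93):
    linear_probe_insert(table, key)

print(table)   # [50, 70, 76, 93, None], matching steps 2-5 above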
2. Quadratic Probing
If you observe carefully, you will see that the interval between probes grows
with every attempt rather than staying fixed at 1. Quadratic probing is a
method that helps reduce the clustering problem that linear probing suffers
from. In this method, we look for the (i²)-th slot in the i-th iteration. We
always start from the original hash location, and only if that location is
occupied do we check the other slots.
Hash table
Hash (22) = 22 % 7 = 1, Since the cell at index 1 is empty, we can easily insert
22 at slot 1.
Hash (30) = 30 % 7 = 2, Since the cell at index 2 is empty, we can easily insert
30 at slot 2.
Suppose the next key also hashes to slot 1. In our hash table slot 1 is already
occupied, so we search for slot 1 + i², i.e. 1 + 1² = 2.
Slot 2 is also occupied, so we search for slot 1 + 2² = 1 + 4 = 5, which is free.
Let hash(x) be the slot index computed using the hash function.
If the slot hash(x) % S is full, then we try (hash(x) + 1*1) % S.
If (hash(x) + 1*1) % S is also full, then we try (hash(x) + 2*2) % S.
If (hash(x) + 2*2) % S is also full, then we try (hash(x) + 3*3) % S.
This process is repeated for all the values of i until an empty slot is found.
For example: Let us consider a simple hash function as “key mod 7” and
sequence of keys as 50, 700, 76, 85, 92, 73, 101
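A small Python sketch of quadratic probing for this table of size 7 (the function
name quadratic_probe_insert is illustrative, and the probe count is capped at the
table size, after which a real implementation would resize or rehash):

def quadratic_probe_insert(table, key):
    size = len(table)
    for i in range(size):
        index = (key % size + i * i) % size   # probe hash(x), hash(x)+1, hash(x)+4, ...
        if table[index] is None:
            table[index] = key
            return index
    raise RuntimeError("no free slot found for key %d" % key)

table = [None] * 7
for key in (50, 700, 76, 85, 92, 73, 101):
    quadratic_probe_insert(table, key)

print(table)   # [700, 50, 85, 73, 101, 92, 76]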
Closed Addressing:
The idea behind separate chaining is to make each slot of the array the head of a
linked list called a chain. Separate chaining is one of the most popular and
commonly used techniques for handling collisions.
The linked list data structure is used to implement this technique. So what
happens is, when multiple elements are hashed into the same slot index, then
these elements are inserted into a singly-linked list which is known as a chain.
Here, all those elements that hash into the same slot index are inserted into a
linked list. Now, we can use a key K to search in the linked list by just linearly
traversing. If the intrinsic key for any entry is equal to K then it means that we
have found our entry. If we have reached the end of the linked list and yet we
haven’t found our entry then it means that the entry does not exist. Hence, the
conclusion is that in separate chaining, if two different elements have the same
hash value then we store both the elements in the same linked list one after the
other.
Example: Let us consider a simple hash function as “key mod 7” and a sequence
of keys as 50, 700, 76, 85, 92, 73, 101
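A minimal separate-chaining sketch in Python for the same keys; ordinary Python
lists stand in for the linked-list chains, and the names chain_insert and
chain_search are illustrative:

def chain_insert(table, key):
    table[key % len(table)].append(key)       # append to the chain of the hashed slot

def chain_search(table, key):
    return key in table[key % len(table)]     # linearly scan only one chain

table = [[] for _ in range(7)]
for key in (50, 700, 76, 85, 92, 73, 101):
    chain_insert(table, key)

print(table)   # slot 1 holds the chain [50, 85, 92], slot 3 holds [73, 101], ...
print(chain_search(table, 92), chain_search(table, 41))   # True False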
Advantages:
Simple to implement.
Hash table never fills up, we can always add more elements to the chain.
Less sensitive to the hash function or load factors.
It is mostly used when it is unknown how many and how frequently keys may
be inserted or deleted.
Disadvantages:
The cache performance of chaining is not good as keys are stored using a linked
list. Open addressing provides better cache performance as everything is
stored in the same table.
Wastage of Space (Some Parts of the hash table are never used)
If the chain becomes long, then search time can become O(n) in the worst case
Uses extra space for links
Performance of Chaining:
Performance of hashing can be evaluated under the assumption that each key is
equally likely to be hashed to any slot of the table (simple uniform hashing).
Under this assumption, if n keys are stored in a table with m slots, the load
factor is α = n/m and the expected cost of a search, insert or delete operation
is O(1 + α).
Cuckoo Hashing
Cuckoo hashing applies the ideas of multiple-choice and relocation together and
guarantees O(1) worst-case lookup time.
Multiple-choice: We give a key two choices, h1(key) and h2(key), for residing.
Relocation: It may happen that h1(key) and h2(key) are preoccupied. This is
resolved by imitating the Cuckoo bird: it pushes the other eggs or young out of
the nest when it hatches. Analogously, inserting a new key into a cuckoo hashing
table may push an older key to a different location. This leaves us with the
problem of re-placing the older key. If the alternate position of the older key
is vacant, there is no problem.
Otherwise, the older key displaces another key. This continues until the
procedure finds a vacant position, or enters a cycle. In the case of a cycle, new
hash functions are chosen and the whole data structure is ‘rehashed’. Multiple
rehashes might be necessary before Cuckoo succeeds.
Insertion is expected O(1) (amortized) with high probability, even considering the
possibility of rehashing, as long as the number of keys is kept below half of the
capacity of the hash table, i.e., the load factor is below 50%.
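The insertion procedure described above can be sketched as follows; the two hash
functions mirror the illustration below (h1(key) = key % 11, and
h2(key) = (key // 11) % 11 is my reading of the h2 values used there), and the key
order is the one I infer from the illustration:

SIZE = 11
MAX_KICKS = 50                    # give up (and rehash) after this many displacements

def h(t, key):
    return key % SIZE if t == 0 else (key // 11) % SIZE

def cuckoo_insert(tables, key):
    t = 0
    for _ in range(MAX_KICKS):
        pos = h(t, key)
        if tables[t][pos] is None:                  # vacant slot: place the key
            tables[t][pos] = key
            return True
        tables[t][pos], key = key, tables[t][pos]   # displace the current occupant
        t = 1 - t                                   # the displaced key tries the other table
    return False        # probable cycle: a full implementation would rehash everything

tables = [[None] * SIZE, [None] * SIZE]
for k in (20, 50, 53, 75, 100, 67, 105, 3):
    cuckoo_insert(tables, k)

print(tables[0])   # 67 at index 1, 3 at index 3, 105 at index 6, 53 at index 9
print(tables[1])   # 20 at index 1, 50 at index 4, 75 at index 6, 100 at index 9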
Illustration
Input:
Hash Functions:
h1(key) = key % 11
h2(key) = (key / 11) % 11, using integer division of key by 11
Let’s start by inserting 20 at its possible position in the first table determined by
h1(20):
Next: 50
Next: 67. h1(67) = 1. But 100 is already there at 1. We place 67 in table 1 & 100
in table 2
Next: 105. h1(105) = 6. But 50 is already there at 6. We place 105 in table 1 and
move 50 to table 2 at h2(50) = 4. This displaces 53, which moves to table 1 at
h1(53) = 9; that in turn displaces 75, which moves to table 2 at h2(75) = 6.
Next: 3. h1(3) = 3.
Graph Algorithms
Graph algorithms are methods used to manipulate and analyze graphs, solving
various problems such as finding the shortest path or detecting cycles.
Components of a Graph:
Vertices: Vertices are the fundamental units of the graph; they are also known as
nodes. Every node/vertex can be labeled or unlabeled.
Edges: Edges are drawn or used to connect two nodes of the graph. In a directed
graph, an edge is an ordered pair of nodes. Edges can connect any two nodes in
any possible way; there are no rules. Edges are also known as arcs, and every
edge can be labelled or unlabelled.
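As a small illustration of vertices and edges in code, the following Python
sketch stores a graph as an adjacency list (the four-vertex graph and the helper
name add_edge are hypothetical examples):

from collections import defaultdict

def add_edge(adj, u, v, directed=False):
    adj[u].append(v)
    if not directed:          # undirected edges are stored in both directions
        adj[v].append(u)

adj = defaultdict(list)
for u, v in [(0, 1), (0, 2), (1, 2), (2, 3)]:
    add_edge(adj, u, v)

print(dict(adj))   # {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}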
Graph data structures are a powerful tool for representing and analyzing
complex relationships between objects or entities. They are particularly useful
in fields such as social network analysis, recommendation systems, and
computer networks. In the field of sports data science, graph data structures
can be used to analyze and understand the dynamics of team performance and
player interactions on the field.
Types Of Graphs
1. Null Graph
A graph in which there are no edges between its vertices.
2. Trivial Graph
A graph having only a single vertex; it is also the smallest graph possible.
3. Undirected Graph
A graph in which edges do not have any direction; that is, the nodes are
unordered pairs in the definition of every edge.
4. Directed Graph
A graph in which every edge has a direction; that is, the nodes are ordered pairs
in the definition of every edge.
5. Connected Graph
A graph in which we can reach any node from any other node is known
as a connected graph.
6. Disconnected Graph
A graph in which at least one node is not reachable from some other node is known
as a disconnected graph.
7. Regular Graph
A graph in which the degree of every vertex is equal to K is called a K-regular graph.
8. Complete Graph
A graph in which there is an edge from each node to every other node.
9. Cycle Graph
A graph that forms a single cycle; the degree of each vertex is 2.
10. Bipartite Graph
A graph in which the vertices can be divided into two sets such that no edge
connects two vertices belonging to the same set.
11. Weighted Graph
A graph in which each edge is already specified with a suitable weight is known
as a weighted graph.
Applications of Graph:
CHROMATIC NUMBER
The minimum number of colors needed to color a graph is called
its chromatic number. For example, the following graph can be colored with a
minimum of 2 colors.
The decision problem is stated as, “Given M colors and graph G, is such a
color scheme possible or not?”.
The optimization problem is stated as, “Given M colors and graph G, find the
minimum number of colors required for graph coloring.”
Assign colors one by one to different vertices, starting from vertex 0. Before
assigning a color, check if the adjacent vertices have the same color or not. If
there is any color assignment that does not violate the conditions, mark the
color assignment as part of the solution. If no assignment of color is possible
then backtrack and return false.
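A compact Python sketch of this backtracking procedure, run on a small
hypothetical graph given as an adjacency matrix (the graph, the color limit m = 3
and the function names are assumptions made for the illustration):

def is_safe(graph, colors, v, c):
    # Color c is safe for vertex v if no adjacent vertex already uses it.
    return all(not graph[v][u] or colors[u] != c for u in range(len(graph)))

def graph_coloring(graph, m, colors, v=0):
    if v == len(graph):                 # every vertex colored: a valid assignment found
        return True
    for c in range(1, m + 1):           # try each of the m colors for vertex v
        if is_safe(graph, colors, v, c):
            colors[v] = c
            if graph_coloring(graph, m, colors, v + 1):
                return True
            colors[v] = 0               # backtrack: undo the assignment, try the next color
    return False                        # no color fits: report failure to the caller

# 4-vertex example: a cycle 0-1-2-3-0 plus the chord 0-2
graph = [[0, 1, 1, 1],
         [1, 0, 1, 0],
         [1, 1, 0, 1],
         [1, 0, 1, 0]]
colors = [0] * 4
print(graph_coloring(graph, 3, colors), colors)   # True [1, 2, 3, 2]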
Illustration:
To color the first node there are 3 choices of colors (Red, Green and Blue), so let’s take
the red color for the first node.
After the red color for the first node is fixed, we make a choice for the second node in a
similar manner as we did for the first node, then for the third node, and so on.
There is one important point to remember: while choosing a color for a node, it should
not be the same as the color of any adjacent node.
As shown in the above diagram, all the solutions are shown by coloring the first node
in Red.
Let’s choose Green color for the first node and explore the options for the remaining
nodes.
As shown in the above diagram, all the solutions are shown by coloring the first node
in green.
Let’s choose the blue color for the first node and explore the options for the remaining
nodes.
Minimum Spanning Tree (MST)
The weight of a spanning tree is the sum of the weights given to each edge of the
spanning tree. All weights of the given graph will be positive for simplicity.
A minimum spanning tree (MST) is defined as a spanning tree that has the
minimum weight among all the possible spanning trees. A minimum spanning tree
has the following properties:
1. The number of vertices (V) in the graph and the spanning tree is the
same.
2. There is a fixed number of edges in the spanning tree which is equal to
one less than the total number of vertices ( E = V-1 ).
3. The spanning tree should not be disconnected; there should be only a
single connected component, not more than that.
4. The spanning tree should be acyclic, which means there would not be
any cycle in the tree.
5. The total cost (or weight) of the spanning tree is defined as the sum of
the edge weights of all the edges of the spanning tree.
6. There can be many possible spanning trees for a graph.
Kruskal’s algorithm
In Kruskal’s algorithm, we first sort all edges of the given graph in increasing order
of weight. The algorithm then keeps adding new edges and nodes to the MST as long as
the newly added edge does not form a cycle. It picks the minimum weighted edge first
and the maximum weighted edge last.
Below are the steps for finding the MST using Kruskal’s algorithm:
1. Sort all the edges in non-decreasing order of their weight.
2. Pick the smallest remaining edge and check whether it forms a cycle with the
spanning tree formed so far. If it does not, include it; otherwise, discard it.
3. Repeat step 2 until the spanning tree has (V - 1) edges.
Kruskal’s algorithm to find the minimum cost spanning tree uses the greedy
approach. The Greedy Choice is to pick the smallest weight edge that does not
cause a cycle in the MST constructed so far. Let us understand it with an
example:
Illustration:
Input Graph: the graph has 9 vertices (numbered 0 to 8) and 14 edges. Its edge list,
written as (weight, source, destination) and sorted by weight, is:

Weight  Source  Destination
1       7       6
2       8       2
2       6       5
4       0       1
4       2       5
6       8       6
7       2       3
7       7       8
8       0       7
8       1       2
9       3       4
10      5       4
11      1       7
14      3       5
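A compact Python sketch of Kruskal’s algorithm using a simple union-find
structure, run on the edge list tabulated above (the function names kruskal and
find are illustrative):

def kruskal(num_vertices, edges):
    parent = list(range(num_vertices))

    def find(x):                       # find the root of x's component (with path halving)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst, total = [], 0
    for w, u, v in sorted(edges):      # consider edges in increasing order of weight
        ru, rv = find(u), find(v)
        if ru != rv:                   # endpoints in different components: no cycle, keep it
            parent[ru] = rv
            mst.append((u, v, w))
            total += w
    return mst, total

edges = [(1, 7, 6), (2, 8, 2), (2, 6, 5), (4, 0, 1), (4, 2, 5), (6, 8, 6),
         (7, 2, 3), (7, 7, 8), (8, 0, 7), (8, 1, 2), (9, 3, 4), (10, 5, 4),
         (11, 1, 7), (14, 3, 5)]
mst, total = kruskal(9, edges)
print(total)   # 37: the 8 chosen edges have weights 1, 2, 2, 4, 4, 7, 8, 9
print(mst)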
Prim’s algorithm
The algorithm starts with an empty spanning tree. The idea is to maintain two
sets of vertices. The first set contains the vertices already included in the MST,
and the other set contains the vertices not yet included. At every step, it
considers all the edges that connect the two sets and picks the minimum
weight edge from these edges. After picking the edge, it moves the other
endpoint of the edge to the set containing MST.
The working of Prim’s algorithm can be described by using the following steps:
Consider the following graph as an example for which we need to find the
Minimum Spanning Tree (MST).
Follow the given steps to utilize Prim’s Algorithm mentioned above for
finding the MST of a graph:
1. Create a set mstSet that keeps track of vertices already included in the MST.
2. Assign a key value to all vertices in the input graph. Initialize all key values as
INFINITE, and assign the key value 0 to the first vertex so that it is picked first.
3. Pick a vertex u that is not in mstSet and has the minimum key value, and add it
to mstSet.
4. Update the key values of all adjacent vertices of u: for every adjacent vertex v,
if the weight of edge u-v is less than the previous key value of v, update the key
value to the weight of u-v.
5. Repeat steps 3 and 4 until mstSet includes all the vertices of the graph.
The idea of using key values is to pick the minimum weight edge from the cut.
The key values are used only for vertices that are not yet included in the MST;
the key value of such a vertex indicates the minimum weight of an edge connecting
it to the set of vertices already included in the MST.
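A short Python sketch that follows the key-value steps above, using a plain
linear scan to pick the minimum key (the 5-vertex adjacency matrix is a
hypothetical example, not the handout’s figure):

INF = float('inf')

def prim(graph):
    n = len(graph)
    key = [INF] * n          # cheapest known edge weight connecting each vertex to the MST
    parent = [-1] * n        # which MST edge was chosen for each vertex
    in_mst = [False] * n
    key[0] = 0               # start from vertex 0

    for _ in range(n):
        # pick the not-yet-included vertex u with the minimum key value
        u = min((v for v in range(n) if not in_mst[v]), key=lambda v: key[v])
        in_mst[u] = True
        for v in range(n):   # relax the keys of u's neighbours
            if graph[u][v] and not in_mst[v] and graph[u][v] < key[v]:
                key[v] = graph[u][v]
                parent[v] = u

    return [(parent[v], v, graph[parent[v]][v]) for v in range(1, n)]

graph = [[0, 2, 0, 6, 0],
         [2, 0, 3, 8, 5],
         [0, 3, 0, 0, 7],
         [6, 8, 0, 0, 9],
         [0, 5, 7, 9, 0]]
print(prim(graph))   # [(0, 1, 2), (1, 2, 3), (0, 3, 6), (1, 4, 5)], total weight 16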
Prim’s algorithm prefers list data structures, while Kruskal’s algorithm prefers
heap data structures.
OBJECTIVES
By the end of this unit, you will be able to
Understand several design techniques or paradigms of algorithms
Know the meaning of Asymptotic notations
Understand some popular Asymptotic notations
Learn how to apply some of the Asymptotic notations learnt
3.3.1. Big O Notation
The function f (n) = O (g (n)) [read as "f of n is big-oh of g of n"] if and
only if there exist positive constants c and n0 such that
f (n) ≤ c * g (n) for all n, n ≥ n0
Hence, function g (n) is an upper bound for function f (n), as g (n) grows
at least as fast as f (n).
Examples:
1. 3n+2 = O(n) as 3n+2 ≤ 4n for all n ≥ 2
2. 3n+3 = O(n) as 3n+3 ≤ 4n for all n ≥ 3
Hence, the complexity of f(n) can be represented as O (g (n))
3.3.2. Big Omega (Ω) Notation
The function f (n) = Ω (g (n)) [read as "f of n is omega of g of n"] if and
only if there exist positive constants c and n0 such that
f (n) ≥ c * g (n) for all n, n ≥ n0
Example:
f (n) = 8n² + 2n - 3 ≥ 8n² - 3
= 7n² + (n² - 3) ≥ 7n² for all n ≥ 2, with g (n) = n²
Thus, c = 7 and n0 = 2
Hence, the complexity of f (n) can be represented as Ω (g (n))
3.3.3. Big Theta (θ) Notation
The function f (n) = θ (g (n)) [read as "f of n is theta of g of n"] if and
only if there exist positive constants k1, k2 and n0 such that
k1 * g (n) ≤ f (n) ≤ k2 * g (n) for all n, n ≥ n0
For example:
3n+2 = θ (n) as 3n+2 ≥ 3n and 3n+2 ≤ 4n for all n ≥ 2;
here k1 = 3, k2 = 4, and n0 = 2
Hence, the complexity of f (n) can be represented as θ (g(n)).
The Theta notation is more precise than both the Big O and Omega
notations: the function f (n) = θ (g (n)) means that g(n) is both an upper and a lower
bound for f (n).
CONCLUSION
Algorithm design techniques present us with different paradigms or approaches for
specifying algorithms, while asymptotic notations give us a standard means of
describing and comparing their running times.
SUMMARY
Several design techniques or paradigms are available for specifying
algorithms; they include the popular Divide-and-Conquer, Greedy techniques
and Randomized algorithms, amongst others. In the same vein, we have three
main notations for carrying out the asymptotic analysis of algorithms: the
Big O, Big Omega and Big Theta notations.