Finals: Complexity and Algorithms

Hashing is the process of generating a fixed-size output from a variable-size input using hash functions, facilitating efficient data storage and retrieval. It consists of three main components: the key, the hash function, and the hash table, which maps keys to values. The performance of hashing is influenced by factors such as the hash function, the collision resolution strategy, and the table size, with techniques like linear and quadratic probing used to handle collisions.

Levi John A. Bernisto, MIS
COMSC 3

CHAPTER IV: HASHING

WHAT IS HASHING?
Hashing refers to the process of generating a fixed-size output from an input of variable size using mathematical formulas known as hash functions. This technique determines an index or location for the storage of an item in a data structure.

Hashing is the process of scrambling a piece of information or data beyond recognition. Hash functions are designed to be irreversible. We pass the input through a hash function to calculate the hash value.

Need for Hash data structure

Every day, the data on the internet is increasing multifold and it is always a
struggle to store this data efficiently. In day-to-day programming, this amount
of data might not be that big, but still, it needs to be stored, accessed, and
processed easily and efficiently. A very common data structure that is used for
such a purpose is the Array data structure.

Now the question arises: if the array was already there, what was the need for a new data structure? The answer lies in the word "efficiency". Though storing in an array takes O(1) time, searching in it takes at least O(log n) time (and that only if the array is kept sorted; an unsorted array needs O(n)). This time appears to be small, but for a large data set it can cause a lot of problems, and this, in turn, makes the array data structure inefficient.

Components of Hashing

There are three major components of hashing:

1. Key: A key can be anything, a string or an integer, which is fed as input to the hash function, the technique that determines an index or location for storage of an item in a data structure.
2. Hash Function: The hash function receives the input key and returns the index of an element in an array called a hash table. The index is known as the hash index.


3. Hash Table: Hash table is a data structure that maps keys to values using
a special function called a hash function. Hash stores the data in an
associative manner in an array where each data value has its own unique
index.
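As a concrete illustration, here is a minimal Python sketch of the three components working together (the table size of 10 and the modulo-based hash function are arbitrary example choices, and collisions are ignored for now):

# Key, hash function, and hash table in miniature.
TABLE_SIZE = 10
hash_table = [None] * TABLE_SIZE        # the hash table

def hash_function(key):
    # Map a key to a hash index using modular arithmetic.
    return key % TABLE_SIZE

key = 42                                # the key
hash_table[hash_function(key)] = "value for 42"
print(hash_function(key))               # 2: the hash index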

Hashing using Arrays

When implementing a hash table using arrays, the nodes are not stored consecutively; instead, the location of storage is computed using the key and a hash function. The value computed by applying the hash function to the key is often referred to as the hashed key. The entries in the array are scattered (not necessarily sequential).

The cost of the insert, find, and delete operations is now only O(1). Can you think of why? Hash tables are very good if you need to perform a lot of search operations on a relatively stable table (i.e. there are a lot fewer insertion and deletion operations than search operations). On the other hand, if traversals (covering the entire table), insertions, and deletions are a lot more frequent than simple search operations, then balanced binary search trees (such as AVL trees) are the preferred implementation choice.

Hashing Performance

There are three factors that influence the performance of hashing:

 Hash function
o should distribute the keys and entries evenly throughout the entire
table
o should minimize collision
 Collision resolution strategy
o Open Addressing: store the key/entry in a different position


o Separate Chaining: chain several keys/entries in the same position


 Table size
o Too large a table, will cause a wastage of memory
o Too small a table will cause increased collisions and eventually
force rehashing (creating a new hash table of larger size and copying
the contents of the current hash table into it)
o The size should be appropriate to the hash function used and
should typically be a prime number. Why? (We discussed this in
class).

CHARACTERISTICS OF A GOOD HASH FUNCTION

A good hash function should have the following properties:

1. Efficiently computable.
2. Should uniformly distribute the keys (Each table position equally likely for
each key)

For example: For phone numbers, a bad hash function would take the first three digits. A better function considers the last three digits. Please note that this may not be the best hash function; there may be better ways.

Rules for choosing a good hash function:

1. The hash function should be simple to compute.

2. The number of collisions should be small when placing records in the hash table. Ideally, no collision should occur; such a function is called a perfect hash function.

3. The hash function should produce indexes that are distributed uniformly over the array.

4. The hash function should depend on every bit of the key. Thus, a hash function that simply extracts a portion of a key is not suitable.

Selecting Hash Functions

The hash function converts the key into the table position. It can be carried out using the following techniques (a short Python sketch of all four appears after the folding examples below):

 Modular Arithmetic: Compute the index by dividing the key by some value and using the remainder as the index. This forms the basis of the next two techniques.

For Example: index := key MOD table_size

 Truncation: Ignoring part of the key and using the rest as the array
index. The problem with this approach is that there may not always be an
even distribution throughout the table.
For Example: If a student ID such as 928324312 is the key, then select just the last three digits, i.e. 312, as the index. ⇒ the table size has to be at least 1000.
 MID-SQUARE is a hashing technique in which the key is squared and some digits from the middle of the square are extracted to form the index. This technique can generate indexes with high randomness if a big enough value is taken. However, it has a limitation: as the key is squared, if a 6-digit number is taken, then the square will have up to 12 digits.

Example:
Suppose the size of the hash table (m) = 10 (0 - 9), so the maximum number of digits required for the index is 1.
Element (x) = 12 ⇒ x² = 144
The middle 1 digit of 144 is 4, so the element x = 12 will be stored at index 4 in the hash table with 10 slots.

Another Example:
Suppose the size of the hash table (m) = 1000 (0 - 999), so the maximum number of digits required for the index is 3.
Element (x) = 87431 ⇒ x² = 7644179761
The possible 3-digit middles of 7644179761 are 417 or 179; we can pick either. If we pick 417, then the element x = 87431 will be stored at index 417 in the hash table with 1000 slots.


 Folding: Partition the key into several pieces and then combine them in some convenient way. It breaks a key value into precise segments that are added together to form a hash value. Another variant applies a multiplicative hash function to each segment individually before adding. Some folding methods go one step further and reverse every other piece before the addition. This folding method is independent of distribution.

Algorithm:
The folding method for creating hash functions starts with the item being divided into equal-sized pieces (the last piece may not be of equal size).

The outcome of adding these pieces together is the hash value: H(x) = (a + b + c) mod M, where a, b, and c represent the preconditioned key broken down into three parts, M is the table size, and mod stands for modulo.

In other words, the sum of three parts of the preconditioned key is divided by
the table size. The remainder is the hash key.

Explanation:

Example 1: The task is to fold the key 123456789 into a Hash Table of ten
spaces (0 through 9).

 It is given that the key, say X is 123456789 and the table size (i.e., M =
10).
 Since X can be broken into three parts in any order, let’s divide it evenly.
 Therefore, a = 123, b = 456, c = 789.
 Now, H(x) = (a + b + c) mod M i.e., H(123456789) =(123 + 456 + 789) mod
10 = 1368 mod 10 = 8.
 Hence, 123456789 is inserted into the table at address 8.
Example 2: The task is to fold the key 452378912 into a Hash Table of
ten spaces (0 through 9).

 It is given that the key, say X is 452378912 and the table size (i.e., M =
10).
 Since X can be broken into three parts in any order, let’s divide it evenly.
 Therefore, a = 452, b = 378, c = 912.
 Now, H(x) = (a + b + c) mod M i.e., H(452378912) =(452 + 378 + 912) mod
10 = 1742 mod 10 = 2.
 Hence, 452378912 is inserted into the table at address 2.
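The four techniques above can be condensed into a few lines of Python. This is an illustrative sketch only: the function names, the three-digit assumptions in truncation_hash, and the example calls were chosen to match the worked examples above, and are not standard library functions.

def mod_hash(key, table_size):
    # Modular arithmetic: the remainder becomes the index.
    return key % table_size

def truncation_hash(key):
    # Truncation: keep only the last three digits of the key.
    return key % 1000

def mid_square_hash(key, digits):
    # Mid-square: square the key, extract the middle digits.
    squared = str(key * key)
    start = len(squared) // 2 - digits // 2
    return int(squared[start:start + digits])

def folding_hash(key, piece_len, table_size):
    # Folding: split into pieces, add them, take mod table size.
    s = str(key)
    pieces = [int(s[i:i + piece_len]) for i in range(0, len(s), piece_len)]
    return sum(pieces) % table_size

print(mod_hash(12345, 7))              # 4
print(truncation_hash(928324312))      # 312
print(mid_square_hash(12, 1))          # 4   (middle digit of 144)
print(mid_square_hash(87431, 3))       # 179 (one valid middle of 7644179761)
print(folding_hash(123456789, 3, 10))  # 8   ((123 + 456 + 789) mod 10)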

PROBLEMS ENCOUNTERED IN HASHING

Collision

Let us consider the case when we have a single array with four records, each
with two fields, one for the key and one to hold data (we call this a single slot
bucket). Let the hashing function be a simple modulus operator i.e. array index
is computed by finding the remainder of dividing the key by 4.

Array Index: = key MOD 4

Then key values 9, 13, 17 will all hash to the same index. When two (or more)
keys hash to the same value, a collision is said to occur.


Collision Resolution

1. Open Addressing:

Linear Probing: Search for an empty slot sequentially.


In linear probing, the hash table is searched sequentially, starting from the original hash location. If the location we get is already occupied, then we check the next location.

The function used for rehashing is as follows: rehash(key) = (n + 1) % table_size, where n is the index just tried.

For example, the typical gap between two probes is 1 as seen in the example
below:

Let hash(x) be the slot index computed using a hash function and S be the table size.

If slot hash(x) % S is full, then we try (hash(x) + 1) % S.
If (hash(x) + 1) % S is also full, then we try (hash(x) + 2) % S.
If (hash(x) + 2) % S is also full, then we try (hash(x) + 3) % S.

Let us consider a simple hash function "key mod 7" and a sequence of keys 50, 700, 76, 85, 92, 73, 101, which means hash(key) = key % S, where S = size of the table = 7, indexed from 0 to 6. We can define the hash function as we choose when we create a hash table, although many libraries fix it internally with a pre-defined formula.
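A minimal sketch of linear probing insertion in Python (an illustrative helper, assuming integer keys and a table that never fills completely):

def linear_probe_insert(table, key):
    size = len(table)
    index = key % size                   # original hash location
    while table[index] is not None:      # slot occupied: probe the next one
        index = (index + 1) % size
    table[index] = key

table = [None] * 7
for k in [50, 700, 76, 85, 92, 73, 101]:
    linear_probe_insert(table, k)
print(table)   # [700, 50, 85, 92, 73, 101, 76]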

Applications of linear probing:

Linear probing is a collision handling technique used in hashing, where the algorithm looks for the next available slot in the hash table to store the collided key. Some of the applications of linear probing include:


Symbol tables: Linear probing is commonly used in symbol tables, which are
used in compilers and interpreters to store variables and their associated
values. Since symbol tables can grow dynamically, linear probing can be used to
handle collisions and ensure that variables are stored efficiently.

Caching: Linear probing can be used in caching systems to store frequently accessed data in memory. When a cache miss occurs, the data can be loaded into the cache using linear probing, and when a collision occurs, the next available slot in the cache can be used to store the data.

Databases: Linear probing can be used in databases to store records and their
associated keys. When a collision occurs, linear probing can be used to find the
next available slot to store the record.

Compiler design: Linear probing can be used in compiler design to implement symbol tables, error recovery mechanisms, and syntax analysis.

Spell checking: Linear probing can be used in spell-checking software to store the dictionary of words and their associated frequency counts. When a collision occurs, linear probing can be used to store the word in the next available slot.

Overall, linear probing is a simple and efficient method for handling collisions
in hash tables, and it can be used in a variety of applications that require
efficient storage and retrieval of data.

Challenges in Linear Probing:

Primary Clustering: One of the problems with linear probing is primary clustering: many consecutive elements form groups, and it starts taking longer to find a free slot or to search for an element.

Secondary Clustering: Secondary clustering is less severe; two records only have the same collision chain (probe sequence) if their initial position is the same.

Example: Let us consider a simple hash function "key mod 5" and a sequence of keys to be inserted: 50, 70, 76, 93.

Step 1: First draw the empty hash table, which will have a possible range of hash values from 0 to 4 according to the hash function provided.


Hash table

Step 2: Now insert all the keys in the hash table one by one. The first key is 50.
It will map to slot number 0 because 50%5=0. So insert it into slot number 0.

Insert 50 into hash table

Step 3: The next key is 70. It will map to slot number 0 because 70%5=0, but 50 is already at slot number 0, so search for the next empty slot (slot 1) and insert it there.


Insert 70 into hash table

Step 4: The next key is 76. It will map to slot number 1 because 76%5=1, but 70 (displaced from slot 0) now occupies slot 1, so search for the next empty slot (slot 2) and insert it there.

Insert 76 into hash table

Step 5: The next key is 93. It will map to slot number 3 because 93%5=3, so insert it into slot number 3.


Insert 93 into hash table

Quadratic Probing

If you observe carefully, you will see that the interval between probes increases with the square of the probe number. Quadratic probing is a method with the help of which we can solve the problem of primary clustering discussed above. In this method, we look for the i²-th slot in the i-th iteration. We always start from the original hash location; if that location is occupied, then we check the other slots.

Let hash(x) be the slot index computed using the hash function.

If slot hash(x) % S is full, then we try (hash(x) + 1*1) % S.
If (hash(x) + 1*1) % S is also full, then we try (hash(x) + 2*2) % S.
If (hash(x) + 2*2) % S is also full, then we try (hash(x) + 3*3) % S.

Example: Let us consider table size = 7, the hash function Hash(x) = x % 7, and the collision resolution strategy f(i) = i². Insert 22, 30, and 50.

Step 1: Create a table of size 7.


Hash table

Step 2 – Insert 22 and 30

Hash(22) = 22 % 7 = 1. Since the cell at index 1 is empty, we can easily insert 22 at slot 1.

Hash(30) = 30 % 7 = 2. Since the cell at index 2 is empty, we can easily insert 30 at slot 2.

Insert keys 22 and 30 in the hash table


Step 3: Inserting 50
Hash(50) = 50 % 7 = 1


In our hash table, slot 1 is already occupied. So we search for slot 1 + i² with i = 1, i.e. 1 + 1 = 2.

Again slot 2 is found occupied, so we search for cell 1 + i² with i = 2, i.e. 1 + 2² = 1 + 4 = 5.

Now, cell 5 is not occupied, so we place 50 in slot 5.

Insert key 50 in the hash table

Quadratic Probing: Search for an empty slot using a quadratic function.

Quadratic probing is an open-addressing scheme where we look for the i²-th slot in the i-th iteration if the given hash value x collides in the hash table.

How Quadratic Probing is done?

Let hash(x) be the slot index computed using the hash function.
If the slot hash(x) % S is full, then we try (hash(x) + 1*1) % S.
If (hash(x) + 1*1) % S is also full, then we try (hash(x) + 2*2) % S.
If (hash(x) + 2*2) % S is also full, then we try (hash(x) + 3*3) % S.
This process is repeated for all the values of i until an empty slot is found.

For example: Let us consider a simple hash function as “key mod 7” and
sequence of keys as 50, 700, 76, 85, 92, 73, 101
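A minimal sketch of quadratic probing insertion, re-running the worked example above (22, 30, 50 into a size-7 table). It is illustrative only: it gives up after S attempts, whereas a production table would resize or rehash.

def quadratic_probe_insert(table, key):
    size = len(table)
    home = key % size                    # original hash location
    for i in range(size):
        index = (home + i * i) % size    # probe home, home+1, home+4, ...
        if table[index] is None:
            table[index] = key
            return index
    raise RuntimeError("no free slot found")

table = [None] * 7
for k in [22, 30, 50]:
    quadratic_probe_insert(table, k)
print(table)   # [None, 22, 30, None, None, 50, None]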


2. Closed Addressing:

Chaining: Store colliding keys in a linked list at each index.

The idea behind separate chaining is to implement each slot of the array as a linked list, called a chain. Separate chaining is one of the most popular and commonly used techniques for handling collisions.

The linked list data structure is used to implement this technique. So what
happens is, when multiple elements are hashed into the same slot index, then
these elements are inserted into a singly-linked list which is known as a chain.

Here, all those elements that hash into the same slot index are inserted into a
linked list. Now, we can use a key K to search in the linked list by just linearly
traversing. If the intrinsic key for any entry is equal to K then it means that we
have found our entry. If we have reached the end of the linked list and yet we
haven’t found our entry then it means that the entry does not exist. Hence, the
conclusion is that in separate chaining, if two different elements have the same
hash value then we store both the elements in the same linked list one after the
other.

Example: Let us consider a simple hash function as “key mod 7” and a sequence
of keys as 50, 700, 76, 85, 92, 73, 101
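A minimal sketch of separate chaining, using Python lists in place of explicit singly-linked lists (the class name and default size are illustrative choices):

class ChainedHashTable:
    def __init__(self, size=7):
        self.buckets = [[] for _ in range(size)]   # one chain per slot

    def insert(self, key):
        self.buckets[key % len(self.buckets)].append(key)

    def search(self, key):
        # Linearly traverse the chain at the hashed slot.
        return key in self.buckets[key % len(self.buckets)]

t = ChainedHashTable()
for k in [50, 700, 76, 85, 92, 73, 101]:
    t.insert(k)
print(t.buckets)     # slot 1 chains [50, 85, 92]; slot 3 chains [73, 101]
print(t.search(92))  # True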


Advantages:

Simple to implement.

Hash table never fills up, we can always add more elements to the chain.
Less sensitive to the hash function or load factors.
It is mostly used when it is unknown how many and how frequently keys may
be inserted or deleted.

Disadvantages:

The cache performance of chaining is not good as keys are stored using a linked
list. Open addressing provides better cache performance as everything is
stored in the same table.

Wastage of Space (Some Parts of the hash table are never used)

If the chain becomes long, then search time can become O(n) in the worst case
Uses extra space for links
Performance of Chaining:

The performance of hashing can be evaluated under the assumption that each key is equally likely to be hashed to any slot of the table (simple uniform hashing).

m = number of slots in the hash table
n = number of keys to be inserted in the hash table
Load factor α = n/m; under simple uniform hashing, a search takes expected time O(1 + α).

Cuckoo Hashing: Use multiple hash functions to distribute keys.


Cuckoo hashing applies the idea of multiple-choice and relocation together and
guarantees O(1) worst case lookup time!

Multiple-choice: We give a key two choices, h1(key) and h2(key), for residing.

Relocation: It may happen that both h1(key) and h2(key) are occupied. This is resolved by imitating the cuckoo bird: it pushes the other eggs or young out of the nest when it hatches. Analogously, inserting a new key into a cuckoo hashing table may push an older key to a different location. This leaves us with the problem of placing the older key somewhere else.

If the alternate position of the older key is vacant, there is no problem.

Otherwise, the older key displaces another key. This continues until the
procedure finds a vacant position, or enters a cycle. In the case of a cycle, new
hash functions are chosen and the whole data structure is ‘rehashed’. Multiple
rehashes might be necessary before Cuckoo succeeds.

Insertion is expected O(1) (amortized) with high probability, even considering the
possibility of rehashing, as long as the number of keys is kept below half of the
capacity of the hash table, i.e., the load factor is below 50%.

Deletion is O(1) worst-case as it requires inspection of just two locations in the hash table.
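A simplified cuckoo-hashing sketch using the two hash functions from the illustration below. It is illustrative only: on detecting a probable cycle it raises an error, where a real implementation would choose new hash functions and rehash.

SIZE = 11
h = [lambda k: k % SIZE,               # h1(key) = key % 11
     lambda k: (k // SIZE) % SIZE]     # h2(key) = (key / 11) % 11
tables = [[None] * SIZE, [None] * SIZE]

def cuckoo_insert(key, table_id=0, depth=0):
    if depth > SIZE:                   # probable cycle: rehash needed
        raise RuntimeError("cycle detected, rehash required")
    pos = h[table_id](key)
    displaced = tables[table_id][pos]
    tables[table_id][pos] = key        # kick out the current occupant
    if displaced is not None:          # re-place it in the other table
        cuckoo_insert(displaced, 1 - table_id, depth + 1)

for k in [20, 50, 53, 75, 100, 67, 105, 3, 36, 39]:
    cuckoo_insert(k)
print(tables)

Running this on the input below reproduces the displacement chain traced in the illustration, including the long chain triggered by inserting 39.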

Illustration

Input:

{20, 50, 53, 75, 100, 67, 105, 3, 36, 39}

Hash Functions:

h1(key) = key%11

h2(key) = (key/11) %11

Let’s start by inserting 20 at its possible position in the first table determined by
h1(20):

Next: 50


Next: 53. h1(53) = 9. But 20 is already there at 9. We place 53 in table 1 and move 20 to table 2 at h2(20).

Next: 75. h1(75) = 9. But 53 is already there at 9. We place 75 in table 1 and move 53 to table 2 at h2(53).

Next: 100. h1(100) = 1.

Next: 67. h1(67) = 1. But 100 is already there at 1. We place 67 in table 1 & 100
in table 2

Next: 105. h1(105) = 6. But 50 is already there at 6. We place 105 in table 1 &
50 in table 2 at h2(50) = 4. Now 53 has been displaced. h1(53) = 9. 75 displaced:
h2(75) = 6.

Next: 3. h1(3) = 3.


Next: 36. h1(36) = 3. h2(3) = 0.

Next: 39. h1(39) = 6. h2(105) = 9. h1(100) = 1. h2(67) = 6. h1(75) = 9. h2(53) = 4. h1(50) = 6. h2(39) = 3.

Here, the new key 39 is itself displaced later in the recursive calls to place 105 (which 39 displaced), and finally lands at h2(39) = 3.


CHAPTER V: GRAPH ALGORITHMS

Graph algorithms are methods used to manipulate and analyze graphs, solving
various problems like finding the shortest path or detecting cycles.

Components of a Graph:

Vertices: Vertices are the fundamental units of the graph. Sometimes vertices are also known as nodes. Every node/vertex can be labeled or unlabeled.

Edges: Edges are drawn or used to connect two nodes of the graph. In a directed graph, an edge is an ordered pair of nodes. Edges can connect any two nodes in any possible way; there are no rules. Sometimes edges are also known as arcs. Every edge can be labeled or unlabeled.

A Graph is a non-linear data structure consisting of vertices and edges. The vertices are sometimes also referred to as nodes, and the edges are lines or arcs that connect any two nodes in the graph. More formally, a graph is composed of a set of vertices (V) and a set of edges (E). The graph is denoted by G(V, E).
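A minimal sketch of a graph stored as an adjacency list in Python (the Graph class here is an illustrative helper, not a standard library type):

from collections import defaultdict

class Graph:
    def __init__(self, directed=False):
        self.adj = defaultdict(list)
        self.directed = directed

    def add_edge(self, u, v):
        self.adj[u].append(v)
        if not self.directed:          # undirected: store both directions
            self.adj[v].append(u)

g = Graph()
for u, v in [(0, 1), (0, 2), (1, 2), (2, 3)]:
    g.add_edge(u, v)
print(dict(g.adj))   # {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}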

Graph data structures are a powerful tool for representing and analyzing complex relationships between objects or entities. They are particularly useful in fields such as social network analysis, recommendation systems, and computer networks. In the field of sports data science, graph data structures can be used to analyze and understand the dynamics of team performance and player interactions on the field.

Imagine a game of football as a web of connections, where players are the nodes and their interactions on the field are the edges. This web of connections is exactly what a graph data structure represents, and it’s the key to unlocking insights into team performance and player dynamics in sports.

Types Of Graphs

1. Null Graph

A graph is known as a null graph if there are no edges in the graph.

2. Trivial Graph

A graph having only a single vertex; it is also the smallest graph possible.


3. Undirected Graph

A graph in which edges do not have any direction; that is, the nodes are unordered pairs in the definition of every edge.

4. Directed Graph

A graph in which edges have direction; that is, the nodes are ordered pairs in the definition of every edge.

5. Connected Graph


A graph in which any other node can be visited from any one node is known as a connected graph.

6. Disconnected Graph

A graph in which at least one node is not reachable from some other node is known as a disconnected graph.

7. Regular Graph

A graph in which the degree of every vertex is equal to K is called a K-regular graph.

8. Complete Graph

A graph in which there is an edge from each node to every other node.


9. Cycle Graph

A graph that is a single cycle in itself; the degree of each vertex is 2.

10. Cyclic Graph

A graph containing at least one cycle is known as a Cyclic graph.

11. Directed Acyclic Graph

A Directed Graph that does not contain any cycle.


12. Bipartite Graph

A graph whose vertices can be divided into two sets such that no edge connects two vertices within the same set.

13. Weighted Graph

A graph in which the edges are already specified with suitable weight is known
as a weighted graph.

Weighted graphs can be further classified as directed weighted graphs and undirected weighted graphs.

Tree vs Graph

Trees are restricted types of graphs with some additional rules: every tree is a graph, but not all graphs are trees. Linked lists, trees, and heaps are all special cases of graphs.


BASIC OPERATIONS ON GRAPHS


Insertion of Nodes/Edges in the graph – Insert a node into the graph.

Deletion of Nodes/Edges in the graph – Delete a node from the graph.

Searching on Graphs – Search an entity in the graph.

Traversal of Graphs – Traversing all the nodes in the graph.

Applications of Graph:

Following are the real-life applications:

1. Graph data structures can be used to represent the interactions between players on a team, such as passes, shots, and tackles. Analyzing these interactions can provide insights into team dynamics and areas for improvement.
2. Commonly used to represent social networks, such as networks of friends on
social media.
3. Graphs can be used to represent the topology of computer networks, such
as the connections between routers and switches.
4. Graphs are used to represent the connections between different places in a
transportation network, such as roads and airports.
5. Graphs are used in Neural Networks where vertices represent neurons and
edges represent the synapses between them. Neural networks are used to
understand how our brain works and how connections change when we
learn. The human brain has about 10^11 neurons and close to 10^15
synapses.

GRAPH COLORING refers to the problem of coloring the vertices of a graph in such a way that no two adjacent vertices have the same color. This is also called the vertex coloring problem. If coloring is done using at most m colors, it is called m-coloring.


CHROMATIC NUMBER
The minimum number of colors needed to color a graph is called its chromatic number. For example, any bipartite graph with at least one edge can be colored with a minimum of 2 colors.

The graph coloring problem is both a decision problem and an optimization problem.

The decision problem is stated as: "Given M colors and a graph G, is such a color scheme possible or not?"

The optimization problem is stated as: "Given a graph G, find the minimum number of colors required for graph coloring."

Follow the given steps to solve the problem:

1. Create a recursive function that takes the graph, the current index, the number of vertices, and the color array.
2. If the current index is equal to the number of vertices, print the color configuration stored in the color array. Otherwise, assign a color to the current vertex from the range (1 to m).
3. For every assigned color, check if the configuration is safe (i.e. check that the adjacent vertices do not have the same color) and recursively call the function with the next index and the number of vertices; else return false.
4. If any recursive call returns true, then break the loop and return true.
5. If no recursive call returns true, then return false.

Algorithm of Graph Coloring using Backtracking


Assign colors one by one to different vertices, starting from vertex 0. Before assigning a color, check whether any adjacent vertex already has that color. If there is a color assignment that does not violate the conditions, mark the color assignment as part of the solution. If no assignment of color is possible, then backtrack and return false.
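A minimal backtracking sketch of m-coloring, following the steps above (the adjacency matrix below is an assumed 4-vertex example, not taken from the figures):

def is_safe(graph, colors, v, c):
    # Safe if no adjacent vertex already uses color c.
    return all(not graph[v][u] or colors[u] != c for u in range(len(graph)))

def graph_coloring(graph, m, colors, v=0):
    if v == len(graph):                # all vertices colored: a solution
        return True
    for c in range(1, m + 1):
        if is_safe(graph, colors, v, c):
            colors[v] = c
            if graph_coloring(graph, m, colors, v + 1):
                return True
            colors[v] = 0              # backtrack
    return False

graph = [[0, 1, 1, 1],
         [1, 0, 1, 0],
         [1, 1, 0, 1],
         [1, 0, 1, 0]]
colors = [0] * 4
print(graph_coloring(graph, 3, colors), colors)   # True [1, 2, 3, 2]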
Illustration:

To color the graph, color each node one by one.

To color the first node there are 3 choices of colors (red, green, and blue), so let’s take red for the first node.

After the red color for the first node is fixed, we make a choice for the second node in a similar manner, then for the third node, and so on.

There is one important point to remember: while choosing a color for a node, it must not be the same as the color of an adjacent node.

Choosing Red For Node 1

As shown in the above diagram, all the solutions are shown by coloring the first node
in Red.

Let’s choose Green color for the first node and explore the options for the remaining
nodes.


Choosing Green for Node 1

As shown in the above diagram, all the solutions are shown by coloring the first node in green.

Let’s choose blue for the first node and explore the options for the remaining nodes.

Minimal Spanning Tree

Given a connected and undirected graph, a spanning tree of that graph is a subgraph that is a tree and connects all the vertices together. A single graph can have many different spanning trees. For a weighted, connected, and undirected graph, the weight of a spanning tree is the sum of the weights of its edges; all weights of the given graph are assumed positive for simplicity.

A minimum spanning tree (MST) is defined as a spanning tree whose weight is less than or equal to the weight of every other spanning tree, i.e. it has the minimum weight among all the possible spanning trees.

Properties of a Spanning Tree:

The spanning tree holds the below-mentioned principles:

1. The number of vertices (V) in the graph and the spanning tree is the
same.
2. There is a fixed number of edges in the spanning tree which is equal to
one less than the total number of vertices ( E = V-1 ).
3. The spanning tree should not be disconnected; there should be only a single connected component, not more than that.
4. The spanning tree should be acyclic, which means there would not be
any cycle in the tree.
5. The total cost (or weight) of the spanning tree is defined as the sum of
the edge weights of all the edges of the spanning tree.
6. There can be many possible spanning trees for a graph.

Kruskal’s algorithm


In Kruskal’s algorithm, sort all edges of the given graph in increasing order. Then
it keeps on adding new edges and nodes in the MST if the newly added edge does
not form a cycle. It picks the minimum weighted edge at first and the maximum
weighted edge at last.

How to find MST using Kruskal’s algorithm?

Below are the steps for finding MST using Kruskal’s algorithm:

1. Sort all the edges in non-decreasing order of their weight.
2. Pick the smallest edge. Check if it forms a cycle with the spanning tree formed so far. If a cycle is not formed, include this edge; else, discard it.
3. Repeat step 2 until there are (V-1) edges in the spanning tree.

Kruskal’s algorithm to find the minimum cost spanning tree uses the greedy
approach. The Greedy Choice is to pick the smallest weight edge that does not
cause a cycle in the MST constructed so far. Let us understand it with an
example:

Illustration:

Input Graph (edge list):

Weight  Source  Destination
1       7       6
2       8       2
2       6       5
4       0       1
4       2       5
6       8       6
7       2       3
7       7       8
8       0       7
8       1       2
9       3       4
10      5       4
11      1       7
14      3       5
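A minimal sketch of Kruskal's algorithm with a simple union-find to detect cycles; the edge list matches the table above, and running the sketch yields an MST of total weight 37.

def kruskal(num_vertices, edges):
    parent = list(range(num_vertices))

    def find(x):                        # root of x's set, with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst = []
    for w, u, v in sorted(edges):       # step 1: sort by weight
        ru, rv = find(u), find(v)
        if ru != rv:                    # step 2: skip cycle-forming edges
            parent[ru] = rv
            mst.append((w, u, v))
        if len(mst) == num_vertices - 1:    # step 3: stop at V-1 edges
            break
    return mst

edges = [(1, 7, 6), (2, 8, 2), (2, 6, 5), (4, 0, 1), (4, 2, 5),
         (6, 8, 6), (7, 2, 3), (7, 7, 8), (8, 0, 7), (8, 1, 2),
         (9, 3, 4), (10, 5, 4), (11, 1, 7), (14, 3, 5)]
mst = kruskal(9, edges)
print(mst, sum(w for w, _, _ in mst))   # 8 edges, total weight 37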


Prim’s algorithm
The algorithm starts with an empty spanning tree. The idea is to maintain two
sets of vertices. The first set contains the vertices already included in the MST,
and the other set contains the vertices not yet included. At every step, it
considers all the edges that connect the two sets and picks the minimum
weight edge from these edges. After picking the edge, it moves the other
endpoint of the edge to the set containing MST.

How does Prim’s Algorithm Work?

The working of Prim’s algorithm can be described by using the following steps:

Step 1: Determine an arbitrary vertex as the starting vertex of the MST.
Step 2: Follow steps 3 to 5 while there are vertices not yet included in the MST (known as fringe vertices).
Step 3: Find edges connecting any tree vertex with the fringe vertices.
Step 4: Find the minimum among these edges.
Step 5: Add the chosen edge to the MST if it does not form any cycle.
Step 6: Return the MST and exit

Illustration of Prim’s Algorithm:

Consider the following graph as an example for which we need to find the Minimum Spanning Tree (MST). (Step-by-step figures omitted.)

How to implement Prim’s Algorithm?

Follow the given steps to utilize Prim’s Algorithm, described above, for finding the MST of a graph:

Create a set mstSet that keeps track of vertices already included in MST.

Assign a key value to all vertices in the input graph. Initialize all key values as
INFINITE. Assign the key value as 0 for the first vertex so that it is picked first.

While mstSet doesn’t include all vertices

Pick a vertex u that is not there in mstSet and has a minimum key value.

Include u in the mstSet.

Update the key value of all adjacent vertices of u. To update the key values,
iterate through all adjacent vertices.

For every adjacent vertex v, if the weight of edge u-v is less than the previous
key value of v, update the key value as the weight of u-v.

The idea of using key values is to pick the minimum weight edge from the cut.
The key values are used only for vertices that are not yet included in MST, the
key value for these vertices indicates the minimum weight edges connecting
them to the set of vertices included in MST
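A compact sketch of Prim's algorithm that replaces the linear scan for minimum key values with a priority queue (heap); the adjacency list adj below is an assumed 5-vertex example, not the graph from the omitted figures.

import heapq

def prim(adj, start=0):
    in_mst, total = set(), 0
    heap = [(0, start)]                   # (key value, vertex)
    while heap and len(in_mst) < len(adj):
        weight, u = heapq.heappop(heap)   # minimum-key fringe vertex
        if u in in_mst:
            continue                      # stale entry: already included
        in_mst.add(u)
        total += weight
        for w, v in adj[u]:               # update keys of adjacent vertices
            if v not in in_mst:
                heapq.heappush(heap, (w, v))
    return total

adj = {0: [(2, 1), (6, 3)],
       1: [(2, 0), (3, 2), (8, 3), (5, 4)],
       2: [(3, 1), (7, 4)],
       3: [(6, 0), (8, 1), (9, 4)],
       4: [(5, 1), (7, 2), (9, 3)]}
print(prim(adj))   # 16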


Prim’s Algorithm vs Kruskal’s Algorithm

Prim’s: It starts to build the Minimum Spanning Tree from any vertex in the graph.
Kruskal’s: It starts to build the Minimum Spanning Tree from the minimum-weight edge in the graph.

Prim’s: It traverses one node more than one time to get the minimum distance.
Kruskal’s: It traverses one node only once.

Prim’s: It has a time complexity of O(V²), V being the number of vertices, and can be improved up to O(E log V) using Fibonacci heaps.
Kruskal’s: Its time complexity is O(E log V), V being the number of vertices.

Prim’s: It gives a connected component and works only on connected graphs.
Kruskal’s: It can generate a forest (disconnected components) at any instant and can also work on disconnected components.

Prim’s: It runs faster on dense graphs.
Kruskal’s: It runs faster on sparse graphs.

Prim’s: It generates the minimum spanning tree starting from the root vertex.
Kruskal’s: It generates the minimum spanning tree starting from the least weighted edge.

Prim’s: Applications include the Travelling Salesman Problem and networks of roads and rail tracks connecting all the cities.
Kruskal’s: Applications include LAN connections, TV networks, etc.

Prim’s: It prefers list data structures.
Kruskal’s: It prefers heap data structures.

ALGORITHM DESIGN TECHNIQUES


1.0 Introduction
2.0 Objectives
3.0 Algorithm Design Techniques
3.1 Popular Algorithm Design Techniques
3.1.1 Divide-and-Conquer Approach
3.1.2 Greedy Techniques
3.1.3 Dynamic Programming
3.1.4 Branch and Bound
3.1.5 Backtracking Algorithm
3.1.6 Randomized Algorithm
1.0 INTRODUCTION
The design of any algorithm follows some planning, as there are different design techniques, strategies or paradigms that could be adopted depending on the problem domain and the designer's understanding. Some of these techniques can also be combined, while the limiting behaviour of the algorithm can be represented with asymptotic analysis. We shall be looking at examples of algorithm design techniques and asymptotic notations.

OBJECTIVES
By the end of this unit, you will be able to
 Understand several design techniques or paradigms of algorithms
 Know the meaning of asymptotic notations
 Understand some popular asymptotic notations
 Learn how to apply some of the asymptotic notations learnt

Algorithm Design Techniques


An algorithm design technique (or “strategy” or “paradigm”) is a
general approach to solving problems algorithmically that is applicable
to a variety of problems from different areas of computing. Learning
these techniques is of utmost importance for the following reasons.
First, they provide guidance for designing algorithms for new
problems, i.e., problems for which there is no known satisfactory
algorithm.
Second, algorithms are the cornerstone of computer science.
Every science is interested in classifying its principal subject, and
computer science is no exception. Algorithm design techniques
make it possible to classify algorithms according to an underlying
design idea; therefore, they can serve as a natural way to both
categorize and study algorithms.
While the algorithm design techniques do provide a powerful set of
general approaches to algorithmic problem solving, designing an
algorithm for a particular problem may still be a challenging task. Some
design techniques can be simply inapplicable to the problem in question.
Sometimes, several techniques need to be combined, and there are
algorithms that are hard to pinpoint as applications of the known design
techniques.
Popular Algorithm Design Techniques
The following is a list of several popular design approaches:
3.1.1 Divide and Conquer Approach:
The divide-and-conquer paradigm often helps in the discovery of efficient algorithms. It is a top-down approach. The algorithms which follow the divide-and-conquer technique involve three steps:
Divide the original problem into a set of sub-problems.
Solve every sub-problem individually, recursively.
Combine the solutions of the sub-problems (top level) into a solution of the whole original problem.
The following are some standard algorithms of the divide-and-conquer variety:
Binary Search is a searching algorithm.
Quicksort is a sorting algorithm.
Merge Sort is also a sorting algorithm (a sketch follows below).
Closest Pair of Points: the problem is to find the closest pair of points in a set of points in the x-y plane.
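As a concrete instance of the three steps above, here is a minimal sketch of Merge Sort (the example list is an arbitrary choice):

def merge_sort(a):
    if len(a) <= 1:                    # base case: already sorted
        return a
    mid = len(a) // 2
    left = merge_sort(a[:mid])         # divide and solve recursively
    right = merge_sort(a[mid:])
    merged, i, j = [], 0, 0            # combine the two sorted halves
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]

print(merge_sort([38, 27, 43, 3, 9, 82, 10]))   # [3, 9, 10, 27, 38, 43, 82]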
3.1.2. Greedy Technique
The greedy method or technique is an algorithmic paradigm that builds up a solution piece by piece, always choosing the next piece that offers the most obvious and immediate benefit. Problems where choosing the locally optimal option also leads to a global solution are the best fit for greedy (a sketch follows the examples below). The greedy method is used to solve optimization problems. An optimization problem is one in which we are given a set of input values which are required either to be maximized or minimized (known as the objective), possibly subject to some constraints or conditions.
A greedy algorithm always makes the choice (the greedy criterion) that looks best at the moment, to optimize a given objective. The greedy algorithm doesn't always guarantee the optimal solution; however, it generally produces a solution that is very close in value to the optimal.
Examples of Greedy Algorithms
Prim's Minimal Spanning Tree Algorithm.
Travelling Salesman Problem.
Graph – Map Coloring.
Kruskal's Minimal Spanning Tree Algorithm.
Dijkstra's Minimal Spanning Tree Algorithm.
Graph – Vertex Cover.
Knapsack Problem.
Job Scheduling Problem.
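As an illustration of the greedy criterion, here is a sketch of the fractional variant of the knapsack problem listed above, where greedily taking the highest value-to-weight ratio first is provably optimal (the item values below are an assumed example):

def fractional_knapsack(items, capacity):
    # items: list of (value, weight) pairs
    total = 0.0
    for value, weight in sorted(items, key=lambda it: it[0] / it[1], reverse=True):
        if capacity <= 0:
            break
        take = min(weight, capacity)    # greedily take as much as fits
        total += value * take / weight
        capacity -= take
    return total

print(fractional_knapsack([(60, 10), (100, 20), (120, 30)], 50))   # 240.0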
3.1.3 Dynamic Programming
Dynamic Programming (DP) is an algorithmic technique for solving
an optimization problem by breaking it down into simpler subproblems
and utilizing the fact that the optimal solution to the overall
problem depends upon the optimal solution to its sub-problems.
Dynamic programming is both a mathematical optimization method and a computer programming method. The method was developed by Richard Bellman in the 1950s and has found applications in numerous fields, from aerospace engineering to economics.
Dynamic programming is used where we have problems, which can
be divided into similar sub-problems, so that their results can be re-
used. Mostly, these algorithms are used for optimization. Before solving
the in-hand sub-problem, dynamic algorithm will try to examine the
results of the previously solved sub-problems.
Some examples of Dynamic Programming are:
Tower of Hanoi
Dijkstra Shortest Path
Fibonacci sequence
Matrix chain multiplication
Egg-dropping puzzle, etc.
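A minimal sketch of the dynamic-programming idea on the Fibonacci sequence: results of overlapping sub-problems are stored (memoized) and re-used instead of being recomputed:

from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)   # sub-problem results are re-used

print(fib(40))   # 102334155, computed in linear rather than exponential time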
3.1.4 Branch and Bound
The branch and bound method is a solution approach that partitions the feasible solution space into smaller subsets of solutions.
It is used for solving optimization and minimization problems. If we are given a maximization problem, we can convert it into a minimization problem (or vice versa) and then apply the branch and bound technique.
An important advantage of branch-and-bound algorithms is that we can control the quality of the solution to be expected, even if it is not yet found: the cost of an optimal solution is guaranteed to lie within a known bound of the cost of the best solution computed so far.
Branch and bound is an algorithm design paradigm which is generally
used for solving combinatorial optimization problems.
Some examples of Branch-and-Bound Problems are:
Knapsack problems
Traveling Salesman Problem
Job Assignment Problem, etc
3.1.5. Backtracking Algorithm
A backtracking algorithm is a problem-solving algorithm that uses a
brute force approach for finding the desired output. The Brute force
approach tries out all the possible solutions and chooses the desired/best
solutions.

Backtracking is a general algorithm for finding solutions to some computational problems, notably constraint satisfaction problems, that incrementally builds candidates to the solutions, and abandons a candidate ("backtracks") as soon as it determines that the candidate cannot possibly be completed to a valid solution.
The algorithm works as follows. Given a problem:

Backtrack(s):
    if s is not a solution: return false
    if s is a new solution: add s to the list of solutions
    for each candidate c in expand(s): Backtrack(c)

It finds a solution by building it step by step, increasing levels over time, using recursive calls. A search tree known as the state-space tree is used to find these solutions. Each branch in a state-space tree represents a variable, and each level represents a solution.
A backtracking algorithm uses the depth-first search method. When the algorithm begins to explore the solutions, the bounding function is applied so that the algorithm can determine whether the proposed solution satisfies the constraints. If it does, it will keep looking. If it does not, the branch is removed, and the algorithm returns to the previous level.
In any backtracking algorithm, the algorithm seeks a path to a feasible
solution that includes some intermediate checkpoints. If the checkpoints
do not lead to a viable solution, the problem can return to the
checkpoints and take another path to find a solution
There are the following scenarios in which you can use backtracking:
It is used to solve a variety of problems. You can use it, for example, to find a feasible solution to a decision problem.


Backtracking algorithms were also discovered to be very effective
for solving optimization problems.
In some cases, it is used to find all feasible solutions to
the enumeration problem.
Backtracking, on the other hand, is not regarded as an optimal
problem-solving technique. It is useful when the solution to a
problem does not have a time limit.
Backtracking algorithms are used in:
Finding all Hamiltonian paths present in a graph
Solving the N-Queen problem
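A minimal backtracking sketch of the N-Queen problem: queens are placed row by row, and a placement is undone (backtracked) as soon as it cannot lead to a valid solution:

def solve_n_queens(n, cols=None, row=0):
    cols = [] if cols is None else cols
    if row == n:                          # all queens placed: a solution
        return [list(cols)]
    solutions = []
    for col in range(n):
        if all(col != c and abs(col - c) != row - r
               for r, c in enumerate(cols)):    # no column/diagonal attack
            cols.append(col)
            solutions += solve_n_queens(n, cols, row + 1)
            cols.pop()                    # backtrack
    return solutions

print(len(solve_n_queens(4)))   # 2 solutions for 4 queens
print(solve_n_queens(4)[0])     # e.g. [1, 3, 0, 2] (one column per row)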
3.1.6 Randomized Algorithm
A randomized algorithm is an algorithm that employs a degree of randomness as part of its logic or procedure. In some cases, probabilistic algorithms are the only practical means of solving a problem.
The output of a randomized algorithm on a given input is a random
variable. Thus, there may be a positive probability that the outcome is
incorrect. As long as the probability of error is small for every possible
input to the algorithm, this is not a problem.
There are two main types of randomized algorithms: Las Vegas
algorithms and Monte-Carlo algorithms.
Example 1: In Quick Sort, using a random number to choose a pivot.
Example 2: Trying to factor a large number by choosing a random
number as possible divisors.
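A minimal sketch of Example 1: Quick Sort with a randomly chosen pivot, which makes pathological inputs unlikely regardless of their initial order (this simple version trades memory for clarity):

import random

def randomized_quicksort(a):
    if len(a) <= 1:
        return a
    pivot = random.choice(a)              # the random choice
    less = [x for x in a if x < pivot]
    equal = [x for x in a if x == pivot]
    greater = [x for x in a if x > pivot]
    return randomized_quicksort(less) + equal + randomized_quicksort(greater)

print(randomized_quicksort([5, 3, 8, 1, 9, 2]))   # [1, 2, 3, 5, 8, 9]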
Self-Assessment Exercise
1. What do you understand by an Algorithm design paradigm?
2. How does the Greedy Technique work and give an example?
3. Give a difference between the backtracking and randomized algorithm techniques.
3.2 Asymptotic Analysis of algorithms (Growth of function)
Resources for an algorithm are usually expressed as a function of the input. Often this function is messy and complicated to work with. To study function growth efficiently, we reduce the function down to the important part.
Let f(n) = an² + bn + c
In this function, the n² term dominates the function when n gets sufficiently large.
Dominant terms are what we are interested in when reducing a function; here we ignore all constants and coefficients and look at the highest-order term concerning n.
3.2.1 Asymptotic analysis
It is a technique of representing limiting behavior. The methodology has applications across the sciences. It can be used to analyze the performance of an algorithm for some large data set; in computer science, it is used in the analysis of algorithms, considering their performance when applied to very large input datasets.
The simplest example is the function f(n) = n² + 3n; the term 3n becomes insignificant compared to n² when n is very large. The function f(n) is said to be "asymptotically equivalent to n² as n → ∞", written symbolically as f(n) ~ n².
Asymptotic notations are used to write the fastest and slowest possible running times for an algorithm. These are also referred to as 'best case' and 'worst case' scenarios respectively.
"In asymptotic notations, we derive the complexity concerning the size of the input. (Example: in terms of n.)"
"These notations are important because, without exactly computing the cost of running the algorithm, we can estimate the complexity of the algorithm."
3.2.2 Why is Asymptotic Notation Important?
1. They give simple characteristics of an algorithm's efficiency.
2. They allow the comparisons of the performances of various
algorithms.
3.3 Asymptotic Notations:
Asymptotic notation is a way of comparing functions that ignores constant factors and small input sizes. Three notations are used to calculate the running-time complexity of an algorithm:
3.3.1. Big-oh notation:
Big-oh is the formal method of expressing the upper bound of an algorithm's running time. It is the measure of the longest amount of time. The function f(n) = O(g(n)) [read as "f of n is big-oh of g of n"] if and only if there exist positive constants k and n0 such that

f(n) ≤ k·g(n) for all n ≥ n0

Hence, the function g(n) is an upper bound for the function f(n), as g(n) grows at least as fast as f(n).

Examples:
1. 3n + 2 = O(n) as 3n + 2 ≤ 4n for all n ≥ 2
2. 3n + 3 = O(n) as 3n + 3 ≤ 4n for all n ≥ 3
Hence, the complexity of f(n) can be represented as O(g(n)).
3.3.2. Big Omega (Ω) Notation
The function f(n) = Ω(g(n)) [read as "f of n is omega of g of n"] if and only if there exist positive constants k and n0 such that

f(n) ≥ k·g(n) for all n ≥ n0


Example:
f(n) = 8n² + 2n − 3 ≥ 8n² − 3 = 7n² + (n² − 3) ≥ 7n² for all n ≥ 2
Thus k = 7, g(n) = n², and n0 = 2.
Hence, the complexity of f(n) can be represented as Ω(g(n)).

3.3.3. Big Theta (θ) Notation
The function f(n) = θ(g(n)) [read as "f of n is theta of g of n"] if and only if there exist positive constants k1, k2 and n0 such that

k1·g(n) ≤ f(n) ≤ k2·g(n) for all n ≥ n0

For example:
3n + 2 = θ(n) as 3n + 2 ≥ 3n and 3n + 2 ≤ 4n for all n ≥ 2, with k1 = 3, k2 = 4, and n0 = 2.
Hence, the complexity of f(n) can be represented as θ(g(n)).
The theta notation is more precise than both the big-oh and omega notations: f(n) = θ(g(n)) if g(n) is both an upper and a lower bound for f(n).

CONCLUSION
Algorithm design techniques present us with different paradigms or methods of representing or designing computer algorithms, and as the algorithm executes and grows in bounds (upper or lower), the asymptotic notations help us to determine the levels of growth.

SUMMARY
Several design techniques or paradigms are available for specifying algorithms; they range from the popular divide-and-conquer and greedy techniques to randomized algorithms, amongst others. In the same vein, we have three main notations for carrying out the asymptotic analysis of algorithms: the Big O, Big Omega and Big Theta notations.
