Data Structures Unit-2 Notes
Data Structures Unit-2 Notes
results = {'Detra' : 17,'Nova' : 84,'Charlie' : 22, 'Henry' : 75, 'Roxanne' : 92, 'Elsa' : 29}
Note:The keys in a dictionary must be simple types such as integers or strings while the values can be of
any type.
Operations of Dictionary:
1. Insertion
2. Deletion
3. Searc
hExample:
Consider an empty unordered dictionary and the following set of operations:
Implementation of Dictionary:
Dictionary is implemented in four ways:
1. Linear List Representation
2. Skip List Representation
3. Hashing
4. Trees
Linear List Representation
The dictionary can be represented as a linear list.The linear list is a collection of pairs(key & value).
EX:
key value next key value next key value next key value next
1 40 2 50 4 70 6 80
struct node
{
int key;
int value;
struct node *next;
}*head=NUL
L;void
insert(); void
delete(); void
search();
1. Insertion:
Step 1: Create a new node say ‘ptr’ with key and value.
Step 2: Check whether dictionary is EMPTY. (head==NULL)
Step 3:If Dictionary is Empty,then set head=ptr and define node pointer curr aand initialize it with head. i.e
curr=head.
Step 4: If it is not Empty then check the condition ptr->key>curr->key.
Step 5: If it is True then, set curr->next=ptr,curr=ptr.
Step 6: If it is not true then,define two node pointers temp and temp1 and initialize them with head.
Step 7: Keep moving the temp to its next node until the condition is true.
Step 8:Move temp1 to next node until the condition is true.Then set ptr->next=temp1->next ,temp1-
>next=ptr.
curr->next=ptr;
curr=ptr;
}
else
{
struct node *temp=*temp1=head;
while(temp->key<ptr->key)
temp=temp->next;
for(;temp1->next!=temp;temp1=temp1->next);
ptr->next=temp1->next;
temp1->next=ptr;
}
}
}
2. Deletion
3. Search
1. Searching Process
· When an element is tried to search, the search begins at the head element of the top list.
· It proceeds horizontally until the current element is greater than or equal to the target.
· If current element and target are matched, it means they are equal and search gets finished.
· If the current element is greater than target, the search goes on and reaches to the end of the linked
list, the procedure is repeated after returning to the previous element and the search reaches to the
next lower list (vertically).
2. Insertion
· The insertion algorithm for skip lists uses randomization to decide how many references to the new
item (k,e) should be added to the skip list.
· We then insert (k,e) in this bottom-level list immediately after position p.
· After inserting the new item at this level we “flip a coin”.
· If the flip comes up tails, then we stop right there.
· If the flip comes up heads, we move to next higher level and insert (k,e) in this level at the
appropriate position.
Exapmle:
· Suppose we want to insert 15
· Do a search, and find the spot between 10 and 23
3. Deletion
Example
1) Suppose we want to delete 34
2) Do a search, find the spot between 23 and 45
3) Remove all the position above p
1. 34<+ “drop down”
2. 34=34 “return element(after(p)”
3. Remove all the position above 34
Hash Table
0 72 H(36)=36%8=4
1 H(18)=18%8=2
2 18 H(72)=72%8=0
3 43 H(43)=43%8=3
4 36 H(6)=6%8=6
5
6 6
7
Collision
If the hash function returns same hash key for more than one element then this situation is called Collision.
(OR)
If x1 and x2 are two different keys,but the hash values of x1 and x2 are equal(i.e, h(x1)=h(x2)) then it is
called as a Collision.
0 H(131)=131%10=1
1 131
2 H(3)=3%10=3
3 3
4 4 H(4)=4%10=4
5
6 H(21)=21%10=1
7 Collision
97
8
H(61)=61%10=1
9
89
H(24)=24%10=4
H(7)=7%10=7
H(97)=97%10=7
H(89)=89%10=9
Collision Resolution Techniques
Collision Resolution Techniques are the techniques used for resolving or handling the collision.
Collision resolution techniques are classified as
Separate Chaining
To handle the collision,
This technique creates a linked list to the slot for which collision occurs.
The new key is then inserted in the linked list.
These linked lists to the slots appear like chains.
That is why, this technique is called as separate chaining.
Problem
Insert the following sequence of keys in the hash table50,
700, 76, 85, 92, 73 and 101.
Use separate chaining technique for collision resolution. Table size is 7.
Solution
Draw an empty hash table consisting of 7 buckets.
The possible range of hash values is [0, 6].
Insert the given keys in the hash table one by one.
Hashing Formula is H(Key)=Key mod Table Size.
Insert 50:
H(50)=50 mod
7H(50)=1
So,insert 50 in bucket-1 of the hash table.
Insert 700:
H(700)=700 mod
7H(700)=0
So,insert 700 in bucket-0 of the hash table.
Insert 76:
H(76)=76 mod
7H(76)=6
So,insert 76 in bucket-6 of the hash table.
Insert 85:
H(85)=85 mod
7H(85)=1
Since bucket-1 is already occupied, so collision occurs.
Separate chaining handles the collision by creating a linked list to bucket-1.
So,insert 85 in bucket-1 of the hash table.
Insert 92:
H(92)=92 mod
7H(92)=1
Since bucket-1 is already occupied, so collision occurs.
Separate chaining handles the collision by creating a linked list to bucket-1.
So,insert 92 in bucket-1 of the hash table.
Insert 73:
H(73)=73 mod
7H(73)=3
Since bucket-3 is already occupied, so collision occurs.
Separate chaining handles the collision by creating a linked list to bucket-3.
So,insert 73 in bucket-3 of the hash table.
Insert 101:
H(101)=101 mod
7H(101)=3
Since bucket-3 is already occupied, so collision occurs.
Separate chaining handles the collision by creating a linked list to bucket-3.
So,insert 101 in bucket-3 of the hash table.
Open Addressing
In open addressing,
Unlike separate chaining, all the keys are stored inside the hash table.
No key is stored outside the hash table.
Advantage
It is easy to compute.
Disadvantage
The main problem with linear probing is clustering.
Many consecutive elements form groups.
Then, it takes time to search an element or to find an empty bucket.
Problem
Insert the following sequence of keys in the hash table50,
700, 76, 85, 92, 73,101.
Use linear probing technique for collision resolution.Table size is 7
Solution
Draw an empty hash table consisting of 7 buckets.
The possible range of hash values is [0, 6].
Insert the given keys in the hash table one by one.
Hashing Formula is H(Key)=Key mod Table Size.
Insert 50:
· H(50)=50 mod 7
· H(50)=1
· So,insert 50 in bucket-1 of the hash table.
Insert
700:
· H(700)=700 mod 7
· H(700)=0
· So,insert 700 in bucket-0 of the hash table.
Insert
76:
· H(76)=76 mod 7
· H(76)=6
· So,insert 76 in bucket-6 of the hash table.
Insert
85: · H(85)=85 mod 7
· H(85)=1
· Since bucket-1 is already occupied, so collision occurs.
· To handle the collision, linear probing technique keeps probing linearly until an empty
bucket is found.
· The first empty bucket is bucket-2.
· So,insert 85 in bucket-2 of the hash table.
Insert
92: · H(92)=92 mod 7
· H(92)=1
· Since bucket-1 is already occupied, so collision occurs.
· To handle the collision, linear probing technique keeps probing linearly until an empty
bucket is found.
· The first empty bucket is bucket-3.
· So,insert 92 in bucket-3 of the hash table.
Insert
73: · H(73)=73 mod 7
· H(73)=3
· Since bucket-3 is already occupied, so collision occurs.
· To handle the collision, linear probing technique keeps probing linearly until an empty
bucket is found.
· The first empty bucket is bucket-4.
· So,insert 73 in bucket-4 of the hash table.
Insert
101: · H(101)=101 mod 7
· H(101)=3
· Since bucket-3 is already occupied, so collision occurs.
· To handle the collision, linear probing technique keeps probing linearly until an empty
bucket is found.
· The first empty bucket is bucket-5.
· So,insert 101 in bucket-5 of the hash table.
Quadratic Probing
In quadratic probing,
· When collision occurs, we probe for i2‘th bucket in ith iteration.
· We keep probing until an empty bucket is found.
Solution:
Insert 23
Hi(X)=((Hash(X)+f(i)) mod Tablesize
H0(23)=((Hash(23)+f(0)) mod 11
Hash(23)=23 mod 11
Hash(23)=1
H0(23)=((1+0)) mod 11
H0(23)=1 mod 11
H0(23)=1
0
1 23
2
3
4
5
6
7
8
9
10
Insert 13
Hi(X)=((Hash(X)+f(i)) mod Tablesize
H0(13)=((Hash(13)+f(0)) mod 11
Hash(13)=13 mod 11
Hash(13)=2
H0(13)=((2+0)) mod 11
H0(13)=2 mod 11
H0(13)=2
0
1 23
2 13
3
4
5
6
7
8
9
10
Insert 12
Hi(X)=((Hash(X)+f(i)) mod Tablesize
H0(12)=((Hash(12)+f(0)) mod 11
Hash(12)=12 mod 11
Hash(12)=1
H0(12)=((1+0)) mod 11
H0(12)=1 mod 11
H0(12)=1
Collision is Occurred at Bucket-1.
H1(12)=((Hash(12)+f(1)) mod 11
H1(12)=(1+12) mod 11
H1(12)=2 mod 11
H1(12)=2
Collision is Occurred at Bucket-2
H2(12)=((Hash(12)+f(2)) mod 11
H2(12)=(1+22) mod 11
H2(12)=5 mod 11
H2(12)=5
0
1 23
2 13
3
4
5 12
6
7
8
9
10
Insert 22
0 22
1 23
2 13
3
4
5 12
6
7
8
9
10
Insert 18
Hi(X)=((Hash(X)+f(i)) mod Tablesize
H0(18)=((Hash(18)+f(0)) mod 11
Hash(18)= 18 mod 11
Hash(18)=7
H0(18)=((7+0)) mod 11
H0(18)=7 mod 11
H0(18)=7
0 22
1 23
2 13
3
4
5 12
6
7 18
8
9
10
Insert 28
Hi(X)=((Hash(X)+f(i)) mod Tablesize
H0(28)=((Hash(28)+f(0)) mod 11
Hash(28)= 28 mod 11
Hash(28)=6
H0(28)=((6+0)) mod 11
H0(28)=6 mod 11
H0(28)=6
0 22
1 23
2 13
3
4
5 12
6 28
7 18
8
9
10
Double Hashing
Double hashing is a collision resolving technique in Open Addressed Hash tables. Double hashing uses the
idea of applying a second hash function to key when a collision occurs.
Problem
Insert the following list of keys in the hash table by using Double Hashing Technique for
Collision Resolution.Table size is 10.
89,18,49,58,69,60
Solution:
Insert 89
Hi(X)=((Hash(X)+f(i)) mod Tablesize
H0(89)=((Hash(89)+f(0)) mod 10
Hash(89)= 89 mod 10
Hash(89)=9
H0(89)=((9+0)) mod 10
H0(89)=9 mod 10
H0(89)=9
0
1
2
3
4
5
6
7
8
9 89
Insert 18
Hi(X)=((Hash(X)+f(i)) mod Tablesize
H0(18)=((Hash(18)+f(0)) mod 10
Hash(18)= 18 mod 10
Hash(18)=8
H0(18)=((8+0)) mod 10
H0(18)=8 mod 10
H0(18)=8
0
1
2
3
4
5
6
7
8 18
9 89
Insert 49
Hi(X)=((Hash(X)+f(i)) mod Tablesize
H0(49)=((Hash(49)+f(0)) mod 10
Hash(49)= 49 mod 10
Hash(49)=9
H0(49)=((9+0)) mod 10
H0(49)=9 mod 10
H0(49)=9
Collision is Occurred at Bucket-9.
H1(49)=((Hash(49)+f(1)) mod 10
H1(49)=((9+(1*Hash2(49)) mod 10
Hash2(49)=7-(49 mod 7)
Hash2(49)=7-0
Hash2(49)=7
H1(49)=((9+7) mod 10
H1(49)=16 mod 10
H1(49)=6
0
1
2
3
4
5
6 49
7
8 18
9 89
Insert 58
Hi(X)=((Hash(X)+f(i)) mod Tablesize
H0(58)=((Hash(58)+f(0)) mod 10
Hash(58)= 58 mod 10
Hash(58)=8
H0(58)=((8+0)) mod 10
H0(58)=8 mod 10
H0(58)=8
Collision is Occurred at Bucket-8.
H1(58)=((Hash(58)+f(1)) mod 10
H1(58)=((8+(1*Hash2(58)) mod 10
Hash2(58)=7-(58 mod 7)
Hash2(58)=7-2
Hash2(58)=5
H1(58)=((8+5) mod 10
H1(58)=13 mod 10
H1(58)=3
0
1
2
3 58
4
5
6 49
7
8 18
9 89
Insert 69
Hi(X)=((Hash(X)+f(i)) mod Tablesize
H0(69)=((Hash(69)+f(0)) mod 10
Hash(69)= 69 mod 10
Hash(69)=9
H0(69)=((9+0)) mod 10
H0(69)=9 mod 10
H0(69)=9
Collision is Occurred at Bucket-9.
H1(69)=((Hash(69)+f(1)) mod 10
H1(69)=((9+(1*Hash2(69)) mod 10
Hash2(69)=7-(69 mod 7)
Hash2(69)=7-6
Hash2(69)=1
H1(69)=((9+1) mod 10
H1(69)=10 mod 10
H1(69)=0
0 69
1
2
3 58
4
5
6 49
7
8 18
9 89
Insert 60
Hi(X)=((Hash(X)+f(i)) mod Tablesize
H0(60)=((Hash(60)+f(0)) mod 10
Hash(60)= 60 mod 10
Hash(60)=0
H0(60)=((0+0)) mod 10
H0(60)=0 mod 10
H0(60)=0
Collision is Occurred at Bucket-0.
H1(60)=((Hash(60)+f(1)) mod 10
H1(60)=((0+(1*Hash2(60)) mod 10
Hash2(60)=7-(60 mod 7)
Hash2(60)=7-4
Hash2(60)=3
H1(60)=((0+3) mod 10
H1(60)=3 mod 10
H1(60)=3
Collision is Occurred at Bucket-3.
H2(60)=((Hash(60)+f(2)) mod 10
H2(60)=((0+(2*Hash2(60)) mod 10
H2(60)=((0+(2*3)) mod 10
H2(60)=((0+6) mod 10
H2(60)=6 mod 10
H2(60)=6
Collision is Occurred at Bucket-6.
H3(60)=((0+(3*Hash2(60)) mod 10
H3(60)=((0+(3*3)) mod 10
H3(60)=9 mod 10
H3(60)=9
Collision is Occurred at Bucket-9.
H4(60)=((0+(4*Hash2(60)) mod 10
H4(60)=((0+(4*3)) mod 10
H4(60)=12 mod 10
H4(60)=2
0 60
1
2 60
3 58
4
5
6 49
7
8 18
9 89
Load Factor(α)-
Loadfactor(α)isdefinedas-
Load Factor (α) = M / N
Where,
M = Number of elements present in the
hash tableN = Total size of the hash table.
The default load factor of Hash Map is 0.75f (75% of the map size).
When the load factor ratio (m/n) reaches 0.75 at that time, hash map increases its capacity.
How Load Factor is calculated :
Now check that we need to increase the hashmap capacity
or not Number of elements present in the hash table
(M)=9
Total size of the hash table ( N ) = 11
Load Factor (α) = M / N
= 9 / 11 = 0.81
Now compare this value with the default
factor0.81 > 0.75
Now we need to increase the hashmap size.
In open addressing, the value of load factor always lie between 0and
1. This is because-
· In open addressing, all the keys are stored inside the hash table.
· So, size of the table is always greater or at least equal to the number of keys stored in the table.
REHASHING
Rehashing is a technique in which the table is resized, i.e., the size of table is doubled by creating a
new table. It is preferable is the total size of table is a prime number. There are situations in which
the rehashing is required.
When table is completely full
With quadratic probing when the table is filled half.
When insertions fail due to overflow.
In such situations, we have to transfer entries from old table to the new table by re computing
their positions using hash functions.
Example:
Consider we have to insert the elements 37, 90, 55, 22, 16, 49, 33 and 88.
Table size is 10
Solution :
h(37) = 37 mod 10 = 7
Buckets Key
h(90) = 90 mod 10 = 0
h(55) = 55 mod 10 = 5 0 90
h(22) = 22 mod 10 = 2 1
h(16) = 16 mod 10 = 6 2 22
h(49) = 49 mod 10 = 9 3 33
h(33) = 33 mod 10 = 3 4
h(88) = 88 mod 10 = 8 5 55
6 16
7 37
8 88
9 49
Now this table is almost full and if we try to insert more elements collisions will occur and eventually
further insertions will fail. Hence we will rehash by doubling the table size. The old table size is 10 then
we should double this size for new table that becomes 20. But 20 is not a prime number, we will prefer to
make the table size as 23.
New Hash Table
Buckets Key
h(X) = X mod Table size
0
h(37) = 37 mod 23 = 14 1
h(90) = 90 mod 23 = 21 2
h(55) = 55 mod 23 = 9
3 49
h(22) = 22 mod 23 = 22
4
h(16) = 16 mod 23 = 16
5
h(49) = 49 mod 23 = 3
6
h(33) = 33 mod 23= 10
7
h(88) = 88 mod 23 = 19
8
9 55
10 33
11
12
13
14 37
15
16 16
17
18
19 88
20
21 90
22 22
Advantages:
1. This technique provides the programmer a flexibility to enlarge the table size if required.
2. Only the space gets doubled with simple hash function which avoids occurrence of collisions.
EXTENDABLE HASHING
Extendible Hashing is a dynamic hashing method wherein directories, and buckets are
used to hash data. It is an aggressively flexible method in which the hash function also experiences
dynamic changes.
Main features of Extendible Hashing: The main features in this hashing technique are:
Step 1 – Analyze Data Elements: Data elements may exist in various forms eg. Integer, String, Float, etc..
Currently, let us consider data elements of type integer. eg: 49.
Step 2 – Convert into binary format: Convert the data element in Binary form. For string elements,
consider the ASCII equivalent integer of the starting character and then convert the integer into binary form.
Since we have 49 as our data element, its binary form is 110001.
Step 3 – Check Global Depth of the directory. Suppose the global depth of the Hash-directory is 3.
Step 4 – Identify the Directory: Consider the ‘Global-Depth’ number of LSBs in the binary number and
match it to the directory id.
Eg. The binary obtained is: 110001 and the global-depth is 3. So, the hash function will return 3 LSBs
of110001 viz. 001.
Step 5 – Navigation: Now, navigate to the bucket pointed by the directory with directory-id 001.
Step 6 – Insertion and Overflow Check: Insert the element and check if the bucket overflows. If an
overflow is encountered, go to step 7 followed by Step 8, otherwise, go to step 9.
Step 7 – Tackling Over Flow Condition during Data Insertion: Many times, while inserting data in
thebuckets, it might happen that the Bucket overflows. In such cases, we need to follow an appropriate
procedure to avoid mishandling of data.
First, Check if the local depth is less than or equal to the global depth. Then choose one of the cases below.
Case 1: If the local depth of the overflowing Bucket is equal to the global depth, then Directory
Expansion, as well as Bucket Split, needs to be performed. Then increment the global depth and
the local depth value by 1. And, assign appropriate pointers.
Directory expansion will double the number of directories present in the hash structure.
Case 2: In case the local depth is less than the global depth, then only Bucket Split takes place.
Then increment only the local depth value by 1. And, assign appropriate pointers.
Step 8 – Rehashing of Split Bucket Elements: The Elements present in the overflowing bucket that is split
are rehashed w.r.t the new global depth of the directory.
Key Binary
s Form
16 10000
4 00100
6 00110
22 10110
24 11000
10 01010
31 11111
7 00111
9 01001
20 10100
26 11010
Step1 : Initially, the global-depth and local-depth is always 1. Thus, the hashing frame is
id=0.
id=0.
Since Local Depth = Global Depth, the bucket splits and directory expansion takes place. Also,
rehashing of numbers present in the overflowing bucket takes place after the split. And, since the
global depth is incremented by 1, now,the global depth is 2. Hence, 16,4,6,22 are now rehashed
w.r.t 2 LSBs.[ 16(10000),4(100),6(110),22(10110) ]
Step 6: Inserting 24 : The binary format of 24 is 11000 and Global depth is 2.
It is LSB of 11111 which is 11. Hence, 31 is mapped to the directory with id=11.
It is LSB of 00111 which is 11. Hence, 7 is mapped to the directory with id=11.
Step 10: Inserting 9 : The binary format of 9 is 01001 and Global depth is 2.
It is LSB of 01001 which is 01. Hence, 9 is mapped to the directory with id=01.
Step 11: Inserting 20 : The binary format of 20 is 10100 and Global depth is
2. It is LSB of 10100 which is 00. Hence, 20 is mapped to the directory with
id=00.
Since Local Depth = Global Depth, the bucket splits and directory expansion takes place. Also,
rehashing of numbers present in the overflowing bucket takes place after the split. And, since the
global depth is incremented by 1, now, the global depth is 3. Hence, 16,4,24,20 are now rehashed
w.r.t 3 LSBs.[ 16(10000),4(00100),24(11000),20(10100) ]
Step 12: Inserting 26 : The binary format of 26 is 11010 and Global depth is 3.
It is LSB of 11010 which is 010. Hence, 26 is mapped to the directory with id=010.
The bucket pointed by directory 010 is Already full. Hence , overflow occurs.