05 CS19241 DS Unit V LN 2020
1.1 INTRODUCTION
Linear search
Binary search
1. What is searching?
2. What are the types of searching?
2.1 INTRODUCTION
Consider an array of integers A containing n elements. Let k be the value that needs to
be searched. The linear search technique will first compare A[0] with k to find out if they are
same. If the two values are found to be same then the index value, i.e., 0 will be returned as the
location containing k. However, if the two values are not same then k will be compared with
A[1]. This process is repeated until the element is found or the end of the array is reached. If
the last comparison, between k and A[n - 1], also fails, the search is considered unsuccessful.
Figure depicts the linear search technique performed on an array of integers.
As shown in Fig., the value k is repeatedly compared with each element of the array A.
As soon as the element is found, the corresponding index location is returned and the search
operation is terminated.
2.2 ALGORITHM
LinearSearch(A[], N, KEY)
Step 1 : Start.
Step 2 : Repeat For I = 0 to N-1.
Step 3 : If A[I] = KEY then Goto Step 4 else Goto Step 5.
Step 4 : Return I and Stop.
Step 5 : Increment I by 1.
Step 6 : [End of Step 3 For loop].
Step 7 : Return -1.
Step 8 : Stop.
2.4 PROGRAM
#include <stdio.h>
int main()
{
int n, a[10], i, key, pos;
printf("Enter the limit : ");
scanf("%d", &n);
printf("Enter the elements : ");
for (i = 0; i < n; i++)
scanf("%d", &a[i]);
printf("Enter the element to search : ");
scanf("%d", &key);
pos = LinearSearch(a, n, key);
if (pos != -1)
printf("Element found at location : %d", pos);
else
printf("Element not found.");
return 0;
}
Output
Assume that an array containing n elements is to be searched for the value k. In the best
case, k would be the first element in the list, thus requiring only one comparison. In the worst
case, it would be last element in the list, thus requiring n comparisons.
To compute the average-case efficiency of linear search, we can add all the possible numbers of
comparisons and divide the sum by n, which gives approximately (n + 1)/2 comparisons, i.e., O(n) time.
3.1 INTRODUCTION
Binary search technique has a prerequisite – it requires the elements of a data structure
(list) to be already arranged in a sorted manner before search can be performed in it. It begins
by comparing the element that is present at the middle of the list. If there is a match, then the
search ends immediately and the location of the middle element is returned. However, if there
is a mismatch, the search focuses on either the left or the right sub-list, depending on
whether the target element is less than or greater than the middle element. The same
methodology is repeated until the target element is found or the sub-list becomes empty.
As we can see in Fig., the array A on which binary search is to be performed is already
sorted. The following steps describe how binary search is performed on array A to search for
value k:
1. First of all, the middle element in the array A is identified, which is 18.
2. Now, k is compared with 18. Since k is greater than 18, the search is focused on the
right sub list.
3. The middle element in the right sub list is 33. Since k is less than 33, the search is
focused on the left sub list, which is {21, 33}.
4. Now, again k is compared with the middle element of {21, 33}, which is 21. Thus, it
matches with k.
5. The index value of 21, i.e., 4 is returned and the search is considered as successful.
3.2 ALGORITHM
BinarySearch(A[], N, KEY)
Step 1 : Start.
Step 2 : Set FIRST = 0.
Step 3 : Set LAST = N - 1.
Step 4 : Repeat While FIRST <= LAST.
Step 5 : Set MID = (FIRST + LAST) / 2.
Step 6 : If A[MID] = KEY then Goto Step 7 else Goto Step 8.
Step 7 : Return MID and Stop.
Step 8 : If A[MID] < KEY then Goto Step 9 else Goto Step 10.
Step 9 : Set FIRST = MID + 1 and Goto Step 11.
Step 10 : Set LAST = MID - 1.
Step 11 : [End of Step 4 While loop].
Step 12 : Return -1.
Step 13 : Stop.
3.3 ROUTINE
3.4 PROGRAM
#include <stdio.h>
int main()
{
int n, a[10], i, key, pos;
printf("Enter the limit : ");
scanf("%d", &n);
printf("Enter the elements in sorted order : ");
for (i = 0; i < n; i++)
scanf("%d", &a[i]);
printf("Enter the element to search : ");
scanf("%d", &key);
pos = BinarySearch(a, n, key);
if (pos != -1)
printf("Element found at location : %d", pos);
else
printf("Element not found.");
return 0;
}
Output
Best Case
The best case for a binary search algorithm occurs when the element to be searched is
present at the middle of the list. In this case, only one comparison is made to perform the search
operation.
Worst Case
The worst case for a binary search algorithm occurs when the element to be searched is
not present in the list. In this case, the list is continuously divided until only one element is left
for comparison.
Let n be the number of list elements and c be the total number of comparisons made in
the worst case.
Now, after every single comparison, the number of list elements left to be searched is
reduced to half. Thus, in the worst case, c is approximately log2(n) + 1, i.e., binary search
runs in O(log n) time.
Unlike linear search, it requires the list to be sorted before search can be performed.
In comparison to linear search, the binary search technique may seem to be a little
difficult to implement.
11.1 INTRODUCTION
Sorting and searching are two of the most common operations performed by computers
all around the world. The sorting operation arranges the numerical and alphabetical data
present in a list, in a specific order or sequence. Searching, on the other hand, locates a specific
element across a given list of elements. At times, a list may require sorting before the search
operation can be performed on it.
Sorting Techniques
Internal Sorting
External Sorting
All sorting techniques which require the data set to be present in the main memory are
referred to as internal sorting techniques. Examples:
Insertion sort
Selection sort
Shell sort
Bubble sort
Quick sort
Heap sort
External sorting takes place in the secondary memory of a computer, since the number
of objects to be sorted is too large to fit in the main memory. Examples:
Merge Sort
Multiway Merge
Polyphase merge
1. What is sorting?
2. What are the types of sorting?
3. Differentiate between internal sorting and external sorting. (or) Differentiate internal and
external sorting.
4. List the four types of sorting techniques.
5. What is the need for external sorting?
6. List sorting algorithm which uses logarithmic time complexity.
12.1. INTRODUCTION
Insertion sort works by taking elements from the list one by one and inserting each of
them at its correct position in the already-sorted part of the list. Insertion sort consists of
N - 1 passes, where N is the number of elements to be sorted. The ith pass of insertion sort
inserts the element A[i] into its rightful place among A[1], A[2], . . . A[i-1]. After this
insertion, the elements occupying A[1] . . . A[i] are in sorted order.
The following list describes the tasks performed in each of the passes:
Pass 1: A[2] is compared with A[1] and inserted either before or after A[1]. This makes
A[1], A[2] a sorted sub array.
Pass 2: A[3] is compared with both A[1] and A[2] and inserted at an appropriate place.
This makes A[1], A[2], A[3] a sorted sub array.
Pass n–1: A[n] is compared with each element in the sub array A[1], A[2], A[3], … A[n-1]
and inserted at an appropriate position. This eventually makes the entire array A sorted.
12.2. ALGORITHM
InsertionSort(A[], N)
Step 1 : Start.
Step 2 : Repeat For I = 1 to N-1.
Step 3 : Set TEMP = A[I].
Step 4 : Set J = I.
Step 5 : Repeat While J > 0 and A[J - 1] > TEMP.
Step 6 : Set A[J] = A[J - 1].
Step 7 : Decrement J by 1.
Step 8 : [End of Step 5 While loop].
Step 9 : Set A[J] = TEMP.
Step 10 : Increment I by 1.
Step 11 : [End of Step 2 For loop].
Step 12 : Stop.
12.4. EXAMPLE
20 10 60 40 30 15

                 Array contents          Positions Moved
Original         20 10 60 40 30 15
After i = 1      10 20 60 40 30 15       1
After i = 2      10 20 60 40 30 15       0
After i = 3      10 20 40 60 30 15       1
After i = 4      10 20 30 40 60 15       2
After i = 5      10 15 20 30 40 60       4
Sorted Array     10 15 20 30 40 60
12.5 PROGRAM
#include <stdio.h>
int main()
{
int n, i, a[10];
printf("Enter the limit : ");
scanf("%d", &n);
printf("Enter the elements : ");
for (i = 0; i < n; i++)
scanf("%d", &a[i]);
InsertionSort(a, n);
printf("The sorted elements are : ");
for (i = 0; i < n; i++)
printf("%d\t", a[i]);
return 0;
}
Output
Assume that an array containing n elements is sorted using insertion sort technique.
The efficiency of O(n²) is not well suited for large-sized lists.
It is expensive because inserting an element requires shifting all the following elements by one position.
13.1 INTRODUCTION
Selection sort is one of the most basic sorting techniques. It works on the principle of
identifying the smallest element in the list and moving it to the beginning of the list. This process
is repeated until all the elements in the list are sorted.
13.2 ALGORITHM
SelectionSort(A[], N)
Step 1 : Start.
Step 2 : Repeat For I = 0 to N - 1.
Step 3 : Set MIN = I.
Step 4 : Repeat For J = I + 1 to N - 1.
Step 5 : If A[J] < A[MIN] then Goto Step 6 else Goto Step 7.
Step 6 : Set MIN = J.
Step 7 : Increment J by 1.
Step 8 : [End of Step 4 For loop].
Step 9 : Set TEMP = A[I].
Step 10 : Set A[I] = A[MIN].
Step 11 : Set A[MIN] = TEMP.
Step 12 : Increment I by 1.
Step 13 : [End of Step 2 For loop].
Step 14 : Stop.
13.3 ROUTINE
Let us consider an example where a list L contains five integers stored in a random
fashion, as shown:
Now, if the list L is sorted using selection sort technique then first of all the first element
in the list, i.e., 18 will be selected and compared with all the remaining elements in the list. The
element which is found to be the lowest amongst the remaining set of elements will be swapped
with the first element. Then, the second element will be selected and compared with the
remaining elements in the list. This process is repeated until all the elements are rearranged in
a sorted manner.
Table illustrates the sorting of list L in ascending order using selection sort.
A single iteration of the selection sort technique, which brings the smallest remaining
element to the front of the unsorted part of the list, is called a pass. As we can see in the
above table, four passes were required to sort a list of five elements. Hence, we can say that
selection sort requires n-1 passes to sort an array of n elements.
13.5. PROGRAM
#include <stdio.h>
int main()
{
int i, n, a[10];
printf("Enter the limit : ");
scanf("%d", &n);
printf("Enter the elements : ");
for (i = 0; i < n; i++)
scanf("%d", &a[i]);
SelectionSort(a, n);
printf("The sorted elements are : ");
for (i = 0; i < n; i++)
printf("%d\t", a[i]);
return 0;
}
Output
Assume that an array containing n elements is sorted using selection sort technique.
The disadvantages associated with selection sort that prevent the programmers from
using it often are as follows:
The efficiency of O(n²) is not well suited for large-sized lists.
It does not leverage the presence of any existing sort pattern in the list.
14.1 INTRODUCTION
Shell sort is an algorithm that first sorts the elements far apart from each other and
successively reduces the interval between the elements to be sorted. It is a generalized version
of insertion sort.
In shell sort, elements at a specific interval are sorted. The interval between the
elements is gradually decreased based on the sequence used. The performance of the shell sort
depends on the type of sequence used for a given input array.
14.2. ALGORITHM
ShellSort(A[], N)
Step 1 : Start.
Step 2 : Repeat For GAP = N / 2 down to 1, halving GAP after each pass.
Step 3 : Repeat For I = GAP to N - 1.
Step 4 : Set TEMP = A[I].
Step 5 : Repeat For J = I; While J >= GAP and A[J - GAP] > TEMP; Decrementing J by GAP.
Step 6 : Set A[J] = A[J - GAP].
Step 7 : [End of Step 5 For loop].
Step 8 : Set A[J] = TEMP.
Step 9 : [End of Step 3 For loop].
Step 10 : [End of Step 2 For loop].
Step 11 : Stop.
14.3 ROUTINE
14.4 EXAMPLE
Original       81 94 11 96 12 35 17 95 28 58
After 5-sort   35 17 11 28 12 81 94 95 96 58
After 3-sort   28 12 11 35 17 81 58 95 96 94
After 1-sort   11 12 17 28 35 58 81 94 95 96
14.5 PROGRAM
#include <stdio.h>
void ShellSort(int a[], int n);
int main()
{
int i, n, a[10];
printf("Enter the limit : ");
scanf("%d", &n);
printf("Enter the elements : ");
for (i = 0; i < n; i++)
scanf("%d", &a[i]);
ShellSort(a, n);
printf("The sorted elements are : ");
for (i = 0; i < n; i++)
printf("%d\t", a[i]);
return 0;
}
Output
It is one of the fastest sorting techniques for sorting a small number of elements.
It requires relatively small amount of memory.
15.1 INTRODUCTION
Bubble sort is one of the simplest internal sorting algorithms. Bubble sort works by
comparing two consecutive elements and swapping them if they are out of order, so that the
larger of the two bubbles towards the right. This comparison is repeated for every adjacent
pair in the list, so at the end of the first pass the largest element gets placed at the end of the
list. Each subsequent pass moves the largest remaining element to the end of the unsorted part.
Bubble sort consists of n-1 passes, where ‘n’ is the number of elements to be sorted. In
the 1st pass, the largest element will be placed in the nth position. In the 2nd pass, the second
largest element will be placed in the (n-1)th position. In (n-1)th pass only the first two elements
are compared.
15.2 ALGORITHM
BubbleSort(A[], N)
Step 1 : Start.
Step 2 : Repeat For I = 0 to N - 2.
Step 3 : Repeat For J = 0 to N - 2 - I.
Step 4 : If A[J] > A[J + 1] then swap A[J] and A[J + 1].
Step 5 : Increment J by 1.
Step 6 : [End of Step 3 For loop].
Step 7 : Increment I by 1.
Step 8 : [End of Step 2 For loop].
Step 9 : Stop.
15.3 ROUTINE
void BubbleSort(int a[], int n)
{
int i, j, t;
for (i = 0; i < n - 1; i++)
{
for (j = 0; j < n - 1 - i; j++)
{
if (a[j] > a[j + 1])
{
t = a[j];
a[j] = a[j + 1];
a[j + 1] = t;
}
}
}
}
Let us consider an example where a list L contains five integers stored in a random
fashion, as shown:
Table illustrates the sorting of list L in ascending order using bubble sort:
As we can see in the above illustration, four passes were required to sort a list of five
elements. Hence, we can say that bubble sort requires n–1 passes to sort an array of n elements.
15.5 PROGRAM
#include <stdio.h>
int main()
{
int a[10], i, n;
printf("Enter the limit : ");
scanf("%d", &n);
printf("Enter the numbers : ");
for (i = 0; i < n; i++)
scanf("%d", &a[i]);
BubbleSort(a, n);
printf("The sorted elements are : ");
for (i = 0; i < n; i++)
printf("%d\t", a[i]);
return 0;
}
[Link] | AP (SG) | CSE | Rajalakshmi Engineering College 35
void BubbleSort(int a[], int n)
{
int i, j, t;
for (i = 0; i < n - 1; i++)
{
for (j = 0; j < n - 1 - i; j++)
{
if (a[j] > a[j + 1])
{
t = a[j];
a[j] = a[j + 1];
a[j + 1] = t;
}
}
}
}
Output
Assume that an array containing n elements is sorted using bubble sort technique.
The disadvantages associated with bubble sorting technique are given below:
The efficiency of O(n²) is not well suited for large-sized lists.
It requires a large number of elements to be shifted.
It is slow in execution, as large elements are moved towards the end of the list in a
step-by-step fashion.
16.1. INTRODUCTION
Quick sort is one of the most efficient internal sorting techniques. It possesses a very good
average-case behaviour among all the sorting techniques. It is also called partitioning sort,
and it uses the divide and conquer technique.
The quick sort works by partitioning the array A[1], A[2] . . . A[n] by picking some key
value in the array as a pivot element. Pivot element is used to rearrange the elements in the
array. Pivot can be the first element of an array and the rest of the elements are moved so that
the elements on left side of the pivot are lesser than the pivot, whereas those on the right side
are greater than the pivot. Now, the pivot element is placed in its correct position. Now the
quicksort procedure is applied for left array and right array in a recursive manner.
16.2. ALGORITHM
QuickSort(A[], LEFT, RIGHT)
Step 1 : Start.
Step 2 : If LEFT < RIGHT then Goto Step 3 else Goto Step 23.
Step 3 : Set PIVOT = LEFT.
Step 4 : Set I = LEFT + 1.
Step 5 : Set J = RIGHT.
Step 6 : Repeat While I < J.
Step 7 : Repeat While A[I] < A[PIVOT].
Step 8 : Increment I by 1.
Step 9 : [End of Step 7 While loop].
Step 10 : Repeat While A[J] > A[PIVOT].
Step 11 : Decrement J by 1.
Step 12 : [End of Step 10 While loop].
Step 13 : If I < J then Goto Step 14 else Goto Step 17.
Step 14 : Set TEMP = A[I].
Step 15 : Set A[I] = A[J].
Step 16 : Set A[J] = TEMP.
Step 17 : [End of Step 6 While loop].
Step 18 : Set TEMP = A[PIVOT].
Step 19 : Set A[PIVOT] = A[J].
Step 20 : Set A[J] = TEMP.
Step 21 : QUICKSORT(LEFT, J – 1).
Step 22 : QUICKSORT(J + 1, RIGHT).
Step 23 : Stop.
16.4 EXAMPLE
40 20 70 14 60 61 97 30
The value of i is incremented while a[i] < pivot and the value of j is decremented while a[j]
> pivot; this process is repeated as long as i < j.
If a[i] > pivot, a[j] < pivot and i < j, then swap a[i] and a[j].
When i crosses j (i > j), swap a[j] and a[pivot].
Once the correct location for PIVOT is found, then partition array into left sub array and
right subarray, where left sub array contains all the elements less than the PIVOT and right sub
array contains all the elements greater than the PIVOT.
40 20 70 14 60 61 97 30     Pivot = 40; i stops at 70 (a[i] > pivot), j stops at 30 (a[j] < pivot)
Since i < j, a[i] and a[j] are swapped:
40 20 30 14 60 61 97 70     i stops at 60, j stops at 14; i and j have now crossed (i > j)
Since i > j, a[j] is swapped with the pivot:
14 20 30 40 60 61 97 70
Now, the pivot element has reached its correct position. The elements less than the
pivot, {14, 20, 30}, form the left sub-array, and the elements greater than the pivot, {60, 61,
97, 70}, form the right sub-array. The QuickSort procedure is then applied recursively to
both these sub-arrays.
16.5 PROGRAM
#include <stdio.h>
int main()
{
int i, n, a[10];
printf("Enter the limit : ");
scanf("%d", &n);
printf("Enter the elements : ");
for (i = 0; i < n; i++)
scanf("%d", &a[i]);
QuickSort(a, 0, n - 1);
printf("The sorted elements are : ");
for (i = 0; i < n; i++)
printf("%d\t", a[i]);
return 0;
}
Output
The worst case efficiency of O(n²) is not well suited for large-sized lists.
Its algorithm is considered as a little more complex in comparison to some other
sorting techniques.
17.1 INTRODUCTION
The most common algorithm used in external sorting is the merge sort. This algorithm
follows divide and conquer strategy.
In dividing phase, the problem is divided into smaller problem and solved
recursively.
In conquering phase, the partitioned array is merged together recursively.
Merge sort is applied to the first half and second half of the array. This gives two sorted
halves, which can then be recursively merged together using the merging algorithm.
The basic merging algorithm takes two input arrays A and B and an output array C. The
first element of A array and B array are compared, then the smaller element is stored in the
output array C and the corresponding pointer is incremented. When either input array is
exhausted the remainder of the other array is copied to an output array C.
17.2 ALGORITHM
Merge(ARR[], LEFT, CENTER, RIGHT)
Step 1 : Start.
Step 2 : Set N1 = CENTER - LEFT + 1.
Step 3 : Set N2 = RIGHT - CENTER.
Step 4 : Repeat For I = 0 to N1 - 1.
Step 5 : Set A[I] = ARR[LEFT + I].
Step 6 : Increment I by 1.
Step 7 : [End of Step 4 For loop].
Step 8 : Repeat For J = 0 to N2 - 1.
Step 9 : Set B[J] = ARR[CENTER + 1 + J].
Step 10 : Increment J by 1.
Step 11 : [End of Step 8 For loop].
Step 12 : Set APTR = 0, BPTR = 0, CPTR = LEFT.
Step 13 : Repeat While APTR < N1 AND BPTR < N2.
Step 14 : If A[APTR] <= B[BPTR] then Goto Step 15 else Goto Step 17.
Step 15 : Set ARR[CPTR] = A[APTR].
Step 16 : Increment APTR by 1 and Goto Step 19.
Step 17 : Set ARR[CPTR] = B[BPTR].
Step 18 : Increment BPTR by 1.
Step 19 : Increment CPTR by 1.
Step 20 : [End of Step 13 While loop].
Step 21 : Repeat While APTR < N1.
Step 22 : Set ARR[CPTR] = A[APTR].
Step 23 : Increment APTR and CPTR by 1.
Step 24 : [End of Step 21 While loop].
Step 25 : Repeat While BPTR < N2.
Step 26 : Set ARR[CPTR] = B[BPTR].
Step 27 : Increment BPTR and CPTR by 1.
Step 28 : [End of Step 25 While loop].
Step 29 : Stop.
17.4 EXAMPLE
For instance, to sort the eight-element array 24, 13, 26, 1, 2, 27, 38, 15, we recursively
sort the first four and last four elements, obtaining 1, 13, 24, 26 and 2, 15, 27, 38. The
merging algorithm is then applied to these two sorted halves to get the final sorted array.
24 13 26 1 2 27 38 15
13, 24 1, 26 2, 27 15, 38
Let us consider first 4 elements 1, 13, 24, 26 as A array and the next four elements 2, 15,
27, 38 as B array.
A : 1 13 24 26        B : 2 15 27 38
First, the element 1 from the A array and the element 2 from the B array are compared;
the smaller element, 1, is copied to the output array C, and the pointers Aptr and Cptr are
incremented by one. The process then continues:
Compare 1 and 2, copy 1 :       C : 1
Compare 13 and 2, copy 2 :      C : 1 2
Compare 13 and 15, copy 13 :    C : 1 2 13
Compare 24 and 15, copy 15 :    C : 1 2 13 15
Compare 24 and 27, copy 24 :    C : 1 2 13 15 24
Compare 26 and 27, copy 26 :    C : 1 2 13 15 24 26
Since the A array is exhausted, the remaining elements of the B array are copied to C:
C : 1 2 13 15 24 26 27 38
17.5 PROGRAM
#include <stdio.h>
int main()
{
int i, n, arr[20];
printf("Enter the limit : ");
scanf("%d", &n);
printf("Enter the elements : ");
for (i = 0; i < n; i++)
scanf("%d", &arr[i]);
MergeSort(arr, 0, n - 1);
printf("The sorted elements are : ");
for (i = 0; i < n; i++)
printf("%d\t", arr[i]);
return 0;
}
void MergeSort(int arr[], int left, int right)
{
int center;
if (left < right)
{
center = (left + right) / 2;
MergeSort(arr, left, center);
MergeSort(arr, center + 1, right);
Merge(arr, left, center, right);
}
}
17.7 ADVANTAGES
It is well suited for sorting large amounts of data, making use of external storage devices.
The number of comparisons made by merge sort is nearly optimal.
17.8 LIMITATIONS
It requires additional memory space to perform sorting. The size of the additional
space is in direct proportion to the size of the input list.
Even though the number of comparisons made by merge sort is nearly optimal, its
performance is slightly lower than that of quick sort.
18.1 INTRODUCTION
In heap sort, the array is interpreted as a binary tree. The method has two phases: in
phase 1 a binary heap is constructed, and in phase 2 a delete routine (DeleteMin or
DeleteMax) is applied repeatedly.
Phase 1:
Two properties of a binary heap namely structure property and heap order property is
achieved in phase 1.
Structure Property
For any element in array position i, the left child is in position 2i and the right child is in
position 2i + 1, i.e., the cell after the left child.
Heap Order Property
The key value in the parent node is smaller than or equal to the key value of any of its
child nodes. To build the heap, apply the heap order property starting from the rightmost
non-leaf node at the bottom level.
Phase 2
The array elements are sorted using deletemin operation, which gives the elements
arranged in descending order.
Remove the smallest element from the heap and place it in the array. After the last
DeleteMin, the array will contain the elements in descending order. To get the elements
sorted in ascending order, the DeleteMax routine is applied instead, for which the heap
order property is reversed, i.e., the parent has a larger key value than its children. The root
node then contains the maximum key value, so the DeleteMax routine is applied repeatedly
to this heap, each time removing the maximum element from the heap and placing it in the
array.
4.1 INTRODUCTION
Static hashing
Dynamic hashing
In static hashing, the hash function maps search key values to a fixed set of locations.
In dynamic hashing, the hash table can grow to handle more items. The associated hash
function must change as the table grows.
A hash table is a data structure, which is implemented by a hash function and used for
searching elements in quick time. In a hash table, hash keys act as the addresses of the elements.
Applications of hashing include:
Compilers
Graph theory problem
Online spelling checkers etc.
Database systems
Symbol tables
Data dictionaries
Network processing algorithms
Browse caches
5.1 INTRODUCTION
Hashing finds the location of an element in a data structure without making any
comparisons. In contrast to the other comparison-based searching techniques, like linear and
binary search, hashing uses a mathematical function to determine the location of an element.
This mathematical function called hash function accepts a value, known as key, as input and
generates an output known as hash key. The hash function generates hash keys and stores
elements corresponding to each hash key in the hash table. The keys that the hash function
accepts as input could be numbers associated with the input elements.
Let us consider a simple example of a file containing information for five employees of a
company. Each record in that file contains the name and a three-digit numeric Employee ID of
the employee. In this case, the hash function will implement a hash table of five slots using
Employee IDs as the keys. That means, the hash function will take Employee IDs as input and
generate the hash keys, as shown in Fig.
In the hash table generated in the above example, the hash function is Employee ID%10.
Thus, for Employee ID 101, hash key will be calculated as 1. Therefore, Name1 will be stored at
position 1 in the hash table. For Employee ID 102, hash key will be 2, hence Name2 will be
stored at position 2 in the hash table. Similarly, Name3, Name4, and Name5 will be stored at
position 4, 3, and 5 respectively, as shown in Fig. Later, whenever an employee record is
searched using the Employee ID, the hash function will indicate the exact position of that record
in the hash table.
A good hash function should:
Minimize collisions
Be easy and quick to compute
Distribute key values evenly in the hash table
Use all the information provided in the key
In this method, the key is squared and the middle part of the result based on the number
of digits required for addressing is taken as the hash value. This method works well if the keys
do not contain a lot of leading and trailing zeros.
For example:
Map the key 2453 into a hash table of size 1000; here, X = 2453.
X² = 6017209. Taking the middle three digits gives the hash value 172.
This method computes the hash value from the key using the modulo (%) operator.
Here, the table size that is power of 2 like 32, 64 and 1024 should be avoided as it leads to more
collisions. It is always better to select table size not close to the power of 2.
For example:
H(4) = 4 % 4 = 0
This method involves splitting keys into two or more parts each of which has the same
length as the required address with the possible exception of the last part and then adding the
parts to form the hash address. There are two types of folding method. They are:
Key is broken into several parts of same length of the required address and then added
to get the hash value. Final carry is ignored.
For example:
X = 123203241 is partitioned into 123, 203 and 241; adding these three parts gives
123 + 203 + 241 = 567 as the hash value.
Key is broken into several parts and reverse the digits in the outermost partitions and
then add the partition to form the hash value.
For example:
X = 123203241 is partitioned into 123, 203 and 241; reversing the digits in the outermost
partitions gives 321 and 142, and adding the parts gives 321 + 203 + 142 = 666 as the
hash value.
This method generates random number given a seed as parameter and the resulting
random number then scaled into the possible address range using modulo division. It must
ensure that it always generates the same random value for a given key. The random number
produced can be transformed to produce a valid hash value.
Xi+1 = (A * Xi) % TableSize
This method extracts selected digits from the key and uses them as the address;
alternatively, the extracted digits can be reversed to give the hash value.
For example:
Similarly, the hash value can also be obtained by considering the above digit positions
and reversing it, which yields the hash value as 4022.
In this method a key is transformed into another number base to obtain the hash value.
For example:
Map the key (8465)10 in the range 0 to 999 using base 15.
(8465)10 = (2795)15. Treating 2795 as the transformed key and taking its last three digits
gives the hash value 795.
6.1 INTRODUCTION
The hash function takes some key values as input, performs some mathematical
calculation, and generates hash key to ascertain the position in the hash table where the record
corresponding to the key will be stored. However, it is quite possible that the hash function
generates same hash keys for two different key values. That means, two different records are
indicated to be stored at the same position in the hash table. This situation is termed as collision.
As a result, a hash function must be designed in such a way that the possibility of a
collision is minimized. Various techniques, such as separate chaining, open addressing,
and rehashing, are used to resolve collisions:
Separate chaining
Open addressing
o Linear probing
o Quadratic probing
o Double hashing
Rehashing
Extendible hashing
7.1 INTRODUCTION
The separate chaining technique uses linked lists to overcome the problem of collisions.
In this technique, all the keys that resolve to the same hash values are maintained in a linked
list.
Whenever a collision occurs, the key value is added at the end of the corresponding list.
Thus, each position in the hash table acts as a header node for a linked list. To perform a search
operation, the list indicated by the hash function is searched sequentially.
7.2 INSERTION
To perform an insert, we traverse down the appropriate list to check whether the
element is already in place (if duplicates are expected, an extra field is usually kept, and this
field would be incremented in the event of a match). If the element turns out to be new, it is
inserted either at the front of the list or at the end of the list, whichever is easiest.
Insert 0 H(0) = 0 % 10 = 0
Insert 1 H(1) = 1 % 10 = 1
Insert 81 H(81) = 81 % 10 = 1
The element 81 collides at the same hash value, 1. To resolve this, 81 is added to the
linked list at position 1.
Insert 4 H(4) = 4 % 10 = 4
Insert 64 H(64) = 64 % 10 = 4
Insert 25 H(25) = 25 % 10 = 5
Insert 16 H(16) = 16 % 10 = 6
Insert 36 H(36) = 36 % 10 = 6
Insert 9 H(9) = 9 % 10 = 9
Insert 49 H(49) = 49 % 10 = 9
7.3 FIND
To perform a find, we use the hash function to determine which list to traverse. We then
traverse this list in the normal manner, returning the position where the item is found.
7.4 ADVANTAGE
More keys can be inserted than the number of slots in the hash table, since each slot holds
a linked list of arbitrary length.
8.1 INTRODUCTION
Separate chaining hashing has the disadvantage of requiring pointers. This tends to slow
the algorithm down a bit because of the time required to allocate new cells, and also essentially
requires the implementation of a second data structure.
More formally, cells h0(X), h1(X), h2(X), . . . are tried in succession, where
hi(X) = (Hash(X) + F(i)) % TableSize, with F(0) = 0.
Linear Probing
Quadratic Probing
Double Hashing
In linear probing, F is a linear function of i, typically F(i) = i. This amounts to trying cells
sequentially (with wraparound) in search of an empty cell.
hi(X) = (Hash(X) + i) % N
h0(X) = Hash(X) % N
h1(X) = (Hash(X) + 1) % N
h2(X) = (Hash(X) + 2) % N
Figure shows the result of inserting keys {89, 18, 49, 58, 69} into a hash table using the
same hash function as before and the collision resolution strategy, F(i) = i.
The first collision occurs when 49 is inserted; it is put in the next available spot, namely
spot 0, which is open.
58 collides with 18, 89, and then 49 before an empty cell is found three away.
Figure: Open addressing hash table with linear probing, after each insertion
Advantage
Disadvantage
It forms clusters, which degrades the performance of the hash table for storing and
retrieving.
In quadratic probing, the collision function is quadratic, F(i) = i².
h0(X) = Hash(X) % N
h1(X) = (Hash(X) + 1) % N
h2(X) = (Hash(X) + 4) % N
Figure shows the resulting open addressing hash table with this collision function on the
same input used in the linear probing example.
When 49 collides with 89, the next position attempted is one cell away. This cell is empty,
so 49 is placed there.
Figure: Open addressing hash table with quadratic probing, after each insertion
Although quadratic probing eliminates primary clustering, elements that hash to the
same position will probe the same alternate cells. This is known as secondary clustering.
Secondary clustering is a slight theoretical blemish. Simulation results suggest that it generally
causes less than an extra half probe per search.
Double hashing uses the idea of applying a second hash function to the key when a
collision occurs. The result of the second hash function will be the number of positions from the
point of collision to insert. It must never evaluate to zero.
Hash2(Key) = R – (Key % R)
where R is a prime number that is smaller than the size of the table.
If we choose R = 7, then Figure shows the results of inserting the same keys as before.
Figure: Open addressing hash table with double hashing, after each insertion
Hash(49) = 49 % 10 = 9
Hash2(49) = 7 – (49 % 7) = 7 – 0 = 7
h1(49) = (9 + 1 * 7) % 10 = (9 + 7) % 10 = 16 % 10 = 6.
Hash(58) = 58 % 10 = 8
Hash2(58) = 7 – (58 % 7) = 7 – 2 = 5
h1(58) = (8 + 1 * 5) % 10 = (8 + 5) % 10 = 13 % 10 = 3.
Hash(69) = 69 % 10 = 9
Hash2(69) = 7 – (69 % 7) = 7 – 6 = 1
h1(69) = (9 + 1 * 1) % 10 = (9 + 1) % 10 = 10 % 10 = 0.
9.1 INTRODUCTION
If the table gets too full, the running time for the operations will start taking too long and
inserts might fail for closed hashing with quadratic resolution. This can happen if there are too
many deletions intermixed with insertions. A solution, then, is to build another table that is
about twice as big (with associated new hash function) and scan down the entire original hash
table, computing the new hash value for each (non-deleted) element and inserting it in the new
table. This entire operation is called rehashing.
As an example, suppose the elements 13, 15, 24, and 6 are inserted into a closed hash
table of size 7. The hash function is h(X) = X mod 7. Suppose linear probing is used to resolve
collisions. The resulting hash table appears in Figure.
Figure: Open addressing hash table with linear probing with input 13,15, 6, 24
If 23 is inserted into the table, the resulting table in Figure will be over 70 percent full.
Figure: Open addressing hash table with linear probing after 23 is inserted
This entire operation is called rehashing. This is obviously a very expensive operation;
the running time is O(N), since there are N elements to rehash and the table size is roughly 2N,
but it is actually not all that bad, because it happens very infrequently.
HashTable Rehash(HashTable H)
{
int i, OldSize;
Cell *OldCells;
OldCells = H->TheCells;
OldSize = H->TableSize;
H = InitializeTable(2 * OldSize);
for(i = 0; i < OldSize; i++)
if(OldCells[i].Info == Legitimate)
Insert(OldCells[i].Element, H);
free(OldCells);
return H;
}
1. What is rehashing?
2. When should rehashing be implemented?
10.1 INTRODUCTION
If either open hashing or closed hashing is used, the major problem is that collisions
could cause several blocks to be examined during a find, even for a well-distributed hash table.
Furthermore, when the table gets too full, an extremely expensive rehashing step must be
performed, which requires O(n) disk accesses.
Let us suppose, for the moment, that our data consists of several six-bit integers. Figure
shows an extendible hashing scheme for this data. The root of the "tree" contains four pointers
determined by the leading two bits of the data. Each leaf has up to m = 4 elements. It happens
that in each leaf the first two bits are identical; this is indicated by the number in parentheses.
To be more formal, D will represent the number of bits used by the root, which is sometimes
known as the directory. The number of entries in the directory is thus 2^D. dL is the number of
leading bits that all the elements of some leaf L have in common. dL will depend on the
particular leaf, and dL <= D.
Suppose that we want to insert the key 100100. This would go into the third leaf, but as
the third leaf is already full, there is no room. We thus split this leaf into two leaves, which are
now determined by the first three bits. This requires increasing the directory size to 3. These
changes are reflected in Figure.
Notice that all of the leaves not involved in the split are now pointed to by two adjacent
directory entries. Thus, although an entire directory is rewritten, none of the other leaves are
actually accessed.
If the key 000000 is now inserted, then the first leaf is split, generating two leaves with
dL = 3. Since D = 3, the only change required in the directory is the updating of the 000 and 001
pointers. See Figure.
This very simple strategy provides quick access times for Insert and Find operations on
large databases.