0% found this document useful (0 votes)
14 views28 pages

8 Search+hash - 2

The document covers searching and hashing techniques in computer science, detailing various searching algorithms such as sequential and binary search, as well as hashing methods and collision resolution strategies. It explains the importance of efficient data retrieval and introduces concepts like comparison trees, hash functions, and different hashing techniques including identity, truncation, folding, and modular arithmetic. Additionally, it discusses collision resolution methods such as replacement, open addressing, and chaining.

Uploaded by

aldardiseaseel
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
14 views28 pages

8 Search+hash - 2

The document covers searching and hashing techniques in computer science, detailing various searching algorithms such as sequential and binary search, as well as hashing methods and collision resolution strategies. It explains the importance of efficient data retrieval and introduces concepts like comparison trees, hash functions, and different hashing techniques including identity, truncation, folding, and modular arithmetic. Additionally, it discusses collision resolution methods such as replacement, open addressing, and chaining.

Uploaded by

aldardiseaseel
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
You are on page 1/ 28

Searching

&
Hashing
Lecture Objectives
 Concept of searching
 Different types of searching algorithms
 Concept of comparison trees
 Concepts of hashing & hashing techniques
 Different types of sorting techniques
Searching

 Searching
Searching ::
–– Retrieving
Retrieving the
the Information
Information Stored
Stored in
in the
the computer
computer is
is one
one of
of the
the
most
most important
important applications
applications of
of computer.
computer.
–– e.g.,
e.g., Looking
Looking for
for aa Name
Name byby giving
giving the
the telephone
telephone number.
number.
–– We
We give
give one
one piece
piece ofof information
information (called
(called asas KEY)
KEY) and
and we
we also
also
provide
provide other
other information
information associated
associated with
with the
the key
key

Types of Searching
There are two Types of Searching :
1. External searching
Searching in external storage devices (Disk,Tapes etc..)

2. Internal searching
Searching within the computer memory
e.g., Sequential search and Binary Search
Sequential Searching
Start at the first piece of data and look at it and if it is not the data
you are looking for, then keep searching until you find what you are
looking for or until you have reached the last.

Find the marks of the students from the Data structures and
algorithms course using an array. ( Key - ID)

a id mark1 mark2
[0] 456 60 80 for ( i =0; i < size; i++)
{ if (a[i].id == ‘123’)
[1]
780 65 82 cout << a[i].id << a[i].mark1;
found = true }
[2] 550 70 80 if (found !=true)
cout << “Not in the List”
[3] 650 60 80
}
Sequential Searching
Two Kinds of result
1) Successful and 2) Unsuccessful
Best Case - If the element to be searched is in the first location then
the number of comparison is one. O(1) (456)
Worst Case - nth element a id mark1 mark2
O(n) (last) [0] 456 60 80
Average Case : 1/2 n (n+1)/n = O(n)
[1]
780 65 82

[2] 550 70 80

[3] 650 60 80
Sequential Searching
1) It is useful for limited sized data sets as it is simple and does not
require data to be structured in any way
2) when the data set to be searched is constantly changing

Disadvantages :
 Time consuming (large list)

Is there any other suitable data structures?


This algorithm is well suited for implementation with linked list - follow ‘next’
pointer until we find or until we have reached the last
Advantages: Addition/deletion is easy
Binary Search
How it works?
Binary – (two) - List is broken down into two parts and
searched

Working :
Compare the target with the one in the center of the list
and then restrict our attention to only the first or second
half depending on whether the target comes before or after
the central one.
( we reduce the length to be searched by half)
Binary Search
Rule - numbers must be in sorted order

Steps:
1. Mark the first element as low and last element as high
2. Find the mid point – (mid = (low + high) /2)
1. mark l & h
3. Check whether mid = x if so return index
2. Find Mid
4.3.IfCheck
not ,whether
check mid
whether x isreturn
= x if so greater
indexthan mid, then
4.bring
If not low to mid
, check +1 and
whether x is >carry
then on the same process
5. If bring
x is less
low tothan mid,
mid +1 andthen
carrybring
on thehigh
sameto mid-1 and carry on
process
5.the
If < same process
then bring high to mid-1 and carry on the same process
6.6.Until
Until you
you find
findthe value
the or ifor
value lowif and
lowhigh
andcross
higheach other
cross each other
Binary Search-Example 1
4 7 8 10 14 21 22 36 62 77 81 91

LOW =1 MID = (LOW + HIGH) /2 = 13/2 =6 HIGH=12

Compare Mid with Search value - Mid is Smaller than Search Value
So move Low to MID + 1 and proceed

4 7 8 10 14 21 22 36 62 77 81 91

MID = 7+12/2 = 9
LOW =7 HIGH=12
Compare Mid with Search Value - Yes it matches , So print out Element Found

Target : 62
Binary Search-Non Recursive Algorithm
#include <iostream> /* The binary search */
using namespace std; while ((found == 0) && (bottom<= top))
void main() { mid = (top + bottom)/2;
{ if (target == table[mid])
int table[200]; found = 1;
else
int target,top,bottom,mid;
if(target< table[mid])
int i, found; top = mid - 1;
/* Setting up the array */ else
for (i=0;i<200;i++) bottom = mid + 1;
table[i] = 2*i; }
bottom = 0; if (found)
top = 199; cout<<"\nPosition ="<<mid;
found = 0; else
cout<<“\n Input the target”; cout<<"\nTarget not found";
}
cin>>target;
Binary Search-Recursive Algorithm
template<class ItemType>
bool BinarySearch ( ItemType info[ ] , ItemType item , int fromLoc ,
int toLoc )
{ int mid ; base case -- not found
if ( fromLoc > toLoc )
return false ;
else {
mid = ( fromLoc + toLoc ) / 2 ; base case-- found at mid

if ( info [ mid ] == item )

return true ;

else if ( item < info [ mid ] ) search lower half

return BinarySearch ( info, item, fromLoc, mid-1 ) ;


search upper half
else
return BinarySearch( info, item, mid + 1, toLoc ) ; }
Hashing
Introduction
We have discussed many ways of organizing data so that we can find things
quickly.
We have seen how we have improved from the O(n) algorithms to O(log n)
algorithms.
But wouldn’t it be great if we could just find the item immediately?
Consider this :

We have an array of all students in tiny college and the student id numbers
from 1 to 200. Now if we want to look for any particular student we simply go to
the location of the student id ( KEY - value used for searching) and retrieve.
The search time will be O(1).
If the students’ Id is 10 digit long then we need to convert that large number into
a smaller number.
The Hash Function
 This is what we use to convert the key into a small enough
number as to be practical for a table size.
 The Keys may be integers, characters or some kind of data and
we use the hash function to turn them into integers in the range
of 0..m-1, where m is the size of the array of records.
 This function may or may not be simple.
 The term hash and hashing come from the very fact that the
conversion from key to index really does ‘hash’ the key as in
many cases the index resulting from hashing the key bears
absolutely no resemblance to the original key at all.
Different Methods of Hashing
Identity:
Suppose memory is limitless then we could have the simplest
hash function of all:
index = key
this would mean declaring an array of elements and putting the
keys in those elements with the same index!

e.g., 1 2 3 4 5

Disadvantage:
For this to be a viable proposition memory would have to be
limitless (and wastable). In practice you cannot afford such
extravagance.
Different Methods of Hashing
Truncation:
With truncation part of the key is ignored and the remainder used
directly.
Suppose in a certain situation there were just 6 keys and they
were: 12302 12303 12305 12307 12308 12309

The hash function here could be:


index = last digit of key
(or index = key mod 12300 )
Of course the above hash function would not work if the range of
keys was:
2134 4134 5134 7134 9134
or 2560 4670 6124 8435 9200
Here you would require a hash function that removes the last 3
digits. This is why it is necessary for you to know something about
the distribution of actual keys before deciding a hash function.
Different Methods of Hashing
Folding :
Here the key is partitioned, and the parts are combined in some way
(maybe by adding or multiplying them) to obtain the index. Suppose
we have 1000 records with 8 digit keys then perhaps:

The 8 digit key such as 62538194 62538195


may be partitioned into: 625 381 94 625 381 95
the groups now added: 1100 1101
and the result truncated: 100 101

Since, all the information in the key is used in generating the


index, folding often achieves a better spread than truncation just by
itself.
Different Methods of Hashing
Modular Arithmetic:
The key is converted into an integer, the integer is then divided by the size of
the index range and the remainder is taken as the index position of the
record.

index = key mod size

As an example, suppose we have a 5 digit integer as a key and there


are 1000 records (rooms for 1000 records) .

The hash function would then be: index = KEY % size


e.g., 12345 % 1000

If we are very lucky! our keys might be such that there is only 1 key that maps
to each index. Of course we might still have the situation where two keys map
to the same index (e.g. 23456, 43456) - this is called a COLLISION.
Example
#include <iostream.h> EX:
#include <conio.h> getch() and clrscr( int HASH(int key)
)
{ int index;
# define MAX 20
int HASH(int); //Function prototype index = key/10 -1;
int main ( )
{ if (index<MAX) return index;
int table[MAX],index, i, target; else return -1;
for(i=1; i<=MAX; i++)
table[i-1]=10*i; }
cout<<“Enter Target: “;
cin>>target;
index = HASH(target);
Output:
if (index!=-1) Enter Target: 20
{if table[index] == target) Found at table[1]
cout<<“Found at”<<“table[“<<index<<“]”;
else cout <<“Target Not found”;
}
Enter Target: 15
else cout <<“Target Not found”; Target not found
return 0;
}
Hashing Techniques

What is a Collision ?
If the hashing Function gives out SAME INDEX
for TWO keys then it is called a collision
e.g., : 123 % 10 index will be 3
223 % 10 index will be again 3
We cannot store two values in the same location.
Hashing Techniques
How to resolve collisions?
Three Methods are available:
1 . Resolving collisions by REPLACEMENT

2. Resolving collisions by OPEN ADDRESSING


• Linear probing
• Quadratic probing

3. Resolving collisions by CHAINING


Hashing Techniques
Resolving collisions by replacement:
 Working : we simply replace the old KEY with the
new KEY when there is a collision
• The Old Key is simply Lost
• Or
• It is combined with the new Key.
• EG : 121 122 221 124 125 126 224 index = key % 10

221
121 122 124 125 126
224

 When it is used ?
– Very Rarely used
– Used only when the data are sets . Using UNION Operation old data
is combined with the new data.
Hashing Techniques
Resolving collisions by Open addressing:
 We resolve the collision by putting the new Key in some
other empty location in the table.
– Two methods are used to locate the empty location
1. Linear Probing 2. Quadratic Probing
 Linear Probing :
– Start at the point where the collision occurred and do a
sequential search through the table for an empty
location.
 Improvement :
– Circular Probing : After reaching the end start probing from the first
Example - Linear Probing
Keys = 6 Table Size = 7 Function = key mod 7

key 12 15 21 36 84 96
Index 5 1 0 1 0 5

SOLUTION

Index [0] [1] [2] [3] [4] [5] [6]

21 15 36 84 12 96
Resolving collisions by Open addressing
 Quadratic Probing:
If there is a collision at the address ‘h’, this
method probes the table at locations
(h + i2) % hashtablesize for i = 1,2,3…
Dis: It does not probe all locations in the table.
Keys = 6 Table Size = 7 Function = key mod 7
key 12 15 21 36 84 96
Index 5 1 0 1 0 5
index = (h + i 2) % 7 i = 1 , 2 , 3 …… h(36) = (1 + 12) % 7 = 2
h(84) = (0 + 22) % 7 = 4 h(96) = (5 + 62) % 7 = 6

Index [0] [1] [2] [3] [4] [5] [6]

21 15 36 84 12 96
Resolving collisions by Chaining
key 12 15 21 36 84 96
Index 5 1 0 1 0 5

 Implemented using
0 21 84
Linked List
1 15 36
2  Whenever a collision
occurs a new node is
3
created and the new
4 value is stored and
linked to the old
5 12 96 value.
6
Exercise
Using modulo-division method and linear probing store the
following keys in an array of 19 elements
224562, 137456, 214562,140145, 214576, 162145,
144467, 199645, 234534
Index is :
224562 % 19 = 1, 137456 % 19 =10 , 214562 % 19 =14,140145 % 19 =1 ( 2),
214576 % 19 =9, 162145 % 19 =18, 144467 % 19 =10 ( 11), 199645 % 19 =12,
234534 % 19 =17
[1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11]

224562 214576 214576 137456 144467

[12] [13] [14] [15] [16] [17] [18] [19]

199645 214562 234534 162145


Thank you

You might also like