DATA STRUCTURES AND ALGORITHMS
SEARCHING
Searching
The process of retrieving some particular information from a large
amount of previously stored information.
The information can be in sorted or unsorted form.
Normally we think of the information as divided up into records,
each record having a key for use in searching.
The goal of the search is to find all records with keys matching a
given search key.
The purpose of the search is usually to access information within
the record for processing.
The searched key value is not guaranteed to be present in the records.
Searching (Types)
Searching generally falls in two categories:
Internal Search
External Search
Internal Search
If the data to be searched are all present in main memory, then the search is an internal search
Internal searches are faster than external searches and hence are recommended whenever possible
They are mainly done on data that occupy less space than the available RAM
External Search
If most of the data to be searched are in auxiliary
memory, then the search becomes an external search.
If the data are very large and main memory is not
large enough to hold them all during the process, then
external search is used.
Searching Algorithms
Different searching algorithms are used
The choice of proper algorithm depends upon the way the
data are arranged
One algorithm may perform better than another
depending on how favourably the data are arranged for it
We are going to study the following 3 searching
techniques:
Linear Search
For unsorted data in linear structure
Binary Search
For sorted data in linear structure
Tree Search
For data maintained in search trees
Linear Search
Also called Sequential Search
The simplest of all searching techniques
Applicable for data organized in the form of an array or a linked list
Best suited to small amounts of data
Each element in the array is compared with the value to be
searched
If the values match, then the search is successful
Otherwise the comparison continues until all the values
have been compared
If no element in the array matches the value to be searched
by the end of the pass, then the search is
considered unsuccessful
Linear Search (Example)
[Figure: successful linear search — the target value is compared element by element until it is found]
Linear Search (Example)
[Figure: unsuccessful linear search — every element is compared and the result is Not Found]
Linear Search (Algorithm)
Declare and initialize necessary variables
    n; a[n]; item - the value to be searched in the array
    flag = 0 for recording the success of the search
For i = 0 to n-1
    if a[i] = item
        display "Search Successful"
        flag = 1
        stop
    end if
End for
If flag = 0
    display "Search Unsuccessful"
End if
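A minimal C sketch of this linear search, assuming an integer array (the array contents and the searched item below are only illustrative):

#include <stdio.h>

/* Returns the index of item in a[0..n-1], or -1 if it is not present. */
int linear_search(const int a[], int n, int item)
{
    for (int i = 0; i < n; i++) {
        if (a[i] == item)
            return i;          /* search successful */
    }
    return -1;                 /* search unsuccessful */
}

int main(void)
{
    int a[] = {7, 3, 9, 1, 5};
    int n = sizeof(a) / sizeof(a[0]);
    int pos = linear_search(a, n, 9);
    if (pos >= 0)
        printf("Search Successful (index %d)\n", pos);
    else
        printf("Search Unsuccessful\n");
    return 0;
}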
Linear Search
It is simple and works well when searching small amounts of
data
It can search unsorted data
Whereas,
It is slower than the other searching algorithms
It is practical only for small amounts of data
Binary Search
If the data items are presented in sorted form (i.e. ascending or
descending), then this algorithm is used
It is a much more efficient algorithm than linear search
for sorted data
The key value is compared with the middle element of the list
If the values are equal, then the search is successful and the
process is stopped
If the middle value is less than the key, the required value lies in the upper half of the list
If the middle value is greater than the key, the required value lies in the lower half of the
list
The search is repeated on the lower or upper half of the list until
the required value is found or no items remain to be
searched
Binary Search (Example)
[Figure: trace of a binary search for a key 'x' — 4 iterations in total]
Binary Search (Example)
[Figure: trace of a binary search for the key 4 — 4 iterations in total]
Binary Search (Algorithm)
Declare and initialize necessary variables
    n; a[n]; first = 0; last = n-1; item; flag = 0; middle = (first + last)/2
While (first <= last)
    if (a[middle] = item)
        display "Search Successful"
        flag = 1
        stop
    else if (item < a[middle])
        last = middle - 1
    else
        first = middle + 1
    end if
    middle = (first + last)/2
End while
If (flag = 0)
    display "Search Unsuccessful"
End if
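A minimal C sketch of this binary search on an ascending integer array (the array contents and the searched items below are only illustrative):

#include <stdio.h>

/* Returns the index of item in the sorted array a[0..n-1], or -1 if absent. */
int binary_search(const int a[], int n, int item)
{
    int first = 0, last = n - 1;
    while (first <= last) {
        int middle = (first + last) / 2;
        if (a[middle] == item)
            return middle;          /* search successful */
        else if (item < a[middle])
            last = middle - 1;      /* key lies in the lower half */
        else
            first = middle + 1;     /* key lies in the upper half */
    }
    return -1;                      /* search unsuccessful */
}

int main(void)
{
    int a[] = {4, 8, 15, 16, 23, 42};
    int n = sizeof(a) / sizeof(a[0]);
    printf("%d\n", binary_search(a, n, 16));   /* prints 3 */
    printf("%d\n", binary_search(a, n, 7));    /* prints -1 */
    return 0;
}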
Binary Search
Binary search is best suited if
the data are present in sorted form and
are stored in an array or list
The main drawbacks are:
Requires the data to be already sorted
Not well suited where there are many insertions or
deletions
Tree Search
If the data are arranged in the form of a search tree
structure, then tree search can be applied
The search tree is generally a Binary Search Tree
Here the key value is first compared with the root
node.
If it matches, then the search is successful
If it does not match, then either the left or the right subtree
is searched, based upon the comparison result
If the data item is less than the root node, then the left subtree is
searched
If the data item is greater than the root node, then the right subtree is
searched
The traversal is repeated until the searched item is
found or a null pointer is reached
Tree Search (Example)
Q. Search for 31
Compare key 31 with the root value 11: 31 > 11, so move to the right subtree
Compare key 31 with the node value 19: 31 > 19, so move to the right subtree
Compare key 31 with the node value 43: 31 < 43, so move to the left subtree
Found 31
Search Successful
Tree Search (Example)
Tree Search can be implemented in an m-way search tree as well
Tree Search (Algorithm)
Declare and initialize necessary variables
    node = root – node pointer pointing to the root of the tree
    item – data to be searched; flag = 0
If (node = NULL)
    display "Empty Tree"; Stop
End if
While (node != NULL)
    if (item = node.data)
        display "Search Successful"
        flag = 1
        stop
    else if (item > node.data)
        node = node.right
    else
        node = node.left
    end if
End while
If (flag = 0)
    display "Search Unsuccessful"
End if
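A minimal C sketch of this tree search on a binary search tree; the node structure is an assumption, and the hard-coded tree simply follows the worked example above (11 → 19 → 43, with 31 as the left child of 43):

#include <stdio.h>
#include <stdlib.h>

struct node {
    int data;
    struct node *left, *right;
};

/* Allocate a new tree node holding the given value. */
struct node *new_node(int data)
{
    struct node *n = malloc(sizeof *n);
    n->data = data;
    n->left = n->right = NULL;
    return n;
}

/* Iterative BST search: follow left/right links until item is found or NULL. */
struct node *tree_search(struct node *node, int item)
{
    while (node != NULL) {
        if (item == node->data)
            return node;           /* search successful */
        else if (item > node->data)
            node = node->right;    /* key is larger: go right */
        else
            node = node->left;     /* key is smaller: go left */
    }
    return NULL;                   /* search unsuccessful */
}

int main(void)
{
    struct node *root = new_node(11);
    root->right = new_node(19);
    root->right->right = new_node(43);
    root->right->right->left = new_node(31);

    printf("%s\n", tree_search(root, 31) ? "Search Successful"
                                         : "Search Unsuccessful");
    return 0;
}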
Hashing
Hashing is the technique of representing longer records by
shorter values called keys
The keys are placed in a table called a hash table, where the keys
are compared to find the records
A hash table is a dictionary in which keys are mapped to array
positions by a hash function
Items being searched for can be accessed directly through the
hash table by mapping the corresponding key values to
records
The hashing technique requires a minimum number of comparisons
(generally 1) to search for the desired record
Hashing
Let us consider an example:
Numbers from 1 to 99 can be indexed in a hash table by using the hash function of modulo-10 division
This function yields the last digit of the number, which is used as the key (index) for the hash table
If random numbers are chosen, not all of the indices may be filled
Some indices may contain more than one value and some may not contain any
This imbalance among indices is called Clustering
More than one data item in a single index is called a Collision, which causes a conflict while searching

Key    Value
0      10, 20, 40
1
2      42, 82
3
4      64
5      95
6
7      87, 47
8
9      99
Hash Function (Types)
A function that transforms a key into a table index is called a
hash function
Different types of hash functions can be used
Some of the popular ones are:
Direct hashing
Modulo Division
Multiplicative
Digit Extraction (truncation)
Mid-Square
Folding
Direct Hashing
The address is the key itself
Hash(key) = key
The main advantage is that there is not any collision
The disadvantage is that the address space (storage) is
as large as the key space

Address    Key
0          0
1          1
---        ---
50         50
51         51
---        ---
1089       1089
1090       1090
Modulo Division
Hash(key) = address = key % listsize
Yields a hash value which belongs to the set
{0, 1, 2, 3, ..., listsize-1}
Fewer collisions occur if listsize is a prime number
Example:
A numbering system to handle 1,500 students
If the key is 12865
Address = hash(12865) = 12865 % 1500 = 865
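A one-line C sketch of modulo-division hashing; the table size of 1,500 follows the student-numbering example above:

#include <stdio.h>

#define LIST_SIZE 1500   /* table size from the example above */

/* Modulo-division hash: maps a key to an address in 0 .. LIST_SIZE-1. */
unsigned int hash_mod(unsigned int key)
{
    return key % LIST_SIZE;
}

int main(void)
{
    printf("%u\n", hash_mod(12865));   /* prints 865 */
    return 0;
}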
Digit Extraction
Some digits from specific places in the number are
extracted
The places from which the digits are extracted are
predefined
The same extraction rule is used for all the keys
Here,
Address = digits selected from the key
Example (selecting the first, fourth and fifth digits):
345261 = 326
167524 = 152
543625 = 562
987709 = 970
Mid Square
A few of the middle digits of the key are extracted
The extracted number is then squared
The squared result is the required address value
The number of digits chosen depends on the number of
digits allowed for indexing
Example:
Assume;
k = 12876
Extract the second and third digits
N = 28
Now,
Address = hash(12876) = N * N
        = 28 * 28
        = 784
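A small C sketch of this mid-square variant, assuming 5-digit keys from which the second and third digits are extracted (the same digit positions as in the example above):

#include <stdio.h>

/* Mid-square variant from the slides: take the 2nd and 3rd digits of a
 * 5-digit key, square them, and use the square as the address. */
unsigned int hash_mid_square(unsigned int key)
{
    unsigned int n = (key / 100) % 100;   /* 2nd and 3rd digits of a 5-digit key */
    return n * n;
}

int main(void)
{
    printf("%u\n", hash_mid_square(12876));   /* 28 * 28 = 784 */
    return 0;
}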
Mid Square
The major disadvantage is that the value obtained by
squaring may be too large
The resolution is to use only a portion of
the result
A few digits from the middle of the result
are used
Example:
K = 39873
Extract the second and third digits: N = 98
Address = 98 * 98 = 9604
Which is too long
Hence use only a portion of the result
New Address = 60 (the middle two digits of 9604)
Folding
The key is divided into parts whose size matches the
address size (or less for the last part)
Sum all the divided parts
If there is any carry in the result, then discard it
The number thus formed is the address for the key
Example:
Assume;
k = 12896543
Hash table size = 000 to 999 (i.e. 3 digits)
Our part division will be: 128 + 965 + 43
= 1136
Truncate the carry (i.e. the 1 in the thousands place)
Hence our address will be
Address = 136
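A small C sketch of this fold-shift hashing for 3-digit addresses, assuming the key is given as a decimal string that is split into 3-digit groups from left to right, as in the example:

#include <stdio.h>
#include <string.h>

/* Fold-shift hashing: split the key into 3-digit groups, add them,
 * and discard any carry beyond 3 digits (i.e. keep the result mod 1000). */
unsigned int hash_fold(const char *key)
{
    unsigned int sum = 0;
    size_t len = strlen(key);
    for (size_t i = 0; i < len; i += 3) {
        unsigned int part = 0;
        for (size_t j = i; j < i + 3 && j < len; j++)
            part = part * 10 + (unsigned int)(key[j] - '0');
        sum += part;
    }
    return sum % 1000;   /* truncate the carry */
}

int main(void)
{
    printf("%u\n", hash_fold("12896543"));   /* 128 + 965 + 43 = 1136 -> 136 */
    return 0;
}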
Collision Resolution
Direct hashing maps each key value to an individual
address, hence it is a one-to-one mapping technique and
no collision occurs.
All other hashing techniques may result in some collisions
Different collision resolution techniques are used
These techniques are independent of the hashing function
applied
All of these techniques aim to minimize clustering, because
clustering is the main cause of collisions
Collision Resolution (Techniques)
Three basic techniques are used:
Rehashing
Also called Open Addressing
The types are:
Linear Probing
Quadratic Probing
Double Hashing
Chaining
Bucket Addressing
Collision Resolution
Open Addressing:
When a collision occurs, an unoccupied address is
searched for to place the new element
It can be done in 3 different ways:
Linear Probing
Quadratic Probing
Double Hashing
Linear Probing
When a home address is occupied, go to the next
address
Next address = current address + 1
rh(k, i) = (h(k) + i) % listsize
where h(k) = k % listsize
and i = 0, 1, 2, 3, ..., listsize-1
Linear Probing
Q) Insert Keys = {89, 18, 49, 58, 67} with the hash
function h(x) = x mod 10 using linear probing.
When x=89, h(89)=89%10=9
When x=18, h(18)=18%10=8
When x=49, h(49)=49%10=9 (Collision)
So insert key 49 in the next vacant location after 9, i.e. 0
When x=58, h(58)=58%10=8 (Collision)
So insert key 58 in the next vacant location after 8, i.e. 1 (slots 9 and 0 are already occupied)
When x=67, h(67)=67%10=7
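A minimal C sketch of linear-probing insertion for the example above; the table size of 10 and the sentinel value -1 for an empty slot are assumptions:

#include <stdio.h>

#define TABLE_SIZE 10
#define EMPTY (-1)            /* sentinel marking an unused slot */

/* Linear probing: try h(k), then h(k)+1, h(k)+2, ... modulo the table size. */
int insert_linear(int table[], int key)
{
    for (int i = 0; i < TABLE_SIZE; i++) {
        int addr = (key % TABLE_SIZE + i) % TABLE_SIZE;
        if (table[addr] == EMPTY) {
            table[addr] = key;
            return addr;
        }
    }
    return -1;                /* table is full (overflow) */
}

int main(void)
{
    int table[TABLE_SIZE], keys[] = {89, 18, 49, 58, 67};
    for (int i = 0; i < TABLE_SIZE; i++)
        table[i] = EMPTY;
    for (int i = 0; i < 5; i++)
        printf("%d -> slot %d\n", keys[i], insert_linear(table, keys[i]));
    return 0;
}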
Linear Probing
Advantages:
Simple to implement
Data tend to cluster around the home address, giving
compact use of storage space
Disadvantages:
Data tend to cluster around specific home addresses
(Primary Clustering)
A linear search is required if the data are not present in the
probed location, which is a very slow process
Quadratic Probing
Tends to minimize the problem of primary clustering
seen in linear probing
The value is moved a considerable distance from the
initial collision point
The address increment is the square of the collision probe number,
i.e.
rh(k, i) = (h(k) + i²) % listsize
where h(k) = k % listsize
and i = 0, 1, 2, 3, ..., listsize-1
Quadratic Probing
Q) Insert Keys = {89,18,49,58,67} with the hash
function h(x) = x mod 10 using quadratic probing.
When x=89, h(89)=89%10=9
When x=18, h(18)=18%10=8
When x=49, h(49)=49%10=9 (Collision)
So, h1(49)=(9+1)%10=0
So insert key 49 in location 0
Quadratic Probing
Q) Insert Keys = {89,18,49,58,67} with the hash
function h(x) = x mod 10 using quadratic probing.
When x=58, h(58)=58%10=8 (Collision)
So, h1(58)=(8+1)%10=9 (Collision)
So, h2(58)=(8+4)%10=2
So insert key 58 in location 2
When x=67, h(67)=67%10=7
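A short C sketch of the quadratic-probing insertion used in the example; the table size of 10 and the -1 empty sentinel are assumptions:

#include <stdio.h>

#define TABLE_SIZE 10
#define EMPTY (-1)

/* Quadratic probing: try (h(k) + i*i) % TABLE_SIZE for i = 0, 1, 2, ... */
int insert_quadratic(int table[], int key)
{
    for (int i = 0; i < TABLE_SIZE; i++) {
        int addr = (key % TABLE_SIZE + i * i) % TABLE_SIZE;
        if (table[addr] == EMPTY) {
            table[addr] = key;
            return addr;
        }
    }
    return -1;   /* no empty slot found along the probe sequence */
}

int main(void)
{
    int table[TABLE_SIZE], keys[] = {89, 18, 49, 58, 67};
    for (int i = 0; i < TABLE_SIZE; i++)
        table[i] = EMPTY;
    for (int i = 0; i < 5; i++)
        printf("%d -> slot %d\n", keys[i], insert_quadratic(table, keys[i]));
    return 0;
}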
Quadratic Probing
Advantages:
Works much better than linear probing
Removes primary clustering
Disadvantages:
More time-consuming than linear probing
Produces secondary clustering
Double Hashing
Two different hash functions are used to generate
the address if the initial hashing results in a collision
This removes secondary clustering
The initial hash value is rehashed with the second function
and a new hash value is computed
hp(k, i) = (h1(k) + i*h2(k)) % listsize
where h1(k) = k % listsize
and h2(k) = k % (some integer slightly less than listsize), commonly of the form h2(k) = R - (k mod R) for a prime R < listsize
i = 0, 1, 2, 3, ..., listsize-1
Double Hashing
Q) Insert Keys = {89,18,49,58,67} with the hash
function h(x) = x mod 10 using double hashing.
When x=89, h(89)=89%10=9
When x=18, h(18)=18%10=8
When x=49, h(49)=49%10=9 (Collision)
So, h2(x) = R - (x mod R) = 7 - (49 % 7) = 7 - 0 = 7, taking R = 7 (a prime less than the table size)
Position = (original hash value + i*h2(x)) % tablesize
         = (9 + 1*7) % 10 = 6
So insert key 49 in location 6
Double Hashing
Q) Insert Keys = {89,18,49,58,67} with the hash
function h(x) = x mod 10 using double hashing.
When x=58, h(58)=58%10=8 (Collision)
So, h2(x) = R - (x mod R) = 7 - (58 % 7) = 7 - 2 = 5
Position = (original hash value + i*h2(x)) % tablesize
         = (8 + 1*5) % 10 = 3
So insert key 58 in location 3
When x=67, h(67)=67%10=7
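A short C sketch of the double-hashing insertion from the example, with table size 10 and R = 7 (the prime used above); the -1 empty sentinel is an assumption:

#include <stdio.h>

#define TABLE_SIZE 10
#define R 7                   /* prime slightly less than TABLE_SIZE */
#define EMPTY (-1)

/* Double hashing: probe (h1(k) + i*h2(k)) % TABLE_SIZE for i = 0, 1, 2, ... */
int insert_double(int table[], int key)
{
    int h1 = key % TABLE_SIZE;
    int h2 = R - (key % R);   /* second hash function, never zero */
    for (int i = 0; i < TABLE_SIZE; i++) {
        int addr = (h1 + i * h2) % TABLE_SIZE;
        if (table[addr] == EMPTY) {
            table[addr] = key;
            return addr;
        }
    }
    return -1;                /* no empty slot found along the probe sequence */
}

int main(void)
{
    int table[TABLE_SIZE], keys[] = {89, 18, 49, 58, 67};
    for (int i = 0; i < TABLE_SIZE; i++)
        table[i] = EMPTY;
    for (int i = 0; i < 5; i++)
        printf("%d -> slot %d\n", keys[i], insert_double(table, keys[i]));
    return 0;
}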
Open Addressing (Disadvantage)
Major disadvantages are:
Each resolved collision increases the
probability of future collisions
If the number of keys is greater than the number of addresses
in the hash table, then collision is sure to occur.
This is called overflow
To overcome these disadvantages, separate
chaining is used.
Chaining
Also called separate chaining
A fixed-size hash table is used
Linked lists are used to store the synonyms (keys that hash to the same address)
Each slot in the hash table points to the head of a linked list
All the elements for that address are placed in the linked list
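A minimal C sketch of separate chaining: each table slot holds the head of a singly linked list of keys that hash to that slot (the table size of 10 and the sample keys are assumptions):

#include <stdio.h>
#include <stdlib.h>

#define TABLE_SIZE 10

struct chain_node {
    int key;
    struct chain_node *next;
};

/* Insert a key at the head of the list for its hash address. */
void chain_insert(struct chain_node *table[], int key)
{
    int addr = key % TABLE_SIZE;
    struct chain_node *n = malloc(sizeof *n);
    n->key = key;
    n->next = table[addr];
    table[addr] = n;
}

/* Search the chain at the key's hash address. */
int chain_search(struct chain_node *table[], int key)
{
    for (struct chain_node *p = table[key % TABLE_SIZE]; p != NULL; p = p->next)
        if (p->key == key)
            return 1;          /* search successful */
    return 0;                  /* search unsuccessful */
}

int main(void)
{
    struct chain_node *table[TABLE_SIZE] = {0};
    int keys[] = {89, 18, 49, 58, 67};
    for (int i = 0; i < 5; i++)
        chain_insert(table, keys[i]);
    printf("%d %d\n", chain_search(table, 49), chain_search(table, 40));  /* 1 0 */
    return 0;
}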
Bucket Addressing
We can store the colliding elements in the same position in
the hash table.
We associate a bucket with each address.
A bucket is a block of space large enough to store multiple
items.
If the bucket is full, the item has to be stored somewhere else.
So, this approach is often combined with open addressing.
Bucket Addressing
Q) Insert Keys = {102,18,49,58,69,87,88,77,83,120}
with the hash function h(x) = x mod 10 using bucket
hashing with bucket of size 3.
When x=102, h(102)=102%10=2
When x=18, h(18)=18%10=8
When x=49, h(49)=49%10=9
When x=58, h(58)=58%10=8
When x=69, h(69)=69%10=9
When x=87, h(87)=87%10=7
When x=88, h(88)=88%10=8
When x=77, h(77)=77%10=7
When x=83, h(83)=83%10=3
When x=120, h(120)=120%10=0
Efficiency Comparison of Different Search Techniques