0% found this document useful (0 votes)
37 views52 pages

Searching

dsa searching

Uploaded by

080bct035
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
37 views52 pages

Searching

dsa searching

Uploaded by

080bct035
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 52

DATA STRUCTURES AND ALGORITHMS

SEARCHING
Searching
 The process of retrieving some particular information from a large
amount of previously stored information.
 The information can be in sorted or unsorted form.
 Normally we think of the information as divided up into records,
each record having a key for use in searching.
 The goal of the search is to find all records with keys matching a
given search key.
 The purpose of the search is usually to access information within
the record for processing.
 It is not compulsory that the searched key values are found in the
records.
Searching (Types)
 Searching generally falls in two categories:
 Internal Search
 External Search
Internal Search
 If the data to be searched are all present in main memory, then the
searching becomes internal search
 Internal Searches are faster than External Search and hence are
recommended whenever possible
 They are mainly done among the data which occupy less space
compared to the space of RAM
External Search
 If most of the data to be searched are in auxiliary
memory, then the search becomes External Search.
 If the data are very large and our main memory is not
large enough to hold them all during the process, then
the external search is used.
Searching Algorithms
 Different searching algorithms are used
 The choice of proper algorithm depends upon the way the
data are arranged
 Any algorithm technique may be better than another
according to the favorable way the data are arranged for it
 We are going to study the following 3 searching
techniques:
 Linear Search
 For unsorted data in linear structure
 Binary Search
 For sorted data in linear structure
 Tree Search
 For data maintained in search trees
Linear Search
 Also called Sequential Search
 Simplest among all
 Applicable for data organized in form of array or linked list
 It is applicable for small data values
 Each element in the array is compared with the value to be
searched
 If the values are matched, then the search is successful
 Otherwise the comparison is kept on doing until all the values
are compared
 By the end of the comparison if the value in the array is not
matched with the value to be searched, then the search is
considered unsuccessful
Linear Search (Example)
To find

Result
Linear Search (Example)

Result Not Found


Linear Search (Algorithm)
Declare and initialize necessary variables
n; a[n]; item-to be searched from array
flag=0 for determining the success of search
For i=0 to n-1
if a[i]=item
display “Search Successful”
flag=1
stop
end if
End for
If flag=0
display “Search Unsuccessful”
End if
Linear Search
 It is considered simple and is very applicable when searching for
small data
 It is good in searching for unsorted data
 Whereas,
 It is slower as compared to other searching algorithms
 It is applicable only for small amount of data
Binary Search
 If the data items are presented in sorted form (i.e. ascending or
descending), then this algorithm is used
 It is much more efficient algorithm than the general linear search
for the sorted data
 Key value is compared with the middle element of the list
 If the values are equal then the search is successful and the
process is stopped
 If the middle values is less then the result is in upper half of the list
 If the middle value is greater then the result is in lower half of the
list
 The search is repeated for the lower or upper half of the list until
we find the required value in the list or all items from the list is
searched
Binary Search (Example)

Find
‘x’
Total Iteration:
4
Binary Search (Example)

Find
‘4’
Total Iteration:
4
Binary Search (Algorithm)
Declare and initialize necessary variables
n; a[n]; first=0; last=n-1;item;flag=0;middle=(first + last)/2
While (first<=last)
if (a[middle]=item)
display “Search Successful”
flag=1;
stop
else if (item<a[middle])
last=middle-1
else
first=middle+1
end if
middle=(first + last)/2
End while
If (flag=0)
display “Search unsuccessful”
End if
Binary Search (Example)
Binary Search
 Binary search is best suited if
 the data are present in sorted form and
 are being represented in array or list
 The main drawbacks are:
 Requires the data to be already sorted
 Cannot be used where there are many insertions or
deletions
Tree Search
 If the data are arranged in the form of search tree
structure, then the tree search can be applied
 Search tree generally is a Binary Search Tree
 Here the key value is compared with the root node at
first.
 If it matches then the search is successful
 If it doesn’t matches then either the left or right subtree
is searched based upon the comparison result
 If the data item is less than the root node, then left subtree is
searched
 If the data item is greater than the root node, then right subtree is
searched
 The traversal is repeated until the searched item is
found or null value is reached
Tree Search (Example)
Q. Search for 31 Compare key 31 with root value
11
31>11, so move to right
Compare key 31 with
root value 19
31>19, so move to right

43>31
Move to left

Found 31
Search Successful
Tree Search (Example)
 Tree Search can be implemented in m-way tree as well
Tree Search (Algorithm)
Declare and initialize necessary variables
node=root – node pointer pointing to root of the tree
item – data to be searched; flag=0
If(node=NULL)
display “Empty Tree”; Stop;
End if
While (node!=NULL)
if (item=node.data)
display “Search Successful”
flag=1;
stop;
else if (item>node.data)
node=node.right;
else
node=node.left;
end if
End while
If (flag=0)
display “Search Unsuccessful”
End if
Hashing
 Hashing is the technique of representing longer records by
shorter values called keys
 The keys are placed in a table called hash table where the keys
are compared for finding the records
 Hash table is a dictionary in which keys are mapped to array
positions by hash function
 Items being searched can directly be accessed by using the
hash table by mapping the corresponding key values into
records
 Hashing technique requires minimum (generally 1) number of
comparison for searching the desired record
Hashing

Lets consider an example: Key Value
 Numbers from 1 to 99 can be indexed in a hash
table by using the hash function of modulo 10 0 10, 20, 40
division
 This function results with the last digit of the 1
numbers which can be used as key for the hash 2 42, 82
table
3
 If random numbers are chosen, all the indices
may not be full 4 64
 Some index may contain more than one values 5 95
and some may not contain any 6
 This imbalance in indices is called Clustering 7 87, 47
 More than one data in a single index is called 8
Collision, which results with the conflict while
searching 9 99
Hash Function (Types)
 A function that transforms a key into a table index is called a
hash function
 Different types of hash functions can be used
 Some of the popular ones are:
 Direct hashing

 Modulo Division

 Multiplicative

 Digit Extraction (truncation)

 Mid-Square

 Folding
Direct Hashing
 The address is the key itself Address Key
 Hash(key)=key
0 0
 The main advantage is that
1 1
there is not any collision
--- ---
 The disadvantage is that the
--- ---
address space (storage) is
50 50
as large as the key space
51 51
--- ---
--- ---
1089 1089
1090 1090
Modulo Division
 Hash(key)=address=key%listsize
 Yields hash value which belongs to the set
{0,1,2,3,……..,listsize}
 Fewer collisions if listsize is a prime number
 Example:
 Numbering system to handle 1,500 students
 If key is 12865
 Address=hash(12865)=12865%1500=865
Digit Extraction
 Some digits from the number in specific places are
extracted
 The places from which the extraction has to be done are
predefined
 The same extraction technique is used for all the keys
 Here,
 Address=selected digit from the key
Example:
345261=326
167524=152
543625=562
987709=970
Mid Square
 Few number of middle digits from the key are extracted
 Thus extracted number is squared
 The squared result is the required address value
 The number of digits chosen depends on number of
digits allowed for indexing
Example:
Assume;
k=12876
Extract second and third digit
N=28
Now,
Address=hash(12876)=N*N
=28*28
=784
Mid Square
 The major disadvantage is value obtained by
doing square may be too large
 The resolution can be to use only a portion of
the result
 Few number of digits from the middle of the result
is used
Example:
K=39873
Address=98*98=9604
Which is long
Hence use only portion of the result
New Address=60
Folding
 The key is divided into parts whose size matches the
address size (or less for the last part)
 Sum all the divided parts
 If there is any carry in the result, then discard it
 Thus formed number is the address for the key
Example:
Assume;
k=12896543
Hash table size = 000 to 999 (i.e. 3 digits)
Our part division will be: 128+965+43
=1136
Truncate the carry (i.e. 1 in thousand’s place)
Hence our address will be
Address=136
Collision Resolution
 Direct hashing maps the key values with the individual
addresses, hence it is a one-to-one mapping technique and
no collision occurs.
 All other hashing techniques may results with some collision
 Different collision resolution techniques are used
 These techniques are independent of the hashing functions
applied
 All these techniques target to minimize clustering because
clustering is the main reason for collision
Collision Resolution (Techniques)

 Three basic techniques are used:


 Rehashing
 Also called Open Addressing
 The types are:
 Linear Probing
 Quadratic Probing
 Double Hashing
 Chaining

 Bucket Addressing
Collision Resolution
 Open Addressing:
 When collision occurs, an unoccupied address is
searched for placing the new element
 It can be done in 3 different ways:
 Linear Probing
 Quadratic Probing
 Double Hashing
Linear probing
 When a home address is occupied, go to the next
address
 Next address = current address + 1
 Rh(k,i) = (h(k)+i) % listsize
 Where h(k) = k % listsize
 and i=0, 1, 2, 3, …………., listsize-1
Linear Probing
Linear Probing
Q) Insert Keys = {89,18,49,58,67} with the hash
function h(x) = x mod 10 using linear probing.
 When x=89, h(89)=89%10=9
 When x=18, h(18)=18%10=8
 When x=49, h(49)=49%10=9(Collision)
So insert key 49 in next possible vacant location of 9 i.e. 0
 When x=58, h(58)=58%10=8(Collision)
So insert key 58 in next possible vacant location of 9 i.e. 1
 When x=67, h(67)=67%10=7
Linear Probing
 Advantages:
 Simple to implement
 Data tend to cluster around home address resulting to
compactness of disk spaces
 Disadvantages:
 Data tend to cluster around specific home address
(Primary Clustering)
 The linear searching is required if data is not present in the
searched location, this is very slow process
Quadratic Probing
 Tends to minimize the problem of primary clustering
from linear probing
 The value is moved considerable distance from the
initial collision
 The address incremented is the collision probe number
squared, i.e.
 rh(k,i) = (h(k) + i2) % listsize
 Where h(k) = k % listsize
 and i=0, 1, 2, 3, ………, listsize-1
Quadratic Probing
Q) Insert Keys = {89,18,49,58,67} with the hash
function h(x) = x mod 10 using quadratic probing.
 When x=89, h(89)=89%10=9
 When x=18, h(18)=18%10=8
 When x=49, h(49)=49%10=9(Collision)
So, h1(49)=(9+1)%10=0
So insert key 49 in location 0
Quadratic Probing
Q) Insert Keys = {89,18,49,58,67} with the hash
function h(x) = x mod 10 using quadratic probing.
 When x=58, h(58)=58%10=8(Collision)
So, h1(58)=(8+1)%10=9(Collision)
So, h2(58)=(8+4)%10=2
So insert key 58 in location 2
 When x=67, h(67)=67%10=7
Quadratic Probing
 Advantages:
 Works much better than linear probing
 Removes primary clustering
 Disadvantages:
 Time consuming than linear probing
 Produces secondary clustering
Double Hashing
 Two different hash functions are used to generate
the address if the initial hashing results with collision
 This removes the secondary collision
 The initial hash value is reused to rehash functions
and new hash value is computed
 hp(k, i) = (h1(k) + i*h2(k)) % listsize
 Where h1(k) = k % listsize
 and h2(k) = k % (some integer slightly less than listsize)
 I = 0, 1, 2, 3, ………, (listsize-1)
Double Hashing
Q) Insert Keys = {89,18,49,58,67} with the hash
function h(x) = x mod 10 using double hashing.
 When x=89, h(89)=89%10=9
 When x=18, h(18)=18%10=8
 When x=49, h(49)=49%10=9(Collision)
So, h2(x)= R – (x mod R) = 7-49%7=7
Position = (original hash value + i*h2(x))%tablesize
=(9+1*7)%10 = 6
So insert key 49 in location 6
Double Hashing
Q) Insert Keys = {89,18,49,58,67} with the hash
function h(x) = x mod 10 using double hashing.
 When x=58, h(58)=58%10=8(Collision)
So, h2(x)= R – (x mod R) = 7-58%7=7-2=5
Position = (original hash value + i*h2(x))%tablesize
=(8+1*5)%10 = 3
So insert key 58 in location 3
 When x=67, h(67)=67%10=7
Open Addressing (Disadvantage)

 Major disadvantages are:


 Each collision resolution results with the
probability for future collision
 If the number keys are more than the address
size of hash table, then collision is sure to occur.
 This is called overflow
 To overcome these disadvantages, separate
chaining is used.
Chaining
 Also called separate chaining
 Use fixed size hash table
 Link lists are used to store the synonyms
 Each slot in hash table points to the head of the linked list
 All the elements for that address is placed in linked list
Chaining
Chaining
Bucket Addressing
 We can store the colliding elements in the same position in
the hash table.
 We associate a bucket with each address.
 A bucket is a block of space large enough to store multiple
items.
 If the bucket is full, the item has to be stored somewhere else.
 So, this approach is often combined with open addressing.
Bucket Addressing
Q) Insert Keys = {102,18,49,58,69,87,88,77,83,120}
with the hash function h(x) = x mod 10 using bucket
hashing with bucket of size 3.
 When x=102, h(102)=102%10=2
 When x= 18,h()= 18%10 = 8
 When x= 49,h()= 49%10 = 9
 When x= 58,h()= 58%10 = 8
 When x= 69,h()= 69%10 = 9
 When x= 87,h()= 87%10 = 7
 When x= 88,h()= 88%10 = 8
 When x= 77,h()= 77%10 = 7
 When x= 83,h()= 83%10 = 3
 When x= 120,h()=120%10 =0
Efficiency Comparison of different
Search Techniques
Efficiency Comparison of different
Search Techniques

You might also like