Search Algorithms in Java. This article describes the implementation of different search algorithms for searching elements in collections. Currently sequential search and binary search are described.
1. Searching in Collections
The Java programming language offers a variety of search algorithms that are both fast and efficient for locating elements within a collection.
This article explores the implementation of various standard search algorithms in Java, focusing on their functionality rather than relying on existing standard implementations.
When searching in collections, several key questions arise:
-
Does the element exist within the collection?
-
How can I retrieve the element from the collection?
-
How can I remove the element from the collection?
In this context, the term "collection" is used broadly and does not strictly adhere to Java’s Collection framework. For example, a collection may include arrays or lists.
2. Implementation
For the upcoming implementations, we will need a project named com.vogella.algorithms.search. To utilize JUnit 5 for testing, create a Maven project with the necessary dependencies and add the following to your pom.xml file:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="https://maven.apache.org/POM/4.0.0" xmlns:xsi="https://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="https://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.vogella</groupId>
<artifactId>algorithms.search</artifactId>
<version>1.0-SNAPSHOT</version>
<properties>
<maven.compiler.source>21</maven.compiler.source>
<maven.compiler.target>21</maven.compiler.target>
<junit.jupiter.version>5.10.0</junit.jupiter.version>
</properties>
<dependencies>
<!-- JUnit 5 API and Engine -->
<dependency>
<groupId>org.junit.jupiter</groupId>
<artifactId>junit-jupiter</artifactId>
<version>${junit.jupiter.version}</version>
<scope>test</scope>
</dependency>
<!-- Assertions and Parameterized Tests -->
<dependency>
<groupId>org.junit.jupiter</groupId>
<artifactId>junit-jupiter-params</artifactId>
<version>${junit.jupiter.version}</version>
<scope>test</scope>
</dependency>
</dependencies>
<build>
<plugins>
<!-- Compiler Plugin for Java 21 -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.11.0</version>
<configuration>
<release>21</release>
</configuration>
</plugin>
<!-- Surefire Plugin to run JUnit 5 tests -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
<version>3.2.5</version>
<configuration>
<includes>
<include>**/*Test.java</include>
</includes>
</configuration>
</plugin>
</plugins>
</build>
</project>
In this project, we will create different packages and implement software tests for various search algorithms.
3. Sequential Search
3.1. Overview
Sequential search is the simplest search approach. Given a collection, it involves examining each element in the collection until the desired element is found or the end of the collection is reached.
3.2. Implementation
Implementing a sequential search is straightforward.
Create a package named com.vogella.algorithms.search.sequential, along with a package that shares the same name.
Then, create the following program.
package com.vogella.algorithms.search.sequential;
public class SequentialSearch {
public static boolean contains(int[] a, int b) {
for (int i : a) {
if (i == b) {
return true;
}
}
return false;
}
}
3.3. Test
You can use the following JUnit 5 test to validate the functionality of your contains method.
For more information on JUnit 5, please refer to the [JUnit 5 Tutorial](https://www.vogella.com/tutorials/JUnit/article.html).
package com.vogella.algorithms.search.sequential;
import static org.junit.jupiter.api.Assertions.assertFalse;
import static org.junit.jupiter.api.Assertions.assertTrue;
import org.junit.jupiter.api.Test;
public class SequentialSearchTest {
@Test
public void testContains() {
int[] a = {1, 2, 3, 4, 5, 19, 17, 7};
assertTrue(SequentialSearch.contains(a, 17));
assertTrue(SequentialSearch.contains(a, 1));
assertTrue(SequentialSearch.contains(a, 2));
assertTrue(SequentialSearch.contains(a, 3));
assertTrue(SequentialSearch.contains(a, 4));
assertFalse(SequentialSearch.contains(a, 10));
}
}
3.4. Complexity Analysis
The sequential search algorithm has an average and worst-case runtime complexity of O(N). For an introduction to complexity analysis, see [Complexity Analysis](https://www.vogella.com/tutorials/ComplexityAnalysis/article.html).
4. Binary Search
4.1. Overview
Binary search is a highly efficient algorithm that locates an element in a sorted collection. To perform a binary search, the collection must first be sorted using algorithms like [Quicksort](https://www.vogella.com/tutorials/JavaAlgorithmsQuicksort/article.html) or [Mergesort](https://www.vogella.com/tutorials/JavaAlgorithmsMergesort/article.html).
The algorithm works by checking the element in the middle of the collection. If the target element is equal to the middle element, the search is complete. If the target element is smaller than the middle element, the search continues in the left sub-array (from the start of the array to the middle element). If the target element is larger, the search continues in the right sub-array (from the middle element to the end of the array). This process repeats until the target element is found or the search space is exhausted.
4.2. Implementation
To implement binary search, create a package named com.vogella.algorithms.search.binary and a package with the same name.
Here is the program that performs the binary search:
package com.vogella.algorithms.search.binary;
public class BinarySearch {
public static boolean contains(int[] a, int b) {
if (a.length == 0) {
return false;
}
int low = 0;
int high = a.length - 1;
while (low <= high) {
int middle = (low + high) / 2;
if (b > a[middle]) {
low = middle + 1;
} else if (b < a[middle]) {
high = middle - 1;
} else { // The element has been found
return true;
}
}
return false; // Element not found
}
}
4.3. Test
You can use the following JUnit 5 test to validate the functionality of your contains method.
For more information on JUnit 5, please refer to the [JUnit 5 Tutorial](https://www.vogella.com/tutorials/JUnit/article.html).
package com.vogella.algorithms.search.binary;
import static org.junit.jupiter.api.Assertions.assertFalse;
import static org.junit.jupiter.api.Assertions.assertTrue;
import org.junit.jupiter.api.Test;
public class BinarySearchTest {
@Test
public void testContains() {
int[] a = {1, 2, 3, 4, 5, 7, 17, 19}; // Sorted array
assertTrue(BinarySearch.contains(a, 1));
assertTrue(BinarySearch.contains(a, 2));
assertTrue(BinarySearch.contains(a, 3));
assertTrue(BinarySearch.contains(a, 4));
assertFalse(BinarySearch.contains(a, 10)); // Element not in array
assertTrue(BinarySearch.contains(a, 17));
}
}
4.4. Complexity Analysis
Binary search significantly reduces the search space with each iteration, resulting in a time complexity of O(log N). For more information on complexity analysis, please see the [Complexity Analysis Tutorial](https://www.vogella.com/tutorials/ComplexityAnalysis/article.html).
5. Interpolation Search
5.1. Overview
Interpolation search is an improved variant of binary search that works on the principle of estimating the position of the search key based on the values of the elements in the array. Unlike binary search, which splits the array in half, interpolation search calculates a more likely position for the search key based on its value relative to the values at the bounds of the array.
Interpolation search is most effective for uniformly distributed data. Its efficiency decreases when data is not uniformly distributed.
5.2. Implementation
To implement interpolation search, you first need a sorted array.
Create a package named com.vogella.algorithms.search.interpolation and a package with the same name.
Here is a simple implementation of interpolation search:
package com.vogella.algorithms.search.interpolation;
public class InterpolationSearch {
public static boolean contains(int[] a, int b) {
int low = 0;
int high = a.length - 1;
while (low <= high && b >= a[low] && b <= a[high]) {
// Estimate the position
int pos = low + ((b - a[low]) * (high - low)) / (a[high] - a[low]);
// Check if the estimated position has the search key
if (a[pos] == b) {
return true;
}
if (a[pos] < b) {
low = pos + 1;
} else {
high = pos - 1;
}
}
return false;
}
}
5.3. Test
You can use the following JUnit 5 test to validate your interpolation search method.
package com.vogella.algorithms.search.interpolation;
import static org.junit.jupiter.api.Assertions.assertFalse;
import static org.junit.jupiter.api.Assertions.assertTrue;
import org.junit.jupiter.api.Test;
public class InterpolationSearchTest {
@Test
public void testContains() {
int[] a = {1, 2, 3, 4, 5, 7, 17, 19};
assertTrue(InterpolationSearch.contains(a, 17));
assertTrue(InterpolationSearch.contains(a, 1));
assertTrue(InterpolationSearch.contains(a, 2));
assertTrue(InterpolationSearch.contains(a, 3));
assertTrue(InterpolationSearch.contains(a, 4));
assertFalse(InterpolationSearch.contains(a, 10));
}
}
5.4. Complexity Analysis
Interpolation search has an average-case time complexity of O(log N) for uniformly distributed data. However, in the worst case, its performance can degrade to O(N), especially for non-uniformly distributed data.
6. Exponential Search
6.1. Overview
Exponential search is a combination of binary search and a linear search. It works well for unbounded or infinite lists and is useful when you want to find a range in which an element may be located. The algorithm first finds a range where the element could be present and then applies binary search within that range.
Exponential search is particularly effective when the size of the list is not known beforehand.
6.2. Implementation
To implement exponential search, create a Java project named com.vogella.algorithms.search.exponential and a package with the same name.
Here is a simple implementation of exponential search:
package com.vogella.algorithms.search.exponential;
public class ExponentialSearch {
public static boolean contains(int[] a, int b) {
if (a.length == 0) {
return false;
}
if (a[0] == b) {
return true;
}
int i = 1;
while (i < a.length && a[i] <= b) {
i *= 2;
}
return binarySearch(a, Math.min(i, a.length - 1), b);
}
private static boolean binarySearch(int[] a, int high, int b) {
int low = high / 2;
while (low <= high) {
int mid = low + (high - low) / 2;
if (a[mid] == b) {
return true;
} else if (a[mid] < b) {
low = mid + 1;
} else {
high = mid - 1;
}
}
return false;
}
}
6.3. Test
You can use the following JUnit 5 test to validate your exponential search method.
package com.vogella.algorithms.search.exponential;
import static org.junit.jupiter.api.Assertions.assertFalse;
import static org.junit.jupiter.api.Assertions.assertTrue;
import org.junit.jupiter.api.Test;
public class ExponentialSearchTest {
@Test
public void testContains() {
int[] a = {1, 2, 3, 4, 5, 7, 17, 19};
assertTrue(ExponentialSearch.contains(a, 17));
assertTrue(ExponentialSearch.contains(a, 1));
assertTrue(ExponentialSearch.contains(a, 2));
assertTrue(ExponentialSearch.contains(a, 3));
assertTrue(ExponentialSearch.contains(a, 4));
assertFalse(ExponentialSearch.contains(a, 10));
}
}
6.4. Complexity Analysis
Exponential search has a time complexity of O(log N) for the search operation. This is because it effectively combines linear and binary search methods, making it efficient for unbounded lists.
7. Comparison of Search Algorithms
In this section, we compare the four search algorithms: Sequential Search, Binary Search, Interpolation Search, and Exponential Search. Each algorithm has its strengths and weaknesses depending on the nature of the data and the requirements of the application.
7.1. Overview of Search Algorithms
| Algorithm | Time Complexity (Average) | Time Complexity (Worst Case) | Space Complexity | Remarks |
|---|---|---|---|---|
Sequential Search |
O(N) |
O(N) |
O(1) |
Simple to implement, works on unsorted data, but inefficient for large collections. |
Binary Search |
O(log N) |
O(log N) |
O(1) |
Requires sorted data, efficient for large collections. |
Interpolation Search |
O(log N) |
O(N) |
O(1) |
Best for uniformly distributed data, requires sorted data, may perform poorly for skewed distributions. |
Exponential Search |
O(log N) |
O(log N) |
O(1) |
Effective for unbounded lists, combines linear search for range finding with binary search for searching within the range. |
7.2. Detailed Comparison
-
Data Requirements:
-
Sequential Search: Can work with unsorted data, making it flexible but less efficient.
-
Binary Search: Requires the collection to be sorted prior to searching that can add overhead if the data is frequently updated.
-
Interpolation Search: Also requires sorted data and is most effective when the data is uniformly distributed.
-
Exponential Search: Requires sorted data and is particularly useful when the size of the dataset is not known in advance.
-
-
Efficiency:
-
Sequential Search: Linear search makes it inefficient for large datasets, as it checks each element one by one.
-
Binary Search: Much more efficient due to its logarithmic time complexity, making it suitable for large collections.
-
Interpolation Search: Can be faster than binary search for uniformly distributed datasets, achieving sub-logarithmic time complexity in average cases.
-
Exponential Search: Efficient for unbounded datasets, leveraging the speed of binary search once the range is found.
-
-
Implementation Complexity:
-
Sequential Search: Very simple to implement and understand.
-
Binary Search: More complex than sequential search due to the need for maintaining the bounds of the search space.
-
Interpolation Search: Slightly more complex than binary search, requiring additional calculations to find the estimated position.
-
Exponential Search: Requires implementing both linear search and binary search, making it more complex to implement.
-
-
Use Cases:
-
Sequential Search: Best used for small or unsorted datasets where performance is not a critical factor.
-
Binary Search: Ideal for large, sorted datasets, such as searching in databases or large arrays.
-
Interpolation Search: Effective for searching in large, uniformly distributed datasets, like statistical data.
-
Exponential Search: Useful for searching in unbounded or infinite datasets where the size is not known in advance, such as in certain algorithmic problems.
-
7.3. Conclusion
Choosing the appropriate search algorithm depends on various factors, including the size and structure of the data, performance requirements, and the specific use case. Each algorithm comes with its own set of trade-offs, making them suited to different scenarios.
Sequential Search is the simplest algorithm and can be applied to any dataset, sorted or unsorted. However, its linear time complexity makes it unsuitable for large datasets, as it will have to check each element one by one, leading to slow performance. Sequential Search is ideal when dealing with small datasets or when simplicity in implementation is a priority over speed. It can also be effective in scenarios where the data is constantly changing, and sorting would introduce additional overhead.
Binary Search is an excellent choice when working with large, sorted datasets. Its logarithmic time complexity makes it significantly faster than Sequential Search, especially as the dataset grows. However, its requirement for sorted data can sometimes introduce additional complexity. In dynamic environments where data is frequently updated, the overhead of keeping the data sorted may offset the benefits of Binary Search. For static or rarely updated datasets, Binary Search is a go-to algorithm for fast search operations, commonly used in databases, index searching, and sorted arrays.
Interpolation Search can outperform Binary Search when the dataset is large and uniformly distributed. Its ability to "guess" the position of the desired element, rather than blindly splitting the dataset as Binary Search does, allows it to achieve sub-logarithmic time complexity in the best-case scenarios. However, this strength is also its weakness: when the data distribution is not uniform, the performance of Interpolation Search can degrade, even approaching linear time complexity in the worst cases. As a result, it is best suited for highly structured data, like statistical data, financial data, or any dataset where the distribution is known and predictable.
Exponential Search is designed for unbounded or infinite lists, where the size of the dataset is not known beforehand. It efficiently finds the range where the search element is located and then applies Binary Search within that range. This makes it a hybrid search algorithm that combines the advantages of linear search (for finding the range) and binary search (for efficiently narrowing down the location). Exponential Search is particularly useful in algorithmic problems where the dataset is theoretically infinite, such as when streaming data is being processed or when performing searches over large, unindexed datasets. Its ability to handle unknown dataset sizes sets it apart from the other algorithms.
When comparing these search algorithms, it is important to consider the nature of the dataset first and foremost. If the dataset is unsorted and small, Sequential Search may be the best fit due to its simplicity. For larger, sorted datasets, Binary Search offers a significant performance boost. If the dataset is large and uniformly distributed, Interpolation Search can provide even faster search times. For cases where the size of the dataset is unknown, Exponential Search offers an efficient hybrid approach to find the element in logarithmic time.
Performance is a key factor in choosing the right search algorithm. For small datasets, the differences between the algorithms are often negligible. However, as datasets grow, the impact of time complexity becomes more pronounced. For instance, a linear search (O(N)) may be fast for a dataset with 10 elements, but for a dataset with a million elements, a logarithmic search (O(log N)) like Binary Search or Exponential Search will outperform Sequential Search by orders of magnitude.
Data structure also plays a crucial role. If the data is sorted, Binary Search and Interpolation Search are clear choices. For unsorted data, Sequential Search is the only applicable option unless sorting the data beforehand is feasible. For applications with dynamic data (frequent insertions, deletions, or updates), maintaining a sorted structure may introduce performance penalties, making Sequential Search a viable choice despite its inefficiency for large datasets.
Lastly, use cases should be carefully considered. Sequential Search is typically used when simplicity is more important than speed, such as in small-scale or throwaway scripts. Binary Search is well-suited for applications where data is frequently accessed and rarely modified, such as lookup tables, databases, and indexing systems. Interpolation Search is ideal for statistical or numerical datasets where values are uniformly distributed, making it common in mathematical software or applications with fixed intervals between data points. Exponential Search, on the other hand, shines in problems where the size of the data is not known upfront, such as searching in infinite or partially unknown datasets like network streams or online algorithms.
In conclusion, there is no one-size-fits-all solution when it comes to search algorithms. Each algorithm has specific strengths that make it optimal for certain types of data or use cases. Sequential Search offers simplicity at the cost of performance, Binary Search provides speed in sorted datasets, Interpolation Search excels in uniform data distributions, and Exponential Search offers a flexible approach for unbounded datasets. By understanding the characteristics of the dataset and the requirements of the application, developers can select the most appropriate search algorithm to achieve efficient and reliable results.
8. Links and Literature
Nothing listed.
8.1. vogella Java example code
If you need more assistance we offer Online Training and Onsite training as well as consulting