2009
12 pages
In this paper, we present a Distributed Incremental Nearest Neighbor algorithm (DINN) for finding the closest objects in an incremental fashion over data distributed among computer nodes, each able to perform its own local Incremental Nearest Neighbor (local-INN) algorithm. We prove that our algorithm is optimal with respect to both the number of involved nodes and the number of local-INN invocations. An implementation of our DINN algorithm on a real P2P system called MCAN was used to conduct an extensive experimental evaluation on a real-life dataset.
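To make the idea of incremental retrieval over distributed nodes concrete, here is a minimal Python sketch. It is not the DINN algorithm from the paper, whose details the abstract does not give. It is a generic best-first merge under stated assumptions: the `Node` class, its 1-D interval `[lo, hi]`, the `lower_bound` method, and the `invocations` counter are all hypothetical names introduced for illustration. A node is contacted only when its lower bound on the distance to the query is smaller than every candidate already known.

```python
import heapq
from itertools import count, islice


class Node:
    """Hypothetical node storing 1-D values that all lie in [lo, hi]."""

    def __init__(self, lo, hi, values):
        self.lo, self.hi, self.values = lo, hi, list(values)
        self.invocations = 0  # objects produced so far by the local-INN

    def lower_bound(self, q):
        # No object on this node can be closer to q than this distance.
        return max(self.lo - q, q - self.hi, 0.0)

    def local_inn(self, q):
        # Local incremental NN: yields (distance, value), closest first.
        for v in sorted(self.values, key=lambda v: abs(v - q)):
            self.invocations += 1
            yield abs(v - q), v


def distributed_inn(nodes, q):
    """Yield (distance, value) over all nodes in non-decreasing distance."""
    tie = count()  # unique tie-breaker so heap tuples never compare nodes
    heap = [(n.lower_bound(q), next(tie), "node", n, None) for n in nodes]
    heapq.heapify(heap)
    streams = {}
    while heap:
        d, _, kind, node, value = heapq.heappop(heap)
        if kind == "node":
            # First contact: the node's bound beat every known candidate.
            streams[id(node)] = node.local_inn(q)
        else:
            yield d, value
        # Ask the same node for its next-closest object, if any.
        nxt = next(streams[id(node)], None)
        if nxt is not None:
            heapq.heappush(heap, (nxt[0], next(tie), "obj", node, nxt[1]))


nodes = [Node(0, 10, [1, 4, 9]), Node(10, 20, [11, 15]), Node(20, 30, [22, 29])]
first_two = list(islice(distributed_inn(nodes, 5), 2))
print(first_two, [n.invocations for n in nodes])
```

Because the result is a generator, a caller can stop after any number of neighbors and resume later, which is the incremental behavior the abstract describes. The sketch makes no claim to the optimality that the paper proves for DINN.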
Future Generation Computer Systems, 2009
The state of the art in searching for non-text data (e.g., images) is to use extracted metadata annotations or text, which might be available as related information. However, supporting real content-based audio-visual search, based on similarity search on features, is significantly more expensive than searching for text. Moreover, such search exhibits linear scalability with respect to the data set size, so parallel query execution is needed.
2007
Searching for non-text data (e.g., images) is mostly done by means of metadata annotations or by extracting the text close to the data. However, supporting real content-based audio-visual search, based on similarity search on features, is significantly more expensive than searching for text. Moreover, the search exhibits linear scalability with respect to the data set size.
2005
Similarity search in metric spaces represents an important paradigm for content-based retrieval in many applications. Existing centralized search structures can speed up retrieval, but they do not scale to large volumes of data because the response time increases linearly with the size of the searched file. In this article, we study the problem of executing nearest neighbor(s) queries in a distributed metric structure, which is based on the P2P communication paradigm and generalized hyperplane partitioning.
Lecture Notes in Computer Science, 2012
We propose a novel approach for solving the approximate nearest neighbor search problem in arbitrary metric spaces. The distinctive feature of our approach is that we can incrementally build a non-hierarchical distributed structure for given metric space data, with logarithmic complexity scaling in the size of the structure and probabilistic nearest neighbor queries of adjustable accuracy. The structure is based on a small world graph with vertices corresponding to the stored elements, edges for links between them, and the greedy algorithm as the base algorithm for searching. Both the search and the addition algorithms require only local information from the structure. The simulation performed for data in Euclidean space shows that the structure built using the proposed algorithm has navigable small world properties with logarithmic search complexity at fixed accuracy and has weak (power law) scalability with the dimensionality of the stored data.
ArXiv, 2020
Nearest neighbor search (NNS) has a wide range of applications in information retrieval, computer vision, machine learning, databases, and other areas. The existing state-of-the-art algorithm for nearest neighbor search, Hierarchical Navigable Small World Networks (HNSW), is unable to scale to large datasets of 100M records in high dimensions. In this paper, we propose LANNS, an end-to-end platform for Approximate Nearest Neighbor Search, which scales to web-scale datasets. Library for Large Scale Approximate Nearest Neighbor Search (LANNS) is deployed in multiple production systems for identifying topK ($100 \leq topK \leq 200$) approximate nearest neighbors with a latency of a few milliseconds per query and a high throughput of 2.5k Queries Per Second (QPS) on a single node, on large ($\sim$180M data points), high-dimensional (50-2048 dimensions) datasets.
IEEE Transactions on Knowledge and Data Engineering, 2015
Central to many applications involving moving objects is the task of processing k-nearest neighbor (k-NN) queries. Most of the existing approaches to this problem are designed for the centralized setting where query processing takes place on a single server; it is difficult, if not impossible, for them to scale to a distributed setting to handle the vast volume of data and concurrent queries that are increasingly common in those applications. To address this problem, we propose a suite of solutions that can support scalable distributed processing of k-NN queries. We first present a new index structure called Dynamic Strip Index (DSI), which can better adapt to different data distributions than existing grid indexes. Moreover, it can be naturally distributed across the cluster, therefore lending itself well to distributed processing. We further propose a distributed k-NN search (DKNN) algorithm based on DSI. DKNN avoids having an uncertain number of potentially expensive iterations, and is thus more efficient and more predictable than existing approaches. DSI and DKNN are implemented on Apache S4, an open-source platform for distributed stream processing. We perform extensive experiments to study the characteristics of DSI and DKNN, and compare them with three baseline methods. Experimental results show that our proposal scales well and significantly outperforms the alternative methods.
Proceedings of the first ACM/IEEE-CS joint conference on Digital libraries - JCDL '01, 2001
This paper considers the processing of digital library queries, consisting of a text component and a structured component in distributed environments. The paper concentrates on the processing of the structured component of a distributed query. A method is proposed to identify the databases that are likely to be useful for processing any given query and to determine the tuples from each useful site which are necessary for answering the query. In this way, both the communication cost and the local processing costs are saved. One common characteristic of these "k" nearest neighbors queries is that it is not necessary to obtain all the "k" nearest neighbors; it is often sufficient to get most of the "k" neighbors. Experimental results are provided to demonstrate that most of the "k" nearest neighbors (85% to 100%) are obtained using this approach. An average accuracy rate of 94.7% is achieved when the 20 closest neighbors are desired. (Contains 15 references.)
Nearest neighbor queries can be satisfied, in principle, with a greedy algorithm under a proximity graph. Each object in the database is represented by a node, and proximal nodes in this graph share an edge. To find the nearest neighbor, the idea is quite simple: we start at a random node and get iteratively closer to the nearest neighbor, following only adjacent edges in the proximity graph. Every node reachable from the current vertex is reviewed, and only the node closest to the query is expanded in the next round. The algorithm stops when none of the neighbors of the current node is closer to the query. The number of reviewed objects will be proportional to the diameter of the graph times the average degree of the nodes. Unfortunately, the degree of a proximity graph is unbounded for a general metric space [1], and hence the number of inspected objects can be linear in the size of the database, which is the same as no indexing at all. In this paper we introduce a quasi-proximity graph induced by the all-k-nearest neighbor graph. The degree of this graph is bounded, but we will face local minima when running the above greedy algorithm, which boils down to having false positives in the queries. We show experimental results for high-dimensional spaces. We report a recall greater than 90% for most configurations, which is very good for many proximity searching applications, while reviewing just a tiny portion of the database. The space requirement for the index is linear in the database size, and the construction time is quadratic in the worst case. Relaxations of our method are sketched to obtain practical subquadratic implementations.
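The greedy walk described in this abstract can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names `knn_graph` and `greedy_nn` are hypothetical, the graph is a brute-force directed k-nearest-neighbor graph (the abstract does not say how the quasi-proximity graph is derived from it), and the start node is passed in rather than chosen at random so that runs are repeatable.

```python
def knn_graph(points, k, dist):
    """Brute-force k-NN graph: node i links to its k closest other points."""
    graph = {}
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: dist(p, points[j]))
        graph[i] = others[:k]
    return graph


def greedy_nn(graph, points, query, dist, start):
    """Walk toward the query; stop when no neighbor is strictly closer.

    Returns (index, distance, number of distance evaluations).
    """
    current, current_d = start, dist(points[start], query)
    inspected = 1
    while True:
        best, best_d = current, current_d
        for nb in graph[current]:  # review every adjacent node
            d = dist(points[nb], query)
            inspected += 1
            if d < best_d:
                best, best_d = nb, d
        if best == current:  # none of the neighbors improves: stop
            return current, current_d, inspected
        current, current_d = best, best_d  # expand only the closest one


def abs_dist(a, b):
    return abs(a - b)


line = [float(i) for i in range(10)]
print(greedy_nn(knn_graph(line, 2, abs_dist), line, 7.2, abs_dist, start=0))

# Two distant clusters: the bounded-degree graph is disconnected, so a walk
# started in the wrong cluster stops at a local minimum.
clusters = [0.0, 1.0, 2.0, 100.0, 101.0, 102.0]
print(greedy_nn(knn_graph(clusters, 2, abs_dist), clusters, 101.0, abs_dist, start=0))
```

The second call shows the trade-off the abstract mentions: bounding the degree keeps each step cheap, but the walk can stop at a node that is not the true nearest neighbor.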
2009 Second International Workshop on Similarity Search and Applications, 2009
Retrieving the k-nearest neighbors of a query object is a basic primitive in similarity searching. A related, far less explored primitive is to obtain the dataset elements which would have the query object within their own k-nearest neighbors, known as the reverse k-nearest neighbor query. We already have indices and algorithms to solve k-nearest neighbors queries in general metric spaces; yet, in many cases of practical interest they degenerate to sequential scanning. The naive algorithm for reverse k-nearest neighbor queries has quadratic complexity, because the k-nearest neighbors of all the dataset objects must be found; this is too expensive. Hence, when solving these primitives we can tolerate trading correctness in the solution for searching time. In this paper we propose an efficient approximate approach to solve these similarity queries with high retrieval rate. Then, we show how to use our approximate k-nearest neighbor queries to construct (an approximation of) the k-nearest neighbor graph when we have a fixed dataset. Finally, combining both primitives we show how to dynamically maintain the approximate k-nearest neighbor graph of the objects currently stored within the metric dataset, that is, considering both object insertions and deletions.
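For reference, the naive exact algorithm that this abstract calls too expensive can be sketched in Python as below. The function name `reverse_knn_naive` and the tie-handling rule (the query counts as one of an object's k nearest neighbors when fewer than k dataset objects are strictly closer) are assumptions made for illustration. The approximate method proposed in the paper is not shown.

```python
def reverse_knn_naive(points, query, k, dist):
    """Indices of dataset points that would have `query` in their k-NN.

    For each of the n points, compare the query against all other points,
    which costs on the order of n^2 distance computations in total.
    """
    result = []
    for i, p in enumerate(points):
        d_q = dist(p, query)
        # Count dataset objects strictly closer to p than the query is.
        closer = sum(
            1
            for j, other in enumerate(points)
            if j != i and dist(p, other) < d_q
        )
        if closer < k:
            result.append(i)
    return result


def abs_dist(a, b):
    return abs(a - b)


data = [0.0, 1.0, 2.0, 10.0]
print(reverse_knn_naive(data, 9.0, 1, abs_dist))
print(reverse_knn_naive(data, 9.0, 3, abs_dist))
```

The nested scan over all pairs is where the quadratic cost comes from, which is why the abstract accepts an approximate answer in exchange for search time.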
ICPS '05. Proceedings. International Conference on Pervasive Services, 2005., 2005
In databases of moving objects it is important to answer queries that concern the future positions of the objects. An important query type in such an environment is the nearest-neighbor query, which asks for the k closest objects of a query object during a time interval $[t_s, t_e]$. However, there are cases where the (k+1)-th nearest-neighbor is requested after the execution of the k-NN query. In such a case, either the query must be evaluated again, or we can exploit the previous result and use an incremental method to determine the new answer. We focus on the second alternative and present efficient incremental algorithms that outperform the trivial method which is based on complete re-execution of the query. In addition, we study the problem of keeping the query result consistent in the presence of object insertions, deletions and updates which are very common in a dynamic moving-object environment.
Lecture Notes in Computer Science, 2005
Proceedings of the thirteenth ACM …, 2004
Lecture Notes in Computer Science, 2005
Proceedings of the 1st international workshop on Computer vision meets databases - CVDB '04, 2004
Proceedings of the 32nd ACM Symposium on Parallelism in Algorithms and Architectures, 2020