Academia.edu no longer supports Internet Explorer.
To browse Academia.edu and the wider internet faster and more securely, please take a few seconds to upgrade your browser.
2007, The VLDB Journal
Given a multi-dimensional point q, a reverse k nearest neighbor (RkNN) query retrieves all the data points that have q as one of their k nearest neighbors. Existing methods for processing such queries have at least one of the following deficiencies: they (i) do not support arbitrary values of k, (ii) cannot deal efficiently with database updates, (iii) are applicable only to 2D data but not to higher dimensionality, and (iv) retrieve only approximate results. Motivated by these shortcomings, we develop algorithms for exact RkNN processing with arbitrary values of k on dynamic, multi-dimensional datasets. Our methods utilize a conventional data-partitioning index on the dataset and do not require any pre-computation. As a second step, we extend the proposed techniques to continuous RkNN search, which returns the RkNN results for every point on a line segment. We evaluate the effectiveness of our algorithms with extensive experiments using both real and synthetic datasets.
Proceedings 2004 VLDB Conference, 2004
Given a point q, a reverse k nearest neighbor (RkNN) query retrieves all the data points that have q as one of their k nearest neighbors. Existing methods for processing such queries have at least one of the following deficiencies: (i) they do not support arbitrary values of k (ii) they cannot deal efficiently with database updates, (iii) they are applicable only to 2D data (but not to higher dimensionality), and (iv) they retrieve only approximate results. Motivated by these shortcomings, we develop algorithms for exact processing of RkNN with arbitrary values of k on dynamic multidimensional datasets. Our methods utilize a conventional data-partitioning index on the dataset and do not require any pre-computation. In addition to their flexibility, we experimentally verify that the proposed algorithms outperform the existing ones even in their restricted focus.
Proceedings of the twelfth international conference on Information and knowledge management - CIKM '03, 2003
Reverse Nearest Neighbor (RNN) queries are of particular interest in a wide range of applications such as decision support systems, profile based marketing, data streaming, document databases, and bioinformatics. The earlier approaches to solve this problem mostly deal with two dimensional data. However most of the above applications inherently involve high dimensions and high dimensional RNN problem is still unexplored. In this paper, we propose an approximate solution to answer RNN queries in high dimensions. Our approach is based on the strong correlation in practice between k-NN and RNN. It works in two phases. In the first phase the k-NN of a query point is found and in the next phase they are further analyzed using a novel type of query Boolean Range Query (BRQ). Experimental results show that BRQ is much more efficient than both NN and range queries, and can be effectively used to answer RNN queries. Performance is further improved by running multiple BRQ simultaneously. The proposed approach can also be used to answer other variants of RNN queries such as RNN of order k, bichromatic RNN, and Matching Query which has many applications of its own. Our technique can efficiently answer NN, RNN, and its variants with approximately same number of I/O as running a NN query.
2006
The reverse k-nearest neighbor (RkNN) problem, i.e. finding all objects in a data set the k-nearest neighbors of which include a specified query object, is a generalization of the reverse 1-nearest neighbor problem which has received increasing attention recently. Many industrial and scientific applications call for solutions of the RkNN problem in arbitrary metric spaces where the data objects are not Euclidean and only a metric distance function is given for specifying object similarity. Usually, these applications need a solution for the generalized problem where the value of k is not known in advance and may change from query to query. However, existing approaches, except one, are designed for the specific R1NN problem. In addition -to the best of our knowledge -all previously proposed methods, especially the one for generalized RkNN search, are only applicable to Euclidean vector data but not for general metric objects. In this paper, we propose the first approach for efficient RkNN search in arbitrary metric spaces where the value of k is specified at query time. Our approach uses the advantages of existing metric index structures but proposes to use conservative and progressive distance approximations in order to filter out true drops and true hits. In particular, we approximate the k-nearest neighbor distance for each data object by upper and lower bounds using two functions of only two parameters each. Thus, our method does not generate any considerable storage overhead. We show in a broad experimental evaluation on real-world data the scalability and the usability of our novel approach.
2007
The reverse k-nearest neighbor (RkNN) problem, i.e. finding all objects in a data set the k-nearest neighbors of which include a specified query object, has received increasing attention recently. Many industrial and scientific applications call for solutions of the RkNN problem in arbitrary metric spaces where the data objects are not Euclidean and only a metric distance function is given for specifying object similarity. Usually, these applications need a solution for the generalized problem where the value of k is not known in advance and may change from query to query. In addition, many applications require a fast approximate answer of RkNN-queries. For these scenarios, it is important to generate a fast answer with high recall. In this paper, we propose the first approach for efficient approximative RkNN search in arbitrary metric spaces where the value of k is specified at query time. Our approach uses the advantages of existing metric index structures but proposes to use an approximation of the nearest-neighbor-distances in order to prune the search space. We show that our method scales significantly better than existing non-approximative approaches while producing an approximation of the true query result with a high recall.
Given a set P of N points in a ddimensional space, along with a query point q, it is often desirable to find k points of P that are with high probability close to q. This is the Approximate k-Nearest-Neighbors problem. We present two algorithms for AkNN. Both require O(N 2 d) preprocessing time. The first algorithm has a query time cost that is O(d+log N ), while the second has a query time cost that is O(d). Both algorithms create an undirected graph on the points of P by adding edges to a linked list storing P in Hilbert order. To find approximate nearest neighbors of a query point, both algorithms perform bestfirst search on this graph. The first algorithm uses standard one dimensional indexing structures to find starting points on the graph for this search, whereas the second algorithm using random starting points. Despite the quadratic preprocessing time, our algorithms have the potential to be useful in machine learning applications where the number of query points that need to be processed is large compared to the number of points in P . The linear dependence in d of the preprocessing and query time costs of our algorithms allows them to remain effective even when dealing with highdimensional data.
The Vldb Journal
In this paper, we study the problem of continuous monitoring of reverse k nearest neighbors queries in Euclidean space as well as in spatial networks. Existing techniques are sensitive toward objects and queries movement. For example, the results of a query are to be recomputed whenever the query changes its location. We present a framework for continuous reverse k nearest neighbor (RkNN) queries by assigning each object and query with a safe region such that the expensive recomputation is not required as long as the query and objects remain in their respective safe regions. This significantly improves the computation cost. As a byproduct, our framework also reduces the communication cost in client–server architectures because an object does not report its location to the server unless it leaves its safe region or the server sends a location update request. We also conduct a rigid cost analysis for our Euclidean space RkNN algorithm. We show that our techniques can also be applied to answer bichromatic RkNN queries in Euclidean space as well as in spatial networks. Furthermore, we show that our techniques can be extended for the spatial networks that are represented by directed graphs. The extensive experiments demonstrate that our techniques outperform the existing techniques by an order of magnitude in terms of computation cost and communication cost.
2009 IEEE 25th International Conference on Data Engineering, 2009
Reverse nearest neighbor (RNN) queries have a broad application base such as decision support, profile-based marketing, resource allocation, data mining, etc. Previous work on RNN search does not take obstacles into consideration. In the real world, however, there are many physical obstacles (e.g., buildings, blindages, etc.), and their presence may affect the visibility/distance between two objects. In this paper, we introduce a novel variant of RNN queries, namely visible reverse nearest neighbor (VRNN) search, which considers the obstacle influence on the visibility of objects. Given a data set P, an obstacle set O, and a query point q, a VRNN query retrieves the points in P that have q as their nearest neighbor and are visible to q. We propose an efficient algorithm for VRNN query processing, assuming that both P and O are indexed by R-trees. Our methods do not require any pre-processing, and employ half-plane property and visibility check to prune the search space. In addition, we extend our solution to tackle the visible reverse knearest neighbor (VRkNN) search, which finds the points in P that have q as one of their k nearest neighbors and are visible to q. Extensive experiments on synthetic and real datasets have been conducted which demonstrate the efficiency and effectiveness of our proposed algorithms.
Proceedings of the 12th International Conference on Extending Database Technology Advances in Database Technology - EDBT '09, 2009
In this paper, we propose an original solution for the general reverse k-nearest neighbor (RkNN) search problem. Compared to the limitations of existing methods for the RkNN search, our approach works on top of any hierarchically organized tree-like index structure and, thus, is applicable to any type of data as long as a metric distance function is defined on the data objects. We will exemplarily show how our approach works on top of the most prevalent index structures for Euclidean and metric data, the R-Tree and the M-Tree, respectively. Our solution is applicable for arbitrary values of k and can also be applied in dynamic environments where updates of the database frequently occur. Although being the most general solution for the RkNN problem, our solution outperforms existing methods in terms of query execution times because it exploits different strategies for pruning false drops and identifying true hits as soon as possible.
2009 Second International Workshop on Similarity Search and Applications, 2009
Retrieving the k-nearest neighbors of a query object is a basic primitive in similarity searching. A related, far less explored primitive is to obtain the dataset elements which would have the query object within their own k-nearest neighbors, known as the reverse k-nearest neighbor query. We already have indices and algorithms to solve k-nearest neighbors queries in general metric spaces; yet, in many cases of practical interest they degenerate to sequential scanning. The naive algorithm for reverse k-nearest neighbor queries has quadratic complexity, because the k-nearest neighbors of all the dataset objects must be found; this is too expensive. Hence, when solving these primitives we can tolerate trading correctness in the solution for searching time. In this paper we propose an efficient approximate approach to solve these similarity queries with high retrieval rate. Then, we show how to use our approximate k-nearest neighbor queries to construct (an approximation of) the k-nearest neighbor graph when we have a fixed dataset. Finally, combining both primitives we show how to dynamically maintain the approximate k-nearest neighbor graph of the objects currently stored within the metric dataset, that is, considering both object insertions and deletions.
2018
Efficient k-nearest neighbor computation for high-dimensional data is an important, yet challenging task. The response times of stateof-the-art indexing approaches highly depend on factors like distribution of the data. For clustered data, such approaches are several factors faster than a sequential scan. However, if various dimensions contain uniform or Gaussian data they tend to be clearly outperformed by a simple sequential scan. Hence, we require for an approach generally delivering good response times, independent of the data distribution. As solution, we propose to exploit a novel concept to efficiently compute nearest neighbors. We name it sub-space distance equality, which aims at reducing the number of distance computations independent of the data distribution. We integrate knn computing algorithms into the Elf index structure allowing to study the sub-space distance equality concept in isolation and in combination with a main-memory optimized storage layout. In a large com...
Data & Knowledge Engineering, 2005
Previous algorithms of data partitioning methods (DPMs) to find the exact K-nearest neighbors (KNN) at high dimensions are outperformed by a linear scan method [J.M. Kleinberg, Two algorithms for nearest neighbor search in high dimensions, 29th ACM Symposium on Theory of computing, 1997; R. Weber, H.-J. Schek, S. Blott. A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. in: Proc. of the 24th VLDB, USA, 1998]. In this paper, we present a ''plug& search'' method to greatly speed up the exact KNN search of existing DPMs. The idea is to linearize the data partitions produced by a DPM, rather than the points themselves, into a one-dimensional array-index, that is simple, compact and fast. Unlike most DPMs that support KNN search, which require storage space linear, or exponential [J.M. Kleinberg, Two algorithms for nearest neighbor search in high dimensions, 29th ACM Symposium on Theory of computing, 1997; M. Hagedoom, Nearest neighbors can be found efficiently if the dimension is small relative to the input size, ICDT 2003], in dimensions, the array-index requires a storage space that is linear in the number of mapped partitions.
2011
Given a set of objects and a query q, a point p is called the reverse k nearest neighbor (RkNN) of q if q is one of the k closest objects of p. In this paper, we introduce the concept of influence zone which is the area such that every point inside this area is the RkNN of q and every point outside this area is not the RkNN. The influence zone has several applications in location based services, marketing and decision support systems. It can also be used to efficiently process RkNN queries. First, we present efficient algorithm to compute the influence zone. Then, based on the influence zone, we present efficient algorithms to process RkNN queries that significantly outperform existing best known techniques for both the snapshot and continuous RkNN queries. We also present a detailed theoretical analysis to analyse the area of the influence zone and IO costs of our RkNN processing algorithms. Our experiments demonstrate the accuracy of our theoretical analysis.
International Journal of Intelligent Systems, 2011
K-nearest neighbors (KNN) search in a high-dimensional vector space is an important paradigm for a variety of applications. Despite the continuous efforts in the past years, algorithms to find the exact KNN answer set at high dimensions are outperformed by a linear scan method. In this paper, we propose a technique to find the exact KNN image objects to a given query object. First, the proposed technique clusters the images using a self-organizing map algorithm and then it projects the found clusters into points in a linear space based on the distances between each cluster and a selected reference point. These projected points are then organized in a simple, compact, and yet fast index structure called array-index. Unlike most indexes that support KNN search, the array-index requires a storage space that is linear in the number of projected points. The experiments show that the proposed technique is more efficient and robust to dimensionality as compared to other well-known techniques because of its simplicity and compactness. C 2011 Wiley Periodicals, Inc. Figure 10. Comparison between the array-index, a linear scan, R*-tree, and SR-tree methods in terms of KNN search time versus number of dimensions: (left) for K = 1 and (right) for K = 50.
Proceedings of the VLDB Endowment, 2017
Given a query object q, reverse k -nearest neighbor (R k NN) search aims to locate those objects of the database that have q among their k -nearest neighbors. In this paper, we propose an approximation method for solving R k NN queries, where the pruning operations and termination tests are guided by a characterization of the intrinsic dimensionality of the data. The method can accommodate any index structure supporting incremental (forward) nearest-neighbor search for the generation and verification of candidates, while avoiding impractically-high preprocessing costs. We also provide experimental evidence that our method significantly outperforms its competitors in terms of the tradeoff between execution time and the quality of the approximation. Our approach thus addresses many of the scalability issues surrounding the use of previous methods in data mining.
Pattern Recognition, 2010
A novel approach for k-nearest neighbor (k-NN) searching with Euclidean metric is described. It is well known that many sophisticated algorithms cannot beat the brute-force algorithm when the dimensionality is high. In this study, a probably correct approach, in which the correct set of k-nearest neighbors is obtained in high probability, is proposed for greatly reducing the searching time. We exploit the marginal distribution of the kth nearest neighbors in low dimensions, which is estimated from the stored data (an empirical percentile approach). We analyze the basic nature of the marginal distribution and show the advantage of the implemented algorithm, which is a probabilistic variant of the partial distance searching. Its query time is sublinear in data size n, that is, O(mnδ) with δ = o(1) in n and δ ≤ 1 , for any fixed dimension m.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997
The problem of finding the closest point in high-dimensional spaces is common in pattern recognition. Unfortunately, the complexity of most existing search algorithms, such as k-d tree and R-tree, grows exponentially with dimension, making them impractical for dimensionality above 15. In nearly all applications, the closest point is of interest only if it lies within a user-specified distance e. We present a simple and practical algorithm to efficiently search for the nearest neighbor within Euclidean distance e. The use of projection search combined with a novel data structure dramatically improves performance in high dimensions. A complexity analysis is presented which helps to automatically determine e in structured problems. A comprehensive set of benchmarks clearly shows the superiority of the proposed algorithm for a variety of structured and unstructured search problems. Object recognition is demonstrated as an example application. The simplicity of the algorithm makes it possible to construct an inexpensive hardware search engine which can be 100 times faster than its software equivalent. A C++ implementation of our algorithm is available upon request to [email protected]/CAVE/.
IEEE Transactions on Knowledge and Data Engineering, 2018
Given a set of facilities and a set of users, a reverse nearest neighbors (RNN) query retrieves every user u for which the query facility q is its closest facility. Since q is the closest facility to u, the user u is said to be influenced by q. In this paper, we propose a relaxed definition of influence where a user u is said to be influenced by not only its closest facility but also every other facility that is almost as close to u as its closest facility is. Based on this definition of influence, we propose reverse approximate nearest neighbors (RANN) queries. Formally, given a value x > 1, an RANN query q returns every user u for which distðu; qÞ x  NNDistðuÞ where NNDistðuÞ denotes the distance between a user u and its nearest facility, i.e., q is an approximate nearest neighbor of u. In this paper, we study both snapshot and continuous versions of RANN queries. In a snapshot RANN query, the underlying data sets do not change and the results of a query are to be computed only once. In the continuous version, the users continuously change their locations and the results of RANN queries are to be continuously monitored. Based on effective pruning techniques and several non-trivial observations, we propose efficient RANN query processing algorithms for both the snapshot and continuous RANN queries. We conduct extensive experiments on both real and synthetic data sets and demonstrate that our algorithm for both snapshot and continuous queries are significantly better than the competitors.
The VLDB Journal, 2012
Given a set of objects and a query , a point is called the reverse nearest neighbor (R NN) of if is one of the closest objects of. In this paper, we introduce the concept of influence zone which is the area such that every point inside this area is the R NN of and every point outside this area is not the R NN. The influence zone has several applications in location based services, marketing and decision support systems. It can also be used to efficiently process R NN queries. First, we present efficient algorithm to compute the influence zone. Then, based on the influence zone, we present efficient algorithms to process R NN queries that significantly outperform existing best known techniques for both the snapshot and continuous R NN queries. We also present a detailed theoretical analysis to analyse the area of the influence zone and IO costs of our R NN processing algorithms. Our experiments demonstrate the accuracy of our theoretical analysis. This paper is an extended version of our previous work [9]. We make the following new contributions in this extended version: 1) we conduct a rigorous complexity analysis and show that the complexity of one of our proposed algorithms in [9] can be reduced from (2) to () where > is the number of objects used to compute the influence zone ; 2) we show that our techniques can
1998
Similarity search in multimedia databases requires an efficient support of nearest-neighbor search on a large set of high-dimensional points as a basic operation for query processing. As recent theoretical results show, state of the art approaches to nearest-neighbor search are not efficient in higher dimensions. In our new approach, we therefore precompute the result of any nearest-neighbor search which corresponds to a computation of the voronoi cell of each data point. In a second step, we store the voronoi cells in an index structure efficient for high-dimensional data spaces. As a result, nearest neighbor search corresponds to a simple point query on the index structure. Although our technique is based on a precomputation of the solution space, it is dynamic, i.e. it supports insertions of new data points. An extensive experimental evaluation of our technique demonstrates the high efficiency for uniformly distributed as well as real data. We obtained a significant reduction of the search time compared to nearest neighbor search in the X-tree (up to a factor of 4).
2015
Spatial data mining is a special kind of data mining. Patterns, clusters, classifications, etc., can be derived from the big data available. Especially, nearest neighbor search approach with respect to a query point plays a key role in arriving at the final decision making. Like Computer Integrated Manufacturing, Facility Layout, Cellular Manufacturing, nearest neighbor search has been found several applications in searching the nearest hospitals, restaurants, jogging parks, wedding halls, cinema theaters, schools, etc. This paper presents a brief literature review of efficient and fast nearest neighbor search. The older approach is banked upon IR 2 –Tree that usually follows two strategies: R Tree and Signature files. But during the last couple of years, several research papers have been published for fast and efficient nearest neighbor search (FNN) optimizing space; accuracy for handling geometric properties and documents, etc, SI-Index is one of the latest techniques that deal ef...
Loading Preview
Sorry, preview is currently unavailable. You can download the paper by clicking the button above.