2014, Lecture Notes in Computer Science
We study the convex-hull problem in a probabilistic setting, motivated by the need to handle data uncertainty inherent in many applications, including sensor databases, location-based services and computer vision. In our framework, the uncertainty of each input point is described by a probability distribution over a finite number of possible locations including a null location to account for non-existence of the point. Our results include both exact and approximation algorithms for computing the probability of a query point lying inside the convex hull of the input, time-space tradeoffs for the membership queries, a connection between Tukey depth and membership queries, as well as a new notion of β-hull that may be a useful representation of uncertain hulls.
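To make the membership probability concrete, here is a minimal brute-force sketch in two dimensions. It is not one of the paper's algorithms: it simply enumerates every joint outcome, so its running time is exponential in the number of points. The function names and the input layout (a list of `(location, probability)` pairs per point, with `None` as the null location) are assumptions made for illustration, and the points are assumed to be independent.

```python
from itertools import product

def cross(o, a, b):
    """Twice the signed area of triangle (o, a, b); positive if counter-clockwise."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def in_hull(q, points):
    """True if q lies inside or on the boundary of the convex hull of points."""
    hull = convex_hull(points)
    if not hull:
        return False
    n = len(hull)
    # q must be on the left of (or on) every counter-clockwise edge.
    left_of_all = all(cross(hull[i], hull[(i + 1) % n], q) >= 0 for i in range(n))
    # The bounding-box check handles degenerate hulls (a single point or a segment).
    xs = [p[0] for p in hull]
    ys = [p[1] for p in hull]
    in_box = min(xs) <= q[0] <= max(xs) and min(ys) <= q[1] <= max(ys)
    return left_of_all and in_box

def membership_probability(q, uncertain_points):
    """Sum the probabilities of all joint outcomes whose hull contains q.

    uncertain_points: one list per input point of (location, probability) pairs,
    where location is an (x, y) tuple or None for "does not exist".
    Assumes the points are independent (illustrative assumption).
    """
    total = 0.0
    for outcome in product(*uncertain_points):
        prob = 1.0
        for _, p in outcome:
            prob *= p
        present = [loc for loc, _ in outcome if loc is not None]
        if in_hull(q, present):
            total += prob
    return total
```

For example, with two certain points at (4, 0) and (0, 4) and a third point that sits at (0, 0) with probability 0.5 and is absent otherwise, the query point (1, 1) is inside the hull only when the third point exists, so the sketch returns 0.5.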
Lecture Notes in Computer Science, 2005
We study the problem of answering spatial queries in databases where objects exist with some uncertainty and they are associated with an existential probability. The goal of a thresholding probabilistic spatial query is to retrieve the objects that qualify the spatial predicates with probability that exceeds a threshold. Accordingly, a ranking probabilistic spatial query selects the objects with the highest probabilities to qualify the spatial predicates. We propose adaptations of spatial access methods and search algorithms for probabilistic versions of range queries and nearest neighbors and conduct an extensive experimental study, which evaluates the effectiveness of proposed solutions.
Lecture Notes in Computer Science, 2013
Consider a set of d-dimensional points where the existence or the location of each point is determined by a probability distribution. The convex hull of this set is a random variable distributed over exponentially many choices. We are interested in finding the most likely convex hull, namely, the one with the maximum probability of occurrence. We investigate this problem under two natural models of uncertainty: the point (also called the tuple) model where each point (site) has a fixed position s_i but only exists with some probability π_i, for 0 < π_i ≤ 1, and the multipoint model where each point has multiple possible locations or it may not appear at all. We show that the most likely hull under the point model can be computed in O(n^3) time for n points in d = 2 dimensions, but it is NP-hard for d ≥ 3 dimensions. On the other hand, we show that the problem is NP-hard under the multipoint model even for d = 2 dimensions. We also present hardness results for approximating the probability of the most likely hull. While we focus on the most likely hull for concreteness, our results hold for other natural definitions of a probabilistic hull.
IEEE Transactions on Knowledge and Data Engineering, 2000
We study the problem of answering spatial queries in databases where objects exist with some uncertainty and they are associated with an existential probability. The goal of a thresholding probabilistic spatial query is to retrieve the objects that qualify the spatial predicates with probability that exceeds a threshold. Accordingly, a ranking probabilistic spatial query selects the objects with the highest probabilities to qualify the spatial predicates. We propose adaptations of spatial access methods and search algorithms for probabilistic versions of range queries, nearest neighbors, spatial skylines, and reverse nearest neighbors and conduct an extensive experimental study, which evaluates the effectiveness of proposed solutions.
Information Sciences, 2017
Location-based services have motivated intensive research in the field of mobile computing, and particularly on location-dependent queries. Existing approaches usually assume that the location data are expressed at a fine geographic precision (physical coordinates such as GPS). However, many positioning mechanisms are subject to an inherent imprecision (e.g., the cell-id mechanism used in cellular networks can only determine the cell where a certain moving object is located). Moreover, even a GPS location can be subject to an error or be obfuscated for privacy reasons. Thus, moving objects can be considered to be associated with an uncertainty area where they can be located. In this paper, we analyze the problem introduced by the imprecision of the location data available in the data sources by modelling them using uncertainty areas. To do so, we propose to use a higher-level representation of locations which includes uncertainty, formalizing the concept of uncertainty location granule. This allows us to consider probabilistic location-dependent queries, among which we will focus on probabilistic inside (range) constraints. The adopted model allows us to develop a systematic and efficient approach for processing this kind of query. An experimental evaluation shows that these probabilistic queries can be supported efficiently.
2007
Nearest-neighbor queries are an important query type for commonly used feature databases. In many different application areas, e.g. sensor databases, location based services or face recognition systems, distances between objects have to be computed based on vague and uncertain data. A successful approach is to express the distance between two uncertain objects by probability density functions which assign a probability value to each possible distance value. By integrating the complete probabilistic distance function as a whole directly into the query algorithm, the full information provided by these functions is exploited. The result of such a probabilistic query algorithm consists of tuples containing the result object and a probability value indicating the likelihood that the object satisfies the query predicate. In this paper we introduce an efficient strategy for processing probabilistic nearest-neighbor queries, as the computation of these probability values is very expensive. In a detailed experimental evaluation, we demonstrate the benefits of our probabilistic query approach. The experiments show that we can achieve high quality query results with rather low computational cost.
2008 Panhellenic Conference on Informatics, 2008
The domain of existentially uncertain spatial data refers to objects that are modelled using an existential probability accompanying spatial data values. An interesting and challenging query type over existentially uncertain data is the search of the Nearest Neighbor (NN), since the probability of a potential dataset object to be the NN of the query object depends on the locations and probabilities of other points in the same dataset. In this paper, following a statistical approach, we estimate the average number of the NNs required to answer probabilistic thresholding NN (PTNN) queries as function of the threshold t, allowing us to utilize existing approaches and propose a cost model for such queries. Based on the same statistical approach, we propose an efficient algorithm for PTNN queries over arbitrarily structured existentially uncertain spatial data. Our experimental study demonstrates the accuracy and efficiency of the proposed techniques.
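The dependence of one object's NN probability on the other objects can be illustrated with a short sketch. Assuming objects exist independently and have distinct distances to the query, an object is the nearest neighbor exactly when it exists and every closer object does not. The code below is an illustration of that relationship only, not the algorithm or cost model proposed in the paper; the function names and the `(point, existence_probability)` input layout are hypothetical.

```python
from math import dist

def nn_probabilities(query, objects):
    """Return (point, probability of being the NN) pairs, nearest first.

    objects: list of (point, existence_probability) pairs.
    Assumes independent existence and distinct distances to the query.
    """
    ranked = sorted(objects, key=lambda obj: dist(query, obj[0]))
    result = []
    none_closer = 1.0  # probability that no closer object exists
    for point, p in ranked:
        result.append((point, none_closer * p))
        none_closer *= 1.0 - p
    return result

def ptnn(query, objects, t):
    """Probabilistic thresholding NN: objects whose NN probability exceeds t."""
    return [(pt, pr) for pt, pr in nn_probabilities(query, objects) if pr > t]
```

Because the running product `none_closer` only decreases, a scan in distance order could stop as soon as it drops to the threshold or below; the sketch scans everything for simplicity.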
2004
It is infeasible for a sensor database to contain the exact value of each sensor at all points in time. This uncertainty is inherent in these systems due to measurement and sampling errors, and resource limitations. In order to avoid drawing erroneous conclusions based upon stale data, the use of uncertainty intervals that model each data item as a range and associated probability density function (pdf) rather than a single value has recently been proposed. Querying these uncertain data introduces imprecision into answers, in the form of probability values that specify the likelihood that the answer satisfies the query. These queries are more expensive to evaluate than their traditional counterparts but are guaranteed to be correct and more informative due to the probabilities accompanying the answers. Although the answer probabilities are useful, for many applications, their precise value is less critical. In particular, for many queries it is only necessary to know whether the probability exceeds a given threshold; we term these Probabilistic Threshold Queries (PTQ). In this paper we address the efficient computation of these types of queries.
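As a small illustration of a probabilistic threshold query, the sketch below evaluates a range predicate over uncertainty intervals. It assumes a uniform pdf on each interval, which is an assumption of this example and not something the abstract specifies, and it checks every item directly rather than using any index. All names are hypothetical.

```python
def prob_in_range(lo, hi, a, b):
    """Probability that a value uniform on [lo, hi] falls inside [a, b]."""
    if hi <= lo:
        raise ValueError("uncertainty interval must have positive length")
    overlap = max(0.0, min(hi, b) - max(lo, a))
    return overlap / (hi - lo)

def threshold_range_query(items, a, b, tau):
    """Return the names of items whose probability of lying in [a, b] exceeds tau.

    items: dict mapping an item name to its uncertainty interval (lo, hi).
    """
    return [name for name, (lo, hi) in items.items()
            if prob_in_range(lo, hi, a, b) > tau]
```

The query only needs to know on which side of `tau` each probability falls, which is what allows an implementation to skip computing exact probabilities for many items.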
IEEE Transactions on Knowledge and Data Engineering, 2012
In many applications, including location-based services, queries may not be precise. In this paper, we study the problem of efficiently computing range aggregates in a multidimensional space when the query location is uncertain. Specifically, for a query point Q whose location is uncertain and a set S of points in a multidimensional space, we want to calculate the aggregate (e.g., count, average and sum) over the subset S′ of S such that for each p ∈ S′, Q lies within a given distance of p with at least a given probability. We propose novel, efficient techniques to solve the problem following the filtering-and-verification paradigm. In particular, two novel filtering techniques are proposed to effectively and efficiently remove data points from verification. Our comprehensive experiments based on both real and synthetic data demonstrate the efficiency and scalability of our techniques.
Information Systems, 2013
Due to the pervasive data uncertainty in many real applications, efficient and effective query answering on uncertain data has recently gained much attention from the database community. In this paper, we propose a novel and important query in the context of uncertain databases, namely probabilistic group subspace skyline (PGSS) query, which is useful in applications like sensor data analysis. Specifically, a PGSS query retrieves those uncertain objects that are, with high confidence, not dynamically dominated by other objects, with respect to a group of query points in ad-hoc subspaces. In order to enable fast PGSS query answering, we propose effective pruning methods to reduce the PGSS search space, which are seamlessly integrated into an efficient PGSS query procedure. Furthermore, to achieve low query cost, we provide a cost model, in light of which uncertain data are pre-processed and indexed. Extensive experiments have been conducted to demonstrate the efficiency and effectiveness of our proposed approaches.