2011, Information Systems
Many recent applications involve processing and analyzing uncertain data. In this paper, we combine the feature of top-k objects with that of the skyline to model the problem of top-k skyline objects against uncertain data. The problem of efficiently computing top-k skyline objects on large uncertain datasets is challenging in both the discrete and continuous cases. In this paper, we first develop an efficient exact algorithm for computing the top-k skyline objects in the discrete case. To address applications where each object may have a massive set of instances or a continuous probability density function, we also develop an efficient randomized algorithm with an ε-approximation guarantee. Moreover, our algorithms can be immediately extended to efficiently compute the p-skyline, that is, to retrieve the uncertain objects with skyline probabilities above a given threshold. Our extensive experiments on synthetic and real data demonstrate the efficiency of both algorithms and the high accuracy of the randomized algorithm. They also show that our techniques significantly outperform the existing techniques for computing the p-skyline.
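To make the discrete model concrete, the following is a minimal brute-force sketch (not the paper's algorithm and without its pruning): it assumes each uncertain object is given as a list of (instance, probability) pairs whose probabilities sum to at most 1, assumes smaller attribute values are preferred, and computes each object's skyline probability before picking the k highest; all names are illustrative.

```python
def dominates(u, v):
    """u dominates v: no worse in every dimension and strictly better in at
    least one (smaller values preferred)."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))


def skyline_probability(obj_id, objects):
    """Skyline probability of one uncertain object under the usual
    instance-level semantics: an instance survives only if no instance of any
    *other* object dominates it."""
    prob = 0.0
    for u, p_u in objects[obj_id]:
        survive = p_u
        for other_id, instances in objects.items():
            if other_id == obj_id:
                continue
            p_dom = sum(p_v for v, p_v in instances if dominates(v, u))
            survive *= max(0.0, 1.0 - p_dom)
        prob += survive
    return prob


def top_k_skyline(objects, k):
    """Return the k objects with the highest skyline probabilities (brute force)."""
    scored = [(skyline_probability(oid, objects), oid) for oid in objects]
    return sorted(scored, reverse=True)[:k]


# Tiny example: three uncertain objects with two-dimensional instances.
objects = {
    "A": [((1.0, 3.0), 0.5), ((2.0, 2.0), 0.5)],
    "B": [((2.0, 4.0), 1.0)],
    "C": [((3.0, 1.0), 0.7), ((4.0, 4.0), 0.3)],
}
print(top_k_skyline(objects, k=2))
```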
IEEE Transactions on Knowledge and Data Engineering, 2012
With the rapid increase in the amount of uncertain data available, probabilistic skyline computation on uncertain databases has become an important research topic. Previous work on probabilistic skyline computation, however, only identifies those objects whose skyline probabilities are higher than a given threshold, or is useful only for 2D data sets. In this paper, we develop a probabilistic skyline algorithm called PSkyline which computes exact skyline probabilities of all objects in a given uncertain data set. PSkyline aims to identify blocks of instances with skyline probability zero, and more importantly, to find incomparable groups of instances and dispense with unnecessary dominance tests altogether. To increase the chance of finding such blocks and groups of instances, PSkyline uses a new in-memory tree structure called Z-tree. We also develop an online probabilistic skyline algorithm called O-PSkyline for uncertain data streams and a top-k probabilistic skyline algorithm called K-PSkyline to find top-k objects with the highest skyline probabilities. Experimental results show that all the proposed algorithms scale well to large and high-dimensional uncertain databases.
The skyline operator filters out a set of interesting points from a potentially large set of data points. A point is interesting if it is not dominated by any other point. Skyline queries are thus important for helping users cope with the large amount of available data by identifying a set of interesting data items. Skyline computation is widely used in multi-criteria decision making. This paper conducts a survey of research issues in computing skylines over uncertain databases, with the aim of providing interested researchers with an overview of the most recent research directions in this area. It further suggests possible research directions on skyline processing for uncertain databases.
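In symbols, the dominance relation and skyline set that the surveyed work builds on can be stated as follows; this is a sketch assuming d numeric attributes where smaller values are preferred.

```latex
% Pareto dominance and the skyline of a dataset D of d-dimensional points
% (smaller attribute values preferred).
\[
  u \prec v \;\Longleftrightarrow\;
    \bigl(\forall i \in \{1,\dots,d\}:\, u_i \le v_i\bigr) \,\wedge\,
    \bigl(\exists j \in \{1,\dots,d\}:\, u_j < v_j\bigr),
  \qquad
  \mathrm{SKY}(D) = \{\, u \in D \mid \nexists\, v \in D : v \prec u \,\}.
\]
```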
Lecture Notes in Computer Science, 2013
Skyline operator is a useful tool in multi-criteria decision making in various applications. Uncertainty is inherent in real applications due to various reasons. In this paper, we consider the problem of efficiently computing probabilistic skylines against the most recent N uncertain elements in a data stream seen so far. Specifically, we study the problem in the n-of-N model; that is, computing the probabilistic skyline for the most recent n (∀n ≤ N) elements, where an element is a probabilistic skyline element if its skyline probability is not below a given probability threshold q. Firstly, an effective pruning technique to minimize the number of uncertain elements to be kept is developed. It can be shown that on average storing only O(log^d N) uncertain elements from the most recent N elements is sufficient to support the precise computation of all probabilistic n-of-N skyline queries in a d-dimensional space if the data distribution on each dimension is independent. A novel encoding scheme is then proposed together with efficient update techniques so that computing a probabilistic n-of-N skyline query in a d-dimensional space is reduced to O(d log log N + s) if the data distribution is independent, where s is the number of skyline points. A trigger-based technique is provided to process continuous n-of-N skyline queries. Extensive experiments demonstrate that the new techniques on uncertain data streams can support on-line probabilistic skyline query computation over rapid data streams.
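For intuition about the query semantics (not the paper's O(log^d N)-space technique), here is a hedged brute-force baseline that assumes a tuple-level uncertainty model in which every stream element carries an occurrence probability; class and parameter names are illustrative.

```python
from collections import deque


def dominates(u, v):
    """u dominates v (smaller values preferred)."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))


class NofNBaseline:
    """Brute-force n-of-N probabilistic skyline: unlike the paper's approach,
    it simply buffers all of the most recent N elements."""

    def __init__(self, N):
        self.window = deque(maxlen=N)  # holds (value_tuple, occurrence_probability)

    def append(self, value, prob):
        self.window.append((value, prob))

    def query(self, n, q):
        """Elements among the most recent n whose skyline probability is >= q."""
        recent = list(self.window)[-n:]
        result = []
        for i, (value, prob) in enumerate(recent):
            sky_prob = prob
            for j, (other, p_other) in enumerate(recent):
                if j != i and dominates(other, value):
                    sky_prob *= (1.0 - p_other)
            if sky_prob >= q:
                result.append((value, sky_prob))
        return result


stream = NofNBaseline(N=1000)
for value, prob in [((1, 4), 0.9), ((2, 2), 0.6), ((3, 3), 0.8), ((2, 5), 0.5)]:
    stream.append(value, prob)
print(stream.query(n=3, q=0.3))
```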
ACM Transactions on Database Systems, 2012
In many applications involving multiple criteria optimal decision making, users may often want to make a personal trade-off among all optimal solutions for selecting the one object that best fits their personal needs. As a key feature, the skyline in a multidimensional space provides the minimum set of candidates for such purposes by removing all points not preferred by any (monotonic) utility/scoring functions; that is, the skyline removes all objects not preferred by any user no matter how their preferences vary. Driven by many recent applications with uncertain data, the probabilistic skyline model is proposed to retrieve uncertain objects based on skyline probabilities. Nevertheless, skyline probabilities cannot capture the preferences of monotonic utility functions. Motivated by this, in this article we propose a novel skyline operator, namely stochastic skylines. In the light of the expected utility principle, stochastic skylines guarantee to provide the minimum set of candidates ...
Information Sciences, 2015
Efficient computation of skyline probability over uncertain preferences has not received much attention in the database community, compared with skyline probability computation over uncertain data. All known algorithms for probabilistic skyline computation over uncertain preferences attempt to find an inexact value of the skyline probability by resorting to sampling or to an approximation scheme. Exact computation of skyline probability for a database with uncertain preferences of moderate size is not possible with any of the existing algorithms. In this paper, we propose an efficient algorithm that can compute skyline probability exactly for reasonably large databases. The inclusion-exclusion principle is used to express skyline probability in terms of the joint probabilities of all possible combinations. In this regard, we introduce the concept of a zero-contributing set, which has zero effect on the signed aggregate of joint probabilities. Our algorithm employs a prefix-based k-level absorption to identify zero-contributing sets. It is shown empirically that only a very small portion of the exponential search space remains after level-wise application of prefix-based absorption. Thus it becomes possible to compute skyline probability with respect to large datasets. Detailed experimental analyses on real and synthetic datasets are reported to corroborate this claim. We also propose an incremental algorithm to compute skyline probability in dynamic scenarios wherein objects are added incrementally. Moreover, the theoretical concepts developed in this paper help to devise an efficient technique to compute the skyline probability of all objects in the database. We show that the exponential search space is pruned once and that, for each individual object, the skyline probability can then be derived by inspecting a portion of the pruned lattice. We also use the concept of revival of absorbed pairs. We believe that this process is more efficient than computing the skyline probabilities individually.
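The inclusion-exclusion starting point of the algorithm can be written compactly; in the sketch below, D is the set of objects and A_v denotes the event that object v dominates the object u under the uncertain preferences (notation introduced here for illustration).

```latex
% Skyline probability of u as one minus the probability of the union of the
% dominance events A_v, expanded by inclusion-exclusion into the signed
% aggregate of joint probabilities.
\[
  \Pr[u \in \mathrm{SKY}]
    = 1 - \Pr\Bigl[\,\bigcup_{v \in D \setminus \{u\}} A_v\Bigr]
    = 1 - \sum_{\emptyset \neq S \subseteq D \setminus \{u\}}
        (-1)^{|S|+1}\, \Pr\Bigl[\,\bigcap_{v \in S} A_v\Bigr].
\]
```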
The Journal of Supercomputing, 2020
In recent years, numerous applications have been continuously generating large amounts of uncertain data. Advanced analysis queries such as skyline operators are essential for extracting interesting objects from vast uncertain datasets. Recently, the MapReduce system has been widely used in the area of big data analysis. Since the probabilistic skyline query is not decomposable, it is not straightforward to implement it in the MapReduce framework. This paper proposes an effective parallel method called parallel computation of probabilistic skyline query (PCPS) that can compute the probabilistic skyline set in one MapReduce pass. The proposed method takes critical sections into account and detects data with a high probability of existence through a proposed smart sampling algorithm. PCPS implements a new approach to the fair allocation of input data. The experimental results indicate that our proposed approach can not only reduce the processing time of probabilistic skyline queries, but also achieve fair precision across varying dimensionalities.
2011
The skyline of a relation is the set of tuples that are not dominated by any other tuple in the same relation, where tuple u dominates tuple v if u is no worse than v on all the attributes of interest and strictly better on at least one attribute. Previous attempts to extend skyline queries to probabilistic databases have proposed either a weaker form of domination, which is unsuitable to univocally define the skyline, or a definition that implies algorithms with exponential complexity. In this paper we demonstrate how, given a semantics for linearly ranking probabilistic tuples, the skyline of a probabilistic relation can be univocally defined. Our approach preserves the three fundamental properties of skyline: 1) it equals the union of all top-1 results of monotone scoring functions, 2) it requires no additional parameter to be specified, and 3) it is insensitive to actual attribute scales. We also detail efficient sequential and index-based algorithms.
IEEE Transactions on Knowledge and Data Engineering, 2013
In a deterministic relation R, a tuple u dominates tuple v if u is no worse than v on all the attributes of interest, and strictly better than v on at least one attribute. This notion of Pareto domination is at the heart of skyline queries, which return the set of undominated tuples in R. Unlike previous approaches, in which the skyline of a probabilistic relation is not univocally defined, since it depends on a threshold parameter, in this paper we demonstrate that, given a semantics for linearly ranking probabilistic tuples, the concept of skyline is well-defined even in the probabilistic case. Our approach exploits the order-theoretic definition of Pareto domination and preserves the three fundamental properties the skyline has in the deterministic case: 1) it equals the union of all top-1 results of monotone scoring functions, 2) it requires no additional parameter, and 3) it is insensitive to actual attribute scales. We then show how domination among probabilistic tuples (or P-domination for short) can be efficiently checked by means of a set of rules. We detail such rules for the most notable semantics for ranking probabilistic tuples. Since computing the skyline of a probabilistic relation is a time-consuming task, we introduce a family of algorithms for checking P-domination rules in an optimized way. Our experiments show that these algorithms can dramatically reduce the actual execution times with respect to a naïve evaluation, which makes skyline queries applicable also to large probabilistic datasets.
Information Sciences, 2007
In a number of emerging streaming applications, the data values that are produced have an associated time interval for which they are valid. A useful computation over such streaming data is to produce a continuous and valid skyline summary. Previous work on skyline algorithms has only focused on evaluating skylines over static data sets, and there are no known algorithms for skyline computation in the continuous setting. In this paper, we introduce the continuous time-interval skyline operator, which continuously computes the current skyline over a data stream. We present a new algorithm called LookOut for evaluating such queries efficiently, and empirically demonstrate the scalability of this algorithm. In addition, we also examine the effect of the underlying spatial index structure when evaluating skylines. Whereas previous work on skyline computation has only considered using the R*-tree index structure, we show that for skyline computation an underlying quadtree has significant performance benefits over an R*-tree index.
2018
Traditionally, skyline and ranking queries have been treated separately as alternative ways of discovering interesting data in potentially large datasets. While ranking queries adopt a specific scoring function to rank tuples, skyline queries return the set of non-dominated tuples and are independent of attribute scales and scoring functions. Ranking queries are thus less general, but cheaper to compute and widely used. In this paper, we integrate these two approaches under the unifying framework of restricted skylines by applying the notion of dominance to a set of scoring functions of interest.
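One natural way to formalize "applying the notion of dominance to a set of scoring functions of interest" is sketched below, with the convention that lower scores are better; here F denotes the set of monotone scoring functions of interest and R the input relation.

```latex
% Dominance with respect to a family F of monotone scoring functions, and the
% corresponding restricted skyline of a relation R (lower scores preferred).
\[
  t \prec_{\mathcal{F}} s \;\Longleftrightarrow\;
    \bigl(\forall f \in \mathcal{F}:\, f(t) \le f(s)\bigr) \,\wedge\,
    \bigl(\exists f \in \mathcal{F}:\, f(t) < f(s)\bigr),
  \qquad
  \mathrm{SKY}_{\mathcal{F}}(R) = \{\, t \in R \mid \nexists\, s \in R : s \prec_{\mathcal{F}} t \,\}.
\]
```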
2011
In many applications involving multiple-criteria optimal decision making, users may often want to make a personal trade-off among all optimal solutions. As a key feature, the skyline in a multi-dimensional space provides the minimum set of candidates for such purposes by removing all points not preferred by any (monotonic) utility/scoring functions; that is, the skyline removes all objects not preferred by any user no matter how their preferences vary. Driven by many applications with uncertain data, the probabilistic skyline model is proposed to retrieve uncertain objects based on skyline probabilities. Nevertheless, skyline probabilities cannot capture the preferences of monotonic utility functions. Motivated by this, in this paper we propose a novel skyline operator, namely the stochastic skyline. In the light of the expected utility principle, the stochastic skyline guarantees to provide the minimum set of candidates for the optimal solutions over all possible monotonic multiplicative utility functions. In contrast to the conventional skyline or the probabilistic skyline computation, we show that the problem of stochastic skyline is NP-complete with respect to the dimensionality. Novel and efficient algorithms are developed to compute the stochastic skyline over multidimensional uncertain data, which run in polynomial time if the dimensionality is fixed. We also show, by theoretical analysis and experiments, that the size of the stochastic skyline is quite similar to that of the conventional skyline over certain data. Comprehensive experiments demonstrate that our techniques are efficient and scalable regarding both CPU and IO costs.
Information Systems, 2013
Skyline computation has many applications including multi-criteria decision making. In this paper, we study the problem of efficiently processing continuous skyline queries over sliding windows on uncertain data elements with respect to given probability thresholds. We first characterize what kind of elements we need to keep in our query computation. Then we show the size of the dynamically maintained candidate set and the size of the skyline. We develop novel, efficient techniques to process a continuous, probabilistic skyline query. Finally, we extend our techniques to applications where multiple probability thresholds are given or where we want to retrieve the "top-k" skyline data objects. Our extensive experiments demonstrate that the proposed techniques are very efficient and handle a high-speed data stream in real time.
The idea of a skyline query is to find a set of objects that are most preferred in all dimensions. While this concept is easily applicable to certain and complete databases, when it comes to integrating databases that each represent data differently in the same dimension, it is difficult to determine the dominance relation between the underlying data. In this paper, we propose a framework, SkyQUD, to efficiently compute the skyline probability of datasets with uncertain dimensions. We explore the effects of having datasets with uncertain dimensions in relation to the dominance relation theory and propose a framework that is able to support skyline queries on this type of dataset.
Applied Soft Computing, 2017
In recent years, great attention has been paid to skyline computation over uncertain data. In this paper, we study how to conduct advanced skyline analysis over uncertain databases where uncertainty is modeled using evidence theory (a.k.a. belief function theory). We particularly tackle an important issue, namely the skyline stars (denoted SKY²) over evidential data. This kind of skyline aims at retrieving the best evidential skyline objects (the stars). Efficient algorithms have been developed to compute SKY². Extensive experiments have demonstrated the efficiency and effectiveness of our proposed approaches, which considerably refine the huge skyline. In addition, the conducted experiments have shown that our algorithms significantly outperform the basic skyline algorithms in terms of CPU and memory costs.
Information Sciences, 2012
Uncertain data are inevitable in many applications due to various factors such as the limitations of measuring equipment and delays in data updates. Although modeling and querying uncertain data have recently attracted considerable attention from the database community, there are still many critical issues to be resolved with respect to conducting advanced analysis on uncertain data. In this paper, we study the execution of the probabilistic skyline query over uncertain data streams. We propose a novel sliding-window skyline model in which an uncertain tuple has a probability of being in the skyline at a given timestamp t. Formally, a Wp-Skyline(p, t) contains all the tuples whose probabilities of becoming skylines are at least p at timestamp t. However, in the stream environment, computing a probabilistic skyline over a large number of uncertain tuples within the sliding window is a daunting task in practice. In order to efficiently calculate the Wp-Skyline, we propose an efficient and effective approach, namely the candidate list approach, which maintains lists of candidates that might become skylines in future sliding windows. We also propose algorithms that continuously monitor the newly incoming and expired data to maintain the skyline candidate set incrementally. To further reduce the computation cost of deciding whether or not a candidate tuple belongs to the skyline, we propose an enhanced refinement strategy that is based on a multi-dimensional indexing structure combined with a grouping-and-conquer strategy. To validate the effectiveness of our proposed approach, we conduct extensive experiments on both real and synthetic data sets and make comparisons with basic techniques.
2021 30th Wireless and Optical Communications Conference (WOCC), 2021
Skyline is widely used in practice to solve multi-criteria problems, such as environmental monitoring and business decision-making. When a data item is no worse than another on all criteria and better on at least one criterion, it is said to dominate the other. When a data item is not dominated by any other data item, it is said to be a member of the skyline. However, as the number of criteria increases, the chance that one data item dominates another decreases, resulting in too many members in the skyline set. To address this problem, the concept of the k-dominant skyline was proposed, which reduces the number of skyline members by relaxing the dominance condition. Because the data are uncertain, each data item has a probability of appearing, and thus a probability of becoming a member of the k-dominant skyline. When a new data item arrives, the probability of other data items becoming members of the k-dominant skyline may change. How to quickly update the k-dominant skyline for real-time applications is therefore a serious problem. This paper proposes an effective method, Middle Indexing (MI), which filters out a large amount of irrelevant data in the uncertain data stream by sorting the data appropriately, so as to improve the efficiency of updating the k-dominant skyline. Experiments show that the proposed MI outperforms the existing method by approximately 13% in terms of computation time.
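For reference, the relaxed dominance test the paper builds on can be sketched in a few lines (assuming numeric criteria where smaller values are preferred; the MI index itself is not reproduced here).

```python
def k_dominates(u, v, k):
    """u k-dominates v if u is no worse than v in at least k dimensions and
    strictly better in at least one of those dimensions (smaller is better)."""
    not_worse = [i for i in range(len(u)) if u[i] <= v[i]]
    strictly_better = any(u[i] < v[i] for i in not_worse)
    return len(not_worse) >= k and strictly_better


def k_dominant_skyline(points, k):
    """Points that are not k-dominated by any other point (brute force)."""
    return [p for p in points
            if not any(k_dominates(q, p, k) for q in points if q != p)]


pts = [(1, 2, 3), (2, 3, 4), (3, 1, 5)]
print(k_dominant_skyline(pts, k=2))  # -> [(1, 2, 3)]
```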
2010
The skyline query returns the most interesting tuples according to a set of explicitly defined preferences among attribute values. This work relaxes this requirement, and allows users to pose meaningful skyline queries without stating their choices. To compensate for missing knowledge, we first determine a set of uncertain preferences based on user profiles, i.e., information collected for previous contexts. Then, we define a probabilistic contextual skyline query (p-CSQ) that returns the tuples which are interesting with high probability. We emphasize that, unlike past work, uncertainty lies within the query and not the data, i.e., it is in the relationships among tuples rather than in their attribute values. Furthermore, due to the nature of this uncertainty, popular skyline methods, which rely on a particular tuple visit order, do not apply for p-CSQs. Therefore, we present novel non-indexed and index-based algorithms for answering p-CSQs. Our experimental evaluation concludes that the proposed techniques are significantly more efficient compared to a standard block nested loops approach.
Database Systems for Advanced Applications, 2014
Uncertainty is inherent in many important applications, such as data integration, environmental surveillance, location-based services (LBS), sensor monitoring and radio-frequency identification (RFID). In recent years, we have witnessed significant research efforts devoted to producing probabilistic database management systems, and many important queries are re-investigated in the context of uncertain data models. In this paper, we study the problem of the top-k dominating query on multi-dimensional uncertain objects, which is an essential method in multi-criteria decision analysis when an explicit scoring function is not available. Particularly, we formally introduce the top-k dominating model based on the state-of-the-art top-k semantics over uncertain data. We also propose effective and efficient algorithms to identify the top-k dominating objects. Novel pruning techniques are proposed by utilizing spatial indexing and statistical information, which significantly improve the performance of the algorithms in terms of CPU and I/O costs. Comprehensive experiments on real and synthetic datasets demonstrate the effectiveness and efficiency of our techniques.
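For context, the deterministic query that the probabilistic model generalizes is easy to state: each point is scored by the number of points it dominates, and the k highest-scoring points are returned. The brute-force sketch below illustrates this deterministic counterpart only; the paper's uncertain-object semantics replaces the plain count.

```python
def dominates(u, v):
    """u dominates v (smaller values preferred)."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))


def top_k_dominating(points, k):
    """Deterministic top-k dominating query: rank points by how many other
    points they dominate and return the k best (brute force, O(n^2))."""
    scored = [(sum(dominates(p, q) for q in points), p) for p in points]
    return sorted(scored, key=lambda s: s[0], reverse=True)[:k]


pts = [(1, 4), (2, 2), (3, 3), (2, 5), (4, 1)]
print(top_k_dominating(pts, k=2))  # -> [(2, (2, 2)), (1, (1, 4))]
```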
Web Information Systems Engineering – WISE 2020, 2020
Given a graph and a set of query vertices (a subset of the vertices), the dynamic skyline query problem returns the subset of data vertices (other than the query vertices) that are not dominated by other data vertices under a given distance measure. In this paper, we study the dynamic skyline query problem on uncertain graphs (DySky). The input is an uncertain graph and a subset of its nodes as query vertices, and the goal is to return all the data vertices that are not dominated by others. We employ two distance measures on uncertain graphs, namely Majority Distance and Expected Distance. Our approach is broadly divided into three steps: Pruning, Distance Computation, and Skyline Vertex Set Generation. We implement the proposed methodology on three publicly available datasets and observe that it can find the skyline vertex set quickly even for million-sized graphs when expected distance is used. In particular, the pruning strategy reduces the computational time significantly.
International Journal of Approximate Reasoning, 2017
For determining skyline objects in an uncertain database with uncertain preferences, it is necessary to compute the skyline probability of a given object with respect to the other objects. The problem boils down to computing the probability of a union of events from the joint probabilities of all possible combinations of events. The linear Bonferroni bound is concerned with bounding the probability of a union of events given only partial information. We use this technique to estimate the skyline probability of an object and propose a polynomial-time algorithm for computing a sharp upper bound. We show that using partial information does not affect the quality of the solution but helps improve efficiency. We formulate the problem as a linear programming problem (LPP) and characterize a set of feasible points that is believed to contain all extreme points of the LPP. Maximizing the objective function over this set of points is equivalent to a bipolar quadratic optimization problem, which we solve using a spectral relaxation technique. The proposed algorithm has O(n³) time complexity and is the first polynomial-time algorithm to determine skyline probability. We show that the bounds computed by our proposed algorithm determine almost the same set of skyline objects as the deterministic algorithm. Experimental results are presented to corroborate this claim.
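To make "partial information" concrete: writing A_v for the event that object v dominates u (notation introduced here), the skyline probability is one minus a union probability, and classical linear Bonferroni-type inequalities already bound that union from first- and second-order joint probabilities alone. The sketch below shows Boole's bound and Hunter's degree-two refinement, not the paper's LP-based sharp bound.

```latex
% Skyline probability via the union of dominance events, with two classical
% upper bounds on the union that use only singleton and pairwise probabilities.
\[
  \Pr[u \in \mathrm{SKY}] = 1 - \Pr\Bigl[\,\bigcup_{v} A_v\Bigr],
  \qquad
  \Pr\Bigl[\,\bigcup_{v} A_v\Bigr] \;\le\; \sum_{v} \Pr[A_v]
  \quad\text{(Boole)},
\]
\[
  \Pr\Bigl[\,\bigcup_{v} A_v\Bigr] \;\le\; \sum_{v} \Pr[A_v]
    - \sum_{(v,w) \in T} \Pr[A_v \cap A_w]
  \quad\text{(Hunter, for any spanning tree } T \text{ over the events)}.
\]
```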