2011, The VLDB Journal
Uncertain data streams, where data are incomplete and imprecise, have been observed in many environments. Feeding such data streams to existing stream systems produces results of unknown quality, which is of paramount concern to monitoring applications. In this paper, we present the Claro system that supports stream processing for uncertain data naturally captured using continuous random variables. Claro employs a unique data model that is flexible and allows efficient computation. Built on this model, we develop evaluation techniques for relational operators by exploring statistical theory and approximation. We also consider query planning for complex queries given an accuracy requirement. Evaluation results show that our techniques can achieve high performance while satisfying accuracy requirements, and outperform state-of-the-art sampling methods.
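As a rough illustration of what an aggregate over continuous random variables can look like, the sketch below sums tuples that are assumed to be independent and Gaussian. This is a simplification chosen for the example, not the data model or algorithm Claro uses; the function names and sample values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def sum_of_gaussians(tuples):
    """Distribution of a SUM over independent Gaussian tuples.

    Each tuple is (mean, variance). Under the independence assumption,
    means add and variances add, so the result is again Gaussian.
    """
    total_mean = sum(mean for mean, _ in tuples)
    total_var = sum(var for _, var in tuples)
    return NormalDist(mu=total_mean, sigma=sqrt(total_var))

def prob_sum_exceeds(tuples, threshold):
    """Probability that the uncertain SUM is above a threshold."""
    return 1.0 - sum_of_gaussians(tuples).cdf(threshold)

# Hypothetical window of three uncertain readings: (mean, variance).
window = [(10.0, 4.0), (20.0, 9.0), (5.0, 3.0)]
result = sum_of_gaussians(window)
p_high = prob_sum_exceeds(window, 39.0)
```

Returning a full distribution rather than a single number is what lets a monitoring query attach a confidence to its answer.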
Computing Research Repository, 2009
We present the design and development of a data stream system that captures data uncertainty from data collection to query processing to final result generation. Our system focuses on data that is naturally modeled as continuous random variables such as many types of sensor data. To provide an end-to-end solution, our system employs probabilistic modeling and inference to generate
Proceedings of the VLDB Endowment, 2010
Uncertain data streams are increasingly common in real-world deployments and monitoring applications require the evaluation of complex queries on such streams. In this paper, we consider complex queries involving conditioning (e.g., selections and group by's) and aggregation operations on uncertain data streams. To characterize the uncertainty of answers to these queries, one generally has to compute the full probability distribution of each operation used in the query. Computing distributions of aggregates given conditioned tuple distributions is a hard, unsolved problem. Our work employs a new evaluation framework that includes a general data model, approximation metrics, and approximate representations. Within this framework we design fast data-stream algorithms, both deterministic and randomized, for returning approximate distributions with bounded errors as answers to those complex queries. Our experimental results demonstrate the accuracy and efficiency of our approximation techniques and offer insights into the strengths and limitations of deterministic and randomized algorithms.
2010
Query processing on uncertain data streams has attracted a lot of attention lately, due to the imprecise nature of the data generated by a variety of streaming applications, such as readings from a sensor network. However, all existing work on uncertain data streams studies unbounded streams. This paper takes the first step towards the important and challenging problem of answering sliding-window queries on uncertain data streams, with a focus on arguably one of the most important types of queries: top-k queries. The challenge of answering sliding-window top-k queries on uncertain data streams stems from the strict space and time requirements of processing both arriving and expiring tuples in high-speed streams, combined with the difficulty of coping with the exponential blowup in the number of possible worlds induced by the uncertain data model. In this paper, we design a unified framework for processing sliding-window top-k queries on uncertain streams. We show that all the existing top-k definitions in the literature can be plugged into our framework, resulting in several succinct synopses that use space much smaller than the window size while also being highly efficient in terms of processing time. In addition to the theoretical space and time bounds that we prove for these synopses, we also present a thorough experimental report to verify their practical efficiency on both synthetic and real data.
28th IEEE International Real-Time Systems Symposium (RTSS 2007), 2007
Data uncertainty is a common problem for the real-time monitoring of data streams. In this paper, we address the issue of efficiently monitoring the satisfaction/violation of user-defined constraints over data streams where the data uncertainty can be probabilistically characterized. We propose a monitoring architecture SPMON that can incorporate probabilistic models of uncertainty in constraint monitoring. We adapt the concept of data similarity in real-time databases to the processing of uncertain data streams. In doing so, we generalize the data similarity by a new concept psr (probabilistic similarity region) that allows us to define similarity relations for probabilistic data with respect to the set of constraints being monitored. This enables the construction of lightweight filters for saving bandwidth. We also show how to efficiently update the filter conditions at run-time.
2004
Sensor data streams exhibit special characteristics such as inherent information uncertainty and inherent data sample correlations, both within and across streams. We introduce a new data model, called Probabilistic Stream Relational Algebra (PSRA), that models a sensor data stream as a set of probabilistic data samples, along with prediction strategies for each attribute, capturing domain knowledge of inherent data correlations. We also explicitly associate every operation with a schedule, specifying when the next data sample should be produced, to facilitate resource management in sensor networks. We prove that operators in PSRA are non-blocking, thus making PSRA especially suitable for data stream processing. We also show that the conventional relational model and the existing deterministic data stream processing model can be modeled in PSRA.
2002
Recent years have witnessed an increasing interest in designing algorithms for querying and analyzing streaming data (i.e., data that is seen only once in a fixed order) with only limited memory. Providing (perhaps approximate) answers to queries over such continuous data streams is a crucial requirement for many application environments; examples include large telecom and IP network installations where performance data from different parts of the network needs to be continuously collected and analyzed.
2009
Networks of sensors are used in many different fields, from industrial applications to surveillance applications. A common feature of these applications is the necessity of a monitoring infrastructure that analyzes a large number of data streams and outputs values that satisfy certain constraints.
Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004.
We present a novel approach to approximate evaluation of standing aggregate queries over streaming data, subject to user-specified error bounds. Our method models the behavior of aggregates as Brownian motions, and adaptively updates the model according to stream characteristics. This approach has two advantages. First, it greatly improves system scalability since we can defer query evaluation as long as the difference between the returned and true aggregate values remains within user-specified bounds. Second, we are able to provide approximate answers during stream interruptions by estimating the rate at which the streams and the aggregate drift during the blackout periods. We also study processor allocation issues in such approximate aggregate evaluation systems. Our experiments show that our model captures the behavior of real-world streams such as sensor data and stock traces with excellent fidelity, and scales very well for large numbers of standing queries.
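To make the deferral idea concrete, the following sketch computes how long re-evaluation could be postponed if the gap between the returned and true aggregate is modeled as a driftless Brownian motion. The driftless assumption, the parameter names (`sigma`, `epsilon`, `delta`), and the check at a single time point (rather than over the whole interval) are simplifications made for this example and are not taken from the paper.

```python
from math import sqrt
from statistics import NormalDist

def exceed_probability(sigma, epsilon, t):
    """P(|X_t| > epsilon) when X_t ~ Normal(0, sigma^2 * t)."""
    z = epsilon / (sigma * sqrt(t))
    return 2.0 * (1.0 - NormalDist().cdf(z))

def max_deferral_time(sigma, epsilon, delta):
    """Largest t such that P(|X_t| > epsilon) <= delta.

    sigma   -- assumed volatility of the aggregate per unit time
    epsilon -- user-specified error bound on the aggregate
    delta   -- tolerated probability of violating the bound at time t
    """
    z = NormalDist().inv_cdf(1.0 - delta / 2.0)
    return (epsilon / (sigma * z)) ** 2

# Hypothetical settings: volatility 0.5, error bound 2.0, 5% tolerance.
t_max = max_deferral_time(sigma=0.5, epsilon=2.0, delta=0.05)
p_at_t_max = exceed_probability(sigma=0.5, epsilon=2.0, t=t_max)
```

A tighter error bound or a more volatile stream shortens the deferral window quadratically, which is why adapting the model to stream characteristics matters.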
Proceedings of the 2003 ACM SIGMOD international conference on Management of data - SIGMOD '03, 2003
Many applications employ sensors for monitoring entities such as temperature and wind speed. A centralized database tracks these entities to enable query processing. Due to continuous changes in these values and limited resources (e.g., network bandwidth and battery power), it is often infeasible to store the exact values at all times. A similar situation exists for moving object environments that track the constantly changing locations of objects. In this environment, it is possible for database queries to produce incorrect or invalid results based upon old data. However, if the degree of error (or uncertainty) between the actual value and the database value is controlled, one can place more confidence in the answers to queries. More generally, query answers can be augmented with probabilistic estimates of the validity of the answers. In this paper we study probabilistic query evaluation based upon uncertain data. A classification of queries is made based upon the nature of the result set. For each class, we develop algorithms for computing probabilistic answers. We address the important issue of measuring the quality of the answers to these queries, and provide algorithms for efficiently pulling data from relevant sensors or moving objects in order to improve the quality of the executing queries. Extensive experiments are performed to examine the effectiveness of several data update policies.
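A minimal sketch of augmenting a range-query answer with a probability is shown below. It assumes each stored value is only known to lie in an uncertainty interval and is uniformly distributed within it; the uniform assumption, the function name, and the sensor readings are illustrative and not drawn from the paper.

```python
def prob_in_range(lo, hi, a, b):
    """Probability that a value uniform on [lo, hi] lies in the query range [a, b]."""
    if hi <= lo:
        raise ValueError("uncertainty interval must have positive width")
    # Length of the overlap between the uncertainty interval and the query range.
    overlap = max(0.0, min(hi, b) - max(lo, a))
    return overlap / (hi - lo)

# Hypothetical sensors with uncertainty intervals (lower bound, upper bound).
readings = {"s1": (18.0, 22.0), "s2": (24.0, 26.0), "s3": (10.0, 14.0)}

# Range query: which sensors report a value in [20, 25], and how likely is it?
answers = {sid: prob_in_range(lo, hi, 20.0, 25.0)
           for sid, (lo, hi) in readings.items()}
```

Sensors whose probability is neither 0 nor 1 are the ones where pulling a fresh value would most improve the quality of the answer.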
2008
our probabilistic data stream model where each data value is given by a probability distribution. In particular, for uniform and Gaussian distributions, we show how we derive a set of constraints on distribution parameters as a metric of similarity distances, exploiting the semantics of probabilistic queries being monitored. The derived constraints enable us to formulate the probabilistic similarity region that suppresses unnecessary data transmission in a monitoring system.