2008, 2008 IEEE 24th International Conference on Data Engineering
This paper introduces U-relations, a succinct and purely relational representation system for uncertain databases. U-relations support attribute-level uncertainty using vertical partitioning. If we consider positive relational algebra extended by an operation for computing possible answers, a query on the logical level can be translated into, and evaluated as, a single relational algebra query on the U-relation representation. The translation scheme essentially preserves the size of the query in terms of number of operations and, in particular, number of joins. Standard techniques employed in off-the-shelf relational database management systems are effective for optimizing and processing queries on U-relations. In our experiments we show that query evaluation on U-relations scales to large amounts of data with high degrees of uncertainty.
Journal of Computer Science and Cybernetics
In this paper, we propose a new probabilistic relational database model, denoted by PRDB, as an extension of the classical relational database model in which the uncertainty of relational attribute values and of tuples is represented by finite sets and probability intervals, respectively. A probabilistic interpretation of binary relations on finite sets is proposed for the computation of their probability measures. Combination strategies on probability intervals are employed to combine attribute values and compute uncertain membership degrees of tuples in a relation. The fundamental concepts of the classical relational database model are extended and generalized for PRDB. Then, the probabilistic relational algebraic operations are formally defined accordingly in PRDB. In addition, a set of properties of the algebraic operations in this new model is formulated and proven.
International Journal of Innovative Research in Science, Engineering and Technology, 2012
Many real-world applications need a database that stores probabilistic and uncertain data. Trio is a robust prototype built to store and retrieve uncertain data and lineage. It also supports some features of a relational DBMS. ULDB is an extension of relational databases with expressive constructs for representing and manipulating both lineage and uncertainty. The ULDB representation is complete and permits straightforward implementation of many relational operations. Currently Trio performs only select-project-join queries and some set operations. Queries are expressed in the TriQL query language. This paper shows how multiple aggregations can be handled in the select clause in the Trio system for uncertain and probabilistic data. It also shows how the distinct clause can be used along with aggregation functions. The results of implementing the minus and intersect all clauses in the Trio system are discussed. These operations allow users to use Trio system in a more flexib...
Distributed and Parallel Databases, 1993
In heterogeneous database systems, partial values have been used to resolve some schema integration problems. Performing operations on partial values may produce maybe tuples in the query result which cannot be compared. Thus, users have no way to distinguish which maybe tuple is the most likely answer. In this paper, the concept of partial values is generalized to probabilistic partial values. We propose an approach to resolve the schema integration problems using probabilistic partial values and develop a full set of extended relational operators for manipulating relations containing probabilistic partial values. With this approach, the uncertain answer tuples of a query are associated with degrees of uncertainty (represented by probabilities). This lets users compare maybe tuples and better understand the query results. In addition, extended selection and join are generalized to α-selection and α-join, respectively, which can be used to filter out maybe tuples with low probabilities, that is, those with probabilities smaller than α.
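The α-selection idea described in this abstract can be illustrated with a minimal sketch. This is not the paper's operator definition: it assumes, for illustration only, that a probabilistic partial value is a Python dict mapping each candidate value to its probability, and the names `alpha_select`, `rows`, and `city` are hypothetical.

```python
def alpha_select(rows, attr, predicate, alpha):
    """Keep rows whose probability of satisfying `predicate` on `attr` is >= alpha.

    Assumption (not from the paper): row[attr] is a dict {value: probability}
    whose probabilities sum to 1.
    """
    result = []
    for row in rows:
        dist = row[attr]
        # Probability that the uncertain attribute satisfies the predicate.
        p = sum(prob for value, prob in dist.items() if predicate(value))
        # Drop rows that cannot match at all, and maybe tuples below alpha.
        if p > 0 and p >= alpha:
            result.append((row, p))
    return result


# Hypothetical example data: the city of each record is uncertain.
rows = [
    {"id": 1, "city": {"Paris": 0.7, "Lyon": 0.3}},
    {"id": 2, "city": {"Paris": 0.2, "Nice": 0.8}},
    {"id": 3, "city": {"Nice": 1.0}},
]

selected = alpha_select(rows, "city", lambda v: v == "Paris", alpha=0.5)
for row, p in selected:
    print(row["id"], p)
```

Lowering `alpha` admits more maybe tuples into the answer; each returned tuple carries its probability, so answers can be ranked.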
Proceedings of the 15th ACM international conference on Information and knowledge management - CIKM '06, 2006
In many applications data values are inherently uncertain. This includes moving-objects, sensors and biological databases. There has been recent interest in the development of database management systems that can handle uncertain data. Some proposals for such systems include attribute values that are uncertain. In particular, an attribute value can be modeled as a range of possible values, associated with a probability density function. Previous efforts for this type of data have only addressed simple queries such as range and nearest-neighbor queries. Queries that join multiple relations have not been addressed in earlier work despite the significance of joins in databases. In this paper we address join queries over uncertain data. We propose a semantics for the join operation, define probabilistic operators over uncertain data, and propose join algorithms that provide efficient execution of probabilistic joins. The paper focuses on an important class of joins termed probabilistic threshold joins that avoid some of the semantic complexities of dealing with uncertain data. For this class of joins we develop three sets of optimization techniques: item-level, page-level, and index-level pruning. These techniques facilitate pruning with little space and time overhead, and are easily adapted to most join algorithms. We verify the performance of these techniques experimentally.
2018 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), 2018
In many real applications, data are intrinsically uncertain due to measurement errors, interpretability issues, information incompleteness, etc. In those uncertain databases, users usually express quality requirements when the system evaluates their queries. However, as they may not be familiar with the contents of the queried database, their queries may be failing i.e., they may return no results or results that do not satisfy the expected degree of certainty. To provide users with relevant information in order to obtain alternative satisfactory results, we introduce a cooperative approach based on the dualization concept. This approach computes a set of meaningful subqueries (MFSs and XSSs) of the initial failing query, which is of paramount importance for query reformulation and relaxation purposes. The conducted experiments show that our proposition, a Mixed Dualization Matrix-Based approach (MDMB), outperforms existing algorithms, especially for large queries.
Proceedings of the 2021 International Conference on Management of Data, 2021
Incomplete and probabilistic database techniques are principled methods for coping with uncertainty in data. Unfortunately, the class of queries that can be answered efficiently over such databases is severely limited, even when advanced approximation techniques are employed. We introduce attribute-annotated uncertain databases (AU-DBs), an uncertain data model that annotates tuples and attribute values with bounds to compactly approximate an incomplete database. AU-DBs are closed under relational algebra with aggregation using an efficient evaluation semantics. Using optimizations that trade accuracy for performance, our approach scales to complex queries and large datasets, and produces accurate results.
2008
The inherent uncertainty of data present in numerous applications such as sensor databases, text annotations, and information retrieval motivates the need to handle imprecise data at the database level. Uncertainty can be at the attribute or tuple level and is present in both continuous and discrete data domains. This paper presents a model for handling arbitrary probabilistic uncertain data (both discrete and continuous) natively at the database level. Our approach leads to a natural and efficient representation for probabilistic data. We develop a model that is consistent with possible worlds semantics and closed under basic relational operators. This is the first model that accurately and efficiently handles both continuous and discrete uncertainty. The model is implemented in a real database system (PostgreSQL) and the effectiveness and efficiency of our approach are validated experimentally.
2007
Top-k processing in uncertain databases is semantically and computationally different from traditional top-k processing. The interplay between score and uncertainty makes traditional techniques inapplicable. We introduce new probabilistic formulations for top-k queries. Our formulations are based on "marriage" of traditional top-k semantics and possible worlds semantics. In the light of these formulations, we construct a framework that encapsulates a state space model and efficient query processing techniques to tackle the challenges of uncertain data settings. We prove that our techniques are optimal in terms of the number of accessed tuples and materialized search states. Our experiments show the efficiency of our techniques under different data distributions with orders of magnitude improvement over naïve materialization of possible worlds.
2009
Abstract The ability to flexibly compose confidence computation with the operations of relational algebra is an important feature of probabilistic database query languages. Computing confidences is computationally hard, however, and has to be approximated in practice.
1994
As models of the real world, databases are often permeated with forms of uncertainty, including imprecision, incompleteness, vagueness, inconsistency, and ambiguity. This chapter addresses issues of database uncertainty. It defines basic terminology, and it classifies the various kinds of uncertainty. It then surveys solutions that have been attempted, and it speculates on the reasons that have hindered the development of general-purpose database systems with powerful uncertainty capabilities. Finally, it describes challenging new applications that will require such capabilities, and it points to promising directions for research.
We propose an extension of possibilistic databases that also includes provenance. The introduction of provenance makes our model closed under selection with equalities, projection, and join. In addition, query computation with possibilities is polynomial, in contrast with current models that combine provenance with probabilities and have #P complexity.
Computer and Information Science, 2015
In recent years, uncertainty management has become an important concern as the presence of uncertain data has increased rapidly. Several advanced technologies have been developed to record large quantities of data continuously, and the resulting data may contain errors or be only partially complete. Instead of dealing with data uncertainty by removing it, we must treat it as a source of information. To deal with this data, a database management system should have special features to handle uncertain data. The aim of this paper is twofold: on the one hand, to introduce some main concepts of uncertainty in databases by focusing on different data management issues in uncertain databases, such as join and query processing, database integration, indexing uncertain data, security and information leakage, and representation formalisms; on the other hand, to provide a survey of the current database management systems dealing with uncertain data, presenting their features and comparing them.
2007
Top-k processing in uncertain databases is semantically and computationally different from traditional top-k processing. The interplay between query scores and data uncertainty makes traditional techniques inapplicable. We introduce URank, a system that processes new probabilistic formulations of top-k queries in uncertain databases. The new formulations are based on marriage of traditional top-k semantics with possible worlds semantics. URank encapsulates a new processing framework that leverages existing query processing capabilities, and implements efficient search strategies that integrate ranking on scores with ranking on probabilities, to obtain meaningful answers for top-k queries.
bvicam.ac.in
Databases today are deterministic, that is, an item is either in the database or not. Similarly, a tuple is either in the query result or not. This process of mapping the real world inherently includes ambiguities and uncertainties and is seldom perfect. In today's data-driven competitive world, a wide range of applications have emerged that need to handle very large, imprecise data sets with inherent uncertainties. Uncertain data is natural in many important real-world applications like environmental surveillance, market analysis and quantitative economic research. Data uncertainty innate in these applications is generally the result of factors like data randomness and incompleteness, misaligned schemas, limitations of measuring equipment, delayed data updates, imprecise queries, etc. Due to the importance of these applications and the rapidly increasing amount of uncertain data collected and accumulated, analyzing large collections of uncertain data has become an important task and has attracted more and more interest from the database community. Probabilistic databases hold the promise of being a viable means for large-scale uncertainty management, which is increasingly required in a large number of real-world application domains. A probabilistic database is an uncertain database in which the possible worlds have associated probabilities, that is, whether an item belongs to the database is a probabilistic event, with either tuple-existence uncertainty or attribute-value uncertainty. Likewise, a tuple as an answer to a query is again a probabilistic event. An important aspect of research and development on uncertain data processing is query answering techniques for uncertain and probabilistic data. Query processing in probabilistic databases remains a computational challenge as it is fundamentally more complex than in other data models.
There exists a rich collection of powerful, non-trivial techniques and results, some old, some very recent, that could lead to practical management techniques for probabilistic databases. However, all such techniques suffer from limitations of uncertainty inherent in result of the query. Hence, there is a need for a general probabilistic model that tackles this uncertainty at the grass root level. The basic tool for dealing with this uncertainty is probability which is defined for an event as the proportion of times that the event would occur in repetitions of essentially identical situations. Although useful and successful in many applications, probability theory is, in fact, appropriate for dealing with only a very special type of uncertainty for measuring information. Probabilistic databases are all the more susceptible to uncertainties in query results being exclusively dependent on the probabilities assigned with inherent uncertainty in the evaluation of probabilities. Thus it becomes a potential area where this fundamental problem can be addressed and a suitable correction can be made to probabilities evaluated thereof.
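The possible-worlds reading of a probabilistic database mentioned in this abstract can be made concrete with a small sketch. It assumes tuple-existence uncertainty with independent tuples, which is a simplifying assumption and not something the abstract states; the function names and sensor data are hypothetical. It enumerates all worlds, so it is exponential in the number of tuples and is meant only as an illustration.

```python
from itertools import product


def possible_worlds(tuples):
    """Yield (world, probability) pairs.

    `tuples` is a list of (value, probability) pairs; each tuple is assumed
    to exist independently of the others (an illustrative assumption).
    """
    for mask in product([False, True], repeat=len(tuples)):
        prob = 1.0
        world = []
        for (value, p), present in zip(tuples, mask):
            prob *= p if present else (1.0 - p)
            if present:
                world.append(value)
        yield world, prob


def answer_probability(tuples, query, answer):
    """Probability that `answer` appears in the query result.

    Sums the probabilities of all worlds whose query result contains `answer`.
    """
    return sum(prob for world, prob in possible_worlds(tuples)
               if answer in query(world))


# Hypothetical sensor readings: (sensor id, temperature), each with an
# existence probability.
readings = [(("s1", 20), 0.5), (("s2", 30), 0.5)]


def hot_sensors(world):
    # Deterministic query evaluated inside one possible world.
    return {sensor for sensor, temp in world if temp > 25}


print(answer_probability(readings, hot_sensors, "s2"))
```

The query itself is an ordinary deterministic function; the uncertainty enters only through the weighting of the worlds in which it is evaluated.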
Proceedings of the 2019 International Conference on Management of Data, 2019
Certain answers are a principled method for coping with uncertainty that arises in many practical data management tasks. Unfortunately, this method is expensive and may exclude useful (if uncertain) answers. Thus, users frequently resort to less principled approaches to resolve uncertainty. In this paper, we propose Uncertainty Annotated Databases (UA-DBs), which combine an under- and over-approximation of certain answers to achieve the reliability of certain answers, with the performance of a classical database system. Furthermore, in contrast to prior work on certain answers, UA-DBs achieve a higher utility by including some (explicitly marked) answers that are not certain. UA-DBs are based on incomplete K-relations, which we introduce to generalize the classical set-based notion of incomplete databases and certain answers to a much larger class of data models. Using an implementation of our approach, we demonstrate experimentally that it efficiently produces tight approximations of certain answers that are of high utility.