2012, Lecture Notes in Computer Science
Data completeness is an essential aspect of data quality as in many scenarios it is crucial to guarantee the completeness of query answers. Data might be incomplete in two ways: records may be missing as a whole, or attribute values of a record may be absent, indicated by a null. We extend previous work by two of the authors [10] that dealt only with the first aspect, to cover both missing records and missing attribute values. To this end, we refine the formalization of incomplete databases and identify the important special case where values of key attributes are always known. We show that in the presence of nulls, completeness of queries can be defined in several ways. We also generalize a previous approach stating completeness of parts of a database, using so-called table completeness statements. With this formalization in place, we define the main inferences for completeness reasoning over incomplete databases and present first results.
Proceedings of the 21st ACM international conference on Information and knowledge management - CIKM '12, 2012
Data completeness is an important aspect of data quality. We consider a setting where databases can be incomplete in two ways: records may be missing and records may contain null values. We (i) formalize when the answer set of a query is complete in spite of such incompleteness, and (ii) introduce table completeness statements, by which one can express that certain parts of a database are complete. We then study how to deduce from a set of table completeness statements that a query can be answered completely. Null values as used in SQL are ambiguous. They can indicate either that no attribute value exists or that a value exists but is unknown. We study completeness reasoning for the different interpretations. We show that in the combined case it is necessary to syntactically distinguish between different kinds of null values and present an encoding for doing so in standard SQL databases. With this technique, any SQL DBMS evaluates complete queries correctly with respect to the different meanings that nulls can carry. We study the complexity of completeness reasoning and provide algorithms that in most cases match the worst-case lower bounds.
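The ambiguity the abstract describes can be made concrete in a small sketch. The following Python fragment is our own illustration, not the paper's encoding: the markers `NONEXISTENT` and `Unknown` are invented names for the two readings a SQL null can carry.

```python
# Minimal sketch (invented names, not the paper's encoding): keeping the two
# meanings of a SQL null syntactically apart.

NONEXISTENT = object()          # "no value exists" (e.g. person has no phone)

class Unknown:
    """A value exists but is not known; labelled so repeats can be tracked."""
    def __init__(self, label):
        self.label = label

def certainly_equal(a, b):
    """Three-valued equality: True, False, or None for 'unknown'."""
    if a is NONEXISTENT or b is NONEXISTENT:
        return False            # a nonexistent value equals nothing
    if isinstance(a, Unknown) or isinstance(b, Unknown):
        if (isinstance(a, Unknown) and isinstance(b, Unknown)
                and a.label == b.label):
            return True         # same labelled null: same (unknown) value
        return None             # cannot decide either way
    return a == b

print(certainly_equal(3, 3))                          # True
print(certainly_equal(NONEXISTENT, 3))                # False
print(certainly_equal(Unknown("n1"), Unknown("n1")))  # True
print(certainly_equal(Unknown("n1"), 3))              # None
```

Once the two kinds are distinguishable, an equality test can safely return "false" for a nonexistent value, whereas an unknown-but-existing value forces a genuine "unknown".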
2010
The theoretical study of the relational model of data is ongoing and highly developed. Yet the vast majority of real databases include incomplete data, and the incomplete data is widely modelled using special flags called nulls. As noted many times by Date and others, the inclusion of nulls is not compatible with the relational model and invalidates many of the theoretical results as well as requiring a three-valued logic for query support. In category theoretic applications to computer science, partial functions are frequently modelled by using a special value approach (the partial map classifier), or by explicit reference to the domain of definition subobject. In a former edition of the CATS conference the first author and his colleague Rosebrugh proved a Morita equivalence theorem showing that for database modelling the two approaches are equivalent, provided the domain of definition subobject is complemented. In this paper we study the uncomplemented domain of definition approac...
Proceedings of the VLDB Endowment, 2011
I am also thankful to Zeno Moriggl and Martin Prosch from the school IT department of the province of South Tyrol who initiated the research collaboration that led to my thesis and who invested their time to give me an understanding of their practical problems. Many thanks go to Franz Baader, who immediately agreed to take supervision from Dresden and made it possible for me to write my thesis in Bozen. Thanks to all people from the KRDB group in Bozen who welcomed me in their group and provided a friendly and productive atmosphere for working. Without my teacher and organizer of the Erasmus program, Uwe Petersohn, I might have never come to Bozen, thank you. Finally, thank you to my family for everything.
The assumption that a database includes a representation of every occurrence in the real-world environment that it models (the Closed World Assumption) is frequently unrealistic, because it is always made on the database as a whole. This paper introduces a new type of database information, called completeness information, to describe the subsets of the database for which this assumption is correct. With completeness information it is possible to determine whether each answer to a user query is complete, or whether any subsets of it are complete. To users, answers which are accompanied by a statement about their completeness are more meaningful. First, the principles of completeness information are defined formally, using an abstract data model. Then, specific methods are described for implementing completeness information in the relational model. With these methods, each relational algebra query can be accompanied with an instantaneous verdict on its completeness (or on the completeness of some of its subsets).
Information Processing & Management, 1988
This article discusses a query processor to deal with incomplete information in a database. We suggest using the relaxed database, which is an abstraction from the original database, as a basis for a front-end query processor. The purposes of the relaxed database are twofold: first, to restrict the number of the objects to be processed in a query, and second, to aid the interpretation of a query.
2007
Incomplete information arises naturally in numerous data management applications. Recently, several researchers have studied query processing in the context of incomplete information. Most work has combined the syntax of a traditional query language like relational algebra with a nonstandard semantics such as certain or ranked possible answers. There are now also languages with special features to deal with uncertainty.
Proceedings of the twenty-ninth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems of data - PODS '10, 2010
Databases in real life are often neither entirely closed-world nor entirely open-world. Indeed, databases in an enterprise are typically partially closed, in which a part of the data is constrained by master data that contains complete information about the enterprise in certain aspects. It has been shown that despite missing tuples, such a database may turn out to have complete information for answering a query. This paper studies partially closed databases from which both tuples and values may be missing. We specify such a database in terms of conditional tables constrained by master data, referred to as c-instances. We first propose three models to characterize whether a c-instance T is complete for a query Q relative to master data. That is, depending on how missing values in T are instantiated, the answer to Q in T remains unchanged when new tuples are added. We then investigate four problems, to determine (a) whether a given c-instance is complete for a query Q, (b) whether there exists a c-instance that is complete for Q relative to master data available, (c) whether a c-instance is a minimal-size database that is complete for Q, and (d) whether there exists a c-instance of a bounded size that is complete for Q. We establish matching lower and upper bounds on these problems for queries expressed in a variety of languages, in each of the three models for specifying relative completeness.
arXiv (Cornell University), 2023
To answer database queries over incomplete data the gold standard is finding certain answers: those that are true regardless of how incomplete data is interpreted. Such answers can be found efficiently for conjunctive queries and their unions, even in the presence of constraints. With negation added, however, the problem becomes intractable. We concentrate on the complexity of certain answers under constraints, and on efficiently answering queries outside the usual classes of (unions of) conjunctive queries by means of rewriting as Datalog and first-order queries. We first notice that there are three different ways in which query answering can be cast as a decision problem. We complete the existing picture and provide precise complexity bounds on all versions of the decision problem, for certain and best answers. We then study a well-behaved class of queries that extends unions of conjunctive queries with a mild form of negation. We show that for them, certain answers can be expressed in Datalog with negation, even in the presence of functional dependencies, thus making them tractable in data complexity. We show that in general Datalog cannot be replaced by first-order logic, but without constraints such a rewriting can be done in first-order logic. The paper is under consideration in Theory and Practice of Logic Programming (TPLP).
Fundamenta Informaticae
Codd's relational model describes just one possible world. To better cope with incomplete information, extended database models allow several possible worlds. Vague tables are one such convenient extended model where attributes accept sets of possible values (e.g., the manager is either Jill or Bob). However, conceptual database design in such cases remains an open problem. In particular, there is no canonical definition of functional dependencies (FDs) over possible worlds (e.g., each employee has just one manager). We identify several desirable properties that the semantics of such FDs should meet including Armstrong's axioms, the independence from irrelevant attributes, seamless satisfaction and implied by strong satisfaction. We show that we can define FDs such that they have all our desirable properties over vague tables. However, we also show that no notion of FD can satisfy all our desirable properties over a more general model (disjunctive tables). Our work formalizes a trade-off between having a general model and having well-behaved FDs.
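The possible-worlds reading of a functional dependency over a vague table can be pictured with a brute-force sketch. The encoding below (cells as Python sets, "weak satisfaction" as satisfaction in at least one possible world) is our own simplification for illustration; the paper compares several such notions against its desirable properties.

```python
# Toy sketch (our own encoding): an FD A -> B over a vague table, where each
# cell holds a set of possible values, checked by enumerating possible worlds.
from itertools import product

def possible_worlds(table):
    """Every way of picking one value per cell yields one possible world."""
    row_choices = [list(product(*row)) for row in table]
    return product(*row_choices)

def fd_holds(world, a, b):
    """Classical FD check: equal values in column a force equal values in b."""
    seen = {}
    for row in world:
        if row[a] in seen and seen[row[a]] != row[b]:
            return False
        seen[row[a]] = row[b]
    return True

def weakly_satisfies(table, a, b):
    """Weak satisfaction: the FD holds in at least one possible world."""
    return any(fd_holds(w, a, b) for w in possible_worlds(table))

# employee -> manager; the manager of 'eve' is either 'jill' or 'bob'
table = [({"eve"}, {"jill", "bob"}), ({"eve"}, {"jill"})]
print(weakly_satisfies(table, 0, 1))   # True: choose 'jill' in both rows
```

The enumeration is exponential in the number of vague cells, which is exactly why a well-behaved syntactic definition of FDs over such tables, rather than world enumeration, is worth seeking.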
1998
In databases, queries are usually defined on complete databases. In this paper we introduce and motivate the notion of extended queries, which are defined on incomplete databases. We argue that the language of extended logic programs is appropriate for representing extended queries. We show through examples that, given a query, a particular extension of it has important characteristics which correspond to the removal of the CWA from the original specification of the query.
Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, 2015
In many applications including loosely coupled cloud databases, collaborative editing and network monitoring, data from multiple sources is regularly used for query answering. For reasons such as system failures, insufficient author knowledge or network issues, data may be temporarily unavailable or generally nonexistent. Hence, not all data needed for query answering may be available. In this paper, we propose a natural class of completeness patterns, expressed by selections on database tables, to specify complete parts of database tables. We then show how to adapt the operators of relational algebra so that they manipulate these completeness patterns to compute completeness patterns pertaining to query answers. Our proposed algebra is computationally sound and complete with respect to the information that the patterns provide. We show that stronger completeness patterns can be obtained by considering not only the schema but also the database instance and we extend the algebra to take into account this additional information. We develop novel techniques to efficiently implement the computation of completeness patterns on query answers and demonstrate their scalability on real data.
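One way to picture how a relational operator can manipulate completeness metadata is the following sketch. It is a deliberate simplification of the idea, not the paper's algebra: a pattern is rendered here as a dict of attribute-value equalities meaning "every real-world tuple matching these equalities is present in the table", and only selection is shown.

```python
# Simplified sketch (not the paper's operators): propagating completeness
# patterns through the selection sigma_{attr = value}.

def select(patterns, attr, value):
    """Patterns under which the answer of the selection is complete."""
    out = []
    for p in patterns:
        if attr not in p:            # pattern is more general: tighten it
            q = dict(p)
            q[attr] = value
            out.append(q)
        elif p[attr] == value:       # pattern already agrees with the condition
            out.append(dict(p))
        # else: pattern is disjoint from the selection, contributes nothing
    return out

# orders are complete for shop='rome' and for shop='oslo' in year 2015
patterns = [{"shop": "rome"}, {"shop": "oslo", "year": 2015}]
print(select(patterns, "year", 2015))
# -> [{'shop': 'rome', 'year': 2015}, {'shop': 'oslo', 'year': 2015}]
```

Other operators (projection, join) would combine patterns analogously; the point of the abstract is that such propagation can be made sound and complete with respect to the information the patterns carry.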
Information Processing Letters, 1986
When considering using databases to represent incomplete information, the relationship between two facts where one may imply the other needs to be addressed. In relational databases, this question becomes whether null completion is assumed. That is, does a (possibly partially defined) tuple imply the existence of tuples that are 'less informative' than the original tuple. We show that no relational algebra that assumes equivalence under null completion can include set-theoretic operators that are compatible with ordinary set theory. Thus, the approach of x-relations is incompatible with the axioms of a Boolean algebra.
ACM Transactions on Database Systems, 2014
The term naïve evaluation refers to evaluating queries over incomplete databases as if nulls were usual data values, i.e., to using the standard database query evaluation engine. Since the semantics of query answering over incomplete databases is that of certain answers, we would like to know when naïve evaluation computes them: i.e., when certain answers can be found without inventing new specialized algorithms. For relational databases it is well known that unions of conjunctive queries possess this desirable property, and results on preservation of formulae under homomorphisms tell us that within relational calculus, this class cannot be extended under the open-world assumption. Our goal here is twofold. First, we develop a general framework that allows us to determine, for a given semantics of incompleteness, classes of queries for which naïve evaluation computes certain answers. Second, we apply this approach to a variety of semantics, showing that for many classes of queries beyond unions of conjunctive queries, naïve evaluation makes perfect sense under assumptions different from open-world. Our key observations are: (1) naïve evaluation is equivalent to monotonicity of queries with respect to a semantics-induced ordering, and (2) for most reasonable semantics of incompleteness, such monotonicity is captured by preservation under various types of homomorphisms. Using these results we find classes of queries for which naïve evaluation works, e.g., positive first-order formulae for the closed-world semantics. Even more, we introduce a general relation-based framework for defining semantics of incompleteness, show how it can be used to capture many known semantics and to introduce new ones, and describe classes of first-order queries for which naïve evaluation works under such semantics.
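The idea of naïve evaluation can be shown on a two-table toy example. The sketch below is our own illustration under an assumed toy schema: labelled nulls are ordinary strings starting with `_n`, and a conjunctive query Q(x, z) :- R(x, y), S(y, z) is evaluated by treating each null as a plain constant.

```python
# Toy illustration: naive evaluation over a database with labelled nulls.
# For (unions of) conjunctive queries, naive evaluation followed by dropping
# null-containing answer tuples yields exactly the certain answers.

NULL1 = "_n1"                    # a labelled null (our notation)

R = [("a", "b"), ("c", NULL1)]
S = [("b", "d"), (NULL1, "e")]

# Q(x, z) :- R(x, y), S(y, z)  -- join, with nulls compared as plain values
naive = [(x, z) for (x, y1) in R for (y2, z) in S if y1 == y2]

# discard answers that still mention a null
certain = [t for t in naive if not any(str(v).startswith("_n") for v in t)]

print(naive)     # [('a', 'd'), ('c', 'e')]
print(certain)   # [('a', 'd'), ('c', 'e')]
```

Note that ('c', 'e') is a certain answer even though it was produced through a null: the labelled null denotes the same (unknown) value in both tables, so the join condition holds in every interpretation. This is the monotonicity the abstract refers to; adding negation to the query would break it.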
Information Systems, 2019
Certain answers are a widely accepted semantics of query answering over incomplete databases. As their computation is a coNP-hard problem, recent research has focused on developing (polynomial time) evaluation algorithms with correctness guarantees, that is, techniques computing a sound but possibly incomplete set of certain answers. The aim is to make the computation of certain answers feasible in practice, settling for under-approximations. In this paper, we present novel evaluation algorithms with correctness guarantees, which provide better approximations than current techniques, while retaining polynomial time data complexity. The central tools of our approach are conditional tables and the conditional evaluation of queries. We propose different strategies to evaluate conditions, leading to different approximation algorithms: more accurate evaluation strategies have higher running times, but they pay off with more certain answers being returned. Thus, our approach offers a suite of approximation algorithms enabling users to choose the technique that best meets their needs in terms of balance between efficiency and quality of the results.
ACM SIGMOD Record, 1988
Reiter has proposed extended relational theory to formulate relational databases with null values and presented a query evaluation algorithm for such databases. However, due to indefinite information brought in by null values, Reiter's algorithm is sound but not complete. In this paper, we first propose an extended relation to represent indefinite information in relational databases. Then, we define an extended relational algebra for extended relations. Based on Reiter's extended relational theory, together with our extended relations and extended relational algebra, we present a sound and complete query evaluation algorithm for relational databases with null values.
Proceedings of the 22nd International Database Engineering & Applications Symposium, 2018
Incomplete information arises in many database applications, such as data integration, data exchange, inconsistency management, data cleaning, ontological reasoning, and many others. A principled way of answering queries over incomplete databases is to compute certain answers, which are query answers that can be obtained from every complete database represented by an incomplete one. For databases containing (labeled) nulls, certain answers to positive queries can be easily computed in polynomial time, but for more general queries with negation the problem becomes coNP-hard. To make query answering feasible in practice, one might resort to SQL's evaluation, but unfortunately, the way SQL behaves in the presence of nulls may result in wrong answers. Thus, on the one hand, SQL's evaluation is efficient but flawed; on the other hand, certain answers are a principled semantics but with high complexity. To deal with this issue, recent research has focused on developing polynomial time ...
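The gap between SQL-style evaluation and certain answers fits in a one-line example, reconstructed here in Python rather than SQL; the labelled null `_n` and the two-element valuation domain are our own simplifications.

```python
# R \ S with R = {1} and S containing a single null.  SQL-style evaluation
# treats the null as distinct from every constant; certain-answer semantics
# intersects the result over every possible value of the null.

def answer(R, S):
    """The difference query Q = R minus S, evaluated naively."""
    return {x for x in R if x not in S}

NULL = "_n"                      # a labelled null appearing in S
R, S = {1}, {NULL}

sql_style = answer(R, S)
print(sql_style)                 # {1} -- the null counts as "not equal" to 1

# Certain answers: intersect the answers over all valuations of the null
domain = {1, 2}
certain = set.intersection(*(answer(R, {v}) for v in domain))
print(certain)                   # set() -- 1 is not certain: the null may be 1
```

Here the SQL-style result contains a non-certain answer: it reports 1 as being in the difference, although the null in S might well denote 1. This is the kind of wrong answer the abstract alludes to.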
2007
MayBMS [4, 1, 3, 2] is a data management system for incomplete information developed at Saarland University. Its main features are a simple and compact representation system for incomplete information and a language called I-SQL with explicit operations for handling uncertainty. MayBMS is currently an extension of PostgreSQL and manages both complete and incomplete data and evaluates I-SQL queries.
2021
In this paper we address the problem of handling inconsistencies in tables with missing values (also called nulls) and functional dependencies. Although the traditional view is that table instances must respect all functional dependencies imposed on them, it is nevertheless relevant to develop theories about how to handle instances that violate some dependencies. Regarding missing values, we make no assumptions on their existence: a missing value exists only if it is inferred from the functional dependencies of the table. We propose a formal framework in which each tuple of a table is associated with a truth value among the following: true, false, inconsistent or unknown; and we show that our framework can be used to study important problems such as consistent query answering, table merging, and data quality measures, to mention just a few. In this paper, however, we focus mainly on consistent query answering, a problem that has received considerable attention during the last decad...
Lecture Notes in Computer Science, 2006
The Local Closed-World Assumption (LCWA) is a generalization of Reiter's Closed-World Assumption (CWA) for relational databases that may be incomplete. Two basic questions that are related to this assumption are: (1) how to represent the fact that only part of the information is known to be complete, and (2) how to properly reason with this information, that is, how to determine whether an answer to a database query is complete even though the database information is incomplete. In this paper we concentrate on the second issue, based on a treatment of the first issue developed in earlier work of the authors. For this we consider a fixpoint semantics for declarative theories that represent locally complete databases. This semantics is based on 3-valued interpretations that allow one to distinguish between the certain and possible consequences of the database's theory.
Proceedings of the VLDB Endowment, 2013
We present a system that computes, for a query whose answers may be incomplete, complete approximations from above and from below. We assume a setting where queries are posed over a partially complete database, that is, a database that is generally incomplete but is known to contain complete information about specific aspects of its application domain. Which parts are complete is described by a set of so-called table-completeness statements. Previous work led to a theoretical framework and an implementation that allowed one to determine whether, in such a scenario, a given conjunctive query is guaranteed to return a complete set of answers. With the present demonstrator we show how to reformulate the original query in such a way that answers are guaranteed to be complete. If there exists a more general complete query, there is a unique most specific one, which we find. If there exists a more specific complete query, there may even be infinitely many. In this case, we find the least specific specializations whose size is bounded by a threshold provided by the user. Generalizations are computed by a fixpoint iteration, employing an answer set programming engine. Specializations are found by leveraging unification from logic programming.