Data Mining VII: Data, Text and Web Mining and their Business Applications
The concept and semantics of null values in relational databases have been discussed widely since the introduction of the relational data model in the late 1960s. With the introduction of highly mobile, distributed databases, the semantics of the null value needs to expand, in order to preserve the accepted soundness and completeness criteria, to reflect a localised lack of information that may not be apparent for the global database. This paper discusses an extension to the notion of nulls to include the semantics of 'local' nulls. The paper introduces local nulls in terms of amendments to the relational algebra and examines their impact on query languages.

1 Previous research and motivation

Much of the research on the semantics of null values in relational databases dates back to the 1970s and 1980s [1-7]. Codd gave two definitions of nulls, missing and applicable, and missing and inapplicable [1]; Zaniolo [4] later proposed a third definition: essentially, a lack of knowledge about the attribute's applicability, or no information. To handle null values, various logical approaches have been developed. For example, the commonly used three-valued logic includes true, false (often by virtue of a value's absence; q.v. the closed-world assumption [8]), and a maybe value, which indicates that the result may be true [9, 10]. A four-valued logic has also been proposed; it includes an additional truth value that represents the outcome of evaluating expressions that have inapplicable values [11, 12]. Approaches to accommodating null values in practical systems include the work of Motro [13], who uses the idea of conceptual closeness to fill the vacancies represented by a null value, and Roth et al., who aim to include nulls in NF² databases [5]. Null values have also been studied in relation to schema evolution and integration [14, 15].
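The three-valued logic mentioned above can be made concrete with a small sketch. The code below assumes Kleene's strong three-valued connectives (negation, minimum, maximum over the ordering false < maybe < true), which is one common reading of a true/false/maybe logic and not necessarily the exact system used in the cited works. The names `TV`, `tv_not`, `tv_and`, `tv_or` and `eq` are hypothetical and chosen for illustration only; `None` stands in for a null.

```python
from enum import Enum


class TV(Enum):
    """Truth values ordered FALSE < MAYBE < TRUE (assumed Kleene ordering)."""
    FALSE = 0
    MAYBE = 1
    TRUE = 2


def tv_not(a: TV) -> TV:
    # Negation mirrors the ordering: TRUE <-> FALSE, MAYBE stays MAYBE.
    return TV(2 - a.value)


def tv_and(a: TV, b: TV) -> TV:
    # Conjunction is the minimum of the two truth values.
    return TV(min(a.value, b.value))


def tv_or(a: TV, b: TV) -> TV:
    # Disjunction is the maximum of the two truth values.
    return TV(max(a.value, b.value))


def eq(x, y) -> TV:
    """Compare two attribute values; None is used here to represent a null."""
    if x is None or y is None:
        return TV.MAYBE
    return TV.TRUE if x == y else TV.FALSE
```

Under these assumptions a comparison against a null yields maybe, a conjunction with false is still false, and a disjunction with true is still true, so a null only affects the result when the other operand does not already decide it.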
Lecture Notes in Computer Science, 2004
A considerable part of the information available in the real world is inherently imperfect and/or incomplete. Traditionally, the most commonly adopted modelling approaches for dealing with imperfect and missing information are based on the use of default values and of null values. However, in order to deal with imperfect information in a more efficient way, a database system needs some more advanced modelling facilities that better reflect the semantics of the data. Such facilities become even more necessary if data stemming from different data sources must be integrated in a distributed, federated database system. In this paper, the concept of a 'null' value has been revisited. A semantically richer definition, based on possibility theory, is proposed together with a description of the accompanying many-valued logic. Additionally, the potentials of the new approach with respect to the integration of data in multi-database environments or grid database services are illustrated.
2010
The theoretical study of the relational model of data is ongoing and highly developed. Yet the vast majority of real databases include incomplete data, and the incomplete data is widely modelled using special flags called nulls. As noted many times by Date and others, the inclusion of nulls is not compatible with the relational model and invalidates many of the theoretical results as well as requiring a three-valued logic for query support. In category theoretic applications to computer science, partial functions are frequently modelled by using a special value approach (the partial map classifier), or by explicit reference to the domain of definition subobject. In a former edition of the CATS conference the first author and his colleague Rosebrugh proved a Morita equivalence theorem showing that for database modelling the two approaches are equivalent, provided the domain of definition subobject is complemented. In this paper we study the uncomplemented domain of definition approac...
Information Sciences, 1997
An important reality when studying relational databases is the fact that entries in relational tables may often be "missing" or only partially specified. The study of such missing information has led to a rich body of work on "null values." It was recognized early on that there are many different types of null values, each of which reflects different intuitions about why a particular piece of information is missing. Different relations (or even the same relation) could contain different types of null values; yet, very little work has been done on providing a unifying model that reasons with different types of nulls. In this paper, we use constraints to provide a unifying framework for the most common types of nulls. We show how tuples containing null values of these types can be viewed as constraints, and how this leads to an algebra for null values. In particular, this algebra contains a unique operator (called the "compaction" operator) used to remove redundancies from null-valued relations. We have studied various properties of this algebra. We have built a prototype implementation based on the null-valued operators described here and conducted various experiments using this testbed.
ACM SIGMOD Record, 1988
Reiter has proposed an extended relational theory to formulate relational databases with null values and presented a query evaluation algorithm for such databases. However, due to the indefinite information brought in by null values, Reiter's algorithm is sound but not complete. In this paper, we first propose an extended relation to represent indefinite information in relational databases. Then, we define an extended relational algebra for extended relations. Based on Reiter's extended relational theory, our extended relations and the extended relational algebra, we present a sound and complete query evaluation algorithm for relational databases with null values.
ArXiv, 2020
The design of SQL is based on a three-valued logic (3VL), rather than the familiar Boolean logic with truth values true and false, to accommodate the additional truth value unknown for handling nulls. It is viewed as indispensable for SQL expressiveness, but is at the same time much criticized for leading to unintuitive behavior of queries and thus being a source of programmer mistakes. We show that, contrary to the widely held view, SQL could have been designed based on the standard Boolean logic, without any loss of expressiveness and without giving up nulls. The approach itself follows SQL’s evaluation which only retains tuples for which conditions in the WHERE clause evaluate to true. We show that conflating unknown, resulting from nulls, with false leads to an equally expressive version of SQL that does not use the third truth value. Queries written under the two-valued semantics can be efficiently translated into the standard SQL and thus executed on any existing RDBMS. These ...
Proceedings of the 42nd ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems
The design of SQL is based on a three-valued logic (3VL), rather than the familiar two-valued Boolean logic (2VL). In addition to true and false, 3VL adds unknown to handle nulls. Viewed as indispensable for SQL expressiveness, it is often criticized for unintuitive behavior of queries and for being a source of programmer mistakes. We show that, contrary to the widely held view, SQL could have been designed based on 2VL, without any loss of expressiveness. Similarly to SQL's WHERE clause, which only keeps true tuples, we conflate false and unknown for conditions involving nulls to obtain an equally expressive 2VL-based version of SQL. This applies to the core of the 1999 SQL Standard. Queries written under the 2VL semantics can be efficiently translated into the 3VL SQL and thus executed on any existing RDBMS. We show that 2VL enables additional optimizations. To gauge its applicability, we establish criteria under which 2VL and 3VL semantics coincide, and analyze common benchmarks such as TPC-H and TPC-DS to show that most of their queries are such. For queries that behave differently under 2VL and 3VL, we undertake a user study to show a consistent preference for the 2VL semantics. CCS CONCEPTS • Information systems → Relational database query languages; Relational database model; • Theory of computation → Logic.
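The behaviour described in the two abstracts above can be observed on any SQL engine. The following is a minimal sketch using Python's standard `sqlite3` module with an in-memory database; the table `r`, its columns and the function `run_demo` are hypothetical names for illustration. The last query uses `COALESCE(a = 10, 0)` to force an unknown comparison to false before negation. This is only an illustration of the idea of conflating unknown with false, not the translation scheme defined in the papers.

```python
import sqlite3


def run_demo():
    """Show how WHERE drops rows whose condition evaluates to unknown."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE r (id INTEGER, a INTEGER)")
    # Row 2 carries a null in column a.
    conn.executemany("INSERT INTO r VALUES (?, ?)",
                     [(1, 10), (2, None), (3, 20)])

    def ids(sql):
        return [row[0] for row in conn.execute(sql)]

    results = {
        # Only rows where the condition is true are kept.
        "eq": ids("SELECT id FROM r WHERE a = 10 ORDER BY id"),
        # Under 3VL, NOT unknown is still unknown, so row 2 is dropped.
        "not_eq": ids("SELECT id FROM r WHERE NOT (a = 10) ORDER BY id"),
        # Treating the unknown comparison as false first keeps row 2.
        "not_eq_2vl": ids(
            "SELECT id FROM r WHERE NOT COALESCE(a = 10, 0) ORDER BY id"),
    }
    conn.close()
    return results
```

In this sketch the row with the null appears in neither `a = 10` nor `NOT (a = 10)`, which is the kind of behaviour the abstracts call unintuitive; once the comparison is collapsed to false, the negated query returns it.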
International journal of biomedical soft computing and human sciences, 2000
University of Sciences, Vietnam (received 10 November 1999, revised and accepted 25 March 2000) Abstract This paper extends the concept of functional dependency in a database in which the presence of context null values is allowed. It is shown that the set of Armstrong's inference rules forms a sound and complete axiom system for functional dependencies with context nulls as well. Some rules for the data update procedures are also introduced and examined to ensure that the database under consideration still satisfies a given set of functional dependencies.
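For readers unfamiliar with Armstrong's inference rules, the sketch below shows the standard attribute-closure procedure that follows from them in the classical setting without nulls. It does not model context nulls; the abstract's claim is that the same rules remain sound and complete when they are present. The function names `closure` and `implies` and the example dependencies are hypothetical.

```python
def closure(attrs, fds):
    """Return the set of attributes determined by attrs under fds.

    fds is a list of (lhs, rhs) pairs, each an iterable of attribute names.
    This is the classical closure algorithm (no nulls are modelled).
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left-hand side is already derived, add the right-hand side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """Check whether lhs -> rhs follows from fds via Armstrong's rules."""
    return set(rhs) <= closure(lhs, fds)
```

For example, with the dependencies `[(("A",), ("B",)), (("B",), ("C",))]`, `implies` confirms that A determines C by transitivity, while C does not determine A.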
Information Processing Letters, 1986
When databases are used to represent incomplete information, the relationship between two facts where one may imply the other needs to be addressed. In relational databases, this question becomes whether null completion is assumed; that is, whether a (possibly partially defined) tuple implies the existence of tuples that are 'less informative' than the original tuple. We show that no relational algebra that assumes equivalence under null completion can include set-theoretic operators that are compatible with ordinary set theory. Thus, the approach of x-relations is incompatible with the axioms of a boolean algebra.
2021
In this paper we address the problem of handling inconsistencies in tables with missing values (also called nulls) and functional dependencies. Although the traditional view is that table instances must respect all functional dependencies imposed on them, it is nevertheless relevant to develop theories about how to handle instances that violate some dependencies. Regarding missing values, we make no assumptions on their existence: a missing value exists only if it is inferred from the functional dependencies of the table. We propose a formal framework in which each tuple of a table is associated with a truth value among the following: true, false, inconsistent or unknown; and we show that our framework can be used to study important problems such as consistent query answering, table merging, and data quality measures - to mention just a few. In this paper, however, we focus mainly on consistent query answering, a problem that has received considerable attention during the last decad...
2021
In this paper we address the problem of handling inconsistencies in tables with missing values (or nulls) and functional dependencies. Although the traditional view is that table instances must respect all functional dependencies imposed on them, it is nevertheless relevant to develop theories about how to handle instances that violate some dependencies. The usual approach to alleviate the impact of inconsistent data on the answers to a query is to introduce the notion of repair: a repair is a minimally different consistent instance and an answer is consistent if it is present in every repair. Our approach is fundamentally different: we use set theoretic semantics for tuples and functional dependencies that allow us to associate each tuple with a truth value among the following: true, false, inconsistent or unknown. The users of the table can then query the set of true tuples as usual. Regarding missing values, we make no assumptions on their existence: a missing value exists only i...