2014, Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems - PODS '14
To make query answering feasible in big datasets, practitioners have been looking into the notion of scale independence of queries. Intuitively, such queries require only a relatively small subset of the data, whose size is determined by the query and access methods rather than the size of the dataset itself. This paper aims to formalize this notion and study its properties. We start by defining what it means to be scale-independent, and provide matching upper and lower bounds for checking scale independence, for queries in various languages, and for combined and data complexity.
Lecture Notes in Computer Science, 2003
The problem of answering queries using views in data integration has recently received considerable attention. A number of algorithms, such as the bucket algorithm, the SVB algorithm, the MiniCon algorithm, and the inverse rules algorithm, have been proposed. However, integrity constraints, such as functional dependencies, have not been considered in these algorithms. Some efforts have been made in some inverse rule-based algorithms in the presence of functional dependencies. In this paper, we extend the bucket-based algorithms to handle query rewritings using views in the presence of functional dependencies. We build relationships between views containing no subgoal of a given query and the query itself. We present an algorithm which is scalable compared to the inverse rule-based algorithms. The problem of missing query rewritings in the presence of functional dependencies that occurs in the previous bucket-based algorithms is avoided. We prove that the query rewritings generated by our algorithm are maximally-contained rewritings relative to functional dependencies.
Proceedings of the VLDB Endowment, 2013
A query class is traditionally considered tractable if there exists a polynomial-time (PTIME) algorithm to answer its queries. When it comes to big data, however, PTIME algorithms often become infeasible in practice. A traditional and effective approach to coping with this is to preprocess data off-line, so that queries in the class can be subsequently evaluated on the data efficiently. This paper aims to provide a formal foundation for this approach in terms of computational complexity. (1) We propose a set of Π-tractable queries, denoted by $\Pi T^0_Q$, to characterize classes of queries that can be answered in parallel poly-logarithmic time (NC) after PTIME preprocessing. We show that several natural query classes are Π-tractable and are feasible on big data.
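As a rough illustration of the preprocessing idea (not the paper's formal construction), the sketch below sorts a list once off-line and then answers membership queries by binary search. The names `preprocess` and `answer` are hypothetical, and sequential binary search is only a stand-in for the parallel poly-logarithmic evaluation the abstract refers to.

```python
from bisect import bisect_left


def preprocess(data):
    """Off-line step (polynomial time): sort the values once."""
    return sorted(data)


def answer(index, value):
    """On-line step: membership test using O(log n) comparisons."""
    pos = bisect_left(index, value)
    return pos < len(index) and index[pos] == value


index = preprocess([42, 7, 19, 3, 25])
print(answer(index, 19))  # True
print(answer(index, 20))  # False
```

The point of the split is that the sorting cost is paid once, while each later query touches only a logarithmic number of entries.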
Lecture Notes in Computer Science, 2009
In this paper, we investigate the problem of query rewriting using views in a hybrid language allowing nominals (i.e., individual names) to occur in intensional descriptions. Of particular interest is a restricted form of nominals, where individual names refer to simple values, which enables the specification of value constraints, i.e., sets of allowed values for attributes. Such constraints are very useful in practice: they enable, for example, fine-grained description of queries and views in integration systems, and thus can be exploited to reduce the query processing cost. We use description logics to formalize the problem of query rewriting using views in the presence of value constraints and show that the technique of query rewriting can be used to process queries under the certain answer semantics. We propose a sound and complete Bucket-like query rewriting algorithm. Data mining techniques have been used to favor scalability w.r.t. the number of views. Experiments on synthetic datasets have been conducted.
The VLDB Journal, 2005
In this paper we study the following problem. Given a database and a set of queries, we want to find a set of views that can compute the answers to the queries, such that the amount of space, in bytes, required to store the viewset is minimum on the given database. (We also handle problem instances where the input has a set of database instances, as described by an oracle that returns the sizes of view relations for given view definitions.) This problem is important for applications such as distributed databases, data warehousing, and data integration. We explore the decidability and complexity of the problem for workloads of conjunctive queries. We show that results differ significantly depending on whether the workload queries have self-joins. Further, for queries without self-joins we describe a very compact search space of views, which contains all views in at least one optimal viewset. We present techniques for finding a minimum-size viewset for a single query without self-joins by using the shape of the query and its constraints, and validate the approach by extensive experiments.
Conference on Innovative Database System …, 2007
We present a "black-box" approach to estimating query cardinality that has no knowledge of query execution plans and data distribution, yet provides accurate estimates. It does so by grouping queries into syntactic families and learning the cardinality distribution of that group directly from points in a high-dimensional input space constructed from the query's attributes, operators, function arguments, aggregates, and constants. We envision an increasing need for such an approach in applications in which query cardinality is required for resource optimization and decision-making at locations that are remote from the data sources. Our primary case study is the Open SkyQuery federation of Astronomy archives, which uses a scheduling and caching mechanism at the mediator for execution of federated queries at remote sources. Experiments using real workloads show that the black-box approach produces accurate estimates and is frugal in its use of space and in computation resources. Also, the black-box approach provides dramatic improvements in the performance of caching in Open SkyQuery.
Proceedings of SEBD-2010, Rimini, Italy, 2010
We study a general framework for query rewriting in the presence of general FOL constraints, where standard theorem proving techniques (e.g., tableau or resolution) can be used. The novel results of applying this framework include: 1) if the original constraints are domain independent, then so will be the query rewritten in terms of database predicates; 2) for infinite databases, the rewriting of conjunctive queries over connected views is decidable; 3) one can apply this technique to the guarded fragment of FOL, obtaining ...
Proceedings of the 2007 ACM SIGMOD international conference on Management of data - SIGMOD '07, 2007
In contrast to classical databases and IR systems, real-world information systems have to deal increasingly with very vague and diverse structures for information management and storage that cannot be adequately handled yet. While current object-relational database systems require clear and unified data schemas, IR systems usually ignore the structured information completely. Malleable schemas, as recently introduced, provide a novel way to deal with vagueness, ambiguity and diversity by incorporating imprecise and overlapping definitions of data structures. In this paper, we propose a novel query relaxation scheme that enables users to find best matching information by exploiting malleable schemas to effectively query vaguely structured information. Our scheme utilizes duplicates in differently described data sets to discover the correlations within a malleable schema, and then uses these correlations to appropriately relax the users' queries. In addition, it ranks results of the relaxed query according to their respective probability of satisfying the original query's intent. We have implemented the scheme and conducted extensive experiments with real-world data to confirm its performance and practicality.
We propose two novel querying formalisms: monadically defined queries (MODEQs) and the more expressive nested monadically defined queries (NEMODEQs). Both subsume and go beyond conjunctive queries, conjunctive two-way regular path queries, and monadic Datalog queries.
University Computing, the Universities and Colleges Information Systems Association of the UK Bulletin of Academic Computing and Information Systems, 1989
We give a formal calculus, based on higher-order functions and indexed data, for the evaluation of database queries. Simple functions take the role sometimes played by tabular relations in evaluating queries, whilst a set of higher-order generators replaces the query-interpretation mechanism. The calculus may be implemented in any sufficiently powerful programming language and yields a computationally complete database query language. This paper gives an explicit presentation of the 'transform calculus'.
Artificial Intelligence, 2014
We give a solution to the succinctness problem for the size of first-order rewritings of conjunctive queries in ontology-based data access with ontology languages such as OWL 2 QL, linear Datalog± and sticky Datalog±. We show that positive existential and nonrecursive datalog rewritings, which do not use extra non-logical symbols (except for intensional predicates in the case of datalog rewritings), suffer an exponential blowup in the worst case, while first-order rewritings can grow superpolynomially unless NP ⊆ P/poly. We also prove that nonrecursive datalog rewritings are in general exponentially more succinct than positive existential rewritings, while first-order rewritings can be superpolynomially more succinct than positive existential rewritings. On the other hand, we construct polynomial-size positive existential and nonrecursive datalog rewritings under the assumption that any data instance contains two fixed constants.
Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, 2016
A major theme in relational database theory is navigating the tradeoff between expressiveness and tractability for query languages, where the query-containment problem is considered a benchmark of tractability. The query class UCQ, consisting of unions of conjunctive queries, is a fragment of first-order logic that has a decidable query-containment problem, but its expressiveness is limited. Extending UCQ with recursion yields Datalog, an expressive query language that has been studied extensively and has recently become popular in application areas such as declarative networking. Unfortunately, Datalog has an undecidable query-containment problem. Identifying a fragment of Datalog that is expressive enough for applications but has a decidable query-containment problem has been an open problem for several years. In the area of graph databases, there has been a similar search for a query language that combines expressiveness and tractability. Because of the need to navigate along graph paths of unspecified length, transitive closure has been considered a fundamental operation. Query classes of increasing complexity, using the operations of disjunction, conjunction, projection, and transitive closure, have been studied, but the classes lacked natural closure properties. The class RQ of regular queries has emerged only recently as a natural query class that is closed under all of its operations and has a decidable query-containment problem. RQ turned out to be a fragment of Datalog where recursion can be used only to express transitive closure. Furthermore, it turns out that applying this idea to Datalog, that is, restricting recursion to the expression of transitive closure, does yield the long-sought goal: an expressive fragment of Datalog with a decidable query-optimization problem.
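To make the role of transitive closure concrete, here is a minimal sketch of the one recursive pattern the abstract singles out, written as the Datalog program T(x, y) :- E(x, y) and T(x, z) :- T(x, y), E(y, z) and evaluated semi-naively. The function name `transitive_closure` and the edge-set representation are assumptions made for illustration; this is not the paper's formalism for regular queries.

```python
def transitive_closure(edges):
    """Return all pairs (x, y) such that y is reachable from x via one or more edges."""
    edges = set(edges)
    closure = set(edges)
    delta = set(edges)  # pairs derived in the previous round
    while delta:
        # Semi-naive step: join only the newly derived pairs with the base edges.
        new = {(x, z) for (x, y) in delta for (y2, z) in edges if y == y2}
        delta = new - closure
        closure |= delta
    return closure


print(sorted(transitive_closure({(1, 2), (2, 3)})))  # [(1, 2), (1, 3), (2, 3)]
```

The loop stops as soon as a round derives no new pair, so it terminates on cyclic graphs as well.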
2009
This article deals with consistent query answering to conjunctive queries under primary key constraints. The repairs of an inconsistent database db are obtained by selecting a maximum number of tuples from db without ever selecting two tuples that agree on their primary key. For a Boolean conjunctive query q, we are interested in the following question: does there exist a Boolean first-order query ϕ such that for every database db, ϕ evaluates to true on db if and only if q evaluates to true on every repair of db? We address this problem for acyclic conjunctive queries in which no relation name occurs more than once. Our results improve previous solutions that are based on Fuxman-Miller join graphs.
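The definition of repairs and certain answers above can be sketched by brute force for a single relation. The helper names `repairs` and `certain`, and the toy employee relation, are hypothetical. The enumeration is exponential in the number of conflicting key groups, which is exactly the cost a first-order rewriting ϕ would avoid.

```python
from itertools import product


def repairs(db, key):
    """Yield every repair: exactly one tuple from each group of tuples sharing a key."""
    groups = {}
    for t in db:
        groups.setdefault(key(t), []).append(t)
    for choice in product(*groups.values()):
        yield set(choice)


def certain(db, key, q):
    """True if the Boolean query q holds in every repair of db."""
    return all(q(r) for r in repairs(db, key))


# Toy relation Emp(name, dept) with primary key `name`; alice violates the key.
emp = {("alice", "db"), ("alice", "ai"), ("bob", "db")}
by_name = lambda t: t[0]

someone_in_db = lambda r: any(dept == "db" for (_, dept) in r)
alice_in_db = lambda r: ("alice", "db") in r

print(certain(emp, by_name, someone_in_db))  # True: bob is in every repair
print(certain(emp, by_name, alice_in_db))    # False: one repair keeps ("alice", "ai")
```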
Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, 2020
We study consistent query answering with respect to key dependencies. Given a (possibly inconsistent) database instance and a set of key dependencies, a repair is an inclusion-maximal subinstance that satisfies all key dependencies. Consistent query answering for a Boolean query is the following problem: given a database instance as input, is the query true in every repair? In [Koutris and Wijsen, ICDT 2019], it was shown that for every self-join-free Boolean conjunctive query and set of key dependencies containing exactly one key dependency per relation name (also called the primary key), this problem is in FO, L-complete, or coNP-complete, and it is decidable which of the three cases applies. In this paper, we consider the more general case where a relation name can be associated with more than one key dependency. It is shown that in this more general setting, it remains decidable whether or not the above problem is in FO, for self-join-free Boolean conjunctive queries. Moreover, it is possible to effectively construct a first-order query that solves the problem whenever such a query exists. CCS CONCEPTS • Information systems → Relational database query languages; • Theory of computation → Incomplete, inconsistent, and uncertain databases; Logic and databases.
Models and Computability, 1999
We use standard notation from recursion theory [22, 25]. We define classes of functions that can be computed with a bound on the number of queries to an oracle.
Definition 2.1 [2] $\mathrm{FQ}(n, A)$ is the collection of all total functions $f$ such that $f$ is recursive in $A$ via an oracle Turing machine that makes at most $n$ sequential (i.e., adaptive) queries to $A$. $\mathrm{FQ}_{\parallel}(n, A)$ is the collection of all total functions $f$ such that $f$ is recursive in $A$ via an oracle Turing machine that makes at most $n$ parallel (i.e., nonadaptive) queries to $A$ (as in a weak truth-table reduction). $\mathrm{FQ}^{X}(n, A)$ and $\mathrm{FQ}^{X}_{\parallel}(n, A)$ are similar except that we also allow unlimited queries to $X$.
Correspondingly, we define classes of sets that can be decided with a bound on the number of queries.
Definition 2.2
• $B \in \mathrm{Q}(n, A)$ if $\chi_B \in \mathrm{FQ}(n, A)$.
• $B \in \mathrm{Q}_{\parallel}(n, A)$ if $\chi_B \in \mathrm{FQ}_{\parallel}(n, A)$.
• $B \in \mathrm{Q}^{X}(n, A)$ if $\chi_B \in \mathrm{FQ}^{X}(n, A)$.
• $B \in \mathrm{Q}^{X}_{\parallel}(n, A)$ if $\chi_B \in \mathrm{FQ}^{X}_{\parallel}(n, A)$.
If the oracle is a function $g$ rather than a set $A$, the complexity classes $\mathrm{FQ}(n, g)$, $\mathrm{FQ}_{\parallel}(n, g)$, $\mathrm{FQ}^{X}(n, g)$, $\mathrm{FQ}^{X}_{\parallel}(n, g)$, $\mathrm{Q}(n, g)$, $\mathrm{Q}_{\parallel}(n, g)$, $\mathrm{Q}^{X}(n, g)$, and $\mathrm{Q}^{X}_{\parallel}(n, g)$ are defined similarly to $\mathrm{FQ}(n, A)$ etc. For a class of sets $\mathcal{C}$, we define $\mathrm{FQ}(n, \mathcal{C}) = \bigcup_{A \in \mathcal{C}} \mathrm{FQ}(n, A)$, and we define $\mathrm{FQ}_{\parallel}(n, \mathcal{C})$ etc. similarly.
Note that if (say) $f \in \mathrm{FQ}(3, A)$, then it might be that, while trying to compute (say) $f(10)$, if 3 INCORRECT answers are given, the computation diverges. We now define the class of functions for which this does not happen.
Definition 2.3 [2] $\mathrm{FQC}(n, A)$ is the collection of all total functions $f$ such that $f$ is recursive in $A$ via an oracle Turing machine $M^{()}$ that has the following property: for all $x$, for all $X$, $M^{X}(x)$ makes at most $n$ sequential queries to $X$ and $M^{X}(x)\downarrow$.
Note 2.4 The classes $\mathrm{FQC}_{\parallel}(n, A)$, $\mathrm{QC}(n, A)$, and $\mathrm{QC}_{\parallel}(n, A)$ can easily be defined.
The following notion has important connections to bounded queries which will be made explicit in Theorem 2.6.
Definition 2.5 Let $n \geq 1$. A function $f$ is n-enumerable (denoted $f \in \mathrm{EN}(n)$) if there exists a recursive function $g$ such that, for all $x$, $|W_{g(x)}| \leq n$ and $f(x) \in W_{g(x)}$. Let $n \geq 1$. A function $f$ is strongly n-enumerable (denoted $f \in \mathrm{SEN}(n)$) if there exists a recursive function $g$ such that, for all $x$, $|D_{g(x)}| \leq n$ and $f(x) \in D_{g(x)}$. (This concept first appeared in a recursion-theoretic framework in [1]. The name "enumerable" was coined in [6].)
Theorem 2.6 [2] If $f$ is any function then (1) $(\exists X)[f \in \mathrm{FQ}(n, X)] \iff f \in \mathrm{EN}(2^n)$; and (2) $(\exists X)[f \in \mathrm{FQC}(n, X)] \iff f \in \mathrm{SEN}(2^n)$.
The following definition introduces two functions which have been very useful in the study of bounded queries.
Definition 2.7 [2] Let n ≥ 1.
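The left-to-right direction of Theorem 2.6(2) can be illustrated with a small sketch: if a machine halts on every oracle and asks at most n queries, then running it under all 2^n possible answer sequences yields a finite set of at most 2^n candidates that must contain the true value. The names `candidates` and `count_members` are hypothetical, and modelling an oracle machine as a Python function that receives an `oracle` callback is an assumption made only for illustration.

```python
from itertools import product


def candidates(machine, x, n):
    """Collect the outputs of `machine` on x under every sequence of n oracle answers.

    Assumes the machine halts on every answer sequence and asks at most n queries
    (the FQC setting); the result has at most 2**n elements.
    """
    outputs = set()
    for answers in product([False, True], repeat=n):
        remaining = iter(answers)
        # The simulated oracle ignores the query and replies with the next guessed bit.
        outputs.add(machine(x, lambda query: next(remaining)))
    return outputs


def count_members(x, oracle):
    """Example machine: how many of x and x + 1 belong to the oracle set (2 queries)."""
    return int(oracle(x)) + int(oracle(x + 1))


print(sorted(candidates(count_members, 10, 2)))  # [0, 1, 2]
```

Without the halting assumption, the loop could hang on a wrong answer sequence, which matches the reason part (1) of the theorem yields only an enumerable set rather than a strongly enumerable one.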
2000
Typical queries over data warehouses perform aggregation. One of the main ideas to optimize the execution of an aggregate query is to reuse results of previously answered queries. This leads to the problem of rewriting aggregate queries using views. More precisely, given a set of queries, called "views," and a new query, the task is to reformulate the new query with the help of the views in such a way that executing the reformulated query over the views yields the same result as executing the original query over the base relations. Due to a lack of theory, so far algorithms for this problem were rather ad hoc. They were sound, but were not proven to be complete. In earlier work we have given syntactic characterizations for the equivalence of aggregate queries, and applied them to decide when there exist rewritings. However, these decision procedures are highly nondeterministic and do not lend themselves immediately to an implementation. In the current paper, we refine ...
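A small, hypothetical example of what such a rewriting looks like: a view that stores sums per (region, product) can answer a query asking for sums per region, because sums of partial sums equal the overall sum. The `sum_by` helper and the `sales` data are invented for illustration and are not taken from the paper.

```python
from collections import defaultdict


def sum_by(rows, key_idx, val_idx):
    """Group rows by the columns in key_idx and sum the column val_idx."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in key_idx)] += row[val_idx]
    return dict(totals)


# Base relation: (region, product, quantity).
sales = [("north", "pen", 3), ("north", "ink", 5), ("south", "pen", 2), ("north", "pen", 4)]

# Materialized view: total quantity per (region, product).
view = [(region, prod, total) for (region, prod), total in sum_by(sales, (0, 1), 2).items()]

# Query: total quantity per region, over the base relation and rewritten over the view.
direct = sum_by(sales, (0,), 2)
rewritten = sum_by(view, (0,), 2)
print(direct == rewritten)  # True
```

The same trick would not work for an aggregate such as average without also storing counts in the view, which hints at why a general theory of equivalence is needed.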
Database Theory - ICDT '97, 1997
In this paper we study the expressiveness of local queries. By locality we mean, informally, that in order to check if a tuple belongs to the result of a query, one only has to look at a certain predetermined portion of the input. Examples include all relational calculus queries.
Lecture Notes in Computer Science, 2013
Local-As-View (LAV) mediators provide a uniform interface to a federation of heterogeneous data sources to attempt the execution of queries against the federation. LAV mediators rely on query rewriters to translate mediator queries into equivalent queries on the federated data sources. The query rewriting problem in LAV mediators has been shown to be NP-complete, and there may be an exponential number of rewritings, making the execution or even generation of all the rewritings infeasible for some queries. The complexity of this problem can be particularly impacted when queries and data sources are described using SPARQL conjunctive queries, for which millions of rewritings could be generated. We aim at providing an efficient solution to the problem of executing LAV SPARQL query rewritings while the gathered answer is as complete as possible. We formulate the Result-Maximal k-Execution problem (ReMakE) as the problem of maximizing the query results obtained from the execution of only k rewritings. Additionally, a novel query execution strategy called GUN is proposed to solve the ReMakE problem. Our experimental evaluation demonstrates that GUN outperforms traditional techniques in terms of answer completeness and execution time.
Theoretical Computer Science, 1998
We investigate queries in the presence of external functions with arbitrary inputs and outputs (atomic values, sets, nested sets etc). We propose a new notion of domain independence for queries with external functions which, in contrast to previous work, can also be applied to query languages with fixpoints or other kinds of iterators. Next, we define two new notions of computable queries with external functions, and prove that they are equivalent, under the assumption that the external functions are total. Thus, our definition of computable queries with external functions is robust. Finally, based on the equivalence result, we give examples of complete query languages with external functions. A byproduct of the equivalence result is the fact that Relational Machines are complete for complex objects: it was known that they are not complete over flat relations.