On scale independence for querying big data

Wenfei Fan; Floris Geerts; Leonid Libkin

2014, Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS '14)

Abstract

To make query answering feasible in big datasets, practitioners have been looking into the notion of scale independence of queries. Intuitively, such queries require only a relatively small subset of the data, whose size is determined by the query and access methods rather than the size of the dataset itself. This paper aims to formalize this notion and study its properties. We start by defining what it means to be scale-independent, and provide matching upper and lower bounds for checking scale independence, for queries in various languages, and for combined and data complexity.
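The intuition in the abstract can be illustrated with a small sketch. It is not the paper's formalism: the `AccessIndex` class, the friends relation, the `bound` parameter, and the toy data generator are all hypothetical names chosen for illustration. The sketch assumes an access method that retrieves tuples by key, with a declared limit on how many tuples any key may return. Under that assumption, the number of tuples the query fetches depends on the bound and not on the size of the dataset.

```python
from collections import defaultdict


class AccessIndex:
    """Hypothetical access method: an index on one attribute with a
    declared bound on the number of tuples any single key may return."""

    def __init__(self, rows, key, bound):
        self.bound = bound
        self.fetched = 0  # counts tuples actually retrieved through the index
        self.buckets = defaultdict(list)
        for row in rows:
            self.buckets[row[key]].append(row)

    def lookup(self, value):
        matches = self.buckets.get(value, [])
        if len(matches) > self.bound:
            raise ValueError("access constraint violated for key %r" % (value,))
        self.fetched += len(matches)
        return matches


def friends_in_city(index, person, city):
    """Answer 'friends of `person` living in `city`' using only the index,
    so at most `index.bound` tuples are ever fetched."""
    return sorted(r["friend"] for r in index.lookup(person) if r["city"] == city)


def build_dataset(n_people, friends_per_person):
    """Toy data (an assumption for this sketch): person p is linked to the
    next `friends_per_person` people; even-numbered friends live in NYC."""
    rows = []
    for p in range(n_people):
        for k in range(1, friends_per_person + 1):
            f = (p + k) % n_people
            city = "NYC" if f % 2 == 0 else "LA"
            rows.append({"person": p, "friend": f, "city": city})
    return rows


def fetched_for_size(n_people, friends_per_person=5):
    """Run the query on a dataset of the given size and report the answer
    together with the number of tuples fetched."""
    index = AccessIndex(build_dataset(n_people, friends_per_person),
                        key="person", bound=friends_per_person)
    answer = friends_in_city(index, person=0, city="NYC")
    return answer, index.fetched
```

Calling `fetched_for_size` on datasets of very different sizes should report the same fetch count, since only the tuples for one key are touched. A query that lacked a suitable index, for example one that asks for everyone living in a given city, would have no such bound in this toy setting.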

Michael McTear

Lecture Notes in Computer Science, 2003

The problem of answering queries using views in data integration has recently received considerable attention. A number of algorithms, such as the bucket algorithm, the SVB algorithm, the MiniCon algorithm, and the inverse rules algorithm, have been proposed. However, integrity constraints, such as functional dependencies, have not been considered in these algorithms. Some efforts have been made in some inverse rule-based algorithms in the presence of functional dependencies. In this paper, we extend the bucket-based algorithms to handle query rewritings using views in the presence of functional dependencies. We build relationships between views containing no subgoal of a given query and the query itself. We present an algorithm which is scalable compared to the inverse rule-based algorithms. The problem of missing query rewritings in the presence of functional dependencies that occurs in the previous bucket-based algorithms is avoided. We prove that the query rewritings generated by our algorithm are maximally-contained rewritings relative to functional dependencies.
