Worst-Case Optimal Algorithms for Parallel Query Processing

Dan Suciu

2016

Abstract

In this paper, we study the communication complexity of computing a conjunctive query on a large database in a parallel setting with $p$ servers. In contrast to previous work, where upper and lower bounds on the communication were specified for particular structures of data (either data without skew, or data with specific types of skew), in this work we focus on a worst-case analysis of the communication cost. The goal is to find worst-case optimal parallel algorithms, similar to the work of [18] for sequential algorithms. We first show that for a single round we can obtain an optimal worst-case algorithm. The optimal load for a conjunctive query $q$ when all relations have size equal to $M$ is $O(M/p^{1/\psi^*})$, where $\psi^*$ is a new query-related quantity called the edge quasi-packing number, which is different from both the edge packing number and the edge cover number of the query hypergraph. For multiple rounds, we present algorithms that are optimal for several c...
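To make $\psi^*$ more concrete, the sketch below computes it by brute force for small queries. It assumes a definition that this abstract does not state: for a set of variables $x$, the residual query $q_x$ removes the variables in $x$ from every atom, and $\psi^*$ is the maximum over all $x$ of the fractional edge packing number $\tau^*(q_x)$. Atoms left with no variables are dropped, which is also an assumption. The packing LP is solved exactly by enumerating the vertices of the feasible polytope with `fractions.Fraction`. This is only practical for a handful of atoms. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def solve(A, b):
    """Solve the square system A u = b exactly (Gauss-Jordan); None if singular."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def edge_packing_number(atoms):
    """tau*: maximize sum(u) s.t. for every variable v, sum of u_j over atoms containing v <= 1, u >= 0."""
    atoms = [a for a in atoms if a]  # assumption: atoms with no variables are dropped
    n = len(atoms)
    if n == 0:
        return Fraction(0)
    variables = sorted(set().union(*atoms))
    # Each constraint is (row, rhs), meaning row . u <= rhs.
    cons = [([Fraction(int(v in a)) for a in atoms], Fraction(1)) for v in variables]
    cons += [([Fraction(-1 if j == k else 0) for j in range(n)], Fraction(0)) for k in range(n)]
    best = Fraction(0)
    # The LP optimum lies at a vertex: a feasible point where n constraints are tight.
    for subset in combinations(cons, n):
        u = solve([row for row, _ in subset], [rhs for _, rhs in subset])
        if u is None:
            continue
        if all(sum(c * x for c, x in zip(row, u)) <= rhs for row, rhs in cons):
            best = max(best, sum(u))
    return best


def quasi_packing_number(atoms):
    """psi*: max over variable subsets x of tau*(q_x), where q_x removes x from every atom."""
    variables = sorted(set().union(*atoms))
    best = Fraction(0)
    for k in range(len(variables) + 1):
        for x in combinations(variables, k):
            residual = [a - set(x) for a in atoms]
            best = max(best, edge_packing_number(residual))
    return best


def one_round_load(M, p, psi):
    """Load M / p^(1/psi*) from the abstract, ignoring constant factors."""
    return M / p ** (1 / float(psi))


# Triangle query q = R(x,y), S(y,z), T(z,x).
triangle = [frozenset("xy"), frozenset("yz"), frozenset("zx")]
print(edge_packing_number(triangle), quasi_packing_number(triangle))  # 3/2 2
```

For the triangle, $\tau^* = 3/2$ but $\psi^* = 2$: fixing $x$ leaves the residual query $R(y), S(y,z), T(z)$, whose packing $u_R = u_T = 1$ has weight 2. Under the assumed definition, the one-round worst-case load for the triangle would therefore be $M/p^{1/2}$ rather than $M/p^{2/3}$.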

Related papers

Dan Suciu

Proceedings of the 36th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems - PODS '17, 2017

We study the optimal communication cost for computing a full conjunctive query $Q$ over $p$ distributed servers. Two prior results were known. First, for one-round algorithms over skew-free data the optimal communication cost per server is $m/p^{1/\tau^*(Q)}$, where $m$ is the size of the largest input relation, and $\tau^*$ is the fractional vertex covering number of the query hypergraph. Second, for multi-round algorithms and unrestricted database instances, it was shown that any algorithm requires at least $m/p^{1/\rho^*(Q)}$ communication cost per server, where $\rho^*(Q)$ is the fractional edge covering number of the query hypergraph; but no matching algorithms were known for this case (except for two restricted queries: chains and cycles). In this paper we describe a multi-round algorithm that computes any query with load $m/p^{1/\rho^*(Q)}$ per server, in the case when all input relations are binary. Thus, we prove this to be the optimal load for all queries over binary input relations. Our algorithm represents a non-trivial extension of previous algorithms for chains and cycles, and exploits some unique properties of graphs, which no longer hold for hypergraphs.
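For intuition about the one-round, skew-free bound $m/p^{1/\tau^*(Q)}$, here is a toy simulation of the HyperCube (also called Shares) algorithm for the triangle query $Q(x,y,z) = R(x,y), S(y,z), T(z,x)$. HyperCube is a well-known one-round technique, but the abstract above does not name it. For the triangle, the optimal fractional vertex cover puts weight $1/2$ on each variable, so $\tau^* = 3/2$. Each variable then gets share $p^{(1/2)/(3/2)} = p^{1/3}$, and the expected load is about $m/p^{2/3}$ per relation. The function name, the `side` parameter, and the use of Python's built-in `hash` as the per-variable hash function are illustrative assumptions.

```python
import random
from collections import defaultdict


def hypercube_triangles(R, S, T, side):
    """Simulate one HyperCube round for Q(x,y,z) = R(x,y), S(y,z), T(z,x).

    Servers are the cells of a side x side x side grid, so p = side**3 and each
    variable gets the same share side = p**(1/3).
    Returns (set of output triangles, max number of tuples received by one server).
    """
    def h(value):
        # Hypothetical hash: Python's built-in hash reduced to a grid coordinate.
        return hash(value) % side

    # servers[(i, j, k)] holds the R, S and T fragments sent to grid cell (i, j, k).
    servers = defaultdict(lambda: ([], [], []))
    for x, y in R:  # R(x,y) is replicated along the z dimension
        for k in range(side):
            servers[(h(x), h(y), k)][0].append((x, y))
    for y, z in S:  # S(y,z) is replicated along the x dimension
        for i in range(side):
            servers[(i, h(y), h(z))][1].append((y, z))
    for z, x in T:  # T(z,x) is replicated along the y dimension
        for j in range(side):
            servers[(h(x), j, h(z))][2].append((z, x))

    results, max_load = set(), 0
    for r_part, s_part, t_part in servers.values():
        max_load = max(max_load, len(r_part) + len(s_part) + len(t_part))
        # Local join at one server: index S by y, probe T as a set.
        s_by_y = defaultdict(list)
        for y, z in s_part:
            s_by_y[y].append(z)
        t_set = set(t_part)
        for x, y in r_part:
            for z in s_by_y[y]:
                if (z, x) in t_set:
                    results.add((x, y, z))
    return results, max_load


random.seed(1)
edges = list({(random.randrange(40), random.randrange(40)) for _ in range(400)})
triangles, load = hypercube_triangles(edges, edges, edges, side=4)  # p = 64
print(len(triangles), load, 3 * len(edges) / 64 ** (2 / 3))
```

Every output triangle $(a,b,c)$ is produced exactly once, at cell $(h(a), h(b), h(c))$, because that is the only cell receiving all three of its tuples. On this uniformly random input, we would expect the maximum load to be roughly the skew-free estimate printed next to it. If many edges share a single $y$ value, one slice of the grid receives most of $R$ and $S$. This is why the skew-free assumption matters for this bound.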
