2020
We present a constant-round algorithm in the massively parallel computation (MPC) model for evaluating a natural join where every input relation has two attributes. Our algorithm achieves a load of $\tilde{O}(m/p^{1/\rho})$ where $m$ is the total size of the input relations, $p$ is the number of machines, $\rho$ is the join's fractional edge covering number, and $\tilde{O}(.)$ hides a polylogarithmic factor. The load matches a known lower bound up to a polylogarithmic factor. At the core of the proposed algorithm is a new theorem (which we name {\em the isolated cartesian product theorem}) that provides fresh insight into the problem's mathematical structure. Our result implies that the {\em subgraph enumeration problem}, where the goal is to report all the occurrences of a constant-sized subgraph pattern, can be settled optimally (up to a polylogarithmic factor) in the MPC model.
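As an illustrative instance of this bound (our example, not taken from the paper): for the triangle join $Q(x,y,z) = R(x,y) \bowtie S(y,z) \bowtie T(z,x)$, assigning weight $1/2$ to each of the three relations covers every attribute, so the fractional edge covering number is $\rho = 3/2$ and the load guarantee becomes $\tilde{O}(m/p^{2/3})$.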
The quest for efficient parallel algorithms for graph-related problems requires not only fast computational schemes but also insight into the inherent structures that lend themselves to elegant problem-solving methods. Towards this objective, efficient parallel algorithms on a class of hypergraphs called acyclic hypergraphs, and on directed hypergraphs, are developed in this thesis. Acyclic hypergraphs correspond precisely to chordal graphs and their subclasses, and they have applications in relational databases and computer networks. Firstly, we present efficient parallel algorithms for the following problems on graphs: determining whether a graph is strongly chordal, ptolemaic, or a block graph; if the graph is strongly chordal, determining a strong perfect vertex elimination ordering; determining the minimal set of edges needed to make an arbitrary graph strongly chordal, ptolemaic, or a block graph; and determining a minimum cardinality dominating set, connected dominating set, total dominating set, and the domatic number of a strongly chordal graph. Secondly, we show that the query implication problem ($Q_1 \rightarrow Q_2$) on two queries, which is to determine whether the data retrieved by query $Q_1$ is always a subset of the data retrieved by query $Q_2$, is not even in NP and is in fact complete for $\Pi_2^P$. We present a 'fine-grain' analysis of the query implication problem and show that query implication can be solved in polynomial time for chordal queries. Thirdly, we develop efficient parallel algorithms for manipulating directed hypergraphs H, such as finding a directed path in H, computing the closure of H, and computing a minimum equivalent hypergraph of H. We show that finding a directed path in a directed hypergraph is inherently sequential, while for directed hypergraphs with fixed degree and diameter we present NC algorithms for these manipulations. Directed hypergraphs are representation schemes for functional dependencies in relational databases. Finally, we present an efficient parallel algorithm for multi-dimensional range search. We show that the set of points lying in a rectangular parallelepiped can be obtained in $O(\log n)$ time with only $2\log^2 n - 10\log n + 14$ processors on an EREW PRAM. A non-trivial implementation technique on the hypercube parallel architecture is also presented, and our method can be easily generalized to the case of d-dimensional range search.
Proceedings of the 36th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems - PODS '17, 2017
We study the optimal communication cost for computing a full conjunctive query Q over p distributed servers. Two prior results were known. First, for one-round algorithms over skew-free data the optimal communication cost per server is $m/p^{1/\tau^*(Q)}$, where m is the size of the largest input relation and $\tau^*(Q)$ is the fractional vertex covering number of the query hypergraph. Second, for multi-round algorithms and unrestricted database instances, it was shown that any algorithm requires at least $m/p^{1/\rho^*(Q)}$ communication cost per server, where $\rho^*(Q)$ is the fractional edge covering number of the query hypergraph; but no matching algorithms were known for this case (except for two restricted classes of queries: chains and cycles). In this paper we describe a multi-round algorithm that computes any query with load $m/p^{1/\rho^*(Q)}$ per server, in the case when all input relations are binary. Thus, we prove this to be the optimal load for all queries over binary input relations. Our algorithm represents a non-trivial extension of previous algorithms for chains and cycles, and exploits some unique properties of graphs which no longer hold for hypergraphs.
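To make the two bounds concrete, here is an illustrative example of ours (not from the paper): for the two-relation join $Q(x,y,z) = R(x,y) \bowtie S(x,z)$ we have $\tau^*(Q) = 1$ (weight 1 on the shared attribute $x$ covers both hyperedges) but $\rho^*(Q) = 2$ (each of $y$ and $z$ is covered only by its own relation). So the one-round load for skew-free data is $m/p$, whereas an instance in which a single heavy value of $x$ appears in every tuple forces a cartesian-product-like computation and hence load $m/p^{1/2}$, matching $m/p^{1/\rho^*(Q)}$.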
2016
In this paper, we study the communication complexity for the problem of computing a conjunctive query on a large database in a parallel setting with $p$ servers. In contrast to previous work, where upper and lower bounds on the communication were specified for particular structures of data (either data without skew, or data with specific types of skew), in this work we focus on worst-case analysis of the communication cost. The goal is to find worst-case optimal parallel algorithms, similar to the work of [18] for sequential algorithms. We first show that for a single round we can obtain an optimal worst-case algorithm. The optimal load for a conjunctive query $q$ when all relations have size equal to $M$ is $O(M/p^{1/\psi^*})$, where $\psi^*$ is a new query-related quantity called the edge quasi-packing number, which is different from both the edge packing number and edge cover number of the query hypergraph. For multiple rounds, we present algorithms that are optimal for several c...
Proceedings. Fifth International Conference on High Performance Computing (Cat. No. 98EX238), 1998
Join is the most important and expensive operation in relational databases. The parallel join operation is very sensitive to the presence of data skew. In this paper, we present two new parallel join algorithms for coarse-grained machines which work optimally in the presence of an arbitrary amount of data skew. The first algorithm is sort-based and the second is hash-based. Both algorithms employ a preprocessing phase (prior to the redistribution phase) to partition the work equally among the processors. The proposed algorithms have been designed for memory-resident data; however, they can be extended to disk-resident data. These algorithms are shown to be both theoretically and practically scalable. Experimental results are provided on the IBM SP-2.
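The abstract does not spell out the preprocessing step, so the following is only a generic sketch of the idea behind skew-aware redistribution (the name plan_redistribution and the threshold rule are ours; this is not the authors' sort-based or hash-based algorithm): heavy join keys detected before redistribution are spread across all processors, with the matching tuples of the other relation broadcast to them, while light keys are hashed as usual.

```python
from collections import Counter, defaultdict

def plan_redistribution(r_keys, s_keys, p, threshold_factor=2.0):
    """Classify join keys of R as heavy or light before redistribution.

    Light keys are hashed to a single processor; each heavy key of R is
    spread round-robin over all p processors, and the matching S tuples
    are broadcast to those processors.  Generic skew-join sketch only,
    not the paper's preprocessing phase.
    """
    threshold = threshold_factor * len(r_keys) / p
    heavy = {k for k, c in Counter(r_keys).items() if c > threshold}

    r_dest = defaultdict(list)            # processor id -> keys of R tuples
    for i, k in enumerate(r_keys):
        r_dest[(i % p) if k in heavy else hash(k) % p].append(k)

    s_dest = defaultdict(list)            # processor id -> keys of S tuples
    for k in s_keys:
        if k in heavy:
            for j in range(p):            # broadcast tuples matching heavy keys
                s_dest[j].append(k)
        else:
            s_dest[hash(k) % p].append(k)
    return r_dest, s_dest

# Toy example: key 0 dominates R, yet no single processor receives all of it.
r_dest, s_dest = plan_redistribution([0] * 1000 + list(range(1, 50)),
                                     list(range(0, 50)), p=4)
print({j: len(v) for j, v in sorted(r_dest.items())})
```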
Lecture Notes in Computer Science, 2018
Nowadays parallel DBMSs compete with Hadoop-based Big Data graph systems to analyze large graphs. In this paper, we study the processing and optimization of relational queries that solve fundamental graph problems, giving a theoretical foundation on time complexity and parallel processing. Specifically, we consider reachability, single-source shortest paths, weakly connected components, PageRank, transitive closure, and all-pairs shortest paths. We explain how graphs can be stored in a relational database and then show how relational queries can be used to analyze such graphs efficiently. We identify two complementary families of algorithms: iteration of matrix-vector multiplication and iteration of matrix-matrix multiplication. We show that all these problems can be solved with a unified algorithm based on iterated matrix multiplication. We present intuitive theoretical results on cardinality estimation and time complexity considering graph size, shape, and density. Finally, we characterize parallel computational complexity and speedup per iteration, focusing on joins and aggregations.
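As a hedged illustration of the join-plus-aggregation pattern behind the matrix-vector family (our sketch in SQL via Python, not the paper's actual queries; the table names E and V are ours), one iteration of "new V(i) = sum over j of E(i,j) * V(j)" is a join followed by a GROUP BY:

```python
import sqlite3

# Minimal sketch: one matrix-vector step expressed as a relational query.
# Iterative graph algorithms (reachability, PageRank, shortest paths, ...)
# repeat this pattern with different operators and stopping conditions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE E (i INTEGER, j INTEGER, v REAL);   -- edge/adjacency matrix
    CREATE TABLE V (j INTEGER, w REAL);              -- current vector
    INSERT INTO E VALUES (1, 2, 1.0), (2, 3, 1.0), (3, 1, 1.0), (1, 3, 1.0);
    INSERT INTO V VALUES (1, 1.0), (2, 1.0), (3, 1.0);
""")
rows = conn.execute("""
    SELECT E.i AS i, SUM(E.v * V.w) AS w
    FROM E JOIN V ON E.j = V.j
    GROUP BY E.i
""").fetchall()
print(rows)   # one step; iterate by replacing V with this result
```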
2021
We study theta-joins in general and join predicates with conjunctions and disjunctions of inequalities in particular, focusing on ranked enumeration where the answers are returned incrementally in an order dictated by a given ranking function. Our approach achieves strong time and space complexity properties: with n denoting the number of tuples in the database, we guarantee for acyclic full join queries with inequality conditions that for every value of k, the k top-ranked answers are returned in O(n polylog n + k log k) time. This is within a polylogarithmic factor of the best known complexity for equi-joins and even of 𝒪(n+k), the time it takes to look at the input and return k answers in any order. Our guarantees extend to join queries with selections and many types of projections, such as the so-called free-connex queries. Remarkably, they hold even when the entire output is of size n^ℓ for a join of ℓ relations. The key ingredient is a novel 𝒪(n polylog n)-size factorized repr...
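The following is a generic heap-based sketch of ours (not the paper's factorized representation, and only for a single equi-join) of what ranked enumeration means operationally: answers are popped from a priority queue in non-decreasing weight order, so the k top-ranked answers are produced without materializing the full output.

```python
import heapq
from collections import defaultdict

def ranked_join(R, S, k):
    """Return the k lowest-weight answers of R(x,y) joined with S(y,z),
    ranked by weight_R + weight_S.  Generic sketch of incremental ranked
    enumeration; R is a list of (x, y, weight), S a list of (y, z, weight)."""
    by_y_R, by_y_S = defaultdict(list), defaultdict(list)
    for x, y, w in R:
        by_y_R[y].append((w, x))
    for y, z, w in S:
        by_y_S[y].append((w, z))
    heap, seen = [], set()
    for y in by_y_R.keys() & by_y_S.keys():
        by_y_R[y].sort(); by_y_S[y].sort()
        heapq.heappush(heap, (by_y_R[y][0][0] + by_y_S[y][0][0], y, 0, 0))
        seen.add((y, 0, 0))
    out = []
    while heap and len(out) < k:
        w, y, i, j = heapq.heappop(heap)          # next-cheapest answer
        out.append((by_y_R[y][i][1], y, by_y_S[y][j][1], w))
        for ni, nj in ((i + 1, j), (i, j + 1)):   # expand the frontier
            if (ni < len(by_y_R[y]) and nj < len(by_y_S[y])
                    and (y, ni, nj) not in seen):
                seen.add((y, ni, nj))
                heapq.heappush(heap, (by_y_R[y][ni][0] + by_y_S[y][nj][0], y, ni, nj))
    return out

print(ranked_join([(1, 'a', 3.0), (2, 'a', 1.0)], [('a', 9, 2.0), ('a', 8, 5.0)], k=3))
```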
Fundamenta Informaticae, 2016
We consider a new graph operation, the c2-join, which generalizes both the join and the co-join. We show that odd hole-free graphs (and odd antihole-free graphs) are closed under the c2-join and describe a polynomial-time algorithm to recognize graphs that admit a c2-join. The time complexities of the (a) recognition problem, (b) maximum weight independent set (MWIS) problem, and (c) minimum coloring (MC) problem for odd hole-free graphs are still unknown. Let H be an odd hole-free graph that contains an odd antihole as an induced subgraph, and let G_H be the class of all graphs generated from the induced subgraphs of H by using the c2-join recursively. Then G_H is odd hole-free and contains all P4-free graphs, the complements of all bipartite graphs, and some imperfect graphs. We show that the MWIS problem, the maximum weight clique (MWC) problem, the MC problem, and the minimum clique cover (MCC) problem can be solved efficiently for G_H.
SIAM Journal on Computing, 1984
In this paper, we present efficient parallel algorithms for the following graph problems: finding the lowest common ancestors for vertex pairs of a directed tree; finding all fundamental cycles, a directed spanning forest, all bridges, all bridge-connected components, all separation vertices, all biconnected components, and testing the biconnectivity of an undirected graph. All these algorithms achieve the O(lg n) time bound, with the first two algorithms using $n\lceil n/\lg n\rceil$ processors and the remaining algorithms using $n\lceil n/\lg n\rceil$ processors. In all cases, our algorithms are better than the previously known algorithms and in most cases reduce the number of processors used by a factor of n lg n. Moreover, our algorithms are optimal with respect to the time-processor product for dense graphs, with the exception of the first two algorithms. The machine model we use is the PRAM which is a SIMD model allowing simultaneous reads but not simultaneous writes to the same memory location.
Proceedings of the VLDB Endowment, 2019
We study the subgraph enumeration problem under distributed settings. Existing solutions either suffer from a severe memory crisis or rely on large indexes, which makes them impractical for very large graphs. Most of them follow a synchronous model where the performance is often bottlenecked by the machine with the worst performance. Motivated by this, in this paper we propose RADS, a Robust Asynchronous Distributed Subgraph enumeration system. RADS first identifies results that can be found using single-machine algorithms. This strategy not only improves the overall performance but also reduces network communication and memory cost. Moreover, RADS employs a novel region-grouped multi-round expand verify & filter framework which does not need to shuffle and exchange the intermediate results, nor does it need to replicate a large part of the data graph in each machine. This feature not only reduces network communication cost and memory usage, but also allows us to adopt simple strateg...
2016
We study the problem of distributing the tuples of a relation to a number of processors organized in an r-dimensional hypercube, which is an important task for parallel join processing. In contrast to previous work, which proposed randomized algorithms for the task, we ask here the question of how to construct efficient deterministic distribution strategies that can optimally load balance the input relation. We first present some general lower bounds on the load for any dimension; these bounds depend not only on the size of the relation, but also on the maximum frequency of each value in the relation. We then construct an algorithm for the case of 1 dimension that is optimal within a constant factor, and an algorithm for the case of 2 dimensions that is optimal within a polylogarithmic factor. Our 2-dimensional algorithm is based on an interesting connection with the vector load balancing problem, a well-studied problem that generalizes classic load balancing.
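For context, the randomized baseline against which such deterministic strategies are usually compared is the hypercube (a.k.a. Shares) assignment, where each attribute of a tuple is hashed independently to one coordinate of the grid. The sketch below is our own illustration of that baseline, not the paper's deterministic construction.

```python
import random
from collections import Counter

def hypercube_assign(tuples, dims):
    """Randomized hypercube assignment: the i-th attribute of each tuple is
    hashed to the i-th coordinate of a processor grid of shape `dims`.
    Illustrative baseline only."""
    salts = [random.random() for _ in dims]   # per-dimension hash salts
    return [tuple(hash((salts[i], t[i])) % dims[i] for i in range(len(dims)))
            for t in tuples]

# 2-dimensional example: a 4 x 4 grid of 16 servers for a binary relation.
R = [(x, y) for x in range(40) for y in range(40)]
loads = Counter(hypercube_assign(R, dims=(4, 4)))
print(max(loads.values()), min(loads.values()))   # roughly balanced on skew-free data
```

With skewed data (one value of very high frequency), this randomized scheme can overload a single row or column of the grid, which is exactly the situation the frequency-dependent lower bounds and deterministic strategies in the paper address.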
The 9th International Conference on Smart Media and Applications, 2020
Using Regular Path Queries (RPQs) is a common way to explore patterns in graph databases. Traditional automata-based approaches for evaluating RPQs on large graphs are limited by the graph size and/or by highly complex queries, which leads to a high evaluation cost. Recently, the threshold rare label based approach has been shown to be effective on large graphs. Nevertheless, rare labels in a graph provide only coarse information, which cannot always guarantee the minimum searching cost. Hence, the Unit-Subquery Cost Matrix (USCM) based approach has been proposed to reduce the parallel evaluation cost by estimating the searching cost of RPQs. However, that approach does not take the joining cost among subqueries into account. In this paper, a method for estimating the joining cost of subqueries is proposed in order to accelerate USCM-based parallel evaluation of RPQs. Specifically, the proposed method is realized by estimating the result sizes of the subqueries. Our experiments on real-world datasets show that estimating the joining cost improves the USCM-based approach by up to around 20% in terms of response time. CCS Concepts: • Computing methodologies → Search methodologies; Parallel computing methodologies.
Mathematics
Using three supercomputers, we broke a record set in 2011 for the enumeration of non-isomorphic regular graphs by extending sequence A006820 of the Online Encyclopedia of Integer Sequences (OEIS), obtaining the number of 4-regular graphs of order 23 as 429,668,180,677,439, while discovering several regular graphs with minimum average shortest path length (ASPL) that can be used as interconnection networks for parallel computers. The enumeration of 4-regular graphs and the discovery of minimal-ASPL graphs are extremely time consuming. We accomplish them by adapting GENREG, a classical regular graph generator, to three supercomputers with thousands of processor cores.
2013
Many structural patterns in the real world are represented as graphs, e.g., molecules, chemical compounds, social networks, and road networks. Mining these graphs to extract useful information is of special interest and has many applications, including drug discovery, compound synthesis, anomaly detection in networks, and social network analysis for finding groups. One of the most interesting problems in graph mining is the graph containment problem: given a query graph q, find all graphs in a given graph dataset that contain the query graph as a subgraph, i.e., all graphs with a subgraph isomorphic to the query graph. Since real-world graph datasets contain vast numbers of graphs, this subgraph isomorphism testing becomes tedious, complex, and time- and space-consuming, so it is necessary to build an index of the graphs in the dataset for cost-efficient query processing. In this paper we propose a time-efficient ...
2010
We consider the problem of finding an unknown graph by using queries with an additive property. This problem was partially motivated by DNA shotgun sequencing and the linkage discovery problem of artificial intelligence. Given a graph, an additive query asks the number of edges in a set of vertices, while a cross-additive query asks the number of edges crossing between two disjoint sets of vertices. For weighted graphs, the queries ask for the sum of the weights. For a graph G with n vertices and at most m edges, we prove that there exists an algorithm to find the edges of G using $O\!\left(\frac{m\log\frac{n^{2}}{m}}{\log(m+1)}\right)$ queries of both types, for all m. The bound is best possible up to a constant factor. For a weighted graph with a mild condition on the weights, it is shown that $O\!\left(\frac{m\log n}{\log m}\right)$ queries are enough provided $m \geq (\log n)^{\alpha}$ for a sufficiently large constant $\alpha$, which is best possible up to a constant factor if $m \leq n^{2-\varepsilon}$ for any constant $\varepsilon > 0$.
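To make the two query types concrete, here is a tiny sketch of ours (purely definitional, not part of the paper's algorithm) of what an additive and a cross-additive query oracle would return on a known graph:

```python
def additive_query(edges, S):
    """Number of edges with both endpoints in S (an 'additive query')."""
    S = set(S)
    return sum(1 for u, v in edges if u in S and v in S)

def cross_additive_query(edges, S, T):
    """Number of edges with one endpoint in S and the other in T
    (a 'cross-additive query'); S and T are assumed disjoint."""
    S, T = set(S), set(T)
    return sum(1 for u, v in edges if (u in S and v in T) or (u in T and v in S))

# Path 1-2-3-4: two edges inside {1,2,3}, one edge crossing {1,2} and {3,4}.
edges = [(1, 2), (2, 3), (3, 4)]
print(additive_query(edges, [1, 2, 3]), cross_additive_query(edges, [1, 2], [3, 4]))
```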
Discrete Applied Mathematics, 2005
In this paper, we establish structural properties of the class of complement reducible graphs, or cographs, which enable us to describe efficient parallel algorithms for recognizing cographs and for constructing the cotree of a graph if it is a cograph; if the input graph is not a cograph, both algorithms return an induced $P_4$. For a graph on n vertices and m edges, both our cograph recognition and cotree construction algorithms run in $O(\log^2 n)$ time and require $O((n+m)/\log n)$ processors on the EREW PRAM model of computation. Our algorithms are motivated by the work of Dahlhaus (Discrete Appl. Math. 57 (1995) 29-44) and take advantage of the optimal $O(\log n)$-time computation of the co-connected components of a general graph (Theory Comput. Systems 37 (2004) 527-546) and of an optimal $O(\log n)$-time parallel algorithm for computing the connected components of a cograph, which we present. Our results improve upon the previously known linear-processor parallel algorithms for these problems (Discrete Appl. Math. 57 (1995) 29-44; J. Algorithms 15 (1993) 284-313): we achieve a better time-processor product using a weaker model of computation and we provide a certificate (an induced $P_4$) whenever our algorithms decide that the input graph is not a cograph.
We consider the problem of computing a relational query q on a large input database of size n, using a large number p of servers. The computation is performed in rounds, and each server can receive only $O(n/p^{1-\varepsilon})$ bits of data, where $\varepsilon \in [0, 1]$ is a parameter that controls replication. We examine how many global communication steps are needed to compute q. We establish both lower and upper bounds, in two settings. For a single round of communication, we give lower bounds in the strongest possible model, where arbitrary bits may be exchanged; we show that any algorithm requires $\varepsilon \geq 1 - 1/\tau^*$, where $\tau^*$ is the fractional vertex cover of the hypergraph of q. We also give an algorithm that matches the lower bound for a specific class of databases. For multiple rounds of communication, we present lower bounds in a model where routing decisions for a tuple are tuple-based. We show that for the class of tree-like queries there exists a tradeoff between the number of rounds and the space exponent $\varepsilon$. The lower bounds for multiple rounds are the first of their kind. Our results also imply that transitive closure cannot be computed in O(1) rounds of communication.
Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, 2006
Given that most elementary problems in database design are NP-hard, the currently used database design algorithms produce suboptimal results. For example, the current 3NF decomposition algorithms may continue decomposing a relation even though it is already in 3NF. In this paper we study database design problems whose sets of functional dependencies have bounded treewidth. For such sets, which frequently occur in practice, we develop polynomial-time and highly parallelizable algorithms for a number of central database design problems, such as: primality of an attribute, the 3NF test for a relational schema or subschema, and the BCNF test for a subschema. For establishing these results, we propose a new characterization of keys and of the primality of a single attribute. In order to define the treewidth of a relational schema, we associate a hypergraph with it. Note that there are two main possibilities for defining the treewidth of a hypergraph H: one via the primal graph of H and one via the incidence graph of H. Our algorithms apply to the case where the primal graph is considered; however, we also show that the tractability results still hold when the incidence graph is considered instead.
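For readers unfamiliar with these tests, here is a minimal standalone sketch (ours, not the paper's treewidth-based algorithm) of the textbook BCNF test that such results build on: a schema is in BCNF iff the left-hand side of every nontrivial functional dependency is a superkey, which can be checked with the usual attribute-closure computation.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under functional dependencies
    `fds` = [(lhs, rhs), ...], with lhs and rhs given as sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_bcnf(schema, fds):
    """Standard textbook BCNF test: every nontrivial FD's left side must be
    a superkey of the schema.  (The paper's contribution is making such
    tests tractable and parallelizable for FD sets of bounded treewidth.)"""
    return all(rhs <= lhs or closure(lhs, fds) >= schema for lhs, rhs in fds)

R = {'A', 'B', 'C'}
fds = [({'A'}, {'B'}), ({'B'}, {'C'})]
print(is_bcnf(R, fds))   # False: B -> C holds but B is not a superkey of R
```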
Proceedings of the 2019 ACM Symposium on Principles of Distributed Computing, 2019
The Massively Parallel Computation (MPC) model serves as a common abstraction of many modern large-scale parallel computation frameworks and has recently gained a lot of importance, especially in the context of classic graph problems. In this work, we mainly consider the maximal matching and maximal independent set problems in the MPC model. These problems are known to admit efficient MPC algorithms if the space available per machine is near-linear in the number n of nodes. This is not only often significantly more than what we can afford, but it also allows for easy, if not trivial, solutions for sparse graphs, which are common in real-world large-scale graphs. We are, therefore, interested in the low-memory MPC model, where the space per machine is restricted to be strongly sublinear, that is, $n^{\delta}$ for any constant $0 < \delta < 1$.
International Journal of Foundations of Computer Science, 1993
Let k be a positive integer; a subset Q of the set of vertices of a graph G is k-dependent in G if each vertex of Q has no more than k neighbours in Q. We present a parallel algorithm which computes a maximal k-dependent set in a graph on n nodes in time $O(\log^4 n)$ on an EREW PRAM with $O(n^2)$ processors. In this way, we establish the membership of the problem of constructing a maximal k-dependent set in the class NC. Our algorithm can be easily adapted to compute a maximal k-dependent set in a graph of bounded valence in time $O(\log^* n)$ using only O(n) EREW PRAM processors. Let f be a positive integer function defined on the set V of vertices of a graph G. A subset F of the set of edges of G is said to be an f-matching if every vertex $v \in V$ is adjacent to at most f(v) edges in F. We present the first NC algorithm for constructing a maximal f-matching. For a graph on n nodes and m edges the algorithm runs in time $O(\log^4 n)$ and uses O(n+m) EREW PRAM processors. For graphs of constant...
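To make the definition concrete, here is a small sequential greedy sketch of ours (not the paper's NC/EREW-PRAM algorithm) that builds a maximal k-dependent set by adding a vertex whenever every already-chosen vertex would still have at most k chosen neighbours:

```python
def maximal_k_dependent_set(adj, k):
    """Greedy sequential construction of a maximal k-dependent set:
    add a vertex whenever doing so keeps every chosen vertex with at
    most k chosen neighbours.  Sequential illustration only."""
    Q = set()
    deg_in_Q = {v: 0 for v in adj}            # chosen neighbours of each vertex
    for v in adj:
        if deg_in_Q[v] <= k and all(deg_in_Q[u] < k for u in adj[v] if u in Q):
            Q.add(v)
            for u in adj[v]:
                deg_in_Q[u] += 1
    return Q

# 4-cycle: with k = 1 a maximal 1-dependent set cannot contain three vertices.
cycle4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(maximal_k_dependent_set(cycle4, k=1))
```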