External memory bisimulation reduction of big graphs

2012, arXiv (Cornell University)

Abstract

In this paper, we present what are, to our knowledge, the first I/O-efficient solutions for computing the k-bisimulation partition of a massive directed graph and for maintaining such a partition under updates to the underlying graph. Ubiquitous in the theory and application of graph data, bisimulation is a robust notion of node equivalence that intuitively groups together nodes sharing fundamental structural features. k-bisimulation is the standard variant of bisimulation in which the topological features of nodes are considered only within a local neighborhood of radius k ≥ 0. The I/O cost of our partition construction algorithm is bounded by O(k · sort(|E_t|) + k · scan(|N_t|) + sort(|N_t|)), while that of our maintenance algorithms is bounded by O(k · sort(|E_t|) + k · sort(|N_t|)). The space complexity bounds are O(|N_t| + |E_t|) and O(k · |N_t| + k · |E_t|), respectively. Here, |E_t| and |N_t| are the numbers of disk pages occupied by the input graph's edge set and node set, respectively, and sort(n) and scan(n) are the costs of sorting and scanning, respectively, a file occupying n pages in external memory. Empirical analysis on a variety of massive real-world and synthetic graph datasets shows that our algorithms perform efficiently in practice, scaling gracefully as graphs grow in size.
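
To make the notion of a k-bisimulation partition concrete, the following is a minimal in-memory sketch of the usual signature-refinement idea. It is not the external-memory algorithm the abstract describes, and it makes no attempt at I/O efficiency. The function name `k_bisimulation_partition`, the input shapes (a `labels` dictionary and a list of `(source, target)` edges), and the omission of edge labels are all assumptions made for illustration. Round 0 groups nodes by label; each of the k following rounds regroups nodes by their label block together with the set of previous-round blocks of their successors.

```python
from collections import defaultdict


def k_bisimulation_partition(labels, edges, k):
    """Return {node: block_id} for the k-bisimulation partition.

    labels: dict mapping every node to a hashable label (assumed input shape)
    edges:  iterable of (source, target) pairs (edge labels omitted for brevity)
    k:      neighborhood radius, k >= 0
    """
    successors = defaultdict(set)
    for src, dst in edges:
        successors[src].add(dst)

    def assign_ids(signatures):
        # Nodes with equal signatures receive the same small integer id.
        ids = {}
        return {node: ids.setdefault(sig, len(ids))
                for node, sig in signatures.items()}

    # Round 0: nodes are grouped by their label only.
    block0 = assign_ids({node: labels[node] for node in labels})
    block = block0

    # Rounds 1..k: refine using the previous round's blocks of the successors.
    for _ in range(k):
        signatures = {
            node: (block0[node],
                   frozenset(block[m] for m in successors[node]))
            for node in labels
        }
        block = assign_ids(signatures)
    return block


# Hypothetical toy graph: a -> b, c -> d, d -> e, all nodes labeled "x".
labels = {n: "x" for n in "abcde"}
edges = [("a", "b"), ("c", "d"), ("d", "e")]

part2 = k_bisimulation_partition(labels, edges, 2)
# With k = 2, a and d share a block (each has one leaf successor),
# while c is separated because its successor d is not a leaf.
same_block = part2["a"] == part2["d"]
```

In this sketch each round rebuilds every signature from scratch in main memory. The bounds quoted in the abstract suggest that the paper's construction instead relies on k rounds of sorting and scanning the node and edge files on disk, which the sketch does not model.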