We have a multi-billion node business graph and we are unable to run triangleCount() for nodes of any significant degree - 1,000 or beyond. The complexity reasons for this are obvious given the implementation is just a motif for (a)->(b)->(c) with a groupBy() to accomplish the counts.
val triangles = g2.find("(a)-[]->(b); (b)-[]->(c); (a)-[]->(c)")
val triangleCounts = triangles
.select(explode(array(col("a.id"), col("b.id"), col("c.id"))).as(ID))
.groupBy(ID)
.count()
While this is a straightforward implementation I am investigating the literature for a better MapReduce implementation (such as L16?) or a message passing implementation. I don't know the right way to do this and am interested in suggestions by the core contributors - @thunterdb @jkbradley @mengxr and @felixcheung.
If we can figure out a way to optimize GraphFrame.triangleCount() I will take on the implementation :)